@thierrynakoa/fire-flow 10.0.0 → 12.2.1
- package/.claude-plugin/plugin.json +8 -8
- package/ARCHITECTURE-DIAGRAM.md +7 -4
- package/COMMAND-REFERENCE.md +33 -13
- package/DOMINION-FLOW-OVERVIEW.md +581 -421
- package/QUICK-START.md +3 -3
- package/README.md +101 -44
- package/TROUBLESHOOTING.md +264 -264
- package/agents/fire-executor.md +200 -116
- package/agents/fire-fact-checker.md +276 -276
- package/agents/fire-phoenix-analyst.md +394 -0
- package/agents/fire-planner.md +145 -53
- package/agents/fire-project-researcher.md +155 -155
- package/agents/fire-research-synthesizer.md +166 -166
- package/agents/fire-researcher.md +144 -59
- package/agents/fire-roadmapper.md +215 -203
- package/agents/fire-verifier.md +247 -65
- package/agents/fire-vision-architect.md +381 -0
- package/commands/fire-0-orient.md +476 -476
- package/commands/fire-1a-new.md +216 -0
- package/commands/fire-1b-research.md +210 -0
- package/commands/fire-1c-setup.md +254 -0
- package/commands/{fire-1a-discuss.md → fire-1d-discuss.md} +35 -7
- package/commands/fire-3-execute.md +55 -2
- package/commands/fire-4-verify.md +61 -0
- package/commands/fire-5-handoff.md +2 -2
- package/commands/fire-6-resume.md +37 -2
- package/commands/fire-add-new-skill.md +2 -2
- package/commands/fire-autonomous.md +20 -3
- package/commands/fire-brainstorm.md +1 -1
- package/commands/fire-complete-milestone.md +2 -2
- package/commands/fire-cost.md +183 -0
- package/commands/fire-dashboard.md +2 -2
- package/commands/fire-debug.md +663 -663
- package/commands/fire-loop-resume.md +2 -2
- package/commands/fire-loop-stop.md +1 -1
- package/commands/fire-loop.md +1168 -1168
- package/commands/fire-map-codebase.md +3 -3
- package/commands/fire-new-milestone.md +356 -356
- package/commands/fire-phoenix.md +603 -0
- package/commands/fire-reflect.md +235 -235
- package/commands/fire-research.md +246 -246
- package/commands/fire-search.md +1 -1
- package/commands/fire-skills-diff.md +3 -3
- package/commands/fire-skills-history.md +3 -3
- package/commands/fire-skills-rollback.md +7 -7
- package/commands/fire-skills-sync.md +5 -5
- package/commands/fire-test.md +9 -9
- package/commands/fire-todos.md +1 -1
- package/commands/fire-update.md +5 -5
- package/hooks/hooks.json +16 -16
- package/hooks/run-hook.sh +8 -8
- package/hooks/run-session-end.sh +7 -7
- package/hooks/session-end.sh +90 -90
- package/hooks/session-start.sh +1 -1
- package/package.json +4 -2
- package/plugin.json +7 -7
- package/references/metrics-and-trends.md +1 -1
- package/skills-library/SKILLS-INDEX.md +588 -588
- package/skills-library/_general/methodology/AUTONOMOUS_ORCHESTRATION.md +182 -0
- package/skills-library/_general/methodology/BACKWARD_PLANNING_INTERVIEW.md +307 -0
- package/skills-library/_general/methodology/CIRCUIT_BREAKER_INTELLIGENCE.md +163 -0
- package/skills-library/_general/methodology/CONTEXT_ROTATION.md +151 -0
- package/skills-library/_general/methodology/DEAD_ENDS_SHELF.md +188 -0
- package/skills-library/_general/methodology/DESIGN_PHILOSOPHY_ENFORCEMENT.md +152 -0
- package/skills-library/_general/methodology/INTERNAL_CONSISTENCY_AUDIT.md +212 -0
- package/skills-library/_general/methodology/LIVE_BREADCRUMB_PROTOCOL.md +242 -0
- package/skills-library/_general/methodology/PHOENIX_REBUILD_METHODOLOGY.md +251 -0
- package/skills-library/_general/methodology/QUALITY_GATES_AND_VERIFICATION.md +157 -0
- package/skills-library/_general/methodology/RELIABILITY_PREDICTION.md +104 -0
- package/skills-library/_general/methodology/REQUIREMENTS_DECOMPOSITION.md +155 -0
- package/skills-library/_general/methodology/SELF_TESTING_FEEDBACK_LOOP.md +143 -0
- package/skills-library/_general/methodology/STACK_COMPATIBILITY_MATRIX.md +178 -0
- package/skills-library/_general/methodology/TIERED_CONTEXT_ARCHITECTURE.md +118 -0
- package/skills-library/_general/methodology/ZERO_FRICTION_CLI_SETUP.md +312 -0
- package/skills-library/_general/methodology/autonomous-multi-phase-build.md +133 -0
- package/skills-library/_general/methodology/claude-md-archival.md +280 -0
- package/skills-library/_general/methodology/debug-swarm-researcher-escape-hatch.md +240 -240
- package/skills-library/_general/methodology/git-worktrees-parallel.md +232 -0
- package/skills-library/_general/methodology/llm-judge-memory-crud.md +241 -0
- package/skills-library/_general/methodology/multi-project-autonomous-build.md +360 -0
- package/skills-library/_general/methodology/shell-autonomous-loop-fixplan.md +238 -238
- package/skills-library/_general/patterns-standards/GOF_DESIGN_PATTERNS_FOR_AI_AGENTS.md +358 -0
- package/skills-library/methodology/BREATH_BASED_PARALLEL_EXECUTION.md +1 -1
- package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md +1 -1
- package/skills-library/methodology/SABBATH_REST_PATTERN.md +1 -1
- package/templates/ASSUMPTIONS.md +1 -1
- package/templates/BLOCKERS.md +1 -1
- package/templates/DECISION_LOG.md +1 -1
- package/templates/phase-prompt.md +1 -1
- package/templates/phoenix-comparison.md +80 -0
- package/version.json +2 -2
- package/workflows/handoff-session.md +1 -1
- package/workflows/new-project.md +2 -2
- package/commands/fire-1-new.md +0 -281
@@ -0,0 +1,182 @@
---
name: AUTONOMOUS_ORCHESTRATION
category: methodology
description: Industry patterns for autonomous AI agent orchestration — Planner/Worker/Judge separation, scope manifests, DORA metrics, phase-gate hybrids, and supervised autonomy tiers
version: 1.0.0
tags: [autonomous, orchestration, agents, planner-worker-judge, dora, phase-gate, scope]
sources:
  - "Mike Mason — AI Coding Agents in 2026: Coherence Through Orchestration"
  - "Anthropic Engineering — Effective Harnesses for Long-Running Agents"
  - "OpenHands SDK — arxiv 2511.03690"
  - "Google DORA — State of DevOps 2025"
  - "Robert Cooper — Agile-Stage-Gate Hybrids"
  - "AWS — Agentic AI Security Scoping Matrix (TBAC)"
  - "SWE-Bench Pro — Can AI Agents Solve Long-Horizon Tasks?"
---

# Autonomous Orchestration Patterns

> **Core insight:** "You don't trust; you instrument." (Boris Cherny) — Verification stays active. Verdicts auto-route to fix cycles. The human reviews the finished product, not intermediate steps.

---

## 1. Planner / Worker / Judge Separation

The architecture that consistently outperforms alternatives in autonomous coding research:

| Role | Can Do | Cannot Do | Dominion Flow Equivalent |
|------|--------|-----------|------------------------|
| **Planner** | Read codebase, decompose tasks, set scope | Write code, modify files | fire-planner |
| **Worker** | Execute scoped tasks, write code | Plan, verify own work, exceed scope | fire-executor |
| **Judge** | Run verification, read output, report findings | Fix what it finds broken, modify code | fire-verifier |

**Why this matters:** The most dangerous failure mode in autonomous AI is the agent judging its own work. The worker cannot declare itself done. The judge cannot fix what it finds. This separation is load-bearing architecture, not process overhead.

**Anti-pattern:** "The executor also checks if things work" — this is the worker judging itself. SWE-Bench Pro data shows agents that self-verify have significantly lower success rates than those with independent verification.

---

## 2. Scope Manifests (Task-Based Access Control)

Every task should include a scope boundary:

```yaml
scope:
  allowed_files:
    - "server/routes/auth.js"
    - "server/middleware/auth.js"
    - "server/models/User.js"
  allowed_operations:
    - create_file
    - modify_file
    - run_tests
  forbidden:
    - modify files outside allowed_files
    - install new dependencies without plan approval
    - delete existing tests
  max_file_changes: 5
```

**Why explicit scope:** Agents drift without manifest enforcement. Conversational instructions ("only change the auth files") are less reliable than tool-level constraints. The circuit breaker should trip if the agent attempts out-of-scope actions.

**Agent action:** fire-planner includes scope in BLUEPRINT frontmatter. fire-executor reads scope before starting. fire-verifier checks that changes stayed within scope.
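
A minimal sketch of how a verifier-side scope check might look, assuming the manifest above has already been parsed into plain Python values. The `ScopeManifest`, `Action`, and `scope_violations` names are hypothetical and are not part of the package; the sketch only illustrates checking recorded actions against `allowed_files`, `allowed_operations`, and `max_file_changes`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopeManifest:
    """Hypothetical in-memory form of the `scope:` block shown above."""
    allowed_files: frozenset
    allowed_operations: frozenset
    max_file_changes: int


@dataclass(frozen=True)
class Action:
    """One recorded agent action; `path` is None for operations with no target file."""
    operation: str
    path: Optional[str] = None


def scope_violations(manifest: ScopeManifest, actions: list) -> list:
    """Return a list of violation messages; an empty list means the run stayed in scope."""
    violations = []
    touched = set()
    for action in actions:
        if action.operation not in manifest.allowed_operations:
            violations.append(f"operation not allowed: {action.operation}")
        if action.path is not None:
            if action.path not in manifest.allowed_files:
                violations.append(f"file outside scope: {action.path}")
            touched.add(action.path)
    # The cap counts distinct files touched, not individual edits.
    if len(touched) > manifest.max_file_changes:
        violations.append(
            f"too many files changed: {len(touched)} > {manifest.max_file_changes}"
        )
    return violations


manifest = ScopeManifest(
    allowed_files=frozenset({
        "server/routes/auth.js",
        "server/middleware/auth.js",
        "server/models/User.js",
    }),
    allowed_operations=frozenset({"create_file", "modify_file", "run_tests"}),
    max_file_changes=5,
)
actions = [
    Action("modify_file", "server/routes/auth.js"),
    Action("run_tests"),
    Action("modify_file", "server/routes/billing.js"),  # outside allowed_files
    Action("install_dependency"),                       # not an allowed operation
]
violations = scope_violations(manifest, actions)
```

A non-empty result is the kind of signal that could trip the circuit breaker; how the real agents record actions is not specified in the source.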

---

## 3. External Structured State > Long Context

From Anthropic's engineering blog: context window amnesia is solved by structured external state, not by longer context windows.

### The Pattern
```
Session start:
1. Read CONSCIENCE.md → current phase/status
2. Read latest WARRIOR handoff → prior session context
3. Read RECORD.md → what was done, what's pending
4. Read FAILURES.md → dead ends to avoid

Session end:
1. Update CONSCIENCE.md → new status
2. Write WARRIOR handoff → structured state for next session
3. Update RECORD.md → what was accomplished
4. Commit checkpoint
```

**Why:** A 200K context window that includes irrelevant history is worse than a 50K window with precisely structured state. External state files are the memory — the context window is the working space.

**Key from OpenHands:** Use an append-only event log for mutable state. When history approaches context limits, summarize old events while preserving the full log. Reduced API costs 2x with no performance degradation.
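
The append-only log with condensation can be sketched as follows. This is an illustration of the idea only, not the OpenHands implementation: the `EventLog` class, the event-count budget (standing in for a token limit), and the placeholder summary string are all assumptions. A real system would produce the summary with a model call.

```python
from __future__ import annotations


class EventLog:
    """Append-only event log with a condensed view for the context window."""

    def __init__(self, max_view_events: int = 4, keep_recent: int = 2) -> None:
        self.max_view_events = max_view_events  # hypothetical stand-in for a token budget
        self.keep_recent = keep_recent          # newest events kept verbatim in the view
        self._events = []

    def append(self, event: str) -> None:
        """Record an event; existing entries are never edited or removed."""
        self._events.append(event)

    def full_log(self) -> tuple:
        """The complete history, untouched by condensation."""
        return tuple(self._events)

    def view(self) -> list:
        """What goes into the context window: a summary of old events plus recent ones."""
        if len(self._events) <= self.max_view_events:
            return list(self._events)
        split = len(self._events) - self.keep_recent
        # Placeholder summary; an actual condenser would summarize the content.
        summary = f"[summary of {split} earlier events]"
        return [summary, *self._events[split:]]


log = EventLog()
for i in range(6):
    log.append(f"event-{i}")
condensed = log.view()
```

The design point is that `view()` is derived on demand, so condensation never destroys history: the full log stays available for audit or re-summarization.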

---

## 4. Supervised Autonomy Tiers

The industry is converging on three tiers:

| Tier | Risk Level | Oversight | Dominion Flow Mode |
|------|-----------|-----------|-------------------|
| **Human-in-the-loop** | High | Approval required before action | Manual `/fire-3-execute` |
| **Human-on-the-loop** | Medium | Autonomous with monitoring + escalation | `/fire-autonomous` |
| **Human-out-of-the-loop** | Low | Fully autonomous, periodic audit | Future: batch mode |

**Confidence thresholds drive tier assignment:**
- Routine tasks (boilerplate, config): 80% confidence → auto-proceed
- Business logic tasks: 85% confidence → auto-proceed, flag for review
- Architecture decisions: 90% confidence → require explicit approval

**Industry benchmark:** Operational escalation rates above 15% indicate confidence thresholds are miscalibrated — too much is being auto-approved or too little.
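
One possible reading of the thresholds above, sketched in Python. The decision labels, the `route` and `escalation_rate` names, and the rule that any task below its threshold escalates to a human are assumptions for illustration; the source lists only the three thresholds and the 15% benchmark.

```python
# Thresholds and on-pass behaviour taken from the list above.
THRESHOLDS = {"routine": 0.80, "business_logic": 0.85, "architecture": 0.90}
ON_PASS = {
    "routine": "auto_proceed",
    "business_logic": "auto_proceed_flag_review",
    "architecture": "require_approval",
}
ESCALATION_BENCHMARK = 0.15


def route(task_type: str, confidence: float) -> str:
    """Pick a decision for one task (assumption: below threshold means escalate)."""
    if confidence < THRESHOLDS[task_type]:
        return "escalate"
    return ON_PASS[task_type]


def escalation_rate(decisions: list) -> float:
    """Fraction of decisions that were escalations; 0.0 for an empty list."""
    if not decisions:
        return 0.0
    return decisions.count("escalate") / len(decisions)


decisions = ["auto_proceed"] * 8 + ["escalate"] * 2
needs_recalibration = escalation_rate(decisions) > ESCALATION_BENCHMARK
```

Here two escalations in ten decisions give a 20% rate, which exceeds the 15% benchmark and would suggest revisiting the thresholds.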

---

## 5. DORA Metrics for AI-Assisted Development

The 2025 DORA Report finding: **AI adoption improves throughput but increases delivery instability.** Faster output with more failures.

| Metric | What It Measures | AI-Agent Equivalent |
|--------|-----------------|-------------------|
| **Deployment Frequency** | How often to production | Phases completed per session |
| **Change Lead Time** | Commit → production | Plan → verified output |
| **Change Failure Rate** | % of deploys causing incidents | % of phases requiring re-execution |
| **Recovery Time** | Mean time to restore | Time from verification FAIL to PASS |

**The key insight:** Optimizing throughput (more phases faster) without measuring stability (failure rate, recovery time) produces more bugs faster. Both dimensions must be tracked.

**Agent action:** The autonomous log should track both: phases completed AND phases that needed retry. A session that completes 3 phases cleanly is better than one that completes 5 phases with 3 retries each.
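
As a rough sketch of tracking both dimensions, the following computes throughput and stability figures from a per-phase attempt count. The `PhaseRun` record and `session_metrics` function are hypothetical; the actual autonomous log format is not shown in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseRun:
    """One completed phase; attempts == 1 means it passed verification first time."""
    name: str
    attempts: int


def session_metrics(runs: list) -> dict:
    """Summarize throughput (phases completed) and stability (retries) for a session."""
    completed = len(runs)
    retried = sum(1 for run in runs if run.attempts > 1)
    total_retries = sum(run.attempts - 1 for run in runs)
    return {
        "phases_completed": completed,
        "phases_retried": retried,
        "total_retries": total_retries,
        # Share of phases that required re-execution (the change failure rate analogue).
        "change_failure_rate": retried / completed if completed else 0.0,
    }


# The two sessions compared in the text above.
clean = session_metrics([PhaseRun(f"phase-{i}", attempts=1) for i in range(3)])
noisy = session_metrics([PhaseRun(f"phase-{i}", attempts=4) for i in range(5)])
```

The noisy session completes more phases but carries 15 retries and a failure rate of 1.0, which is the trade-off the text warns about.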

---

## 6. Phase-Gate + Agile Hybrid

The sweet spot for AI-assisted development:

```
MACRO: Phase-gate discipline at boundaries
Plan → [GATE] → Execute → [GATE] → Verify → [GATE] → Handoff

Gates are non-negotiable. The project cannot advance
without passing verification.

MICRO: Agile flexibility within phases
Within Execute: iterate freely on tasks
Within Verify: scope-adaptive checks
Within Plan: explore alternatives
```

**Definition of Ready (before starting a phase):**
- [ ] Phase requirements clear (MEMORY.md populated)
- [ ] Dependencies from prior phase resolved
- [ ] Scope bounded in BLUEPRINT
- [ ] Kill conditions defined for high-risk tasks

**Definition of Done (before declaring phase complete):**
- [ ] All BLUEPRINT tasks executed
- [ ] Verification APPROVED or CONDITIONAL
- [ ] Review completed (no BLOCK findings)
- [ ] RECORD.md updated
- [ ] CONSCIENCE.md advanced

---

## 7. What Fails in Autonomous Mode

From SWE-Bench Pro and industry analysis — failure modes to watch for:

| Failure Mode | Symptom | Prevention |
|-------------|---------|-----------|
| **Premature termination** | Agent declares "done" before end-to-end verification | Judge must verify, not worker |
| **Scope creep** | Agent fixes "related" issues outside scope | Scope manifest enforcement |
| **Context overflow** | Endless file reading, losing track | Condensation + structured state |
| **Quality degradation at scale** | More output but more bugs | Track change failure rate alongside throughput |
| **Semantic misunderstanding** | Solution passes tests but misses the point | Verify against requirements, not just tests |
| **Self-certification** | Agent says "looks good" without running checks | Mandatory verification gates |

**The hardest failure:** Semantic misunderstanding — the agent produces code that is technically correct but doesn't solve the actual problem. Tests pass because the tests test the wrong thing. This is only caught by requirements-level verification, not code-level verification.

---

## When Agents Should Reference This Skill

- **fire-autonomous:** Apply supervised autonomy tiers, track DORA metrics in autonomous log
- **fire-planner:** Include scope manifests and kill conditions in BLUEPRINTs
- **fire-executor:** Respect scope boundaries, never self-verify
- **fire-verifier:** Independent judge role — verify against requirements, not just tests
- **fire-5-handoff:** Structure handoff as external state for next session (not narrative)
@@ -0,0 +1,307 @@
---
name: BACKWARD_PLANNING_INTERVIEW
category: methodology
description: Structured questioning protocol for extracting project end-state from beginners who don't know technical terminology
version: 1.0.0
tags: [backward-planning, interview, vision, beginners, vibe-coder]
---

# Backward Planning Interview Protocol

Reference skill for `fire-vision-architect` (backward mode) and `fire-1-new` (adaptive questioning). Provides structured question sequences that extract the mission objective from users who don't know what questions to ask.

> **Origin:** Military backward planning doctrine — fix the end-state first, then derive every checkpoint from it. Adapted for software project initialization.

---

## Mode Gate

This interview activates when the user answers the mode gate question:

> **"Have you already started building this, or are we starting from scratch?"**

If they say "from scratch," "just an idea," or anything that reveals no existing tech context — this protocol runs. Never ask "What tech stack are you using?" — it forces beginners to bluff or freeze.

---

## Core Principle

Beginners describe products in terms of **what they can see and do**, not in technical terms. Every question below is designed to extract a hidden technical requirement from a plain-language answer.

**Never ask:** "Do you need WebSockets?" (they don't know what that is)
**Always ask:** "Should users see changes from other people instantly, like Google Docs?" (they know exactly what that means)

---

## Phase 0: Visual Input (Show, Don't Tell)

*Goal: Let users show what they're building instead of describing it. A picture extracts more requirements in 5 seconds than 10 minutes of questions.*

### Question 0: The Visual
> **"Do you have anything visual — a screenshot, a wireframe, a Figma link, a hand-drawn sketch, or even a photo of a napkin drawing? Drop it here and I'll extract the technical requirements from it."**

*What this reveals:* UI complexity, navigation patterns, data models, feature scope — all at once.

**Accepted formats:**
| Input Type | How to Share | What Claude Extracts |
|-----------|-------------|---------------------|
| Screenshot of similar app | Paste image or file path | UI patterns, features to clone, complexity level |
| Figma/design tool export | Export as PNG and share | Component hierarchy, page count, navigation flow |
| Hand-drawn wireframe | Photo of paper sketch | Screen count, data relationships, user flow |
| Napkin drawing | Phone photo | Core screens, rough feature scope |
| Figma link | Share the URL (needs MCP) | Full design system, components, variants |
| Excalidraw export | Share PNG or `.excalidraw` file | Architecture diagrams, flow charts, wireframes |

**Visual Extraction Protocol:**

When the user provides a visual, Claude reads the image and extracts:

```markdown
## Visual Analysis

**Screens identified:** {count}
**Screen list:**
1. {screen name} — {what it shows}
2. {screen name} — {what it shows}

**UI Elements detected:**
- Navigation: {sidebar / topbar / tabs / hamburger}
- Forms: {login / signup / settings / data entry}
- Data displays: {tables / cards / lists / charts / maps}
- Interactive: {drag-drop / editor / canvas / video player}
- Social: {comments / chat / feed / profiles}

**Derived capabilities from visual:**
| Visual Element | → Technical Requirement |
|---------------|----------------------|
| Login screen | Auth system |
| Dashboard with charts | Data aggregation + visualization library |
| User avatar / profile | Image upload + user profiles |
| Chat sidebar | Real-time messaging (WebSockets) |
| Payment/pricing page | Stripe integration |
| Admin panel | Role-based access control |
| Search bar | Search index (full-text or Algolia) |
| Map view | Geolocation API + maps SDK |
| Video player | Video hosting / streaming |
| File list / upload area | Object storage (S3/R2/Supabase) |
| Mobile layout visible | Responsive design required |
```

**After visual extraction:**
- Skip any interview questions already answered by the visual
- Use remaining questions to fill gaps the visual didn't reveal
- Reference the visual analysis in the Capability Summary output

> **Pro tip:** Even a rough sketch reveals navigation structure, screen count, and data relationships that take 5+ questions to extract verbally. Always ask for visuals first.

---

## Phase 1: The Walkthrough (Mission Objective)

*Goal: Get the user to narrate what their finished product looks like in use.*

### Question 1: The Elevator Pitch
> **"In one sentence, what does your app do for the person using it?"**

*What this reveals:* Core value proposition, primary user type, product category.

| Answer Pattern | Hidden Requirement |
|---------------|-------------------|
| "Helps teachers manage courses" | LMS, role-based auth (teacher/student), content management |
| "Lets people sell handmade goods" | E-commerce, payments, seller/buyer roles, product catalog |
| "Tracks my workouts" | Personal data, mobile-friendly, charts/visualization |
| "Connects freelancers with clients" | Marketplace, messaging, payments, reviews |

### Question 2: The First 60 Seconds
> **"A new user just signed up. Walk me through their first 60 seconds — what do they see, what do they click?"**

*What this reveals:* Onboarding flow, auth requirements, initial data structure, primary navigation.

| Answer Pattern | Hidden Requirement |
|---------------|-------------------|
| "They fill out a profile with their photo" | User profiles, image upload, storage |
| "They see a feed of posts from people they follow" | Social graph, feed algorithm, follow system |
| "They get a dashboard showing their stats" | Data aggregation, charts, dashboard UI |
| "They pick a plan and enter payment" | Subscription billing, payment gateway, plan tiers |

### Question 3: The Money Screen
> **"What's the ONE screen where your app delivers the most value? Describe it like you're showing it to a friend."**

*What this reveals:* Core feature complexity, data relationships, UI sophistication needed.

| Answer Pattern | Hidden Requirement |
|---------------|-------------------|
| "A drag-and-drop board like Trello" | Complex UI interactions, state management, real-time sync |
| "A clean editor where you write and format text" | Rich text editor (Tiptap/ProseMirror), content blocks |
| "A map showing nearby services" | Geolocation, maps API, location-based queries |
| "A video player with comments on the side" | Video hosting/streaming, threaded comments, timestamps |

---

## Phase 2: The Users (Personnel & Roles)

*Goal: Discover user roles, permissions, and access patterns.*

### Question 4: Who's Involved?
> **"Besides the main user, who else uses this? Does anyone have special powers — like an admin, a manager, a teacher?"**

*What this reveals:* Role-based access control (RBAC), permission tiers, multi-tenant needs.

| Answer Pattern | Hidden Requirement |
|---------------|-------------------|
| "Just me" | Single user, no auth needed or simple auth |
| "Users and admins" | 2-role RBAC, admin dashboard |
| "Teachers, students, and school admins" | 3+ role RBAC, organization/tenant model |
| "Anyone can view, only members can post" | Public/private content, auth-gated actions |

### Question 5: Solo or Social?
> **"Do users interact with each other, or is it a solo experience? Can they see each other's stuff?"**

*What this reveals:* Social features, real-time needs, content visibility model.

| Answer Pattern | Hidden Requirement |
|---------------|-------------------|
| "Totally solo, it's a personal tool" | Simple data model, no sharing infrastructure |
| "They can share links to their work" | Public URLs, sharing permissions |
| "They message each other" | Messaging system, notifications, possibly real-time |
| "They collaborate on the same document" | Real-time collaboration (WebSockets/CRDT), conflict resolution |

---

## Phase 3: The Capabilities (Equipment & Logistics)

*Goal: Discover technical requirements the user doesn't know they have.*

### Question 6: The Similar App
> **"Name 1-2 apps that feel closest to what you're building. What do you like about them? What would you change?"**

*What this reveals:* Feature benchmark, UI expectations, implicit technical requirements.

| Reference App | Implied Stack Needs |
|--------------|-------------------|
| "Like Notion" | Rich text, content blocks, flexible schema, collaborative editing |
| "Like Shopify" | E-commerce engine, payments, inventory, multi-vendor possible |
| "Like Duolingo" | Gamification, progress tracking, spaced repetition, mobile-first |
| "Like Airbnb" | Marketplace, search/filter, maps, booking/calendar, reviews |
| "Like Slack" | Real-time messaging, channels, file sharing, notifications |
| "Like Canva" | Canvas editor, templates, asset library, export pipeline |

### Question 7: The Deal-Breakers
> **"Which of these does your app NEED to do? (Just say yes or no to each)"**
>
> - Users log in with email/password or Google?
> - Accept payments or subscriptions?
> - Users upload files (images, videos, documents)?
> - Send emails or notifications?
> - Work well on phones (not just desktop)?
> - Show real-time updates (like a live chat or live dashboard)?

*What this reveals:* Direct capability mapping. Each "yes" locks in a technical requirement.

| "Yes" Answer | Technical Requirement |
|-------------|---------------------|
| Login | Auth system (Supabase Auth, NextAuth, better-auth, Passport) |
| Payments | Stripe integration, webhook handling, subscription model |
| File uploads | Object storage (S3, Supabase Storage, Cloudflare R2) |
| Emails | Email service (Resend, SendGrid, AWS SES) |
| Mobile | Responsive design (Tailwind) or React Native |
| Real-time | WebSockets (Socket.io, Supabase Realtime) or SSE |

### Question 8: The Scale Question
> **"In your dream scenario, how many people are using this? Just you? Hundreds? Thousands? Millions?"**

*What this reveals:* Infrastructure scaling needs, database choice implications, hosting tier.

| Answer | Infrastructure Implication |
|--------|--------------------------|
| "Just me / a few people" | SQLite or free-tier Supabase, simple hosting |
| "Hundreds" | Standard PostgreSQL, single-server hosting |
| "Thousands" | PostgreSQL + connection pooling, CDN, caching layer |
| "Millions" | Horizontal scaling, Redis, CDN, queue system, microservices |
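
If the interviewer wants to turn a rough user count into one of the rows above, a lookup along these lines could work. The numeric cut-offs (100, 1,000, 1,000,000) and the `infrastructure_for` name are assumptions chosen for illustration; the table itself only uses the verbal labels.

```python
# (exclusive upper bound on users, implication text copied from the table above)
SCALE_TIERS = [
    (100, "SQLite or free-tier Supabase, simple hosting"),
    (1_000, "Standard PostgreSQL, single-server hosting"),
    (1_000_000, "PostgreSQL + connection pooling, CDN, caching layer"),
]
TOP_TIER = "Horizontal scaling, Redis, CDN, queue system, microservices"


def infrastructure_for(target_users: int) -> str:
    """Return the infrastructure implication for a target user count."""
    for upper_bound, implication in SCALE_TIERS:
        if target_users < upper_bound:
            return implication
    return TOP_TIER


hobby = infrastructure_for(5)
growing = infrastructure_for(20_000)
```

Keeping the tiers in a sorted list makes the assumed boundaries easy to adjust without touching the lookup logic.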

---

## Phase 4: The Timeline (Mission Calendar)

*Goal: Understand urgency and what "done" means to them.*

### Question 9: The Deadline
> **"When do you need this working? Is there a hard deadline (like a launch event) or is it flexible?"**

*What this reveals:* Scope constraints, MVP vs full-build, phase prioritization.

### Question 10: The MVP Gate
> **"If you could only ship THREE features and nothing else, which three?"**

*What this reveals:* True priorities stripped of nice-to-haves. This becomes Phase 1.

---

## Capability-to-Stack Derivation Table

After the interview, map collected capabilities to stack constraints:

| Collected Capabilities | Rules Out | Points Toward |
|-----------------------|-----------|---------------|
| Rich text editor + collaboration | Static sites, simple CRUD | Next.js + Supabase Realtime, or MERN + Socket.io |
| Payments + subscriptions | Frontend-only, static | Full-stack with Stripe SDK (Next.js, Express, Rails) |
| File uploads (images only) | Nothing major | Supabase Storage, S3, Cloudflare R2 |
| File uploads (video) | Serverless-only (size limits) | Dedicated upload service, container hosting |
| Real-time (chat/collab) | Pure REST APIs | WebSocket-capable stack (Supabase, Socket.io, Ably) |
| Mobile-first | Desktop-heavy frameworks | React Native, or responsive Next.js/Remix |
| 3+ user roles | Simple auth | RBAC system, role middleware, admin dashboard |
| Multi-tenant (orgs) | Simple data model | PostgreSQL RLS, tenant isolation, org-scoped queries |
| ML/AI features | Frontend-only | Python backend or API calls to Claude/Gemini |
| Offline support | Server-dependent stacks | PWA, local-first (SQLite + sync), service workers |
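
The derivation step can be sketched as a simple lookup from collected capabilities to the "Rules Out" and "Points Toward" columns. Only four rows are encoded here, and the short keys (`payments`, `realtime`, and so on) and the `derive_constraints` function are hypothetical names, not identifiers used by the package.

```python
# capability key -> (rules out, points toward), copied from four rows of the table above
DERIVATION = {
    "payments": ("Frontend-only, static",
                 "Full-stack with Stripe SDK (Next.js, Express, Rails)"),
    "video_uploads": ("Serverless-only (size limits)",
                      "Dedicated upload service, container hosting"),
    "realtime": ("Pure REST APIs",
                 "WebSocket-capable stack (Supabase, Socket.io, Ably)"),
    "offline": ("Server-dependent stacks",
                "PWA, local-first (SQLite + sync), service workers"),
}


def derive_constraints(capabilities: list) -> tuple:
    """Map capabilities to constraint rows; return (rows, capabilities with no mapping)."""
    rows, unmapped = [], []
    for capability in capabilities:
        if capability in DERIVATION:
            rules_out, points_toward = DERIVATION[capability]
            rows.append({
                "capability": capability,
                "constraint": points_toward,
                "eliminates": rules_out,
            })
        else:
            # Surface gaps instead of guessing a stack for unknown capabilities.
            unmapped.append(capability)
    return rows, unmapped


rows, unmapped = derive_constraints(["payments", "realtime", "blockchain"])
```

Returning the unmapped capabilities separately keeps the interviewer honest about what the table does not cover.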

---

## Interview Anti-Patterns

**Don't do these:**

| Anti-Pattern | Why It Fails | Do Instead |
|-------------|-------------|------------|
| "What's your tech stack?" | Beginners freeze or guess wrong | Ask about the product, derive the stack |
| "Do you need a relational database?" | Jargon — they don't know | "Does your data have relationships? Like courses have lessons, lessons have students?" |
| "SSR or CSR?" | Meaningless to beginners | "Should Google be able to find your pages? Like a blog?" (→ SSR) |
| "REST or GraphQL?" | Implementation detail | Never ask. Derive from data complexity |
| "Monolith or microservices?" | Architecture astronautics | Never ask. Default monolith, split later if needed |
| Asking all 10 questions mechanically | Feels like an interrogation | Adapt — skip questions already answered in earlier responses |

---

## Output: Capability Summary

After the interview, produce a structured summary that feeds into branch generation:

```markdown
## Capability Summary (from Backward Planning Interview)

**Product:** {elevator pitch}
**Similar to:** {reference apps}
**Primary user:** {who}
**User roles:** {list}
**Visual provided:** {yes/no — if yes, include extracted screen count and key findings}

### Required Capabilities
- [ ] Auth: {type}
- [ ] Payments: {yes/no, type}
- [ ] File uploads: {type, size}
- [ ] Real-time: {yes/no, what kind}
- [ ] Email/notifications: {yes/no}
- [ ] Mobile: {responsive or native}
- [ ] Scale target: {users}

### Derived Constraints
| Capability | Constraint | Eliminates |
|-----------|-----------|------------|
| {cap} | {what this means technically} | {stacks ruled out} |

### MVP Features (Phase 1)
1. {feature}
2. {feature}
3. {feature}

→ Feed this into fire-vision-architect Step B2 (Backward Mode)
```
@@ -0,0 +1,163 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: CIRCUIT_BREAKER_INTELLIGENCE
|
|
3
|
+
category: methodology
|
|
4
|
+
description: Intelligent stuck-state detection with type classification, threshold tuning, dead-end engineering from Google X and NASA, and pre-defined kill conditions
|
|
5
|
+
version: 1.0.0
|
|
6
|
+
tags: [circuit-breaker, stuck-detection, dead-ends, google-x, kill-conditions, recovery]
|
|
7
|
+
sources:
|
|
8
|
+
- "Microsoft Azure Architecture Center — Circuit Breaker Pattern"
|
|
9
|
+
- "Martin Fowler — Circuit Breaker (bliki)"
|
|
10
|
+
- "Google X — Moonshot Factory Operating Manual (Astro Teller)"
|
|
11
|
+
- "NASA — Knowledge Transfer and Tacit Knowledge Loss"
|
|
12
|
+
- "PMC — Drug Repurposing / ReFRAME compound library"
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
# Circuit Breaker Intelligence
|
|
16
|
+
|
|
17
|
+
> **Core insight:** Not all "stuck" states are the same. A syntax error, a fixation loop, and a fundamentally impossible approach each require different interventions. Classify before intervening.
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## 1. Stuck-State Classification
|
|
22
|
+
|
|
23
|
+
Before triggering any recovery, classify the stuck type:
|
|
24
|
+
|
|
25
|
+
| Stuck Type | Symptom | Correct Intervention |
|
|
26
|
+
|------------|---------|---------------------|
|
|
27
|
+
| **Transient error** | Build/API failure, external dependency timeout | Wait + retry (standard circuit breaker) |
|
|
28
|
+
| **Fixation** | Same approach with varied syntax, 3+ attempts | Context rotation — fresh agent with dead-end map only |
|
|
29
|
+
| **Context overflow** | Endless file navigation, losing track of changes | Condensation + fresh context window |
|
|
30
|
+
| **Semantic misunderstanding** | Solution passes unit tests, fails integration | Human clarification — agent misunderstands the goal |
|
|
31
|
+
| **Dead end** | All viable approaches exhausted, research returned nothing | Shelve with wake conditions, escalate or pivot |
|
|
32
|
+
| **Scope violation** | Agent drifting outside declared file/tool boundaries | Re-read scope manifest, constrain tools |
|
|
33
|
+
|
|
34
|
+
**Agent action:** When hitting a wall, classify FIRST. Then apply the matching intervention. Don't use "retry harder" for fixation problems or "fresh eyes" for transient errors.
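
The "classify first" rule can be sketched as a plain lookup from stuck type to intervention. This is a minimal illustration, not part of fire-flow: the `StuckType` enum, the intervention identifiers, and `choose_intervention` are hypothetical names that mirror the rows of the table above.

```python
from enum import Enum


class StuckType(Enum):
    """The six stuck types from the classification table (names are illustrative)."""
    TRANSIENT_ERROR = "transient_error"
    FIXATION = "fixation"
    CONTEXT_OVERFLOW = "context_overflow"
    SEMANTIC_MISUNDERSTANDING = "semantic_misunderstanding"
    DEAD_END = "dead_end"
    SCOPE_VIOLATION = "scope_violation"


# Hypothetical intervention identifiers, one per table row.
INTERVENTIONS = {
    StuckType.TRANSIENT_ERROR: "wait_and_retry",
    StuckType.FIXATION: "context_rotation",
    StuckType.CONTEXT_OVERFLOW: "condense_and_fresh_context",
    StuckType.SEMANTIC_MISUNDERSTANDING: "human_clarification",
    StuckType.DEAD_END: "shelve_with_wake_conditions",
    StuckType.SCOPE_VIOLATION: "reread_scope_and_constrain_tools",
}


def choose_intervention(stuck_type: StuckType) -> str:
    """Return the intervention that matches an already-classified stuck type."""
    return INTERVENTIONS[stuck_type]
```

Keeping the mapping exhaustive means a fixation can never fall through to a generic retry.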
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
## 2. Three-State Circuit Breaker
|
|
39
|
+
|
|
40
|
+
```
|
|
41
|
+
CLOSED (normal):
|
|
42
|
+
Task executes. Error counter tracks failures.
|
|
43
|
+
|
|
44
|
+
IF same error pattern seen {threshold} times:
|
|
45
|
+
→ Trip to OPEN
|
|
46
|
+
|
|
47
|
+
OPEN (tripped):
|
|
48
|
+
Stop executing this approach immediately.
|
|
49
|
+
  Route to: research → re-plan, or shelve
|
|
50
|
+
|
|
51
|
+
Timeout: After research completes or new session starts
|
|
52
|
+
→ Move to HALF-OPEN
|
|
53
|
+
|
|
54
|
+
HALF-OPEN (probing):
|
|
55
|
+
Try the researched alternative with limited scope.
|
|
56
|
+
|
|
57
|
+
IF success: → Reset to CLOSED
|
|
58
|
+
  IF failure: → Back to OPEN (shelve as dead end)
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
### Threshold Tuning
|
|
62
|
+
- **Transient errors:** threshold = 3 (retries are cheap)
|
|
63
|
+
- **Logic errors:** threshold = 2 (retries are expensive)
|
|
64
|
+
- **Architectural errors:** threshold = 1 (retry is pointless)
|
|
65
|
+
|
|
66
|
+
**Anti-pattern:** Single shared breaker for all failure types. Maintain per-strategy breakers — one broken approach shouldn't mask another healthy one.
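
The three states and the per-strategy rule could be sketched as follows. This is one possible shape under stated assumptions, not the fire-flow implementation: `StrategyBreaker`, `breaker_for`, and the error-class keys are hypothetical, and only the threshold values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


# Thresholds from the "Threshold Tuning" list; the key names are assumptions.
THRESHOLDS = {"transient": 3, "logic": 2, "architectural": 1}


@dataclass
class StrategyBreaker:
    threshold: int
    state: State = State.CLOSED
    failures: int = 0

    def record_failure(self) -> State:
        if self.state is State.HALF_OPEN:
            # A failed probe goes straight back to OPEN (shelve as dead end).
            self.state = State.OPEN
            return self.state
        self.failures += 1
        if self.state is State.CLOSED and self.failures >= self.threshold:
            self.state = State.OPEN
        return self.state

    def research_done(self) -> State:
        # OPEN -> HALF_OPEN once research completes or a new session starts.
        if self.state is State.OPEN:
            self.state = State.HALF_OPEN
        return self.state

    def record_success(self) -> State:
        # Success in CLOSED or HALF_OPEN resets the breaker; OPEN runs nothing.
        if self.state is not State.OPEN:
            self.state = State.CLOSED
            self.failures = 0
        return self.state


# One breaker per strategy, so a broken approach does not mask a healthy one.
breakers: dict[str, StrategyBreaker] = {}


def breaker_for(strategy: str, error_class: str) -> StrategyBreaker:
    if strategy not in breakers:
        breakers[strategy] = StrategyBreaker(THRESHOLDS[error_class])
    return breakers[strategy]
```

For example, `breaker_for("orm-migration", "logic")` trips after two failures, while a breaker for a different strategy stays CLOSED.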
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## 3. Pre-Defined Kill Conditions (Google X Pattern)
|
|
71
|
+
|
|
72
|
+
> "Run at the hardest problem first." — Astro Teller, Google X
|
|
73
|
+
|
|
74
|
+
Before any task executes, define what would prove the approach unviable:
|
|
75
|
+
|
|
76
|
+
```yaml
|
|
77
|
+
kill_conditions:
|
|
78
|
+
- "3 consecutive verification failures on same root cause"
|
|
79
|
+
- "approach requires changing >5 files outside declared scope"
|
|
80
|
+
- "same error repeats after 2 different fix strategies"
|
|
81
|
+
- "external dependency does not support required feature"
|
|
82
|
+
|
|
83
|
+
wake_conditions:
|
|
84
|
+
- "if {blocking dependency} releases version with {feature}"
|
|
85
|
+
- "if {alternative library} becomes available"
|
|
86
|
+
- "if user provides {missing credential/config}"
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
**Why define upfront:** Kill conditions defined AFTER failure are rationalizations. Kill conditions defined BEFORE execution are engineering discipline. Google X kills ~97% of projects at rapid evaluation — before significant resources are allocated.
|
|
90
|
+
|
|
91
|
+
**Agent action:** fire-planner should include 2-3 kill conditions per high-risk task in BLUEPRINT frontmatter. fire-executor checks these before retrying.
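
As a rough sketch of how an executor might check kill conditions before retrying, the function below evaluates three of the conditions listed above against an attempt history. The `Attempt` record and its fields are hypothetical, and treating "same error" as "same root-cause label" is an assumption; the fourth condition (missing dependency feature) needs outside knowledge and is left out.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    strategy: str            # hypothetical label for the fix strategy used
    root_cause: str          # hypothetical label for the diagnosed root cause
    verified: bool           # whether verification passed
    out_of_scope_files: int  # files changed outside the declared scope


def tripped_kill_conditions(history: list[Attempt]) -> list[str]:
    """Return the kill conditions that the attempt history has met."""
    tripped = []

    # "3 consecutive verification failures on same root cause"
    last3 = history[-3:]
    if (len(last3) == 3
            and all(not a.verified for a in last3)
            and len({a.root_cause for a in last3}) == 1):
        tripped.append("3 consecutive verification failures on same root cause")

    # "approach requires changing >5 files outside declared scope"
    if any(a.out_of_scope_files > 5 for a in history):
        tripped.append("approach requires changing >5 files outside declared scope")

    # "same error repeats after 2 different fix strategies"
    failed: dict[str, set[str]] = {}
    for a in history:
        if not a.verified:
            failed.setdefault(a.root_cause, set()).add(a.strategy)
    if any(len(strategies) >= 2 for strategies in failed.values()):
        tripped.append("same error repeats after 2 different fix strategies")

    return tripped
```

A non-empty result means the executor should stop retrying and shelve or escalate.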
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## 4. Dead-End Engineering
|
|
96
|
+
|
|
97
|
+
Dead ends are **first-class knowledge artifacts**, not failures to delete.
|
|
98
|
+
|
|
99
|
+
### What Makes a Good Dead-End Record
|
|
100
|
+
|
|
101
|
+
From NASA's knowledge loss lessons: if you only record WHAT failed but not WHY, the next agent will attempt the same approach. The "why" is the asset.
|
|
102
|
+
|
|
103
|
+
```markdown
|
|
104
|
+
### [DEAD-END] {title}
|
|
105
|
+
|
|
106
|
+
**What:** {what was attempted}
|
|
107
|
+
**Why it failed:** {root cause, not just symptom}
|
|
108
|
+
**Approaches tried:** {list with expected vs actual for each}
|
|
109
|
+
**Fundamental constraint:** {the thing that makes this approach unviable}
|
|
110
|
+
**Wake conditions:** {what would make this worth revisiting}
|
|
111
|
+
**Status:** SHELVED | ABANDONED | SUPERSEDED BY {task-id}
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
### The ReFRAME Principle (Drug Repurposing)
|
|
115
|
+
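Drug-repurposing libraries such as ReFRAME collect roughly 12,000 compounds that were developed for one indication so they can be rescreened against new ones. A compound that "failed" in one context can succeed when retested in another. Likewise, a dead end in Phase 3 may become the solution in Phase 7 when constraints change.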
|
|
116
|
+
|
|
117
|
+
**Agent action:** Before starting a new task, grep FAILURES.md for `[DEAD-END]` entries with related tags. A prior dead end may now be viable if the context has changed.
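
The lookup described above might be sketched like this, assuming entries follow the `### [DEAD-END] {title}` record format shown earlier. `find_dead_ends` is a hypothetical helper, and since the record template has no dedicated tags field, it simply matches keywords anywhere in the entry text.

```python
import re


def find_dead_ends(failures_md: str, keywords: list[str]) -> list[str]:
    """Return titles of [DEAD-END] entries whose text mentions any keyword."""
    # Split at each "### [DEAD-END] " heading; drop the text before the first one.
    entries = re.split(r"^### \[DEAD-END\] ", failures_md, flags=re.MULTILINE)[1:]
    matches = []
    for entry in entries:
        lines = entry.splitlines()
        if not lines:
            continue  # heading with no title or body
        body = entry.lower()
        if any(k.lower() in body for k in keywords):
            matches.append(lines[0].strip())
    return matches
```

An agent could call it with the file contents, for example `find_dead_ends(Path("FAILURES.md").read_text(), ["stripe", "webhook"])`, and then re-read each match's wake conditions. Note that each entry runs until the next `[DEAD-END]` heading, so unrelated sections in between would be searched too.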
|
|
118
|
+
|
|
119
|
+
---
|
|
120
|
+
|
|
121
|
+
## 5. Articulation Before Escalation (Rubber Duck Protocol)
|
|
122
|
+
|
|
123
|
+
Before routing to a fresh agent, human, or research:
|
|
124
|
+
|
|
125
|
+
```markdown
|
|
126
|
+
## STUCK REPORT
|
|
127
|
+
|
|
128
|
+
**Goal:** {what I was trying to accomplish}
|
|
129
|
+
**Approaches tried:**
|
|
130
|
+
1. {approach} → Expected: {X} → Got: {Y}
|
|
131
|
+
2. {approach} → Expected: {X} → Got: {Y}
|
|
132
|
+
**Current constraint:** {what is physically preventing progress}
|
|
133
|
+
**What a fresh approach needs:** {information or different framing}
|
|
134
|
+
**Confidence this approach is viable:** {high/medium/low + reason}
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
**Why this works:** Articulating the problem forces the agent to reconstruct its assumptions. In cognitive science research, this reportedly catches 30-40% of stuck cases before escalation: the stuck agent solves the problem by explaining it.
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
## 6. Error Discrimination
|
|
142
|
+
|
|
143
|
+
Not all errors carry equal signal:
|
|
144
|
+
|
|
145
|
+
| Error Type | Signal Strength | Action |
|
|
146
|
+
|------------|----------------|--------|
|
|
147
|
+
| Syntax/typo | Low (weight: 0.25) | Auto-fix, minimal signal toward threshold |
|
|
148
|
+
| Import/dependency missing | Medium (weight: 0.5) | Install/resolve, moderate signal |
|
|
149
|
+
| Logic error (wrong output) | High (weight: 1.0) | Count fully, consider re-plan after 2 |
|
|
150
|
+
| Architecture mismatch | Very high (weight: 2.0) | Count as 2, consider kill condition |
|
|
151
|
+
| Cross-phase contract break | Critical (weight: 3.0) | Stop immediately, investigate integration failure |
|
|
152
|
+
|
|
153
|
+
**Agent action:** Weight errors by type when evaluating circuit breaker thresholds. Three typos ≠ three architectural failures.
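
A minimal sketch of the weighting, assuming the breaker compares a weighted sum against a numeric threshold. The weights come from the table above; the key names, the function names, and the threshold of 3.0 used in the example are assumptions.

```python
# Weights from the Error Discrimination table; the key names are illustrative.
WEIGHTS = {
    "syntax": 0.25,
    "dependency": 0.5,
    "logic": 1.0,
    "architecture": 2.0,
    "contract_break": 3.0,
}


def weighted_score(errors: list[str]) -> float:
    """Sum the signal weight of each observed error type."""
    return sum(WEIGHTS[e] for e in errors)


def should_trip(errors: list[str], threshold: float) -> bool:
    """Decide whether the weighted errors should trip the breaker."""
    # A cross-phase contract break stops execution immediately.
    if "contract_break" in errors:
        return True
    return weighted_score(errors) >= threshold
```

With a threshold of 3.0, three typos score 0.75 and do not trip the breaker, while three architecture mismatches score 6.0 and do.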
|
|
154
|
+
|
|
155
|
+
---
|
|
156
|
+
|
|
157
|
+
## When Agents Should Reference This Skill
|
|
158
|
+
|
|
159
|
+
- **fire-executor:** Classify stuck states, check kill conditions, write stuck reports
|
|
160
|
+
- **fire-planner:** Define kill conditions in BLUEPRINT frontmatter for high-risk tasks
|
|
161
|
+
- **fire-verifier:** Flag recurring failure patterns as potential dead ends
|
|
162
|
+
- **fire-researcher:** Read dead-end records before researching — avoid repeating prior approaches
|
|
163
|
+
- **fire-autonomous:** Use error budgets (the weighted thresholds from Section 6) + kill conditions to decide whether to retry, shelve, or escalate
|