@thierrynakoa/fire-flow 10.0.0 → 12.2.1

Files changed (94)
  1. package/.claude-plugin/plugin.json +8 -8
  2. package/ARCHITECTURE-DIAGRAM.md +7 -4
  3. package/COMMAND-REFERENCE.md +33 -13
  4. package/DOMINION-FLOW-OVERVIEW.md +581 -421
  5. package/QUICK-START.md +3 -3
  6. package/README.md +101 -44
  7. package/TROUBLESHOOTING.md +264 -264
  8. package/agents/fire-executor.md +200 -116
  9. package/agents/fire-fact-checker.md +276 -276
  10. package/agents/fire-phoenix-analyst.md +394 -0
  11. package/agents/fire-planner.md +145 -53
  12. package/agents/fire-project-researcher.md +155 -155
  13. package/agents/fire-research-synthesizer.md +166 -166
  14. package/agents/fire-researcher.md +144 -59
  15. package/agents/fire-roadmapper.md +215 -203
  16. package/agents/fire-verifier.md +247 -65
  17. package/agents/fire-vision-architect.md +381 -0
  18. package/commands/fire-0-orient.md +476 -476
  19. package/commands/fire-1a-new.md +216 -0
  20. package/commands/fire-1b-research.md +210 -0
  21. package/commands/fire-1c-setup.md +254 -0
  22. package/commands/{fire-1a-discuss.md → fire-1d-discuss.md} +35 -7
  23. package/commands/fire-3-execute.md +55 -2
  24. package/commands/fire-4-verify.md +61 -0
  25. package/commands/fire-5-handoff.md +2 -2
  26. package/commands/fire-6-resume.md +37 -2
  27. package/commands/fire-add-new-skill.md +2 -2
  28. package/commands/fire-autonomous.md +20 -3
  29. package/commands/fire-brainstorm.md +1 -1
  30. package/commands/fire-complete-milestone.md +2 -2
  31. package/commands/fire-cost.md +183 -0
  32. package/commands/fire-dashboard.md +2 -2
  33. package/commands/fire-debug.md +663 -663
  34. package/commands/fire-loop-resume.md +2 -2
  35. package/commands/fire-loop-stop.md +1 -1
  36. package/commands/fire-loop.md +1168 -1168
  37. package/commands/fire-map-codebase.md +3 -3
  38. package/commands/fire-new-milestone.md +356 -356
  39. package/commands/fire-phoenix.md +603 -0
  40. package/commands/fire-reflect.md +235 -235
  41. package/commands/fire-research.md +246 -246
  42. package/commands/fire-search.md +1 -1
  43. package/commands/fire-skills-diff.md +3 -3
  44. package/commands/fire-skills-history.md +3 -3
  45. package/commands/fire-skills-rollback.md +7 -7
  46. package/commands/fire-skills-sync.md +5 -5
  47. package/commands/fire-test.md +9 -9
  48. package/commands/fire-todos.md +1 -1
  49. package/commands/fire-update.md +5 -5
  50. package/hooks/hooks.json +16 -16
  51. package/hooks/run-hook.sh +8 -8
  52. package/hooks/run-session-end.sh +7 -7
  53. package/hooks/session-end.sh +90 -90
  54. package/hooks/session-start.sh +1 -1
  55. package/package.json +4 -2
  56. package/plugin.json +7 -7
  57. package/references/metrics-and-trends.md +1 -1
  58. package/skills-library/SKILLS-INDEX.md +588 -588
  59. package/skills-library/_general/methodology/AUTONOMOUS_ORCHESTRATION.md +182 -0
  60. package/skills-library/_general/methodology/BACKWARD_PLANNING_INTERVIEW.md +307 -0
  61. package/skills-library/_general/methodology/CIRCUIT_BREAKER_INTELLIGENCE.md +163 -0
  62. package/skills-library/_general/methodology/CONTEXT_ROTATION.md +151 -0
  63. package/skills-library/_general/methodology/DEAD_ENDS_SHELF.md +188 -0
  64. package/skills-library/_general/methodology/DESIGN_PHILOSOPHY_ENFORCEMENT.md +152 -0
  65. package/skills-library/_general/methodology/INTERNAL_CONSISTENCY_AUDIT.md +212 -0
  66. package/skills-library/_general/methodology/LIVE_BREADCRUMB_PROTOCOL.md +242 -0
  67. package/skills-library/_general/methodology/PHOENIX_REBUILD_METHODOLOGY.md +251 -0
  68. package/skills-library/_general/methodology/QUALITY_GATES_AND_VERIFICATION.md +157 -0
  69. package/skills-library/_general/methodology/RELIABILITY_PREDICTION.md +104 -0
  70. package/skills-library/_general/methodology/REQUIREMENTS_DECOMPOSITION.md +155 -0
  71. package/skills-library/_general/methodology/SELF_TESTING_FEEDBACK_LOOP.md +143 -0
  72. package/skills-library/_general/methodology/STACK_COMPATIBILITY_MATRIX.md +178 -0
  73. package/skills-library/_general/methodology/TIERED_CONTEXT_ARCHITECTURE.md +118 -0
  74. package/skills-library/_general/methodology/ZERO_FRICTION_CLI_SETUP.md +312 -0
  75. package/skills-library/_general/methodology/autonomous-multi-phase-build.md +133 -0
  76. package/skills-library/_general/methodology/claude-md-archival.md +280 -0
  77. package/skills-library/_general/methodology/debug-swarm-researcher-escape-hatch.md +240 -240
  78. package/skills-library/_general/methodology/git-worktrees-parallel.md +232 -0
  79. package/skills-library/_general/methodology/llm-judge-memory-crud.md +241 -0
  80. package/skills-library/_general/methodology/multi-project-autonomous-build.md +360 -0
  81. package/skills-library/_general/methodology/shell-autonomous-loop-fixplan.md +238 -238
  82. package/skills-library/_general/patterns-standards/GOF_DESIGN_PATTERNS_FOR_AI_AGENTS.md +358 -0
  83. package/skills-library/methodology/BREATH_BASED_PARALLEL_EXECUTION.md +1 -1
  84. package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md +1 -1
  85. package/skills-library/methodology/SABBATH_REST_PATTERN.md +1 -1
  86. package/templates/ASSUMPTIONS.md +1 -1
  87. package/templates/BLOCKERS.md +1 -1
  88. package/templates/DECISION_LOG.md +1 -1
  89. package/templates/phase-prompt.md +1 -1
  90. package/templates/phoenix-comparison.md +80 -0
  91. package/version.json +2 -2
  92. package/workflows/handoff-session.md +1 -1
  93. package/workflows/new-project.md +2 -2
  94. package/commands/fire-1-new.md +0 -281
package/skills-library/_general/methodology/AUTONOMOUS_ORCHESTRATION.md
@@ -0,0 +1,182 @@
1
+ ---
2
+ name: AUTONOMOUS_ORCHESTRATION
3
+ category: methodology
4
+ description: Industry patterns for autonomous AI agent orchestration — Planner/Worker/Judge separation, scope manifests, DORA metrics, phase-gate hybrids, and supervised autonomy tiers
5
+ version: 1.0.0
6
+ tags: [autonomous, orchestration, agents, planner-worker-judge, dora, phase-gate, scope]
7
+ sources:
8
+ - "Mike Mason — AI Coding Agents in 2026: Coherence Through Orchestration"
9
+ - "Anthropic Engineering — Effective Harnesses for Long-Running Agents"
10
+ - "OpenHands SDK — arxiv 2511.03690"
11
+ - "Google DORA — State of DevOps 2025"
12
+ - "Robert Cooper — Agile-Stage-Gate Hybrids"
13
+ - "AWS — Agentic AI Security Scoping Matrix (TBAC)"
14
+ - "SWE-Bench Pro — Can AI Agents Solve Long-Horizon Tasks?"
15
+ ---
16
+
17
+ # Autonomous Orchestration Patterns
18
+
19
+ > **Core insight:** "You don't trust; you instrument." (Boris Cherny) — Verification stays active. Verdicts auto-route to fix cycles. The human reviews the finished product, not intermediate steps.
20
+
21
+ ---
22
+
23
+ ## 1. Planner / Worker / Judge Separation
24
+
25
+ The architecture that consistently outperforms alternatives in autonomous coding research:
26
+
27
+ | Role | Can Do | Cannot Do | Dominion Flow Equivalent |
28
+ |------|--------|-----------|------------------------|
29
+ | **Planner** | Read codebase, decompose tasks, set scope | Write code, modify files | fire-planner |
30
+ | **Worker** | Execute scoped tasks, write code | Plan, verify own work, exceed scope | fire-executor |
31
+ | **Judge** | Run verification, read output, report findings | Fix what it finds broken, modify code | fire-verifier |
32
+
33
+ **Why this matters:** The most dangerous failure mode in autonomous AI is the agent judging its own work. The worker cannot declare itself done. The judge cannot fix what it finds. This separation is load-bearing architecture, not process overhead.
34
+
35
+ **Anti-pattern:** "The executor also checks if things work" — this is the worker judging itself. SWE-Bench Pro data shows agents that self-verify have significantly lower success rates than those with independent verification.
36
+
37
+ ---
38
+
39
+ ## 2. Scope Manifests (Task-Based Access Control)
40
+
41
+ Every task should include a scope boundary:
42
+
43
+ ```yaml
44
+ scope:
45
+   allowed_files:
46
+     - "server/routes/auth.js"
47
+     - "server/middleware/auth.js"
48
+     - "server/models/User.js"
49
+   allowed_operations:
50
+     - create_file
51
+     - modify_file
52
+     - run_tests
53
+   forbidden:
54
+     - modify files outside allowed_files
55
+     - install new dependencies without plan approval
56
+     - delete existing tests
57
+   max_file_changes: 5
58
+ ```
59
+
60
+ **Why explicit scope:** Agents drift without manifest enforcement. Conversational instructions ("only change the auth files") are less reliable than tool-level constraints. The circuit breaker should trip if the agent attempts out-of-scope actions.
61
+
62
+ **Agent action:** fire-planner includes scope in BLUEPRINT frontmatter. fire-executor reads scope before starting. fire-verifier checks that changes stayed within scope.
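
The scope check described here could be sketched roughly as follows. This is a minimal illustration in Python, assuming the manifest has already been parsed into a dict with the `allowed_files` and `max_file_changes` keys from the YAML example; the function name `check_scope` and the `changed_files` input are hypothetical and are not part of Dominion Flow.

```python
from typing import Iterable


def check_scope(scope: dict, changed_files: Iterable[str]) -> list[str]:
    """Return human-readable scope violations (an empty list means in scope)."""
    allowed = set(scope.get("allowed_files", []))
    changed = sorted(set(changed_files))

    # Any touched file that is not explicitly allowed is a violation.
    violations = [f"out-of-scope file: {path}" for path in changed if path not in allowed]

    # Enforce the cap on the number of distinct files changed, if one is set.
    limit = scope.get("max_file_changes")
    if limit is not None and len(changed) > limit:
        violations.append(f"too many files changed: {len(changed)} > {limit}")
    return violations


# Hypothetical manifest mirroring the YAML example.
manifest = {
    "allowed_files": [
        "server/routes/auth.js",
        "server/middleware/auth.js",
        "server/models/User.js",
    ],
    "max_file_changes": 5,
}

print(check_scope(manifest, ["server/routes/auth.js", "server/app.js"]))
```

A non-empty result is the kind of signal that could trip the circuit breaker or fail verification, depending on where the check runs.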
63
+
64
+ ---
65
+
66
+ ## 3. External Structured State > Long Context
67
+
68
+ From Anthropic's engineering blog: context window amnesia is solved by structured external state, not by longer context windows.
69
+
70
+ ### The Pattern
71
+ ```
72
+ Session start:
73
+ 1. Read CONSCIENCE.md → current phase/status
74
+ 2. Read latest WARRIOR handoff → prior session context
75
+ 3. Read RECORD.md → what was done, what's pending
76
+ 4. Read FAILURES.md → dead ends to avoid
77
+
78
+ Session end:
79
+ 1. Update CONSCIENCE.md → new status
80
+ 2. Write WARRIOR handoff → structured state for next session
81
+ 3. Update RECORD.md → what was accomplished
82
+ 4. Commit checkpoint
83
+ ```
84
+
85
+ **Why:** A 200K context window that includes irrelevant history is worse than a 50K window with precisely structured state. External state files are the memory — the context window is the working space.
86
+
87
+ **Key from OpenHands:** Use an append-only event log for mutable state. When history approaches context limits, summarize old events while preserving the full log. The source reports that this reduced API costs by about 2x with no performance degradation.
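
One way to picture the append-only log with condensation is the sketch below. It is an illustration only, not the OpenHands implementation: the `EventLog` class, the `keep_recent` cutoff (a simple event count standing in for a real token budget), and the `summarize` callback are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventLog:
    """Append-only log: condensation changes the prompt view, never the stored events."""

    events: list[str] = field(default_factory=list)

    def append(self, event: str) -> None:
        self.events.append(event)

    def context_view(self, keep_recent: int, summarize: Callable[[list[str]], str]) -> list[str]:
        """Return one summary line for old events followed by the most recent events."""
        if len(self.events) <= keep_recent:
            return list(self.events)
        cut = len(self.events) - keep_recent
        old, recent = self.events[:cut], self.events[cut:]
        return [summarize(old)] + recent


log = EventLog()
for i in range(5):
    log.append(f"e{i}")

# Placeholder summarizer; in practice this would likely be a model call.
view = log.context_view(2, lambda old: f"[summary of {len(old)} earlier events]")
print(view)
print(len(log.events))
```

The full history stays on disk for audit and replay, while only the condensed view is sent to the model.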
88
+
89
+ ---
90
+
91
+ ## 4. Supervised Autonomy Tiers
92
+
93
+ The industry is converging on three tiers:
94
+
95
+ | Tier | Risk Level | Oversight | Dominion Flow Mode |
96
+ |------|-----------|-----------|-------------------|
97
+ | **Human-in-the-loop** | High | Approval required before action | Manual `/fire-3-execute` |
98
+ | **Human-on-the-loop** | Medium | Autonomous with monitoring + escalation | `/fire-autonomous` |
99
+ | **Human-out-of-the-loop** | Low | Fully autonomous, periodic audit | Future: batch mode |
100
+
101
+ **Confidence thresholds drive tier assignment:**
102
+ - Routine tasks (boilerplate, config): 80% confidence → auto-proceed
103
+ - Business logic tasks: 85% confidence → auto-proceed, flag for review
104
+ - Architecture decisions: 90% confidence → require explicit approval
105
+
106
+ **Industry benchmark:** Operational escalation rates above 15% indicate confidence thresholds are miscalibrated — too much is being auto-approved or too little.
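
The thresholds and the 15% benchmark can be expressed as a small routing sketch. The thresholds and actions are copied from the list in this section; the task-type keys, the function names, and the `escalate` fallback for below-threshold confidence are assumptions of this sketch, since the text does not say what happens below a threshold.

```python
# (threshold, action at or above threshold), taken from the list in this section.
POLICY = {
    "routine": (0.80, "auto-proceed"),
    "business_logic": (0.85, "auto-proceed, flag for review"),
    "architecture": (0.90, "require explicit approval"),
}


def route(task_type: str, confidence: float) -> str:
    """Pick the action for a task; below-threshold confidence escalates (assumed)."""
    threshold, action = POLICY[task_type]
    return action if confidence >= threshold else "escalate"


def escalation_rate(decisions: list[str]) -> float:
    """Fraction of routing decisions that were escalated to a human."""
    if not decisions:
        return 0.0
    return sum(d == "escalate" for d in decisions) / len(decisions)


decisions = [
    route("routine", 0.82),
    route("routine", 0.95),
    route("business_logic", 0.80),
    route("architecture", 0.93),
]
rate = escalation_rate(decisions)
print(decisions, rate, rate > 0.15)
```

Tracking the rate over a session gives a concrete number to compare against the 15% benchmark.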
107
+
108
+ ---
109
+
110
+ ## 5. DORA Metrics for AI-Assisted Development
111
+
112
+ The 2025 DORA Report finding: **AI adoption improves throughput but increases delivery instability.** Faster output with more failures.
113
+
114
+ | Metric | What It Measures | AI-Agent Equivalent |
115
+ |--------|-----------------|-------------------|
116
+ | **Deployment Frequency** | How often to production | Phases completed per session |
117
+ | **Change Lead Time** | Commit → production | Plan → verified output |
118
+ | **Change Failure Rate** | % of deploys causing incidents | % of phases requiring re-execution |
119
+ | **Recovery Time** | Mean time to restore | Time from verification FAIL to PASS |
120
+
121
+ **The key insight:** Optimizing throughput (more phases faster) without measuring stability (failure rate, recovery time) produces more bugs faster. Both dimensions must be tracked.
122
+
123
+ **Agent action:** The autonomous log should track both: phases completed AND phases that needed retry. A session that completes 3 phases cleanly is better than one that completes 5 phases with 3 retries each.
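
A sketch of what tracking both dimensions might look like, assuming each phase records how many re-executions it needed. The `PhaseRun` record and `session_metrics` function are hypothetical names, and the failure rate follows the table's definition (share of phases requiring re-execution).

```python
from dataclasses import dataclass


@dataclass
class PhaseRun:
    name: str
    retries: int  # re-executions needed before verification passed


def session_metrics(runs: list[PhaseRun]) -> dict:
    """Report throughput and stability side by side for one session."""
    completed = len(runs)
    retried = sum(1 for run in runs if run.retries > 0)
    return {
        "phases_completed": completed,
        "change_failure_rate": retried / completed if completed else 0.0,
        "total_retries": sum(run.retries for run in runs),
    }


clean = [PhaseRun(f"phase-{i}", 0) for i in range(3)]
noisy = [PhaseRun(f"phase-{i}", 3) for i in range(5)]

print(session_metrics(clean))
print(session_metrics(noisy))
```

The second session completes more phases, but every one of them needed re-execution, which is exactly the pattern throughput alone would hide.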
124
+
125
+ ---
126
+
127
+ ## 6. Phase-Gate + Agile Hybrid
128
+
129
+ The sweet spot for AI-assisted development:
130
+
131
+ ```
132
+ MACRO: Phase-gate discipline at boundaries
133
+ Plan → [GATE] → Execute → [GATE] → Verify → [GATE] → Handoff
134
+
135
+ Gates are non-negotiable. The project cannot advance
136
+ without passing verification.
137
+
138
+ MICRO: Agile flexibility within phases
139
+ Within Execute: iterate freely on tasks
140
+ Within Verify: scope-adaptive checks
141
+ Within Plan: explore alternatives
142
+ ```
143
+
144
+ **Definition of Ready (before starting a phase):**
145
+ - [ ] Phase requirements clear (MEMORY.md populated)
146
+ - [ ] Dependencies from prior phase resolved
147
+ - [ ] Scope bounded in BLUEPRINT
148
+ - [ ] Kill conditions defined for high-risk tasks
149
+
150
+ **Definition of Done (before declaring phase complete):**
151
+ - [ ] All BLUEPRINT tasks executed
152
+ - [ ] Verification APPROVED or CONDITIONAL
153
+ - [ ] Review completed (no BLOCK findings)
154
+ - [ ] RECORD.md updated
155
+ - [ ] CONSCIENCE.md advanced
156
+
157
+ ---
158
+
159
+ ## 7. What Fails in Autonomous Mode
160
+
161
+ From SWE-Bench Pro and industry analysis — failure modes to watch for:
162
+
163
+ | Failure Mode | Symptom | Prevention |
164
+ |-------------|---------|-----------|
165
+ | **Premature termination** | Agent declares "done" before end-to-end verification | Judge must verify, not worker |
166
+ | **Scope creep** | Agent fixes "related" issues outside scope | Scope manifest enforcement |
167
+ | **Context overflow** | Endless file reading, losing track | Condensation + structured state |
168
+ | **Quality degradation at scale** | More output but more bugs | Track change failure rate alongside throughput |
169
+ | **Semantic misunderstanding** | Solution passes tests but misses the point | Verify against requirements, not just tests |
170
+ | **Self-certification** | Agent says "looks good" without running checks | Mandatory verification gates |
171
+
172
+ **The hardest failure:** Semantic misunderstanding — the agent produces code that is technically correct but doesn't solve the actual problem. Tests pass because the tests test the wrong thing. This is only caught by requirements-level verification, not code-level verification.
173
+
174
+ ---
175
+
176
+ ## When Agents Should Reference This Skill
177
+
178
+ - **fire-autonomous:** Apply supervised autonomy tiers, track DORA metrics in autonomous log
179
+ - **fire-planner:** Include scope manifests and kill conditions in BLUEPRINTs
180
+ - **fire-executor:** Respect scope boundaries, never self-verify
181
+ - **fire-verifier:** Independent judge role — verify against requirements, not just tests
182
+ - **fire-5-handoff:** Structure handoff as external state for next session (not narrative)
package/skills-library/_general/methodology/BACKWARD_PLANNING_INTERVIEW.md
@@ -0,0 +1,307 @@
1
+ ---
2
+ name: BACKWARD_PLANNING_INTERVIEW
3
+ category: methodology
4
+ description: Structured questioning protocol for extracting project end-state from beginners who don't know technical terminology
5
+ version: 1.0.0
6
+ tags: [backward-planning, interview, vision, beginners, vibe-coder]
7
+ ---
8
+
9
+ # Backward Planning Interview Protocol
10
+
11
+ Reference skill for `fire-vision-architect` (backward mode) and `fire-1-new` (adaptive questioning). Provides structured question sequences that extract the mission objective from users who don't know what questions to ask.
12
+
13
+ > **Origin:** Military backward planning doctrine — fix the end-state first, then derive every checkpoint from it. Adapted for software project initialization.
14
+
15
+ ---
16
+
17
+ ## Mode Gate
18
+
19
+ Whether this interview activates depends on the user's answer to the mode gate question:
20
+
21
+ > **"Have you already started building this, or are we starting from scratch?"**
22
+
23
+ If the answer is "from scratch," "just an idea," or anything else that reveals no existing tech context, this protocol runs. Never ask "What tech stack are you using?" because it forces beginners to bluff or freeze.
24
+
25
+ ---
26
+
27
+ ## Core Principle
28
+
29
+ Beginners describe products in terms of **what they can see and do**, not in technical terms. Every question below is designed to extract a hidden technical requirement from a plain-language answer.
30
+
31
+ **Never ask:** "Do you need WebSockets?" (they don't know what that is)
32
+ **Always ask:** "Should users see changes from other people instantly, like Google Docs?" (they know exactly what that means)
33
+
34
+ ---
35
+
36
+ ## Phase 0: Visual Input (Show, Don't Tell)
37
+
38
+ *Goal: Let users show what they're building instead of describing it. A picture extracts more requirements in 5 seconds than 10 minutes of questions.*
39
+
40
+ ### Question 0: The Visual
41
+ > **"Do you have anything visual — a screenshot, a wireframe, a Figma link, a hand-drawn sketch, or even a photo of a napkin drawing? Drop it here and I'll extract the technical requirements from it."**
42
+
43
+ *What this reveals:* UI complexity, navigation patterns, data models, feature scope — all at once.
44
+
45
+ **Accepted formats:**
46
+ | Input Type | How to Share | What Claude Extracts |
47
+ |-----------|-------------|---------------------|
48
+ | Screenshot of similar app | Paste image or file path | UI patterns, features to clone, complexity level |
49
+ | Figma/design tool export | Export as PNG and share | Component hierarchy, page count, navigation flow |
50
+ | Hand-drawn wireframe | Photo of paper sketch | Screen count, data relationships, user flow |
51
+ | Napkin drawing | Phone photo | Core screens, rough feature scope |
52
+ | Figma link | Share the URL (needs MCP) | Full design system, components, variants |
53
+ | Excalidraw export | Share PNG or `.excalidraw` file | Architecture diagrams, flow charts, wireframes |
54
+
55
+ **Visual Extraction Protocol:**
56
+
57
+ When the user provides a visual, Claude reads the image and extracts:
58
+
59
+ ```markdown
60
+ ## Visual Analysis
61
+
62
+ **Screens identified:** {count}
63
+ **Screen list:**
64
+ 1. {screen name} — {what it shows}
65
+ 2. {screen name} — {what it shows}
66
+
67
+ **UI Elements detected:**
68
+ - Navigation: {sidebar / topbar / tabs / hamburger}
69
+ - Forms: {login / signup / settings / data entry}
70
+ - Data displays: {tables / cards / lists / charts / maps}
71
+ - Interactive: {drag-drop / editor / canvas / video player}
72
+ - Social: {comments / chat / feed / profiles}
73
+
74
+ **Derived capabilities from visual:**
75
+ | Visual Element | → Technical Requirement |
76
+ |---------------|----------------------|
77
+ | Login screen | Auth system |
78
+ | Dashboard with charts | Data aggregation + visualization library |
79
+ | User avatar / profile | Image upload + user profiles |
80
+ | Chat sidebar | Real-time messaging (WebSockets) |
81
+ | Payment/pricing page | Stripe integration |
82
+ | Admin panel | Role-based access control |
83
+ | Search bar | Search index (full-text or Algolia) |
84
+ | Map view | Geolocation API + maps SDK |
85
+ | Video player | Video hosting / streaming |
86
+ | File list / upload area | Object storage (S3/R2/Supabase) |
87
+ | Mobile layout visible | Responsive design required |
88
+ ```
89
+
90
+ **After visual extraction:**
91
+ - Skip any interview questions already answered by the visual
92
+ - Use remaining questions to fill gaps the visual didn't reveal
93
+ - Reference the visual analysis in the Capability Summary output
94
+
95
+ > **Pro tip:** Even a rough sketch reveals navigation structure, screen count, and data relationships that take 5+ questions to extract verbally. Always ask for visuals first.
96
+
97
+ ---
98
+
99
+ ## Phase 1: The Walkthrough (Mission Objective)
100
+
101
+ *Goal: Get the user to narrate what their finished product looks like in use.*
102
+
103
+ ### Question 1: The Elevator Pitch
104
+ > **"In one sentence, what does your app do for the person using it?"**
105
+
106
+ *What this reveals:* Core value proposition, primary user type, product category.
107
+
108
+ | Answer Pattern | Hidden Requirement |
109
+ |---------------|-------------------|
110
+ | "Helps teachers manage courses" | LMS, role-based auth (teacher/student), content management |
111
+ | "Lets people sell handmade goods" | E-commerce, payments, seller/buyer roles, product catalog |
112
+ | "Tracks my workouts" | Personal data, mobile-friendly, charts/visualization |
113
+ | "Connects freelancers with clients" | Marketplace, messaging, payments, reviews |
114
+
115
+ ### Question 2: The First 60 Seconds
116
+ > **"A new user just signed up. Walk me through their first 60 seconds — what do they see, what do they click?"**
117
+
118
+ *What this reveals:* Onboarding flow, auth requirements, initial data structure, primary navigation.
119
+
120
+ | Answer Pattern | Hidden Requirement |
121
+ |---------------|-------------------|
122
+ | "They fill out a profile with their photo" | User profiles, image upload, storage |
123
+ | "They see a feed of posts from people they follow" | Social graph, feed algorithm, follow system |
124
+ | "They get a dashboard showing their stats" | Data aggregation, charts, dashboard UI |
125
+ | "They pick a plan and enter payment" | Subscription billing, payment gateway, plan tiers |
126
+
127
+ ### Question 3: The Money Screen
128
+ > **"What's the ONE screen where your app delivers the most value? Describe it like you're showing it to a friend."**
129
+
130
+ *What this reveals:* Core feature complexity, data relationships, UI sophistication needed.
131
+
132
+ | Answer Pattern | Hidden Requirement |
133
+ |---------------|-------------------|
134
+ | "A drag-and-drop board like Trello" | Complex UI interactions, state management, real-time sync |
135
+ | "A clean editor where you write and format text" | Rich text editor (Tiptap/ProseMirror), content blocks |
136
+ | "A map showing nearby services" | Geolocation, maps API, location-based queries |
137
+ | "A video player with comments on the side" | Video hosting/streaming, threaded comments, timestamps |
138
+
139
+ ---
140
+
141
+ ## Phase 2: The Users (Personnel & Roles)
142
+
143
+ *Goal: Discover user roles, permissions, and access patterns.*
144
+
145
+ ### Question 4: Who's Involved?
146
+ > **"Besides the main user, who else uses this? Does anyone have special powers — like an admin, a manager, a teacher?"**
147
+
148
+ *What this reveals:* Role-based access control (RBAC), permission tiers, multi-tenant needs.
149
+
150
+ | Answer Pattern | Hidden Requirement |
151
+ |---------------|-------------------|
152
+ | "Just me" | Single user, no auth needed or simple auth |
153
+ | "Users and admins" | 2-role RBAC, admin dashboard |
154
+ | "Teachers, students, and school admins" | 3+ role RBAC, organization/tenant model |
155
+ | "Anyone can view, only members can post" | Public/private content, auth-gated actions |
156
+
157
+ ### Question 5: Solo or Social?
158
+ > **"Do users interact with each other, or is it a solo experience? Can they see each other's stuff?"**
159
+
160
+ *What this reveals:* Social features, real-time needs, content visibility model.
161
+
162
+ | Answer Pattern | Hidden Requirement |
163
+ |---------------|-------------------|
164
+ | "Totally solo, it's a personal tool" | Simple data model, no sharing infrastructure |
165
+ | "They can share links to their work" | Public URLs, sharing permissions |
166
+ | "They message each other" | Messaging system, notifications, possibly real-time |
167
+ | "They collaborate on the same document" | Real-time collaboration (WebSockets/CRDT), conflict resolution |
168
+
169
+ ---
170
+
171
+ ## Phase 3: The Capabilities (Equipment & Logistics)
172
+
173
+ *Goal: Discover technical requirements the user doesn't know they have.*
174
+
175
+ ### Question 6: The Similar App
176
+ > **"Name 1-2 apps that feel closest to what you're building. What do you like about them? What would you change?"**
177
+
178
+ *What this reveals:* Feature benchmark, UI expectations, implicit technical requirements.
179
+
180
+ | Reference App | Implied Stack Needs |
181
+ |--------------|-------------------|
182
+ | "Like Notion" | Rich text, content blocks, flexible schema, collaborative editing |
183
+ | "Like Shopify" | E-commerce engine, payments, inventory, multi-vendor possible |
184
+ | "Like Duolingo" | Gamification, progress tracking, spaced repetition, mobile-first |
185
+ | "Like Airbnb" | Marketplace, search/filter, maps, booking/calendar, reviews |
186
+ | "Like Slack" | Real-time messaging, channels, file sharing, notifications |
187
+ | "Like Canva" | Canvas editor, templates, asset library, export pipeline |
188
+
189
+ ### Question 7: The Deal-Breakers
190
+ > **"Which of these does your app NEED to do? (Just say yes or no to each)"**
191
+ >
192
+ > - Users log in with email/password or Google?
193
+ > - Accept payments or subscriptions?
194
+ > - Users upload files (images, videos, documents)?
195
+ > - Send emails or notifications?
196
+ > - Work well on phones (not just desktop)?
197
+ > - Show real-time updates (like a live chat or live dashboard)?
198
+
199
+ *What this reveals:* Direct capability mapping. Each "yes" locks in a technical requirement.
200
+
201
+ | "Yes" Answer | Technical Requirement |
202
+ |-------------|---------------------|
203
+ | Login | Auth system (Supabase Auth, NextAuth, better-auth, Passport) |
204
+ | Payments | Stripe integration, webhook handling, subscription model |
205
+ | File uploads | Object storage (S3, Supabase Storage, Cloudflare R2) |
206
+ | Emails | Email service (Resend, SendGrid, AWS SES) |
207
+ | Mobile | Responsive design (Tailwind) or React Native |
208
+ | Real-time | WebSockets (Socket.io, Supabase Realtime) or SSE |
209
+
210
+ ### Question 8: The Scale Question
211
+ > **"In your dream scenario, how many people are using this? Just you? Hundreds? Thousands? Millions?"**
212
+
213
+ *What this reveals:* Infrastructure scaling needs, database choice implications, hosting tier.
214
+
215
+ | Answer | Infrastructure Implication |
216
+ |--------|--------------------------|
217
+ | "Just me / a few people" | SQLite or free-tier Supabase, simple hosting |
218
+ | "Hundreds" | Standard PostgreSQL, single-server hosting |
219
+ | "Thousands" | PostgreSQL + connection pooling, CDN, caching layer |
220
+ | "Millions" | Horizontal scaling, Redis, CDN, queue system, microservices |
221
+
222
+ ---
223
+
224
+ ## Phase 4: The Timeline (Mission Calendar)
225
+
226
+ *Goal: Understand urgency and what "done" means to them.*
227
+
228
+ ### Question 9: The Deadline
229
+ > **"When do you need this working? Is there a hard deadline (like a launch event) or is it flexible?"**
230
+
231
+ *What this reveals:* Scope constraints, MVP vs full-build, phase prioritization.
232
+
233
+ ### Question 10: The MVP Gate
234
+ > **"If you could only ship THREE features and nothing else, which three?"**
235
+
236
+ *What this reveals:* True priorities stripped of nice-to-haves. This becomes Phase 1.
237
+
238
+ ---
239
+
240
+ ## Capability-to-Stack Derivation Table
241
+
242
+ After the interview, map collected capabilities to stack constraints:
243
+
244
+ | Collected Capabilities | Rules Out | Points Toward |
245
+ |-----------------------|-----------|---------------|
246
+ | Rich text editor + collaboration | Static sites, simple CRUD | Next.js + Supabase Realtime, or MERN + Socket.io |
247
+ | Payments + subscriptions | Frontend-only, static | Full-stack with Stripe SDK (Next.js, Express, Rails) |
248
+ | File uploads (images only) | Nothing major | Supabase Storage, S3, Cloudflare R2 |
249
+ | File uploads (video) | Serverless-only (size limits) | Dedicated upload service, container hosting |
250
+ | Real-time (chat/collab) | Pure REST APIs | WebSocket-capable stack (Supabase, Socket.io, Ably) |
251
+ | Mobile-first | Desktop-heavy frameworks | React Native, or responsive Next.js/Remix |
252
+ | 3+ user roles | Simple auth | RBAC system, role middleware, admin dashboard |
253
+ | Multi-tenant (orgs) | Simple data model | PostgreSQL RLS, tenant isolation, org-scoped queries |
254
+ | ML/AI features | Frontend-only | Python backend or API calls to Claude/Gemini |
255
+ | Offline support | Server-dependent stacks | PWA, local-first (SQLite + sync), service workers |
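
The derivation step amounts to a lookup followed by a merge. The sketch below copies three rows from the table; the dictionary keys (`payments`, `realtime`, `offline`) and the function name `derive_constraints` are hypothetical, and a real implementation would carry the whole table.

```python
# Three rows copied from the derivation table; key names are illustrative.
DERIVATION = {
    "payments": {
        "rules_out": ["Frontend-only", "static"],
        "points_toward": "Full-stack with Stripe SDK (Next.js, Express, Rails)",
    },
    "realtime": {
        "rules_out": ["Pure REST APIs"],
        "points_toward": "WebSocket-capable stack (Supabase, Socket.io, Ably)",
    },
    "offline": {
        "rules_out": ["Server-dependent stacks"],
        "points_toward": "PWA, local-first (SQLite + sync), service workers",
    },
}


def derive_constraints(capabilities: list[str]) -> dict:
    """Merge the table rows for every collected capability, preserving order."""
    ruled_out: list[str] = []
    pointers: list[str] = []
    for capability in capabilities:
        row = DERIVATION.get(capability)
        if row is None:
            continue  # Capability not covered by this partial table.
        for item in row["rules_out"]:
            if item not in ruled_out:
                ruled_out.append(item)
        pointers.append(row["points_toward"])
    return {"rules_out": ruled_out, "points_toward": pointers}


result = derive_constraints(["payments", "realtime"])
print(result["rules_out"])
```

The merged `rules_out` list is what narrows the candidate stacks before any recommendation is made.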
256
+
257
+ ---
258
+
259
+ ## Interview Anti-Patterns
260
+
261
+ **Don't do these:**
262
+
263
+ | Anti-Pattern | Why It Fails | Do Instead |
264
+ |-------------|-------------|------------|
265
+ | "What's your tech stack?" | Beginners freeze or guess wrong | Ask about the product, derive the stack |
266
+ | "Do you need a relational database?" | Jargon — they don't know | "Does your data have relationships? Like courses have lessons, lessons have students?" |
267
+ | "SSR or CSR?" | Meaningless to beginners | "Should Google be able to find your pages? Like a blog?" (→ SSR) |
268
+ | "REST or GraphQL?" | Implementation detail | Never ask. Derive from data complexity |
269
+ | "Monolith or microservices?" | Architecture astronautics | Never ask. Default monolith, split later if needed |
270
+ | Asking all 10 questions mechanically | Feels like an interrogation | Adapt — skip questions already answered in earlier responses |
271
+
272
+ ---
273
+
274
+ ## Output: Capability Summary
275
+
276
+ After the interview, produce a structured summary that feeds into branch generation:
277
+
278
+ ```markdown
279
+ ## Capability Summary (from Backward Planning Interview)
280
+
281
+ **Product:** {elevator pitch}
282
+ **Similar to:** {reference apps}
283
+ **Primary user:** {who}
284
+ **User roles:** {list}
285
+ **Visual provided:** {yes/no — if yes, include extracted screen count and key findings}
286
+
287
+ ### Required Capabilities
288
+ - [ ] Auth: {type}
289
+ - [ ] Payments: {yes/no, type}
290
+ - [ ] File uploads: {type, size}
291
+ - [ ] Real-time: {yes/no, what kind}
292
+ - [ ] Email/notifications: {yes/no}
293
+ - [ ] Mobile: {responsive or native}
294
+ - [ ] Scale target: {users}
295
+
296
+ ### Derived Constraints
297
+ | Capability | Constraint | Eliminates |
298
+ |-----------|-----------|------------|
299
+ | {cap} | {what this means technically} | {stacks ruled out} |
300
+
301
+ ### MVP Features (Phase 1)
302
+ 1. {feature}
303
+ 2. {feature}
304
+ 3. {feature}
305
+
306
+ → Feed this into fire-vision-architect Step B2 (Backward Mode)
307
+ ```
package/skills-library/_general/methodology/CIRCUIT_BREAKER_INTELLIGENCE.md
@@ -0,0 +1,163 @@
1
+ ---
2
+ name: CIRCUIT_BREAKER_INTELLIGENCE
3
+ category: methodology
4
+ description: Intelligent stuck-state detection with type classification, threshold tuning, dead-end engineering from Google X and NASA, and pre-defined kill conditions
5
+ version: 1.0.0
6
+ tags: [circuit-breaker, stuck-detection, dead-ends, google-x, kill-conditions, recovery]
7
+ sources:
8
+ - "Microsoft Azure Architecture Center — Circuit Breaker Pattern"
9
+ - "Martin Fowler — Circuit Breaker (bliki)"
10
+ - "Google X — Moonshot Factory Operating Manual (Astro Teller)"
11
+ - "NASA — Knowledge Transfer and Tacit Knowledge Loss"
12
+ - "PMC — Drug Repurposing / ReFRAME compound library"
13
+ ---
14
+
15
+ # Circuit Breaker Intelligence
16
+
17
+ > **Core insight:** Not all "stuck" states are the same. A syntax error, a fixation loop, and a fundamentally impossible approach each require different interventions. Classify before intervening.
18
+
19
+ ---
20
+
21
+ ## 1. Stuck-State Classification
22
+
23
+ Before triggering any recovery, classify the stuck type:
24
+
25
+ | Stuck Type | Symptom | Correct Intervention |
26
+ |------------|---------|---------------------|
27
+ | **Transient error** | Build/API failure, external dependency timeout | Wait + retry (standard circuit breaker) |
28
+ | **Fixation** | Same approach with varied syntax, 3+ attempts | Context rotation — fresh agent with dead-end map only |
29
+ | **Context overflow** | Endless file navigation, losing track of changes | Condensation + fresh context window |
30
+ | **Semantic misunderstanding** | Solution passes unit tests, fails integration | Human clarification — agent misunderstands the goal |
31
+ | **Dead end** | All viable approaches exhausted, research returned nothing | Shelve with wake conditions, escalate or pivot |
32
+ | **Scope violation** | Agent drifting outside declared file/tool boundaries | Re-read scope manifest, constrain tools |
33
+
34
+ **Agent action:** When hitting a wall, classify FIRST. Then apply the matching intervention. Don't use "retry harder" for fixation problems or "fresh eyes" for transient errors.
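The classify-first rule can be read as a plain lookup from stuck type to intervention. The following is a minimal Python sketch of that reading, not part of the package: the `StuckType` enum, the `INTERVENTIONS` table, and `intervention_for` are hypothetical names, and the intervention strings are shortened from the table above.

```python
from enum import Enum


class StuckType(Enum):
    """Hypothetical labels for the six stuck types in the table above."""
    TRANSIENT_ERROR = "transient error"
    FIXATION = "fixation"
    CONTEXT_OVERFLOW = "context overflow"
    SEMANTIC_MISUNDERSTANDING = "semantic misunderstanding"
    DEAD_END = "dead end"
    SCOPE_VIOLATION = "scope violation"


# One intervention per stuck type, paraphrased from the classification table.
INTERVENTIONS = {
    StuckType.TRANSIENT_ERROR: "wait and retry",
    StuckType.FIXATION: "rotate context: fresh agent with dead-end map only",
    StuckType.CONTEXT_OVERFLOW: "condense and start a fresh context window",
    StuckType.SEMANTIC_MISUNDERSTANDING: "ask a human to clarify the goal",
    StuckType.DEAD_END: "shelve with wake conditions, escalate or pivot",
    StuckType.SCOPE_VIOLATION: "re-read scope manifest and constrain tools",
}


def intervention_for(stuck_type: StuckType) -> str:
    """Return the matching intervention; classification must happen first."""
    return INTERVENTIONS[stuck_type]
```

Keeping the mapping in one table makes it harder to apply "retry harder" to a fixation problem by accident, because the caller has to name the stuck type before any intervention is returned.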
35
+
36
+ ---
37
+
38
+ ## 2. Three-State Circuit Breaker
39
+
40
+ ```
41
+ CLOSED (normal):
42
+ Task executes. Error counter tracks failures.
43
+
44
+ IF same error pattern seen {threshold} times:
45
+ → Trip to OPEN
46
+
47
+ OPEN (tripped):
48
+ Stop executing this approach immediately.
49
+ Route to: research → re-plan → or shelve
50
+
51
+ Timeout: After research completes or new session starts
52
+ → Move to HALF-OPEN
53
+
54
+ HALF-OPEN (probing):
55
+ Try the researched alternative with limited scope.
56
+
57
+ IF success: → Reset to CLOSED
58
+ IF failure: → Back to OPEN (shelve as dead end)
59
+ ```
60
+
61
+ ### Threshold Tuning
62
+ - **Transient errors:** threshold = 3 (retries are cheap)
63
+ - **Logic errors:** threshold = 2 (retries are expensive)
64
+ - **Architectural errors:** threshold = 1 (retry is pointless)
65
+
66
+ **Anti-pattern:** Single shared breaker for all failure types. Maintain per-strategy breakers — one broken approach shouldn't mask another healthy one.
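The three states and the per-type thresholds can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the package's implementation: `StrategyBreaker`, `THRESHOLDS`, and the method names are hypothetical, the caller is assumed to report only failures that match the same error pattern, and resetting the counter on success is an assumption the skill does not spell out.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal execution
    OPEN = "open"            # tripped: stop this approach
    HALF_OPEN = "half-open"  # probing a researched alternative


# Thresholds from the tuning list: cheaper retries tolerate more failures.
THRESHOLDS = {"transient": 3, "logic": 2, "architectural": 1}


class StrategyBreaker:
    """One breaker per strategy, so one broken approach cannot mask another."""

    def __init__(self, error_class: str) -> None:
        self.threshold = THRESHOLDS[error_class]
        self.state = State.CLOSED
        self.failures = 0

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            # Probe failed: back to OPEN, shelve as a dead end.
            self.state = State.OPEN
        elif self.state is State.CLOSED:
            self.failures += 1
            if self.failures >= self.threshold:
                self.state = State.OPEN

    def record_success(self) -> None:
        if self.state in (State.CLOSED, State.HALF_OPEN):
            # Assumption: any success clears the failure counter.
            self.state = State.CLOSED
            self.failures = 0

    def research_completed(self) -> None:
        # The "timeout" in this skill is research finishing or a new session.
        if self.state is State.OPEN:
            self.state = State.HALF_OPEN
```

A caller following the per-strategy advice would keep a dictionary of breakers keyed by strategy name rather than a single shared instance.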
67
+
68
+ ---
69
+
70
+ ## 3. Pre-Defined Kill Conditions (Google X Pattern)
71
+
72
+ > "Run at the hardest problem first." — Astro Teller, Google X
73
+
74
+ Before any task executes, define what would prove the approach unviable:
75
+
76
+ ```yaml
77
+ kill_conditions:
78
+ - "3 consecutive verification failures on same root cause"
79
+ - "approach requires changing >5 files outside declared scope"
80
+ - "same error repeats after 2 different fix strategies"
81
+ - "external dependency does not support required feature"
82
+
83
+ wake_conditions:
84
+ - "if {blocking dependency} releases version with {feature}"
85
+ - "if {alternative library} becomes available"
86
+ - "if user provides {missing credential/config}"
87
+ ```
88
+
89
+ **Why define upfront:** Kill conditions defined AFTER failure are rationalizations. Kill conditions defined BEFORE execution are engineering discipline. Google X kills ~97% of projects at rapid evaluation — before significant resources are allocated.
90
+
91
+ **Agent action:** fire-planner should include 2-3 kill conditions per high-risk task in BLUEPRINT frontmatter. fire-executor checks these before retrying.
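One way an executor could check kill conditions before retrying is to track a few counters and compare them against the limits written in the YAML above. The sketch below is hypothetical: `TaskState`, its field names, and `triggered_kill_conditions` do not come from the package, and how the counters get updated is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical counters an executor might track for one task."""
    consecutive_failures_same_cause: int = 0
    out_of_scope_files_changed: int = 0
    failed_strategies_same_error: int = 0
    dependency_supports_feature: bool = True


def triggered_kill_conditions(state: TaskState) -> list[str]:
    """Return the kill conditions from the example YAML that currently hold."""
    hits = []
    if state.consecutive_failures_same_cause >= 3:
        hits.append("3 consecutive verification failures on same root cause")
    if state.out_of_scope_files_changed > 5:
        hits.append("approach requires changing >5 files outside declared scope")
    if state.failed_strategies_same_error >= 2:
        hits.append("same error repeats after 2 different fix strategies")
    if not state.dependency_supports_feature:
        hits.append("external dependency does not support required feature")
    return hits
```

An empty result means the retry may proceed; any non-empty result is a signal to stop and shelve the approach with its wake conditions.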
92
+
93
+ ---
94
+
95
+ ## 4. Dead-End Engineering
96
+
97
+ Dead ends are **first-class knowledge artifacts**, not failures to delete.
98
+
99
+ ### What Makes a Good Dead-End Record
100
+
101
+ From NASA's knowledge loss lessons: if you only record WHAT failed but not WHY, the next agent will attempt the same approach. The "why" is the asset.
102
+
103
+ ```markdown
104
+ ### [DEAD-END] {title}
105
+
106
+ **What:** {what was attempted}
107
+ **Why it failed:** {root cause, not just symptom}
108
+ **Approaches tried:** {list with expected vs actual for each}
109
+ **Fundamental constraint:** {the thing that makes this approach unviable}
110
+ **Wake conditions:** {what would make this worth revisiting}
111
+ **Status:** SHELVED | ABANDONED | SUPERSEDED BY {task-id}
112
+ ```
113
+
114
+ ### The ReFRAME Principle (Drug Repurposing)
115
+ The ReFRAME library collects roughly 12,000 compounds that were approved or reached clinical development for one indication, so they can be rescreened against others. The same logic applies to dead ends: an approach shelved in Phase 3 may become the solution in Phase 7 when constraints change.
116
+
117
+ **Agent action:** Before starting a new task, grep FAILURES.md for `[DEAD-END]` entries with related tags. A prior dead end may now be viable if the context has changed.
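The lookup described above can be sketched as a scan over the text of FAILURES.md. This Python sketch makes several assumptions: the record template shown earlier has no tags field, so the sketch matches a keyword anywhere in the entry instead; an entry is assumed to run from its `### [DEAD-END]` heading to the next such heading; and `find_dead_ends` is a hypothetical helper name.

```python
import re

# Matches the heading line of a dead-end record, capturing its title.
DEAD_END_HEADING = re.compile(r"^### \[DEAD-END\] (.+)$", re.MULTILINE)


def find_dead_ends(failures_md: str, keyword: str) -> list[str]:
    """Return titles of dead-end entries whose text mentions the keyword."""
    matches = list(DEAD_END_HEADING.finditer(failures_md))
    titles = []
    for i, match in enumerate(matches):
        # An entry ends where the next dead-end heading starts.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(failures_md)
        entry = failures_md[match.start():end]
        if keyword.lower() in entry.lower():
            titles.append(match.group(1).strip())
    return titles
```

Reading the file is left to the caller, for example `find_dead_ends(open("FAILURES.md").read(), "websocket")`, where the path is whatever location the project uses.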
118
+
119
+ ---
120
+
121
+ ## 5. Articulation Before Escalation (Rubber Duck Protocol)
122
+
123
+ Before routing to a fresh agent, human, or research:
124
+
125
+ ```markdown
126
+ ## STUCK REPORT
127
+
128
+ **Goal:** {what I was trying to accomplish}
129
+ **Approaches tried:**
130
+ 1. {approach} → Expected: {X} → Got: {Y}
131
+ 2. {approach} → Expected: {X} → Got: {Y}
132
+ **Current constraint:** {what is physically preventing progress}
133
+ **What a fresh approach needs:** {information or different framing}
134
+ **Confidence this approach is viable:** {high/medium/low + reason}
135
+ ```
136
+
137
+ **Why this works:** The act of articulating the problem forces assumption reconstruction. In cognitive science research, this catches 30-40% of stuck cases before escalation — the stuck agent solves it by explaining it.
138
+
139
+ ---
140
+
141
+ ## 6. Error Discrimination
142
+
143
+ Not all errors carry equal signal:
144
+
145
+ | Error Type | Signal Strength | Action |
146
+ |------------|----------------|--------|
147
+ | Syntax/typo | Low (weight: 0.25) | Auto-fix, minimal signal toward threshold |
148
+ | Import/dependency missing | Medium (weight: 0.5) | Install/resolve, moderate signal |
149
+ | Logic error (wrong output) | High (weight: 1.0) | Count fully, consider re-plan after 2 |
150
+ | Architecture mismatch | Very high (weight: 2.0) | Count as 2, consider kill condition |
151
+ | Cross-phase contract break | Critical (weight: 3.0) | Stop immediately, investigate integration failure |
152
+
153
+ **Agent action:** Weight errors by type when evaluating circuit breaker thresholds. Three typos ≠ three architectural failures.
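The weighting can be illustrated with a short calculation. The sketch below is in Python and uses the weights from the table; the error-type keys, the function names, and the default threshold of 3.0 are assumptions made for illustration, since the skill does not fix a single numeric threshold for weighted scores.

```python
# Weights copied from the Error Discrimination table; keys are hypothetical.
ERROR_WEIGHTS = {
    "syntax": 0.25,
    "dependency": 0.5,
    "logic": 1.0,
    "architecture": 2.0,
    "contract_break": 3.0,
}


def weighted_error_score(errors: list[str]) -> float:
    """Sum the weights of the observed errors."""
    return sum(ERROR_WEIGHTS[e] for e in errors)


def should_trip(errors: list[str], threshold: float = 3.0) -> bool:
    """Decide whether the breaker should open (threshold is an assumption)."""
    if "contract_break" in errors:
        # The table says to stop immediately on a cross-phase contract break.
        return True
    return weighted_error_score(errors) >= threshold
```

With these numbers, three typos score 0.75 and stay well under the assumed threshold, while three architecture mismatches score 6.0 and trip the breaker.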
154
+
155
+ ---
156
+
157
+ ## When Agents Should Reference This Skill
158
+
159
+ - **fire-executor:** Classify stuck states, check kill conditions, write stuck reports
160
+ - **fire-planner:** Define kill conditions in BLUEPRINT frontmatter for high-risk tasks
161
+ - **fire-verifier:** Flag recurring failure patterns as potential dead ends
162
+ - **fire-researcher:** Read dead-end records before researching — avoid repeating prior approaches
163
+ - **fire-autonomous:** Use weighted error budgets (Section 6) and kill conditions to decide whether to retry, shelve, or escalate