@thierrynakoa/fire-flow 10.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (215)
  1. package/.claude-plugin/plugin.json +64 -0
  2. package/ARCHITECTURE-DIAGRAM.md +440 -0
  3. package/COMMAND-REFERENCE.md +172 -0
  4. package/DOMINION-FLOW-OVERVIEW.md +421 -0
  5. package/LICENSE +21 -0
  6. package/QUICK-START.md +351 -0
  7. package/README.md +398 -0
  8. package/TROUBLESHOOTING.md +264 -0
  9. package/agents/fire-codebase-mapper.md +484 -0
  10. package/agents/fire-debugger.md +535 -0
  11. package/agents/fire-executor.md +949 -0
  12. package/agents/fire-fact-checker.md +276 -0
  13. package/agents/fire-learncoding-explainer.md +237 -0
  14. package/agents/fire-learncoding-walker.md +147 -0
  15. package/agents/fire-planner.md +675 -0
  16. package/agents/fire-project-researcher.md +155 -0
  17. package/agents/fire-research-synthesizer.md +166 -0
  18. package/agents/fire-researcher.md +723 -0
  19. package/agents/fire-reviewer.md +499 -0
  20. package/agents/fire-roadmapper.md +203 -0
  21. package/agents/fire-verifier.md +880 -0
  22. package/bin/cli.js +208 -0
  23. package/commands/fire-0-orient.md +476 -0
  24. package/commands/fire-1-new.md +281 -0
  25. package/commands/fire-1a-discuss.md +455 -0
  26. package/commands/fire-2-plan.md +527 -0
  27. package/commands/fire-3-execute.md +1303 -0
  28. package/commands/fire-4-verify.md +845 -0
  29. package/commands/fire-5-handoff.md +515 -0
  30. package/commands/fire-6-resume.md +501 -0
  31. package/commands/fire-7-review.md +409 -0
  32. package/commands/fire-add-new-skill.md +598 -0
  33. package/commands/fire-analytics.md +499 -0
  34. package/commands/fire-assumptions.md +78 -0
  35. package/commands/fire-autonomous.md +528 -0
  36. package/commands/fire-brainstorm.md +413 -0
  37. package/commands/fire-complete-milestone.md +270 -0
  38. package/commands/fire-dashboard.md +375 -0
  39. package/commands/fire-debug.md +663 -0
  40. package/commands/fire-discover.md +616 -0
  41. package/commands/fire-double-check.md +460 -0
  42. package/commands/fire-execute-plan.md +182 -0
  43. package/commands/fire-learncoding.md +242 -0
  44. package/commands/fire-loop-resume.md +272 -0
  45. package/commands/fire-loop-stop.md +198 -0
  46. package/commands/fire-loop.md +1168 -0
  47. package/commands/fire-map-codebase.md +313 -0
  48. package/commands/fire-new-milestone.md +356 -0
  49. package/commands/fire-reflect.md +235 -0
  50. package/commands/fire-research.md +246 -0
  51. package/commands/fire-search.md +330 -0
  52. package/commands/fire-security-audit-repo.md +293 -0
  53. package/commands/fire-security-scan.md +484 -0
  54. package/commands/fire-session-summary.md +252 -0
  55. package/commands/fire-skills-diff.md +506 -0
  56. package/commands/fire-skills-history.md +388 -0
  57. package/commands/fire-skills-rollback.md +408 -0
  58. package/commands/fire-skills-sync.md +470 -0
  59. package/commands/fire-test.md +520 -0
  60. package/commands/fire-todos.md +335 -0
  61. package/commands/fire-transition.md +186 -0
  62. package/commands/fire-update.md +312 -0
  63. package/commands/fire-verify-uat.md +146 -0
  64. package/commands/fire-vuln-scan.md +493 -0
  65. package/hooks/hooks.json +16 -0
  66. package/hooks/run-hook.cmd +69 -0
  67. package/hooks/run-hook.sh +8 -0
  68. package/hooks/run-session-end.cmd +49 -0
  69. package/hooks/run-session-end.sh +7 -0
  70. package/hooks/session-end.sh +90 -0
  71. package/hooks/session-start.sh +111 -0
  72. package/package.json +52 -0
  73. package/plugin.json +7 -0
  74. package/references/auto-skill-extraction.md +136 -0
  75. package/references/behavioral-directives.md +365 -0
  76. package/references/blocker-tracking.md +155 -0
  77. package/references/checkpoints.md +165 -0
  78. package/references/circuit-breaker.md +410 -0
  79. package/references/context-engineering.md +587 -0
  80. package/references/decision-time-guidance.md +289 -0
  81. package/references/error-classification.md +326 -0
  82. package/references/execution-mode-intelligence.md +242 -0
  83. package/references/git-integration.md +217 -0
  84. package/references/honesty-protocols.md +304 -0
  85. package/references/integration-architecture.md +470 -0
  86. package/references/issue-to-pr-pipeline.md +150 -0
  87. package/references/metrics-and-trends.md +234 -0
  88. package/references/playwright-e2e-testing.md +326 -0
  89. package/references/questioning.md +125 -0
  90. package/references/research-improvements.md +110 -0
  91. package/references/skills-usage-guide.md +429 -0
  92. package/references/tdd.md +131 -0
  93. package/references/testing-enforcement.md +192 -0
  94. package/references/ui-brand.md +383 -0
  95. package/references/validation-checklist.md +456 -0
  96. package/references/verification-patterns.md +187 -0
  97. package/references/warrior-principles.md +173 -0
  98. package/skills-library/SKILLS-INDEX.md +588 -0
  99. package/skills-library/_general/frontend/html-visual-reports.md +292 -0
  100. package/skills-library/_general/methodology/debug-swarm-researcher-escape-hatch.md +240 -0
  101. package/skills-library/_general/methodology/learncoding-agentic-pattern.md +114 -0
  102. package/skills-library/_general/methodology/shell-autonomous-loop-fixplan.md +238 -0
  103. package/skills-library/basics/api-rest-basics.md +162 -0
  104. package/skills-library/basics/env-variables.md +96 -0
  105. package/skills-library/basics/error-handling-basics.md +125 -0
  106. package/skills-library/basics/git-commit-conventions.md +106 -0
  107. package/skills-library/basics/readme-template.md +108 -0
  108. package/skills-library/common-tasks/async-await-patterns.md +157 -0
  109. package/skills-library/common-tasks/auth-jwt-basics.md +164 -0
  110. package/skills-library/common-tasks/database-schema-design.md +166 -0
  111. package/skills-library/common-tasks/file-upload-basics.md +166 -0
  112. package/skills-library/common-tasks/form-validation.md +159 -0
  113. package/skills-library/debugging/FAILURE_TAXONOMY_CLASSIFICATION.md +117 -0
  114. package/skills-library/debugging/THREE_AGENT_HYPOTHESIS_DEBUGGING.md +86 -0
  115. package/skills-library/methodology/BREATH_BASED_PARALLEL_EXECUTION.md +678 -0
  116. package/skills-library/methodology/CONFIDENCE_GATED_EXECUTION.md +243 -0
  117. package/skills-library/methodology/EVIDENCE_BASED_VALIDATION.md +308 -0
  118. package/skills-library/methodology/MULTI_PERSPECTIVE_CODE_REVIEW.md +330 -0
  119. package/skills-library/methodology/PATH_VERIFICATION_GATE.md +211 -0
  120. package/skills-library/methodology/REFLEXION_MEMORY_PATTERN.md +183 -0
  121. package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md +263 -0
  122. package/skills-library/methodology/SABBATH_REST_PATTERN.md +267 -0
  123. package/skills-library/methodology/STONE_AND_SCAFFOLD.md +220 -0
  124. package/skills-library/performance/cache-augmented-generation.md +172 -0
  125. package/skills-library/quality-safety/debugging-steps.md +147 -0
  126. package/skills-library/quality-safety/deployment-checklist.md +155 -0
  127. package/skills-library/quality-safety/security-checklist.md +204 -0
  128. package/skills-library/quality-safety/testing-basics.md +180 -0
  129. package/skills-library/security/agent-security-scanner.md +445 -0
  130. package/skills-library/specialists/api-architecture/api-designer.md +49 -0
  131. package/skills-library/specialists/api-architecture/graphql-architect.md +49 -0
  132. package/skills-library/specialists/api-architecture/mcp-developer.md +51 -0
  133. package/skills-library/specialists/api-architecture/microservices-architect.md +50 -0
  134. package/skills-library/specialists/api-architecture/websocket-engineer.md +48 -0
  135. package/skills-library/specialists/backend/django-expert.md +52 -0
  136. package/skills-library/specialists/backend/fastapi-expert.md +52 -0
  137. package/skills-library/specialists/backend/laravel-specialist.md +52 -0
  138. package/skills-library/specialists/backend/nestjs-expert.md +51 -0
  139. package/skills-library/specialists/backend/rails-expert.md +53 -0
  140. package/skills-library/specialists/backend/spring-boot-engineer.md +56 -0
  141. package/skills-library/specialists/data-ml/fine-tuning-expert.md +48 -0
  142. package/skills-library/specialists/data-ml/ml-pipeline.md +47 -0
  143. package/skills-library/specialists/data-ml/pandas-pro.md +47 -0
  144. package/skills-library/specialists/data-ml/rag-architect.md +51 -0
  145. package/skills-library/specialists/data-ml/spark-engineer.md +47 -0
  146. package/skills-library/specialists/frontend/angular-architect.md +52 -0
  147. package/skills-library/specialists/frontend/flutter-expert.md +51 -0
  148. package/skills-library/specialists/frontend/nextjs-developer.md +54 -0
  149. package/skills-library/specialists/frontend/react-native-expert.md +50 -0
  150. package/skills-library/specialists/frontend/vue-expert.md +51 -0
  151. package/skills-library/specialists/infrastructure/chaos-engineer.md +74 -0
  152. package/skills-library/specialists/infrastructure/cloud-architect.md +70 -0
  153. package/skills-library/specialists/infrastructure/database-optimizer.md +64 -0
  154. package/skills-library/specialists/infrastructure/devops-engineer.md +70 -0
  155. package/skills-library/specialists/infrastructure/kubernetes-specialist.md +52 -0
  156. package/skills-library/specialists/infrastructure/monitoring-expert.md +70 -0
  157. package/skills-library/specialists/infrastructure/sre-engineer.md +70 -0
  158. package/skills-library/specialists/infrastructure/terraform-engineer.md +51 -0
  159. package/skills-library/specialists/languages/cpp-pro.md +74 -0
  160. package/skills-library/specialists/languages/csharp-developer.md +69 -0
  161. package/skills-library/specialists/languages/dotnet-core-expert.md +54 -0
  162. package/skills-library/specialists/languages/golang-pro.md +51 -0
  163. package/skills-library/specialists/languages/java-architect.md +49 -0
  164. package/skills-library/specialists/languages/javascript-pro.md +68 -0
  165. package/skills-library/specialists/languages/kotlin-specialist.md +68 -0
  166. package/skills-library/specialists/languages/php-pro.md +49 -0
  167. package/skills-library/specialists/languages/python-pro.md +52 -0
  168. package/skills-library/specialists/languages/react-expert.md +51 -0
  169. package/skills-library/specialists/languages/rust-engineer.md +50 -0
  170. package/skills-library/specialists/languages/sql-pro.md +56 -0
  171. package/skills-library/specialists/languages/swift-expert.md +69 -0
  172. package/skills-library/specialists/languages/typescript-pro.md +51 -0
  173. package/skills-library/specialists/platform/atlassian-mcp.md +52 -0
  174. package/skills-library/specialists/platform/embedded-systems.md +53 -0
  175. package/skills-library/specialists/platform/game-developer.md +53 -0
  176. package/skills-library/specialists/platform/salesforce-developer.md +53 -0
  177. package/skills-library/specialists/platform/shopify-expert.md +49 -0
  178. package/skills-library/specialists/platform/wordpress-pro.md +49 -0
  179. package/skills-library/specialists/quality/code-documenter.md +51 -0
  180. package/skills-library/specialists/quality/code-reviewer.md +67 -0
  181. package/skills-library/specialists/quality/debugging-wizard.md +51 -0
  182. package/skills-library/specialists/quality/fullstack-guardian.md +51 -0
  183. package/skills-library/specialists/quality/legacy-modernizer.md +50 -0
  184. package/skills-library/specialists/quality/playwright-expert.md +65 -0
  185. package/skills-library/specialists/quality/spec-miner.md +56 -0
  186. package/skills-library/specialists/quality/test-master.md +65 -0
  187. package/skills-library/specialists/security/secure-code-guardian.md +55 -0
  188. package/skills-library/specialists/security/security-reviewer.md +53 -0
  189. package/skills-library/specialists/workflow/architecture-designer.md +53 -0
  190. package/skills-library/specialists/workflow/cli-developer.md +70 -0
  191. package/skills-library/specialists/workflow/feature-forge.md +65 -0
  192. package/skills-library/specialists/workflow/prompt-engineer.md +54 -0
  193. package/skills-library/specialists/workflow/the-fool.md +62 -0
  194. package/templates/ASSUMPTIONS.md +125 -0
  195. package/templates/BLOCKERS.md +73 -0
  196. package/templates/DECISION_LOG.md +116 -0
  197. package/templates/UAT.md +96 -0
  198. package/templates/blueprint.md +94 -0
  199. package/templates/brainstorm.md +185 -0
  200. package/templates/conscience.md +92 -0
  201. package/templates/fire-handoff.md +159 -0
  202. package/templates/metrics.md +67 -0
  203. package/templates/phase-prompt.md +142 -0
  204. package/templates/record.md +131 -0
  205. package/templates/review-report.md +117 -0
  206. package/templates/skills-index.md +157 -0
  207. package/templates/verification.md +149 -0
  208. package/templates/vision.md +79 -0
  209. package/validation-config.yml +793 -0
  210. package/version.json +7 -0
  211. package/workflows/execute-phase.md +732 -0
  212. package/workflows/handoff-session.md +678 -0
  213. package/workflows/new-project.md +578 -0
  214. package/workflows/plan-phase.md +592 -0
  215. package/workflows/verify-phase.md +874 -0
package/skills-library/methodology/REFLEXION_MEMORY_PATTERN.md (new file)
@@ -0,0 +1,183 @@

# Reflexion Memory Pattern — Cross-Session Failure Learning

## The Problem

AI agents repeat the same mistakes across sessions because failure context is lost. Debug sessions resolve issues, but the knowledge dies with the session. The next agent encountering the same symptoms starts from scratch.

### Why It Was Hard

- Debug sessions produce rich context (symptoms, hypotheses, evidence, root causes), but it's trapped in `.planning/debug/` files that are project-specific and not searchable across projects
- Failed approaches are the most valuable learning — but agents only record what *worked*, not what *didn't*
- Finding the right granularity: too detailed = noise, too abstract = useless
- Integration requires modifying multiple command flows (debug, loop, execute)

### Impact

- Same bugs debugged repeatedly across sessions (hours wasted)
- Silent failures re-investigated from scratch every time
- No institutional memory of "this library is broken on Python 3.14"
- Debug sessions take 3x longer than necessary when prior knowledge exists

---

## The Solution

### Root Cause

Agent systems store *conclusions* (handoffs, skills) but not *journeys* (what was tried, what failed, why). Reflexion research shows that storing the journey as linguistic self-reflection dramatically improves future performance (e.g., 91% pass@1 on HumanEval).

### The Reflection File Format

```markdown
---
type: reflection
date: 2026-02-20
project: claude-voice-bridge
trigger: debug-resolution | test-failure | approach-rotation | stalled-loop
severity: minor | moderate | critical
tags: [pynput, keyboard, hotkeys, python-3.14]
---
# What I tried and why it failed

## The Problem
Hotkeys stopped responding. No errors — completely silent failure.

## What I Tried (and why each failed)
1. **Checked keyboard library hooks** — hooks installed, listener alive,
   but zero callbacks. Root cause: keyboard 0.13.5 broken on Python 3.14.
2. **Switched to pynput with char matching** — pynput works, but Ctrl+M
   sends '\r' not 'm'. Silent mismatch.

## What Actually Worked
Used `KeyCode.from_vk(ord(name.upper()))` — VK codes are stable
regardless of modifier state.

## The Lesson
When a library installs without errors but produces no output, suspect
Python version incompatibility. Always match keyboard keys by VK code,
never by char when modifiers are involved.

## Future Self: Search For This When
- Hotkeys stop working silently
- Keyboard hooks fire zero events
- Ctrl+letter combinations fail to match
```
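The VK-code lesson in the example above can be sketched in plain Python. This is a hypothetical, self-contained illustration: the simulated event dict stands in for what a keyboard listener (such as pynput's) would deliver, and only `vk_for` mirrors the `KeyCode.from_vk(ord(name.upper()))` call from the reflection.

```python
# Sketch of the lesson: match hotkeys by virtual-key (VK) code, not by
# character, because Ctrl+letter often collapses to a control character.

def vk_for(name: str) -> int:
    """VK code for a single letter, e.g. 'm' -> 77 (stable across modifiers)."""
    return ord(name.upper())

def matches_by_char(event_char: str, hotkey: str) -> bool:
    # The broken approach: Ctrl+M delivers '\r', not 'm'.
    return event_char == hotkey

def matches_by_vk(event_vk: int, hotkey: str) -> bool:
    # The fix: VK codes are stable regardless of Ctrl/Alt state.
    return event_vk == vk_for(hotkey)

# Simulated Ctrl+M key event: the char collapses to carriage return,
# but the VK code stays 77.
event = {"char": "\r", "vk": 77}

assert not matches_by_char(event["char"], "m")  # silent mismatch
assert matches_by_vk(event["vk"], "m")          # VK match succeeds
```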

### Three Integration Points

**1. Pre-Investigation Search (Step 2.5 in debug flow):**
```
Before investigating any issue:
  Search reflections: /fire-remember "{symptoms}" --type reflection

If match found with >0.75 similarity:
  "I've seen this before — {lesson}. Applying directly."
  Offer: [Apply same fix] [Investigate fresh] [Compare differences]
```
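The similarity gate above can be sketched as a small scoring function. This is a stub, not the actual implementation: embeddings are plain vectors here, whereas the real flow would query the Qdrant index described later in this document.

```python
# Hedged sketch of the pre-investigation gate: score stored reflections
# against the current symptoms and only surface a match above 0.75.
import math

THRESHOLD = 0.75

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match(query_vec, reflections):
    """Return (score, reflection) for the closest reflection, or None."""
    scored = [(cosine(query_vec, r["vec"]), r) for r in reflections]
    scored.sort(key=lambda s: s[0], reverse=True)
    if scored and scored[0][0] > THRESHOLD:
        return scored[0]
    return None

reflections = [
    {"lesson": "match hotkeys by VK code", "vec": [0.9, 0.1, 0.4]},
    {"lesson": "pin keyboard to 0.13.4",   "vec": [0.2, 0.9, 0.1]},
]
hit = best_match([0.88, 0.12, 0.42], reflections)
assert hit is not None and hit[1]["lesson"] == "match hotkeys by VK code"
```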

**2. Post-Resolution Capture (Step 7.5 in debug flow):**
```
After root cause found and fix verified:
  Auto-generate reflection from debug file
  Extract: symptoms → failed hypotheses → root cause → fix → lesson

Severity classification:
  critical: 5+ eliminated hypotheses OR 10+ files changed
  moderate: 2-4 eliminated hypotheses OR multi-file fix
  minor:    1 hypothesis OR single-file fix
```
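The severity rules above reduce to a few-line function. A minimal sketch; the parameter names (`eliminated_hypotheses`, `files_changed`) are assumptions about how the debug file would be summarized.

```python
def classify_severity(eliminated_hypotheses: int, files_changed: int) -> str:
    """Apply the severity rubric: critical > moderate > minor."""
    if eliminated_hypotheses >= 5 or files_changed >= 10:
        return "critical"
    if eliminated_hypotheses >= 2 or files_changed > 1:
        return "moderate"
    return "minor"

assert classify_severity(5, 1) == "critical"   # 5+ eliminated hypotheses
assert classify_severity(3, 1) == "moderate"   # 2-4 eliminated hypotheses
assert classify_severity(1, 1) == "minor"      # single hypothesis, single file
```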

**3. Loop Failure Capture (Step 9 in loop):**
```
On STALLED (3+ iterations no progress):
  Save reflection with trigger: "stalled-loop"
  Include: what was attempted, measurements, why no progress

On SPINNING (same error repeated):
  Save reflection with trigger: "approach-rotation"
  Include: each failed approach with error hash
```

### Storage & Search

```
Location:   ~/.claude/reflections/
Indexed in: Qdrant as sourceType: 'reflection'
Search:     /fire-remember "{query}" --type reflection
Command:    /fire-reflect capture|search|list|review
```
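Indexing a reflection starts with reading its front matter. The sketch below is a deliberately naive `---` split rather than a YAML library, purely to illustrate how the metadata from the file format shown earlier could be pulled out for the index; the real consolidation step may parse differently.

```python
def parse_front_matter(text: str) -> dict:
    """Parse reflection front matter (the block between '---' lines)."""
    lines = text.splitlines()
    assert lines[0] == "---", "reflection must start with front matter"
    meta = {}
    for line in lines[1:]:
        if line == "---":
            break  # end of front matter
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as tags: [pynput, keyboard]
            value = [t.strip() for t in value[1:-1].split(",")]
        meta[key.strip()] = value
    return meta

sample = """---
type: reflection
date: 2026-02-20
tags: [pynput, keyboard]
---
# What I tried and why it failed
"""
meta = parse_front_matter(sample)
assert meta["type"] == "reflection"
assert meta["tags"] == ["pynput", "keyboard"]
```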

---

## Testing the Fix

### Verification Steps

1. Create a reflection file manually in `~/.claude/reflections/`
2. Run `npm run consolidate` to index it
3. Search: `npm run search -- "hotkeys silent failure" --type reflection`
4. Confirm the reflection appears in results with the correct sourceType

### Quality Checklist

A good reflection has:
- [ ] Specific symptoms (error messages, observed behaviors)
- [ ] Multiple failed approaches with *reasons* they failed
- [ ] Concrete solution (code, command, config change — not vague advice)
- [ ] One-sentence lesson useful without context
- [ ] Search triggers matching how you'd describe the problem naturally

A bad reflection:
- "Something was wrong with the API" (too vague)
- Only records the solution without the journey
- Lesson is "be more careful" (not actionable)

---

## Prevention

1. Make reflection generation **automatic** after debug resolution — don't rely on manual capture
2. Keep reflections **concise** — the lesson and search triggers are most important
3. Review reflections periodically — merge duplicates, update outdated ones
4. Tag with specific technologies and error patterns for better search

---

## Related Patterns

- [AGENT_SELF_IMPROVEMENT_LOOP](./AGENT_SELF_IMPROVEMENT_LOOP.md) - Full 6-upgrade blueprint
- [CONFIDENCE_GATED_EXECUTION](./CONFIDENCE_GATED_EXECUTION.md) - Reflections feed confidence scoring
- [WARRIOR_WORKFLOW_DEBUGGING_PROTOCOL](./WARRIOR_WORKFLOW_DEBUGGING_PROTOCOL.md) - Debug flow where reflections integrate

---

## Common Mistakes to Avoid

- Capturing reflections for trivial issues (typo fixes, config changes) — noise overwhelms signal
- Writing the "lesson" as a platitude ("always test thoroughly") instead of a specific takeaway
- Not including search triggers — the reflection exists but is unfindable
- Storing reflections per-project instead of globally — defeats cross-session learning
- Skipping the "what I tried" section — the failed approaches are the most valuable part

---

## Resources

- Reflexion (NeurIPS 2023): https://arxiv.org/abs/2303.11366
- "Reflexion: Language Agents with Verbal Reinforcement Learning" — Shinn et al.
- Dominion Flow implementation: `/fire-reflect` command, `fire-debug.md` Steps 2.5 and 7.5

---

## Time to Implement

**2-3 hours** — Create reflection directory, write command, modify debug/loop flows, add to vector index

## Difficulty Level

Difficulty: 2/5 — Conceptually simple. The hard part is building the discipline to actually search reflections before investigating and to capture them after resolution.

---

**Author Notes:**
The most surprising finding from implementing this: the "Future Self: Search For This When" section is the single most valuable field. It's the bridge between how you describe the problem *now* (with full context) and how a future agent will describe it (with zero context, just symptoms). Writing good search triggers is an act of empathy toward your future self.
package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md (new file)
@@ -0,0 +1,263 @@

# Research-Backed Workflow Upgrade Pattern - Methodology & Implementation

## The Problem

AI agent workflows (WARRIOR, Dominion Flow, etc.) evolve through manual intuition — someone notices a gap, proposes a fix, implements it. This works for small changes but misses systemic improvements that academic research and community patterns have already solved.

### Why It Was Hard

- Academic papers (ACL, NeurIPS, ICML) contain breakthrough findings but use jargon that's hard to map to practical workflow changes
- Community patterns (Manus AI, Replit Agent, Bolt.new) are scattered across blog posts, tweets, and GitHub repos — no single source
- Internal gap analysis requires stepping back from the code to see structural blind spots (cross-phase contradictions, context drift, broken handoff chains)
- Synthesizing 50+ findings from different domains into a coherent upgrade plan is overwhelming without structure

### Impact

Without systematic research-backed upgrades:
- Workflows reinvent solutions that papers already proved effective
- Known failure modes (context drift, assumption contradictions) repeat across projects
- Improvements are reactive (fix after failure) instead of proactive (prevent before failure)
- Agent performance plateaus because upgrades are incremental rather than informed by the state of the art

---

## The Solution

### The 4-Agent Parallel Research Sweep

Launch 4 specialized research agents in parallel, each covering a different knowledge domain. They work independently and return findings that you synthesize into a prioritized upgrade plan.

### Step 1: Define Research Scopes

Split the research into 4 non-overlapping domains:

```
Agent 1: Academic Papers (2024-2026)
- Search: AI agent papers, multi-agent systems, code generation, debugging
- Sources: ACL, NeurIPS, ICML proceedings, arXiv
- Goal: Find proven techniques with measurable results (pass@1, accuracy, etc.)

Agent 2: Community Workflow Patterns
- Search: AI coding tool blogs, developer experience posts, open-source agents
- Sources: Manus AI, Replit, Cursor, Bolt.new, Devin, SWE-Agent
- Goal: Find practical patterns already working in production

Agent 3: Testing & Verification Research
- Search: AI testing frameworks, automated verification, quality assurance
- Sources: SWE-Bench, METR studies, CI/CD integration patterns
- Goal: Find ways to verify agent work more reliably

Agent 4: Internal Gap Analysis
- Search: Your own workflow files, past handoffs, known failure modes
- Sources: The actual workflow documentation being upgraded
- Goal: Find structural gaps, contradictions, missing features
```

### Step 2: Launch All 4 Agents Simultaneously

```javascript
// Launch in a SINGLE message (parallel execution):

// Agent 1: Academic research
Task({
  subagent_type: "general-purpose",
  description: "Research AI agent papers 2024-2026",
  prompt: "Search for recent AI papers on: multi-agent code generation, " +
          "debugging with plan context, context window management, " +
          "task recitation, agent evaluation. For each paper found, " +
          "extract: title, key finding, measurable result, and how it " +
          "could improve [YOUR WORKFLOW NAME]. Return top 15 findings."
});

// Agent 2: Community patterns
Task({
  subagent_type: "general-purpose",
  description: "Research community AI workflow patterns",
  prompt: "Search for blog posts and docs from Manus AI, Replit Agent, " +
          "Bolt.new, Cursor, Devin about: context engineering, " +
          "decision-time guidance, agent loops, workflow structure. " +
          "For each pattern, extract: source, pattern name, how it works, " +
          "and how it could improve [YOUR WORKFLOW NAME]. Return top 15."
});

// Agent 3: Testing & verification
Task({
  subagent_type: "general-purpose",
  description: "Research AI testing and verification",
  prompt: "Search for: SWE-Bench results, METR studies, AI agent " +
          "evaluation frameworks, automated code review patterns. " +
          "Focus on: what makes agent verification reliable, common " +
          "failure modes, confidence calibration. Return top 10 findings."
});

// Agent 4: Internal gap analysis
Task({
  subagent_type: "Explore",
  description: "Analyze current workflow gaps",
  prompt: "Read all workflow files in [YOUR WORKFLOW PATH]. Identify: " +
          "structural gaps (missing features), contradictions between " +
          "files, assumptions that aren't tracked, handoff points that " +
          "could break, areas where agents lack guidance. Return top 10 gaps."
});
```

### Step 3: Synthesize Into Priority Tiers

When all 4 agents return, synthesize the findings into 3 tiers:

```markdown
## Tier 1: High Impact, Low Risk (implement now)
- Findings with proven results (papers with measurable improvements)
- Patterns already working in production elsewhere
- Internal gaps that are straightforward to fix
- Changes that don't break existing functionality

## Tier 2: Medium Impact, Medium Risk (implement next version)
- Findings that require architectural changes
- Patterns that need adaptation to your workflow
- Improvements that depend on Tier 1 being complete

## Tier 3: High Impact, High Risk (plan for future)
- Fundamental architectural changes
- Patterns that require new infrastructure
- Research findings that need more validation
```
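The tier rubric above can be expressed as a tiny classifier. This is a sketch under assumptions: the `impact`/`risk` labels on each finding are hypothetical annotations a research agent might attach, not a defined part of the workflow.

```python
def assign_tier(finding: dict) -> int:
    """Map a finding's impact/risk labels onto the 3-tier rubric."""
    impact, risk = finding["impact"], finding["risk"]
    if impact == "high" and risk == "low":
        return 1   # implement now
    if risk == "high":
        return 3   # plan for future
    return 2       # implement next version

findings = [
    {"name": "plan-aware debugging",   "impact": "high",   "risk": "low"},
    {"name": "judge-agent separation", "impact": "high",   "risk": "high"},
    {"name": "task recitation",        "impact": "medium", "risk": "medium"},
]
tiers = {f["name"]: assign_tier(f) for f in findings}
assert tiers["plan-aware debugging"] == 1
assert tiers["judge-agent separation"] == 3
assert tiers["task recitation"] == 2
```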

### Step 4: Implement With Inline Citations

For every change, add a comment citing the research basis:

```markdown
> **Research basis (v3.2):** MapCoder (ACL 2024) achieved 93.9% pass@1
> by feeding the Debugging Agent the original plan alongside buggy code.
> See: references/research-improvements.md (PLAN-DEBUG-1)
```

This creates a traceable chain: **inline comment -> reference doc -> original source**.

### Step 5: Create a Reference Document

Create a `research-improvements.md` that indexes all sources:

```markdown
| ID | Source | Key Finding | Applied In |
|----|--------|-------------|------------|
| PLAN-DEBUG-1 | MapCoder (ACL 2024) | Plan-aware debugging: 93.9% pass@1 | fire-debug.md |
| RECITATION-1 | Manus AI (2025) | Task recitation prevents context drift | fire-loop.md |
| GAP-1 | Internal analysis | No decision log across phases | DECISION_LOG.md |
```

---

## Real-World Results: Dominion Flow v3.2

This pattern was used to upgrade Dominion Flow from v3.1 to v3.2:

**Research Phase:**
- 4 agents ran in parallel (~10 minutes total)
- Returned 50+ findings across all domains
- Synthesized into 10 improvements across 3 tiers

**Tier 1 Implemented (same session):**

| Enhancement | Research Source | Impact |
|-------------|-----------------|--------|
| Task Recitation Pattern | Manus AI (context engineering) | Prevents drift after ~50 tool calls in loops |
| Plan-Aware Debugging | MapCoder ACL 2024 (93.9% pass@1) | Debugger compares intended vs actual behavior |
| Decision Log | Internal gap analysis (GAP-1) | Prevents cross-phase decision contradictions |
| Assumptions Registry | Internal gap analysis (GAP-2) | Phase-gate validation catches stale assumptions |
| Handoff Completeness Validator | Internal gap analysis (GAP-10) | 17-point checklist prevents broken context chains |
| Code Comments Standard | User request + best practices | All agent-written code includes maintenance comments |

**Files changed:** 8 files across Dominion Flow
**Time:** ~2.5 hours from research launch to full implementation
**Traceability:** Every change has inline citation -> reference doc -> original source

---

## Testing the Pattern

### How to Verify It Worked

1. **Citation coverage:** Every modified file should have at least one research citation
2. **Reference doc exists:** `references/research-improvements.md` with full index
3. **Tier separation:** Changes should be clearly separated into implementation tiers
4. **No orphan citations:** Every inline citation tag (e.g., GAP-1) exists in the reference doc
5. **Version bump:** Plugin version reflects the upgrade (e.g., 3.1.0 -> 3.2.0)

### Quality Checks

```bash
# Verify all citations resolve
grep -r "See:.*research-improvements" [workflow-files] | \
  sed 's/.*(\(.*\))/\1/' | sort -u
# Then check each tag exists in research-improvements.md

# Verify no placeholder text remains
grep -r "{.*}" [modified-files] | grep -v "^Binary"
# Should return only intentional template markers
```
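The "no orphan citations" check can also be sketched as a script instead of grep. The tag format (uppercase words plus a number, e.g. `GAP-1`) and the sample strings below are illustrative assumptions, not the project's actual file contents.

```python
import re

def orphan_citations(workflow_texts, reference_doc):
    """Return citation tags used inline but missing from the reference doc."""
    cited = set()
    for text in workflow_texts:
        # Matches "...research-improvements.md (PLAN-DEBUG-1)" style citations
        cited |= set(re.findall(r"research-improvements\.md \(([A-Z-]+\d+)\)", text))
    # Matches tags in the first column of the markdown index table
    indexed = set(re.findall(r"\|\s*([A-Z-]+\d+)\s*\|", reference_doc))
    return cited - indexed

workflow = [
    "> See: references/research-improvements.md (PLAN-DEBUG-1)",
    "> See: references/research-improvements.md (GAP-7)",
]
refdoc = "| PLAN-DEBUG-1 | MapCoder (ACL 2024) | Plan-aware debugging | fire-debug.md |"
assert orphan_citations(workflow, refdoc) == {"GAP-7"}  # GAP-7 is a dead link
```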

---

## Prevention (Avoiding Stale Workflows)

1. **Schedule quarterly research sweeps** — technology moves fast
2. **Track Tier 2/3 items** — don't lose future improvements
3. **Update the reference doc** — keep the citation chain intact
4. **Re-run gap analysis** after major changes — new code creates new gaps
5. **Version your upgrades** — a clear version history enables rollback

---

## Common Mistakes to Avoid

- **Implementing everything at once** — Tier separation exists for a reason. Tier 1 first.
- **Skipping citations** — Without inline comments, nobody knows WHY a change was made. Future agents will undo your work.
- **Research without synthesis** — 50 raw findings are useless. The synthesis step (tier sorting) is where value is created.
- **Ignoring internal gaps** — Agent 4 (gap analysis) often finds the most impactful improvements because they're specific to YOUR workflow.
- **Not creating the reference doc** — Inline citations without a backing document are dead links.
- **Changing too many files without testing** — Even documentation changes can break workflows if agents read those docs at runtime.

---

## Related Patterns

- [Breath-Based Parallel Execution](./BREATH_BASED_PARALLEL_EXECUTION.md) — Breath pattern used for agent parallelism
- [Advanced Orchestration Patterns](./ADVANCED_ORCHESTRATION_PATTERNS.md) — Multi-agent coordination
- [WARRIOR Workflow Debugging Protocol](./WARRIOR_WORKFLOW_DEBUGGING_PROTOCOL.md) — Debugging with plan context

---

## Resources

- MapCoder (ACL 2024): Multi-Agent Code Generation for Competitive Problem Solving
- Manus AI: Context Engineering for AI Agents (2025)
- Mason (2026): Judge Agent Separation pattern
- MIT RLCR (2025): Confidence-Based Escalation
- SWE-Bench Pro (2025): Single agent + retries vs multi-agent swarms
- METR Study (2025): AI Impact on Developer Productivity
- CNCF Four Pillars (2025): Golden Paths, Guardrails, Safety Nets, Manual Review
- Full citation index: `~/.claude/plugins/dominion-flow/references/research-improvements.md`

---

## Time to Implement

**Research phase:** ~10 minutes (4 parallel agents)
**Synthesis:** ~15 minutes (read findings, sort into tiers)
**Tier 1 implementation:** ~2 hours (depends on scope)
**Total:** ~2.5 hours for a major workflow upgrade

## Difficulty Level

Difficulty: 3/5 — The parallel research pattern is straightforward, but the synthesis step requires judgment about what to implement and in what order. The implementation itself is mostly documentation changes (editing agent instructions, templates, commands) rather than code.

---

**Author Notes:**
The biggest insight from this pattern: **Agent 4 (internal gap analysis) consistently finds the highest-impact improvements.** External research gives you proven techniques, but the internal analysis tells you exactly WHERE those techniques plug into YOUR specific gaps. Always include both.

The second insight: **inline citations are non-negotiable.** Without them, the next Claude instance has no idea why a section exists and might remove it during a future upgrade. The citation chain (inline -> reference doc -> source) is what makes improvements durable across sessions.

This pattern was first used on Dominion Flow v3.2 (2026-02-10) and produced 5 Tier 1 enhancements in a single session.