@thierrynakoa/fire-flow 10.0.0
- package/.claude-plugin/plugin.json +64 -0
- package/ARCHITECTURE-DIAGRAM.md +440 -0
- package/COMMAND-REFERENCE.md +172 -0
- package/DOMINION-FLOW-OVERVIEW.md +421 -0
- package/LICENSE +21 -0
- package/QUICK-START.md +351 -0
- package/README.md +398 -0
- package/TROUBLESHOOTING.md +264 -0
- package/agents/fire-codebase-mapper.md +484 -0
- package/agents/fire-debugger.md +535 -0
- package/agents/fire-executor.md +949 -0
- package/agents/fire-fact-checker.md +276 -0
- package/agents/fire-learncoding-explainer.md +237 -0
- package/agents/fire-learncoding-walker.md +147 -0
- package/agents/fire-planner.md +675 -0
- package/agents/fire-project-researcher.md +155 -0
- package/agents/fire-research-synthesizer.md +166 -0
- package/agents/fire-researcher.md +723 -0
- package/agents/fire-reviewer.md +499 -0
- package/agents/fire-roadmapper.md +203 -0
- package/agents/fire-verifier.md +880 -0
- package/bin/cli.js +208 -0
- package/commands/fire-0-orient.md +476 -0
- package/commands/fire-1-new.md +281 -0
- package/commands/fire-1a-discuss.md +455 -0
- package/commands/fire-2-plan.md +527 -0
- package/commands/fire-3-execute.md +1303 -0
- package/commands/fire-4-verify.md +845 -0
- package/commands/fire-5-handoff.md +515 -0
- package/commands/fire-6-resume.md +501 -0
- package/commands/fire-7-review.md +409 -0
- package/commands/fire-add-new-skill.md +598 -0
- package/commands/fire-analytics.md +499 -0
- package/commands/fire-assumptions.md +78 -0
- package/commands/fire-autonomous.md +528 -0
- package/commands/fire-brainstorm.md +413 -0
- package/commands/fire-complete-milestone.md +270 -0
- package/commands/fire-dashboard.md +375 -0
- package/commands/fire-debug.md +663 -0
- package/commands/fire-discover.md +616 -0
- package/commands/fire-double-check.md +460 -0
- package/commands/fire-execute-plan.md +182 -0
- package/commands/fire-learncoding.md +242 -0
- package/commands/fire-loop-resume.md +272 -0
- package/commands/fire-loop-stop.md +198 -0
- package/commands/fire-loop.md +1168 -0
- package/commands/fire-map-codebase.md +313 -0
- package/commands/fire-new-milestone.md +356 -0
- package/commands/fire-reflect.md +235 -0
- package/commands/fire-research.md +246 -0
- package/commands/fire-search.md +330 -0
- package/commands/fire-security-audit-repo.md +293 -0
- package/commands/fire-security-scan.md +484 -0
- package/commands/fire-session-summary.md +252 -0
- package/commands/fire-skills-diff.md +506 -0
- package/commands/fire-skills-history.md +388 -0
- package/commands/fire-skills-rollback.md +408 -0
- package/commands/fire-skills-sync.md +470 -0
- package/commands/fire-test.md +520 -0
- package/commands/fire-todos.md +335 -0
- package/commands/fire-transition.md +186 -0
- package/commands/fire-update.md +312 -0
- package/commands/fire-verify-uat.md +146 -0
- package/commands/fire-vuln-scan.md +493 -0
- package/hooks/hooks.json +16 -0
- package/hooks/run-hook.cmd +69 -0
- package/hooks/run-hook.sh +8 -0
- package/hooks/run-session-end.cmd +49 -0
- package/hooks/run-session-end.sh +7 -0
- package/hooks/session-end.sh +90 -0
- package/hooks/session-start.sh +111 -0
- package/package.json +52 -0
- package/plugin.json +7 -0
- package/references/auto-skill-extraction.md +136 -0
- package/references/behavioral-directives.md +365 -0
- package/references/blocker-tracking.md +155 -0
- package/references/checkpoints.md +165 -0
- package/references/circuit-breaker.md +410 -0
- package/references/context-engineering.md +587 -0
- package/references/decision-time-guidance.md +289 -0
- package/references/error-classification.md +326 -0
- package/references/execution-mode-intelligence.md +242 -0
- package/references/git-integration.md +217 -0
- package/references/honesty-protocols.md +304 -0
- package/references/integration-architecture.md +470 -0
- package/references/issue-to-pr-pipeline.md +150 -0
- package/references/metrics-and-trends.md +234 -0
- package/references/playwright-e2e-testing.md +326 -0
- package/references/questioning.md +125 -0
- package/references/research-improvements.md +110 -0
- package/references/skills-usage-guide.md +429 -0
- package/references/tdd.md +131 -0
- package/references/testing-enforcement.md +192 -0
- package/references/ui-brand.md +383 -0
- package/references/validation-checklist.md +456 -0
- package/references/verification-patterns.md +187 -0
- package/references/warrior-principles.md +173 -0
- package/skills-library/SKILLS-INDEX.md +588 -0
- package/skills-library/_general/frontend/html-visual-reports.md +292 -0
- package/skills-library/_general/methodology/debug-swarm-researcher-escape-hatch.md +240 -0
- package/skills-library/_general/methodology/learncoding-agentic-pattern.md +114 -0
- package/skills-library/_general/methodology/shell-autonomous-loop-fixplan.md +238 -0
- package/skills-library/basics/api-rest-basics.md +162 -0
- package/skills-library/basics/env-variables.md +96 -0
- package/skills-library/basics/error-handling-basics.md +125 -0
- package/skills-library/basics/git-commit-conventions.md +106 -0
- package/skills-library/basics/readme-template.md +108 -0
- package/skills-library/common-tasks/async-await-patterns.md +157 -0
- package/skills-library/common-tasks/auth-jwt-basics.md +164 -0
- package/skills-library/common-tasks/database-schema-design.md +166 -0
- package/skills-library/common-tasks/file-upload-basics.md +166 -0
- package/skills-library/common-tasks/form-validation.md +159 -0
- package/skills-library/debugging/FAILURE_TAXONOMY_CLASSIFICATION.md +117 -0
- package/skills-library/debugging/THREE_AGENT_HYPOTHESIS_DEBUGGING.md +86 -0
- package/skills-library/methodology/BREATH_BASED_PARALLEL_EXECUTION.md +678 -0
- package/skills-library/methodology/CONFIDENCE_GATED_EXECUTION.md +243 -0
- package/skills-library/methodology/EVIDENCE_BASED_VALIDATION.md +308 -0
- package/skills-library/methodology/MULTI_PERSPECTIVE_CODE_REVIEW.md +330 -0
- package/skills-library/methodology/PATH_VERIFICATION_GATE.md +211 -0
- package/skills-library/methodology/REFLEXION_MEMORY_PATTERN.md +183 -0
- package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md +263 -0
- package/skills-library/methodology/SABBATH_REST_PATTERN.md +267 -0
- package/skills-library/methodology/STONE_AND_SCAFFOLD.md +220 -0
- package/skills-library/performance/cache-augmented-generation.md +172 -0
- package/skills-library/quality-safety/debugging-steps.md +147 -0
- package/skills-library/quality-safety/deployment-checklist.md +155 -0
- package/skills-library/quality-safety/security-checklist.md +204 -0
- package/skills-library/quality-safety/testing-basics.md +180 -0
- package/skills-library/security/agent-security-scanner.md +445 -0
- package/skills-library/specialists/api-architecture/api-designer.md +49 -0
- package/skills-library/specialists/api-architecture/graphql-architect.md +49 -0
- package/skills-library/specialists/api-architecture/mcp-developer.md +51 -0
- package/skills-library/specialists/api-architecture/microservices-architect.md +50 -0
- package/skills-library/specialists/api-architecture/websocket-engineer.md +48 -0
- package/skills-library/specialists/backend/django-expert.md +52 -0
- package/skills-library/specialists/backend/fastapi-expert.md +52 -0
- package/skills-library/specialists/backend/laravel-specialist.md +52 -0
- package/skills-library/specialists/backend/nestjs-expert.md +51 -0
- package/skills-library/specialists/backend/rails-expert.md +53 -0
- package/skills-library/specialists/backend/spring-boot-engineer.md +56 -0
- package/skills-library/specialists/data-ml/fine-tuning-expert.md +48 -0
- package/skills-library/specialists/data-ml/ml-pipeline.md +47 -0
- package/skills-library/specialists/data-ml/pandas-pro.md +47 -0
- package/skills-library/specialists/data-ml/rag-architect.md +51 -0
- package/skills-library/specialists/data-ml/spark-engineer.md +47 -0
- package/skills-library/specialists/frontend/angular-architect.md +52 -0
- package/skills-library/specialists/frontend/flutter-expert.md +51 -0
- package/skills-library/specialists/frontend/nextjs-developer.md +54 -0
- package/skills-library/specialists/frontend/react-native-expert.md +50 -0
- package/skills-library/specialists/frontend/vue-expert.md +51 -0
- package/skills-library/specialists/infrastructure/chaos-engineer.md +74 -0
- package/skills-library/specialists/infrastructure/cloud-architect.md +70 -0
- package/skills-library/specialists/infrastructure/database-optimizer.md +64 -0
- package/skills-library/specialists/infrastructure/devops-engineer.md +70 -0
- package/skills-library/specialists/infrastructure/kubernetes-specialist.md +52 -0
- package/skills-library/specialists/infrastructure/monitoring-expert.md +70 -0
- package/skills-library/specialists/infrastructure/sre-engineer.md +70 -0
- package/skills-library/specialists/infrastructure/terraform-engineer.md +51 -0
- package/skills-library/specialists/languages/cpp-pro.md +74 -0
- package/skills-library/specialists/languages/csharp-developer.md +69 -0
- package/skills-library/specialists/languages/dotnet-core-expert.md +54 -0
- package/skills-library/specialists/languages/golang-pro.md +51 -0
- package/skills-library/specialists/languages/java-architect.md +49 -0
- package/skills-library/specialists/languages/javascript-pro.md +68 -0
- package/skills-library/specialists/languages/kotlin-specialist.md +68 -0
- package/skills-library/specialists/languages/php-pro.md +49 -0
- package/skills-library/specialists/languages/python-pro.md +52 -0
- package/skills-library/specialists/languages/react-expert.md +51 -0
- package/skills-library/specialists/languages/rust-engineer.md +50 -0
- package/skills-library/specialists/languages/sql-pro.md +56 -0
- package/skills-library/specialists/languages/swift-expert.md +69 -0
- package/skills-library/specialists/languages/typescript-pro.md +51 -0
- package/skills-library/specialists/platform/atlassian-mcp.md +52 -0
- package/skills-library/specialists/platform/embedded-systems.md +53 -0
- package/skills-library/specialists/platform/game-developer.md +53 -0
- package/skills-library/specialists/platform/salesforce-developer.md +53 -0
- package/skills-library/specialists/platform/shopify-expert.md +49 -0
- package/skills-library/specialists/platform/wordpress-pro.md +49 -0
- package/skills-library/specialists/quality/code-documenter.md +51 -0
- package/skills-library/specialists/quality/code-reviewer.md +67 -0
- package/skills-library/specialists/quality/debugging-wizard.md +51 -0
- package/skills-library/specialists/quality/fullstack-guardian.md +51 -0
- package/skills-library/specialists/quality/legacy-modernizer.md +50 -0
- package/skills-library/specialists/quality/playwright-expert.md +65 -0
- package/skills-library/specialists/quality/spec-miner.md +56 -0
- package/skills-library/specialists/quality/test-master.md +65 -0
- package/skills-library/specialists/security/secure-code-guardian.md +55 -0
- package/skills-library/specialists/security/security-reviewer.md +53 -0
- package/skills-library/specialists/workflow/architecture-designer.md +53 -0
- package/skills-library/specialists/workflow/cli-developer.md +70 -0
- package/skills-library/specialists/workflow/feature-forge.md +65 -0
- package/skills-library/specialists/workflow/prompt-engineer.md +54 -0
- package/skills-library/specialists/workflow/the-fool.md +62 -0
- package/templates/ASSUMPTIONS.md +125 -0
- package/templates/BLOCKERS.md +73 -0
- package/templates/DECISION_LOG.md +116 -0
- package/templates/UAT.md +96 -0
- package/templates/blueprint.md +94 -0
- package/templates/brainstorm.md +185 -0
- package/templates/conscience.md +92 -0
- package/templates/fire-handoff.md +159 -0
- package/templates/metrics.md +67 -0
- package/templates/phase-prompt.md +142 -0
- package/templates/record.md +131 -0
- package/templates/review-report.md +117 -0
- package/templates/skills-index.md +157 -0
- package/templates/verification.md +149 -0
- package/templates/vision.md +79 -0
- package/validation-config.yml +793 -0
- package/version.json +7 -0
- package/workflows/execute-phase.md +732 -0
- package/workflows/handoff-session.md +678 -0
- package/workflows/new-project.md +578 -0
- package/workflows/plan-phase.md +592 -0
- package/workflows/verify-phase.md +874 -0
@@ -0,0 +1,183 @@

# Reflexion Memory Pattern — Cross-Session Failure Learning

## The Problem

AI agents repeat the same mistakes across sessions because failure context is lost. Debug sessions resolve issues, but the knowledge dies with the session. The next agent encountering the same symptoms starts from scratch.

### Why It Was Hard

- Debug sessions produce rich context (symptoms, hypotheses, evidence, root causes) but it's trapped in `.planning/debug/` files that are project-specific and not searchable cross-project
- Failed approaches are the most valuable learning — but agents only record what *worked*, not what *didn't*
- Finding the right granularity: too detailed = noise, too abstract = useless
- Integration requires modifying multiple command flows (debug, loop, execute)

### Impact

- Same bugs debugged repeatedly across sessions (hours wasted)
- Silent failures re-investigated from scratch every time
- No institutional memory of "this library is broken on Python 3.14"
- Debug sessions take 3x longer than necessary when prior knowledge exists

---

## The Solution

### Root Cause

Agent systems store *conclusions* (handoffs, skills) but not *journeys* (what was tried, what failed, why). The Reflexion paper shows that storing the journey as linguistic self-reflection substantially improves later attempts: it reports 91% pass@1 on HumanEval, versus 80% for the GPT-4 baseline it compares against.

### The Reflection File Format

```markdown
---
type: reflection
date: 2026-02-20
project: claude-voice-bridge
trigger: debug-resolution | test-failure | approach-rotation | stalled-loop
severity: minor | moderate | critical
tags: [pynput, keyboard, hotkeys, python-3.14]
---
# What I tried and why it failed

## The Problem
Hotkeys stopped responding. No errors — completely silent failure.

## What I Tried (and why each failed)
1. **Checked keyboard library hooks** — hooks installed, listener alive,
   but zero callbacks. Root cause: keyboard 0.13.5 broken on Python 3.14.
2. **Switched to pynput with char matching** — pynput works, but Ctrl+M
   sends '\r' not 'm'. Silent mismatch.

## What Actually Worked
Used `KeyCode.from_vk(ord(name.upper()))` — VK codes are stable
regardless of modifier state.

## The Lesson
When a library installs without errors but produces no output, suspect
Python version incompatibility. Always match keyboard keys by VK code,
never by char when modifiers are involved.

## Future Self: Search For This When
- Hotkeys stop working silently
- Keyboard hooks fire zero events
- Ctrl+letter combinations fail to match
```
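
The front matter in the template implies a small schema. The following is a minimal sketch of a validator for it, assuming that the `trigger` and `severity` values listed in the template are the only allowed ones and that `project` and at least one tag are required. The function name `validateReflectionMeta` and its return shape are hypothetical and not part of Dominion Flow.

```javascript
// Allowed values copied from the reflection template in this document.
const TRIGGERS = ["debug-resolution", "test-failure", "approach-rotation", "stalled-loop"];
const SEVERITIES = ["minor", "moderate", "critical"];

// Hypothetical helper: returns a list of problems (an empty list means valid).
// Treating `project` and `tags` as required is an assumption.
function validateReflectionMeta(meta) {
  const problems = [];
  if (meta.type !== "reflection") problems.push("type must be 'reflection'");
  if (!/^\d{4}-\d{2}-\d{2}$/.test(String(meta.date))) problems.push("date must be YYYY-MM-DD");
  if (!meta.project) problems.push("project is required");
  if (!TRIGGERS.includes(meta.trigger)) problems.push(`unknown trigger: ${meta.trigger}`);
  if (!SEVERITIES.includes(meta.severity)) problems.push(`unknown severity: ${meta.severity}`);
  if (!Array.isArray(meta.tags) || meta.tags.length === 0) {
    problems.push("at least one tag is required");
  }
  return problems;
}
```

Returning a list rather than throwing lets a capture step report every problem in one pass before deciding whether to save the file.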

### Three Integration Points

**1. Pre-Investigation Search (Step 2.5 in debug flow):**
```
Before investigating any issue:
  Search reflections: /fire-remember "{symptoms}" --type reflection

  If match found with >0.75 similarity:
    "I've seen this before — {lesson}. Applying directly."
    Offer: [Apply same fix] [Investigate fresh] [Compare differences]
```
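
The threshold check in this step can be sketched as a small pure function. The shape of a search match (`score`, `lesson`) is an assumption, since the document does not show what `/fire-remember` returns, and `pickPriorReflection` is a hypothetical name.

```javascript
// Threshold taken from the step above; the match shape is an assumption.
const SIMILARITY_THRESHOLD = 0.75;

// matches: [{ score: number, lesson: string }] from a reflection search.
// Returns null when nothing clears the threshold, so the caller investigates fresh.
function pickPriorReflection(matches, threshold = SIMILARITY_THRESHOLD) {
  const candidates = matches.filter((m) => m.score > threshold);
  if (candidates.length === 0) return null;
  // Prefer the most similar reflection.
  const best = candidates.reduce((a, b) => (b.score > a.score ? b : a));
  return {
    message: `I've seen this before: ${best.lesson}`,
    options: ["Apply same fix", "Investigate fresh", "Compare differences"],
  };
}
```

The comparison is strict (`>`), matching the ">0.75" wording, so a match at exactly 0.75 does not count as a hit.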

**2. Post-Resolution Capture (Step 7.5 in debug flow):**
```
After root cause found and fix verified:
  Auto-generate reflection from debug file
  Extract: symptoms → failed hypotheses → root cause → fix → lesson

  Severity classification:
    critical: 5+ eliminated hypotheses OR 10+ files changed
    moderate: 2-4 eliminated hypotheses OR multi-file fix
    minor: 1 hypothesis OR single-file fix
```
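
Because the severity rules use OR, a single debug session can satisfy more than one row (for example, one hypothesis but a three-file fix). A minimal sketch that resolves this by checking the most severe rule first is shown below. Reading "multi-file" as two or more files is an assumption, and `classifySeverity` is a hypothetical name.

```javascript
// Checks rules from most to least severe so overlapping cases take the higher level.
// "Multi-file" is read here as two or more files; that reading is an assumption.
function classifySeverity({ eliminatedHypotheses, filesChanged }) {
  if (eliminatedHypotheses >= 5 || filesChanged >= 10) return "critical";
  if (eliminatedHypotheses >= 2 || filesChanged >= 2) return "moderate";
  return "minor";
}
```

For instance, `classifySeverity({ eliminatedHypotheses: 1, filesChanged: 3 })` yields `"moderate"` under this ordering, because the multi-file condition wins over the single-hypothesis one.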

**3. Loop Failure Capture (Step 9 in loop):**
```
On STALLED (3+ iterations no progress):
  Save reflection with trigger: "stalled-loop"
  Include: what was attempted, measurements, why no progress

On SPINNING (same error repeated):
  Save reflection with trigger: "approach-rotation"
  Include: each failed approach with error hash
```
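
One way to tell the two loop states apart is sketched below. The iteration record shape (`progressed`, `errorHash`) is hypothetical, and treating "same error repeated" as two consecutive iterations with an identical error hash is an assumption, since the document gives no count for SPINNING.

```javascript
// iterations: [{ progressed: boolean, errorHash: string | null }], oldest first.
// Returns the state and the reflection trigger to use, or null if the loop is healthy.
function detectLoopFailure(iterations, stallLimit = 3) {
  const last = iterations[iterations.length - 1];
  const prev = iterations[iterations.length - 2];
  // SPINNING: the two most recent iterations failed with the same error hash (assumed definition).
  if (last && prev && last.errorHash && last.errorHash === prev.errorHash) {
    return { state: "SPINNING", trigger: "approach-rotation" };
  }
  // STALLED: the most recent `stallLimit` iterations all made no progress.
  const recent = iterations.slice(-stallLimit);
  if (recent.length === stallLimit && recent.every((it) => !it.progressed)) {
    return { state: "STALLED", trigger: "stalled-loop" };
  }
  return null;
}
```

SPINNING is checked first because a repeated error is the more specific signal; a loop that repeats one error for three iterations would otherwise be reported only as STALLED.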

### Storage & Search

```
Location: ~/.claude/reflections/
Indexed in: Qdrant as sourceType: 'reflection'
Search: /fire-remember "{query}" --type reflection
Command: /fire-reflect capture|search|list|review
```

---

## Testing the Fix

### Verification Steps

1. Create a reflection file manually in `~/.claude/reflections/`
2. Run `npm run consolidate` to index it
3. Search: `npm run search -- "hotkeys silent failure" --type reflection`
4. Confirm the reflection appears in results with correct sourceType

### Quality Checklist

A good reflection has:
- [ ] Specific symptoms (error messages, observed behaviors)
- [ ] Multiple failed approaches with *reasons* they failed
- [ ] Concrete solution (code, command, config change — not vague advice)
- [ ] One-sentence lesson useful without context
- [ ] Search triggers matching how you'd describe the problem naturally

A bad reflection:
- "Something was wrong with the API" (too vague)
- Only records the solution without the journey
- Lesson is "be more careful" (not actionable)
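
The structural part of this checklist can be checked mechanically. The sketch below only confirms that the section headings from the reflection template are present; it cannot judge whether a lesson is specific or a symptom description is useful. The helper name `missingSections` is hypothetical.

```javascript
// Section headings taken from the reflection template in this document.
const REQUIRED_SECTIONS = [
  "## What I Tried",
  "## What Actually Worked",
  "## The Lesson",
  "## Future Self: Search For This When",
];

// Returns the required headings that are missing from a reflection's markdown body.
function missingSections(markdown) {
  const lines = markdown.split("\n").map((line) => line.trim());
  return REQUIRED_SECTIONS.filter(
    (heading) => !lines.some((line) => line.startsWith(heading))
  );
}
```

Matching on the start of the line means a heading such as `## What I Tried (and why each failed)` still counts as present.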

---

## Prevention

1. Make reflection generation **automatic** after debug resolution — don't rely on manual capture
2. Keep reflections **concise** — the lesson and search triggers are most important
3. Review reflections periodically — merge duplicates, update outdated ones
4. Tag with specific technologies and error patterns for better search

---

## Related Patterns

- [AGENT_SELF_IMPROVEMENT_LOOP](./AGENT_SELF_IMPROVEMENT_LOOP.md) - Full 6-upgrade blueprint
- [CONFIDENCE_GATED_EXECUTION](./CONFIDENCE_GATED_EXECUTION.md) - Reflections feed confidence scoring
- [WARRIOR_WORKFLOW_DEBUGGING_PROTOCOL](./WARRIOR_WORKFLOW_DEBUGGING_PROTOCOL.md) - Debug flow where reflections integrate

---

## Common Mistakes to Avoid

- Capturing reflections for trivial issues (typo fixes, config changes) — noise overwhelms signal
- Writing the "lesson" as a platitude ("always test thoroughly") instead of a specific takeaway
- Not including search triggers — the reflection exists but is unfindable
- Storing reflections per-project instead of globally — defeats cross-session learning
- Skipping the "what I tried" section — the failed approaches are the most valuable part

---

## Resources

- Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning" (NeurIPS 2023): https://arxiv.org/abs/2303.11366
- Dominion Flow implementation: `/fire-reflect` command, `fire-debug.md` Steps 2.5 and 7.5

---

## Time to Implement

**2-3 hours** — Create reflection directory, write command, modify debug/loop flows, add to vector index

## Difficulty Level

Stars: 2/5 — Conceptually simple. The hard part is building the discipline to actually search reflections before investigating and to capture them after resolution.

---

**Author Notes:**
The most surprising finding from implementing this: the "Future Self: Search For This When" section is the single most valuable field. It's the bridge between how you describe the problem *now* (with full context) and how a future agent will describe it (with zero context, just symptoms). Writing good search triggers is an act of empathy toward your future self.
@@ -0,0 +1,263 @@

# Research-Backed Workflow Upgrade Pattern - Methodology & Implementation

## The Problem

AI agent workflows (WARRIOR, Dominion Flow, etc.) evolve through manual intuition — someone notices a gap, proposes a fix, implements it. This works for small changes but misses systemic improvements that academic research and community patterns have already solved.

### Why It Was Hard

- Academic papers (ACL, NeurIPS, ICML) contain breakthrough findings but use jargon that's hard to map to practical workflow changes
- Community patterns (Manus AI, Replit Agent, Bolt.new) are scattered across blog posts, tweets, and GitHub repos — no single source
- Internal gap analysis requires stepping back from the code to see structural blind spots (cross-phase contradictions, context drift, broken handoff chains)
- Synthesizing 50+ findings from different domains into a coherent upgrade plan is overwhelming without structure

### Impact

Without systematic research-backed upgrades:
- Workflows reinvent solutions that papers already proved effective
- Known failure modes (context drift, assumption contradictions) repeat across projects
- Improvements are reactive (fix after failure) instead of proactive (prevent before failure)
- Agent performance plateaus because upgrades are incremental rather than informed by state-of-the-art

---

## The Solution

### The 4-Agent Parallel Research Sweep

Launch 4 specialized research agents in parallel, each covering a different knowledge domain. They work independently and return findings that you synthesize into a prioritized upgrade plan.

### Step 1: Define Research Scopes

Split the research into 4 non-overlapping domains:

```
Agent 1: Academic Papers (2024-2026)
  - Search: AI agent papers, multi-agent systems, code generation, debugging
  - Sources: ACL, NeurIPS, ICML proceedings, arXiv
  - Goal: Find proven techniques with measurable results (pass@1, accuracy, etc.)

Agent 2: Community Workflow Patterns
  - Search: AI coding tool blogs, developer experience posts, open-source agents
  - Sources: Manus AI, Replit, Cursor, Bolt.new, Devin, SWE-Agent
  - Goal: Find practical patterns already working in production

Agent 3: Testing & Verification Research
  - Search: AI testing frameworks, automated verification, quality assurance
  - Sources: SWE-Bench, METR studies, CI/CD integration patterns
  - Goal: Find ways to verify agent work more reliably

Agent 4: Internal Gap Analysis
  - Search: Your own workflow files, past handoffs, known failure modes
  - Sources: The actual workflow documentation being upgraded
  - Goal: Find structural gaps, contradictions, missing features
```

### Step 2: Launch All 4 Agents Simultaneously

```javascript
// Launch in a SINGLE message (parallel execution):

// Agent 1: Academic research
Task({
  subagent_type: "general-purpose",
  description: "Research AI agent papers 2024-2026",
  prompt: "Search for recent AI papers on: multi-agent code generation, " +
          "debugging with plan context, context window management, " +
          "task recitation, agent evaluation. For each paper found, " +
          "extract: title, key finding, measurable result, and how it " +
          "could improve [YOUR WORKFLOW NAME]. Return top 15 findings."
});

// Agent 2: Community patterns
Task({
  subagent_type: "general-purpose",
  description: "Research community AI workflow patterns",
  prompt: "Search for blog posts and docs from Manus AI, Replit Agent, " +
          "Bolt.new, Cursor, Devin about: context engineering, " +
          "decision-time guidance, agent loops, workflow structure. " +
          "For each pattern, extract: source, pattern name, how it works, " +
          "and how it could improve [YOUR WORKFLOW NAME]. Return top 15."
});

// Agent 3: Testing & verification
Task({
  subagent_type: "general-purpose",
  description: "Research AI testing and verification",
  prompt: "Search for: SWE-Bench results, METR studies, AI agent " +
          "evaluation frameworks, automated code review patterns. " +
          "Focus on: what makes agent verification reliable, common " +
          "failure modes, confidence calibration. Return top 10 findings."
});

// Agent 4: Internal gap analysis
Task({
  subagent_type: "Explore",
  description: "Analyze current workflow gaps",
  prompt: "Read all workflow files in [YOUR WORKFLOW PATH]. Identify: " +
          "structural gaps (missing features), contradictions between " +
          "files, assumptions that aren't tracked, handoff points that " +
          "could break, areas where agents lack guidance. Return top 10 gaps."
});
```

### Step 3: Synthesize Into Priority Tiers

When all 4 agents return, synthesize findings into 3 tiers:

```markdown
## Tier 1: High Impact, Low Risk (implement now)
- Findings with proven results (papers with measurable improvements)
- Patterns already working in production elsewhere
- Internal gaps that are straightforward to fix
- Changes that don't break existing functionality

## Tier 2: Medium Impact, Medium Risk (implement next version)
- Findings that require architectural changes
- Patterns that need adaptation to your workflow
- Improvements that depend on Tier 1 being complete

## Tier 3: High Impact, High Risk (plan for future)
- Fundamental architectural changes
- Patterns that require new infrastructure
- Research findings that need more validation
```
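
If each finding is first labeled with an impact and a risk level, the tier sort can be sketched in a few lines. The finding shape and both function names are hypothetical. The three tiers name only three impact/risk combinations, so sending every other combination to Tier 2 is an assumption, and the judgment about impact and risk still has to be made by a person or agent beforehand.

```javascript
// Hypothetical finding shape: { id, impact: "high" | "medium", risk: "low" | "medium" | "high" }.
function tierFor(finding) {
  if (finding.impact === "high" && finding.risk === "low") return 1; // implement now
  if (finding.risk === "high") return 3; // plan for future
  return 2; // everything else waits for the next version (assumed default)
}

// Groups finding IDs under their tier number.
function groupByTier(findings) {
  const tiers = { 1: [], 2: [], 3: [] };
  for (const finding of findings) tiers[tierFor(finding)].push(finding.id);
  return tiers;
}
```

Keying the default on risk rather than impact keeps anything risky out of the "implement now" bucket, which matches the intent of doing Tier 1 first.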

### Step 4: Implement With Inline Citations

For every change, add a comment citing the research basis:

```markdown
> **Research basis (v3.2):** MapCoder (ACL 2024) achieved 93.9% pass@1
> by feeding the Debugging Agent the original plan alongside buggy code.
> See: references/research-improvements.md (PLAN-DEBUG-1)
```

This creates a traceable chain: **inline comment -> reference doc -> original source**.

### Step 5: Create a Reference Document

Create a `research-improvements.md` that indexes all sources:

```markdown
| ID | Source | Key Finding | Applied In |
|----|--------|-------------|------------|
| PLAN-DEBUG-1 | MapCoder (ACL 2024) | Plan-aware debugging: 93.9% pass@1 | fire-debug.md |
| RECITATION-1 | Manus AI (2025) | Task recitation prevents context drift | fire-loop.md |
| GAP-1 | Internal analysis | No decision log across phases | DECISION_LOG.md |
```

---

## Real-World Results: Dominion Flow v3.2

This pattern was used to upgrade Dominion Flow from v3.1 to v3.2:

**Research Phase:**
- 4 agents ran in parallel (~5 minutes total)
- Returned 50+ findings across all domains
- Synthesized into 10 improvements across 3 tiers

**Tier 1 Implemented (same session):**

| Enhancement | Research Source | Impact |
|-------------|---------------|--------|
| Task Recitation Pattern | Manus AI (context engineering) | Prevents drift after ~50 tool calls in loops |
| Plan-Aware Debugging | MapCoder ACL 2024 (93.9% pass@1) | Debugger compares intended vs actual behavior |
| Decision Log | Internal gap analysis (GAP-1) | Prevents cross-phase decision contradictions |
| Assumptions Registry | Internal gap analysis (GAP-2) | Phase-gate validation catches stale assumptions |
| Handoff Completeness Validator | Internal gap analysis (GAP-10) | 17-point checklist prevents broken context chains |
| Code Comments Standard | User request + best practices | All agent-written code includes maintenance comments |

**Files changed:** 8 files across Dominion Flow
**Time:** ~2 hours from research launch to full implementation
**Traceability:** Every change has inline citation -> reference doc -> original source

---

## Testing the Pattern

### How to Verify It Worked

1. **Citation coverage:** Every modified file should have at least one research citation
2. **Reference doc exists:** `references/research-improvements.md` with full index
3. **Tier separation:** Changes should be clearly separated into implementation tiers
4. **No orphan citations:** Every inline citation tag (e.g., GAP-1) exists in the reference doc
5. **Version bump:** Plugin version reflects the upgrade (e.g., 3.1.0 -> 3.2.0)

### Quality Checks

```bash
# Replace the bracketed placeholders with real paths before running.

# Verify all citations resolve: list the unique tags cited inline
grep -r "See:.*research-improvements" [workflow-files] | \
  sed 's/.*(\(.*\))/\1/' | sort -u
# Then check each tag exists in research-improvements.md

# Verify no placeholder text remains
grep -r "{.*}" [modified-files] | grep -v "^Binary"
# Should return only intentional template markers
```
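
The "check each tag exists" step is left manual in the shell commands. A minimal JavaScript sketch of that check follows, assuming citations use the `See: ...research-improvements.md (TAG)` form shown in Step 4 and that the reference doc is a Markdown table whose first column holds hyphenated uppercase IDs, as in Step 5. All three function names are hypothetical.

```javascript
// Collect tags cited inline as "(TAG-1)" after a "See: ...research-improvements.md" pointer.
function citedTags(text) {
  const pattern = /See:.*research-improvements\.md\s*\(([A-Z0-9-]+)\)/g;
  return [...new Set([...text.matchAll(pattern)].map((m) => m[1]))];
}

// Collect IDs from the first column of the reference table,
// skipping the header and separator rows (neither looks like WORD-1).
function indexedTags(referenceTable) {
  return referenceTable
    .split("\n")
    .map((row) => row.split("|")[1])
    .filter(Boolean)
    .map((cell) => cell.trim())
    .filter((cell) => /^[A-Z0-9]+(-[A-Z0-9]+)+$/.test(cell));
}

// Tags cited in workflow text that have no row in the reference table.
function findOrphanCitations(text, referenceTable) {
  const known = new Set(indexedTags(referenceTable));
  return citedTags(text).filter((tag) => !known.has(tag));
}
```

The functions take strings rather than file paths so the check can run on whatever the caller has already read, and an empty result means every inline citation resolves.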

---

## Prevention (Avoiding Stale Workflows)

1. **Schedule quarterly research sweeps** — technology moves fast
2. **Track Tier 2/3 items** — don't lose future improvements
3. **Update reference doc** — keep the citation chain intact
4. **Re-run gap analysis** after major changes — new code creates new gaps
5. **Version your upgrades** — clear version history for rollback

---

## Common Mistakes to Avoid

- **Implementing everything at once** — Tier separation exists for a reason. Tier 1 first.
- **Skipping citations** — Without inline comments, nobody knows WHY a change was made. Future agents will undo your work.
- **Research without synthesis** — 50 raw findings are useless. The synthesis step (Tier sorting) is where value is created.
- **Ignoring internal gaps** — Agent 4 (gap analysis) often finds the most impactful improvements because they're specific to YOUR workflow.
- **Not creating the reference doc** — Inline citations without a backing document are dead links.
- **Changing too many files without testing** — Even documentation changes can break workflows if agents read those docs at runtime.

---

## Related Patterns

- [Breath-Based Parallel Execution](./BREATH_BASED_PARALLEL_EXECUTION.md) — Breath pattern used for agent parallelism
- [Advanced Orchestration Patterns](./ADVANCED_ORCHESTRATION_PATTERNS.md) — Multi-agent coordination
- [WARRIOR Workflow Debugging Protocol](./WARRIOR_WORKFLOW_DEBUGGING_PROTOCOL.md) — Debugging with plan context

---

## Resources

- MapCoder (ACL 2024): Multi-Agent Code Generation through Planning
- Manus AI: Context Engineering for AI Agents (2025)
- Mason (2026): Judge Agent Separation pattern
- MIT RLCR (2025): Confidence-Based Escalation
- SWE-Bench Pro (2025): Single agent + retries vs multi-agent swarms
- METR Study (2025): AI Impact on Developer Productivity
- CNCF Four Pillars (2025): Golden Paths, Guardrails, Safety Nets, Manual Review
- Full citation index: `~/.claude/plugins/dominion-flow/references/research-improvements.md`

---

## Time to Implement

**Research phase:** ~10 minutes (4 parallel agents)
**Synthesis:** ~15 minutes (read findings, sort into tiers)
**Tier 1 implementation:** ~2 hours (depends on scope)
**Total:** ~2.5 hours for a major workflow upgrade

## Difficulty Level

Difficulty: 3/5 — The parallel research pattern is straightforward, but the synthesis step requires judgment about what to implement and in what order. The implementation itself is mostly documentation changes (editing agent instructions, templates, commands) rather than code.

---

**Author Notes:**
The biggest insight from this pattern: **Agent 4 (internal gap analysis) consistently finds the highest-impact improvements.** External research gives you proven techniques, but the internal analysis tells you exactly WHERE those techniques plug into YOUR specific gaps. Always include both.

The second insight: **inline citations are non-negotiable.** Without them, the next Claude instance has no idea why a section exists and might remove it during a future upgrade. The citation chain (inline -> reference doc -> source) is what makes improvements durable across sessions.

This pattern was first used on Dominion Flow v3.2 (2026-02-10) and produced the six Tier 1 enhancements listed under Real-World Results in a single session.