feed-the-machine 1.6.1 → 1.7.0
- package/LICENSE +21 -21
- package/README.md +170 -170
- package/bin/brain.py +1340 -0
- package/bin/convert_claude_skills_to_codex.py +490 -0
- package/bin/generate-manifest.mjs +463 -463
- package/bin/harden_codex_skills.py +141 -0
- package/bin/install.mjs +491 -491
- package/bin/migrate-eng-buddy-data.py +875 -0
- package/bin/playbook_engine/__init__.py +1 -0
- package/bin/playbook_engine/conftest.py +8 -0
- package/bin/playbook_engine/extractor.py +33 -0
- package/bin/playbook_engine/manager.py +102 -0
- package/bin/playbook_engine/models.py +84 -0
- package/bin/playbook_engine/registry.py +35 -0
- package/bin/playbook_engine/test_extractor.py +72 -0
- package/bin/playbook_engine/test_integration.py +129 -0
- package/bin/playbook_engine/test_manager.py +85 -0
- package/bin/playbook_engine/test_models.py +166 -0
- package/bin/playbook_engine/test_registry.py +67 -0
- package/bin/playbook_engine/test_tracer.py +86 -0
- package/bin/playbook_engine/tracer.py +93 -0
- package/bin/tasks_db.py +456 -0
- package/docs/HOOKS.md +243 -243
- package/docs/INBOX.md +233 -233
- package/ftm/SKILL.md +125 -122
- package/ftm-audit/SKILL.md +623 -623
- package/ftm-audit/references/protocols/PROJECT-PATTERNS.md +91 -91
- package/ftm-audit/references/protocols/RUNTIME-WIRING.md +66 -66
- package/ftm-audit/references/protocols/WIRING-CONTRACTS.md +135 -135
- package/ftm-audit/references/strategies/AUTO-FIX-STRATEGIES.md +69 -69
- package/ftm-audit/references/templates/REPORT-FORMAT.md +96 -96
- package/ftm-audit/scripts/run-knip.sh +23 -23
- package/ftm-audit.yml +2 -2
- package/ftm-brainstorm/SKILL.md +1003 -498
- package/ftm-brainstorm/evals/evals.json +180 -100
- package/ftm-brainstorm/evals/promptfoo.yaml +109 -109
- package/ftm-brainstorm/references/agent-prompts.md +552 -224
- package/ftm-brainstorm/references/plan-template.md +209 -121
- package/ftm-brainstorm.yml +2 -2
- package/ftm-browse/SKILL.md +454 -454
- package/ftm-browse/daemon/browser-manager.ts +206 -206
- package/ftm-browse/daemon/bun.lock +30 -30
- package/ftm-browse/daemon/cli.ts +347 -347
- package/ftm-browse/daemon/commands.ts +410 -410
- package/ftm-browse/daemon/main.ts +357 -357
- package/ftm-browse/daemon/package.json +17 -17
- package/ftm-browse/daemon/server.ts +189 -189
- package/ftm-browse/daemon/snapshot.ts +519 -519
- package/ftm-browse/daemon/tsconfig.json +22 -22
- package/ftm-browse.yml +4 -4
- package/ftm-capture/SKILL.md +370 -370
- package/ftm-capture.yml +4 -4
- package/ftm-codex-gate/SKILL.md +361 -361
- package/ftm-codex-gate.yml +2 -2
- package/ftm-config/SKILL.md +422 -345
- package/ftm-config.default.yml +125 -82
- package/ftm-config.yml +44 -2
- package/ftm-council/SKILL.md +416 -416
- package/ftm-council/references/prompts/CLAUDE-INVESTIGATION.md +60 -60
- package/ftm-council/references/prompts/CODEX-INVESTIGATION.md +58 -58
- package/ftm-council/references/prompts/GEMINI-INVESTIGATION.md +58 -58
- package/ftm-council/references/prompts/REBUTTAL-TEMPLATE.md +57 -57
- package/ftm-council/references/protocols/PREREQUISITES.md +47 -47
- package/ftm-council/references/protocols/STEP-0-FRAMING.md +46 -46
- package/ftm-council.yml +2 -2
- package/ftm-dashboard/SKILL.md +163 -163
- package/ftm-dashboard.yml +4 -4
- package/ftm-debug/SKILL.md +1037 -1037
- package/ftm-debug/references/phases/PHASE-0-INTAKE.md +58 -58
- package/ftm-debug/references/phases/PHASE-1-TRIAGE.md +46 -46
- package/ftm-debug/references/phases/PHASE-2-WAR-ROOM-AGENTS.md +279 -279
- package/ftm-debug/references/phases/PHASE-3-TO-6-EXECUTION.md +436 -436
- package/ftm-debug/references/protocols/BLACKBOARD.md +86 -86
- package/ftm-debug/references/protocols/EDGE-CASES.md +103 -103
- package/ftm-debug.yml +2 -2
- package/ftm-diagram/SKILL.md +277 -277
- package/ftm-diagram.yml +2 -2
- package/ftm-executor/SKILL.md +777 -777
- package/ftm-executor/references/STYLE-TEMPLATE.md +73 -73
- package/ftm-executor/references/phases/PHASE-0-VERIFICATION.md +62 -62
- package/ftm-executor/references/phases/PHASE-2-AGENT-ASSEMBLY.md +34 -34
- package/ftm-executor/references/phases/PHASE-3-WORKTREES.md +38 -38
- package/ftm-executor/references/phases/PHASE-4-5-AUDIT.md +72 -72
- package/ftm-executor/references/phases/PHASE-4-DISPATCH.md +66 -66
- package/ftm-executor/references/phases/PHASE-5-5-CODEX-GATE.md +73 -73
- package/ftm-executor/references/protocols/DOCUMENTATION-BOOTSTRAP.md +36 -36
- package/ftm-executor/references/protocols/MODEL-PROFILE.md +59 -59
- package/ftm-executor/references/protocols/PROGRESS-TRACKING.md +66 -66
- package/ftm-executor/runtime/ftm-runtime.mjs +252 -252
- package/ftm-executor/runtime/package.json +8 -8
- package/ftm-executor.yml +2 -2
- package/ftm-git/SKILL.md +441 -441
- package/ftm-git/evals/evals.json +26 -26
- package/ftm-git/evals/promptfoo.yaml +75 -75
- package/ftm-git/hooks/post-commit-experience.sh +92 -92
- package/ftm-git/references/patterns/SECRET-PATTERNS.md +104 -104
- package/ftm-git/references/protocols/REMEDIATION.md +139 -139
- package/ftm-git/scripts/pre-commit-secrets.sh +110 -110
- package/ftm-git.yml +2 -2
- package/ftm-inbox/backend/__pycache__/main.cpython-314.pyc +0 -0
- package/ftm-inbox/backend/adapters/_retry.py +64 -64
- package/ftm-inbox/backend/adapters/base.py +230 -230
- package/ftm-inbox/backend/adapters/freshservice.py +104 -104
- package/ftm-inbox/backend/adapters/gmail.py +125 -125
- package/ftm-inbox/backend/adapters/jira.py +136 -136
- package/ftm-inbox/backend/adapters/registry.py +192 -192
- package/ftm-inbox/backend/adapters/slack.py +110 -110
- package/ftm-inbox/backend/db/connection.py +54 -54
- package/ftm-inbox/backend/db/schema.py +78 -78
- package/ftm-inbox/backend/executor/__init__.py +7 -7
- package/ftm-inbox/backend/executor/engine.py +149 -149
- package/ftm-inbox/backend/executor/step_runner.py +98 -98
- package/ftm-inbox/backend/main.py +103 -103
- package/ftm-inbox/backend/models/__init__.py +1 -1
- package/ftm-inbox/backend/models/unified_task.py +36 -36
- package/ftm-inbox/backend/planner/__init__.py +6 -6
- package/ftm-inbox/backend/planner/__pycache__/__init__.cpython-314.pyc +0 -0
- package/ftm-inbox/backend/planner/__pycache__/generator.cpython-314.pyc +0 -0
- package/ftm-inbox/backend/planner/__pycache__/schema.cpython-314.pyc +0 -0
- package/ftm-inbox/backend/planner/generator.py +127 -127
- package/ftm-inbox/backend/planner/schema.py +34 -34
- package/ftm-inbox/backend/requirements.txt +5 -5
- package/ftm-inbox/backend/routes/__pycache__/plan.cpython-314.pyc +0 -0
- package/ftm-inbox/backend/routes/execute.py +186 -186
- package/ftm-inbox/backend/routes/health.py +52 -52
- package/ftm-inbox/backend/routes/inbox.py +68 -68
- package/ftm-inbox/backend/routes/plan.py +271 -271
- package/ftm-inbox/bin/launchagent.mjs +91 -91
- package/ftm-inbox/bin/setup.mjs +188 -188
- package/ftm-inbox/bin/start.sh +10 -10
- package/ftm-inbox/bin/status.sh +17 -17
- package/ftm-inbox/bin/stop.sh +8 -8
- package/ftm-inbox/config.example.yml +55 -55
- package/ftm-inbox/package-lock.json +2898 -2898
- package/ftm-inbox/package.json +26 -26
- package/ftm-inbox/postcss.config.js +6 -6
- package/ftm-inbox/src/app.css +199 -199
- package/ftm-inbox/src/app.html +18 -18
- package/ftm-inbox/src/lib/api.ts +166 -166
- package/ftm-inbox/src/lib/components/ExecutionLog.svelte +81 -81
- package/ftm-inbox/src/lib/components/InboxFeed.svelte +143 -143
- package/ftm-inbox/src/lib/components/PlanStep.svelte +271 -271
- package/ftm-inbox/src/lib/components/PlanView.svelte +206 -206
- package/ftm-inbox/src/lib/components/StreamPanel.svelte +99 -99
- package/ftm-inbox/src/lib/components/TaskCard.svelte +190 -190
- package/ftm-inbox/src/lib/components/ui/EmptyState.svelte +63 -63
- package/ftm-inbox/src/lib/components/ui/KawaiiCard.svelte +86 -86
- package/ftm-inbox/src/lib/components/ui/PillButton.svelte +106 -106
- package/ftm-inbox/src/lib/components/ui/StatusBadge.svelte +67 -67
- package/ftm-inbox/src/lib/components/ui/StreamDrawer.svelte +149 -149
- package/ftm-inbox/src/lib/components/ui/ThemeToggle.svelte +80 -80
- package/ftm-inbox/src/lib/theme.ts +47 -47
- package/ftm-inbox/src/routes/+layout.svelte +76 -76
- package/ftm-inbox/src/routes/+page.svelte +401 -401
- package/ftm-inbox/svelte.config.js +12 -12
- package/ftm-inbox/tailwind.config.ts +63 -63
- package/ftm-inbox/tsconfig.json +13 -13
- package/ftm-inbox/vite.config.ts +6 -6
- package/ftm-intent/SKILL.md +241 -241
- package/ftm-intent.yml +2 -2
- package/ftm-manifest.json +3794 -3794
- package/ftm-map/SKILL.md +291 -291
- package/ftm-map/scripts/db.py +712 -712
- package/ftm-map/scripts/index.py +415 -415
- package/ftm-map/scripts/parser.py +224 -224
- package/ftm-map/scripts/queries/go-tags.scm +20 -20
- package/ftm-map/scripts/queries/javascript-tags.scm +35 -35
- package/ftm-map/scripts/queries/python-tags.scm +31 -31
- package/ftm-map/scripts/queries/ruby-tags.scm +19 -19
- package/ftm-map/scripts/queries/rust-tags.scm +37 -37
- package/ftm-map/scripts/queries/typescript-tags.scm +41 -41
- package/ftm-map/scripts/query.py +301 -301
- package/ftm-map/scripts/ranker.py +377 -377
- package/ftm-map/scripts/requirements.txt +5 -5
- package/ftm-map/scripts/setup-hooks.sh +27 -27
- package/ftm-map/scripts/setup.sh +56 -56
- package/ftm-map/scripts/test_db.py +364 -364
- package/ftm-map/scripts/test_parser.py +174 -174
- package/ftm-map/scripts/test_query.py +183 -183
- package/ftm-map/scripts/test_ranker.py +199 -199
- package/ftm-map/scripts/views.py +591 -591
- package/ftm-map.yml +2 -2
- package/ftm-mind/SKILL.md +201 -1943
- package/ftm-mind/evals/promptfoo.yaml +142 -142
- package/ftm-mind/references/blackboard-protocol.md +110 -0
- package/ftm-mind/references/blackboard-schema.md +328 -328
- package/ftm-mind/references/complexity-guide.md +110 -110
- package/ftm-mind/references/complexity-sizing.md +138 -0
- package/ftm-mind/references/decide-act-protocol.md +172 -0
- package/ftm-mind/references/direct-execution.md +51 -0
- package/ftm-mind/references/environment-discovery.md +77 -0
- package/ftm-mind/references/event-registry.md +319 -319
- package/ftm-mind/references/mcp-inventory.md +300 -296
- package/ftm-mind/references/ops-routing.md +47 -0
- package/ftm-mind/references/orient-protocol.md +234 -0
- package/ftm-mind/references/personality.md +40 -0
- package/ftm-mind/references/protocols/COMPLEXITY-SIZING.md +72 -72
- package/ftm-mind/references/protocols/MCP-HEURISTICS.md +32 -32
- package/ftm-mind/references/protocols/PLAN-APPROVAL.md +80 -80
- package/ftm-mind/references/reflexion-protocol.md +249 -249
- package/ftm-mind/references/routing/SCENARIOS.md +22 -22
- package/ftm-mind/references/routing-scenarios.md +35 -35
- package/ftm-mind.yml +2 -2
- package/ftm-ops.yml +4 -0
- package/ftm-pause/SKILL.md +395 -395
- package/ftm-pause/references/protocols/SKILL-RESTORE-PROTOCOLS.md +186 -186
- package/ftm-pause/references/protocols/VALIDATION.md +80 -80
- package/ftm-pause.yml +2 -2
- package/ftm-researcher/SKILL.md +275 -275
- package/ftm-researcher/evals/agent-diversity.yaml +17 -17
- package/ftm-researcher/evals/synthesis-quality.yaml +12 -12
- package/ftm-researcher/evals/trigger-accuracy.yaml +39 -39
- package/ftm-researcher/references/adaptive-search.md +116 -116
- package/ftm-researcher/references/agent-prompts.md +193 -193
- package/ftm-researcher/references/council-integration.md +193 -193
- package/ftm-researcher/references/output-format.md +203 -203
- package/ftm-researcher/references/synthesis-pipeline.md +165 -165
- package/ftm-researcher/scripts/score_credibility.py +234 -234
- package/ftm-researcher/scripts/validate_research.py +92 -92
- package/ftm-researcher.yml +2 -2
- package/ftm-resume/SKILL.md +518 -518
- package/ftm-resume/references/protocols/VALIDATION.md +172 -172
- package/ftm-resume.yml +2 -2
- package/ftm-retro/SKILL.md +380 -380
- package/ftm-retro/references/protocols/SCORING-RUBRICS.md +89 -89
- package/ftm-retro/references/templates/REPORT-FORMAT.md +109 -109
- package/ftm-retro.yml +2 -2
- package/ftm-routine/SKILL.md +170 -170
- package/ftm-routine.yml +4 -4
- package/ftm-state/blackboard/capabilities.json +5 -5
- package/ftm-state/blackboard/capabilities.schema.json +27 -27
- package/ftm-state/blackboard/context.json +37 -23
- package/ftm-state/blackboard/experiences/doom-statusline-fix.json +26 -0
- package/ftm-state/blackboard/experiences/hackathon-pages-site.json +26 -0
- package/ftm-state/blackboard/experiences/hindsight-sso-kickoff.json +42 -0
- package/ftm-state/blackboard/experiences/index.json +58 -9
- package/ftm-state/blackboard/experiences/learning-ragnarok-api-access.json +23 -0
- package/ftm-state/blackboard/experiences/nordlayer-members-auto-assign.json +26 -0
- package/ftm-state/blackboard/experiences/saml2aws-stale-session-fix.json +41 -0
- package/ftm-state/blackboard/patterns.json +6 -6
- package/ftm-state/schemas/context.schema.json +130 -130
- package/ftm-state/schemas/experience-index.schema.json +77 -77
- package/ftm-state/schemas/experience.schema.json +78 -78
- package/ftm-state/schemas/patterns.schema.json +44 -44
- package/ftm-upgrade/SKILL.md +194 -194
- package/ftm-upgrade/scripts/check-version.sh +76 -76
- package/ftm-upgrade/scripts/upgrade.sh +143 -143
- package/ftm-upgrade.yml +2 -2
- package/ftm-verify.yml +2 -2
- package/ftm.yml +2 -2
- package/hooks/ftm-auto-log.sh +137 -0
- package/hooks/ftm-blackboard-enforcer.sh +93 -93
- package/hooks/ftm-discovery-reminder.sh +90 -90
- package/hooks/ftm-drafts-gate.sh +61 -61
- package/hooks/ftm-event-logger.mjs +107 -107
- package/hooks/ftm-install-hooks.sh +240 -0
- package/hooks/ftm-learning-capture.sh +117 -0
- package/hooks/ftm-map-autodetect.sh +79 -79
- package/hooks/ftm-pending-sync-check.sh +22 -22
- package/hooks/ftm-plan-gate.sh +92 -92
- package/hooks/ftm-post-commit-trigger.sh +57 -57
- package/hooks/ftm-post-compaction.sh +138 -0
- package/hooks/ftm-pre-compaction.sh +147 -0
- package/hooks/ftm-session-end.sh +52 -0
- package/hooks/ftm-session-snapshot.sh +213 -0
- package/hooks/settings-template.json +81 -81
- package/install.sh +363 -363
- package/package.json +84 -84
- package/uninstall.sh +25 -25
|
@@ -1,436 +1,436 @@
|
|
|
1
|
-
# Phases 3–6: Synthesis, Solve, Review, and Present
|
|
2
|
-
|
|
3
|
-
---
|
|
4
|
-
|
|
5
|
-
## Phase 3 (War Room Phase 2 in original numbering): Synthesis & Solve
|
|
6
|
-
|
|
7
|
-
After all investigation agents complete, synthesize their findings before solving.
|
|
8
|
-
|
|
9
|
-
### Step 1: Cross-Reference Findings
|
|
10
|
-
|
|
11
|
-
Read all four reports and synthesize:
|
|
12
|
-
|
|
13
|
-
1. **Do the hypotheses match the research?** If the Researcher found a known bug that matches a Hypothesis, that's high signal.
|
|
14
|
-
2. **Does the reproduction confirm a hypothesis?** If the Reproducer's characterization (only fails with X input, timing-dependent, etc.) matches a hypothesis's prediction, that's strong evidence.
|
|
15
|
-
3. **What does the instrumentation suggest?** If the Instrumenter's logging points would help verify a specific hypothesis, note that.
|
|
16
|
-
4. **Are there contradictions?** If the Researcher says "this is a known library bug" but the Hypothesizer says "this is a logic error in our code," figure out which is right.
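
The cross-referencing questions above can be sketched as a small scoring pass. This is only an illustrative sketch: the `Hypothesis` class, the keyword sets, and the scoring rule are hypothetical and not part of the package. It assumes each report has already been reduced to a set of key terms.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    # Hypothetical structure: a title plus the terms the hypothesis predicts.
    title: str
    keywords: set[str]
    score: int = 0
    evidence: list[str] = field(default_factory=list)


def cross_reference(hypotheses, research_terms, repro_terms):
    """Rank hypotheses by how many independent reports support them."""
    for h in hypotheses:
        # Researcher found something matching this hypothesis: high signal.
        if h.keywords & research_terms:
            h.score += 1
            h.evidence.append("research")
        # Reproducer's characterization matches the prediction: strong evidence.
        if h.keywords & repro_terms:
            h.score += 1
            h.evidence.append("reproduction")
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


hyps = [
    Hypothesis("logic error in retry loop", {"retry", "loop"}),
    Hypothesis("race in cache refresh", {"race", "cache", "timing"}),
]
ranked = cross_reference(hyps, {"cache", "upstream"}, {"timing", "intermittent"})
print(ranked[0].title, ranked[0].evidence)
```

A hypothesis supported by both the research and the reproduction sorts first; contradictions between reports still need a human-style judgment that a keyword overlap cannot make.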
|
|
17
|
-
|
|
18
|
-
Present the synthesis to the user briefly:
|
|
19
|
-
|
|
20
|
-
```
|
|
21
|
-
War Room Findings:
|
|
22
|
-
Researcher: [key finding]
|
|
23
|
-
Reproducer: [reproduction status + characterization]
|
|
24
|
-
Hypothesizer: [top hypothesis]
|
|
25
|
-
Instrumenter: [logging added, key observation points]
|
|
26
|
-
|
|
27
|
-
Cross-reference: [how findings align or conflict]
|
|
28
|
-
Recommended fix approach: [what to try first]
|
|
29
|
-
|
|
30
|
-
Proceeding to solve in isolated worktree.
|
|
31
|
-
```
|
|
32
|
-
|
|
33
|
-
### Step 2: Solver Agent Prompt
|
|
34
|
-
|
|
35
|
-
Launch the **Solver agent** in a fresh worktree. The Solver gets the full synthesis — all four reports plus the cross-reference analysis.
|
|
36
|
-
|
|
37
|
-
```
|
|
38
|
-
You are the Solver in a debug war room. The investigation team has
|
|
39
|
-
completed their analysis and you now have comprehensive context. Your
|
|
40
|
-
job is to implement the fix.
|
|
41
|
-
|
|
42
|
-
Working directory: [worktree path]
|
|
43
|
-
Problem: [problem statement]
|
|
44
|
-
Codebase context: [from Phase 0]
|
|
45
|
-
|
|
46
|
-
## Investigation Results
|
|
47
|
-
|
|
48
|
-
[paste full synthesis: Research findings, Reproduction results,
|
|
49
|
-
Hypotheses ranked, Instrumentation notes, Cross-reference analysis]
|
|
50
|
-
|
|
51
|
-
## Execution Rules
|
|
52
|
-
|
|
53
|
-
### Work Incrementally
|
|
54
|
-
- Start with the highest-ranked hypothesis
|
|
55
|
-
- Implement the minimal fix that addresses it
|
|
56
|
-
- COMMIT after each discrete change (not one big commit at the end)
|
|
57
|
-
- Use clear commit messages: "Fix: [what] — addresses hypothesis [N]"
|
|
58
|
-
|
|
59
|
-
### Verify as You Go
|
|
60
|
-
- After each fix attempt, run the reproduction test from REPRODUCTION.md
|
|
61
|
-
- If the project has existing tests, run them too (zero broken windows)
|
|
62
|
-
- If the fix works on the reproduction but breaks other tests, that's
|
|
63
|
-
not done — fix the regressions too
|
|
64
|
-
|
|
65
|
-
### If the First Hypothesis Doesn't Pan Out
|
|
66
|
-
- Don't keep hacking at it. Move to hypothesis #2.
|
|
67
|
-
- Revert the failed attempt (git revert or fresh branch) so each
|
|
68
|
-
attempt starts clean
|
|
69
|
-
- If you exhaust all hypotheses, say so — don't invent new ones
|
|
70
|
-
without evidence
|
|
71
|
-
|
|
72
|
-
### Clean Up After Yourself
|
|
73
|
-
- Remove any debug logging you added (unless the user wants to keep it)
|
|
74
|
-
- Make sure the fix is minimal — don't refactor surrounding code
|
|
75
|
-
- Don't add "just in case" error handling beyond what the fix requires
|
|
76
|
-
|
|
77
|
-
### Do NOT Declare Victory
|
|
78
|
-
- You are the Solver, not the Reviewer. Your job ends at "fix committed."
|
|
79
|
-
- Do NOT tell the user "restart X to see the change" — that's the
|
|
80
|
-
Reviewer's job (and the Reviewer must do it, not the user)
|
|
81
|
-
- Do NOT present results directly to the user — hand off to the
|
|
82
|
-
Reviewer agent via FIX-SUMMARY.md
|
|
83
|
-
- Do NOT say the fix works unless you have actually verified it
|
|
84
|
-
by running it. "The code looks correct" is not verification.
|
|
85
|
-
|
|
86
|
-
## Output Format
|
|
87
|
-
|
|
88
|
-
1. All changes committed in the worktree with descriptive messages
|
|
89
|
-
2. Write a file called `FIX-SUMMARY.md` documenting:
|
|
90
|
-
- **Root cause**: What was actually wrong (one paragraph)
|
|
91
|
-
- **Fix applied**: What you changed and why
|
|
92
|
-
- **Files modified**: List with brief descriptions
|
|
93
|
-
- **Commits**: List of commit hashes with messages
|
|
94
|
-
- **Verification**: What tests you ran and their results
|
|
95
|
-
- **Requires restart**: YES/NO — does the fix require restarting
|
|
96
|
-
a process, reloading config, or rebuilding to take effect?
|
|
97
|
-
- **Visual component**: YES/NO — does this bug have a visual or
|
|
98
|
-
experiential symptom that needs visual verification?
|
|
99
|
-
- **Remaining concerns**: Anything that should be monitored or
|
|
100
|
-
might need follow-up
|
|
101
|
-
```
|
|
102
|
-
|
|
103
|
-
---
|
|
104
|
-
|
|
105
|
-
## Phase 4 (War Room Phase 3): Review & Verify
|
|
106
|
-
|
|
107
|
-
**HARD GATE — You cannot proceed to Phase 5 without completing this phase.**
|
|
108
|
-
|
|
109
|
-
This is non-negotiable. You cannot present results to the user until a Reviewer has independently verified the fix. "I checked with grep" is not verification. "The tests pass" is not verification. "The patch was applied" is not verification.
|
|
110
|
-
|
|
111
|
-
Verification means: **the actual behavior the user reported as broken now works correctly, as observed by an agent, with captured evidence.**
|
|
112
|
-
|
|
113
|
-
### Step 1: Determine Verification Method BEFORE Launching the Reviewer
|
|
114
|
-
|
|
115
|
-
Look at the original bug report. Ask: "How would a human know this is fixed?"
|
|
116
|
-
|
|
117
|
-
- If the answer involves SEEING something (UI, terminal output, rendered image, visual layout) → the Reviewer MUST capture a screenshot or visual evidence. Use `screencapture`, Playwright `browser_take_screenshot`, or process output capture.
|
|
118
|
-
- If the answer involves a BEHAVIOR (API returns correct data, CLI produces right output, server responds correctly) → the Reviewer MUST exercise that behavior and capture the output.
|
|
119
|
-
- If the answer is "the error stops happening" → the Reviewer MUST trigger the scenario that caused the error and confirm it no longer occurs.
|
|
120
|
-
|
|
121
|
-
The verification method goes into the Reviewer's prompt. Don't let the Reviewer decide — tell it exactly what to verify and how.
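
As a rough illustration of choosing the method up front, the three cases above could be encoded like this. The cue lists and the `pick_method` helper are assumptions made for the example; a keyword match is a crude stand-in for actually reading the bug report.

```python
from enum import Enum


class Method(Enum):
    VISUAL = "capture a screenshot or rendered output"
    BEHAVIOR = "exercise the behavior and capture its output"
    ERROR_GONE = "trigger the failing scenario and confirm no error"


# Hypothetical cue words; tune these for the kinds of reports you receive.
VISUAL_CUES = ("looks", "display", "render", "layout", "show", "see")
ERROR_CUES = ("error", "crash", "exception", "traceback")


def pick_method(bug_report: str) -> Method:
    """Map a bug report to the verification method the Reviewer must use."""
    text = bug_report.lower()
    if any(cue in text for cue in VISUAL_CUES):
        return Method.VISUAL
    if any(cue in text for cue in ERROR_CUES):
        return Method.ERROR_GONE
    return Method.BEHAVIOR


print(pick_method("The status line does not show up").name)
print(pick_method("Server throws an exception on startup").name)
print(pick_method("API returns stale data").name)
```

The chosen method's description can then be pasted into the Reviewer's prompt so the Reviewer does not have to decide for itself.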
|
|
122
|
-
|
|
123
|
-
### Step 2: If the Fix Requires a Restart, the Reviewer Handles It
|
|
124
|
-
|
|
125
|
-
Many fixes (bundle patches, config changes, build artifacts) require restarting a process to take effect. The Reviewer must:
|
|
126
|
-
|
|
127
|
-
1. Restart the process (use `osascript` to launch in a new terminal if needed, or kill and restart the background process)
|
|
128
|
-
2. Wait for it to initialize
|
|
129
|
-
3. Exercise the fixed behavior
|
|
130
|
-
4. Capture evidence (screenshot, output, logs)
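
For step 2, a polling loop is usually more reliable than a fixed sleep. The helper below is a minimal sketch; `wait_until` and the stand-in readiness check are hypothetical, and a real check might probe a port, a PID file, or a log line.

```python
import time


def wait_until(check, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


# Stand-in readiness check that succeeds on the third poll.
attempts = {"count": 0}


def process_ready() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(process_ready, timeout=5.0, interval=0.01)
print(ready, attempts["count"])
```

If `wait_until` returns `False`, the process never initialized, which is itself evidence worth recording.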
|
|
131
|
-
|
|
132
|
-
If the Reviewer literally cannot restart the process because it is running inside the process being fixed, try these alternatives first:
|
|
133
|
-
|
|
134
|
-
1. **Launch a SEPARATE instance** via osascript/terminal:
|
|
135
|
-
```bash
|
|
136
|
-
osascript -e 'tell application "Terminal" to do script "cd /path && claude --print \"hello\""'
|
|
137
|
-
sleep 5
|
|
138
|
-
screencapture -x /tmp/verification.png
|
|
139
|
-
```
|
|
140
|
-
Then READ the screenshot to verify.
|
|
141
|
-
|
|
142
|
-
2. **Launch via background process** and capture output:
|
|
143
|
-
```bash
|
|
144
|
-
nohup claude --print "test" > /tmp/claude-output.txt 2>&1 &
|
|
145
|
-
sleep 5
|
|
146
|
-
cat /tmp/claude-output.txt
|
|
147
|
-
```
|
|
148
|
-
|
|
149
|
-
3. **Use Playwright MCP** if available to screenshot a running instance.
|
|
150
|
-
|
|
151
|
-
Only if ALL of these are impossible should you flag as BLOCKED. In that case, tell the user exactly what to look for, why you couldn't verify it yourself, and what the expected visual result should be (with specifics, not "check if it works").
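
The background-process alternative above can also be sketched in Python, with a timeout in place of a fixed sleep. This is an assumption-laden sketch: `run_and_capture` and the evidence filename are hypothetical, and the Python one-liner stands in for whatever command is being verified.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(cmd: list[str], evidence_path: Path, timeout: float = 30.0) -> int:
    """Launch cmd as a fresh process and save stdout and stderr as evidence."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    evidence_path.write_text(result.stdout + result.stderr)
    return result.returncode


evidence = Path(tempfile.gettempdir()) / "verification-output.txt"
# Stand-in command; replace with the real process under verification.
code = run_and_capture([sys.executable, "-c", "print('status line ok')"], evidence)
print(code, evidence.read_text().strip())
```

The saved file is the captured evidence; the exit code alone does not show that the reported behavior is fixed.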
|
|
152
|
-
|
|
153
|
-
### Step 3: Reviewer Agent Prompt
|
|
154
|
-
|
|
155
|
-
```
|
|
156
|
-
You are the Reviewer in a debug war room. The Solver has implemented a
|
|
157
|
-
fix and your job is to verify it actually works, doesn't break anything
|
|
158
|
-
else, and is the right approach.
|
|
159
|
-
|
|
160
|
-
Working directory: [solver's worktree path]
|
|
161
|
-
Problem: [original problem statement]
|
|
162
|
-
Fix summary: [from FIX-SUMMARY.md]
|
|
163
|
-
Reproduction: [from REPRODUCTION.md]
|
|
164
|
-
|
|
165
|
-
## Review Checklist
|
|
166
|
-
|
|
167
|
-
### 1. Does the Fix Address the Root Cause?
|
|
168
|
-
- Read the fix diff carefully
|
|
169
|
-
- Does it fix the actual root cause, or just mask the symptom?
|
|
170
|
-
- Could the same bug recur in a different form?
|
|
171
|
-
- Is the fix in the right layer of abstraction?
|
|
172
|
-
|
|
173
|
-
### 2. Reproduction Verification (YOU MUST RUN THESE — do not list them for the user)
|
|
174
|
-
- EXECUTE the reproduction test — it should PASS now
|
|
175
|
-
- Run it multiple times if the bug was intermittent
|
|
176
|
-
- Try variations of the reproduction (different inputs, timing, config)
|
|
177
|
-
- Capture the actual output/logs as evidence
|
|
178
|
-
|
|
179
|
-
### 3. Regression Check (YOU MUST RUN THESE)
|
|
180
|
-
- EXECUTE the full test suite and capture results
|
|
181
|
-
- EXECUTE linting and type checking
|
|
182
|
-
- EXECUTE any build steps and verify success
|
|
183
|
-
- If the fix involves a running process (server, CLI tool, UI):
|
|
184
|
-
launch it, exercise the fixed behavior, check logs, and capture
|
|
185
|
-
evidence that it works
|
|
186
|
-
|
|
187
|
-
### 4. Live Verification (critical — tests passing is NECESSARY but NOT SUFFICIENT)
|
|
188
|
-
|
|
189
|
-
Tests verify code structure. Live verification proves the feature actually
|
|
190
|
-
works as experienced by a user. Many bugs exist in the gap between "all
|
|
191
|
-
tests pass" and "it actually works." Your job is to close that gap.
|
|
192
|
-
|
|
193
|
-
**Why this matters**: A test can assert that a function returns the right
|
|
194
|
-
value, but that doesn't prove the function gets called, its output reaches
|
|
195
|
-
the renderer, the renderer handles it correctly, and the user sees the
|
|
196
|
-
expected result. Each layer can silently fail while tests pass.
|
|
197
|
-
|
|
198
|
-
#### Automated Runtime Verification (always do these)
|
|
199
|
-
- If the fix involves a server/process: START it, EXERCISE the fixed
|
|
200
|
-
behavior via curl/CLI/API calls, READ stdout/stderr, CAPTURE evidence
|
|
201
|
-
- If the fix involves CLI output: RUN the command, CAPTURE the output,
|
|
202
|
-
COMPARE against expected output
|
|
203
|
-
- If the fix involves log output: RUN the code, READ the log file,
|
|
204
|
-
CONFIRM expected entries appear
|
|
205
|
-
- If the fix involves a build: RUN the build, VERIFY the output artifact
|
|
206
|
-
exists and contains expected content (grep/inspect the built files)
|
|
207
|
-
- If the fix involves configuration: LOAD the config, VERIFY the values
|
|
208
|
-
propagate to where they're used at runtime (not just that the config
|
|
209
|
-
file is correct)
|
|
210
|
-
|
|
211
|
-
#### Visual/Runtime Verification (when the bug has a visual or interactive component)
|
|
212
|
-
|
|
213
|
-
Some bugs only manifest visually — terminal rendering, UI display, image
|
|
214
|
-
output, interactive behavior. Tests can't catch these. You must verify
|
|
215
|
-
the actual rendered result.
|
|
216
|
-
|
|
217
|
-
**Techniques for visual verification:**
|
|
218
|
-
|
|
219
|
-
1. **Playwright/browser automation**: For web UIs, launch Playwright,
|
|
220
|
-
navigate to the page, take a screenshot, and inspect the DOM. Check
|
|
221
|
-
that elements are visible, correctly positioned, and contain expected
|
|
222
|
-
content. This catches CSS bugs, rendering issues, and layout breaks
|
|
223
|
-
that pass all unit tests.
|
|
224
|
-
|
|
225
|
-
2. **AppleScript + screenshot** (macOS): For native apps, CLI tools with
|
|
226
|
-
visual output, or terminal-rendered content:
|
|
227
|
-
```
|
|
228
|
-
# Launch the application via AppleScript
|
|
229
|
-
osascript -e 'tell application "Terminal" to do script "your-command"'
|
|
230
|
-
# Wait for it to render, then capture
|
|
231
|
-
screencapture -x /tmp/verification-screenshot.png
|
|
232
|
-
```
|
|
233
|
-
Then read the screenshot to verify the visual result.
|
|
234
|
-
|
|
235
|
-
3. **Process output capture**: For CLI tools and terminal UIs, run the
|
|
236
|
-
command with output capture (script command, tee, or redirect) and
|
|
237
|
-
inspect the raw output including ANSI codes, escape sequences, and
|
|
238
|
-
control characters that affect rendering.
|
|
239
|
-
|
|
240
|
-
4. **Playwright for Electron/web-based tools**: Many modern tools
|
|
241
|
-
(VS Code extensions, Electron apps, web dashboards) can be automated
|
|
242
|
-
with Playwright. Use `browser_navigate`, `browser_snapshot`, and
|
|
243
|
-
`browser_take_screenshot` to verify rendered state.
|
|
244
|
-
|
|
245
|
-
5. **ftm-browse ($PB) for UI verification**: If ftm-browse is
|
|
246
|
-
installed, use it for visual verification of web UI bugs. First check
|
|
247
|
-
whether the binary exists:
|
|
248
|
-
```bash
|
|
249
|
-
PB="$HOME/.claude/skills/ftm-browse/bin/ftm-browse"
|
|
250
|
-
```
|
|
251
|
-
If the binary exists at that path, use it:
|
|
252
|
-
- **Navigate**: `$PB goto <url>` — open the affected page
|
|
253
|
-
- **Before screenshot**: `$PB screenshot --path /tmp/debug-before.png`
|
|
254
|
-
(capture state BEFORE verifying the fix is live, if you need a
|
|
255
|
-
before/after comparison — do this before the fix is applied or on
|
|
256
|
-
a pre-fix worktree)
|
|
257
|
-
- **After screenshot**: `$PB screenshot --path /tmp/debug-after.png`
|
|
258
|
-
(capture state AFTER fix is applied and running)
|
|
259
|
-
- **DOM inspection**: `$PB snapshot -i` — get the interactive ARIA
|
|
260
|
-
tree to verify element existence, visibility, and state
|
|
261
|
-
(e.g., confirm a button is now visible, a panel is collapsed,
|
|
262
|
-
an error message is gone)
|
|
263
|
-
- Report both screenshot paths in REVIEW-VERDICT.md so the user
|
|
264
|
-
can compare before/after visually.
|
|
265
|
-
|
|
266
|
-
**Graceful fallback**: If the binary does NOT exist at
|
|
267
|
-
`$HOME/.claude/skills/ftm-browse/bin/ftm-browse`, fall back to
|
|
268
|
-
test-only and other available verification methods (Playwright, etc.).
|
|
269
|
-
Do NOT fail the review. Record in the Verification Gate section:
|
|
270
|
-
"Visual verification skipped — ftm-browse not installed."
|
|
271
|
-
|
|
272
|
-
**When to use visual verification:**
|
|
273
|
-
- Terminal rendering (status lines, TUI elements, colored output, unicode)
|
|
274
|
-
- Web UI changes (layout, styling, visibility, interaction)
|
|
275
|
-
- Image/PDF/document generation (verify output visually, not just file size)
|
|
276
|
-
- Any bug where "it looks wrong" was part of the symptom
|
|
277
|
-
- Any fix where tests pass but you're not 100% confident the user will
|
|
278
|
-
see the correct result
|
|
279
|
-
|
|
280
|
-
**The rule**: If the bug was reported as something the user SAW (or didn't
|
|
281
|
-
see), verification must confirm what the user will SEE (or will now see).
|
|
282
|
-
Passing tests are evidence, not proof. Visual confirmation is proof.
|
|
283
|
-
|
|
284
|
-
#### Never Do This
|
|
285
|
-
- NEVER write "How to verify: run X" — instead, RUN X yourself and
|
|
286
|
-
report what happened
|
|
287
|
-
- NEVER say "restart the app to see the change" — restart it yourself,
|
|
288
|
-
observe the result, report back
|
|
289
|
-
- NEVER assume tests passing = feature working. Tests verify code paths.
|
|
290
|
-
Live verification proves the feature delivers its intended experience.
|
|
291
|
-
|
|
292
|
-
### 5. Code Quality
|
|
293
|
-
- Is the fix minimal and focused?
|
|
294
|
-
- Does it follow the project's existing patterns?
|
|
295
|
-
- Are there edge cases the fix doesn't handle?
|
|
296
|
-
- Is error handling appropriate (not excessive, not missing)?
|
|
297
|
-
|
|
298
|
-
### 6. Observability
|
|
299
|
-
- Will this failure mode be visible if it happens again?
|
|
300
|
-
- Should any permanent logging or monitoring be added?
|
|
301
|
-
- Are there metrics or alerts that should be updated?
|
|
302
|
-
|
|
303
|
-
## Mandatory Verification Gate
|
|
304
|
-
|
|
305
|
-
Before writing the verdict, answer these two questions:
|
|
306
|
-
|
|
307
|
-
**Q1: Was the bug reported as something visual/experiential?**
|
|
308
|
-
(Did the user say "it doesn't show up", "it looks wrong", "the UI is broken",
|
|
309
|
-
"nothing happens when I click", "the output is garbled", etc.)
|
|
310
|
-
|
|
311
|
-
If YES → Visual verification is REQUIRED. You cannot approve without
|
|
312
|
-
capturing a screenshot, reading rendered output, or observing the
|
|
313
|
-
running application. Grep checks and log analysis are not sufficient.
|
|
314
|
-
|
|
315
|
-
If NO → Automated runtime verification (running tests, checking output)
|
|
316
|
-
is sufficient.
|
|
317
|
-
|
|
318
|
-
**Q2: Does the fix require restarting a process to take effect?**
|
|
319
|
-
(Patching a bundle, changing config loaded at startup, modifying
|
|
320
|
-
compiled artifacts, etc.)
|
|
321
|
-
|
|
322
|
-
If YES → YOU must restart the process, observe the result, and capture
|
|
323
|
-
evidence. Do not tell the user to restart — do it yourself:
|
|
324
|
-
```
|
|
325
|
-
# Example: restart a CLI tool and capture its output
|
|
326
|
-
osascript -e 'tell application "Terminal" to do script "cd /path && your-command"'
|
|
327
|
-
sleep 3
|
|
328
|
-
screencapture -x /tmp/verification-screenshot.png
|
|
329
|
-
# Then READ the screenshot to verify
|
|
330
|
-
```
|
|
331
|
-
|
|
332
|
-
If you cannot restart the process (e.g., it's the very tool you're
|
|
333
|
-
running inside), this is one of the rare legitimate cases to ask the
|
|
334
|
-
user — but you MUST say what specific thing to look for and why you
|
|
335
|
-
couldn't verify it yourself.
|
|
336
|
-
|
|
337
|
-
## Output Format
|
|
338
|
-
|
|
339
|
-
Write a file called `REVIEW-VERDICT.md` with:
|
|
340
|
-
|
|
341
|
-
### Verdict: [APPROVED / APPROVED WITH CHANGES / NEEDS REWORK]
|
|
342
|
-
|
|
343
|
-
### Verification Gate
|
|
344
|
-
- Bug is visual/experiential: [YES/NO]
|
|
345
|
-
- Fix requires process restart: [YES/NO]
|
|
346
|
-
- Visual verification performed: [YES — describe what was captured / NO — explain why not required / BLOCKED — explain why agent couldn't do it]
|
|
347
|
-
|
|
348
|
-
### Fix Verification
|
|
349
|
-
- Reproduction test: [PASS/FAIL — actual output]
|
|
350
|
-
- Full test suite: [PASS/FAIL with details]
|
|
351
|
-
- Build: [PASS/FAIL]
|
|
352
|
-
- Lint/typecheck: [PASS/FAIL]
|
|
353
|
-
- Runtime verification: [what was run, what was observed]
|
|
354
|
-
- Visual verification: [screenshot path, DOM snapshot, or rendered output captured — or N/A with reason]
|
|
355
|
-
|
|
356
|
-
### Code Review Notes
|
|
357
|
-
- [specific observations, line references]
|
|
358
|
-
|
|
359
|
-
### Concerns
|
|
360
|
-
- [anything that needs attention]
|
|
361
|
-
|
|
362
|
-
### Recommended Follow-ups
|
|
363
|
-
- [monitoring, tests to add, documentation to update]
|
|
364
|
-
```
|
|
365
|
-
|
|
366
|
-
If the Reviewer says **NEEDS REWORK**, send the feedback back to the Solver agent for another iteration. The Solver-Reviewer loop continues until the verdict is APPROVED (max 3 iterations — after that, escalate to the user with full context of what's been tried).
|
|
367
|
-
|
|
368
|
-
---
|
|
369
|
-
|
|
370
|
-
## Phase 5 (War Room Phase 4): Present Results
|
|
371
|
-
|
|
372
|
-
**CHECKPOINT: Before presenting, confirm these are true:**
|
|
373
|
-
- [ ] A Reviewer agent was spawned (not just the Solver declaring victory)
|
|
374
|
-
- [ ] The Reviewer's verdict includes actual evidence (output captures, screenshots, log snippets — not just "PASS")
|
|
375
|
-
- [ ] If the bug was visual, visual evidence was captured
|
|
376
|
-
- [ ] If the fix required a restart, the restart happened and post-restart behavior was verified
|
|
377
|
-
- [ ] No "How to Verify" or "Restart X to see the change" instructions are included in the presentation
|
|
378
|
-
|
|
379
|
-
If any of these are false, you are not ready to present. Go back to Phase 4.
|
|
380
|
-
|
|
381
|
-
Once the Reviewer approves, present the full results to the user:
|
|
382
|
-
|
|
383
|
-
```
|
|
384
|
-
## Debug War Room Complete
|
|
385
|
-
|
|
386
|
-
### Root Cause
|
|
387
|
-
[One paragraph explaining what was wrong — clear enough that someone
|
|
388
|
-
unfamiliar with the code would understand]
|
|
389
|
-
|
|
390
|
-
### What Changed
|
|
391
|
-
[List of files modified with brief descriptions]
|
|
392
|
-
|
|
393
|
-
### Verification Already Performed
|
|
394
|
-
[These are things the Reviewer ALREADY RAN — not suggestions for the
|
|
395
|
-
user to do. Include actual output/evidence.]
|
|
396
|
-
- Reproduction test: PASS — [actual output snippet]
|
|
397
|
-
- Full test suite: PASS — [X tests passed, 0 failures]
|
|
398
|
-
- Build: PASS
|
|
399
|
-
- Runtime verification: [command run, output captured, expected vs actual]
|
|
400
|
-
- Visual verification (if applicable): [what was launched, screenshot/DOM
|
|
401
|
-
evidence, what the user will see — this closes the gap between "tests
|
|
402
|
-
pass" and "it actually works"]
|
|
403
|
-
- Reviewer verdict: APPROVED
|
|
404
|
-
|
|
405
|
-
### Key Findings
|
|
406
|
-
- [Top research findings that informed the fix]
|
|
407
|
-
- [Instrumentation insights that revealed the bug]
|
|
408
|
-
- [Hypotheses that were tested, including ones that were wrong — these
|
|
409
|
-
help the user's understanding]
|
|
410
|
-
|
|
411
|
-
### Commits (in worktree: [branch name])
|
|
412
|
-
[List of commits with messages]
|
|
413
|
-
|
|
414
|
-
Ready to merge. All automated verification has passed.
|
|
415
|
-
```
|
|
416
|
-
|
|
417
|
-
**Do NOT include a "How to Verify Yourself" section with manual steps.** If there is any verification that can be automated, the Reviewer must have already done it. The only reason to mention verification steps to the user is if something genuinely requires human judgment (visual design review, business logic confirmation) — and even then, explain what the agents already checked and what specifically needs a human eye.
|
|
418
|
-
|
|
419
|
-
Wait for the user to validate. Once they confirm:
|
|
420
|
-
|
|
421
|
-
1. Merge the solver's worktree branch to main
|
|
422
|
-
2. Clean up all worktrees and branches
|
|
423
|
-
3. Remove any remaining debug instrumentation (unless the user wants to keep it)
|
|
424
|
-
|
|
425
|
-
---
|
|
426
|
-
|
|
427
|
-
## Phase 6: Escalation Protocol
|
|
428
|
-
|
|
429
|
-
If after 3 Solver-Reviewer iterations the fix still isn't approved:
|
|
430
|
-
|
|
431
|
-
1. Present everything to the user: all hypotheses tested, all fix attempts, all review feedback
|
|
432
|
-
2. Ask the user for direction — they may have context that wasn't available to the agents
|
|
433
|
-
3. If the user provides new information, restart from Phase 1 with the new context
|
|
434
|
-
4. If the user wants to pair on it, switch to interactive debugging with all the instrumentation and research already done as context
|
|
435
|
-
|
|
436
|
-
The war room is powerful but not omniscient. Sometimes the bug requires domain knowledge only the user has. The goal is to do 90% of the work so the user's intervention is a focused 10%.
|
|
1
|
+
# Phases 3–6: Synthesis, Solve, Review, and Present
|
|
2
|
+
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
## Phase 3 (War Room Phase 2 in original numbering): Synthesis & Solve
|
|
6
|
+
|
|
7
|
+
After all investigation agents complete, synthesize their findings before solving.
|
|
8
|
+
|
|
9
|
+
### Step 1: Cross-Reference Findings
|
|
10
|
+
|
|
11
|
+
Read all four reports and synthesize:
|
|
12
|
+
|
|
13
|
+
1. **Do the hypotheses match the research?** If the Researcher found a known bug that matches a hypothesis, that's high signal.
|
|
14
|
+
2. **Does the reproduction confirm a hypothesis?** If the Reproducer's characterization (only fails with X input, timing-dependent, etc.) matches a hypothesis's prediction, that's strong evidence.
|
|
15
|
+
3. **What does the instrumentation suggest?** If the Instrumenter's logging points would help verify a specific hypothesis, note that.
|
|
16
|
+
4. **Are there contradictions?** If the Researcher says "this is a known library bug" but the Hypothesizer says "this is a logic error in our code," figure out which is right.
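The cross-referencing steps above can be sketched as a small scoring pass. This is only an illustration under stated assumptions: the `Hypothesis` fields, the one-point-per-corroboration weighting, and the function names are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate root cause and which other reports support it (hypothetical shape)."""
    summary: str
    matches_research: bool = False      # Researcher found a matching known bug
    matches_reproduction: bool = False  # Reproducer's characterization fits the prediction
    has_instrumentation: bool = False   # Instrumenter's logging can confirm or refute it
    contradicted_by: tuple = ()         # names of reports that disagree


def rank(hypotheses):
    """Order by corroboration count; among ties, contradicted hypotheses sort last."""
    def score(h):
        support = h.matches_research + h.matches_reproduction + h.has_instrumentation
        return (support, -len(h.contradicted_by))
    return sorted(hypotheses, key=score, reverse=True)


def contradictions(hypotheses):
    """Return hypotheses whose conflicts should be resolved before solving."""
    return [h for h in hypotheses if h.contradicted_by]
```

The ranked list would feed the "Recommended fix approach" line of the synthesis, and anything returned by `contradictions` is a conflict to settle first.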
|
|
17
|
+
|
|
18
|
+
Present the synthesis to the user briefly:
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
War Room Findings:
|
|
22
|
+
Researcher: [key finding]
|
|
23
|
+
Reproducer: [reproduction status + characterization]
|
|
24
|
+
Hypothesizer: [top hypothesis]
|
|
25
|
+
Instrumenter: [logging added, key observation points]
|
|
26
|
+
|
|
27
|
+
Cross-reference: [how findings align or conflict]
|
|
28
|
+
Recommended fix approach: [what to try first]
|
|
29
|
+
|
|
30
|
+
Proceeding to solve in isolated worktree.
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
### Step 2: Solver Agent Prompt
|
|
34
|
+
|
|
35
|
+
Launch the **Solver agent** in a fresh worktree. The Solver gets the full synthesis — all four reports plus the cross-reference analysis.
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
You are the Solver in a debug war room. The investigation team has
|
|
39
|
+
completed their analysis and you now have comprehensive context. Your
|
|
40
|
+
job is to implement the fix.
|
|
41
|
+
|
|
42
|
+
Working directory: [worktree path]
|
|
43
|
+
Problem: [problem statement]
|
|
44
|
+
Codebase context: [from Phase 0]
|
|
45
|
+
|
|
46
|
+
## Investigation Results
|
|
47
|
+
|
|
48
|
+
[paste full synthesis: Research findings, Reproduction results,
|
|
49
|
+
Hypotheses ranked, Instrumentation notes, Cross-reference analysis]
|
|
50
|
+
|
|
51
|
+
## Execution Rules
|
|
52
|
+
|
|
53
|
+
### Work Incrementally
|
|
54
|
+
- Start with the highest-ranked hypothesis
|
|
55
|
+
- Implement the minimal fix that addresses it
|
|
56
|
+
- COMMIT after each discrete change (not one big commit at the end)
|
|
57
|
+
- Use clear commit messages: "Fix: [what] — addresses hypothesis [N]"
|
|
58
|
+
|
|
59
|
+
### Verify as You Go
|
|
60
|
+
- After each fix attempt, run the reproduction test from REPRODUCTION.md
|
|
61
|
+
- If the project has existing tests, run them too (zero broken windows)
|
|
62
|
+
- If the fix works on the reproduction but breaks other tests, that's
|
|
63
|
+
not done — fix the regressions too
|
|
64
|
+
|
|
65
|
+
### If the First Hypothesis Doesn't Pan Out
|
|
66
|
+
- Don't keep hacking at it. Move to hypothesis #2.
|
|
67
|
+
- Revert the failed attempt (git revert or fresh branch) so each
|
|
68
|
+
attempt starts clean
|
|
69
|
+
- If you exhaust all hypotheses, say so — don't invent new ones
|
|
70
|
+
without evidence
|
|
71
|
+
|
|
72
|
+
### Clean Up After Yourself
|
|
73
|
+
- Remove any debug logging you added (unless the user wants to keep it)
|
|
74
|
+
- Make sure the fix is minimal — don't refactor surrounding code
|
|
75
|
+
- Don't add "just in case" error handling beyond what the fix requires
|
|
76
|
+
|
|
77
|
+
### Do NOT Declare Victory
|
|
78
|
+
- You are the Solver, not the Reviewer. Your job ends at "fix committed."
|
|
79
|
+
- Do NOT tell the user "restart X to see the change" — that's the
|
|
80
|
+
Reviewer's job (and the Reviewer must do it, not the user)
|
|
81
|
+
- Do NOT present results directly to the user — hand off to the
|
|
82
|
+
Reviewer agent via FIX-SUMMARY.md
|
|
83
|
+
- Do NOT say the fix works unless you have actually verified it
|
|
84
|
+
by running it. "The code looks correct" is not verification.
|
|
85
|
+
|
|
86
|
+
## Output Format
|
|
87
|
+
|
|
88
|
+
1. All changes committed in the worktree with descriptive messages
|
|
89
|
+
2. Write a file called `FIX-SUMMARY.md` documenting:
|
|
90
|
+
- **Root cause**: What was actually wrong (one paragraph)
|
|
91
|
+
- **Fix applied**: What you changed and why
|
|
92
|
+
- **Files modified**: List with brief descriptions
|
|
93
|
+
- **Commits**: List of commit hashes with messages
|
|
94
|
+
- **Verification**: What tests you ran and their results
|
|
95
|
+
- **Requires restart**: YES/NO — does the fix require restarting
|
|
96
|
+
a process, reloading config, or rebuilding to take effect?
|
|
97
|
+
- **Visual component**: YES/NO — does this bug have a visual or
|
|
98
|
+
experiential symptom that needs visual verification?
|
|
99
|
+
- **Remaining concerns**: Anything that should be monitored or
|
|
100
|
+
might need follow-up
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Phase 4 (War Room Phase 3): Review & Verify
|
|
106
|
+
|
|
107
|
+
**HARD GATE — You cannot proceed to Phase 5 without completing this phase.**
|
|
108
|
+
|
|
109
|
+
This is non-negotiable. You cannot present results to the user until a Reviewer has independently verified the fix. "I checked with grep" is not verification. "The tests pass" is not verification. "The patch was applied" is not verification.
|
|
110
|
+
|
|
111
|
+
Verification means: **the actual behavior the user reported as broken now works correctly, as observed by an agent, with captured evidence.**
|
|
112
|
+
|
|
113
|
+
### Step 1: Determine Verification Method BEFORE Launching the Reviewer
|
|
114
|
+
|
|
115
|
+
Look at the original bug report. Ask: "How would a human know this is fixed?"
|
|
116
|
+
|
|
117
|
+
- If the answer involves SEEING something (UI, terminal output, rendered image, visual layout) → the Reviewer MUST capture a screenshot or visual evidence. Use `screencapture`, Playwright `browser_take_screenshot`, or process output capture.
|
|
118
|
+
- If the answer involves a BEHAVIOR (API returns correct data, CLI produces right output, server responds correctly) → the Reviewer MUST exercise that behavior and capture the output.
|
|
119
|
+
- If the answer is "the error stops happening" → the Reviewer MUST trigger the scenario that caused the error and confirm it no longer occurs.
|
|
120
|
+
|
|
121
|
+
The verification method goes into the Reviewer's prompt. Don't let the Reviewer decide — tell it exactly what to verify and how.
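One way to picture the decision above is a small classifier over the answer to "How would a human know this is fixed?". This is a rough sketch, not the skill's implementation: the keyword sets and function names are assumptions, and a real orchestrator would judge the bug report itself rather than match words.

```python
import re
from enum import Enum


class Method(Enum):
    VISUAL = "capture a screenshot or rendered output"
    BEHAVIOR = "exercise the behavior and capture its output"
    ERROR_GONE = "trigger the failing scenario and confirm no error"


# Hypothetical keyword sets, chosen only for this example.
VISUAL_HINTS = {"looks", "see", "sees", "visible", "displayed", "rendered", "layout", "screen", "ui"}
ERROR_HINTS = {"error", "errors", "crash", "crashes", "exception", "traceback"}


def choose_method(answer):
    """Classify the answer to 'How would a human know this is fixed?'."""
    tokens = set(re.findall(r"[a-z]+", answer.lower()))
    if tokens & VISUAL_HINTS:
        return Method.VISUAL
    if tokens & ERROR_HINTS:
        return Method.ERROR_GONE
    return Method.BEHAVIOR


def reviewer_instruction(answer):
    """Produce the line that goes into the Reviewer's prompt."""
    return f"Verification method: {choose_method(answer).value}. Evidence is required."
```

The point of the sketch is that the method is fixed before the Reviewer launches and is passed in as an instruction, rather than left for the Reviewer to choose.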
|
|
122
|
+
|
|
123
|
+
### Step 2: If the Fix Requires a Restart, the Reviewer Handles It
|
|
124
|
+
|
|
125
|
+
Many fixes (bundle patches, config changes, build artifacts) require restarting a process to take effect. The Reviewer must:
|
|
126
|
+
|
|
127
|
+
1. Restart the process (use `osascript` to launch in a new terminal if needed, or kill and restart the background process)
|
|
128
|
+
2. Wait for it to initialize
|
|
129
|
+
3. Exercise the fixed behavior
|
|
130
|
+
4. Capture evidence (screenshot, output, logs)
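For a one-shot CLI, the four steps above might look like the following sketch. It assumes the process can be started from a plain command list and that its stdout is the evidence. The function name and the returned dictionary shape are hypothetical, and a long-running server would need a separate "exercise" step after startup.

```python
import subprocess
import sys


def restart_and_capture(command, previous=None, timeout=30):
    """Stop the stale process (if any), run the command fresh, and return captured evidence."""
    if previous is not None and previous.poll() is None:
        previous.terminate()            # stop the process still running the old code
        previous.wait(timeout=timeout)  # make sure it is gone before relaunching
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {
        "command": command,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


if __name__ == "__main__":
    # Stand-in command; a real review would run the tool that was fixed.
    evidence = restart_and_capture([sys.executable, "-c", "print('fixed')"])
    print(evidence["stdout"])
```

The returned dictionary is the kind of captured output that belongs in the verdict, as opposed to a bare "PASS".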
|
|
131
|
+
|
|
132
|
+
If the Reviewer literally cannot restart because it's running inside the process being fixed, try these alternatives first:
|
|
133
|
+
|
|
134
|
+
1. **Launch a SEPARATE instance** via osascript/terminal:
|
|
135
|
+
```bash
|
|
136
|
+
osascript -e 'tell application "Terminal" to do script "cd /path && claude --print \"hello\""'
|
|
137
|
+
sleep 5
|
|
138
|
+
screencapture -x /tmp/verification.png
|
|
139
|
+
```
|
|
140
|
+
Then READ the screenshot to verify.
|
|
141
|
+
|
|
142
|
+
2. **Launch via background process** and capture output:
|
|
143
|
+
```bash
|
|
144
|
+
nohup claude --print "test" > /tmp/claude-output.txt 2>&1 &
|
|
145
|
+
sleep 5
|
|
146
|
+
cat /tmp/claude-output.txt
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
3. **Use Playwright MCP** if available to screenshot a running instance.
|
|
150
|
+
|
|
151
|
+
Only if ALL of these are impossible should you flag the verification as BLOCKED. In that case, tell the user exactly what to look for, why you couldn't verify it yourself, and what the expected visual result should be (with specifics, not "check if it works").
|
|
152
|
+
|
|
153
|
+
### Step 3: Reviewer Agent Prompt
|
|
154
|
+
|
|
155
|
+
```
|
|
156
|
+
You are the Reviewer in a debug war room. The Solver has implemented a
|
|
157
|
+
fix and your job is to verify it actually works, doesn't break anything
|
|
158
|
+
else, and is the right approach.
|
|
159
|
+
|
|
160
|
+
Working directory: [solver's worktree path]
|
|
161
|
+
Problem: [original problem statement]
|
|
162
|
+
Fix summary: [from FIX-SUMMARY.md]
|
|
163
|
+
Reproduction: [from REPRODUCTION.md]
|
|
164
|
+
|
|
165
|
+
## Review Checklist
|
|
166
|
+
|
|
167
|
+
### 1. Does the Fix Address the Root Cause?
|
|
168
|
+
- Read the fix diff carefully
|
|
169
|
+
- Does it fix the actual root cause, or just mask the symptom?
|
|
170
|
+
- Could the same bug recur in a different form?
|
|
171
|
+
- Is the fix in the right layer of abstraction?
|
|
172
|
+
|
|
173
|
+
### 2. Reproduction Verification (YOU MUST RUN THESE — do not list them for the user)
|
|
174
|
+
- EXECUTE the reproduction test — it should PASS now
|
|
175
|
+
- Run it multiple times if the bug was intermittent
|
|
176
|
+
- Try variations of the reproduction (different inputs, timing, config)
|
|
177
|
+
- Capture the actual output/logs as evidence
|
|
178
|
+
|
|
179
|
+
### 3. Regression Check (YOU MUST RUN THESE)
|
|
180
|
+
- EXECUTE the full test suite and capture results
|
|
181
|
+
- EXECUTE linting and type checking
|
|
182
|
+
- EXECUTE any build steps and verify success
|
|
183
|
+
- If the fix involves a running process (server, CLI tool, UI):
|
|
184
|
+
launch it, exercise the fixed behavior, check logs, and capture
|
|
185
|
+
evidence that it works
|
|
186
|
+
|
|
187
|
+
### 4. Live Verification (critical — tests passing is NECESSARY but NOT SUFFICIENT)
|
|
188
|
+
|
|
189
|
+
Tests verify code structure. Live verification proves the feature actually
|
|
190
|
+
works as experienced by a user. Many bugs exist in the gap between "all
|
|
191
|
+
tests pass" and "it actually works." Your job is to close that gap.
|
|
192
|
+
|
|
193
|
+
**Why this matters**: A test can assert that a function returns the right
|
|
194
|
+
value, but that doesn't prove the function gets called, its output reaches
|
|
195
|
+
the renderer, the renderer handles it correctly, and the user sees the
|
|
196
|
+
expected result. Each layer can silently fail while tests pass.
|
|
197
|
+
|
|
198
|
+
#### Automated Runtime Verification (always do these)
|
|
199
|
+
- If the fix involves a server/process: START it, EXERCISE the fixed
|
|
200
|
+
behavior via curl/CLI/API calls, READ stdout/stderr, CAPTURE evidence
|
|
201
|
+
- If the fix involves CLI output: RUN the command, CAPTURE the output,
|
|
202
|
+
COMPARE against expected output
|
|
203
|
+
- If the fix involves log output: RUN the code, READ the log file,
|
|
204
|
+
CONFIRM expected entries appear
|
|
205
|
+
- If the fix involves a build: RUN the build, VERIFY the output artifact
|
|
206
|
+
exists and contains expected content (grep/inspect the built files)
|
|
207
|
+
- If the fix involves configuration: LOAD the config, VERIFY the values
|
|
208
|
+
propagate to where they're used at runtime (not just that the config
|
|
209
|
+
file is correct)
|
|
210
|
+
|
|
211
|
+
#### Visual/Runtime Verification (when the bug has a visual or interactive component)
|
|
212
|
+
|
|
213
|
+
Some bugs only manifest visually — terminal rendering, UI display, image
|
|
214
|
+
output, interactive behavior. Tests can't catch these. You must verify
|
|
215
|
+
the actual rendered result.
|
|
216
|
+
|
|
217
|
+
**Techniques for visual verification:**
|
|
218
|
+
|
|
219
|
+
1. **Playwright/browser automation**: For web UIs, launch Playwright,
|
|
220
|
+
navigate to the page, take a screenshot, and inspect the DOM. Check
|
|
221
|
+
that elements are visible, correctly positioned, and contain expected
|
|
222
|
+
content. This catches CSS bugs, rendering issues, and layout breaks
|
|
223
|
+
that pass all unit tests.
|
|
224
|
+
|
|
225
|
+
2. **AppleScript + screenshot** (macOS): For native apps, CLI tools with
|
|
226
|
+
visual output, or terminal-rendered content:
|
|
227
|
+
```
|
|
228
|
+
# Launch the application via AppleScript
|
|
229
|
+
osascript -e 'tell application "Terminal" to do script "your-command"'
|
|
230
|
+
# Wait for it to render, then capture
|
|
231
|
+
sleep 3 && screencapture -x /tmp/verification-screenshot.png
|
|
232
|
+
```
|
|
233
|
+
Then read the screenshot to verify the visual result.
|
|
234
|
+
|
|
235
|
+
3. **Process output capture**: For CLI tools and terminal UIs, run the
|
|
236
|
+
command with output capture (script command, tee, or redirect) and
|
|
237
|
+
inspect the raw output including ANSI codes, escape sequences, and
|
|
238
|
+
control characters that affect rendering.
|
|
239
|
+
|
|
240
|
+
4. **Playwright for Electron/web-based tools**: Many modern tools
|
|
241
|
+
(VS Code extensions, Electron apps, web dashboards) can be automated
|
|
242
|
+
with Playwright. Use `browser_navigate`, `browser_snapshot`, and
|
|
243
|
+
`browser_take_screenshot` to verify rendered state.
|
|
244
|
+
|
|
245
|
+
5. **ftm-browse ($PB) for UI verification**: If ftm-browse is
|
|
246
|
+
installed, use it for visual verification of web UI bugs. First check
|
|
247
|
+
whether the binary exists:
|
|
248
|
+
```bash
|
|
249
|
+
PB="$HOME/.claude/skills/ftm-browse/bin/ftm-browse"
|
|
250
|
+
```
|
|
251
|
+
If the binary exists at that path, use it:
|
|
252
|
+
- **Navigate**: `$PB goto <url>` — open the affected page
|
|
253
|
+
- **Before screenshot**: `$PB screenshot --path /tmp/debug-before.png`
|
|
254
|
+
(capture state BEFORE verifying the fix is live, if you need a
|
|
255
|
+
before/after comparison — do this before the fix is applied or on
|
|
256
|
+
a pre-fix worktree)
|
|
257
|
+
- **After screenshot**: `$PB screenshot --path /tmp/debug-after.png`
|
|
258
|
+
(capture state AFTER fix is applied and running)
|
|
259
|
+
- **DOM inspection**: `$PB snapshot -i` — get the interactive ARIA
|
|
260
|
+
tree to verify element existence, visibility, and state
|
|
261
|
+
(e.g., confirm a button is now visible, a panel is collapsed,
|
|
262
|
+
an error message is gone)
|
|
263
|
+
- Report both screenshot paths in REVIEW-VERDICT.md so the user
|
|
264
|
+
can compare before/after visually.
|
|
265
|
+
|
|
266
|
+
**Graceful fallback**: If the binary does NOT exist at
|
|
267
|
+
`$HOME/.claude/skills/ftm-browse/bin/ftm-browse`, fall back to
|
|
268
|
+
test-only and other available verification methods (Playwright, etc.).
|
|
269
|
+
Do NOT fail the review. Record in the Verification Gate section:
|
|
270
|
+
"Visual verification skipped — ftm-browse not installed."
|
|
271
|
+
|
|
272
|
+
**When to use visual verification:**
|
|
273
|
+
- Terminal rendering (status lines, TUI elements, colored output, unicode)
|
|
274
|
+
- Web UI changes (layout, styling, visibility, interaction)
|
|
275
|
+
- Image/PDF/document generation (verify output visually, not just file size)
|
|
276
|
+
- Any bug where "it looks wrong" was part of the symptom
|
|
277
|
+
- Any fix where tests pass but you're not 100% confident the user will
|
|
278
|
+
see the correct result
|
|
279
|
+
|
|
280
|
+
**The rule**: If the bug was reported as something the user SAW (or didn't
|
|
281
|
+
see), verification must confirm what the user will SEE (or will now see).
|
|
282
|
+
Passing tests are evidence, not proof. Visual confirmation is proof.
|
|
283
|
+
|
|
284
|
+
#### Never Do This
|
|
285
|
+
- NEVER write "How to verify: run X" — instead, RUN X yourself and
|
|
286
|
+
report what happened
|
|
287
|
+
- NEVER say "restart the app to see the change" — restart it yourself,
|
|
288
|
+
observe the result, report back
|
|
289
|
+
- NEVER assume tests passing = feature working. Tests verify code paths.
|
|
290
|
+
Live verification proves the feature delivers its intended experience.
|
|
291
|
+
|
|
292
|
+
### 5. Code Quality
|
|
293
|
+
- Is the fix minimal and focused?
|
|
294
|
+
- Does it follow the project's existing patterns?
|
|
295
|
+
- Are there edge cases the fix doesn't handle?
|
|
296
|
+
- Is error handling appropriate (not excessive, not missing)?
|
|
297
|
+
|
|
298
|
+
### 6. Observability
|
|
299
|
+
- Will this failure mode be visible if it happens again?
|
|
300
|
+
- Should any permanent logging or monitoring be added?
|
|
301
|
+
- Are there metrics or alerts that should be updated?
|
|
302
|
+
|
|
303
|
+
## Mandatory Verification Gate
|
|
304
|
+
|
|
305
|
+
Before writing the verdict, answer these two questions:
|
|
306
|
+
|
|
307
|
+
**Q1: Was the bug reported as something visual/experiential?**
|
|
308
|
+
(Did the user say "it doesn't show up", "it looks wrong", "the UI is broken",
|
|
309
|
+
"nothing happens when I click", "the output is garbled", etc.)
|
|
310
|
+
|
|
311
|
+
If YES → Visual verification is REQUIRED. You cannot approve without
|
|
312
|
+
capturing a screenshot, reading rendered output, or observing the
|
|
313
|
+
running application. Grep checks and log analysis are not sufficient.
|
|
314
|
+
|
|
315
|
+
If NO → Automated runtime verification (running tests, checking output)
|
|
316
|
+
is sufficient.
|
|
317
|
+
|
|
318
|
+
**Q2: Does the fix require restarting a process to take effect?**
|
|
319
|
+
(Patching a bundle, changing config loaded at startup, modifying
|
|
320
|
+
compiled artifacts, etc.)
|
|
321
|
+
|
|
322
|
+
If YES → YOU must restart the process, observe the result, and capture
|
|
323
|
+
evidence. Do not tell the user to restart — do it yourself:
|
|
324
|
+
```
|
|
325
|
+
# Example: restart a CLI tool and capture its output
|
|
326
|
+
osascript -e 'tell application "Terminal" to do script "cd /path && your-command"'
|
|
327
|
+
sleep 3
|
|
328
|
+
screencapture -x /tmp/verification-screenshot.png
|
|
329
|
+
# Then READ the screenshot to verify
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
If you cannot restart the process (e.g., it's the very tool you're
|
|
333
|
+
running inside), this is one of the rare legitimate cases to ask the
|
|
334
|
+
user — but you MUST say what specific thing to look for and why you
|
|
335
|
+
couldn't verify it yourself.
|
|
336
|
+
|
|
337
|
+
## Output Format
|
|
338
|
+
|
|
339
|
+
Write a file called `REVIEW-VERDICT.md` with:
|
|
340
|
+
|
|
341
|
+
### Verdict: [APPROVED / APPROVED WITH CHANGES / NEEDS REWORK]
|
|
342
|
+
|
|
343
|
+
### Verification Gate
|
|
344
|
+
- Bug is visual/experiential: [YES/NO]
|
|
345
|
+
- Fix requires process restart: [YES/NO]
|
|
346
|
+
- Visual verification performed: [YES — describe what was captured / NO — explain why not required / BLOCKED — explain why agent couldn't do it]
|
|
347
|
+
|
|
348
|
+
### Fix Verification
|
|
349
|
+
- Reproduction test: [PASS/FAIL — actual output]
|
|
350
|
+
- Full test suite: [PASS/FAIL with details]
|
|
351
|
+
- Build: [PASS/FAIL]
|
|
352
|
+
- Lint/typecheck: [PASS/FAIL]
|
|
353
|
+
- Runtime verification: [what was run, what was observed]
|
|
354
|
+
- Visual verification: [screenshot path, DOM snapshot, or rendered output captured — or N/A with reason]
|
|
355
|
+
|
|
356
|
+
### Code Review Notes
|
|
357
|
+
- [specific observations, line references]
|
|
358
|
+
|
|
359
|
+
### Concerns
|
|
360
|
+
- [anything that needs attention]
|
|
361
|
+
|
|
362
|
+
### Recommended Follow-ups
|
|
363
|
+
- [monitoring, tests to add, documentation to update]
|
|
364
|
+
```
|
|
365
|
+
|
|
366
|
+
If the Reviewer says **NEEDS REWORK**, send the feedback back to the Solver agent for another iteration. The Solver-Reviewer loop continues until the verdict is APPROVED (max 3 iterations — after that, escalate to the user with full context of what's been tried).
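The iteration cap described above can be sketched as a bounded loop. The `solve` and `review` callables are hypothetical stand-ins for launching the Solver and Reviewer agents. Treating anything other than `APPROVED` as a reason to iterate is an assumption of this sketch, since the text does not say how `APPROVED WITH CHANGES` is handled.

```python
MAX_ITERATIONS = 3


def solver_reviewer_loop(solve, review, max_iterations=MAX_ITERATIONS):
    """Run solve/review rounds until approval or the iteration cap is reached.

    solve(feedback) returns a fix; review(fix) returns (verdict, feedback).
    """
    history = []
    feedback = None
    for attempt in range(1, max_iterations + 1):
        fix = solve(feedback)
        verdict, feedback = review(fix)
        history.append((attempt, verdict, feedback))
        if verdict == "APPROVED":
            return "APPROVED", history
    # Cap reached: hand the full history to the user instead of trying again.
    return "ESCALATE", history
```

Keeping `history` means the escalation can show the user every attempt and every piece of review feedback.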
|
|
367
|
+
|
|
368
|
+
---
|
|
369
|
+
|
|
370
|
+
## Phase 5 (War Room Phase 4): Present Results
|
|
371
|
+
|
|
372
|
+
**CHECKPOINT: Before presenting, confirm these are true:**
|
|
373
|
+
- [ ] A Reviewer agent was spawned (not just the Solver declaring victory)
|
|
374
|
+
- [ ] The Reviewer's verdict includes actual evidence (output captures, screenshots, log snippets — not just "PASS")
|
|
375
|
+
- [ ] If the bug was visual, visual evidence was captured
|
|
376
|
+
- [ ] If the fix required a restart, the restart happened and post-restart behavior was verified
|
|
377
|
+
- [ ] No "How to Verify" or "Restart X to see the change" instructions are included in the presentation
|
|
378
|
+
|
|
379
|
+
If any of these are false, you are not ready to present. Go back to Phase 4.
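The checkpoint can be read as a function that returns whatever is still unmet. The sketch below is illustrative only: the `ReviewState` fields and the phrase matching on the presentation text are assumptions, not something the skill defines.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    """Facts gathered during Phase 4 (hypothetical shape)."""
    reviewer_spawned: bool
    evidence_captured: bool
    bug_is_visual: bool
    visual_evidence_captured: bool
    fix_requires_restart: bool
    restart_verified: bool
    presentation_text: str


def unmet_checkpoints(state):
    """Return a list of checkpoint failures; an empty list means ready to present."""
    unmet = []
    if not state.reviewer_spawned:
        unmet.append("Reviewer agent was not spawned")
    if not state.evidence_captured:
        unmet.append("Verdict lacks captured evidence")
    if state.bug_is_visual and not state.visual_evidence_captured:
        unmet.append("Visual bug without visual evidence")
    if state.fix_requires_restart and not state.restart_verified:
        unmet.append("Restart required but post-restart behavior not verified")
    lowered = state.presentation_text.lower()
    if ("how to verify" in lowered) or ("to see the change" in lowered):
        unmet.append("Presentation tells the user to verify or restart")
    return unmet
```

A non-empty result corresponds to going back to Phase 4.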
|
|
380
|
+
|
|
381
|
+
Once the Reviewer approves, present the full results to the user:
|
|
382
|
+
|
|
383
|
+
```
|
|
384
|
+
## Debug War Room Complete
|
|
385
|
+
|
|
386
|
+
### Root Cause
|
|
387
|
+
[One paragraph explaining what was wrong — clear enough that someone
|
|
388
|
+
unfamiliar with the code would understand]
|
|
389
|
+
|
|
390
|
+
### What Changed
|
|
391
|
+
[List of files modified with brief descriptions]
|
|
392
|
+
|
|
393
|
+
### Verification Already Performed
|
|
394
|
+
[These are things the Reviewer ALREADY RAN — not suggestions for the
|
|
395
|
+
user to do. Include actual output/evidence.]
|
|
396
|
+
- Reproduction test: PASS — [actual output snippet]
|
|
397
|
+
- Full test suite: PASS — [X tests passed, 0 failures]
|
|
398
|
+
- Build: PASS
|
|
399
|
+
- Runtime verification: [command run, output captured, expected vs actual]
|
|
400
|
+
- Visual verification (if applicable): [what was launched, screenshot/DOM
|
|
401
|
+
evidence, what the user will see — this closes the gap between "tests
|
|
402
|
+
pass" and "it actually works"]
|
|
403
|
+
- Reviewer verdict: APPROVED
|
|
404
|
+
|
|
405
|
+
### Key Findings
|
|
406
|
+
- [Top research findings that informed the fix]
|
|
407
|
+
- [Instrumentation insights that revealed the bug]
|
|
408
|
+
- [Hypotheses that were tested, including ones that were wrong — these
|
|
409
|
+
help the user's understanding]
|
|
410
|
+
|
|
411
|
+
### Commits (in worktree: [branch name])
|
|
412
|
+
[List of commits with messages]
|
|
413
|
+
|
|
414
|
+
Ready to merge. All automated verification has passed.
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
**Do NOT include a "How to Verify Yourself" section with manual steps.** If there is any verification that can be automated, the Reviewer must have already done it. The only reason to mention verification steps to the user is if something genuinely requires human judgment (visual design review, business logic confirmation) — and even then, explain what the agents already checked and what specifically needs a human eye.
|
|
418
|
+
|
|
419
|
+
Wait for the user to validate. Once they confirm:
|
|
420
|
+
|
|
421
|
+
1. Merge the solver's worktree branch to main
|
|
422
|
+
2. Clean up all worktrees and branches
|
|
423
|
+
3. Remove any remaining debug instrumentation (unless the user wants to keep it)
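The merge and cleanup steps could map onto git commands roughly as follows. This sketch only builds the command lists and does not run them. The branch names, worktree paths, and the `main_branch` default are placeholders.

```python
def merge_and_cleanup_commands(solver_branch, worktrees, main_branch="main"):
    """Build (not run) the git commands for the post-approval steps.

    worktrees is a list of (path, branch) pairs to clean up.
    """
    commands = [
        ["git", "switch", main_branch],
        ["git", "merge", solver_branch],
    ]
    for path, branch in worktrees:
        # A branch checked out in a worktree cannot be deleted, so remove the worktree first.
        commands.append(["git", "worktree", "remove", path])
        # -d refuses to delete unmerged branches; never-merged investigation
        # branches would need -D, which is left as a deliberate manual choice.
        commands.append(["git", "branch", "-d", branch])
    return commands
```

Returning the commands instead of executing them leaves room to show the user what will happen before anything is deleted.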
|
|
424
|
+
|
|
425
|
+
---
|
|
426
|
+
|
|
427
|
+
## Phase 6: Escalation Protocol
|
|
428
|
+
|
|
429
|
+
If after 3 Solver-Reviewer iterations the fix still isn't approved:
|
|
430
|
+
|
|
431
|
+
1. Present everything to the user: all hypotheses tested, all fix attempts, all review feedback
|
|
432
|
+
2. Ask the user for direction — they may have context that wasn't available to the agents
|
|
433
|
+
3. If the user provides new information, restart from Phase 1 with the new context
|
|
434
|
+
4. If the user wants to pair on it, switch to interactive debugging with all the instrumentation and research already done as context
|
|
435
|
+
|
|
436
|
+
The war room is powerful but not omniscient. Sometimes the bug requires domain knowledge only the user has. The goal is to do 90% of the work so the user's intervention is a focused 10%.
|