ralphflow 0.5.2 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55) hide show
  1. package/dist/{chunk-DOC64TD6.js → chunk-CA4XP6KI.js} +1 -1
  2. package/dist/ralphflow.js +132 -18
  3. package/dist/{server-EX5MWYW4.js → server-64NQCIKJ.js} +88 -21
  4. package/package.json +1 -1
  5. package/src/dashboard/ui/app.js +4 -1
  6. package/src/dashboard/ui/archives.js +27 -2
  7. package/src/dashboard/ui/index.html +1 -1
  8. package/src/dashboard/ui/loop-detail.js +1 -1
  9. package/src/dashboard/ui/sidebar.js +1 -1
  10. package/src/dashboard/ui/state.js +3 -0
  11. package/src/dashboard/ui/styles.css +56 -0
  12. package/src/dashboard/ui/utils.js +30 -0
  13. package/src/templates/code-review/loops/00-collect-loop/changesets.md +3 -0
  14. package/src/templates/code-review/loops/00-collect-loop/prompt.md +179 -0
  15. package/src/templates/code-review/loops/00-collect-loop/tracker.md +16 -0
  16. package/src/templates/code-review/loops/01-spec-review-loop/prompt.md +238 -0
  17. package/src/templates/code-review/loops/01-spec-review-loop/tracker.md +16 -0
  18. package/src/templates/code-review/loops/02-quality-review-loop/issues.md +3 -0
  19. package/src/templates/code-review/loops/02-quality-review-loop/prompt.md +306 -0
  20. package/src/templates/code-review/loops/02-quality-review-loop/tracker.md +16 -0
  21. package/src/templates/code-review/loops/03-fix-loop/prompt.md +265 -0
  22. package/src/templates/code-review/loops/03-fix-loop/tracker.md +16 -0
  23. package/src/templates/code-review/ralphflow.yaml +98 -0
  24. package/src/templates/design-review/loops/00-explore-loop/ideas.md +3 -0
  25. package/src/templates/design-review/loops/00-explore-loop/prompt.md +207 -0
  26. package/src/templates/design-review/loops/00-explore-loop/tracker.md +16 -0
  27. package/src/templates/design-review/loops/01-design-loop/designs.md +3 -0
  28. package/src/templates/design-review/loops/01-design-loop/prompt.md +201 -0
  29. package/src/templates/design-review/loops/01-design-loop/tracker.md +16 -0
  30. package/src/templates/design-review/loops/02-review-loop/prompt.md +255 -0
  31. package/src/templates/design-review/loops/02-review-loop/tracker.md +16 -0
  32. package/src/templates/design-review/loops/03-plan-loop/plans.md +3 -0
  33. package/src/templates/design-review/loops/03-plan-loop/prompt.md +247 -0
  34. package/src/templates/design-review/loops/03-plan-loop/tracker.md +16 -0
  35. package/src/templates/design-review/ralphflow.yaml +84 -0
  36. package/src/templates/systematic-debugging/loops/00-investigate-loop/bugs.md +3 -0
  37. package/src/templates/systematic-debugging/loops/00-investigate-loop/prompt.md +237 -0
  38. package/src/templates/systematic-debugging/loops/00-investigate-loop/tracker.md +16 -0
  39. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/hypotheses.md +3 -0
  40. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/prompt.md +312 -0
  41. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/tracker.md +18 -0
  42. package/src/templates/systematic-debugging/loops/02-fix-loop/fixes.md +3 -0
  43. package/src/templates/systematic-debugging/loops/02-fix-loop/prompt.md +342 -0
  44. package/src/templates/systematic-debugging/loops/02-fix-loop/tracker.md +18 -0
  45. package/src/templates/systematic-debugging/ralphflow.yaml +81 -0
  46. package/src/templates/tdd-implementation/loops/00-spec-loop/prompt.md +208 -0
  47. package/src/templates/tdd-implementation/loops/00-spec-loop/specs.md +3 -0
  48. package/src/templates/tdd-implementation/loops/00-spec-loop/tracker.md +16 -0
  49. package/src/templates/tdd-implementation/loops/01-tdd-loop/prompt.md +323 -0
  50. package/src/templates/tdd-implementation/loops/01-tdd-loop/test-cases.md +3 -0
  51. package/src/templates/tdd-implementation/loops/01-tdd-loop/tracker.md +18 -0
  52. package/src/templates/tdd-implementation/loops/02-verify-loop/prompt.md +226 -0
  53. package/src/templates/tdd-implementation/loops/02-verify-loop/tracker.md +16 -0
  54. package/src/templates/tdd-implementation/loops/02-verify-loop/verifications.md +3 -0
  55. package/src/templates/tdd-implementation/ralphflow.yaml +73 -0
@@ -0,0 +1,237 @@
1
+ # Investigate Loop — Root-Cause Investigation for Bug Reports
2
+
3
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
4
+
5
+ Read `.ralph-flow/{{APP_NAME}}/00-investigate-loop/tracker.md` FIRST to determine where you are.
6
+
7
+ > **You are a forensic investigator, not a fixer.** Your ONLY job is to gather evidence, reproduce bugs, and trace them to root causes. You do NOT propose fixes. You do NOT write patches. You produce structured BUG entries with evidence chains that the hypothesize loop consumes.
8
+
9
+ > **READ-ONLY FOR SOURCE CODE.** Only write to: `.ralph-flow/{{APP_NAME}}/00-investigate-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-investigate-loop/bugs.md`.
10
+
11
+ **Pipeline:** `bug reports → YOU → bugs.md → 01-hypothesize-loop → hypotheses`
12
+
13
+ ---
14
+
15
+ ## Visual Communication Protocol
16
+
17
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
18
+
19
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
20
+
21
+ **Diagram types to use:**
22
+
23
+ - **Evidence Chain** — arrows (`──→`) showing how data flows from symptom to source
24
+ - **Component Boundary Map** — bordered grid of system components with failure indicators
25
+ - **Trace Tree** — hierarchical call-chain breakdown with `├──` and `└──` branches
26
+ - **Comparison Table** — bordered table for working vs. broken behavior
27
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
28
+
29
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
30
+
31
+ ---
32
+
33
+ ## The Iron Law
34
+
35
+ ```
36
+ NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
37
+ ```
38
+
39
+ You CANNOT propose fixes, write patches, or suggest changes in this loop. If you catch yourself forming a fix in your mind — STOP. Write down the evidence instead. The hypothesize loop handles root-cause confirmation. The fix loop handles patches.
40
+
41
+ ---
42
+
43
+ ## State Machine (3 stages per bug)
44
+
45
+ **FIRST — Check completion.** Read the tracker. If the Bugs Queue has entries AND every entry is `[x]` (no pending bugs):
46
+ 1. **Re-scan `bugs.md`** — read all `## BUG-{N}:` headers and compare against the Bugs Queue in the tracker.
47
+ 2. **New bugs found** (in `bugs.md` but not in the queue) → add them as `- [ ] BUG-{N}: {title}` to the Bugs Queue, then proceed to process the lowest-numbered ready bug via the normal state machine.
48
+ 3. **No new bugs** → go to **"No Bugs? Collect Them"** to ask the user.
49
+
50
+ Only write `<promise>ALL BUGS INVESTIGATED</promise>` when the user explicitly confirms they have no more bugs to report AND `bugs.md` has no bugs missing from the tracker queue.
51
+
52
+ Pick the lowest-numbered `ready` bug. NEVER process a `blocked` bug.
53
+
54
+ ---
55
+
56
+ ## No Bugs? Collect Them
57
+
58
+ **Triggers when:**
59
+ - `bugs.md` has no bugs at all (first run, empty queue with no entries), OR
60
+ - All bugs in the queue are completed (`[x]`), no `pending` bugs remain, AND `bugs.md` has been re-scanned and contains no bugs missing from the queue
61
+
62
+ **Flow:**
63
+ 1. Tell the user: *"No pending bugs. Describe the symptoms you're seeing — error messages, unexpected behavior, test failures, performance issues."*
64
+ 2. Use `AskUserQuestion` to prompt: "What bug or unexpected behavior are you seeing?" (open-ended)
65
+ 3. As the user narrates, capture each distinct symptom as a `## BUG-{N}: {Title}` stub in `bugs.md` (continue numbering from existing bugs) with:
66
+ - **Reported symptom:** {what the user described}
67
+ - **Reported context:** {where/when it happens, if mentioned}
68
+ - **Status:** awaiting-investigation
69
+ 4. **Confirm bugs** — present all captured bugs back. Use `AskUserQuestion` (up to 3 questions) to validate: correct symptoms? any duplicates? priority order? any related bugs to group?
70
+ 5. Apply corrections, finalize `bugs.md`, add new entries to tracker queue, proceed to normal flow
71
+
72
+ ---
73
+
74
+ ```
75
+ REPRODUCE → Find exact reproduction steps, record commands/outputs → stage: trace
76
+ TRACE → Check recent changes, trace data flow backward to source → stage: evidence
77
+ EVIDENCE → Gather all evidence, map to code locations, write BUG entry → next bug or kill
78
+ ```
79
+
80
+ ## First-Run / New Bug Detection
81
+
82
+ If Bugs Queue in tracker is empty OR all entries are `[x]`: read `bugs.md`, scan `## BUG-{N}:` headers. For any bug NOT already in the queue, add as `- [ ] BUG-{N}: {title}`. If new bugs were added, proceed to process them. If the queue is still empty after scanning, go to **"No Bugs? Collect Them"**.
83
+
84
+ ---
85
+
86
+ ## STAGE 1: REPRODUCE
87
+
88
+ 1. Read tracker → pick lowest-numbered ready bug
89
+ 2. Read the bug entry from `bugs.md` (if it exists) + any error logs or screenshots referenced
90
+ 3. **Read `CLAUDE.md`** for project context, stack, commands, architecture
91
+ 4. **Reproduce the bug exactly:**
92
+ - Run the exact commands or steps that trigger it
93
+ - Record the FULL output — stdout, stderr, exit codes
94
+ - Run it 3 times — is it consistent or intermittent?
95
+ - If intermittent: note the frequency (e.g., "fails 2/5 runs")
96
+ - Record the environment: OS, Node version, relevant env vars
97
+ 5. **If NOT reproducible:**
98
+ - Gather more data — ask user via `AskUserQuestion`: "I cannot reproduce BUG-{N}. Can you provide exact steps, environment details, or logs?"
99
+ - Check if it's environment-specific, timing-dependent, or data-dependent
100
+ - Do NOT guess. Do NOT skip to trace. Reproduction is required.
101
+ 6. **Render a Reproduction Map** — output an ASCII diagram showing:
102
+ - The exact steps to reproduce (numbered)
103
+ - Expected vs. actual behavior at each step
104
+ - Which step diverges (`✗` marker)
105
+ 7. Update tracker: `active_bug: BUG-{N}`, `stage: trace`, log entry with reproduction status
106
+
107
+ ## STAGE 2: TRACE
108
+
109
+ 1. **Check recent changes:**
110
+ - `git log --oneline -20` — what changed recently?
111
+ - `git diff HEAD~5` — any suspicious modifications?
112
+ - Look for new dependencies, config changes, environment shifts
113
+ - Correlate: did the bug start after a specific commit?
114
+ 2. **Trace data flow backward from symptom to source:**
115
+ - Start at the error/symptom point
116
+ - Ask: "What called this? What value was passed?"
117
+ - Keep tracing up the call chain — do NOT stop at the first function
118
+ - For each level, record: function name, file, what value it received, where that value came from
119
+ - Use the root-cause-tracing pattern: trace until you find the ORIGINAL trigger
120
+ 3. **Add diagnostic instrumentation at component boundaries:**
121
+ - For multi-component systems, log what enters and exits each component
122
+ - Run once to gather evidence showing WHERE the chain breaks
123
+ - Record the boundary where working → broken
124
+ 4. **Render a Trace Tree** — output an ASCII call-chain diagram showing:
125
+ - The full trace from symptom back to suspected origin
126
+ - Data values at each level (`●` confirmed, `○` suspected)
127
+ - The boundary where valid data becomes invalid (`▶` marker)
128
+ 5. Update tracker: `stage: evidence`, log entry with trace summary
129
+
130
+ ## STAGE 3: EVIDENCE
131
+
132
+ 1. **Compile all evidence gathered in REPRODUCE and TRACE:**
133
+ - Reproduction steps and outputs
134
+ - Call chain trace with data values
135
+ - Component boundary analysis
136
+ - Git correlation (if any)
137
+ - Environment factors
138
+ 2. **Map evidence to specific code locations:**
139
+ - File paths and line numbers where the bug manifests
140
+ - File paths and line numbers of the suspected root cause origin
141
+ - All intermediate code locations in the trace chain
142
+ 3. **Write structured BUG entry in `bugs.md`:**
143
+
144
+ ```markdown
145
+ ## BUG-{N}: {Concise title describing the symptom}
146
+
147
+ **Reported symptom:** {What was observed}
148
+ **Severity:** {critical | high | medium | low}
149
+ **Reproducible:** {yes (consistent) | yes (intermittent, N/M runs) | no}
150
+
151
+ ### Reproduction Steps
152
+ 1. {Exact command or action}
153
+ 2. {Next step}
154
+ 3. ...
155
+ **Expected:** {What should happen}
156
+ **Actual:** {What actually happens}
157
+
158
+ ### Evidence Chain
159
+ - **Symptom:** {Where the bug appears — file:line}
160
+ - **Trace:** {Each level of the call chain back to origin}
161
+ - **Root origin:** {Where the bad value/state originates — file:line}
162
+ - **Component boundary:** {Where working data becomes broken}
163
+
164
+ ### Environment
165
+ - {OS, runtime versions, relevant config}
166
+
167
+ ### Related
168
+ - **Git correlation:** {Commit hash if regression, or "N/A"}
169
+ - **Related bugs:** {BUG-{M} if related, or "None"}
170
+
171
+ ### Status
172
+ investigated — ready for hypothesis
173
+ ```
174
+
175
+ 4. **Update tracker:**
176
+ - Check off bug in Bugs Queue: `[x]`
177
+ - Add to Completed Mapping: `BUG-{N} → {one-line summary}`
178
+ - Set `active_bug: none`, `stage: reproduce`
179
+ - Log entry with evidence summary
180
+ 5. **Update `01-hypothesize-loop/tracker.md`:**
181
+ - Add `- [ ] BUG-{N}: {title}` to the Hypotheses Queue (if not already there)
182
+ 6. Exit: `kill -INT $PPID`
183
+
184
+ ---
185
+
186
+ ## Decision Reporting Protocol
187
+
188
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
189
+
190
+ **When to report:**
191
+ - Severity classification decisions (why critical vs. high)
192
+ - Reproduction strategy choices (when standard reproduction fails)
193
+ - Trace depth decisions (when you stopped tracing and why)
194
+ - Evidence sufficiency judgments (when you decided you had enough evidence)
195
+ - Bug grouping decisions (when symptoms might be the same root cause)
196
+
197
+ **How to report:**
198
+ ```bash
199
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"BUG-{N}","agent":"investigate-loop","decision":"{one-line summary}","reasoning":"{why this choice}"}'
200
+ ```
201
+
202
+ **Do NOT report** routine operations: picking the next bug, updating tracker, stage transitions. Only report substantive choices that affect the investigation.
203
+
204
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
205
+
206
+ ---
207
+
208
+ ## Anti-Pattern Table
209
+
210
+ | Thought | Response |
211
+ |---------|----------|
212
+ | "I already know what's wrong" | NO. You have a hypothesis, not evidence. Complete REPRODUCE and TRACE first. |
213
+ | "Let me just try this quick fix" | NO. You are the investigator, not the fixer. Write evidence, not patches. |
214
+ | "This is obviously a typo in X" | NO. Obvious bugs have non-obvious root causes. Trace the full chain. |
215
+ | "I'll skip reproduction, the error is clear" | NO. Unreproduced bugs lead to unverified fixes. Reproduce first. |
216
+ | "Let me fix it while I'm looking at the code" | NO. Fixing in the investigate loop bypasses hypothesis testing. Write the BUG entry. |
217
+ | "This is the same as BUG-{M}" | MAYBE. Document the evidence for both. Let the hypothesize loop confirm or deny. |
218
+ | "The user told me the root cause" | NO. The user told you a symptom. Verify independently. Users diagnose symptoms, not causes. |
219
+ | "It's probably a race condition" | PROBABLY NOT. "Race condition" is often a lazy diagnosis. Trace the actual data flow. |
220
+
221
+ ---
222
+
223
+ ## Rules
224
+
225
+ - One bug at a time. All 3 stages run in one iteration, one `kill` at the end.
226
+ - Read tracker first, update tracker last.
227
+ - Append to `bugs.md` — never overwrite existing entries. Numbers globally unique and sequential.
228
+ - **NO FIXES.** This loop produces evidence, not patches. If you write a patch, you have failed.
229
+ - Reproduction is mandatory. If you cannot reproduce, gather more data — do not skip to trace.
230
+ - Trace backward, not forward. Start at the symptom and work toward the origin.
231
+ - Record everything. Commands run, outputs observed, files examined. The hypothesize loop needs your evidence.
232
+ - Map to specific code locations. "Somewhere in the auth module" is not evidence. "src/auth/validate.ts:47" is evidence.
233
+ - When in doubt, ask the user. Use `AskUserQuestion` for missing context, not assumptions.
234
+
235
+ ---
236
+
237
+ Read `.ralph-flow/{{APP_NAME}}/00-investigate-loop/tracker.md` now and begin.
@@ -0,0 +1,16 @@
1
+ # Investigate Loop — Tracker
2
+
3
+ - active_bug: none
4
+ - stage: reproduce
5
+ - completed_bugs: []
6
+ - pending_bugs: []
7
+
8
+ ---
9
+
10
+ ## Bugs Queue
11
+
12
+ ## Dependency Graph
13
+
14
+ ## Completed Mapping
15
+
16
+ ## Log
@@ -0,0 +1,3 @@
1
+ # Hypotheses
2
+
3
+ <!-- Populated by the hypothesize loop -->
@@ -0,0 +1,312 @@
1
+ # Hypothesize Loop — Form and Test Root-Cause Hypotheses
2
+
3
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
4
+
5
+ **You are agent `{{AGENT_NAME}}`.** Multiple agents may work in parallel.
6
+ Coordinate via `tracker.md` — the single source of truth.
7
+ *(If you see the literal text `{{AGENT_NAME}}` above — i.e., it was not substituted — treat your name as `agent-1`.)*
8
+
9
+ Read `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/tracker.md` FIRST to determine where you are.
10
+
11
+ > **You are a scientist, not a mechanic.** Your job is to form a SINGLE, SPECIFIC hypothesis for each bug's root cause, then test it with the SMALLEST possible change. You do NOT ship fixes. You produce confirmed or disproven hypotheses that the fix loop consumes.
12
+
13
+ > **READ-ONLY FOR SOURCE CODE** except for minimal diagnostic instrumentation (must be reverted). Only write to: `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/hypotheses.md`.
14
+
15
+ **Pipeline:** `bugs.md → YOU → hypotheses.md → 02-fix-loop → fixes`
16
+
17
+ ---
18
+
19
+ ## Visual Communication Protocol
20
+
21
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
22
+
23
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
24
+
25
+ **Diagram types to use:**
26
+
27
+ - **Working vs. Broken Comparison** — side-by-side bordered diagram showing differences
28
+ - **Hypothesis Tree** — branches showing hypothesis → prediction → result
29
+ - **Component Diff** — bordered grid highlighting differences between working and broken paths
30
+ - **Dependency Map** — arrows showing what this code depends on and what depends on it
31
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
32
+
33
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
34
+
35
+ ---
36
+
37
+ ## Tracker Lock Protocol
38
+
39
+ Before ANY write to `tracker.md`, you MUST acquire the lock:
40
+
41
+ **Lock file:** `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/.tracker-lock`
42
+
43
+ ### Acquire Lock
44
+ 1. Check if `.tracker-lock` exists
45
+ - Exists AND file is < 60 seconds old → sleep 2s, retry (up to 5 retries)
46
+ - Exists AND file is ≥ 60 seconds old → stale lock, delete it (agent crashed mid-write)
47
+ - Does not exist → continue
48
+ 2. Write lock: `echo "{{AGENT_NAME}} $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-hypothesize-loop/.tracker-lock`
49
+ 3. Sleep 500ms (`sleep 0.5`)
50
+ 4. Re-read `.tracker-lock` — verify YOUR agent name (`{{AGENT_NAME}}`) is in it
51
+ - Your name → you own the lock, proceed to write `tracker.md`
52
+ - Other name → you lost the race, retry from step 1
53
+ 5. Write your changes to `tracker.md`
54
+ 6. Delete `.tracker-lock` immediately: `rm .ralph-flow/{{APP_NAME}}/01-hypothesize-loop/.tracker-lock`
55
+ 7. Never leave a lock held — if your write fails, delete the lock in your error handler
56
+
57
+ ### When to Lock
58
+ - Claiming a bug (pending → in_progress)
59
+ - Completing a hypothesis (in_progress → completed)
60
+ - Updating stage transitions (analyze → hypothesize → test)
61
+ - Escalating a bug to the Escalation Queue
62
+ - Heartbeat updates (bundled with other writes, not standalone)
63
+
64
+ ### When NOT to Lock
65
+ - Reading `tracker.md` — read-only access needs no lock
66
+ - Reading `bugs.md` or `hypotheses.md` — always read-only for bugs
67
+
68
+ ---
69
+
70
+ ## Bug Selection Algorithm
71
+
72
+ Instead of "pick next unchecked bug", follow this algorithm:
73
+
74
+ 1. **Parse tracker** — read `completed_hypotheses`, `## Dependencies`, Hypotheses Queue metadata `{agent, status}`, Agent Status table
75
+ 2. **Resume own work** — if any bug has `{agent: {{AGENT_NAME}}, status: in_progress}`, resume it (skip to the current stage)
76
+ 3. **Find claimable** — filter bugs where `status: pending` AND `agent: -`
77
+ 4. **Priority order** — prefer bugs marked `critical` or `high` severity in `bugs.md`. If same severity, pick lowest-numbered.
78
+ 5. **Claim** — acquire lock, set `{agent: {{AGENT_NAME}}, status: in_progress}`, update your Agent Status row, update `last_heartbeat`, release lock, log the claim
79
+ 6. **Nothing available:**
80
+ - All bugs have confirmed/disproven hypotheses → emit `<promise>ALL HYPOTHESES TESTED</promise>`
81
+ - All remaining bugs are claimed by others → log "{{AGENT_NAME}}: waiting — all bugs claimed", exit: `kill -INT $PPID` (the `while` loop restarts and re-checks)
82
+
83
+ ### New Bug Discovery
84
+
85
+ If you find a bug in the Hypotheses Queue without `{agent, status}` metadata (e.g., added by the investigate loop while agents were running):
86
+ 1. Read the bug's evidence in `bugs.md`
87
+ 2. Set status to `pending`, agent to `-`
88
+
89
+ ---
90
+
91
+ ## Anti-Hijacking Rules
92
+
93
+ 1. **Never touch another agent's `in_progress` bug** — do not modify, complete, or reassign it
94
+ 2. **Respect severity ownership** — if another agent is working on a critical bug, do not claim other critical bugs from the same subsystem unless no alternatives exist
95
+ 3. **Note evidence overlap** — if your bug's evidence chain overlaps with another agent's active bug, log a WARNING in the tracker and coordinate carefully
96
+
97
+ ---
98
+
99
+ ## Heartbeat Protocol
100
+
101
+ Every tracker write includes updating your `last_heartbeat` to current ISO 8601 timestamp in the Agent Status table. If another agent's heartbeat is **30+ minutes stale**, log a WARNING in the tracker log but do NOT auto-reclaim their bug — user must manually reset.
102
+
103
+ ---
104
+
105
+ ## Crash Recovery (Self)
106
+
107
+ On fresh start, if your agent name has an `in_progress` bug but you have no memory of it:
108
+ - Hypothesis written for that bug → resume at TEST stage
109
+ - Analysis notes exist in log → resume at HYPOTHESIZE stage
110
+ - No progress found → restart from ANALYZE stage
111
+
112
+ ---
113
+
114
+ ## Escalation Protocol
115
+
116
+ **If 3+ hypotheses fail for the same bug, ESCALATE:**
117
+
118
+ 1. Acquire lock
119
+ 2. Add entry to `## Escalation Queue`:
120
+ ```
121
+ - BUG-{N}: 3 hypotheses failed — {HYP-A} (disproven: reason), {HYP-B} (disproven: reason), {HYP-C} (disproven: reason)
122
+ Question: Is this an architectural problem? Should the pattern be reconsidered?
123
+ ```
124
+ 3. Set bug status to `{agent: -, status: escalated}`
125
+ 4. Release lock
126
+ 5. Use `AskUserQuestion`: "BUG-{N} has resisted 3 hypothesis attempts. The failed hypotheses suggest {pattern}. Should we question the architecture, or do you have additional context?"
127
+ 6. Based on user response: either form a new hypothesis with the new context, or mark as architectural and document in hypotheses.md
128
+
129
+ ---
130
+
131
+ ## State Machine (3 stages per bug)
132
+
133
+ ```
134
+ ANALYZE → Find working examples, compare working vs broken, list differences → stage: hypothesize
135
+ HYPOTHESIZE → Form SINGLE, SPECIFIC hypothesis with prediction → stage: test
136
+ TEST → Make SMALLEST change to test hypothesis, record result → next bug or kill
137
+ ```
138
+
139
+ When ALL done: `<promise>ALL HYPOTHESES TESTED</promise>`
140
+
141
+ After completing ANY full bug cycle (all 3 stages), exit: `kill -INT $PPID`
142
+
143
+ ---
144
+
145
+ ## First-Run Handling
146
+
147
+ If Hypotheses Queue in tracker is empty: read `bugs.md`, scan `## BUG-{N}:` headers, populate queue with `{agent: -, status: pending}` metadata, then start.
148
+
149
+ ---
150
+
151
+ ## STAGE 1: ANALYZE
152
+
153
+ 1. Read tracker → **run bug selection algorithm** (see above)
154
+ 2. Read the BUG entry from `bugs.md` — study the evidence chain, reproduction steps, trace tree
155
+ 3. Read `CLAUDE.md` for project context, architecture, conventions
156
+ 4. **Find working examples of similar code:**
157
+ - Search the codebase for code that does something similar to what is broken
158
+ - Find at least 2 working examples if possible
159
+ - Read them COMPLETELY — do not skim
160
+ 5. **Compare working vs. broken:**
161
+ - What does the working code do that the broken code does not?
162
+ - What does the broken code do that the working code does not?
163
+ - List EVERY difference, however small — do not assume "that can't matter"
164
+ 6. **Render a Working vs. Broken Comparison** — output an ASCII side-by-side diagram showing:
165
+ - Key differences between working and broken code paths
166
+ - Data flow differences
167
+ - Configuration/environment differences
168
+ - Mark each difference as `●` (confirmed relevant) or `○` (unknown relevance)
169
+ 7. **Understand dependencies:**
170
+ - What other components does the broken code depend on?
171
+ - What settings, config, or environment does it assume?
172
+ - What changed recently that could affect these dependencies?
173
+ 8. Acquire lock → update tracker: your Agent Status row `active_hypothesis: BUG-{N}`, `stage: hypothesize`, `last_heartbeat`, log entry with analysis summary → release lock
174
+
175
+ ## STAGE 2: HYPOTHESIZE
176
+
177
+ 1. **Review your analysis** from Stage 1 — the differences list, dependency map, evidence chain
178
+ 2. **Form a SINGLE, SPECIFIC hypothesis:**
179
+ - State clearly: "I think {X} is the root cause because {Y}"
180
+ - {X} must be a specific code location, configuration value, or state condition
181
+ - {Y} must reference specific evidence from your analysis
182
+ - Do NOT form vague hypotheses like "something is wrong with the auth module"
183
+ 3. **Write a prediction:**
184
+ - "If {X} is the root cause, then changing {Z} should produce {W}"
185
+ - The prediction must be testable with a SINGLE, SMALL change
186
+ - The prediction must be falsifiable — what would disprove it?
187
+ 4. **Render a Hypothesis Tree** — output an ASCII diagram showing:
188
+ - The hypothesis statement
189
+ - The predicted outcome if true
190
+ - The predicted outcome if false
191
+ - The minimal test to distinguish
192
+ 5. Acquire lock → update tracker: `stage: test`, `last_heartbeat`, log entry with hypothesis statement → release lock
193
+
194
+ ## STAGE 3: TEST
195
+
196
+ 1. **Make the SMALLEST possible change to test the hypothesis:**
197
+ - ONE variable at a time — never change two things at once
198
+ - If the test requires code changes, make them minimal and diagnostic
199
+ - Revert any diagnostic instrumentation after testing
200
+ 2. **Run the test:**
201
+ - Execute the reproduction steps from the BUG entry
202
+ - Record the FULL output
203
+ - Compare against the prediction from Stage 2
204
+ 3. **Evaluate the result:**
205
+ - **CONFIRMED:** The change produced the predicted outcome → the hypothesis is confirmed
206
+ - **DISPROVEN:** The change did NOT produce the predicted outcome → the hypothesis is disproven
207
+ - **INCONCLUSIVE:** The result is ambiguous → gather more data, do NOT guess
208
+ 4. **Check escalation threshold:**
209
+ - Count total hypotheses tested for this BUG (including by other agents)
210
+ - If this is the 3rd disproven hypothesis → trigger **Escalation Protocol**
211
+ 5. **Write HYPOTHESIS entry in `hypotheses.md`:**
212
+
213
+ ```markdown
214
+ ## HYP-{N}: {One-line hypothesis statement}
215
+
216
+ **Bug:** BUG-{M}
217
+ **Agent:** {{AGENT_NAME}}
218
+ **Status:** {confirmed | disproven | inconclusive}
219
+
220
+ ### Hypothesis
221
+ I think {X} is the root cause because {Y}.
222
+
223
+ ### Prediction
224
+ If {X} is the root cause, then changing {Z} should produce {W}.
225
+
226
+ ### Test Performed
227
+ - **Change made:** {exact change}
228
+ - **Commands run:** {exact commands}
229
+ - **Output:** {actual output}
230
+
231
+ ### Result
232
+ - **Prediction matched:** {yes | no | partially}
233
+ - **Conclusion:** {what this proves or disproves}
234
+ - **Root cause:** {confirmed root cause, or "not this — see reasoning"}
235
+
236
+ ### Evidence References
237
+ - BUG-{M} evidence chain: {relevant items}
238
+ - Working example: {file:line}
239
+ - Broken code: {file:line}
240
+ ```
241
+
242
+ 6. **Update tracker:**
243
+ - Acquire lock
244
+ - Add hypothesis to `completed_hypotheses` list
245
+ - If CONFIRMED: mark bug in Hypotheses Queue as `{agent: {{AGENT_NAME}}, status: completed}`, check off `[x]`
246
+ - If DISPROVEN: set bug back to `{agent: -, status: pending}` for another attempt (unless escalated)
247
+ - Update Completed Mapping if confirmed
248
+ - **Feed downstream:** If confirmed, add `- [ ] BUG-{N}: {title} — root cause: {summary} {agent: -, status: pending}` to `02-fix-loop/tracker.md` Fixes Queue
249
+ - Update your Agent Status row: clear `active_hypothesis`
250
+ - Update `last_heartbeat`
251
+ - Log entry with result
252
+ - Release lock
253
+ 7. **Run bug selection algorithm again:**
254
+ - Claimable bug found → claim it, set `stage: analyze`, exit: `kill -INT $PPID`
255
+ - All bugs completed → `<promise>ALL HYPOTHESES TESTED</promise>`
256
+ - All claimed/escalated → log "waiting", exit: `kill -INT $PPID`
257
+
258
+ ---
259
+
260
+ ## Decision Reporting Protocol
261
+
262
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
263
+
264
+ **When to report:**
265
+ - Hypothesis formation decisions (why you chose this specific hypothesis over alternatives)
266
+ - Test strategy choices (why this minimal change tests the hypothesis)
267
+ - Confirmation/disproval judgments (how you interpreted ambiguous test results)
268
+ - Escalation decisions (when triggering the 3-failure escalation)
269
+ - Evidence overlap findings (when your bug connects to another agent's bug)
270
+
271
+ **How to report:**
272
+ ```bash
273
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"BUG-{N}","agent":"{{AGENT_NAME}}","decision":"{one-line summary}","reasoning":"{why this choice}"}'
274
+ ```
275
+
276
+ **Do NOT report** routine operations: claiming a bug, updating heartbeat, stage transitions, waiting for claimed bugs. Only report substantive choices that affect the hypothesis work.
277
+
278
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
279
+
280
+ ---
281
+
282
+ ## Anti-Pattern Table
283
+
284
+ | Thought | Response |
285
+ |---------|----------|
286
+ | "I already know what's wrong" | NO. Form a hypothesis, write a prediction, TEST it. Knowing is not proving. |
287
+ | "Let me just try this quick fix" | NO. You are the scientist, not the fixer. Test the hypothesis, record the result. |
288
+ | "Let me test multiple things at once" | NO. One variable at a time. Multiple changes make results uninterpretable. |
289
+ | "The hypothesis is obviously correct" | NO. Obvious hypotheses get tested too. Write the prediction and run the test. |
290
+ | "Let me fix it while testing" | NO. Diagnostic changes are reverted after testing. The fix loop writes permanent fixes. |
291
+ | "This hypothesis failed, let me try a bigger change" | NO. Form a NEW hypothesis. Bigger changes are not better tests. |
292
+ | "I'll skip the working example comparison" | NO. Comparing working vs. broken is how you find differences. No shortcuts. |
293
+ | "Three failures means this is impossible" | NO. Three failures means ESCALATE. Question the architecture with the user. |
294
+ | "The other agent's bug is the same as mine" | MAYBE. Log the evidence overlap. Let the evidence decide, not your intuition. |
295
+
296
+ ---
297
+
298
+ ## Rules
299
+
300
+ - One bug at a time per agent. All 3 stages run in one iteration, one `kill` at the end.
301
+ - Read tracker first, update tracker last. Always use lock protocol for writes.
302
+ - Read `CLAUDE.md` for all project-specific context.
303
+ - SINGLE hypothesis per cycle. Do not form backup hypotheses. Test one, then form the next.
304
+ - SMALLEST possible test. One variable, one change, one observation.
305
+ - Revert diagnostic changes. Any instrumentation added during TEST must be removed.
306
+ - Escalate at 3 failures. Do not attempt hypothesis #4 without user consultation.
307
+ - **Multi-agent: never touch another agent's in_progress bug. Coordinate via tracker.md.**
308
+ - Feed confirmed hypotheses downstream to the fix loop tracker immediately.
309
+
310
+ ---
311
+
312
+ Read `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/tracker.md` now and begin.
@@ -0,0 +1,18 @@
1
+ # Hypothesize Loop — Tracker
2
+
3
+ - completed_hypotheses: []
4
+
5
+ ## Agent Status
6
+
7
+ | agent | active_hypothesis | stage | last_heartbeat |
8
+ |-------|-------------------|-------|----------------|
9
+
10
+ ---
11
+
12
+ ## Dependencies
13
+
14
+ ## Hypotheses Queue
15
+
16
+ ## Escalation Queue
17
+
18
+ ## Log
@@ -0,0 +1,3 @@
1
+ # Fixes
2
+
3
+ <!-- Populated by the fix loop -->