ralphflow 0.5.2 → 0.5.3

Files changed (55)
  1. package/dist/{chunk-DOC64TD6.js → chunk-CA4XP6KI.js} +1 -1
  2. package/dist/ralphflow.js +132 -18
  3. package/dist/{server-EX5MWYW4.js → server-64NQCIKJ.js} +88 -21
  4. package/package.json +1 -1
  5. package/src/dashboard/ui/app.js +4 -1
  6. package/src/dashboard/ui/archives.js +27 -2
  7. package/src/dashboard/ui/index.html +1 -1
  8. package/src/dashboard/ui/loop-detail.js +1 -1
  9. package/src/dashboard/ui/sidebar.js +1 -1
  10. package/src/dashboard/ui/state.js +3 -0
  11. package/src/dashboard/ui/styles.css +56 -0
  12. package/src/dashboard/ui/utils.js +30 -0
  13. package/src/templates/code-review/loops/00-collect-loop/changesets.md +3 -0
  14. package/src/templates/code-review/loops/00-collect-loop/prompt.md +179 -0
  15. package/src/templates/code-review/loops/00-collect-loop/tracker.md +16 -0
  16. package/src/templates/code-review/loops/01-spec-review-loop/prompt.md +238 -0
  17. package/src/templates/code-review/loops/01-spec-review-loop/tracker.md +16 -0
  18. package/src/templates/code-review/loops/02-quality-review-loop/issues.md +3 -0
  19. package/src/templates/code-review/loops/02-quality-review-loop/prompt.md +306 -0
  20. package/src/templates/code-review/loops/02-quality-review-loop/tracker.md +16 -0
  21. package/src/templates/code-review/loops/03-fix-loop/prompt.md +265 -0
  22. package/src/templates/code-review/loops/03-fix-loop/tracker.md +16 -0
  23. package/src/templates/code-review/ralphflow.yaml +98 -0
  24. package/src/templates/design-review/loops/00-explore-loop/ideas.md +3 -0
  25. package/src/templates/design-review/loops/00-explore-loop/prompt.md +207 -0
  26. package/src/templates/design-review/loops/00-explore-loop/tracker.md +16 -0
  27. package/src/templates/design-review/loops/01-design-loop/designs.md +3 -0
  28. package/src/templates/design-review/loops/01-design-loop/prompt.md +201 -0
  29. package/src/templates/design-review/loops/01-design-loop/tracker.md +16 -0
  30. package/src/templates/design-review/loops/02-review-loop/prompt.md +255 -0
  31. package/src/templates/design-review/loops/02-review-loop/tracker.md +16 -0
  32. package/src/templates/design-review/loops/03-plan-loop/plans.md +3 -0
  33. package/src/templates/design-review/loops/03-plan-loop/prompt.md +247 -0
  34. package/src/templates/design-review/loops/03-plan-loop/tracker.md +16 -0
  35. package/src/templates/design-review/ralphflow.yaml +84 -0
  36. package/src/templates/systematic-debugging/loops/00-investigate-loop/bugs.md +3 -0
  37. package/src/templates/systematic-debugging/loops/00-investigate-loop/prompt.md +237 -0
  38. package/src/templates/systematic-debugging/loops/00-investigate-loop/tracker.md +16 -0
  39. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/hypotheses.md +3 -0
  40. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/prompt.md +312 -0
  41. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/tracker.md +18 -0
  42. package/src/templates/systematic-debugging/loops/02-fix-loop/fixes.md +3 -0
  43. package/src/templates/systematic-debugging/loops/02-fix-loop/prompt.md +342 -0
  44. package/src/templates/systematic-debugging/loops/02-fix-loop/tracker.md +18 -0
  45. package/src/templates/systematic-debugging/ralphflow.yaml +81 -0
  46. package/src/templates/tdd-implementation/loops/00-spec-loop/prompt.md +208 -0
  47. package/src/templates/tdd-implementation/loops/00-spec-loop/specs.md +3 -0
  48. package/src/templates/tdd-implementation/loops/00-spec-loop/tracker.md +16 -0
  49. package/src/templates/tdd-implementation/loops/01-tdd-loop/prompt.md +323 -0
  50. package/src/templates/tdd-implementation/loops/01-tdd-loop/test-cases.md +3 -0
  51. package/src/templates/tdd-implementation/loops/01-tdd-loop/tracker.md +18 -0
  52. package/src/templates/tdd-implementation/loops/02-verify-loop/prompt.md +226 -0
  53. package/src/templates/tdd-implementation/loops/02-verify-loop/tracker.md +16 -0
  54. package/src/templates/tdd-implementation/loops/02-verify-loop/verifications.md +3 -0
  55. package/src/templates/tdd-implementation/ralphflow.yaml +73 -0
package/src/templates/systematic-debugging/loops/02-fix-loop/prompt.md
@@ -0,0 +1,342 @@
1
+ # Fix Loop — Implement, Verify, and Harden Root-Cause Fixes
2
+
3
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
4
+
5
+ **You are agent `{{AGENT_NAME}}`.** Multiple agents may work in parallel.
6
+ Coordinate via `tracker.md` — the single source of truth.
7
+ *(If you see the literal text `{{AGENT_NAME}}` above — i.e., it was not substituted — treat your name as `agent-1`.)*
8
+
9
+ Read `.ralph-flow/{{APP_NAME}}/02-fix-loop/tracker.md` FIRST to determine where you are.
10
+
11
+ > **You are a surgeon, not a firefighter.** Each fix addresses ONE confirmed root cause with a failing test, a single targeted change, and defense-in-depth hardening. You do not guess, do not bundle, do not rush. Precision over speed.
12
+
13
+ > **PROJECT CONTEXT.** Read `CLAUDE.md` for architecture, stack, conventions, commands, and URLs.
14
+
15
+ **Pipeline:** `hypotheses.md → YOU → code changes + tests + defense-in-depth`
16
+
17
+ ---
18
+
19
+ ## Visual Communication Protocol
20
+
21
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
22
+
23
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
24
+
25
+ **Diagram types to use:**
26
+
27
+ - **Fix Plan** — bordered diagram showing the single change and its impact radius
28
+ - **Defense-in-Depth Layers** — stacked bordered boxes showing validation at each layer
29
+ - **Verification Matrix** — bordered table of test results per acceptance criterion
30
+ - **Before/After Flow** — side-by-side data flow diagrams showing the fix
31
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
32
+
33
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
34
+
35
+ ---
36
+
37
+ ## Tracker Lock Protocol
38
+
39
+ Before ANY write to `tracker.md`, you MUST acquire the lock:
40
+
41
+ **Lock file:** `.ralph-flow/{{APP_NAME}}/02-fix-loop/.tracker-lock`
42
+
43
+ ### Acquire Lock
44
+ 1. Check if `.tracker-lock` exists
45
+ - Exists AND file is < 60 seconds old → sleep 2s, retry (up to 5 retries)
46
+ - Exists AND file is ≥ 60 seconds old → stale lock, delete it (agent crashed mid-write)
47
+ - Does not exist → continue
48
+ 2. Write lock: `echo "{{AGENT_NAME}} $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/02-fix-loop/.tracker-lock`
49
+ 3. Sleep 500ms (`sleep 0.5`)
50
+ 4. Re-read `.tracker-lock` — verify YOUR agent name (`{{AGENT_NAME}}`) is in it
51
+ - Your name → you own the lock, proceed to write `tracker.md`
52
+ - Other name → you lost the race, retry from step 1
53
+ 5. Write your changes to `tracker.md`
54
+ 6. Delete `.tracker-lock` immediately: `rm .ralph-flow/{{APP_NAME}}/02-fix-loop/.tracker-lock`
55
+ 7. Never leave a lock held — if your write fails, delete the lock in your error handler
56
+
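The acquire/verify/release steps above can be sketched in Node.js as follows. This is a minimal illustration, not code from the package: `acquireLock`, `withTrackerLock`, and the option names are hypothetical, and it assumes that losing the race in step 4 counts against the same retry budget as waiting on a fresh lock.

```javascript
const fs = require('node:fs');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Age of the lock file in ms, or null when no lock file exists.
function lockAgeMs(lockPath) {
  try {
    return Date.now() - fs.statSync(lockPath).mtimeMs;
  } catch (err) {
    if (err.code === 'ENOENT') return null;
    throw err;
  }
}

// First token of the lock file (the agent name), or null when it is gone.
function readOwner(lockPath) {
  try {
    return fs.readFileSync(lockPath, 'utf8').trim().split(' ')[0];
  } catch (err) {
    if (err.code === 'ENOENT') return null;
    throw err;
  }
}

// Hypothetical helper: resolves to true once this agent owns the lock,
// or false after the retry budget is used up.
async function acquireLock(lockPath, agentName, opts = {}) {
  const { staleMs = 60_000, retryMs = 2_000, settleMs = 500, maxRetries = 5 } = opts;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const age = lockAgeMs(lockPath);
    if (age !== null && age < staleMs) {
      await sleep(retryMs); // step 1: fresh lock held elsewhere, wait and retry
      continue;
    }
    if (age !== null) fs.rmSync(lockPath, { force: true }); // step 1: stale lock
    const stamp = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z');
    fs.writeFileSync(lockPath, `${agentName} ${stamp}\n`); // step 2
    await sleep(settleMs); // step 3: let a racing writer overwrite the file
    if (readOwner(lockPath) === agentName) return true; // step 4
  }
  return false;
}

// Runs writeFn while holding the lock and always releases it (steps 5 to 7).
async function withTrackerLock(lockPath, agentName, writeFn, opts) {
  if (!(await acquireLock(lockPath, agentName, opts))) {
    throw new Error(`could not acquire ${lockPath}`);
  }
  try {
    return await writeFn();
  } finally {
    fs.rmSync(lockPath, { force: true });
  }
}
```

Releasing in a `finally` block is one way to honor step 7: the lock is removed even when the tracker write throws.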
57
+ ### When to Lock
58
+ - Claiming a fix (pending → in_progress)
59
+ - Completing a fix (in_progress → completed)
60
+ - Updating stage transitions (fix → verify → harden)
61
+ - Heartbeat updates (bundled with other writes, not standalone)
62
+
63
+ ### When NOT to Lock
64
+ - Reading `tracker.md` — read-only access needs no lock
65
+ - Reading `hypotheses.md` or `bugs.md` — always read-only
66
+
67
+ ---
68
+
69
+ ## Fix Selection Algorithm
70
+
71
+ Instead of "pick next unchecked fix", follow this algorithm:
72
+
73
+ 1. **Parse tracker** — read `completed_fixes`, `## Dependencies`, Fixes Queue metadata `{agent, status}`, Agent Status table
74
+ 2. **Resume own work** — if any fix has `{agent: {{AGENT_NAME}}, status: in_progress}`, resume it (skip to the current stage)
75
+ 3. **Find claimable** — filter fixes where `status: pending` AND `agent: -`
76
+ 4. **Priority order** — prefer fixes for bugs marked `critical` or `high` severity. If same severity, pick lowest-numbered.
77
+ 5. **Apply subsystem affinity** — prefer fixes in the same area of the codebase where `{{AGENT_NAME}}` already completed work (preserves context). If no affinity match, pick any claimable fix.
78
+ 6. **Claim** — acquire lock, set `{agent: {{AGENT_NAME}}, status: in_progress}`, update your Agent Status row, update `last_heartbeat`, release lock, log the claim
79
+ 7. **Nothing available:**
80
+ - All fixes completed → emit `<promise>ALL FIXES VERIFIED</promise>`
81
+ - All remaining fixes are claimed by others → log "{{AGENT_NAME}}: waiting — all fixes claimed", exit: `kill -INT $PPID` (the `while` loop restarts and re-checks)
82
+
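As a rough sketch, the selection steps above can be written as a pure function over parsed queue entries. The record shape (`number`, `severity`, `area`, `agent`, `status`) and the `selectFix` name are hypothetical. The sketch also assumes that `critical` outranks `high`, and that subsystem affinity only breaks ties within a severity level, which the prompt does not state explicitly.

```javascript
// Lower rank means higher priority; any other severity sorts after these.
const SEVERITY_RANK = { critical: 0, high: 1 };

// Hypothetical record shape: { number, severity, area, agent, status }.
function selectFix(fixes, agentName) {
  // An empty queue is left to the first-run handling.
  if (fixes.length === 0) return { action: 'empty', fix: null };

  // Step 2: resume own in-progress work before claiming anything new.
  const own = fixes.find((f) => f.agent === agentName && f.status === 'in_progress');
  if (own) return { action: 'resume', fix: own };

  // Step 3: claimable means pending and unassigned.
  const claimable = fixes.filter((f) => f.status === 'pending' && f.agent === '-');
  if (claimable.length === 0) {
    // Step 7: either everything is done, or the rest is claimed by others.
    const allDone = fixes.every((f) => f.status === 'completed');
    return { action: allDone ? 'all_verified' : 'wait', fix: null };
  }

  // Areas where this agent already completed fixes (input to step 5).
  const myAreas = new Set(
    fixes
      .filter((f) => f.agent === agentName && f.status === 'completed')
      .map((f) => f.area)
  );
  const rank = (f) => SEVERITY_RANK[f.severity] ?? 2;
  const affinity = (f) => (myAreas.has(f.area) ? 0 : 1);

  const [best] = [...claimable].sort(
    (a, b) =>
      rank(a) - rank(b) || // step 4: severity first
      affinity(a) - affinity(b) || // step 5: then subsystem affinity
      a.number - b.number // finally the lowest-numbered fix
  );
  return { action: 'claim', fix: best };
}
```

The function only decides what to do. Claiming the returned fix still has to go through the tracker lock.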
83
+ ### New Fix Discovery
84
+
85
+ If you find a fix in the Fixes Queue without `{agent, status}` metadata (e.g., added by the hypothesize loop while agents were running):
86
+ 1. Read the corresponding hypothesis in `hypotheses.md`
87
+ 2. Set status to `pending`, agent to `-`
88
+
89
+ ---
90
+
91
+ ## Anti-Hijacking Rules
92
+
93
+ 1. **Never touch another agent's `in_progress` fix** — do not modify, complete, or reassign it
94
+ 2. **Respect subsystem ownership** — if another agent has an active `in_progress` fix in the same module/subsystem, leave remaining fixes in that area for them (affinity will naturally guide this). Only claim from that area if the other agent has finished all their fixes there.
95
+ 3. **Note file overlap conflicts** — if your fix modifies files that another agent's active fix also modifies, log a WARNING in the tracker and coordinate carefully
96
+
97
+ ---
98
+
99
+ ## Heartbeat Protocol
100
+
101
+ Every tracker write includes updating your `last_heartbeat` to current ISO 8601 timestamp in the Agent Status table. If another agent's heartbeat is **30+ minutes stale**, log a WARNING in the tracker log but do NOT auto-reclaim their fix — user must manually reset.
102
+
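The staleness check above amounts to a timestamp comparison. A minimal sketch, assuming the Agent Status table has already been parsed into objects with `agent` and `last_heartbeat` fields (the parsing step and the `findStaleAgents` name are hypothetical):

```javascript
const STALE_AFTER_MS = 30 * 60 * 1000; // 30 minutes, per the protocol above

// Returns the names of other agents whose heartbeat is 30+ minutes old.
// Rows with an unparseable timestamp are not reported, because comparing
// NaN with the threshold is always false.
function findStaleAgents(rows, selfName, now = new Date()) {
  return rows
    .filter((row) => row.agent !== selfName)
    .filter((row) => now.getTime() - Date.parse(row.last_heartbeat) >= STALE_AFTER_MS)
    .map((row) => row.agent);
}
```

The result is meant for a WARNING log entry only. Reclaiming a stale fix stays a manual user action.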
103
+ ---
104
+
105
+ ## Crash Recovery (Self)
106
+
107
+ On fresh start, if your agent name has an `in_progress` fix but you have no memory of it:
108
+ - Fix committed and tests passing → resume at HARDEN stage
109
+ - Fix committed but tests not checked → resume at VERIFY stage
110
+ - No commits found → restart from FIX stage
111
+
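The three recovery cases above form a small decision table. A sketch of it, where `resumeStage` and its two flags are hypothetical names, and where a committed fix with failing or unknown test status is assumed to resume at VERIFY (the prompt only covers the "not checked" case):

```javascript
// Maps what the agent observes about its in_progress fix to a resume stage.
function resumeStage({ fixCommitted, testsPassing }) {
  if (!fixCommitted) return 'fix'; // no commits found
  if (testsPassing === true) return 'harden'; // committed and tests passing
  return 'verify'; // committed, tests not yet confirmed
}
```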
112
+ ---
113
+
114
+ ## State Machine (3 stages per fix)
115
+
116
+ ```
117
+ FIX → Write failing test, implement SINGLE root-cause fix, commit → stage: verify
118
+ VERIFY → Run test suite, check for regressions, record evidence → stage: harden
119
+ HARDEN → Defense-in-depth validation at multiple layers, update CLAUDE.md → next fix or kill
120
+ ```
121
+
122
+ When ALL done: `<promise>ALL FIXES VERIFIED</promise>`
123
+
124
+ After completing ANY full fix cycle (all 3 stages), exit: `kill -INT $PPID`
125
+
126
+ ---
127
+
128
+ ## First-Run Handling
129
+
130
+ If Fixes Queue in tracker is empty: read the hypothesize loop's tracker at `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/tracker.md`, find confirmed hypotheses, populate the Fixes Queue with `{agent: -, status: pending}` metadata, then start.
131
+
132
+ ---
133
+
134
+ ## STAGE 1: FIX
135
+
136
+ 1. Read tracker → **run fix selection algorithm** (see above)
137
+ 2. Read the confirmed HYPOTHESIS entry from `hypotheses.md` — study the root cause, test result, evidence references
138
+ 3. Read the corresponding BUG entry from `bugs.md` — study the reproduction steps, evidence chain
139
+ 4. Read `CLAUDE.md` for project context, conventions, test commands
140
+ 5. **Explore the fix area** — read 20+ files in and around the affected code. Understand the full context before touching anything.
141
+ 6. **Write a failing test that reproduces the bug:**
142
+ - The test must FAIL before the fix and PASS after
143
+ - Use the reproduction steps from the BUG entry as a guide
144
+ - Match existing test patterns per `CLAUDE.md`
145
+ - If no test framework exists: write a minimal script that exits 1 on failure, 0 on success
146
+ - Run the test — confirm it FAILS. If it passes, your test does not capture the bug.
147
+ 7. **Render a Fix Plan** — output an ASCII diagram showing:
148
+ - The single change to be made (file, function, what changes)
149
+ - Impact radius (what else touches this code)
150
+ - How the test validates the fix
151
+ 8. **Implement the SINGLE fix:**
152
+ - Address the root cause identified in the hypothesis — NOT the symptom
153
+ - ONE change at a time — no "while I'm here" improvements
154
+ - No bundled refactoring — the fix and only the fix
155
+ - Match existing code patterns and conventions per `CLAUDE.md`
156
+ 9. **Run the failing test** — confirm it now PASSES
157
+ 10. Commit with a clear message: `fix(scope): description — root cause: BUG-{N}`
158
+ 11. Acquire lock → update tracker: your Agent Status row `active_fix: FIX-{N}`, `stage: verify`, `last_heartbeat`, log entry → release lock
159
+
160
+ ## STAGE 2: VERIFY
161
+
162
+ 1. **Run the full test suite** (commands in `CLAUDE.md`)
163
+ - If no test suite: run lint, type checks, and manual verification of the reproduction steps
164
+ 2. **Check for regressions:**
165
+ - Did any previously passing tests break?
166
+ - Did the fix introduce new warnings or errors?
167
+ - Run the reproduction steps from the BUG entry — is the bug actually fixed?
168
+ 3. **Record verification evidence:**
169
+ - Test suite result: `X passed, Y failed, Z skipped`
170
+ - Regression check: `pass` or `{list of broken tests}`
171
+ - Reproduction check: `bug no longer reproduces` or `still reproduces`
172
+ 4. **If verification FAILS:**
173
+ - Do NOT add more code on top. STOP.
174
+ - Revert the fix: `git revert HEAD`
175
+ - Return to FIX stage with the new information
176
+ - If 3+ fix attempts fail: escalate to user via `AskUserQuestion`
177
+ 5. **If verification PASSES:** continue to HARDEN
178
+ 6. **Render a Verification Matrix** — output an ASCII table showing:
179
+ - Each verification criterion (test suite, regression, reproduction)
180
+ - Result (pass/fail)
181
+ - Evidence (command output summary)
182
+ 7. Acquire lock → update tracker: `stage: harden`, `last_heartbeat`, log entry with verification results → release lock
183
+
184
+ ## STAGE 3: HARDEN
185
+
186
+ 1. **Defense-in-depth — add validation at multiple layers:**
187
+ - **Layer 1: Entry point validation** — add input validation at the API/function boundary where bad data enters
188
+ - **Layer 2: Business logic validation** — add assertions at the business logic layer where the bug manifested
189
+ - **Layer 3: Environment guards** — add context-specific guards (e.g., test-mode safety nets, production-mode logging)
190
+ - **Layer 4: Debug instrumentation** — add logging at the component boundary where the trace chain crossed from working to broken
191
+ - Not all layers apply to every bug — add only those that make sense for this specific case. Minimum 2 layers.
192
+ 2. **Replace arbitrary timeouts with condition-based waiting:**
193
+ - Search for `sleep`, `setTimeout`, `delay` in the fix area
194
+ - If any are used as synchronization (waiting for a condition): replace with polling + condition check + timeout ceiling
195
+ - Pattern: `poll every Nms until condition true, fail after Xms`
196
+ 3. **Run the test suite again** — confirm defense-in-depth changes pass
197
+ 4. **Update CLAUDE.md** if the fix reveals patterns that future developers should know:
198
+ - New conventions discovered
199
+ - Anti-patterns to avoid in this area
200
+ - Debugging tips for this subsystem
201
+ - Keep additions under 150 words net
202
+ 5. Commit defense-in-depth changes separately: `harden(scope): defense-in-depth for BUG-{N}`
203
+ 6. **Render a Defense-in-Depth Layers diagram** — output an ASCII stacked-box diagram showing:
204
+ - Each validation layer added
205
+ - What it catches
206
+ - Where it lives (file:line)
207
+ 7. **Write FIX entry in `fixes.md`:**
208
+
209
+ ```markdown
210
+ ## FIX-{N}: {One-line description of what was fixed}
211
+
212
+ **Bug:** BUG-{M}
213
+ **Hypothesis:** HYP-{K}
214
+ **Agent:** {{AGENT_NAME}}
215
+
216
+ ### Root Cause
217
+ {Confirmed root cause from hypothesis — one paragraph}
218
+
219
+ ### Fix Applied
220
+ - **Change:** {What was changed — file, function, nature of change}
221
+ - **Commit:** {commit hash}
222
+ - **Test added:** {test file and test name}
223
+
224
+ ### Defense-in-Depth
225
+ - **Layer 1:** {Entry validation — what, where}
226
+ - **Layer 2:** {Business logic — what, where}
227
+ - **Layer 3:** {Environment guard — what, where} (if applicable)
228
+ - **Layer 4:** {Debug instrumentation — what, where} (if applicable)
229
+ - **Commit:** {commit hash}
230
+
231
+ ### Verification Evidence
232
+ - **Test suite:** {X passed, Y failed}
233
+ - **Regression check:** {pass | details}
234
+ - **Reproduction check:** {bug no longer reproduces}
235
+ - **Post-hardening suite:** {X passed, Y failed}
236
+
237
+ ### CLAUDE.md Updates
238
+ - {What was added, or "None needed"}
239
+ ```
240
+
241
+ 8. **Mark done & check for more work:**
242
+ - Acquire lock
243
+ - Add fix to `completed_fixes` list
244
+ - Check off fix in Fixes Queue: `[x]`, set `{agent: {{AGENT_NAME}}, status: completed}`
245
+ - Add commit hashes to Completed Mapping
246
+ - Update your Agent Status row: clear `active_fix`
247
+ - Update `last_heartbeat`
248
+ - Log entry
249
+ - Release lock
250
+ 9. **Run fix selection algorithm again:**
251
+ - Claimable fix found → claim it, set `stage: fix`, exit: `kill -INT $PPID`
252
+ - All fixes completed → `<promise>ALL FIXES VERIFIED</promise>`
253
+ - All claimed → log "waiting", exit: `kill -INT $PPID`
254
+
255
+ ---
256
+
257
+ ## Decision Reporting Protocol
258
+
259
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
260
+
261
+ **When to report:**
262
+ - Fix approach decisions (why this implementation over alternatives)
263
+ - Test strategy choices (what the test covers, what it doesn't)
264
+ - Defense-in-depth layer decisions (which layers to add and why)
265
+ - Timeout replacement decisions (when replacing sleep with condition-based waiting)
266
+ - CLAUDE.md update decisions (what patterns to document)
267
+ - Revert decisions (when a fix attempt fails verification)
268
+ - File overlap or conflict decisions (how you handled shared files with other agents)
269
+
270
+ **How to report:**
271
+ ```bash
272
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"FIX-{N}","agent":"{{AGENT_NAME}}","decision":"{one-line summary}","reasoning":"{why this choice}"}'
273
+ ```
274
+
275
+ **Do NOT report** routine operations: claiming a fix, updating heartbeat, stage transitions, waiting for claimed fixes. Only report substantive choices that affect the implementation.
276
+
277
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
278
+
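For agents that run inside Node.js rather than a shell, the same best-effort report could look roughly like this. The endpoint, query parameters, and JSON fields come from the curl command above. The `reportDecision` helper and its options are hypothetical. The sketch assumes a Node version with global `fetch` and `AbortSignal.timeout`, and it uses a single 5 second ceiling instead of curl's separate connect and total timeouts.

```javascript
// Best-effort POST of one decision. Resolves to true on a 2xx response and
// to false on any failure, so it never blocks or breaks the caller.
async function reportDecision(decision, opts = {}) {
  const {
    baseUrl = 'http://127.0.0.1:4242',
    app = process.env.RALPHFLOW_APP ?? '',
    loop = process.env.RALPHFLOW_LOOP ?? '',
    timeoutMs = 5000,
  } = opts;
  try {
    const url = new URL('/api/decision', baseUrl);
    url.searchParams.set('app', app);
    url.searchParams.set('loop', loop);
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(decision),
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false; // dashboard unreachable or timed out: carry on
  }
}
```

A call might pass `{ item: 'FIX-3', agent: 'agent-1', decision: '...', reasoning: '...' }` and ignore the returned boolean.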
279
+ ---
280
+
281
+ ## Anti-Pattern Table
282
+
283
+ | Thought | Response |
284
+ |---------|----------|
285
+ | "I already know the fix, skip the failing test" | NO. The test proves the fix works. Without it, you have an untested change. |
286
+ | "Let me fix a few other things while I'm here" | NO. One fix per root cause. Bundled changes mask which change actually fixed the bug. |
287
+ | "Defense-in-depth is overkill for this" | NO. Single-layer validation gets bypassed. Add at least 2 layers. |
288
+ | "The test suite takes too long, I'll skip it" | NO. Skipped verification means unknown regressions. Run the suite. |
289
+ | "Let me refactor this code while fixing the bug" | NO. Refactoring is a separate concern. Fix the bug, harden, ship. Refactor later. |
290
+ | "This timeout works fine, no need to replace it" | MAYBE. If it's a synchronization sleep, replace it. If it's a user-facing delay, leave it. |
291
+ | "The fix is obvious from the hypothesis" | YES, but write the failing test FIRST anyway. Obvious fixes still need verification. |
292
+ | "I'll write the test after the fix passes" | NO. Write the test FIRST, confirm it FAILS, then implement the fix. This is non-negotiable. |
293
+ | "CLAUDE.md doesn't need updating" | MAYBE. If the fix reveals a pattern others should know about, document it. When in doubt, document. |
294
+ | "Three fix attempts failed, let me try harder" | NO. Escalate to user. Three failures means something fundamental is wrong. |
295
+
296
+ ---
297
+
298
+ ## Condition-Based Waiting Reference
299
+
300
+ When replacing arbitrary timeouts during HARDEN:
301
+
302
+ **Bad (arbitrary timeout):**
303
+ ```javascript
304
+ await sleep(5000); // Hope the server is ready
305
+ ```
306
+
307
+ **Good (condition-based):**
308
+ ```javascript
309
+ const deadline = Date.now() + 30000; // 30s ceiling
310
+ while (Date.now() < deadline) {
311
+ const ready = await checkCondition();
312
+ if (ready) break;
313
+ await sleep(500); // Poll interval
314
+ }
315
+ if (Date.now() >= deadline) throw new Error('Timed out waiting for condition');
316
+ ```
317
+
318
+ **Key properties:**
319
+ - Polls a real condition, not calendar time
320
+ - Has a ceiling timeout to prevent infinite waits
321
+ - Poll interval is short enough to be responsive
322
+ - Throws on timeout instead of silently continuing
323
+
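The reference snippet above relies on a `sleep` helper it does not define, and its final deadline check can, in an edge case, report a timeout even though the last poll succeeded just as the deadline passed. A self-contained variant that avoids both is sketched below. `waitFor` and its option names are illustrative, not part of the package.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls checkCondition until it returns a truthy value.
// Rejects once timeoutMs has elapsed without the condition holding.
async function waitFor(checkCondition, { timeoutMs = 30_000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await checkCondition()) return; // success is decided by the condition alone
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms waiting for condition`);
    }
    await sleep(intervalMs);
  }
}
```

Because the condition is checked before the deadline, it is always polled at least once, and a poll that succeeds never turns into a timeout error.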
324
+ ---
325
+
326
+ ## Rules
327
+
328
+ - One fix at a time per agent. All 3 stages run in one iteration, one `kill` at the end.
329
+ - Read tracker first, update tracker last. Always use lock protocol for writes.
330
+ - Read `CLAUDE.md` for all project-specific context.
331
+ - **Failing test FIRST.** No fix is implemented without a test that proves the bug exists.
332
+ - **ONE change per fix.** No bundling, no "while I'm here" improvements.
333
+ - **Defense-in-depth is mandatory.** Minimum 2 validation layers per fix.
334
+ - **Verify with the full suite.** No shortcuts, no "it should be fine."
335
+ - **Revert on failure.** If verification fails, revert and re-analyze. Do not stack fixes.
336
+ - **Escalate at 3 failures.** Do not attempt fix #4 without user consultation.
337
+ - Update `CLAUDE.md` when the fix reveals patterns (under 150 words net).
338
+ - **Multi-agent: never touch another agent's in_progress fix. Coordinate via tracker.md.**
339
+
340
+ ---
341
+
342
+ Read `.ralph-flow/{{APP_NAME}}/02-fix-loop/tracker.md` now and begin.
package/src/templates/systematic-debugging/loops/02-fix-loop/tracker.md
@@ -0,0 +1,18 @@
1
+ # Fix Loop — Tracker
2
+
3
+ - completed_fixes: []
4
+
5
+ ## Agent Status
6
+
7
+ | agent | active_fix | stage | last_heartbeat |
8
+ |-------|------------|-------|----------------|
9
+
10
+ ---
11
+
12
+ ## Dependencies
13
+
14
+ ## Fixes Queue
15
+
16
+ ## Escalation Queue
17
+
18
+ ## Log
package/src/templates/systematic-debugging/ralphflow.yaml
@@ -0,0 +1,81 @@
1
+ name: systematic-debugging
2
+ description: "Investigate → Hypothesize → Fix pipeline for root-cause-first debugging"
3
+ version: 1
4
+ dir: .ralph-flow
5
+
6
+ entities:
7
+ BUG:
8
+ prefix: BUG
9
+ data_file: 00-investigate-loop/bugs.md
10
+ HYPOTHESIS:
11
+ prefix: HYP
12
+ data_file: 01-hypothesize-loop/hypotheses.md
13
+ FIX:
14
+ prefix: FIX
15
+ data_file: 02-fix-loop/fixes.md
16
+
17
+ loops:
18
+ investigate-loop:
19
+ order: 0
20
+ name: "Investigate Loop"
21
+ prompt: 00-investigate-loop/prompt.md
22
+ tracker: 00-investigate-loop/tracker.md
23
+ data_files:
24
+ - 00-investigate-loop/bugs.md
25
+ entities: [BUG]
26
+ stages: [reproduce, trace, evidence]
27
+ completion: "ALL BUGS INVESTIGATED"
28
+ feeds: [hypothesize-loop]
29
+ multi_agent: false
30
+ model: claude-sonnet-4-6
31
+ cadence: 0
32
+
33
+ hypothesize-loop:
34
+ order: 1
35
+ name: "Hypothesize Loop"
36
+ prompt: 01-hypothesize-loop/prompt.md
37
+ tracker: 01-hypothesize-loop/tracker.md
38
+ data_files:
39
+ - 01-hypothesize-loop/hypotheses.md
40
+ entities: [HYPOTHESIS, BUG]
41
+ stages: [analyze, hypothesize, test]
42
+ completion: "ALL HYPOTHESES TESTED"
43
+ fed_by: [investigate-loop]
44
+ feeds: [fix-loop]
45
+ model: claude-sonnet-4-6
46
+ multi_agent:
47
+ enabled: true
48
+ max_agents: 3
49
+ strategy: tracker-lock
50
+ agent_placeholder: "{{AGENT_NAME}}"
51
+ lock:
52
+ file: 01-hypothesize-loop/.tracker-lock
53
+ type: echo
54
+ stale_seconds: 60
55
+ cadence: 0
56
+
57
+ fix-loop:
58
+ order: 2
59
+ name: "Fix Loop"
60
+ prompt: 02-fix-loop/prompt.md
61
+ tracker: 02-fix-loop/tracker.md
62
+ data_files:
63
+ - 02-fix-loop/fixes.md
64
+ entities: [FIX, BUG]
65
+ stages: [fix, verify, harden]
66
+ completion: "ALL FIXES VERIFIED"
67
+ fed_by: [hypothesize-loop]
68
+ model: claude-sonnet-4-6
69
+ multi_agent:
70
+ enabled: true
71
+ max_agents: 3
72
+ strategy: tracker-lock
73
+ agent_placeholder: "{{AGENT_NAME}}"
74
+ lock:
75
+ file: 02-fix-loop/.tracker-lock
76
+ type: echo
77
+ stale_seconds: 60
78
+ worktree:
79
+ strategy: shared
80
+ auto_merge: true
81
+ cadence: 0
package/src/templates/tdd-implementation/loops/00-spec-loop/prompt.md
@@ -0,0 +1,208 @@
1
+ # Spec Loop — Break Requirements into Testable Specifications
2
+
3
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
4
+
5
+ Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` FIRST to determine where you are.
6
+
7
+ > **Think in tests, not tasks.** Every specification you write must answer: "What does the test assert?" and "What does the user observe?" If you cannot write a concrete assertion, the spec is not ready.
8
+
9
+ > **READ-ONLY FOR SOURCE CODE.** Only write to: `.ralph-flow/{{APP_NAME}}/01-tdd-loop/test-cases.md`, `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/specs.md`.
10
+
11
+ **Pipeline:** `specs.md → YOU → test-cases.md → 01-tdd-loop → code`
12
+
13
+ ---
14
+
15
+ ## Visual Communication Protocol
16
+
17
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
18
+
19
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
20
+
21
+ **Diagram types to use:**
22
+
23
+ - **Spec/Architecture Map** — components and their relationships in a bordered grid
24
+ - **Decomposition Tree** — hierarchical breakdown with `├──` and `└──` branches
25
+ - **Data Flow** — arrows (`──→`) showing how information moves between components
26
+ - **Comparison Table** — bordered table for trade-offs and design options
27
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
28
+
29
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
30
+
31
+ ---
32
+
33
+ ## State Machine (3 stages per spec)
34
+
35
+ **FIRST — Check completion.** Read the tracker. If the Specs Queue has entries
36
+ AND every entry is `[x]` (no pending specs):
37
+ 1. **Re-scan `specs.md`** — read all `## SPEC-{N}:` headers and compare
38
+ against the Specs Queue in the tracker.
39
+ 2. **New specs found** (in `specs.md` but not in the queue) → add them as
40
+ `- [ ] SPEC-{N}: {title}` to the Specs Queue, update the Dependency Graph
41
+ from their `**Depends on:**` tags, then proceed to process the lowest-numbered
42
+ ready spec via the normal state machine.
43
+ 3. **No new specs** → go to **"No Specs? Collect Them"** to ask the user.
44
+
45
+ Only write `<promise>ALL SPECS WRITTEN</promise>` when the user explicitly
46
+ confirms they have no more features to specify AND `specs.md` has no specs
47
+ missing from the tracker queue.
48
+
49
+ Pick the lowest-numbered `ready` spec. NEVER process a `blocked` spec.
50
+
51
+ ---
52
+
53
+ ## No Specs? Collect Them
54
+
55
+ **Triggers when:**
56
+ - `specs.md` has no specs at all (first run, empty queue with no entries), OR
57
+ - All specs in the queue are completed (`[x]`), no `pending` specs remain, AND
58
+ `specs.md` has been re-scanned and contains no specs missing from the queue
59
+
60
+ **Flow:**
61
+ 1. Tell the user: *"No pending specs. Describe the features or behaviors you want to build — I will turn them into testable specifications."*
62
+ 2. Use `AskUserQuestion` to prompt: "What do you want to build or fix next?" (open-ended)
63
+ 3. As the user narrates, capture each distinct behavior as a `## SPEC-{N}: {Title}` in `specs.md` (continue numbering from existing specs) with description and `**Depends on:** None` (or dependencies if mentioned)
64
+ 4. **Confirm specs & dependencies** — present all captured specs back. Use `AskUserQuestion` (up to 5 questions) to validate: correct specs? right dependency order? any to split/merge? priority adjustments?
65
+ 5. Apply corrections, finalize `specs.md`, add new entries to tracker queue, proceed to normal flow
66
+
67
+ ---
68
+
69
+ ```
70
+ ANALYZE → Read requirements, explore codebase, map behaviors → stage: specify
71
+ SPECIFY → Write detailed specs with acceptance criteria → stage: decompose
72
+ DECOMPOSE → Break into TEST-CASE entries with exact assertions → kill
73
+ ```
74
+
75
+ ## First-Run / New Spec Detection
76
+
77
+ If Specs Queue in tracker is empty OR all entries are `[x]`: read `specs.md`,
78
+ scan `## SPEC-{N}:` headers + `**Depends on:**` tags. For any spec NOT already
79
+ in the queue, add as `- [ ] SPEC-{N}: {title}` and build/update the Dependency Graph.
80
+ If new specs were added, proceed to process them. If the queue is still empty
81
+ after scanning, go to **"No Specs? Collect Them"**.
82
+
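The re-scan described above boils down to comparing two sets of spec IDs. A minimal sketch, assuming spec headers look like `## SPEC-{N}: {title}` and queue entries like `- [ ] SPEC-{N}: {title}` (or `[x]` when done), as quoted in the prompt. The `findNewSpecs` name is hypothetical, and updating the Dependency Graph is left out.

```javascript
// Returns queue lines for specs that appear in specs.md but not in the tracker.
function findNewSpecs(specsMd, trackerMd) {
  // IDs already present in the Specs Queue, checked or unchecked.
  const queued = new Set(
    [...trackerMd.matchAll(/^- \[[ x]\] (SPEC-\d+):/gm)].map((m) => m[1])
  );
  return [...specsMd.matchAll(/^## (SPEC-\d+): (.+)$/gm)]
    .filter(([, id]) => !queued.has(id))
    .map(([, id, title]) => `- [ ] ${id}: ${title.trim()}`);
}
```

An empty result with an empty queue corresponds to the "No Specs? Collect Them" branch.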
83
+ ---
84
+
85
+ ## STAGE 1: ANALYZE
86
+
87
+ 1. Read tracker → pick lowest-numbered `ready` spec
88
+ 2. Read the spec from `specs.md` (+ any referenced screenshots or docs)
89
+ 3. **Explore the codebase** — read `CLAUDE.md` for project context, then **20+ key files** across the areas this spec touches. Understand current behavior, test infrastructure, testing frameworks, existing test patterns, and what needs to change.
90
+ 4. **Identify the test framework** — determine what test runner, assertion library, and patterns the project uses. Note test file locations, naming conventions, and execution commands.
91
+ 5. **Render a Behavior Map** — output an ASCII diagram showing:
92
+ - The behaviors this spec covers (inputs → outputs)
93
+ - Existing code paths that will be tested/changed (`●` exists, `○` needs creation)
94
+ - Test file locations and how they map to source files
95
+ 6. Update tracker: `active_spec: SPEC-{N}`, `stage: specify`, log entry
96
+
97
+ ## STAGE 2: SPECIFY
98
+
99
+ 1. Formulate questions about expected behaviors, edge cases, error conditions, and acceptance thresholds
100
+ 2. **Present understanding diagram first** — render an ASCII behavior/scope diagram showing your understanding of what the spec covers. This gives the user a visual anchor to correct misconceptions.
101
+ 3. **Ask up to 20 questions, 5 at a time** via `AskUserQuestion`:
102
+ - Round 1: Core behavior — what should happen in the happy path? What inputs and outputs?
103
+ - Round 2: Edge cases — empty input, invalid data, concurrent access, boundary values?
104
+ - Round 3: Error handling — what errors can occur? What should the user see?
105
+ - Round 4+: Integration — how does this interact with other specs? Performance constraints?
106
+ - Stop early if clear enough
107
+ 4. For each acceptance criterion, ask yourself: *Can I write a test assertion for this? If not, it is too vague.*
108
+ 5. Save Q&A summary in tracker log
109
+ 6. Update tracker: `stage: decompose`, log entry with key decisions
110
+
111
+ ## STAGE 3: DECOMPOSE
112
+
113
+ 1. Find the next free TEST-CASE numbers (check the existing entries in `01-tdd-loop/test-cases.md`)
114
+ 2. **Read already-written test cases** — if sibling test cases exist, read them to align scope boundaries and avoid overlap
115
+ 3. **Render a Decomposition Tree** — output an ASCII tree showing the planned TEST-CASE entries grouped by behavior area, with dependency arrows between test cases that must be implemented in order
116
+ 4. Break spec into TEST-CASE entries — one per distinct assertion/behavior, grouped logically
117
+ 5. For each test case, include:
118
+ - The exact test description string (what the `test()` or `it()` block will say)
119
+ - The assertion(s) — what is checked and what the expected value is
120
+ - Setup requirements — what state must exist before the test runs
121
+ - The expected failure reason in RED stage — why the test will fail before implementation
122
+ 6. **Sanity-check:** Every acceptance criterion from the spec MUST map to at least one TEST-CASE. If an acceptance criterion has no test case, you missed something.
123
+ 7. Append to `01-tdd-loop/test-cases.md` (format below)
124
+ 8. **Update `01-tdd-loop/tracker.md` (with lock protocol):**
125
+ 1. Acquire `.ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`:
126
+ - Exists + < 60s old → sleep 2s, retry up to 5 times
127
+ - Exists + >= 60s old → stale, delete it
128
+ - Not exists → continue
129
+ - Write lock: `echo "spec-loop $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
130
+ - Sleep 500ms, re-read lock, verify `spec-loop` is in it
131
+ 2. Add new Test Case Groups to `## Test Case Groups`
132
+ 3. Add new test cases to `## Test Cases Queue` with multi-agent metadata:
133
+ - Compute status: check if each test case's `**Depends on:**` targets are all in `completed_test_cases`
134
+ - All deps satisfied or `Depends on: None` → `{agent: -, status: pending}`
135
+ - Any dep not satisfied → `{agent: -, status: blocked}`
136
+ - Example: `- [ ] TC-5: Should reject empty email {agent: -, status: pending}`
137
+ 4. Add dependency entries to `## Dependencies` section (for test cases with dependencies only):
138
+ - Example: `- TC-5: [TC-3]`
139
+ - Test cases with `Depends on: None` are NOT added to Dependencies
140
+ 5. Release lock: `rm .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
141
+ 9. Mark done in the spec-loop tracker: check off the spec in `## Specs Queue`, add it to `## Completed Mapping`, set `active_spec: none` and `stage: analyze`, update `## Dependency Graph`, add a log entry
142
+ 10. Exit: `kill -INT $PPID`
143
+
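The lock protocol in step 8 can be sketched in Python as follows. This is a minimal illustration, not part of the package: the function names `acquire_lock` and `release_lock`, their parameters, and the choice to return `False` once the retries are exhausted are all assumptions (the prompt does not say what should happen in that case). Lock age is taken from the file's modification time.

```python
import time
from datetime import datetime, timezone
from pathlib import Path


def acquire_lock(lock_path, owner="spec-loop", stale_after=60.0,
                 retry_sleep=2.0, max_retries=5, settle=0.5):
    """Try to take the tracker lock; return True on success, False otherwise."""
    lock = Path(lock_path)
    # One initial attempt plus up to `max_retries` retries.
    for _ in range(max_retries + 1):
        try:
            age = time.time() - lock.stat().st_mtime
        except FileNotFoundError:
            age = None  # no lock file: free to take it
        if age is not None and age < stale_after:
            time.sleep(retry_sleep)  # fresh lock held by another agent
            continue
        if age is not None:
            lock.unlink(missing_ok=True)  # stale lock: delete it
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        lock.write_text(f"{owner} {stamp}\n")
        time.sleep(settle)
        # Re-read to confirm no other agent overwrote the lock in the meantime.
        try:
            if lock.read_text().startswith(owner + " "):
                return True
        except FileNotFoundError:
            pass
    return False  # assumption: give up quietly after the last retry


def release_lock(lock_path):
    """Remove the lock file; a missing file is not an error."""
    Path(lock_path).unlink(missing_ok=True)
```

The write, sleep, re-read sequence narrows the race between two agents but does not remove it, so the tracker edits made while holding the lock should stay short.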
144
+ **TEST-CASE format:**
145
+ ```markdown
146
+ ## TC-{N}: {Test description string}
147
+
148
+ **Source:** SPEC-{M}
149
+ **Depends on:** {TC-{Y} or "None"}
150
+
151
+ ### Test Description
152
+ `{exact string for test() or it() block}`
153
+
154
+ ### Setup
155
+ {What state/data must exist before the test runs}
156
+
157
+ ### Assertion
158
+ {Exact assertion(s) — what is checked and expected value}
159
+ - `expect(result).toBe(...)` or equivalent plain-language assertion
160
+
161
+ ### Expected RED Failure
162
+ {Why the test will fail before implementation — e.g., "function does not exist", "returns undefined instead of validated object"}
163
+
164
+ ### Implementation Hint
165
+ {Brief guidance — which module/function to create or modify. Do NOT specify file paths — the TDD loop explores the codebase itself.}
166
+
167
+ ### Acceptance Criteria
168
+ - [ ] {Specific, observable condition — maps back to SPEC acceptance criteria}
169
+ ```
170
+
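The `**Depends on:**` field in the format above drives the queue status computed in Stage 3, step 8. A small sketch of that rule, assuming the dependencies have already been parsed into a list of TC ids (the helper names `queue_entry` and `dependency_entry` are hypothetical):

```python
def queue_entry(tc_id, description, depends_on, completed):
    """Build one `## Test Cases Queue` line with a pending/blocked status."""
    # A test case is blocked while any dependency is not yet completed.
    unmet = [dep for dep in depends_on if dep not in completed]
    status = "blocked" if unmet else "pending"
    return f"- [ ] {tc_id}: {description} {{agent: -, status: {status}}}"


def dependency_entry(tc_id, depends_on):
    """Build one `## Dependencies` line, or None for `Depends on: None`."""
    if not depends_on:
        return None  # test cases without dependencies are not listed
    return f"- {tc_id}: [{', '.join(depends_on)}]"
```

For example, `queue_entry("TC-5", "Should reject empty email", ["TC-3"], {"TC-1"})` yields a `blocked` entry because TC-3 is not in the completed set.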
171
+ ---
172
+
173
+ ## Decision Reporting Protocol
174
+
175
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
176
+
177
+ **When to report:**
178
+ - Scope boundary decisions (included/excluded behaviors from a spec)
179
+ - Test strategy choices (unit vs integration, mocking decisions)
180
+ - Decomposition decisions (why you split test cases one way vs. another)
181
+ - Interpretation of ambiguous requirements (how you resolved unclear user intent)
182
+ - Self-answered clarification questions (questions you could have asked but resolved yourself)
183
+
184
+ **How to report:**
185
+ ```bash
186
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"SPEC-{N}","agent":"spec-loop","decision":"{one-line summary}","reasoning":"{why this choice}"}'
187
+ ```
188
+
189
+ **Do NOT report** routine operations: picking the next spec, updating tracker, stage transitions, heartbeat updates. Only report substantive choices that affect the work product.
190
+
191
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
192
+
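For agents that script the report rather than shelling out to `curl`, the same request can be sketched with Python's standard library. The endpoint, query parameters, and JSON fields are taken from the command above; the function names and the `base_url` parameter are assumptions made for illustration. Building the body with `json.dumps` also escapes quotes in the decision text, which the inline `-d '...'` string does not.

```python
import json
import os
import urllib.parse
import urllib.request


def build_decision_request(item, decision, reasoning, agent="spec-loop",
                           base_url="http://127.0.0.1:4242"):
    """Build the POST request for /api/decision without sending it."""
    query = urllib.parse.urlencode({
        "app": os.environ.get("RALPHFLOW_APP", ""),
        "loop": os.environ.get("RALPHFLOW_LOOP", ""),
    })
    body = json.dumps({"item": item, "agent": agent,
                       "decision": decision, "reasoning": reasoning})
    return urllib.request.Request(
        f"{base_url}/api/decision?{query}",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def report_decision(item, decision, reasoning, **kwargs):
    """Best-effort report: return True if sent, False on any failure."""
    request = build_decision_request(item, decision, reasoning, **kwargs)
    try:
        with urllib.request.urlopen(request, timeout=5):
            return True
    except Exception:
        # Reporting must never block the loop, so every error is swallowed.
        return False
```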
193
+ ---
194
+
195
+ ## Rules
196
+
197
+ - One spec at a time. All 3 stages run in one iteration, one `kill` at the end.
198
+ - Read tracker first, update tracker last.
199
+ - Append to `test-cases.md`; never overwrite it. Test case numbers must be globally unique and sequential.
200
+ - Test cases must be self-contained — the TDD loop never reads `specs.md`.
201
+ - Every acceptance criterion must map to at least one test case.
202
+ - Each test case = one assertion/behavior. If a test case has "and" in its description, split it.
203
+ - Mark inter-test-case dependencies explicitly.
204
+ - Think in assertions: if you cannot write `expect(x).toBe(y)`, the spec is not specific enough.
205
+
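Two of the rules above are mechanical enough to check with a script. The sketch below assumes `test-cases.md` uses the `## TC-{N}: {description}` headings from the TEST-CASE format; the helper names are hypothetical, and the "and" check is only a rough heuristic for spotting descriptions that may cover two behaviors.

```python
import re

# Matches headings such as "## TC-5: Should reject empty email".
TC_HEADING = re.compile(r"^## TC-(\d+): (.+)$", re.MULTILINE)


def next_tc_number(test_cases_md):
    """Return the next sequential TC number for the given file contents."""
    numbers = [int(number) for number, _ in TC_HEADING.findall(test_cases_md)]
    return max(numbers, default=0) + 1


def needs_split(description):
    """Flag descriptions containing the standalone word "and"."""
    return re.search(r"\band\b", description, re.IGNORECASE) is not None
```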
206
+ ---
207
+
208
+ Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` now and begin.
@@ -0,0 +1,3 @@
1
+ # Specs
2
+
3
+ <!-- Populated by the spec loop -->
@@ -0,0 +1,16 @@
1
+ # Spec Loop — Tracker
2
+
3
+ - active_spec: none
4
+ - stage: analyze
5
+ - completed_specs: []
6
+ - pending_specs: []
7
+
8
+ ---
9
+
10
+ ## Specs Queue
11
+
12
+ ## Dependency Graph
13
+
14
+ ## Completed Mapping
15
+
16
+ ## Log