ralphflow 0.5.1 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. package/dist/{chunk-DOC64TD6.js → chunk-CA4XP6KI.js} +1 -1
  2. package/dist/ralphflow.js +237 -28
  3. package/dist/{server-EX5MWYW4.js → server-64NQCIKJ.js} +88 -21
  4. package/package.json +1 -1
  5. package/src/dashboard/ui/app.js +4 -1
  6. package/src/dashboard/ui/archives.js +27 -2
  7. package/src/dashboard/ui/index.html +1 -1
  8. package/src/dashboard/ui/loop-detail.js +1 -1
  9. package/src/dashboard/ui/prompt-builder.js +39 -4
  10. package/src/dashboard/ui/sidebar.js +1 -1
  11. package/src/dashboard/ui/state.js +3 -0
  12. package/src/dashboard/ui/styles.css +77 -0
  13. package/src/dashboard/ui/templates.js +3 -0
  14. package/src/dashboard/ui/utils.js +30 -0
  15. package/src/templates/code-implementation/loops/00-story-loop/prompt.md +51 -11
  16. package/src/templates/code-implementation/loops/01-tasks-loop/prompt.md +28 -2
  17. package/src/templates/code-implementation/loops/02-delivery-loop/prompt.md +27 -4
  18. package/src/templates/code-review/loops/00-collect-loop/changesets.md +3 -0
  19. package/src/templates/code-review/loops/00-collect-loop/prompt.md +179 -0
  20. package/src/templates/code-review/loops/00-collect-loop/tracker.md +16 -0
  21. package/src/templates/code-review/loops/01-spec-review-loop/prompt.md +238 -0
  22. package/src/templates/code-review/loops/01-spec-review-loop/tracker.md +16 -0
  23. package/src/templates/code-review/loops/02-quality-review-loop/issues.md +3 -0
  24. package/src/templates/code-review/loops/02-quality-review-loop/prompt.md +306 -0
  25. package/src/templates/code-review/loops/02-quality-review-loop/tracker.md +16 -0
  26. package/src/templates/code-review/loops/03-fix-loop/prompt.md +265 -0
  27. package/src/templates/code-review/loops/03-fix-loop/tracker.md +16 -0
  28. package/src/templates/code-review/ralphflow.yaml +98 -0
  29. package/src/templates/design-review/loops/00-explore-loop/ideas.md +3 -0
  30. package/src/templates/design-review/loops/00-explore-loop/prompt.md +207 -0
  31. package/src/templates/design-review/loops/00-explore-loop/tracker.md +16 -0
  32. package/src/templates/design-review/loops/01-design-loop/designs.md +3 -0
  33. package/src/templates/design-review/loops/01-design-loop/prompt.md +201 -0
  34. package/src/templates/design-review/loops/01-design-loop/tracker.md +16 -0
  35. package/src/templates/design-review/loops/02-review-loop/prompt.md +255 -0
  36. package/src/templates/design-review/loops/02-review-loop/tracker.md +16 -0
  37. package/src/templates/design-review/loops/03-plan-loop/plans.md +3 -0
  38. package/src/templates/design-review/loops/03-plan-loop/prompt.md +247 -0
  39. package/src/templates/design-review/loops/03-plan-loop/tracker.md +16 -0
  40. package/src/templates/design-review/ralphflow.yaml +84 -0
  41. package/src/templates/research/loops/00-discovery-loop/prompt.md +36 -5
  42. package/src/templates/research/loops/01-research-loop/prompt.md +22 -2
  43. package/src/templates/research/loops/02-story-loop/prompt.md +20 -1
  44. package/src/templates/research/loops/03-document-loop/prompt.md +20 -1
  45. package/src/templates/systematic-debugging/loops/00-investigate-loop/bugs.md +3 -0
  46. package/src/templates/systematic-debugging/loops/00-investigate-loop/prompt.md +237 -0
  47. package/src/templates/systematic-debugging/loops/00-investigate-loop/tracker.md +16 -0
  48. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/hypotheses.md +3 -0
  49. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/prompt.md +312 -0
  50. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/tracker.md +18 -0
  51. package/src/templates/systematic-debugging/loops/02-fix-loop/fixes.md +3 -0
  52. package/src/templates/systematic-debugging/loops/02-fix-loop/prompt.md +342 -0
  53. package/src/templates/systematic-debugging/loops/02-fix-loop/tracker.md +18 -0
  54. package/src/templates/systematic-debugging/ralphflow.yaml +81 -0
  55. package/src/templates/tdd-implementation/loops/00-spec-loop/prompt.md +208 -0
  56. package/src/templates/tdd-implementation/loops/00-spec-loop/specs.md +3 -0
  57. package/src/templates/tdd-implementation/loops/00-spec-loop/tracker.md +16 -0
  58. package/src/templates/tdd-implementation/loops/01-tdd-loop/prompt.md +323 -0
  59. package/src/templates/tdd-implementation/loops/01-tdd-loop/test-cases.md +3 -0
  60. package/src/templates/tdd-implementation/loops/01-tdd-loop/tracker.md +18 -0
  61. package/src/templates/tdd-implementation/loops/02-verify-loop/prompt.md +226 -0
  62. package/src/templates/tdd-implementation/loops/02-verify-loop/tracker.md +16 -0
  63. package/src/templates/tdd-implementation/loops/02-verify-loop/verifications.md +3 -0
  64. package/src/templates/tdd-implementation/ralphflow.yaml +73 -0
@@ -0,0 +1,18 @@
+ # Fix Loop — Tracker
+
+ - completed_fixes: []
+
+ ## Agent Status
+
+ | agent | active_fix | stage | last_heartbeat |
+ |-------|------------|-------|----------------|
+
+ ---
+
+ ## Dependencies
+
+ ## Fixes Queue
+
+ ## Escalation Queue
+
+ ## Log
@@ -0,0 +1,81 @@
+ name: systematic-debugging
+ description: "Investigate → Hypothesize → Fix pipeline for root-cause-first debugging"
+ version: 1
+ dir: .ralph-flow
+
+ entities:
+   BUG:
+     prefix: BUG
+     data_file: 00-investigate-loop/bugs.md
+   HYPOTHESIS:
+     prefix: HYP
+     data_file: 01-hypothesize-loop/hypotheses.md
+   FIX:
+     prefix: FIX
+     data_file: 02-fix-loop/fixes.md
+
+ loops:
+   investigate-loop:
+     order: 0
+     name: "Investigate Loop"
+     prompt: 00-investigate-loop/prompt.md
+     tracker: 00-investigate-loop/tracker.md
+     data_files:
+       - 00-investigate-loop/bugs.md
+     entities: [BUG]
+     stages: [reproduce, trace, evidence]
+     completion: "ALL BUGS INVESTIGATED"
+     feeds: [hypothesize-loop]
+     multi_agent: false
+     model: claude-sonnet-4-6
+     cadence: 0
+
+   hypothesize-loop:
+     order: 1
+     name: "Hypothesize Loop"
+     prompt: 01-hypothesize-loop/prompt.md
+     tracker: 01-hypothesize-loop/tracker.md
+     data_files:
+       - 01-hypothesize-loop/hypotheses.md
+     entities: [HYPOTHESIS, BUG]
+     stages: [analyze, hypothesize, test]
+     completion: "ALL HYPOTHESES TESTED"
+     fed_by: [investigate-loop]
+     feeds: [fix-loop]
+     model: claude-sonnet-4-6
+     multi_agent:
+       enabled: true
+       max_agents: 3
+       strategy: tracker-lock
+       agent_placeholder: "{{AGENT_NAME}}"
+       lock:
+         file: 01-hypothesize-loop/.tracker-lock
+         type: echo
+         stale_seconds: 60
+     cadence: 0
+
+   fix-loop:
+     order: 2
+     name: "Fix Loop"
+     prompt: 02-fix-loop/prompt.md
+     tracker: 02-fix-loop/tracker.md
+     data_files:
+       - 02-fix-loop/fixes.md
+     entities: [FIX, BUG]
+     stages: [fix, verify, harden]
+     completion: "ALL FIXES VERIFIED"
+     fed_by: [hypothesize-loop]
+     model: claude-sonnet-4-6
+     multi_agent:
+       enabled: true
+       max_agents: 3
+       strategy: tracker-lock
+       agent_placeholder: "{{AGENT_NAME}}"
+       lock:
+         file: 02-fix-loop/.tracker-lock
+         type: echo
+         stale_seconds: 60
+     worktree:
+       strategy: shared
+       auto_merge: true
+     cadence: 0
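
In shell terms, the `lock: {type: echo, stale_seconds: 60}` blocks in this config amount to a timestamped lock file that any agent may break once it goes stale. A minimal sketch of that acquire/release cycle (the temp path and agent name here are illustrative, not taken from the package):

```shell
# Sketch of an echo-type lock with stale_seconds: 60.
LOCK="$(mktemp -d)/.tracker-lock"   # illustrative path
AGENT="agent-1"                     # illustrative agent name

if [ -f "$LOCK" ]; then
  now=$(date +%s)
  mtime=$(stat -c %Y "$LOCK" 2>/dev/null || stat -f %m "$LOCK")
  [ $((now - mtime)) -ge 60 ] && rm -f "$LOCK"   # stale: holder crashed
fi

echo "$AGENT $(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$LOCK"
sleep 0.5                                        # settle before re-check
grep -q "^$AGENT " "$LOCK" && status=held || status=lost
echo "lock $status by $AGENT"
rm -f "$LOCK"                                    # always release
```

The 500ms re-read is what turns a plain file write into a (best-effort) mutex: two racing agents both write, but only the last writer survives the re-check.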
@@ -0,0 +1,208 @@
+ # Spec Loop — Break Requirements into Testable Specifications
+
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
+
+ Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` FIRST to determine where you are.
+
+ > **Think in tests, not tasks.** Every specification you write must answer: "What does the test assert?" and "What does the user observe?" If you cannot write a concrete assertion, the spec is not ready.
+
+ > **READ-ONLY FOR SOURCE CODE.** Only write to: `.ralph-flow/{{APP_NAME}}/01-tdd-loop/test-cases.md`, `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/specs.md`.
+
+ **Pipeline:** `specs.md → YOU → test-cases.md → 01-tdd-loop → code`
+
+ ---
+
+ ## Visual Communication Protocol
+
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
+
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
+
+ **Diagram types to use:**
+
+ - **Spec/Architecture Map** — components and their relationships in a bordered grid
+ - **Decomposition Tree** — hierarchical breakdown with `├──` and `└──` branches
+ - **Data Flow** — arrows (`──→`) showing how information moves between components
+ - **Comparison Table** — bordered table for trade-offs and design options
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
+
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
+
+ ---
+
+ ## State Machine (3 stages per spec)
+
+ **FIRST — Check completion.** Read the tracker. If the Specs Queue has entries
+ AND every entry is `[x]` (no pending specs):
+ 1. **Re-scan `specs.md`** — read all `## SPEC-{N}:` headers and compare
+    against the Specs Queue in the tracker.
+ 2. **New specs found** (in `specs.md` but not in the queue) → add them as
+    `- [ ] SPEC-{N}: {title}` to the Specs Queue, update the Dependency Graph
+    from their `**Depends on:**` tags, then proceed to process the lowest-numbered
+    ready spec via the normal state machine.
+ 3. **No new specs** → go to **"No Specs? Collect Them"** to ask the user.
+
+ Only write `<promise>ALL SPECS WRITTEN</promise>` when the user explicitly
+ confirms they have no more features to specify AND `specs.md` has no specs
+ missing from the tracker queue.
+
+ Pick the lowest-numbered `ready` spec. NEVER process a `blocked` spec.
+
+ ---
+
+ ## No Specs? Collect Them
+
+ **Triggers when:**
+ - `specs.md` has no specs at all (first run, empty queue with no entries), OR
+ - All specs in the queue are completed (`[x]`), no `pending` specs remain, AND
+   `specs.md` has been re-scanned and contains no specs missing from the queue
+
+ **Flow:**
+ 1. Tell the user: *"No pending specs. Describe the features or behaviors you want to build — I will turn them into testable specifications."*
+ 2. Use `AskUserQuestion` to prompt: "What do you want to build or fix next?" (open-ended)
+ 3. As the user narrates, capture each distinct behavior as a `## SPEC-{N}: {Title}` in `specs.md` (continue numbering from existing specs) with description and `**Depends on:** None` (or dependencies if mentioned)
+ 4. **Confirm specs & dependencies** — present all captured specs back. Use `AskUserQuestion` (up to 5 questions) to validate: correct specs? right dependency order? any to split/merge? priority adjustments?
+ 5. Apply corrections, finalize `specs.md`, add new entries to tracker queue, proceed to normal flow
+
+ ---
+
+ ```
+ ANALYZE   → Read requirements, explore codebase, map behaviors  → stage: specify
+ SPECIFY   → Write detailed specs with acceptance criteria       → stage: decompose
+ DECOMPOSE → Break into TEST-CASE entries with exact assertions  → kill
+ ```
+
+ ## First-Run / New Spec Detection
+
+ If Specs Queue in tracker is empty OR all entries are `[x]`: read `specs.md`,
+ scan `## SPEC-{N}:` headers + `**Depends on:**` tags. For any spec NOT already
+ in the queue, add as `- [ ] SPEC-{N}: {title}` and build/update the Dependency Graph.
+ If new specs were added, proceed to process them. If the queue is still empty
+ after scanning, go to **"No Specs? Collect Them"**.
+
+ ---
+
+ ## STAGE 1: ANALYZE
+
+ 1. Read tracker → pick lowest-numbered `ready` spec
+ 2. Read the spec from `specs.md` (+ any referenced screenshots or docs)
+ 3. **Explore the codebase** — read `CLAUDE.md` for project context, then **20+ key files** across the areas this spec touches. Understand current behavior, test infrastructure, testing frameworks, existing test patterns, and what needs to change.
+ 4. **Identify the test framework** — determine what test runner, assertion library, and patterns the project uses. Note test file locations, naming conventions, and execution commands.
+ 5. **Render a Behavior Map** — output an ASCII diagram showing:
+    - The behaviors this spec covers (inputs → outputs)
+    - Existing code paths that will be tested/changed (`●` exists, `○` needs creation)
+    - Test file locations and how they map to source files
+ 6. Update tracker: `active_spec: SPEC-{N}`, `stage: specify`, log entry
+
+ ## STAGE 2: SPECIFY
+
+ 1. Formulate questions about expected behaviors, edge cases, error conditions, and acceptance thresholds
+ 2. **Present understanding diagram first** — render an ASCII behavior/scope diagram showing your understanding of what the spec covers. This gives the user a visual anchor to correct misconceptions.
+ 3. **Ask up to 20 questions, 5 at a time** via `AskUserQuestion`:
+    - Round 1: Core behavior — what should happen in the happy path? What inputs and outputs?
+    - Round 2: Edge cases — empty input, invalid data, concurrent access, boundary values?
+    - Round 3: Error handling — what errors can occur? What should the user see?
+    - Round 4+: Integration — how does this interact with other specs? Performance constraints?
+    - Stop early if clear enough
+ 4. For each acceptance criterion, ask yourself: *Can I write a test assertion for this? If not, it is too vague.*
+ 5. Save Q&A summary in tracker log
+ 6. Update tracker: `stage: decompose`, log entry with key decisions
+
+ ## STAGE 3: DECOMPOSE
+
+ 1. Find next TEST-CASE numbers (check existing in `01-tdd-loop/test-cases.md`)
+ 2. **Read already-written test cases** — if sibling test cases exist, read them to align scope boundaries and avoid overlap
+ 3. **Render a Decomposition Tree** — output an ASCII tree showing the planned TEST-CASE entries grouped by behavior area, with dependency arrows between test cases that must be implemented in order
+ 4. Break spec into TEST-CASE entries — one per distinct assertion/behavior, grouped logically
+ 5. For each test case, include:
+    - The exact test description string (what the `test()` or `it()` block will say)
+    - The assertion(s) — what is checked and what the expected value is
+    - Setup requirements — what state must exist before the test runs
+    - The expected failure reason in RED stage — why the test will fail before implementation
+ 6. **Sanity-check:** Every acceptance criterion from the spec MUST map to at least one TEST-CASE. If an acceptance criterion has no test case, you missed something.
+ 7. Append to `01-tdd-loop/test-cases.md` (format below)
+ 8. **Update `01-tdd-loop/tracker.md` (with lock protocol):**
+    1. Acquire `.ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`:
+       - Exists + < 60s old → sleep 2s, retry up to 5 times
+       - Exists + >= 60s old → stale, delete it
+       - Not exists → continue
+       - Write lock: `echo "spec-loop $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
+       - Sleep 500ms, re-read lock, verify `spec-loop` is in it
+    2. Add new Test Case Groups to `## Test Case Groups`
+    3. Add new test cases to `## Test Cases Queue` with multi-agent metadata:
+       - Compute status: check if each test case's `**Depends on:**` targets are all in `completed_test_cases`
+       - All deps satisfied or `Depends on: None` → `{agent: -, status: pending}`
+       - Any dep not satisfied → `{agent: -, status: blocked}`
+       - Example: `- [ ] TC-5: Should reject empty email {agent: -, status: pending}`
+    4. Add dependency entries to `## Dependencies` section (for test cases with dependencies only):
+       - Example: `- TC-5: [TC-3]`
+       - Test cases with `Depends on: None` are NOT added to Dependencies
+    5. Release lock: `rm .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
+ 9. Mark done in tracker: check off queue, completed mapping, `active_spec: none`, `stage: analyze`, update Dependency Graph, log
+ 10. Exit: `kill -INT $PPID`
+
+ **TEST-CASE format:**
+ ```markdown
+ ## TC-{N}: {Test description string}
+
+ **Source:** SPEC-{M}
+ **Depends on:** {TC-{Y} or "None"}
+
+ ### Test Description
+ `{exact string for test() or it() block}`
+
+ ### Setup
+ {What state/data must exist before the test runs}
+
+ ### Assertion
+ {Exact assertion(s) — what is checked and expected value}
+ - `expect(result).toBe(...)` or equivalent plain-language assertion
+
+ ### Expected RED Failure
+ {Why the test will fail before implementation — e.g., "function does not exist", "returns undefined instead of validated object"}
+
+ ### Implementation Hint
+ {Brief guidance — which module/function to create or modify. Do NOT specify file paths — the TDD loop explores the codebase itself.}
+
+ ### Acceptance Criteria
+ - [ ] {Specific, observable condition — maps back to SPEC acceptance criteria}
+ ```
+
+ ---
+
+ ## Decision Reporting Protocol
+
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
+
+ **When to report:**
+ - Scope boundary decisions (included/excluded behaviors from a spec)
+ - Test strategy choices (unit vs integration, mocking decisions)
+ - Decomposition decisions (why you split test cases one way vs. another)
+ - Interpretation of ambiguous requirements (how you resolved unclear user intent)
+ - Self-answered clarification questions (questions you could have asked but resolved yourself)
+
+ **How to report:**
+ ```bash
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"SPEC-{N}","agent":"spec-loop","decision":"{one-line summary}","reasoning":"{why this choice}"}'
+ ```
+
+ **Do NOT report** routine operations: picking the next spec, updating tracker, stage transitions, heartbeat updates. Only report substantive choices that affect the work product.
+
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
+
+ ---
+
+ ## Rules
+
+ - One spec at a time. All 3 stages run in one iteration, one `kill` at the end.
+ - Read tracker first, update tracker last.
+ - Append to `test-cases.md` — never overwrite. Numbers globally unique and sequential.
+ - Test cases must be self-contained — the TDD loop never reads `specs.md`.
+ - Every acceptance criterion must map to at least one test case.
+ - Each test case = one assertion/behavior. If a test case has "and" in its description, split it.
+ - Mark inter-test-case dependencies explicitly.
+ - Think in assertions: if you cannot write `expect(x).toBe(y)`, the spec is not specific enough.
+
+ ---
+
+ Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` now and begin.
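
The "best-effort only" rule in the prompt above is easy to get wrong: a single hanging `curl` would stall the whole loop. One way to sketch the fire-and-forget contract in shell (the endpoint and payload fields come from the prompt; the `report_decision` helper name and its sample arguments are illustrative):

```shell
# Sketch: a decision report that can never block or fail the loop.
report_decision() {   # illustrative wrapper, not part of the package
  item="$1"; decision="$2"; reasoning="$3"
  curl -s --connect-timeout 2 --max-time 5 -X POST \
    "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" \
    -H 'Content-Type: application/json' \
    -d "{\"item\":\"$item\",\"agent\":\"spec-loop\",\"decision\":\"$decision\",\"reasoning\":\"$reasoning\"}" \
    >/dev/null 2>&1 || true   # dashboard down? keep working
}

report_decision "SPEC-1" "split login into 3 test cases" "one assertion per case"
echo "loop continues regardless of dashboard status"
```

The `--connect-timeout`/`--max-time` flags bound the wait and `|| true` swallows the exit code, so a dead dashboard costs at most a couple of seconds per report.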
@@ -0,0 +1,3 @@
+ # Specs
+
+ <!-- Populated by the spec loop -->
@@ -0,0 +1,16 @@
+ # Spec Loop — Tracker
+
+ - active_spec: none
+ - stage: analyze
+ - completed_specs: []
+ - pending_specs: []
+
+ ---
+
+ ## Specs Queue
+
+ ## Dependency Graph
+
+ ## Completed Mapping
+
+ ## Log
@@ -0,0 +1,323 @@
1
+ # TDD Loop — Red-Green-Refactor Implementation
2
+
3
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
4
+
5
+ **You are agent `{{AGENT_NAME}}`.** Multiple agents may work in parallel.
6
+ Coordinate via `tracker.md` — the single source of truth.
7
+ *(If you see the literal text `{{AGENT_NAME}}` above — i.e., it was not substituted — treat your name as `agent-1`.)*
8
+
9
+ Read `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md` FIRST to determine where you are.
10
+
11
+ > **PROJECT CONTEXT.** Read `CLAUDE.md` for architecture, stack, conventions, commands, and URLs.
12
+
13
+ **Pipeline:** `test-cases.md → YOU → code changes (tests + production code)`
14
+
15
+ ---
16
+
17
+ ## The Iron Law
18
+
19
+ ```
20
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
21
+ ```
22
+
23
+ Write code before the test? Delete it. Start over. No exceptions:
24
+ - Do not keep it as "reference"
25
+ - Do not "adapt" it while writing tests
26
+ - Do not look at it
27
+ - Delete means delete
28
+
29
+ Implement fresh from tests. Period.
30
+
31
+ ---
32
+
33
+ ## Visual Communication Protocol
34
+
35
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
36
+
37
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
38
+
39
+ **Diagram types to use:**
40
+
41
+ - **TDD Cycle Diagram** — RED/GREEN/REFACTOR status with test output summaries
42
+ - **Decomposition Tree** — hierarchical breakdown with `├──` and `└──` branches
43
+ - **Data Flow** — arrows (`──→`) showing how information moves between components
44
+ - **Comparison Table** — bordered table for trade-offs and design options
45
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
46
+
47
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
48
+
49
+ ---
50
+
51
+ ## Tracker Lock Protocol
52
+
53
+ Before ANY write to `tracker.md`, you MUST acquire the lock:
54
+
55
+ **Lock file:** `.ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
56
+
57
+ ### Acquire Lock
58
+ 1. Check if `.tracker-lock` exists
59
+ - Exists AND file is < 60 seconds old → sleep 2s, retry (up to 5 retries)
60
+ - Exists AND file is >= 60 seconds old → stale lock, delete it (agent crashed mid-write)
61
+ - Does not exist → continue
62
+ 2. Write lock: `echo "{{AGENT_NAME}} $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
63
+ 3. Sleep 500ms (`sleep 0.5`)
64
+ 4. Re-read `.tracker-lock` — verify YOUR agent name (`{{AGENT_NAME}}`) is in it
65
+ - Your name → you own the lock, proceed to write `tracker.md`
66
+ - Other name → you lost the race, retry from step 1
67
+ 5. Write your changes to `tracker.md`
68
+ 6. Delete `.tracker-lock` immediately: `rm .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
69
+ 7. Never leave a lock held — if your write fails, delete the lock in your error handler
70
+
71
+ ### When to Lock
72
+ - Claiming a test case (pending → in_progress)
73
+ - Completing a test case (in_progress → completed, unblocking dependents)
74
+ - Updating stage transitions (red → green → refactor)
75
+ - Heartbeat updates (bundled with other writes, not standalone)
76
+
77
+ ### When NOT to Lock
78
+ - Reading `tracker.md` — read-only access needs no lock
79
+ - Reading `test-cases.md` — always read-only
80
+
81
+ ---
82
+
83
+ ## Test Case Selection Algorithm
84
+
85
+ Instead of "pick next unchecked test case", follow this algorithm:
86
+
87
+ 1. **Parse tracker** — read `completed_test_cases`, `## Dependencies`, Test Cases Queue metadata `{agent, status}`, Agent Status table
88
+ 2. **Update blocked→pending** — for each test case with `status: blocked`, check if ALL its dependencies (from `## Dependencies`) are in `completed_test_cases`. If yes, acquire lock and update to `status: pending`
89
+ 3. **Resume own work** — if any test case has `{agent: {{AGENT_NAME}}, status: in_progress}`, resume it (skip to the current stage)
90
+ 4. **Find claimable** — filter test cases where `status: pending` AND `agent: -`
91
+ 5. **Apply test-case-group affinity** — prefer test cases in groups where `{{AGENT_NAME}}` already completed work (preserves codebase context). If no affinity match, pick any claimable test case
92
+ 6. **Claim** — acquire lock, set `{agent: {{AGENT_NAME}}, status: in_progress}`, update your Agent Status row, update `last_heartbeat`, release lock, log the claim
93
+ 7. **Nothing available:**
94
+ - All test cases completed → emit `<promise>ALL TEST-CASES COMPLETE</promise>`
95
+ - All remaining test cases are blocked or claimed by others → log "{{AGENT_NAME}}: waiting — all test cases blocked or claimed", exit: `kill -INT $PPID` (the `while` loop restarts and re-checks)
96
+
97
+ ### New Test Case Discovery
98
+
99
+ If you find a test case in the Test Cases Queue without `{agent, status}` metadata (e.g., added by the spec loop while agents were running):
100
+ 1. Read the test case's `**Depends on:**` field in `test-cases.md`
101
+ 2. Add the dependency to `## Dependencies` section if not already there (skip if `Depends on: None`)
102
+ 3. Set status to `pending` (all deps in `completed_test_cases`) or `blocked` (deps incomplete)
103
+ 4. Set agent to `-`
104
+
105
+ ---
106
+
107
+ ## Anti-Hijacking Rules
108
+
109
+ 1. **Never touch another agent's `in_progress` test case** — do not modify, complete, or reassign it
110
+ 2. **Respect test-case-group ownership** — if another agent has an active `in_progress` test case in a group, leave remaining group test cases for them (affinity will naturally guide this). Only claim from that group if the other agent has finished all their group test cases
111
+ 3. **Note file overlap conflicts** — if your test case modifies files that another agent's active test case also modifies, log a WARNING in the tracker and coordinate carefully
112
+
113
+ ---
114
+
115
+ ## Heartbeat Protocol
116
+
117
+ Every tracker write includes updating your `last_heartbeat` to current ISO 8601 timestamp in the Agent Status table. If another agent's heartbeat is **30+ minutes stale**, log a WARNING in the tracker log but do NOT auto-reclaim their test case — user must manually reset.
118
+
119
+ ---
120
+
121
+ ## Crash Recovery (Self)
122
+
123
+ On fresh start, if your agent name has an `in_progress` test case but you have no memory of it:
124
+ - Test file exists AND test fails (RED stage completed) → resume at GREEN stage
125
+ - Test file exists AND test passes (GREEN stage completed) → resume at REFACTOR stage
126
+ - No test file found → restart from RED stage
127
+
128
+ ---
129
+
130
+ ## State Machine (3 stages per test case)
131
+
132
+ ```
133
+ RED → Write failing test, run it, confirm correct failure → stage: green
134
+ GREEN → Write minimal code to pass, run tests, confirm pass → stage: refactor
135
+ REFACTOR → Clean up, tests stay green, no new behavior → next test case
136
+ ```
137
+
138
+ When ALL done: `<promise>ALL TEST-CASES COMPLETE</promise>`
139
+
140
+ After completing ANY stage, exit: `kill -INT $PPID`
141
+
142
+ ---
143
+
144
+ ## STAGE 1: RED — Write Failing Test
145
+
146
+ 1. Read tracker → **run test case selection algorithm** (see above)
147
+ 2. Read test case in `test-cases.md` + its source SPEC context
148
+ 3. If sibling test cases are done, read their test files to align patterns
149
+ 4. Read `CLAUDE.md` for project context, test framework, and conventions
150
+ 5. Explore codebase — **20+ files:** test infrastructure, existing test patterns, source modules under test
151
+ 6. **Write ONE failing test** — use the exact test description from the test case:
152
+ - One behavior per test. One assertion per test.
153
+ - Clear name that describes expected behavior
154
+ - Real code, no mocks unless unavoidable
155
+ - Setup only what the test needs
156
+ 7. **Run the test.** Record the FULL output.
157
+ 8. **Verify it fails correctly:**
158
+ - Test FAILS (not errors) → good, confirm the failure message matches the "Expected RED Failure" from the test case
159
+ - Test ERRORS (syntax, import, etc.) → fix the error, re-run until it fails correctly
160
+ - **Test PASSES on first run → STOP. You have a PROBLEM.** Either:
161
+ - The feature already exists → delete the test, report in tracker log, move on
162
+ - Your test is wrong (testing existing behavior, not new behavior) → delete and rewrite
163
+ - Never proceed to GREEN with a test that passed in RED
164
+ 9. **Render a RED Status Diagram** — output an ASCII box showing:
165
+ - Test file path and test name
166
+ - Failure message (truncated to 2 lines)
167
+ - Expected vs actual
168
+ 10. Acquire lock → update tracker: your Agent Status row `active_test_case: TC-{N}`, `stage: green`, `last_heartbeat`, record test output in log → release lock
169
+ 11. Commit test file with message: `test(TC-{N}): RED — {test description}`
170
+ 12. Exit: `kill -INT $PPID`
171
+
172
+ ### RED Stage Rationalization Table
173
+
174
+ | You are thinking... | Answer |
175
+ |---------------------|--------|
176
+ | "I'll write the test after the code" | NO. Delete code. Write test first. |
177
+ | "This is too simple to test" | NO. Simple code breaks. Test takes 30 seconds. |
178
+ | "I know it works" | NO. Confidence is not evidence. |
179
+ | "I need to explore the implementation first" | Fine. Explore. Then THROW AWAY exploration, write test. |
180
+ | "Let me just get it working, then add tests" | NO. That is not TDD. Start over. |
181
+ | "The test is obvious, I can skip RED verification" | NO. You MUST see the test fail. |
182
+ | "I'll keep this code as reference" | NO. Delete means delete. Implement fresh from tests. |
183
+
184
+ ---
185
+
186
+ ## STAGE 2: GREEN — Minimal Implementation
187
+
188
+ 1. Read tracker → confirm your test case, stage should be `green`
189
+ 2. Re-read the test file you wrote in RED
190
+ 3. **Write the SIMPLEST code that makes the test pass.** Nothing more:
191
+ - No features the test does not require
192
+ - No refactoring of other code
193
+ - No "improvements" beyond what the test checks
194
+ - Hardcoding is acceptable if the test only checks one value
195
+ 4. **Run ALL tests** (not just the new one). Record FULL output.
196
+ 5. **Verify:**
197
+ - New test passes → good
198
+ - New test fails → fix implementation (NOT the test), re-run
199
+ - Other tests broken → fix immediately before proceeding
200
+ - Output pristine (no errors, warnings, deprecation notices)
201
+ 6. **Render a GREEN Status Diagram** — output an ASCII box showing:
202
+ - Test count: passed / total
203
+ - The specific test that transitioned RED → GREEN
204
+ - Any warnings or notable output
205
+ 7. Acquire lock → update tracker: `stage: refactor`, `last_heartbeat`, record test output in log → release lock
206
+ 8. Commit with message: `feat(TC-{N}): GREEN — {brief description of what was implemented}`
207
+ 9. Exit: `kill -INT $PPID`
208
+
209
+ ### GREEN Stage Anti-Patterns
210
+
211
+ - **Over-engineering:** Adding parameters, options, or abstractions the test does not require
212
+ - **Future-proofing:** Building for test cases you have not written yet
213
+ - **Refactoring during GREEN:** Save it for REFACTOR stage
214
+ - **Modifying the test:** If the test is wrong, go back to RED. Do NOT adjust the test in GREEN.
215
+
216
+ ---
+
+ ## STAGE 3: REFACTOR — Clean Up
+
+ 1. Read tracker → confirm your test case; stage should be `refactor`
+ 2. Re-read ALL tests and implementation code for this test case
+ 3. **Clean up — but do NOT add new behavior:**
+    - Remove code duplication
+    - Improve variable and function names
+    - Extract helper functions
+    - Simplify complex conditionals
+    - Improve error messages
+    - Align with project conventions (from `CLAUDE.md`)
+ 4. **After EVERY refactoring change, run ALL tests.** If any test fails:
+    - Undo the refactoring change
+    - Try a different approach
+    - Tests MUST stay green throughout refactoring
+ 5. **Render a Completion Summary** — output an ASCII status diagram showing:
+    - What was built (functions, modules, test files)
+    - Test results: passed / total count
+    - How this test case fits into the group's progress
+ 6. Commit with message: `refactor(TC-{N}): REFACTOR — {what was cleaned up}`
+ 7. **Mark done & unblock dependents:**
+    - Acquire lock
+    - Add the test case to the `completed_test_cases` list
+    - Check off the test case in the Test Cases Queue: `[x]`, set `{completed}`
+    - Add the commit hash to the Completed Mapping (if the section exists)
+    - **Unblock dependents:** for each test case in `## Dependencies` that lists the just-completed test case, check whether ALL of its dependencies are now in `completed_test_cases`. If so, update that test case's status from `blocked` → `pending` in the Test Cases Queue
+    - Update your Agent Status row: clear `active_test_case`
+    - Update `last_heartbeat`
+    - Log entry
+    - Release lock
+ 8. **Run the test case selection algorithm again:**
+    - Claimable test case found → claim it, set `stage: red`, exit: `kill -INT $PPID`
+    - All test cases completed → `<promise>ALL TEST-CASES COMPLETE</promise>`
+    - All blocked/claimed → log "waiting", exit: `kill -INT $PPID`
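
Step 4's revert-on-red discipline can be sketched as follows. `run_all_tests` and `lib.sh` are toy stand-ins invented for the example (the real loop runs this project's full suite), and the backup/restore pair mimics what `git checkout -- <file>` would do in an actual worktree.

```shell
#!/usr/bin/env bash
# Sketch of the REFACTOR guard: run ALL tests after every edit and undo
# the edit if any fail. run_all_tests and lib.sh are toy stand-ins.
set -u

run_all_tests() {                     # stand-in for the real full suite
  grep -q 'hello-world' lib.sh
}

printf 'echo hello-world\n' > lib.sh  # implementation entering REFACTOR

cp lib.sh lib.sh.bak                  # snapshot before one small edit
printf 'echo HELLO-WORLD\n' > lib.sh  # a "cleanup" that breaks a test

if run_all_tests; then
  rm lib.sh.bak                       # still green: keep the change
else
  mv lib.sh.bak lib.sh                # went red: undo, try another approach
  echo "reverted: tests must stay green throughout REFACTOR"
fi
```

Because the snapshot is taken before each individual edit, a failure can never force you to untangle several changes at once.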
+
+ ---
+
+ ## First-Run Handling
+
+ If Test Cases Queue in tracker is empty: read `test-cases.md`, scan `## TC-{N}:` headers, populate queue with `{agent: -, status: pending|blocked}` metadata (compute from Dependencies), then start.
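
The first-run scan can be sketched with grep. The test-cases file here is toy data, and every row is emitted as `pending` for brevity; the real pass must also consult `## Dependencies` and mark cases with unmet dependencies as `blocked`.

```shell
#!/usr/bin/env bash
# Sketch of first-run queue population from `## TC-{N}:` headers.
# Toy input; real code must also compute blocked status from Dependencies.
set -eu

cat > test-cases.md <<'EOF'
# Test Cases

## TC-1: parses a single record
## TC-2: rejects malformed input
EOF

grep -o '^## TC-[0-9]*' test-cases.md |
while read -r _ tc; do
  printf -- '- [ ] %s {agent: -, status: pending}\n' "$tc"
done > queue-rows.md

cat queue-rows.md
# - [ ] TC-1 {agent: -, status: pending}
# - [ ] TC-2 {agent: -, status: pending}
```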
+
+ ## Decision Reporting Protocol
+
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
+
+ **When to report:**
+ - Test strategy decisions (unit vs integration approach for a test case)
+ - Implementation choices (which approach to make the test pass)
+ - Mocking decisions (why you chose to mock or not mock a dependency)
+ - Scope boundary decisions (what the minimal implementation covers)
+ - File overlap or conflict decisions (how you handled shared files with other agents)
+
+ **How to report:**
+ ```bash
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"TC-{N}","agent":"{{AGENT_NAME}}","decision":"{one-line summary}","reasoning":"{why this choice}"}'
+ ```
+
+ **Do NOT report** routine operations: claiming a test case, updating heartbeat, stage transitions, waiting for blocked test cases. Only report substantive choices that affect the implementation.
+
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
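
One way to keep reporting strictly best-effort is to wrap the curl call in a function that swallows every failure. This wrapper is a sketch, not part of ralphflow: the naive string interpolation would break on arguments containing quotes, and `{{AGENT_NAME}}` is the same template placeholder used above.

```shell
#!/usr/bin/env bash
# Best-effort wrapper around the decision POST: short timeouts, errors
# swallowed, always exits 0 so reporting can never stall the loop.
# Sketch only; naive interpolation breaks if arguments contain quotes.
report_decision() {
  local item=$1 decision=$2 reasoning=$3
  curl -s --connect-timeout 2 --max-time 5 -X POST \
    "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" \
    -H 'Content-Type: application/json' \
    -d "{\"item\":\"$item\",\"agent\":\"{{AGENT_NAME}}\",\"decision\":\"$decision\",\"reasoning\":\"$reasoning\"}" \
    >/dev/null 2>&1 || true             # dashboard down: keep working
}

# Succeeds even when no dashboard is listening:
RALPHFLOW_APP=demo RALPHFLOW_LOOP=01-tdd-loop \
  report_decision "TC-1" "mocked the clock" "keeps the test deterministic" \
  && echo "reported (or silently skipped)"
```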
+
+ ---
+
+ ## Testing Anti-Patterns — NEVER Do These
+
+ 1. **Testing mock behavior instead of real behavior** — if your assertion checks a mock element (`*-mock` test IDs, mock return values), you are testing the mock, not the code. Delete and rewrite.
+ 2. **Adding test-only methods to production classes** — if a method exists only because tests need it, move it to test utilities. Production code must not know about tests.
+ 3. **Mocking without understanding dependencies** — before mocking, ask: "What side effects does the real method have? Does my test depend on any of them?" Mock at the lowest level necessary, not at the level that seems convenient.
+ 4. **Multiple behaviors per test** — if the test name contains "and", split it. One test, one behavior, one assertion.
+ 5. **Incomplete mock data** — mock the COMPLETE data structure as it exists in reality, not just the fields your immediate test uses. Partial mocks hide structural assumptions.
+
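
Anti-pattern 4, illustrated with invented names: a test whose name needs "and" is two tests.

```shell
#!/usr/bin/env bash
# Illustrates anti-pattern 4; all names are invented for the example.
set -u

to_slug() { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'; }

# BAD: the "and" in the name betrays two behaviors in one test
test_lowercases_and_hyphenates() {
  [ "$(to_slug "Hello")" = "hello" ] && [ "$(to_slug "a b")" = "a-b" ]
}

# GOOD: one test, one behavior, one assertion each
test_lowercases_input()  { [ "$(to_slug "Hello")" = "hello" ]; }
test_hyphenates_spaces() { [ "$(to_slug "a b")" = "a-b" ]; }

test_lowercases_input && test_hyphenates_spaces && echo PASS
```

When the split tests run separately, a failure names the exact behavior that broke instead of a conjunction of both.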
+ ---
+
+ ## Red Flags — STOP and Start Over
+
+ - Code written before test
+ - Test written after implementation
+ - Test passes immediately in RED stage
+ - Cannot explain why test failed
+ - Tests added "later"
+ - Rationalizing "just this once"
+ - "I already manually tested it"
+ - "Tests after achieve the same purpose"
+ - "Keep as reference" or "adapt existing code"
+ - "This is different because..."
+ - Mock setup is >50% of test code
+
+ **All of these mean: Delete code. Start over with RED.**
+
+ ---
+
+ ## Rules
+
+ - One test case at a time per agent. One stage per iteration.
+ - Read tracker first, update tracker last. Always use lock protocol for writes.
+ - Read `CLAUDE.md` for all project-specific context.
+ - **The Iron Law is absolute: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.**
+ - RED must produce a FAILING test. GREEN must produce MINIMAL passing code. REFACTOR must NOT add behavior.
+ - Run ALL tests after every change, not just the current test.
+ - Commit after each stage: RED commit (test only), GREEN commit (implementation), REFACTOR commit (cleanup).
+ - Align with sibling test cases via Test Case Group context.
+ - **Multi-agent: never touch another agent's `in_progress` test case. Coordinate via `tracker.md`.**
+
+ ---
+
+ Read `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md` now and begin.
@@ -0,0 +1,3 @@
+ # Test Cases
+
+ <!-- Populated by the spec loop -->
@@ -0,0 +1,18 @@
+ # TDD Loop — Tracker
+
+ - completed_test_cases: []
+
+ ## Agent Status
+
+ | agent | active_test_case | stage | last_heartbeat |
+ |-------|------------------|-------|----------------|
+
+ ---
+
+ ## Dependencies
+
+ ## Test Case Groups
+
+ ## Test Cases Queue
+
+ ## Log