ralphflow 0.5.1 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. package/dist/{chunk-DOC64TD6.js → chunk-CA4XP6KI.js} +1 -1
  2. package/dist/ralphflow.js +237 -28
  3. package/dist/{server-EX5MWYW4.js → server-64NQCIKJ.js} +88 -21
  4. package/package.json +1 -1
  5. package/src/dashboard/ui/app.js +4 -1
  6. package/src/dashboard/ui/archives.js +27 -2
  7. package/src/dashboard/ui/index.html +1 -1
  8. package/src/dashboard/ui/loop-detail.js +1 -1
  9. package/src/dashboard/ui/prompt-builder.js +39 -4
  10. package/src/dashboard/ui/sidebar.js +1 -1
  11. package/src/dashboard/ui/state.js +3 -0
  12. package/src/dashboard/ui/styles.css +77 -0
  13. package/src/dashboard/ui/templates.js +3 -0
  14. package/src/dashboard/ui/utils.js +30 -0
  15. package/src/templates/code-implementation/loops/00-story-loop/prompt.md +51 -11
  16. package/src/templates/code-implementation/loops/01-tasks-loop/prompt.md +28 -2
  17. package/src/templates/code-implementation/loops/02-delivery-loop/prompt.md +27 -4
  18. package/src/templates/code-review/loops/00-collect-loop/changesets.md +3 -0
  19. package/src/templates/code-review/loops/00-collect-loop/prompt.md +179 -0
  20. package/src/templates/code-review/loops/00-collect-loop/tracker.md +16 -0
  21. package/src/templates/code-review/loops/01-spec-review-loop/prompt.md +238 -0
  22. package/src/templates/code-review/loops/01-spec-review-loop/tracker.md +16 -0
  23. package/src/templates/code-review/loops/02-quality-review-loop/issues.md +3 -0
  24. package/src/templates/code-review/loops/02-quality-review-loop/prompt.md +306 -0
  25. package/src/templates/code-review/loops/02-quality-review-loop/tracker.md +16 -0
  26. package/src/templates/code-review/loops/03-fix-loop/prompt.md +265 -0
  27. package/src/templates/code-review/loops/03-fix-loop/tracker.md +16 -0
  28. package/src/templates/code-review/ralphflow.yaml +98 -0
  29. package/src/templates/design-review/loops/00-explore-loop/ideas.md +3 -0
  30. package/src/templates/design-review/loops/00-explore-loop/prompt.md +207 -0
  31. package/src/templates/design-review/loops/00-explore-loop/tracker.md +16 -0
  32. package/src/templates/design-review/loops/01-design-loop/designs.md +3 -0
  33. package/src/templates/design-review/loops/01-design-loop/prompt.md +201 -0
  34. package/src/templates/design-review/loops/01-design-loop/tracker.md +16 -0
  35. package/src/templates/design-review/loops/02-review-loop/prompt.md +255 -0
  36. package/src/templates/design-review/loops/02-review-loop/tracker.md +16 -0
  37. package/src/templates/design-review/loops/03-plan-loop/plans.md +3 -0
  38. package/src/templates/design-review/loops/03-plan-loop/prompt.md +247 -0
  39. package/src/templates/design-review/loops/03-plan-loop/tracker.md +16 -0
  40. package/src/templates/design-review/ralphflow.yaml +84 -0
  41. package/src/templates/research/loops/00-discovery-loop/prompt.md +36 -5
  42. package/src/templates/research/loops/01-research-loop/prompt.md +22 -2
  43. package/src/templates/research/loops/02-story-loop/prompt.md +20 -1
  44. package/src/templates/research/loops/03-document-loop/prompt.md +20 -1
  45. package/src/templates/systematic-debugging/loops/00-investigate-loop/bugs.md +3 -0
  46. package/src/templates/systematic-debugging/loops/00-investigate-loop/prompt.md +237 -0
  47. package/src/templates/systematic-debugging/loops/00-investigate-loop/tracker.md +16 -0
  48. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/hypotheses.md +3 -0
  49. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/prompt.md +312 -0
  50. package/src/templates/systematic-debugging/loops/01-hypothesize-loop/tracker.md +18 -0
  51. package/src/templates/systematic-debugging/loops/02-fix-loop/fixes.md +3 -0
  52. package/src/templates/systematic-debugging/loops/02-fix-loop/prompt.md +342 -0
  53. package/src/templates/systematic-debugging/loops/02-fix-loop/tracker.md +18 -0
  54. package/src/templates/systematic-debugging/ralphflow.yaml +81 -0
  55. package/src/templates/tdd-implementation/loops/00-spec-loop/prompt.md +208 -0
  56. package/src/templates/tdd-implementation/loops/00-spec-loop/specs.md +3 -0
  57. package/src/templates/tdd-implementation/loops/00-spec-loop/tracker.md +16 -0
  58. package/src/templates/tdd-implementation/loops/01-tdd-loop/prompt.md +323 -0
  59. package/src/templates/tdd-implementation/loops/01-tdd-loop/test-cases.md +3 -0
  60. package/src/templates/tdd-implementation/loops/01-tdd-loop/tracker.md +18 -0
  61. package/src/templates/tdd-implementation/loops/02-verify-loop/prompt.md +226 -0
  62. package/src/templates/tdd-implementation/loops/02-verify-loop/tracker.md +16 -0
  63. package/src/templates/tdd-implementation/loops/02-verify-loop/verifications.md +3 -0
  64. package/src/templates/tdd-implementation/ralphflow.yaml +73 -0
@@ -0,0 +1,18 @@
+ # Fix Loop — Tracker
+
+ - completed_fixes: []
+
+ ## Agent Status
+
+ | agent | active_fix | stage | last_heartbeat |
+ |-------|------------|-------|----------------|
+
+ ---
+
+ ## Dependencies
+
+ ## Fixes Queue
+
+ ## Escalation Queue
+
+ ## Log
@@ -0,0 +1,81 @@
+ name: systematic-debugging
+ description: "Investigate → Hypothesize → Fix pipeline for root-cause-first debugging"
+ version: 1
+ dir: .ralph-flow
+
+ entities:
+   BUG:
+     prefix: BUG
+     data_file: 00-investigate-loop/bugs.md
+   HYPOTHESIS:
+     prefix: HYP
+     data_file: 01-hypothesize-loop/hypotheses.md
+   FIX:
+     prefix: FIX
+     data_file: 02-fix-loop/fixes.md
+
+ loops:
+   investigate-loop:
+     order: 0
+     name: "Investigate Loop"
+     prompt: 00-investigate-loop/prompt.md
+     tracker: 00-investigate-loop/tracker.md
+     data_files:
+       - 00-investigate-loop/bugs.md
+     entities: [BUG]
+     stages: [reproduce, trace, evidence]
+     completion: "ALL BUGS INVESTIGATED"
+     feeds: [hypothesize-loop]
+     multi_agent: false
+     model: claude-sonnet-4-6
+     cadence: 0
+
+   hypothesize-loop:
+     order: 1
+     name: "Hypothesize Loop"
+     prompt: 01-hypothesize-loop/prompt.md
+     tracker: 01-hypothesize-loop/tracker.md
+     data_files:
+       - 01-hypothesize-loop/hypotheses.md
+     entities: [HYPOTHESIS, BUG]
+     stages: [analyze, hypothesize, test]
+     completion: "ALL HYPOTHESES TESTED"
+     fed_by: [investigate-loop]
+     feeds: [fix-loop]
+     model: claude-sonnet-4-6
+     multi_agent:
+       enabled: true
+       max_agents: 3
+       strategy: tracker-lock
+       agent_placeholder: "{{AGENT_NAME}}"
+       lock:
+         file: 01-hypothesize-loop/.tracker-lock
+         type: echo
+         stale_seconds: 60
+     cadence: 0
+
+   fix-loop:
+     order: 2
+     name: "Fix Loop"
+     prompt: 02-fix-loop/prompt.md
+     tracker: 02-fix-loop/tracker.md
+     data_files:
+       - 02-fix-loop/fixes.md
+     entities: [FIX, BUG]
+     stages: [fix, verify, harden]
+     completion: "ALL FIXES VERIFIED"
+     fed_by: [hypothesize-loop]
+     model: claude-sonnet-4-6
+     multi_agent:
+       enabled: true
+       max_agents: 3
+       strategy: tracker-lock
+       agent_placeholder: "{{AGENT_NAME}}"
+       lock:
+         file: 02-fix-loop/.tracker-lock
+         type: echo
+         stale_seconds: 60
+     worktree:
+       strategy: shared
+       auto_merge: true
+     cadence: 0
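
In shell terms, the `lock: {type: echo, stale_seconds: 60}` blocks in this config amount to a timestamped lock file that any agent may break once it goes stale. A minimal sketch of that acquire/release cycle (the temp path and agent name here are illustrative, not taken from the package):

```shell
# Sketch of an echo-type lock with stale_seconds: 60.
LOCK="$(mktemp -d)/.tracker-lock"   # illustrative path
AGENT="agent-1"                     # illustrative agent name

if [ -f "$LOCK" ]; then
  now=$(date +%s)
  mtime=$(stat -c %Y "$LOCK" 2>/dev/null || stat -f %m "$LOCK")
  [ $((now - mtime)) -ge 60 ] && rm -f "$LOCK"   # stale: holder crashed
fi

echo "$AGENT $(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$LOCK"
sleep 0.5                                        # settle before re-check
grep -q "^$AGENT " "$LOCK" && status=held || status=lost
echo "lock $status by $AGENT"
rm -f "$LOCK"                                    # always release
```

The 500ms re-read is what turns a plain file write into a (best-effort) mutex: two racing agents both write, but only the last writer survives the re-check.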
@@ -0,0 +1,208 @@
+ # Spec Loop — Break Requirements into Testable Specifications
+
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
+
+ Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` FIRST to determine where you are.
+
+ > **Think in tests, not tasks.** Every specification you write must answer: "What does the test assert?" and "What does the user observe?" If you cannot write a concrete assertion, the spec is not ready.
+
+ > **READ-ONLY FOR SOURCE CODE.** Only write to: `.ralph-flow/{{APP_NAME}}/01-tdd-loop/test-cases.md`, `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/specs.md`.
+
+ **Pipeline:** `specs.md → YOU → test-cases.md → 01-tdd-loop → code`
+
+ ---
+
+ ## Visual Communication Protocol
+
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
+
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
+
+ **Diagram types to use:**
+
+ - **Spec/Architecture Map** — components and their relationships in a bordered grid
+ - **Decomposition Tree** — hierarchical breakdown with `├──` and `└──` branches
+ - **Data Flow** — arrows (`──→`) showing how information moves between components
+ - **Comparison Table** — bordered table for trade-offs and design options
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
+
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
+
+ ---
+
+ ## State Machine (3 stages per spec)
+
+ **FIRST — Check completion.** Read the tracker. If the Specs Queue has entries
+ AND every entry is `[x]` (no pending specs):
+ 1. **Re-scan `specs.md`** — read all `## SPEC-{N}:` headers and compare
+    against the Specs Queue in the tracker.
+ 2. **New specs found** (in `specs.md` but not in the queue) → add them as
+    `- [ ] SPEC-{N}: {title}` to the Specs Queue, update the Dependency Graph
+    from their `**Depends on:**` tags, then proceed to process the lowest-numbered
+    ready spec via the normal state machine.
+ 3. **No new specs** → go to **"No Specs? Collect Them"** to ask the user.
+
+ Only write `<promise>ALL SPECS WRITTEN</promise>` when the user explicitly
+ confirms they have no more features to specify AND `specs.md` has no specs
+ missing from the tracker queue.
+
+ Pick the lowest-numbered `ready` spec. NEVER process a `blocked` spec.
+
+ ---
+
+ ## No Specs? Collect Them
+
+ **Triggers when:**
+ - `specs.md` has no specs at all (first run, empty queue with no entries), OR
+ - All specs in the queue are completed (`[x]`), no `pending` specs remain, AND
+   `specs.md` has been re-scanned and contains no specs missing from the queue
+
+ **Flow:**
+ 1. Tell the user: *"No pending specs. Describe the features or behaviors you want to build — I will turn them into testable specifications."*
+ 2. Use `AskUserQuestion` to prompt: "What do you want to build or fix next?" (open-ended)
+ 3. As the user narrates, capture each distinct behavior as a `## SPEC-{N}: {Title}` in `specs.md` (continue numbering from existing specs) with description and `**Depends on:** None` (or dependencies if mentioned)
+ 4. **Confirm specs & dependencies** — present all captured specs back. Use `AskUserQuestion` (up to 5 questions) to validate: correct specs? right dependency order? any to split/merge? priority adjustments?
+ 5. Apply corrections, finalize `specs.md`, add new entries to tracker queue, proceed to normal flow
+
+ ---
+
+ ```
+ ANALYZE   → Read requirements, explore codebase, map behaviors  → stage: specify
+ SPECIFY   → Write detailed specs with acceptance criteria       → stage: decompose
+ DECOMPOSE → Break into TEST-CASE entries with exact assertions  → kill
+ ```
+
+ ## First-Run / New Spec Detection
+
+ If Specs Queue in tracker is empty OR all entries are `[x]`: read `specs.md`,
+ scan `## SPEC-{N}:` headers + `**Depends on:**` tags. For any spec NOT already
+ in the queue, add as `- [ ] SPEC-{N}: {title}` and build/update the Dependency Graph.
+ If new specs were added, proceed to process them. If the queue is still empty
+ after scanning, go to **"No Specs? Collect Them"**.
+
+ ---
+
+ ## STAGE 1: ANALYZE
+
+ 1. Read tracker → pick lowest-numbered `ready` spec
+ 2. Read the spec from `specs.md` (+ any referenced screenshots or docs)
+ 3. **Explore the codebase** — read `CLAUDE.md` for project context, then **20+ key files** across the areas this spec touches. Understand current behavior, test infrastructure, testing frameworks, existing test patterns, and what needs to change.
+ 4. **Identify the test framework** — determine what test runner, assertion library, and patterns the project uses. Note test file locations, naming conventions, and execution commands.
+ 5. **Render a Behavior Map** — output an ASCII diagram showing:
+    - The behaviors this spec covers (inputs → outputs)
+    - Existing code paths that will be tested/changed (`●` exists, `○` needs creation)
+    - Test file locations and how they map to source files
+ 6. Update tracker: `active_spec: SPEC-{N}`, `stage: specify`, log entry
+
+ ## STAGE 2: SPECIFY
+
+ 1. Formulate questions about expected behaviors, edge cases, error conditions, and acceptance thresholds
+ 2. **Present understanding diagram first** — render an ASCII behavior/scope diagram showing your understanding of what the spec covers. This gives the user a visual anchor to correct misconceptions.
+ 3. **Ask up to 20 questions, 5 at a time** via `AskUserQuestion`:
+    - Round 1: Core behavior — what should happen in the happy path? What inputs and outputs?
+    - Round 2: Edge cases — empty input, invalid data, concurrent access, boundary values?
+    - Round 3: Error handling — what errors can occur? What should the user see?
+    - Round 4+: Integration — how does this interact with other specs? Performance constraints?
+    - Stop early if clear enough
+ 4. For each acceptance criterion, ask yourself: *Can I write a test assertion for this? If not, it is too vague.*
+ 5. Save Q&A summary in tracker log
+ 6. Update tracker: `stage: decompose`, log entry with key decisions
+
+ ## STAGE 3: DECOMPOSE
+
+ 1. Find next TEST-CASE numbers (check existing in `01-tdd-loop/test-cases.md`)
+ 2. **Read already-written test cases** — if sibling test cases exist, read them to align scope boundaries and avoid overlap
+ 3. **Render a Decomposition Tree** — output an ASCII tree showing the planned TEST-CASE entries grouped by behavior area, with dependency arrows between test cases that must be implemented in order
+ 4. Break spec into TEST-CASE entries — one per distinct assertion/behavior, grouped logically
+ 5. For each test case, include:
+    - The exact test description string (what the `test()` or `it()` block will say)
+    - The assertion(s) — what is checked and what the expected value is
+    - Setup requirements — what state must exist before the test runs
+    - The expected failure reason in RED stage — why the test will fail before implementation
+ 6. **Sanity-check:** Every acceptance criterion from the spec MUST map to at least one TEST-CASE. If an acceptance criterion has no test case, you missed something.
+ 7. Append to `01-tdd-loop/test-cases.md` (format below)
+ 8. **Update `01-tdd-loop/tracker.md` (with lock protocol):**
+    1. Acquire `.ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`:
+       - Exists + < 60s old → sleep 2s, retry up to 5 times
+       - Exists + >= 60s old → stale, delete it
+       - Not exists → continue
+       - Write lock: `echo "spec-loop $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
+       - Sleep 500ms, re-read lock, verify `spec-loop` is in it
+    2. Add new Test Case Groups to `## Test Case Groups`
+    3. Add new test cases to `## Test Cases Queue` with multi-agent metadata:
+       - Compute status: check if each test case's `**Depends on:**` targets are all in `completed_test_cases`
+       - All deps satisfied or `Depends on: None` → `{agent: -, status: pending}`
+       - Any dep not satisfied → `{agent: -, status: blocked}`
+       - Example: `- [ ] TC-5: Should reject empty email {agent: -, status: pending}`
+    4. Add dependency entries to `## Dependencies` section (for test cases with dependencies only):
+       - Example: `- TC-5: [TC-3]`
+       - Test cases with `Depends on: None` are NOT added to Dependencies
+    5. Release lock: `rm .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
+ 9. Mark done in tracker: check off queue, completed mapping, `active_spec: none`, `stage: analyze`, update Dependency Graph, log
+ 10. Exit: `kill -INT $PPID`
+
+ **TEST-CASE format:**
+ ```markdown
+ ## TC-{N}: {Test description string}
+
+ **Source:** SPEC-{M}
+ **Depends on:** {TC-{Y} or "None"}
+
+ ### Test Description
+ `{exact string for test() or it() block}`
+
+ ### Setup
+ {What state/data must exist before the test runs}
+
+ ### Assertion
+ {Exact assertion(s) — what is checked and expected value}
+ - `expect(result).toBe(...)` or equivalent plain-language assertion
+
+ ### Expected RED Failure
+ {Why the test will fail before implementation — e.g., "function does not exist", "returns undefined instead of validated object"}
+
+ ### Implementation Hint
+ {Brief guidance — which module/function to create or modify. Do NOT specify file paths — the TDD loop explores the codebase itself.}
+
+ ### Acceptance Criteria
+ - [ ] {Specific, observable condition — maps back to SPEC acceptance criteria}
+ ```
+
+ ---
+
+ ## Decision Reporting Protocol
+
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
+
+ **When to report:**
+ - Scope boundary decisions (included/excluded behaviors from a spec)
+ - Test strategy choices (unit vs integration, mocking decisions)
+ - Decomposition decisions (why you split test cases one way vs. another)
+ - Interpretation of ambiguous requirements (how you resolved unclear user intent)
+ - Self-answered clarification questions (questions you could have asked but resolved yourself)
+
+ **How to report:**
+ ```bash
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"SPEC-{N}","agent":"spec-loop","decision":"{one-line summary}","reasoning":"{why this choice}"}'
+ ```
+
+ **Do NOT report** routine operations: picking the next spec, updating tracker, stage transitions, heartbeat updates. Only report substantive choices that affect the work product.
+
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
+
+ ---
+
+ ## Rules
+
+ - One spec at a time. All 3 stages run in one iteration, one `kill` at the end.
+ - Read tracker first, update tracker last.
+ - Append to `test-cases.md` — never overwrite. Numbers globally unique and sequential.
+ - Test cases must be self-contained — the TDD loop never reads `specs.md`.
+ - Every acceptance criterion must map to at least one test case.
+ - Each test case = one assertion/behavior. If a test case has "and" in its description, split it.
+ - Mark inter-test-case dependencies explicitly.
+ - Think in assertions: if you cannot write `expect(x).toBe(y)`, the spec is not specific enough.
+
+ ---
+
+ Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` now and begin.
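
The "best-effort only" rule in the prompt above is easy to get wrong: a single hanging `curl` would stall the whole loop. One way to sketch the fire-and-forget contract in shell (the endpoint and payload fields come from the prompt; the `report_decision` helper name and its sample arguments are illustrative):

```shell
# Sketch: a decision report that can never block or fail the loop.
report_decision() {   # illustrative wrapper, not part of the package
  item="$1"; decision="$2"; reasoning="$3"
  curl -s --connect-timeout 2 --max-time 5 -X POST \
    "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" \
    -H 'Content-Type: application/json' \
    -d "{\"item\":\"$item\",\"agent\":\"spec-loop\",\"decision\":\"$decision\",\"reasoning\":\"$reasoning\"}" \
    >/dev/null 2>&1 || true   # dashboard down? keep working
}

report_decision "SPEC-1" "split login into 3 test cases" "one assertion per case"
echo "loop continues regardless of dashboard status"
```

The `--connect-timeout`/`--max-time` flags bound the wait and `|| true` swallows the exit code, so a dead dashboard costs at most a couple of seconds per report.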
@@ -0,0 +1,3 @@
+ # Specs
+
+ <!-- Populated by the spec loop -->
@@ -0,0 +1,16 @@
+ # Spec Loop — Tracker
+
+ - active_spec: none
+ - stage: analyze
+ - completed_specs: []
+ - pending_specs: []
+
+ ---
+
+ ## Specs Queue
+
+ ## Dependency Graph
+
+ ## Completed Mapping
+
+ ## Log
@@ -0,0 +1,323 @@
1
+ # TDD Loop — Red-Green-Refactor Implementation
2
+
3
+ **App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
4
+
5
+ **You are agent `{{AGENT_NAME}}`.** Multiple agents may work in parallel.
6
+ Coordinate via `tracker.md` — the single source of truth.
7
+ *(If you see the literal text `{{AGENT_NAME}}` above — i.e., it was not substituted — treat your name as `agent-1`.)*
8
+
9
+ Read `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md` FIRST to determine where you are.
10
+
11
+ > **PROJECT CONTEXT.** Read `CLAUDE.md` for architecture, stack, conventions, commands, and URLs.
12
+
13
+ **Pipeline:** `test-cases.md → YOU → code changes (tests + production code)`
14
+
15
+ ---
16
+
17
+ ## The Iron Law
18
+
19
+ ```
20
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
21
+ ```
22
+
23
+ Write code before the test? Delete it. Start over. No exceptions:
24
+ - Do not keep it as "reference"
25
+ - Do not "adapt" it while writing tests
26
+ - Do not look at it
27
+ - Delete means delete
28
+
29
+ Implement fresh from tests. Period.
30
+
31
+ ---
32
+
33
+ ## Visual Communication Protocol
34
+
35
+ When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
36
+
37
+ **Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
38
+
39
+ **Diagram types to use:**
40
+
41
+ - **TDD Cycle Diagram** — RED/GREEN/REFACTOR status with test output summaries
42
+ - **Decomposition Tree** — hierarchical breakdown with `├──` and `└──` branches
43
+ - **Data Flow** — arrows (`──→`) showing how information moves between components
44
+ - **Comparison Table** — bordered table for trade-offs and design options
45
+ - **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
46
+
47
+ **Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
48
+
49
+ ---
50
+
51
+ ## Tracker Lock Protocol
52
+
53
+ Before ANY write to `tracker.md`, you MUST acquire the lock:
54
+
55
+ **Lock file:** `.ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
56
+
57
+ ### Acquire Lock
58
+ 1. Check if `.tracker-lock` exists
59
+ - Exists AND file is < 60 seconds old → sleep 2s, retry (up to 5 retries)
60
+ - Exists AND file is >= 60 seconds old → stale lock, delete it (agent crashed mid-write)
61
+ - Does not exist → continue
62
+ 2. Write lock: `echo "{{AGENT_NAME}} $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
63
+ 3. Sleep 500ms (`sleep 0.5`)
64
+ 4. Re-read `.tracker-lock` — verify YOUR agent name (`{{AGENT_NAME}}`) is in it
65
+ - Your name → you own the lock, proceed to write `tracker.md`
66
+ - Other name → you lost the race, retry from step 1
67
+ 5. Write your changes to `tracker.md`
68
+ 6. Delete `.tracker-lock` immediately: `rm .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
69
+ 7. Never leave a lock held — if your write fails, delete the lock in your error handler
70
+
71
+ ### When to Lock
72
+ - Claiming a test case (pending → in_progress)
73
+ - Completing a test case (in_progress → completed, unblocking dependents)
74
+ - Updating stage transitions (red → green → refactor)
75
+ - Heartbeat updates (bundled with other writes, not standalone)
76
+
77
+ ### When NOT to Lock
78
+ - Reading `tracker.md` — read-only access needs no lock
79
+ - Reading `test-cases.md` — always read-only
80
+
81
+ ---
82
+
83
+ ## Test Case Selection Algorithm
84
+
85
+ Instead of "pick next unchecked test case", follow this algorithm:
86
+
87
+ 1. **Parse tracker** — read `completed_test_cases`, `## Dependencies`, Test Cases Queue metadata `{agent, status}`, Agent Status table
88
+ 2. **Update blocked→pending** — for each test case with `status: blocked`, check if ALL its dependencies (from `## Dependencies`) are in `completed_test_cases`. If yes, acquire lock and update to `status: pending`
89
+ 3. **Resume own work** — if any test case has `{agent: {{AGENT_NAME}}, status: in_progress}`, resume it (skip to the current stage)
90
+ 4. **Find claimable** — filter test cases where `status: pending` AND `agent: -`
91
+ 5. **Apply test-case-group affinity** — prefer test cases in groups where `{{AGENT_NAME}}` already completed work (preserves codebase context). If no affinity match, pick any claimable test case
92
+ 6. **Claim** — acquire lock, set `{agent: {{AGENT_NAME}}, status: in_progress}`, update your Agent Status row, update `last_heartbeat`, release lock, log the claim
93
+ 7. **Nothing available:**
94
+ - All test cases completed → emit `<promise>ALL TEST-CASES COMPLETE</promise>`
95
+ - All remaining test cases are blocked or claimed by others → log "{{AGENT_NAME}}: waiting — all test cases blocked or claimed", exit: `kill -INT $PPID` (the `while` loop restarts and re-checks)
96
+
97
+ ### New Test Case Discovery
98
+
99
+ If you find a test case in the Test Cases Queue without `{agent, status}` metadata (e.g., added by the spec loop while agents were running):
100
+ 1. Read the test case's `**Depends on:**` field in `test-cases.md`
101
+ 2. Add the dependency to `## Dependencies` section if not already there (skip if `Depends on: None`)
102
+ 3. Set status to `pending` (all deps in `completed_test_cases`) or `blocked` (deps incomplete)
103
+ 4. Set agent to `-`
104
+
105
+ ---
106
+
107
+ ## Anti-Hijacking Rules
108
+
109
+ 1. **Never touch another agent's `in_progress` test case** — do not modify, complete, or reassign it
110
+ 2. **Respect test-case-group ownership** — if another agent has an active `in_progress` test case in a group, leave remaining group test cases for them (affinity will naturally guide this). Only claim from that group if the other agent has finished all their group test cases
111
+ 3. **Note file overlap conflicts** — if your test case modifies files that another agent's active test case also modifies, log a WARNING in the tracker and coordinate carefully
112
+
113
+ ---
114
+
115
+ ## Heartbeat Protocol
116
+
117
+ Every tracker write includes updating your `last_heartbeat` to current ISO 8601 timestamp in the Agent Status table. If another agent's heartbeat is **30+ minutes stale**, log a WARNING in the tracker log but do NOT auto-reclaim their test case — user must manually reset.
118
+
119
+ ---
120
+
121
+ ## Crash Recovery (Self)
122
+
123
+ On fresh start, if your agent name has an `in_progress` test case but you have no memory of it:
124
+ - Test file exists AND test fails (RED stage completed) → resume at GREEN stage
125
+ - Test file exists AND test passes (GREEN stage completed) → resume at REFACTOR stage
126
+ - No test file found → restart from RED stage
127
+
128
+ ---
129
+
130
+ ## State Machine (3 stages per test case)
131
+
132
+ ```
133
+ RED → Write failing test, run it, confirm correct failure → stage: green
134
+ GREEN → Write minimal code to pass, run tests, confirm pass → stage: refactor
135
+ REFACTOR → Clean up, tests stay green, no new behavior → next test case
136
+ ```
137
+
138
+ When ALL done: `<promise>ALL TEST-CASES COMPLETE</promise>`
139
+
140
+ After completing ANY stage, exit: `kill -INT $PPID`
141
+
142
+ ---
143
+
144
+ ## STAGE 1: RED — Write Failing Test
145
+
146
+ 1. Read tracker → **run test case selection algorithm** (see above)
147
+ 2. Read test case in `test-cases.md` + its source SPEC context
148
+ 3. If sibling test cases are done, read their test files to align patterns
149
+ 4. Read `CLAUDE.md` for project context, test framework, and conventions
150
+ 5. Explore codebase — **20+ files:** test infrastructure, existing test patterns, source modules under test
151
+ 6. **Write ONE failing test** — use the exact test description from the test case:
152
+ - One behavior per test. One assertion per test.
153
+ - Clear name that describes expected behavior
154
+ - Real code, no mocks unless unavoidable
155
+ - Setup only what the test needs
156
+ 7. **Run the test.** Record the FULL output.
157
+ 8. **Verify it fails correctly:**
158
+ - Test FAILS (not errors) → good, confirm the failure message matches the "Expected RED Failure" from the test case
159
+ - Test ERRORS (syntax, import, etc.) → fix the error, re-run until it fails correctly
160
+ - **Test PASSES on first run → STOP. You have a PROBLEM.** Either:
161
+ - The feature already exists → delete the test, report in tracker log, move on
162
+ - Your test is wrong (testing existing behavior, not new behavior) → delete and rewrite
163
+ - Never proceed to GREEN with a test that passed in RED
164
+ 9. **Render a RED Status Diagram** — output an ASCII box showing:
165
+ - Test file path and test name
166
+ - Failure message (truncated to 2 lines)
167
+ - Expected vs actual
168
+ 10. Acquire lock → update tracker: your Agent Status row `active_test_case: TC-{N}`, `stage: green`, `last_heartbeat`, record test output in log → release lock
169
+ 11. Commit test file with message: `test(TC-{N}): RED — {test description}`
170
+ 12. Exit: `kill -INT $PPID`
171
+
172
+ ### RED Stage Rationalization Table
173
+
174
+ | You are thinking... | Answer |
175
+ |---------------------|--------|
176
+ | "I'll write the test after the code" | NO. Delete code. Write test first. |
177
+ | "This is too simple to test" | NO. Simple code breaks. Test takes 30 seconds. |
178
+ | "I know it works" | NO. Confidence is not evidence. |
179
+ | "I need to explore the implementation first" | Fine. Explore. Then THROW AWAY exploration, write test. |
180
+ | "Let me just get it working, then add tests" | NO. That is not TDD. Start over. |
181
+ | "The test is obvious, I can skip RED verification" | NO. You MUST see the test fail. |
182
+ | "I'll keep this code as reference" | NO. Delete means delete. Implement fresh from tests. |
183
+
184
+ ---
185
+
186
+ ## STAGE 2: GREEN — Minimal Implementation
187
+
188
+ 1. Read tracker → confirm your test case, stage should be `green`
189
+ 2. Re-read the test file you wrote in RED
190
+ 3. **Write the SIMPLEST code that makes the test pass.** Nothing more:
191
+ - No features the test does not require
192
+ - No refactoring of other code
193
+ - No "improvements" beyond what the test checks
194
+ - Hardcoding is acceptable if the test only checks one value
195
+ 4. **Run ALL tests** (not just the new one). Record FULL output.
196
+ 5. **Verify:**
197
+ - New test passes → good
198
+ - New test fails → fix implementation (NOT the test), re-run
199
+ - Other tests broken → fix immediately before proceeding
200
+ - Output pristine (no errors, warnings, deprecation notices)
201
+ 6. **Render a GREEN Status Diagram** — output an ASCII box showing:
202
+ - Test count: passed / total
203
+ - The specific test that transitioned RED → GREEN
204
+ - Any warnings or notable output
205
+ 7. Acquire lock → update tracker: `stage: refactor`, `last_heartbeat`, record test output in log → release lock
206
+ 8. Commit with message: `feat(TC-{N}): GREEN — {brief description of what was implemented}`
207
+ 9. Exit: `kill -INT $PPID`
208
+
209
+ ### GREEN Stage Anti-Patterns
210
+
211
+ - **Over-engineering:** Adding parameters, options, or abstractions the test does not require
212
+ - **Future-proofing:** Building for test cases you have not written yet
213
+ - **Refactoring during GREEN:** Save it for REFACTOR stage
214
+ - **Modifying the test:** If the test is wrong, go back to RED. Do NOT adjust the test in GREEN.
215
+
216
+ ---
+
+ ## STAGE 3: REFACTOR — Clean Up
+
+ 1. Read tracker → confirm your test case; stage should be `refactor`
+ 2. Re-read ALL tests and implementation code for this test case
+ 3. **Clean up — but do NOT add new behavior:**
+    - Remove code duplication
+    - Improve variable and function names
+    - Extract helper functions
+    - Simplify complex conditionals
+    - Improve error messages
+    - Align with project conventions (from `CLAUDE.md`)
+ 4. **After EVERY refactoring change, run ALL tests.** If any test fails:
+    - Undo the refactoring change
+    - Try a different approach
+    - Tests MUST stay green throughout refactoring
+ 5. **Render a Completion Summary** — output an ASCII status diagram showing:
+    - What was built (functions, modules, test files)
+    - Test results: passed / total count
+    - How this test case fits into the group's progress
+ 6. Commit with message: `refactor(TC-{N}): REFACTOR — {what was cleaned up}`
+ 7. **Mark done & unblock dependents:**
+    - Acquire lock
+    - Add the test case to the `completed_test_cases` list
+    - Check off the test case in the Test Cases Queue: `[x]`, set `{completed}`
+    - Add the commit hash to the Completed Mapping (if the section exists)
+    - **Unblock dependents:** for each test case in `## Dependencies` that lists the just-completed test case, check whether ALL of its dependencies are now in `completed_test_cases`. If so, update that test case's status from `blocked` → `pending` in the Test Cases Queue
+    - Update your Agent Status row: clear `active_test_case`
+    - Update `last_heartbeat`
+    - Log entry
+    - Release lock
+ 8. **Run the test case selection algorithm again:**
+    - Claimable test case found → claim it, set `stage: red`, exit: `kill -INT $PPID`
+    - All test cases completed → `<promise>ALL TEST-CASES COMPLETE</promise>`
+    - All blocked/claimed → log "waiting", exit: `kill -INT $PPID`
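
Step 4's revert-on-red discipline can be sketched as follows. `run_all_tests` and `lib.sh` are toy stand-ins invented for the example (the real loop runs this project's full suite), and the backup/restore pair mimics what `git checkout -- <file>` would do in an actual worktree.

```shell
#!/usr/bin/env bash
# Sketch of the REFACTOR guard: run ALL tests after every edit and undo
# the edit if any fail. run_all_tests and lib.sh are toy stand-ins.
set -u

run_all_tests() {                     # stand-in for the real full suite
  grep -q 'hello-world' lib.sh
}

printf 'echo hello-world\n' > lib.sh  # implementation entering REFACTOR

cp lib.sh lib.sh.bak                  # snapshot before one small edit
printf 'echo HELLO-WORLD\n' > lib.sh  # a "cleanup" that breaks a test

if run_all_tests; then
  rm lib.sh.bak                       # still green: keep the change
else
  mv lib.sh.bak lib.sh                # went red: undo, try another approach
  echo "reverted: tests must stay green throughout REFACTOR"
fi
```

Because the snapshot is taken before each individual edit, a failure can never force you to untangle several changes at once.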
+
+ ---
+
+ ## First-Run Handling
+
+ If Test Cases Queue in tracker is empty: read `test-cases.md`, scan `## TC-{N}:` headers, populate queue with `{agent: -, status: pending|blocked}` metadata (compute from Dependencies), then start.
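
The first-run scan can be sketched with grep. The test-cases file here is toy data, and every row is emitted as `pending` for brevity; the real pass must also consult `## Dependencies` and mark cases with unmet dependencies as `blocked`.

```shell
#!/usr/bin/env bash
# Sketch of first-run queue population from `## TC-{N}:` headers.
# Toy input; real code must also compute blocked status from Dependencies.
set -eu

cat > test-cases.md <<'EOF'
# Test Cases

## TC-1: parses a single record
## TC-2: rejects malformed input
EOF

grep -o '^## TC-[0-9]*' test-cases.md |
while read -r _ tc; do
  printf -- '- [ ] %s {agent: -, status: pending}\n' "$tc"
done > queue-rows.md

cat queue-rows.md
# - [ ] TC-1 {agent: -, status: pending}
# - [ ] TC-2 {agent: -, status: pending}
```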
+
+ ## Decision Reporting Protocol
+
+ When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
+
+ **When to report:**
+ - Test strategy decisions (unit vs integration approach for a test case)
+ - Implementation choices (which approach to make the test pass)
+ - Mocking decisions (why you chose to mock or not mock a dependency)
+ - Scope boundary decisions (what the minimal implementation covers)
+ - File overlap or conflict decisions (how you handled shared files with other agents)
+
+ **How to report:**
+ ```bash
+ curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"TC-{N}","agent":"{{AGENT_NAME}}","decision":"{one-line summary}","reasoning":"{why this choice}"}'
+ ```
+
+ **Do NOT report** routine operations: claiming a test case, updating heartbeat, stage transitions, waiting for blocked test cases. Only report substantive choices that affect the implementation.
+
+ **Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
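
One way to keep reporting strictly best-effort is to wrap the curl call in a function that swallows every failure. This wrapper is a sketch, not part of ralphflow: the naive string interpolation would break on arguments containing quotes, and `{{AGENT_NAME}}` is the same template placeholder used above.

```shell
#!/usr/bin/env bash
# Best-effort wrapper around the decision POST: short timeouts, errors
# swallowed, always exits 0 so reporting can never stall the loop.
# Sketch only; naive interpolation breaks if arguments contain quotes.
report_decision() {
  local item=$1 decision=$2 reasoning=$3
  curl -s --connect-timeout 2 --max-time 5 -X POST \
    "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" \
    -H 'Content-Type: application/json' \
    -d "{\"item\":\"$item\",\"agent\":\"{{AGENT_NAME}}\",\"decision\":\"$decision\",\"reasoning\":\"$reasoning\"}" \
    >/dev/null 2>&1 || true             # dashboard down: keep working
}

# Succeeds even when no dashboard is listening:
RALPHFLOW_APP=demo RALPHFLOW_LOOP=01-tdd-loop \
  report_decision "TC-1" "mocked the clock" "keeps the test deterministic" \
  && echo "reported (or silently skipped)"
```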
+
+ ---
+
+ ## Testing Anti-Patterns — NEVER Do These
+
+ 1. **Testing mock behavior instead of real behavior** — if your assertion checks a mock element (`*-mock` test IDs, mock return values), you are testing the mock, not the code. Delete and rewrite.
+ 2. **Adding test-only methods to production classes** — if a method exists only because tests need it, move it to test utilities. Production code must not know about tests.
+ 3. **Mocking without understanding dependencies** — before mocking, ask: "What side effects does the real method have? Does my test depend on any of them?" Mock at the lowest level necessary, not at the level that seems convenient.
+ 4. **Multiple behaviors per test** — if the test name contains "and", split it. One test, one behavior, one assertion.
+ 5. **Incomplete mock data** — mock the COMPLETE data structure as it exists in reality, not just the fields your immediate test uses. Partial mocks hide structural assumptions.
+
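
Anti-pattern 4, illustrated with invented names: a test whose name needs "and" is two tests.

```shell
#!/usr/bin/env bash
# Illustrates anti-pattern 4; all names are invented for the example.
set -u

to_slug() { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'; }

# BAD: the "and" in the name betrays two behaviors in one test
test_lowercases_and_hyphenates() {
  [ "$(to_slug "Hello")" = "hello" ] && [ "$(to_slug "a b")" = "a-b" ]
}

# GOOD: one test, one behavior, one assertion each
test_lowercases_input()  { [ "$(to_slug "Hello")" = "hello" ]; }
test_hyphenates_spaces() { [ "$(to_slug "a b")" = "a-b" ]; }

test_lowercases_input && test_hyphenates_spaces && echo PASS
```

When the split tests run separately, a failure names the exact behavior that broke instead of a conjunction of both.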
+ ---
+
+ ## Red Flags — STOP and Start Over
+
+ - Code written before test
+ - Test written after implementation
+ - Test passes immediately in RED stage
+ - Cannot explain why test failed
+ - Tests added "later"
+ - Rationalizing "just this once"
+ - "I already manually tested it"
+ - "Tests after achieve the same purpose"
+ - "Keep as reference" or "adapt existing code"
+ - "This is different because..."
+ - Mock setup is >50% of test code
+
+ **All of these mean: Delete code. Start over with RED.**
+
+ ---
+
+ ## Rules
+
+ - One test case at a time per agent. One stage per iteration.
+ - Read tracker first, update tracker last. Always use lock protocol for writes.
+ - Read `CLAUDE.md` for all project-specific context.
+ - **The Iron Law is absolute: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.**
+ - RED must produce a FAILING test. GREEN must produce MINIMAL passing code. REFACTOR must NOT add behavior.
+ - Run ALL tests after every change, not just the current test.
+ - Commit after each stage: RED commit (test only), GREEN commit (implementation), REFACTOR commit (cleanup).
+ - Align with sibling test cases via Test Case Group context.
+ - **Multi-agent: never touch another agent's `in_progress` test case. Coordinate via `tracker.md`.**
+
+ ---
+
+ Read `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md` now and begin.
@@ -0,0 +1,3 @@
+ # Test Cases
+
+ <!-- Populated by the spec loop -->
@@ -0,0 +1,18 @@
+ # TDD Loop — Tracker
+
+ - completed_test_cases: []
+
+ ## Agent Status
+
+ | agent | active_test_case | stage | last_heartbeat |
+ |-------|------------------|-------|----------------|
+
+ ---
+
+ ## Dependencies
+
+ ## Test Case Groups
+
+ ## Test Cases Queue
+
+ ## Log