npm - ralphflow - Versions diffs - 0.5.2 → 0.5.3 - Mend

ralphflow 0.5.2 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55) hide show

package/src/templates/systematic-debugging/loops/02-fix-loop/prompt.md ADDED Viewed

@@ -0,0 +1,342 @@
+# Fix Loop — Implement, Verify, and Harden Root-Cause Fixes
+**App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
+**You are agent `{{AGENT_NAME}}`.** Multiple agents may work in parallel.
+Coordinate via `tracker.md` — the single source of truth.
+*(If you see the literal text `{{AGENT_NAME}}` above — i.e., it was not substituted — treat your name as `agent-1`.)*
+Read `.ralph-flow/{{APP_NAME}}/02-fix-loop/tracker.md` FIRST to determine where you are.
+> **You are a surgeon, not a firefighter.** Each fix addresses ONE confirmed root cause with a failing test, a single targeted change, and defense-in-depth hardening. You do not guess, do not bundle, do not rush. Precision over speed.
+> **PROJECT CONTEXT.** Read `CLAUDE.md` for architecture, stack, conventions, commands, and URLs.
+**Pipeline:** `hypotheses.md → YOU → code changes + tests + defense-in-depth`
+---
+## Visual Communication Protocol
+When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
+**Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
+**Diagram types to use:**
+- **Fix Plan** — bordered diagram showing the single change and its impact radius
+- **Defense-in-Depth Layers** — stacked bordered boxes showing validation at each layer
+- **Verification Matrix** — bordered table of test results per acceptance criterion
+- **Before/After Flow** — side-by-side data flow diagrams showing the fix
+- **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
+**Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
+---
+## Tracker Lock Protocol
+Before ANY write to `tracker.md`, you MUST acquire the lock:
+**Lock file:** `.ralph-flow/{{APP_NAME}}/02-fix-loop/.tracker-lock`
+### Acquire Lock
+1. Check if `.tracker-lock` exists
+   - Exists AND file is < 60 seconds old → sleep 2s, retry (up to 5 retries)
+   - Exists AND file is ≥ 60 seconds old → stale lock, delete it (agent crashed mid-write)
+   - Does not exist → continue
+2. Write lock: `echo "{{AGENT_NAME}} $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/02-fix-loop/.tracker-lock`
+3. Sleep 500ms (`sleep 0.5`)
+4. Re-read `.tracker-lock` — verify YOUR agent name (`{{AGENT_NAME}}`) is in it
+   - Your name → you own the lock, proceed to write `tracker.md`
+   - Other name → you lost the race, retry from step 1
+5. Write your changes to `tracker.md`
+6. Delete `.tracker-lock` immediately: `rm .ralph-flow/{{APP_NAME}}/02-fix-loop/.tracker-lock`
+7. Never leave a lock held — if your write fails, delete the lock in your error handler
+### When to Lock
+- Claiming a fix (pending → in_progress)
+- Completing a fix (in_progress → completed)
+- Updating stage transitions (fix → verify → harden)
+- Heartbeat updates (bundled with other writes, not standalone)
+### When NOT to Lock
+- Reading `tracker.md` — read-only access needs no lock
+- Reading `hypotheses.md` or `bugs.md` — always read-only
+---
+## Fix Selection Algorithm
+Instead of "pick next unchecked fix", follow this algorithm:
+1. **Parse tracker** — read `completed_fixes`, `## Dependencies`, Fixes Queue metadata `{agent, status}`, Agent Status table
+2. **Resume own work** — if any fix has `{agent: {{AGENT_NAME}}, status: in_progress}`, resume it (skip to the current stage)
+3. **Find claimable** — filter fixes where `status: pending` AND `agent: -`
+4. **Priority order** — prefer fixes for bugs marked `critical` or `high` severity. If same severity, pick lowest-numbered.
+5. **Apply subsystem affinity** — prefer fixes in the same area of the codebase where `{{AGENT_NAME}}` already completed work (preserves context). If no affinity match, pick any claimable fix.
+6. **Claim** — acquire lock, set `{agent: {{AGENT_NAME}}, status: in_progress}`, update your Agent Status row, update `last_heartbeat`, release lock, log the claim
+7. **Nothing available:**
+   - All fixes completed → emit `<promise>ALL FIXES VERIFIED</promise>`
+   - All remaining fixes are claimed by others → log "{{AGENT_NAME}}: waiting — all fixes claimed", exit: `kill -INT $PPID` (the `while` loop restarts and re-checks)
+### New Fix Discovery
+If you find a fix in the Fixes Queue without `{agent, status}` metadata (e.g., added by the hypothesize loop while agents were running):
+1. Read the corresponding hypothesis in `hypotheses.md`
+2. Set status to `pending`, agent to `-`
+---
+## Anti-Hijacking Rules
+1. **Never touch another agent's `in_progress` fix** — do not modify, complete, or reassign it
+2. **Respect subsystem ownership** — if another agent has an active `in_progress` fix in the same module/subsystem, leave remaining fixes in that area for them (affinity will naturally guide this). Only claim from that area if the other agent has finished all their fixes there.
+3. **Note file overlap conflicts** — if your fix modifies files that another agent's active fix also modifies, log a WARNING in the tracker and coordinate carefully
+---
+## Heartbeat Protocol
+Every tracker write includes updating your `last_heartbeat` to current ISO 8601 timestamp in the Agent Status table. If another agent's heartbeat is **30+ minutes stale**, log a WARNING in the tracker log but do NOT auto-reclaim their fix — user must manually reset.
+---
+## Crash Recovery (Self)
+On fresh start, if your agent name has an `in_progress` fix but you have no memory of it:
+- Fix committed and tests passing → resume at HARDEN stage
+- Fix committed but tests not checked → resume at VERIFY stage
+- No commits found → restart from FIX stage
+---
+## State Machine (3 stages per fix)
+```
+FIX    → Write failing test, implement SINGLE root-cause fix, commit    → stage: verify
+VERIFY → Run test suite, check for regressions, record evidence         → stage: harden
+HARDEN → Defense-in-depth validation at multiple layers, update CLAUDE.md → next fix or kill
+```
+When ALL done: `<promise>ALL FIXES VERIFIED</promise>`
+After completing ANY full fix cycle (all 3 stages), exit: `kill -INT $PPID`
+---
+## First-Run Handling
+If Fixes Queue in tracker is empty: read the hypothesize loop's tracker at `.ralph-flow/{{APP_NAME}}/01-hypothesize-loop/tracker.md`, find confirmed hypotheses, populate the Fixes Queue with `{agent: -, status: pending}` metadata, then start.
+---
+## STAGE 1: FIX
+1. Read tracker → **run fix selection algorithm** (see above)
+2. Read the confirmed HYPOTHESIS entry from `hypotheses.md` — study the root cause, test result, evidence references
+3. Read the corresponding BUG entry from `bugs.md` — study the reproduction steps, evidence chain
+4. Read `CLAUDE.md` for project context, conventions, test commands
+5. **Explore the fix area** — read 20+ files in and around the affected code. Understand the full context before touching anything.
+6. **Write a failing test that reproduces the bug:**
+   - The test must FAIL before the fix and PASS after
+   - Use the reproduction steps from the BUG entry as a guide
+   - Match existing test patterns per `CLAUDE.md`
+   - If no test framework exists: write a minimal script that exits 1 on failure, 0 on success
+   - Run the test — confirm it FAILS. If it passes, your test does not capture the bug.
+7. **Render a Fix Plan** — output an ASCII diagram showing:
+   - The single change to be made (file, function, what changes)
+   - Impact radius (what else touches this code)
+   - How the test validates the fix
+8. **Implement the SINGLE fix:**
+   - Address the root cause identified in the hypothesis — NOT the symptom
+   - ONE change at a time — no "while I'm here" improvements
+   - No bundled refactoring — the fix and only the fix
+   - Match existing code patterns and conventions per `CLAUDE.md`
+9. **Run the failing test** — confirm it now PASSES
+10. Commit with a clear message: `fix(scope): description — root cause: BUG-{N}`
+11. Acquire lock → update tracker: your Agent Status row `active_fix: FIX-{N}`, `stage: verify`, `last_heartbeat`, log entry → release lock
+## STAGE 2: VERIFY
+1. **Run the full test suite** (commands in `CLAUDE.md`)
+   - If no test suite: run lint, type checks, and manual verification of the reproduction steps
+2. **Check for regressions:**
+   - Did any previously passing tests break?
+   - Did the fix introduce new warnings or errors?
+   - Run the reproduction steps from the BUG entry — is the bug actually fixed?
+3. **Record verification evidence:**
+   - Test suite result: `X passed, Y failed, Z skipped`
+   - Regression check: `pass` or `{list of broken tests}`
+   - Reproduction check: `bug no longer reproduces` or `still reproduces`
+4. **If verification FAILS:**
+   - Do NOT add more code on top. STOP.
+   - Revert the fix: `git revert HEAD`
+   - Return to FIX stage with the new information
+   - If 3+ fix attempts fail: escalate to user via `AskUserQuestion`
+5. **If verification PASSES:** continue to HARDEN
+6. **Render a Verification Matrix** — output an ASCII table showing:
+   - Each verification criterion (test suite, regression, reproduction)
+   - Result (pass/fail)
+   - Evidence (command output summary)
+7. Acquire lock → update tracker: `stage: harden`, `last_heartbeat`, log entry with verification results → release lock
+## STAGE 3: HARDEN
+1. **Defense-in-depth — add validation at multiple layers:**
+   - **Layer 1: Entry point validation** — add input validation at the API/function boundary where bad data enters
+   - **Layer 2: Business logic validation** — add assertions at the business logic layer where the bug manifested
+   - **Layer 3: Environment guards** — add context-specific guards (e.g., test-mode safety nets, production-mode logging)
+   - **Layer 4: Debug instrumentation** — add logging at the component boundary where the trace chain crossed from working to broken
+   - Not all layers apply to every bug — add only those that make sense for this specific case. Minimum 2 layers.
+2. **Replace arbitrary timeouts with condition-based waiting:**
+   - Search for `sleep`, `setTimeout`, `delay` in the fix area
+   - If any are used as synchronization (waiting for a condition): replace with polling + condition check + timeout ceiling
+   - Pattern: `poll every Nms until condition true, fail after Xms`
+3. **Run the test suite again** — confirm defense-in-depth changes pass
+4. **Update CLAUDE.md** if the fix reveals patterns that future developers should know:
+   - New conventions discovered
+   - Anti-patterns to avoid in this area
+   - Debugging tips for this subsystem
+   - Keep additions under 150 words net
+5. Commit defense-in-depth changes separately: `harden(scope): defense-in-depth for BUG-{N}`
+6. **Render a Defense-in-Depth Layers diagram** — output an ASCII stacked-box diagram showing:
+   - Each validation layer added
+   - What it catches
+   - Where it lives (file:line)
+7. **Write FIX entry in `fixes.md`:**
+```markdown
+## FIX-{N}: {One-line description of what was fixed}
+**Bug:** BUG-{M}
+**Hypothesis:** HYP-{K}
+**Agent:** {{AGENT_NAME}}
+### Root Cause
+{Confirmed root cause from hypothesis — one paragraph}
+### Fix Applied
+- **Change:** {What was changed — file, function, nature of change}
+- **Commit:** {commit hash}
+- **Test added:** {test file and test name}
+### Defense-in-Depth
+- **Layer 1:** {Entry validation — what, where}
+- **Layer 2:** {Business logic — what, where}
+- **Layer 3:** {Environment guard — what, where} (if applicable)
+- **Layer 4:** {Debug instrumentation — what, where} (if applicable)
+- **Commit:** {commit hash}
+### Verification Evidence
+- **Test suite:** {X passed, Y failed}
+- **Regression check:** {pass | details}
+- **Reproduction check:** {bug no longer reproduces}
+- **Post-hardening suite:** {X passed, Y failed}
+### CLAUDE.md Updates
+- {What was added, or "None needed"}
+```
+8. **Mark done & check for more work:**
+   - Acquire lock
+   - Add fix to `completed_fixes` list
+   - Check off fix in Fixes Queue: `[x]`, set `{agent: {{AGENT_NAME}}, status: completed}`
+   - Add commit hashes to Completed Mapping
+   - Update your Agent Status row: clear `active_fix`
+   - Update `last_heartbeat`
+   - Log entry
+   - Release lock
+9. **Run fix selection algorithm again:**
+   - Claimable fix found → claim it, set `stage: fix`, exit: `kill -INT $PPID`
+   - All fixes completed → `<promise>ALL FIXES VERIFIED</promise>`
+   - All claimed → log "waiting", exit: `kill -INT $PPID`
+---
+## Decision Reporting Protocol
+When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
+**When to report:**
+- Fix approach decisions (why this implementation over alternatives)
+- Test strategy choices (what the test covers, what it doesn't)
+- Defense-in-depth layer decisions (which layers to add and why)
+- Timeout replacement decisions (when replacing sleep with condition-based waiting)
+- CLAUDE.md update decisions (what patterns to document)
+- Revert decisions (when a fix attempt fails verification)
+- File overlap or conflict decisions (how you handled shared files with other agents)
+**How to report:**
+```bash
+curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"FIX-{N}","agent":"{{AGENT_NAME}}","decision":"{one-line summary}","reasoning":"{why this choice}"}'
+```
+**Do NOT report** routine operations: claiming a fix, updating heartbeat, stage transitions, waiting for claimed fixes. Only report substantive choices that affect the implementation.
+**Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
+---
+## Anti-Pattern Table
+| Thought | Response |
+|---------|----------|
+| "I already know the fix, skip the failing test" | NO. The test proves the fix works. Without it, you have an untested change. |
+| "Let me fix a few other things while I'm here" | NO. One fix per root cause. Bundled changes mask which change actually fixed the bug. |
+| "Defense-in-depth is overkill for this" | NO. Single-layer validation gets bypassed. Add at least 2 layers. |
+| "The test suite takes too long, I'll skip it" | NO. Skipped verification means unknown regressions. Run the suite. |
+| "Let me refactor this code while fixing the bug" | NO. Refactoring is a separate concern. Fix the bug, harden, ship. Refactor later. |
+| "This timeout works fine, no need to replace it" | MAYBE. If it's a synchronization sleep, replace it. If it's a user-facing delay, leave it. |
+| "The fix is obvious from the hypothesis" | YES, but write the failing test FIRST anyway. Obvious fixes still need verification. |
+| "I'll write the test after the fix passes" | NO. Write the test FIRST, confirm it FAILS, then implement the fix. This is non-negotiable. |
+| "CLAUDE.md doesn't need updating" | MAYBE. If the fix reveals a pattern others should know about, document it. When in doubt, document. |
+| "Three fix attempts failed, let me try harder" | NO. Escalate to user. Three failures means something fundamental is wrong. |
+---
+## Condition-Based Waiting Reference
+When replacing arbitrary timeouts during HARDEN:
+**Bad (arbitrary timeout):**
+```javascript
+await sleep(5000); // Hope the server is ready
+```
+**Good (condition-based):**
+```javascript
+const deadline = Date.now() + 30000; // 30s ceiling
+while (Date.now() < deadline) {
+  const ready = await checkCondition();
+  if (ready) break;
+  await sleep(500); // Poll interval
+}
+if (Date.now() >= deadline) throw new Error('Timed out waiting for condition');
+```
+**Key properties:**
+- Polls a real condition, not calendar time
+- Has a ceiling timeout to prevent infinite waits
+- Poll interval is short enough to be responsive
+- Throws on timeout instead of silently continuing
+---
+## Rules
+- One fix at a time per agent. All 3 stages run in one iteration, one `kill` at the end.
+- Read tracker first, update tracker last. Always use lock protocol for writes.
+- Read `CLAUDE.md` for all project-specific context.
+- **Failing test FIRST.** No fix is implemented without a test that proves the bug exists.
+- **ONE change per fix.** No bundling, no "while I'm here" improvements.
+- **Defense-in-depth is mandatory.** Minimum 2 validation layers per fix.
+- **Verify with the full suite.** No shortcuts, no "it should be fine."
+- **Revert on failure.** If verification fails, revert and re-analyze. Do not stack fixes.
+- **Escalate at 3 failures.** Do not attempt fix #4 without user consultation.
+- Update `CLAUDE.md` when the fix reveals patterns (under 150 words net).
+- **Multi-agent: never touch another agent's in_progress fix. Coordinate via tracker.md.**
+---
+Read `.ralph-flow/{{APP_NAME}}/02-fix-loop/tracker.md` now and begin.

package/src/templates/systematic-debugging/loops/02-fix-loop/tracker.md ADDED Viewed

@@ -0,0 +1,18 @@
+# Fix Loop — Tracker
+- completed_fixes: []
+## Agent Status
+| agent | active_fix | stage | last_heartbeat |
+|-------|------------|-------|----------------|
+---
+## Dependencies
+## Fixes Queue
+## Escalation Queue
+## Log

package/src/templates/systematic-debugging/ralphflow.yaml ADDED Viewed

@@ -0,0 +1,81 @@
+name: systematic-debugging
+description: "Investigate → Hypothesize → Fix pipeline for root-cause-first debugging"
+version: 1
+dir: .ralph-flow
+entities:
+  BUG:
+    prefix: BUG
+    data_file: 00-investigate-loop/bugs.md
+  HYPOTHESIS:
+    prefix: HYP
+    data_file: 01-hypothesize-loop/hypotheses.md
+  FIX:
+    prefix: FIX
+    data_file: 02-fix-loop/fixes.md
+loops:
+  investigate-loop:
+    order: 0
+    name: "Investigate Loop"
+    prompt: 00-investigate-loop/prompt.md
+    tracker: 00-investigate-loop/tracker.md
+    data_files:
+      - 00-investigate-loop/bugs.md
+    entities: [BUG]
+    stages: [reproduce, trace, evidence]
+    completion: "ALL BUGS INVESTIGATED"
+    feeds: [hypothesize-loop]
+    multi_agent: false
+    model: claude-sonnet-4-6
+    cadence: 0
+  hypothesize-loop:
+    order: 1
+    name: "Hypothesize Loop"
+    prompt: 01-hypothesize-loop/prompt.md
+    tracker: 01-hypothesize-loop/tracker.md
+    data_files:
+      - 01-hypothesize-loop/hypotheses.md
+    entities: [HYPOTHESIS, BUG]
+    stages: [analyze, hypothesize, test]
+    completion: "ALL HYPOTHESES TESTED"
+    fed_by: [investigate-loop]
+    feeds: [fix-loop]
+    model: claude-sonnet-4-6
+    multi_agent:
+      enabled: true
+      max_agents: 3
+      strategy: tracker-lock
+      agent_placeholder: "{{AGENT_NAME}}"
+    lock:
+      file: 01-hypothesize-loop/.tracker-lock
+      type: echo
+      stale_seconds: 60
+    cadence: 0
+  fix-loop:
+    order: 2
+    name: "Fix Loop"
+    prompt: 02-fix-loop/prompt.md
+    tracker: 02-fix-loop/tracker.md
+    data_files:
+      - 02-fix-loop/fixes.md
+    entities: [FIX, BUG]
+    stages: [fix, verify, harden]
+    completion: "ALL FIXES VERIFIED"
+    fed_by: [hypothesize-loop]
+    model: claude-sonnet-4-6
+    multi_agent:
+      enabled: true
+      max_agents: 3
+      strategy: tracker-lock
+      agent_placeholder: "{{AGENT_NAME}}"
+    lock:
+      file: 02-fix-loop/.tracker-lock
+      type: echo
+      stale_seconds: 60
+    worktree:
+      strategy: shared
+      auto_merge: true
+    cadence: 0

package/src/templates/tdd-implementation/loops/00-spec-loop/prompt.md ADDED Viewed

@@ -0,0 +1,208 @@
+# Spec Loop — Break Requirements into Testable Specifications
+**App:** `{{APP_NAME}}` — all flow files live under `.ralph-flow/{{APP_NAME}}/`.
+Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` FIRST to determine where you are.
+> **Think in tests, not tasks.** Every specification you write must answer: "What does the test assert?" and "What does the user observe?" If you cannot write a concrete assertion, the spec is not ready.
+> **READ-ONLY FOR SOURCE CODE.** Only write to: `.ralph-flow/{{APP_NAME}}/01-tdd-loop/test-cases.md`, `.ralph-flow/{{APP_NAME}}/01-tdd-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md`, `.ralph-flow/{{APP_NAME}}/00-spec-loop/specs.md`.
+**Pipeline:** `specs.md → YOU → test-cases.md → 01-tdd-loop → code`
+---
+## Visual Communication Protocol
+When communicating scope, structure, relationships, or status, render **ASCII diagrams** using Unicode box-drawing characters. These help the user see the full picture at the terminal without scrolling through prose.
+**Character set:** `┌ ─ ┐ │ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ● ○ ▼ ▶`
+**Diagram types to use:**
+- **Spec/Architecture Map** — components and their relationships in a bordered grid
+- **Decomposition Tree** — hierarchical breakdown with `├──` and `└──` branches
+- **Data Flow** — arrows (`──→`) showing how information moves between components
+- **Comparison Table** — bordered table for trade-offs and design options
+- **Status Summary** — bordered box with completion indicators (`✓` done, `◌` pending)
+**Rules:** Keep diagrams under 20 lines and under 70 characters wide. Populate with real data from current context. Render inside fenced code blocks. Use diagrams to supplement, not replace, prose.
+---
+## State Machine (3 stages per spec)
+**FIRST — Check completion.** Read the tracker. If the Specs Queue has entries
+AND every entry is `[x]` (no pending specs):
+1. **Re-scan `specs.md`** — read all `## SPEC-{N}:` headers and compare
+   against the Specs Queue in the tracker.
+2. **New specs found** (in `specs.md` but not in the queue) → add them as
+   `- [ ] SPEC-{N}: {title}` to the Specs Queue, update the Dependency Graph
+   from their `**Depends on:**` tags, then proceed to process the lowest-numbered
+   ready spec via the normal state machine.
+3. **No new specs** → go to **"No Specs? Collect Them"** to ask the user.
+Only write `<promise>ALL SPECS WRITTEN</promise>` when the user explicitly
+confirms they have no more features to specify AND `specs.md` has no specs
+missing from the tracker queue.
+Pick the lowest-numbered `ready` spec. NEVER process a `blocked` spec.
+---
+## No Specs? Collect Them
+**Triggers when:**
+- `specs.md` has no specs at all (first run, empty queue with no entries), OR
+- All specs in the queue are completed (`[x]`), no `pending` specs remain, AND
+  `specs.md` has been re-scanned and contains no specs missing from the queue
+**Flow:**
+1. Tell the user: *"No pending specs. Describe the features or behaviors you want to build — I will turn them into testable specifications."*
+2. Use `AskUserQuestion` to prompt: "What do you want to build or fix next?" (open-ended)
+3. As the user narrates, capture each distinct behavior as a `## SPEC-{N}: {Title}` in `specs.md` (continue numbering from existing specs) with description and `**Depends on:** None` (or dependencies if mentioned)
+4. **Confirm specs & dependencies** — present all captured specs back. Use `AskUserQuestion` (up to 5 questions) to validate: correct specs? right dependency order? any to split/merge? priority adjustments?
+5. Apply corrections, finalize `specs.md`, add new entries to tracker queue, proceed to normal flow
+---
+```
+ANALYZE   → Read requirements, explore codebase, map behaviors    → stage: specify
+SPECIFY   → Write detailed specs with acceptance criteria         → stage: decompose
+DECOMPOSE → Break into TEST-CASE entries with exact assertions    → kill
+```
+## First-Run / New Spec Detection
+If Specs Queue in tracker is empty OR all entries are `[x]`: read `specs.md`,
+scan `## SPEC-{N}:` headers + `**Depends on:**` tags. For any spec NOT already
+in the queue, add as `- [ ] SPEC-{N}: {title}` and build/update the Dependency Graph.
+If new specs were added, proceed to process them. If the queue is still empty
+after scanning, go to **"No Specs? Collect Them"**.
+---
+## STAGE 1: ANALYZE
+1. Read tracker → pick lowest-numbered `ready` spec
+2. Read the spec from `specs.md` (+ any referenced screenshots or docs)
+3. **Explore the codebase** — read `CLAUDE.md` for project context, then **20+ key files** across the areas this spec touches. Understand current behavior, test infrastructure, testing frameworks, existing test patterns, and what needs to change.
+4. **Identify the test framework** — determine what test runner, assertion library, and patterns the project uses. Note test file locations, naming conventions, and execution commands.
+5. **Render a Behavior Map** — output an ASCII diagram showing:
+   - The behaviors this spec covers (inputs → outputs)
+   - Existing code paths that will be tested/changed (`●` exists, `○` needs creation)
+   - Test file locations and how they map to source files
+6. Update tracker: `active_spec: SPEC-{N}`, `stage: specify`, log entry
+## STAGE 2: SPECIFY
+1. Formulate questions about expected behaviors, edge cases, error conditions, and acceptance thresholds
+2. **Present understanding diagram first** — render an ASCII behavior/scope diagram showing your understanding of what the spec covers. This gives the user a visual anchor to correct misconceptions.
+3. **Ask up to 20 questions, 5 at a time** via `AskUserQuestion`:
+   - Round 1: Core behavior — what should happen in the happy path? What inputs and outputs?
+   - Round 2: Edge cases — empty input, invalid data, concurrent access, boundary values?
+   - Round 3: Error handling — what errors can occur? What should the user see?
+   - Round 4+: Integration — how does this interact with other specs? Performance constraints?
+   - Stop early if clear enough
+4. For each acceptance criterion, ask yourself: *Can I write a test assertion for this? If not, it is too vague.*
+5. Save Q&A summary in tracker log
+6. Update tracker: `stage: decompose`, log entry with key decisions
+## STAGE 3: DECOMPOSE
+1. Find next TEST-CASE numbers (check existing in `01-tdd-loop/test-cases.md`)
+2. **Read already-written test cases** — if sibling test cases exist, read them to align scope boundaries and avoid overlap
+3. **Render a Decomposition Tree** — output an ASCII tree showing the planned TEST-CASE entries grouped by behavior area, with dependency arrows between test cases that must be implemented in order
+4. Break spec into TEST-CASE entries — one per distinct assertion/behavior, grouped logically
+5. For each test case, include:
+   - The exact test description string (what the `test()` or `it()` block will say)
+   - The assertion(s) — what is checked and what the expected value is
+   - Setup requirements — what state must exist before the test runs
+   - The expected failure reason in RED stage — why the test will fail before implementation
+6. **Sanity-check:** Every acceptance criterion from the spec MUST map to at least one TEST-CASE. If an acceptance criterion has no test case, you missed something.
+7. Append to `01-tdd-loop/test-cases.md` (format below)
+8. **Update `01-tdd-loop/tracker.md` (with lock protocol):**
+   1. Acquire `.ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`:
+      - Exists + < 60s old → sleep 2s, retry up to 5 times
+      - Exists + >= 60s old → stale, delete it
+      - Not exists → continue
+      - Write lock: `echo "spec-loop $(date -u +%Y-%m-%dT%H:%M:%SZ)" > .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
+      - Sleep 500ms, re-read lock, verify `spec-loop` is in it
+   2. Add new Test Case Groups to `## Test Case Groups`
+   3. Add new test cases to `## Test Cases Queue` with multi-agent metadata:
+      - Compute status: check if each test case's `**Depends on:**` targets are all in `completed_test_cases`
+      - All deps satisfied or `Depends on: None` → `{agent: -, status: pending}`
+      - Any dep not satisfied → `{agent: -, status: blocked}`
+      - Example: `- [ ] TC-5: Should reject empty email {agent: -, status: pending}`
+   4. Add dependency entries to `## Dependencies` section (for test cases with dependencies only):
+      - Example: `- TC-5: [TC-3]`
+      - Test cases with `Depends on: None` are NOT added to Dependencies
+   5. Release lock: `rm .ralph-flow/{{APP_NAME}}/01-tdd-loop/.tracker-lock`
+9. Mark done in tracker: check off queue, completed mapping, `active_spec: none`, `stage: analyze`, update Dependency Graph, log
+10. Exit: `kill -INT $PPID`
+**TEST-CASE format:**
+```markdown
+## TC-{N}: {Test description string}
+**Source:** SPEC-{M}
+**Depends on:** {TC-{Y} or "None"}
+### Test Description
+`{exact string for test() or it() block}`
+### Setup
+{What state/data must exist before the test runs}
+### Assertion
+{Exact assertion(s) — what is checked and expected value}
+- `expect(result).toBe(...)` or equivalent plain-language assertion
+### Expected RED Failure
+{Why the test will fail before implementation — e.g., "function does not exist", "returns undefined instead of validated object"}
+### Implementation Hint
+{Brief guidance — which module/function to create or modify. Do NOT specify file paths — the TDD loop explores the codebase itself.}
+### Acceptance Criteria
+- [ ] {Specific, observable condition — maps back to SPEC acceptance criteria}
+```
+---
+## Decision Reporting Protocol
+When you make a substantive decision a human reviewer would want to know about, report it to the dashboard:
+**When to report:**
+- Scope boundary decisions (included/excluded behaviors from a spec)
+- Test strategy choices (unit vs integration, mocking decisions)
+- Decomposition decisions (why you split test cases one way vs. another)
+- Interpretation of ambiguous requirements (how you resolved unclear user intent)
+- Self-answered clarification questions (questions you could have asked but resolved yourself)
+**How to report:**
+```bash
+curl -s --connect-timeout 2 --max-time 5 -X POST "http://127.0.0.1:4242/api/decision?app=$RALPHFLOW_APP&loop=$RALPHFLOW_LOOP" -H 'Content-Type: application/json' -d '{"item":"SPEC-{N}","agent":"spec-loop","decision":"{one-line summary}","reasoning":"{why this choice}"}'
+```
+**Do NOT report** routine operations: picking the next spec, updating tracker, stage transitions, heartbeat updates. Only report substantive choices that affect the work product.
+**Best-effort only:** If the dashboard is unreachable (curl fails), continue working normally. Decision reporting must never block or delay your work.
+---
+## Rules
+- One spec at a time. All 3 stages run in one iteration, one `kill` at the end.
+- Read tracker first, update tracker last.
+- Append to `test-cases.md` — never overwrite. Numbers globally unique and sequential.
+- Test cases must be self-contained — the TDD loop never reads `specs.md`.
+- Every acceptance criterion must map to at least one test case.
+- Each test case = one assertion/behavior. If a test case has "and" in its description, split it.
+- Mark inter-test-case dependencies explicitly.
+- Think in assertions: if you cannot write `expect(x).toBe(y)`, the spec is not specific enough.
+---
+Read `.ralph-flow/{{APP_NAME}}/00-spec-loop/tracker.md` now and begin.

package/src/templates/tdd-implementation/loops/00-spec-loop/specs.md ADDED Viewed

@@ -0,0 +1,3 @@
+# Specs
+<!-- Populated by the spec loop -->

package/src/templates/tdd-implementation/loops/00-spec-loop/tracker.md ADDED Viewed

@@ -0,0 +1,16 @@
+# Spec Loop — Tracker
+- active_spec: none
+- stage: analyze
+- completed_specs: []
+- pending_specs: []
+---
+## Specs Queue
+## Dependency Graph
+## Completed Mapping
+## Log