@ai-dev-methodologies/rlp-desk 0.0.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 ai-dev-methodologies
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,182 @@
1
+ # RLP Desk
2
+
3
+ > Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification.
4
+
5
+ RLP Desk brings [Geoffrey Huntley's Ralph Loop](https://ghuntley.com/ralph/) philosophy to Claude Code. Inspired by [OpenAI Codex's long-horizon tasks](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex/) and [design-desk](https://github.com/derrickchoi-openai/design-desk), it orchestrates fresh-context workers and verifiers through Claude Code's `Agent()` tool.
6
+
7
+ **Key insight**: Each iteration starts fresh. No accumulated context drift. The filesystem is the only memory.
8
+
9
+ ```
10
+ [Your Session = LEADER]
11
+
12
+ Agent()├──▶ [Worker (fresh context)]
13
+        │        └── reads PRD + memory → implements → updates memory
14
+
15
+ Agent()└──▶ [Verifier (fresh context)]
16
+                 └── reads done-claim → runs checks → writes verdict
17
+ ```
18
+
19
+ ## Quick Start
20
+
21
+ ### 1. Install
22
+
23
+ ```bash
24
+ npm install -g @ai-dev-methodologies/rlp-desk
25
+ ```
26
+
27
+ Or without npm:
28
+
29
+ ```bash
30
+ curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
31
+ ```
32
+
33
+ ### 2. Brainstorm
34
+
35
+ In your project directory, start a Claude Code session:
36
+
37
+ ```
38
+ /rlp-desk brainstorm "implement a Python calculator with tests"
39
+ ```
40
+
41
+ This interactively defines the contract: slug, objective, user stories, verification commands, and iteration settings.
42
+
43
+ ### 3. Run
44
+
45
+ ```
46
+ /rlp-desk run calculator
47
+ ```
48
+
49
+ The leader loop runs autonomously — spawning workers, verifying results, and tracking progress until completion or a circuit breaker triggers.
50
+
51
+ ## Why?
52
+
53
+ ### The Context Problem
54
+
55
+ LLM conversations accumulate context. Long sessions drift, hallucinate, and forget earlier decisions. The Ralph Loop solves this by treating **context as a disposable resource**:
56
+
57
+ - Each worker gets a **fresh context** — no prior conversation, no accumulated confusion
58
+ - **Filesystem = memory** — PRDs, campaign memory, and context files are the only state
59
+ - **Independent verification** — a separate fresh-context verifier checks the worker's claims against real evidence
60
+
61
+ ### Lineage
62
+
63
+ | Concept | Source |
64
+ |---------|--------|
65
+ | Fresh context per iteration | [Ralph Loop](https://ghuntley.com/ralph/) ([guide](https://www.aihero.dev/getting-started-with-ralph), [tips](https://www.aihero.dev/tips-for-ai-coding-with-ralph-wiggum)) |
66
+ | Long-horizon autonomous tasks | [OpenAI Codex](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex/) |
67
+ | Desk-based orchestration | [design-desk](https://github.com/derrickchoi-openai/design-desk) |
68
+ | Agent() subprocess model | Claude Code native |
69
+
70
+ ## How It Works
71
+
72
+ ### Three Roles
73
+
74
+ | Role | Runs In | Responsibility |
75
+ |------|---------|----------------|
76
+ | **Leader** | Your current session | Orchestrates the loop, reads memory, selects models, writes sentinels |
77
+ | **Worker** | Fresh `Agent()` context | Executes one bounded action per iteration, updates memory |
78
+ | **Verifier** | Fresh `Agent()` context | Independently verifies worker claims with fresh evidence |
79
+
80
+ ### The Loop
81
+
82
+ ```
83
+ for iteration in 1..max_iter:
84
+
85
+ 1. Check sentinels (complete? blocked?)
86
+ 2. Read campaign memory → get next iteration contract
87
+ 3. Select model (haiku/sonnet/opus based on complexity)
88
+ 4. Build worker prompt → dispatch via Agent()
89
+ 5. Worker executes one bounded action, updates memory
90
+ 6. If worker claims done → dispatch Verifier via Agent()
91
+ 7. Verifier runs fresh checks → pass/fail/blocked
92
+ 8. Update status, report to user, continue or stop
93
+ ```
94
+
95
+ ### Circuit Breakers
96
+
97
+ | Condition | Action |
98
+ |-----------|--------|
99
+ | Context unchanged for 3 iterations | BLOCKED |
100
+ | Same error repeated twice | Upgrade model, retry once, then BLOCKED |
101
+ | Max iterations reached | TIMEOUT |
102
+
103
+ ### Model Routing
104
+
105
+ | Scenario | Model |
106
+ |----------|-------|
107
+ | Simple, single-file changes | `haiku` |
108
+ | Standard work (default) | `sonnet` |
109
+ | Architecture changes, multi-file, prior failure | `opus` |
110
+ | Standard verification | `sonnet` |
111
+ | Security/critical logic verification | `opus` |
112
+
113
+ ## Commands
114
+
115
+ ```
116
+ /rlp-desk brainstorm <description> Plan before init (interactive)
117
+ /rlp-desk init <slug> [objective] Create project scaffold
118
+ /rlp-desk run <slug> [--opts] Run the loop (this session = leader)
119
+ /rlp-desk status <slug> Show loop status
120
+ /rlp-desk logs <slug> [N] Show iteration logs
121
+ /rlp-desk clean <slug> Reset for re-run
122
+ ```
123
+
124
+ ### Run Options
125
+
126
+ | Flag | Default | Description |
127
+ |------|---------|-------------|
128
+ | `--max-iter N` | 100 | Maximum iterations before timeout |
129
+ | `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
130
+ | `--verifier-model MODEL` | sonnet | Verifier model (haiku/sonnet/opus) |
131
+
132
+ ## Project Structure
133
+
134
+ After `init`, your project gets this scaffold:
135
+
136
+ ```
137
+ your-project/
138
+ └── .claude/ralph-desk/
139
+     ├── prompts/
140
+     │   ├── <slug>.worker.prompt.md
141
+     │   └── <slug>.verifier.prompt.md
142
+     ├── context/
143
+     │   └── <slug>-latest.md
144
+     ├── memos/
145
+     │   └── <slug>-memory.md
146
+     ├── plans/
147
+     │   ├── prd-<slug>.md
148
+     │   └── test-spec-<slug>.md
149
+     └── logs/<slug>/
150
+         └── status.json
151
+ ```
152
+
153
+ ## Example: Calculator
154
+
155
+ See [`examples/calculator/`](examples/calculator/) for a complete example that implements a Python calculator module with tests using the RLP Desk loop.
156
+
157
+ The example demonstrates:
158
+ - A PRD with two user stories (calculator functions + pytest tests)
159
+ - Test specification with verification commands
160
+ - Worker and verifier prompts configured for the task
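
For orientation, the kind of module the two user stories describe might look like the sketch below. It is a hypothetical illustration, not the contents of `examples/calculator/`: the function names follow the brainstorm description (add, subtract, multiply, divide), and raising `ValueError` on division by zero is an assumption borrowed from the "Key Decisions" sample in the protocol reference.

```python
"""Hypothetical calc.py sketch; the shipped example may differ."""


def add(a: float, b: float) -> float:
    return a + b


def subtract(a: float, b: float) -> float:
    return a - b


def multiply(a: float, b: float) -> float:
    return a * b


def divide(a: float, b: float) -> float:
    # Assumption: division by zero is reported as ValueError,
    # not the built-in ZeroDivisionError.
    if b == 0:
        raise ValueError("division by zero")
    return a / b
```

A matching `test_calc.py` would then assert on these four functions, which gives the Verifier a concrete command (`python3 -m pytest test_calc.py -v`) to run.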
161
+
162
+ To try it yourself:
163
+
164
+ ```
165
+ mkdir my-calc && cd my-calc
166
+ /rlp-desk brainstorm "Python calculator with add, subtract, multiply, divide + pytest tests"
167
+ /rlp-desk run loop-test   # use the slug chosen during brainstorm
168
+ ```
169
+
170
+ ## Documentation
171
+
172
+ - [Architecture](docs/architecture.md) — Design philosophy and the Agent() approach
173
+ - [Getting Started](docs/getting-started.md) — Step-by-step tutorial with the calculator example
174
+ - [Protocol Reference](docs/protocol-reference.md) — Full protocol specification
175
+
176
+ ## Contributing
177
+
178
+ See [CONTRIBUTING.md](.github/CONTRIBUTING.md).
179
+
180
+ ## License
181
+
182
+ [MIT](LICENSE)
@@ -0,0 +1,129 @@
1
+ # Architecture
2
+
3
+ ## Design Philosophy
4
+
5
+ RLP Desk is built on a single conviction: **context is a liability, not an asset**.
6
+
7
+ In long-running LLM sessions, accumulated context causes drift, hallucination, and forgotten decisions. RLP Desk eliminates this by treating each iteration as a fresh start, with the filesystem as the sole source of truth.
8
+
9
+ ### Why Fresh Context Matters
10
+
11
+ Traditional approaches:
12
+ ```
13
+ Session start → Task 1 → Task 2 → ... → Task N
14
+ ↑ context accumulates, quality degrades
15
+ ```
16
+
17
+ RLP Desk approach:
18
+ ```
19
+ Leader ──Agent()──▶ Worker 1 (fresh) ──▶ writes to filesystem
20
+        ──Agent()──▶ Worker 2 (fresh) ──▶ reads filesystem, continues
21
+        ──Agent()──▶ Worker 3 (fresh) ──▶ reads filesystem, continues
22
+ ```
23
+
24
+ Each worker reads the same filesystem state that any human could inspect. No hidden context. No accumulated confusion.
25
+
26
+ ## The Agent() Approach
27
+
28
+ Claude Code's `Agent()` tool spawns a subprocess — a completely new context window with no knowledge of the parent conversation. RLP Desk exploits this property:
29
+
30
+ ```python
31
+ # Each call = new process = fresh context = no prior conversation
32
+ Agent(
33
+ subagent_type="executor", # Worker or Verifier
34
+ model="sonnet", # Model selection per iteration
35
+ prompt=full_prompt_text, # Everything the agent needs
36
+ mode="bypassPermissions" # Autonomous execution
37
+ )
38
+ ```
39
+
40
+ The Agent returns synchronously. No polling, no signal files, no tmux. The Leader simply reads the filesystem after each Agent completes.
41
+
42
+ ### Why Agent() Over Other Approaches
43
+
44
+ | Approach | Problem |
45
+ |----------|---------|
46
+ | Single long session | Context drift, token limits |
47
+ | tmux + polling | Complex, brittle, race conditions |
48
+ | Signal files + sleep loops | Fragile timing, wasted compute |
49
+ | **Agent() subprocess** | **Clean, synchronous, guaranteed fresh context** |
50
+
51
+ ## Three-Role Architecture
52
+
53
+ ### Leader (Your Session)
54
+
55
+ The Leader is the currently running Claude Code session. It:
56
+
57
+ - Reads campaign memory to understand current state
58
+ - Decides which model to use for the next iteration
59
+ - Builds prompts by combining base prompts with iteration-specific context
60
+ - Dispatches Workers and Verifiers via `Agent()`
61
+ - Writes sentinel files (COMPLETE/BLOCKED) based on results
62
+ - Tracks circuit breaker conditions
63
+
64
+ The Leader **never writes code**. It orchestrates.
65
+
66
+ ### Worker (Fresh Context)
67
+
68
+ Each Worker:
69
+
70
+ - Receives a complete prompt with everything it needs (PRD, memory, context, task)
71
+ - Executes exactly **one bounded action** (e.g., implement one user story)
72
+ - Updates the filesystem:
73
+ - `context/<slug>-latest.md` — current frontier
74
+ - `memos/<slug>-memory.md` — campaign memory for the next worker
75
+ - `memos/<slug>-done-claim.json` — if claiming all work is complete
76
+ - Exits
77
+
78
+ The Worker has no memory of previous iterations. It relies entirely on what prior Workers wrote to the filesystem.
79
+
80
+ ### Verifier (Fresh Context)
81
+
82
+ The Verifier exists because **Worker claims are not trustworthy**. A Worker may claim "all tests pass" without actually running them.
83
+
84
+ Each Verifier:
85
+
86
+ - Reads the PRD, test spec, and the Worker's done-claim
87
+ - Runs verification commands **from scratch** (build, test, lint)
88
+ - Checks each acceptance criterion against fresh evidence
89
+ - Writes a verdict: pass, fail, or blocked
90
+ - **Never modifies code**
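
The "fresh evidence" step above can be sketched in Python. This is an illustration under stated assumptions, not RLP Desk's implementation: `run_checks` and its argument shape are hypothetical, and the result keys (`criterion`, `met`, `evidence`) simply mirror the `criteria_results` entries of the verdict file.

```python
import subprocess


def run_checks(checks: dict[str, list[str]]) -> list[dict]:
    """Run each verification command and record its exit status as evidence.

    `checks` maps a criterion description to an argv list (hypothetical shape).
    """
    results = []
    for criterion, argv in checks.items():
        # Execute the command from scratch; never trust a previously reported result.
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append(
            {
                "criterion": criterion,
                "met": proc.returncode == 0,
                "evidence": f"{' '.join(argv)} -> exit {proc.returncode}",
            }
        )
    return results
```

For the calculator example, a call such as `run_checks({"All tests pass": ["python3", "-m", "pytest", "test_calc.py", "-v"]})` would produce one entry per criterion.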
91
+
92
+ ## Filesystem as Memory
93
+
94
+ ```
95
+ .claude/ralph-desk/
96
+ ├── plans/ # Contracts (PRD, test spec) — written once, rarely modified
97
+ ├── prompts/ # Base prompts — templates for Worker/Verifier
98
+ ├── context/ # Current frontier — Worker updates each iteration
99
+ ├── memos/ # Runtime state — memory, claims, verdicts, sentinels
100
+ └── logs/ # Audit trail — every prompt sent, every status change
101
+ ```
102
+
103
+ ### State Lifecycle
104
+
105
+ ```
106
+ plans/prd-*.md Written at init, stable reference
107
+ plans/test-spec-*.md Written at init, stable reference
108
+ context/*-latest.md Updated by Worker each iteration
109
+ memos/*-memory.md Rewritten by Worker each iteration
110
+ memos/*-done-claim.json Created by Worker, cleaned by Leader
111
+ memos/*-verify-verdict.json Created by Verifier, cleaned by Leader
112
+ memos/*-complete.md Written once by Leader (terminal)
113
+ memos/*-blocked.md Written once by Leader (terminal)
114
+ ```
115
+
116
+ ## Model Routing Strategy
117
+
118
+ Not every task needs the most powerful model. RLP Desk routes based on complexity:
119
+
120
+ ```
121
+ Simple fix (typo, config) → haiku (fast, cheap)
122
+ Standard implementation → sonnet (balanced)
123
+ Architecture / debugging → opus (thorough)
124
+ ```
125
+
126
+ The Leader adapts dynamically:
127
+ - Previous iteration failed → upgrade model
128
+ - Simple repetitive task → downgrade model
129
+ - User explicitly specified → respect the choice
@@ -0,0 +1,153 @@
1
+ # Getting Started
2
+
3
+ This guide walks you through your first RLP Desk loop using a simple Python calculator example.
4
+
5
+ ## Prerequisites
6
+
7
+ - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed and authenticated
8
+ - A terminal with bash or zsh
9
+
10
+ ## Step 1: Install RLP Desk
11
+
12
+ ```bash
13
+ npm install -g @ai-dev-methodologies/rlp-desk
14
+ ```
15
+
16
+ Or without npm:
17
+
18
+ ```bash
19
+ curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
20
+ ```
21
+
22
+ This installs three files:
23
+ - `~/.claude/commands/rlp-desk.md` — the slash command
24
+ - `~/.claude/ralph-desk/init_ralph_desk.zsh` — the scaffold generator
25
+ - `~/.claude/ralph-desk/governance.md` — the protocol document
26
+
27
+ ## Step 2: Create a Project
28
+
29
+ ```bash
30
+ mkdir calculator-demo && cd calculator-demo
31
+ git init
32
+ ```
33
+
34
+ ## Step 3: Brainstorm
35
+
36
+ Open Claude Code and run:
37
+
38
+ ```
39
+ /rlp-desk brainstorm "Python calculator module with add, subtract, multiply, divide functions and pytest tests"
40
+ ```
41
+
42
+ The brainstorm phase interactively determines:
43
+
44
+ | Item | Example |
45
+ |------|---------|
46
+ | **Slug** | `loop-test` |
47
+ | **Objective** | Implement calc.py + test_calc.py |
48
+ | **User Stories** | US-001: calculator functions, US-002: pytest tests |
49
+ | **Iteration Unit** | One user story per iteration |
50
+ | **Verification** | `python3 -m pytest test_calc.py -v` |
51
+ | **Models** | Worker: sonnet, Verifier: sonnet |
52
+ | **Max Iterations** | 10 |
53
+
54
+ On approval, brainstorm offers to run `init` automatically.
55
+
56
+ ## Step 4: Initialize (if not done in brainstorm)
57
+
58
+ ```
59
+ /rlp-desk init loop-test "Python calculator with tests"
60
+ ```
61
+
62
+ This creates the scaffold:
63
+
64
+ ```
65
+ .claude/ralph-desk/
66
+ ├── prompts/
67
+ │   ├── loop-test.worker.prompt.md
68
+ │   └── loop-test.verifier.prompt.md
69
+ ├── context/
70
+ │   └── loop-test-latest.md
71
+ ├── memos/
72
+ │   └── loop-test-memory.md
73
+ ├── plans/
74
+ │   ├── prd-loop-test.md
75
+ │   └── test-spec-loop-test.md
76
+ └── logs/loop-test/
77
+ ```
78
+
79
+ ## Step 5: Customize the PRD
80
+
81
+ Edit `.claude/ralph-desk/plans/prd-loop-test.md` to define your user stories and acceptance criteria. See [`examples/calculator/`](../examples/calculator/.claude/ralph-desk/plans/prd-loop-test.md) for a complete example.
82
+
83
+ Key sections:
84
+ - **User Stories** with specific, testable acceptance criteria
85
+ - **Technical Constraints** (e.g., "Python 3 + pytest only")
86
+ - **Done When** conditions
87
+
88
+ ## Step 6: Define the Test Spec
89
+
90
+ Edit `.claude/ralph-desk/plans/test-spec-loop-test.md` to specify verification commands:
91
+
92
+ ```markdown
93
+ ## Verification Commands
94
+ ### Test
95
+ python3 -m pytest test_calc.py -v
96
+
97
+ ## Criteria → Verification Mapping
98
+ | Criterion | Method | Command |
99
+ |-----------|--------|---------|
100
+ | calc.py exists | automated | test -f calc.py |
101
+ | All tests pass | automated | python3 -m pytest test_calc.py -v |
102
+ ```
103
+
104
+ ## Step 7: Run the Loop
105
+
106
+ ```
107
+ /rlp-desk run loop-test
108
+ ```
109
+
110
+ What happens:
111
+
112
+ 1. **Iteration 1**: Worker reads the PRD, implements `calc.py` (US-001), updates memory
113
+ 2. **Iteration 2**: Worker reads memory, implements `test_calc.py` (US-002), writes done-claim
114
+ 3. **Iteration 3**: Verifier runs all checks, writes pass verdict
115
+ 4. Leader writes COMPLETE sentinel, reports success
116
+
117
+ You'll see status updates after each iteration:
118
+
119
+ ```
120
+ Iteration 1 | Worker (sonnet) | US-001 complete, continuing
121
+ Iteration 2 | Worker (sonnet) | All stories done, requesting verification
122
+ Iteration 3 | Verifier (sonnet) | PASS — all criteria met
123
+ ✓ COMPLETE
124
+ ```
125
+
126
+ ## Step 8: Check Status
127
+
128
+ At any point during or after a run:
129
+
130
+ ```
131
+ /rlp-desk status loop-test
132
+ /rlp-desk logs loop-test # latest iteration
133
+ /rlp-desk logs loop-test 2 # specific iteration
134
+ ```
135
+
136
+ ## Step 9: Re-run (if needed)
137
+
138
+ If you want to run the loop again:
139
+
140
+ ```
141
+ /rlp-desk clean loop-test
142
+ /rlp-desk run loop-test
143
+ ```
144
+
145
+ `clean` removes runtime artifacts (sentinels, claims, verdicts) but preserves the PRD, test spec, and prompts.
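
As a rough picture of what such a cleanup involves, here is a minimal Python sketch. It assumes the runtime artifacts are exactly the four `memos/` files named in the protocol reference; `clean_runtime` and `RUNTIME_SUFFIXES` are hypothetical names, and the real command may remove more or less than this.

```python
from pathlib import Path

# Assumed runtime artifacts: sentinels, done-claim, and verdict under memos/.
RUNTIME_SUFFIXES = (
    "-complete.md",
    "-blocked.md",
    "-done-claim.json",
    "-verify-verdict.json",
)


def clean_runtime(desk_dir: Path, slug: str) -> list[str]:
    """Delete runtime artifacts for `slug`; leave plans/ and prompts/ untouched."""
    removed = []
    for suffix in RUNTIME_SUFFIXES:
        target = desk_dir / "memos" / f"{slug}{suffix}"
        if target.exists():
            target.unlink()
            removed.append(target.name)
    return removed
```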
146
+
147
+ ## Tips
148
+
149
+ - **Start small**: One or two user stories for your first loop
150
+ - **Be specific in acceptance criteria**: "function returns float" is testable; "function works well" is not
151
+ - **Include verification commands**: The verifier needs concrete commands to run
152
+ - **One story per iteration**: Each worker should do one bounded action
153
+ - **Check logs when stuck**: `.claude/ralph-desk/logs/<slug>/iter-NNN.worker-prompt.md` shows exactly what the worker received
@@ -0,0 +1,220 @@
1
+ # Protocol Reference
2
+
3
+ Complete specification of the RLP Desk leader loop protocol, signal contracts, circuit breakers, and model routing.
4
+
5
+ ## Leader Loop Protocol
6
+
7
+ ```
8
+ for iteration in 1..max_iter:
9
+
10
+ ① Check sentinels
11
+ - <slug>-complete.md exists → stop (success)
12
+ - <slug>-blocked.md exists → stop (failure)
13
+
14
+ ② Read <slug>-memory.md
15
+ - Parse "Stop Status" section → continue/verify/blocked
16
+ - Parse "Next Iteration Contract" → task for this iteration
17
+
18
+ ③ Select model
19
+ - Apply model routing rules (see below)
20
+ - Check circuit breaker conditions
21
+
22
+ ④ Build Worker prompt
23
+ - Read base prompt from prompts/<slug>.worker.prompt.md
24
+ - Append iteration number + contract from memory
25
+ - Write audit copy to logs/<slug>/iter-NNN.worker-prompt.md
26
+
27
+ ⑤ Execute Worker
28
+ Agent(subagent_type="executor", model=selected, prompt=prompt)
29
+ - Synchronous return — wait for completion
30
+ - Each Agent() = fresh context (new subprocess)
31
+
32
+ ⑥ Read <slug>-memory.md again (Worker updated it)
33
+ - stop=continue → go to ⑧
34
+ - stop=verify → go to ⑦
35
+ - stop=blocked → write BLOCKED sentinel, stop
36
+
37
+ ⑦ Execute Verifier
38
+ - Build verifier prompt → write to logs/<slug>/iter-NNN.verifier-prompt.md
39
+ - Agent(subagent_type="executor", model=selected, prompt=prompt)
40
+ - Read <slug>-verify-verdict.json:
41
+ • verdict=pass + recommended=complete → write COMPLETE sentinel, stop
42
+ • verdict=fail + recommended=continue → go to ⑧
43
+ • verdict=blocked → write BLOCKED sentinel, stop
44
+
45
+ ⑧ Update status.json, report to user, clean runtime files, next iteration
46
+ ```
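
The control flow of the protocol above can be sketched in Python. This is a simplified illustration, not the shipped implementation: the Worker and Verifier are injected as plain callables (stand-ins for `Agent()` dispatches), sentinel and status file handling is omitted, and `run_leader_loop` and `LoopResult` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    phase: str      # "complete", "blocked", or "timeout"
    iteration: int


def run_leader_loop(
    max_iter: int,
    run_worker: Callable[[int], str],     # returns stop status: continue | verify | blocked
    run_verifier: Callable[[int], dict],  # returns the parsed verdict
) -> LoopResult:
    for iteration in range(1, max_iter + 1):
        stop = run_worker(iteration)
        if stop == "blocked":
            return LoopResult("blocked", iteration)
        if stop == "verify":
            verdict = run_verifier(iteration)
            if verdict.get("verdict") == "blocked":
                return LoopResult("blocked", iteration)
            if (
                verdict.get("verdict") == "pass"
                and verdict.get("recommended_state_transition") == "complete"
            ):
                return LoopResult("complete", iteration)
        # "continue" or a failed verification: proceed to the next iteration.
    return LoopResult("timeout", max_iter)
```

Injecting the two callables keeps the loop testable without spawning any agents.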
47
+
48
+ ## Signal Contracts
49
+
50
+ ### Campaign Memory (`<slug>-memory.md`)
51
+
52
+ Written by the Worker at the end of each iteration. Must contain:
53
+
54
+ ```markdown
55
+ # <slug> - Campaign Memory
56
+
57
+ ## Stop Status
58
+ continue | verify | blocked
59
+
60
+ ## Objective
61
+ <original objective>
62
+
63
+ ## Current State
64
+ Iteration N - <description>
65
+
66
+ ## Next Iteration Contract
67
+ <specific task for the next worker>
68
+
69
+ ## Patterns Discovered
70
+ ## Learnings
71
+ ## Evidence Chain
72
+ ```
73
+
74
+ The Leader reads **Stop Status** and **Next Iteration Contract** to decide what happens next.
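
One way to extract those two sections is sketched below in Python. It assumes the memory file uses the exact `## ` headings from the template above; `parse_sections` and `read_stop_status` are hypothetical helpers, not part of RLP Desk.

```python
import re


def parse_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the stripped text beneath it."""
    sections: dict[str, str] = {}
    current = None
    lines: list[str] = []
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections


def read_stop_status(markdown: str) -> str:
    """Return the stop status, rejecting anything outside the three allowed values."""
    status = parse_sections(markdown).get("Stop Status", "").lower()
    if status not in {"continue", "verify", "blocked"}:
        raise ValueError(f"unrecognized stop status: {status!r}")
    return status
```

Rejecting unknown values early means a malformed memory file stops the loop instead of being read as "continue".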
75
+
76
+ ### Done Claim (`<slug>-done-claim.json`)
77
+
78
+ Written by the Worker when claiming all work is complete:
79
+
80
+ ```json
81
+ {
82
+ "iteration": 3,
83
+ "claimed_at_utc": "2025-01-15T10:30:00Z",
84
+ "summary": "All user stories implemented and tests passing",
85
+ "stories_completed": ["US-001", "US-002"],
86
+ "evidence": {
87
+ "test_output": "8 passed in 0.05s",
88
+ "files_created": ["calc.py", "test_calc.py"]
89
+ }
90
+ }
91
+ ```
92
+
93
+ ### Verify Verdict (`<slug>-verify-verdict.json`)
94
+
95
+ Written by the Verifier after independent verification:
96
+
97
+ ```json
98
+ {
99
+ "verdict": "pass|fail|blocked",
100
+ "verified_at_utc": "2025-01-15T10:35:00Z",
101
+ "summary": "All criteria verified with fresh evidence",
102
+ "criteria_results": [
103
+ {
104
+ "criterion": "US-001 AC1: calc.py exists",
105
+ "met": true,
106
+ "evidence": "test -f calc.py → exit 0"
107
+ }
108
+ ],
109
+ "missing_evidence": [],
110
+ "issues": [],
111
+ "recommended_state_transition": "complete|continue|blocked",
112
+ "next_iteration_contract": "Fix failing test for divide by zero",
113
+ "evidence_paths": ["test_calc.py::test_divide_by_zero"]
114
+ }
115
+ ```
116
+
117
+ ### Sentinels
118
+
119
+ Leader-only files that terminate the loop:
120
+
121
+ | File | Meaning | Written When |
122
+ |------|---------|--------------|
123
+ | `<slug>-complete.md` | Loop succeeded | Verifier passes all criteria |
124
+ | `<slug>-blocked.md` | Loop cannot continue | Autonomous blocker or circuit breaker |
125
+
126
+ **Only the Leader writes sentinels.** Workers and Verifiers never touch them.
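
The sentinel check at the top of each iteration amounts to two file-existence tests. A minimal Python sketch, assuming the sentinels live in the `memos/` directory (`check_sentinels` is a hypothetical name):

```python
from pathlib import Path
from typing import Optional


def check_sentinels(memos_dir: Path, slug: str) -> Optional[str]:
    """Return 'complete' or 'blocked' if a sentinel exists, else None."""
    if (memos_dir / f"{slug}-complete.md").exists():
        return "complete"
    if (memos_dir / f"{slug}-blocked.md").exists():
        return "blocked"
    return None
```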
127
+
128
+ ## Context File (`<slug>-latest.md`)
129
+
130
+ Updated by the Worker each iteration to reflect the current frontier:
131
+
132
+ ```markdown
133
+ # <slug> - Latest Context
134
+
135
+ ## Current Frontier
136
+ ### Completed
137
+ - US-001: calculator functions implemented
138
+ ### In Progress
139
+ - US-002: pytest tests
140
+ ### Next
141
+ - Run verification
142
+
143
+ ## Key Decisions
144
+ - Using ValueError for divide-by-zero (not ZeroDivisionError)
145
+
146
+ ## Known Issues
147
+ ## Files Changed This Iteration
148
+ - calc.py (created)
149
+ ## Verification Status
150
+ - python3 -m pytest → not yet run
151
+ ```
152
+
153
+ ## Circuit Breakers
154
+
155
+ | Condition | Detection | Action |
156
+ |-----------|-----------|--------|
157
+ | Stale context | `context/<slug>-latest.md` hash unchanged for 3 consecutive iterations | Write BLOCKED sentinel |
158
+ | Repeated error | Worker produces the same error message 2 iterations in a row | Upgrade model, retry once; still failing → BLOCKED |
159
+ | Timeout | Iteration count reaches `max_iter` | Write TIMEOUT status, report to user |
160
+
161
+ ### Stale Context Detection
162
+
163
+ The Leader computes a hash (or diff) of `context/<slug>-latest.md` before and after each Worker runs. If the content does not change for 3 consecutive iterations, the Worker is considered stuck and the loop is blocked.
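
A hash-based version of this check might look like the Python sketch below. The choice of SHA-256 and the `StaleContextDetector` class are assumptions made for illustration; the protocol only requires that unchanged content be detected.

```python
import hashlib


class StaleContextDetector:
    """Counts consecutive observations in which the context file did not change."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.last_digest = None
        self.unchanged = 0

    def observe(self, content: str) -> bool:
        """Record the context content; return True when the loop should be blocked."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest == self.last_digest:
            self.unchanged += 1
        else:
            self.last_digest = digest
            self.unchanged = 0
        return self.unchanged >= self.limit
```

The first call records a baseline (the context before the first Worker runs); each later call corresponds to one completed iteration.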
164
+
165
+ ### Error Escalation
166
+
167
+ ```
168
+ Error in iteration N (sonnet) → retry with opus in iteration N+1
169
+ Same error in iteration N+1 (opus) → BLOCKED
170
+ ```
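
The "upgrade, retry once, then BLOCKED" rule from the circuit breaker table can be sketched as a small state machine. This Python sketch is one possible reading of that rule, not the actual implementation: `ErrorBreaker` is a hypothetical name, and errors are compared by exact message text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorBreaker:
    last_error: Optional[str] = None
    upgraded: bool = False

    def record(self, error: Optional[str]) -> str:
        """Return 'ok', 'upgrade', or 'blocked' for this iteration's outcome."""
        if error is None or error != self.last_error:
            # No error, or a different error: start tracking afresh.
            self.last_error = error
            self.upgraded = False
            return "ok"
        if self.upgraded:
            # Same error again even after the model upgrade.
            return "blocked"
        # Same error twice in a row: upgrade the model and retry once.
        self.upgraded = True
        return "upgrade"
```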
171
+
172
+ ## Model Routing
173
+
174
+ ### Selection Matrix
175
+
176
+ | Scenario | Model | Rationale |
177
+ |----------|-------|-----------|
178
+ | Single file, clear change | `haiku` | Fast, sufficient |
179
+ | Standard implementation | `sonnet` | Balanced (default) |
180
+ | Multi-file, architecture | `opus` | Needs broad understanding |
181
+ | Previous iteration failed | upgrade | A more capable model may succeed |
182
+ | Verification (standard) | `sonnet` | Sufficient for running checks |
183
+ | Verification (security) | `opus` | Critical logic needs thoroughness |
184
+
185
+ ### Dynamic Adaptation
186
+
187
+ The Leader reassesses the model every iteration:
188
+
189
+ 1. Read memory for previous iteration outcome
190
+ 2. If failed → upgrade one level (haiku → sonnet → opus)
191
+ 3. If simple/repetitive → consider downgrade
192
+ 4. User override via `--worker-model` / `--verifier-model` takes precedence
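
The four rules above can be condensed into a single function. The following Python sketch is illustrative only; `choose_model`, `LADDER`, and the boolean inputs are hypothetical, and how the Leader actually judges a task as "simple" is not specified here.

```python
from typing import Optional

LADDER = ["haiku", "sonnet", "opus"]


def choose_model(
    previous: str,
    failed: bool,
    simple: bool,
    override: Optional[str] = None,
) -> str:
    """Pick the model for the next iteration."""
    if override is not None:
        # A user-specified model always takes precedence.
        return override
    level = LADDER.index(previous)
    if failed:
        level = min(level + 1, len(LADDER) - 1)  # upgrade one level, capped at opus
    elif simple:
        level = max(level - 1, 0)                # downgrade one level, floored at haiku
    return LADDER[level]
```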
193
+
194
+ ## Status File (`status.json`)
195
+
196
+ Updated by the Leader after each iteration:
197
+
198
+ ```json
199
+ {
200
+ "slug": "loop-test",
201
+ "iteration": 2,
202
+ "max_iter": 100,
203
+ "phase": "worker|verifier|complete|blocked|timeout",
204
+ "worker_model": "sonnet",
205
+ "verifier_model": "sonnet",
206
+ "last_result": "continue|verify|pass|fail|blocked",
207
+ "updated_at_utc": "2025-01-15T10:30:00Z"
208
+ }
209
+ ```
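
A minimal Python sketch of writing this file follows. The write-to-temp-then-rename step is an assumption added so a reader never sees a half-written file; `write_status` is a hypothetical helper, and the field names are taken from the example above.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def write_status(path: Path, **fields) -> dict:
    """Write status.json with a fresh UTC timestamp and return the written data."""
    status = dict(fields)
    status["updated_at_utc"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(status, indent=2) + "\n", encoding="utf-8")
    os.replace(tmp, path)  # atomic rename on the same filesystem
    return status
```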
210
+
211
+ ## Slash Command Reference
212
+
213
+ | Command | Arguments | Description |
214
+ |---------|-----------|-------------|
215
+ | `brainstorm` | `<description>` | Interactive planning before init |
216
+ | `init` | `<slug> [objective]` | Create project scaffold |
217
+ | `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M]` | Run the leader loop |
218
+ | `status` | `<slug>` | Display current loop status |
219
+ | `logs` | `<slug> [N]` | Show iteration logs |
220
+ | `clean` | `<slug>` | Remove runtime artifacts for re-run |