@ai-dev-methodologies/rlp-desk 0.0.2 → 0.1.0

package/README.md CHANGED
@@ -30,15 +30,20 @@ Or without npm:
30
30
  curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
31
31
  ```
32
32
 
33
- ### 2. Brainstorm
33
+ ### 2. Brainstorm (recommended)
34
34
 
35
- In your project directory, start a Claude Code session:
35
+ **Always start with brainstorm.** It interactively walks you through the project contract:
36
36
 
37
37
  ```
38
38
  /rlp-desk brainstorm "implement a Python calculator with tests"
39
39
  ```
40
40
 
41
- This interactively defines the contract: slug, objective, user stories, verification commands, and iteration settings.
41
+ You'll be asked to confirm each item:
42
+ - **Slug** — project identifier
43
+ - **User Stories** — discrete, testable units of work
44
+ - **Iteration Unit** — one story per iteration (incremental) or all at once (fast)
45
+ - **Verification Commands** — how to check the work
46
+ - **Models** — which Claude model for Worker/Verifier
42
47
 
43
48
  ### 3. Run
44
49
 
@@ -118,7 +123,7 @@ for iteration in 1..max_iter:
118
123
  /rlp-desk run <slug> [--opts] Run the loop (this session = leader)
119
124
  /rlp-desk status <slug> Show loop status
120
125
  /rlp-desk logs <slug> [N] Show iteration logs
121
- /rlp-desk clean <slug> Reset for re-run
126
+ /rlp-desk clean <slug> [--kill-session] Reset for re-run
122
127
  ```
123
128
 
124
129
  ### Run Options
@@ -128,6 +133,65 @@ for iteration in 1..max_iter:
128
133
  | `--max-iter N` | 100 | Maximum iterations before timeout |
129
134
  | `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
130
135
  | `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
136
+ | `--mode agent\|tmux` | agent | Execution mode (see below) |
137
+
138
+ ## Execution Modes
139
+
140
+ RLP Desk supports two execution modes. Both honor the same governance protocol.
141
+
142
+ ### Environment Compatibility
143
+
144
+ | Environment | Agent Mode | Tmux Mode |
145
+ |-------------|-----------|-----------|
146
+ | Claude Code (any terminal) | **Works** | Requires tmux |
147
+ | Inside tmux session | **Works** | **Works** — panes split in current window |
148
+ | Outside tmux session | **Works** | **Rejected** — "start tmux first" |
149
+
150
+ ### Agent Mode (default) — "Smart Mode"
151
+
152
+ ```
153
+ /rlp-desk run calculator
154
+ ```
155
+
156
+ The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. The Leader is an LLM that dynamically routes models and reasons about context.
157
+
158
+ - Works anywhere — no tmux required
159
+ - Dynamic model routing — Leader upgrades models on failure
160
+ - Fix Loop — extracts verifier issues and feeds them back to the next worker
161
+ - Best for interactive development
162
+
163
+ ### Tmux Mode — "Lean Mode"
164
+
165
+ ```
166
+ /rlp-desk run calculator --mode tmux
167
+ ```
168
+
169
+ **Requires running inside a tmux session.** A shell script takes over as Leader and splits your current window into three panes. Workers and Verifiers run `claude` in their own panes (via `claude -p` in the generated trigger scripts), so you can watch them work in real time.
170
+
171
+ ```
172
+ +---------------------+---------------------+
173
+ | Your pane (Leader) | Worker pane |
174
+ | shell loop running | claude TUI running |
175
+ | polls signal files | you see it working |
176
+ | +---------------------+
177
+ | | Verifier pane |
178
+ | | claude TUI running |
179
+ | | (only when needed) |
180
+ +---------------------+---------------------+
181
+ ```
182
+
183
+ - Real-time visibility — watch Worker/Verifier execute live
184
+ - Zero-token orchestration — shell loop, not LLM
185
+ - Automatic cleanup — panes removed on completion
186
+ - Best for long campaigns and observability
187
+
188
+ Prerequisites: `tmux` and `jq` must be installed.
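The compatibility table and prerequisites above amount to a preflight check. The sketch below is a hypothetical illustration, not RLP Desk's own code. `preflight` is an invented name, and the sketch assumes "inside tmux" is detected through the `TMUX` environment variable, which tmux sets for processes in its sessions. It uses `shutil.which` to find `tmux` and `jq`.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    mode: str,
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return an error message if `mode` cannot start here, else None (hypothetical)."""
    if mode == "agent":
        return None  # agent mode works in any terminal
    if mode != "tmux":
        return f"unknown mode: {mode!r} (expected agent or tmux)"
    missing = [tool for tool in ("tmux", "jq") if which(tool) is None]
    if missing:
        return "missing prerequisites: " + ", ".join(missing)
    if not env.get("TMUX"):
        return "start tmux first"  # tmux mode is rejected outside a tmux session
    return None
```

Passing `env` and `which` as parameters keeps the check testable without a real tmux server.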
189
+
190
+ To clean up tmux artifacts:
191
+
192
+ ```
193
+ /rlp-desk clean calculator --kill-session
194
+ ```
131
195
 
132
196
  ## Project Structure
133
197
 
@@ -169,7 +233,7 @@ mkdir my-calc && cd my-calc
169
233
 
170
234
  ## Documentation
171
235
 
172
- - [Architecture](docs/architecture.md) — Design philosophy and the Agent() approach
236
+ - [Architecture](docs/architecture.md) — Design philosophy, Agent() and tmux execution modes
173
237
  - [Getting Started](docs/getting-started.md) — Step-by-step tutorial with the calculator example
174
238
  - [Protocol Reference](docs/protocol-reference.md) — Full protocol specification
175
239
 
@@ -39,14 +39,40 @@ Agent(
39
39
 
40
40
  The Agent returns synchronously. No polling, no signal files, no tmux. The Leader simply reads the filesystem after each Agent completes.
41
41
 
42
- ### Why Agent() Over Other Approaches
43
-
44
- | Approach | Problem |
45
- |----------|---------|
46
- | Single long session | Context drift, token limits |
47
- | tmux + polling | Complex, brittle, race conditions |
48
- | Signal files + sleep loops | Fragile timing, wasted compute |
49
- | **Agent() subprocess** | **Clean, synchronous, guaranteed fresh context** |
42
+ ### Two Execution Modes
43
+
44
+ RLP Desk supports two modes for running the Leader loop. Both honor the same governance protocol (section 7). Choose based on your use case.
45
+
46
+ | Mode | Leader | Model Routing | Session Required | Best For |
47
+ |------|--------|---------------|------------------|----------|
48
+ | **Agent() "Smart mode"** (default) | LLM (current session) | Dynamic — Leader reasons about which model to use each iteration | Active Claude Code session | Interactive development, complex routing decisions |
49
+ | **Tmux — "Lean mode"** | Shell script (`run_ralph_desk.zsh`) | Static — set via `WORKER_MODEL`/`VERIFIER_MODEL` env vars | No Claude Code session after launch; must be started inside tmux | Long campaigns, CI, observability, zero-token orchestration |
50
+
51
+ **Agent() mode** is synchronous and simple: each `Agent()` call blocks until the subprocess finishes, then the Leader reads the filesystem. No polling, no signal files, no tmux.
52
+
53
+ **Tmux mode** trades dynamic routing for visibility and independence. The shell Leader writes prompts to files, sends short trigger commands via `tmux send-keys`, and polls structured JSON signal files (`iter-signal.json`, `verify-verdict.json`) for control flow. It reuses tmux patterns from [omc-teams](https://github.com/anthropics/omc-teams) (write-then-notify, pane ID stability, copy-mode guards, heartbeat monitoring) to keep orchestration reliable and avoid race conditions between panes.
54
+
55
+ The tmux script is a second implementation of the governance protocol. To keep the two implementations aligned, the script carries comments that cite the corresponding governance.md section 7 step numbers.
56
+
57
+ #### Tmux Architecture
58
+
59
+ ```
60
+ [tmux session: rlp-desk-<slug>-<timestamp>]
61
+ +-------------------------------------+
62
+ | Leader pane (shell loop) |
63
+ | - writes prompts to files |
64
+ | - sends short triggers via send-keys|
65
+ | - polls iter-signal.json via jq |
66
+ | - monitors heartbeat files |
67
+ | - writes sentinels |
68
+ +------------------+------------------+
69
+ | Worker pane | Verifier pane |
70
+ | bash trigger.sh | bash trigger.sh |
71
+ | -> claude -p ... | -> claude -p ... |
72
+ | heartbeat writer | heartbeat writer |
73
+ | (fresh context) | (fresh context) |
74
+ +------------------+------------------+
75
+ ```
50
76
 
51
77
  ## Three-Role Architecture
52
78
 
@@ -11,9 +11,16 @@ for iteration in 1..max_iter:
11
11
  - <slug>-complete.md exists → stop (success)
12
12
  - <slug>-blocked.md exists → stop (failure)
13
13
 
14
+ ①½ Prep-stage cleanup (before each iteration)
15
+ - Delete <slug>-done-claim.json if exists [leader-measured]
16
+ - Delete <slug>-verify-verdict.json if exists [leader-measured]
17
+ (Ensures stale runtime files from a previous run cannot mislead the loop)
18
+
14
19
  ② Read memory.md
15
20
  - Parse "Stop Status" section → continue/verify/blocked
16
21
  - Parse "Next Iteration Contract" → task for this iteration
22
+ • Also read "Completed Stories" → track what has been verified
23
+ • Also read "Key Decisions" → architectural choices already settled
17
24
 
18
25
  ③ Select model
19
26
  - Apply model routing rules (see below)
@@ -42,7 +49,8 @@ for iteration in 1..max_iter:
42
49
  • verdict=fail + recommended=continue → go to ⑧
43
50
  • verdict=blocked → write BLOCKED sentinel, stop
44
51
 
45
- Update status.json, report to user, clean runtime files, next iteration
52
+ Write iter-NNN.result.md (see Result Log below)
53
+ Update status.json, report to user, next iteration
46
54
  ```
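Step ①½ can be sketched as below. This is a hypothetical helper, not the shipped implementation. The `memos/` directory is assumed from the example worker prompt, which writes `.claude/ralph-desk/memos/<slug>-done-claim.json`.

```python
from pathlib import Path


def prep_stage_cleanup(memos_dir: Path, slug: str) -> list:
    """Step ①½: delete stale runtime files; return the names that were removed."""
    removed = []
    for name in (f"{slug}-done-claim.json", f"{slug}-verify-verdict.json"):
        path = memos_dir / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```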
47
55
 
48
56
  ## Signal Contracts
@@ -63,15 +71,66 @@ continue | verify | blocked
63
71
  ## Current State
64
72
  Iteration N - <description>
65
73
 
74
+ ## Completed Stories
75
+ - US-001: Calculator add/subtract implemented [interface: `add(a, b) -> float`]
76
+ - US-002: pytest suite — 8 tests passing
77
+
66
78
  ## Next Iteration Contract
67
- <specific task for the next worker>
79
+ **Story**: US-003 Edge case handling
80
+ **Task**: Handle divide-by-zero in calc.py.
81
+ 1. Raise ValueError with message "division by zero"
82
+ 2. Add test_divide_by_zero to test_calc.py
83
+
84
+ **Criteria**:
85
+ - `pytest` exits 0
86
+ - `grep "ValueError" calc.py` matches
87
+
88
+ ## Key Decisions
89
+ - Iteration 2: Chose ValueError over ZeroDivisionError — matches project error style.
90
+ - Iteration 3: Skipped type hints — out of scope per PRD Non-Goals.
68
91
 
69
92
  ## Patterns Discovered
70
93
  ## Learnings
71
94
  ## Evidence Chain
72
95
  ```
73
96
 
74
- The Leader reads **Stop Status** and **Next Iteration Contract** to decide what happens next.
97
+ The Leader reads:
98
+ - **Stop Status** and **Next Iteration Contract** to decide what happens next.
99
+ - **Completed Stories** to track verified work without re-reading full history.
100
+ - **Key Decisions** to carry forward settled architectural choices.
101
+
102
+ All sections use plain Markdown. No YAML.
103
+
104
+ ### Iteration Signal (`<slug>-iter-signal.json`)
105
+
106
+ Written by the Worker at the end of every iteration. Provides a structured JSON signal for the Leader to detect iteration completion without parsing markdown.
107
+
108
+ ```json
109
+ {
110
+ "iteration": 3,
111
+ "status": "continue|verify|blocked",
112
+ "summary": "Completed US-001, other stories remain",
113
+ "timestamp": "2025-01-15T10:30:00Z"
114
+ }
115
+ ```
116
+
117
+ | Field | Type | Description |
118
+ |-------|------|-------------|
119
+ | `iteration` | number | Current iteration number |
120
+ | `status` | string | One of: `continue`, `verify`, `blocked` |
121
+ | `summary` | string | Brief description of what was accomplished |
122
+ | `timestamp` | string | ISO 8601 UTC timestamp |
123
+
124
+ **Status values:**
125
+ - `continue` -- Current action done but more work remains. Leader proceeds to next iteration.
126
+ - `verify` -- All work complete and done-claim written. Leader dispatches Verifier.
127
+ - `blocked` -- Autonomous blocker encountered. Leader writes BLOCKED sentinel.
128
+
129
+ **Usage by mode:**
130
+ - **Tmux mode:** The shell Leader polls for this file's existence after dispatching the Worker. Once it appears, the Leader reads the `status` field via `jq` to decide the next step. This is the primary control-flow mechanism in tmux mode.
131
+ - **Agent() mode:** The Leader MAY read this file as a structured alternative to parsing `memory.md`'s Stop Status section. Agent() mode primarily uses memory.md, so iter-signal.json is supplementary.
132
+
133
+ **Worker obligation:** The Worker MUST write this file at the end of every iteration, regardless of execution mode. This ensures both Agent() and tmux modes can use the same Worker prompt templates.
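A Leader-side check of this signal might look like the following sketch, which rests on several assumptions. The step labels (`next_iteration`, `dispatch_verifier`, `write_blocked_sentinel`) are invented names. The iteration-number check is an extra safeguard against reading a leftover signal from an earlier iteration; the protocol above does not require it. In tmux mode, the shell Leader does the equivalent with `jq -r '.status'`.

```python
import json

VALID_STATUSES = {"continue", "verify", "blocked"}
NEXT_STEP = {
    "continue": "next_iteration",   # more work remains
    "verify": "dispatch_verifier",  # all work done, done-claim written
    "blocked": "write_blocked_sentinel",
}


def read_iter_signal(raw: str, expected_iteration: int) -> str:
    """Validate an iter-signal.json payload and return the Leader's next step."""
    signal = json.loads(raw)
    missing = {"iteration", "status", "summary", "timestamp"} - signal.keys()
    if missing:
        raise ValueError(f"iter-signal missing fields: {sorted(missing)}")
    if signal["iteration"] != expected_iteration:
        raise ValueError(f"stale signal from iteration {signal['iteration']}")
    if signal["status"] not in VALID_STATUSES:
        raise ValueError(f"unknown status: {signal['status']!r}")
    return NEXT_STEP[signal["status"]]
```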
75
134
 
76
135
  ### Done Claim (`<slug>-done-claim.json`)
77
136
 
@@ -92,11 +151,15 @@ Written by the Worker when claiming all work is complete:
92
151
 
93
152
  ### Verify Verdict (`<slug>-verify-verdict.json`)
94
153
 
95
- Written by the Verifier after independent verification:
154
+ Written by the Verifier after independent verification.
155
+
156
+ **Tmux mode polling:** In tmux mode, after dispatching the Verifier, the shell Leader polls for the existence of `verify-verdict.json` (same pattern as `iter-signal.json`). Once it appears, the Leader reads the `verdict` and `recommended_state_transition` fields via `jq` to decide whether to write a COMPLETE sentinel, continue iterating, or write a BLOCKED sentinel.
157
+
158
+ **Schema:**
96
159
 
97
160
  ```json
98
161
  {
99
- "verdict": "pass|fail|blocked",
162
+ "verdict": "pass|fail|request_info",
100
163
  "verified_at_utc": "2025-01-15T10:35:00Z",
101
164
  "summary": "All criteria verified with fresh evidence",
102
165
  "criteria_results": [
@@ -107,13 +170,76 @@ Written by the Verifier after independent verification:
107
170
  }
108
171
  ],
109
172
  "missing_evidence": [],
110
- "issues": [],
173
+ "issues": [
174
+ {
175
+ "criterion": "US-002 AC1",
176
+ "description": "Test file missing",
177
+ "severity": "critical|major|minor",
178
+ "fix_hint": "(suggestion, non-authoritative) Add test_calc.py"
179
+ }
180
+ ],
111
181
  "recommended_state_transition": "complete|continue|blocked",
112
182
  "next_iteration_contract": "Fix failing test for divide by zero",
113
183
  "evidence_paths": ["test_calc.py::test_divide_by_zero"]
114
184
  }
115
185
  ```
116
186
 
187
+ **Verdict values:**
188
+ - `pass`: all criteria met — Leader may write COMPLETE sentinel
189
+ - `fail`: one or more criteria not met — Leader reads issues, builds next contract
190
+ - `request_info`: Verifier cannot determine pass/fail without more information — summary contains specific questions; Leader decides outcome and may relay questions to Worker
191
+
192
+ **Issues severity:**
193
+ - `critical`: blocking — must be fixed before COMPLETE
194
+ - `major`: significant gap in acceptance criteria
195
+ - `minor`: cosmetic or non-blocking concern
196
+
197
+ **Verifier scope:**
198
+ - Identify changed files via `git diff --name-only` — read those files and their direct imports only
199
+ - Campaign Memory (`<slug>-memory.md`) is for orientation only — not the source of truth for verification
200
+ - Delegate deterministic checks (type hints, linting, security) to tools defined in test-spec
201
+ - Focus on: AC verification, semantic review, smoke tests
202
+ - Do NOT use `fail` when uncertain — use `request_info` with specific questions instead
203
+
204
+ ### Fix Loop Protocol
205
+
206
+ When the Verifier returns `fail`, the Leader executes the Fix Loop before dispatching the next Worker:
207
+
208
+ #### Flow
209
+
210
+ ```
211
+ Verifier fail
212
+ → Leader reads verify-verdict.json issues
213
+ → Sort issues by severity: critical → major → minor
214
+ → Build structured fix contract (see format below)
215
+ → Increment consecutive_failures in status.json
216
+ → Dispatch Worker with fix contract as Next Iteration Contract
217
+ ```
218
+
219
+ #### Fix Contract Format
220
+
221
+ ```markdown
222
+ ## Next Iteration Contract
223
+ **Mode**: fix
224
+ **Verifier verdict reference**: iter-NNN
225
+
226
+ **Issues to fix** (severity-sorted):
227
+ 1. [critical] US-002 AC3: <description>
228
+ - fix_hint: (suggestion, non-authoritative) <hint text>
229
+ 2. [major] US-001 AC1: <description>
230
+ 3. [minor] US-003 AC2: <description>
231
+
232
+ **Traceability rule**: Only changes that resolve a listed issue are allowed.
233
+ Every change must be justified by the issue it addresses.
234
+ ```
235
+
236
+ #### Rules
237
+
238
+ - `fix_hint` is optional. When present it is labeled `(suggestion, non-authoritative)` — the Worker may choose a different approach.
239
+ - **Traceability**: the Worker must not introduce changes beyond what is needed to resolve the listed issues.
240
+ - The Leader increments `consecutive_failures` in `status.json` after each `fail` verdict, and resets it to 0 after any `pass`.
241
+ - The Leader (not the Worker) owns the `consecutive_failures` counter.
242
+
117
243
  ### Sentinels
118
244
 
119
245
  Leader-only files that terminate the loop:
@@ -155,7 +281,8 @@ Updated by the Worker each iteration to reflect the current frontier:
155
281
  | Condition | Detection | Action |
156
282
  |-----------|-----------|--------|
157
283
  | Stale context | `context-latest.md` hash unchanged for 3 consecutive iterations | Write BLOCKED sentinel |
158
- | Repeated error | Worker produces the same error message 2 iterations in a row | Upgrade model, retry once; still failing → BLOCKED |
284
+ | Repeated criterion failure | Same acceptance criterion fails in 2 consecutive Verifier verdicts | Upgrade model, retry once; still failing → BLOCKED |
285
+ | Persistent diverse failures | 3 consecutive **fail** verdicts on 3 unique acceptance criterion IDs | Upgrade to opus, retry once; still failing → BLOCKED |
159
286
  | Timeout | Iteration count reaches `max_iter` | Write TIMEOUT status, report to user |
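The stale-context row can be sketched with a content hash. The protocol allows "a hash (or diff)"; the choice of SHA-256 here is an assumption, and the class name is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


class StaleContextDetector:
    """Signals BLOCKED when context-latest.md is unchanged for `limit` iterations."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.last_hash: Optional[str] = None
        self.unchanged = 0

    def observe(self, context_file: Path) -> bool:
        """Call after each iteration; returns True when the loop should be BLOCKED."""
        digest = hashlib.sha256(context_file.read_bytes()).hexdigest()
        if digest == self.last_hash:
            self.unchanged += 1
        else:
            self.last_hash, self.unchanged = digest, 0  # progress resets the count
        return self.unchanged >= self.limit
```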
160
287
 
161
288
  ### Stale Context Detection
@@ -165,10 +292,20 @@ The Leader computes a hash (or diff) of `context-latest.md` before and after eac
165
292
  ### Error Escalation
166
293
 
167
294
  ```
168
- Error in iteration N (sonnet) → retry with opus in iteration N+1
169
- Same error in iteration N+1 (opus) → BLOCKED
295
+ Same acceptance criterion fails in iteration N (sonnet) → retry with opus in iteration N+1
296
+ Same acceptance criterion still fails in iteration N+1 (opus) → BLOCKED
170
297
  ```
171
298
 
299
+ "Same error" is defined as: **the same acceptance criterion ID appears in the `issues` list of two consecutive Verifier `fail` verdicts.** A `request_info` verdict does not break or contribute to this chain — only `fail` verdicts are counted.
300
+
301
+ ### Consecutive Failures Counter
302
+
303
+ The Leader maintains `consecutive_failures` in `status.json`. This counter:
304
+ - Increments by 1 after each Verifier `fail` verdict
305
+ - Resets to 0 after any Verifier `pass` verdict
306
+ - **Unchanged** by `request_info` verdicts (neither increments nor resets)
307
+ - Triggers the 3-consecutive-diverse-failures CB when it reaches 3 and the 3 most recent `fail` verdicts each have a unique criterion ID
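The counter rules and both criterion-based circuit breakers can be combined into one small tracker. This sketch rests on several assumptions. The action labels are invented. "3 unique criterion IDs" is read as "no criterion ID repeats across the last three `fail` verdicts". A single `escalated` flag models "retry once; still failing, then BLOCKED".

```python
from dataclasses import dataclass, field


@dataclass
class FailureTracker:
    """Sketch of the Leader-owned failure counter and circuit breakers."""

    consecutive_failures: int = 0
    history: list = field(default_factory=list)  # criterion-ID sets of recent fail verdicts
    escalated: bool = False  # a model upgrade has already been tried

    def record(self, verdict: str, failing_criteria: list) -> str:
        if verdict == "request_info":
            return "continue"  # counter unchanged; the fail chain is neither broken nor extended
        if verdict == "pass":
            self.consecutive_failures, self.history, self.escalated = 0, [], False
            return "continue"
        if verdict != "fail":
            raise ValueError(f"unknown verdict: {verdict!r}")
        self.consecutive_failures += 1
        current = set(failing_criteria)
        repeated = bool(self.history) and bool(self.history[-1] & current)
        self.history = (self.history + [current])[-3:]
        ids = [c for criteria in self.history for c in criteria]
        diverse = self.consecutive_failures >= 3 and len(ids) == len(set(ids))
        if repeated or diverse:
            if self.escalated:
                return "blocked"  # the upgraded model still fails
            self.escalated = True
            return "upgrade_model"
        return "fix_loop"
```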
308
+
172
309
  ## Model Routing
173
310
 
174
311
  ### Selection Matrix
@@ -191,6 +328,29 @@ The Leader reassesses the model every iteration:
191
328
  3. If simple/repetitive → consider downgrade
192
329
  4. User override via `--worker-model` / `--verifier-model` takes precedence
193
330
 
331
+ ## Result Log (`iter-NNN.result.md`)
332
+
333
+ Written by the Leader after each iteration completes (step ⑧). Stored in `logs/<slug>/`.
334
+
335
+ ```markdown
336
+ # Iteration NNN Result
337
+
338
+ ## Result Status
339
+ pass | fail | continue [leader-measured]
340
+
341
+ ## Files Changed
342
+ (output of `git diff --stat HEAD~1 HEAD`) [git-measured]
343
+
344
+ ## Summary
345
+ <1–2 sentence summary of what the Worker did this iteration>
346
+
347
+ ## Verifier Verdict
348
+ pass | fail | request_info | (not run) [leader-measured]
349
+ ```
350
+
351
+ - `[leader-measured]`: value determined by the Leader reading memory/verdict files.
352
+ - `[git-measured]`: value determined by running `git diff --stat` — not from Worker's claim.
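A Leader could render the result log as below. Both helpers are hypothetical. The `[git-measured]` section comes from running git rather than from the Worker's report. The `(no changes)` placeholder for an empty diff is an assumption.

```python
import subprocess
from pathlib import Path


def git_diff_stat(repo: Path) -> str:
    """[git-measured] Output of `git diff --stat HEAD~1 HEAD` for the repository."""
    return subprocess.run(
        ["git", "diff", "--stat", "HEAD~1", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout


def render_result_log(iteration: int, result_status: str, diff_stat: str,
                      summary: str, verifier_verdict: str = "(not run)") -> str:
    """Render the contents of iter-NNN.result.md."""
    return "\n".join([
        f"# Iteration {iteration:03d} Result",
        "",
        "## Result Status",
        result_status,
        "",
        "## Files Changed",
        diff_stat.strip() or "(no changes)",  # placeholder text is an assumption
        "",
        "## Summary",
        summary,
        "",
        "## Verifier Verdict",
        verifier_verdict,
        "",
    ])
```

The file name would follow the same pattern, for example `f"iter-{iteration:03d}.result.md"` under `logs/<slug>/`.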
353
+
194
354
  ## Status File (`status.json`)
195
355
 
196
356
  Updated by the Leader after each iteration:
@@ -204,17 +364,110 @@ Updated by the Leader after each iteration:
204
364
  "worker_model": "sonnet",
205
365
  "verifier_model": "opus",
206
366
  "last_result": "continue|verify|pass|fail|blocked",
367
+ "consecutive_failures": 0,
207
368
  "updated_at_utc": "2025-01-15T10:30:00Z"
208
369
  }
209
370
  ```
210
371
 
372
+ - `consecutive_failures`: number of consecutive Verifier `fail` verdicts since the last `pass`. Reset to 0 on any `pass`. Unchanged by `request_info`. Used by the Circuit Breaker (see above).
373
+ - `last_failing_criteria`: (optional) array of criterion IDs from recent `fail` verdicts, used by Leader to detect same-criterion and diverse-failure CB patterns. Leaders may add additional tracking fields as needed.
374
+
375
+ ## Project Plans Files
376
+
377
+ The `plans/` directory holds documents that define the project's acceptance criteria and verification approach:
378
+
379
+ | File | Required | Description |
380
+ |------|----------|-------------|
381
+ | `plans/prd-<slug>.md` | Yes | Product Requirements Document — user stories, acceptance criteria, non-goals |
382
+ | `plans/test-spec-<slug>.md` | Yes | Test specification — verification commands, criteria-to-test mapping |
383
+ | `plans/quality-spec-<slug>.md` | Optional | Additional quality constraints (coding standards, performance budgets, security requirements) |
384
+
385
+ The `quality-spec` file is not generated by `init`. Create it manually when a project requires additional quality constraints beyond the acceptance criteria in the PRD.
386
+
211
387
  ## Slash Command Reference
212
388
 
213
389
  | Command | Arguments | Description |
214
390
  |---------|-----------|-------------|
215
391
  | `brainstorm` | `<description>` | Interactive planning before init |
216
392
  | `init` | `<slug> [objective]` | Create project scaffold |
217
- | `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M]` | Run the leader loop |
393
+ | `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M] [--mode agent\|tmux]` | Run the leader loop |
218
394
  | `status` | `<slug>` | Display current loop status |
219
395
  | `logs` | `<slug> [N]` | Show iteration logs |
220
- | `clean` | `<slug>` | Remove runtime artifacts for re-run |
396
+ | `clean` | `<slug> [--kill-session]` | Remove runtime artifacts for re-run |
397
+
398
+ ### `--mode` Flag
399
+
400
+ The `run` command accepts `--mode agent|tmux` (default: `agent`).
401
+
402
+ - **`--mode agent`** (default): The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. Synchronous, no tmux required.
403
+ - **`--mode tmux`**: Validates the scaffold, checks prerequisites (`tmux`, `jq`), then launches `run_ralph_desk.zsh` as the Leader. The LLM session exits after launching the script. The shell script runs independently in a tmux session.
404
+
405
+ ### `--kill-session` Flag
406
+
407
+ The `clean` command accepts `--kill-session` to kill any tmux sessions matching the slug pattern (`rlp-desk-<slug>-*`) in addition to removing runtime files.
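One way to implement `--kill-session` is sketched below. The sketch assumes the session-name format `rlp-desk-<slug>-YYYYMMDD-HHMMSS` seen in the generated `session-config.json`. Matching the timestamp suffix, rather than using a bare `rlp-desk-<slug>-*` glob, keeps `clean calc` from killing sessions for a slug such as `calc-v2`. The function names are hypothetical. `tmux list-sessions -F` and `tmux kill-session -t` are standard tmux commands.

```python
import re
import subprocess


def matching_sessions(session_names: list, slug: str) -> list:
    """Keep names shaped like rlp-desk-<slug>-YYYYMMDD-HHMMSS."""
    pattern = re.compile(rf"rlp-desk-{re.escape(slug)}-\d{{8}}-\d{{6}}")
    return [name for name in session_names if pattern.fullmatch(name)]


def kill_desk_sessions(slug: str) -> list:
    """Kill this slug's tmux sessions and return their names (requires tmux)."""
    try:
        listed = subprocess.run(["tmux", "list-sessions", "-F", "#{session_name}"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return []  # tmux is not installed
    if listed.returncode != 0:
        return []  # no tmux server running
    targets = matching_sessions(listed.stdout.splitlines(), slug)
    for name in targets:
        # a leading '=' asks tmux for an exact session-name match
        subprocess.run(["tmux", "kill-session", "-t", f"={name}"], check=False)
    return targets
```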
408
+
409
+ ## Tmux Mode Specifics
410
+
411
+ This section documents the tmux-specific patterns used by `run_ralph_desk.zsh`. These apply only when running with `--mode tmux`.
412
+
413
+ ### Write-Then-Notify
414
+
415
+ This is the single most important pattern: **never** send data (prompts, large strings) through `tmux send-keys` directly. Instead:
416
+
417
+ 1. Write the prompt to a file: `logs/<slug>/iter-NNN.worker-prompt.md`
418
+ 2. Write a trigger script to a file: `logs/<slug>/iter-NNN.worker-trigger.sh`
419
+ 3. Send only a short command via `send-keys`: `bash /path/to/trigger.sh`
420
+
421
+ The trigger script reads the prompt file and invokes `claude -p "$(cat /path/to/prompt.md)" --model <model> --dangerously-skip-permissions`.
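The write-then-notify steps can be sketched as follows. The function name is hypothetical. The trigger is deliberately minimal; the generated triggers also run a background heartbeat writer, which this sketch omits.

```python
import shlex
from pathlib import Path


def prepare_worker_dispatch(log_dir: Path, iteration: int, prompt: str, model: str) -> str:
    """Write prompt and trigger files; return the short command to send via send-keys."""
    stem = f"iter-{iteration:03d}.worker"
    prompt_file = log_dir / f"{stem}-prompt.md"
    trigger_file = log_dir / f"{stem}-trigger.sh"
    prompt_file.write_text(prompt)  # the data goes to disk, never through send-keys
    trigger_file.write_text(
        "#!/usr/bin/env bash\n"
        f'claude -p "$(cat {shlex.quote(str(prompt_file))})" '
        f"--model {shlex.quote(model)} --dangerously-skip-permissions\n"
    )
    return f"bash {shlex.quote(str(trigger_file))}"
```

Only the returned string travels through tmux, for example `tmux send-keys -t <worker-pane> '<command>' Enter`.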
422
+
423
+ ### Signal File Polling
424
+
425
+ In tmux mode, the shell Leader has no `Agent()` call to block on, so it polls for signal files instead:
426
+
427
+ | Signal | Written By | Polled By Leader | Purpose |
428
+ |--------|-----------|------------------|---------|
429
+ | `<slug>-iter-signal.json` | Worker | After dispatching Worker | Detect Worker iteration completion |
430
+ | `<slug>-verify-verdict.json` | Verifier | After dispatching Verifier | Detect Verifier completion |
431
+
432
+ The Leader reads these files with `jq` to extract status/verdict fields for control-flow decisions.
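Polling with a per-iteration timeout can be sketched as below. The defaults (600 s timeout, 5 s interval) mirror `iter_timeout` and `poll_interval` in the example `session-config.json`. The function is hypothetical. Retrying on a JSON parse error guards against reading a file that is still being written.

```python
import json
import time
from pathlib import Path
from typing import Optional


def poll_signal(path: Path, timeout: float = 600.0, interval: float = 5.0) -> Optional[dict]:
    """Wait for a signal file and return its parsed JSON, or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            try:
                return json.loads(path.read_text())
            except json.JSONDecodeError:
                pass  # file seen mid-write; try again on the next poll
        time.sleep(interval)
    return None  # caller treats this as an iteration timeout
```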
433
+
434
+ ### Heartbeat Monitoring
435
+
436
+ Each trigger script writes a heartbeat file (`worker-heartbeat.json` or `verifier-heartbeat.json`) in a background loop. The Leader periodically checks the heartbeat's timestamp to detect stale processes (no update within `HEARTBEAT_STALE_THRESHOLD` seconds).
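The sketch below shows one possible staleness check over the heartbeat payloads written by the generated trigger script (`{"ts": ..., "pid": ...}` while running, `{"ts": ..., "status": "exited"}` on exit). The 120 s default mirrors `heartbeat_stale_threshold` in the example config. The function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def heartbeat_state(raw: str, now: datetime, stale_after: float = 120.0) -> str:
    """Classify a heartbeat payload as 'alive', 'stale', or 'exited'."""
    beat = json.loads(raw)
    if beat.get("status") == "exited":
        return "exited"  # written by the trigger after claude returns
    ts = datetime.strptime(beat["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    age = (now - ts).total_seconds()
    return "stale" if age > stale_after else "alive"
```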
437
+
438
+ ### Idle Pane Nudging
439
+
440
+ If a pane produces no output for `IDLE_NUDGE_THRESHOLD` seconds, the Leader sends a nudge (an Enter keystroke) to prompt activity. After `MAX_NUDGES` attempts without progress, the Leader treats the pane as stuck.
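Idle detection needs a snapshot of pane output. This sketch assumes the Leader obtains one with `tmux capture-pane -p -t <pane>` and passes it in together with a clock reading. The class is hypothetical, and its defaults mirror `idle_nudge_threshold` and `max_nudges` in the example config. Any change in output counts as progress.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleWatcher:
    """Decides when to nudge an idle pane (send Enter) and when to give up."""

    idle_threshold: float = 30.0
    max_nudges: int = 3
    last_output: Optional[str] = None
    last_change: float = 0.0
    nudges: int = 0

    def check(self, output: str, now: float) -> str:
        if output != self.last_output:
            self.last_output, self.last_change, self.nudges = output, now, 0
            return "ok"
        if now - self.last_change < self.idle_threshold:
            return "ok"
        if self.nudges >= self.max_nudges:
            return "stuck"
        self.nudges += 1
        self.last_change = now  # restart the idle clock after each nudge
        return "nudge"
```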
441
+
442
+ ### Exponential Backoff Restarts
443
+
444
+ If a Worker or Verifier process crashes, the Leader restarts it with exponential backoff: 5s, 10s, 20s, 60s. After `MAX_RESTARTS` consecutive failures, the Leader writes a BLOCKED sentinel.
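The restart schedule can be expressed as a lookup. This sketch takes the listed delays literally and assumes `MAX_RESTARTS` counts the restarts allowed. Under that assumption, the crash after the last allowed restart returns `None`, meaning the Leader writes BLOCKED. On this reading, the 60 s step is only reached when `MAX_RESTARTS` is 4 or more; the example config uses 3.

```python
from typing import Optional

BACKOFF_SCHEDULE = (5, 10, 20, 60)  # seconds, as listed in the protocol


def restart_delay(crash_count: int, max_restarts: int = 3) -> Optional[int]:
    """Delay before restarting after the crash_count-th consecutive crash, or None for BLOCKED."""
    if crash_count < 1:
        raise ValueError("crash_count starts at 1")
    if crash_count > max_restarts:
        return None  # restarts exhausted: write the BLOCKED sentinel
    return BACKOFF_SCHEDULE[min(crash_count, len(BACKOFF_SCHEDULE)) - 1]
```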
445
+
446
+ ### Per-Iteration Timeout
447
+
448
+ Each iteration has a configurable timeout (`ITER_TIMEOUT`, default 600s). If a Worker does not produce an `iter-signal.json` within this period, the Leader kills the process and records the timeout.
449
+
450
+ ### Static Model Routing
451
+
452
+ Unlike Agent() mode where the LLM Leader dynamically selects models, tmux mode uses static model routing via environment variables:
453
+
454
+ | Variable | Default | Description |
455
+ |----------|---------|-------------|
456
+ | `WORKER_MODEL` | `sonnet` | Model for Worker invocations |
457
+ | `VERIFIER_MODEL` | `opus` | Model for Verifier invocations |
458
+
459
+ ### Session Config
460
+
461
+ Session metadata is stored in `logs/<slug>/session-config.json`:
462
+
463
+ ```json
464
+ {
465
+ "session_name": "rlp-desk-<slug>-20260318-143000",
466
+ "slug": "<slug>",
467
+ "created_at": "2026-03-18T14:30:00Z",
468
+ "panes": { "leader": "%0", "worker": "%1", "verifier": "%2" },
469
+ "models": { "worker": "sonnet", "verifier": "opus" }
470
+ }
471
+ ```
472
+
473
+ This file is used by the `status` and `clean` commands to find and interact with the running tmux session.
@@ -0,0 +1,12 @@
1
+ # loop-test - Latest Context
2
+
3
+ ## Current Frontier
4
+ ### Completed
5
+ ### In Progress
6
+ ### Next
7
+ - US-001: calc.py — Basic Operations
8
+
9
+ ## Key Decisions
10
+ ## Known Issues
11
+ ## Files Changed This Iteration
12
+ ## Verification Status
@@ -0,0 +1,38 @@
1
+ Execute the plan for loop-test.
2
+
3
+ Required reads every iteration:
4
+ - PRD: .claude/ralph-desk/plans/prd-loop-test.md
5
+ - Test Spec: .claude/ralph-desk/plans/test-spec-loop-test.md
6
+ - Campaign Memory: .claude/ralph-desk/memos/loop-test-memory.md
7
+ - Latest Context: .claude/ralph-desk/context/loop-test-latest.md
8
+
9
+ CRITICAL RULE: Work on only ONE User Story per iteration.
10
+ - Check campaign memory's "Next Iteration Contract" first and do that.
11
+ - Do not touch already-completed stories.
12
+
13
+ Iteration rules:
14
+ - Use fresh context only; do NOT depend on prior chat history.
15
+ - Execute exactly ONE bounded next action (ONE user story).
16
+ - Refresh context file with the current frontier.
17
+ - Rewrite campaign memory in full.
18
+
19
+ MANDATORY: When done, write the following signal file:
20
+ - Path: .claude/ralph-desk/memos/loop-test-iter-signal.json
21
+ - Format: {"iteration": N, "status": "continue|verify|blocked", "summary": "what was done", "timestamp": "ISO"}
22
+ - Status values:
23
+ - "continue" = current story done but other stories remain
24
+ - "verify" = all stories complete + done-claim written
25
+ - "blocked" = autonomous blocker
26
+
27
+ Stop behavior:
28
+ - Current story done but other stories remain → memory stop=continue, signal status=continue
29
+ - All stories complete + all tests pass → write done-claim JSON (.claude/ralph-desk/memos/loop-test-done-claim.json) + signal status=verify
30
+ - Autonomous blocker → write blocked.md + signal status=blocked
31
+
32
+ Objective: Implement a Python calculator module: calc.py (4 functions + type hints + ValueError) + test_calc.py (pytest, 8+ tests, all passed)
33
+
34
+ ---
35
+ ## Iteration Context
36
+ - **Iteration**: 1
37
+ - **Memory Stop Status**: continue
38
+ - **Next Iteration Contract**: Start from the beginning: read PRD and implement US-001 (calc.py with 4 functions).
@@ -0,0 +1,28 @@
1
+ #!/bin/zsh
2
+ # Trigger for iteration 1 worker - generated by run_ralph_desk.zsh
3
+ # DO NOT use exec here -- it breaks heartbeat cleanup
4
+
5
+ HEARTBEAT_FILE="/Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator/.claude/ralph-desk/logs/loop-test/worker-heartbeat.json"
6
+
7
+ # Background heartbeat writer (omc-teams pattern)
8
+ (
9
+ while true; do
10
+ echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","pid":'"$$"'}' > "${HEARTBEAT_FILE}.tmp.$$"
11
+ mv "${HEARTBEAT_FILE}.tmp.$$" "$HEARTBEAT_FILE"
12
+ sleep 15
13
+ done
14
+ ) &
15
+ HEARTBEAT_PID=$!
16
+
17
+ # Run claude with fresh context (governance.md s7 step 5)
18
+ claude -p "$(cat /Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-prompt.md)" \
19
+ --model sonnet \
20
+ --dangerously-skip-permissions \
21
+ --output-format text \
22
+ 2>&1 | tee /Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator/.claude/ralph-desk/logs/loop-test/iter-001.worker-output.log
23
+
24
+ # Cleanup heartbeat writer
25
+ kill $HEARTBEAT_PID 2>/dev/null
26
+ wait $HEARTBEAT_PID 2>/dev/null
27
+ echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","status":"exited"}' > "${HEARTBEAT_FILE}.tmp.$$"
28
+ mv "${HEARTBEAT_FILE}.tmp.$$" "$HEARTBEAT_FILE"
@@ -0,0 +1,25 @@
1
+ {
2
+ "session_name": "rlp-desk-loop-test-20260318-232859",
3
+ "slug": "loop-test",
4
+ "created_at": "2026-03-18T14:28:59Z",
5
+ "panes": {
6
+ "leader": "%99",
7
+ "worker": "%100",
8
+ "verifier": "%101"
9
+ },
10
+ "pid": 65962,
11
+ "root": "/Users/kyjin/dev/own/ai-dev-methodologies/rlp-desk/examples/calculator",
12
+ "models": {
13
+ "worker": "sonnet",
14
+ "verifier": "opus"
15
+ },
16
+ "config": {
17
+ "max_iter": 20,
18
+ "poll_interval": 5,
19
+ "iter_timeout": 600,
20
+ "heartbeat_stale_threshold": 120,
21
+ "max_restarts": 3,
22
+ "idle_nudge_threshold": 30,
23
+ "max_nudges": 3
24
+ }
25
+ }
@@ -0,0 +1,10 @@
1
+ {
2
+ "slug": "loop-test",
3
+ "iteration": 1,
4
+ "max_iter": 20,
5
+ "phase": "worker",
6
+ "worker_model": "sonnet",
7
+ "verifier_model": "opus",
8
+ "last_result": "running",
9
+ "updated_at_utc": "2026-03-18T14:28:59Z"
10
+ }
@@ -0,0 +1 @@
1
+ {"ts":"2026-03-18T14:29:15Z","pid":66349}
@@ -0,0 +1,17 @@
1
+ # loop-test - Campaign Memory
2
+
3
+ ## Stop Status
4
+ continue
5
+
6
+ ## Objective
7
+ Implement a Python calculator module: calc.py (4 functions + type hints + ValueError) + test_calc.py (pytest, 8+ tests, all passed)
8
+
9
+ ## Current State
10
+ Iteration 0 - not started
11
+
12
+ ## Next Iteration Contract
13
+ Start from the beginning: read PRD and implement US-001 (calc.py with 4 functions).
14
+
15
+ ## Patterns Discovered
16
+ ## Learnings
17
+ ## Evidence Chain