@ai-dev-methodologies/rlp-desk 0.7.5 → 0.8.0

@@ -0,0 +1,77 @@
+ import { execFile } from 'node:child_process';
+ import { promisify } from 'node:util';
+
+ const execFileAsync = promisify(execFile);
+ const SHELL_COMMANDS = new Set(['zsh', 'bash', 'sh']);
+ const LAYOUT_FLAGS = {
+   horizontal: '-h',
+   vertical: '-v',
+ };
+
+ export class TmuxError extends Error {
+   constructor(message, options = {}) {
+     super(message, options.cause ? { cause: options.cause } : undefined);
+     this.name = 'TmuxError';
+     this.paneId = options.paneId ?? null;
+   }
+ }
+
+ async function runTmux(args, { paneId = null } = {}) {
+   try {
+     return await execFileAsync('tmux', args);
+   } catch (error) {
+     const stderr = error.stderr?.trim();
+     const detail = stderr || error.message;
+     const paneDetail = paneId ? ` for pane ${paneId}` : '';
+     throw new TmuxError(`tmux command failed${paneDetail}: ${detail}`, {
+       cause: error,
+       paneId,
+     });
+   }
+ }
+
+ async function readTmuxValue(args, options) {
+   const { stdout } = await runTmux(args, options);
+   return stdout.trim();
+ }
+
+ export async function createPane({ targetPaneId, layout }) {
+   const layoutFlag = LAYOUT_FLAGS[layout];
+   if (!layoutFlag) {
+     throw new TmuxError(`Unsupported tmux layout: ${layout}`);
+   }
+
+   return readTmuxValue(
+     ['split-window', layoutFlag, '-d', '-P', '-F', '#{pane_id}', '-t', targetPaneId],
+     { paneId: targetPaneId },
+   );
+ }
+
+ export async function sendKeys(paneId, command) {
+   await runTmux(['send-keys', '-t', paneId, '-l', '--', command], { paneId });
+   await runTmux(['send-keys', '-t', paneId, 'Enter'], { paneId });
+ }
+
+ export async function waitForProcessExit(
+   paneId,
+   { pollIntervalMs = 100, timeoutMs = 5000 } = {},
+ ) {
+   const deadline = Date.now() + timeoutMs;
+
+   while (Date.now() <= deadline) {
+     const currentCommand = await readTmuxValue(
+       ['display-message', '-p', '-t', paneId, '#{pane_current_command}'],
+       { paneId },
+     );
+
+     if (SHELL_COMMANDS.has(currentCommand)) {
+       return;
+     }
+
+     await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
+   }
+
+   throw new TmuxError(`Timed out waiting for pane ${paneId} to return to the shell`, {
+     paneId,
+   });
+ }
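
Reviewer note: the deadline-polling loop in `waitForProcessExit` can be exercised without a running tmux server by injecting the probe. The following is a minimal sketch and not part of the package; `pollUntilShell`, `probe`, and `makeFakeProbe` are hypothetical names, and the shell list is copied from `SHELL_COMMANDS` in the hunk above.

```javascript
// Hypothetical sketch: same deadline-polling shape as waitForProcessExit,
// but the tmux query is replaced by an injected `probe` function.
const SHELLS = new Set(['zsh', 'bash', 'sh']);

async function pollUntilShell(probe, { pollIntervalMs = 100, timeoutMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let polls = 0;

  while (Date.now() <= deadline) {
    polls += 1;
    const currentCommand = await probe();
    if (SHELLS.has(currentCommand)) {
      return polls; // number of probes needed before a shell was seen again
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }

  throw new Error('Timed out waiting for the pane to return to the shell');
}

// Fake probe (assumption for illustration): walks through a fixed list of
// command names and keeps returning the last one once the list is exhausted.
function makeFakeProbe(sequence) {
  let index = 0;
  return async () => sequence[Math.min(index++, sequence.length - 1)];
}
```

With a probe that reports `node` twice and then `zsh`, the loop resolves on the third poll; with a probe that never reports a shell, it rejects once the deadline passes.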
@@ -1,347 +0,0 @@
1
- # Blueprint: rlp-desk v0.4 Evolution
2
-
3
- > Design blueprint for rlp-desk's next major direction.
4
- > Status: CONFIRMED (Deep Interview 4.9% ambiguity) | Author: kyjin | Date: 2026-03-26
5
-
6
- ---
7
-
8
- ## Vision
9
-
10
- rlp-desk is both a **task execution tool** and a **workflow generator**.
11
-
12
- Users start with unstructured work, iterate through self-verification cycles,
13
- and the accumulated process naturally becomes a reusable, formalized workflow.
14
-
15
- ```
16
- [Unstructured] [Structured]
17
- brainstorm → run → verify Workflow (skill + command composition)
18
- → re-brainstorm → run → verify Feedback loop enforcement
19
- → run → verify Reproducible process
20
- → final result ──▶ (P3: determined after P0-P2 iteration)
21
- ```
22
-
23
- ---
24
-
25
- ## 1. Debug (`--debug`) — Execution Trace
26
-
27
- ### Purpose
28
-
29
- Trace rlp-desk's execution process. Not verbose data dumps — focused logging
30
- of whether rules were followed and options behaved as configured.
31
-
32
- ### Two audiences
33
-
34
- | Audience | Use case |
35
- |----------|----------|
36
- | Developer (self) | Verify governance compliance, catch erroneous execution (e.g., codex consensus FAIL treated as PASS) |
37
- | External users (npm) | Run with `--debug`, attach debug.log + version to bug report |
38
-
39
- ### Scope
40
-
41
- - Governance rule compliance (IL-1 through IL-5, checkpoint enforcement)
42
- - Option behavior verification (consensus, per-us, model routing)
43
- - Decision points (model upgrades, circuit breaker triggers)
44
- - NOT implementation details of Worker/Verifier content
45
-
46
- ### Current state
47
-
48
- Basic implementation exists in v0.3.6 (debug.log with phase-level entries).
49
- Refine to match the scoped purpose above — no expansion needed, possibly trimming.
50
-
51
- ### Versioning
52
-
53
- debug.log is versioned on re-execution: `debug-v1.log`, `debug-v2.log`, ...
54
- Preserved for bug tracking across versions.
55
-
56
- ---
57
-
58
- ## 2. Self-Verification (`--with-self-verification`) — Quality Feedback Loop
59
-
60
- ### Purpose
61
-
62
- Evaluate whether the AI implementation meets quality expectations.
63
- When the user is unsatisfied, self-verification becomes the **input for the next
64
- execution cycle** — same goal, improved strategy.
65
-
66
- ### Status
67
-
68
- Remains an **optional flag** (`--with-self-verification`). Not always-on.
69
- Simple tasks don't need the re-execution cycle.
70
-
71
- ### Current vs. New
72
-
73
- | Aspect | Current (v0.3.x) | New vision |
74
- |--------|-------------------|------------|
75
- | Timing | Post-campaign report only | Input for re-execution cycle |
76
- | Output | Static report | Living document that drives next iteration |
77
- | Scope | Analysis of what happened | Analysis + recommendations that reshape execution plan |
78
- | PRD | Not touched | May be refined based on self-verification findings |
79
-
80
- ### Re-execution Cycle (same slug)
81
-
82
- ```
83
- brainstorm("auth-refactor")
84
- → init → run → self-verification-v1 + campaign-report-v1 + debug-v1
85
-
86
- user: "not satisfied, re-run"
87
-
88
-
89
- re-brainstorm("auth-refactor")
90
-
91
- ├─ PRD: single file, updated in place if needed (no versioning)
92
- ├─ SV report: renamed to self-verification-v1.md (preserved)
93
- ├─ Campaign report: renamed to campaign-report-v1.md (preserved)
94
- ├─ Debug log: renamed to debug-v1.log (preserved)
95
- ├─ Everything else: deleted (test-spec, prompts, context, memos, logs)
96
- └─ Re-brainstorm: informed by self-verification-v1
97
-
98
-
99
- → init → run → self-verification-v2 + campaign-report-v2 + debug-v2
100
-
101
- ...
102
- ```
103
-
104
- ### Versioning Rules
105
-
106
- When re-running the same slug:
107
-
108
- 1. **PRD** — single file (`prd-<slug>.md`), updated in place if needed.
109
- No versioning. PRD is the single source of truth.
110
-
111
- 2. **Versioned files (3 total)** — renamed with vN suffix before re-run:
112
- - `self-verification-report.md` → `self-verification-v1.md`
113
- - `campaign-report.md` → `campaign-report-v1.md`
114
- - `debug.log` → `debug-v1.log`
115
-
116
- 3. **Everything else** — deleted. Next run regenerates them automatically:
117
- - `test-spec-<slug>.md`, `prompts/`, `context/`, `memos/`, `logs/<slug>/*`
118
-
119
- 4. **Self-verification as the historical record** — each version's story
120
- is told by its self-verification report. No need to preserve iteration
121
- logs; SV summarizes what happened.
122
-
123
- ### Re-execution Detection
124
-
125
- When brainstorm detects an existing slug:
126
- - Ask the user: "Improve based on previous results, or start fresh?"
127
- - If improve: version existing files, carry forward PRD, re-brainstorm with SV context
128
- - If start fresh: clean everything (equivalent to `clean` + new brainstorm)
129
-
130
- ### Implementation: Shell Script
131
-
132
- Deterministic file operations (rename, delete, version detection) go in
133
- `init_ralph_desk.zsh`. AI handles judgment (PRD refinement, strategy changes).
134
-
135
- ---
136
-
137
- ## 3. Post-Run Reporting — Mandatory Completion Report
138
-
139
- ### Purpose
140
-
141
- After `run` completes, the user MUST receive a comprehensive, templated report
142
- before deciding next steps. This is the **decision surface** for whether to
143
- re-brainstorm or accept the result.
144
-
145
- ### Trigger
146
-
147
- Mandatory after every `run` completion (COMPLETE, BLOCKED, or TIMEOUT).
148
- Not optional. Not skippable. Applies regardless of SV flag.
149
-
150
- ### Output
151
-
152
- - **File**: `logs/<slug>/campaign-report.md` (versioned on re-execution)
153
- - **Screen**: Full report displayed to user
154
-
155
- ### Report Template
156
-
157
- ```markdown
158
- # Campaign Report: <slug>
159
-
160
- ## Objective
161
- <from PRD>
162
-
163
- ## Execution Summary
164
- | Metric | Value |
165
- |--------|-------|
166
- | Total iterations | N |
167
- | Outcome | COMPLETE / BLOCKED / TIMEOUT |
168
- | Worker model | sonnet / opus |
169
- | Verifier model | opus |
170
- | Duration | Xm Ys |
171
-
172
- ## User Stories Status
173
- | US | Description | Status | Iterations | Notes |
174
- |----|-------------|--------|------------|-------|
175
- | US-001 | ... | PASS | 2 | — |
176
- | US-002 | ... | PASS | 4 | 2 fix rounds |
177
-
178
- ## Verification Results
179
- - L1 (Unit): PASS — N tests, N assertions
180
- - L2 (Integration): PASS / N/A
181
- - L3 (E2E): PASS — input/output comparison
182
- - L4 (Deploy): N/A
183
-
184
- ## Issues Encountered
185
- <failures, fix loops, model upgrades, escalations>
186
-
187
- ## Cost & Performance
188
- | Role | Model | Tokens | Duration | Source |
189
- |------|-------|--------|----------|--------|
190
- | Worker | sonnet | N | Xm Ys | measured/estimated |
191
- | Verifier | opus | N | Xm Ys | measured/estimated |
192
- | **Total** | | **N** | **Xm Ys** | |
193
-
194
- ## Self-Verification Summary (if enabled)
195
- <from self-verification report — strengths, weaknesses, recommendations>
196
-
197
- ## Files Changed
198
- <git diff --stat summary>
199
- ```
200
-
201
- ### Post-Report Flow
202
-
203
- ```
204
- [All runs]
205
- → Report displayed + saved to file
206
-
207
- [SV enabled only]
208
- → "Would you like to re-brainstorm to improve the result?"
209
- → Yes: trigger re-execution cycle (§2)
210
- → No: session ends
211
-
212
- [SV not enabled]
213
- → Report displayed, session ends (no re-brainstorm question)
214
- ```
215
-
216
- ### Rules
217
-
218
- - Report content must reference actual data (status.json, iteration results,
219
- self-verification if available) — no fabrication
220
- - Template is fixed; sections may show "N/A" but cannot be omitted
221
- - Re-brainstorm question only appears when SV is enabled
222
-
223
- ---
224
-
225
- ## 4. Workflow Generation — From Ad-hoc to Reproducible (P3, deferred)
226
-
227
- ### Purpose
228
-
229
- After multiple self-verification cycles produce a final result,
230
- automatically generate a **formalized workflow** that captures the
231
- proven process in a reusable, enforceable form.
232
-
233
- ### Status: DEFERRED
234
-
235
- P3 design will be determined after P0-P2 are working and tested through
236
- real usage. The user will iterate with the developer to find the right form.
237
-
238
- ### Known Direction
239
-
240
- - Output: **skill + command composition** (not rlp-desk PRD format)
241
- - Invoking the command triggers a combination of skills
242
- - The command structure enforces feedback loops
243
- - Leverage existing skill ecosystem (`/find-skills`, etc.)
244
- - Trust AI models + well-structured skills + checklist-managed feedback loops
245
-
246
- ### Success Criteria (confirmed)
247
-
248
- - Process must be reproducible: same workflow → same quality of results
249
- - Code implementation may differ, but behavior/quality must be equivalent
250
- - Feedback loop + template documents as structural components
251
-
252
- ### Open Questions (to resolve after P0-P2)
253
-
254
- 1. Separate subcommand (`/rlp-desk workflow <slug>`) or independent skill?
255
- 2. Minimum self-verification versions required (2? 3?)
256
- 3. How to validate generated workflow actually reproduces results?
257
-
258
- ---
259
-
260
- ## Feature Relationship
261
-
262
- ```
263
- ┌──────────────────────────────────────────────────────────────┐
264
- │ rlp-desk execution │
265
- │ │
266
- │ brainstorm → init → run ──┐ │
267
- │ │ │
268
- │ ┌──────────────┼──────────────────┐ │
269
- │ │ │ │ │
270
- │ --debug --with-self-verification │ │
271
- │ (execution (quality evaluation) │ │
272
- │ trace) │ │ │
273
- │ │ ▼ │ │
274
- │ │ self-verification report │ │
275
- │ │ │ │ │
276
- │ ▼ ▼ │ │
277
- │ debug.log Post-Run Report (mandatory) │ │
278
- │ (versioned) + campaign-report.md │ │
279
- │ │ (versioned) │ │
280
- │ │ │ │ │
281
- │ │ [SV enabled?] │ │
282
- │ │ │ │ │ │
283
- │ │ Yes No │ │
284
- │ │ │ │ │ │
285
- │ │ ▼ ▼ │ │
286
- │ │ "Re-brainstorm?" End │ │
287
- │ │ │ │ │ │
288
- │ │ Yes No │ │
289
- │ │ │ │ │ │
290
- │ │ ▼ ▼ │ │
291
- │ │ Re-execute Accept │ │
292
- │ │ (vN+1) result │ │
293
- │ │ │ │ │ │
294
- │ │ │ ┌────┘ │ │
295
- │ │ ▼ ▼ │ │
296
- │ │ Workflow Generation (P3) │ │
297
- │ │ (deferred — after P0-P2) │ │
298
- │ │ │ │
299
- │ └── Bug report (external users) │ │
300
- └──────────────────────────────────────────────────────────────┘
301
- ```
302
-
303
- ---
304
-
305
- ## Implementation Priority
306
-
307
- | Phase | Feature | Dependency | Scope |
308
- |-------|---------|------------|-------|
309
- | P0 | Debug refinement | None | rlp-desk.md, governance.md |
310
- | P1 | Post-Run Report | None | rlp-desk.md |
311
- | P2 | Self-Verification redesign | P1 | rlp-desk.md, init_ralph_desk.zsh, governance.md |
312
- | P3 | Workflow Generation | P2 + real usage | TBD after iteration |
313
-
314
- - P0 and P1 are independent, can be done in parallel.
315
- - P2 builds on P1 (report triggers re-brainstorm question).
316
- - P3 requires P2 (needs versioned self-verification data) + real-world testing.
317
- - Breaking changes allowed (0.x semver). Document in CHANGELOG.
318
-
319
- ---
320
-
321
- ## Design Decisions Log (from Deep Interview)
322
-
323
- | # | Decision | Rationale |
324
- |---|----------|-----------|
325
- | 1 | brainstorm auto-handles re-execution | Natural UX — same command, system detects context |
326
- | 2 | Mixed judgment (quantitative + user) | Pure metrics miss quality nuance; pure subjective misses patterns |
327
- | 3 | Breaking changes OK | 0.x semver, clean redesign over backward compat hacks |
328
- | 4 | P3 essential but deferred | Core vision requires it, but form emerges from P0-P2 usage |
329
- | 5 | Skill + command composition for P3 | Leverage existing ecosystem, not reinvent |
330
- | 6 | All features in rlp-desk | Connected UX > separate tools |
331
- | 7 | Shell script for deterministic ops | AI interpretation unreliable for file manipulation |
332
- | 8 | SV remains optional | Simple tasks don't need re-execution overhead |
333
- | 9 | Re-brainstorm only with SV | No SV = no improvement data = no point asking |
334
- | 10 | Report always mandatory | Users need decision surface regardless of SV |
335
- | 11 | PRD = single file, no versioning | Source of truth, updated in place |
336
- | 12 | Only 3 files versioned | SV report + campaign report + debug.log |
337
- | 13 | Report includes cost section | Token/time tracking for optimization decisions |
338
- | 14 | Ask user intent on slug reuse | "Improve?" vs "Start fresh?" — don't assume |
339
-
340
- ---
341
-
342
- ## Open Questions (P3 only)
343
-
344
- 1. **Workflow as subcommand or separate skill?** — defer to P0-P2 experience
345
- 2. **Minimum SV versions for generation** — need real data to determine
346
- 3. **Reproducibility validation method** — how to test generated workflow works
347
- 4. **Skill composition mechanics** — how feedback loop enforcement works in practice
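
Reviewer note: the "Versioning Rules" section of the removed blueprint describes renaming three files with a `vN` suffix before a re-run. A minimal sketch of that rename planning follows. It is an illustration only: the blueprint assigns this work to `init_ralph_desk.zsh`, `planRenames` is a hypothetical name, and picking the next free version per file independently is an assumption the blueprint does not spell out.

```javascript
// Hypothetical sketch of the blueprint's versioning rules (not the package's code).
// The three versioned files and their target name patterns come from the blueprint.
const VERSIONED = [
  { current: 'self-verification-report.md', stem: 'self-verification', ext: '.md' },
  { current: 'campaign-report.md', stem: 'campaign-report', ext: '.md' },
  { current: 'debug.log', stem: 'debug', ext: '.log' },
];

// Given the file names present in a log directory, return the renames to apply.
// Assumption: each file takes the lowest version number not already in use.
function planRenames(existingFiles) {
  const names = new Set(existingFiles);
  const plan = [];

  for (const { current, stem, ext } of VERSIONED) {
    if (!names.has(current)) continue; // nothing to preserve for this file
    let version = 1;
    while (names.has(`${stem}-v${version}${ext}`)) version += 1;
    plan.push({ from: current, to: `${stem}-v${version}${ext}` });
  }

  return plan;
}
```

The function only computes a plan; performing the renames and deleting the non-versioned files would be a separate step.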
@@ -1,55 +0,0 @@
1
- # Ralplan + Codex Cross-Validation
2
-
3
- ```
4
- /ralplan {{OBJECTIVE}}
5
- {{SCOPE}}
6
- Run codex cross-validation after consensus. Repeat revise -> consensus -> codex until 0 issues.
7
- If the code, requirements, or source documents are insufficient or unclear, identify the gaps before proceeding.
8
- ```
9
-
10
- ---
11
-
12
- ## Placeholders
13
-
14
- | Placeholder | Required | Description |
15
- |---|---|---|
16
- | `{{OBJECTIVE}}` | Yes | Planning goal. Naturally includes target documents, deliverable type, and context. |
17
- | `{{SCOPE}}` | No | Files or directories to change. Omit to target the entire current project. |
18
-
19
- ## Prerequisites
20
-
21
- - [oh-my-claudecode](https://github.com/anthropics/oh-my-claudecode) installed and configured
22
- - Codex CLI installed and authenticated (`codex --version`)
23
- - Deep interview (`/deep-interview`) completed beforehand to clarify requirements
24
-
25
- ## How It Works
26
-
27
- 1. `/ralplan` runs the Planner -> Architect -> Critic consensus loop (built-in)
28
- 2. After consensus APPROVE, codex independently reviews the plan
29
- 3. If codex finds issues, the plan is revised and re-enters consensus
30
- 4. Repeats until codex returns 0 issues
31
-
32
- ## Why
33
-
34
- Iterating ralplan consensus with codex cross-validation produces plans robust enough
35
- to leverage rlp-desk's verification loop (self-verification, post-run report)
36
- effectively. Plans that survive both reviewers have fewer surprises during execution.
37
-
38
- ## Examples
39
-
40
- ### With scope
41
-
42
- ```
43
- /ralplan Implementation plan based on blueprint-v0.4
44
- src/commands/rlp-desk.md, src/governance.md, src/scripts/init_ralph_desk.zsh
45
- Run codex cross-validation after consensus. Repeat revise -> consensus -> codex until 0 issues.
46
- If source documents are insufficient, identify gaps before proceeding.
47
- ```
48
-
49
- ### Without scope (defaults to current project)
50
-
51
- ```
52
- /ralplan Auth module refactoring strategy
53
- Run codex cross-validation after consensus. Repeat revise -> consensus -> codex until 0 issues.
54
- If source documents are insufficient, identify gaps before proceeding.
55
- ```
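
Reviewer note: the "How It Works" steps of the removed document describe a revise, consensus, codex loop that repeats until codex reports 0 issues. A minimal sketch of that control flow is below. Every name in it (`planUntilClean`, `consensus`, `codexReview`, `revise`) is hypothetical, and the `maxRounds` cap is an added assumption; the document itself sets no limit.

```javascript
// Hypothetical sketch of the loop; none of these names come from ralplan or codex.
function planUntilClean(initialPlan, { consensus, codexReview, revise, maxRounds = 10 }) {
  let plan = initialPlan;

  for (let round = 1; round <= maxRounds; round += 1) {
    // Steps 1-2: the plan must pass consensus before codex reviews it.
    if (!consensus(plan)) {
      plan = revise(plan, ['consensus rejected the plan']);
      continue;
    }

    // Steps 3-4: revise and re-enter consensus until codex returns 0 issues.
    const issues = codexReview(plan);
    if (issues.length === 0) {
      return { plan, rounds: round };
    }
    plan = revise(plan, issues);
  }

  // Assumption: give up after maxRounds instead of looping forever.
  throw new Error(`No clean plan after ${maxRounds} rounds`);
}
```

The reviewers are passed in as plain functions so the loop can be tried with stubs before wiring it to real tools.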
@@ -1,179 +0,0 @@
1
- # Worker/Verifier Prompt Restructure Implementation Plan
2
-
3
- > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4
-
5
- **Goal:** Restructure Worker/Verifier prompt templates in init_ralph_desk.zsh to make TDD more prominent, reduce prompt size, and add debugging/anti-rationalization guidance.
6
-
7
- **Architecture:** Elevate TDD from procedural guidance to hard constraint box (same level as SCOPE LOCK). Compress Forbidden Shortcuts by removing items already enforced by Verifier. Add "When Stuck" debugging protocol and Verifier anti-rationalization patterns.
8
-
9
- **Tech Stack:** zsh heredoc templates in init_ralph_desk.zsh
10
-
11
- **Origin:** ralplan consensus (Architect APPROVE + Critic ACCEPT, 2026-04-06)
12
-
13
- ---
14
-
15
- ### Task 1: Insert TDD MANDATE and remove old Test-First Approach
16
-
17
- **Files:**
18
- - Modify: `src/scripts/init_ralph_desk.zsh:323-337`
19
-
20
- - [ ] **Step 1: Insert TDD MANDATE after file-reading section (L323), before SCOPE LOCK (L325)**
21
-
22
- In `src/scripts/init_ralph_desk.zsh`, find:
23
- ```
24
- 4. Latest Context: $DESK/context/$SLUG-latest.md → current state
25
-
26
- ## SCOPE LOCK (hard constraint — violation causes verification failure)
27
- ```
28
-
29
- Replace with:
30
- ```
31
- 4. Latest Context: $DESK/context/$SLUG-latest.md → current state
32
-
33
- ## TDD MANDATE (hard constraint — violation = automatic FAIL)
34
- > Write failing tests FIRST → confirm RED (exit_code=1) → implement minimum code → confirm GREEN.
35
- > Every NEW AC requires: write_test → verify_red → implement → verify_green in execution_steps.
36
- > No exceptions. Verifier rejects missing RED evidence. For already-passing ACs, use verify_existing.
37
-
38
- ## SCOPE LOCK (hard constraint — violation causes verification failure)
39
- ```
40
-
41
- - [ ] **Step 2: Remove old Test-First Approach section (L333-337)**
42
-
43
- Find and delete these 6 lines (header + blank line + 4 items):
44
- ```
45
- ## Test-First Approach (read test-spec BEFORE coding)
46
- 1. Read test-spec "Impacted Tests" — if TODO (first iteration), skip to step 2 and fill this section during your work. Otherwise, run these FIRST to confirm they pass before your changes.
47
- 2. Read test-spec "Required New Tests" — write these. They SHOULD FAIL initially.
48
- 3. Implement minimum code to make all tests pass.
49
- 4. Run ALL tests (impacted + new) to confirm nothing is broken.
50
- ```
51
-
52
- - [ ] **Step 3: Verify changes**
53
-
54
- Run: `grep -n "TDD MANDATE\|Test-First Approach\|SCOPE LOCK" src/scripts/init_ralph_desk.zsh`
55
- Expected: TDD MANDATE appears BEFORE SCOPE LOCK. "Test-First Approach" does NOT appear.
56
-
57
- ---
58
-
59
- ### Task 2: Compress Forbidden Shortcuts from 14 to 6 items
60
-
61
- **Files:**
62
- - Modify: `src/scripts/init_ralph_desk.zsh:339-353`
63
-
64
- - [ ] **Step 1: Replace Forbidden Shortcuts section**
65
-
66
- Find the entire `## Forbidden Shortcuts` section (14 items from L339-353) and replace with these 6 items:
67
-
68
- ```
69
- ## Forbidden Shortcuts (Verifier will check these)
70
- - Do not mock external services when L2 integration test is required by test-spec.
71
- - Do not delete or weaken existing assertions to make tests pass.
72
- - Do not skip boundary cases listed in the PRD.
73
- - Do not write code before tests — if you did, delete it and start with tests.
74
- - **NEVER modify rlp-desk infrastructure files** (~/.claude/ralph-desk/*, ~/.claude/commands/rlp-desk.md). If you discover a bug in rlp-desk itself, report it in done-claim.json with {"status": "blocked", "reason": "rlp-desk bug: <description>"} and signal blocked. Do NOT attempt to fix rlp-desk — it is the orchestration tool, not your project code.
75
- - **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
76
- ```
77
-
78
- Removed items and their coverage:
79
- - L342 "test-specific logic" → Verifier step 10 (L474) checks this
80
- - L344 "code inspection" → Verifier step 10½ phrase scan (L484)
81
- - L345 "too simple to test" → Verifier step 10½ phrase scan (L484)
82
- - L346 "I'll test after" → will add to Verifier step 10½ in Task 4
83
- - L347 "already manually tested" → Verifier step 10½ phrase scan (L484)
84
- - L348 "partial check is enough" → Verifier step 10½ phrase scan (L484)
85
- - L349 "I'm confident" → Verifier step 10½ phrase scan (L484)
86
- - L350 "existing code has no tests" → TDD MANDATE covers ("no exceptions")
87
-
88
- - [ ] **Step 2: Verify line count**
89
-
90
- Run: `sed -n '/^## Forbidden Shortcuts/,/^## /p' src/scripts/init_ralph_desk.zsh | head -10`
91
- Expected: 1 header + 6 items, followed by a blank line and the next `## ` header (the sed range prints its closing match too)
92
-
93
- ---
94
-
95
- ### Task 3: Add "When Stuck" debugging guide
96
-
97
- **Files:**
98
- - Modify: `src/scripts/init_ralph_desk.zsh` (after Forbidden Shortcuts, before Iteration rules)
99
-
100
- - [ ] **Step 1: Insert debugging guide**
101
-
102
- Find:
103
- ```
104
- - **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
105
-
106
- ## Iteration rules
107
- ```
108
-
109
- Replace with:
110
- ```
111
- - **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
112
-
113
- ## When Stuck (do NOT guess-and-fix)
114
- > 1. STOP and READ the error. Trace the call stack. Identify the root cause before touching code.
115
- > 2. Write a minimal test that reproduces the failure, then fix the root cause only.
116
- > 3. If 3+ fixes fail on the same issue, signal "blocked" with your diagnosis.
117
-
118
- ## Iteration rules
119
- ```
120
-
121
- - [ ] **Step 2: Verify insertion**
122
-
123
- Run: `grep -n "When Stuck" src/scripts/init_ralph_desk.zsh`
124
- Expected: exactly 1 match, between Forbidden Shortcuts and Iteration rules
125
-
126
- ---
127
-
128
- ### Task 4: Extend Verifier Anti-Rationalization + gap-close step 10½
129
-
130
- **Files:**
131
- - Modify: `src/scripts/init_ralph_desk.zsh:480,484`
132
-
133
- - [ ] **Step 1: Add rationalization red flags to step 10¼**
134
-
135
- Find:
136
- ```
137
- - Never issue a silent PASS — every pass verdict must cite specific evidence for each AC checked
138
- 10½. **Worker Process Audit**:
139
- ```
140
-
141
- Replace with:
142
- ```
143
- - Never issue a silent PASS — every pass verdict must cite specific evidence for each AC checked
144
- - Rationalization red flags: "tests pass so it works" (passing ≠ correct), "Worker is confident" (confidence ≠ evidence), "changes are minimal" (scope ≠ correctness)
145
- 10½. **Worker Process Audit**:
146
- ```
147
-
148
- - [ ] **Step 2: Gap-close — add "I'll test after" to step 10½ phrase scan**
149
-
150
- Find:
151
- ```
152
- - Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "already manually tested", "partial check")
153
- ```
154
-
155
- Replace with:
156
- ```
157
- - Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "I'll test after", "already manually tested", "partial check")
158
- ```
159
-
160
- - [ ] **Step 3: Verify both changes**
161
-
162
- Run: `grep -n "Rationalization red flags\|I'll test after" src/scripts/init_ralph_desk.zsh`
163
- Expected: 2 matches — one in step 10¼, one in step 10½
164
-
165
- ---
166
-
167
- ## Token Budget Verification
168
-
169
- After all 4 tasks, verify net token reduction:
170
-
171
- ```bash
172
- # Before: count Worker prompt lines (approximate)
173
- # After: should be ~10 lines fewer
174
- wc -l src/scripts/init_ralph_desk.zsh
175
- # Compare with git: lines removed vs added
176
- git diff --stat src/scripts/init_ralph_desk.zsh
177
- ```
178
-
179
- Expected: net negative line count (fewer lines = less context pressure on Worker).
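
Reviewer note: Task 1, Step 3 of the removed plan checks by eye that TDD MANDATE appears before SCOPE LOCK and that "Test-First Approach" is gone. The same check can be sketched as a small function over the template text. This is an illustration under assumptions: `checkTemplateOrder` is a hypothetical name, and the heading strings are taken from the plan's own snippets, not from the actual `init_ralph_desk.zsh`.

```javascript
// Hypothetical sketch of the Task 1 verification, applied to a template string.
function checkTemplateOrder(text) {
  const tddIndex = text.indexOf('## TDD MANDATE');
  const scopeIndex = text.indexOf('## SCOPE LOCK');

  return {
    // Both headings must exist, and TDD MANDATE must come first.
    tddBeforeScope: tddIndex !== -1 && scopeIndex !== -1 && tddIndex < scopeIndex,
    // The old section heading must no longer appear anywhere.
    testFirstRemoved: !text.includes('Test-First Approach'),
  };
}
```

Both fields should be `true` after Task 1 is applied; either one being `false` points at the step that was missed.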