@ai-dev-methodologies/rlp-desk 0.6.0 → 0.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
|
@@ -0,0 +1,179 @@
|
|
|
1
|
+
# Worker/Verifier Prompt Restructure Implementation Plan
|
|
2
|
+
|
|
3
|
+
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
4
|
+
|
|
5
|
+
**Goal:** Restructure Worker/Verifier prompt templates in init_ralph_desk.zsh to make TDD more prominent, reduce prompt size, and add debugging/anti-rationalization guidance.
|
|
6
|
+
|
|
7
|
+
**Architecture:** Elevate TDD from procedural guidance to hard constraint box (same level as SCOPE LOCK). Compress Forbidden Shortcuts by removing items already enforced by Verifier. Add "When Stuck" debugging protocol and Verifier anti-rationalization patterns.
|
|
8
|
+
|
|
9
|
+
**Tech Stack:** zsh heredoc templates in init_ralph_desk.zsh
|
|
10
|
+
|
|
11
|
+
**Origin:** ralplan consensus (Architect APPROVE + Critic ACCEPT, 2026-04-06)
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
### Task 1: Insert TDD MANDATE and remove old Test-First Approach
|
|
16
|
+
|
|
17
|
+
**Files:**
|
|
18
|
+
- Modify: `src/scripts/init_ralph_desk.zsh:323-337`
|
|
19
|
+
|
|
20
|
+
- [ ] **Step 1: Insert TDD MANDATE after file-reading section (L323), before SCOPE LOCK (L325)**
|
|
21
|
+
|
|
22
|
+
In `src/scripts/init_ralph_desk.zsh`, find:
|
|
23
|
+
```
|
|
24
|
+
4. Latest Context: $DESK/context/$SLUG-latest.md → current state
|
|
25
|
+
|
|
26
|
+
## SCOPE LOCK (hard constraint — violation causes verification failure)
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
Replace with:
|
|
30
|
+
```
|
|
31
|
+
4. Latest Context: $DESK/context/$SLUG-latest.md → current state
|
|
32
|
+
|
|
33
|
+
## TDD MANDATE (hard constraint — violation = automatic FAIL)
|
|
34
|
+
> Write failing tests FIRST → confirm RED (exit_code=1) → implement minimum code → confirm GREEN.
|
|
35
|
+
> Every NEW AC requires: write_test → verify_red → implement → verify_green in execution_steps.
|
|
36
|
+
> No exceptions. Verifier rejects missing RED evidence. For already-passing ACs, use verify_existing.
|
|
37
|
+
|
|
38
|
+
## SCOPE LOCK (hard constraint — violation causes verification failure)
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
- [ ] **Step 2: Remove old Test-First Approach section (L333-337)**
|
|
42
|
+
|
|
43
|
+
Find and delete these 6 lines (header + blank line + 4 items):
|
|
44
|
+
```
|
|
45
|
+
## Test-First Approach (read test-spec BEFORE coding)
|
|
46
|
+
1. Read test-spec "Impacted Tests" — if TODO (first iteration), skip to step 2 and fill this section during your work. Otherwise, run these FIRST to confirm they pass before your changes.
|
|
47
|
+
2. Read test-spec "Required New Tests" — write these. They SHOULD FAIL initially.
|
|
48
|
+
3. Implement minimum code to make all tests pass.
|
|
49
|
+
4. Run ALL tests (impacted + new) to confirm nothing is broken.
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
- [ ] **Step 3: Verify changes**
|
|
53
|
+
|
|
54
|
+
Run: `grep -n "TDD MANDATE\|Test-First Approach\|SCOPE LOCK" src/scripts/init_ralph_desk.zsh`
|
|
55
|
+
Expected: TDD MANDATE appears BEFORE SCOPE LOCK. "Test-First Approach" does NOT appear.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
### Task 2: Compress Forbidden Shortcuts from 14 to 6 items
|
|
60
|
+
|
|
61
|
+
**Files:**
|
|
62
|
+
- Modify: `src/scripts/init_ralph_desk.zsh:339-353`
|
|
63
|
+
|
|
64
|
+
- [ ] **Step 1: Replace Forbidden Shortcuts section**
|
|
65
|
+
|
|
66
|
+
Find the entire `## Forbidden Shortcuts` section (14 items from L339-353) and replace with these 6 items:
|
|
67
|
+
|
|
68
|
+
```
|
|
69
|
+
## Forbidden Shortcuts (Verifier will check these)
|
|
70
|
+
- Do not mock external services when L2 integration test is required by test-spec.
|
|
71
|
+
- Do not delete or weaken existing assertions to make tests pass.
|
|
72
|
+
- Do not skip boundary cases listed in the PRD.
|
|
73
|
+
- Do not write code before tests — if you did, delete it and start with tests.
|
|
74
|
+
- **NEVER modify rlp-desk infrastructure files** (~/.claude/ralph-desk/*, ~/.claude/commands/rlp-desk.md). If you discover a bug in rlp-desk itself, report it in done-claim.json with {"status": "blocked", "reason": "rlp-desk bug: <description>"} and signal blocked. Do NOT attempt to fix rlp-desk — it is the orchestration tool, not your project code.
|
|
75
|
+
- **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Removed items and their coverage:
|
|
79
|
+
- L342 "test-specific logic" → Verifier step 10 (L474) checks this
|
|
80
|
+
- L344 "code inspection" → Verifier step 10½ phrase scan (L484)
|
|
81
|
+
- L345 "too simple to test" → Verifier step 10½ phrase scan (L484)
|
|
82
|
+
- L346 "I'll test after" → will add to Verifier step 10½ in Task 4
|
|
83
|
+
- L347 "already manually tested" → Verifier step 10½ phrase scan (L484)
|
|
84
|
+
- L348 "partial check is enough" → Verifier step 10½ phrase scan (L484)
|
|
85
|
+
- L349 "I'm confident" → Verifier step 10½ phrase scan (L484)
|
|
86
|
+
- L350 "existing code has no tests" → TDD MANDATE covers ("no exceptions")
|
|
87
|
+
|
|
88
|
+
- [ ] **Step 2: Verify line count**
|
|
89
|
+
|
|
90
|
+
Run: `sed -n '/^## Forbidden Shortcuts/,/^## /p' src/scripts/init_ralph_desk.zsh | head -10`
|
|
91
|
+
Expected: 7 lines (1 header + 6 items)
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
### Task 3: Add "When Stuck" debugging guide
|
|
96
|
+
|
|
97
|
+
**Files:**
|
|
98
|
+
- Modify: `src/scripts/init_ralph_desk.zsh` (after Forbidden Shortcuts, before Iteration rules)
|
|
99
|
+
|
|
100
|
+
- [ ] **Step 1: Insert debugging guide**
|
|
101
|
+
|
|
102
|
+
Find:
|
|
103
|
+
```
|
|
104
|
+
- **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
|
|
105
|
+
|
|
106
|
+
## Iteration rules
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
Replace with:
|
|
110
|
+
```
|
|
111
|
+
- **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
|
|
112
|
+
|
|
113
|
+
## When Stuck (do NOT guess-and-fix)
|
|
114
|
+
> 1. STOP and READ the error. Trace the call stack. Identify the root cause before touching code.
|
|
115
|
+
> 2. Write a minimal test that reproduces the failure, then fix the root cause only.
|
|
116
|
+
> 3. If 3+ fixes fail on the same issue, signal "blocked" with your diagnosis.
|
|
117
|
+
|
|
118
|
+
## Iteration rules
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
- [ ] **Step 2: Verify insertion**
|
|
122
|
+
|
|
123
|
+
Run: `grep -n "When Stuck" src/scripts/init_ralph_desk.zsh`
|
|
124
|
+
Expected: exactly 1 match, between Forbidden Shortcuts and Iteration rules
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
### Task 4: Extend Verifier Anti-Rationalization + gap-close step 10½
|
|
129
|
+
|
|
130
|
+
**Files:**
|
|
131
|
+
- Modify: `src/scripts/init_ralph_desk.zsh:480,484`
|
|
132
|
+
|
|
133
|
+
- [ ] **Step 1: Add rationalization red flags to step 10¼**
|
|
134
|
+
|
|
135
|
+
Find:
|
|
136
|
+
```
|
|
137
|
+
- Never issue a silent PASS — every pass verdict must cite specific evidence for each AC checked
|
|
138
|
+
10½. **Worker Process Audit**:
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
Replace with:
|
|
142
|
+
```
|
|
143
|
+
- Never issue a silent PASS — every pass verdict must cite specific evidence for each AC checked
|
|
144
|
+
- Rationalization red flags: "tests pass so it works" (passing ≠ correct), "Worker is confident" (confidence ≠ evidence), "changes are minimal" (scope ≠ correctness)
|
|
145
|
+
10½. **Worker Process Audit**:
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
- [ ] **Step 2: Gap-close — add "I'll test after" to step 10½ phrase scan**
|
|
149
|
+
|
|
150
|
+
Find:
|
|
151
|
+
```
|
|
152
|
+
- Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "already manually tested", "partial check")
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
Replace with:
|
|
156
|
+
```
|
|
157
|
+
- Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "I'll test after", "already manually tested", "partial check")
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
- [ ] **Step 3: Verify both changes**
|
|
161
|
+
|
|
162
|
+
Run: `grep -n "Rationalization red flags\|I'll test after" src/scripts/init_ralph_desk.zsh`
|
|
163
|
+
Expected: 2 matches — one in step 10¼, one in step 10½
|
|
164
|
+
|
|
165
|
+
---
|
|
166
|
+
|
|
167
|
+
## Token Budget Verification
|
|
168
|
+
|
|
169
|
+
After all 4 tasks, verify net token reduction:
|
|
170
|
+
|
|
171
|
+
```bash
|
|
172
|
+
# Before: count Worker prompt lines (approximate)
|
|
173
|
+
# After: should be ~10 lines fewer
|
|
174
|
+
wc -l src/scripts/init_ralph_desk.zsh
|
|
175
|
+
# Compare with git: lines removed vs added
|
|
176
|
+
git diff --stat src/scripts/init_ralph_desk.zsh
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
Expected: net negative line count (fewer lines = less context pressure on Worker).
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@ai-dev-methodologies/rlp-desk",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.7.1",
|
|
4
4
|
"description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
|
|
5
5
|
"scripts": {
|
|
6
6
|
"postinstall": "node scripts/postinstall.js",
|
package/src/commands/rlp-desk.md
CHANGED
|
@@ -69,14 +69,14 @@ Ask about these items one by one (or in small groups):
|
|
|
69
69
|
|
|
70
70
|
| Complexity | Worker | per-US Verifier | Final Verifier | Consensus |
|
|
71
71
|
|------------|--------|-----------------|----------------|-----------|
|
|
72
|
-
| LOW |
|
|
73
|
-
| MEDIUM |
|
|
72
|
+
| LOW | gpt-5.4:medium | sonnet | opus | final-only |
|
|
73
|
+
| MEDIUM | gpt-5.4:medium | opus | opus | final-only |
|
|
74
74
|
| HIGH | gpt-5.4:high | opus | opus | all |
|
|
75
75
|
| CRITICAL | gpt-5.4:high | opus | opus + human | all |
|
|
76
76
|
|
|
77
77
|
**Worker model selection** (cross-engine):
|
|
78
|
-
- **
|
|
79
|
-
- **
|
|
78
|
+
- **gpt-5.4:medium** — default recommendation (full context window, progressive upgrade handles harder US)
|
|
79
|
+
- **spark:high** — only when US is small enough for spark's 100k context (single-file, AC count <= 4, simple logic). Do NOT use as primary recommendation — spark context window is too small for most tasks
|
|
80
80
|
|
|
81
81
|
Present complexity score with evidence to the user, e.g.: "I rate this MEDIUM because: US count=4 (MEDIUM), file scope=2 (MEDIUM), logic=conditionals (MEDIUM), deps=none (LOW), impact=modify (MEDIUM). Highest=MEDIUM."
|
|
82
82
|
|
|
@@ -85,7 +85,7 @@ Ask about these items one by one (or in small groups):
|
|
|
85
85
|
**If codex is NOT installed** — say: "Codex is not installed. Defaulting to claude-only Worker. Note: without a second engine, your Verifier shares the same perspective as the Worker — there is a risk of blind spots where both Worker and Verifier miss the same issue. To unlock cross-engine coverage: `npm install -g @openai/codex`"
|
|
86
86
|
|
|
87
87
|
8. **Batch Capacity Check** — when verify-mode is batch and PRD is large:
|
|
88
|
-
- batch + spark + AC >
|
|
88
|
+
- batch + spark + AC > 4 → warn "spark 100k context limit — switch to gpt-5.4 or split smaller"
|
|
89
89
|
- batch + gpt-5.4 + AC > 15 → warn "too many ACs for single batch — consider wave split (3-4 US per wave)"
|
|
90
90
|
- per-us → no warning (US-level processing, no limit concern)
|
|
91
91
|
9. **Verify Mode** — per-us (default) or batch. Ask: "Verify after each user story (per-us, recommended) or only after all stories are done (batch)?" Default recommendation: per-us for 2+ stories.
|
|
@@ -144,11 +144,11 @@ Tell the user:
|
|
|
144
144
|
```
|
|
145
145
|
Available run commands (copy the one you want):
|
|
146
146
|
|
|
147
|
-
# ★ Recommended: cross-engine + final-consensus (
|
|
148
|
-
/rlp-desk run <actual-slug> --mode tmux --worker-model
|
|
147
|
+
# ★ Recommended: cross-engine + final-consensus (full context + blind-spot coverage):
|
|
148
|
+
/rlp-desk run <actual-slug> --mode tmux --worker-model gpt-5.4:medium --consensus final-only --debug
|
|
149
149
|
|
|
150
|
-
#
|
|
151
|
-
/rlp-desk run <actual-slug> --mode tmux --worker-model
|
|
150
|
+
# Small tasks only (single-file, AC <= 4, simple logic — spark 100k context limit):
|
|
151
|
+
/rlp-desk run <actual-slug> --mode tmux --worker-model spark:high --consensus final-only --debug
|
|
152
152
|
|
|
153
153
|
# Critical (full consensus on every verify):
|
|
154
154
|
/rlp-desk run <actual-slug> --mode tmux --worker-model gpt-5.4:high --consensus all --debug
|
|
@@ -158,7 +158,7 @@ Tell the user:
|
|
|
158
158
|
|
|
159
159
|
# Full options reference:
|
|
160
160
|
# --mode agent|tmux (default: agent)
|
|
161
|
-
# --worker-model MODEL haiku|sonnet|opus or
|
|
161
|
+
# --worker-model MODEL haiku|sonnet|opus or gpt-5.4:high|spark:high (default: haiku)
|
|
162
162
|
# --lock-worker-model disable auto model upgrade
|
|
163
163
|
# --verifier-model MODEL per-US verifier (default: sonnet)
|
|
164
164
|
# --final-verifier-model MODEL final ALL verifier (default: opus)
|
|
@@ -707,7 +707,7 @@ Example:
|
|
|
707
707
|
|
|
708
708
|
Run options:
|
|
709
709
|
--mode agent|tmux Execution mode (default: agent)
|
|
710
|
-
--worker-model MODEL Worker model: haiku|sonnet|opus or
|
|
710
|
+
--worker-model MODEL Worker model: haiku|sonnet|opus or gpt-5.4:high|spark:high (default: haiku)
|
|
711
711
|
--lock-worker-model Disable auto model upgrade on failure
|
|
712
712
|
--verifier-model MODEL per-US verifier (default: sonnet)
|
|
713
713
|
--final-verifier-model MODEL Final ALL verifier (default: opus)
|
|
@@ -203,10 +203,10 @@ print_run_presets() {
|
|
|
203
203
|
echo "Available run commands (copy the one you want):"
|
|
204
204
|
echo ""
|
|
205
205
|
if [[ $codex_available -eq 1 ]]; then
|
|
206
|
-
echo "# Recommended: cross-engine + final-consensus (
|
|
207
|
-
echo "/rlp-desk run $slug --worker-model gpt-5.4:
|
|
206
|
+
echo "# Recommended: cross-engine + final-consensus (full context + blind-spot coverage):"
|
|
207
|
+
echo "/rlp-desk run $slug --worker-model gpt-5.4:medium --final-consensus --debug"
|
|
208
208
|
echo ""
|
|
209
|
-
echo "#
|
|
209
|
+
echo "# Small tasks only (single-file, AC <= 4, simple logic — spark 100k context limit):"
|
|
210
210
|
echo "/rlp-desk run $slug --worker-model gpt-5.3-codex-spark:high --debug"
|
|
211
211
|
echo ""
|
|
212
212
|
echo "# Claude-only:"
|
|
@@ -322,6 +322,11 @@ Read these files in order:
|
|
|
322
322
|
3. Test Spec: $DESK/plans/test-spec-$SLUG.md → verification methods
|
|
323
323
|
4. Latest Context: $DESK/context/$SLUG-latest.md → current state
|
|
324
324
|
|
|
325
|
+
## TDD MANDATE (hard constraint — violation = automatic FAIL)
|
|
326
|
+
> Write failing tests FIRST → confirm RED (exit_code=1) → implement minimum code → confirm GREEN.
|
|
327
|
+
> Every NEW AC requires: write_test → verify_red → implement → verify_green in execution_steps.
|
|
328
|
+
> No exceptions. Verifier rejects missing RED evidence. For already-passing ACs, use verify_existing.
|
|
329
|
+
|
|
325
330
|
## SCOPE LOCK (hard constraint — violation causes verification failure)
|
|
326
331
|
- You MUST only implement the work described in the "Next Iteration Contract" from campaign memory.
|
|
327
332
|
- If the contract says "implement US-001 only", do ONLY that. Do NOT touch other stories.
|
|
@@ -330,28 +335,19 @@ Read these files in order:
|
|
|
330
335
|
- No file creation or modification outside the project root.
|
|
331
336
|
- Do not modify this prompt file or any PRD/test-spec files.
|
|
332
337
|
|
|
333
|
-
## Test-First Approach (read test-spec BEFORE coding)
|
|
334
|
-
1. Read test-spec "Impacted Tests" — if TODO (first iteration), skip to step 2 and fill this section during your work. Otherwise, run these FIRST to confirm they pass before your changes.
|
|
335
|
-
2. Read test-spec "Required New Tests" — write these. They SHOULD FAIL initially.
|
|
336
|
-
3. Implement minimum code to make all tests pass.
|
|
337
|
-
4. Run ALL tests (impacted + new) to confirm nothing is broken.
|
|
338
|
-
|
|
339
338
|
## Forbidden Shortcuts (Verifier will check these)
|
|
340
339
|
- Do not mock external services when L2 integration test is required by test-spec.
|
|
341
340
|
- Do not delete or weaken existing assertions to make tests pass.
|
|
342
|
-
- Do not add test-specific logic (code that detects it is running in a test).
|
|
343
341
|
- Do not skip boundary cases listed in the PRD.
|
|
344
|
-
- Do not claim "code inspection" as verification — run the actual command.
|
|
345
|
-
- Do not say "too simple to test" — simple code breaks. Test takes 30 seconds.
|
|
346
|
-
- Do not say "I'll test after" — tests passing immediately prove nothing.
|
|
347
|
-
- Do not say "already manually tested" — ad-hoc is not systematic, no record.
|
|
348
|
-
- Do not say "partial check is enough" — partial proves nothing about the whole.
|
|
349
|
-
- Do not say "I'm confident" — confidence is not evidence.
|
|
350
|
-
- Do not say "existing code has no tests" — you are improving it, add tests.
|
|
351
342
|
- Do not write code before tests — if you did, delete it and start with tests.
|
|
352
343
|
- **NEVER modify rlp-desk infrastructure files** (~/.claude/ralph-desk/*, ~/.claude/commands/rlp-desk.md). If you discover a bug in rlp-desk itself, report it in done-claim.json with {"status": "blocked", "reason": "rlp-desk bug: <description>"} and signal blocked. Do NOT attempt to fix rlp-desk — it is the orchestration tool, not your project code.
|
|
353
344
|
- **NEVER modify Claude Code settings** (~/.claude/settings.json, .claude/settings.local.json, or any settings files). Do NOT add permissions, change models, or alter configuration. If a permission prompt blocks you, report it as blocked — do NOT try to edit settings to bypass it.
|
|
354
345
|
|
|
346
|
+
## When Stuck (do NOT guess-and-fix)
|
|
347
|
+
> 1. STOP and READ the error. Trace the call stack. Identify the root cause before touching code.
|
|
348
|
+
> 2. Write a minimal test that reproduces the failure, then fix the root cause only.
|
|
349
|
+
> 3. If 3+ fixes fail on the same issue, signal "blocked" with your diagnosis.
|
|
350
|
+
|
|
355
351
|
## Iteration rules
|
|
356
352
|
- Use fresh context only; do NOT depend on prior chat history.
|
|
357
353
|
- Execute exactly the work specified in the Next Iteration Contract.
|
|
@@ -478,10 +474,11 @@ Check the iter-signal.json "us_id" field:
|
|
|
478
474
|
- If your verdict history shows a 100% pass rate, re-examine your last verdict with increased scrutiny — a 100% pass rate is a red flag for insufficient rigor
|
|
479
475
|
- When issuing PASS with explicit warning: note any concerning patterns (e.g., low test diversity, marginal coverage) even if technically passing
|
|
480
476
|
- Never issue a silent PASS — every pass verdict must cite specific evidence for each AC checked
|
|
477
|
+
- Rationalization red flags: "tests pass so it works" (passing ≠ correct), "Worker is confident" (confidence ≠ evidence), "changes are minimal" (scope ≠ correctness)
|
|
481
478
|
10½. **Worker Process Audit**:
|
|
482
479
|
- Test-first compliance: done-claim execution_steps must show write_test step before implement step for each AC
|
|
483
480
|
- RED phase evidence: at least one verify_red step with exit_code=1 per AC (proves tests were written before passing)
|
|
484
|
-
- Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "already manually tested", "partial check")
|
|
481
|
+
- Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "I'll test after", "already manually tested", "partial check")
|
|
485
482
|
- Step completeness: each AC should have write_test → verify_red → implement → verify_green sequence in execution_steps
|
|
486
483
|
11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
|
|
487
484
|
12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
|
|
@@ -29,6 +29,33 @@ log_error() {
|
|
|
29
29
|
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" >&2
|
|
30
30
|
}
|
|
31
31
|
|
|
32
|
+
# build_claude_cmd() — centralized claude CLI command builder
|
|
33
|
+
# Single source of truth for all claude invocation flags (--no-mcp, DISABLE_OMC, etc.)
|
|
34
|
+
# Inspired by codex-plugin-cc companion pattern: CLI abstraction in one place.
|
|
35
|
+
# Args: $1=mode (tui|print) $2=model $3=prompt_file (print mode only) $4=output_log (print mode only)
|
|
36
|
+
# Output: complete command string on stdout
|
|
37
|
+
# Globals read: CLAUDE_BIN
|
|
38
|
+
build_claude_cmd() {
|
|
39
|
+
local mode="$1"
|
|
40
|
+
local model="$2"
|
|
41
|
+
local prompt_file="${3:-}"
|
|
42
|
+
local output_log="${4:-}"
|
|
43
|
+
|
|
44
|
+
local base="DISABLE_OMC=1 $CLAUDE_BIN --model $model --no-mcp --dangerously-skip-permissions"
|
|
45
|
+
case "$mode" in
|
|
46
|
+
tui)
|
|
47
|
+
echo "$base"
|
|
48
|
+
;;
|
|
49
|
+
print)
|
|
50
|
+
echo "$base -p \"\$(cat $prompt_file)\" --output-format text 2>&1 | tee $output_log"
|
|
51
|
+
;;
|
|
52
|
+
*)
|
|
53
|
+
echo "ERROR: build_claude_cmd unknown mode '$mode'" >&2
|
|
54
|
+
return 1
|
|
55
|
+
;;
|
|
56
|
+
esac
|
|
57
|
+
}
|
|
58
|
+
|
|
32
59
|
# parse_model_flag() — parse unified --worker-model / --verifier-model value
|
|
33
60
|
# Colon format (model:reasoning) → codex engine; plain name → claude engine.
|
|
34
61
|
# Spark alias: bare "spark" is expanded to full model ID "gpt-5.3-codex-spark".
|
|
@@ -1022,7 +1022,7 @@ restart_worker() {
|
|
|
1022
1022
|
if [[ "$WORKER_ENGINE" = "codex" ]]; then
|
|
1023
1023
|
safe_send_keys "$pane_id" "${CODEX_BIN:-codex} -m $WORKER_CODEX_MODEL -c model_reasoning_effort=\"$WORKER_CODEX_REASONING\" --dangerously-bypass-approvals-and-sandbox"
|
|
1024
1024
|
else
|
|
1025
|
-
safe_send_keys "$pane_id" "$
|
|
1025
|
+
safe_send_keys "$pane_id" "$(build_claude_cmd tui "$WORKER_MODEL")"
|
|
1026
1026
|
fi
|
|
1027
1027
|
WORKER_RESTARTS[$iter]=$((restart_count + 1))
|
|
1028
1028
|
return 0
|
|
@@ -1164,12 +1164,9 @@ write_worker_trigger() {
|
|
|
1164
1164
|
\"\$(cat $prompt_file)\""
|
|
1165
1165
|
local engine_comment="# Run codex with fresh context (fallback trigger — TUI primary launch via launch_worker_codex)"
|
|
1166
1166
|
else
|
|
1167
|
-
local engine_cmd
|
|
1168
|
-
|
|
1169
|
-
|
|
1170
|
-
--output-format text \\
|
|
1171
|
-
2>&1 | tee $output_log"
|
|
1172
|
-
local engine_comment="# Run claude with fresh context (governance.md s7 step 5)"
|
|
1167
|
+
local engine_cmd
|
|
1168
|
+
engine_cmd=$(build_claude_cmd print "$WORKER_MODEL" "$prompt_file" "$output_log")
|
|
1169
|
+
local engine_comment="# Run claude with fresh context, no MCP/skills (governance.md s7 step 5)"
|
|
1173
1170
|
fi
|
|
1174
1171
|
|
|
1175
1172
|
{
|
|
@@ -1253,12 +1250,9 @@ write_verifier_trigger() {
|
|
|
1253
1250
|
2>&1 | tee $output_log"
|
|
1254
1251
|
local engine_comment="# Run codex with fresh context (governance.md s7 step 7)"
|
|
1255
1252
|
else
|
|
1256
|
-
local engine_cmd
|
|
1257
|
-
|
|
1258
|
-
|
|
1259
|
-
--output-format text \\
|
|
1260
|
-
2>&1 | tee $output_log"
|
|
1261
|
-
local engine_comment="# Run claude with fresh context (governance.md s7 step 7)"
|
|
1253
|
+
local engine_cmd
|
|
1254
|
+
engine_cmd=$(build_claude_cmd print "$verifier_model" "$prompt_file" "$output_log")
|
|
1255
|
+
local engine_comment="# Run claude with fresh context, no MCP/skills (governance.md s7 step 7)"
|
|
1262
1256
|
fi
|
|
1263
1257
|
|
|
1264
1258
|
{
|
|
@@ -1661,7 +1655,7 @@ run_single_verifier() {
|
|
|
1661
1655
|
launch_verifier_codex "$VERIFIER_PANE" "$prompt_file" "$iter" "$verifier_launch"
|
|
1662
1656
|
log_debug "Verifier$suffix codex TUI dispatched"
|
|
1663
1657
|
else
|
|
1664
|
-
verifier_launch="$
|
|
1658
|
+
verifier_launch="$(build_claude_cmd tui "$model")"
|
|
1665
1659
|
if ! launch_verifier_claude "$VERIFIER_PANE" "$prompt_file" "$iter" "$verifier_launch"; then
|
|
1666
1660
|
log_error "Verifier$suffix failed to start"
|
|
1667
1661
|
return 1
|
|
@@ -1746,7 +1740,7 @@ run_sequential_final_verify() {
|
|
|
1746
1740
|
verifier_launch="${CODEX_BIN:-codex} -m $VERIFIER_CODEX_MODEL -c model_reasoning_effort=\"$VERIFIER_CODEX_REASONING\" --dangerously-bypass-approvals-and-sandbox"
|
|
1747
1741
|
launch_verifier_codex "$VERIFIER_PANE" "$verifier_prompt" "$iter" "$verifier_launch"
|
|
1748
1742
|
else
|
|
1749
|
-
verifier_launch="$
|
|
1743
|
+
verifier_launch="$(build_claude_cmd tui "$VERIFIER_MODEL")"
|
|
1750
1744
|
launch_verifier_claude "$VERIFIER_PANE" "$verifier_prompt" "$iter" "$verifier_launch" || {
|
|
1751
1745
|
log_error "Failed to launch verifier for $us"
|
|
1752
1746
|
FAILED_US="$us"
|
|
@@ -2191,7 +2185,7 @@ main() {
|
|
|
2191
2185
|
return 1
|
|
2192
2186
|
fi
|
|
2193
2187
|
else
|
|
2194
|
-
worker_launch="$
|
|
2188
|
+
worker_launch="$(build_claude_cmd tui "$WORKER_MODEL")"
|
|
2195
2189
|
if ! launch_worker_claude "$WORKER_PANE" "$worker_prompt" "$ITERATION" "$worker_launch"; then
|
|
2196
2190
|
write_blocked_sentinel "Worker claude failed to start in pane"
|
|
2197
2191
|
update_status "blocked" "worker_start_failed"
|
|
@@ -2368,7 +2362,7 @@ main() {
|
|
|
2368
2362
|
if [[ "$VERIFIER_ENGINE" = "codex" ]]; then
|
|
2369
2363
|
verifier_launch="${CODEX_BIN:-codex} -m $VERIFIER_CODEX_MODEL -c model_reasoning_effort=\"$VERIFIER_CODEX_REASONING\" --dangerously-bypass-approvals-and-sandbox"
|
|
2370
2364
|
else
|
|
2371
|
-
verifier_launch="$
|
|
2365
|
+
verifier_launch="$(build_claude_cmd tui "$VERIFIER_MODEL")"
|
|
2372
2366
|
fi
|
|
2373
2367
|
log_debug "[FLOW] iter=$ITERATION phase=verifier engine=$VERIFIER_ENGINE model=$VERIFIER_MODEL scope=${signal_us_id:-all} dispatched=true"
|
|
2374
2368
|
|