@ai-dev-methodologies/rlp-desk 0.9.3 → 0.10.0
- package/README.md +5 -5
- package/docs/protocol-reference.md +1 -1
- package/docs/superpowers/plans/2026-04-24-gpt-5-5-default.md +517 -0
- package/docs/superpowers/specs/2026-04-24-gpt-5-5-default.md +107 -0
- package/package.json +1 -1
- package/src/commands/rlp-desk.md +23 -23
- package/src/governance.md +7 -7
- package/src/model-upgrade-table.md +5 -5
- package/src/node/run.mjs +2 -2
- package/src/node/runner/campaign-main-loop.mjs +3 -3
package/README.md
CHANGED
|
@@ -244,8 +244,8 @@ When all US pass individually, the final ALL verify runs **sequentially per-US**
|
|
|
244
244
|
| `--verifier-model MODEL` | sonnet | per-US verification model (lighter) |
|
|
245
245
|
| `--final-verifier-model MODEL` | opus | final ALL verification model (stricter) |
|
|
246
246
|
| `--consensus off\|all\|final-only` | off | Cross-engine consensus scope |
|
|
247
|
-
| `--consensus-model MODEL` | gpt-5.
|
|
248
|
-
| `--final-consensus-model MODEL` | gpt-5.
|
|
247
|
+
| `--consensus-model MODEL` | gpt-5.5:medium | per-US cross-verifier (lighter) |
|
|
248
|
+
| `--final-consensus-model MODEL` | gpt-5.5:high | final cross-verifier (stricter) |
|
|
249
249
|
| `--verify-mode per-us\|batch` | per-us | per-us: verify each US → final ALL |
|
|
250
250
|
| `--cb-threshold N` | 6 | Consecutive failures → BLOCKED |
|
|
251
251
|
| `--max-iter N` | 100 | Max iterations → TIMEOUT |
|
|
@@ -267,7 +267,7 @@ When `--consensus` is enabled, a second cross-engine verifier runs alongside eac
|
|
|
267
267
|
After `brainstorm`, `init` detects your environment and presents run command presets:
|
|
268
268
|
|
|
269
269
|
- **Codex detected (GPT Pro / spark)** → recommends cross-engine mode (`--worker-model spark:high --consensus final-only`)
|
|
270
|
-
- **Codex detected (large PRD, AC > 15)** → offers gpt-5.
|
|
270
|
+
- **Codex detected (large PRD, AC > 15)** → offers gpt-5.5 preset (`--worker-model gpt-5.5:high --consensus final-only`)
|
|
271
271
|
- **Claude-only** → defaults to `--debug` with haiku worker and opus final verifier
|
|
272
272
|
- **Basic** → minimal flags for quick iteration
|
|
273
273
|
|
|
@@ -363,7 +363,7 @@ npm install -g @openai/codex
|
|
|
363
363
|
/rlp-desk run calculator --worker-model spark:high
|
|
364
364
|
|
|
365
365
|
# Customize model and reasoning effort
|
|
366
|
-
/rlp-desk run calculator --worker-model gpt-5.
|
|
366
|
+
/rlp-desk run calculator --worker-model gpt-5.5:high
|
|
367
367
|
|
|
368
368
|
# Cross-engine: codex worker, claude verifier (recommended)
|
|
369
369
|
/rlp-desk run calculator --worker-model spark:high --consensus final-only --debug
|
|
@@ -406,7 +406,7 @@ By default, Worker and Verifier stop and ask for human input when they encounter
|
|
|
406
406
|
**`--autonomous`** enables fully unattended campaigns:
|
|
407
407
|
|
|
408
408
|
```bash
|
|
409
|
-
/rlp-desk run my-feature --mode tmux --worker-model gpt-5.
|
|
409
|
+
/rlp-desk run my-feature --mode tmux --worker-model gpt-5.5:medium --autonomous --debug
|
|
410
410
|
```
|
|
411
411
|
|
|
412
412
|
### How it works
|
|
@@ -438,7 +438,7 @@ The `run` command accepts engine flags to control which CLI executes Worker and
|
|
|
438
438
|
|------|---------|-------------|
|
|
439
439
|
| `--worker-engine claude\|codex` | `claude` | Engine for Worker |
|
|
440
440
|
| `--verifier-engine claude\|codex` | `claude` | Engine for Verifier |
|
|
441
|
-
| `--codex-model MODEL` | `gpt-5.
|
|
441
|
+
| `--codex-model MODEL` | `gpt-5.5` | Model passed to the `codex` CLI (when engine=codex) |
|
|
442
442
|
| `--codex-reasoning low\|medium\|high` | `high` | Reasoning effort for the `codex` CLI |
|
|
443
443
|
|
|
444
444
|
**Claude engine** (default): uses `Agent()` in agent mode, `claude -p` with `--dangerously-skip-permissions` in tmux mode.
|
|
package/docs/superpowers/plans/2026-04-24-gpt-5-5-default.md
@@ -0,0 +1,517 @@
|
|
|
1
|
+
# Default Codex Model Migration (gpt-5.4 → gpt-5.5) Implementation Plan
|
|
2
|
+
|
|
3
|
+
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
4
|
+
|
|
5
|
+
**Goal:** Migrate rlp-desk's default Codex model references and env-var defaults from `gpt-5.4` to `gpt-5.5` across 6 source files, without adding remap logic or altering reasoning-effort tiers. Ship as v0.10.0.
|
|
6
|
+
|
|
7
|
+
**Architecture:** Purely textual replacement in configuration sites (shell defaults, Node defaults, upgrade-chain keys, doc examples, upgrade-table markdown) plus a `version` bump. No runtime logic is added. User-supplied model strings are unchanged — Codex CLI already accepts any passthrough name.
|
|
8
|
+
|
|
9
|
+
**Tech Stack:** zsh, Node.js (ESM), markdown, npm (package.json). No new dependencies.
|
|
10
|
+
|
|
11
|
+
**Branch:** `feature/gpt-5-5-default` (already created at plan time).
|
|
12
|
+
|
|
13
|
+
**Spec:** `docs/superpowers/specs/2026-04-24-gpt-5-5-default.md`.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## File Map
|
|
18
|
+
|
|
19
|
+
| File | Role after this plan |
|
|
20
|
+
|---|---|
|
|
21
|
+
| `src/scripts/run_ralph_desk.zsh` | Shell entrypoint. Env var defaults now quote `gpt-5.5` in four places and in two CLI fallback sites. |
|
|
22
|
+
| `src/node/run.mjs` | Node CLI defaults. `consensusModel` and `finalConsensusModel` default to `gpt-5.5:medium` / `gpt-5.5:high`. |
|
|
23
|
+
| `src/node/runner/campaign-main-loop.mjs` | Upgrade chain keys use `gpt-5.5` namespace. Claude-engine and spark rows unchanged. |
|
|
24
|
+
| `src/model-upgrade-table.md` | "Non-Pro" section retitled to `gpt-5.5`; table cells use `gpt-5.5:*`. |
|
|
25
|
+
| `src/commands/rlp-desk.md` | Recommendation table, worker-model guidance, batch warnings, examples, and help text reference `gpt-5.5`. |
|
|
26
|
+
| `package.json` | `version` bumped from `0.9.3` to `0.10.0`. |
|
|
27
|
+
| `docs/superpowers/specs/2026-04-24-gpt-5-5-default.md` | Already written; intentionally contains `gpt-5.4` as historical context. Excluded from regression sweep. |
|
|
28
|
+
|
|
29
|
+
Self-verification gate (per `CLAUDE.md`) is a required workflow step before commit; it does not produce tracked files but is captured as Task 8.
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## Task 1: Update shell defaults in `run_ralph_desk.zsh`
|
|
34
|
+
|
|
35
|
+
**Files:**
|
|
36
|
+
- Modify: `src/scripts/run_ralph_desk.zsh:33,35,104,106,114-115,2762,2766`
|
|
37
|
+
|
|
38
|
+
- [ ] **Step 1: Replace comment defaults on lines 33 and 35**
|
|
39
|
+
|
|
40
|
+
Before (lines 33 and 35):
|
|
41
|
+
|
|
42
|
+
```zsh
|
|
43
|
+
# WORKER_CODEX_MODEL - codex model for Worker (default: gpt-5.4)
|
|
44
|
+
...
|
|
45
|
+
# VERIFIER_CODEX_MODEL - codex model for Verifier (default: gpt-5.4)
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
After:
|
|
49
|
+
|
|
50
|
+
```zsh
|
|
51
|
+
# WORKER_CODEX_MODEL - codex model for Worker (default: gpt-5.5)
|
|
52
|
+
...
|
|
53
|
+
# VERIFIER_CODEX_MODEL - codex model for Verifier (default: gpt-5.5)
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
Edit via the Edit tool, matching the full line so the change is unambiguous.
|
|
57
|
+
|
|
58
|
+
- [ ] **Step 2: Replace env var defaults on lines 104 and 106**
|
|
59
|
+
|
|
60
|
+
Before:
|
|
61
|
+
|
|
62
|
+
```zsh
|
|
63
|
+
WORKER_CODEX_MODEL="${WORKER_CODEX_MODEL:-gpt-5.4}"
|
|
64
|
+
WORKER_CODEX_REASONING="${WORKER_CODEX_REASONING:-high}" # low|medium|high
|
|
65
|
+
VERIFIER_CODEX_MODEL="${VERIFIER_CODEX_MODEL:-gpt-5.4}"
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
After:
|
|
69
|
+
|
|
70
|
+
```zsh
|
|
71
|
+
WORKER_CODEX_MODEL="${WORKER_CODEX_MODEL:-gpt-5.5}"
|
|
72
|
+
WORKER_CODEX_REASONING="${WORKER_CODEX_REASONING:-high}" # low|medium|high
|
|
73
|
+
VERIFIER_CODEX_MODEL="${VERIFIER_CODEX_MODEL:-gpt-5.5}"
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
- [ ] **Step 3: Replace consensus defaults on lines 114 and 115**
|
|
77
|
+
|
|
78
|
+
Before:
|
|
79
|
+
|
|
80
|
+
```zsh
|
|
81
|
+
CONSENSUS_MODEL="${CONSENSUS_MODEL:-gpt-5.4:medium}" # per-US cross-verifier (lighter)
|
|
82
|
+
FINAL_CONSENSUS_MODEL="${FINAL_CONSENSUS_MODEL:-gpt-5.4:high}" # final cross-verifier (stricter)
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
After:
|
|
86
|
+
|
|
87
|
+
```zsh
|
|
88
|
+
CONSENSUS_MODEL="${CONSENSUS_MODEL:-gpt-5.5:medium}" # per-US cross-verifier (lighter)
|
|
89
|
+
FINAL_CONSENSUS_MODEL="${FINAL_CONSENSUS_MODEL:-gpt-5.5:high}" # final cross-verifier (stricter)
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
- [ ] **Step 4: Replace CLI fallback defaults on lines 2762 and 2766**
|
|
93
|
+
|
|
94
|
+
Before:
|
|
95
|
+
|
|
96
|
+
```zsh
|
|
97
|
+
--consensus-model)
|
|
98
|
+
(( _cli_i++ ))
|
|
99
|
+
CONSENSUS_MODEL="${@[$_cli_i]:-gpt-5.4:medium}"
|
|
100
|
+
;;
|
|
101
|
+
--final-consensus-model)
|
|
102
|
+
(( _cli_i++ ))
|
|
103
|
+
FINAL_CONSENSUS_MODEL="${@[$_cli_i]:-gpt-5.4:high}"
|
|
104
|
+
;;
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
After:
|
|
108
|
+
|
|
109
|
+
```zsh
|
|
110
|
+
--consensus-model)
|
|
111
|
+
(( _cli_i++ ))
|
|
112
|
+
CONSENSUS_MODEL="${@[$_cli_i]:-gpt-5.5:medium}"
|
|
113
|
+
;;
|
|
114
|
+
--final-consensus-model)
|
|
115
|
+
(( _cli_i++ ))
|
|
116
|
+
FINAL_CONSENSUS_MODEL="${@[$_cli_i]:-gpt-5.5:high}"
|
|
117
|
+
;;
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
- [ ] **Step 5: Verify zero residual `gpt-5.4` in file**
|
|
121
|
+
|
|
122
|
+
Run:
|
|
123
|
+
|
|
124
|
+
```bash
|
|
125
|
+
grep -n "gpt-5\.4" src/scripts/run_ralph_desk.zsh
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
Expected: no output. Any hit means the previous steps missed a line.
|
|
129
|
+
|
|
130
|
+
- [ ] **Step 6: Shell syntax sanity check**
|
|
131
|
+
|
|
132
|
+
Run:
|
|
133
|
+
|
|
134
|
+
```bash
|
|
135
|
+
zsh -n src/scripts/run_ralph_desk.zsh
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
Expected: no output (exit 0). Parse errors, if any, are printed to stderr.
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## Task 2: Update Node defaults in `run.mjs`
|
|
143
|
+
|
|
144
|
+
**Files:**
|
|
145
|
+
- Modify: `src/node/run.mjs:14-15`
|
|
146
|
+
|
|
147
|
+
- [ ] **Step 1: Replace default model strings in `RUN_DEFAULTS`**
|
|
148
|
+
|
|
149
|
+
Before (lines 14-15):
|
|
150
|
+
|
|
151
|
+
```js
|
|
152
|
+
consensusModel: 'gpt-5.4:medium',
|
|
153
|
+
finalConsensusModel: 'gpt-5.4:high',
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
After:
|
|
157
|
+
|
|
158
|
+
```js
|
|
159
|
+
consensusModel: 'gpt-5.5:medium',
|
|
160
|
+
finalConsensusModel: 'gpt-5.5:high',
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
- [ ] **Step 2: Verify zero residual `gpt-5.4` in file**
|
|
164
|
+
|
|
165
|
+
Run:
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
grep -n "gpt-5\.4" src/node/run.mjs
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
Expected: no output.
|
|
172
|
+
|
|
173
|
+
- [ ] **Step 3: Parse check**
|
|
174
|
+
|
|
175
|
+
Run:
|
|
176
|
+
|
|
177
|
+
```bash
|
|
178
|
+
node --check src/node/run.mjs
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
Expected: no output (exit 0). Syntax errors would print to stderr.
|
|
182
|
+
|
|
183
|
+
---
|
|
184
|
+
|
|
185
|
+
## Task 3: Update upgrade chain in `campaign-main-loop.mjs`
|
|
186
|
+
|
|
187
|
+
**Files:**
|
|
188
|
+
- Modify: `src/node/runner/campaign-main-loop.mjs:29-31`
|
|
189
|
+
|
|
190
|
+
- [ ] **Step 1: Replace the non-spark rows of `MODEL_UPGRADES`**
|
|
191
|
+
|
|
192
|
+
Before (lines 28-35):
|
|
193
|
+
|
|
194
|
+
```js
|
|
195
|
+
const MODEL_UPGRADES = {
|
|
196
|
+
'gpt-5.4:medium': 'gpt-5.4:high',
|
|
197
|
+
'gpt-5.4:high': 'gpt-5.4:xhigh',
|
|
198
|
+
'gpt-5.4:xhigh': 'BLOCKED',
|
|
199
|
+
'gpt-5.3-codex-spark:medium': 'gpt-5.3-codex-spark:high',
|
|
200
|
+
'gpt-5.3-codex-spark:high': 'gpt-5.3-codex-spark:xhigh',
|
|
201
|
+
'gpt-5.3-codex-spark:xhigh': 'BLOCKED',
|
|
202
|
+
};
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
After:
|
|
206
|
+
|
|
207
|
+
```js
|
|
208
|
+
const MODEL_UPGRADES = {
|
|
209
|
+
'gpt-5.5:medium': 'gpt-5.5:high',
|
|
210
|
+
'gpt-5.5:high': 'gpt-5.5:xhigh',
|
|
211
|
+
'gpt-5.5:xhigh': 'BLOCKED',
|
|
212
|
+
'gpt-5.3-codex-spark:medium': 'gpt-5.3-codex-spark:high',
|
|
213
|
+
'gpt-5.3-codex-spark:high': 'gpt-5.3-codex-spark:xhigh',
|
|
214
|
+
'gpt-5.3-codex-spark:xhigh': 'BLOCKED',
|
|
215
|
+
};
|
|
216
|
+
```
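The chain encoded by this table can be walked mechanically. The following is a minimal sketch, assuming the table is a plain object whose values are either another key or the string `BLOCKED`; `upgradePath` is a hypothetical helper written for illustration and is not part of `campaign-main-loop.mjs`.

```js
// Inline copy of the table from the "After" block in this step.
const MODEL_UPGRADES = {
  'gpt-5.5:medium': 'gpt-5.5:high',
  'gpt-5.5:high': 'gpt-5.5:xhigh',
  'gpt-5.5:xhigh': 'BLOCKED',
  'gpt-5.3-codex-spark:medium': 'gpt-5.3-codex-spark:high',
  'gpt-5.3-codex-spark:high': 'gpt-5.3-codex-spark:xhigh',
  'gpt-5.3-codex-spark:xhigh': 'BLOCKED',
};

// Hypothetical helper: follow the chain from `start` until it reaches
// BLOCKED, a model with no entry, or a model already visited (cycle guard).
function upgradePath(start, table = MODEL_UPGRADES) {
  const path = [start];
  const seen = new Set(path);
  let current = start;
  while (Object.prototype.hasOwnProperty.call(table, current)) {
    current = table[current];
    path.push(current);
    if (current === 'BLOCKED' || seen.has(current)) break;
    seen.add(current);
  }
  return path;
}

console.log(upgradePath('gpt-5.5:medium').join(' -> '));
// gpt-5.5:medium -> gpt-5.5:high -> gpt-5.5:xhigh -> BLOCKED
console.log(upgradePath('gpt-5.4:medium'));
// [ 'gpt-5.4:medium' ]  (no entry in the new table)
```

In this sketch a user-supplied `gpt-5.4:*` model simply has no upgrade entry after the rename. How the real main loop treats a missing key is not shown in this diff, so that behavior should be checked against the source rather than inferred from the sketch.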
|
|
217
|
+
|
|
218
|
+
- [ ] **Step 2: Verify zero residual `gpt-5.4` in file**
|
|
219
|
+
|
|
220
|
+
Run:
|
|
221
|
+
|
|
222
|
+
```bash
|
|
223
|
+
grep -n "gpt-5\.4" src/node/runner/campaign-main-loop.mjs
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
Expected: no output.
|
|
227
|
+
|
|
228
|
+
- [ ] **Step 3: Parse check**
|
|
229
|
+
|
|
230
|
+
Run:
|
|
231
|
+
|
|
232
|
+
```bash
|
|
233
|
+
node --check src/node/runner/campaign-main-loop.mjs
|
|
234
|
+
```
|
|
235
|
+
|
|
236
|
+
Expected: no output.
|
|
237
|
+
|
|
238
|
+
---
|
|
239
|
+
|
|
240
|
+
## Task 4: Update upgrade table in `model-upgrade-table.md`
|
|
241
|
+
|
|
242
|
+
**Files:**
|
|
243
|
+
- Modify: `src/model-upgrade-table.md:21-28`
|
|
244
|
+
|
|
245
|
+
- [ ] **Step 1: Rename the Non-Pro section and rewrite the table body**
|
|
246
|
+
|
|
247
|
+
Before (lines 21-28):
|
|
248
|
+
|
|
249
|
+
```markdown
|
|
250
|
+
## Non-Pro (gpt-5.4)
|
|
251
|
+
|
|
252
|
+
| Complexity | 1-2 | 3-4 | 5-6 | 7+ |
|
|
253
|
+
|------------|-----|-----|-----|-----|
|
|
254
|
+
| LOW | gpt-5.4:low | gpt-5.4:medium | gpt-5.4:high | BLOCKED |
|
|
255
|
+
| MEDIUM | gpt-5.4:medium | gpt-5.4:high | gpt-5.4:xhigh | BLOCKED |
|
|
256
|
+
| HIGH | gpt-5.4:high | gpt-5.4:xhigh | gpt-5.4:xhigh | BLOCKED |
|
|
257
|
+
| CRITICAL | gpt-5.4:xhigh | gpt-5.4:xhigh | gpt-5.4:xhigh | BLOCKED |
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
After:
|
|
261
|
+
|
|
262
|
+
```markdown
|
|
263
|
+
## Non-Pro (gpt-5.5)
|
|
264
|
+
|
|
265
|
+
| Complexity | 1-2 | 3-4 | 5-6 | 7+ |
|
|
266
|
+
|------------|-----|-----|-----|-----|
|
|
267
|
+
| LOW | gpt-5.5:low | gpt-5.5:medium | gpt-5.5:high | BLOCKED |
|
|
268
|
+
| MEDIUM | gpt-5.5:medium | gpt-5.5:high | gpt-5.5:xhigh | BLOCKED |
|
|
269
|
+
| HIGH | gpt-5.5:high | gpt-5.5:xhigh | gpt-5.5:xhigh | BLOCKED |
|
|
270
|
+
| CRITICAL | gpt-5.5:xhigh | gpt-5.5:xhigh | gpt-5.5:xhigh | BLOCKED |
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
The GPT Pro and Claude-only sections remain unchanged.
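As an illustration of how a table like this can be read programmatically, here is a small lookup sketch. It assumes the column headers `1-2`, `3-4`, `5-6`, `7+` are bands of a 1-based count (for example an attempt number); the table itself does not say what is being counted, and `bandIndex` and `pickModel` are hypothetical names, not functions from rlp-desk.

```js
// Rows copied from the "After" table; columns are the bands 1-2, 3-4, 5-6, 7+.
const NON_PRO_TABLE = {
  LOW: ['gpt-5.5:low', 'gpt-5.5:medium', 'gpt-5.5:high', 'BLOCKED'],
  MEDIUM: ['gpt-5.5:medium', 'gpt-5.5:high', 'gpt-5.5:xhigh', 'BLOCKED'],
  HIGH: ['gpt-5.5:high', 'gpt-5.5:xhigh', 'gpt-5.5:xhigh', 'BLOCKED'],
  CRITICAL: ['gpt-5.5:xhigh', 'gpt-5.5:xhigh', 'gpt-5.5:xhigh', 'BLOCKED'],
};

// Assumption: `count` is a positive integer; 1-2 -> 0, 3-4 -> 1, 5-6 -> 2, 7+ -> 3.
function bandIndex(count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError(`count must be a positive integer, got ${count}`);
  }
  return Math.min(Math.floor((count - 1) / 2), 3);
}

// Hypothetical lookup: complexity row crossed with the count band.
function pickModel(complexity, count) {
  const row = NON_PRO_TABLE[complexity];
  if (!row) throw new Error(`unknown complexity: ${complexity}`);
  return row[bandIndex(count)];
}

console.log(pickModel('MEDIUM', 4)); // gpt-5.5:high
console.log(pickModel('LOW', 7)); // BLOCKED
```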
|
|
274
|
+
|
|
275
|
+
- [ ] **Step 2: Verify zero residual `gpt-5.4` in file**
|
|
276
|
+
|
|
277
|
+
Run:
|
|
278
|
+
|
|
279
|
+
```bash
|
|
280
|
+
grep -n "gpt-5\.4" src/model-upgrade-table.md
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
Expected: no output.
|
|
284
|
+
|
|
285
|
+
---
|
|
286
|
+
|
|
287
|
+
## Task 5: Update user-facing documentation in `rlp-desk.md`
|
|
288
|
+
|
|
289
|
+
**Files:**
|
|
290
|
+
- Modify: `src/commands/rlp-desk.md` (14 known occurrences across ~6 blocks: lines 78-81, 84, 94-95, 167, 173, 180, 185-186, 243, 251-252, 295-296, 476-477, 735, 740-741)
|
|
291
|
+
|
|
292
|
+
Because the same token (`gpt-5.4`) appears with both `:medium` and `:high` suffixes in a tabular document, use a global replace rather than editing each occurrence individually.
|
|
293
|
+
|
|
294
|
+
- [ ] **Step 1: Perform global replace of `gpt-5.4` → `gpt-5.5`**
|
|
295
|
+
|
|
296
|
+
Use the Edit tool with `replace_all: true` on the single token `gpt-5.4`. This is safe because:
|
|
297
|
+
- No reasoning effort label matches `gpt-5.4` as a substring.
|
|
298
|
+
- No URL or external reference in the file uses `gpt-5.4`.
|
|
299
|
+
- The spec file that intentionally mentions `gpt-5.4` lives in `docs/superpowers/specs/`, not `src/commands/`.
|
|
300
|
+
|
|
301
|
+
Edit parameters:
|
|
302
|
+
|
|
303
|
+
```
|
|
304
|
+
old_string: "gpt-5.4"
|
|
305
|
+
new_string: "gpt-5.5"
|
|
306
|
+
replace_all: true
|
|
307
|
+
```
|
|
308
|
+
|
|
309
|
+
- [ ] **Step 2: Verify zero residual `gpt-5.4` in file**
|
|
310
|
+
|
|
311
|
+
Run:
|
|
312
|
+
|
|
313
|
+
```bash
|
|
314
|
+
grep -n "gpt-5\.4" src/commands/rlp-desk.md
|
|
315
|
+
```
|
|
316
|
+
|
|
317
|
+
Expected: no output.
|
|
318
|
+
|
|
319
|
+
- [ ] **Step 3: Spot-check a few critical lines**
|
|
320
|
+
|
|
321
|
+
Run:
|
|
322
|
+
|
|
323
|
+
```bash
|
|
324
|
+
grep -n "gpt-5\.5" src/commands/rlp-desk.md | head -20
|
|
325
|
+
```
|
|
326
|
+
|
|
327
|
+
Expected: at least 14 lines referencing `gpt-5.5`, including the recommendation table body (two rows with `gpt-5.5:medium`, two with `gpt-5.5:high`), the worker model selection bullet, batch warnings, env var echo lines, example commands, and the help text.
|
|
328
|
+
|
|
329
|
+
---
|
|
330
|
+
|
|
331
|
+
## Task 6: Bump version to 0.10.0
|
|
332
|
+
|
|
333
|
+
**Files:**
|
|
334
|
+
- Modify: `package.json:3`
|
|
335
|
+
|
|
336
|
+
- [ ] **Step 1: Edit the `version` field**
|
|
337
|
+
|
|
338
|
+
Before:
|
|
339
|
+
|
|
340
|
+
```json
|
|
341
|
+
"version": "0.9.3",
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
After:
|
|
345
|
+
|
|
346
|
+
```json
|
|
347
|
+
"version": "0.10.0",
|
|
348
|
+
```
|
|
349
|
+
|
|
350
|
+
- [ ] **Step 2: Verify package.json is still valid JSON**
|
|
351
|
+
|
|
352
|
+
Run:
|
|
353
|
+
|
|
354
|
+
```bash
|
|
355
|
+
node -e 'const pkg = require("./package.json"); console.log(pkg.version)'
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
Expected stdout: `0.10.0`.
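The bump from `0.9.3` to `0.10.0` is a case where comparing version strings as text gives the wrong answer, so any scripted check should compare the numeric parts. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix; `parseVersion`, `bumpMinor`, and `compareVersions` are hypothetical helpers, not part of rlp-desk or npm.

```js
// Split a plain MAJOR.MINOR.PATCH string into three numbers.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  return match.slice(1).map(Number);
}

// A minor bump increments MINOR and resets PATCH to 0.
function bumpMinor(version) {
  const [major, minor] = parseVersion(version);
  return `${major}.${minor + 1}.0`;
}

// Numeric comparison: -1 if a < b, 1 if a > b, 0 if equal.
function compareVersions(a, b) {
  const left = parseVersion(a);
  const right = parseVersion(b);
  for (let i = 0; i < 3; i += 1) {
    if (left[i] !== right[i]) return left[i] < right[i] ? -1 : 1;
  }
  return 0;
}

console.log(bumpMinor('0.9.3')); // 0.10.0
console.log(compareVersions('0.10.0', '0.9.3')); // 1
console.log('0.10.0' > '0.9.3'); // false: plain string comparison misorders them
```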
|
|
359
|
+
|
|
360
|
+
---
|
|
361
|
+
|
|
362
|
+
## Task 7: Repo-wide regression sweep
|
|
363
|
+
|
|
364
|
+
**Files:** none modified in this task — it is a verification gate.
|
|
365
|
+
|
|
366
|
+
- [ ] **Step 1: Confirm zero `gpt-5.4` hits under `src/`**
|
|
367
|
+
|
|
368
|
+
Run:
|
|
369
|
+
|
|
370
|
+
```bash
|
|
371
|
+
grep -rn "gpt-5\.4" src/
|
|
372
|
+
```
|
|
373
|
+
|
|
374
|
+
Expected: no output.
|
|
375
|
+
|
|
376
|
+
- [ ] **Step 2: Confirm zero `gpt-5.4` hits under `docs/` except the spec file**
|
|
377
|
+
|
|
378
|
+
Run:
|
|
379
|
+
|
|
380
|
+
```bash
|
|
381
|
+
grep -rn "gpt-5\.4" docs/ --exclude-dir=specs --exclude-dir=plans  # plans/ holds this plan, which quotes gpt-5.4 in its Before blocks
|
|
382
|
+
```
|
|
383
|
+
|
|
384
|
+
Expected: no output.
|
|
385
|
+
|
|
386
|
+
And:
|
|
387
|
+
|
|
388
|
+
```bash
|
|
389
|
+
grep -rln "gpt-5\.4" docs/superpowers/specs/
|
|
390
|
+
```
|
|
391
|
+
|
|
392
|
+
Expected: exactly one file — `docs/superpowers/specs/2026-04-24-gpt-5-5-default.md`.
|
|
393
|
+
|
|
394
|
+
- [ ] **Step 3: Confirm `gpt-5.5` references now exist**
|
|
395
|
+
|
|
396
|
+
Run:
|
|
397
|
+
|
|
398
|
+
```bash
|
|
399
|
+
grep -rc "gpt-5\.5" src/ | grep -v ':0$'
|
|
400
|
+
```
|
|
401
|
+
|
|
402
|
+
Expected: at least 5 files each with a non-zero count, covering `run_ralph_desk.zsh`, `run.mjs`, `campaign-main-loop.mjs`, `model-upgrade-table.md`, and `commands/rlp-desk.md`.
|
|
403
|
+
|
|
404
|
+
If any step here fails, stop and repair the offending file before continuing. Do not proceed to Task 8 until the sweep is clean.
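The sweep in this task amounts to a literal substring search per file. The sketch below shows the same check on in-memory text, assuming the caller has already read the file contents; `findResidual` is a hypothetical helper and does no file I/O of its own.

```js
// Return one { line, text } record per line that contains the literal token,
// similar to what `grep -n` with a fixed string reports for a single file.
function findResidual(content, token) {
  return content
    .split('\n')
    .map((text, index) => ({ line: index + 1, text }))
    .filter(({ text }) => text.includes(token));
}

const sample = [
  "consensusModel: 'gpt-5.5:medium',",
  "finalConsensusModel: 'gpt-5.4:high',",
  "'gpt-5.3-codex-spark:xhigh': 'BLOCKED',",
].join('\n');

const hits = findResidual(sample, 'gpt-5.4');
console.log(hits.map(({ line }) => line)); // [ 2 ]
```

Because `includes` matches literally, it behaves like the escaped pattern `gpt-5\.4` used in the `grep` commands: a string such as `gpt-5x4` is not reported.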
|
|
405
|
+
|
|
406
|
+
---
|
|
407
|
+
|
|
408
|
+
## Task 8: Self-verification gate (CLAUDE.md §Self-Verification Gate)
|
|
409
|
+
|
|
410
|
+
**Files:** none modified directly. This task produces three verifier verdict records that must all PASS before Task 9.
|
|
411
|
+
|
|
412
|
+
This task is mandatory because Tasks 1 and 5 modified `src/scripts/run_ralph_desk.zsh` and `src/commands/rlp-desk.md`. Per `CLAUDE.md`, all three scenarios must run end to end: a real Worker (with `execution_steps`) → Verifier (5 categories with `reasoning`) → PASS. Any FAIL blocks the commit; fix the cause and re-run all three.
|
|
413
|
+
|
|
414
|
+
- [ ] **Step 1: Scenario LOW — default env propagation (L1+L3)**
|
|
415
|
+
|
|
416
|
+
Describe:
|
|
417
|
+
- **Worker task:** From the repo root, source `src/scripts/run_ralph_desk.zsh` (or simulate by evaluating only the env-var default block, lines 104-115 and 2760-2768) and print `WORKER_CODEX_MODEL`, `VERIFIER_CODEX_MODEL`, `CONSENSUS_MODEL`, `FINAL_CONSENSUS_MODEL`. Execution steps must include the exact commands run and their stdout.
|
|
418
|
+
- **Verifier categories:** (1) Correctness — all four variables equal the expected `gpt-5.5*` strings. (2) Completeness — no variable is empty or inherits an unexpected default. (3) Regression — the script still parses under `zsh -n`. (4) Docs alignment — the printed value matches what `rlp-desk.md` now documents. (5) Reasoning — one paragraph explaining why the result is convincing.
|
|
419
|
+
- **PASS criteria:** All four variables equal `gpt-5.5`, `gpt-5.5`, `gpt-5.5:medium`, `gpt-5.5:high` respectively, with zsh parse success.
|
|
420
|
+
|
|
421
|
+
- [ ] **Step 2: Scenario MEDIUM — explicit override passthrough (L1+L2+L3)**
|
|
422
|
+
|
|
423
|
+
Describe:
|
|
424
|
+
- **Worker task:** Invoke the shell with `CONSENSUS_MODEL=gpt-5.4:high` exported before sourcing, then print the resulting `CONSENSUS_MODEL`. Also invoke with an unknown model string such as `CONSENSUS_MODEL=foo-model:custom` and print the result. Execution steps must include both runs and their stdout.
|
|
425
|
+
- **Verifier categories:** (1) Correctness — both overrides are preserved byte-for-byte. (2) Completeness — no remap or warning is emitted to stderr. (3) Integration — the downstream variables that reference `CONSENSUS_MODEL` (e.g., when building consensus prompts) carry the override. (4) Docs alignment — `rlp-desk.md` describes `--consensus-model` as passthrough with a documented default. (5) Reasoning — justify why passthrough is the intended contract.
|
|
426
|
+
- **PASS criteria:** `gpt-5.4:high` and `foo-model:custom` appear unchanged; stderr is silent about remapping.
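The passthrough contract in this scenario follows from how the shell expansion `${VAR:-default}` behaves: the default applies only when the variable is unset or empty, and any other value is kept as-is. The sketch below models that rule for illustration only; `withDefault` and `resolveConsensusModel` are hypothetical names, and the real logic lives in the zsh script, not in JavaScript.

```js
// Model of ${VAR:-fallback}: use the fallback when the value is unset
// (undefined) or the empty string; otherwise return the value untouched.
function withDefault(value, fallback) {
  return value === undefined || value === '' ? fallback : value;
}

// Hypothetical resolver for one variable, taking an env-like object.
function resolveConsensusModel(env) {
  return withDefault(env.CONSENSUS_MODEL, 'gpt-5.5:medium');
}

console.log(resolveConsensusModel({})); // gpt-5.5:medium
console.log(resolveConsensusModel({ CONSENSUS_MODEL: '' })); // gpt-5.5:medium
console.log(resolveConsensusModel({ CONSENSUS_MODEL: 'gpt-5.4:high' })); // gpt-5.4:high
console.log(resolveConsensusModel({ CONSENSUS_MODEL: 'foo-model:custom' })); // foo-model:custom
```

Note that an explicitly empty `CONSENSUS_MODEL` also falls back to the default under this rule, which is worth covering if the Worker task is extended.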
|
|
427
|
+
|
|
428
|
+
- [ ] **Step 3: Scenario CRITICAL — upgrade chain + security check (L1+L2+L3 + security)**
|
|
429
|
+
|
|
430
|
+
Describe:
|
|
431
|
+
- **Worker task:** In a Node one-off, import the `MODEL_UPGRADES` table from `src/node/runner/campaign-main-loop.mjs` (or inline-copy if import surface is not exposed) and assert the chain `gpt-5.5:medium → gpt-5.5:high → gpt-5.5:xhigh → BLOCKED`. Then verify the spark chain is untouched. Execution steps must include the exact Node invocation and output.
|
|
432
|
+
- **Verifier categories:** (1) Correctness — the four assertions hold. (2) Completeness — no extra keys were introduced; legacy `gpt-5.4:*` keys are gone; spark keys remain exact. (3) Error-path — hitting `BLOCKED` is the terminal state; no silent key lookup fallback. (4) Security — confirm no shell expansion, template injection, or model-string concatenation was added to the hot path by this change set (spot-check `campaign-main-loop.mjs` diff). (5) Reasoning — one paragraph.
|
|
433
|
+
- **PASS criteria:** All four assertions pass; spark chain intact; security spot-check clean.
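The completeness category above can be checked with a small audit over an inline copy of the table, which the Worker task allows when the import surface is not exposed. This is a sketch under that assumption; `auditUpgradeTable` is a hypothetical helper, and the table literal is copied from Task 3 rather than imported from `campaign-main-loop.mjs`.

```js
// Inline copy of the post-migration table from Task 3.
const MODEL_UPGRADES = {
  'gpt-5.5:medium': 'gpt-5.5:high',
  'gpt-5.5:high': 'gpt-5.5:xhigh',
  'gpt-5.5:xhigh': 'BLOCKED',
  'gpt-5.3-codex-spark:medium': 'gpt-5.3-codex-spark:high',
  'gpt-5.3-codex-spark:high': 'gpt-5.3-codex-spark:xhigh',
  'gpt-5.3-codex-spark:xhigh': 'BLOCKED',
};

// Collect problems instead of throwing, so a verifier can report all of them:
// legacy tokens left in keys or values, and targets that are neither a key
// of the table nor the terminal BLOCKED state.
function auditUpgradeTable(table, legacyToken) {
  const problems = [];
  for (const [from, to] of Object.entries(table)) {
    if (from.includes(legacyToken) || to.includes(legacyToken)) {
      problems.push(`legacy token in ${from} -> ${to}`);
    }
    if (to !== 'BLOCKED' && !Object.prototype.hasOwnProperty.call(table, to)) {
      problems.push(`dangling target ${to} (from ${from})`);
    }
  }
  return problems;
}

console.log(auditUpgradeTable(MODEL_UPGRADES, 'gpt-5.4')); // []
```

A half-migrated table, for example one that still maps `gpt-5.5:medium` to `gpt-5.4:high`, would produce both a legacy-token and a dangling-target entry.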
|
|
434
|
+
|
|
435
|
+
- [ ] **Step 4: Gate decision**
|
|
436
|
+
|
|
437
|
+
If any scenario FAILs:
|
|
438
|
+
- Fix the underlying code (not the test).
|
|
439
|
+
- Re-run the failing scenario first; then re-run all three for a clean slate.
|
|
440
|
+
- Do not proceed to Task 9.
|
|
441
|
+
|
|
442
|
+
If all three PASS, record the verdicts (even informally in the plan checkbox list) and move on.
|
|
443
|
+
|
|
444
|
+
---
|
|
445
|
+
|
|
446
|
+
## Task 9: Local file sync verification (CLAUDE.md §Local File Sync)
|
|
447
|
+
|
|
448
|
+
**Files:** none modified; this is a sync-and-verify gate executed after commit approval.
|
|
449
|
+
|
|
450
|
+
Per `CLAUDE.md`, after every commit that changes any `src/` file, sync ALL distributable files to `~/.claude/` and verify with `diff -q`.
|
|
451
|
+
|
|
452
|
+
This task is listed for completeness but is executed AFTER user approval of the commit, which is outside the plan's automated scope.
|
|
453
|
+
|
|
454
|
+
- [ ] **Step 1: Confirm sync list is current**
|
|
455
|
+
|
|
456
|
+
Re-read the Local File Sync section of `CLAUDE.md` and confirm the list of files covers: `src/commands/rlp-desk.md`, `src/governance.md`, `src/model-upgrade-table.md`, `src/scripts/init_ralph_desk.zsh`, `src/scripts/run_ralph_desk.zsh`, `src/scripts/lib_ralph_desk.zsh`, `README.md`, `install.sh`, `docs/architecture.md`, `docs/getting-started.md`, `docs/protocol-reference.md`, `docs/TODO-verification-next.md`, `docs/internal/*`, `docs/blueprints/*`.
|
|
457
|
+
|
|
458
|
+
- [ ] **Step 2: Sync all runtime files**
|
|
459
|
+
|
|
460
|
+
Run:
|
|
461
|
+
|
|
462
|
+
```bash
|
|
463
|
+
cp src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
|
|
464
|
+
cp src/governance.md ~/.claude/ralph-desk/governance.md
|
|
465
|
+
cp src/model-upgrade-table.md ~/.claude/ralph-desk/model-upgrade-table.md
|
|
466
|
+
cp src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
|
|
467
|
+
cp src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
|
|
468
|
+
cp src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
|
|
469
|
+
```
|
|
470
|
+
|
|
471
|
+
- [ ] **Step 3: Sync reference docs**
|
|
472
|
+
|
|
473
|
+
Run:
|
|
474
|
+
|
|
475
|
+
```bash
|
|
476
|
+
cp README.md ~/.claude/ralph-desk/README.md
|
|
477
|
+
cp install.sh ~/.claude/ralph-desk/install.sh
|
|
478
|
+
cp docs/architecture.md ~/.claude/ralph-desk/docs/architecture.md
|
|
479
|
+
cp docs/getting-started.md ~/.claude/ralph-desk/docs/getting-started.md
|
|
480
|
+
cp docs/protocol-reference.md ~/.claude/ralph-desk/docs/protocol-reference.md
|
|
481
|
+
cp docs/TODO-verification-next.md ~/.claude/ralph-desk/docs/TODO-verification-next.md
|
|
482
|
+
cp -r docs/internal/. ~/.claude/ralph-desk/docs/internal/
|
|
483
|
+
cp -r docs/blueprints/. ~/.claude/ralph-desk/docs/blueprints/
|
|
484
|
+
```
|
|
485
|
+
|
|
486
|
+
- [ ] **Step 4: Verify each runtime file is byte-identical**
|
|
487
|
+
|
|
488
|
+
Run:
|
|
489
|
+
|
|
490
|
+
```bash
|
|
491
|
+
diff -q src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
|
|
492
|
+
diff -q src/governance.md ~/.claude/ralph-desk/governance.md
|
|
493
|
+
diff -q src/model-upgrade-table.md ~/.claude/ralph-desk/model-upgrade-table.md
|
|
494
|
+
diff -q src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
|
|
495
|
+
diff -q src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
|
|
496
|
+
diff -q src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
|
|
497
|
+
diff -q README.md ~/.claude/ralph-desk/README.md
|
|
498
|
+
```
|
|
499
|
+
|
|
500
|
+
Expected: no output from any of the seven `diff -q` commands. Any line of output means sync failed and must be repeated.
|
|
501
|
+
|
|
502
|
+
---
|
|
503
|
+
|
|
504
|
+
## Release workflow (outside plan automation, user approval gated)
|
|
505
|
+
|
|
506
|
+
After Task 9 succeeds, the following release steps happen with explicit user approval at each stage, per `CLAUDE.md`:
|
|
507
|
+
|
|
508
|
+
1. `ralplan` + `codex review` both reach zero issues.
|
|
509
|
+
2. `git commit` with user-approved message.
|
|
510
|
+
3. `git push -u origin feature/gpt-5-5-default`.
|
|
511
|
+
4. Open PR, merge to `main` after approvals.
|
|
512
|
+
5. The `version` field in `package.json` is already set to `0.10.0` by Task 6, so do not run `npm version` (it would bump and tag again); commit the bump directly.
|
|
513
|
+
6. `gh release create v0.10.0` with release notes limited to user-facing changes (see spec §Release notes draft).
|
|
514
|
+
7. `npm publish`.
|
|
515
|
+
8. Re-run Task 9 sync verification against the released tarball layout.
|
|
516
|
+
|
|
517
|
+
Each step above is documented here as a reminder, not as an automated step. The engineer executing this plan must stop at Task 9 and request user approval before continuing.
|
|
@@ -0,0 +1,107 @@
|
|
|
1
|
+
# Spec: Default Codex Model Migration gpt-5.4 → gpt-5.5
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-04-24
|
|
4
|
+
**Version target:** 0.10.0 (minor bump)
|
|
5
|
+
**Branch:** `feature/gpt-5-5-default`
|
|
6
|
+
|
|
7
|
+
## Background
|
|
8
|
+
|
|
9
|
+
Codex CLI 0.124.0 now ships with `gpt-5.5` as the default model in `~/.codex/config.toml` (replacing `gpt-5.4`). The user's reasoning effort default is `high`. rlp-desk's documented defaults, example commands, and upgrade tables still reference `gpt-5.4`, which creates drift between what users see in rlp-desk and what their Codex CLI actually runs.
|
|
10
|
+
|
|
11
|
+
This spec migrates every rlp-desk default, example, and upgrade chain entry from `gpt-5.4` to `gpt-5.5`. User-supplied model flags are untouched — Codex CLI accepts any model name passthrough, so users who explicitly request `gpt-5.4:high` (or any other version) continue to work without rlp-desk interference.
|
|
12
|
+
|
|
13
|
+
## Decisions
|
|
14
|
+
|
|
15
|
+
| # | Decision | Rationale |
|
|
16
|
+
|---|---|---|
|
|
17
|
+
| D1 | Full replacement (`gpt-5.4` → `gpt-5.5` everywhere) | Codex CLI default changed; rlp-desk must mirror current reality. |
|
|
18
|
+
| D2 | No auto-remap, no deprecation warning | User-supplied model names are passed through to Codex CLI as-is. rlp-desk is not a policy layer on model selection. |
|
|
19
|
+
| D3 | Reasoning effort tiers (`low`/`medium`/`high`/`xhigh`) unchanged | Progressive upgrade philosophy preserved. Model name only. |
|
|
20
|
+
| D4 | Minor bump 0.9.3 → 0.10.0 | Default behavior changes; semver-compliant. |
|
|
21
|
+
|
|
22
|
+
## Scope
|
|
23
|
+
|
|
24
|
+
### Files with `gpt-5.4` references (in-scope)
|
|
25
|
+
|
|
26
|
+
| File | Change | Lines |
|
|
27
|
+
|---|---|---|
|
|
28
|
+
| `src/scripts/run_ralph_desk.zsh` | Default env values | 33, 35, 104, 106, 114-115, 2762, 2766 |
|
|
29
|
+
| `src/node/run.mjs` | `consensusModel` / `finalConsensusModel` defaults | 14-15 |
|
|
30
|
+
| `src/node/runner/campaign-main-loop.mjs` | Upgrade chain keys | 29-31 |
|
|
31
|
+
| `src/model-upgrade-table.md` | Section header + full table | 21-28 |
|
|
32
|
+
| `src/commands/rlp-desk.md` | Recommendation table, examples, help text | ~14 occurrences |
|
|
33
|
+
| `package.json` | `version` field | — |
|
|
34
|
+
|
|
35
|
+
### Out of scope (no changes)
|
|
36
|
+
|
|
37
|
+
- `src/node/cli/command-builder.mjs` — no remap logic added.
|
|
38
|
+
- `src/scripts/lib_ralph_desk.zsh` `parse_model_flag()` — accepts any model name as before.
|
|
39
|
+
- `gpt-5.3-codex-spark` (GPT Pro) section in `model-upgrade-table.md` — distinct token limit, tracked separately.
|
|
40
|
+
- Reasoning effort tier labels and progressive upgrade chain logic.
|
|
41
|
+
- CLI flag names (`--worker-model`, `--consensus-model`, `--final-consensus-model`).
|
|
42
|
+
|
|
43
|
+
## Release workflow
|
|
44
|
+
|
|
45
|
+
Per `CLAUDE.md` gate rules (absolute):
|
|
46
|
+
|
|
47
|
+
1. All source edits on `feature/gpt-5-5-default` branch.
|
|
48
|
+
2. Self-verification gate triggered by changes to `src/commands/rlp-desk.md` and `src/scripts/run_ralph_desk.zsh`:
|
|
49
|
+
- LOW scenario: L1+L3
|
|
50
|
+
- MEDIUM scenario: L1+L2+L3
|
|
51
|
+
- CRITICAL scenario: L1+L2+L3 + security check
|
|
52
|
+
- All three must PASS before commit.
|
|
53
|
+
3. `ralplan` + `codex review` must both reach 0 issues before merge.
|
|
54
|
+
4. User approval for each: commit, version bump, merge, release, publish, local file sync.
|
|
55
|
+
5. Local file sync of all runtime files + reference docs per `CLAUDE.md`.
|
|
56
|
+
6. Release notes: user-facing only (per `feedback_release_notes_scope.md`).
|
|
57
|
+
|
|
58
|
+
## Release notes draft
|
|
59
|
+
|
|
60
|
+
```
|
|
61
|
+
### Changed
|
|
62
|
+
- Default Codex model upgraded from `gpt-5.4` to `gpt-5.5` to match Codex CLI 0.124.0 defaults.
|
|
63
|
+
- `WORKER_CODEX_MODEL`, `VERIFIER_CODEX_MODEL`, `CONSENSUS_MODEL`, `FINAL_CONSENSUS_MODEL` env defaults updated.
|
|
64
|
+
- Upgrade chain in `model-upgrade-table.md` now references `gpt-5.5`.
|
|
65
|
+
- Example commands in `/rlp-desk` recommendation table updated.
|
|
66
|
+
|
|
67
|
+
### Unchanged
|
|
68
|
+
- Users may still pass `--worker-model gpt-5.4:high` or any other model name explicitly; rlp-desk passes model flags through to Codex CLI without modification.
|
|
69
|
+
- Reasoning effort tiers and progressive upgrade logic unchanged.
|
|
70
|
+
- GPT Pro (`gpt-5.3-codex-spark`) path unchanged.
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Test plan
|
|
74
|
+
|
|
75
|
+
### Self-verification scenarios (per CLAUDE.md §Self-Verification Gate)
|
|
76
|
+
|
|
77
|
+
1. **LOW — default env value propagation.** Run `/rlp-desk run <slug>` with no model flags; confirm `WORKER_CODEX_MODEL=gpt-5.5` and `CONSENSUS_MODEL=gpt-5.5:medium` appear in the spawned command line.
|
|
78
|
+
2. **MEDIUM — explicit override.** Run `/rlp-desk run <slug> --worker-model gpt-5.4:high`; confirm `gpt-5.4:high` is passed through unchanged to Codex CLI.
|
|
79
|
+
3. **CRITICAL — upgrade chain.** Simulate a Worker failure; confirm the upgrade chain `gpt-5.5:medium → gpt-5.5:high → gpt-5.5:xhigh → BLOCKED` is followed.
|
|
80
|
+
|
|
81
|
+
Each scenario: Worker (with `execution_steps`) → Verifier (5 categories, with `reasoning`) → PASS. Any FAIL blocks commit.
|
|
82
|
+
|
|
83
|
+
### Regression sweep
|
|
84
|
+
|
|
85
|
+
```bash
|
|
86
|
+
grep -rn "gpt-5\.4" src/
|
|
87
|
+
# Expected: zero hits.
|
|
88
|
+
|
|
89
|
+
grep -rn "gpt-5\.4" docs/ --exclude-dir=superpowers/specs
|
|
90
|
+
# Expected: zero hits. (This spec lives under superpowers/specs/ and intentionally mentions gpt-5.4.)
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
## Known follow-up (out of scope for v0.10.0)
|
|
94
|
+
|
|
95
|
+
`tests/test_us004_self_verification.sh` has 4 pre-existing failures (PASS=42, FAIL=4) that surfaced during the post-migration full-suite sweep:
|
|
96
|
+
- `AC4-boundary` / `AC13-happy`: `init_ralph_desk.zsh` lacks "no PRD found — treating as first-run" fallback language.
|
|
97
|
+
- `AC13-boundary`: no note printed when `--mode` is provided but no PRD exists.
|
|
98
|
+
- `AC13-runtime-happy`: `--mode improve` against a directory without a PRD does not exit 0 + print note + create a fresh PRD from template.
|
|
99
|
+
|
|
100
|
+
Verified pre-existing via `git stash` baseline against pre-migration HEAD. These failures represent an unimplemented feature in `init_ralph_desk.zsh` (PRD template scaffolding for the no-PRD path), not a regression introduced by this migration. They will be tracked in a separate spec/PR (`init-no-prd-fallback`) targeting v0.10.1 or v0.11.0.
|
|
101
|
+
|
|
102
|
+
## Rollback
|
|
103
|
+
|
|
104
|
+
If a regression surfaces post-release:
|
|
105
|
+
- Revert the `feature/gpt-5-5-default` merge commit.
|
|
106
|
+
- Publish `0.10.1` with restored `gpt-5.4` defaults.
|
|
107
|
+
- Users can always override via `--worker-model gpt-5.5:high` while rollback is in flight.
|
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.
+  "version": "0.10.0",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",
package/src/commands/rlp-desk.md
CHANGED
@@ -75,13 +75,13 @@ Ask about these items one by one (or in small groups):
 
 | Complexity | Worker | per-US Verifier | Final Verifier | Consensus |
 |------------|--------|-----------------|----------------|-----------|
-| LOW | gpt-5.
-| MEDIUM | gpt-5.
-| HIGH | gpt-5.
-| CRITICAL | gpt-5.
+| LOW | gpt-5.5:medium | sonnet | opus | final-only |
+| MEDIUM | gpt-5.5:medium | opus | opus | final-only |
+| HIGH | gpt-5.5:high | opus | opus | all |
+| CRITICAL | gpt-5.5:high | opus | opus + human | all |
 
 **Worker model selection** (cross-engine):
-- **gpt-5.
+- **gpt-5.5:medium** — default recommendation (full context window, progressive upgrade handles harder US)
 - **spark:high** — only when US is small enough for spark's 100k context (single-file, AC count <= 4, simple logic). Do NOT use as primary recommendation — spark context window is too small for most tasks
 
 Present complexity score with evidence to the user, e.g.: "I rate this MEDIUM because: US count=4 (MEDIUM), file scope=2 (MEDIUM), logic=conditionals (MEDIUM), deps=none (LOW), impact=modify (MEDIUM). Highest=MEDIUM."
@@ -91,8 +91,8 @@ Ask about these items one by one (or in small groups):
 **If codex is NOT installed** — say: "Codex is not installed. Defaulting to claude-only Worker. Note: without a second engine, your Verifier shares the same perspective as the Worker — there is a risk of blind spots where both Worker and Verifier miss the same issue. To unlock cross-engine coverage: `npm install -g @openai/codex`"
 
 8. **Batch Capacity Check** — when verify-mode is batch and PRD is large:
-- batch + spark + AC > 4 → warn "spark 100k context limit — switch to gpt-5.
-- batch + gpt-5.
+- batch + spark + AC > 4 → warn "spark 100k context limit — switch to gpt-5.5 or split smaller"
+- batch + gpt-5.5 + AC > 15 → warn "too many ACs for single batch — consider wave split (3-4 US per wave)"
 - per-us → no warning (US-level processing, no limit concern)
 9. **Verify Mode** — per-us (default) or batch. Ask: "Verify after each user story (per-us, recommended) or only after all stories are done (batch)?" Default recommendation: per-us for 2+ stories.
 10. **Consensus** — Ask: "Use cross-engine consensus? off (single engine), final-only (cross-engine on final verify only), or all (cross-engine on every verify). Requires codex CLI." Default: off. Recommended: final-only when codex is installed.
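The batch capacity rule in item 8 can be sketched as a small guard function. This is a minimal illustration rather than code from the package: `batchCapacityWarning` and its parameter names are hypothetical, the thresholds (AC > 4 for spark, AC > 15 for gpt-5.5) come from the text, and recognising the model family by name prefix is an assumption. The warning strings paraphrase the ones in the diff.

```javascript
// Hypothetical sketch of the batch capacity check; all names are illustrative only.
function batchCapacityWarning({ verifyMode, workerModel, acCount }) {
  // per-us mode works one user story at a time, so no capacity warning applies.
  if (verifyMode !== 'batch') return null;

  // Assumption: the model family can be recognised by a simple name prefix.
  if (workerModel.startsWith('spark') && acCount > 4) {
    return 'spark 100k context limit: switch to gpt-5.5 or split smaller';
  }
  if (workerModel.startsWith('gpt-5.5') && acCount > 15) {
    return 'too many ACs for single batch: consider wave split (3-4 US per wave)';
  }
  return null;
}

// Usage: a batch run on spark with 6 acceptance criteria triggers the spark warning.
console.log(batchCapacityWarning({ verifyMode: 'batch', workerModel: 'spark:high', acCount: 6 }));
```

Returning `null` for the no-warning case keeps the caller's branch simple: print the string only when one comes back.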
@@ -164,26 +164,26 @@ Tell the user:
 Available run commands (copy the one you want):
 
 # ★ Recommended: cross-engine + final-consensus (full context + blind-spot coverage):
-/rlp-desk run <actual-slug> --mode tmux --worker-model gpt-5.
+/rlp-desk run <actual-slug> --mode tmux --worker-model gpt-5.5:medium --consensus final-only --debug
 
 # Small tasks only (single-file, AC <= 4, simple logic — spark 100k context limit):
 /rlp-desk run <actual-slug> --mode tmux --worker-model spark:high --consensus final-only --debug
 
 # Critical (full consensus on every verify):
-/rlp-desk run <actual-slug> --mode tmux --worker-model gpt-5.
+/rlp-desk run <actual-slug> --mode tmux --worker-model gpt-5.5:high --consensus all --debug
 
 # Claude-only:
 /rlp-desk run <actual-slug> --debug
 
 # Full options reference:
 # --mode agent|tmux (default: agent)
-# --worker-model MODEL haiku|sonnet|opus or gpt-5.
+# --worker-model MODEL haiku|sonnet|opus or gpt-5.5:high|spark:high (default: haiku)
 # --lock-worker-model disable auto model upgrade
 # --verifier-model MODEL per-US verifier (default: sonnet)
 # --final-verifier-model MODEL final ALL verifier (default: opus)
 # --consensus off|all|final-only cross-engine consensus (default: off)
-# --consensus-model MODEL per-US cross-verifier (default: gpt-5.
-# --final-consensus-model MODEL final cross-verifier (default: gpt-5.
+# --consensus-model MODEL per-US cross-verifier (default: gpt-5.5:medium)
+# --final-consensus-model MODEL final cross-verifier (default: gpt-5.5:high)
 # --verify-mode per-us|batch (default: per-us)
 # --cb-threshold N (default: 6)
 # --max-iter N (default: 100)
@@ -240,16 +240,16 @@ Tell the user:
 
 Options (parse from `$ARGUMENTS`):
 - `--mode agent|tmux` (default: `agent`) — execution mode
-- `--worker-model MODEL` (default: `haiku`) — Worker model. Format: `model` = claude engine, `model:reasoning` = codex engine. Examples: `haiku`, `sonnet`, `opus`, `spark:high`, `gpt-5.
-- `--lock-worker-model` — disable automatic model upgrade on failure
+- `--worker-model MODEL` (default: `haiku`) — Worker model. Format: `model` = claude engine, `model:reasoning` = codex engine. Examples: `haiku`, `sonnet`, `opus`, `spark:high`, `gpt-5.5:high`. Parsed by `parse_model_flag()` which auto-splits engine/model/reasoning.
+- `--lock-worker-model` — disable automatic model upgrade on failure. Worker stays on the specified model regardless of consecutive failures.
 - `--verifier-model MODEL` (default: `sonnet`) — per-US verification model. Campaign-fixed (no progressive upgrade). Lighter than final verifier.
 - `--final-verifier-model MODEL` (default: `opus`) — final ALL verification model. Independent from per-US verifier. Used only for the final full-AC verify pass.
 - `--consensus off|all|final-only` (default: `off`) — cross-engine consensus verification mode.
 - `off`: single-engine verification only
 - `all`: cross-engine consensus on every verify (per-US and final)
 - `final-only`: cross-engine consensus only on the final ALL verify
-- `--consensus-model MODEL` (default: `gpt-5.
-- `--final-consensus-model MODEL` (default: `gpt-5.
+- `--consensus-model MODEL` (default: `gpt-5.5:medium`) — per-US cross-verifier model. Lighter weight for cost efficiency.
+- `--final-consensus-model MODEL` (default: `gpt-5.5:high`) — final cross-verifier model. Stricter. Note: spark is not allowed here (100k output limit).
 - `--verify-mode per-us|batch` (default: `per-us`) — verification strategy
 - `per-us`: verify after each US, then final full verify of all AC
 - `batch`: verify only after all US done (legacy behavior)
@@ -292,8 +292,8 @@ VERIFIER_MODEL=<--verifier-model value, default: sonnet> \
 FINAL_VERIFIER_MODEL=<--final-verifier-model value, default: opus> \
 VERIFY_MODE=<--verify-mode value, default: per-us> \
 CONSENSUS_MODE=<--consensus value, default: off> \
-CONSENSUS_MODEL=<--consensus-model value, default: gpt-5.
-FINAL_CONSENSUS_MODEL=<--final-consensus-model value, default: gpt-5.
+CONSENSUS_MODEL=<--consensus-model value, default: gpt-5.5:medium> \
+FINAL_CONSENSUS_MODEL=<--final-consensus-model value, default: gpt-5.5:high> \
 CB_THRESHOLD=<--cb-threshold value, default: 6> \
 ITER_TIMEOUT=<--iter-timeout value, default: 600> \
 DEBUG=<1 if --debug, else 0> \
@@ -473,8 +473,8 @@ Bash("codex exec --model <codex_model> --reasoning-effort <codex_reasoning> <ful
 **⑦b Consensus Verification** (when `--consensus` is `all`, or `final-only` and scope is ALL):
 After the primary verifier runs, run a cross-engine second verifier:
 - Determine cross-verifier model based on scope:
-- per-US verify → use `--consensus-model` (default: gpt-5.
-- final ALL verify → use `--final-consensus-model` (default: gpt-5.
+- per-US verify → use `--consensus-model` (default: gpt-5.5:medium)
+- final ALL verify → use `--final-consensus-model` (default: gpt-5.5:high)
 - If primary engine is claude → cross-verifier uses codex (the consensus model)
 - If primary engine is codex → cross-verifier uses claude `opus` (fixed)
 - Both produce `verify-verdict.json` (Leader renames to `verify-verdict-claude.json` and `verify-verdict-codex.json`)
@@ -732,13 +732,13 @@ Example:
 
 Run options:
 --mode agent|tmux Execution mode (default: agent)
---worker-model MODEL Worker model: haiku|sonnet|opus or gpt-5.
+--worker-model MODEL Worker model: haiku|sonnet|opus or gpt-5.5:high|spark:high (default: haiku)
 --lock-worker-model Disable auto model upgrade on failure
 --verifier-model MODEL per-US verifier (default: sonnet)
 --final-verifier-model MODEL Final ALL verifier (default: opus)
 --consensus off|all|final-only Cross-engine consensus (default: off)
---consensus-model MODEL per-US cross-verifier (default: gpt-5.
---final-consensus-model MODEL Final cross-verifier (default: gpt-5.
+--consensus-model MODEL per-US cross-verifier (default: gpt-5.5:medium)
+--final-consensus-model MODEL Final cross-verifier (default: gpt-5.5:high)
 --verify-mode per-us|batch Verification strategy (default: per-us)
 --cb-threshold N Consecutive failures before BLOCKED (default: 6)
 --max-iter N Max iterations (default: 100)
package/src/governance.md
CHANGED
@@ -14,7 +14,7 @@ The Leader orchestrates, while Worker/Verifier run in isolated fresh contexts ev
 - **Worker must NEVER modify Claude Code settings** (settings.json, settings.local.json). Permission prompts must be reported as blocked, not bypassed by editing settings.
 - **Verifier is independent**: The Verifier judges based on evidence alone, without knowledge of the Worker's reasoning process.
 - **Sentinels are Leader-owned**: Only the Leader writes COMPLETE/BLOCKED sentinels.
-- **Supported engines**: claude (default; models: haiku, sonnet, opus) and codex (opt-in via `--worker-model spark:high` or `--worker-model gpt-5.
+- **Supported engines**: claude (default; models: haiku, sonnet, opus) and codex (opt-in via `--worker-model spark:high` or `--worker-model gpt-5.5:high`).
 
 ## 1a. Iron Laws
 
@@ -300,11 +300,11 @@ The Leader decides each iteration. Decision criteria:
 
 ### Codex (opt-in engine)
 
-Model routing uses `--worker-model` and `--verifier-model` with codex format: `spark:high` or `gpt-5.
+Model routing uses `--worker-model` and `--verifier-model` with codex format: `spark:high` or `gpt-5.5:high`.
 
 ```
 --worker-model spark:high # codex worker, spark model, high reasoning
---verifier-model gpt-5.
+--verifier-model gpt-5.5:high # codex verifier, gpt-5.5, high reasoning
 ```
 
 `parse_model_flag()` auto-detects engine from the model name: plain names (haiku, sonnet, opus) = claude; `name:reasoning` format = codex. Claude is the default engine; codex is explicitly opt-in.
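The engine detection rule described for `parse_model_flag()` can be illustrated with a short JavaScript sketch. The real function lives in a zsh library and is not shown in this diff, so `parseModelFlag` below is a hypothetical stand-in. It implements only the stated rule (a plain name selects claude, `name:reasoning` selects codex) and assumes no alias expansion or validation.

```javascript
// Hypothetical JavaScript rendering of the rule stated for parse_model_flag();
// the actual zsh implementation may differ in details not visible in this diff.
function parseModelFlag(value) {
  const separator = value.indexOf(':');
  if (separator === -1) {
    // Plain names such as haiku, sonnet, opus select the claude engine.
    return { engine: 'claude', model: value, reasoning: null };
  }
  // The name:reasoning form selects the codex engine.
  return {
    engine: 'codex',
    model: value.slice(0, separator),
    reasoning: value.slice(separator + 1),
  };
}

console.log(parseModelFlag('opus'));
console.log(parseModelFlag('gpt-5.5:high'));
```

Because the split keys only on the colon, an older name such as `gpt-5.4:high` parses the same way, which is consistent with the pass-through behaviour the release notes describe.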
@@ -331,7 +331,7 @@ Agent(
 )
 ```
 
-If `--worker-model` or `--verifier-model` uses codex format (e.g., `spark:high`, `gpt-5.
+If `--worker-model` or `--verifier-model` uses codex format (e.g., `spark:high`, `gpt-5.5:high`) (opt-in):
 ```
 # Worker or Verifier (codex engine)
 Bash("codex -m <codex_model> -c model_reasoning_effort=<codex_reasoning> --dangerously-bypass-approvals-and-sandbox <prompt>")
@@ -377,7 +377,7 @@ claude -p "$(cat /path/to/prompt.md)" \
 When `WORKER_ENGINE=codex` or `VERIFIER_ENGINE=codex`, the `codex` CLI is used instead:
 ```bash
 # codex engine (opt-in)
-codex -m gpt-5.
+codex -m gpt-5.5 \
   -c model_reasoning_effort="high" \
   --dangerously-bypass-approvals-and-sandbox \
   "$(cat /path/to/prompt.md)"
@@ -569,9 +569,9 @@ Worker completes US → signal verify
 
 | Scenario | Primary verifier | Cross verifier |
 |----------|-----------------|----------------|
-| per-US, primary=claude | `--verifier-model` (sonnet) | `--consensus-model` (gpt-5.
+| per-US, primary=claude | `--verifier-model` (sonnet) | `--consensus-model` (gpt-5.5:medium) |
 | per-US, primary=codex | `--verifier-model` | claude opus (fixed) |
-| final, primary=claude | `--final-verifier-model` (opus) | `--final-consensus-model` (gpt-5.
+| final, primary=claude | `--final-verifier-model` (opus) | `--final-consensus-model` (gpt-5.5:high) |
 | final, primary=codex | `--final-verifier-model` | claude opus (fixed) |
 
 - Both must pass. No engine priority.
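The cross-verifier table above reduces to two decisions: which engine the primary verifier used, and whether the scope is per-US or final. The sketch below is one way to express that, under the assumption that scope is passed as the string `'per-us'` or `'final'`. The function names `pickCrossVerifier` and `consensusPasses` are hypothetical, and the default model values are the ones shown in the updated rows.

```javascript
// Hypothetical sketch of the cross-verifier selection table; names are illustrative only.
function pickCrossVerifier({
  scope, // assumed values: 'per-us' or 'final'
  primaryEngine, // 'claude' or 'codex'
  consensusModel = 'gpt-5.5:medium',
  finalConsensusModel = 'gpt-5.5:high',
}) {
  // A codex primary is always cross-checked by claude opus (fixed).
  if (primaryEngine === 'codex') {
    return { engine: 'claude', model: 'opus' };
  }
  // A claude primary is cross-checked by the codex consensus model for the scope.
  return {
    engine: 'codex',
    model: scope === 'final' ? finalConsensusModel : consensusModel,
  };
}

// "Both must pass. No engine priority." Neither verdict can override the other.
function consensusPasses(primaryVerdict, crossVerdict) {
  return primaryVerdict === 'pass' && crossVerdict === 'pass';
}

console.log(pickCrossVerifier({ scope: 'final', primaryEngine: 'claude' }));
```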
package/src/model-upgrade-table.md
CHANGED
@@ -18,14 +18,14 @@ CB default: 6. Override: `--cb-threshold N`. Worker only — Verifier fixed at c
 | HIGH | gpt-5.3-codex-spark:high | gpt-5.3-codex-spark:xhigh | gpt-5.3-codex-spark:xhigh | BLOCKED |
 | CRITICAL | gpt-5.3-codex-spark:xhigh | gpt-5.3-codex-spark:xhigh | gpt-5.3-codex-spark:xhigh | BLOCKED |
 
-## Non-Pro (gpt-5.
+## Non-Pro (gpt-5.5)
 
 | Complexity | 1-2 | 3-4 | 5-6 | 7+ |
 |------------|-----|-----|-----|-----|
-| LOW | gpt-5.
-| MEDIUM | gpt-5.
-| HIGH | gpt-5.
-| CRITICAL | gpt-5.
+| LOW | gpt-5.5:low | gpt-5.5:medium | gpt-5.5:high | BLOCKED |
+| MEDIUM | gpt-5.5:medium | gpt-5.5:high | gpt-5.5:xhigh | BLOCKED |
+| HIGH | gpt-5.5:high | gpt-5.5:xhigh | gpt-5.5:xhigh | BLOCKED |
+| CRITICAL | gpt-5.5:xhigh | gpt-5.5:xhigh | gpt-5.5:xhigh | BLOCKED |
 
 ## Claude-only
 
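Read left to right, the MEDIUM row of the Non-Pro table gives the chain `gpt-5.5:medium`, `gpt-5.5:high`, `gpt-5.5:xhigh`, `BLOCKED`. A minimal sketch of walking such a chain through a successor map follows. `UPGRADE_CHAIN` and `walkUpgradeChain` are hypothetical names, and the sketch deliberately leaves out how failure counts trigger each step, since the diff does not show that.

```javascript
// Hypothetical successor map for the gpt-5.5 tiers; illustrative, not the package's code.
const UPGRADE_CHAIN = {
  'gpt-5.5:medium': 'gpt-5.5:high',
  'gpt-5.5:high': 'gpt-5.5:xhigh',
  'gpt-5.5:xhigh': 'BLOCKED',
};

// Follow the chain from a starting model until BLOCKED or an unknown model is reached.
function walkUpgradeChain(start, chain = UPGRADE_CHAIN) {
  const steps = [start];
  const seen = new Set(steps);
  let current = start;
  while (current !== 'BLOCKED') {
    const next = chain[current];
    // Stop on models without an upgrade entry, and guard against accidental cycles.
    if (next === undefined || seen.has(next)) break;
    steps.push(next);
    seen.add(next);
    current = next;
  }
  return steps;
}

console.log(walkUpgradeChain('gpt-5.5:medium').join(' -> '));
```

The cycle guard matters only if someone edits the map into a loop. With the map as written, every walk ends at `BLOCKED`.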
package/src/node/run.mjs
CHANGED
@@ -11,8 +11,8 @@ const RUN_DEFAULTS = {
   verifierModel: 'sonnet',
   finalVerifierModel: 'opus',
   consensusMode: 'off',
-  consensusModel: 'gpt-5.
-  finalConsensusModel: 'gpt-5.
+  consensusModel: 'gpt-5.5:medium',
+  finalConsensusModel: 'gpt-5.5:high',
   verifyMode: 'per-us',
   cbThreshold: 6,
   maxIterations: 100,
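How defaults like these typically combine with explicit flags can be shown with an object-spread merge. This is a sketch under assumptions: `DEFAULTS` copies only the fields visible in the hunk above, `resolveRunOptions` is a hypothetical helper, and the package's real merge logic is not part of this diff.

```javascript
// Illustrative copy of the defaults visible in the hunk; the real object has more fields.
const DEFAULTS = {
  verifierModel: 'sonnet',
  finalVerifierModel: 'opus',
  consensusMode: 'off',
  consensusModel: 'gpt-5.5:medium',
  finalConsensusModel: 'gpt-5.5:high',
  verifyMode: 'per-us',
  cbThreshold: 6,
  maxIterations: 100,
};

// Hypothetical helper: explicit values win, unset (undefined) flags fall back to defaults.
function resolveRunOptions(overrides = {}) {
  const provided = Object.fromEntries(
    Object.entries(overrides).filter(([, value]) => value !== undefined),
  );
  return { ...DEFAULTS, ...provided };
}

// Usage: pinning the previous model explicitly leaves every other default intact.
console.log(resolveRunOptions({ consensusModel: 'gpt-5.4:medium' }));
```

Filtering out `undefined` first means a flag the user never passed cannot overwrite a default with an empty value.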
package/src/node/runner/campaign-main-loop.mjs
CHANGED
@@ -26,9 +26,9 @@ const execFileAsync = promisify(execFile);
 const REQUIRED_SCAFFOLD_NAMES = ['workerPrompt', 'verifierPrompt', 'memoryFile', 'prdFile', 'testSpecFile'];
 const CLAUDE_MODELS = new Set(['haiku', 'sonnet', 'opus']);
 const MODEL_UPGRADES = {
-  'gpt-5.
-  'gpt-5.
-  'gpt-5.
+  'gpt-5.5:medium': 'gpt-5.5:high',
+  'gpt-5.5:high': 'gpt-5.5:xhigh',
+  'gpt-5.5:xhigh': 'BLOCKED',
   'gpt-5.3-codex-spark:medium': 'gpt-5.3-codex-spark:high',
   'gpt-5.3-codex-spark:high': 'gpt-5.3-codex-spark:xhigh',
   'gpt-5.3-codex-spark:xhigh': 'BLOCKED',