@ai-dev-methodologies/rlp-desk 0.9.2 → 0.9.3
- package/docs/blueprints/blueprint-flywheel-enhancement.md +352 -0
- package/docs/blueprints/plan-flywheel-enhancement.md +817 -0
- package/package.json +1 -1
- package/src/commands/rlp-desk.md +4 -0
- package/src/governance.md +13 -0
- package/src/node/run.mjs +12 -0
- package/src/node/runner/campaign-main-loop.mjs +92 -2
|
@@ -0,0 +1,352 @@
|
|
|
1
|
+
# Blueprint: Flywheel Enhancement
|
|
2
|
+
|
|
3
|
+
> Status: TODO — not yet implemented. Document for future development.
|
|
4
|
+
> Codex-reviewed: 2026-04-13 (pre-implementation design review)
|
|
5
|
+
|
|
6
|
+
## Summary
|
|
7
|
+
|
|
8
|
+
Three enhancements to the flywheel direction-review step, unified in one blueprint:
|
|
9
|
+
|
|
10
|
+
1. **Flywheel Guard** — independent verification of flywheel decisions before Worker acts on them
|
|
11
|
+
2. **CEO Pattern Internalization** — selective additions from plan-ceo-review framework
|
|
12
|
+
3. **tmux Shell Leader Defer** — Node.js campaign-main-loop.mjs only; zsh run script deferred
|
|
13
|
+
|
|
14
|
+
## Problem
|
|
15
|
+
|
|
16
|
+
### Flywheel makes bad direction decisions
|
|
17
|
+
|
|
18
|
+
In the `surge-v3-exit-strategy` campaign, the flywheel ran 3 times. Each time it made a flawed decision that wasted iterations:
|
|
19
|
+
|
|
20
|
+
| Flywheel | Decision | Failure | Root Cause |
|----------|----------|---------|------------|
| 1st | peak_pct segmentation | look-ahead bias — peak_pct is post-hoc, not available at decision time | No feasibility check on proposed features |
| 2nd | fixed_tp_5pct by median | median ignores large outliers (4.7x PnL difference invisible) | No metric alignment check against PRD intent |
| 3rd | breakeven by mean PnL | Correct — but only after user manually caught both prior errors | 2 iterations wasted |
|
|
25
|
+
|
|
26
|
+
The flywheel prompt already has premise challenge, forced alternatives, and 10 cognitive patterns. But it lacks:
|
|
27
|
+
- **Feasibility validation** — can the proposed direction actually be deployed?
|
|
28
|
+
- **Metric scrutiny** — is the optimization metric the right proxy for the real goal?
|
|
29
|
+
- **Independent review** — self-audit by the same agent is structurally weak
|
|
30
|
+
|
|
31
|
+
### tmux leader gap
|
|
32
|
+
|
|
33
|
+
`campaign-main-loop.mjs` handles flywheel dispatch for both agent and tmux modes via Node.js. `run_ralph_desk.zsh` has no flywheel logic. This is intentional — see §3.
|
|
34
|
+
|
|
35
|
+
## Design
|
|
36
|
+
|
|
37
|
+
### 1. Flywheel Guard
|
|
38
|
+
|
|
39
|
+
#### CLI Flags
|
|
40
|
+
|
|
41
|
+
```
--flywheel-guard off|on          (default: off)
--flywheel-guard-model MODEL     (default: opus)
```
|
|
45
|
+
|
|
46
|
+
#### Architecture: Single Independent Guard
|
|
47
|
+
|
|
48
|
+
When `--flywheel-guard on`, every flywheel execution is followed by an independent Guard agent before Worker dispatch. No embedded-only phase — codex review confirmed self-audit is structurally weak for bias detection.
|
|
49
|
+
|
|
50
|
+
```
Verifier FAIL
  → Flywheel Agent (fresh context)
      Steps 0A-0F: premise challenge, alternatives, scope decision, contract rewrite
  → Guard Agent (fresh context, different from flywheel)
      Reads: flywheel-signal.json, flywheel-review.md, PRD, campaign memory
      Checks: 4 validation items (see below)
      Writes: flywheel-guard-verdict.json
  → verdict:
      pass → Worker executes flywheel's direction
      fail → Flywheel re-runs with guard feedback injected (max 2 retries)
      inconclusive → Leader escalates to user (BLOCKED with escalation report)
  → 2 retries exhausted + still fail → BLOCKED
```
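The verdict branch in the flow above can be read as a small pure decision function. The sketch below is illustrative only: `nextStepAfterGuard`, `MAX_GUARD_RETRIES`, and the returned step names are hypothetical and do not appear in the package. It also assumes that an unrecognized verdict should block rather than proceed.

```javascript
// Hypothetical sketch, not the package's implementation.
// Maps a guard verdict plus the number of retries already used
// to the Leader's next step, following the flow diagram.
const MAX_GUARD_RETRIES = 2;

function nextStepAfterGuard(verdict, retriesUsed) {
  switch (verdict.verdict) {
    case 'pass':
      // analysis_only means: keep the analysis record, skip Worker dispatch.
      return verdict.analysis_only === true ? 'record-analysis-only' : 'dispatch-worker';
    case 'fail':
      // Retry the flywheel with guard feedback until retries run out.
      return retriesUsed < MAX_GUARD_RETRIES ? 'retry-flywheel' : 'blocked';
    case 'inconclusive':
      // Never retried: escalate to the user.
      return 'blocked';
    default:
      // Assumption: an unknown verdict is treated as blocking.
      return 'blocked';
  }
}
```

Keeping this logic free of I/O would make the retry and escalation paths easy to unit test separately from pane management and file polling.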
|
|
64
|
+
|
|
65
|
+
#### Guard Validation Checks (4 items)
|
|
66
|
+
|
|
67
|
+
**Check 1: Look-ahead Bias**
|
|
68
|
+
List every data feature the flywheel's proposed direction depends on.
|
|
69
|
+
For each feature: is it available at decision time (when the system must act)?
|
|
70
|
+
- `available`: feature exists before the event occurs (e.g., entry time, session start price)
|
|
71
|
+
- `post-hoc`: feature requires future information (e.g., peak_pct, session_end)
|
|
72
|
+
- Any `post-hoc` feature in a deployable direction → FAIL
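As a rough illustration of Check 1, the rule can be expressed over a list of tagged features. In the blueprint the Guard agent makes this judgment by reading files; the function below is only a hypothetical sketch, and `checkLookAheadBias`, the `availability` field, and the `deployable` option are assumed names.

```javascript
// Hypothetical sketch of the Check 1 rule, not code from the package.
// Each feature is tagged 'available' or 'post-hoc'.
function checkLookAheadBias(features, { deployable = true } = {}) {
  const postHoc = features
    .filter((feature) => feature.availability === 'post-hoc')
    .map((feature) => feature.name);
  // Any post-hoc feature in a deployable direction fails the check.
  const status = deployable && postHoc.length > 0 ? 'fail' : 'pass';
  return { check: 'look-ahead-bias', status, postHoc };
}

// Example modeled on the first flywheel failure described earlier.
const result = checkLookAheadBias([
  { name: 'entry_time', availability: 'available' },
  { name: 'peak_pct', availability: 'post-hoc' },
]);
console.log(result.status, result.postHoc);
```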
|
|
73
|
+
|
|
74
|
+
**Check 2: Metric Alignment**
|
|
75
|
+
- State the PRD's optimization metric explicitly
|
|
76
|
+
- Does the flywheel's proposed direction optimize the same metric?
|
|
77
|
+
- Same metric → pass
|
|
78
|
+
- Different metric, not flagged → FAIL (silent metric switch)
|
|
79
|
+
- Different metric, flagged with evidence → FAIL (metric mismatch requires PRD update or user approval)
|
|
80
|
+
- PRD is ground truth. The guard cannot approve off-PRD metric changes autonomously.
|
|
81
|
+
|
|
82
|
+
**Check 3: Deployability**
|
|
83
|
+
- Can the proposed direction's output be used in production?
|
|
84
|
+
- Does it require data, infrastructure, or conditions not available in the deployment environment?
|
|
85
|
+
- Non-deployable direction proposed as champion → FAIL
|
|
86
|
+
- Direction labeled "upper-bound/reference only" → pass, but Guard MUST include `"analysis_only": true` in verdict so Leader skips Worker dispatch (analysis record only, no implementation)
|
|
87
|
+
|
|
88
|
+
**Check 4: Repeat Pattern (same-US scoped)**
|
|
89
|
+
- Compare current flywheel decision to prior flywheel decisions **for the current US only**
|
|
90
|
+
- Same direction category (e.g., same scope decision) with same underlying approach → FAIL
|
|
91
|
+
- Different framing of a previously rejected direction → FAIL
|
|
92
|
+
- Guard MUST persist rejected flywheel directions to campaign memory's Rejected Directions section before writing verdict file. This ensures cleanup cannot erase the record.
|
|
93
|
+
|
|
94
|
+
#### Guard Signal Protocol
|
|
95
|
+
|
|
96
|
+
```json
{
  "verdict": "pass|fail|inconclusive",
  "issues": [
    {
      "check": "look-ahead-bias|metric-alignment|deployability|repeat-pattern",
      "status": "pass|fail|inconclusive",
      "detail": "specific finding",
      "evidence": "file:line or data reference"
    }
  ],
  "analysis_only": false,
  "recommendation": "proceed|retry-flywheel|escalate-to-user",
  "timestamp": "ISO"
}
```
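A Leader that polls for the verdict file would probably want to reject malformed output before acting on it. The following is a minimal sketch under that assumption; `validateGuardVerdict` is a hypothetical helper, not something the package exports, and the allowed values are copied from the protocol above.

```javascript
// Hypothetical validator for the guard verdict JSON; not part of the package.
const VERDICTS = ['pass', 'fail', 'inconclusive'];
const CHECKS = ['look-ahead-bias', 'metric-alignment', 'deployability', 'repeat-pattern'];
const RECOMMENDATIONS = ['proceed', 'retry-flywheel', 'escalate-to-user'];

function validateGuardVerdict(raw) {
  let verdict;
  try {
    verdict = JSON.parse(raw);
  } catch (err) {
    return { ok: false, errors: [`invalid JSON: ${err.message}`] };
  }
  if (verdict === null || typeof verdict !== 'object') {
    return { ok: false, errors: ['verdict must be a JSON object'] };
  }
  const errors = [];
  if (!VERDICTS.includes(verdict.verdict)) errors.push('unknown verdict');
  if (!Array.isArray(verdict.issues)) {
    errors.push('issues must be an array');
  } else {
    verdict.issues.forEach((issue, i) => {
      if (!CHECKS.includes(issue?.check)) errors.push(`issues[${i}]: unknown check`);
      if (!VERDICTS.includes(issue?.status)) errors.push(`issues[${i}]: unknown status`);
    });
  }
  if (typeof verdict.analysis_only !== 'boolean') errors.push('analysis_only must be a boolean');
  if (!RECOMMENDATIONS.includes(verdict.recommendation)) errors.push('unknown recommendation');
  return { ok: errors.length === 0, errors };
}
```

Returning a list of errors instead of throwing lets the caller decide whether a malformed verdict should count as `inconclusive` or as a hard failure; the blueprint does not specify either.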
|
|
112
|
+
|
|
113
|
+
#### State Tracking
|
|
114
|
+
|
|
115
|
+
- `flywheel_guard_count`: tracked **per-US** in status.json (not per-campaign)
|
|
116
|
+
- Increments on each guard execution for the current US
|
|
117
|
+
- Resets when US changes or passes verification
|
|
118
|
+
- `ALL` final verification treated as its own bucket
|
|
119
|
+
- Guard files in cleanup: `flywheel-guard-verdict.json` added to re-execution cleanup list
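The per-US counter described above might be kept as an object keyed by US id, which is the shape the implementation plan's `shouldRunGuard` reads. The helpers below are a hypothetical sketch: `bumpGuardCount`, `resetGuardCount`, and the `'ALL'` bucket key are assumed names, not confirmed by the package.

```javascript
// Hypothetical helpers for per-US guard counting; not part of the package.
// Assumption: final ALL verification uses the literal key 'ALL'.
const ALL_BUCKET = 'ALL';

// Returns a new state with the count for usId incremented by one.
function bumpGuardCount(state, usId = ALL_BUCKET) {
  const counts = { ...(state.flywheel_guard_count ?? {}) };
  counts[usId] = (counts[usId] ?? 0) + 1;
  return { ...state, flywheel_guard_count: counts };
}

// Returns a new state with the count for usId cleared,
// for example when the US passes verification.
function resetGuardCount(state, usId = ALL_BUCKET) {
  const counts = { ...(state.flywheel_guard_count ?? {}) };
  delete counts[usId];
  return { ...state, flywheel_guard_count: counts };
}
```

Both helpers copy the state rather than mutate it, so a count for one US never leaks into another bucket by accident.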
|
|
120
|
+
|
|
121
|
+
#### Boundary Conditions
|
|
122
|
+
|
|
123
|
+
| Condition | Behavior |
|-----------|----------|
| `--flywheel off` | No flywheel, no guard. Guard flag ignored. |
| `--flywheel on-fail` + `--flywheel-guard off` | Flywheel runs without guard (current behavior). |
| `--flywheel on-fail` + `--flywheel-guard on` | Every flywheel followed by independent guard. |
| Final ALL verification fails | Flywheel + guard runs if `--flywheel on-fail`. ALL treated as separate US bucket. |
| Guard returns `inconclusive` | BLOCKED with escalation report. Leader does NOT retry. |
| Guard model same as flywheel model | Allowed but not recommended. Different model provides better independence. |
| Resume after guard BLOCKED | User must clear blocked sentinel. Guard count resets for that US. |
|
|
132
|
+
|
|
133
|
+
#### Guard Prompt Template
|
|
134
|
+
|
|
135
|
+
```markdown
|
|
136
|
+
# Flywheel Guard Review
|
|
137
|
+
|
|
138
|
+
You are an independent reviewer verifying whether a flywheel direction decision is safe to execute.
|
|
139
|
+
You have NO prior context about this campaign. Read the files below and evaluate the decision objectively.
|
|
140
|
+
|
|
141
|
+
## Files to Read (in order)
|
|
142
|
+
1. PRD: {DESK}/plans/prd-{SLUG}.md — the ground truth for what success means
|
|
143
|
+
2. Flywheel Decision: {DESK}/memos/{SLUG}-flywheel-signal.json — what the flywheel decided
|
|
144
|
+
3. Flywheel Analysis: {DESK}/memos/{SLUG}-flywheel-review.md — the flywheel's reasoning
|
|
145
|
+
4. Campaign Memory: {DESK}/memos/{SLUG}-memory.md — history, rejected directions, key decisions
|
|
146
|
+
5. Done Claim: {DESK}/memos/{SLUG}-done-claim.json — what the Worker actually produced
|
|
147
|
+
6. Verify Verdict: {DESK}/memos/{SLUG}-verify-verdict.json — why the Verifier failed it
|
|
148
|
+
|
|
149
|
+
{GUARD_FEEDBACK_SECTION}
|
|
150
|
+
|
|
151
|
+
## Validation Checks
|
|
152
|
+
|
|
153
|
+
### Check 1: Look-ahead Bias
|
|
154
|
+
List every data feature the flywheel's proposed direction depends on.
|
|
155
|
+
For each: "feature X — available at decision time: YES/NO/UNCLEAR"
|
|
156
|
+
- YES: feature is known before the event (entry time, session start price, order book state)
|
|
157
|
+
- NO: feature requires future information (peak price, session end, outcome)
|
|
158
|
+
- UNCLEAR: cannot determine from available context → mark inconclusive
|
|
159
|
+
If ANY feature is NO and used in a deployable strategy (not just upper-bound analysis): FAIL.
|
|
160
|
+
|
|
161
|
+
### Check 2: Metric Alignment
|
|
162
|
+
1. What metric does the PRD define as the optimization target?
|
|
163
|
+
2. What metric does the flywheel's direction optimize?
|
|
164
|
+
3. Are they the same?
|
|
165
|
+
- Same metric → pass
|
|
166
|
+
- Different metric, not flagged → FAIL (silent metric switch)
|
|
167
|
+
- Different metric, flagged with evidence → FAIL with recommendation: "metric mismatch requires PRD update or user approval before proceeding"
|
|
168
|
+
PRD is ground truth. The guard cannot approve off-PRD metric changes autonomously.
|
|
169
|
+
|
|
170
|
+
### Check 3: Deployability
|
|
171
|
+
Can the proposed direction's output be used in production as-is?
|
|
172
|
+
- Requires post-hoc data → FAIL
|
|
173
|
+
- Requires infrastructure not mentioned in PRD → FAIL
|
|
174
|
+
- Labeled as "upper-bound only" or "reference" → pass, but you MUST include `"analysis_only": true` in your verdict so Leader skips Worker dispatch (no implementation, analysis record only)
|
|
175
|
+
|
|
176
|
+
### Check 4: Repeat Pattern (same-US scoped)
|
|
177
|
+
Compare to prior flywheel decisions **for the current US only** in campaign memory's Key Decisions section.
|
|
178
|
+
- Same scope decision + same underlying approach as a prior flywheel for this US → FAIL
|
|
179
|
+
- Reframing of a previously rejected direction (check Rejected Directions) → FAIL
|
|
180
|
+
- Genuinely new approach → pass
|
|
181
|
+
Before writing your verdict, you MUST append any rejected flywheel direction to campaign memory's Rejected Directions section. This persists the record before cleanup can erase it.
|
|
182
|
+
|
|
183
|
+
## Output
|
|
184
|
+
Write verdict to: {DESK}/memos/{SLUG}-flywheel-guard-verdict.json
|
|
185
|
+
|
|
186
|
+
Use this format:
|
|
187
|
+
{
|
|
188
|
+
"verdict": "pass|fail|inconclusive",
|
|
189
|
+
"issues": [...],
|
|
190
|
+
"analysis_only": false,
|
|
191
|
+
"recommendation": "proceed|retry-flywheel|escalate-to-user",
|
|
192
|
+
"timestamp": "ISO"
|
|
193
|
+
}
|
|
194
|
+
|
|
195
|
+
Rules:
|
|
196
|
+
- If ALL checks pass → verdict: pass, recommendation: proceed
|
|
197
|
+
- If ANY check is fail → verdict: fail, recommendation: retry-flywheel
|
|
198
|
+
- If ANY check is inconclusive and none are fail → verdict: inconclusive, recommendation: escalate-to-user
|
|
199
|
+
- Include specific evidence for every check. No "seems fine" or "probably ok."
|
|
200
|
+
```
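The aggregation rules at the end of the prompt are deterministic, so a Leader could recompute the overall verdict from the per-check statuses as a consistency check. This is a sketch under that assumption; `aggregateVerdict` is a hypothetical name and the blueprint does not say the Leader performs this step.

```javascript
// Hypothetical sketch of the prompt's aggregation rules; not part of the package.
// Precedence: any fail wins, then any inconclusive, otherwise pass.
function aggregateVerdict(issues) {
  const statuses = issues.map((issue) => issue.status);
  if (statuses.includes('fail')) {
    return { verdict: 'fail', recommendation: 'retry-flywheel' };
  }
  if (statuses.includes('inconclusive')) {
    return { verdict: 'inconclusive', recommendation: 'escalate-to-user' };
  }
  // Note: an empty issues list also lands here; a stricter caller
  // may prefer to reject it, since every check should report evidence.
  return { verdict: 'pass', recommendation: 'proceed' };
}
```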
|
|
201
|
+
|
|
202
|
+
When guard fails and flywheel retries, the `{GUARD_FEEDBACK_SECTION}` is populated:
|
|
203
|
+
|
|
204
|
+
```markdown
|
|
205
|
+
## Previous Guard Feedback (MUST address these issues)
|
|
206
|
+
The previous flywheel decision was rejected by the Guard. Issues found:
|
|
207
|
+
{list of guard issues with evidence}
|
|
208
|
+
|
|
209
|
+
You MUST address each issue above. Do NOT repeat the same direction.
|
|
210
|
+
Check Rejected Directions in campaign memory before proposing alternatives.
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
### 2. CEO Pattern Internalization (Selective)
|
|
214
|
+
|
|
215
|
+
From plan-ceo-review's 16+ cognitive patterns, add **2** to the flywheel prompt's existing 10 patterns:
|
|
216
|
+
|
|
217
|
+
#### Added
|
|
218
|
+
|
|
219
|
+
**11. Proxy Skepticism**
|
|
220
|
+
> Is the metric you're optimizing actually the right proxy for the real goal? What would change if you used a different metric? Name the proxy, name the goal, check the gap.
|
|
221
|
+
|
|
222
|
+
Why: Directly prevents the median-vs-mean failure. The flywheel optimized median gap without questioning whether median was the right proxy for total PnL.
|
|
223
|
+
|
|
224
|
+
Placement: Added to the CEO Cognitive Patterns list (items 1-10 already exist). Also referenced in Step 0D½ context (when applicable).
|
|
225
|
+
|
|
226
|
+
**12. Classification (reversibility x magnitude)**
|
|
227
|
+
> Rate your proposed direction change on two axes: How hard is it to reverse? How large is its impact? Hard-to-reverse + large-magnitude decisions need proportionally stronger evidence.
|
|
228
|
+
|
|
229
|
+
Why: Prevents casual PIVOT decisions on major scope changes without sufficient evidence. Lightweight — one sentence judgment per direction, not a matrix.
|
|
230
|
+
|
|
231
|
+
Placement: Added to Step 0E (Scope Decision) as a judgment criterion.
|
|
232
|
+
|
|
233
|
+
#### Not Added (with rationale)
|
|
234
|
+
|
|
235
|
+
| Pattern | Why Not |
|---------|---------|
| Wartime awareness | Mechanical (cb_threshold/2) conflicts with governance CB semantics. Flywheel already has time-value pattern (#6). |
| Temporal depth (5-10yr) | Iteration-level direction review, not strategic planning. |
| People-first sequencing | Organizational, not applicable to automated agents. |
| Hierarchy as service | Organizational. |
| Narrative coherence | Relevant for product vision, not iteration pivots. |
| Speed calibration (70% info) | Flywheel already operates on limited info by design. |
| Founder-mode bias | Human leadership pattern. |
| Willfulness as strategy | Human trait. |
| Courage accumulation | Human trait. |
| Leverage obsession | Too abstract for iteration-level use. |
| Focus as subtraction | Already covered by simplicity bias (#4). |
| Paranoid scanning | Already covered by inversion (#3). |
| Design for trust | UX-specific. |
| Edge case paranoia | Already covered by evidence > opinion (#10). |
|
|
251
|
+
|
|
252
|
+
#### Updated Flywheel Prompt Cognitive Patterns Section
|
|
253
|
+
|
|
254
|
+
```markdown
|
|
255
|
+
## CEO Cognitive Patterns (apply throughout your review)
|
|
256
|
+
1. First-principles — ignore convention, start from the problem itself
|
|
257
|
+
2. 10x check — can 2x effort yield 10x better result?
|
|
258
|
+
3. Inversion — what must be true for this approach to fail?
|
|
259
|
+
4. Simplicity bias — prefer simple over complex solutions
|
|
260
|
+
5. User-back — reason backwards from end-user experience
|
|
261
|
+
6. Time-value — does this direction change save 3+ iterations?
|
|
262
|
+
7. Sunk cost immunity — ignore what was already invested
|
|
263
|
+
8. Blast radius — assess impact scope of direction change
|
|
264
|
+
9. Reversibility — prefer easily reversible decisions
|
|
265
|
+
10. Evidence > opinion — judge only by this iteration's actual results
|
|
266
|
+
11. Proxy skepticism — is the optimization metric the right proxy for the real goal?
|
|
267
|
+
12. Classification — hard-to-reverse + large-magnitude changes need stronger evidence
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### 3. tmux Shell Leader Defer
|
|
271
|
+
|
|
272
|
+
#### Current State
|
|
273
|
+
|
|
274
|
+
`campaign-main-loop.mjs` manages flywheel for both execution modes:
|
|
275
|
+
- **tmux mode**: creates flywheel pane, dispatches via `sendKeys`, polls `flywheel-signal.json`
|
|
276
|
+
- **agent mode**: (planned) Leader calls Agent() with flywheel prompt
|
|
277
|
+
|
|
278
|
+
`run_ralph_desk.zsh` has **no** flywheel logic. It manages Worker + Verifier panes only.
|
|
279
|
+
|
|
280
|
+
#### Defer Rationale
|
|
281
|
+
|
|
282
|
+
1. Node.js `campaign-main-loop.mjs` already covers tmux mode's flywheel dispatch via pane management
|
|
283
|
+
2. Duplicating the same logic in zsh creates maintenance burden with no functional gain
|
|
284
|
+
3. Workstream research is evaluating whether tmux leader should migrate to Node.js entirely
|
|
285
|
+
|
|
286
|
+
#### Decision Point
|
|
287
|
+
|
|
288
|
+
| Research Conclusion | Action |
|---------------------|--------|
| Keep zsh leader | Implement flywheel logic in `run_ralph_desk.zsh` — new blueprint |
| Migrate to Node.js | This defer item is closed. `campaign-main-loop.mjs` is the single implementation. |
|
|
292
|
+
|
|
293
|
+
#### Until Then
|
|
294
|
+
|
|
295
|
+
- `run_ralph_desk.zsh` continues to operate Worker + Verifier only
|
|
296
|
+
- flywheel is available through `node src/node/run.mjs run <slug> --flywheel on-fail --mode tmux`
|
|
297
|
+
- The Node.js runner handles all tmux pane management for flywheel
|
|
298
|
+
|
|
299
|
+
## Implementation Scope
|
|
300
|
+
|
|
301
|
+
### Files Changed
|
|
302
|
+
|
|
303
|
+
| File | Change |
|------|--------|
| `src/scripts/init_ralph_desk.zsh` | Flywheel prompt: add patterns #11-12. Guard prompt template (new). Guard files in cleanup list. |
| `src/node/runner/campaign-main-loop.mjs` | Guard dispatch logic, `flywheel_guard_count` per-US in status, guard verdict polling, retry-with-feedback loop, `inconclusive` → BLOCKED path. |
| `src/node/run.mjs` | `--flywheel-guard off\|on`, `--flywheel-guard-model MODEL` flags. |
| `src/commands/rlp-desk.md` | Flywheel guard options documentation, guard flow description. |
| `src/governance.md` | §7 Leader Loop: flywheel guard step after flywheel, before Worker. |
|
|
310
|
+
|
|
311
|
+
### New Files
|
|
312
|
+
|
|
313
|
+
| File | Content |
|------|---------|
| `tests/node/test-flywheel-guard.mjs` | Guard logic unit tests, verdict parsing, retry loop, per-US count tracking. |
|
|
316
|
+
|
|
317
|
+
### Init Scaffold Additions
|
|
318
|
+
|
|
319
|
+
```
|
|
320
|
+
.claude/ralph-desk/
|
|
321
|
+
├── memos/
|
|
322
|
+
│ ├── <slug>-flywheel-guard-verdict.json (runtime; deleted on re-execution)
|
|
323
|
+
```
|
|
324
|
+
|
|
325
|
+
## Verification
|
|
326
|
+
|
|
327
|
+
### TDD Tests
|
|
328
|
+
- Guard dispatch only when `--flywheel-guard on` AND flywheel ran
|
|
329
|
+
- Guard verdict parsing (pass/fail/inconclusive)
|
|
330
|
+
- Retry loop: fail → re-run flywheel with feedback → re-guard (max 2)
|
|
331
|
+
- inconclusive → BLOCKED (no retry)
|
|
332
|
+
- Per-US guard count tracking (increments, resets on US change/pass)
|
|
333
|
+
- ALL bucket treated separately
|
|
334
|
+
- Guard files in cleanup list
|
|
335
|
+
|
|
336
|
+
### Self-Verification (5 scenarios)
|
|
337
|
+
- **LOW**: `--flywheel-guard off` → flywheel runs without guard (current behavior unchanged)
|
|
338
|
+
- **MEDIUM-1**: `--flywheel-guard on` + flywheel decision with look-ahead bias → guard catches it (Check 1 FAIL) → flywheel retries → corrected direction
|
|
339
|
+
- **MEDIUM-2**: `--flywheel-guard on` + flywheel silently switches optimization metric → guard catches it (Check 2 FAIL, metric mismatch requires PRD update) → escalation
|
|
340
|
+
- **MEDIUM-3**: `--flywheel-guard on` + flywheel proposes direction previously rejected for same US → guard catches it (Check 4 FAIL) → flywheel retries with different approach
|
|
341
|
+
- **CRITICAL**: Guard fails 2x → BLOCKED with escalation report including all guard issues from both attempts
|
|
342
|
+
|
|
343
|
+
## Dependencies
|
|
344
|
+
|
|
345
|
+
- Requires `--flywheel on-fail` (guard without flywheel is meaningless)
|
|
346
|
+
- Works with any flywheel model and guard model combination
|
|
347
|
+
- Does not require gstack installation
|
|
348
|
+
- Does not require tmux (works in agent mode)
|
|
349
|
+
|
|
350
|
+
## Priority
|
|
351
|
+
|
|
352
|
+
Medium — implement after flywheel has been battle-tested in more campaigns. Current workaround is user vigilance (which caught all 3 issues in surge-v3-exit-strategy). Guard formalizes that vigilance into a protocol.
|
|
@@ -0,0 +1,817 @@
|
|
|
1
|
+
# Flywheel Enhancement Implementation Plan
|
|
2
|
+
|
|
3
|
+
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
4
|
+
|
|
5
|
+
**Goal:** Add an independent Guard agent that validates flywheel direction decisions before Worker acts on them, plus selective CEO cognitive pattern internalization.
|
|
6
|
+
|
|
7
|
+
**Architecture:** When `--flywheel-guard on`, every flywheel execution is followed by an independent Guard agent (fresh context) that checks look-ahead bias, metric alignment, deployability, and repeat patterns. Guard verdict is 3-state (pass/fail/inconclusive). On fail, flywheel retries with guard feedback (max 2). On inconclusive, BLOCKED. Guard count tracked per-US.
|
|
8
|
+
|
|
9
|
+
**Tech Stack:** Node.js (ESM), zsh (init script), node:test
|
|
10
|
+
|
|
11
|
+
**Spec:** `docs/blueprints/blueprint-flywheel-enhancement.md`
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
### Task 1: Add CEO cognitive patterns #11-12 to flywheel prompt
|
|
16
|
+
|
|
17
|
+
**Files:**
|
|
18
|
+
- Modify: `src/scripts/init_ralph_desk.zsh:624-634`
|
|
19
|
+
- Modify: `tests/node/test-flywheel.mjs:35-48`
|
|
20
|
+
|
|
21
|
+
- [ ] **Step 1: Update test T5 to expect 12 patterns**
|
|
22
|
+
|
|
23
|
+
In `tests/node/test-flywheel.mjs`, update the test title and add 2 new assertions:
|
|
24
|
+
|
|
25
|
+
```javascript
test('T5: flywheel prompt contains 12 CEO cognitive patterns', async () => {
  const script = path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh');
  const content = await fs.readFile(script, 'utf8');
  assert.match(content, /First-principles/);
  assert.match(content, /10x check/);
  assert.match(content, /Inversion/);
  assert.match(content, /Simplicity bias/);
  assert.match(content, /User-back/);
  assert.match(content, /Time-value/);
  assert.match(content, /Sunk cost immunity/);
  assert.match(content, /Blast radius/);
  assert.match(content, /Reversibility/);
  assert.match(content, /Evidence > opinion/);
  assert.match(content, /Proxy skepticism/);
  assert.match(content, /Classification/);
});
```
|
|
43
|
+
|
|
44
|
+
- [ ] **Step 2: Run test to verify it fails**
|
|
45
|
+
|
|
46
|
+
```bash
|
|
47
|
+
node --test tests/node/test-flywheel.mjs
|
|
48
|
+
```
|
|
49
|
+
Expected: T5 FAIL (`Proxy skepticism` not found)
|
|
50
|
+
|
|
51
|
+
- [ ] **Step 3: Add patterns #11-12 to flywheel prompt in init_ralph_desk.zsh**
|
|
52
|
+
|
|
53
|
+
In `src/scripts/init_ralph_desk.zsh`, replace lines 624-634 (the CEO Cognitive Patterns section inside the flywheel prompt heredoc):
|
|
54
|
+
|
|
55
|
+
```
|
|
56
|
+
## CEO Cognitive Patterns (apply throughout your review)
|
|
57
|
+
1. First-principles — ignore convention, start from the problem itself
|
|
58
|
+
2. 10x check — can 2x effort yield 10x better result?
|
|
59
|
+
3. Inversion — what must be true for this approach to fail?
|
|
60
|
+
4. Simplicity bias — prefer simple over complex solutions
|
|
61
|
+
5. User-back — reason backwards from end-user experience
|
|
62
|
+
6. Time-value — does this direction change save 3+ iterations?
|
|
63
|
+
7. Sunk cost immunity — ignore what was already invested
|
|
64
|
+
8. Blast radius — assess impact scope of direction change
|
|
65
|
+
9. Reversibility — prefer easily reversible decisions
|
|
66
|
+
10. Evidence > opinion — judge only by this iteration's actual results
|
|
67
|
+
11. Proxy skepticism — is the optimization metric the right proxy for the real goal?
|
|
68
|
+
12. Classification — hard-to-reverse + large-magnitude changes need stronger evidence
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
- [ ] **Step 4: Run tests to verify they pass**
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
zsh -n src/scripts/init_ralph_desk.zsh && echo "SYNTAX OK"
|
|
75
|
+
node --test tests/node/test-flywheel.mjs
|
|
76
|
+
```
|
|
77
|
+
Expected: SYNTAX OK, all PASS
|
|
78
|
+
|
|
79
|
+
- [ ] **Step 5: Commit**
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
git add tests/node/test-flywheel.mjs src/scripts/init_ralph_desk.zsh
|
|
83
|
+
git commit -m "feat: add CEO patterns #11-12 (proxy skepticism, classification) to flywheel prompt"
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
---
|
|
87
|
+
|
|
88
|
+
### Task 2: Add flywheel guard CLI flags
|
|
89
|
+
|
|
90
|
+
**Files:**
|
|
91
|
+
- Modify: `src/node/run.mjs:8-26` (RUN_DEFAULTS), `:32-65` (help), `:84-163` (parser)
|
|
92
|
+
- Create: `tests/node/test-flywheel-guard.mjs`
|
|
93
|
+
|
|
94
|
+
- [ ] **Step 1: Write failing tests**
|
|
95
|
+
|
|
96
|
+
Create `tests/node/test-flywheel-guard.mjs`:
|
|
97
|
+
|
|
98
|
+
```javascript
import test from 'node:test';
import assert from 'node:assert/strict';

test('G1: RUN_DEFAULTS includes flywheelGuard off and flywheelGuardModel opus', async () => {
  const runModule = await import('../../src/node/run.mjs');
  // Smoke check only: confirms the module loads and exports main.
  // The default values themselves are not asserted here.
  assert.ok(runModule.main);
});

test('G2: --flywheel-guard flag is parsed', async () => {
  const { main } = await import('../../src/node/run.mjs');
  const stream = { data: '', write(v) { this.data += v; } };
  // --flywheel-guard without value should error
  const code = await main(['run', 'test-slug', '--flywheel-guard'], {
    cwd: '/tmp/nonexistent',
    stdout: stream,
    stderr: stream,
    runCampaign: async () => {},
    initCampaign: async () => {},
    readStatus: async () => '',
  });
  assert.equal(code, 1);
  assert.match(stream.data, /missing value for --flywheel-guard/);
});

test('G3: --flywheel-guard-model flag is parsed', async () => {
  const { main } = await import('../../src/node/run.mjs');
  const stream = { data: '', write(v) { this.data += v; } };
  // --flywheel-guard-model without value should error
  const code = await main(['run', 'test-slug', '--flywheel-guard-model'], {
    cwd: '/tmp/nonexistent',
    stdout: stream,
    stderr: stream,
    runCampaign: async () => {},
    initCampaign: async () => {},
    readStatus: async () => '',
  });
  assert.equal(code, 1);
  assert.match(stream.data, /missing value for --flywheel-guard-model/);
});

test('G4: help text includes flywheel guard flags', async () => {
  const { main } = await import('../../src/node/run.mjs');
  const stream = { data: '', write(v) { this.data += v; } };
  await main(['--help'], {
    cwd: '/tmp',
    stdout: stream,
    stderr: stream,
    runCampaign: async () => {},
    initCampaign: async () => {},
    readStatus: async () => '',
  });
  assert.match(stream.data, /--flywheel-guard off\|on/);
  assert.match(stream.data, /--flywheel-guard-model MODEL/);
});
```
|
|
156
|
+
|
|
157
|
+
- [ ] **Step 2: Run tests to verify they fail**
|
|
158
|
+
|
|
159
|
+
```bash
|
|
160
|
+
node --test tests/node/test-flywheel-guard.mjs
|
|
161
|
+
```
|
|
162
|
+
Expected: G2-G4 FAIL (unknown option, help text missing)
|
|
163
|
+
|
|
164
|
+
- [ ] **Step 3: Add defaults, help text, and parser in run.mjs**
|
|
165
|
+
|
|
166
|
+
In `src/node/run.mjs`, add to `RUN_DEFAULTS` (after line 25 `flywheelModel: 'opus'`):
|
|
167
|
+
|
|
168
|
+
```javascript
  flywheelGuard: 'off',
  flywheelGuardModel: 'opus',
```
|
|
172
|
+
|
|
173
|
+
Add to `buildHelpText()` array (after `--flywheel-model MODEL` line):
|
|
174
|
+
|
|
175
|
+
```javascript
  ' --flywheel-guard off|on',
  ' --flywheel-guard-model MODEL',
```
|
|
179
|
+
|
|
180
|
+
Add to `parseRunOptions()` switch (after `--flywheel-model` case):
|
|
181
|
+
|
|
182
|
+
```javascript
case '--flywheel-guard':
  options.flywheelGuard = consumeValue(args, index, token);
  index += 1;
  break;
case '--flywheel-guard-model':
  options.flywheelGuardModel = consumeValue(args, index, token);
  index += 1;
  break;
```
|
|
192
|
+
|
|
193
|
+
- [ ] **Step 4: Run tests to verify they pass**
|
|
194
|
+
|
|
195
|
+
```bash
|
|
196
|
+
node --test tests/node/test-flywheel-guard.mjs
|
|
197
|
+
```
|
|
198
|
+
Expected: G1-G4 all PASS
|
|
199
|
+
|
|
200
|
+
- [ ] **Step 5: Run regression**
|
|
201
|
+
|
|
202
|
+
```bash
|
|
203
|
+
node --test tests/node/us008-cli-entrypoint.test.mjs
|
|
204
|
+
```
|
|
205
|
+
Expected: PASS (new defaults don't break existing deepEqual checks — verify; if deepEqual on RUN_DEFAULTS exists, update it to include new fields)
|
|
206
|
+
|
|
207
|
+
- [ ] **Step 6: Commit**
|
|
208
|
+
|
|
209
|
+
```bash
|
|
210
|
+
git add src/node/run.mjs tests/node/test-flywheel-guard.mjs
|
|
211
|
+
git commit -m "feat: add --flywheel-guard and --flywheel-guard-model CLI flags"
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
---
|
|
215
|
+
|
|
216
|
+
### Task 3: Add guard paths and shouldRunGuard logic
|
|
217
|
+
|
|
218
|
+
**Files:**
|
|
219
|
+
- Modify: `src/node/runner/campaign-main-loop.mjs:37-66` (buildPaths), `:415-419` (shouldRunFlywheel area)
|
|
220
|
+
- Modify: `tests/node/test-flywheel-guard.mjs`
|
|
221
|
+
|
|
222
|
+
- [ ] **Step 1: Write failing tests**
|
|
223
|
+
|
|
224
|
+
Append to `tests/node/test-flywheel-guard.mjs`:
|
|
225
|
+
|
|
226
|
+
```javascript
test('G5: shouldRunGuard returns false when flywheelGuard=off', async () => {
  const { shouldRunGuard } = await import('../../src/node/runner/campaign-main-loop.mjs');
  assert.equal(shouldRunGuard('off', { flywheel_guard_count: {} }), false);
});

test('G6: shouldRunGuard returns true when flywheelGuard=on', async () => {
  const { shouldRunGuard } = await import('../../src/node/runner/campaign-main-loop.mjs');
  assert.equal(shouldRunGuard('on', { flywheel_guard_count: {} }), true);
});

test('G7: shouldRunGuard returns false when flywheelGuard=on but guard retries exhausted', async () => {
  const { shouldRunGuard } = await import('../../src/node/runner/campaign-main-loop.mjs');
  assert.equal(shouldRunGuard('on', { flywheel_guard_count: { 'US-001': 3 } }, 'US-001'), false);
});
```
|
|
242
|
+
|
|
243
|
+
- [ ] **Step 2: Run tests to verify they fail**
|
|
244
|
+
|
|
245
|
+
```bash
|
|
246
|
+
node --test tests/node/test-flywheel-guard.mjs
|
|
247
|
+
```
|
|
248
|
+
Expected: G5-G7 FAIL (shouldRunGuard not exported)
|
|
249
|
+
|
|
250
|
+
- [ ] **Step 3: Implement shouldRunGuard and add guard paths**
|
|
251
|
+
|
|
252
|
+
In `src/node/runner/campaign-main-loop.mjs`, add to `buildPaths()` (after `flywheelSignalFile` line 65):
|
|
253
|
+
|
|
254
|
+
```javascript
|
|
255
|
+
flywheelGuardPromptFile: path.join(deskRoot, 'prompts', `${slug}.flywheel-guard.prompt.md`),
|
|
256
|
+
flywheelGuardVerdictFile: path.join(deskRoot, 'memos', `${slug}-flywheel-guard-verdict.json`),
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
Add new exported function (after `shouldRunFlywheel`):
|
|
260
|
+
|
|
261
|
+
```javascript
|
|
262
|
+
export function shouldRunGuard(flywheelGuard, state, usId) {
|
|
263
|
+
if (flywheelGuard !== 'on') return false;
|
|
264
|
+
const count = (state.flywheel_guard_count ?? {})[usId] ?? 0;
|
|
265
|
+
// max 2 retries (guard runs 1st time + 2 retries = 3 total guard executions max)
|
|
266
|
+
if (count >= 3) return false;
|
|
267
|
+
return true;
|
|
268
|
+
}
|
|
269
|
+
```
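To make the retry budget concrete, the sketch below replays consecutive guard failures for one user story. `shouldRunGuard` is copied from the step above; the `simulateFailures` helper and the `US-001` identifier are illustrative assumptions, not part of the plan.

```javascript
// Copied from the plan: the guard runs only when enabled and under the 3-run cap.
function shouldRunGuard(flywheelGuard, state, usId) {
  if (flywheelGuard !== 'on') return false;
  const count = (state.flywheel_guard_count ?? {})[usId] ?? 0;
  if (count >= 3) return false;
  return true;
}

// Hypothetical helper: count how many guard runs happen if every run fails.
function simulateFailures(usId, attempts) {
  const state = { flywheel_guard_count: {} };
  let runs = 0;
  for (let i = 0; i < attempts; i += 1) {
    if (!shouldRunGuard('on', state, usId)) break;
    // Mirror the per-US bookkeeping: one increment per guard execution.
    state.flywheel_guard_count[usId] = (state.flywheel_guard_count[usId] ?? 0) + 1;
    runs += 1;
  }
  return runs;
}

console.log(simulateFailures('US-001', 10)); // 3: one initial run plus two retries
```

Counts are keyed by user story, so exhausting the budget for `US-001` does not affect any other story.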
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+```bash
+node --test tests/node/test-flywheel-guard.mjs
+```
+Expected: G1-G7 all PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/node/runner/campaign-main-loop.mjs tests/node/test-flywheel-guard.mjs
+git commit -m "feat: add shouldRunGuard logic and guard paths to buildPaths"
+```
+
+---
+
+### Task 4: Add guard prompt template to init_ralph_desk.zsh
+
+**Files:**
+- Modify: `src/scripts/init_ralph_desk.zsh` (after flywheel prompt section, ~line 690)
+- Modify: `src/scripts/init_ralph_desk.zsh:276-283` (cleanup list)
+- Modify: `src/scripts/init_ralph_desk.zsh:294-303` (prompt cleanup)
+- Modify: `tests/node/test-flywheel-guard.mjs`
+
+- [ ] **Step 1: Write failing tests**
+
+Append to `tests/node/test-flywheel-guard.mjs`:
+
+```javascript
+import fs from 'node:fs/promises';
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const repoRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', '..');
+
+test('G8: init generates guard prompt with 4 validation checks', async () => {
+  const script = path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /Look-ahead Bias/);
+  assert.match(content, /Metric Alignment/);
+  assert.match(content, /Deployability/);
+  assert.match(content, /Repeat Pattern/);
+});
+
+test('G9: guard prompt writes to flywheel-guard-verdict.json', async () => {
+  const script = path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /flywheel-guard-verdict\.json/);
+});
+
+test('G10: guard verdict includes analysis_only field', async () => {
+  const script = path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /analysis_only/);
+});
+
+test('G11: guard prompt references PRD as ground truth', async () => {
+  const script = path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /PRD is ground truth/);
+});
+
+test('G12: cleanup list includes guard verdict file', async () => {
+  const script = path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /flywheel-guard-verdict\.json/);
+});
+```
+
+Note: the `fs`, `path`, and `fileURLToPath` imports and the `repoRoot` constant follow the pattern already used in `test-flywheel.mjs`. If you are creating a new file, include them. If you are appending to an existing file, they must still sit at the top of the file: move the import block there and remove any duplicates.
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+```bash
+node --test tests/node/test-flywheel-guard.mjs
+```
+Expected: G8-G12 FAIL
+
+- [ ] **Step 3: Add guard prompt template to init_ralph_desk.zsh**
+
+After the flywheel prompt section (after `else echo " · $F"; fi` around line 690), add:
+
+```bash
+# --- Flywheel Guard Prompt ---
+F="$DESK/prompts/$SLUG.flywheel-guard.prompt.md"
+if [[ ! -f "$F" ]]; then
+  cat > "$F" <<'GUARD_EOF'
+# Flywheel Guard Review
+
+You are an independent reviewer verifying whether a flywheel direction decision is safe to execute.
+You have NO prior context about this campaign. Read the files below and evaluate the decision objectively.
+
+## Files to Read (in order)
+1. PRD: {DESK}/plans/prd-{SLUG}.md — the ground truth for what success means
+2. Flywheel Decision: {DESK}/memos/{SLUG}-flywheel-signal.json — what the flywheel decided
+3. Flywheel Analysis: {DESK}/memos/{SLUG}-flywheel-review.md — the flywheel's reasoning
+4. Campaign Memory: {DESK}/memos/{SLUG}-memory.md — history, rejected directions, key decisions
+5. Done Claim: {DESK}/memos/{SLUG}-done-claim.json — what the Worker actually produced
+6. Verify Verdict: {DESK}/memos/{SLUG}-verify-verdict.json — why the Verifier failed it
+
+## Validation Checks
+
+### Check 1: Look-ahead Bias
+List every data feature the flywheel's proposed direction depends on.
+For each: "feature X — available at decision time: YES/NO/UNCLEAR"
+- YES: feature is known before the event (entry time, session start price, order book state)
+- NO: feature requires future information (peak price, session end, outcome)
+- UNCLEAR: cannot determine from available context → mark inconclusive
+If ANY feature is NO and used in a deployable strategy (not just upper-bound analysis): FAIL.
+
+### Check 2: Metric Alignment
+1. What metric does the PRD define as the optimization target?
+2. What metric does the flywheel's direction optimize?
+3. Are they the same?
+   - Same metric → pass
+   - Different metric, not flagged → FAIL (silent metric switch)
+   - Different metric, flagged with evidence → FAIL with recommendation: "metric mismatch requires PRD update or user approval before proceeding"
+PRD is ground truth. The guard cannot approve off-PRD metric changes autonomously.
+
+### Check 3: Deployability
+Can the proposed direction's output be used in production as-is?
+- Requires post-hoc data → FAIL
+- Requires infrastructure not mentioned in PRD → FAIL
+- Labeled as "upper-bound only" or "reference" → pass, but you MUST include "analysis_only": true in your verdict so Leader skips Worker dispatch (no implementation, analysis record only)
+
+### Check 4: Repeat Pattern (same-US scoped)
+Compare to prior flywheel decisions for the current US only in campaign memory's Key Decisions section.
+- Same scope decision + same underlying approach as a prior flywheel for this US → FAIL
+- Reframing of a previously rejected direction (check Rejected Directions) → FAIL
+- Genuinely new approach → pass
+Before writing your verdict, you MUST append any rejected flywheel direction to campaign memory's Rejected Directions section. This persists the record before cleanup can erase it.
+
+## Output
+Write verdict to: {DESK}/memos/{SLUG}-flywheel-guard-verdict.json
+
+Use this format:
+{
+  "verdict": "pass|fail|inconclusive",
+  "issues": [{"check": "check-name", "status": "pass|fail|inconclusive", "detail": "finding", "evidence": "reference"}],
+  "analysis_only": false,
+  "recommendation": "proceed|retry-flywheel|escalate-to-user",
+  "timestamp": "ISO"
+}
+
+Rules:
+- If ALL checks pass → verdict: pass, recommendation: proceed
+- If ANY check is fail → verdict: fail, recommendation: retry-flywheel
+- If ANY check is inconclusive and none are fail → verdict: inconclusive, recommendation: escalate-to-user
+- Include specific evidence for every check. No "seems fine" or "probably ok."
+GUARD_EOF
+
+  # Replace placeholders with actual paths
+  sed -i '' "s|{DESK}|$DESK|g; s|{SLUG}|$SLUG|g" "$F"
+
+  echo " + $F"
+else echo " · $F"; fi
+```
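The Rules list in the prompt above can be read as a small fold over the per-check statuses. The sketch below is one interpretation of those rules, not code from the package: `summarizeChecks` is a hypothetical name, and treating an empty check list as inconclusive is an assumption of this sketch.

```javascript
// Illustrative: derive the overall verdict from per-check statuses,
// following the precedence in the prompt's Rules list (fail > inconclusive > pass).
function summarizeChecks(issues) {
  // Assumption: with no recorded checks there is no evidence, so escalate.
  if (issues.length === 0) {
    return { verdict: 'inconclusive', recommendation: 'escalate-to-user' };
  }
  const statuses = issues.map((issue) => issue.status);
  if (statuses.includes('fail')) {
    return { verdict: 'fail', recommendation: 'retry-flywheel' };
  }
  if (statuses.includes('inconclusive')) {
    return { verdict: 'inconclusive', recommendation: 'escalate-to-user' };
  }
  return { verdict: 'pass', recommendation: 'proceed' };
}

console.log(summarizeChecks([
  { check: 'look-ahead-bias', status: 'pass' },
  { check: 'metric-alignment', status: 'inconclusive' },
  { check: 'deployability', status: 'fail' },
]));
```

A single failing check outranks any number of inconclusive ones, which matches the "inconclusive and none are fail" wording in the prompt.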
+
+- [ ] **Step 4: Add guard files to cleanup lists**
+
+In `src/scripts/init_ralph_desk.zsh`, add to the runtime memos cleanup list (after `"$DESK/memos/$SLUG-flywheel-review.md"` around line 283):
+
+```bash
+"$DESK/memos/$SLUG-flywheel-guard-verdict.json" \
+```
+
+Add to prompt cleanup list (after `"$DESK/prompts/$SLUG.flywheel.prompt.md"` around line 298):
+
+```bash
+"$DESK/prompts/$SLUG.flywheel-guard.prompt.md" \
+```
+
+- [ ] **Step 5: Run tests to verify they pass**
+
+```bash
+zsh -n src/scripts/init_ralph_desk.zsh && echo "SYNTAX OK"
+node --test tests/node/test-flywheel-guard.mjs
+```
+Expected: SYNTAX OK, G1-G12 all PASS
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/scripts/init_ralph_desk.zsh tests/node/test-flywheel-guard.mjs
+git commit -m "feat: add flywheel guard prompt template with 4 validation checks"
+```
+
+---
+
+### Task 5: Wire guard into campaign-main-loop.mjs
+
+**Files:**
+- Modify: `src/node/runner/campaign-main-loop.mjs:242-261` (readCurrentState), `:402-404` (buildFlywheelTriggerCmd area), `:537-559` (flywheel block in main loop)
+- Modify: `tests/node/test-flywheel-guard.mjs`
+
+- [ ] **Step 1: Write failing tests**
+
+Append to `tests/node/test-flywheel-guard.mjs`:
+
+```javascript
+test('G13: buildPaths includes guard paths', async () => {
+  const script = path.join(repoRoot, 'src', 'node', 'runner', 'campaign-main-loop.mjs');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /flywheelGuardPromptFile/);
+  assert.match(content, /flywheelGuardVerdictFile/);
+});
+
+test('G14: guard dispatch exists in main loop', async () => {
+  const script = path.join(repoRoot, 'src', 'node', 'runner', 'campaign-main-loop.mjs');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /dispatchGuard/);
+  assert.match(content, /phase.*guard/i);
+});
+
+test('G15: guard runs AFTER flywheel and BEFORE worker', async () => {
+  const script = path.join(repoRoot, 'src', 'node', 'runner', 'campaign-main-loop.mjs');
+  const content = await fs.readFile(script, 'utf8');
+  const flywheelPos = content.indexOf('dispatchFlywheel');
+  const guardPos = content.indexOf('dispatchGuard');
+  const workerPos = content.indexOf('dispatchWorker');
+  assert.ok(flywheelPos < guardPos, 'flywheel must come before guard');
+  assert.ok(guardPos < workerPos, 'guard must come before worker');
+});
+
+test('G16: readCurrentState includes flywheel_guard_count', async () => {
+  const script = path.join(repoRoot, 'src', 'node', 'runner', 'campaign-main-loop.mjs');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /flywheel_guard_count/);
+});
+
+test('G17: inconclusive verdict triggers BLOCKED', async () => {
+  const script = path.join(repoRoot, 'src', 'node', 'runner', 'campaign-main-loop.mjs');
+  const content = await fs.readFile(script, 'utf8');
+  assert.match(content, /inconclusive/);
+  assert.match(content, /escalate/i);
+});
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+```bash
+node --test tests/node/test-flywheel-guard.mjs
+```
+Expected: G13-G17 FAIL
+
+- [ ] **Step 3: Add flywheel_guard_count to readCurrentState**
+
+In `src/node/runner/campaign-main-loop.mjs`, add to `readCurrentState()` return object (after `verifier_pane_id` line 259):
+
+```javascript
+flywheel_guard_count: status.flywheel_guard_count ?? {},
+```
+
+- [ ] **Step 4: Add buildGuardTriggerCmd and dispatchGuard**
+
+After `dispatchFlywheel` function (around line 413), add:
+
+```javascript
+function buildGuardTriggerCmd({ guardPromptFile, guardModel, rootDir }) {
+  return `cd ${JSON.stringify(rootDir)} && DISABLE_OMC=1 claude --model ${guardModel} --no-mcp -p "$(cat ${JSON.stringify(guardPromptFile)})"`;
+}
+
+async function dispatchGuard({ paths, sendKeys, guardPaneId, guardModel, rootDir }) {
+  const triggerCmd = buildGuardTriggerCmd({
+    guardPromptFile: paths.flywheelGuardPromptFile,
+    guardModel,
+    rootDir,
+  });
+  await sendKeys(guardPaneId, triggerCmd);
+}
+```
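To see what `buildGuardTriggerCmd` sends to the pane, the sketch below calls it with sample values. The function body is copied from the step above; the `/repo` paths and the `demo` slug are hypothetical.

```javascript
// Copied from the plan.
function buildGuardTriggerCmd({ guardPromptFile, guardModel, rootDir }) {
  return `cd ${JSON.stringify(rootDir)} && DISABLE_OMC=1 claude --model ${guardModel} --no-mcp -p "$(cat ${JSON.stringify(guardPromptFile)})"`;
}

const cmd = buildGuardTriggerCmd({
  guardPromptFile: '/repo/desk/prompts/demo.flywheel-guard.prompt.md', // hypothetical path
  guardModel: 'opus',
  rootDir: '/repo', // hypothetical path
});

console.log(cmd);
// cd "/repo" && DISABLE_OMC=1 claude --model opus --no-mcp -p "$(cat "/repo/desk/prompts/demo.flywheel-guard.prompt.md")"
```

`JSON.stringify` wraps each path in double quotes, which protects spaces. The shell still expands `$` and backticks inside double quotes, and `guardModel` is interpolated unquoted, so this construction assumes paths and model names free of shell metacharacters.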
+
+- [ ] **Step 5: Wire guard into main loop flywheel block**
+
+Replace the flywheel block (lines 537-559) with the expanded version that includes guard:
+
+```javascript
+// Flywheel direction review (runs BEFORE Worker)
+if (shouldRunFlywheel(options.flywheel ?? 'off', state)) {
+  state.phase = 'flywheel';
+  await writeStatus(paths, state, options.onStatusChange, options.now);
+
+  await dispatchFlywheel({
+    paths,
+    sendKeys,
+    flywheelPaneId: state.flywheel_pane_id ?? state.verifier_pane_id,
+    flywheelModel: options.flywheelModel ?? 'opus',
+    rootDir,
+  });
+
+  const flywheelSignal = await pollForSignal(paths.flywheelSignalFile, {
+    mode: 'claude',
+    paneId: state.flywheel_pane_id ?? state.verifier_pane_id,
+  });
+
+  state.last_flywheel_decision = flywheelSignal.decision;
+  await fs.unlink(paths.flywheelSignalFile).catch(() => {});
+
+  // Flywheel Guard (independent validation)
+  if (shouldRunGuard(options.flywheelGuard ?? 'off', state, state.current_us)) {
+    state.phase = 'guard';
+    await writeStatus(paths, state, options.onStatusChange, options.now);
+
+    const guardPaneId = state.flywheel_pane_id ?? state.verifier_pane_id;
+    const guardModel = options.flywheelGuardModel ?? 'opus';
+
+    await dispatchGuard({ paths, sendKeys, guardPaneId, guardModel, rootDir });
+
+    const guardVerdict = await pollForSignal(paths.flywheelGuardVerdictFile, {
+      mode: 'claude',
+      paneId: guardPaneId,
+    });
+
+    // Track per-US guard count
+    if (!state.flywheel_guard_count[state.current_us]) {
+      state.flywheel_guard_count[state.current_us] = 0;
+    }
+    state.flywheel_guard_count[state.current_us] += 1;
+
+    await fs.unlink(paths.flywheelGuardVerdictFile).catch(() => {});
+
+    if (guardVerdict.verdict === 'inconclusive') {
+      // Escalate to user — BLOCKED
+      state.phase = 'blocked';
+      await writeSentinel(paths.blockedSentinel, 'blocked', state.current_us);
+      await writeStatus(paths, state, options.onStatusChange, options.now);
+      return {
+        status: 'blocked',
+        usId: state.current_us,
+        reason: 'flywheel-guard-inconclusive',
+        guardIssues: guardVerdict.issues,
+        statusFile: paths.statusFile,
+      };
+    }
+
+    if (guardVerdict.verdict === 'fail') {
+      // Check if retries exhausted
+      if (state.flywheel_guard_count[state.current_us] >= 3) {
+        state.phase = 'blocked';
+        await writeSentinel(paths.blockedSentinel, 'blocked', state.current_us);
+        await writeStatus(paths, state, options.onStatusChange, options.now);
+        return {
+          status: 'blocked',
+          usId: state.current_us,
+          reason: 'flywheel-guard-retries-exhausted',
+          guardIssues: guardVerdict.issues,
+          statusFile: paths.statusFile,
+        };
+      }
+      // Retry: skip Worker, go to next iteration (flywheel will re-run)
+      // Guard feedback is already persisted via guard agent's memory write-back
+      state.phase = 'worker';
+      await writeStatus(paths, state, options.onStatusChange, options.now);
+      state.iteration += 1;
+      continue;
+    }
+
+    // verdict === 'pass'
+    if (guardVerdict.analysis_only) {
+      // Analysis-only direction — skip Worker, record and continue
+      state.phase = 'worker';
+      await writeStatus(paths, state, options.onStatusChange, options.now);
+      state.iteration += 1;
+      continue;
+    }
+  }
+
+  // Reset guard count on pass (flywheel direction accepted)
+  if (state.flywheel_guard_count[state.current_us]) {
+    state.flywheel_guard_count[state.current_us] = 0;
+  }
+}
+```
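The branch order in the block above can be summarized as a pure function from verdict and guard count to the Leader's next action. This is an illustrative restatement, not code from the package: `nextGuardAction` and its return labels are hypothetical names.

```javascript
// Illustrative: mirror the branch order of the guard block.
// `count` is the per-US guard count AFTER the current run has been recorded.
function nextGuardAction(guardVerdict, count) {
  if (guardVerdict.verdict === 'inconclusive') return 'blocked';
  if (guardVerdict.verdict === 'fail') {
    // Third failed run exhausts the budget (1 initial run + 2 retries).
    return count >= 3 ? 'blocked' : 'retry-flywheel';
  }
  // Any remaining verdict is handled as a pass, as in the block above.
  return guardVerdict.analysis_only ? 'skip-worker' : 'run-worker';
}

console.log(nextGuardAction({ verdict: 'fail' }, 2)); // retry-flywheel
console.log(nextGuardAction({ verdict: 'fail' }, 3)); // blocked
```

Keeping the decision in one function like this would let the branch logic be unit-tested directly, whereas G13-G17 only match strings in the source file.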
+
+- [ ] **Step 6: Run tests to verify they pass**
+
+```bash
+node --test tests/node/test-flywheel-guard.mjs
+node --test tests/node/test-flywheel.mjs
+```
+Expected: all PASS
+
+- [ ] **Step 7: Run regression**
+
+```bash
+node --test tests/node/us007-analytics-reporting.test.mjs
+node --test tests/node/us008-cli-entrypoint.test.mjs
+```
+Expected: all PASS (update us008 deepEqual if it checks status.json shape with new `flywheel_guard_count` field)
+
+- [ ] **Step 8: Commit**
+
+```bash
+git add src/node/runner/campaign-main-loop.mjs tests/node/test-flywheel-guard.mjs
+git commit -m "feat: wire flywheel guard into campaign loop (after flywheel, before worker)"
+```
+
+---
+
+### Task 6: Add guard flags to docs and presets
+
+**Files:**
+- Modify: `src/commands/rlp-desk.md:192-194` and `:222-224`
+- Modify: `src/scripts/init_ralph_desk.zsh:243-244`
+- Modify: `tests/node/test-flywheel-guard.mjs`
+
+- [ ] **Step 1: Write failing tests**
+
+Append to `tests/node/test-flywheel-guard.mjs`:
+
+```javascript
+test('G18: rlp-desk.md options reference includes guard flags', async () => {
+  const content = await fs.readFile(path.join(repoRoot, 'src', 'commands', 'rlp-desk.md'), 'utf8');
+  assert.match(content, /--flywheel-guard off\|on/);
+  assert.match(content, /--flywheel-guard-model MODEL/);
+});
+
+test('G19: init presets include guard flags', async () => {
+  const content = await fs.readFile(path.join(repoRoot, 'src', 'scripts', 'init_ralph_desk.zsh'), 'utf8');
+  assert.match(content, /--flywheel-guard off\|on/);
+  assert.match(content, /--flywheel-guard-model MODEL/);
+});
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+```bash
+node --test tests/node/test-flywheel-guard.mjs
+```
+Expected: G18-G19 FAIL
+
+- [ ] **Step 3: Add guard flags to rlp-desk.md**
+
+In `src/commands/rlp-desk.md`, after both `--flywheel-model MODEL` lines (lines 194 and 224), add:
+
+```
+# --flywheel-guard off|on guard validates flywheel decisions (default: off)
+# --flywheel-guard-model MODEL guard reviewer model (default: opus)
+```
+
+- [ ] **Step 4: Add guard flags to init presets**
+
+In `src/scripts/init_ralph_desk.zsh`, after `--flywheel-model MODEL` echo (line 244), add:
+
+```bash
+echo "# --flywheel-guard off|on guard validates flywheel decisions (default: off)"
+echo "# --flywheel-guard-model MODEL guard reviewer model (default: opus)"
+```
+
+- [ ] **Step 5: Run tests to verify they pass**
+
+```bash
+zsh -n src/scripts/init_ralph_desk.zsh && echo "SYNTAX OK"
+node --test tests/node/test-flywheel-guard.mjs
+```
+Expected: SYNTAX OK, all PASS
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/commands/rlp-desk.md src/scripts/init_ralph_desk.zsh tests/node/test-flywheel-guard.mjs
+git commit -m "feat: add flywheel guard flags to docs and run presets"
+```
+
+---
+
+### Task 7: Update governance.md — guard step in Leader Loop
+
+**Files:**
+- Modify: `src/governance.md:453-509` (§7 Leader Loop Protocol)
+
+- [ ] **Step 1: Add guard step to Leader Loop Protocol**
+
+In `src/governance.md`, extend the Leader Loop Protocol (§7). The flywheel is not yet mentioned in §7 (it exists only in the code), so add a new sub-step between ⑥ and ⑦ that describes both the flywheel review and the guard:
+
+After `⑥ Read memory.md again` (line 479), add:
+
+```
+⑥½ Flywheel direction review (when --flywheel on-fail and consecutive_failures > 0)
+   - Dispatch Flywheel agent (fresh context, --flywheel-model)
+   - Read flywheel-signal.json for direction decision (hold/pivot/reduce/expand)
+   - If --flywheel-guard on:
+     - Dispatch Guard agent (fresh context, --flywheel-guard-model)
+     - Read flywheel-guard-verdict.json:
+       • pass → proceed to Worker with updated contract
+       • pass + analysis_only → skip Worker, record analysis, next iteration
+       • fail → re-run Flywheel with guard feedback (max 2 retries)
+       • fail + retries exhausted → BLOCKED
+       • inconclusive → BLOCKED (escalate to user)
+     - Guard count tracked per-US in status.json
+```
+
+- [ ] **Step 2: Verify syntax**
+
+```bash
+# Quick check the file is valid markdown:
+head -5 src/governance.md
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add src/governance.md
+git commit -m "docs: add flywheel guard step to §7 Leader Loop Protocol"
+```
+
+---
+
+### Task 8: Local sync + full regression
+
+**Files:** none modified — sync and verification only
+
+- [ ] **Step 1: Run full test suite**
+
+```bash
+node --test tests/node/test-flywheel.mjs tests/node/test-flywheel-guard.mjs tests/node/us007-analytics-reporting.test.mjs tests/node/us008-cli-entrypoint.test.mjs
+```
+Expected: 0 failures
+
+- [ ] **Step 2: Check zsh syntax**
+
+```bash
+zsh -n src/scripts/init_ralph_desk.zsh && echo "SYNTAX OK"
+```
+
+- [ ] **Step 3: Local file sync**
+
+```bash
+cp src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
+cp src/governance.md ~/.claude/ralph-desk/governance.md
+cp src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
+cp src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+cp src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
+cp README.md ~/.claude/ralph-desk/README.md
+```
+
+- [ ] **Step 4: Verify sync**
+
+```bash
+diff -q src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
+diff -q src/governance.md ~/.claude/ralph-desk/governance.md
+diff -q src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
+diff -q src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+diff -q src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
+diff -q README.md ~/.claude/ralph-desk/README.md
+```
+All must produce no output (identical).
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.9.
+  "version": "0.9.3",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",
package/src/commands/rlp-desk.md
CHANGED
@@ -192,6 +192,8 @@ Tell the user:
 # --with-self-verification post-campaign SV report
 # --flywheel off|on-fail direction review on fail (default: off)
 # --flywheel-model MODEL flywheel reviewer model (default: opus)
+# --flywheel-guard off|on guard validates flywheel decisions (default: off)
+# --flywheel-guard-model MODEL guard reviewer model (default: opus)
 ```
 
 **If codex is NOT installed** — show claude-only presets + install recommendation:
@@ -222,6 +224,8 @@ Tell the user:
 # --with-self-verification post-campaign SV report
 # --flywheel off|on-fail direction review on fail (default: off)
 # --flywheel-model MODEL flywheel reviewer model (default: opus)
+# --flywheel-guard off|on guard validates flywheel decisions (default: off)
+# --flywheel-guard-model MODEL guard reviewer model (default: opus)
 ```
 
 Replace `<actual-slug>` with the real slug from this init (e.g. `auth-refactor`).
package/src/governance.md
CHANGED
@@ -483,6 +483,19 @@ for iteration in 1..max_iter:
 parsing memory.md. In Agent() mode, the Leader MAY read iter-signal.json
 as a structured alternative to parsing the Stop Status from memory.md.
 
+⑥½ Flywheel direction review (when --flywheel on-fail and consecutive_failures > 0)
+   - Dispatch Flywheel agent (fresh context, --flywheel-model)
+   - Read flywheel-signal.json for direction decision (hold/pivot/reduce/expand)
+   - If --flywheel-guard on:
+     - Dispatch Guard agent (fresh context, --flywheel-guard-model)
+     - Read flywheel-guard-verdict.json:
+       • pass → proceed to Worker with updated contract
+       • pass + analysis_only → skip Worker, record analysis, next iteration
+       • fail → re-run Flywheel with guard feedback (max 2 retries)
+       • fail + retries exhausted → BLOCKED
+       • inconclusive → BLOCKED (escalate to user)
+     - Guard count tracked per-US in status.json
+
 ⑦ Execute Verifier (see §7a for per-US and §7b for consensus details)
    - Build prompt (scoped to us_id if per-us mode) → log
    - Agent(subagent_type="executor", model=selected, prompt=prompt)
package/src/node/run.mjs
CHANGED
@@ -23,6 +23,8 @@ const RUN_DEFAULTS = {
   withSelfVerification: false,
   flywheel: 'off',
   flywheelModel: 'opus',
+  flywheelGuard: 'off',
+  flywheelGuardModel: 'opus',
 };
 
 function write(stream, value) {
@@ -61,6 +63,8 @@ function buildHelpText() {
     ' --with-self-verification',
     ' --flywheel off|on-fail',
     ' --flywheel-model MODEL',
+    ' --flywheel-guard off|on',
+    ' --flywheel-guard-model MODEL',
     ' --help',
   ].join('\n');
 }
@@ -154,6 +158,14 @@ function parseRunOptions(args, cwd) {
         options.flywheelModel = consumeValue(args, index, token);
         index += 1;
         break;
+      case '--flywheel-guard':
+        options.flywheelGuard = consumeValue(args, index, token);
+        index += 1;
+        break;
+      case '--flywheel-guard-model':
+        options.flywheelGuardModel = consumeValue(args, index, token);
+        index += 1;
+        break;
       default:
         throw new Error(`unknown option: ${token}`);
     }
@@ -63,6 +63,8 @@ function buildPaths(rootDir, slug) {
|
|
|
63
63
|
statusFile: path.join(campaignLogDir, 'runtime', 'status.json'),
|
|
64
64
|
flywheelPromptFile: path.join(deskRoot, 'prompts', `${slug}.flywheel.prompt.md`),
|
|
65
65
|
flywheelSignalFile: path.join(deskRoot, 'memos', `${slug}-flywheel-signal.json`),
|
|
66
|
+
flywheelGuardPromptFile: path.join(deskRoot, 'prompts', `${slug}.flywheel-guard.prompt.md`),
|
|
67
|
+
flywheelGuardVerdictFile: path.join(deskRoot, 'memos', `${slug}-flywheel-guard-verdict.json`),
|
|
66
68
|
};
|
|
67
69
|
}
|
|
68
70
|
|
|
@@ -257,6 +259,7 @@ async function readCurrentState(paths, slug, options) {
     leader_pane_id: status.leader_pane_id ?? null,
     worker_pane_id: status.worker_pane_id ?? null,
     verifier_pane_id: status.verifier_pane_id ?? null,
+    flywheel_guard_count: status.flywheel_guard_count ?? {},
     started_at_utc: startedAt,
   };
 }
|
@@ -412,12 +415,32 @@ async function dispatchFlywheel({ paths, sendKeys, flywheelPaneId, flywheelModel
   await sendKeys(flywheelPaneId, triggerCmd);
 }
 
+function buildGuardTriggerCmd({ guardPromptFile, guardModel, rootDir }) {
+  return `cd ${JSON.stringify(rootDir)} && DISABLE_OMC=1 claude --model ${guardModel} --no-mcp -p "$(cat ${JSON.stringify(guardPromptFile)})"`;
+}
+
+async function dispatchGuard({ paths, sendKeys, guardPaneId, guardModel, rootDir }) {
+  const triggerCmd = buildGuardTriggerCmd({
+    guardPromptFile: paths.flywheelGuardPromptFile,
+    guardModel,
+    rootDir,
+  });
+  await sendKeys(guardPaneId, triggerCmd);
+}
+
 export function shouldRunFlywheel(flywheelMode, state) {
   if (flywheelMode === 'off') return false;
   if (flywheelMode === 'on-fail' && (state.consecutive_failures ?? 0) > 0) return true;
   return false;
 }
 
+export function shouldRunGuard(flywheelGuard, state, usId) {
+  if (flywheelGuard !== 'on') return false;
+  const count = (state.flywheel_guard_count ?? {})[usId] ?? 0;
+  if (count >= 3) return false;
+  return true;
+}
+
 export async function run(slug, options = {}) {
   const rootDir = path.resolve(options.rootDir ?? process.cwd());
   const paths = buildPaths(rootDir, slug);
|
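`buildGuardTriggerCmd` wraps paths with `JSON.stringify`, which yields a double-quoted string. Inside double quotes a POSIX shell still expands `$` and backticks, so a path containing those characters would be altered. One possible hardening, shown only as a sketch, is single-quote escaping. `shellQuote` is a hypothetical helper and is not part of the package.

```javascript
// Hypothetical helper (not in the package): wrap a value in single quotes for a
// POSIX shell, turning each embedded single quote into the sequence '\''.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Illustrative path only; the dollar sign would be expanded inside double quotes.
const exampleRoot = "/tmp/it's $HOME";
console.log(`cd ${shellQuote(exampleRoot)}`);
```

Single quotes suppress every expansion, which is why the only character that needs special handling is the single quote itself.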
|
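The behaviour of `shouldRunGuard` can be checked in isolation. The snippet below copies the function from the hunk (with `export` dropped so it runs standalone) and evaluates it for a few states. `'US-001'` is a hypothetical user-story id used only for illustration.

```javascript
// Copied from the hunk, minus the export keyword.
function shouldRunGuard(flywheelGuard, state, usId) {
  if (flywheelGuard !== 'on') return false;
  const count = (state.flywheel_guard_count ?? {})[usId] ?? 0;
  if (count >= 3) return false;
  return true;
}

// Each case is [guard mode, campaign state]; 'US-001' is a made-up story id.
const guardCases = [
  ['off', { flywheel_guard_count: {} }],               // guard disabled
  ['on', {}],                                          // older status without the new field
  ['on', { flywheel_guard_count: { 'US-001': 2 } }],   // two guard runs so far
  ['on', { flywheel_guard_count: { 'US-001': 3 } }],   // cap reached
];

const guardResults = guardCases.map(([mode, state]) => shouldRunGuard(mode, state, 'US-001'));
console.log(guardResults); // false, true, true, false
```

The two `??` fallbacks mean a state object that predates `flywheel_guard_count` is treated as zero prior guard runs.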
@@ -553,9 +576,76 @@ export async function run(slug, options = {}) {
       });
 
       state.last_flywheel_decision = flywheelSignal.decision;
-      // Campaign memory already updated by flywheel agent
-      // Clean signal file for next iteration
       await fs.unlink(paths.flywheelSignalFile).catch(() => {});
+
+      // Flywheel Guard (independent validation of flywheel decision)
+      if (shouldRunGuard(options.flywheelGuard ?? 'off', state, state.current_us)) {
+        state.phase = 'guard';
+        await writeStatus(paths, state, options.onStatusChange, options.now);
+
+        const guardPaneId = state.flywheel_pane_id ?? state.verifier_pane_id;
+        const guardModel = options.flywheelGuardModel ?? 'opus';
+
+        await dispatchGuard({ paths, sendKeys, guardPaneId, guardModel, rootDir });
+
+        const guardVerdict = await pollForSignal(paths.flywheelGuardVerdictFile, {
+          mode: 'claude',
+          paneId: guardPaneId,
+        });
+
+        if (!state.flywheel_guard_count[state.current_us]) {
+          state.flywheel_guard_count[state.current_us] = 0;
+        }
+        state.flywheel_guard_count[state.current_us] += 1;
+
+        await fs.unlink(paths.flywheelGuardVerdictFile).catch(() => {});
+
+        if (guardVerdict.verdict === 'inconclusive') {
+          state.phase = 'blocked';
+          await writeSentinel(paths.blockedSentinel, 'blocked', state.current_us);
+          await writeStatus(paths, state, options.onStatusChange, options.now);
+          return {
+            status: 'blocked',
+            usId: state.current_us,
+            reason: 'flywheel-guard-escalate-inconclusive',
+            guardIssues: guardVerdict.issues,
+            statusFile: paths.statusFile,
+          };
+        }
+
+        if (guardVerdict.verdict === 'fail') {
+          if (state.flywheel_guard_count[state.current_us] >= 3) {
+            state.phase = 'blocked';
+            await writeSentinel(paths.blockedSentinel, 'blocked', state.current_us);
+            await writeStatus(paths, state, options.onStatusChange, options.now);
+            return {
+              status: 'blocked',
+              usId: state.current_us,
+              reason: 'flywheel-guard-retries-exhausted',
+              guardIssues: guardVerdict.issues,
+              statusFile: paths.statusFile,
+            };
+          }
+          // Retry: skip Worker, continue to next iteration (flywheel will re-run)
+          state.phase = 'worker';
+          await writeStatus(paths, state, options.onStatusChange, options.now);
+          state.iteration += 1;
+          continue;
+        }
+
+        // verdict === 'pass'
+        if (guardVerdict.analysis_only) {
+          state.phase = 'worker';
+          await writeStatus(paths, state, options.onStatusChange, options.now);
+          state.iteration += 1;
+          continue;
+        }
+      }
+
+      // Reset guard count on pass (flywheel direction accepted)
+      if (state.flywheel_guard_count[state.current_us]) {
+        state.flywheel_guard_count[state.current_us] = 0;
+      }
     }
 
     state.phase = 'worker';
|
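The verdict branching in this hunk can be read as a small decision table. The sketch below restates it as a pure function so the outcomes are easier to follow. `nextStepAfterGuard`, `MAX_GUARD_RUNS`, and the `step` labels are hypothetical names introduced for illustration. Only the fields `verdict`, `issues`, and `analysis_only` are read by the hunk, and the full verdict-file schema is not shown in this diff.

```javascript
const MAX_GUARD_RUNS = 3; // mirrors the literal 3 used in the hunk

// Hypothetical pure helper: maps one guard verdict to the loop's next step.
// `runsSoFar` is the per-story guard count before this guard run.
function nextStepAfterGuard(verdict, runsSoFar) {
  const runs = runsSoFar + 1; // the loop increments the count before branching
  if (verdict.verdict === 'inconclusive') {
    return { step: 'blocked', reason: 'flywheel-guard-escalate-inconclusive', runs };
  }
  if (verdict.verdict === 'fail') {
    if (runs >= MAX_GUARD_RUNS) {
      return { step: 'blocked', reason: 'flywheel-guard-retries-exhausted', runs };
    }
    return { step: 'retry-flywheel', runs }; // Worker skipped, next iteration
  }
  // Any other verdict is handled as a pass, as in the hunk.
  if (verdict.analysis_only) {
    return { step: 'skip-worker', runs }; // iteration advances without a Worker run
  }
  return { step: 'run-worker', runs: 0 }; // count resets once the direction is accepted
}

// Three consecutive failures: two retries, then the campaign blocks.
let guardRuns = 0;
const guardSteps = [];
for (const verdict of [{ verdict: 'fail' }, { verdict: 'fail' }, { verdict: 'fail' }]) {
  const outcome = nextStepAfterGuard(verdict, guardRuns);
  guardRuns = outcome.runs;
  guardSteps.push(outcome.step);
}
console.log(guardSteps); // retry-flywheel, retry-flywheel, blocked
```

The sketch leaves out the side effects of the real loop (status writes, the blocked sentinel, and deletion of the verdict file).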