opencode-ralph-rlm 0.1.5
- package/README.md +609 -0
- package/dist/ralph-rlm.js +24643 -0
- package/package.json +47 -0
package/README.md
ADDED
|
@@ -0,0 +1,609 @@
|
|
|
1
|
+
# ralph-rlm
|
|
2
|
+
|
|
3
|
+
An [OpenCode](https://opencode.ai) plugin that turns an AI coding session into a persistent, self-correcting loop. Describe a goal, walk away, and come back to working code.
|
|
4
|
+
|
|
5
|
+
Two techniques combine to make this work:
|
|
6
|
+
|
|
7
|
+
- **Ralph** — a strategist session spawned fresh per attempt. It reviews what failed, adjusts the plan and instructions, then delegates coding to a worker. It never writes code itself.
|
|
8
|
+
- **RLM** (Recursive Language Model worker) — a file-first coding session based on [arXiv:2512.24601](https://arxiv.org/abs/2512.24601). Each attempt gets a clean context window and loads all state from files rather than inheriting noise from prior turns.
|
|
9
|
+
|
|
10
|
+
|
|
11
|
+
## The problem this solves
|
|
12
|
+
|
|
13
|
+
Long-running AI coding sessions degrade. By attempt 4 or 5, the context window contains echoes of three failed strategies, retracted plans, contradictory tool outputs, and the model's own hedging. The agent starts reasoning from a corrupted premise. You end up re-explaining what went wrong, manually pruning bad state, or starting over.
|
|
14
|
+
|
|
15
|
+
The standard response, "just use a bigger context window," makes things worse. More capacity means more noise survives longer. The problem isn't window size; it's window hygiene.
|
|
16
|
+
|
|
17
|
+
ralph-rlm solves this by treating each attempt as disposable. State lives in files, not in the context window. Each new session loads exactly what it needs from those files and nothing else. The loop gets smarter with each failure by updating its instructions — not by accumulating turns.
|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
## Philosophy
|
|
21
|
+
|
|
22
|
+
### Fresh context windows over long conversations
|
|
23
|
+
|
|
24
|
+
The insight from the RLM paper is that context windows are not free. Every token of prior conversation competes with new information for the model's attention. Failed attempts, debug noise, and superseded plans don't disappear when you move on — they stay in the window and subtly bias future reasoning.
|
|
25
|
+
|
|
26
|
+
The solution is to make context windows **ephemeral by design**. Each worker session:
|
|
27
|
+
- Starts clean, with no memory of prior attempts
|
|
28
|
+
- Loads exactly the state it needs from protocol files
|
|
29
|
+
- Does one pass, then stops
|
|
30
|
+
|
|
31
|
+
The protocol files carry forward what matters. Everything else is discarded.
|
|
32
|
+
|
|
33
|
+
### Files as the memory primitive
|
|
34
|
+
|
|
35
|
+
Context windows are session-local and finite. Files are persistent, inspectable, diff-able, and shared across sessions. By routing all persistent state through the filesystem, the loop gains properties that in-context memory cannot provide:
|
|
36
|
+
|
|
37
|
+
- **Durability**: state survives crashes, restarts, and context compression
|
|
38
|
+
- **Inspectability**: you can read `AGENT_CONTEXT_FOR_NEXT_RALPH.md` at any time to see exactly what the next attempt will see
|
|
39
|
+
- **Shareability**: multiple sessions (Ralph, worker, sub-agents) read and write the same files concurrently
|
|
40
|
+
- **Debuggability**: the entire history of an overnight run is in plaintext files you can grep
|
|
41
|
+
|
|
42
|
+
### Separation of strategy and execution
|
|
43
|
+
|
|
44
|
+
The Ralph strategist session exists because mixing strategy and execution in the same context is how reasoning degrades. When a session that just wrote failing code is also responsible for diagnosing *why* it failed and planning the next approach, it pattern-matches against its own failed reasoning. It proposes variations on what didn't work rather than stepping back.
|
|
45
|
+
|
|
46
|
+
Ralph's session gets a fresh window. It reads the failure record cold, without the accumulated baggage of having written the code. This mirrors how experienced engineering teams work: the reviewer of a failing PR is often not the one who writes the fix.
|
|
47
|
+
|
|
48
|
+
### The verify contract
|
|
49
|
+
|
|
50
|
+
The loop is only as good as its exit condition. `verify.command` is the single source of truth for "done." A machine-verifiable criterion — tests pass, types check, linter clean — turns the exit question from a judgment call into a boolean. The model cannot talk its way out of a failing test suite.
|
|
51
|
+
|
|
52
|
+
This contract has a corollary: **the better your verify command, the better the loop performs.** A verify that checks only syntax will produce syntactically valid but logically broken code. A verify that runs the full test suite, typechecks, and lints will produce code that passes all three.
|
|
53
|
+
|
|
54
|
+
### The grep-first discipline
|
|
55
|
+
|
|
56
|
+
The RLM paper demonstrates that full-file reads are expensive and often counterproductive. When a model dumps a 2000-line file into its context to answer a question that requires 30 lines, the relevant section is buried in noise and the window fills with irrelevant code.
|
|
57
|
+
|
|
58
|
+
`rlm_grep` + `rlm_slice` give surgical access: search first to find line numbers, then read only the relevant range. `CONTEXT_FOR_RLM.md` is the designated large-reference file — a place to paste API docs, specs, or large codebases that should never be read in full.
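The search-then-read pattern can be sketched in a few lines. This is a minimal illustration only, assuming the reference document is already in memory as a string; `grepLines` and `sliceLines` are hypothetical stand-ins, not the plugin's `rlm_grep` / `rlm_slice` implementations.

```typescript
// Hypothetical stand-ins for rlm_grep and rlm_slice; the real tools read files.
interface GrepMatch {
  line: number; // 1-based line number
  text: string;
}

function grepLines(content: string, pattern: RegExp, maxMatches = 20): GrepMatch[] {
  const lines = content.split("\n");
  const matches: GrepMatch[] = [];
  for (let i = 0; i < lines.length && matches.length < maxMatches; i++) {
    const text = lines[i] ?? "";
    pattern.lastIndex = 0; // keep global/sticky regexes from skipping lines
    if (pattern.test(text)) matches.push({ line: i + 1, text });
  }
  return matches;
}

function sliceLines(content: string, startLine: number, endLine: number): string {
  // Inclusive, 1-based range, matching the line numbers grepLines reports.
  return content.split("\n").slice(startLine - 1, endLine).join("\n");
}

const reference = [
  "# API",
  "## Auth",
  "POST /login returns a token",
  "## Users",
  "GET /users lists users",
].join("\n");

const [hit] = grepLines(reference, /login/);
// Read one line of context above the match instead of the whole document.
const excerpt = hit ? sliceLines(reference, hit.line - 1, hit.line) : "";
```

Only the two lines around the match enter the context; the rest of the reference stays on disk.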
|
|
59
|
+
|
|
60
|
+
### Persistent learning across attempts
|
|
61
|
+
|
|
62
|
+
`NOTES_AND_LEARNINGS.md` and `RLM_INSTRUCTIONS.md` are the loop's long-term memory. They survive context resets and accumulate across attempts. The loop doesn't just retry — it gets smarter with each failure.
|
|
63
|
+
|
|
64
|
+
`RLM_INSTRUCTIONS.md` is the inner loop's operating manual. The Ralph strategist updates it between attempts when a pattern of failures reveals a gap in guidance. By attempt 10, the instructions encode everything learned from attempts 1-9.
|
|
65
|
+
|
|
66
|
+
This is why the approach scales to overnight runs. A fresh worker in attempt 10 starts with the accumulated knowledge of 9 prior attempts, encoded in protocol files, without the accumulated noise.
|
|
67
|
+
|
|
68
|
+
|
|
69
|
+
## How it works
|
|
70
|
+
|
|
71
|
+
### Three-level architecture
|
|
72
|
+
|
|
73
|
+
```
|
|
74
|
+
You → main session (thin meta-supervisor — your conversation)
|
|
75
|
+
│
|
|
76
|
+
├─ attempt 1:
|
|
77
|
+
│ ├─ spawns Ralph strategist session R1 ← fresh context
|
|
78
|
+
│ │ R1: ralph_load_context() → review failures → update PLAN.md
|
|
79
|
+
│ │ → ralph_spawn_worker() → STOP
|
|
80
|
+
│ │
|
|
81
|
+
│ └─ spawns RLM worker session W1 ← fresh context
|
|
82
|
+
│ W1: ralph_load_context() → code → ralph_verify() → STOP
|
|
83
|
+
│
|
|
84
|
+
├─ plugin verifies on W1 idle
|
|
85
|
+
│ fail → roll state files → spawn attempt 2
|
|
86
|
+
│
|
|
87
|
+
├─ attempt 2:
|
|
88
|
+
│ ├─ spawns Ralph strategist session R2 ← fresh context again
|
|
89
|
+
│ │ R2: reads AGENT_CONTEXT_FOR_NEXT_RALPH.md → adjusts strategy
|
|
90
|
+
│ │ → ralph_spawn_worker() → STOP
|
|
91
|
+
│ │
|
|
92
|
+
│ └─ spawns RLM worker session W2 ← fresh context
|
|
93
|
+
│ W2: loads compact state from files → code → STOP
|
|
94
|
+
│
|
|
95
|
+
└─ pass → done toast
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
Each session role has a distinct purpose. The Ralph and worker sessions each start with a **fresh context window**; only the main session is persistent:
|
|
99
|
+
|
|
100
|
+
| Role | Session | Context | Responsibility |
|
|
101
|
+
|---|---|---|---|
|
|
102
|
+
| **main** | Your conversation | Persistent | Goal → stop. Plugin handles the rest. |
|
|
103
|
+
| **ralph** | Per-attempt strategist | Fresh | Review failure, update PLAN.md / RLM_INSTRUCTIONS.md, call `ralph_spawn_worker()`. |
|
|
104
|
+
| **worker** | Per-attempt coder | Fresh | `ralph_load_context()` → code → `ralph_verify()` → stop. |
|
|
105
|
+
|
|
106
|
+
### The state machine
|
|
107
|
+
|
|
108
|
+
```
|
|
109
|
+
main idle
|
|
110
|
+
└─ spawn Ralph(1)
|
|
111
|
+
└─ Ralph(1) calls ralph_spawn_worker()
|
|
112
|
+
└─ spawn Worker(1)
|
|
113
|
+
└─ Worker(1) calls ralph_verify() and goes idle
|
|
114
|
+
└─ plugin runs verify
|
|
115
|
+
├─ pass → done
|
|
116
|
+
└─ fail → roll state files
|
|
117
|
+
└─ spawn Ralph(2)
|
|
118
|
+
└─ (repeat)
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
The plugin drives the loop from `session.idle` events. Neither Ralph nor the worker needs to know about the outer loop; each just loads context, does its job, and stops.
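One way to picture the idle-driven loop is as a pure transition function from an idle event to the next action. The sketch below is an assumption-laden model, not the plugin's code: the `IdleEvent` and `Action` shapes are hypothetical, only `pass` and `fail` verdicts are modeled, and the `maxAttempts` cutoff is inferred from the configuration description.

```typescript
// Hypothetical event and action shapes; the plugin's internal types are not shown in this README.
type IdleEvent =
  | { role: "main" }
  | { role: "ralph"; workerSpawned: boolean }
  | { role: "worker"; attempt: number; verdict: "pass" | "fail" };

type Action =
  | { kind: "spawnRalph"; attempt: number }
  | { kind: "waitForWorker" }
  | { kind: "warnNoWorkerSpawned" }
  | { kind: "done" }
  | { kind: "stopMaxAttempts" }
  | { kind: "rollAndSpawnRalph"; attempt: number };

function onIdle(event: IdleEvent, maxAttempts: number): Action {
  switch (event.role) {
    case "main":
      // Main going idle kicks off attempt 1.
      return { kind: "spawnRalph", attempt: 1 };
    case "ralph":
      // Ralph is expected to have called ralph_spawn_worker() before stopping.
      return event.workerSpawned ? { kind: "waitForWorker" } : { kind: "warnNoWorkerSpawned" };
    case "worker":
      if (event.verdict === "pass") return { kind: "done" };
      if (event.attempt >= maxAttempts) return { kind: "stopMaxAttempts" };
      // Failure: roll the state files, then start the next attempt with a fresh Ralph.
      return { kind: "rollAndSpawnRalph", attempt: event.attempt + 1 };
  }
}
```

Because the function only looks at the event and the attempt counter, neither spawned session has to carry any knowledge of the loop itself.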
|
|
122
|
+
|
|
123
|
+
### The RLM worker discipline
|
|
124
|
+
|
|
125
|
+
Each worker session is required to:
|
|
126
|
+
|
|
127
|
+
1. Call `ralph_load_context()` first — blocked from `write`/`edit`/`bash` until it does.
|
|
128
|
+
2. Read `PLAN.md` and `RLM_INSTRUCTIONS.md` as authoritative instructions.
|
|
129
|
+
3. Use `rlm_grep` + `rlm_slice` to access large reference documents — never dump them whole.
|
|
130
|
+
4. Write scratch work to `CURRENT_STATE.md` throughout the attempt.
|
|
131
|
+
5. Promote durable changes (completed milestones, new constraints) to `PLAN.md`.
|
|
132
|
+
6. Append insights to `NOTES_AND_LEARNINGS.md`.
|
|
133
|
+
7. Call `ralph_verify()` when ready, then stop.
|
|
134
|
+
|
|
135
|
+
The one-pass contract is enforced socially (system prompt) and mechanically (context gate on destructive tools). Workers do not re-prompt themselves. Ralph controls iteration.
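The mechanical half of that contract, the context gate, might look roughly like the following. This is a sketch under stated assumptions: the `ContextGate` class, the per-session ID tracking, and the exact set of gated tool names are hypothetical (the README lists `write`, `edit`, and `bash` but implies there are others).

```typescript
// Assumed subset of gated tools; the plugin's real list may be longer.
const GATED_TOOLS = new Set<string>(["write", "edit", "bash"]);

class ContextGate {
  // Session IDs that have already called ralph_load_context().
  private loaded = new Set<string>();

  markLoaded(sessionId: string): void {
    this.loaded.add(sessionId);
  }

  // Throws if a gated tool is used before context was loaded in this session.
  check(sessionId: string, tool: string): void {
    if (GATED_TOOLS.has(tool) && !this.loaded.has(sessionId)) {
      throw new Error(`Call ralph_load_context() before using "${tool}".`);
    }
  }
}
```

Read-only tools pass through untouched, so a worker can still explore before it loads context; it just cannot change anything.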
|
|
136
|
+
|
|
137
|
+
### Sub-agents
|
|
138
|
+
|
|
139
|
+
For tasks that can be decomposed, a worker can `subagent_spawn` a child session with an isolated goal. Each sub-agent gets its own state directory under `.opencode/agents/<name>/` and the same protocol file structure. The worker polls with `subagent_await` and integrates the result.
|
|
140
|
+
|
|
141
|
+
Sub-agents follow the same discipline as workers: one pass, file-first, fresh context.
|
|
142
|
+
|
|
143
|
+
### Supervisor communication
|
|
144
|
+
|
|
145
|
+
Spawned sessions (Ralph and workers) can communicate back to the main conversation at runtime:
|
|
146
|
+
|
|
147
|
+
- `ralph_report()` — fire-and-forget progress updates, appended to `SUPERVISOR_LOG.md` and posted to the main conversation
|
|
148
|
+
- `ralph_ask()` — blocks the session until you respond via `ralph_respond()`, enabling interactive decision points mid-loop (e.g., "should I rewrite auth.ts or patch it?")
|
|
149
|
+
|
|
150
|
+
This is implemented via file-based IPC (`.opencode/pending_input.json`) so responses survive across any session boundary.
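To make the file-based handshake concrete, here is a small sketch that treats the file's contents as a JSON string and models ask, respond, and poll as pure functions. The schema (`questions`, `id`, `answer`) and the function names are assumptions for illustration; the actual layout of `pending_input.json` is not documented here.

```typescript
// Hypothetical schema for the pending-input file.
interface PendingQuestion {
  id: string;
  question: string;
  answer?: string;
}

interface PendingInput {
  questions: PendingQuestion[];
}

function parsePending(fileJson: string): PendingInput {
  return JSON.parse(fileJson) as PendingInput;
}

// The asking session records its question, then polls for an answer.
function askQuestion(fileJson: string, id: string, question: string): string {
  const state = parsePending(fileJson);
  state.questions.push({ id, question });
  return JSON.stringify(state);
}

// The main session answers by ID; an unknown ID lists what is still pending.
function respondTo(fileJson: string, id: string, answer: string): string {
  const state = parsePending(fileJson);
  const pending = state.questions.filter((q) => q.answer === undefined);
  const target = pending.find((q) => q.id === id);
  if (!target) {
    const ids = pending.map((q) => q.id).join(", ");
    throw new Error(`No pending question "${id}". Pending: ${ids || "none"}`);
  }
  target.answer = answer;
  return JSON.stringify(state);
}

function pollAnswer(fileJson: string, id: string): string | undefined {
  return parsePending(fileJson).questions.find((q) => q.id === id)?.answer;
}
```

Since every step is a read or write of one file, the asking session and the answering session never need to share memory or even be alive at the same time.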
|
|
151
|
+
|
|
152
|
+
|
|
153
|
+
## Install
|
|
154
|
+
|
|
155
|
+
### Project-level (recommended)
|
|
156
|
+
|
|
157
|
+
```
|
|
158
|
+
your-repo/
|
|
159
|
+
└── .opencode/
|
|
160
|
+
├── package.json ← add "effect" dependency
|
|
161
|
+
├── ralph.json ← verify command + tuning
|
|
162
|
+
└── plugins/
|
|
163
|
+
└── ralph-rlm.ts ← the plugin
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
Copy `ralph-rlm.ts` into `.opencode/plugins/` and create `.opencode/package.json`:
|
|
167
|
+
|
|
168
|
+
```json
|
|
169
|
+
{
|
|
170
|
+
"dependencies": {
|
|
171
|
+
"effect": "^3.13.0"
|
|
172
|
+
}
|
|
173
|
+
}
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
OpenCode runs `bun install` at startup automatically.
|
|
177
|
+
|
|
178
|
+
### Global
|
|
179
|
+
|
|
180
|
+
Copy the plugin to `~/.config/opencode/plugins/ralph-rlm.ts` and add the same `effect` dependency to `~/.config/opencode/package.json`.
|
|
181
|
+
|
|
182
|
+
|
|
183
|
+
## Configuration
|
|
184
|
+
|
|
185
|
+
Create `.opencode/ralph.json`. All fields are optional — the plugin runs with safe defaults if the file is absent.
|
|
186
|
+
|
|
187
|
+
```json
|
|
188
|
+
{
|
|
189
|
+
"enabled": true,
|
|
190
|
+
"maxAttempts": 25,
|
|
191
|
+
"verify": {
|
|
192
|
+
"command": ["bun", "test"],
|
|
193
|
+
"cwd": "."
|
|
194
|
+
},
|
|
195
|
+
"gateDestructiveToolsUntilContextLoaded": true,
|
|
196
|
+
"maxRlmSliceLines": 200,
|
|
197
|
+
"requireGrepBeforeLargeSlice": true,
|
|
198
|
+
"grepRequiredThresholdLines": 120,
|
|
199
|
+
"subAgentEnabled": true,
|
|
200
|
+
"maxSubAgents": 5,
|
|
201
|
+
"agentMdPath": "AGENT.md"
|
|
202
|
+
}
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
| Field | Default | Description |
|
|
206
|
+
|---|---|---|
|
|
207
|
+
| `enabled` | `true` | Set to `false` to disable the outer loop without removing the plugin. |
|
|
208
|
+
| `maxAttempts` | `20` | Hard stop after this many failed verify attempts. |
|
|
209
|
+
| `verify.command` | — | Shell command to run as an array, e.g. `["bun", "test"]`. If omitted, verify always returns `unknown`. |
|
|
210
|
+
| `verify.cwd` | `"."` | Working directory for the verify command, relative to the repo root. |
|
|
211
|
+
| `gateDestructiveToolsUntilContextLoaded` | `true` | Block `write`, `edit`, `bash`, etc. until `ralph_load_context()` has been called in the current attempt. |
|
|
212
|
+
| `maxRlmSliceLines` | `200` | Maximum lines a single `rlm_slice` call may return. |
|
|
213
|
+
| `requireGrepBeforeLargeSlice` | `true` | Require a recent `rlm_grep` call before slices larger than `grepRequiredThresholdLines`. |
|
|
214
|
+
| `grepRequiredThresholdLines` | `120` | Line threshold above which grep-first is required. |
|
|
215
|
+
| `subAgentEnabled` | `true` | Allow `subagent_spawn`. |
|
|
216
|
+
| `maxSubAgents` | `5` | Maximum concurrently running sub-agents per session. |
|
|
217
|
+
| `agentMdPath` | `"AGENT.md"` | Path (relative to repo root) to the project AGENT.md. Read by `ralph_load_context()` and included in the context payload. Set to `""` to disable. |
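The "all fields are optional" behavior amounts to merging the user's file over a set of defaults. The sketch below takes its default values from the table above; the `RalphConfig` interface and `resolveConfig` function are hypothetical names, and the plugin's real loader may validate or merge differently.

```typescript
// Field names and defaults follow the table above; the type itself is illustrative.
interface RalphConfig {
  enabled: boolean;
  maxAttempts: number;
  verify?: { command: string[]; cwd?: string };
  gateDestructiveToolsUntilContextLoaded: boolean;
  maxRlmSliceLines: number;
  requireGrepBeforeLargeSlice: boolean;
  grepRequiredThresholdLines: number;
  subAgentEnabled: boolean;
  maxSubAgents: number;
  agentMdPath: string;
}

const DEFAULTS: RalphConfig = {
  enabled: true,
  maxAttempts: 20,
  gateDestructiveToolsUntilContextLoaded: true,
  maxRlmSliceLines: 200,
  requireGrepBeforeLargeSlice: true,
  grepRequiredThresholdLines: 120,
  subAgentEnabled: true,
  maxSubAgents: 5,
  agentMdPath: "AGENT.md",
};

// rawJson is the text of ralph.json, or undefined when the file is absent.
function resolveConfig(rawJson?: string): RalphConfig {
  const user: Partial<RalphConfig> = rawJson ? JSON.parse(rawJson) : {};
  const merged: RalphConfig = { ...DEFAULTS, ...user };
  if (merged.verify) {
    // verify.cwd defaults to the repo root when only a command is given.
    merged.verify = { cwd: ".", ...merged.verify };
  }
  return merged;
}
```

Note that the example file above sets `maxAttempts` to 25, overriding the default of 20.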
|
|
218
|
+
|
|
219
|
+
### verify command examples
|
|
220
|
+
|
|
221
|
+
```json
|
|
222
|
+
{ "command": ["bun", "test"] }
|
|
223
|
+
{ "command": ["npm", "test"] }
|
|
224
|
+
{ "command": ["cargo", "test"] }
|
|
225
|
+
{ "command": ["python", "-m", "pytest"] }
|
|
226
|
+
{ "command": ["make", "ci"] }
|
|
227
|
+
{ "command": ["./scripts/verify.sh"] }
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
The verify command is the loop's exit condition. It should be as comprehensive as you want the output to be. A verify that runs tests + typecheck + lint will produce code that passes all three; a verify that only checks syntax will produce syntactically valid code that may be logically broken.
|
|
231
|
+
|
|
232
|
+
|
|
233
|
+
## Protocol files
|
|
234
|
+
|
|
235
|
+
The plugin bootstraps these files on first run if they do not exist. They are the persistent memory of the loop — commit them to version control.
|
|
236
|
+
|
|
237
|
+
| File | Purpose |
|
|
238
|
+
|---|---|
|
|
239
|
+
| `PLAN.md` | Goals, milestones, definition of done, changelog. Updated via `ralph_update_plan()`. |
|
|
240
|
+
| `RLM_INSTRUCTIONS.md` | Inner loop operating manual and playbooks. Updated via `ralph_update_rlm_instructions()`. |
|
|
241
|
+
| `CURRENT_STATE.md` | Scratch pad for the current Ralph attempt. Reset on each rollover. |
|
|
242
|
+
| `PREVIOUS_STATE.md` | Snapshot of the last attempt's scratch. Automatically written on rollover. |
|
|
243
|
+
| `AGENT_CONTEXT_FOR_NEXT_RALPH.md` | Shim passed to the next attempt: verdict, summary, next step. |
|
|
244
|
+
| `CONTEXT_FOR_RLM.md` | Large reference document (API docs, specs, etc.). Always accessed via `rlm_grep` + `rlm_slice`. |
|
|
245
|
+
| `NOTES_AND_LEARNINGS.md` | Append-only log of durable insights. Survives all context resets. |
|
|
246
|
+
| `TODOS.md` | Optional lightweight task list. |
|
|
247
|
+
| `SUPERVISOR_LOG.md` | Append-only feed of all `ralph_report()` entries across all attempts and sessions. |
|
|
248
|
+
|
|
249
|
+
Sub-agent state lives under `.opencode/agents/<name>/` with the same structure.
|
|
250
|
+
|
|
251
|
+
### How files flow between attempts
|
|
252
|
+
|
|
253
|
+
```
|
|
254
|
+
Attempt N worker writes:
|
|
255
|
+
CURRENT_STATE.md ← scratch: what I tried, what I found
|
|
256
|
+
NOTES_AND_LEARNINGS.md ← append: durable insight from this attempt
|
|
257
|
+
|
|
258
|
+
On N→N+1 rollover, plugin writes:
|
|
259
|
+
PREVIOUS_STATE.md ← copy of CURRENT_STATE.md
|
|
260
|
+
CURRENT_STATE.md ← reset to blank template
|
|
261
|
+
AGENT_CONTEXT_FOR_NEXT_RALPH.md ← verdict + summary + next step
|
|
262
|
+
|
|
263
|
+
Ralph(N+1) reads and optionally updates:
|
|
264
|
+
AGENT_CONTEXT_FOR_NEXT_RALPH.md ← why it failed
|
|
265
|
+
PLAN.md ← adjusts strategy
|
|
266
|
+
RLM_INSTRUCTIONS.md ← adjusts worker guidance
|
|
267
|
+
|
|
268
|
+
Worker(N+1) reads:
|
|
269
|
+
All of the above via ralph_load_context()
|
|
270
|
+
```
|
|
271
|
+
|
|
272
|
+
This is why the loop can run overnight. Each fresh session starts with the accumulated knowledge of all prior attempts, encoded in files — not in a context window that would be reset.
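The rollover step in that flow can be modeled as a function from one set of file contents to the next. This is an illustrative sketch only: the in-memory `ProtocolFiles` map, the `BLANK_STATE` template text, and the exact shim format are assumptions, not the plugin's actual output.

```typescript
// File name -> contents. The real plugin works on files in the repo.
type ProtocolFiles = Record<string, string>;

// Hypothetical blank template for CURRENT_STATE.md.
const BLANK_STATE = "# Current State\n";

function rollover(
  files: ProtocolFiles,
  verdict: string,
  summary: string,
  nextStep: string,
  learning?: string,
): ProtocolFiles {
  const next: ProtocolFiles = { ...files };
  // Snapshot the scratch pad, then reset it for the next attempt.
  next["PREVIOUS_STATE.md"] = files["CURRENT_STATE.md"] ?? "";
  next["CURRENT_STATE.md"] = BLANK_STATE;
  // The shim tells the next Ralph session why the last attempt ended.
  next["AGENT_CONTEXT_FOR_NEXT_RALPH.md"] =
    `Verdict: ${verdict}\nSummary: ${summary}\nNext step: ${nextStep}\n`;
  if (learning) {
    // Learnings are appended, never overwritten.
    next["NOTES_AND_LEARNINGS.md"] = (files["NOTES_AND_LEARNINGS.md"] ?? "") + `- ${learning}\n`;
  }
  return next;
}
```

`PLAN.md` and `RLM_INSTRUCTIONS.md` pass through unchanged in this sketch; in the flow above they are only modified by Ralph or the worker, not by the rollover itself.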
|
|
273
|
+
|
|
274
|
+
|
|
275
|
+
## Working with AGENT.md
|
|
276
|
+
|
|
277
|
+
OpenCode loads `AGENT.md` from the repo root into every session's system prompt automatically. The plugin coexists with this but the two files serve different roles:
|
|
278
|
+
|
|
279
|
+
| | `AGENT.md` | `RLM_INSTRUCTIONS.md` |
|
|
280
|
+
|---|---|---|
|
|
281
|
+
| **Scope** | Static project-wide rules | Dynamic per-loop operating manual |
|
|
282
|
+
| **Who writes it** | You (developer) | Agent (via `ralph_update_rlm_instructions()`) |
|
|
283
|
+
| **Changes** | Rarely — git-committed conventions | Every loop — playbooks, learnings, constraints |
|
|
284
|
+
| **Injected by** | OpenCode automatically (system prompt) | `ralph_load_context()` return payload |
|
|
285
|
+
|
|
286
|
+
### What the plugin does with AGENT.md
|
|
287
|
+
|
|
288
|
+
`ralph_load_context()` automatically reads `AGENT.md` (configurable via `agentMdPath`) and includes it in the context payload under `agent_md`. This means:
|
|
289
|
+
|
|
290
|
+
- Sub-agents, which run in isolated sessions that may not have AGENT.md injected, still see the project rules.
|
|
291
|
+
- Every attempt starts with both the static project context and the dynamic loop state in one payload.
|
|
292
|
+
|
|
293
|
+
To disable AGENT.md inclusion:
|
|
294
|
+
|
|
295
|
+
```json
|
|
296
|
+
{ "agentMdPath": "" }
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
To point to a non-standard location:
|
|
300
|
+
|
|
301
|
+
```json
|
|
302
|
+
{ "agentMdPath": "docs/AGENT.md" }
|
|
303
|
+
```
|
|
304
|
+
|
|
305
|
+
### Recommended AGENT.md structure
|
|
306
|
+
|
|
307
|
+
Keep AGENT.md focused on facts that never change loop-to-loop: repo layout, build commands, code style. Defer loop-specific guidance to `RLM_INSTRUCTIONS.md`.
|
|
308
|
+
|
|
309
|
+
```markdown
|
|
310
|
+
# Project Agent Rules
|
|
311
|
+
|
|
312
|
+
## Repo layout
|
|
313
|
+
- `src/` — application source
|
|
314
|
+
- `tests/` — test suite (`bun test`)
|
|
315
|
+
- `docs/` — documentation
|
|
316
|
+
|
|
317
|
+
## Build and verify
|
|
318
|
+
- Install: `bun install`
|
|
319
|
+
- Test: `bun test`
|
|
320
|
+
- Typecheck: `bun run typecheck`
|
|
321
|
+
|
|
322
|
+
## Code style
|
|
323
|
+
- TypeScript strict mode; no `any`
|
|
324
|
+
- Prefer Effect-TS over raw Promises for async/error handling
|
|
325
|
+
|
|
326
|
+
## Loop guidance
|
|
327
|
+
This project uses the ralph-rlm plugin.
|
|
328
|
+
- Call `ralph_load_context()` at the start of every attempt.
|
|
329
|
+
- Task-specific playbooks live in `RLM_INSTRUCTIONS.md` — check there for the current strategy before starting work.
|
|
330
|
+
- Do NOT put attempt-specific state in AGENT.md; write it to `CURRENT_STATE.md` or `NOTES_AND_LEARNINGS.md`.
|
|
331
|
+
```
|
|
332
|
+
|
|
333
|
+
### Avoiding conflicts
|
|
334
|
+
|
|
335
|
+
If your AGENT.md contains instructions that clash with the plugin's file-first rules (e.g. "always read files in full"), add a note that defers to `RLM_INSTRUCTIONS.md`:
|
|
336
|
+
|
|
337
|
+
```markdown
|
|
338
|
+
## Note on file access
|
|
339
|
+
When working with the ralph-rlm loop, prefer `rlm_grep` + `rlm_slice` for large files
|
|
340
|
+
over full reads. The loop-specific protocol in `RLM_INSTRUCTIONS.md` takes precedence
|
|
341
|
+
over general file-access guidance in this document.
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
### Extending the system prompt instead
|
|
345
|
+
|
|
346
|
+
If you want your AGENT.md content appended to the plugin's system prompt fragment (instead of included in the context payload), use `RALPH_SYSTEM_PROMPT_APPEND`:
|
|
347
|
+
|
|
348
|
+
```bash
|
|
349
|
+
export RALPH_SYSTEM_PROMPT_APPEND="@AGENT.md"
|
|
350
|
+
```
|
|
351
|
+
|
|
352
|
+
This injects the file on every turn rather than only when `ralph_load_context()` is called.
|
|
353
|
+
|
|
354
|
+
|
|
355
|
+
## Tools
|
|
356
|
+
|
|
357
|
+
### Context loading
|
|
358
|
+
|
|
359
|
+
#### `ralph_load_context()`
|
|
360
|
+
|
|
361
|
+
Reads all protocol files and returns them as a structured JSON payload. Must be called at the start of every attempt. Calling it marks the session as context-loaded, which unblocks destructive tools.
|
|
362
|
+
|
|
363
|
+
```
|
|
364
|
+
args:
|
|
365
|
+
includeRlmContextHeadings boolean optional Return headings-only from CONTEXT_FOR_RLM.md (default true)
|
|
366
|
+
rlmHeadingsMax number optional Max headings to return (default 80)
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
### Reading large files
|
|
370
|
+
|
|
371
|
+
#### `rlm_grep(query, file?, maxMatches?, contextLines?)`
|
|
372
|
+
|
|
373
|
+
Search a file by regex and return matching lines with line numbers. Defaults to `CONTEXT_FOR_RLM.md`. Use this to locate the relevant section before slicing.
|
|
374
|
+
|
|
375
|
+
#### `rlm_slice(startLine, endLine, file?)`
|
|
376
|
+
|
|
377
|
+
Read a specific line range from a file. Enforces the `maxRlmSliceLines` limit. Requires a recent `rlm_grep` call if the slice exceeds `grepRequiredThresholdLines`.
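The two limits interact as sketched below. This is a hedged model, not the plugin's code: `checkSlice` and `SliceLimits` are hypothetical names, "recent" grep is reduced to a boolean because the README does not define the window, and the sketch rejects oversized slices although the plugin might truncate them instead.

```typescript
interface SliceLimits {
  maxRlmSliceLines: number;
  requireGrepBeforeLargeSlice: boolean;
  grepRequiredThresholdLines: number;
}

// Returns an error message, or null when the slice is allowed.
function checkSlice(
  startLine: number,
  endLine: number,
  grepCalledRecently: boolean,
  limits: SliceLimits,
): string | null {
  const lineCount = endLine - startLine + 1; // inclusive range
  if (lineCount < 1) return "endLine must not be before startLine";
  if (lineCount > limits.maxRlmSliceLines) {
    return `slice of ${lineCount} lines exceeds maxRlmSliceLines (${limits.maxRlmSliceLines})`;
  }
  const isLarge = lineCount > limits.grepRequiredThresholdLines;
  if (limits.requireGrepBeforeLargeSlice && isLarge && !grepCalledRecently) {
    return `slice of ${lineCount} lines requires a recent rlm_grep call`;
  }
  return null;
}
```

With the defaults (200 and 120), a slice of up to 120 lines is always allowed, 121 to 200 lines needs a prior grep, and anything larger is refused.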
|
|
378
|
+
|
|
379
|
+
### Plan and instructions
|
|
380
|
+
|
|
381
|
+
#### `ralph_update_plan(patch, reason)`
|
|
382
|
+
|
|
383
|
+
Apply a unified diff patch to `PLAN.md`. Automatically appends a changelog entry. Use for durable changes only: completed milestones, new constraints, clarified acceptance criteria.
|
|
384
|
+
|
|
385
|
+
#### `ralph_update_rlm_instructions(patch, reason)`
|
|
386
|
+
|
|
387
|
+
Apply a unified diff patch to `RLM_INSTRUCTIONS.md`. Appends a changelog entry. The Fixed Header section should not be modified.
|
|
388
|
+
|
|
389
|
+
### Loop management
|
|
390
|
+
|
|
391
|
+
#### `ralph_rollover(verdict, summary, nextStep, learning?)`
|
|
392
|
+
|
|
393
|
+
Manually trigger a rollover: copies `CURRENT_STATE.md` to `PREVIOUS_STATE.md`, resets scratch, writes the next-attempt shim. Optionally appends a learning to `NOTES_AND_LEARNINGS.md`. The outer loop calls this automatically on verify failure; the agent can also call it explicitly.
|
|
394
|
+
|
|
395
|
+
#### `ralph_verify()`
|
|
396
|
+
|
|
397
|
+
Run the configured verify command. Returns `{ verdict: "pass"|"fail"|"unknown", output, error }`.
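A plausible mapping from the verify command to the three verdicts is sketched here. It assumes the verdict is derived from the process exit code (zero is `pass`, non-zero is `fail`) and that a missing command yields `unknown`, as the configuration table states; the `Runner` abstraction and the field mapping for `output` and `error` are hypothetical.

```typescript
type Verdict = "pass" | "fail" | "unknown";

interface VerifyResult {
  verdict: Verdict;
  output: string;
  error: string;
}

interface ProcessResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Injected so the sketch stays runnable without spawning real processes.
type Runner = (command: string[]) => ProcessResult;

function verify(command: string[] | undefined, run: Runner): VerifyResult {
  if (!command || command.length === 0) {
    // No verify.command configured: the loop cannot judge success.
    return { verdict: "unknown", output: "", error: "" };
  }
  const result = run(command);
  return {
    verdict: result.exitCode === 0 ? "pass" : "fail",
    output: result.stdout,
    error: result.stderr,
  };
}
```

The exit question stays boolean: whatever the model claims, only the command's result decides the verdict.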
|
|
398
|
+
|
|
399
|
+
#### `ralph_spawn_worker()`
|
|
400
|
+
|
|
401
|
+
**Ralph strategist sessions only.** Spawn a fresh RLM worker session for this attempt. Call this after reviewing protocol files and optionally updating `PLAN.md` / `RLM_INSTRUCTIONS.md`. Then stop — the plugin handles verification and spawns the next Ralph session if needed.
|
|
402
|
+
|
|
403
|
+
### Sub-agents
|
|
404
|
+
|
|
405
|
+
#### `subagent_spawn(name, goal, context?)`
|
|
406
|
+
|
|
407
|
+
Spawn a child OpenCode session to handle an isolated sub-task. Creates `.opencode/agents/<name>/` state files, then sends the initial prompt to the child session.
|
|
408
|
+
|
|
409
|
+
#### `subagent_await(name, maxLines?)`
|
|
410
|
+
|
|
411
|
+
Poll a sub-agent's `CURRENT_STATE.md` for completion. Returns `{ status: "done"|"running"|"not_found", current_state }`. The sub-agent signals completion by writing `## Final Result` or outputting `SUB_AGENT_DONE`.
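The completion check can be sketched as a small classifier. The function name, its parameters, and the choice to match the heading as a whole trimmed line are assumptions; only the three status values and the two default markers come from the description above.

```typescript
type SubAgentStatus = "done" | "running" | "not_found";

// currentState is undefined when the sub-agent's state file does not exist.
function subAgentStatus(
  currentState: string | undefined,
  lastOutput: string,
  doneHeading = "## Final Result",
  doneSentinel = "SUB_AGENT_DONE",
): SubAgentStatus {
  if (currentState === undefined) return "not_found";
  // Either marker is enough: a heading in the state file or a sentinel in the output.
  const hasHeading = currentState.split("\n").some((line) => line.trim() === doneHeading);
  const hasSentinel = lastOutput.indexOf(doneSentinel) !== -1;
  return hasHeading || hasSentinel ? "done" : "running";
}
```

A parent would call something like this repeatedly until the status leaves `running`.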
|
|
412
|
+
|
|
413
|
+
#### `subagent_peek(name, file?, maxLines?)`
|
|
414
|
+
|
|
415
|
+
Read any protocol file from a sub-agent's state directory without waiting for completion. Useful for monitoring progress mid-run.
|
|
416
|
+
|
|
417
|
+
#### `subagent_list()`
|
|
418
|
+
|
|
419
|
+
List all sub-agents registered in the current session with their name, goal, status, and spawn time.
|
|
420
|
+
|
|
421
|
+
### Supervisor communication
|
|
422
|
+
|
|
423
|
+
These tools let spawned sessions (Ralph strategist, RLM worker) communicate back to the main conversation at runtime. State is carried in `.opencode/pending_input.json` for question/response pairs and `SUPERVISOR_LOG.md` for the progress feed.
|
|
424
|
+
|
|
425
|
+
#### `ralph_report(message, level?, post_to_conversation?)`
|
|
426
|
+
|
|
427
|
+
Fire-and-forget progress report. Appends a timestamped entry to `SUPERVISOR_LOG.md`, shows a toast, and optionally posts into the main conversation so you can see what's happening without opening a separate session.
|
|
428
|
+
|
|
429
|
+
```
|
|
430
|
+
args:
|
|
431
|
+
message string required Progress message
|
|
432
|
+
level string optional "info" | "warning" | "error" (default: "info")
|
|
433
|
+
post_to_conversation boolean optional Post to main conversation (default: true)
|
|
434
|
+
```
|
|
435
|
+
|
|
436
|
+
#### `ralph_ask(question, context?, timeout_minutes?)`
|
|
437
|
+
|
|
438
|
+
Ask a question and **block** until you respond via `ralph_respond()`. The question is written to `.opencode/pending_input.json`, a toast appears in the main session, and the main conversation is prompted with the question ID and response instruction. The calling session polls every 5 seconds.
|
|
439
|
+
|
|
440
|
+
Use this for decisions that can't be inferred from the protocol files — e.g., "should I rewrite `auth.ts` from scratch or patch the existing implementation?"
|
|
441
|
+
|
|
442
|
+
```
|
|
443
|
+
args:
|
|
444
|
+
question string required The question
|
|
445
|
+
context string optional Additional context for the decision
|
|
446
|
+
timeout_minutes number optional Minutes to wait before timing out (default: 15)
|
|
447
|
+
```
|
|
448
|
+
|
|
449
|
+
Returns `{ id, answer }` as JSON once you respond.
|
|
450
|
+
|
|
451
|
+
#### `ralph_respond(id, answer)`
|
|
452
|
+
|
|
453
|
+
Respond to a pending question, unblocking the session that called `ralph_ask()`. The `id` is shown in the toast and in the main conversation prompt (format: `ask-NNNN`). If you mistype the ID, the tool returns an error listing all pending unanswered questions with their IDs.
|
|
454
|
+
|
|
455
|
+
```
|
|
456
|
+
args:
|
|
457
|
+
id string required Question ID (e.g. "ask-1234567890")
|
|
458
|
+
answer string required Your answer
|
|
459
|
+
```
|
|
460
|
+
|
|
461
|
+
|
|
462
|
+
## Customising prompts via environment variables
|
|
463
|
+
|
|
464
|
+
Every internal prompt the plugin sends to the model is customisable through environment variables. Values are loaded once at startup.
|
|
465
|
+
|
|
466
|
+
### Formats
|
|
467
|
+
|
|
468
|
+
```bash
|
|
469
|
+
# Literal text — use \n for newlines
|
|
470
|
+
RALPH_CONTINUE_PROMPT="Attempt {{attempt}}: fix the verify.\n\nCall ralph_verify() when done."
|
|
471
|
+
|
|
472
|
+
# File reference (relative to worktree)
|
|
473
|
+
RALPH_SYSTEM_PROMPT="@.opencode/prompts/system.txt"
|
|
474
|
+
|
|
475
|
+
# Absolute file path
|
|
476
|
+
RALPH_BOOTSTRAP_RLM_INSTRUCTIONS="@/home/user/prompts/rlm-instructions.md"
|
|
477
|
+
```
|
|
478
|
+
|
|
479
|
+
### Reference
|
|
480
|
+
|
|
481
|
+
| Variable | Tokens | Description |
|
|
482
|
+
|---|---|---|
|
|
483
|
+
| `RALPH_SYSTEM_PROMPT` | — | Full system prompt injected on every turn. Replaces the default. |
|
|
484
|
+
| `RALPH_SYSTEM_PROMPT_APPEND` | — | Appended after the system prompt. Useful for adding project-specific rules without replacing the base. |
|
|
485
|
+
| `RALPH_COMPACTION_CONTEXT` | — | Context block injected when the session is compacted (context window compressed). |
|
|
486
|
+
| `RALPH_CONTINUE_PROMPT` | `{{attempt}}` `{{verdict}}` | Re-prompt sent to the agent after a failed verification attempt. |
|
|
487
|
+
| `RALPH_DONE_FILE_CONTENT` | `{{timestamp}}` | Content written to `AGENT_CONTEXT_FOR_NEXT_RALPH.md` when verification passes. |
|
|
488
|
+
| `RALPH_SUBAGENT_PROMPT` | `{{name}}` `{{goal}}` `{{context}}` `{{stateDir}}` `{{doneSentinel}}` `{{doneHeading}}` | Initial prompt sent to a spawned sub-agent. |
|
|
489
|
+
| `RALPH_SUBAGENT_DONE_SENTINEL` | — | Phrase the sub-agent must output to signal completion. Default: `SUB_AGENT_DONE`. |
|
|
490
|
+
| `RALPH_SUBAGENT_DONE_HEADING` | — | Heading in `CURRENT_STATE.md` that marks sub-agent completion. Default: `## Final Result`. |
|
|
491
|
+
| `RALPH_BOOTSTRAP_RLM_INSTRUCTIONS` | `{{timestamp}}` | Initial content written to `RLM_INSTRUCTIONS.md` when it does not exist. |
|
|
492
|
+
| `RALPH_BOOTSTRAP_CURRENT_STATE` | — | Template written to `CURRENT_STATE.md` on bootstrap and after each rollover. |
|
|
493
|
+
| `RALPH_CONTEXT_GATE_ERROR` | — | Error message thrown when the agent tries a destructive tool before loading context. |
|
|
494
|
+
| `RALPH_WORKER_SYSTEM_PROMPT` | — | System prompt injected into every RLM worker session. Describes the one-pass contract. |
|
|
495
|
+
| `RALPH_WORKER_PROMPT` | `{{attempt}}` | Initial prompt sent to each spawned RLM worker session. |
|
|
496
|
+
| `RALPH_SESSION_SYSTEM_PROMPT` | — | System prompt injected into Ralph strategist sessions. |
|
|
497
|
+
| `RALPH_SESSION_PROMPT` | `{{attempt}}` | Initial prompt sent to each spawned Ralph strategist session. |
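How a value in one of these variables becomes a final prompt can be sketched as follows. This is an assumption-based illustration: `resolvePrompt` is a hypothetical name, file access is injected as a callback, and leaving unknown `{{tokens}}` untouched is a guess about the plugin's behavior.

```typescript
// readFile is injected so the sketch does not depend on a particular runtime.
function resolvePrompt(
  value: string,
  readFile: (path: string) => string,
  tokens: Record<string, string> = {},
): string {
  // "@path" loads the template from a file; otherwise literal "\n" becomes a newline.
  const template = value.charAt(0) === "@" ? readFile(value.slice(1)) : value.replace(/\\n/g, "\n");
  return template.replace(/\{\{(\w+)\}\}/g, (whole: string, name: string) => {
    const known = Object.prototype.hasOwnProperty.call(tokens, name) ? tokens[name] : undefined;
    return known ?? whole; // unknown tokens are left as written (assumption)
  });
}

const noFiles = (path: string): string => {
  throw new Error(`unexpected file read: ${path}`);
};

const literal = resolvePrompt(
  "Attempt {{attempt}}: fix the verify.\\n\\nCall ralph_verify() when done.",
  noFiles,
  { attempt: "3" },
);
```

Here `literal` starts with `Attempt 3: fix the verify.` followed by a blank line, mirroring the literal-text format shown earlier.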
|
|
498
|
+
|
|
499
|
+
### Example: custom continue prompt from a file
|
|
500
|
+
|
|
501
|
+
`.opencode/prompts/continue.txt`:
|
|
502
|
+
```
|
|
503
|
+
Ralph attempt {{attempt}} — last verify: {{verdict}}.
|
|
504
|
+
|
|
505
|
+
You are working in a TypeScript monorepo. Rules:
|
|
506
|
+
1. Call ralph_load_context() first.
|
|
507
|
+
2. Check PLAN.md for the current milestone.
|
|
508
|
+
3. Run `bun typecheck` before `bun test`.
|
|
509
|
+
4. Write all intermediate findings to CURRENT_STATE.md.
|
|
510
|
+
5. When the verify passes, stop.
|
|
511
|
+
```
|
|
512
|
+
|
|
513
|
+
`.env` or shell:
|
|
514
|
+
```bash
|
|
515
|
+
export RALPH_CONTINUE_PROMPT="@.opencode/prompts/continue.txt"
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
|
|
519
|
+
## Workflow patterns
|
|
520
|
+
|
|
521
|
+
### Basic: run until tests pass
|
|
522
|
+
|
|
523
|
+
Fill in your `verify.command`, write a goal in `PLAN.md`, and start a session. The loop runs automatically.
|
|
524
|
+
|
|
525
|
+
```
|
|
526
|
+
1. Edit PLAN.md — set your goal and definition of done.
|
|
527
|
+
2. Open OpenCode and describe the task.
|
|
528
|
+
3. Agent calls ralph_load_context(), reads PLAN.md, starts working.
|
|
529
|
+
4. Agent calls ralph_verify().
|
|
530
|
+
5. If fail → the plugin rolls state and re-prompts. Go to 3.
|
|
531
|
+
6. If pass → the plugin shows a done toast. Done.
|
|
532
|
+
```
|
|
533
|
+
|
|
534
|
+
### Overnight: walk away
|
|
535
|
+
|
|
536
|
+
Set `maxAttempts` high (25–50), write a detailed `PLAN.md` with a precise definition of done, and close your laptop. The loop will:
|
|
537
|
+
|
|
538
|
+
1. Make an attempt.
|
|
539
|
+
2. Run verify.
|
|
540
|
+
3. On failure: roll state, spawn Ralph to diagnose and adjust, spawn the next worker.
|
|
541
|
+
4. Repeat until it passes or hits `maxAttempts`.
|
|
542
|
+
|
|
543
|
+
In the morning, check `SUPERVISOR_LOG.md` for the progress feed, `NOTES_AND_LEARNINGS.md` for what the loop learned, and `AGENT_CONTEXT_FOR_NEXT_RALPH.md` for where it stopped.
|
|
544
|
+
|
|
545
|
+
### Supervisory check-in
|
|
546
|
+
|
|
547
|
+
Use `ralph_report` and `ralph_ask` to stay informed and make decisions without micromanaging:
|
|
548
|
+
|
|
549
|
+
```
|
|
550
|
+
Worker:
|
|
551
|
+
ralph_report("Finished refactoring auth module. 3 tests failing — all in legacy JWT path.")
|
|
552
|
+
ralph_ask("The legacy JWT path is only used by the mobile app. Rewrite or remove?")
|
|
553
|
+
← blocks until you call ralph_respond("ask-...", "Remove it, mobile app is deprecated")
|
|
554
|
+
(continues with the answer)
|
|
555
|
+
```
|
|
556
|
+
|
|
557
|
+
You stay in the loop for decisions that require human judgment. Everything else runs unattended.
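
The blocking behaviour of `ralph_ask` can be pictured as a registry of pending questions, each resolved by a later response. The sketch below illustrates that pattern with promises. It is an assumption about the general shape, not the plugin's code: the `QuestionBoard` class and the numeric suffix on the `ask-` id are hypothetical.

```typescript
// Hypothetical in-memory registry of unanswered questions; illustrative only.
class QuestionBoard {
  private pending = new Map<string, (answer: string) => void>();
  private nextId = 1;

  // Registers a question and returns its id plus a promise that
  // settles once respond() is called with that id.
  ask(question: string): { id: string; answer: Promise<string> } {
    const id = `ask-${this.nextId++}`;
    const answer = new Promise<string>((resolve) => {
      this.pending.set(id, resolve);
    });
    console.log(`[${id}] ${question}`);
    return { id, answer };
  }

  // Delivers an answer. Returns false when no such question is pending.
  respond(id: string, text: string): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false;
    this.pending.delete(id);
    resolve(text);
    return true;
  }
}

const board = new QuestionBoard();
const question = board.ask("Rewrite or remove the legacy JWT path?");
question.answer.then((text) => console.log(`worker continues with: ${text}`));
board.respond(question.id, "Remove it, mobile app is deprecated");
```

Deleting the entry before resolving means a second response to the same id is rejected instead of silently overwriting the first answer.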
|
|
558
|
+
|
|
559
|
+
### Parallel decomposition with sub-agents
|
|
560
|
+
|
|
561
|
+
```
|
|
562
|
+
Parent agent:
|
|
563
|
+
1. ralph_load_context()
|
|
564
|
+
2. Identify two independent sub-tasks
|
|
565
|
+
3. subagent_spawn("auth", "implement JWT auth", context)
|
|
566
|
+
4. subagent_spawn("api", "implement REST endpoints", context)
|
|
567
|
+
5. subagent_await("auth") — poll until done
|
|
568
|
+
6. subagent_await("api") — poll until done
|
|
569
|
+
7. Integrate results, update PLAN.md
|
|
570
|
+
8. ralph_verify()
|
|
571
|
+
```
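
The key property of this pattern is that both sub-agents are started before either is awaited, so the work overlaps. The sketch below shows that ordering with promises. `spawnAgent` and `awaitAgent` are hypothetical in-memory stand-ins, not the real `subagent_spawn` and `subagent_await` tools, which poll rather than resolve a promise.

```typescript
// Hypothetical in-memory stand-ins for the sub-agent tools; illustrative only.
const runningAgents = new Map<string, Promise<string>>();

// Starts the task immediately and records it under a name, without blocking.
function spawnAgent(label: string, task: () => Promise<string>): void {
  runningAgents.set(label, task());
}

// Returns the recorded task, or a rejected promise for an unknown name.
function awaitAgent(label: string): Promise<string> {
  const job = runningAgents.get(label);
  if (!job) return Promise.reject(new Error(`unknown sub-agent: ${label}`));
  return job;
}

async function runParentAgent(): Promise<string[]> {
  spawnAgent("auth", async () => "JWT auth implemented");
  spawnAgent("api", async () => "REST endpoints implemented");
  // Both tasks are already in flight; awaiting them in order does not serialize them.
  const auth = await awaitAgent("auth");
  const api = await awaitAgent("api");
  return [auth, api];
}

runParentAgent().then((results) => console.log(results.join("; ")));
```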
|
|
572
|
+
|
|
573
|
+
### Tuning the inner loop
|
|
574
|
+
|
|
575
|
+
Edit `RLM_INSTRUCTIONS.md` to add project-specific playbooks, register MCP tools, or adjust the debug workflow. Changes persist across attempts. Use `ralph_update_rlm_instructions()` from within a session, or edit the file directly.
|
|
576
|
+
|
|
577
|
+
The instructions file is the primary lever for improving loop performance. If the loop keeps making the same mistake, add a rule. If it keeps following an inefficient path, add a playbook. The Ralph strategist is responsible for updating these instructions between attempts based on what it observes in the failure record.
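
Because the same failure can recur across many attempts, rule additions are easier to manage when they are idempotent. The sketch below shows one way to append a rule to instruction text only if it is not already there. `addRule` is a hypothetical helper that works on a plain string; it does not reflect the signature or behaviour of `ralph_update_rlm_instructions()`.

```typescript
// Hypothetical helper: appends "- <rule>" unless an identical bullet already exists.
function addRule(instructions: string, rule: string): string {
  const bullet = `- ${rule.trim()}`;
  const alreadyPresent = instructions
    .split("\n")
    .some((line) => line.trim() === bullet);
  if (alreadyPresent) return instructions;
  const base =
    instructions === "" || instructions.endsWith("\n")
      ? instructions
      : instructions + "\n";
  return base + bullet + "\n";
}

const before = "# Rules\n- Run `bun typecheck` before `bun test`.\n";
const after = addRule(before, "Write intermediate findings to CURRENT_STATE.md.");
console.log(after);
// Adding the same rule again leaves the text unchanged.
console.log(addRule(after, "Write intermediate findings to CURRENT_STATE.md.") === after);
```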
|
|
578
|
+
|
|
579
|
+
|
|
580
|
+
## Hooks installed
|
|
581
|
+
|
|
582
|
+
| Hook | What it does |
|
|
583
|
+
|---|---|
|
|
584
|
+
| `event: session.idle` | Routes idle events: **worker** → `handleWorkerIdle` (verify + continue loop); **ralph** → `handleRalphSessionIdle` (warn if no worker spawned); **main/other** → `handleMainIdle` (kick off attempt 1). |
|
|
585
|
+
| `event: session.created` | Pre-allocates session state for known worker/ralph sessions. |
|
|
586
|
+
| `experimental.chat.system.transform` | Three-way routing: **worker** → RLM file-first prompt; **ralph** → Ralph strategist prompt; **main/other** → supervisor prompt. |
|
|
587
|
+
| `experimental.session.compacting` | Injects protocol file pointers into compaction context so state survives context compression. |
|
|
588
|
+
| `tool.execute.before` | Blocks destructive tools (`write`, `edit`, `bash`, `delete`, `move`, `rename`) in **worker sessions** until `ralph_load_context()` has been called. Ralph strategist sessions are not gated. |
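
The gating rule in the last row can be expressed as a small predicate. This is a sketch of the rule as described in the table, not the plugin's hook code: the `SessionState` shape and the `isBlocked` name are hypothetical, and treating main sessions as ungated is an inference from the table's wording.

```typescript
type Role = "worker" | "ralph" | "main";

// Hypothetical per-session state; the plugin's real state shape is not shown here.
interface SessionState {
  role: Role;
  contextLoaded: boolean; // true once ralph_load_context() has been called
}

// Tool names taken from the table above.
const DESTRUCTIVE_TOOLS = new Set([
  "write",
  "edit",
  "bash",
  "delete",
  "move",
  "rename",
]);

// A tool call is blocked only for a worker that has not loaded context yet.
function isBlocked(tool: string, session: SessionState): boolean {
  if (session.role !== "worker") return false;
  if (session.contextLoaded) return false;
  return DESTRUCTIVE_TOOLS.has(tool);
}

const freshWorker: SessionState = { role: "worker", contextLoaded: false };
console.log(isBlocked("edit", freshWorker)); // destructive tool before context load
console.log(isBlocked("read", freshWorker)); // non-destructive tool
```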
|
|
589
|
+
|
|
590
|
+
|
|
591
|
+
## Background
|
|
592
|
+
|
|
593
|
+
### The Ralph loop
|
|
594
|
+
|
|
595
|
+
The outer loop is named after the [Ralph Wiggum technique](https://www.geoffreyhuntley.com/ralph) — a `while` loop that feeds a prompt to an AI agent until it succeeds. The name reflects the philosophy: persistent, not clever. The loop doesn't try to be smart about when to give up. It tries, records what happened, and tries again with better instructions.
|
|
596
|
+
|
|
597
|
+
The key addition in this plugin over a naive Ralph implementation is the **separation of the strategist from the worker**. A naive loop re-prompts the same session. This plugin spawns a fresh Ralph strategist to review the failure before spawning the next worker. The strategist's fresh context means it analyses the failure without being anchored to the reasoning that produced it.
|
|
598
|
+
|
|
599
|
+
### The RLM inner loop
|
|
600
|
+
|
|
601
|
+
The worker discipline is based on [Recursive Language Models (arXiv:2512.24601)](https://arxiv.org/abs/2512.24601). The paper's core finding: keeping large inputs in an external environment and having the model grep/slice/recurse over them significantly outperforms shoving everything into the context window at once. Models reason better when they can retrieve exactly what they need rather than filtering signal from a noisy dump.
|
|
602
|
+
|
|
603
|
+
This plugin approximates that approach using the filesystem as the external environment and `rlm_grep` + `rlm_slice` as the retrieval primitives. `CONTEXT_FOR_RLM.md` is the designated large-reference file — paste API docs, database schemas, or reference code there and the worker accesses it surgically rather than reading it whole.
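
The grep-then-slice access pattern can be illustrated on an in-memory string. The sketch below is not the implementation of `rlm_grep` or `rlm_slice`, and the function names, parameters, and return shapes are assumptions. It only shows the idea of locating matching lines first and then reading a narrow window around a hit.

```typescript
interface Hit {
  line: number; // 1-based line number
  text: string;
}

// Returns every line that matches the pattern, with its 1-based line number.
function grepLines(doc: string, pattern: RegExp): Hit[] {
  // Drop the g and y flags so test() carries no lastIndex state between lines.
  const matcher = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  const hits: Hit[] = [];
  doc.split("\n").forEach((text, index) => {
    if (matcher.test(text)) hits.push({ line: index + 1, text });
  });
  return hits;
}

// Returns lines start..end inclusive (1-based), clamped to the document bounds.
function sliceLines(doc: string, start: number, end: number): string {
  const lines = doc.split("\n");
  const from = Math.max(1, start);
  const to = Math.min(lines.length, end);
  return lines.slice(from - 1, to).join("\n");
}

const referenceDoc = [
  "# Schema",
  "table users (id, email)",
  "table orders (id, user_id)",
  "# API",
  "GET /orders",
].join("\n");

const orderHits = grepLines(referenceDoc, /orders/);
const firstHit = orderHits[0];
if (firstHit) {
  // Read one line of context on each side of the first hit.
  console.log(sliceLines(referenceDoc, firstHit.line - 1, firstHit.line + 1));
}
```

The worker never needs the whole reference text in its context: the grep result is a short list of locations, and each slice is only as wide as the question requires.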
|
|
604
|
+
|
|
605
|
+
### On the verify contract
|
|
606
|
+
|
|
607
|
+
The loop's correctness guarantee is only as strong as `verify.command`. This is a feature, not a limitation. It forces clarity about what "done" means before the loop starts. Ambiguous acceptance criteria produce ambiguous results regardless of how many attempts you give the loop.
|
|
608
|
+
|
|
609
|
+
The practical recommendation: make your verify command as strict as you can tolerate. If you would normally merge a PR only when it passes tests, typecheck, and lint, configure all three as your verify command. The loop will produce code that meets that bar.
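
One way to combine several checks into a single strict gate is to chain them with `&&`, so the command fails as soon as any check fails. The sketch below builds such a string. `composeVerify` is a hypothetical helper, `bun lint` is an assumed script name, and whether `verify.command` expects a single shell string or some other form is not stated in this section.

```typescript
// Hypothetical helper: joins checks into one fail-fast shell command.
function composeVerify(checks: string[]): string {
  const cleaned = checks
    .map((check) => check.trim())
    .filter((check) => check.length > 0);
  if (cleaned.length === 0) {
    throw new Error("verify needs at least one check");
  }
  return cleaned.join(" && ");
}

// `bun lint` is an assumed script name; substitute your project's own checks.
const verifyCommand = composeVerify(["bun typecheck", "bun lint", "bun test"]);
console.log(verifyCommand);
```

Ordering the cheapest check first keeps failing attempts short, which matters when the loop may run dozens of times.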
|