@neriros/ralphy 3.8.9 → 3.8.10

package/README.md CHANGED
@@ -3,512 +3,83 @@
3
3
  [![npm version](https://img.shields.io/npm/v/@neriros/ralphy.svg)](https://www.npmjs.com/package/@neriros/ralphy)
4
4
  [![npm downloads](https://img.shields.io/npm/dm/@neriros/ralphy.svg)](https://www.npmjs.com/package/@neriros/ralphy)
5
5
  [![license](https://img.shields.io/npm/l/@neriros/ralphy.svg)](https://github.com/NeriRos/ralphy/blob/main/LICENSE)
6
- [![GitHub stars](https://img.shields.io/github/stars/NeriRos/ralphy.svg?style=social)](https://github.com/NeriRos/ralphy)
7
- [![GitHub issues](https://img.shields.io/github/issues/NeriRos/ralphy.svg)](https://github.com/NeriRos/ralphy/issues)
8
6
  [![Bun](https://img.shields.io/badge/runtime-Bun-fbf0df.svg)](https://bun.sh)
9
7
 
10
- An iterative AI task execution framework. Ralphy orchestrates autonomous work using Claude or Codex with built-in state management, progress tracking, and cost safeguards. It can run as a one-shot task or as a long-lived **agent** that polls Linear, ships PRs, and iterates with reviewers.
8
+ An iterative AI task execution framework. Ralphy runs Claude or Codex in a checklist-driven loop with state on disk, cost safeguards, and a long-lived **agent** that polls Linear, opens PRs, and iterates with reviewers.
11
9
 
12
- ## Contents
13
-
14
- - [How it works](#how-it-works)
15
- - [Install](#install)
16
- - [Task mode](#task-mode) — single-task / single-loop usage
17
- - [Agent mode](#agent-mode) — Linear-driven autonomous loop
18
- - [Lifecycle and triggers](#lifecycle-and-triggers)
19
- - [Linear indicators](#linear-indicators)
20
- - [PR + CI integration](#pr--ci-integration)
21
- - [Worktrees, setup, teardown](#worktrees-setup-teardown)
22
- - [Dashboard and logs](#dashboard-and-logs)
23
- - [CLI reference](#cli-reference)
24
- - [Change layout (OpenSpec)](#change-layout-openspec)
25
- - [MCP server](#mcp-server)
26
- - [Project structure and development](#project-structure-and-development)
10
+ > 📘 Full reference — Linear indicators, lifecycle, PR/CI flow, CLI flags, MCP — lives in **[GUIDE.md](./GUIDE.md)**.
27
11
 
28
12
  ## How it works
29
13
 
30
- Ralphy runs a single continuous loop against an OpenSpec change — no phases, no phase transitions.
31
-
32
14
  ```mermaid
33
15
  graph LR
34
- S[Start iteration] --> R[Read Steering] --> T[Find first unchecked task] --> W[Do the work] --> V[Validate] --> C[Check off task] --> S
35
- T -->|all tasks checked| D[Archive change]
16
+ S[Start iteration] --> R[Read Steering] --> T[First unchecked task] --> W[Do the work] --> V[Validate] --> C[Check off] --> S
17
+ T -->|all checked| D[Archive change]
36
18
  ```
37
19
 
38
- Each iteration reads the `## Steering` section of `proposal.md`, picks the first unchecked item from `tasks.md`, does the work, validates, and checks the item off. When every item is checked the loop archives the change.
20
+ Each iteration reads `## Steering` from `proposal.md`, picks the first unchecked item in `tasks.md`, does the work, validates, and checks it off. When every item is checked the loop archives the change.
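
The iteration described above can be sketched roughly as follows. This is an illustrative model only, not Ralphy's actual implementation: `firstUnchecked`, `checkOff`, and `runLoop` are hypothetical names, and the `work` callback stands in for the "do the work" and "validate" steps. It assumes `tasks.md` uses the usual `- [ ]` / `- [x]` checklist syntax.

```typescript
// Hypothetical helpers for illustration; not part of the ralphy package API.
const UNCHECKED = /^\s*- \[ \] (.+)$/;

/** Return the index and text of the first unchecked checklist item, or null if none remain. */
function firstUnchecked(lines: string[]): { index: number; text: string } | null {
  for (let i = 0; i < lines.length; i++) {
    const match = UNCHECKED.exec(lines[i] ?? "");
    if (match) return { index: i, text: match[1] ?? "" };
  }
  return null;
}

/** Mark the item at `index` as done by flipping `- [ ]` to `- [x]`. */
function checkOff(lines: string[], index: number): string[] {
  const next = [...lines];
  next[index] = (next[index] ?? "").replace("- [ ]", "- [x]");
  return next;
}

/**
 * Iterate until every item is checked (archive) or `work` reports a failure.
 * `work` is a stand-in for "do the work, then validate".
 */
function runLoop(
  tasksMd: string,
  work: (task: string) => boolean,
): { tasksMd: string; archived: boolean; done: string[] } {
  let lines = tasksMd.split("\n");
  const done: string[] = [];
  for (;;) {
    const task = firstUnchecked(lines);
    if (task === null) return { tasksMd: lines.join("\n"), archived: true, done };
    if (!work(task.text)) return { tasksMd: lines.join("\n"), archived: false, done };
    lines = checkOff(lines, task.index);
    done.push(task.text);
  }
}
```

Because the checklist file is the only state the sketch reads or writes, stopping and restarting the loop picks up at the same item, which matches the "state is on disk" behaviour the README describes.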
39
21
 
40
22
  ## Install
41
23
 
42
- Requires [Bun](https://bun.sh). For the Claude engine you also need the [Claude CLI](https://docs.anthropic.com/en/docs/claude-cli). The Makefile install path additionally needs `jq`.
24
+ Requires [Bun](https://bun.sh). The Claude engine also needs the [Claude CLI](https://docs.anthropic.com/en/docs/claude-cli).
43
25
 
44
26
  ```bash
45
- # Global (recommended)
46
- npm install -g @neriros/ralphy
47
- # or run without installing
48
- bunx @neriros/ralphy
49
-
50
- # Per-project install (builds + wires .ralph/ into the repo)
51
- bun install
52
- make install # → ./.ralph
53
- make install ~ # → ~/.ralph
54
- make install /path/to # → /path/to/.ralph
27
+ npm install -g @neriros/ralphy # or: bunx @neriros/ralphy
55
28
  ```
56
29
 
57
- The per-project install builds the CLI and MCP server, copies them to `.ralph/bin/`, sets up templates, wires `.mcp.json`, and adds a `ralph` script to `package.json`. `.ralph/` is gitignored by default.
58
-
59
- ## Task mode
30
+ ## Task mode: one-shot loop
60
31
 
61
32
  ```bash
62
- # Create + run a new task
63
33
  ralphy loop task --name fix-auth --prompt "Fix the JWT validation bug" --claude opus --max-iterations 10
64
34
 
65
- # Resume the same task later (state is on disk)
35
+ # Resume later (state is on disk)
66
36
  ralphy loop task --name fix-auth
67
37
 
68
38
  # Inspect
69
- ralphy agent list # local tasks + Linear tickets per indicator bucket (with linked PR URLs)
70
- ralphy loop status --name fix-auth # one task (details)
39
+ ralphy loop status --name fix-auth
71
40
  ```
72
41
 
73
- Engine defaults to Claude Opus. Common safeguards: `--max-iterations`, `--max-cost`, `--max-runtime`, `--max-failures`. See the [CLI reference](#cli-reference) for the full set.
42
+ Safeguards: `--max-iterations`, `--max-cost`, `--max-runtime`, `--max-failures`. Engine defaults to Claude Opus. See [GUIDE.md CLI reference](./GUIDE.md#cli-reference) for the full set.
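
One way to picture how the four safeguards combine is a single stop check evaluated before each iteration. The sketch below is a guess at the shape, not the package's code: the `Limits` and `Progress` types and the `stopReason` function are hypothetical. Treating `0` as "no limit" is documented for `--max-iterations`; applying the same convention to the other three limits is an assumption made here.

```typescript
// Hypothetical types for illustration; field names are not taken from ralphy's source.
interface Limits {
  maxIterations: number;     // --max-iterations (0 = unlimited)
  maxCost: number;           // --max-cost, in dollars (0 = no limit, assumed)
  maxRuntimeMinutes: number; // --max-runtime, in minutes (0 = no limit, assumed)
  maxFailures: number;       // --max-failures, consecutive identical failures (0 = no limit, assumed)
}

interface Progress {
  iterations: number;
  totalCost: number;
  elapsedMinutes: number;
  consecutiveFailures: number;
}

type StopReason = "max-iterations" | "max-cost" | "max-runtime" | "max-failures";

/** Return the first safeguard that has tripped, or null if the loop may continue. */
function stopReason(limits: Limits, p: Progress): StopReason | null {
  if (limits.maxIterations > 0 && p.iterations >= limits.maxIterations) return "max-iterations";
  if (limits.maxCost > 0 && p.totalCost > limits.maxCost) return "max-cost";
  if (limits.maxRuntimeMinutes > 0 && p.elapsedMinutes >= limits.maxRuntimeMinutes) return "max-runtime";
  if (limits.maxFailures > 0 && p.consecutiveFailures >= limits.maxFailures) return "max-failures";
  return null;
}
```

For the `--max-iterations 10` command above, this check would return `"max-iterations"` once ten iterations have completed, regardless of the other limits.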
74
43
 
75
- ## Agent mode
44
+ ## Agent mode — Linear-driven
76
45
 
77
- `ralph agent` polls Linear, runs up to N concurrent task loops, and (optionally) opens PRs, watches CI, and iterates with reviewers. Requires `LINEAR_API_KEY`.
46
+ `ralphy agent` polls Linear, runs up to N concurrent task loops, and (optionally) opens PRs, watches CI, and iterates with reviewers. Requires `LINEAR_API_KEY`.
78
47
 
79
48
  ```bash
80
49
  export LINEAR_API_KEY=lin_api_xxx
81
- ralphy agent --linear-team ENG --linear-assignee me --concurrency 3 --poll-interval 60
82
- ```
83
-
84
- Configuration lives in **`WORKFLOW.md`** at the project root — YAML frontmatter for settings, followed by a Jinja-style prompt template the worker renders for every iteration. A default is written on first run. CLI flags override config per-invocation.
85
-
86
- ### Lifecycle and triggers
87
-
88
- Each poll inspects Linear (and, when configured, GitHub PRs) and routes each issue into one of these spawn modes:
89
-
90
- | Mode | When it fires | What changes |
91
- | ---------------- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
92
- | **fresh** | Issue matches `getTodo` | Scaffold a new change, spawn worker, apply `setInProgress` |
93
- | **resume** | Issue matches `getInProgress` (typical: agent restart) | Re-attach to existing change directory, skip re-scaffold |
94
- | **conflict-fix** | A tracked PR (`setDone` candidate _or_ an in-progress ticket's PR) is detected as `CONFLICTING` via `gh pr view` | Interrupt resume if needed, prepend a conflict-resolution task to `tasks.md`, reactivate state |
95
- | **ci-fix** | A tracked PR's CI is red (`gh pr view --json statusCheckRollup`) and `fixCiOnFailure` is enabled | Prepend a "Fix failing CI checks" task with gh-driven log inspection; reactivate state |
96
- | **review** | Done issue carries the `getReview` marker (label trigger), _or_ a `@ralphy` mention is detected on Linear / the linked GitHub PR | Prepend a review task with the relevant comments; remove the `clearReview` label after pickup |
97
- | **code-review** | Open tracked PR has unresolved review-thread comments newer than Ralph's last pickup ack | Prepend a digest of unresolved comments with fix-or-reply instructions; repeats until PR approved |
98
-
99
- > `conflict-fix` and `ci-fix` are routed entirely from GitHub state — there is no Linear `getConflicted` / `getCiFailed` indicator anymore. The merge-state scan reads `gh pr view` directly and enqueues the matching fix trigger.
100
-
101
- ```mermaid
102
- flowchart TD
103
- POLL["Linear poll"] --> SCAN{trigger?}
104
- SCAN -- "getTodo" --> FRESH["mode: fresh\nscaffold change"]
105
- SCAN -- "getInProgress" --> RESUME["mode: resume"]
106
- SCAN -- "gh: PR CONFLICTING" --> CFX["mode: conflict-fix\nprepend fix task"]
107
- SCAN -- "gh: PR CI red\n(fixCiOnFailure)" --> CIFX["mode: ci-fix\nprepend CI fix task"]
108
- SCAN -- "getReview\nor @ralphy mention\n(Linear / GitHub)" --> REV["mode: review\nprepend comments"]
109
- SCAN -- "open PR with new\nunresolved review comments" --> CR["mode: review (code-review)\nprepend thread digest"]
110
-
111
- FRESH & RESUME & CFX & CIFX & REV & CR --> IN_PROG["Linear: setInProgress\npost pickup comment"]
112
- IN_PROG --> WT{useWorktree?}
113
- WT -- yes --> SCAFFOLD["create worktree + branch"] --> WORKER([worker loop])
114
- WT -- no --> WORKER
115
-
116
- WORKER --> EXIT{exit code}
117
- EXIT -- non-zero --> ERR_FLOW
118
- EXIT -- 0 --> WANT_PR{wantPr?}
119
- WANT_PR -- no --> DONE_FLOW
120
- WANT_PR -- yes --> PR["push + gh pr create\n↺ rebase / hook-fix"]
121
- PR -- "no commits" --> DONE_FLOW
122
- PR -- "opened" --> WATCH
123
-
124
- subgraph WATCH["watch loop"]
125
- direction LR
126
- WATCH_CHECK["conflict-check"] --> WATCH_CI["ci-poll / ci-fix"]
127
- WATCH_CI --> WATCH_CHECK
128
- end
129
- WATCH -- "green & clean" --> DONE_FLOW
130
- WATCH -- "gave up" --> ERR_FLOW
131
-
132
- subgraph DONE_FLOW["clean exit"]
133
- D1["worktree cleanup\n(if configured)"] --> D2["teardown script"] --> D5["Linear: setDone"]
134
- end
135
- subgraph ERR_FLOW["failure"]
136
- E1["worktree preserved"] --> E2["Linear: setError\nclearInProgress"]
137
- end
138
- D5 & E2 --> POLL
50
+ ralphy agent --linear-team ENG --linear-assignee me --concurrency 3 --create-pr --fix-ci
139
51
  ```
140
52
 
141
- The cycle repeats every poll. For code-review-iteration in particular, `setDone` re-applies between rounds so the next poll re-checks for new reviewer activity, until the PR is approved or merged.
142
-
143
- ### Linear indicators
144
-
145
- Linear is the source of truth for which issues Ralph has touched. The `linear.indicators` map declares how Ralph queries and mutates Linear at each lifecycle event. All keys are optional; an unset key means "Ralph doesn't perform that action".
146
-
147
- | Key | Type | Purpose |
148
- | --------------- | ---------------------- | ------------------------------------------------------------------------------- |
149
- | `getTodo` | `{filter: Marker[]}` | Issues to pick up (fresh) |
150
- | `getInProgress` | `{filter: Marker[]}` | Issues to resume after restart |
151
- | `getReview` | `{filter: Marker[]}` | Done issues flagged for review follow-up |
152
- | `getAutoMerge` | `{filter: Marker[]}` | Issues whose PR should be auto-merged once required checks pass |
153
- | `setInProgress` | `Marker` or `Marker[]` | Applied when a worker spawns (any non-resume mode) |
154
- | `setDone` | `Marker` or `Marker[]` | Applied on clean exit |
155
- | `setError` | `Marker` or `Marker[]` | Applied on non-zero exit (quarantine signal — issue is _not_ auto-resumed) |
156
- | `clearReview` | `Marker` or `Marker[]` | Label(s) removed when a review pickup happens (status removal is not supported) |
157
- | `getApproved` | `{filter: Marker[]}` | Approval signal that releases a confirmation-gated ticket into `implement` |
158
- | `clearApproved` | `Marker` or `Marker[]` | Label(s) removed once an approval is consumed (status removal is not supported) |
159
-
160
- > Conflict and CI-failure routing no longer use Linear indicators — there's no `getConflicted` / `setConflicted` / `clearConflicted` (or `getCiFailed` / `setCiFailed` / `clearCiFailed`). GitHub is the source of truth: `gh pr view` produces the conflicted / ci-failed / mergeable counts and pushes `conflict-fix` / `ci-fix` queue entries directly.
161
-
162
- A `Marker` is one of three types:
163
-
164
- | Marker type | Example value | Effect |
165
- | -------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
166
- | `"label"` | `"ralph:in-progress"` | Adds or removes a Linear label on the issue |
167
- | `"status"` | `"In Progress"` | Updates the Linear workflow status of the issue |
168
- | `"attachment"` | `"In Progress"` | Upserts a single **Ralphy** attachment on the issue; `value` becomes the subtitle. The same entry is reused across every lifecycle transition — Ralph creates it on first apply and edits it on subsequent ones, so the issue stays tidy. |
169
-
170
- Use an array when one event sets multiple — e.g. `setDone` flipping a status _and_ adding a label _and_ updating the attachment subtitle.
171
-
172
- Example `WORKFLOW.md` frontmatter — the prompt template after the closing `---` is omitted here; see the bundled default for the full file:
53
+ Each poll routes every matching issue into one of: **fresh** (Todo → scaffold + spawn), **resume** (In Progress → reattach), **conflict-fix** / **ci-fix** (PR red on GitHub → prepend fix task), or **review** / **code-review** (reviewer comments or `@ralphy` mention).
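
As a rough mental model, the routing can be thought of as a function from an issue snapshot to a spawn mode. Everything in the sketch below except the six mode names is hypothetical: the `IssueSnapshot` fields, the `route` function, and in particular the precedence order are assumptions for illustration. The only ordering the earlier README stated is that a conflicting PR interrupts an in-progress ticket.

```typescript
type Mode = "fresh" | "resume" | "conflict-fix" | "ci-fix" | "review" | "code-review";

// Hypothetical snapshot of what one poll learned about an issue and its PR.
interface IssueSnapshot {
  status: "todo" | "in-progress" | "done";
  prConflicting: boolean;              // PR reported as CONFLICTING on GitHub
  prCiRed: boolean;                    // PR checks are failing
  fixCiOnFailure: boolean;             // config switch for ci-fix
  hasReviewMarkerOrMention: boolean;   // review marker or an `@ralphy` mention
  hasUnresolvedReviewThreads: boolean; // new unresolved PR review comments
}

/** Pick a spawn mode; the precedence below is assumed, not documented. */
function route(s: IssueSnapshot): Mode | null {
  if (s.prConflicting) return "conflict-fix"; // merge state wins over resume
  if (s.prCiRed && s.fixCiOnFailure) return "ci-fix";
  if (s.hasReviewMarkerOrMention) return "review";
  if (s.hasUnresolvedReviewThreads) return "code-review";
  if (s.status === "in-progress") return "resume";
  if (s.status === "todo") return "fresh";
  return null; // nothing to do for this issue on this poll
}
```

See GUIDE.md for the actual trigger conditions; this sketch only shows that each poll yields at most one mode per issue.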
173
54
 
174
- ```yaml
175
- ---
176
- concurrency: 3
177
- pollIntervalSeconds: 60
178
- engine: claude
179
- model: opus
180
- useWorktree: true
181
- createPrOnSuccess: true
182
- autoMergeStrategy: squash
183
- fixCiOnFailure: true
55
+ Configuration lives in **`WORKFLOW.md`** at the project root — YAML frontmatter for settings, followed by the Jinja-style prompt template the worker renders each iteration. A default is written on first run; CLI flags override per invocation.
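
To make the file layout concrete, here is a minimal sketch of splitting such a file into its frontmatter and template parts, and of letting CLI values override file values. Both `splitWorkflow` and `applyOverrides` are hypothetical helpers written for this example. The sketch deliberately leaves the YAML unparsed; a real loader would hand the `frontmatter` string to a YAML parser.

```typescript
/** Split a WORKFLOW.md-style document into raw YAML frontmatter and the prompt template. */
function splitWorkflow(source: string): { frontmatter: string; template: string } {
  const lines = source.split(/\r?\n/);
  // Frontmatter must open with `---` on the very first line.
  if (lines[0]?.trim() !== "---") return { frontmatter: "", template: source };
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) return { frontmatter: "", template: source };
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    template: lines.slice(close + 1).join("\n"),
  };
}

/** Overlay CLI-provided values on file config; unset (undefined) flags leave the file value alone. */
function applyOverrides<T extends object>(config: T, cli: Partial<T>): T {
  const merged = { ...config };
  for (const key of Object.keys(cli) as (keyof T)[]) {
    const value = cli[key];
    if (value !== undefined) merged[key] = value as T[keyof T];
  }
  return merged;
}
```

With this shape, a file value of `concurrency: 3` combined with `--concurrency 5` on the command line resolves to `5` for that invocation only, while the file itself is left unchanged.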
184
56
 
185
- linear:
186
- team: ENG
187
- assignee: me
188
- postComments: true
189
- updateEveryIterations: 10
190
- mentionTrigger: true
191
- mentionHandle: "@ralphy"
192
- codeReviewTrigger: true
193
- codeReviewStaleHours: 24
194
- syncTasksToComment: true
195
- syncSpecsAsAttachments: true
196
-
197
- indicators:
198
- # Todo → In Progress
199
- getTodo:
200
- filter:
201
- - type: status
202
- value: Todo
203
- getInProgress:
204
- filter:
205
- - type: status
206
- value: In Progress
207
- setInProgress:
208
- type: status
209
- value: In Progress
210
-
211
- # Done / review hand-off
212
- setDone:
213
- - type: status
214
- value: In Review
215
- - type: label
216
- value: ralphy-done
217
- getReview:
218
- filter:
219
- - type: label
220
- value: "ralph:review"
221
- clearReview:
222
- type: label
223
- value: "ralph:review"
224
-
225
- # Auto-merge opt-in
226
- getAutoMerge:
227
- filter:
228
- - type: label
229
- value: "ralph:auto-merge"
230
-
231
- # Error quarantine
232
- setError:
233
- type: label
234
- value: "ralph:error"
235
- ---
236
- ```
237
-
238
- #### Confirmation mode (human gate before `implement`)
239
-
240
- Set `linear.confirmationMode.enabled: true` to insert a human review step between the OpenSpec `tasks` and `implement` phases. Once the agent finishes drafting `tasks.md`, the ticket parks in the new `awaiting-confirmation` phase and Ralphy posts a one-shot **📋 Ralphy plan ready** Linear comment summarising the plan. Gated tickets do **not** consume a `concurrency` slot — the agent is free to pick up other work while waiting.
241
-
242
- Three signals release (or skip) the gate:
243
-
244
- | Signal | Effect |
245
- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
246
- | Apply the `getApproved` marker | Ralphy strips it via `clearApproved`, records the approval, and advances the ticket into `implement`. |
247
- | Comment `@ralphy revise: <why>` | The reason is written into steering, the round counter bumps, the ticket loops back to `design`. Any in-flight worker is reaped immediately. |
248
- | Apply `optOutLabel` | (default `ralph:auto-approve`) Bypasses the gate entirely — the ticket flows straight through `tasks` → `implement` as if confirmation mode were off. |
249
-
250
- By default confirmation mode applies to every ticket. Set `linear.confirmationMode.optInLabel` (e.g. `ralph:needs-review`) to flip the polarity — only tickets carrying that label go through the gate; everything else implements straight through.
251
-
252
- After `timeoutHours` (default `48`) with no activity Ralphy posts a single nudge comment per round. Tickets that exceed `maxConfirmationRounds` (default `3`) are labelled `ralph:stuck` and skipped on future polls until a human intervenes.
253
-
254
- Wire up the matching indicators alongside the rest of the `linear.indicators` map:
255
-
256
- ```yaml
257
- getApproved:
258
- filter:
259
- - type: label
260
- value: "ralph:approved"
261
- clearApproved:
262
- type: label
263
- value: "ralph:approved"
264
- ```
265
-
266
- See `linear.confirmationMode` in `WORKFLOW.md` for the full set of knobs.
267
-
268
- #### Review follow-ups (label trigger)
269
-
270
- When a Linear issue is in a done state and a reviewer adds the `getReview` marker (typically a label like `ralph:review` left alongside comments), Ralph picks it up, applies `setInProgress`, removes the `clearReview` label so the trigger doesn't re-fire, filters out Ralph's own comments, and prepends every reviewer comment as a fresh task at the top of `tasks.md`. `setDone` re-applies on clean exit.
271
-
272
- #### `@ralphy` mention trigger
273
-
274
- Set `linear.mentionTrigger: true` to scan Linear issue comments on every non-cancelled issue (Todo, In Progress, Backlog, Triage, Done) _and_ on the linked GitHub PR for a configurable handle (`linear.mentionHandle`, default `@ralphy`). Each unprocessed mention queues the issue as a review run, with the mention text used **verbatim** as the prepended task. Idempotency: a mention is processed when its `createdAt` is older than Ralph's latest `🔁 picked up` Linear comment, so the same comment never re-fires. Requires `gh` for the GitHub side.
275
-
276
- #### Code-review iteration
277
-
278
- Set `linear.codeReviewTrigger: true` (or pass `--code-review`) to watch open, unmerged, unapproved tracked PRs for unresolved review-thread comments. New activity on any unresolved thread queues a review run whose task is a digest of every unresolved comment + instructions:
279
-
280
- - **If Ralph agrees** with a comment — fix, commit, push, and resolve the thread (via `gh api graphql`'s `resolveReviewThread`).
281
- - **If Ralph disagrees** — reply on the thread with reasoning via `gh api .../comments/{id}/replies` and leave it unresolved.
282
-
283
- The loop exits; the next poll re-checks the PR. The cycle continues until the PR is **approved** or **merged**. If the reviewer is silent for more than `linear.codeReviewStaleHours` (default `24`, `0` disables) while Ralph is the last actor, one `@`-mention ping comment is posted on the GitHub PR.
284
-
285
- #### Self-review phase
286
-
287
- Once every task in `tasks.md` is checked off, the worker can spawn an in-process reviewer pass before exiting. The reviewer reads `proposal.md`, `design.md`, and the diff, and either appends new tasks back into `tasks.md` (looping the worker for another round) or signs off. Configure under `openspec.reviewPhase`:
288
-
289
- ```yaml
290
- openspec:
291
- reviewPhase:
292
- enabled: true
293
- maxRounds: 2 # hard cap on review iterations (default 1)
294
- reviewerModel: claude-sonnet-4-6 # override the reviewer's model (optional)
295
- reviewerContextStrategy: fresh # "fresh" = clean context per round (default), "warm" = reuse worker context
296
- ```
57
+ See **[GUIDE.md](./GUIDE.md)** for:
297
58
 
298
- CLI equivalents: `--review-enabled`, `--review-max-rounds <N>`, `--review-model <id>`, `--review-context-strategy fresh|warm`. The worker passes these to itself when respawning, so the same review settings apply across `respawn` / `conflict-fix` / `ci-fix` lifecycles.
299
-
300
- #### Sync tasks into a Linear comment
301
-
302
- `linear.syncTasksToComment` (default `true`) mirrors the active change's
303
- `tasks.md` into a dedicated Linear **comment** instead of the issue
304
- description. The same comment is updated in place across iterations so
305
- the timeline stays clean. When `ralph_append_steering` is invoked the
306
- existing tasks comment is deleted and re-posted so it always lands at
307
- the bottom of the timeline, after the new steering comment.
308
-
309
- The first time planning completes (every `- [ ]` under `## Planning` in
310
- `tasks.md` becomes `- [x]`), Ralph posts a one-shot "📋 Plan" comment
311
- summarizing `proposal.md` (`## Why` + `## What Changes`) and the first
312
- paragraph of `design.md`.
313
-
314
- #### Conflict re-fix / CI re-fix
315
-
316
- Every poll, the merge-state scanner reads `gh pr view --json state,mergeable,mergeStateStatus,statusCheckRollup` for each tracked PR:
317
-
318
- - **`mergeable === "CONFLICTING"`** (or `mergeStateStatus === "DIRTY"`) → enqueue a `conflict-fix` run that prepends a conflict-resolution task to `tasks.md` and re-activates the change. In-progress tickets are interrupted in favour of fixing the merge state.
319
- - **`statusCheckRollup` shows red CI** and `fixCiOnFailure` is enabled → enqueue a `ci-fix` run that prepends a "Fix failing CI checks" task with `gh run view --log-failed` steps so the worker can read the failure logs.
320
-
321
- No Linear labels are involved in either path — `gh` is the single source of truth, and the matching `conflict-fix` / `ci-fix` queue entries land directly. A one-line Linear comment is posted for visibility when a ticket is promoted into a fix flow.
322
-
323
- The scanner is resilient to:
324
-
325
- - Transient `gh` failures (failed PR-discovery is cached with a 10-minute TTL — not permanent).
326
- - Branch-name drift after a Linear title edit (falls back to `gh pr list --search "<ID> in:title state:open"`).
327
- - GitHub's async `UNKNOWN` mergeability response (fibonacci backoff up to ~31s total, also consults `mergeStateStatus` which often resolves before `mergeable` does).
328
-
329
- ### PR + CI integration
330
-
331
- | Flag / config | Behavior |
332
- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
333
- | `createPrOnSuccess` / `--create-pr` | After a clean exit, push the worker's branch and `gh pr create`. Title: `<ID>: <title>`. Idempotent — surfaces the existing URL if the PR is already open. Requires `--worktree` and `gh` authenticated. `prBaseBranch` defaults to `main`; override per-issue by labelling the Linear issue with `ralph:branch:<branch-name>`. |
334
- | `stackPrsOnDependencies` / `--stack-prs` | When the Linear issue is blocked by another issue (`blocked_by` relation) that has exactly one open GitHub PR, open this PR against that blocker's head branch instead of `prBaseBranch`. Resolves the blocker's PR via Linear's auto-attachment + `gh pr view --json state,headRefName`. Falls back to `prBaseBranch` when zero / multiple blockers (or PRs) match. A `ralph:branch:<name>` label still wins. |
335
- | `getAutoMerge` indicator | Opt an issue in for GitHub auto-merge (any-of label/status filter, same shape as `getReview`). When matched, Ralph runs `gh pr merge <url> --auto --<strategy>` right after opening the PR so GitHub merges as soon as required checks pass. Strategy comes from `autoMergeStrategy` (`squash` \| `merge` \| `rebase`, default `squash`). Failures are logged but non-fatal — the CI/conflict watch loop continues. |
336
- | `fixCiOnFailure` / `--fix-ci` | After the PR opens, poll `gh pr checks`. On failure, pull failed logs via `gh run view --log-failed`, append them to `## Steering`, re-spawn the worker, and push the new commits — repeat until green or `maxCiFixAttempts` (default `5`) is hit. While this loop runs, `setDone` is **not** applied; if CI is never green the worker is treated as failed. |
337
- | `ciPollIntervalSeconds` | Seconds between CI status polls (default `30`). |
338
- | `ignoreCiChecks` | Array of check names to ignore when computing pass/fail. |
339
- | `codeReviewTrigger` / `--code-review` | See [Code-review iteration](#code-review-iteration). |
340
-
341
- ### Pre-existing error check
342
-
343
- Opt-in gate that protects the agent from chasing failures it cannot fix. When
344
- enabled (config `preExistingErrorCheck.enabled: true` or `--pre-existing-error-check`),
345
- on every poll tick Ralph runs the configured commands against the base branch
346
- HEAD. If any command fails:
347
-
348
- 1. A Linear issue is created with the failing command, exit code, and truncated
349
- output (fingerprint embedded in the body so re-runs with the same failure
350
- don't open duplicates).
351
- 2. The coordinator pauses — new fresh/resume/conflict-fix/review pickups are
352
- blocked until the trunk is green again. **In-flight workers are not killed.**
353
- 3. The dashboard shows a red `⛔ BASELINE BROKEN <LIN-ID> · <duration>` banner.
354
-
355
- When the baseline goes green (the human merged the fix), the next poll lifts
356
- the pause automatically.
357
-
358
- ```yaml
359
- preExistingErrorCheck:
360
- enabled: false
361
- commands: # falls back to commands.lint + commands.test when empty
362
- - bun run lint
363
- - bun run test
364
- baseBranch: main
365
- label: "ralph:pre-existing-error"
366
- outputCharLimit: 4000
367
- ```
368
-
369
- ### Worktrees, setup, teardown
370
-
371
- With `useWorktree: true` (or `--worktree`) each task runs in an isolated worktree at `~/.ralph/<project>/worktrees/<change-name>` checked out onto a fresh `ralph/<change-name>` branch. Concurrent workers can't stomp on each other, and the worker's cwd _is_ the worktree.
372
-
373
- - **`setupScript`** — `sh -c`-run inside the worktree right after scaffolding (e.g. `bun install`, `cp .env.example .env`).
374
- - **`teardownScript`** — `sh -c`-run after the loop exits and (optional) worktree cleanup.
375
-
376
- Both scripts receive `WORKSPACE_ROOT` in their environment — the absolute path to the origin repository (the parent of the worktree). Use it to reference project-root files from inside a worktree, e.g. `cp "$WORKSPACE_ROOT/.env.example" .env`.
377
-
378
- - **`cleanupWorktreeOnSuccess`** — remove the worktree on clean exit. Failed workers always keep their worktree + branch for human inspection.
379
-
380
- Both scripts log failures but never block the loop. **`appendPrompt`** (or `--prompt` in agent mode) is appended to every scaffolded `proposal.md` under `## Additional instructions` — use it for cross-cutting guidance every task should see.
381
-
382
- ### Running under tmux
383
-
384
- If `tmux` is on `$PATH`, `ralphy agent` re-execs itself inside a managed tmux session on first launch (per-workspace name). Detaching the terminal — closing the SSH session, the laptop lid, the `tmux detach` keybind — leaves the agent running. Re-running `ralphy agent` from the same workspace attaches to the existing session instead of starting a second copy.
385
-
386
- | Command | Behavior |
387
- | ------------------------ | -------------------------------------------------------------------------- |
388
- | `ralphy agent` | Attach to the managed tmux session, or start one if absent |
389
- | `ralphy agent status` | Report whether the managed session exists and is currently attached |
390
- | `ralphy agent stop` | Kill the managed session (workers exit cleanly) |
391
- | `ralphy agent --no-tmux` | Skip tmux entirely and run the agent in the foreground (CI, scripted runs) |
392
-
393
- ### Dashboard and logs
394
-
395
- The terminal dashboard shows three always-visible panels: **RALPH AGENT** (engine/model, concurrency, poll interval, active limits, feature flags, Linear filter), **POLL STATUS + WORKERS** (last-poll bucket breakdown — `todo · res · conf · rev · @` (each colored when non-zero) plus `↺ Ns` next-poll countdown, active/queued worker totals), and **TASKS tab bar** (numbered worker tabs — `Tab` / `← →` / `1-9` to switch).
396
-
397
- Each worker card shows: priority badge + identifier + title + mode badge, `↗ LINEAR`, `↗ PR`, `▶ TASK` (first unchecked task from `tasks.md`, refreshed every second), `PHASE` with color + elapsed time, `⏵ CMD` when a shell command is in flight, `LOG` path for `tail -f`, and `─ OUTPUT ─` with live stdout/stderr.
398
-
399
- Log files (every line is `[ISO] [type] message`):
400
-
401
- | File | Contains |
402
- | ---------------------------------------- | ----------------------------------------------------------------------------------------- |
403
- | `~/.ralph/agent-mode.log` | Global session log, appended each agent run |
404
- | `<projectRoot>/.ralph/logs/<change>.log` | Per-worker unified log: output + phases + coordinator events |
405
- | `<taskDir>/LOG.jsonl` | Structured JSON event log used by the web UI |
406
- | `<path from --json-log-file>` | Mirror of the structured event stream (state changes, phases, polls) — file-tail friendly |
407
-
408
- Failed workers are not marked processed, so they retry on the next poll. SIGINT / SIGTERM cleanly stops polling and kills active workers. All Linear side effects are best-effort — failures log a warning but never block the loop.
409
-
410
- ## CLI reference
411
-
412
- **Task flags**
413
-
414
- | Option | Description |
415
- | ---------------------- | --------------------------------------------------------- |
416
- | `--name <name>` | Task name (required for most commands) |
417
- | `--prompt <text>` | Task description |
418
- | `--prompt-file <path>` | Read prompt from file |
419
- | `--claude [model]` | Use Claude engine (haiku / sonnet / opus, default opus) |
420
- | `--codex` | Use Codex engine |
421
- | `--model <model>` | Set model (haiku / sonnet / opus) |
422
- | `--max-iterations <N>` | Stop after N iterations (`0` = unlimited) |
423
- | `--max-cost <N>` | Stop when total cost exceeds $N |
424
- | `--max-runtime <N>` | Stop after N minutes |
425
- | `--max-failures <N>` | Stop after N consecutive identical failures (default `5`) |
426
- | `--unlimited` | Sets max iterations to 0 (default) |
427
- | `--delay <N>` | Seconds between iterations |
428
- | `--manual-test` | Enable manual-test phase (creates test tasks) |
429
- | `--log` | Log raw engine stream |
430
- | `--verbose` | Verbose output |
431
-
432
- **Agent-mode flags**
433
-
434
- | Option | Behavior |
435
- | ------------------------------- | -------------------------------------------------------------------------------------------- |
436
- | `--linear-team <key>` | Linear team key (e.g. `ENG`) |
437
- | `--linear-assignee <id>` | Assignee filter (user id, email, or `me`) |
438
- | `--poll-interval <s>` | Seconds between Linear polls (default `60`) |
439
- | `--concurrency <n>` | Max concurrent task loops (default `1`) |
440
- | `--max-tickets <n>` | Stop picking up new issues after N have been started this run (`0` = unlimited) |
441
- | `--worktree` | Run each task in its own git worktree |
442
- | `--indicator <k>:<t>:<v>` | Override one `linear.indicators` entry (repeatable, e.g. `setDone:status:Done`) |
443
- | `--create-pr` | Push worker branch + open a GitHub PR on success (needs `--worktree`) |
444
- | `--fix-ci` | After PR opens, re-run task on CI failures until green (needs `--create-pr`) |
445
- | `--stack-prs` | Open the PR against a blocker issue's open-PR head branch when present (needs `--create-pr`) |
446
- | `--code-review` | Watch open tracked PRs for unresolved review comments and prepend a code-review task |
447
- | `--json-output` | Emit JSONL to stdout instead of rendering the Ink dashboard (CI / scripting) |
448
- | `--json-log-file <path>` | Mirror the JSONL event stream to a file alongside the TUI or `--json-output` |
449
- | `--no-tmux` | Don't auto-reexec under tmux; run the agent in the foreground |
450
- | `--review-enabled` | Enable the worker's self-review phase (see [Self-review phase](#self-review-phase)) |
451
- | `--review-max-rounds <N>` | Hard cap on review rounds per task (default `1`) |
452
- | `--review-model <id>` | Override the reviewer's model (defaults to the worker's model) |
453
- | `--review-context-strategy <s>` | `fresh` (default) for a clean reviewer context per round, or `warm` to reuse the worker |
454
-
455
- **List-mode flags**
456
-
457
- | Option | Behavior |
458
- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
459
- | `--debug --name <id>` | Diagnose why a Linear ticket (e.g. `ENG-42`) is not being picked up — checks team, assignee, include / exclude markers, and blocked-by relations against every configured `get*` indicator. |
460
-
461
- `ralph list` reads `WORKFLOW.md` and, when `LINEAR_API_KEY` is set, fetches every issue matching each configured `getTodo` / `getInProgress` / `getReview` / `getAutoMerge` indicator using the same include / exclude rules as `ralph agent`. For each ticket it also resolves the linked GitHub PR URL from Linear attachments and prints its conflict / CI status from `gh pr view`.
462
-
463
- **`--max-tickets`.** Caps how many issues ralph picks up in a single agent run. Once the limit is hit the coordinator stops enqueuing new work; in-flight workers continue to completion, and the dashboard header shows `│ tickets ≤N`. The limit resets each restart.
464
-
465
- ## Change layout (OpenSpec)
466
-
467
- There are no phases. One loop, one prompt, one `tasks.md` checklist. Each change lives in `<projectRoot>/openspec/changes/<name>/` (managed by OpenSpec) plus `<projectRoot>/.ralph/tasks/<name>/` (loop state only):
468
-
469
- | File / Directory | Purpose |
470
- | --------------------------------------- | --------------------------------------------------------- |
471
- | `openspec/changes/<name>/proposal.md` | Description, goals, and the `## Steering` section |
472
- | `openspec/changes/<name>/design.md` | Technical design and architecture decisions |
473
- | `openspec/changes/<name>/tasks.md` | Checklist driving iteration — one unchecked item per loop |
474
- | `openspec/changes/<name>/specs/` | Per-task specifications |
475
- | `.ralph/tasks/<name>/.ralph-state.json` | Loop state (iteration count, status, cost, history) |
476
- | `.ralph/tasks/<name>/STOP` | Create this file to signal the loop to stop |
477
-
478
- Steering is delivered by editing the `## Steering` section of `proposal.md`. The agent reads it at the start of every iteration.
59
+ - Lifecycle diagram + per-mode behavior
60
+ - `linear.indicators` schema and the full `WORKFLOW.md` example
61
+ - Confirmation gate (`@ralphy revise`, opt-in/out labels)
62
+ - `@ralphy` mentions, code-review iteration, self-review phase
63
+ - PR + CI integration (auto-merge, stacked PRs, fix-ci loop)
64
+ - Pre-existing error check, worktrees, tmux session management, dashboard, logs
65
+ - Complete CLI reference (task, agent, list modes)
479
66
 
480
67
  ## MCP server
481
68
 
482
- Ralphy includes an MCP server that exposes task-management tools to Claude agents. It's auto-configured during installation.
69
+ Ralphy ships an MCP server (auto-configured on per-project install) exposing `ralph_list_changes`, `ralph_get_change`, `ralph_create_change`, `ralph_append_steering`, `ralph_stop`. See [GUIDE.md → MCP server](./GUIDE.md#mcp-server).
483
70
 
484
- | Tool | Purpose |
485
- | ----------------------- | ------------------------------------------ |
486
- | `ralph_list_changes` | List changes with status |
487
- | `ralph_get_change` | Get change details |
488
- | `ralph_create_change` | Create and optionally start a change |
489
- | `ralph_append_steering` | Append a steering message to `proposal.md` |
490
- | `ralph_stop` | Stop a running change |
71
+ ## Development
491
72
 
492
- ## Project structure and development
493
-
494
- ```
495
- ralphy/
496
- ├── apps/
497
- │ ├── cli/ # CLI application
498
- │ └── mcp/ # MCP server
499
- ├── packages/
500
- │ ├── core/ # State management and loop
501
- │ ├── context/ # Storage abstraction
502
- │ ├── content/ # Base prompt and task templates
503
- │ ├── engine/ # Claude / Codex engine spawning
504
- │ ├── openspec/ # ChangeStore interface and OpenSpec adapter
505
- │ ├── output/ # Terminal formatting
506
- │ └── types/ # Zod schemas and types
507
- └── Makefile
73
+ ```bash
74
+ bun install
75
+ bunx nx run-many -t lint,typecheck,test,build # all checks
76
+ bunx nx run cli:build # CLI only
508
77
  ```
509
78
 
79
+ Per-project install (builds + wires `.ralph/` and `.mcp.json` into the repo):
80
+
510
81
  ```bash
511
- bun install
512
- bunx nx run-many -t lint,typecheck,test,build # Run all checks
513
- bunx nx run cli:build # Build CLI only
82
+ make install # → ./.ralph
83
+ make install ~ # ~/.ralph
84
+ make install /path/to # /path/to/.ralph
514
85
  ```
package/dist/mcp/index.js CHANGED
@@ -24066,6 +24066,7 @@ var StateSchema = exports_external.object({
24066
24066
  model: exports_external.string().default("opus"),
24067
24067
  manualTest: exports_external.boolean().default(false),
24068
24068
  createPr: exports_external.boolean().default(false),
24069
+ validateOnComplete: exports_external.boolean().default(false),
24069
24070
  usage: UsageSchema.default({}),
24070
24071
  history: exports_external.array(HistoryEntrySchema).default([]),
24071
24072
  metadata: exports_external.object({ branch: exports_external.string().optional() }).default({}),
@@ -18928,8 +18928,8 @@ import { readFileSync } from "fs";
18928
18928
  import { resolve } from "path";
18929
18929
  function getVersion() {
18930
18930
  try {
18931
- if ("3.8.9")
18932
- return "3.8.9";
18931
+ if ("3.8.10")
18932
+ return "3.8.10";
18933
18933
  } catch {}
18934
18934
  const dirsToTry = [];
18935
18935
  try {
@@ -64364,6 +64364,7 @@ var init_types2 = __esm(() => {
64364
64364
  model: exports_external.string().default("opus"),
64365
64365
  manualTest: exports_external.boolean().default(false),
64366
64366
  createPr: exports_external.boolean().default(false),
64367
+ validateOnComplete: exports_external.boolean().default(false),
64367
64368
  usage: UsageSchema.default({}),
64368
64369
  history: exports_external.array(HistoryEntrySchema).default([]),
64369
64370
  metadata: exports_external.object({ branch: exports_external.string().optional() }).default({}),
@@ -71284,8 +71285,11 @@ function buildTaskPrompt(state, taskDir, reviewPhase) {
71284
71285
  prompt += `Change name: \`${state.name}\`
71285
71286
 
71286
71287
  `;
71287
- prompt += `Run \`bunx openspec validate ${state.name}\` before committing.
71288
+ const validateOnly = state.validateOnComplete && !state.createPr;
71289
+ if (!validateOnly) {
71290
+ prompt += `Run \`bunx openspec validate ${state.name}\` before committing.
71288
71291
  `;
71292
+ }
71289
71293
  prompt += `Commit all changed files yourself before finishing \u2014 stage files individually (e.g. \`git add path/to/file\`), never \`git add -A\` or \`git commit -am\`. Nothing is committed automatically after you exit.
71290
71294
  `;
71291
71295
  if (state.createPr) {
@@ -71631,7 +71635,8 @@ function useLoop(opts) {
71631
71635
  writeState(stateDir, currentState);
71632
71636
  setState(currentState);
71633
71637
  try {
71634
- if (typeof opts.changeStore.getStatus === "function") {
71638
+ const skipStatusCheck = currentState.validateOnComplete && !currentState.createPr;
71639
+ if (!skipStatusCheck && typeof opts.changeStore.getStatus === "function") {
71635
71640
  const status = await opts.changeStore.getStatus(opts.name);
71636
71641
  if (!status.isComplete) {
71637
71642
  const blocked = status.artifacts.filter((a) => a.status !== "done").map((a) => `${a.id}=${a.status}`).join(", ");
@@ -99769,6 +99774,41 @@ async function runTeardownPhase(input, deps) {
99769
99774
  log2(`! teardown script threw: ${err.message}`, "yellow");
99770
99775
  }
99771
99776
  }
99777
+ async function runValidateOnlyPhase(input, deps) {
99778
+ const { changeName, changeDir, stateFilePath, validateCommands, cwd: cwd2 } = input;
99779
+ const { log: log2, emit: emit2, respawnWorker } = deps;
99780
+ const runCommand = deps.runCommand ?? defaultRunCommand;
99781
+ emit2("validate");
99782
+ if (validateCommands.length > 0) {
99783
+ for (const command of validateCommands) {
99784
+ const { exitCode, output } = await runCommand(command, cwd2);
99785
+ if (exitCode !== 0) {
99786
+ emit2("validate-fix", command);
99787
+ log2(`! validation check failed: ${command}`, "yellow");
99788
+ try {
99789
+ await prependFixTask(join28(changeDir, AGENT_TASKS_FILENAME), `Fix failing validation: ${command}`, output || `Command exited with code ${exitCode}`);
99790
+ } catch (err) {
99791
+ log2(`! could not prepend fix task: ${err.message}`, "red");
99792
+ return 1;
99793
+ }
99794
+ await reactivateState(stateFilePath, log2, changeName);
99795
+ return respawnWorker();
99796
+ }
99797
+ }
99798
+ }
99799
+ try {
99800
+ await prependFixTask(join28(changeDir, AGENT_TASKS_FILENAME), "Run openspec validation", [
99801
+ `Run \`bunx openspec validate ${changeName}\` to validate the change artifacts.`,
99802
+ `Commit any pending changes before running the validation command.`
99803
+ ].join(`
99804
+ `));
99805
+ } catch (err) {
99806
+ log2(`! could not prepend validation task: ${err.message}`, "red");
99807
+ return 1;
99808
+ }
99809
+ await reactivateState(stateFilePath, log2, changeName);
99810
+ return respawnWorker();
99811
+ }
99772
99812
  async function runPostTask(input, deps) {
99773
99813
  const { log: log2, cmd, git: git2, runScript } = deps;
99774
99814
  const emit2 = (phase2, detail) => deps.onPhase?.(phase2, detail);
@@ -99785,6 +99825,7 @@ async function runPostTask(input, deps) {
99785
99825
  wantPr,
99786
99826
  wantFixCi,
99787
99827
  wantAutoMerge,
99828
+ wantValidateOnly,
99788
99829
  cfg,
99789
99830
  respawnWorker
99790
99831
  } = input;
@@ -99798,6 +99839,23 @@ async function runPostTask(input, deps) {
99798
99839
  }
99799
99840
  }
99800
99841
  let effectiveCode = exitCode;
99842
+ if (wantValidateOnly && effectiveCode === 0) {
99843
+ effectiveCode = await runValidateOnlyPhase({
99844
+ changeName,
99845
+ changeDir,
99846
+ stateFilePath,
99847
+ validateCommands: cfg.validateCommands ?? [],
99848
+ cwd: cwd2
99849
+ }, {
99850
+ log: log2,
99851
+ emit: emit2,
99852
+ respawnWorker
99853
+ });
99854
+ emit2(effectiveCode === 0 ? "done" : "gave-up", effectiveCode !== 0 ? `exit ${effectiveCode}` : undefined);
99855
+ await runWorktreeCleanupPhase({ changeName, cwd: cwd2, projectRoot, useWorktree, effectiveCode, cfg }, { git: git2, log: log2, emit: emit2 });
99856
+ await runTeardownPhase({ cwd: cwd2, teardownScript: cfg.teardownScript }, { runScript, log: log2, emit: emit2 });
99857
+ return effectiveCode;
99858
+ }
99801
99859
  if (effectiveCode !== 0 && wantPr) {
99802
99860
  log2(` skipping PR phase for ${changeName} (worker exited with code ${effectiveCode})`, "gray");
99803
99861
  }
@@ -99871,7 +99929,18 @@ async function runPostTask(input, deps) {
99871
99929
  await runTeardownPhase({ cwd: cwd2, teardownScript: cfg.teardownScript }, { runScript, log: log2, emit: emit2 });
99872
99930
  return effectiveCode;
99873
99931
  }
99874
- var CI_FAILED_EXIT = 70, PR_FAILED_EXIT = 71, repoAutoMergeCache;
99932
+ var CI_FAILED_EXIT = 70, PR_FAILED_EXIT = 71, repoAutoMergeCache, defaultRunCommand = async (cmd, cwd2) => {
99933
+ const proc = Bun.spawnSync({
99934
+ cmd: ["sh", "-c", cmd],
99935
+ cwd: cwd2,
99936
+ stdout: "pipe",
99937
+ stderr: "pipe"
99938
+ });
99939
+ const decoder = new TextDecoder;
99940
+ const output = [decoder.decode(proc.stdout), decoder.decode(proc.stderr)].filter(Boolean).join(`
99941
+ `);
99942
+ return { exitCode: proc.exitCode ?? 1, output };
99943
+ };
99875
99944
  var init_post_task = __esm(() => {
99876
99945
  init_tasks_md();
99877
99946
  init_fs_change();
@@ -100059,6 +100128,23 @@ function createSpawnWorker(input) {
100059
100128
  const wantAutoMerge = issueForChange ? issueMatchesGetIndicator(issueForChange, indicators.getAutoMerge) : false;
100060
100129
  const wrapped = handle.exited.then(async (code) => {
100061
100130
  const workerLayout = projectLayout(cwd2);
100131
+ const validateSpecPath = join30(workerLayout.changeDir(changeName), "specs", "validate.md");
100132
+ const hasValidateSpec = await Bun.file(validateSpecPath).exists();
100133
+ const wantValidateOnly = hasValidateSpec && !wantPrBase;
100134
+ if (hasValidateSpec) {
100135
+ try {
100136
+ const stateFile = workerLayout.stateFile(changeName);
100137
+ const sf = Bun.file(stateFile);
100138
+ if (await sf.exists()) {
100139
+ const stateData = JSON.parse(await sf.text());
100140
+ if (!stateData.validateOnComplete) {
100141
+ stateData.validateOnComplete = true;
100142
+ stateData.createPr = false;
100143
+ await Bun.write(stateFile, JSON.stringify(stateData, null, 2));
100144
+ }
100145
+ }
100146
+ } catch {}
100147
+ }
100062
100148
  try {
100063
100149
  const prevTasks = await prevTasksPromise;
100064
100150
  const nextFile = Bun.file(missionTasksPath);
@@ -100104,6 +100190,7 @@ function createSpawnWorker(input) {
100104
100190
  wantPr,
100105
100191
  wantFixCi,
100106
100192
  wantAutoMerge,
100193
+ wantValidateOnly,
100107
100194
  cfg: {
100108
100195
  teardownScript: cfg.teardownScript ?? null,
100109
100196
  prBaseBranch: cfg.prBaseBranch,
@@ -100115,7 +100202,8 @@ function createSpawnWorker(input) {
100115
100202
  stackPrsOnDependencies: args.stackPrs || cfg.stackPrsOnDependencies,
100116
100203
  neverTouch: cfg.boundaries.never_touch,
100117
100204
  metaOnlyFiles: cfg.boundaries.meta_only_files,
100118
- manualMergeWhenAutoMergeDisabled: cfg.manualMergeWhenAutoMergeDisabled
100205
+ manualMergeWhenAutoMergeDisabled: cfg.manualMergeWhenAutoMergeDisabled,
100206
+ validateCommands: [cfg.commands.test, cfg.commands.lint, cfg.commands.typecheck].filter((c) => Boolean(c))
100119
100207
  },
100120
100208
  respawnWorker: respawn
100121
100209
  }, {
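Read together, the hunks above add a validate-only post-task path: it applies when `specs/validate.md` exists for the change and no PR is requested. It runs the configured test, lint, and typecheck commands in order, then prepends a task to the change's task file and respawns the worker. The sketch below is one reading of that flow, not the bundled code. The names `decideValidateOnly` and `runValidateOnly`, and the injected `runCommand`, `prependFixTask`, and `respawnWorker` callbacks, are hypothetical stand-ins. State reactivation and the error path (returning `1` when the task cannot be prepended) are omitted for brevity.

```javascript
// Hypothetical sketch of the validate-only flow shown in this diff.
// None of these names are exported by the package; they are illustration only.

// Validate-only applies when the change has a specs/validate.md file
// and no PR is wanted for it.
function decideValidateOnly(hasValidateSpec, wantPr) {
  return hasValidateSpec && !wantPr;
}

// Runs each configured check in order. On the first non-zero exit it prepends
// a fix task and respawns the worker. If every check passes, it prepends an
// "openspec validate" task instead and respawns the worker.
async function runValidateOnly({ changeName, validateCommands }, deps) {
  const { runCommand, prependFixTask, respawnWorker } = deps;

  for (const command of validateCommands) {
    const { exitCode, output } = await runCommand(command);
    if (exitCode !== 0) {
      await prependFixTask(
        `Fix failing validation: ${command}`,
        output || `Command exited with code ${exitCode}`,
      );
      return respawnWorker();
    }
  }

  await prependFixTask(
    "Run openspec validation",
    `Run \`bunx openspec validate ${changeName}\` to validate the change artifacts.`,
  );
  return respawnWorker();
}
```

Injecting the command runner mirrors the `deps.runCommand ?? defaultRunCommand` fallback in the diff, which lets the flow be exercised with stubs instead of spawning `sh -c`.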
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@neriros/ralphy",
3
- "version": "3.8.9",
3
+ "version": "3.8.10",
4
4
  "description": "An iterative AI task execution framework. Orchestrates multi-phase autonomous work using Claude or Codex engines.",
5
5
  "keywords": [
6
6
  "agent",