npm - @neriros/ralphy - Versions diffs - 3.8.9 → 3.8.11 - Mend

@neriros/ralphy 3.8.9 → 3.8.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -3,512 +3,127 @@
 [![npm version](https://img.shields.io/npm/v/@neriros/ralphy.svg)](https://www.npmjs.com/package/@neriros/ralphy)
 [![npm downloads](https://img.shields.io/npm/dm/@neriros/ralphy.svg)](https://www.npmjs.com/package/@neriros/ralphy)
 [![license](https://img.shields.io/npm/l/@neriros/ralphy.svg)](https://github.com/NeriRos/ralphy/blob/main/LICENSE)
-[![GitHub stars](https://img.shields.io/github/stars/NeriRos/ralphy.svg?style=social)](https://github.com/NeriRos/ralphy)
-[![GitHub issues](https://img.shields.io/github/issues/NeriRos/ralphy.svg)](https://github.com/NeriRos/ralphy/issues)
 [![Bun](https://img.shields.io/badge/runtime-Bun-fbf0df.svg)](https://bun.sh)
-An iterative AI task execution framework. Ralphy orchestrates autonomous work using Claude or Codex with built-in state management, progress tracking, and cost safeguards. It can run as a one-shot task or as a long-lived **agent** that polls Linear, ships PRs, and iterates with reviewers.
+An iterative AI task execution framework. Ralphy runs Claude or Codex in a checklist-driven loop with state on disk, cost safeguards, and a long-lived **agent** that polls Linear, opens PRs, and iterates with reviewers.
-## Contents
+> 📘 Full reference — Linear indicators, lifecycle, PR/CI flow, CLI flags, MCP — lives in **[GUIDE.md](./GUIDE.md)**.
-- [How it works](#how-it-works)
-- [Install](#install)
-- [Task mode](#task-mode) — single-task / single-loop usage
-- [Agent mode](#agent-mode) — Linear-driven autonomous loop
-  - [Lifecycle and triggers](#lifecycle-and-triggers)
-  - [Linear indicators](#linear-indicators)
-  - [PR + CI integration](#pr--ci-integration)
-  - [Worktrees, setup, teardown](#worktrees-setup-teardown)
-  - [Dashboard and logs](#dashboard-and-logs)
-- [CLI reference](#cli-reference)
-- [Change layout (OpenSpec)](#change-layout-openspec)
-- [MCP server](#mcp-server)
-- [Project structure and development](#project-structure-and-development)
+## Features
-## How it works
+**Loop**
-Ralphy runs a single continuous loop against an OpenSpec change — no phases, no phase transitions.
+- **Checklist-driven** — one unchecked task per iteration; state persists on disk so any run can be resumed.
+- **Engine choice** — Claude (haiku / sonnet / opus) or Codex, swappable per task.
+- **Safeguards** — `--max-iterations`, `--max-cost`, `--max-runtime`, `--max-failures` cap any runaway run.
+- **OpenSpec layout** — `proposal.md` (steering) + `design.md` + `tasks.md` + `specs/` per change.
-```mermaid
-graph LR
-    S[Start iteration] --> R[Read Steering] --> T[Find first unchecked task] --> W[Do the work] --> V[Validate] --> C[Check off task] --> S
-    T -->|all tasks checked| D[Archive change]
-```
+**Agent mode (Linear-driven)**
-Each iteration reads the `## Steering` section of `proposal.md`, picks the first unchecked item from `tasks.md`, does the work, validates, and checks the item off. When every item is checked the loop archives the change.
+- **Linear polling** — picks up Todo tickets, resumes In Progress, re-runs reviewer-flagged Done.
+- **Indicators** — declarative `WORKFLOW.md` map for "which labels/statuses to watch and apply" at each lifecycle event.
+- **Worktrees** — every task runs in its own `git worktree` so concurrent workers can't stomp on each other.
+- **Confirmation gate** — optional human approval step between `tasks` and `implement`; revise via `@ralphy revise: <why>`.
+- **Self-review phase** — once tasks are checked off, an in-process reviewer can append more work for another round.
+- **Tmux session management** — `ralphy agent` re-execs into a managed tmux session so detaching the terminal doesn't kill the loop.
+- **Pre-existing error check** — pauses pickups when the trunk is red so the agent doesn't chase failures it didn't cause.
-## Install
+**PR + CI**
-Requires [Bun](https://bun.sh). For the Claude engine you also need the [Claude CLI](https://docs.anthropic.com/en/docs/claude-cli). The Makefile install path additionally needs `jq`.
+- **Auto PR open** — push branch and `gh pr create` on clean exit; idempotent (surfaces existing PR if open).
+- **Auto-merge opt-in** — `getAutoMerge` triggers `gh pr merge --auto --squash|merge|rebase` right after PR creation.
+- **Stacked PRs** — `--stack-prs` opens against a blocker's head branch when a `blocked_by` Linear relation has exactly one open PR.
+- **CI fix loop** — on red CI, pulls failed logs, appends to steering, re-spawns until green or `maxCiFixAttempts` hit.
+- **Conflict re-fix** — `gh pr view`–driven; on `mergeable: CONFLICTING` enqueues a conflict-resolution task automatically.
-```bash
-# Global (recommended)
-npm install -g @neriros/ralphy
-# or run without installing
-bunx @neriros/ralphy
+**Reviewer interaction**
-# Per-project install (builds + wires .ralph/ into the repo)
-bun install
-make install            # → ./.ralph
-make install ~          # → ~/.ralph
-make install /path/to   # → /path/to/.ralph
-```
+- **`@ralphy` mentions** — Linear comments _and_ GitHub PR comments trigger a fresh review run with the mention as the prompt.
+- **Code-review iteration** — unresolved review-thread comments queue a digest; Ralph agrees-and-fixes (resolving the thread) or disagrees-and-replies.
+- **Sticky task comment** — `tasks.md` mirrors into a single Linear comment that updates in place; a one-shot "📋 Plan" comment summarises proposal + design when planning completes.
-The per-project install builds the CLI and MCP server, copies them to `.ralph/bin/`, sets up templates, wires `.mcp.json`, and adds a `ralph` script to `package.json`. `.ralph/` is gitignored by default.
+**Observability**
-## Task mode
+- **Ink dashboard** — engine/model, poll-bucket breakdown, per-worker cards with live phase, command-in-flight, and stdout tail.
+- **Structured JSON event stream** — `--json-output` for CI; `--json-log-file` mirrors the same stream to disk.
+- **Per-worker logs** — `~/.ralph/agent-mode.log` (global) + `.ralph/logs/<change>.log` (per-task) + per-change `LOG.jsonl`.
-```bash
-# Create + run a new task
-ralphy loop task --name fix-auth --prompt "Fix the JWT validation bug" --claude opus --max-iterations 10
-# Resume the same task later (state is on disk)
-ralphy loop task --name fix-auth
-# Inspect
-ralphy agent list                    # local tasks + Linear tickets per indicator bucket (with linked PR URLs)
-ralphy loop status --name fix-auth  # one task (details)
-```
-Engine defaults to Claude Opus. Common safeguards: `--max-iterations`, `--max-cost`, `--max-runtime`, `--max-failures`. See the [CLI reference](#cli-reference) for the full set.
-## Agent mode
-`ralph agent` polls Linear, runs up to N concurrent task loops, and (optionally) opens PRs, watches CI, and iterates with reviewers. Requires `LINEAR_API_KEY`.
-```bash
-export LINEAR_API_KEY=lin_api_xxx
-ralphy agent --linear-team ENG --linear-assignee me --concurrency 3 --poll-interval 60
-```
+**Extensibility**
-Configuration lives in **`WORKFLOW.md`** at the project root — YAML frontmatter for settings, followed by a Jinja-style prompt template the worker renders for every iteration. A default is written on first run. CLI flags override config per-invocation.
+- **MCP server** — exposes `ralph_list_changes` / `get_change` / `create_change` / `append_steering` / `stop` to Claude-side agents (auto-wired on per-project install).
+- **`WORKFLOW.md` template body** — Jinja-style prompt rendered per iteration, so project-specific rules / boundaries / labels flow into every task automatically.
-### Lifecycle and triggers
-Each poll inspects Linear (and, when configured, GitHub PRs) and routes each issue into one of these spawn modes:
-| Mode             | When it fires                                                                                                                    | What changes                                                                                      |
-| ---------------- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
-| **fresh**        | Issue matches `getTodo`                                                                                                          | Scaffold a new change, spawn worker, apply `setInProgress`                                        |
-| **resume**       | Issue matches `getInProgress` (typical: agent restart)                                                                           | Re-attach to existing change directory, skip re-scaffold                                          |
-| **conflict-fix** | A tracked PR (`setDone` candidate _or_ an in-progress ticket's PR) is detected as `CONFLICTING` via `gh pr view`                 | Interrupt resume if needed, prepend a conflict-resolution task to `tasks.md`, reactivate state    |
-| **ci-fix**       | A tracked PR's CI is red (`gh pr view --json statusCheckRollup`) and `fixCiOnFailure` is enabled                                 | Prepend a "Fix failing CI checks" task with gh-driven log inspection; reactivate state            |
-| **review**       | Done issue carries the `getReview` marker (label trigger), _or_ a `@ralphy` mention is detected on Linear / the linked GitHub PR | Prepend a review task with the relevant comments; remove the `clearReview` label after pickup     |
-| **code-review**  | Open tracked PR has unresolved review-thread comments newer than Ralph's last pickup ack                                         | Prepend a digest of unresolved comments with fix-or-reply instructions; repeats until PR approved |
-> `conflict-fix` and `ci-fix` are routed entirely from GitHub state — there is no Linear `getConflicted` / `getCiFailed` indicator anymore. The merge-state scan reads `gh pr view` directly and enqueues the matching fix trigger.
+## How it works
 ```mermaid
-flowchart TD
-    POLL["Linear poll"] --> SCAN{trigger?}
-    SCAN -- "getTodo" --> FRESH["mode: fresh\nscaffold change"]
-    SCAN -- "getInProgress" --> RESUME["mode: resume"]
-    SCAN -- "gh: PR CONFLICTING" --> CFX["mode: conflict-fix\nprepend fix task"]
-    SCAN -- "gh: PR CI red\n(fixCiOnFailure)" --> CIFX["mode: ci-fix\nprepend CI fix task"]
-    SCAN -- "getReview\nor @ralphy mention\n(Linear / GitHub)" --> REV["mode: review\nprepend comments"]
-    SCAN -- "open PR with new\nunresolved review comments" --> CR["mode: review (code-review)\nprepend thread digest"]
-    FRESH & RESUME & CFX & CIFX & REV & CR --> IN_PROG["Linear: setInProgress\npost pickup comment"]
-    IN_PROG --> WT{useWorktree?}
-    WT -- yes --> SCAFFOLD["create worktree + branch"] --> WORKER([worker loop])
-    WT -- no --> WORKER
-    WORKER --> EXIT{exit code}
-    EXIT -- non-zero --> ERR_FLOW
-    EXIT -- 0 --> WANT_PR{wantPr?}
-    WANT_PR -- no --> DONE_FLOW
-    WANT_PR -- yes --> PR["push + gh pr create\n↺ rebase / hook-fix"]
-    PR -- "no commits" --> DONE_FLOW
-    PR -- "opened" --> WATCH
-    subgraph WATCH["watch loop"]
-        direction LR
-        WATCH_CHECK["conflict-check"] --> WATCH_CI["ci-poll / ci-fix"]
-        WATCH_CI --> WATCH_CHECK
-    end
-    WATCH -- "green & clean" --> DONE_FLOW
-    WATCH -- "gave up" --> ERR_FLOW
-    subgraph DONE_FLOW["clean exit"]
-        D1["worktree cleanup\n(if configured)"] --> D2["teardown script"] --> D5["Linear: setDone"]
-    end
-    subgraph ERR_FLOW["failure"]
-        E1["worktree preserved"] --> E2["Linear: setError\nclearInProgress"]
-    end
-    D5 & E2 --> POLL
-```
-The cycle repeats every poll. For code-review iteration in particular, `setDone` re-applies between rounds so the next poll re-checks for new reviewer activity, until the PR is approved or merged.
-### Linear indicators
-Linear is the source of truth for which issues Ralph has touched. The `linear.indicators` map declares how Ralph queries and mutates Linear at each lifecycle event. All keys are optional; an unset key means "Ralph doesn't perform that action".
-| Key             | Type                   | Purpose                                                                         |
-| --------------- | ---------------------- | ------------------------------------------------------------------------------- |
-| `getTodo`       | `{filter: Marker[]}`   | Issues to pick up (fresh)                                                       |
-| `getInProgress` | `{filter: Marker[]}`   | Issues to resume after restart                                                  |
-| `getReview`     | `{filter: Marker[]}`   | Done issues flagged for review follow-up                                        |
-| `getAutoMerge`  | `{filter: Marker[]}`   | Issues whose PR should be auto-merged once required checks pass                 |
-| `setInProgress` | `Marker` or `Marker[]` | Applied when a worker spawns (any non-resume mode)                              |
-| `setDone`       | `Marker` or `Marker[]` | Applied on clean exit                                                           |
-| `setError`      | `Marker` or `Marker[]` | Applied on non-zero exit (quarantine signal — issue is _not_ auto-resumed)      |
-| `clearReview`   | `Marker` or `Marker[]` | Label(s) removed when a review pickup happens (status removal is not supported) |
-| `getApproved`   | `{filter: Marker[]}`   | Approval signal that releases a confirmation-gated ticket into `implement`      |
-| `clearApproved` | `Marker` or `Marker[]` | Label(s) removed once an approval is consumed (status removal is not supported) |
-> Conflict and CI-failure routing no longer use Linear indicators — there's no `getConflicted` / `setConflicted` / `clearConflicted` (or `getCiFailed` / `setCiFailed` / `clearCiFailed`). GitHub is the source of truth: `gh pr view` produces the conflicted / ci-failed / mergeable counts and pushes `conflict-fix` / `ci-fix` queue entries directly.
-A `Marker` is one of three types:
-| Marker type    | Example value         | Effect                                                                                                                                                                                                                                    |
-| -------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `"label"`      | `"ralph:in-progress"` | Adds or removes a Linear label on the issue                                                                                                                                                                                               |
-| `"status"`     | `"In Progress"`       | Updates the Linear workflow status of the issue                                                                                                                                                                                           |
-| `"attachment"` | `"In Progress"`       | Upserts a single **Ralphy** attachment on the issue; `value` becomes the subtitle. The same entry is reused across every lifecycle transition — Ralph creates it on first apply and edits it on subsequent ones, so the issue stays tidy. |
-Use an array when one event sets multiple — e.g. `setDone` flipping a status _and_ adding a label _and_ updating the attachment subtitle.
-Example `WORKFLOW.md` frontmatter — the prompt template after the closing `---` is omitted here; see the bundled default for the full file:
-```yaml
----
-concurrency: 3
-pollIntervalSeconds: 60
-engine: claude
-model: opus
-useWorktree: true
-createPrOnSuccess: true
-autoMergeStrategy: squash
-fixCiOnFailure: true
-linear:
-  team: ENG
-  assignee: me
-  postComments: true
-  updateEveryIterations: 10
-  mentionTrigger: true
-  mentionHandle: "@ralphy"
-  codeReviewTrigger: true
-  codeReviewStaleHours: 24
-  syncTasksToComment: true
-  syncSpecsAsAttachments: true
-  indicators:
-    # Todo → In Progress
-    getTodo:
-      filter:
-        - type: status
-          value: Todo
-    getInProgress:
-      filter:
-        - type: status
-          value: In Progress
-    setInProgress:
-      type: status
-      value: In Progress
-    # Done / review hand-off
-    setDone:
-      - type: status
-        value: In Review
-      - type: label
-        value: ralphy-done
-    getReview:
-      filter:
-        - type: label
-          value: "ralph:review"
-    clearReview:
-      type: label
-      value: "ralph:review"
-    # Auto-merge opt-in
-    getAutoMerge:
-      filter:
-        - type: label
-          value: "ralph:auto-merge"
-    # Error quarantine
-    setError:
-      type: label
-      value: "ralph:error"
----
+graph LR
+    S[Start iteration] --> R[Read Steering] --> T[First unchecked task] --> W[Do the work] --> V[Validate] --> C[Check off] --> S
+    T -->|all checked| D[Archive change]
 ```
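The diagram's "first unchecked task" and "check off" steps can be sketched in TypeScript as below. This assumes `tasks.md` uses GitHub-style `- [ ]` / `- [x]` checkboxes. `firstUncheckedTask` and `checkOff` are hypothetical names and are not part of Ralphy's API.

```typescript
// Hypothetical helpers (not Ralphy's API): find and tick the first open checkbox.
// Assumes GitHub-style task items such as "- [ ] Write tests".
const UNCHECKED = /^(\s*[-*]\s+)\[ \](\s+.*)$/;

interface OpenTask {
  index: number; // zero-based line number inside tasks.md
  text: string; // task text without the checkbox
}

function firstUncheckedTask(tasksMd: string): OpenTask | null {
  const lines = tasksMd.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = UNCHECKED.exec(lines[i]);
    if (m) return { index: i, text: m[2].trim() };
  }
  return null; // every item is checked: the loop would archive the change
}

function checkOff(tasksMd: string, index: number): string {
  const lines = tasksMd.split("\n");
  // Rewrite "[ ]" to "[x]" on the chosen line only, keeping indentation and text.
  lines[index] = lines[index].replace(UNCHECKED, "$1[x]$2");
  return lines.join("\n");
}
```

When nothing is unchecked, the sketch returns `null`. This matches the loop's exit condition, which is the point where the change would be archived.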
-#### Confirmation mode (human gate before `implement`)
-Set `linear.confirmationMode.enabled: true` to insert a human review step between the OpenSpec `tasks` and `implement` phases. Once the agent finishes drafting `tasks.md`, the ticket parks in the new `awaiting-confirmation` phase and Ralphy posts a one-shot **📋 Ralphy plan ready** Linear comment summarising the plan. Gated tickets do **not** consume a `concurrency` slot — the agent is free to pick up other work while waiting.
-Three signals release (or skip) the gate:
+Each iteration reads `## Steering` from `proposal.md`, picks the first unchecked item in `tasks.md`, does the work, validates, and checks it off. When every item is checked the loop archives the change.
-| Signal                          | Effect                                                                                                                                                |
-| ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Apply the `getApproved` marker  | Ralphy strips it via `clearApproved`, records the approval, and advances the ticket into `implement`.                                                 |
-| Comment `@ralphy revise: <why>` | The reason is written into steering, the round counter bumps, the ticket loops back to `design`. Any in-flight worker is reaped immediately.          |
-| Apply `optOutLabel`             | (default `ralph:auto-approve`) Bypasses the gate entirely — the ticket flows straight through `tasks` → `implement` as if confirmation mode were off. |
-By default confirmation mode applies to every ticket. Set `linear.confirmationMode.optInLabel` (e.g. `ralph:needs-review`) to flip the polarity — only tickets carrying that label go through the gate; everything else implements straight through.
-After `timeoutHours` (default `48`) with no activity Ralphy posts a single nudge comment per round. Tickets that exceed `maxConfirmationRounds` (default `3`) are labelled `ralph:stuck` and skipped on future polls until a human intervenes.
+## Install
-Wire up the matching indicators alongside the rest of the `linear.indicators` map:
+Requires [Bun](https://bun.sh). The Claude engine also needs the [Claude CLI](https://docs.anthropic.com/en/docs/claude-cli).
-```yaml
-getApproved:
-  filter:
-    - type: label
-      value: "ralph:approved"
-clearApproved:
-  type: label
-  value: "ralph:approved"
+```bash
+npm install -g @neriros/ralphy   # or: bunx @neriros/ralphy
 ```
-See `linear.confirmationMode` in `WORKFLOW.md` for the full set of knobs.
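The confirmation-gate rules above can be sketched as pure functions. They cover the gate being on by default, `optInLabel` flipping the polarity, the `optOutLabel` bypass, the `maxConfirmationRounds` cap, and the `@ralphy revise: <why>` command. This is an illustration under assumptions. The `ConfirmationMode` shape and the helper names are hypothetical. The precedence when a ticket carries both the opt-in and opt-out labels is a guess.

```typescript
// Hypothetical model of linear.confirmationMode; shapes are assumptions.
interface ConfirmationMode {
  enabled: boolean;
  optInLabel?: string; // when set, only tickets carrying it are gated
  optOutLabel?: string; // README default: "ralph:auto-approve"
  maxConfirmationRounds?: number; // README default: 3
}

function isGated(labels: string[], cfg: ConfirmationMode): boolean {
  if (!cfg.enabled) return false;
  const optOut = cfg.optOutLabel ?? "ralph:auto-approve";
  if (labels.includes(optOut)) return false; // assumption: opt-out beats opt-in
  if (cfg.optInLabel) return labels.includes(cfg.optInLabel);
  return true; // default polarity: every ticket goes through the gate
}

// Tickets that exceed the round cap would be labelled ralph:stuck.
function isStuck(rounds: number, cfg: ConfirmationMode): boolean {
  return rounds > (cfg.maxConfirmationRounds ?? 3);
}

// Extracts <why> from "@ralphy revise: <why>"; returns null for other comments.
function parseRevise(comment: string, handle = "@ralphy"): string | null {
  const escaped = handle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const m = new RegExp(`${escaped}\\s+revise:\\s*(.+)`, "is").exec(comment);
  return m ? m[1].trim() : null;
}
```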
-#### Review follow-ups (label trigger)
-When a Linear issue is in a done state and a reviewer adds the `getReview` marker (typically a label like `ralph:review` left alongside comments), Ralph picks it up, applies `setInProgress`, removes the `clearReview` label so the trigger doesn't re-fire, filters out Ralph's own comments, and prepends every reviewer comment as a fresh task at the top of `tasks.md`. `setDone` re-applies on clean exit.
-#### `@ralphy` mention trigger
-Set `linear.mentionTrigger: true` to scan Linear issue comments on every non-cancelled issue (Todo, In Progress, Backlog, Triage, Done) _and_ on the linked GitHub PR for a configurable handle (`linear.mentionHandle`, default `@ralphy`). Each unprocessed mention queues the issue as a review run, with the mention text used **verbatim** as the prepended task. Idempotency: a mention counts as already handled once its `createdAt` is older than Ralph's latest `🔁 picked up` Linear comment, so the same comment never re-fires. Requires `gh` for the GitHub side.
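The mention idempotency rule can be sketched as follows. The sketch assumes each Linear or GitHub comment is normalized into a hypothetical `IssueComment` record. It also assumes Ralph's pickup ack contains the literal text `🔁 picked up`. Only mentions created after the latest ack are still pending.

```typescript
// Hypothetical normalized comment record (Linear and GitHub merged into one list).
interface IssueComment {
  body: string;
  createdAt: string; // ISO-8601 timestamp
  byRalph: boolean;
}

function latestPickupAt(comments: IssueComment[]): number {
  let latest = -Infinity; // stays -Infinity if Ralph never picked the issue up
  for (const c of comments) {
    if (c.byRalph && c.body.includes("🔁 picked up")) {
      latest = Math.max(latest, Date.parse(c.createdAt));
    }
  }
  return latest;
}

// Mentions created after Ralph's latest pickup ack are still unprocessed.
function unprocessedMentions(comments: IssueComment[], handle = "@ralphy"): IssueComment[] {
  const cutoff = latestPickupAt(comments);
  return comments.filter(
    (c) => !c.byRalph && c.body.includes(handle) && Date.parse(c.createdAt) > cutoff,
  );
}
```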
-#### Code-review iteration
+## Task mode — one-shot loop
-Set `linear.codeReviewTrigger: true` (or pass `--code-review`) to watch open, unmerged, unapproved tracked PRs for unresolved review-thread comments. New activity on any unresolved thread queues a review run whose task is a digest of every unresolved comment + instructions:
-- **If Ralph agrees** with a comment — fix, commit, push, and resolve the thread (via `gh api graphql`'s `resolveReviewThread`).
-- **If Ralph disagrees** — reply on the thread with reasoning via `gh api .../comments/{id}/replies` and leave it unresolved.
-The loop exits; the next poll re-checks the PR. The cycle continues until the PR is **approved** or **merged**. If the reviewer is silent for more than `linear.codeReviewStaleHours` (default `24`, `0` disables) while Ralph is the last actor, one `@`-mention ping comment is posted on the GitHub PR.
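The two GitHub-side actions above can be sketched as follows.

- `resolveReviewThread` is GitHub's GraphQL mutation. Passing it through `gh api graphql` with `-f` fields is standard `gh` usage.
- The ping check encodes the stated rule: the reviewer has been silent for longer than `codeReviewStaleHours`, Ralph is the last actor, `0` disables the ping, and only one ping is sent.

The helper names and the `ReviewState` record are hypothetical.

```typescript
// Builds argv for: gh api graphql -f query=... -f id=<threadId>
function resolveThreadArgs(threadId: string): string[] {
  const query =
    "mutation($id: ID!) { resolveReviewThread(input: {threadId: $id}) { thread { isResolved } } }";
  return ["api", "graphql", "-f", `query=${query}`, "-f", `id=${threadId}`];
}

// Hypothetical summary of a tracked PR's review activity.
interface ReviewState {
  lastActor: "ralph" | "reviewer";
  lastActivityAt: number; // epoch ms of the latest thread activity
  pinged: boolean; // whether the one-shot ping was already posted
}

function shouldPingReviewer(s: ReviewState, now: number, staleHours = 24): boolean {
  if (staleHours === 0 || s.pinged || s.lastActor !== "ralph") return false;
  return now - s.lastActivityAt > staleHours * 3_600_000;
}
```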
-#### Self-review phase
+```bash
+ralphy loop task --name fix-auth --prompt "Fix the JWT validation bug" --claude opus --max-iterations 10
-Once every task in `tasks.md` is checked off, the worker can spawn an in-process reviewer pass before exiting. The reviewer reads `proposal.md`, `design.md`, and the diff, and either appends new tasks back into `tasks.md` (looping the worker for another round) or signs off. Configure under `openspec.reviewPhase`:
+# Resume later (state is on disk)
+ralphy loop task --name fix-auth
-```yaml
-openspec:
-  reviewPhase:
-    enabled: true
-    maxRounds: 2 # hard cap on review iterations (default 1)
-    reviewerModel: claude-sonnet-4-6 # override the reviewer's model (optional)
-    reviewerContextStrategy: fresh # "fresh" = clean context per round (default), "warm" = reuse worker context
+# Inspect
+ralphy loop status --name fix-auth
 ```
-CLI equivalents: `--review-enabled`, `--review-max-rounds <N>`, `--review-model <id>`, `--review-context-strategy fresh|warm`. The worker passes these to itself when respawning, so the same review settings apply across `respawn` / `conflict-fix` / `ci-fix` lifecycles.
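Below is a minimal sketch of how a worker might rebuild those flags from `openspec.reviewPhase` before respawning itself. The `reviewFlags` helper is hypothetical. Passing each value as a separate argv entry, rather than as `--flag=value`, is an assumption.

```typescript
// Hypothetical mirror of openspec.reviewPhase.
interface ReviewPhase {
  enabled: boolean;
  maxRounds?: number;
  reviewerModel?: string;
  reviewerContextStrategy?: "fresh" | "warm";
}

// Re-derive the CLI flags so respawn / conflict-fix / ci-fix inherit the same review settings.
function reviewFlags(r: ReviewPhase): string[] {
  if (!r.enabled) return [];
  const args = ["--review-enabled"];
  if (r.maxRounds !== undefined) args.push("--review-max-rounds", String(r.maxRounds));
  if (r.reviewerModel) args.push("--review-model", r.reviewerModel);
  if (r.reviewerContextStrategy) args.push("--review-context-strategy", r.reviewerContextStrategy);
  return args;
}
```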
-#### Sync tasks into a Linear comment
-`linear.syncTasksToComment` (default `true`) mirrors the active change's
-`tasks.md` into a dedicated Linear **comment** instead of the issue
-description. The same comment is updated in place across iterations so
-the timeline stays clean. When `ralph_append_steering` is invoked the
-existing tasks comment is deleted and re-posted so it always lands at
-the bottom of the timeline, after the new steering comment.
-The first time planning completes (every `- [ ]` under `## Planning` in
-`tasks.md` becomes `- [x]`), Ralph posts a one-shot "📋 Plan" comment
-summarizing `proposal.md` (`## Why` + `## What Changes`) and the first
-paragraph of `design.md`.
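The "planning complete" condition can be sketched as a scan of `tasks.md`. The sketch assumes `## Planning` is a level-2 heading whose section ends at the next level-2 heading. `planningComplete` is a hypothetical name. Treating an empty Planning section as incomplete is also an assumption.

```typescript
// True when every checkbox between "## Planning" and the next "## " heading is ticked.
function planningComplete(tasksMd: string): boolean {
  let inPlanning = false;
  let checked = 0;
  for (const line of tasksMd.split("\n")) {
    if (/^##\s/.test(line)) {
      // Level-2 heading: enter or leave the Planning section ("###" does not match).
      inPlanning = /^##\s+Planning\s*$/.test(line);
      continue;
    }
    if (!inPlanning) continue;
    if (/^\s*[-*]\s+\[ \]/.test(line)) return false; // an open planning item
    if (/^\s*[-*]\s+\[[xX]\]/.test(line)) checked++;
  }
  return checked > 0; // assumption: an empty Planning section is not "complete"
}
```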
-#### Conflict re-fix / CI re-fix
-Every poll, the merge-state scanner reads `gh pr view --json state,mergeable,mergeStateStatus,statusCheckRollup` for each tracked PR:
-- **`mergeable === "CONFLICTING"`** (or `mergeStateStatus === "DIRTY"`) → enqueue a `conflict-fix` run that prepends a conflict-resolution task to `tasks.md` and re-activates the change. In-progress tickets are interrupted in favour of fixing the merge state.
-- **`statusCheckRollup` shows red CI** and `fixCiOnFailure` is enabled → enqueue a `ci-fix` run that prepends a "Fix failing CI checks" task with `gh run view --log-failed` steps so the worker can read the failure logs.
-No Linear labels are involved in either path — `gh` is the single source of truth, and the matching `conflict-fix` / `ci-fix` queue entries land directly. A one-line Linear comment is posted for visibility when a ticket is promoted into a fix flow.
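The scanner's classification step can be sketched over the JSON that `gh pr view --json state,mergeable,mergeStateStatus,statusCheckRollup` returns. The field names come from `gh`. The following are assumptions about Ralphy's behavior:

- which conclusions count as red CI;
- checking conflicts before CI;
- honoring `ignoreCiChecks` by check name.

```typescript
// One entry of statusCheckRollup: CheckRuns carry name/conclusion, StatusContexts carry context/state.
interface RollupItem {
  name?: string;
  context?: string;
  conclusion?: string; // e.g. "SUCCESS" | "FAILURE"
  state?: string; // e.g. "SUCCESS" | "FAILURE" | "ERROR"
}

interface PrView {
  state: string; // "OPEN" | "CLOSED" | "MERGED"
  mergeable: string; // "MERGEABLE" | "CONFLICTING" | "UNKNOWN"
  mergeStateStatus: string; // e.g. "CLEAN" | "DIRTY" | "BLOCKED" | "UNKNOWN"
  statusCheckRollup: RollupItem[];
}

// Assumption: these outcomes count as "red CI".
const RED_CONCLUSIONS = new Set(["FAILURE", "ERROR", "TIMED_OUT", "STARTUP_FAILURE"]);

function classifyPr(pr: PrView, fixCiOnFailure: boolean, ignoreCiChecks: string[] = []): "conflict-fix" | "ci-fix" | null {
  if (pr.state !== "OPEN") return null;
  if (pr.mergeable === "CONFLICTING" || pr.mergeStateStatus === "DIRTY") return "conflict-fix";
  const red = pr.statusCheckRollup.some((c) => {
    const checkName = c.name ?? c.context ?? "";
    if (ignoreCiChecks.includes(checkName)) return false;
    return RED_CONCLUSIONS.has(c.conclusion ?? c.state ?? "");
  });
  return red && fixCiOnFailure ? "ci-fix" : null;
}
```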
-The scanner is resilient to:
-- Transient `gh` failures (failed PR-discovery is cached with a 10-minute TTL — not permanent).
-- Branch-name drift after a Linear title edit (falls back to `gh pr list --search "<ID> in:title state:open"`).
-- GitHub's async `UNKNOWN` mergeability response (Fibonacci backoff up to ~31s total; also consults `mergeStateStatus`, which often resolves before `mergeable` does).
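The `UNKNOWN` handling can be sketched as a Fibonacci backoff under a total time budget. The base delay (1 s here) and the budget are assumptions. With those values the schedule is 1, 1, 2, 3, 5, 8 seconds (20 s in total), because the next 13 s step would overrun a 31 s budget. `fetchState` and `sleep` are injected so the loop stays testable. Both are hypothetical parameters.

```typescript
// Fibonacci delays (in ms) whose running total never exceeds budgetMs.
function fibonacciDelays(baseMs: number, budgetMs: number): number[] {
  const delays: number[] = [];
  let [a, b] = [1, 1];
  let total = 0;
  while (total + a * baseMs <= budgetMs) {
    delays.push(a * baseMs);
    total += a * baseMs;
    [a, b] = [b, a + b];
  }
  return delays;
}

type Mergeability = { mergeable: string; mergeStateStatus: string };

async function awaitMergeability(
  fetchState: () => Promise<Mergeability>,
  sleep: (ms: number) => Promise<void>,
  delays: number[],
): Promise<Mergeability> {
  let s = await fetchState();
  for (const ms of delays) {
    // mergeStateStatus often settles before mergeable does, so either one ends the wait.
    if (s.mergeable !== "UNKNOWN" || s.mergeStateStatus !== "UNKNOWN") return s;
    await sleep(ms);
    s = await fetchState();
  }
  return s; // possibly still UNKNOWN: the caller treats this as inconclusive
}
```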
-### PR + CI integration
-| Flag / config                            | Behavior                                                                                                                                                                                                                                                                                                                                                                                                            |
-| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `createPrOnSuccess` / `--create-pr`      | After a clean exit, push the worker's branch and `gh pr create`. Title: `<ID>: <title>`. Idempotent — surfaces the existing URL if the PR is already open. Requires `--worktree` and `gh` authenticated. `prBaseBranch` defaults to `main`; override per-issue by labelling the Linear issue with `ralph:branch:<branch-name>`.                                                                                     |
-| `stackPrsOnDependencies` / `--stack-prs` | When the Linear issue is blocked by another issue (`blocked_by` relation) that has exactly one open GitHub PR, open this PR against that blocker's head branch instead of `prBaseBranch`. Resolves the blocker's PR via Linear's auto-attachment + `gh pr view --json state,headRefName`. Falls back to `prBaseBranch` when zero / multiple blockers (or PRs) match. A `ralph:branch:<name>` label still wins.      |
-| `getAutoMerge` indicator                 | Opt an issue in for GitHub auto-merge (any-of label/status filter, same shape as `getReview`). When matched, Ralph runs `gh pr merge <url> --auto --<strategy>` right after opening the PR so GitHub merges as soon as required checks pass. Strategy comes from `autoMergeStrategy` (`squash` \| `merge` \| `rebase`, default `squash`). Failures are logged but non-fatal — the CI/conflict watch loop continues. |
-| `fixCiOnFailure` / `--fix-ci`            | After the PR opens, poll `gh pr checks`. On failure, pull failed logs via `gh run view --log-failed`, append them to `## Steering`, re-spawn the worker, and push the new commits — repeat until green or `maxCiFixAttempts` (default `5`) is hit. While this loop runs, `setDone` is **not** applied; if CI is never green the worker is treated as failed.                                                        |
-| `ciPollIntervalSeconds`                  | Seconds between CI status polls (default `30`).                                                                                                                                                                                                                                                                                                                                                                     |
-| `ignoreCiChecks`                         | Array of check names to ignore when computing pass/fail.                                                                                                                                                                                                                                                                                                                                                            |
-| `codeReviewTrigger` / `--code-review`    | See [Code-review iteration](#code-review-iteration).                                                                                                                                                                                                                                                                                                                                                                |
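The base-branch precedence for `--create-pr` and `--stack-prs` can be sketched as follows:

1. A `ralph:branch:<name>` label wins.
2. Otherwise, a lone blocker with a lone open PR supplies the base.
3. Otherwise, `prBaseBranch` is used (default `main`).

The `Blocker` shape is hypothetical. It stands in for the Linear relation plus the `gh pr view --json state,headRefName` lookup. Reading "zero / multiple blockers match" as "blockers that have open PRs" is an interpretation. `autoMergeArgs` only assembles the documented `gh pr merge <url> --auto --<strategy>` call.

```typescript
// Hypothetical: one blocked_by issue and the head branches of its open PRs.
interface Blocker {
  openPrHeadRefs: string[];
}

function resolveBaseBranch(
  labels: string[],
  blockers: Blocker[],
  opts: { stackPrs: boolean; prBaseBranch?: string },
): string {
  const pinned = labels.find((l) => l.startsWith("ralph:branch:"));
  if (pinned) return pinned.slice("ralph:branch:".length); // an explicit label always wins
  if (opts.stackPrs) {
    const withPrs = blockers.filter((b) => b.openPrHeadRefs.length > 0);
    // Stack only when exactly one blocker matches and it has exactly one open PR.
    if (withPrs.length === 1 && withPrs[0].openPrHeadRefs.length === 1) {
      return withPrs[0].openPrHeadRefs[0];
    }
  }
  return opts.prBaseBranch ?? "main";
}

// argv for `gh pr merge <url> --auto --<strategy>` (strategy from autoMergeStrategy).
function autoMergeArgs(prUrl: string, strategy: "squash" | "merge" | "rebase" = "squash"): string[] {
  return ["pr", "merge", prUrl, "--auto", `--${strategy}`];
}
```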
-### Pre-existing error check
+Safeguards: `--max-iterations`, `--max-cost`, `--max-runtime`, `--max-failures`. Engine defaults to Claude Opus. See [GUIDE.md → CLI reference](./GUIDE.md#cli-reference) for the full set.
-Opt-in gate that protects the agent from chasing failures it cannot fix. When
-enabled (config `preExistingErrorCheck.enabled: true` or `--pre-existing-error-check`),
-on every poll tick Ralph runs the configured commands against the base branch
-HEAD. If any command fails:
+## Agent mode — Linear-driven
-1. A Linear issue is created with the failing command, exit code, and truncated
-   output (fingerprint embedded in the body so re-runs with the same failure
-   don't open duplicates).
-2. The coordinator pauses — new fresh/resume/conflict-fix/review pickups are
-   blocked until the trunk is green again. **In-flight workers are not killed.**
-3. The dashboard shows a red `⛔ BASELINE BROKEN <LIN-ID> · <duration>` banner.
+`ralphy agent` polls Linear, runs up to N concurrent task loops, and (optionally) opens PRs, watches CI, and iterates with reviewers. Requires `LINEAR_API_KEY`.
-When the baseline goes green (the human merged the fix), the next poll lifts
-the pause automatically.
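One way to get the "no duplicate issues" behavior is a stable fingerprint over the failing command and its exit code, combined with truncating the output to `outputCharLimit`. This is a sketch, not Ralphy's scheme. Two choices are assumptions:

- the FNV-1a-style hash, computed over code points rather than bytes;
- leaving the output itself out of the fingerprint, so that noisy timestamps don't defeat deduplication.

```typescript
// Truncate captured output to the configured limit (README default 4000 chars).
function truncateOutput(output: string, limit = 4000): string {
  return output.length <= limit ? output : output.slice(0, limit) + "\n…[truncated]";
}

// FNV-1a-style 32-bit hash of "<command>\0<exitCode>", as 8 hex chars.
function fingerprint(command: string, exitCode: number): string {
  let h = 0x811c9dc5; // FNV offset basis
  for (const ch of `${command}\u0000${exitCode}`) {
    h ^= ch.codePointAt(0)!;
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return h.toString(16).padStart(8, "0");
}
```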
-```yaml
-preExistingErrorCheck:
-  enabled: false
-  commands: # falls back to commands.lint + commands.test when empty
-    - bun run lint
-    - bun run test
-  baseBranch: main
-  label: "ralph:pre-existing-error"
-  outputCharLimit: 4000
+```bash
+export LINEAR_API_KEY=lin_api_xxx
+ralphy agent --linear-team ENG --linear-assignee me --concurrency 3 --create-pr --fix-ci
 ```
-### Worktrees, setup, teardown
-With `useWorktree: true` (or `--worktree`) each task runs in an isolated worktree at `~/.ralph/<project>/worktrees/<change-name>` checked out onto a fresh `ralph/<change-name>` branch. Concurrent workers can't stomp on each other, and the worker's cwd _is_ the worktree.
-- **`setupScript`** — `sh -c`-run inside the worktree right after scaffolding (e.g. `bun install`, `cp .env.example .env`).
-- **`teardownScript`** — `sh -c`-run after the loop exits and (optional) worktree cleanup.
-Both scripts receive `WORKSPACE_ROOT` in their environment — the absolute path to the origin repository (the parent of the worktree). Use it to reference project-root files from inside a worktree, e.g. `cp "$WORKSPACE_ROOT/.env.example" .env`.
-- **`cleanupWorktreeOnSuccess`** — remove the worktree on clean exit. Failed workers always keep their worktree + branch for human inspection.
-Both scripts log failures but never block the loop. **`appendPrompt`** (or `--prompt` in agent mode) is appended to every scaffolded `proposal.md` under `## Additional instructions` — use it for cross-cutting guidance every task should see.
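The worktree layout and script environment described above can be sketched as follows. `worktreePlan` is hypothetical. Starting the branch from `HEAD` by default is an assumption. `git worktree add -b <branch> <path> <start-point>` itself is standard git.

```typescript
// Hypothetical planner for ~/.ralph/<project>/worktrees/<change-name> on branch ralph/<change-name>.
function worktreePlan(home: string, project: string, change: string, repoRoot: string, base = "HEAD") {
  const path = `${home}/.ralph/${project}/worktrees/${change}`;
  const branch = `ralph/${change}`;
  return {
    path,
    branch,
    // `git worktree add -b <branch> <path> <start-point>` creates the branch and checks it out there.
    gitArgs: ["worktree", "add", "-b", branch, path, base],
    // setupScript / teardownScript would run as `sh -c <script>` with cwd = path and this env.
    scriptEnv: { WORKSPACE_ROOT: repoRoot },
  };
}
```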
-### Running under tmux
-If `tmux` is on `$PATH`, `ralphy agent` re-execs itself inside a managed tmux session on first launch (per-workspace name). Detaching the terminal — closing the SSH session, the laptop lid, the `tmux detach` keybind — leaves the agent running. Re-running `ralphy agent` from the same workspace attaches to the existing session instead of starting a second copy.
-| Command                  | Behavior                                                                   |
-| ------------------------ | -------------------------------------------------------------------------- |
-| `ralphy agent`           | Attach to the managed tmux session, or start one if absent                 |
-| `ralphy agent status`    | Report whether the managed session exists and is currently attached        |
-| `ralphy agent stop`      | Kill the managed session (workers exit cleanly)                            |
-| `ralphy agent --no-tmux` | Skip tmux entirely and run the agent in the foreground (CI, scripted runs) |
-### Dashboard and logs
+Each poll routes every matching issue into one of: **fresh** (Todo → scaffold + spawn), **resume** (In Progress → reattach), **conflict-fix** / **ci-fix** (PR red on GitHub → prepend fix task), or **review** / **code-review** (reviewer comments or `@ralphy` mention).
-The terminal dashboard shows three always-visible panels: **RALPH AGENT** (engine/model, concurrency, poll interval, active limits, feature flags, Linear filter), **POLL STATUS + WORKERS** (last-poll bucket breakdown — `todo · res · conf · rev · @` (each colored when non-zero) plus `↺ Ns` next-poll countdown, active/queued worker totals), and **TASKS tab bar** (numbered worker tabs — `Tab` / `← →` / `1-9` to switch).
+Configuration lives in **`WORKFLOW.md`** at the project root — YAML frontmatter for settings, followed by the Jinja-style prompt template the worker renders each iteration. A default is written on first run; CLI flags override per invocation.
-Each worker card shows: priority badge + identifier + title + mode badge, `↗ LINEAR`, `↗ PR`, `▶ TASK` (first unchecked task from `tasks.md`, refreshed every second), `PHASE` with color + elapsed time, `⏵ CMD` when a shell command is in flight, `LOG` path for `tail -f`, and `─ OUTPUT ─` with live stdout/stderr.
+See **[GUIDE.md](./GUIDE.md)** for:
-Log files (every line is `[ISO] [type] message`):
-| File                                     | Contains                                                                                  |
-| ---------------------------------------- | ----------------------------------------------------------------------------------------- |
-| `~/.ralph/agent-mode.log`                | Global session log, appended each agent run                                               |
-| `<projectRoot>/.ralph/logs/<change>.log` | Per-worker unified log: output + phases + coordinator events                              |
-| `<taskDir>/LOG.jsonl`                    | Structured JSON event log used by the web UI                                              |
-| `<path from --json-log-file>`            | Mirror of the structured event stream (state changes, phases, polls) — file-tail friendly |
-Failed workers are not marked processed, so they retry on the next poll. SIGINT / SIGTERM cleanly stops polling and kills active workers. All Linear side effects are best-effort — failures log a warning but never block the loop.
-## CLI reference
-**Task flags**
-| Option                 | Description                                               |
-| ---------------------- | --------------------------------------------------------- |
-| `--name <name>`        | Task name (required for most commands)                    |
-| `--prompt <text>`      | Task description                                          |
-| `--prompt-file <path>` | Read prompt from file                                     |
-| `--claude [model]`     | Use Claude engine (haiku / sonnet / opus, default opus)   |
-| `--codex`              | Use Codex engine                                          |
-| `--model <model>`      | Set model (haiku / sonnet / opus)                         |
-| `--max-iterations <N>` | Stop after N iterations (`0` = unlimited)                 |
-| `--max-cost <N>`       | Stop when total cost exceeds $N                           |
-| `--max-runtime <N>`    | Stop after N minutes                                      |
-| `--max-failures <N>`   | Stop after N consecutive identical failures (default `5`) |
-| `--unlimited`          | Sets max iterations to 0 (default)                        |
-| `--delay <N>`          | Seconds between iterations                                |
-| `--manual-test`        | Enable manual-test phase (creates test tasks)             |
-| `--log`                | Log raw engine stream                                     |
-| `--verbose`            | Verbose output                                            |
-**Agent-mode flags**
-| Option                          | Behavior                                                                                     |
-| ------------------------------- | -------------------------------------------------------------------------------------------- |
-| `--linear-team <key>`           | Linear team key (e.g. `ENG`)                                                                 |
-| `--linear-assignee <id>`        | Assignee filter (user id, email, or `me`)                                                    |
-| `--poll-interval <s>`           | Seconds between Linear polls (default `60`)                                                  |
-| `--concurrency <n>`             | Max concurrent task loops (default `1`)                                                      |
-| `--max-tickets <n>`             | Stop picking up new issues after N have been started this run (`0` = unlimited)              |
-| `--worktree`                    | Run each task in its own git worktree                                                        |
-| `--indicator <k>:<t>:<v>`       | Override one `linear.indicators` entry (repeatable, e.g. `setDone:status:Done`)              |
-| `--create-pr`                   | Push worker branch + open a GitHub PR on success (needs `--worktree`)                        |
-| `--fix-ci`                      | After PR opens, re-run task on CI failures until green (needs `--create-pr`)                 |
-| `--stack-prs`                   | Open the PR against a blocker issue's open-PR head branch when present (needs `--create-pr`) |
-| `--code-review`                 | Watch open tracked PRs for unresolved review comments and prepend a code-review task         |
-| `--json-output`                 | Emit JSONL to stdout instead of rendering the Ink dashboard (CI / scripting)                 |
-| `--json-log-file <path>`        | Mirror the JSONL event stream to a file alongside the TUI or `--json-output`                 |
-| `--no-tmux`                     | Don't auto-reexec under tmux; run the agent in the foreground                                |
-| `--review-enabled`              | Enable the worker's self-review phase (see [Self-review phase](#self-review-phase))          |
-| `--review-max-rounds <N>`       | Hard cap on review rounds per task (default `1`)                                             |
-| `--review-model <id>`           | Override the reviewer's model (defaults to the worker's model)                               |
-| `--review-context-strategy <s>` | `fresh` (default) for a clean reviewer context per round, or `warm` to reuse the worker      |
-**List-mode flags**
-| Option                | Behavior                                                                                                                                                                                    |
-| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `--debug --name <id>` | Diagnose why a Linear ticket (e.g. `ENG-42`) is not being picked up — checks team, assignee, include / exclude markers, and blocked-by relations against every configured `get*` indicator. |
-`ralph list` reads `WORKFLOW.md` and, when `LINEAR_API_KEY` is set, fetches every issue matching each configured `getTodo` / `getInProgress` / `getReview` / `getAutoMerge` indicator using the same include / exclude rules as `ralph agent`. For each ticket it also resolves the linked GitHub PR URL from Linear attachments and prints its conflict / CI status from `gh pr view`.
-**`--max-tickets`.** Caps how many issues ralph picks up in a single agent run. Once the limit is hit the coordinator stops enqueuing new work; in-flight workers continue to completion, and the dashboard header shows `│ tickets ≤N`. The limit resets each restart.
-## Change layout (OpenSpec)
-There are no phases. One loop, one prompt, one `tasks.md` checklist. Each change lives in `<projectRoot>/openspec/changes/<name>/` (managed by OpenSpec) plus `<projectRoot>/.ralph/tasks/<name>/` (loop state only):
-| File / Directory                        | Purpose                                                   |
-| --------------------------------------- | --------------------------------------------------------- |
-| `openspec/changes/<name>/proposal.md`   | Description, goals, and the `## Steering` section         |
-| `openspec/changes/<name>/design.md`     | Technical design and architecture decisions               |
-| `openspec/changes/<name>/tasks.md`      | Checklist driving iteration — one unchecked item per loop |
-| `openspec/changes/<name>/specs/`        | Per-task specifications                                   |
-| `.ralph/tasks/<name>/.ralph-state.json` | Loop state (iteration count, status, cost, history)       |
-| `.ralph/tasks/<name>/STOP`              | Create this file to signal the loop to stop               |
-Steering is delivered by editing the `## Steering` section of `proposal.md`. The agent reads it at the start of every iteration.
+- Lifecycle diagram + per-mode behavior
+- `linear.indicators` schema and the full `WORKFLOW.md` example
+- Confirmation gate (`@ralphy revise`, opt-in/out labels)
+- `@ralphy` mentions, code-review iteration, self-review phase
+- PR + CI integration (auto-merge, stacked PRs, fix-ci loop)
+- Pre-existing error check, worktrees, tmux session management, dashboard, logs
+- Complete CLI reference (task, agent, list modes)
 ## MCP server
-Ralphy includes an MCP server that exposes task-management tools to Claude agents. It's auto-configured during installation.
+Ralphy ships an MCP server (auto-configured on per-project install) exposing `ralph_list_changes`, `ralph_get_change`, `ralph_create_change`, `ralph_append_steering`, `ralph_stop`. See [GUIDE.md → MCP server](./GUIDE.md#mcp-server).
-| Tool                    | Purpose                                    |
-| ----------------------- | ------------------------------------------ |
-| `ralph_list_changes`    | List changes with status                   |
-| `ralph_get_change`      | Get change details                         |
-| `ralph_create_change`   | Create and optionally start a change       |
-| `ralph_append_steering` | Append a steering message to `proposal.md` |
-| `ralph_stop`            | Stop a running change                      |
+## Development
-## Project structure and development
-```
-ralphy/
-├── apps/
-│   ├── cli/          # CLI application
-│   └── mcp/          # MCP server
-├── packages/
-│   ├── core/         # State management and loop
-│   ├── context/      # Storage abstraction
-│   ├── content/      # Base prompt and task templates
-│   ├── engine/       # Claude / Codex engine spawning
-│   ├── openspec/     # ChangeStore interface and OpenSpec adapter
-│   ├── output/       # Terminal formatting
-│   └── types/        # Zod schemas and types
-└── Makefile
+```bash
+bun install
+bunx nx run-many -t lint,typecheck,test,build   # all checks
+bunx nx run cli:build                            # CLI only
 ```
+Per-project install (builds + wires `.ralph/` and `.mcp.json` into the repo):
 ```bash
-bun install
-bunx nx run-many -t lint,typecheck,test,build   # Run all checks
-bunx nx run cli:build                            # Build CLI only
+make install            # → ./.ralph
+make install ~          # → ~/.ralph
+make install /path/to   # → /path/to/.ralph
 ```

package/dist/mcp/index.js CHANGED

@@ -24066,6 +24066,7 @@ var StateSchema = exports_external.object({
   model: exports_external.string().default("opus"),
   manualTest: exports_external.boolean().default(false),
   createPr: exports_external.boolean().default(false),
+  validateOnComplete: exports_external.boolean().default(false),
   usage: UsageSchema.default({}),
   history: exports_external.array(HistoryEntrySchema).default([]),
   metadata: exports_external.object({ branch: exports_external.string().optional() }).default({}),
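
The `StateSchema` hunk above adds `validateOnComplete` with `.default(false)`, so a `.ralph-state.json` written before 3.8.11 parses as if the flag were off. In the shell bundle, `buildTaskPrompt` and `useLoop` both key off `validateOnComplete && !createPr`. The JavaScript below is a minimal sketch of that gating, not code from the package. `withValidateDefaults`, `isValidateOnly`, and `validateInstruction` are hypothetical names, and the zod defaults are approximated with `??`.

```javascript
// Hypothetical sketch, not the package's code: approximate the two StateSchema
// defaults relevant here without zod. A state file from 3.8.9 has no
// validateOnComplete key, so it reads back as false.
// Note: `??` also replaces null, which zod's boolean schema would reject instead.
function withValidateDefaults(raw) {
  return {
    ...raw,
    createPr: raw.createPr ?? false,
    validateOnComplete: raw.validateOnComplete ?? false,
  };
}

// Mirrors the predicate used by buildTaskPrompt and useLoop in 3.8.11.
function isValidateOnly(state) {
  return state.validateOnComplete && !state.createPr;
}

// Mirrors the buildTaskPrompt change: the "openspec validate" instruction is
// emitted only when the change is NOT in validate-only mode.
function validateInstruction(state) {
  if (isValidateOnly(state)) return "";
  return `Run \`bunx openspec validate ${state.name}\` before committing.\n`;
}
```

When `isValidateOnly` is true, `useLoop` also skips the `changeStore.getStatus` completeness check.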

package/dist/shell/index.js CHANGED

@@ -18928,8 +18928,8 @@ import { readFileSync } from "fs";
 import { resolve } from "path";
 function getVersion() {
   try {
-    if ("3.8.9")
-      return "3.8.9";
+    if ("3.8.11")
+      return "3.8.11";
   } catch {}
   const dirsToTry = [];
   try {
@@ -64364,6 +64364,7 @@ var init_types2 = __esm(() => {
     model: exports_external.string().default("opus"),
     manualTest: exports_external.boolean().default(false),
     createPr: exports_external.boolean().default(false),
+    validateOnComplete: exports_external.boolean().default(false),
     usage: UsageSchema.default({}),
     history: exports_external.array(HistoryEntrySchema).default([]),
     metadata: exports_external.object({ branch: exports_external.string().optional() }).default({}),
@@ -71284,8 +71285,11 @@ function buildTaskPrompt(state, taskDir, reviewPhase) {
   prompt += `Change name: \`${state.name}\`
 `;
-  prompt += `Run \`bunx openspec validate ${state.name}\` before committing.
+  const validateOnly = state.validateOnComplete && !state.createPr;
+  if (!validateOnly) {
+    prompt += `Run \`bunx openspec validate ${state.name}\` before committing.
 `;
+  }
   prompt += `Commit all changed files yourself before finishing \u2014 stage files individually (e.g. \`git add path/to/file\`), never \`git add -A\` or \`git commit -am\`. Nothing is committed automatically after you exit.
 `;
   if (state.createPr) {
@@ -71631,7 +71635,8 @@ function useLoop(opts) {
           writeState(stateDir, currentState);
           setState(currentState);
           try {
-            if (typeof opts.changeStore.getStatus === "function") {
+            const skipStatusCheck = currentState.validateOnComplete && !currentState.createPr;
+            if (!skipStatusCheck && typeof opts.changeStore.getStatus === "function") {
               const status = await opts.changeStore.getStatus(opts.name);
               if (!status.isComplete) {
                 const blocked = status.artifacts.filter((a) => a.status !== "done").map((a) => `${a.id}=${a.status}`).join(", ");
@@ -99769,6 +99774,41 @@ async function runTeardownPhase(input, deps) {
     log2(`! teardown script threw: ${err.message}`, "yellow");
   }
 }
+async function runValidateOnlyPhase(input, deps) {
+  const { changeName, changeDir, stateFilePath, validateCommands, cwd: cwd2 } = input;
+  const { log: log2, emit: emit2, respawnWorker } = deps;
+  const runCommand = deps.runCommand ?? defaultRunCommand;
+  emit2("validate");
+  if (validateCommands.length > 0) {
+    for (const command of validateCommands) {
+      const { exitCode, output } = await runCommand(command, cwd2);
+      if (exitCode !== 0) {
+        emit2("validate-fix", command);
+        log2(`! validation check failed: ${command}`, "yellow");
+        try {
+          await prependFixTask(join28(changeDir, AGENT_TASKS_FILENAME), `Fix failing validation: ${command}`, output || `Command exited with code ${exitCode}`);
+        } catch (err) {
+          log2(`! could not prepend fix task: ${err.message}`, "red");
+          return 1;
+        }
+        await reactivateState(stateFilePath, log2, changeName);
+        return respawnWorker();
+      }
+    }
+  }
+  try {
+    await prependFixTask(join28(changeDir, AGENT_TASKS_FILENAME), "Run openspec validation", [
+      `Run \`bunx openspec validate ${changeName}\` to validate the change artifacts.`,
+      `Commit any pending changes before running the validation command.`
+    ].join(`
+`));
+  } catch (err) {
+    log2(`! could not prepend validation task: ${err.message}`, "red");
+    return 1;
+  }
+  await reactivateState(stateFilePath, log2, changeName);
+  return respawnWorker();
+}
 async function runPostTask(input, deps) {
   const { log: log2, cmd, git: git2, runScript } = deps;
   const emit2 = (phase2, detail) => deps.onPhase?.(phase2, detail);
@@ -99785,6 +99825,7 @@ async function runPostTask(input, deps) {
     wantPr,
     wantFixCi,
     wantAutoMerge,
+    wantValidateOnly,
     cfg,
     respawnWorker
   } = input;
@@ -99798,6 +99839,23 @@ async function runPostTask(input, deps) {
     }
   }
   let effectiveCode = exitCode;
+  if (wantValidateOnly && effectiveCode === 0) {
+    effectiveCode = await runValidateOnlyPhase({
+      changeName,
+      changeDir,
+      stateFilePath,
+      validateCommands: cfg.validateCommands ?? [],
+      cwd: cwd2
+    }, {
+      log: log2,
+      emit: emit2,
+      respawnWorker
+    });
+    emit2(effectiveCode === 0 ? "done" : "gave-up", effectiveCode !== 0 ? `exit ${effectiveCode}` : undefined);
+    await runWorktreeCleanupPhase({ changeName, cwd: cwd2, projectRoot, useWorktree, effectiveCode, cfg }, { git: git2, log: log2, emit: emit2 });
+    await runTeardownPhase({ cwd: cwd2, teardownScript: cfg.teardownScript }, { runScript, log: log2, emit: emit2 });
+    return effectiveCode;
+  }
   if (effectiveCode !== 0 && wantPr) {
     log2(`  skipping PR phase for ${changeName} (worker exited with code ${effectiveCode})`, "gray");
   }
@@ -99871,7 +99929,18 @@ async function runPostTask(input, deps) {
   await runTeardownPhase({ cwd: cwd2, teardownScript: cfg.teardownScript }, { runScript, log: log2, emit: emit2 });
   return effectiveCode;
 }
-var CI_FAILED_EXIT = 70, PR_FAILED_EXIT = 71, repoAutoMergeCache;
+var CI_FAILED_EXIT = 70, PR_FAILED_EXIT = 71, repoAutoMergeCache, defaultRunCommand = async (cmd, cwd2) => {
+  const proc = Bun.spawnSync({
+    cmd: ["sh", "-c", cmd],
+    cwd: cwd2,
+    stdout: "pipe",
+    stderr: "pipe"
+  });
+  const decoder = new TextDecoder;
+  const output = [decoder.decode(proc.stdout), decoder.decode(proc.stderr)].filter(Boolean).join(`
+`);
+  return { exitCode: proc.exitCode ?? 1, output };
+};
 var init_post_task = __esm(() => {
   init_tasks_md();
   init_fs_change();
@@ -100059,6 +100128,23 @@ function createSpawnWorker(input) {
     const wantAutoMerge = issueForChange ? issueMatchesGetIndicator(issueForChange, indicators.getAutoMerge) : false;
     const wrapped = handle.exited.then(async (code) => {
       const workerLayout = projectLayout(cwd2);
+      const validateSpecPath = join30(workerLayout.changeDir(changeName), "specs", "validate.md");
+      const hasValidateSpec = await Bun.file(validateSpecPath).exists();
+      const wantValidateOnly = hasValidateSpec && !wantPrBase;
+      if (hasValidateSpec) {
+        try {
+          const stateFile = workerLayout.stateFile(changeName);
+          const sf = Bun.file(stateFile);
+          if (await sf.exists()) {
+            const stateData = JSON.parse(await sf.text());
+            if (!stateData.validateOnComplete) {
+              stateData.validateOnComplete = true;
+              stateData.createPr = false;
+              await Bun.write(stateFile, JSON.stringify(stateData, null, 2));
+            }
+          }
+        } catch {}
+      }
       try {
         const prevTasks = await prevTasksPromise;
         const nextFile = Bun.file(missionTasksPath);
@@ -100104,6 +100190,7 @@ function createSpawnWorker(input) {
         wantPr,
         wantFixCi,
         wantAutoMerge,
+        wantValidateOnly,
         cfg: {
           teardownScript: cfg.teardownScript ?? null,
           prBaseBranch: cfg.prBaseBranch,
@@ -100115,7 +100202,8 @@ function createSpawnWorker(input) {
           stackPrsOnDependencies: args.stackPrs || cfg.stackPrsOnDependencies,
           neverTouch: cfg.boundaries.never_touch,
           metaOnlyFiles: cfg.boundaries.meta_only_files,
-          manualMergeWhenAutoMergeDisabled: cfg.manualMergeWhenAutoMergeDisabled
+          manualMergeWhenAutoMergeDisabled: cfg.manualMergeWhenAutoMergeDisabled,
+          validateCommands: [cfg.commands.test, cfg.commands.lint, cfg.commands.typecheck].filter((c) => Boolean(c))
         },
         respawnWorker: respawn
       }, {
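
The last hunks wire the validate-only path together. `createSpawnWorker` enables it when `specs/validate.md` exists in the change directory and `wantPrBase` is falsy, flips `validateOnComplete` to `true` and `createPr` to `false` in the state file, and builds `validateCommands` from `commands.test`, `commands.lint`, and `commands.typecheck`. `runPostTask` then hands off to `runValidateOnlyPhase`. The sketch below reproduces only that decision logic, under simplifying assumptions: it is synchronous, and instead of calling `prependFixTask`, `reactivateState`, and `respawnWorker` it returns a description of the task that would be prepended. `buildValidateCommands`, `wantValidateOnly` (as a function), and `planValidateOnly` are hypothetical names.

```javascript
// Hypothetical, simplified sketch of the 3.8.11 validate-only post-task flow.
// Side effects (tasks.md writes, state reactivation, respawn) are replaced by
// a returned plan object so the branching can be inspected.

// Mirrors createSpawnWorker: test, lint, typecheck in that order, dropping
// empty or undefined entries.
function buildValidateCommands(commands) {
  return [commands.test, commands.lint, commands.typecheck].filter((c) => Boolean(c));
}

// Mirrors the wantValidateOnly flag: specs/validate.md exists and no PR base
// was requested for this worker.
function wantValidateOnly(hasValidateSpec, wantPrBase) {
  return hasValidateSpec && !wantPrBase;
}

// runCommand(cmd) -> { exitCode, output }. Stops at the first failing command.
function planValidateOnly(changeName, validateCommands, runCommand) {
  for (const command of validateCommands) {
    const { exitCode, output } = runCommand(command);
    if (exitCode !== 0) {
      return {
        action: "prepend-task-and-respawn",
        title: `Fix failing validation: ${command}`,
        body: output || `Command exited with code ${exitCode}`,
      };
    }
  }
  // Every command passed (or none were configured): queue an openspec validation task.
  return {
    action: "prepend-task-and-respawn",
    title: "Run openspec validation",
    body: [
      `Run \`bunx openspec validate ${changeName}\` to validate the change artifacts.`,
      "Commit any pending changes before running the validation command.",
    ].join("\n"),
  };
}
```

As shipped, both branches of `runValidateOnlyPhase` end in `respawnWorker()`: a failing command prepends `Fix failing validation: <command>`, and an all-green run prepends `Run openspec validation`.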

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@neriros/ralphy",
-  "version": "3.8.9",
+  "version": "3.8.11",
   "description": "An iterative AI task execution framework. Orchestrates multi-phase autonomous work using Claude or Codex engines.",
   "keywords": [
     "agent",