npm - @ralph-orchestrator/ralph-cli - Versions diffs - 2.1.3 → 2.2.1 - Mend

@ralph-orchestrator/ralph-cli 2.1.3 → 2.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -3,6 +3,7 @@
 [![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
 [![Rust](https://img.shields.io/badge/rust-1.75+-orange)](https://www.rust-lang.org/)
 [![Build](https://img.shields.io/github/actions/workflow/status/mikeyobrien/ralph-orchestrator/ci.yml?branch=main&label=CI)](https://github.com/mikeyobrien/ralph-orchestrator/actions)
+[![Coverage](https://img.shields.io/badge/coverage-65%25-yellowgreen)](coverage/index.html)
 [![Mentioned in Awesome Claude Code](https://awesome.re/mentioned-badge.svg)](https://github.com/hesreallyhim/awesome-claude-code)
@@ -23,8 +24,10 @@ v1.0.0 was ralphed into existence with little oversight and guidance. v2.0.0 is
 - [Installation](#installation)
 - [Quick Start](#quick-start)
 - [Configuration](#configuration)
+- [Custom Backends and Per-Hat Configuration](#custom-backends-and-per-hat-configuration)
 - [Presets](#presets)
 - [Key Concepts](#key-concepts)
+- [Orchestration and Coordination Patterns](#orchestration-and-coordination-patterns)
 - [CLI Reference](#cli-reference)
 - [Architecture](#architecture)
 - [Building & Testing](#building--testing)
@@ -69,7 +72,9 @@ See [AGENTS.md](AGENTS.md) for the full philosophy.
 - **Event-Driven Coordination** — Hats communicate through typed events with glob pattern matching
 - **Backpressure Enforcement** — Gates that reject incomplete work (tests, lint, typecheck)
 - **Presets Library** — 20+ pre-configured workflows for common development patterns
-- **Interactive TUI** — Real-time terminal UI for monitoring Ralph's activity (experimental)
+- **Interactive TUI** — Real-time terminal UI for monitoring Ralph's activity (enabled by default)
+- **Memories** — Persistent learning across sessions stored in `.agent/memories.md`
+- **Tasks** — Runtime work tracking stored in `.agent/tasks.jsonl`
 - **Session Recording** — Record and replay sessions for debugging and testing (experimental)
 ## Installation
@@ -195,17 +200,17 @@ ralph run -p "Add input validation to the user API endpoints"
 ### 3. Run Ralph
 ```bash
-# Autonomous mode (headless, default)
+# TUI mode (default) — real-time terminal UI for monitoring
 ralph run
 # With inline prompt
 ralph run -p "Implement the login endpoint with JWT authentication"
-# TUI mode (experimental)
-ralph run --tui
+# Headless mode (no TUI)
+ralph run --no-tui
 # Resume interrupted session
-ralph resume
+ralph run --continue
 # Dry run (show what would execute)
 ralph run --dry-run
@@ -303,6 +308,15 @@ core:
     - "Don't assume 'not implemented' - search first"
     - "Backpressure is law - tests/typecheck/lint must pass"
+# Memories — persistent learning across sessions (enabled by default)
+memories:
+  enabled: true                         # Set false to disable
+  inject: auto                          # auto, manual, or none
+# Tasks — runtime work tracking (enabled by default)
+tasks:
+  enabled: true                         # Set false to use scratchpad-only mode
 # Custom hats (omit to use default planner/builder)
 hats:
   my_hat:
@@ -314,6 +328,81 @@ hats:
 ```
+## Custom Backends and Per-Hat Configuration
+### Custom Backends
+Beyond the built-in backends (Claude, Kiro, Gemini, Codex, Amp, Copilot, OpenCode), you can define custom backends to integrate any CLI-based AI agent:
+```yaml
+cli:
+  backend: "custom"
+  command: "my-agent"
+  args: ["--headless", "--auto-approve"]
+  prompt_mode: "arg"        # "arg" or "stdin"
+  prompt_flag: "-p"         # Optional: flag for prompt argument
+```
+| Field | Description |
+|-------|-------------|
+| `command` | The CLI command to execute |
+| `args` | Arguments inserted before the prompt |
+| `prompt_mode` | How to pass the prompt: `arg` (command-line argument) or `stdin` |
+| `prompt_flag` | Flag preceding the prompt (e.g., `-p`, `--prompt`). If omitted, prompt is positional. |
+### Per-Hat Backend Configuration
+Different hats can use different backends, enabling specialized tools for specialized tasks:
+```yaml
+cli:
+  backend: "claude"  # Default for Ralph and hats without explicit backend
+hats:
+  builder:
+    name: "🔨 Builder"
+    description: "Implements code"
+    triggers: ["build.task"]
+    publishes: ["build.done"]
+    backend: "claude"        # Explicit: Claude for coding
+  researcher:
+    name: "🔍 Researcher"
+    description: "Researches technical questions"
+    triggers: ["research.task"]
+    publishes: ["research.done"]
+    backend:                 # Kiro with custom agent (has MCP tools)
+      type: "kiro"
+      agent: "researcher"
+  reviewer:
+    name: "👀 Reviewer"
+    description: "Reviews code changes"
+    triggers: ["review.task"]
+    publishes: ["review.done"]
+    backend: "gemini"        # Different model for fresh perspective
+```
+**Backend Types:**
+| Type | Syntax | Invocation |
+|------|--------|------------|
+| Named | `backend: "claude"` | Uses standard backend configuration |
+| Kiro Agent | `backend: { type: "kiro", agent: "builder" }` | `kiro-cli --agent builder ...` |
+| Custom | `backend: { command: "...", args: [...] }` | Your custom command |
+**When to mix backends:**
+| Scenario | Recommended Backend |
+|----------|---------------------|
+| Complex coding | Claude (best reasoning) |
+| AWS/cloud tasks | Kiro with agent (MCP tools) |
+| Code review | Different model (fresh perspective) |
+| Internal tools | Custom backend |
+| Cost optimization | Faster/cheaper model for simple tasks |
+Hats without explicit `backend` inherit from `cli.backend`.
 ## Presets
 Presets are pre-configured workflows for common development patterns.
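Note on the hunk above: the new "Custom Backends" table describes four fields (`command`, `args`, `prompt_mode`, `prompt_flag`) but does not show how they combine into a process invocation. The Python sketch below is one plausible reading of that table, not Ralph's actual implementation (which lives in the Rust `ralph-adapters` crate). The function name `build_invocation` and its return shape are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple


def build_invocation(
    command: str,
    args: List[str],
    prompt: str,
    prompt_mode: str = "arg",
    prompt_flag: Optional[str] = None,
) -> Tuple[List[str], Optional[str]]:
    """Return (argv, stdin_text) for a custom backend call.

    Hypothetical helper: it only mirrors the field descriptions in the
    README table and makes no claim about Ralph's internals.
    """
    # `args` are inserted before the prompt, per the table.
    argv = [command, *args]

    if prompt_mode == "stdin":
        # The prompt is piped to the process instead of appearing in argv.
        return argv, prompt
    if prompt_mode != "arg":
        raise ValueError(f"unsupported prompt_mode: {prompt_mode!r}")

    # With a flag the prompt follows it; without one the prompt is positional.
    if prompt_flag:
        argv.append(prompt_flag)
    argv.append(prompt)
    return argv, None


if __name__ == "__main__":
    argv, stdin_text = build_invocation(
        "my-agent", ["--headless", "--auto-approve"], "Fix the login bug",
        prompt_mode="arg", prompt_flag="-p",
    )
    print(argv)
    # ['my-agent', '--headless', '--auto-approve', '-p', 'Fix the login bug']
```

Under this reading, switching `prompt_mode` to `"stdin"` leaves `argv` as just the command plus `args`, and the prompt text would be written to the child process's standard input.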
@@ -396,9 +485,430 @@ View event history:
 ralph events
 ```
-### Scratchpad
+## Orchestration and Coordination Patterns
+Ralph's hat system enables sophisticated multi-agent workflows through event-driven coordination. This section covers the architectural patterns, event routing mechanics, and built-in workflow templates.
+### How Hat-Based Orchestration Works
+#### The Event-Driven Model
+Hats communicate through a **pub/sub event system**:
+1. **Ralph publishes a starting event** (e.g., `task.start`)
+2. **The matching hat activates** — the hat subscribed to that event takes over
+3. **The hat does its work** and publishes an event when done
+4. **The next hat activates** — triggered by the new event
+5. **The cycle continues** until a termination event or `LOOP_COMPLETE`
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  task.start → [Test Writer] → test.written → [Implementer] →   │
+│  test.passing → [Refactorer] → refactor.done ──┐                │
+│                                                │                │
+│  ┌─────────────────────────────────────────────┘                │
+│  └──→ (loops back to Test Writer for next test)                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+#### Ralph as the Constant Coordinator
+In hat-based mode, **Ralph is always present**:
+- Ralph cannot be removed or replaced
+- Custom hats define the **topology** (who triggers whom)
+- Ralph executes with **topology awareness** — knowing which hats exist and their relationships
+- Ralph serves as the **universal fallback** — orphaned events automatically route to Ralph
+This means custom hats don't execute directly. Instead, Ralph reads all pending events across all hats and decides what to do based on the defined topology. Ralph then either:
+- Delegates to the appropriate hat by publishing an event
+- Handles the work directly if no hat is suited
+#### Event Routing and Topic Matching
+Events route to hats using **glob-style pattern matching**:
+| Pattern | Matches |
+|---------|---------|
+| `task.start` | Exactly `task.start` |
+| `build.*` | `build.done`, `build.blocked`, `build.task`, etc. |
+| `*.done` | `build.done`, `review.done`, `test.done`, etc. |
+| `*` | Everything (global wildcard — used by Ralph as fallback) |
+**Priority Rules:**
+- Specific patterns take precedence over wildcards
+- If multiple hats have specific subscriptions, that's an error (ambiguous routing)
+- Global wildcard (`*`) only triggers if no specific handler exists
+### Coordination Patterns
+Ralph presets implement several proven coordination patterns:
+#### 1. Linear Pipeline
+The simplest pattern — work flows through a sequence of specialists.
+```
+Input → Hat A → Event → Hat B → Event → Hat C → Output
+```
+**Example: TDD Red-Green-Refactor** (`tdd-red-green.yml`)
+```yaml
+hats:
+  test_writer:
+    triggers: ["tdd.start", "refactor.done"]
+    publishes: ["test.written"]
+  implementer:
+    triggers: ["test.written"]
+    publishes: ["test.passing"]
+  refactorer:
+    triggers: ["test.passing"]
+    publishes: ["refactor.done", "cycle.complete"]
+```
+```
+tdd.start → 🔴 Test Writer → test.written → 🟢 Implementer →
+test.passing → 🔵 Refactorer → refactor.done ─┐
+                                              │
+              ┌───────────────────────────────┘
+              └──→ (back to Test Writer)
+```
+**When to use:** Workflows with clear sequential phases where each step builds on the previous.
+#### 2. Contract-First Pipeline
+A variant where work must pass validation gates before proceeding.
+**Example: Spec-Driven Development** (`spec-driven.yml`)
+```yaml
+hats:
+  spec_writer:
+    triggers: ["spec.start", "spec.rejected"]
+    publishes: ["spec.ready"]
+  spec_reviewer:
+    triggers: ["spec.ready"]
+    publishes: ["spec.approved", "spec.rejected"]
+  implementer:
+    triggers: ["spec.approved", "spec.violated"]
+    publishes: ["implementation.done"]
+  verifier:
+    triggers: ["implementation.done"]
+    publishes: ["task.complete", "spec.violated"]
+```
+```
+spec.start → 📋 Spec Writer ──→ spec.ready ──→ 🔎 Spec Critic
+                 ↑                                   │
+                 └────── spec.rejected ──────────────┤
+                                                     ↓
+                                               spec.approved
+                                                     │
+                                                     ↓
+task.complete ←── ✅ Verifier ←── impl.done ←── ⚙️ Implementer
+                       │                              ↑
+                       └──── spec.violated ───────────┘
+```
+**When to use:** High-stakes changes where the spec must be rock-solid before implementation begins.
+#### 3. Cyclic Rotation
+Multiple roles take turns, each bringing a different perspective.
+**Example: Mob Programming** (`mob-programming.yml`)
+```yaml
+hats:
+  navigator:
+    triggers: ["mob.start", "observation.noted"]
+    publishes: ["direction.set", "mob.complete"]
+  driver:
+    triggers: ["direction.set"]
+    publishes: ["code.written"]
+  observer:
+    triggers: ["code.written"]
+    publishes: ["observation.noted"]
+```
+```
+mob.start → 🧭 Navigator → direction.set → ⌨️ Driver →
+code.written → 👁️ Observer → observation.noted ─┐
+                                                │
+              ┌─────────────────────────────────┘
+              └──→ (back to Navigator)
+```
+**When to use:** Complex features that benefit from multiple perspectives and continuous feedback.
+#### 4. Adversarial Review
+Two roles with opposing objectives ensure robustness.
+**Example: Red Team / Blue Team** (`adversarial-review.yml`)
+```yaml
+hats:
+  builder:
+    name: "🔵 Blue Team (Builder)"
+    triggers: ["security.review", "fix.applied"]
+    publishes: ["build.ready"]
+  red_team:
+    name: "🔴 Red Team (Attacker)"
+    triggers: ["build.ready"]
+    publishes: ["vulnerability.found", "security.approved"]
+  fixer:
+    triggers: ["vulnerability.found"]
+    publishes: ["fix.applied"]
+```
+```
+security.review → 🔵 Blue Team → build.ready → 🔴 Red Team
+                      ↑                            │
+                      │                            ├─→ security.approved ✓
+                      │                            │
+                      │                            └─→ vulnerability.found
+                      │                                        │
+                      └────── fix.applied ←── 🛡️ Fixer ←──────┘
+```
+**When to use:** Security-sensitive code, authentication systems, or any code where adversarial thinking improves quality.
+#### 5. Hypothesis-Driven Investigation
+The scientific method applied to debugging.
+**Example: Scientific Method** (`scientific-method.yml`)
+```yaml
+hats:
+  observer:
+    triggers: ["science.start", "hypothesis.rejected"]
+    publishes: ["observation.made"]
+  theorist:
+    triggers: ["observation.made"]
+    publishes: ["hypothesis.formed"]
+  experimenter:
+    triggers: ["hypothesis.formed"]
+    publishes: ["hypothesis.confirmed", "hypothesis.rejected"]
+  fixer:
+    triggers: ["hypothesis.confirmed"]
+    publishes: ["fix.applied"]
+```
+```
+science.start → 🔬 Observer → observation.made → 🧠 Theorist →
+hypothesis.formed → 🧪 Experimenter ──┬─→ hypothesis.confirmed → 🔧 Fixer
+                                      │
+                                      └─→ hypothesis.rejected ─┐
+                                                               │
+              ┌────────────────────────────────────────────────┘
+              └──→ (back to Observer with new data)
+```
+**When to use:** Complex bugs where the root cause isn't obvious. Forces systematic investigation over random fixes.
+#### 6. Coordinator-Specialist (Fan-Out)
+A coordinator delegates to specialists based on the work type.
+**Example: Gap Analysis** (`gap-analysis.yml`)
+```yaml
+hats:
+  analyzer:
+    triggers: ["gap.start", "verify.complete", "report.complete"]
+    publishes: ["analyze.spec", "verify.request", "report.request"]
+  verifier:
+    triggers: ["analyze.spec", "verify.request"]
+    publishes: ["verify.complete"]
+  reporter:
+    triggers: ["report.request"]
+    publishes: ["report.complete"]
+```
+```
+                    ┌─→ analyze.spec ──→ 🔍 Verifier ──┐
+                    │                                  │
+gap.start → 📊 Analyzer ←── verify.complete ──────────┘
+                    │
+                    └─→ report.request ──→ 📝 Reporter ──→ report.complete
+```
+**When to use:** Work that naturally decomposes into independent specialist tasks (analysis, verification, reporting).
+#### 7. Adaptive Entry Point
+A bootstrapping hat detects input type and routes to the appropriate workflow.
+**Example: Code-Assist** (`code-assist.yml`)
+```yaml
+hats:
+  planner:
+    triggers: ["build.start"]
+    publishes: ["tasks.ready"]
+    # Detects: PDD directory vs. code task file vs. description
+  builder:
+    triggers: ["tasks.ready", "validation.failed", "task.complete"]
+    publishes: ["implementation.ready", "task.complete"]
+  validator:
+    triggers: ["implementation.ready"]
+    publishes: ["validation.passed", "validation.failed"]
+  committer:
+    triggers: ["validation.passed"]
+    publishes: ["commit.complete"]
+```
+```
+build.start → 📋 Planner ─── (detects input type) ───→ tasks.ready
+                                                            │
+    ┌───────────────────────────────────────────────────────┘
+    │
+    ↓
+⚙️ Builder ←─────────────── validation.failed ←─────┐
+    │                                               │
+    ├── task.complete ──→ (loop for PDD mode) ──────┤
+    │                                               │
+    └── implementation.ready ──→ ✅ Validator ──────┤
+                                      │             │
+                                      └─→ validation.passed
+                                              │
+                                              ↓
+                                        📦 Committer → commit.complete
+```
+**When to use:** Workflows that need to handle multiple input formats or adapt their behavior based on context.
+### Designing Custom Hat Collections
+#### Hat Configuration Schema
+```yaml
+hats:
+  my_hat:
+    name: "🎯 Display Name"      # Shown in TUI and logs
+    description: "What this hat does"  # REQUIRED — Ralph uses this for delegation
+    triggers: ["event.a", "event.b"]   # Events that activate this hat
+    publishes: ["event.c", "event.d"]  # Events this hat can emit
+    default_publishes: "event.c"       # Fallback if hat forgets to emit
+    max_activations: 10                # Optional cap on activations
+    backend: "claude"                  # Optional backend override
+    instructions: |
+      Prompt injected when this hat is active.
+      Tell the hat what to do, not how to do it.
+```
+#### Design Principles
+1. **Description is critical** — Ralph uses hat descriptions to decide when to delegate. Make them clear and specific.
+2. **One hat, one responsibility** — Each hat should have a clear, focused purpose. If you're writing "and" in the description, consider splitting.
+3. **Events are routing signals, not data** — Keep payloads brief. Store detailed output in files and reference them in events.
+4. **Design for recovery** — If a hat fails or forgets to publish, Ralph catches the orphaned event. Your topology should handle unexpected states gracefully.
+5. **Test with simple prompts first** — Complex topologies can have emergent behavior. Start simple, validate the flow, then add complexity.
-All hats share `.agent/scratchpad.md` — persistent memory across iterations. This enables hats to build on previous work rather than starting fresh.
+#### Validation Rules
+Ralph validates hat configurations:
+- **Required description**: Every hat must have a description (Ralph needs it for delegation context)
+- **Reserved triggers**: `task.start` and `task.resume` are reserved for Ralph
+- **No ambiguous routing**: Each trigger pattern must map to exactly one hat
+```
+ERROR: Ambiguous routing for trigger 'build.done'.
+Both 'planner' and 'reviewer' trigger on 'build.done'.
+```
+### Event Emission
+Hats emit events to signal completion or hand off work:
+```bash
+# Simple event with payload
+ralph emit "build.done" "tests: pass, lint: pass"
+# Event with JSON payload
+ralph emit "review.done" --json '{"status": "approved", "issues": 0}'
+# Direct handoff to specific hat (bypasses routing)
+ralph emit "handoff" --target reviewer "Please review the changes"
+```
+**In agent output**, events are embedded as XML tags:
+```xml
+<event topic="impl.done">Implementation complete</event>
+<event topic="handoff" target="reviewer">Please review</event>
+```
+### Choosing a Pattern
+| Scenario | Recommended Pattern | Preset |
+|----------|---------------------|--------|
+| Sequential workflow with clear phases | Linear Pipeline | `tdd-red-green` |
+| Spec must be approved before coding | Contract-First | `spec-driven` |
+| Need multiple perspectives | Cyclic Rotation | `mob-programming` |
+| Security review required | Adversarial | `adversarial-review` |
+| Debugging complex issues | Hypothesis-Driven | `scientific-method` |
+| Work decomposes into specialist tasks | Coordinator-Specialist | `gap-analysis` |
+| Multiple input formats | Adaptive Entry | `code-assist` |
+| Standard feature development | Basic delegation | `feature` |
+### When Not to Use Hats
+Hat-based orchestration adds complexity. Use **traditional mode** (no hats) when:
+- The task is straightforward and single-focused
+- You don't need role separation or handoffs
+- You're prototyping and want minimal configuration
+- The work doesn't naturally decompose into distinct phases
+Traditional mode is just Ralph in a loop until completion — simpler, faster to set up, and often sufficient.
+### Memories and Tasks
+Ralph uses two complementary systems for persistent state (both enabled by default):
+**Memories** (`.agent/memories.md`) — Accumulated wisdom across sessions:
+- Codebase patterns and conventions discovered
+- Architectural decisions and rationale
+- Recurring problem solutions (fixes)
+- Project-specific context
+**Tasks** (`.agent/tasks.jsonl`) — Runtime work tracking:
+- Create, list, and close tasks during orchestration
+- Track dependencies between tasks
+- Used for loop completion verification
+When memories and tasks are enabled, they replace the scratchpad for state management. Set `memories.enabled: false` and `tasks.enabled: false` to use the legacy scratchpad-only mode.
+### Scratchpad (Legacy Mode)
+When memories/tasks are disabled, all hats share `.agent/scratchpad.md` — persistent memory across iterations. This enables hats to build on previous work rather than starting fresh.
 The scratchpad is the primary mechanism for:
 - Task tracking (with `[ ]`, `[x]`, `[~]` markers)
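Note on the hunk above: the "Event Routing and Topic Matching" and "Validation Rules" sections state the routing priorities in prose (specific patterns beat wildcards, duplicate specific triggers are an error, the global `*` is a fallback). The Python sketch below shows one way those rules could fit together. It is an illustration under assumptions, not Ralph's routing code: the names `validate_topology` and `route` are hypothetical, the fallback is represented by the string `"ralph"`, and treating partial globs such as `build.*` as a middle tier between exact matches and the fallback is an assumption the README does not spell out.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Topology = Dict[str, List[str]]  # hat name -> trigger patterns


def has_wildcard(pattern: str) -> bool:
    """True if the pattern contains glob metacharacters."""
    return any(ch in pattern for ch in "*?[")


def validate_topology(hats: Topology) -> None:
    """Reject topologies where two hats share the same trigger pattern."""
    owner: Dict[str, str] = {}
    for hat, patterns in hats.items():
        for pattern in patterns:
            if pattern in owner and owner[pattern] != hat:
                raise ValueError(
                    f"Ambiguous routing for trigger '{pattern}'. "
                    f"Both '{owner[pattern]}' and '{hat}' trigger on '{pattern}'."
                )
            owner[pattern] = hat


def route(topic: str, hats: Topology, fallback: str = "ralph") -> str:
    """Pick the hat for a topic: exact match, then partial glob, then fallback."""
    exact = [hat for hat, pats in hats.items() if topic in pats]
    if len(exact) > 1:
        raise ValueError(f"Ambiguous routing for topic '{topic}': {exact}")
    if exact:
        return exact[0]

    # Assumption: partial globs rank below exact names but above the bare "*".
    globbed = sorted(
        {
            hat
            for hat, pats in hats.items()
            for p in pats
            if p != "*" and has_wildcard(p) and fnmatchcase(topic, p)
        }
    )
    if len(globbed) > 1:
        raise ValueError(f"Ambiguous routing for topic '{topic}': {globbed}")
    if globbed:
        return globbed[0]

    # No specific handler exists: the event falls through to the coordinator.
    return fallback


if __name__ == "__main__":
    hats = {"planner": ["build.done"], "reviewer": ["review.*"]}
    validate_topology(hats)
    for topic in ("build.done", "review.task", "deploy.start"):
        print(topic, "->", route(topic, hats))
```

Running the validation step before any event is dispatched matches the README's framing of ambiguous routing as a configuration error rather than a runtime decision.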
@@ -427,6 +937,7 @@ tests: pass, lint: pass, typecheck: pass
 | `ralph init` | Initialize configuration file |
 | `ralph clean` | Clean up `.agent/` directory |
 | `ralph emit` | Emit an event to the event log |
+| `ralph tools` | Runtime tools for memories and tasks (agent-facing) |
 ### Global Options
@@ -445,11 +956,12 @@ tests: pass, lint: pass, typecheck: pass
 | `--max-iterations <N>` | Override max iterations |
 | `--completion-promise <TEXT>` | Override completion trigger |
 | `--dry-run` | Show what would execute |
-| `--tui` | Enable TUI mode (experimental) |
+| `--no-tui` | Disable TUI mode (TUI is enabled by default) |
 | `-a, --autonomous` | Force headless mode |
 | `--idle-timeout <SECS>` | TUI idle timeout (default: 30) |
 | `--record-session <FILE>` | Record session to JSONL |
 | `-q, --quiet` | Suppress output (for CI) |
+| `--continue` | Resume from existing scratchpad |
 ### `ralph init` Options
@@ -474,9 +986,29 @@ tests: pass, lint: pass, typecheck: pass
 | `<INPUT>` | Optional description text or path to PDD plan file |
 | `-b, --backend <BACKEND>` | Backend to use (overrides config and auto-detection) |
+### `ralph tools` Subcommands
+The `tools` command provides agent-facing utilities for runtime state management:
+```bash
+# Memory management (persistent learning)
+ralph tools memory add "content" -t pattern --tags tag1,tag2
+ralph tools memory search "query"
+ralph tools memory list
+ralph tools memory show <id>
+ralph tools memory delete <id>
+# Task management (runtime tracking)
+ralph tools task add "Title" -p 2              # Create task (priority 1-5)
+ralph tools task add "X" --blocked-by Y        # With dependency
+ralph tools task list                           # All tasks
+ralph tools task ready                          # Unblocked tasks only
+ralph tools task close <id>                     # Mark complete
+```
 ## Architecture
-Ralph is organized as a Cargo workspace with six crates:
+Ralph is organized as a Cargo workspace with seven crates:
 | Crate | Purpose |
 |-------|---------|
@@ -485,6 +1017,7 @@ Ralph is organized as a Cargo workspace with six crates:
 | `ralph-adapters` | CLI backend integrations (Claude, Kiro, Gemini, etc.) |
 | `ralph-tui` | Terminal UI with ratatui |
 | `ralph-cli` | Binary entry point and CLI parsing |
+| `ralph-e2e` | End-to-end test harness for backend validation |
 | `ralph-bench` | Benchmarking harness (dev-only) |
 ## Building & Testing
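Note on the hunk above: `ralph tools task ready` is described as listing "unblocked tasks only", and `--blocked-by` as declaring a dependency, but the diff does not show the record format of `.agent/tasks.jsonl`. The Python sketch below illustrates the general idea of filtering ready tasks from JSON Lines. Every field name in it (`id`, `status`, `blocked_by`) and the one-record-per-task layout are assumptions made for illustration, so treat it as a model of the concept rather than a reader for Ralph's real file.

```python
import json
from typing import List


def ready_tasks(jsonl_text: str) -> List[dict]:
    """Return open tasks whose blockers are all closed.

    Assumed (hypothetical) schema, one JSON object per line:
      {"id": str, "status": "open" | "closed", "blocked_by": [str, ...]}
    """
    tasks = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

    # A blocker only counts while it is still open.
    open_ids = {t["id"] for t in tasks if t.get("status") != "closed"}

    return [
        t
        for t in tasks
        if t.get("status") != "closed"
        and not any(dep in open_ids for dep in t.get("blocked_by", []))
    ]


if __name__ == "__main__":
    sample = "\n".join(
        [
            '{"id": "a", "status": "closed", "blocked_by": []}',
            '{"id": "b", "status": "open", "blocked_by": ["a"]}',
            '{"id": "c", "status": "open", "blocked_by": ["b"]}',
        ]
    )
    print([t["id"] for t in ready_tasks(sample)])  # ['b']
```

In the sample, task `c` stays hidden until `b` is closed, which is the behavior the "Unblocked tasks only" comment suggests.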

package/npm-shrinkwrap.json CHANGED

@@ -23,7 +23,7 @@
       "hasInstallScript": true,
       "license": "MIT",
       "name": "@ralph-orchestrator/ralph-cli",
-      "version": "2.1.3"
+      "version": "2.2.1"
     },
     "node_modules/@isaacs/balanced-match": {
       "engines": {
@@ -515,5 +515,5 @@
     }
   },
   "requires": true,
-  "version": "2.1.3"
+  "version": "2.2.1"
 }

package/package.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "artifactDownloadUrl": "https://github.com/mikeyobrien/ralph-orchestrator/releases/download/v2.1.3",
+  "artifactDownloadUrl": "https://github.com/mikeyobrien/ralph-orchestrator/releases/download/v2.2.1",
   "bin": {
     "ralph": "run-ralph.js"
   },
@@ -62,7 +62,7 @@
       "zipExt": ".tar.xz"
     }
   },
-  "version": "2.1.3",
+  "version": "2.2.1",
   "volta": {
     "node": "18.14.1",
     "npm": "9.5.0"