npm - @ai-dev-methodologies/rlp-desk - Versions diffs - 0.1.2 → 0.2.1 - Mend

@ai-dev-methodologies/rlp-desk 0.1.2 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md +98 -0
package/docs/protocol-reference.md +90 -3
package/package.json +1 -1
package/src/commands/rlp-desk.md +114 -10
package/src/governance.md +87 -10
package/src/scripts/init_ralph_desk.zsh +22 -6
package/src/scripts/run_ralph_desk.zsh +834 -98

package/README.md CHANGED Viewed

@@ -134,6 +134,12 @@ for iteration in 1..max_iter:
 | `--worker-model MODEL` | sonnet | Worker model (haiku/sonnet/opus) |
 | `--verifier-model MODEL` | opus | Verifier model (haiku/sonnet/opus) |
 | `--mode agent\|tmux` | agent | Execution mode (see below) |
+| `--worker-engine claude\|codex` | claude | Engine for Worker (claude uses Agent(), codex uses Bash CLI) |
+| `--verifier-engine claude\|codex` | claude | Engine for Verifier |
+| `--codex-model MODEL` | gpt-5.4 | Model passed to the Codex CLI (when engine=codex) |
+| `--codex-reasoning low\|medium\|high` | high | Reasoning effort for Codex |
+| `--verify-mode per-us\|batch` | per-us | Verification strategy (see below) |
+| `--verify-consensus` | off | Cross-engine consensus verification (see below) |
 ## Execution Modes
@@ -193,6 +199,98 @@ To clean up tmux artifacts:
 /rlp-desk clean calculator --kill-session
 ```
+## Engine Support
+RLP Desk supports two execution engines for Worker and Verifier. **Claude is the default.** Codex is opt-in.
+### Claude (default)
+```
+/rlp-desk run calculator
+/rlp-desk run calculator --worker-engine claude --verifier-engine claude
+```
+Uses Claude Code's `Agent()` tool (agent mode) or `claude -p` CLI (tmux mode). Supports dynamic model routing (haiku/sonnet/opus).
+### Codex (opt-in)
+```bash
+# Install codex CLI first
+npm install -g @openai/codex
+# Run with codex worker
+/rlp-desk run calculator --worker-engine codex
+# Customize model and reasoning effort
+/rlp-desk run calculator --worker-engine codex --codex-model gpt-5.4 --codex-reasoning high
+# Mix engines: codex worker, claude verifier
+/rlp-desk run calculator --worker-engine codex --verifier-engine claude
+```
+Uses the `codex` CLI via `Bash()` (agent mode) or as an interactive TUI (tmux mode). The `codex` binary is only required when an engine is set to `codex`.
+| Engine | Agent Mode | Tmux Mode | Dynamic Routing |
+|--------|-----------|-----------|-----------------|
+| claude | `Agent()` tool | `claude -p` TUI | Yes (haiku/sonnet/opus) |
+| codex  | `Bash("codex ...")` | `codex` TUI | No (static model) |
+## Verification Modes
+RLP Desk supports two verification strategies. **Per-US is the default.**
+### Per-US Verification (default)
+```
+/rlp-desk run calculator
+/rlp-desk run calculator --verify-mode per-us
+```
+Each user story is verified independently after completion, then a final full verification runs after all stories pass:
+```
+Worker: US-001 → Verifier: US-001 AC only → pass
+Worker: US-002 → Verifier: US-002 AC only → pass
+Worker: US-003 → Verifier: US-003 AC only → pass
+Final full verify: ALL AC → pass → COMPLETE
+```
+Benefits:
+- Catch issues early, before later stories build on broken foundations
+- Smaller verification scope = faster, more accurate checks
+- Failed verification retries only the specific US
+### Batch Verification
+```
+/rlp-desk run calculator --verify-mode batch
+```
+Legacy behavior: Worker completes all stories, then a single verification checks all acceptance criteria at once.
+### Cross-Engine Consensus Verification
+```
+/rlp-desk run calculator --verify-consensus
+```
+When enabled, **both claude and codex verify independently**. Both must pass for verification to succeed.
+```
+Worker completes US → Claude verifies → Codex verifies
+  Both pass → proceed
+  Either fails → combined fix contract → Worker retry
+  3 rounds without consensus → BLOCKED
+```
+Consensus can be combined with per-US mode for maximum rigor:
+```
+/rlp-desk run calculator --verify-mode per-us --verify-consensus
+```
+Prerequisites: Both `claude` and `codex` CLIs must be installed.
 ## Project Structure
 After `init`, your project gets this scaffold:

package/docs/protocol-reference.md CHANGED Viewed

@@ -109,6 +109,7 @@ Written by the Worker at the end of every iteration. Provides a structured JSON
 {
   "iteration": 3,
   "status": "continue|verify|blocked",
+  "us_id": "US-001",
   "summary": "Completed US-001, other stories remain",
   "timestamp": "2025-01-15T10:30:00Z"
 }
@@ -118,12 +119,13 @@ Written by the Worker at the end of every iteration. Provides a structured JSON
 |-------|------|-------------|
 | `iteration` | number | Current iteration number |
 | `status` | string | One of: `continue`, `verify`, `blocked` |
+| `us_id` | string\|null | US being verified: `"US-001"`, `"ALL"` (final full verify), or null (batch mode) |
 | `summary` | string | Brief description of what was accomplished |
 | `timestamp` | string | ISO 8601 UTC timestamp |
 **Status values:**
 - `continue` -- Current action done but more work remains. Leader proceeds to next iteration.
-- `verify` -- All work complete and done-claim written. Leader dispatches Verifier.
+- `verify` -- Current US complete (per-US mode) or all work complete (batch mode). Leader dispatches Verifier scoped to `us_id`.
 - `blocked` -- Autonomous blocker encountered. Leader writes BLOCKED sentinel.
 **Usage by mode:**
@@ -160,6 +162,7 @@ Written by the Verifier after independent verification.
 ```json
 {
   "verdict": "pass|fail|request_info",
+  "us_id": "US-001",
   "verified_at_utc": "2025-01-15T10:35:00Z",
   "summary": "All criteria verified with fresh evidence",
   "criteria_results": [
@@ -185,7 +188,7 @@ Written by the Verifier after independent verification.
 ```
 **Verdict values:**
-- `pass`: all criteria met — Leader may write COMPLETE sentinel
+- `pass`: all criteria met — Leader may write COMPLETE sentinel (or add US to `verified_us` in per-US mode)
 - `fail`: one or more criteria not met — Leader reads issues, builds next contract
 - `request_info`: Verifier cannot determine pass/fail without more information — summary contains specific questions; Leader decides outcome and may relay questions to Worker
@@ -363,14 +366,28 @@ Updated by the Leader after each iteration:
   "phase": "worker|verifier|complete|blocked|timeout",
   "worker_model": "sonnet",
   "verifier_model": "opus",
+  "worker_engine": "claude",
+  "verifier_engine": "claude",
+  "verify_mode": "per-us",
+  "verify_consensus": 0,
   "last_result": "continue|verify|pass|fail|blocked",
   "consecutive_failures": 0,
+  "verified_us": ["US-001", "US-002"],
+  "consensus_round": 0,
+  "claude_verdict": "",
+  "codex_verdict": "",
   "updated_at_utc": "2025-01-15T10:30:00Z"
 }
 ```
 - `consecutive_failures`: number of consecutive Verifier `fail` verdicts since the last `pass`. Reset to 0 on any `pass`. Unchanged by `request_info`. Used by the Circuit Breaker (see above).
 - `last_failing_criteria`: (optional) array of criterion IDs from recent `fail` verdicts, used by Leader to detect same-criterion and diverse-failure CB patterns. Leaders may add additional tracking fields as needed.
+- `verified_us`: array of US IDs that have individually passed verification (per-US mode only). Empty in batch mode.
+- `verify_mode`: `per-us` or `batch`. Controls the verification strategy.
+- `verify_consensus`: `0` or `1`. Whether cross-engine consensus verification is enabled.
+- `consensus_round`: current consensus round for the active US (resets per US). Only present when `verify_consensus=1`.
+- `claude_verdict`: latest claude verifier verdict. Only present when `verify_consensus=1`.
+- `codex_verdict`: latest codex verifier verdict. Only present when `verify_consensus=1`.
 ## Project Plans Files
@@ -390,7 +407,7 @@ The `quality-spec` file is not generated by `init`. Create it manually when a pr
 |---------|-----------|-------------|
 | `brainstorm` | `<description>` | Interactive planning before init |
 | `init` | `<slug> [objective]` | Create project scaffold |
-| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M] [--mode agent\|tmux]` | Run the leader loop |
+| `run` | `<slug> [--max-iter N] [--worker-model M] [--verifier-model M] [--mode agent\|tmux] [--worker-engine claude\|codex] [--verifier-engine claude\|codex] [--codex-model MODEL] [--codex-reasoning low\|medium\|high] [--verify-mode per-us\|batch] [--verify-consensus]` | Run the leader loop |
 | `status` | `<slug>` | Display current loop status |
 | `logs` | `<slug> [N]` | Show iteration logs |
 | `clean` | `<slug> [--kill-session]` | Remove runtime artifacts for re-run |
@@ -402,6 +419,76 @@ The `run` command accepts `--mode agent|tmux` (default: `agent`).
 - **`--mode agent`** (default): The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. Synchronous, no tmux required.
 - **`--mode tmux`**: Validates the scaffold, checks prerequisites (`tmux`, `jq`), then launches `run_ralph_desk.zsh` as the Leader. The LLM session exits after launching the script. The shell script runs independently in a tmux session.
+### Engine Options
+The `run` command accepts engine flags to control which CLI executes Worker and Verifier prompts. **Claude is the default engine.**
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--worker-engine claude\|codex` | `claude` | Engine for Worker |
+| `--verifier-engine claude\|codex` | `claude` | Engine for Verifier |
+| `--codex-model MODEL` | `gpt-5.4` | Model passed to the `codex` CLI (when engine=codex) |
+| `--codex-reasoning low\|medium\|high` | `high` | Reasoning effort for the `codex` CLI |
+**Claude engine** (default): uses `Agent()` in agent mode, `claude -p` with `--dangerously-skip-permissions` in tmux mode.
+**Codex engine** (opt-in): uses `Bash("codex ...")` in agent mode, interactive `codex` TUI in tmux mode. The `codex` binary must be installed separately (`npm install -g @openai/codex`) and is only required when an engine is set to `codex`.
+Engine flags are passed to tmux mode via environment variables: `WORKER_ENGINE`, `VERIFIER_ENGINE`, `CODEX_MODEL`, `CODEX_REASONING`.
+### Verify Mode Options
+The `run` command accepts `--verify-mode` to control the verification strategy. **Per-US is the default.**
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--verify-mode per-us\|batch` | `per-us` | Verification strategy |
+| `--verify-consensus` | off | Cross-engine consensus verification |
+**Per-US mode** (default): After each user story is completed, the Verifier checks only that story's acceptance criteria. After all stories individually pass, a final full verify checks all AC. The Leader tracks `verified_us` in `status.json`.
+**Batch mode**: Legacy behavior. Worker completes all stories, then a single verification checks all AC at once.
+**Consensus verification** (`--verify-consensus`): After the primary verifier runs, a second verifier runs with the alternate engine (claude or codex). Both must pass. If either fails, combined issues form a fix contract. Max 3 consensus rounds per US before BLOCKED. Requires both `claude` and `codex` CLIs.
+Verify mode flags are passed to tmux mode via environment variables: `VERIFY_MODE`, `VERIFY_CONSENSUS`.
+### Iteration Signal (`us_id` field)
+When using per-US verification, the Worker includes a `us_id` field in `iter-signal.json`:
+```json
+{
+  "iteration": 3,
+  "status": "verify",
+  "us_id": "US-001",
+  "summary": "Completed US-001",
+  "timestamp": "2025-01-15T10:30:00Z"
+}
+```
+| Value | Meaning |
+|-------|---------|
+| `"US-001"` (specific) | Verify only this story's AC |
+| `"ALL"` | Final full verify — check all AC |
+| absent/null | Legacy batch mode — check all AC |
+The Verifier's verdict JSON also includes a `us_id` field to confirm which scope was verified.
+### Consensus Verdict Fields
+When `--verify-consensus` is enabled, `status.json` includes additional fields:
+```json
+{
+  "consensus_round": 1,
+  "claude_verdict": "pass",
+  "codex_verdict": "pass"
+}
+```
+Individual engine verdicts are saved as `verify-verdict-claude.json` and `verify-verdict-codex.json` in the logs directory.
 ### `--kill-session` Flag
 The `clean` command accepts `--kill-session` to kill any tmux sessions matching the slug pattern (`rlp-desk-<slug>-*`) in addition to removing runtime files.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.1.2",
+  "version": "0.2.1",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",

package/src/commands/rlp-desk.md CHANGED Viewed

@@ -31,7 +31,16 @@ Ask about these items one by one (or in small groups):
 5. **Verification Commands** — build, test, lint commands
 6. **Completion / Blocked Criteria**
 7. **Worker / Verifier Model** — haiku, sonnet, opus. Suggest defaults (worker: sonnet, verifier: opus), ask if OK.
-8. **Max Iterations** — suggest based on story count, ask if OK.
+8. **Engine & Model** — For each role (Worker, Verifier):
+   - Engine: claude (default) or codex
+   - If claude: suggest model (haiku/sonnet/opus) based on task complexity
+   - If codex: suggest model (default: gpt-5.4) and reasoning effort (low/medium/high)
+   - AI should recommend: "For this task complexity, I suggest Worker: sonnet, Verifier: opus"
+   - If codex selected: "For codex Worker, I suggest gpt-5.4 with high reasoning"
+9. **Verify Mode** — per-us (default) or batch. Ask: "Verify after each user story (per-us, recommended) or only after all stories are done (batch)?" Default recommendation: per-us for 2+ stories.
+10. **Verify Consensus** — Ask: "Use cross-engine consensus verification? (Both claude and codex verify independently, both must pass.) Requires codex CLI." Default: no.
+11. **Consensus Scope** — If consensus enabled, ask: "Consensus on every verify (all, default) or only on final verify (final-only)?" Default: all.
+12. **Max Iterations** — suggest based on story count, ask if OK.
 After all items are confirmed, present the full contract summary.
 On approval, offer to run `init`.
@@ -56,6 +65,19 @@ Options (parse from `$ARGUMENTS`):
 - `--max-iter N` (default: 100)
 - `--worker-model MODEL` (default: sonnet)
 - `--verifier-model MODEL` (default: opus)
+- `--worker-engine claude|codex` (default: `claude`) — engine for Worker
+- `--verifier-engine claude|codex` (default: `claude`) — engine for Verifier
+- `--worker-codex-model MODEL` (default: `gpt-5.4`) — codex model for Worker
+- `--worker-codex-reasoning low|medium|high` (default: `high`) — reasoning for Worker
+- `--verifier-codex-model MODEL` (default: `gpt-5.4`) — codex model for Verifier
+- `--verifier-codex-reasoning low|medium|high` (default: `high`) — reasoning for Verifier
+- `--verify-mode per-us|batch` (default: `per-us`) — verification strategy
+  - `per-us`: verify after each US, then final full verify of all AC
+  - `batch`: verify only after all US done (legacy behavior)
+- `--verify-consensus` — enable cross-engine consensus verification (both claude and codex verify independently; both must pass)
+- `--consensus-scope all|final-only` — when consensus runs (default: `all`)
+  - `all`: consensus runs on every verify (current behavior)
+  - `final-only`: consensus only on final ALL verify
 - `--debug` — enable debug logging (tmux mode only, writes to logs/<slug>/debug.log)
 ### Mode Selection
@@ -77,13 +99,25 @@ ROOT="$PWD" \
 MAX_ITER=<--max-iter value> \
 WORKER_MODEL=<--worker-model value> \
 VERIFIER_MODEL=<--verifier-model value> \
+WORKER_ENGINE=<--worker-engine value, default: claude> \
+VERIFIER_ENGINE=<--verifier-engine value, default: claude> \
+WORKER_CODEX_MODEL=<--worker-codex-model value, default: gpt-5.4> \
+WORKER_CODEX_REASONING=<--worker-codex-reasoning value, default: high> \
+VERIFIER_CODEX_MODEL=<--verifier-codex-model value, default: gpt-5.4> \
+VERIFIER_CODEX_REASONING=<--verifier-codex-reasoning value, default: high> \
+VERIFY_MODE=<--verify-mode value, default: per-us> \
+VERIFY_CONSENSUS=<1 if --verify-consensus, else 0> \
+CONSENSUS_SCOPE=<--consensus-scope value, default: all> \
 DEBUG=<1 if --debug, else 0> \
   zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
 ```
 6. **If the script exits with error (exit code 1)** — report the error to the user and STOP. Do NOT attempt to work around it. Do NOT create tmux sessions yourself. Do NOT re-launch the script in a different way. Just tell the user what went wrong and suggest using Agent mode instead.
 7. **If successful** — tell the user the tmux session has been started. The shell script takes over as the deterministic Leader. No Agent() calls are made in tmux mode.
-**IMPORTANT:** Tmux mode requires the user to already be inside a tmux session. If the runner script rejects because $TMUX is not set, do NOT try to create a tmux session yourself. Tell the user: "Start tmux first, then retry."
+**IMPORTANT RULES:**
+- Tmux mode requires the user to already be inside a tmux session. If the runner script rejects because $TMUX is not set, do NOT try to create a tmux session yourself. Tell the user: "Start tmux first, then retry."
+- Do NOT run the script in background (`&`, `run_in_background`). The script must run in foreground so panes remain visible to the user. The user needs to see Worker/Verifier panes in real-time.
+- Do NOT kill panes after completion. Panes stay alive for inspection. User cleans up with `/rlp-desk clean <slug> --kill-session`.
 #### Agent Mode (`--mode agent` or default)
@@ -124,7 +158,9 @@ rm -f .claude/ralph-desk/memos/<slug>-verify-verdict.json
 - Combine with iteration number + memory contract
 - Write to `.claude/ralph-desk/logs/<slug>/iter-NNN.worker-prompt.md` (audit trail)
-**⑤ Execute Worker via Agent()**
+**⑤ Execute Worker**
+If `--worker-engine claude` (default):
 ```
 Agent(
   description="rlp-desk worker iter-NNN",
@@ -137,24 +173,69 @@ Agent(
 - Agent returns synchronously. No polling needed.
 - Each Agent() = fresh context. Guaranteed.
+If `--worker-engine codex`:
+```
+Bash("codex exec --model <worker_codex_model> --reasoning-effort <worker_codex_reasoning> <full worker prompt text>")
+```
+- Codex runs as a subprocess via Bash(), not Agent().
+- Each Bash() call = fresh context for codex.
 **⑥ Read memory.md again** (Worker updated it)
 - `stop=continue` → go to ⑧
 - `stop=verify` → go to ⑦
 - `stop=blocked` → write BLOCKED sentinel, stop
-**⑦ Execute Verifier via Agent()**
-- Build verifier prompt, write to `iter-NNN.verifier-prompt.md`
+- Also read `iter-signal.json` for `us_id` field (which US was just completed)
+**⑦ Execute Verifier**
+**Per-US mode** (default, `--verify-mode per-us`):
+- Read `us_id` from `iter-signal.json` (e.g., "US-001" or "ALL")
+- Build verifier prompt scoped to `us_id`:
+  - If `us_id` is a specific story: "Verify ONLY the acceptance criteria for {us_id}"
+  - If `us_id` is "ALL": "Verify ALL acceptance criteria (final full verify)"
+- Write to `iter-NNN.verifier-prompt.md`
+- Track verified US in `status.json` field `verified_us` (array)
+- After verifier passes a specific US:
+  - Add that US to `verified_us` in status.json
+  - If more US remain → Worker does next US → verify → ...
+  - If all US individually passed → signal final full verify (us_id=ALL)
+  - After final full verify passes → COMPLETE
+**Batch mode** (`--verify-mode batch`):
+- Legacy behavior: verify only when Worker signals all work is done
+- Verifier checks all AC at once
+**⑦a Dispatch Verifier**
+If `--verifier-engine claude` (default):
 ```
 Agent(
-  description="rlp-desk verifier iter-NNN",
+  description="rlp-desk verifier iter-NNN (us_id)",
   subagent_type="executor",
   model=<verifier_model>,
   mode="bypassPermissions",
-  prompt=<full verifier prompt text>
+  prompt=<full verifier prompt text with US scope>
 )
 ```
-- Read `verify-verdict.json`:
+If `--verifier-engine codex`:
+```
+Bash("codex exec --model <verifier_codex_model> --reasoning-effort <verifier_codex_reasoning> <full verifier prompt text>")
+```
+**⑦b Consensus Verification** (when `--verify-consensus` is enabled):
+After the primary verifier runs, run a second verifier with the OTHER engine:
+- If primary engine is claude → run codex verifier
+- If primary engine is codex → run claude verifier
+- Both produce `verify-verdict.json` (Leader renames to `verify-verdict-claude.json` and `verify-verdict-codex.json`)
+- **Both pass** → proceed (next US or COMPLETE)
+- **Either fails** → combine issues from both verdicts into a single fix contract → Worker retry
+- Max 3 consensus rounds per US. After 3 rounds → BLOCKED.
+**⑦c Read verdict(s)**
+- Read `verify-verdict.json` (or both `-claude.json` and `-codex.json` if consensus):
   - `pass` + `complete` → write COMPLETE sentinel, report done!
+  - `pass` + specific US → add to `verified_us`, Worker does next US
   - `fail` + `continue` → **run Fix Loop** (governance.md §7½):
     1. Read `issues` array, sort by severity (`critical` → `major` → `minor`)
     2. Build structured fix contract with traceability rule
@@ -180,6 +261,13 @@ Agent(
 Track `consecutive_failures` in `status.json` (increment on `fail`, reset on `pass`, unchanged by `request_info`). Only **fail** verdicts count for CB chains — `request_info` does not break or contribute.
+Track `verified_us` (array of US IDs that passed verification) in `status.json` when using `--verify-mode per-us`.
+When `--verify-consensus` is enabled, also track in `status.json`:
+- `consensus_round`: current consensus round for this US (resets per US)
+- `claude_verdict`: latest claude verifier verdict for this US
+- `codex_verdict`: latest codex verifier verdict for this US
 ### Important Rules
 - Each Agent() = new process = fresh context
 - YOU track iteration count
@@ -216,10 +304,26 @@ tmux list-sessions -F '#{session_name}' 2>/dev/null | grep "^rlp-desk-<slug>-" |
 ```
 /rlp-desk brainstorm <description>          Plan before init (interactive)
 /rlp-desk init  <slug> [objective]          Create project scaffold
-/rlp-desk run   <slug> [--mode agent|tmux]  Run loop (agent=LLM leader, tmux=shell leader)
+/rlp-desk run   <slug> [options]            Run loop (agent=LLM leader, tmux=shell leader)
 /rlp-desk status <slug>                     Show loop status
 /rlp-desk logs  <slug> [N]                  Show iteration log
 /rlp-desk clean <slug> [--kill-session]     Reset for re-run (--kill-session kills tmux)
+Run options:
+  --mode agent|tmux          Execution mode (default: agent)
+  --max-iter N               Max iterations (default: 100)
+  --worker-model MODEL       Worker model (default: sonnet)
+  --verifier-model MODEL     Verifier model (default: opus)
+  --worker-engine claude|codex   Worker engine (default: claude)
+  --verifier-engine claude|codex Verifier engine (default: claude)
+  --worker-codex-model MODEL          Worker codex model (default: gpt-5.4)
+  --worker-codex-reasoning LEVEL      Worker codex reasoning (default: high)
+  --verifier-codex-model MODEL        Verifier codex model (default: gpt-5.4)
+  --verifier-codex-reasoning LEVEL    Verifier codex reasoning (default: high)
+  --verify-mode per-us|batch Verification strategy (default: per-us)
+  --verify-consensus         Cross-engine consensus verification
+  --consensus-scope SCOPE    When consensus runs: all|final-only (default: all)
+  --debug                    Debug logging (tmux mode only)
 ```
 ## Architecture

package/src/governance.md CHANGED Viewed

@@ -12,7 +12,7 @@ The Leader orchestrates, while Worker/Verifier run in isolated fresh contexts ev
 - **Worker claim ≠ complete**: A Worker's DONE is merely a claim. The Verifier must independently verify before it's confirmed.
 - **Verifier is independent**: The Verifier judges based on evidence alone, without knowledge of the Worker's reasoning process.
 - **Sentinels are Leader-owned**: Only the Leader writes COMPLETE/BLOCKED sentinels.
-- **Claude models only**: haiku, sonnet, opus.
+- **Supported engines**: claude (default; models: haiku, sonnet, opus) and codex (opt-in via `--worker-engine codex` / `--verifier-engine codex`).
 ## 2. Roles
@@ -43,7 +43,9 @@ The Leader orchestrates, while Worker/Verifier run in isolated fresh contexts ev
 RUNNING → DONE_CLAIMED → VERIFYING → COMPLETE | CONTINUE | BLOCKED
 ```
-## 4. Model Routing (Claude only)
+## 4. Model Routing
+### Claude (default engine)
 | Role | Default Model | Override Criteria |
 |------|---------------|-------------------|
@@ -58,12 +60,21 @@ The Leader decides each iteration. Decision criteria:
 - Simple repetitive task → downgrade model
 - User explicitly specified → use as given
+### Codex (opt-in engine)
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--codex-model` | `gpt-5.4` | Model passed to the `codex` CLI |
+| `--codex-reasoning` | `high` | Reasoning effort: `low`, `medium`, or `high` |
+Model routing is static when using codex: the same model and reasoning effort apply to both Worker and Verifier. There is no dynamic upgrade path. Claude is the default engine; codex is explicitly opt-in.
 ## 5a. Execution: Agent() Approach (default) — "Smart Mode"
 All environments (Claude Code, OpenCode) use the same Agent tool.
 ```
-# Worker
+# Worker (claude engine, default)
 Agent(
   subagent_type="executor",
   model="sonnet",
@@ -71,7 +82,7 @@ Agent(
   mode="bypassPermissions"
 )
-# Verifier
+# Verifier (claude engine, default)
 Agent(
   subagent_type="executor",
   model="sonnet",
@@ -80,6 +91,15 @@ Agent(
 )
 ```
+If `--worker-engine codex` or `--verifier-engine codex` (opt-in):
+```
+# Worker or Verifier (codex engine)
+Bash("codex -m <codex_model> -c model_reasoning_effort=<codex_reasoning> --dangerously-bypass-approvals-and-sandbox <prompt>")
+```
+- Codex runs as a subprocess via `Bash()`, not `Agent()` — the Agent tool is Claude-specific.
+- Each `Bash()` call = fresh context for codex.
+- Claude is the default engine. Codex is explicitly opt-in.
 Characteristics:
 - Each call = fresh context (new subprocess)
 - Synchronous return. No polling or signal files needed.
@@ -106,14 +126,25 @@ The tmux runner (`run_ralph_desk.zsh`) creates a tmux session with three panes:
 - **Worker pane** — receives `claude -p` invocations via trigger scripts
 - **Verifier pane** — receives `claude -p` invocations via trigger scripts
-All `claude` CLI calls use `--dangerously-skip-permissions`:
+By default, `claude` CLI calls use `--dangerously-skip-permissions`:
 ```bash
+# claude engine (default)
 claude -p "$(cat /path/to/prompt.md)" \
   --model sonnet \
   --dangerously-skip-permissions
 ```
-**Security implication:** `--dangerously-skip-permissions` allows the CLI to execute code without user confirmation. The tmux runner requires this because there is no interactive user to approve each action. Only run tmux mode in trusted environments with trusted prompts.
+When `WORKER_ENGINE=codex` or `VERIFIER_ENGINE=codex`, the `codex` CLI is used instead:
+```bash
+# codex engine (opt-in)
+codex -m gpt-5.4 \
+  -c model_reasoning_effort="high" \
+  --dangerously-bypass-approvals-and-sandbox \
+  "$(cat /path/to/prompt.md)"
+```
+The codex CLI is only required when an engine is set to `codex`. Claude remains the default engine throughout.
+**Security implication:** Both `--dangerously-skip-permissions` (claude) and `--dangerously-bypass-approvals-and-sandbox` (codex) allow the CLI to execute code without user confirmation. The tmux runner requires this because there is no interactive user to approve each action. Only run tmux mode in trusted environments with trusted prompts.
 Characteristics:
 - Leader is a shell script, not an LLM — zero tokens consumed for orchestration.
@@ -193,17 +224,19 @@ for iteration in 1..max_iter:
   ⑥ Read memory.md again → check Worker's updated state
      - "continue" → go to ⑧
-     - "verify"   → go to ⑦
+     - "verify"   → go to ⑦ (also read iter-signal.json for us_id)
      - "blocked"  → write BLOCKED sentinel, stop
      Note: In tmux mode, the Leader polls `<slug>-iter-signal.json` instead of
      parsing memory.md. In Agent() mode, the Leader MAY read iter-signal.json
      as a structured alternative to parsing the Stop Status from memory.md.
-  ⑦ Execute Verifier
-     - Build prompt → log to logs/<slug>/iter-NNN.verifier-prompt.md
+  ⑦ Execute Verifier (see §7a for per-US and §7b for consensus details)
+     - Build prompt (scoped to us_id if per-us mode) → log
      - Agent(subagent_type="executor", model=selected, prompt=prompt)
+     - If --verify-consensus: run second verifier with alternate engine (see §7b)
      - Read verify-verdict.json:
-       • pass + complete → write COMPLETE sentinel, stop
+       • pass + specific US → add to verified_us, Worker does next US
+       • pass + us_id=ALL or complete → write COMPLETE sentinel, stop
        • fail + continue → go to ⑧
        • blocked → write BLOCKED sentinel, stop
@@ -211,6 +244,50 @@ for iteration in 1..max_iter:
      Update status.json, report to user, continue to next iteration
 ```
+## 7a. Per-US Verification
+By default (`--verify-mode per-us`), each user story is verified independently before proceeding to the next:
+```
+Worker completes US-001 → signal verify (us_id: "US-001")
+  → Verifier checks ONLY US-001 AC → pass
+  → Worker completes US-002 → signal verify (us_id: "US-002")
+  → Verifier checks ONLY US-002 AC → pass
+  → ...
+  → All US individually pass → signal verify (us_id: "ALL")
+  → Verifier runs FINAL FULL VERIFY (all AC) → pass → COMPLETE
+```
+**Key rules:**
+- Worker signals `verify` after each US with `us_id` set in `iter-signal.json`
+- Verifier checks only the scoped US acceptance criteria (or all if us_id=ALL)
+- Leader tracks `verified_us` array in `status.json`
+- If a per-US verify fails, the Worker retries that specific US (fix loop)
+- Final full verify ensures nothing was broken by later changes
+**Batch mode** (`--verify-mode batch`) preserves legacy behavior: Worker signals `verify` only after all work is done, and the Verifier checks all AC at once.
+## 7b. Cross-Engine Consensus Verification
+When `--verify-consensus` is enabled, after the primary verifier runs, a second verifier runs with the alternate engine:
+```
+Worker completes US → signal verify
+  → Claude Verifier runs (checks AC)
+  → Codex Verifier runs (checks AC)
+  → Both pass → proceed (next US or COMPLETE)
+  → Either fails → combined issues → fix contract → Worker retry
+  → Max 3 consensus rounds per US → BLOCKED if still disagreeing
+```
+**Key rules:**
+- Both claude and codex CLI must be installed
+- Verifiers run sequentially in the same Verifier pane (tmux) or as sequential calls (Agent mode)
+- Verdicts are saved as `verify-verdict-claude.json` and `verify-verdict-codex.json`
+- Combined fix contracts include issues from both engines
+- `status.json` includes `consensus_round`, `claude_verdict`, and `codex_verdict` fields
+- Consensus can be combined with per-US verification (each US gets consensus-verified)
 ## 7½. Fix Loop Protocol
 When the Verifier returns `fail`, the Leader runs the Fix Loop before issuing the next Worker contract: