npm - @ai-dev-methodologies/rlp-desk - Versions diffs - 0.15.5 → 0.15.6 - Mend

@ai-dev-methodologies/rlp-desk 0.15.5 → 0.15.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md +16 -0
package/README.md +43 -40
package/docs/rlp-desk/getting-started.md +6 -3
package/package.json +1 -1
package/src/node/cli/command-builder.mjs +5 -2

package/CHANGELOG.md

@@ -11,6 +11,22 @@ For pre-v0.15.4 versions, refer to `git log` and individual GitHub release notes
 - Post-v0.15.6: remove `RLP_LIFECYCLE_METRICS` flag entirely (per plan v3 ADR follow-ups).
 - Phase D.1 (handoff documents) + Phase D.2 (per-stage agent role specialization) — both deferred per `docs/plans/v0.15.4-release-runbook.md` §7.6.
+## [0.15.6] — 2026-06-18
+Patch: CI/test integrity, a codex command-builder security fix, and documentation reconciliation.
+### Fixed
+- **Codex worker command no longer passes model/reasoning unquoted.** `buildCodexCmd` now shell-quotes the model and reasoning values (parity with the claude path), closing a shell-injection / argument-splitting hazard when operator-supplied flags reach the shell via tmux send-keys.
+- **Docs now describe the correct execution modes.** The README "execution modes" section conflated the slash-command default (`--mode native`, `Agent()`-based) with the deprecated `--mode agent` (Node CLI). It is rewritten to the accurate three-mode model: `--mode tmux` is the canonical/recommended path, `--mode native` is the default companion for short/interactive use, `--mode agent` is deprecated.
+- **Docs now point to the correct scaffold path.** Getting-started and README referenced the pre-v0.13.0 `.claude/ralph-desk/` project-local path; corrected to `.rlp-desk/` (the global install path `~/.claude/ralph-desk/` is unchanged).
+### Changed
+- **CI now runs the behavioral test suites.** A new CI job runs `test:node` + `test:zsh` (previously CI ran only the existence-grep fast gate). First rollout is non-blocking to inventory CI-only flakiness before being made blocking.
+- **The full SV gate verifies the source tree.** `sv-gate:full`'s real-campaign E2E now targets the in-repo `src/` leader (and the correct `.rlp-desk/` sentinel paths) instead of the installed copy, so the gate validates the code being merged.
+### Added
+- **ADR-001 (Leader Consolidation).** Records the decision to make `--mode tmux` the canonical production leader, deprecate the `--mode agent` Node-CLI entry point on a dated schedule, and retain `--mode native` as a second-class companion. (Internal; not shipped in the tarball.)
 ## [0.15.5] — 2026-06-17
 Patch: fixes surfaced by a fresh-context live dogfood of the tmux and agent run modes, plus packaging hygiene.

package/README.md

@@ -275,45 +275,48 @@ The brainstorm phase evaluates complexity (US count, file scope, logic, dependen
 ## Execution Modes
-RLP Desk supports two execution modes. Both honor the same governance protocol.
-> **v0.14.0 status:** `--mode tmux` (zsh-backed) is the **stable, production** path
-> with the full safety net (heartbeat, copy-mode guard, prompt-stall timeout,
-> no-progress detection, claude model upgrade chain). `--mode agent` is **alpha**
-> and ships without those features — the runner emits a stderr warning when
-> agent mode is invoked. For long campaigns and BOS-style autonomous loops,
-> use `--mode tmux`.
+RLP Desk has three execution modes, all honoring the same governance protocol. **`--mode tmux` is the canonical, recommended path for any real campaign** (see [ADR-001](docs/plans/adr-001-leader-consolidation.md)).
+> **Mode status:**
+> - **`--mode tmux`** (zsh-backed) — **stable / production / canonical.** Full safety net (heartbeat,
+>   copy-mode guard, prompt-stall timeout, no-progress detection, model upgrade chain). Use this for
+>   long campaigns and autonomous loops.
+> - **`--mode native`** (the slash-command **default**) — the current Claude Code session is the Leader,
+>   dispatching via `Agent()`. Works anywhere (no tmux), good for short/interactive use, but is a
+>   second-class companion: no iteration watchdog, turn-based pauses possible. Not for long unattended runs.
+> - **`--mode agent`** (direct Node CLI) — **deprecated alpha**, on a removal schedule (ADR-001). Prints a
+>   SCHEDULED-REMOVAL banner. Do not use for new work; prefer `--mode tmux`.
 ### Environment Compatibility
-| Environment | Agent Mode (alpha) | Tmux Mode (stable) |
-|-------------|--------------------|--------------------|
-| Claude Code (any terminal) | **Works** | Requires tmux |
-| Inside tmux session | **Works** | **Works** — panes split in current window |
-| Outside tmux session | **Works** | **Rejected** — "start tmux first" |
+| Environment | Native (default) | Tmux (canonical) | Agent (deprecated) |
+|-------------|------------------|------------------|--------------------|
+| Claude Code (any terminal) | **Works** | Requires tmux | Works |
+| Inside tmux session | **Works** | **Works** — panes split in current window | Works |
+| Outside tmux session | **Works** | **Rejected** — "start tmux first" | Works |
 ### Choosing Your Mode
 | Need | Use |
 |------|-----|
-| Production / autonomous campaigns | `--mode tmux` (stable) |
-| Long campaigns, CI, overnight runs | `--mode tmux` (stable) |
-| Quick interactive exploration inside Claude Code | `--mode agent` (alpha — Node-native) |
+| Production / autonomous / overnight / CI campaigns | `--mode tmux` (canonical) |
+| Quick interactive exploration, no tmux available | `--mode native` (default) |
+| (legacy direct-Node-CLI workflows) | `--mode agent` — deprecated; migrate to `--mode tmux` |
-### Agent Mode (default) — "Smart Mode"
+### Native Mode (slash-command default) — "Smart Mode"
 ```
-/rlp-desk run calculator
+/rlp-desk run calculator        # defaults to --mode native
 ```
 The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via `Agent()`. The Leader is an LLM that dynamically routes models and reasons about context.
 - Works anywhere — no tmux required
 - Dynamic model routing — Leader upgrades models on failure
-**Known limitation:** Agent mode runs inside Claude Code's turn-based request-response model. If the LLM outputs text without a tool call, the turn terminates and the loop pauses until the user sends "continue." This is a platform constraint — the protocol mitigates it but cannot guarantee 100% uninterrupted execution. For guaranteed autonomous loops, use tmux mode.
 - Fix Loop — extracts verifier issues and feeds them back to the next worker
-- Best for interactive development
+- Best for short, interactive development
+**Known limitation:** Native mode runs inside Claude Code's turn-based request-response model. If the LLM outputs text without a tool call, the turn terminates and the loop pauses until the user sends "continue." This is a platform constraint — the protocol mitigates it but cannot guarantee 100% uninterrupted execution. **For guaranteed autonomous loops, use `--mode tmux`.**
 ### Tmux Mode — "Lean Mode"
@@ -456,7 +459,7 @@ Each conflict is logged as a JSONL entry in `logs/<slug>/conflict-log.jsonl`:
 After the campaign, review the conflict log to identify systemic issues:
 ```bash
-cat .claude/ralph-desk/logs/<slug>/conflict-log.jsonl | jq .
+cat .rlp-desk/logs/<slug>/conflict-log.jsonl | jq .
 ```
 Common patterns:
@@ -471,20 +474,20 @@ After `init`, your project gets this scaffold:
 ```
 your-project/
 ├── .claude/
-│   ├── settings.local.json          # rlp-desk permissions (auto-added by init)
-│   └── ralph-desk/
-│       ├── prompts/
-│       │   ├── <slug>.worker.prompt.md
-│       │   └── <slug>.verifier.prompt.md
-│       ├── context/
-│       │   └── <slug>-latest.md
-│       ├── memos/
-│       │   └── <slug>-memory.md
-│       ├── plans/
-│       │   ├── prd-<slug>.md
-│       │   └── test-spec-<slug>.md
-│       └── logs/<slug>/
-│           └── status.json
+│   └── settings.local.json          # rlp-desk permissions (auto-added by init)
+└── .rlp-desk/                        # scaffold (v0.13.0+; was .claude/ralph-desk/)
+    ├── prompts/
+    │   ├── <slug>.worker.prompt.md
+    │   └── <slug>.verifier.prompt.md
+    ├── context/
+    │   └── <slug>-latest.md
+    ├── memos/
+    │   └── <slug>-memory.md
+    ├── plans/
+    │   ├── prd-<slug>.md
+    │   └── test-spec-<slug>.md
+    └── logs/<slug>/
+        └── status.json
 ```
 ### Local Settings
@@ -495,15 +498,15 @@ your-project/
 {
   "permissions": {
     "allow": [
-      "Read(.claude/ralph-desk/**)",
-      "Edit(.claude/ralph-desk/**)",
-      "Write(.claude/ralph-desk/**)"
+      "Read(.rlp-desk/**)",
+      "Edit(.rlp-desk/**)",
+      "Write(.rlp-desk/**)"
     ]
   }
 }
 ```
-**Why:** Claude Code treats `.claude/` files as sensitive and prompts for confirmation on each access, even with `--dangerously-skip-permissions`. Without these permissions, Worker and Verifier agents are blocked by interactive prompts during automated loop execution.
+**Why:** Since v0.13.0 the scaffold lives at `.rlp-desk/` (outside `.claude/`), so Claude Code's `.claude/` sensitive-file gate no longer blocks Worker/Verifier writes. These explicit `.rlp-desk/**` permissions are a belt-and-suspenders helper that keeps automated loop execution prompt-free.
 **Note:** `settings.local.json` is local to your machine and is not committed to git. If the file already exists, permissions are merged without overwriting your existing settings.
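
The note above says existing permissions are merged rather than overwritten. This diff does not show how `init` does that, so the following is only a minimal sketch of one way such a merge could behave. `mergeAllowRules` is a hypothetical name, not a function from the package, and the de-duplicating union is an assumption.

```javascript
// Hypothetical helper (not from the package): union the required allow rules
// into an existing settings object without dropping anything already there.
function mergeAllowRules(existing, required) {
  const current = existing?.permissions?.allow ?? [];
  // Set preserves first-seen order and removes duplicates.
  const allow = [...new Set([...current, ...required])];
  return { ...existing, permissions: { ...existing?.permissions, allow } };
}

// The three rules listed in the README hunk above.
const required = [
  'Read(.rlp-desk/**)',
  'Edit(.rlp-desk/**)',
  'Write(.rlp-desk/**)',
];

// An invented pre-existing settings.local.json, for illustration only.
const existing = {
  permissions: { allow: ['Bash(ls)', 'Read(.rlp-desk/**)'], deny: ['Bash(curl)'] },
};

console.log(JSON.stringify(mergeAllowRules(existing, required), null, 2));
```

In this sketch the user's own `allow` entries and the `deny` list survive, and a rule that is already present is not added twice.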

package/docs/rlp-desk/getting-started.md

@@ -64,7 +64,7 @@ On approval, brainstorm offers to run `init` automatically.
 This creates the scaffold:
 ```
-.claude/ralph-desk/
+.rlp-desk/
 ├── prompts/
 │   ├── loop-test.worker.prompt.md
 │   └── loop-test.verifier.prompt.md
@@ -78,9 +78,12 @@ This creates the scaffold:
 └── logs/loop-test/
 ```
+> Since v0.13.0 the scaffold lives at the project-local `.rlp-desk/` (not `.claude/ralph-desk/`),
+> so Claude Code's `.claude/` sensitive-file gate no longer blocks Worker/Verifier writes.
 ## Step 5: Customize the PRD
-Edit `.claude/ralph-desk/plans/prd-loop-test.md` to define your user stories and acceptance criteria. See [`examples/calculator/`](../examples/calculator/.claude/ralph-desk/plans/prd-loop-test.md) for a complete example.
+Edit `.rlp-desk/plans/prd-loop-test.md` to define your user stories and acceptance criteria. See [`examples/calculator/`](../examples/calculator/.claude/ralph-desk/plans/prd-loop-test.md) for a complete example.
 Key sections:
 - **User Stories** with Given/When/Then acceptance criteria, Task Type, and Risk Level
@@ -91,7 +94,7 @@ Key sections:
 ## Step 6: Define the Test Spec
-Edit `.claude/ralph-desk/plans/test-spec-loop-test.md` to specify verification commands:
+Edit `.rlp-desk/plans/test-spec-loop-test.md` to specify verification commands:
 ```markdown
 ## Verification Commands

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.15.5",
+  "version": "0.15.6",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",

package/src/node/cli/command-builder.mjs

@@ -72,14 +72,17 @@ export function buildClaudeCmd(mode, model, options = {}) {
 export function buildCodexCmd(mode, model, options = {}) {
   assertTuiMode(mode, 'buildCodexCmd');
+  // GAP-2 (audit): shell-quote model + reasoning for parity with buildClaudeCmd.
+  // The command string is delivered to a shell (tmux send-keys), so unquoted
+  // operator-supplied values were a shell-injection / arg-splitting hazard.
   const parts = [
     CODEX_BIN,
     '-m',
-    model,
+    shellQuote(model),
   ];
   if (options.reasoning !== undefined) {
-    parts.push('-c', `model_reasoning_effort="${options.reasoning}"`);
+    parts.push('-c', shellQuote(`model_reasoning_effort="${options.reasoning}"`));
   }
   parts.push('--disable', 'plugins', '--dangerously-bypass-approvals-and-sandbox');
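
The hunk above calls `shellQuote`, but this diff does not include its definition. To illustrate why the change matters, here is a minimal sketch that assumes POSIX single-quote escaping. The `shellQuote` body, the `buildCodexCmdSketch` function, the `'codex'` binary name, and the space-joined output are all stand-ins for illustration, not the package's actual code.

```javascript
// Assumed implementation: wrap the value in single quotes and rewrite each
// embedded single quote as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Simplified mirror of the patched builder; 'codex' stands in for CODEX_BIN.
function buildCodexCmdSketch(model, options = {}) {
  const parts = ['codex', '-m', shellQuote(model)];
  if (options.reasoning !== undefined) {
    parts.push('-c', shellQuote(`model_reasoning_effort="${options.reasoning}"`));
  }
  return parts.join(' ');
}

// A hostile operator-supplied model value containing a command separator.
const hostile = 'some-model; touch /tmp/pwned';
console.log(buildCodexCmdSketch(hostile, { reasoning: 'high' }));
// prints: codex -m 'some-model; touch /tmp/pwned' -c 'model_reasoning_effort="high"'
```

With the quoting in place, the shell receives the whole model value as a single argument to `-m`. Unquoted, the `;` would end the `codex` command and run `touch` as a second command.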