npm - @ai-dev-methodologies/rlp-desk - Versions diffs - 0.17.0 → 0.18.1 - Mend

@ai-dev-methodologies/rlp-desk 0.17.0 → 0.18.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/CHANGELOG.md +49 -0
package/package.json +1 -1
package/src/governance.md +19 -3
package/src/scripts/.run_src_verify.zsh +3725 -0
package/src/scripts/init_ralph_desk.zsh +3 -2
package/src/scripts/lib_ralph_desk.zsh +114 -3
package/src/scripts/run_ralph_desk.zsh +714 -131

package/CHANGELOG.md CHANGED Viewed

@@ -11,6 +11,55 @@ For pre-v0.15.4 versions, refer to `git log` and individual GitHub release notes
 - Post-v0.15.6: remove `RLP_LIFECYCLE_METRICS` flag entirely (per plan v3 ADR follow-ups).
 - Phase D.1 (handoff documents) + Phase D.2 (per-stage agent role specialization) — both deferred per `docs/plans/v0.15.4-release-runbook.md` §7.6.
+## [0.18.1] — 2026-06-25
+### Fixed
+- End-of-campaign resilience: when the last user story passes its per-US verify, the
+  leader now runs the final verification directly instead of dispatching one more
+  worker iteration solely to hand off to the final verify. That extra round-trip was a
+  fragile dependency on a healthy worker at the very end of a campaign (a worker stall
+  or API rate-limit at that moment could block a campaign whose work was already
+  complete and verified). Per-US campaigns now finalize without it.
+## [0.18.0] — 2026-06-24
+**Leader hardening: campaigns that complete.** The `--mode tmux` leader's decision
+gates used to TERMINATE where they should RECOVER, so campaigns could stall at
+`--max-iter` or block on a single transient slip. This release resolves that
+"never completes" failure class.
+### Fixed
+- A single transient `blocked` verdict no longer ends the campaign. A fresh-context
+  Worker/Verifier mis-emitting `blocked` once now gets grace (soft-fail + retry);
+  termination only on a genuine infra failure or a repeated/consecutive block.
+- Verifier phrasing variants (`completed`/`done`, lowercase `all`) no longer strand a
+  genuinely-complete campaign at the iteration cap — recommendation/us_id normalized at read.
+- Mixed-engine dead-pane detection: a dead pane is detected per role when Worker and
+  Verifier run different engines (e.g. claude worker + codex verifier).
+- No double-credit of a re-submitted User Story on the verify pass-path.
+- Final verification now retries a single transient poll failure instead of charging a
+  false fail at the end of the campaign.
+- Durable sentinels: atomic writes detect truncation (disk-full/SIGPIPE) instead of
+  renaming a half-written file into the canonical path.
+- Race-safe locks: the runner-lock and per-slug lock close an ABA/empty-owner race that
+  could let two leaders run on one project root.
+- Done-claim freshness gate: auto-synthesis rejects a stale/wrong-US done-claim, preventing
+  the wrong User Story from being credited.
+- Model-upgrade state survives relaunch: a resumed campaign keeps the upgraded worker model
+  instead of resetting to the CLI default.
+- codex 0.141 worker compatibility: the directory-trust prompt is accepted and MCP disabled
+  so codex workers don't block on startup.
+### Changed
+- Stronger final verification: `--final-verifier-model` / `--final-verifier-effort` are now
+  actually used for the final ALL-verify pass (previously every verify ran at the per-US
+  model). "Final = stricter" now holds.
+- Consensus cross-verifier model knobs now take effect: `--consensus-model` (per-US, lighter)
+  and `--final-consensus-model` (final, stricter) are wired into the consensus path —
+  previously documented but inert.
+- Verifier FORMAT meta-gates softened cross-engine while correctness gates stay strict
+  (fewer false fails on phrasing, no weakening of substance).
 ## [0.17.0] — 2026-06-19
 **BREAKING (ADR-001): `node run.mjs run <slug> --mode agent` (the deprecated Node-leader direct-CLI alpha) now hard-errors.** Per the schedule announced in 0.16.0, the `--mode agent` entry point exits 2 with a redirect, and the Node-CLI default mode flips from `agent` to `tmux` (the canonical production leader). The `src/node/**` engine modules are retained (they back the Native Agent() path and the test suite); only the direct-CLI `--mode agent` *dispatch entry* was removed. **Migration:** replace `node run.mjs run <slug> --mode agent` with `--mode tmux`. The slash command's `--mode native` default is unaffected.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.17.0",
+  "version": "0.18.1",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",

package/src/governance.md CHANGED Viewed

@@ -66,9 +66,25 @@ Verification layers:
 - L3: E2E Simulation — known input → full pipeline → output comparison. Always required.
 - L4: Deploy Verify — production environment checks. Required when deploying.
-Layer requirements per US are determined by risk classification (§1c).
-Non-applicable layers must be explicitly marked "N/A — {reason}" in the test-spec.
-Any required layer section with TODO or blank = automatic Verifier FAIL.
+Layer requirements per US are determined by risk classification (§1c). Layer
+completeness means ACTUAL verification COVERAGE, not test-spec FORMAT: a required
+layer is satisfied when its behavior is genuinely exercised by a concrete, PASSING
+check — e.g. a per-AC command in the Criteria→Verification table counts as L1/L3
+coverage. Explicit "## L1/L2/L3" section headers and "N/A — {reason}" markers are
+recommended but NOT mandatory, and their absence is NOT itself a fail.
+FAIL a required layer ONLY when its verification is genuinely absent, blank, TODO,
+or failing — never merely for a different-but-adequate test-spec format, a missing
+N/A marker, or RED evidence aggregated across the US instead of recorded per-AC.
+The iter-signal.json only identifies WHICH US to verify; whether the Worker wrote
+it or the leader synthesized it does not change the verdict — verify the work, not
+the signal's author. Deliverable COMPLETENESS is NOT exempt: if the work an AC
+requires is absent, uncommitted/untracked, or never actually exercised, the layer
+still FAILs. This rule is IDENTICAL for every verifier engine (claude AND codex):
+do NOT block a PASS on test-spec FORMAT when the acceptance criteria are met and
+their checks are green — but DO fail on missing or unverified substance. This
+preserves strict AC + real layer coverage (IL-4 still governs test sufficiency);
+it removes only format pedantry, never substance — the F-17 cross-engine
+consistency fix (scope narrowed to format-only per the F-18 review).
 See §1d for full layer definitions.
 **IL-4: Test Sufficiency**