@ai-dev-methodologies/rlp-desk 0.17.0 → 0.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -11,6 +11,45 @@ For pre-v0.15.4 versions, refer to `git log` and individual GitHub release notes
11
11
  - Post-v0.15.6: remove `RLP_LIFECYCLE_METRICS` flag entirely (per plan v3 ADR follow-ups).
12
12
  - Phase D.1 (handoff documents) + Phase D.2 (per-stage agent role specialization) — both deferred per `docs/plans/v0.15.4-release-runbook.md` §7.6.
13
13
 
14
+ ## [0.18.0] — 2026-06-24
15
+
16
+ **Leader hardening: campaigns that complete.** The `--mode tmux` leader's decision
17
+ gates used to TERMINATE where they should RECOVER, so campaigns could stall at
18
+ `--max-iter` or block on a single transient slip. This release resolves that
19
+ "never completes" failure class.
20
+
21
+ ### Fixed
22
+ - A single transient `blocked` verdict no longer ends the campaign. A fresh-context
23
+ Worker/Verifier mis-emitting `blocked` once now gets grace (soft-fail + retry);
24
+ termination only on a genuine infra failure or a repeated/consecutive block.
25
+ - Verifier phrasing variants (`completed`/`done`, lowercase `all`) no longer strand a
26
+ genuinely-complete campaign at the iteration cap — recommendation/us_id normalized at read.
27
+ - Mixed-engine dead-pane detection: a dead pane is detected per role when Worker and
28
+ Verifier run different engines (e.g. claude worker + codex verifier).
29
+ - No double-credit of a re-submitted User Story on the verify pass-path.
30
+ - Final verification now retries a single transient poll failure instead of charging a
31
+ false fail at the end of the campaign.
32
+ - Durable sentinels: atomic writes detect truncation (disk-full/SIGPIPE) instead of
33
+ renaming a half-written file into the canonical path.
34
+ - Race-safe locks: the runner-lock and per-slug lock close an ABA/empty-owner race that
35
+ could let two leaders run on one project root.
36
+ - Done-claim freshness gate: auto-synthesis rejects a stale/wrong-US done-claim, preventing
37
+ the wrong User Story from being credited.
38
+ - Model-upgrade state survives relaunch: a resumed campaign keeps the upgraded worker model
39
+ instead of resetting to the CLI default.
40
+ - codex 0.141 worker compatibility: the directory-trust prompt is accepted and MCP disabled
41
+ so codex workers don't block on startup.
42
+
43
+ ### Changed
44
+ - Stronger final verification: `--final-verifier-model` / `--final-verifier-effort` are now
45
+ actually used for the final ALL-verify pass (previously every verify ran at the per-US
46
+ model). "Final = stricter" now holds.
47
+ - Consensus cross-verifier model knobs now take effect: `--consensus-model` (per-US, lighter)
48
+ and `--final-consensus-model` (final, stricter) are wired into the consensus path —
49
+ previously documented but inert.
50
+ - Verifier FORMAT meta-gates softened cross-engine while correctness gates stay strict
51
+ (fewer false fails on phrasing, no weakening of substance).
52
+
14
53
  ## [0.17.0] — 2026-06-19
15
54
 
16
55
  **BREAKING (ADR-001): `node run.mjs run <slug> --mode agent` (the deprecated Node-leader direct-CLI alpha) now hard-errors.** Per the schedule announced in 0.16.0, the `--mode agent` entry point exits 2 with a redirect, and the Node-CLI default mode flips from `agent` to `tmux` (the canonical production leader). The `src/node/**` engine modules are retained (they back the Native Agent() path and the test suite); only the direct-CLI `--mode agent` *dispatch entry* was removed. **Migration:** replace `node run.mjs run <slug> --mode agent` with `--mode tmux`. The slash command's `--mode native` default is unaffected.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ai-dev-methodologies/rlp-desk",
3
- "version": "0.17.0",
3
+ "version": "0.18.0",
4
4
  "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
5
5
  "scripts": {
6
6
  "postinstall": "node scripts/postinstall.js",
package/src/governance.md CHANGED
@@ -66,9 +66,25 @@ Verification layers:
66
66
  - L3: E2E Simulation — known input → full pipeline → output comparison. Always required.
67
67
  - L4: Deploy Verify — production environment checks. Required when deploying.
68
68
 
69
- Layer requirements per US are determined by risk classification (§1c).
70
- Non-applicable layers must be explicitly marked "N/A {reason}" in the test-spec.
71
- Any required layer section with TODO or blank = automatic Verifier FAIL.
69
+ Layer requirements per US are determined by risk classification (§1c). Layer
70
+ completeness means ACTUAL verification COVERAGE, not test-spec FORMAT: a required
71
+ layer is satisfied when its behavior is genuinely exercised by a concrete, PASSING
72
+ check — e.g. a per-AC command in the Criteria→Verification table counts as L1/L3
73
+ coverage. Explicit "## L1/L2/L3" section headers and "N/A — {reason}" markers are
74
+ recommended but NOT mandatory, and their absence is NOT itself a fail.
75
+ FAIL a required layer ONLY when its verification is genuinely absent, blank, TODO,
76
+ or failing — never merely for a different-but-adequate test-spec format, a missing
77
+ N/A marker, or RED evidence aggregated across the US instead of recorded per-AC.
78
+ The iter-signal.json only identifies WHICH US to verify; whether the Worker wrote
79
+ it or the leader synthesized it does not change the verdict — verify the work, not
80
+ the signal's author. Deliverable COMPLETENESS is NOT exempt: if the work an AC
81
+ requires is absent, uncommitted/untracked, or never actually exercised, the layer
82
+ still FAILs. This rule is IDENTICAL for every verifier engine (claude AND codex):
83
+ do NOT block a PASS on test-spec FORMAT when the acceptance criteria are met and
84
+ their checks are green — but DO fail on missing or unverified substance. This
85
+ preserves strict AC + real layer coverage (IL-4 still governs test sufficiency);
86
+ it removes only format pedantry, never substance — the F-17 cross-engine
87
+ consistency fix (scope narrowed to format-only per the F-18 review).
72
88
  See §1d for full layer definitions.
73
89
 
74
90
  **IL-4: Test Sufficiency**