@ai-dev-methodologies/rlp-desk 0.16.0 → 0.18.0
- package/CHANGELOG.md +54 -0
- package/docs/rlp-desk/getting-started.md +1 -1
- package/docs/rlp-desk/protocol-reference.md +1 -1
- package/package.json +5 -5
- package/scripts/uninstall.js +12 -0
- package/src/commands/rlp-desk.md +1 -1
- package/src/governance.md +20 -4
- package/src/node/run.mjs +25 -33
- package/src/node/runner/campaign-main-loop.mjs +1 -1
- package/src/scripts/.run_src_verify.zsh +3725 -0
- package/src/scripts/init_ralph_desk.zsh +4 -3
- package/src/scripts/lib_ralph_desk.zsh +126 -6
- package/src/scripts/run_ralph_desk.zsh +668 -126
- /package/examples/calculator/{.claude/ralph-desk → .rlp-desk}/context/loop-test-latest.md +0 -0
- /package/examples/calculator/{.claude/ralph-desk → .rlp-desk}/memos/loop-test-memory.md +0 -0
- /package/examples/calculator/{.claude/ralph-desk → .rlp-desk}/plans/prd-loop-test.md +0 -0
- /package/examples/calculator/{.claude/ralph-desk → .rlp-desk}/plans/test-spec-loop-test.md +0 -0
- /package/examples/calculator/{.claude/ralph-desk → .rlp-desk}/prompts/loop-test.verifier.prompt.md +0 -0
- /package/examples/calculator/{.claude/ralph-desk → .rlp-desk}/prompts/loop-test.worker.prompt.md +0 -0
@@ -599,7 +599,7 @@ Check the iter-signal.json "us_id" field:
 2. Read done claim
 3. Identify scope: run \`git diff --name-only\` to find changed files, then read those files + related imports only
 4. **Scope Lock check**: (a) Read the Next Iteration Contract from campaign memory to identify the contracted US. (b) Run \`git diff --name-only\` to list all changed files. (c) For each changed file, verify it is plausibly related to the contracted US's acceptance criteria. (d) Flag files that appear unrelated. (e) Shared infrastructure (types, configs, common utilities) and dependency files are permitted if the AC implies them.
-5. **Layer Enforcement**: check
+5. **Layer Enforcement (IL-3)**: confirm each REQUIRED layer is actually verified by a concrete PASSING check (a per-AC command in the Criteria-to-Verification table counts as L1/L3 coverage). Explicit "## L1/L2/L3" section headers and "N/A" markers are NOT mandatory — their absence is NOT a fail. FAIL a required layer ONLY when its verification is genuinely absent, blank, TODO, or failing — never for format alone. (Identical for claude AND codex.)
 6. Run fresh verification: execute ALL commands from test-spec verification layers (L1, L2, L3, L4 as applicable)
    **Skip detection (IL-5)**: After running tests, check output for "skip", "pending", "not run", or "0 items collected". Tests that did not actually execute do NOT count as passed. If test_count_executed < test_count_expected, verdict = FAIL ("skipped tests detected").
 7. Check each criterion against fresh evidence (only for the scoped US, or all if us_id=ALL)
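A minimal sketch of the skip-detection rule (IL-5) from the hunk above, written in portable shell. It assumes the test output has been captured to a file and that the expected and executed test counts are already known. The helper name `detect_skips`, its arguments, and the sample outputs are hypothetical. In the package this rule is prompt text followed by the Verifier, not a script.

```shell
# Hypothetical helper: returns 1 (FAIL) when the captured test output suggests
# skipped tests, or when fewer tests ran than were expected.
detect_skips() {
  local output_file="$1" expected="$2" executed="$3"

  # Markers named in the IL-5 rule, matched case-insensitively.
  if grep -qiE 'skip|pending|not run|0 items collected' "$output_file"; then
    echo "FAIL: skipped tests detected"
    return 1
  fi

  # Fewer executed than expected also counts as skipped.
  if [ "$executed" -lt "$expected" ]; then
    echo "FAIL: skipped tests detected ($executed of $expected executed)"
    return 1
  fi

  echo "PASS: $executed of $expected tests executed"
  return 0
}

# Demo with two made-up outputs (illustrative only).
demo_dir=$(mktemp -d)
printf '5 passed in 0.10s\n' > "$demo_dir/clean.txt"
printf '4 passed, 1 skipped in 0.10s\n' > "$demo_dir/skipped.txt"

clean_verdict=$(detect_skips "$demo_dir/clean.txt" 5 5) || true
skipped_verdict=$(detect_skips "$demo_dir/skipped.txt" 5 4) || true
rm -rf "$demo_dir"
```

The substring match is deliberately crude: a summary line such as `0 skipped` would also trigger it, so a real check would likely need runner-specific parsing.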
@@ -618,10 +618,11 @@ Check the iter-signal.json "us_id" field:
    - Rationalization red flags: "tests pass so it works" (passing ≠ correct), "Worker is confident" (confidence ≠ evidence), "changes are minimal" (scope ≠ correctness)
 10½. **Worker Process Audit**:
    - Test-first compliance: done-claim execution_steps must show write_test step before implement step for each AC
-   - RED phase evidence: at least one verify_red step with exit_code=1
+   - RED phase evidence: at least one verify_red step with exit_code=1 for the US (proves tests were written before passing). Per-AC RED is preferred, but AGGREGATE RED evidence is acceptable — do NOT FAIL merely because red/green is aggregated rather than per-AC.
    - Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "I'll test after", "already manually tested", "partial check")
    - Step completeness: each AC should have write_test → verify_red → implement → verify_green sequence in execution_steps
    - Planning Step presence: done-claim execution_steps should include a \`plan\` step as the first entry. If missing, record in reasoning as {"check": "Planning Step", "decision": "info", "basis": "plan step present/absent"} — informational only (does not affect pass/fail verdict)
+10¾. **FORMAT is not a PASS-blocker, but SUBSTANCE always is (F-17→F-18, identical for claude AND codex)**: when the acceptance criteria are met and their FRESH checks are green (per the Evidence Gate), record pure-FORMAT observations — missing layer-section headers, a missing N/A marker, RED evidence aggregated rather than per-AC — as warnings in reasoning, NOT as a FAIL. The iter-signal.json only identifies WHICH US to verify; its author (Worker vs leader-synthesized) does not change the verdict. But deliverable COMPLETENESS is NOT a format concern — if an AC's work is absent, uncommitted/untracked, or never actually exercised, that is a FAIL. The real correctness gates (Evidence Gate, Test Sufficiency IL-4, Skip detection IL-5, Anti-Gaming) stay strict regardless.
 11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
 12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
    **CRITICAL: You MUST write the verdict as a FILE (not stdout/echo/cat). The Leader polls this file path — terminal output is lost. Evidence strings: include key metrics and exit codes only, do NOT quote full command output or logs verbatim.**
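The "Forbidden shortcuts" audit in the hunk above could be approximated by a plain text scan. The sketch below assumes the done-claim is available as a file. The helper name `find_forbidden_phrases` and the sample claim text are hypothetical. The phrase list is copied from the prompt.

```shell
# Hypothetical helper: scans a done-claim file for the forbidden phrases listed
# in the Worker Process Audit. Prints each hit; returns 1 if any is found.
find_forbidden_phrases() {
  local claim_file="$1" phrase found=0
  while IFS= read -r phrase; do
    # -F: fixed string, -i: case-insensitive, -q: no output, exit status only
    if grep -qiF -- "$phrase" "$claim_file"; then
      echo "forbidden phrase: $phrase"
      found=1
    fi
  done <<'EOF'
code inspection
I'm confident
too simple
I'll test after
already manually tested
partial check
EOF
  return "$found"
}

# Demo with a made-up claim summary (illustrative only).
claim=$(mktemp)
printf 'summary: verified by code inspection, too simple to break\n' > "$claim"
hits=$(find_forbidden_phrases "$claim") || true
rm -f "$claim"
printf '%s\n' "$hits"
```

A match is only a signal for the Verifier to weigh. The prompt does not say that the phrase scan is mechanical, so treat this as one possible reading.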
@@ -761,7 +762,7 @@ Based on your decision, update campaign memory:
 current direction. The wrapper polls this field for autonomous
 multi-mission orchestration (rlp-desk does not auto-launch missions —
 the consumer wrapper owns that policy). Field is OPTIONAL; absence is
-treated as null. See docs/multi-mission-orchestration.md for the
+treated as null. See docs/rlp-desk/multi-mission-orchestration.md for the
 consumer-side polling pattern.
 FLYWHEEL_EOF

@@ -240,8 +240,93 @@ record_us_failure() {
 atomic_write() {
   local target="$1"
   local tmp="${target}.tmp.$$"
-
-
+  # F-26: check BOTH stages. A truncated tmp (ENOSPC / SIGPIPE / full disk) must
+  # never be atomically renamed into the canonical path — a half-written
+  # complete/blocked/status sentinel would otherwise pass existence checks and
+  # mis-drive (or falsely terminate) the campaign. On failure: drop the tmp,
+  # leave the existing target untouched, and signal the error to callers that
+  # check. Behaviour on success is unchanged.
+  if ! cat > "$tmp"; then
+    rm -f "$tmp" 2>/dev/null
+    return 1
+  fi
+  if ! mv "$tmp" "$target" 2>/dev/null; then
+    rm -f "$tmp" 2>/dev/null
+    return 1
+  fi
+  return 0
+}
+
+# =============================================================================
+# ZSH-4: race-safe per-slug lock acquisition (redesign, v0.17.1)
+# =============================================================================
+# Acquire an exclusive lock at $1 (a file holding the owner PID). Race-safe vs:
+# (a) two concurrent stale-lock recoverers,
+# (b) a normal starter slipping into the rm/create gap,
+# (c) a recovery mutex leaked by a crashed recoverer.
+# Algorithm: fast path is `set -C` (noclobber) atomic create. On contention with
+# a STALE (dead-owner) lock, recovery is serialized by an atomic `mkdir` mutex
+# whose own staleness is PID-based (never age-based, so a slow-but-alive recoverer
+# is never falsely reaped). Inside the mutex we re-read the lock (don't clobber a
+# live holder that recovered first) and re-acquire with `set -C` (so a starter
+# that grabbed the lock in the gap wins instead of us). Echoes nothing; returns:
+# 0 = acquired (caller should set LOCKFILE_ACQUIRED=1 and trap cleanup)
+# 1 = busy (a live instance holds the lock) OR lost a recovery race — caller exits
+acquire_slug_lock() {
+  local lockfile="$1"
+  mkdir -p "$(dirname "$lockfile")" 2>/dev/null
+  # Fast path: atomic noclobber create.
+  if (set -C; echo $$ > "$lockfile") 2>/dev/null; then
+    return 0
+  fi
+  local lock_pid
+  lock_pid=$(cat "$lockfile" 2>/dev/null)
+  if [[ -n "$lock_pid" ]] && kill -0 "$lock_pid" 2>/dev/null; then
+    return 1  # a live instance holds it
+  fi
+  # Stale lock (dead/unknown owner) — recover under an atomic mkdir mutex.
+  local rmutex="${lockfile}.recovery.d"
+  # Reap a leaked mutex ONLY when we can prove its owner is dead. An EMPTY owner
+  # is NOT proof of death: it usually means another recoverer just won the `mkdir`
+  # and has not yet written its PID (the window between `mkdir` and the owner
+  # write below). Reaping an empty-owner mutex here is a TOCTOU that deletes a
+  # LIVE mid-creation holder, letting two recoverers both proceed. So: a
+  # present-but-dead owner is reaped immediately; for an empty owner we give a
+  # brief settle window and re-read — if a PID appears it is a live holder and we
+  # do NOT reap (we lose the mkdir below and back off), and only if it stays
+  # empty do we treat it as a genuinely leaked mutex (creator died in the gap).
+  if [[ -d "$rmutex" ]]; then
+    local mowner
+    mowner=$(cat "$rmutex/owner" 2>/dev/null)
+    if [[ -z "$mowner" ]]; then
+      sleep 0.3
+      mowner=$(cat "$rmutex/owner" 2>/dev/null)
+    fi
+    if [[ -z "$mowner" ]] || ! kill -0 "$mowner" 2>/dev/null; then
+      rm -rf "$rmutex" 2>/dev/null
+    fi
+  fi
+  if ! mkdir "$rmutex" 2>/dev/null; then
+    return 1  # another recoverer owns the critical section
+  fi
+  echo $$ > "$rmutex/owner" 2>/dev/null
+  # Critical section: re-read the lock. If a prior recoverer installed a LIVE pid,
+  # do not clobber it.
+  local cur_pid
+  cur_pid=$(cat "$lockfile" 2>/dev/null)
+  if [[ -n "$cur_pid" && "$cur_pid" != "$$" ]] && kill -0 "$cur_pid" 2>/dev/null; then
+    rm -rf "$rmutex" 2>/dev/null
+    return 1
+  fi
+  # Replace the stale lock, re-acquiring with noclobber so a starter that slipped
+  # into the gap (and created the lock) wins — we lose cleanly instead of clobbering.
+  rm -f "$lockfile" 2>/dev/null
+  if ! (set -C; echo $$ > "$lockfile") 2>/dev/null; then
+    rm -rf "$rmutex" 2>/dev/null
+    return 1
+  fi
+  rm -rf "$rmutex" 2>/dev/null
+  return 0
 }

 # =============================================================================
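The lock protocol described in the ZSH-4 comment can be exercised with a condensed sketch. This is not the package's function: `acquire_lock_sketch` is a hypothetical name, it is written in portable shell rather than zsh, and it leaves out the leaked-mutex reaping and the empty-owner settle window. It keeps the three core moves: noclobber create, liveness check with `kill -0`, and recovery serialized by `mkdir`.

```shell
# Condensed, hypothetical sketch of the lock protocol (no leaked-mutex reaping).
acquire_lock_sketch() {
  local lockfile="$1" holder rmutex="$1.recovery.d"

  # Fast path: under noclobber (set -C) '>' fails if the file already exists,
  # so creating the file doubles as an atomic test-and-set.
  if (set -C; echo $$ > "$lockfile") 2>/dev/null; then
    return 0
  fi

  # Contended: a live owner means the lock is genuinely busy.
  holder=$(cat "$lockfile" 2>/dev/null)
  if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
    return 1
  fi

  # Stale lock: mkdir is atomic, so only one recoverer enters this section.
  mkdir "$rmutex" 2>/dev/null || return 1
  echo $$ > "$rmutex/owner"

  # Re-read inside the mutex so a live holder that recovered first is kept.
  holder=$(cat "$lockfile" 2>/dev/null)
  if [ -n "$holder" ] && [ "$holder" != "$$" ] && kill -0 "$holder" 2>/dev/null; then
    rm -rf "$rmutex"
    return 1
  fi

  # Remove the stale file, then re-create with noclobber so a starter that
  # slipped into the gap keeps the lock instead of being overwritten.
  rm -f "$lockfile"
  if (set -C; echo $$ > "$lockfile") 2>/dev/null; then
    rm -rf "$rmutex"
    return 0
  fi
  rm -rf "$rmutex"
  return 1
}

lock_dir=$(mktemp -d)
lock="$lock_dir/demo.lock"

first=0;  acquire_lock_sketch "$lock" || first=$?    # fresh lock
second=0; acquire_lock_sketch "$lock" || second=$?   # held by this live shell

# Simulate a crashed owner: record the PID of a child that has already exited.
true & dead_pid=$!
wait "$dead_pid"
echo "$dead_pid" > "$lock"
third=0; acquire_lock_sketch "$lock" || third=$?     # stale lock is recovered
owner=$(cat "$lock")
rm -rf "$lock_dir"
```

The demo assumes the exited child's PID is not reused before the third call. That is very likely on an idle system but is not guaranteed.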
@@ -587,6 +672,12 @@ update_status() {
     verified_us_json=$(echo "$VERIFIED_US" | tr ',' '\n' | jq -R . | jq -s .)
   fi

+  # D-5: jq-encode the free-text restore fields so a reason/model with special
+  # chars can't corrupt the status JSON (the rest of this builder is echo-based).
+  local _lbr_json _owm_json
+  _lbr_json=$(printf '%s' "${LAST_BLOCK_REASON:-}" | jq -Rs . 2>/dev/null); [[ -z "$_lbr_json" ]] && _lbr_json='""'
+  _owm_json=$(printf '%s' "${_ORIGINAL_WORKER_MODEL:-}" | jq -Rs . 2>/dev/null); [[ -z "$_owm_json" ]] && _owm_json='""'
+
   # Build consensus fields
   local consensus_json=""
   if [[ "$CONSENSUS_MODE" != "off" ]]; then
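The D-5 change relies on `jq -Rs .`, which reads the whole input as one raw string and prints it as a JSON string literal. A small sketch of why that matters for an echo-built JSON document follows. The helper name `json_string` and the sample reason are hypothetical. The empty-string fallback mirrors the guard in the hunk.

```shell
# Hypothetical helper: encode arbitrary text as a JSON string literal, falling
# back to "" when jq is missing or fails (same guard as the hunk above).
json_string() {
  local encoded
  encoded=$(printf '%s' "$1" | jq -Rs . 2>/dev/null)
  if [ -z "$encoded" ]; then
    encoded='""'
  fi
  printf '%s\n' "$encoded"
}

# A reason containing double quotes and a backslash, both of which would break
# a naive "'"$var"'" splice into the JSON text.
reason='verifier said "no": path C:\tmp'
reason_json=$(json_string "$reason")

# The encoded value already carries its own quotes, so it is spliced in bare.
status_json='{"last_block_reason": '"$reason_json"'}'
printf '%s\n' "$status_json"
```

With jq installed, the encoded value is `"verifier said \"no\": path C:\\tmp"`, so the surrounding object stays valid JSON.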
@@ -615,6 +706,11 @@ update_status() {
     "consensus_mode": "'"$CONSENSUS_MODE"'",
     "last_result": "'"$last_result"'",
     "consecutive_failures": '"$CONSECUTIVE_FAILURES"',
+    "consecutive_blocks": '"${CONSECUTIVE_BLOCKS:-0}"',
+    "last_block_reason": '"$_lbr_json"',
+    "model_upgraded": '"${_MODEL_UPGRADED:-0}"',
+    "same_us_fail_count": '"${_SAME_US_FAIL_COUNT:-0}"',
+    "original_worker_model": '"$_owm_json"',
     "verified_us": '"$verified_us_json"''"$consensus_json"',
     "updated_at_utc": "'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"
   }' | atomic_write "$STATUS_FILE"
@@ -756,9 +852,18 @@ _lint_test_density() {
   us_list=$(grep -oE '^##[[:space:]]+US-[0-9]+' "$prd_file" 2>/dev/null | grep -oE 'US-[0-9]+' | sort -u)
   [[ -z "$us_list" ]] && return 0

-
-
-
+  # ZSH-8: prefer the campaign LOGS_DIR. When it is unavailable, avoid a fixed,
+  # predictable /tmp name (insecure-temp: symlink/collision risk) by creating a
+  # unique temp file via mktemp; fall back to a PID-scoped name only if mktemp
+  # is missing.
+  local audit_dir="${LOGS_DIR:-}"
+  local audit_file
+  if [[ -n "$audit_dir" && -d "$audit_dir" ]]; then
+    audit_file="$audit_dir/test-density-audit.jsonl"
+  else
+    audit_file=$(mktemp "${TMPDIR:-/tmp}/test-density-audit.XXXXXX" 2>/dev/null) \
+      || audit_file="${TMPDIR:-/tmp}/test-density-audit.$$.jsonl"
+  fi

   local us
   for us in ${(f)us_list}; do
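The selection order in the ZSH-8 hunk (campaign log directory, then `mktemp`, then a PID-scoped name) can be tried in isolation. The wrapper name `pick_audit_file` is hypothetical. The file names and the `mktemp` template are taken from the hunk.

```shell
# Hypothetical wrapper around the selection order used in the hunk above.
pick_audit_file() {
  local logs_dir="$1" file
  if [ -n "$logs_dir" ] && [ -d "$logs_dir" ]; then
    # Campaign-owned directory: a fixed name is acceptable here.
    file="$logs_dir/test-density-audit.jsonl"
  else
    # mktemp creates the file itself and replaces XXXXXX with a random suffix,
    # so the path cannot be predicted or pre-planted as a symlink.
    file=$(mktemp "${TMPDIR:-/tmp}/test-density-audit.XXXXXX" 2>/dev/null) \
      || file="${TMPDIR:-/tmp}/test-density-audit.$$.jsonl"
  fi
  printf '%s\n' "$file"
}

logs=$(mktemp -d)
with_logs=$(pick_audit_file "$logs")
without_logs=$(pick_audit_file "")
printf '%s\n%s\n' "$with_logs" "$without_logs"

# Clean up the demo artifacts.
rm -f "$without_logs"
rm -rf "$logs"
```

The PID-scoped fallback is still a predictable name. It only narrows the collision window for systems without `mktemp`, which matches the intent stated in the hunk's comment.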
@@ -1179,6 +1284,11 @@ Summary: $summary
 Completed at iteration $ITERATION.

 Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" | atomic_write "$COMPLETE_SENTINEL"
+  # F-26: propagate atomic_write failure — never log success on a failed write.
+  if (( ${pipestatus[-1]:-0} != 0 )); then
+    log_error "FAILED to write COMPLETE sentinel ($COMPLETE_SENTINEL) — IO/disk error; completion NOT durably recorded"
+    return 1
+  fi
   log "COMPLETE sentinel written: $COMPLETE_SENTINEL"
 }

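The F-26 pattern (a two-stage write whose failure reaches the caller) can be reproduced with a small sketch. The names `atomic_write_sketch` and `write_sentinel` are hypothetical. The package reads zsh's `pipestatus[-1]`, while this portable version tests the pipeline's exit status directly. Without `pipefail` that status is the status of the last command in the pipeline.

```shell
# Sketch of the two-stage check: fail if either the temp write or the rename
# fails, and never leave a partial file at the target path.
atomic_write_sketch() {
  local target="$1" tmp="$1.tmp.$$"
  if ! cat > "$tmp"; then
    rm -f "$tmp" 2>/dev/null
    return 1
  fi
  if ! mv "$tmp" "$target" 2>/dev/null; then
    rm -f "$tmp" 2>/dev/null
    return 1
  fi
  return 0
}

# Hypothetical caller: report success only when the write really succeeded.
write_sentinel() {
  if ! { printf 'COMPLETE\n' | atomic_write_sketch "$1"; } 2>/dev/null; then
    echo "FAILED to write sentinel: $1"
    return 1
  fi
  echo "sentinel written: $1"
}

work=$(mktemp -d)
ok_msg=$(write_sentinel "$work/complete.md") || true
# The parent directory does not exist, so the temp-file stage fails.
bad_msg=$(write_sentinel "$work/missing-dir/complete.md") || true
content=$(cat "$work/complete.md")
rm -rf "$work"
printf '%s\n%s\n' "$ok_msg" "$bad_msg"
```

The missing directory stands in for the disk-full and IO errors named in the hunk's comments, which are harder to trigger on demand.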
@@ -1288,6 +1398,7 @@ write_blocked_sentinel() {
       suggested_action: $action,
       meta: { blocked_hygiene_violated: $hygiene }
     }' | atomic_write "$json_path"
+  local _bs_json_rc=${pipestatus[-1]:-0}

   echo "BLOCKED: $us_id
 Reason: $reason
@@ -1298,7 +1409,16 @@ Category: $category
 Blocked at iteration $ITERATION.

 Timestamp: $now_iso" | atomic_write "$BLOCKED_SENTINEL"
-
+  local _bs_md_rc=${pipestatus[-1]:-0}
+
+  # F-26: propagate atomic_write failure. The "markdown ⇒ JSON" invariant means a
+  # half-written sentinel must surface loudly, not log false success. (Best-effort
+  # signal: callers already `return 1` after this, so we log+return rather than
+  # restructure every caller.)
+  if (( _bs_md_rc != 0 || _bs_json_rc != 0 )); then
+    log_error "FAILED to durably write BLOCKED sentinel (md_rc=$_bs_md_rc json_rc=$_bs_json_rc) for [$category] $reason — IO/disk error"
+    return 1
+  fi
   log_error "Campaign BLOCKED: [$category] $reason"
   log "BLOCKED sentinel written: $BLOCKED_SENTINEL"
   log "BLOCKED sidecar written: $json_path"