@ai-dev-methodologies/rlp-desk 0.17.0 → 0.18.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +49 -0
- package/package.json +1 -1
- package/src/governance.md +19 -3
- package/src/scripts/.run_src_verify.zsh +3725 -0
- package/src/scripts/init_ralph_desk.zsh +3 -2
- package/src/scripts/lib_ralph_desk.zsh +114 -3
- package/src/scripts/run_ralph_desk.zsh +714 -131

package/src/scripts/init_ralph_desk.zsh

```diff
@@ -599,7 +599,7 @@ Check the iter-signal.json "us_id" field:
 2. Read done claim
 3. Identify scope: run \`git diff --name-only\` to find changed files, then read those files + related imports only
 4. **Scope Lock check**: (a) Read the Next Iteration Contract from campaign memory to identify the contracted US. (b) Run \`git diff --name-only\` to list all changed files. (c) For each changed file, verify it is plausibly related to the contracted US's acceptance criteria. (d) Flag files that appear unrelated. (e) Shared infrastructure (types, configs, common utilities) and dependency files are permitted if the AC implies them.
-5. **Layer Enforcement**: check
+5. **Layer Enforcement (IL-3)**: confirm each REQUIRED layer is actually verified by a concrete PASSING check (a per-AC command in the Criteria-to-Verification table counts as L1/L3 coverage). Explicit "## L1/L2/L3" section headers and "N/A" markers are NOT mandatory — their absence is NOT a fail. FAIL a required layer ONLY when its verification is genuinely absent, blank, TODO, or failing — never for format alone. (Identical for claude AND codex.)
 6. Run fresh verification: execute ALL commands from test-spec verification layers (L1, L2, L3, L4 as applicable)
 **Skip detection (IL-5)**: After running tests, check output for "skip", "pending", "not run", or "0 items collected". Tests that did not actually execute do NOT count as passed. If test_count_executed < test_count_expected, verdict = FAIL ("skipped tests detected").
 7. Check each criterion against fresh evidence (only for the scoped US, or all if us_id=ALL)
```
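The IL-5 rule is mechanical enough to sketch in shell. The function below is a hypothetical illustration, not code from rlp-desk: its name, its argument order (captured test output, then the executed and expected counts), and the case-insensitive substring match are all assumptions. Counting executed and expected tests is left to the caller, because that depends on the test runner.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the IL-5 skip-detection rule (not rlp-desk's code).
# Usage: detect_skipped_tests OUTPUT TEST_COUNT_EXECUTED TEST_COUNT_EXPECTED
# Prints PASS or FAIL with a reason; returns 0 on PASS, 1 on FAIL.
detect_skipped_tests() {
  local output="$1" test_count_executed="$2" test_count_expected="$3"

  # Markers named by IL-5, matched case-insensitively as plain substrings.
  if printf '%s\n' "$output" | grep -Eqi 'skip|pending|not run|0 items collected'; then
    echo "FAIL (skipped tests detected: marker in output)"
    return 1
  fi

  # Tests that did not actually execute do not count as passed.
  if (( test_count_executed < test_count_expected )); then
    echo "FAIL (skipped tests detected: ${test_count_executed}/${test_count_expected} executed)"
    return 1
  fi

  echo "PASS"
}

detect_skipped_tests "12 passed in 0.40s" 12 12
if ! detect_skipped_tests "10 passed, 2 skipped in 0.31s" 10 12; then
  echo "verdict = FAIL"
fi
```

A plain substring match is deliberately conservative: a summary such as `0 skipped` would also trip it, so a real check would probably need runner-specific parsing.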
```diff
@@ -618,10 +618,11 @@ Check the iter-signal.json "us_id" field:
 - Rationalization red flags: "tests pass so it works" (passing ≠ correct), "Worker is confident" (confidence ≠ evidence), "changes are minimal" (scope ≠ correctness)
 10½. **Worker Process Audit**:
 - Test-first compliance: done-claim execution_steps must show write_test step before implement step for each AC
-- RED phase evidence: at least one verify_red step with exit_code=1
+- RED phase evidence: at least one verify_red step with exit_code=1 for the US (proves tests were written before passing). Per-AC RED is preferred, but AGGREGATE RED evidence is acceptable — do NOT FAIL merely because red/green is aggregated rather than per-AC.
 - Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "I'll test after", "already manually tested", "partial check")
 - Step completeness: each AC should have write_test → verify_red → implement → verify_green sequence in execution_steps
 - Planning Step presence: done-claim execution_steps should include a \`plan\` step as the first entry. If missing, record in reasoning as {"check": "Planning Step", "decision": "info", "basis": "plan step present/absent"} — informational only (does not affect pass/fail verdict)
+10¾. **FORMAT is not a PASS-blocker, but SUBSTANCE always is (F-17→F-18, identical for claude AND codex)**: when the acceptance criteria are met and their FRESH checks are green (per the Evidence Gate), record pure-FORMAT observations — missing layer-section headers, a missing N/A marker, RED evidence aggregated rather than per-AC — as warnings in reasoning, NOT as a FAIL. The iter-signal.json only identifies WHICH US to verify; its author (Worker vs leader-synthesized) does not change the verdict. But deliverable COMPLETENESS is NOT a format concern — if an AC's work is absent, uncommitted/untracked, or never actually exercised, that is a FAIL. The real correctness gates (Evidence Gate, Test Sufficiency IL-4, Skip detection IL-5, Anti-Gaming) stay strict regardless.
 11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
 12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
 **CRITICAL: You MUST write the verdict as a FILE (not stdout/echo/cat). The Leader polls this file path — terminal output is lost. Evidence strings: include key metrics and exit codes only, do NOT quote full command output or logs verbatim.**
```
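The Step completeness and RED phase evidence bullets can be checked in the same mechanical way. This sketch assumes, hypothetically, that the done-claim's `execution_steps` for one AC have already been flattened into `type:exit_code` tokens (for example with jq); the function names are illustrative. The RED check accepts a single `verify_red:1` anywhere in the US, matching the new aggregate-RED wording.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two Worker Process Audit checks (not rlp-desk's code).
# Assumes each execution_steps entry was flattened to a "type:exit_code" token,
# e.g. "verify_red:1", before these functions are called.

# Step completeness for one AC: write_test -> verify_red -> implement ->
# verify_green must appear in that order; other steps (e.g. plan) may interleave.
steps_in_order() {
  local remaining="write_test verify_red implement verify_green"
  local token next
  for token in "$@"; do
    next=${remaining%% *}                 # first step not yet matched
    if [[ "${token%%:*}" == "$next" ]]; then
      [[ "$remaining" == "$next" ]] && return 0   # matched the final step
      remaining=${remaining#* }           # drop the matched step
    fi
  done
  return 1
}

# RED evidence: at least one verify_red step for the US exited with 1.
# One aggregate RED run covering several ACs is enough.
has_red_evidence() {
  local token
  for token in "$@"; do
    [[ "$token" == "verify_red:1" ]] && return 0
  done
  return 1
}

steps=(plan:0 write_test:0 verify_red:1 implement:0 verify_green:0)
steps_in_order "${steps[@]}" && echo "step order: ok"
has_red_evidence "${steps[@]}" && echo "RED evidence: ok"
```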

package/src/scripts/lib_ralph_desk.zsh

```diff
@@ -240,8 +240,93 @@ record_us_failure() {
 atomic_write() {
   local target="$1"
   local tmp="${target}.tmp.$$"
-
-
+  # F-26: check BOTH stages. A truncated tmp (ENOSPC / SIGPIPE / full disk) must
+  # never be atomically renamed into the canonical path — a half-written
+  # complete/blocked/status sentinel would otherwise pass existence checks and
+  # mis-drive (or falsely terminate) the campaign. On failure: drop the tmp,
+  # leave the existing target untouched, and signal the error to callers that
+  # check. Behaviour on success is unchanged.
+  if ! cat > "$tmp"; then
+    rm -f "$tmp" 2>/dev/null
+    return 1
+  fi
+  if ! mv "$tmp" "$target" 2>/dev/null; then
+    rm -f "$tmp" 2>/dev/null
+    return 1
+  fi
+  return 0
+}
```
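The F-26 failure path is easy to observe in isolation. The block below copies the new `atomic_write` from the hunk above (it runs unchanged under bash) and drives it with a throwaway temporary directory; the directory and file names exist only for the demonstration. Writing into a directory that does not exist makes `cat > "$tmp"` fail, so the function returns 1 and leaves no tmp file behind.

```shell
#!/usr/bin/env bash
# atomic_write as added in the hunk above: copy stdin to a tmp file, then
# rename it over the target only if the copy succeeded.
atomic_write() {
  local target="$1"
  local tmp="${target}.tmp.$$"
  if ! cat > "$tmp"; then
    rm -f "$tmp" 2>/dev/null
    return 1
  fi
  if ! mv "$tmp" "$target" 2>/dev/null; then
    rm -f "$tmp" 2>/dev/null
    return 1
  fi
  return 0
}

demo_dir=$(mktemp -d)

# Success: the target appears with the full content.
printf 'status: ok\n' | atomic_write "$demo_dir/status.txt"
echo "success rc=$?"

# Failure: the parent directory does not exist, so the tmp file cannot be
# opened; the function returns 1 and nothing is renamed into place.
printf 'status: ok\n' | atomic_write "$demo_dir/missing/status.txt" 2>/dev/null
echo "failure rc=$?"
```

A genuine short write (ENOSPC, SIGPIPE) is harder to script, but it takes the same branch: either stage failing now yields a non-zero return instead of a renamed, possibly truncated file.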
```diff
+
+# =============================================================================
+# ZSH-4: race-safe per-slug lock acquisition (redesign, v0.17.1)
+# =============================================================================
+# Acquire an exclusive lock at $1 (a file holding the owner PID). Race-safe vs:
+# (a) two concurrent stale-lock recoverers,
+# (b) a normal starter slipping into the rm/create gap,
+# (c) a recovery mutex leaked by a crashed recoverer.
+# Algorithm: fast path is `set -C` (noclobber) atomic create. On contention with
+# a STALE (dead-owner) lock, recovery is serialized by an atomic `mkdir` mutex
+# whose own staleness is PID-based (never age-based, so a slow-but-alive recoverer
+# is never falsely reaped). Inside the mutex we re-read the lock (don't clobber a
+# live holder that recovered first) and re-acquire with `set -C` (so a starter
+# that grabbed the lock in the gap wins instead of us). Echoes nothing; returns:
+# 0 = acquired (caller should set LOCKFILE_ACQUIRED=1 and trap cleanup)
+# 1 = busy (a live instance holds the lock) OR lost a recovery race — caller exits
+acquire_slug_lock() {
+  local lockfile="$1"
+  mkdir -p "$(dirname "$lockfile")" 2>/dev/null
+  # Fast path: atomic noclobber create.
+  if (set -C; echo $$ > "$lockfile") 2>/dev/null; then
+    return 0
+  fi
+  local lock_pid
+  lock_pid=$(cat "$lockfile" 2>/dev/null)
+  if [[ -n "$lock_pid" ]] && kill -0 "$lock_pid" 2>/dev/null; then
+    return 1 # a live instance holds it
+  fi
+  # Stale lock (dead/unknown owner) — recover under an atomic mkdir mutex.
+  local rmutex="${lockfile}.recovery.d"
+  # Reap a leaked mutex ONLY when we can prove its owner is dead. An EMPTY owner
+  # is NOT proof of death: it usually means another recoverer just won the `mkdir`
+  # and has not yet written its PID (the window between `mkdir` and the owner
+  # write below). Reaping an empty-owner mutex here is a TOCTOU that deletes a
+  # LIVE mid-creation holder, letting two recoverers both proceed. So: a
+  # present-but-dead owner is reaped immediately; for an empty owner we give a
+  # brief settle window and re-read — if a PID appears it is a live holder and we
+  # do NOT reap (we lose the mkdir below and back off), and only if it stays
+  # empty do we treat it as a genuinely leaked mutex (creator died in the gap).
+  if [[ -d "$rmutex" ]]; then
+    local mowner
+    mowner=$(cat "$rmutex/owner" 2>/dev/null)
+    if [[ -z "$mowner" ]]; then
+      sleep 0.3
+      mowner=$(cat "$rmutex/owner" 2>/dev/null)
+    fi
+    if [[ -z "$mowner" ]] || ! kill -0 "$mowner" 2>/dev/null; then
+      rm -rf "$rmutex" 2>/dev/null
+    fi
+  fi
+  if ! mkdir "$rmutex" 2>/dev/null; then
+    return 1 # another recoverer owns the critical section
+  fi
+  echo $$ > "$rmutex/owner" 2>/dev/null
+  # Critical section: re-read the lock. If a prior recoverer installed a LIVE pid,
+  # do not clobber it.
+  local cur_pid
+  cur_pid=$(cat "$lockfile" 2>/dev/null)
+  if [[ -n "$cur_pid" && "$cur_pid" != "$$" ]] && kill -0 "$cur_pid" 2>/dev/null; then
+    rm -rf "$rmutex" 2>/dev/null
+    return 1
+  fi
+  # Replace the stale lock, re-acquiring with noclobber so a starter that slipped
+  # into the gap (and created the lock) wins — we lose cleanly instead of clobbering.
+  rm -f "$lockfile" 2>/dev/null
+  if ! (set -C; echo $$ > "$lockfile") 2>/dev/null; then
+    rm -rf "$rmutex" 2>/dev/null
+    return 1
+  fi
+  rm -rf "$rmutex" 2>/dev/null
+  return 0
 }
 
 # =============================================================================
```
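The return-code contract above (0 = acquired, 1 = busy or lost a race) implies a caller pattern: set `LOCKFILE_ACQUIRED=1` and register cleanup only after a successful acquire. The sketch below reproduces just the noclobber fast path and the `kill -0` liveness check in a hypothetical `try_lock`. It deliberately omits the `mkdir` recovery mutex, so unlike `acquire_slug_lock` it is not safe against two concurrent stale-lock recoverers. The lock path and trap body are illustrative.

```shell
#!/usr/bin/env bash
# Simplified sketch (hypothetical helper try_lock): the noclobber fast path and
# kill -0 liveness check of acquire_slug_lock, WITHOUT its mkdir recovery mutex.
try_lock() {
  local lockfile="$1" owner
  # With noclobber (set -C), '>' refuses to open an existing file, so the
  # existence check and the create happen as one step.
  if (set -C; echo $$ > "$lockfile") 2>/dev/null; then
    return 0
  fi
  owner=$(cat "$lockfile" 2>/dev/null)
  if [[ -n "$owner" ]] && kill -0 "$owner" 2>/dev/null; then
    return 1                    # busy: a live process holds the lock
  fi
  # Stale lock: remove it and retry the noclobber create, so a process that
  # created the lock in the meantime wins instead of being clobbered.
  rm -f "$lockfile"
  (set -C; echo $$ > "$lockfile") 2>/dev/null
}

lock_dir=$(mktemp -d)
lock="$lock_dir/demo.lock"
LOCKFILE_ACQUIRED=0
if try_lock "$lock"; then
  LOCKFILE_ACQUIRED=1
  # Clean up on exit, but only if this process acquired the lock.
  trap 'if (( LOCKFILE_ACQUIRED )); then rm -rf "$lock_dir"; fi' EXIT
fi
echo "acquired=$LOCKFILE_ACQUIRED"
```

The omitted mutex is what closes race (a): without it, two processes that both see a dead owner can each `rm` the lock, and the later `rm` can delete the lock the earlier one has just created, leaving both believing they hold it.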
```diff
@@ -587,6 +672,12 @@ update_status() {
     verified_us_json=$(echo "$VERIFIED_US" | tr ',' '\n' | jq -R . | jq -s .)
   fi
 
+  # D-5: jq-encode the free-text restore fields so a reason/model with special
+  # chars can't corrupt the status JSON (the rest of this builder is echo-based).
+  local _lbr_json _owm_json
+  _lbr_json=$(printf '%s' "${LAST_BLOCK_REASON:-}" | jq -Rs . 2>/dev/null); [[ -z "$_lbr_json" ]] && _lbr_json='""'
+  _owm_json=$(printf '%s' "${_ORIGINAL_WORKER_MODEL:-}" | jq -Rs . 2>/dev/null); [[ -z "$_owm_json" ]] && _owm_json='""'
+
   # Build consensus fields
   local consensus_json=""
   if [[ "$CONSENSUS_MODE" != "off" ]]; then
```
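D-5 matters because the rest of `update_status` builds JSON by splicing shell strings. The sketch below assumes `jq` is on `PATH` (the status builder already depends on it) and contrasts a naive splice of a reason containing quotes and a newline with the `jq -Rs .` encoding used above. Only `LAST_BLOCK_REASON` comes from the diff; the reason text is invented for the demonstration.

```shell
#!/usr/bin/env bash
# Requires jq. Compares splicing raw text into echo-built JSON with the
# D-5 approach of encoding it first via `jq -Rs .`.
LAST_BLOCK_REASON='verifier said "tests missing"
and stopped'

# Naive splice: the embedded quotes and newline make the JSON invalid.
naive='{"last_block_reason": "'"$LAST_BLOCK_REASON"'"}'

# D-5 style: -R reads raw text, -s slurps it into ONE string, and `.` prints it
# as an escaped JSON string literal; fall back to "" if jq produced nothing.
lbr_json=$(printf '%s' "${LAST_BLOCK_REASON:-}" | jq -Rs . 2>/dev/null)
[[ -z "$lbr_json" ]] && lbr_json='""'
encoded='{"last_block_reason": '"$lbr_json"'}'

if printf '%s' "$naive" | jq . >/dev/null 2>&1; then
  echo "naive: valid"
else
  echo "naive: invalid JSON"
fi
printf '%s' "$encoded" | jq -r .last_block_reason
```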
```diff
@@ -615,6 +706,11 @@ update_status() {
 "consensus_mode": "'"$CONSENSUS_MODE"'",
 "last_result": "'"$last_result"'",
 "consecutive_failures": '"$CONSECUTIVE_FAILURES"',
+"consecutive_blocks": '"${CONSECUTIVE_BLOCKS:-0}"',
+"last_block_reason": '"$_lbr_json"',
+"model_upgraded": '"${_MODEL_UPGRADED:-0}"',
+"same_us_fail_count": '"${_SAME_US_FAIL_COUNT:-0}"',
+"original_worker_model": '"$_owm_json"',
 "verified_us": '"$verified_us_json"''"$consensus_json"',
 "updated_at_utc": "'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"
 }' | atomic_write "$STATUS_FILE"
```
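The five new fields let counters such as `consecutive_blocks` persist in the status file, but this diff does not show the code that reads them back. One hypothetical way to restore them, assuming jq, is jq's `//` alternative operator, so a status file written before these fields existed still yields `0` or an empty string. The function name and sample file contents are illustrative; only the field and variable names come from the hunk.

```shell
#!/usr/bin/env bash
# Requires jq. Hypothetical restore of the counters added to the status JSON.
# `//` substitutes the default when a field is missing (or null/false).
restore_status_counters() {
  local status_file="$1"
  CONSECUTIVE_BLOCKS=$(jq -r '.consecutive_blocks // 0' "$status_file")
  LAST_BLOCK_REASON=$(jq -r '.last_block_reason // ""' "$status_file")
  _MODEL_UPGRADED=$(jq -r '.model_upgraded // 0' "$status_file")
  _SAME_US_FAIL_COUNT=$(jq -r '.same_us_fail_count // 0' "$status_file")
  _ORIGINAL_WORKER_MODEL=$(jq -r '.original_worker_model // ""' "$status_file")
}

status_file=$(mktemp)
# A minimal status file WITHOUT the new fields (illustrative content).
printf '%s\n' '{"consecutive_failures": 2, "verified_us": []}' > "$status_file"
restore_status_counters "$status_file"
echo "blocks=$CONSECUTIVE_BLOCKS upgraded=$_MODEL_UPGRADED reason='$LAST_BLOCK_REASON'"
```

Unlike a shell `${VAR:-0}` default, `//` substitutes only for a missing, `null`, or `false` value, so a stored `0` is read back as `0`.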
```diff
@@ -1188,6 +1284,11 @@ Summary: $summary
 Completed at iteration $ITERATION.
 
 Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" | atomic_write "$COMPLETE_SENTINEL"
+  # F-26: propagate atomic_write failure — never log success on a failed write.
+  if (( ${pipestatus[-1]:-0} != 0 )); then
+    log_error "FAILED to write COMPLETE sentinel ($COMPLETE_SENTINEL) — IO/disk error; completion NOT durably recorded"
+    return 1
+  fi
   log "COMPLETE sentinel written: $COMPLETE_SENTINEL"
 }
 
```
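The F-26 change to the COMPLETE sentinel depends on reading the writer's status immediately after the pipeline. In zsh, `pipestatus` holds one exit status per pipeline command and `${pipestatus[-1]}` is the last one; without `pipefail` that equals plain `$?`, which is why this bash-runnable sketch uses `$?`. (With `pipefail` set, `$?` would also reflect a failure earlier in the pipeline, one reason to read the writer's own entry.) The function and stub writers are hypothetical stand-ins for the real sentinel writer and `atomic_write`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the F-26 caller pattern: report success only when
# the writer at the end of the pipeline succeeded.
ok_writer()      { cat > /dev/null; }              # stand-in for a working atomic_write
failing_writer() { cat > /dev/null; return 1; }    # stand-in for a failed write (e.g. disk full)

write_sentinel() {
  local writer="$1" rc
  printf 'Completed.\nTimestamp: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" | "$writer"
  rc=$?   # read at once: the next command would overwrite the status
  if (( rc != 0 )); then
    echo "ERROR: sentinel NOT written (rc=$rc)" >&2
    return 1
  fi
  echo "sentinel written"
}

write_sentinel ok_writer
write_sentinel failing_writer || echo "caller sees the failure"
```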
```diff
@@ -1297,6 +1398,7 @@ write_blocked_sentinel() {
 suggested_action: $action,
 meta: { blocked_hygiene_violated: $hygiene }
 }' | atomic_write "$json_path"
+  local _bs_json_rc=${pipestatus[-1]:-0}
 
   echo "BLOCKED: $us_id
 Reason: $reason
```
```diff
@@ -1307,7 +1409,16 @@ Category: $category
 Blocked at iteration $ITERATION.
 
 Timestamp: $now_iso" | atomic_write "$BLOCKED_SENTINEL"
-
+  local _bs_md_rc=${pipestatus[-1]:-0}
+
+  # F-26: propagate atomic_write failure. The "markdown ⇒ JSON" invariant means a
+  # half-written sentinel must surface loudly, not log false success. (Best-effort
+  # signal: callers already `return 1` after this, so we log+return rather than
+  # restructure every caller.)
+  if (( _bs_md_rc != 0 || _bs_json_rc != 0 )); then
+    log_error "FAILED to durably write BLOCKED sentinel (md_rc=$_bs_md_rc json_rc=$_bs_json_rc) for [$category] $reason — IO/disk error"
+    return 1
+  fi
   log_error "Campaign BLOCKED: [$category] $reason"
   log "BLOCKED sentinel written: $BLOCKED_SENTINEL"
   log "BLOCKED sidecar written: $json_path"
```
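`write_blocked_sentinel` now performs two writes, the JSON sidecar and then the markdown sentinel, and checks both only at the end. Each status has to be saved right after its own pipeline, as `_bs_json_rc` and `_bs_md_rc` are, because the next command overwrites it. A hypothetical sketch of that pattern, with stub writers standing in for `atomic_write`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: two pipelines, each exit status saved immediately,
# one combined check at the end (mirrors _bs_json_rc / _bs_md_rc above).
write_pair() {
  local json_writer="$1" md_writer="$2" json_rc md_rc
  printf '{"blocked": true}\n' | "$json_writer"
  json_rc=$?                      # save before the next command resets $?
  printf 'BLOCKED\n' | "$md_writer"
  md_rc=$?
  if (( md_rc != 0 || json_rc != 0 )); then
    echo "FAILED (md_rc=$md_rc json_rc=$json_rc)" >&2
    return 1
  fi
  echo "both written"
}

ok_writer()   { cat > /dev/null; }             # stand-in for a working atomic_write
fail_writer() { cat > /dev/null; return 1; }   # stand-in for a failed write

write_pair ok_writer ok_writer
write_pair fail_writer ok_writer || echo "caller sees the failure"
```

Because the markdown write is still attempted after a failed JSON write, a single log line can report both codes, which is what the `md_rc=… json_rc=…` message in the hunk does.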