npm - @ai-dev-methodologies/rlp-desk - Versions diffs - 0.16.0 → 0.18.0 - Mend

@ai-dev-methodologies/rlp-desk 0.16.0 → 0.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19) hide show

package/src/scripts/init_ralph_desk.zsh CHANGED Viewed

@@ -599,7 +599,7 @@ Check the iter-signal.json "us_id" field:
 2. Read done claim
 3. Identify scope: run \`git diff --name-only\` to find changed files, then read those files + related imports only
 4. **Scope Lock check**: (a) Read the Next Iteration Contract from campaign memory to identify the contracted US. (b) Run \`git diff --name-only\` to list all changed files. (c) For each changed file, verify it is plausibly related to the contracted US's acceptance criteria. (d) Flag files that appear unrelated. (e) Shared infrastructure (types, configs, common utilities) and dependency files are permitted if the AC implies them.
-5. **Layer Enforcement**: check test-spec L1/L2/L3/L4 sections. ANY section with TODO or blank = FAIL (IL-3).
+5. **Layer Enforcement (IL-3)**: confirm each REQUIRED layer is actually verified by a concrete PASSING check (a per-AC command in the Criteria-to-Verification table counts as L1/L3 coverage). Explicit "## L1/L2/L3" section headers and "N/A" markers are NOT mandatory — their absence is NOT a fail. FAIL a required layer ONLY when its verification is genuinely absent, blank, TODO, or failing — never for format alone. (Identical for claude AND codex.)
 6. Run fresh verification: execute ALL commands from test-spec verification layers (L1, L2, L3, L4 as applicable)
    **Skip detection (IL-5)**: After running tests, check output for "skip", "pending", "not run", or "0 items collected". Tests that did not actually execute do NOT count as passed. If test_count_executed < test_count_expected, verdict = FAIL ("skipped tests detected").
 7. Check each criterion against fresh evidence (only for the scoped US, or all if us_id=ALL)
@@ -618,10 +618,11 @@ Check the iter-signal.json "us_id" field:
    - Rationalization red flags: "tests pass so it works" (passing ≠ correct), "Worker is confident" (confidence ≠ evidence), "changes are minimal" (scope ≠ correctness)
 10½. **Worker Process Audit**:
    - Test-first compliance: done-claim execution_steps must show write_test step before implement step for each AC
-   - RED phase evidence: at least one verify_red step with exit_code=1 per AC (proves tests were written before passing)
+   - RED phase evidence: at least one verify_red step with exit_code=1 for the US (proves tests were written before passing). Per-AC RED is preferred, but AGGREGATE RED evidence is acceptable — do NOT FAIL merely because red/green is aggregated rather than per-AC.
    - Forbidden shortcuts: check done-claim claims and summary for forbidden phrases ("code inspection", "I'm confident", "too simple", "I'll test after", "already manually tested", "partial check")
    - Step completeness: each AC should have write_test → verify_red → implement → verify_green sequence in execution_steps
    - Planning Step presence: done-claim execution_steps should include a \`plan\` step as the first entry. If missing, record in reasoning as {"check": "Planning Step", "decision": "info", "basis": "plan step present/absent"} — informational only (does not affect pass/fail verdict)
+10¾. **FORMAT is not a PASS-blocker, but SUBSTANCE always is (F-17→F-18, identical for claude AND codex)**: when the acceptance criteria are met and their FRESH checks are green (per the Evidence Gate), record pure-FORMAT observations — missing layer-section headers, a missing N/A marker, RED evidence aggregated rather than per-AC — as warnings in reasoning, NOT as a FAIL. The iter-signal.json only identifies WHICH US to verify; its author (Worker vs leader-synthesized) does not change the verdict. But deliverable COMPLETENESS is NOT a format concern — if an AC's work is absent, uncommitted/untracked, or never actually exercised, that is a FAIL. The real correctness gates (Evidence Gate, Test Sufficiency IL-4, Skip detection IL-5, Anti-Gaming) stay strict regardless.
 11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
 12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
     **CRITICAL: You MUST write the verdict as a FILE (not stdout/echo/cat). The Leader polls this file path — terminal output is lost. Evidence strings: include key metrics and exit codes only, do NOT quote full command output or logs verbatim.**
@@ -761,7 +762,7 @@ Based on your decision, update campaign memory:
      current direction. The wrapper polls this field for autonomous
      multi-mission orchestration (rlp-desk does not auto-launch missions —
      the consumer wrapper owns that policy). Field is OPTIONAL; absence is
-     treated as null. See docs/multi-mission-orchestration.md for the
+     treated as null. See docs/rlp-desk/multi-mission-orchestration.md for the
      consumer-side polling pattern.
 FLYWHEEL_EOF

package/src/scripts/lib_ralph_desk.zsh CHANGED Viewed

@@ -240,8 +240,93 @@ record_us_failure() {
 atomic_write() {
   local target="$1"
   local tmp="${target}.tmp.$$"
-  cat > "$tmp"
-  mv "$tmp" "$target"
+  # F-26: check BOTH stages. A truncated tmp (ENOSPC / SIGPIPE / full disk) must
+  # never be atomically renamed into the canonical path — a half-written
+  # complete/blocked/status sentinel would otherwise pass existence checks and
+  # mis-drive (or falsely terminate) the campaign. On failure: drop the tmp,
+  # leave the existing target untouched, and signal the error to callers that
+  # check. Behaviour on success is unchanged.
+  if ! cat > "$tmp"; then
+    rm -f "$tmp" 2>/dev/null
+    return 1
+  fi
+  if ! mv "$tmp" "$target" 2>/dev/null; then
+    rm -f "$tmp" 2>/dev/null
+    return 1
+  fi
+  return 0
+}
+# =============================================================================
+# ZSH-4: race-safe per-slug lock acquisition (redesign, v0.17.1)
+# =============================================================================
+# Acquire an exclusive lock at $1 (a file holding the owner PID). Race-safe vs:
+#   (a) two concurrent stale-lock recoverers,
+#   (b) a normal starter slipping into the rm/create gap,
+#   (c) a recovery mutex leaked by a crashed recoverer.
+# Algorithm: fast path is `set -C` (noclobber) atomic create. On contention with
+# a STALE (dead-owner) lock, recovery is serialized by an atomic `mkdir` mutex
+# whose own staleness is PID-based (never age-based, so a slow-but-alive recoverer
+# is never falsely reaped). Inside the mutex we re-read the lock (don't clobber a
+# live holder that recovered first) and re-acquire with `set -C` (so a starter
+# that grabbed the lock in the gap wins instead of us). Echoes nothing; returns:
+#   0 = acquired (caller should set LOCKFILE_ACQUIRED=1 and trap cleanup)
+#   1 = busy (a live instance holds the lock) OR lost a recovery race — caller exits
+acquire_slug_lock() {
+  local lockfile="$1"
+  mkdir -p "$(dirname "$lockfile")" 2>/dev/null
+  # Fast path: atomic noclobber create.
+  if (set -C; echo $$ > "$lockfile") 2>/dev/null; then
+    return 0
+  fi
+  local lock_pid
+  lock_pid=$(cat "$lockfile" 2>/dev/null)
+  if [[ -n "$lock_pid" ]] && kill -0 "$lock_pid" 2>/dev/null; then
+    return 1   # a live instance holds it
+  fi
+  # Stale lock (dead/unknown owner) — recover under an atomic mkdir mutex.
+  local rmutex="${lockfile}.recovery.d"
+  # Reap a leaked mutex ONLY when we can prove its owner is dead. An EMPTY owner
+  # is NOT proof of death: it usually means another recoverer just won the `mkdir`
+  # and has not yet written its PID (the window between `mkdir` and the owner
+  # write below). Reaping an empty-owner mutex here is a TOCTOU that deletes a
+  # LIVE mid-creation holder, letting two recoverers both proceed. So: a
+  # present-but-dead owner is reaped immediately; for an empty owner we give a
+  # brief settle window and re-read — if a PID appears it is a live holder and we
+  # do NOT reap (we lose the mkdir below and back off), and only if it stays
+  # empty do we treat it as a genuinely leaked mutex (creator died in the gap).
+  if [[ -d "$rmutex" ]]; then
+    local mowner
+    mowner=$(cat "$rmutex/owner" 2>/dev/null)
+    if [[ -z "$mowner" ]]; then
+      sleep 0.3
+      mowner=$(cat "$rmutex/owner" 2>/dev/null)
+    fi
+    if [[ -z "$mowner" ]] || ! kill -0 "$mowner" 2>/dev/null; then
+      rm -rf "$rmutex" 2>/dev/null
+    fi
+  fi
+  if ! mkdir "$rmutex" 2>/dev/null; then
+    return 1   # another recoverer owns the critical section
+  fi
+  echo $$ > "$rmutex/owner" 2>/dev/null
+  # Critical section: re-read the lock. If a prior recoverer installed a LIVE pid,
+  # do not clobber it.
+  local cur_pid
+  cur_pid=$(cat "$lockfile" 2>/dev/null)
+  if [[ -n "$cur_pid" && "$cur_pid" != "$$" ]] && kill -0 "$cur_pid" 2>/dev/null; then
+    rm -rf "$rmutex" 2>/dev/null
+    return 1
+  fi
+  # Replace the stale lock, re-acquiring with noclobber so a starter that slipped
+  # into the gap (and created the lock) wins — we lose cleanly instead of clobbering.
+  rm -f "$lockfile" 2>/dev/null
+  if ! (set -C; echo $$ > "$lockfile") 2>/dev/null; then
+    rm -rf "$rmutex" 2>/dev/null
+    return 1
+  fi
+  rm -rf "$rmutex" 2>/dev/null
+  return 0
 }
 # =============================================================================
@@ -587,6 +672,12 @@ update_status() {
     verified_us_json=$(echo "$VERIFIED_US" | tr ',' '\n' | jq -R . | jq -s .)
   fi
+  # D-5: jq-encode the free-text restore fields so a reason/model with special
+  # chars can't corrupt the status JSON (the rest of this builder is echo-based).
+  local _lbr_json _owm_json
+  _lbr_json=$(printf '%s' "${LAST_BLOCK_REASON:-}" | jq -Rs . 2>/dev/null); [[ -z "$_lbr_json" ]] && _lbr_json='""'
+  _owm_json=$(printf '%s' "${_ORIGINAL_WORKER_MODEL:-}" | jq -Rs . 2>/dev/null); [[ -z "$_owm_json" ]] && _owm_json='""'
   # Build consensus fields
   local consensus_json=""
   if [[ "$CONSENSUS_MODE" != "off" ]]; then
@@ -615,6 +706,11 @@ update_status() {
   "consensus_mode": "'"$CONSENSUS_MODE"'",
   "last_result": "'"$last_result"'",
   "consecutive_failures": '"$CONSECUTIVE_FAILURES"',
+  "consecutive_blocks": '"${CONSECUTIVE_BLOCKS:-0}"',
+  "last_block_reason": '"$_lbr_json"',
+  "model_upgraded": '"${_MODEL_UPGRADED:-0}"',
+  "same_us_fail_count": '"${_SAME_US_FAIL_COUNT:-0}"',
+  "original_worker_model": '"$_owm_json"',
   "verified_us": '"$verified_us_json"''"$consensus_json"',
   "updated_at_utc": "'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"
 }' | atomic_write "$STATUS_FILE"
@@ -756,9 +852,18 @@ _lint_test_density() {
   us_list=$(grep -oE '^##[[:space:]]+US-[0-9]+' "$prd_file" 2>/dev/null | grep -oE 'US-[0-9]+' | sort -u)
   [[ -z "$us_list" ]] && return 0
-  local audit_dir="${LOGS_DIR:-/tmp}"
-  local audit_file="$audit_dir/test-density-audit.jsonl"
-  [[ -d "$audit_dir" ]] || audit_file="/tmp/test-density-audit.jsonl"
+  # ZSH-8: prefer the campaign LOGS_DIR. When it is unavailable, avoid a fixed,
+  # predictable /tmp name (insecure-temp: symlink/collision risk) by creating a
+  # unique temp file via mktemp; fall back to a PID-scoped name only if mktemp
+  # is missing.
+  local audit_dir="${LOGS_DIR:-}"
+  local audit_file
+  if [[ -n "$audit_dir" && -d "$audit_dir" ]]; then
+    audit_file="$audit_dir/test-density-audit.jsonl"
+  else
+    audit_file=$(mktemp "${TMPDIR:-/tmp}/test-density-audit.XXXXXX" 2>/dev/null) \
+      || audit_file="${TMPDIR:-/tmp}/test-density-audit.$$.jsonl"
+  fi
   local us
   for us in ${(f)us_list}; do
@@ -1179,6 +1284,11 @@ Summary: $summary
 Completed at iteration $ITERATION.
 Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" | atomic_write "$COMPLETE_SENTINEL"
+  # F-26: propagate atomic_write failure — never log success on a failed write.
+  if (( ${pipestatus[-1]:-0} != 0 )); then
+    log_error "FAILED to write COMPLETE sentinel ($COMPLETE_SENTINEL) — IO/disk error; completion NOT durably recorded"
+    return 1
+  fi
   log "COMPLETE sentinel written: $COMPLETE_SENTINEL"
 }
@@ -1288,6 +1398,7 @@ write_blocked_sentinel() {
       suggested_action: $action,
       meta: { blocked_hygiene_violated: $hygiene }
     }' | atomic_write "$json_path"
+  local _bs_json_rc=${pipestatus[-1]:-0}
   echo "BLOCKED: $us_id
 Reason: $reason
@@ -1298,7 +1409,16 @@ Category: $category
 Blocked at iteration $ITERATION.
 Timestamp: $now_iso" | atomic_write "$BLOCKED_SENTINEL"
+  local _bs_md_rc=${pipestatus[-1]:-0}
+  # F-26: propagate atomic_write failure. The "markdown ⇒ JSON" invariant means a
+  # half-written sentinel must surface loudly, not log false success. (Best-effort
+  # signal: callers already `return 1` after this, so we log+return rather than
+  # restructure every caller.)
+  if (( _bs_md_rc != 0 || _bs_json_rc != 0 )); then
+    log_error "FAILED to durably write BLOCKED sentinel (md_rc=$_bs_md_rc json_rc=$_bs_json_rc) for [$category] $reason — IO/disk error"
+    return 1
+  fi
   log_error "Campaign BLOCKED: [$category] $reason"
   log "BLOCKED sentinel written: $BLOCKED_SENTINEL"
   log "BLOCKED sidecar written: $json_path"