npm - @seanyao/roll - Versions diffs - 2026.522.2 → 2026.523.1 - Mend

@seanyao/roll 2026.522.2 → 2026.523.1

Files changed (12)

package/CHANGELOG.md +19 -0
package/bin/dream-test-quality-scan +110 -0
package/bin/roll +346 -71
package/lib/__pycache__/roll-loop-status.cpython-314.pyc +0 -0
package/lib/__pycache__/roll_render.cpython-314.pyc +0 -0
package/lib/loop-fmt.py +17 -3
package/lib/roll-loop-status.py +45 -29
package/lib/roll_render.py +14 -7
package/package.json +1 -1
package/skills/roll-.dream/SKILL.md +59 -0
package/skills/roll-design/SKILL.md +4 -3
package/skills/roll-notes/SKILL.md +6 -3

package/CHANGELOG.md CHANGED

@@ -1,5 +1,24 @@
 # Changelog
+## v2026.523.1
+### Added
+- **`roll loop branches`**: see at a glance the leftover loop branches on this machine; every cycle now runs GC once at entry, so cycles aborted midway get cleaned up too `[loop]`
+### Changed
+- **dashboard token column split into input / output / cache write / cache read**: cache usage costs real money, so the bill can finally be explained `[loop]`
+### Fixed
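+- **daily dream / brief never actually ran on macOS 26.4**: switched to an interval trigger, so from today on they produce output reliably every day `[loop]`
+- **tcr count, built list, and ALERT text on the dashboard no longer show fake zeros or stale labels from other stories** `[loop]`
+- **picking a story no longer marks other stories that depend on it as "In Progress"**: the dashboard no longer falsely claims someone is working on them `[loop]`
+- **`roll setup` / `roll update` no longer hang silently on a hidden overwrite prompt**
+- **`$roll-notes` now writes to `.roll/notes/`**: consistent with dream / brief, no longer dropped in the project root `[loop]`
+- **loop CI gate no longer misclassifies "queued / in progress" as a failure** `[loop]`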
 ## v2026.522.2
 ### Changed

package/bin/dream-test-quality-scan ADDED

@@ -0,0 +1,110 @@
+#!/usr/bin/env bash
+# dream-test-quality-scan — ad-hoc helper for roll-.dream Scan 7.
+#
+# Walks bats files and flags ❶-class anti-patterns (hardcoded business data
+# in assertion bodies). Emits structured REFACTOR-shaped lines so the
+# maintainer can sanity-check the rubric against the current suite without
+# waiting for the nightly dream cycle.
+#
+# Usage:
+#   dream-test-quality-scan [--category N] [--path PATH] [--max N]
+#   dream-test-quality-scan --help
+#
+# Only category 1 (❶ hardcoded business data) is implemented as a deterministic
+# heuristic; categories ❷..❻ stay with the dream skill (AI agent applies the
+# rubric). The helper exists so a smoke test and a maintainer dry-run can
+# confirm ❶ detection keeps working as the suite evolves.
+set -euo pipefail
+CATEGORY=1
+TARGET=""
+MAX=5
+usage() {
+  cat <<'EOF'
+dream-test-quality-scan — Scan 7 ❶ dry-run helper
+Usage:
+  dream-test-quality-scan [--category N] [--path PATH] [--max N]
+  dream-test-quality-scan --help
+Options:
+  --category N   Anti-pattern category (only 1 is implemented; default 1)
+  --path PATH    File or directory to scan (default: tests/)
+  --max N        Maximum entries to emit (default: 5; matches dream skill cap)
+  --help         Show this message
+Output:
+  One line per finding:
+    [test-quality:❶] <file>:<line> — <one-line description>
+  Exit code is 0 even when nothing is found (dry-run is informational).
+EOF
+}
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --category) CATEGORY="${2:-1}"; shift 2 ;;
+    --path)     TARGET="${2:-}"; shift 2 ;;
+    --max)      MAX="${2:-5}"; shift 2 ;;
+    --help|-h)  usage; exit 0 ;;
+    *)          echo "unknown flag: $1" >&2; usage >&2; exit 2 ;;
+  esac
+done
+if [[ "$CATEGORY" -ne 1 ]]; then
+  echo "category $CATEGORY not yet implemented — only ❶ is mechanical" >&2
+  exit 0
+fi
+# Default scan root.
+if [[ -z "$TARGET" ]]; then
+  repo_root=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
+  TARGET="${repo_root}/tests"
+fi
+if [[ ! -e "$TARGET" ]]; then
+  echo "path not found: $TARGET" >&2
+  exit 2
+fi
+scan_file() {
+  local file="$1"
+  # ❶ heuristic — assertion lines whose RHS contains a numeric literal:
+  #   - lines containing `[[` or `[ ` (bats assertion syntax)
+  #   - AND containing `==` or `=` (equality)
+  #   - AND containing a decimal/integer literal of length ≥ 1 inside quotes
+  # Emits one entry per file (not per line) to stay under the rate cap.
+  local first_hit
+  first_hit=$(grep -nE '\[\[.*"[^"]*[0-9]+(\.[0-9]+)?[^"]*"' "$file" 2>/dev/null \
+              | head -1 || true)
+  [[ -z "$first_hit" ]] && return 1
+  local lineno
+  lineno=$(echo "$first_hit" | cut -d: -f1)
+  local rel
+  rel=$(python3 -c "import os,sys; print(os.path.relpath(sys.argv[1]))" "$file" 2>/dev/null || echo "$file")
+  printf '[test-quality:❶] %s:%s — assertion body hardcodes a numeric literal that likely owns its value elsewhere\n' \
+    "$rel" "$lineno"
+  return 0
+}
+emitted=0
+if [[ -d "$TARGET" ]]; then
+  # Iterate over .bats files under TARGET; stop after MAX hits.
+  # Exclude vendored bats-core helpers — those are framework tests, not ours.
+  while IFS= read -r f; do
+    case "$f" in
+      */tests/helpers/bats-*/*) continue ;;
+    esac
+    if scan_file "$f"; then
+      emitted=$((emitted + 1))
+      [[ "$emitted" -ge "$MAX" ]] && break
+    fi
+  done < <(find "$TARGET" -type f -name '*.bats' | sort)
+else
+  scan_file "$TARGET" && emitted=1 || true
+fi
+# Always succeed — dry-run is informational, the dream cycle decides what to do.
+exit 0
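
To see what the ❶ heuristic in `scan_file` actually flags, the following is a minimal sketch that runs the same `grep -nE` pattern over two invented bats snippets. The helper name `first_hit_line` and both samples are assumptions made for illustration; only the pattern is copied from the script.

```shell
#!/usr/bin/env bash
# Sketch only: the sample bats snippets are invented for illustration.
# The pattern is copied verbatim from scan_file in dream-test-quality-scan.
pattern='\[\[.*"[^"]*[0-9]+(\.[0-9]+)?[^"]*"'

# first_hit_line TEXT
# Prints the 1-based line number of the first matching line, or nothing
# when no line matches (same pipeline shape as grep -n | head -1 | cut).
first_hit_line() {
  printf '%s\n' "$1" | grep -nE "$pattern" | head -1 | cut -d: -f1
}

sample_hit='@test "total" {
  run compute_total
  [[ "$output" == "42.50" ]]
}'

sample_clean='@test "status" {
  run compute_status
  [[ "$output" == "$expected" ]]
}'

first_hit_line "$sample_hit"     # the quoted numeric literal is on line 3
first_hit_line "$sample_clean"   # prints nothing
```

The pattern only requires `[[` followed somewhere by a double-quoted string containing a digit, so a quoted variable such as `"$line2"` would also be flagged. That looks acceptable for an informational dry-run, but it is worth keeping in mind when reading the output.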

package/bin/roll CHANGED

@@ -4,7 +4,7 @@ set -euo pipefail
 # Roll — AI Agent Convention Manager
 # Single source of truth for how all AI coding agents behave.
-VERSION="2026.522.2"
+VERSION="2026.523.1"
 ROLL_HOME="${ROLL_HOME:-${HOME}/.roll}"
 ROLL_CONFIG="${ROLL_HOME}/config.yaml"
 ROLL_GLOBAL="${ROLL_HOME}/conventions/global"
@@ -322,16 +322,25 @@ safe_copy() {
     if diff -q "$src" "$dst" &>/dev/null; then
       return  # identical, skip silently
     fi
+    # Non-interactive (stdin is not a terminal): silently overwrite.
+    # _run_setup_step / cmd_update redirect stdin to /dev/null and all
+    # stdout/stderr is suppressed — prompting here would either hang on a
+    # hidden read or silently default to overwrite. Be explicit.
+    if [[ ! -t 0 ]]; then
+      cp "$src" "$dst"
+      ok "Wrote: ${dst/#$HOME/~}  已写入: ${dst/#$HOME/~}"
+      return
+    fi
     echo ""
     warn "File exists and differs: ${dst/#$HOME/~}  文件已存在且内容不同: ${dst/#$HOME/~}"
     echo -e "  ${BOLD}Overwrite?${NC} [Y/n/d(iff)] "
-    read -r answer
+    read -r answer || answer="Y"
     case "$answer" in
       d|D|diff)
         diff --color=auto "$dst" "$src" || true
         echo ""
         echo -e "  ${BOLD}Overwrite?${NC} [Y/n] "
-        read -r answer2
+        read -r answer2 || answer2="Y"
         [[ "$answer2" =~ ^[Nn]$ ]] && { info "Skipped: ${dst/#$HOME/\~}  已跳过: ${dst/#$HOME/\~}"; return; }
         ;;
       n|N) info "Skipped: ${dst/#$HOME/~}  已跳过: ${dst/#$HOME/~}"; return ;;
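
The two guards added to `safe_copy` (skip the prompt when stdin is not a terminal, and default to `Y` when `read` hits EOF) can be tried in isolation. This is a minimal sketch under stated assumptions: `confirm_overwrite` is a hypothetical helper, not a function in `roll`, and it prints its decision instead of copying files.

```shell
#!/usr/bin/env bash
# Sketch only: confirm_overwrite is a hypothetical helper, not part of roll.
# It prints "overwrite" or "skip" instead of touching any file.
confirm_overwrite() {
  local answer
  if [[ ! -t 0 ]]; then
    # stdin is a pipe, a file, or /dev/null: never prompt, just proceed.
    echo "overwrite"
    return 0
  fi
  printf 'Overwrite? [Y/n] ' >&2
  # read returns non-zero on EOF; fall back to the default answer.
  read -r answer || answer="Y"
  case "$answer" in
    n|N) echo "skip" ;;
    *)   echo "overwrite" ;;
  esac
}

# With stdin redirected the function answers without waiting for input.
confirm_overwrite </dev/null
```

Because the `-t 0` check comes first, piping `n` into the function still yields `overwrite`: in this sketch, non-interactive callers cannot decline.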
@@ -726,7 +735,7 @@ _run_setup_step() {
   local watch="$1"; shift
   local before after
   before=$(_setup_snapshot "$watch")
-  if "$@" >/dev/null 2>&1; then
+  if "$@" </dev/null >/dev/null 2>&1; then
     after=$(_setup_snapshot "$watch")
     if [[ "$before" == "$after" ]]; then
       _ROLL_SETUP_STATE="unchanged"
@@ -883,6 +892,40 @@ HINT
 # in (or opted out) don't get spammed each upgrade.
 cmd_doctor() {
   _doctor_pr_section
+  _doctor_launchd_stale_section
+}
+# FIX-097: scan ${_LAUNCHD_DIR}/com.roll.*.plist for entries whose
+# WorkingDirectory no longer exists on disk. These are the ghost agents left
+# behind when a user manually reproduces a bug under /private/tmp/ or
+# /var/folders/ — the auto-sandbox redirects plist writes but launchctl
+# bootstrap (before this fix) registered them anyway. Print labels +
+# cleanup hint; never auto-delete (host launchctl state is user-owned).
+_doctor_launchd_stale_section() {
+  [[ "$(uname)" == "Darwin" ]] || return 0
+  local dir="${_LAUNCHD_DIR:-${HOME}/Library/LaunchAgents}"
+  [[ -d "$dir" ]] || return 0
+  local found=0 plist label wd
+  for plist in "$dir"/com.roll.*.plist; do
+    [[ -e "$plist" ]] || continue
+    wd=$(awk '
+      /<key>WorkingDirectory<\/key>/ { getline; gsub(/.*<string>|<\/string>.*/, ""); print; exit }
+    ' "$plist" 2>/dev/null)
+    [[ -n "$wd" ]] || continue
+    [[ -d "$wd" ]] && continue
+    if [[ "$found" -eq 0 ]]; then
+      echo ""
+      echo "Stale launchd plists  无效的 launchd 服务"
+      echo ""
+      found=1
+    fi
+    label=$(basename "$plist" .plist)
+    echo "  ⚠ ${label}"
+    echo "    WorkingDirectory missing: ${wd}"
+    echo "    路径已失效，可清理: launchctl bootout gui/$(id -u)/${label}; rm '${plist}'"
+  done
+  return 0
 }
 _doctor_pr_section() {
@@ -1904,7 +1947,7 @@ PY
   fi
   if [ "${#plists[@]}" -gt 0 ]; then
     for item in "${plists[@]}"; do
-      launchctl unload -w "$HOME/Library/LaunchAgents/$item" 2>/dev/null && echo "    unloaded     $item"
+      _launchctl_safe unload -w "$HOME/Library/LaunchAgents/$item" 2>/dev/null && echo "    unloaded     $item"
       rm -f "$HOME/Library/LaunchAgents/$item" 2>/dev/null
     done
   fi
@@ -3892,7 +3935,7 @@ PYEOF
   local old_plist=~/Library/LaunchAgents/com.roll.loop.${old_slug}.plist
   if [[ -f "$old_plist" ]]; then
-    launchctl unload "$old_plist" 2>/dev/null || true
+    _launchctl_safe unload "$old_plist" 2>/dev/null || true
     rm -f "$old_plist"
   fi
@@ -3969,6 +4012,13 @@ if [ -z "${_LAUNCHD_DIR:-}" ]; then
         _LAUNCHD_DIR="${_SHARED_ROOT}/LaunchAgents"
         mkdir -p "$_LAUNCHD_DIR"
         export _LAUNCHD_DIR
+        # FIX-097: same trigger that sandboxed the plist FILE path must also
+        # short-circuit every `launchctl bootstrap/load/unload/enable` against
+        # that path. Otherwise a user who reproduces a bug under /private/tmp/
+        # or /var/folders/ ends up with sandboxed plists registered in their
+        # real gui/<uid> domain — when the tmp dir is cleaned, the agents become
+        # ghosts that fire forever (the historical 23:13 CST Terminal popup).
+        export _LAUNCHD_SKIP_REGISTRY=1
       fi
       unset _roll_in_test_ctx _roll_caller
       ;;
@@ -4095,6 +4145,25 @@ _launchd_label() {
   printf 'com.roll.%s.%s' "$service" "$(_project_slug "$project_path")"
 }
+# FIX-097: central skip predicate consulted by every launchctl invocation that
+# operates on a plist path Roll wrote. Returns 0 (skip) when either:
+#   - explicit: _LAUNCHD_SKIP_REGISTRY=1 was exported (tests, future opt-out)
+#   - implicit: _LAUNCHD_DIR is a child of _SHARED_ROOT (auto-sandbox active)
+# Returns 1 (do not skip) in production.
+#
+# History: FIX-090 introduced the same logic INSIDE _install_launchd_plists.
+# FIX-097 hoists it to a helper because the bootstrap call inside
+# _install_launchd_plists was not the only leak: _loop_on / _loop_off /
+# _loop_pause / _loop_resume each had bare `launchctl load/unload/enable`
+# calls that bypassed the gate.
+_launchd_should_skip_registry() {
+  [[ "${_LAUNCHD_SKIP_REGISTRY:-}" == "1" ]] && return 0
+  case "${_LAUNCHD_DIR:-}/" in
+    "${_SHARED_ROOT:-/nonexistent}"/*) return 0 ;;
+  esac
+  return 1
+}
 _launchd_plist_path() {
   local service="$1" project_path="$2"
   printf '%s/%s.plist' "$_LAUNCHD_DIR" "$(_launchd_label "$service" "$project_path")"
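
The implicit branch of `_launchd_should_skip_registry` is a path-prefix test written as a `case` pattern. A minimal sketch of that idiom follows; `is_under` and the sample paths are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: is_under is a hypothetical name and the paths are made up.
# Returns 0 when DIR equals ROOT or lies somewhere beneath it.
is_under() {
  local dir="$1" root="$2"
  # Appending "/" to DIR keeps a sibling like /tmp/roll-other from
  # matching the root /tmp/roll.
  case "${dir}/" in
    "${root}"/*) return 0 ;;
  esac
  return 1
}

is_under /tmp/roll/LaunchAgents /tmp/roll && echo "sandboxed"
is_under /tmp/roll-other /tmp/roll || echo "not sandboxed"
```

An empty root would turn the pattern into `/*`, which matches every absolute path. That is presumably why the original falls back to `/nonexistent` when `_SHARED_ROOT` is unset.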
@@ -4123,16 +4192,34 @@ _write_launchd_plist() {
       ;;
   esac
-  local hour_xml=""
-  [[ -n "$hour" ]] && hour_xml="    <key>Hour</key>
-    <integer>${hour}</integer>
-"
   # FIX-050: bake PATH into the plist so launchd-spawned bash can find tmux,
   # claude, node, etc. The runner script also re-asserts PATH at runtime as
   # a second layer (covers stale plists where brew was installed after setup).
   local path_value; path_value=$(_detect_path_prepend)
+  # FIX-105: macOS 26.4 launchd silently refuses to fire StartCalendarInterval
+  # entries that contain BOTH Hour and Minute keys (verified: runs stays 0,
+  # last exit "never exited", no log output, the calendarinterval trigger is
+  # registered but never invoked by UserEventAgent-Aqua). Single-Minute (hourly)
+  # entries still fire fine. Workaround: when an Hour is provided (daily
+  # schedule), emit StartInterval=86400 (24h period) instead. First fire is
+  # bootstrap+24h rather than the exact requested wall-clock time — acceptable
+  # trade since the alternative was "never fires at all" (dream/brief broken
+  # for 4+ days). The Minute/Hour args are still kept in the function signature
+  # for callers that may want to filter at runtime, but they no longer steer
+  # the plist trigger format for daily schedules.
+  local schedule_xml
+  if [[ -n "$hour" ]]; then
+    schedule_xml="  <key>StartInterval</key>
+  <integer>86400</integer>"
+  else
+    schedule_xml="  <key>StartCalendarInterval</key>
+  <dict>
+    <key>Minute</key>
+    <integer>${minute}</integer>
+  </dict>"
+  fi
   local content
   content="<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">
@@ -4151,11 +4238,7 @@ _write_launchd_plist() {
     <key>PATH</key>
     <string>${path_value}</string>
   </dict>
-  <key>StartCalendarInterval</key>
-  <dict>
-    <key>Minute</key>
-    <integer>${minute}</integer>
-${hour_xml}  </dict>
+${schedule_xml}
   <key>WorkingDirectory</key>
   <string>${project_path}</string>
 </dict>
@@ -4329,13 +4412,29 @@ _inner_cleanup() {
       # FIX-091: prefer a real PR so auto-merge lands the work; tag-only is the
       # last-resort because it requires manual cherry-pick. Emit cycle_end "done"
       # (canonical success status the dashboard recognizes) when PR publishes.
+      # FIX-099: compute tcr_count + built[] from the worktree (it's still alive
+      # at EXIT trap time) so runs.jsonl and ALERT carry truthful data.
+      _orphan_tcr=0
+      _orphan_built="[]"
+      if command -v jq >/dev/null 2>&1; then
+        _orphan_tcr=\$(cd "\$WT" && git log --oneline "origin/main..HEAD" 2>/dev/null | grep -c ' tcr:' || echo 0)
+        _orphan_built=\$(cd "\$WT" && git log --oneline "origin/main..HEAD" 2>/dev/null \
+          | grep ' tcr:' \
+          | grep -oE '\b(FIX|US|REFACTOR|CHORE)-[0-9]+\b' \
+          | sort -u \
+          | jq -R -s 'split("\n") | map(select(length>0))' 2>/dev/null || echo "[]")
+      fi
       _slug=""
       if _gh_resolve _slug \\
          && ( cd "\$WT" && _loop_publish_pr "\$BRANCH" "loop cycle \${CYCLE_ID}" ) >/dev/null 2>&1; then
         _loop_event cycle_end "\${CYCLE_ID}" "\${BRANCH:-}" "done" 2>/dev/null || true
         _CYCLE_END_WRITTEN=1
-        _runs_append "done" 0 "[]" 2>/dev/null || true
-        _worktree_alert "cycle \${CYCLE_ID}: aborted with \${_unpushed} commits; FIX-091 published as PR" 2>/dev/null || true
+        # FIX-099: pass real tcr_count + built[] instead of 0/"[]"
+        _runs_append "done" "\${_orphan_tcr}" "\${_orphan_built}" 2>/dev/null || true
+        # FIX-099: three-field ALERT so callers can distinguish recovered orphan
+        # from a cycle's normally-picked story (was: "FIX-091 published as PR"
+        # which leaked a hardcoded string regardless of what was actually built).
+        _worktree_alert "cycle \${CYCLE_ID}: recovered_from_orphan=yes; tcr_commits=\${_orphan_tcr}; stories=\${_orphan_built}; pr_branch=\${BRANCH:-unknown}" 2>/dev/null || true
       else
         _orphan_tag="loop-orphan-\${CYCLE_ID}"
         if ( cd "\$WT" && git push origin "\$BRANCH" 2>/dev/null \\
@@ -4343,8 +4442,9 @@ _inner_cleanup() {
              && git push origin "\$_orphan_tag" 2>/dev/null ); then
           _loop_event cycle_end "\${CYCLE_ID}" "\${BRANCH:-}" "orphan" 2>/dev/null || true
           _CYCLE_END_WRITTEN=1
-          _runs_append "orphan" 0 "[]" 2>/dev/null || true
-          _worktree_alert "cycle \${CYCLE_ID}: aborted with \${_unpushed} commits; FIX-086 pushed orphan tag \${_orphan_tag}" 2>/dev/null || true
+          # FIX-099: pass real tcr_count + built[] for the orphan-tag path too
+          _runs_append "orphan" "\${_orphan_tcr}" "\${_orphan_built}" 2>/dev/null || true
+          _worktree_alert "cycle \${CYCLE_ID}: recovered_from_orphan=yes; tcr_commits=\${_orphan_tcr}; stories=\${_orphan_built}; FIX-086 pushed orphan tag \${_orphan_tag}" 2>/dev/null || true
         fi
       fi
     fi
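
The FIX-099 pipeline derives a tcr commit count and a de-duplicated story list from `git log --oneline` output. The sketch below replays the counting and extraction steps on invented log lines and omits the final `jq` step that builds the JSON array. The sample log and variable names are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: the log lines are invented; real input would come from
# `git log --oneline origin/main..HEAD` inside the cycle worktree.
sample_log='a1b2c3d tcr: FIX-099 pass real counts
d4e5f6a tcr: FIX-099 tighten alert text
0a1b2c3 tcr: US-12 add branches command
9f8e7d6 docs: mention FIX-001 in README'

# Count commits whose subject carries the " tcr:" marker.
tcr_count=$(printf '%s\n' "$sample_log" | grep -c ' tcr:' || true)

# Collect the distinct story IDs named in those commits only.
story_ids=$(printf '%s\n' "$sample_log" \
  | grep ' tcr:' \
  | grep -oE '\b(FIX|US|REFACTOR|CHORE)-[0-9]+\b' \
  | sort -u)

echo "tcr_count=${tcr_count}"
printf '%s\n' "$story_ids"
```

`grep -c` already prints `0` when nothing matches and also exits non-zero, so the sketch uses `|| true` rather than echoing a second zero. `FIX-001` is ignored because it appears in a commit without the ` tcr:` marker.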
@@ -4395,6 +4495,10 @@ WT="\$(_worktree_path "${slug}" "cycle-\${CYCLE_ID}")"
 BRANCH="loop/cycle-\${CYCLE_ID}"
 _USE_WORKTREE=0
 cd "${project_path}" 2>/dev/null || true
+# FIX-104: GC stale merged temp branches at cycle entry — before worktree setup
+# and before any early-exit gate (pre-run abort, CI red precheck). The post-claude
+# call site doesn't cover those paths, so merged branches accumulated on origin.
+_loop_cleanup_stale_cycle_branches "${project_path}" || true
 # FIX-040: orphan worktree recovery — scan for worktrees left by previous failed
 # cycles (publish failed or inner script was SIGKILL'd). Attempt to publish each
 # before starting the new cycle. Glob is chronological via timestamp in name.
@@ -4622,9 +4726,6 @@ if [ "\$_USE_WORKTREE" = "1" ]; then
   fi
 fi
-# US-AUTO-040: fallback GC — delete remote loop/cycle-* branches already merged to main.
-_loop_cleanup_stale_cycle_branches "${project_path}" || true
 # FIX-044 / Step 5: Write loop cycle run summary to runs.jsonl
 # Deterministic — runs in shell regardless of whether agent executes SKILL.md Step 5.
 # US-LOOP-005: now routed through _runs_append so timeout/worktree-setup-fail
@@ -4760,17 +4861,67 @@ SCRIPT
 }
 _launchd_is_loaded() {
-  launchctl print-disabled "gui/$(id -u)" 2>/dev/null | grep -qF "\"$1\" => enabled"
+  # FIX-098: probe actual launchd registry via `launchctl print`, NOT
+  # `launchctl print-disabled`. The disabled-overrides DB only tracks
+  # labels explicitly enabled/disabled by the user — after `roll loop off`
+  # (bootout) + `roll update` the label stays absent from the overrides DB,
+  # so the old grep returned false-positive "loaded". `launchctl print`
+  # returns exit 0 only when the agent is actually registered in the current
+  # launchd session; non-zero means the label is unknown to launchd.
+  launchctl print "gui/$(id -u)/$1" >/dev/null 2>&1
+}
+# FIX-101 tripwire: refuse to mutate the host's launchd session when
+# _LAUNCHD_DIR has been sandboxed (i.e. is not the canonical
+# ${HOME}/Library/LaunchAgents). Tests that auto-sandbox _LAUNCHD_DIR for
+# isolation (FIX-087) may still forget to set _LAUNCHD_SKIP_REGISTRY=1 or
+# stub the launchctl binary; without this defensive layer the production
+# label's plist path can get overwritten with a transient sandbox path,
+# leading to launchd EX_CONFIG (exit 78) when the tmp dir is later cleaned
+# and the next scheduled fire can't find the plist. Read-only ops (print*,
+# list, version) are always allowed since they have no side effects.
+_launchctl_safe() {
+  # Read-only ops are always safe (no host launchd state mutation).
+  case "${1:-}" in
+    print|print-disabled|list|version|dumpstate|examine)
+      launchctl "$@"
+      return $?
+      ;;
+  esac
+  # If `launchctl` has been replaced by a function stub (typical in bats tests
+  # that want to assert against captured calls), pass through to the stub.
+  # Stubs by definition don't touch host launchd, so this is safe; and tests
+  # like `_install_launchd_plists: bootout targets gui/<uid>/<label>` rely on
+  # the literal call landing in their captured log.
+  if [[ "$(type -t launchctl 2>/dev/null)" == "function" ]]; then
+    launchctl "$@"
+    return $?
+  fi
+  # Real launchctl binary path: refuse to mutate when _LAUNCHD_DIR has been
+  # sandboxed (i.e. is not the canonical ${HOME}/Library/LaunchAgents). This
+  # is the FIX-101 defensive layer — when a test forgets to stub launchctl
+  # AND has _LAUNCHD_DIR sandboxed, prevent the call from reaching the host's
+  # production launchd and overwriting a live label's plist path.
+  local canonical="${HOME}/Library/LaunchAgents"
+  if [[ "${_LAUNCHD_DIR:-$canonical}" != "$canonical" ]]; then
+    return 0
+  fi
+  launchctl "$@"
 }
 _launchd_svc_state() {
+  # FIX-098: three-state classification:
+  #   enabled       — plist on disk AND registered in launchd
+  #   stale         — plist on disk BUT NOT registered in launchd
+  #   installed-off — kept for back-compat (maps to stale semantics)
+  #   not-installed — no plist
   local svc="$1" project_path="$2"
   local label; label=$(_launchd_label "$svc" "$project_path")
   local plist; plist=$(_launchd_plist_path "$svc" "$project_path")
   if _launchd_is_loaded "$label"; then
     echo "enabled"
   elif [[ -f "$plist" ]]; then
-    echo "installed-off"
+    echo "stale"
   else
     echo "not-installed"
   fi
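
The shape of the `_launchctl_safe` tripwire (read-only subcommands always pass, mutating ones are dropped when the plist directory is not the canonical one) can be exercised without touching launchd. In this sketch `fake_ctl` stands in for the real `launchctl` binary, and `guarded_ctl` and `DEMO_LAUNCHD_DIR` are hypothetical names. The function-stub passthrough from the original is left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: fake_ctl replaces launchctl so nothing on the host changes.
# guarded_ctl and DEMO_LAUNCHD_DIR are hypothetical names for this demo.
fake_ctl() { echo "ran: $*"; }

guarded_ctl() {
  local canonical="${HOME}/Library/LaunchAgents"
  # Read-only subcommands never mutate state, so they always go through.
  case "${1:-}" in
    print|print-disabled|list|version)
      fake_ctl "$@"
      return $?
      ;;
  esac
  # Mutating subcommand with a non-canonical directory: silently succeed
  # without calling the underlying tool.
  if [[ "${DEMO_LAUNCHD_DIR:-$canonical}" != "$canonical" ]]; then
    return 0
  fi
  fake_ctl "$@"
}

DEMO_LAUNCHD_DIR=/tmp/sandbox/LaunchAgents
guarded_ctl print "gui/501/com.roll.loop.demo"      # passes through
guarded_ctl bootstrap "gui/501" /tmp/sandbox/x.plist # dropped, no output
unset DEMO_LAUNCHD_DIR
guarded_ctl bootstrap "gui/501" /tmp/sandbox/x.plist # passes through
```

Returning 0 on the dropped path means callers cannot tell a skipped call from a successful one. That fits the `|| true` call sites in the diff, but it would hide the skip from any caller that inspects the exit status.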
@@ -4833,42 +4984,25 @@ _install_launchd_plists() {
     local after; after=$(cat "$plist")
     if [[ "$before" != "$after" ]]; then
       updated=$((updated + 1))
-      # FIX-090: gate launchctl writes so a sandboxed plist never gets
-      # registered into the user's REAL gui/<uid> domain. Without this,
-      # `launchctl bootstrap gui/<uid> <sandbox-plist>` outlives TEST_TMP
-      # cleanup as a zombie that either fails silently (EX_CONFIG) or, when
-      # the label collides with the dev's project slug, displaces the real
-      # registration and kills the autonomous loop. Two gate paths:
-      #   - explicit: integration_setup exports _LAUNCHD_SKIP_REGISTRY=1
-      #   - implicit: if _LAUNCHD_DIR was auto-sandboxed under _SHARED_ROOT
-      #     (FIX-087 inner-runner.sh re-source path) we infer skip — callers
-      #     that genuinely want the launchctl flow override _LAUNCHD_DIR to
-      #     a path outside _SHARED_ROOT (unit tests; production has no
-      #     _SHARED_ROOT match against ~/Library/LaunchAgents).
-      # See helpers.bash and tests/unit/launchd_sandbox.bats.
-      local _skip_reg="${_LAUNCHD_SKIP_REGISTRY:-}"
-      if [[ -z "$_skip_reg" ]]; then
-        case "${_LAUNCHD_DIR:-}/" in
-          "${_SHARED_ROOT:-/nonexistent}"/*) _skip_reg=1 ;;
-          *) _skip_reg=0 ;;
-        esac
-      fi
-      if [[ "$_skip_reg" != "1" ]]; then
+      # FIX-090/FIX-097: gate launchctl writes via central helper so a
+      # sandboxed plist never gets registered into the user's REAL gui/<uid>
+      # domain. See _launchd_should_skip_registry for the predicate rules.
+      if ! _launchd_should_skip_registry; then
         if _launchd_is_loaded "$label"; then
           # FIX-027: use bootout/bootstrap so we don't disturb the label's
           # enabled flag in the launchd overrides db (which legacy
           # unload/load no-`-w` wipes on macOS Sonoma+, causing
           # `roll loop status` to falsely report off after `roll update`).
           local uid; uid=$(id -u)
-          launchctl bootout "gui/${uid}/${label}" 2>/dev/null || true
-          launchctl bootstrap "gui/${uid}" "$plist" 2>/dev/null || true
+          _launchctl_safe bootout "gui/${uid}/${label}" 2>/dev/null || true
+          _launchctl_safe bootstrap "gui/${uid}" "$plist" 2>/dev/null || true
         elif [[ -z "$before" ]]; then
           # FIX-059: brand-new plist — macOS FSEvents auto-bootstraps any new
           # file dropped in ~/Library/LaunchAgents/, so projects never enabled
           # via 'roll loop on' would fire every hour. Immediately mark disabled
           # in the overrides db to block that auto-load.
           local uid; uid=$(id -u)
-          launchctl disable "gui/${uid}/${label}" 2>/dev/null || true
+          _launchctl_safe disable "gui/${uid}/${label}" 2>/dev/null || true
         fi
       fi
     fi
@@ -4925,7 +5059,8 @@ cmd_loop() {
     notify)       _notify "${1:-roll}" "${2:-}" ;;
     enforce-tcr)  _loop_enforce_tcr "${1:-}" "${2:-}" ;;
     precheck-ci)  _loop_precheck_ci ;;
-    *) err "Usage: roll loop <on|off|now|test|status|monitor|runs|events|attach|mute|unmute|pause|resume|reset|notify|enforce-tcr|precheck-ci>"; exit 1 ;;
+    branches)     _loop_branches "$(pwd -P)" ;;
+    *) err "Usage: roll loop <on|off|now|test|status|monitor|runs|events|attach|mute|unmute|pause|resume|reset|notify|enforce-tcr|precheck-ci|branches>"; exit 1 ;;
   esac
 }
@@ -4945,12 +5080,25 @@ _loop_on() {
   if [[ "$(uname)" == "Darwin" ]]; then
     _install_launchd_plists "$project_path" >/dev/null
+    # FIX-098: use launchctl bootstrap/enable instead of load -w.
+    # `load -w` writes to the disabled-overrides DB which causes FIX-027's
+    # re-source to break after `roll update`. bootstrap is idempotent and
+    # does not disturb the overrides DB.
+    local uid; uid=$(id -u)
     local all_loaded=true
     for svc in loop dream brief; do
       local label; label=$(_launchd_label "$svc" "$project_path")
+      local plist; plist=$(_launchd_plist_path "$svc" "$project_path")
       if ! _launchd_is_loaded "$label"; then
         all_loaded=false
-        launchctl load -w "$(_launchd_plist_path "$svc" "$project_path")" 2>/dev/null || true
+        # FIX-097 guard: skip real launchctl when _LAUNCHD_DIR was auto-sandboxed.
+        _launchd_should_skip_registry && continue
+        # FIX-098 semantic: enable+bootstrap pair (better than load -w).
+        # enable clears any disable-override; bootstrap registers with launchd.
+        # FIX-101 wrapper additionally tripwire-gates each call so a sandboxed
+        # _LAUNCHD_DIR can't accidentally touch host launchd state.
+        _launchctl_safe enable "gui/${uid}/${label}" 2>/dev/null || true
+        _launchctl_safe bootstrap "gui/${uid}" "$plist" 2>/dev/null || true
       fi
     done
@@ -5002,11 +5150,15 @@ _loop_off() {
   if [[ "$(uname)" == "Darwin" ]]; then
     local any_loaded=false
+    local _skip_off; _launchd_should_skip_registry && _skip_off=1 || _skip_off=0
     for svc in loop dream brief; do
       local label; label=$(_launchd_label "$svc" "$project_path")
       if _launchd_is_loaded "$label"; then
         any_loaded=true
-        launchctl unload -w "$(_launchd_plist_path "$svc" "$project_path")" 2>/dev/null || true
+        # FIX-097: skip real launchctl in sandbox to avoid touching the user's
+        # real launchd registry.
+        [[ "$_skip_off" == "1" ]] && continue
+        _launchctl_safe unload -w "$(_launchd_plist_path "$svc" "$project_path")" 2>/dev/null || true
       fi
     done
     if ! $any_loaded; then
@@ -5025,7 +5177,9 @@ _loop_off() {
       # disable list, polluting `launchctl print-disabled` forever even after
       # the project dir, plists, and ~/.roll are gone.
       local label; label=$(_launchd_label "$svc" "$project_path")
-      launchctl enable "gui/${uid}/${label}" 2>/dev/null || true
+      # FIX-097: same gate — never touch host launchctl from a sandbox.
+      [[ "$_skip_off" == "1" ]] && continue
+      _launchctl_safe enable "gui/${uid}/${label}" 2>/dev/null || true
     done
     ok "Loop disabled  已停用"
     return 0
@@ -5169,7 +5323,7 @@ _legacy_loop_status() {
       else
         case "$state" in
           enabled)       echo -e "    ${GREEN}${svc}     ● enabled${NC}" ;;
-          installed-off) echo -e "    ${YELLOW}${svc}     ⚠ installed/off${NC}   run: roll loop on" ;;
+          stale|installed-off) echo -e "    ${YELLOW}${svc}     ⚠ STALE — plist present but not loaded${NC}   run: roll loop on" ;;
           not-installed) echo -e "    ${RED}${svc}     ○ not installed${NC}   run: roll setup" ;;
         esac
       fi
@@ -5205,7 +5359,10 @@ _loop_pause() {
     if ! _launchd_is_loaded "$label"; then
       warn "Loop not enabled — nothing to pause  loop 未启用，无需暂停"; return 0
     fi
-    launchctl unload -w "$(_launchd_plist_path "loop" "$project_path")" 2>/dev/null || true
+    # FIX-097: never touch host launchctl from a sandboxed plist path.
+    if ! _launchd_should_skip_registry; then
+      _launchctl_safe unload -w "$(_launchd_plist_path "loop" "$project_path")" 2>/dev/null || true
+    fi
   else
     local slug; slug=$(_project_slug "$project_path")
     mkdir -p "${_SHARED_ROOT}/loop"
@@ -5225,8 +5382,9 @@ _loop_resume() {
     if [[ "$(uname)" == "Darwin" ]]; then
       local label; label=$(_launchd_label "loop" "$project_path")
       local plist; plist=$(_launchd_plist_path "loop" "$project_path")
-      if [[ -f "$plist" ]]; then
-        launchctl load -w "$plist" 2>/dev/null || true
+      if [[ -f "$plist" ]] && ! _launchd_should_skip_registry; then
+        # FIX-097: never touch host launchctl from a sandboxed plist path.
+        _launchctl_safe load -w "$plist" 2>/dev/null || true
       fi
     else
       local slug; slug=$(_project_slug "$project_path")
@@ -5562,15 +5720,28 @@ _loop_precheck_ci() {
   local commit; commit=$(git rev-parse HEAD 2>/dev/null) || return 0
+  # FIX-103: fetch both `status` and `conclusion`. Pre-run gate must distinguish
+  # a still-running CI (status=in_progress/queued/waiting, conclusion=null) from
+  # a genuinely red CI (conclusion=failure/cancelled/timed_out/...). Treating
+  # in_progress as red kills every cycle started within the first ~30s of a
+  # merge-triggered CI run.
   local runs
-  runs=$(gh -R "$slug" run list --commit "$commit" --json conclusion 2>/dev/null) || return 0
+  runs=$(gh -R "$slug" run list --commit "$commit" --json conclusion,status 2>/dev/null) || return 0
   [[ -z "$runs" || "$runs" == "[]" ]] && return 0
-  local failed
-  failed=$(echo "$runs" | jq -r '[.[] | select(.conclusion != null and .conclusion != "success" and .conclusion != "skipped")] | length' 2>/dev/null || echo "0")
+  # Conclusions that block the loop. Anything else (success, skipped, neutral,
+  # or null while still running) is treated as pass/pending.
+  local failed_conclusions
+  failed_conclusions=$(echo "$runs" \
+    | jq -r '[.[] | select(.conclusion=="failure" or .conclusion=="cancelled" or .conclusion=="timed_out" or .conclusion=="action_required" or .conclusion=="startup_failure") | .conclusion] | unique | join(",")' \
+    2>/dev/null || echo "")
-  if [[ "$failed" -gt 0 ]]; then
+  if [[ -n "$failed_conclusions" ]]; then
     local short; short=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
+    local run_states
+    run_states=$(echo "$runs" \
+      | jq -r '[.[] | "\(.status // "?")/\(.conclusion // "null")"] | unique | join(", ")' \
+      2>/dev/null || echo "?")
     err "Pre-run CI check: HEAD CI is red — refuse to build on broken base (${short})  HEAD CI 红，拒绝在破损的基础上构建"
     mkdir -p "$(dirname "$_LOOP_ALERT")"
     cat > "$_LOOP_ALERT" << EOF
@@ -5579,6 +5750,8 @@ _loop_precheck_ci() {
 **Time**: $(date '+%Y-%m-%d %H:%M')
 **Commit**: ${short}
 **Reason**: HEAD CI is red — loop refused to build on a broken base  HEAD CI 红，loop 拒绝在破损的基础上构建
+**Failing conclusions**: ${failed_conclusions}
+**Run states**: ${run_states}
 **Action required**:
 - Investigate and fix CI: \`gh -R ${slug} run list --commit ${commit}\`
@@ -6082,10 +6255,24 @@ _loop_mark_in_progress() {
   [ -n "$story_id" ] || return 1
   [ -f "$backlog" ] || return 0
   local tmp; tmp=$(mktemp "${backlog}.XXXXXX") || return 1
+  # FIX-106: match the story-id column (col 2) for equality instead of doing
+  # substring match on the whole row. Pre-fix, picking US-X-001 also flipped
+  # any row whose description contained "depends-on:US-X-001" — leaving the
+  # dashboard claiming work on stories no one had picked.
   awk -v sid="$story_id" '
     {
-      if (index($0, sid) > 0 && index($0, "📋 Todo") > 0) {
-        sub(/📋 Todo/, "🔨 In Progress")
+      if (index($0, "📋 Todo") > 0) {
+        n = split($0, cols, "|")
+        if (n >= 2) {
+          id_cell = cols[2]
+          gsub(/[[:space:]]/, "", id_cell)
+          # Markdown link form "[ID](path)" → keep just "ID"
+          sub(/^\[/, "", id_cell)
+          sub(/\].*$/, "", id_cell)
+          if (id_cell == sid) {
+            sub(/📋 Todo/, "🔨 In Progress")
+          }
+        }
       }
       print
     }
@@ -6101,10 +6288,20 @@ _loop_mark_todo() {
   [ -n "$story_id" ] || return 1
   [ -f "$backlog" ] || return 0
   local tmp; tmp=$(mktemp "${backlog}.XXXXXX") || return 1
+  # FIX-106: same column-2 equality match as _loop_mark_in_progress.
   awk -v sid="$story_id" '
     {
-      if (index($0, sid) > 0 && index($0, "🔨 In Progress") > 0) {
-        sub(/🔨 In Progress/, "📋 Todo")
+      if (index($0, "🔨 In Progress") > 0) {
+        n = split($0, cols, "|")
+        if (n >= 2) {
+          id_cell = cols[2]
+          gsub(/[[:space:]]/, "", id_cell)
+          sub(/^\[/, "", id_cell)
+          sub(/\].*$/, "", id_cell)
+          if (id_cell == sid) {
+            sub(/🔨 In Progress/, "📋 Todo")
+          }
+        }
       }
       print
     }
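Reviewer sketch (not part of the package): the FIX-106 change above swaps a whole-row substring match for an equality test on the story-id column. The Python below mirrors that awk logic under stated assumptions: `story_id_cell` and `mark_in_progress` are hypothetical names, and backlog rows are assumed to be Markdown table rows whose second `|`-separated field holds the ID.

```python
import re

TODO = "📋 Todo"
IN_PROGRESS = "🔨 In Progress"


def story_id_cell(row: str) -> str:
    """Return the normalized story ID from the second '|' field of a table row."""
    cols = row.split("|")
    if len(cols) < 2:
        return ""
    cell = re.sub(r"\s", "", cols[1])   # drop all whitespace, like gsub(/[[:space:]]/, "")
    cell = re.sub(r"^\[", "", cell)     # Markdown link form "[ID](path)": strip the "["
    cell = re.sub(r"\].*$", "", cell)   # ...and everything from "]" onward
    return cell


def mark_in_progress(rows, story_id):
    """Flip Todo to In Progress only on the row whose ID column equals story_id."""
    out = []
    for row in rows:
        if TODO in row and story_id_cell(row) == story_id:
            row = row.replace(TODO, IN_PROGRESS, 1)
        out.append(row)
    return out
```

With this shape, a row that merely mentions `depends-on:US-X-001` in its description is left alone, which is the behavior the FIX-106 comment describes.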
@@ -6512,14 +6709,29 @@ _claude_cleanup_stale_worktrees() {
   return 0
 }
+# FIX-104: scan multiple ephemeral prefixes (loop/cycle-, worktree-agent-,
+# claude/) and delete any already merged to origin/main. Unmerged branches
+# are preserved — they may be active WIP. Caller can pass a custom prefix
+# list via $2 (newline-separated `refs/heads/<prefix>*` patterns) but the
+# default whitelist covers every temp prefix the loop / Claude session /
+# worktree-agent paths create.
 _loop_cleanup_stale_cycle_branches() {
   local project_path="${1:-.}"
   local url; url=$(git -C "$project_path" remote get-url origin 2>/dev/null) || return 0
   [[ "$url" == *github.com* ]] || return 0
-  local branches
-  branches=$(git -C "$project_path" ls-remote --heads origin 'refs/heads/loop/cycle-*' 2>/dev/null \
-    | awk '{print $2}' | sed 's|^refs/heads/||')
+  local prefixes="${2:-refs/heads/loop/cycle-*
+refs/heads/worktree-agent-*
+refs/heads/claude/*}"
+  local branches=""
+  while IFS= read -r pat; do
+    [ -z "$pat" ] && continue
+    local found
+    found=$(git -C "$project_path" ls-remote --heads origin "$pat" 2>/dev/null \
+      | awk '{print $2}' | sed 's|^refs/heads/||')
+    [ -n "$found" ] && branches+="${found}"$'\n'
+  done <<< "$prefixes"
   [ -z "$branches" ] && return 0
   while IFS= read -r branch; do
@@ -6534,6 +6746,41 @@ _loop_cleanup_stale_cycle_branches() {
   return 0
 }
+# FIX-104: residual-visibility command. List origin's ephemeral temp branches
+# (loop/cycle-*, worktree-agent-*, claude/*) with their merge status so the
+# user can see what GC will clean up next cycle and what's still active WIP.
+# Output: TAB-separated `<branch>\t<merged|open>` lines, one per branch.
+# Silent on non-GitHub remote / empty / unreachable.
+_loop_branches() {
+  local project_path="${1:-.}"
+  local url; url=$(git -C "$project_path" remote get-url origin 2>/dev/null) || return 0
+  [[ "$url" == *github.com* ]] || return 0
+  local prefixes="refs/heads/loop/cycle-*
+refs/heads/worktree-agent-*
+refs/heads/claude/*"
+  local branches=""
+  while IFS= read -r pat; do
+    [ -z "$pat" ] && continue
+    local found
+    found=$(git -C "$project_path" ls-remote --heads origin "$pat" 2>/dev/null \
+      | awk '{print $2}' | sed 's|^refs/heads/||')
+    [ -n "$found" ] && branches+="${found}"$'\n'
+  done <<< "$prefixes"
+  [ -z "$branches" ] && return 0
+  while IFS= read -r branch; do
+    [ -z "$branch" ] && continue
+    local status="open"
+    if git -C "$project_path" merge-base --is-ancestor "$branch" origin/main 2>/dev/null; then
+      status="merged"
+    fi
+    printf "%s\t%s\n" "$branch" "$status"
+  done <<< "$branches"
+  return 0
+}
 # US-AUTO-033: publish a loop cycle branch as a GitHub PR with auto-merge.
 #
 # _loop_publish_pr <branch> [title]
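Reviewer sketch (not part of the package): `_loop_branches` above emits one `<branch>\t<merged|open>` line per ephemeral branch. The Python below illustrates only the parsing and formatting steps. `parse_ls_remote` and `branch_report` are hypothetical names, and the merge check is injected as a callable instead of shelling out to `git merge-base --is-ancestor`.

```python
from typing import Callable, List


def parse_ls_remote(output: str) -> List[str]:
    """Extract branch names from `git ls-remote --heads` output (sha, whitespace, ref)."""
    prefix = "refs/heads/"
    branches = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].startswith(prefix):
            branches.append(parts[1][len(prefix):])
    return branches


def branch_report(branches: List[str], is_merged: Callable[[str], bool]) -> List[str]:
    """Format TAB-separated '<branch>\\t<merged|open>' lines, one per branch."""
    return [f"{b}\t{'merged' if is_merged(b) else 'open'}" for b in branches]
```

Keeping the merge predicate injectable is a design choice for this sketch only; it lets the formatting be checked without a repository.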
@@ -7114,7 +7361,19 @@ cmd_ci() {
 # will switch to hard-fail. Output format mirrors a linter ("file:line:
 # message") so editors can navigate from it.
 _backlog_lint() {
-  local backlog="${1:-.roll/backlog.md}"
+  # FIX-102: --gate flag flips Phase 1 warn-only behavior to hard-fail.
+  # When passed, any violation makes the command exit 1 — used by the
+  # PreToolUse / Stop hook in ~/.claude/settings.json to actually block
+  # the assistant from leaving the backlog dirty.
+  local gate=0
+  local backlog=".roll/backlog.md"
+  while [ $# -gt 0 ]; do
+    case "$1" in
+      --gate) gate=1 ;;
+      *) backlog="$1" ;;
+    esac
+    shift
+  done
   [ -f "$backlog" ] || { err "backlog not found: $backlog"; return 1; }
   local violations=0
@@ -7139,6 +7398,18 @@ _backlog_lint() {
       | sed -E 's|^\[[A-Z]+-[0-9]+\]\([^)]*\)[[:space:]]*||' \
       | sed -E 's|^[A-Z]+-[0-9]+[[:space:]]*||')
     local issues=""
+    # FIX-102: length check — backlog rows are an index page; descriptions
+    # must be one human sentence (≤120 chars). Longer = technical detail
+    # that belongs in the linked .roll/features/<epic>/<slug>.md.
+    if [ "${#body}" -gt 120 ]; then
+      issues="${issues:+${issues}, }length>${#body}"
+    fi
+    # FIX-102: code-fence check — backticks (`code`) signal technical jargon
+    # (commands, identifiers, paths). Keep description prose plain text;
+    # any code goes in the feature file.
+    if echo "$body" | grep -qF '`'; then
+      issues="${issues:+${issues}, }code-fence"
+    fi
     # Filenames: bare `something.ext` for common code/config extensions
     if echo "$body" | grep -qE '\b[A-Za-z_][A-Za-z0-9_.-]*\.(sh|bash|yaml|yml|json|js|ts|tsx|py|rb|go|rs|c|cpp|h)\b'; then
       issues="${issues:+${issues}, }filename"
@@ -7165,11 +7436,14 @@ _backlog_lint() {
   echo ""
   if [ "$violations" -gt 0 ]; then
     echo "  ${violations} violation(s) — see conventions/global/AGENTS.md §4"
+    if [ "$gate" = 1 ]; then
+      echo "  ${violations} 条违规 — --gate enabled, exiting 1"
+      return 1
+    fi
     echo "  ${violations} 条违规 — Phase 1: warn-only, not blocking"
   else
     echo "  No violations  无违规"
   fi
-  # Phase 1: warn-only. Exit 0 regardless.
   return 0
 }
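Reviewer sketch (not part of the package): the FIX-102 hunks add two checks on a backlog row's description (longer than 120 characters, or containing a backtick) and a `--gate` flag that turns violations into exit status 1. A minimal Python rendering of those rules follows, assuming the hypothetical names `lint_body` and `lint_exit_code`; the other existing checks (filenames and so on) are omitted.

```python
from typing import List

MAX_LEN = 120  # limit stated in the FIX-102 comment


def lint_body(body: str) -> List[str]:
    """Return the FIX-102 issue labels for one backlog description."""
    issues = []
    if len(body) > MAX_LEN:
        issues.append(f"length>{len(body)}")  # same label shape as the shell code
    if "`" in body:
        issues.append("code-fence")
    return issues


def lint_exit_code(violations: int, gate: bool) -> int:
    """Warn-only by default; with gate enabled, any violation yields 1."""
    return 1 if (gate and violations > 0) else 0
```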
@@ -7185,7 +7459,8 @@ cmd_backlog() {
   # ── Status management subcommands ─────────────────────────────────────────
   case "$subcmd" in
     lint)
-      _backlog_lint "$backlog"
+      shift
+      _backlog_lint "$@" "$backlog"
       return
       ;;
     block|defer|unblock|promote)

package/lib/__pycache__/roll-loop-status.cpython-314.pyc CHANGED

Binary file

package/lib/__pycache__/roll_render.cpython-314.pyc CHANGED

Binary file

package/lib/loop-fmt.py CHANGED

@@ -353,14 +353,28 @@ class LoopFmt:
         # Use the cumulative totals accumulated across all assistant turns;
         # result.usage is per-turn (last only) so it would under-count badly.
         model = result_ev.get("model") or self._last_model or ""
+        # FIX-099: skip writing the usage event when claude returned no real
+        # usage data (no model, no tokens, AND cost/duration both zero). This prevents
+        # stale/placeholder values from leaking into the events stream and
+        # showing up as "cost=$1.24 dur=372s" in three consecutive cycles when
+        # the real cycle had no token data (the default-value fallback).
+        # The dashboard can render "n/a" for missing usage rather than false data.
+        has_model   = bool(model)
+        has_tokens  = any(self._usage_totals[k] > 0 for k in self._usage_totals)
+        has_cost    = bool(cost_usd)
+        has_dur     = bool(dur_ms)
+        if not has_model and not has_tokens and not has_cost and not has_dur:
+            return  # nothing real to report — skip rather than persist zeros
         payload = {
-            "model":                 model,
+            "model":                 model if has_model else None,
             "input_tokens":          self._usage_totals["input_tokens"],
             "output_tokens":         self._usage_totals["output_tokens"],
             "cache_creation_tokens": self._usage_totals["cache_creation_tokens"],
             "cache_read_tokens":     self._usage_totals["cache_read_tokens"],
-            "cost_reported_usd":     float(cost_usd or 0),
-            "duration_ms":           int(dur_ms or 0),
+            "cost_reported_usd":     float(cost_usd) if has_cost else None,
+            "duration_ms":           int(dur_ms) if has_dur else None,
         }
         evfile = os.path.join(shared, "loop", f"events-{slug}.ndjson")
         line = json.dumps({

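Reviewer sketch (not part of the package): the FIX-099 hunk above skips the usage event when nothing real was reported and writes `None` instead of zero for missing fields. The standalone function below restates that decision; `build_usage_payload` is a hypothetical name, and returning `None` stands in for the early `return` in the original method.

```python
from typing import Any, Dict, Optional

TOKEN_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_tokens",
    "cache_read_tokens",
)


def build_usage_payload(
    model: str, totals: Dict[str, int], cost_usd: Any, dur_ms: Any
) -> Optional[Dict[str, Any]]:
    """Return the usage payload, or None when there is nothing real to report."""
    has_model = bool(model)
    has_tokens = any(totals.get(k, 0) > 0 for k in TOKEN_KEYS)
    has_cost = bool(cost_usd)
    has_dur = bool(dur_ms)
    if not (has_model or has_tokens or has_cost or has_dur):
        return None  # skip rather than persist placeholder zeros
    payload: Dict[str, Any] = {"model": model if has_model else None}
    for k in TOKEN_KEYS:
        payload[k] = totals.get(k, 0)
    payload["cost_reported_usd"] = float(cost_usd) if has_cost else None
    payload["duration_ms"] = int(dur_ms) if has_dur else None
    return payload
```

A consumer can then distinguish "missing" (`None`) from a genuine zero, which is what lets the dashboard render "n/a".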
package/lib/roll-loop-status.py CHANGED

@@ -356,8 +356,10 @@ def backfill_usage_from_claude_sessions(cycles: List[Dict[str, Any]], slug: str)
         # Path 1: usage event written by loop-fmt at result time.
         ue = cy.get("usage_event")
         if isinstance(ue, dict) and (ue.get("input_tokens") or ue.get("output_tokens")):
-            cy["input_tokens"]  = int(ue.get("input_tokens")  or 0)
-            cy["output_tokens"] = int(ue.get("output_tokens") or 0)
+            cy["input_tokens"]          = int(ue.get("input_tokens")          or 0)
+            cy["output_tokens"]         = int(ue.get("output_tokens")         or 0)
+            cy["cache_creation_tokens"] = int(ue.get("cache_creation_tokens") or 0)
+            cy["cache_read_tokens"]     = int(ue.get("cache_read_tokens")     or 0)
             cy["model"] = ue.get("model")
             # US-VIEW-010: aggregate now sums per-turn usage tokens, so the
             # totals in `ue` reflect the whole cycle. Always compute cost at
@@ -380,8 +382,10 @@ def backfill_usage_from_claude_sessions(cycles: List[Dict[str, Any]], slug: str)
         u = load_claude_session_usage(cy.get("label", ""), slug)
         if not u:
             continue
-        cy["input_tokens"]  = int(u.get("input_tokens")  or 0)
-        cy["output_tokens"] = int(u.get("output_tokens") or 0)
+        cy["input_tokens"]          = int(u.get("input_tokens")          or 0)
+        cy["output_tokens"]         = int(u.get("output_tokens")         or 0)
+        cy["cache_creation_tokens"] = int(u.get("cache_creation_tokens") or 0)
+        cy["cache_read_tokens"]     = int(u.get("cache_read_tokens")     or 0)
         cy["model"] = u["model"]
         cy["cost_list"] = mp.compute_list_cost(
             u["model"],
@@ -557,7 +561,8 @@ def rollup_for_day(day_cycles: List[Dict[str, Any]]) -> Dict[str, Any]:
     # reads all 4 fields), but they don't represent the model's actual work.
     r = {"cycles": len(day_cycles), "prs": 0, "failed": 0,
          "duration_s": 0, "cost": 0.0,
-         "input_tokens": 0, "output_tokens": 0}
+         "input_tokens": 0, "output_tokens": 0,
+         "cache_creation_tokens": 0, "cache_read_tokens": 0}
     for cy in day_cycles:
         if cy.get("outcome") == "fail":
             r["failed"] += 1
@@ -567,6 +572,10 @@ def rollup_for_day(day_cycles: List[Dict[str, Any]]) -> Dict[str, Any]:
             r["input_tokens"] += cy["input_tokens"]
         if cy.get("output_tokens"):
             r["output_tokens"] += cy["output_tokens"]
+        if cy.get("cache_creation_tokens"):
+            r["cache_creation_tokens"] += cy["cache_creation_tokens"]
+        if cy.get("cache_read_tokens"):
+            r["cache_read_tokens"] += cy["cache_read_tokens"]
         # US-VIEW-011: rollup only counts cycles whose PR actually merged.
         # Backward compat: rows where pr_outcome is missing but pr URL exists
         # (no `pr` event after the writer upgrade ran for that cycle) are
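Reviewer sketch (not part of the package): the rollup hunks above extend the per-day totals from two token fields to four. The loop below shows the same accumulation in isolation; `rollup_tokens` is a hypothetical helper, and missing or `None` fields are treated as zero, as the `cy.get(...)` guards in the diff do.

```python
from typing import Any, Dict, List

TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_tokens",
    "cache_read_tokens",
)


def rollup_tokens(day_cycles: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum the four token counters across one day's cycles."""
    totals = {field: 0 for field in TOKEN_FIELDS}
    for cy in day_cycles:
        for field in TOKEN_FIELDS:
            totals[field] += cy.get(field) or 0  # absent or None counts as 0
    return totals
```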
@@ -634,10 +643,13 @@ def render(events, cron, state, backlog, *, days=3, lang="both", now=None,
                     c("dim", "run ") + c("fg", "roll loop on", bold=True) +
                     c("dim", " to enable"))
             eb_zh = c("dim", "  未安装 · 运行 ") + c("fg", "roll loop on") + c("dim", " 启用")
-        elif install_state == "disabled":
-            eb_l = (c("amber", "◌ installed/off", bold=True) + c("muted", "   ") +
-                    c("dim", "loop disabled — run ") + c("fg", "roll loop on", bold=True))
-            eb_zh = c("dim", "  未启用 · 运行 ") + c("fg", "roll loop on") + c("dim", " 启用")
+        elif install_state in ("stale", "disabled"):
+            # FIX-098: 'stale' = plist on disk but agent not registered in launchd.
+            # 'disabled' kept for back-compat (old install_state values). Both mean
+            # the user needs to run 'roll loop on' to bootstrap the agent.
+            eb_l = (c("amber", "◌ STALE — plist present, not loaded", bold=True) + c("muted", "   ") +
+                    c("dim", "run ") + c("fg", "roll loop on", bold=True) + c("dim", " to repair"))
+            eb_zh = c("dim", "  Plist 存在但未加载 · 运行 ") + c("fg", "roll loop on") + c("dim", " 修复")
         else:
             eb_l = (c("blue", "● IDLE", bold=True) + c("muted", " · ") +
                     c("dim", "enabled · next run ") + c("fg", _next_cron_hint(state), bold=True))
@@ -723,11 +735,12 @@ def render(events, cron, state, backlog, *, days=3, lang="both", now=None,
            yest_color="amber" if yest["failed"] > 0 else "dim",
            yest_suffix="⚠" if yest["failed"] > 0 else "")
     metric_dur("duration", today["duration_s"], yest["duration_s"], d2["duration_s"], partial=is_partial)
-    # US-VIEW-012: input + output as two separate rows. cache_read no longer
-    # surfaces here — true cost is on the "cost" line below (computed from all
-    # 4 token kinds via list price). This row labels what the model actually
-    # processed and generated for this cycle.
+    # US-VIEW-017: show all 4 token components so the cost is explainable.
+    # cache_creation (↑) and cache_read (↓) typically account for 80-90% of
+    # cost — hiding them makes the cost line incomprehensible.
     metric_tokens("input tokens",  today["input_tokens"],  yest["input_tokens"],  d2["input_tokens"],  partial=is_partial)
+    metric_tokens("cache writes",  today["cache_creation_tokens"], yest["cache_creation_tokens"], d2["cache_creation_tokens"], partial=is_partial)
+    metric_tokens("cache reads",   today["cache_read_tokens"],     yest["cache_read_tokens"],     d2["cache_read_tokens"],     partial=is_partial)
     metric_tokens("output tokens", today["output_tokens"], yest["output_tokens"], d2["output_tokens"], partial=is_partial)
     metric_dollar("cost",   today["cost"],      yest["cost"],      d2["cost"],       partial=is_partial)
@@ -784,15 +797,18 @@ def _read_plist_loop_minute() -> int:
 def _detect_install_state() -> str:
-    """FIX-095: classify the launchd install state of the loop service.
+    """FIX-095 / FIX-098: classify the launchd install state of the loop service.
     Returns one of:
       'not-installed' — no plist for com.roll.loop.<slug> in ~/Library/LaunchAgents/
-      'disabled'      — plist exists but launchctl print-disabled shows '=> disabled'
-      'enabled'       — plist exists and no disable override is set
-    Pre-FIX-095, the v2 view rendered '● IDLE' for all three states, leaving
-    users unable to tell whether the loop was actually installed/enabled.
+      'stale'         — plist on disk but agent NOT registered in launchd
+                        (happens after roll loop off + roll update without roll loop on)
+      'enabled'       — plist on disk AND registered in launchd
+    FIX-098: switched from `launchctl print-disabled` (disabled-overrides DB) to
+    `launchctl print gui/<uid>/<label>` which probes the actual launchd registry.
+    The old approach returned false-positive 'enabled' when the disabled-overrides
+    DB had no entry for the label (empty = not explicitly disabled, not loaded).
     """
     slug = project_slug()
     label = f"com.roll.loop.{slug}"
@@ -801,17 +817,17 @@ def _detect_install_state() -> str:
         return "not-installed"
     try:
         uid = os.getuid()
-        out = subprocess.run(
-            ["launchctl", "print-disabled", f"gui/{uid}"],
-            capture_output=True, text=True, timeout=2,
-        ).stdout or ""
-        for line in out.splitlines():
-            if f'"{label}"' in line and "=> disabled" in line:
-                return "disabled"
+        result = subprocess.run(
+            ["launchctl", "print", f"gui/{uid}/{label}"],
+            capture_output=True, timeout=2,
+        )
+        if result.returncode == 0:
+            return "enabled"
+        return "stale"
     except Exception:
-        # launchctl missing or timed out — best-effort fall through to enabled.
-        pass
-    return "enabled"
+        # launchctl missing or timed out — assume stale (safe: user sees STALE
+        # banner and is told to run 'roll loop on' to repair).
+        return "stale"
 def _next_cron_hint(state: Dict[str, str], zh: bool = False) -> str:

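Reviewer sketch (not part of the package): after FIX-098, `_detect_install_state` maps "plist present" plus the exit status of `launchctl print gui/<uid>/<label>` onto three states, and treats any probe failure as `stale`. The function below isolates that mapping; `classify_install_state` and the injected `probe` callable are hypothetical, so the logic can be exercised without launchd.

```python
from typing import Callable


def classify_install_state(plist_exists: bool, probe: Callable[[], int]) -> str:
    """Classify the loop service state.

    probe() is assumed to return the exit status of the launchctl print call;
    any exception (missing binary, timeout) is treated as 'stale'.
    """
    if not plist_exists:
        return "not-installed"
    try:
        return "enabled" if probe() == 0 else "stale"
    except Exception:
        return "stale"
```

Defaulting to `stale` on failure matches the comment in the diff: the user sees the repair banner instead of a possibly false `enabled`.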
package/lib/roll_render.py CHANGED

@@ -298,12 +298,19 @@ def cycle_row(cy: Dict[str, Any], backlog: Dict[str, str]) -> None:
         from datetime import datetime as _dt, timezone as _tz
         dur_s = int((_dt.now(_tz.utc) - cy["start"]).total_seconds())
     dur = fmt_dur(dur_s) if dur_s else "—"
-    # US-VIEW-012: token column shows model's real work as input/output. Cache
-    # creation / cache read are kept in events.ndjson for cost math but never
-    # surface in the UI — they would inflate the visible number to 10–100× the
-    # "real" work done by the model on this cycle. fmt_tokens(0) already
-    # returns "—", so a cycle missing usage_event prints as "—/—".
-    tok = f"{fmt_tokens(cy.get('input_tokens') or 0)}/{fmt_tokens(cy.get('output_tokens') or 0)}"
+    # US-VIEW-017: show all 4 token components when cache data is available.
+    # Format: "in/cw↑ cr↓/out" (cache writes ↑, cache reads ↓).
+    # Falls back to "in/out" for cycles that predate cache tracking.
+    inp = cy.get('input_tokens') or 0
+    out_tok = cy.get('output_tokens') or 0
+    cw  = cy.get('cache_creation_tokens') or 0
+    cr  = cy.get('cache_read_tokens') or 0
+    if cw or cr:
+        tok = (f"{fmt_tokens(inp)}"
+               f"/{fmt_tokens(cw)}↑ {fmt_tokens(cr)}↓"
+               f"/{fmt_tokens(out_tok)}")
+    else:
+        tok = f"{fmt_tokens(inp)}/{fmt_tokens(out_tok)}"
     # cost prefers the backfilled list-price; falls back to cron.log when
     # the claude session log isn't available (only the latest cycle).
     if cy.get("cost_list") is not None:
@@ -347,7 +354,7 @@ def cycle_row(cy: Dict[str, Any], backlog: Dict[str, str]) -> None:
         "  " + c(glyph_c, glyph, bold=True) + "  " +
         c(time_c, pad(time_str, 5), bold=(outcome == "fail")) + "   " +
         c("muted", pad(dur, 4, "r")) + "  " +
-        c("muted", pad(tok, 11, "r")) + "  " +
+        c("muted", pad(tok, 26)) + "  " +
         model_seg +
         c("muted", pad(cost, 7, "r")) + "   " +
         c(sid_c, ids_str, bold=True) + pr_marker

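Reviewer sketch (not part of the package): the US-VIEW-017 change renders the token column as `in/cw↑ cr↓/out` when cache data exists and falls back to `in/out` otherwise. The example below reproduces that branching. Note that `fmt_tokens` here is a hypothetical stand-in: the diff only confirms that the real helper returns "—" for 0, so the k/M abbreviations are an assumption.

```python
from typing import Any, Dict


def fmt_tokens(n: int) -> str:
    """Stand-in formatter (assumed behavior except for the 0 case)."""
    if not n:
        return "—"
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}k"
    return str(n)


def token_column(cy: Dict[str, Any]) -> str:
    """Build the token cell: four components with cache data, two without."""
    inp = cy.get("input_tokens") or 0
    out_tok = cy.get("output_tokens") or 0
    cw = cy.get("cache_creation_tokens") or 0
    cr = cy.get("cache_read_tokens") or 0
    if cw or cr:
        return f"{fmt_tokens(inp)}/{fmt_tokens(cw)}↑ {fmt_tokens(cr)}↓/{fmt_tokens(out_tok)}"
    return f"{fmt_tokens(inp)}/{fmt_tokens(out_tok)}"
```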
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@seanyao/roll",
-  "version": "2026.522.2",
+  "version": "2026.523.1",
   "description": "Roll — Roll out features with AI agents",
   "scripts": {
     "test": "bash tests/run.sh"

package/skills/roll-.dream/SKILL.md CHANGED

@@ -224,6 +224,65 @@ Add after `## 文档覆盖度` section:
 {发现内容列表 或 "文档新鲜度良好，无滞后或缺失项。"}
 ```
+### Scan 7 — Test Quality (rubric-driven)
+Apply the test-quality rubric at [guide/en/testing/quality-rubric.md](../../guide/en/testing/quality-rubric.md)
+(Chinese: [quality-rubric.zh.md](../../guide/zh/testing/quality-rubric.md)) against every file under
+`tests/`. The rubric publishes six anti-pattern categories (❶..❻); each has a
+**Signals** subsection that lists the matching heuristics. Scan 7 is purely a
+mechanical apply-the-rubric step — no new logic.
+**Per-category signals** — read from the rubric, summarized here:
+| Marker | Anti-pattern | Cheapest signal |
+|--------|--------------|-----------------|
+| ❶ | Hardcoded business data | Bare numeric / version / pricing literal inside `[[ "$output" == *"..."*` that matches a value also defined in `lib/` |
+| ❷ | Over-mocking real boundaries | `function git() {` / `function gh() {` overrides at the top of a unit test |
+| ❸ | Asserting implementation details | `grep '_internal_helper'` against output; assertions on `.roll/internal/*` paths |
+| ❹ | Fixture order coupling | `setup_file` writes shared mutable state without per-test reset |
+| ❺ | Testing private functions | Test sources a `lib/` file and calls a `_underscore_prefixed` helper directly |
+| ❻ | Asserting framework behavior | References to `$BATS_TEST_NUMBER`, `$BATS_SUITE_NAME` in assertions |
+**Rate cap: at most 5 test-quality REFACTOR entries per cycle**. A single dream cycle may
+produce more than 5 findings; the dream scan must rank them by severity (❶ > ❷ > ❸ > ❹ > ❺ > ❻
+and, within a class, by occurrence count) and persist only the top 5 to BACKLOG.
+Remaining findings go into the dream log under `## 测试质量` but are not made
+into REFACTOR rows. This keeps the backlog from being drowned in test debt
+on the first scan after rubric publication.
+**REFACTOR entry format** — same as other scans, but tagged with category:
+```markdown
+| REFACTOR-XXX | docs: <one-line description> [test-quality:❶] — flagged by dream YYYY-MM-DD | 📋 Todo |
+```
+The `[test-quality:❶]` (through `❻`) tag is **required** so downstream filtering
+(e.g. "show me all ❶ items still open") is mechanical. The marker character must
+match the rubric exactly.
+**Optional helper** — `bin/dream-test-quality-scan` is a thin shell script
+maintainers can invoke ad-hoc to dry-run the ❶ detector against a single file
+or directory (see `bin/dream-test-quality-scan --help`). The dream skill itself
+does **not** depend on the helper — Scan 7 is the AI agent applying the rubric.
+The helper just exists so a maintainer (or this skill's smoke test) can confirm
+the ❶ heuristic still finds known instances.
+#### Dream Log Section (Scan 7)
+Add after `## 文档新鲜度` section:
+```markdown
+## 测试质量
+- 本轮发现 {N} 项（写入 BACKLOG 的前 5 项见下；剩余 {M} 项仅记录于本日志）
+- ❶ 硬编码业务数据：{count}
+- ❷ 过度 mock：{count}
+- ❸ 断言实现细节：{count}
+- ❹ Fixture 顺序耦合：{count}
+- ❺ 测私有函数：{count}
+- ❻ 断言框架行为：{count}
+{命中文件列表 或 "未发现可治理的测试反模式。"}
+```
 ## Output
 ### REFACTOR Entry (.roll/backlog.md)

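Reviewer sketch (not part of the package): Scan 7's rate cap ranks findings by category severity (❶ first), then by occurrence count, and persists only the top 5 to the backlog. One way to express that ordering is shown below. The tuple shape `(marker, occurrence_count, description)` and the name `split_findings` are assumptions made for illustration, since the skill text leaves the data shape to the agent.

```python
from typing import List, Tuple

Finding = Tuple[str, int, str]  # (marker, occurrence_count, description)

SEVERITY = "❶❷❸❹❺❻"  # most to least severe, per the skill text
RATE_CAP = 5


def split_findings(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Return (persist_to_backlog, log_only) after ranking by severity, then count."""
    ranked = sorted(findings, key=lambda f: (SEVERITY.index(f[0]), -f[1]))
    return ranked[:RATE_CAP], ranked[RATE_CAP:]
```

The second list corresponds to the findings that are recorded only in the dream log section.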
package/skills/roll-design/SKILL.md CHANGED

@@ -118,9 +118,10 @@ Document structure (two-layer separation):
 **Important rules:**
 1. Plan files go in `.roll/features/<feature>-plan.md` (**no longer using** `docs/plans/`)
 2. US details go in the corresponding `.roll/features/<feature>.md`
-3. .roll/backlog.md only contains index rows (one row per US), **do not write** AC / Files / Notes
-4. Domain model files go in `.roll/domain/` — create on first greenfield design, update incrementally
-5. **Do not** write to `~/.kimi/` or any global config directory
+3. **FIX / IDEA detail files use ID-prefixed filenames**: `.roll/features/<epic>/FIX-097.md`, not `.roll/features/<epic>/some-descriptive-slug.md`. Reason: a single FIX is one card, not a long-lived feature; the ID is the most stable handle, descriptive slugs date quickly and break links. US can keep feature-slug naming (US lives inside a multi-Story feature file). Quick lookup: `ls .roll/features/<epic>/FIX-*.md` finds all bugs in that area without grepping content.
+4. .roll/backlog.md only contains index rows (one row per US), **do not write** AC / Files / Notes
+5. Domain model files go in `.roll/domain/` — create on first greenfield design, update incrementally
+6. **Do not** write to `~/.kimi/` or any global config directory
 **File path resolution order:**
 1. Determine Feature ownership (based on the requirement domain: compiler / ingest / qa / ...)

package/skills/roll-notes/SKILL.md CHANGED

@@ -29,7 +29,7 @@ $roll-notes 今天的 code review 给了很好的反馈
 ## Behavior
-1. **Determine file path**: `notes/YYYY-MM-DD.md` relative to project root
+1. **Determine file path**: `.roll/notes/YYYY-MM-DD.md` relative to project root (parallel to `.roll/dream/` and `.roll/briefs/` — notes is project metadata, not source)
 2. **Get current time**: Use `Asia/Shanghai` timezone (`TZ=Asia/Shanghai date`)
 3. **Read existing entries for style**: Before writing, read the last 2–3 entries
    in the same file. Analyze their style: heading format, voice/tone,
@@ -95,6 +95,9 @@ $roll-notes 今天的 code review 给了很好的反馈
 ## File location
 ```
-notes/
-  └── YYYY-MM-DD.md
+.roll/
+  └── notes/
+        └── YYYY-MM-DD.md
 ```
+Note: notes are project metadata (at the same level as `.roll/dream/` / `.roll/briefs/`) and are not committed to git; downstream skills such as dream/brief aggregate them across days.
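Reviewer sketch (not part of the package): the updated skill resolves the note file as `.roll/notes/YYYY-MM-DD.md` under the project root, dated in the `Asia/Shanghai` timezone. The helper below shows that resolution; `notes_path` is a hypothetical name, and a fixed UTC+8 offset is used as a stand-in for `Asia/Shanghai` so the example does not depend on an installed timezone database.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Assumption: fixed UTC+8 offset standing in for Asia/Shanghai.
SHANGHAI = timezone(timedelta(hours=8))


def notes_path(project_root: str, now: Optional[datetime] = None) -> Path:
    """Return <root>/.roll/notes/YYYY-MM-DD.md for the current Shanghai date."""
    moment = now if now is not None else datetime.now(SHANGHAI)
    local = moment.astimezone(SHANGHAI)
    return Path(project_root) / ".roll" / "notes" / f"{local:%Y-%m-%d}.md"
```

Passing `now` explicitly makes the date rollover visible: 18:30 UTC already belongs to the next calendar day in Shanghai.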