npm - @windyroad/risk-scorer - Versions diffs - 0.12.7-preview.583 → 0.12.7-preview.591 - Mend

@windyroad/risk-scorer 0.12.7-preview.583 → 0.12.7-preview.591

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +10 -0
package/hooks/git-push-gate.sh +18 -0
package/hooks/lib/risk-gate.sh +142 -0
package/hooks/test/ci-status-gate.bats +240 -0
package/package.json +1 -1
package/skills/assess-external-comms/SKILL.md +3 -3
package/skills/assess-release/SKILL.md +3 -3
package/skills/assess-release/test/assess-skills-delegate-via-skill-tool.bats +162 -0
package/skills/assess-wip/SKILL.md +3 -3
package/skills/external-comms/SKILL.md +37 -0
package/skills/pipeline/SKILL.md +34 -0
package/skills/wip/SKILL.md +33 -0

package/README.md CHANGED

@@ -79,6 +79,16 @@ The plugin includes six specialised agents:
 | `/wr-risk-scorer:bootstrap-catalog` | Bootstrap `docs/risks/` register from existing `.risk-reports/` corpus per ADR-059 — walks reports, dedupes by ADR-056 slug, emits one `R<NNN>-<slug>.active.md` per unique slug. Idempotent. Auto-triggers from `/install-updates` Step 6.5.1 when register is empty + `RISK-POLICY.md` present + `.risk-reports/` non-empty |
 | `/wr-risk-scorer:update-policy` | Generate or update `RISK-POLICY.md` |
+### Internal-use wrapper skills
+These wrappers exist so consumer SKILLs can invoke the scoring agents via the Skill tool with `skill: wr-risk-scorer:<name>` per ADR-015's Confirmation literal phrasing. End users should invoke the `/wr-risk-scorer:assess-*` skills above; the wrappers are internal plumbing.
+| Wrapper skill | Purpose |
+|---------------|---------|
+| `wr-risk-scorer:pipeline` | Skill-tool wrapper around the `wr-risk-scorer:pipeline` agent (consumer: `/wr-risk-scorer:assess-release`) |
+| `wr-risk-scorer:wip` | Skill-tool wrapper around the `wr-risk-scorer:wip` agent (consumer: `/wr-risk-scorer:assess-wip`) |
+| `wr-risk-scorer:external-comms` | Skill-tool wrapper around the `wr-risk-scorer:external-comms` agent (consumer: `/wr-risk-scorer:assess-external-comms`) |
 ## External-comms gate
 The `external-comms-gate.sh` hook intercepts outbound prose tool calls and the

package/hooks/git-push-gate.sh CHANGED

@@ -45,6 +45,16 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run push:watch(\s|$)'; then
         if [ -f "${RDIR}/clean" ]; then
             exit 0
         fi
+        # CI-status precondition (P208): a within-appetite predicted-risk
+        # score is necessary but not sufficient — the lagging CI signal
+        # must also be green (or no-history-yet for the documented
+        # first-push case). Fail-closed on gh errors. Ordered AFTER the
+        # one-shot bypass markers and BEFORE the predicted-risk gate so
+        # incident workflows and clean-tree pushes are unaffected.
+        if ! check_ci_status "$SESSION_ID" "push"; then
+            risk_gate_deny "Push blocked: ${CI_GATE_REASON}"
+            exit 0
+        fi
         if ! check_risk_gate "$SESSION_ID" "push"; then
             if [ "$RISK_GATE_CATEGORY" = "threshold" ]; then
                 risk_gate_deny "Push blocked: Push risk score ${RISK_GATE_SCORE}/25 (Medium or above). To proceed: (1) release first via \`npm run release:watch\`, (2) split the push, or (3) add risk-reducing measures. If risk-neutral or risk-reducing, delegate to wr-risk-scorer:pipeline (subagent_type: 'wr-risk-scorer:pipeline') — it will create a bypass marker."
@@ -83,6 +93,8 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run release:watch(\s|$)'; the
         # Live-incident bypass: if an incident marker exists, allow release
         # regardless of risk score. Used when addressing outages, security
         # incidents, or information disclosure that requires immediate deployment.
+        # Per JTBD-201, this MUST short-circuit BEFORE the CI-status check
+        # so the hotfix path is unaffected by red CI on master.
         if [ -f "${RDIR}/incident-release" ]; then
             rm -f "${RDIR}/incident-release"
             exit 0
@@ -92,6 +104,12 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run release:watch(\s|$)'; the
             rm -f "${RDIR}/reducing-release"
             exit 0
         fi
+        # CI-status precondition (P208): a green CI run on the target
+        # branch is required before shipping. Fail-closed on gh errors.
+        if ! check_ci_status "$SESSION_ID" "release"; then
+            risk_gate_deny "Release blocked: ${CI_GATE_REASON}"
+            exit 0
+        fi
         if ! check_risk_gate "$SESSION_ID" "release"; then
             risk_gate_deny "Release blocked: ${RISK_GATE_REASON}"
             exit 0
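
Reviewer note: the release-path ordering in the hunks above (one-shot bypass markers, then CI status, then predicted risk) can be sketched roughly as follows. This is a minimal illustration and not the package's code: `release_gate`, `ci_is_green`, `risk_is_within_appetite`, the `FAKE_CI` / `FAKE_SCORE` variables, and the threshold of 5 are all hypothetical stand-ins, and the sketch prints a decision instead of emitting deny JSON.

```shell
# Hypothetical sketch of the ordering only; names and threshold are assumptions.
RDIR="$(mktemp -d)"

# Stand-ins for check_ci_status / check_risk_gate.
ci_is_green() { [ "${FAKE_CI:-green}" = "green" ]; }
risk_is_within_appetite() { [ "${FAKE_SCORE:-1}" -lt 5 ]; }

release_gate() {
  local marker
  # 1. One-shot bypass markers short-circuit every later check and are consumed.
  for marker in incident-release reducing-release; do
    if [ -f "${RDIR}/${marker}" ]; then
      rm -f "${RDIR}/${marker}"
      echo "allow (${marker})"
      return 0
    fi
  done
  # 2. CI status is checked before the predicted-risk score.
  if ! ci_is_green; then
    echo "deny (ci)"
    return 0
  fi
  # 3. The predicted-risk gate runs last.
  if ! risk_is_within_appetite; then
    echo "deny (risk)"
    return 0
  fi
  echo "allow"
}
```

Placing the marker loop first is what keeps the incident path independent of CI state: a red run can only block callers that reach step 2.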

package/hooks/lib/risk-gate.sh CHANGED

@@ -153,6 +153,148 @@ print(('yes' if score > N else 'no') + ' ' + str(N))
   return 0
 }
+# Check CI health for the current branch (P208).
+#
+# Returns 0 if push/release may proceed, 1 if denied. Sets CI_GATE_REASON
+# on deny with a human-readable message that names the CI conclusion and
+# the run URL. Sets CI_GATE_CATEGORY ∈ {bypass, no-history, allow, red,
+# pending, gh-error}.
+#
+# Consults `gh run list --branch <current-branch> --limit 1 --json
+# status,conclusion,databaseId,url` for the working branch's most recent
+# CI run.
+#
+# Decision table:
+#   - bypass marker present (${RDIR}/ci-bypass-${ACTION}) → allow, consume
+#   - gh failure (auth / timeout / API error) → DENY (fail-CLOSED, per
+#     P208 safe-high-fix-risk classifier — a buggy harden must NOT
+#     degrade to allow)
+#   - empty result `[]` → allow (no CI history yet; first push triggers
+#     CI naturally)
+#   - status ∈ {queued, in_progress, pending, requested, waiting} → deny
+#   - conclusion ∈ {failure, cancelled, timed_out, action_required,
+#     startup_failure} → deny
+#   - conclusion ∈ {success, skipped, neutral} or unknown → allow
+#
+# Usage: check_ci_status "$SESSION_ID" "push"   # or "release"
+check_ci_status() {
+  local SESSION_ID="$1"
+  local ACTION="$2"
+  local RDIR
+  RDIR=$(_risk_dir "$SESSION_ID")
+  local BYPASS_MARKER="${RDIR}/ci-bypass-${ACTION}"
+  CI_GATE_REASON=""
+  CI_GATE_CATEGORY=""
+  # One-shot bypass marker — consumed on use, same family as
+  # reducing-push / incident-release. Documented override for the
+  # legitimate "first push triggers CI" edge case and infra incidents.
+  if [ -f "$BYPASS_MARKER" ]; then
+    rm -f "$BYPASS_MARKER"
+    CI_GATE_CATEGORY="bypass"
+    return 0
+  fi
+  # Resolve current branch. If we're not in a git repo or HEAD is
+  # detached, skip the CI check (the surrounding push/release gate
+  # would itself fail at the git layer with a clearer error).
+  local BRANCH
+  BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
+  if [ -z "$BRANCH" ] || [ "$BRANCH" = "HEAD" ]; then
+    CI_GATE_CATEGORY="allow"
+    return 0
+  fi
+  # Query GitHub. Bounded at 10s wall-clock so a network stall cannot
+  # hang push:watch indefinitely. `command -v timeout` because macOS
+  # default install does not ship GNU `timeout`.
+  local JSON GH_EXIT
+  if command -v timeout >/dev/null 2>&1; then
+    JSON=$(timeout 10s gh run list --branch "$BRANCH" --limit 1 \
+        --json status,conclusion,databaseId,url 2>/dev/null) || GH_EXIT=$?
+  else
+    JSON=$(gh run list --branch "$BRANCH" --limit 1 \
+        --json status,conclusion,databaseId,url 2>/dev/null) || GH_EXIT=$?
+  fi
+  if [ -n "${GH_EXIT:-}" ] && [ "$GH_EXIT" != "0" ]; then
+    CI_GATE_CATEGORY="gh-error"
+    CI_GATE_REASON="CI status check failed (gh exit ${GH_EXIT}: auth / timeout / API error). Fail-closed per P208 safe-high-fix-risk. Fix the underlying gh failure, or to override for a legitimate first-push-triggers-CI run, create the bypass marker: touch ${BYPASS_MARKER}"
+    return 1
+  fi
+  # Empty array = no CI history for this branch yet. Natural allow for
+  # the documented "first push triggers CI" case — no marker needed.
+  local TRIMMED
+  TRIMMED=$(printf '%s' "$JSON" | tr -d '[:space:]')
+  if [ -z "$TRIMMED" ] || [ "$TRIMMED" = "[]" ]; then
+    CI_GATE_CATEGORY="no-history"
+    return 0
+  fi
+  # Parse status, conclusion, url. Fail-closed on parse error.
+  local PARSED
+  PARSED=$(echo "$JSON" | python3 -c "
+import sys, json
+try:
+    runs = json.load(sys.stdin)
+    if not isinstance(runs, list) or not runs:
+        print('||')
+        sys.exit(0)
+    r = runs[0]
+    print('{}|{}|{}'.format(r.get('status') or '', r.get('conclusion') or '', r.get('url') or ''))
+except Exception:
+    print('PARSE_ERROR||')
+" 2>/dev/null || echo "PARSE_ERROR||")
+  local STATUS CONCLUSION URL
+  STATUS="${PARSED%%|*}"
+  local REST="${PARSED#*|}"
+  CONCLUSION="${REST%%|*}"
+  URL="${REST#*|}"
+  if [ "$STATUS" = "PARSE_ERROR" ]; then
+    CI_GATE_CATEGORY="gh-error"
+    CI_GATE_REASON="CI status check returned unparseable response. Fail-closed per P208 safe-high-fix-risk. To override for a legitimate first-push case, create the bypass marker: touch ${BYPASS_MARKER}"
+    return 1
+  fi
+  case "$STATUS" in
+    queued|in_progress|pending|requested|waiting)
+      CI_GATE_CATEGORY="pending"
+      CI_GATE_REASON="Latest CI run on branch '${BRANCH}' is still in flight (status: ${STATUS}). Wait for it to settle: ${URL}. To override, create the bypass marker: touch ${BYPASS_MARKER}"
+      return 1
+      ;;
+    completed)
+      case "$CONCLUSION" in
+        success|skipped|neutral|"")
+          CI_GATE_CATEGORY="allow"
+          return 0
+          ;;
+        failure|cancelled|timed_out|action_required|startup_failure)
+          CI_GATE_CATEGORY="red"
+          CI_GATE_REASON="Latest CI run on branch '${BRANCH}' concluded ${CONCLUSION}: ${URL}. Fix CI before pushing/releasing. To override for a legitimate first-push or infra-incident case, create the bypass marker: touch ${BYPASS_MARKER}"
+          return 1
+          ;;
+        *)
+          # Unknown conclusion — allow rather than block on a value we
+          # don't recognise. New GitHub conclusion values are infrequent.
+          CI_GATE_CATEGORY="allow"
+          return 0
+          ;;
+      esac
+      ;;
+    *)
+      # Unknown status: allow rather than block on a value we don't
+      # recognise. The predicted-risk threshold check that runs after
+      # this gate is relied on to catch the actual risk.
+      CI_GATE_CATEGORY="allow"
+      return 0
+      ;;
+  esac
+}
 # Emit fail-closed deny JSON for PreToolUse hooks.
 risk_gate_deny() {
   local REASON="$1"
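
Reviewer note: the status/conclusion part of the decision table in `check_ci_status` can be read as a pure mapping from two strings to a category. The sketch below restates that mapping in isolation. `classify_ci_run` is a hypothetical helper name, not something the package defines, and it deliberately omits the bypass-marker, `gh`-error, and empty-history branches.

```shell
# Hypothetical helper: maps (status, conclusion) to the category names used
# in the decision table. Not part of the package.
classify_ci_run() {
  local run_status="$1" run_conclusion="$2"
  case "$run_status" in
    queued|in_progress|pending|requested|waiting)
      # Run still in flight: deny until it settles.
      echo "pending" ;;
    completed)
      case "$run_conclusion" in
        failure|cancelled|timed_out|action_required|startup_failure)
          echo "red" ;;
        *)
          # success, skipped, neutral, empty, or an unrecognised value.
          echo "allow" ;;
      esac ;;
    *)
      # Unrecognised status values are allowed rather than blocked.
      echo "allow" ;;
  esac
}
```

Only the transport and parse failures are fail-closed in the diff; unrecognised status or conclusion values fall through to allow, which the sketch mirrors.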

package/hooks/test/ci-status-gate.bats ADDED

@@ -0,0 +1,240 @@
+#!/usr/bin/env bats
+# Tests for the CI-status precondition in the push/release gate.
+#
+# Closes P208 (known-error): git-push-gate.sh did not consult CI health
+# before scoring push/release risk, so a push could land on a CI-red
+# master and a release could ship broken code.
+#
+# Contract:
+# - `check_ci_status` queries `gh run list --branch <current-branch>
+#   --limit 1 --json status,conclusion,databaseId,url` for the current
+#   branch and returns 0 (allow) / 1 (deny).
+# - Deny on conclusion ∈ {failure, cancelled, timed_out, action_required,
+#   startup_failure}.
+# - Deny on status ∈ {queued, in_progress, pending, requested, waiting}.
+# - Allow on conclusion ∈ {success, skipped, neutral} or unknown.
+# - Empty array (no CI history yet) → allow. Handles the documented
+#   "first push triggers CI" case naturally — no bypass marker required.
+# - `gh` failure (auth/timeout/API error) → DENY (fail-closed per the
+#   safe-high-fix-risk classifier on P208).
+# - `${RDIR}/ci-bypass-${ACTION}` one-shot bypass marker — consumed on
+#   use, same family as reducing-push / incident-release.
+# - Integration: in git-push-gate.sh, the ordering is bypass-markers →
+#   CI status → risk gate. The `incident-release` bypass MUST short-
+#   circuit BEFORE the CI check fires (per JTBD-201 + ADR-018).
+setup() {
+  HOOKS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+  source "$HOOKS_DIR/lib/gate-helpers.sh"
+  source "$HOOKS_DIR/lib/risk-gate.sh"
+  TEST_SESSION="bats-ci-gate-$$-${BATS_TEST_NUMBER}"
+  RDIR=$(_risk_dir "$TEST_SESSION")
+  rm -rf "$RDIR"
+  mkdir -p "$RDIR"
+  # Stand up a fake git repo so `git rev-parse --abbrev-ref HEAD` resolves.
+  TEST_REPO="$(mktemp -d)"
+  ( cd "$TEST_REPO" && git init -q -b main && \
+      git -c user.email=t@e -c user.name=t commit --allow-empty -q -m "init" )
+  # Stub `gh` on PATH. The stub reads $FAKE_GH_OUTPUT and $FAKE_GH_EXIT
+  # for behaviour. PATH ordering: stub dir first.
+  STUB_DIR="$(mktemp -d)"
+  cat > "$STUB_DIR/gh" <<'STUB'
+#!/bin/bash
+if [ -n "${FAKE_GH_DELAY:-}" ]; then sleep "$FAKE_GH_DELAY"; fi
+if [ -n "${FAKE_GH_OUTPUT:-}" ]; then
+  printf '%s' "$FAKE_GH_OUTPUT"
+fi
+exit "${FAKE_GH_EXIT:-0}"
+STUB
+  chmod +x "$STUB_DIR/gh"
+  # `timeout` may not exist on the path on some macOS setups; in that
+  # case check_ci_status calls gh directly. The stub honours
+  # FAKE_GH_DELAY only for tests that exercise timeout behaviour.
+  ORIG_PATH="$PATH"
+  export PATH="$STUB_DIR:$PATH"
+  export TEST_REPO STUB_DIR
+}
+teardown() {
+  rm -rf "$RDIR" "$TEST_REPO" "$STUB_DIR"
+  export PATH="$ORIG_PATH"
+  unset FAKE_GH_OUTPUT FAKE_GH_EXIT FAKE_GH_DELAY CI_GATE_REASON CI_GATE_CATEGORY 2>/dev/null || true
+}
+# Run check_ci_status inside the fake repo so branch resolution works.
+_run_check() {
+  local action="$1"
+  CI_GATE_REASON=""
+  CI_GATE_CATEGORY=""
+  ( cd "$TEST_REPO" && \
+      FAKE_GH_OUTPUT="${FAKE_GH_OUTPUT:-}" FAKE_GH_EXIT="${FAKE_GH_EXIT:-0}" \
+      bash -c "source '$HOOKS_DIR/lib/gate-helpers.sh'; source '$HOOKS_DIR/lib/risk-gate.sh'; \
+               if check_ci_status '$TEST_SESSION' '$action'; then echo ALLOW; \
+               else echo \"DENY: \$CI_GATE_REASON\"; fi" )
+}
+@test "check_ci_status allows when latest CI run concluded success" {
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"success","databaseId":1,"url":"https://github.com/x/y/actions/runs/1"}]'
+  result=$(_run_check "push")
+  [[ "$result" == "ALLOW" ]]
+}
+@test "check_ci_status denies when latest CI run concluded failure (names conclusion + URL)" {
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":2,"url":"https://github.com/x/y/actions/runs/2"}]'
+  result=$(_run_check "push")
+  [[ "$result" == DENY:* ]]
+  [[ "$result" == *"failure"* ]]
+  [[ "$result" == *"https://github.com/x/y/actions/runs/2"* ]]
+}
+@test "check_ci_status denies when latest CI run concluded cancelled" {
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"cancelled","databaseId":3,"url":"https://github.com/x/y/actions/runs/3"}]'
+  result=$(_run_check "release")
+  [[ "$result" == DENY:* ]]
+  [[ "$result" == *"cancelled"* ]]
+}
+@test "check_ci_status denies when latest CI run concluded timed_out" {
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"timed_out","databaseId":4,"url":"https://github.com/x/y/actions/runs/4"}]'
+  result=$(_run_check "push")
+  [[ "$result" == DENY:* ]]
+  [[ "$result" == *"timed_out"* ]]
+}
+@test "check_ci_status denies when latest CI run status is in_progress" {
+  export FAKE_GH_OUTPUT='[{"status":"in_progress","conclusion":null,"databaseId":5,"url":"https://github.com/x/y/actions/runs/5"}]'
+  result=$(_run_check "push")
+  [[ "$result" == DENY:* ]]
+  [[ "$result" == *"in_progress"* ]]
+}
+@test "check_ci_status denies when latest CI run status is queued" {
+  export FAKE_GH_OUTPUT='[{"status":"queued","conclusion":null,"databaseId":6,"url":"https://github.com/x/y/actions/runs/6"}]'
+  result=$(_run_check "release")
+  [[ "$result" == DENY:* ]]
+  [[ "$result" == *"queued"* ]]
+}
+@test "check_ci_status allows when CI history is empty (first push triggers CI)" {
+  export FAKE_GH_OUTPUT='[]'
+  result=$(_run_check "push")
+  [[ "$result" == "ALLOW" ]]
+}
+@test "check_ci_status denies when gh exits non-zero (fail-closed, safe-high-fix-risk)" {
+  export FAKE_GH_OUTPUT=''
+  export FAKE_GH_EXIT=1
+  result=$(_run_check "push")
+  [[ "$result" == DENY:* ]]
+  # Must point at the ci-bypass marker for the documented override path.
+  [[ "$result" == *"ci-bypass-push"* ]]
+}
+@test "check_ci_status allows when ci-bypass marker is present and consumes it" {
+  : > "$RDIR/ci-bypass-push"
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":7,"url":"https://github.com/x/y/actions/runs/7"}]'
+  result=$(_run_check "push")
+  [[ "$result" == "ALLOW" ]]
+  # Bypass markers are one-shot — same family as reducing-push / incident-release.
+  [ ! -f "$RDIR/ci-bypass-push" ]
+}
+@test "check_ci_status bypass marker is action-scoped (push marker does not bypass release)" {
+  : > "$RDIR/ci-bypass-push"
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":8,"url":"https://github.com/x/y/actions/runs/8"}]'
+  result=$(_run_check "release")
+  [[ "$result" == DENY:* ]]
+  # push bypass must not have been consumed by a release check
+  [ -f "$RDIR/ci-bypass-push" ]
+}
+@test "check_ci_status allows when conclusion is skipped" {
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"skipped","databaseId":9,"url":"https://github.com/x/y/actions/runs/9"}]'
+  result=$(_run_check "push")
+  [[ "$result" == "ALLOW" ]]
+}
+@test "check_ci_status allows when conclusion is neutral" {
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"neutral","databaseId":10,"url":"https://github.com/x/y/actions/runs/10"}]'
+  result=$(_run_check "release")
+  [[ "$result" == "ALLOW" ]]
+}
+# ---------------------------------------------------------------------------
+# Integration: git-push-gate.sh ordering — bypass-markers → CI status → risk
+# gate. JTBD-201 demands the incident-release bypass MUST short-circuit
+# BEFORE the new CI-status check fires.
+# ---------------------------------------------------------------------------
+# Helper: build a PreToolUse Bash input with a given command
+_build_input() {
+  local cmd="$1"
+  cat <<JSON
+{
+  "session_id": "$TEST_SESSION",
+  "tool_name": "Bash",
+  "tool_input": {
+    "command": "$cmd"
+  }
+}
+JSON
+}
+@test "git-push-gate.sh denies push:watch when CI is red even if risk score is within appetite" {
+  # Within-appetite risk score
+  echo "1" > "$RDIR/push"
+  # Disable drift check (no stored hash file)
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":11,"url":"https://github.com/x/y/actions/runs/11"}]'
+  INPUT=$(_build_input "npm run push:watch")
+  output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+    FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+    "$HOOKS_DIR/git-push-gate.sh" )
+  [[ "$output" == *"permissionDecision"* ]]
+  [[ "$output" == *"deny"* ]]
+  [[ "$output" == *"failure"* ]]
+}
+@test "git-push-gate.sh denies release:watch when CI is red even if risk score is within appetite" {
+  echo "1" > "$RDIR/release"
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":12,"url":"https://github.com/x/y/actions/runs/12"}]'
+  INPUT=$(_build_input "npm run release:watch")
+  output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+    FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+    "$HOOKS_DIR/git-push-gate.sh" )
+  [[ "$output" == *"permissionDecision"* ]]
+  [[ "$output" == *"deny"* ]]
+  [[ "$output" == *"failure"* ]]
+}
+@test "git-push-gate.sh allows release:watch when incident-release bypass is set, even if CI is red (JTBD-201)" {
+  echo "9" > "$RDIR/release"
+  : > "$RDIR/incident-release"
+  # Even with a red CI conclusion, the incident bypass must short-circuit
+  # both the CI check and the risk threshold.
+  export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":13,"url":"https://github.com/x/y/actions/runs/13"}]'
+  INPUT=$(_build_input "npm run release:watch")
+  output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+    FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+    "$HOOKS_DIR/git-push-gate.sh" )
+  # No permissionDecision means allow (exit 0 with no JSON).
+  [[ "$output" != *"permissionDecision"* ]]
+  # incident-release marker is one-shot — must be consumed
+  [ ! -f "$RDIR/incident-release" ]
+}
+@test "git-push-gate.sh allows push:watch when CI history is empty (first push)" {
+  echo "1" > "$RDIR/push"
+  export FAKE_GH_OUTPUT='[]'
+  INPUT=$(_build_input "npm run push:watch")
+  output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+    FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+    "$HOOKS_DIR/git-push-gate.sh" )
+  [[ "$output" != *"permissionDecision"* ]]
+}
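
Reviewer note: the tests above rely on shadowing `gh` with a stub placed first on `PATH`. A stripped-down sketch of that technique follows, assuming a writable, executable temp directory. `run_with_stub` is a hypothetical helper introduced here for illustration; the `FAKE_GH_OUTPUT` and `FAKE_GH_EXIT` variable names are taken from the test file.

```shell
# Create a throwaway directory holding a fake `gh` executable.
STUB_DIR="$(mktemp -d)"
cat > "$STUB_DIR/gh" <<'STUB'
#!/bin/sh
# Fake gh: prints canned output and exits with the requested code.
printf '%s' "${FAKE_GH_OUTPUT:-}"
exit "${FAKE_GH_EXIT:-0}"
STUB
chmod +x "$STUB_DIR/gh"

# Hypothetical helper: run `gh` with the stub shadowing any real binary.
# The subshell keeps the PATH change from leaking into the caller.
run_with_stub() {
  (
    PATH="$STUB_DIR:$PATH"
    export PATH FAKE_GH_OUTPUT="$1" FAKE_GH_EXIT="$2"
    gh run list --limit 1
  )
}
```

Because the stub ignores its arguments, one script can stand in for every `gh` call the gate makes, with canned output and exit code chosen per test.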

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/risk-scorer",
-  "version": "0.12.7-preview.583",
+  "version": "0.12.7-preview.591",
   "description": "Pipeline risk scoring, commit/push gates, and secret leak detection",
   "bin": {
     "windyroad-risk-scorer": "./bin/install.mjs"

package/skills/assess-external-comms/SKILL.md CHANGED

@@ -72,14 +72,14 @@ The orchestrator does NOT pre-compute the key — the hook derives it from the p
 ### 4. Delegate to wr-risk-scorer:external-comms
-Invoke the subagent via the `Skill` tool:
+Invoke the external-comms reviewer via the `Skill` tool. The `wr-risk-scorer:external-comms` SKILL is a thin wrapper around the external-comms agent (per ADR-015 — see `packages/risk-scorer/skills/external-comms/SKILL.md`):
 ```
-subagent_type: wr-risk-scorer:external-comms
+skill: wr-risk-scorer:external-comms
 prompt: <constructed review prompt from step 3>
 ```
-Wait for the subagent to complete. The subagent outputs a structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL). The `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
+Wait for the wrapper to return. The wrapper invokes the external-comms agent internally; the agent's structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL) flows back verbatim. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on the wrapper's inner Agent invocation, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
 **Do not write to `${TMPDIR:-/tmp}/claude-risk-*` yourself.** The hook is the only correct mechanism.

package/skills/assess-release/SKILL.md CHANGED

@@ -64,14 +64,14 @@ Build a self-contained prompt for the pipeline subagent that includes:
 ### 5. Delegate to wr-risk-scorer:pipeline
-Invoke the pipeline subagent via the `Skill` tool:
+Invoke the pipeline scorer via the `Skill` tool. The `wr-risk-scorer:pipeline` SKILL is a thin wrapper around the pipeline agent (per ADR-015 — see `packages/risk-scorer/skills/pipeline/SKILL.md`):
 ```
-subagent_type: wr-risk-scorer:pipeline
+skill: wr-risk-scorer:pipeline
 prompt: <constructed assessment prompt from step 4>
 ```
-Wait for the subagent to complete. The subagent will output a structured `RISK_SCORES:` block. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads that output and writes the bypass marker files automatically.
+Wait for the wrapper to return. The wrapper invokes the pipeline agent internally; the agent's structured `RISK_SCORES:` block flows back through the wrapper verbatim. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on the wrapper's inner Agent invocation and writes the bypass marker files automatically.
 **Do not write to `$TMPDIR/claude-risk-*` yourself.** The hook is the only correct mechanism.

package/skills/assess-release/test/assess-skills-delegate-via-skill-tool.bats ADDED

@@ -0,0 +1,162 @@
+#!/usr/bin/env bats
+# Contract guard: the on-demand assessment SKILLs (assess-release, assess-wip,
+# assess-external-comms) MUST delegate to their scoring agent via the Skill
+# tool — not via the Agent tool — matching ADR-015's Confirmation literal
+# phrasing ("the skill delegates to wr-risk-scorer:<agent> via the Skill
+# tool"). Closes the P205 contradiction surfaced by ADR-015 Confirmation
+# vs. SKILL.md prose mismatch.
+#
+# Structural assertions — Permitted Exception to the source-grep ban
+# (ADR-005 / P011), same framing as risk-scorer-register-hint.bats. SKILL.md
+# prose IS the contract document the orchestrator (Claude) consumes when
+# executing the SKILL; an LLM-output behavioural check is out of scope for
+# bats and is the responsibility of the promptfoo harness (ADR-075).
+#
+# What is asserted (contract, not implementation):
+#   1. Each assess-* SKILL's step 5 (release) / step 3 (wip) / step 4
+#      (external-comms) names `skill:` as the delegation tool parameter
+#      with the correct wrapper SKILL name as the target.
+#   2. None of the assess-* SKILLs name `subagent_type:` as the delegation
+#      tool parameter (the P205 contradiction class).
+#   3. Each wrapper SKILL (`pipeline`, `wip`, `external-comms`) exists at
+#      its expected path, is namespaced `wr-risk-scorer:<name>`, and
+#      delegates to its sibling agent via `subagent_type:`.
+#
+# Cross-reference:
+#   P205:    docs/problems/known-error/205-wr-risk-scorer-assess-release-skill-md-step-5-prose-says-skill-tool-but-provides-subagent-type.md
+#   ADR-015: docs/decisions/015-on-demand-assessment-skills.proposed.md (Confirmation criteria 189-193)
+#   ADR-052: docs/decisions/052-behavioural-tests-default.proposed.md (Permitted Exception)
+#   @jtbd JTBD-005 (invoke governance assessments on demand)
+#   @jtbd JTBD-101 (extend the suite — plugins expose corresponding skills)
+setup() {
+  SKILLS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/../.." && pwd)"
+  ASSESS_RELEASE="${SKILLS_DIR}/assess-release/SKILL.md"
+  ASSESS_WIP="${SKILLS_DIR}/assess-wip/SKILL.md"
+  ASSESS_EXTERNAL_COMMS="${SKILLS_DIR}/assess-external-comms/SKILL.md"
+  WRAPPER_PIPELINE="${SKILLS_DIR}/pipeline/SKILL.md"
+  WRAPPER_WIP="${SKILLS_DIR}/wip/SKILL.md"
+  WRAPPER_EXTERNAL_COMMS="${SKILLS_DIR}/external-comms/SKILL.md"
+}
+# ──────────────────────────────────────────────────────────────────────────────
+# Consumer SKILLs delegate via Skill tool (skill: parameter)
+# ──────────────────────────────────────────────────────────────────────────────
+@test "assess-release delegates via skill: wr-risk-scorer:pipeline" {
+  [ -f "$ASSESS_RELEASE" ]
+  run grep -E "^skill: wr-risk-scorer:pipeline$" "$ASSESS_RELEASE"
+  [ "$status" -eq 0 ]
+}
+@test "assess-release does NOT use subagent_type: in its delegation block" {
+  [ -f "$ASSESS_RELEASE" ]
+  # The P205 contradiction was: prose says "Skill tool" but provides
+  # subagent_type: wr-risk-scorer:pipeline. After the fix, no
+  # `subagent_type:` line may appear in the delegation block.
+  run grep -E "^subagent_type: wr-risk-scorer:pipeline$" "$ASSESS_RELEASE"
+  [ "$status" -ne 0 ]
+}
+@test "assess-wip delegates via skill: wr-risk-scorer:wip" {
+  [ -f "$ASSESS_WIP" ]
+  run grep -E "^skill: wr-risk-scorer:wip$" "$ASSESS_WIP"
+  [ "$status" -eq 0 ]
+}
+@test "assess-wip does NOT use subagent_type: in its delegation block" {
+  [ -f "$ASSESS_WIP" ]
+  run grep -E "^subagent_type: wr-risk-scorer:wip$" "$ASSESS_WIP"
+  [ "$status" -ne 0 ]
+}
+@test "assess-external-comms delegates via skill: wr-risk-scorer:external-comms" {
+  [ -f "$ASSESS_EXTERNAL_COMMS" ]
+  run grep -E "^skill: wr-risk-scorer:external-comms$" "$ASSESS_EXTERNAL_COMMS"
+  [ "$status" -eq 0 ]
+}
+@test "assess-external-comms does NOT use subagent_type: in its delegation block" {
+  [ -f "$ASSESS_EXTERNAL_COMMS" ]
+  run grep -E "^subagent_type: wr-risk-scorer:external-comms$" "$ASSESS_EXTERNAL_COMMS"
+  [ "$status" -ne 0 ]
+}
+# ──────────────────────────────────────────────────────────────────────────────
+# Wrapper SKILLs exist with correct names and delegate to the agent
+# ──────────────────────────────────────────────────────────────────────────────
+@test "wrapper SKILL packages/risk-scorer/skills/pipeline/SKILL.md exists" {
+  [ -f "$WRAPPER_PIPELINE" ]
+}
+@test "wrapper SKILL pipeline declares name: wr-risk-scorer:pipeline" {
+  [ -f "$WRAPPER_PIPELINE" ]
+  run grep -E "^name: wr-risk-scorer:pipeline$" "$WRAPPER_PIPELINE"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper SKILL pipeline delegates to the pipeline agent via subagent_type:" {
+  [ -f "$WRAPPER_PIPELINE" ]
+  run grep -E "^subagent_type: wr-risk-scorer:pipeline$" "$WRAPPER_PIPELINE"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper SKILL packages/risk-scorer/skills/wip/SKILL.md exists" {
+  [ -f "$WRAPPER_WIP" ]
+}
+@test "wrapper SKILL wip declares name: wr-risk-scorer:wip" {
+  [ -f "$WRAPPER_WIP" ]
+  run grep -E "^name: wr-risk-scorer:wip$" "$WRAPPER_WIP"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper SKILL wip delegates to the wip agent via subagent_type:" {
+  [ -f "$WRAPPER_WIP" ]
+  run grep -E "^subagent_type: wr-risk-scorer:wip$" "$WRAPPER_WIP"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper SKILL packages/risk-scorer/skills/external-comms/SKILL.md exists" {
+  [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+}
+@test "wrapper SKILL external-comms declares name: wr-risk-scorer:external-comms" {
+  [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+  run grep -E "^name: wr-risk-scorer:external-comms$" "$WRAPPER_EXTERNAL_COMMS"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper SKILL external-comms delegates to the external-comms agent via subagent_type:" {
+  [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+  run grep -E "^subagent_type: wr-risk-scorer:external-comms$" "$WRAPPER_EXTERNAL_COMMS"
+  [ "$status" -eq 0 ]
+}
+# ──────────────────────────────────────────────────────────────────────────────
+# Wrapper SKILLs disambiguate from end-user assess-* surfaces
+# ──────────────────────────────────────────────────────────────────────────────
+@test "wrapper pipeline description names assess-release as the end-user surface" {
+  [ -f "$WRAPPER_PIPELINE" ]
+  # JTBD-005 persona-fit: solo developer must not land on the raw wrapper
+  # and miss the assess-* gate-satisfaction wrap-up. Description must
+  # disambiguate.
+  run grep -E "assess-release" "$WRAPPER_PIPELINE"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper wip description names assess-wip as the end-user surface" {
+  [ -f "$WRAPPER_WIP" ]
+  run grep -E "assess-wip" "$WRAPPER_WIP"
+  [ "$status" -eq 0 ]
+}
+@test "wrapper external-comms description names assess-external-comms as the end-user surface" {
+  [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+  run grep -E "assess-external-comms" "$WRAPPER_EXTERNAL_COMMS"
+  [ "$status" -eq 0 ]
+}
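
Reviewer note: each consumer-SKILL assertion pair above (a `skill:` line must be present, the matching `subagent_type:` line must be absent) amounts to two anchored greps. The sketch below folds the pair into one function for illustration. `delegates_via_skill_tool` is a hypothetical name, and it assumes the delegation lines start at column 0, as the test patterns do.

```shell
# Hypothetical helper: succeeds only when FILE names `skill: NAME` on its own
# line and contains no `subagent_type: NAME` line.
delegates_via_skill_tool() {
  local file="$1" name="$2"
  # The wrapper skill must be the delegation target.
  grep -qE "^skill: ${name}\$" "$file" || return 1
  # The old Agent-tool parameter must not appear for the same target.
  if grep -qE "^subagent_type: ${name}\$" "$file"; then
    return 1
  fi
  return 0
}
```

The diff keeps these as separate bats cases, so a failure names the exact clause that regressed. The combined form trades that for brevity.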

package/skills/assess-wip/SKILL.md CHANGED

@@ -42,14 +42,14 @@ Build a self-contained prompt for the wip subagent that includes:
 ### 3. Delegate to wr-risk-scorer:wip
-Invoke the wip subagent via the `Skill` tool:
+Invoke the WIP scorer via the `Skill` tool. The `wr-risk-scorer:wip` SKILL is a thin wrapper around the wip agent (per ADR-015 — see `packages/risk-scorer/skills/wip/SKILL.md`):
 ```
-subagent_type: wr-risk-scorer:wip
+skill: wr-risk-scorer:wip
 prompt: <constructed assessment prompt from step 2>
 ```
-Wait for the subagent to complete.
+Wait for the wrapper to return. The wrapper invokes the wip agent internally and returns the agent's verdict verbatim.
 ### 4. Present results

package/skills/external-comms/SKILL.md ADDED

@@ -0,0 +1,37 @@
+---
+name: wr-risk-scorer:external-comms
+description: Invokable SKILL wrapper around the wr-risk-scorer:external-comms leak-review agent. Delegates to the agent via the Agent tool and returns the agent's structured EXTERNAL_COMMS_RISK_VERDICT. Internal-use plumbing used by `/wr-risk-scorer:assess-external-comms` per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-external-comms` instead.
+allowed-tools: Read, Glob, Grep, Bash, Agent
+---
+# External-Comms Leak Review Skill (Wrapper)
+This SKILL is an **invokable wrapper** around the `wr-risk-scorer:external-comms` agent. It exists so consumer SKILLs can invoke the leak reviewer via the **Skill tool** with `skill: wr-risk-scorer:external-comms` — matching ADR-015's Confirmation literal phrasing.
+**End users**: invoke `/wr-risk-scorer:assess-external-comms` instead. This wrapper is internal-use plumbing — calling it directly returns the raw verdict without the structured AskUserQuestion above-appetite handling (Rewrite / Move to private channel / Override / Cancel) that `/wr-risk-scorer:assess-external-comms` provides.
+## Contract
+- **Input** (`$ARGUMENTS`): a self-contained leak-review prompt structured per `packages/risk-scorer/agents/external-comms.md` § "What you receive":
+  - A leading `SURFACE: <name>` line (one of the canonical surface strings).
+  - The draft body wrapped verbatim inside `<draft>...</draft>` markers (the PostToolUse hook derives the marker key from this).
+  - The destination when known.
+- **Output**: the agent's verbatim verdict — `EXTERNAL_COMMS_RISK_VERDICT: PASS | FAIL` plus, on FAIL, an `EXTERNAL_COMMS_RISK_REASON:` block naming each Confidential Information class and the substrings that triggered it.
+- **Side effects**: the `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict and writes the `external-comms-gate.sh` marker on PASS. The wrapper itself writes no files.
+## Steps
+### 1. Pass-through to the external-comms agent
+Invoke the external-comms subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim. The `SURFACE:` line and `<draft>...</draft>` markers MUST be preserved exactly — the PostToolUse hook depends on the prompt structure for marker-key derivation:
+```
+subagent_type: wr-risk-scorer:external-comms
+prompt: $ARGUMENTS
+```
+### 2. Return the agent report verbatim
+Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the `EXTERNAL_COMMS_RISK_VERDICT:` or `EXTERNAL_COMMS_RISK_REASON:` blocks — the hook parses the verdict and consumer SKILLs surface the reason directly.
+$ARGUMENTS
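
Review note: the Contract above fixes the prompt shape (a leading `SURFACE:` line, the draft inside `<draft>...</draft>`, then the destination when known) and the verdict line the hook parses. The sketch below shows one way a caller might assemble that prompt and check the result. It is illustrative only: the function names are hypothetical, the `DESTINATION:` label is an assumption (the diff does not show how the destination is written), and the verdict check assumes the `EXTERNAL_COMMS_RISK_VERDICT:` line stands on its own line.

```shell
# Hypothetical helper: assemble a leak-review prompt in the order the
# Contract lists. The draft is placed between the markers with no added
# whitespace so that it stays verbatim.
# $1 = surface name, $2 = draft body, $3 = destination (optional).
build_leak_review_prompt() (
  surface=$1
  draft=$2
  destination=${3:-}
  printf 'SURFACE: %s\n' "$surface"
  printf '<draft>%s</draft>\n' "$draft"
  if [ -n "$destination" ]; then
    # Assumption: the real destination label is not shown in the diff.
    printf 'DESTINATION: %s\n' "$destination"
  fi
)

# Read an agent report on stdin; succeed only on an explicit PASS verdict.
# Anything else, including a missing verdict line, counts as not passed.
leak_review_passed() {
  grep -Eq '^EXTERNAL_COMMS_RISK_VERDICT:[ ]*PASS[ ]*$'
}
```

Treating a missing verdict as a failure keeps the check conservative: only an explicit PASS lets a caller proceed.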

package/skills/pipeline/SKILL.md ADDED

@@ -0,0 +1,34 @@
+---
+name: wr-risk-scorer:pipeline
+description: Invokable SKILL wrapper around the wr-risk-scorer:pipeline scoring agent. Delegates to the agent via the Agent tool and returns the agent's structured RISK_SCORES output. Internal-use plumbing used by `/wr-risk-scorer:assess-release` and any other consumer SKILL that needs Skill-tool-shaped invocation of the pipeline scorer per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-release` instead.
+allowed-tools: Read, Glob, Bash, Agent
+---
+# Pipeline Scoring Skill (Wrapper)
+This SKILL is an **invokable wrapper** around the `wr-risk-scorer:pipeline` agent. It exists so consumer SKILLs can invoke the pipeline scorer via the **Skill tool** with `skill: wr-risk-scorer:pipeline` — matching ADR-015's Confirmation literal phrasing.
+**End users**: invoke `/wr-risk-scorer:assess-release` instead. This wrapper is internal-use plumbing — calling it directly returns raw scoring output without the gate-satisfaction wrap-up, AskUserQuestion above-appetite handling, or release-context resolution that `/wr-risk-scorer:assess-release` provides.
+## Contract
+- **Input** (`$ARGUMENTS`): a self-contained scoring prompt with pipeline state context. Caller assembles UNCOMMITTED / UNPUSHED / UNRELEASED sections per `packages/risk-scorer/agents/pipeline.md` § Pipeline State.
+- **Output**: the agent's verbatim report, including the structured `RISK_SCORES: commit=N push=N release=N` block, optional `RISK_BYPASS:` line, optional `RISK_REMEDIATIONS:` block, optional `RISK_REGISTER_HINT:` block, and optional `CATALOG_HIT_RATE:` line.
+- **Side effects**: the `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads the agent's output downstream of this wrapper and writes the bypass marker files to `${TMPDIR}/claude-risk-${SESSION_ID}/`. The wrapper itself writes no files.
+## Steps
+### 1. Pass-through to the pipeline agent
+Invoke the pipeline subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim:
+```
+subagent_type: wr-risk-scorer:pipeline
+prompt: $ARGUMENTS
+```
+### 2. Return the agent report verbatim
+Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the structured output blocks (`RISK_SCORES:`, `RISK_BYPASS:`, `RISK_REMEDIATIONS:`, `RISK_REGISTER_HINT:`, `CATALOG_HIT_RATE:`). The PostToolUse hook depends on the exact byte sequence to parse.
+$ARGUMENTS
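
Review note: step 2 above requires the structured blocks to survive byte-for-byte because downstream parsing depends on them. As an illustration of why paraphrasing would break a consumer, here is a minimal sketch of a strict parser for the `RISK_SCORES: commit=N push=N release=N` line. It is not the package's `risk-score-mark.sh` logic; `parse_risk_scores` is a hypothetical name, and the sketch assumes each `N` is a non-negative integer and that the line stands alone in the report.

```shell
# Hypothetical helper: read an agent report on stdin and print the three
# scores as "commit push release" (space separated). Returns non-zero when
# no well-formed RISK_SCORES line is present.
parse_risk_scores() {
  sed -n 's/^RISK_SCORES: commit=\([0-9][0-9]*\) push=\([0-9][0-9]*\) release=\([0-9][0-9]*\)[ ]*$/\1 \2 \3/p' \
    | head -n 1 \
    | grep .   # fails when sed matched nothing, i.e. the output is empty
}
```

Any rewording of the line (reordered keys, extra punctuation, a non-numeric score) makes the function fail, which is the behaviour the verbatim-return rule guards against.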

package/skills/wip/SKILL.md ADDED

@@ -0,0 +1,33 @@
+---
+name: wr-risk-scorer:wip
+description: Invokable SKILL wrapper around the wr-risk-scorer:wip nudge agent. Delegates to the agent via the Agent tool and returns the agent's structured WIP risk verdict. Internal-use plumbing used by `/wr-risk-scorer:assess-wip` per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-wip` instead.
+allowed-tools: Read, Glob, Bash, Agent
+---
+# WIP Scoring Skill (Wrapper)
+This SKILL is an **invokable wrapper** around the `wr-risk-scorer:wip` agent. It exists so consumer SKILLs can invoke the WIP nudge scorer via the **Skill tool** with `skill: wr-risk-scorer:wip` — matching ADR-015's Confirmation literal phrasing.
+**End users**: invoke `/wr-risk-scorer:assess-wip` instead. This wrapper is internal-use plumbing — calling it directly returns raw nudge output without the present-results layer that `/wr-risk-scorer:assess-wip` provides.
+## Contract
+- **Input** (`$ARGUMENTS`): a self-contained WIP-scoring prompt — typically the edited file path(s) plus a `git diff HEAD --stat` summary per `packages/risk-scorer/agents/wip.md`.
+- **Output**: the agent's verbatim report, including the WIP Risk Assessment markdown table, the cumulative pipeline risk picture, and the structured `RISK_VERDICT: CONTINUE | PAUSE | COMMIT` line.
+## Steps
+### 1. Pass-through to the wip agent
+Invoke the wip subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim:
+```
+subagent_type: wr-risk-scorer:wip
+prompt: $ARGUMENTS
+```
+### 2. Return the agent report verbatim
+Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the `RISK_VERDICT:`, `RISK_REMEDIATIONS:`, or `RISK_COMMIT_REASON:` blocks — consumer SKILLs parse them directly.
+$ARGUMENTS