@windyroad/risk-scorer 0.12.7-preview.583 → 0.12.7-preview.591

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -79,6 +79,16 @@ The plugin includes six specialised agents:
  | `/wr-risk-scorer:bootstrap-catalog` | Bootstrap `docs/risks/` register from existing `.risk-reports/` corpus per ADR-059 — walks reports, dedupes by ADR-056 slug, emits one `R<NNN>-<slug>.active.md` per unique slug. Idempotent. Auto-triggers from `/install-updates` Step 6.5.1 when register is empty + `RISK-POLICY.md` present + `.risk-reports/` non-empty |
  | `/wr-risk-scorer:update-policy` | Generate or update `RISK-POLICY.md` |
 
+ ### Internal-use wrapper skills
+
+ These wrappers exist so consumer SKILLs can invoke the scoring agents via the Skill tool with `skill: wr-risk-scorer:<name>` per ADR-015's Confirmation literal phrasing. End users should invoke the `/wr-risk-scorer:assess-*` skills above; the wrappers are internal plumbing.
+
+ | Wrapper skill | Purpose |
+ |---------------|---------|
+ | `wr-risk-scorer:pipeline` | Skill-tool wrapper around the `wr-risk-scorer:pipeline` agent (consumer: `/wr-risk-scorer:assess-release`) |
+ | `wr-risk-scorer:wip` | Skill-tool wrapper around the `wr-risk-scorer:wip` agent (consumer: `/wr-risk-scorer:assess-wip`) |
+ | `wr-risk-scorer:external-comms` | Skill-tool wrapper around the `wr-risk-scorer:external-comms` agent (consumer: `/wr-risk-scorer:assess-external-comms`) |
+
  ## External-comms gate
 
  The `external-comms-gate.sh` hook intercepts outbound prose tool calls and the
@@ -45,6 +45,16 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run push:watch(\s|$)'; then
  if [ -f "${RDIR}/clean" ]; then
  exit 0
  fi
+ # CI-status precondition (P208): a within-appetite predicted-risk
+ # score is necessary but not sufficient — the lagging CI signal
+ # must also be green (or no-history-yet for the documented
+ # first-push case). Fail-closed on gh errors. Ordered AFTER the
+ # one-shot bypass markers and BEFORE the predicted-risk gate so
+ # incident workflows and clean-tree pushes are unaffected.
+ if ! check_ci_status "$SESSION_ID" "push"; then
+ risk_gate_deny "Push blocked: ${CI_GATE_REASON}"
+ exit 0
+ fi
  if ! check_risk_gate "$SESSION_ID" "push"; then
  if [ "$RISK_GATE_CATEGORY" = "threshold" ]; then
  risk_gate_deny "Push blocked: Push risk score ${RISK_GATE_SCORE}/25 (Medium or above). To proceed: (1) release first via \`npm run release:watch\`, (2) split the push, or (3) add risk-reducing measures. If risk-neutral or risk-reducing, delegate to wr-risk-scorer:pipeline (subagent_type: 'wr-risk-scorer:pipeline') — it will create a bypass marker."
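A minimal sketch of the push-path ordering this hunk establishes: bypass markers first, then the CI-status precondition, then the predicted-risk gate. `ci_ok`, `risk_ok`, `push_decision`, the `FAKE_CI` / `FAKE_RISK` variables and the appetite value of 4 are all hypothetical stand-ins, not the hook's real functions or thresholds. The point is only that a red CI run denies the push even when the risk score is within appetite, while a bypass marker short-circuits both.

```shell
#!/usr/bin/env bash
# Sketch of the gate ordering: bypass marker -> CI status -> risk score.
# ci_ok and risk_ok are hypothetical stand-ins for check_ci_status and
# check_risk_gate, driven by env vars so the ordering runs without gh.

ci_ok() {
  # "green" means the latest CI run lets the push through.
  [ "${FAKE_CI:-green}" = "green" ]
}

risk_ok() {
  # Hypothetical appetite: scores of 4 or below pass.
  [ "${FAKE_RISK:-1}" -le 4 ]
}

# Prints which gate decided the outcome for a push.
push_decision() {
  local rdir="$1"
  if [ -f "${rdir}/clean" ]; then
    echo "allow:clean"   # bypass markers are consulted first
    return 0
  fi
  if ! ci_ok; then
    echo "deny:ci"       # red CI denies before the score is even read
    return 0
  fi
  if ! risk_ok; then
    echo "deny:risk"
    return 0
  fi
  echo "allow"
}

rdir=$(mktemp -d)
FAKE_CI=red FAKE_RISK=1 push_decision "$rdir"   # prints deny:ci
```

Putting the CI check after the markers is what keeps clean-tree pushes and incident workflows unaffected by a red master, as the hunk's comment states.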
@@ -83,6 +93,8 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run release:watch(\s|$)'; the
  # Live-incident bypass: if an incident marker exists, allow release
  # regardless of risk score. Used when addressing outages, security
  # incidents, or information disclosure that requires immediate deployment.
+ # Per JTBD-201, this MUST short-circuit BEFORE the CI-status check
+ # so the hotfix path is unaffected by red CI on master.
  if [ -f "${RDIR}/incident-release" ]; then
  rm -f "${RDIR}/incident-release"
  exit 0
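The incident bypass is one instance of the one-shot marker pattern this gate reuses (`incident-release`, `reducing-release`, and the new `ci-bypass-<action>`): a marker is honoured once and then deleted. The sketch below shows that shape for the release path. `consume_marker` and `release_path` are hypothetical names, and the sketch deliberately departs from the hook's check-then-`rm -f` form: it relies on the exit status of a single `rm`, so if two gate invocations race, only one of them can consume the marker.

```shell
#!/usr/bin/env bash
# One-shot marker sketch (hypothetical helper names).

# consume_marker succeeds only if this call removed the marker. `rm` without
# -f fails when the file is absent, so test-and-delete is a single step.
consume_marker() {
  rm -- "$1" 2>/dev/null
}

# Release path up to the CI check: incident first (JTBD-201), then
# risk-reducing, then fall through to the CI-status precondition.
release_path() {
  local rdir="$1"
  if consume_marker "${rdir}/incident-release"; then
    echo "allow:incident"
    return 0
  fi
  if consume_marker "${rdir}/reducing-release"; then
    echo "allow:reducing"
    return 0
  fi
  echo "continue:ci-check"
}

rdir=$(mktemp -d)
: > "${rdir}/incident-release"
release_path "$rdir"   # prints allow:incident and deletes the marker
release_path "$rdir"   # prints continue:ci-check
```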
@@ -92,6 +104,12 @@ if echo "$COMMAND" | grep -qE '(^|;|&&|\|\|)\s*npm run release:watch(\s|$)'; the
  rm -f "${RDIR}/reducing-release"
  exit 0
  fi
+ # CI-status precondition (P208): a green CI run on the target
+ # branch is required before shipping. Fail-closed on gh errors.
+ if ! check_ci_status "$SESSION_ID" "release"; then
+ risk_gate_deny "Release blocked: ${CI_GATE_REASON}"
+ exit 0
+ fi
  if ! check_risk_gate "$SESSION_ID" "release"; then
  risk_gate_deny "Release blocked: ${RISK_GATE_REASON}"
  exit 0
@@ -153,6 +153,148 @@ print(('yes' if score > N else 'no') + ' ' + str(N))
  return 0
  }
 
+ # Check CI health for the current branch (P208).
+ #
+ # Returns 0 if push/release may proceed, 1 if denied. Sets CI_GATE_REASON
+ # on deny with a human-readable message that names the CI conclusion and
+ # the run URL. Sets CI_GATE_CATEGORY ∈ {bypass, no-history, allow, red,
+ # pending, gh-error}.
+ #
+ # Consults `gh run list --branch <current-branch> --limit 1 --json
+ # status,conclusion,databaseId,url` for the working branch's most recent
+ # CI run.
+ #
+ # Decision table:
+ # - bypass marker present (${RDIR}/ci-bypass-${ACTION}) → allow, consume
+ # - gh failure (auth / timeout / API error) → DENY (fail-CLOSED, per
+ # P208 safe-high-fix-risk classifier — a buggy harden must NOT
+ # degrade to allow)
+ # - empty result `[]` → allow (no CI history yet; first push triggers
+ # CI naturally)
+ # - status ∈ {queued, in_progress, pending, requested, waiting} → deny
+ # - conclusion ∈ {failure, cancelled, timed_out, action_required,
+ # startup_failure} → deny
+ # - conclusion ∈ {success, skipped, neutral} or unknown → allow
+ #
+ # Usage: check_ci_status "$SESSION_ID" "push" # or "release"
+ check_ci_status() {
+ local SESSION_ID="$1"
+ local ACTION="$2"
+ local RDIR
+ RDIR=$(_risk_dir "$SESSION_ID")
+ local BYPASS_MARKER="${RDIR}/ci-bypass-${ACTION}"
+
+ CI_GATE_REASON=""
+ CI_GATE_CATEGORY=""
+
+ # One-shot bypass marker — consumed on use, same family as
+ # reducing-push / incident-release. Documented override for the
+ # legitimate "first push triggers CI" edge case and infra incidents.
+ if [ -f "$BYPASS_MARKER" ]; then
+ rm -f "$BYPASS_MARKER"
+ CI_GATE_CATEGORY="bypass"
+ return 0
+ fi
+
+ # Resolve current branch. If we're not in a git repo or HEAD is
+ # detached, skip the CI check (the surrounding push/release gate
+ # would itself fail at the git layer with a clearer error).
+ local BRANCH
+ BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
+ if [ -z "$BRANCH" ] || [ "$BRANCH" = "HEAD" ]; then
+ CI_GATE_CATEGORY="allow"
+ return 0
+ fi
+
+ # Query GitHub. Bounded at 10s wall-clock so a network stall cannot
+ # hang push:watch indefinitely. `command -v timeout` because macOS
+ # default install does not ship GNU `timeout`.
+ local JSON GH_EXIT
+ if command -v timeout >/dev/null 2>&1; then
+ JSON=$(timeout 10s gh run list --branch "$BRANCH" --limit 1 \
+ --json status,conclusion,databaseId,url 2>/dev/null) || GH_EXIT=$?
+ else
+ JSON=$(gh run list --branch "$BRANCH" --limit 1 \
+ --json status,conclusion,databaseId,url 2>/dev/null) || GH_EXIT=$?
+ fi
+
+ if [ -n "${GH_EXIT:-}" ] && [ "$GH_EXIT" != "0" ]; then
+ CI_GATE_CATEGORY="gh-error"
+ CI_GATE_REASON="CI status check failed (gh exit ${GH_EXIT}: auth / timeout / API error). Fail-closed per P208 safe-high-fix-risk. Fix the underlying gh failure, or to override for a legitimate first-push-triggers-CI run, create the bypass marker: touch ${BYPASS_MARKER}"
+ return 1
+ fi
+
+ # Empty array = no CI history for this branch yet. Natural allow for
+ # the documented "first push triggers CI" case — no marker needed.
+ local TRIMMED
+ TRIMMED=$(printf '%s' "$JSON" | tr -d '[:space:]')
+ if [ -z "$TRIMMED" ] || [ "$TRIMMED" = "[]" ]; then
+ CI_GATE_CATEGORY="no-history"
+ return 0
+ fi
+
+ # Parse status, conclusion, url. Fail-closed on parse error.
+ local PARSED
+ PARSED=$(echo "$JSON" | python3 -c "
+ import sys, json
+ try:
+ runs = json.load(sys.stdin)
+ if not isinstance(runs, list) or not runs:
+ print('||')
+ sys.exit(0)
+ r = runs[0]
+ print('{}|{}|{}'.format(r.get('status') or '', r.get('conclusion') or '', r.get('url') or ''))
+ except Exception:
+ print('PARSE_ERROR||')
+ " 2>/dev/null || echo "PARSE_ERROR||")
+
+ local STATUS CONCLUSION URL
+ STATUS="${PARSED%%|*}"
+ local REST="${PARSED#*|}"
+ CONCLUSION="${REST%%|*}"
+ URL="${REST#*|}"
+
+ if [ "$STATUS" = "PARSE_ERROR" ]; then
+ CI_GATE_CATEGORY="gh-error"
+ CI_GATE_REASON="CI status check returned unparseable response. Fail-closed per P208 safe-high-fix-risk. To override for a legitimate first-push case, create the bypass marker: touch ${BYPASS_MARKER}"
+ return 1
+ fi
+
+ case "$STATUS" in
+ queued|in_progress|pending|requested|waiting)
+ CI_GATE_CATEGORY="pending"
+ CI_GATE_REASON="Latest CI run on branch '${BRANCH}' is still in flight (status: ${STATUS}). Wait for it to settle: ${URL}. To override, create the bypass marker: touch ${BYPASS_MARKER}"
+ return 1
+ ;;
+ completed)
+ case "$CONCLUSION" in
+ success|skipped|neutral|"")
+ CI_GATE_CATEGORY="allow"
+ return 0
+ ;;
+ failure|cancelled|timed_out|action_required|startup_failure)
+ CI_GATE_CATEGORY="red"
+ CI_GATE_REASON="Latest CI run on branch '${BRANCH}' concluded ${CONCLUSION}: ${URL}. Fix CI before pushing/releasing. To override for a legitimate first-push or infra-incident case, create the bypass marker: touch ${BYPASS_MARKER}"
+ return 1
+ ;;
+ *)
+ # Unknown conclusion — allow rather than block on a value we
+ # don't recognise. New GitHub conclusion values are infrequent.
+ CI_GATE_CATEGORY="allow"
+ return 0
+ ;;
+ esac
+ ;;
+ *)
+ # Unknown status — allow rather than block on a value we don't
+ # recognise. Conservative tilts toward the threshold check below
+ # catching the actual risk.
+ CI_GATE_CATEGORY="allow"
+ return 0
+ ;;
+ esac
+ }
+
  # Emit fail-closed deny JSON for PreToolUse hooks.
  risk_gate_deny() {
  local REASON="$1"
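`check_ci_status` folds several steps into one function: an empty-history check on the raw `gh` JSON, a split of the `status|conclusion|url` string emitted by the inline `python3` parser, and the decision table from its header comment. The sketch below isolates those pieces as pure functions so they can be exercised without `gh` or a git checkout. `is_empty_history` and `classify_ci_run` are hypothetical helpers, not part of the package; the words `classify_ci_run` prints are the `CI_GATE_CATEGORY` values the real function sets (`bypass` is left out because it depends on a marker file, not on the run).

```shell
#!/usr/bin/env bash
# Pure-function sketch of check_ci_status's parsing and decision table.

# True when gh returned no runs: empty output or "[]" after whitespace removal.
is_empty_history() {
  local trimmed
  trimmed=$(printf '%s' "$1" | tr -d '[:space:]')
  [ -z "$trimmed" ] || [ "$trimmed" = "[]" ]
}

# Takes "status|conclusion|url" and prints a CI_GATE_CATEGORY value.
# Returns 0 for allow and 1 for deny, mirroring check_ci_status.
classify_ci_run() {
  local parsed="$1"
  local status="${parsed%%|*}"      # text before the first |
  local rest="${parsed#*|}"         # text after the first |
  local conclusion="${rest%%|*}"    # text between the first and second |

  if [ "$status" = "PARSE_ERROR" ]; then
    echo "gh-error"; return 1        # fail closed on unparseable output
  fi
  case "$status" in
    queued|in_progress|pending|requested|waiting)
      echo "pending"; return 1 ;;
    completed)
      case "$conclusion" in
        failure|cancelled|timed_out|action_required|startup_failure)
          echo "red"; return 1 ;;
        *)
          # success, skipped, neutral, empty, or an unrecognised value
          echo "allow"; return 0 ;;
      esac ;;
    *)
      echo "allow"; return 0 ;;      # unknown status is allowed through
  esac
}

if ! classify_ci_run "completed|failure|https://github.com/x/y/actions/runs/2"; then
  echo "deny"                        # prints red, then deny
fi
```

Note the asymmetry this makes visible: a `gh` or parse failure denies, while an unrecognised status or conclusion allows and leaves the decision to the risk-threshold check that runs next.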
@@ -0,0 +1,240 @@
+ #!/usr/bin/env bats
+ # Tests for the CI-status precondition in the push/release gate.
+ #
+ # Closes P208 (known-error): git-push-gate.sh did not consult CI health
+ # before scoring push/release risk, so a push could land on a CI-red
+ # master and a release could ship broken code.
+ #
+ # Contract:
+ # - `check_ci_status` queries `gh run list --branch <current-branch>
+ # --limit 1 --json status,conclusion,databaseId,url` for the current
+ # branch and returns 0 (allow) / 1 (deny).
+ # - Deny on conclusion ∈ {failure, cancelled, timed_out, action_required,
+ # startup_failure}.
+ # - Deny on status ∈ {queued, in_progress, pending, requested, waiting}.
+ # - Allow on conclusion ∈ {success, skipped, neutral} or unknown.
+ # - Empty array (no CI history yet) → allow. Handles the documented
+ # "first push triggers CI" case naturally — no bypass marker required.
+ # - `gh` failure (auth/timeout/API error) → DENY (fail-closed per the
+ # safe-high-fix-risk classifier on P208).
+ # - `${RDIR}/ci-bypass-${ACTION}` one-shot bypass marker — consumed on
+ # use, same family as reducing-push / incident-release.
+ # - Integration: in git-push-gate.sh, the ordering is bypass-markers →
+ # CI status → risk gate. The `incident-release` bypass MUST short-
+ # circuit BEFORE the CI check fires (per JTBD-201 + ADR-018).
+
+ setup() {
+ HOOKS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+ source "$HOOKS_DIR/lib/gate-helpers.sh"
+ source "$HOOKS_DIR/lib/risk-gate.sh"
+
+ TEST_SESSION="bats-ci-gate-$$-${BATS_TEST_NUMBER}"
+ RDIR=$(_risk_dir "$TEST_SESSION")
+ rm -rf "$RDIR"
+ mkdir -p "$RDIR"
+
+ # Stand up a fake git repo so `git rev-parse --abbrev-ref HEAD` resolves.
+ TEST_REPO="$(mktemp -d)"
+ ( cd "$TEST_REPO" && git init -q -b main && \
+ git -c user.email=t@e -c user.name=t commit --allow-empty -q -m "init" )
+
+ # Stub `gh` on PATH. The stub reads $FAKE_GH_OUTPUT and $FAKE_GH_EXIT
+ # for behaviour. PATH ordering: stub dir first.
+ STUB_DIR="$(mktemp -d)"
+ cat > "$STUB_DIR/gh" <<'STUB'
+ #!/bin/bash
+ if [ -n "${FAKE_GH_DELAY:-}" ]; then sleep "$FAKE_GH_DELAY"; fi
+ if [ -n "${FAKE_GH_OUTPUT:-}" ]; then
+ printf '%s' "$FAKE_GH_OUTPUT"
+ fi
+ exit "${FAKE_GH_EXIT:-0}"
+ STUB
+ chmod +x "$STUB_DIR/gh"
+ # `timeout` may not exist on the path on some macOS setups — stub a
+ # passthrough for portability. Tests inject FAKE_GH_DELAY only when
+ # they specifically test timeout behaviour.
+ ORIG_PATH="$PATH"
+ export PATH="$STUB_DIR:$PATH"
+ export TEST_REPO STUB_DIR
+ }
+
+ teardown() {
+ rm -rf "$RDIR" "$TEST_REPO" "$STUB_DIR"
+ export PATH="$ORIG_PATH"
+ unset FAKE_GH_OUTPUT FAKE_GH_EXIT FAKE_GH_DELAY CI_GATE_REASON CI_GATE_CATEGORY 2>/dev/null || true
+ }
+
+ # Run check_ci_status inside the fake repo so branch resolution works.
+ _run_check() {
+ local action="$1"
+ CI_GATE_REASON=""
+ CI_GATE_CATEGORY=""
+ ( cd "$TEST_REPO" && \
+ FAKE_GH_OUTPUT="${FAKE_GH_OUTPUT:-}" FAKE_GH_EXIT="${FAKE_GH_EXIT:-0}" \
+ bash -c "source '$HOOKS_DIR/lib/gate-helpers.sh'; source '$HOOKS_DIR/lib/risk-gate.sh'; \
+ if check_ci_status '$TEST_SESSION' '$action'; then echo ALLOW; \
+ else echo \"DENY: \$CI_GATE_REASON\"; fi" )
+ }
+
+ @test "check_ci_status allows when latest CI run concluded success" {
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"success","databaseId":1,"url":"https://github.com/x/y/actions/runs/1"}]'
+ result=$(_run_check "push")
+ [[ "$result" == "ALLOW" ]]
+ }
+
+ @test "check_ci_status denies when latest CI run concluded failure (names conclusion + URL)" {
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":2,"url":"https://github.com/x/y/actions/runs/2"}]'
+ result=$(_run_check "push")
+ [[ "$result" == DENY:* ]]
+ [[ "$result" == *"failure"* ]]
+ [[ "$result" == *"https://github.com/x/y/actions/runs/2"* ]]
+ }
+
+ @test "check_ci_status denies when latest CI run concluded cancelled" {
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"cancelled","databaseId":3,"url":"https://github.com/x/y/actions/runs/3"}]'
+ result=$(_run_check "release")
+ [[ "$result" == DENY:* ]]
+ [[ "$result" == *"cancelled"* ]]
+ }
+
+ @test "check_ci_status denies when latest CI run concluded timed_out" {
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"timed_out","databaseId":4,"url":"https://github.com/x/y/actions/runs/4"}]'
+ result=$(_run_check "push")
+ [[ "$result" == DENY:* ]]
+ [[ "$result" == *"timed_out"* ]]
+ }
+
+ @test "check_ci_status denies when latest CI run status is in_progress" {
+ export FAKE_GH_OUTPUT='[{"status":"in_progress","conclusion":null,"databaseId":5,"url":"https://github.com/x/y/actions/runs/5"}]'
+ result=$(_run_check "push")
+ [[ "$result" == DENY:* ]]
+ [[ "$result" == *"in_progress"* ]]
+ }
+
+ @test "check_ci_status denies when latest CI run status is queued" {
+ export FAKE_GH_OUTPUT='[{"status":"queued","conclusion":null,"databaseId":6,"url":"https://github.com/x/y/actions/runs/6"}]'
+ result=$(_run_check "release")
+ [[ "$result" == DENY:* ]]
+ [[ "$result" == *"queued"* ]]
+ }
+
+ @test "check_ci_status allows when CI history is empty (first push triggers CI)" {
+ export FAKE_GH_OUTPUT='[]'
+ result=$(_run_check "push")
+ [[ "$result" == "ALLOW" ]]
+ }
+
+ @test "check_ci_status denies when gh exits non-zero (fail-closed, safe-high-fix-risk)" {
+ export FAKE_GH_OUTPUT=''
+ export FAKE_GH_EXIT=1
+ result=$(_run_check "push")
+ [[ "$result" == DENY:* ]]
+ # Must point at the ci-bypass marker for the documented override path.
+ [[ "$result" == *"ci-bypass-push"* ]]
+ }
+
+ @test "check_ci_status allows when ci-bypass marker is present and consumes it" {
+ : > "$RDIR/ci-bypass-push"
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":7,"url":"https://github.com/x/y/actions/runs/7"}]'
+ result=$(_run_check "push")
+ [[ "$result" == "ALLOW" ]]
+ # Bypass markers are one-shot — same family as reducing-push / incident-release.
+ [ ! -f "$RDIR/ci-bypass-push" ]
+ }
+
+ @test "check_ci_status bypass marker is action-scoped (push marker does not bypass release)" {
+ : > "$RDIR/ci-bypass-push"
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":8,"url":"https://github.com/x/y/actions/runs/8"}]'
+ result=$(_run_check "release")
+ [[ "$result" == DENY:* ]]
+ # push bypass must not have been consumed by a release check
+ [ -f "$RDIR/ci-bypass-push" ]
+ }
+
+ @test "check_ci_status allows when conclusion is skipped" {
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"skipped","databaseId":9,"url":"https://github.com/x/y/actions/runs/9"}]'
+ result=$(_run_check "push")
+ [[ "$result" == "ALLOW" ]]
+ }
+
+ @test "check_ci_status allows when conclusion is neutral" {
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"neutral","databaseId":10,"url":"https://github.com/x/y/actions/runs/10"}]'
+ result=$(_run_check "release")
+ [[ "$result" == "ALLOW" ]]
+ }
+
+ # ---------------------------------------------------------------------------
+ # Integration: git-push-gate.sh ordering — bypass-markers → CI status → risk
+ # gate. JTBD-201 demands the incident-release bypass MUST short-circuit
+ # BEFORE the new CI-status check fires.
+ # ---------------------------------------------------------------------------
+
+ # Helper: build a PreToolUse Bash input with a given command
+ _build_input() {
+ local cmd="$1"
+ cat <<JSON
+ {
+ "session_id": "$TEST_SESSION",
+ "tool_name": "Bash",
+ "tool_input": {
+ "command": "$cmd"
+ }
+ }
+ JSON
+ }
+
+ @test "git-push-gate.sh denies push:watch when CI is red even if risk score is within appetite" {
+ # Within-appetite risk score
+ echo "1" > "$RDIR/push"
+ # Disable drift check (no stored hash file)
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":11,"url":"https://github.com/x/y/actions/runs/11"}]'
+
+ INPUT=$(_build_input "npm run push:watch")
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+ "$HOOKS_DIR/git-push-gate.sh" )
+ [[ "$output" == *"permissionDecision"* ]]
+ [[ "$output" == *"deny"* ]]
+ [[ "$output" == *"failure"* ]]
+ }
+
+ @test "git-push-gate.sh denies release:watch when CI is red even if risk score is within appetite" {
+ echo "1" > "$RDIR/release"
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":12,"url":"https://github.com/x/y/actions/runs/12"}]'
+
+ INPUT=$(_build_input "npm run release:watch")
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+ "$HOOKS_DIR/git-push-gate.sh" )
+ [[ "$output" == *"permissionDecision"* ]]
+ [[ "$output" == *"deny"* ]]
+ [[ "$output" == *"failure"* ]]
+ }
+
+ @test "git-push-gate.sh allows release:watch when incident-release bypass is set, even if CI is red (JTBD-201)" {
+ echo "9" > "$RDIR/release"
+ : > "$RDIR/incident-release"
+ # Even with a red CI conclusion, the incident bypass must short-circuit
+ # both the CI check and the risk threshold.
+ export FAKE_GH_OUTPUT='[{"status":"completed","conclusion":"failure","databaseId":13,"url":"https://github.com/x/y/actions/runs/13"}]'
+
+ INPUT=$(_build_input "npm run release:watch")
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+ "$HOOKS_DIR/git-push-gate.sh" )
+ # No permissionDecision means allow (exit 0 with no JSON).
+ [[ "$output" != *"permissionDecision"* ]]
+ # incident-release marker is one-shot — must be consumed
+ [ ! -f "$RDIR/incident-release" ]
+ }
+
+ @test "git-push-gate.sh allows push:watch when CI history is empty (first push)" {
+ echo "1" > "$RDIR/push"
+ export FAKE_GH_OUTPUT='[]'
+
+ INPUT=$(_build_input "npm run push:watch")
+ output=$( cd "$TEST_REPO" && echo "$INPUT" | \
+ FAKE_GH_OUTPUT="$FAKE_GH_OUTPUT" PATH="$STUB_DIR:$PATH" \
+ "$HOOKS_DIR/git-push-gate.sh" )
+ [[ "$output" != *"permissionDecision"* ]]
+ }
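These tests avoid the network by putting a fake `gh` first on `PATH` and steering it with environment variables, so the hook code under test runs unmodified. The same technique, reduced to a standalone helper, looks roughly like this. `make_gh_stub` is a hypothetical name; the stub body copies the `FAKE_GH_OUTPUT` / `FAKE_GH_EXIT` contract from `setup()` and drops the `FAKE_GH_DELAY` sleep.

```shell
#!/usr/bin/env bash
# Write a fake `gh` into a fresh temp dir and print that dir.
make_gh_stub() {
  local dir
  dir=$(mktemp -d)
  cat > "$dir/gh" <<'STUB'
#!/bin/bash
# Echo canned JSON (if any) and exit with the requested status.
if [ -n "${FAKE_GH_OUTPUT:-}" ]; then printf '%s' "$FAKE_GH_OUTPUT"; fi
exit "${FAKE_GH_EXIT:-0}"
STUB
  chmod +x "$dir/gh"
  printf '%s\n' "$dir"
}

stub_dir=$(make_gh_stub)
export PATH="$stub_dir:$PATH"   # the stub now shadows any real gh
FAKE_GH_OUTPUT='[]' gh run list --branch main --limit 1   # prints [] with no trailing newline
```

The integration tests above additionally pass `PATH` explicitly when invoking `git-push-gate.sh`, so the hook's child process resolves the same stub.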
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@windyroad/risk-scorer",
- "version": "0.12.7-preview.583",
+ "version": "0.12.7-preview.591",
  "description": "Pipeline risk scoring, commit/push gates, and secret leak detection",
  "bin": {
  "windyroad-risk-scorer": "./bin/install.mjs"
@@ -72,14 +72,14 @@ The orchestrator does NOT pre-compute the key — the hook derives it from the p
 
  ### 4. Delegate to wr-risk-scorer:external-comms
 
- Invoke the subagent via the `Skill` tool:
+ Invoke the external-comms reviewer via the `Skill` tool. The `wr-risk-scorer:external-comms` SKILL is a thin wrapper around the external-comms agent (per ADR-015 — see `packages/risk-scorer/skills/external-comms/SKILL.md`):
 
  ```
- subagent_type: wr-risk-scorer:external-comms
+ skill: wr-risk-scorer:external-comms
  prompt: <constructed review prompt from step 3>
  ```
 
- Wait for the subagent to complete. The subagent outputs a structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL). The `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
+ Wait for the wrapper to return. The wrapper invokes the external-comms agent internally; the agent's structured verdict block (`EXTERNAL_COMMS_RISK_VERDICT: PASS|FAIL` + optional `EXTERNAL_COMMS_RISK_REASON: ...` on FAIL) flows back verbatim. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on the wrapper's inner Agent invocation, derives the marker key from the prompt's `SURFACE:` + `<draft>` structure, and writes the marker automatically on PASS.
 
  **Do not write to `${TMPDIR:-/tmp}/claude-risk-*` yourself.** The hook is the only correct mechanism.
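The verdict block is plain text, so a consumer that acts on it (as `risk-score-mark.sh` does when it writes the marker on PASS) only has to locate the verdict line. The sketch below is illustrative and is not the hook's actual parser: `extract_verdict` is a hypothetical helper, and it assumes the verdict sits on a line of its own in exactly the `EXTERNAL_COMMS_RISK_VERDICT: PASS` or `FAIL` form shown above. Anything else yields `NONE`, which a caller should treat as "no marker", so a malformed reply fails closed.

```shell
#!/usr/bin/env bash
# Print PASS, FAIL, or NONE for agent output read on stdin.
extract_verdict() {
  local line
  # Keep only well-formed verdict lines; take the last one if several appear.
  line=$(grep -E '^EXTERNAL_COMMS_RISK_VERDICT:[[:space:]]*(PASS|FAIL)[[:space:]]*$' | tail -n 1) || true
  case "$line" in
    *PASS*) echo "PASS" ;;
    *FAIL*) echo "FAIL" ;;
    *)      echo "NONE" ;;
  esac
}

# Sample reply (hypothetical wording).
reply=$'Reviewed the draft.\nEXTERNAL_COMMS_RISK_VERDICT: FAIL\nEXTERNAL_COMMS_RISK_REASON: names an internal hostname'
if [ "$(printf '%s\n' "$reply" | extract_verdict)" = "PASS" ]; then
  echo "write marker"
else
  echo "no marker"   # this branch runs for the FAIL reply above
fi
```

Because the pattern requires the bare word `PASS` or `FAIL` right after the colon, the literal template `PASS|FAIL` does not count as a verdict.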
 
@@ -64,14 +64,14 @@ Build a self-contained prompt for the pipeline subagent that includes:
 
  ### 5. Delegate to wr-risk-scorer:pipeline
 
- Invoke the pipeline subagent via the `Skill` tool:
+ Invoke the pipeline scorer via the `Skill` tool. The `wr-risk-scorer:pipeline` SKILL is a thin wrapper around the pipeline agent (per ADR-015 — see `packages/risk-scorer/skills/pipeline/SKILL.md`):
 
  ```
- subagent_type: wr-risk-scorer:pipeline
+ skill: wr-risk-scorer:pipeline
  prompt: <constructed assessment prompt from step 4>
  ```
 
- Wait for the subagent to complete. The subagent will output a structured `RISK_SCORES:` block. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads that output and writes the bypass marker files automatically.
+ Wait for the wrapper to return. The wrapper invokes the pipeline agent internally; the agent's structured `RISK_SCORES:` block flows back through the wrapper verbatim. The `PostToolUse:Agent` hook (`risk-score-mark.sh`) fires on the wrapper's inner Agent invocation and writes the bypass marker files automatically.
 
  **Do not write to `$TMPDIR/claude-risk-*` yourself.** The hook is the only correct mechanism.
 
@@ -0,0 +1,162 @@
+ #!/usr/bin/env bats
+ # Contract guard: the on-demand assessment SKILLs (assess-release, assess-wip,
+ # assess-external-comms) MUST delegate to their scoring agent via the Skill
+ # tool — not via the Agent tool — matching ADR-015's Confirmation literal
+ # phrasing ("the skill delegates to wr-risk-scorer:<agent> via the Skill
+ # tool"). Closes the P205 contradiction surfaced by ADR-015 Confirmation
+ # vs. SKILL.md prose mismatch.
+ #
+ # Structural assertions — Permitted Exception to the source-grep ban
+ # (ADR-005 / P011), same framing as risk-scorer-register-hint.bats. SKILL.md
+ # prose IS the contract document the orchestrator (Claude) consumes when
+ # executing the SKILL; an LLM-output behavioural check is out of scope for
+ # bats and is the responsibility of the promptfoo harness (ADR-075).
+ #
+ # What is asserted (contract, not implementation):
+ # 1. Each assess-* SKILL's step 5 (release) / step 3 (wip) / step 4
+ # (external-comms) names `skill:` as the delegation tool parameter
+ # with the correct wrapper SKILL name as the target.
+ # 2. None of the assess-* SKILLs name `subagent_type:` as the delegation
+ # tool parameter (the P205 contradiction class).
+ # 3. Each wrapper SKILL (`pipeline`, `wip`, `external-comms`) exists at
+ # its expected path, is namespaced `wr-risk-scorer:<name>`, and
+ # delegates to its sibling agent via `subagent_type:`.
+ #
+ # Cross-reference:
+ # P205: docs/problems/known-error/205-wr-risk-scorer-assess-release-skill-md-step-5-prose-says-skill-tool-but-provides-subagent-type.md
+ # ADR-015: docs/decisions/015-on-demand-assessment-skills.proposed.md (Confirmation criteria 189-193)
+ # ADR-052: docs/decisions/052-behavioural-tests-default.proposed.md (Permitted Exception)
+ # @jtbd JTBD-005 (invoke governance assessments on demand)
+ # @jtbd JTBD-101 (extend the suite — plugins expose corresponding skills)
+
+ setup() {
+ SKILLS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/../.." && pwd)"
+
+ ASSESS_RELEASE="${SKILLS_DIR}/assess-release/SKILL.md"
+ ASSESS_WIP="${SKILLS_DIR}/assess-wip/SKILL.md"
+ ASSESS_EXTERNAL_COMMS="${SKILLS_DIR}/assess-external-comms/SKILL.md"
+
+ WRAPPER_PIPELINE="${SKILLS_DIR}/pipeline/SKILL.md"
+ WRAPPER_WIP="${SKILLS_DIR}/wip/SKILL.md"
+ WRAPPER_EXTERNAL_COMMS="${SKILLS_DIR}/external-comms/SKILL.md"
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Consumer SKILLs delegate via Skill tool (skill: parameter)
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "assess-release delegates via skill: wr-risk-scorer:pipeline" {
+ [ -f "$ASSESS_RELEASE" ]
+ run grep -E "^skill: wr-risk-scorer:pipeline$" "$ASSESS_RELEASE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "assess-release does NOT use subagent_type: in its delegation block" {
+ [ -f "$ASSESS_RELEASE" ]
+ # The P205 contradiction was: prose says "Skill tool" but provides
+ # subagent_type: wr-risk-scorer:pipeline. After the fix, no
+ # `subagent_type:` line may appear in the delegation block.
+ run grep -E "^subagent_type: wr-risk-scorer:pipeline$" "$ASSESS_RELEASE"
+ [ "$status" -ne 0 ]
+ }
+
+ @test "assess-wip delegates via skill: wr-risk-scorer:wip" {
+ [ -f "$ASSESS_WIP" ]
+ run grep -E "^skill: wr-risk-scorer:wip$" "$ASSESS_WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "assess-wip does NOT use subagent_type: in its delegation block" {
+ [ -f "$ASSESS_WIP" ]
+ run grep -E "^subagent_type: wr-risk-scorer:wip$" "$ASSESS_WIP"
+ [ "$status" -ne 0 ]
+ }
+
+ @test "assess-external-comms delegates via skill: wr-risk-scorer:external-comms" {
+ [ -f "$ASSESS_EXTERNAL_COMMS" ]
+ run grep -E "^skill: wr-risk-scorer:external-comms$" "$ASSESS_EXTERNAL_COMMS"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "assess-external-comms does NOT use subagent_type: in its delegation block" {
+ [ -f "$ASSESS_EXTERNAL_COMMS" ]
+ run grep -E "^subagent_type: wr-risk-scorer:external-comms$" "$ASSESS_EXTERNAL_COMMS"
+ [ "$status" -ne 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Wrapper SKILLs exist with correct names and delegate to the agent
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "wrapper SKILL packages/risk-scorer/skills/pipeline/SKILL.md exists" {
+ [ -f "$WRAPPER_PIPELINE" ]
+ }
+
+ @test "wrapper SKILL pipeline declares name: wr-risk-scorer:pipeline" {
+ [ -f "$WRAPPER_PIPELINE" ]
+ run grep -E "^name: wr-risk-scorer:pipeline$" "$WRAPPER_PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper SKILL pipeline delegates to the pipeline agent via subagent_type:" {
+ [ -f "$WRAPPER_PIPELINE" ]
+ run grep -E "^subagent_type: wr-risk-scorer:pipeline$" "$WRAPPER_PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper SKILL packages/risk-scorer/skills/wip/SKILL.md exists" {
+ [ -f "$WRAPPER_WIP" ]
+ }
+
+ @test "wrapper SKILL wip declares name: wr-risk-scorer:wip" {
+ [ -f "$WRAPPER_WIP" ]
+ run grep -E "^name: wr-risk-scorer:wip$" "$WRAPPER_WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper SKILL wip delegates to the wip agent via subagent_type:" {
+ [ -f "$WRAPPER_WIP" ]
+ run grep -E "^subagent_type: wr-risk-scorer:wip$" "$WRAPPER_WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper SKILL packages/risk-scorer/skills/external-comms/SKILL.md exists" {
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+ }
+
+ @test "wrapper SKILL external-comms declares name: wr-risk-scorer:external-comms" {
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+ run grep -E "^name: wr-risk-scorer:external-comms$" "$WRAPPER_EXTERNAL_COMMS"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper SKILL external-comms delegates to the external-comms agent via subagent_type:" {
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+ run grep -E "^subagent_type: wr-risk-scorer:external-comms$" "$WRAPPER_EXTERNAL_COMMS"
+ [ "$status" -eq 0 ]
+ }
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Wrapper SKILLs disambiguate from end-user assess-* surfaces
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ @test "wrapper pipeline description names assess-release as the end-user surface" {
+ [ -f "$WRAPPER_PIPELINE" ]
+ # JTBD-005 persona-fit: solo developer must not land on the raw wrapper
+ # and miss the assess-* gate-satisfaction wrap-up. Description must
+ # disambiguate.
+ run grep -E "assess-release" "$WRAPPER_PIPELINE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper wip description names assess-wip as the end-user surface" {
+ [ -f "$WRAPPER_WIP" ]
+ run grep -E "assess-wip" "$WRAPPER_WIP"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "wrapper external-comms description names assess-external-comms as the end-user surface" {
+ [ -f "$WRAPPER_EXTERNAL_COMMS" ]
+ run grep -E "assess-external-comms" "$WRAPPER_EXTERNAL_COMMS"
+ [ "$status" -eq 0 ]
+ }
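Each consumer SKILL gets the same pair of assertions: the `skill:` line must be present and the matching `subagent_type:` line must be absent. If more assess-* SKILLs are added, the pair could be folded into one helper along these lines. `check_delegation` is a hypothetical name; like the tests, it is a structural grep over SKILL.md prose and says nothing about what the orchestrator actually does at run time.

```shell
#!/usr/bin/env bash
# Succeed only if FILE names `skill: TARGET` on its own line and never
# names `subagent_type: TARGET` on its own line (the P205 contradiction).
check_delegation() {
  local file="$1" target="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  if ! grep -qE "^skill: ${target}\$" "$file"; then
    echo "no 'skill: ${target}' line in $file" >&2
    return 1
  fi
  if grep -qE "^subagent_type: ${target}\$" "$file"; then
    echo "'subagent_type: ${target}' still present in $file" >&2
    return 1
  fi
}
```

For example, `check_delegation packages/risk-scorer/skills/assess-release/SKILL.md wr-risk-scorer:pipeline` would cover both assess-release tests. The target is interpolated into an extended regex; the wrapper names used here contain only letters, `-` and `:`, which ERE treats literally.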
@@ -42,14 +42,14 @@ Build a self-contained prompt for the wip subagent that includes:
42
42
 
43
43
  ### 3. Delegate to wr-risk-scorer:wip
44
44
 
45
- Invoke the wip subagent via the `Skill` tool:
45
+ Invoke the WIP scorer via the `Skill` tool. The `wr-risk-scorer:wip` SKILL is a thin wrapper around the wip agent (per ADR-015 — see `packages/risk-scorer/skills/wip/SKILL.md`):
46
46
 
47
47
  ```
48
- subagent_type: wr-risk-scorer:wip
48
+ skill: wr-risk-scorer:wip
49
49
  prompt: <constructed assessment prompt from step 2>
50
50
  ```
51
51
 
52
- Wait for the subagent to complete.
52
+ Wait for the wrapper to return. The wrapper invokes the wip agent internally and returns the agent's full report, including the `RISK_VERDICT:` line, verbatim.
53
53
 
54
54
  ### 4. Present results
55
55
 
@@ -0,0 +1,37 @@
1
+ ---
2
+ name: wr-risk-scorer:external-comms
3
+ description: Invokable SKILL wrapper around the wr-risk-scorer:external-comms leak-review agent. Delegates to the agent via the Agent tool and returns the agent's structured EXTERNAL_COMMS_RISK_VERDICT. Internal-use plumbing used by `/wr-risk-scorer:assess-external-comms` per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-external-comms` instead.
4
+ allowed-tools: Read, Glob, Grep, Bash, Agent
5
+ ---
6
+
7
+ # External-Comms Leak Review Skill (Wrapper)
8
+
9
+ This SKILL is an **invokable wrapper** around the `wr-risk-scorer:external-comms` agent. It exists so consumer SKILLs can invoke the leak reviewer via the **Skill tool** with `skill: wr-risk-scorer:external-comms` — matching ADR-015's Confirmation literal phrasing.
10
+
11
+ **End users**: invoke `/wr-risk-scorer:assess-external-comms` instead. This wrapper is internal-use plumbing — calling it directly returns the raw verdict without the structured AskUserQuestion above-appetite handling (Rewrite / Move to private channel / Override / Cancel) that `/wr-risk-scorer:assess-external-comms` provides.
12
+
13
+ ## Contract
14
+
15
+ - **Input** (`$ARGUMENTS`): a self-contained leak-review prompt structured per `packages/risk-scorer/agents/external-comms.md` § "What you receive":
16
+ - A leading `SURFACE: <name>` line (one of the canonical surface strings).
17
+ - The draft body wrapped verbatim inside `<draft>...</draft>` markers (the PostToolUse hook derives the marker key from this).
18
+ - The destination when known.
19
+ - **Output**: the agent's verbatim verdict — `EXTERNAL_COMMS_RISK_VERDICT: PASS | FAIL` plus, on FAIL, an `EXTERNAL_COMMS_RISK_REASON:` block naming each Confidential Information class and the substrings that triggered it.
20
+ - **Side effects**: the `PostToolUse:Agent` hook (`risk-score-mark.sh`) parses the verdict and writes the `external-comms-gate.sh` marker on PASS. The wrapper itself writes no files.
21
+
22
+ ## Steps
23
+
24
+ ### 1. Pass-through to the external-comms agent
25
+
26
+ Invoke the external-comms subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim. The `SURFACE:` line and `<draft>...</draft>` markers MUST be preserved exactly — the PostToolUse hook depends on the prompt structure for marker-key derivation:
27
+
28
+ ```
29
+ subagent_type: wr-risk-scorer:external-comms
30
+ prompt: $ARGUMENTS
31
+ ```
32
+
33
+ ### 2. Return the agent report verbatim
34
+
35
+ Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the `EXTERNAL_COMMS_RISK_VERDICT:` or `EXTERNAL_COMMS_RISK_REASON:` blocks — the hook parses the verdict and consumer SKILLs surface the reason directly.
36
+
37
+ $ARGUMENTS
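The external-comms Contract says the `PostToolUse:Agent` hook writes the `external-comms-gate.sh` marker only on PASS. To illustrate what reading that verdict can look like from a consumer's side, here is a minimal bash sketch. The function name `external_comms_passed` is hypothetical, and this is not the code in `risk-score-mark.sh`. It assumes the verdict appears on its own line exactly as `EXTERNAL_COMMS_RISK_VERDICT: PASS` or `EXTERNAL_COMMS_RISK_VERDICT: FAIL`.

```bash
# Hypothetical helper: NOT shipped by @windyroad/risk-scorer and NOT the actual
# parsing logic of risk-score-mark.sh. Reads an external-comms report on stdin
# and succeeds only when PASS is the single distinct verdict value present.
external_comms_passed() {
  local verdicts
  # Collect every verdict value, de-duplicated, so a contradictory report
  # (PASS and FAIL both present) does not count as a pass.
  verdicts=$(grep -E '^EXTERNAL_COMMS_RISK_VERDICT: ' \
    | sed -E 's/^EXTERNAL_COMMS_RISK_VERDICT: //' \
    | sort -u)
  # Missing, malformed, FAIL, or mixed verdicts all fall through to failure.
  [ "$verdicts" = "PASS" ]
}
```

The sketch fails closed, which matches the Contract's "marker on PASS" rule. How the real hook treats malformed output is not specified in this file.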
@@ -0,0 +1,34 @@
1
+ ---
2
+ name: wr-risk-scorer:pipeline
3
+ description: Invokable SKILL wrapper around the wr-risk-scorer:pipeline scoring agent. Delegates to the agent via the Agent tool and returns the agent's structured RISK_SCORES output. Internal-use plumbing used by `/wr-risk-scorer:assess-release` and any other consumer SKILL that needs Skill-tool-shaped invocation of the pipeline scorer per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-release` instead.
4
+ allowed-tools: Read, Glob, Bash, Agent
5
+ ---
6
+
7
+ # Pipeline Scoring Skill (Wrapper)
8
+
9
+ This SKILL is an **invokable wrapper** around the `wr-risk-scorer:pipeline` agent. It exists so consumer SKILLs can invoke the pipeline scorer via the **Skill tool** with `skill: wr-risk-scorer:pipeline` — matching ADR-015's Confirmation literal phrasing.
10
+
11
+ **End users**: invoke `/wr-risk-scorer:assess-release` instead. This wrapper is internal-use plumbing — calling it directly returns raw scoring output without the gate-satisfaction wrap-up, AskUserQuestion above-appetite handling, or release-context resolution that `/wr-risk-scorer:assess-release` provides.
12
+
13
+ ## Contract
14
+
15
+ - **Input** (`$ARGUMENTS`): a self-contained scoring prompt with pipeline state context. Caller assembles UNCOMMITTED / UNPUSHED / UNRELEASED sections per `packages/risk-scorer/agents/pipeline.md` § Pipeline State.
16
+ - **Output**: the agent's verbatim report, including the structured `RISK_SCORES: commit=N push=N release=N` block, optional `RISK_BYPASS:` line, optional `RISK_REMEDIATIONS:` block, optional `RISK_REGISTER_HINT:` block, and optional `CATALOG_HIT_RATE:` line.
17
+ - **Side effects**: the `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads the agent's output downstream of this wrapper and writes the bypass marker files to `${TMPDIR}/claude-risk-${SESSION_ID}/`. The wrapper itself writes no files.
18
+
19
+ ## Steps
20
+
21
+ ### 1. Pass-through to the pipeline agent
22
+
23
+ Invoke the pipeline subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim:
24
+
25
+ ```
26
+ subagent_type: wr-risk-scorer:pipeline
27
+ prompt: $ARGUMENTS
28
+ ```
29
+
30
+ ### 2. Return the agent report verbatim
31
+
32
+ Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the structured output blocks (`RISK_SCORES:`, `RISK_BYPASS:`, `RISK_REMEDIATIONS:`, `RISK_REGISTER_HINT:`, `CATALOG_HIT_RATE:`). The PostToolUse hook depends on the exact byte sequence to parse.
33
+
34
+ $ARGUMENTS
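The pipeline Contract names `RISK_SCORES: commit=N push=N release=N` as the structured line the hook parses. Below is a minimal bash sketch of reading one layer's score from a report. It assumes each `N` is a plain non-negative integer and that the keys appear in that order. `risk_score_for` is a hypothetical name, and the snippet is not taken from the package.

```bash
# Hypothetical helper (not part of @windyroad/risk-scorer): print one layer's
# score from the "RISK_SCORES: commit=N push=N release=N" line of a pipeline
# report read on stdin. Assumes each N is a non-negative integer.
risk_score_for() {
  local layer="$1" line
  # Accept only the three documented layers.
  case "$layer" in
    commit|push|release) ;;
    *) return 2 ;;
  esac
  line=$(grep -E '^RISK_SCORES: commit=[0-9]+ push=[0-9]+ release=[0-9]+' | head -n 1)
  [ -n "$line" ] || return 1
  # Every key is preceded by a space, so " <layer>=" matches exactly one field.
  printf '%s\n' "$line" | grep -oE " ${layer}=[0-9]+" | sed -E "s/^ ${layer}=//"
}
```

This kind of parsing is why Step 2 forbids paraphrasing. A reworded scores line would simply fail to match.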
@@ -0,0 +1,33 @@
1
+ ---
2
+ name: wr-risk-scorer:wip
3
+ description: Invokable SKILL wrapper around the wr-risk-scorer:wip nudge agent. Delegates to the agent via the Agent tool and returns the agent's structured WIP risk verdict. Internal-use plumbing used by `/wr-risk-scorer:assess-wip` per ADR-015's Confirmation literal phrasing. End users should invoke `/wr-risk-scorer:assess-wip` instead.
4
+ allowed-tools: Read, Glob, Bash, Agent
5
+ ---
6
+
7
+ # WIP Scoring Skill (Wrapper)
8
+
9
+ This SKILL is an **invokable wrapper** around the `wr-risk-scorer:wip` agent. It exists so consumer SKILLs can invoke the WIP nudge scorer via the **Skill tool** with `skill: wr-risk-scorer:wip` — matching ADR-015's Confirmation literal phrasing.
10
+
11
+ **End users**: invoke `/wr-risk-scorer:assess-wip` instead. This wrapper is internal-use plumbing — calling it directly returns raw nudge output without the present-results layer that `/wr-risk-scorer:assess-wip` provides.
12
+
13
+ ## Contract
14
+
15
+ - **Input** (`$ARGUMENTS`): a self-contained WIP-scoring prompt — typically the edited file path(s) plus a `git diff HEAD --stat` summary per `packages/risk-scorer/agents/wip.md`.
16
+ - **Output**: the agent's verbatim report, including the WIP Risk Assessment markdown table, the cumulative pipeline risk picture, the structured `RISK_VERDICT: CONTINUE | PAUSE | COMMIT` line, and any `RISK_REMEDIATIONS:` or `RISK_COMMIT_REASON:` blocks the agent emits.
17
+
18
+ ## Steps
19
+
20
+ ### 1. Pass-through to the wip agent
21
+
22
+ Invoke the wip subagent via the Agent tool with the caller's `$ARGUMENTS` verbatim:
23
+
24
+ ```
25
+ subagent_type: wr-risk-scorer:wip
26
+ prompt: $ARGUMENTS
27
+ ```
28
+
29
+ ### 2. Return the agent report verbatim
30
+
31
+ Return the agent's response to the caller without alteration. Do NOT strip, paraphrase, or post-process the `RISK_VERDICT:`, `RISK_REMEDIATIONS:`, or `RISK_COMMIT_REASON:` blocks — consumer SKILLs parse them directly.
32
+
33
+ $ARGUMENTS
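Consumers of the wip wrapper parse the `RISK_VERDICT:` line directly. Below is a minimal bash sketch of that extraction. It assumes the agent prints the line as `RISK_VERDICT: <VALUE>` with no trailing text. `wip_verdict` is a hypothetical helper and is not part of `@windyroad/risk-scorer`.

```bash
# Hypothetical helper (not part of @windyroad/risk-scorer): print the WIP
# verdict from a wip report read on stdin. Values outside the three documented
# verdicts are rejected rather than guessed at.
wip_verdict() {
  local v
  # Keep only the captured value from the first "RISK_VERDICT: <UPPERCASE>" line.
  v=$(sed -n -E 's/^RISK_VERDICT: ([A-Z]+)$/\1/p' | head -n 1)
  case "$v" in
    CONTINUE|PAUSE|COMMIT) printf '%s\n' "$v" ;;
    *) return 1 ;;
  esac
}
```

Rejecting unknown or missing values keeps a consumer from acting on a truncated or paraphrased report, which is the risk Step 2's "verbatim" rule guards against.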