@windyroad/risk-scorer 0.8.0 → 0.9.0-preview.313
- package/.claude-plugin/plugin.json +1 -1
- package/README.md +2 -0
- package/agents/inbound-report.md +126 -0
- package/agents/test/inbound-report-contract.bats +225 -0
- package/hooks/external-comms-evaluator.conf +23 -0
- package/hooks/external-comms-gate.sh +62 -24
- package/hooks/risk-score-mark.sh +7 -4
- package/hooks/test/external-comms-gate.bats +16 -2
- package/package.json +1 -1
- package/skills/assess-inbound-report/SKILL.md +131 -0
- package/skills/assess-inbound-report/test/assess-inbound-report-contract.bats +132 -0
package/README.md
CHANGED
|
@@ -65,6 +65,7 @@ The plugin includes six specialised agents:
|
|
|
65
65
|
| `wr-risk-scorer:plan` | Reviews implementation plans for risk |
|
|
66
66
|
| `wr-risk-scorer:policy` | Validates `RISK-POLICY.md` for ISO 31000 compliance |
|
|
67
67
|
| `wr-risk-scorer:external-comms` | Reviews drafts of outbound prose (gh issues/PRs, advisories, npm publish, changeset bodies) for confidential-information leaks per `RISK-POLICY.md` |
|
|
68
|
+
| `wr-risk-scorer:inbound-report` | Reviews inbound third-party reports (problem-report issues, Q&A discussions, security-advisory submissions) for Request-risk + Fix-risk per `RISK-POLICY.md` § Inbound Report Risk Classes — sibling of `:external-comms` (NOT extension). Consumed by the assessment-pipeline (P079 / ADR-062). Serves JTBD-301 (verdict-on-close acknowledgement) + JTBD-001 (mechanical-stage carve-out). |
|
|
68
69
|
|
|
69
70
|
## On-demand assessment skills
|
|
70
71
|
|
|
@@ -73,6 +74,7 @@ The plugin includes six specialised agents:
|
|
|
73
74
|
| `/wr-risk-scorer:assess-wip` | WIP risk nudge for the current uncommitted diff |
|
|
74
75
|
| `/wr-risk-scorer:assess-release` | Pipeline risk assessment for the unpushed queue (pre-satisfies the commit gate) |
|
|
75
76
|
| `/wr-risk-scorer:assess-external-comms` | External-comms leak review for a draft outbound body (pre-satisfies the external-comms gate) |
|
|
77
|
+
| `/wr-risk-scorer:assess-inbound-report` | Inbound-report risk review for a third-party submission — two-axis (Request-risk + Fix-risk) classification per `RISK-POLICY.md` (P079 / ADR-062). Serves JTBD-005 (on-demand assessment) + JTBD-202 (pre-flight governance check). |
|
|
76
78
|
| `/wr-risk-scorer:create-risk` | Create a standing-risk register entry (interactive authoring; orchestrator-driven prefilled invocation via `--slug` / `--prefill` flags per ADR-059) |
|
|
77
79
|
| `/wr-risk-scorer:bootstrap-catalog` | Bootstrap `docs/risks/` register from existing `.risk-reports/` corpus per ADR-059 — walks reports, dedupes by ADR-056 slug, emits one `R<NNN>-<slug>.active.md` per unique slug. Idempotent. Auto-triggers from `/install-updates` Step 6.5.1 when register is empty + `RISK-POLICY.md` present + `.risk-reports/` non-empty |
|
|
78
80
|
| `/wr-risk-scorer:update-policy` | Generate or update `RISK-POLICY.md` |
|
|
package/agents/inbound-report.md
ADDED
@@ -0,0 +1,126 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: inbound-report
|
|
3
|
+
description: Reviews third-party prose submitted as inbound reports (gh issue bodies labelled problem-report, gh discussions in Q&A categories, gh security-advisory bodies) for two risk axes — Request-risk (info-extraction / backdoor request / malicious-code injection) and Fix-risk (privilege escalation / removal of load-bearing safety check / adopter-attack-surface expansion). Read-only — emits a structured PASS/FAIL verdict consumed by the assessment-pipeline (ADR-062) for branch routing.
|
|
4
|
+
tools:
|
|
5
|
+
- Read
|
|
6
|
+
- Glob
|
|
7
|
+
- Grep
|
|
8
|
+
model: inherit
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
You are the Inbound-Report Risk Reviewer. Your single job: read the body of an inbound report (a third-party submission against this repo's intake — a `problem-report.yml` issue, a Q&A discussion, or a security-advisory submission) and return a structured PASS/FAIL verdict against RISK-POLICY.md's Inbound Report Risk Classes (Request-risk + Fix-risk).
|
|
12
|
+
|
|
13
|
+
You are read-only. You do NOT write files, do NOT post comments upstream, do NOT modify the inbound report. Your verdict is consumed by `/wr-itil:review-problems` Step 8.5's assessment-pipeline (ADR-062) — the pipeline reads your verdict and routes the report to one of three branches: above-threshold-pushback, clear-malicious-close-with-verdict, or safe-and-valid-local-ticket-create.
|
|
14
|
+
|
|
15
|
+
**Direction of flow**: you review THIRD-PARTY prose flowing INWARD. This is the opposite direction from `wr-risk-scorer:external-comms` (which reviews OUR outbound prose for leaks). The two subagents are siblings, not extensions — the evaluator concerns are semantically distinct (third-party intent vs our-confidential-leakage).
|
|
16
|
+
|
|
17
|
+
## What you receive
|
|
18
|
+
|
|
19
|
+
The invoking skill (`/wr-risk-scorer:assess-inbound-report`) or the assessment-pipeline provides:
|
|
20
|
+
|
|
21
|
+
- The **report body** verbatim — the exact prose submitted on the intake surface.
|
|
22
|
+
- The **report metadata** — submitter handle, surface (`github-issues` / `github-discussions` / `github-security-advisories`), repo, issue/discussion ID when known.
|
|
23
|
+
- The **JTBD-alignment context** — the assessment-pipeline's prior-step verdict (`aligned-with-existing-JTBD` / `aligned-with-new-JTBD-for-existing-persona` / `not-aligned`) so your judgement composes with the alignment classifier's output rather than re-deriving it.
|
|
24
|
+
|
|
25
|
+
Read `RISK-POLICY.md` (project root) `## Inbound Report Risk Classes` section to get the authoritative class list for both axes.
|
|
26
|
+
|
|
27
|
+
## Two-axis review
|
|
28
|
+
|
|
29
|
+
### Axis 1 — Request-risk (is the report itself an attack vector?)
|
|
30
|
+
|
|
31
|
+
For each Request-risk class in `## Inbound Report Risk Classes`, pass the report body against the class definition. Look for:
|
|
32
|
+
|
|
33
|
+
- **Info-extraction**: requests for the maintainer to reveal repository internals, build secrets, deployment paths, credentials, contributor PII, or other non-public information that a legitimate problem report does not need.
|
|
34
|
+
- **Backdoor request**: requests to add a backdoor, weaken a safety check, disable a security feature, expose an internal API, or otherwise compromise the project's integrity disguised as a feature/bug.
|
|
35
|
+
- **Malicious-code injection**: requests to incorporate user-supplied code (script snippets, regex patterns, prompt templates, hook payloads) that read as likely-malicious in the context they would execute.
|
|
36
|
+
|
|
37
|
+
### Axis 2 — Fix-risk (is fixing the report risky?)
|
|
38
|
+
|
|
39
|
+
Some legitimate-looking reports request changes that are themselves high-risk to ship. For each Fix-risk class:
|
|
40
|
+
|
|
41
|
+
- **Privilege escalation**: the requested fix would let the requester (or others) escalate privilege within the suite or downstream adopters.
|
|
42
|
+
- **Removal of load-bearing safety check**: the requested fix removes a check whose removal increases risk to users.
|
|
43
|
+
- **Adopter-attack-surface expansion**: the requested fix would expand the suite's attack surface across all adopters (e.g. shipping a credential-handling pattern, broadening a permissive default).
|
|
44
|
+
|
|
45
|
+
## Verdict combinations
|
|
46
|
+
|
|
47
|
+
Combine the two axes into one structured outcome:
|
|
48
|
+
|
|
49
|
+
| Request-risk | Fix-risk | Verdict | Pipeline branch |
|
|
50
|
+
|--------------|----------|---------|-----------------|
|
|
51
|
+
| clear-malicious | (any) | FAIL — `clear-malicious-request` | clear-malicious-close-with-verdict |
|
|
52
|
+
| above-threshold | (any) | FAIL — `above-threshold-risk` | above-threshold-pushback |
|
|
53
|
+
| safe | high | PASS — `safe-high-fix-risk` (continue with maintainer-attention flag) | safe-and-valid-local-ticket-create + flag |
|
|
54
|
+
| safe | low | PASS — `safe-low-fix-risk` | safe-and-valid-local-ticket-create |
|
|
55
|
+
|
|
56
|
+
`clear-malicious-request` is reserved for unambiguous attacks (named info-extraction / backdoor / malicious-code class with high confidence). `above-threshold-risk` covers the policy-ambiguous middle — content that fits a Request-risk class but at lower confidence or with mitigating context.
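The verdict table above amounts to a small lookup from the two axis results to a class and a pipeline branch. The sketch below shows that lookup in shell; the function name `classify_inbound_report` and the one-line output format are assumptions made for illustration and are not part of the package.

```shell
# Hypothetical helper: map the two axis results onto the verdict table.
#   $1 request-risk: clear-malicious | above-threshold | safe
#   $2 fix-risk:     high | low  (only consulted when request-risk is safe)
# Prints "<verdict> <class> <pipeline-branch>" on one line.
classify_inbound_report() {
  request_risk="$1"
  fix_risk="${2:-}"
  case "$request_risk" in
    clear-malicious)
      echo "FAIL clear-malicious-request clear-malicious-close-with-verdict" ;;
    above-threshold)
      echo "FAIL above-threshold-risk above-threshold-pushback" ;;
    safe)
      case "$fix_risk" in
        high) echo "PASS safe-high-fix-risk safe-and-valid-local-ticket-create+flag" ;;
        low)  echo "PASS safe-low-fix-risk safe-and-valid-local-ticket-create" ;;
        *)    echo "unknown fix-risk: $fix_risk" >&2; return 1 ;;
      esac ;;
    *)
      echo "unknown request-risk: $request_risk" >&2; return 1 ;;
  esac
}

classify_inbound_report safe high
# prints: PASS safe-high-fix-risk safe-and-valid-local-ticket-create+flag
```

Request-risk is checked first, so a `clear-malicious` or `above-threshold` result decides the verdict regardless of the Fix-risk value, matching the `(any)` cells in the table.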
|
|
57
|
+
|
|
58
|
+
## Verdict format (MANDATORY)
|
|
59
|
+
|
|
60
|
+
End your report with a structured block consumed by the assessment-pipeline + `risk-score-mark.sh` PostToolUse hook. Every field is required.
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
INBOUND_REPORT_VERDICT: PASS
|
|
64
|
+
INBOUND_REPORT_KEY: <sha256 hex string>
|
|
65
|
+
INBOUND_REPORT_CLASS: <safe-low-fix-risk | safe-high-fix-risk>
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
OR for a failed review:
|
|
69
|
+
|
|
70
|
+
```
|
|
71
|
+
INBOUND_REPORT_VERDICT: FAIL
|
|
72
|
+
INBOUND_REPORT_KEY: <sha256 hex string>
|
|
73
|
+
INBOUND_REPORT_CLASS: <above-threshold-risk | clear-malicious-request>
|
|
74
|
+
INBOUND_REPORT_REASON: <one-line description of the axis + class + matched fragment>
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
Compute the key as:
|
|
78
|
+
|
|
79
|
+
```
|
|
80
|
+
printf '%s\n%s\n%s' "<report body verbatim>" "<surface name>" "<submitter handle>" | shasum -a 256 | cut -d' ' -f1
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
The key MUST match the pipeline's computation exactly — a key mismatch means the marker is written for a different report and the assessment-pipeline will re-trigger the subagent on the next pass.
|
|
84
|
+
|
|
85
|
+
## Grounding (ADR-026)
|
|
86
|
+
|
|
87
|
+
Every FAIL verdict MUST cite:
|
|
88
|
+
|
|
89
|
+
- The specific RISK-POLICY.md class violated (verbatim — copy the bullet from the policy).
|
|
90
|
+
- The axis the class belongs to (Request-risk or Fix-risk).
|
|
91
|
+
- The exact substring from the report body that triggered the call (when the class is content-pattern-based).
|
|
92
|
+
- A one-line explanation of why this submission constitutes the class match.
|
|
93
|
+
|
|
94
|
+
Example:
|
|
95
|
+
|
|
96
|
+
> INBOUND_REPORT_REASON: Axis 1 Request-risk "Info-extraction" class — report body contains "share the exact path of your CI credentials so I can replicate" requesting non-public deployment information; legitimate `problem-report.yml` submissions do not require maintainer credential paths.
|
|
97
|
+
|
|
98
|
+
## Constraints
|
|
99
|
+
|
|
100
|
+
- You are a reviewer, not an editor — do NOT propose rewrites in the verdict block. (Free prose suggestions outside the verdict block are fine when explaining the FAIL reason.)
|
|
101
|
+
- Do NOT score by analogy when the policy names the class.
|
|
102
|
+
- Do NOT write to `/tmp/` or any marker location yourself — the PostToolUse hook owns that.
|
|
103
|
+
- Do NOT skip the `INBOUND_REPORT_KEY` line; without it, the assessment-pipeline has no key to write the marker against and will re-trigger the subagent on the next pass.
|
|
104
|
+
- Do NOT make a block-list decision (P123 scope) — your verdict feeds the audit-log via ADR-062's clear-malicious branch; block-list enforcement is a separate ticket's concern.
|
|
105
|
+
- When the report body is empty (e.g. a Q&A discussion with only a title), review the title + metadata. If neither carries enough content, FAIL with class `above-threshold-risk` and reason "body unresolvable; cannot review without text" so the maintainer can pre-review manually.
|
|
106
|
+
|
|
107
|
+
## Below-Appetite Output Rule (ADR-013 Rule 5)
|
|
108
|
+
|
|
109
|
+
When the verdict is PASS at the `safe-low-fix-risk` class, your output may be terse: a one-line "no Inbound Report Risk class matched on either axis; fix risk low" plus the verdict block. Do not pad with advisory prose; policy-authorised submissions proceed silently per ADR-013 Rule 5.
|
|
110
|
+
|
|
111
|
+
## Above-Appetite (FAIL or safe-high-fix-risk) Output
|
|
112
|
+
|
|
113
|
+
When the verdict is FAIL OR the class is `safe-high-fix-risk`:
|
|
114
|
+
|
|
115
|
+
- **FAIL**: surface the matched class, axis, and triggering substring in PROSE BEFORE the verdict block. The pipeline routes this to the pushback branch (which posts a gated comment per ADR-028 amended); maintainer-side context is the prose, machine-side routing is the block.
|
|
116
|
+
- **safe-high-fix-risk**: surface the fix-risk class the maintainer should weigh BEFORE accepting the local ticket. The pipeline still creates the local ticket (safe-and-valid path) but flags it for maintainer attention.
|
|
117
|
+
|
|
118
|
+
## ADR cross-references
|
|
119
|
+
|
|
120
|
+
- **ADR-062** (Inbound upstream-report discovery + assessment pipeline) — § Sibling subagent section names this agent + this two-axis framing.
|
|
121
|
+
- **ADR-015** (On-demand assessment skills) — § Scope table includes the sibling `/wr-risk-scorer:assess-inbound-report` skill that wraps this agent for manual invocation.
|
|
122
|
+
- **ADR-028** (External-comms gate, amended) — the pushback / clear-malicious-verdict comments the assessment-pipeline posts after this agent's FAIL verdict ride the P064 + P038 evaluator halves.
|
|
123
|
+
- **ADR-029** (Diagnose before implement) — your verdict follows the hypothesis (axis-class match) / evidence (matched substring or metadata) / structured-verdict (PASS / FAIL + class + key) discipline.
|
|
124
|
+
- **ADR-013 Rule 5** — below-appetite silent-pass output rule applies.
|
|
125
|
+
- **P079** — the parent problem ticket driving this work.
|
|
126
|
+
- **P132** + inverse-P078 — your verdict resolves the branch decision mechanically; the assessment-pipeline does NOT use AskUserQuestion at the branch decision (this is the framework-resolution boundary).
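To illustrate how a consumer might read the structured verdict block this prompt mandates, here is a minimal shell sketch. It is not the parser used by `risk-score-mark.sh` or the assessment-pipeline, whose code is not shown in this part of the diff; `verdict_field` is a hypothetical name and the all-zero key is a placeholder.

```shell
# Hypothetical parser: print the value of one "NAME: value" field read from
# stdin. The last occurrence wins, since the block is meant to end the report.
verdict_field() {
  sed -n "s/^$1:[[:space:]]*//p" | tail -n 1
}

# Sample report text; the key is a placeholder, not a real hash.
report='No Inbound Report Risk class matched on either axis; fix risk low.

INBOUND_REPORT_VERDICT: PASS
INBOUND_REPORT_KEY: 0000000000000000000000000000000000000000000000000000000000000000
INBOUND_REPORT_CLASS: safe-low-fix-risk'

verdict=$(printf '%s\n' "$report" | verdict_field INBOUND_REPORT_VERDICT)
class=$(printf '%s\n' "$report" | verdict_field INBOUND_REPORT_CLASS)
echo "$verdict / $class"
# prints: PASS / safe-low-fix-risk
```

A field that is absent yields an empty string, which a caller could treat as a malformed block, for example a report that lacks `INBOUND_REPORT_KEY`.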
|
|
package/agents/test/inbound-report-contract.bats
ADDED
@@ -0,0 +1,225 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
# Contract assertions for the wr-risk-scorer:inbound-report subagent
|
|
3
|
+
# (RFC-004 Slice B). Sibling of wr-risk-scorer:external-comms — NOT
|
|
4
|
+
# extension. Reviews INBOUND third-party prose on two axes (Request-risk +
|
|
5
|
+
# Fix-risk) per RISK-POLICY.md § Inbound Report Risk Classes.
|
|
6
|
+
#
|
|
7
|
+
# Structural assertions — Permitted Exception to the source-grep ban
|
|
8
|
+
# per ADR-005 / P011 / ADR-037 / ADR-052 § Surface 2. Subagent prompt
|
|
9
|
+
# prose governs LLM-driven verdict behaviour; behavioural-replay
|
|
10
|
+
# testing requires a synthetic agent harness (P012 / P176). Until that
|
|
11
|
+
# harness lands, contract bats assert the load-bearing rubric + structured
|
|
12
|
+
# verdict format are present so future edits don't silently strip them.
|
|
13
|
+
#
|
|
14
|
+
# @problem P079
|
|
15
|
+
# @rfc RFC-004
|
|
16
|
+
# @adr ADR-062 (inbound discovery + assessment pipeline — § Sibling subagent)
|
|
17
|
+
# @adr ADR-015 (on-demand assessment skills — § Scope table)
|
|
18
|
+
# @adr ADR-026 (grounding discipline — every FAIL verdict cites policy class)
|
|
19
|
+
# @adr ADR-029 (diagnose before implement — hypothesis / evidence / structured verdict)
|
|
20
|
+
# @adr ADR-052 (behavioural-tests default + Permitted Exception)
|
|
21
|
+
# @jtbd JTBD-301 (acknowledgement contract grounded in policy classes)
|
|
22
|
+
# @jtbd JTBD-001 (mechanical-stage carve-out via structured verdict)
|
|
23
|
+
|
|
24
|
+
setup() {
|
|
25
|
+
AGENTS_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
|
|
26
|
+
AGENT_FILE="${AGENTS_DIR}/inbound-report.md"
|
|
27
|
+
POLICY_FILE="$(cd "${AGENTS_DIR}/../../.." && pwd)/RISK-POLICY.md"
|
|
28
|
+
}
|
|
29
|
+
|
|
30
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
31
|
+
# Frontmatter + tool surface
|
|
32
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
33
|
+
|
|
34
|
+
@test "inbound-report.md exists and has frontmatter (RFC-004 Slice B)" {
|
|
35
|
+
[ -f "$AGENT_FILE" ]
|
|
36
|
+
run head -1 "$AGENT_FILE"
|
|
37
|
+
[ "$status" -eq 0 ]
|
|
38
|
+
[ "$output" = "---" ]
|
|
39
|
+
}
|
|
40
|
+
|
|
41
|
+
@test "frontmatter name is 'inbound-report' (sibling of external-comms)" {
|
|
42
|
+
run grep -nE '^name: inbound-report$' "$AGENT_FILE"
|
|
43
|
+
[ "$status" -eq 0 ]
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
@test "frontmatter tools are read-only (Read, Glob, Grep)" {
|
|
47
|
+
# Per ADR-062 § Sibling subagent: read-only contract; subagent emits
|
|
48
|
+
# verdict, PostToolUse hook owns marker writes.
|
|
49
|
+
run grep -nE ' - Read' "$AGENT_FILE"
|
|
50
|
+
[ "$status" -eq 0 ]
|
|
51
|
+
run grep -nE ' - Glob' "$AGENT_FILE"
|
|
52
|
+
[ "$status" -eq 0 ]
|
|
53
|
+
run grep -nE ' - Grep' "$AGENT_FILE"
|
|
54
|
+
[ "$status" -eq 0 ]
|
|
55
|
+
}
|
|
56
|
+
|
|
57
|
+
@test "frontmatter tools do NOT include Write / Edit / Bash (read-only invariant)" {
|
|
58
|
+
run grep -nE '^ - (Write|Edit|Bash)$' "$AGENT_FILE"
|
|
59
|
+
[ "$status" -ne 0 ]
|
|
60
|
+
}
|
|
61
|
+
|
|
62
|
+
@test "frontmatter model is inherit" {
|
|
63
|
+
run grep -nE '^model: inherit$' "$AGENT_FILE"
|
|
64
|
+
[ "$status" -eq 0 ]
|
|
65
|
+
}
|
|
66
|
+
|
|
67
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
68
|
+
# Sibling-not-extension positioning (ADR-062 § Sibling subagent)
|
|
69
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
70
|
+
|
|
71
|
+
@test "agent prose names sibling-not-extension positioning vs external-comms" {
|
|
72
|
+
# ADR-062 explicitly carves the inbound-report subagent as a sibling
|
|
73
|
+
# (NOT extension) of external-comms. Protects JTBD-101 plugin-developer
|
|
74
|
+
# constraint "must not break existing plugins" by preserving
|
|
75
|
+
# external-comms scope-purity.
|
|
76
|
+
run grep -inE 'sibling.*external-comms|external-comms.*sibling' "$AGENT_FILE"
|
|
77
|
+
[ "$status" -eq 0 ]
|
|
78
|
+
}
|
|
79
|
+
|
|
80
|
+
@test "agent prose names the inbound-direction framing (third-party prose flowing INWARD)" {
|
|
81
|
+
run grep -inE 'INWARD|inbound prose|third-party prose' "$AGENT_FILE"
|
|
82
|
+
[ "$status" -eq 0 ]
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
86
|
+
# Two-axis review structure (Request-risk + Fix-risk)
|
|
87
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
88
|
+
|
|
89
|
+
@test "Axis 1 Request-risk documented (attack-vector axis)" {
|
|
90
|
+
run grep -nE 'Axis 1.*Request-risk' "$AGENT_FILE"
|
|
91
|
+
[ "$status" -eq 0 ]
|
|
92
|
+
}
|
|
93
|
+
|
|
94
|
+
@test "Axis 1 enumerates info-extraction / backdoor request / malicious-code injection classes" {
|
|
95
|
+
run grep -inE 'Info-extraction' "$AGENT_FILE"
|
|
96
|
+
[ "$status" -eq 0 ]
|
|
97
|
+
run grep -inE 'Backdoor request' "$AGENT_FILE"
|
|
98
|
+
[ "$status" -eq 0 ]
|
|
99
|
+
run grep -inE 'Malicious-code injection' "$AGENT_FILE"
|
|
100
|
+
[ "$status" -eq 0 ]
|
|
101
|
+
}
|
|
102
|
+
|
|
103
|
+
@test "Axis 2 Fix-risk documented (work-to-be-weighed axis)" {
|
|
104
|
+
run grep -nE 'Axis 2.*Fix-risk' "$AGENT_FILE"
|
|
105
|
+
[ "$status" -eq 0 ]
|
|
106
|
+
}
|
|
107
|
+
|
|
108
|
+
@test "Axis 2 enumerates privilege escalation / removal-of-safety-check / adopter-attack-surface-expansion classes" {
|
|
109
|
+
run grep -inE 'Privilege escalation' "$AGENT_FILE"
|
|
110
|
+
[ "$status" -eq 0 ]
|
|
111
|
+
run grep -inE 'Removal of load-bearing safety check' "$AGENT_FILE"
|
|
112
|
+
[ "$status" -eq 0 ]
|
|
113
|
+
run grep -inE 'Adopter-attack-surface expansion' "$AGENT_FILE"
|
|
114
|
+
[ "$status" -eq 0 ]
|
|
115
|
+
}
|
|
116
|
+
|
|
117
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
118
|
+
# Structured verdict block (consumed by assessment-pipeline branch routing)
|
|
119
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
120
|
+
|
|
121
|
+
@test "verdict block defines INBOUND_REPORT_VERDICT" {
|
|
122
|
+
run grep -nE 'INBOUND_REPORT_VERDICT' "$AGENT_FILE"
|
|
123
|
+
[ "$status" -eq 0 ]
|
|
124
|
+
}
|
|
125
|
+
|
|
126
|
+
@test "verdict block defines INBOUND_REPORT_KEY (sha256 hex for marker matching)" {
|
|
127
|
+
run grep -nE 'INBOUND_REPORT_KEY' "$AGENT_FILE"
|
|
128
|
+
[ "$status" -eq 0 ]
|
|
129
|
+
}
|
|
130
|
+
|
|
131
|
+
@test "verdict block defines INBOUND_REPORT_CLASS (one of four classifications)" {
|
|
132
|
+
run grep -nE 'INBOUND_REPORT_CLASS' "$AGENT_FILE"
|
|
133
|
+
[ "$status" -eq 0 ]
|
|
134
|
+
}
|
|
135
|
+
|
|
136
|
+
@test "verdict block defines INBOUND_REPORT_REASON for FAIL path" {
|
|
137
|
+
run grep -nE 'INBOUND_REPORT_REASON' "$AGENT_FILE"
|
|
138
|
+
[ "$status" -eq 0 ]
|
|
139
|
+
}
|
|
140
|
+
|
|
141
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
142
|
+
# Four classifications enumerated (branch-routing vocabulary)
|
|
143
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
144
|
+
|
|
145
|
+
@test "classification safe-low-fix-risk enumerated" {
|
|
146
|
+
run grep -nE 'safe-low-fix-risk' "$AGENT_FILE"
|
|
147
|
+
[ "$status" -eq 0 ]
|
|
148
|
+
}
|
|
149
|
+
|
|
150
|
+
@test "classification safe-high-fix-risk enumerated" {
|
|
151
|
+
run grep -nE 'safe-high-fix-risk' "$AGENT_FILE"
|
|
152
|
+
[ "$status" -eq 0 ]
|
|
153
|
+
}
|
|
154
|
+
|
|
155
|
+
@test "classification above-threshold-risk enumerated" {
|
|
156
|
+
run grep -nE 'above-threshold-risk' "$AGENT_FILE"
|
|
157
|
+
[ "$status" -eq 0 ]
|
|
158
|
+
}
|
|
159
|
+
|
|
160
|
+
@test "classification clear-malicious-request enumerated" {
|
|
161
|
+
run grep -nE 'clear-malicious-request' "$AGENT_FILE"
|
|
162
|
+
[ "$status" -eq 0 ]
|
|
163
|
+
}
|
|
164
|
+
|
|
165
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
166
|
+
# Grounding discipline (ADR-026)
|
|
167
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
168
|
+
|
|
169
|
+
@test "FAIL verdict requires citing the specific RISK-POLICY.md class" {
|
|
170
|
+
run grep -inE 'cite|class violated' "$AGENT_FILE"
|
|
171
|
+
[ "$status" -eq 0 ]
|
|
172
|
+
}
|
|
173
|
+
|
|
174
|
+
@test "agent prose cites ADR-026 grounding discipline" {
|
|
175
|
+
run grep -nE 'ADR-026' "$AGENT_FILE"
|
|
176
|
+
[ "$status" -eq 0 ]
|
|
177
|
+
}
|
|
178
|
+
|
|
179
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
180
|
+
# Read-only constraints + marker boundary
|
|
181
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
182
|
+
|
|
183
|
+
@test "agent declares read-only (no file writes / commits / draft modifications)" {
|
|
184
|
+
run grep -inE 'read-only' "$AGENT_FILE"
|
|
185
|
+
[ "$status" -eq 0 ]
|
|
186
|
+
}
|
|
187
|
+
|
|
188
|
+
@test "agent forbids self-writing to /tmp/ or marker locations" {
|
|
189
|
+
# PostToolUse hook owns marker writes per ADR-009; the subagent
|
|
190
|
+
# emits the verdict and the hook computes the marker key.
|
|
191
|
+
run grep -inE 'NOT write to /tmp|PostToolUse hook owns' "$AGENT_FILE"
|
|
192
|
+
[ "$status" -eq 0 ]
|
|
193
|
+
}
|
|
194
|
+
|
|
195
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
196
|
+
# Mechanical-stage carve-out integration (P132 — pipeline branch routing)
|
|
197
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
198
|
+
|
|
199
|
+
@test "agent prose names the mechanical-stage carve-out integration (P132)" {
|
|
200
|
+
run grep -nE 'P132|mechanical' "$AGENT_FILE"
|
|
201
|
+
[ "$status" -eq 0 ]
|
|
202
|
+
}
|
|
203
|
+
|
|
204
|
+
@test "agent does NOT make a block-list decision (P123 scope carve-out)" {
|
|
205
|
+
# Block-list enforcement is a separate ticket's concern; this subagent's
|
|
206
|
+
# verdict feeds the audit-log via the assessment-pipeline's clear-malicious
|
|
207
|
+
# branch and stops there.
|
|
208
|
+
run grep -inE 'NOT make a block-list|P123' "$AGENT_FILE"
|
|
209
|
+
[ "$status" -eq 0 ]
|
|
210
|
+
}
|
|
211
|
+
|
|
212
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
213
|
+
# RISK-POLICY.md integration (the policy classes the agent grounds verdicts against)
|
|
214
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
215
|
+
|
|
216
|
+
@test "RISK-POLICY.md has the '## Inbound Report Risk Classes' section the agent reads" {
|
|
217
|
+
[ -f "$POLICY_FILE" ]
|
|
218
|
+
run grep -nE '^## Inbound Report Risk Classes$' "$POLICY_FILE"
|
|
219
|
+
[ "$status" -eq 0 ]
|
|
220
|
+
}
|
|
221
|
+
|
|
222
|
+
@test "agent prose references RISK-POLICY.md § Inbound Report Risk Classes" {
|
|
223
|
+
run grep -nE 'Inbound Report Risk Classes' "$AGENT_FILE"
|
|
224
|
+
[ "$status" -eq 0 ]
|
|
225
|
+
}
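The contract tests above are structural: they grep the agent prompt for required strings rather than exercising the agent. The same idea can be tried outside bats with plain shell. The sketch below uses a made-up fixture file and a hypothetical function name, and its pattern tolerates any indentation before the list dash, which is an assumption and may differ from the pattern in the test file.

```shell
# Hypothetical fixture standing in for agents/inbound-report.md.
tmpdir=$(mktemp -d)
agent_file="$tmpdir/inbound-report.md"
cat > "$agent_file" <<'EOF'
---
name: inbound-report
tools:
  - Read
  - Glob
  - Grep
model: inherit
---
EOF

# Succeeds only when no write-capable tool appears in the tools list.
tools_are_read_only() {
  ! grep -qE '^ +- (Write|Edit|Bash)$' "$1"
}

if tools_are_read_only "$agent_file"; then
  echo "read-only"
fi
# prints: read-only
```

Checks of this kind only confirm that the wording is present. As the header comment of the test file notes, verifying verdict behaviour would need a behavioural-replay harness.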
|
|
package/hooks/external-comms-evaluator.conf
ADDED
@@ -0,0 +1,23 @@
|
|
|
1
|
+
# Per-package evaluator config for external-comms-gate.sh (ADR-028 amended 2026-05-14).
|
|
2
|
+
# Sourced by the canonical external-comms-gate.sh; NOT synced (each consumer
|
|
3
|
+
# plugin maintains its own .conf).
|
|
4
|
+
|
|
5
|
+
# Short evaluator id — used in marker filenames (external-comms-<id>-reviewed-<key>).
|
|
6
|
+
EXTERNAL_COMMS_EVALUATOR_ID=risk
|
|
7
|
+
|
|
8
|
+
# Subagent type the deny message directs to.
|
|
9
|
+
EXTERNAL_COMMS_SUBAGENT_TYPE=wr-risk-scorer:external-comms
|
|
10
|
+
|
|
11
|
+
# Structured-output prefix the PostToolUse:Agent hook parses from the subagent's
|
|
12
|
+
# stdout (EXTERNAL_COMMS_RISK_VERDICT + EXTERNAL_COMMS_RISK_KEY).
|
|
13
|
+
EXTERNAL_COMMS_VERDICT_PREFIX=EXTERNAL_COMMS_RISK
|
|
14
|
+
|
|
15
|
+
# On-demand skill for pre-flight delegation.
|
|
16
|
+
EXTERNAL_COMMS_ASSESS_SKILL=/wr-risk-scorer:assess-external-comms
|
|
17
|
+
|
|
18
|
+
# Policy file whose absence triggers advisory-only mode.
|
|
19
|
+
EXTERNAL_COMMS_POLICY_FILE=RISK-POLICY.md
|
|
20
|
+
|
|
21
|
+
# Whether to run the leak-pattern pre-filter (lib/leak-detect.sh). Risk evaluator
|
|
22
|
+
# checks confidential-information leaks; voice-tone evaluator does not.
|
|
23
|
+
EXTERNAL_COMMS_LEAK_PREFILTER=yes
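A conf file like this one is sourced as shell, so a consumer can validate it before trusting the values. The sketch below is one way to do that, assuming the caller wants a return status rather than an aborted shell. `load_evaluator_conf` is a hypothetical name, and the choice of required versus optional keys mirrors the gate script in this diff.

```shell
# Hypothetical loader: source an evaluator conf and insist on required keys.
# Returns non-zero (and sets nothing further) when the file or a key is missing.
load_evaluator_conf() {
  conf_file="$1"
  [ -f "$conf_file" ] || { echo "missing $conf_file" >&2; return 1; }
  # shellcheck source=/dev/null
  . "$conf_file"
  for name in EXTERNAL_COMMS_EVALUATOR_ID \
              EXTERNAL_COMMS_SUBAGENT_TYPE \
              EXTERNAL_COMMS_ASSESS_SKILL; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || { echo "$name missing from $conf_file" >&2; return 1; }
  done
  # Optional keys fall back to the same defaults the gate script applies.
  EXTERNAL_COMMS_POLICY_FILE="${EXTERNAL_COMMS_POLICY_FILE:-RISK-POLICY.md}"
  EXTERNAL_COMMS_LEAK_PREFILTER="${EXTERNAL_COMMS_LEAK_PREFILTER:-yes}"
}

# Example: a conf that sets only the three required keys.
conf=$(mktemp)
printf '%s\n' \
  'EXTERNAL_COMMS_EVALUATOR_ID=risk' \
  'EXTERNAL_COMMS_SUBAGENT_TYPE=wr-risk-scorer:external-comms' \
  'EXTERNAL_COMMS_ASSESS_SKILL=/wr-risk-scorer:assess-external-comms' > "$conf"
load_evaluator_conf "$conf" &&
  echo "evaluator=$EXTERNAL_COMMS_EVALUATOR_ID policy=$EXTERNAL_COMMS_POLICY_FILE"
```

Returning a status leaves the decision about what a missing conf means (deny, warn, or permit) to the caller, which keeps that policy visible at the call site.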
|
|
package/hooks/external-comms-gate.sh
CHANGED
@@ -1,5 +1,11 @@
|
|
|
1
1
|
#!/bin/bash
|
|
2
|
-
# PreToolUse hook: gates outbound prose for
|
|
2
|
+
# PreToolUse hook: gates outbound prose for evaluator review (P064 / P038 / ADR-028 amended 2026-05-14).
|
|
3
|
+
#
|
|
4
|
+
# This is the CANONICAL hook synced byte-identically into each consumer plugin
|
|
5
|
+
# (risk-scorer, voice-tone, …) via ADR-017 duplicate-script pattern. Each copy
|
|
6
|
+
# sources `${SCRIPT_DIR}/external-comms-evaluator.conf` to determine its
|
|
7
|
+
# evaluator identity (risk / voice-tone / …) — the .conf file is per-package
|
|
8
|
+
# and NOT synced.
|
|
3
9
|
#
|
|
4
10
|
# Surface (matched on Bash command text or Edit/Write file_path):
|
|
5
11
|
# - gh issue create | comment | edit (public issue bodies)
|
|
@@ -11,23 +17,26 @@
|
|
|
11
17
|
#
|
|
12
18
|
# Gate behaviour:
|
|
13
19
|
# 1. BYPASS_RISK_GATE=1 short-circuits the gate (consistent with git-push-gate.sh).
|
|
14
|
-
# 2.
|
|
20
|
+
# 2. POLICY_FILE absent → advisory-only mode (permits with systemMessage).
|
|
15
21
|
# 3. Hybrid leak-pattern pre-filter (lib/leak-detect.sh) hard-fails on
|
|
16
22
|
# credentials, prod-URL prefixes, business-context-paired financial figures,
|
|
17
23
|
# or business-context-paired user counts. Deny includes the matched class.
|
|
18
|
-
#
|
|
24
|
+
# (Voice-tone evaluator: skips leak pre-filter — leak detection is the
|
|
25
|
+
# risk evaluator's concern; voice-tone reviews tone/voice only.)
|
|
26
|
+
# 4. Otherwise: check for THIS evaluator's per-evaluator marker keyed on
|
|
19
27
|
# sha256(draft_body + '\n' + surface). Marker present → permit.
|
|
20
|
-
# Marker absent → deny with directive to delegate to
|
|
28
|
+
# Marker absent → deny with directive to delegate to this plugin's
|
|
29
|
+
# subagent (configured via external-comms-evaluator.conf).
|
|
21
30
|
#
|
|
22
|
-
# Marker location: ${TMPDIR:-/tmp}/claude-risk-${SESSION_ID}/external-comms
|
|
23
|
-
# Marker writer: PostToolUse:Agent hook
|
|
24
|
-
#
|
|
31
|
+
# Marker location: ${TMPDIR:-/tmp}/claude-risk-${SESSION_ID}/external-comms-<EVALUATOR_ID>-reviewed-<sha256>
|
|
32
|
+
# Marker writer: PostToolUse:Agent hook in each consumer plugin
|
|
33
|
+
# (risk-score-mark.sh or external-comms-mark-reviewed.sh) on
|
|
34
|
+
# subagent type wr-<plugin>:external-comms.
|
|
25
35
|
#
|
|
26
|
-
#
|
|
27
|
-
#
|
|
28
|
-
#
|
|
29
|
-
#
|
|
30
|
-
# See ADR-028 amendment Reassessment Criteria.
|
|
36
|
+
# Per-evaluator marker scheme (ADR-028 amended 2026-05-14): when both
|
|
37
|
+
# voice-tone and risk-scorer are installed, both gates fire on the same
|
|
38
|
+
# PreToolUse event; each gate denies until its own per-evaluator marker
|
|
39
|
+
# exists. Gates compose at firing level — no shared composite marker.
|
|
31
40
|
|
|
32
41
|
set -euo pipefail
|
|
33
42
|
|
|
@@ -35,6 +44,29 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
|
|
35
44
|
# shellcheck source=lib/leak-detect.sh
|
|
36
45
|
source "$SCRIPT_DIR/lib/leak-detect.sh"
|
|
37
46
|
|
|
47
|
+
# ---------- Per-package evaluator config (ADR-028 amended 2026-05-14) ----------
|
|
48
|
+
# Each consumer plugin ships its own external-comms-evaluator.conf alongside this
|
|
49
|
+
# byte-identical canonical hook. The .conf defines:
|
|
50
|
+
# EXTERNAL_COMMS_EVALUATOR_ID — short id (risk, voice-tone)
|
|
51
|
+
# EXTERNAL_COMMS_SUBAGENT_TYPE — subagent to delegate to (wr-<plugin>:external-comms)
|
|
52
|
+
# EXTERNAL_COMMS_VERDICT_PREFIX — structured-output prefix the mark hook parses
|
|
53
|
+
# EXTERNAL_COMMS_ASSESS_SKILL — on-demand skill path for manual delegation
|
|
54
|
+
# EXTERNAL_COMMS_POLICY_FILE — policy doc whose absence triggers advisory-only
|
|
55
|
+
# EXTERNAL_COMMS_LEAK_PREFILTER — yes|no — whether to run leak-detect pre-filter
|
|
56
|
+
# Fail-closed if absent: this hook cannot operate without a configured evaluator.
|
|
57
|
+
CONF_FILE="$SCRIPT_DIR/external-comms-evaluator.conf"
|
|
58
|
+
if [ ! -f "$CONF_FILE" ]; then
|
|
59
|
+
echo "ERROR: external-comms-gate.sh requires $CONF_FILE (ADR-028 amended 2026-05-14)" >&2
|
|
60
|
+
exit 0
|
|
61
|
+
fi
|
|
62
|
+
# shellcheck source=/dev/null
|
|
63
|
+
source "$CONF_FILE"
|
|
64
|
+
: "${EXTERNAL_COMMS_EVALUATOR_ID:?evaluator id missing from $CONF_FILE}"
|
|
65
|
+
: "${EXTERNAL_COMMS_SUBAGENT_TYPE:?subagent type missing from $CONF_FILE}"
|
|
66
|
+
: "${EXTERNAL_COMMS_ASSESS_SKILL:?assess-skill missing from $CONF_FILE}"
|
|
67
|
+
EXTERNAL_COMMS_POLICY_FILE="${EXTERNAL_COMMS_POLICY_FILE:-RISK-POLICY.md}"
|
|
68
|
+
EXTERNAL_COMMS_LEAK_PREFILTER="${EXTERNAL_COMMS_LEAK_PREFILTER:-yes}"
|
|
69
|
+
|
|
38
70
|
# ---------- Bypass ----------
|
|
39
71
|
if [ "${BYPASS_RISK_GATE:-0}" = "1" ]; then
|
|
40
72
|
exit 0
|
|
@@ -173,31 +205,37 @@ print(json.dumps({'systemMessage': sys.argv[1]}))
|
|
|
173
205
|
}
|
|
174
206
|
|
|
175
207
|
# ---------- Advisory-only fallback when policy file is absent ----------
|
|
176
|
-
if [ ! -f "
|
|
177
|
-
permit_with_advisory "
|
|
208
|
+
if [ ! -f "$EXTERNAL_COMMS_POLICY_FILE" ]; then
|
|
209
|
+
permit_with_advisory "$EXTERNAL_COMMS_POLICY_FILE not found — $EXTERNAL_COMMS_SUBAGENT_TYPE gate is advisory-only on $SURFACE."
|
|
178
210
|
exit 0
|
|
179
211
|
fi
|
|
180
212
|
|
|
181
|
-
# ---------- Hard-fail leak-pattern pre-filter ----------
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
213
|
+
# ---------- Hard-fail leak-pattern pre-filter (risk evaluator only) ----------
|
|
214
|
+
# Voice-tone evaluator skips this — leak detection is the risk evaluator's
|
|
215
|
+
# concern. Each per-package external-comms-evaluator.conf sets
|
|
216
|
+
# EXTERNAL_COMMS_LEAK_PREFILTER=yes (risk) or =no (voice-tone).
|
|
217
|
+
if [ "$EXTERNAL_COMMS_LEAK_PREFILTER" = "yes" ]; then
|
|
218
|
+
if ! leak_detect_scan "$DRAFT"; then
|
|
219
|
+
REASON=$(printf 'BLOCKED (external-comms gate / %s evaluator): %s on %s. Remove the leak before retrying. Override only if intentional: BYPASS_RISK_GATE=1.' \
|
|
220
|
+
"$EXTERNAL_COMMS_EVALUATOR_ID" "$LEAK_DETECT_REASON" "$SURFACE")
|
|
221
|
+
deny_with_reason "$REASON"
|
|
222
|
+
exit 0
|
|
223
|
+
fi
|
|
187
224
|
fi
|
|
188
225
|
|
|
189
|
-
# ---------- Marker-based gate ----------
|
|
226
|
+
# ---------- Marker-based gate (per-evaluator marker per ADR-028 amended 2026-05-14) ----------
|
|
190
227
|
SESSION_DIR="${TMPDIR:-/tmp}/claude-risk-${SESSION_ID}"
|
|
191
228
|
mkdir -p "$SESSION_DIR"
|
|
192
229
|
KEY=$(printf '%s\n%s' "$DRAFT" "$SURFACE" | shasum -a 256 | cut -d' ' -f1)
|
|
193
|
-
MARKER="${SESSION_DIR}/external-comms-reviewed-${KEY}"
|
|
230
|
+
MARKER="${SESSION_DIR}/external-comms-${EXTERNAL_COMMS_EVALUATOR_ID}-reviewed-${KEY}"
|
|
194
231
|
|
|
195
232
|
if [ -f "$MARKER" ]; then
|
|
196
233
|
exit 0
|
|
197
234
|
fi
|
|
198
235
|
|
|
199
236
|
# Marker absent — deny + delegate.
|
|
200
|
-
|
|
201
|
-
|
|
237
|
+
VERDICT_PREFIX="${EXTERNAL_COMMS_VERDICT_PREFIX:-EXTERNAL_COMMS_${EXTERNAL_COMMS_EVALUATOR_ID^^}}"
|
|
238
|
+
REASON=$(printf 'BLOCKED (external-comms gate / %s evaluator): %s draft has not been reviewed by %s. Delegate to %s (subagent_type: '"'"'%s'"'"') with the draft body for review. The PostToolUse hook will mark this draft reviewed when the subagent emits %s_VERDICT: PASS. Use %s for an interactive walkthrough. Override only when intentional: BYPASS_RISK_GATE=1.' \
|
|
239
|
+
"$EXTERNAL_COMMS_EVALUATOR_ID" "$SURFACE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$EXTERNAL_COMMS_SUBAGENT_TYPE" "$VERDICT_PREFIX" "$EXTERNAL_COMMS_ASSESS_SKILL")
|
|
202
240
|
deny_with_reason "$REASON"
|
|
203
241
|
exit 0
|
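The per-evaluator marker scheme in the hunk above can be sketched as two small functions. This is an illustrative sketch under stated assumptions: `sha256_hex`, `marker_path`, and `verdict_prefix` are hypothetical names, and the `sha256sum` fallback is added here for portability (the hook itself calls `shasum -a 256`). The key is still `sha256(draft + "\n" + surface)`; only the marker file name gains the evaluator id, so two evaluators reviewing the same draft write distinct markers.

```shell
# Hypothetical wrapper: prefer sha256sum, fall back to the
# "shasum -a 256" call that the hook uses.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# marker_path SESSION_DIR EVALUATOR_ID DRAFT SURFACE
# Prints the marker file path for one evaluator and one draft/surface pair.
marker_path() {
  local key
  key=$(printf '%s\n%s' "$3" "$4" | sha256_hex)
  printf '%s/external-comms-%s-reviewed-%s\n' "$1" "$2" "$key"
}

# verdict_prefix EVALUATOR_ID
# Portable stand-in for the ${EXTERNAL_COMMS_EVALUATOR_ID^^} expansion.
verdict_prefix() {
  printf 'EXTERNAL_COMMS_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}
```

The `${VAR^^}` expansion used in the hunk requires bash 4.0 or later, so the `tr` form is one option where an older bash may run the hook. An id containing a hyphen, such as `voice-tone`, uppercases to `VOICE-TONE`; a conf that wants a different token would presumably set `EXTERNAL_COMMS_VERDICT_PREFIX` explicitly.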
package/hooks/risk-score-mark.sh
CHANGED
|
@@ -204,9 +204,12 @@ if echo "$SUBAGENT" | grep -qE 'risk-scorer.policy'; then
|
|
|
204
204
|
fi
|
|
205
205
|
|
|
206
206
|
# ---------------------------------------------------------------------------
|
|
207
|
-
# External-comms reviewer (P064 / ADR-028 amended): write
|
|
208
|
-
# keyed on sha256(draft + '\n' + surface). Subagent
|
|
209
|
-
# trusts and uses it. Marker file:
|
|
207
|
+
# External-comms reviewer (P064 / ADR-028 amended 2026-05-14): write
|
|
208
|
+
# per-evaluator marker keyed on sha256(draft + '\n' + surface). Subagent
|
|
209
|
+
# emits the key; this hook trusts and uses it. Marker file:
|
|
210
|
+
# external-comms-risk-reviewed-<key>. The voice-tone evaluator (P038)
|
|
211
|
+
# writes its own peer marker external-comms-voice-tone-reviewed-<key>
|
|
212
|
+
# from packages/voice-tone/hooks/external-comms-mark-reviewed.sh.
|
|
210
213
|
# ---------------------------------------------------------------------------
|
|
211
214
|
if echo "$SUBAGENT" | grep -qE 'risk-scorer.external-comms'; then
|
|
212
215
|
VERDICT_LINE=$(echo "$AGENT_OUTPUT" | grep -E '^EXTERNAL_COMMS_RISK_VERDICT:' | tail -1) || true
|
|
@@ -216,7 +219,7 @@ if echo "$SUBAGENT" | grep -qE 'risk-scorer.external-comms'; then
|
|
|
216
219
|
# Validate key: 64 hex chars (sha256 output). Reject anything else.
|
|
217
220
|
if echo "$KEY" | grep -qE '^[0-9a-f]{64}$'; then
|
|
218
221
|
case "$VERDICT" in
|
|
219
|
-
PASS) touch "${RDIR}/external-comms-reviewed-${KEY}" ;;
|
|
222
|
+
PASS) touch "${RDIR}/external-comms-risk-reviewed-${KEY}" ;;
|
|
220
223
|
FAIL) ;; # Do NOT create marker — draft must be revised
|
|
221
224
|
*) ;; # Unknown verdict — fail closed
|
|
222
225
|
esac
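The marker-writing branch above can be sketched end to end as follows. This is a simplified illustration, not the hook's code: `field` and `mark_if_pass` are hypothetical helpers, and the `EXTERNAL_COMMS_RISK_KEY:` line name is an assumption, since the hunk shows only the `EXTERNAL_COMMS_RISK_VERDICT:` line. The sketch keeps the same fail-closed shape: a marker is created only for a `PASS` verdict with a well-formed key.

```shell
# field NAME TEXT: print the value of the last "NAME: value" line in TEXT.
field() {
  printf '%s\n' "$2" | grep -E "^$1:" | tail -1 | sed 's/^[^:]*:[[:space:]]*//'
}

# mark_if_pass RDIR AGENT_OUTPUT
# Returns 0 and creates the marker only on PASS with a valid key.
mark_if_pass() {
  local verdict key
  verdict=$(field EXTERNAL_COMMS_RISK_VERDICT "$2")
  key=$(field EXTERNAL_COMMS_RISK_KEY "$2")   # assumed line name
  # Accept only 64 lowercase hex characters (sha256 output). This also
  # rejects path separators, so the key cannot escape RDIR.
  printf '%s\n' "$key" | grep -qE '^[0-9a-f]{64}$' || return 1
  case "$verdict" in
    PASS) touch "$1/external-comms-risk-reviewed-$key" ;;
    *)    return 1 ;;   # FAIL or unknown verdict: no marker
  esac
}
```

Validating the key before using it in a file name matters because the subagent emits the key and the hook trusts it; the regular expression is the only guard between agent output and the filesystem path.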
|
package/hooks/test/external-comms-gate.bats
CHANGED
|
@@ -113,11 +113,11 @@ run_hook() {
|
|
|
113
113
|
[ -z "$output" ]
|
|
114
114
|
}
|
|
115
115
|
|
|
116
|
-
@test "marker
|
|
116
|
+
@test "per-evaluator marker (external-comms-risk-reviewed-<KEY>) allows the call (ADR-028 amended 2026-05-14)" {
|
|
117
117
|
DRAFT="we observed a build failure on Node 20"
|
|
118
118
|
SURFACE="gh-issue-create"
|
|
119
119
|
KEY=$(printf '%s\n%s' "$DRAFT" "$SURFACE" | shasum -a 256 | cut -d' ' -f1)
|
|
120
|
-
touch "${RDIR}/external-comms-reviewed-${KEY}"
|
|
120
|
+
touch "${RDIR}/external-comms-risk-reviewed-${KEY}"
|
|
121
121
|
|
|
122
122
|
INPUT=$(build_bash_input "gh issue create --title T --body '$DRAFT'")
|
|
123
123
|
run_hook "$INPUT"
|
|
@@ -125,6 +125,20 @@ run_hook() {
|
|
|
125
125
|
[ -z "$output" ]
|
|
126
126
|
}
|
|
127
127
|
|
|
128
|
+
@test "legacy combined marker (external-comms-reviewed-<KEY>) does NOT satisfy the risk gate (P038 per-evaluator scheme)" {
|
|
129
|
+
DRAFT="we observed a build failure on Node 20"
|
|
130
|
+
SURFACE="gh-issue-create"
|
|
131
|
+
KEY=$(printf '%s\n%s' "$DRAFT" "$SURFACE" | shasum -a 256 | cut -d' ' -f1)
|
|
132
|
+
# Pre-amendment combined marker — should NOT satisfy the new per-evaluator gate.
|
|
133
|
+
touch "${RDIR}/external-comms-reviewed-${KEY}"
|
|
134
|
+
|
|
135
|
+
INPUT=$(build_bash_input "gh issue create --title T --body '$DRAFT'")
|
|
136
|
+
run_hook "$INPUT"
|
|
137
|
+
[ "$status" -eq 0 ]
|
|
138
|
+
[[ "$output" == *"deny"* ]]
|
|
139
|
+
[[ "$output" == *"wr-risk-scorer:external-comms"* ]]
|
|
140
|
+
}
|
|
141
|
+
|
|
128
142
|
@test "RISK-POLICY.md absent yields advisory-only mode (permits)" {
|
|
129
143
|
rm -f "$TEST_PROJECT_DIR/RISK-POLICY.md"
|
|
130
144
|
INPUT=$(build_bash_input "gh issue create --title T --body 'we observed a failure'")
|
package/package.json
CHANGED
package/skills/assess-inbound-report/SKILL.md
ADDED
|
@@ -0,0 +1,131 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: wr-risk-scorer:assess-inbound-report
|
|
3
|
+
description: On-demand inbound-report risk review. Reviews a third-party submission against this repo's intake (problem-report issue body, Q&A discussion, security-advisory body) for Request-risk (info-extraction / backdoor request / malicious-code injection) and Fix-risk (privilege escalation / removal of load-bearing safety check / adopter-attack-surface expansion) per RISK-POLICY.md. Delegates to wr-risk-scorer:inbound-report and emits the structured verdict consumed by ADR-062's assessment-pipeline branch routing.
|
|
4
|
+
allowed-tools: Read, Glob, Grep, Bash, AskUserQuestion, Skill
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Inbound-Report Risk Assessment Skill
|
|
8
|
+
|
|
9
|
+
Run a Request-risk + Fix-risk review on demand against a single inbound report — outside the `/wr-itil:review-problems` Step 8.5 assessment-pipeline trigger. Maintainer-facing pre-flight surface per JTBD-005 + JTBD-202; the assessment-pipeline itself invokes the same `wr-risk-scorer:inbound-report` subagent in-loop per ADR-062 § Decision Outcome step 3.
|
|
10
|
+
|
|
11
|
+
This skill is **read-only**. It does not commit, post comments upstream, or modify the inbound report. The marker (when the skill is invoked as a pre-satisfier for the pipeline's per-report gate) is written automatically by the `PostToolUse:Agent` hook (`risk-score-mark.sh`) after the subagent completes — the skill never writes to `${TMPDIR:-/tmp}/claude-risk-*` directly.
|
|
12
|
+
|
|
13
|
+
## When to use
|
|
14
|
+
|
|
15
|
+
- Before running `/wr-itil:review-problems` when a specific inbound report stands out as ambiguous (e.g. a discussion that mixes a legitimate feature request with a question that smells like info-extraction) — pre-flight the classification.
|
|
16
|
+
- After spotting a suspicious submission via `gh issue list` and wanting a second-pass review before the pipeline runs.
|
|
17
|
+
- During a retro on a misclassified prior report (Reassessment Criterion 1 in ADR-062 — false-positive rate exceeds ~10%) — replay the body through the subagent to surface why the prior verdict landed.
|
|
18
|
+
- As part of a P123 block-list eligibility review (a clear-malicious verdict here is the evidence chain the block-list scaffolding consumes when P123 lands).
|
|
19
|
+
|
|
20
|
+
## Steps
|
|
21
|
+
|
|
22
|
+
### 1. Parse arguments
|
|
23
|
+
|
|
24
|
+
Read `$ARGUMENTS` for any of:
|
|
25
|
+
|
|
26
|
+
- A report body verbatim (e.g. the user pastes the issue body).
|
|
27
|
+
- A `gh issue URL` or `<repo>#<issue-number>` reference — the skill fetches the body via `gh issue view --json body,author,labels`.
|
|
28
|
+
- A surface hint (`github-issues`, `github-discussions`, `github-security-advisories`).
|
|
29
|
+
- A submitter handle (`@user` or `user`).
|
|
30
|
+
- A JTBD-alignment hint from the assessment-pipeline (`aligned-with-existing-JTBD` / `aligned-with-new-JTBD-for-existing-persona` / `not-aligned`). Optional when invoked manually; required when invoked as a pipeline pre-satisfier.
|
|
31
|
+
|
|
32
|
+
If both body and surface are present, proceed to step 3. If either is missing, continue with step 2.
|
|
33
|
+
|
|
34
|
+
### 2. Resolve missing context
|
|
35
|
+
|
|
36
|
+
If the body is missing AND a `gh issue URL` / `<repo>#<issue-number>` reference was supplied, fetch:
|
|
37
|
+
|
|
38
|
+
```bash
|
|
39
|
+
gh issue view "$ref" --json body,author,title,labels --jq '.'
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
Cache the JSON for downstream steps. Fail soft on GitHub API errors: surface the error to the user and fall back to `AskUserQuestion`.
|
|
43
|
+
|
|
44
|
+
If the body is still missing, use `AskUserQuestion`:
|
|
45
|
+
|
|
46
|
+
> "What report do you want me to review? Paste the body verbatim, or give me a `gh issue URL`."
|
|
47
|
+
|
|
48
|
+
If the surface is missing AND cannot be inferred (from the URL pattern or context), use `AskUserQuestion`:
|
|
49
|
+
|
|
50
|
+
- header: "Inbound surface"
|
|
51
|
+
- options:
|
|
52
|
+
1. `github-issues` (problem-report.yml or similar labelled issue)
|
|
53
|
+
2. `github-discussions` (Q&A category)
|
|
54
|
+
3. `github-security-advisories` (private vendor channel)
|
|
55
|
+
|
|
56
|
+
Do not ask if the surface is obvious from the URL / context.
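One way the surface could be inferred from a reference before falling back to `AskUserQuestion` is sketched below. The skill does not specify an inference rule, so `infer_surface` and its patterns are assumptions based on the usual shape of GitHub issue, discussion, and security-advisory URLs.

```shell
# infer_surface REF
# Prints a canonical surface string, or returns 1 when nothing matches
# (the caller would then ask the user).
infer_surface() {
  case "$1" in
    */security/advisories/*) echo "github-security-advisories" ;;
    */discussions/*)         echo "github-discussions" ;;
    */issues/*)              echo "github-issues" ;;
    *'#'[0-9]*)              echo "github-issues" ;;  # <repo>#<issue-number>
    *)                       return 1 ;;
  esac
}
```

The function is meant for references only. A pasted report body that happens to contain `/issues/` would match too, so a caller should apply it to the URL or `<repo>#<issue-number>` argument and not to free text.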
|
|
57
|
+
|
|
58
|
+
### 3. Construct the review prompt
|
|
59
|
+
|
|
60
|
+
Build a self-contained prompt for the `wr-risk-scorer:inbound-report` subagent that includes:
|
|
61
|
+
|
|
62
|
+
- The **report body** verbatim (between explicit `<report>...</report>` markers so the agent's substring extraction is unambiguous).
|
|
63
|
+
- The **surface** (one of the canonical strings above).
|
|
64
|
+
- The **submitter handle** when known.
|
|
65
|
+
- The **JTBD-alignment hint** when known (composes with the agent's two-axis judgement).
|
|
66
|
+
- A reminder to compute `INBOUND_REPORT_KEY = sha256(body + '\n' + surface + '\n' + submitter)`.
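The prompt assembly and key formula listed above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `surface:` / `submitter:` / `jtbd-alignment:` field labels, and the `sha256sum` fallback are all hypothetical; only the `<report>...</report>` markers and the `sha256(body + '\n' + surface + '\n' + submitter)` formula come from the skill text.

```shell
# Hypothetical wrapper: prefer sha256sum, fall back to "shasum -a 256".
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# inbound_report_key BODY SURFACE SUBMITTER
inbound_report_key() {
  printf '%s\n%s\n%s' "$1" "$2" "$3" | sha256_hex
}

# build_review_prompt BODY SURFACE [SUBMITTER] [JTBD_HINT]
build_review_prompt() {
  printf '<report>\n%s\n</report>\n' "$1"
  printf 'surface: %s\n' "$2"
  if [ -n "${3:-}" ]; then printf 'submitter: %s\n' "$3"; fi
  if [ -n "${4:-}" ]; then printf 'jtbd-alignment: %s\n' "$4"; fi
  printf 'Compute INBOUND_REPORT_KEY = sha256(body + "\\n" + surface + "\\n" + submitter).\n'
}
```

Because the key is a hash of exact bytes, both sides need to agree on normalisation. Shell command substitution strips trailing newlines from a captured body, and `@user` hashes differently from `user`; the skill text does not say which form is canonical, so that choice is left open here.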
|
|
67
|
+
|
|
68
|
+
### 4. Delegate to wr-risk-scorer:inbound-report
|
|
69
|
+
|
|
70
|
+
Invoke the subagent via the Skill / Agent tool with `subagent_type: wr-risk-scorer:inbound-report` and the constructed review prompt.
|
|
71
|
+
|
|
72
|
+
Wait for the subagent to complete. The subagent will output a structured verdict block (`INBOUND_REPORT_VERDICT: PASS|FAIL` + `INBOUND_REPORT_KEY: <sha>` + `INBOUND_REPORT_CLASS: <class>` + optional `INBOUND_REPORT_REASON: ...`). The `PostToolUse:Agent` hook (`risk-score-mark.sh`) reads that output and writes the per-report marker automatically.
|
|
73
|
+
|
|
74
|
+
**Do not write to `${TMPDIR:-/tmp}/claude-risk-*` yourself.** The hook is the only correct mechanism.
|
|
75
|
+
|
|
76
|
+
### 5. Present results
|
|
77
|
+
|
|
78
|
+
Present the full review report to the user. Highlight:
|
|
79
|
+
|
|
80
|
+
- The verdict (PASS / FAIL).
|
|
81
|
+
- The classification (`safe-low-fix-risk` / `safe-high-fix-risk` / `above-threshold-risk` / `clear-malicious-request`).
|
|
82
|
+
- The matched RISK-POLICY.md class + axis (Request-risk / Fix-risk) when FAIL.
|
|
83
|
+
- The exact substrings or metadata signals that triggered each finding when FAIL.
|
|
84
|
+
- The pipeline branch this report would route to under ADR-062 § Decision Outcome (pushback / clear-malicious-close-with-verdict / safe-and-valid-local-ticket-create).
|
|
85
|
+
- For `safe-high-fix-risk`: the fix-risk class the maintainer should weigh before accepting the local ticket (the pipeline creates the ticket but flags it for maintainer attention).
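The class-to-branch routing named in the list above might look like the sketch below. The class and branch names are taken from this skill text, but the mapping itself is an assumption (ADR-062 is not part of this diff); in particular, sending `above-threshold-risk` to `pushback` is one plausible reading. `verdict_field` and `route_branch` are hypothetical helper names.

```shell
# verdict_field NAME BLOCK: value of the last "NAME: value" line in BLOCK.
verdict_field() {
  printf '%s\n' "$2" | grep -E "^$1:" | tail -1 | sed 's/^[^:]*:[[:space:]]*//'
}

# route_branch CLASS: print the pipeline branch for a classification.
route_branch() {
  case "$1" in
    safe-low-fix-risk|safe-high-fix-risk)
      echo "safe-and-valid-local-ticket-create" ;;
    above-threshold-risk)
      echo "pushback" ;;                      # assumed mapping
    clear-malicious-request)
      echo "clear-malicious-close-with-verdict" ;;
    *)
      echo "unknown class: $1" >&2
      return 1 ;;                             # unknown class: no branch
  esac
}
```

For example, `route_branch "$(verdict_field INBOUND_REPORT_CLASS "$block")"` would resolve the branch from a captured verdict block held in `$block`. Returning non-zero for an unrecognised class keeps the decision mechanical without guessing.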
|
|
86
|
+
|
|
87
|
+
### 6. Above-appetite handling (ADR-013 Rule 6 + ADR-062 mechanical-stage carve-out)
|
|
88
|
+
|
|
89
|
+
The branch decision itself is **mechanical** per ADR-062 § Mechanical-stage carve-out (P132). When invoked as a pipeline pre-satisfier, this skill does NOT use `AskUserQuestion` to ask the maintainer "which branch?" — the verdict + class determine the branch deterministically. The maintainer's role is to accept or override the verdict via re-running with corrections, not to pick the branch.
|
|
90
|
+
|
|
91
|
+
When invoked manually as an on-demand pre-flight (NOT as a pipeline pre-satisfier), surface a single `AskUserQuestion` for what the maintainer wants to do next:
|
|
92
|
+
|
|
93
|
+
- header: "Next step"
|
|
94
|
+
- options:
|
|
95
|
+
1. `Accept verdict + run pipeline` — the maintainer agrees with the classification; `/wr-itil:review-problems` will route accordingly on the next invocation.
|
|
96
|
+
2. `Override + re-review with extra context` — the maintainer disagrees; pass extra context (e.g. "this submitter is a known good-faith contributor in `<other-repo>`") and re-invoke from step 3.
|
|
97
|
+
3. `Block reporter (P123 scaffolding)` — surface the audit-log entry for P123 block-list enforcement when that ticket lands. Until then, this option appends to `docs/audits/inbound-discovery-log.md` only.
|
|
98
|
+
4. `Cancel` — abandon the pre-flight; report intact for later review.
|
|
99
|
+
|
|
100
|
+
When invoked as a pipeline pre-satisfier (via the `/wr-itil:review-problems` Step 8.5 orchestrator), the skill is silent on this step per the mechanical-stage carve-out.
|
|
101
|
+
|
|
102
|
+
## Composition with the assessment-pipeline
|
|
103
|
+
|
|
104
|
+
This skill and the assessment-pipeline (ADR-062 § Decision Outcome) invoke the same `wr-risk-scorer:inbound-report` subagent. The skill is the maintainer-facing manual surface; the pipeline is the automated bulk-processing surface. Verdict shape is identical across both invocation paths (same `INBOUND_REPORT_VERDICT` + `INBOUND_REPORT_KEY` + `INBOUND_REPORT_CLASS` block); the consuming infrastructure (per-report marker, audit-log append, branch routing) is the same.
|
|
105
|
+
|
|
106
|
+
| Concern | This skill (on-demand) | `/wr-itil:review-problems` Step 8.5 (pipeline) |
|
|
107
|
+
|---------|------------------------|-----------------------------------------------|
|
|
108
|
+
| Invocation | Manual / pre-flight (JTBD-005, JTBD-202) | Automatic, in-loop with channel-config polling |
|
|
109
|
+
| Cardinality | One report per invocation | N reports per pass (channel-config drives N) |
|
|
110
|
+
| Branch decision | Per ADR-062 § Decision Outcome; mechanical | Same |
|
|
111
|
+
| Audit-log append | Yes (via PostToolUse hook) | Yes (via PostToolUse hook) |
|
|
112
|
+
| README rankings impact | None (skill is read-only) | Refreshes the `## Inbound Upstream Reports` section in `docs/problems/README.md` (Step 9e) |
|
|
113
|
+
| AskUserQuestion authority | step 6 above (manual only) | None (mechanical-stage carve-out per P132) |
|
|
114
|
+
|
|
115
|
+
## ADR cross-references
|
|
116
|
+
|
|
117
|
+
- **ADR-062** (Inbound upstream-report discovery + assessment pipeline) — § Sibling subagent + § Mechanical-stage carve-out.
|
|
118
|
+
- **ADR-015** (On-demand assessment skills) — § Scope table extended with the `assess-inbound-report` row; § Naming Convention `assess-<artifact>` pattern; § Gate Marker Interaction (no skill-side marker writes).
|
|
119
|
+
- **ADR-009** (Gate marker lifecycle) — per-report marker TTL + drift discipline; same as the existing `external-comms-gate` marker.
|
|
120
|
+
- **ADR-013 Rule 1** + Rule 6 — `AskUserQuestion` only at maintainer-direction branches; mechanical-stage carve-out applies to pipeline invocations.
|
|
121
|
+
- **ADR-014** — assessment skills are read-only and exempt from commit obligation.
|
|
122
|
+
- **ADR-028** (External-comms gate, amended) — the pushback / clear-malicious-verdict comments the assessment-pipeline posts after this skill's FAIL verdict ride the P064 + P038 evaluator halves.
|
|
123
|
+
- **ADR-029** (Diagnose before implement) — verdict follows hypothesis / evidence / structured-verdict discipline.
|
|
124
|
+
- **ADR-044** — decision-delegation contract; mechanical-stage carve-out is the category-4 framework-resolution boundary.
|
|
125
|
+
- **P079** — parent ticket; this skill is Slice B per RFC-004.
|
|
126
|
+
- **P123** — blocked-user-list mechanism; composes with the `Block reporter` option in step 6.
|
|
127
|
+
- **JTBD-005** (Invoke Governance Assessments On Demand) — primary persona driver.
|
|
128
|
+
- **JTBD-202** (Pre-Flight Governance Checks) — secondary persona driver.
|
|
129
|
+
- **JTBD-001** (Enforce Governance Without Slowing Down) — mechanical-stage carve-out preserves "without slowing down".
|
|
130
|
+
|
|
131
|
+
$ARGUMENTS
|
package/skills/assess-inbound-report/test/assess-inbound-report-contract.bats
ADDED
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
# Contract assertions for /wr-risk-scorer:assess-inbound-report skill
|
|
3
|
+
# (RFC-004 Slice B — on-demand wrapper per ADR-015). Peer of
|
|
4
|
+
# /wr-risk-scorer:assess-external-comms.
|
|
5
|
+
#
|
|
6
|
+
# Structural assertions — Permitted Exception to the source-grep ban
|
|
7
|
+
# per ADR-005 / P011 / ADR-037 / ADR-052 § Surface 2. SKILL.md prose
|
|
8
|
+
# governs LLM-driven runtime behaviour; behavioural-replay testing
|
|
9
|
+
# requires a synthetic agent harness (P012 / P176). Until that harness
|
|
10
|
+
# lands, contract bats assert the load-bearing contract elements are
|
|
11
|
+
# present so future edits don't silently strip them.
|
|
12
|
+
#
|
|
13
|
+
# @problem P079
|
|
14
|
+
# @rfc RFC-004 (Slice B)
|
|
15
|
+
# @adr ADR-062 (sibling subagent + on-demand wrapper)
|
|
16
|
+
# @adr ADR-015 (on-demand assessment skills — § Scope table extended)
|
|
17
|
+
# @adr ADR-044 (decision-delegation — taste / silent-mechanical authority)
|
|
18
|
+
# @jtbd JTBD-005 (invoke governance assessments on demand)
|
|
19
|
+
# @jtbd JTBD-202 (pre-flight governance checks before release/handover)
|
|
20
|
+
# @jtbd JTBD-001 (mechanical-stage carve-out on pipeline pre-satisfier path)
|
|
21
|
+
|
|
22
|
+
setup() {
|
|
23
|
+
SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
|
|
24
|
+
SKILL_FILE="${SKILL_DIR}/SKILL.md"
|
|
25
|
+
ADR_015="$(cd "${SKILL_DIR}/../../../.." && pwd)/docs/decisions/015-on-demand-assessment-skills.proposed.md"
|
|
26
|
+
}
|
|
27
|
+
|
|
28
|
+
@test "SKILL.md exists and has frontmatter" {
|
|
29
|
+
[ -f "$SKILL_FILE" ]
|
|
30
|
+
run head -1 "$SKILL_FILE"
|
|
31
|
+
[ "$status" -eq 0 ]
|
|
32
|
+
[ "$output" = "---" ]
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
@test "frontmatter name is wr-risk-scorer:assess-inbound-report" {
|
|
36
|
+
run grep -nE '^name: wr-risk-scorer:assess-inbound-report$' "$SKILL_FILE"
|
|
37
|
+
[ "$status" -eq 0 ]
|
|
38
|
+
}
|
|
39
|
+
|
|
40
|
+
@test "frontmatter allowed-tools includes Skill (delegates to subagent)" {
|
|
41
|
+
# ADR-015 § Gate Marker Interaction: on-demand skills MUST delegate
|
|
42
|
+
# via Skill tool; never write markers directly.
|
|
43
|
+
run grep -nE '^allowed-tools:.*Skill' "$SKILL_FILE"
|
|
44
|
+
[ "$status" -eq 0 ]
|
|
45
|
+
}
|
|
46
|
+
|
|
47
|
+
@test "frontmatter allowed-tools includes AskUserQuestion (manual-mode step 6)" {
|
|
48
|
+
# Step 6 (manual invocation only — silent on pipeline pre-satisfier
|
|
49
|
+
# invocations per P132) uses AskUserQuestion to surface next-step
|
|
50
|
+
# options.
|
|
51
|
+
run grep -nE '^allowed-tools:.*AskUserQuestion' "$SKILL_FILE"
|
|
52
|
+
[ "$status" -eq 0 ]
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
@test "frontmatter allowed-tools includes Bash (gh issue fetch in step 2)" {
|
|
56
|
+
# Step 2 can call `gh issue view --json body,author,labels` to fetch
|
|
57
|
+
# the report body when only a URL/ref is supplied.
|
|
58
|
+
run grep -nE '^allowed-tools:.*Bash' "$SKILL_FILE"
|
|
59
|
+
[ "$status" -eq 0 ]
|
|
60
|
+
}
|
|
61
|
+
|
|
62
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
63
|
+
# Delegation to the sibling subagent (NOT marker self-writes)
|
|
64
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
65
|
+
|
|
66
|
+
@test "skill delegates to wr-risk-scorer:inbound-report subagent" {
|
|
67
|
+
run grep -nE 'wr-risk-scorer:inbound-report' "$SKILL_FILE"
|
|
68
|
+
[ "$status" -eq 0 ]
|
|
69
|
+
}
|
|
70
|
+
|
|
71
|
+
@test "skill MUST NOT write to /tmp/ markers directly (ADR-009 + ADR-015 boundary)" {
|
|
72
|
+
# PostToolUse:Agent hook (risk-score-mark.sh) owns marker writes per
|
|
73
|
+
# ADR-009 + ADR-015 § Gate Marker Interaction.
|
|
74
|
+
run grep -inE 'NOT write.*/tmp|PostToolUse hook' "$SKILL_FILE"
|
|
75
|
+
[ "$status" -eq 0 ]
|
|
76
|
+
}
|
|
77
|
+
|
|
78
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
79
|
+
# Mechanical-stage carve-out: pipeline pre-satisfier path is silent
|
|
80
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
81
|
+
|
|
82
|
+
@test "skill names the mechanical-stage carve-out (P132) for pipeline pre-satisfier path" {
|
|
83
|
+
run grep -inE 'P132|mechanical-stage carve-out' "$SKILL_FILE"
|
|
84
|
+
[ "$status" -eq 0 ]
|
|
85
|
+
}
|
|
86
|
+
|
|
87
|
+
@test "step 6 AskUserQuestion fires ONLY on manual invocation (not pipeline pre-satisfier)" {
|
|
88
|
+
# The carve-out is the load-bearing protection for JTBD-001 + JTBD-006
|
|
89
|
+
# against inverse-P078 drift. The pipeline pre-satisfier path MUST be
|
|
90
|
+
# silent on this step. Match the contract in either direction:
|
|
91
|
+
# manual-only firing OR pipeline-pre-satisfier silent-on-step.
|
|
92
|
+
run grep -inE 'invoked manually.*pre-flight|manual only|silent on this step|silent on.*pipeline pre-satisfier' "$SKILL_FILE"
|
|
93
|
+
[ "$status" -eq 0 ]
|
|
94
|
+
}
|
|
95
|
+
|
|
96
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
97
|
+
# Persona anchors (JTBD-005 + JTBD-202)
|
|
98
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
99
|
+
|
|
100
|
+
@test "skill cites JTBD-005 (invoke on demand) as primary persona driver" {
|
|
101
|
+
run grep -nE 'JTBD-005' "$SKILL_FILE"
|
|
102
|
+
[ "$status" -eq 0 ]
|
|
103
|
+
}
|
|
104
|
+
|
|
105
|
+
@test "skill cites JTBD-202 (pre-flight governance checks) as secondary persona driver" {
|
|
106
|
+
run grep -nE 'JTBD-202' "$SKILL_FILE"
|
|
107
|
+
[ "$status" -eq 0 ]
|
|
108
|
+
}
|
|
109
|
+
|
|
110
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
111
|
+
# ADR-015 Scope table row exists for assess-inbound-report
|
|
112
|
+
# ──────────────────────────────────────────────────────────────────────────────
|
|
113
|
+
|
|
114
|
+
@test "ADR-015 Scope table includes the assess-inbound-report row" {
|
|
115
|
+
[ -f "$ADR_015" ]
|
|
116
|
+
run grep -nE '`assess-inbound-report`' "$ADR_015"
|
|
117
|
+
[ "$status" -eq 0 ]
|
|
118
|
+
run grep -nE '`wr-risk-scorer:inbound-report`' "$ADR_015"
|
|
119
|
+
[ "$status" -eq 0 ]
|
|
120
|
+
}
|
|
121
|
+
|
|
122
|
+
@test "ADR-015 Confirmation checkbox covers assess-inbound-report skill" {
|
|
123
|
+
run grep -nE '\[x\] `packages/risk-scorer/skills/assess-inbound-report/SKILL\.md` created' "$ADR_015"
|
|
124
|
+
[ "$status" -eq 0 ]
|
|
125
|
+
}
|
|
126
|
+
|
|
127
|
+
@test "ADR-015 Related section names ADR-062 + P079 (driver references)" {
|
|
128
|
+
run grep -nE 'ADR-062.*inbound|inbound.*ADR-062' "$ADR_015"
|
|
129
|
+
[ "$status" -eq 0 ]
|
|
130
|
+
run grep -nE 'P079' "$ADR_015"
|
|
131
|
+
[ "$status" -eq 0 ]
|
|
132
|
+
}
|