@onlooker-community/ecosystem 0.23.1 → 0.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/.claude-plugin/marketplace.json +13 -0
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/.github/workflows/autofix.yml +65 -0
  4. package/.release-please-manifest.json +4 -3
  5. package/CHANGELOG.md +14 -0
  6. package/CLAUDE.md +1 -0
  7. package/package.json +3 -3
  8. package/plugins/assayer/.claude-plugin/plugin.json +14 -0
  9. package/plugins/assayer/CHANGELOG.md +10 -0
  10. package/plugins/assayer/README.md +114 -0
  11. package/plugins/assayer/config.json +14 -0
  12. package/plugins/assayer/docs/adr/001-verify-claims-against-transcript-evidence.md +57 -0
  13. package/plugins/assayer/docs/design.md +72 -0
  14. package/plugins/assayer/hooks/hooks.json +15 -0
  15. package/plugins/assayer/scripts/hooks/assayer-stop.sh +249 -0
  16. package/plugins/assayer/scripts/lib/assayer-config.sh +88 -0
  17. package/plugins/assayer/scripts/lib/assayer-events.sh +85 -0
  18. package/plugins/assayer/scripts/lib/assayer-extract.sh +87 -0
  19. package/plugins/assayer/scripts/lib/assayer-project-key.sh +69 -0
  20. package/plugins/assayer/scripts/lib/assayer-transcript.sh +99 -0
  21. package/plugins/assayer/scripts/lib/assayer-ulid.sh +46 -0
  22. package/plugins/assayer/scripts/lib/assayer-verify.sh +95 -0
  23. package/plugins/compass/README.md +173 -0
  24. package/plugins/counsel/README.md +98 -0
  25. package/plugins/governor/README.md +127 -0
  26. package/plugins/librarian/.claude-plugin/plugin.json +2 -2
  27. package/plugins/librarian/CHANGELOG.md +7 -0
  28. package/plugins/librarian/scripts/lib/librarian-cli.sh +339 -0
  29. package/plugins/librarian/skills/librarian/SKILL.md +63 -0
  30. package/plugins/scribe/.claude-plugin/plugin.json +1 -3
  31. package/plugins/scribe/README.md +118 -0
  32. package/plugins/warden/README.md +185 -0
  33. package/release-please-config.json +16 -0
  34. package/test/bats/assayer-config.bats +60 -0
  35. package/test/bats/assayer-events.bats +99 -0
  36. package/test/bats/assayer-extract.bats +76 -0
  37. package/test/bats/assayer-project-key.bats +58 -0
  38. package/test/bats/assayer-stop-hook.bats +81 -0
  39. package/test/bats/assayer-transcript.bats +72 -0
  40. package/test/bats/assayer-ulid.bats +31 -0
  41. package/test/bats/assayer-verify.bats +89 -0
  42. package/test/bats/librarian-cli.bats +305 -0
package/plugins/warden/README.md
@@ -0,0 +1,185 @@
1
+ # Warden
2
+
3
+ Untrusted-content gate enforcing the Agents Rule of Two.
4
+
5
+ Warden scans content flowing into the agent through `WebFetch` and `Read` for prompt-injection patterns. When it finds a threat, it closes a session-scoped **content gate** that blocks `Write`, `Edit`, `MultiEdit`, and `Bash` until the user explicitly clears it.
6
+
7
+ Grounded in Meta's *Agents Rule of Two*: an agent should hold no more than two of {access to private data, ability to take external actions, processing of untrusted content} at once. A coding agent in a real repository already holds the first two — your source and secrets, plus the ability to write files and run commands. The moment it ingests untrusted content (a fetched page, a file of unknown provenance) it holds all three: the dangerous configuration in which untrusted content can steer private data into external actions. Warden cannot un-read content, so it removes the *external-actions* property instead — closing the gate keeps the agent reading and reasoning while a human reviews the situation. Three-of-three collapses back to two-of-three, with the user as the release valve.
8
+
9
+ Warden is a sibling plugin to [`ecosystem`](../../) and assumes the Onlooker observability substrate (`~/.onlooker/`) is present.
10
+
11
+ ## How it works
12
+
13
+ Detection and enforcement are split across two hook surfaces, mediated only by the on-disk gate lock — the surfaces never call each other. See [ADR-001](docs/adr/001-detect-after-ingest-gate-before-action.md).
14
+
15
+ | Surface | What Warden does |
16
+ |---------|------------------|
17
+ | `PostToolUse` (`WebFetch`, `Read`) | Extracts ingested content from `tool_response`, applies the source and skip-glob filters and length cap, and runs the hybrid scanner. A strong pattern hit closes the gate immediately; a weak hit escalates to the evaluator. On a positive verdict it closes the session-scoped gate and emits `warden.threat.detected`. PostToolUse cannot block the read — and deliberately does not, because reading is how the threat is discovered. |
18
+ | `PreToolUse` (`Write`, `Edit`, `MultiEdit`, `Bash`) | Pure lock check: if the gate is closed it returns `{"decision":"block", …}` and emits `warden.gate.blocked`; otherwise it allows silently. No model call, no command parsing. |
19
+ | `SessionStart` | Initializes Warden for the session. A new session always starts with the gate open, even if a prior session saw a threat. |
20
+ | `/warden` skill | The user-facing control surface — reports gate status and is the only sanctioned way to clear a closed gate. |
21
+
22
+ ### Hybrid detection
23
+
24
+ Detection is a two-stage funnel, balancing coverage against cost and data egress:
25
+
26
+ 1. **Pattern floor** (`warden-patterns.sh`) — a curated regex set mapped to five threat types: `prompt_injection`, `instruction_override`, `credential_exfiltration`, `command_injection`, and `social_engineering`. **Strong** signatures (explicit override/exfil/command-injection phrasing) score `detection.strong_pattern_confidence` (default `0.9`) and close the gate with no model call. **Weak** signatures (social-engineering pressure, soft instruction-shaped imperatives) score `detection.weak_pattern_confidence` (default `0.5`) — below `close_threshold` — and are treated as borderline.
27
+ 2. **LLM escalation** (`warden-evaluator.sh`) — borderline content is sanitized and sent to N parallel Haiku judges (majority vote). The gate closes only if the panel judges it an injection with confidence `≥ close_threshold`.
28
+
29
+ Clean content (no signature) never reaches the model. Set `escalation.enabled: false` for a zero-egress, pattern-only posture.
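The two-stage funnel described above can be read as a small decision function. What follows is a minimal sketch, not Warden's actual code: the names `warden_sketch_ge` and `warden_sketch_decide` are hypothetical, the thresholds are the documented defaults, and the outcome for a weak hit with escalation disabled (the gate stays open) is an inference from the "pattern-only posture" wording rather than something the README states outright.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the detection funnel; names are illustrative only.

CLOSE_THRESHOLD="0.65"    # detection.close_threshold (documented default)
STRONG_CONFIDENCE="0.9"   # detection.strong_pattern_confidence (documented default)
WEAK_CONFIDENCE="0.5"     # detection.weak_pattern_confidence (documented default)

# Succeeds when $1 >= $2; awk handles the floating-point comparison.
warden_sketch_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a >= b) }'
}

# Usage: warden_sketch_decide <none|weak|strong> [escalation_enabled]
# Prints one of: allow, close, escalate.
warden_sketch_decide() {
  local strength="$1" escalation="${2:-true}" score
  case "$strength" in
    strong) score="$STRONG_CONFIDENCE" ;;
    weak)   score="$WEAK_CONFIDENCE" ;;
    *)      echo "allow"; return 0 ;;   # clean content never reaches the model
  esac

  if warden_sketch_ge "$score" "$CLOSE_THRESHOLD"; then
    echo "close"        # the pattern score alone meets the threshold
  elif [ "$escalation" = "true" ]; then
    echo "escalate"     # borderline: hand off to the evaluator panel
  else
    echo "allow"        # assumption: pattern-only posture leaves the gate open
  fi
}

warden_sketch_decide strong
warden_sketch_decide weak true
warden_sketch_decide weak false
```

With the default thresholds, the three calls print `close`, `escalate`, and `allow`. Raising `weak_pattern_confidence` to or above `close_threshold` would make weak hits close the gate without a model call, which is why the two values are documented together.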
30
+
31
+ ### Fail-soft posture
32
+
33
+ - Detection never blocks the read — `PostToolUse` cannot. If escalation errors, Warden falls back to the deterministic pattern verdict.
34
+ - Enforcement is a pure lock check, trivially fail-closed: a present lock always blocks.
35
+ - Event emission is best-effort; a schema-validation or emit failure is logged to stderr and never blocks a session.
36
+
37
+ ## Activation
38
+
39
+ Warden is **off by default**. Enable per-project in `.claude/settings.json`:
40
+
41
+ ```json
42
+ {
43
+ "warden": {
44
+ "enabled": true
45
+ }
46
+ }
47
+ ```
48
+
49
+ Or globally in `~/.claude/settings.json`.
50
+
51
+ ## Configuration
52
+
53
+ All keys are optional. Unset keys fall back to the plugin's `config.json` defaults.
54
+
55
+ ```json
56
+ {
57
+ "warden": {
58
+ "enabled": false,
59
+ "scan": {
60
+ "sources": ["web_fetch", "file_read"],
61
+ "max_content_chars": 20000,
62
+ "skip_globs": ["**/*.lock", "**/*.sum", "**/node_modules/**", "**/.git/**", "**/dist/**", "**/build/**"],
63
+ "store_snippet": true,
64
+ "snippet_max_chars": 240
65
+ },
66
+ "detection": {
67
+ "close_threshold": 0.65,
68
+ "strong_pattern_confidence": 0.9,
69
+ "weak_pattern_confidence": 0.5
70
+ },
71
+ "escalation": {
72
+ "enabled": true,
73
+ "borderline_only": true,
74
+ "model": "claude-haiku-4-5-20251001",
75
+ "n": 3,
76
+ "temperature": 0.0,
77
+ "max_output_tokens": 192,
78
+ "sample_timeout_seconds": 12,
79
+ "min_valid_samples": 2
80
+ },
81
+ "gate": {
82
+ "blocked_tools": ["Write", "Edit", "MultiEdit", "Bash"],
83
+ "clear_policy": "user_override_only"
84
+ }
85
+ }
86
+ }
87
+ ```
88
+
89
+ | Key | Default | Description |
90
+ |-----|---------|-------------|
91
+ | `enabled` | `false` | Must be `true` for any scanning or gating to run. |
92
+ | `scan.sources` | `["web_fetch", "file_read"]` | Which ingestion sources to scan. Matches the schema's `source_type` enum. |
93
+ | `scan.max_content_chars` | `20000` | Length cap on the content fed into detection. |
94
+ | `scan.skip_globs` | lockfiles, `node_modules`, `.git`, `dist`, `build`, … | Globs whose reads are not scanned. |
95
+ | `scan.store_snippet` | `true` | Whether to keep a flagged excerpt in the gate record and event payload. |
96
+ | `scan.snippet_max_chars` | `240` | Maximum length of a stored snippet. |
97
+ | `detection.close_threshold` | `0.65` | Confidence at or above which a verdict closes the gate. |
98
+ | `detection.strong_pattern_confidence` | `0.9` | Score assigned to strong pattern hits — above threshold, closes without a model call. |
99
+ | `detection.weak_pattern_confidence` | `0.5` | Score assigned to weak pattern hits — below threshold, escalates to the evaluator. |
100
+ | `escalation.enabled` | `true` | Whether borderline content escalates to the LLM evaluator. `false` is a zero-egress, pattern-only posture. |
101
+ | `escalation.borderline_only` | `true` | Escalate only weak/borderline hits, never clean content. |
102
+ | `escalation.model` | `claude-haiku-4-5-20251001` | Model used for the evaluator panel. |
103
+ | `escalation.n` | `3` | Number of parallel evaluator samples (majority vote). |
104
+ | `escalation.temperature` | `0.0` | Sampling temperature for the evaluator. |
105
+ | `escalation.max_output_tokens` | `192` | Token ceiling per evaluator sample. |
106
+ | `escalation.sample_timeout_seconds` | `12` | Per-sample wall-clock timeout. |
107
+ | `escalation.min_valid_samples` | `2` | Minimum valid samples required to form a verdict. |
108
+ | `gate.blocked_tools` | `["Write", "Edit", "MultiEdit", "Bash"]` | Tools blocked while the gate is closed. |
109
+ | `gate.clear_policy` | `user_override_only` | How a closed gate may be cleared. Only explicit user override is supported. |
110
+
111
+ On escalation, only a sanitized, length-capped excerpt of the ingested content is sent to the evaluator model. Setting `escalation.enabled: false` disables all egress — Warden then relies on the deterministic pattern floor alone.
112
+
113
+ ## The gate model
114
+
115
+ The gate is a single session-scoped lock with two states:
116
+
117
+ - **Open** (default — file absent, or `{"state":"open"}`) — `Write`, `Edit`, `MultiEdit`, and `Bash` are allowed.
118
+ - **Closed** (`{"state":"closed", …}`) — those operations are blocked at `PreToolUse`.
119
+
120
+ The detection hook **closes** the gate on a positive scan. Once closed, it can be **cleared only by the user** via the `/warden` skill (`clear_policy: user_override_only`) — Warden does not auto-clear in this release. The gate is session-scoped: a brand-new session starts open even if a prior session saw a threat, because the untrusted content lives in a specific session's context.
121
+
122
+ Clearing the gate re-enables write-class tools but does not remove the flagged content from the conversation — it is still in context. The skill reminds the user of this.
123
+
124
+ ## Storage layout
125
+
126
+ ```text
127
+ ~/.onlooker/warden/sessions/<session_id>/
128
+ └── gate.json
129
+ ```
130
+
131
+ `gate.json` when the gate is closed:
132
+
133
+ ```json
134
+ {
135
+ "state": "closed",
136
+ "closed_at": 1717000000,
137
+ "threat": {
138
+ "threat_id": "01J…",
139
+ "source_type": "web_fetch",
140
+ "threat_type": "credential_exfiltration",
141
+ "confidence": 0.9,
142
+ "source_url": "https://…",
143
+ "source_path": null,
144
+ "snippet": "…sanitized excerpt…",
145
+ "matched_pattern": "…",
146
+ "detection_method": "pattern_strong"
147
+ }
148
+ }
149
+ ```
150
+
151
+ The local record keeps forensic fields (`threat_id`, `matched_pattern`, `detection_method`). The emitted `warden.threat.detected` event carries only the schema-permitted fields — warden payloads use `additionalProperties: false`. State is keyed by `session_id`, not by repository: the gate guards a single session's context.
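Because enforcement is described as a pure lock check against `gate.json`, it can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the plugin's hook: the function names and the `WARDEN_SESSIONS_DIR` variable are hypothetical, the `reason` text is invented (the README only shows `{"decision":"block", …}`), and `grep` stands in for the `jq` parsing a real hook would more likely use.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a PreToolUse lock check; not Warden's actual hook.

# Root of the per-session gate records, following the storage layout above.
# WARDEN_SESSIONS_DIR is an illustrative override, not a documented variable.
WARDEN_SESSIONS_DIR="${WARDEN_SESSIONS_DIR:-${HOME:-}/.onlooker/warden/sessions}"

# Succeeds when the session's gate.json exists and records state "closed".
# An absent file counts as open, matching the documented default.
warden_sketch_gate_closed() {
  local gate="${WARDEN_SESSIONS_DIR}/$1/gate.json"
  [ -f "$gate" ] && grep -Eq '"state"[[:space:]]*:[[:space:]]*"closed"' "$gate"
}

# Usage: warden_sketch_pretooluse <session_id> <tool_name>
# Prints a block decision for a gated tool while the gate is closed;
# prints nothing (silent allow) in every other case.
warden_sketch_pretooluse() {
  local session_id="$1" tool="$2"
  case "$tool" in
    Write|Edit|MultiEdit|Bash) ;;   # gate.blocked_tools defaults
    *) return 0 ;;                  # ungated tools are never blocked
  esac
  if warden_sketch_gate_closed "$session_id"; then
    # The reason string is illustrative only.
    printf '{"decision":"block","reason":"Warden gate is closed; run /warden to review"}\n'
  fi
}
```

Calling `warden_sketch_pretooluse "$CLAUDE_SESSION_ID" Bash` prints the block decision only when that session's lock is present and closed. No content is re-scanned at this point, which is what keeps the check cheap and fail-closed.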
152
+
153
+ ## Events emitted
154
+
155
+ Warden emits the canonical `warden.*` event surface from [`@onlooker-community/schema`](https://github.com/onlooker-community/schema) (v2.4.0+). All events land in `~/.onlooker/logs/onlooker-events.jsonl` and are validated against the schema before write.
156
+
157
+ | Event | When | Payload |
158
+ |-------|------|---------|
159
+ | `warden.threat.detected` | A scan closes the gate. | `source_type`, `threat_type`, `confidence` (plus optional `source_url` / `source_path` / `snippet`) |
160
+ | `warden.gate.blocked` | A write/edit/bash operation is blocked by a closed gate. | `blocked_operation`, `threat_source_type` |
161
+ | `warden.threat.cleared` | The user clears the gate via `/warden`. | `source_type`, `cleared_by: user_override` |
162
+
163
+ ## The `/warden` skill
164
+
165
+ `/warden` is the user-facing control surface for the gate that the detection hook closes automatically.
166
+
167
+ - `/warden` or `/warden status` — prints whether the gate is OPEN or CLOSED. When closed, prints the recorded threat: `threat_type`, `source_type`, source URL/path, confidence, detection method, matched pattern, and the flagged snippet (when storage is enabled).
168
+ - `/warden clear` (also `reopen`, `override`, `unblock`) — verifies the gate is closed, removes the lock, re-enables the blocked tools (`Write`, `Edit`, `MultiEdit`, `Bash`), and emits `warden.threat.cleared` with `cleared_by: user_override`.
169
+
170
+ The skill resolves the active session automatically: it prefers `$CLAUDE_SESSION_ID`, falls back to the single closed gate when exactly one exists, and reports ambiguity if several sessions have closed gates (re-run with an explicit session id in that case). Closing is automatic; clearing is always a deliberate user decision.
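The resolution order in the paragraph above (environment variable first, then a single closed gate, otherwise ambiguity) might look roughly like this. It is a sketch only: the function name, the `WARDEN_SESSIONS_DIR` variable, the return codes, and the handling of the "no closed gate" case are all assumptions made for illustration, and `grep` again stands in for `jq`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the skill's session resolution; not Warden's code.

# WARDEN_SESSIONS_DIR is an illustrative override, not a documented variable.
WARDEN_SESSIONS_DIR="${WARDEN_SESSIONS_DIR:-${HOME:-}/.onlooker/warden/sessions}"

# Prints the session id to act on.
# Assumed return codes: 1 = no closed gate found, 2 = several closed gates.
warden_sketch_resolve_session() {
  # 1. Prefer the session id from the environment.
  if [ -n "${CLAUDE_SESSION_ID:-}" ]; then
    printf '%s\n' "$CLAUDE_SESSION_ID"
    return 0
  fi

  # 2. Otherwise count the sessions whose gate.json records "closed".
  local gate count=0 found=""
  for gate in "$WARDEN_SESSIONS_DIR"/*/gate.json; do
    [ -f "$gate" ] || continue   # an unmatched glob stays literal; skip it
    if grep -Eq '"state"[[:space:]]*:[[:space:]]*"closed"' "$gate"; then
      count=$((count + 1))
      found=$(basename "$(dirname "$gate")")
    fi
  done

  # 3. Exactly one closed gate is unambiguous; anything else is reported.
  case "$count" in
    1) printf '%s\n' "$found" ;;
    0) echo "no closed gate found" >&2; return 1 ;;
    *) echo "ambiguous: $count sessions have closed gates" >&2; return 2 ;;
  esac
}
```

Distinct return codes let a caller tell "nothing to clear" apart from "ask the user which session", which mirrors the README's instruction to re-run with an explicit session id when several gates are closed.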
171
+
172
+ ## Requirements
173
+
174
+ - The `ecosystem` plugin installed (for the `~/.onlooker/` substrate and canonical event emission).
175
+ - `claude` CLI on `PATH` (the evaluator shells out to `claude -p` when escalation is enabled).
176
+ - `jq` for JSON manipulation.
177
+ - `node` for canonical-event emission.
178
+
179
+ ## Architecture decisions
180
+
181
+ Key decisions made during initial design are recorded in [`docs/adr/`](docs/adr/):
182
+
183
+ - [ADR-001](docs/adr/001-detect-after-ingest-gate-before-action.md) — Detect after ingestion, gate before action (the detection/enforcement split and its Rule-of-Two mapping)
184
+
185
+ See also the full plugin design in [`docs/design.md`](docs/design.md).
package/release-please-config.json
@@ -206,6 +206,22 @@
206
206
  "jsonpath": "$.version"
207
207
  }
208
208
  ]
209
+ },
210
+ "plugins/assayer": {
211
+ "changelog-path": "CHANGELOG.md",
212
+ "release-type": "simple",
213
+ "bump-minor-pre-major": true,
214
+ "bump-patch-for-minor-pre-major": false,
215
+ "component": "assayer",
216
+ "draft": false,
217
+ "prerelease": false,
218
+ "extra-files": [
219
+ {
220
+ "type": "json",
221
+ "path": ".claude-plugin/plugin.json",
222
+ "jsonpath": "$.version"
223
+ }
224
+ ]
209
225
  }
210
226
  },
211
227
  "$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json"
package/test/bats/assayer-config.bats
@@ -0,0 +1,60 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # Exercises Assayer config loading: defaults and per-project overrides.
4
+
5
+ setup() {
6
+ source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
7
+ setup_test_env
8
+ PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
9
+ export CLAUDE_PLUGIN_ROOT="$PLUGIN_ROOT"
10
+ # shellcheck disable=SC1091
11
+ source "${PLUGIN_ROOT}/scripts/lib/assayer-config.sh"
12
+
13
+ REPO="${BATS_TEST_TMPDIR}/repo"
14
+ mkdir -p "${REPO}/.claude"
15
+ }
16
+
17
+ @test "disabled by default (no settings)" {
18
+ assayer_config_load "$REPO"
19
+ run assayer_config_enabled
20
+ [ "$status" -ne 0 ]
21
+ }
22
+
23
+ @test "enabled when settings opt in" {
24
+ printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
25
+ assayer_config_load "$REPO"
26
+ run assayer_config_enabled
27
+ [ "$status" -eq 0 ]
28
+ }
29
+
30
+ @test "default model is haiku" {
31
+ assayer_config_load "$REPO"
32
+ [ "$(assayer_config_model)" = "claude-haiku-4-5-20251001" ]
33
+ }
34
+
35
+ @test "model override is honored" {
36
+ printf '%s\n' '{"assayer":{"evaluation":{"model":"claude-opus-4-8"}}}' >"${REPO}/.claude/settings.json"
37
+ assayer_config_load "$REPO"
38
+ [ "$(assayer_config_model)" = "claude-opus-4-8" ]
39
+ }
40
+
41
+ @test "default max_claims is 12" {
42
+ assayer_config_load "$REPO"
43
+ [ "$(assayer_config_max_claims)" = "12" ]
44
+ }
45
+
46
+ @test "default min_confidence is 0.5" {
47
+ assayer_config_load "$REPO"
48
+ [ "$(assayer_config_min_confidence)" = "0.5" ]
49
+ }
50
+
51
+ @test "min_confidence override is honored" {
52
+ printf '%s\n' '{"assayer":{"min_confidence":0.8}}' >"${REPO}/.claude/settings.json"
53
+ assayer_config_load "$REPO"
54
+ [ "$(assayer_config_min_confidence)" = "0.8" ]
55
+ }
56
+
57
+ @test "default timeout is 60" {
58
+ assayer_config_load "$REPO"
59
+ [ "$(assayer_config_timeout)" = "60" ]
60
+ }
package/test/bats/assayer-events.bats
@@ -0,0 +1,99 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # Validates every emitted assayer.* event against @onlooker-community/schema.
4
+ #
5
+ # The assayer.* event types ship in @onlooker-community/schema; until the
6
+ # installed version includes them, these tests skip rather than fail. Once the
7
+ # ecosystem's schema dependency is bumped to a release that carries them, they
8
+ # run for real. See plugins/assayer/README.md (Requirements).
9
+
10
+ setup() {
11
+ source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
12
+ setup_test_env
13
+
14
+ PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
15
+ export CLAUDE_PLUGIN_ROOT="$PLUGIN_ROOT"
16
+ export ONLOOKER_EVENTS_LOG="${ONLOOKER_DIR}/logs/onlooker-events.jsonl"
17
+ mkdir -p "$(dirname "$ONLOOKER_EVENTS_LOG")"
18
+
19
+ export _ONLOOKER_EVENT_JS="${REPO_ROOT}/scripts/lib/onlooker-event.mjs"
20
+ export CLAUDE_SESSION_ID="bats-session-$$"
21
+
22
+ # shellcheck disable=SC1091
23
+ source "${PLUGIN_ROOT}/scripts/lib/assayer-events.sh"
24
+ }
25
+
26
+ # Skip when the installed schema predates the assayer.* event types.
27
+ _require_assayer_schema() {
28
+ if ! grep -q "assayer.audit.started" \
29
+ "${REPO_ROOT}/node_modules/@onlooker-community/schema/schemas/event.v1.json" 2>/dev/null; then
30
+ skip "installed @onlooker-community/schema has no assayer.* types yet"
31
+ fi
32
+ }
33
+
34
+ _validate_latest_event() {
35
+ local last
36
+ last=$(tail -n 1 "$ONLOOKER_EVENTS_LOG")
37
+ [ -n "$last" ] || return 1
38
+ printf '%s' "$last" | ONLOOKER_DIR="$ONLOOKER_DIR" \
39
+ node "${REPO_ROOT}/scripts/lib/onlooker-event.mjs" validate >/dev/null
40
+ }
41
+
42
+ # Valid 26-char Crockford Base32 ULID (no I, L, O, or U).
43
+ AUDIT_ID="01J0000000000000000000AB34"
44
+
45
+ @test "assayer.audit.started validates" {
46
+ _require_assayer_schema
47
+ local p
48
+ p=$(jq -n --arg a "$AUDIT_ID" '{audit_id: $a, claim_count: 3, trigger: "stop", command_count: 5}')
49
+ assayer_emit_event "assayer.audit.started" "$p"
50
+ run _validate_latest_event
51
+ [ "$status" -eq 0 ]
52
+ }
53
+
54
+ @test "assayer.claim.contradicted validates" {
55
+ _require_assayer_schema
56
+ local p
57
+ p=$(jq -n --arg a "$AUDIT_ID" '{
58
+ audit_id: $a,
59
+ claim: "I ran the tests and they all pass.",
60
+ claim_type: "tests_pass",
61
+ evidence_command: "npm test",
62
+ result_excerpt: "1 failed, 32 passed",
63
+ confidence: 0.9
64
+ }')
65
+ assayer_emit_event "assayer.claim.contradicted" "$p"
66
+ run _validate_latest_event
67
+ [ "$status" -eq 0 ]
68
+ }
69
+
70
+ @test "assayer.claim.unverified validates" {
71
+ _require_assayer_schema
72
+ local p
73
+ p=$(jq -n --arg a "$AUDIT_ID" '{audit_id: $a, claim: "The deploy is healthy.", claim_type: "generic", reason: "no_matching_command"}')
74
+ assayer_emit_event "assayer.claim.unverified" "$p"
75
+ run _validate_latest_event
76
+ [ "$status" -eq 0 ]
77
+ }
78
+
79
+ @test "assayer.audit.complete validates" {
80
+ _require_assayer_schema
81
+ local p
82
+ p=$(jq -n --arg a "$AUDIT_ID" '{
83
+ audit_id: $a, claim_count: 3, corroborated: 1, contradicted: 1,
84
+ unverified: 1, verdict: "contradictions_found", duration_ms: 4200
85
+ }')
86
+ assayer_emit_event "assayer.audit.complete" "$p"
87
+ run _validate_latest_event
88
+ [ "$status" -eq 0 ]
89
+ }
90
+
91
+ @test "emission fails on unknown event type" {
92
+ run assayer_emit_event "assayer.no.such.event" '{"audit_id":"x"}'
93
+ [ "$status" -ne 0 ]
94
+ }
95
+
96
+ @test "assayer_emit_event returns 1 when payload is empty" {
97
+ run assayer_emit_event "assayer.audit.started" ""
98
+ [ "$status" -ne 0 ]
99
+ }
package/test/bats/assayer-extract.bats
@@ -0,0 +1,76 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # Exercises claim parsing: assayer_parse_claims and the extraction prompt.
4
+
5
+ setup() {
6
+ source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
7
+ setup_test_env
8
+ PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
9
+ # shellcheck disable=SC1091
10
+ source "${PLUGIN_ROOT}/scripts/lib/assayer-extract.sh"
11
+ }
12
+
13
+ @test "parses a clean JSON array of claims" {
14
+ run assayer_parse_claims '[{"text":"tests pass","type":"tests_pass","command_keyword":"test","confidence":0.9}]'
15
+ [ "$status" -eq 0 ]
16
+ [ "$(printf '%s' "$output" | jq 'length')" -eq 1 ]
17
+ [ "$(printf '%s' "$output" | jq -r '.[0].type')" = "tests_pass" ]
18
+ }
19
+
20
+ @test "strips markdown fences" {
21
+ local raw
22
+ raw=$'```json\n[{"text":"build ok","type":"build_succeeds","command_keyword":"build","confidence":0.8}]\n```'
23
+ run assayer_parse_claims "$raw"
24
+ [ "$status" -eq 0 ]
25
+ [ "$(printf '%s' "$output" | jq 'length')" -eq 1 ]
26
+ }
27
+
28
+ @test "drops malformed entries and entries without text" {
29
+ run assayer_parse_claims '[{"text":"ok","type":"generic","confidence":0.7},{"no_text":true},{"text":""}]'
30
+ [ "$status" -eq 0 ]
31
+ [ "$(printf '%s' "$output" | jq 'length')" -eq 1 ]
32
+ }
33
+
34
+ @test "coerces unknown type to generic" {
35
+ run assayer_parse_claims '[{"text":"thing","type":"made_up","command_keyword":"x","confidence":0.7}]'
36
+ [ "$status" -eq 0 ]
37
+ [ "$(printf '%s' "$output" | jq -r '.[0].type')" = "generic" ]
38
+ }
39
+
40
+ @test "defaults confidence when missing or non-numeric" {
41
+ run assayer_parse_claims '[{"text":"thing","type":"generic","command_keyword":"x"}]'
42
+ [ "$status" -eq 0 ]
43
+ [ "$(printf '%s' "$output" | jq -r '.[0].confidence')" = "0.6" ]
44
+ }
45
+
46
+ @test "lowercases command_keyword" {
47
+ run assayer_parse_claims '[{"text":"thing","type":"generic","command_keyword":"TEST","confidence":0.7}]'
48
+ [ "$status" -eq 0 ]
49
+ [ "$(printf '%s' "$output" | jq -r '.[0].command_keyword')" = "test" ]
50
+ }
51
+
52
+ @test "non-array input yields empty array" {
53
+ run assayer_parse_claims '{"text":"not an array"}'
54
+ [ "$status" -eq 0 ]
55
+ [ "$output" = "[]" ]
56
+ }
57
+
58
+ @test "garbage input yields empty array" {
59
+ run assayer_parse_claims 'I could not find any claims.'
60
+ [ "$status" -eq 0 ]
61
+ [ "$output" = "[]" ]
62
+ }
63
+
64
+ @test "empty input yields empty array" {
65
+ run assayer_parse_claims ""
66
+ [ "$status" -eq 0 ]
67
+ [ "$output" = "[]" ]
68
+ }
69
+
70
+ @test "extraction prompt includes the message and the JSON contract" {
71
+ run assayer_build_extraction_prompt "I ran the tests and they pass." 5
72
+ [ "$status" -eq 0 ]
73
+ [[ "$output" == *"I ran the tests and they pass."* ]]
74
+ [[ "$output" == *"TESTABLE SUCCESS CLAIM"* ]]
75
+ [[ "$output" == *"at most 5 claims"* ]]
76
+ }
package/test/bats/assayer-project-key.bats
@@ -0,0 +1,58 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # Exercises Assayer project-key derivation.
4
+
5
+ setup() {
6
+ source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
7
+ setup_test_env
8
+ PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
9
+ # shellcheck disable=SC1091
10
+ source "${PLUGIN_ROOT}/scripts/lib/assayer-project-key.sh"
11
+
12
+ REPO="${BATS_TEST_TMPDIR}/repo"
13
+ mkdir -p "$REPO"
14
+ git -C "$REPO" init -q
15
+ git -C "$REPO" config user.email test@example.com
16
+ git -C "$REPO" config user.name test
17
+ (cd "$REPO" && printf 'x\n' >f && git add f && git commit -q -m init)
18
+ }
19
+
20
+ @test "key is 12 hex chars for a repo with a remote" {
21
+ git -C "$REPO" remote add origin https://example.com/foo/bar.git
22
+ run assayer_project_key "$REPO"
23
+ [ "$status" -eq 0 ]
24
+ [[ "$output" =~ ^[0-9a-f]{12}$ ]]
25
+ }
26
+
27
+ @test "key is stable across calls" {
28
+ git -C "$REPO" remote add origin https://example.com/foo/bar.git
29
+ a=$(assayer_project_key "$REPO")
30
+ b=$(assayer_project_key "$REPO")
31
+ [ "$a" = "$b" ]
32
+ }
33
+
34
+ @test "remote-keyed differs from root-keyed" {
35
+ local with_remote without_remote
36
+ without_remote=$(assayer_project_key "$REPO")
37
+ git -C "$REPO" remote add origin https://example.com/foo/bar.git
38
+ with_remote=$(assayer_project_key "$REPO")
39
+ [ "$with_remote" != "$without_remote" ]
40
+ }
41
+
42
+ @test "different remotes yield different keys" {
43
+ git -C "$REPO" remote add origin https://example.com/foo/one.git
44
+ local one
45
+ one=$(assayer_project_key "$REPO")
46
+ git -C "$REPO" remote set-url origin https://example.com/foo/two.git
47
+ local two
48
+ two=$(assayer_project_key "$REPO")
49
+ [ "$one" != "$two" ]
50
+ }
51
+
52
+ @test "non-repo cwd yields empty key" {
53
+ local non_repo="${BATS_TEST_TMPDIR}/not-a-repo"
54
+ mkdir -p "$non_repo"
55
+ run assayer_project_key "$non_repo"
56
+ [ "$status" -eq 0 ]
57
+ [ -z "$output" ]
58
+ }
package/test/bats/assayer-stop-hook.bats
@@ -0,0 +1,81 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # Exercises the Assayer Stop hook's gating behavior. Does not invoke claude -p
4
+ # (the hook bails before the extraction step when preconditions fail).
5
+ # Verifies: disabled-by-default, no-git, recursion guard, no-transcript, and
6
+ # stdout silence (advisory hook must never block Stop).
7
+
8
+ setup() {
9
+ source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
10
+ setup_test_env
11
+
12
+ PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
13
+ export CLAUDE_PLUGIN_ROOT="$PLUGIN_ROOT"
14
+ HOOK="${PLUGIN_ROOT}/scripts/hooks/assayer-stop.sh"
15
+
16
+ REPO="${BATS_TEST_TMPDIR}/repo"
17
+ mkdir -p "$REPO"
18
+ git -C "$REPO" init -q
19
+ git -C "$REPO" config user.email test@example.com
20
+ git -C "$REPO" config user.name test
21
+ (cd "$REPO" && printf 'initial\n' >README.md && git add README.md && git commit -q -m init)
22
+
23
+ TRANSCRIPT="${BATS_TEST_TMPDIR}/transcript.jsonl"
24
+ printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"All tests pass."}]}}' >"$TRANSCRIPT"
25
+ }
26
+
27
+ _make_input() {
28
+ local cwd="${1:-$REPO}" sid="${2:-test-session}" transcript="${3:-$TRANSCRIPT}"
29
+ jq -n --arg cwd "$cwd" --arg sid "$sid" --arg tp "$transcript" \
30
+ '{cwd: $cwd, session_id: $sid, transcript_path: $tp}'
31
+ }
32
+
33
+ @test "exits 0 silently when assayer.enabled is false (default)" {
34
+ local input
35
+ input=$(_make_input)
36
+ run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
37
+ [ "$status" -eq 0 ]
38
+ [ -z "$output" ]
39
+ }
40
+
41
+ @test "exits 0 when cwd is not a git repo" {
42
+ local non_repo="${BATS_TEST_TMPDIR}/not-a-repo"
43
+ mkdir -p "$non_repo"
44
+ local input
45
+ input=$(_make_input "$non_repo")
46
+ run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
47
+ [ "$status" -eq 0 ]
48
+ }
49
+
50
+ @test "recursion guard: ASSAYER_NESTED=1 causes immediate exit 0" {
51
+ mkdir -p "${REPO}/.claude"
52
+ printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
53
+ local input
54
+ input=$(_make_input)
55
+ run bash -c "printf '%s' '$input' | ASSAYER_NESTED=1 ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
56
+ [ "$status" -eq 0 ]
57
+ [ -z "$output" ]
58
+ }
59
+
60
+ @test "exits 0 when enabled but transcript is missing" {
61
+ mkdir -p "${REPO}/.claude"
62
+ printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
63
+ local input
64
+ input=$(_make_input "$REPO" "test-session" "${BATS_TEST_TMPDIR}/nope.jsonl")
65
+ run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
66
+ [ "$status" -eq 0 ]
67
+ [ -z "$output" ]
68
+ }
69
+
70
+ @test "exits 0 when enabled but final message is empty" {
71
+ mkdir -p "${REPO}/.claude"
72
+ printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
73
+ # Transcript with no assistant text turn.
74
+ local empty_transcript="${BATS_TEST_TMPDIR}/empty.jsonl"
75
+ printf '%s\n' '{"type":"user","message":{"content":[{"type":"text","text":"hi"}]}}' >"$empty_transcript"
76
+ local input
77
+ input=$(_make_input "$REPO" "test-session" "$empty_transcript")
78
+ run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
79
+ [ "$status" -eq 0 ]
80
+ [ -z "$output" ]
81
+ }
package/test/bats/assayer-transcript.bats
@@ -0,0 +1,72 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # Exercises the transcript reader against a synthetic JSONL transcript shaped
4
+ # like a real Claude Code session log.
5
+
6
+ setup() {
7
+ source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
8
+ setup_test_env
9
+ PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
10
+ # shellcheck disable=SC1091
11
+ source "${PLUGIN_ROOT}/scripts/lib/assayer-transcript.sh"
12
+
13
+ TRANSCRIPT="${BATS_TEST_TMPDIR}/transcript.jsonl"
14
+ {
15
+ # An assistant turn that runs a passing build, then a failing test.
16
+ printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"Running the build."},{"type":"tool_use","name":"Bash","id":"t1","input":{"command":"npm run build"}}]}}'
17
+ printf '%s\n' '{"type":"user","message":{"content":[{"type":"tool_result","tool_use_id":"t1","is_error":false,"content":"build ok"}]}}'
18
+ printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","id":"t2","input":{"command":"npm test"}}]}}'
19
+ printf '%s\n' '{"type":"user","message":{"content":[{"type":"tool_result","tool_use_id":"t2","is_error":true,"content":"1 failed"}]}}'
20
+ # A non-Bash tool call should be ignored.
21
+ printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","id":"t3","input":{"file_path":"x"}}]}}'
22
+ # Final assistant message with the claims.
23
+ printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. The build passes and the tests are green."}]}}'
24
+ } >"$TRANSCRIPT"
25
+ }
26
+
27
+ @test "final assistant message returns the last text turn" {
28
+ run assayer_final_assistant_message "$TRANSCRIPT" 6000
29
+ [ "$status" -eq 0 ]
30
+ [ "$output" = "Done. The build passes and the tests are green." ]
31
+ }
32
+
33
+ @test "final assistant message truncates to max_chars" {
34
+ run assayer_final_assistant_message "$TRANSCRIPT" 10
35
+ [ "$status" -eq 0 ]
36
+ [ "${#output}" -eq 10 ]
37
+ }
38
+
39
+ @test "missing transcript yields empty final message" {
40
+ run assayer_final_assistant_message "${BATS_TEST_TMPDIR}/nope.jsonl" 6000
41
+ [ "$status" -eq 0 ]
42
+ [ -z "$output" ]
43
+ }
44
+
45
+ @test "collects Bash commands with their is_error status" {
46
+ run assayer_collect_commands "$TRANSCRIPT"
47
+ [ "$status" -eq 0 ]
48
+ [ "$(printf '%s' "$output" | jq 'length')" -eq 2 ]
49
+ [ "$(printf '%s' "$output" | jq -r '.[0].command')" = "npm run build" ]
50
+ [ "$(printf '%s' "$output" | jq -r '.[0].is_error')" = "false" ]
51
+ [ "$(printf '%s' "$output" | jq -r '.[1].command')" = "npm test" ]
52
+ [ "$(printf '%s' "$output" | jq -r '.[1].is_error')" = "true" ]
53
+ }
54
+
55
+ @test "captures the failing command's output excerpt" {
56
+ run assayer_collect_commands "$TRANSCRIPT"
57
+ [ "$status" -eq 0 ]
58
+ [ "$(printf '%s' "$output" | jq -r '.[1].excerpt')" = "1 failed" ]
59
+ }
60
+
61
+ @test "non-Bash tool calls are excluded" {
62
+ run assayer_collect_commands "$TRANSCRIPT"
63
+ [ "$status" -eq 0 ]
64
+ # Only the two Bash commands, never the Read call.
65
+ [ "$(printf '%s' "$output" | jq '[.[] | select(.command | contains("file"))] | length')" -eq 0 ]
66
+ }
67
+
68
+ @test "missing transcript yields empty command array" {
69
+ run assayer_collect_commands "${BATS_TEST_TMPDIR}/nope.jsonl"
70
+ [ "$status" -eq 0 ]
71
+ [ "$output" = "[]" ]
72
+ }