@onlooker-community/ecosystem 0.23.1 → 0.25.0
- package/.claude-plugin/marketplace.json +13 -0
- package/.claude-plugin/plugin.json +1 -1
- package/.github/workflows/autofix.yml +65 -0
- package/.release-please-manifest.json +4 -3
- package/CHANGELOG.md +14 -0
- package/CLAUDE.md +1 -0
- package/package.json +3 -3
- package/plugins/assayer/.claude-plugin/plugin.json +14 -0
- package/plugins/assayer/CHANGELOG.md +10 -0
- package/plugins/assayer/README.md +114 -0
- package/plugins/assayer/config.json +14 -0
- package/plugins/assayer/docs/adr/001-verify-claims-against-transcript-evidence.md +57 -0
- package/plugins/assayer/docs/design.md +72 -0
- package/plugins/assayer/hooks/hooks.json +15 -0
- package/plugins/assayer/scripts/hooks/assayer-stop.sh +249 -0
- package/plugins/assayer/scripts/lib/assayer-config.sh +88 -0
- package/plugins/assayer/scripts/lib/assayer-events.sh +85 -0
- package/plugins/assayer/scripts/lib/assayer-extract.sh +87 -0
- package/plugins/assayer/scripts/lib/assayer-project-key.sh +69 -0
- package/plugins/assayer/scripts/lib/assayer-transcript.sh +99 -0
- package/plugins/assayer/scripts/lib/assayer-ulid.sh +46 -0
- package/plugins/assayer/scripts/lib/assayer-verify.sh +95 -0
- package/plugins/compass/README.md +173 -0
- package/plugins/counsel/README.md +98 -0
- package/plugins/governor/README.md +127 -0
- package/plugins/librarian/.claude-plugin/plugin.json +2 -2
- package/plugins/librarian/CHANGELOG.md +7 -0
- package/plugins/librarian/scripts/lib/librarian-cli.sh +339 -0
- package/plugins/librarian/skills/librarian/SKILL.md +63 -0
- package/plugins/scribe/.claude-plugin/plugin.json +1 -3
- package/plugins/scribe/README.md +118 -0
- package/plugins/warden/README.md +185 -0
- package/release-please-config.json +16 -0
- package/test/bats/assayer-config.bats +60 -0
- package/test/bats/assayer-events.bats +99 -0
- package/test/bats/assayer-extract.bats +76 -0
- package/test/bats/assayer-project-key.bats +58 -0
- package/test/bats/assayer-stop-hook.bats +81 -0
- package/test/bats/assayer-transcript.bats +72 -0
- package/test/bats/assayer-ulid.bats +31 -0
- package/test/bats/assayer-verify.bats +89 -0
- package/test/bats/librarian-cli.bats +305 -0
|
@@ -0,0 +1,185 @@
|
|
|
1
|
+
# Warden
|
|
2
|
+
|
|
3
|
+
Untrusted-content gate enforcing the Agents Rule of Two.
|
|
4
|
+
|
|
5
|
+
Warden scans content flowing into the agent through `WebFetch` and `Read` for prompt-injection patterns. When it finds a threat, it closes a session-scoped **content gate** that blocks `Write`, `Edit`, `MultiEdit`, and `Bash` until the user explicitly clears it.
|
|
6
|
+
|
|
7
|
+
Grounded in Meta's *Agents Rule of Two*: an agent should hold no more than two of {access to private data, ability to take external actions, processing of untrusted content} at once. A coding agent in a real repository already holds the first two — your source and secrets, plus the ability to write files and run commands. The moment it ingests untrusted content (a fetched page, a file of unknown provenance) it holds all three: the dangerous configuration in which untrusted content can steer private data into external actions. Warden cannot un-read content, so it removes the *external-actions* property instead — closing the gate keeps the agent reading and reasoning while a human reviews the situation. Three-of-three collapses back to two-of-three, with the user as the release valve.
|
|
8
|
+
|
|
9
|
+
Warden is a sibling plugin to [`ecosystem`](../../) and assumes the Onlooker observability substrate (`~/.onlooker/`) is present.
|
|
10
|
+
|
|
11
|
+
## How it works
|
|
12
|
+
|
|
13
|
+
Detection and enforcement are split across two hook surfaces, mediated only by the on-disk gate lock — the surfaces never call each other. See [ADR-001](docs/adr/001-detect-after-ingest-gate-before-action.md).
|
|
14
|
+
|
|
15
|
+
| Surface | What Warden does |
|
|
16
|
+
|---------|------------------|
|
|
17
|
+
| `PostToolUse` (`WebFetch`, `Read`) | Extracts ingested content from `tool_response`, applies the source and skip-glob filters and length cap, and runs the hybrid scanner. A strong pattern hit closes the gate immediately; a weak hit escalates to the evaluator. On a positive verdict it closes the session-scoped gate and emits `warden.threat.detected`. PostToolUse cannot block the read — and deliberately does not, because reading is how the threat is discovered. |
|
|
18
|
+
| `PreToolUse` (`Write`, `Edit`, `MultiEdit`, `Bash`) | Pure lock check: if the gate is closed it returns `{"decision":"block", …}` and emits `warden.gate.blocked`; otherwise it allows silently. No model call, no command parsing. |
|
|
19
|
+
| `SessionStart` | Initializes Warden for the session. A new session always starts with the gate open, even if a prior session saw a threat. |
|
|
20
|
+
| `/warden` skill | The user-facing control surface — reports gate status and is the only sanctioned way to clear a closed gate. |
|
|
21
|
+
|
|
22
|
+
### Hybrid detection
|
|
23
|
+
|
|
24
|
+
Detection is a two-stage funnel, balancing coverage against cost and data egress:
|
|
25
|
+
|
|
26
|
+
1. **Pattern floor** (`warden-patterns.sh`) — a curated regex set mapped to five threat types: `prompt_injection`, `instruction_override`, `credential_exfiltration`, `command_injection`, and `social_engineering`. **Strong** signatures (explicit override/exfil/command-injection phrasing) score `detection.strong_pattern_confidence` (default `0.9`) and close the gate with no model call. **Weak** signatures (social-engineering pressure, soft instruction-shaped imperatives) score `detection.weak_pattern_confidence` (default `0.5`) — below `close_threshold` — and are treated as borderline.
|
|
27
|
+
2. **LLM escalation** (`warden-evaluator.sh`) — borderline content is sanitized and sent to N parallel Haiku judges (majority vote). The gate closes only if the panel judges it an injection with confidence `≥ close_threshold`.
|
|
28
|
+
|
|
29
|
+
Clean content (no signature) never reaches the model. Set `escalation.enabled: false` for a zero-egress, pattern-only posture.
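
The routing described above can be sketched as a small shell function. This is a hypothetical illustration, not Warden's code: `route_pattern_hit` and `meets_threshold` are invented names, the variables mirror the documented config defaults, and treating a weak hit as allowed when escalation is disabled is an assumption drawn from the "pattern-only posture" wording.

```shell
# Hypothetical sketch of the two-stage funnel; names are illustrative only.
CLOSE_THRESHOLD="0.65"     # detection.close_threshold
STRONG_CONFIDENCE="0.9"    # detection.strong_pattern_confidence
WEAK_CONFIDENCE="0.5"      # detection.weak_pattern_confidence
ESCALATION_ENABLED="true"  # escalation.enabled

# Succeeds when confidence $1 is at or above threshold $2.
# The shell has no floating-point arithmetic, so awk does the comparison.
meets_threshold() {
  awk -v c="$1" -v t="$2" 'BEGIN { exit !((c + 0) >= (t + 0)) }'
}

# Prints the next step for a pattern result: strong, weak, or anything else.
route_pattern_hit() {
  local strength="$1" confidence
  case "$strength" in
    strong) confidence="$STRONG_CONFIDENCE" ;;
    weak)   confidence="$WEAK_CONFIDENCE" ;;
    *)      printf 'allow\n'; return 0 ;;  # clean content never reaches the model
  esac
  if meets_threshold "$confidence" "$CLOSE_THRESHOLD"; then
    printf 'close_gate\n'                  # strong hit: no model call
  elif [ "$ESCALATION_ENABLED" = "true" ]; then
    printf 'escalate\n'                    # borderline hit goes to the evaluator
  else
    printf 'allow\n'                       # assumed pattern-only behavior
  fi
}
```

With the defaults, a strong hit (`0.9`) clears the `0.65` threshold on its own, while a weak hit (`0.5`) cannot close the gate without a panel verdict.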
|
|
30
|
+
|
|
31
|
+
### Fail-soft posture
|
|
32
|
+
|
|
33
|
+
- Detection never blocks the read — `PostToolUse` cannot. If escalation errors, Warden falls back to the deterministic pattern verdict.
|
|
34
|
+
- Enforcement is a pure lock check, trivially fail-closed: a present lock always blocks.
|
|
35
|
+
- Event emission is best-effort; a schema-validation or emit failure is logged to stderr and never blocks a session.
|
|
36
|
+
|
|
37
|
+
## Activation
|
|
38
|
+
|
|
39
|
+
Warden is **off by default**. Enable per-project in `.claude/settings.json`:
|
|
40
|
+
|
|
41
|
+
```json
|
|
42
|
+
{
|
|
43
|
+
"warden": {
|
|
44
|
+
"enabled": true
|
|
45
|
+
}
|
|
46
|
+
}
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
Or globally in `~/.claude/settings.json`.
|
|
50
|
+
|
|
51
|
+
## Configuration
|
|
52
|
+
|
|
53
|
+
All keys are optional. Unset keys fall back to the plugin's `config.json` defaults.
|
|
54
|
+
|
|
55
|
+
```json
|
|
56
|
+
{
|
|
57
|
+
"warden": {
|
|
58
|
+
"enabled": false,
|
|
59
|
+
"scan": {
|
|
60
|
+
"sources": ["web_fetch", "file_read"],
|
|
61
|
+
"max_content_chars": 20000,
|
|
62
|
+
"skip_globs": ["**/*.lock", "**/*.sum", "**/node_modules/**", "**/.git/**", "**/dist/**", "**/build/**"],
|
|
63
|
+
"store_snippet": true,
|
|
64
|
+
"snippet_max_chars": 240
|
|
65
|
+
},
|
|
66
|
+
"detection": {
|
|
67
|
+
"close_threshold": 0.65,
|
|
68
|
+
"strong_pattern_confidence": 0.9,
|
|
69
|
+
"weak_pattern_confidence": 0.5
|
|
70
|
+
},
|
|
71
|
+
"escalation": {
|
|
72
|
+
"enabled": true,
|
|
73
|
+
"borderline_only": true,
|
|
74
|
+
"model": "claude-haiku-4-5-20251001",
|
|
75
|
+
"n": 3,
|
|
76
|
+
"temperature": 0.0,
|
|
77
|
+
"max_output_tokens": 192,
|
|
78
|
+
"sample_timeout_seconds": 12,
|
|
79
|
+
"min_valid_samples": 2
|
|
80
|
+
},
|
|
81
|
+
"gate": {
|
|
82
|
+
"blocked_tools": ["Write", "Edit", "MultiEdit", "Bash"],
|
|
83
|
+
"clear_policy": "user_override_only"
|
|
84
|
+
}
|
|
85
|
+
}
|
|
86
|
+
}
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
| Key | Default | Description |
|
|
90
|
+
|-----|---------|-------------|
|
|
91
|
+
| `enabled` | `false` | Must be `true` for any scanning or gating to run. |
|
|
92
|
+
| `scan.sources` | `["web_fetch", "file_read"]` | Which ingestion sources to scan. Matches the schema's `source_type` enum. |
|
|
93
|
+
| `scan.max_content_chars` | `20000` | Length cap on the content fed into detection. |
|
|
94
|
+
| `scan.skip_globs` | lockfiles, `node_modules`, `.git`, `dist`, `build`, … | Globs whose reads are not scanned. |
|
|
95
|
+
| `scan.store_snippet` | `true` | Whether to keep a flagged excerpt in the gate record and event payload. |
|
|
96
|
+
| `scan.snippet_max_chars` | `240` | Maximum length of a stored snippet. |
|
|
97
|
+
| `detection.close_threshold` | `0.65` | Confidence at or above which a verdict closes the gate. |
|
|
98
|
+
| `detection.strong_pattern_confidence` | `0.9` | Score assigned to strong pattern hits — above threshold, closes without a model call. |
|
|
99
|
+
| `detection.weak_pattern_confidence` | `0.5` | Score assigned to weak pattern hits — below threshold, escalates to the evaluator. |
|
|
100
|
+
| `escalation.enabled` | `true` | Whether borderline content escalates to the LLM evaluator. `false` is a zero-egress, pattern-only posture. |
|
|
101
|
+
| `escalation.borderline_only` | `true` | Escalate only weak/borderline hits, never clean content. |
|
|
102
|
+
| `escalation.model` | `claude-haiku-4-5-20251001` | Model used for the evaluator panel. |
|
|
103
|
+
| `escalation.n` | `3` | Number of parallel evaluator samples (majority vote). |
|
|
104
|
+
| `escalation.temperature` | `0.0` | Sampling temperature for the evaluator. |
|
|
105
|
+
| `escalation.max_output_tokens` | `192` | Token ceiling per evaluator sample. |
|
|
106
|
+
| `escalation.sample_timeout_seconds` | `12` | Per-sample wall-clock timeout. |
|
|
107
|
+
| `escalation.min_valid_samples` | `2` | Minimum valid samples required to form a verdict. |
|
|
108
|
+
| `gate.blocked_tools` | `["Write", "Edit", "MultiEdit", "Bash"]` | Tools blocked while the gate is closed. |
|
|
109
|
+
| `gate.clear_policy` | `user_override_only` | How a closed gate may be cleared. Only explicit user override is supported. |
|
|
110
|
+
|
|
111
|
+
On escalation, only a sanitized, length-capped excerpt of the ingested content is sent to the evaluator model. Setting `escalation.enabled: false` disables all egress — Warden then relies on the deterministic pattern floor alone.
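
A rough sketch of how `escalation.n`, `escalation.min_valid_samples`, and `detection.close_threshold` might combine into a panel verdict. Everything here is an assumption made for illustration: the `panel_verdict` name, the one-verdict-per-line input format, and especially the use of the mean confidence of the injection votes as the panel confidence, which the README does not specify.

```shell
# Hypothetical sketch of a majority-vote panel; not Warden's evaluator.
# Reads one sample per line on stdin as "<label> <confidence>", where the
# label is "injection" or "clean". Any other line is an invalid sample.
# Prints close_gate, allow, or fallback (too few valid samples).
panel_verdict() {
  local min_valid="$1" threshold="$2"
  awk -v min_valid="$min_valid" -v threshold="$threshold" '
    $1 == "injection" && $2 ~ /^[0-9.]+$/ { inj++; inj_sum += $2; next }
    $1 == "clean"     && $2 ~ /^[0-9.]+$/ { clean++ }
    END {
      valid = inj + clean
      if (valid < min_valid + 0) { print "fallback"; exit }
      # Assumed aggregation: majority says injection AND mean confidence
      # of those votes reaches the close threshold.
      if (inj > clean && inj_sum / inj >= threshold + 0) print "close_gate"
      else print "allow"
    }'
}
```

For example, `printf 'injection 0.9\ninjection 0.8\nclean 0.7\n' | panel_verdict 2 0.65` yields a close decision under these assumptions, while a run with only one valid sample yields `fallback`, matching the documented fall back to the pattern verdict when escalation fails.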
|
|
112
|
+
|
|
113
|
+
## The gate model
|
|
114
|
+
|
|
115
|
+
The gate is a single session-scoped lock with two states:
|
|
116
|
+
|
|
117
|
+
- **Open** (default — file absent, or `{"state":"open"}`) — `Write`, `Edit`, `MultiEdit`, and `Bash` are allowed.
|
|
118
|
+
- **Closed** (`{"state":"closed", …}`) — those operations are blocked at `PreToolUse`.
|
|
119
|
+
|
|
120
|
+
The detection hook **closes** the gate on a positive scan. Once closed, it can be **cleared only by the user** via the `/warden` skill (`clear_policy: user_override_only`) — Warden does not auto-clear in this release. The gate is session-scoped: a brand-new session starts open even if a prior session saw a threat, because the untrusted content lives in a specific session's context.
|
|
121
|
+
|
|
122
|
+
Clearing the gate re-enables write-class tools but does not remove the flagged content from the conversation — it is still in context. The skill reminds the user of this.
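
The two-state lock can be illustrated with a minimal check. This is a sketch under stated assumptions, not the shipped hook: `gate_is_closed` and `pretooluse_check` are hypothetical names, the state is matched with `grep` instead of `jq` to keep the example dependency-free, and the block output shows only the `decision` field because the rest of the real payload is not documented here.

```shell
# Hypothetical sketch of a PreToolUse lock check; names are illustrative.

# Succeeds when the gate file at $1 records a closed gate.
# An absent file or {"state":"open"} counts as open.
gate_is_closed() {
  local gate_file="$1"
  [ -f "$gate_file" ] || return 1
  grep -Eq '"state"[[:space:]]*:[[:space:]]*"closed"' "$gate_file"
}

# Prints a block decision when tool $2 is write-class and the gate at $1
# is closed; prints nothing (allows silently) otherwise.
pretooluse_check() {
  local gate_file="$1" tool="$2"
  case "$tool" in
    Write|Edit|MultiEdit|Bash) ;;  # default gate.blocked_tools
    *) return 0 ;;                 # other tools are never gated
  esac
  if gate_is_closed "$gate_file"; then
    printf '{"decision":"block"}\n'
  fi
}
```

The check involves no model call and no command parsing: the only input that matters is whether the lock file says `closed`.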
|
|
123
|
+
|
|
124
|
+
## Storage layout
|
|
125
|
+
|
|
126
|
+
```text
|
|
127
|
+
~/.onlooker/warden/sessions/<session_id>/
|
|
128
|
+
└── gate.json
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
`gate.json` when the gate is closed:
|
|
132
|
+
|
|
133
|
+
```json
|
|
134
|
+
{
|
|
135
|
+
"state": "closed",
|
|
136
|
+
"closed_at": 1717000000,
|
|
137
|
+
"threat": {
|
|
138
|
+
"threat_id": "01J…",
|
|
139
|
+
"source_type": "web_fetch",
|
|
140
|
+
"threat_type": "credential_exfiltration",
|
|
141
|
+
"confidence": 0.9,
|
|
142
|
+
"source_url": "https://…",
|
|
143
|
+
"source_path": null,
|
|
144
|
+
"snippet": "…sanitized excerpt…",
|
|
145
|
+
"matched_pattern": "…",
|
|
146
|
+
"detection_method": "pattern_strong"
|
|
147
|
+
}
|
|
148
|
+
}
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
The local record keeps forensic fields (`threat_id`, `matched_pattern`, `detection_method`). The emitted `warden.threat.detected` event carries only the schema-permitted fields — warden payloads use `additionalProperties: false`. State is keyed by `session_id`, not by repository: the gate guards a single session's context.
|
|
152
|
+
|
|
153
|
+
## Events emitted
|
|
154
|
+
|
|
155
|
+
Warden emits the canonical `warden.*` event surface from [`@onlooker-community/schema`](https://github.com/onlooker-community/schema) (v2.4.0+). All events land in `~/.onlooker/logs/onlooker-events.jsonl` and are validated against the schema before write.
|
|
156
|
+
|
|
157
|
+
| Event | When | Payload |
|
|
158
|
+
|-------|------|---------|
|
|
159
|
+
| `warden.threat.detected` | A scan closes the gate. | `source_type`, `threat_type`, `confidence` (plus optional `source_url` / `source_path` / `snippet`) |
|
|
160
|
+
| `warden.gate.blocked` | A write/edit/bash operation is blocked by a closed gate. | `blocked_operation`, `threat_source_type` |
|
|
161
|
+
| `warden.threat.cleared` | The user clears the gate via `/warden`. | `source_type`, `cleared_by: user_override` |
|
|
162
|
+
|
|
163
|
+
## The `/warden` skill
|
|
164
|
+
|
|
165
|
+
`/warden` is the user-facing control surface for the gate, which the hooks close automatically and only the user can clear.
|
|
166
|
+
|
|
167
|
+
- `/warden` or `/warden status` — prints whether the gate is OPEN or CLOSED. When closed, prints the recorded threat: `threat_type`, `source_type`, source URL/path, confidence, detection method, matched pattern, and the flagged snippet (when storage is enabled).
|
|
168
|
+
- `/warden clear` (also `reopen`, `override`, `unblock`) — verifies the gate is closed, removes the lock, re-enables `Write`/`Edit`/`MultiEdit`/`Bash`, and emits `warden.threat.cleared` with `cleared_by: user_override`.
|
|
169
|
+
|
|
170
|
+
The skill resolves the active session automatically: it prefers `$CLAUDE_SESSION_ID`, falls back to the single closed gate when exactly one exists, and reports ambiguity if several sessions have closed gates (re-run with an explicit session id in that case). Closing is automatic; clearing is always a deliberate user decision.
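
The resolution order can be sketched as follows. This is a hypothetical illustration: `resolve_session` is an invented name, the exit codes are arbitrary, and the sessions directory is passed as an argument (in practice it would be `~/.onlooker/warden/sessions`, per the storage layout).

```shell
# Hypothetical sketch of session resolution for /warden; not the skill's code.
# Prints the session id to act on. Returns 1 when no closed gate exists and
# 2 when several sessions have closed gates (ambiguous).
resolve_session() {
  local sessions_dir="$1" gate_file count=0 found=""
  # 1. An explicit session id always wins.
  if [ -n "${CLAUDE_SESSION_ID:-}" ]; then
    printf '%s\n' "$CLAUDE_SESSION_ID"
    return 0
  fi
  # 2. Otherwise look for closed gates across all sessions.
  for gate_file in "$sessions_dir"/*/gate.json; do
    [ -f "$gate_file" ] || continue  # an unmatched glob stays literal
    if grep -Eq '"state"[[:space:]]*:[[:space:]]*"closed"' "$gate_file"; then
      count=$((count + 1))
      found=$(basename "$(dirname "$gate_file")")
    fi
  done
  case "$count" in
    0) return 1 ;;
    1) printf '%s\n' "$found" ;;
    *) printf 'ambiguous: %s sessions have closed gates\n' "$count" >&2
       return 2 ;;
  esac
}
```

Refusing to guess in the ambiguous case keeps clearing a deliberate act: the user has to name the session whose gate they intend to reopen.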
|
|
171
|
+
|
|
172
|
+
## Requirements
|
|
173
|
+
|
|
174
|
+
- The `ecosystem` plugin installed (for the `~/.onlooker/` substrate and canonical event emission).
|
|
175
|
+
- `claude` CLI on `PATH` (the evaluator shells out to `claude -p` when escalation is enabled).
|
|
176
|
+
- `jq` for JSON manipulation.
|
|
177
|
+
- `node` for canonical-event emission.
|
|
178
|
+
|
|
179
|
+
## Architecture decisions
|
|
180
|
+
|
|
181
|
+
Key decisions made during initial design are recorded in [`docs/adr/`](docs/adr/):
|
|
182
|
+
|
|
183
|
+
- [ADR-001](docs/adr/001-detect-after-ingest-gate-before-action.md) — Detect after ingestion, gate before action (the detection/enforcement split and its Rule-of-Two mapping)
|
|
184
|
+
|
|
185
|
+
See also the full plugin design in [`docs/design.md`](docs/design.md).
|
|
@@ -206,6 +206,22 @@
|
|
|
206
206
|
"jsonpath": "$.version"
|
|
207
207
|
}
|
|
208
208
|
]
|
|
209
|
+
},
|
|
210
|
+
"plugins/assayer": {
|
|
211
|
+
"changelog-path": "CHANGELOG.md",
|
|
212
|
+
"release-type": "simple",
|
|
213
|
+
"bump-minor-pre-major": true,
|
|
214
|
+
"bump-patch-for-minor-pre-major": false,
|
|
215
|
+
"component": "assayer",
|
|
216
|
+
"draft": false,
|
|
217
|
+
"prerelease": false,
|
|
218
|
+
"extra-files": [
|
|
219
|
+
{
|
|
220
|
+
"type": "json",
|
|
221
|
+
"path": ".claude-plugin/plugin.json",
|
|
222
|
+
"jsonpath": "$.version"
|
|
223
|
+
}
|
|
224
|
+
]
|
|
209
225
|
}
|
|
210
226
|
},
|
|
211
227
|
"$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json"
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
|
|
3
|
+
# Exercises Assayer config loading: defaults and per-project overrides.
|
|
4
|
+
|
|
5
|
+
setup() {
|
|
6
|
+
source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
|
|
7
|
+
setup_test_env
|
|
8
|
+
PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
|
|
9
|
+
export CLAUDE_PLUGIN_ROOT="$PLUGIN_ROOT"
|
|
10
|
+
# shellcheck disable=SC1091
|
|
11
|
+
source "${PLUGIN_ROOT}/scripts/lib/assayer-config.sh"
|
|
12
|
+
|
|
13
|
+
REPO="${BATS_TEST_TMPDIR}/repo"
|
|
14
|
+
mkdir -p "${REPO}/.claude"
|
|
15
|
+
}
|
|
16
|
+
|
|
17
|
+
@test "disabled by default (no settings)" {
|
|
18
|
+
assayer_config_load "$REPO"
|
|
19
|
+
run assayer_config_enabled
|
|
20
|
+
[ "$status" -ne 0 ]
|
|
21
|
+
}
|
|
22
|
+
|
|
23
|
+
@test "enabled when settings opt in" {
|
|
24
|
+
printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
|
|
25
|
+
assayer_config_load "$REPO"
|
|
26
|
+
run assayer_config_enabled
|
|
27
|
+
[ "$status" -eq 0 ]
|
|
28
|
+
}
|
|
29
|
+
|
|
30
|
+
@test "default model is haiku" {
|
|
31
|
+
assayer_config_load "$REPO"
|
|
32
|
+
[ "$(assayer_config_model)" = "claude-haiku-4-5-20251001" ]
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
@test "model override is honored" {
|
|
36
|
+
printf '%s\n' '{"assayer":{"evaluation":{"model":"claude-opus-4-8"}}}' >"${REPO}/.claude/settings.json"
|
|
37
|
+
assayer_config_load "$REPO"
|
|
38
|
+
[ "$(assayer_config_model)" = "claude-opus-4-8" ]
|
|
39
|
+
}
|
|
40
|
+
|
|
41
|
+
@test "default max_claims is 12" {
|
|
42
|
+
assayer_config_load "$REPO"
|
|
43
|
+
[ "$(assayer_config_max_claims)" = "12" ]
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
@test "default min_confidence is 0.5" {
|
|
47
|
+
assayer_config_load "$REPO"
|
|
48
|
+
[ "$(assayer_config_min_confidence)" = "0.5" ]
|
|
49
|
+
}
|
|
50
|
+
|
|
51
|
+
@test "min_confidence override is honored" {
|
|
52
|
+
printf '%s\n' '{"assayer":{"min_confidence":0.8}}' >"${REPO}/.claude/settings.json"
|
|
53
|
+
assayer_config_load "$REPO"
|
|
54
|
+
[ "$(assayer_config_min_confidence)" = "0.8" ]
|
|
55
|
+
}
|
|
56
|
+
|
|
57
|
+
@test "default timeout is 60" {
|
|
58
|
+
assayer_config_load "$REPO"
|
|
59
|
+
[ "$(assayer_config_timeout)" = "60" ]
|
|
60
|
+
}
|
|
@@ -0,0 +1,99 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
|
|
3
|
+
# Validates every emitted assayer.* event against @onlooker-community/schema.
|
|
4
|
+
#
|
|
5
|
+
# The assayer.* event types ship in @onlooker-community/schema; until the
|
|
6
|
+
# installed version includes them, these tests skip rather than fail. Once the
|
|
7
|
+
# ecosystem's schema dependency is bumped to a release that carries them, they
|
|
8
|
+
# run for real. See plugins/assayer/README.md (Requirements).
|
|
9
|
+
|
|
10
|
+
setup() {
|
|
11
|
+
source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
|
|
12
|
+
setup_test_env
|
|
13
|
+
|
|
14
|
+
PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
|
|
15
|
+
export CLAUDE_PLUGIN_ROOT="$PLUGIN_ROOT"
|
|
16
|
+
export ONLOOKER_EVENTS_LOG="${ONLOOKER_DIR}/logs/onlooker-events.jsonl"
|
|
17
|
+
mkdir -p "$(dirname "$ONLOOKER_EVENTS_LOG")"
|
|
18
|
+
|
|
19
|
+
export _ONLOOKER_EVENT_JS="${REPO_ROOT}/scripts/lib/onlooker-event.mjs"
|
|
20
|
+
export CLAUDE_SESSION_ID="bats-session-$$"
|
|
21
|
+
|
|
22
|
+
# shellcheck disable=SC1091
|
|
23
|
+
source "${PLUGIN_ROOT}/scripts/lib/assayer-events.sh"
|
|
24
|
+
}
|
|
25
|
+
|
|
26
|
+
# Skip when the installed schema predates the assayer.* event types.
|
|
27
|
+
_require_assayer_schema() {
|
|
28
|
+
if ! grep -q "assayer.audit.started" \
|
|
29
|
+
"${REPO_ROOT}/node_modules/@onlooker-community/schema/schemas/event.v1.json" 2>/dev/null; then
|
|
30
|
+
skip "installed @onlooker-community/schema has no assayer.* types yet"
|
|
31
|
+
fi
|
|
32
|
+
}
|
|
33
|
+
|
|
34
|
+
_validate_latest_event() {
|
|
35
|
+
local last
|
|
36
|
+
last=$(tail -n 1 "$ONLOOKER_EVENTS_LOG")
|
|
37
|
+
[ -n "$last" ] || return 1
|
|
38
|
+
printf '%s' "$last" | ONLOOKER_DIR="$ONLOOKER_DIR" \
|
|
39
|
+
node "${REPO_ROOT}/scripts/lib/onlooker-event.mjs" validate >/dev/null
|
|
40
|
+
}
|
|
41
|
+
|
|
42
|
+
# Valid 26-char Crockford Base32 ULID (no I, L, O, or U).
|
|
43
|
+
AUDIT_ID="01J0000000000000000000AB34"
|
|
44
|
+
|
|
45
|
+
@test "assayer.audit.started validates" {
|
|
46
|
+
_require_assayer_schema
|
|
47
|
+
local p
|
|
48
|
+
p=$(jq -n --arg a "$AUDIT_ID" '{audit_id: $a, claim_count: 3, trigger: "stop", command_count: 5}')
|
|
49
|
+
assayer_emit_event "assayer.audit.started" "$p"
|
|
50
|
+
run _validate_latest_event
|
|
51
|
+
[ "$status" -eq 0 ]
|
|
52
|
+
}
|
|
53
|
+
|
|
54
|
+
@test "assayer.claim.contradicted validates" {
|
|
55
|
+
_require_assayer_schema
|
|
56
|
+
local p
|
|
57
|
+
p=$(jq -n --arg a "$AUDIT_ID" '{
|
|
58
|
+
audit_id: $a,
|
|
59
|
+
claim: "I ran the tests and they all pass.",
|
|
60
|
+
claim_type: "tests_pass",
|
|
61
|
+
evidence_command: "npm test",
|
|
62
|
+
result_excerpt: "1 failed, 32 passed",
|
|
63
|
+
confidence: 0.9
|
|
64
|
+
}')
|
|
65
|
+
assayer_emit_event "assayer.claim.contradicted" "$p"
|
|
66
|
+
run _validate_latest_event
|
|
67
|
+
[ "$status" -eq 0 ]
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
@test "assayer.claim.unverified validates" {
|
|
71
|
+
_require_assayer_schema
|
|
72
|
+
local p
|
|
73
|
+
p=$(jq -n --arg a "$AUDIT_ID" '{audit_id: $a, claim: "The deploy is healthy.", claim_type: "generic", reason: "no_matching_command"}')
|
|
74
|
+
assayer_emit_event "assayer.claim.unverified" "$p"
|
|
75
|
+
run _validate_latest_event
|
|
76
|
+
[ "$status" -eq 0 ]
|
|
77
|
+
}
|
|
78
|
+
|
|
79
|
+
@test "assayer.audit.complete validates" {
|
|
80
|
+
_require_assayer_schema
|
|
81
|
+
local p
|
|
82
|
+
p=$(jq -n --arg a "$AUDIT_ID" '{
|
|
83
|
+
audit_id: $a, claim_count: 3, corroborated: 1, contradicted: 1,
|
|
84
|
+
unverified: 1, verdict: "contradictions_found", duration_ms: 4200
|
|
85
|
+
}')
|
|
86
|
+
assayer_emit_event "assayer.audit.complete" "$p"
|
|
87
|
+
run _validate_latest_event
|
|
88
|
+
[ "$status" -eq 0 ]
|
|
89
|
+
}
|
|
90
|
+
|
|
91
|
+
@test "emission fails on unknown event type" {
|
|
92
|
+
run assayer_emit_event "assayer.no.such.event" '{"audit_id":"x"}'
|
|
93
|
+
[ "$status" -ne 0 ]
|
|
94
|
+
}
|
|
95
|
+
|
|
96
|
+
@test "assayer_emit_event returns 1 when payload is empty" {
|
|
97
|
+
run assayer_emit_event "assayer.audit.started" ""
|
|
98
|
+
[ "$status" -ne 0 ]
|
|
99
|
+
}
|
|
@@ -0,0 +1,76 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
|
|
3
|
+
# Exercises claim parsing: assayer_parse_claims and the extraction prompt.
|
|
4
|
+
|
|
5
|
+
setup() {
|
|
6
|
+
source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
|
|
7
|
+
setup_test_env
|
|
8
|
+
PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
|
|
9
|
+
# shellcheck disable=SC1091
|
|
10
|
+
source "${PLUGIN_ROOT}/scripts/lib/assayer-extract.sh"
|
|
11
|
+
}
|
|
12
|
+
|
|
13
|
+
@test "parses a clean JSON array of claims" {
|
|
14
|
+
run assayer_parse_claims '[{"text":"tests pass","type":"tests_pass","command_keyword":"test","confidence":0.9}]'
|
|
15
|
+
[ "$status" -eq 0 ]
|
|
16
|
+
[ "$(printf '%s' "$output" | jq 'length')" -eq 1 ]
|
|
17
|
+
[ "$(printf '%s' "$output" | jq -r '.[0].type')" = "tests_pass" ]
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
@test "strips markdown fences" {
|
|
21
|
+
local raw
|
|
22
|
+
raw=$'```json\n[{"text":"build ok","type":"build_succeeds","command_keyword":"build","confidence":0.8}]\n```'
|
|
23
|
+
run assayer_parse_claims "$raw"
|
|
24
|
+
[ "$status" -eq 0 ]
|
|
25
|
+
[ "$(printf '%s' "$output" | jq 'length')" -eq 1 ]
|
|
26
|
+
}
|
|
27
|
+
|
|
28
|
+
@test "drops malformed entries and entries without text" {
|
|
29
|
+
run assayer_parse_claims '[{"text":"ok","type":"generic","confidence":0.7},{"no_text":true},{"text":""}]'
|
|
30
|
+
[ "$status" -eq 0 ]
|
|
31
|
+
[ "$(printf '%s' "$output" | jq 'length')" -eq 1 ]
|
|
32
|
+
}
|
|
33
|
+
|
|
34
|
+
@test "coerces unknown type to generic" {
|
|
35
|
+
run assayer_parse_claims '[{"text":"thing","type":"made_up","command_keyword":"x","confidence":0.7}]'
|
|
36
|
+
[ "$status" -eq 0 ]
|
|
37
|
+
[ "$(printf '%s' "$output" | jq -r '.[0].type')" = "generic" ]
|
|
38
|
+
}
|
|
39
|
+
|
|
40
|
+
@test "defaults confidence when missing or non-numeric" {
|
|
41
|
+
run assayer_parse_claims '[{"text":"thing","type":"generic","command_keyword":"x"}]'
|
|
42
|
+
[ "$status" -eq 0 ]
|
|
43
|
+
[ "$(printf '%s' "$output" | jq -r '.[0].confidence')" = "0.6" ]
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
@test "lowercases command_keyword" {
|
|
47
|
+
run assayer_parse_claims '[{"text":"thing","type":"generic","command_keyword":"TEST","confidence":0.7}]'
|
|
48
|
+
[ "$status" -eq 0 ]
|
|
49
|
+
[ "$(printf '%s' "$output" | jq -r '.[0].command_keyword')" = "test" ]
|
|
50
|
+
}
|
|
51
|
+
|
|
52
|
+
@test "non-array input yields empty array" {
|
|
53
|
+
run assayer_parse_claims '{"text":"not an array"}'
|
|
54
|
+
[ "$status" -eq 0 ]
|
|
55
|
+
[ "$output" = "[]" ]
|
|
56
|
+
}
|
|
57
|
+
|
|
58
|
+
@test "garbage input yields empty array" {
|
|
59
|
+
run assayer_parse_claims 'I could not find any claims.'
|
|
60
|
+
[ "$status" -eq 0 ]
|
|
61
|
+
[ "$output" = "[]" ]
|
|
62
|
+
}
|
|
63
|
+
|
|
64
|
+
@test "empty input yields empty array" {
|
|
65
|
+
run assayer_parse_claims ""
|
|
66
|
+
[ "$status" -eq 0 ]
|
|
67
|
+
[ "$output" = "[]" ]
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
@test "extraction prompt includes the message and the JSON contract" {
|
|
71
|
+
run assayer_build_extraction_prompt "I ran the tests and they pass." 5
|
|
72
|
+
[ "$status" -eq 0 ]
|
|
73
|
+
[[ "$output" == *"I ran the tests and they pass."* ]]
|
|
74
|
+
[[ "$output" == *"TESTABLE SUCCESS CLAIM"* ]]
|
|
75
|
+
[[ "$output" == *"at most 5 claims"* ]]
|
|
76
|
+
}
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
|
|
3
|
+
# Exercises Assayer project-key derivation.
|
|
4
|
+
|
|
5
|
+
setup() {
|
|
6
|
+
source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
|
|
7
|
+
setup_test_env
|
|
8
|
+
PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
|
|
9
|
+
# shellcheck disable=SC1091
|
|
10
|
+
source "${PLUGIN_ROOT}/scripts/lib/assayer-project-key.sh"
|
|
11
|
+
|
|
12
|
+
REPO="${BATS_TEST_TMPDIR}/repo"
|
|
13
|
+
mkdir -p "$REPO"
|
|
14
|
+
git -C "$REPO" init -q
|
|
15
|
+
git -C "$REPO" config user.email test@example.com
|
|
16
|
+
git -C "$REPO" config user.name test
|
|
17
|
+
(cd "$REPO" && printf 'x\n' >f && git add f && git commit -q -m init)
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
@test "key is 12 hex chars for a repo with a remote" {
|
|
21
|
+
git -C "$REPO" remote add origin https://example.com/foo/bar.git
|
|
22
|
+
run assayer_project_key "$REPO"
|
|
23
|
+
[ "$status" -eq 0 ]
|
|
24
|
+
[[ "$output" =~ ^[0-9a-f]{12}$ ]]
|
|
25
|
+
}
|
|
26
|
+
|
|
27
|
+
@test "key is stable across calls" {
|
|
28
|
+
git -C "$REPO" remote add origin https://example.com/foo/bar.git
|
|
29
|
+
a=$(assayer_project_key "$REPO")
|
|
30
|
+
b=$(assayer_project_key "$REPO")
|
|
31
|
+
[ "$a" = "$b" ]
|
|
32
|
+
}
|
|
33
|
+
|
|
34
|
+
@test "remote-keyed differs from root-keyed" {
|
|
35
|
+
local with_remote without_remote
|
|
36
|
+
without_remote=$(assayer_project_key "$REPO")
|
|
37
|
+
git -C "$REPO" remote add origin https://example.com/foo/bar.git
|
|
38
|
+
with_remote=$(assayer_project_key "$REPO")
|
|
39
|
+
[ "$with_remote" != "$without_remote" ]
|
|
40
|
+
}
|
|
41
|
+
|
|
42
|
+
@test "different remotes yield different keys" {
|
|
43
|
+
git -C "$REPO" remote add origin https://example.com/foo/one.git
|
|
44
|
+
local one
|
|
45
|
+
one=$(assayer_project_key "$REPO")
|
|
46
|
+
git -C "$REPO" remote set-url origin https://example.com/foo/two.git
|
|
47
|
+
local two
|
|
48
|
+
two=$(assayer_project_key "$REPO")
|
|
49
|
+
[ "$one" != "$two" ]
|
|
50
|
+
}
|
|
51
|
+
|
|
52
|
+
@test "non-repo cwd yields empty key" {
|
|
53
|
+
local non_repo="${BATS_TEST_TMPDIR}/not-a-repo"
|
|
54
|
+
mkdir -p "$non_repo"
|
|
55
|
+
run assayer_project_key "$non_repo"
|
|
56
|
+
[ "$status" -eq 0 ]
|
|
57
|
+
[ -z "$output" ]
|
|
58
|
+
}
|
|
@@ -0,0 +1,81 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
|
|
3
|
+
# Exercises the Assayer Stop hook's gating behavior. Does not invoke claude -p
|
|
4
|
+
# (the hook bails before the extraction step when preconditions fail).
|
|
5
|
+
# Verifies: disabled-by-default, no-git, recursion guard, no-transcript, and
|
|
6
|
+
# stdout silence (advisory hook must never block Stop).
|
|
7
|
+
|
|
8
|
+
setup() {
|
|
9
|
+
source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
|
|
10
|
+
setup_test_env
|
|
11
|
+
|
|
12
|
+
PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
|
|
13
|
+
export CLAUDE_PLUGIN_ROOT="$PLUGIN_ROOT"
|
|
14
|
+
HOOK="${PLUGIN_ROOT}/scripts/hooks/assayer-stop.sh"
|
|
15
|
+
|
|
16
|
+
REPO="${BATS_TEST_TMPDIR}/repo"
|
|
17
|
+
mkdir -p "$REPO"
|
|
18
|
+
git -C "$REPO" init -q
|
|
19
|
+
git -C "$REPO" config user.email test@example.com
|
|
20
|
+
git -C "$REPO" config user.name test
|
|
21
|
+
(cd "$REPO" && printf 'initial\n' >README.md && git add README.md && git commit -q -m init)
|
|
22
|
+
|
|
23
|
+
TRANSCRIPT="${BATS_TEST_TMPDIR}/transcript.jsonl"
|
|
24
|
+
printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"All tests pass."}]}}' >"$TRANSCRIPT"
|
|
25
|
+
}
|
|
26
|
+
|
|
27
|
+
_make_input() {
|
|
28
|
+
local cwd="${1:-$REPO}" sid="${2:-test-session}" transcript="${3:-$TRANSCRIPT}"
|
|
29
|
+
jq -n --arg cwd "$cwd" --arg sid "$sid" --arg tp "$transcript" \
|
|
30
|
+
'{cwd: $cwd, session_id: $sid, transcript_path: $tp}'
|
|
31
|
+
}
|
|
32
|
+
|
|
33
|
+
@test "exits 0 silently when assayer.enabled is false (default)" {
|
|
34
|
+
local input
|
|
35
|
+
input=$(_make_input)
|
|
36
|
+
run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
|
|
37
|
+
[ "$status" -eq 0 ]
|
|
38
|
+
[ -z "$output" ]
|
|
39
|
+
}
|
|
40
|
+
|
|
41
|
+
@test "exits 0 when cwd is not a git repo" {
|
|
42
|
+
local non_repo="${BATS_TEST_TMPDIR}/not-a-repo"
|
|
43
|
+
mkdir -p "$non_repo"
|
|
44
|
+
local input
|
|
45
|
+
input=$(_make_input "$non_repo")
|
|
46
|
+
run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
|
|
47
|
+
[ "$status" -eq 0 ]
|
|
48
|
+
}
|
|
49
|
+
|
|
50
|
+
@test "recursion guard: ASSAYER_NESTED=1 causes immediate exit 0" {
|
|
51
|
+
mkdir -p "${REPO}/.claude"
|
|
52
|
+
printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
|
|
53
|
+
local input
|
|
54
|
+
input=$(_make_input)
|
|
55
|
+
run bash -c "printf '%s' '$input' | ASSAYER_NESTED=1 ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
|
|
56
|
+
[ "$status" -eq 0 ]
|
|
57
|
+
[ -z "$output" ]
|
|
58
|
+
}
|
|
59
|
+
|
|
60
|
+
@test "exits 0 when enabled but transcript is missing" {
|
|
61
|
+
mkdir -p "${REPO}/.claude"
|
|
62
|
+
printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
|
|
63
|
+
local input
|
|
64
|
+
input=$(_make_input "$REPO" "test-session" "${BATS_TEST_TMPDIR}/nope.jsonl")
|
|
65
|
+
run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
|
|
66
|
+
[ "$status" -eq 0 ]
|
|
67
|
+
[ -z "$output" ]
|
|
68
|
+
}
|
|
69
|
+
|
+@test "exits 0 when enabled but final message is empty" {
+  mkdir -p "${REPO}/.claude"
+  printf '%s\n' '{"assayer":{"enabled":true}}' >"${REPO}/.claude/settings.json"
+  # Transcript with no assistant text turn.
+  local empty_transcript="${BATS_TEST_TMPDIR}/empty.jsonl"
+  printf '%s\n' '{"type":"user","message":{"content":[{"type":"text","text":"hi"}]}}' >"$empty_transcript"
+  local input
+  input=$(_make_input "$REPO" "test-session" "$empty_transcript")
+  run bash -c "printf '%s' '$input' | ONLOOKER_DIR='$ONLOOKER_DIR' '$HOOK'"
+  [ "$status" -eq 0 ]
+  [ -z "$output" ]
+}
@@ -0,0 +1,72 @@
+#!/usr/bin/env bats
+
+# Exercises the transcript reader against a synthetic JSONL transcript shaped
+# like a real Claude Code session log.
+
+setup() {
+  source "${BATS_TEST_DIRNAME}/../helpers/setup.bash"
+  setup_test_env
+  PLUGIN_ROOT="${REPO_ROOT}/plugins/assayer"
+  # shellcheck disable=SC1091
+  source "${PLUGIN_ROOT}/scripts/lib/assayer-transcript.sh"
+
+  TRANSCRIPT="${BATS_TEST_TMPDIR}/transcript.jsonl"
+  {
+    # An assistant turn that runs a passing build, then a failing test.
+    printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"Running the build."},{"type":"tool_use","name":"Bash","id":"t1","input":{"command":"npm run build"}}]}}'
+    printf '%s\n' '{"type":"user","message":{"content":[{"type":"tool_result","tool_use_id":"t1","is_error":false,"content":"build ok"}]}}'
+    printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","id":"t2","input":{"command":"npm test"}}]}}'
+    printf '%s\n' '{"type":"user","message":{"content":[{"type":"tool_result","tool_use_id":"t2","is_error":true,"content":"1 failed"}]}}'
+    # A non-Bash tool call should be ignored.
+    printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","id":"t3","input":{"file_path":"x"}}]}}'
+    # Final assistant message with the claims.
+    printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. The build passes and the tests are green."}]}}'
+  } >"$TRANSCRIPT"
+}
+
+@test "final assistant message returns the last text turn" {
+  run assayer_final_assistant_message "$TRANSCRIPT" 6000
+  [ "$status" -eq 0 ]
+  [ "$output" = "Done. The build passes and the tests are green." ]
+}
+
+@test "final assistant message truncates to max_chars" {
+  run assayer_final_assistant_message "$TRANSCRIPT" 10
+  [ "$status" -eq 0 ]
+  [ "${#output}" -eq 10 ]
+}
+
+@test "missing transcript yields empty final message" {
+  run assayer_final_assistant_message "${BATS_TEST_TMPDIR}/nope.jsonl" 6000
+  [ "$status" -eq 0 ]
+  [ -z "$output" ]
+}
+
+@test "collects Bash commands with their is_error status" {
+  run assayer_collect_commands "$TRANSCRIPT"
+  [ "$status" -eq 0 ]
+  [ "$(printf '%s' "$output" | jq 'length')" -eq 2 ]
+  [ "$(printf '%s' "$output" | jq -r '.[0].command')" = "npm run build" ]
+  [ "$(printf '%s' "$output" | jq -r '.[0].is_error')" = "false" ]
+  [ "$(printf '%s' "$output" | jq -r '.[1].command')" = "npm test" ]
+  [ "$(printf '%s' "$output" | jq -r '.[1].is_error')" = "true" ]
+}
+
+@test "captures the failing command's output excerpt" {
+  run assayer_collect_commands "$TRANSCRIPT"
+  [ "$status" -eq 0 ]
+  [ "$(printf '%s' "$output" | jq -r '.[1].excerpt')" = "1 failed" ]
+}
+
+@test "non-Bash tool calls are excluded" {
+  run assayer_collect_commands "$TRANSCRIPT"
+  [ "$status" -eq 0 ]
+  # Only the two Bash commands, never the Read call.
+  [ "$(printf '%s' "$output" | jq '[.[] | select(.command | contains("file"))] | length')" -eq 0 ]
+}
+
+@test "missing transcript yields empty command array" {
+  run assayer_collect_commands "${BATS_TEST_TMPDIR}/nope.jsonl"
+  [ "$status" -eq 0 ]
+  [ "$output" = "[]" ]
+}