@onlooker-community/ecosystem 0.16.0 → 0.17.0
- package/.claude-plugin/marketplace.json +26 -0
- package/.claude-plugin/plugin.json +1 -1
- package/.release-please-manifest.json +4 -2
- package/CHANGELOG.md +8 -0
- package/CLAUDE.md +88 -0
- package/package.json +2 -2
- package/plugins/compass/.claude-plugin/plugin.json +14 -0
- package/plugins/compass/CHANGELOG.md +8 -0
- package/plugins/compass/config.json +71 -0
- package/plugins/compass/docs/adr/001-evaluate-prompts-in-context.md +82 -0
- package/plugins/compass/docs/design.md +421 -0
- package/plugins/compass/hooks/hooks.json +82 -0
- package/plugins/compass/scripts/hooks/compass-bash-gate.sh +95 -0
- package/plugins/compass/scripts/hooks/compass-pre-tool-use.sh +86 -0
- package/plugins/compass/scripts/hooks/compass-record-write.sh +97 -0
- package/plugins/compass/scripts/hooks/compass-session-start.sh +77 -0
- package/plugins/compass/scripts/lib/compass-config.sh +72 -0
- package/plugins/compass/scripts/lib/compass-evaluator.sh +374 -0
- package/plugins/compass/scripts/lib/compass-events.sh +81 -0
- package/plugins/compass/scripts/lib/compass-gate.sh +465 -0
- package/plugins/compass/scripts/lib/compass-sanitizer.sh +82 -0
- package/plugins/compass/scripts/lib/compass-transcript.sh +135 -0
- package/plugins/governor/.claude-plugin/plugin.json +1 -1
- package/plugins/governor/CHANGELOG.md +7 -0
- package/plugins/scribe/.claude-plugin/plugin.json +12 -0
- package/plugins/scribe/CHANGELOG.md +8 -0
- package/plugins/scribe/config.json +20 -0
- package/plugins/scribe/hooks/hooks.json +37 -0
- package/plugins/scribe/scripts/hooks/scribe-capture.sh +76 -0
- package/plugins/scribe/scripts/hooks/scribe-session-start.sh +58 -0
- package/plugins/scribe/scripts/hooks/scribe-stop.sh +67 -0
- package/plugins/scribe/scripts/lib/scribe-config.sh +72 -0
- package/plugins/scribe/scripts/lib/scribe-distill.sh +239 -0
- package/plugins/scribe/scripts/lib/scribe-events.sh +80 -0
- package/plugins/scribe/scripts/lib/scribe-extract.sh +147 -0
- package/plugins/scribe/scripts/lib/scribe-project-key.sh +89 -0
- package/plugins/scribe/scripts/lib/scribe-ulid.sh +50 -0
- package/release-please-config.json +32 -0
- package/test/bats/scribe-extract.bats +102 -0
- package/test/bats/scribe-project-key.bats +75 -0
|
package/.claude-plugin/marketplace.json
CHANGED
|
@@ -72,6 +72,32 @@
|
|
|
72
72
|
"license": "MIT",
|
|
73
73
|
"keywords": ["governance", "budget", "cost", "tokens", "enforcement", "audit"],
|
|
74
74
|
"tags": ["governance", "safety"]
|
|
75
|
+
},
|
|
76
|
+
{
|
|
77
|
+
"name": "compass",
|
|
78
|
+
"source": "./plugins/compass",
|
|
79
|
+
"description": "Pre-write intent clarity gate. Intercepts write-class tool calls and samples N=5 parallel evaluators to score intent clarity before allowing writes to proceed. Blocks when confidence is low or evaluators disagree, surfacing a structured clarification prompt. Requires the ecosystem plugin.",
|
|
80
|
+
"author": {
|
|
81
|
+
"name": "Onlooker Community"
|
|
82
|
+
},
|
|
83
|
+
"homepage": "https://onlooker.dev",
|
|
84
|
+
"repository": "https://github.com/onlooker-community/ecosystem",
|
|
85
|
+
"license": "MIT",
|
|
86
|
+
"keywords": ["alignment", "intent", "safety", "pre-write", "evaluation", "clarification"],
|
|
87
|
+
"tags": ["safety", "alignment"]
|
|
88
|
+
},
|
|
89
|
+
{
|
|
90
|
+
"name": "scribe",
|
|
91
|
+
"source": "./plugins/scribe",
|
|
92
|
+
"description": "Intent documentation from agent activity. Captures why changes were made — problem context, decisions, tradeoffs, and constraints — and distills them into readable Markdown artifacts at session end. Git logs record what changed; scribe records why. Requires the ecosystem plugin.",
|
|
93
|
+
"author": {
|
|
94
|
+
"name": "Onlooker Community"
|
|
95
|
+
},
|
|
96
|
+
"homepage": "https://onlooker.dev",
|
|
97
|
+
"repository": "https://github.com/onlooker-community/ecosystem",
|
|
98
|
+
"license": "MIT",
|
|
99
|
+
"keywords": ["documentation", "intent", "decisions", "tradeoffs", "why", "context"],
|
|
100
|
+
"tags": ["documentation", "memory"]
|
|
75
101
|
}
|
|
76
102
|
]
|
|
77
103
|
}
|
|
package/.claude-plugin/plugin.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "ecosystem",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.17.0",
|
|
4
4
|
"description": "Observability substrate for Claude Code. Provides the shared ~/.onlooker/ storage root, canonical schema-validated event emission, session and tool tracking hooks, and prompt rules. Required by all other Onlooker plugins.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Onlooker Community",
|
|
package/.release-please-manifest.json
CHANGED
|
@@ -1,8 +1,10 @@
|
|
|
1
1
|
{
|
|
2
|
-
".": "0.
|
|
2
|
+
".": "0.17.0",
|
|
3
3
|
"plugins/archivist": "0.1.0",
|
|
4
4
|
"plugins/tribunal": "1.0.1",
|
|
5
5
|
"plugins/echo": "0.2.0",
|
|
6
6
|
"plugins/cartographer": "0.2.0",
|
|
7
|
-
"plugins/governor": "0.
|
|
7
|
+
"plugins/governor": "0.2.0",
|
|
8
|
+
"plugins/compass": "0.2.0",
|
|
9
|
+
"plugins/scribe": "0.2.0"
|
|
8
10
|
}
|
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,13 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.17.0](https://github.com/onlooker-community/ecosystem/compare/ecosystem-v0.16.0...ecosystem-v0.17.0) (2026-06-01)
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
### Features
|
|
7
|
+
|
|
8
|
+
* **compass:** pre-write intent clarity gate plugin :compass: ([#47](https://github.com/onlooker-community/ecosystem/issues/47)) ([144c2ef](https://github.com/onlooker-community/ecosystem/commit/144c2ef44d28bab3dcec14a9eace7ec76470d090))
|
|
9
|
+
* **scribe:** intent documentation from agent activity :pencil2: ([#50](https://github.com/onlooker-community/ecosystem/issues/50)) ([f0a95d1](https://github.com/onlooker-community/ecosystem/commit/f0a95d1058e36d1bb5f0f645964d9e88e8f98b66))
|
|
10
|
+
|
|
3
11
|
## [0.16.0](https://github.com/onlooker-community/ecosystem/compare/ecosystem-v0.15.2...ecosystem-v0.16.0) (2026-05-26)
|
|
4
12
|
|
|
5
13
|
|
package/CLAUDE.md
ADDED
|
@@ -0,0 +1,88 @@
|
|
|
1
|
+
# Onlooker Ecosystem — Agent Instructions
|
|
2
|
+
|
|
3
|
+
## Repository layout
|
|
4
|
+
|
|
5
|
+
```
|
|
6
|
+
ecosystem/ ← substrate plugin (always-on observability)
|
|
7
|
+
hooks/ ← session, tool, and prompt hooks
|
|
8
|
+
scripts/lib/ ← shared bash helpers and the canonical event emitter
|
|
9
|
+
skills/ ← user-invocable slash commands
|
|
10
|
+
config.json ← ecosystem defaults
|
|
11
|
+
|
|
12
|
+
plugins/
|
|
13
|
+
archivist/ ← session memory across context truncation
|
|
14
|
+
cartographer/ ← instruction-file auditor (CLAUDE.md, AGENTS.md, rules/)
|
|
15
|
+
compass/ ← pre-write alignment gate (design phase)
|
|
16
|
+
echo/ ← prompt-change regression detection
|
|
17
|
+
governor/ ← resource governance and budget enforcement
|
|
18
|
+
tribunal/ ← multi-agent quality gate (Actor → Jury → Meta-Judge → Gate)
|
|
19
|
+
|
|
20
|
+
docs/
|
|
21
|
+
architecture.md ← how plugins compose and share the event bus
|
|
22
|
+
adr/ ← ecosystem-level architectural decisions
|
|
23
|
+
|
|
24
|
+
scripts/lib/onlooker-event.mjs ← canonical event builder; all plugins route through this
|
|
25
|
+
~/.onlooker/ ← shared runtime storage (logs, plugin artifacts)
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
## Plugin map
|
|
29
|
+
|
|
30
|
+
| Plugin | Hook surface | When it fires |
|
|
31
|
+
|--------|-------------|---------------|
|
|
32
|
+
| ecosystem | SessionStart/End, PreToolUse, PostToolUse, PostToolUseFailure, UserPromptSubmit, UserPromptExpansion, PreCompact, PostCompact, TaskCreated, TaskCompleted, WorktreeCreate, WorktreeRemove | Always — substrate |
|
|
33
|
+
| archivist | PreCompact, SessionStart | Extracts decisions/dead-ends on compaction; reinjects at next SessionStart |
|
|
34
|
+
| cartographer | SessionStart, PostToolUse (Write, Edit, MultiEdit) | Audits instruction files on session start and after instruction-file writes |
|
|
35
|
+
| compass | PreToolUse (Write, Edit, MultiEdit, Bash) | Before any write — alignment check |
|
|
36
|
+
| echo | Stop | Regression-tests prompt changes after each agent stop |
|
|
37
|
+
| governor | SessionStart, PreToolUse (Task), PostToolUse (Task), Stop | Budget gates on subagent spawns; tracks spend per session |
|
|
38
|
+
| tribunal | Stop + skill invocation | Post-task quality gate; also invokable via `/tribunal` |
|
|
39
|
+
|
|
40
|
+
Plugins communicate by emitting events to the JSONL log — they do not call each other directly. All plugins depend on the ecosystem substrate; no plugin depends on another plugin directly.
|
|
41
|
+
|
|
42
|
+
## Compass plugin (design phase)
|
|
43
|
+
|
|
44
|
+
Compass is the pre-write alignment gate. It has no implementation yet. Design lives in `plugins/compass/docs/design.md`.
|
|
45
|
+
|
|
46
|
+
**What it does:** Fires on `PreToolUse` for write-class tools. Samples N=5 parallel Haiku evaluators to score intent clarity. Blocks when `confidence < 0.65 OR stddev > 0.20` and surfaces a clarification prompt.
|
|
47
|
+
|
|
48
|
+
**Critical architectural decision (ADR-001):** The evaluator must see the **prior assistant turn** alongside the current context — not the current context alone. Evaluating a reply in isolation produces a systematic false-positive class: a user answering an agent's enumerated question ("the internal one") looks ambiguous without the question that prompted it.
|
|
49
|
+
|
|
50
|
+
The pipeline is:
|
|
51
|
+
|
|
52
|
+
```
|
|
53
|
+
Trigger Gate → Transcript Reader → Symbolic Skip Layer → Sanitizer → N=5 Evaluators → Gate
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
- **Transcript reader** resolves `prior_assistant_turn` from `transcript_path` in the hook JSON payload (same field tribunal-stop-gate.sh reads). Reads one turn back from that file (already committed before `PreToolUse` fires — no timing-skew risk). If `transcript_path` is absent or unreadable, proceeds with an empty prior turn.
|
|
57
|
+
- **Symbolic skip layer** short-circuits to `confident` when the prior turn is an enumerated question and the current context is an option reference, without an LLM call. Controlled by `skip_patterns.reply_to_question.enabled` (default `true`).
|
|
58
|
+
- **Evaluator prompt** uses a structured pair: `<prior_assistant_turn>` and `<context_excerpt>` as separate XML-delimited slots. The convergence question is: *"Given the prior assistant turn as context, would two independent readers converge on the same interpretation of this write?"*
|
|
59
|
+
|
|
60
|
+
See `plugins/compass/docs/adr/001-evaluate-prompts-in-context.md` for the full decision record.
|
|
61
|
+
|
|
62
|
+
## Adding a new plugin
|
|
63
|
+
|
|
64
|
+
1. Create `plugins/<name>/` with `.claude-plugin/plugin.json`, `config.json`, `hooks/hooks.json`.
|
|
65
|
+
2. Use `scripts/lib/onlooker-event.mjs` for all event emission — never write directly to the JSONL log.
|
|
66
|
+
3. Store runtime artifacts under `${ONLOOKER_DIR:-$HOME/.onlooker}/<name>/<project-key>/`. Always use `$ONLOOKER_DIR` — never hardcode `~/.onlooker` — so the test suite's isolated temp home is respected.
|
|
67
|
+
4. Derive the project key via `tribunal_project_key` (or equivalent) — first 12 hex chars of SHA256(`remote:<origin-url>`), falling back to SHA256(`root:<repo-root>`) for repos without a remote. See `plugins/tribunal/scripts/lib/tribunal-project-key.sh`.
|
|
68
|
+
5. Register event types in `@onlooker-community/schema` before emitting them (the emitter validates the envelope).
|
|
69
|
+
6. Fail-soft when `~/.onlooker/` is absent — plugins must not block a session they were not invited to.
|
|
70
|
+
|
|
71
|
+
## Development
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
mise install # installs all tools declared in mise.toml
|
|
75
|
+
npm ci
|
|
76
|
+
npm test # bats + schema validation
|
|
77
|
+
npm run test:ci # shellcheck + bats + schema + lint
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
Tests use an isolated temp home; nothing writes to your real `~/.onlooker/`.
|
|
81
|
+
|
|
82
|
+
## Conventions
|
|
83
|
+
|
|
84
|
+
- All hooks are bash scripts. No Python, no Node entry points in hook scripts (they may shell out to `node` for event emission or heavy lifting).
|
|
85
|
+
- Hook scripts source shared helpers from `scripts/lib/` (or the plugin's own `scripts/lib/`).
|
|
86
|
+
- Event types follow `<plugin>.<noun>.<verb>` — e.g. `compass.check.skipped`, `tribunal.gate.blocked`.
|
|
87
|
+
- ULIDs everywhere for IDs (not UUIDs). Each plugin ships its own `*_ulid` helper (e.g. `archivist-ulid.sh`, `tribunal-ulid.sh`); there is no shared ecosystem helper. Copy `plugins/tribunal/scripts/lib/tribunal-ulid.sh` as a starting point and rename the function prefix.
|
|
88
|
+
- Config defaults live in `config.json`. User overrides go in `~/.claude/settings.json` (global) or `.claude/settings.json` (per-project) under the plugin's namespace key (e.g. `"compass"`, `"tribunal"`). See ADR-004.
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@onlooker-community/ecosystem",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.17.0",
|
|
4
4
|
"description": "Agents, skills, hooks, commands, rules, and MCP configurations that power [Onlooker](https://onlooker.dev)",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Onlooker Community",
|
|
@@ -26,7 +26,7 @@
|
|
|
26
26
|
"test": "npm run test:bats && npm run test:schema",
|
|
27
27
|
"test:bats": "bats test/bats",
|
|
28
28
|
"test:schema": "node --test test/node/*.test.mjs",
|
|
29
|
-
"test:shellcheck": "shellcheck -S error -x install.sh scripts/common.sh scripts/hooks/*.sh scripts/lib/*.sh plugins/archivist/scripts/hooks/*.sh plugins/archivist/scripts/lib/*.sh plugins/tribunal/scripts/hooks/*.sh plugins/tribunal/scripts/lib/*.sh plugins/echo/scripts/hooks/*.sh plugins/echo/scripts/lib/*.sh plugins/governor/scripts/hooks/*.sh plugins/governor/scripts/lib/*.sh",
|
|
29
|
+
"test:shellcheck": "shellcheck -S error -x install.sh scripts/common.sh scripts/hooks/*.sh scripts/lib/*.sh plugins/archivist/scripts/hooks/*.sh plugins/archivist/scripts/lib/*.sh plugins/tribunal/scripts/hooks/*.sh plugins/tribunal/scripts/lib/*.sh plugins/echo/scripts/hooks/*.sh plugins/echo/scripts/lib/*.sh plugins/governor/scripts/hooks/*.sh plugins/governor/scripts/lib/*.sh plugins/compass/scripts/hooks/*.sh plugins/compass/scripts/lib/*.sh plugins/scribe/scripts/hooks/*.sh plugins/scribe/scripts/lib/*.sh",
|
|
30
30
|
"lint:references": "node scripts/lint/check-references.mjs",
|
|
31
31
|
"lint:manifests": "node scripts/lint/check-manifests.mjs",
|
|
32
32
|
"coverage:node": "node scripts/coverage/run-coverage.mjs",
|
|
package/plugins/compass/.claude-plugin/plugin.json
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "compass",
|
|
3
|
+
"version": "0.2.0",
|
|
4
|
+
"description": "Pre-write intent clarity gate. Intercepts write-class tool calls and requires a confidence threshold before allowing them to proceed. Evaluates the pending write against the prior assistant turn as context to avoid false positives on question-answer turns. Builds on the Onlooker ecosystem plugin.",
|
|
5
|
+
"author": {
|
|
6
|
+
"name": "Onlooker Community",
|
|
7
|
+
"url": "https://onlooker.dev"
|
|
8
|
+
},
|
|
9
|
+
"homepage": "https://onlooker.dev",
|
|
10
|
+
"repository": "https://github.com/onlooker-community/ecosystem",
|
|
11
|
+
"license": "MIT",
|
|
12
|
+
"skills": [],
|
|
13
|
+
"agents": []
|
|
14
|
+
}
|
|
package/plugins/compass/CHANGELOG.md
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
## [0.2.0](https://github.com/onlooker-community/ecosystem/compare/compass-v0.1.0...compass-v0.2.0) (2026-06-01)
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
### Features
|
|
7
|
+
|
|
8
|
+
* **compass:** pre-write intent clarity gate plugin :compass: ([#47](https://github.com/onlooker-community/ecosystem/issues/47)) ([144c2ef](https://github.com/onlooker-community/ecosystem/commit/144c2ef44d28bab3dcec14a9eace7ec76470d090))
|
|
package/plugins/compass/config.json
ADDED
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
{
|
|
2
|
+
"plugin_name": "compass",
|
|
3
|
+
"storage_path": "~/.onlooker",
|
|
4
|
+
"compass": {
|
|
5
|
+
"enabled": false,
|
|
6
|
+
"evaluator": {
|
|
7
|
+
"model": "claude-haiku-4-5-20251001",
|
|
8
|
+
"n": 5,
|
|
9
|
+
"temperature": 0.3,
|
|
10
|
+
"max_output_tokens": 128,
|
|
11
|
+
"sample_timeout_seconds": 8,
|
|
12
|
+
"min_valid_samples": 3
|
|
13
|
+
},
|
|
14
|
+
"confidence_threshold": 0.65,
|
|
15
|
+
"stddev_threshold": 0.2,
|
|
16
|
+
"threshold_calibration_note": "Noise floor at N=5 temp=0.3 is ~0.62–0.65 for unambiguous tasks. Threshold at 0.65 catches borderline-unambiguous cases (acceptable cost: one clarifying prompt) and prevents ambiguous writes in the 0.60–0.64 range from proceeding silently. Run 'compass calibrate' to measure your project-specific baseline.",
|
|
17
|
+
"cooldown": {
|
|
18
|
+
"strategy": "path_and_identity",
|
|
19
|
+
"seconds": 120,
|
|
20
|
+
"identity_match": "dir_plus_stem"
|
|
21
|
+
},
|
|
22
|
+
"transcript": {
|
|
23
|
+
"prior_turn_chars_max": 800,
|
|
24
|
+
"transcript_max_age_seconds": 300
|
|
25
|
+
},
|
|
26
|
+
"skip_patterns": {
|
|
27
|
+
"reply_to_question": {
|
|
28
|
+
"enabled": true,
|
|
29
|
+
"question_pattern": "numbered_list_with_question_mark",
|
|
30
|
+
"reply_pattern": "option_reference_or_affirmation"
|
|
31
|
+
}
|
|
32
|
+
},
|
|
33
|
+
"max_checks_per_turn": 3,
|
|
34
|
+
"min_context_chars": 80,
|
|
35
|
+
"context_chars_max": 600,
|
|
36
|
+
"include_file_contents": false,
|
|
37
|
+
"skip_globs": ["**/*.lock", "**/*.sum", "**/node_modules/**", "**/.git/**", "**/dist/**", "**/build/**"],
|
|
38
|
+
"error_policy": "closed",
|
|
39
|
+
"circuit_breaker": {
|
|
40
|
+
"enabled": true,
|
|
41
|
+
"consecutive_failures_to_open": 3,
|
|
42
|
+
"open_duration_seconds": 300,
|
|
43
|
+
"open_behavior": "fail_open"
|
|
44
|
+
},
|
|
45
|
+
"sanitization": {
|
|
46
|
+
"strip_sequences": [
|
|
47
|
+
"<prior_assistant_turn>",
|
|
48
|
+
"</prior_assistant_turn>",
|
|
49
|
+
"<context_excerpt>",
|
|
50
|
+
"</context_excerpt>",
|
|
51
|
+
"<tool_input>",
|
|
52
|
+
"</tool_input>",
|
|
53
|
+
"<instructions>",
|
|
54
|
+
"</instructions>",
|
|
55
|
+
"<|",
|
|
56
|
+
"[INST]",
|
|
57
|
+
"[/INST]",
|
|
58
|
+
"<<SYS>>",
|
|
59
|
+
"<</SYS>>"
|
|
60
|
+
],
|
|
61
|
+
"strip_null_bytes": true
|
|
62
|
+
},
|
|
63
|
+
"data_egress": {
|
|
64
|
+
"include_file_contents": false,
|
|
65
|
+
"note": "When false, only the tool name, file path, operation type, prior assistant turn excerpt, and context excerpt (<=600 chars) are sent. File contents are never sent. Set context_chars_max: 0 and prior_turn_chars_max: 0 for near-zero egress."
|
|
66
|
+
},
|
|
67
|
+
"intervention": {
|
|
68
|
+
"recheck_limit": 1
|
|
69
|
+
}
|
|
70
|
+
}
|
|
71
|
+
}
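Reviewer note on `confidence_threshold` and `stddev_threshold` in the config above: CLAUDE.md states the gate blocks when `confidence < 0.65 OR stddev > 0.20`. The sketch below shows one way that rule could be evaluated over the N sampled scores. It is not the code in `compass-gate.sh`. The function name `sketch_gate_decision` and the two environment variable names are hypothetical, treating "confidence" as the mean of the samples and using the population standard deviation are assumptions, and `min_valid_samples` and `error_policy` are not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: sketch_gate_decision is not a function from the package.
# Assumes confidence is the mean score and stddev is the population form.

sketch_gate_decision() {
  # Arguments: one confidence score per evaluator sample, each in [0, 1].
  # Thresholds default to the values in plugins/compass/config.json.
  local conf_threshold="${CONFIDENCE_THRESHOLD:-0.65}"
  local stddev_threshold="${STDDEV_THRESHOLD:-0.2}"

  if [ "$#" -eq 0 ]; then
    echo "block"   # no samples: treated as unclear (an assumption)
    return 0
  fi

  printf '%s\n' "$@" | awk -v ct="$conf_threshold" -v st="$stddev_threshold" '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      variance = sumsq / n - mean * mean
      if (variance < 0) variance = 0   # guard against rounding below zero
      if (mean < ct || sqrt(variance) > st) print "block"; else print "allow"
    }'
}
```

For example, `sketch_gate_decision 0.9 0.9 0.2 0.9 0.9` prints `block`: the mean is 0.76, which clears the confidence threshold, but the standard deviation is 0.28, so evaluator disagreement alone trips the gate.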
|
|
package/plugins/compass/docs/adr/001-evaluate-prompts-in-context.md
ADDED
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
# ADR-001: Compass Evaluates Prompts-in-Context, Not Prompts-in-Isolation
|
|
2
|
+
|
|
3
|
+
- Status: Accepted
|
|
4
|
+
- Date: 2026-04-20
|
|
5
|
+
- Deciders: Meagan
|
|
6
|
+
- Tags: compass, oracle, calibration, convergence-sampling, hook-architecture
|
|
7
|
+
|
|
8
|
+
## Context and Problem Statement
|
|
9
|
+
|
|
10
|
+
Compass's `PreToolUse` hook evaluates each pending write operation using the tool call arguments and a short context excerpt from the conversation. When a user replies to a question the agent just asked — e.g., answering "Let's do 3, both" to an enumerated menu in the prior assistant turn — Compass flags the write that follows as uncertain and blocks progress.
|
|
11
|
+
|
|
12
|
+
The write is not actually ambiguous. The ambiguity-resolving information lives in the prior assistant turn (the menu), which the hook payload does not see. Compass is correctly executing a subtly wrong specification: its convergence test asks whether two independent readers of the context *alone* would converge, and for context-dependent replies ("option 2", "both", "do the first one", "yes"), the answer in isolation is always no.
|
|
13
|
+
|
|
14
|
+
This produces a class of false positives that undermines Compass's usefulness in the most common conversational pattern: the agent asks a clarifying question, the user answers it, work proceeds. Under the current design, every write that follows such an answer risks being flagged — not because the user was unclear, but because Compass is blind to what was asked.
|
|
15
|
+
|
|
16
|
+
The failure mode is most visible in multi-turn flows where the agent itself has done the disambiguation work (listing options, asking targeted questions) and the user is simply selecting. Those are precisely the moments where calibration should be cheapest, not most expensive.
|
|
17
|
+
|
|
18
|
+
## Decision Drivers
|
|
19
|
+
|
|
20
|
+
- **False-positive cost is high**: every incorrect flag interrupts the user and forces them to restate context the conversation already established.
|
|
21
|
+
- **Compass's stated value is catching misaligned-but-confident work**: the design should preserve suspicion of genuinely ambiguous writes while not penalizing legitimate context-dependent replies.
|
|
22
|
+
- **Jeong & Son design principle**: declare what you can, reflect symbolically where possible, reserve the LLM for the residual. Answering a menu is a declarable case; it should not require LLM calibration at all.
|
|
23
|
+
- **Hook architecture constraint**: Claude Code hooks receive event payloads, not full transcripts by default. Any fix must be compatible with the payload model.
|
|
24
|
+
- **Tribunal design precedent**: when an evaluator reasons about the quality of its own output, giving it the right substrate to reason over is the load-bearing decision. Tribunal's Actor works because it has access to its full task description; Compass's evaluator needs the same structural access to the turn it is calibrating.
|
|
25
|
+
|
|
26
|
+
## Considered Options
|
|
27
|
+
|
|
28
|
+
1. **Make Compass less suspicious overall.** Raise the confidence threshold so borderline cases pass.
|
|
29
|
+
2. **Evaluate prompts-in-context.** Pass the prior assistant turn into the evaluator payload so the convergence test operates on the pair `{prior_assistant_turn, current_context}`.
|
|
30
|
+
3. **Symbolic skip pattern for question-answers.** Detect when the prior assistant turn ends in an enumerated question and the user prompt references an option; skip LLM calibration entirely for that case.
|
|
31
|
+
4. **Integrate with Archivist.** Query Archivist's extracted turn-pair rather than reading the transcript directly.
|
|
32
|
+
|
|
33
|
+
## Decision
|
|
34
|
+
|
|
35
|
+
We adopt **Option 2 (evaluate prompts-in-context) as the architectural baseline, combined with Option 3 (symbolic skip pattern) as an optimization layer**.
|
|
36
|
+
|
|
37
|
+
Option 2 establishes the correct unit of analysis: Compass's convergence test will evaluate whether two independent readers *with access to the prior assistant turn* would converge on the same interpretation of the pending write. This is the minimal change that resolves the specification bug without weakening Compass's core function.
|
|
38
|
+
|
|
39
|
+
Option 3 layers on top: before invoking the LLM evaluator, Compass performs a cheap symbolic check. If the prior turn ends in an enumerated question (pattern: numbered list with `?` somewhere in the turn) and the current prompt references an option ("1", "option 2", "both", "do 3", "the first one", "yes", "no"), Compass short-circuits to `confident` without an LLM call. This is the Jeong & Son move: most answers to Claude's own questions are declarable; the LLM is reserved for the genuinely ambiguous residual.
|
|
40
|
+
|
|
41
|
+
Option 1 is rejected because it weakens Compass uniformly, including for writes where suspicion is warranted. Option 4 is deferred; it is a clean longer-term integration but introduces a cross-plugin dependency that Compass does not currently have, and Option 2 is a prerequisite in any case.
|
|
42
|
+
|
|
43
|
+
## Consequences
|
|
44
|
+
|
|
45
|
+
### Positive
|
|
46
|
+
|
|
47
|
+
- False positives on question-answer turns drop substantially (expected near-zero for the enumerated-menu case once Option 3 lands).
|
|
48
|
+
- Compass's convergence test operates on the correct unit of analysis, aligning specification with intent.
|
|
49
|
+
- The symbolic skip pattern reduces LLM invocation rate on a large class of turns, lowering latency and cost.
|
|
50
|
+
- The architectural decision is inspectable and traceable: the substrate change (prior-turn context) is separate from the symbolic optimization (skip pattern), so each can be evaluated independently.
|
|
51
|
+
|
|
52
|
+
### Negative
|
|
53
|
+
|
|
54
|
+
- The hook payload grows: it must now include the prior assistant turn, which means reading from the transcript or session log. This adds I/O per evaluation and introduces timing-skew risk (the prior turn's event may not yet be flushed when the `PreToolUse` hook fires).
|
|
55
|
+
- The symbolic skip pattern introduces a regex-based heuristic whose failure modes (false negatives on creatively-phrased answers, false positives on reply-shaped prompts that aren't actually answers) need monitoring.
|
|
56
|
+
- Edge cases appear: what counts as "the prior assistant turn" when the user sends multiple messages in sequence, or when the prior turn was a tool result rather than a conversational reply? These need explicit handling.
|
|
57
|
+
|
|
58
|
+
### Neutral
|
|
59
|
+
|
|
60
|
+
- Archivist integration (Option 4) remains available as a future refactor. If Archivist becomes the canonical source of recent turn structure, Compass's transcript-reading logic can be replaced with an Archivist query without changing the architectural decision recorded here.
|
|
61
|
+
|
|
62
|
+
## Implementation Notes
|
|
63
|
+
|
|
64
|
+
- The hook script reads the most recent assistant turn from the session transcript. The transcript path is provided as `transcript_path` in the hook JSON payload (consistent with how `tribunal-stop-gate.sh` reads it: `jq -r '.transcript_path // ""'`). If `transcript_path` is absent or the file is unreadable, the hook proceeds with an empty `prior_assistant_turn`. The Onlooker event log is not a fallback — `session.prompt` events record user-prompt telemetry, not assistant-turn content.
|
|
65
|
+
- Compass's evaluator prompt is updated to use a structured pair: `<prior_assistant_turn>` and `<context_excerpt>` as separate XML-delimited slots. The convergence question is phrased as: "Given the prior assistant turn as context, would two independent readers converge on the same interpretation of this write?"
|
|
66
|
+
- The symbolic skip pattern is implemented in bash using `jq` and regex, consistent with the plugin's hook style.
|
|
67
|
+
- Skip-pattern decisions are logged as `compass.check.skipped` with `reason: "reply_to_question_pattern"` so false-negative and false-positive rates can be measured.
|
|
68
|
+
- A new `skip_patterns.reply_to_question.enabled` config key (default: `true`) toggles the symbolic layer.
|
|
69
|
+
|
|
70
|
+
## Validation
|
|
71
|
+
|
|
72
|
+
This decision is validated by running Compass against multi-turn conversations where the current implementation produces false interventions on question-answer turns. Expected outcomes:
|
|
73
|
+
|
|
74
|
+
- Reply-to-question turns (answered with option references) short-circuit to `confident` under the skip pattern.
|
|
75
|
+
- Reply-to-question turns with genuine ambiguity (e.g., "both, but only if it's easy") still reach the LLM evaluator, now with prior-turn context, and produce a meaningful calibration state.
|
|
76
|
+
- Non-reply prompts (new requests, unrelated pivots) are unaffected by the skip pattern and continue through normal evaluation.
|
|
77
|
+
|
|
78
|
+
## References
|
|
79
|
+
|
|
80
|
+
- Jeong & Son (2026), *How Much LLM Does a Self-Revising Agent Actually Need?* (arXiv:2604.07236) — declarative substrate principle
|
|
81
|
+
- Tribunal ADR-001 — evaluator substrate precedent: evaluators need structural access to what they evaluate
|
|
82
|
+
- Compass design document (`plugins/compass/docs/design.md`)
|
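Reviewer note on the symbolic skip layer described in ADR-001 above (Option 3): the check "prior turn is an enumerated question, current prompt is an option reference" can be sketched with two regex tests. The patterns below are illustrative guesses, not the regexes shipped in the plugin, and the `sketch_*` names are hypothetical. The sketch uses `grep -E` only, whereas the ADR says the real implementation uses `jq` and regex.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the reply-to-question skip check; the patterns are
# illustrative and are not taken from the plugin source.

sketch_is_enumerated_question() {
  # True when the turn has at least one numbered list item and a "?" anywhere.
  local turn="$1"
  printf '%s\n' "$turn" | grep -Eq '^[[:space:]]*[0-9]+[.)][[:space:]]' &&
    printf '%s\n' "$turn" | grep -Fq '?'
}

sketch_is_option_reference() {
  # True for short replies such as "1", "option 2", "both", "do 3",
  # "the first one", "yes", "no". Anything with trailing qualifiers fails.
  local reply="$1"
  printf '%s\n' "$reply" | grep -Eiq \
    '^[[:space:]]*((let.?s[[:space:]]+)?(do[[:space:]]+)?(option[[:space:]]+)?[0-9]+|both|yes|no|(do[[:space:]]+)?the[[:space:]]+(first|second|third|last)([[:space:]]+one)?)[[:space:]]*[.!]?[[:space:]]*$'
}

sketch_should_skip() {
  # $1: prior assistant turn, $2: current user prompt.
  # Exit status 0 means "short-circuit to confident without an LLM call".
  sketch_is_enumerated_question "$1" && sketch_is_option_reference "$2"
}
```

With these patterns, a reply like "option 2" to a numbered menu is skipped, while "both, but only if it's easy" falls through to the LLM evaluator, which matches the expected outcomes listed under Validation. A compound reply such as "Let's do 3, both" would also fall through in this sketch, so the real patterns are presumably broader.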