@onlooker-community/ecosystem 0.9.0 → 0.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (112)
  1. package/.claude-plugin/marketplace.json +39 -1
  2. package/.claude-plugin/plugin.json +2 -2
  3. package/.github/copilot-instructions.md +46 -0
  4. package/.github/workflows/coverage.yml +78 -0
  5. package/.github/workflows/release.yml +24 -8
  6. package/.github/workflows/test.yml +3 -0
  7. package/.markdownlintignore +3 -0
  8. package/.release-please-manifest.json +4 -1
  9. package/CHANGELOG.md +44 -0
  10. package/README.md +57 -13
  11. package/config.json +6 -1
  12. package/docs/adr/001-claude-code-hooks-as-integration-surface.md +43 -0
  13. package/docs/adr/002-centralized-jsonl-event-log.md +39 -0
  14. package/docs/adr/003-ulid-over-uuid.md +40 -0
  15. package/docs/adr/004-plugin-config-with-settings-overlay.md +34 -0
  16. package/docs/architecture.md +117 -0
  17. package/hooks/hooks.json +4 -0
  18. package/package.json +13 -7
  19. package/plugins/archivist/.claude-plugin/plugin.json +14 -0
  20. package/plugins/archivist/CHANGELOG.md +8 -0
  21. package/plugins/archivist/README.md +105 -0
  22. package/plugins/archivist/config.json +18 -0
  23. package/plugins/archivist/hooks/hooks.json +35 -0
  24. package/plugins/archivist/scripts/hooks/archivist-extract.sh +238 -0
  25. package/plugins/archivist/scripts/hooks/archivist-inject.sh +159 -0
  26. package/plugins/archivist/scripts/lib/archivist-config.sh +66 -0
  27. package/plugins/archivist/scripts/lib/archivist-project-key.sh +91 -0
  28. package/plugins/archivist/scripts/lib/archivist-storage.sh +215 -0
  29. package/plugins/archivist/scripts/lib/archivist-ulid.sh +52 -0
  30. package/plugins/echo/.claude-plugin/plugin.json +14 -0
  31. package/plugins/echo/CHANGELOG.md +24 -0
  32. package/plugins/echo/README.md +110 -0
  33. package/plugins/echo/config.json +15 -0
  34. package/plugins/echo/docs/adr/001-echo-as-separate-plugin.md +33 -0
  35. package/plugins/echo/docs/adr/002-direct-evaluation-vs-tribunal-pipeline.md +35 -0
  36. package/plugins/echo/docs/adr/003-stop-hook-trigger.md +40 -0
  37. package/plugins/echo/hooks/hooks.json +15 -0
  38. package/plugins/echo/scripts/hooks/echo-stop-gate.sh +366 -0
  39. package/plugins/echo/scripts/lib/echo-config.sh +108 -0
  40. package/plugins/echo/scripts/lib/echo-events.sh +74 -0
  41. package/plugins/echo/scripts/lib/echo-project-key.sh +81 -0
  42. package/plugins/echo/scripts/lib/echo-ulid.sh +46 -0
  43. package/plugins/tribunal/.claude-plugin/plugin.json +20 -0
  44. package/plugins/tribunal/CHANGELOG.md +10 -0
  45. package/plugins/tribunal/README.md +134 -0
  46. package/plugins/tribunal/agents/tribunal-actor.md +35 -0
  47. package/plugins/tribunal/agents/tribunal-judge-adversarial.md +51 -0
  48. package/plugins/tribunal/agents/tribunal-judge-security.md +47 -0
  49. package/plugins/tribunal/agents/tribunal-judge-standard.md +47 -0
  50. package/plugins/tribunal/agents/tribunal-meta-judge.md +61 -0
  51. package/plugins/tribunal/config.json +50 -0
  52. package/plugins/tribunal/docs/adr/001-actor-jury-meta-gate-loop.md +40 -0
  53. package/plugins/tribunal/docs/adr/002-majority-gate-policy.md +48 -0
  54. package/plugins/tribunal/hooks/hooks.json +15 -0
  55. package/plugins/tribunal/scripts/hooks/tribunal-stop-gate.sh +267 -0
  56. package/plugins/tribunal/scripts/lib/tribunal-aggregate.sh +65 -0
  57. package/plugins/tribunal/scripts/lib/tribunal-config.sh +101 -0
  58. package/plugins/tribunal/scripts/lib/tribunal-events.sh +97 -0
  59. package/plugins/tribunal/scripts/lib/tribunal-gate.sh +111 -0
  60. package/plugins/tribunal/scripts/lib/tribunal-jury.sh +102 -0
  61. package/plugins/tribunal/scripts/lib/tribunal-project-key.sh +84 -0
  62. package/plugins/tribunal/scripts/lib/tribunal-rubric.sh +153 -0
  63. package/plugins/tribunal/scripts/lib/tribunal-ulid.sh +50 -0
  64. package/plugins/tribunal/scripts/lib/tribunal-verdict.sh +127 -0
  65. package/plugins/tribunal/skills/tribunal/SKILL.md +129 -0
  66. package/release-please-config.json +43 -5
  67. package/scripts/coverage/bash-coverage.mjs +169 -0
  68. package/scripts/coverage/format-comment.mjs +120 -0
  69. package/scripts/coverage/run-coverage.mjs +151 -0
  70. package/scripts/hooks/agent-spawn-tracker.sh +4 -4
  71. package/scripts/hooks/prompt-rule-injector.sh +122 -0
  72. package/scripts/lib/onlooker-event.mjs +82 -10
  73. package/scripts/lib/portable-lock.sh +48 -0
  74. package/scripts/lib/prompt-rules.sh +207 -0
  75. package/scripts/lib/tool-history.sh +7 -8
  76. package/scripts/lib/validate-path.sh +4 -0
  77. package/scripts/lint/check-manifests.mjs +314 -0
  78. package/scripts/lint/check-references.mjs +311 -0
  79. package/skills/list-prompt-rules/SKILL.md +15 -0
  80. package/test/bats/archivist-config-files.bats +60 -0
  81. package/test/bats/archivist-config.bats +54 -0
  82. package/test/bats/archivist-inject.bats +73 -0
  83. package/test/bats/archivist-project-key.bats +75 -0
  84. package/test/bats/archivist-storage.bats +119 -0
  85. package/test/bats/archivist-ulid.bats +36 -0
  86. package/test/bats/config.bats +10 -10
  87. package/test/bats/echo-config.bats +90 -0
  88. package/test/bats/echo-events.bats +121 -0
  89. package/test/bats/echo-project-key.bats +115 -0
  90. package/test/bats/echo-stop-hook.bats +101 -0
  91. package/test/bats/echo-ulid.bats +38 -0
  92. package/test/bats/portable-lock.bats +62 -0
  93. package/test/bats/prompt-rules.bats +269 -0
  94. package/test/bats/read-chunk-tracking.bats +73 -0
  95. package/test/bats/tool-history-tracker.bats +1 -0
  96. package/test/bats/tribunal-aggregate.bats +77 -0
  97. package/test/bats/tribunal-config.bats +86 -0
  98. package/test/bats/tribunal-events.bats +209 -0
  99. package/test/bats/tribunal-gate.bats +95 -0
  100. package/test/bats/tribunal-jury.bats +80 -0
  101. package/test/bats/tribunal-rubric.bats +119 -0
  102. package/test/bats/tribunal-stop-hook.bats +73 -0
  103. package/test/bats/tribunal-verdict.bats +71 -0
  104. package/test/bats/validate-path.bats +1 -1
  105. package/test/fixtures/hook-inputs/post-tool-use-read-chunked.json +15 -0
  106. package/test/fixtures/hook-inputs/user-prompt-submit-rule-match.json +8 -0
  107. package/test/fixtures/hook-inputs/user-prompt-submit-rule-nomatch.json +8 -0
  108. package/test/helpers/setup.bash +9 -0
  109. package/test/node/check-manifests.test.mjs +173 -0
  110. package/test/node/check-references.test.mjs +279 -0
  111. package/test/node/coverage.test.mjs +143 -0
  112. package/test/node/schema-events.test.mjs +41 -1
package/plugins/echo/scripts/lib/echo-config.sh +108 -0
@@ -0,0 +1,108 @@
+ #!/usr/bin/env bash
+ # Config loading for Echo.
+ # Reads config.json from the repo's .claude/settings.json echo.* keys,
+ # falling back to the plugin's own config.json defaults.
+
+ _ECHO_CONFIG_JSON=""
+ _ECHO_PLUGIN_CONFIG_JSON=""
+
+ echo_config_load() {
+     local repo_root="${1:-}"
+
+     _ECHO_PLUGIN_CONFIG_JSON=""
+     local plugin_config="${CLAUDE_PLUGIN_ROOT:-}/config.json"
+     if [[ -f "$plugin_config" ]]; then
+         _ECHO_PLUGIN_CONFIG_JSON=$(cat "$plugin_config" 2>/dev/null) || _ECHO_PLUGIN_CONFIG_JSON=""
+     fi
+
+     _ECHO_CONFIG_JSON=""
+     if [[ -n "$repo_root" ]]; then
+         local settings_file="${repo_root}/.claude/settings.json"
+         if [[ -f "$settings_file" ]]; then
+             local settings
+             settings=$(cat "$settings_file" 2>/dev/null) || settings=""
+             local echo_block
+             echo_block=$(printf '%s' "$settings" | jq -c '.echo // empty' 2>/dev/null) || echo_block=""
+             [[ -n "$echo_block" ]] && _ECHO_CONFIG_JSON="$echo_block"
+         fi
+     fi
+ }
+
+ # Get a single scalar value. Checks settings.json first, then plugin config.json.
+ echo_config_get() {
+     local key="$1"
+
+     if [[ -n "$_ECHO_CONFIG_JSON" ]]; then
+         local val
+         val=$(printf '%s' "$_ECHO_CONFIG_JSON" | jq -r "${key} // empty" 2>/dev/null) || val=""
+         [[ -n "$val" && "$val" != "null" ]] && { printf '%s' "$val"; return 0; }
+     fi
+
+     if [[ -n "$_ECHO_PLUGIN_CONFIG_JSON" ]]; then
+         local val
+         val=$(printf '%s' "$_ECHO_PLUGIN_CONFIG_JSON" | jq -r ".echo${key} // empty" 2>/dev/null) || val=""
+         [[ -n "$val" && "$val" != "null" ]] && { printf '%s' "$val"; return 0; }
+     fi
+ }
+
+ echo_config_get_json() {
+     local key="$1"
+
+     if [[ -n "$_ECHO_CONFIG_JSON" ]]; then
+         local val
+         val=$(printf '%s' "$_ECHO_CONFIG_JSON" | jq -c "${key} // empty" 2>/dev/null) || val=""
+         [[ -n "$val" && "$val" != "null" && "$val" != "empty" ]] && { printf '%s' "$val"; return 0; }
+     fi
+
+     if [[ -n "$_ECHO_PLUGIN_CONFIG_JSON" ]]; then
+         local val
+         val=$(printf '%s' "$_ECHO_PLUGIN_CONFIG_JSON" | jq -c ".echo${key} // empty" 2>/dev/null) || val=""
+         [[ -n "$val" && "$val" != "null" && "$val" != "empty" ]] && { printf '%s' "$val"; return 0; }
+     fi
+ }
+
+ echo_config_enabled() {
+     local val
+     val=$(echo_config_get '.enabled')
+     [[ "$val" == "true" ]]
+ }
+
+ echo_config_model() {
+     local val
+     val=$(echo_config_get '.evaluation.model')
+     printf '%s' "${val:-claude-haiku-4-5-20251001}"
+ }
+
+ echo_config_timeout() {
+     local val
+     val=$(echo_config_get '.evaluation.timeout_seconds')
+     printf '%s' "${val:-60}"
+ }
+
+ echo_config_drift_threshold() {
+     local val
+     val=$(echo_config_get '.drift_threshold')
+     printf '%s' "${val:-0.05}"
+ }
+
+ # Prints newline-separated list of watch glob patterns.
+ echo_config_watch_paths() {
+     local raw
+     raw=$(echo_config_get_json '.watch_paths')
+     if [[ -n "$raw" ]]; then
+         printf '%s' "$raw" | jq -r '.[]' 2>/dev/null
+     else
+         printf 'plugins/*/agents/*.md\n'
+     fi
+ }
+
+ # Prints newline-separated list of exclude glob patterns.
+ echo_config_exclude_paths() {
+     local raw
+     raw=$(echo_config_get_json '.exclude_paths')
+     if [[ -n "$raw" ]]; then
+         printf '%s' "$raw" | jq -r '.[]' 2>/dev/null
+     fi
+     # Always exclude Echo's own tree — hardcoded, not overridable.
+     printf 'plugins/echo/**\n'
+ }
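Review note on `echo_config_get`: jq's `//` operator treats `false` the same as `null`, so a project-level `"enabled": false` appears to fall through to the plugin default instead of overriding it. The sketch below reproduces that behaviour with inline JSON and shows one possible null-only check. `lookup_alt`, `lookup_null_only`, and `resolve` are hypothetical helper names, not part of the package, and both JSON documents are made-up sample data.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating overlay lookups; not part of echo-config.sh.

# Mirrors the pattern in the diff: `//` yields the fallback for both null and false.
lookup_alt() {
    printf '%s' "$1" | jq -r "$2 // empty" 2>/dev/null
}

# Alternative: only a missing or null key counts as unset, so `false` survives.
lookup_null_only() {
    printf '%s' "$1" | jq -r "$2 | if . == null then empty else . end" 2>/dev/null
}

settings='{"enabled": false, "drift_threshold": 0.1}'   # made-up project overlay
defaults='{"enabled": true, "drift_threshold": 0.05}'   # made-up plugin defaults

# Resolve a key: overlay first, then defaults, using the given lookup function.
resolve() {
    local fn="$1" key="$2" val
    val=$("$fn" "$settings" "$key")
    [[ -z "$val" ]] && val=$("$fn" "$defaults" "$key")
    printf '%s' "$val"
}

if command -v jq >/dev/null 2>&1; then
    printf 'alt:       enabled=%s\n' "$(resolve lookup_alt .enabled)"
    printf 'null-only: enabled=%s\n' "$(resolve lookup_null_only .enabled)"
fi
```

With `jq` installed, the first line reports `enabled=true` (the overlay's `false` is lost) and the second reports `enabled=false`. Whether the package intends `false` in `settings.json` to win is not stated in this diff.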
package/plugins/echo/scripts/lib/echo-events.sh +74 -0
@@ -0,0 +1,74 @@
+ #!/usr/bin/env bash
+ # Canonical echo.* event emission.
+ # Thin wrapper around the ecosystem plugin's onlooker-event.mjs `emit` mode.
+
+ _ECHO_PLUGIN_NAME="echo"
+
+ _echo_event_js_path() {
+     if [[ -n "${_ONLOOKER_EVENT_JS:-}" && -f "$_ONLOOKER_EVENT_JS" ]]; then
+         printf '%s' "$_ONLOOKER_EVENT_JS"
+         return 0
+     fi
+     local plugin_root="${CLAUDE_PLUGIN_ROOT:-}"
+     local candidates=(
+         "${plugin_root}/scripts/lib/onlooker-event.mjs"
+         "${plugin_root}/../../scripts/lib/onlooker-event.mjs"
+     )
+     local c
+     for c in "${candidates[@]}"; do
+         [[ -f "$c" ]] && { printf '%s' "$c"; return 0; }
+     done
+     return 1
+ }
+
+ _echo_session_id() {
+     if [[ -n "${_HOOK_SESSION_ID:-}" ]]; then
+         printf '%s' "$_HOOK_SESSION_ID"
+         return 0
+     fi
+     if [[ -n "${CLAUDE_SESSION_ID:-}" ]]; then
+         printf '%s' "$CLAUDE_SESSION_ID"
+         return 0
+     fi
+     printf 'unknown'
+ }
+
+ echo_emit_event() {
+     local event_type="${1:-}"
+     local payload="${2:-}"
+     [[ -z "$event_type" || -z "$payload" ]] && return 1
+
+     local event_js
+     event_js=$(_echo_event_js_path) || {
+         printf 'echo-events: cannot locate onlooker-event.mjs\n' >&2
+         return 1
+     }
+
+     local session_id
+     session_id=$(_echo_session_id)
+
+     local params
+     params=$(jq -n \
+         --arg plugin "$_ECHO_PLUGIN_NAME" \
+         --arg sid "$session_id" \
+         --arg type "$event_type" \
+         --argjson payload "$payload" \
+         '{plugin: $plugin, session_id: $sid, event_type: $type, payload: $payload}')
+
+     local event stderr_file
+     stderr_file=$(mktemp -t echo-event-err.XXXXXX 2>/dev/null) || stderr_file="/tmp/echo-event-err.$$"
+     event=$(printf '%s' "$params" \
+         | ONLOOKER_DIR="${ONLOOKER_DIR:-$HOME/.onlooker}" \
+             ONLOOKER_PLUGIN_NAME="$_ECHO_PLUGIN_NAME" \
+             node "$event_js" emit 2>"$stderr_file") || {
+         printf 'echo-events: schema validation failed for %s\n' "$event_type" >&2
+         [[ -s "$stderr_file" ]] && cat "$stderr_file" >&2
+         rm -f "$stderr_file"
+         return 1
+     }
+     rm -f "$stderr_file"
+
+     local log_path="${ONLOOKER_EVENTS_LOG:-${ONLOOKER_DIR:-$HOME/.onlooker}/logs/onlooker-events.jsonl}"
+     mkdir -p "$(dirname "$log_path")" 2>/dev/null || return 1
+     printf '%s\n' "$event" >>"$log_path"
+ }
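`_echo_event_js_path` resolves the emitter by trying an explicit override and then a fixed list of candidate paths, returning the first that exists. A minimal, self-contained sketch of that first-match pattern follows, with a temporary directory standing in for a real plugin tree. The `first_existing_file` name and the demo directory layout are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative only: first-match file resolution, as used to find onlooker-event.mjs.

# Print the first argument that names an existing regular file; fail otherwise.
first_existing_file() {
    local c
    for c in "$@"; do
        [[ -f "$c" ]] && { printf '%s' "$c"; return 0; }
    done
    return 1
}

# Stand-in layout: only the ecosystem-level copy of the script exists.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/plugins/echo" "$demo_root/scripts/lib"
: >"$demo_root/scripts/lib/onlooker-event.mjs"

plugin_root="$demo_root/plugins/echo"
resolved=$(first_existing_file \
    "$plugin_root/scripts/lib/onlooker-event.mjs" \
    "$plugin_root/../../scripts/lib/onlooker-event.mjs") \
    || printf 'cannot locate onlooker-event.mjs\n' >&2

printf '%s\n' "$resolved"
rm -rf "$demo_root"
```

Because the plugin-local candidate is missing in this demo, the second candidate (two levels up from the plugin root) is the one printed, which matches the sibling-plugin layout the diff's candidate list suggests.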
package/plugins/echo/scripts/lib/echo-project-key.sh +81 -0
@@ -0,0 +1,81 @@
+ #!/usr/bin/env bash
+ # Project key derivation for Echo.
+ # Mirrors archivist/tribunal: stable 12-char hex key derived from the git remote
+ # or repo root, surviving renames, clones, and worktrees.
+
+ _echo_sha256_first12() {
+     local input="$1"
+     if command -v shasum >/dev/null 2>&1; then
+         printf '%s' "$input" | shasum -a 256 2>/dev/null | cut -c1-12
+     elif command -v sha256sum >/dev/null 2>&1; then
+         printf '%s' "$input" | sha256sum 2>/dev/null | cut -c1-12
+     else
+         return 1
+     fi
+ }
+
+ echo_project_remote_url() {
+     local cwd="${1:-}"
+     [[ -z "$cwd" || ! -d "$cwd" ]] && return 0
+     git -C "$cwd" remote get-url origin 2>/dev/null || true
+ }
+
+ echo_project_repo_root() {
+     local cwd="${1:-}"
+     [[ -z "$cwd" || ! -d "$cwd" ]] && return 0
+
+     if ! git -C "$cwd" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
+         return 0
+     fi
+
+     local common_dir toplevel
+     common_dir=$(git -C "$cwd" rev-parse --git-common-dir 2>/dev/null) || return 0
+
+     if [[ -n "$common_dir" && "$common_dir" != /* ]]; then
+         common_dir="$(cd "$cwd" && cd "$common_dir" 2>/dev/null && pwd -P)" || common_dir=""
+     fi
+
+     if [[ -n "$common_dir" && -d "$common_dir" ]]; then
+         toplevel="$(cd "$common_dir/.." 2>/dev/null && pwd -P)" || toplevel=""
+     fi
+
+     if [[ -z "$toplevel" ]]; then
+         toplevel=$(git -C "$cwd" rev-parse --show-toplevel 2>/dev/null || true)
+         [[ -n "$toplevel" ]] && toplevel="$(cd "$toplevel" 2>/dev/null && pwd -P)"
+     fi
+
+     printf '%s' "$toplevel"
+ }
+
+ echo_project_key() {
+     local cwd="${1:-}"
+     [[ -z "$cwd" ]] && cwd="$(pwd)"
+
+     local remote
+     remote=$(echo_project_remote_url "$cwd")
+     if [[ -n "$remote" ]]; then
+         _echo_sha256_first12 "remote:$remote"
+         return 0
+     fi
+
+     local root
+     root=$(echo_project_repo_root "$cwd")
+     if [[ -n "$root" ]]; then
+         _echo_sha256_first12 "root:$root"
+         return 0
+     fi
+
+     return 0
+ }
+
+ # Stable test_id for a file path: first 16 chars of sha256 of the path.
+ echo_test_id_for_path() {
+     local path="$1"
+     if command -v shasum >/dev/null 2>&1; then
+         printf '%s' "$path" | shasum -a 256 2>/dev/null | cut -c1-16
+     elif command -v sha256sum >/dev/null 2>&1; then
+         printf '%s' "$path" | sha256sum 2>/dev/null | cut -c1-16
+     else
+         printf '%s' "$path" | od -A n -t x1 | tr -d ' \n' | cut -c1-16
+     fi
+ }
package/plugins/echo/scripts/lib/echo-ulid.sh +46 -0
@@ -0,0 +1,46 @@
+ #!/usr/bin/env bash
+ # Minimal ULID generator for Echo suite_id and test_id values.
+ # Crockford Base32, lexicographically sortable, time-ordered.
+
+ _ECHO_ULID_ALPHABET="0123456789ABCDEFGHJKMNPQRSTVWXYZ"
+
+ _echo_ulid_encode() {
+     local n="$1"
+     local len="$2"
+     local out=""
+     local i
+     for ((i = 0; i < len; i++)); do
+         out="${_ECHO_ULID_ALPHABET:$((n % 32)):1}${out}"
+         n=$((n / 32))
+     done
+     printf '%s' "$out"
+ }
+
+ echo_ulid() {
+     local now_ms
+     if [[ "$(uname)" == "Darwin" ]]; then
+         now_ms=$(python3 -c 'import time; print(int(time.time() * 1000))' 2>/dev/null) \
+             || now_ms=$(($(date +%s) * 1000))
+     else
+         now_ms=$(date +%s%3N 2>/dev/null) || now_ms=$(($(date +%s) * 1000))
+     fi
+
+     local rand_hex rand_hi rand_lo
+     rand_hex=$(openssl rand -hex 10 2>/dev/null)
+     if [[ -n "$rand_hex" && ${#rand_hex} -eq 20 ]]; then
+         rand_hi=$((16#${rand_hex:0:10}))
+         rand_lo=$((16#${rand_hex:10:10}))
+     else
+         rand_hi=$((RANDOM * 32768 + RANDOM))
+         rand_lo=$((RANDOM * 32768 + RANDOM))
+         rand_hi=$(((rand_hi * 256 + RANDOM % 256) & ((1 << 40) - 1)))
+         rand_lo=$(((rand_lo * 256 + RANDOM % 256) & ((1 << 40) - 1)))
+     fi
+
+     local ts_part hi_part lo_part
+     ts_part=$(_echo_ulid_encode "$now_ms" 10)
+     hi_part=$(_echo_ulid_encode "$rand_hi" 8)
+     lo_part=$(_echo_ulid_encode "$rand_lo" 8)
+
+     printf '%s%s%s' "$ts_part" "$hi_part" "$lo_part"
+ }
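To make the layout concrete: a ULID here is 10 Crockford Base32 characters of millisecond timestamp followed by two 8-character random halves (2 × 40 bits), 26 characters in total. The sketch below is a standalone copy of the encoding loop run on fixed inputs, so the digit mapping can be checked by hand. The `b32_encode` name is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Standalone copy of the Crockford Base32 loop; b32_encode is an illustrative name.

ALPHABET="0123456789ABCDEFGHJKMNPQRSTVWXYZ"   # no I, L, O, U

# Encode non-negative integer $1 as exactly $2 Base32 digits, zero-padded on the left.
b32_encode() {
    local n="$1" len="$2" out="" i
    for ((i = 0; i < len; i++)); do
        out="${ALPHABET:$((n % 32)):1}${out}"
        n=$((n / 32))
    done
    printf '%s' "$out"
}

b32_encode 0 10; echo        # all zeros
b32_encode 31 2; echo        # highest single digit
b32_encode 32 2; echo        # carries into the next digit

# Fixed-width fields keep lexicographic order aligned with numeric order.
earlier=$(b32_encode 1700000000000 10)
later=$(b32_encode 1700000000001 10)
[[ "$earlier" < "$later" ]] && echo "sorted: $earlier < $later"
```

The three fixed calls print `0000000000`, `0Z`, and `10`. The fixed width is what makes the timestamp prefix sort correctly as a plain string.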
package/plugins/tribunal/.claude-plugin/plugin.json +20 -0
@@ -0,0 +1,20 @@
+ {
+   "name": "tribunal",
+   "version": "1.0.0",
+   "description": "Multi-agent execution with LLM-as-a-Judge quality gates. An Actor performs work; a jury of typed Judges scores it against a project-overridable rubric; a Meta-Judge reviews the jury for bias; the gate decides accept, retry, or exhaust. Grounded in LLM-as-a-Judge (Zheng et al. 2023) and LLM-as-a-Meta-Judge (Wu et al. 2024). Builds on the Onlooker ecosystem plugin.",
+   "author": {
+     "name": "Onlooker Community",
+     "url": "https://onlooker.dev"
+   },
+   "homepage": "https://onlooker.dev",
+   "repository": "https://github.com/onlooker-community/ecosystem",
+   "license": "MIT",
+   "skills": ["./skills/tribunal"],
+   "agents": [
+     "./agents/tribunal-actor.md",
+     "./agents/tribunal-judge-standard.md",
+     "./agents/tribunal-judge-security.md",
+     "./agents/tribunal-judge-adversarial.md",
+     "./agents/tribunal-meta-judge.md"
+   ]
+ }
package/plugins/tribunal/CHANGELOG.md +10 -0
@@ -0,0 +1,10 @@
+ # Changelog
+
+ ## 1.0.0 (2026-05-24)
+
+
+ ### Features
+
+ * **tribunal:** add multi-agent code review plugin :sparkles: ([#30](https://github.com/onlooker-community/ecosystem/issues/30)) ([893f24a](https://github.com/onlooker-community/ecosystem/commit/893f24a8876fdd6ccb5c7dcf2636a7c902e88949))
+
+ ## Changelog
package/plugins/tribunal/README.md +134 -0
@@ -0,0 +1,134 @@
+ # Tribunal
+
+ Multi-agent execution with LLM-as-a-Judge quality gates.
+
+ Tribunal wraps a task in a three-tier evaluation loop:
+
+ 1. An **Actor** subagent performs the work.
+ 2. A **jury** of typed **Judges** scores the output against a rubric.
+ 3. A **Meta-Judge** reviews the jury for bias, hallucination, and criteria misapplication.
+ 4. A configurable **gate policy** decides whether to accept, retry, or give up.
+
+ Grounded in two papers:
+
+ - [LLM-as-a-Judge (Zheng et al. 2023)](https://arxiv.org/abs/2306.05685) — strong LLMs can score other LLMs against rubrics with reasonable agreement to human judgment.
+ - [LLM-as-a-Meta-Judge (Wu et al. 2024)](https://arxiv.org/abs/2407.19594) — a second model reviewing the Judge catches position, verbosity, and self-enhancement bias.
+
+ Tribunal is a sibling plugin to [`ecosystem`](../../) and assumes the Onlooker observability substrate (`~/.onlooker/`) is present.
+
+ ## How it works
+
+ | Surface | What tribunal does |
+ |---|---|
+ | `/tribunal <task>` skill | Orchestrates a full Actor → Jury → Meta → Gate loop, retrying the Actor with Judge critiques until the gate passes or `max_iterations` is reached. Emits the full canonical event stream. |
+ | `Stop` hook (opt-in) | When `tribunal.stop_hook.enabled` is true, runs a single advisory pass on the just-finished turn's output and writes a verdict for review on the next session. No retry — the main session has already ended. |
+
+ ## Default jury
+
+ Out of the box, Tribunal empanels **two judges** to showcase the jury model without the cost of a full panel:
+
+ - `tribunal-judge-standard` — correctness, completeness, clarity.
+ - `tribunal-judge-adversarial` — devil's advocate, actively looks for failure modes and unhandled edges.
+
+ The gate uses `majority` policy with `weighted_mean` aggregation, so one strong reject does not automatically block. `tribunal-judge-security` is shipped but off by default — opt in for security-sensitive repos by adding `"security"` to `judge_types`.
+
+ ## Configuration
+
+ Tribunal is enabled by default; the Stop hook is opt-in. Override per-project in your project's `.claude/settings.json`:
+
+ ```json
+ {
+   "tribunal": {
+     "session": {
+       "judge_types": ["standard", "security", "adversarial"],
+       "gate_policy": "majority",
+       "max_iterations": 5
+     },
+     "stop_hook": { "enabled": true }
+   }
+ }
+ ```
+
+ The full default `config.json` is the source of truth for available knobs.
+
+ ### Project rubric override
+
+ Drop a `rubrics` file at `<repo>/.claude/tribunal.json` (or globally at `~/.onlooker/tribunal.json`) to override the built-in `default` rubric or add named rubrics referenced as `/tribunal --rubric=<id>`:
+
+ ```json
+ {
+   "rubrics": [
+     {
+       "id": "default",
+       "criteria": [
+         { "name": "correctness", "weight": 0.5, "min_pass": 0.8 },
+         { "name": "tests", "weight": 0.3, "min_pass": 0.7 },
+         { "name": "docs", "weight": 0.2, "min_pass": 0.5 }
+       ],
+       "score_threshold": 0.8,
+       "max_iterations": 5,
+       "judge_types": ["standard", "security", "adversarial"],
+       "gate_policy": "majority",
+       "aggregation_method": "weighted_mean"
+     }
+   ]
+ }
+ ```
+
+ Project rubrics override built-ins by `id`.
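Review note: how the rubric fields combine is not spelled out in this README. One plausible reading, stated here as an assumption rather than Tribunal's actual algorithm, is that each criterion must clear its own `min_pass` and the weight-averaged score must reach `score_threshold`. A sketch of that reading, using the sample rubric's weights and made-up per-criterion scores (`evaluate_rubric` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Assumed semantics, for illustration only: weighted mean plus per-criterion floors.

# Reads lines of "name weight min_pass score" on stdin; $1 is the score threshold.
# Prints "<mean> <pass|fail>" followed by any criteria that missed their min_pass.
evaluate_rubric() {
    local threshold="$1"
    awk -v threshold="$threshold" '
        { total += $2 * $4; weights += $2; if ($4 < $3) failed = failed " " $1 }
        END {
            mean = total / weights
            verdict = (mean >= threshold && failed == "") ? "pass" : "fail"
            printf "%.2f %s%s\n", mean, verdict, failed
        }'
}

# Scores in the last column are invented sample data.
evaluate_rubric 0.8 <<'EOF'
correctness 0.5 0.8 0.9
tests       0.3 0.7 0.7
docs        0.2 0.5 0.6
EOF
```

Under these assumptions the sample prints `0.78 fail`: every criterion clears its own floor, but the weighted mean (0.45 + 0.21 + 0.12) falls short of the 0.8 threshold.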
+
+ ## Subagents
+
+ | Agent | `judge_type` | Role |
+ |---|---|---|
+ | `tribunal-actor` | n/a | Performs the task. Receives prior iteration's verdicts on retries. |
+ | `tribunal-judge-standard` | `standard` | General correctness, completeness, clarity. |
+ | `tribunal-judge-security` | `security` | Vulnerability-focused: injection, auth bypass, data exposure. |
+ | `tribunal-judge-adversarial` | `adversarial` | Actively tries to find failure modes and missing edge cases. |
+ | `tribunal-meta-judge` | `meta` | Reviews each Judge's verdict for the six bias types defined in the LLM-as-a-Judge paper. |
+
+ `maintainability` and `domain` judge types are recognized in config but not yet shipped as subagents; they degrade to `standard` with a warning. They are planned for v0.2.
+
+ ## Storage layout
+
+ ```text
+ ~/.onlooker/tribunal/<project-key>/
+ ├── manifest.json
+ └── <task_id>/                      # ULID
+     ├── manifest.json
+     ├── session-start.json
+     ├── session-complete.json
+     └── iteration-<iteration_id>/   # ULID per iteration
+         ├── actor.md
+         ├── jury.json
+         ├── verdicts/
+         │   └── <judge_id>.json     # one per judge
+         ├── consensus.json
+         ├── dissent.json            # only when emitted
+         ├── meta.json
+         └── gate.json
+ ```
+
+ Project keying mirrors `archivist`: SHA256 of `git remote get-url origin` (first 12 hex), falling back to a hash of the repo root realpath. Worktrees of the same repo share a key.
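As a rough illustration of the layout above, the following creates one task and one iteration directory under a throwaway root and writes a single verdict file. The IDs are hard-coded placeholders rather than real ULIDs, and `verdict_path` is a hypothetical helper; the package's own storage functions are not shown in this part of the diff.

```shell
#!/usr/bin/env bash
# Illustration of the documented directory layout; all IDs are placeholders.

ONLOOKER_DIR=$(mktemp -d)            # stand-in for ~/.onlooker
project_key="0123456789ab"           # 12 hex chars, per the keying note
task_id="example-task-id"            # a ULID in the real layout
iteration_id="example-iteration-id"  # a ULID in the real layout

# Path of one judge's verdict file inside an iteration directory.
verdict_path() {
    printf '%s/tribunal/%s/%s/iteration-%s/verdicts/%s.json' \
        "$ONLOOKER_DIR" "$project_key" "$task_id" "$iteration_id" "$1"
}

target=$(verdict_path "judge-standard")
mkdir -p "$(dirname "$target")"
printf '{"score": 0.82, "passed": true, "judge_type": "standard"}\n' >"$target"

# Show the resulting file relative to the throwaway root.
(cd "$ONLOOKER_DIR" && find . -type f)
```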
+
+ ## Events emitted
+
+ Tribunal emits the full canonical `tribunal.*` event surface from [`@onlooker-community/schema`](https://github.com/onlooker-community/schema) (v2.1.0+):
+
+ `session.start`, `iteration.start`, `actor.start`, `actor.complete`, `jury.empaneled`, `judge.start`, `verdict` (one per judge), `meta.start`, `meta.complete`, `consensus.reached`, `dissent.recorded` (when judges disagree), `gate.passed` / `gate.blocked`, `session.complete`.
+
+ All events land in `~/.onlooker/logs/onlooker-events.jsonl` and are validated against the schema before write.
+
+ ## Requirements
+
+ - The `ecosystem` plugin installed (for `~/.onlooker/` substrate).
+ - `claude` CLI on `PATH` (the Stop hook shells out to `claude -p` for its advisory pass).
+ - `jq` for JSON manipulation.
+ - `node` for canonical-event emission (the ecosystem plugin already requires this).
+
+ ## Architecture decisions
+
+ Key decisions made during initial design are recorded in [`docs/adr/`](docs/adr/):
+
+ - [ADR-001](docs/adr/001-actor-jury-meta-gate-loop.md) — The Actor → Jury → Meta-Judge → Gate loop
+ - [ADR-002](docs/adr/002-majority-gate-policy.md) — Majority gate policy as default (and the 2-judge edge case)
package/plugins/tribunal/agents/tribunal-actor.md +35 -0
@@ -0,0 +1,35 @@
+ ---
+ name: tribunal-actor
+ description: Performs a task end-to-end under Tribunal supervision. Receives the task description and, on retry iterations, the prior iteration's jury verdicts and Meta-Judge feedback. Output is the work itself (code changes, an analysis, a refactor plan) rendered as the final assistant message — no JSON wrapping, no scoring; the Judges do that next.
+ model: claude-sonnet-4-6
+ tools: Read, Edit, Write, Bash, Grep, Glob
+ ---
+
+ # Tribunal Actor
+
+ You are the **Actor** in a Tribunal evaluation loop. Your job is to do the work the user asked for. A jury of Judges will score your output against a rubric, and a Meta-Judge will review the jury before the gate decides whether to accept, retry, or give up.
+
+ ## Inputs
+
+ You will receive:
+
+ - **Task description** — what to do.
+ - **Rubric criteria** — the dimensions the Judges will score on (e.g., correctness, completeness, safety, clarity). Use these as a checklist while you work; they tell you what "good" looks like for this task.
+ - **(On retries only) Prior iteration's feedback** — a digest of the Judges' verdicts and any Meta-Judge override or bias notes. Address the specific concerns; do not re-litigate scores.
+
+ ## Output expectations
+
+ - Render your work as the final assistant message — code, edits, an analysis, a plan, whatever the task calls for.
+ - Be concrete. Vague directional answers score poorly on `completeness` and `clarity`.
+ - When you make a non-obvious choice, state the trade-off in one line. Judges credit this under `correctness` and `clarity`; they penalize unexplained guesses.
+ - Do not score yourself. Do not write a "self-review." The Judges will do that.
+
+ ## What to avoid
+
+ - Stalling. If you cannot complete a step, say so explicitly and proceed with what you can finish — partial work that names its gaps scores better than fabricated completeness.
+ - Over-engineering. If the task is a one-line fix, give a one-line fix. Adding scaffolding hurts `clarity` and may trip the `adversarial` Judge.
+ - Padding. Verbosity is a known judge bias the Meta-Judge will flag against you. Say what needs saying and stop.
+
+ ## On retry
+
+ When you see prior verdicts, treat the lowest-scoring criterion as the priority. If the Meta-Judge flagged `bias_detected`, you can ignore the bias-affected critique on that dimension — but address every concern the Meta-Judge endorsed (`verdict_quality: sound`).
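The retry guidance amounts to a small selection rule: set aside critiques the Meta-Judge marked as bias-affected, then start with the lowest-scoring remaining criterion. The sketch below encodes that reading. The input format (criterion, score, and a `sound` or `biased` label per line) and the `next_priority` name are invented for this example; the real feedback digest is not shown in this diff.

```shell
#!/usr/bin/env bash
# Invented input format: "<criterion> <score> <sound|biased>", one critique per line.

# Print the lowest-scoring criterion among critiques not marked as biased.
next_priority() {
    awk '$3 != "biased"' | LC_ALL=C sort -k2,2n | head -n 1 | cut -d' ' -f1
}

next_priority <<'EOF'
correctness 0.80 sound
completeness 0.55 sound
clarity 0.40 biased
EOF
```

Here `clarity` has the lowest raw score but is skipped because its critique is labelled biased, so the sketch prints `completeness`.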
package/plugins/tribunal/agents/tribunal-judge-adversarial.md +51 -0
@@ -0,0 +1,51 @@
+ ---
+ name: tribunal-judge-adversarial
+ description: Devil's-advocate Tribunal judge. Actively tries to break the Actor's work — edge cases, empty inputs, concurrent callers, partial failures, version drift, assumptions that are not stated. Pairs well with tribunal-judge-standard to balance optimistic and pessimistic scoring. Emits TribunalVerdictPayload as the final message. Read-only (Bash allowed only to run existing test suites — do not modify code).
+ model: claude-opus-4-7
+ tools: Read, Grep, Glob, Bash
+ ---
+
+ # Tribunal Adversarial Judge
+
+ You are the **Adversarial Judge** in a Tribunal jury. Your job is to try, in good faith, to falsify the Actor's claim that the work is correct. The Standard Judge looks for what is right; you look for what could break.
+
+ ## Your stance
+
+ - Assume the Actor missed something. Prove or disprove it before scoring.
+ - You may run existing tests (`Bash`) to confirm or refute Actor claims. Do not write new tests or modify code — read-only stance.
+ - You may not invent constraints the task did not impose. The Meta-Judge will flag that as `position` or `verbosity` bias and downweight you.
+
+ ## What to probe
+
+ - **Empty / null / boundary inputs** — does the code handle `[]`, `""`, `0`, `None`, very long inputs?
+ - **Concurrent callers** — race on a file lock, on a shared global, on an outer cache.
+ - **Partial failures** — what if step 2 of 3 fails — is state left half-written?
+ - **Unstated assumptions** — does the code assume sorted input? Timezone-naive timestamps? `LC_ALL=C`? A specific shell?
+ - **Version drift** — does it use a flag added in a recent version of a tool? Will it work on the older versions documented as supported?
+ - **Idempotency** — what happens on a second run?
+ - **Reverse engineering the test** — can you produce an input that satisfies the test but breaks the spirit of the task?
+
+ ## Scoring discipline
+
+ - Each concrete falsification (a reproducible failure or a clear, named gap) drops the score by `0.15`, floor `0.10`.
+ - A single vague "this might fail" is worth `0.0` — name the input or do not raise it.
+ - If you genuinely cannot falsify, score `0.85+` and say so. Refusing to ever give a high score is `refusal` bias and the Meta-Judge will flag it.
+
+ ## Output format
+
+ Final message is a single JSON object — no prose, no fence:
+
+ ```json
+ {
+   "score": 0.55,
+   "passed": false,
+   "judge_type": "adversarial",
+   "criteria_evaluated": ["edge-cases", "concurrency", "idempotency"],
+   "strengths_count": 1,
+   "weaknesses_count": 2,
+   "confidence": 0.8,
+   "feedback_summary": "Reproduced: empty input array raises IndexError at parse.py:42 instead of returning []. Second run of the migration script duplicates rows — not idempotent. Concurrency story is fine, single-process by design."
+ }
+ ```
+
+ `feedback_summary` should describe each falsification with enough specificity that the Actor can reproduce it on retry.
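The scoring rules leave the starting score implicit. Assuming a base of `0.85` (the "cannot falsify" score, which also happens to reproduce the sample verdict's `0.55` for two weaknesses), the deduction rule can be sketched as below. The base value and the `adversarial_score` name are assumptions, and integer hundredths are used because bash has no floating-point arithmetic.

```shell
#!/usr/bin/env bash
# Assumed base of 0.85; each concrete falsification subtracts 0.15, with a floor of 0.10.

adversarial_score() {
    local falsifications="$1"
    local score=$((85 - 15 * falsifications))   # work in hundredths
    ((score < 10)) && score=10                  # apply the floor
    printf '%d.%02d\n' $((score / 100)) $((score % 100))
}

for n in 0 2 6; do
    printf '%s falsification(s): ' "$n"
    adversarial_score "$n"
done
```

Under that assumption, zero, two, and six falsifications give `0.85`, `0.55`, and `0.10` respectively.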
package/plugins/tribunal/agents/tribunal-judge-security.md +47 -0
@@ -0,0 +1,47 @@
+ ---
+ name: tribunal-judge-security
+ description: Security-focused Tribunal judge. Scores Actor output through a vulnerability lens — injection, auth, secrets, unsafe shell, path traversal, deserialization, SSRF, race conditions on shared resources. Off by default; opt in by adding "security" to judge_types for security-sensitive code. Emits TribunalVerdictPayload as the final message. Read-only.
+ model: claude-opus-4-7
+ tools: Read, Grep, Glob
+ ---
+
+ # Tribunal Security Judge
+
+ You are the **Security Judge** in a Tribunal jury. Score the Actor's output exclusively through a security lens. If the change has no security surface, score `correctness` neutrally (around `0.75`) with a short note; do not invent issues to justify your presence.
+
+ ## What to look for
+
+ - **Injection** — SQL/command/shell/template/LDAP. Anything that builds a query or command from user input.
+ - **AuthN/AuthZ** — bypasses, missing checks, privilege escalation, session handling, token leakage.
+ - **Secrets handling** — credentials in logs, env vars echoed to stdout, secrets committed to disk.
+ - **Unsafe shell** — `eval`, unquoted expansions, `rm -rf $VAR` without validation, `curl | bash` patterns.
+ - **Path traversal** — unconstrained `../` paths, symlink chasing, missing realpath validation.
+ - **Deserialization** — `pickle`, unsafe YAML, `JSON.parse` of untrusted input feeding `eval`.
+ - **SSRF / open redirects** — fetches whose target derives from user input.
+ - **TOCTOU** and races on shared resources, especially around files and locks.
+
+ ## Scoring discipline
+
+ - A single critical finding (RCE, auth bypass, secret leak) caps `score` at `0.3` regardless of other dimensions.
+ - Multiple medium findings cap at `0.6`.
+ - Read the changed files. Do not score from the summary.
+ - Do not flag style or hypothetical "could be exploited if…" without a concrete attack chain. The Meta-Judge will mark you as `biased` if you over-report.
+
+ ## Output format
+
+ Final message is a single JSON object — no prose, no fence:
+
+ ```json
+ {
+   "score": 0.45,
+   "passed": false,
+   "judge_type": "security",
+   "criteria_evaluated": ["injection", "secrets", "path-traversal"],
+   "strengths_count": 1,
+   "weaknesses_count": 2,
+   "confidence": 0.9,
+   "feedback_summary": "scripts/run.sh:24 passes $USER_INPUT to a shell without quoting → command injection. scripts/run.sh:31 logs the API token. Other dimensions clean."
+ }
+ ```
+
+ When `passed: false`, every finding in `feedback_summary` must point at a file and (when possible) a line. Vague security objections waste the Actor's retry budget.
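The two caps in the scoring rules can be read as a clamp applied after the judge forms a raw score. The sketch below encodes that reading with integer hundredths. Taking "multiple" to mean two or more medium findings is an assumption, as is the `security_score` helper itself.

```shell
#!/usr/bin/env bash
# Assumed reading of the caps: any critical finding -> max 0.30; two or more medium -> max 0.60.

security_score() {
    local raw="$1" critical="$2" medium="$3"    # raw score in hundredths (0-100)
    local cap=100
    if ((critical >= 1)); then
        cap=30
    elif ((medium >= 2)); then
        cap=60
    fi
    ((raw > cap)) && raw=$cap                   # clamp only when above the cap
    printf '%d.%02d\n' $((raw / 100)) $((raw % 100))
}

security_score 90 1 0   # one critical finding
security_score 90 0 2   # two medium findings
security_score 45 0 3   # already below the cap
```

The three calls print `0.30`, `0.60`, and `0.45`: a cap lowers a high score but never raises a low one.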
package/plugins/tribunal/agents/tribunal-judge-standard.md +47 -0
@@ -0,0 +1,47 @@
+ ---
+ name: tribunal-judge-standard
+ description: Scores Actor output against the active rubric on correctness, completeness, and clarity. Emits a single TribunalVerdictPayload JSON object as the final message — no prose around it. Read-only: you evaluate, you do not edit. Designed for the general case (refactors, docs, analysis, most code changes). Use the security or adversarial judge for those specific lenses.
+ model: claude-opus-4-7
+ tools: Read, Grep, Glob
+ ---
+
+ # Tribunal Standard Judge
+
+ You are the **Standard Judge** in a Tribunal jury. Score the Actor's output against the rubric. Be honest, calibrated, and terse.
+
+ ## Inputs
+
+ - **Task description** — what the Actor was asked to do.
+ - **Rubric** — list of criteria with `name`, `weight`, `min_pass`. Score each criterion in [0, 1].
+ - **Actor output** — what to evaluate.
+ - **Score threshold** — the overall bar for `passed: true`.
+
+ ## Scoring discipline
+
+ - Read the actual files the Actor changed before scoring. Do not score from the Actor's summary alone.
+ - A `0.7` means "meets the bar." Reserve `0.9+` for clearly excellent work. Reserve `< 0.5` for clearly broken work.
+ - Calibrate against the rubric, not against an imagined ideal answer. A small task done well scores higher than a sprawling task done halfway.
+ - Avoid verbosity bias: a long Actor response is not better than a short correct one.
+
+ ## Output format
+
+ Your **final message** must be a single JSON object matching `TribunalVerdictPayload`. No markdown, no prose around it, no code fence — just JSON:
+
+ ```json
+ {
+   "score": 0.82,
+   "passed": true,
+   "judge_type": "standard",
+   "criteria_evaluated": ["correctness", "completeness", "clarity"],
+   "strengths_count": 3,
+   "weaknesses_count": 1,
+   "confidence": 0.85,
+   "feedback_summary": "Patch is correct and minimal. Missing test for the empty-input case. Naming and comments are clear."
+ }
+ ```
+
+ Required fields: `score`, `passed`, `judge_type`. `passed` reflects your own judgment based on the rubric thresholds — the orchestrator may still aggregate and override per gate policy.
+
+ `feedback_summary` should be 1–3 sentences. Name specific files and lines when you can. This is what the Actor sees on retry.
+
+ The orchestrator will inject `judge_id` and `iteration_id` when persisting your verdict.
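An orchestrator consuming this output would presumably check it before persisting. Below is a minimal sketch of such a check with `jq`, covering only what this prompt states: the three required fields, a score in [0, 1], and a boolean `passed`. The `valid_verdict` helper is hypothetical and is not the schema validation the package performs through `onlooker-event.mjs`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a judge's final message; requires jq.

# Succeeds only if stdin is a JSON object with the required fields and sane types.
valid_verdict() {
    jq -e '
        type == "object"
        and has("score") and has("passed") and has("judge_type")
        and (.score | type == "number" and . >= 0 and . <= 1)
        and (.passed | type == "boolean")
    ' >/dev/null 2>&1
}

if command -v jq >/dev/null 2>&1; then
    for msg in \
        '{"score": 0.82, "passed": true, "judge_type": "standard"}' \
        '{"score": 1.4, "passed": true, "judge_type": "standard"}' \
        'Here is my verdict: {"score": 0.8}'; do
        if printf '%s' "$msg" | valid_verdict; then echo "accept"; else echo "reject"; fi
    done
fi
```

The second sample is rejected for an out-of-range score and the third because surrounding prose makes it invalid JSON, which is the failure the "no prose, no code fence" instruction guards against.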