@onlooker-community/ecosystem 0.10.0 → 0.14.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +39 -1
- package/.claude-plugin/plugin.json +2 -2
- package/.github/copilot-instructions.md +46 -0
- package/.github/workflows/coverage.yml +78 -0
- package/.github/workflows/release.yml +24 -8
- package/.github/workflows/test.yml +3 -0
- package/.markdownlintignore +3 -0
- package/.release-please-manifest.json +4 -1
- package/CHANGELOG.md +37 -0
- package/README.md +57 -13
- package/config.json +6 -1
- package/docs/adr/001-claude-code-hooks-as-integration-surface.md +43 -0
- package/docs/adr/002-centralized-jsonl-event-log.md +39 -0
- package/docs/adr/003-ulid-over-uuid.md +40 -0
- package/docs/adr/004-plugin-config-with-settings-overlay.md +34 -0
- package/docs/architecture.md +117 -0
- package/hooks/hooks.json +4 -0
- package/package.json +13 -7
- package/plugins/archivist/.claude-plugin/plugin.json +14 -0
- package/plugins/archivist/CHANGELOG.md +8 -0
- package/plugins/archivist/README.md +105 -0
- package/plugins/archivist/config.json +18 -0
- package/plugins/archivist/hooks/hooks.json +35 -0
- package/plugins/archivist/scripts/hooks/archivist-extract.sh +238 -0
- package/plugins/archivist/scripts/hooks/archivist-inject.sh +159 -0
- package/plugins/archivist/scripts/lib/archivist-config.sh +66 -0
- package/plugins/archivist/scripts/lib/archivist-project-key.sh +91 -0
- package/plugins/archivist/scripts/lib/archivist-storage.sh +215 -0
- package/plugins/archivist/scripts/lib/archivist-ulid.sh +52 -0
- package/plugins/echo/.claude-plugin/plugin.json +14 -0
- package/plugins/echo/CHANGELOG.md +24 -0
- package/plugins/echo/README.md +110 -0
- package/plugins/echo/config.json +15 -0
- package/plugins/echo/docs/adr/001-echo-as-separate-plugin.md +33 -0
- package/plugins/echo/docs/adr/002-direct-evaluation-vs-tribunal-pipeline.md +35 -0
- package/plugins/echo/docs/adr/003-stop-hook-trigger.md +40 -0
- package/plugins/echo/hooks/hooks.json +15 -0
- package/plugins/echo/scripts/hooks/echo-stop-gate.sh +366 -0
- package/plugins/echo/scripts/lib/echo-config.sh +108 -0
- package/plugins/echo/scripts/lib/echo-events.sh +74 -0
- package/plugins/echo/scripts/lib/echo-project-key.sh +81 -0
- package/plugins/echo/scripts/lib/echo-ulid.sh +46 -0
- package/plugins/tribunal/.claude-plugin/plugin.json +20 -0
- package/plugins/tribunal/CHANGELOG.md +10 -0
- package/plugins/tribunal/README.md +134 -0
- package/plugins/tribunal/agents/tribunal-actor.md +35 -0
- package/plugins/tribunal/agents/tribunal-judge-adversarial.md +51 -0
- package/plugins/tribunal/agents/tribunal-judge-security.md +47 -0
- package/plugins/tribunal/agents/tribunal-judge-standard.md +47 -0
- package/plugins/tribunal/agents/tribunal-meta-judge.md +61 -0
- package/plugins/tribunal/config.json +50 -0
- package/plugins/tribunal/docs/adr/001-actor-jury-meta-gate-loop.md +40 -0
- package/plugins/tribunal/docs/adr/002-majority-gate-policy.md +48 -0
- package/plugins/tribunal/hooks/hooks.json +15 -0
- package/plugins/tribunal/scripts/hooks/tribunal-stop-gate.sh +267 -0
- package/plugins/tribunal/scripts/lib/tribunal-aggregate.sh +65 -0
- package/plugins/tribunal/scripts/lib/tribunal-config.sh +101 -0
- package/plugins/tribunal/scripts/lib/tribunal-events.sh +97 -0
- package/plugins/tribunal/scripts/lib/tribunal-gate.sh +111 -0
- package/plugins/tribunal/scripts/lib/tribunal-jury.sh +102 -0
- package/plugins/tribunal/scripts/lib/tribunal-project-key.sh +84 -0
- package/plugins/tribunal/scripts/lib/tribunal-rubric.sh +153 -0
- package/plugins/tribunal/scripts/lib/tribunal-ulid.sh +50 -0
- package/plugins/tribunal/scripts/lib/tribunal-verdict.sh +127 -0
- package/plugins/tribunal/skills/tribunal/SKILL.md +129 -0
- package/release-please-config.json +43 -5
- package/scripts/coverage/bash-coverage.mjs +169 -0
- package/scripts/coverage/format-comment.mjs +120 -0
- package/scripts/coverage/run-coverage.mjs +151 -0
- package/scripts/hooks/agent-spawn-tracker.sh +4 -4
- package/scripts/hooks/prompt-rule-injector.sh +122 -0
- package/scripts/lib/portable-lock.sh +48 -0
- package/scripts/lib/prompt-rules.sh +207 -0
- package/scripts/lib/tool-history.sh +7 -8
- package/scripts/lib/validate-path.sh +4 -0
- package/scripts/lint/check-manifests.mjs +314 -0
- package/scripts/lint/check-references.mjs +311 -0
- package/skills/list-prompt-rules/SKILL.md +15 -0
- package/test/bats/archivist-config-files.bats +60 -0
- package/test/bats/archivist-config.bats +54 -0
- package/test/bats/archivist-inject.bats +73 -0
- package/test/bats/archivist-project-key.bats +75 -0
- package/test/bats/archivist-storage.bats +119 -0
- package/test/bats/archivist-ulid.bats +36 -0
- package/test/bats/config.bats +10 -10
- package/test/bats/echo-config.bats +90 -0
- package/test/bats/echo-events.bats +121 -0
- package/test/bats/echo-project-key.bats +115 -0
- package/test/bats/echo-stop-hook.bats +101 -0
- package/test/bats/echo-ulid.bats +38 -0
- package/test/bats/portable-lock.bats +62 -0
- package/test/bats/prompt-rules.bats +269 -0
- package/test/bats/tribunal-aggregate.bats +77 -0
- package/test/bats/tribunal-config.bats +86 -0
- package/test/bats/tribunal-events.bats +209 -0
- package/test/bats/tribunal-gate.bats +95 -0
- package/test/bats/tribunal-jury.bats +80 -0
- package/test/bats/tribunal-rubric.bats +119 -0
- package/test/bats/tribunal-stop-hook.bats +73 -0
- package/test/bats/tribunal-verdict.bats +71 -0
- package/test/fixtures/hook-inputs/user-prompt-submit-rule-match.json +8 -0
- package/test/fixtures/hook-inputs/user-prompt-submit-rule-nomatch.json +8 -0
- package/test/helpers/setup.bash +9 -0
- package/test/node/check-manifests.test.mjs +173 -0
- package/test/node/check-references.test.mjs +279 -0
- package/test/node/coverage.test.mjs +143 -0

package/plugins/echo/scripts/lib/echo-config.sh
@@ -0,0 +1,108 @@
+#!/usr/bin/env bash
+# Config loading for Echo.
+# Reads config.json from the repo's .claude/settings.json echo.* keys,
+# falling back to the plugin's own config.json defaults.
+
+_ECHO_CONFIG_JSON=""
+_ECHO_PLUGIN_CONFIG_JSON=""
+
+echo_config_load() {
+  local repo_root="${1:-}"
+
+  _ECHO_PLUGIN_CONFIG_JSON=""
+  local plugin_config="${CLAUDE_PLUGIN_ROOT:-}/config.json"
+  if [[ -f "$plugin_config" ]]; then
+    _ECHO_PLUGIN_CONFIG_JSON=$(cat "$plugin_config" 2>/dev/null) || _ECHO_PLUGIN_CONFIG_JSON=""
+  fi
+
+  _ECHO_CONFIG_JSON=""
+  if [[ -n "$repo_root" ]]; then
+    local settings_file="${repo_root}/.claude/settings.json"
+    if [[ -f "$settings_file" ]]; then
+      local settings
+      settings=$(cat "$settings_file" 2>/dev/null) || settings=""
+      local echo_block
+      echo_block=$(printf '%s' "$settings" | jq -c '.echo // empty' 2>/dev/null) || echo_block=""
+      [[ -n "$echo_block" ]] && _ECHO_CONFIG_JSON="$echo_block"
+    fi
+  fi
+}
+
+# Get a single scalar value. Checks settings.json first, then plugin config.json.
+echo_config_get() {
+  local key="$1"
+
+  if [[ -n "$_ECHO_CONFIG_JSON" ]]; then
+    local val
+    val=$(printf '%s' "$_ECHO_CONFIG_JSON" | jq -r "${key} // empty" 2>/dev/null) || val=""
+    [[ -n "$val" && "$val" != "null" ]] && { printf '%s' "$val"; return 0; }
+  fi
+
+  if [[ -n "$_ECHO_PLUGIN_CONFIG_JSON" ]]; then
+    local val
+    val=$(printf '%s' "$_ECHO_PLUGIN_CONFIG_JSON" | jq -r ".echo${key} // empty" 2>/dev/null) || val=""
+    [[ -n "$val" && "$val" != "null" ]] && { printf '%s' "$val"; return 0; }
+  fi
+}
+
+echo_config_get_json() {
+  local key="$1"
+
+  if [[ -n "$_ECHO_CONFIG_JSON" ]]; then
+    local val
+    val=$(printf '%s' "$_ECHO_CONFIG_JSON" | jq -c "${key} // empty" 2>/dev/null) || val=""
+    [[ -n "$val" && "$val" != "null" && "$val" != "empty" ]] && { printf '%s' "$val"; return 0; }
+  fi
+
+  if [[ -n "$_ECHO_PLUGIN_CONFIG_JSON" ]]; then
+    local val
+    val=$(printf '%s' "$_ECHO_PLUGIN_CONFIG_JSON" | jq -c ".echo${key} // empty" 2>/dev/null) || val=""
+    [[ -n "$val" && "$val" != "null" && "$val" != "empty" ]] && { printf '%s' "$val"; return 0; }
+  fi
+}
+
+echo_config_enabled() {
+  local val
+  val=$(echo_config_get '.enabled')
+  [[ "$val" == "true" ]]
+}
+
+echo_config_model() {
+  local val
+  val=$(echo_config_get '.evaluation.model')
+  printf '%s' "${val:-claude-haiku-4-5-20251001}"
+}
+
+echo_config_timeout() {
+  local val
+  val=$(echo_config_get '.evaluation.timeout_seconds')
+  printf '%s' "${val:-60}"
+}
+
+echo_config_drift_threshold() {
+  local val
+  val=$(echo_config_get '.drift_threshold')
+  printf '%s' "${val:-0.05}"
+}
+
+# Prints newline-separated list of watch glob patterns.
+echo_config_watch_paths() {
+  local raw
+  raw=$(echo_config_get_json '.watch_paths')
+  if [[ -n "$raw" ]]; then
+    printf '%s' "$raw" | jq -r '.[]' 2>/dev/null
+  else
+    printf 'plugins/*/agents/*.md\n'
+  fi
+}
+
+# Prints newline-separated list of exclude glob patterns.
+echo_config_exclude_paths() {
+  local raw
+  raw=$(echo_config_get_json '.exclude_paths')
+  if [[ -n "$raw" ]]; then
+    printf '%s' "$raw" | jq -r '.[]' 2>/dev/null
+  fi
+  # Always exclude Echo's own tree — hardcoded, not overridable.
+  printf 'plugins/echo/**\n'
+}
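
Taken together, `echo_config_get` and wrappers such as `echo_config_model` resolve each key in three layers: the `echo` block of the project's `.claude/settings.json`, then the `.echo` object in the plugin's own `config.json`, then a hard-coded default. The sketch below models only that precedence in plain bash, with literal strings standing in for the jq lookups; `first_set` and the sample values are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical model of Echo's config precedence. Plain strings stand in
# for the jq results from each layer; nothing here calls jq.

# Print the first argument that is non-empty and not the literal "null",
# mirroring the [[ -n "$val" && "$val" != "null" ]] checks in echo_config_get.
first_set() {
  local v
  for v in "$@"; do
    if [[ -n "$v" && "$v" != "null" ]]; then
      printf '%s' "$v"
      return 0
    fi
  done
  return 1
}

project_model=""                          # assumed: settings.json sets no echo.evaluation.model
plugin_model="null"                       # assumed: plugin config.json leaves it null
default_model="claude-haiku-4-5-20251001" # hard-coded default from echo_config_model

first_set "$project_model" "$plugin_model" "$default_model"
# prints claude-haiku-4-5-20251001
```

One consequence worth checking in the real functions: jq's alternative operator `a // b` treats `false` the same as `null`, so `jq -r '.enabled // empty'` prints nothing for `{"enabled": false}`. An explicit `"enabled": false` in a project's settings therefore falls through to whatever the plugin's `config.json` says; a filter such as `.enabled | select(. != null)` would keep the `false`.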

package/plugins/echo/scripts/lib/echo-events.sh
@@ -0,0 +1,74 @@
+#!/usr/bin/env bash
+# Canonical echo.* event emission.
+# Thin wrapper around the ecosystem plugin's onlooker-event.mjs `emit` mode.
+
+_ECHO_PLUGIN_NAME="echo"
+
+_echo_event_js_path() {
+  if [[ -n "${_ONLOOKER_EVENT_JS:-}" && -f "$_ONLOOKER_EVENT_JS" ]]; then
+    printf '%s' "$_ONLOOKER_EVENT_JS"
+    return 0
+  fi
+  local plugin_root="${CLAUDE_PLUGIN_ROOT:-}"
+  local candidates=(
+    "${plugin_root}/scripts/lib/onlooker-event.mjs"
+    "${plugin_root}/../../scripts/lib/onlooker-event.mjs"
+  )
+  local c
+  for c in "${candidates[@]}"; do
+    [[ -f "$c" ]] && { printf '%s' "$c"; return 0; }
+  done
+  return 1
+}
+
+_echo_session_id() {
+  if [[ -n "${_HOOK_SESSION_ID:-}" ]]; then
+    printf '%s' "$_HOOK_SESSION_ID"
+    return 0
+  fi
+  if [[ -n "${CLAUDE_SESSION_ID:-}" ]]; then
+    printf '%s' "$CLAUDE_SESSION_ID"
+    return 0
+  fi
+  printf 'unknown'
+}
+
+echo_emit_event() {
+  local event_type="${1:-}"
+  local payload="${2:-}"
+  [[ -z "$event_type" || -z "$payload" ]] && return 1
+
+  local event_js
+  event_js=$(_echo_event_js_path) || {
+    printf 'echo-events: cannot locate onlooker-event.mjs\n' >&2
+    return 1
+  }
+
+  local session_id
+  session_id=$(_echo_session_id)
+
+  local params
+  params=$(jq -n \
+    --arg plugin "$_ECHO_PLUGIN_NAME" \
+    --arg sid "$session_id" \
+    --arg type "$event_type" \
+    --argjson payload "$payload" \
+    '{plugin: $plugin, session_id: $sid, event_type: $type, payload: $payload}')
+
+  local event stderr_file
+  stderr_file=$(mktemp -t echo-event-err.XXXXXX 2>/dev/null) || stderr_file="/tmp/echo-event-err.$$"
+  event=$(printf '%s' "$params" \
+    | ONLOOKER_DIR="${ONLOOKER_DIR:-$HOME/.onlooker}" \
+      ONLOOKER_PLUGIN_NAME="$_ECHO_PLUGIN_NAME" \
+      node "$event_js" emit 2>"$stderr_file") || {
+    printf 'echo-events: schema validation failed for %s\n' "$event_type" >&2
+    [[ -s "$stderr_file" ]] && cat "$stderr_file" >&2
+    rm -f "$stderr_file"
+    return 1
+  }
+  rm -f "$stderr_file"
+
+  local log_path="${ONLOOKER_EVENTS_LOG:-${ONLOOKER_DIR:-$HOME/.onlooker}/logs/onlooker-events.jsonl}"
+  mkdir -p "$(dirname "$log_path")" 2>/dev/null || return 1
+  printf '%s\n' "$event" >>"$log_path"
+}
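
`_echo_event_js_path` resolves `onlooker-event.mjs` in a fixed order: the `_ONLOOKER_EVENT_JS` override wins only if that file exists, otherwise the first existing candidate under `CLAUDE_PLUGIN_ROOT` is used, and the function fails if none exists. A minimal sketch of the same pattern against a throwaway directory that mimics this package's layout (Echo under `plugins/echo/`, shared helpers under `scripts/lib/`); `resolve_event_js` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the override-then-candidates lookup
# used by _echo_event_js_path.

resolve_event_js() {
  local override="$1"; shift
  # 1. An explicit override is honoured only if the file actually exists.
  if [[ -n "$override" && -f "$override" ]]; then
    printf '%s' "$override"
    return 0
  fi
  # 2. Otherwise the first existing candidate wins.
  local c
  for c in "$@"; do
    [[ -f "$c" ]] && { printf '%s' "$c"; return 0; }
  done
  # 3. Nothing found: signal failure, as the real function does.
  return 1
}

# Throwaway layout: plugins/echo has no local copy, scripts/lib does.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/plugins/echo" "$root/scripts/lib"
touch "$root/scripts/lib/onlooker-event.mjs"
plugin_root="$root/plugins/echo"

resolve_event_js "" \
  "$plugin_root/scripts/lib/onlooker-event.mjs" \
  "$plugin_root/../../scripts/lib/onlooker-event.mjs"
# prints the second candidate, "$plugin_root/../../scripts/lib/onlooker-event.mjs"
```

Checking `-f` on the override rather than trusting it blindly means a stale `_ONLOOKER_EVENT_JS` degrades to the bundled copy instead of failing the emit.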

package/plugins/echo/scripts/lib/echo-project-key.sh
@@ -0,0 +1,81 @@
+#!/usr/bin/env bash
+# Project key derivation for Echo.
+# Mirrors archivist/tribunal: stable 12-char hex key derived from the git remote
+# or repo root, surviving renames, clones, and worktrees.
+
+_echo_sha256_first12() {
+  local input="$1"
+  if command -v shasum >/dev/null 2>&1; then
+    printf '%s' "$input" | shasum -a 256 2>/dev/null | cut -c1-12
+  elif command -v sha256sum >/dev/null 2>&1; then
+    printf '%s' "$input" | sha256sum 2>/dev/null | cut -c1-12
+  else
+    return 1
+  fi
+}
+
+echo_project_remote_url() {
+  local cwd="${1:-}"
+  [[ -z "$cwd" || ! -d "$cwd" ]] && return 0
+  git -C "$cwd" remote get-url origin 2>/dev/null || true
+}
+
+echo_project_repo_root() {
+  local cwd="${1:-}"
+  [[ -z "$cwd" || ! -d "$cwd" ]] && return 0
+
+  if ! git -C "$cwd" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
+    return 0
+  fi
+
+  local common_dir toplevel
+  common_dir=$(git -C "$cwd" rev-parse --git-common-dir 2>/dev/null) || return 0
+
+  if [[ -n "$common_dir" && "$common_dir" != /* ]]; then
+    common_dir="$(cd "$cwd" && cd "$common_dir" 2>/dev/null && pwd -P)" || common_dir=""
+  fi
+
+  if [[ -n "$common_dir" && -d "$common_dir" ]]; then
+    toplevel="$(cd "$common_dir/.." 2>/dev/null && pwd -P)" || toplevel=""
+  fi
+
+  if [[ -z "$toplevel" ]]; then
+    toplevel=$(git -C "$cwd" rev-parse --show-toplevel 2>/dev/null || true)
+    [[ -n "$toplevel" ]] && toplevel="$(cd "$toplevel" 2>/dev/null && pwd -P)"
+  fi
+
+  printf '%s' "$toplevel"
+}
+
+echo_project_key() {
+  local cwd="${1:-}"
+  [[ -z "$cwd" ]] && cwd="$(pwd)"
+
+  local remote
+  remote=$(echo_project_remote_url "$cwd")
+  if [[ -n "$remote" ]]; then
+    _echo_sha256_first12 "remote:$remote"
+    return 0
+  fi
+
+  local root
+  root=$(echo_project_repo_root "$cwd")
+  if [[ -n "$root" ]]; then
+    _echo_sha256_first12 "root:$root"
+    return 0
+  fi
+
+  return 0
+}
+
+# Stable test_id for a file path: first 16 chars of sha256 of the path.
+echo_test_id_for_path() {
+  local path="$1"
+  if command -v shasum >/dev/null 2>&1; then
+    printf '%s' "$path" | shasum -a 256 2>/dev/null | cut -c1-16
+  elif command -v sha256sum >/dev/null 2>&1; then
+    printf '%s' "$path" | sha256sum 2>/dev/null | cut -c1-16
+  else
+    printf '%s' "$path" | od -A n -t x1 | tr -d ' \n' | cut -c1-16
+  fi
+}
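
The key is the first 12 hex characters of a SHA-256 over a namespaced string: `remote:<origin URL>` when an `origin` remote exists, otherwise `root:<realpath of the main worktree>`. The prefix keeps the two sources from ever hashing the same input. A minimal sketch of just the hashing step, assuming `sha256sum` or `shasum` is on `PATH`; `project_key_for` is a hypothetical name that takes the remote and root as arguments instead of querying git.

```bash
#!/usr/bin/env bash
# Hypothetical stand-alone version of the hashing step in echo_project_key.

# Read stdin, print the 64-char lowercase hex SHA-256 digest.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -c1-64
  else
    shasum -a 256 | cut -c1-64
  fi
}

# $1 = origin URL (may be empty), $2 = repo root realpath (fallback).
project_key_for() {
  local remote="$1" root="$2"
  if [[ -n "$remote" ]]; then
    printf '%s' "remote:$remote" | sha256_hex | cut -c1-12
  elif [[ -n "$root" ]]; then
    printf '%s' "root:$root" | sha256_hex | cut -c1-12
  fi
  # Neither source available: print nothing, like echo_project_key.
}

project_key_for "git@github.com:onlooker-community/ecosystem.git" ""
```

Twelve hex characters carry 48 bits, so by the birthday bound (roughly n²/2⁴⁹) even 10,000 distinct projects collide with probability below one in a million. Because the URL is hashed verbatim, an SSH clone and an HTTPS clone of the same repository get different keys.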

package/plugins/echo/scripts/lib/echo-ulid.sh
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+# Minimal ULID generator for Echo suite_id and test_id values.
+# Crockford Base32, lexicographically sortable, time-ordered.
+
+_ECHO_ULID_ALPHABET="0123456789ABCDEFGHJKMNPQRSTVWXYZ"
+
+_echo_ulid_encode() {
+  local n="$1"
+  local len="$2"
+  local out=""
+  local i
+  for ((i = 0; i < len; i++)); do
+    out="${_ECHO_ULID_ALPHABET:$((n % 32)):1}${out}"
+    n=$((n / 32))
+  done
+  printf '%s' "$out"
+}
+
+echo_ulid() {
+  local now_ms
+  if [[ "$(uname)" == "Darwin" ]]; then
+    now_ms=$(python3 -c 'import time; print(int(time.time() * 1000))' 2>/dev/null) \
+      || now_ms=$(($(date +%s) * 1000))
+  else
+    now_ms=$(date +%s%3N 2>/dev/null) || now_ms=$(($(date +%s) * 1000))
+  fi
+
+  local rand_hex rand_hi rand_lo
+  rand_hex=$(openssl rand -hex 10 2>/dev/null)
+  if [[ -n "$rand_hex" && ${#rand_hex} -eq 20 ]]; then
+    rand_hi=$((16#${rand_hex:0:10}))
+    rand_lo=$((16#${rand_hex:10:10}))
+  else
+    rand_hi=$((RANDOM * 32768 + RANDOM))
+    rand_lo=$((RANDOM * 32768 + RANDOM))
+    rand_hi=$(((rand_hi * 256 + RANDOM % 256) & ((1 << 40) - 1)))
+    rand_lo=$(((rand_lo * 256 + RANDOM % 256) & ((1 << 40) - 1)))
+  fi
+
+  local ts_part hi_part lo_part
+  ts_part=$(_echo_ulid_encode "$now_ms" 10)
+  hi_part=$(_echo_ulid_encode "$rand_hi" 8)
+  lo_part=$(_echo_ulid_encode "$rand_lo" 8)
+
+  printf '%s%s%s' "$ts_part" "$hi_part" "$lo_part"
+}
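
`_echo_ulid_encode` writes a number as fixed-width Crockford Base32, most significant digit first: 10 characters (50 bits) hold the 48-bit millisecond timestamp, and each 40-bit random half fills exactly 8 characters, for 26 characters in total. Because the alphabet is in ascending ASCII order and the width is fixed, string order matches numeric order, which is what makes the IDs time-sortable. The sketch below pairs a copy of the encoder with a hypothetical `ulid_decode` to show the round trip.

```bash
#!/usr/bin/env bash
# Sketch: fixed-width Crockford Base32 as used for the ULID timestamp.
# ulid_encode mirrors _echo_ulid_encode; ulid_decode is hypothetical and
# assumes its input contains only characters from ALPHABET.

ALPHABET="0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Encode non-negative integer $1 as exactly $2 digits, left-padded with '0'.
ulid_encode() {
  local n="$1" len="$2" out="" i
  for ((i = 0; i < len; i++)); do
    out="${ALPHABET:$((n % 32)):1}${out}"
    n=$((n / 32))
  done
  printf '%s' "$out"
}

# Decode back to an integer: each digit's value is its index in ALPHABET.
ulid_decode() {
  local s="$1" n=0 i c prefix
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    prefix="${ALPHABET%%"$c"*}"   # text before the digit; its length is the index
    n=$((n * 32 + ${#prefix}))
  done
  printf '%s' "$n"
}

ts=$(ulid_encode 1700000000000 10)   # a millisecond timestamp from November 2023
echo "$ts"
ulid_decode "$ts"                    # prints 1700000000000
```

When `openssl` is unavailable, each random half in `echo_ulid` is assembled from three `$RANDOM` draws (15 + 15 + 8 bits), so it carries 38 random bits rather than 40.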

package/plugins/tribunal/.claude-plugin/plugin.json
@@ -0,0 +1,20 @@
+{
+  "name": "tribunal",
+  "version": "1.0.0",
+  "description": "Multi-agent execution with LLM-as-a-Judge quality gates. An Actor performs work; a jury of typed Judges scores it against a project-overridable rubric; a Meta-Judge reviews the jury for bias; the gate decides accept, retry, or exhaust. Grounded in LLM-as-a-Judge (Zheng et al. 2023) and LLM-as-a-Meta-Judge (Wu et al. 2024). Builds on the Onlooker ecosystem plugin.",
+  "author": {
+    "name": "Onlooker Community",
+    "url": "https://onlooker.dev"
+  },
+  "homepage": "https://onlooker.dev",
+  "repository": "https://github.com/onlooker-community/ecosystem",
+  "license": "MIT",
+  "skills": ["./skills/tribunal"],
+  "agents": [
+    "./agents/tribunal-actor.md",
+    "./agents/tribunal-judge-standard.md",
+    "./agents/tribunal-judge-security.md",
+    "./agents/tribunal-judge-adversarial.md",
+    "./agents/tribunal-meta-judge.md"
+  ]
+}

package/plugins/tribunal/CHANGELOG.md
@@ -0,0 +1,10 @@
+# Changelog
+
+## 1.0.0 (2026-05-24)
+
+
+### Features
+
+* **tribunal:** add multi-agent code review plugin :sparkles: ([#30](https://github.com/onlooker-community/ecosystem/issues/30)) ([893f24a](https://github.com/onlooker-community/ecosystem/commit/893f24a8876fdd6ccb5c7dcf2636a7c902e88949))
+
+## Changelog

package/plugins/tribunal/README.md
@@ -0,0 +1,134 @@
+# Tribunal
+
+Multi-agent execution with LLM-as-a-Judge quality gates.
+
+Tribunal wraps a task in a three-tier evaluation loop:
+
+1. An **Actor** subagent performs the work.
+2. A **jury** of typed **Judges** scores the output against a rubric.
+3. A **Meta-Judge** reviews the jury for bias, hallucination, and criteria misapplication.
+4. A configurable **gate policy** decides whether to accept, retry, or give up.
+
+Grounded in two papers:
+
+- [LLM-as-a-Judge (Zheng et al. 2023)](https://arxiv.org/abs/2306.05685) — strong LLMs can score other LLMs against rubrics with reasonable agreement to human judgment.
+- [LLM-as-a-Meta-Judge (Wu et al. 2024)](https://arxiv.org/abs/2407.19594) — a second model reviewing the Judge catches position, verbosity, and self-enhancement bias.
+
+Tribunal is a sibling plugin to [`ecosystem`](../../) and assumes the Onlooker observability substrate (`~/.onlooker/`) is present.
+
+## How it works
+
+| Surface | What tribunal does |
+|---|---|
+| `/tribunal <task>` skill | Orchestrates a full Actor → Jury → Meta → Gate loop, retrying the Actor with Judge critiques until the gate passes or `max_iterations` is reached. Emits the full canonical event stream. |
+| `Stop` hook (opt-in) | When `tribunal.stop_hook.enabled` is true, runs a single advisory pass on the just-finished turn's output and writes a verdict for review on the next session. No retry — the main session has already ended. |
+
+## Default jury
+
+Out of the box, Tribunal empanels **two judges** to showcase the jury model without the cost of a full panel:
+
+- `tribunal-judge-standard` — correctness, completeness, clarity.
+- `tribunal-judge-adversarial` — devil's advocate, actively looks for failure modes and unhandled edges.
+
+The gate uses `majority` policy with `weighted_mean` aggregation, so one strong reject does not automatically block. `tribunal-judge-security` is shipped but off by default — opt in for security-sensitive repos by adding `"security"` to `judge_types`.
+
+## Configuration
+
+Tribunal is enabled by default; the Stop hook is opt-in. Override per-project in your project's `.claude/settings.json`:
+
+```json
+{
+  "tribunal": {
+    "session": {
+      "judge_types": ["standard", "security", "adversarial"],
+      "gate_policy": "majority",
+      "max_iterations": 5
+    },
+    "stop_hook": { "enabled": true }
+  }
+}
+```
+
+The full default `config.json` is the source of truth for available knobs.
+
+### Project rubric override
+
+Drop a `rubrics` file at `<repo>/.claude/tribunal.json` (or globally at `~/.onlooker/tribunal.json`) to override the built-in `default` rubric or add named rubrics referenced as `/tribunal --rubric=<id>`:
+
+```json
+{
+  "rubrics": [
+    {
+      "id": "default",
+      "criteria": [
+        { "name": "correctness", "weight": 0.5, "min_pass": 0.8 },
+        { "name": "tests", "weight": 0.3, "min_pass": 0.7 },
+        { "name": "docs", "weight": 0.2, "min_pass": 0.5 }
+      ],
+      "score_threshold": 0.8,
+      "max_iterations": 5,
+      "judge_types": ["standard", "security", "adversarial"],
+      "gate_policy": "majority",
+      "aggregation_method": "weighted_mean"
+    }
+  ]
+}
+```
+
+Project rubrics override built-ins by `id`.
+
+## Subagents
+
+| Agent | `judge_type` | Role |
+|---|---|---|
+| `tribunal-actor` | n/a | Performs the task. Receives prior iteration's verdicts on retries. |
+| `tribunal-judge-standard` | `standard` | General correctness, completeness, clarity. |
+| `tribunal-judge-security` | `security` | Vulnerability-focused: injection, auth bypass, data exposure. |
+| `tribunal-judge-adversarial` | `adversarial` | Actively tries to find failure modes and missing edge cases. |
+| `tribunal-meta-judge` | `meta` | Reviews each Judge's verdict for the six bias types defined in the LLM-as-a-Judge paper. |
+
+`maintainability` and `domain` judge types are recognized in config but not yet shipped as subagents; they degrade to `standard` with a warning. They are planned for v0.2.
+
+## Storage layout
+
+```text
+~/.onlooker/tribunal/<project-key>/
+├── manifest.json
+└── <task_id>/ # ULID
+    ├── manifest.json
+    ├── session-start.json
+    ├── session-complete.json
+    └── iteration-<iteration_id>/ # ULID per iteration
+        ├── actor.md
+        ├── jury.json
+        ├── verdicts/
+        │   └── <judge_id>.json # one per judge
+        ├── consensus.json
+        ├── dissent.json # only when emitted
+        ├── meta.json
+        └── gate.json
+```
+
+Project keying mirrors `archivist`: SHA256 of `git remote get-url origin` (first 12 hex), falling back to a hash of the repo root realpath. Worktrees of the same repo share a key.
+
+## Events emitted
+
+Tribunal emits the full canonical `tribunal.*` event surface from [`@onlooker-community/schema`](https://github.com/onlooker-community/schema) (v2.1.0+):
+
+`session.start`, `iteration.start`, `actor.start`, `actor.complete`, `jury.empaneled`, `judge.start`, `verdict` (one per judge), `meta.start`, `meta.complete`, `consensus.reached`, `dissent.recorded` (when judges disagree), `gate.passed` / `gate.blocked`, `session.complete`.
+
+All events land in `~/.onlooker/logs/onlooker-events.jsonl` and are validated against the schema before write.
+
+## Requirements
+
+- The `ecosystem` plugin installed (for `~/.onlooker/` substrate).
+- `claude` CLI on `PATH` (the Stop hook shells out to `claude -p` for its advisory pass).
+- `jq` for JSON manipulation.
+- `node` for canonical-event emission (the ecosystem plugin already requires this).
+
+## Architecture decisions
+
+Key decisions made during initial design are recorded in [`docs/adr/`](docs/adr/):
+
+- [ADR-001](docs/adr/001-actor-jury-meta-gate-loop.md) — The Actor → Jury → Meta-Judge → Gate loop
+- [ADR-002](docs/adr/002-majority-gate-policy.md) — Majority gate policy as default (and the 2-judge edge case)
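
The README's example rubric pairs per-criterion `weight` and `min_pass` values with an overall `score_threshold`. One plausible reading, treating `min_pass` as a per-criterion floor and `score_threshold` as the bar for the weighted mean, is worked through below. The real aggregation and gate logic live in `tribunal-aggregate.sh` and `tribunal-gate.sh`, which this excerpt does not show, so `rubric_check` and its scores are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical worked example: weighted mean over rubric criteria, with
# min_pass as a per-criterion floor and score_threshold for the mean.
# Input lines: name weight min_pass score. Weights and floors come from
# the README rubric; the scores are made up for illustration.

rubric_check() {
  awk -v threshold="$1" '
    {
      sum_w  += $2
      sum_ws += $2 * $4
      if ($4 < $3) failed = failed " " $1   # score below this criterion floor
    }
    END {
      mean = sum_ws / sum_w
      verdict = (mean >= threshold && failed == "") ? "pass" : "fail"
      printf "weighted_mean=%.2f verdict=%s below_min:%s\n", mean, verdict, (failed == "" ? " none" : failed)
    }'
}

rubric_check 0.8 <<'EOF'
correctness 0.5 0.8 0.9
tests       0.3 0.7 0.7
docs        0.2 0.5 0.6
EOF
# prints: weighted_mean=0.78 verdict=fail below_min: none
```

Under this reading a response can clear every floor and still fail on the combined score, as the sample does: 0.5 × 0.9 + 0.3 × 0.7 + 0.2 × 0.6 = 0.78, below the 0.8 threshold.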

package/plugins/tribunal/agents/tribunal-actor.md
@@ -0,0 +1,35 @@
+---
+name: tribunal-actor
+description: Performs a task end-to-end under Tribunal supervision. Receives the task description and, on retry iterations, the prior iteration's jury verdicts and Meta-Judge feedback. Output is the work itself (code changes, an analysis, a refactor plan) rendered as the final assistant message — no JSON wrapping, no scoring; the Judges do that next.
+model: claude-sonnet-4-6
+tools: Read, Edit, Write, Bash, Grep, Glob
+---
+
+# Tribunal Actor
+
+You are the **Actor** in a Tribunal evaluation loop. Your job is to do the work the user asked for. A jury of Judges will score your output against a rubric, and a Meta-Judge will review the jury before the gate decides whether to accept, retry, or give up.
+
+## Inputs
+
+You will receive:
+
+- **Task description** — what to do.
+- **Rubric criteria** — the dimensions the Judges will score on (e.g., correctness, completeness, safety, clarity). Use these as a checklist while you work; they tell you what "good" looks like for this task.
+- **(On retries only) Prior iteration's feedback** — a digest of the Judges' verdicts and any Meta-Judge override or bias notes. Address the specific concerns; do not re-litigate scores.
+
+## Output expectations
+
+- Render your work as the final assistant message — code, edits, an analysis, a plan, whatever the task calls for.
+- Be concrete. Vague directional answers score poorly on `completeness` and `clarity`.
+- When you make a non-obvious choice, state the trade-off in one line. Judges credit this under `correctness` and `clarity`; they penalize unexplained guesses.
+- Do not score yourself. Do not write a "self-review." The Judges will do that.
+
+## What to avoid
+
+- Stalling. If you cannot complete a step, say so explicitly and proceed with what you can finish — partial work that names its gaps scores better than fabricated completeness.
+- Over-engineering. If the task is a one-line fix, give a one-line fix. Adding scaffolding hurts `clarity` and may trip the `adversarial` Judge.
+- Padding. Verbosity is a known judge bias the Meta-Judge will flag against you. Say what needs saying and stop.
+
+## On retry
+
+When you see prior verdicts, treat the lowest-scoring criterion as the priority. If the Meta-Judge flagged `bias_detected`, you can ignore the bias-affected critique on that dimension — but address every concern the Meta-Judge endorsed (`verdict_quality: sound`).

package/plugins/tribunal/agents/tribunal-judge-adversarial.md
@@ -0,0 +1,51 @@
+---
+name: tribunal-judge-adversarial
+description: Devil's-advocate Tribunal judge. Actively tries to break the Actor's work — edge cases, empty inputs, concurrent callers, partial failures, version drift, assumptions that are not stated. Pairs well with tribunal-judge-standard to balance optimistic and pessimistic scoring. Emits TribunalVerdictPayload as the final message. Read-only (Bash allowed only to run existing test suites — do not modify code).
+model: claude-opus-4-7
+tools: Read, Grep, Glob, Bash
+---
+
+# Tribunal Adversarial Judge
+
+You are the **Adversarial Judge** in a Tribunal jury. Your job is to try, in good faith, to falsify the Actor's claim that the work is correct. The Standard Judge looks for what is right; you look for what could break.
+
+## Your stance
+
+- Assume the Actor missed something. Prove or disprove it before scoring.
+- You may run existing tests (`Bash`) to confirm or refute Actor claims. Do not write new tests or modify code — read-only stance.
+- You may not invent constraints the task did not impose. The Meta-Judge will flag that as `position` or `verbosity` bias and downweight you.
+
+## What to probe
+
+- **Empty / null / boundary inputs** — does the code handle `[]`, `""`, `0`, `None`, very long inputs?
+- **Concurrent callers** — race on a file lock, on a shared global, on an outer cache.
+- **Partial failures** — what if step 2 of 3 fails — is state left half-written?
+- **Unstated assumptions** — does the code assume sorted input? Timezone-naive timestamps? `LC_ALL=C`? A specific shell?
+- **Version drift** — does it use a flag added in a recent version of a tool? Will it work on the older versions documented as supported?
+- **Idempotency** — what happens on a second run?
+- **Reverse engineering the test** — can you produce an input that satisfies the test but breaks the spirit of the task?
+
+## Scoring discipline
+
+- Each concrete falsification (a reproducible failure or a clear, named gap) drops the score by `0.15`, floor `0.10`.
+- A single vague "this might fail" is worth `0.0` — name the input or do not raise it.
+- If you genuinely cannot falsify, score `0.85+` and say so. Refusing to ever give a high score is `refusal` bias and the Meta-Judge will flag it.
+
+## Output format
+
+Final message is a single JSON object — no prose, no fence:
+
+```json
+{
+  "score": 0.55,
+  "passed": false,
+  "judge_type": "adversarial",
+  "criteria_evaluated": ["edge-cases", "concurrency", "idempotency"],
+  "strengths_count": 1,
+  "weaknesses_count": 2,
+  "confidence": 0.8,
+  "feedback_summary": "Reproduced: empty input array raises IndexError at parse.py:42 instead of returning []. Second run of the migration script duplicates rows — not idempotent. Concurrency story is fine, single-process by design."
+}
+```
+
+`feedback_summary` should describe each falsification with enough specificity that the Actor can reproduce it on retry.
@@ -0,0 +1,47 @@
---
name: tribunal-judge-security
description: Security-focused Tribunal judge. Scores Actor output through a vulnerability lens — injection, auth, secrets, unsafe shell, path traversal, deserialization, SSRF, race conditions on shared resources. Off by default; opt in by adding "security" to judge_types for security-sensitive code. Emits TribunalVerdictPayload as the final message. Read-only.
model: claude-opus-4-7
tools: Read, Grep, Glob
---

# Tribunal Security Judge

You are the **Security Judge** in a Tribunal jury. Score the Actor's output exclusively through a security lens. If the change has no security surface, score `correctness` neutrally (around `0.75`) with a short note; do not invent issues to justify your presence.

## What to look for

- **Injection** — SQL/command/shell/template/LDAP. Anything that builds a query or command from user input.
- **AuthN/AuthZ** — bypasses, missing checks, privilege escalation, session handling, token leakage.
- **Secrets handling** — credentials in logs, env vars echoed to stdout, secrets committed to disk.
- **Unsafe shell** — `eval`, unquoted expansions, `rm -rf $VAR` without validation, `curl | bash` patterns.
- **Path traversal** — unconstrained `../` paths, symlink chasing, missing realpath validation.
- **Deserialization** — `pickle`, unsafe YAML, `JSON.parse` of untrusted input feeding `eval`.
- **SSRF / open redirects** — fetches whose target derives from user input.
- **TOCTOU** and races on shared resources, especially around files and locks.

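For the path-traversal item, "missing realpath validation" usually means the code joins a user-supplied path onto a base directory without checking where the result actually lands. The Python sketch below shows the kind of check a judge would expect to find. It assumes a single trusted base directory, and `resolve_within` is a hypothetical name. Resolving both sides with `os.path.realpath` follows symlinks. `os.path.commonpath` compares whole path components, so a sibling such as `/srv/app-data` is not mistaken for a child of `/srv/app`.

```python
import os
import tempfile


def resolve_within(base: str, user_path: str) -> str:
    """Return the real path of user_path under base, or raise ValueError if it escapes."""
    base_real = os.path.realpath(base)
    # os.path.join discards base_real when user_path is absolute; the check below still catches it.
    candidate = os.path.realpath(os.path.join(base_real, user_path))
    # commonpath works on whole components, unlike a plain startswith() prefix check.
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError(f"path escapes {base_real!r}: {user_path!r}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(resolve_within(root, "notes/today.md"))
        for attempt in ["../etc/passwd", "/etc/passwd", "a/../../x"]:
            try:
                resolve_within(root, attempt)
            except ValueError as err:
                print("rejected:", err)
```
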
## Scoring discipline

- A single critical finding (RCE, auth bypass, secret leak) caps `score` at `0.3` regardless of other dimensions.
- Multiple medium findings cap at `0.6`.
- Read the changed files. Do not score from the summary.
- Do not flag style issues, and do not raise hypothetical "could be exploited if…" concerns without a concrete attack chain. The Meta-Judge will mark you as `biased` if you over-report.

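The two caps act as a ceiling on whatever score the other dimensions would otherwise earn. They never raise a score. The sketch below is minimal and assumes findings have already been triaged into critical and medium counts. `apply_security_caps` and its count parameters are hypothetical, and "multiple" is read here as two or more.

```python
CRITICAL_CAP = 0.3  # any RCE, auth bypass, or secret leak
MEDIUM_CAP = 0.6    # two or more medium findings


def apply_security_caps(raw_score: float, critical: int, medium: int) -> float:
    """Clamp raw_score to the caps triggered by the findings; caps only ever lower a score."""
    score = raw_score
    if critical >= 1:
        score = min(score, CRITICAL_CAP)
    if medium >= 2:
        score = min(score, MEDIUM_CAP)
    return score


if __name__ == "__main__":
    print(apply_security_caps(0.9, critical=1, medium=0))  # 0.3
    print(apply_security_caps(0.9, critical=0, medium=2))  # 0.6
    print(apply_security_caps(0.5, critical=0, medium=3))  # 0.5 (already under the cap)
    print(apply_security_caps(0.9, critical=0, medium=1))  # 0.9 (one medium finding: no cap)
```
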
## Output format

Final message is a single JSON object — no prose, no fence:

```json
{
  "score": 0.3,
  "passed": false,
  "judge_type": "security",
  "criteria_evaluated": ["injection", "secrets", "path-traversal"],
  "strengths_count": 1,
  "weaknesses_count": 2,
  "confidence": 0.9,
  "feedback_summary": "scripts/run.sh:24 passes $USER_INPUT to a shell without quoting → command injection. scripts/run.sh:31 logs the API token. Other dimensions clean."
}
```

When `passed: false`, every finding in `feedback_summary` must point at a file and (when possible) a line. Vague security objections waste the Actor's retry budget.

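One way an orchestrator could enforce the file-and-line requirement is a cheap pre-check on `feedback_summary` before it accepts a failing verdict. The sketch below is hypothetical. The regex (`path/with.ext:LINE`) and the names `cited_locations` and `has_actionable_findings` are assumptions, not part of the Tribunal schema. A summary that cites a file without a line number would need a looser pattern.

```python
import re
from typing import Any, Dict, List

# A file path with an extension, then a colon and a line number, e.g. scripts/run.sh:24.
LOCATION = re.compile(r"[\w./-]+\.\w+:\d+")


def cited_locations(feedback_summary: str) -> List[str]:
    """Return every file:line reference found in a verdict's feedback_summary."""
    return LOCATION.findall(feedback_summary)


def has_actionable_findings(verdict: Dict[str, Any]) -> bool:
    """Passing verdicts need no locations; failing ones must cite at least one."""
    if verdict.get("passed", False):
        return True
    return bool(cited_locations(verdict.get("feedback_summary", "")))


if __name__ == "__main__":
    summary = (
        "scripts/run.sh:24 passes $USER_INPUT to a shell without quoting (command injection). "
        "scripts/run.sh:31 logs the API token. Other dimensions clean."
    )
    print(cited_locations(summary))  # ['scripts/run.sh:24', 'scripts/run.sh:31']
    print(has_actionable_findings({"passed": False, "feedback_summary": "Looks risky."}))  # False
```
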
@@ -0,0 +1,47 @@
---
name: tribunal-judge-standard
description: Scores Actor output against the active rubric on correctness, completeness, and clarity. Emits a single TribunalVerdictPayload JSON object as the final message — no prose around it. Read-only: you evaluate, you do not edit. Designed for the general case (refactors, docs, analysis, most code changes). Use the security or adversarial judge for those specific lenses.
model: claude-opus-4-7
tools: Read, Grep, Glob
---

# Tribunal Standard Judge

You are the **Standard Judge** in a Tribunal jury. Score the Actor's output against the rubric. Be honest, calibrated, and terse.

## Inputs

- **Task description** — what the Actor was asked to do.
- **Rubric** — list of criteria with `name`, `weight`, `min_pass`. Score each criterion in [0, 1].
- **Actor output** — what to evaluate.
- **Score threshold** — the overall bar for `passed: true`.

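The inputs do not say how per-criterion scores combine into one `score`. One common reading is a weight-normalized mean that must clear the score threshold while every criterion also clears its own `min_pass`. The sketch below illustrates that reading; it is an assumption, not the orchestrator's documented rule. `Criterion`, `aggregate`, and the example weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Criterion:
    name: str
    weight: float
    min_pass: float


def aggregate(rubric: List[Criterion], scores: Dict[str, float], threshold: float) -> Tuple[float, bool]:
    """Weighted mean of per-criterion scores in [0, 1], plus a pass/fail decision."""
    for c in rubric:
        if not 0.0 <= scores[c.name] <= 1.0:
            raise ValueError(f"{c.name} score must be in [0, 1]")
    total_weight = sum(c.weight for c in rubric)
    overall = sum(c.weight * scores[c.name] for c in rubric) / total_weight
    # A strong average must not hide one criterion that misses its own floor.
    every_floor_met = all(scores[c.name] >= c.min_pass for c in rubric)
    return round(overall, 2), overall >= threshold and every_floor_met


RUBRIC = [
    Criterion("correctness", weight=0.5, min_pass=0.7),
    Criterion("completeness", weight=0.3, min_pass=0.5),
    Criterion("clarity", weight=0.2, min_pass=0.5),
]

if __name__ == "__main__":
    print(aggregate(RUBRIC, {"correctness": 0.9, "completeness": 0.7, "clarity": 0.8}, threshold=0.7))
    # (0.82, True)
    print(aggregate(RUBRIC, {"correctness": 0.6, "completeness": 1.0, "clarity": 1.0}, threshold=0.7))
    # (0.8, False): correctness is under its min_pass
```
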
## Scoring discipline

- Read the actual files the Actor changed before scoring. Do not score from the Actor's summary alone.
- A `0.7` means "meets the bar." Reserve `0.9+` for clearly excellent work. Reserve `< 0.5` for clearly broken work.
- Calibrate against the rubric, not against an imagined ideal answer. A small task done well scores higher than a sprawling task done halfway.
- Avoid verbosity bias: a long Actor response is not better than a short correct one.

## Output format

Your **final message** must be a single JSON object matching `TribunalVerdictPayload`. No markdown, no prose around it, no code fence — just JSON:

```json
{
  "score": 0.82,
  "passed": true,
  "judge_type": "standard",
  "criteria_evaluated": ["correctness", "completeness", "clarity"],
  "strengths_count": 3,
  "weaknesses_count": 1,
  "confidence": 0.85,
  "feedback_summary": "Patch is correct and minimal. Missing test for the empty-input case. Naming and comments are clear."
}
```

Required fields: `score`, `passed`, `judge_type`. `passed` reflects your own judgment based on the rubric thresholds — the orchestrator may still aggregate and override per gate policy.

`feedback_summary` should be 1–3 sentences. Name specific files and lines when you can. This is what the Actor sees on retry.

The orchestrator will inject `judge_id` and `iteration_id` when persisting your verdict.

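On the receiving side, the required fields and the "no prose, no code fence" rule are easy to check mechanically. The Python sketch below shows such a check. It assumes the orchestrator receives the judge's final message as a string, and `parse_verdict` is a hypothetical name, not the plugin's actual validator. It deliberately does not require `judge_id` or `iteration_id`, since the orchestrator adds those later.

```python
import json
from typing import Any, Dict

REQUIRED_FIELDS = ("score", "passed", "judge_type")


def parse_verdict(final_message: str) -> Dict[str, Any]:
    """Parse a judge's final message as a bare TribunalVerdictPayload JSON object."""
    text = final_message.strip()
    if text.startswith("```"):
        raise ValueError("verdict must be bare JSON, not a fenced code block")
    # Prose before or after the object makes json.loads raise JSONDecodeError, a ValueError subclass.
    payload = json.loads(text)
    if not isinstance(payload, dict):
        raise ValueError("verdict must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    score = payload["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        raise ValueError("score must be a number in [0, 1]")
    if not isinstance(payload["passed"], bool):
        raise ValueError("passed must be a boolean")
    if not isinstance(payload["judge_type"], str):
        raise ValueError("judge_type must be a string")
    return payload


if __name__ == "__main__":
    ok = parse_verdict('{"score": 0.82, "passed": true, "judge_type": "standard"}')
    print(ok["judge_type"])  # standard
    for bad in ['```json\n{"score": 0.8}\n```', 'Verdict: {"score": 0.8}', '{"score": 0.8, "passed": true}']:
        try:
            parse_verdict(bad)
        except ValueError as err:
            print("rejected:", err)
```
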