@kevinrabun/judges 3.20.3 → 3.20.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,15 @@
 
  All notable changes to **@kevinrabun/judges** are documented here.
 
+ ## [3.20.4] — 2026-03-03
+
+ ### Fixed
+ - **Stale documentation counts** — Updated all references across README, docs, server.json, action.yml, package.json, Dockerfile, extension metadata, examples, and scripts from "35 judges" → "37 judges", "47 patches" → "53 patches", and test badge "1515" → "1557". Historical changelog entries left unchanged.
+
+ ### Tests
+ - **Doc-claim verification tests** — Added 42 new tests covering: JUDGES array count assertion (exactly 37), judge schema validation (id, name, domain, description), unique judge ID enforcement, scoring penalty constants (critical=30, high=18, medium=10, low=5, info=2), confidence-weighted deductions, score floor/ceiling, positive signal bonuses (+3/+3/+3/+2/+2/+2/+2/+1/+1/+1 with cap at 15), verdict threshold logic (fail/warning/pass boundaries), and STRUCT threshold rules not previously covered: STRUCT-001 (CC>10), STRUCT-007 (file CC>40), STRUCT-008 (CC>20), STRUCT-010 (>150 lines).
+ - All 1,557 tests pass (976 judges + 218 negative + 251 subsystems + 70 extension + 42 tool-routing)
+
  ## [3.20.3] — 2026-03-03
 
  ### Fixed
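The scoring rules that the new doc-claim tests pin down can be sketched as follows. This is an illustrative model only, assuming hypothetical names (`Severity`, `Finding`, `scoreJudge`) that are not the package's actual API; the constants come from the test list above.

```typescript
// Sketch of the scoring model asserted by the 3.20.4 tests.
// NOTE: names here are hypothetical; only the constants are from the changelog.
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Per-severity deductions asserted by the new tests.
const PENALTY: Record<Severity, number> = {
  critical: 30, high: 18, medium: 10, low: 5, info: 2,
};

const BONUS_CAP = 15; // positive-signal bonuses are capped at +15

interface Finding { severity: Severity; confidence: number } // confidence in [0, 1]

function scoreJudge(findings: Finding[], bonuses: number[]): number {
  // Deductions are confidence-weighted, starting from a perfect 100.
  const deducted = findings.reduce(
    (s, f) => s - PENALTY[f.severity] * f.confidence, 100);
  // Positive signals add back points, capped at BONUS_CAP in total.
  const bonus = Math.min(BONUS_CAP, bonuses.reduce((a, b) => a + b, 0));
  // Final score is floored at 0 and capped at 100.
  return Math.max(0, Math.min(100, deducted + bonus));
}
```

For example, a single full-confidence critical finding yields 100 − 30 = 70, while ten stacked bonuses still add at most 15 points.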
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # Judges Panel
 
- An MCP (Model Context Protocol) server that provides a panel of **35 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 35 domains.
+ An MCP (Model Context Protocol) server that provides a panel of **37 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 37 domains.
 
  **Highlights:**
  - Includes an **App Builder Workflow (3-step)** demo for release decisions, plain-language risk summaries, and prioritized fixes — see [Try the Demo](#2-try-the-demo).
@@ -11,7 +11,7 @@ An MCP (Model Context Protocol) server that provides a panel of **35 specialized
  [![npm](https://img.shields.io/npm/v/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![npm downloads](https://img.shields.io/npm/dw/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Tests](https://img.shields.io/badge/tests-1267-brightgreen)](https://github.com/KevinRabun/judges/actions)
+ [![Tests](https://img.shields.io/badge/tests-1557-brightgreen)](https://github.com/KevinRabun/judges/actions)
 
  ---
 
@@ -21,10 +21,10 @@ AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast —
 
  | | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |
  |---|---|---|---|---|
- | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **35 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
+ | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **37 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
  | **AI-generated code focus** | No | No | Partial | **Purpose-built** for AI output failure modes |
  | **Setup** | Config per project | Server + scanner | Cloud or local | **One command**: `npx @kevinrabun/judges eval file.ts` |
- | **Auto-fix patches** | Some | No | No | **47 deterministic patches** — instant, offline |
+ | **Auto-fix patches** | Some | No | No | **53 deterministic patches** — instant, offline |
  | **Non-technical output** | No | Dashboard | No | **Plain-language findings** with What/Why/Next |
  | **MCP native** | No | No | No | **Yes** — works inside Copilot, Claude, Cursor |
  | **SARIF output** | No | Yes | Yes | **Yes** — upload to GitHub Code Scanning |
@@ -79,7 +79,7 @@ judges eval --min-score 80 src/api.ts
  # One-line summary for scripts
  judges eval --summary src/api.ts
 
- # List all 35 judges
+ # List all 37 judges
  judges list
  ```
 
@@ -190,7 +190,7 @@ npm run build
 
  ### 2. Try the Demo
 
- Run the included demo to see all 35 judges evaluate a purposely flawed API server:
+ Run the included demo to see all 37 judges evaluate a purposely flawed API server:
 
  ```bash
  npm run demo
@@ -293,7 +293,7 @@ Install the **[Judges Panel](https://marketplace.visualstudio.com/items?itemName
 
  - **Inline diagnostics & quick-fixes** on every file save
  - **`@judges` chat participant** — type `@judges` in Copilot Chat, or just ask for a "judges panel review" and Copilot routes automatically
- - **Auto-configured MCP server** — all 35 expert-persona prompts available to Copilot with zero setup
+ - **Auto-configured MCP server** — all 37 expert-persona prompts available to Copilot with zero setup
 
  ```bash
  code --install-extension kevinrabun.judges-panel
@@ -420,7 +420,7 @@ All commands support `--help` for usage details.
 
  ### `judges eval`
 
- Evaluate a file with all 35 judges or a single judge.
+ Evaluate a file with all 37 judges or a single judge.
 
  | Flag | Description |
  |------|-------------|
@@ -667,13 +667,13 @@ The tribunal operates in three layers:
 
  2. **AST-Based Structural Analysis** — The Code Structure judge (`STRUCT-*` rules) uses real Abstract Syntax Tree parsing to measure cyclomatic complexity, nesting depth, function length, parameter count, dead code, and type safety with precision that regex cannot achieve. All supported languages — **TypeScript, JavaScript, Python, Rust, Go, Java, C#, and C++** — are parsed via **tree-sitter WASM grammars** (real syntax trees compiled to WebAssembly, in-process, zero native dependencies). A scope-tracking structural parser is kept as a fallback when WASM grammars are unavailable. No external AST server required.
 
- 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 35 specialized reviewers.
+ 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 37 specialized reviewers.
 
  ---
 
  ## Composable by Design
 
- Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **35 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
+ Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **37 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
 
  ### Built-in AST Analysis (v2.0.0+)
 
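The STRUCT thresholds referenced in this release (STRUCT-001, STRUCT-007, STRUCT-008, STRUCT-010) can be illustrated with a toy sketch. The rule IDs and threshold values are from the changelog; everything else — in particular the keyword-counting complexity estimate, which stands in for the package's real tree-sitter AST measurement — is an assumption for illustration.

```typescript
// Illustrative only: the package measures cyclomatic complexity (CC) on real
// tree-sitter syntax trees; this keyword count is a rough stand-in.
function estimateComplexity(source: string): number {
  const branches = source.match(/\b(?:if|for|while|case|catch)\b|&&|\|\|/g) ?? [];
  return 1 + branches.length; // CC = 1 + number of decision points
}

// Threshold rules named in the 3.20.4 changelog entry.
function structFindings(fnCC: number, fileCC: number, fnLines: number): string[] {
  const hits: string[] = [];
  if (fnCC > 10) hits.push("STRUCT-001");   // function CC over 10
  if (fnCC > 20) hits.push("STRUCT-008");   // function CC over 20
  if (fileCC > 40) hits.push("STRUCT-007"); // whole-file CC over 40
  if (fnLines > 150) hits.push("STRUCT-010"); // function longer than 150 lines
  return hits;
}
```

Note that a very complex function trips both STRUCT-001 and STRUCT-008, since the 20-branch threshold is a stricter superset of the 10-branch one.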
@@ -722,7 +722,7 @@ When your AI coding assistant connects to multiple MCP servers, each one contrib
 
  | Layer | What It Does | Example Servers |
  |-------|-------------|-----------------|
- | **Judges Panel** | 35-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
+ | **Judges Panel** | 37-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
  | **CVE / SBOM** | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |
  | **Linting** | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |
  | **Runtime Profiling** | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |
@@ -876,7 +876,7 @@ Generated from https://github.com/microsoft/vscode on 2026-02-21T12:00:00.000Z.
  List all available judges with their domains and descriptions.
 
  ### `evaluate_code`
- Submit code to the **full judges panel**. All 35 judges evaluate independently and return a combined verdict.
+ Submit code to the **full judges panel**. All 37 judges evaluate independently and return a combined verdict.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -900,7 +900,7 @@ Submit code to a **specific judge** for targeted review.
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_project`
- Submit multiple files for **project-level analysis**. All 35 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
+ Submit multiple files for **project-level analysis**. All 37 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -911,7 +911,7 @@ Submit multiple files for **project-level analysis**. All 35 judges evaluate eac
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_diff`
- Evaluate only the **changed lines** in a code diff. Runs all 35 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
+ Evaluate only the **changed lines** in a code diff. Runs all 37 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -979,7 +979,7 @@ Each judge has a corresponding prompt for LLM-powered deep analysis:
  | `judge-agent-instructions` | Deep review of agent instruction markdown quality and safety |
  | `judge-ai-code-safety` | Deep review of AI-generated code risks: prompt injection, insecure LLM output handling, debug defaults, missing validation |
  | `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue patterns |
- | `full-tribunal` | All 35 judges in a single prompt |
+ | `full-tribunal` | All 37 judges in a single prompt |
 
  ---
 
@@ -1101,7 +1101,7 @@ Each judge scores the code from **0 to 100**:
  - **WARNING** — Any high finding, any medium finding, or score < 80
  - **PASS** — Score ≥ 80 with no critical, high, or medium findings
 
- The **overall tribunal score** is the average of all 35 judges. The overall verdict fails if **any** judge fails.
+ The **overall tribunal score** is the average of all 37 judges. The overall verdict fails if **any** judge fails.
 
  ---
 
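The verdict roll-up described in this hunk can be sketched briefly. Assumptions: `JudgeResult` and `tribunalScore` are hypothetical names, and the WARNING-propagation rule is an inference — the excerpt only states that the overall score is the plain average and that the overall verdict fails if any single judge fails.

```typescript
// Sketch of the tribunal roll-up; names are illustrative, not the package's API.
type Verdict = "PASS" | "WARNING" | "FAIL";

interface JudgeResult { score: number; verdict: Verdict }

function tribunalScore(results: JudgeResult[]): { score: number; verdict: Verdict } {
  // Overall score is the plain average of all judges' scores.
  const score = results.reduce((s, r) => s + r.score, 0) / results.length;
  // The overall verdict fails if ANY single judge fails; WARNING propagation
  // below is an assumption, not stated in this excerpt.
  const verdict: Verdict = results.some(r => r.verdict === "FAIL")
    ? "FAIL"
    : results.some(r => r.verdict === "WARNING") ? "WARNING" : "PASS";
  return { score, verdict };
}
```

So one failing judge sinks the whole tribunal even when the averaged score is comfortably high.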
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@kevinrabun/judges",
- "version": "3.20.3",
- "description": "35 specialized judges that evaluate AI-generated code for security, cost, and quality.",
+ "version": "3.20.4",
+ "description": "37 specialized judges that evaluate AI-generated code for security, cost, and quality.",
  "mcpName": "io.github.KevinRabun/judges",
  "type": "module",
  "main": "dist/index.js",
package/server.json CHANGED
@@ -2,17 +2,17 @@
  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
  "name": "io.github.KevinRabun/judges",
  "title": "Judges Panel",
- "description": "35 judges that evaluate AI-generated code for security, cost, and quality with built-in AST.",
+ "description": "37 judges that evaluate AI-generated code for security, cost, and quality with built-in AST.",
  "repository": {
  "url": "https://github.com/kevinrabun/judges",
  "source": "github"
  },
- "version": "3.20.3",
+ "version": "3.20.4",
  "packages": [
  {
  "registryType": "npm",
  "identifier": "@kevinrabun/judges",
- "version": "3.20.3",
+ "version": "3.20.4",
  "transport": {
  "type": "stdio"
  }
@@ -21,7 +21,7 @@
  "tools": [
  {
  "name": "evaluate_code",
- "description": "Submit code to the full 35-judge tribunal for security, cost, and quality analysis. Handles all code types including application code, infrastructure-as-code (Bicep, Terraform, ARM), and configuration files."
+ "description": "Submit code to the full 37-judge tribunal for security, cost, and quality analysis. Handles all code types including application code, infrastructure-as-code (Bicep, Terraform, ARM), and configuration files."
  },
  {
  "name": "evaluate_code_single_judge",
@@ -59,7 +59,7 @@
  "prompts": [
  {
  "name": "full-tribunal",
- "description": "Convene all 35 judges for a comprehensive LLM-powered deep review."
+ "description": "Convene all 37 judges for a comprehensive LLM-powered deep review."
  },
  {
  "name": "judge-{id}",