@kevinrabun/judges 3.20.3 → 3.20.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,15 @@
 
  All notable changes to **@kevinrabun/judges** are documented here.
 
+ ## [3.20.4] — 2026-03-03
+
+ ### Fixed
+ - **Stale documentation counts** — Updated all references across README, docs, server.json, action.yml, package.json, Dockerfile, extension metadata, examples, and scripts from "35 judges" → "37 judges", "47 patches" → "53 patches", and test badge "1515" → "1557". Historical changelog entries left unchanged.
+
+ ### Tests
+ - **Doc-claim verification tests** — Added 42 new tests covering: JUDGES array count assertion (exactly 37), judge schema validation (id, name, domain, description), unique judge ID enforcement, scoring penalty constants (critical=30, high=18, medium=10, low=5, info=2), confidence-weighted deductions, score floor/ceiling, positive signal bonuses (+3/+3/+3/+2/+2/+2/+2/+1/+1/+1 with cap at 15), verdict threshold logic (fail/warning/pass boundaries), and STRUCT threshold rules not previously covered: STRUCT-001 (CC>10), STRUCT-007 (file CC>40), STRUCT-008 (CC>20), STRUCT-010 (>150 lines).
+ - All 1,557 tests pass (976 judges + 218 negative + 251 subsystems + 70 extension + 42 tool-routing)
+
  ## [3.20.3] — 2026-03-03
 
  ### Fixed
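The scoring rules that the new doc-claim tests pin down can be sketched as follows. This is an illustrative model only, assuming hypothetical names (`Severity`, `Finding`, `scoreJudge`) that are not the package's actual API; the constants come from the test list above.

```typescript
// Sketch of the scoring model asserted by the 3.20.4 tests.
// NOTE: names here are hypothetical; only the constants are from the changelog.
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Per-severity deductions asserted by the new tests.
const PENALTY: Record<Severity, number> = {
  critical: 30, high: 18, medium: 10, low: 5, info: 2,
};

const BONUS_CAP = 15; // positive-signal bonuses are capped at +15

interface Finding { severity: Severity; confidence: number } // confidence in [0, 1]

function scoreJudge(findings: Finding[], bonuses: number[]): number {
  // Deductions are confidence-weighted, starting from a perfect 100.
  const deducted = findings.reduce(
    (s, f) => s - PENALTY[f.severity] * f.confidence, 100);
  // Positive signals add back points, capped at BONUS_CAP in total.
  const bonus = Math.min(BONUS_CAP, bonuses.reduce((a, b) => a + b, 0));
  // Final score is floored at 0 and capped at 100.
  return Math.max(0, Math.min(100, deducted + bonus));
}
```

For example, a single full-confidence critical finding yields 100 − 30 = 70, while ten stacked bonuses still add at most 15 points.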
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # Judges Panel
 
- An MCP (Model Context Protocol) server that provides a panel of **35 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 35 domains.
+ An MCP (Model Context Protocol) server that provides a panel of **37 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 37 domains.
 
  **Highlights:**
  - Includes an **App Builder Workflow (3-step)** demo for release decisions, plain-language risk summaries, and prioritized fixes — see [Try the Demo](#2-try-the-demo).
@@ -11,7 +11,7 @@ An MCP (Model Context Protocol) server that provides a panel of **35 specialized
  [![npm](https://img.shields.io/npm/v/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![npm downloads](https://img.shields.io/npm/dw/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Tests](https://img.shields.io/badge/tests-1267-brightgreen)](https://github.com/KevinRabun/judges/actions)
+ [![Tests](https://img.shields.io/badge/tests-1557-brightgreen)](https://github.com/KevinRabun/judges/actions)
 
  ---
 
@@ -21,10 +21,10 @@ AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast —
 
  | | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |
  |---|---|---|---|---|
- | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **35 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
+ | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **37 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
  | **AI-generated code focus** | No | No | Partial | **Purpose-built** for AI output failure modes |
  | **Setup** | Config per project | Server + scanner | Cloud or local | **One command**: `npx @kevinrabun/judges eval file.ts` |
- | **Auto-fix patches** | Some | No | No | **47 deterministic patches** — instant, offline |
+ | **Auto-fix patches** | Some | No | No | **53 deterministic patches** — instant, offline |
  | **Non-technical output** | No | Dashboard | No | **Plain-language findings** with What/Why/Next |
  | **MCP native** | No | No | No | **Yes** — works inside Copilot, Claude, Cursor |
  | **SARIF output** | No | Yes | Yes | **Yes** — upload to GitHub Code Scanning |
@@ -79,7 +79,7 @@ judges eval --min-score 80 src/api.ts
  # One-line summary for scripts
  judges eval --summary src/api.ts
 
- # List all 35 judges
+ # List all 37 judges
  judges list
  ```
 
@@ -190,7 +190,7 @@ npm run build
 
  ### 2. Try the Demo
 
- Run the included demo to see all 35 judges evaluate a purposely flawed API server:
+ Run the included demo to see all 37 judges evaluate a purposely flawed API server:
 
  ```bash
  npm run demo
@@ -293,7 +293,7 @@ Install the **[Judges Panel](https://marketplace.visualstudio.com/items?itemName
 
  - **Inline diagnostics & quick-fixes** on every file save
  - **`@judges` chat participant** — type `@judges` in Copilot Chat, or just ask for a "judges panel review" and Copilot routes automatically
- - **Auto-configured MCP server** — all 35 expert-persona prompts available to Copilot with zero setup
+ - **Auto-configured MCP server** — all 37 expert-persona prompts available to Copilot with zero setup
 
  ```bash
  code --install-extension kevinrabun.judges-panel
@@ -420,7 +420,7 @@ All commands support `--help` for usage details.
 
  ### `judges eval`
 
- Evaluate a file with all 35 judges or a single judge.
+ Evaluate a file with all 37 judges or a single judge.
 
  | Flag | Description |
  |------|-------------|
@@ -667,13 +667,13 @@ The tribunal operates in three layers:
 
  2. **AST-Based Structural Analysis** — The Code Structure judge (`STRUCT-*` rules) uses real Abstract Syntax Tree parsing to measure cyclomatic complexity, nesting depth, function length, parameter count, dead code, and type safety with precision that regex cannot achieve. All supported languages — **TypeScript, JavaScript, Python, Rust, Go, Java, C#, and C++** — are parsed via **tree-sitter WASM grammars** (real syntax trees compiled to WebAssembly, in-process, zero native dependencies). A scope-tracking structural parser is kept as a fallback when WASM grammars are unavailable. No external AST server required.
 
- 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 35 specialized reviewers.
+ 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 37 specialized reviewers.
 
  ---
 
  ## Composable by Design
 
- Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **35 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
+ Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **37 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
 
  ### Built-in AST Analysis (v2.0.0+)
 
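The STRUCT thresholds referenced in this release (STRUCT-001, STRUCT-007, STRUCT-008, STRUCT-010) can be illustrated with a toy sketch. The rule IDs and threshold values are from the changelog; everything else — in particular the keyword-counting complexity estimate, which stands in for the package's real tree-sitter AST measurement — is an assumption for illustration.

```typescript
// Illustrative only: the package measures cyclomatic complexity (CC) on real
// tree-sitter syntax trees; this keyword count is a rough stand-in.
function estimateComplexity(source: string): number {
  const branches = source.match(/\b(?:if|for|while|case|catch)\b|&&|\|\|/g) ?? [];
  return 1 + branches.length; // CC = 1 + number of decision points
}

// Threshold rules named in the 3.20.4 changelog entry.
function structFindings(fnCC: number, fileCC: number, fnLines: number): string[] {
  const hits: string[] = [];
  if (fnCC > 10) hits.push("STRUCT-001");   // function CC over 10
  if (fnCC > 20) hits.push("STRUCT-008");   // function CC over 20
  if (fileCC > 40) hits.push("STRUCT-007"); // whole-file CC over 40
  if (fnLines > 150) hits.push("STRUCT-010"); // function longer than 150 lines
  return hits;
}
```

Note that a very complex function trips both STRUCT-001 and STRUCT-008, since the 20-branch threshold is a stricter superset of the 10-branch one.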
@@ -722,7 +722,7 @@ When your AI coding assistant connects to multiple MCP servers, each one contrib
 
  | Layer | What It Does | Example Servers |
  |-------|-------------|-----------------|
- | **Judges Panel** | 35-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
+ | **Judges Panel** | 37-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
  | **CVE / SBOM** | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |
  | **Linting** | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |
  | **Runtime Profiling** | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |
@@ -876,7 +876,7 @@ Generated from https://github.com/microsoft/vscode on 2026-02-21T12:00:00.000Z.
  List all available judges with their domains and descriptions.
 
  ### `evaluate_code`
- Submit code to the **full judges panel**. All 35 judges evaluate independently and return a combined verdict.
+ Submit code to the **full judges panel**. All 37 judges evaluate independently and return a combined verdict.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -900,7 +900,7 @@ Submit code to a **specific judge** for targeted review.
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_project`
- Submit multiple files for **project-level analysis**. All 35 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
+ Submit multiple files for **project-level analysis**. All 37 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -911,7 +911,7 @@ Submit multiple files for **project-level analysis**. All 35 judges evaluate eac
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_diff`
- Evaluate only the **changed lines** in a code diff. Runs all 35 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
+ Evaluate only the **changed lines** in a code diff. Runs all 37 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -979,7 +979,7 @@ Each judge has a corresponding prompt for LLM-powered deep analysis:
  | `judge-agent-instructions` | Deep review of agent instruction markdown quality and safety |
  | `judge-ai-code-safety` | Deep review of AI-generated code risks: prompt injection, insecure LLM output handling, debug defaults, missing validation |
  | `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue patterns |
- | `full-tribunal` | All 35 judges in a single prompt |
+ | `full-tribunal` | All 37 judges in a single prompt |
 
  ---
 
@@ -1101,7 +1101,7 @@ Each judge scores the code from **0 to 100**:
  - **WARNING** — Any high finding, any medium finding, or score < 80
  - **PASS** — Score ≥ 80 with no critical, high, or medium findings
 
- The **overall tribunal score** is the average of all 35 judges. The overall verdict fails if **any** judge fails.
+ The **overall tribunal score** is the average of all 37 judges. The overall verdict fails if **any** judge fails.
 
  ---
 
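The verdict roll-up described in this hunk can be sketched briefly. Assumptions: `JudgeResult` and `tribunalScore` are hypothetical names, and the WARNING-propagation rule is an inference — the excerpt only states that the overall score is the plain average and that the overall verdict fails if any single judge fails.

```typescript
// Sketch of the tribunal roll-up; names are illustrative, not the package's API.
type Verdict = "PASS" | "WARNING" | "FAIL";

interface JudgeResult { score: number; verdict: Verdict }

function tribunalScore(results: JudgeResult[]): { score: number; verdict: Verdict } {
  // Overall score is the plain average of all judges' scores.
  const score = results.reduce((s, r) => s + r.score, 0) / results.length;
  // The overall verdict fails if ANY single judge fails; WARNING propagation
  // below is an assumption, not stated in this excerpt.
  const verdict: Verdict = results.some(r => r.verdict === "FAIL")
    ? "FAIL"
    : results.some(r => r.verdict === "WARNING") ? "WARNING" : "PASS";
  return { score, verdict };
}
```

So one failing judge sinks the whole tribunal even when the averaged score is comfortably high.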
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@kevinrabun/judges",
- "version": "3.20.3",
- "description": "35 specialized judges that evaluate AI-generated code for security, cost, and quality.",
+ "version": "3.20.4",
+ "description": "37 specialized judges that evaluate AI-generated code for security, cost, and quality.",
  "mcpName": "io.github.KevinRabun/judges",
  "type": "module",
  "main": "dist/index.js",
package/server.json CHANGED
@@ -2,17 +2,17 @@
  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
  "name": "io.github.KevinRabun/judges",
  "title": "Judges Panel",
- "description": "35 judges that evaluate AI-generated code for security, cost, and quality with built-in AST.",
+ "description": "37 judges that evaluate AI-generated code for security, cost, and quality with built-in AST.",
  "repository": {
  "url": "https://github.com/kevinrabun/judges",
  "source": "github"
  },
- "version": "3.20.3",
+ "version": "3.20.4",
  "packages": [
  {
  "registryType": "npm",
  "identifier": "@kevinrabun/judges",
- "version": "3.20.3",
+ "version": "3.20.4",
  "transport": {
  "type": "stdio"
  }
@@ -21,7 +21,7 @@
  "tools": [
  {
  "name": "evaluate_code",
- "description": "Submit code to the full 35-judge tribunal for security, cost, and quality analysis. Handles all code types including application code, infrastructure-as-code (Bicep, Terraform, ARM), and configuration files."
+ "description": "Submit code to the full 37-judge tribunal for security, cost, and quality analysis. Handles all code types including application code, infrastructure-as-code (Bicep, Terraform, ARM), and configuration files."
  },
  {
  "name": "evaluate_code_single_judge",
@@ -59,7 +59,7 @@
  "prompts": [
  {
  "name": "full-tribunal",
- "description": "Convene all 35 judges for a comprehensive LLM-powered deep review."
+ "description": "Convene all 37 judges for a comprehensive LLM-powered deep review."
  },
  {
  "name": "judge-{id}",