@kevinrabun/judges 3.23.16 → 3.23.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,25 @@
 
  All notable changes to **@kevinrabun/judges** are documented here.
 
+ ## [3.23.17] — 2026-03-07
+
+ ### Changed
+ - **Judge count updated to 39** — All references across docs, tests, HTML, action.yml, Dockerfile, and README updated from 37 to 39
+ - **VS Code extension README rewritten** — New adoption-focused copy: 1-sentence value prop, "Try in 60 seconds" quick start, noise-control section, CI integration guide, full 15-language listing
+ - **Default `minSeverity` raised to `"high"`** — New installs see only critical + high findings, reducing noise for first-time users
+ - **Preset dropdown with enum values** — `judges.preset` now offers named choices (strict, lenient, security-only, startup, compliance, performance) in the Settings UI
+
+ ### Added
+ - **First-run toast notification** — After the first successful evaluation, a one-time toast introduces `@judges` chat and links to noise settings
+ - **`Judges: Add CI Workflow` command** — Generates `.github/workflows/judges.yml` with a PR-triggered security-only preset
+ - **"Report false positive" code action** — New Quick Fix action opens a pre-filled GitHub issue for any Judges finding
+ - **Enhanced `@judges /help`** — Now includes verdict bands (PASS/WARN/FAIL), noise-control tips, and more examples
+ - **Improved chat command inference** — `inferCommand()` now recognizes "run judges", "judges review", "evaluate", and "check" as review intent
+ - **Updated welcome view** — Findings panel shows 3 quick actions: evaluate file, evaluate workspace, open @judges chat
+
+ ### Tests
+ - 1040 tests passing (0 failures)
+
  ## [3.23.16] — 2026-03-07
 
  ### Fixed
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # Judges Panel
 
- An MCP (Model Context Protocol) server that provides a panel of **37 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 37 domains.
+ An MCP (Model Context Protocol) server that provides a panel of **39 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 39 domains.
 
  **Highlights:**
  - Includes an **App Builder Workflow (3-step)** demo for release decisions, plain-language risk summaries, and prioritized fixes — see [Try the Demo](#2-try-the-demo).
@@ -17,11 +17,11 @@ An MCP (Model Context Protocol) server that provides a panel of **37 specialized
 
  ## Why Judges?
 
- AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 37 dimensions consistently.
+ AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 39 dimensions consistently.
 
  | | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |
  |---|---|---|---|---|
- | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **37 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
+ | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **39 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
  | **AI-generated code focus** | No | No | Partial | **Purpose-built** for AI output failure modes |
  | **Setup** | Config per project | Server + scanner | Cloud or local | **One command**: `npx @kevinrabun/judges eval file.ts` |
  | **Auto-fix patches** | Some | No | No | **114 deterministic patches** — instant, offline |
@@ -79,7 +79,7 @@ judges eval --min-score 80 src/api.ts
  # One-line summary for scripts
  judges eval --summary src/api.ts
 
- # List all 37 judges
+ # List all 39 judges
  judges list
  ```
 
@@ -190,7 +190,7 @@ npm run build
 
  ### 2. Try the Demo
 
- Run the included demo to see all 37 judges evaluate a purposely flawed API server:
+ Run the included demo to see all 39 judges evaluate a purposely flawed API server:
 
  ```bash
  npm run demo
@@ -293,7 +293,7 @@ Install the **[Judges Panel](https://marketplace.visualstudio.com/items?itemName
 
  - **Inline diagnostics & quick-fixes** on every file save
  - **`@judges` chat participant** — type `@judges` in Copilot Chat, or just ask for a "judges panel review" and Copilot routes automatically
- - **Auto-configured MCP server** — all 37 expert-persona prompts available to Copilot with zero setup
+ - **Auto-configured MCP server** — all 39 expert-persona prompts available to Copilot with zero setup
 
  ```bash
  code --install-extension kevinrabun.judges-panel
@@ -420,7 +420,7 @@ All commands support `--help` for usage details.
 
  ### `judges eval`
 
- Evaluate a file with all 37 judges or a single judge.
+ Evaluate a file with all 39 judges or a single judge.
 
  | Flag | Description |
  |------|-------------|
@@ -669,13 +669,13 @@ The tribunal operates in three layers:
 
  2. **AST-Based Structural Analysis** — The Code Structure judge (`STRUCT-*` rules) uses real Abstract Syntax Tree parsing to measure cyclomatic complexity, nesting depth, function length, parameter count, dead code, and type safety with precision that regex cannot achieve. All supported languages — **TypeScript, JavaScript, Python, Rust, Go, Java, C#, and C++** — are parsed via **tree-sitter WASM grammars** (real syntax trees compiled to WebAssembly, in-process, zero native dependencies). A scope-tracking structural parser is kept as a fallback when WASM grammars are unavailable. No external AST server required.
 
- 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 37 specialized reviewers.
+ 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 39 specialized reviewers.
 
  ---
 
  ## Composable by Design
 
- Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **37 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
+ Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **39 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
 
  ### Built-in AST Analysis (v2.0.0+)
 
@@ -724,7 +724,7 @@ When your AI coding assistant connects to multiple MCP servers, each one contrib
 
  | Layer | What It Does | Example Servers |
  |-------|-------------|-----------------|
- | **Judges Panel** | 37-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
+ | **Judges Panel** | 39-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
  | **CVE / SBOM** | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |
  | **Linting** | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |
  | **Runtime Profiling** | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |
@@ -878,7 +878,7 @@ Generated from https://github.com/microsoft/vscode on 2026-02-21T12:00:00.000Z.
  List all available judges with their domains and descriptions.
 
  ### `evaluate_code`
- Submit code to the **full judges panel**. All 37 judges evaluate independently and return a combined verdict.
+ Submit code to the **full judges panel**. All 39 judges evaluate independently and return a combined verdict.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -902,7 +902,7 @@ Submit code to a **specific judge** for targeted review.
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_project`
- Submit multiple files for **project-level analysis**. All 37 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
+ Submit multiple files for **project-level analysis**. All 39 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -913,7 +913,7 @@ Submit multiple files for **project-level analysis**. All 37 judges evaluate eac
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_diff`
- Evaluate only the **changed lines** in a code diff. Runs all 37 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
+ Evaluate only the **changed lines** in a code diff. Runs all 39 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -983,7 +983,7 @@ Each judge has a corresponding prompt for LLM-powered deep analysis:
  | `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue patterns |
  | `judge-iac-security` | Deep review of infrastructure-as-code security: Terraform, Bicep, ARM template misconfigurations |
  | `judge-false-positive-review` | Meta-judge review of pattern-based findings for false positive detection and accuracy |
- | `full-tribunal` | All 37 judges in a single prompt |
+ | `full-tribunal` | All 39 judges in a single prompt |
 
  ---
 
@@ -1105,7 +1105,7 @@ Each judge scores the code from **0 to 100**:
  - **WARNING** — Any high finding, any medium finding, or score < 80
  - **PASS** — Score ≥ 80 with no critical, high, or medium findings
 
- The **overall tribunal score** is the average of all 37 judges. The overall verdict fails if **any** judge fails.
+ The **overall tribunal score** is the average of all 39 judges. The overall verdict fails if **any** judge fails.
 
  ---
 
@@ -1139,7 +1139,7 @@ judges/
  │ ├── evaluators/ # Analysis engine for each judge
  │ │ ├── index.ts # evaluateWithJudge(), evaluateWithTribunal(), evaluateProject(), etc.
  │ │ ├── shared.ts # Scoring, verdict logic, markdown formatters
- │ │ └── *.ts # One analyzer per judge (37 files)
+ │ │ └── *.ts # One analyzer per judge (39 files)
  │ ├── formatters/ # Output formatters
  │ │ ├── sarif.ts # SARIF 2.1.0 output
  │ │ ├── html.ts # Self-contained HTML report (dark/light theme, filters)
@@ -1177,7 +1177,7 @@ judges/
  │ │ └── public-repo-report.ts # Public repo clone + full tribunal report generation
  │ └── judges/ # Judge definitions (id, name, domain, system prompt)
  │ ├── index.ts # JUDGES array, getJudge(), getJudgeSummaries()
- │ └── *.ts # One definition per judge (37 files)
+ │ └── *.ts # One definition per judge (39 files)
  ├── scripts/
  │ ├── generate-public-repo-report.ts # Run: npm run report:public-repo -- --repoUrl <url>
  │ ├── daily-popular-repo-autofix.ts # Run: npm run automation:daily-popular
@@ -1242,7 +1242,7 @@ judges/
  | `judges config export` | Export config as shareable package |
  | `judges config import <src>` | Import a shared configuration |
  | `judges compare` | Compare judges against other code review tools |
- | `judges list` | List all 37 judges with domains and descriptions |
+ | `judges list` | List all 39 judges with domains and descriptions |
@@ -4,7 +4,7 @@
  //
  // Token-optimised: shared behavioural directives (adversarial mandate,
  // precision mandate) are stated ONCE in the tribunal preamble instead of
- // being duplicated across all 37 judges. Per-judge sections include only
+ // being duplicated across all 39 judges. Per-judge sections include only
  // the unique evaluation criteria, domain-specific rules, and FP-avoidance
  // guidance. This reduces the tribunal prompt by ~40 000 chars (~10 000
  // tokens) without removing any evaluation criteria.
@@ -13,7 +13,7 @@ import { z } from "zod";
  import { JUDGES } from "../judges/index.js";
  // ─── Shared Behavioural Directives ───────────────────────────────────────────
  // Stated ONCE in the tribunal preamble so every judge benefits without
- // repeating the text 37 times.
+ // repeating the text 39 times.
  // ──────────────────────────────────────────────────────────────────────────────
  /** Adversarial evaluation stance — shared across all judges. */
  const SHARED_ADVERSARIAL_MANDATE = `ADVERSARIAL MANDATE (applies to ALL judges):
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@kevinrabun/judges",
- "version": "3.23.16",
- "description": "37 specialized judges that evaluate AI-generated code for security, cost, and quality.",
+ "version": "3.23.17",
+ "description": "39 specialized judges that evaluate AI-generated code for security, cost, and quality.",
  "mcpName": "io.github.KevinRabun/judges",
  "type": "module",
  "main": "dist/index.js",
package/server.json CHANGED
@@ -7,12 +7,12 @@
  "url": "https://github.com/kevinrabun/judges",
  "source": "github"
  },
- "version": "3.23.16",
+ "version": "3.23.17",
  "packages": [
  {
  "registryType": "npm",
  "identifier": "@kevinrabun/judges",
- "version": "3.23.16",
+ "version": "3.23.17",
  "transport": {
  "type": "stdio"
  }
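
The changelog above notes that the new `Judges: Add CI Workflow` command generates `.github/workflows/judges.yml` with a PR-triggered security-only preset. The generated file itself is not part of this diff; below is a minimal, hypothetical sketch of what such a workflow could look like, using only the `judges eval --min-score` gate documented in the README. The job name, Node version, and target path are illustrative assumptions, and the preset-selection mechanism is not shown.

```yaml
# Hypothetical sketch only: the actual file emitted by
# "Judges: Add CI Workflow" is not included in this diff.
name: Judges
on:
  pull_request:

jobs:
  judges:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # `judges eval --min-score 80 <file>` is documented in the README diff;
      # evaluating src/index.ts here is an assumed target path.
      - run: npx @kevinrabun/judges eval --min-score 80 src/index.ts
```

If the real generator applies the security-only preset, that configuration would presumably be passed via a flag or config file not visible in this diff.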