@kevinrabun/judges 3.23.16 → 3.23.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,25 @@
 
  All notable changes to **@kevinrabun/judges** are documented here.
 
+ ## [3.23.17] — 2026-03-07
+
+ ### Changed
+ - **Judge count updated to 39** — All references across docs, tests, HTML, action.yml, Dockerfile, and README updated from 37 to 39
+ - **VS Code extension README rewritten** — New adoption-focused copy: 1-sentence value prop, "Try in 60 seconds" quick start, noise-control section, CI integration guide, full 15-language listing
+ - **Default `minSeverity` raised to `"high"`** — New installs see only critical + high findings, reducing noise for first-time users
+ - **Preset dropdown with enum values** — `judges.preset` now offers named choices (strict, lenient, security-only, startup, compliance, performance) in the Settings UI
+
+ ### Added
+ - **First-run toast notification** — After the first successful evaluation, a one-time toast introduces `@judges` chat and links to noise settings
+ - **`Judges: Add CI Workflow` command** — Generates `.github/workflows/judges.yml` with a PR-triggered security-only preset
+ - **"Report false positive" code action** — New Quick Fix action opens a pre-filled GitHub issue for any Judges finding
+ - **Enhanced `@judges /help`** — Now includes verdict bands (PASS/WARN/FAIL), noise-control tips, and more examples
+ - **Improved chat command inference** — `inferCommand()` now recognizes "run judges", "judges review", "evaluate", and "check" as review intent
+ - **Updated welcome view** — Findings panel shows 3 quick actions: evaluate file, evaluate workspace, open @judges chat
+
+ ### Tests
+ - 1040 tests passing (0 failures)
+
  ## [3.23.16] — 2026-03-07
 
  ### Fixed
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # Judges Panel
 
- An MCP (Model Context Protocol) server that provides a panel of **37 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 37 domains.
+ An MCP (Model Context Protocol) server that provides a panel of **39 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching & AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 39 domains.
 
  **Highlights:**
  - Includes an **App Builder Workflow (3-step)** demo for release decisions, plain-language risk summaries, and prioritized fixes — see [Try the Demo](#2-try-the-demo).
@@ -17,11 +17,11 @@ An MCP (Model Context Protocol) server that provides a panel of **37 specialized
 
  ## Why Judges?
 
- AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 37 dimensions consistently.
+ AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 39 dimensions consistently.
 
  | | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |
  |---|---|---|---|---|
- | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **37 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
+ | **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **39 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |
  | **AI-generated code focus** | No | No | Partial | **Purpose-built** for AI output failure modes |
  | **Setup** | Config per project | Server + scanner | Cloud or local | **One command**: `npx @kevinrabun/judges eval file.ts` |
  | **Auto-fix patches** | Some | No | No | **114 deterministic patches** — instant, offline |
@@ -79,7 +79,7 @@ judges eval --min-score 80 src/api.ts
  # One-line summary for scripts
  judges eval --summary src/api.ts
 
- # List all 37 judges
+ # List all 39 judges
  judges list
  ```
 
@@ -190,7 +190,7 @@ npm run build
 
  ### 2. Try the Demo
 
- Run the included demo to see all 37 judges evaluate a purposely flawed API server:
+ Run the included demo to see all 39 judges evaluate a purposely flawed API server:
 
  ```bash
  npm run demo
@@ -293,7 +293,7 @@ Install the **[Judges Panel](https://marketplace.visualstudio.com/items?itemName
 
  - **Inline diagnostics & quick-fixes** on every file save
  - **`@judges` chat participant** — type `@judges` in Copilot Chat, or just ask for a "judges panel review" and Copilot routes automatically
- - **Auto-configured MCP server** — all 37 expert-persona prompts available to Copilot with zero setup
+ - **Auto-configured MCP server** — all 39 expert-persona prompts available to Copilot with zero setup
 
  ```bash
  code --install-extension kevinrabun.judges-panel
@@ -420,7 +420,7 @@ All commands support `--help` for usage details.
 
  ### `judges eval`
 
- Evaluate a file with all 37 judges or a single judge.
+ Evaluate a file with all 39 judges or a single judge.
 
  | Flag | Description |
  |------|-------------|
@@ -669,13 +669,13 @@ The tribunal operates in three layers:
 
  2. **AST-Based Structural Analysis** — The Code Structure judge (`STRUCT-*` rules) uses real Abstract Syntax Tree parsing to measure cyclomatic complexity, nesting depth, function length, parameter count, dead code, and type safety with precision that regex cannot achieve. All supported languages — **TypeScript, JavaScript, Python, Rust, Go, Java, C#, and C++** — are parsed via **tree-sitter WASM grammars** (real syntax trees compiled to WebAssembly, in-process, zero native dependencies). A scope-tracking structural parser is kept as a fallback when WASM grammars are unavailable. No external AST server required.
 
- 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 37 specialized reviewers.
+ 3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 39 specialized reviewers.
 
  ---
 
  ## Composable by Design
 
- Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **37 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
+ Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **39 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
 
  ### Built-in AST Analysis (v2.0.0+)
 
@@ -724,7 +724,7 @@ When your AI coding assistant connects to multiple MCP servers, each one contrib
 
  | Layer | What It Does | Example Servers |
  |-------|-------------|-----------------|
- | **Judges Panel** | 37-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
+ | **Judges Panel** | 39-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |
  | **CVE / SBOM** | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |
  | **Linting** | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |
  | **Runtime Profiling** | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |
@@ -878,7 +878,7 @@ Generated from https://github.com/microsoft/vscode on 2026-02-21T12:00:00.000Z.
  List all available judges with their domains and descriptions.
 
  ### `evaluate_code`
- Submit code to the **full judges panel**. All 37 judges evaluate independently and return a combined verdict.
+ Submit code to the **full judges panel**. All 39 judges evaluate independently and return a combined verdict.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -902,7 +902,7 @@ Submit code to a **specific judge** for targeted review.
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_project`
- Submit multiple files for **project-level analysis**. All 37 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
+ Submit multiple files for **project-level analysis**. All 39 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -913,7 +913,7 @@ Submit multiple files for **project-level analysis**. All 37 judges evaluate eac
  | `config` | object | no | Inline configuration (see [Configuration](#configuration)) |
 
  ### `evaluate_diff`
- Evaluate only the **changed lines** in a code diff. Runs all 37 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
+ Evaluate only the **changed lines** in a code diff. Runs all 39 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
 
  | Parameter | Type | Required | Description |
  |-----------|------|----------|-------------|
@@ -983,7 +983,7 @@ Each judge has a corresponding prompt for LLM-powered deep analysis:
  | `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue patterns |
  | `judge-iac-security` | Deep review of infrastructure-as-code security: Terraform, Bicep, ARM template misconfigurations |
  | `judge-false-positive-review` | Meta-judge review of pattern-based findings for false positive detection and accuracy |
- | `full-tribunal` | All 37 judges in a single prompt |
+ | `full-tribunal` | All 39 judges in a single prompt |
 
  ---
 
@@ -1105,7 +1105,7 @@ Each judge scores the code from **0 to 100**:
  - **WARNING** — Any high finding, any medium finding, or score < 80
  - **PASS** — Score ≥ 80 with no critical, high, or medium findings
 
- The **overall tribunal score** is the average of all 37 judges. The overall verdict fails if **any** judge fails.
+ The **overall tribunal score** is the average of all 39 judges. The overall verdict fails if **any** judge fails.
 
  ---
 
@@ -1139,7 +1139,7 @@ judges/
  │ ├── evaluators/ # Analysis engine for each judge
  │ │ ├── index.ts # evaluateWithJudge(), evaluateWithTribunal(), evaluateProject(), etc.
  │ │ ├── shared.ts # Scoring, verdict logic, markdown formatters
- │ │ └── *.ts # One analyzer per judge (37 files)
+ │ │ └── *.ts # One analyzer per judge (39 files)
  │ ├── formatters/ # Output formatters
  │ │ ├── sarif.ts # SARIF 2.1.0 output
  │ │ ├── html.ts # Self-contained HTML report (dark/light theme, filters)
@@ -1177,7 +1177,7 @@ judges/
  │ │ └── public-repo-report.ts # Public repo clone + full tribunal report generation
  │ └── judges/ # Judge definitions (id, name, domain, system prompt)
  │ ├── index.ts # JUDGES array, getJudge(), getJudgeSummaries()
- │ └── *.ts # One definition per judge (37 files)
+ │ └── *.ts # One definition per judge (39 files)
  ├── scripts/
  │ ├── generate-public-repo-report.ts # Run: npm run report:public-repo -- --repoUrl <url>
  │ ├── daily-popular-repo-autofix.ts # Run: npm run automation:daily-popular
@@ -1242,7 +1242,7 @@ judges/
  | `judges config export` | Export config as shareable package |
  | `judges config import <src>` | Import a shared configuration |
  | `judges compare` | Compare judges against other code review tools |
- | `judges list` | List all 37 judges with domains and descriptions |
+ | `judges list` | List all 39 judges with domains and descriptions |
@@ -4,7 +4,7 @@
  //
  // Token-optimised: shared behavioural directives (adversarial mandate,
  // precision mandate) are stated ONCE in the tribunal preamble instead of
- // being duplicated across all 37 judges. Per-judge sections include only
+ // being duplicated across all 39 judges. Per-judge sections include only
  // the unique evaluation criteria, domain-specific rules, and FP-avoidance
  // guidance. This reduces the tribunal prompt by ~40 000 chars (~10 000
  // tokens) without removing any evaluation criteria.
@@ -13,7 +13,7 @@ import { z } from "zod";
  import { JUDGES } from "../judges/index.js";
  // ─── Shared Behavioural Directives ───────────────────────────────────────────
  // Stated ONCE in the tribunal preamble so every judge benefits without
- // repeating the text 37 times.
+ // repeating the text 39 times.
  // ──────────────────────────────────────────────────────────────────────────────
  /** Adversarial evaluation stance — shared across all judges. */
  const SHARED_ADVERSARIAL_MANDATE = `ADVERSARIAL MANDATE (applies to ALL judges):
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@kevinrabun/judges",
- "version": "3.23.16",
- "description": "37 specialized judges that evaluate AI-generated code for security, cost, and quality.",
+ "version": "3.23.17",
+ "description": "39 specialized judges that evaluate AI-generated code for security, cost, and quality.",
  "mcpName": "io.github.KevinRabun/judges",
  "type": "module",
  "main": "dist/index.js",
package/server.json CHANGED
@@ -7,12 +7,12 @@
  "url": "https://github.com/kevinrabun/judges",
  "source": "github"
  },
- "version": "3.23.16",
+ "version": "3.23.17",
  "packages": [
  {
  "registryType": "npm",
  "identifier": "@kevinrabun/judges",
- "version": "3.23.16",
+ "version": "3.23.17",
  "transport": {
  "type": "stdio"
  }
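
The changelog above notes that the new `Judges: Add CI Workflow` command generates `.github/workflows/judges.yml` with a PR-triggered security-only preset. The generated file itself is not part of this diff; below is a minimal, hypothetical sketch of what such a workflow could look like, using only the `judges eval --min-score` gate documented in the README. The job name, Node version, and target path are illustrative assumptions, and the preset-selection mechanism is not shown.

```yaml
# Hypothetical sketch only: the actual file emitted by
# "Judges: Add CI Workflow" is not included in this diff.
name: Judges
on:
  pull_request:

jobs:
  judges:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # `judges eval --min-score 80 <file>` is documented in the README diff;
      # evaluating src/index.ts here is an assumed target path.
      - run: npx @kevinrabun/judges eval --min-score 80 src/index.ts
```

If the real generator applies the security-only preset, that configuration would presumably be passed via a flag or config file not visible in this diff.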