@kevinrabun/judges 3.127.1 → 3.127.3

package/README.md CHANGED
@@ -15,7 +15,7 @@ An MCP (Model Context Protocol) server that provides a panel of **45 specialized
  [![npm](https://img.shields.io/npm/v/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![npm downloads](https://img.shields.io/npm/dw/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Tests](https://img.shields.io/badge/tests-2482-brightgreen)](https://github.com/KevinRabun/judges/actions)
+ [![Tests](https://img.shields.io/badge/tests-3614-brightgreen)](https://github.com/KevinRabun/judges/actions)

  > 🔰 **Packages**
  > - **CLI**: `@kevinrabun/judges-cli` → binary `judges` (use `npx @kevinrabun/judges-cli eval --file app.ts`).
@@ -843,7 +843,7 @@ The tribunal operates in three layers:

  Judges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **45 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.

- ### Built-in AST Analysis (v2.0.0+)
+ ### Built-in AST Analysis

  Unlike earlier versions that recommended a separate AST MCP server, Judges Panel now includes **real AST-based structural analysis** out of the box:

@@ -1236,7 +1236,9 @@ Create a `.judgesrc.json` (or `.judgesrc`) file in your project root to customiz
  "languages": ["typescript", "python"],
  "format": "text",
  "failOnFindings": false,
- "baseline": ""
+ "baseline": "",
+ "regulatoryScope": ["GDPR", "PCI-DSS", "SOC2"],
+ "consensusThreshold": 0.7
  }
  ```

@@ -1252,6 +1254,14 @@ Create a `.judgesrc.json` (or `.judgesrc`) file in your project root to customiz
  | `format` | `string` | `"text"` | Default output format: `text` · `json` · `sarif` · `markdown` · `html` · `pdf` · `junit` · `codeclimate` · `github-actions` |
  | `failOnFindings` | `boolean` | `false` | Exit code 1 when verdict is `fail` — useful for CI gates |
  | `baseline` | `string` | `""` | Path to a baseline JSON file — matching findings are suppressed |
+ | `plugins` | `string[]` | `[]` | Plugin module specifiers (npm packages or relative paths) that export custom judges |
+ | `judgeWeights` | `object` | `{}` | Weighted importance per judge for aggregated scoring (e.g. `{ "cybersecurity": 2.0 }`) |
+ | `failOnScoreBelow` | `number` | — | Minimum score (0–100) for the run to pass; complements `failOnFindings` |
+ | `regulatoryScope` | `string[]` | — | Regulatory frameworks in scope (e.g. `["GDPR", "PCI-DSS"]`). Findings citing ONLY out-of-scope frameworks are suppressed. Run `judges list --frameworks` for supported values. |
+ | `consensusThreshold` | `number` | — | Consensus suppression (0–1). If this fraction of judges report zero findings, minority findings are suppressed. Recommended: `0.7` for CI. |
+ | `escalationThreshold` | `number` | — | Confidence threshold (0–1) below which findings are flagged for human review |
+ | `overrides` | `array` | `[]` | Path-scoped config overrides (e.g. `[{ "files": "**/*.test.ts", "disabledJudges": ["documentation"] }]`) |
+ | `customRules` | `array` | `[]` | User-defined regex-based rules for business logic validation |

  All evaluation tools (CLI and MCP) accept the same configuration fields via `--config <path>` or inline `config` parameter.

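The `judgeWeights` option above adjusts how per-judge scores combine into one aggregate. A minimal sketch of such weighted aggregation, assuming the illustrative names `weightedScore`, `scores`, and `weights` (the actual aggregation is internal to Judges Panel and may differ):

```javascript
// Hypothetical weighted-average aggregation: each judge's score (0-100) is
// multiplied by its configured weight; judges absent from `weights` default to 1.
function weightedScore(scores, weights = {}) {
  let total = 0;
  let weightSum = 0;
  for (const [judge, score] of Object.entries(scores)) {
    const w = weights[judge] ?? 1;
    total += score * w;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// A weight of 2.0 pulls the aggregate toward the cybersecurity score.
weightedScore({ cybersecurity: 60, documentation: 100 }, { cybersecurity: 2.0 });
```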
@@ -1288,6 +1298,38 @@ Patches include `oldText`, `newText`, `startLine`, and `endLine` for automated a

  When multiple judges flag the same issue (e.g., both Data Security and Cybersecurity detect SQL injection on line 15), findings are automatically deduplicated. The highest-severity finding wins, and the description is annotated with cross-references (e.g., *"Also identified by: CYBER-003"*).

+ ### Human Focus Guide
+
+ Every tribunal evaluation includes a `humanFocusGuide` that categorizes findings into three buckets for human reviewers:
+
+ | Bucket | Description | When to use |
+ |--------|-------------|-------------|
+ | **✅ Trust** | High-confidence (≥80%), evidence-backed findings with AST/taint confirmation | Act directly — these have strong automated evidence |
+ | **🔍 Verify** | Lower-confidence or absence-based findings | Use your judgment — the issue may exist elsewhere in the project |
+ | **🔦 Blind Spots** | Areas automated analysis cannot evaluate | Focus your manual review time here |
+
+ Blind spots are detected from code characteristics: complex branching logic, external service calls, financial calculations, PII handling, state machines, and complex regex. The guide appears in CLI text/markdown output, JSON/SARIF output, and GitHub Action step summaries.
+
+ ### Regulatory Scope
+
+ Configure which regulatory frameworks apply to your project in `.judgesrc`:
+
+ ```json
+ { "regulatoryScope": ["GDPR", "PCI-DSS", "SOC2"] }
+ ```
+
+ Findings that cite ONLY out-of-scope frameworks are suppressed. Findings with no regulatory reference (general code quality) are always kept. Run `judges list --frameworks` to see all 17 supported frameworks (GDPR, CCPA, HIPAA, PCI-DSS, SOC2, SOX, COPPA, FedRAMP, NIST, ISO27001, ePrivacy, DORA, NIS2, EU-AI-Act, and more).
+
+ ### Self-Teaching Amendments
+
+ The LLM benchmark system auto-generates precision amendments for judges with high false-positive rates. Amendments are data-driven corrections injected into prompts that improve accuracy over successive benchmark runs.
+
+ The self-teaching loop:
+ 1. Run benchmark → analyzer identifies judges below 70% precision
+ 2. Generates targeted amendments (e.g., "Judge ERR: do not flag clean Express code with framework error middleware")
+ 3. Next benchmark run loads amendments → precision improves
+ 4. Run `judges codify-amendments` to bake amendments permanently into the distributed package
+

  ### Taint Flow Analysis

  The engine performs inter-procedural taint tracking to trace data from user-controlled sources (e.g., `req.body`, `process.env`) through transformations to security-sensitive sinks (e.g., `eval()`, `exec()`, SQL queries). Taint flows are used to boost confidence on true-positive findings and suppress false positives where sanitization is detected.
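The consensus-suppression rule from the configuration reference can be sketched as follows. `applyConsensus` and its argument shapes are illustrative assumptions, not the package's internal API:

```javascript
// Sketch only: if at least `threshold` fraction of judges report zero
// findings, treat the minority findings as probable noise and suppress them.
function applyConsensus(findingsByJudge, threshold) {
  const judges = Object.keys(findingsByJudge);
  if (judges.length === 0) return findingsByJudge;
  const cleanCount = judges.filter((j) => findingsByJudge[j].length === 0).length;
  const cleanFraction = cleanCount / judges.length;
  if (cleanFraction >= threshold) {
    // Consensus reached: suppress every remaining finding.
    return Object.fromEntries(judges.map((j) => [j, []]));
  }
  return findingsByJudge;
}
```

With the recommended `0.7`, a single dissenting judge among ten clean ones would be suppressed; a lower clean fraction keeps all findings.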
@@ -1475,6 +1517,8 @@ judges/
  | `judges config import <src>` | Import a shared configuration |
  | `judges compare` | Compare judges against other code review tools |
  | `judges list` | List all 45 judges with domains and descriptions |
+ | `judges list --frameworks` | List supported regulatory frameworks and `.judgesrc` usage |
+ | `judges codify-amendments` | Bake self-teaching amendments into judge source files |

  ---

@@ -153,7 +153,7 @@ export function parseLlmRuleIds(response) {
  // IDs mentioned in rationale text or findings tables of "clean" judge sections
  // from being counted as detections.
  const sections = response.split(/(?:^|\n)---\s*\n|(?=^## )/m);
- const zeroFindingsPattern = /\*?\*?(?:ZERO|zero|0|no)\s+findings?\*?\*?|(?:findings?|issues?)[\s:]*\*?\*?(?:none|0|zero)\*?\*?|no\s+(?:issues?|findings?|problems?|concerns?)\s+(?:found|detected|identified|reported)|report(?:ing)?\s+zero|Score\s*[|:]\s*\*?\*?100\s*\/?\s*100\*?\*?/i;
+ const zeroFindingsPattern = /(?:ZERO|zero|0|no) findings?|findings?[:\s]*(?:none|0|zero)|no (?:issues|findings|problems|concerns) (?:found|detected|identified|reported)|reporting? zero|Score[|: ]*100/i;
  for (const section of sections) {
  // If this section explicitly declares zero/no findings or a perfect score,
  // skip rule ID extraction — any rule IDs are explanatory references
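To see what the simplified pattern accepts, here is a small check using the new regex copied verbatim from the diff (the example strings are hypothetical LLM output, not fixtures from the package):

```javascript
// New zero-findings pattern, copied from the diff above.
const zeroFindingsPattern = /(?:ZERO|zero|0|no) findings?|findings?[:\s]*(?:none|0|zero)|no (?:issues|findings|problems|concerns) (?:found|detected|identified|reported)|reporting? zero|Score[|: ]*100/i;

// Sections that declare a clean result match...
zeroFindingsPattern.test("Zero findings for this section");       // matches
zeroFindingsPattern.test("No issues found in this file");         // matches
zeroFindingsPattern.test("| Score | 100/100 |");                  // matches
// ...while sections that report real findings do not.
zeroFindingsPattern.test("3 high-severity findings were raised"); // no match
```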
@@ -504,7 +504,7 @@ function synthesizeHumanFocusGuide(findings, code, language) {
  });
  }
  // State machines / workflow
- const hasStateMachine = /state\s*[=:]\s*['"][^'"]+['"]|status\s*===?\s*['"]|transition|workflow|step.*next/i.test(code);
+ const hasStateMachine = /state\s*[=:]\s*['"][^'"]+['"]|status\s*===?\s*['"]|transition|workflow|step[\w\s]{0,20}next/i.test(code);
  if (hasStateMachine) {
  blindSpots.push({
  area: "State Management / Workflow Logic",
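The bounded quantifier in the new heuristic limits how far apart `step` and `next` may sit before the blind spot fires. A quick illustration with the pattern from the diff (the sample strings are hypothetical):

```javascript
// New pattern from the diff: `step[\w\s]{0,20}next` only fires when "next"
// appears within 20 word/whitespace characters of "step", unlike `step.*next`.
const stateMachinePattern = /state\s*[=:]\s*['"][^'"]+['"]|status\s*===?\s*['"]|transition|workflow|step[\w\s]{0,20}next/i;

stateMachinePattern.test("advance stepIndex to next");            // matches
stateMachinePattern.test("step one " + "a".repeat(50) + " next"); // no match
```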
@@ -5,7 +5,10 @@ const SEVERITY_SET = new Set(["critical", "high", "medium", "low", "info"]);
  * Attempt to parse a JSON payload embedded in LLM output. Supports fenced code blocks and raw JSON.
  */
  function parseJsonBlock(text) {
- const fenceMatch = text.match(/```(?:json)?[ \t]*\n([\s\S]*?)\n[ \t]*```/i) ?? text.match(/```(?:json)?[ \t]*([\s\S]*?)```/i);
+ // Extract JSON from fenced code blocks — limit search to first 50KB to prevent ReDoS on large input
+ const searchText = text.length > 50_000 ? text.slice(0, 50_000) : text;
+ const fenceMatch = searchText.match(/```(?:json)?\s*\n([\s\S]{0,20000}?)\n\s*```/i) ??
+ searchText.match(/```(?:json)?\s*([\s\S]{0,20000}?)```/i);
  if (fenceMatch) {
  try {
  return JSON.parse(fenceMatch[1]);
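The hardened extraction can be exercised in isolation. This sketch inlines the new bounds from the diff into a standalone helper; `extractFencedJson` is an illustrative name, not the package's export:

```javascript
// Standalone version of the bounded fence extraction shown in the diff:
// search only the first 50 KB of input and cap the captured body at 20 KB.
function extractFencedJson(text) {
  const searchText = text.length > 50_000 ? text.slice(0, 50_000) : text;
  const fenceMatch = searchText.match(/```(?:json)?\s*\n([\s\S]{0,20000}?)\n\s*```/i) ??
    searchText.match(/```(?:json)?\s*([\s\S]{0,20000}?)```/i);
  if (!fenceMatch) return null;
  try {
    return JSON.parse(fenceMatch[1]);
  } catch {
    return null;
  }
}

extractFencedJson('Verdict below:\n```json\n{"score": 87}\n```');
```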
@@ -216,13 +216,9 @@ function compileExcludeRegexes(patterns) {
  if (!patterns || patterns.length === 0)
  return [];
  return patterns.map((pattern) => {
- try {
- return new RegExp(pattern, "i");
- }
- catch {
- // Invalid regex from user input — treat as literal string match
- return new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "i");
- }
+ // Always escape user input to prevent regex injection, then compile
+ const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+ return new RegExp(escaped, "i");
  });
  }
  function isLikelyNonProductionPath(path) {
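The new always-escape behavior treats every exclude pattern as a literal substring. A sketch using the same escape expression as the diff (the helper name `compileLiteral` is illustrative):

```javascript
// Escape regex metacharacters so user-supplied exclude patterns match
// literally, then compile case-insensitively (same expression as the diff).
function compileLiteral(pattern) {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped, "i");
}

compileLiteral("utils.test").test("src/utils.test.ts"); // true: literal dot matches
compileLiteral("utils.test").test("src/utilsXtest.ts"); // false: dot is no longer a wildcard
```

The trade-off is that previously-valid regex exclude patterns now match literally, which removes both the injection surface and the ambiguity of the old try/catch fallback.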
@@ -25,7 +25,7 @@ export function parseSkillFrontmatter(raw) {
  i++;
  continue;
  }
- const kv = line.match(/^([a-zA-Z_][a-zA-Z0-9_-]*)[ \t]*:[ \t]*(.*)$/);
+ const kv = line.match(/^([a-zA-Z_][a-zA-Z0-9_-]*)[ \t]*:[ \t]*(.*?)$/s);
  if (!kv) {
  i++;
  continue;
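The revised key/value pattern keeps a lazy value capture under the `s` flag. On a single frontmatter line it behaves like this (the sample line is hypothetical):

```javascript
// Revised pattern from the diff: lazy value capture with the `s` flag,
// so `$` anchors at end of input rather than end of each line.
const kvPattern = /^([a-zA-Z_][a-zA-Z0-9_-]*)[ \t]*:[ \t]*(.*?)$/s;

const kv = "description: Reviews code for security".match(kvPattern);
// kv[1] is the key, kv[2] is the full value.
```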
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kevinrabun/judges",
- "version": "3.127.1",
+ "version": "3.127.3",
  "description": "45 specialized judges that evaluate AI-generated code for security, cost, and quality.",
  "mcpName": "io.github.KevinRabun/judges",
  "type": "module",
@@ -1,6 +1,6 @@
  # @kevinrabun/judges-cli

- Standalone CLI package for Judges.
+ Standalone CLI package for the [Judges Panel](https://github.com/KevinRabun/judges) — 45 specialized judges that evaluate code for security, quality, compliance, and 40 more dimensions.

  ## Install

@@ -11,14 +11,46 @@ npm install -g @kevinrabun/judges-cli
  ## Usage

  ```bash
+ # Evaluate code
  judges eval src/app.ts
+ judges eval src/ --format sarif --output report.sarif
+ judges eval src/app.ts --judge cybersecurity
+ judges eval src/app.ts --preset strict --fail-on-findings
+
+ # List judges and regulatory frameworks
  judges list
- judges hook install
+ judges list --frameworks
+
+ # Auto-fix findings
+ judges fix src/app.ts --apply

  # Agentic skills
  judges skill ai-code-review --file src/app.ts
  judges skill security-review --file src/api.ts --format json
- judges skills # list available skills
+ judges skills
+
+ # Self-teaching
+ judges codify-amendments # bake benchmark amendments into judge files
+ judges codify-amendments --dry-run
  ```

- Use `@kevinrabun/judges` when you need the MCP server or programmatic API.
+ ## Configuration
+
+ Create a `.judgesrc.json` in your project root:
+
+ ```json
+ {
+ "preset": "strict",
+ "regulatoryScope": ["GDPR", "PCI-DSS"],
+ "disabledJudges": ["accessibility"],
+ "failOnFindings": true
+ }
+ ```
+
+ See the [full configuration reference](https://github.com/KevinRabun/judges#configuration) for all options.
+
+ ## Packages
+
+ - **`@kevinrabun/judges-cli`** — This package. Binary `judges` for CI/CD pipelines.
+ - **`@kevinrabun/judges`** — Programmatic API + MCP server.
+ - **VS Code extension** — [`kevinrabun.judges-panel`](https://marketplace.visualstudio.com/items?itemName=kevinrabun.judges-panel).
package/server.json CHANGED
@@ -16,12 +16,12 @@
  "mimeType": "image/png"
  }
  ],
- "version": "3.127.1",
+ "version": "3.127.3",
  "packages": [
  {
  "registryType": "npm",
  "identifier": "@kevinrabun/judges",
- "version": "3.127.1",
+ "version": "3.127.3",
  "transport": {
  "type": "stdio"
  }
@@ -44,7 +44,7 @@ export function parseSkillFrontmatter(raw: string): { meta: SkillMeta; body: str
  i++;
  continue;
  }
- const kv = line.match(/^([a-zA-Z_][a-zA-Z0-9_-]*)[ \t]*:[ \t]*(.*)$/);
+ const kv = line.match(/^([a-zA-Z_][a-zA-Z0-9_-]*)[ \t]*:[ \t]*(.*?)$/s);
  if (!kv) {
  i++;
  continue;