@kevinrabun/judges 3.20.12 → 3.20.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,23 @@
 
  All notable changes to **@kevinrabun/judges** are documented here.
 
+ ## [3.20.13] — 2026-03-04
+
+ ### Fixed
+ - **Documentation accuracy audit** — Comprehensive review and correction of all documentation claims against the actual codebase:
+   - Updated test badge count (1557 → 1666)
+   - Updated judge dimension counts throughout (35 → 37) and the architecture diagram heuristic count (33 → 36)
+   - Added missing judges (`iac-security`, `false-positive-review`) to the Judge IDs list, Judge Panel table, and MCP Prompts table
+   - Updated evaluator and judge file counts (35 → 37)
+   - Added 4 missing package exports to the exports table (`./diagnostics`, `./plugins`, `./fingerprint`, `./comparison`)
+   - Added 10 missing CLI commands to the Scripts table (across `feedback`, `benchmark`, `rule`, `pack`, `config`, `compare`, and `list`)
+   - Expanded the project structure with ~20 missing files and directories (AST files, formatters, patches, tools, tests, scripts)
+   - Fixed an incorrect script filename (`analyze-report-findings.ts` → `debug-fp.ts`)
+ - **VS Code extension README** — Replaced 3 hardcoded GPT-4o model references with vendor-neutral phrasing ("available language model" / "AI contextual review"), fixed "right-click a file" → "right-click in the editor", updated the auto-fix patch count (47+ → 53)
+
+ ### Tests
+ - 1666 tests, 0 failures
+
  ## [3.20.12] — 2026-03-03
 
  ### Changed
package/README.md CHANGED
@@ -11,13 +11,13 @@ An MCP (Model Context Protocol) server that provides a panel of **37 specialized
  [![npm](https://img.shields.io/npm/v/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![npm downloads](https://img.shields.io/npm/dw/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Tests](https://img.shields.io/badge/tests-1557-brightgreen)](https://github.com/KevinRabun/judges/actions)
+ [![Tests](https://img.shields.io/badge/tests-1666-brightgreen)](https://github.com/KevinRabun/judges/actions)
 
  ---
 
  ## Why Judges?
 
- AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 35 dimensions consistently.
+ AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 37 dimensions consistently.
 
  | | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |
  |---|---|---|---|---|
@@ -656,6 +656,8 @@ const svg2 = generateBadgeSvg(75, "quality"); // custom label
  | **Agent Instructions** | Agent Instruction Markdown Quality & Safety | `AGENT-` | Instruction hierarchy, conflict detection, unsafe overrides, scope, validation, policy guidance |
  | **AI Code Safety** | AI-Generated Code Safety | `AICS-` | Prompt injection, insecure LLM output handling, debug defaults, missing validation, unsafe deserialization of AI responses |
  | **Framework Safety** | Framework-Specific Safety | `FW-` | React hooks ordering, Express middleware chains, Next.js SSR/SSG pitfalls, Angular/Vue lifecycle patterns, framework-specific anti-patterns |
+ | **IaC Security** | Infrastructure as Code | `IAC-` | Terraform, Bicep, ARM template misconfigurations, hardcoded secrets, missing encryption, overly permissive network/IAM rules |
+ | **False-Positive Review** | False Positive Detection & Finding Accuracy | `FPR-` | Meta-judge reviewing pattern-based findings for false positives: string literal context, comment/docstring matches, test scaffolding, IaC template gating |
 
  ---
 
@@ -711,7 +713,7 @@ When your AI coding assistant connects to multiple MCP servers, each one contrib
  │ Judges │ │ CVE / │ │ Linter │
  │ Panel │ │ SBOM │ │ Server │
  │ ─────────────│ └────────┘ └────────┘
- │ 33 Heuristic │ Vuln DB Style &
+ │ 36 Heuristic │ Vuln DB Style &
  │ judges │ scanning correctness
  │ + AST judge │
  └──────────────┘
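
> **Editor's note — illustrative sketch.** The newly added `false-positive-review` judge is described as a meta-judge that re-checks pattern-based findings for string-literal and comment context. The core of that idea can be sketched in a few self-contained lines. Everything below is hypothetical (names and logic invented for illustration), not the package's implementation:

```typescript
// Sketch: classify whether a pattern match at a given column sits in
// executable code, inside a string literal, or inside a line comment.
function matchContext(line: string, col: number): "code" | "string" | "comment" {
  let inString: string | null = null; // holds the open quote character, if any
  for (let i = 0; i < col && i < line.length; i++) {
    const ch = line[i];
    if (inString) {
      if (ch === "\\") i++;                  // skip the escaped character
      else if (ch === inString) inString = null;
    } else if (ch === '"' || ch === "'" || ch === "`") {
      inString = ch;
    } else if (ch === "/" && line[i + 1] === "/") {
      return "comment";                      // rest of the line is a comment
    }
  }
  return inString ? "string" : "code";
}

// A meta-pass can then downgrade findings whose match is not in real code.
interface Finding { line: string; col: number; id: string; }
function isLikelyFalsePositive(f: Finding): boolean {
  return matchContext(f.line, f.col) !== "code";
}
```

A real implementation would also need multi-line strings, block comments, and docstrings (the table above mentions test scaffolding and IaC template gating too), but the single-line scan shows the shape of the check.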
@@ -934,7 +936,7 @@ Analyze a dependency manifest file for supply-chain risks, version pinning issue
 
  #### Judge IDs
 
- `data-security` · `cybersecurity` · `cost-effectiveness` · `scalability` · `cloud-readiness` · `software-practices` · `accessibility` · `api-design` · `reliability` · `observability` · `performance` · `compliance` · `data-sovereignty` · `testing` · `documentation` · `internationalization` · `dependency-health` · `concurrency` · `ethics-bias` · `maintainability` · `error-handling` · `authentication` · `database` · `caching` · `configuration-management` · `backwards-compatibility` · `portability` · `ux` · `logging-privacy` · `rate-limiting` · `ci-cd` · `code-structure` · `agent-instructions` · `ai-code-safety` · `framework-safety`
+ `data-security` · `cybersecurity` · `cost-effectiveness` · `scalability` · `cloud-readiness` · `software-practices` · `accessibility` · `api-design` · `reliability` · `observability` · `performance` · `compliance` · `data-sovereignty` · `testing` · `documentation` · `internationalization` · `dependency-health` · `concurrency` · `ethics-bias` · `maintainability` · `error-handling` · `authentication` · `database` · `caching` · `configuration-management` · `backwards-compatibility` · `portability` · `ux` · `logging-privacy` · `rate-limiting` · `ci-cd` · `code-structure` · `agent-instructions` · `ai-code-safety` · `framework-safety` · `iac-security` · `false-positive-review`
 
  ---
 
@@ -979,6 +981,8 @@ Each judge has a corresponding prompt for LLM-powered deep analysis:
  | `judge-agent-instructions` | Deep review of agent instruction markdown quality and safety |
  | `judge-ai-code-safety` | Deep review of AI-generated code risks: prompt injection, insecure LLM output handling, debug defaults, missing validation |
  | `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue patterns |
+ | `judge-iac-security` | Deep review of infrastructure-as-code security: Terraform, Bicep, ARM template misconfigurations |
+ | `judge-false-positive-review` | Meta-judge review of pattern-based findings for false positive detection and accuracy |
  | `full-tribunal` | All 37 judges in a single prompt |
 
  ---
@@ -1111,23 +1115,37 @@ The **overall tribunal score** is the average of all 37 judges. The overall verd
  judges/
  ├── src/
  │ ├── index.ts # MCP server entry point — tools, prompts, transport
+ │ ├── api.ts # Programmatic API entry point
+ │ ├── cli.ts # CLI argument parser and command router
  │ ├── types.ts # TypeScript interfaces (Finding, JudgeEvaluation, etc.)
  │ ├── config.ts # .judgesrc configuration parser and validation
+ │ ├── errors.ts # Custom error types (ConfigError, EvaluationError, ParseError)
  │ ├── language-patterns.ts # Multi-language regex pattern constants and helpers
+ │ ├── plugins.ts # Plugin system for custom rules
+ │ ├── scoring.ts # Confidence scoring and calibration
+ │ ├── dedup.ts # Finding deduplication engine
+ │ ├── fingerprint.ts # Finding fingerprint generation
+ │ ├── comparison.ts # Tool comparison benchmark data
+ │ ├── cache.ts # Evaluation result caching
+ │ ├── calibration.ts # Confidence calibration from feedback data
+ │ ├── fix-history.ts # Auto-fix application history tracking
  │ ├── ast/ # AST analysis engine (built-in, no external deps)
  │ │ ├── index.ts # analyzeStructure() — routes to correct parser
  │ │ ├── types.ts # FunctionInfo, CodeStructure interfaces
  │ │ ├── tree-sitter-ast.ts # Tree-sitter WASM parser (all 8 languages)
- │ │ └── structural-parser.ts # Fallback scope-tracking parser
+ │ │ ├── structural-parser.ts # Fallback scope-tracking parser
+ │ │ ├── cross-file-taint.ts # Cross-file taint propagation analysis
+ │ │ └── taint-tracker.ts # Single-file taint flow tracking
  │ ├── evaluators/ # Analysis engine for each judge
  │ │ ├── index.ts # evaluateWithJudge(), evaluateWithTribunal(), evaluateProject(), etc.
  │ │ ├── shared.ts # Scoring, verdict logic, markdown formatters
- │ │ └── *.ts # One analyzer per judge (35 files)
+ │ │ └── *.ts # One analyzer per judge (37 files)
  │ ├── formatters/ # Output formatters
  │ │ ├── sarif.ts # SARIF 2.1.0 output
  │ │ ├── html.ts # Self-contained HTML report (dark/light theme, filters)
  │ │ ├── junit.ts # JUnit XML output (Jenkins, Azure DevOps, GitHub Actions)
  │ │ ├── codeclimate.ts # CodeClimate/GitLab Code Quality JSON
+ │ │ ├── diagnostics.ts # Diagnostics formatter
  │ │ └── badge.ts # SVG and text badge generator
  │ ├── commands/ # CLI subcommands
  │ │ ├── init.ts # Interactive project setup wizard
@@ -1140,21 +1158,40 @@ judges/
  │ │ ├── deps.ts # Dependency supply-chain analysis
  │ │ ├── baseline.ts # Create baseline for finding suppression
  │ │ ├── completions.ts # Shell completions (bash/zsh/fish/PowerShell)
- │ │ └── docs.ts # Per-judge rule documentation generator
+ │ │ ├── docs.ts # Per-judge rule documentation generator
+ │ │ ├── feedback.ts # False-positive tracking & finding feedback
+ │ │ ├── benchmark.ts # Detection accuracy benchmark suite
+ │ │ ├── rule.ts # Custom rule authoring wizard
+ │ │ ├── language-packs.ts # Language-specific rule pack presets
+ │ │ └── config-share.ts # Shareable team/org configuration
  │ ├── presets.ts # Named evaluation presets (strict, lenient, security-only, …)
+ │ ├── patches/
+ │ │ └── index.ts # 53 deterministic auto-fix patch rules
+ │ ├── tools/ # MCP tool registrations
+ │ │ ├── register.ts # Tool registration orchestrator
+ │ │ ├── register-evaluation.ts # Evaluation tools (evaluate_code, etc.)
+ │ │ ├── register-workflow.ts # Workflow tools (app builder, reports, etc.)
+ │ │ ├── prompts.ts # MCP prompt registrations (per-judge + full-tribunal)
+ │ │ └── schemas.ts # Zod schemas for tool parameters
  │ ├── reports/
  │ │ └── public-repo-report.ts # Public repo clone + full tribunal report generation
  │ └── judges/ # Judge definitions (id, name, domain, system prompt)
  │ ├── index.ts # JUDGES array, getJudge(), getJudgeSummaries()
- │ └── *.ts # One definition per judge (35 files)
+ │ └── *.ts # One definition per judge (37 files)
  ├── scripts/
  │ ├── generate-public-repo-report.ts # Run: npm run report:public-repo -- --repoUrl <url>
- │ └── daily-popular-repo-autofix.ts # Run: npm run automation:daily-popular
+ │ ├── daily-popular-repo-autofix.ts # Run: npm run automation:daily-popular
+ │ └── debug-fp.ts # Debug false-positive findings
  ├── examples/
  │ ├── sample-vulnerable-api.ts # Intentionally flawed code (triggers all judges)
- │ └── demo.ts # Run: npm run demo
+ │ ├── demo.ts # Run: npm run demo
+ │ └── quickstart.ts # Quick-start evaluation example
  ├── tests/
- │ └── judges.test.ts # Run: npm test
+ │ ├── judges.test.ts # Core judge evaluation tests
+ │ ├── negative.test.ts # Negative / FP-avoidance tests
+ │ ├── subsystems.test.ts # Subsystem integration tests
+ │ ├── extension-logic.test.ts # VS Code extension logic tests
+ │ └── tool-routing.test.ts # MCP tool routing tests
  ├── grammars/ # Tree-sitter WASM grammar files
  │ ├── tree-sitter-typescript.wasm
  │ ├── tree-sitter-cpp.wasm
@@ -1196,6 +1233,16 @@ judges/
  | `judges ci-templates` | Generate CI pipeline templates |
  | `judges docs` | Generate per-judge rule documentation |
  | `judges completions <shell>` | Shell completion scripts |
+ | `judges feedback submit` | Mark findings as true positive, false positive, or won't fix |
+ | `judges feedback stats` | Show false-positive rate statistics |
+ | `judges benchmark run` | Run detection accuracy benchmark suite |
+ | `judges rule create` | Interactive custom rule creation wizard |
+ | `judges rule list` | List custom evaluation rules |
+ | `judges pack list` | List available language packs |
+ | `judges config export` | Export config as shareable package |
+ | `judges config import <src>` | Import a shared configuration |
+ | `judges compare` | Compare judges against other code review tools |
+ | `judges list` | List all 37 judges with domains and descriptions |
 
  ---
 
@@ -1269,6 +1316,10 @@ const sarif = findingsToSarif(verdict.evaluations.flatMap(e => e.findings));
  | `@kevinrabun/judges/junit` | JUnit XML formatter |
  | `@kevinrabun/judges/codeclimate` | CodeClimate/GitLab Code Quality JSON |
  | `@kevinrabun/judges/badge` | SVG and text badge generator |
+ | `@kevinrabun/judges/diagnostics` | Diagnostics formatter |
+ | `@kevinrabun/judges/plugins` | Plugin system API |
+ | `@kevinrabun/judges/fingerprint` | Finding fingerprint utilities |
+ | `@kevinrabun/judges/comparison` | Tool comparison benchmarks |
 
  ### SARIF Output
 
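> **Editor's note — illustrative sketch.** The new `./fingerprint` export is documented only as "finding fingerprint utilities". Conceptually, a finding fingerprint is a stable deduplication key that survives line-number drift between runs. The sketch below shows the general idea; the hash function, key fields, and normalization are assumptions for illustration, not the package's algorithm:

```typescript
// Sketch: derive a stable key from the rule id, file path, and the
// *normalized* offending snippet (whitespace collapsed), so that moving or
// reindenting the line does not change the fingerprint.
function fnv1a(s: string): string {
  let h = 0x811c9dc5; // 32-bit FNV-1a offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return h.toString(16).padStart(8, "0");
}

function fingerprint(ruleId: string, file: string, snippet: string): string {
  const normalized = snippet.trim().replace(/\s+/g, " ");
  return fnv1a(`${ruleId}|${file}|${normalized}`);
}
```

With a key like this, a baseline file can suppress previously seen findings even after unrelated edits shift them to different lines.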
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kevinrabun/judges",
- "version": "3.20.12",
+ "version": "3.20.13",
  "description": "37 specialized judges that evaluate AI-generated code for security, cost, and quality.",
  "mcpName": "io.github.KevinRabun/judges",
  "type": "module",
package/server.json CHANGED
@@ -7,12 +7,12 @@
  "url": "https://github.com/kevinrabun/judges",
  "source": "github"
  },
- "version": "3.20.12",
+ "version": "3.20.13",
  "packages": [
  {
  "registryType": "npm",
  "identifier": "@kevinrabun/judges",
- "version": "3.20.12",
+ "version": "3.20.13",
  "transport": {
  "type": "stdio"
  }