@kevinrabun/judges 3.20.12 → 3.20.13
- package/CHANGELOG.md +17 -0
- package/README.md +62 -11
- package/package.json +1 -1
- package/server.json +2 -2
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,23 @@

 All notable changes to **@kevinrabun/judges** are documented here.

+## [3.20.13] — 2026-03-04
+
+### Fixed
+- **Documentation accuracy audit** — Comprehensive review and correction of all documentation claims against the actual codebase:
+- Updated test badge count (1557 → 1666)
+- Updated judge dimension counts throughout (35 → 37) and architecture diagram heuristic count (33 → 36)
+- Added missing judges (`iac-security`, `false-positive-review`) to Judge IDs list, Judge Panel table, and MCP Prompts table
+- Updated evaluator and judge file counts (35 → 37)
+- Added 4 missing package exports to exports table (`./diagnostics`, `./plugins`, `./fingerprint`, `./comparison`)
+- Added 10 missing CLI commands to Scripts table (`feedback`, `benchmark`, `rule`, `pack`, `config`, `compare`, `list`)
+- Expanded project structure with ~20 missing files and directories (AST files, formatters, patches, tools, tests, scripts)
+- Fixed incorrect script filename (`analyze-report-findings.ts` → `debug-fp.ts`)
+- **VS Code extension README** — Replaced 3 hardcoded GPT-4o model references with vendor-neutral phrasing ("available language model" / "AI contextual review"), fixed "right-click a file" → "right-click in the editor", updated auto-fix patch count (47+ → 53)
+
+### Tests
+- 1666 tests, 0 failures
+
 ## [3.20.12] — 2026-03-03

 ### Changed
package/README.md
CHANGED
@@ -11,13 +11,13 @@ An MCP (Model Context Protocol) server that provides a panel of **37 specialized
 [](https://www.npmjs.com/package/@kevinrabun/judges)
 [](https://www.npmjs.com/package/@kevinrabun/judges)
 [](https://opensource.org/licenses/MIT)
-[](https://github.com/KevinRabun/judges/actions)

 ---

 ## Why Judges?

-AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews
+AI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. Human reviewers catch some of this, but nobody reviews 37 dimensions consistently.

 | | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |
 |---|---|---|---|---|
@@ -656,6 +656,8 @@ const svg2 = generateBadgeSvg(75, "quality"); // custom label
 | **Agent Instructions** | Agent Instruction Markdown Quality & Safety | `AGENT-` | Instruction hierarchy, conflict detection, unsafe overrides, scope, validation, policy guidance |
 | **AI Code Safety** | AI-Generated Code Safety | `AICS-` | Prompt injection, insecure LLM output handling, debug defaults, missing validation, unsafe deserialization of AI responses |
 | **Framework Safety** | Framework-Specific Safety | `FW-` | React hooks ordering, Express middleware chains, Next.js SSR/SSG pitfalls, Angular/Vue lifecycle patterns, framework-specific anti-patterns |
+| **IaC Security** | Infrastructure as Code | `IAC-` | Terraform, Bicep, ARM template misconfigurations, hardcoded secrets, missing encryption, overly permissive network/IAM rules |
+| **False-Positive Review** | False Positive Detection & Finding Accuracy | `FPR-` | Meta-judge reviewing pattern-based findings for false positives: string literal context, comment/docstring matches, test scaffolding, IaC template gating |

 ---

@@ -711,7 +713,7 @@ When your AI coding assistant connects to multiple MCP servers, each one contrib
 │ Judges │ │ CVE / │ │ Linter │
 │ Panel │ │ SBOM │ │ Server │
 │ ─────────────│ └────────┘ └────────┘
-│
+│ 36 Heuristic │ Vuln DB Style &
 │ judges │ scanning correctness
 │ + AST judge │
 └──────────────┘
@@ -934,7 +936,7 @@ Analyze a dependency manifest file for supply-chain risks, version pinning issue

 #### Judge IDs

-`data-security` · `cybersecurity` · `cost-effectiveness` · `scalability` · `cloud-readiness` · `software-practices` · `accessibility` · `api-design` · `reliability` · `observability` · `performance` · `compliance` · `data-sovereignty` · `testing` · `documentation` · `internationalization` · `dependency-health` · `concurrency` · `ethics-bias` · `maintainability` · `error-handling` · `authentication` · `database` · `caching` · `configuration-management` · `backwards-compatibility` · `portability` · `ux` · `logging-privacy` · `rate-limiting` · `ci-cd` · `code-structure` · `agent-instructions` · `ai-code-safety` · `framework-safety`
+`data-security` · `cybersecurity` · `cost-effectiveness` · `scalability` · `cloud-readiness` · `software-practices` · `accessibility` · `api-design` · `reliability` · `observability` · `performance` · `compliance` · `data-sovereignty` · `testing` · `documentation` · `internationalization` · `dependency-health` · `concurrency` · `ethics-bias` · `maintainability` · `error-handling` · `authentication` · `database` · `caching` · `configuration-management` · `backwards-compatibility` · `portability` · `ux` · `logging-privacy` · `rate-limiting` · `ci-cd` · `code-structure` · `agent-instructions` · `ai-code-safety` · `framework-safety` · `iac-security` · `false-positive-review`

 ---

@@ -979,6 +981,8 @@ Each judge has a corresponding prompt for LLM-powered deep analysis:
 | `judge-agent-instructions` | Deep review of agent instruction markdown quality and safety |
 | `judge-ai-code-safety` | Deep review of AI-generated code risks: prompt injection, insecure LLM output handling, debug defaults, missing validation |
 | `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue patterns |
+| `judge-iac-security` | Deep review of infrastructure-as-code security: Terraform, Bicep, ARM template misconfigurations |
+| `judge-false-positive-review` | Meta-judge review of pattern-based findings for false positive detection and accuracy |
 | `full-tribunal` | All 37 judges in a single prompt |

 ---
@@ -1111,23 +1115,37 @@ The **overall tribunal score** is the average of all 37 judges. The overall verd
 judges/
 ├── src/
 │ ├── index.ts # MCP server entry point — tools, prompts, transport
+│ ├── api.ts # Programmatic API entry point
+│ ├── cli.ts # CLI argument parser and command router
 │ ├── types.ts # TypeScript interfaces (Finding, JudgeEvaluation, etc.)
 │ ├── config.ts # .judgesrc configuration parser and validation
+│ ├── errors.ts # Custom error types (ConfigError, EvaluationError, ParseError)
 │ ├── language-patterns.ts # Multi-language regex pattern constants and helpers
+│ ├── plugins.ts # Plugin system for custom rules
+│ ├── scoring.ts # Confidence scoring and calibration
+│ ├── dedup.ts # Finding deduplication engine
+│ ├── fingerprint.ts # Finding fingerprint generation
+│ ├── comparison.ts # Tool comparison benchmark data
+│ ├── cache.ts # Evaluation result caching
+│ ├── calibration.ts # Confidence calibration from feedback data
+│ ├── fix-history.ts # Auto-fix application history tracking
 │ ├── ast/ # AST analysis engine (built-in, no external deps)
 │ │ ├── index.ts # analyzeStructure() — routes to correct parser
 │ │ ├── types.ts # FunctionInfo, CodeStructure interfaces
 │ │ ├── tree-sitter-ast.ts # Tree-sitter WASM parser (all 8 languages)
-│ │
+│ │ ├── structural-parser.ts # Fallback scope-tracking parser
+│ │ ├── cross-file-taint.ts # Cross-file taint propagation analysis
+│ │ └── taint-tracker.ts # Single-file taint flow tracking
 │ ├── evaluators/ # Analysis engine for each judge
 │ │ ├── index.ts # evaluateWithJudge(), evaluateWithTribunal(), evaluateProject(), etc.
 │ │ ├── shared.ts # Scoring, verdict logic, markdown formatters
-│ │ └── *.ts # One analyzer per judge (
+│ │ └── *.ts # One analyzer per judge (37 files)
 │ ├── formatters/ # Output formatters
 │ │ ├── sarif.ts # SARIF 2.1.0 output
 │ │ ├── html.ts # Self-contained HTML report (dark/light theme, filters)
 │ │ ├── junit.ts # JUnit XML output (Jenkins, Azure DevOps, GitHub Actions)
 │ │ ├── codeclimate.ts # CodeClimate/GitLab Code Quality JSON
+│ │ ├── diagnostics.ts # Diagnostics formatter
 │ │ └── badge.ts # SVG and text badge generator
 │ ├── commands/ # CLI subcommands
 │ │ ├── init.ts # Interactive project setup wizard
@@ -1140,21 +1158,40 @@ judges/
 │ │ ├── deps.ts # Dependency supply-chain analysis
 │ │ ├── baseline.ts # Create baseline for finding suppression
 │ │ ├── completions.ts # Shell completions (bash/zsh/fish/PowerShell)
-│ │
+│ │ ├── docs.ts # Per-judge rule documentation generator
+│ │ ├── feedback.ts # False-positive tracking & finding feedback
+│ │ ├── benchmark.ts # Detection accuracy benchmark suite
+│ │ ├── rule.ts # Custom rule authoring wizard
+│ │ ├── language-packs.ts # Language-specific rule pack presets
+│ │ └── config-share.ts # Shareable team/org configuration
 │ ├── presets.ts # Named evaluation presets (strict, lenient, security-only, …)
+│ ├── patches/
+│ │ └── index.ts # 53 deterministic auto-fix patch rules
+│ ├── tools/ # MCP tool registrations
+│ │ ├── register.ts # Tool registration orchestrator
+│ │ ├── register-evaluation.ts # Evaluation tools (evaluate_code, etc.)
+│ │ ├── register-workflow.ts # Workflow tools (app builder, reports, etc.)
+│ │ ├── prompts.ts # MCP prompt registrations (per-judge + full-tribunal)
+│ │ └── schemas.ts # Zod schemas for tool parameters
 │ ├── reports/
 │ │ └── public-repo-report.ts # Public repo clone + full tribunal report generation
 │ └── judges/ # Judge definitions (id, name, domain, system prompt)
 │ ├── index.ts # JUDGES array, getJudge(), getJudgeSummaries()
-│ └── *.ts # One definition per judge (
+│ └── *.ts # One definition per judge (37 files)
 ├── scripts/
 │ ├── generate-public-repo-report.ts # Run: npm run report:public-repo -- --repoUrl <url>
-│
+│ ├── daily-popular-repo-autofix.ts # Run: npm run automation:daily-popular
+│ └── debug-fp.ts # Debug false-positive findings
 ├── examples/
 │ ├── sample-vulnerable-api.ts # Intentionally flawed code (triggers all judges)
-│
+│ ├── demo.ts # Run: npm run demo
+│ └── quickstart.ts # Quick-start evaluation example
 ├── tests/
-│
+│ ├── judges.test.ts # Core judge evaluation tests
+│ ├── negative.test.ts # Negative / FP-avoidance tests
+│ ├── subsystems.test.ts # Subsystem integration tests
+│ ├── extension-logic.test.ts # VS Code extension logic tests
+│ └── tool-routing.test.ts # MCP tool routing tests
 ├── grammars/ # Tree-sitter WASM grammar files
 │ ├── tree-sitter-typescript.wasm
 │ ├── tree-sitter-cpp.wasm
@@ -1196,6 +1233,16 @@ judges/
 | `judges ci-templates` | Generate CI pipeline templates |
 | `judges docs` | Generate per-judge rule documentation |
 | `judges completions <shell>` | Shell completion scripts |
+| `judges feedback submit` | Mark findings as true positive, false positive, or won't fix |
+| `judges feedback stats` | Show false-positive rate statistics |
+| `judges benchmark run` | Run detection accuracy benchmark suite |
+| `judges rule create` | Interactive custom rule creation wizard |
+| `judges rule list` | List custom evaluation rules |
+| `judges pack list` | List available language packs |
+| `judges config export` | Export config as shareable package |
+| `judges config import <src>` | Import a shared configuration |
+| `judges compare` | Compare judges against other code review tools |
+| `judges list` | List all 37 judges with domains and descriptions |

 ---

@@ -1269,6 +1316,10 @@ const sarif = findingsToSarif(verdict.evaluations.flatMap(e => e.findings));
 | `@kevinrabun/judges/junit` | JUnit XML formatter |
 | `@kevinrabun/judges/codeclimate` | CodeClimate/GitLab Code Quality JSON |
 | `@kevinrabun/judges/badge` | SVG and text badge generator |
+| `@kevinrabun/judges/diagnostics` | Diagnostics formatter |
+| `@kevinrabun/judges/plugins` | Plugin system API |
+| `@kevinrabun/judges/fingerprint` | Finding fingerprint utilities |
+| `@kevinrabun/judges/comparison` | Tool comparison benchmarks |

 ### SARIF Output

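Reviewer note: the README changes in this release raise the documented judge count from 35 to 37. A count like this could be cross-checked mechanically against the Judge IDs line. The following is a minimal sketch only; `countJudgeIds` is a hypothetical helper and is not part of the package.

```typescript
// Hypothetical helper (not part of @kevinrabun/judges): counts the
// backtick-quoted IDs found in a README "Judge IDs" line.
function countJudgeIds(line: string): number {
  const matches = line.match(/`[a-z0-9-]+`/g);
  return matches ? matches.length : 0;
}

// Shortened sample for illustration; the full README line lists 37 IDs.
const sample = "`framework-safety` · `iac-security` · `false-positive-review`";
console.log(countJudgeIds(sample)); // prints 3
```

Running the same helper over the full added Judge IDs line should return 37 if the documentation and the stated count agree.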
package/package.json
CHANGED
package/server.json
CHANGED
@@ -7,12 +7,12 @@
 "url": "https://github.com/kevinrabun/judges",
 "source": "github"
 },
-"version": "3.20.
+"version": "3.20.13",
 "packages": [
 {
 "registryType": "npm",
 "identifier": "@kevinrabun/judges",
-"version": "3.20.
+"version": "3.20.13",
 "transport": {
 "type": "stdio"
 }
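Reviewer note: server.json carries the version twice, once at the top level and once inside `packages`, and both are bumped in this release. A small check could guard against the two drifting apart. This is a sketch under stated assumptions: the `ServerManifest` shape is inferred only from the fields visible in the hunk, and `findVersionMismatches` is a hypothetical helper.

```typescript
// Shape assumed from the fields visible in the server.json hunk;
// the real file may contain more fields.
interface ServerManifest {
  version: string;
  packages: { identifier: string; version: string }[];
}

// Hypothetical helper: returns the identifiers of packages whose
// version differs from the top-level version.
function findVersionMismatches(manifest: ServerManifest): string[] {
  return manifest.packages
    .filter((pkg) => pkg.version !== manifest.version)
    .map((pkg) => pkg.identifier);
}

const manifest: ServerManifest = {
  version: "3.20.13",
  packages: [{ identifier: "@kevinrabun/judges", version: "3.20.13" }],
};
console.log(findVersionMismatches(manifest)); // prints []
```

An empty result means every package entry matches the top-level version.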