@kevinrabun/judges 1.0.1 → 1.1.0
- package/README.md +205 -107
- package/dist/index.js +1 -1
- package/package.json +5 -2
- package/server.json +3 -3
package/README.md CHANGED

@@ -2,132 +2,220 @@
 
 An MCP (Model Context Protocol) server that provides a panel of **18 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed.
 
-
+[](https://github.com/KevinRabun/judges/actions/workflows/ci.yml)
+[](https://www.npmjs.com/package/@kevinrabun/judges)
+[](https://opensource.org/licenses/MIT)
 
-
-|-------|--------|-------------|-------------------|
-| **Judge Data Security** | Data Security & Privacy | `DATA-` | Encryption, PII handling, secrets management, access controls, GDPR/CCPA/HIPAA compliance |
-| **Judge Cybersecurity** | Cybersecurity & Threat Defense | `CYBER-` | Injection attacks, XSS, CSRF, auth flaws, dependency CVEs, OWASP Top 10 |
-| **Judge Cost Effectiveness** | Cost Optimization | `COST-` | Algorithm efficiency, N+1 queries, memory waste, caching strategy, cloud spend |
-| **Judge Scalability** | Scalability & Performance | `SCALE-` | Statelessness, horizontal scaling, concurrency, bottlenecks, rate limiting |
-| **Judge Cloud Readiness** | Cloud-Native & DevOps | `CLOUD-` | 12-Factor compliance, containerization, observability, graceful shutdown, IaC |
-| **Judge Software Practices** | Engineering Best Practices | `SWDEV-` | SOLID principles, type safety, error handling, testing, input validation, clean code |
-| **Judge Accessibility** | Accessibility (a11y) | `A11Y-` | WCAG compliance, screen reader support, keyboard navigation, ARIA, color contrast |
-| **Judge API Design** | API Design & Contracts | `API-` | REST conventions, versioning, pagination, error responses, consistency |
-| **Judge Reliability** | Reliability & Resilience | `REL-` | Error handling, timeouts, retries, circuit breakers, graceful degradation |
-| **Judge Observability** | Observability & Monitoring | `OBS-` | Structured logging, health checks, metrics, tracing, correlation IDs |
-| **Judge Performance** | Performance & Efficiency | `PERF-` | N+1 queries, sync I/O, caching, memory leaks, algorithmic complexity |
-| **Judge Compliance** | Regulatory Compliance | `COMP-` | GDPR/CCPA, PII protection, consent, data retention, audit trails |
-| **Judge Testing** | Testing & Quality Assurance | `TEST-` | Test coverage, assertions, test isolation, naming, external dependencies |
-| **Judge Documentation** | Documentation & Readability | `DOC-` | JSDoc/docstrings, magic numbers, TODOs, code comments, module docs |
-| **Judge Internationalization** | Internationalization (i18n) | `I18N-` | Hardcoded strings, locale handling, currency formatting, RTL support |
-| **Judge Dependency Health** | Dependency Management | `DEPS-` | Version pinning, deprecated packages, supply chain, import hygiene |
-| **Judge Concurrency** | Concurrency & Async Safety | `CONC-` | Race conditions, unbounded parallelism, missing await, resource cleanup |
-| **Judge Ethics & Bias** | Ethics & Bias | `ETHICS-` | Demographic logic, explainability, dark patterns, inclusive language |
+---
 
-##
+## Quick Start
 
-
+### 1. Install and Build
 
-
-
-
-
-
+```bash
+git clone https://github.com/KevinRabun/judges.git
+cd judges
+npm install
+npm run build
+```
 
-###
-List all available judges with their domains and descriptions.
+### 2. Try the Demo
 
-
-Submit code to the **full judges panel**. All 18 judges evaluate independently and return a combined verdict.
+Run the included demo to see all 18 judges evaluate a purposely flawed API server:
 
-
-
-
-- `context` (string, optional) — Additional context about the code
+```bash
+npm run demo
+```
 
-
+This evaluates [`examples/sample-vulnerable-api.ts`](examples/sample-vulnerable-api.ts) — a file intentionally packed with security holes, performance anti-patterns, and code quality issues — and prints a full verdict with per-judge scores and findings.
 
-
-Submit code to a **specific judge** for targeted review.
+**What you'll see:**
 
-
-
-
-
-
-
-
+```
+╔══════════════════════════════════════════════════════════════╗
+║           Judges Panel — Full Tribunal Demo                  ║
+╚══════════════════════════════════════════════════════════════╝
+
+Overall Verdict : FAIL
+Overall Score   : 61/100
+Critical Issues : 15
+High Issues     : 17
+Total Findings  : 81
+Judges Run      : 18
+
+Per-Judge Breakdown:
+────────────────────────────────────────────────────────────────
+❌ Judge Data Security             0/100   7 finding(s)
+❌ Judge Cybersecurity            24/100   6 finding(s)
+⚠️ Judge Cost Effectiveness       70/100   5 finding(s)
+⚠️ Judge Scalability              79/100   4 finding(s)
+❌ Judge Cloud Readiness          77/100   4 finding(s)
+⚠️ Judge Software Practices       73/100   5 finding(s)
+❌ Judge Accessibility            28/100   8 finding(s)
+❌ Judge API Design               35/100   9 finding(s)
+⚠️ Judge Reliability              70/100   3 finding(s)
+❌ Judge Observability            65/100   5 finding(s)
+❌ Judge Performance              53/100   5 finding(s)
+❌ Judge Compliance               34/100   4 finding(s)
+✅ Judge Testing                  94/100   1 finding(s)
+✅ Judge Documentation            82/100   4 finding(s)
+✅ Judge Internationalization     79/100   4 finding(s)
+✅ Judge Dependency Health        94/100   1 finding(s)
+⚠️ Judge Concurrency              64/100   4 finding(s)
+❌ Judge Ethics & Bias            77/100   2 finding(s)
+```
 
-
-- `judge-cybersecurity` — Deep cybersecurity review via LLM
-- `judge-cost-effectiveness` — Deep cost optimization review via LLM
-- `judge-scalability` — Deep scalability review via LLM
-- `judge-cloud-readiness` — Deep cloud readiness review via LLM
-- `judge-software-practices` — Deep software practices review via LLM
-- `judge-accessibility` — Deep accessibility/WCAG review via LLM
-- `judge-api-design` — Deep API design review via LLM
-- `judge-reliability` — Deep reliability & resilience review via LLM
-- `judge-observability` — Deep observability & monitoring review via LLM
-- `judge-performance` — Deep performance optimization review via LLM
-- `judge-compliance` — Deep regulatory compliance review via LLM
-- `judge-testing` — Deep testing quality review via LLM
-- `judge-documentation` — Deep documentation quality review via LLM
-- `judge-internationalization` — Deep i18n review via LLM
-- `judge-dependency-health` — Deep dependency health review via LLM
-- `judge-concurrency` — Deep concurrency & async safety review via LLM
-- `judge-ethics-bias` — Deep ethics & bias review via LLM
-- `full-tribunal` — All 18 judges via LLM in a single prompt
-
-## Setup
-
-### Build
+### 3. Run the Tests
 
 ```bash
-npm
-npm run build
+npm test
 ```
 
-
+Runs 184 automated tests covering all 18 judges, markdown formatters, and edge cases.
+
+### 4. Connect to Your Editor
+
+Add the Judges Panel as an MCP server so your AI coding assistant can use it automatically.
 
-
+**VS Code** — create `.vscode/mcp.json` in your project:
 
 ```json
 {
-  "
+  "servers": {
     "judges": {
       "command": "node",
-      "args": ["
+      "args": ["/absolute/path/to/judges/dist/index.js"]
     }
   }
 }
 ```
 
-
+**Claude Desktop** — add to `claude_desktop_config.json`:
 
 ```json
 {
-  "
-  "
-  "
-
-  "args": ["<path-to>/judges/dist/index.js"]
-  }
+  "mcpServers": {
+    "judges": {
+      "command": "node",
+      "args": ["/absolute/path/to/judges/dist/index.js"]
     }
   }
 }
 ```
 
+**Or install from npm** instead of cloning:
+
+```bash
+npm install -g @kevinrabun/judges
+```
+
+Then use `judges` as the command in your MCP config (no `args` needed).
+
+---
+
+## The Judge Panel
+
+| Judge | Domain | Rule Prefix | What It Evaluates |
+|-------|--------|-------------|-------------------|
+| **Data Security** | Data Security & Privacy | `DATA-` | Encryption, PII handling, secrets management, access controls |
+| **Cybersecurity** | Cybersecurity & Threat Defense | `CYBER-` | Injection attacks, XSS, CSRF, auth flaws, OWASP Top 10 |
+| **Cost Effectiveness** | Cost Optimization | `COST-` | Algorithm efficiency, N+1 queries, memory waste, caching strategy |
+| **Scalability** | Scalability & Performance | `SCALE-` | Statelessness, horizontal scaling, concurrency, bottlenecks |
+| **Cloud Readiness** | Cloud-Native & DevOps | `CLOUD-` | 12-Factor compliance, containerization, graceful shutdown, IaC |
+| **Software Practices** | Engineering Best Practices | `SWDEV-` | SOLID principles, type safety, error handling, input validation |
+| **Accessibility** | Accessibility (a11y) | `A11Y-` | WCAG compliance, screen reader support, keyboard navigation, ARIA |
+| **API Design** | API Design & Contracts | `API-` | REST conventions, versioning, pagination, error responses |
+| **Reliability** | Reliability & Resilience | `REL-` | Error handling, timeouts, retries, circuit breakers |
+| **Observability** | Observability & Monitoring | `OBS-` | Structured logging, health checks, metrics, tracing |
+| **Performance** | Performance & Efficiency | `PERF-` | N+1 queries, sync I/O, caching, memory leaks |
+| **Compliance** | Regulatory Compliance | `COMP-` | GDPR/CCPA, PII protection, consent, data retention, audit trails |
+| **Testing** | Testing & Quality Assurance | `TEST-` | Test coverage, assertions, test isolation, naming |
+| **Documentation** | Documentation & Readability | `DOC-` | JSDoc/docstrings, magic numbers, TODOs, code comments |
+| **Internationalization** | Internationalization (i18n) | `I18N-` | Hardcoded strings, locale handling, currency formatting |
+| **Dependency Health** | Dependency Management | `DEPS-` | Version pinning, deprecated packages, supply chain |
+| **Concurrency** | Concurrency & Async Safety | `CONC-` | Race conditions, unbounded parallelism, missing await |
+| **Ethics & Bias** | Ethics & Bias | `ETHICS-` | Demographic logic, dark patterns, inclusive language |
+
+---
+
+## How It Works
+
+The tribunal operates in two modes:
+
+1. **Pattern-Based Analysis (Tools)** — The `evaluate_code` and `evaluate_code_single_judge` tools perform heuristic analysis using pattern matching to catch common anti-patterns. This works entirely offline with zero external API calls.
+
+2. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `full-tribunal`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client, this enables deeper, context-aware analysis beyond what pattern matching can detect.
+
+---
+
+## MCP Tools
+
+### `get_judges`
+List all available judges with their domains and descriptions.
+
+### `evaluate_code`
+Submit code to the **full judges panel**. All 18 judges evaluate independently and return a combined verdict.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `code` | string | yes | The source code to evaluate |
+| `language` | string | yes | Programming language (e.g., `typescript`, `python`) |
+| `context` | string | no | Additional context about the code |
+
+### `evaluate_code_single_judge`
+Submit code to a **specific judge** for targeted review.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `code` | string | yes | The source code to evaluate |
+| `language` | string | yes | Programming language |
+| `judgeId` | string | yes | See [judge IDs](#judge-ids) below |
+| `context` | string | no | Additional context |
+
+#### Judge IDs
+
+`data-security` · `cybersecurity` · `cost-effectiveness` · `scalability` · `cloud-readiness` · `software-practices` · `accessibility` · `api-design` · `reliability` · `observability` · `performance` · `compliance` · `testing` · `documentation` · `internationalization` · `dependency-health` · `concurrency` · `ethics-bias`
+
+---
+
+## MCP Prompts
+
+Each judge has a corresponding prompt for LLM-powered deep analysis:
+
+| Prompt | Description |
+|--------|-------------|
+| `judge-data-security` | Deep data security review |
+| `judge-cybersecurity` | Deep cybersecurity review |
+| `judge-cost-effectiveness` | Deep cost optimization review |
+| `judge-scalability` | Deep scalability review |
+| `judge-cloud-readiness` | Deep cloud readiness review |
+| `judge-software-practices` | Deep software practices review |
+| `judge-accessibility` | Deep accessibility/WCAG review |
+| `judge-api-design` | Deep API design review |
+| `judge-reliability` | Deep reliability & resilience review |
+| `judge-observability` | Deep observability & monitoring review |
+| `judge-performance` | Deep performance optimization review |
+| `judge-compliance` | Deep regulatory compliance review |
+| `judge-testing` | Deep testing quality review |
+| `judge-documentation` | Deep documentation quality review |
+| `judge-internationalization` | Deep i18n review |
+| `judge-dependency-health` | Deep dependency health review |
+| `judge-concurrency` | Deep concurrency & async safety review |
+| `judge-ethics-bias` | Deep ethics & bias review |
+| `full-tribunal` | All 18 judges in a single prompt |
+
+---
+
 ## Scoring
 
 Each judge scores the code from **0 to 100**:
 
 | Severity | Score Deduction |
 |----------|----------------|
-| Critical |
-| High |
-| Medium |
-| Low |
+| Critical | −20 points |
+| High | −12 points |
+| Medium | −6 points |
+| Low | −3 points |
 | Info | 0 points |
 
 **Verdict logic:**
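The per-judge scoring rule described in the updated Scoring section (start at 100, deduct points per finding by severity) can be sketched as below. The names here are illustrative, not the package's actual exports (the real logic lives in `src/evaluators/shared.ts`, which this diff does not show), and flooring the score at 0 is an assumption based on the demo output.

```typescript
// Sketch of the per-judge scoring rule from the README's Scoring table.
// Illustrative only; not the package's real implementation.
type Severity = "critical" | "high" | "medium" | "low" | "info";

const DEDUCTIONS: Record<Severity, number> = {
  critical: 20, // README: Critical = -20 points
  high: 12,     // High     = -12 points
  medium: 6,    // Medium   = -6 points
  low: 3,       // Low      = -3 points
  info: 0,      // Info     = 0 points
};

// Assumption: scores are floored at 0 (the demo shows 0/100 for a judge
// with seven findings, several of them critical).
function judgeScore(findings: Severity[]): number {
  const deducted = findings.reduce((sum, s) => sum + DEDUCTIONS[s], 0);
  return Math.max(0, 100 - deducted);
}

console.log(judgeScore(["critical", "high"])); // 100 - 20 - 12 = 68
```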
@@ -137,38 +225,48 @@ Each judge scores the code from **0 to 100**:
 
 The **overall tribunal score** is the average of all 18 judges. The overall verdict fails if **any** judge fails.
 
-
-
-```
-# Judges Panel — Verdict
-
-**Overall Verdict: WARNING** | **Score: 68/100**
-Total critical findings: 1 | Total high findings: 3
-
-## Individual Judge Results
-
-❌ **Judge Data Security** (FAIL, 60/100) — 3 finding(s)
-⚠️ **Judge Cybersecurity** (WARNING, 76/100) — 2 finding(s)
-✅ **Judge Cost Effectiveness** (PASS, 88/100) — 1 finding(s)
-⚠️ **Judge Scalability** (WARNING, 70/100) — 2 finding(s)
-✅ **Judge Cloud Readiness** (PASS, 82/100) — 1 finding(s)
-⚠️ **Judge Software Practices** (WARNING, 72/100) — 3 finding(s)
-```
+---
 
 ## Project Structure
 
 ```
 judges/
 ├── src/
-│   ├── index.ts
-│   ├── types.ts
-│   ├──
-│
+│   ├── index.ts                  # MCP server entry point — tools, prompts, transport
+│   ├── types.ts                  # TypeScript interfaces (Finding, JudgeEvaluation, etc.)
+│   ├── evaluators/               # Pattern-based analysis engine for each judge
+│   │   ├── index.ts              # evaluateWithJudge(), evaluateWithTribunal()
+│   │   ├── shared.ts             # Scoring, verdict logic, markdown formatters
+│   │   └── *.ts                  # One analyzer per judge (18 files)
+│   └── judges/                   # Judge definitions (id, name, domain, system prompt)
+│       ├── index.ts              # JUDGES array, getJudge(), getJudgeSummaries()
+│       └── *.ts                  # One definition per judge (18 files)
+├── examples/
+│   ├── sample-vulnerable-api.ts  # Intentionally flawed code (triggers all 18 judges)
+│   └── demo.ts                   # Run: npm run demo
+├── tests/
+│   └── judges.test.ts            # Run: npm test (184 tests)
+├── server.json                   # MCP Registry manifest
 ├── package.json
 ├── tsconfig.json
 └── README.md
 ```
 
+---
+
+## Scripts
+
+| Command | Description |
+|---------|-------------|
+| `npm run build` | Compile TypeScript to `dist/` |
+| `npm run dev` | Watch mode — recompile on save |
+| `npm test` | Run the full test suite (184 tests) |
+| `npm run demo` | Run the sample tribunal demo |
+| `npm start` | Start the MCP server |
+| `npm run clean` | Remove `dist/` |
+
+---
+
 ## License
 
 MIT
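The tribunal-level aggregation the README states (overall score is the average of all judges; the overall verdict fails if any judge fails) can be sketched as follows. The `Verdict` and `JudgeResult` types are illustrative, and how a WARNING propagates to the overall verdict is an assumption not spelled out in this diff.

```typescript
// Sketch of tribunal aggregation per the README: average the scores,
// and any single FAIL sinks the overall verdict. Illustrative types only.
type Verdict = "PASS" | "WARNING" | "FAIL";

interface JudgeResult {
  name: string;
  verdict: Verdict;
  score: number; // 0-100
}

function overallVerdict(results: JudgeResult[]): { verdict: Verdict; score: number } {
  // Overall score = average of all judges (README-stated rule).
  const score = Math.round(
    results.reduce((sum, r) => sum + r.score, 0) / results.length,
  );
  // Any FAIL fails the tribunal; WARNING propagation is an assumption.
  const verdict: Verdict = results.some((r) => r.verdict === "FAIL")
    ? "FAIL"
    : results.some((r) => r.verdict === "WARNING")
      ? "WARNING"
      : "PASS";
  return { verdict, score };
}

const demo = overallVerdict([
  { name: "Testing", verdict: "PASS", score: 94 },
  { name: "Data Security", verdict: "FAIL", score: 0 },
]);
console.log(demo); // overall verdict FAIL, score (94 + 0) / 2 = 47
```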
package/dist/index.js CHANGED

@@ -20,7 +20,7 @@ import { evaluateWithJudge, evaluateWithTribunal, formatVerdictAsMarkdown, forma
 // ─── Create MCP Server ──────────────────────────────────────────────────────
 const server = new McpServer({
   name: "judges",
-  version: "1.
+  version: "1.1.0",
 });
 // ─── Tool: get_judges ────────────────────────────────────────────────────────
 server.tool("get_judges", "List all available judges on the Agent Tribunal panel, including their areas of expertise and what they evaluate.", {}, async () => {
package/package.json CHANGED

@@ -1,8 +1,8 @@
 {
   "name": "@kevinrabun/judges",
-  "version": "1.0
+  "version": "1.1.0",
   "description": "18 specialized judges that evaluate AI-generated code for security, cost, and quality.",
-  "mcpName": "io.github.
+  "mcpName": "io.github.KevinRabun/judges",
   "type": "module",
   "main": "dist/index.js",
   "bin": {
@@ -19,6 +19,8 @@
     "start": "node dist/index.js",
     "dev": "tsc --watch",
     "clean": "rimraf dist",
+    "test": "npx tsx --test tests/judges.test.ts",
+    "demo": "npx tsx examples/demo.ts",
     "prepublishOnly": "npm run build"
   },
   "keywords": [
@@ -48,6 +50,7 @@
   },
   "devDependencies": {
     "@types/node": "^25.3.0",
+    "tsx": "^4.19.4",
     "typescript": "^5.9.3"
   }
 }
package/server.json CHANGED

@@ -1,18 +1,18 @@
 {
   "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
-  "name": "io.github.
+  "name": "io.github.KevinRabun/judges",
   "title": "Judges Panel",
   "description": "18 specialized judges that evaluate AI-generated code for security, cost, and quality.",
   "repository": {
     "url": "https://github.com/kevinrabun/judges",
     "source": "github"
   },
-  "version": "1.0
+  "version": "1.1.0",
   "packages": [
     {
       "registryType": "npm",
       "identifier": "@kevinrabun/judges",
-      "version": "1.0
+      "version": "1.1.0",
       "transport": {
         "type": "stdio"
       }