fa-mcp-sdk 0.4.73 → 0.4.76
- package/cli-template/.claude/skills/mcp-app-add-to-server/SKILL.md +427 -0
- package/cli-template/.claude/skills/mcp-app-create/SKILL.md +222 -0
- package/cli-template/FA-MCP-SDK-DOC/08-agent-tester-and-headless-api.md +681 -659
- package/cli-template/README.md +193 -191
- package/cli-template/package.json +1 -1
- package/cli-template/readme-docs/SKILLS.md +85 -0
- package/package.json +1 -1
|
@@ -1,659 +1,681 @@
|
|
|
1
|
-
# Agent Tester and Headless API
|
|
2
|
-
|
|
3
|
-
## Overview
|
|
4
|
-
|
|
5
|
-
The Agent Tester is a built-in AI agent system for developing and refining MCP server tools. It goes beyond functional testing — it validates the **full agent experience**: how the LLM interprets tool descriptions, selects tools, passes arguments, and presents results.
|
|
6
|
-
|
|
7
|
-
The Headless API provides programmatic access to the Agent Tester without a browser. It enables CLI-based automated testing and returns structured trace data for every tool call, argument, result, and LLM decision.
|
|
8
|
-
|
|
9
|
-
## Developing MCP Servers as Agents
|
|
10
|
-
|
|
11
|
-
An MCP server is not just a set of tools — it is an **agent interface**. The LLM acts as the agent, deciding which tools to call, with what arguments, and how to interpret results. This means the quality of the agent experience depends on:
|
|
12
|
-
|
|
13
|
-
- **Tool descriptions** — the LLM reads them to decide when and why to call a tool
|
|
14
|
-
- **Parameter schemas** — names, types, required/optional flags, and default value documentation guide the LLM's argument construction
|
|
15
|
-
- **Response format** — `formatToolResult()` output must be structured so the LLM can interpret and relay it to the user
|
|
16
|
-
- **Agent prompt** — the system prompt shapes the LLM's conversation style, tool usage logic, and error handling behavior
|
|
17
|
-
- **Tool decomposition** — whether one tool should be split into two, or two merged into one
|
|
18
|
-
|
|
19
|
-
All of these aspects are **invisible to unit tests**. A tool can pass all unit tests and still produce a poor agent experience because the LLM misinterprets the description, sends wrong argument types, or doesn't understand the response format.
|
|
20
|
-
|
|
21
|
-
The Agent Tester closes this gap by running the **full agent loop**: user message → LLM reasoning → tool selection → tool execution → LLM interpretation → user response.
|
|
22
|
-
|
|
23
|
-
## Three-Phase Development Workflow
|
|
24
|
-
|
|
25
|
-
### Phase 1: Initial Architecture
|
|
26
|
-
|
|
27
|
-
Design tools, prompts, parameters, and handler logic based on task requirements. Implement a first working version:
|
|
28
|
-
|
|
29
|
-
```bash
|
|
30
|
-
npm run cb && npm start
|
|
31
|
-
```
|
|
32
|
-
|
|
33
|
-
### Phase 2: Basic Functionality
|
|
34
|
-
|
|
35
|
-
Verify compilation, server startup, tool registration, and basic calls. Fix crashes, connection errors, and missing tools.
|
|
36
|
-
|
|
37
|
-
### Phase 3: Iterative Refinement
|
|
38
|
-
|
|
39
|
-
This is the key phase. Send test messages through the Agent Tester, observe the agent's behavior, diagnose issues, and refine:
|
|
40
|
-
|
|
41
|
-
```
|
|
42
|
-
observe agent behavior → diagnose root cause → fix → rebuild → re-test
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
Root cause categories:
|
|
46
|
-
- **Tool description** — LLM picks wrong tool or misunderstands purpose
|
|
47
|
-
- **Parameter schema** — LLM sends wrong types or misses required params
|
|
48
|
-
- **Agent prompt** — LLM doesn't follow desired conversation style
|
|
49
|
-
- **Handler logic** — tool results confuse the LLM
|
|
50
|
-
- **Error messages** — failures produce unhelpful responses
|
|
51
|
-
|
|
52
|
-
## Authentication (`agentTester.useAuth`)
|
|
53
|
-
|
|
54
|
-
When `agentTester.useAuth` is `true`, the Agent Tester is protected by the full multi-auth middleware — the same authentication chain used for MCP endpoints (`permanentServerTokens` / `basic` / `jwtToken` / `custom`).
|
|
55
|
-
|
|
56
|
-
### How It Works
|
|
57
|
-
|
|
58
|
-
**Browser access:** When a user opens `/agent-tester` in a browser, the page loads normally (static assets are served without auth). The frontend checks `GET /api/auth/status` and displays a **login dialog** if the user is not authenticated. The dialog adapts to configured auth methods:
|
|
59
|
-
|
|
60
|
-
- If `permanentServerTokens` or `jwtToken` is configured — shows a "Token" input
|
|
61
|
-
- If `basic` auth is configured — shows "Username" + "Password" inputs
|
|
62
|
-
- If both are configured — shows tabs to switch between methods
|
|
63
|
-
|
|
64
|
-
After successful login via `POST /api/auth/login`, the server issues an httpOnly session cookie (`__at_sid`). All subsequent API requests from the browser include this cookie automatically. The session is valid for the configured TTL (default: 8 hours — see [Session Lifetime](#session-lifetime) below). A logout button appears in the header.
|
|
65
|
-
|
|
66
|
-
**Headless / CLI access:** Headless API consumers (curl, scripts, Claude Code) bypass the login dialog entirely. They pass an `Authorization` header with each request, which is validated by the standard `authMW`. No session cookie is needed.
|
|
67
|
-
|
|
68
|
-
### Configuration
|
|
69
|
-
|
|
70
|
-
```yaml
|
|
71
|
-
agentTester:
|
|
72
|
-
useAuth: true # Show login screen for browser, require auth for API
|
|
73
|
-
sessionTtlMs: 28800000 # Browser session lifetime in ms (default: 8h)
|
|
74
|
-
tokenTTLSec: 1800 # TTL of JWTs auto-issued for the chat UI / headless clients (default: 30 min)
|
|
75
|
-
|
|
76
|
-
webServer:
|
|
77
|
-
auth:
|
|
78
|
-
enabled: true
|
|
79
|
-
permanentServerTokens: ['my-secret-token']
|
|
80
|
-
# and/or basic, jwtToken — any configured method will be available
|
|
81
|
-
```
|
|
82
|
-
|
|
83
|
-
Environment variables:
|
|
84
|
-
|
|
85
|
-
- `AGENT_TESTER_USE_AUTH=true`
|
|
86
|
-
- `AGENT_TESTER_SESSION_TTL_MS=28800000`
|
|
87
|
-
- `AGENT_TESTER_TOKEN_TTL_SEC=1800`
|
|
88
|
-
|
|
89
|
-
When `useAuth` is `false` (default), the Agent Tester is accessible without any authentication and `sessionTtlMs` has no effect.
|
|
90
|
-
|
|
91
|
-
### Session Lifetime
|
|
92
|
-
|
|
93
|
-
When `useAuth` is `true`, a successful browser login creates a server-side session and sets an httpOnly cookie (`__at_sid`) scoped to `/agent-tester`. Both the in-memory entry and the cookie's `Max-Age` use the same TTL from `agentTester.sessionTtlMs`.
|
|
94
|
-
|
|
95
|
-
**Where sessions live**: an in-memory `Map` inside the server process (`src/core/auth/agent-tester-auth.ts`). There is no disk or Redis persistence — this is intentional because Agent Tester is a development tool, not a production auth system.
|
|
96
|
-
|
|
97
|
-
**Default TTL**: 8 hours (`28_800_000` ms). Override by setting `agentTester.sessionTtlMs` in `config/default.yaml` (or any environment-specific override file), or via `AGENT_TESTER_SESSION_TTL_MS`. Values are in milliseconds; any non-positive or non-finite value falls back to the 8h default.
|
|
98
|
-
|
|
99
|
-
**Cleanup**: a background sweep runs every 30 minutes and drops expired entries from the map. Expired entries are also evicted lazily on access.
|
|
100
|
-
|
|
101
|
-
**Impact of closing the browser or restarting the server**:
|
|
102
|
-
|
|
103
|
-
| Scenario | Re-login required? |
|
|
104
|
-
|---|---|
|
|
105
|
-
| Close tab, reopen within TTL | No — cookie is persistent, server session still live |
|
|
106
|
-
| Close entire browser, reopen within TTL | No — cookie is persistent, server session still live |
|
|
107
|
-
| TTL elapsed since last login | Yes — server drops the entry, responds 401 |
|
|
108
|
-
| Server restart (Ctrl+C, deploy, crash) | Yes — in-memory map is cleared; browser presents an unknown `__at_sid` and the login overlay reappears |
|
|
109
|
-
| User clicks the Logout button | Yes — `POST /api/auth/logout` deletes the entry and clears the cookie |
|
|
110
|
-
|
|
111
|
-
**Tuning guidance**:
|
|
112
|
-
|
|
113
|
-
- **Shorter TTL (e.g. 1 hour = `3600000`)**: more frequent logins, smaller exposure if a workstation is left unlocked.
|
|
114
|
-
- **Longer TTL (e.g. 24 hours = `86400000`)**: fewer interruptions during long development sessions.
|
|
115
|
-
- **Do not set TTL to 0 or a negative value** — the server will silently fall back to the 8h default.
|
|
116
|
-
|
|
117
|
-
> **Note**: the TTL only affects the browser login flow. Headless API access via `Authorization` header is stateless and completely bypasses sessions; it is unaffected by `sessionTtlMs`.
|
|
118
|
-
|
|
119
|
-
### Auth API Endpoints
|
|
120
|
-
|
|
121
|
-
| Endpoint | Method | Description |
|
|
122
|
-
|----------|--------|-------------|
|
|
123
|
-
| `/api/auth/status` | GET | Returns `{ authRequired, authenticated, methods }` |
|
|
124
|
-
| `/api/auth/login` | POST | Validates credentials, sets session cookie |
|
|
125
|
-
| `/api/auth/logout` | POST | Destroys session, clears cookie |
|
|
126
|
-
| `/api/auth-token` | GET | Returns a ready-to-use `Authorization` header value for the configured MCP auth method (used by the chat UI to auto-fill the header). Response: `{ authType, token, ttlSec? }`. |
|
|
127
|
-
| `/api/auth-token/refresh` | POST | Re-issues a fresh JWT (only when `webServer.auth.jwtToken.encryptKey` is configured). Response: `{ authType: 'jwtToken', token, ttlSec }`. |
|
|
128
|
-
|
|
129
|
-
### Auto-filled Authorization Header
|
|
130
|
-
|
|
131
|
-
When the MCP server requires authentication (`webServer.auth.enabled: true`) and the chat UI is configured to send the `Authorization` header, the page does **not** ask the user to type a token — it issues one for itself by calling `GET /api/auth-token` on load. The endpoint returns a header value derived from the configured method, in priority order:
|
|
132
|
-
|
|
133
|
-
1. **`jwtToken`** — `Bearer <encrypted JWT>` issued by the server with `sub: 'agentTester'`, `service: <appConfig.name>`, and TTL = `agentTester.tokenTTLSec` (default 1800 sec / 30 min). The response also includes `ttlSec` so the client can plan refresh.
|
|
134
|
-
2. **`basic`** — `Basic <base64(user:password)>` from `webServer.auth.basic`.
|
|
135
|
-
3. **`permanentServerTokens`** — `Bearer <first configured token>`.
|
|
136
|
-
|
|
137
|
-
For **JWT only**, the page periodically refreshes the token on its own via `POST /api/auth-token/refresh`. The refresh cadence is approximately `max(30, ttlSec/3 - 60)` seconds (≈ once per 1/3 of TTL, with a 60-second safety lead and a 30-second floor). At the default `tokenTTLSec: 1800`, this means a refresh roughly every **9 minutes**. The page additionally triggers an immediate refresh when the tab regains focus or `visibilitychange` fires `'visible'`, to recover from background-tab timer throttling.
|
|
138
|
-
|
|
139
|
-
If the MCP call still fails with HTTP 401 — for example, the cached token expired in the brief window between the last refresh and the request — the server transparently re-issues a JWT and retries the call **once**, but only when the target URL points to the same server (host/port match `webServer.{host,port}`, with `localhost`/`127.0.0.1`/`::1`/`0.0.0.0` treated as equivalent) and the cached header was a `Bearer …` token. This means the user typically does not see a 401 even if a request races against TTL expiry.
|
|
140
|
-
|
|
141
|
-
**Tuning**:
|
|
142
|
-
- Shorter `tokenTTLSec` → more frequent refresh requests but smaller window of exposure if a token leaks.
|
|
143
|
-
- Longer `tokenTTLSec` → fewer refreshes; useful for very long-running sessions.
|
|
144
|
-
- Headless clients (the `headless-chat.js` wrapper, custom curl scripts) may either rely on the 401-retry path or, for long-running scripts, mint their own JWT via `node scripts/generate-jwt.js` with an appropriate TTL — Agent Tester does not refresh tokens on behalf of headless clients.
|
|
145
|
-
|
|
146
|
-
**Login request body:**
|
|
147
|
-
|
|
148
|
-
```json
|
|
149
|
-
// Token-based (permanent token or JWT)
|
|
150
|
-
{ "token": "my-secret-token" }
|
|
151
|
-
|
|
152
|
-
// Basic auth
|
|
153
|
-
{ "username": "admin", "password": "secret" }
|
|
154
|
-
```
|
|
155
|
-
|
|
156
|
-
### Headless Client Example
|
|
157
|
-
|
|
158
|
-
```bash
|
|
159
|
-
# Access Agent Tester API with token (no login needed)
|
|
160
|
-
curl -H "Authorization: Bearer my-secret-token" http://localhost:9876/agent-tester/api/mcp/status
|
|
161
|
-
|
|
162
|
-
# Headless test with token
|
|
163
|
-
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
|
|
164
|
-
-H "Authorization: Bearer my-secret-token" \
|
|
165
|
-
-H "Content-Type: application/json" \
|
|
166
|
-
-d '{"message":"Hello","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
|
|
167
|
-
```
|
|
1
|
+
# Agent Tester and Headless API
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
The Agent Tester is a built-in AI agent system for developing and refining MCP server tools. It goes beyond functional testing — it validates the **full agent experience**: how the LLM interprets tool descriptions, selects tools, passes arguments, and presents results.
|
|
6
|
+
|
|
7
|
+
The Headless API provides programmatic access to the Agent Tester without a browser. It enables CLI-based automated testing and returns structured trace data for every tool call, argument, result, and LLM decision.
|
|
8
|
+
|
|
9
|
+
## Developing MCP Servers as Agents
|
|
10
|
+
|
|
11
|
+
An MCP server is not just a set of tools — it is an **agent interface**. The LLM acts as the agent, deciding which tools to call, with what arguments, and how to interpret results. This means the quality of the agent experience depends on:
|
|
12
|
+
|
|
13
|
+
- **Tool descriptions** — the LLM reads them to decide when and why to call a tool
|
|
14
|
+
- **Parameter schemas** — names, types, required/optional flags, and default value documentation guide the LLM's argument construction
|
|
15
|
+
- **Response format** — `formatToolResult()` output must be structured so the LLM can interpret and relay it to the user
|
|
16
|
+
- **Agent prompt** — the system prompt shapes the LLM's conversation style, tool usage logic, and error handling behavior
|
|
17
|
+
- **Tool decomposition** — whether one tool should be split into two, or two merged into one
|
|
18
|
+
|
|
19
|
+
All of these aspects are **invisible to unit tests**. A tool can pass all unit tests and still produce a poor agent experience because the LLM misinterprets the description, sends wrong argument types, or doesn't understand the response format.
|
|
20
|
+
|
|
21
|
+
The Agent Tester closes this gap by running the **full agent loop**: user message → LLM reasoning → tool selection → tool execution → LLM interpretation → user response.
|
|
22
|
+
|
|
23
|
+
## Three-Phase Development Workflow
|
|
24
|
+
|
|
25
|
+
### Phase 1: Initial Architecture
|
|
26
|
+
|
|
27
|
+
Design tools, prompts, parameters, and handler logic based on task requirements. Implement a first working version:
|
|
28
|
+
|
|
29
|
+
```bash
|
|
30
|
+
npm run cb && npm start
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
### Phase 2: Basic Functionality
|
|
34
|
+
|
|
35
|
+
Verify compilation, server startup, tool registration, and basic calls. Fix crashes, connection errors, and missing tools.
|
|
36
|
+
|
|
37
|
+
### Phase 3: Iterative Refinement
|
|
38
|
+
|
|
39
|
+
This is the key phase. Send test messages through the Agent Tester, observe the agent's behavior, diagnose issues, and refine:
|
|
40
|
+
|
|
41
|
+
```
|
|
42
|
+
observe agent behavior → diagnose root cause → fix → rebuild → re-test
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
Root cause categories:
|
|
46
|
+
- **Tool description** — LLM picks wrong tool or misunderstands purpose
|
|
47
|
+
- **Parameter schema** — LLM sends wrong types or misses required params
|
|
48
|
+
- **Agent prompt** — LLM doesn't follow desired conversation style
|
|
49
|
+
- **Handler logic** — tool results confuse the LLM
|
|
50
|
+
- **Error messages** — failures produce unhelpful responses
|
|
51
|
+
|
|
52
|
+
## Authentication (`agentTester.useAuth`)
|
|
53
|
+
|
|
54
|
+
When `agentTester.useAuth` is `true`, the Agent Tester is protected by the full multi-auth middleware — the same authentication chain used for MCP endpoints (`permanentServerTokens` / `basic` / `jwtToken` / `custom`).
|
|
55
|
+
|
|
56
|
+
### How It Works
|
|
57
|
+
|
|
58
|
+
**Browser access:** When a user opens `/agent-tester` in a browser, the page loads normally (static assets are served without auth). The frontend checks `GET /api/auth/status` and displays a **login dialog** if the user is not authenticated. The dialog adapts to configured auth methods:
|
|
59
|
+
|
|
60
|
+
- If `permanentServerTokens` or `jwtToken` is configured — shows a "Token" input
|
|
61
|
+
- If `basic` auth is configured — shows "Username" + "Password" inputs
|
|
62
|
+
- If both are configured — shows tabs to switch between methods
|
|
63
|
+
|
|
64
|
+
After successful login via `POST /api/auth/login`, the server issues an httpOnly session cookie (`__at_sid`). All subsequent API requests from the browser include this cookie automatically. The session is valid for the configured TTL (default: 8 hours — see [Session Lifetime](#session-lifetime) below). A logout button appears in the header.
|
|
65
|
+
|
|
66
|
+
**Headless / CLI access:** Headless API consumers (curl, scripts, Claude Code) bypass the login dialog entirely. They pass an `Authorization` header with each request, which is validated by the standard `authMW`. No session cookie is needed.
|
|
67
|
+
|
|
68
|
+
### Configuration
|
|
69
|
+
|
|
70
|
+
```yaml
|
|
71
|
+
agentTester:
|
|
72
|
+
useAuth: true # Show login screen for browser, require auth for API
|
|
73
|
+
sessionTtlMs: 28800000 # Browser session lifetime in ms (default: 8h)
|
|
74
|
+
tokenTTLSec: 1800 # TTL of JWTs auto-issued for the chat UI / headless clients (default: 30 min)
|
|
75
|
+
|
|
76
|
+
webServer:
|
|
77
|
+
auth:
|
|
78
|
+
enabled: true
|
|
79
|
+
permanentServerTokens: ['my-secret-token']
|
|
80
|
+
# and/or basic, jwtToken — any configured method will be available
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
Environment variables:
|
|
84
|
+
|
|
85
|
+
- `AGENT_TESTER_USE_AUTH=true`
|
|
86
|
+
- `AGENT_TESTER_SESSION_TTL_MS=28800000`
|
|
87
|
+
- `AGENT_TESTER_TOKEN_TTL_SEC=1800`
|
|
88
|
+
|
|
89
|
+
When `useAuth` is `false` (default), the Agent Tester is accessible without any authentication and `sessionTtlMs` has no effect.
|
|
90
|
+
|
|
91
|
+
### Session Lifetime
|
|
92
|
+
|
|
93
|
+
When `useAuth` is `true`, a successful browser login creates a server-side session and sets an httpOnly cookie (`__at_sid`) scoped to `/agent-tester`. Both the in-memory entry and the cookie's `Max-Age` use the same TTL from `agentTester.sessionTtlMs`.
|
|
94
|
+
|
|
95
|
+
**Where sessions live**: an in-memory `Map` inside the server process (`src/core/auth/agent-tester-auth.ts`). There is no disk or Redis persistence — this is intentional because Agent Tester is a development tool, not a production auth system.
|
|
96
|
+
|
|
97
|
+
**Default TTL**: 8 hours (`28_800_000` ms). Override by setting `agentTester.sessionTtlMs` in `config/default.yaml` (or any environment-specific override file), or via `AGENT_TESTER_SESSION_TTL_MS`. Values are in milliseconds; any non-positive or non-finite value falls back to the 8h default.
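
The fallback rule above can be sketched as a small function. The helper name `resolveSessionTtlMs` is hypothetical, and the SDK's own parsing may differ in detail; this only illustrates "non-positive or non-finite falls back to 8h".

```javascript
// Hypothetical helper: illustrates the documented fallback, not the SDK's actual code.
const DEFAULT_SESSION_TTL_MS = 28_800_000; // 8 hours

function resolveSessionTtlMs(raw) {
  // Accepts numbers or numeric strings (e.g. a value read from an env variable).
  const value = Number(raw);
  // Non-finite (NaN, Infinity) and non-positive values fall back to the default.
  return Number.isFinite(value) && value > 0 ? value : DEFAULT_SESSION_TTL_MS;
}
```

With this rule, `resolveSessionTtlMs("3600000")` yields one hour, while `0`, `-1`, or an unset value all resolve to the 8h default.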
|
|
98
|
+
|
|
99
|
+
**Cleanup**: a background sweep runs every 30 minutes and drops expired entries from the map. Expired entries are also evicted lazily on access.
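
A minimal sketch of how an in-memory store with both a periodic sweep and lazy eviction might look. The class name `SessionStore`, its methods, and the injectable clock are assumptions made for illustration; the real implementation lives in `src/core/auth/agent-tester-auth.ts` and is not reproduced here.

```javascript
// Illustrative in-memory session store; class and method names are hypothetical.
class SessionStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, convenient for tests
    this.sessions = new Map(); // session id -> expiry timestamp (ms)
  }

  create(sid) {
    this.sessions.set(sid, this.now() + this.ttlMs);
  }

  // Lazy eviction: an expired entry is dropped when it is looked up.
  isValid(sid) {
    const expiresAt = this.sessions.get(sid);
    if (expiresAt === undefined) return false;
    if (expiresAt <= this.now()) {
      this.sessions.delete(sid);
      return false;
    }
    return true;
  }

  // Background sweep: drop every expired entry and report how many were removed.
  sweep() {
    let removed = 0;
    for (const [sid, expiresAt] of this.sessions) {
      if (expiresAt <= this.now()) {
        this.sessions.delete(sid);
        removed += 1;
      }
    }
    return removed;
  }
}
```

A caller could schedule the sweep with something like `setInterval(() => store.sweep(), 30 * 60 * 1000)`. Because the map lives only in process memory, a restart empties it, which matches the re-login behavior described in the table below.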
|
|
100
|
+
|
|
101
|
+
**Impact of closing the browser or restarting the server**:
|
|
102
|
+
|
|
103
|
+
| Scenario | Re-login required? |
|
|
104
|
+
|---|---|
|
|
105
|
+
| Close tab, reopen within TTL | No — cookie is persistent, server session still live |
|
|
106
|
+
| Close entire browser, reopen within TTL | No — cookie is persistent, server session still live |
|
|
107
|
+
| TTL elapsed since last login | Yes — server drops the entry, responds 401 |
|
|
108
|
+
| Server restart (Ctrl+C, deploy, crash) | Yes — in-memory map is cleared; browser presents an unknown `__at_sid` and the login overlay reappears |
|
|
109
|
+
| User clicks the Logout button | Yes — `POST /api/auth/logout` deletes the entry and clears the cookie |
|
|
110
|
+
|
|
111
|
+
**Tuning guidance**:
|
|
112
|
+
|
|
113
|
+
- **Shorter TTL (e.g. 1 hour = `3600000`)**: more frequent logins, smaller exposure if a workstation is left unlocked.
|
|
114
|
+
- **Longer TTL (e.g. 24 hours = `86400000`)**: fewer interruptions during long development sessions.
|
|
115
|
+
- **Do not set TTL to 0 or a negative value** — the server will silently fall back to the 8h default.
|
|
116
|
+
|
|
117
|
+
> **Note**: the TTL only affects the browser login flow. Headless API access via `Authorization` header is stateless and completely bypasses sessions; it is unaffected by `sessionTtlMs`.
|
|
118
|
+
|
|
119
|
+
### Auth API Endpoints
|
|
120
|
+
|
|
121
|
+
| Endpoint | Method | Description |
|
|
122
|
+
|----------|--------|-------------|
|
|
123
|
+
| `/api/auth/status` | GET | Returns `{ authRequired, authenticated, methods }` |
|
|
124
|
+
| `/api/auth/login` | POST | Validates credentials, sets session cookie |
|
|
125
|
+
| `/api/auth/logout` | POST | Destroys session, clears cookie |
|
|
126
|
+
| `/api/auth-token` | GET | Returns a ready-to-use `Authorization` header value for the configured MCP auth method (used by the chat UI to auto-fill the header). Response: `{ authType, token, ttlSec? }`. |
|
|
127
|
+
| `/api/auth-token/refresh` | POST | Re-issues a fresh JWT (only when `webServer.auth.jwtToken.encryptKey` is configured). Response: `{ authType: 'jwtToken', token, ttlSec }`. |
|
|
128
|
+
|
|
129
|
+
### Auto-filled Authorization Header
|
|
130
|
+
|
|
131
|
+
When the MCP server requires authentication (`webServer.auth.enabled: true`) and the chat UI is configured to send the `Authorization` header, the page does **not** ask the user to type a token — it issues one for itself by calling `GET /api/auth-token` on load. The endpoint returns a header value derived from the configured method, in priority order:
|
|
132
|
+
|
|
133
|
+
1. **`jwtToken`** — `Bearer <encrypted JWT>` issued by the server with `sub: 'agentTester'`, `service: <appConfig.name>`, and TTL = `agentTester.tokenTTLSec` (default 1800 sec / 30 min). The response also includes `ttlSec` so the client can plan refresh.
|
|
134
|
+
2. **`basic`** — `Basic <base64(user:password)>` from `webServer.auth.basic`.
|
|
135
|
+
3. **`permanentServerTokens`** — `Bearer <first configured token>`.
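
The priority order above can be sketched as follows. This is an illustration under stated assumptions: the `username`/`password` field names under `basic`, the `issueJwt` callback, and the returned `headerValue` field are hypothetical and not taken from the SDK.

```javascript
// Sketch only: config field names and the issueJwt callback are assumptions.
function buildAuthHeader(auth, issueJwt) {
  // 1. jwtToken takes priority when an encryption key is configured.
  if (auth.jwtToken?.encryptKey) {
    return { authType: "jwtToken", headerValue: `Bearer ${issueJwt()}` };
  }
  // 2. basic: base64-encode "user:password".
  if (auth.basic?.username && auth.basic?.password) {
    const raw = `${auth.basic.username}:${auth.basic.password}`;
    const encoded = Buffer.from(raw, "utf8").toString("base64");
    return { authType: "basic", headerValue: `Basic ${encoded}` };
  }
  // 3. permanentServerTokens: use the first configured token.
  if (auth.permanentServerTokens?.length) {
    return {
      authType: "permanentServerTokens",
      headerValue: `Bearer ${auth.permanentServerTokens[0]}`,
    };
  }
  return null; // no usable method configured
}
```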
|
|
136
|
+
|
|
137
|
+
For **JWT only**, the page periodically refreshes the token on its own via `POST /api/auth-token/refresh`. The refresh cadence is approximately `max(30, ttlSec/3 - 60)` seconds (≈ once per 1/3 of TTL, with a 60-second safety lead and a 30-second floor). At the default `tokenTTLSec: 1800`, this means a refresh roughly every **9 minutes**. The page additionally triggers an immediate refresh when the tab regains focus or `visibilitychange` fires `'visible'`, to recover from background-tab timer throttling.
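
As a worked example of the cadence formula, the sketch below evaluates `max(30, ttlSec/3 - 60)` directly. The function name is hypothetical, and the page's actual timer may round or adjust the value, since the text describes the cadence as approximate.

```javascript
// Hypothetical helper evaluating the documented formula max(30, ttlSec/3 - 60).
function refreshIntervalSec(ttlSec) {
  return Math.max(30, ttlSec / 3 - 60);
}

// Default tokenTTLSec of 1800: 1800 / 3 - 60 = 540 seconds, i.e. 9 minutes.
const defaultIntervalSec = refreshIntervalSec(1800);
// Very short TTLs hit the 30-second floor: 120 / 3 - 60 is negative.
const shortTtlIntervalSec = refreshIntervalSec(120);
```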
|
|
138
|
+
|
|
139
|
+
If the MCP call still fails with HTTP 401 — for example, the cached token expired in the brief window between the last refresh and the request — the server transparently re-issues a JWT and retries the call **once**, but only when the target URL points to the same server (host/port match `webServer.{host,port}`, with `localhost`/`127.0.0.1`/`::1`/`0.0.0.0` treated as equivalent) and the cached header was a `Bearer …` token. This means the user typically does not see a 401 even if a request races against TTL expiry.
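
The retry conditions above can be expressed as a pair of checks. This is a sketch, not the SDK's code: the function names are hypothetical, and the default-port handling (80/443 when the URL omits a port) is an assumption the text does not spell out.

```javascript
// Hosts the text treats as equivalent when matching against webServer.host.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "::1", "0.0.0.0"]);

function normalizeHost(host) {
  // URL.hostname keeps square brackets around IPv6 literals, so strip them.
  const bare = host.toLowerCase().replace(/^\[|\]$/g, "");
  return LOOPBACK_HOSTS.has(bare) ? "loopback" : bare;
}

function isSameServer(targetUrl, serverHost, serverPort) {
  const url = new URL(targetUrl);
  // Assumption: fall back to the scheme's default port when none is given.
  const port = url.port || (url.protocol === "https:" ? "443" : "80");
  return (
    normalizeHost(url.hostname) === normalizeHost(serverHost) &&
    port === String(serverPort)
  );
}

// Retry once only for a 401, a Bearer header, and a target on this same server.
function shouldRetryWithFreshJwt(status, targetUrl, cachedHeader, serverHost, serverPort) {
  return (
    status === 401 &&
    cachedHeader.startsWith("Bearer ") &&
    isSameServer(targetUrl, serverHost, serverPort)
  );
}
```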
|
|
140
|
+
|
|
141
|
+
**Tuning**:
|
|
142
|
+
- Shorter `tokenTTLSec` → more frequent refresh requests but smaller window of exposure if a token leaks.
|
|
143
|
+
- Longer `tokenTTLSec` → fewer refreshes; useful for very long-running sessions.
|
|
144
|
+
- Headless clients (the `headless-chat.js` wrapper, custom curl scripts) may either rely on the 401-retry path or, for long-running scripts, mint their own JWT via `node scripts/generate-jwt.js` with an appropriate TTL — Agent Tester does not refresh tokens on behalf of headless clients.
|
|
145
|
+
|
|
146
|
+
**Login request body:**
|
|
147
|
+
|
|
148
|
+
```json
|
|
149
|
+
// Token-based (permanent token or JWT)
|
|
150
|
+
{ "token": "my-secret-token" }
|
|
151
|
+
|
|
152
|
+
// Basic auth
|
|
153
|
+
{ "username": "admin", "password": "secret" }
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
### Headless Client Example
|
|
157
|
+
|
|
158
|
+
```bash
|
|
159
|
+
# Access Agent Tester API with token (no login needed)
|
|
160
|
+
curl -H "Authorization: Bearer my-secret-token" http://localhost:9876/agent-tester/api/mcp/status
|
|
161
|
+
|
|
162
|
+
# Headless test with token
|
|
163
|
+
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
|
|
164
|
+
-H "Authorization: Bearer my-secret-token" \
|
|
165
|
+
-H "Content-Type: application/json" \
|
|
166
|
+
-d '{"message":"Hello","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
|
|
167
|
+
```
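
For scripts that prefer Node over curl, the same request might be built as in the sketch below. The function names are hypothetical, `baseUrl` and `token` are placeholders, and treating the response as JSON is an assumption based on the description of structured trace data. It relies on the global `fetch` available in Node 18 and later.

```javascript
// Sketch: builds the same request as the curl example above.
function buildChatTestRequest(baseUrl, token, message) {
  return {
    url: `${baseUrl}/agent-tester/api/chat/test`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        message,
        mcpConfig: { url: `${baseUrl}/mcp`, transport: "http" },
      }),
    },
  };
}

// Sends the request; assumes the endpoint answers with a JSON body.
async function runChatTest(baseUrl, token, message) {
  const { url, init } = buildChatTestRequest(baseUrl, token, message);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Agent Tester returned HTTP ${res.status}`);
  return res.json();
}
```

A call such as `runChatTest("http://localhost:9876", "my-secret-token", "Hello")` would mirror the second curl command.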
|
|
168
|
+
|
|
169
|
+
### Windows Encoding Note (curl + Cyrillic / Non-ASCII)
|
|
170
|
+
|
|
171
|
+
On Windows, curl's `-d` flag may corrupt non-ASCII characters (e.g. Cyrillic) because the shell passes the argument bytes in the system code page (for example CP1251 on Russian-locale systems) rather than UTF-8. The LLM then receives garbled text and propagates it into tool arguments.
|
|
172
|
+
|
|
173
|
+
**Symptom:** tool call arguments contain mojibake like `п�?п�?п�?п�?п�?` instead of readable Russian text.
|
|
174
|
+
|
|
175
|
+
**Fix:** write the JSON body to a file (UTF-8) and use `--data-binary @file`:
|
|
176
|
+
|
|
177
|
+
```bash
|
|
178
|
+
# 1. Write request JSON to a file (editor must save as UTF-8)
|
|
179
|
+
cat > tmp-request.json << 'EOF'
|
|
180
|
+
{"message":"Отправь письмо на user@example.com с темой \"Тест\"","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}
|
|
181
|
+
EOF
|
|
182
|
+
|
|
183
|
+
# 2. Send with --data-binary to preserve UTF-8 encoding
|
|
184
|
+
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
|
|
185
|
+
-H "Content-Type: application/json; charset=utf-8" \
|
|
186
|
+
--data-binary @tmp-request.json
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
This is only needed when running curl from a Windows shell with non-ASCII text. Linux/macOS terminals use UTF-8 by default and are not affected.
|
|
190
|
+
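Another way to sidestep the shell code page is to send the request from a Node.js script, since the text then never passes through shell argument encoding. The sketch below assumes the script file itself is saved as UTF-8; `encodeBody` is a hypothetical helper name.

```javascript
// Serialize the request body and encode it explicitly as UTF-8 bytes.
function encodeBody(body) {
  return Buffer.from(JSON.stringify(body), 'utf8');
}

const bytes = encodeBody({
  message: 'Отправь письмо на user@example.com',
  mcpConfig: { url: 'http://localhost:9876/mcp', transport: 'http' },
});

// Decoding the bytes as UTF-8 gives back the original text unchanged.
const roundTrip = JSON.parse(bytes.toString('utf8'));
console.log(roundTrip.message);

// Usage (running server, Node 18+):
// await fetch('http://localhost:9876/agent-tester/api/chat/test', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json; charset=utf-8' },
//   body: bytes,
// });
```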
|
|
191
|
+
## Disabled State
|
|
192
|
+
|
|
193
|
+
When `agentTester.enabled` is `false` (or not set), all `/agent-tester/*` endpoints — including the Headless API — return HTTP 404:
|
|
194
|
+
|
|
195
|
+
```json
|
|
196
|
+
{
|
|
197
|
+
"error": "Not found"
|
|
198
|
+
}
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
This prevents disclosing the existence of the Agent Tester to external users.
|
|
202
|
+
|
|
203
|
+
To enable the Agent Tester, use one of the following:
|
|
204
|
+
|
|
205
|
+
- **ENV**: `AGENT_TESTER_ENABLED=true`
|
|
206
|
+
- **Config file** `config/local.yaml`:
|
|
207
|
+
```yaml
|
|
208
|
+
agentTester:
|
|
209
|
+
enabled: true
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
### Hiding Footer Link Without Disabling Tester
|
|
213
|
+
|
|
214
|
+
To keep the Agent Tester functional (accessible directly by URL) but hide the link from the home page footer, use `showFooterLink: false`:
|
|
215
|
+
|
|
216
|
+
```yaml
|
|
217
|
+
agentTester:
|
|
218
|
+
enabled: true
|
|
219
|
+
showFooterLink: false # Tester works, but footer link is hidden
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
When `showFooterLink` is omitted or `true` (default), the footer link is shown as usual.
|
|
223
|
+
|
|
224
|
+
**For Claude Code**: When you receive a 404 response on `/agent-tester/*`, the Agent Tester may be disabled. Inform the developer and suggest enabling it via `agentTester.enabled: true` in `config/local.yaml` or ENV `AGENT_TESTER_ENABLED=true`. The developer decides whether to turn it on. If the developer's prompt or instructions explicitly say not to mention Agent Tester enablement, respect that and stay silent.
|
|
225
|
+
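An automation script could encode the guidance above as a small helper. This is a minimal sketch: the function name `agentTesterHint` and the `mentionAllowed` flag are hypothetical, and the hint text simply restates the two enablement options from this section.

```javascript
// Hypothetical helper: turn an HTTP status from /agent-tester/* into a hint.
// mentionAllowed models the case where the developer asked not to be told
// about Agent Tester enablement.
function agentTesterHint(status, mentionAllowed = true) {
  if (status !== 404 || !mentionAllowed) return null;
  return (
    'Agent Tester may be disabled. To enable it, set agentTester.enabled: true ' +
    'in config/local.yaml or AGENT_TESTER_ENABLED=true in the environment.'
  );
}

console.log(agentTesterHint(404));
```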
|
|
226
|
+
## LLM Availability Check
|
|
227
|
+
|
|
228
|
+
Before using the Agent Tester or Headless API, verify that the OpenAI API key is configured and works:
|
|
229
|
+
|
|
230
|
+
```bash
|
|
231
|
+
npm run check-llm
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
The script checks that `agentTester.openAi.apiKey` is set and makes a minimal `chat.completions.create` request (`max_tokens: 1`) to verify that the key works. It prints `OK` on success and `FAIL: <reason>` on failure; the exit code is `0` on success and `1` on failure.
|
|
235
|
+
|
|
236
|
+
The check targets `gpt-4o-mini` unless told otherwise. For custom OpenAI-compatible endpoints where that model doesn't exist, pass the model name:
|
|
237
|
+
|
|
238
|
+
```bash
|
|
239
|
+
npm run check-llm -- my-custom-model
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
**For Claude Code**: When the development prompt or instructions mention testing with the Headless API or Agent Tester, run `npm run check-llm` before starting any Agent Tester work. If the script exits with a non-zero code, inform the developer about the issue and ask them to fix the configuration before proceeding.
|
|
243
|
+
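A wrapper script might interpret the result of `npm run check-llm` along these lines. The helper name `interpretCheckLlm` is hypothetical; it relies only on the exit code and the `FAIL: <reason>` output format described above.

```javascript
// Interpret the exit code and output of `npm run check-llm`.
function interpretCheckLlm(exitCode, stdout) {
  if (exitCode === 0) return { ready: true, reason: null };
  // Take the rest of the line after "FAIL:" as the reason, if present.
  const match = /FAIL:\s*(.*)/.exec(stdout || '');
  return { ready: false, reason: match ? match[1].trim() : 'unknown failure' };
}

// Usage (inside a project generated from the template):
// const { spawnSync } = require('node:child_process');
// const run = spawnSync('npm', ['run', 'check-llm'], { encoding: 'utf8', shell: true });
// console.log(interpretCheckLlm(run.status, run.stdout));
```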
|
|
244
|
+
## Headless API Reference
|
|
245
|
+
|
|
246
|
+
### Connection Verification
|
|
247
|
+
|
|
248
|
+
```
|
|
249
|
+
GET /agent-tester/api/mcp/status
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
Returns connection state and all available tools without going through the UI:
|
|
253
|
+
|
|
254
|
+
```json
|
|
255
|
+
{
|
|
256
|
+
"connected": true,
|
|
257
|
+
"servers": [
|
|
258
|
+
{
|
|
259
|
+
"name": "localhost9876",
|
|
260
|
+
"url": "http://localhost:9876/mcp",
|
|
261
|
+
"transport": "http",
|
|
262
|
+
"tools": [
|
|
263
|
+
{ "name": "get_currency_rate", "description": "Get current cross-rate...", "inputSchema": {} }
|
|
264
|
+
],
|
|
265
|
+
"toolCount": 1
|
|
266
|
+
}
|
|
267
|
+
],
|
|
268
|
+
"totalTools": 1
|
|
269
|
+
}
|
|
270
|
+
```
|
|
271
|
+
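A test script can reduce this response to the facts it usually needs: whether the server is connected and which tools are visible. The sketch below uses the sample response shown above; `summarizeStatus` is a hypothetical helper name.

```javascript
// Collect tool names across servers and cross-check the reported total.
function summarizeStatus(status) {
  const servers = status.servers || [];
  const toolNames = servers.flatMap((s) => (s.tools || []).map((t) => t.name));
  return {
    connected: status.connected === true,
    toolNames,
    countsMatch: toolNames.length === status.totalTools,
  };
}

// Sample taken from the response above.
const sampleStatus = {
  connected: true,
  servers: [
    {
      name: 'localhost9876',
      url: 'http://localhost:9876/mcp',
      transport: 'http',
      tools: [{ name: 'get_currency_rate', description: 'Get current cross-rate...', inputSchema: {} }],
      toolCount: 1,
    },
  ],
  totalTools: 1,
};

console.log(summarizeStatus(sampleStatus));
```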
|
|
272
|
+
### Headless Chat Test
|
|
273
|
+
|
|
274
|
+
```
|
|
275
|
+
POST /agent-tester/api/chat/test
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
Same request body as `POST /api/chat/message`, but returns a **structured trace** of all intermediate steps.
|
|
279
|
+
|
|
280
|
+
#### Request Body
|
|
281
|
+
|
|
282
|
+
```json
|
|
283
|
+
{
|
|
284
|
+
"message": "What is the exchange rate of EUR to USD?",
|
|
285
|
+
"mcpConfig": {
|
|
286
|
+
"url": "http://localhost:9876/mcp",
|
|
287
|
+
"transport": "http",
|
|
288
|
+
"headers": { "Authorization": "Bearer <token>" }
|
|
289
|
+
},
|
|
290
|
+
"sessionId": "optional-session-id",
|
|
291
|
+
"agentPrompt": "optional agent prompt override",
|
|
292
|
+
"customPrompt": "optional additional instructions appended after agentPrompt",
|
|
293
|
+
"modelConfig": {
|
|
294
|
+
"model": "gpt-4o",
|
|
295
|
+
"temperature": 0.3,
|
|
296
|
+
"maxTokens": 4096,
|
|
297
|
+
"maxTurns": 10
|
|
298
|
+
}
|
|
299
|
+
}
|
|
300
|
+
```
|
|
301
|
+
|
|
302
|
+
Only `message` is strictly required. Include `mcpConfig` whenever the agent should be able to call tools.
|
|
303
|
+
|
|
304
|
+
| Field | Required | Description |
|
|
305
|
+
|-------|----------|-------------|
|
|
306
|
+
| `message` | yes | User message to send to the agent |
|
|
307
|
+
| `mcpConfig` | no | MCP server connection config (required for tool calls) |
|
|
308
|
+
| `sessionId` | no | Session ID for multi-turn conversations; omit to start fresh |
|
|
309
|
+
| `agentPrompt` | no | Agent prompt to send to the LLM as the system prompt. When provided, **replaces** the MCP server's `agent_prompt`. When omitted, the MCP server's `agent_prompt` is used (if available), otherwise a built-in default |
|
|
310
|
+
| `customPrompt` | no | Additional instructions appended after `agentPrompt`. Use for per-request modifiers without replacing the main prompt |
|
|
311
|
+
| `modelConfig` | no | LLM model settings (model name, temperature, maxTokens, maxTurns) |
|
|
312
|
+
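The field rules in the table can be applied when constructing a request body in code. This is a minimal sketch with a hypothetical helper name (`buildTestBody`); it enforces only what the table states: `message` is required and every other field is optional.

```javascript
const OPTIONAL_FIELDS = ['mcpConfig', 'sessionId', 'agentPrompt', 'customPrompt', 'modelConfig'];

// Build a request body, rejecting a missing message and dropping unset fields.
function buildTestBody(fields) {
  if (typeof fields.message !== 'string' || fields.message.length === 0) {
    throw new Error('message is required');
  }
  const body = { message: fields.message };
  for (const key of OPTIONAL_FIELDS) {
    if (fields[key] !== undefined) body[key] = fields[key];
  }
  return body;
}

console.log(
  JSON.stringify(
    buildTestBody({
      message: 'What is the exchange rate of EUR to USD?',
      mcpConfig: { url: 'http://localhost:9876/mcp', transport: 'http' },
      modelConfig: { model: 'gpt-4o', temperature: 0.3 },
    })
  )
);
```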
|
|
313
|
+
#### Brief Response (default)
|
|
314
|
+
|
|
315
|
+
```json
|
|
316
|
+
{
|
|
317
|
+
"message": "The EUR/USD rate is 1.0847",
|
|
318
|
+
"sessionId": "abc-123",
|
|
319
|
+
"trace": {
|
|
320
|
+
"system_prompt_sent": "You are a currency assistant...\n\nBe concise.",
|
|
321
|
+
"turns": [
|
|
322
|
+
{
|
|
323
|
+
"turn": 1,
|
|
324
|
+
"tool_calls": [
|
|
325
|
+
{ "name": "get_currency_rate", "arguments": { "quoteCurrency": "EUR", "baseCurrency": "USD" } }
|
|
326
|
+
],
|
|
327
|
+
"tool_results": [
|
|
328
|
+
{ "name": "get_currency_rate", "result": { "symbol": "EURUSD", "rate": 1.0847 }, "duration_ms": 230 }
|
|
329
|
+
]
|
|
330
|
+
}
|
|
331
|
+
],
|
|
332
|
+
"total_turns": 2,
|
|
333
|
+
"total_duration_ms": 1850,
|
|
334
|
+
"tools_used": ["get_currency_rate"]
|
|
335
|
+
}
|
|
336
|
+
}
|
|
337
|
+
```
|
|
338
|
+
|
|
339
|
+
The `system_prompt_sent` field contains the **final system prompt** that was sent to the LLM. Use it to verify exactly what the LLM received — especially when iterating on agent prompt variations.
|
|
340
|
+
|
|
341
|
+
Brief mode shows the tool interaction chain: which tools were called, with what arguments, and what they returned. It omits LLM internals such as per-turn requests, finish reasons, and token usage.
|
|
342
|
+
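A brief trace lends itself to simple automated assertions. The sketch below checks whether a given tool was called with the expected arguments, using the sample response shown above; `toolWasCalledWith` is a hypothetical helper and compares argument values shallowly.

```javascript
// True when the brief trace contains a call to toolName whose arguments
// include every key/value pair in expectedArgs (shallow comparison).
function toolWasCalledWith(response, toolName, expectedArgs = {}) {
  const turns = (response.trace && response.trace.turns) || [];
  return turns.some((turn) =>
    (turn.tool_calls || []).some(
      (call) =>
        call.name === toolName &&
        Object.entries(expectedArgs).every(
          ([key, value]) => call.arguments && call.arguments[key] === value
        )
    )
  );
}

// Trimmed version of the brief response above.
const sampleResponse = {
  message: 'The EUR/USD rate is 1.0847',
  trace: {
    turns: [
      {
        turn: 1,
        tool_calls: [
          { name: 'get_currency_rate', arguments: { quoteCurrency: 'EUR', baseCurrency: 'USD' } },
        ],
      },
    ],
    tools_used: ['get_currency_rate'],
  },
};

console.log(toolWasCalledWith(sampleResponse, 'get_currency_rate', { quoteCurrency: 'EUR' }));
```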
|
|
343
|
+
#### Verbose Response
|
|
344
|
+
|
|
345
|
+
```
|
|
346
|
+
POST /agent-tester/api/chat/test?verbose=true
|
|
347
|
+
```
|
|
348
|
+
|
|
349
|
+
Adds per-turn LLM request/response details:
|
|
350
|
+
|
|
351
|
+
```json
|
|
352
|
+
{
|
|
353
|
+
"turns": [
|
|
354
|
+
{
|
|
355
|
+
"turn": 1,
|
|
356
|
+
"llm_request": { "model": "gpt-4o", "messages_count": 3 },
|
|
357
|
+
"llm_response": {
|
|
358
|
+
"finish_reason": "tool_calls",
|
|
359
|
+
"content": null,
|
|
360
|
+
"usage": { "prompt_tokens": 450, "completion_tokens": 32, "total_tokens": 482 }
|
|
361
|
+
},
|
|
362
|
+
"tool_calls": [...],
|
|
363
|
+
"tool_results": [...]
|
|
364
|
+
}
|
|
365
|
+
]
|
|
366
|
+
}
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
Use verbose mode when:
|
|
370
|
+
- The agent doesn't call the expected tool and the brief trace doesn't explain why
|
|
371
|
+
- The agent loops without resolving (check `finish_reason`)
|
|
372
|
+
- Token usage is unexpectedly high
|
|
373
|
+
- The response is empty or unexpected
|
|
374
|
+
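For the looping and token-usage cases, the per-turn verbose fields can be aggregated programmatically. The sketch below takes a `turns` array shaped like the verbose example above; `summarizeVerboseTurns` is a hypothetical helper name.

```javascript
// Sum token usage and list finish reasons across verbose turns.
function summarizeVerboseTurns(turns) {
  let totalTokens = 0;
  const finishReasons = [];
  for (const turn of turns) {
    const res = turn.llm_response || {};
    finishReasons.push(res.finish_reason ?? null);
    totalTokens += (res.usage && res.usage.total_tokens) || 0;
  }
  return { totalTokens, finishReasons };
}

// Single turn taken from the verbose example above.
const verboseTurns = [
  {
    turn: 1,
    llm_request: { model: 'gpt-4o', messages_count: 3 },
    llm_response: {
      finish_reason: 'tool_calls',
      content: null,
      usage: { prompt_tokens: 450, completion_tokens: 32, total_tokens: 482 },
    },
  },
];

console.log(summarizeVerboseTurns(verboseTurns));
```

A long run of `tool_calls` finish reasons with no final `stop` is one sign that the agent is looping.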
|
|
375
|
+
#### Size Limit Overrides
|
|
376
|
+
|
|
377
|
+
```
|
|
378
|
+
POST /agent-tester/api/chat/test?maxResultChars=8000&maxTraceChars=100000
|
|
379
|
+
```
|
|
380
|
+
|
|
381
|
+
| Parameter | Default | Description |
|
|
382
|
+
|-----------|---------|-------------|
|
|
383
|
+
| `maxResultChars` | 4000 | Max characters per tool result in trace |
|
|
384
|
+
| `maxTraceChars` | 50000 | Max total trace size; older turns are collapsed to summaries when exceeded |
|
|
385
|
+
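When building the endpoint URL in code, the query parameters documented above (`verbose`, `maxResultChars`, `maxTraceChars`) can be attached with the standard `URL` API. The helper name `chatTestUrl` is hypothetical.

```javascript
// Build the chat test URL, adding only the query parameters that were set.
function chatTestUrl(baseUrl, options = {}) {
  const url = new URL('/agent-tester/api/chat/test', baseUrl);
  for (const key of ['verbose', 'maxResultChars', 'maxTraceChars']) {
    if (options[key] !== undefined) url.searchParams.set(key, String(options[key]));
  }
  return url.toString();
}

console.log(chatTestUrl('http://localhost:9876', { maxResultChars: 8000, maxTraceChars: 100000 }));
```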
|
|
386
|
+
### Prompt Assembly
|
|
387
|
+
|
|
388
|
+
The system prompt sent to the LLM is resolved by priority — the first available value wins:
|
|
389
|
+
|
|
390
|
+
```
|
|
391
|
+
request.agentPrompt → session.agentPrompt → MCP server's agent_prompt → built-in default
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
If `customPrompt` is provided, it is appended after the resolved prompt.
|
|
395
|
+
|
|
396
|
+
The final result is sent as `{ role: "system" }` to the LLM and returned in the trace as `system_prompt_sent`.
|
|
397
|
+
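The priority chain can be illustrated with a short function. This is a sketch under two stated assumptions, not the server's actual code: empty strings are treated as "not available", and `customPrompt` is joined with a blank line, as the sample `system_prompt_sent` value earlier suggests. All parameter names are hypothetical.

```javascript
// Resolve the system prompt: the first available value wins,
// then customPrompt (if any) is appended.
function resolveSystemPrompt({ requestPrompt, sessionPrompt, serverPrompt, builtInDefault, customPrompt }) {
  const candidates = [requestPrompt, sessionPrompt, serverPrompt, builtInDefault];
  const base = candidates.find((p) => typeof p === 'string' && p.length > 0) || '';
  // Assumption: a blank line separates the two parts.
  return customPrompt ? `${base}\n\n${customPrompt}` : base;
}

console.log(
  resolveSystemPrompt({
    serverPrompt: 'You are a currency assistant...',
    builtInDefault: 'You are a helpful assistant.',
    customPrompt: 'Be concise.',
  })
);
```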
|
|
398
|
+
**Key principle:** when `agentPrompt` is passed in the request, it **replaces** the MCP server's `agent_prompt` entirely. This enables the iterative prompt refinement workflow:
|
|
399
|
+
|
|
400
|
+
1. Read the current `AGENT_PROMPT` from `src/prompts/agent-prompt.ts`
|
|
401
|
+
2. Send it as `agentPrompt` in the headless request
|
|
402
|
+
3. Evaluate the agent's response and trace
|
|
403
|
+
4. Modify the prompt, send again
|
|
404
|
+
5. When satisfied, write the best variant back to `src/prompts/agent-prompt.ts`
|
|
405
|
+
|
|
406
|
+
```bash
|
|
407
|
+
# Test current prompt
|
|
408
|
+
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
|
|
409
|
+
-H "Content-Type: application/json" \
|
|
410
|
+
-d '{"message":"Get EUR/USD rate","agentPrompt":"You are a concise currency assistant. Use tools, reply in one sentence.","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
|
|
411
|
+
|
|
412
|
+
# Try a different variation
|
|
413
|
+
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
|
|
414
|
+
-H "Content-Type: application/json" \
|
|
415
|
+
-d '{"message":"Get EUR/USD rate","agentPrompt":"You are a financial analyst. Explain rates with market context and trends.","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
Compare `system_prompt_sent` and the agent responses between variations to find the best prompt. When `agentPrompt` is omitted, the MCP server's own `agent_prompt` is used automatically, which tests the currently deployed prompt as-is.
|
|
419
|
+
|
|
420
|
+
### Sessions
|
|
421
|
+
|
|
422
|
+
The headless API shares sessions with the chat UI. To start a fresh conversation, omit `sessionId`. To continue an existing conversation, pass `sessionId` from a previous response.
|
|
423
|
+
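In a multi-turn script, carrying the session forward amounts to copying `sessionId` from the previous response into the next request body. A minimal sketch, with a hypothetical helper name (`followUpBody`):

```javascript
// Build the next request body, reusing the session from the previous response.
// Passing null as previousResponse starts a fresh conversation.
function followUpBody(previousResponse, message, mcpConfig) {
  const body = { message, mcpConfig };
  if (previousResponse && previousResponse.sessionId) {
    body.sessionId = previousResponse.sessionId;
  }
  return body;
}

const mcpConfig = { url: 'http://localhost:9876/mcp', transport: 'http' };
const first = followUpBody(null, 'Get EUR/USD rate', mcpConfig);
const second = followUpBody({ sessionId: 'abc-123' }, 'And GBP/USD?', mcpConfig);
console.log(first.sessionId, second.sessionId);
```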
|
|
424
|
+
## Structured JSON Logging (`agentTester.logJson`)
|
|
425
|
+
|
|
426
|
+
When `agentTester.logJson` is `true`, each agent event is emitted as a single-line JSON object on stdout — useful for real-time monitoring, debugging, and log aggregation.
|
|
427
|
+
|
|
428
|
+
Enable via config, CLI flag, or environment variable:
|
|
429
|
+
|
|
430
|
+
```yaml
|
|
431
|
+
# config/local.yaml
|
|
432
|
+
agentTester:
|
|
433
|
+
logJson: true
|
|
434
|
+
```
|
|
435
|
+
|
|
436
|
+
```bash
|
|
437
|
+
npm start -- --log-json
|
|
438
|
+
# or
|
|
439
|
+
AGENT_TESTER_LOG_JSON=true npm start
|
|
440
|
+
```
|
|
441
|
+
|
|
442
|
+
Event types emitted:
|
|
443
|
+
|
|
444
|
+
```
|
|
445
|
+
{"event":"tool_call","name":"get_currency_rate","arguments":{"quoteCurrency":"EUR"},"timestamp":"2025-08-15T14:32:00.000Z"}
|
|
446
|
+
{"event":"tool_result","name":"get_currency_rate","result":{"rate":1.0847},"duration_ms":230,"timestamp":"2025-08-15T14:32:00.230Z"}
|
|
447
|
+
{"event":"llm_response","turn":2,"finish_reason":"stop","tool_calls":[],"has_content":true,"timestamp":"2025-08-15T14:32:01.500Z"}
|
|
448
|
+
{"event":"response","message":"The EUR/USD rate is 1.0847","tools_used":["get_currency_rate"],"duration_ms":1850}
|
|
449
|
+
```
|
|
450
|
+
|
|
451
|
+
**Default mode** (without `--log-json`) keeps the colored text logs for human debugging. The flag affects only agent tester events — other server logs (startup, auth, MCP protocol) continue in their normal format.
|
|
452
|
+
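Because other server logs keep their normal text format, a consumer of stdout has to separate JSON event lines from everything else. One possible approach is sketched below; `parseAgentEvents` is a hypothetical helper, and the non-JSON line in the sample is an invented stand-in for ordinary log output.

```javascript
// Extract agent events from mixed stdout: keep lines that parse as JSON
// objects with a string "event" field, skip everything else.
function parseAgentEvents(stdoutText) {
  const events = [];
  for (const line of stdoutText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{')) continue;
    try {
      const parsed = JSON.parse(trimmed);
      if (parsed && typeof parsed.event === 'string') events.push(parsed);
    } catch {
      // Not valid JSON: treat as an ordinary log line and skip it.
    }
  }
  return events;
}

const sampleOutput = [
  'Server started (ordinary text log line, invented for this example)',
  '{"event":"tool_call","name":"get_currency_rate","arguments":{"quoteCurrency":"EUR"},"timestamp":"2025-08-15T14:32:00.000Z"}',
  '{"event":"tool_result","name":"get_currency_rate","result":{"rate":1.0847},"duration_ms":230,"timestamp":"2025-08-15T14:32:00.230Z"}',
].join('\n');

console.log(parseAgentEvents(sampleOutput).map((e) => e.event));
```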
|
|
453
|
+
## Automated Testing with Claude Code
|
|
454
|
+
|
|
455
|
+
The Headless API is designed for CLI automation tools like Claude Code. The typical automated testing workflow:
|
|
456
|
+
|
|
457
|
+
0. Verify LLM availability: `npm run check-llm` (exit 0 = ready, non-zero = fix config first)
|
|
458
|
+
1. Build and start the server: `npm run cb && npm start`
|
|
459
|
+
2. Verify tools: `GET /agent-tester/api/mcp/status`
|
|
460
|
+
3. Send test messages: `POST /agent-tester/api/chat/test`
|
|
461
|
+
4. Analyze trace: correct tool? correct args? expected result?
|
|
462
|
+
5. If unclear: retry with `?verbose=true`
|
|
463
|
+
6. If issue found: fix code, rebuild, restart, re-test
|
|
464
|
+
7. Maintain a testing log at `claudedocs/test-log.md`
|
|
465
|
+
|
|
466
|
+
### Brief vs Verbose Strategy
|
|
467
|
+
|
|
468
|
+
**Default to brief mode.** The brief trace covers most debugging scenarios:
|
|
469
|
+
- Was the correct tool called?
|
|
470
|
+
- Were the arguments correct?
|
|
471
|
+
- Did the tool return the expected result?
|
|
472
|
+
- How many turns did the agent take?
|
|
473
|
+
|
|
474
|
+
**Switch to verbose** only when the brief trace doesn't explain the behavior:
|
|
475
|
+
- Tool was never called (check `finish_reason` — was it `stop` instead of `tool_calls`?)
|
|
476
|
+
- Wrong tool was called (check if the tool description is ambiguous)
|
|
477
|
+
- Agent loops (check per-turn `finish_reason` and token usage)
|
|
478
|
+
- Empty response (check if `content` is null across all turns)
|
|
479
|
+
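The escalation rule can be approximated in a test harness. The sketch below covers only two of the cases listed above (expected tool never called, empty response) and uses a hypothetical helper name, `needsVerbose`; the other cases still need human judgment.

```javascript
// Decide whether to re-run a test with ?verbose=true based on the brief response.
function needsVerbose(response, expectedTool) {
  const toolsUsed = (response.trace && response.trace.tools_used) || [];
  const emptyMessage = !response.message || response.message.trim() === '';
  return emptyMessage || !toolsUsed.includes(expectedTool);
}

const briefOk = {
  message: 'The EUR/USD rate is 1.0847',
  trace: { tools_used: ['get_currency_rate'] },
};
const briefNoTool = { message: 'I cannot look that up.', trace: { tools_used: [] } };

console.log(needsVerbose(briefOk, 'get_currency_rate'), needsVerbose(briefNoTool, 'get_currency_rate'));
```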
|
|
480
|
+
## Agent Tester Chat UI
|
|
481
|
+
|
|
482
|
+
The Agent Tester also provides a web UI at `/agent-tester` for interactive testing. The UI auto-connects to the local MCP server and auto-fills auth headers if configured.
|
|
483
|
+
|
|
484
|
+
The chat UI uses `POST /api/chat/message`, which returns only the final response. The headless API uses `POST /api/chat/test`, which returns the response plus full trace data. Both paths are relative to `/agent-tester`, and both endpoints share the same underlying agent logic and session storage.
|
|
485
|
+
|
|
486
|
+
## UI Test Selectors (`data-testid`)
|
|
487
|
+
|
|
488
|
+
For UI automation (Playwright, Cypress, Selenium) the Agent Tester page is annotated with stable `data-testid` attributes. Prefer these over CSS classes, DOM IDs, or label text — they are the documented contract and won't change with styling or copy edits.
|
|
489
|
+
|
|
490
|
+
### Naming Convention
|
|
491
|
+
|
|
492
|
+
All selectors use the `at-` prefix (short for "agent tester") in kebab-case:
|
|
493
|
+
|
|
494
|
+
```
|
|
495
|
+
at-<area>-<element>[-<modifier>]
|
|
496
|
+
```
|
|
497
|
+
|
|
498
|
+
Examples: `at-auth-token-input`, `at-server-url`, `at-message-user`, `at-toast-success`.
|
|
499
|
+
|
|
500
|
+
Dynamic elements that map 1:1 to runtime data append the runtime key:
|
|
501
|
+
|
|
502
|
+
```
|
|
503
|
+
at-header-row-<headerName> e.g. at-header-row-Authorization
|
|
504
|
+
at-header-input-<headerName> e.g. at-header-input-X-Session-Id
|
|
505
|
+
at-message-<sender> e.g. at-message-user, at-message-assistant
|
|
506
|
+
at-toast-<type> e.g. at-toast-success, at-toast-error
|
|
507
|
+
```
|
|
508
|
+
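Test suites can generate these ids from the naming convention instead of hard-coding strings. A minimal sketch with hypothetical helper names (`testId`, `testIdSelector`):

```javascript
// Compose an id following at-<area>-<element>[-<modifier>].
function testId(area, element, modifier) {
  return ['at', area, element, modifier].filter(Boolean).join('-');
}

// CSS attribute selector for a given test id.
function testIdSelector(id) {
  return `[data-testid="${id}"]`;
}

console.log(testId('auth', 'token', 'input'));
console.log(testIdSelector(testId('header', 'row', 'Authorization')));
```

The runtime key (for example a header name) is passed through unchanged, so it keeps its original casing.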
|
|
509
|
+
### Selector Reference
|
|
510
|
+
|
|
511
|
+
**Auth overlay (shown when `agentTester.useAuth: true`)**
|
|
512
|
+
|
|
513
|
+
| testid | Element |
|
|
514
|
+
|---|---|
|
|
515
|
+
| `at-auth-overlay` | Root login overlay container |
|
|
516
|
+
| `at-auth-tabs` | Tab switcher (only rendered when multiple methods configured) |
|
|
517
|
+
| `at-auth-tab-token` | "Token" tab button |
|
|
518
|
+
| `at-auth-tab-basic` | "Login" tab button |
|
|
519
|
+
| `at-auth-token-form` | Token login form |
|
|
520
|
+
| `at-auth-token-input` | Token input field |
|
|
521
|
+
| `at-auth-token-submit` | Token submit button |
|
|
522
|
+
| `at-auth-basic-form` | Basic auth form |
|
|
523
|
+
| `at-auth-username` | Username input |
|
|
524
|
+
| `at-auth-password` | Password input |
|
|
525
|
+
| `at-auth-basic-submit` | Basic submit button |
|
|
526
|
+
| `at-auth-error` | Error message container |
|
|
527
|
+
|
|
528
|
+
**App shell**
|
|
529
|
+
|
|
530
|
+
| testid | Element |
|
|
531
|
+
|---|---|
|
|
532
|
+
| `at-app` | Root app container (hidden until authenticated) |
|
|
533
|
+
| `at-sidebar` | Sidebar (configuration panel) |
|
|
534
|
+
| `at-main` | Main chat area |
|
|
535
|
+
| `at-chat-header` | Chat header bar |
|
|
536
|
+
|
|
537
|
+
**Sidebar — connection form**
|
|
538
|
+
|
|
539
|
+
| testid | Element |
|
|
540
|
+
|---|---|
|
|
541
|
+
| `at-connection-form` | MCP connection form |
|
|
542
|
+
| `at-server-url` | MCP server URL input |
|
|
543
|
+
| `at-server-url-dropdown` | Saved URLs dropdown toggle |
|
|
544
|
+
| `at-server-url-dropdown-list` | Saved URLs dropdown panel |
|
|
545
|
+
| `at-server-url-add-new` | "Add new URL" menu item |
|
|
546
|
+
| `at-saved-urls-list` | Container for saved URL items |
|
|
547
|
+
| `at-saved-url-item` | Each saved URL row (dynamic) |
|
|
548
|
+
| `at-saved-url-text` | Clickable URL text within a row |
|
|
549
|
+
| `at-saved-url-delete` | Delete button for a saved URL |
|
|
550
|
+
| `at-transport` | Transport `<select>` (http / sse) |
|
|
551
|
+
| `at-connect-btn` | Connect button |
|
|
552
|
+
| `at-connected-servers` | Connection status bar container |
|
|
553
|
+
| `at-server-status-row` | Status row (dynamic, rendered after connect attempt) |
|
|
554
|
+
| `at-server-status-connected` | "X tools connected" badge |
|
|
555
|
+
| `at-server-status-disconnected` | "Disconnected" badge |
|
|
556
|
+
| `at-disconnect-btn` | Disconnect button |
|
|
557
|
+
| `at-reconnect-btn` | Reconnect button |
|
|
558
|
+
|
|
559
|
+
**Sidebar — HTTP headers section**
|
|
560
|
+
|
|
561
|
+
| testid | Element |
|
|
562
|
+
|---|---|
|
|
563
|
+
| `at-headers-section` | Headers section container |
|
|
564
|
+
| `at-dynamic-headers` | Headers list container |
|
|
565
|
+
| `at-header-row-<name>` | Row for a specific header (e.g. `at-header-row-Authorization`) |
|
|
566
|
+
| `at-header-input-<name>` | Input for a specific header value |
|
|
567
|
+
|
|
568
|
+
**Sidebar — LLM settings**
|
|
569
|
+
|
|
570
|
+
The sidebar shows only the current model name (read-only) and a gear button. All LLM parameters (Base URL, API Key, Model Name, Temperature, Max Tokens, Max Turns, Limit (chars)) are edited in the LLM Settings modal opened via that button. Settings are persisted in `localStorage['mcpAgentLlmSettings']`. If `agentTester.openAi.exposeToClient` is `true` in config, the server returns `baseURL` and `apiKey` in the `llmDefaults` field of `GET /agent-tester/api/config`, and the UI pre-fills them into localStorage on first open. Security note: enable `exposeToClient` only when the tester is protected by `useAuth: true` or deployed in a trusted network. When the effective `apiKey` is empty, a red "API Key is not set" warning is shown below the model name.
|
|
571
|
+
|
|
572
|
+
| testid | Element |
|
|
573
|
+
|---|---|
|
|
574
|
+
| `at-model-section` | Model section container |
|
|
575
|
+
| `at-model-display` | Read-only current model name |
|
|
576
|
+
| `at-llm-settings-btn` | Gear button that opens the LLM Settings modal |
|
|
577
|
+
| `at-api-key-warning` | "API Key is not set" warning (visible only when `apiKey` is empty) |
|
|
578
|
+
| `at-llm-modal` | LLM Settings modal overlay |
|
|
579
|
+
| `at-llm-modal-close` | Modal close (×) button |
|
|
580
|
+
| `at-llm-modal-cancel` | Modal Cancel button |
|
|
581
|
+
| `at-llm-modal-save` | Modal Save button |
|
|
582
|
+
| `at-llm-base-url` | Base URL input (optional — empty means OpenAI default) |
|
|
583
|
+
| `at-llm-api-key` | API Key input (password field) |
|
|
584
|
+
| `at-llm-api-key-toggle` | Show/hide API key visibility toggle |
|
|
585
|
+
| `at-llm-model-name` | Model Name input (editable combobox) |
|
|
586
|
+
| `at-llm-model-dropdown-toggle` | Model dropdown arrow button |
|
|
587
|
+
| `at-llm-model-dropdown-list` | Model dropdown list (preset models) |
|
|
588
|
+
| `at-llm-model-option-<name>` | Individual model option inside the list |
|
|
589
|
+
| `at-llm-temperature` | Temperature input |
|
|
590
|
+
| `at-llm-max-tokens` | Max tokens input |
|
|
591
|
+
| `at-llm-max-turns` | Max turns input |
|
|
592
|
+
| `at-llm-limit-chars` | Tool result char limit input |
|
|
593
|
+
|
|
594
|
+
**Sidebar — prompts**
|
|
595
|
+
|
|
596
|
+
| testid | Element |
|
|
597
|
+
|---|---|
|
|
598
|
+
| `at-system-prompt` | Agent (system) prompt `<textarea>` |
|
|
599
|
+
| `at-system-prompt-enlarge` | Enlarge button for agent prompt |
|
|
600
|
+
| `at-custom-prompt` | Custom prompt `<textarea>` |
|
|
601
|
+
| `at-custom-prompt-enlarge` | Enlarge button for custom prompt |
|
|
602
|
+
|
|
603
|
+
**Chat header**
|
|
604
|
+
|
|
605
|
+
| testid | Element |
|
|
606
|
+
|---|---|
|
|
607
|
+
| `at-sidebar-toggle-mobile` | Mobile sidebar toggle |
|
|
608
|
+
| `at-default-format` | Default display format `<select>` (HTML / MD) |
|
|
609
|
+
| `at-theme-toggle` | Light/dark theme toggle |
|
|
610
|
+
| `at-clear-chat` | Clear chat button |
|
|
611
|
+
| `at-logout-btn` | Logout button (visible only when `useAuth` is true) |
|
|
612
|
+
|
|
613
|
+
**Chat area**
|
|
614
|
+
|
|
615
|
+
| testid | Element |
|
|
616
|
+
|---|---|
|
|
617
|
+
| `at-chat-messages` | Messages scroll container |
|
|
618
|
+
| `at-welcome-message` | Initial welcome card |
|
|
619
|
+
| `at-message-user` | User message bubble (one per message) |
|
|
620
|
+
| `at-message-assistant` | Assistant message bubble |
|
|
621
|
+
| `at-message-text-user` | Inner text element of a user message |
|
|
622
|
+
| `at-message-text-assistant` | Inner text element of an assistant message |
|
|
623
|
+
| `at-message-format-toggle` | HTML/MD format toggle on an assistant message |
|
|
624
|
+
| `at-typing-indicator` | Typing indicator (shown during LLM response) |
|
|
625
|
+
| `at-message-input` | Chat input `<textarea>` |
|
|
626
|
+
| `at-char-count` | Character counter span |
|
|
627
|
+
| `at-send-btn` | Send button |
|
|
628
|
+
|
|
629
|
+
**Modals and overlays**
|
|
630
|
+
|
|
631
|
+
| testid | Element |
|
|
632
|
+
|---|---|
|
|
633
|
+
| `at-prompt-modal` | Prompt enlarge modal overlay |
|
|
634
|
+
| `at-prompt-modal-title` | Modal title |
|
|
635
|
+
| `at-prompt-modal-textarea` | Modal text area |
|
|
636
|
+
| `at-prompt-modal-save` | Apply button |
|
|
637
|
+
| `at-prompt-modal-close` | Close button |
|
|
638
|
+
| `at-loading-overlay` | Global loading overlay |
|
|
639
|
+
| `at-header-tooltip` | Floating header description tooltip |
|
|
640
|
+
| `at-toast-container` | Toast notifications container |
|
|
641
|
+
| `at-toast-success` / `at-toast-error` / `at-toast-warning` / `at-toast-info` | Individual toast (dynamic) |
|
|
642
|
+
|
|
643
|
+
### Usage Examples
|
|
644
|
+
|
|
645
|
+
**Playwright**
|
|
646
|
+
|
|
647
|
+
```js
|
|
648
|
+
await page.goto('http://localhost:9876/agent-tester');
|
|
649
|
+
|
|
650
|
+
// Login when useAuth is enabled
|
|
651
|
+
await page.getByTestId('at-auth-token-input').fill(process.env.MCP_TOKEN);
|
|
652
|
+
await page.getByTestId('at-auth-token-submit').click();
|
|
653
|
+
|
|
654
|
+
// Wait for main app
|
|
655
|
+
await page.getByTestId('at-app').waitFor();
|
|
656
|
+
|
|
657
|
+
// Send a chat message
|
|
658
|
+
await page.getByTestId('at-message-input').fill('List all tools');
|
|
659
|
+
await page.getByTestId('at-send-btn').click();
|
|
660
|
+
|
|
661
|
+
// Assert an assistant reply appeared
|
|
662
|
+
await page.getByTestId('at-message-assistant').first().waitFor();
|
|
663
|
+
```
|
|
664
|
+
|
|
665
|
+
**Cypress**
|
|
666
|
+
|
|
667
|
+
```js
|
|
668
|
+
cy.visit('/agent-tester');
|
|
669
|
+
cy.get('[data-testid=at-auth-token-input]').type(Cypress.env('MCP_TOKEN'));
|
|
670
|
+
cy.get('[data-testid=at-auth-token-submit]').click();
|
|
671
|
+
cy.get('[data-testid=at-server-status-connected]').should('be.visible');
|
|
672
|
+
```
|
|
673
|
+
|
|
674
|
+
### Stability Guarantee
|
|
675
|
+
|
|
676
|
+
These test-ids are part of the public contract of the Agent Tester UI. Once added, a given id is not renamed or removed without a changelog entry. New elements are added with new ids as the UI grows. When authoring tests, prefer `data-testid` over:
|
|
677
|
+
|
|
678
|
+
- DOM `id` (may be shared with form `<label for>` pairs and collide across scopes)
|
|
679
|
+
- CSS class names (used for styling — may be renamed or removed during refactors)
|
|
680
|
+
- Visible text (localized / editable copy — changes break tests)
|
|
681
|
+
- XPath or positional selectors (brittle to layout changes)
|