fa-mcp-sdk 0.4.74 → 0.4.76

# Agent Tester and Headless API

## Overview

The Agent Tester is a built-in AI agent system for developing and refining MCP server tools. It goes beyond functional testing — it validates the **full agent experience**: how the LLM interprets tool descriptions, selects tools, passes arguments, and presents results.

The Headless API provides programmatic access to the Agent Tester without a browser. It enables CLI-based automated testing and returns structured trace data for every tool call, argument, result, and LLM decision.

## Developing MCP Servers as Agents

An MCP server is not just a set of tools — it is an **agent interface**. The LLM acts as the agent, deciding which tools to call, with what arguments, and how to interpret results. This means the quality of the agent experience depends on:

- **Tool descriptions** — the LLM reads them to decide when and why to call a tool
- **Parameter schemas** — names, types, required/optional flags, and default value documentation guide the LLM's argument construction
- **Response format** — `formatToolResult()` output must be structured so the LLM can interpret and relay it to the user
- **Agent prompt** — the system prompt shapes the LLM's conversation style, tool usage logic, and error handling behavior
- **Tool decomposition** — whether one tool should be split into two, or two merged into one

All of these aspects are **invisible to unit tests**. A tool can pass all unit tests and still produce a poor agent experience because the LLM misinterprets the description, sends wrong argument types, or doesn't understand the response format.

The Agent Tester closes this gap by running the **full agent loop**: user message → LLM reasoning → tool selection → tool execution → LLM interpretation → user response.

## Three-Phase Development Workflow

### Phase 1: Initial Architecture

Design tools, prompts, parameters, and handler logic based on task requirements. Implement a first working version:

```bash
npm run cb && npm start
```

### Phase 2: Basic Functionality

Verify compilation, server startup, tool registration, and basic calls. Fix crashes, connection errors, and missing tools.

### Phase 3: Iterative Refinement

This is the key phase. Send test messages through the Agent Tester, observe the agent's behavior, diagnose issues, and refine:

```text
observe agent behavior → diagnose root cause → fix → rebuild → re-test
```

Root cause categories:
- **Tool description** — LLM picks wrong tool or misunderstands purpose
- **Parameter schema** — LLM sends wrong types or misses required params
- **Agent prompt** — LLM doesn't follow desired conversation style
- **Handler logic** — tool results confuse the LLM
- **Error messages** — failures produce unhelpful responses

## Authentication (`agentTester.useAuth`)

When `agentTester.useAuth` is `true`, the Agent Tester is protected by the full multi-auth middleware — the same authentication chain used for MCP endpoints (`permanentServerTokens` / `basic` / `jwtToken` / `custom`).

### How It Works

**Browser access:** When a user opens `/agent-tester` in a browser, the page loads normally (static assets are served without auth). The frontend checks `GET /api/auth/status` and displays a **login dialog** if the user is not authenticated. The dialog adapts to the configured auth methods:

- If `permanentServerTokens` or `jwtToken` is configured — shows a "Token" input
- If `basic` auth is configured — shows "Username" + "Password" inputs
- If both token-based and `basic` auth are configured — shows tabs to switch between methods

After successful login via `POST /api/auth/login`, the server issues an httpOnly session cookie (`__at_sid`). All subsequent API requests from the browser include this cookie automatically. The session is valid for the configured TTL (default: 8 hours — see [Session Lifetime](#session-lifetime) below). A logout button appears in the header.

**Headless / CLI access:** Headless API consumers (curl, scripts, Claude Code) bypass the login dialog entirely. They pass an `Authorization` header with each request, which is validated by the standard `authMW`. No session cookie is needed.

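A script that wants to reproduce the browser flow, instead of sending an `Authorization` header on every call, would read the status, post credentials, and then replay the `__at_sid` cookie. The sketch below is a minimal version of that flow under several assumptions. It requires Node 18+ for the global `fetch`. It assumes the auth routes are mounted under `/agent-tester` like the other API routes. It reads only the `authRequired` and `authenticated` fields of the status response. `atLogin` and `extractSessionCookie` are hypothetical helper names.

```typescript
// Pull the "__at_sid=<value>" pair out of a Set-Cookie header, dropping the
// attributes (Path, HttpOnly, Max-Age, ...) that follow the first ";".
function extractSessionCookie(setCookie: string | null): string | null {
  if (!setCookie) return null;
  const match = /(?:^|[,\s])(__at_sid=[^;,\s]+)/.exec(setCookie);
  return match ? match[1] : null;
}

type Credentials = { token: string } | { username: string; password: string };

// ASSUMPTION: auth routes live under <origin>/agent-tester/api/auth/*.
async function atLogin(origin: string, creds: Credentials): Promise<string> {
  const base = `${origin}/agent-tester/api/auth`;
  const status = (await (await fetch(`${base}/status`)).json()) as {
    authRequired?: boolean;
    authenticated?: boolean;
  };
  if (!status.authRequired || status.authenticated) return ""; // no login needed

  const res = await fetch(`${base}/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(creds), // { token } or { username, password }
  });
  const cookie = extractSessionCookie(res.headers.get("set-cookie"));
  if (!res.ok || !cookie) throw new Error(`login failed: HTTP ${res.status}`);
  return cookie; // send back as the Cookie header on later requests
}
```

Later requests would send `{ Cookie: cookie }`. For most automation, the stateless `Authorization` header is simpler: it needs no login step and is unaffected by session expiry.
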
### Configuration

```yaml
agentTester:
  useAuth: true            # Show login screen for browser, require auth for API
  sessionTtlMs: 28800000   # Browser session lifetime in ms (default: 8h)
  tokenTTLSec: 1800        # TTL of JWTs auto-issued for the chat UI / headless clients (default: 30 min)

webServer:
  auth:
    enabled: true
    permanentServerTokens: ['my-secret-token']
    # and/or basic, jwtToken — any configured method will be available
```

Environment variables:

- `AGENT_TESTER_USE_AUTH=true`
- `AGENT_TESTER_SESSION_TTL_MS=28800000`
- `AGENT_TESTER_TOKEN_TTL_SEC=1800`

When `useAuth` is `false` (default), the Agent Tester is accessible without any authentication and `sessionTtlMs` has no effect.

### Session Lifetime

When `useAuth` is `true`, a successful browser login creates a server-side session and sets an httpOnly cookie (`__at_sid`) scoped to `/agent-tester`. Both the in-memory entry and the cookie's `Max-Age` use the same TTL from `agentTester.sessionTtlMs`.

**Where sessions live**: an in-memory `Map` inside the server process (`src/core/auth/agent-tester-auth.ts`). There is no disk or Redis persistence — this is intentional because Agent Tester is a development tool, not a production auth system.

**Default TTL**: 8 hours (`28_800_000` ms). Override by setting `agentTester.sessionTtlMs` in `config/default.yaml` (or any environment-specific override file), or via `AGENT_TESTER_SESSION_TTL_MS`. Values are in milliseconds; any non-positive or non-finite value falls back to the 8h default.

**Cleanup**: a background sweep runs every 30 minutes and drops expired entries from the map. Expired entries are also evicted lazily on access.

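The lifecycle above can be sketched as a small class: one TTL for every entry, lazy eviction on lookup, a periodic sweep, and the 8h fallback for invalid values. This is a hypothetical illustration with invented names (`SessionStore`, `resolveSessionTtlMs`). It is not the code in `src/core/auth/agent-tester-auth.ts`. The injectable clock exists only to make expiry testable.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical sketch: names and structure are illustrative, not the SDK's code.
const DEFAULT_SESSION_TTL_MS = 8 * 60 * 60 * 1000; // 28_800_000 ms = 8 hours

// Documented fallback: non-positive or non-finite values mean "use 8h".
function resolveSessionTtlMs(raw: unknown): number {
  const n = typeof raw === "number" || typeof raw === "string" ? Number(raw) : NaN;
  return Number.isFinite(n) && n > 0 ? n : DEFAULT_SESSION_TTL_MS;
}

class SessionStore {
  private readonly sessions = new Map<string, number>(); // sid -> expiresAt (epoch ms)
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Login: create an entry; the same TTL would also set the cookie's Max-Age.
  create(): string {
    const sid = randomUUID();
    this.sessions.set(sid, this.now() + this.ttlMs);
    return sid;
  }

  // Lookup with lazy eviction: an expired entry is deleted when accessed.
  isValid(sid: string): boolean {
    const expiresAt = this.sessions.get(sid);
    if (expiresAt === undefined) return false; // unknown sid, e.g. after a restart
    if (expiresAt <= this.now()) {
      this.sessions.delete(sid);
      return false;
    }
    return true;
  }

  // Logout.
  destroy(sid: string): void {
    this.sessions.delete(sid);
  }

  // Background sweep: drop all expired entries and report how many were removed.
  sweep(): number {
    const t = this.now();
    let removed = 0;
    for (const [sid, expiresAt] of this.sessions) {
      if (expiresAt <= t) {
        this.sessions.delete(sid);
        removed++;
      }
    }
    return removed;
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

In a server, the sweep would be scheduled with something like `setInterval(() => store.sweep(), 30 * 60 * 1000).unref()` so that the timer does not keep the process alive. Because the map lives in process memory, a restart empties it, which is why every browser has to log in again.
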
**Impact of closing the browser or restarting the server**:

| Scenario | Re-login required? |
|---|---|
| Close tab, reopen within TTL | No — cookie is persistent, server session still live |
| Close entire browser, reopen within TTL | No — cookie is persistent, server session still live |
| TTL elapsed since last login | Yes — server drops the entry, responds 401 |
| Server restart (Ctrl+C, deploy, crash) | Yes — in-memory map is cleared; browser presents an unknown `__at_sid` and the login overlay reappears |
| User clicks the Logout button | Yes — `POST /api/auth/logout` deletes the entry and clears the cookie |

**Tuning guidance**:

- **Shorter TTL (e.g. 1 hour = `3600000`)**: more frequent logins, smaller exposure if a workstation is left unlocked.
- **Longer TTL (e.g. 24 hours = `86400000`)**: fewer interruptions during long development sessions.
- **Do not set TTL to 0 or a negative value** — the server will silently fall back to the 8h default.

> **Note**: the TTL only affects the browser login flow. Headless API access via `Authorization` header is stateless and completely bypasses sessions; it is unaffected by `sessionTtlMs`.

### Auth API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/auth/status` | GET | Returns `{ authRequired, authenticated, methods }` |
| `/api/auth/login` | POST | Validates credentials, sets session cookie |
| `/api/auth/logout` | POST | Destroys session, clears cookie |
| `/api/auth-token` | GET | Returns a ready-to-use `Authorization` header value for the configured MCP auth method (used by the chat UI to auto-fill the header). Response: `{ authType, token, ttlSec? }`. |
| `/api/auth-token/refresh` | POST | Re-issues a fresh JWT (only when `webServer.auth.jwtToken.encryptKey` is configured). Response: `{ authType: 'jwtToken', token, ttlSec }`. |

### Auto-filled Authorization Header

When the MCP server requires authentication (`webServer.auth.enabled: true`) and the chat UI is configured to send the `Authorization` header, the page does **not** ask the user to type a token — it issues one for itself by calling `GET /api/auth-token` on load. The endpoint returns a header value derived from the configured method, in priority order:

1. **`jwtToken`** — `Bearer <encrypted JWT>` issued by the server with `sub: 'agentTester'`, `service: <appConfig.name>`, and TTL = `agentTester.tokenTTLSec` (default 1800 sec / 30 min). The response also includes `ttlSec` so the client can plan refresh.
2. **`basic`** — `Basic <base64(user:password)>` from `webServer.auth.basic`.
3. **`permanentServerTokens`** — `Bearer <first configured token>`.

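The priority order can be written as a small selection function. This is a sketch under stated assumptions. The `username`/`password` field names for basic auth and the `issueJwt` callback are hypothetical stand-ins for the server's real config and JWT minting. `Buffer` assumes a Node runtime. The returned object follows the documented `{ authType, token, ttlSec? }` response shape.

```typescript
// Hypothetical config shape; only jwtToken.encryptKey is named in the docs above.
interface AuthConfig {
  jwtToken?: { encryptKey: string };
  basic?: { username: string; password: string };
  permanentServerTokens?: string[];
}

interface AuthTokenResponse {
  authType: "jwtToken" | "basic" | "permanentServerTokens";
  token: string; // complete Authorization header value
  ttlSec?: number;
}

// issueJwt stands in for the server's own JWT minting, which is not shown here.
function pickAuthHeader(
  cfg: AuthConfig,
  ttlSec: number,
  issueJwt: (ttlSec: number) => string,
): AuthTokenResponse | null {
  if (cfg.jwtToken?.encryptKey) {
    return { authType: "jwtToken", token: `Bearer ${issueJwt(ttlSec)}`, ttlSec };
  }
  if (cfg.basic) {
    const raw = `${cfg.basic.username}:${cfg.basic.password}`;
    return { authType: "basic", token: `Basic ${Buffer.from(raw, "utf8").toString("base64")}` };
  }
  const first = cfg.permanentServerTokens?.[0];
  if (first) {
    return { authType: "permanentServerTokens", token: `Bearer ${first}` };
  }
  return null; // no usable auth method configured
}
```

Only the JWT branch returns `ttlSec`. This matches the rule that periodic refresh applies to JWTs alone.
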
For **JWT only**, the page periodically refreshes the token on its own via `POST /api/auth-token/refresh`. The refresh cadence is approximately `max(30, ttlSec/3 - 60)` seconds (≈ once per 1/3 of TTL, with a 60-second safety lead and a 30-second floor). At the default `tokenTTLSec: 1800`, this means a refresh roughly every **9 minutes**. The page additionally triggers an immediate refresh when the tab regains focus or `visibilitychange` fires `'visible'`, to recover from background-tab timer throttling.

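Plugging numbers into the documented formula shows how the 30-second floor and the 60-second lead interact. The function name is ours; only the formula comes from the text above.

```typescript
// Documented cadence: max(30, ttlSec / 3 - 60) seconds between refreshes.
function refreshIntervalSec(ttlSec: number): number {
  return Math.max(30, ttlSec / 3 - 60);
}

for (const ttl of [120, 600, 1800, 3600]) {
  const sec = refreshIntervalSec(ttl);
  console.log(`tokenTTLSec=${ttl} -> refresh every ${sec}s (${(sec / 60).toFixed(1)} min)`);
}
```

At the default `tokenTTLSec: 1800`, this gives 540 s (9 minutes). With a very short TTL such as 120 s, `120 / 3 - 60` is negative, so the 30-second floor applies.
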
If the MCP call still fails with HTTP 401 (for example, because the cached token expired in the brief window between the last refresh and the request), the server transparently re-issues a JWT and retries the call **once**. The retry happens only when both conditions hold:

- the target URL points to the same server: host and port match `webServer.{host,port}`, with `localhost`, `127.0.0.1`, `::1`, and `0.0.0.0` treated as equivalent, and
- the cached header was a `Bearer …` token.

As a result, the user typically does not see a 401 even if a request races against TTL expiry.

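The two retry conditions can be expressed as predicates. This is a hypothetical sketch, not the server's code. It assumes the comparison uses the URL's hostname and effective port, and it strips the brackets that the WHATWG `URL` parser keeps around IPv6 hosts such as `[::1]`.

```typescript
// Loopback aliases that the docs treat as the same host.
const LOOPBACK = new Set(["localhost", "127.0.0.1", "::1", "0.0.0.0"]);

function normalizeHost(host: string): string {
  // new URL("http://[::1]:9876").hostname is "[::1]", so drop the brackets.
  const h = host.toLowerCase().replace(/^\[(.*)\]$/, "$1");
  return LOOPBACK.has(h) ? "loopback" : h;
}

function isSameServer(targetUrl: string, serverHost: string, serverPort: number): boolean {
  const url = new URL(targetUrl);
  // An empty url.port means the scheme's default port.
  const port = url.port !== "" ? Number(url.port) : url.protocol === "https:" ? 443 : 80;
  return port === serverPort && normalizeHost(url.hostname) === normalizeHost(serverHost);
}

function shouldRetryOn401(
  status: number,
  targetUrl: string,
  cachedHeader: string,
  serverHost: string,
  serverPort: number,
): boolean {
  return status === 401 && cachedHeader.startsWith("Bearer ") && isSameServer(targetUrl, serverHost, serverPort);
}
```

Non-loopback hosts are compared literally. Under this sketch, a request to a public hostname that happens to resolve to this machine would not qualify for the retry.
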
**Tuning**:
- Shorter `tokenTTLSec` → more frequent refresh requests but smaller window of exposure if a token leaks.
- Longer `tokenTTLSec` → fewer refreshes; useful for very long-running sessions.
- Headless clients (the `headless-chat.js` wrapper, custom curl scripts) may either rely on the 401-retry path or, for long-running scripts, mint their own JWT via `node scripts/generate-jwt.js` with an appropriate TTL — Agent Tester does not refresh tokens on behalf of headless clients.

**Login request body** (`POST /api/auth/login`):

```jsonc
// Token-based (permanent token or JWT)
{ "token": "my-secret-token" }

// Basic auth
{ "username": "admin", "password": "secret" }
```

### Headless Client Example

```bash
# Access Agent Tester API with token (no login needed)
curl -H "Authorization: Bearer my-secret-token" http://localhost:9876/agent-tester/api/mcp/status

# Headless test with token
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
  -H "Authorization: Bearer my-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"message":"Hello","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
```

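The same headless call can be made from TypeScript without curl. This is a minimal sketch that assumes Node 18+ for the global `fetch`. `buildChatTestRequest` and `chatTest` are hypothetical helper names. The body type lists only fields documented in the Headless API Reference.

```typescript
// Request body fields as documented for POST /agent-tester/api/chat/test.
interface ChatTestBody {
  message: string;
  mcpConfig?: { url: string; transport: "http" | "sse"; headers?: Record<string, string> };
  sessionId?: string;
  agentPrompt?: string;
  customPrompt?: string;
}

// Pure helper: builds the fetch options, which keeps it easy to test.
function buildChatTestRequest(token: string, body: ChatTestBody): RequestInit {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

async function chatTest(baseUrl: string, token: string, body: ChatTestBody): Promise<unknown> {
  const res = await fetch(`${baseUrl}/agent-tester/api/chat/test`, buildChatTestRequest(token, body));
  if (!res.ok) {
    // 404 may mean agentTester.enabled is false; 401 means the token was rejected.
    throw new Error(`chat/test failed: HTTP ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

For example, `chatTest("http://localhost:9876", "my-secret-token", { message: "Hello", mcpConfig: { url: "http://localhost:9876/mcp", transport: "http" } })` should resolve to the same JSON that the second curl command prints.
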
## Disabled State

When `agentTester.enabled` is `false` (or not set), all `/agent-tester/*` endpoints, including the Headless API, return HTTP 404:

```json
{
  "error": "Not found"
}
```

This prevents disclosing the existence of the Agent Tester to external users.

To enable it, the developer can use either of the following:

- **ENV**: `AGENT_TESTER_ENABLED=true`
- **Config file** `config/local.yaml`:
  ```yaml
  agentTester:
    enabled: true
  ```

### Hiding Footer Link Without Disabling Tester

To keep the Agent Tester functional (accessible directly by URL) but hide the link from the home page footer, use `showFooterLink: false`:

```yaml
agentTester:
  enabled: true
  showFooterLink: false # Tester works, but footer link is hidden
```

When `showFooterLink` is omitted or `true` (default), the footer link is shown as usual.

**For Claude Code**: When you receive a 404 response on `/agent-tester/*`, the Agent Tester may be disabled. Inform the developer and suggest enabling it via `agentTester.enabled: true` in `config/local.yaml` or ENV `AGENT_TESTER_ENABLED=true`. The developer decides whether to turn it on. If the developer's prompt or instructions explicitly say not to mention Agent Tester enablement, respect that and stay silent.

## LLM Availability Check

Before using the Agent Tester or Headless API, verify that the OpenAI API key is configured and works:

```bash
npm run check-llm
```

The script checks that `agentTester.openAi.apiKey` is set and makes a minimal `chat.completions.create` request (`max_tokens: 1`) to verify that the key works. It prints `OK` on success or `FAIL: <reason>` on failure, and exits with code `0` on success or `1` on failure.

For custom OpenAI-compatible endpoints that do not serve `gpt-4o-mini` (the model the check uses by default), pass the model name:

```bash
npm run check-llm -- my-custom-model
```

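For reference, the documented check amounts to one minimal chat-completion request. The sketch below reproduces that idea with plain `fetch` against the OpenAI REST endpoint `POST {baseURL}/chat/completions`, so it also works with OpenAI-compatible servers. It is not the source of `npm run check-llm`; the `checkLlm` name and the default base URL are assumptions.

```typescript
// Illustrative equivalent of the check: one request with max_tokens: 1.
async function checkLlm(
  apiKey: string | undefined,
  model = "gpt-4o-mini",
  baseURL = "https://api.openai.com/v1",
): Promise<string> {
  if (!apiKey) return "FAIL: agentTester.openAi.apiKey is not set";
  try {
    const res = await fetch(`${baseURL.replace(/\/+$/, "")}/chat/completions`, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages: [{ role: "user", content: "ping" }], max_tokens: 1 }),
    });
    return res.ok ? "OK" : `FAIL: HTTP ${res.status} ${await res.text()}`;
  } catch (err) {
    // Network errors (DNS, refused connection, TLS) end up here.
    return `FAIL: ${(err as Error).message}`;
  }
}
```

Call it as, for example, `checkLlm(key, "my-custom-model", "https://llm.example.internal/v1")`, where the URL is hypothetical. Treat any result other than `"OK"` as a reason to stop and fix the configuration.
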
**For Claude Code**: When the development prompt or instructions mention testing with the Headless API or Agent Tester, run `npm run check-llm` before starting any Agent Tester work. If the script exits with a non-zero code, inform the developer about the issue and ask them to fix the configuration before proceeding.

## Headless API Reference

### Connection Verification

```http
GET /agent-tester/api/mcp/status
```

Returns connection state and all available tools without going through the UI:

```json
{
  "connected": true,
  "servers": [
    {
      "name": "localhost9876",
      "url": "http://localhost:9876/mcp",
      "transport": "http",
      "tools": [
        { "name": "get_currency_rate", "description": "Get current cross-rate...", "inputSchema": {} }
      ],
      "toolCount": 1
    }
  ],
  "totalTools": 1
}
```

### Headless Chat Test

```http
POST /agent-tester/api/chat/test
```

Same request body as `POST /api/chat/message`, but returns a **structured trace** of all intermediate steps.

#### Request Body

```json
{
  "message": "What is the exchange rate of EUR to USD?",
  "mcpConfig": {
    "url": "http://localhost:9876/mcp",
    "transport": "http",
    "headers": { "Authorization": "Bearer <token>" }
  },
  "sessionId": "optional-session-id",
  "agentPrompt": "optional agent prompt override",
  "customPrompt": "optional additional instructions appended after agentPrompt",
  "modelConfig": {
    "model": "gpt-4o",
    "temperature": 0.3,
    "maxTokens": 4096,
    "maxTurns": 10
  }
}
```

Only `message` is required by the API. Without `mcpConfig`, however, the agent has no MCP server to call tools on.

| Field | Required | Description |
|-------|----------|-------------|
| `message` | yes | User message to send to the agent |
| `mcpConfig` | no | MCP server connection config (required for tool calls) |
| `sessionId` | no | Session ID for multi-turn conversations; omit to start fresh |
| `agentPrompt` | no | Agent prompt to send to the LLM as the system prompt. When provided, **replaces** the MCP server's `agent_prompt`. When omitted, the MCP server's `agent_prompt` is used (if available), otherwise a built-in default |
| `customPrompt` | no | Additional instructions appended after `agentPrompt`. Use for per-request modifiers without replacing the main prompt |
| `modelConfig` | no | LLM model settings (model name, temperature, maxTokens, maxTurns) |

#### Brief Response (default)

```json
{
  "message": "The EUR/USD rate is 1.0847",
  "sessionId": "abc-123",
  "trace": {
    "system_prompt_sent": "You are a currency assistant...\n\nBe concise.",
    "turns": [
      {
        "turn": 1,
        "tool_calls": [
          { "name": "get_currency_rate", "arguments": { "quoteCurrency": "EUR", "baseCurrency": "USD" } }
        ],
        "tool_results": [
          { "name": "get_currency_rate", "result": { "symbol": "EURUSD", "rate": 1.0847 }, "duration_ms": 230 }
        ]
      }
    ],
    "total_turns": 2,
    "total_duration_ms": 1850,
    "tools_used": ["get_currency_rate"]
  }
}
```

The `system_prompt_sent` field contains the **final system prompt** that was sent to the LLM. Use it to verify exactly what the LLM received — especially when iterating on agent prompt variations.

Brief mode shows the tool interaction chain: which tools were called, with what arguments, and what they returned. It omits LLM internals such as finish reasons and token usage.

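Checking a brief trace against an expectation (right tool, right arguments) is straightforward to automate. The interfaces below mirror only the fields shown in the brief example above. `checkToolCall` is a hypothetical helper. Comparing arguments with `JSON.stringify` is a simplification: it treats nested objects whose keys appear in a different order as different.

```typescript
// Types mirror the brief-mode example; no other fields are assumed.
interface ToolCall { name: string; arguments: Record<string, unknown> }
interface ToolResult { name: string; result: unknown; duration_ms: number }
interface Turn { turn: number; tool_calls?: ToolCall[]; tool_results?: ToolResult[] }
interface BriefTrace {
  system_prompt_sent: string;
  turns: Turn[];
  total_turns: number;
  total_duration_ms: number;
  tools_used: string[];
}

// Returns human-readable problems; an empty array means the expectation held.
function checkToolCall(trace: BriefTrace, tool: string, expectedArgs: Record<string, unknown>): string[] {
  const calls = trace.turns.flatMap((t) => t.tool_calls ?? []);
  const call = calls.find((c) => c.name === tool);
  if (!call) return [`expected ${tool}, got [${trace.tools_used.join(", ")}]`];
  const problems: string[] = [];
  for (const [key, want] of Object.entries(expectedArgs)) {
    const got = call.arguments[key];
    if (JSON.stringify(got) !== JSON.stringify(want)) {
      problems.push(`${tool}.${key}: expected ${JSON.stringify(want)}, got ${JSON.stringify(got)}`);
    }
  }
  return problems;
}
```

Each returned string names one mismatch, which makes the output easy to paste into a test log.
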
#### Verbose Response

```http
POST /agent-tester/api/chat/test?verbose=true
```

Adds per-turn LLM request/response details:

```json
{
  "turns": [
    {
      "turn": 1,
      "llm_request": { "model": "gpt-4o", "messages_count": 3 },
      "llm_response": {
        "finish_reason": "tool_calls",
        "content": null,
        "usage": { "prompt_tokens": 450, "completion_tokens": 32, "total_tokens": 482 }
      },
      "tool_calls": [...],
      "tool_results": [...]
    }
  ]
}
```

Use verbose mode when:
- The agent doesn't call the expected tool and the brief trace doesn't explain why
- The agent loops without resolving (check `finish_reason`)
- Token usage is unexpectedly high
- The response is empty or unexpected

#### Size Limit Overrides

```http
POST /agent-tester/api/chat/test?maxResultChars=8000&maxTraceChars=100000
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `maxResultChars` | 4000 | Max characters per tool result in trace |
| `maxTraceChars` | 50000 | Max total trace size; older turns are collapsed to summaries when exceeded |

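When scripting runs, the query parameters can be added only when they are needed, so that the server defaults apply otherwise. A sketch with a hypothetical `chatTestUrl` helper; `baseUrl` is assumed to be the server origin, because any path on it is replaced.

```typescript
interface ChatTestQuery {
  verbose?: boolean;
  maxResultChars?: number; // server default: 4000
  maxTraceChars?: number;  // server default: 50000
}

// Adds only the parameters that were set; everything else uses server defaults.
function chatTestUrl(baseUrl: string, q: ChatTestQuery = {}): string {
  const url = new URL("/agent-tester/api/chat/test", baseUrl);
  if (q.verbose) url.searchParams.set("verbose", "true");
  if (q.maxResultChars !== undefined) url.searchParams.set("maxResultChars", String(q.maxResultChars));
  if (q.maxTraceChars !== undefined) url.searchParams.set("maxTraceChars", String(q.maxTraceChars));
  return url.toString();
}
```

A typical escalation is to run first with `chatTestUrl(base)` and repeat with `{ verbose: true }` only when the brief trace leaves the behavior unexplained.
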
### Prompt Assembly

The system prompt sent to the LLM is resolved by priority — the first available value wins:

```text
request.agentPrompt → session.agentPrompt → MCP server's agent_prompt → built-in default
```

If `customPrompt` is provided, it is appended after the resolved prompt.

The final result is sent to the LLM as a `{ role: "system" }` message and returned in the trace as `system_prompt_sent`.

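The resolution order can be mirrored locally, for example to predict `system_prompt_sent` before sending a request. This sketch rests on two assumptions that the text above does not state. First, empty strings count as "not available". Second, `customPrompt` is joined with a blank line, which matches the `"...\n\nBe concise."` value in the brief example. The built-in default text is a placeholder.

```typescript
interface PromptSources {
  requestAgentPrompt?: string;
  sessionAgentPrompt?: string;
  serverAgentPrompt?: string; // the MCP server's agent_prompt
  customPrompt?: string;
}

// Placeholder: the real built-in default prompt is not reproduced here.
const BUILT_IN_DEFAULT = "You are a helpful assistant.";

function resolveSystemPrompt(s: PromptSources): string {
  // ASSUMPTION: blank strings are treated as "not provided".
  const isSet = (v?: string): v is string => v !== undefined && v.trim() !== "";
  const base =
    [s.requestAgentPrompt, s.sessionAgentPrompt, s.serverAgentPrompt].find(isSet) ?? BUILT_IN_DEFAULT;
  // ASSUMPTION: customPrompt is appended after a blank line.
  return isSet(s.customPrompt) ? `${base}\n\n${s.customPrompt}` : base;
}
```
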
**Key principle:** when `agentPrompt` is passed in the request, it **replaces** the MCP server's `agent_prompt` entirely. This enables the iterative prompt refinement workflow:

1. Read the current `AGENT_PROMPT` from `src/prompts/agent-prompt.ts`
2. Send it as `agentPrompt` in the headless request
3. Evaluate the agent's response and trace
4. Modify the prompt, send again
5. When satisfied, write the best variant back to `src/prompts/agent-prompt.ts`

```bash
# Variant A: concise assistant that replies in one sentence
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
  -H "Content-Type: application/json" \
  -d '{"message":"Get EUR/USD rate","agentPrompt":"You are a concise currency assistant. Use tools, reply in one sentence.","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'

# Variant B: analyst persona with market context
curl -X POST http://localhost:9876/agent-tester/api/chat/test \
  -H "Content-Type: application/json" \
  -d '{"message":"Get EUR/USD rate","agentPrompt":"You are a financial analyst. Explain rates with market context and trends.","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
```

Compare `system_prompt_sent` and the agent responses between variations to find the best prompt. When `agentPrompt` is omitted in a fresh session (no `sessionId`), the MCP server's own `agent_prompt` is used automatically, so the request tests the currently deployed prompt as-is.

### Sessions

The headless API shares sessions with the chat UI. To start a fresh conversation, omit `sessionId`. To continue an existing conversation, pass the `sessionId` from a previous response.

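Threading `sessionId` through a scripted multi-turn conversation follows directly from this rule. In the sketch below, `send` is a hypothetical injected function, for example a `fetch` wrapper that also adds `mcpConfig`. Injecting it keeps the loop testable without a running server.

```typescript
interface ChatTestResult { message: string; sessionId: string }

// Any function that POSTs the body to /agent-tester/api/chat/test and returns parsed JSON.
type Send = (body: { message: string; sessionId?: string }) => Promise<ChatTestResult>;

// The first request omits sessionId (fresh session); every later request
// reuses the sessionId returned by the previous response.
async function runConversation(messages: string[], send: Send): Promise<ChatTestResult[]> {
  const replies: ChatTestResult[] = [];
  let sessionId: string | undefined;
  for (const message of messages) {
    const reply = await send(sessionId ? { message, sessionId } : { message });
    sessionId = reply.sessionId;
    replies.push(reply);
  }
  return replies;
}
```
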
## Structured JSON Logging (`agentTester.logJson`)

When `agentTester.logJson` is `true`, each agent event is emitted as a single-line JSON object on stdout — useful for real-time monitoring, debugging, and log aggregation.

Enable via config, CLI flag, or environment variable:

```yaml
# config/local.yaml
agentTester:
  logJson: true
```

```bash
npm start -- --log-json
# or
AGENT_TESTER_LOG_JSON=true npm start
```

Event types emitted:

```text
{"event":"tool_call","name":"get_currency_rate","arguments":{"quoteCurrency":"EUR"},"timestamp":"2025-08-15T14:32:00.000Z"}
{"event":"tool_result","name":"get_currency_rate","result":{"rate":1.0847},"duration_ms":230,"timestamp":"2025-08-15T14:32:00.230Z"}
{"event":"llm_response","turn":2,"finish_reason":"stop","tool_calls":[],"has_content":true,"timestamp":"2025-08-15T14:32:01.500Z"}
{"event":"response","message":"The EUR/USD rate is 1.0847","tools_used":["get_currency_rate"],"duration_ms":1850}
```

**Default mode** (without `--log-json`) keeps the colored text logs for human debugging. The flag affects only agent tester events — other server logs (startup, auth, MCP protocol) continue in their normal format.

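Agent events are single-line JSON mixed in with other, non-JSON server logs, so a consumer has to filter lines before parsing. The sketch below accepts captured stdout, keeps only the four event types shown above, and sums tool time. The `AgentEvent` union is inferred from the sample lines, not taken from a published schema.

```typescript
// Shapes inferred from the four sample events; other fields are not assumed.
type AgentEvent =
  | { event: "tool_call"; name: string; arguments: unknown; timestamp: string }
  | { event: "tool_result"; name: string; result: unknown; duration_ms: number; timestamp: string }
  | { event: "llm_response"; turn: number; finish_reason: string; tool_calls: unknown[]; has_content: boolean; timestamp: string }
  | { event: "response"; message: string; tools_used: string[]; duration_ms: number };

const KNOWN_EVENTS = new Set(["tool_call", "tool_result", "llm_response", "response"]);

// Skip anything that is not a JSON object with a known "event" field.
function parseAgentEvents(stdout: string): AgentEvent[] {
  const events: AgentEvent[] = [];
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      const obj = JSON.parse(trimmed);
      if (obj && typeof obj.event === "string" && KNOWN_EVENTS.has(obj.event)) {
        events.push(obj as AgentEvent);
      }
    } catch {
      // Looked like JSON but was not; ignore the line.
    }
  }
  return events;
}

// Total time spent inside tools, summed from tool_result events.
function toolTimeMs(events: AgentEvent[]): number {
  return events.reduce((sum, e) => (e.event === "tool_result" ? sum + e.duration_ms : sum), 0);
}
```

For live monitoring, the same filter can be applied line by line, for example by piping `npm start -- --log-json` into a script that reads stdin with `node:readline`.
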
## Automated Testing with Claude Code

The Headless API is designed for CLI automation tools like Claude Code. The typical automated testing workflow:

0. Verify LLM availability: `npm run check-llm` (exit 0 = ready, non-zero = fix config first)
1. Build and start the server: `npm run cb && npm start`
2. Verify tools: `GET /agent-tester/api/mcp/status`
3. Send test messages: `POST /agent-tester/api/chat/test`
4. Analyze trace: correct tool? correct args? expected result?
5. If unclear: retry with `?verbose=true`
6. If issue found: fix code, rebuild, restart, re-test
7. Maintain a testing log at `claudedocs/test-log.md`

### Brief vs Verbose Strategy

**Default to brief mode.** The brief trace covers most debugging scenarios:
- Was the correct tool called?
- Were the arguments correct?
- Did the tool return the expected result?
- How many turns did the agent take?

**Switch to verbose** only when the brief trace doesn't explain the behavior:
- Tool was never called (check `finish_reason` — was it `stop` instead of `tool_calls`?)
- Wrong tool was called (check if the tool description is ambiguous)
- Agent loops (check per-turn `finish_reason` and token usage)
- Empty response (check if `content` is null across all turns)

## Agent Tester Chat UI

The Agent Tester also provides a web UI at `/agent-tester` for interactive testing. The UI auto-connects to the local MCP server and auto-fills auth headers if configured.

The chat UI uses `POST /api/chat/message` (which returns only the final response). The headless API uses `POST /api/chat/test` (which returns the response plus full trace data). Both share the same underlying agent logic and session storage.

## UI Test Selectors (`data-testid`)

For UI automation (Playwright, Cypress, Selenium), the Agent Tester page is annotated with stable `data-testid` attributes. Prefer these over CSS classes, DOM IDs, or label text — they are the documented contract and won't change with styling or copy edits.

### Naming Convention

All selectors use the `at-` prefix (short for "agent tester") in kebab-case:

```text
at-<area>-<element>[-<modifier>]
```

Examples: `at-auth-token-input`, `at-server-url`, `at-message-user`, `at-toast-success`.

Dynamic elements that map 1:1 to runtime data append the runtime key:

```text
at-header-row-<headerName>     e.g. at-header-row-Authorization
at-header-input-<headerName>   e.g. at-header-input-X-Session-Id
at-message-<sender>            e.g. at-message-user, at-message-assistant
at-toast-<type>                e.g. at-toast-success, at-toast-error
```

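Tests that build selectors programmatically can encode the convention once. These helpers are hypothetical (the UI does not ship them) and simply join the parts described above.

```typescript
// Builds at-<area>-<element>[-<modifier>]; an omitted modifier is skipped.
function testId(area: string, element: string, modifier?: string): string {
  return ["at", area, element, modifier].filter((p): p is string => Boolean(p)).join("-");
}

// Dynamic ids append the runtime key verbatim (header names keep their case).
const headerRow = (headerName: string) => `at-header-row-${headerName}`;
const headerInput = (headerName: string) => `at-header-input-${headerName}`;
const messageBubble = (sender: "user" | "assistant") => `at-message-${sender}`;
const toastId = (type: "success" | "error" | "warning" | "info") => `at-toast-${type}`;

// CSS attribute selector for tools that take raw selectors (Cypress, Selenium).
const byTestId = (id: string) => `[data-testid="${id}"]`;
```

With Playwright, `page.getByTestId(headerRow("Authorization"))` targets the same element that `byTestId(headerRow("Authorization"))` selects in Cypress or Selenium.
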
### Selector Reference

**Auth overlay (shown when `agentTester.useAuth: true`)**

| testid | Element |
|---|---|
| `at-auth-overlay` | Root login overlay container |
| `at-auth-tabs` | Tab switcher (only rendered when multiple methods configured) |
| `at-auth-tab-token` | "Token" tab button |
| `at-auth-tab-basic` | "Login" tab button |
| `at-auth-token-form` | Token login form |
| `at-auth-token-input` | Token input field |
| `at-auth-token-submit` | Token submit button |
| `at-auth-basic-form` | Basic auth form |
| `at-auth-username` | Username input |
| `at-auth-password` | Password input |
| `at-auth-basic-submit` | Basic submit button |
| `at-auth-error` | Error message container |

**App shell**

| testid | Element |
|---|---|
| `at-app` | Root app container (hidden until authenticated) |
| `at-sidebar` | Sidebar (configuration panel) |
| `at-main` | Main chat area |
| `at-chat-header` | Chat header bar |

**Sidebar: connection form**

| testid | Element |
|---|---|
| `at-connection-form` | MCP connection form |
| `at-server-url` | MCP server URL input |
| `at-server-url-dropdown` | Saved URLs dropdown toggle |
| `at-server-url-dropdown-list` | Saved URLs dropdown panel |
| `at-server-url-add-new` | "Add new URL" menu item |
| `at-saved-urls-list` | Container for saved URL items |
| `at-saved-url-item` | Each saved URL row (dynamic) |
| `at-saved-url-text` | Clickable URL text within a row |
| `at-saved-url-delete` | Delete button for a saved URL |
| `at-transport` | Transport `<select>` (http / sse) |
| `at-connect-btn` | Connect button |
| `at-connected-servers` | Connection status bar container |
| `at-server-status-row` | Status row (dynamic, rendered after connect attempt) |
| `at-server-status-connected` | "X tools connected" badge |
| `at-server-status-disconnected` | "Disconnected" badge |
| `at-disconnect-btn` | Disconnect button |
| `at-reconnect-btn` | Reconnect button |

**Sidebar: HTTP headers section**

| testid | Element |
|---|---|
| `at-headers-section` | Headers section container |
| `at-dynamic-headers` | Headers list container |
| `at-header-row-<name>` | Row for a specific header (e.g. `at-header-row-Authorization`) |
| `at-header-input-<name>` | Input for a specific header value |

**Sidebar: LLM settings**

The sidebar shows only the current model name (read-only) and a gear button. All LLM parameters (Base URL, API Key, Model Name, Temperature, Max Tokens, Max Turns, Limit (chars)) are edited in the LLM Settings modal opened via that button and are persisted in `localStorage['mcpAgentLlmSettings']`. If `agentTester.openAi.exposeToClient` is `true` in config, the server sends `baseURL` and `apiKey` via `GET /agent-tester/api/config` → `llmDefaults`, and the UI pre-fills them into localStorage on first open. **Security note:** only enable `exposeToClient` when the tester is protected by `useAuth: true` or deployed in a trusted network. When the effective `apiKey` is empty, a red "API Key is not set" warning is shown below the model name.

| testid | Element |
|---|---|
| `at-model-section` | Model section container |
| `at-model-display` | Read-only current model name |
| `at-llm-settings-btn` | Gear button that opens the LLM Settings modal |
| `at-api-key-warning` | "API Key is not set" warning (visible only when `apiKey` is empty) |
| `at-llm-modal` | LLM Settings modal overlay |
| `at-llm-modal-close` | Modal close (×) button |
| `at-llm-modal-cancel` | Modal Cancel button |
| `at-llm-modal-save` | Modal Save button |
| `at-llm-base-url` | Base URL input (optional; empty means OpenAI default) |
| `at-llm-api-key` | API Key input (password field) |
| `at-llm-api-key-toggle` | Show/hide API key visibility toggle |
| `at-llm-model-name` | Model Name input (editable combobox) |
| `at-llm-model-dropdown-toggle` | Model dropdown arrow button |
| `at-llm-model-dropdown-list` | Model dropdown list (preset models) |
| `at-llm-model-option-<name>` | Individual model option inside the list |
| `at-llm-temperature` | Temperature input |
| `at-llm-max-tokens` | Max tokens input |
| `at-llm-max-turns` | Max turns input |
| `at-llm-limit-chars` | Tool result char limit input |

**Sidebar: prompts**

| testid | Element |
|---|---|
| `at-system-prompt` | Agent (system) prompt `<textarea>` |
| `at-system-prompt-enlarge` | Enlarge button for agent prompt |
| `at-custom-prompt` | Custom prompt `<textarea>` |
| `at-custom-prompt-enlarge` | Enlarge button for custom prompt |

**Chat header**

| testid | Element |
|---|---|
| `at-sidebar-toggle-mobile` | Mobile sidebar toggle |
| `at-default-format` | Default display format `<select>` (HTML / MD) |
| `at-theme-toggle` | Light/dark theme toggle |
| `at-clear-chat` | Clear chat button |
| `at-logout-btn` | Logout button (visible only when `useAuth` is true) |

**Chat area**

| testid | Element |
|---|---|
| `at-chat-messages` | Messages scroll container |
| `at-welcome-message` | Initial welcome card |
| `at-message-user` | User message bubble (one per message) |
| `at-message-assistant` | Assistant message bubble |
| `at-message-text-user` | Inner text element of a user message |
| `at-message-text-assistant` | Inner text element of an assistant message |
| `at-message-format-toggle` | HTML/MD format toggle on an assistant message |
| `at-typing-indicator` | Typing indicator (shown during LLM response) |
| `at-message-input` | Chat input `<textarea>` |
| `at-char-count` | Character counter span |
| `at-send-btn` | Send button |

**Modals and overlays**

| testid | Element |
|---|---|
| `at-prompt-modal` | Prompt enlarge modal overlay |
| `at-prompt-modal-title` | Modal title |
| `at-prompt-modal-textarea` | Modal text area |
| `at-prompt-modal-save` | Apply button |
| `at-prompt-modal-close` | Close button |
| `at-loading-overlay` | Global loading overlay |
| `at-header-tooltip` | Floating header description tooltip |
| `at-toast-container` | Toast notifications container |
| `at-toast-success` / `at-toast-error` / `at-toast-warning` / `at-toast-info` | Individual toast (dynamic) |

### Usage Examples

**Playwright**

```js
import { test } from '@playwright/test';

test('agent tester answers a chat message', async ({ page }) => {
  await page.goto('http://localhost:9876/agent-tester');

  // Log in (only needed when agentTester.useAuth is true)
  await page.getByTestId('at-auth-token-input').fill(process.env.MCP_TOKEN);
  await page.getByTestId('at-auth-token-submit').click();

  // Wait for the main app to become visible
  await page.getByTestId('at-app').waitFor();

  // Send a chat message
  await page.getByTestId('at-message-input').fill('List all tools');
  await page.getByTestId('at-send-btn').click();

  // Wait until an assistant reply appears
  await page.getByTestId('at-message-assistant').first().waitFor();
});
```

**Cypress**

```js
// Assumes baseUrl (e.g. http://localhost:9876) is set in the Cypress config
cy.visit('/agent-tester');
cy.get('[data-testid=at-auth-token-input]').type(Cypress.env('MCP_TOKEN'));
cy.get('[data-testid=at-auth-token-submit]').click();
cy.get('[data-testid=at-server-status-connected]').should('be.visible');
```

### Stability Guarantee

These test-ids are part of the public contract of the Agent Tester UI. Once added, a given id is not renamed or removed without a changelog entry. New elements are added with new ids as the UI grows. When authoring tests, prefer `data-testid` over:

- DOM `id` (may be shared with form `<label for>` pairs and collide across scopes)
- CSS class names (used for styling — may be renamed or removed during refactors)
- Visible text (localized / editable copy — changes break tests)
- XPath or positional selectors (brittle to layout changes)
1
+ # Agent Tester and Headless API
2
+
3
+ ## Overview
4
+
5
+ The Agent Tester is a built-in AI agent system for developing and refining MCP server tools. It goes beyond functional testing — it validates the **full agent experience**: how the LLM interprets tool descriptions, selects tools, passes arguments, and presents results.
6
+
7
+ The Headless API provides programmatic access to the Agent Tester without a browser. It enables CLI-based automated testing and returns structured trace data for every tool call, argument, result, and LLM decision.
8
+
9
+ ## Developing MCP Servers as Agents
10
+
11
+ An MCP server is not just a set of tools — it is an **agent interface**. The LLM acts as the agent, deciding which tools to call, with what arguments, and how to interpret results. This means the quality of the agent experience depends on:
12
+
13
+ - **Tool descriptions** — the LLM reads them to decide when and why to call a tool
14
+ - **Parameter schemas** — names, types, required/optional flags, and default value documentation guide the LLM's argument construction
15
+ - **Response format** — `formatToolResult()` output must be structured so the LLM can interpret and relay it to the user
16
+ - **Agent prompt** — the system prompt shapes the LLM's conversation style, tool usage logic, and error handling behavior
17
+ - **Tool decomposition** — whether one tool should be split into two, or two merged into one
18
+
19
+ All of these aspects are **invisible to unit tests**. A tool can pass all unit tests and still produce a poor agent experience because the LLM misinterprets the description, sends wrong argument types, or doesn't understand the response format.
20
+
21
+ The Agent Tester closes this gap by running the **full agent loop**: user message → LLM reasoning → tool selection → tool execution → LLM interpretation → user response.
22
+
23
+ ## Three-Phase Development Workflow
24
+
25
+ ### Phase 1: Initial Architecture
26
+
27
+ Design tools, prompts, parameters, and handler logic based on task requirements. Implement a first working version:
28
+
29
+ ```bash
30
+ npm run cb && npm start
31
+ ```
32
+
33
+ ### Phase 2: Basic Functionality
34
+
35
+ Verify compilation, server startup, tool registration, and basic calls. Fix crashes, connection errors, and missing tools.
36
+
37
+ ### Phase 3: Iterative Refinement
38
+
39
+ This is the key phase. Send test messages through the Agent Tester, observe the agent's behavior, diagnose issues, and refine:
40
+
41
+ ```
42
+ observe agent behavior → diagnose root cause → fix → rebuild → re-test
43
+ ```
44
+
45
+ Root cause categories:
46
+ - **Tool description** — LLM picks wrong tool or misunderstands purpose
47
+ - **Parameter schema** — LLM sends wrong types or misses required params
48
+ - **Agent prompt** — LLM doesn't follow desired conversation style
49
+ - **Handler logic** — tool results confuse the LLM
50
+ - **Error messages** — failures produce unhelpful responses
51
+
52
+ ## Authentication (`agentTester.useAuth`)
53
+
54
+ When `agentTester.useAuth` is `true`, the Agent Tester is protected by the full multi-auth middleware — the same authentication chain used for MCP endpoints (`permanentServerTokens` / `basic` / `jwtToken` / `custom`).
55
+
56
+ ### How It Works
57
+
58
+ **Browser access:** When a user opens `/agent-tester` in a browser, the page loads normally (static assets are served without auth). The frontend checks `GET /api/auth/status` and displays a **login dialog** if the user is not authenticated. The dialog adapts to configured auth methods:
59
+
60
+ - If `permanentServerTokens` or `jwtToken` is configured — shows a "Token" input
61
+ - If `basic` auth is configured — shows "Username" + "Password" inputs
62
+ - If both are configured — shows tabs to switch between methods
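The selection rules above can be sketched as a small function. This is a hypothetical illustration, not the Agent Tester frontend code. It assumes `methods` from `GET /api/auth/status` is an array of method names such as `'permanentServerTokens'`, `'jwtToken'` and `'basic'`; the real shape may differ.

```javascript
// Hypothetical sketch: decide which login inputs to render from the
// configured auth methods. Assumes `methods` is an array of strings.
function loginFormsFor(methods) {
  const set = new Set(methods);
  const token = set.has('permanentServerTokens') || set.has('jwtToken');
  const basic = set.has('basic');
  return {
    showTokenInput: token,    // "Token" input
    showBasicInputs: basic,   // "Username" + "Password" inputs
    showTabs: token && basic, // both configured: tabs to switch between methods
  };
}

console.log(loginFormsFor(['jwtToken', 'basic']));
```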
63
+
64
+ After successful login via `POST /api/auth/login`, the server issues an httpOnly session cookie (`__at_sid`). All subsequent API requests from the browser include this cookie automatically. The session is valid for the configured TTL (default: 8 hours — see [Session Lifetime](#session-lifetime) below). A logout button appears in the header.
65
+
66
+ **Headless / CLI access:** Headless API consumers (curl, scripts, Claude Code) bypass the login dialog entirely. They pass an `Authorization` header with each request, which is validated by the standard `authMW`. No session cookie is needed.
67
+
68
+ ### Configuration
69
+
70
+ ```yaml
71
+ agentTester:
72
+   useAuth: true # Show login screen for browser, require auth for API
73
+   sessionTtlMs: 28800000 # Browser session lifetime in ms (default: 8h)
74
+   tokenTTLSec: 1800 # TTL of JWTs auto-issued for the chat UI / headless clients (default: 30 min)
75
+
76
+ webServer:
77
+   auth:
78
+     enabled: true
79
+     permanentServerTokens: ['my-secret-token']
80
+     # and/or basic, jwtToken — any configured method will be available
81
+ ```
82
+
83
+ Environment variables:
84
+
85
+ - `AGENT_TESTER_USE_AUTH=true`
86
+ - `AGENT_TESTER_SESSION_TTL_MS=28800000`
87
+ - `AGENT_TESTER_TOKEN_TTL_SEC=1800`
88
+
89
+ When `useAuth` is `false` (default), the Agent Tester is accessible without any authentication and `sessionTtlMs` has no effect.
90
+
91
+ ### Session Lifetime
92
+
93
+ When `useAuth` is `true`, a successful browser login creates a server-side session and sets an httpOnly cookie (`__at_sid`) scoped to `/agent-tester`. Both the in-memory entry and the cookie's `Max-Age` use the same TTL from `agentTester.sessionTtlMs`.
94
+
95
+ **Where sessions live**: an in-memory `Map` inside the server process (`src/core/auth/agent-tester-auth.ts`). There is no disk or Redis persistence — this is intentional because Agent Tester is a development tool, not a production auth system.
96
+
97
+ **Default TTL**: 8 hours (`28_800_000` ms). Override by setting `agentTester.sessionTtlMs` in `config/default.yaml` (or any environment-specific override file), or via `AGENT_TESTER_SESSION_TTL_MS`. Values are in milliseconds; any non-positive or non-finite value falls back to the 8h default.
98
+
99
+ **Cleanup**: a background sweep runs every 30 minutes and drops expired entries from the map. Expired entries are also evicted lazily on access.
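The session behavior described above (TTL fallback, lazy eviction on access, periodic sweep) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `src/core/auth/agent-tester-auth.ts`; `resolveTtl` and `SessionStore` are hypothetical names, and the clock is injectable only to make the sketch testable.

```javascript
// Hypothetical sketch of an in-memory session store with the documented
// TTL fallback, lazy eviction on access and a periodic sweep.
const DEFAULT_TTL_MS = 8 * 60 * 60 * 1000; // 28_800_000 ms = 8h

function resolveTtl(value) {
  // Non-positive or non-finite values fall back to the 8h default.
  // Expects a number: parse environment strings with Number(...) first.
  return Number.isFinite(value) && value > 0 ? value : DEFAULT_TTL_MS;
}

class SessionStore {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = resolveTtl(ttlMs);
    this.now = now;            // injectable clock
    this.sessions = new Map(); // sid -> expiresAt (ms since epoch)
  }

  create(sid) {
    this.sessions.set(sid, this.now() + this.ttlMs);
  }

  isValid(sid) {
    const expiresAt = this.sessions.get(sid);
    if (expiresAt === undefined) return false; // unknown sid, e.g. after a restart
    if (expiresAt <= this.now()) {
      this.sessions.delete(sid); // lazy eviction on access
      return false;
    }
    return true;
  }

  sweep() {
    // Meant to run on a timer (the docs mention every 30 minutes).
    const t = this.now();
    for (const [sid, expiresAt] of this.sessions) {
      if (expiresAt <= t) this.sessions.delete(sid);
    }
  }
}
```

Because the map lives only in process memory, a server restart empties it, which is why a previously valid cookie stops working after a restart.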
100
+
101
+ **Impact of closing the browser or restarting the server**:
102
+
103
+ | Scenario | Re-login required? |
104
+ |---|---|
105
+ | Close tab, reopen within TTL | No — cookie is persistent, server session still live |
106
+ | Close entire browser, reopen within TTL | No — cookie is persistent, server session still live |
107
+ | TTL elapsed since last login | Yes — server drops the entry, responds 401 |
108
+ | Server restart (Ctrl+C, deploy, crash) | Yes — in-memory map is cleared; browser presents an unknown `__at_sid` and the login overlay reappears |
109
+ | User clicks the Logout button | Yes — `POST /api/auth/logout` deletes the entry and clears the cookie |
110
+
111
+ **Tuning guidance**:
112
+
113
+ - **Shorter TTL (e.g. 1 hour = `3600000`)**: more frequent logins, smaller exposure if a workstation is left unlocked.
114
+ - **Longer TTL (e.g. 24 hours = `86400000`)**: fewer interruptions during long development sessions.
115
+ - **Do not set TTL to 0 or a negative value** — the server will silently fall back to the 8h default.
116
+
117
+ > **Note**: the TTL only affects the browser login flow. Headless API access via `Authorization` header is stateless and completely bypasses sessions; it is unaffected by `sessionTtlMs`.
118
+
119
+ ### Auth API Endpoints
120
+
121
+ | Endpoint | Method | Description |
122
+ |----------|--------|-------------|
123
+ | `/api/auth/status` | GET | Returns `{ authRequired, authenticated, methods }` |
124
+ | `/api/auth/login` | POST | Validates credentials, sets session cookie |
125
+ | `/api/auth/logout` | POST | Destroys session, clears cookie |
126
+ | `/api/auth-token` | GET | Returns a ready-to-use `Authorization` header value for the configured MCP auth method (used by the chat UI to auto-fill the header). Response: `{ authType, token, ttlSec? }`. |
127
+ | `/api/auth-token/refresh` | POST | Re-issues a fresh JWT (only when `webServer.auth.jwtToken.encryptKey` is configured). Response: `{ authType: 'jwtToken', token, ttlSec }`. |
128
+
129
+ ### Auto-filled Authorization Header
130
+
131
+ When the MCP server requires authentication (`webServer.auth.enabled: true`) and the chat UI is configured to send the `Authorization` header, the page does **not** ask the user to type a token — it issues one for itself by calling `GET /api/auth-token` on load. The endpoint returns a header value derived from the configured method, in priority order:
132
+
133
+ 1. **`jwtToken`** — `Bearer <encrypted JWT>` issued by the server with `sub: 'agentTester'`, `service: <appConfig.name>`, and TTL = `agentTester.tokenTTLSec` (default 1800 sec / 30 min). The response also includes `ttlSec` so the client can plan refresh.
134
+ 2. **`basic`** — `Basic <base64(user:password)>` from `webServer.auth.basic`.
135
+ 3. **`permanentServerTokens`** — `Bearer <first configured token>`.
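The priority order can be sketched as a function returning the documented `{ authType, token, ttlSec? }` shape. This is a hypothetical illustration: the `basic` field names (`username`, `password`), the `authType` strings for the non-JWT cases, and the `issueJwt` callback (standing in for the server's JWT issuer) are all assumptions.

```javascript
// Hypothetical sketch of the priority jwtToken -> basic -> permanentServerTokens.
// `auth` mirrors webServer.auth; field names under `basic` are assumed.
// `issueJwt(ttlSec)` stands in for the server-side JWT issuer.
function buildAuthHeader(auth, { tokenTTLSec = 1800, issueJwt } = {}) {
  if (auth.jwtToken?.encryptKey && issueJwt) {
    return { authType: 'jwtToken', token: `Bearer ${issueJwt(tokenTTLSec)}`, ttlSec: tokenTTLSec };
  }
  if (auth.basic?.username && auth.basic?.password) {
    const encoded = Buffer.from(`${auth.basic.username}:${auth.basic.password}`, 'utf8').toString('base64');
    return { authType: 'basic', token: `Basic ${encoded}` };
  }
  if (Array.isArray(auth.permanentServerTokens) && auth.permanentServerTokens.length > 0) {
    return { authType: 'permanentServerTokens', token: `Bearer ${auth.permanentServerTokens[0]}` };
  }
  return null; // no usable method configured
}

console.log(buildAuthHeader({ basic: { username: 'admin', password: 'secret' } }).token);
// Basic YWRtaW46c2VjcmV0
```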
136
+
137
+ For **JWT only**, the page periodically refreshes the token on its own via `POST /api/auth-token/refresh`. The refresh cadence is approximately `max(30, ttlSec/3 - 60)` seconds (≈ once per 1/3 of TTL, with a 60-second safety lead and a 30-second floor). At the default `tokenTTLSec: 1800`, this means a refresh roughly every **9 minutes**. The page additionally triggers an immediate refresh when the tab regains focus or `visibilitychange` fires `'visible'`, to recover from background-tab timer throttling.
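The cadence formula can be checked with a quick calculation. The helper name is hypothetical, and the real client may round the result differently.

```javascript
// Approximate refresh cadence from the description above:
// max(30, ttlSec / 3 - 60) seconds.
function refreshIntervalSec(ttlSec) {
  return Math.max(30, ttlSec / 3 - 60);
}

console.log(refreshIntervalSec(1800)); // 540 (9 minutes, the default TTL)
console.log(refreshIntervalSec(3600)); // 1140 (19 minutes)
console.log(refreshIntervalSec(120));  // 30 (the floor applies)
```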
138
+
139
+ If the MCP call still fails with HTTP 401 — for example, the cached token expired in the brief window between the last refresh and the request — the server transparently re-issues a JWT and retries the call **once**, but only when the target URL points to the same server (host/port match `webServer.{host,port}`, with `localhost`/`127.0.0.1`/`::1`/`0.0.0.0` treated as equivalent) and the cached header was a `Bearer …` token. This means the user typically does not see a 401 even if a request races against TTL expiry.
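The conditions for the one-time retry can be sketched as below. This is a hypothetical illustration of the rules as described (401 status, cached `Bearer` header, same host/port with loopback aliases treated as equal), not the server's actual implementation; `isSameServer` and `shouldRetryOn401` are illustrative names.

```javascript
// Hypothetical sketch of the "same server" check that gates the one-time 401 retry.
// These aliases are treated as equivalent, per the description above.
const LOCAL_ALIASES = new Set(['localhost', '127.0.0.1', '::1', '0.0.0.0']);

function normalizeHost(host) {
  // URL.hostname keeps brackets around IPv6 literals, e.g. "[::1]".
  const h = String(host).toLowerCase().replace(/^\[|\]$/g, '');
  return LOCAL_ALIASES.has(h) ? 'localhost' : h;
}

function isSameServer(targetUrl, webServer) {
  const url = new URL(targetUrl);
  const port = url.port || (url.protocol === 'https:' ? '443' : '80'); // default ports are omitted by URL
  return normalizeHost(url.hostname) === normalizeHost(webServer.host)
    && Number(port) === Number(webServer.port);
}

function shouldRetryOn401(status, targetUrl, cachedHeader, webServer) {
  return status === 401
    && typeof cachedHeader === 'string' && cachedHeader.startsWith('Bearer ')
    && isSameServer(targetUrl, webServer);
}
```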
140
+
141
+ **Tuning**:
142
+ - Shorter `tokenTTLSec` → more frequent refresh requests but smaller window of exposure if a token leaks.
143
+ - Longer `tokenTTLSec` → fewer refreshes; useful for very long-running sessions.
144
+ - Headless clients (the `headless-chat.js` wrapper, custom curl scripts) may either rely on the 401-retry path or, for long-running scripts, mint their own JWT via `node scripts/generate-jwt.js` with an appropriate TTL — Agent Tester does not refresh tokens on behalf of headless clients.
145
+
146
+ **Login request body:**
147
+
148
+ ```jsonc
149
+ // Token-based (permanent token or JWT)
150
+ { "token": "my-secret-token" }
151
+
152
+ // Basic auth
153
+ { "username": "admin", "password": "secret" }
154
+ ```
155
+
156
+ ### Headless Client Example
157
+
158
+ ```bash
159
+ # Access Agent Tester API with token (no login needed)
160
+ curl -H "Authorization: Bearer my-secret-token" http://localhost:9876/agent-tester/api/mcp/status
161
+
162
+ # Headless test with token
163
+ curl -X POST http://localhost:9876/agent-tester/api/chat/test \
164
+ -H "Authorization: Bearer my-secret-token" \
165
+ -H "Content-Type: application/json" \
166
+ -d '{"message":"Hello","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
167
+ ```
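The same headless call can be made from a Node 18+ script using the global `fetch`. The sketch below is hypothetical: `buildChatTestRequest` and `chatTest` are illustrative names, and the base URL and token are the placeholder values from the curl example. It uses only the endpoint, body fields and query parameters documented in this file.

```javascript
// Hypothetical Node 18+ headless client for POST /agent-tester/api/chat/test.
function buildChatTestRequest({ baseUrl, token, message, mcpUrl, sessionId, verbose = false, maxResultChars }) {
  const url = new URL('/agent-tester/api/chat/test', baseUrl);
  if (verbose) url.searchParams.set('verbose', 'true');
  if (maxResultChars) url.searchParams.set('maxResultChars', String(maxResultChars));
  const body = { message, mcpConfig: { url: mcpUrl, transport: 'http' } };
  if (sessionId) body.sessionId = sessionId; // omit to start a fresh conversation
  const headers = { 'Content-Type': 'application/json' };
  if (token) headers.Authorization = `Bearer ${token}`;
  return { url: url.toString(), init: { method: 'POST', headers, body: JSON.stringify(body) } };
}

async function chatTest(options) {
  const { url, init } = buildChatTestRequest(options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`); // 404 may mean the Agent Tester is disabled
  return res.json(); // { message, sessionId, trace }
}
```

For example, `chatTest({ baseUrl: 'http://localhost:9876', token: 'my-secret-token', message: 'Hello', mcpUrl: 'http://localhost:9876/mcp' }).then((r) => console.log(r.trace.tools_used))`.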
168
+
169
+ ### Windows Encoding Note (curl + Cyrillic / Non-ASCII)
170
+
171
+ On Windows, curl's `-d` flag may corrupt non-ASCII characters (e.g. Cyrillic) because the shell passes the argument bytes in the system's legacy codepage (for example CP1251 on Russian-locale systems) rather than UTF-8. The LLM then receives garbled text and propagates it into tool arguments.
172
+
173
+ **Symptom:** tool call arguments contain mojibake like `п�?п�?п�?п�?п�?` instead of readable Russian text.
174
+
175
+ **Fix:** write the JSON body to a file (UTF-8) and use `--data-binary @file`:
176
+
177
+ ```bash
178
+ # 1. Write the request JSON to a UTF-8 file (if you create it in an editor instead, save it as UTF-8)
179
+ cat > tmp-request.json << 'EOF'
180
+ {"message":"Отправь письмо на user@example.com с темой \"Тест\"","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}
181
+ EOF
182
+
183
+ # 2. Send with --data-binary to preserve UTF-8 encoding
184
+ curl -X POST http://localhost:9876/agent-tester/api/chat/test \
185
+ -H "Content-Type: application/json; charset=utf-8" \
186
+ --data-binary @tmp-request.json
187
+ ```
188
+
189
+ This is only needed when running curl from a Windows shell with non-ASCII text. Linux/macOS terminals use UTF-8 by default and are not affected.
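Another way to avoid the shell codepage entirely is to build the body in a script, where strings are Unicode and an explicit `'utf8'` encoding produces correct bytes. This is a hypothetical sketch; `requestBodyBytes` and `writeRequestFile` are illustrative names.

```javascript
// Hypothetical helpers: serialize the request body as UTF-8 explicitly,
// so no shell codepage is involved.
function requestBodyBytes(message, mcpUrl) {
  const body = { message, mcpConfig: { url: mcpUrl, transport: 'http' } };
  return Buffer.from(JSON.stringify(body), 'utf8');
}

function writeRequestFile(path, message, mcpUrl) {
  const fs = require('node:fs'); // loaded lazily; only needed when writing a file
  fs.writeFileSync(path, requestBodyBytes(message, mcpUrl)); // then: curl --data-binary @path
}
```

The same bytes can also be passed directly as the `body` of a `fetch` call with `Content-Type: application/json; charset=utf-8`, skipping curl altogether.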
190
+
191
+ ## Disabled State
192
+
193
+ When `agentTester.enabled` is `false` (or not set), all `/agent-tester/*` endpoints — including the Headless API — return HTTP 404:
194
+
195
+ ```json
196
+ {
197
+   "error": "Not found"
198
+ }
199
+ ```
200
+
201
+ This prevents disclosing the existence of the Agent Tester to external users.
202
+
203
+ To enable it, use one of the following:
204
+
205
+ - **ENV**: `AGENT_TESTER_ENABLED=true`
206
+ - **Config file** `config/local.yaml`:
207
+   ```yaml
208
+   agentTester:
209
+     enabled: true
210
+   ```
211
+
212
+ ### Hiding Footer Link Without Disabling Tester
213
+
214
+ To keep the Agent Tester functional (accessible directly by URL) but hide the link from the home page footer, use `showFooterLink: false`:
215
+
216
+ ```yaml
217
+ agentTester:
218
+   enabled: true
219
+   showFooterLink: false # Tester works, but footer link is hidden
220
+ ```
221
+
222
+ When `showFooterLink` is omitted or `true` (default), the footer link is shown as usual.
223
+
224
+ **For Claude Code**: When you receive a 404 response on `/agent-tester/*`, the Agent Tester may be disabled. Inform the developer and suggest enabling it via `agentTester.enabled: true` in `config/local.yaml` or ENV `AGENT_TESTER_ENABLED=true`. The developer decides whether to turn it on. If the developer's prompt or instructions explicitly say not to mention Agent Tester enablement, respect that and stay silent.
225
+
226
+ ## LLM Availability Check
227
+
228
+ Before using the Agent Tester or Headless API, verify that the OpenAI API key is configured and works:
229
+
230
+ ```bash
231
+ npm run check-llm
232
+ ```
233
+
234
+ The script checks that `agentTester.openAi.apiKey` is set and makes a minimal `chat.completions.create` request (max_tokens: 1) to verify the key works. Prints `OK` on success, `FAIL: <reason>` on failure. Exit code `0` = success, `1` = failure.
235
+
236
+ For custom OpenAI-compatible endpoints where `gpt-4o-mini` doesn't exist, pass the model name:
237
+
238
+ ```bash
239
+ npm run check-llm -- my-custom-model
240
+ ```
241
+
242
+ **For Claude Code**: When the development prompt or instructions mention testing with the Headless API or Agent Tester, run `npm run check-llm` before starting any Agent Tester work. If the script exits with a non-zero code, inform the developer about the issue and ask them to fix the configuration before proceeding.
243
+
244
+ ## Headless API Reference
245
+
246
+ ### Connection Verification
247
+
248
+ ```
249
+ GET /agent-tester/api/mcp/status
250
+ ```
251
+
252
+ Returns connection state and all available tools without going through the UI:
253
+
254
+ ```json
255
+ {
256
+   "connected": true,
257
+   "servers": [
258
+     {
259
+       "name": "localhost9876",
260
+       "url": "http://localhost:9876/mcp",
261
+       "transport": "http",
262
+       "tools": [
263
+         { "name": "get_currency_rate", "description": "Get current cross-rate...", "inputSchema": {} }
264
+       ],
265
+       "toolCount": 1
266
+     }
267
+   ],
268
+   "totalTools": 1
269
+ }
270
+ ```
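A script can turn this payload into a pass/fail check before sending any chat messages. The sketch below is hypothetical (`checkStatus` is an illustrative name) and relies only on the fields shown in the example response.

```javascript
// Hypothetical check on the /agent-tester/api/mcp/status payload:
// confirms the connection and that every expected tool is registered.
function checkStatus(status, expectedTools) {
  const registered = new Set(
    (status.servers ?? []).flatMap((server) => (server.tools ?? []).map((tool) => tool.name)),
  );
  const missing = expectedTools.filter((name) => !registered.has(name));
  return {
    ok: status.connected === true && missing.length === 0,
    totalTools: status.totalTools,
    missing,
  };
}
```

A missing tool at this stage usually points to a registration or build problem (Phase 2), not to the agent's behavior.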
271
+
272
+ ### Headless Chat Test
273
+
274
+ ```
275
+ POST /agent-tester/api/chat/test
276
+ ```
277
+
278
+ Same request body as `POST /api/chat/message`, but returns a **structured trace** of all intermediate steps.
279
+
280
+ #### Request Body
281
+
282
+ ```json
283
+ {
284
+   "message": "What is the exchange rate of EUR to USD?",
285
+   "mcpConfig": {
286
+     "url": "http://localhost:9876/mcp",
287
+     "transport": "http",
288
+     "headers": { "Authorization": "Bearer <token>" }
289
+   },
290
+   "sessionId": "optional-session-id",
291
+   "agentPrompt": "optional agent prompt override",
292
+   "customPrompt": "optional additional instructions appended after agentPrompt",
293
+   "modelConfig": {
294
+     "model": "gpt-4o",
295
+     "temperature": 0.3,
296
+     "maxTokens": 4096,
297
+     "maxTurns": 10
298
+   }
299
+ }
300
+ ```
301
+
302
+ Only `message` is required. `mcpConfig` is required for tool calls.
303
+
304
+ | Field | Required | Description |
305
+ |-------|----------|-------------|
306
+ | `message` | yes | User message to send to the agent |
307
+ | `mcpConfig` | no | MCP server connection config (required for tool calls) |
308
+ | `sessionId` | no | Session ID for multi-turn conversations; omit to start fresh |
309
+ | `agentPrompt` | no | Agent prompt to send to the LLM as the system prompt. When provided, **replaces** the MCP server's `agent_prompt`. When omitted, the MCP server's `agent_prompt` is used (if available), otherwise a built-in default |
310
+ | `customPrompt` | no | Additional instructions appended after `agentPrompt`. Use for per-request modifiers without replacing the main prompt |
311
+ | `modelConfig` | no | LLM model settings (model name, temperature, maxTokens, maxTurns) |
312
+
313
+ #### Brief Response (default)
314
+
315
+ ```json
316
+ {
317
+   "message": "The EUR/USD rate is 1.0847",
317
+   "sessionId": "abc-123",
318
+   "trace": {
319
+     "system_prompt_sent": "You are a currency assistant...\n\nBe concise.",
320
+     "turns": [
321
+       {
322
+         "turn": 1,
323
+         "tool_calls": [
324
+           { "name": "get_currency_rate", "arguments": { "quoteCurrency": "EUR", "baseCurrency": "USD" } }
325
+         ],
326
+         "tool_results": [
327
+           { "name": "get_currency_rate", "result": { "symbol": "EURUSD", "rate": 1.0847 }, "duration_ms": 230 }
328
+         ]
329
+       }
330
+     ],
331
+     "total_turns": 2,
332
+     "total_duration_ms": 1850,
333
+     "tools_used": ["get_currency_rate"]
334
+   }
336
+ }
337
+ ```
338
+
339
+ The `system_prompt_sent` field contains the **final system prompt** that was sent to the LLM. Use it to verify exactly what the LLM received — especially when iterating on agent prompt variations.
340
+
341
+ Brief mode shows the tool interaction chain: which tools were called, with what arguments, and what they returned. It omits LLM internals such as per-turn `finish_reason` and token usage.
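Checking a brief trace ("correct tool? correct args?") is easy to script. The helper below is a hypothetical sketch that uses only the `turns[].tool_calls[]` fields from the example response; it compares arguments shallowly with strict equality, so nested argument objects would need a deep comparison.

```javascript
// Hypothetical helper for brief-mode traces: find the first call to `toolName`
// whose arguments contain every key/value in `expectedArgs` (shallow comparison).
function findToolCall(trace, toolName, expectedArgs = {}) {
  for (const turn of trace.turns ?? []) {
    for (const call of turn.tool_calls ?? []) {
      if (call.name !== toolName) continue;
      const argsMatch = Object.entries(expectedArgs)
        .every(([key, value]) => call.arguments?.[key] === value);
      if (argsMatch) return { turn: turn.turn, call };
    }
  }
  return null; // tool not called, or called with different arguments
}
```

A `null` result is the usual signal to re-run the same message with `?verbose=true`.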
342
+
343
+ #### Verbose Response
344
+
345
+ ```
346
+ POST /agent-tester/api/chat/test?verbose=true
347
+ ```
348
+
349
+ Adds per-turn LLM request/response details:
350
+
351
+ ```json
352
+ {
353
+   "turns": [
354
+     {
355
+       "turn": 1,
356
+       "llm_request": { "model": "gpt-4o", "messages_count": 3 },
357
+       "llm_response": {
358
+         "finish_reason": "tool_calls",
359
+         "content": null,
360
+         "usage": { "prompt_tokens": 450, "completion_tokens": 32, "total_tokens": 482 }
361
+       },
362
+       "tool_calls": [...],
363
+       "tool_results": [...]
364
+     }
365
+   ]
366
+ }
367
+ ```
368
+
369
+ Use verbose mode when:
370
+ - The agent doesn't call the expected tool and the brief trace doesn't explain why
371
+ - The agent loops without resolving (check `finish_reason`)
372
+ - Token usage is unexpectedly high
373
+ - The response is empty or unexpected
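The checks in this list can be partly automated against the verbose `turns` array. The sketch below is hypothetical: `diagnoseVerbose` is an illustrative name, and the `tokenBudget` threshold is an arbitrary placeholder, not a documented limit.

```javascript
// Hypothetical triage of a verbose trace using the per-turn fields shown above.
function diagnoseVerbose(turns, { tokenBudget = 20000 } = {}) {
  const findings = [];
  const calledTools = turns.some((t) => (t.tool_calls ?? []).length > 0);
  if (!calledTools && turns[0]?.llm_response?.finish_reason === 'stop') {
    findings.push('LLM answered without calling a tool (finish_reason "stop" on turn 1)');
  }
  const last = turns[turns.length - 1];
  if (last?.llm_response?.finish_reason === 'tool_calls') {
    findings.push('Last turn still requested tools: possible loop or maxTurns reached');
  }
  if (turns.every((t) => t.llm_response?.content == null)) {
    findings.push('No turn produced text content (empty response)');
  }
  const totalTokens = turns.reduce((sum, t) => sum + (t.llm_response?.usage?.total_tokens ?? 0), 0);
  if (totalTokens > tokenBudget) findings.push(`High token usage: ${totalTokens}`);
  return findings;
}
```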
374
+
375
+ #### Size Limit Overrides
376
+
377
+ ```
378
+ POST /agent-tester/api/chat/test?maxResultChars=8000&maxTraceChars=100000
379
+ ```
380
+
381
+ | Parameter | Default | Description |
382
+ |-----------|---------|-------------|
383
+ | `maxResultChars` | 4000 | Max characters per tool result in trace |
384
+ | `maxTraceChars` | 50000 | Max total trace size; older turns are collapsed to summaries when exceeded |
385
+
386
+ ### Prompt Assembly
387
+
388
+ The system prompt sent to the LLM is resolved in priority order; the first available value wins:
389
+
390
+ ```
391
+ request.agentPrompt → session.agentPrompt → MCP server's agent_prompt → built-in default
392
+ ```
393
+
394
+ If `customPrompt` is provided, it is appended after the resolved prompt.
395
+
396
+ The final result is sent as `{ role: "system" }` to the LLM and returned in the trace as `system_prompt_sent`.
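The resolution order can be sketched in a few lines. This is a hypothetical illustration, not the server's code. Two details are assumptions: empty strings count as "not available", and `customPrompt` is joined with a blank line (inferred from the `system_prompt_sent` example in the brief response).

```javascript
// Hypothetical sketch of the documented order:
// request.agentPrompt -> session.agentPrompt -> server agent_prompt -> built-in default,
// then customPrompt appended (blank-line separator assumed).
function resolveSystemPrompt({ requestPrompt, sessionPrompt, serverPrompt, defaultPrompt, customPrompt }) {
  const base = [requestPrompt, sessionPrompt, serverPrompt, defaultPrompt]
    .find((p) => typeof p === 'string' && p.trim() !== '') ?? '';
  return customPrompt && customPrompt.trim() !== '' ? `${base}\n\n${customPrompt}` : base;
}

console.log(resolveSystemPrompt({ serverPrompt: 'You are a currency assistant.', customPrompt: 'Be concise.' }));
```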
397
+
398
+ **Key principle:** when `agentPrompt` is passed in the request, it **replaces** the MCP server's `agent_prompt` entirely. This enables the iterative prompt refinement workflow:
399
+
400
+ 1. Read the current `AGENT_PROMPT` from `src/prompts/agent-prompt.ts`
401
+ 2. Send it as `agentPrompt` in the headless request
402
+ 3. Evaluate the agent's response and trace
403
+ 4. Modify the prompt, send again
404
+ 5. When satisfied, write the best variant back to `src/prompts/agent-prompt.ts`
405
+
406
+ ```bash
407
+ # Test current prompt
408
+ curl -X POST http://localhost:9876/agent-tester/api/chat/test \
409
+ -H "Content-Type: application/json" \
410
+ -d '{"message":"Get EUR/USD rate","agentPrompt":"You are a concise currency assistant. Use tools, reply in one sentence.","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
411
+
412
+ # Try a different variation
413
+ curl -X POST http://localhost:9876/agent-tester/api/chat/test \
414
+ -H "Content-Type: application/json" \
415
+ -d '{"message":"Get EUR/USD rate","agentPrompt":"You are a financial analyst. Explain rates with market context and trends.","mcpConfig":{"url":"http://localhost:9876/mcp","transport":"http"}}'
416
+ ```
417
+
418
+ Compare `system_prompt_sent` and agent responses between variations to find the optimal prompt. When omitting `agentPrompt`, the MCP server's own `agent_prompt` is used automatically — this tests the currently deployed prompt as-is.
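The comparison loop can be scripted instead of issuing curl commands by hand. In this hypothetical sketch, `send` is any function that POSTs a body to `/agent-tester/api/chat/test` and resolves to the parsed JSON response (for example, one built on `fetch`). It is injected so the loop stays independent of transport and auth details.

```javascript
// Hypothetical loop over agentPrompt variants for the same user message.
async function comparePromptVariants(message, variants, mcpConfig, send) {
  const results = [];
  for (const agentPrompt of variants) {
    // No sessionId: every variant starts a fresh conversation.
    const res = await send({ message, agentPrompt, mcpConfig });
    results.push({
      agentPrompt,
      systemPromptSent: res.trace.system_prompt_sent,
      toolsUsed: res.trace.tools_used,
      reply: res.message,
    });
  }
  return results;
}
```

Requests run sequentially so results stay in variant order and LLM load stays low; `Promise.all` would work too if ordering and rate limits are not a concern.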
419
+
420
+ ### Sessions
421
+
422
+ The headless API shares sessions with the chat UI. To start a fresh conversation, omit `sessionId`. To continue an existing conversation, pass `sessionId` from a previous response.
423
+
424
+ ## Structured JSON Logging (`agentTester.logJson`)
425
+
426
+ When `agentTester.logJson` is `true`, each agent event is emitted as a single-line JSON object on stdout — useful for real-time monitoring, debugging, and log aggregation.
427
+
428
+ Enable via config, CLI flag, or environment variable:
429
+
430
+ ```yaml
431
+ # config/local.yaml
432
+ agentTester:
433
+   logJson: true
434
+ ```
435
+
436
+ ```bash
437
+ npm start -- --log-json
438
+ # or
439
+ AGENT_TESTER_LOG_JSON=true npm start
440
+ ```
441
+
442
+ Event types emitted:
443
+
444
+ ```
445
+ {"event":"tool_call","name":"get_currency_rate","arguments":{"quoteCurrency":"EUR"},"timestamp":"2025-08-15T14:32:00.000Z"}
446
+ {"event":"tool_result","name":"get_currency_rate","result":{"rate":1.0847},"duration_ms":230,"timestamp":"2025-08-15T14:32:00.230Z"}
447
+ {"event":"llm_response","turn":2,"finish_reason":"stop","tool_calls":[],"has_content":true,"timestamp":"2025-08-15T14:32:01.500Z"}
448
+ {"event":"response","message":"The EUR/USD rate is 1.0847","tools_used":["get_currency_rate"],"duration_ms":1850}
449
+ ```
450
+
451
+ **Default mode** (without `--log-json`) keeps the colored text logs for human debugging. The flag affects only agent tester events — other server logs (startup, auth, MCP protocol) continue in their normal format.
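Because agent events share stdout with regular server logs, a consumer has to pick them out. The hypothetical filter below keeps only lines that parse as JSON objects with a string `event` field and skips everything else.

```javascript
// Hypothetical filter for mixed stdout: agent events are single-line JSON objects
// with an "event" field; other server logs keep their normal format and are skipped.
function parseAgentEvents(stdoutText) {
  const events = [];
  for (const line of stdoutText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{')) continue;
    try {
      const obj = JSON.parse(trimmed);
      if (obj && typeof obj.event === 'string') events.push(obj);
    } catch {
      // Looked like JSON but was not; ignore the line.
    }
  }
  return events;
}
```

For example, `parseAgentEvents(output).filter((e) => e.event === 'tool_call')` lists the tool calls from a captured `npm start -- --log-json` session.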
452
+
453
+ ## Automated Testing with Claude Code
454
+
455
+ The Headless API is designed for CLI automation tools like Claude Code. The typical automated testing workflow:
456
+
457
+ 0. Verify LLM availability: `npm run check-llm` (exit 0 = ready, non-zero = fix config first)
458
+ 1. Build and start the server: `npm run cb && npm start`
459
+ 2. Verify tools: `GET /agent-tester/api/mcp/status`
460
+ 3. Send test messages: `POST /agent-tester/api/chat/test`
461
+ 4. Analyze trace: correct tool? correct args? expected result?
462
+ 5. If unclear: retry with `?verbose=true`
463
+ 6. If issue found: fix code, rebuild, restart, re-test
464
+ 7. Maintain a testing log at `claudedocs/test-log.md`
465
+
466
+ ### Brief vs Verbose Strategy
467
+
468
+ **Default to brief mode.** The brief trace covers most debugging scenarios:
469
+ - Was the correct tool called?
470
+ - Were the arguments correct?
471
+ - Did the tool return the expected result?
472
+ - How many turns did the agent take?
473
+
474
+ **Switch to verbose** only when the brief trace doesn't explain the behavior:
475
+ - Tool was never called (check `finish_reason` — was it `stop` instead of `tool_calls`?)
476
+ - Wrong tool was called (check if the tool description is ambiguous)
477
+ - Agent loops (check per-turn `finish_reason` and token usage)
478
+ - Empty response (check if `content` is null across all turns)
479
+
480
+ ## Agent Tester Chat UI
481
+
482
+ The Agent Tester also provides a web UI at `/agent-tester` for interactive testing. The UI auto-connects to the local MCP server and auto-fills auth headers if configured.
483
+
484
+ The chat UI uses `POST /api/chat/message` (which returns only the final response). The headless API uses `POST /api/chat/test` (which returns the response plus full trace data). Both share the same underlying agent logic and session storage.
485
+
486
+ ## UI Test Selectors (`data-testid`)
487
+
488
+ For UI automation (Playwright, Cypress, Selenium) the Agent Tester page is annotated with stable `data-testid` attributes. Prefer these over CSS classes, DOM IDs, or label text — they are the documented contract and won't change with styling or copy edits.
489
+
490
+ ### Naming Convention
491
+
492
+ All selectors use the `at-` prefix (short for "agent tester") in kebab-case:
493
+
494
+ ```
495
+ at-<area>-<element>[-<modifier>]
496
+ ```
497
+
498
+ Examples: `at-auth-token-input`, `at-server-url`, `at-message-user`, `at-toast-success`.
499
+
500
+ Dynamic elements that map 1:1 to runtime data append the runtime key:
501
+
502
+ ```
503
+ at-header-row-<headerName>     e.g. at-header-row-Authorization
503
+ at-header-input-<headerName>   e.g. at-header-input-X-Session-Id
504
+ at-message-<sender>            e.g. at-message-user, at-message-assistant
505
+ at-toast-<type>                e.g. at-toast-success, at-toast-error
507
+ ```
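Test code can build ids from the convention instead of hard-coding strings. These helpers are hypothetical. `byTestId` produces a plain CSS attribute selector for Cypress `cy.get()` or `document.querySelector()`; Playwright's `page.getByTestId()` takes the bare id.

```javascript
// Hypothetical helpers following the at-<area>-<element>[-<modifier>] convention.
function testId(...parts) {
  return ['at', ...parts].join('-');
}

// CSS attribute selector for the given test id.
function byTestId(id) {
  return `[data-testid="${id}"]`;
}

console.log(testId('header', 'row', 'Authorization')); // at-header-row-Authorization
console.log(byTestId(testId('toast', 'error')));       // [data-testid="at-toast-error"]
```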
508
+
509
+ ### Selector Reference
510
+
511
+ **Auth overlay (shown when `agentTester.useAuth: true`)**
512
+
513
+ | testid | Element |
514
+ |---|---|
515
+ | `at-auth-overlay` | Root login overlay container |
516
+ | `at-auth-tabs` | Tab switcher (only rendered when multiple methods configured) |
517
+ | `at-auth-tab-token` | "Token" tab button |
518
+ | `at-auth-tab-basic` | "Login" tab button |
519
+ | `at-auth-token-form` | Token login form |
520
+ | `at-auth-token-input` | Token input field |
521
+ | `at-auth-token-submit` | Token submit button |
522
+ | `at-auth-basic-form` | Basic auth form |
523
+ | `at-auth-username` | Username input |
524
+ | `at-auth-password` | Password input |
525
+ | `at-auth-basic-submit` | Basic submit button |
526
+ | `at-auth-error` | Error message container |
527
+
528
+ **App shell**
529
+
530
+ | testid | Element |
531
+ |---|---|
532
+ | `at-app` | Root app container (hidden until authenticated) |
533
+ | `at-sidebar` | Sidebar (configuration panel) |
534
+ | `at-main` | Main chat area |
535
+ | `at-chat-header` | Chat header bar |
536
+
537
+ **Sidebar — connection form**
538
+
539
+ | testid | Element |
540
+ |---|---|
541
+ | `at-connection-form` | MCP connection form |
542
+ | `at-server-url` | MCP server URL input |
543
+ | `at-server-url-dropdown` | Saved URLs dropdown toggle |
544
+ | `at-server-url-dropdown-list` | Saved URLs dropdown panel |
545
+ | `at-server-url-add-new` | "Add new URL" menu item |
546
+ | `at-saved-urls-list` | Container for saved URL items |
547
+ | `at-saved-url-item` | Each saved URL row (dynamic) |
548
+ | `at-saved-url-text` | Clickable URL text within a row |
549
+ | `at-saved-url-delete` | Delete button for a saved URL |
550
+ | `at-transport` | Transport `<select>` (http / sse) |
551
+ | `at-connect-btn` | Connect button |
552
+ | `at-connected-servers` | Connection status bar container |
553
+ | `at-server-status-row` | Status row (dynamic, rendered after connect attempt) |
554
+ | `at-server-status-connected` | "X tools connected" badge |
555
+ | `at-server-status-disconnected` | "Disconnected" badge |
556
+ | `at-disconnect-btn` | Disconnect button |
557
+ | `at-reconnect-btn` | Reconnect button |
558
+
559
+ **Sidebar HTTP headers section**
560
+
561
+ | testid | Element |
562
+ |---|---|
563
+ | `at-headers-section` | Headers section container |
564
+ | `at-dynamic-headers` | Headers list container |
565
+ | `at-header-row-<name>` | Row for a specific header (e.g. `at-header-row-Authorization`) |
566
+ | `at-header-input-<name>` | Input for a specific header value |
567
+
568
+ **Sidebar LLM settings**
569
+
570
+ The sidebar shows only the current model name (read-only) and a gear button. All LLM parameters (Base URL, API Key, Model Name, Temperature, Max Tokens, Max Turns, Limit (chars)) are edited in the LLM Settings modal that this button opens, and are persisted in `localStorage['mcpAgentLlmSettings']`. If `agentTester.openAi.exposeToClient` is `true` in config, the server sends `baseURL` and `apiKey` in the `llmDefaults` field of `GET /agent-tester/api/config`, and the UI pre-fills them into localStorage on first open. **Security note:** enable `exposeToClient` only when the tester is protected by `useAuth: true` or deployed in a trusted network. When the effective `apiKey` is empty, a red "API Key is not set" warning is shown below the model name.
571
+
572
+ | testid | Element |
573
+ |---|---|
574
+ | `at-model-section` | Model section container |
575
+ | `at-model-display` | Read-only current model name |
576
+ | `at-llm-settings-btn` | Gear button that opens the LLM Settings modal |
577
+ | `at-api-key-warning` | "API Key is not set" warning (visible only when `apiKey` is empty) |
578
+ | `at-llm-modal` | LLM Settings modal overlay |
579
+ | `at-llm-modal-close` | Modal close (×) button |
580
+ | `at-llm-modal-cancel` | Modal Cancel button |
581
+ | `at-llm-modal-save` | Modal Save button |
582
+ | `at-llm-base-url` | Base URL input (optional — empty means OpenAI default) |
583
+ | `at-llm-api-key` | API Key input (password field) |
584
+ | `at-llm-api-key-toggle` | Show/hide API key visibility toggle |
585
+ | `at-llm-model-name` | Model Name input (editable combobox) |
586
+ | `at-llm-model-dropdown-toggle` | Model dropdown arrow button |
587
+ | `at-llm-model-dropdown-list` | Model dropdown list (preset models) |
588
+ | `at-llm-model-option-<name>` | Individual model option inside the list |
589
+ | `at-llm-temperature` | Temperature input |
590
+ | `at-llm-max-tokens` | Max tokens input |
591
+ | `at-llm-max-turns` | Max turns input |
592
+ | `at-llm-limit-chars` | Tool result char limit input |
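Every field in the modal has its own testid. A test can therefore drive the modal from a plain settings object instead of hard-coding one `fill()` call per field. The sketch below uses setting keys of its own choosing (`baseURL`, `model`, `maxTokens`, and so on). Only the testids come from the table. `LLM_FIELD_TEST_IDS` and `llmModalFillSteps` are hypothetical helpers.

```javascript
// Hypothetical helper: maps a settings object onto the LLM Settings modal
// testids. The keys on the left are names chosen for this sketch; only the
// testids on the right come from the table above.
const LLM_FIELD_TEST_IDS = {
  baseURL: 'at-llm-base-url',
  apiKey: 'at-llm-api-key',
  model: 'at-llm-model-name',
  temperature: 'at-llm-temperature',
  maxTokens: 'at-llm-max-tokens',
  maxTurns: 'at-llm-max-turns',
  limitChars: 'at-llm-limit-chars',
};

// Returns ordered { testId, value } fill steps for the fields present in
// `settings`. Fields left undefined are skipped, so their inputs stay untouched.
function llmModalFillSteps(settings) {
  const steps = [];
  for (const [key, testId] of Object.entries(LLM_FIELD_TEST_IDS)) {
    if (settings[key] === undefined) continue;
    // Playwright's locator.fill() takes a string, so numbers are stringified.
    steps.push({ testId, value: String(settings[key]) });
  }
  return steps;
}

// Example use inside a Playwright test:
//   await page.getByTestId('at-llm-settings-btn').click();
//   for (const { testId, value } of llmModalFillSteps({ model: 'gpt-4o-mini', temperature: 0 })) {
//     await page.getByTestId(testId).fill(value);
//   }
//   await page.getByTestId('at-llm-modal-save').click();
```

Keeping the mapping in one object means a renamed testid, which the stability guarantee says will come with a changelog entry, needs only a one-line change in the test suite.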
593
+
594
+ **Sidebar — prompts**
595
+
596
+ | testid | Element |
597
+ |---|---|
598
+ | `at-system-prompt` | Agent (system) prompt `<textarea>` |
599
+ | `at-system-prompt-enlarge` | Enlarge button for agent prompt |
600
+ | `at-custom-prompt` | Custom prompt `<textarea>` |
601
+ | `at-custom-prompt-enlarge` | Enlarge button for custom prompt |
602
+
603
+ **Chat header**
604
+
605
+ | testid | Element |
606
+ |---|---|
607
+ | `at-sidebar-toggle-mobile` | Mobile sidebar toggle |
608
+ | `at-default-format` | Default display format `<select>` (HTML / MD) |
609
+ | `at-theme-toggle` | Light/dark theme toggle |
610
+ | `at-clear-chat` | Clear chat button |
611
+ | `at-logout-btn` | Logout button (visible only when `useAuth` is true) |
612
+
613
+ **Chat area**
614
+
615
+ | testid | Element |
616
+ |---|---|
617
+ | `at-chat-messages` | Messages scroll container |
618
+ | `at-welcome-message` | Initial welcome card |
619
+ | `at-message-user` | User message bubble (one per message) |
620
+ | `at-message-assistant` | Assistant message bubble (one per message) |
621
+ | `at-message-text-user` | Inner text element of a user message |
622
+ | `at-message-text-assistant` | Inner text element of an assistant message |
623
+ | `at-message-format-toggle` | HTML/MD format toggle on an assistant message |
624
+ | `at-typing-indicator` | Typing indicator (shown during LLM response) |
625
+ | `at-message-input` | Chat input `<textarea>` |
626
+ | `at-char-count` | Character counter span |
627
+ | `at-send-btn` | Send button |
628
+
629
+ **Modals and overlays**
630
+
631
+ | testid | Element |
632
+ |---|---|
633
+ | `at-prompt-modal` | Prompt enlarge modal overlay |
634
+ | `at-prompt-modal-title` | Modal title |
635
+ | `at-prompt-modal-textarea` | Modal text area |
636
+ | `at-prompt-modal-save` | Apply button |
637
+ | `at-prompt-modal-close` | Close button |
638
+ | `at-loading-overlay` | Global loading overlay |
639
+ | `at-header-tooltip` | Floating header description tooltip |
640
+ | `at-toast-container` | Toast notifications container |
641
+ | `at-toast-success` / `at-toast-error` / `at-toast-warning` / `at-toast-info` | Individual toast; the suffix matches its type (rendered dynamically) |
642
+
643
+ ### Usage Examples
644
+
645
+ **Playwright**
646
+
647
+ ```js
648
+ await page.goto('http://localhost:9876/agent-tester');
649
+
650
+ // Log in (only needed when useAuth is enabled; omit these two steps otherwise)
651
+ await page.getByTestId('at-auth-token-input').fill(process.env.MCP_TOKEN);
652
+ await page.getByTestId('at-auth-token-submit').click();
653
+
654
+ // Wait for main app
655
+ await page.getByTestId('at-app').waitFor();
656
+
657
+ // Send a chat message
658
+ await page.getByTestId('at-message-input').fill('List all tools');
659
+ await page.getByTestId('at-send-btn').click();
660
+
661
+ // Wait for the first assistant reply to render (fails the test on timeout)
662
+ await page.getByTestId('at-message-assistant').first().waitFor();
663
+ ```
664
+
665
+ **Cypress**
666
+
667
+ ```js
668
+ cy.visit('/agent-tester');
669
+ cy.get('[data-testid=at-auth-token-input]').type(Cypress.env('MCP_TOKEN'));
670
+ cy.get('[data-testid=at-auth-token-submit]').click();
671
+ cy.get('[data-testid=at-server-status-connected]').should('be.visible');
672
+ ```
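The Cypress example above uses unquoted attribute values, which works for fixed ids. The dynamic ids (`at-header-row-<name>`, `at-header-input-<name>`, `at-llm-model-option-<name>`) are different, because their `<name>` part may contain characters such as `.` or `:`. Those characters are not valid in an unquoted CSS attribute value, so quoting the value is safer. A minimal sketch follows, with hypothetical helper names `byTestId` and `dynamicTestId`:

```javascript
// Builds a quoted [data-testid="..."] selector. Backslashes and double quotes
// are escaped as CSS string syntax requires, so any id is safe to embed.
function byTestId(id) {
  const escaped = String(id).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `[data-testid="${escaped}"]`;
}

// Hypothetical helper for the documented "<prefix>-<name>" dynamic ids.
function dynamicTestId(prefix, name) {
  return `${prefix}-${name}`;
}

// Example use in Cypress:
//   cy.get(byTestId(dynamicTestId('at-header-row', 'Authorization'))).should('exist');
//   cy.get(byTestId(dynamicTestId('at-llm-model-option', 'gpt-4.1'))).click();
```

Playwright's `page.getByTestId()` takes the raw id string and needs no such quoting. The helper only matters where a CSS selector is built by hand, as with `cy.get()` or `document.querySelector()`.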
673
+
674
+ ### Stability Guarantee
675
+
676
+ These test ids are part of the Agent Tester UI's public contract. Once added, an id is not renamed or removed without a changelog entry, and new elements receive new ids as the UI grows. When authoring tests, prefer `data-testid` over:
677
+
678
+ - DOM `id` attributes (often tied to `<label for>` pairings and liable to collide across scopes)
679
+ - CSS class names (they exist for styling and may be renamed or removed during refactors)
680
+ - Visible text (copy may be localized or edited, which breaks text-based selectors)
681
+ - XPath or positional selectors (brittle to layout changes)