@mastra/mcp-docs-server 1.1.17-alpha.7 → 1.1.17

Files changed (39)
  1. package/.docs/docs/evals/built-in-scorers.md +1 -0
  2. package/.docs/docs/memory/observational-memory.md +49 -4
  3. package/.docs/docs/server/mastra-client.md +17 -0
  4. package/.docs/docs/server/server-adapters.md +15 -1
  5. package/.docs/models/gateways/openrouter.md +1 -1
  6. package/.docs/models/index.md +1 -1
  7. package/.docs/models/providers/bailing.md +1 -1
  8. package/.docs/models/providers/cloudflare-workers-ai.md +4 -3
  9. package/.docs/models/providers/firmware.md +2 -2
  10. package/.docs/models/providers/friendli.md +1 -1
  11. package/.docs/models/providers/github-models.md +1 -1
  12. package/.docs/models/providers/google.md +7 -2
  13. package/.docs/models/providers/groq.md +24 -16
  14. package/.docs/models/providers/huggingface.md +1 -1
  15. package/.docs/models/providers/llmgateway.md +269 -0
  16. package/.docs/models/providers/mistral.md +3 -2
  17. package/.docs/models/providers/nano-gpt.md +3 -1
  18. package/.docs/models/providers/openai.md +2 -1
  19. package/.docs/models/providers/poe.md +3 -1
  20. package/.docs/models/providers/zai-coding-plan.md +3 -2
  21. package/.docs/models/providers/zhipuai-coding-plan.md +3 -2
  22. package/.docs/models/providers.md +1 -0
  23. package/.docs/reference/ai-sdk/handle-chat-stream.md +2 -0
  24. package/.docs/reference/client-js/agents.md +11 -6
  25. package/.docs/reference/client-js/mastra-client.md +1 -1
  26. package/.docs/reference/client-js/memory.md +1 -1
  27. package/.docs/reference/configuration.md +24 -0
  28. package/.docs/reference/core/mastra-model-gateway.md +2 -0
  29. package/.docs/reference/deployer/cloudflare.md +31 -1
  30. package/.docs/reference/evals/run-evals.md +78 -3
  31. package/.docs/reference/evals/scorer-utils.md +188 -0
  32. package/.docs/reference/evals/trajectory-accuracy.md +627 -0
  33. package/.docs/reference/index.md +1 -2
  34. package/.docs/reference/logging/pino-logger.md +58 -0
  35. package/.docs/reference/memory/observational-memory.md +32 -6
  36. package/CHANGELOG.md +44 -0
  37. package/package.json +6 -6
  38. package/.docs/reference/core/getStoredAgentById.md +0 -87
  39. package/.docs/reference/core/listStoredAgents.md +0 -91
@@ -35,7 +35,7 @@ console.log(`Processed ${result.summary.totalItems} items`)
35
35
 
36
36
  **data** (`RunEvalsDataItem[]`): Array of test cases with input data and optional ground truth.
37
37
 
38
- **scorers** (`MastraScorer[] | WorkflowScorerConfig`): Array of scorers for agents, or configuration object for workflows specifying scorers for the workflow and individual steps.
38
+ **scorers** (`MastraScorer[] | AgentScorerConfig | WorkflowScorerConfig`): Scorers to use. A flat array applies all scorers to the raw output. For agents, an \`AgentScorerConfig\` object separates agent-level and trajectory scorers. For workflows, a \`WorkflowScorerConfig\` object specifies scorers for the workflow, individual steps, and trajectory.
39
39
 
40
40
  **targetOptions** (`AgentExecutionOptions | WorkflowRunOptions`): Options forwarded to the target during execution. For agents: options passed to agent.generate() (e.g. maxSteps, modelSettings, instructions). For workflows: options passed to run.start() (e.g. perStep, outputOptions, initialState).
41
41
 
@@ -49,20 +49,32 @@ console.log(`Processed ${result.summary.totalItems} items`)
49
49
 
50
50
  **groundTruth** (`any`): Expected or reference output for comparison during scoring.
51
51
 
52
+ **expectedTrajectory** (`TrajectoryExpectation`): Expected trajectory configuration for trajectory scoring. Includes expected steps, ordering, efficiency budgets, blacklists, and tool failure tolerance. Passed to trajectory scorers as \`run.expectedTrajectory\`. Overrides the static defaults in scorer constructors.
53
+
52
54
  **requestContext** (`RequestContext`): Request Context to pass to the target during execution.
53
55
 
54
56
  **tracingContext** (`TracingContext`): Tracing context for observability and debugging.
55
57
 
56
58
  **startOptions** (`WorkflowRunOptions`): Per-item workflow run options (e.g. initialState, perStep, outputOptions). Merged on top of targetOptions, so per-item values take precedence. Only applicable when the target is a workflow.
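
The precedence rule for `startOptions` can be pictured as an object merge. The sketch below is illustrative only and is not taken from the package: `RunOptions` and `mergeRunOptions` are hypothetical names, and a shallow merge is an assumption, since the documentation does not say whether nested values are merged deeply.

```typescript
// Hypothetical, simplified option shape; the real WorkflowRunOptions has more fields.
type RunOptions = {
  perStep?: boolean
  initialState?: Record<string, unknown>
}

// Assumed shallow merge: keys present in the per-item options replace the shared ones.
function mergeRunOptions(targetOptions: RunOptions, startOptions: RunOptions = {}): RunOptions {
  return { ...targetOptions, ...startOptions }
}

const merged = mergeRunOptions(
  { perStep: false, initialState: { counter: 0 } }, // shared targetOptions
  { initialState: { counter: 5 } }, // per-item startOptions
)
// merged.perStep is inherited from targetOptions; merged.initialState comes from startOptions
```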
57
59
 
60
+ ## Agent scorer configuration
61
+
62
+ For agents, use `AgentScorerConfig` to separate agent-level scorers from trajectory scorers:
63
+
64
+ **agent** (`MastraScorer[]`): Scorers that receive the raw agent output (MastraDBMessage\[]). Use for evaluating response quality, content, etc.
65
+
66
+ **trajectory** (`MastraScorer[]`): Scorers that receive a pre-extracted Trajectory object. When storage is configured, the pipeline extracts a hierarchical trajectory from observability traces (including nested tool calls and model generations). Otherwise, it falls back to extracting tool calls from agent messages.
67
+
58
68
  ## Workflow scorer configuration
59
69
 
60
- For workflows, you can specify scorers at different levels using `WorkflowScorerConfig`:
70
+ For workflows, use `WorkflowScorerConfig` to specify scorers at different levels:
61
71
 
62
- **workflow** (`MastraScorer[]`): Array of scorers to evaluate the entire workflow output.
72
+ **workflow** (`MastraScorer[]`): Scorers to evaluate the entire workflow output.
63
73
 
64
74
  **steps** (`Record<string, MastraScorer[]>`): Object mapping step IDs to arrays of scorers for evaluating individual step outputs.
65
75
 
76
+ **trajectory** (`MastraScorer[]`): Scorers that receive a pre-extracted Trajectory from the workflow execution. When storage is configured, the pipeline extracts a hierarchical trajectory from observability traces (including nested agent runs and tool calls within workflow steps). Otherwise, it falls back to extracting step results from the workflow output.
77
+
66
78
  ## Returns
67
79
 
68
80
  **scores** (`Record<string, any>`): Average scores across all test cases, organized by scorer name.
@@ -105,6 +117,36 @@ const result = await runEvals({
105
117
  })
106
118
  ```
107
119
 
120
+ ### Agent trajectory evaluation
121
+
122
+ Use `AgentScorerConfig` to evaluate both the agent response and its tool-calling trajectory:
123
+
124
+ ```typescript
125
+ import { runEvals } from '@mastra/core/evals'
126
+ import { createTrajectoryAccuracyScorerCode } from '@mastra/evals/scorers/code/trajectory'
127
+
128
+ const trajectoryScorer = createTrajectoryAccuracyScorerCode()
129
+
130
+ const result = await runEvals({
131
+ target: chatAgent,
132
+ data: [
133
+ {
134
+ input: 'What is the weather in London?',
135
+ expectedTrajectory: {
136
+ steps: [{ stepType: 'tool_call', name: 'weatherTool' }],
137
+ },
138
+ },
139
+ ],
140
+ scorers: {
141
+ // agent: [responseQualityScorer], // Optional: add agent-level scorers
142
+ trajectory: [trajectoryScorer],
143
+ },
144
+ })
145
+
146
+ // result.scores.agent — average agent-level scores
147
+ // result.scores.trajectory — average trajectory scores
148
+ ```
149
+
108
150
  ### Agent with `targetOptions`
109
151
 
110
152
  Pass execution options like `maxSteps` or `modelSettings` to customize agent behavior during evaluation:
@@ -149,6 +191,37 @@ const workflowResult = await runEvals({
149
191
  })
150
192
  ```
151
193
 
194
+ ### Workflow trajectory evaluation
195
+
196
+ Add trajectory scoring to workflow evaluations to validate step execution order:
197
+
198
+ ```typescript
199
+ const workflowResult = await runEvals({
200
+ target: myWorkflow,
201
+ data: [
202
+ {
203
+ input: { query: 'Process this data' },
204
+ expectedTrajectory: {
205
+ steps: [
206
+ { stepType: 'workflow_step', name: 'validate' },
207
+ { stepType: 'workflow_step', name: 'process' },
208
+ { stepType: 'workflow_step', name: 'output' },
209
+ ],
210
+ },
211
+ },
212
+ ],
213
+ scorers: {
214
+ workflow: [outputQualityScorer],
215
+ steps: {
216
+ validate: [validationScorer],
217
+ },
218
+ trajectory: [trajectoryScorer],
219
+ },
220
+ })
221
+
222
+ // result.scores.trajectory — workflow trajectory scores
223
+ ```
224
+
152
225
  ### Workflow with per-item `startOptions`
153
226
 
154
227
  Use `startOptions` on individual data items to customize each workflow run. Per-item values take precedence over `targetOptions`:
@@ -175,5 +248,7 @@ const result = await runEvals({
175
248
 
176
249
  - [createScorer()](https://mastra.ai/reference/evals/create-scorer) - Create custom scorers for experiments
177
250
  - [MastraScorer](https://mastra.ai/reference/evals/mastra-scorer) - Learn about scorer structure and methods
251
+ - [Trajectory Accuracy](https://mastra.ai/reference/evals/trajectory-accuracy) - Built-in trajectory evaluation scorers
252
+ - [Scorer Utilities](https://mastra.ai/reference/evals/scorer-utils) - Helper functions for extracting trajectory data
178
253
  - [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers) - Guide to building evaluation logic
179
254
  - [Scorers Overview](https://mastra.ai/docs/evals/overview) - Understanding scorer concepts
@@ -14,9 +14,21 @@ import {
14
14
  extractToolCalls,
15
15
  extractInputMessages,
16
16
  extractAgentResponseMessages,
17
+ compareTrajectories,
18
+ createTrajectoryTestRun,
17
19
  } from '@mastra/evals/scorers/utils'
18
20
  ```
19
21
 
22
+ Trajectory extraction functions are available from `@mastra/core/evals`:
23
+
24
+ ```typescript
25
+ import {
26
+ extractTrajectory,
27
+ extractWorkflowTrajectory,
28
+ extractTrajectoryFromTrace,
29
+ } from '@mastra/core/evals'
30
+ ```
31
+
20
32
  ## Message extraction
21
33
 
22
34
  ### `getAssistantMessageFromRunOutput`
@@ -266,6 +278,182 @@ const result = await myScorer.run({
266
278
  })
267
279
  ```
268
280
 
281
+ ## Trajectory utilities
282
+
283
+ ### `extractTrajectory`
284
+
285
+ Extracts a `Trajectory` from agent output messages (`MastraDBMessage[]`). Converts tool invocations into `ToolCallStep` objects. The `runEvals` pipeline calls this automatically for trajectory scorers — you only need it for direct testing.
286
+
287
+ Available from `@mastra/core/evals`.
288
+
289
+ ```typescript
290
+ import { extractTrajectory } from '@mastra/core/evals'
291
+
292
+ const trajectory = extractTrajectory(agentOutputMessages)
293
+ // trajectory.steps — ToolCallStep[] extracted from toolInvocations
294
+ // trajectory.rawOutput — the original MastraDBMessage[] array
295
+ ```
296
+
297
+ **Returns:** `Trajectory` — Contains `steps: TrajectoryStep[]`, `totalDurationMs`, and `rawOutput`.
298
+
299
+ ### `extractWorkflowTrajectory`
300
+
301
+ Extracts a `Trajectory` from workflow step results. Converts `StepResult` records into `WorkflowStepStep` objects, respecting the execution path ordering.
302
+
303
+ Available from `@mastra/core/evals`.
304
+
305
+ ```typescript
306
+ import { extractWorkflowTrajectory } from '@mastra/core/evals'
307
+
308
+ const trajectory = extractWorkflowTrajectory(
309
+ workflowResult.steps, // Record<string, StepResult>
310
+ workflowResult.stepExecutionPath, // string[] (optional)
311
+ )
312
+ // trajectory.steps — WorkflowStepStep[] in execution order
313
+ ```
314
+
315
+ **Returns:** `Trajectory` — Contains `steps: TrajectoryStep[]`, `totalDurationMs`, and `rawWorkflowResult`.
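
To make "respecting the execution path ordering" concrete, here is a minimal sketch of ordering step results by a recorded path. It does not reproduce the library's implementation: `StepResultLike`, `WorkflowStepLike`, and `orderSteps` are hypothetical names, and falling back to the record's key order when no path is given is an assumption.

```typescript
// Hypothetical, simplified shapes for illustration only.
type StepResultLike = { status: 'success' | 'failed'; output?: unknown }
type WorkflowStepLike = { stepType: 'workflow_step'; name: string; output?: unknown }

function orderSteps(
  steps: Record<string, StepResultLike>,
  executionPath?: string[],
): WorkflowStepLike[] {
  // Prefer the recorded execution path; otherwise use the record's key order (assumption).
  const order = executionPath ?? Object.keys(steps)
  const ordered: WorkflowStepLike[] = []
  for (const name of order) {
    const result = steps[name]
    if (result === undefined) continue // path entry without a recorded result
    ordered.push({ stepType: 'workflow_step', name, output: result.output })
  }
  return ordered
}

const orderedSteps = orderSteps(
  { process: { status: 'success', output: 2 }, validate: { status: 'success', output: 1 } },
  ['validate', 'process'],
)
// orderedSteps lists 'validate' before 'process', following the path rather than the key order
```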
316
+
317
+ ### `extractTrajectoryFromTrace`
318
+
319
+ Builds a hierarchical `Trajectory` from observability trace spans (`SpanRecord[]`). Reconstructs the parent-child span tree and maps each span to the appropriate `TrajectoryStep` discriminated union type with nested `children`.
320
+
321
+ This is the preferred extraction method when storage is available. The `runEvals` pipeline calls this automatically when the target's `Mastra` instance has a configured storage backend. It produces richer trajectories than `extractTrajectory` or `extractWorkflowTrajectory` because it captures the full execution tree, including nested agent runs, tool calls, and model generations.
322
+
323
+ Available from `@mastra/core/evals`.
324
+
325
+ ```typescript
326
+ import { extractTrajectoryFromTrace } from '@mastra/core/evals'
327
+
328
+ // After fetching a trace from the observability store
329
+ const traceData = await observabilityStore.getTrace({ traceId })
330
+ const trajectory = extractTrajectoryFromTrace(traceData.spans, rootSpanId)
331
+ // trajectory.steps — hierarchical TrajectoryStep[] with children
332
+ ```
333
+
334
+ **Parameters:**
335
+
336
+ - `spans` (`SpanRecord[]`) — Array of span records from a trace query.
337
+ - `rootSpanId` (`string`, optional) — Span ID to use as the starting point. When omitted, uses spans with no parent.
338
+
339
+ **Returns:** `Trajectory` — Contains `steps: TrajectoryStep[]` with recursive `children` and `totalDurationMs`.
340
+
341
+ #### Span type mapping
342
+
343
+ | Span type | Trajectory step type | Key fields extracted |
344
+ | ---------------------- | ---------------------- | ------------------------------------------------------------- |
345
+ | `TOOL_CALL` | `tool_call` | `toolArgs`, `toolResult`, `success` |
346
+ | `MCP_TOOL_CALL` | `mcp_tool_call` | `toolArgs`, `toolResult`, `mcpServer`, `success` |
347
+ | `MODEL_GENERATION` | `model_generation` | `modelId`, `promptTokens`, `completionTokens`, `finishReason` |
348
+ | `AGENT_RUN` | `agent_run` | `agentId` (from entity ID) |
349
+ | `WORKFLOW_RUN` | `workflow_run` | `workflowId` (from entity ID) |
350
+ | `WORKFLOW_STEP` | `workflow_step` | `output` |
351
+ | `WORKFLOW_CONDITIONAL` | `workflow_conditional` | `conditionCount`, `selectedSteps` |
352
+ | `WORKFLOW_PARALLEL` | `workflow_parallel` | `branchCount`, `parallelSteps` |
353
+ | `WORKFLOW_LOOP` | `workflow_loop` | `loopType`, `totalIterations` |
354
+ | `WORKFLOW_SLEEP` | `workflow_sleep` | `sleepDurationMs`, `sleepType` |
355
+ | `WORKFLOW_WAIT_EVENT` | `workflow_wait_event` | `eventName`, `eventReceived` |
356
+ | `PROCESSOR_RUN` | `processor_run` | `processorId` |
357
+
358
+ Spans with types `GENERIC`, `MODEL_STEP`, `MODEL_CHUNK`, and `WORKFLOW_CONDITIONAL_EVAL` are skipped as noise.
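
The tree reconstruction described for `extractTrajectoryFromTrace` can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `SpanLike`, `StepNode`, and `buildStepTree` are hypothetical names, the step type is derived by lowercasing the span type (which matches the table above), children of a skipped span are assumed to be promoted to its parent, and a given `rootSpanId` is assumed to select the steps beneath that span.

```typescript
// Hypothetical, simplified span shape; the real SpanRecord carries more fields.
type SpanLike = { spanId: string; parentSpanId?: string; spanType: string; name: string }
type StepNode = { stepType: string; name: string; children: StepNode[] }

// Span types the documentation lists as skipped.
const SKIPPED = new Set(['GENERIC', 'MODEL_STEP', 'MODEL_CHUNK', 'WORKFLOW_CONDITIONAL_EVAL'])

function buildStepTree(spans: SpanLike[], rootSpanId?: string): StepNode[] {
  // Group spans by parent ID so children can be looked up directly.
  const byParent = new Map<string | undefined, SpanLike[]>()
  for (const span of spans) {
    const siblings = byParent.get(span.parentSpanId) ?? []
    siblings.push(span)
    byParent.set(span.parentSpanId, siblings)
  }

  const toSteps = (parentId: string | undefined): StepNode[] => {
    const steps: StepNode[] = []
    for (const span of byParent.get(parentId) ?? []) {
      const children = toSteps(span.spanId)
      // Assumption: a skipped span is dropped and its children are promoted.
      if (SKIPPED.has(span.spanType)) steps.push(...children)
      else steps.push({ stepType: span.spanType.toLowerCase(), name: span.name, children })
    }
    return steps
  }

  // Without a root ID, start from spans that have no parent.
  return toSteps(rootSpanId)
}

const tree = buildStepTree([
  { spanId: 'a', spanType: 'AGENT_RUN', name: 'chatAgent' },
  { spanId: 'b', parentSpanId: 'a', spanType: 'MODEL_STEP', name: 'step 0' },
  { spanId: 'c', parentSpanId: 'b', spanType: 'TOOL_CALL', name: 'weatherTool' },
])
// tree holds one agent_run step whose only child is the weatherTool tool_call
```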
359
+
360
+ ### `compareTrajectories`
361
+
362
+ Compares an actual trajectory against an expected trajectory and returns a detailed comparison result. Used internally by `createTrajectoryAccuracyScorerCode`.
363
+
364
+ The `expected` parameter accepts either a `Trajectory` (actual trajectory) or `{ steps: ExpectedStep[] }`. When using `ExpectedStep[]`, you can match by name only, name + stepType, or include data for comparison. See [Expected steps](https://mastra.ai/reference/evals/trajectory-accuracy) for details.
365
+
366
+ ```typescript
367
+ import { compareTrajectories } from '@mastra/evals/scorers/utils'
368
+
369
+ // Using ExpectedStep[] (recommended for expectations)
370
+ // Data fields (e.g. toolArgs) are auto-compared when present on expected steps
371
+ const result = compareTrajectories(
372
+ actualTrajectory,
373
+ { steps: [{ name: 'search' }, { name: 'summarize', stepType: 'tool_call' }] },
374
+ { allowRepeatedSteps: true },
375
+ )
376
+ // result.score — 0.0 to 1.0
377
+ // result.missingSteps — step names not found
378
+ // result.extraSteps — unexpected step names
379
+ // result.outOfOrderSteps — steps found but in wrong order
380
+ ```
381
+
382
+ **Returns:** `TrajectoryComparisonResult`
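
The three list fields in the comparison result can be illustrated with a name-only diff. The sketch below is not the library's algorithm and deliberately computes no score: `diffStepNames` and `NameDiff` are hypothetical names, and the greedy in-order matching is one plausible reading of "steps found but in wrong order".

```typescript
type NameDiff = { missingSteps: string[]; extraSteps: string[]; outOfOrderSteps: string[] }

function diffStepNames(actual: string[], expected: string[]): NameDiff {
  const diff: NameDiff = { missingSteps: [], extraSteps: [], outOfOrderSteps: [] }
  let cursor = 0 // position in `actual` just after the last in-order match
  for (const name of expected) {
    const index = actual.indexOf(name, cursor)
    if (index >= 0) cursor = index + 1 // found at or after the cursor: in order
    else if (actual.indexOf(name) >= 0) diff.outOfOrderSteps.push(name) // present, but too early
    else diff.missingSteps.push(name) // not present at all
  }
  // Anything the run did that the expectation never mentions.
  diff.extraSteps = actual.filter(name => expected.indexOf(name) === -1)
  return diff
}

const nameDiff = diffStepNames(['summarize', 'search', 'log'], ['search', 'summarize'])
// nameDiff.outOfOrderSteps is ['summarize'], nameDiff.extraSteps is ['log'], nothing is missing
```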
383
+
384
+ ### `createTrajectoryTestRun`
385
+
386
+ Creates a test run object for trajectory scorers. Wraps a `Trajectory` into the expected `ScorerRun` format.
387
+
388
+ ```typescript
389
+ import { createTrajectoryTestRun } from '@mastra/evals/scorers/utils'
390
+
391
+ const run = createTrajectoryTestRun({
392
+ steps: [
393
+ { stepType: 'tool_call', name: 'search', toolArgs: { q: 'test' } },
394
+ { stepType: 'tool_call', name: 'summarize' },
395
+ ],
396
+ })
397
+
398
+ const result = await trajectoryScorer.run(run)
399
+ ```
400
+
401
+ ### `checkTrajectoryEfficiency`
402
+
403
+ Evaluates trajectory efficiency against step, token, and duration budgets. Also detects redundant calls (same tool with same arguments).
404
+
405
+ ```typescript
406
+ import { checkTrajectoryEfficiency } from '@mastra/evals/scorers/utils'
407
+
408
+ const result = checkTrajectoryEfficiency(trajectory, {
409
+ maxSteps: 5,
410
+ maxTotalTokens: 2000,
411
+ maxTotalDurationMs: 5000,
412
+ noRedundantCalls: true,
413
+ })
414
+ // result.score — 1.0 if within all budgets, lower with penalties
415
+ // result.redundantCalls — duplicate tool+args combos
416
+ // result.overStepBudget — true if maxSteps exceeded
417
+ // result.overTokenBudget — true if maxTotalTokens exceeded
418
+ // result.overDurationBudget — true if maxTotalDurationMs exceeded
419
+ ```
420
+
421
+ **Returns:** `TrajectoryEfficiencyResult`
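
Redundant-call detection ("same tool with same arguments") might look like the sketch below. It is an assumption-laden illustration rather than the package's logic: `ToolCallLike` and `findRedundantCalls` are hypothetical names, and equality is approximated by comparing `JSON.stringify` output, which is sensitive to key order.

```typescript
type ToolCallLike = { name: string; toolArgs?: Record<string, unknown> }

function findRedundantCalls(calls: ToolCallLike[]): string[] {
  const seen = new Set<string>()
  const redundant = new Set<string>()
  for (const call of calls) {
    // Assumption: two calls are equal when name and serialized arguments match.
    const key = `${call.name}:${JSON.stringify(call.toolArgs ?? {})}`
    if (seen.has(key)) redundant.add(key)
    else seen.add(key)
  }
  return Array.from(redundant)
}

const redundantCalls = findRedundantCalls([
  { name: 'search', toolArgs: { q: 'test' } },
  { name: 'search', toolArgs: { q: 'test' } }, // repeats the first call
  { name: 'search', toolArgs: { q: 'other' } }, // same tool, different arguments
])
// redundantCalls contains a single key for the repeated search call
```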
422
+
423
+ ### `checkTrajectoryBlacklist`
424
+
425
+ Checks whether a trajectory contains forbidden tools or tool call sequences.
426
+
427
+ ```typescript
428
+ import { checkTrajectoryBlacklist } from '@mastra/evals/scorers/utils'
429
+
430
+ const result = checkTrajectoryBlacklist(trajectory, {
431
+ blacklistedTools: ['deleteAll', 'admin-override'],
432
+ blacklistedSequences: [['escalate', 'admin-override']],
433
+ })
434
+ // result.score — 1.0 if no violations, 0.0 if any found
435
+ // result.violatedTools — blacklisted tools that were called
436
+ // result.violatedSequences — blacklisted sequences that were detected
437
+ ```
438
+
439
+ **Returns:** `TrajectoryBlacklistResult`
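
A minimal sketch of the blacklist check, assuming it operates on the ordered list of tool names. `checkBlacklist` and `BlacklistConfig` are hypothetical names, and treating a sequence as violated only when its tools appear back to back is an assumption; the all-or-nothing score follows the comment in the example above.

```typescript
type BlacklistConfig = { blacklistedTools?: string[]; blacklistedSequences?: string[][] }

function checkBlacklist(toolNames: string[], config: BlacklistConfig) {
  // Forbidden tools that appear anywhere in the trajectory.
  const violatedTools = (config.blacklistedTools ?? []).filter(tool => toolNames.indexOf(tool) >= 0)
  // Assumption: a sequence is violated only when its tools appear back to back.
  const violatedSequences = (config.blacklistedSequences ?? []).filter(
    sequence =>
      sequence.length > 0 &&
      toolNames.some((_, start) => sequence.every((tool, offset) => toolNames[start + offset] === tool)),
  )
  const score = violatedTools.length === 0 && violatedSequences.length === 0 ? 1 : 0
  return { score, violatedTools, violatedSequences }
}

const blacklistResult = checkBlacklist(['lookup', 'escalate', 'admin-override'], {
  blacklistedTools: ['deleteAll', 'admin-override'],
  blacklistedSequences: [['escalate', 'admin-override']],
})
// blacklistResult.score is 0: one forbidden tool and one forbidden sequence were found
```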
440
+
441
+ ### `analyzeToolFailures`
442
+
443
+ Detects tool failure patterns including retries, fallbacks, and argument corrections.
444
+
445
+ ```typescript
446
+ import { analyzeToolFailures } from '@mastra/evals/scorers/utils'
447
+
448
+ const result = analyzeToolFailures(trajectory, {
449
+ maxRetriesPerTool: 2,
450
+ })
451
+ // result.score — 1.0 if no failure patterns, lower if patterns detected
452
+ // result.patterns — detected patterns (retry, fallback, arg_correction)
453
+ ```
454
+
455
+ **Returns:** `ToolFailureAnalysisResult`
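
The three pattern names mentioned above (`retry`, `fallback`, `arg_correction`) suggest a classification like the one sketched here. The rules are guesses made for illustration, not the library's definitions: a failed call followed by the same tool with the same arguments is treated as a retry, the same tool with different arguments as an argument correction, and a different tool as a fallback. `CallOutcome`, `FailurePattern`, and `classifyFailures` are hypothetical names.

```typescript
type CallOutcome = { name: string; toolArgs?: Record<string, unknown>; success: boolean }
type FailurePattern = { type: 'retry' | 'fallback' | 'arg_correction'; tool: string }

function classifyFailures(calls: CallOutcome[]): FailurePattern[] {
  const patterns: FailurePattern[] = []
  for (let i = 0; i + 1 < calls.length; i++) {
    const failed = calls[i]
    const next = calls[i + 1]
    if (failed.success) continue // only a failed call can start a pattern
    if (next.name !== failed.name) {
      patterns.push({ type: 'fallback', tool: failed.name })
    } else if (JSON.stringify(next.toolArgs ?? {}) === JSON.stringify(failed.toolArgs ?? {})) {
      patterns.push({ type: 'retry', tool: failed.name })
    } else {
      patterns.push({ type: 'arg_correction', tool: failed.name })
    }
  }
  return patterns
}

const failurePatterns = classifyFailures([
  { name: 'search', toolArgs: { q: 'a' }, success: false },
  { name: 'search', toolArgs: { q: 'a' }, success: false }, // same call again
  { name: 'search', toolArgs: { q: 'b' }, success: true }, // arguments changed
])
// failurePatterns lists a retry followed by an arg_correction, both for 'search'
```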
456
+
269
457
  ## Complete example
270
458
 
271
459
  Here's a complete example showing how to use multiple utilities together: