@mastra/mcp-docs-server 1.1.22-alpha.11 → 1.1.22-alpha.13

@@ -0,0 +1,374 @@
1
+ # AgentBrowser class
2
+
3
+ The `AgentBrowser` class provides deterministic browser automation using the [agent-browser](https://github.com/vercel-labs/agent-browser) library. It uses accessibility tree snapshots and element refs (e.g., `@e5`) for precise, reproducible interactions.
4
+
5
+ Use `AgentBrowser` when you need reliable, deterministic browser automation. For AI-powered interactions using natural language, see [`StagehandBrowser`](https://mastra.ai/reference/browser/stagehand-browser).
6
+
7
+ ## Usage example
8
+
9
+ ```typescript
10
+ import { Agent } from '@mastra/core/agent'
11
+ import { AgentBrowser } from '@mastra/agent-browser'
12
+
13
+ const browser = new AgentBrowser({
14
+ headless: true,
15
+ viewport: { width: 1280, height: 720 },
16
+ scope: 'thread',
17
+ })
18
+
19
+ export const browserAgent = new Agent({
20
+ name: 'browser-agent',
21
+ instructions: `You can browse the web. Use browser_snapshot to see the page structure,
22
+ then interact with elements using their refs (e.g., @e5).`,
23
+ model: 'openai/gpt-5.4',
24
+ browser,
25
+ })
26
+ ```
27
+
28
+ ## Constructor parameters
29
+
30
+ **headless** (`boolean`): Whether to run the browser in headless mode (no visible UI). (Default: `true`)
31
+
32
+ **viewport** (`{ width: number; height: number }`): Browser viewport dimensions. (Default: `{ width: 1280, height: 720 }`)
33
+
34
+ **timeout** (`number`): Default timeout in milliseconds for browser operations. (Default: `30000`)
35
+
36
+ **cdpUrl** (`string | (() => string | Promise<string>)`): CDP WebSocket URL for connecting to an existing browser. Useful for cloud browser providers.
37
+
38
+ **scope** (`'shared' | 'thread'`): Browser instance scope. `'shared'` shares one browser across all threads. `'thread'` gives each thread its own browser. (Default: `'thread'`, or `'shared'` when `cdpUrl` is provided)
39
+
40
+ **onLaunch** (`(args: { browser: MastraBrowser }) => void | Promise<void>`): Callback invoked after the browser is ready.
41
+
42
+ **onClose** (`(args: { browser: MastraBrowser }) => void | Promise<void>`): Callback invoked before the browser closes.
43
+
44
+ **screencast** (`ScreencastOptions`): Configuration for streaming browser frames to Studio.
45
+
46
+ ## Tools
47
+
48
+ `AgentBrowser` provides 15 deterministic tools for browser automation. All tools that interact with elements use refs from the accessibility tree snapshot.
49
+
50
+ ### Core tools
51
+
52
+ | Tool | Description |
53
+ | ------------------ | ------------------------------------------------- |
54
+ | `browser_goto` | Navigate to a URL |
55
+ | `browser_snapshot` | Get accessibility tree snapshot with element refs |
56
+ | `browser_click` | Click an element by ref |
57
+ | `browser_type` | Type text into an element |
58
+ | `browser_press` | Press keyboard keys |
59
+ | `browser_select` | Select option from dropdown |
60
+ | `browser_scroll` | Scroll the page or element |
61
+ | `browser_close` | Close the browser |
62
+
63
+ ### Extended tools
64
+
65
+ | Tool | Description |
66
+ | ------------------ | ----------------------------------------------- |
67
+ | `browser_hover` | Hover over an element |
68
+ | `browser_back` | Go back in browser history |
69
+ | `browser_dialog` | Handle browser dialogs (alert, confirm, prompt) |
70
+ | `browser_wait` | Wait for element state changes |
71
+ | `browser_tabs` | Manage browser tabs (list, new, switch, close) |
72
+ | `browser_drag` | Drag and drop elements |
73
+ | `browser_evaluate` | Execute JavaScript in the page (escape hatch) |
74
+
75
+ ## Tool reference
76
+
77
+ ### `browser_goto`
78
+
79
+ Navigate to a URL.
80
+
81
+ ```text
82
+ // Tool input
83
+ {
84
+ "url": "https://example.com",
85
+ "waitUntil": "domcontentloaded",
86
+ "timeout": 30000
87
+ }
88
+ ```
89
+
90
+ | Parameter | Type | Description |
91
+ | ----------- | ----------------------------------------------- | ----------------------------------------------- |
92
+ | `url` | `string` | URL to navigate to |
93
+ | `waitUntil` | `"load" \| "domcontentloaded" \| "networkidle"` | When to consider navigation complete (optional) |
94
+ | `timeout` | `number` | Navigation timeout in ms (optional) |
95
+
96
+ ### `browser_snapshot`
97
+
98
+ Get an accessibility tree snapshot of the page. Returns element refs like `@e5` that you use with other tools.
99
+
100
+ ```text
101
+ // Tool input
102
+ {
103
+ "interactiveOnly": true,
104
+ "maxDepth": 10
105
+ }
106
+ ```
107
+
108
+ | Parameter | Type | Description |
109
+ | ----------------- | --------- | -------------------------------------------- |
110
+ | `interactiveOnly` | `boolean` | Only include interactive elements (optional) |
111
+ | `maxDepth` | `number` | Maximum tree depth (optional) |
112
+
113
+ **Example output:**
114
+
115
+ ```text
116
+ [document] Example Page
117
+ [banner]
118
+ [link @e1] Home
119
+ [link @e2] About
120
+ [main]
121
+ [heading @e3] Welcome
122
+ [textbox @e4] Search...
123
+ [button @e5] Submit
124
+ ```
125
+
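To act on a snapshot programmatically, the refs first have to be pulled out of the text. The sketch below is one way to do that, assuming every entry follows the `[role @eN] name` shape shown in the example output; the real snapshot format may carry more detail. `parseSnapshotRefs` and `SnapshotElement` are hypothetical names, not part of `@mastra/agent-browser`.

```typescript
// Hypothetical helper, not part of @mastra/agent-browser.
// Assumes each snapshot line looks like "[role @eN] Accessible name".
interface SnapshotElement {
  ref: string
  role: string
  label: string
}

function parseSnapshotRefs(snapshot: string): SnapshotElement[] {
  // Captures the role, the ref, and the remaining text on the line.
  const pattern = /\[([^\]\s]+)\s+(@e\d+)\]\s*(.*)$/
  const elements: SnapshotElement[] = []
  for (const line of snapshot.split('\n')) {
    const match = pattern.exec(line)
    if (match) {
      const [, role = '', ref = '', label = ''] = match
      elements.push({ role, ref, label: label.trim() })
    }
  }
  return elements
}

// The example output from above; indentation is ignored by the parser.
const exampleSnapshot = [
  '[document] Example Page',
  '  [banner]',
  '    [link @e1] Home',
  '    [link @e2] About',
  '  [main]',
  '    [heading @e3] Welcome',
  '    [textbox @e4] Search...',
  '    [button @e5] Submit',
].join('\n')

const submit = parseSnapshotRefs(exampleSnapshot).find(el => el.role === 'button')
console.log(submit?.ref) // logs @e5
```

Entries without a ref, such as `[document]` or `[banner]`, are skipped because they can't be targeted by the interaction tools.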
126
+ ### `browser_click`
127
+
128
+ Click an element using its ref from the snapshot.
129
+
130
+ ```text
131
+ {
132
+ "ref": "@e5",
133
+ "button": "left",
134
+ "clickCount": 1,
135
+ "modifiers": ["Control", "Shift"]
136
+ }
137
+ ```
138
+
139
+ | Parameter | Type | Description |
140
+ | ------------ | ------------------------------- | ---------------------------------------------- |
141
+ | `ref` | `string` | Element ref from snapshot (required) |
142
+ | `button` | `"left" \| "right" \| "middle"` | Mouse button (optional) |
143
+ | `clickCount` | `number` | Number of clicks; use 2 for a double-click (optional) |
144
+ | `modifiers` | `string[]` | Modifier keys (optional) |
145
+
146
+ ### `browser_type`
147
+
148
+ Type text into an input element.
149
+
150
+ ```text
151
+ // Tool input
152
+ {
153
+ "ref": "@e4",
154
+ "text": "search query",
155
+ "clear": true,
156
+ "delay": 50
157
+ }
158
+ ```
159
+
160
+ | Parameter | Type | Description |
161
+ | --------- | --------- | ----------------------------------------- |
162
+ | `ref` | `string` | Element ref from snapshot (required) |
163
+ | `text` | `string` | Text to type (required) |
164
+ | `clear` | `boolean` | Clear existing content first (optional) |
165
+ | `delay` | `number` | Delay between keystrokes in ms (optional) |
166
+
167
+ ### `browser_press`
168
+
169
+ Press keyboard keys.
170
+
171
+ ```text
172
+ // Tool input
173
+ {
174
+ "key": "Enter",
175
+ "modifiers": ["Control"]
176
+ }
177
+
178
+ // Key combinations
179
+ { "key": "Control+a" }
180
+ { "key": "Control+c" }
181
+ ```
182
+
183
+ | Parameter | Type | Description |
184
+ | ----------- | ---------- | ----------------------------------------------------------------- |
185
+ | `key` | `string` | Key name (e.g., "Enter", "Tab", "Escape", "Control+a") (required) |
186
+ | `modifiers` | `string[]` | Modifier keys (optional) |
187
+
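The examples above show two ways to express a modified key press: a separate `modifiers` array, or a combined string such as `"Control+a"`. The sketch below converts the combined form into the split form. Whether the tool treats both forms identically is an assumption here, and `splitKeyCombo` is a hypothetical helper rather than a library function.

```typescript
interface PressInput {
  key: string
  modifiers?: string[]
}

// Hypothetical helper: splits a combination such as "Control+Shift+a"
// into the final key and the modifiers that precede it.
function splitKeyCombo(combo: string): PressInput {
  // "+" alone or a trailing "++" means the key itself is the plus sign.
  const plusIsKey = combo === '+' || combo.endsWith('++')
  const body = plusIsKey ? combo.slice(0, -1) : combo
  const parts = body.split('+').filter(part => part.length > 0)
  const key = plusIsKey ? '+' : (parts.pop() ?? '')
  return parts.length > 0 ? { key, modifiers: parts } : { key }
}

const selectAll = splitKeyCombo('Control+a')
console.log(selectAll.key, selectAll.modifiers)
```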
188
+ ### `browser_select`
189
+
190
+ Select an option from a dropdown. Provide one of `value`, `label`, or `index`.
191
+
192
+ ```text
193
+ // Tool input - by value
194
+ {
195
+ "ref": "@e10",
196
+ "value": "option-value"
197
+ }
198
+
199
+ // Tool input - by label
200
+ {
201
+ "ref": "@e10",
202
+ "label": "Option Text"
203
+ }
204
+
205
+ // Tool input - by index
206
+ {
207
+ "ref": "@e10",
208
+ "index": 0
209
+ }
210
+ ```
211
+
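Because the three selectors are alternatives, it can help to check an input before sending it. The sketch below rejects inputs that carry none or more than one of `value`, `label`, and `index`. Reading "one of" as "exactly one" is an assumption, and `SelectInput` and `validateSelectInput` are hypothetical names; the tool's real schema may differ.

```typescript
// Hypothetical input type; the real tool schema may differ.
interface SelectInput {
  ref: string
  value?: string
  label?: string
  index?: number
}

// Returns an error message, or null when exactly one selector is present.
function validateSelectInput(input: SelectInput): string | null {
  const provided = [input.value, input.label, input.index].filter(v => v !== undefined)
  if (provided.length === 1) return null
  return provided.length === 0
    ? 'Provide one of value, label, or index'
    : 'Provide only one of value, label, or index'
}

// index 0 is a valid selector, so the check compares against undefined.
console.log(validateSelectInput({ ref: '@e10', index: 0 }))
```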
212
+ ### `browser_scroll`
213
+
214
+ Scroll the page or a specific element.
215
+
216
+ ```text
217
+ // Tool input
218
+ {
219
+ "direction": "down",
220
+ "amount": 300,
221
+ "ref": "@e15"
222
+ }
223
+ ```
224
+
225
+ | Parameter | Type | Description |
226
+ | ----------- | ------------------------------------- | ----------------------------------------------------- |
227
+ | `direction` | `"up" \| "down" \| "left" \| "right"` | Scroll direction (required) |
228
+ | `amount` | `number` | Pixels to scroll, default 300 (optional) |
229
+ | `ref` | `string` | Element to scroll, scrolls page if omitted (optional) |
230
+
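As a rough illustration of how `direction` and `amount` combine, the sketch below maps them to horizontal and vertical pixel deltas, using the documented default of 300. The sign convention (positive `dy` scrolls down, positive `dx` scrolls right) is an assumption borrowed from the usual browser scrolling convention, and `scrollDelta` is a hypothetical helper.

```typescript
type ScrollDirection = 'up' | 'down' | 'left' | 'right'

// Hypothetical helper: converts a direction and pixel amount into x/y deltas.
function scrollDelta(direction: ScrollDirection, amount = 300): { dx: number; dy: number } {
  switch (direction) {
    case 'up':
      return { dx: 0, dy: -amount }
    case 'down':
      return { dx: 0, dy: amount }
    case 'left':
      return { dx: -amount, dy: 0 }
    case 'right':
      return { dx: amount, dy: 0 }
  }
}

console.log(scrollDelta('down')) // uses the default amount of 300
```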
231
+ ### `browser_hover`
232
+
233
+ Hover over an element to trigger hover effects.
234
+
235
+ ```text
236
+ // Tool input
237
+ {
238
+ "ref": "@e7"
239
+ }
240
+ ```
241
+
242
+ ### `browser_back`
243
+
244
+ Go back in browser history.
245
+
246
+ ```text
247
+ // Tool input (no parameters required)
248
+
249
+ ```
250
+
251
+ ### `browser_dialog`
252
+
253
+ Handle browser dialogs (alert, confirm, prompt). The tool clicks the element given by `triggerRef`, then accepts or dismisses the dialog that opens.
254
+
255
+ ```text
256
+ // Tool input
257
+ {
258
+ "triggerRef": "@e5",
259
+ "action": "accept",
260
+ "text": "response"
261
+ }
262
+ ```
263
+
264
+ | Parameter | Type | Description |
265
+ | ------------ | ----------------------- | ------------------------------------------- |
266
+ | `triggerRef` | `string` | Element that triggers the dialog (required) |
267
+ | `action` | `"accept" \| "dismiss"` | How to handle the dialog (required) |
268
+ | `text` | `string` | Text for prompt dialogs (optional) |
269
+
270
+ ### `browser_wait`
271
+
272
+ Wait for an element to reach a specific state.
273
+
274
+ ```text
275
+ // Tool input
276
+ {
277
+ "ref": "@e20",
278
+ "state": "visible",
279
+ "timeout": 30000
280
+ }
281
+ ```
282
+
283
+ | Parameter | Type | Description |
284
+ | --------- | --------------------------------------------------- | ---------------------------------- |
285
+ | `ref` | `string` | Element ref to wait for (optional) |
286
+ | `state` | `"visible" \| "hidden" \| "attached" \| "detached"` | State to wait for (optional) |
287
+ | `timeout` | `number` | Max wait time in ms (optional) |
288
+
289
+ ### `browser_tabs`
290
+
291
+ Manage browser tabs.
292
+
293
+ ```text
294
+ // List all tabs
295
+ { "action": "list" }
296
+
297
+ // Open new tab
298
+ { "action": "new", "url": "https://example.com" }
299
+
300
+ // Switch to tab by index
301
+ { "action": "switch", "index": 0 }
302
+
303
+ // Close tab by index
304
+ { "action": "close", "index": 1 }
305
+ ```
306
+
307
+ ### `browser_drag`
308
+
309
+ Drag an element to a target location.
310
+
311
+ ```text
312
+ // Tool input
313
+ {
314
+ "sourceRef": "@e10",
315
+ "targetRef": "@e20"
316
+ }
317
+ ```
318
+
319
+ | Parameter | Type | Description |
320
+ | ----------- | -------- | ------------------------------ |
321
+ | `sourceRef` | `string` | Element to drag (required) |
322
+ | `targetRef` | `string` | Drop target element (required) |
323
+
324
+ ### `browser_evaluate`
325
+
326
+ Execute JavaScript in the page context. Use as an escape hatch when other tools don't cover your use case.
327
+
328
+ ```text
329
+ // Tool input
330
+ {
331
+ "script": "document.title",
332
+ "returnValue": true
333
+ }
334
+ ```
335
+
336
+ | Parameter | Type | Description |
337
+ | ------------- | --------- | --------------------------------------- |
338
+ | `script` | `string` | JavaScript to execute (required) |
339
+ | `returnValue` | `boolean` | Whether to return the result (optional) |
340
+
341
+ ### `browser_close`
342
+
343
+ Close the browser and clean up resources.
344
+
345
+ ```text
346
+ // Tool input (no parameters required)
347
+
348
+ ```
349
+
350
+ ## How refs work
351
+
352
+ The `browser_snapshot` tool returns an accessibility tree with element refs like `@e1`, `@e2`, etc. These refs are stable identifiers you use with other tools:
353
+
354
+ 1. Call `browser_snapshot` to see the page structure
355
+ 2. Find the element you want to interact with
356
+ 3. Use its ref with interaction tools like `browser_click` or `browser_type`
357
+
358
+ ```text
359
+ // 1. Get snapshot
360
+ // Returns: [textbox @e4] Search... [link @e5] Home
361
+
362
+ // 2. Type in the search box
363
+ { "tool": "browser_type", "input": { "ref": "@e4", "text": "mastra" } }
364
+
365
+ // 3. Click the Home link
366
+ { "tool": "browser_click", "input": { "ref": "@e5" } }
367
+ ```
368
+
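The snapshot-then-act loop can also be sketched in code. The example below looks up the first textbox ref in a snapshot string and builds the tool calls for a search. The `{ tool, input }` shape mirrors the example above; `findRef` and `planSearch` are hypothetical helpers, and whether pressing Enter submits the search depends on the page.

```typescript
// The { tool, input } shape mirrors the example above; the helpers are hypothetical.
interface ToolCall {
  tool: string
  input: Record<string, unknown>
}

function findRef(snapshot: string, role: string): string | null {
  // Matches entries such as "[textbox @e4]" and captures the role and the ref.
  const pattern = /\[([^\]\s]+)\s+(@e\d+)\]/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(snapshot)) !== null) {
    const [, foundRole, ref] = match
    if (foundRole === role && ref !== undefined) return ref
  }
  return null
}

function planSearch(snapshot: string, query: string): ToolCall[] {
  const searchBox = findRef(snapshot, 'textbox')
  if (searchBox === null) {
    throw new Error('No textbox found in snapshot')
  }
  return [
    { tool: 'browser_type', input: { ref: searchBox, text: query } },
    { tool: 'browser_press', input: { key: 'Enter' } },
  ]
}

const calls = planSearch('[textbox @e4] Search... [link @e5] Home', 'mastra')
console.log(JSON.stringify(calls, null, 2))
```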
369
+ ## Related
370
+
371
+ - [MastraBrowser](https://mastra.ai/reference/browser/mastra-browser): Base class reference
372
+ - [StagehandBrowser](https://mastra.ai/reference/browser/stagehand-browser): AI-powered alternative
373
+ - [Browser overview](https://mastra.ai/docs/browser/overview): Conceptual guide
374
+ - [agent-browser guide](https://mastra.ai/docs/browser/agent-browser): Usage guide
@@ -0,0 +1,284 @@
1
+ # MastraBrowser class
2
+
3
+ The `MastraBrowser` class is the abstract base class for browser automation providers. It defines the common interface for launching browsers, managing thread isolation, streaming screencasts, and handling input events.
4
+
5
+ You don't instantiate `MastraBrowser` directly. Instead, use a provider implementation:
6
+
7
+ - [`AgentBrowser`](https://mastra.ai/reference/browser/agent-browser): Deterministic browser automation using refs
8
+ - [`StagehandBrowser`](https://mastra.ai/reference/browser/stagehand-browser): AI-powered browser automation using natural language
9
+
10
+ ## Usage example
11
+
12
+ ```typescript
13
+ import { Agent } from '@mastra/core/agent'
14
+ import { AgentBrowser } from '@mastra/agent-browser'
15
+
16
+ const browser = new AgentBrowser({
17
+ headless: true,
18
+ viewport: { width: 1280, height: 720 },
19
+ scope: 'thread',
20
+ })
21
+
22
+ export const browserAgent = new Agent({
23
+ name: 'browser-agent',
24
+ instructions: 'You can browse the web to find information.',
25
+ model: 'openai/gpt-5.4',
26
+ browser,
27
+ })
28
+ ```
29
+
30
+ ## Constructor parameters
31
+
32
+ **headless** (`boolean`): Whether to run the browser in headless mode (no visible UI). (Default: `true`)
33
+
34
+ **viewport** (`{ width: number; height: number }`): Browser viewport dimensions. Controls the size of the browser window. (Default: `{ width: 1280, height: 720 }`)
35
+
36
+ **timeout** (`number`): Default timeout in milliseconds for browser operations. (Default: `10000`)
37
+
38
+ **cdpUrl** (`string | (() => string | Promise<string>)`): CDP WebSocket URL, HTTP endpoint, or sync/async provider function. When provided, connects to an existing browser instead of launching a new one. HTTP endpoints are resolved to WebSocket internally. Can't be combined with `scope: 'thread'`; the scope is set to `'shared'` automatically.
39
+
40
+ **scope** (`'shared' | 'thread'`): Browser instance scope across threads. `'shared'` means all threads share a single browser instance. `'thread'` means each thread gets its own browser instance (full isolation). (Default: `'thread'`, or `'shared'` when `cdpUrl` is provided)
41
+
42
+ **onLaunch** (`(args: { browser: MastraBrowser }) => void | Promise<void>`): Callback invoked after the browser reaches 'ready' status.
43
+
44
+ **onClose** (`(args: { browser: MastraBrowser }) => void | Promise<void>`): Callback invoked before the browser is closed.
45
+
46
+ **screencast** (`ScreencastOptions`): Configuration for streaming browser frames.
47
+
48
+ **screencast.format** (`'jpeg' | 'png'`): Image format for screencast frames.
49
+
50
+ **screencast.quality** (`number`): Image quality (1-100). Only applies to JPEG format.
51
+
52
+ **screencast.maxWidth** (`number`): Maximum width for screencast frames.
53
+
54
+ **screencast.maxHeight** (`number`): Maximum height for screencast frames.
55
+
56
+ **screencast.everyNthFrame** (`number`): Capture every Nth frame to reduce bandwidth.
57
+
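The screencast fields interact: `quality` is only meaningful for JPEG and is documented as 1-100. The sketch below shows one way to tidy an options object before passing it on. The field names come from the list above; treating every field as optional is an assumption, and `normalizeScreencastOptions` is a hypothetical helper, not something the library is known to do internally.

```typescript
// Field names come from the parameter list above; optionality is assumed.
interface ScreencastOptions {
  format?: 'jpeg' | 'png'
  quality?: number
  maxWidth?: number
  maxHeight?: number
  everyNthFrame?: number
}

// Hypothetical helper: drops or clamps quality according to the documented rules.
function normalizeScreencastOptions(options: ScreencastOptions): ScreencastOptions {
  const normalized: ScreencastOptions = { ...options }
  if (normalized.format === 'png') {
    // Quality only applies to JPEG, so drop it for PNG.
    delete normalized.quality
  } else if (normalized.quality !== undefined) {
    // Clamp to the documented 1-100 range.
    normalized.quality = Math.min(100, Math.max(1, Math.round(normalized.quality)))
  }
  return normalized
}

console.log(normalizeScreencastOptions({ format: 'jpeg', quality: 250 }))
```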
58
+ ## Properties
59
+
60
+ The `id`, `name`, and `provider` properties are abstract and must be defined by concrete provider implementations:
61
+
62
+ **id** (`string`): Unique identifier for this browser instance. Abstract - defined by provider.
63
+
64
+ **name** (`string`): Human-readable name of the browser provider (e.g., 'AgentBrowser', 'StagehandBrowser'). Abstract - defined by provider.
65
+
66
+ **provider** (`string`): Provider identifier (e.g., 'vercel-labs/agent-browser', 'browserbase/stagehand'). Abstract - defined by provider.
67
+
68
+ **headless** (`boolean`): Whether the browser is running in headless mode.
69
+
70
+ **status** (`BrowserStatus`): Current browser status: 'pending', 'launching', 'ready', 'error', 'closing', or 'closed'.
71
+
72
+ ## Methods
73
+
74
+ ### Lifecycle
75
+
76
+ #### `ensureReady()`
77
+
78
+ Ensures the browser is launched and ready for use. Automatically called before tool execution. Implemented in the base class.
79
+
80
+ ```typescript
81
+ await browser.ensureReady()
82
+ ```
83
+
84
+ #### `close()`
85
+
86
+ Closes the browser and cleans up all resources. Implemented in the base class with race-condition-safe handling.
87
+
88
+ ```typescript
89
+ await browser.close()
90
+ ```
91
+
92
+ #### `isBrowserRunning()`
93
+
94
+ Checks if the browser is currently running.
95
+
96
+ ```typescript
97
+ const isRunning = browser.isBrowserRunning()
98
+ ```
99
+
100
+ **Returns:** `boolean`
101
+
102
+ ### Thread management
103
+
104
+ #### `setCurrentThread(threadId)`
105
+
106
+ Sets the current thread ID for browser operations. Used internally by the agent runtime.
107
+
108
+ ```typescript
109
+ browser.setCurrentThread('thread-123')
110
+ ```
111
+
112
+ #### `getCurrentThread()`
113
+
114
+ Gets the current thread ID.
115
+
116
+ ```typescript
117
+ const threadId = browser.getCurrentThread()
118
+ ```
119
+
120
+ **Returns:** `string`
121
+
122
+ #### `hasThreadSession(threadId)`
123
+
124
+ Checks if a thread has an active browser session.
125
+
126
+ ```typescript
127
+ const hasSession = browser.hasThreadSession('thread-123')
128
+ ```
129
+
130
+ **Returns:** `boolean`
131
+
132
+ #### `closeThreadSession(threadId)`
133
+
134
+ Closes a specific thread's browser session. For 'thread' scope, this closes that thread's browser instance. For 'shared' scope, this clears the thread's state.
135
+
136
+ ```typescript
137
+ await browser.closeThreadSession('thread-123')
138
+ ```
139
+
140
+ ### Tools
141
+
142
+ #### `getTools()`
143
+
144
+ Returns the browser tools for use with agents. Each provider returns different tools based on its paradigm.
145
+
146
+ ```typescript
147
+ const tools = browser.getTools()
148
+ ```
149
+
150
+ **Returns:** `Record<string, Tool>`
151
+
152
+ ### Screencast
153
+
154
+ #### `startScreencast(options?, threadId?)`
155
+
156
+ Starts streaming browser frames. Returns a `ScreencastStream` that emits frame events.
157
+
158
+ ```typescript
159
+ const stream = await browser.startScreencast({ format: 'jpeg', quality: 80 }, 'thread-123')
160
+
161
+ stream.on('frame', frame => {
162
+ console.log('Frame received:', frame.data.length, 'bytes')
163
+ })
164
+
165
+ stream.on('stop', reason => {
166
+ console.log('Screencast stopped:', reason)
167
+ })
168
+ ```
169
+
170
+ **Returns:** `Promise<ScreencastStream>`
171
+
172
+ ### Input injection
173
+
174
+ #### `injectMouseEvent(params, threadId?)`
175
+
176
+ Injects a mouse event into the browser. Used by Studio for live interaction.
177
+
178
+ ```typescript
179
+ await browser.injectMouseEvent({
180
+ type: 'mousePressed',
181
+ x: 100,
182
+ y: 200,
183
+ button: 'left',
184
+ clickCount: 1,
185
+ })
186
+ ```
187
+
188
+ #### `injectKeyboardEvent(params, threadId?)`
189
+
190
+ Injects a keyboard event into the browser. Used by Studio for live interaction.
191
+
192
+ ```typescript
193
+ await browser.injectKeyboardEvent({
194
+ type: 'keyDown',
195
+ key: 'Enter',
196
+ code: 'Enter',
197
+ })
198
+ ```
199
+
200
+ ### State
201
+
202
+ #### `getState(threadId?)`
203
+
204
+ Gets the current browser state including URL and tabs.
205
+
206
+ ```typescript
207
+ const state = await browser.getState('thread-123')
208
+ console.log('Current URL:', state.currentUrl)
209
+ console.log('Tabs:', state.tabs)
210
+ ```
211
+
212
+ **Returns:** `Promise<BrowserState>`
213
+
214
+ ```typescript
215
+ interface BrowserState {
216
+ currentUrl: string | null
217
+ tabs: BrowserTabState[]
218
+ activeTabIndex: number
219
+ }
220
+
221
+ interface BrowserTabState {
222
+ id: string
223
+ url: string
224
+ title: string
225
+ }
226
+ ```
227
+
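A common follow-up is to pick the active tab out of the returned state. The sketch below repeats the two interfaces so it is self-contained, and assumes `activeTabIndex` is a zero-based index into `tabs`; the source does not state this explicitly. `getActiveTab` is a hypothetical helper.

```typescript
interface BrowserTabState {
  id: string
  url: string
  title: string
}

interface BrowserState {
  currentUrl: string | null
  tabs: BrowserTabState[]
  activeTabIndex: number
}

// Hypothetical helper: returns the active tab, or null if the index is out of range.
function getActiveTab(state: BrowserState): BrowserTabState | null {
  return state.tabs[state.activeTabIndex] ?? null
}

const sampleState: BrowserState = {
  currentUrl: 'https://example.com/docs',
  tabs: [
    { id: 'tab-a', url: 'https://example.com', title: 'Example' },
    { id: 'tab-b', url: 'https://example.com/docs', title: 'Docs' },
  ],
  activeTabIndex: 1,
}

console.log(getActiveTab(sampleState)?.title) // logs Docs
```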
228
+ #### `getCurrentUrl(threadId?)`
229
+
230
+ Gets the current page URL.
231
+
232
+ ```typescript
233
+ const url = await browser.getCurrentUrl()
234
+ ```
235
+
236
+ **Returns:** `Promise<string | null>`
237
+
238
+ ## Browser scope
239
+
240
+ The `scope` option controls how browser instances are shared across conversation threads:
241
+
242
+ | Scope | Description | Use case |
243
+ | ---------- | ------------------------------------------- | ---------------------------------------- |
244
+ | `'shared'` | All threads share a single browser instance | Cost-efficient for non-conflicting tasks |
245
+ | `'thread'` | Each thread gets its own browser instance | Full isolation for concurrent users |
246
+
247
+ ```typescript
248
+ // Shared browser for all threads
249
+ const sharedBrowser = new AgentBrowser({
250
+ scope: 'shared',
251
+ })
252
+
253
+ // Isolated browser per thread
254
+ const isolatedBrowser = new AgentBrowser({
255
+ scope: 'thread',
256
+ })
257
+ ```
258
+
259
+ When using `cdpUrl` to connect to an external browser, the scope automatically falls back to `'shared'` since you can't spawn new browser instances.
260
+
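The scope rules described above can be condensed into a small function. This is a sketch of the documented behavior, not the library's actual code: `effectiveScope` and `ScopeOptions` are hypothetical names.

```typescript
type BrowserScope = 'shared' | 'thread'

interface ScopeOptions {
  scope?: BrowserScope
  cdpUrl?: string | (() => string | Promise<string>)
}

// Hypothetical helper mirroring the documented rule; not the library's code.
function effectiveScope(options: ScopeOptions): BrowserScope {
  // An external browser can't be spawned per thread, so cdpUrl forces 'shared'.
  if (options.cdpUrl !== undefined) return 'shared'
  // Without cdpUrl, the documented default is 'thread'.
  return options.scope ?? 'thread'
}

console.log(effectiveScope({ scope: 'thread', cdpUrl: 'wss://browser.example.com/ws' })) // logs shared
```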
261
+ ## Cloud browser providers
262
+
263
+ Connect to cloud browser services using the `cdpUrl` option:
264
+
265
+ ```typescript
266
+ // Static CDP URL
267
+ const staticBrowser = new AgentBrowser({
268
+ cdpUrl: 'wss://browser.example.com/ws',
269
+ })
270
+
271
+ // Dynamic CDP URL (e.g., session-based)
272
+ const dynamicBrowser = new AgentBrowser({
273
+ cdpUrl: async () => {
274
+ const session = await createBrowserSession() // placeholder for your provider's session API
275
+ return session.wsUrl
276
+ },
277
+ })
278
+ ```
279
+
280
+ ## Related
281
+
282
+ - [AgentBrowser](https://mastra.ai/reference/browser/agent-browser): Deterministic browser automation
283
+ - [StagehandBrowser](https://mastra.ai/reference/browser/stagehand-browser): AI-powered browser automation
284
+ - [Browser overview](https://mastra.ai/docs/browser/overview): Conceptual guide to browser automation