testdriverai 7.3.15 → 7.3.17

@@ -37,6 +37,18 @@ const element = await testdriver.find(description, options)
  Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
  </ParamField>

+ <ParamField path="confidence" type="number">
+ Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
+ </ParamField>
+
+ <ParamField path="type" type="string">
+ Element type hint that wraps the description for better matching. Accepted values:
+ - `"text"` — Wraps the prompt as `The text "..."`
+ - `"image"` — Wraps the prompt as `The image "..."`
+ - `"ui"` — Wraps the prompt as `The UI element "..."`
+ - `"any"` — No wrapping, uses the description as-is (default behavior)
+ </ParamField>
+
  <ParamField path="zoom" type="boolean" default={false}>
  Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
  </ParamField>
@@ -0,0 +1,118 @@
+ ```skill
+ ---
+ name: testdriver:parse
+ description: Detect all UI elements on screen using OmniParser
+ ---
+ <!-- Generated from parse.mdx. DO NOT EDIT. -->
+
+ ## Overview
+
+ Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
+
+ This method analyzes the entire screen and returns every detected element. It's useful for:
+ - Understanding the full UI layout of a screen
+ - Finding all clickable or interactive elements
+ - Building custom element-based logic
+ - Debugging what elements TestDriver can detect
+ - Accessibility auditing
+
+ <Note>
+ **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
+ </Note>
+
+ ## Syntax
+
+ ```javascript
+ const result = await testdriver.parse()
+ ```
+
+ ## Parameters
+
+ None.
+
+ ## Returns
+
+ `Promise<ParseResult>` - Object containing detected UI elements
+
+ ### ParseResult
+
+ | Property | Type | Description |
+ |----------|------|-------------|
+ | `elements` | `ParsedElement[]` | Array of detected UI elements |
+ | `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
+ | `imageWidth` | `number` | Width of the analyzed screenshot |
+ | `imageHeight` | `number` | Height of the analyzed screenshot |
+
+ ### ParsedElement
+
+ | Property | Type | Description |
+ |----------|------|-------------|
+ | `index` | `number` | Element index |
+ | `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
+ | `content` | `string` | Text content or description of the element |
+ | `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
+ | `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
+ | `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
+
+ ## Examples
+
+ ### Get All Elements on Screen
+
+ ```javascript
+ const result = await testdriver.parse();
+ console.log(`Found ${result.elements.length} elements`);
+
+ result.elements.forEach((el, i) => {
+   console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
+ });
+ ```
+
+ ### Find Clickable Elements
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+ console.log(`Found ${clickable.length} clickable elements`);
+ ```
+
+ ### Find and Click an Element by Content
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ const submitBtn = result.elements.find(e =>
+   e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
+ );
+
+ if (submitBtn) {
+   const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
+   const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
+   await testdriver.click({ x, y });
+ }
+ ```
+
+ ### Filter by Element Type
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ const textElements = result.elements.filter(e => e.type === 'text');
+ const icons = result.elements.filter(e => e.type === 'icon');
+ const buttons = result.elements.filter(e => e.type === 'button');
+ ```
+
+ ## Best Practices
+
+ - Use `find()` for targeting specific elements — `parse()` is for full UI analysis
+ - Filter by `interactivity` to distinguish clickable vs non-interactive elements
+ - Wait for the page to stabilize before calling `parse()`
+ - Use the `annotatedImageUrl` for visual debugging
+
+ ## Related
+
+ - [find()](/v7/find) - AI-powered element location
+ - [assert()](/v7/assert) - Make AI-powered assertions about screen state
+ - [screenshot()](/v7/screenshot) - Capture screenshots
+ - [Elements Reference](/v7/elements) - Complete Element API
+ ```
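The `ParsedElement` table in the file above lists two box shapes, `bbox` as `{x0, y0, x1, y1}` and `boundingBox` as `{left, top, width, height}`. The sketch below shows how a click point and one shape could be derived from the other. It assumes both shapes describe the same rectangle, which the docs imply but do not state; `centerOfBbox`, `toBoundingBox`, and the sample element are hypothetical names used only for illustration.

```javascript
// Center of a corner-style box {x0, y0, x1, y1}, rounded to whole pixels.
function centerOfBbox(bbox) {
  return {
    x: Math.round((bbox.x0 + bbox.x1) / 2),
    y: Math.round((bbox.y0 + bbox.y1) / 2),
  };
}

// Convert {x0, y0, x1, y1} to {left, top, width, height}.
// Assumption: both shapes describe the same rectangle.
function toBoundingBox(bbox) {
  return {
    left: bbox.x0,
    top: bbox.y0,
    width: bbox.x1 - bbox.x0,
    height: bbox.y1 - bbox.y0,
  };
}

// Hand-written sample element; real values would come from testdriver.parse().
const sampleElement = {
  content: "Submit",
  bbox: { x0: 100, y0: 200, x1: 180, y1: 240 },
};

console.log(centerOfBbox(sampleElement.bbox));  // { x: 140, y: 220 }
console.log(toBoundingBox(sampleElement.bbox)); // { left: 100, top: 200, width: 80, height: 40 }
```

The center calculation is the same arithmetic used in the "Find and Click an Element by Content" example, pulled into a reusable function.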
package/CHANGELOG.md CHANGED
@@ -1,3 +1,16 @@
+ ## [7.3.17](https://github.com/testdriverai/testdriverai/compare/v7.3.16...v7.3.17) (2026-02-18)
+
+
+ ### Features
+
+ * add type option to find(), move confidence to API, rename ocr to parse ([#640](https://github.com/testdriverai/testdriverai/issues/640)) ([d98a94b](https://github.com/testdriverai/testdriverai/commit/d98a94bd05c135c74d9fed2ebb72c709fc643337))
+
+
+
+ ## [7.3.16](https://github.com/testdriverai/testdriverai/compare/v7.3.14...v7.3.16) (2026-02-18)
+
+
+
  ## [7.3.15](https://github.com/testdriverai/testdriverai/compare/v7.3.14...v7.3.15) (2026-02-18)


package/agent/lib/sdk.js CHANGED
@@ -337,9 +337,11 @@ const createSDK = (emitter, config, sessionInstance) => {
    }

    const req = async (path, data, onChunk) => {
-     // for each value of data, if it is empty remove it
+     // for each value of data, if it is null/undefined remove it
+     // Note: use == null to match both null and undefined, but preserve
+     // other falsy values like 0, false, and "" which may be intentional
      for (let key in data) {
-       if (!data[key]) {
+       if (data[key] == null) {
          delete data[key];
        }
      }
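The effect of swapping `!data[key]` for `data[key] == null` is easiest to see side by side. The sketch below is illustrative only: `pruneFalsy`, `pruneNullish`, and the sample payload are hypothetical names, and only the two conditions are taken from the hunk above.

```javascript
// Old condition from the hunk: removes every falsy value.
function pruneFalsy(data) {
  for (const key in data) {
    if (!data[key]) {
      delete data[key];
    }
  }
  return data;
}

// New condition from the hunk: removes only null and undefined.
function pruneNullish(data) {
  for (const key in data) {
    if (data[key] == null) {
      delete data[key];
    }
  }
  return data;
}

// Hypothetical request payload mixing intentional falsy values with empty ones.
const makePayload = () => ({
  confidence: 0,
  zoom: false,
  type: "",
  element: "Submit",
  cacheKey: null,
  timeout: undefined,
});

console.log(pruneFalsy(makePayload()));   // { element: 'Submit' }
console.log(pruneNullish(makePayload())); // { confidence: 0, zoom: false, type: '', element: 'Submit' }
```

Under the old check, an explicit `0` or `false` would never reach the server. That plausibly matters more now that the changelog describes `confidence` as handled by the API.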
@@ -37,6 +37,18 @@ const element = await testdriver.find(description, options)
  Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
  </ParamField>

+ <ParamField path="confidence" type="number">
+ Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
+ </ParamField>
+
+ <ParamField path="type" type="string">
+ Element type hint that wraps the description for better matching. Accepted values:
+ - `"text"` — Wraps the prompt as `The text "..."`
+ - `"image"` — Wraps the prompt as `The image "..."`
+ - `"ui"` — Wraps the prompt as `The UI element "..."`
+ - `"any"` — No wrapping, uses the description as-is (default behavior)
+ </ParamField>
+
  <ParamField path="zoom" type="boolean" default={false}>
  Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
  </ParamField>
@@ -232,6 +244,69 @@ This is useful for:
  ```
  </Check>

+ ## Confidence Threshold
+
+ Require a minimum AI confidence score for element matches. If the confidence is below the threshold, `find()` treats the result as not found:
+
+ ```javascript
+ // Require at least 90% confidence
+ const element = await testdriver.find('submit button', { confidence: 0.9 });
+
+ if (!element.found()) {
+   // AI found something but wasn't confident enough
+   throw new Error('Could not confidently locate submit button');
+ }
+
+ await element.click();
+ ```
+
+ This is useful for:
+ - Critical test steps where an incorrect click could cause cascading failures
+ - Distinguishing between similar elements (e.g., multiple buttons)
+ - Failing fast when the UI has changed unexpectedly
+
+ ```javascript
+ // Combine with timeout for robust polling with confidence gate
+ const element = await testdriver.find('success notification', {
+   confidence: 0.85,
+   timeout: 15000,
+ });
+ ```
+
+ <Tip>
+ The `confidence` value is a float between 0 and 1 (e.g., `0.9` = 90%). The AI returns its confidence with each find result, which you can also read from `element.confidence` after a successful find.
+ </Tip>
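The gating rule described in this section can be modelled in a few lines. This is a minimal sketch of the documented behavior, not the SDK's or API's implementation: `applyConfidenceThreshold` and the `{ found, confidence }` result shape are assumptions made for illustration, and the changelog suggests the real check runs server-side.

```javascript
// Hypothetical result shape: { found: boolean, confidence: number }.
function applyConfidenceThreshold(result, threshold) {
  // No threshold supplied: leave the result untouched.
  if (threshold == null) return result;
  // "Below this value" is read as strictly less than the threshold.
  if (result.found && result.confidence < threshold) {
    return { ...result, found: false };
  }
  return result;
}

console.log(applyConfidenceThreshold({ found: true, confidence: 0.82 }, 0.9).found); // false
console.log(applyConfidenceThreshold({ found: true, confidence: 0.95 }, 0.9).found); // true
console.log(applyConfidenceThreshold({ found: true, confidence: 0.82 }).found);      // true
```

Using `threshold == null` rather than `!threshold` keeps an explicit threshold of `0` distinct from an omitted one.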
+ ## Element Type
+
+ Use the `type` option to hint what kind of element you're looking for. This wraps your description into a more specific prompt for the AI, improving match accuracy — especially when users provide short or ambiguous descriptions.
+
+ ```javascript
+ // Find text on the page
+ const label = await testdriver.find('Sign In', { type: 'text' });
+ // AI prompt becomes: The text "Sign In"
+
+ // Find an image
+ const logo = await testdriver.find('company logo', { type: 'image' });
+ // AI prompt becomes: The image "company logo"
+
+ // Find a UI element (button, input, checkbox, etc.)
+ const btn = await testdriver.find('Submit', { type: 'ui' });
+ // AI prompt becomes: The UI element "Submit"
+
+ // No wrapping — same as omitting the option
+ const el = await testdriver.find('the blue submit button', { type: 'any' });
+ ```
+
+ | Type | Prompt sent to AI |
+ |------|----|
+ | `"text"` | `The text "..."` |
+ | `"image"` | `The image "..."` |
+ | `"ui"` | `The UI element "..."` |
+ | `"any"` | Original description (no wrapping) |
+
+ <Tip>
+ This is particularly useful for short descriptions like `"Submit"` or `"Login"` where the AI may not know whether to look for a button, a link, or visible text. Specifying `type` removes the ambiguity.
+ </Tip>
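The wrapping table above maps directly onto a small lookup. The sketch below reproduces the documented prompt templates. `wrapDescription` and `TYPE_PREFIXES` are hypothetical names, and treating an unrecognized `type` the same as `"any"` is an assumption, since the docs do not say how invalid values are handled.

```javascript
// Prefixes taken from the table above; a Map avoids accidental prototype lookups.
const TYPE_PREFIXES = new Map([
  ["text", "The text"],
  ["image", "The image"],
  ["ui", "The UI element"],
]);

function wrapDescription(description, type = "any") {
  const prefix = TYPE_PREFIXES.get(type);
  // "any" (and, in this sketch, any unrecognized value) leaves the description as-is.
  if (prefix === undefined) return description;
  return `${prefix} "${description}"`;
}

console.log(wrapDescription("Sign In", "text"));               // The text "Sign In"
console.log(wrapDescription("company logo", "image"));         // The image "company logo"
console.log(wrapDescription("Submit", "ui"));                  // The UI element "Submit"
console.log(wrapDescription("the blue submit button", "any")); // the blue submit button
```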
  ## Polling for Dynamic Elements

  For elements that may not be immediately visible, use the `timeout` option to automatically poll:
@@ -0,0 +1,236 @@
+ ---
+ name: testdriver:parse
+ description: Detect all UI elements on screen using OmniParser
+ ---
+ <!-- Generated from parse.mdx. DO NOT EDIT. -->
+
+ ## Overview
+
+ Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
+
+ This method analyzes the entire screen and returns every detected element. It's useful for:
+ - Understanding the full UI layout of a screen
+ - Finding all clickable or interactive elements
+ - Building custom element-based logic
+ - Debugging what elements TestDriver can detect
+ - Accessibility auditing
+
+ <Note>
+ **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
+ </Note>
+
+ ## Syntax
+
+ ```javascript
+ const result = await testdriver.parse()
+ ```
+
+ ## Parameters
+
+ None.
+
+ ## Returns
+
+ `Promise<ParseResult>` - Object containing detected UI elements
+
+ ### ParseResult
+
+ | Property | Type | Description |
+ |----------|------|-------------|
+ | `elements` | `ParsedElement[]` | Array of detected UI elements |
+ | `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
+ | `imageWidth` | `number` | Width of the analyzed screenshot |
+ | `imageHeight` | `number` | Height of the analyzed screenshot |
+
+ ### ParsedElement
+
+ | Property | Type | Description |
+ |----------|------|-------------|
+ | `index` | `number` | Element index |
+ | `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
+ | `content` | `string` | Text content or description of the element |
+ | `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
+ | `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
+ | `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
+
+ ## Examples
+
+ ### Get All Elements on Screen
+
+ ```javascript
+ const result = await testdriver.parse();
+ console.log(`Found ${result.elements.length} elements`);
+
+ result.elements.forEach((el, i) => {
+   console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
+ });
+ ```
+
+ ### Find Clickable Elements
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+ console.log(`Found ${clickable.length} clickable elements`);
+
+ clickable.forEach(el => {
+   console.log(`- "${el.content}" at (${el.bbox.x0}, ${el.bbox.y0})`);
+ });
+ ```
+
+ ### Find and Click an Element by Content
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ // Find a "Submit" button
+ const submitBtn = result.elements.find(e =>
+   e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
+ );
+
+ if (submitBtn) {
+   // Calculate center of the bounding box
+   const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
+   const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
+
+   await testdriver.click({ x, y });
+ }
+ ```
+
+ ### Filter by Element Type
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ // Get all text elements
+ const textElements = result.elements.filter(e => e.type === 'text');
+ textElements.forEach(e => console.log(`Text: "${e.content}"`));
+
+ // Get all icons
+ const icons = result.elements.filter(e => e.type === 'icon');
+ console.log(`Found ${icons.length} icons`);
+
+ // Get all buttons
+ const buttons = result.elements.filter(e => e.type === 'button');
+ console.log(`Found ${buttons.length} buttons`);
+ ```
+
+ ### Build Custom Assertions
+
+ ```javascript
+ import { describe, expect, it } from "vitest";
+ import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
+
+ describe("Login Page", () => {
+   it("should have expected form elements", async (context) => {
+     const testdriver = TestDriver(context);
+
+     await testdriver.provision.chrome({
+       url: 'https://myapp.com/login',
+     });
+
+     const result = await testdriver.parse();
+
+     // Assert expected elements exist
+     const textContent = result.elements.map(e => e.content.toLowerCase());
+     expect(textContent).toContain('email');
+     expect(textContent).toContain('password');
+
+     // Assert there are clickable elements
+     const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+     expect(clickable.length).toBeGreaterThan(0);
+   });
+ });
+ ```
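One detail of the assertion example above: `toContain` on an array checks for an item equal to the argument, so `toContain('email')` passes only when some element's content is exactly `email`. If labels such as "Email address" are expected, a substring check may be a better fit. The sketch below shows the difference on hand-written data; `hasContent` and `sampleElements` are hypothetical and not part of the SDK.

```javascript
// True when any element's content contains `needle`, ignoring case.
function hasContent(elements, needle) {
  const target = needle.toLowerCase();
  return elements.some((el) => (el.content ?? "").toLowerCase().includes(target));
}

// Hand-written stand-in for result.elements.
const sampleElements = [
  { type: "text", content: "Email address", interactivity: "non-interactive" },
  { type: "text", content: "Password", interactivity: "non-interactive" },
];

const exactContents = sampleElements.map((el) => el.content.toLowerCase());
console.log(exactContents.includes("email"));     // false: no element is exactly "email"
console.log(hasContent(sampleElements, "email")); // true: "Email address" contains it
```

In a Vitest test this could be used as `expect(hasContent(result.elements, 'email')).toBe(true)`.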
+
+ ### Use Bounding Box Coordinates
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ result.elements.forEach(el => {
+   // Pixel coordinates
+   console.log(`Element "${el.content}":`);
+   console.log(`  bbox: (${el.bbox.x0}, ${el.bbox.y0}) to (${el.bbox.x1}, ${el.bbox.y1})`);
+   console.log(`  size: ${el.boundingBox.width}x${el.boundingBox.height}`);
+   console.log(`  position: left=${el.boundingBox.left}, top=${el.boundingBox.top}`);
+ });
+ ```
+
+ ### View Annotated Screenshot
+
+ ```javascript
+ const result = await testdriver.parse();
+
+ // The annotated image shows all detected elements with bounding boxes
+ console.log('Annotated screenshot:', result.annotatedImageUrl);
+ console.log(`Image dimensions: ${result.imageWidth}x${result.imageHeight}`);
+ ```
+
+ ## How It Works
+
+ 1. TestDriver captures a screenshot of the current screen
+ 2. The image is sent to the TestDriver API
+ 3. OmniParser v2 analyzes the image to detect all UI elements
+ 4. Each element is classified by type (text, icon, button, etc.) and interactivity
+ 5. Bounding box coordinates are returned in pixel coordinates matching the screen resolution
+
+ <Note>
+ OmniParser detects elements visually — it works with any UI framework, native apps, and even non-standard interfaces. It does not rely on DOM or accessibility trees.
+ </Note>
+
+ ## Best Practices
+
+ <AccordionGroup>
+ <Accordion title="Use find() for targeting specific elements">
+ For locating and interacting with a specific element, prefer `find()` which uses AI vision. Use `parse()` when you need a complete inventory of all elements on screen.
+
+ ```javascript
+ // Prefer this for clicking a specific element
+ await testdriver.find("Submit button").click();
+
+ // Use parse() for full UI analysis
+ const result = await testdriver.parse();
+ const allButtons = result.elements.filter(e => e.type === 'button');
+ ```
+ </Accordion>
+
+ <Accordion title="Filter by interactivity">
+ Use the `interactivity` field to distinguish between clickable and non-interactive elements.
+
+ ```javascript
+ const result = await testdriver.parse();
+ const interactive = result.elements.filter(e => e.interactivity === 'clickable');
+ const static_ = result.elements.filter(e => e.interactivity === 'non-interactive');
+ ```
+ </Accordion>
+
+ <Accordion title="Wait for content to load">
+ If elements aren't being detected, the page may not be fully loaded. Add a wait first.
+
+ ```javascript
+ // Wait for page to stabilize
+ await testdriver.wait(2000);
+
+ // Then parse
+ const result = await testdriver.parse();
+ ```
+ </Accordion>
+
+ <Accordion title="Use the annotated image for debugging">
+ The `annotatedImageUrl` provides a visual overlay showing all detected elements with their bounding boxes — great for debugging.
+
+ ```javascript
+ const result = await testdriver.parse();
+ console.log('View annotated screenshot:', result.annotatedImageUrl);
+ ```
+ </Accordion>
+ </AccordionGroup>
+
+ ## Related
+
+ - [find()](/v7/find) - AI-powered element location
+ - [assert()](/v7/assert) - Make AI-powered assertions about screen state
+ - [screenshot()](/v7/screenshot) - Capture screenshots
+ - [Elements Reference](/v7/elements) - Complete Element API
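For the "complete inventory" use case the file above describes, a quick tally by `type` or `interactivity` can be more readable than dumping every element. The following is a sketch over hand-written data shaped like the documented `ParseResult`; `countBy` and `mockResult` are hypothetical names, and nothing here calls the TestDriver API.

```javascript
// Count elements by an arbitrary property, e.g. "type" or "interactivity".
function countBy(elements, key) {
  const counts = {};
  for (const el of elements) {
    counts[el[key]] = (counts[el[key]] ?? 0) + 1;
  }
  return counts;
}

// Hand-written stand-in for the object returned by testdriver.parse().
const mockResult = {
  elements: [
    { index: 0, type: "text", content: "Email", interactivity: "non-interactive" },
    { index: 1, type: "text", content: "Password", interactivity: "non-interactive" },
    { index: 2, type: "button", content: "Sign In", interactivity: "clickable" },
    { index: 3, type: "icon", content: "Help", interactivity: "clickable" },
  ],
};

console.log(countBy(mockResult.elements, "type"));          // { text: 2, button: 1, icon: 1 }
console.log(countBy(mockResult.elements, "interactivity")); // { 'non-interactive': 2, clickable: 2 }
```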
@@ -36,6 +36,15 @@ export async function logout(testdriver) {
  }
  ```

+ <Warning>
+ **Avoid hardcoding dynamic values in element descriptions.** Element selectors should describe the *type* of element, not specific content that might change.
+
+ **❌ Bad:** `await testdriver.find('profile name TestDriver in the top right')`
+ **✅ Good:** `await testdriver.find('user profile name in the top right')`
+
+ Hardcoded values like usernames, product names, or prices will cause tests to fail when the data changes. Use generic descriptions that work regardless of the specific content displayed.
+ </Warning>
+
  Now import and use these helpers in any test:

  ```javascript test/checkout.test.mjs
package/docs/docs.json CHANGED
@@ -107,7 +107,7 @@
  "/v7/hover",
  "/v7/mouse-down",
  "/v7/mouse-up",
- "/v7/ocr",
+ "/v7/parse",
  "/v7/press-keys",
  "/v7/right-click",
  "/v7/screenshot",
package/docs/v7/find.mdx CHANGED
@@ -38,6 +38,18 @@ const element = await testdriver.find(description, options)
  Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
  </ParamField>

+ <ParamField path="confidence" type="number">
+ Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
+ </ParamField>
+
+ <ParamField path="type" type="string">
+ Element type hint that wraps the description for better matching. Accepted values:
+ - `"text"` — Wraps the prompt as `The text "..."`
+ - `"image"` — Wraps the prompt as `The image "..."`
+ - `"ui"` — Wraps the prompt as `The UI element "..."`
+ - `"any"` — No wrapping, uses the description as-is (default behavior)
+ </ParamField>
+
  <ParamField path="zoom" type="boolean" default={false}>
  Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
  </ParamField>
@@ -233,6 +245,69 @@ This is useful for:
  ```
  </Check>

+ ## Confidence Threshold
+
+ Require a minimum AI confidence score for element matches. If the confidence is below the threshold, `find()` treats the result as not found:
+
+ ```javascript
+ // Require at least 90% confidence
+ const element = await testdriver.find('submit button', { confidence: 0.9 });
+
+ if (!element.found()) {
+   // AI found something but wasn't confident enough
+   throw new Error('Could not confidently locate submit button');
+ }
+
+ await element.click();
+ ```
+
+ This is useful for:
+ - Critical test steps where an incorrect click could cause cascading failures
+ - Distinguishing between similar elements (e.g., multiple buttons)
+ - Failing fast when the UI has changed unexpectedly
+
+ ```javascript
+ // Combine with timeout for robust polling with confidence gate
+ const element = await testdriver.find('success notification', {
+   confidence: 0.85,
+   timeout: 15000,
+ });
+ ```
+
+ <Tip>
+ The `confidence` value is a float between 0 and 1 (e.g., `0.9` = 90%). The AI returns its confidence with each find result, which you can also read from `element.confidence` after a successful find.
+ </Tip>
+ ## Element Type
+
+ Use the `type` option to hint what kind of element you're looking for. This wraps your description into a more specific prompt for the AI, improving match accuracy — especially when users provide short or ambiguous descriptions.
+
+ ```javascript
+ // Find text on the page
+ const label = await testdriver.find('Sign In', { type: 'text' });
+ // AI prompt becomes: The text "Sign In"
+
+ // Find an image
+ const logo = await testdriver.find('company logo', { type: 'image' });
+ // AI prompt becomes: The image "company logo"
+
+ // Find a UI element (button, input, checkbox, etc.)
+ const btn = await testdriver.find('Submit', { type: 'ui' });
+ // AI prompt becomes: The UI element "Submit"
+
+ // No wrapping — same as omitting the option
+ const el = await testdriver.find('the blue submit button', { type: 'any' });
+ ```
+
+ | Type | Prompt sent to AI |
+ |------|----|
+ | `"text"` | `The text "..."` |
+ | `"image"` | `The image "..."` |
+ | `"ui"` | `The UI element "..."` |
+ | `"any"` | Original description (no wrapping) |
+
+ <Tip>
+ This is particularly useful for short descriptions like `"Submit"` or `"Login"` where the AI may not know whether to look for a button, a link, or visible text. Specifying `type` removes the ambiguity.
+ </Tip>
  ## Polling for Dynamic Elements

  For elements that may not be immediately visible, use the `timeout` option to automatically poll: