testdriverai 7.3.16 → 7.3.17

@@ -37,6 +37,18 @@ const element = await testdriver.find(description, options)
37
37
  Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
38
38
  </ParamField>
39
39
 
40
+ <ParamField path="confidence" type="number">
41
+ Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
42
+ </ParamField>
43
+
44
+ <ParamField path="type" type="string">
45
+ Element type hint that wraps the description for better matching. Accepted values:
46
+ - `"text"` — Wraps the prompt as `The text "..."`
47
+ - `"image"` — Wraps the prompt as `The image "..."`
48
+ - `"ui"` — Wraps the prompt as `The UI element "..."`
49
+ - `"any"` — No wrapping, uses the description as-is (default behavior)
50
+ </ParamField>
51
+
40
52
  <ParamField path="zoom" type="boolean" default={false}>
41
53
  Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
42
54
  </ParamField>
@@ -0,0 +1,118 @@
1
+ ```skill
2
+ ---
3
+ name: testdriver:parse
4
+ description: Detect all UI elements on screen using OmniParser
5
+ ---
6
+ <!-- Generated from parse.mdx. DO NOT EDIT. -->
7
+
8
+ ## Overview
9
+
10
+ Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
11
+
12
+ This method analyzes the entire screen and returns every detected element. It's useful for:
13
+ - Understanding the full UI layout of a screen
14
+ - Finding all clickable or interactive elements
15
+ - Building custom element-based logic
16
+ - Debugging what elements TestDriver can detect
17
+ - Accessibility auditing
18
+
19
+ <Note>
20
+ **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
21
+ </Note>
22
+
23
+ ## Syntax
24
+
25
+ ```javascript
26
+ const result = await testdriver.parse()
27
+ ```
28
+
29
+ ## Parameters
30
+
31
+ None.
32
+
33
+ ## Returns
34
+
35
+ `Promise<ParseResult>` - Object containing detected UI elements
36
+
37
+ ### ParseResult
38
+
39
+ | Property | Type | Description |
40
+ |----------|------|-------------|
41
+ | `elements` | `ParsedElement[]` | Array of detected UI elements |
42
+ | `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
43
+ | `imageWidth` | `number` | Width of the analyzed screenshot |
44
+ | `imageHeight` | `number` | Height of the analyzed screenshot |
45
+
46
+ ### ParsedElement
47
+
48
+ | Property | Type | Description |
49
+ |----------|------|-------------|
50
+ | `index` | `number` | Element index |
51
+ | `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
52
+ | `content` | `string` | Text content or description of the element |
53
+ | `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
54
+ | `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
55
+ | `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
56
+
57
+ ## Examples
58
+
59
+ ### Get All Elements on Screen
60
+
61
+ ```javascript
62
+ const result = await testdriver.parse();
63
+ console.log(`Found ${result.elements.length} elements`);
64
+
65
+ result.elements.forEach((el, i) => {
66
+ console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
67
+ });
68
+ ```
69
+
70
+ ### Find Clickable Elements
71
+
72
+ ```javascript
73
+ const result = await testdriver.parse();
74
+
75
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
76
+ console.log(`Found ${clickable.length} clickable elements`);
77
+ ```
78
+
79
+ ### Find and Click an Element by Content
80
+
81
+ ```javascript
82
+ const result = await testdriver.parse();
83
+
84
+ const submitBtn = result.elements.find(e =>
85
+ e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
86
+ );
87
+
88
+ if (submitBtn) {
89
+ const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
90
+ const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
91
+ await testdriver.click({ x, y });
92
+ }
93
+ ```
94
+
95
+ ### Filter by Element Type
96
+
97
+ ```javascript
98
+ const result = await testdriver.parse();
99
+
100
+ const textElements = result.elements.filter(e => e.type === 'text');
101
+ const icons = result.elements.filter(e => e.type === 'icon');
102
+ const buttons = result.elements.filter(e => e.type === 'button');
103
+ ```
104
+
105
+ ## Best Practices
106
+
107
+ - Use `find()` for targeting specific elements — `parse()` is for full UI analysis
108
+ - Filter by `interactivity` to distinguish clickable vs non-interactive elements
109
+ - Wait for the page to stabilize before calling `parse()`
110
+ - Use the `annotatedImageUrl` for visual debugging
111
+
112
+ ## Related
113
+
114
+ - [find()](/v7/find) - AI-powered element location
115
+ - [assert()](/v7/assert) - Make AI-powered assertions about screen state
116
+ - [screenshot()](/v7/screenshot) - Capture screenshots
117
+ - [Elements Reference](/v7/elements) - Complete Element API
118
+ ```
package/CHANGELOG.md CHANGED
@@ -1,3 +1,12 @@
1
+ ## [7.3.17](https://github.com/testdriverai/testdriverai/compare/v7.3.16...v7.3.17) (2026-02-18)
2
+
3
+
4
+ ### Features
5
+
6
+ * add type option to find(), move confidence to API, rename ocr to parse ([#640](https://github.com/testdriverai/testdriverai/issues/640)) ([d98a94b](https://github.com/testdriverai/testdriverai/commit/d98a94bd05c135c74d9fed2ebb72c709fc643337))
7
+
8
+
9
+
1
10
  ## [7.3.16](https://github.com/testdriverai/testdriverai/compare/v7.3.14...v7.3.16) (2026-02-18)
2
11
 
3
12
 
package/agent/lib/sdk.js CHANGED
@@ -337,9 +337,11 @@ const createSDK = (emitter, config, sessionInstance) => {
337
337
  }
338
338
 
339
339
  const req = async (path, data, onChunk) => {
340
- // for each value of data, if it is empty remove it
340
+ // for each value of data, if it is null/undefined remove it
341
+ // Note: use == null to match both null and undefined, but preserve
342
+ // other falsy values like 0, false, and "" which may be intentional
341
343
  for (let key in data) {
342
- if (!data[key]) {
344
+ if (data[key] == null) {
343
345
  delete data[key];
344
346
  }
345
347
  }
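The `req` change above matters because request fields can legitimately be falsy. `confidence: 0` is a valid threshold, and `false` or `""` may be intentional. The old `!data[key]` check dropped all of them, while `data[key] == null` drops only `null` and `undefined`. Below is a minimal standalone sketch that compares the two filters. `stripOld` and `stripNew` are hypothetical names used for illustration and are not SDK exports. The SDK's `req` deletes keys from `data` in place, but these versions copy the input first so it is left unchanged.

```javascript
// Old behavior: drop every falsy value (0, false, "", null, undefined, NaN).
function stripOld(data) {
  const out = { ...data };
  for (const key in out) {
    if (!out[key]) {
      delete out[key];
    }
  }
  return out;
}

// New behavior: drop only null and undefined (`== null` matches both).
function stripNew(data) {
  const out = { ...data };
  for (const key in out) {
    if (out[key] == null) {
      delete out[key];
    }
  }
  return out;
}

// Illustrative find() payload with several falsy-but-meaningful values.
const payload = {
  description: "submit button",
  confidence: 0,
  zoom: false,
  cacheKey: "",
  timeout: undefined,
  ai: null,
};

console.log(stripOld(payload)); // only `description` survives
console.log(stripNew(payload)); // `confidence`, `zoom`, and `cacheKey` survive too
```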
@@ -37,6 +37,18 @@ const element = await testdriver.find(description, options)
37
37
  Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
38
38
  </ParamField>
39
39
 
40
+ <ParamField path="confidence" type="number">
41
+ Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
42
+ </ParamField>
43
+
44
+ <ParamField path="type" type="string">
45
+ Element type hint that wraps the description for better matching. Accepted values:
46
+ - `"text"` — Wraps the prompt as `The text "..."`
47
+ - `"image"` — Wraps the prompt as `The image "..."`
48
+ - `"ui"` — Wraps the prompt as `The UI element "..."`
49
+ - `"any"` — No wrapping, uses the description as-is (default behavior)
50
+ </ParamField>
51
+
40
52
  <ParamField path="zoom" type="boolean" default={false}>
41
53
  Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
42
54
  </ParamField>
@@ -232,6 +244,69 @@ This is useful for:
232
244
  ```
233
245
  </Check>
234
246
 
247
+ ## Confidence Threshold
248
+
249
+ Require a minimum AI confidence score for element matches. If the confidence is below the threshold, `find()` treats the result as not found:
250
+
251
+ ```javascript
252
+ // Require at least 90% confidence
253
+ const element = await testdriver.find('submit button', { confidence: 0.9 });
254
+
255
+ if (!element.found()) {
256
+ // Nothing matched, or the best match scored below the 0.9 threshold
257
+ throw new Error('Could not confidently locate submit button');
258
+ }
259
+
260
+ await element.click();
261
+ ```
262
+
263
+ This is useful for:
264
+ - Critical test steps where an incorrect click could cause cascading failures
265
+ - Distinguishing between similar elements (e.g., multiple buttons)
266
+ - Failing fast when the UI has changed unexpectedly
267
+
268
+ ```javascript
269
+ // Combine with timeout for robust polling with confidence gate
270
+ const element = await testdriver.find('success notification', {
271
+ confidence: 0.85,
272
+ timeout: 15000,
273
+ });
274
+ ```
275
+
276
+ <Tip>
277
+ The `confidence` value is a float between 0 and 1 (e.g., `0.9` = 90%). The AI returns its confidence with each find result, which you can also read from `element.confidence` after a successful find.
278
+ </Tip>
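The confidence-gated pattern above can be packaged as a small reusable helper. This is a minimal sketch. `clickConfidently` is a hypothetical name, and the helper uses only the calls documented on this page: `find(description, { confidence, timeout })`, `found()`, and `click()`. Rejecting values outside 0 to 1 is this sketch's own choice and is not documented SDK behavior.

```javascript
// Hypothetical helper: find an element behind a confidence gate and click it.
// It throws a descriptive error instead of clicking a weak match.
async function clickConfidently(testdriver, description, { confidence = 0.9, timeout } = {}) {
  // Validate the threshold; the negated range check also rejects NaN.
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence must be a number between 0 and 1, got ${confidence}`);
  }

  // Only send timeout when the caller supplied one.
  const options = { confidence };
  if (timeout !== undefined) options.timeout = timeout;

  const element = await testdriver.find(description, options);
  if (!element.found()) {
    throw new Error(`Could not locate "${description}" with confidence >= ${confidence}`);
  }
  await element.click();
  return element;
}
```

In a test, a call would look like `await clickConfidently(testdriver, 'submit button', { confidence: 0.9, timeout: 15000 })`.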
279
+ ## Element Type
280
+
281
+ Use the `type` option to hint what kind of element you're looking for. This wraps your description in a more specific prompt for the AI, which improves match accuracy, especially for short or ambiguous descriptions.
282
+
283
+ ```javascript
284
+ // Find text on the page
285
+ const label = await testdriver.find('Sign In', { type: 'text' });
286
+ // AI prompt becomes: The text "Sign In"
287
+
288
+ // Find an image
289
+ const logo = await testdriver.find('company logo', { type: 'image' });
290
+ // AI prompt becomes: The image "company logo"
291
+
292
+ // Find a UI element (button, input, checkbox, etc.)
293
+ const btn = await testdriver.find('Submit', { type: 'ui' });
294
+ // AI prompt becomes: The UI element "Submit"
295
+
296
+ // No wrapping — same as omitting the option
297
+ const el = await testdriver.find('the blue submit button', { type: 'any' });
298
+ ```
299
+
300
+ | Type | Prompt sent to AI |
301
+ |------|----|
302
+ | `"text"` | `The text "..."` |
303
+ | `"image"` | `The image "..."` |
304
+ | `"ui"` | `The UI element "..."` |
305
+ | `"any"` | Original description (no wrapping) |
306
+
307
+ <Tip>
308
+ This is particularly useful for short descriptions like `"Submit"` or `"Login"` where the AI may not know whether to look for a button, a link, or visible text. Specifying `type` removes the ambiguity.
309
+ </Tip>
235
310
  ## Polling for Dynamic Elements
236
311
 
237
312
  For elements that may not be immediately visible, use the `timeout` option to automatically poll:
@@ -0,0 +1,236 @@
1
+ ---
2
+ name: testdriver:parse
3
+ description: Detect all UI elements on screen using OmniParser
4
+ ---
5
+ <!-- Generated from parse.mdx. DO NOT EDIT. -->
6
+
7
+ ## Overview
8
+
9
+ Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
10
+
11
+ This method analyzes the entire screen and returns every detected element. It's useful for:
12
+ - Understanding the full UI layout of a screen
13
+ - Finding all clickable or interactive elements
14
+ - Building custom element-based logic
15
+ - Debugging what elements TestDriver can detect
16
+ - Accessibility auditing
17
+
18
+ <Note>
19
+ **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
20
+ </Note>
21
+
22
+ ## Syntax
23
+
24
+ ```javascript
25
+ const result = await testdriver.parse()
26
+ ```
27
+
28
+ ## Parameters
29
+
30
+ None.
31
+
32
+ ## Returns
33
+
34
+ `Promise<ParseResult>` - Object containing detected UI elements
35
+
36
+ ### ParseResult
37
+
38
+ | Property | Type | Description |
39
+ |----------|------|-------------|
40
+ | `elements` | `ParsedElement[]` | Array of detected UI elements |
41
+ | `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
42
+ | `imageWidth` | `number` | Width of the analyzed screenshot |
43
+ | `imageHeight` | `number` | Height of the analyzed screenshot |
44
+
45
+ ### ParsedElement
46
+
47
+ | Property | Type | Description |
48
+ |----------|------|-------------|
49
+ | `index` | `number` | Element index |
50
+ | `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
51
+ | `content` | `string` | Text content or description of the element |
52
+ | `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
53
+ | `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
54
+ | `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
55
+
56
+ ## Examples
57
+
58
+ ### Get All Elements on Screen
59
+
60
+ ```javascript
61
+ const result = await testdriver.parse();
62
+ console.log(`Found ${result.elements.length} elements`);
63
+
64
+ result.elements.forEach((el, i) => {
65
+ console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
66
+ });
67
+ ```
68
+
69
+ ### Find Clickable Elements
70
+
71
+ ```javascript
72
+ const result = await testdriver.parse();
73
+
74
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
75
+ console.log(`Found ${clickable.length} clickable elements`);
76
+
77
+ clickable.forEach(el => {
78
+ console.log(`- "${el.content}" at (${el.bbox.x0}, ${el.bbox.y0})`);
79
+ });
80
+ ```
81
+
82
+ ### Find and Click an Element by Content
83
+
84
+ ```javascript
85
+ const result = await testdriver.parse();
86
+
87
+ // Find a "Submit" button
88
+ const submitBtn = result.elements.find(e =>
89
+ e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
90
+ );
91
+
92
+ if (submitBtn) {
93
+ // Calculate center of the bounding box
94
+ const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
95
+ const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
96
+
97
+ await testdriver.click({ x, y });
98
+ }
99
+ ```
100
+
101
+ ### Filter by Element Type
102
+
103
+ ```javascript
104
+ const result = await testdriver.parse();
105
+
106
+ // Get all text elements
107
+ const textElements = result.elements.filter(e => e.type === 'text');
108
+ textElements.forEach(e => console.log(`Text: "${e.content}"`));
109
+
110
+ // Get all icons
111
+ const icons = result.elements.filter(e => e.type === 'icon');
112
+ console.log(`Found ${icons.length} icons`);
113
+
114
+ // Get all buttons
115
+ const buttons = result.elements.filter(e => e.type === 'button');
116
+ console.log(`Found ${buttons.length} buttons`);
117
+ ```
118
+
119
+ ### Build Custom Assertions
120
+
121
+ ```javascript
122
+ import { describe, expect, it } from "vitest";
123
+ import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
124
+
125
+ describe("Login Page", () => {
126
+ it("should have expected form elements", async (context) => {
127
+ const testdriver = TestDriver(context);
128
+
129
+ await testdriver.provision.chrome({
130
+ url: 'https://myapp.com/login',
131
+ });
132
+
133
+ const result = await testdriver.parse();
134
+
135
+ // Assert expected elements exist
136
+ const textContent = result.elements.map(e => e.content.toLowerCase());
138
+ // toContain() on an array needs an exact element match, so check substrings instead
139
+ expect(textContent.some(t => t.includes('email') || t.includes('password'))).toBe(true);
139
+
140
+ // Assert there are clickable elements
141
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
142
+ expect(clickable.length).toBeGreaterThan(0);
143
+ });
144
+ });
145
+ ```
146
+
147
+ ### Use Bounding Box Coordinates
148
+
149
+ ```javascript
150
+ const result = await testdriver.parse();
151
+
152
+ result.elements.forEach(el => {
153
+ // Pixel coordinates
154
+ console.log(`Element "${el.content}":`);
155
+ console.log(` bbox: (${el.bbox.x0}, ${el.bbox.y0}) to (${el.bbox.x1}, ${el.bbox.y1})`);
156
+ console.log(` size: ${el.boundingBox.width}x${el.boundingBox.height}`);
157
+ console.log(` position: left=${el.boundingBox.left}, top=${el.boundingBox.top}`);
158
+ });
159
+ ```
160
+
161
+ ### View Annotated Screenshot
162
+
163
+ ```javascript
164
+ const result = await testdriver.parse();
165
+
166
+ // The annotated image shows all detected elements with bounding boxes
167
+ console.log('Annotated screenshot:', result.annotatedImageUrl);
168
+ console.log(`Image dimensions: ${result.imageWidth}x${result.imageHeight}`);
169
+ ```
170
+
171
+ ## How It Works
172
+
173
+ 1. TestDriver captures a screenshot of the current screen
174
+ 2. The image is sent to the TestDriver API
175
+ 3. OmniParser v2 analyzes the image to detect all UI elements
176
+ 4. Each element is classified by type (text, icon, button, etc.) and interactivity
177
+ 5. Bounding boxes are returned in pixel coordinates matching the screen resolution
178
+
179
+ <Note>
180
+ OmniParser detects elements visually — it works with any UI framework, native apps, and even non-standard interfaces. It does not rely on DOM or accessibility trees.
181
+ </Note>
182
+
183
+ ## Best Practices
184
+
185
+ <AccordionGroup>
186
+ <Accordion title="Use find() for targeting specific elements">
187
+ For locating and interacting with a specific element, prefer `find()` which uses AI vision. Use `parse()` when you need a complete inventory of all elements on screen.
188
+
189
+ ```javascript
190
+ // Prefer this for clicking a specific element
191
+ await testdriver.find("Submit button").click();
192
+
193
+ // Use parse() for full UI analysis
194
+ const result = await testdriver.parse();
195
+ const allButtons = result.elements.filter(e => e.type === 'button');
196
+ ```
197
+ </Accordion>
198
+
199
+ <Accordion title="Filter by interactivity">
200
+ Use the `interactivity` field to distinguish between clickable and non-interactive elements.
201
+
202
+ ```javascript
203
+ const result = await testdriver.parse();
204
+ const interactive = result.elements.filter(e => e.interactivity === 'clickable');
205
+ const nonInteractive = result.elements.filter(e => e.interactivity === 'non-interactive');
206
+ ```
207
+ </Accordion>
208
+
209
+ <Accordion title="Wait for content to load">
210
+ If elements aren't being detected, the page may not be fully loaded. Add a wait first.
211
+
212
+ ```javascript
213
+ // Wait for page to stabilize
214
+ await testdriver.wait(2000);
215
+
216
+ // Then parse
217
+ const result = await testdriver.parse();
218
+ ```
219
+ </Accordion>
220
+
221
+ <Accordion title="Use the annotated image for debugging">
222
+ The `annotatedImageUrl` provides a visual overlay showing all detected elements with their bounding boxes — great for debugging.
223
+
224
+ ```javascript
225
+ const result = await testdriver.parse();
226
+ console.log('View annotated screenshot:', result.annotatedImageUrl);
227
+ ```
228
+ </Accordion>
229
+ </AccordionGroup>
230
+
231
+ ## Related
232
+
233
+ - [find()](/v7/find) - AI-powered element location
234
+ - [assert()](/v7/assert) - Make AI-powered assertions about screen state
235
+ - [screenshot()](/v7/screenshot) - Capture screenshots
236
+ - [Elements Reference](/v7/elements) - Complete Element API
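When you are debugging what `parse()` detects, a tally by type and by interactivity is often easier to scan than the full element list. This is a minimal sketch that works on any `ParseResult`-shaped object. `summarizeParse` is a hypothetical helper, and the sample data is invented for illustration. It is not real OmniParser output.

```javascript
// Hypothetical helper: count parsed elements by type and by interactivity.
function summarizeParse(result) {
  const byType = {};
  const byInteractivity = {};
  for (const el of result.elements) {
    byType[el.type] = (byType[el.type] ?? 0) + 1;
    byInteractivity[el.interactivity] = (byInteractivity[el.interactivity] ?? 0) + 1;
  }
  return { total: result.elements.length, byType, byInteractivity };
}

// Illustrative ParseResult-shaped sample (made up, not real output).
const sample = {
  elements: [
    { index: 0, type: "text", content: "Email", interactivity: "non-interactive" },
    { index: 1, type: "button", content: "Submit", interactivity: "clickable" },
    { index: 2, type: "icon", content: "search", interactivity: "clickable" },
  ],
};

console.log(summarizeParse(sample));
```

Against a live screen, you would pass the real result: `summarizeParse(await testdriver.parse())`.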
@@ -36,6 +36,15 @@ export async function logout(testdriver) {
36
36
  }
37
37
  ```
38
38
 
39
+ <Warning>
40
+ **Avoid hardcoding dynamic values in element descriptions.** Element selectors should describe the *type* of element, not specific content that might change.
41
+
42
+ - **❌ Bad:** `await testdriver.find('profile name TestDriver in the top right')`
43
+ - **✅ Good:** `await testdriver.find('user profile name in the top right')`
44
+
45
+ Hardcoded values like usernames, product names, or prices will cause tests to fail when the data changes. Use generic descriptions that work regardless of the specific content displayed.
46
+ </Warning>
47
+
39
48
  Now import and use these helpers in any test:
40
49
 
41
50
  ```javascript test/checkout.test.mjs
package/docs/docs.json CHANGED
@@ -107,7 +107,7 @@
107
107
  "/v7/hover",
108
108
  "/v7/mouse-down",
109
109
  "/v7/mouse-up",
110
- "/v7/ocr",
110
+ "/v7/parse",
111
111
  "/v7/press-keys",
112
112
  "/v7/right-click",
113
113
  "/v7/screenshot",
package/docs/v7/find.mdx CHANGED
@@ -38,6 +38,18 @@ const element = await testdriver.find(description, options)
38
38
  Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
39
39
  </ParamField>
40
40
 
41
+ <ParamField path="confidence" type="number">
42
+ Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
43
+ </ParamField>
44
+
45
+ <ParamField path="type" type="string">
46
+ Element type hint that wraps the description for better matching. Accepted values:
47
+ - `"text"` — Wraps the prompt as `The text "..."`
48
+ - `"image"` — Wraps the prompt as `The image "..."`
49
+ - `"ui"` — Wraps the prompt as `The UI element "..."`
50
+ - `"any"` — No wrapping, uses the description as-is (default behavior)
51
+ </ParamField>
52
+
41
53
  <ParamField path="zoom" type="boolean" default={false}>
42
54
  Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
43
55
  </ParamField>
@@ -233,6 +245,69 @@ This is useful for:
233
245
  ```
234
246
  </Check>
235
247
 
248
+ ## Confidence Threshold
249
+
250
+ Require a minimum AI confidence score for element matches. If the confidence is below the threshold, `find()` treats the result as not found:
251
+
252
+ ```javascript
253
+ // Require at least 90% confidence
254
+ const element = await testdriver.find('submit button', { confidence: 0.9 });
255
+
256
+ if (!element.found()) {
257
+ // Nothing matched, or the best match scored below the 0.9 threshold
258
+ throw new Error('Could not confidently locate submit button');
259
+ }
260
+
261
+ await element.click();
262
+ ```
263
+
264
+ This is useful for:
265
+ - Critical test steps where an incorrect click could cause cascading failures
266
+ - Distinguishing between similar elements (e.g., multiple buttons)
267
+ - Failing fast when the UI has changed unexpectedly
268
+
269
+ ```javascript
270
+ // Combine with timeout for robust polling with confidence gate
271
+ const element = await testdriver.find('success notification', {
272
+ confidence: 0.85,
273
+ timeout: 15000,
274
+ });
275
+ ```
276
+
277
+ <Tip>
278
+ The `confidence` value is a float between 0 and 1 (e.g., `0.9` = 90%). The AI returns its confidence with each find result, which you can also read from `element.confidence` after a successful find.
279
+ </Tip>
280
+ ## Element Type
281
+
282
+ Use the `type` option to hint what kind of element you're looking for. This wraps your description in a more specific prompt for the AI, which improves match accuracy, especially for short or ambiguous descriptions.
283
+
284
+ ```javascript
285
+ // Find text on the page
286
+ const label = await testdriver.find('Sign In', { type: 'text' });
287
+ // AI prompt becomes: The text "Sign In"
288
+
289
+ // Find an image
290
+ const logo = await testdriver.find('company logo', { type: 'image' });
291
+ // AI prompt becomes: The image "company logo"
292
+
293
+ // Find a UI element (button, input, checkbox, etc.)
294
+ const btn = await testdriver.find('Submit', { type: 'ui' });
295
+ // AI prompt becomes: The UI element "Submit"
296
+
297
+ // No wrapping — same as omitting the option
298
+ const el = await testdriver.find('the blue submit button', { type: 'any' });
299
+ ```
300
+
301
+ | Type | Prompt sent to AI |
302
+ |------|----|
303
+ | `"text"` | `The text "..."` |
304
+ | `"image"` | `The image "..."` |
305
+ | `"ui"` | `The UI element "..."` |
306
+ | `"any"` | Original description (no wrapping) |
307
+
308
+ <Tip>
309
+ This is particularly useful for short descriptions like `"Submit"` or `"Login"` where the AI may not know whether to look for a button, a link, or visible text. Specifying `type` removes the ambiguity.
310
+ </Tip>
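The wrapping shown in the table above can be expressed as a pure function. This minimal sketch only mirrors the documented prompt templates. `wrapDescription` is a hypothetical name, and the SDK's actual code may differ in details such as quote escaping. This sketch throws on an unknown `type`. That is an assumption, not documented SDK behavior.

```javascript
// Prompt templates as documented for the `type` option of find().
const TYPE_TEMPLATES = {
  text: (d) => `The text "${d}"`,
  image: (d) => `The image "${d}"`,
  ui: (d) => `The UI element "${d}"`,
  any: (d) => d, // no wrapping
};

// Hypothetical helper: build the prompt for a description and type hint.
// An omitted type behaves like "any".
function wrapDescription(description, type = "any") {
  // Own-property check so inherited keys like "constructor" are rejected.
  if (!Object.prototype.hasOwnProperty.call(TYPE_TEMPLATES, type)) {
    const allowed = Object.keys(TYPE_TEMPLATES).join(", ");
    throw new TypeError(`Unknown type "${type}"; expected one of: ${allowed}`);
  }
  return TYPE_TEMPLATES[type](description);
}

console.log(wrapDescription("Sign In", "text")); // The text "Sign In"
console.log(wrapDescription("Submit", "ui")); // The UI element "Submit"
console.log(wrapDescription("the blue submit button")); // the blue submit button
```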
236
311
  ## Polling for Dynamic Elements
237
312
 
238
313
  For elements that may not be immediately visible, use the `timeout` option to automatically poll:
@@ -0,0 +1,237 @@
1
+ ---
2
+ title: "parse()"
3
+ sidebarTitle: "parse"
4
+ description: "Detect all UI elements on screen using OmniParser"
5
+ icon: "diagram-project"
6
+ ---
7
+
8
+ ## Overview
9
+
10
+ Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
11
+
12
+ This method analyzes the entire screen and returns every detected element. It's useful for:
13
+ - Understanding the full UI layout of a screen
14
+ - Finding all clickable or interactive elements
15
+ - Building custom element-based logic
16
+ - Debugging what elements TestDriver can detect
17
+ - Accessibility auditing
18
+
19
+ <Note>
20
+ **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
21
+ </Note>
22
+
23
+ ## Syntax
24
+
25
+ ```javascript
26
+ const result = await testdriver.parse()
27
+ ```
28
+
29
+ ## Parameters
30
+
31
+ None.
32
+
33
+ ## Returns
34
+
35
+ `Promise<ParseResult>` - Object containing detected UI elements
36
+
37
+ ### ParseResult
38
+
39
+ | Property | Type | Description |
40
+ |----------|------|-------------|
41
+ | `elements` | `ParsedElement[]` | Array of detected UI elements |
42
+ | `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
43
+ | `imageWidth` | `number` | Width of the analyzed screenshot |
44
+ | `imageHeight` | `number` | Height of the analyzed screenshot |
45
+
46
+ ### ParsedElement
47
+
48
+ | Property | Type | Description |
49
+ |----------|------|-------------|
50
+ | `index` | `number` | Element index |
51
+ | `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
52
+ | `content` | `string` | Text content or description of the element |
53
+ | `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
54
+ | `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
55
+ | `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
56
+
57
+ ## Examples
58
+
59
+ ### Get All Elements on Screen
60
+
61
+ ```javascript
62
+ const result = await testdriver.parse();
63
+ console.log(`Found ${result.elements.length} elements`);
64
+
65
+ result.elements.forEach((el, i) => {
66
+ console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
67
+ });
68
+ ```
69
+
70
+ ### Find Clickable Elements
71
+
72
+ ```javascript
73
+ const result = await testdriver.parse();
74
+
75
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
76
+ console.log(`Found ${clickable.length} clickable elements`);
77
+
78
+ clickable.forEach(el => {
79
+ console.log(`- "${el.content}" at (${el.bbox.x0}, ${el.bbox.y0})`);
80
+ });
81
+ ```
82
+
83
+ ### Find and Click an Element by Content
84
+
85
+ ```javascript
86
+ const result = await testdriver.parse();
87
+
88
+ // Find a "Submit" button
89
+ const submitBtn = result.elements.find(e =>
90
+ e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
91
+ );
92
+
93
+ if (submitBtn) {
94
+ // Calculate center of the bounding box
95
+ const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
96
+ const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
97
+
98
+ await testdriver.click({ x, y });
99
+ }
100
+ ```
101
+
102
+ ### Filter by Element Type
103
+
104
+ ```javascript
105
+ const result = await testdriver.parse();
106
+
107
+ // Get all text elements
108
+ const textElements = result.elements.filter(e => e.type === 'text');
109
+ textElements.forEach(e => console.log(`Text: "${e.content}"`));
110
+
111
+ // Get all icons
112
+ const icons = result.elements.filter(e => e.type === 'icon');
113
+ console.log(`Found ${icons.length} icons`);
114
+
115
+ // Get all buttons
116
+ const buttons = result.elements.filter(e => e.type === 'button');
117
+ console.log(`Found ${buttons.length} buttons`);
118
+ ```
119
+
120
+ ### Build Custom Assertions
121
+
122
+ ```javascript
123
+ import { describe, expect, it } from "vitest";
124
+ import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
125
+
126
+ describe("Login Page", () => {
127
+ it("should have expected form elements", async (context) => {
128
+ const testdriver = TestDriver(context);
129
+
130
+ await testdriver.provision.chrome({
131
+ url: 'https://myapp.com/login',
132
+ });
133
+
134
+ const result = await testdriver.parse();
135
+
136
+ // Assert expected elements exist
137
+ const textContent = result.elements.map(e => e.content.toLowerCase());
138
+ // toContain() on an array needs an exact element match, so check substrings instead
139
+ expect(textContent.some(t => t.includes('email') || t.includes('password'))).toBe(true);
140
+
141
+ // Assert there are clickable elements
142
+ const clickable = result.elements.filter(e => e.interactivity === 'clickable');
143
+ expect(clickable.length).toBeGreaterThan(0);
144
+ });
145
+ });
146
+ ```
147
+
148
+ ### Use Bounding Box Coordinates
149
+
150
+ ```javascript
151
+ const result = await testdriver.parse();
152
+
153
+ result.elements.forEach(el => {
154
+ // Pixel coordinates
155
+ console.log(`Element "${el.content}":`);
156
+ console.log(` bbox: (${el.bbox.x0}, ${el.bbox.y0}) to (${el.bbox.x1}, ${el.bbox.y1})`);
157
+ console.log(` size: ${el.boundingBox.width}x${el.boundingBox.height}`);
158
+ console.log(` position: left=${el.boundingBox.left}, top=${el.boundingBox.top}`);
159
+ });
160
+ ```
161
+
162
+ ### View Annotated Screenshot
163
+
164
+ ```javascript
165
+ const result = await testdriver.parse();
166
+
167
+ // The annotated image shows all detected elements with bounding boxes
168
+ console.log('Annotated screenshot:', result.annotatedImageUrl);
169
+ console.log(`Image dimensions: ${result.imageWidth}x${result.imageHeight}`);
170
+ ```
171
+
172
+ ## How It Works
173
+
174
+ 1. TestDriver captures a screenshot of the current screen
175
+ 2. The image is sent to the TestDriver API
176
+ 3. OmniParser v2 analyzes the image to detect all UI elements
177
+ 4. Each element is classified by type (text, icon, button, etc.) and interactivity
178
+ 5. Bounding boxes are returned in pixel coordinates matching the screen resolution
179
+
180
+ <Note>
181
+ OmniParser detects elements visually — it works with any UI framework, native apps, and even non-standard interfaces. It does not rely on DOM or accessibility trees.
182
+ </Note>
183
+
184
+ ## Best Practices
185
+
186
+ <AccordionGroup>
187
+ <Accordion title="Use find() for targeting specific elements">
188
+ For locating and interacting with a specific element, prefer `find()` which uses AI vision. Use `parse()` when you need a complete inventory of all elements on screen.
189
+
190
+ ```javascript
191
+ // Prefer this for clicking a specific element
192
+ await testdriver.find("Submit button").click();
193
+
194
+ // Use parse() for full UI analysis
195
+ const result = await testdriver.parse();
196
+ const allButtons = result.elements.filter(e => e.type === 'button');
197
+ ```
198
+ </Accordion>
199
+
200
+ <Accordion title="Filter by interactivity">
201
+ Use the `interactivity` field to distinguish between clickable and non-interactive elements.
202
+
203
+ ```javascript
204
+ const result = await testdriver.parse();
205
+ const interactive = result.elements.filter(e => e.interactivity === 'clickable');
206
+ const nonInteractive = result.elements.filter(e => e.interactivity === 'non-interactive');
207
+ ```
208
+ </Accordion>
209
+
210
+ <Accordion title="Wait for content to load">
211
+ If elements aren't being detected, the page may not be fully loaded. Add a wait first.
212
+
213
+ ```javascript
214
+ // Wait for page to stabilize
215
+ await testdriver.wait(2000);
216
+
217
+ // Then parse
218
+ const result = await testdriver.parse();
219
+ ```
220
+ </Accordion>
221
+
222
+ <Accordion title="Use the annotated image for debugging">
223
+ The `annotatedImageUrl` provides a visual overlay showing all detected elements with their bounding boxes — great for debugging.
224
+
225
+ ```javascript
226
+ const result = await testdriver.parse();
227
+ console.log('View annotated screenshot:', result.annotatedImageUrl);
228
+ ```
229
+ </Accordion>
230
+ </AccordionGroup>
231
+
232
+ ## Related
233
+
234
+ - [find()](/v7/find) - AI-powered element location
235
+ - [assert()](/v7/assert) - Make AI-powered assertions about screen state
236
+ - [screenshot()](/v7/screenshot) - Capture screenshots
237
+ - [Elements Reference](/v7/elements) - Complete Element API
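The new `parse()` page filters `result.elements` by `type` (e.g. `'button'`) and by `interactivity` (`'clickable'`, `'non-interactive'`) in separate passes. The sketch below does both in one pass. It assumes only that each element carries string `type` and `interactivity` fields, as in those examples. The helper name `groupParsedElements` and the sample array are hypothetical and are not `parse()` output. Real elements also carry fields such as bounding boxes, and the helper passes them through unchanged.

```javascript
// Hypothetical helper: groups parse() elements by interactivity, then by type.
// Assumes each element has string `type` and `interactivity` fields, as used
// in the parse() filter examples; all other fields are kept untouched.
function groupParsedElements(elements) {
  const groups = {};
  for (const el of elements) {
    const level = el.interactivity ?? "unknown";
    const type = el.type ?? "unknown";
    groups[level] ??= {};
    groups[level][type] ??= [];
    groups[level][type].push(el);
  }
  return groups;
}

// Illustrative sample shaped like result.elements (not real API output).
const sampleElements = [
  { type: "button", interactivity: "clickable", content: "Submit" },
  { type: "text", interactivity: "non-interactive", content: "Welcome" },
  { type: "button", interactivity: "clickable", content: "Cancel" },
];

const grouped = groupParsedElements(sampleElements);
console.log(grouped.clickable.button.length); // 2
```

In a test you would pass `(await testdriver.parse()).elements` instead of the sample array.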
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "testdriverai",
3
- "version": "7.3.16",
3
+ "version": "7.3.17",
4
4
  "description": "Next generation autonomous AI agent for end-to-end testing of web & desktop",
5
5
  "main": "sdk.js",
6
6
  "types": "sdk.d.ts",
package/sdk.d.ts CHANGED
@@ -1082,7 +1082,7 @@ export default class TestDriverSDK {
1082
1082
  find(description: string, cacheThreshold?: number): ChainableElementPromise;
1083
1083
  find(
1084
1084
  description: string,
1085
- options?: { cacheThreshold?: number; cacheKey?: string; timeout?: number; ai?: AIConfig; cache?: { thresholds?: { screen?: number; element?: number } } },
1085
+ options?: { cacheThreshold?: number; cacheKey?: string; timeout?: number; confidence?: number; type?: "text" | "image" | "ui" | "any"; ai?: AIConfig; cache?: { thresholds?: { screen?: number; element?: number } } },
1086
1086
  ): ChainableElementPromise;
1087
1087
 
1088
1088
  /**
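The updated signature types the new hint as `type?: "text" | "image" | "ui" | "any"`. The `find()` docs in this release say the hint wraps the description, for example as `The text "..."`. The `sdk.js` hunk only forwards `type` in the request, so this diff does not show where the wrapping is applied. The sketch below reproduces the documented mapping client-side. It is not the SDK's implementation. The name `wrapDescription` is hypothetical, and rejecting unknown values is a choice made only for this sketch.

```javascript
// Hypothetical mirror of the documented `type` hint wrapping for find().
// Based only on the documented formats; not the SDK's actual code path.
const TYPE_PREFIXES = {
  text: "The text",
  image: "The image",
  ui: "The UI element",
};

function wrapDescription(description, type = "any") {
  if (type == null || type === "any") {
    return description; // "any" (or no hint): description is used as-is
  }
  const prefix = TYPE_PREFIXES[type];
  if (!prefix) {
    throw new TypeError(`Unknown element type hint: ${type}`);
  }
  return `${prefix} "${description}"`;
}

console.log(wrapDescription("Submit", "text")); // The text "Submit"
console.log(wrapDescription("Submit button", "any")); // Submit button
```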
package/sdk.js CHANGED
@@ -476,6 +476,9 @@ class Element {
476
476
  let zoom = false; // Default to disabled, enable with zoom: true
477
477
  let perCommandAi = null; // Per-command AI config override
478
478
 
479
+ let minConfidence = null; // Minimum confidence threshold
480
+ let elementType = null; // Element type hint: "text", "image", "ui", or "any"
481
+
479
482
  if (typeof options === "number") {
480
483
  // Legacy: options is just a number threshold
481
484
  cacheThreshold = options;
@@ -485,6 +488,10 @@ class Element {
485
488
  cacheThreshold = options.cacheThreshold ?? null;
486
489
  // zoom defaults to false unless explicitly set to true
487
490
  zoom = options.zoom === true;
491
+ // Minimum confidence threshold: fail find if AI confidence is below this value
492
+ minConfidence = options.confidence ?? null;
493
+ // Element type hint for prompt wrapping
494
+ elementType = options.type ?? null;
488
495
  // Per-command cache thresholds: { cache: { thresholds: { screen: 0.1, element: 0.2 } } }
489
496
  if (typeof options.cache === "object" && options.cache?.thresholds) {
490
497
  perCommandThresholds = options.cache.thresholds;
@@ -554,6 +561,8 @@ class Element {
554
561
  os: this.sdk.os,
555
562
  resolution: this.sdk.resolution,
556
563
  zoom: zoom,
564
+ confidence: minConfidence,
565
+ type: elementType,
557
566
  ai: {
558
567
  ...this.sdk.aiConfig,
559
568
  ...(perCommandAi || {}),
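The `sdk.js` hunks normalize `find()` options before the request is built. A bare number is the legacy `cacheThreshold`. An object supplies `cacheThreshold`, `zoom` (enabled only by `zoom: true`), and the new `confidence` and `type` fields, each defaulting to `null`. The sketch below restates that handling as a standalone function. The branch condition for object options is not visible in the diff, so `options && typeof options === "object"` is an assumption.

The sketch also adds a hypothetical `meetsConfidence` check for the documented rule that a score below the threshold is treated as not found. The diff does not show where that comparison actually runs. Both function names are illustrative only.

```javascript
// Hypothetical standalone version of the option handling in the sdk.js hunks.
// Assumed: object options are detected with `options && typeof options === "object"`.
function normalizeFindOptions(options) {
  let cacheThreshold = null;
  let zoom = false; // disabled unless options.zoom === true
  let minConfidence = null; // added in 7.3.17
  let elementType = null; // added in 7.3.17: "text" | "image" | "ui" | "any"

  if (typeof options === "number") {
    // Legacy form: find(description, <cacheThreshold>)
    cacheThreshold = options;
  } else if (options && typeof options === "object") {
    cacheThreshold = options.cacheThreshold ?? null;
    zoom = options.zoom === true;
    minConfidence = options.confidence ?? null;
    elementType = options.type ?? null;
  }
  return { cacheThreshold, zoom, confidence: minConfidence, type: elementType };
}

// Hypothetical check for the documented rule: a score below the threshold
// counts as "not found"; a null threshold disables the check.
function meetsConfidence(score, minConfidence) {
  if (minConfidence == null) return true;
  return typeof score === "number" && score >= minConfidence;
}

const opts = normalizeFindOptions({ confidence: 0.9, type: "ui" });
console.log(opts.type, opts.confidence); // ui 0.9
console.log(meetsConfidence(0.85, opts.confidence)); // false
```

A score equal to the threshold passes, because the docs describe only scores *below* the value as failures.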
@@ -1,235 +0,0 @@
1
- ---
2
- name: testdriver:ocr
3
- description: Extract all visible text from the screen using OCR
4
- ---
5
- <!-- Generated from ocr.mdx. DO NOT EDIT. -->
6
-
7
- ## Overview
8
-
9
- Extract all visible text from the current screen using Tesseract OCR. Returns structured data including each word's text content, bounding box coordinates, and confidence scores.
10
-
11
- This method runs OCR on-demand and returns the results immediately. It's useful for:
12
- - Verifying text content on screen
13
- - Finding elements by their text when visual matching alone isn't enough
14
- - Debugging what text TestDriver can "see"
15
- - Building custom text-based assertions
16
-
17
- <Note>
18
- **Performance**: OCR runs server-side using Tesseract.js with a worker pool for fast extraction. A typical screenshot processes in 200-500ms.
19
- </Note>
20
-
21
- ## Syntax
22
-
23
- ```javascript
24
- const result = await testdriver.ocr()
25
- ```
26
-
27
- ## Parameters
28
-
29
- None.
30
-
31
- ## Returns
32
-
33
- `Promise<OCRResult>` - Object containing extracted text data
34
-
35
- ### OCRResult
36
-
37
- | Property | Type | Description |
38
- |----------|------|-------------|
39
- | `words` | `OCRWord[]` | Array of extracted words with positions |
40
- | `fullText` | `string` | All text concatenated with spaces |
41
- | `confidence` | `number` | Overall OCR confidence (0-100) |
42
- | `imageWidth` | `number` | Width of the analyzed screenshot |
43
- | `imageHeight` | `number` | Height of the analyzed screenshot |
44
-
45
- ### OCRWord
46
-
47
- | Property | Type | Description |
48
- |----------|------|-------------|
49
- | `content` | `string` | The word's text content |
50
- | `confidence` | `number` | Confidence score for this word (0-100) |
51
- | `bbox.x0` | `number` | Left edge X coordinate |
52
- | `bbox.y0` | `number` | Top edge Y coordinate |
53
- | `bbox.x1` | `number` | Right edge X coordinate |
54
- | `bbox.y1` | `number` | Bottom edge Y coordinate |
55
-
56
- ## Examples
57
-
58
- ### Get All Text on Screen
59
-
60
- ```javascript
61
- const result = await testdriver.ocr();
62
- console.log(result.fullText);
63
- // "Welcome to TestDriver Sign In Email Password Submit"
64
-
65
- console.log(`Found ${result.words.length} words with ${result.confidence}% confidence`);
66
- ```
67
-
68
- ### Check if Text Exists
69
-
70
- ```javascript
71
- const result = await testdriver.ocr();
72
-
73
- // Check for error message
74
- const hasError = result.words.some(w =>
75
- w.content.toLowerCase().includes('error')
76
- );
77
-
78
- if (hasError) {
79
- console.log('Error message detected on screen');
80
- }
81
- ```
82
-
83
- ### Find and Click Text
84
-
85
- ```javascript
86
- const result = await testdriver.ocr();
87
-
88
- // Find the "Submit" button text
89
- const submitWord = result.words.find(w => w.content === 'Submit');
90
-
91
- if (submitWord) {
92
- // Calculate center of the word's bounding box
93
- const x = Math.round((submitWord.bbox.x0 + submitWord.bbox.x1) / 2);
94
- const y = Math.round((submitWord.bbox.y0 + submitWord.bbox.y1) / 2);
95
-
96
- // Click at those coordinates
97
- await testdriver.click({ x, y });
98
- }
99
- ```
100
-
101
- ### Filter Words by Confidence
102
-
103
- ```javascript
104
- const result = await testdriver.ocr();
105
-
106
- // Only use high-confidence words (90%+)
107
- const reliableWords = result.words.filter(w => w.confidence >= 90);
108
-
109
- console.log('High confidence words:', reliableWords.map(w => w.content));
110
- ```
111
-
112
- ### Build Custom Assertions
113
-
114
- ```javascript
115
- import { describe, expect, it } from "vitest";
116
- import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
117
-
118
- describe("Login Page", () => {
119
- it("should show form labels", async (context) => {
120
- const testdriver = TestDriver(context);
121
-
122
- await testdriver.provision.chrome({
123
- url: 'https://myapp.com/login',
124
- });
125
-
126
- const result = await testdriver.ocr();
127
-
128
- // Assert expected labels are present
129
- expect(result.fullText).toContain('Email');
130
- expect(result.fullText).toContain('Password');
131
- expect(result.fullText).toContain('Sign In');
132
- });
133
- });
134
- ```
135
-
136
- ### Debug Screen Content
137
-
138
- ```javascript
139
- // Useful for debugging what TestDriver can see
140
- const result = await testdriver.ocr();
141
-
142
- console.log('=== Screen Text ===');
143
- console.log(result.fullText);
144
- console.log('');
145
-
146
- console.log('=== Word Details ===');
147
- result.words.forEach((word, i) => {
148
- console.log(`${i + 1}. "${word.content}" at (${word.bbox.x0}, ${word.bbox.y0}) - ${word.confidence}% confidence`);
149
- });
150
- ```
151
-
152
- ### Find Multiple Instances
153
-
154
- ```javascript
155
- const result = await testdriver.ocr();
156
-
157
- // Find all instances of "Button" text
158
- const buttons = result.words.filter(w =>
159
- w.content.toLowerCase() === 'button'
160
- );
161
-
162
- console.log(`Found ${buttons.length} buttons on screen`);
163
-
164
- buttons.forEach((btn, i) => {
165
- console.log(`Button ${i + 1} at position (${btn.bbox.x0}, ${btn.bbox.y0})`);
166
- });
167
- ```
168
-
169
- ## How It Works
170
-
171
- 1. TestDriver captures a screenshot of the current screen
172
- 2. The image is sent to the TestDriver API
173
- 3. Tesseract.js processes the image server-side with multiple workers
174
- 4. The API returns structured data with text and positions
175
- 5. Bounding box coordinates are scaled to match the original screen resolution
176
-
177
- <Note>
178
- OCR works best with clear, readable text. Very small text, unusual fonts, or low-contrast text may have lower confidence scores or be missed entirely.
179
- </Note>
180
-
181
- ## Best Practices
182
-
183
- <AccordionGroup>
184
- <Accordion title="Use find() for element location">
185
- For locating elements, prefer `find()` which uses AI vision. Use `ocr()` when you need raw text data or want to build custom text-based logic.
186
-
187
- ```javascript
188
- // Prefer this for clicking elements
189
- await testdriver.find("Submit button").click();
190
-
191
- // Use ocr() for text verification or custom logic
192
- const result = await testdriver.ocr();
193
- expect(result.fullText).toContain('Success');
194
- ```
195
- </Accordion>
196
-
197
- <Accordion title="Filter by confidence">
198
- OCR can sometimes misread characters. Filter by confidence score when accuracy is critical.
199
-
200
- ```javascript
201
- const result = await testdriver.ocr();
202
- const reliable = result.words.filter(w => w.confidence >= 85);
203
- ```
204
- </Accordion>
205
-
206
- <Accordion title="Handle case sensitivity">
207
- Text matching should usually be case-insensitive since OCR capitalization can vary.
208
-
209
- ```javascript
210
- const result = await testdriver.ocr();
211
- const hasLogin = result.words.some(w =>
212
- w.content.toLowerCase() === 'login'
213
- );
214
- ```
215
- </Accordion>
216
-
217
- <Accordion title="Wait for content to load">
218
- If text isn't being found, the page may not be fully loaded. Add a wait or use `waitForText()`.
219
-
220
- ```javascript
221
- // Wait for specific text to appear
222
- await testdriver.waitForText("Welcome");
223
-
224
- // Then run OCR
225
- const result = await testdriver.ocr();
226
- ```
227
- </Accordion>
228
- </AccordionGroup>
229
-
230
- ## Related
231
-
232
- - [find()](/v7/find) - AI-powered element location
233
- - [assert()](/v7/assert) - Make AI-powered assertions about screen state
234
- - [waitForText()](/v7/waiting-for-elements) - Wait for text to appear on screen
235
- - [screenshot()](/v7/screenshot) - Capture screenshots
package/docs/v7/ocr.mdx DELETED
@@ -1,236 +0,0 @@
1
- ---
2
- title: "ocr()"
3
- sidebarTitle: "ocr"
4
- description: "Extract all visible text from the screen using OCR"
5
- icon: "text"
6
- ---
7
-
8
- ## Overview
9
-
10
- Extract all visible text from the current screen using Tesseract OCR. Returns structured data including each word's text content, bounding box coordinates, and confidence scores.
11
-
12
- This method runs OCR on-demand and returns the results immediately. It's useful for:
13
- - Verifying text content on screen
14
- - Finding elements by their text when visual matching alone isn't enough
15
- - Debugging what text TestDriver can "see"
16
- - Building custom text-based assertions
17
-
18
- <Note>
19
- **Performance**: OCR runs server-side using Tesseract.js with a worker pool for fast extraction. A typical screenshot processes in 200-500ms.
20
- </Note>
21
-
22
- ## Syntax
23
-
24
- ```javascript
25
- const result = await testdriver.ocr()
26
- ```
27
-
28
- ## Parameters
29
-
30
- None.
31
-
32
- ## Returns
33
-
34
- `Promise<OCRResult>` - Object containing extracted text data
35
-
36
- ### OCRResult
37
-
38
- | Property | Type | Description |
39
- |----------|------|-------------|
40
- | `words` | `OCRWord[]` | Array of extracted words with positions |
41
- | `fullText` | `string` | All text concatenated with spaces |
42
- | `confidence` | `number` | Overall OCR confidence (0-100) |
43
- | `imageWidth` | `number` | Width of the analyzed screenshot |
44
- | `imageHeight` | `number` | Height of the analyzed screenshot |
45
-
46
- ### OCRWord
47
-
48
- | Property | Type | Description |
49
- |----------|------|-------------|
50
- | `content` | `string` | The word's text content |
51
- | `confidence` | `number` | Confidence score for this word (0-100) |
52
- | `bbox.x0` | `number` | Left edge X coordinate |
53
- | `bbox.y0` | `number` | Top edge Y coordinate |
54
- | `bbox.x1` | `number` | Right edge X coordinate |
55
- | `bbox.y1` | `number` | Bottom edge Y coordinate |
56
-
57
- ## Examples
58
-
59
- ### Get All Text on Screen
60
-
61
- ```javascript
62
- const result = await testdriver.ocr();
63
- console.log(result.fullText);
64
- // "Welcome to TestDriver Sign In Email Password Submit"
65
-
66
- console.log(`Found ${result.words.length} words with ${result.confidence}% confidence`);
67
- ```
68
-
69
- ### Check if Text Exists
70
-
71
- ```javascript
72
- const result = await testdriver.ocr();
73
-
74
- // Check for error message
75
- const hasError = result.words.some(w =>
76
- w.content.toLowerCase().includes('error')
77
- );
78
-
79
- if (hasError) {
80
- console.log('Error message detected on screen');
81
- }
82
- ```
83
-
84
- ### Find and Click Text
85
-
86
- ```javascript
87
- const result = await testdriver.ocr();
88
-
89
- // Find the "Submit" button text
90
- const submitWord = result.words.find(w => w.content === 'Submit');
91
-
92
- if (submitWord) {
93
- // Calculate center of the word's bounding box
94
- const x = Math.round((submitWord.bbox.x0 + submitWord.bbox.x1) / 2);
95
- const y = Math.round((submitWord.bbox.y0 + submitWord.bbox.y1) / 2);
96
-
97
- // Click at those coordinates
98
- await testdriver.click({ x, y });
99
- }
100
- ```
101
-
102
- ### Filter Words by Confidence
103
-
104
- ```javascript
105
- const result = await testdriver.ocr();
106
-
107
- // Only use high-confidence words (90%+)
108
- const reliableWords = result.words.filter(w => w.confidence >= 90);
109
-
110
- console.log('High confidence words:', reliableWords.map(w => w.content));
111
- ```
112
-
113
- ### Build Custom Assertions
114
-
115
- ```javascript
116
- import { describe, expect, it } from "vitest";
117
- import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
118
-
119
- describe("Login Page", () => {
120
- it("should show form labels", async (context) => {
121
- const testdriver = TestDriver(context);
122
-
123
- await testdriver.provision.chrome({
124
- url: 'https://myapp.com/login',
125
- });
126
-
127
- const result = await testdriver.ocr();
128
-
129
- // Assert expected labels are present
130
- expect(result.fullText).toContain('Email');
131
- expect(result.fullText).toContain('Password');
132
- expect(result.fullText).toContain('Sign In');
133
- });
134
- });
135
- ```
136
-
137
- ### Debug Screen Content
138
-
139
- ```javascript
140
- // Useful for debugging what TestDriver can see
141
- const result = await testdriver.ocr();
142
-
143
- console.log('=== Screen Text ===');
144
- console.log(result.fullText);
145
- console.log('');
146
-
147
- console.log('=== Word Details ===');
148
- result.words.forEach((word, i) => {
149
- console.log(`${i + 1}. "${word.content}" at (${word.bbox.x0}, ${word.bbox.y0}) - ${word.confidence}% confidence`);
150
- });
151
- ```
152
-
153
- ### Find Multiple Instances
154
-
155
- ```javascript
156
- const result = await testdriver.ocr();
157
-
158
- // Find all instances of "Button" text
159
- const buttons = result.words.filter(w =>
160
- w.content.toLowerCase() === 'button'
161
- );
162
-
163
- console.log(`Found ${buttons.length} buttons on screen`);
164
-
165
- buttons.forEach((btn, i) => {
166
- console.log(`Button ${i + 1} at position (${btn.bbox.x0}, ${btn.bbox.y0})`);
167
- });
168
- ```
169
-
170
- ## How It Works
171
-
172
- 1. TestDriver captures a screenshot of the current screen
173
- 2. The image is sent to the TestDriver API
174
- 3. Tesseract.js processes the image server-side with multiple workers
175
- 4. The API returns structured data with text and positions
176
- 5. Bounding box coordinates are scaled to match the original screen resolution
177
-
178
- <Note>
179
- OCR works best with clear, readable text. Very small text, unusual fonts, or low-contrast text may have lower confidence scores or be missed entirely.
180
- </Note>
181
-
182
- ## Best Practices
183
-
184
- <AccordionGroup>
185
- <Accordion title="Use find() for element location">
186
- For locating elements, prefer `find()` which uses AI vision. Use `ocr()` when you need raw text data or want to build custom text-based logic.
187
-
188
- ```javascript
189
- // Prefer this for clicking elements
190
- await testdriver.find("Submit button").click();
191
-
192
- // Use ocr() for text verification or custom logic
193
- const result = await testdriver.ocr();
194
- expect(result.fullText).toContain('Success');
195
- ```
196
- </Accordion>
197
-
198
- <Accordion title="Filter by confidence">
199
- OCR can sometimes misread characters. Filter by confidence score when accuracy is critical.
200
-
201
- ```javascript
202
- const result = await testdriver.ocr();
203
- const reliable = result.words.filter(w => w.confidence >= 85);
204
- ```
205
- </Accordion>
206
-
207
- <Accordion title="Handle case sensitivity">
208
- Text matching should usually be case-insensitive since OCR capitalization can vary.
209
-
210
- ```javascript
211
- const result = await testdriver.ocr();
212
- const hasLogin = result.words.some(w =>
213
- w.content.toLowerCase() === 'login'
214
- );
215
- ```
216
- </Accordion>
217
-
218
- <Accordion title="Wait for content to load">
219
- If text isn't being found, the page may not be fully loaded. Add a wait or use `waitForText()`.
220
-
221
- ```javascript
222
- // Wait for specific text to appear
223
- await testdriver.waitForText("Welcome");
224
-
225
- // Then run OCR
226
- const result = await testdriver.ocr();
227
- ```
228
- </Accordion>
229
- </AccordionGroup>
230
-
231
- ## Related
232
-
233
- - [find()](/v7/find) - AI-powered element location
234
- - [assert()](/v7/assert) - Make AI-powered assertions about screen state
235
- - [waitForText()](/v7/waiting-for-elements) - Wait for text to appear on screen
236
- - [screenshot()](/v7/screenshot) - Capture screenshots