testdriverai 7.3.16 → 7.3.18
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.github/skills/testdriver:find/SKILL.md +12 -0
- package/.github/skills/testdriver:parse/SKILL.md +118 -0
- package/.github/workflows/acceptance.yaml +2 -2
- package/CHANGELOG.md +13 -0
- package/agent/lib/sdk.js +4 -2
- package/ai/skills/testdriver:find/SKILL.md +75 -0
- package/ai/skills/testdriver:parse/SKILL.md +236 -0
- package/ai/skills/testdriver:reusable-code/SKILL.md +9 -0
- package/docs/_data/examples-manifest.json +78 -46
- package/docs/docs.json +1 -1
- package/docs/v7/examples/ai.mdx +1 -1
- package/docs/v7/examples/chrome-extension.mdx +1 -1
- package/docs/v7/examples/drag-and-drop.mdx +1 -1
- package/docs/v7/examples/element-not-found.mdx +1 -1
- package/docs/v7/examples/hover-image.mdx +1 -1
- package/docs/v7/examples/hover-text.mdx +1 -1
- package/docs/v7/examples/installer.mdx +1 -1
- package/docs/v7/examples/launch-vscode-linux.mdx +1 -1
- package/docs/v7/examples/match-image.mdx +1 -1
- package/docs/v7/examples/press-keys.mdx +1 -1
- package/docs/v7/examples/scroll-keyboard.mdx +1 -1
- package/docs/v7/examples/scroll-until-image.mdx +1 -1
- package/docs/v7/examples/scroll-until-text.mdx +1 -1
- package/docs/v7/examples/scroll.mdx +1 -1
- package/docs/v7/examples/type.mdx +1 -1
- package/docs/v7/examples/windows-installer.mdx +1 -1
- package/docs/v7/find.mdx +75 -0
- package/docs/v7/parse.mdx +237 -0
- package/interfaces/vitest-plugin.mjs +12 -45
- package/lib/sentry.js +24 -3
- package/mcp-server/dist/server.mjs +14 -1
- package/mcp-server/src/server.ts +21 -1
- package/package.json +1 -1
- package/sdk.d.ts +1 -1
- package/sdk.js +9 -0
- package/ai/skills/testdriver:ocr/SKILL.md +0 -235
- package/docs/v7/ocr.mdx +0 -236
package/docs/v7/ocr.mdx
DELETED
```diff
@@ -1,236 +0,0 @@
----
-title: "ocr()"
-sidebarTitle: "ocr"
-description: "Extract all visible text from the screen using OCR"
-icon: "text"
----
-
```
```diff
-## Overview
-
-Extract all visible text from the current screen using Tesseract OCR. Returns structured data including each word's text content, bounding box coordinates, and confidence scores.
-
-This method runs OCR on-demand and returns the results immediately. It's useful for:
-- Verifying text content on screen
-- Finding elements by their text when visual matching alone isn't enough
-- Debugging what text TestDriver can "see"
-- Building custom text-based assertions
-
-<Note>
-**Performance**: OCR runs server-side using Tesseract.js with a worker pool for fast extraction. A typical screenshot processes in 200-500ms.
-</Note>
-
```
```diff
-## Syntax
-
-```javascript
-const result = await testdriver.ocr()
-```
-
```
```diff
-## Parameters
-
-None.
-
```
```diff
-## Returns
-
-`Promise<OCRResult>` - Object containing extracted text data
-
-### OCRResult
-
-| Property | Type | Description |
-|----------|------|-------------|
-| `words` | `OCRWord[]` | Array of extracted words with positions |
-| `fullText` | `string` | All text concatenated with spaces |
-| `confidence` | `number` | Overall OCR confidence (0-100) |
-| `imageWidth` | `number` | Width of the analyzed screenshot |
-| `imageHeight` | `number` | Height of the analyzed screenshot |
-
-### OCRWord
-
-| Property | Type | Description |
-|----------|------|-------------|
-| `content` | `string` | The word's text content |
-| `confidence` | `number` | Confidence score for this word (0-100) |
-| `bbox.x0` | `number` | Left edge X coordinate |
-| `bbox.y0` | `number` | Top edge Y coordinate |
-| `bbox.x1` | `number` | Right edge X coordinate |
-| `bbox.y1` | `number` | Bottom edge Y coordinate |
-
```
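
The removed tables give each word's position as two corners. (`bbox.x0`, `bbox.y0`) is the top-left corner and (`bbox.x1`, `bbox.y1`) is the bottom-right corner. The sketch below converts those corners into a width, height, and center. It assumes exactly the `OCRWord` shape listed above. `toBox` and the sample values are hypothetical: they are not part of the SDK and are not real OCR output.

```javascript
/**
 * Convert an OCRWord-shaped object ({ content, confidence, bbox: { x0, y0, x1, y1 } })
 * into a box with width, height, and center.
 * Hypothetical helper; the input shape follows the removed OCRWord table.
 */
function toBox(word) {
  const { x0, y0, x1, y1 } = word.bbox;
  return {
    text: word.content,
    left: x0,
    top: y0,
    width: x1 - x0,
    height: y1 - y0,
    // Center rounded to whole pixels, like the removed click example.
    centerX: Math.round((x0 + x1) / 2),
    centerY: Math.round((y0 + y1) / 2),
  };
}

// Invented sample data shaped like OCRResult (not real OCR output).
const sampleResult = {
  words: [
    { content: "Sign", confidence: 96, bbox: { x0: 100, y0: 40, x1: 140, y1: 60 } },
    { content: "In", confidence: 93, bbox: { x0: 146, y0: 40, x1: 162, y1: 60 } },
  ],
  fullText: "Sign In",
  confidence: 94,
  imageWidth: 1366,
  imageHeight: 768,
};

const boxes = sampleResult.words.map(toBox);
console.log(boxes[0].centerX, boxes[0].centerY); // 120 50
```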
```diff
-## Examples
-
-### Get All Text on Screen
-
-```javascript
-const result = await testdriver.ocr();
-console.log(result.fullText);
-// "Welcome to TestDriver Sign In Email Password Submit"
-
-console.log(`Found ${result.words.length} words with ${result.confidence}% confidence`);
-```
-
```
```diff
-### Check if Text Exists
-
-```javascript
-const result = await testdriver.ocr();
-
-// Check for error message
-const hasError = result.words.some(w =>
-  w.content.toLowerCase().includes('error')
-);
-
-if (hasError) {
-  console.log('Error message detected on screen');
-}
-```
-
```
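
The removed existence check tests one word at a time. A multi-word label such as "Sign In" therefore never matches a single `OCRWord`. The sketch below matches consecutive words instead. `containsPhrase` is hypothetical. It also assumes `words` arrives in reading order, which the removed page does not state.

```javascript
// Hypothetical helper: true when `phrase` appears as consecutive words
// in an OCRWord[] array. Case-insensitive; only `content` is read.
function containsPhrase(words, phrase) {
  const target = phrase.toLowerCase().split(/\s+/).filter(Boolean);
  if (target.length === 0) return false;
  const tokens = words.map((w) => w.content.toLowerCase());
  // Slide a window of target.length words across the token list.
  for (let i = 0; i + target.length <= tokens.length; i++) {
    if (target.every((t, j) => tokens[i + j] === t)) return true;
  }
  return false;
}

// Invented sample words (bbox omitted because it is not used here).
const words = [
  { content: "Welcome" },
  { content: "Sign" },
  { content: "In" },
  { content: "Email" },
];
console.log(containsPhrase(words, "sign in")); // true
console.log(containsPhrase(words, "In Sign")); // false
```

When `fullText` really is the words joined by single spaces, as the removed table says, `result.fullText.toLowerCase().includes("sign in")` is a simpler check. It does not, however, tell you which words matched.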
```diff
-### Find and Click Text
-
-```javascript
-const result = await testdriver.ocr();
-
-// Find the "Submit" button text
-const submitWord = result.words.find(w => w.content === 'Submit');
-
-if (submitWord) {
-  // Calculate center of the word's bounding box
-  const x = Math.round((submitWord.bbox.x0 + submitWord.bbox.x1) / 2);
-  const y = Math.round((submitWord.bbox.y0 + submitWord.bbox.y1) / 2);
-
-  // Click at those coordinates
-  await testdriver.click({ x, y });
-}
-```
-
```
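
The removed click example matches `'Submit'` exactly and computes the center inline. The sketch below wraps the same logic in a hypothetical `findWordCenter` helper. The helper matches case-insensitively and returns `null` when nothing matches.

```javascript
// Hypothetical helper: find the first word equal to `text` (case-insensitive)
// and return the rounded center of its bounding box, or null if absent.
function findWordCenter(words, text) {
  const needle = text.toLowerCase();
  const word = words.find((w) => w.content.toLowerCase() === needle);
  if (!word) return null;
  const { x0, y0, x1, y1 } = word.bbox;
  return { x: Math.round((x0 + x1) / 2), y: Math.round((y0 + y1) / 2) };
}

// Invented sample words shaped like OCRWord.
const words = [
  { content: "Cancel", bbox: { x0: 200, y0: 500, x1: 260, y1: 520 } },
  { content: "SUBMIT", bbox: { x0: 300, y0: 500, x1: 371, y1: 521 } },
];
console.log(findWordCenter(words, "Submit")); // { x: 336, y: 511 }
```

In a test, you could pass a non-null result to `testdriver.click(point)`, the same `{ x, y }` form the removed example uses.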
```diff
-### Filter Words by Confidence
-
-```javascript
-const result = await testdriver.ocr();
-
-// Only use high-confidence words (90%+)
-const reliableWords = result.words.filter(w => w.confidence >= 90);
-
-console.log('High confidence words:', reliableWords.map(w => w.content));
-```
-
```
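
Filtering by a threshold, as in the removed example, silently drops low-confidence words. The hypothetical `splitByConfidence` helper sketched below keeps both groups and reports the mean per-word score, so misreads stay visible. That mean is not necessarily the same value as `OCRResult.confidence`, because the removed page does not describe how that field is calculated.

```javascript
// Hypothetical helper: split OCRWord[] by a 0-100 confidence threshold
// and report the mean of the per-word confidence scores.
function splitByConfidence(words, threshold = 90) {
  const reliable = [];
  const doubtful = [];
  for (const w of words) {
    (w.confidence >= threshold ? reliable : doubtful).push(w.content);
  }
  const mean =
    words.length === 0
      ? 0
      : words.reduce((sum, w) => sum + w.confidence, 0) / words.length;
  return { reliable, doubtful, mean };
}

// Invented sample words; "Pa55word" stands in for a misread.
const words = [
  { content: "Email", confidence: 95 },
  { content: "Pa55word", confidence: 62 },
  { content: "Submit", confidence: 91 },
];
const summary = splitByConfidence(words, 90);
console.log(summary.reliable); // [ 'Email', 'Submit' ]
console.log(summary.doubtful); // [ 'Pa55word' ]
console.log(summary.mean.toFixed(1)); // 82.7
```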
```diff
-### Build Custom Assertions
-
-```javascript
-import { describe, expect, it } from "vitest";
-import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
-
-describe("Login Page", () => {
-  it("should show form labels", async (context) => {
-    const testdriver = TestDriver(context);
-
-    await testdriver.provision.chrome({
-      url: 'https://myapp.com/login',
-    });
-
-    const result = await testdriver.ocr();
-
-    // Assert expected labels are present
-    expect(result.fullText).toContain('Email');
-    expect(result.fullText).toContain('Password');
-    expect(result.fullText).toContain('Sign In');
-  });
-});
-```
-
```
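
Each `toContain` call in the removed test stops at the first missing label. The sketch below uses a hypothetical `missingLabels` helper to report every missing label in a single assertion. It assumes that case and repeated whitespace should be ignored.

```javascript
// Hypothetical helper: list every expected label that does not appear in
// the OCR text. Matching ignores case and collapses runs of whitespace.
function missingLabels(fullText, expected) {
  const normalize = (s) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const haystack = normalize(fullText);
  return expected.filter((label) => !haystack.includes(normalize(label)));
}

// Sample text taken from the removed "Get All Text" example comment.
const fullText = "Welcome to TestDriver Sign In Email Password Submit";
console.log(missingLabels(fullText, ["Email", "Password", "Sign In"])); // []
console.log(missingLabels(fullText, ["Email", "Forgot password?"])); // [ 'Forgot password?' ]

// Inside a Vitest test like the removed one, a single assertion would
// then name all missing labels at once:
//   expect(missingLabels(result.fullText, ["Email", "Password", "Sign In"])).toEqual([]);
```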
```diff
-### Debug Screen Content
-
-```javascript
-// Useful for debugging what TestDriver can see
-const result = await testdriver.ocr();
-
-console.log('=== Screen Text ===');
-console.log(result.fullText);
-console.log('');
-
-console.log('=== Word Details ===');
-result.words.forEach((word, i) => {
-  console.log(`${i + 1}. "${word.content}" at (${word.bbox.x0}, ${word.bbox.y0}) - ${word.confidence}% confidence`);
-});
-```
-
```
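
The removed debug loop prints one word per line, which can be hard to scan. The sketch below regroups words into visual rows using the vertical center of each box. `groupIntoLines` and its 8-pixel default tolerance are illustrative assumptions, not SDK behavior.

```javascript
// Hypothetical helper: group OCRWord[] into visual text lines.
// A word joins an existing line when its vertical center is within
// `tolerance` pixels of that line's first word; lines come out top to
// bottom, and words within a line left to right.
function groupIntoLines(words, tolerance = 8) {
  const centerY = (w) => (w.bbox.y0 + w.bbox.y1) / 2;
  const sorted = [...words].sort((a, b) => centerY(a) - centerY(b));
  const lines = [];
  for (const word of sorted) {
    const line = lines.find((l) => Math.abs(l.y - centerY(word)) <= tolerance);
    if (line) line.words.push(word);
    else lines.push({ y: centerY(word), words: [word] });
  }
  return lines.map((l) =>
    l.words
      .sort((a, b) => a.bbox.x0 - b.bbox.x0)
      .map((w) => w.content)
      .join(" ")
  );
}

// Invented sample words, deliberately out of order.
const words = [
  { content: "Password", bbox: { x0: 40, y0: 120, x1: 120, y1: 140 } },
  { content: "Welcome", bbox: { x0: 40, y0: 20, x1: 130, y1: 44 } },
  { content: "back", bbox: { x0: 140, y0: 22, x1: 190, y1: 44 } },
  { content: "Email", bbox: { x0: 40, y0: 80, x1: 100, y1: 100 } },
];
groupIntoLines(words).forEach((line, i) => console.log(`${i + 1}. ${line}`));
// 1. Welcome back
// 2. Email
// 3. Password
```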
```diff
-### Find Multiple Instances
-
-```javascript
-const result = await testdriver.ocr();
-
-// Find all instances of "Button" text
-const buttons = result.words.filter(w =>
-  w.content.toLowerCase() === 'button'
-);
-
-console.log(`Found ${buttons.length} buttons on screen`);
-
-buttons.forEach((btn, i) => {
-  console.log(`Button ${i + 1} at position (${btn.bbox.x0}, ${btn.bbox.y0})`);
-});
-```
-
```
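
The removed example lists matches in whatever order `words` happens to have. If a test needs "the second Button", a hypothetical `instancesInReadingOrder` helper can sort matches by `bbox.y0` and then by `bbox.x0`:

```javascript
// Hypothetical helper: all words equal to `text` (case-insensitive),
// sorted top-to-bottom by bbox.y0, then left-to-right by bbox.x0.
function instancesInReadingOrder(words, text) {
  const needle = text.toLowerCase();
  return words
    .filter((w) => w.content.toLowerCase() === needle)
    .sort((a, b) => a.bbox.y0 - b.bbox.y0 || a.bbox.x0 - b.bbox.x0);
}

// Invented sample words; "Buttons" is not an exact match.
const words = [
  { content: "Button", bbox: { x0: 400, y0: 300, x1: 460, y1: 320 } },
  { content: "Button", bbox: { x0: 100, y0: 300, x1: 160, y1: 320 } },
  { content: "button", bbox: { x0: 100, y0: 100, x1: 160, y1: 120 } },
  { content: "Buttons", bbox: { x0: 50, y0: 50, x1: 120, y1: 70 } },
];
const ordered = instancesInReadingOrder(words, "Button");
ordered.forEach((w, i) => console.log(`Button ${i + 1} at (${w.bbox.x0}, ${w.bbox.y0})`));
// Button 1 at (100, 100)
// Button 2 at (100, 300)
// Button 3 at (400, 300)
```

The helper compares exact `y0` values. Two words on the same visual row whose tops differ by a pixel are therefore ordered top-first rather than left-to-right. Handling that case would need a row tolerance.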
```diff
-## How It Works
-
-1. TestDriver captures a screenshot of the current screen
-2. The image is sent to the TestDriver API
-3. Tesseract.js processes the image server-side with multiple workers
-4. The API returns structured data with text and positions
-5. Bounding box coordinates are scaled to match the original screen resolution
-
-<Note>
-OCR works best with clear, readable text. Very small text, unusual fonts, or low-contrast text may have lower confidence scores or be missed entirely.
-</Note>
-
```
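
Step 5 in the removed page says bounding box coordinates are scaled to the original screen resolution. The page does not say whether or how the API resizes screenshots. Assuming OCR ran on a resized copy, the mapping back is a per-axis ratio. The sketch below shows it with a hypothetical `scaleBbox`.

```javascript
// Sketch, assuming OCR ran on a resized copy of the screenshot
// (`analyzed` dimensions) and coordinates must map back to the real
// screen (`screen` dimensions). Hypothetical helper, not the API's code.
function scaleBbox(bbox, analyzed, screen) {
  const sx = screen.width / analyzed.width;
  const sy = screen.height / analyzed.height;
  return {
    x0: Math.round(bbox.x0 * sx),
    y0: Math.round(bbox.y0 * sy),
    x1: Math.round(bbox.x1 * sx),
    y1: Math.round(bbox.y1 * sy),
  };
}

// Example: OCR ran on a half-size 960x540 copy of a 1920x1080 screen.
const scaled = scaleBbox(
  { x0: 100, y0: 50, x1: 150, y1: 70 },
  { width: 960, height: 540 },
  { width: 1920, height: 1080 }
);
console.log(scaled); // { x0: 200, y0: 100, x1: 300, y1: 140 }
```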
```diff
-## Best Practices
-
-<AccordionGroup>
-<Accordion title="Use find() for element location">
-For locating elements, prefer `find()` which uses AI vision. Use `ocr()` when you need raw text data or want to build custom text-based logic.
-
-```javascript
-// Prefer this for clicking elements
-await testdriver.find("Submit button").click();
-
-// Use ocr() for text verification or custom logic
-const result = await testdriver.ocr();
-expect(result.fullText).toContain('Success');
-```
-</Accordion>
-
```
```diff
-<Accordion title="Filter by confidence">
-OCR can sometimes misread characters. Filter by confidence score when accuracy is critical.
-
-```javascript
-const result = await testdriver.ocr();
-const reliable = result.words.filter(w => w.confidence >= 85);
-```
-</Accordion>
-
```
```diff
-<Accordion title="Handle case sensitivity">
-Text matching should usually be case-insensitive since OCR capitalization can vary.
-
-```javascript
-const result = await testdriver.ocr();
-const hasLogin = result.words.some(w =>
-  w.content.toLowerCase() === 'login'
-);
-```
-</Accordion>
-
```
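
Lowercasing alone still misses tokens such as `LOGIN:`, because OCR words can carry attached punctuation. The sketch below also trims leading and trailing characters that are neither letters nor digits. `normalizeWord` and `hasWord` are hypothetical helpers.

```javascript
// Hypothetical helper: lowercase and strip leading/trailing characters
// that are not Unicode letters or digits (e.g. "LOGIN:" -> "login").
function normalizeWord(text) {
  return text
    .toLowerCase()
    .replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "");
}

// True when any OCR word matches `target` after normalization.
function hasWord(words, target) {
  const needle = normalizeWord(target);
  return words.some((w) => normalizeWord(w.content) === needle);
}

// Invented sample words.
const words = [{ content: "Please" }, { content: "LOGIN:" }];
console.log(hasWord(words, "login")); // true
console.log(hasWord(words, "log")); // false
```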
```diff
-<Accordion title="Wait for content to load">
-If text isn't being found, the page may not be fully loaded. Add a wait or use `waitForText()`.
-
-```javascript
-// Wait for specific text to appear
-await testdriver.waitForText("Welcome");
-
-// Then run OCR
-const result = await testdriver.ocr();
-```
-</Accordion>
-</AccordionGroup>
-
```
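
The removed page points to `waitForText()` without describing its behavior. An alternative that depends only on `ocr()` is a hypothetical `waitForTextInOcr`. It polls a caller-supplied text reader until the target appears or a timeout passes. The 10-second timeout and 500 ms interval are arbitrary defaults.

```javascript
// Generic polling sketch (hypothetical; not the SDK's waitForText()).
// `readText` is any async function returning the current screen text.
async function waitForTextInOcr(readText, target, { timeoutMs = 10000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  const needle = target.toLowerCase();
  for (;;) {
    const text = await readText();
    if (text.toLowerCase().includes(needle)) return text;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms waiting for "${target}"`);
    }
    // Pause before the next read.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo with a fake reader that "loads" on the third call.
let calls = 0;
const fakeRead = async () => (++calls < 3 ? "Loading..." : "Welcome back");
waitForTextInOcr(fakeRead, "welcome", { intervalMs: 10 }).then((text) =>
  console.log(text, calls) // Welcome back 3
);

// In a test, the reader could wrap the removed ocr() call:
//   await waitForTextInOcr(async () => (await testdriver.ocr()).fullText, "Welcome");
```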
```diff
-## Related
-
-- [find()](/v7/find) - AI-powered element location
-- [assert()](/v7/assert) - Make AI-powered assertions about screen state
-- [waitForText()](/v7/waiting-for-elements) - Wait for text to appear on screen
-- [screenshot()](/v7/screenshot) - Capture screenshots
```