testdriverai 7.3.16 → 7.3.18

Files changed (37)
  1. package/.github/skills/testdriver:find/SKILL.md +12 -0
  2. package/.github/skills/testdriver:parse/SKILL.md +118 -0
  3. package/.github/workflows/acceptance.yaml +2 -2
  4. package/CHANGELOG.md +13 -0
  5. package/agent/lib/sdk.js +4 -2
  6. package/ai/skills/testdriver:find/SKILL.md +75 -0
  7. package/ai/skills/testdriver:parse/SKILL.md +236 -0
  8. package/ai/skills/testdriver:reusable-code/SKILL.md +9 -0
  9. package/docs/_data/examples-manifest.json +78 -46
  10. package/docs/docs.json +1 -1
  11. package/docs/v7/examples/ai.mdx +1 -1
  12. package/docs/v7/examples/chrome-extension.mdx +1 -1
  13. package/docs/v7/examples/drag-and-drop.mdx +1 -1
  14. package/docs/v7/examples/element-not-found.mdx +1 -1
  15. package/docs/v7/examples/hover-image.mdx +1 -1
  16. package/docs/v7/examples/hover-text.mdx +1 -1
  17. package/docs/v7/examples/installer.mdx +1 -1
  18. package/docs/v7/examples/launch-vscode-linux.mdx +1 -1
  19. package/docs/v7/examples/match-image.mdx +1 -1
  20. package/docs/v7/examples/press-keys.mdx +1 -1
  21. package/docs/v7/examples/scroll-keyboard.mdx +1 -1
  22. package/docs/v7/examples/scroll-until-image.mdx +1 -1
  23. package/docs/v7/examples/scroll-until-text.mdx +1 -1
  24. package/docs/v7/examples/scroll.mdx +1 -1
  25. package/docs/v7/examples/type.mdx +1 -1
  26. package/docs/v7/examples/windows-installer.mdx +1 -1
  27. package/docs/v7/find.mdx +75 -0
  28. package/docs/v7/parse.mdx +237 -0
  29. package/interfaces/vitest-plugin.mjs +12 -45
  30. package/lib/sentry.js +24 -3
  31. package/mcp-server/dist/server.mjs +14 -1
  32. package/mcp-server/src/server.ts +21 -1
  33. package/package.json +1 -1
  34. package/sdk.d.ts +1 -1
  35. package/sdk.js +9 -0
  36. package/ai/skills/testdriver:ocr/SKILL.md +0 -235
  37. package/docs/v7/ocr.mdx +0 -236
package/docs/v7/ocr.mdx DELETED
@@ -1,236 +0,0 @@
- ---
- title: "ocr()"
- sidebarTitle: "ocr"
- description: "Extract all visible text from the screen using OCR"
- icon: "text"
- ---
-
- ## Overview
-
- Extract all visible text from the current screen using Tesseract OCR. Returns structured data including each word's text content, bounding box coordinates, and confidence scores.
-
- This method runs OCR on-demand and returns the results immediately. It's useful for:
- - Verifying text content on screen
- - Finding elements by their text when visual matching alone isn't enough
- - Debugging what text TestDriver can "see"
- - Building custom text-based assertions
-
- <Note>
- **Performance**: OCR runs server-side using Tesseract.js with a worker pool for fast extraction. A typical screenshot processes in 200-500ms.
- </Note>
-
- ## Syntax
-
- ```javascript
- const result = await testdriver.ocr()
- ```
-
- ## Parameters
-
- None.
-
- ## Returns
-
- `Promise<OCRResult>` - Object containing extracted text data
-
- ### OCRResult
-
- | Property | Type | Description |
- |----------|------|-------------|
- | `words` | `OCRWord[]` | Array of extracted words with positions |
- | `fullText` | `string` | All text concatenated with spaces |
- | `confidence` | `number` | Overall OCR confidence (0-100) |
- | `imageWidth` | `number` | Width of the analyzed screenshot |
- | `imageHeight` | `number` | Height of the analyzed screenshot |
-
- ### OCRWord
-
- | Property | Type | Description |
- |----------|------|-------------|
- | `content` | `string` | The word's text content |
- | `confidence` | `number` | Confidence score for this word (0-100) |
- | `bbox.x0` | `number` | Left edge X coordinate |
- | `bbox.y0` | `number` | Top edge Y coordinate |
- | `bbox.x1` | `number` | Right edge X coordinate |
- | `bbox.y1` | `number` | Bottom edge Y coordinate |
-
- ## Examples
-
- ### Get All Text on Screen
-
- ```javascript
- const result = await testdriver.ocr();
- console.log(result.fullText);
- // "Welcome to TestDriver Sign In Email Password Submit"
-
- console.log(`Found ${result.words.length} words with ${result.confidence}% confidence`);
- ```
-
- ### Check if Text Exists
-
- ```javascript
- const result = await testdriver.ocr();
-
- // Check for error message
- const hasError = result.words.some(w =>
-   w.content.toLowerCase().includes('error')
- );
-
- if (hasError) {
-   console.log('Error message detected on screen');
- }
- ```
-
- ### Find and Click Text
-
- ```javascript
- const result = await testdriver.ocr();
-
- // Find the "Submit" button text
- const submitWord = result.words.find(w => w.content === 'Submit');
-
- if (submitWord) {
-   // Calculate center of the word's bounding box
-   const x = Math.round((submitWord.bbox.x0 + submitWord.bbox.x1) / 2);
-   const y = Math.round((submitWord.bbox.y0 + submitWord.bbox.y1) / 2);
-
-   // Click at those coordinates
-   await testdriver.click({ x, y });
- }
- ```
-
- ### Filter Words by Confidence
-
- ```javascript
- const result = await testdriver.ocr();
-
- // Only use high-confidence words (90%+)
- const reliableWords = result.words.filter(w => w.confidence >= 90);
-
- console.log('High confidence words:', reliableWords.map(w => w.content));
- ```
-
- ### Build Custom Assertions
-
- ```javascript
- import { describe, expect, it } from "vitest";
- import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
-
- describe("Login Page", () => {
-   it("should show form labels", async (context) => {
-     const testdriver = TestDriver(context);
-
-     await testdriver.provision.chrome({
-       url: 'https://myapp.com/login',
-     });
-
-     const result = await testdriver.ocr();
-
-     // Assert expected labels are present
-     expect(result.fullText).toContain('Email');
-     expect(result.fullText).toContain('Password');
-     expect(result.fullText).toContain('Sign In');
-   });
- });
- ```
-
- ### Debug Screen Content
-
- ```javascript
- // Useful for debugging what TestDriver can see
- const result = await testdriver.ocr();
-
- console.log('=== Screen Text ===');
- console.log(result.fullText);
- console.log('');
-
- console.log('=== Word Details ===');
- result.words.forEach((word, i) => {
-   console.log(`${i + 1}. "${word.content}" at (${word.bbox.x0}, ${word.bbox.y0}) - ${word.confidence}% confidence`);
- });
- ```
-
- ### Find Multiple Instances
-
- ```javascript
- const result = await testdriver.ocr();
-
- // Find all instances of "Button" text
- const buttons = result.words.filter(w =>
-   w.content.toLowerCase() === 'button'
- );
-
- console.log(`Found ${buttons.length} buttons on screen`);
-
- buttons.forEach((btn, i) => {
-   console.log(`Button ${i + 1} at position (${btn.bbox.x0}, ${btn.bbox.y0})`);
- });
- ```
-
- ## How It Works
-
- 1. TestDriver captures a screenshot of the current screen
- 2. The image is sent to the TestDriver API
- 3. Tesseract.js processes the image server-side with multiple workers
- 4. The API returns structured data with text and positions
- 5. Bounding box coordinates are scaled to match the original screen resolution
-
- <Note>
- OCR works best with clear, readable text. Very small text, unusual fonts, or low-contrast text may have lower confidence scores or be missed entirely.
- </Note>
-
- ## Best Practices
-
- <AccordionGroup>
- <Accordion title="Use find() for element location">
- For locating elements, prefer `find()` which uses AI vision. Use `ocr()` when you need raw text data or want to build custom text-based logic.
-
- ```javascript
- // Prefer this for clicking elements
- await testdriver.find("Submit button").click();
-
- // Use ocr() for text verification or custom logic
- const result = await testdriver.ocr();
- expect(result.fullText).toContain('Success');
- ```
- </Accordion>
-
- <Accordion title="Filter by confidence">
- OCR can sometimes misread characters. Filter by confidence score when accuracy is critical.
-
- ```javascript
- const result = await testdriver.ocr();
- const reliable = result.words.filter(w => w.confidence >= 85);
- ```
- </Accordion>
-
- <Accordion title="Handle case sensitivity">
- Text matching should usually be case-insensitive since OCR capitalization can vary.
-
- ```javascript
- const result = await testdriver.ocr();
- const hasLogin = result.words.some(w =>
-   w.content.toLowerCase() === 'login'
- );
- ```
- </Accordion>
-
- <Accordion title="Wait for content to load">
- If text isn't being found, the page may not be fully loaded. Add a wait or use `waitForText()`.
-
- ```javascript
- // Wait for specific text to appear
- await testdriver.waitForText("Welcome");
-
- // Then run OCR
- const result = await testdriver.ocr();
- ```
- </Accordion>
- </AccordionGroup>
-
- ## Related
-
- - [find()](/v7/find) - AI-powered element location
- - [assert()](/v7/assert) - Make AI-powered assertions about screen state
- - [waitForText()](/v7/waiting-for-elements) - Wait for text to appear on screen
- - [screenshot()](/v7/screenshot) - Capture screenshots
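
Reviewer note: the removed page describes a result shape (`words`, `fullText`, `confidence`, and a per-word `bbox`) and three recurring patterns: taking the center of a bounding box, filtering by confidence, and matching text case-insensitively. The sketch below restates those patterns as plain helpers over a hand-written sample object, which may be useful when checking code that still depends on this shape. It rests on assumptions: `sampleResult`, `bboxCenter`, `reliableWords`, and `findWord` are hypothetical names introduced here, the sample values are made up, and nothing in it calls the TestDriver API or reflects what 7.3.18 actually returns.

```javascript
// Hypothetical sample shaped like the OCRResult / OCRWord tables in the removed doc.
// All values are invented for illustration; no TestDriver API is called.
const sampleResult = {
  fullText: "Sign In Submit",
  confidence: 87,
  imageWidth: 1280,
  imageHeight: 720,
  words: [
    { content: "Sign", confidence: 96, bbox: { x0: 100, y0: 40, x1: 150, y1: 60 } },
    { content: "In", confidence: 72, bbox: { x0: 156, y0: 40, x1: 176, y1: 60 } },
    { content: "Submit", confidence: 93, bbox: { x0: 600, y0: 500, x1: 680, y1: 524 } },
  ],
};

// Center of a word's bounding box, rounded to whole pixels.
function bboxCenter(word) {
  const { x0, y0, x1, y1 } = word.bbox;
  return { x: Math.round((x0 + x1) / 2), y: Math.round((y0 + y1) / 2) };
}

// Keep only words at or above a confidence threshold (0-100 scale).
function reliableWords(result, minConfidence = 90) {
  return result.words.filter((w) => w.confidence >= minConfidence);
}

// Case-insensitive exact match on a single word; returns undefined if absent.
function findWord(result, text) {
  const needle = text.toLowerCase();
  return result.words.find((w) => w.content.toLowerCase() === needle);
}

const submitWord = findWord(sampleResult, "SUBMIT");
console.log(bboxCenter(submitWord)); // { x: 640, y: 512 }
console.log(reliableWords(sampleResult).map((w) => w.content)); // [ 'Sign', 'Submit' ]
```

Keeping the helpers free of any SDK call means they can be unit-tested against recorded results, independent of which method produces the data.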