npm - testdriverai - Versions diffs - 7.3.15 → 7.3.17 - Mend

testdriverai 7.3.15 → 7.3.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/.github/skills/testdriver:find/SKILL.md +12 -0
package/.github/skills/testdriver:parse/SKILL.md +118 -0
package/CHANGELOG.md +13 -0
package/agent/lib/sdk.js +4 -2
package/ai/skills/testdriver:find/SKILL.md +75 -0
package/ai/skills/testdriver:parse/SKILL.md +236 -0
package/ai/skills/testdriver:reusable-code/SKILL.md +9 -0
package/docs/docs.json +1 -1
package/docs/v7/find.mdx +75 -0
package/docs/v7/parse.mdx +237 -0
package/package.json +1 -1
package/sdk.d.ts +1 -1
package/sdk.js +9 -0
package/vitest.config.mjs +3 -0
package/ai/skills/testdriver:ocr/SKILL.md +0 -235
package/docs/v7/ocr.mdx +0 -236

package/.github/skills/testdriver:find/SKILL.md CHANGED Viewed

@@ -37,6 +37,18 @@ const element = await testdriver.find(description, options)
       Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
     </ParamField>
+    <ParamField path="confidence" type="number">
+      Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
+    </ParamField>
+    <ParamField path="type" type="string">
+      Element type hint that wraps the description for better matching. Accepted values:
+      - `"text"` — Wraps the prompt as `The text "..."`
+      - `"image"` — Wraps the prompt as `The image "..."`
+      - `"ui"` — Wraps the prompt as `The UI element "..."`
+      - `"any"` — No wrapping, uses the description as-is (default behavior)
+    </ParamField>
     <ParamField path="zoom" type="boolean" default={false}>
       Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
     </ParamField>

package/.github/skills/testdriver:parse/SKILL.md ADDED Viewed

@@ -0,0 +1,118 @@
+```skill
+---
+name: testdriver:parse
+description: Detect all UI elements on screen using OmniParser
+---
+<!-- Generated from parse.mdx. DO NOT EDIT. -->
+## Overview
+Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
+This method analyzes the entire screen and returns every detected element. It's useful for:
+- Understanding the full UI layout of a screen
+- Finding all clickable or interactive elements
+- Building custom element-based logic
+- Debugging what elements TestDriver can detect
+- Accessibility auditing
+<Note>
+  **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
+</Note>
+## Syntax
+```javascript
+const result = await testdriver.parse()
+```
+## Parameters
+None.
+## Returns
+`Promise<ParseResult>` - Object containing detected UI elements
+### ParseResult
+| Property | Type | Description |
+|----------|------|-------------|
+| `elements` | `ParsedElement[]` | Array of detected UI elements |
+| `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
+| `imageWidth` | `number` | Width of the analyzed screenshot |
+| `imageHeight` | `number` | Height of the analyzed screenshot |
+### ParsedElement
+| Property | Type | Description |
+|----------|------|-------------|
+| `index` | `number` | Element index |
+| `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
+| `content` | `string` | Text content or description of the element |
+| `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
+| `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
+| `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
+## Examples
+### Get All Elements on Screen
+```javascript
+const result = await testdriver.parse();
+console.log(`Found ${result.elements.length} elements`);
+result.elements.forEach((el, i) => {
+  console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
+});
+```
+### Find Clickable Elements
+```javascript
+const result = await testdriver.parse();
+const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+console.log(`Found ${clickable.length} clickable elements`);
+```
+### Find and Click an Element by Content
+```javascript
+const result = await testdriver.parse();
+const submitBtn = result.elements.find(e =>
+  e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
+);
+if (submitBtn) {
+  const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
+  const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
+  await testdriver.click({ x, y });
+}
+```
+### Filter by Element Type
+```javascript
+const result = await testdriver.parse();
+const textElements = result.elements.filter(e => e.type === 'text');
+const icons = result.elements.filter(e => e.type === 'icon');
+const buttons = result.elements.filter(e => e.type === 'button');
+```
+## Best Practices
+- Use `find()` for targeting specific elements — `parse()` is for full UI analysis
+- Filter by `interactivity` to distinguish clickable vs non-interactive elements
+- Wait for the page to stabilize before calling `parse()`
+- Use the `annotatedImageUrl` for visual debugging
+## Related
+- [find()](/v7/find) - AI-powered element location
+- [assert()](/v7/assert) - Make AI-powered assertions about screen state
+- [screenshot()](/v7/screenshot) - Capture screenshots
+- [Elements Reference](/v7/elements) - Complete Element API
+```

package/CHANGELOG.md CHANGED Viewed

@@ -1,3 +1,16 @@
+## [7.3.17](https://github.com/testdriverai/testdriverai/compare/v7.3.16...v7.3.17) (2026-02-18)
+### Features
+* add type option to find(), move confidence to API, rename ocr to parse ([#640](https://github.com/testdriverai/testdriverai/issues/640)) ([d98a94b](https://github.com/testdriverai/testdriverai/commit/d98a94bd05c135c74d9fed2ebb72c709fc643337))
+## [7.3.16](https://github.com/testdriverai/testdriverai/compare/v7.3.14...v7.3.16) (2026-02-18)
 ## [7.3.15](https://github.com/testdriverai/testdriverai/compare/v7.3.14...v7.3.15) (2026-02-18)

package/agent/lib/sdk.js CHANGED Viewed

@@ -337,9 +337,11 @@ const createSDK = (emitter, config, sessionInstance) => {
   }
   const req = async (path, data, onChunk) => {
-    // for each value of data, if it is empty remove it
+    // for each value of data, if it is null/undefined remove it
+    // Note: use == null to match both null and undefined, but preserve
+    // other falsy values like 0, false, and "" which may be intentional
     for (let key in data) {
-      if (!data[key]) {
+      if (data[key] == null) {
         delete data[key];
       }
     }

package/ai/skills/testdriver:find/SKILL.md CHANGED Viewed

@@ -37,6 +37,18 @@ const element = await testdriver.find(description, options)
       Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
     </ParamField>
+    <ParamField path="confidence" type="number">
+      Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
+    </ParamField>
+    <ParamField path="type" type="string">
+      Element type hint that wraps the description for better matching. Accepted values:
+      - `"text"` — Wraps the prompt as `The text "..."`
+      - `"image"` — Wraps the prompt as `The image "..."`
+      - `"ui"` — Wraps the prompt as `The UI element "..."`
+      - `"any"` — No wrapping, uses the description as-is (default behavior)
+    </ParamField>
     <ParamField path="zoom" type="boolean" default={false}>
       Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
     </ParamField>
@@ -232,6 +244,69 @@ This is useful for:
   ```
 </Check>
+## Confidence Threshold
+Require a minimum AI confidence score for element matches. If the confidence is below the threshold, `find()` treats the result as not found:
+```javascript
+// Require at least 90% confidence
+const element = await testdriver.find('submit button', { confidence: 0.9 });
+if (!element.found()) {
+  // AI found something but wasn't confident enough
+  throw new Error('Could not confidently locate submit button');
+}
+await element.click();
+```
+This is useful for:
+- Critical test steps where an incorrect click could cause cascading failures
+- Distinguishing between similar elements (e.g., multiple buttons)
+- Failing fast when the UI has changed unexpectedly
+```javascript
+// Combine with timeout for robust polling with confidence gate
+const element = await testdriver.find('success notification', {
+  confidence: 0.85,
+  timeout: 15000,
+});
+```
+<Tip>
+  The `confidence` value is a float between 0 and 1 (e.g., `0.9` = 90%). The AI returns its confidence with each find result, which you can also read from `element.confidence` after a successful find.
+</Tip>
+## Element Type
+Use the `type` option to hint what kind of element you're looking for. This wraps your description into a more specific prompt for the AI, improving match accuracy — especially when users provide short or ambiguous descriptions.
+```javascript
+// Find text on the page
+const label = await testdriver.find('Sign In', { type: 'text' });
+// AI prompt becomes: The text "Sign In"
+// Find an image
+const logo = await testdriver.find('company logo', { type: 'image' });
+// AI prompt becomes: The image "company logo"
+// Find a UI element (button, input, checkbox, etc.)
+const btn = await testdriver.find('Submit', { type: 'ui' });
+// AI prompt becomes: The UI element "Submit"
+// No wrapping — same as omitting the option
+const el = await testdriver.find('the blue submit button', { type: 'any' });
+```
+| Type | Prompt sent to AI |
+|------|----|
+| `"text"` | `The text "..."` |
+| `"image"` | `The image "..."` |
+| `"ui"` | `The UI element "..."` |
+| `"any"` | Original description (no wrapping) |
+<Tip>
+  This is particularly useful for short descriptions like `"Submit"` or `"Login"` where the AI may not know whether to look for a button, a link, or visible text. Specifying `type` removes the ambiguity.
+</Tip>
 ## Polling for Dynamic Elements
 For elements that may not be immediately visible, use the `timeout` option to automatically poll:

package/ai/skills/testdriver:parse/SKILL.md ADDED Viewed

@@ -0,0 +1,236 @@
+---
+name: testdriver:parse
+description: Detect all UI elements on screen using OmniParser
+---
+<!-- Generated from parse.mdx. DO NOT EDIT. -->
+## Overview
+Parse the current screen using OmniParser v2 to detect all visible UI elements. Returns structured data including element types, text content, interactivity levels, and bounding box coordinates.
+This method analyzes the entire screen and returns every detected element. It's useful for:
+- Understanding the full UI layout of a screen
+- Finding all clickable or interactive elements
+- Building custom element-based logic
+- Debugging what elements TestDriver can detect
+- Accessibility auditing
+<Note>
+  **Availability**: `parse()` requires an enterprise or self-hosted plan. It uses OmniParser v2 server-side for element detection.
+</Note>
+## Syntax
+```javascript
+const result = await testdriver.parse()
+```
+## Parameters
+None.
+## Returns
+`Promise<ParseResult>` - Object containing detected UI elements
+### ParseResult
+| Property | Type | Description |
+|----------|------|-------------|
+| `elements` | `ParsedElement[]` | Array of detected UI elements |
+| `annotatedImageUrl` | `string` | URL of the annotated screenshot with bounding boxes |
+| `imageWidth` | `number` | Width of the analyzed screenshot |
+| `imageHeight` | `number` | Height of the analyzed screenshot |
+### ParsedElement
+| Property | Type | Description |
+|----------|------|-------------|
+| `index` | `number` | Element index |
+| `type` | `string` | Element type (e.g. `"text"`, `"icon"`, `"button"`) |
+| `content` | `string` | Text content or description of the element |
+| `interactivity` | `string` | Interactivity level (e.g. `"clickable"`, `"non-interactive"`) |
+| `bbox` | `object` | Bounding box in pixel coordinates `{x0, y0, x1, y1}` |
+| `boundingBox` | `object` | Bounding box as `{left, top, width, height}` |
+## Examples
+### Get All Elements on Screen
+```javascript
+const result = await testdriver.parse();
+console.log(`Found ${result.elements.length} elements`);
+result.elements.forEach((el, i) => {
+  console.log(`${i + 1}. [${el.type}] "${el.content}" (${el.interactivity})`);
+});
+```
+### Find Clickable Elements
+```javascript
+const result = await testdriver.parse();
+const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+console.log(`Found ${clickable.length} clickable elements`);
+clickable.forEach(el => {
+  console.log(`- "${el.content}" at (${el.bbox.x0}, ${el.bbox.y0})`);
+});
+```
+### Find and Click an Element by Content
+```javascript
+const result = await testdriver.parse();
+// Find a "Submit" button
+const submitBtn = result.elements.find(e =>
+  e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
+);
+if (submitBtn) {
+  // Calculate center of the bounding box
+  const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
+  const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
+  await testdriver.click({ x, y });
+}
+```
+### Filter by Element Type
+```javascript
+const result = await testdriver.parse();
+// Get all text elements
+const textElements = result.elements.filter(e => e.type === 'text');
+textElements.forEach(e => console.log(`Text: "${e.content}"`));
+// Get all icons
+const icons = result.elements.filter(e => e.type === 'icon');
+console.log(`Found ${icons.length} icons`);
+// Get all buttons
+const buttons = result.elements.filter(e => e.type === 'button');
+console.log(`Found ${buttons.length} buttons`);
+```
+### Build Custom Assertions
+```javascript
+import { describe, expect, it } from "vitest";
+import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
+describe("Login Page", () => {
+  it("should have expected form elements", async (context) => {
+    const testdriver = TestDriver(context);
+    await testdriver.provision.chrome({
+      url: 'https://myapp.com/login',
+    });
+    const result = await testdriver.parse();
+    // Assert expected elements exist
+    const textContent = result.elements.map(e => e.content.toLowerCase());
+    expect(textContent).toContain('email');
+    expect(textContent).toContain('password');
+    // Assert there are clickable elements
+    const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+    expect(clickable.length).toBeGreaterThan(0);
+  });
+});
+```
+### Use Bounding Box Coordinates
+```javascript
+const result = await testdriver.parse();
+result.elements.forEach(el => {
+  // Pixel coordinates
+  console.log(`Element "${el.content}":`);
+  console.log(`  bbox: (${el.bbox.x0}, ${el.bbox.y0}) to (${el.bbox.x1}, ${el.bbox.y1})`);
+  console.log(`  size: ${el.boundingBox.width}x${el.boundingBox.height}`);
+  console.log(`  position: left=${el.boundingBox.left}, top=${el.boundingBox.top}`);
+});
+```
+### View Annotated Screenshot
+```javascript
+const result = await testdriver.parse();
+// The annotated image shows all detected elements with bounding boxes
+console.log('Annotated screenshot:', result.annotatedImageUrl);
+console.log(`Image dimensions: ${result.imageWidth}x${result.imageHeight}`);
+```
+## How It Works
+1. TestDriver captures a screenshot of the current screen
+2. The image is sent to the TestDriver API
+3. OmniParser v2 analyzes the image to detect all UI elements
+4. Each element is classified by type (text, icon, button, etc.) and interactivity
+5. Bounding box coordinates are returned in pixel coordinates matching the screen resolution
+<Note>
+  OmniParser detects elements visually — it works with any UI framework, native apps, and even non-standard interfaces. It does not rely on DOM or accessibility trees.
+</Note>
+## Best Practices
+<AccordionGroup>
+  <Accordion title="Use find() for targeting specific elements">
+    For locating and interacting with a specific element, prefer `find()` which uses AI vision. Use `parse()` when you need a complete inventory of all elements on screen.
+    ```javascript
+    // Prefer this for clicking a specific element
+    await testdriver.find("Submit button").click();
+    // Use parse() for full UI analysis
+    const result = await testdriver.parse();
+    const allButtons = result.elements.filter(e => e.type === 'button');
+    ```
+  </Accordion>
+  <Accordion title="Filter by interactivity">
+    Use the `interactivity` field to distinguish between clickable and non-interactive elements.
+    ```javascript
+    const result = await testdriver.parse();
+    const interactive = result.elements.filter(e => e.interactivity === 'clickable');
+    const static_ = result.elements.filter(e => e.interactivity === 'non-interactive');
+    ```
+  </Accordion>
+  <Accordion title="Wait for content to load">
+    If elements aren't being detected, the page may not be fully loaded. Add a wait first.
+    ```javascript
+    // Wait for page to stabilize
+    await testdriver.wait(2000);
+    // Then parse
+    const result = await testdriver.parse();
+    ```
+  </Accordion>
+  <Accordion title="Use the annotated image for debugging">
+    The `annotatedImageUrl` provides a visual overlay showing all detected elements with their bounding boxes — great for debugging.
+    ```javascript
+    const result = await testdriver.parse();
+    console.log('View annotated screenshot:', result.annotatedImageUrl);
+    ```
+  </Accordion>
+</AccordionGroup>
+## Related
+- [find()](/v7/find) - AI-powered element location
+- [assert()](/v7/assert) - Make AI-powered assertions about screen state
+- [screenshot()](/v7/screenshot) - Capture screenshots
+- [Elements Reference](/v7/elements) - Complete Element API

package/ai/skills/testdriver:reusable-code/SKILL.md CHANGED Viewed

@@ -36,6 +36,15 @@ export async function logout(testdriver) {
 }
 ```
+<Warning>
+**Avoid hardcoding dynamic values in element descriptions.** Element selectors should describe the *type* of element, not specific content that might change.
+**❌ Bad:** `await testdriver.find('profile name TestDriver in the top right')`
+**✅ Good:** `await testdriver.find('user profile name in the top right')`
+Hardcoded values like usernames, product names, or prices will cause tests to fail when the data changes. Use generic descriptions that work regardless of the specific content displayed.
+</Warning>
 Now import and use these helpers in any test:
 ```javascript test/checkout.test.mjs

package/docs/docs.json CHANGED Viewed

@@ -107,7 +107,7 @@
               "/v7/hover",
               "/v7/mouse-down",
               "/v7/mouse-up",
-              "/v7/ocr",
+              "/v7/parse",
               "/v7/press-keys",
               "/v7/right-click",
               "/v7/screenshot",

package/docs/v7/find.mdx CHANGED Viewed

@@ -38,6 +38,18 @@ const element = await testdriver.find(description, options)
       Maximum time in milliseconds to poll for the element. Retries every 5 seconds until found or timeout expires.
     </ParamField>
+    <ParamField path="confidence" type="number">
+      Minimum confidence threshold (0-1). If the AI's confidence score for the found element is below this value, the find will be treated as a failure (`element.found()` returns `false`). Useful for ensuring high-quality matches in critical test steps.
+    </ParamField>
+    <ParamField path="type" type="string">
+      Element type hint that wraps the description for better matching. Accepted values:
+      - `"text"` — Wraps the prompt as `The text "..."`
+      - `"image"` — Wraps the prompt as `The image "..."`
+      - `"ui"` — Wraps the prompt as `The UI element "..."`
+      - `"any"` — No wrapping, uses the description as-is (default behavior)
+    </ParamField>
     <ParamField path="zoom" type="boolean" default={false}>
       Enable two-phase zoom mode for better precision in crowded UIs with many similar elements.
     </ParamField>
@@ -233,6 +245,69 @@ This is useful for:
   ```
 </Check>
+## Confidence Threshold
+Require a minimum AI confidence score for element matches. If the confidence is below the threshold, `find()` treats the result as not found:
+```javascript
+// Require at least 90% confidence
+const element = await testdriver.find('submit button', { confidence: 0.9 });
+if (!element.found()) {
+  // AI found something but wasn't confident enough
+  throw new Error('Could not confidently locate submit button');
+}
+await element.click();
+```
+This is useful for:
+- Critical test steps where an incorrect click could cause cascading failures
+- Distinguishing between similar elements (e.g., multiple buttons)
+- Failing fast when the UI has changed unexpectedly
+```javascript
+// Combine with timeout for robust polling with confidence gate
+const element = await testdriver.find('success notification', {
+  confidence: 0.85,
+  timeout: 15000,
+});
+```
+<Tip>
+  The `confidence` value is a float between 0 and 1 (e.g., `0.9` = 90%). The AI returns its confidence with each find result, which you can also read from `element.confidence` after a successful find.
+</Tip>
+## Element Type
+Use the `type` option to hint what kind of element you're looking for. This wraps your description into a more specific prompt for the AI, improving match accuracy — especially when users provide short or ambiguous descriptions.
+```javascript
+// Find text on the page
+const label = await testdriver.find('Sign In', { type: 'text' });
+// AI prompt becomes: The text "Sign In"
+// Find an image
+const logo = await testdriver.find('company logo', { type: 'image' });
+// AI prompt becomes: The image "company logo"
+// Find a UI element (button, input, checkbox, etc.)
+const btn = await testdriver.find('Submit', { type: 'ui' });
+// AI prompt becomes: The UI element "Submit"
+// No wrapping — same as omitting the option
+const el = await testdriver.find('the blue submit button', { type: 'any' });
+```
+| Type | Prompt sent to AI |
+|------|----|
+| `"text"` | `The text "..."` |
+| `"image"` | `The image "..."` |
+| `"ui"` | `The UI element "..."` |
+| `"any"` | Original description (no wrapping) |
+<Tip>
+  This is particularly useful for short descriptions like `"Submit"` or `"Login"` where the AI may not know whether to look for a button, a link, or visible text. Specifying `type` removes the ambiguity.
+</Tip>
 ## Polling for Dynamic Elements
 For elements that may not be immediately visible, use the `timeout` option to automatically poll: