npm - testdriverai - Versions diffs - 7.3.32 → 7.3.33 - Mend

testdriverai 7.3.32 → 7.3.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

package/.github/copilot-instructions.md +641 -0
package/.github/skills/testdriver:assert/SKILL.md +31 -0
package/.github/skills/testdriver:aws-setup/SKILL.md +1 -68
package/.github/skills/testdriver:client/SKILL.md +33 -0
package/.github/skills/testdriver:debugging-with-screenshots/SKILL.md +175 -45
package/.github/skills/testdriver:find/SKILL.md +87 -0
package/.github/skills/testdriver:parse/SKILL.md +124 -6
package/.github/skills/testdriver:quickstart/SKILL.md +2 -17
package/.github/skills/testdriver:reusable-code/SKILL.md +9 -0
package/.github/skills/testdriver:running-tests/SKILL.md +4 -0
package/.github/skills/testdriver:screenshot/SKILL.md +84 -3
package/.github/skills/testdriver:scroll/SKILL.md +36 -0
package/.github/skills/testdriver:testdriver/SKILL.md +194 -86
package/CHANGELOG.md +4 -0
package/docs/_data/examples-manifest.json +72 -64
package/docs/v7/examples/ai.mdx +1 -1
package/docs/v7/examples/assert.mdx +1 -1
package/docs/v7/examples/chrome-extension.mdx +1 -1
package/docs/v7/examples/drag-and-drop.mdx +1 -1
package/docs/v7/examples/element-not-found.mdx +1 -1
package/docs/v7/examples/hover-image.mdx +1 -1
package/docs/v7/examples/hover-text.mdx +1 -1
package/docs/v7/examples/installer.mdx +1 -1
package/docs/v7/examples/launch-vscode-linux.mdx +1 -1
package/docs/v7/examples/match-image.mdx +1 -1
package/docs/v7/examples/press-keys.mdx +1 -1
package/docs/v7/examples/scroll-keyboard.mdx +1 -1
package/docs/v7/examples/scroll-until-image.mdx +1 -1
package/docs/v7/examples/scroll-until-text.mdx +1 -1
package/docs/v7/examples/scroll.mdx +1 -1
package/docs/v7/examples/type.mdx +1 -1
package/docs/v7/examples/windows-installer.mdx +1 -1
package/package.json +1 -2
package/sdk.js +1 -1

package/.github/skills/testdriver:parse/SKILL.md CHANGED

@@ -1,4 +1,3 @@
-```skill
 ---
 name: testdriver:parse
 description: Detect all UI elements on screen using OmniParser
@@ -74,6 +73,10 @@ const result = await testdriver.parse();
 const clickable = result.elements.filter(e => e.interactivity === 'clickable');
 console.log(`Found ${clickable.length} clickable elements`);
+clickable.forEach(el => {
+  console.log(`- "${el.content}" at (${el.bbox.x0}, ${el.bbox.y0})`);
+});
 ```
 ### Find and Click an Element by Content
@@ -81,13 +84,16 @@ console.log(`Found ${clickable.length} clickable elements`);
 ```javascript
 const result = await testdriver.parse();
+// Find a "Submit" button
 const submitBtn = result.elements.find(e =>
   e.content.toLowerCase().includes('submit') && e.interactivity === 'clickable'
 );
 if (submitBtn) {
+  // Calculate center of the bounding box
   const x = Math.round((submitBtn.bbox.x0 + submitBtn.bbox.x1) / 2);
   const y = Math.round((submitBtn.bbox.y0 + submitBtn.bbox.y1) / 2);
   await testdriver.click({ x, y });
 }
 ```
@@ -97,17 +103,130 @@ if (submitBtn) {
 ```javascript
 const result = await testdriver.parse();
+// Get all text elements
 const textElements = result.elements.filter(e => e.type === 'text');
+textElements.forEach(e => console.log(`Text: "${e.content}"`));
+// Get all icons
 const icons = result.elements.filter(e => e.type === 'icon');
+console.log(`Found ${icons.length} icons`);
+// Get all buttons
 const buttons = result.elements.filter(e => e.type === 'button');
+console.log(`Found ${buttons.length} buttons`);
+```
+### Build Custom Assertions
+```javascript
+import { describe, expect, it } from "vitest";
+import { TestDriver } from "testdriverai/lib/vitest/hooks.mjs";
+describe("Login Page", () => {
+  it("should have expected form elements", async (context) => {
+    const testdriver = TestDriver(context);
+    await testdriver.provision.chrome({
+      url: 'https://myapp.com/login',
+    });
+    const result = await testdriver.parse();
+    // Assert expected elements exist
+    const textContent = result.elements.map(e => e.content.toLowerCase());
+    expect(textContent).toContain('email');
+    expect(textContent).toContain('password');
+    // Assert there are clickable elements
+    const clickable = result.elements.filter(e => e.interactivity === 'clickable');
+    expect(clickable.length).toBeGreaterThan(0);
+  });
+});
+```
+### Use Bounding Box Coordinates
+```javascript
+const result = await testdriver.parse();
+result.elements.forEach(el => {
+  // Pixel coordinates
+  console.log(`Element "${el.content}":`);
+  console.log(`  bbox: (${el.bbox.x0}, ${el.bbox.y0}) to (${el.bbox.x1}, ${el.bbox.y1})`);
+  console.log(`  size: ${el.boundingBox.width}x${el.boundingBox.height}`);
+  console.log(`  position: left=${el.boundingBox.left}, top=${el.boundingBox.top}`);
+});
+```
+### View Annotated Screenshot
+```javascript
+const result = await testdriver.parse();
+// The annotated image shows all detected elements with bounding boxes
+console.log('Annotated screenshot:', result.annotatedImageUrl);
+console.log(`Image dimensions: ${result.imageWidth}x${result.imageHeight}`);
 ```
+## How It Works
+1. TestDriver captures a screenshot of the current screen
+2. The image is sent to the TestDriver API
+3. OmniParser v2 analyzes the image to detect all UI elements
+4. Each element is classified by type (text, icon, button, etc.) and interactivity
+5. Bounding box coordinates are returned in pixel coordinates matching the screen resolution
+<Note>
+  OmniParser detects elements visually — it works with any UI framework, native apps, and even non-standard interfaces. It does not rely on DOM or accessibility trees.
+</Note>
 ## Best Practices
-- Use `find()` for targeting specific elements — `parse()` is for full UI analysis
-- Filter by `interactivity` to distinguish clickable vs non-interactive elements
-- Wait for the page to stabilize before calling `parse()`
-- Use the `annotatedImageUrl` for visual debugging
+<AccordionGroup>
+  <Accordion title="Use find() for targeting specific elements">
+    For locating and interacting with a specific element, prefer `find()` which uses AI vision. Use `parse()` when you need a complete inventory of all elements on screen.
+    ```javascript
+    // Prefer this for clicking a specific element
+    await testdriver.find("Submit button").click();
+    // Use parse() for full UI analysis
+    const result = await testdriver.parse();
+    const allButtons = result.elements.filter(e => e.type === 'button');
+    ```
+  </Accordion>
+  <Accordion title="Filter by interactivity">
+    Use the `interactivity` field to distinguish between clickable and non-interactive elements.
+    ```javascript
+    const result = await testdriver.parse();
+    const interactive = result.elements.filter(e => e.interactivity === 'clickable');
+    const static_ = result.elements.filter(e => e.interactivity === 'non-interactive');
+    ```
+  </Accordion>
+  <Accordion title="Wait for content to load">
+    If elements aren't being detected, the page may not be fully loaded. Add a wait first.
+    ```javascript
+    // Wait for page to stabilize
+    await testdriver.wait(2000);
+    // Then parse
+    const result = await testdriver.parse();
+    ```
+  </Accordion>
+  <Accordion title="Use the annotated image for debugging">
+    The `annotatedImageUrl` provides a visual overlay showing all detected elements with their bounding boxes — great for debugging.
+    ```javascript
+    const result = await testdriver.parse();
+    console.log('View annotated screenshot:', result.annotatedImageUrl);
+    ```
+  </Accordion>
+</AccordionGroup>
 ## Related
@@ -115,4 +234,3 @@ const buttons = result.elements.filter(e => e.type === 'button');
 - [assert()](/v7/assert) - Make AI-powered assertions about screen state
 - [screenshot()](/v7/screenshot) - Capture screenshots
 - [Elements Reference](/v7/elements) - Complete Element API
-```
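
The center-of-bounding-box pattern added to the parse skill above can be pulled out into small pure functions. The sketch below is illustrative only: `bboxCenter`, `findClickable`, and `sampleElements` are hypothetical names, not part of testdriverai, and the element shape (`content`, `interactivity`, `bbox.x0` through `bbox.y1`) is assumed from the examples in this diff.

```javascript
// Hypothetical helpers; not part of the testdriverai package.
// Element shape is assumed from the SKILL.md examples in this diff.

// Return the rounded center point of a bounding box.
function bboxCenter(bbox) {
  return {
    x: Math.round((bbox.x0 + bbox.x1) / 2),
    y: Math.round((bbox.y0 + bbox.y1) / 2),
  };
}

// Find the first clickable element whose content contains `text`
// (case-insensitive). Returns undefined when nothing matches.
function findClickable(elements, text) {
  const needle = text.toLowerCase();
  return elements.find(
    (e) =>
      e.interactivity === 'clickable' &&
      typeof e.content === 'string' && // defensive guard, in case content is absent
      e.content.toLowerCase().includes(needle)
  );
}

// Sample data standing in for `(await testdriver.parse()).elements`.
const sampleElements = [
  { content: 'Email', interactivity: 'non-interactive', bbox: { x0: 10, y0: 10, x1: 110, y1: 30 } },
  { content: 'Submit', interactivity: 'clickable', bbox: { x0: 100, y0: 200, x1: 180, y1: 240 } },
];

const submit = findClickable(sampleElements, 'submit');
if (submit) {
  const { x, y } = bboxCenter(submit.bbox);
  console.log(`Would click at (${x}, ${y})`); // prints "Would click at (140, 220)"
}
```

With a real result, the same helpers would take `result.elements` from `testdriver.parse()` and pass the computed point to `testdriver.click({ x, y })`, as the SKILL.md example does.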

package/.github/skills/testdriver:quickstart/SKILL.md CHANGED

@@ -14,27 +14,12 @@ TestDriver makes it easy to write automated computer-use tests for web browsers,
     Get started quickly with the TestDriver CLI.
     <Steps>
-      <Step title="Create a TestDriver Account">
-        You will need a TestDriver account to get an API key.
-        <Card
-          title="Get an API Key"
-          icon="user-plus"
-          href="https://console.testdriver.ai/team"
-          arrow
-          horizontal
-        >
-          Start with 60 free device minutes, no credit-card required!
-        </Card>
-      </Step>
       <Step title="Install TestDriver">
         Use `npx` to quickly set up an example project:
         ```bash
-        npx testdriverai@beta init
+        npx testdriverai init
         ```
         This will walk you through creating a new project folder, installing dependencies, and setting up your API key.
@@ -79,7 +64,7 @@ TestDriver makes it easy to write automated computer-use tests for web browsers,
         Install Vitest and TestDriver as dev dependencies:
         ```bash
-        npm install --save-dev vitest testdriverai@beta
+        npm install --save-dev vitest testdriverai
         ```
       </Step>

package/.github/skills/testdriver:reusable-code/SKILL.md CHANGED

@@ -36,6 +36,15 @@ export async function logout(testdriver) {
 }
 ```
+<Warning>
+**Avoid hardcoding dynamic values in element descriptions.** Element selectors should describe the *type* of element, not specific content that might change.
+**❌ Bad:** `await testdriver.find('profile name TestDriver in the top right')`
+**✅ Good:** `await testdriver.find('user profile name in the top right')`
+Hardcoded values like usernames, product names, or prices will cause tests to fail when the data changes. Use generic descriptions that work regardless of the specific content displayed.
+</Warning>
 Now import and use these helpers in any test:
 ```javascript test/checkout.test.mjs

package/.github/skills/testdriver:running-tests/SKILL.md CHANGED

@@ -10,6 +10,10 @@ Learn how to run TestDriver tests efficiently with Vitest's powerful test runner
 TestDriver works with Vitest's powerful test runner.
+<Info>
+  Install Vitest globally for best results: `npm install vitest -g`
+</Info>
 ### Run All Tests
 ```bash

package/.github/skills/testdriver:screenshot/SKILL.md CHANGED

@@ -8,6 +8,10 @@ description: Capture and save screenshots during test execution
 Capture a screenshot of the current screen and automatically save it to a local file. Screenshots are organized by test file for easy debugging and review.
+<Note>
+  **Automatic Screenshots (Default: Enabled)**: TestDriver automatically captures screenshots before and after every command (click, type, find, etc.). These are saved with descriptive filenames like `001-click-before-L42-submit-button.png` that include the line number from your test file. You can disable this with `autoScreenshots: false` in your TestDriver options.
+</Note>
 ## Syntax
 ```javascript
@@ -32,12 +36,32 @@ Screenshots are automatically saved to `.testdriver/screenshots/<test-file-name>
 .testdriver/
   screenshots/
     login.test/
-      screenshot-1737633600000.png
-      login-page.png
+      001-find-before-L15-email-input.png     # Auto: before find()
+      002-find-after-L15-email-input.png      # Auto: after find()
+      003-click-before-L16-email-input.png    # Auto: before click()
+      004-click-after-L16-email-input.png     # Auto: after click()
+      005-type-before-L17-userexamplecom.png  # Auto: before type()
+      006-type-after-L17-userexamplecom.png   # Auto: after type()
+      custom-screenshot.png                    # Manual: screenshot("custom-screenshot")
     checkout.test/
-      screenshot-1737633700000.png
+      001-find-before-L12-checkout-button.png
+      ...
 ```
+### Automatic Screenshot Naming
+When `autoScreenshots` is enabled (default), filenames follow this format:
+`<seq>-<action>-<phase>-L<line>-<description>.png`
+| Component | Description | Example |
+|-----------|-------------|---------|
+| `seq` | Sequential number (001, 002, ...) | `001` |
+| `action` | Command name | `click`, `type`, `find` |
+| `phase` | Before, after, or error | `before`, `after` |
+| `L<line>` | Line number from test file | `L42` |
+| `description` | Element description or action target | `submit-button` |
 <Note>
   The screenshot folder for each test file is automatically cleared when the test starts. This ensures you only see screenshots from the most recent test run.
 </Note>
@@ -107,9 +131,66 @@ describe("Login Flow", () => {
 });
 ```
+## Automatic Screenshots
+By default, TestDriver captures screenshots **automatically** before and after every command. This creates a complete visual timeline of your test execution without any additional code.
+### Enabling/Disabling
+```javascript
+// Auto-screenshots enabled by default
+const testdriver = TestDriver(context);
+// Explicitly disable if needed (not recommended)
+const testdriver = TestDriver(context, {
+  autoScreenshots: false
+});
+```
+### What Gets Captured
+Automatic screenshots are taken around these commands:
+- `find()` / `findAll()`
+- `click()` / `hover()` / `doubleClick()` / `rightClick()`
+- `type()` / `pressKeys()`
+- `scroll()` / `scrollUntilText()` / `scrollUntilImage()`
+- `waitForText()` / `waitForImage()`
+- `focusApplication()`
+- `assert()` / `extract()` / `exec()`
+### Example Output
+For this test code:
+```javascript
+// Line 15: Find email input
+const emailInput = await testdriver.find("email input");
+// Line 16: Click it
+await emailInput.click();
+// Line 17: Type email
+await testdriver.type("user@example.com");
+```
+TestDriver automatically saves:
+```
+001-find-before-L15-email-input.png
+002-find-after-L15-email-input.png
+003-click-before-L16-email-input.png
+004-click-after-L16-email-input.png
+005-type-before-L17-userexamplecom.png
+006-type-after-L17-userexamplecom.png
+```
+If an error occurs, the phase will be `error` instead of `after`.
 ## Best Practices
 <AccordionGroup>
+  <Accordion title="Let automatic screenshots do the work">
+    With `autoScreenshots: true` (default), you get comprehensive coverage without adding manual `screenshot()` calls. Only add manual screenshots for specific named checkpoints.
+  </Accordion>
   <Accordion title="Use screenshots for debugging flaky tests">
     When a test fails intermittently, add screenshots at key steps to capture the actual screen state. This helps identify timing issues or unexpected UI states.
   </Accordion>
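
The automatic screenshot naming format documented above, `<seq>-<action>-<phase>-L<line>-<description>.png`, lends itself to a small parser for post-run tooling, for example to list only the `error` screenshots. This is a minimal sketch, not something shipped in testdriverai: `parseAutoScreenshotName` and `errorScreenshots` are hypothetical names, and the regular expression assumes every component is present and that action names contain only letters.

```javascript
// Hypothetical helpers; not part of the testdriverai package.
// Pattern for <seq>-<action>-<phase>-L<line>-<description>.png,
// assuming all five components are always present.
const AUTO_SCREENSHOT_RE =
  /^(\d+)-([A-Za-z]+)-(before|after|error)-L(\d+)-(.+)\.png$/;

// Split an automatic screenshot filename into its components.
// Returns null for names that do not follow the format
// (for example a manual screenshot such as custom-screenshot.png).
function parseAutoScreenshotName(filename) {
  const m = AUTO_SCREENSHOT_RE.exec(filename);
  if (!m) return null;
  return {
    seq: Number(m[1]),
    action: m[2],
    phase: m[3],
    line: Number(m[4]),
    description: m[5],
  };
}

// Keep only the parsed entries whose phase is "error".
function errorScreenshots(filenames) {
  return filenames
    .map(parseAutoScreenshotName)
    .filter((info) => info !== null && info.phase === 'error');
}

// Filenames here are made up for illustration, following the documented format.
const names = [
  '001-find-before-L15-email-input.png',
  '002-find-error-L15-email-input.png',
  'custom-screenshot.png',
];
console.log(parseAutoScreenshotName(names[0]));
console.log(errorScreenshots(names).map((info) => info.line));
```

Returning `null` for non-matching names keeps manual screenshots from being misread as automatic ones; a stricter tool could throw instead.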

package/.github/skills/testdriver:scroll/SKILL.md CHANGED

@@ -8,6 +8,21 @@ description: Scroll pages and elements
 Scroll the page or active element in any direction using mouse wheel or keyboard.
+<Warning>
+  **Focus Requirements**
+  Scrolling requires the page or a frame to be focused. If an input field or other interactive element has focus, scroll commands may not work as expected. Before scrolling, ensure focus is on the page by:
+  - Clicking on a non-interactive area (e.g., page background)
+  - Pressing the Escape key to unfocus interactive elements
+  - Clicking outside of input fields or text areas
+  **If scroll is still not working**, try using Page Down/Page Up keys directly:
+  ```javascript
+  await testdriver.pressKeys(['pagedown']); // Scroll down
+  await testdriver.pressKeys(['pageup']);   // Scroll up
+  ```
+</Warning>
 ## Syntax
 ```javascript
@@ -133,6 +148,27 @@ await testdriver.scrollUntilImage('loading spinner', 'down', 5000, 'keyboard', n
 ## Best Practices
+<Check>
+  **Ensure page has focus before scrolling**
+  ```javascript
+  // After typing in an input, unfocus it first
+  await testdriver.find('email input').click();
+  await testdriver.type('user@example.com');
+  // Click elsewhere or press Escape before scrolling
+  await testdriver.pressKeys(['escape']);
+  // Or click a non-interactive area
+  // await testdriver.find('page background').click();
+  // Now scroll will work properly
+  await testdriver.scroll('down', 300);
+  // If scroll still doesn't work, use Page Down directly
+  // await testdriver.pressKeys(['pagedown']);
+  ```
+</Check>
 <Check>
   **Choose the right scroll method**