playwright-genie 1.0.1 → 2.0.0

package/README.md CHANGED
@@ -12,9 +12,14 @@
  - **Natural Language** — describe elements in plain English, no selectors needed
  - **40+ Playwright Actions** — click, fill, check, hover, drag, wait, screenshot and more
  - **Any LLM Provider** — OpenAI, Claude, Ollama, Azure, or any OpenAI-compatible API
- - **Smart Caching** — persistent disk cache + in-memory cache to minimize LLM calls
+ - **Adaptive Page Analysis** — automatically switches between ARIA, hybrid, and DOM-only modes based on page accessibility quality
+ - **DOM Tree Pipeline** — generates XPath and CSS selectors in-browser for pages with poor accessibility
+ - **Smart Caching** — two-tier cache (memory + disk `.locator-cache.json`) to minimize LLM calls
+ - **Fallback Locator Chains** — LLM returns multiple locator strategies; if the primary fails, fallbacks are tried automatically
  - **Action-Aware** — `fill('username')` targets the input, not the label
- - **Auto-Retry** — stale cached locators are automatically re-resolved
+ - **Auto-Retry** — stale cached locators are automatically invalidated and re-resolved
+ - **Batch Resolution** — `prefetch()` resolves multiple queries in a single LLM call
+ - **iframe Support** — automatically detects and resolves elements inside iframes
  - **TypeScript Support** — full type definitions included
  - **Single Page Object** — one `createSmartLocator(page)` works across all navigations
 
@@ -95,6 +100,20 @@ await smart.click('submit button');
  await browser.close();
  ```
 
+ ## How It Works
+
+ When you call `smart.click('login button')`, the library:
+
+ 1. **Collects page structure** — gathers the ARIA accessibility tree, interactive elements, special attributes (`data-testid`, `placeholder`, `aria-label`), and a full DOM tree with XPath/CSS selectors
+ 2. **Evaluates ARIA quality** — scores the page as `good`, `sparse`, or `none` based on how many named interactive elements the ARIA tree contains
+ 3. **Builds an adaptive payload** — selects the best strategy:
+    - **`aria` mode** — rich ARIA tree with good accessibility; uses ARIA snapshot + special elements
+    - **`hybrid` mode** — sparse ARIA; combines the ARIA tree with DOM tree nodes for better coverage
+    - **`dom` mode** — no useful ARIA; sends DOM tree with XPaths and CSS selectors generated in-browser
+ 4. **Queries the LLM** — sends the payload with your natural language query; the LLM returns a Playwright locator string (e.g., `getByRole('button', { name: 'Login' })`) along with fallback locators
+ 5. **Validates and caches** — verifies the locator resolves to a real element, caches it to memory and disk, and returns a `SmartAction` for interaction
+ 6. **Auto-recovers** — if a cached locator goes stale, it's invalidated and re-resolved; if the primary locator fails, fallback locators are tried automatically
+
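A minimal sketch of the fallback chain described in step 6, not taken from the package source. It reuses the `locatorString` and `fallbackLocators` fields that the README shows on the result object; `pickWorkingLocator` and the `resolves` callback are hypothetical names standing in for whatever check the library runs against the page.

```js
// Hypothetical helper: try the primary locator string first, then each fallback in order.
// `resolves` is an assumed async predicate that reports whether a locator string
// currently matches an element on the page.
async function pickWorkingLocator(result, resolves) {
  const candidates = [result.locatorString, ...(result.fallbackLocators ?? [])];
  for (const candidate of candidates) {
    if (await resolves(candidate)) {
      return candidate;
    }
  }
  return null; // nothing matched: the caller would invalidate the cache and re-query
}

// Usage with a stubbed check: pretend only the test-id locator still matches.
const staleResult = {
  locatorString: "getByRole('button', { name: 'Login' })",
  fallbackLocators: ["getByTestId('login-btn')", "locator('#login')"],
};
const stillOnPage = new Set(["getByTestId('login-btn')"]);

pickWorkingLocator(staleResult, async (candidate) => stillOnPage.has(candidate))
  .then((chosen) => console.log('chosen locator:', chosen));
```

Returning `null` leaves the decision to invalidate and re-query with the caller, which keeps the helper free of any cache or LLM dependency.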
  ## API Reference
 
  ### `createSmartLocator(page, options?)`
@@ -114,6 +133,7 @@ const smart = createSmartLocator(page, { verbose: true });
  | `model` | `string` | env var | Override LLM model |
  | `temperature` | `number` | `0` | LLM temperature |
  | `maxTokens` | `number` | `1024` | Max response tokens |
+ | `actionTimeout` | `number` | `10000` | Timeout for actions in ms |
 
  ---
 
@@ -282,6 +302,113 @@ smart.clearCache(); // clear in-memory cache only
  smart.clearAllCaches(); // clear both memory + disk (.locator-cache.json)
  ```
 
+ Use `clearCache()` after SPA navigations where the DOM changes significantly (e.g., after login) to force fresh page analysis.
+
+ ## Low-Level API
+
+ For advanced use cases, you can use the lower-level exports directly.
+
+ ### `findLocator(page, query, options?)`
+
+ Resolves a single natural language query to a Playwright locator string without performing any action.
+
+ ```js
+ import { findLocator } from 'playwright-genie';
+
+ const result = await findLocator(page, 'submit button');
+ console.log(result);
+ // {
+ //   found: true,
+ //   strategy: 'role',
+ //   locatorString: "getByRole('button', { name: 'Submit' })",
+ //   confidence: 0.95,
+ //   fallbackLocators: ["getByTestId('submit-btn')", "locator('#submit')"],
+ //   isInFrame: false,
+ //   frameSelector: null
+ // }
+ ```
+
+ ### `findAllMatches(page, query, options?)`
+
+ Returns an array of all matching locator results for a query.
+
+ ```js
+ import { findAllMatches } from 'playwright-genie';
+
+ const matches = await findAllMatches(page, 'navigation link');
+ ```
+
+ ### `getPageStructure(page, forceRefresh?)`
+
+ Returns the full page structure used for LLM resolution. Useful for debugging or building custom pipelines.
+
+ ```js
+ import { getPageStructure } from 'playwright-genie';
+
+ const structure = await getPageStructure(page);
+ console.log(structure.mainFrame.ariaQuality); // 'good' | 'sparse' | 'none'
+ console.log(structure.mainFrame.ariaTree); // ARIA snapshot (YAML string)
+ console.log(structure.mainFrame.domTree); // Array of DOM nodes with XPath/CSS
+ console.log(structure.mainFrame.interactiveElements); // Interactive element metadata
+ console.log(structure.mainFrame.specialElements); // Elements with data-testid, etc.
+ console.log(structure.frames); // iframe structures
+ ```
+
+ ### `resolveLocator(page, query, options?)`
+
+ Low-level resolver that checks memory cache → disk cache → LLM. Returns the raw result object without creating a Playwright locator.
+
+ ```js
+ import { resolveLocator } from 'playwright-genie';
+
+ const result = await resolveLocator(page, 'login button', { action: 'click' });
+ // result.source is 'memory', 'disk', or 'llm'
+ ```
+
+ ### `getLocator(page, query, options?)`
+
+ Resolves a query and returns both the Playwright `Locator` object and the result metadata. Handles stale cache invalidation and fallback chains.
+
+ ```js
+ import { getLocator } from 'playwright-genie';
+
+ const { locator, result } = await getLocator(page, 'email input', { action: 'fill' });
+ await locator.fill('user@example.com');
+ ```
+
+ ### `clearCache()` / `clearAllCaches()`
+
+ Module-level cache clearing functions.
+
+ ```js
+ import { clearCache, clearAllCaches } from 'playwright-genie';
+
+ clearCache(); // memory only
+ clearAllCaches(); // memory + disk
+ ```
+
+ ## Adaptive Page Analysis
+
+ The library automatically adapts to the accessibility quality of each page:
+
+ | ARIA Quality | Criteria | Mode | What Gets Sent to LLM |
+ |---|---|---|---|
+ | **`good`** | ARIA tree has 3+ named interactive elements, 20+ lines | `aria` | Trimmed ARIA tree + special elements + interactive elements |
+ | **`sparse`** | ARIA tree exists but fewer named elements than the page has | `hybrid` | ARIA tree + DOM tree nodes (XPath/CSS) + interactive elements |
+ | **`none`** | ARIA tree has < 5 lines or is missing | `dom` | DOM tree with XPaths and CSS selectors + interactive elements |
+
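One way the thresholds in the table could be expressed, as a sketch under stated assumptions rather than the library's actual scoring. The regex assumes ARIA snapshot lines of the form `- button "Submit"`, the role list is illustrative, and anything that is neither `none` nor `good` is simply treated as `sparse`. `classifyAriaQuality` and `MODE_BY_QUALITY` are hypothetical names.

```js
// Assumed set of roles that count as "interactive"; the package may use a different list.
const INTERACTIVE_ROLES = ['button', 'link', 'textbox', 'checkbox', 'radio', 'combobox'];

// Matches snapshot lines such as `- button "Submit"` (role followed by a quoted name).
const NAMED_INTERACTIVE = new RegExp(`^\\s*- (${INTERACTIVE_ROLES.join('|')}) "[^"]+"`);

// Classify an ARIA snapshot string using the line and element counts from the table.
function classifyAriaQuality(ariaTree) {
  const lines = (ariaTree ?? '').split('\n').filter((line) => line.trim() !== '');
  if (lines.length < 5) return 'none'; // missing or fewer than 5 lines
  const named = lines.filter((line) => NAMED_INTERACTIVE.test(line)).length;
  return named >= 3 && lines.length >= 20 ? 'good' : 'sparse';
}

// Quality-to-mode mapping taken from the table's Mode column.
const MODE_BY_QUALITY = { good: 'aria', sparse: 'hybrid', none: 'dom' };

// Usage: a two-line snapshot falls below the 5-line floor.
const tinySnapshot = '- heading "Welcome"\n- button "Go"';
const quality = classifyAriaQuality(tinySnapshot);
console.log(quality, MODE_BY_QUALITY[quality]);
```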
+ ### DOM Tree Pipeline
+
+ For pages with poor or no accessibility markup, the library walks the DOM in-browser and:
+
+ - Traverses up to **300 visible nodes** (headings, links, buttons, inputs, landmarks, etc.)
+ - Generates **XPath** for each node (e.g., `//*[@id="login"]`, `//form/div[2]/input[1]`)
+ - Generates **unique CSS selectors** (e.g., `#login`, `[data-testid="submit"]`, `button.primary`)
+ - Extracts text content, ARIA labels, placeholders, `data-testid` attributes, and parent context
+ - Filters nodes by **relevance scoring** against your query before sending to the LLM
+
+ This means the library works on any page — not just accessible ones.
+
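The README does not say how relevance scoring works. As an illustration only, the sketch below uses plain token overlap between the query and each node's extracted fields. The node shape (`text`, `ariaLabel`, `placeholder`, `testId`) and the function names are assumptions, not the package's data model.

```js
// Split a string into lowercase alphanumeric tokens.
function tokenize(value) {
  return (value ?? '').toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Score = number of query tokens that appear in any of the node's text-like fields.
function scoreNode(node, query) {
  const haystack = new Set(
    [node.text, node.ariaLabel, node.placeholder, node.testId].flatMap((field) => tokenize(field))
  );
  return tokenize(query).filter((token) => haystack.has(token)).length;
}

// Keep only nodes with a positive score, best first, capped at `limit`.
function topNodes(nodes, query, limit = 5) {
  return nodes
    .map((node) => ({ node, score: scoreNode(node, query) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.node);
}

// Usage with hand-written nodes (hypothetical shape).
const sampleNodes = [
  { text: 'Submit order', testId: 'submit-btn', css: '[data-testid="submit-btn"]' },
  { text: 'Cancel', css: 'button.secondary' },
  { placeholder: 'Search products', css: '#search' },
];
console.log(topNodes(sampleNodes, 'submit button').map((node) => node.css));
```

A filter like this trades recall for payload size: a node whose wording shares no token with the query is dropped before the LLM ever sees it.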
  ## Caching
 
  **playwright-genie** uses a two-level cache to minimize LLM calls:
@@ -291,7 +418,9 @@ smart.clearAllCaches(); // clear both memory + disk (.locator-cache.json)
 
  Cache keys are scoped by **URL pathname + action + query**, so `fill('username')` on `/login` won't collide with `click('username')` on `/dashboard`.
 
- If a cached locator becomes stale (element no longer exists), it's automatically invalidated and re-resolved via LLM.
+ If a cached locator becomes stale (element no longer exists), the library:
+ 1. Tries **fallback locators** returned by the LLM
+ 2. If all fallbacks fail, **invalidates the cache** and re-queries the LLM with fresh page structure
 
  Set `LOCATOR_CACHE_FILE` env var to customize the cache file path.
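A minimal sketch of the key scoping and the memory, then disk, then LLM lookup order, assuming a `Map` for the in-memory tier and a plain object for the parsed disk file. The `::` separator and the names `cacheKey` and `lookup` are hypothetical; the README only states which three parts make up the key.

```js
// Build a key from URL pathname + action + query; query strings and hosts are ignored.
function cacheKey(pageUrl, action, query) {
  const { pathname } = new URL(pageUrl);
  return `${pathname}::${action}::${query}`;
}

// Check memory first, then disk; a disk hit is copied into memory for next time.
// A miss reports source 'llm', meaning the caller still has to ask the model.
function lookup(key, memory, disk) {
  if (memory.has(key)) {
    return { source: 'memory', value: memory.get(key) };
  }
  if (Object.hasOwn(disk, key)) {
    memory.set(key, disk[key]);
    return { source: 'disk', value: disk[key] };
  }
  return { source: 'llm', value: null };
}

// Usage: the same query on different pages and actions yields different keys.
const loginKey = cacheKey('https://myapp.com/login?next=%2F', 'fill', 'username');
const dashboardKey = cacheKey('https://myapp.com/dashboard', 'click', 'username');
console.log(loginKey, dashboardKey);

const memoryTier = new Map();
const diskTier = { [loginKey]: "getByLabel('Username')" };
console.log(lookup(loginKey, memoryTier, diskTier).source); // first hit comes from disk
console.log(lookup(loginKey, memoryTier, diskTier).source); // then from memory
```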
 
@@ -353,6 +482,51 @@ test('shopping flow', async ({ page }) => {
  });
  ```
 
+ ### SPA Navigation with Cache Clearing
+
+ ```js
+ test('SPA login and navigate', async ({ page }) => {
+   const smart = createSmartLocator(page);
+   await page.goto('https://spa-app.com/login');
+
+   await smart.fill('username', 'admin');
+   await smart.fill('password', 'secret');
+   await smart.click('login button');
+
+   // After SPA navigation, clear cache to force fresh page analysis
+   await page.waitForURL('**/dashboard');
+   smart.clearCache();
+
+   await smart.click('settings tab');
+   await smart.waitForVisible('settings panel');
+ });
+ ```
+
+ ### Batch Pre-fetch for Performance
+
+ ```js
+ test('prefetch for faster tests', async ({ page }) => {
+   const smart = createSmartLocator(page);
+   await page.goto('https://myapp.com/form');
+
+   // Resolve all locators in one LLM call
+   await smart.prefetch(
+     'first name input',
+     'last name input',
+     'email field',
+     'phone number',
+     'submit button'
+   );
+
+   // All cached — zero LLM calls from here
+   await smart.fill('first name input', 'John');
+   await smart.fill('last name input', 'Doe');
+   await smart.fill('email field', 'john@example.com');
+   await smart.fill('phone number', '555-0123');
+   await smart.click('submit button');
+ });
+ ```
+
  ### Dynamic Content & Modals
 
  ```js
@@ -372,35 +546,46 @@ test('handle dynamic content', async ({ page }) => {
  });
  ```
 
- ### Form Validation
+ ### Using Low-Level API for Debugging
 
  ```js
- test('form validation', async ({ page }) => {
-   const smart = createSmartLocator(page);
-   await page.goto('https://myapp.com/signup');
-
-   await smart.click('submit button');
-   await smart.waitForVisible('email error message');
-
-   const error = await smart.getText('email error message');
-   expect(error).toContain('required');
-
-   await smart.fill('email field', 'user@example.com');
-   const enabled = await smart.isEnabled('submit button');
-   expect(enabled).toBe(true);
+ import { getPageStructure, findLocator } from 'playwright-genie';
+
+ test('debug locator resolution', async ({ page }) => {
+   await page.goto('https://myapp.com');
+
+   // Inspect page analysis
+   const structure = await getPageStructure(page);
+   console.log('ARIA quality:', structure.mainFrame.ariaQuality);
+   console.log('DOM nodes:', structure.mainFrame.domTree.length);
+   console.log('Interactive elements:', structure.mainFrame.interactiveElements.length);
+
+   // See what the LLM resolves without acting
+   const result = await findLocator(page, 'submit button');
+   console.log('Strategy:', result.strategy);
+   console.log('Locator:', result.locatorString);
+   console.log('Fallbacks:', result.fallbackLocators);
  });
  ```
 
- ### Drag & Drop
+ ## Exports
 
  ```js
- test('kanban board', async ({ page }) => {
-   const smart = createSmartLocator(page);
-   await page.goto('https://app.example.com/board');
-
-   await smart.dragTo('first task card', 'done column');
-   await smart.waitForVisible('task moved toast');
- });
+ import {
+   createSmartLocator, // Main entry — creates smart locator with 40+ action methods
+   findLocator, // Resolve a query to a locator string (no action)
+   findAllMatches, // Get all matching locator results
+   getPageStructure, // Get the full page structure (ARIA + DOM + interactive elements)
+   getLocator, // Resolve + validate + create Playwright Locator object
+   resolveLocator, // Low-level: cache lookup → LLM resolution
+   clearCache, // Clear in-memory cache
+   clearAllCaches, // Clear memory + disk cache
+   SmartAction, // Class wrapping a Playwright Locator with 40+ methods
+   chatCompletion, // Direct LLM call (for custom pipelines)
+   getConfig, // Get current LLM configuration
+   loadDiskCache, // Load disk cache manually
+   invalidateDiskEntry, // Invalidate a specific disk cache entry
+ } from 'playwright-genie';
  ```
 
  ## Best Practices
@@ -441,6 +626,18 @@ await el.press('Enter');
  // 1 LLM call instead of 2
  ```
 
+ **Use prefetch() for forms and multi-element pages:**
+ ```js
+ await smart.prefetch('name', 'email', 'password', 'submit');
+ // 1 LLM call instead of 4
+ ```
+
+ **Clear cache after SPA navigation:**
+ ```js
+ await page.waitForURL('**/dashboard');
+ smart.clearCache();
+ ```
+
  ## TypeScript
 
  ```ts
@@ -465,14 +662,28 @@ LLM_LOCATOR_DEBUG=true npx playwright test
  ```
 
  This logs:
+ - Payload mode selected (`aria`/`hybrid`/`dom`) and payload size
  - LLM queries and responses
  - Cache hits/misses (memory and disk)
- - Page structure payload sizes
- - Stale cache invalidations
+ - Stale cache invalidations and fallback attempts
+ - Page structure collection timing
+
+ ## Environment Variables
+
+ | Variable | Description |
+ |---|---|
+ | `LLM_API_KEY` | API key for LLM provider |
+ | `LLM_BASE_URL` | Base URL for OpenAI-compatible API |
+ | `LLM_MODEL` | Model name (e.g., `gpt-4o-mini`) |
+ | `OPENAI_API_KEY` | Fallback API key |
+ | `ANTHROPIC_API_KEY` | Fallback API key |
+ | `ROUTELLM_API_KEY` | Fallback API key |
+ | `LOCATOR_CACHE_FILE` | Custom path for disk cache file |
+ | `LLM_LOCATOR_DEBUG` | Set to `true` to enable debug logging |
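A preflight sketch that reads the variables from the table, for example in a test setup file, so a missing key fails early. The README does not state which fallback key wins when several are set, so the order below simply follows the table and is an assumption; `readLlmEnv` and the returned field names are hypothetical and unrelated to the package's own `getConfig`.

```js
// Assumed precedence: LLM_API_KEY first, then the fallback keys in table order.
const API_KEY_VARS = ['LLM_API_KEY', 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'ROUTELLM_API_KEY'];

// Collect the documented variables from an env-like object (defaults to process.env).
function readLlmEnv(env = process.env) {
  const keyVar = API_KEY_VARS.find((name) => Boolean(env[name]));
  return {
    apiKey: keyVar ? env[keyVar] : undefined,
    apiKeySource: keyVar ?? null, // which variable supplied the key, if any
    baseUrl: env.LLM_BASE_URL,
    model: env.LLM_MODEL,
    cacheFile: env.LOCATOR_CACHE_FILE,
    debug: env.LLM_LOCATOR_DEBUG === 'true',
  };
}

// Usage: report where the key came from without printing the key itself.
const settings = readLlmEnv({ OPENAI_API_KEY: 'example-key', LLM_LOCATOR_DEBUG: 'true' });
console.log(settings.apiKeySource, settings.debug);
```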
 
  ## Security Notes
 
- - The library sends the page's accessibility tree to your configured LLM API
+ - The library sends the page's accessibility tree and/or DOM structure to your configured LLM API
  - Sensitive data visible in the DOM may be sent to the API
  - Use environment variables for API keys — never hardcode them
  - For sensitive environments, use a local LLM (e.g., Ollama)
@@ -484,9 +695,3 @@ Contributions are welcome! Please feel free to submit a Pull Request.
  ## License
 
  MIT License — see the [LICENSE](LICENSE) file for details.
-
- ## Acknowledgments
-
- - [Playwright](https://playwright.dev/) — browser automation framework
- - [OpenAI](https://openai.com/) — LLM API support
- - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) — AI-powered code generation and architecture design