npm - @ulpi/browse - Versions diffs - 0.7.3 → 0.7.5 - Mend

@ulpi/browse 0.7.3 → 0.7.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -10,26 +10,33 @@ Ten actions and you've burned **146K tokens — 73% of a 200K context window**
 **Same 10 actions: ~11K tokens. 6% of context. 13x less than @playwright/mcp.**
-## Benchmarks (Measured)
-Tested on 4 e-commerce sites (mumzworld, amazon, ebay, nike) across homepage, search results, and product detail pages ([raw data](BENCHMARKS.md)):
-| Site | Page | @playwright/mcp navigate | browse snapshot -i | Reduction |
-|------|------|-------------------------:|-------------------:|----------:|
-| mumzworld.com | Homepage | ~51,151 | ~15,072 | **3x** |
-| mumzworld.com | Search | ~13,860 | ~3,614 | **4x** |
-| mumzworld.com | PDP | ~10,071 | ~3,084 | **3x** |
-| amazon.com | Homepage | ~10,431 | ~2,150 | **5x** |
-| amazon.com | Search | ~19,458 | ~3,644 | **5x** |
-| ebay.com | Homepage | ~4,641 | ~1,557 | **3x** |
-| ebay.com | Search | ~35,929 | ~7,088 | **5x** |
-| ebay.com | PDP | ~1,294 | ~678 | **2x** |
-| nike.com | Homepage | ~2,495 | ~816 | **3x** |
-| nike.com | Search | ~7,998 | ~2,678 | **3x** |
-| nike.com | PDP | ~3,034 | ~989 | **3x** |
-| **TOTAL** | **11 pages** | **~160,362** | **~41,370** | **4x** |
-And that's the per-snapshot comparison. The real gap is architectural — @playwright/mcp dumps a snapshot on every action (navigate, click, type). `browse` only returns ~15 tokens per action:
+## Benchmarks
+### vs Agent Browser & Browser-Use (Token Cost)
+Tested on 3 sites across multi-step browsing flows — navigate, snapshot, scroll, search, extract text:
+**browse is 2.4-2.8x cheaper on tokens, 1.3-2.6x faster, and uses 7% of context vs 17-20%.**
+| Tool | Total Tokens | Total Time | Context Used (200K) |
+|------|-------------:|-----------:|--------------------:|
+| **browse** | **14,134** | **28.5s** | **7.1%** |
+| agent-browser | 39,414 | 36.2s | 19.7% |
+| browser-use | 34,281 | 72.7s | 17.1% |
+**Per site:**
+| Site | browse tokens | agent-browser tokens | browser-use tokens | browse time | agent-browser time | browser-use time |
+|------|-------:|-------------:|------------:|------:|------:|------:|
+| amazon.com | 7,531 | 11,596 | 20,508 | 10.1s | 12.9s | 21.9s |
+| bbc.com | 4,032 | 24,861 | 8,827 | 9.8s | 13.5s | 29.9s |
+| booking.com | 2,571 | 2,957 | 4,946 | 8.6s | 9.8s | 20.9s |
+browse uses **2.4x fewer tokens** than browser-use and **2.8x fewer** than agent-browser — and completes **2.5x faster** than browser-use across the same workflows.
+### vs @playwright/mcp (Architecture)
+@playwright/mcp dumps the full accessibility snapshot on every action (navigate, click, type). browse returns ~15 tokens per action — the agent requests a snapshot only when it needs one:
 | | @playwright/mcp | @ulpi/browse |
 |---|---:|---:|
@@ -241,8 +248,10 @@ browse click @e52
 ### Snapshot & Refs
 ```
-snapshot [-i] [-c] [-C] [-d N] [-s sel]
-  -i    Interactive elements only (buttons, links, inputs)
+snapshot [-i] [-f] [-V] [-c] [-C] [-d N] [-s sel]
+  -i    Interactive elements only — terse flat list (minimal tokens)
+  -f    Full — indented tree with props and children (use with -i)
+  -V    Viewport — only elements visible in current viewport
   -c    Compact — remove empty structural nodes
   -C    Cursor-interactive — detect hidden clickable elements
   -d N  Limit tree depth
@@ -362,6 +371,19 @@ Inspired by and originally derived from the `/browse` skill in [gstack](https://
 ## Changelog
+### v0.7.0 — Token Optimization
+- `snapshot -i` now outputs terse flat list by default (no indentation, no props, names truncated to 30 chars)
+- `-f` flag for full indented ARIA tree with props/children (the old `-i` behavior)
+- `-V` flag for viewport-only snapshot — filters to elements visible in the current viewport (BBC: 189 → 28 elements, ~85% reduction)
+- `browse version` / `--version` / `-V` — print CLI version
+- 2.4-2.8x fewer tokens than browser-use and agent-browser across real-world benchmarks
+### v0.4.0 — Video Recording
+- `video start [dir]` | `video stop` | `video status` — compositor-level WebM recording
+- Works with local and remote (CDP) browsers
 ### v0.3.0 — Headed Mode, Clipboard, DevTools
 - `--headed` flag — run browser in visible mode for debugging and demos
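
The headline figures in the new README benchmark section can be re-derived from its own tables. The following is a minimal sketch, not part of the package: the totals and per-site numbers are copied from the diff above, while the names `Run`, `contextPercent`, and `ratio` are hypothetical helpers introduced only for this check.

```typescript
// Totals copied from the README tables in the diff; helper names are illustrative only.
const CONTEXT_WINDOW = 200_000;

interface Run { tool: string; tokens: number; seconds: number }

const browse: Run = { tool: 'browse', tokens: 14_134, seconds: 28.5 };
const others: Run[] = [
  { tool: 'agent-browser', tokens: 39_414, seconds: 36.2 },
  { tool: 'browser-use', tokens: 34_281, seconds: 72.7 },
];

// Per-site browse token counts (amazon.com, bbc.com, booking.com) from the second table.
const browsePerSite = [7_531, 4_032, 2_571];

// Share of a 200K context window, as a percentage with one decimal place.
function contextPercent(tokens: number): string {
  return ((tokens / CONTEXT_WINDOW) * 100).toFixed(1);
}

// How many times larger `a` is than `b`, with one decimal place.
function ratio(a: number, b: number): string {
  return (a / b).toFixed(1);
}

const browseSiteSum = browsePerSite.reduce((sum, n) => sum + n, 0);
console.log(`browse per-site sum: ${browseSiteSum}`);
console.log(`browse context: ${contextPercent(browse.tokens)}%`);

for (const r of others) {
  console.log(
    `${r.tool}: ${contextPercent(r.tokens)}% of context, ` +
    `${ratio(r.tokens, browse.tokens)}x tokens, ` +
    `${ratio(r.seconds, browse.seconds)}x time`,
  );
}

// Changelog claim for -V on BBC: 189 elements reduced to 28.
const viewportReduction = (((189 - 28) / 189) * 100).toFixed(0);
console.log(`viewport reduction: ~${viewportReduction}%`);
```

Under these inputs the per-site browse counts sum to the 14,134 total, and the rounded ratios agree with the README's ranges. The browser-use time ratio is about 2.55, which would explain why the README quotes it as 2.6x in one place and 2.5x in another.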

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ulpi/browse",
-  "version": "0.7.3",
+  "version": "0.7.5",
   "repository": {
     "type": "git",
     "url": "https://github.com/ulpi-io/browse"

package/src/snapshot.ts CHANGED
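
The hunk for this file replaces one `boundingBox` call per locator with a single in-page evaluation that matches each ref by role and name, then tests its rectangle against the viewport height. As a rough sketch of the predicates involved, the block below isolates them as pure functions. The function names (`parseLine`, `overlapsViewport`, `nameMatches`) and the sample line layout `@e5 [link] "Home"` are assumptions made for illustration; only the regular expressions and comparisons are taken from the diff.

```typescript
// Hypothetical shapes for illustration; only the regexes and comparisons mirror the diff.
interface Rect { y: number; height: number }
interface Check { ref: string; role: string; name: string }

// Extract ref, role, and name from one snapshot output line.
// The layout `@e5 [link] "Home"` is assumed, not confirmed by the source.
function parseLine(line: string): Check | null {
  const ref = line.match(/@(e\d+)/)?.[1];
  if (!ref) return null;
  const role = line.match(/\[(\w+)\]/)?.[1] ?? '';
  const name = line.match(/"([^"]*)"/)?.[1] ?? '';
  return { ref, role, name };
}

// True when the rectangle overlaps the viewport vertically.
// Horizontal position is not considered, matching the comparison in the diff.
function overlapsViewport(rect: Rect, vpHeight: number): boolean {
  return rect.y + rect.height > 0 && rect.y < vpHeight;
}

// Compare a snapshot name against an element's accessible name.
// An empty snapshot name matches anything; a name ending in '...' is
// treated as truncated and compared by prefix.
function nameMatches(snapshotName: string, accName: string): boolean {
  if (!snapshotName) return true;
  if (accName === snapshotName) return true;
  return snapshotName.endsWith('...') &&
    accName.startsWith(snapshotName.slice(0, -3));
}

const parsed = parseLine('@e5 [link] "Home"');
console.log(parsed);
console.log(overlapsViewport({ y: -50, height: 60 }, 800));
console.log(overlapsViewport({ y: 800, height: 10 }, 800));
console.log(nameMatches('Sign in to your acc...', 'Sign in to your account'));
```

The visibility test is the logical negation of the removal condition in the deleted code (`box.y >= vp.height || box.y + box.height <= 0`), so the vertical criterion itself appears unchanged. What differs is how elements are located: by selector plus name inside the page rather than through Playwright locators.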

@@ -446,27 +446,80 @@ export async function handleSnapshot(
     output.push(outputLine);
   }
-  // Viewport filter: remove elements below the visible viewport
+  // Viewport filter: remove elements outside the visible viewport
+  // Uses a single page.evaluate() for speed — checking 189 locators individually is slow
   if (opts.viewport) {
     const vp = page.viewportSize();
     if (vp) {
-      const toRemove = new Set<string>();
-      await Promise.all(
-        Array.from(refMap.entries()).map(async ([ref, loc]) => {
-          try {
-            const box = await loc.boundingBox({ timeout: 500 });
-            if (!box || box.y >= vp.height || box.y + box.height <= 0) {
-              toRemove.add(ref);
+      // Build a list of {ref, role, name} to check in the DOM
+      const checks = Array.from(refMap.keys()).map(ref => {
+        const line = output.find(l => l.includes(`@${ref} `));
+        const roleMatch = line?.match(/\[(\w+)\]/);
+        const nameMatch = line?.match(/"([^"]*)"/);
+        return { ref, role: roleMatch?.[1] || '', name: nameMatch?.[1] || '' };
+      });
+      const visibleRefs = await evalCtx.evaluate(
+        ({ checks, vpHeight }) => {
+          const ROLE_TO_SELECTOR: Record<string, string> = {
+            link: 'a,[role="link"]',
+            button: 'button,[role="button"],input[type="button"],input[type="submit"]',
+            textbox: 'input:not([type="checkbox"]):not([type="radio"]):not([type="submit"]):not([type="button"]):not([type="hidden"]),textarea,[role="textbox"]',
+            checkbox: 'input[type="checkbox"],[role="checkbox"]',
+            radio: 'input[type="radio"],[role="radio"]',
+            combobox: 'select,[role="combobox"]',
+            searchbox: 'input[type="search"],[role="searchbox"]',
+            tab: '[role="tab"]',
+            switch: '[role="switch"]',
+            slider: 'input[type="range"],[role="slider"]',
+            menuitem: '[role="menuitem"]',
+            option: 'option,[role="option"]',
+          };
+          const visible = new Set<string>();
+          // Track which elements we've already matched per role+name
+          const roleCounts = new Map<string, number>();
+          for (const { ref, role, name } of checks) {
+            const selector = ROLE_TO_SELECTOR[role] || `[role="${role}"]`;
+            const all = document.querySelectorAll(selector);
+            const key = `${role}:${name}`;
+            const skip = roleCounts.get(key) || 0;
+            let matched = 0;
+            for (let i = 0; i < all.length; i++) {
+              const el = all[i] as HTMLElement;
+              // Match by accessible name (textContent or aria-label)
+              const accName = (el.getAttribute('aria-label') || el.textContent || '').trim();
+              // For terse mode, name may be truncated — check startsWith
+              const nameMatches = !name || accName === name ||
+                (name.endsWith('...') && accName.startsWith(name.slice(0, -3)));
+              if (!nameMatches) continue;
+              if (matched < skip) { matched++; continue; }
+              const rect = el.getBoundingClientRect();
+              if (rect.y + rect.height > 0 && rect.y < vpHeight) {
+                visible.add(ref);
+              }
+              matched++;
+              break;
             }
-          } catch {
-            toRemove.add(ref);
+            roleCounts.set(key, skip + 1);
           }
-        })
+          return [...visible];
+        },
+        { checks, vpHeight: vp.height }
       );
+      const visibleSet = new Set(visibleRefs);
+      const toRemove = new Set<string>();
+      for (const ref of refMap.keys()) {
+        if (!visibleSet.has(ref)) toRemove.add(ref);
+      }
       for (const ref of toRemove) {
         refMap.delete(ref);
       }
-      // Remove output lines for filtered refs
       for (let i = output.length - 1; i >= 0; i--) {
         const match = output[i].match(/@(e\d+)/);
         if (match && toRemove.has(match[1])) {