npm - glippy-mcp - Versions diffs - 0.1.0 → 0.3.0 - Mend

glippy-mcp 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -9,14 +9,15 @@ This MCP server enables AI models (Claude, GPT, etc.) to directly analyse any do
 It wraps the Glippy desktop app's server-side analysis engine (`geo-checker.js`) and exposes it over the standard MCP protocol via stdio transport.
 **Key features:**
-- Full 10-category GEO analysis with weighted scoring
+- Full 16-category GEO analysis with weighted scoring
 - robots.txt AI crawler access detection
 - llms.txt file discovery and parsing
 - Sitemap crawling and multi-page analysis
 - Domain comparison and competitive analysis
 - Export to styled Markdown or HTML reports
-- **Smart caching** — automatic deduplication of repeated analyses
-- **JSON output mode** — pass analysis results between tools to avoid re-crawling
+- **Smart caching** - automatic deduplication of repeated analyses
+- **JSON output mode** - pass analysis results between tools to avoid re-crawling
+- **Headless Chrome fallback** - automatically retries via a real browser when a site blocks bot-shaped fetches (Cloudflare, Akamai, DataDome, etc.)
 ---
@@ -41,6 +42,7 @@ It wraps the Glippy desktop app's server-side analysis engine (`geo-checker.js`)
 - [GEO Scoring Categories](#geo-scoring-categories)
 - [Rate Limiting](#rate-limiting)
 - [Output Formats](#output-formats)
+- [Chrome Rendering Fallback](#chrome-rendering-fallback)
 - [Architecture](#architecture)
 - [Manual Testing](#manual-testing)
 - [Troubleshooting](#troubleshooting)
@@ -68,6 +70,7 @@ npx -y glippy-mcp
 - Node.js 18.0.0 or higher
 - Valid Glippy MCP license key
+- **Optional:** Google Chrome or Chromium installed locally. Only needed if you want the Chrome-rendered fallback to kick in when a target site blocks static fetches. Without Chrome the server still works; it just cannot recover from WAF-blocked pages.
 ---
@@ -124,8 +127,13 @@ Add to your `.mcp.json` in your project root or `~/.claude/.mcp.json` for global
 | Variable | Required | Default | Description |
 |----------|----------|---------|-------------|
-| `GLIPPY_LICENSE_KEY` | Yes | — | Your MCP license key (`GLMCP-XXXX-XXXX-XXXX`) |
+| `GLIPPY_LICENSE_KEY` | Yes | - | Your MCP license key (`GLMCP-XXXX-XXXX-XXXX`) |
 | `GLIPPY_RATE_LIMIT` | No | `5` | Default max requests/second per domain for batch tools |
+| `CHROME_PATH` | No | auto-detect | Absolute path to your Chrome/Chromium binary. Overrides the built-in detection list. |
+| `PUPPETEER_EXECUTABLE_PATH` | No | auto-detect | Alternative name for `CHROME_PATH`, honored for puppeteer-core compatibility. |
+| `CHROME_REMOTE_URL` | No | - | Attach to an already-running Chrome instead of launching a new one. Accepts either `http://host:9222` (browserURL) or `ws://...` (browserWSEndpoint). Start Chrome with `--remote-debugging-port=9222`. |
+| `CHROME_HEADLESS` | No | `new` | Set to `0` or `false` to run Chrome visible. Useful for sites that aggressively detect headless. |
+| `CHROME_USER_DATA_DIR` | No | - | Path to a Chrome user-data directory. Lets the fallback reuse cookies, extensions, and auth state from a dedicated profile. |
 ---
@@ -160,7 +168,7 @@ The integration guide includes:
 Run a comprehensive GEO readiness analysis on a domain.
-**Description:** Checks robots.txt, llms.txt, homepage HTML (10 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use `output_format="json"` to get raw results that can be passed to `export_report`.
+**Description:** Checks robots.txt, llms.txt, homepage HTML (16 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use `render_mode="auto"` to transparently fall back to headless Chrome when a site blocks static fetches (Cloudflare, Akamai, etc.). Use `output_format="json"` to get raw results that can be passed to `export_report`.
 **Parameters:**
@@ -168,6 +176,7 @@ Run a comprehensive GEO readiness analysis on a domain.
 |-----------|------|----------|-------------|
 | `domain` | string | Yes | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
 | `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. |
+| `render_mode` | enum | No | `"static"` (default) = plain Node fetch, fastest. `"auto"` = static first, falls back to a local headless Chrome on bot-blocked responses (401/403/407/429/503 or empty 2xx). `"chrome"` = always render via Chrome. Chrome modes need a local Chrome binary (see [Chrome Rendering Fallback](#chrome-rendering-fallback)). |
 | `output_format` | enum | No | `"text"` (default) for human-readable report, `"json"` for raw results to pass to `export_report`. |
 **Example:**
@@ -184,11 +193,12 @@ analyze_domain domain="example.com" max_pages=5 output_format="json"
 **Returns:**
 - Overall GEO score (0-100) with letter grade
 - Page type detection (article, product, homepage, etc.)
-- 10 category scores with pass/fail/warn checks
+- 16 category scores with pass/fail/warn checks
 - robots.txt analysis with AI crawler access
 - llms.txt presence and content preview
 - Sitemap discovery status
 - Multi-page aggregated scores (if `max_pages > 1`)
+- `renderMode` field on the result: `static`, `chrome`, `chrome-fallback`, or a blocked code (`chrome-blocked-<code>`, `static-blocked`) if the fetch was still blocked
 ---
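
When consuming `output_format="json"` results, the `renderMode` field can be turned into something easier to branch on. The following is a minimal sketch, not part of the package: `parseRenderMode` is a hypothetical helper, the value list is taken from the README's Chrome Rendering Fallback section, and the `<code>` suffix is kept as an opaque string because the diff does not say what form it takes.

```javascript
// Hypothetical helper: interpret a renderMode value from an analysis result.
// Returns null for values the README does not document.
function parseRenderMode(value) {
  const chromeBlocked = /^chrome-blocked-(.+)$/.exec(value);
  if (chromeBlocked) {
    // Chrome was tried but also got blocked; keep the suffix verbatim.
    return { path: 'chrome', blocked: true, code: chromeBlocked[1] };
  }
  switch (value) {
    case 'static':
      return { path: 'static', blocked: false, code: null };
    case 'chrome':
    case 'chrome-fallback':
      return { path: 'chrome', blocked: false, code: null };
    case 'static-blocked':
      return { path: 'static', blocked: true, code: null };
    default:
      return null;
  }
}

console.log(parseRenderMode('chrome-fallback'));
console.log(parseRenderMode('chrome-blocked-403'));
```

A caller could, for example, filter a bulk result for entries where `blocked` is true and re-run only those with `render_mode="chrome"`.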
@@ -264,6 +274,7 @@ Get a concise GEO readiness summary for quick assessment.
 | Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
 | `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 **Example:**
 ```
@@ -291,6 +302,7 @@ Analyse multiple domains in parallel and compare scores.
 |-----------|------|----------|-------------|
 | `domains` | array[string] | Yes | List of 2-10 domains to compare, e.g. `["example.com", "competitor.com"]`. Do not include `https://` prefix. |
 | `max_pages` | integer | No | Maximum pages to crawl per domain (1-10). Default: `10`. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `output_format` | enum | No | `"text"` (default) for comparison table, `"json"` for raw results to pass to `export_bulk_report`. |
 **Example:**
@@ -300,7 +312,7 @@ Compare GEO scores of example.com, competitor1.com, and competitor2.com
 **Returns:**
 - Ranked list of domains by score
-- Category comparison table (all 10 categories)
+- Category comparison table (all 16 categories)
 - Quick facts comparison (robots.txt, llms.txt, sitemap, blocked crawlers)
 - Error details for any failed analyses
@@ -319,6 +331,7 @@ Fetch a sitemap and analyse all discovered pages.
 | `sitemap_url` | string | Yes | Full URL to sitemap, e.g. `"https://example.com/sitemap.xml"` |
 | `max_urls` | integer | No | Maximum URLs to analyse (1-50,000). Default: all URLs found. |
 | `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `5`. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). Applied per URL. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
 **Example:**
@@ -350,6 +363,7 @@ Run GEO analysis on a list of specific URLs.
 |-----------|------|----------|-------------|
 | `urls` | array[string] | Yes | List of 1-50,000 full URLs, e.g. `["https://example.com/about", "https://example.com/pricing"]`. Include `https://` prefix. |
 | `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `5`. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). Applied per URL. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
 **Example:**
@@ -377,6 +391,7 @@ Generate a styled, shareable report file.
 | `domain` | string | No* | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
 | `format` | enum | Yes | Report format: `"markdown"` (recommendations only), `"markdown_full"` (all categories and checks), or `"html"` (standalone styled page). |
 | `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. Ignored if `analysis_result` is provided. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (Chrome fallback on bot-block), or `"chrome"` (always Chrome). Ignored if `analysis_result` is provided. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `analysis_result` | object | No* | Pre-computed analysis result from `analyze_domain` (with `output_format="json"`). Skips re-crawling. |
 *Either `domain` or `analysis_result` must be provided.
@@ -420,6 +435,7 @@ Generate a styled report for bulk analysis.
 | `max_pages` | integer | No | For domain mode: pages per domain (1-10). Default: `10`. Ignored if `analysis_results` provided. |
 | `max_urls` | integer | No | For sitemap mode: max URLs to analyse. Default: all. Ignored if `analysis_results` provided. |
 | `rate_limit` | number | No | Max requests/second per domain. Default: `5`. Ignored if `analysis_results` provided. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (Chrome fallback on bot-block), or `"chrome"` (always Chrome). Ignored if `analysis_results` provided. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 *Provide exactly one of: `domains`, `urls`, `sitemap_url`, or `analysis_results`.
@@ -445,7 +461,7 @@ export_bulk_report format="html" analysis_results=<result from above>
 ## GEO Scoring Categories
-The analysis evaluates 10 categories, each with a weight reflecting its importance for AI/LLM readiness:
+The analysis evaluates 16 categories, each with a weight reflecting its importance for AI/LLM readiness:
 | # | Category | Weight | What It Measures |
 |---|----------|--------|------------------|
@@ -455,10 +471,16 @@ The analysis evaluates 10 categories, each with a weight reflecting its importan
 | 4 | **Internal Linking** | 1.0x | Link density, navigation structure, breadcrumb markup |
 | 5 | **Meta & Discoverability** | 1.0x | Title, meta description, canonical URL, Open Graph tags, hreflang |
 | 6 | **Machine Readability** | 1.5x | SSR detection, bot blocking checks, robots.txt rules, llms.txt presence* |
-| 7 | **Entity & Authority** | 1.0x | Author information, publication dates, organization schema |
+| 7 | **Entity & Authority** | 1.0x | Author info, publication dates, organization schema, E-E-A-T signals, credentials, editorial policy, contact completeness |
 | 8 | **Citability & Answer-Readiness** | 1.3x | FAQ content, data tables, lists, lead paragraph quality |
 | 9 | **Performance & Crawlability** | 0.3x | Image dimensions, lazy loading, resource hints |
 | 10 | **Agent Interactivity** | 0.2x | WebMCP tools, form annotations, agent-callable actions |
+| 11 | **Content Positioning** | 1.2x | Brand differentiation, proof points, social proof |
+| 12 | **Content Freshness** | 0.8x | Date signals, content age, temporal language |
+| 13 | **Information Density** | 1.0x | Substantive-to-filler ratio, section depth, claim-evidence pairing |
+| 14 | **Factual Verifiability** | 0.8x | Citations, source attribution, methodology disclosure |
+| 15 | **Content Comprehensiveness** | 0.8x | Word count, heading coverage, definitions, comparisons |
+| 16 | **Multimodal Content** | 0.5x | Image alt text, figures, video/audio, SVG, multimedia schema |
 *\*llms.txt is checked for presence but is not currently supported or consumed by any major AI model or crawler. It has minimal practical impact on GEO readiness today — see the [`check_llms_txt`](#check_llms_txt) section for details.*
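
To illustrate how per-category weights could combine into the overall 0-100 score, here is a minimal sketch that assumes a plain weighted mean. The diff does not show the actual aggregation in `geo-checker.js`, so `weightedScore` and the sample scores are hypothetical; only the weights come from the table rows visible above.

```javascript
// Hypothetical aggregation: weighted mean of per-category scores (each 0-100).
// The real formula used by geo-checker.js is not shown in this diff.
function weightedScore(categories) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { score, weight } of categories) {
    weightedSum += score * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round(weightedSum / totalWeight);
}

// Weights are from the README table; the scores are made-up sample values.
const sample = [
  { name: 'Machine Readability', weight: 1.5, score: 80 },
  { name: 'Content Positioning', weight: 1.2, score: 50 },
  { name: 'Multimodal Content', weight: 0.5, score: 20 },
];

console.log(weightedScore(sample)); // 59
```

Under this assumption, a low score in a 0.5x category such as Multimodal Content pulls the total down far less than the same score in a 1.5x category would.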
@@ -593,13 +615,68 @@ export_bulk_report format="html" analysis_results=<JSON from step 1>
 ---
+## Chrome Rendering Fallback
+Some sites (Cloudflare, Akamai, PerimeterX, DataDome, Incapsula) answer static Node fetches with bot-block responses (401/403/407/429/503, or a 2xx with an empty body). The server can drive a real Chrome instance to fetch those pages instead, so they still get scored.
+### Choosing a render mode
+Every analysis tool (`analyze_domain`, `get_geo_summary`, `compare_domains`, `analyze_urls`, `analyze_sitemap`, `export_report`, `export_bulk_report`) accepts a `render_mode` parameter:
+| Mode | Behavior | Use when |
+|------|----------|----------|
+| `static` *(default)* | Plain Node fetch. Fast. No Chrome required. | You're scoring sites that don't block bots, or you explicitly want to see how a static crawler experiences the page. |
+| `auto` | Static fetch first. If it looks bot-blocked (status 401/403/407/429/503, or 2xx with an empty body), retry that URL via Chrome. | Mixed workloads - most sites fast-path through static; only blocked ones pay the Chrome cost. Recommended for competitive audits across a list of domains. |
+| `chrome` | Every URL fetched via Chrome. Slowest, most resilient. | You know the targets aggressively detect headless and want to front-load the Chrome cost, or you're debugging rendering differences. |
+The result object includes a `renderMode` field so you can tell which path ran: `static`, `chrome`, `chrome-fallback`, `chrome-blocked-<code>` (Chrome tried but also got blocked), or `static-blocked` (both paths failed).
+### Setup
+Chrome modes need a Chrome or Chromium binary. The server looks in these locations, in order:
+1. `CHROME_PATH` env var
+2. `PUPPETEER_EXECUTABLE_PATH` env var
+3. `C:/Program Files/Google/Chrome/Application/chrome.exe`
+4. `C:/Program Files (x86)/Google/Chrome/Application/chrome.exe`
+5. `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
+6. `/usr/bin/google-chrome`, `/usr/bin/chromium`, `/usr/bin/chromium-browser`
+If none exist, `render_mode: "static"` still works; only the Chrome-backed modes become unavailable.
+### Attaching to your own Chrome
+For sites that fingerprint headless Chrome, start a Chrome instance with remote debugging and point the server at it. The server will attach to that instance instead of launching its own:
+```bash
+# macOS
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
+  --remote-debugging-port=9222 --user-data-dir=/tmp/glippy-chrome
+# Windows (PowerShell)
+& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
+  --remote-debugging-port=9222 --user-data-dir=C:\Temp\glippy-chrome
+# Then in your MCP config env:
+#   CHROME_REMOTE_URL=http://127.0.0.1:9222
+```
+Using a dedicated `--user-data-dir` keeps this session isolated from your normal browsing. When attached, the fetcher leaves UA/headers/stealth untouched so requests look identical to a human using that browser.
+### Visible mode
+For debugging, set `CHROME_HEADLESS=0` to watch Chrome drive itself. This is meant for development; keep Chrome headless in production.
+---
 ## Architecture
 ```
 research-mcp/
 ├── src/
-│   ├── index.js          # MCP server — tool registration, JSON-RPC handling, license validation
-│   └── geo-checker.js    # GEO analysis engine — fetches & scores domains
+│   ├── index.js           # MCP server - tool registration, JSON-RPC handling, license validation
+│   ├── geo-checker.js     # GEO analysis engine - fetches & scores domains
+│   └── chrome-fetcher.js  # Headless Chrome adapter (puppeteer-core) for WAF-blocked sites
 ├── package.json
 └── README.md
 ```
@@ -609,13 +686,13 @@ research-mcp/
 1. **Fetch resources in parallel:**
    - robots.txt
    - llms.txt
-   - Homepage HTML
+   - Homepage HTML (static fetch first, Chrome fallback if bot-blocked)
    - sitemap.xml
    - UCP profile (/.well-known/ucp)
 2. **Parse HTML with cheerio** (server-side DOM)
-3. **Run 10 weighted scoring categories**
+3. **Run 16 weighted scoring categories**
 4. **Return comprehensive analysis** with actionable recommendations
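
The `render_mode` behaviour described in the README ("static fetch first, Chrome fallback if bot-blocked") can be sketched as follows. This is an illustration under assumptions, not the package's code: `looksBotBlocked` and `fetchWithRenderMode` are hypothetical names, both fetchers are injected, and the `chrome-blocked-<code>` and `static-blocked` outcomes are left out for brevity. The status codes and the empty-2xx rule are the ones the README lists.

```javascript
// Status codes the README lists as bot-block signals for render_mode="auto".
const BLOCK_STATUSES = new Set([401, 403, 407, 429, 503]);

// True when a static response looks bot-blocked: a listed status code,
// or a 2xx response whose body is missing or blank.
function looksBotBlocked({ statusCode, body }) {
  if (BLOCK_STATUSES.has(statusCode)) return true;
  const is2xx = statusCode >= 200 && statusCode < 300;
  return is2xx && (!body || body.trim() === '');
}

// Hypothetical dispatcher. staticFetch and chromeFetch are passed in and are
// expected to resolve to { body, statusCode, headers, finalUrl }.
async function fetchWithRenderMode(url, renderMode, { staticFetch, chromeFetch }) {
  if (renderMode === 'chrome') {
    return { ...(await chromeFetch(url)), renderMode: 'chrome' };
  }
  const staticResult = await staticFetch(url);
  if (renderMode !== 'auto' || !looksBotBlocked(staticResult)) {
    return { ...staticResult, renderMode: 'static' };
  }
  // Only blocked URLs pay the Chrome cost.
  return { ...(await chromeFetch(url)), renderMode: 'chrome-fallback' };
}
```

Passing the fetchers in as arguments keeps the decision logic testable without launching a browser.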

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "glippy-mcp",
-  "version": "0.1.0",
+  "version": "0.3.0",
   "description": "MCP server for GEO (Generative Engine Optimization) analysis — check any domain's AI-readiness",
   "main": "src/index.js",
   "type": "module",
@@ -38,6 +38,7 @@
   "dependencies": {
     "@modelcontextprotocol/sdk": "^1.12.1",
     "cheerio": "^1.0.0",
+    "puppeteer-core": "^24.40.0",
     "zod": "^3.24.0"
   }
 }

package/src/chrome-fetcher.js ADDED

@@ -0,0 +1,213 @@
+// Chrome-backed fetch adapter for geo-checker.
+//
+// Exposes the same shape as the internal throttledFetchUrl:
+//   { body, statusCode, headers, finalUrl }
+// but drives a headless Chrome via puppeteer-core so that bot-mitigation
+// layers (Cloudflare, Akamai, PerimeterX, DataDome, Incapsula) that block
+// raw Node fetches don't keep us out.
+//
+// The module holds a single long-lived browser and opens a fresh page per
+// fetch. Callers fetch URLs sequentially; this is fine for the audit path
+// (one domain at a time per checkGEO call) and avoids spinning up a new
+// Chromium process per page.
+import puppeteer from 'puppeteer-core';
+const DEFAULT_TIMEOUT_MS = 30_000;
+const WAIT_UNTIL = 'networkidle2';
+const DEFAULT_CHROME_PATHS = [
+  process.env.CHROME_PATH,
+  process.env.PUPPETEER_EXECUTABLE_PATH,
+  'C:/Program Files/Google/Chrome/Application/chrome.exe',
+  'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe',
+  '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
+  '/usr/bin/google-chrome',
+  '/usr/bin/chromium',
+  '/usr/bin/chromium-browser',
+].filter(Boolean);
+let browserPromise = null;
+let connectedToExisting = false;
+async function resolveChromePath() {
+  const fs = await import('node:fs/promises');
+  for (const p of DEFAULT_CHROME_PATHS) {
+    try {
+      await fs.access(p);
+      return p;
+    } catch {
+      // try next
+    }
+  }
+  return null;
+}
+async function getBrowser() {
+  if (browserPromise) return browserPromise;
+  browserPromise = (async () => {
+    // Mode 1: attach to a user's already-running Chrome via CDP.
+    // Start Chrome with `--remote-debugging-port=9222`; to reuse an existing
+    // profile, point `--user-data-dir=...` at a dedicated copy of it.
+    // CHROME_REMOTE_URL can be either a browserURL (http://host:port) or a
+    // browserWSEndpoint (ws://...).
+    const remoteUrl = process.env.CHROME_REMOTE_URL;
+    if (remoteUrl) {
+      const opts = remoteUrl.startsWith('ws')
+        ? { browserWSEndpoint: remoteUrl }
+        : { browserURL: remoteUrl };
+      const browser = await puppeteer.connect({
+        ...opts,
+        defaultViewport: null,
+      });
+      connectedToExisting = true;
+      return browser;
+    }
+    // Mode 2: launch our own Chrome. Headless by default; set
+    // CHROME_HEADLESS=0 to run visible (useful for sites that aggressively
+    // detect headless).
+    const executablePath = await resolveChromePath();
+    if (!executablePath) {
+      throw new Error(
+        'Chrome executable not found. Set CHROME_PATH or install Chrome/Chromium.',
+      );
+    }
+    const headlessEnv = process.env.CHROME_HEADLESS;
+    const headless = headlessEnv === '0' || headlessEnv === 'false' ? false : 'new';
+    const userDataDir = process.env.CHROME_USER_DATA_DIR || undefined;
+    const browser = await puppeteer.launch({
+      executablePath,
+      headless,
+      userDataDir,
+      args: [
+        '--no-sandbox',
+        '--disable-dev-shm-usage',
+        '--disable-blink-features=AutomationControlled',
+        '--disable-features=IsolateOrigins,site-per-process',
+      ],
+    });
+    return browser;
+  })();
+  return browserPromise;
+}
+async function applyStealth(page) {
+  // Minimal stealth: mask the navigator.webdriver flag and add common
+  // properties that headless Chrome misses. This won't defeat enterprise
+  // bot mitigation, but clears the trivial checks many WAFs rely on.
+  await page.evaluateOnNewDocument(() => {
+    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
+    // languages / plugins
+    Object.defineProperty(navigator, 'languages', { get: () => ['nl-NL', 'nl', 'en-US', 'en'] });
+    Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
+    // chrome.runtime stub
+    window.chrome = window.chrome || { runtime: {} };
+    // permissions query patch (Notification)
+    const originalQuery = window.navigator.permissions && window.navigator.permissions.query;
+    if (originalQuery) {
+      window.navigator.permissions.query = (parameters) =>
+        parameters.name === 'notifications'
+          ? Promise.resolve({ state: Notification.permission })
+          : originalQuery(parameters);
+    }
+  });
+}
+export async function chromeFetch(url, timeoutMs = DEFAULT_TIMEOUT_MS) {
+  const empty = { body: null, statusCode: null, headers: {}, finalUrl: null };
+  let page;
+  try {
+    const browser = await getBrowser();
+    page = await browser.newPage();
+    // When attached to a user's Chrome, leave UA/headers/stealth alone —
+    // their real profile already looks like a human. Only shape the
+    // request when we launched Chrome ourselves.
+    if (!connectedToExisting) {
+      await page.setViewport({ width: 1366, height: 768 });
+      await page.setUserAgent(
+        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
+      );
+      await page.setExtraHTTPHeaders({
+        'Accept-Language': 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7',
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
+        'Upgrade-Insecure-Requests': '1',
+        'Sec-Fetch-Dest': 'document',
+        'Sec-Fetch-Mode': 'navigate',
+        'Sec-Fetch-Site': 'none',
+        'Sec-Fetch-User': '?1',
+      });
+      await applyStealth(page);
+    }
+    const response = await page.goto(url, {
+      waitUntil: WAIT_UNTIL,
+      timeout: timeoutMs,
+    });
+    if (!response) return empty;
+    let statusCode = response.status();
+    let headers = response.headers() || {};
+    // Some WAFs (Cloudflare) serve a 403 interstitial, then JS solves
+    // the challenge and navigates to real content. Give it a brief window
+    // to settle and re-read the final status from the live document.
+    if (statusCode === 403 || statusCode === 503) {
+      try {
+        await page.waitForFunction(
+          () => {
+            const html = document.documentElement ? document.documentElement.outerHTML : '';
+            // Cloudflare challenge markers
+            return !/cf-challenge|cf-browser-verification|Just a moment/i.test(html);
+          },
+          { timeout: 8000 },
+        );
+        // Re-evaluate: if navigation happened, fetch the new main response.
+        const finalResp = page.mainFrame().url() !== url
+          ? await page.waitForResponse(() => true, { timeout: 2000 }).catch(() => null)
+          : null;
+        if (finalResp) {
+          statusCode = finalResp.status();
+          headers = finalResp.headers() || headers;
+        }
+      } catch {
+        // challenge didn't clear — keep the 403/503 so caller can decide.
+      }
+    }
+    const finalUrl = page.url();
+    let body = null;
+    try {
+      body = await page.content();
+    } catch {
+      body = null;
+    }
+    return { body, statusCode, headers, finalUrl };
+  } catch (err) {
+    return { ...empty, error: err.message };
+  } finally {
+    if (page) {
+      try { await page.close(); } catch { /* ignore */ }
+    }
+  }
+}
+export async function closeBrowser() {
+  if (!browserPromise) return;
+  try {
+    const browser = await browserPromise;
+    if (connectedToExisting) {
+      await browser.disconnect();
+    } else {
+      await browser.close();
+    }
+  } catch { /* ignore */ }
+  browserPromise = null;
+  connectedToExisting = false;
+}
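+// Close on process exit so the Chrome process doesn't linger.
+// closeBrowser() is async: an 'exit' handler cannot await it (best effort
+// only), so the signal handlers wait for it to settle before exiting.
+const shutdown = () => { closeBrowser().catch(() => {}); };
+process.once('exit', shutdown);
+process.once('SIGINT', () => { closeBrowser().finally(() => process.exit(130)); });
+process.once('SIGTERM', () => { closeBrowser().finally(() => process.exit(143)); });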
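
The environment handling in `getBrowser()` boils down to a small decision: attach if `CHROME_REMOTE_URL` is set, otherwise launch. The sketch below restates that decision as a pure function so the precedence is easy to see. `chromePlan` is a hypothetical name that does not exist in the package; the option names (`browserWSEndpoint`, `browserURL`, `headless`, `userDataDir`) are the ones the added file passes to puppeteer-core.

```javascript
// Hypothetical pure helper mirroring the env handling in getBrowser().
// It only computes options; it does not touch puppeteer or the filesystem.
function chromePlan(env) {
  const remoteUrl = env.CHROME_REMOTE_URL;
  if (remoteUrl) {
    // ws:// and wss:// map to browserWSEndpoint, anything else to browserURL.
    const options = remoteUrl.startsWith('ws')
      ? { browserWSEndpoint: remoteUrl }
      : { browserURL: remoteUrl };
    return { mode: 'connect', options };
  }
  // Only '0' and 'false' switch to a visible window; any other value
  // (including unset) keeps Chrome headless.
  const visible = env.CHROME_HEADLESS === '0' || env.CHROME_HEADLESS === 'false';
  return {
    mode: 'launch',
    options: {
      headless: visible ? false : 'new',
      userDataDir: env.CHROME_USER_DATA_DIR || undefined,
    },
  };
}

console.log(chromePlan({ CHROME_REMOTE_URL: 'http://127.0.0.1:9222' }));
console.log(chromePlan({ CHROME_HEADLESS: '0', CHROME_USER_DATA_DIR: '/tmp/glippy-chrome' }));
```

Note that when `CHROME_REMOTE_URL` is set, `CHROME_PATH`, `CHROME_HEADLESS`, and `CHROME_USER_DATA_DIR` play no role, since no browser is launched.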