glippy-mcp 0.1.0 → 0.2.0

Files changed (5)

package/README.md CHANGED

@@ -9,14 +9,15 @@ This MCP server enables AI models (Claude, GPT, etc.) to directly analyse any do
 It wraps the Glippy desktop app's server-side analysis engine (`geo-checker.js`) and exposes it over the standard MCP protocol via stdio transport.
 **Key features:**
-- Full 10-category GEO analysis with weighted scoring
+- Full 16-category GEO analysis with weighted scoring
 - robots.txt AI crawler access detection
 - llms.txt file discovery and parsing
 - Sitemap crawling and multi-page analysis
 - Domain comparison and competitive analysis
 - Export to styled Markdown or HTML reports
-- **Smart caching** — automatic deduplication of repeated analyses
-- **JSON output mode** — pass analysis results between tools to avoid re-crawling
+- **Smart caching** - automatic deduplication of repeated analyses
+- **JSON output mode** - pass analysis results between tools to avoid re-crawling
+- **Headless Chrome fallback** - automatically retries via a real browser when a site blocks bot-shaped fetches (Cloudflare, Akamai, DataDome, etc.)
 ---
@@ -41,6 +42,7 @@ It wraps the Glippy desktop app's server-side analysis engine (`geo-checker.js`)
 - [GEO Scoring Categories](#geo-scoring-categories)
 - [Rate Limiting](#rate-limiting)
 - [Output Formats](#output-formats)
+- [Chrome Rendering Fallback](#chrome-rendering-fallback)
 - [Architecture](#architecture)
 - [Manual Testing](#manual-testing)
 - [Troubleshooting](#troubleshooting)
@@ -68,6 +70,7 @@ npx -y glippy-mcp
 - Node.js 18.0.0 or higher
 - Valid Glippy MCP license key
+- **Optional:** Google Chrome or Chromium installed locally. Only needed if you want the Chrome-rendered fallback to kick in when a target site blocks static fetches. Without Chrome the server still works; it just cannot recover from WAF-blocked pages.
 ---
@@ -124,8 +127,13 @@ Add to your `.mcp.json` in your project root or `~/.claude/.mcp.json` for global
 | Variable | Required | Default | Description |
 |----------|----------|---------|-------------|
-| `GLIPPY_LICENSE_KEY` | Yes | — | Your MCP license key (`GLMCP-XXXX-XXXX-XXXX`) |
+| `GLIPPY_LICENSE_KEY` | Yes | - | Your MCP license key (`GLMCP-XXXX-XXXX-XXXX`) |
 | `GLIPPY_RATE_LIMIT` | No | `5` | Default max requests/second per domain for batch tools |
+| `CHROME_PATH` | No | auto-detect | Absolute path to your Chrome/Chromium binary. Overrides the built-in detection list. |
+| `PUPPETEER_EXECUTABLE_PATH` | No | auto-detect | Alternative name for `CHROME_PATH`, honored for puppeteer-core compatibility. |
+| `CHROME_REMOTE_URL` | No | - | Attach to an already-running Chrome instead of launching a new one. Accepts either `http://host:9222` (browserURL) or `ws://...` (browserWSEndpoint). Start Chrome with `--remote-debugging-port=9222`. |
+| `CHROME_HEADLESS` | No | `new` | Set to `0` or `false` to run Chrome visible. Useful for sites that aggressively detect headless. |
+| `CHROME_USER_DATA_DIR` | No | - | Path to a Chrome user-data directory. Lets the fallback reuse cookies, extensions, and auth state from a dedicated profile. |
 ---
@@ -160,7 +168,7 @@ The integration guide includes:
 Run a comprehensive GEO readiness analysis on a domain.
-**Description:** Checks robots.txt, llms.txt, homepage HTML (10 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use `output_format="json"` to get raw results that can be passed to `export_report`.
+**Description:** Checks robots.txt, llms.txt, homepage HTML (16 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use `render_mode="auto"` to transparently fall back to headless Chrome when a site blocks static fetches (Cloudflare, Akamai, etc.). Use `output_format="json"` to get raw results that can be passed to `export_report`.
 **Parameters:**
@@ -168,6 +176,7 @@ Run a comprehensive GEO readiness analysis on a domain.
 |-----------|------|----------|-------------|
 | `domain` | string | Yes | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
 | `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. |
+| `render_mode` | enum | No | `"static"` (default) = plain Node fetch, fastest. `"auto"` = static first, falls back to a local headless Chrome on bot-blocked responses (401/403/407/429/503 or empty 2xx). `"chrome"` = always render via Chrome. Chrome modes need a local Chrome binary (see [Chrome Rendering Fallback](#chrome-rendering-fallback)). |
 | `output_format` | enum | No | `"text"` (default) for human-readable report, `"json"` for raw results to pass to `export_report`. |
 **Example:**
@@ -184,11 +193,12 @@ analyze_domain domain="example.com" max_pages=5 output_format="json"
 **Returns:**
 - Overall GEO score (0-100) with letter grade
 - Page type detection (article, product, homepage, etc.)
-- 10 category scores with pass/fail/warn checks
+- 16 category scores with pass/fail/warn checks
 - robots.txt analysis with AI crawler access
 - llms.txt presence and content preview
 - Sitemap discovery status
 - Multi-page aggregated scores (if `max_pages > 1`)
+- `renderMode` flag on the result: `static`, `chrome-fallback`, or an error code if both paths failed
 ---
@@ -264,6 +274,7 @@ Get a concise GEO readiness summary for quick assessment.
 | Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
 | `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 **Example:**
 ```
@@ -291,6 +302,7 @@ Analyse multiple domains in parallel and compare scores.
 |-----------|------|----------|-------------|
 | `domains` | array[string] | Yes | List of 2-10 domains to compare, e.g. `["example.com", "competitor.com"]`. Do not include `https://` prefix. |
 | `max_pages` | integer | No | Maximum pages to crawl per domain (1-10). Default: `10`. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `output_format` | enum | No | `"text"` (default) for comparison table, `"json"` for raw results to pass to `export_bulk_report`. |
 **Example:**
@@ -300,7 +312,7 @@ Compare GEO scores of example.com, competitor1.com, and competitor2.com
 **Returns:**
 - Ranked list of domains by score
-- Category comparison table (all 10 categories)
+- Category comparison table (all 16 categories)
 - Quick facts comparison (robots.txt, llms.txt, sitemap, blocked crawlers)
 - Error details for any failed analyses
@@ -319,6 +331,7 @@ Fetch a sitemap and analyse all discovered pages.
 | `sitemap_url` | string | Yes | Full URL to sitemap, e.g. `"https://example.com/sitemap.xml"` |
 | `max_urls` | integer | No | Maximum URLs to analyse (1-50,000). Default: all URLs found. |
 | `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `5`. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). Applied per URL. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
 **Example:**
@@ -350,6 +363,7 @@ Run GEO analysis on a list of specific URLs.
 |-----------|------|----------|-------------|
 | `urls` | array[string] | Yes | List of 1-50,000 full URLs, e.g. `["https://example.com/about", "https://example.com/pricing"]`. Include `https://` prefix. |
 | `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `5`. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (static with Chrome fallback on bot-block), or `"chrome"` (always Chrome). Applied per URL. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
 **Example:**
@@ -377,6 +391,7 @@ Generate a styled, shareable report file.
 | `domain` | string | No* | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
 | `format` | enum | Yes | Report format: `"markdown"` (recommendations only), `"markdown_full"` (all categories and checks), or `"html"` (standalone styled page). |
 | `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. Ignored if `analysis_result` is provided. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (Chrome fallback on bot-block), or `"chrome"` (always Chrome). Ignored if `analysis_result` is provided. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 | `analysis_result` | object | No* | Pre-computed analysis result from `analyze_domain` (with `output_format="json"`). Skips re-crawling. |
 *Either `domain` or `analysis_result` must be provided.
@@ -420,6 +435,7 @@ Generate a styled report for bulk analysis.
 | `max_pages` | integer | No | For domain mode: pages per domain (1-10). Default: `10`. Ignored if `analysis_results` provided. |
 | `max_urls` | integer | No | For sitemap mode: max URLs to analyse. Default: all. Ignored if `analysis_results` provided. |
 | `rate_limit` | number | No | Max requests/second per domain. Default: `5`. Ignored if `analysis_results` provided. |
+| `render_mode` | enum | No | `"static"` (default), `"auto"` (Chrome fallback on bot-block), or `"chrome"` (always Chrome). Ignored if `analysis_results` provided. See [Chrome Rendering Fallback](#chrome-rendering-fallback). |
 *Provide exactly one of: `domains`, `urls`, `sitemap_url`, or `analysis_results`.
@@ -445,7 +461,7 @@ export_bulk_report format="html" analysis_results=<result from above>
 ## GEO Scoring Categories
-The analysis evaluates 10 categories, each with a weight reflecting its importance for AI/LLM readiness:
+The analysis evaluates 16 categories, each with a weight reflecting its importance for AI/LLM readiness:
 | # | Category | Weight | What It Measures |
 |---|----------|--------|------------------|
@@ -455,10 +471,16 @@ The analysis evaluates 10 categories, each with a weight reflecting its importan
 | 4 | **Internal Linking** | 1.0x | Link density, navigation structure, breadcrumb markup |
 | 5 | **Meta & Discoverability** | 1.0x | Title, meta description, canonical URL, Open Graph tags, hreflang |
 | 6 | **Machine Readability** | 1.5x | SSR detection, bot blocking checks, robots.txt rules, llms.txt presence* |
-| 7 | **Entity & Authority** | 1.0x | Author information, publication dates, organization schema |
+| 7 | **Entity & Authority** | 1.0x | Author info, publication dates, organization schema, E-E-A-T signals, credentials, editorial policy, contact completeness |
 | 8 | **Citability & Answer-Readiness** | 1.3x | FAQ content, data tables, lists, lead paragraph quality |
 | 9 | **Performance & Crawlability** | 0.3x | Image dimensions, lazy loading, resource hints |
 | 10 | **Agent Interactivity** | 0.2x | WebMCP tools, form annotations, agent-callable actions |
+| 11 | **Content Positioning** | 1.2x | Brand differentiation, proof points, social proof |
+| 12 | **Content Freshness** | 0.8x | Date signals, content age, temporal language |
+| 13 | **Information Density** | 1.0x | Substantive-to-filler ratio, section depth, claim-evidence pairing |
+| 14 | **Factual Verifiability** | 0.8x | Citations, source attribution, methodology disclosure |
+| 15 | **Content Comprehensiveness** | 0.8x | Word count, heading coverage, definitions, comparisons |
+| 16 | **Multimodal Content** | 0.5x | Image alt text, figures, video/audio, SVG, multimedia schema |
 *\*llms.txt is checked for presence but is not currently supported or consumed by any major AI model or crawler. It has minimal practical impact on GEO readiness today — see the [`check_llms_txt`](#check_llms_txt) section for details.*
@@ -593,13 +615,68 @@ export_bulk_report format="html" analysis_results=<JSON from step 1>
 ---
+## Chrome Rendering Fallback
+Some sites (Cloudflare, Akamai, PerimeterX, DataDome, Incapsula) refuse static Node fetches with 401/403/429/503 responses. The server can drive a real Chrome instance to fetch those pages instead, so they still get scored.
+### Choosing a render mode
+Every analysis tool (`analyze_domain`, `get_geo_summary`, `compare_domains`, `analyze_urls`, `analyze_sitemap`, `export_report`, `export_bulk_report`) accepts a `render_mode` parameter:
+| Mode | Behavior | Use when |
+|------|----------|----------|
+| `static` *(default)* | Plain Node fetch. Fast. No Chrome required. | You're scoring sites that don't block bots, or you explicitly want to see how a static crawler experiences the page. |
+| `auto` | Static fetch first. If it looks bot-blocked (status 401/403/407/429/503, or 2xx with an empty body), retry that URL via Chrome. | Mixed workloads - most sites fast-path through static; only blocked ones pay the Chrome cost. Recommended for competitive audits across a list of domains. |
+| `chrome` | Every URL fetched via Chrome. Slowest, most resilient. | You know the targets aggressively detect headless and want to front-load the Chrome cost, or you're debugging rendering differences. |
+The result object includes a `renderMode` field so you can tell which path ran: `static`, `chrome`, `chrome-fallback`, `chrome-blocked-<code>` (Chrome tried but also got blocked), or `static-blocked` (both paths failed).
+### Setup
+Chrome modes need a Chrome or Chromium binary. The server looks in these locations, in order:
+1. `CHROME_PATH` env var
+2. `PUPPETEER_EXECUTABLE_PATH` env var
+3. `C:/Program Files/Google/Chrome/Application/chrome.exe`
+4. `C:/Program Files (x86)/Google/Chrome/Application/chrome.exe`
+5. `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
+6. `/usr/bin/google-chrome`, `/usr/bin/chromium`, `/usr/bin/chromium-browser`
+If none exist, `render_mode: "static"` still works; only the Chrome-backed modes become unavailable.
+### Attaching to your own Chrome
+For sites that fingerprint headless Chrome, start a Chrome instance with remote debugging and point the server at it. The server will attach to that instance instead of launching its own:
+```bash
+# macOS
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
+  --remote-debugging-port=9222 --user-data-dir=/tmp/glippy-chrome
+# Windows (PowerShell)
+& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
+  --remote-debugging-port=9222 --user-data-dir=C:\Temp\glippy-chrome
+# Then in your MCP config env:
+#   CHROME_REMOTE_URL=http://127.0.0.1:9222
+```
+Using a dedicated `--user-data-dir` keeps this session isolated from your normal browsing. When attached, the fetcher leaves UA/headers/stealth untouched so requests look identical to a human using that browser.
+### Visible mode
+For debugging, set `CHROME_HEADLESS=0` to watch Chrome drive itself. Purely for development - leave it off in production.
+---
 ## Architecture
 ```
 research-mcp/
 ├── src/
-│   ├── index.js          # MCP server — tool registration, JSON-RPC handling, license validation
-│   └── geo-checker.js    # GEO analysis engine — fetches & scores domains
+│   ├── index.js           # MCP server - tool registration, JSON-RPC handling, license validation
+│   ├── geo-checker.js     # GEO analysis engine - fetches & scores domains
+│   └── chrome-fetcher.js  # Headless Chrome adapter (puppeteer-core) for WAF-blocked sites
 ├── package.json
 └── README.md
 ```
@@ -609,13 +686,13 @@ research-mcp/
 1. **Fetch resources in parallel:**
    - robots.txt
    - llms.txt
-   - Homepage HTML
+   - Homepage HTML (static fetch first, Chrome fallback if bot-blocked)
    - sitemap.xml
    - UCP profile (/.well-known/ucp)
 2. **Parse HTML with cheerio** (server-side DOM)
-3. **Run 10 weighted scoring categories**
+3. **Run 16 weighted scoring categories**
 4. **Return comprehensive analysis** with actionable recommendations
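
The `auto` mode described in the README hunks above reads as a small decision procedure: try the static fetch, and only pay the Chrome cost when the response looks bot-blocked. The following is a minimal sketch under stated assumptions, not the package's implementation. `fetchWithRenderMode`, `isBlocked`, and the injected `staticFetch` / `chromeFetch` functions are hypothetical names, the status list is the one the README documents, and the `static-blocked` label is not modeled because the diff does not show when it is emitted.

```javascript
// Statuses the README documents as "bot-blocked" for render_mode="auto".
const README_BLOCK_STATUS = new Set([401, 403, 407, 429, 503]);

function isBlocked(res) {
  if (!res || res.statusCode == null) return true;
  if (README_BLOCK_STATUS.has(res.statusCode)) return true;
  // A 2xx response with an empty body is also treated as blocked.
  return res.statusCode >= 200 && res.statusCode < 300 && !res.body;
}

// Hypothetical helper: the fetchers are injected so the flow can be
// exercised without a network connection or a Chrome binary.
async function fetchWithRenderMode(url, mode, { staticFetch, chromeFetch }) {
  if (mode !== 'chrome') {
    const res = await staticFetch(url);
    // "static" never escalates; "auto" escalates only on a blocked response.
    if (mode === 'static' || !isBlocked(res)) {
      return { ...res, renderMode: 'static' };
    }
  }
  const res = await chromeFetch(url);
  if (isBlocked(res)) {
    // Assumption: label with the status code, or "error" when there is none.
    return { ...res, renderMode: `chrome-blocked-${res?.statusCode ?? 'error'}` };
  }
  return { ...res, renderMode: mode === 'chrome' ? 'chrome' : 'chrome-fallback' };
}
```

Injecting the fetchers keeps the branching testable on its own; in the package the static path and `chromeFetch` from `chrome-fetcher.js` would presumably fill those roles.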

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "glippy-mcp",
-  "version": "0.1.0",
+  "version": "0.2.0",
   "description": "MCP server for GEO (Generative Engine Optimization) analysis — check any domain's AI-readiness",
   "main": "src/index.js",
   "type": "module",
@@ -38,6 +38,7 @@
   "dependencies": {
     "@modelcontextprotocol/sdk": "^1.12.1",
     "cheerio": "^1.0.0",
+    "puppeteer-core": "^24.40.0",
     "zod": "^3.24.0"
   }
 }

package/src/chrome-fetcher.js ADDED

@@ -0,0 +1,213 @@
+// Chrome-backed fetch adapter for geo-checker.
+//
+// Exposes the same shape as the internal throttledFetchUrl:
+//   { body, statusCode, headers, finalUrl }
+// but drives a headless Chrome via puppeteer-core so that bot-mitigation
+// layers (Cloudflare, Akamai, PerimeterX, DataDome, Incapsula) that block
+// raw Node fetches don't keep us out.
+//
+// The module holds a single long-lived browser + page pair. Callers fetch
+// URLs sequentially; this is fine for the audit path (one domain at a time
+// per checkGEO call) and avoids spinning up a new chromium process per page.
+import puppeteer from 'puppeteer-core';
+const DEFAULT_TIMEOUT_MS = 30_000;
+const WAIT_UNTIL = 'networkidle2';
+const DEFAULT_CHROME_PATHS = [
+  process.env.CHROME_PATH,
+  process.env.PUPPETEER_EXECUTABLE_PATH,
+  'C:/Program Files/Google/Chrome/Application/chrome.exe',
+  'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe',
+  '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
+  '/usr/bin/google-chrome',
+  '/usr/bin/chromium',
+  '/usr/bin/chromium-browser',
+].filter(Boolean);
+let browserPromise = null;
+let connectedToExisting = false;
+async function resolveChromePath() {
+  const fs = await import('node:fs/promises');
+  for (const p of DEFAULT_CHROME_PATHS) {
+    try {
+      await fs.access(p);
+      return p;
+    } catch {
+      // try next
+    }
+  }
+  return null;
+}
+async function getBrowser() {
+  if (browserPromise) return browserPromise;
+  browserPromise = (async () => {
+    // Mode 1: attach to a user's already-running Chrome via CDP.
+    // Start Chrome with `--remote-debugging-port=9222` and (if they want to
+    // reuse their normal profile) pass `--user-data-dir=...` to a dedicated
+    // clone. CHROME_REMOTE_URL can be either browserURL (http://host:port)
+    // or a browserWSEndpoint (ws://...).
+    const remoteUrl = process.env.CHROME_REMOTE_URL;
+    if (remoteUrl) {
+      const opts = remoteUrl.startsWith('ws')
+        ? { browserWSEndpoint: remoteUrl }
+        : { browserURL: remoteUrl };
+      const browser = await puppeteer.connect({
+        ...opts,
+        defaultViewport: null,
+      });
+      connectedToExisting = true;
+      return browser;
+    }
+    // Mode 2: launch our own Chrome. Headless by default; set
+    // CHROME_HEADLESS=0 to run visible (useful for sites that aggressively
+    // detect headless).
+    const executablePath = await resolveChromePath();
+    if (!executablePath) {
+      throw new Error(
+        'Chrome executable not found. Set CHROME_PATH or install Chrome/Chromium.',
+      );
+    }
+    const headlessEnv = process.env.CHROME_HEADLESS;
+    const headless = headlessEnv === '0' || headlessEnv === 'false' ? false : 'new';
+    const userDataDir = process.env.CHROME_USER_DATA_DIR || undefined;
+    const browser = await puppeteer.launch({
+      executablePath,
+      headless,
+      userDataDir,
+      args: [
+        '--no-sandbox',
+        '--disable-dev-shm-usage',
+        '--disable-blink-features=AutomationControlled',
+        '--disable-features=IsolateOrigins,site-per-process',
+      ],
+    });
+    return browser;
+  })();
+  return browserPromise;
+}
+async function applyStealth(page) {
+  // Minimal stealth: mask the navigator.webdriver flag and add common
+  // properties that headless Chrome misses. This won't defeat enterprise
+  // bot mitigation, but clears the trivial checks many WAFs rely on.
+  await page.evaluateOnNewDocument(() => {
+    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
+    // languages / plugins
+    Object.defineProperty(navigator, 'languages', { get: () => ['nl-NL', 'nl', 'en-US', 'en'] });
+    Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
+    // chrome.runtime stub
+    window.chrome = window.chrome || { runtime: {} };
+    // permissions query patch (Notification)
+    const originalQuery = window.navigator.permissions && window.navigator.permissions.query;
+    if (originalQuery) {
+      window.navigator.permissions.query = (parameters) =>
+        parameters.name === 'notifications'
+          ? Promise.resolve({ state: Notification.permission })
+          : originalQuery(parameters);
+    }
+  });
+}
+export async function chromeFetch(url, timeoutMs = DEFAULT_TIMEOUT_MS) {
+  const empty = { body: null, statusCode: null, headers: {}, finalUrl: null };
+  let page;
+  try {
+    const browser = await getBrowser();
+    page = await browser.newPage();
+    // When attached to a user's Chrome, leave UA/headers/stealth alone —
+    // their real profile already looks like a human. Only shape the
+    // request when we launched Chrome ourselves.
+    if (!connectedToExisting) {
+      await page.setViewport({ width: 1366, height: 768 });
+      await page.setUserAgent(
+        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
+      );
+      await page.setExtraHTTPHeaders({
+        'Accept-Language': 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7',
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
+        'Upgrade-Insecure-Requests': '1',
+        'Sec-Fetch-Dest': 'document',
+        'Sec-Fetch-Mode': 'navigate',
+        'Sec-Fetch-Site': 'none',
+        'Sec-Fetch-User': '?1',
+      });
+      await applyStealth(page);
+    }
+    const response = await page.goto(url, {
+      waitUntil: WAIT_UNTIL,
+      timeout: timeoutMs,
+    });
+    if (!response) return empty;
+    let statusCode = response.status();
+    let headers = response.headers() || {};
+    // Some WAFs (Cloudflare) serve a 403 interstitial, then JS solves
+    // the challenge and navigates to real content. Give it a brief window
+    // to settle and re-read the final status from the live document.
+    if (statusCode === 403 || statusCode === 503) {
+      try {
+        await page.waitForFunction(
+          () => {
+            const html = document.documentElement ? document.documentElement.outerHTML : '';
+            // Cloudflare challenge markers
+            return !/cf-challenge|cf-browser-verification|Just a moment/i.test(html);
+          },
+          { timeout: 8000 },
+        );
+        // Re-evaluate: if navigation happened, fetch the new main response.
+        const finalResp = page.mainFrame().url() !== url
+          ? await page.waitForResponse(() => true, { timeout: 2000 }).catch(() => null)
+          : null;
+        if (finalResp) {
+          statusCode = finalResp.status();
+          headers = finalResp.headers() || headers;
+        }
+      } catch {
+        // challenge didn't clear — keep the 403/503 so caller can decide.
+      }
+    }
+    const finalUrl = page.url();
+    let body = null;
+    try {
+      body = await page.content();
+    } catch {
+      body = null;
+    }
+    return { body, statusCode, headers, finalUrl };
+  } catch (err) {
+    return { ...empty, error: err.message };
+  } finally {
+    if (page) {
+      try { await page.close(); } catch { /* ignore */ }
+    }
+  }
+}
+export async function closeBrowser() {
+  if (!browserPromise) return;
+  try {
+    const browser = await browserPromise;
+    if (connectedToExisting) {
+      await browser.disconnect();
+    } else {
+      await browser.close();
+    }
+  } catch { /* ignore */ }
+  browserPromise = null;
+  connectedToExisting = false;
+}
+// Close on process exit so the Chrome process doesn't linger.
+const shutdown = () => { closeBrowser().catch(() => {}); };
+process.once('exit', shutdown);
+process.once('SIGINT', () => { shutdown(); process.exit(130); });
+process.once('SIGTERM', () => { shutdown(); process.exit(143); });
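
The environment handling inside `getBrowser` can be isolated into a pure function, which makes the precedence easy to see: `CHROME_REMOTE_URL` wins and selects attach mode, otherwise the launch options are derived from `CHROME_HEADLESS` and `CHROME_USER_DATA_DIR`. This is a sketch that mirrors the logic in the added file; `resolveChromeConfig` is a hypothetical name and does not exist in the package.

```javascript
// Hypothetical helper mirroring the env handling in getBrowser().
function resolveChromeConfig(env) {
  const remoteUrl = env.CHROME_REMOTE_URL;
  if (remoteUrl) {
    // ws:// and wss:// endpoints are treated as browserWSEndpoint,
    // anything else (e.g. http://host:9222) as browserURL.
    const target = remoteUrl.startsWith('ws')
      ? { browserWSEndpoint: remoteUrl }
      : { browserURL: remoteUrl };
    return { mode: 'connect', options: { ...target, defaultViewport: null } };
  }
  const flag = env.CHROME_HEADLESS;
  // Only the exact strings "0" and "false" switch headless off.
  const headless = flag === '0' || flag === 'false' ? false : 'new';
  return {
    mode: 'launch',
    options: { headless, userDataDir: env.CHROME_USER_DATA_DIR || undefined },
  };
}

console.log(resolveChromeConfig({ CHROME_REMOTE_URL: 'http://127.0.0.1:9222' }));
console.log(resolveChromeConfig({ CHROME_HEADLESS: '0' }));
```

One consequence of the exact-match check is that values such as `FALSE` or `no` leave Chrome headless, so the README's "`0` or `false`" wording should be taken literally.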

package/src/geo-checker.js CHANGED

@@ -9,6 +9,19 @@ import http from 'node:http';
 import https from 'node:https';
 import { URL } from 'node:url';
 import * as cheerio from 'cheerio';
+import { chromeFetch } from './chrome-fetcher.js';
+// Status codes that indicate the server is refusing or stalling a bot-shaped
+// request rather than serving real content. 202 (Amazon) and 400 (Douglas)
+// sit here because in practice those are only returned to non-browser UAs.
+const BOT_BLOCK_STATUS = new Set([202, 400, 401, 403, 407, 429, 503]);
+function looksBotBlocked(res) {
+  if (!res) return true;
+  if (res.statusCode == null) return true;
+  if (BOT_BLOCK_STATUS.has(res.statusCode)) return true;
+  if (res.statusCode >= 200 && res.statusCode < 300 && !res.body) return true;
+  return false;
+}
 // ---------------------------------------------------------------------------
 // Constants
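
To see what `looksBotBlocked` classifies as blocked, it helps to run it on a few response shapes. The function and status set below are copied from the hunk above; the sample responses are invented for illustration. Note that the set includes 202 and 400, which the README's documented list (401/403/407/429/503) does not mention.

```javascript
const BOT_BLOCK_STATUS = new Set([202, 400, 401, 403, 407, 429, 503]);

function looksBotBlocked(res) {
  if (!res) return true;
  if (res.statusCode == null) return true;
  if (BOT_BLOCK_STATUS.has(res.statusCode)) return true;
  if (res.statusCode >= 200 && res.statusCode < 300 && !res.body) return true;
  return false;
}

// Illustrative responses (not taken from the package or its tests).
const samples = [
  null,                                        // fetch failed entirely
  { statusCode: 200, body: '<html>ok</html>' }, // normal page
  { statusCode: 200, body: '' },               // empty 2xx
  { statusCode: 202, body: 'queued' },         // in the set, absent from the README list
  { statusCode: 404, body: 'not found' },      // an ordinary error, not a bot block
];

for (const res of samples) {
  console.log(res ? res.statusCode : 'null', looksBotBlocked(res));
}
```

A plain 404 or 500 is not escalated to Chrome, which keeps `auto` mode from retrying pages that are genuinely missing or broken.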
@@ -750,8 +763,11 @@ function detectPageType($, schemaTypes, pathname) {
   if (['Article', 'NewsArticle', 'BlogPosting', 'TechArticle'].some((t) => schemaTypes.has(t))) return 'article';
   if (['LocalBusiness', 'Restaurant', 'Store'].some((t) => schemaTypes.has(t))) return 'local-business';
-  // Heuristic: homepage detection
-  if (pathname === '/' || pathname === '/index.html' || pathname === '/index.php' || pathname === '') return 'homepage';
+  // Heuristic: homepage detection (including language/locale-prefixed homepages like /en/, /de-DE/, /nl/)
+  // Strip a leading language or locale segment before checking so multilingual
+  // sites hosting their homepage at /en/ or /nl-NL/ are not treated as generic.
+  const normalizedPath = pathname.replace(/^\/[a-z]{2}(?:[-_][a-z]{2,3})?\/?$/i, '/');
+  if (normalizedPath === '/' || normalizedPath === '/index.html' || normalizedPath === '/index.php' || normalizedPath === '') return 'homepage';
   // Heuristic: FAQ page via DOM
   const faqIndicators = $('[class*="faq"], [id*="faq"], details, [class*="accordion"]');
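
The locale-stripping regex in this hunk is easiest to judge against concrete paths. The sketch below reuses the exact pattern from the diff; `normalizeHomepagePath` is a hypothetical wrapper name and the sample paths are illustrative.

```javascript
// Pattern copied from the hunk above.
const LOCALE_PREFIX = /^\/[a-z]{2}(?:[-_][a-z]{2,3})?\/?$/i;

// Hypothetical wrapper around the replace() call in detectPageType().
function normalizeHomepagePath(pathname) {
  return pathname.replace(LOCALE_PREFIX, '/');
}

for (const p of ['/en/', '/de-DE/', '/nl_NL', '/en/about', '/about', '/go/', '/zh-Hans/']) {
  console.log(p, '->', normalizeHomepagePath(p));
}
```

Two edge cases follow from the pattern itself: any two-letter top-level section such as `/go/` is also treated as a homepage, and locale tags with a four-letter script subtag such as `/zh-Hans/` are left unchanged.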
@@ -1439,7 +1455,7 @@ function checkAccessibility($) {
   const unlabeledInputList = [];
   inputs.each((_, el) => {
     const id = $(el).attr('id');
-    const hasLabel = id && $(`label[for="${id}"]`).length > 0;
+    const hasLabel = id && $(`label[for="${id.replace(/(["\\])/g, '\\$1')}"]`).length > 0;
     const hasAriaLabel = $(el).attr('aria-label') || $(el).attr('aria-labelledby');
     const wrappedInLabel = $(el).closest('label').length > 0;
     const hasPlaceholder = $(el).attr('placeholder');
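
The change to `hasLabel` escapes double quotes and backslashes in the `id` before it is interpolated into the attribute selector, so an unusual `id` can no longer terminate the quoted string early. A minimal sketch of that escaping step on its own, where `escapeForAttrSelector` and `labelSelectorFor` are hypothetical names and the replace expression is the one from the diff:

```javascript
// Same replace() expression as in the hunk above.
function escapeForAttrSelector(id) {
  return id.replace(/(["\\])/g, '\\$1');
}

// Hypothetical helper that builds the selector the check passes to cheerio.
function labelSelectorFor(id) {
  return `label[for="${escapeForAttrSelector(id)}"]`;
}

console.log(labelSelectorFor('email'));
console.log(labelSelectorFor('weird"id'));
console.log(labelSelectorFor('back\\slash'));
```

Only `"` and `\` are escaped, which is enough for a double-quoted attribute value; other characters need no escaping inside the quotes.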
@@ -1885,6 +1901,63 @@ function checkMachineReadability($, robotsTxtData, llmsTxtData, responseHeaders)
   return { checks, score: maxScore > 0 ? Math.round((score / maxScore) * 100) : 0, category: 'Machine Readability' };
 }
+// ---------------------------------------------------------------------------
+// Trust signal evidence extractor
+// ---------------------------------------------------------------------------
+/**
+ * Extract raw nav/header/footer links plus language signals. Hardcoded pattern
+ * lists cannot keep up with ~100 languages and typos; instead we surface the
+ * raw anchor text + href so the calling LLM (or downstream consumer) can
+ * classify trust signals (about / contact / legal / imprint / cookies)
+ * semantically in whatever language the site uses.
+ *
+ * @param {cheerio.CheerioAPI} $
+ * @returns {{
+ *   htmlLang: string|null,
+ *   hreflangs: string[],
+ *   navLinks: Array<{href: string, text: string, rel: string|null}>,
+ *   footerLinks: Array<{href: string, text: string, rel: string|null}>,
+ * }}
+ */
+function extractTrustSignals($) {
+  const PER_LOCATION_LIMIT = 80;
+  const MAX_TEXT_LEN = 120;
+  function collect(selector) {
+    const out = [];
+    const seen = new Set();
+    $(selector).find('a[href]').each((_, el) => {
+      if (out.length >= PER_LOCATION_LIMIT) return false;
+      const $el = $(el);
+      const href = ($el.attr('href') || '').trim();
+      if (!href || href.startsWith('#') || href.toLowerCase().startsWith('javascript:')) return;
+      const text = $el.text().trim().replace(/\s+/g, ' ').slice(0, MAX_TEXT_LEN);
+      const key = `${href}|${text}`;
+      if (seen.has(key)) return;
+      seen.add(key);
+      out.push({ href, text, rel: $el.attr('rel') || null });
+    });
+    return out;
+  }
+  const navLinks = collect('header, nav, [role="navigation"], [class*="menu" i], [class*="navigation" i], [id*="menu" i], [id*="nav" i]');
+  const footerLinks = collect('footer, [role="contentinfo"], [class*="footer" i], [id*="footer" i]');
+  const hreflangs = [];
+  $('link[rel="alternate"][hreflang]').each((_, el) => {
+    const hl = $(el).attr('hreflang');
+    if (hl) hreflangs.push(hl);
+  });
+  return {
+    htmlLang: $('html').attr('lang') || null,
+    hreflangs,
+    navLinks,
+    footerLinks,
+  };
+}
 // ---------------------------------------------------------------------------
 // CHECK CATEGORY 7: Entity & Authority
 // ---------------------------------------------------------------------------
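
The filtering inside `collect` (skip fragment and `javascript:` links, normalize whitespace, deduplicate on `href|text`, cap the list) can be tried without cheerio by feeding it plain objects. This is a sketch under that assumption: `collectLinks` is a hypothetical name, and the input array stands in for the anchors cheerio would yield.

```javascript
const PER_LOCATION_LIMIT = 80;
const MAX_TEXT_LEN = 120;

// Hypothetical cheerio-free version of collect(): takes {href, text, rel} objects.
function collectLinks(anchors) {
  const out = [];
  const seen = new Set();
  for (const a of anchors) {
    if (out.length >= PER_LOCATION_LIMIT) break;
    const href = (a.href || '').trim();
    // Skip empty, in-page fragment, and javascript: pseudo-links.
    if (!href || href.startsWith('#') || href.toLowerCase().startsWith('javascript:')) continue;
    const text = (a.text || '').trim().replace(/\s+/g, ' ').slice(0, MAX_TEXT_LEN);
    const key = `${href}|${text}`;
    if (seen.has(key)) continue; // same target and same label: keep one
    seen.add(key);
    out.push({ href, text, rel: a.rel || null });
  }
  return out;
}

console.log(collectLinks([
  { href: '/impressum', text: '  Impressum \n ' },
  { href: '/impressum', text: 'Impressum' },
  { href: '#top', text: 'Back to top' },
  { href: 'JavaScript:void(0)', text: 'Menu' },
  { href: '/datenschutz', text: 'Datenschutz', rel: 'nofollow' },
]));
```

Because the key combines `href` and `text`, the same URL with two different labels is kept twice, which preserves the anchor text a downstream classifier would use.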
@@ -2464,20 +2537,133 @@ function checkEntity($, jsonLdData) {
     checks.push({ status: 'info', label: 'No About/Contact page links detected', detail: 'Link to organizational info for E-E-A-T' });
   }
-  // Privacy / Terms links (trust signals)
-  const privacyPatterns = ['privacy', 'datenschutz', 'privacidad', 'privacidade', 'confidentialite', 'riservatezza', 'privacybeleid', 'integritet', 'gizlilik'];
-  const termsPatterns = ['terms', 'voorwaarden', 'agb', 'condiciones', 'termos', 'conditions-generales', 'condizioni', 'villkor', 'regulamin', 'kosullar'];
-  const privacySelector = privacyPatterns.map((p) => `a[href*="${p}"]`).join(', ');
-  const termsSelector = termsPatterns.map((p) => `a[href*="${p}"]`).join(', ');
-  const privacyLink = $(privacySelector);
-  const termsLink = $(termsSelector);
+  // Privacy / Terms / Imprint / Cookies links (trust signals, multi-language)
+  // Hardcoded patterns are a fallback heuristic; the extractTrustSignals
+  // evidence payload on the analysis result lets LLM callers reclassify
+  // semantically in any language.
+  const privacyPatterns = [
+    // English
+    'privacy', 'privacy-policy',
+    // Latin-alphabet European languages
+    'datenschutz', 'privatsphaere', 'privatsphare',
+    'privacidad', 'politica-de-privacidad',
+    'privacidade', 'politica-de-privacidade',
+    'confidentialite', 'politique-de-confidentialite', 'vie-privee',
+    'riservatezza', 'privacy-italia',
+    'privacybeleid', 'privacyverklaring',
+    'integritet', 'integritetspolicy',
+    'personvern',
+    'tietosuoja', 'yksityisyys',
+    'persondata', 'fortrolighed',
+    'adatvedelem',
+    'prywatnosc', 'polityka-prywatnosci',
+    'soukromi', 'ochrana-osobnich-udaju',
+    'ochrana-osobnych-udajov',
+    'confidentialitate',
+    'poverljivost', 'privatnost',
+    'zasebnost',
+    'privatesia', 'privatnost-hr',
+    'konfidentsialnost', 'privatnost-ba',
+    'gizlilik',
+    'privatumas', 'privatuma',
+    'yasslilik',
+    // Romanized non-Latin
+    'konfidentsialnost', 'konfidentsialnost-ua', 'konfidentsialnist',
+    'idiotikotita', 'aporrito', 'prostasia-dedomenon',
+    'puraibashi', 'puraibasi-porisi',
+    'geinsajeongbobo', 'gaeinjeongbo',
+    'yinsi', 'yinsi-zhengce',
+    'khasusiyat', 'khososi',
+    'harimiyat',
+    'niji-gopaniyata', 'gopaniyata',
+    'gopniyata',
+    'kerahasiaan', 'privasi',
+    'quyen-rieng-tu', 'bao-mat',
+    'khwam-pen-suanto', 'nayobai-khwampensuntu',
+  ];
+  const termsPatterns = [
+    // English
+    'terms', 'terms-of-service', 'terms-of-use', 'terms-conditions', 'tos',
+    // Latin-alphabet European languages
+    'agb', 'nutzungsbedingungen', 'geschaeftsbedingungen',
+    'condiciones', 'terminos', 'terminos-y-condiciones', 'condiciones-de-uso',
+    'termos', 'termos-de-uso', 'termos-de-servico',
+    'conditions-generales', 'cgu', 'cgv', 'mentions-contrat',
+    'condizioni', 'termini', 'termini-e-condizioni',
+    'voorwaarden', 'algemene-voorwaarden', 'gebruiksvoorwaarden',
+    'villkor', 'anvandarvillkor', 'allmanna-villkor',
+    'brukervilkar', 'vilkar',
+    'kayttoehdot', 'ehdot',
+    'betingelser', 'vilkaar', 'handelsbetingelser',
+    'szerzodesi-feltetelek', 'felhasznalasi-feltetelek',
+    'regulamin', 'warunki',
+    'podminky', 'vseobecne-obchodni-podminky', 'obchodni-podminky',
+    'obchodne-podmienky',
+    'termeni-si-conditii', 'termeni',
+    'uslovi', 'uvjeti', 'pogoji',
+    'kosullar', 'kullanim-kosullari',
+    'salygos', 'naudojimo-salygos',
+    'noteikumi',
+    'kasutustingimused',
+    // Romanized non-Latin
+    'usloviya', 'usloviya-ispolzovaniya', 'pravila',
+    'umovy', 'pravyla',
+    'oroi', 'oroi-xrisis',
+    'riyoukiyaku', 'riyou-kiyaku', 'kiyaku',
+    'iyong-yakgwan', 'yakgwan',
+    'tiaokuan', 'fuwu-tiaokuan', 'shiyong-tiaokuan',
+    'shuruth', 'shuroot-alistikhdam',
+    'sharayit-estefadeh', 'sharayet',
+    'niyam-shartein', 'shartein',
+    'sharth-o',
+    'ketentuan', 'syarat-ketentuan',
+    'dieu-khoan', 'dieu-khoan-su-dung',
+    'khoapkamnot', 'ngeuankhai-kan-chai',
+  ];
+  const imprintPatterns = [
+    // Legally required in DE/AT/CH, common across DACH + EU
+    'impressum', 'imprint', 'mentions-legales', 'aviso-legal',
+    'note-legali', 'colofon', 'colophon', 'wettelijke-vermelding',
+    'juridisk-information', 'oikeudellinen-huomautus',
+    'aviso-legal-pt', 'noticia-legal',
+    'pravni-udaje', 'pravne-informacie',
+    'yasal-bildirim', 'yasal-uyari',
+    'informacje-prawne',
+    'hukuki-bilgiler',
+    'impresum',
+  ];
+  const cookiePatterns = [
+    'cookie', 'cookies', 'cookiebeleid', 'cookie-policy',
+    'politique-cookies', 'politica-cookies', 'politica-de-cookies',
+    'cookierichtlinie', 'cookie-einstellungen',
+    'kekse', 'cookie-instellingen',
+    'soubory-cookie', 'sukromie-cookie',
+    'cerezler', 'gizlilik-cerezler',
+    'pliki-cookie',
+    'fichiers-cookie',
+    'kukit',
+  ];
+  const buildSelector = (patterns) => patterns.map((p) => `a[href*="${p}" i]`).join(', ');
+  const privacyLink = $(buildSelector(privacyPatterns));
+  const termsLink = $(buildSelector(termsPatterns));
+  const imprintLink = $(buildSelector(imprintPatterns));
+  const cookieLink = $(buildSelector(cookiePatterns));
   maxScore += 5;
-  if (privacyLink.length > 0 || termsLink.length > 0) {
+  const legalSignals = [];
+  if (privacyLink.length > 0) legalSignals.push('privacy');
+  if (termsLink.length > 0) legalSignals.push('terms');
+  if (imprintLink.length > 0) legalSignals.push('imprint');
+  if (cookieLink.length > 0) legalSignals.push('cookies');
+  if (legalSignals.length >= 2) {
     score += 5;
-    checks.push({ status: 'pass', label: 'Legal pages linked', detail: `Privacy: ${privacyLink.length > 0 ? 'yes' : 'no'}, Terms: ${termsLink.length > 0 ? 'yes' : 'no'}` });
+    checks.push({ status: 'pass', label: `Legal pages linked (${legalSignals.length})`, detail: `Detected: ${legalSignals.join(', ')}` });
+  } else if (legalSignals.length === 1) {
+    score += 3;
+    checks.push({ status: 'warn', label: `Only one legal page linked (${legalSignals[0]})`, detail: 'Add the others (privacy, terms, imprint, cookies) for full trust signals. Heuristic may miss non-Latin scripts — check evidence payload.' });
   } else {
-    checks.push({ status: 'info', label: 'No privacy/terms links detected', detail: null });
+    checks.push({ status: 'info', label: 'No legal page links detected by heuristic', detail: 'If the site is non-English, verify via the footerLinks evidence payload before treating as missing.' });
   }
   // E-E-A-T Experience Signals (10 pts)
@@ -2539,7 +2725,7 @@ function checkEntity($, jsonLdData) {
   const hasPhone = /(\+?\d{1,3}[-.\s]?)?\(?\d{2,4}\)?[-.\s]?\d{3,4}[-.\s]?\d{3,4}/.test(bodyText);
   const hasEmail = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b/i.test(bodyText);
   const hasAddress = $('[itemprop="address"], [class*="address"], address').length > 0;
-  const hasContactPage = $('a[href*="contact"]').length > 0;
+  const hasContactPage = contactLink.length > 0;
   const contactSignals = (hasPhone ? 1 : 0) + (hasEmail ? 1 : 0) + (hasAddress ? 1 : 0) + (hasContactPage ? 1 : 0);
   maxScore += 5;
   if (contactSignals >= 3) {
@@ -2985,7 +3171,7 @@ function checkWebMCP($, pageType, ucpData) {
         const name = input.attr('name');
         const type = input.attr('type');
         const id = input.attr('id');
-        const label = id ? $(`label[for="${id}"]`).length > 0 : false;
+        const label = id ? $(`label[for="${id.replace(/(["\\])/g, '\\$1')}"]`).length > 0 : false;
         const ariaLabel = input.attr('aria-label');
         const placeholder = input.attr('placeholder');
@@ -4243,6 +4429,16 @@ function analyseHTML(html, domain, robotsTxtData, llmsTxtData, responseHeaders,
     headings: { h1: [], h2: [] },
     lang: null,
     hasStructuredData: false,
+    // Raw evidence for language-agnostic trust signal classification.
+    // Populated by extractTrustSignals; consumers running inside an LLM can
+    // reclassify legal / about / contact / imprint / cookies semantically
+    // instead of relying on the heuristic pattern lists.
+    evidence: {
+      htmlLang: null,
+      hreflangs: [],
+      navLinks: [],
+      footerLinks: [],
+    },
   };
   if (!html) return result;
@@ -4265,6 +4461,9 @@ function analyseHTML(html, domain, robotsTxtData, llmsTxtData, responseHeaders,
   const pageType = detectPageType($, schemaTypes, pathname);
   result.pageType = pageType;
+  // Extract language-agnostic trust signal evidence
+  result.evidence = extractTrustSignals($);
   // Populate basic metadata fields (backward-compatible with old analyseHTML)
   result.title = $('title').first().text().trim() || null;
   result.lang = $('html').attr('lang') || null;
@@ -4391,6 +4590,7 @@ function analyseHTML(html, domain, robotsTxtData, llmsTxtData, responseHeaders,
 async function checkGEO(domain, options = {}) {
   const maxPages = options.maxPages ?? MAX_PAGES_PER_DOMAIN;
   const skipCache = options.skipCache ?? false;
+  const renderMode = options.renderMode ?? 'auto'; // 'static' | 'chrome' | 'auto'
   // Check cache first (unless explicitly skipped)
   if (!skipCache) {
@@ -4500,7 +4700,9 @@ async function checkGEO(domain, options = {}) {
     [robotsRes, llmsRes, homepageRes, sitemapRes, ucpRes] = await Promise.all([
       throttledFetchUrl(robotsUrl, FETCH_TIMEOUT_MS, MAX_TEXT_BODY_SIZE).catch(() => ({ body: null, statusCode: null, headers: {} })),
       throttledFetchUrl(llmsUrl, FETCH_TIMEOUT_MS, MAX_TEXT_BODY_SIZE).catch(() => ({ body: null, statusCode: null, headers: {} })),
-      throttledFetchUrl(homepageUrl).catch(() => ({ body: null, statusCode: null, headers: {} })),
+      renderMode === 'chrome'
+        ? chromeFetch(homepageUrl).catch(() => ({ body: null, statusCode: null, headers: {} }))
+        : throttledFetchUrl(homepageUrl).catch(() => ({ body: null, statusCode: null, headers: {} })),
       throttledFetchUrl(sitemapUrl, FETCH_TIMEOUT_MS, MAX_TEXT_BODY_SIZE).catch(() => ({ body: null, statusCode: null, headers: {} })),
       throttledFetchUrl(ucpUrl, FETCH_TIMEOUT_MS, MAX_TEXT_BODY_SIZE).catch(() => ({ body: null, statusCode: null, headers: {} })),
     ]);
@@ -4509,6 +4711,31 @@ async function checkGEO(domain, options = {}) {
     return output;
   }
+  // Auto fallback: if static fetch couldn't get the homepage (bot block,
+  // WAF, or network error), retry via headless Chrome. Record that we
+  // rendered via Chrome so downstream multi-page crawl uses it too.
+  let useChromeForCrawl = renderMode === 'chrome';
+  if (renderMode === 'auto' && looksBotBlocked(homepageRes)) {
+    const chromeRes = await chromeFetch(homepageUrl).catch(() => null);
+    const chromeOk =
+      chromeRes &&
+      typeof chromeRes.statusCode === 'number' &&
+      chromeRes.statusCode >= 200 &&
+      chromeRes.statusCode < 300 &&
+      chromeRes.body;
+    if (chromeOk) {
+      homepageRes = chromeRes;
+      useChromeForCrawl = true;
+      output.renderMode = 'chrome-fallback';
+    } else {
+      output.renderMode = chromeRes && chromeRes.statusCode
+        ? `chrome-blocked-${chromeRes.statusCode}`
+        : 'static-blocked';
+    }
+  } else {
+    output.renderMode = renderMode === 'chrome' ? 'chrome' : 'static';
+  }
   // --- robots.txt ---
   try {
     if (robotsRes.statusCode === 200 && robotsRes.body) {
@@ -4547,7 +4774,12 @@ async function checkGEO(domain, options = {}) {
   // --- Homepage (full 16-category analysis) ---
   try {
     output.homepage.statusCode = homepageRes.statusCode;
-    if (homepageRes.statusCode === 200 && homepageRes.body) {
+    // Accept any 2xx that came back with a body. In practice Chrome often
+    // surfaces 202 (Amazon) or 206 responses that still carry the rendered
+    // document; analysing those is strictly better than dropping the score.
+    const homepageUsable = homepageRes.statusCode >= 200 &&
+      homepageRes.statusCode < 300 && !!homepageRes.body;
+    if (homepageUsable) {
       output.homepage.analysis = analyseHTML(
         homepageRes.body,
         cleanDomain,
@@ -4633,14 +4865,18 @@ async function checkGEO(domain, options = {}) {
       error: output.homepage.error,
     });
+    // Chrome fetches are serial (one tab at a time), static fetches run in batches.
+    const concurrency = useChromeForCrawl ? 1 : MAX_CONCURRENT_PAGE_FETCHES;
     // Fetch remaining pages in controlled batches
-    for (let i = 0; i < pagesToCrawl.length; i += MAX_CONCURRENT_PAGE_FETCHES) {
-      const batch = pagesToCrawl.slice(i, i + MAX_CONCURRENT_PAGE_FETCHES);
+    for (let i = 0; i < pagesToCrawl.length; i += concurrency) {
+      const batch = pagesToCrawl.slice(i, i + concurrency);
       const batchResults = await Promise.all(
         batch.map(async (pageUrl) => {
           try {
-            const res = await throttledFetchUrl(pageUrl, PAGE_CRAWL_TIMEOUT_MS);
-            if (res.statusCode === 200 && res.body) {
+            const res = useChromeForCrawl
+              ? await chromeFetch(pageUrl, PAGE_CRAWL_TIMEOUT_MS)
+              : await throttledFetchUrl(pageUrl, PAGE_CRAWL_TIMEOUT_MS);
+            if (res.statusCode >= 200 && res.statusCode < 300 && res.body) {
               // Determine pathname for page type detection
               let pathname = '/';
               try { pathname = new URL(pageUrl).pathname; } catch {}
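
The legal-link hunk in `checkEntity` above replaces the old "privacy or terms" check with a tiered score over four signal types. The block below is a minimal sketch of that tiering, not the package's code: it works on a plain array of href strings instead of a Cheerio selection, the pattern lists are heavily trimmed, and the names `PATTERNS`, `detectLegalSignals` and `scoreLegalSignals` are hypothetical.

```javascript
// Hypothetical, trimmed-down pattern lists; the lists in the diff are much longer.
const PATTERNS = {
  privacy: ['privacy', 'datenschutz', 'confidentialite'],
  terms: ['terms', 'agb', 'conditions-generales'],
  imprint: ['impressum', 'imprint', 'mentions-legales'],
  cookies: ['cookie'],
};

// Case-insensitive substring match on each href, approximating the
// `a[href*="..." i]` attribute selector built by buildSelector in the diff.
function detectLegalSignals(hrefs) {
  const lowered = hrefs.map((h) => String(h).toLowerCase());
  return Object.keys(PATTERNS).filter((kind) =>
    PATTERNS[kind].some((p) => lowered.some((h) => h.includes(p)))
  );
}

// Tiering as in the hunk: two or more signal types earn the full 5 points,
// exactly one earns 3 points with a warning, none earns 0.
function scoreLegalSignals(signals) {
  if (signals.length >= 2) return { score: 5, status: 'pass' };
  if (signals.length === 1) return { score: 3, status: 'warn' };
  return { score: 0, status: 'info' };
}

const signals = detectLegalSignals(['/Datenschutz', '/Impressum']);
console.log(signals, scoreLegalSignals(signals));
```

One consequence visible in the diff: a page that links only a privacy policy scored the full 5 points in 0.1.0 but scores 3 in 0.2.0, because full marks now need at least two distinct signal types.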

package/src/index.js CHANGED

@@ -1,4 +1,4 @@
-#!/usr/bin/env node
+#!/usr/bin/env node
 /**
  * Glippy MCP Server
@@ -36,6 +36,27 @@ import {
   parseSitemapUrls,
   aggregatePageScores,
 } from "./geo-checker.js";
+import { chromeFetch } from "./chrome-fetcher.js";
+// Render-mode: how to fetch HTML for scoring.
+//   'static' (default for tools that don't specify) - raw Node fetch, fastest
+//   'chrome'                                        - always render via headless Chrome
+//   'auto'                                          - static first, Chrome fallback on bot-block
+//
+// Chrome modes require a local Chrome/Chromium binary. Auto-resolves from
+// CHROME_PATH / PUPPETEER_EXECUTABLE_PATH / common install locations, or
+// attaches to an already-running Chrome when CHROME_REMOTE_URL is set
+// (e.g. "http://localhost:9222" after launching Chrome with
+// --remote-debugging-port=9222).
+const RENDER_MODES = ["static", "chrome", "auto"];
+function looksBotBlockedResponse(res) {
+  if (!res) return true;
+  if (res.statusCode == null) return true;
+  if ([401, 403, 407, 429, 503].includes(res.statusCode)) return true;
+  if (res.statusCode >= 200 && res.statusCode < 300 && !res.body) return true;
+  return false;
+}
 // ---------------------------------------------------------------------------
 // License validation
@@ -1597,10 +1618,15 @@ const server = new McpServer({
 server.tool(
   "analyze_domain",
   "Run a comprehensive GEO (Generative Engine Optimization) readiness analysis on a domain. " +
-    "Checks robots.txt, llms.txt (note: llms.txt is not currently supported by major AI models — having one cannot hurt but is not a meaningful optimization), " +
-    "homepage HTML (10 scoring categories), sitemap.xml, and security headers. " +
+    "Checks robots.txt, llms.txt (note: llms.txt is not currently supported by major AI models - having one cannot hurt but is not a meaningful optimization), " +
+    "homepage HTML (16 scoring categories), sitemap.xml, and security headers. " +
     "Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. " +
-    "Use output_format='json' to get raw results that can be passed to export_report.",
+    "The response includes a 'Trust Signal Evidence' section (htmlLang, hreflangs, nav/footer link anchor text + href). " +
+    "IMPORTANT: the Entity & Authority heuristic for about/contact/legal/imprint/cookies uses a pattern fallback. " +
+    "When the site is not in English (htmlLang != 'en' or hreflangs indicate otherwise), you (the calling LLM) should " +
+    "inspect navLinks/footerLinks semantically and override the heuristic classification in your final explanation - " +
+    "the anchor text tells you what each link is regardless of URL patterns. " +
+    "Use output_format='json' to get the full raw result (includes homepage.analysis.evidence) for programmatic use.",
   {
     domain: z
       .string()
@@ -1616,6 +1642,14 @@ server.tool(
       .describe(
         "Maximum pages to crawl (1 = homepage only, up to 10 for multi-page analysis). Defaults to 10."
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch HTML. "static" (fast, plain Node fetch) is default for this tool. ' +
+        '"auto" tries static first and falls back to a local headless Chrome for bot-blocked responses (401/403/407/429/503 or empty 2xx). ' +
+        '"chrome" always renders via Chrome. Chrome modes need a local Chrome binary (CHROME_PATH) or an attached instance (CHROME_REMOTE_URL).'
+      ),
     output_format: z
       .enum(["text", "json"])
       .optional()
@@ -1624,10 +1658,11 @@ server.tool(
         '"json" returns the raw analysis result object that can be passed to export_report\'s analysis_result parameter.'
       ),
   },
-  withLicense(async ({ domain, max_pages, output_format }) => {
+  withLicense(async ({ domain, max_pages, render_mode, output_format }) => {
     try {
       const result = await checkGEO(domain, {
         maxPages: max_pages ?? 10,
+        renderMode: render_mode ?? "static",
       });
       if (result.error) {
@@ -1678,6 +1713,36 @@ server.tool(
           }
         }
         lines.push("");
+        // Trust Signal Evidence - language-agnostic raw data for LLM reclassification
+        const ev = analysis.evidence;
+        if (ev) {
+          lines.push("## Trust Signal Evidence");
+          lines.push(`htmlLang: ${ev.htmlLang || "(none)"}`);
+          if (ev.hreflangs && ev.hreflangs.length > 0) {
+            lines.push(`hreflangs: ${ev.hreflangs.slice(0, 20).join(", ")}${ev.hreflangs.length > 20 ? ` (+${ev.hreflangs.length - 20} more)` : ""}`);
+          }
+          const isNonEnglish = ev.htmlLang && !/^en(-|$)/i.test(ev.htmlLang);
+          if (isNonEnglish) {
+            lines.push(
+              `NOTE: Site is in '${ev.htmlLang}'. If the Entity & Authority category flagged missing about/contact/legal/imprint/cookies links, ` +
+              `verify against the footerLinks below - anchor text like 'Über uns', '会社概要', 'Impressum', 'Politique de confidentialité' count as trust signals ` +
+              `regardless of URL patterns. Override the heuristic if links are clearly present.`
+            );
+          }
+          const formatLink = (l) => `  - ${l.text || "(no text)"} -> ${l.href}`;
+          if (ev.footerLinks && ev.footerLinks.length > 0) {
+            lines.push(`footerLinks (${ev.footerLinks.length}):`);
+            ev.footerLinks.slice(0, 40).forEach((l) => lines.push(formatLink(l)));
+            if (ev.footerLinks.length > 40) lines.push(`  ... (+${ev.footerLinks.length - 40} more)`);
+          }
+          if (ev.navLinks && ev.navLinks.length > 0) {
+            lines.push(`navLinks (${ev.navLinks.length}):`);
+            ev.navLinks.slice(0, 30).forEach((l) => lines.push(formatLink(l)));
+            if (ev.navLinks.length > 30) lines.push(`  ... (+${ev.navLinks.length - 30} more)`);
+          }
+          lines.push("");
+        }
       }
       // robots.txt
@@ -1941,17 +2006,28 @@ server.tool(
 server.tool(
   "get_geo_summary",
   "Get a concise GEO readiness summary for a domain: overall score, grade, top 3 strengths, and top 3 issues to fix. " +
-    "Use this for a quick overview; use analyze_domain for full details.",
+    "Use this for a quick overview; use analyze_domain for full details including the Trust Signal Evidence payload " +
+    "(raw nav/footer links for LLM-driven semantic classification on non-English sites).",
   {
     domain: z
       .string()
       .describe(
         'The domain to check, e.g. "example.com". Do not include https:// prefix.'
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch the homepage. "static" (default), "auto" (static with Chrome fallback on bot-block), ' +
+        'or "chrome" (always render via local headless Chrome).'
+      ),
   },
-  withLicense(async ({ domain }) => {
+  withLicense(async ({ domain, render_mode }) => {
     try {
-      const result = await checkGEO(domain, { maxPages: 1 });
+      const result = await checkGEO(domain, {
+        maxPages: 1,
+        renderMode: render_mode ?? "static",
+      });
       if (result.error) {
         return {
@@ -1987,6 +2063,15 @@ server.tool(
       lines.push(`# GEO Summary: ${result.domain}`);
       lines.push(`Overall Score: ${analysis.overallScore}% (${grade})`);
       lines.push(`Page Type: ${analysis.pageType}`);
+      const evLang = analysis.evidence?.htmlLang;
+      if (evLang) {
+        lines.push(`Site Language: ${evLang}`);
+        if (!/^en(-|$)/i.test(evLang)) {
+          lines.push(
+            `(Non-English site - use analyze_domain for the footerLinks evidence payload to reclassify trust signals semantically.)`
+          );
+        }
+      }
       lines.push("");
       // Sort categories by score
@@ -2071,6 +2156,13 @@ server.tool(
       .describe(
         "Maximum pages to crawl per domain (1 = homepage only). Defaults to 10."
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch HTML for each domain. "static" (default), "auto" (static with Chrome fallback on bot-block), ' +
+        'or "chrome" (always render via local headless Chrome).'
+      ),
     output_format: z
       .enum(["text", "json"])
       .optional()
@@ -2082,13 +2174,14 @@ server.tool(
   withTierFeature(
     "compareDomains",
     "Domain comparison requires a Pro or Agency license.",
-    async ({ domains, max_pages, output_format }) => {
+    async ({ domains, max_pages, render_mode, output_format }) => {
     const maxPages = max_pages ?? 10;
+    const renderMode = render_mode ?? "static";
     // Run all analyses in parallel
     const results = await Promise.allSettled(
       domains.map((domain) =>
-        checkGEO(domain, { maxPages }).then((result) => ({
+        checkGEO(domain, { maxPages, renderMode }).then((result) => ({
           domain,
           result,
         }))
@@ -2240,7 +2333,7 @@ const DEFAULT_RATE_LIMIT = parseInt(process.env.GLIPPY_RATE_LIMIT, 10) || 5;
  * @param {number} domainRateLimit - Max requests/second per domain (0 = unlimited)
  * @returns {Promise<{pageResults: object[], domainMeta: Map}>}
  */
-async function analyseUrls(urls, concurrency = 3, domainRateLimit = DEFAULT_RATE_LIMIT) {
+async function analyseUrls(urls, concurrency = 3, domainRateLimit = DEFAULT_RATE_LIMIT, renderMode = "static") {
   // Group URLs by domain
   const domainMap = new Map(); // domain → [urls]
   for (const url of urls) {
@@ -2318,13 +2411,34 @@ async function analyseUrls(urls, concurrency = 3, domainRateLimit = DEFAULT_RATE
           try {
             const pathname = new URL(url).pathname;
             const meta = domainMeta.get(domain);
-            const res = await throttledFetchUrl(url, 15000);
+            let res;
+            let rendered = "static";
+            if (renderMode === "chrome") {
+              res = await chromeFetch(url, 30000);
+              rendered = "chrome";
+            } else {
+              res = await throttledFetchUrl(url, 15000);
+              if (renderMode === "auto" && looksBotBlockedResponse(res)) {
+                const chromeRes = await chromeFetch(url, 30000).catch(() => null);
+                if (
+                  chromeRes &&
+                  typeof chromeRes.statusCode === "number" &&
+                  chromeRes.statusCode >= 200 &&
+                  chromeRes.statusCode < 300 &&
+                  chromeRes.body
+                ) {
+                  res = chromeRes;
+                  rendered = "chrome-fallback";
+                }
+              }
+            }
-            if (res.statusCode !== 200 || !res.body) {
+            if (!res || res.statusCode == null || res.statusCode < 200 || res.statusCode >= 300 || !res.body) {
               return {
                 url,
                 analysis: null,
-                error: res.statusCode ? `HTTP ${res.statusCode}` : "Failed to fetch",
+                error: res && res.statusCode ? `HTTP ${res.statusCode}` : "Failed to fetch",
+                renderMode: rendered,
               };
             }
@@ -2337,7 +2451,7 @@ async function analyseUrls(urls, concurrency = 3, domainRateLimit = DEFAULT_RATE
               pathname
             );
-            return { url, analysis, error: null };
+            return { url, analysis, error: null, renderMode: rendered };
           } catch (err) {
             return { url, analysis: null, error: err.message };
           }
@@ -2453,6 +2567,13 @@ server.tool(
         "Defaults to 5 req/s (or GLIPPY_RATE_LIMIT env var). Set lower for polite crawling, higher if you control the target server. " +
         "Use 0.5 for 1 request every 2 seconds, 10 for aggressive crawling."
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch each URL. "static" (default), "auto" (static with Chrome fallback on bot-block), ' +
+        'or "chrome" (always render via local headless Chrome).'
+      ),
     output_format: z
       .enum(["text", "json", "summary"])
       .optional()
@@ -2480,7 +2601,7 @@ server.tool(
         "Recommended: 10-20 for detailed results to stay within output limits."
       ),
   },
-  withLicense(async ({ sitemap_url, max_urls, rate_limit, output_format, offset, limit }) => {
+  withLicense(async ({ sitemap_url, max_urls, rate_limit, render_mode, output_format, offset, limit }) => {
     const features = getFeatures();
     // Check if sitemap analysis is available for this tier
@@ -2555,7 +2676,7 @@ server.tool(
       // Analyse all URLs with rate limiting
       const rateLimit = rate_limit ?? DEFAULT_RATE_LIMIT;
-      const { pageResults } = await analyseUrls(urlsToAnalyse, 3, rateLimit);
+      const { pageResults } = await analyseUrls(urlsToAnalyse, 3, rateLimit, render_mode ?? "static");
       const aggregated = aggregatePageScores(pageResults);
       // Summary output mode - compact JSON with minimal page info (ideal for large sitemaps)
@@ -2665,6 +2786,13 @@ server.tool(
         "Defaults to 5 req/s (or GLIPPY_RATE_LIMIT env var). Set lower for polite crawling, higher if you control the target server. " +
         "Use 0.5 for 1 request every 2 seconds, 10 for aggressive crawling."
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch each URL. "static" (default), "auto" (static with Chrome fallback on bot-block), ' +
+        'or "chrome" (always render via local headless Chrome).'
+      ),
     output_format: z
       .enum(["text", "json", "summary"])
       .optional()
@@ -2692,7 +2820,7 @@ server.tool(
         "Recommended: 10-20 for detailed results to stay within output limits."
       ),
   },
-  withLicense(async ({ urls, rate_limit, output_format, offset, limit }) => {
+  withLicense(async ({ urls, rate_limit, render_mode, output_format, offset, limit }) => {
     const features = getFeatures();
     // Check if batch analysis is available for this tier
@@ -2721,7 +2849,7 @@ server.tool(
     try {
       const rateLimit = rate_limit ?? DEFAULT_RATE_LIMIT;
-      const { pageResults } = await analyseUrls(urls, 3, rateLimit);
+      const { pageResults } = await analyseUrls(urls, 3, rateLimit, render_mode ?? "static");
       const aggregated = aggregatePageScores(pageResults);
       // Summary output mode - compact JSON with minimal page info (ideal for large batches)
@@ -2834,6 +2962,13 @@ server.tool(
         "Maximum pages to crawl (1 = homepage only, up to 10 for multi-page analysis). Defaults to 10. " +
         "Ignored if analysis_result is provided."
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch HTML. "static" (default), "auto" (static with Chrome fallback on bot-block), ' +
+        'or "chrome" (always render via local headless Chrome). Ignored if analysis_result is provided.'
+      ),
     analysis_result: z
       .object({})
       .passthrough()
@@ -2844,7 +2979,7 @@ server.tool(
         "and export in multiple formats without redundant crawling."
       ),
   },
-  withLicense(async ({ domain, format, max_pages, analysis_result }) => {
+  withLicense(async ({ domain, format, max_pages, render_mode, analysis_result }) => {
     try {
       let result;
@@ -2866,6 +3001,7 @@ server.tool(
         // Run fresh analysis (may use cache automatically)
         result = await checkGEO(domain, {
           maxPages: max_pages ?? 10,
+          renderMode: render_mode ?? "static",
         });
       } else {
         return {
@@ -3003,11 +3139,19 @@ server.tool(
       .describe(
         "Max requests/second per domain for URL/sitemap modes. Defaults to 5. Ignored if analysis_results provided."
       ),
+    render_mode: z
+      .enum(RENDER_MODES)
+      .optional()
+      .describe(
+        'How to fetch HTML. "static" (default), "auto" (static with Chrome fallback on bot-block), ' +
+        'or "chrome" (always render via local headless Chrome). Ignored if analysis_results provided.'
+      ),
   },
   withTierFeature(
     "bulkExport",
     "Bulk report exports require a Pro or Agency license.",
-    async ({ format, domains, urls, sitemap_url, analysis_results, max_pages, max_urls, rate_limit }) => {
+    async ({ format, domains, urls, sitemap_url, analysis_results, max_pages, max_urls, rate_limit, render_mode }) => {
+    const renderMode = render_mode ?? "static";
     // Validate: exactly one input mode
     const modes = [domains, urls, sitemap_url, analysis_results].filter(Boolean).length;
     if (modes !== 1) {
@@ -3116,7 +3260,7 @@ server.tool(
         const maxPages = max_pages ?? 10;
         const results = await Promise.allSettled(
           domains.map((domain) =>
-            checkGEO(domain, { maxPages }).then((result) => ({
+            checkGEO(domain, { maxPages, renderMode }).then((result) => ({
               domain,
               result,
             }))
@@ -3173,7 +3317,7 @@ server.tool(
       // ------------------------------------------------------------------
       if (urls) {
         const rateLimit = rate_limit ?? DEFAULT_RATE_LIMIT;
-        const { pageResults } = await analyseUrls(urls, 3, rateLimit);
+        const { pageResults } = await analyseUrls(urls, 3, rateLimit, renderMode);
         const aggregated = aggregatePageScores(pageResults);
         const title = `${urls.length} URLs`;
@@ -3239,7 +3383,7 @@ server.tool(
         const urlsToAnalyse = allUrls.slice(0, max_urls ?? 50000);
         const rateLimit = rate_limit ?? DEFAULT_RATE_LIMIT;
-        const { pageResults } = await analyseUrls(urlsToAnalyse, 3, rateLimit);
+        const { pageResults } = await analyseUrls(urlsToAnalyse, 3, rateLimit, renderMode);
         const aggregated = aggregatePageScores(pageResults);
         const title = `Sitemap: ${sitemap_url} (${urlsToAnalyse.length} of ${allUrls.length} URLs)`;
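
The `render_mode` changes in this file all follow the same decision: fetch statically, and in `auto` mode retry through Chrome only when the static response looks bot-blocked and the Chrome response is a 2xx with a body. The block below sketches that flow in isolation. It assumes injected `staticFetch` and `chromeFetch` functions that resolve to `{ statusCode, body }`; those signatures and the names `isUsable` and `fetchWithRenderMode` are illustrative, not the package's API.

```javascript
// Same status heuristic as looksBotBlockedResponse in the diff:
// no response, auth/rate-limit/unavailable codes, or an empty 2xx.
function looksBotBlockedResponse(res) {
  if (!res || res.statusCode == null) return true;
  if ([401, 403, 407, 429, 503].includes(res.statusCode)) return true;
  return res.statusCode >= 200 && res.statusCode < 300 && !res.body;
}

// A response is usable when it is any 2xx that carries a body.
function isUsable(res) {
  return Boolean(res) && typeof res.statusCode === 'number' &&
    res.statusCode >= 200 && res.statusCode < 300 && Boolean(res.body);
}

// staticFetch / chromeFetch are hypothetical stand-ins injected by the caller.
async function fetchWithRenderMode(url, renderMode, { staticFetch, chromeFetch }) {
  if (renderMode === 'chrome') {
    return { res: await chromeFetch(url), rendered: 'chrome' };
  }
  let res = await staticFetch(url);
  let rendered = 'static';
  if (renderMode === 'auto' && looksBotBlockedResponse(res)) {
    // A Chrome failure is swallowed so the original static response is kept.
    const chromeRes = await chromeFetch(url).catch(() => null);
    if (isUsable(chromeRes)) {
      res = chromeRes;
      rendered = 'chrome-fallback';
    }
  }
  return { res, rendered };
}
```

Passing the fetchers in as arguments keeps the sketch runnable without a browser; a caller could exercise it with stubs such as `async () => ({ statusCode: 403, body: null })` for the static side.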