npm - glippy-mcp - Versions diffs - 0.1.0 - Mend

glippy-mcp 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md ADDED

@@ -0,0 +1,734 @@
+# Glippy GEO MCP Server
+An MCP (Model Context Protocol) server that exposes Glippy's GEO (Generative Engine Optimization) analysis capabilities as tools for AI agents.
+## Overview
+This MCP server enables AI models (Claude, GPT, etc.) to directly analyse any domain's **GEO readiness** — how well a website is prepared for AI crawlers, LLM-powered search, and agent interaction.
+It wraps the Glippy desktop app's server-side analysis engine (`geo-checker.js`) and exposes it over the standard MCP protocol via stdio transport.
+**Key features:**
+- Full 10-category GEO analysis with weighted scoring
+- robots.txt AI crawler access detection
+- llms.txt file discovery and parsing
+- Sitemap crawling and multi-page analysis
+- Domain comparison and competitive analysis
+- Export to styled Markdown or HTML reports
+- **Smart caching** — automatic deduplication of repeated analyses
+- **JSON output mode** — pass analysis results between tools to avoid re-crawling
+---
+## Table of Contents
+- [Installation](#installation)
+- [Configuration](#configuration)
+  - [License Key](#license-key)
+  - [Claude Desktop](#usage-with-claude-desktop)
+  - [Claude Code](#usage-with-claude-code)
+  - [Environment Variables](#environment-variables)
+- [Integration Guides](#integration-guides)
+- [Tools Reference](#tools-reference)
+  - [analyze_domain](#analyze_domain)
+  - [check_robots_txt](#check_robots_txt)
+  - [check_llms_txt](#check_llms_txt)
+  - [get_geo_summary](#get_geo_summary)
+  - [compare_domains](#compare_domains)
+  - [analyze_sitemap](#analyze_sitemap)
+  - [analyze_urls](#analyze_urls)
+  - [export_report](#export_report)
+  - [export_bulk_report](#export_bulk_report)
+- [GEO Scoring Categories](#geo-scoring-categories)
+- [Rate Limiting](#rate-limiting)
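+- [Output Formats](#output-formats)
+- [Caching & Efficient Workflows](#caching--efficient-workflows)
+- [Architecture](#architecture)
+- [Manual Testing](#manual-testing)
+- [Troubleshooting](#troubleshooting)
+- [AI Crawlers Detected](#ai-crawlers-detected)
+- [License](#license)
+- [Support](#support)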
+---
+## Installation
+### Via npm (recommended)
+```bash
+npm install -g glippy-mcp
+```
+### Via npx (no install needed)
+Use directly via `npx` in your MCP configuration:
+```bash
+npx -y glippy-mcp
+```
+### Requirements
+- Node.js 18.0.0 or higher
+- Valid Glippy MCP license key
+---
+## Configuration
+### License Key
+A valid Glippy MCP license key (`GLMCP-XXXX-XXXX-XXXX`) is required. Get one at [glippy.dev](https://glippy.dev).
+The server validates the key against the Glippy API on first use and caches the result for 24 hours. **Analysis runs locally on your machine**: target sites are fetched directly from your computer, and only the license check contacts the Glippy API.
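One way to implement the key check and 24-hour cache described above is sketched below. It is hypothetical, not glippy-mcp's source: the `XXXX` groups are assumed to be uppercase letters or digits, and `validateRemotely` is a placeholder for the call to the Glippy API.

```javascript
// Hypothetical sketch of license handling; not taken from glippy-mcp.
// Assumption: each XXXX group is four uppercase letters or digits.
const KEY_PATTERN = /^GLMCP-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}$/;
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, as documented

function isWellFormedKey(key) {
  return typeof key === "string" && KEY_PATTERN.test(key.trim());
}

// `validateRemotely(key)` is a placeholder for the Glippy API call and
// should resolve to true (valid) or false (invalid or expired).
function createLicenseChecker(validateRemotely, now = () => Date.now()) {
  let cached = null; // { key, validUntil } after a successful check

  return async function checkLicense(key) {
    if (!key) throw new Error("License error: No license key configured");
    if (!isWellFormedKey(key)) throw new Error("License validation failed");
    // Reuse a successful validation for up to 24 hours.
    if (cached && cached.key === key && now() < cached.validUntil) return true;
    const ok = await validateRemotely(key);
    if (!ok) throw new Error("License validation failed");
    cached = { key, validUntil: now() + CACHE_TTL_MS };
    return true;
  };
}
```

Injecting `now` keeps the expiry logic testable without waiting 24 hours.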
+### Usage with Claude Desktop
+Add to your `claude_desktop_config.json`:
+```json
+{
+  "mcpServers": {
+    "glippy-geo": {
+      "command": "npx",
+      "args": ["-y", "glippy-mcp"],
+      "env": {
+        "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
+      }
+    }
+  }
+}
+```
+**Config file locations:**
+- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
+- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
+- Linux: `~/.config/Claude/claude_desktop_config.json`
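A setup script could resolve these locations itself. A hypothetical sketch using Node's `path` module; the function names are invented, and `platform`, `home`, and `appData` are parameters only so the logic can be exercised without touching the real filesystem.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: where Claude Desktop reads its MCP config,
// following the locations listed in this README.
function claudeDesktopConfigPath(
  platform = process.platform,
  home = os.homedir(),
  appData = process.env.APPDATA
) {
  switch (platform) {
    case "darwin": // macOS
      return path.posix.join(home, "Library", "Application Support", "Claude", "claude_desktop_config.json");
    case "win32": // Windows: %APPDATA%\Claude\claude_desktop_config.json
      if (!appData) throw new Error("APPDATA is not set");
      return path.win32.join(appData, "Claude", "claude_desktop_config.json");
    default: // Linux and other Unix-likes
      return path.posix.join(home, ".config", "Claude", "claude_desktop_config.json");
  }
}

// Builds the "glippy-geo" server entry shown above for a given key.
function glippyServerEntry(licenseKey) {
  return {
    command: "npx",
    args: ["-y", "glippy-mcp"],
    env: { GLIPPY_LICENSE_KEY: licenseKey },
  };
}
```

A script would then read the file (if it exists), set `mcpServers["glippy-geo"]` to `glippyServerEntry(key)`, and write it back.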
+### Usage with Claude Code
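+Add the following to `.mcp.json` in your project root (for access across all projects, register the same server at user scope with `claude mcp add --scope user`, which stores it in `~/.claude.json`):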
+```json
+{
+  "mcpServers": {
+    "glippy-geo": {
+      "command": "npx",
+      "args": ["-y", "glippy-mcp"],
+      "env": {
+        "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
+      }
+    }
+  }
+}
+```
+### Environment Variables
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `GLIPPY_LICENSE_KEY` | Yes | — | Your MCP license key (`GLMCP-XXXX-XXXX-XXXX`) |
+| `GLIPPY_RATE_LIMIT` | No | `5` | Default max requests/second per domain for batch tools |
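Reading the two variables might look like this. A hypothetical sketch: it assumes `GLIPPY_RATE_LIMIT` accepts the same 0.1-100 range as the per-call `rate_limit` parameter, which the README does not state.

```javascript
// Hypothetical config loader mirroring the table above; not glippy-mcp's code.
const DEFAULT_RATE_LIMIT = 5; // requests/second per domain

function readConfig(env = process.env) {
  const licenseKey = (env.GLIPPY_LICENSE_KEY || "").trim();
  if (!licenseKey) {
    throw new Error("License error: No license key configured");
  }

  let rateLimit = DEFAULT_RATE_LIMIT;
  if (env.GLIPPY_RATE_LIMIT !== undefined && env.GLIPPY_RATE_LIMIT !== "") {
    rateLimit = Number(env.GLIPPY_RATE_LIMIT);
    // Assumption: same bounds as the per-call `rate_limit` parameter.
    if (!Number.isFinite(rateLimit) || rateLimit < 0.1 || rateLimit > 100) {
      throw new RangeError(`GLIPPY_RATE_LIMIT must be between 0.1 and 100, got "${env.GLIPPY_RATE_LIMIT}"`);
    }
  }
  return { licenseKey, rateLimit };
}

// A per-call `rate_limit` argument takes precedence over the environment default.
function effectiveRateLimit(config, callRateLimit) {
  return callRateLimit ?? config.rateLimit;
}
```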
+---
+## Integration Guides
+For detailed setup instructions across all supported environments, see the **[Integration Guide](docs/INTEGRATIONS.md)**.
+### Supported Environments
+| Environment | Support Level | Config File |
+|-------------|---------------|-------------|
+| **Claude Code** (VS Code) | Native MCP | `.mcp.json` |
+| **Claude CLI** (Terminal) | Native MCP | `.mcp.json` |
+| **Claude Desktop** | Native MCP | `claude_desktop_config.json` |
+| **Cursor IDE** | Native MCP | `.cursor/mcp.json` |
+| **Windsurf IDE** | Native MCP | `~/.codeium/windsurf/mcp_config.json` |
+| **Continue.dev** | Native MCP | `~/.continue/config.json` |
+| **ChatGPT / OpenAI** | Via bridge/API | Custom integration |
+The integration guide includes:
+- Step-by-step setup for each environment
+- Platform-specific config file locations
+- Usage examples and prompts
+- Verification and testing instructions
+- Troubleshooting tips
+---
+## Tools Reference
+### analyze_domain
+Run a comprehensive GEO readiness analysis on a domain.
+**Description:** Checks robots.txt, llms.txt, homepage HTML (10 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use `output_format="json"` to get raw results that can be passed to `export_report`.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `domain` | string | Yes | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
+| `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. |
+| `output_format` | enum | No | `"text"` (default) for human-readable report, `"json"` for raw results to pass to `export_report`. |
+**Example:**
+```
+Analyse GEO readiness for example.com
+```
+**Example (JSON output for chaining):**
+```
+analyze_domain domain="example.com" max_pages=5 output_format="json"
+# Then pass the result to export_report
+```
+**Returns:**
+- Overall GEO score (0-100) with letter grade
+- Page type detection (article, product, homepage, etc.)
+- 10 category scores with pass/fail/warn checks
+- robots.txt analysis with AI crawler access
+- llms.txt presence and content preview
+- Sitemap discovery status
+- Multi-page aggregated scores (if `max_pages > 1`)
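Because `domain` must be a bare hostname, it can help to normalise input such as `https://example.com/pricing` before calling the tool. A hypothetical sketch that also enforces the documented `max_pages` range and `output_format` values; glippy-mcp's own validation is not documented.

```javascript
// Hypothetical input normalisation for analyze_domain arguments.
function normalizeDomain(input) {
  let value = String(input).trim().toLowerCase();
  if (/^[a-z][a-z0-9+.-]*:\/\//.test(value)) {
    // "https://example.com/path" -> "example.com"
    value = new URL(value).hostname;
  } else {
    value = value.split(/[/?#]/)[0]; // drop any path, query, or fragment
  }
  value = value.replace(/\.$/, ""); // drop a trailing dot
  if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(value)) {
    throw new Error(`Not a valid domain: "${input}"`);
  }
  return value;
}

function analyzeDomainArgs({ domain, max_pages = 10, output_format = "text" }) {
  if (!Number.isInteger(max_pages) || max_pages < 1 || max_pages > 10) {
    throw new RangeError("max_pages must be an integer from 1 to 10");
  }
  if (output_format !== "text" && output_format !== "json") {
    throw new Error('output_format must be "text" or "json"');
  }
  return { domain: normalizeDomain(domain), max_pages, output_format };
}
```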
+---
+### check_robots_txt
+Check a domain's robots.txt specifically for AI crawler access rules.
+**Description:** Reports which AI crawlers (GPTBot, ClaudeBot, etc.) are blocked or allowed.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
+**Example:**
+```
+Check which AI crawlers are blocked on example.com
+```
+**Returns:**
+- robots.txt existence and URL
+- Site-wide block detection (`Disallow: /` under `User-agent: *`)
+- Per-crawler access status for:
+  - GPTBot
+  - Google-Extended
+  - CCBot
+  - anthropic-ai
+  - ClaudeBot
+  - Bytespider
+  - PerplexityBot
+  - ChatGPT-User
+  - AmazonBot
+  - cohere-ai
+- Sitemap references found in robots.txt
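The per-crawler check can be approximated as follows. This simplified sketch is not glippy-mcp's parser: it applies the RFC 9309 group-matching rule (a crawler follows the groups naming it, case-insensitively, falling back to `User-agent: *`), but only decides whether the site root is blocked, treating an exact `Disallow: /` as a block unless the same groups also contain `Allow: /` (RFC 9309 lets Allow win ties).

```javascript
// Simplified robots.txt parser: a sketch, not glippy-mcp's implementation.
function parseRobots(text) {
  const groups = []; // { agents: [lowercase tokens], rules: [{ allow, path }] }
  const sitemaps = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((key === "allow" || key === "disallow") && current) {
      if (value !== "") current.rules.push({ allow: key === "allow", path: value });
      lastWasAgent = false;
    } else if (key === "sitemap") {
      sitemaps.push(value); // Sitemap lines apply to the whole file
    } else {
      lastWasAgent = false;
    }
  }
  return { groups, sitemaps };
}

function isRootBlocked({ groups }, crawler) {
  const token = crawler.toLowerCase();
  let matching = groups.filter((g) => g.agents.includes(token));
  if (matching.length === 0) matching = groups.filter((g) => g.agents.includes("*"));
  // Combine the rules of all matching groups; keep only exact "/" rules.
  const rootRules = matching.flatMap((g) => g.rules).filter((r) => r.path === "/");
  return rootRules.some((r) => !r.allow) && !rootRules.some((r) => r.allow);
}

const AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai", "ClaudeBot",
  "Bytespider", "PerplexityBot", "ChatGPT-User", "AmazonBot", "cohere-ai"];

function crawlerAccess(robotsText) {
  const parsed = parseRobots(robotsText);
  return Object.fromEntries(
    AI_CRAWLERS.map((name) => [name, isRootBlocked(parsed, name) ? "blocked" : "allowed"])
  );
}
```

A crawler with no group of its own inherits the `*` rules, which is why a file that only blocks `/admin` for everyone reports every AI crawler as allowed.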
+---
+### check_llms_txt
+Check if a domain has an llms.txt file.
+**Description:** Checks for the emerging standard file that provides context to LLMs about a site's purpose and content.
+> **Important:** llms.txt is an emerging proposal, but it is **not currently supported or consumed** by major AI models, crawlers, or MCP clients. No mainstream LLM or AI agent reads llms.txt to inform its behaviour. Having an llms.txt file should **not be seen as a relevant optimization** for your GEO readiness — it will not meaningfully improve how AI systems discover or understand your site today. That said, it cannot hurt to have one: the file is lightweight, easy to create, and if the standard gains adoption in the future you will already be prepared.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
+**Example:**
+```
+Does example.com have an llms.txt file?
+```
+**Returns:**
+- llms.txt existence
+- Full file contents if present
+- Link to specification at https://llmstxt.org
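If the file is present, its structure as described at https://llmstxt.org (an H1 title, an optional `>` summary, then `##` sections of Markdown link lists) can be read with a small parser. A hypothetical sketch, not the tool's implementation; it ignores free-form paragraphs between the summary and the first section.

```javascript
// Minimal llms.txt reader following the llmstxt.org layout:
// "# Title", optional "> summary", then "## Section" headings with
// "- [name](url): notes" link lists. Not glippy-mcp's parser.
function parseLlmsTxt(text) {
  const result = { title: null, summary: null, sections: {} };
  let section = null;

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!result.title && line.startsWith("# ")) {
      result.title = line.slice(2).trim();
    } else if (!result.summary && !section && line.startsWith("> ")) {
      result.summary = line.slice(2).trim();
    } else if (line.startsWith("## ")) {
      section = line.slice(3).trim();
      result.sections[section] = [];
    } else if (section) {
      const link = line.match(/^[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/);
      if (link) {
        result.sections[section].push({ name: link[1], url: link[2], notes: link[3] || "" });
      }
    }
  }
  return result;
}
```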
+---
+### get_geo_summary
+Get a concise GEO readiness summary for quick assessment.
+**Description:** Returns overall score, grade, top 3 strengths, and top 3 issues to fix. Use this for a quick overview; use `analyze_domain` for full details.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
+**Example:**
+```
+Give me a quick GEO summary of example.com
+```
+**Returns:**
+- Overall score and grade
+- Page type detected
+- Top 3 strongest categories
+- Top 3 weakest categories, each with its top issue
+- Quick facts (robots.txt, llms.txt, sitemap, blocked crawlers)
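Selecting the strengths and issues is a sort over category scores. A sketch under assumptions: the `{ name, score, issues }` shape is hypothetical, and "top issue" is taken to be the first issue listed for a category.

```javascript
// Hypothetical: pick the three highest- and lowest-scoring categories.
// `categories` is assumed to be [{ name, score, issues: [string] }].
function summarize(categories, n = 3) {
  const sorted = [...categories].sort((a, b) => b.score - a.score);
  const strengths = sorted.slice(0, n).map(({ name, score }) => ({ name, score }));
  const weaknesses = sorted
    .slice(-n)
    .reverse() // weakest first
    .map(({ name, score, issues = [] }) => ({ name, score, topIssue: issues[0] ?? null }));
  return { strengths, weaknesses };
}
```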
+---
+### compare_domains
+Analyse multiple domains in parallel and compare scores.
+**Description:** Returns a comparison table with overall scores, per-category breakdowns, and a ranked summary. Useful for competitive analysis or auditing a portfolio of sites. Use `output_format="json"` to get raw results that can be passed to `export_bulk_report`.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `domains` | array[string] | Yes | List of 2-10 domains to compare, e.g. `["example.com", "competitor.com"]`. Do not include `https://` prefix. |
+| `max_pages` | integer | No | Maximum pages to crawl per domain (1-10). Default: `10`. |
+| `output_format` | enum | No | `"text"` (default) for comparison table, `"json"` for raw results to pass to `export_bulk_report`. |
+**Example:**
+```
+Compare GEO scores of example.com, competitor1.com, and competitor2.com
+```
+**Returns:**
+- Ranked list of domains by score
+- Category comparison table (all 10 categories)
+- Quick facts comparison (robots.txt, llms.txt, sitemap, blocked crawlers)
+- Error details for any failed analyses
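`Promise.allSettled` fits the "in parallel, report failures separately" behaviour: one slow or failing domain does not abort the others. A hypothetical sketch; `analyze` stands in for the real per-domain analysis and is assumed to resolve to an object with an `overallScore`.

```javascript
// Hypothetical sketch: analyse domains in parallel, rank successes by score,
// and collect failures separately. Not glippy-mcp's implementation.
async function compareDomains(domains, analyze) {
  if (!Array.isArray(domains) || domains.length < 2 || domains.length > 10) {
    throw new RangeError("compare_domains expects 2-10 domains");
  }
  const settled = await Promise.allSettled(domains.map((d) => analyze(d)));
  return rankResults(domains, settled);
}

function rankResults(domains, settled) {
  const ranked = [];
  const errors = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      ranked.push({ domain: domains[i], score: outcome.value.overallScore });
    } else {
      errors.push({ domain: domains[i], error: String(outcome.reason?.message ?? outcome.reason) });
    }
  });
  ranked.sort((a, b) => b.score - a.score); // highest score first
  ranked.forEach((r, i) => { r.rank = i + 1; });
  return { ranked, errors };
}
```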
+---
+### analyze_sitemap
+Fetch a sitemap and analyse all discovered pages.
+**Description:** Fetches a sitemap XML (or sitemap index), extracts page URLs, and runs GEO analysis on each page. Returns per-page scores, category averages, and identifies weakest pages. Use `output_format="json"` to get raw results that can be passed to `export_bulk_report`.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `sitemap_url` | string | Yes | Full URL to sitemap, e.g. `"https://example.com/sitemap.xml"` |
+| `max_urls` | integer | No | Maximum URLs to analyse (1-50,000). Default: all URLs found. |
+| `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `GLIPPY_RATE_LIMIT` if set, otherwise `5`. |
+| `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
+**Example:**
+```
+Analyse all pages in https://example.com/sitemap.xml
+```
+**Returns:**
+- Total URLs found vs analysed
+- Per-page results table (URL, score, grade, page type)
+- Category averages across all pages
+- Weakest pages with their problem categories
+**Supports:**
+- Regular sitemaps (`<urlset>`)
+- Sitemap index files (`<sitemapindex>`) — fetches up to 3 sub-sitemaps
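The sitemap handling above could be approximated like this. The sketch pulls `<loc>` values out with a regular expression rather than a full XML parser (so it ignores CDATA sections and namespaces), and `fetchText` is a placeholder for an HTTP GET that resolves to the response body; neither reflects glippy-mcp's actual code.

```javascript
// Hypothetical sitemap reader: regex-based <loc> extraction, following at
// most 3 sub-sitemaps of a sitemap index, as documented above.
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<").replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"').replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;"
}

function extractLocs(xml) {
  const locs = [];
  for (const m of xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/gi)) {
    locs.push(decodeXmlEntities(m[1]));
  }
  return locs;
}

// `fetchText(url)` is a placeholder that resolves to the response body.
async function collectSitemapUrls(sitemapUrl, fetchText, { maxSubSitemaps = 3, maxUrls = Infinity } = {}) {
  const xml = await fetchText(sitemapUrl);
  if (/<sitemapindex[\s>]/i.test(xml)) {
    const urls = [];
    for (const child of extractLocs(xml).slice(0, maxSubSitemaps)) {
      urls.push(...extractLocs(await fetchText(child)));
    }
    return urls.slice(0, maxUrls);
  }
  return extractLocs(xml).slice(0, maxUrls);
}
```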
+---
+### analyze_urls
+Run GEO analysis on a list of specific URLs.
+**Description:** Fetches each page, scores it across 10 categories, and returns per-page results with aggregated averages. URLs can span multiple domains. Use `output_format="json"` to get raw results that can be passed to `export_bulk_report`.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `urls` | array[string] | Yes | List of 1-50,000 full URLs, e.g. `["https://example.com/about", "https://example.com/pricing"]`. Include `https://` prefix. |
+| `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `GLIPPY_RATE_LIMIT` if set, otherwise `5`. |
+| `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
+**Example:**
+```
+Analyse these specific pages: https://example.com/about, https://example.com/pricing, https://example.com/contact
+```
+**Returns:**
+- Per-page results table (URL, score, grade, page type)
+- Category averages across all pages
+- Weakest pages with their problem categories
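Category averages and the weakest pages can be derived from the per-page results alone. A sketch assuming each page result looks like `{ url, score, categories }`, and assuming a page's "problem category" is its lowest-scoring one; neither the shape nor the rule is documented.

```javascript
// Hypothetical aggregation over per-page results of the assumed shape
// { url, score, categories: { [categoryName]: number } }.
function aggregatePages(pages, weakestCount = 3) {
  const sums = {};
  const counts = {};
  for (const page of pages) {
    for (const [name, score] of Object.entries(page.categories)) {
      sums[name] = (sums[name] ?? 0) + score;
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  const categoryAverages = Object.fromEntries(
    Object.keys(sums).map((name) => [name, Math.round(sums[name] / counts[name])])
  );

  const weakestPages = [...pages]
    .sort((a, b) => a.score - b.score) // lowest overall score first
    .slice(0, weakestCount)
    .map((page) => ({
      url: page.url,
      score: page.score,
      // Assumption: the problem category is the page's lowest-scoring one.
      worstCategory: Object.entries(page.categories).sort((a, b) => a[1] - b[1])[0]?.[0] ?? null,
    }));

  return { categoryAverages, weakestPages };
}
```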
+---
+### export_report
+Generate a styled, shareable report file.
+**Description:** Runs GEO analysis and returns results as a self-contained report in Markdown or HTML format — matching the Glippy browser extension's export output. You can optionally pass pre-computed analysis results to avoid re-crawling.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `domain` | string | No* | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
+| `format` | enum | Yes | Report format: `"markdown"` (recommendations only), `"markdown_full"` (all categories and checks), or `"html"` (standalone styled page). |
+| `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. Ignored if `analysis_result` is provided. |
+| `analysis_result` | object | No* | Pre-computed analysis result from `analyze_domain` (with `output_format="json"`). Skips re-crawling. |
+*Either `domain` or `analysis_result` must be provided.
+**Example:**
+```
+Generate an HTML report for example.com
+```
+**Example (using pre-computed results):**
+```
+# First, analyze with JSON output:
+analyze_domain domain="example.com" max_pages=5 output_format="json"
+# Then export without re-crawling:
+export_report format="html" analysis_result=<result from above>
+```
+**Returns:**
+- Complete report content ready to save
+- For HTML: Standalone page with dark/light theme toggle, score ring, category accordion, recommendations table
+- For Markdown: Structured document with priority-sorted recommendations
+---
+### export_bulk_report
+Generate a styled report for bulk analysis.
+**Description:** Creates a comprehensive report for comparing multiple domains, analysing a list of URLs, or crawling a sitemap. Returns a self-contained report with rankings, category breakdowns, and per-domain/page recommendations. You can pass pre-computed results to avoid re-crawling.
+**Parameters:**
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `format` | enum | Yes | Report format: `"markdown"` or `"html"` |
+| `domains` | array[string] | No* | Compare 2-10 domains. Do not include `https://`. |
+| `urls` | array[string] | No* | Analyse 1-50,000 specific URLs. Include `https://`. |
+| `sitemap_url` | string | No* | Crawl a sitemap URL. |
+| `analysis_results` | object | No* | Pre-computed results from `compare_domains`, `analyze_urls`, or `analyze_sitemap` (with `output_format="json"`). |
+| `max_pages` | integer | No | For domain mode: pages per domain (1-10). Default: `10`. Ignored if `analysis_results` provided. |
+| `max_urls` | integer | No | For sitemap mode: max URLs to analyse. Default: all. Ignored if `analysis_results` provided. |
+| `rate_limit` | number | No | Max requests/second per domain. Default: `GLIPPY_RATE_LIMIT` if set, otherwise `5`. Ignored if `analysis_results` provided. |
+*Provide exactly one of: `domains`, `urls`, `sitemap_url`, or `analysis_results`.
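The "exactly one of" rule is easy to enforce before any crawling starts. A hypothetical validator mirroring the table above; the error wording is invented.

```javascript
// Hypothetical argument check for export_bulk_report: exactly one input source.
const SOURCES = ["domains", "urls", "sitemap_url", "analysis_results"];

function resolveBulkSource(args) {
  const provided = SOURCES.filter((key) => args[key] !== undefined && args[key] !== null);
  if (provided.length !== 1) {
    throw new Error(
      `Provide exactly one of ${SOURCES.join(", ")} (got ${provided.length ? provided.join(", ") : "none"})`
    );
  }
  const mode = provided[0];
  if (mode === "domains" && (args.domains.length < 2 || args.domains.length > 10)) {
    throw new RangeError("domains must list 2-10 entries");
  }
  if (mode === "urls" && (args.urls.length < 1 || args.urls.length > 50000)) {
    throw new RangeError("urls must list 1-50,000 entries");
  }
  if (mode === "analysis_results" && ["max_pages", "max_urls", "rate_limit"].some((k) => k in args)) {
    // Crawl settings have no effect on pre-computed results; log to stderr.
    console.error("Note: max_pages, max_urls and rate_limit are ignored with analysis_results");
  }
  return mode;
}
```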
+**Example:**
+```
+Generate an HTML comparison report for example.com and competitor.com
+```
+**Example (using pre-computed results):**
+```
+# First, compare with JSON output:
+compare_domains domains=["example.com", "competitor.com"] output_format="json"
+# Then export without re-crawling:
+export_bulk_report format="html" analysis_results=<result from above>
+```
+**Returns:**
+- **Domain comparison:** Rankings, category comparison table, quick facts, per-domain recommendations
+- **URL/Sitemap analysis:** Per-page results, category averages, common issues across pages, weakest/strongest pages
+---
+## GEO Scoring Categories
+The analysis evaluates 10 categories, each with a weight reflecting its importance for AI/LLM readiness:
+| # | Category | Weight | What It Measures |
+|---|----------|--------|------------------|
+| 1 | **Structured Data & Schema** | 1.5x | JSON-LD presence, Schema.org types (FAQPage, Article, Product, etc.), Speakable markup, schema validation |
+| 2 | **Semantic HTML** | 1.2x | Heading hierarchy (H1-H6), semantic elements (`<article>`, `<nav>`, `<main>`), content-to-markup ratio |
+| 3 | **Accessibility for Agents** | 1.0x | Lang attribute, alt text on images, ARIA labels, descriptive link text |
+| 4 | **Internal Linking** | 1.0x | Link density, navigation structure, breadcrumb markup |
+| 5 | **Meta & Discoverability** | 1.0x | Title, meta description, canonical URL, Open Graph tags, hreflang |
+| 6 | **Machine Readability** | 1.5x | SSR detection, bot blocking checks, robots.txt rules, llms.txt presence* |
+| 7 | **Entity & Authority** | 1.0x | Author information, publication dates, organization schema |
+| 8 | **Citability & Answer-Readiness** | 1.3x | FAQ content, data tables, lists, lead paragraph quality |
+| 9 | **Performance & Crawlability** | 0.3x | Image dimensions, lazy loading, resource hints |
+| 10 | **Agent Interactivity** | 0.2x | WebMCP tools, form annotations, agent-callable actions |
+*\*llms.txt is checked for presence but is not currently supported or consumed by any major AI model or crawler. It has minimal practical impact on GEO readiness today — see the [`check_llms_txt`](#check_llms_txt) section for details.*
+### Scoring
+- Each category produces a **score from 0-100**
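+- The **overall score** is a weighted average using the weights above: each category score is multiplied by its weight, and the total is divided by the sum of the weights (10.0)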
+- Scores map to **letter grades**: A+ (90+), A (80+), B (70+), C (60+), D (40+), F (<40)
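The scoring rules above, written out as code. This is a sketch of the documented arithmetic, not the engine's source; rounding the result to an integer and skipping categories with no score are assumptions.

```javascript
// Weighted overall score and letter grade, using the weights from the table.
const WEIGHTS = {
  "Structured Data & Schema": 1.5,
  "Semantic HTML": 1.2,
  "Accessibility for Agents": 1.0,
  "Internal Linking": 1.0,
  "Meta & Discoverability": 1.0,
  "Machine Readability": 1.5,
  "Entity & Authority": 1.0,
  "Citability & Answer-Readiness": 1.3,
  "Performance & Crawlability": 0.3,
  "Agent Interactivity": 0.2,
};

function overallScore(categoryScores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    if (typeof categoryScores[name] !== "number") continue; // assumption: skip missing categories
    weighted += weight * categoryScores[name];
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}

function letterGrade(score) {
  if (score >= 90) return "A+";
  if (score >= 80) return "A";
  if (score >= 70) return "B";
  if (score >= 60) return "C";
  if (score >= 40) return "D";
  return "F";
}
```

With all ten categories present the divisor is 10.0, so scores of 100 in the two 1.5x categories and 60 everywhere else give (300 + 420) / 10 = 72, a B.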
+---
+## Rate Limiting
+To prevent overwhelming target servers during batch operations, the MCP server enforces per-domain rate limiting.
+### Configuration
+1. **Environment variable:** Set `GLIPPY_RATE_LIMIT=3` to make 3 requests/second the default for all batch tools
+2. **Per-call parameter:** Pass `rate_limit` to `analyze_sitemap`, `analyze_urls`, or `export_bulk_report` to override the default for that call
+### Recommended Values
+| Scenario | Rate Limit | Description |
+|----------|------------|-------------|
+| Polite crawling | `0.5` - `1` | 1 request every 1-2 seconds |
+| Default | `5` | 5 requests/second (balanced) |
+| Your own server | `10` - `50` | Faster crawling when you control the target |
+| Aggressive | `100` | Maximum speed (use with caution) |
+### How It Works
+- Requests to different domains run in parallel
+- Requests to the same domain are serialized with the configured delay
+- Global concurrency is capped at 10 simultaneous requests
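The three rules above can be combined in a small limiter: a promise chain per hostname serialises same-domain requests, and a counter with a wait queue enforces the global cap. A hypothetical sketch, not glippy-mcp's code; it waits `1000 / rate_limit` ms after each same-domain request finishes, which is one reading of "serialized with the configured delay".

```javascript
// Hypothetical per-domain rate limiter illustrating the rules above.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class DomainRateLimiter {
  constructor(requestsPerSecond = 5, maxConcurrent = 10) {
    if (!(requestsPerSecond >= 0.1 && requestsPerSecond <= 100)) {
      throw new RangeError("rate_limit must be between 0.1 and 100");
    }
    this.intervalMs = 1000 / requestsPerSecond; // e.g. 5 req/s -> 200 ms
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.waiting = []; // callers waiting for a global slot
    this.tails = new Map(); // hostname -> promise that settles when the next request may start
  }

  async acquireSlot() {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return;
    }
    // Wait until releaseSlot() hands this caller a slot directly.
    await new Promise((resolve) => this.waiting.push(resolve));
  }

  releaseSlot() {
    const next = this.waiting.shift();
    if (next) next(); // transfer the slot; `active` stays the same
    else this.active--;
  }

  // Runs task(url) after earlier requests to the same host have finished
  // and the interval has elapsed; other hosts are not delayed.
  schedule(url, task) {
    const host = new URL(url).hostname;
    const previous = this.tails.get(host) ?? Promise.resolve();
    const run = previous.then(async () => {
      await this.acquireSlot();
      try {
        return await task(url);
      } finally {
        this.releaseSlot();
      }
    });
    this.tails.set(host, run.catch(() => {}).then(() => sleep(this.intervalMs)));
    return run;
  }
}
```

Requests for a new host start from an already-resolved promise, so they only compete for the global slots, while each later request for a known host waits on that host's chain.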
+---
+## Output Formats
+### Text (Default)
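+The analysis tools return structured text by default, suitable for:
+- Inline display in chat
+- Quick analysis and follow-up questions
+- Light programmatic parsing (for reliable parsing, or to pass results to an export tool, use `output_format="json"` where supported)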
+### Markdown Reports
+Generated by `export_report` and `export_bulk_report`:
+- Clean, readable structure
+- Priority-sorted recommendations (High → Medium → Low)
+- Tables for easy comparison
+- Save as `.md` file
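Priority sorting needs only a rank lookup and a stable sort. A hypothetical sketch of a recommendations table; the `{ priority, category, text }` shape is assumed, and `|` characters are escaped so they do not break Markdown table cells.

```javascript
// Hypothetical sketch of a priority-sorted recommendations section.
const PRIORITY_ORDER = { High: 0, Medium: 1, Low: 2 };

function escapeCell(s) {
  return String(s).replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function recommendationsMarkdown(recommendations) {
  // Array.prototype.sort is stable, so items keep their order within a priority.
  const rows = [...recommendations]
    .sort((a, b) => PRIORITY_ORDER[a.priority] - PRIORITY_ORDER[b.priority])
    .map((r) => `| ${r.priority} | ${escapeCell(r.category)} | ${escapeCell(r.text)} |`);
  return [
    "## Recommendations",
    "",
    "| Priority | Category | Recommendation |",
    "|----------|----------|----------------|",
    ...rows,
  ].join("\n");
}
```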
+### HTML Reports
+Generated by `export_report` and `export_bulk_report`:
+- Standalone, self-contained page (no external dependencies)
+- Dark/light theme toggle with system preference detection
+- Interactive category accordion
+- Score ring visualization
+- Copy recommendations button
+- Print-friendly styling
+- Save as `.html` file
+---
+## Caching & Efficient Workflows
+The MCP server includes smart caching and result-passing features to avoid redundant crawling.
+### Automatic Caching
+Analysis results are cached in memory for **5 minutes**, with the following behaviour:
+- **Key:** `domain + maxPages`; cached results are reused when the same domain is analysed again
+- **Smart coverage:** If you request `max_pages=3` and there's a cached result with `max_pages=5`, the cache is used
+- **Automatic:** No configuration needed — just call tools normally and caching happens automatically
+**Example workflow (automatic):**
+```
+# First call — crawls the site
+analyze_domain domain="example.com" max_pages=5
+# Second call within 5 minutes with the same or a smaller max_pages: uses the cached result
+export_report domain="example.com" format="html" max_pages=5
+```
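The caching rules translate into a small map with timestamps. A hypothetical sketch, not glippy-mcp's implementation; preferring the smallest covering entry is an assumption. Coverage only works downward: a cached 5-page result can answer a 3-page request, but not a request for 10 pages.

```javascript
// Hypothetical in-memory cache: 5-minute TTL, keyed by domain and max pages,
// where a result crawled with more pages can answer a request for fewer.
const TTL_MS = 5 * 60 * 1000;

class AnalysisCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // "domain|maxPages" -> { domain, maxPages, result, storedAt }
  }

  set(domain, maxPages, result) {
    this.entries.set(`${domain}|${maxPages}`, { domain, maxPages, result, storedAt: this.now() });
  }

  get(domain, maxPages) {
    let best = null;
    for (const [key, entry] of this.entries) {
      if (this.now() - entry.storedAt > TTL_MS) {
        this.entries.delete(key); // expired; deleting during Map iteration is safe
        continue;
      }
      // "Smart coverage": any fresh entry crawled with at least maxPages pages.
      if (entry.domain === domain && entry.maxPages >= maxPages) {
        if (!best || entry.maxPages < best.maxPages) best = entry; // closest covering entry
      }
    }
    return best ? best.result : null;
  }
}
```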
+### JSON Output Mode
+For explicit control, use `output_format="json"` to get raw analysis results that can be passed to export tools.
+**Single domain workflow:**
+```
+# Step 1: Analyze with JSON output
+analyze_domain domain="example.com" max_pages=5 output_format="json"
+# Returns full analysis object as JSON
+# Step 2: Export multiple formats without re-crawling
+export_report format="html" analysis_result=<JSON from step 1>
+export_report format="markdown_full" analysis_result=<JSON from step 1>
+```
+**Multi-domain workflow:**
+```
+# Step 1: Compare with JSON output
+compare_domains domains=["site1.com", "site2.com"] output_format="json"
+# Returns array of analysis results
+# Step 2: Generate report without re-crawling
+export_bulk_report format="html" analysis_results=<JSON from step 1>
+```
+**Sitemap/URL workflow:**
+```
+# Step 1: Analyze sitemap with JSON output
+analyze_sitemap sitemap_url="https://example.com/sitemap.xml" output_format="json"
+# Returns { sitemap_url, pageResults, aggregated }
+# Step 2: Generate report without re-crawling
+export_bulk_report format="html" analysis_results=<JSON from step 1>
+```
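`export_bulk_report` has to tell the JSON shapes above apart. A hypothetical sketch: it treats an array as `compare_domains` output and an object with `pageResults` as page-level output, and it assumes (the README does not say) that `analyze_urls` returns the same shape as `analyze_sitemap` and that a client might pass the result as a JSON string.

```javascript
// Hypothetical: classify `analysis_results` by the JSON shapes shown above.
function classifyAnalysisResults(results) {
  if (typeof results === "string") results = JSON.parse(results); // assumption: stringified input
  if (Array.isArray(results)) {
    return { mode: "domains", count: results.length }; // compare_domains output
  }
  if (results && Array.isArray(results.pageResults)) {
    return {
      mode: results.sitemap_url ? "sitemap" : "urls",
      count: results.pageResults.length,
    };
  }
  throw new Error(
    'analysis_results must come from compare_domains, analyze_urls, or analyze_sitemap with output_format="json"'
  );
}
```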
+### When to Use Each Approach
+| Scenario | Recommended Approach |
+|----------|---------------------|
+| Quick analysis + single export | Automatic caching (just call both tools) |
+| Generate multiple report formats | JSON output mode (analyze once, export many) |
+| Time-sensitive workflow | JSON output mode (guaranteed no re-crawling) |
+| Interactive exploration | Automatic caching (ask questions, then export) |
+---
+## Architecture
+```
+research-mcp/
+├── src/
+│   ├── index.js          # MCP server — tool registration, JSON-RPC handling, license validation
+│   └── geo-checker.js    # GEO analysis engine — fetches & scores domains
+├── package.json
+└── README.md
+```
+### Analysis Flow
+1. **Fetch resources in parallel:**
+   - robots.txt
+   - llms.txt
+   - Homepage HTML
+   - sitemap.xml
+   - UCP profile (/.well-known/ucp)
+2. **Parse HTML with cheerio** (server-side DOM)
+3. **Run 10 weighted scoring categories**
+4. **Return comprehensive analysis** with actionable recommendations
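Step 1 might look like the sketch below. It is an illustration under assumptions, not `geo-checker.js`: it uses the global `fetch` in Node.js 18+, an assumed 10-second timeout per request, and keeps failed fetches as results, since a missing llms.txt or sitemap is itself a finding.

```javascript
// Hypothetical sketch of step 1: fetch the five resources concurrently.
// Resource paths follow the list above; `fetchImpl` can be swapped for testing.
function resourceUrls(domain) {
  const base = `https://${domain}`;
  return {
    robotsTxt: `${base}/robots.txt`,
    llmsTxt: `${base}/llms.txt`,
    homepage: `${base}/`,
    sitemap: `${base}/sitemap.xml`,
    ucpProfile: `${base}/.well-known/ucp`,
  };
}

async function fetchResources(domain, fetchImpl = fetch, timeoutMs = 10000) {
  const entries = Object.entries(resourceUrls(domain));
  const settled = await Promise.allSettled(
    entries.map(async ([, url]) => {
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
      return { ok: res.ok, status: res.status, body: res.ok ? await res.text() : null };
    })
  );
  // Keep every outcome, including network errors, keyed by resource name.
  return Object.fromEntries(
    entries.map(([name], i) => [
      name,
      settled[i].status === "fulfilled" ? settled[i].value : { ok: false, error: String(settled[i].reason) },
    ])
  );
}
```

Step 2 would then load `homepage.body` into cheerio with `cheerio.load(html)` for server-side DOM queries.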
+### Protocol
+- **Transport:** stdio (JSON-RPC 2.0 over stdin/stdout)
+- **SDK:** `@modelcontextprotocol/sdk` (official TypeScript MCP SDK)
+- **Logging:** All logs go to stderr (stdout reserved for MCP protocol)
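Over stdio, each JSON-RPC message is a single line of JSON terminated by a newline, which is why anything else printed to stdout would corrupt the stream. A minimal client-side reader (an illustration, not part of glippy-mcp) must buffer partial lines because a chunk can end mid-message.

```javascript
// Minimal newline-delimited JSON-RPC reader for an MCP stdio stream.
// Chunks from a child process's stdout can split or merge messages arbitrarily.
class LineDecoder {
  constructor(onMessage) {
    this.buffer = "";
    this.onMessage = onMessage;
  }

  push(chunk) {
    this.buffer += chunk;
    let newline;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1); // keep any partial message
      if (line.trim() !== "") this.onMessage(JSON.parse(line));
    }
  }
}

// Server side: diagnostics go to stderr so stdout carries only protocol messages.
function log(...args) {
  console.error("[glippy-mcp]", ...args); // console.error writes to stderr
}
```

When reading a child process, call `child.stdout.setEncoding("utf8")` first so multi-byte characters are not split between chunks, then pass each `data` chunk to `push()`.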
+---
+## Manual Testing
+Test the MCP server directly from the command line. Run this from the package directory, or replace `node src/index.js` with `npx -y glippy-mcp`:
+```bash
+# Send MCP init + tool list request via stdin
+echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}
+{"jsonrpc":"2.0","method":"notifications/initialized"}
+{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | GLIPPY_LICENSE_KEY=your-key node src/index.js 2>/dev/null
+```
+---
+## Troubleshooting
+### "License error: No license key configured"
+**Cause:** The `GLIPPY_LICENSE_KEY` environment variable is not set.
+**Fix:** Add the key to your MCP configuration:
+```json
+"env": {
+  "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
+}
+```
+### "License validation failed"
+**Cause:** Invalid or expired license key.
+**Fix:** Get a valid key at [glippy.dev](https://glippy.dev).
+### "Could not reach license server"
+**Cause:** Network connectivity issue or firewall blocking.
+**Fix:**
+- Check your internet connection
+- Ensure `glippy-mcp-api.info-8cb.workers.dev` is accessible
+- If a valid license was cached, the server keeps working until the cache expires (24 hours after the last successful check)
+### "Error analysing domain: HTTP 403/404"
+**Cause:** Target site is blocking requests or page doesn't exist.
+**Fix:**
+- Verify the domain is accessible in a browser
+- Some sites block automated requests — try a different domain
+- Check if the site requires authentication
+### "No URLs found in sitemap"
+**Cause:** The sitemap doesn't contain `<loc>` entries or uses an unexpected format.
+**Fix:**
+- Verify the sitemap URL returns valid XML
+- Check that URLs in the sitemap match the expected domain
+- For sitemap indexes, ensure sub-sitemaps are accessible
+### High memory usage during batch analysis
+**Cause:** Analysing too many URLs at once.
+**Fix:**
+- Use `max_urls` parameter to limit sitemap crawling
+- Reduce `max_pages` for domain comparison
+- Process URLs in smaller batches
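Splitting work into batches keeps peak memory bounded as long as each batch is reduced to a summary before the next one starts. A hypothetical sketch; `processBatch` is a placeholder, for example a call to `analyze_urls` with one chunk.

```javascript
// Hypothetical helpers for processing a large URL list in sequential batches.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function processInBatches(urls, size, processBatch) {
  const summaries = [];
  for (const batch of chunk(urls, size)) {
    // One batch at a time; keep only what processBatch returns.
    summaries.push(await processBatch(batch));
  }
  return summaries;
}
```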
+---
+## AI Crawlers Detected
+The server checks access rules for these AI crawlers in robots.txt:
+| Crawler | Company | Purpose |
+|---------|---------|---------|
+| GPTBot | OpenAI | Training data for GPT models |
+| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT |
+| Google-Extended | Google | robots.txt token controlling use of content for Gemini (no separate crawler; Google's existing user agents do the fetching) |
+| ClaudeBot | Anthropic | Training data for Claude |
+| anthropic-ai | Anthropic | Anthropic's general crawler |
+| CCBot | Common Crawl | Open web corpus |
+| PerplexityBot | Perplexity AI | Search and answer engine |
+| Bytespider | ByteDance | TikTok/Douyin AI features |
+| AmazonBot | Amazon | Alexa and shopping AI |
+| cohere-ai | Cohere | Enterprise AI models |
+---
+## License
+See LICENSE file for licensing terms. Get your license key at [glippy.dev](https://glippy.dev).
+---
+## Support
+- **Integration Guide:** [docs/INTEGRATIONS.md](docs/INTEGRATIONS.md)
+- **Online Documentation:** [glippy.dev/docs](https://glippy.dev/docs)
+- **Issues:** [github.com/jbobbink/glippy/issues](https://github.com/jbobbink/glippy/issues)
+- **Homepage:** [glippy.dev](https://glippy.dev)
+---
+*Generated by [Glippy](https://www.glippy.dev) — GEO Agent-Readiness Checker*