glippy-mcp 0.1.0

package/README.md ADDED
@@ -0,0 +1,734 @@
1
+ # Glippy GEO MCP Server
2
+
3
+ An MCP (Model Context Protocol) server that exposes Glippy's GEO (Generative Engine Optimization) analysis capabilities as tools for AI agents.
4
+
5
+ ## Overview
6
+
7
+ This MCP server enables AI models (Claude, GPT, etc.) to directly analyse any domain's **GEO readiness** — how well a website is prepared for AI crawlers, LLM-powered search, and agent interaction.
8
+
9
+ It wraps the Glippy desktop app's server-side analysis engine (`geo-checker.js`) and exposes it over MCP using the stdio transport.
10
+
11
+ **Key features:**
12
+ - Full 10-category GEO analysis with weighted scoring
13
+ - robots.txt AI crawler access detection
14
+ - llms.txt file discovery and parsing
15
+ - Sitemap crawling and multi-page analysis
16
+ - Domain comparison and competitive analysis
17
+ - Export to styled Markdown or HTML reports
18
+ - **Smart caching** — automatic deduplication of repeated analyses
19
+ - **JSON output mode** — pass analysis results between tools to avoid re-crawling
20
+
21
+ ---
22
+
23
+ ## Table of Contents
24
+
25
+ - [Installation](#installation)
26
+ - [Configuration](#configuration)
27
+   - [Claude Desktop](#usage-with-claude-desktop)
28
+   - [Claude Code](#usage-with-claude-code)
29
+   - [Environment Variables](#environment-variables)
30
+ - [Integration Guides](#integration-guides)
31
+ - [Tools Reference](#tools-reference)
32
+   - [analyze_domain](#analyze_domain)
33
+   - [check_robots_txt](#check_robots_txt)
34
+   - [check_llms_txt](#check_llms_txt)
35
+   - [get_geo_summary](#get_geo_summary)
36
+   - [compare_domains](#compare_domains)
37
+   - [analyze_sitemap](#analyze_sitemap)
38
+   - [analyze_urls](#analyze_urls)
39
+   - [export_report](#export_report)
40
+   - [export_bulk_report](#export_bulk_report)
41
+ - [GEO Scoring Categories](#geo-scoring-categories)
42
+ - [Rate Limiting](#rate-limiting)
43
+ - [Output Formats](#output-formats)
+ - [Caching & Efficient Workflows](#caching--efficient-workflows)
44
+ - [Architecture](#architecture)
45
+ - [Manual Testing](#manual-testing)
46
+ - [Troubleshooting](#troubleshooting)
+ - [AI Crawlers Detected](#ai-crawlers-detected)
46
+ - [License](#license)
+ - [Support](#support)
48
+
49
+ ---
50
+
51
+ ## Installation
52
+
53
+ ### Via npm (recommended)
54
+
55
+ ```bash
56
+ npm install -g glippy-mcp
57
+ ```
58
+
59
+ ### Via npx (no install needed)
60
+
61
+ Use directly via `npx` in your MCP configuration:
62
+
63
+ ```bash
64
+ npx -y glippy-mcp
65
+ ```
66
+
67
+ ### Requirements
68
+
69
+ - Node.js 18.0.0 or higher
70
+ - Valid Glippy MCP license key
71
+
72
+ ---
73
+
74
+ ## Configuration
75
+
76
+ ### License Key
77
+
78
+ A valid Glippy MCP license key (`GLMCP-XXXX-XXXX-XXXX`) is required. Get one at [glippy.dev](https://glippy.dev).
79
+
80
+ The server validates the key against the Glippy API on first use and caches the result for 24 hours. **Analysis runs locally on your machine** — only the license check calls the server.
81
+
82
+ ### Usage with Claude Desktop
83
+
84
+ Add to your `claude_desktop_config.json`:
85
+
86
+ ```json
87
+ {
88
+   "mcpServers": {
89
+     "glippy-geo": {
90
+       "command": "npx",
91
+       "args": ["-y", "glippy-mcp"],
92
+       "env": {
93
+         "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
94
+       }
95
+     }
96
+   }
97
+ }
98
+ ```
99
+
100
+ **Config file locations:**
101
+ - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
102
+ - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
103
+ - Linux: `~/.config/Claude/claude_desktop_config.json`
104
+
105
+ ### Usage with Claude Code
106
+
107
+ Add to your `.mcp.json` in your project root or `~/.claude/.mcp.json` for global access:
108
+
109
+ ```json
110
+ {
110
+   "mcpServers": {
111
+     "glippy-geo": {
112
+       "command": "npx",
113
+       "args": ["-y", "glippy-mcp"],
114
+       "env": {
115
+         "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
116
+       }
117
+     }
118
+   }
119
+ }
121
+ ```
122
+
123
+ ### Environment Variables
124
+
125
+ | Variable | Required | Default | Description |
126
+ |----------|----------|---------|-------------|
127
+ | `GLIPPY_LICENSE_KEY` | Yes | — | Your MCP license key (`GLMCP-XXXX-XXXX-XXXX`) |
128
+ | `GLIPPY_RATE_LIMIT` | No | `5` | Default max requests/second per domain for batch tools |
129
+
130
+ ---
131
+
132
+ ## Integration Guides
133
+
134
+ For detailed setup instructions across all supported environments, see the **[Integration Guide](docs/INTEGRATIONS.md)**.
135
+
136
+ ### Supported Environments
137
+
138
+ | Environment | Support Level | Config File |
139
+ |-------------|---------------|-------------|
140
+ | **Claude Code** (VS Code) | Native MCP | `.mcp.json` |
141
+ | **Claude CLI** (Terminal) | Native MCP | `.mcp.json` |
142
+ | **Claude Desktop** | Native MCP | `claude_desktop_config.json` |
143
+ | **Cursor IDE** | Native MCP | `.cursor/mcp.json` |
144
+ | **Windsurf IDE** | Native MCP | `.windsurf/mcp.json` |
145
+ | **Continue.dev** | Native MCP | `~/.continue/config.json` |
146
+ | **ChatGPT / OpenAI** | Via bridge/API | Custom integration |
147
+
148
+ The integration guide includes:
149
+ - Step-by-step setup for each environment
150
+ - Platform-specific config file locations
151
+ - Usage examples and prompts
152
+ - Verification and testing instructions
153
+ - Troubleshooting tips
154
+
155
+ ---
156
+
157
+ ## Tools Reference
158
+
159
+ ### analyze_domain
160
+
161
+ Run a comprehensive GEO readiness analysis on a domain.
162
+
163
+ **Description:** Checks robots.txt, llms.txt, homepage HTML (10 scoring categories), sitemap.xml, and security headers. Returns an overall weighted score (0-100) with per-category breakdowns and actionable recommendations. Use `output_format="json"` to get raw results that can be passed to `export_report`.
164
+
165
+ **Parameters:**
166
+
167
+ | Parameter | Type | Required | Description |
168
+ |-----------|------|----------|-------------|
169
+ | `domain` | string | Yes | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
170
+ | `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. |
171
+ | `output_format` | enum | No | `"text"` (default) for human-readable report, `"json"` for raw results to pass to `export_report`. |
172
+
173
+ **Example:**
174
+ ```
175
+ Analyse GEO readiness for example.com
176
+ ```
177
+
178
+ **Example (JSON output for chaining):**
179
+ ```
180
+ analyze_domain domain="example.com" max_pages=5 output_format="json"
181
+ # Then pass the result to export_report
182
+ ```
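At the protocol level, an MCP client sends the call above as a JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of such a message, assuming the arguments are sent under the names used in the parameter table. The helper `buildToolCall` and the request id are hypothetical and not part of glippy-mcp.

```javascript
// Hypothetical helper (not part of glippy-mcp): builds the JSON-RPC 2.0
// message an MCP client writes to the server's stdin to invoke a tool.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names follow the analyze_domain parameter table.
const request = buildToolCall(3, "analyze_domain", {
  domain: "example.com",
  max_pages: 5,
  output_format: "json",
});

// The stdio transport carries one JSON message per line.
console.log(JSON.stringify(request));
```

In practice the MCP client (Claude Desktop, Claude Code, and so on) builds this message for you; the sketch is mainly useful when testing the server by hand.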
183
+
184
+ **Returns:**
185
+ - Overall GEO score (0-100) with letter grade
186
+ - Page type detection (article, product, homepage, etc.)
187
+ - 10 category scores with pass/fail/warn checks
188
+ - robots.txt analysis with AI crawler access
189
+ - llms.txt presence and content preview
190
+ - Sitemap discovery status
191
+ - Multi-page aggregated scores (if `max_pages > 1`)
192
+
193
+ ---
194
+
195
+ ### check_robots_txt
196
+
197
+ Check a domain's robots.txt specifically for AI crawler access rules.
198
+
199
+ **Description:** Reports which AI crawlers (GPTBot, ClaudeBot, etc.) are blocked or allowed.
200
+
201
+ **Parameters:**
202
+
203
+ | Parameter | Type | Required | Description |
204
+ |-----------|------|----------|-------------|
205
+ | `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
206
+
207
+ **Example:**
208
+ ```
209
+ Check which AI crawlers are blocked on example.com
210
+ ```
211
+
212
+ **Returns:**
213
+ - robots.txt existence and URL
214
+ - Wildcard disallow detection (`Disallow: /`)
215
+ - Per-crawler access status for:
216
+   - GPTBot
217
+   - Google-Extended
218
+   - CCBot
219
+   - anthropic-ai
220
+   - ClaudeBot
221
+   - Bytespider
222
+   - PerplexityBot
223
+   - ChatGPT-User
224
+   - AmazonBot
225
+   - cohere-ai
226
+ - Sitemap references found in robots.txt
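To illustrate the kind of check this tool performs, here is a simplified sketch of detecting blocked crawlers from robots.txt text. It is not the logic in `geo-checker.js`: the function names are hypothetical, and it assumes a crawler is "blocked" only when its own group (or, failing that, the `*` group) contains `Disallow: /`. `Allow` rules and path-specific rules are ignored.

```javascript
// Crawler names taken from the list above.
const AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai", "ClaudeBot",
  "Bytespider", "PerplexityBot", "ChatGPT-User", "AmazonBot", "cohere-ai"];

// Groups consecutive User-agent lines with the Disallow rules that follow them.
function parseRobots(text) {
  const groups = [];
  const sitemaps = [];
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === "disallow" && current) current.disallow.push(value);
      if (field === "sitemap") sitemaps.push(value);
    }
  }
  return { groups, sitemaps };
}

// Simplifying assumption: only a bare "Disallow: /" counts as a block.
function crawlerAccess(text) {
  const { groups, sitemaps } = parseRobots(text);
  const blocksAll = (agent) => {
    const matching = groups.filter((g) => g.agents.includes(agent));
    return matching.length ? matching.some((g) => g.disallow.includes("/")) : null;
  };
  const wildcardBlocked = blocksAll("*") === true;
  const crawlers = {};
  for (const name of AI_CRAWLERS) {
    const own = blocksAll(name.toLowerCase());
    crawlers[name] = (own === null ? wildcardBlocked : own) ? "blocked" : "allowed";
  }
  return { wildcardBlocked, crawlers, sitemaps };
}

const sample = [
  "User-agent: *",
  "Disallow: /private/",
  "",
  "User-agent: GPTBot",
  "User-agent: CCBot",
  "Disallow: /",
  "Sitemap: https://example.com/sitemap.xml",
].join("\n");

console.log(crawlerAccess(sample));
```

For the sample input, GPTBot and CCBot are reported as blocked, while the remaining crawlers fall back to the wildcard group and are reported as allowed.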
227
+
228
+ ---
229
+
230
+ ### check_llms_txt
231
+
232
+ Check if a domain has an llms.txt file.
233
+
234
+ **Description:** Checks for llms.txt, a proposed file format that gives LLMs context about a site's purpose and content.
235
+
236
+ > **Important:** llms.txt is an emerging proposal, but it is **not currently supported or consumed** by major AI models, crawlers, or MCP clients. No mainstream LLM or AI agent reads llms.txt to inform its behaviour. Having an llms.txt file should **not be seen as a relevant optimization** for your GEO readiness — it will not meaningfully improve how AI systems discover or understand your site today. That said, it cannot hurt to have one: the file is lightweight, easy to create, and if the standard gains adoption in the future you will already be prepared.
237
+
238
+ **Parameters:**
239
+
240
+ | Parameter | Type | Required | Description |
241
+ |-----------|------|----------|-------------|
242
+ | `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
243
+
244
+ **Example:**
245
+ ```
246
+ Does example.com have an llms.txt file?
247
+ ```
248
+
249
+ **Returns:**
250
+ - llms.txt existence
251
+ - Full file contents if present
252
+ - Link to specification at https://llmstxt.org
253
+
254
+ ---
255
+
256
+ ### get_geo_summary
257
+
258
+ Get a concise GEO readiness summary for quick assessment.
259
+
260
+ **Description:** Returns overall score, grade, top 3 strengths, and top 3 issues to fix. Use this for a quick overview; use `analyze_domain` for full details.
261
+
262
+ **Parameters:**
263
+
264
+ | Parameter | Type | Required | Description |
265
+ |-----------|------|----------|-------------|
266
+ | `domain` | string | Yes | The domain to check, e.g. `"example.com"`. Do not include `https://` prefix. |
267
+
268
+ **Example:**
269
+ ```
270
+ Give me a quick GEO summary of example.com
271
+ ```
272
+
273
+ **Returns:**
274
+ - Overall score and grade
275
+ - Page type detected
276
+ - Top 3 strongest categories
277
+ - Top 3 weakest categories with top issue
278
+ - Quick facts (robots.txt, llms.txt, sitemap, blocked crawlers)
279
+
280
+ ---
281
+
282
+ ### compare_domains
283
+
284
+ Analyse multiple domains in parallel and compare scores.
285
+
286
+ **Description:** Returns a comparison table with overall scores, per-category breakdowns, and a ranked summary. Useful for competitive analysis or auditing a portfolio of sites. Use `output_format="json"` to get raw results that can be passed to `export_bulk_report`.
287
+
288
+ **Parameters:**
289
+
290
+ | Parameter | Type | Required | Description |
291
+ |-----------|------|----------|-------------|
292
+ | `domains` | array[string] | Yes | List of 2-10 domains to compare, e.g. `["example.com", "competitor.com"]`. Do not include `https://` prefix. |
293
+ | `max_pages` | integer | No | Maximum pages to crawl per domain (1-10). Default: `10`. |
294
+ | `output_format` | enum | No | `"text"` (default) for comparison table, `"json"` for raw results to pass to `export_bulk_report`. |
295
+
296
+ **Example:**
297
+ ```
298
+ Compare GEO scores of example.com, competitor1.com, and competitor2.com
299
+ ```
300
+
301
+ **Returns:**
302
+ - Ranked list of domains by score
303
+ - Category comparison table (all 10 categories)
304
+ - Quick facts comparison (robots.txt, llms.txt, sitemap, blocked crawlers)
305
+ - Error details for any failed analyses
306
+
307
+ ---
308
+
309
+ ### analyze_sitemap
310
+
311
+ Fetch a sitemap and analyse all discovered pages.
312
+
313
+ **Description:** Fetches a sitemap XML (or sitemap index), extracts page URLs, and runs GEO analysis on each page. Returns per-page scores, category averages, and identifies weakest pages. Use `output_format="json"` to get raw results that can be passed to `export_bulk_report`.
314
+
315
+ **Parameters:**
316
+
317
+ | Parameter | Type | Required | Description |
318
+ |-----------|------|----------|-------------|
319
+ | `sitemap_url` | string | Yes | Full URL to sitemap, e.g. `"https://example.com/sitemap.xml"` |
320
+ | `max_urls` | integer | No | Maximum URLs to analyse (1-50,000). Default: all URLs found. |
321
+ | `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `5`. |
322
+ | `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
323
+
324
+ **Example:**
325
+ ```
326
+ Analyse all pages in https://example.com/sitemap.xml
327
+ ```
328
+
329
+ **Returns:**
330
+ - Total URLs found vs analysed
331
+ - Per-page results table (URL, score, grade, page type)
332
+ - Category averages across all pages
333
+ - Weakest pages with their problem categories
334
+
335
+ **Supports:**
336
+ - Regular sitemaps (`<urlset>`)
337
+ - Sitemap index files (`<sitemapindex>`) — fetches up to 3 sub-sitemaps
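The handling of the two sitemap types can be sketched as follows. This is an illustration, not the server's actual parser: `extractLocs` and `planSitemap` are hypothetical names, and the regex-based `<loc>` extraction is a shortcut that skips XML entity decoding and namespaces.

```javascript
const MAX_SUB_SITEMAPS = 3; // limit stated above for sitemap index files

// Collect the text of every <loc> element.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)].map((m) => m[1]);
}

// Decide what to do with a fetched sitemap document:
// - <sitemapindex>: return up to MAX_SUB_SITEMAPS child sitemaps to fetch next
// - <urlset>: return page URLs, capped at maxUrls when given
function planSitemap(xml, maxUrls = Infinity) {
  const locs = extractLocs(xml);
  if (/<sitemapindex[\s>]/i.test(xml)) {
    return { kind: "index", subSitemaps: locs.slice(0, MAX_SUB_SITEMAPS) };
  }
  return { kind: "urlset", found: locs.length, pages: locs.slice(0, maxUrls) };
}

const indexXml = `<sitemapindex>
  <sitemap><loc>https://example.com/a.xml</loc></sitemap>
  <sitemap><loc>https://example.com/b.xml</loc></sitemap>
  <sitemap><loc>https://example.com/c.xml</loc></sitemap>
  <sitemap><loc>https://example.com/d.xml</loc></sitemap>
</sitemapindex>`;

const urlsetXml = `<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`;

console.log(planSitemap(indexXml));     // keeps the first 3 sub-sitemaps
console.log(planSitemap(urlsetXml, 1)); // 2 URLs found, 1 selected for analysis
```

The `found` versus `pages` split mirrors the "Total URLs found vs analysed" line in the tool's output.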
338
+
339
+ ---
340
+
341
+ ### analyze_urls
342
+
343
+ Run GEO analysis on a list of specific URLs.
344
+
345
+ **Description:** Fetches each page, scores it across 10 categories, and returns per-page results with aggregated averages. URLs can span multiple domains. Use `output_format="json"` to get raw results that can be passed to `export_bulk_report`.
346
+
347
+ **Parameters:**
348
+
349
+ | Parameter | Type | Required | Description |
350
+ |-----------|------|----------|-------------|
351
+ | `urls` | array[string] | Yes | List of 1-50,000 full URLs, e.g. `["https://example.com/about", "https://example.com/pricing"]`. Include `https://` prefix. |
352
+ | `rate_limit` | number | No | Max requests/second per domain (0.1-100). Default: `5`. |
353
+ | `output_format` | enum | No | `"text"` (default) for report, `"json"` for raw results to pass to `export_bulk_report`. |
354
+
355
+ **Example:**
356
+ ```
357
+ Analyse these specific pages: https://example.com/about, https://example.com/pricing, https://example.com/contact
358
+ ```
359
+
360
+ **Returns:**
361
+ - Per-page results table (URL, score, grade, page type)
362
+ - Category averages across all pages
363
+ - Weakest pages with their problem categories
364
+
365
+ ---
366
+
367
+ ### export_report
368
+
369
+ Generate a styled, shareable report file.
370
+
371
+ **Description:** Runs GEO analysis and returns results as a self-contained report in Markdown or HTML format — matching the Glippy browser extension's export output. You can optionally pass pre-computed analysis results to avoid re-crawling.
372
+
373
+ **Parameters:**
374
+
375
+ | Parameter | Type | Required | Description |
376
+ |-----------|------|----------|-------------|
377
+ | `domain` | string | No* | The domain to analyse, e.g. `"example.com"`. Do not include `https://` prefix. |
378
+ | `format` | enum | Yes | Report format: `"markdown"` (recommendations only), `"markdown_full"` (all categories and checks), or `"html"` (standalone styled page). |
379
+ | `max_pages` | integer | No | Maximum pages to crawl (1-10). Default: `10`. Ignored if `analysis_result` is provided. |
380
+ | `analysis_result` | object | No* | Pre-computed analysis result from `analyze_domain` (with `output_format="json"`). Skips re-crawling. |
381
+
382
+ *Either `domain` or `analysis_result` must be provided.
383
+
384
+ **Example:**
385
+ ```
386
+ Generate an HTML report for example.com
387
+ ```
388
+
389
+ **Example (using pre-computed results):**
390
+ ```
391
+ # First, analyze with JSON output:
392
+ analyze_domain domain="example.com" max_pages=5 output_format="json"
393
+
394
+ # Then export without re-crawling:
395
+ export_report format="html" analysis_result=<result from above>
396
+ ```
397
+
398
+ **Returns:**
399
+ - Complete report content ready to save
400
+ - For HTML: Standalone page with dark/light theme toggle, score ring, category accordion, recommendations table
401
+ - For Markdown: Structured document with priority-sorted recommendations
402
+
403
+ ---
404
+
405
+ ### export_bulk_report
406
+
407
+ Generate a styled report for bulk analysis.
408
+
409
+ **Description:** Creates a comprehensive report for comparing multiple domains, analysing a list of URLs, or crawling a sitemap. Returns a self-contained report with rankings, category breakdowns, and per-domain/page recommendations. You can pass pre-computed results to avoid re-crawling.
410
+
411
+ **Parameters:**
412
+
413
+ | Parameter | Type | Required | Description |
414
+ |-----------|------|----------|-------------|
415
+ | `format` | enum | Yes | Report format: `"markdown"` or `"html"` |
416
+ | `domains` | array[string] | No* | Compare 2-10 domains. Do not include `https://`. |
417
+ | `urls` | array[string] | No* | Analyse 1-50,000 specific URLs. Include `https://`. |
418
+ | `sitemap_url` | string | No* | Crawl a sitemap URL. |
419
+ | `analysis_results` | object | No* | Pre-computed results from `compare_domains`, `analyze_urls`, or `analyze_sitemap` (with `output_format="json"`). |
420
+ | `max_pages` | integer | No | For domain mode: pages per domain (1-10). Default: `10`. Ignored if `analysis_results` provided. |
421
+ | `max_urls` | integer | No | For sitemap mode: max URLs to analyse. Default: all. Ignored if `analysis_results` provided. |
422
+ | `rate_limit` | number | No | Max requests/second per domain. Default: `5`. Ignored if `analysis_results` provided. |
423
+
424
+ *Provide exactly one of: `domains`, `urls`, `sitemap_url`, or `analysis_results`.
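The "exactly one of" rule can be expressed as a small validation step. The sketch below shows one way a caller might check its arguments before invoking the tool; `pickSource` is a hypothetical helper, and the server's own validation and error messages may differ.

```javascript
const SOURCES = ["domains", "urls", "sitemap_url", "analysis_results"];

// Returns the name of the single input source present in the arguments,
// or throws when none or several are supplied.
function pickSource(args) {
  const present = SOURCES.filter((key) => args[key] !== undefined && args[key] !== null);
  if (present.length !== 1) {
    const got = present.length ? present.join(", ") : "none";
    throw new Error(`Provide exactly one of ${SOURCES.join(", ")} (got ${got})`);
  }
  return present[0];
}

console.log(pickSource({ format: "html", domains: ["example.com", "competitor.com"] })); // prints "domains"
```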
425
+
426
+ **Example:**
427
+ ```
428
+ Generate an HTML comparison report for example.com and competitor.com
429
+ ```
430
+
431
+ **Example (using pre-computed results):**
432
+ ```
433
+ # First, compare with JSON output:
434
+ compare_domains domains=["example.com", "competitor.com"] output_format="json"
435
+
436
+ # Then export without re-crawling:
437
+ export_bulk_report format="html" analysis_results=<result from above>
438
+ ```
439
+
440
+ **Returns:**
441
+ - **Domain comparison:** Rankings, category comparison table, quick facts, per-domain recommendations
442
+ - **URL/Sitemap analysis:** Per-page results, category averages, common issues across pages, weakest/strongest pages
443
+
444
+ ---
445
+
446
+ ## GEO Scoring Categories
447
+
448
+ The analysis evaluates 10 categories, each with a weight reflecting its importance for AI/LLM readiness:
449
+
450
+ | # | Category | Weight | What It Measures |
451
+ |---|----------|--------|------------------|
452
+ | 1 | **Structured Data & Schema** | 1.5x | JSON-LD presence, Schema.org types (FAQPage, Article, Product, etc.), Speakable markup, schema validation |
453
+ | 2 | **Semantic HTML** | 1.2x | Heading hierarchy (H1-H6), semantic elements (`<article>`, `<nav>`, `<main>`), content-to-markup ratio |
454
+ | 3 | **Accessibility for Agents** | 1.0x | Lang attribute, alt text on images, ARIA labels, descriptive link text |
455
+ | 4 | **Internal Linking** | 1.0x | Link density, navigation structure, breadcrumb markup |
456
+ | 5 | **Meta & Discoverability** | 1.0x | Title, meta description, canonical URL, Open Graph tags, hreflang |
457
+ | 6 | **Machine Readability** | 1.5x | SSR detection, bot blocking checks, robots.txt rules, llms.txt presence* |
458
+ | 7 | **Entity & Authority** | 1.0x | Author information, publication dates, organization schema |
459
+ | 8 | **Citability & Answer-Readiness** | 1.3x | FAQ content, data tables, lists, lead paragraph quality |
460
+ | 9 | **Performance & Crawlability** | 0.3x | Image dimensions, lazy loading, resource hints |
461
+ | 10 | **Agent Interactivity** | 0.2x | WebMCP tools, form annotations, agent-callable actions |
462
+
463
+ *\*llms.txt is checked for presence but is not currently supported or consumed by any major AI model or crawler. It has minimal practical impact on GEO readiness today — see the [`check_llms_txt`](#check_llms_txt) section for details.*
464
+
465
+ ### Scoring
466
+
467
+ - Each category produces a **score from 0-100**
468
+ - The **overall score** is a weighted average using the weights above
469
+ - Scores map to **letter grades**: A+ (90+), A (80+), B (70+), C (60+), D (40+), F (<40)
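As a worked illustration of the scoring rules, the sketch below computes a weighted average from the table's weights and maps it to a letter grade. The property names and sample scores are made up for the example, and no rounding is applied because the README does not say how the real implementation rounds.

```javascript
// Weights copied from the category table; the property names are illustrative.
const WEIGHTS = {
  structuredData: 1.5,
  semanticHtml: 1.2,
  accessibility: 1.0,
  internalLinking: 1.0,
  meta: 1.0,
  machineReadability: 1.5,
  entityAuthority: 1.0,
  citability: 1.3,
  performance: 0.3,
  agentInteractivity: 0.2,
};

// Weighted average: sum(weight * score) / sum(weight).
function overallScore(scores) {
  let weighted = 0;
  let total = 0;
  for (const [category, weight] of Object.entries(WEIGHTS)) {
    weighted += weight * scores[category];
    total += weight;
  }
  return weighted / total;
}

// Thresholds from the grade list above.
function letterGrade(score) {
  if (score >= 90) return "A+";
  if (score >= 80) return "A";
  if (score >= 70) return "B";
  if (score >= 60) return "C";
  if (score >= 40) return "D";
  return "F";
}

// Made-up category scores for demonstration.
const sampleScores = {
  structuredData: 40, semanticHtml: 80, accessibility: 90, internalLinking: 70,
  meta: 85, machineReadability: 60, entityAuthority: 50, citability: 55,
  performance: 100, agentInteractivity: 0,
};

const overall = overallScore(sampleScores);
console.log(overall.toFixed(2), letterGrade(overall));
```

The weights sum to 10, so the sample works out to about 64.25, which maps to a C. The perfect Performance score and the zero for Agent Interactivity barely move the result because of their low weights.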
470
+
471
+ ---
472
+
473
+ ## Rate Limiting
474
+
475
+ To prevent overwhelming target servers during batch operations, the MCP server enforces per-domain rate limiting:
476
+
477
+ ### Configuration
478
+
479
+ 1. **Environment variable:** Set `GLIPPY_RATE_LIMIT=3` for 3 requests/second default
480
+ 2. **Per-call parameter:** Pass `rate_limit` to `analyze_sitemap`, `analyze_urls`, or `export_bulk_report`
481
+
482
+ ### Recommended Values
483
+
484
+ | Scenario | Rate Limit | Description |
485
+ |----------|------------|-------------|
486
+ | Polite crawling | `0.5` - `1` | 1 request every 1-2 seconds |
487
+ | Default | `5` | 5 requests/second (balanced) |
488
+ | Your own server | `10` - `50` | Faster crawling when you control the target |
489
+ | Aggressive | `100` | Maximum speed (use with caution) |
490
+
491
+ ### How It Works
492
+
493
+ - Requests to different domains run in parallel
494
+ - Requests to the same domain are serialized with the configured delay
495
+ - Global concurrency is capped at 10 simultaneous requests
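The per-domain spacing can be illustrated with a small scheduling sketch. It assumes the delay between requests to one host is `1000 / rate_limit` milliseconds, and it deliberately leaves out the global cap of 10 concurrent requests. `scheduleStarts` is a hypothetical function, not part of the server.

```javascript
// Earliest start offsets (ms) for a batch of URLs when requests to the same
// host are spaced 1000 / rateLimit ms apart and different hosts are independent.
// The global concurrency cap is not modelled here.
function scheduleStarts(urls, rateLimit = 5) {
  const intervalMs = 1000 / rateLimit;
  const nextSlot = new Map(); // host -> next free start offset
  return urls.map((url) => {
    const host = new URL(url).hostname;
    const startMs = nextSlot.get(host) ?? 0;
    nextSlot.set(host, startMs + intervalMs);
    return { url, host, startMs };
  });
}

const plan = scheduleStarts(
  [
    "https://a.example/1",
    "https://a.example/2",
    "https://b.example/1",
    "https://a.example/3",
  ],
  2 // 2 requests/second per domain
);

console.log(plan.map((p) => `${p.host} @ ${p.startMs}ms`));
```

With a limit of 2 requests/second, the three `a.example` requests start at 0, 500, and 1000 ms, while the `b.example` request starts immediately because it belongs to a different domain.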
496
+
497
+ ---
498
+
499
+ ## Output Formats
500
+
501
+ ### Text (Default)
502
+
503
+ All tools return structured text output by default, suitable for:
504
+ - Inline display in chat
505
+ - Quick analysis and follow-up questions
506
+ - Programmatic parsing
507
+
508
+ ### Markdown Reports
509
+
510
+ Generated by `export_report` and `export_bulk_report`:
511
+ - Clean, readable structure
512
+ - Priority-sorted recommendations (High → Medium → Low)
513
+ - Tables for easy comparison
514
+ - Save as `.md` file
515
+
516
+ ### HTML Reports
517
+
518
+ Generated by `export_report` and `export_bulk_report`:
519
+ - Standalone, self-contained page (no external dependencies)
520
+ - Dark/light theme toggle with system preference detection
521
+ - Interactive category accordion
522
+ - Score ring visualization
523
+ - Copy recommendations button
524
+ - Print-friendly styling
525
+ - Save as `.html` file
526
+
527
+ ---
528
+
529
+ ## Caching & Efficient Workflows
530
+
531
+ The MCP server includes smart caching and result-passing features to avoid redundant crawling.
532
+
533
+ ### Automatic Caching
534
+
535
+ Analysis results are cached in memory for **5 minutes** with the following behaviour:
536
+
537
+ - **Key:** `domain + maxPages`. A cached result is reused when the same domain is analysed again within the 5-minute window.
538
+ - **Smart coverage:** A cached result that covers at least as many pages as requested is reused. For example, a request with `max_pages=3` is served from a cached `max_pages=5` result.
539
+ - **Automatic:** No configuration needed — just call tools normally and caching happens automatically
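The caching rules above can be sketched as a small in-memory cache. This illustrates the described behaviour (5-minute lifetime, `domain + maxPages` key, coverage check) and is not the server's actual code; the class name `AnalysisCache` and the injectable clock are assumptions made for the example.

```javascript
const TTL_MS = 5 * 60 * 1000; // 5 minutes, as stated above

class AnalysisCache {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // "domain:maxPages" -> entry
  }

  set(domain, maxPages, result) {
    this.entries.set(`${domain}:${maxPages}`, {
      domain, maxPages, result, storedAt: this.now(),
    });
  }

  // A hit needs a fresh entry for the same domain that crawled
  // at least as many pages as requested.
  get(domain, maxPages) {
    for (const [key, entry] of this.entries) {
      if (entry.domain !== domain || entry.maxPages < maxPages) continue;
      if (this.now() - entry.storedAt > TTL_MS) {
        this.entries.delete(key); // expired
        continue;
      }
      return entry.result;
    }
    return undefined;
  }
}

let clock = 0;
const cache = new AnalysisCache(() => clock);
cache.set("example.com", 5, { score: 72 });

console.log(cache.get("example.com", 3));  // hit: 3 pages are covered by the 5-page result
console.log(cache.get("example.com", 10)); // miss in this sketch: more pages requested than cached
clock += TTL_MS + 1;
console.log(cache.get("example.com", 3));  // miss: entry has expired
```

Treating a larger `max_pages` request as a miss is an inference from the coverage rule, not something the README states explicitly.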
540
+
541
+ **Example workflow (automatic):**
542
+ ```
543
+ # First call — crawls the site
544
+ analyze_domain domain="example.com" max_pages=5
545
+
546
+ # Second call within 5 minutes with the same max_pages: uses the cached result
546
+ export_report domain="example.com" format="html" max_pages=5
548
+ ```
549
+
550
+ ### JSON Output Mode
551
+
552
+ For explicit control, use `output_format="json"` to get raw analysis results that can be passed to export tools.
553
+
554
+ **Single domain workflow:**
555
+ ```
556
+ # Step 1: Analyze with JSON output
557
+ analyze_domain domain="example.com" max_pages=5 output_format="json"
558
+ # Returns full analysis object as JSON
559
+
560
+ # Step 2: Export multiple formats without re-crawling
561
+ export_report format="html" analysis_result=<JSON from step 1>
562
+ export_report format="markdown_full" analysis_result=<JSON from step 1>
563
+ ```
564
+
565
+ **Multi-domain workflow:**
566
+ ```
567
+ # Step 1: Compare with JSON output
568
+ compare_domains domains=["site1.com", "site2.com"] output_format="json"
569
+ # Returns array of analysis results
570
+
571
+ # Step 2: Generate report without re-crawling
572
+ export_bulk_report format="html" analysis_results=<JSON from step 1>
573
+ ```
574
+
575
+ **Sitemap/URL workflow:**
576
+ ```
577
+ # Step 1: Analyze sitemap with JSON output
578
+ analyze_sitemap sitemap_url="https://example.com/sitemap.xml" output_format="json"
579
+ # Returns { sitemap_url, pageResults, aggregated }
580
+
581
+ # Step 2: Generate report without re-crawling
582
+ export_bulk_report format="html" analysis_results=<JSON from step 1>
583
+ ```
584
+
585
+ ### When to Use Each Approach
586
+
587
+ | Scenario | Recommended Approach |
588
+ |----------|---------------------|
589
+ | Quick analysis + single export | Automatic caching (just call both tools) |
590
+ | Generate multiple report formats | JSON output mode (analyze once, export many) |
591
+ | Time-sensitive workflow | JSON output mode (guaranteed no re-crawling) |
592
+ | Interactive exploration | Automatic caching (ask questions, then export) |
593
+
594
+ ---
595
+
596
+ ## Architecture
597
+
598
+ ```
599
+ research-mcp/
600
+ ├── src/
601
+ │ ├── index.js # MCP server — tool registration, JSON-RPC handling, license validation
602
+ │ └── geo-checker.js # GEO analysis engine — fetches & scores domains
603
+ ├── package.json
604
+ └── README.md
605
+ ```
606
+
607
+ ### Analysis Flow
608
+
609
+ 1. **Fetch resources in parallel:**
610
+    - robots.txt
611
+    - llms.txt
612
+    - Homepage HTML
613
+    - sitemap.xml
614
+    - UCP profile (/.well-known/ucp)
615
+
616
+ 2. **Parse HTML with cheerio** (server-side DOM)
617
+
618
+ 3. **Run 10 weighted scoring categories**
619
+
620
+ 4. **Return comprehensive analysis** with actionable recommendations
621
+
622
+ ### Protocol
623
+
624
+ - **Transport:** stdio (JSON-RPC 2.0 over stdin/stdout)
625
+ - **SDK:** `@modelcontextprotocol/sdk` (official TypeScript MCP SDK)
626
+ - **Logging:** All logs go to stderr (stdout reserved for MCP protocol)
627
+
628
+ ---
629
+
630
+ ## Manual Testing
631
+
632
+ Test the MCP server directly via command line:
633
+
634
+ ```bash
635
+ # Send MCP init + tool list request via stdin
636
+ echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}
637
+ {"jsonrpc":"2.0","method":"notifications/initialized"}
638
+ {"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | GLIPPY_LICENSE_KEY=your-key node src/index.js 2>/dev/null
639
+ ```
640
+
641
+ ---
642
+
643
+ ## Troubleshooting
644
+
645
+ ### "License error: No license key configured"
646
+
647
+ **Cause:** The `GLIPPY_LICENSE_KEY` environment variable is not set.
648
+
649
+ **Fix:** Add the key to your MCP configuration:
650
+ ```json
651
+ "env": {
652
+   "GLIPPY_LICENSE_KEY": "GLMCP-XXXX-XXXX-XXXX"
653
+ }
654
+ ```
655
+
656
+ ### "License validation failed"
657
+
658
+ **Cause:** Invalid or expired license key.
659
+
660
+ **Fix:** Get a valid key at [glippy.dev](https://glippy.dev).
661
+
662
+ ### "Could not reach license server"
663
+
664
+ **Cause:** Network connectivity issue or firewall blocking.
665
+
666
+ **Fix:**
667
+ - Check your internet connection
668
+ - Ensure `glippy-mcp-api.info-8cb.workers.dev` is accessible
669
+ - If you have a cached valid license, the server will continue working for 24 hours
670
+
671
+ ### "Error analysing domain: HTTP 403/404"
672
+
673
+ **Cause:** Target site is blocking requests or page doesn't exist.
674
+
675
+ **Fix:**
676
+ - Verify the domain is accessible in a browser
677
+ - Some sites block automated requests — try a different domain
678
+ - Check if the site requires authentication
679
+
680
+ ### "No URLs found in sitemap"
681
+
682
+ **Cause:** The sitemap doesn't contain `<loc>` entries or uses an unexpected format.
683
+
684
+ **Fix:**
685
+ - Verify the sitemap URL returns valid XML
686
+ - Check that URLs in the sitemap match the expected domain
687
+ - For sitemap indexes, ensure sub-sitemaps are accessible
688
+
689
+ ### High memory usage during batch analysis
690
+
691
+ **Cause:** Analysing too many URLs at once.
692
+
693
+ **Fix:**
694
+ - Use `max_urls` parameter to limit sitemap crawling
695
+ - Reduce `max_pages` for domain comparison
696
+ - Process URLs in smaller batches
697
+
698
+ ---
699
+
700
+ ## AI Crawlers Detected
701
+
702
+ The server checks access rules for these AI crawlers in robots.txt:
703
+
704
+ | Crawler | Company | Purpose |
705
+ |---------|---------|---------|
706
+ | GPTBot | OpenAI | Training data for GPT models |
707
+ | ChatGPT-User | OpenAI | Real-time browsing in ChatGPT |
708
+ | Google-Extended | Google | Training data for Gemini (formerly Bard) |
709
+ | ClaudeBot | Anthropic | Training data for Claude |
710
+ | anthropic-ai | Anthropic | Anthropic's general crawler |
711
+ | CCBot | Common Crawl | Open web corpus |
712
+ | PerplexityBot | Perplexity AI | Search and answer engine |
713
+ | Bytespider | ByteDance | TikTok/Douyin AI features |
714
+ | AmazonBot | Amazon | Alexa and shopping AI |
715
+ | cohere-ai | Cohere | Enterprise AI models |
716
+
717
+ ---
718
+
719
+ ## License
720
+
721
+ See LICENSE file for licensing terms. Get your license key at [glippy.dev](https://glippy.dev).
722
+
723
+ ---
724
+
725
+ ## Support
726
+
727
+ - **Integration Guide:** [docs/INTEGRATIONS.md](docs/INTEGRATIONS.md)
728
+ - **Online Documentation:** [glippy.dev/docs](https://glippy.dev)
729
+ - **Issues:** [github.com/jbobbink/glippy/issues](https://github.com/jbobbink/glippy/issues)
730
+ - **Homepage:** [glippy.dev](https://glippy.dev)
731
+
732
+ ---
733
+
734
+ *Generated by [Glippy](https://www.glippy.dev) — GEO Agent-Readiness Checker*