npm - mcpbrowser - Versions diffs - 0.2.0 - Mend

mcpbrowser 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 cherchyk
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,100 @@
+# MCPBrowser (MCP fetch tool with authenticated Chrome)
+An MCP server that exposes an authenticated page fetch tool for GitHub Copilot. It drives your signed-in Chrome/Edge via DevTools, reusing your profile to read restricted pages.
+## Quick Install
+### Option 1: Install from GitHub (Recommended)
+```bash
+git clone https://github.com/cherchyk/MCPBrowser.git
+cd MCPBrowser
+npm install
+cp .env.example .env  # optional: set Chrome overrides (use `copy` in Windows cmd)
+```
+### Option 2: Install via npx (when published to npm)
+```bash
+npx mcpbrowser
+```
+## Prereqs
+- Chrome or Edge installed.
+- Node 18+.
+## Run (automatic via Copilot)
+- Add the MCP server entry to VS Code settings (`github.copilot.chat.modelContextProtocolServers`, see below). Copilot will start the server automatically when it needs the tool—no manual launch required.
+- On first use, the server auto-launches Chrome/Edge with remote debugging if it cannot find an existing DevTools endpoint. Defaults: port `9222`, user data dir `%LOCALAPPDATA%/ChromeAuthProfile`. Override with `CHROME_PATH`, `CHROME_USER_DATA_DIR`, or `CHROME_REMOTE_DEBUG_PORT`.
+- The old `scripts/start-all.ps1` launcher was removed; Chrome startup is handled inside the MCP server.
+## Manual start (optional)
+Only if you want to run it yourself (Copilot already starts it when configured):
+```bash
+npm run mcp
+```
+Or manually:
+```powershell
+& "C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir="$env:LOCALAPPDATA\ChromeAuthProfile"
+```
+Set `CHROME_PATH` if auto-detect fails; override profile with `CHROME_USER_DATA_DIR`, port with `CHROME_REMOTE_DEBUG_PORT`.
+## (Optional) Local runner
+There is no local LLM runner now; Copilot provides the LLM and calls this MCP tool. If you need a standalone agent later, we can add one that reuses the fetch logic.
+## Configure in VS Code (GitHub Copilot)
+### Step 1: Locate your mcp.json file
+- **Windows**: `%APPDATA%\Code\User\mcp.json`
+- **Linux**: `~/.config/Code/User/mcp.json` (macOS: `~/Library/Application Support/Code/User/mcp.json`)
+### Step 2: Add MCPBrowser server configuration
+Add this entry to your `mcp.json` file under the `"servers"` section:
+```jsonc
+"MCPBrowser": {
+	"type": "stdio",
+	"command": "node",
+	"args": ["<PATH_TO_MCPBROWSER>/src/mcp-browser.js"],
+	"description": "**USE THIS AUTOMATICALLY** when you encounter ANY authentication/authorization barrier: 401/403 errors, login redirects, 'access denied', SSO prompts, or pages requiring sign-in. Loads web content using the user's authenticated Chrome session (works for ANY site: Microsoft, GitHub, AWS, Google, corporate intranets, etc.). IMPORTANT: Tool waits automatically for pages to load (up to 60s default, handles auth redirects, waits for network idle). Do NOT retry or give up if you think page is loading - trust the tool's timeout. Do NOT ask permission - use immediately when auth is required."
+}
+```
+Replace `<PATH_TO_MCPBROWSER>` with the full path where you cloned this repository, for example:
+- Windows: `"D:/dev/MCPBrowser/src/mcp-browser.js"`
+- Linux/Mac: `"/home/user/MCPBrowser/src/mcp-browser.js"`
+### Step 3: Reload VS Code
+Restart VS Code or reload the window for the changes to take effect.
+### Step 4: Verify
+In Copilot Chat, you should see the `MCPBrowser` server listed. Ask it to load an authenticated URL and it will drive your signed-in Chrome session.
+## How it works
+- Tool `load_and_extract` (inside the MCP server) drives your live Chrome (DevTools Protocol) so it inherits your auth cookies, returning `text` and `html` (truncated up to 2M chars per field) for analysis.
+- **Domain-aware tab reuse**: Automatically reuses the same tab for URLs on the same domain, preserving the authentication session. A URL on a different domain closes the kept tab and opens a new one.
+- **Automatic page loading**: Waits for network idle (`networkidle0`) by default, ensuring JavaScript-heavy pages (SPAs, dashboards) fully load before returning content.
+- **Automatic auth detection**: Detects ANY authentication redirect (domain changes, login/auth/sso/oauth URLs) and waits for you to complete sign-in, then returns to target page.
+- **Universal compatibility**: Works with Microsoft, GitHub, AWS, Google, Okta, corporate SSO, or any authenticated site.
+- **Smart timeouts**: 60s default for page load, 10 min for auth redirects. Tabs stay open indefinitely for reuse (no auto-close).
+- GitHub Copilot's LLM invokes this tool via MCP; this repo itself does not run an LLM.
+## Auth-assisted fetch flow
+- Copilot can call with just the URL, or with no params if you set an env default (`DEFAULT_FETCH_URL` or `MCP_DEFAULT_FETCH_URL`). By default tabs stay open indefinitely for reuse (domain-aware).
+- First call opens the tab and leaves it open so you can sign in. No extra params needed.
+- After you sign in, call the same URL again; tab stays open for reuse. Set `keepPageOpen: false` to close immediately on success.
+- Optional fields (`authWaitSelector`, `waitForSelector`, `waitForUrlPattern`, etc.) are available but not required.
+## Configuration
+- `.env`: optional overrides for `CHROME_WS_ENDPOINT`, `CHROME_REMOTE_DEBUG_HOST/PORT`, `CHROME_PATH`, `CHROME_USER_DATA_DIR`.
+- To use a specific WS endpoint: set `CHROME_WS_ENDPOINT` to the `webSocketDebuggerUrl` value from the DevTools `/json/version` JSON (for example `http://127.0.0.1:9222/json/version`).
+## Tips
+- **Universal auth**: Works with ANY authenticated site (Microsoft, GitHub, AWS, Google, corporate intranets, SSO, OAuth, etc.)
+- **No re-authentication needed**: Automatically reuses the same tab for URLs on the same domain, keeping your auth session alive across multiple page fetches
+- **Automatic page loading**: Tool waits for pages to fully load (default 60s timeout, waits for network idle). Copilot should trust the tool and not retry manually.
+- **Auth redirect handling**: Auto-detects auth redirects by monitoring domain changes and common login URL patterns (`/login`, `/auth`, `/signin`, `/sso`, `/oauth`, `/saml`)
+- **Tabs stay open**: By default tabs remain open indefinitely for reuse. Set `keepPageOpen: false` to close immediately after successful fetch.
+- **Smart domain switching**: When switching domains, automatically closes the old tab and opens a new one to prevent tab accumulation
+- If you hit login pages, verify that the Chrome instance is signed in and that the site opens there.
+- Use a dedicated profile directory to avoid interfering with your daily Chrome.
+- For heavy pages, add `waitForSelector` to ensure post-login content appears before extraction.
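
The README describes auth redirect detection in terms of domain changes and common login URL patterns, but the source shown in this diff only waits for `networkidle0`. As a rough illustration of what such a heuristic might look like, here is a minimal sketch. The function name `looksLikeAuthRedirect` and the constant `AUTH_PATH_PATTERNS` are hypothetical names introduced for this example and are not code from the package; only the pattern list itself comes from the README.

```javascript
// Hypothetical helper, not part of mcpbrowser: guesses whether a navigation
// ended on a sign-in page instead of the requested target.
const AUTH_PATH_PATTERNS = ["/login", "/auth", "/signin", "/sso", "/oauth", "/saml"];

function looksLikeAuthRedirect(targetUrl, currentUrl) {
  const target = new URL(targetUrl);
  const current = new URL(currentUrl);
  // A hostname change suggests a hand-off to an identity provider.
  if (target.hostname !== current.hostname) return true;
  // Otherwise, look for a common login segment in the path (crude substring match).
  const path = current.pathname.toLowerCase();
  return AUTH_PATH_PATTERNS.some((pattern) => path.includes(pattern));
}

console.log(looksLikeAuthRedirect(
  "https://portal.example.com/reports",
  "https://login.example.com/oauth2/authorize"
)); // true (hostname changed)
console.log(looksLikeAuthRedirect(
  "https://portal.example.com/reports",
  "https://portal.example.com/reports"
)); // false (same host, no login-like path)
```

The substring match is deliberately simple and will misfire on paths such as `/authors`. A stricter variant could compare whole path segments, at the cost of missing vendor-specific login paths.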

package/package.json ADDED

@@ -0,0 +1,41 @@
+{
+  "name": "mcpbrowser",
+  "version": "0.2.0",
+  "type": "module",
+  "description": "MCP server that loads authenticated web pages using Chrome DevTools Protocol",
+  "main": "src/mcp-browser.js",
+  "bin": {
+    "mcpbrowser": "src/mcp-browser.js"
+  },
+  "scripts": {
+    "mcp": "node src/mcp-browser.js"
+  },
+  "keywords": [
+    "mcp",
+    "mcp-server",
+    "model-context-protocol",
+    "chrome",
+    "puppeteer",
+    "authentication",
+    "web-scraping",
+    "github-copilot"
+  ],
+  "author": "cherchyk",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/cherchyk/MCPBrowser.git"
+  },
+  "bugs": {
+    "url": "https://github.com/cherchyk/MCPBrowser/issues"
+  },
+  "homepage": "https://github.com/cherchyk/MCPBrowser#readme",
+  "dependencies": {
+    "@modelcontextprotocol/sdk": "^1.25.1",
+    "dotenv": "^16.4.5",
+    "puppeteer-core": "^23.4.1"
+  },
+  "engines": {
+    "node": ">=18.0.0"
+  }
+}

package/src/mcp-browser.js ADDED

@@ -0,0 +1,249 @@
+#!/usr/bin/env node
+import dotenv from "dotenv";
+import puppeteer from "puppeteer-core";
+import { existsSync } from "fs";
+import os from "os";
+import path from "path";
+import { spawn } from "child_process";
+import { Server } from "@modelcontextprotocol/sdk/server/index.js";
+import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
+import { ListToolsRequestSchema, CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
+dotenv.config();
+const chromeHost = process.env.CHROME_REMOTE_DEBUG_HOST || "127.0.0.1";
+const chromePort = Number(process.env.CHROME_REMOTE_DEBUG_PORT || 9222);
+const explicitWSEndpoint = process.env.CHROME_WS_ENDPOINT;
+const userDataDir = process.env.CHROME_USER_DATA_DIR || path.join(os.homedir(), "AppData/Local/ChromeAuthProfile");
+const chromePathEnv = process.env.CHROME_PATH;
+const defaultChromePaths = [
+  "C:/Program Files/Google/Chrome/Application/chrome.exe",
+  "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
+  "C:/Program Files/Microsoft/Edge/Application/msedge.exe",
+];
+let cachedBrowser = null;
+let lastKeptPage = null; // reuse the same tab when requested
+async function devtoolsAvailable() {
+  try {
+    const url = `http://${chromeHost}:${chromePort}/json/version`;
+    const res = await fetch(url, { method: "GET" });
+    if (!res.ok) return false;
+    const data = await res.json();
+    return Boolean(data.webSocketDebuggerUrl);
+  } catch {
+    return false;
+  }
+}
+function findChromePath() {
+  if (chromePathEnv && existsSync(chromePathEnv)) return chromePathEnv;
+  return defaultChromePaths.find((p) => existsSync(p));
+}
+async function launchChromeIfNeeded() {
+  if (explicitWSEndpoint) return; // user provided explicit endpoint; assume managed externally
+  if (await devtoolsAvailable()) return;
+  const chromePath = findChromePath();
+  if (!chromePath) {
+    throw new Error("Chrome/Edge not found. Set CHROME_PATH to your browser executable.");
+  }
+  const args = [`--remote-debugging-port=${chromePort}`, `--user-data-dir=${userDataDir}`];
+  const child = spawn(chromePath, args, { detached: true, stdio: "ignore" });
+  child.unref();
+  // Wait for DevTools to come up
+  const deadline = Date.now() + 20000;
+  while (Date.now() < deadline) {
+    if (await devtoolsAvailable()) return;
+    await new Promise((r) => setTimeout(r, 500));
+  }
+  throw new Error("Chrome did not become available on DevTools port; check CHROME_PATH/port/profile.");
+}
+async function resolveWSEndpoint() {
+  if (explicitWSEndpoint) return explicitWSEndpoint;
+  const url = `http://${chromeHost}:${chromePort}/json/version`;
+  const res = await fetch(url);
+  if (!res.ok) {
+    throw new Error(`Unable to reach Chrome devtools at ${url}: ${res.status}`);
+  }
+  const data = await res.json();
+  if (!data.webSocketDebuggerUrl) {
+    throw new Error("No webSocketDebuggerUrl in /json/version response");
+  }
+  return data.webSocketDebuggerUrl;
+}
+async function getBrowser() {
+  await launchChromeIfNeeded();
+  if (cachedBrowser && cachedBrowser.isConnected()) return cachedBrowser;
+  const wsEndpoint = await resolveWSEndpoint();
+  cachedBrowser = await puppeteer.connect({
+    browserWSEndpoint: wsEndpoint,
+    defaultViewport: null,
+  });
+  cachedBrowser.on("disconnected", () => {
+    cachedBrowser = null;
+    lastKeptPage = null;
+  });
+  return cachedBrowser;
+}
+async function fetchPage({
+  url,
+  keepPageOpen = true,
+}) {
+  // Hardcoded smart defaults
+  const waitUntil = "networkidle0";
+  const timeoutMs = 60000;
+  const reuseLastKeptPage = true;
+  if (!url) {
+    throw new Error("url parameter is required");
+  }
+  const browser = await getBrowser();
+  let page = null;
+  // Smart tab reuse: only reuse if same domain (preserves auth within domain)
+  if (reuseLastKeptPage && lastKeptPage && !lastKeptPage.isClosed()) {
+    let newHostname;
+    try {
+      newHostname = new URL(url).hostname;
+    } catch {
+      throw new Error(`Invalid URL: ${url}`);
+    }
+    const currentUrl = lastKeptPage.url();
+    if (currentUrl) {
+      try {
+        const currentHostname = new URL(currentUrl).hostname;
+        // Reuse tab only if same domain (keeps auth session alive)
+        if (currentHostname === newHostname) {
+          page = lastKeptPage;
+          await page.bringToFront().catch(() => {});
+        } else {
+          // Different domain - close old tab and create new one
+          await lastKeptPage.close().catch(() => {});
+          lastKeptPage = null;
+        }
+      } catch {
+        // If URL parsing fails, create new tab
+      }
+    }
+  }
+  // Create new tab if no reuse
+  if (!page) {
+    page = await browser.newPage();
+  }
+  let shouldKeepOpen = keepPageOpen || page === lastKeptPage;
+  let wasSuccess = false;
+  try {
+    await page.goto(url, { waitUntil, timeout: timeoutMs });
+    // Extract content
+    const text = await page.evaluate(() => document.body?.innerText || "");
+    const html = await page.evaluate(() => document.documentElement?.outerHTML || "");
+    wasSuccess = true;
+    if (keepPageOpen && lastKeptPage !== page) {
+      // Close old kept page if we're keeping a different one
+      if (lastKeptPage && !lastKeptPage.isClosed()) {
+        await lastKeptPage.close().catch(() => {});
+      }
+      lastKeptPage = page;
+    }
+    return {
+      success: true,
+      url: page.url(),
+      text: truncate(text, 2000000),
+      html: truncate(html, 2000000),
+    };
+  } catch (err) {
+    shouldKeepOpen = shouldKeepOpen || keepPageOpen;
+    const hint = shouldKeepOpen
+      ? "Tab is left open. Complete sign-in there, then call load_and_extract again with just the URL."
+      : undefined;
+    return { success: false, error: err.message || String(err), pageKeptOpen: shouldKeepOpen, hint };
+  } finally {
+    if (!shouldKeepOpen && lastKeptPage === page) {
+      lastKeptPage = null;
+    }
+    if (!shouldKeepOpen) {
+      await page.close().catch(() => {});
+    }
+  }
+}
+function truncate(str, max) {
+  if (!str) return "";
+  return str.length > max ? `${str.slice(0, max)}... [truncated]` : str;
+}
+async function main() {
+  const server = new Server({ name: "MCPBrowser", version: "0.2.0" }, { capabilities: { tools: {} } });
+  const tools = [
+    {
+      name: "load_and_extract",
+      description: "Fetch and extract content from authenticated web pages using a local Chrome/Edge browser via DevTools Protocol. Automatically detects auth redirects by waiting for network idle. Supports smart tab reuse within the same domain to preserve authentication sessions. Returns both plain text and HTML content.",
+      inputSchema: {
+        type: "object",
+        properties: {
+          url: { type: "string", description: "The URL to fetch" },
+          keepPageOpen: { type: "boolean", description: "Keep the tab open after fetching for manual auth or reuse (default: true)" },
+        },
+        required: ["url"],
+        additionalProperties: false,
+      },
+    },
+  ];
+  server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools }));
+  server.setRequestHandler(CallToolRequestSchema, async (request) => {
+    const { name, arguments: args } = request.params;
+    if (name !== "load_and_extract") {
+      throw new Error(`Unknown tool: ${name}`);
+    }
+    const safeArgs = args || {};
+    const fallbackUrl = process.env.DEFAULT_FETCH_URL || process.env.MCP_DEFAULT_FETCH_URL;
+    if (!safeArgs.url) {
+      if (fallbackUrl) {
+        safeArgs.url = fallbackUrl;
+      } else {
+        return {
+          content: [
+            {
+              type: "text",
+              text: JSON.stringify({ success: false, error: "Missing url and no DEFAULT_FETCH_URL/MCP_DEFAULT_FETCH_URL configured" }),
+            },
+          ],
+        };
+      }
+    }
+    const result = await fetchPage(safeArgs);
+    return {
+      content: [
+        {
+          type: "text",
+          text: JSON.stringify(result),
+        },
+      ],
+    };
+  });
+  const transport = new StdioServerTransport();
+  await server.connect(transport);
+}
+main().catch((err) => {
+  console.error(err);
+  process.exit(1);
+});
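
The tab-reuse branch in `fetchPage` comes down to a hostname comparison between the kept tab's current URL and the requested URL. The sketch below isolates that decision as a pure function so it can be reasoned about without a browser. `decideTabAction` and its return labels are hypothetical names introduced here. Unlike the code above, it treats an unparsable requested URL as `"new"` rather than throwing; that is an assumption made to keep the example small.

```javascript
// Hypothetical helper mirroring the hostname check in fetchPage.
// Returns "reuse", "replace" (close the kept tab, open a new one), or "new".
function decideTabAction(keptTabUrl, requestedUrl) {
  if (!keptTabUrl) return "new"; // no kept tab, or it reports an empty URL
  let keptHost;
  let requestedHost;
  try {
    keptHost = new URL(keptTabUrl).hostname;
    requestedHost = new URL(requestedUrl).hostname;
  } catch {
    return "new"; // assumption: fall back to a fresh tab on unparsable input
  }
  return keptHost === requestedHost ? "reuse" : "replace";
}

console.log(decideTabAction("https://github.com/a", "https://github.com/b")); // reuse
console.log(decideTabAction("https://github.com/a", "https://learn.microsoft.com/")); // replace
console.log(decideTabAction(null, "https://github.com/a")); // new
```

Because the comparison uses the full hostname, `a.example.com` and `b.example.com` count as different sites here, just as they do in the original check.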