npm - agent-search-mcp - Versions diffs - 2.1.0 - Mend

agent-search-mcp 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/CHANGELOG.md +80 -0
package/LICENSE +207 -0
package/README.md +480 -0
package/dist/aggregation/dedup.js +102 -0
package/dist/aggregation/format.js +60 -0
package/dist/aggregation/index.js +3 -0
package/dist/aggregation/scorer.js +110 -0
package/dist/cli.js +169 -0
package/dist/engines/baidu.js +56 -0
package/dist/engines/bing.js +58 -0
package/dist/engines/brave.js +33 -0
package/dist/engines/duckduckgo.js +47 -0
package/dist/engines/exa.js +46 -0
package/dist/engines/index.js +25 -0
package/dist/engines/sogou.js +132 -0
package/dist/engines/tavily.js +33 -0
package/dist/index.js +46 -0
package/dist/infrastructure/cache.js +24 -0
package/dist/infrastructure/config.js +18 -0
package/dist/infrastructure/health.js +86 -0
package/dist/infrastructure/html-utils.js +10 -0
package/dist/infrastructure/http.js +66 -0
package/dist/infrastructure/index.js +9 -0
package/dist/infrastructure/logger.js +9 -0
package/dist/infrastructure/rate-limiter.js +12 -0
package/dist/infrastructure/security.js +158 -0
package/dist/infrastructure/url-validator.js +33 -0
package/dist/tools/capabilities.js +35 -0
package/dist/tools/fetch-tools.js +200 -0
package/dist/tools/free-extract.js +43 -0
package/dist/tools/free-search-advanced.js +40 -0
package/dist/tools/free-search.js +380 -0
package/dist/tools/health.js +9 -0
package/dist/types.js +1 -0
package/package.json +68 -0

package/README.md ADDED

@@ -0,0 +1,480 @@
+# Agent Search MCP
+> 🔍 Free multi-source search for AI agents — multi-source verification, token savings, MCP native.
+[![License](https://img.shields.io/github/license/lennney/agent-search-mcp)](LICENSE)
+[![Node.js](https://img.shields.io/badge/node-%3E%3D18-brightgreen)](package.json)
+[![MCP](https://img.shields.io/badge/MCP-compatible-blue)](https://modelcontextprotocol.io)
+[![Tests](https://img.shields.io/badge/tests-65%20passing-brightgreen)](https://github.com/lennney/agent-search-mcp)
+**Works with Hermes, Claude Code, Cursor, Windsurf, OpenClaw, Codex, and any MCP-compatible client.**
+---
+[English](#why-agent-search-mcp) · [中文 (Chinese)](README_zh.md) · [Install](#quick-start) · [Tools](#tools) · [Competitor Comparison](#competitor-comparison)
+---
+## Why Agent Search MCP
+**AI agents need to search the internet. But existing solutions have problems:**
+- **Tavily** — Great quality, but $0.01/search adds up fast. Monthly cost: $20-50+.
+- **Exa** — Semantic search is powerful, but $50/month minimum.
+- **Brave Search** — 2000 free queries/month, then $3/1000. Not enough for heavy use.
+- **DDG MCP** — Single source, no verification, no dedup, results vary wildly.
+- **open-websearch** — 13 engines, but 300MB+ dependency tree, no token optimization.
+**Agent Search MCP solves this differently:**
+1. **Free + high quality** — DuckDuckGo + Sogou as core engines, no API key needed
+2. **Multi-source verification** — Results cross-checked across engines, each result gets a confidence score (1-3)
+3. **Token optimization** — Title ≤100 chars, snippet ≤200 chars, dedup removes redundancy. Saves ~40-50% tokens.
+4. **MCP native** — Built for Model Context Protocol from day one. Zero config, works out of the box.
+5. **Self-hostable** — No data sent to third parties. Run it on your own VPS.
+6. **Security built-in** — Prompt injection detection, output boundary markers, phishing URL filtering.
+**Who is this for?**
+- AI agent developers (Hermes, OpenClaw, custom agents)
+- IDE users who want AI-powered search (Claude Code, Cursor, Windsurf)
+- Anyone building MCP-compatible tools
+- Users who need Chinese web search (Sogou integration)
+**The math:** If you search 100 times/day, Tavily costs ~$1/day. Agent Search MCP costs $0. Over a year, that's $365 saved.
+---
+## Competitor Comparison
+| Feature | Agent Search MCP | Tavily | Exa | Brave Search | DDG MCP |
+|---------|:---:|:---:|:---:|:---:|:---:|
+| **Price** | Free | $0.01/search | $50/mo | $3/1000 | Free |
+| **API Key** | Not required | Required | Required | Required | Required |
+| **Multi-source** | ✅ 2-4 engines | ❌ Single | ❌ Single | ❌ Single | ❌ Single |
+| **Confidence score** | ✅ 1-3 | ❌ | ❌ | ❌ | ❌ |
+| **Deduplication** | ✅ URL + title | ❌ | ❌ | ❌ | ❌ |
+| **Token optimization** | ✅ ~40-50% | ❌ | ❌ | ❌ | ❌ |
+| **Chinese search** | ✅ Sogou | ❌ | ❌ | ❌ | ❌ |
+| **MCP native** | ✅ | ✅ | ✅ | ✅ | ✅ |
+| **Self-hostable** | ✅ | ❌ Cloud only | ❌ Cloud only | ❌ Cloud only | ✅ |
+| **Progressive disclosure** | ✅ 3 tools | ❌ | ❌ | ❌ | ❌ |
+| **Health monitoring** | ✅ | ❌ | ❌ | ❌ | ❌ |
+| **Fallback chain** | ✅ Free→Paid | ❌ | ❌ | ❌ | ❌ |
+| **Security** | ✅ Injection protection | ❌ | ❌ | ❌ | ❌ |
+| **Dependencies** | 4 | 12+ | 15+ | 8 | 3 |
+**Key differences:**
+1. **Free by default** — No API key, no credit card, no paid quota. DuckDuckGo + Sogou work out of the box.
+2. **Multi-source verification** — Results from multiple engines are cross-checked. Confidence score tells you how reliable a result is.
+3. **Token optimization** — Smart truncation and dedup reduce token consumption by ~40-50%. This is crucial for cost-sensitive applications.
+4. **Chinese support** — Sogou engine provides native Chinese web search. Not a translation layer.
+5. **Progressive disclosure** — 3 tools at different complexity levels. Agents discover capabilities on demand, following the Exa model.
+6. **Security** — Built-in prompt injection detection, phishing URL filtering, and output boundary markers.
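`scorer.js` is not included in this part of the diff, so the following is only a sketch of how a 1-3 confidence score could be derived. It assumes the score is the number of distinct engines that returned a URL, capped at 3, which matches the Features description of a score "based on how many sources returned it". `assignConfidence` is a hypothetical name, and the URL normalization copies the `normalizeUrl` logic from `dedup.js`.

```javascript
// Hypothetical sketch: derive a 1-3 confidence score from how many distinct
// engines returned each URL. Not the package's real scorer.js.

// Same normalization as dedup.js: host + path, no trailing slash, lowercased.
function normalizeUrl(url) {
  try {
    const u = new URL(url);
    return `${u.hostname}${u.pathname.replace(/\/$/, "")}`.toLowerCase();
  } catch {
    return url.toLowerCase();
  }
}

// results: [{ url, title, snippet, source }] collected from all engines.
function assignConfidence(results) {
  // Collect the distinct engines that returned each normalized URL.
  const enginesByUrl = new Map();
  for (const r of results) {
    const key = normalizeUrl(r.url);
    if (!enginesByUrl.has(key)) enginesByUrl.set(key, new Set());
    enginesByUrl.get(key).add(r.source);
  }
  // Keep the first result per URL and attach a capped score.
  const seen = new Set();
  const scored = [];
  for (const r of results) {
    const key = normalizeUrl(r.url);
    if (seen.has(key)) continue;
    seen.add(key);
    const engines = [...enginesByUrl.get(key)];
    scored.push({ ...r, engines, confidence: Math.min(3, engines.length) });
  }
  // Highest confidence first; Array.prototype.sort is stable, so ties keep input order.
  return scored.sort((a, b) => b.confidence - a.confidence);
}

const ranked = assignConfidence([
  { url: "https://example.com/a/", title: "A", snippet: "", source: "duckduckgo" },
  { url: "https://EXAMPLE.com/a", title: "A", snippet: "", source: "sogou" },
  { url: "https://example.com/b", title: "B", snippet: "", source: "sogou" },
]);
// The /a result (seen by two engines) ranks above /b (seen by one).
console.log(ranked.map(r => `${r.confidence} ${r.url}`));
```

This version counts distinct engines rather than raw occurrences. That way, one engine listing the same page twice cannot inflate the score.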
+---
+## Quick Start
+### Prerequisites
+- Node.js >= 18
+- Python 3 with `ddgs` library:
+```bash
+pip install ddgs
+```
+### Install
+```bash
+# Option 1: npx (recommended)
+npx agent-search-mcp
+# Option 2: global install
+npm install -g agent-search-mcp
+```
+### Platform Setup
+<details>
+<summary><b>Hermes</b></summary>
+```yaml
+# ~/.hermes/config.yaml
+mcp_servers:
+  agent-search:
+    command: npx
+    args: ["agent-search-mcp"]
+```
+</details>
+<details>
+<summary><b>Claude Code</b></summary>
+```json
+// ~/.claude/mcp.json
+{
+  "mcpServers": {
+    "agent-search": {
+      "command": "npx",
+      "args": ["agent-search-mcp"]
+    }
+  }
+}
+```
+</details>
+<details>
+<summary><b>Cursor</b></summary>
+```json
+// .cursor/mcp.json
+{
+  "mcpServers": {
+    "agent-search": {
+      "command": "npx",
+      "args": ["agent-search-mcp"]
+    }
+  }
+}
+```
+</details>
+<details>
+<summary><b>Windsurf</b></summary>
+```json
+// ~/.codeium/windsurf/mcp_config.json
+{
+  "mcpServers": {
+    "agent-search": {
+      "command": "npx",
+      "args": ["agent-search-mcp"]
+    }
+  }
+}
+```
+</details>
+<details>
+<summary><b>OpenClaw</b></summary>
+```typescript
+// openclaw.config.ts
+// Assumes the config file default-exports a plain object.
+export default {
+  mcpServers: {
+    "agent-search": {
+      command: "npx",
+      args: ["agent-search-mcp"]
+    }
+  }
+};
+```
+</details>
+<details>
+<summary><b>Codex</b></summary>
+```json
+// ~/.codex/mcp.json
+{
+  "mcpServers": {
+    "agent-search": {
+      "command": "npx",
+      "args": ["agent-search-mcp"]
+    }
+  }
+}
+```
+</details>
+---
+## Features
+- **Free by default** — DuckDuckGo + Sogou as core engines, no API key required. Brave and Tavily available as optional paid fallback.
+- **Multi-source verification** — Results cross-checked across engines, each result gets a confidence score (1-3) based on how many sources returned it.
+- **Token optimization** — Title truncation (≤100 chars), snippet truncation (≤200 chars), URL + title dedup. Saves ~40-50% tokens.
+- **Progressive disclosure** — 3 tools at different complexity levels. `free_search` for quick queries, `free_search_advanced` for filtered search, `free_extract` for page content. Agents discover capabilities on-demand.
+- **Fallback chain** — Free engines first, paid engines as backup. Automatic merge, dedup, and scoring.
+- **Health monitoring** — Real-time provider health tracking. Unhealthy providers filtered automatically.
+- **Security** — Prompt injection detection, output boundary markers, phishing URL filtering, and security metadata on every response.
+- **CLI tool** — Use it from the terminal to search, extract web pages, or run an HTTP server.
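The fallback chain described above can be sketched in two phases. First, query the free engines in parallel. Then, only if they return fewer than `count` unique URLs, also query whichever paid engines are configured. The engine shape (`{ name, search }`), the `runFallbackChain` name, and the "fewer than `count`" trigger are assumptions for illustration. The package's actual trigger condition is not shown in this diff.

```javascript
// Hypothetical sketch of a free-first, paid-second fallback chain.
// Each engine is assumed to look like { name, search(query) -> Promise<result[]> }.

// Query engines in parallel; a failing engine contributes no results
// instead of failing the whole search.
async function queryAll(engines, query) {
  const settled = await Promise.allSettled(engines.map(e => e.search(query)));
  return settled.flatMap((s, i) =>
    s.status === "fulfilled"
      ? s.value.map(r => ({ ...r, source: engines[i].name }))
      : []
  );
}

async function runFallbackChain(query, { free, paid = [], count = 5 }) {
  // Phase 1: free engines (e.g. DuckDuckGo + Sogou).
  let results = await queryAll(free, query);
  const uniqueUrls = new Set(results.map(r => r.url)).size;
  // Phase 2: paid engines, only when free engines found too few unique URLs.
  if (uniqueUrls < count && paid.length > 0) {
    results = results.concat(await queryAll(paid, query));
  }
  // Merging, dedup, and confidence scoring would follow here.
  return results;
}

// Stub engines: Sogou fails and DuckDuckGo finds one URL, so Brave is queried.
const fake = (name, urls, fail = false) => ({
  name,
  search: async () => {
    if (fail) throw new Error(`${name} unavailable`);
    return urls.map(url => ({ url, title: url, snippet: "" }));
  },
});

runFallbackChain("mcp server", {
  free: [fake("duckduckgo", ["https://a.dev"]), fake("sogou", [], true)],
  paid: [fake("brave", ["https://b.dev"])],
  count: 2,
}).then(results => console.log(results.map(r => r.source))); // [ 'duckduckgo', 'brave' ]
```

Because of `Promise.allSettled`, one engine outage (here, Sogou) degrades the result set instead of aborting the search.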
+---
+## CLI Usage
+agent-search-mcp also works as a CLI tool; the examples below use its `fasm` command.
+### Install
+```bash
+npm install -g agent-search-mcp
+```
+### Search
+```bash
+# Basic search
+fasm search "TypeScript MCP server"
+# With options
+fasm search "query" --count 5 --engines bing,baidu
+# JSON output
+fasm search "query" --json
+```
+### Extract Web Page
+```bash
+fasm extract "https://example.com"
+fasm extract "https://example.com" --json
+```
+### Start HTTP Server
+```bash
+fasm serve --port 8080
+```
+### Help
+```bash
+fasm --help
+```
+---
+## Tools
+### `free_search`
+Basic web search with multi-source verification.
+```json
+{
+  "query": "TypeScript MCP server",
+  "count": 5
+}
+```
+**Returns:** Array of search results with confidence scores.
+### `free_search_advanced`
+Advanced search with filters.
+```json
+{
+  "query": "MCP server",
+  "count": 10,
+  "min_confidence": 2,
+  "time_range": "week",
+  "language": "zh",
+  "include_domains": ["github.com"],
+  "exclude_domains": ["reddit.com"]
+}
+```
+**Parameters:**
+- `min_confidence` (1-3): Only return results confirmed by at least this many engines
+- `time_range`: day, week, month, year
+- `language`: auto, en, zh
+- `include_domains`: Only return results from these domains
+- `exclude_domains`: Drop results from these domains
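Here is one way the confidence and domain filters could be applied to already-scored results, as a sketch. `applyFilters` is a hypothetical helper, and it assumes that a domain also matches its subdomains (so `github.com` matches `gist.github.com`). `time_range` and `language` are left out because they are assumed to be forwarded to the engines rather than applied afterward.

```javascript
// Hypothetical post-filter for free_search_advanced-style parameters.
// Assumes each result has { url, confidence }.

// True if host equals domain or is a subdomain of it (e.g. docs.github.com).
function hostMatches(host, domain) {
  const d = domain.toLowerCase();
  return host === d || host.endsWith(`.${d}`);
}

// Invalid URLs get an empty host, which never matches a domain.
function hostOf(url) {
  try {
    return new URL(url).hostname.toLowerCase();
  } catch {
    return "";
  }
}

function applyFilters(results, {
  min_confidence = 1,
  include_domains = [],
  exclude_domains = [],
  count = 10,
} = {}) {
  return results
    .filter(r => r.confidence >= min_confidence)
    .filter(r => {
      const host = hostOf(r.url);
      if (include_domains.length > 0 && !include_domains.some(d => hostMatches(host, d))) {
        return false;
      }
      return !exclude_domains.some(d => hostMatches(host, d));
    })
    .slice(0, count);
}

const filtered = applyFilters(
  [
    { url: "https://github.com/org/repo", confidence: 3 },
    { url: "https://gist.github.com/x", confidence: 1 },
    { url: "https://www.reddit.com/r/mcp", confidence: 2 },
  ],
  { min_confidence: 2, include_domains: ["github.com"] },
);
console.log(filtered.map(r => r.url)); // [ 'https://github.com/org/repo' ]
```

Matching on a `.`-prefixed suffix, rather than a plain `endsWith(domain)`, keeps `notgithub.com` from matching `github.com`.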
+### `free_extract`
+Extract full content from a URL as Markdown.
+```json
+{
+  "url": "https://example.com/article",
+  "max_length": 5000
+}
+```
+**Returns:** Markdown content with metadata.
+---
+## Resources
+### `search://capabilities`
+Returns a Markdown document describing all available tools and features. Agents can discover capabilities on-demand.
+### `search://health`
+Returns JSON with health status of each search provider. Useful for monitoring and debugging.
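`health.js` is not shown in this part of the diff. The sketch below only illustrates one plausible way to track provider health and skip unhealthy providers: a provider is marked unhealthy after 3 consecutive failures and retried after a cool-down. The `HealthTracker` class, the threshold of 3, and the 60-second cool-down are all assumptions. `snapshot()` returns the kind of per-provider JSON that a `search://health` resource could expose.

```javascript
// Hypothetical provider health tracker; thresholds are illustrative.
class HealthTracker {
  constructor({ maxFailures = 3, cooldownMs = 60_000, now = () => Date.now() } = {}) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, handy for tests
    this.stats = new Map(); // provider -> counters
  }

  #get(provider) {
    if (!this.stats.has(provider)) {
      this.stats.set(provider, { successes: 0, failures: 0, consecutiveFailures: 0, lastFailureAt: null });
    }
    return this.stats.get(provider);
  }

  recordSuccess(provider) {
    const s = this.#get(provider);
    s.successes += 1;
    s.consecutiveFailures = 0; // one success resets the failure streak
  }

  recordFailure(provider) {
    const s = this.#get(provider);
    s.failures += 1;
    s.consecutiveFailures += 1;
    s.lastFailureAt = this.now();
  }

  isHealthy(provider) {
    const s = this.stats.get(provider);
    if (!s || s.consecutiveFailures < this.maxFailures) return true;
    // After the cool-down, allow the provider to be tried again.
    return this.now() - s.lastFailureAt >= this.cooldownMs;
  }

  // Drop unhealthy providers before a search fans out.
  filterHealthy(providers) {
    return providers.filter(p => this.isHealthy(p));
  }

  // JSON-friendly per-provider view.
  snapshot() {
    return Object.fromEntries(
      [...this.stats].map(([p, s]) => [p, { healthy: this.isHealthy(p), ...s }])
    );
  }
}

let t = 0;
const tracker = new HealthTracker({ now: () => t });
for (let i = 0; i < 3; i++) tracker.recordFailure("sogou");
tracker.recordSuccess("duckduckgo");
console.log(tracker.filterHealthy(["duckduckgo", "sogou"])); // [ 'duckduckgo' ]
console.log(JSON.stringify(tracker.snapshot()));
```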
+---
+## Configuration
+### Environment Variables
+| Variable | Description | Required |
+|----------|-------------|----------|
+| `BRAVE_API_KEY` | Brave Search API key (2000 free/month) | No |
+| `TAVILY_API_KEY` | Tavily API key (1000 free/month) | No |
+| `LOG_LEVEL` | Log level (info, debug) | No |
+**Zero config works** — no API keys needed for basic search.
+### With Paid Engines
+Set environment variables to enable fallback to paid engines when free results are insufficient:
+```bash
+export BRAVE_API_KEY=your_key_here
+export TAVILY_API_KEY=your_key_here
+```
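The configured engine list could follow from these variables like this: the free engines are always enabled, and Brave and Tavily are added only when their keys are set. `selectEngines` is a hypothetical helper, not the package's `config.js`. It reads from an `env` argument, defaulting to `process.env`, only to make it easy to test.

```javascript
// Hypothetical: decide which engines are available from environment variables.
function selectEngines(env = process.env) {
  const free = ["duckduckgo", "sogou"]; // no API key needed
  const paid = [];
  // Treat unset or whitespace-only keys as "not configured".
  if (env.BRAVE_API_KEY?.trim()) paid.push("brave");
  if (env.TAVILY_API_KEY?.trim()) paid.push("tavily");
  return { free, paid };
}

// Only Brave is configured here, so paid contains just "brave".
console.log(selectEngines({ BRAVE_API_KEY: "abc" }));
```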
+---
+## Dependencies
+| Dependency | License | Purpose |
+|------------|---------|---------|
+| @modelcontextprotocol/sdk | MIT | MCP protocol |
+| zod | MIT | Schema validation |
+| pino | MIT | Logging |
+| yaml | ISC | Config parsing |
+| ddgs (Python) | MIT | DuckDuckGo search backend (bypasses anti-bot) |
+**Note:** `ddgs` is a Python library called via subprocess. It must be installed separately:
+```bash
+pip install ddgs
+```
+---
+## Architecture
+```
+Agent
+  ↓ MCP Protocol (stdio)
+MCP Server
+  ├── Tools Layer (progressive disclosure)
+  │   ├── free_search (default)
+  │   ├── free_search_advanced (optional)
+  │   └── free_extract (optional)
+  ├── Aggregation Layer
+  │   ├── Top-1 Snippet merge
+  │   ├── URL + Title dedup
+  │   ├── Scoring + Confidence
+  │   └── Output truncation
+  ├── Security Layer
+  │   ├── Prompt injection detection
+  │   ├── Output boundary markers
+  │   ├── Phishing URL filtering
+  │   └── Security metadata
+  ├── Fallback Chain
+  │   ├── Phase 1: Free engines (DDG + Sogou)
+  │   └── Phase 2: Paid engines (Brave + Tavily)
+  └── Infrastructure
+      ├── Cache (LRU, 60s TTL)
+      ├── Rate Limiter (1s per provider)
+      ├── Health Tracker
+      └── SSRF Protection
+```
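The architecture lists an LRU cache with a 60 s TTL. The source of `cache.js` is not included in this part of the diff, so here is a minimal sketch of that technique built on `Map` insertion order. Reads move an entry to the most-recent end, writes evict the oldest entry once `maxEntries` is exceeded, and expired entries are dropped on read. `TtlLruCache`, the `maxEntries` default, and the `query|count` key format are assumptions.

```javascript
// Hypothetical LRU cache with per-entry TTL. A Map iterates in insertion
// order, so its first key is always the least recently used entry.
class TtlLruCache {
  constructor({ maxEntries = 500, ttlMs = 60_000, now = () => Date.now() } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    this.map = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.map.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Evict least recently used entries beyond capacity.
    while (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value);
    }
  }
}

let clock = 0;
const cache = new TtlLruCache({ maxEntries: 2, now: () => clock });
cache.set("mcp|5", ["r1"]);
cache.set("rust|5", ["r2"]);
cache.get("mcp|5");               // touch: "rust|5" is now least recent
cache.set("go|5", ["r3"]);        // over capacity: evicts "rust|5"
console.log(cache.get("rust|5")); // undefined (evicted)
clock = 60_000;
console.log(cache.get("mcp|5"));  // undefined (expired after 60 s)
```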
+---
+## Documentation
+| Document | Description |
+|----------|-------------|
+| [PRD](docs/prd.md) | Product Requirements Document |
+| [Architecture](docs/architecture.md) | Technical Architecture |
+| [Plan](docs/plan.md) | Implementation Plan |
+| [Review Results](docs/review-results.md) | 5-Team Review Results |
+| [Fork Plan](docs/fork-plan.md) | Fork & Modification Plan |
+| [CHANGELOG](CHANGELOG.md) | Version History |
+---
+## Development
+```bash
+# Clone
+git clone https://github.com/lennney/agent-search-mcp.git
+cd agent-search-mcp
+# Install
+npm install
+# Build
+npm run build
+# Test
+npm test
+# Run
+npm start
+```
+---
+## Roadmap
+- [ ] v0.1.0 — Initial release with DDG + Sogou
+- [ ] v0.2.0 — Brave + Tavily fallback
+- [ ] v0.3.0 — Health monitoring + rate limiting
+- [ ] v1.0.0 — Stable release with documentation
+- [ ] v1.1.0 — Plugin system for custom engines
+- [ ] v2.0.0 — Browser-based extraction (Playwright)
+---
+## License
+[Apache License 2.0](LICENSE)
+Based on [open-websearch](https://github.com/Aas-ee/open-websearch) by Aas-ee.
+```
+Copyright 2025 Open-WebSearch MCP Server Contributors
+Based on open-websearch by Aas-ee (Apache 2.0).
+Modified by Agent Search MCP Contributors.
+Copyright 2026 Agent Search MCP Contributors
+```
+---
+## Contributing

package/dist/aggregation/dedup.js ADDED

@@ -0,0 +1,102 @@
+export function normalizeUrl(url) {
+    try {
+        const u = new URL(url);
+        return `${u.hostname}${u.pathname.replace(/\/$/, '')}`.toLowerCase();
+    }
+    catch {
+        return url.toLowerCase();
+    }
+}
+/**
+ * Provider-aware dedup: same provider only searches once.
+ * From ddgs: track which providers we've already queried.
+ */
+export function dedupByProvider(engines) {
+    // Map engine -> provider (e.g., 'duckduckgo' -> 'bing', 'sogou' -> 'sogou')
+    const providerMap = {
+        duckduckgo: 'bing', // DDG uses Bing backend
+        sogou: 'sogou',
+        brave: 'brave',
+        tavily: 'tavily',
+    };
+    const seenProviders = new Set();
+    const uniqueEngines = [];
+    for (const engine of engines) {
+        const provider = providerMap[engine] || engine;
+        if (!seenProviders.has(provider)) {
+            seenProviders.add(provider);
+            uniqueEngines.push(engine);
+        }
+    }
+    return uniqueEngines;
+}
+/**
+ * URL dedup with frequency counting.
+ * From ddgs: track how many engines returned each URL.
+ * Keep the item with longer body (richer content).
+ */
+export function dedupByUrl(results) {
+    const seen = new Map();
+    const frequencies = new Map();
+    for (const r of results) {
+        const key = normalizeUrl(r.url);
+        frequencies.set(key, (frequencies.get(key) || 0) + 1);
+        if (!seen.has(key)) {
+            seen.set(key, r);
+        }
+        else {
+            // From ddgs: keep the item with longer body (richer content)
+            const existing = seen.get(key);
+            if ((r.snippet?.length || 0) > (existing.snippet?.length || 0)) {
+                seen.set(key, r);
+            }
+        }
+    }
+    return { results: Array.from(seen.values()), frequencies };
+}
+/**
+ * Title dedup with Jaccard similarity.
+ */
+export function dedupByTitle(results, threshold = 0.85) {
+    const kept = [];
+    for (const r of results) {
+        const isDuplicate = kept.some(k => jaccard(k.title, r.title) > threshold);
+        if (!isDuplicate)
+            kept.push(r);
+    }
+    return kept;
+}
+/**
+ * Filter low-quality results.
+ * From ddgs: post_extract_results filters ads and invalid results.
+ */
+export function filterLowQuality(results) {
+    return results.filter(r => {
+        // Filter empty or very short (< 20 chars) snippets
+        if (!r.snippet || r.snippet.length < 20)
+            return false;
+        // Filter DDG ads
+        if (r.url.includes('y.js?') || r.url.includes('/ad/'))
+            return false;
+        // Filter invalid URLs
+        if (!r.url.startsWith('http'))
+            return false;
+        // Filter DDG ad redirects
+        if (r.url.includes('duckduckgo.com/y.js'))
+            return false;
+        // Filter search engine internal links
+        if (r.url.includes('sogou.com/link'))
+            return false;
+        // Filter Wikipedia categories (low quality)
+        if (r.url.includes('wikipedia.org/wiki/Category:'))
+            return false;
+        return true;
+    });
+}
+function jaccard(a, b) {
+    const setA = new Set(a.split(/\s+/));
+    const setB = new Set(b.split(/\s+/));
+    const intersection = new Set([...setA].filter(x => setB.has(x)));
+    const union = new Set([...setA, ...setB]);
+    return union.size > 0 ? intersection.size / union.size : 0;
+}
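`jaccard` splits titles on whitespace and compares tokens case-sensitively. Combined with the default threshold of 0.85 in `dedupByTitle`, this means only near-identical titles are treated as duplicates. The standalone check below copies `jaccard` verbatim from the diff above. It also shows that a Chinese title without spaces becomes a single token, so two different Chinese titles score 0.

```javascript
// Verbatim copy of jaccard() from dist/aggregation/dedup.js.
function jaccard(a, b) {
  const setA = new Set(a.split(/\s+/));
  const setB = new Set(b.split(/\s+/));
  const intersection = new Set([...setA].filter(x => setB.has(x)));
  const union = new Set([...setA, ...setB]);
  return union.size > 0 ? intersection.size / union.size : 0;
}

const pairs = [
  ["TypeScript MCP server guide", "TypeScript MCP server tutorial"], // 3 shared / 5 total = 0.6
  ["MCP Server", "mcp server"],                                      // case-sensitive: 0
  ["搜索引擎对比", "搜索引擎评测"],                                    // one token each: 0
  ["Agent Search MCP", "Agent Search MCP"],                          // identical: 1
];
for (const [a, b] of pairs) {
  const score = jaccard(a, b);
  // dedupByTitle drops the later title only when score > 0.85.
  console.log(score.toFixed(2), score > 0.85 ? "duplicate" : "kept", "|", a, "|", b);
}
```

With one differing word on each side, two titles need at least 12 shared words to exceed 0.85 (12/14 ≈ 0.857). In practice, title dedup catches mostly exact or near-exact repeats.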

package/dist/aggregation/format.js ADDED

@@ -0,0 +1,60 @@
+import { processResultSecurity, getSecurityNote, wrapWithBoundaryMarkers } from '../infrastructure/security.js';
+/**
+ * Format search results with security processing.
+ *
+ * Security features:
+ * - Snippet injection detection and marking
+ * - URL phishing detection
+ * - Boundary markers for agent clarity
+ * - Security metadata per result
+ */
+export function formatResults(results) {
+    // Process security for each result
+    const secured = results.map(r => processResultSecurity(r));
+    return {
+        results: secured.map(r => ({
+            title: r.title.slice(0, 100),
+            url: r.url,
+            snippet: r.snippet.slice(0, 200),
+            confidence: r.confidence,
+            // Only include security details if threats detected
+            ...(r.security.injectionDetected || !r.security.urlSafe ? {
+                security: {
+                    injection_detected: r.security.injectionDetected,
+                    url_safe: r.security.urlSafe,
+                    threats: r.security.threats,
+                    warnings: r.security.warnings,
+                },
+            } : {}),
+        })),
+        meta: {
+            total: results.length,
+            high_confidence: results.filter(r => r.confidence >= 2).length,
+            engines: [...new Set(results.flatMap(r => r.engines || [r.source]))],
+        },
+        security_note: getSecurityNote(),
+    };
+}
+/**
+ * Format results as XML boundary-marked output.
+ * Useful for agents that need clear data/instruction separation.
+ */
+export function formatResultsXml(results) {
+    const header = [
+        '<?xml version="1.0" encoding="UTF-8"?>',
+        '<search-response>',
+        `  <security-note>${getSecurityNote()}</security-note>`,
+        '  <results>',
+    ].join('\n');
+    const body = results.map(r => {
+        const secured = processResultSecurity(r);
+        return wrapWithBoundaryMarkers({
+            title: secured.title.slice(0, 100),
+            url: secured.url,
+            snippet: secured.snippet.slice(0, 200),
+            confidence: secured.confidence,
+        });
+    }).join('\n');
+    const footer = '  </results>\n</search-response>';
+    return [header, body, footer].join('\n');
+}

package/dist/aggregation/index.js ADDED

@@ -0,0 +1,3 @@
+export { dedupByProvider, dedupByUrl, dedupByTitle, filterLowQuality, normalizeUrl } from './dedup.js';
+export { scoreAndRank } from './scorer.js';
+export { formatResults } from './format.js';