mcp-sequential-research 1.0.0 → 1.1.0

package/README.md CHANGED
@@ -229,12 +229,13 @@ This format is designed for downstream claim-mining tools.
 
 Works with other MCP servers:
 
-| Source Type | Recommended MCP |
-|-------------|-----------------|
-| Patents | Google Patents MCP (`search_patents`) |
-| Web | Google Search MCP (Custom Search) |
-| Memory | Memory MCP (`search_nodes`) |
-| Academic | Semantic Scholar API |
+| Source Type | Recommended MCP | Tool |
+|-------------|-----------------|------|
+| Patents | Google Patents MCP | `search_patents` |
+| Web Search | Google Search MCP | `google_search` |
+| Web Scraping | Google Search MCP | `read_webpage` |
+| Memory | Memory MCP | `search_nodes` |
+| Academic | Semantic Scholar API | — |
 
 ## License
 
@@ -32,14 +32,13 @@ This is the exact **operator loop** Claude Code follows for comprehensive resear
 │ └─→ sequential-research:sequential_research_plan(prompt, constraints) │
 │ ↓ │
 │ 2. WEB QUERIES (for each plan.queries where query_family == "web") │
-│ └─→ google-search:search({query, num: 10}) │
+│ └─→ google-search:google_search({query, num: 10}) │
 │ Returns: {title, link, snippet}[] │
 │ ↓ │
 │ 3. SCRAPE WEB CONTENT │
 │ ├─→ Collect top URLs (dedupe) │
-│ ├─→ If multiple: firecrawl:firecrawl_batch_scrape({urls}) │
-│ └─→ If single: firecrawl:firecrawl_scrape({url}) │
-│ Returns: {markdown, metadata} │
+│ └─→ For each URL: google-search:read_webpage({url}) │
+│ Returns: {title, text, url} │
 │ ↓ │
 │ 4. PATENT QUERIES (for each plan.queries where query_family == "patent") │
 │ └─→ google-patents:search_patents({query, num_results}) │
@@ -89,7 +88,7 @@ For each query where `query_family == "web"`:
 
 ```json
 {
-  "tool": "google-search:search",
+  "tool": "google-search:google_search",
   "arguments": {
     "query": "photonic computing silicon photonics site:.edu OR filetype:pdf",
     "num": 10
@@ -110,36 +109,16 @@ For each query where `query_family == "web"`:
 }
 ```
 
-### Step 3: Scrape Web Content via Firecrawl MCP
+### Step 3: Scrape Web Content via Google Search MCP `read_webpage`
 
-After collecting search results, extract full content using Firecrawl.
+After collecting search results, extract full content using `read_webpage`.
 
-**For multiple URLs (recommended for efficiency):**
+**For each URL:**
 ```json
 {
-  "tool": "firecrawl:firecrawl_batch_scrape",
+  "tool": "google-search:read_webpage",
   "arguments": {
-    "urls": [
-      "https://example.mit.edu/photonics.pdf",
-      "https://lightmatter.co/technology",
-      "https://ieee.org/article/photonic-computing"
-    ],
-    "options": {
-      "formats": ["markdown"],
-      "onlyMainContent": true
-    }
-  }
-}
-```
-
-**For a single URL:**
-```json
-{
-  "tool": "firecrawl:firecrawl_scrape",
-  "arguments": {
-    "url": "https://example.mit.edu/photonics.pdf",
-    "formats": ["markdown"],
-    "onlyMainContent": true
+    "url": "https://example.mit.edu/photonics.pdf"
   }
 }
 ```
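Editor's note on the new Step 3 flow (collect the top URLs, dedupe, then fetch each with `read_webpage`): a minimal TypeScript sketch, assuming a hypothetical `callTool` helper that stands in for however an MCP client dispatches tool calls. Result shapes follow the README's own examples.

```typescript
// Sketch of the Step 3 scrape loop. `callTool` is a hypothetical stand-in
// for an MCP client's tool-dispatch method; tool names match the README.
type SearchHit = { title: string; link: string; snippet: string };
type Page = { title: string; text: string; url: string };

async function scrapeResults(
  callTool: (name: string, args: object) => Promise<unknown>,
  hits: SearchHit[],
): Promise<Page[]> {
  // Collect top URLs and dedupe before scraping.
  const urls = [...new Set(hits.map((h) => h.link))];
  const pages: Page[] = [];
  for (const url of urls) {
    pages.push((await callTool("google-search:read_webpage", { url })) as Page);
  }
  return pages;
}
```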
@@ -147,17 +126,14 @@ After collecting search results, extract full content using Firecrawl.
 **Response format:**
 ```json
 {
-  "success": true,
-  "data": {
-    "markdown": "# Silicon Photonics for AI\n\nRecent advances...",
-    "metadata": {
-      "title": "Silicon Photonics for AI",
-      "sourceURL": "https://example.mit.edu/photonics.pdf"
-    }
-  }
+  "title": "Silicon Photonics for AI - MIT",
+  "text": "# Silicon Photonics for AI\n\nRecent advances in silicon photonics have enabled...",
+  "url": "https://example.mit.edu/photonics.pdf"
 }
 ```
 
+**Note:** Call `read_webpage` for each URL sequentially or in parallel. The tool automatically converts HTML to readable text and handles most page types.
+
 ### Step 4: Execute Patent Queries via Google Patents MCP
 
 For each query where `query_family == "patent"`:
@@ -234,7 +210,7 @@ Transform all responses into the standard schema with sequential source IDs:
 2. Deduplicate URLs before assigning IDs
 3. Patents get `source_type: "patent"` with extra fields
 4. Web content gets `source_type: "web"`
-5. Firecrawl markdown content goes in `excerpt` (truncated if needed)
+5. The `text` from `read_webpage` goes in `excerpt` (truncated if needed)
 
 ### Step 6: Call `sequential_research_compile`
 
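Editor's note on the Step 5 normalization rules in the hunk above: a sketch of a hypothetical `toRawResult` helper that maps a `read_webpage` page onto a `raw_results` entry. The `id` field name and the 2000-character truncation limit are illustrative assumptions, not documented values.

```typescript
// Sketch of Step 5 normalization: map a read_webpage result onto a
// raw_results[] entry. The `id` field name and the 2000-char truncation
// limit are illustrative assumptions.
type Page = { title: string; text: string; url: string };

function toRawResult(page: Page, id: number) {
  return {
    id, // sequential source ID, assigned after URL dedupe
    source_type: "web" as const,
    title: page.title,
    url: page.url,
    excerpt: page.text.slice(0, 2000), // `text` goes in `excerpt`, truncated
  };
}
```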
@@ -280,8 +256,8 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 | Step | MCP Server | Tool | Purpose |
 |------|-----------|------|---------|
 | 1 | sequential-research | `sequential_research_plan` | Generate structured query plan |
-| 2 | google-search | `search` | Get web search results |
-| 3 | firecrawl | `firecrawl_scrape` / `firecrawl_batch_scrape` | Extract full page content |
+| 2 | google-search | `google_search` | Get web search results |
+| 3 | google-search | `read_webpage` | Extract full page content |
 | 4 | google-patents | `search_patents` | Search patent database |
 | 5 | — | — | Normalize to raw_results[] |
 | 6 | sequential-research | `sequential_research_compile` | Generate report with citations |
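Editor's note tying the pipeline table together: a minimal end-to-end skeleton, reusing the hypothetical `callTool` dispatcher plus the `scrapeResults` and `toRawResult` sketches above. The `plan.queries` and `query_family` fields come from the operator-loop diagram; the `q.query` field, the shape of `search_patents` results, and the `raw_results` argument name for the compile call are assumptions.

```typescript
// Minimal end-to-end skeleton of the six-step pipeline. Reuses the
// hypothetical `callTool`, `scrapeResults`, and `toRawResult` sketches
// above; compile's `raw_results` argument name is an assumption.
async function runPipeline(
  callTool: (name: string, args: object) => Promise<any>,
  prompt: string,
) {
  // Step 1: generate the structured query plan.
  const plan = await callTool("sequential-research:sequential_research_plan", { prompt });

  const rawResults: object[] = [];
  let id = 1;

  for (const q of plan.queries) {
    if (q.query_family === "web") {
      // Steps 2-3: web search, then scrape each deduped URL.
      const hits = await callTool("google-search:google_search", { query: q.query, num: 10 });
      for (const page of await scrapeResults(callTool, hits)) {
        rawResults.push(toRawResult(page, id++)); // Step 5 (web)
      }
    } else if (q.query_family === "patent") {
      // Step 4: patent search; Step 5 keeps patent-specific fields.
      const patents = await callTool("google-patents:search_patents", { query: q.query, num_results: 10 });
      for (const p of patents) rawResults.push({ id: id++, source_type: "patent", ...p });
    }
  }

  // Step 6: compile the cited report from the normalized results.
  return callTool("sequential-research:sequential_research_compile", { raw_results: rawResults });
}
```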
@@ -289,31 +265,40 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 
 ---
 
-## Firecrawl Integration Details
+## Web Scraping with `read_webpage`
+
+The `google-search:read_webpage` tool provides a simple, reliable way to fetch web content without additional MCP server dependencies.
 
-### When to Use Batch vs Single Scrape
+### Usage
 
-| Scenario | Tool | Reason |
-|----------|------|--------|
-| 5+ URLs from same query | `firecrawl_batch_scrape` | Efficiency, rate limiting |
-| 1-4 URLs | `firecrawl_scrape` (multiple calls) | Lower overhead |
-| PDF documents | `firecrawl_scrape` | Better PDF handling |
-| Real-time updates needed | `firecrawl_scrape` | Immediate results |
+```json
+{
+  "tool": "google-search:read_webpage",
+  "arguments": {
+    "url": "https://example.com/article"
+  }
+}
+```
 
-### Firecrawl Options
+### Response Format
 
 ```json
 {
-  "formats": ["markdown"], // Output format
-  "onlyMainContent": true, // Skip headers/footers
-  "includeTags": ["article", "main", "section"], // Focus on content
-  "excludeTags": ["nav", "footer", "aside"], // Skip navigation
-  "waitFor": 2000, // Wait for JS rendering (ms)
-  "timeout": 30000 // Request timeout (ms)
+  "title": "Article Title",
+  "text": "The full text content of the page...",
+  "url": "https://example.com/article"
 }
 ```
 
-### Handling Firecrawl Errors
+### Features
+
+- **Automatic HTML to text conversion** — Clean, readable output
+- **No additional setup** — Uses the same google-search MCP server
+- **Handles most page types** — HTML, some PDFs, etc.
+
+### Handling Scrape Errors
+
+If a URL fails to scrape, include it in results with a note:
 
 ```json
 {
@@ -326,13 +311,24 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
       "source_type": "web",
       "title": "Page Title (scrape failed)",
       "url": "https://example.com/blocked",
-      "excerpt": "[Content unavailable - scrape blocked by robots.txt]"
+      "excerpt": "[Content unavailable - page could not be fetched]"
     }
   ],
   "execution_notes": "1 of 5 URLs failed to scrape"
 }
 ```
 
+### Parallel Execution
+
+You can call `read_webpage` for multiple URLs in parallel to improve throughput:
+
+```
+// Execute these concurrently:
+google-search:read_webpage({url: "https://site1.com/page"})
+google-search:read_webpage({url: "https://site2.com/page"})
+google-search:read_webpage({url: "https://site3.com/page"})
+```
+
 ## Citation Format Requirement
 
 Citations must be **stable** and **machine-parseable** for downstream claim-mining.
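Editor's note making the parallel pattern above concrete: a sketch using `Promise.allSettled`, which also applies the failure placeholder from "Handling Scrape Errors" so failed URLs stay in the results. `callTool` is again a hypothetical dispatch helper.

```typescript
// Sketch of parallel scraping with a per-URL failure fallback, following
// the "Handling Scrape Errors" convention. `callTool` is hypothetical.
type Page = { title: string; text: string; url: string };

async function scrapeAllParallel(
  callTool: (name: string, args: object) => Promise<unknown>,
  urls: string[],
): Promise<Page[]> {
  const settled = await Promise.allSettled(
    urls.map((url) => callTool("google-search:read_webpage", { url })),
  );
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? (result.value as Page)
      : {
          // Keep the failed URL in the results with a note, per the README.
          title: "Page Title (scrape failed)",
          text: "[Content unavailable - page could not be fetched]",
          url: urls[i],
        },
  );
}
```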
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "mcp-sequential-research",
-  "version": "1.0.0",
+  "version": "1.1.0",
   "description": "MCP server for sequential research planning and compilation",
   "type": "module",
   "main": "dist/index.js",