mcp-sequential-research 1.0.0 → 1.1.0

package/README.md CHANGED
@@ -229,12 +229,13 @@ This format is designed for downstream claim-mining tools.
 
 Works with other MCP servers:
 
-| Source Type | Recommended MCP |
-|-------------|-----------------|
-| Patents | Google Patents MCP (`search_patents`) |
-| Web | Google Search MCP (Custom Search) |
-| Memory | Memory MCP (`search_nodes`) |
-| Academic | Semantic Scholar API |
+| Source Type | Recommended MCP | Tool |
+|-------------|-----------------|------|
+| Patents | Google Patents MCP | `search_patents` |
+| Web Search | Google Search MCP | `google_search` |
+| Web Scraping | Google Search MCP | `read_webpage` |
+| Memory | Memory MCP | `search_nodes` |
+| Academic | Semantic Scholar API | — |
 
 ## License
 
@@ -32,14 +32,13 @@ This is the exact **operator loop** Claude Code follows for comprehensive resear
 │ └─→ sequential-research:sequential_research_plan(prompt, constraints) │
 │ ↓ │
 │ 2. WEB QUERIES (for each plan.queries where query_family == "web") │
-│ └─→ google-search:search({query, num: 10}) │
+│ └─→ google-search:google_search({query, num: 10}) │
 │ Returns: {title, link, snippet}[] │
 │ ↓ │
 │ 3. SCRAPE WEB CONTENT │
 │ ├─→ Collect top URLs (dedupe) │
-│ ├─→ If multiple: firecrawl:firecrawl_batch_scrape({urls}) │
-│ └─→ If single: firecrawl:firecrawl_scrape({url}) │
-│ Returns: {markdown, metadata} │
+│ └─→ For each URL: google-search:read_webpage({url}) │
+│ Returns: {title, text, url} │
 │ ↓ │
 │ 4. PATENT QUERIES (for each plan.queries where query_family == "patent") │
 │ └─→ google-patents:search_patents({query, num_results}) │
@@ -89,7 +88,7 @@ For each query where `query_family == "web"`:
 
 ```json
 {
-  "tool": "google-search:search",
+  "tool": "google-search:google_search",
   "arguments": {
     "query": "photonic computing silicon photonics site:.edu OR filetype:pdf",
     "num": 10
@@ -110,36 +109,16 @@ For each query where `query_family == "web"`:
 }
 ```
 
-### Step 3: Scrape Web Content via Firecrawl MCP
+### Step 3: Scrape Web Content via Google Search MCP `read_webpage`
 
-After collecting search results, extract full content using Firecrawl.
+After collecting search results, extract full content using `read_webpage`.
 
-**For multiple URLs (recommended for efficiency):**
+**For each URL:**
 ```json
 {
-  "tool": "firecrawl:firecrawl_batch_scrape",
+  "tool": "google-search:read_webpage",
   "arguments": {
-    "urls": [
-      "https://example.mit.edu/photonics.pdf",
-      "https://lightmatter.co/technology",
-      "https://ieee.org/article/photonic-computing"
-    ],
-    "options": {
-      "formats": ["markdown"],
-      "onlyMainContent": true
-    }
-  }
-}
-```
-
-**For a single URL:**
-```json
-{
-  "tool": "firecrawl:firecrawl_scrape",
-  "arguments": {
-    "url": "https://example.mit.edu/photonics.pdf",
-    "formats": ["markdown"],
-    "onlyMainContent": true
+    "url": "https://example.mit.edu/photonics.pdf"
   }
 }
 ```
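Editor's note on the new Step 3 flow (collect the top URLs, dedupe, then fetch each with `read_webpage`): a minimal TypeScript sketch, assuming a hypothetical `callTool` helper that stands in for however an MCP client dispatches tool calls. Result shapes follow the README's own examples.

```typescript
// Sketch of the Step 3 scrape loop. `callTool` is a hypothetical stand-in
// for an MCP client's tool-dispatch method; tool names match the README.
type SearchHit = { title: string; link: string; snippet: string };
type Page = { title: string; text: string; url: string };

async function scrapeResults(
  callTool: (name: string, args: object) => Promise<unknown>,
  hits: SearchHit[],
): Promise<Page[]> {
  // Collect top URLs and dedupe before scraping.
  const urls = [...new Set(hits.map((h) => h.link))];
  const pages: Page[] = [];
  for (const url of urls) {
    pages.push((await callTool("google-search:read_webpage", { url })) as Page);
  }
  return pages;
}
```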
@@ -147,17 +126,14 @@ After collecting search results, extract full content using Firecrawl.
 **Response format:**
 ```json
 {
-  "success": true,
-  "data": {
-    "markdown": "# Silicon Photonics for AI\n\nRecent advances...",
-    "metadata": {
-      "title": "Silicon Photonics for AI",
-      "sourceURL": "https://example.mit.edu/photonics.pdf"
-    }
-  }
+  "title": "Silicon Photonics for AI - MIT",
+  "text": "# Silicon Photonics for AI\n\nRecent advances in silicon photonics have enabled...",
+  "url": "https://example.mit.edu/photonics.pdf"
 }
 ```
 
+**Note:** Call `read_webpage` for each URL sequentially or in parallel. The tool automatically converts HTML to readable text and handles most page types.
+
 ### Step 4: Execute Patent Queries via Google Patents MCP
 
 For each query where `query_family == "patent"`:
@@ -234,7 +210,7 @@ Transform all responses into the standard schema with sequential source IDs:
 2. Deduplicate URLs before assigning IDs
 3. Patents get `source_type: "patent"` with extra fields
 4. Web content gets `source_type: "web"`
-5. Firecrawl markdown content goes in `excerpt` (truncated if needed)
+5. The `text` from `read_webpage` goes in `excerpt` (truncated if needed)
 
 ### Step 6: Call `sequential_research_compile`
 
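Editor's note on the Step 5 normalization rules in the hunk above: a sketch of a hypothetical `toRawResult` helper that maps a `read_webpage` page onto a `raw_results` entry. The `id` field name and the 2000-character truncation limit are illustrative assumptions, not documented values.

```typescript
// Sketch of Step 5 normalization: map a read_webpage result onto a
// raw_results[] entry. The `id` field name and the 2000-char truncation
// limit are illustrative assumptions.
type Page = { title: string; text: string; url: string };

function toRawResult(page: Page, id: number) {
  return {
    id, // sequential source ID, assigned after URL dedupe
    source_type: "web" as const,
    title: page.title,
    url: page.url,
    excerpt: page.text.slice(0, 2000), // `text` goes in `excerpt`, truncated
  };
}
```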
@@ -280,8 +256,8 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 | Step | MCP Server | Tool | Purpose |
 |------|-----------|------|---------|
 | 1 | sequential-research | `sequential_research_plan` | Generate structured query plan |
-| 2 | google-search | `search` | Get web search results |
-| 3 | firecrawl | `firecrawl_scrape` / `firecrawl_batch_scrape` | Extract full page content |
+| 2 | google-search | `google_search` | Get web search results |
+| 3 | google-search | `read_webpage` | Extract full page content |
 | 4 | google-patents | `search_patents` | Search patent database |
 | 5 | — | — | Normalize to raw_results[] |
 | 6 | sequential-research | `sequential_research_compile` | Generate report with citations |
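Editor's note tying the pipeline table together: a minimal end-to-end skeleton, reusing the hypothetical `callTool` dispatcher plus the `scrapeResults` and `toRawResult` sketches above. The `plan.queries` and `query_family` fields come from the operator-loop diagram; the `q.query` field, the shape of `search_patents` results, and the `raw_results` argument name for the compile call are assumptions.

```typescript
// Minimal end-to-end skeleton of the six-step pipeline. Reuses the
// hypothetical `callTool`, `scrapeResults`, and `toRawResult` sketches
// above; compile's `raw_results` argument name is an assumption.
async function runPipeline(
  callTool: (name: string, args: object) => Promise<any>,
  prompt: string,
) {
  // Step 1: generate the structured query plan.
  const plan = await callTool("sequential-research:sequential_research_plan", { prompt });

  const rawResults: object[] = [];
  let id = 1;

  for (const q of plan.queries) {
    if (q.query_family === "web") {
      // Steps 2-3: web search, then scrape each deduped URL.
      const hits = await callTool("google-search:google_search", { query: q.query, num: 10 });
      for (const page of await scrapeResults(callTool, hits)) {
        rawResults.push(toRawResult(page, id++)); // Step 5 (web)
      }
    } else if (q.query_family === "patent") {
      // Step 4: patent search; Step 5 keeps patent-specific fields.
      const patents = await callTool("google-patents:search_patents", { query: q.query, num_results: 10 });
      for (const p of patents) rawResults.push({ id: id++, source_type: "patent", ...p });
    }
  }

  // Step 6: compile the cited report from the normalized results.
  return callTool("sequential-research:sequential_research_compile", { raw_results: rawResults });
}
```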
@@ -289,31 +265,40 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 
 ---
 
-## Firecrawl Integration Details
+## Web Scraping with `read_webpage`
+
+The `google-search:read_webpage` tool provides a simple, reliable way to fetch web content without additional MCP server dependencies.
 
-### When to Use Batch vs Single Scrape
+### Usage
 
-| Scenario | Tool | Reason |
-|----------|------|--------|
-| 5+ URLs from same query | `firecrawl_batch_scrape` | Efficiency, rate limiting |
-| 1-4 URLs | `firecrawl_scrape` (multiple calls) | Lower overhead |
-| PDF documents | `firecrawl_scrape` | Better PDF handling |
-| Real-time updates needed | `firecrawl_scrape` | Immediate results |
+```json
+{
+  "tool": "google-search:read_webpage",
+  "arguments": {
+    "url": "https://example.com/article"
+  }
+}
+```
 
-### Firecrawl Options
+### Response Format
 
 ```json
 {
-  "formats": ["markdown"], // Output format
-  "onlyMainContent": true, // Skip headers/footers
-  "includeTags": ["article", "main", "section"], // Focus on content
-  "excludeTags": ["nav", "footer", "aside"], // Skip navigation
-  "waitFor": 2000, // Wait for JS rendering (ms)
-  "timeout": 30000 // Request timeout (ms)
+  "title": "Article Title",
+  "text": "The full text content of the page...",
+  "url": "https://example.com/article"
 }
 ```
 
-### Handling Firecrawl Errors
+### Features
+
+- **Automatic HTML to text conversion** — Clean, readable output
+- **No additional setup** — Uses the same google-search MCP server
+- **Handles most page types** — HTML, some PDFs, etc.
+
+### Handling Scrape Errors
+
+If a URL fails to scrape, include it in results with a note:
 
 ```json
 {
@@ -326,13 +311,24 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
       "source_type": "web",
       "title": "Page Title (scrape failed)",
       "url": "https://example.com/blocked",
-      "excerpt": "[Content unavailable - scrape blocked by robots.txt]"
+      "excerpt": "[Content unavailable - page could not be fetched]"
     }
   ],
   "execution_notes": "1 of 5 URLs failed to scrape"
 }
 ```
 
+### Parallel Execution
+
+You can call `read_webpage` for multiple URLs in parallel to improve throughput:
+
+```
+// Execute these concurrently:
+google-search:read_webpage({url: "https://site1.com/page"})
+google-search:read_webpage({url: "https://site2.com/page"})
+google-search:read_webpage({url: "https://site3.com/page"})
+```
+
 ## Citation Format Requirement
 
 Citations must be **stable** and **machine-parseable** for downstream claim-mining.
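Editor's note making the parallel pattern above concrete: a sketch using `Promise.allSettled`, which also applies the failure placeholder from "Handling Scrape Errors" so failed URLs stay in the results. `callTool` is again a hypothetical dispatch helper.

```typescript
// Sketch of parallel scraping with a per-URL failure fallback, following
// the "Handling Scrape Errors" convention. `callTool` is hypothetical.
type Page = { title: string; text: string; url: string };

async function scrapeAllParallel(
  callTool: (name: string, args: object) => Promise<unknown>,
  urls: string[],
): Promise<Page[]> {
  const settled = await Promise.allSettled(
    urls.map((url) => callTool("google-search:read_webpage", { url })),
  );
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? (result.value as Page)
      : {
          // Keep the failed URL in the results with a note, per the README.
          title: "Page Title (scrape failed)",
          text: "[Content unavailable - page could not be fetched]",
          url: urls[i],
        },
  );
}
```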
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "mcp-sequential-research",
-  "version": "1.0.0",
+  "version": "1.1.0",
   "description": "MCP server for sequential research planning and compilation",
   "type": "module",
   "main": "dist/index.js",