mcp-sequential-research 1.0.0 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -229,12 +229,13 @@ This format is designed for downstream claim-mining tools.
 
  Works with other MCP servers:
 
- | Source Type | Recommended MCP |
- |-------------|-----------------|
- | Patents | Google Patents MCP (`search_patents`) |
- | Web | Google Search MCP (Custom Search) |
- | Memory | Memory MCP (`search_nodes`) |
- | Academic | Semantic Scholar API |
+ | Source Type | Recommended MCP | Tool |
+ |-------------|-----------------|------|
+ | Patents | Google Patents MCP | `search_patents` |
+ | Web Search | Google Search MCP | `google_search` |
+ | Web Scraping | Google Search MCP | `read_webpage` |
+ | Memory | Memory MCP | `search_nodes` |
+ | Academic | Semantic Scholar API | — |
 
  ## License
 
@@ -32,14 +32,13 @@ This is the exact **operator loop** Claude Code follows for comprehensive resear
  │ └─→ sequential-research:sequential_research_plan(prompt, constraints) │
  │ ↓ │
  │ 2. WEB QUERIES (for each plan.queries where query_family == "web") │
- │ └─→ google-search:search({query, num: 10})
+ │ └─→ google-search:google_search({query, num: 10})
  │ Returns: {title, link, snippet}[] │
  │ ↓ │
  │ 3. SCRAPE WEB CONTENT │
  │ ├─→ Collect top URLs (dedupe) │
- ├─→ If multiple: firecrawl:firecrawl_batch_scrape({urls})
- └─→ If single: firecrawl:firecrawl_scrape({url})
- │ Returns: {markdown, metadata} │
+ └─→ For each URL: google-search:read_webpage({url})
+ Returns: {title, text, url}
  │ ↓ │
  │ 4. PATENT QUERIES (for each plan.queries where query_family == "patent") │
  │ └─→ google-patents:search_patents({query, num_results}) │
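For orientation, the operator loop diagrammed in this hunk can be sketched as a single async function. This is an illustrative sketch only: `callTool(name, args)` stands in for whatever MCP tool-invocation mechanism the host provides (it is not an API of this package), the tool names and return shapes (`{title, link, snippet}[]` from `google_search`, `{title, text, url}` from `read_webpage`) follow this README, and the `raw_results` argument name is assumed from the schema described later.

```javascript
// Sketch of the operator loop: plan → web search → scrape → compile.
// `callTool(name, args)` is a placeholder for the host's MCP client
// invocation — NOT a real API of mcp-sequential-research.
async function operatorLoop(callTool, prompt, constraints) {
  // 1. PLAN
  const plan = await callTool("sequential-research:sequential_research_plan", { prompt, constraints });
  const rawResults = [];
  const seen = new Set();
  for (const q of plan.queries) {
    if (q.query_family !== "web") continue; // patent branch omitted for brevity
    // 2. WEB QUERIES — returns {title, link, snippet}[] per the README
    const hits = await callTool("google-search:google_search", { query: q.query, num: 10 });
    for (const hit of hits) {
      if (seen.has(hit.link)) continue; // 3. collect top URLs (dedupe)
      seen.add(hit.link);
      // 3. SCRAPE — returns {title, text, url} per the README
      const page = await callTool("google-search:read_webpage", { url: hit.link });
      rawResults.push({ source_type: "web", title: page.title, url: page.url, excerpt: page.text });
    }
  }
  // 6. COMPILE (argument name assumed from the README's raw_results schema)
  return callTool("sequential-research:sequential_research_compile", { raw_results: rawResults });
}
```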
@@ -89,7 +88,7 @@ For each query where `query_family == "web"`:
 
  ```json
  {
- "tool": "google-search:search",
+ "tool": "google-search:google_search",
  "arguments": {
  "query": "photonic computing silicon photonics site:.edu OR filetype:pdf",
  "num": 10
@@ -110,66 +109,69 @@ For each query where `query_family == "web"`:
  }
  ```
 
- ### Step 3: Scrape Web Content via Firecrawl MCP
+ ### Step 3: Scrape Web Content via Google Search MCP `read_webpage`
 
- After collecting search results, extract full content using Firecrawl.
+ After collecting search results, extract full content using `read_webpage`.
 
- **For multiple URLs (recommended for efficiency):**
+ **For each URL:**
  ```json
  {
- "tool": "firecrawl:firecrawl_batch_scrape",
+ "tool": "google-search:read_webpage",
  "arguments": {
- "urls": [
- "https://example.mit.edu/photonics.pdf",
- "https://lightmatter.co/technology",
- "https://ieee.org/article/photonic-computing"
- ],
- "options": {
- "formats": ["markdown"],
- "onlyMainContent": true
- }
+ "url": "https://example.mit.edu/photonics.pdf"
  }
  }
  ```
 
- **For a single URL:**
+ **Response format:**
  ```json
  {
- "tool": "firecrawl:firecrawl_scrape",
- "arguments": {
- "url": "https://example.mit.edu/photonics.pdf",
- "formats": ["markdown"],
- "onlyMainContent": true
- }
+ "title": "Silicon Photonics for AI - MIT",
+ "text": "# Silicon Photonics for AI\n\nRecent advances in silicon photonics have enabled...",
+ "url": "https://example.mit.edu/photonics.pdf"
  }
  ```
 
- **Response format:**
+ **Note:** Call `read_webpage` for each URL sequentially or in parallel. The tool automatically converts HTML to readable text and handles most page types.
+
+ ### Step 4: Execute Patent Queries via Google Patents MCP
+
+ For each query where `query_family == "patent"`:
+
  ```json
  {
- "success": true,
- "data": {
- "markdown": "# Silicon Photonics for AI\n\nRecent advances...",
- "metadata": {
- "title": "Silicon Photonics for AI",
- "sourceURL": "https://example.mit.edu/photonics.pdf"
- }
+ "tool": "google-patents:search_patents",
+ "arguments": {
+ "q": "photonic neural network accelerator",
+ "num": 10,
+ "country": "US",
+ "after": "publication:20200101",
+ "sort": "new"
  }
  }
  ```
 
- ### Step 4: Execute Patent Queries via Google Patents MCP
+ **IMPORTANT:** Always specify `sort: "new"` or `sort: "old"`. The default `sort: "relevance"` is NOT supported by SerpApi and will cause an error.
 
- For each query where `query_family == "patent"`:
+ **Parameter notes:**
+ - `q` - Search query (required). Use semicolons to separate terms: `"(photonic) OR (optical);neural network"`
+ - `num` - Results per page (10-100, default: 10)
+ - `country` - Filter by country codes: `"US"`, `"US,WO,EP"`
+ - `after` - Date filter format: `"publication:YYYYMMDD"` or `"filing:YYYYMMDD"`
+ - `before` - Date filter format: `"publication:YYYYMMDD"` or `"filing:YYYYMMDD"`
+ - `status` - Filter: `"GRANT"` or `"APPLICATION"`
+ - `sort` - **Must be `"new"` or `"old"`** (NOT `"relevance"`)
 
  ```json
+ // CORRECT - explicit sort parameter
  {
  "tool": "google-patents:search_patents",
  "arguments": {
- "query": "photonic neural network accelerator",
- "num_results": 10,
- "country": "US",
- "after": "2020-01-01"
+ "q": "(optical computing) AND (neural network)",
+ "num": 10,
+ "country": "US,WO",
+ "after": "publication:20180101",
+ "sort": "new"
  }
  }
  ```
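Since the hunk above replaces the old ISO dates (`"2020-01-01"`) with the compact `publication:YYYYMMDD` form, a tiny helper along these lines can build the filter string consistently. The helper is hypothetical — it is not shipped with the package; only the output format comes from the README.

```javascript
// Build a Google Patents date filter string such as "publication:20200101".
// `field` must be "publication" or "filing" (per the README's parameter
// notes); `date` is a JS Date, read in UTC. Hypothetical helper, not part
// of mcp-sequential-research.
function patentDateFilter(field, date) {
  if (field !== "publication" && field !== "filing") {
    throw new Error(`unsupported date field: ${field}`);
  }
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${field}:${y}${m}${d}`;
}
```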
@@ -234,7 +236,7 @@ Transform all responses into the standard schema with sequential source IDs:
  2. Deduplicate URLs before assigning IDs
  3. Patents get `source_type: "patent"` with extra fields
  4. Web content gets `source_type: "web"`
- 5. Firecrawl markdown content goes in `excerpt` (truncated if needed)
+ 5. The `text` from `read_webpage` goes in `excerpt` (truncated if needed)
 
  ### Step 6: Call `sequential_research_compile`
 
@@ -280,8 +282,8 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
  | Step | MCP Server | Tool | Purpose |
  |------|-----------|------|---------|
  | 1 | sequential-research | `sequential_research_plan` | Generate structured query plan |
- | 2 | google-search | `search` | Get web search results |
- | 3 | firecrawl | `firecrawl_scrape` / `firecrawl_batch_scrape` | Extract full page content |
+ | 2 | google-search | `google_search` | Get web search results |
+ | 3 | google-search | `read_webpage` | Extract full page content |
  | 4 | google-patents | `search_patents` | Search patent database |
  | 5 | — | — | Normalize to raw_results[] |
  | 6 | sequential-research | `sequential_research_compile` | Generate report with citations |
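Step 5 in the table above has no tool because it is plain data shaping. Under the README's normalization rules (deduplicate URLs before assigning sequential source IDs, `source_type: "web"`, `read_webpage` `text` into a truncated `excerpt`), it could look roughly like this. The `S1`-style ID format and the truncation length are assumptions for illustration, not part of the package:

```javascript
// Normalize read_webpage responses ({title, text, url}) into raw_results[]
// entries. ID format "S1", "S2", ... and maxExcerpt are assumed here.
function normalizeWebResults(pages, maxExcerpt = 2000) {
  const seen = new Set();
  const results = [];
  for (const page of pages) {
    if (seen.has(page.url)) continue; // rule 2: dedupe URLs before assigning IDs
    seen.add(page.url);
    results.push({
      id: `S${results.length + 1}`,           // rule 1: sequential source IDs
      source_type: "web",                      // rule 4
      title: page.title,
      url: page.url,
      excerpt: page.text.slice(0, maxExcerpt), // rule 5: text → excerpt, truncated
    });
  }
  return results;
}
```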
@@ -289,31 +291,40 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 
  ---
 
- ## Firecrawl Integration Details
+ ## Web Scraping with `read_webpage`
+
+ The `google-search:read_webpage` tool provides a simple, reliable way to fetch web content without additional MCP server dependencies.
 
- ### When to Use Batch vs Single Scrape
+ ### Usage
 
- | Scenario | Tool | Reason |
- |----------|------|--------|
- | 5+ URLs from same query | `firecrawl_batch_scrape` | Efficiency, rate limiting |
- | 1-4 URLs | `firecrawl_scrape` (multiple calls) | Lower overhead |
- | PDF documents | `firecrawl_scrape` | Better PDF handling |
- | Real-time updates needed | `firecrawl_scrape` | Immediate results |
+ ```json
+ {
+ "tool": "google-search:read_webpage",
+ "arguments": {
+ "url": "https://example.com/article"
+ }
+ }
+ ```
 
- ### Firecrawl Options
+ ### Response Format
 
  ```json
  {
- "formats": ["markdown"], // Output format
- "onlyMainContent": true, // Skip headers/footers
- "includeTags": ["article", "main", "section"], // Focus on content
- "excludeTags": ["nav", "footer", "aside"], // Skip navigation
- "waitFor": 2000, // Wait for JS rendering (ms)
- "timeout": 30000 // Request timeout (ms)
+ "title": "Article Title",
+ "text": "The full text content of the page...",
+ "url": "https://example.com/article"
  }
  ```
 
- ### Handling Firecrawl Errors
+ ### Features
+
+ - **Automatic HTML to text conversion** — Clean, readable output
+ - **No additional setup** — Uses the same google-search MCP server
+ - **Handles most page types** — HTML, some PDFs, etc.
+
+ ### Handling Scrape Errors
+
+ If a URL fails to scrape, include it in results with a note:
 
  ```json
  {
@@ -326,13 +337,24 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
  "source_type": "web",
  "title": "Page Title (scrape failed)",
  "url": "https://example.com/blocked",
- "excerpt": "[Content unavailable - scrape blocked by robots.txt]"
+ "excerpt": "[Content unavailable - page could not be fetched]"
  }
  ],
  "execution_notes": "1 of 5 URLs failed to scrape"
  }
  ```
 
+ ### Parallel Execution
+
+ You can call `read_webpage` for multiple URLs in parallel to improve throughput:
+
+ ```
+ // Execute these concurrently:
+ google-search:read_webpage({url: "https://site1.com/page"})
+ google-search:read_webpage({url: "https://site2.com/page"})
+ google-search:read_webpage({url: "https://site3.com/page"})
+ ```
+
  ## Citation Format Requirement
 
  Citations must be **stable** and **machine-parseable** for downstream claim-mining.
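The parallel-execution and error-handling guidance added in the hunk above combines naturally into one helper. As before, `callTool` is a stand-in for the host's MCP client (not a real API of this package); the placeholder text mirrors the README's failure example, placed in `text` here because the helper returns `read_webpage`-shaped objects:

```javascript
// Fetch several URLs concurrently via read_webpage, keeping failed URLs
// in the results with a placeholder, per the error-handling section.
// `callTool(name, args)` is a hypothetical MCP invocation function.
async function scrapeAll(callTool, urls) {
  const settled = await Promise.allSettled(
    urls.map((url) => callTool("google-search:read_webpage", { url }))
  );
  return settled.map((res, i) =>
    res.status === "fulfilled"
      ? res.value
      : {
          title: "Page Title (scrape failed)",
          url: urls[i],
          text: "[Content unavailable - page could not be fetched]",
        }
  );
}
```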
@@ -119,15 +119,18 @@ Execute queries using appropriate MCP tools. Here's how each query maps to data
  "params": {
  "name": "search_patents",
  "arguments": {
- "query": "photonic computing neural network inference",
- "num_results": 5,
+ "q": "photonic computing neural network inference",
+ "num": 10,
  "country": "US",
- "after": "2020-01-01"
+ "after": "publication:20200101",
+ "sort": "new"
  }
  }
  }
  ```
 
+ **IMPORTANT:** Always use `sort: "new"` or `sort: "old"`. The default `sort: "relevance"` is NOT supported by SerpApi and will cause an error.
+
  ### Example: Query q1 via Google Search MCP
 
  ```json
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "mcp-sequential-research",
- "version": "1.0.0",
+ "version": "1.1.1",
  "description": "MCP server for sequential research planning and compilation",
  "type": "module",
  "main": "dist/index.js",