thordata-mcp-server 0.4.4__py3-none-any.whl → 0.5.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: thordata-mcp-server
- Version: 0.4.4
+ Version: 0.5.0
  Summary: Official MCP Server for Thordata.
  Author-email: Thordata Developer Team <support@thordata.com>
  License-Expression: MIT
@@ -8,7 +8,7 @@ Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  Requires-Dist: mcp[cli]>=1.0.0
  Requires-Dist: sse-starlette>=1.6.1
- Requires-Dist: thordata-sdk>=1.6.0
+ Requires-Dist: thordata-sdk>=1.7.0
  Requires-Dist: pydantic-settings
  Requires-Dist: markdownify
  Requires-Dist: html2text
@@ -23,14 +23,14 @@ Requires-Dist: uvicorn
 
  **Give your AI Agents real-time web scraping superpowers.**
 
- This MCP Server version has been **streamlined to focus on scraping**, concentrating on four core products:
+ This MCP Server version has been **streamlined to focus on scraping**, concentrating on a compact, LLM‑friendly tool surface:
 
- - **SERP API** (Search result scraping)
+ - **Search Engine** (LLM-friendly web search wrapper)
+ - **SERP API** (Search result scraping, internal plumbing)
  - **Web Unlocker / Universal Scraper** (Universal page unlocking & scraping)
- - **Web Scraper API** (Structured task flow)
  - **Scraping Browser** (Browser-level scraping)
 
- Earlier versions exposed `proxy.*` / `account.*` / `proxy_users.*` proxy and account management tools. This version has removed these control plane interfaces, keeping only scraping-related capabilities for a clean tool surface in Cursor / MCP clients.
+ Earlier versions exposed `proxy.*` / `account.*` / `proxy_users.*` proxy and account management tools, and a large `web_scraper` task surface. This version removes those control plane interfaces from MCP, keeping only scraping-related capabilities that are easy for LLMs to use.
 
  ## 🚀 Features
 
@@ -76,38 +76,15 @@ THORDATA_BROWSER_PASSWORD=your_password
 
  ### Tool Exposure Modes
 
- Current implementation provides **streamlined scraping tool surface only**, no longer exposing proxy and account management tools:
+ Current implementation provides a **compact scraping tool surface**, optimized for Cursor / LLM tool callers:
 
- - **SERP SCRAPER**: `serp` (actions: `search`, `batch_search`)
- - **WEB UNLOCKER**: `unlocker` (actions: `fetch`, `batch_fetch`)
- - **WEB SCRAPER (100+ structured tasks + task management)**: `web_scraper` (actions: `catalog`, `groups`, `run`, `batch_run`, `status`, `status_batch`, `wait`, `result`, `result_batch`, `list_tasks`, `cancel`)
- - **BROWSER SCRAPER**: `browser` (actions: `navigate`, `snapshot`)
- - **Smart (auto tool + fallback)**: `smart_scrape`
+ - **`search_engine`** (recommended for LLMs): high-level web search wrapper, returns a light `results[]` array with `title/link/description`. Internally delegates to the SERP backend.
+ - **`search_engine_batch`**: batch variant of `search_engine` with per-item `ok/error` results.
+ - **`unlocker`**: actions `fetch`, `batch_fetch` universal page unlock & content extraction (HTML/Markdown), with per-item error reporting for batch.
+ - **`browser`**: action `snapshot` navigate (optional `url`) and capture an ARIA-focused snapshot for interactive elements.
+ - **`smart_scrape`**: auto-picks the best scraper (SERP, Web Scraper, Unlocker) for a given URL and returns a unified, LLM-friendly response.
 
- > Note: This version focuses on scraping functionality and no longer includes `proxy.*` / `account.*` control plane tools.
-
- ### Web Scraper discovery (100+ tools, no extra env required)
-
- Use `web_scraper` with `action="catalog"` / `action="groups"` to discover tools.
- This keeps Cursor/LLMs usable while still supporting **100+ tools** under a single entrypoint.
-
- ```env
- # Default: curated + limit 60
- THORDATA_TASKS_LIST_MODE=curated
- THORDATA_TASKS_LIST_DEFAULT_LIMIT=60
-
- # Which groups are included when mode=curated
- THORDATA_TASKS_GROUPS=ecommerce,social,video,search,travel,code,professional
-
- # Optional safety/UX: restrict which tools can actually run
- # (comma-separated prefixes or exact tool keys)
- # Example:
- # THORDATA_TASKS_ALLOWLIST=thordata.tools.video.,thordata.tools.ecommerce.Amazon.ProductByAsin
- THORDATA_TASKS_ALLOWLIST=
- ```
-
- If you want Cursor to **never** see the full 300+ tool list, keep `THORDATA_TASKS_LIST_MODE=curated`
- and optionally set `THORDATA_TASKS_ALLOWLIST` to the small subset you actually want to support.
+ Internally, the server still uses structured SERP and Web Scraper capabilities, but they are not exposed as large tool surfaces by default to avoid overwhelming LLMs.
 
  ### Deployment (Optional)
 
@@ -162,19 +139,17 @@ Add this to your `claude_desktop_config.json`:
  Notes:
  - `THORDATA_BROWSER_USERNAME` / `THORDATA_BROWSER_PASSWORD` are required for `browser.*` tools (Scraping Browser).
 
- ## 🛠️ Available Tools
-
- ### Available Tools (All directly related to scraping)
+ ## 🛠️ Available Tools (Compact Surface)
 
- Current MCP Server only exposes the following **5 scraping-related tools**:
+ By default, the MCP server exposes a **small, LLM-friendly tool set**:
 
- - **`serp`**: action `search`, `batch_search`
- - **`unlocker`**: action `fetch`, `batch_fetch`
- - **`web_scraper`**: action `catalog`, `groups`, `run`, `batch_run`, `status`, `status_batch`, `wait`, `result`, `result_batch`, `list_tasks`, `cancel`
- - **`browser`**: action `navigate`, `snapshot`
- - **`smart_scrape`**: auto-pick structured tool; fallback to unlocker
+ - **`search_engine`**: single-query web search (`params.q`, optional `params.num`, `params.engine`).
+ - **`search_engine_batch`**: batch web search with per-item `ok/error` in `results[]`.
+ - **`unlocker`**: universal scraping via `fetch` / `batch_fetch`.
+ - **`browser`**: `snapshot` with optional `url`, `max_items`, and `max_chars`.
+ - **`smart_scrape`**: smart router for `url` with optional preview limit parameters.
 
- > Proxy network related APIs can still be used via other Thordata SDKs / HTTP APIs, but are not exposed through MCP to avoid introducing complex management operations in LLMs.
+ Advanced / internal tools (e.g. low-level `serp.*`, full `web_scraper.*` surfaces, proxy/account control plane) remain available via HTTP APIs and SDKs, but are not exposed directly as MCP tools to keep the surface manageable for agents and LLMs.
 
  ## 🏗️ Architecture
 
@@ -189,14 +164,14 @@ thordata_mcp/
  ├── utils.py # Common utilities (error handling, responses)
  ├── browser_session.py # Browser session management (Playwright)
  ├── aria_snapshot.py # ARIA snapshot filtering
- └── tools/
- ├── product_compact.py # Streamlined 5-tool entry point (serp/unlocker/web_scraper/browser/smart_scrape)
- ├── product.py # Full product implementation for internal use (reused by compact version)
- ├── data/ # Data plane tools (only scraping-related namespaces retained)
- │ ├── serp.py # serp.*
- │ ├── universal.py # universal.*
- │ ├── browser.py # browser.*
- │ └── tasks.py # tasks.*
+ └── tools/
+ ├── product_compact.py # Streamlined MCP entrypoint (search_engine / unlocker / browser / smart_scrape, plus batch variants)
+ ├── product.py # Full product implementation for internal use (reused by compact version)
+ ├── data/ # Data plane tools (only scraping-related namespaces retained)
+ │ ├── serp.py # SERP backend integration
+ │ ├── universal.py # Universal / Unlocker backend integration
+ │ ├── browser.py # Browser / Playwright helpers
+ │ └── tasks.py # Structured scraping tasks (used by smart_scrape and internal flows)
  ```
 
  ## 🎯 Design Principles
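
As a rough illustration of the compact surface described in the README hunks above, the sketch below builds an argument payload for `search_engine` and trims a response down to the `title/link/description` fields. The `params.q`, `params.num`, and `params.engine` names, the `results[]` array, and the per-item `ok/error` markers come from the README text. Everything else is an assumption made for illustration and not the server's confirmed schema: the helper names, the exact nesting under a `params` key, and the shape of the sample responses.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_search_args(
    q: str, num: Optional[int] = None, engine: Optional[str] = None
) -> Dict[str, Any]:
    """Build hypothetical arguments for the `search_engine` tool."""
    if not q.strip():
        raise ValueError("q must be a non-empty query string")
    params: Dict[str, Any] = {"q": q}
    # Optional fields are only sent when the caller sets them.
    if num is not None:
        params["num"] = num
    if engine is not None:
        params["engine"] = engine
    return {"params": params}


def light_results(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Keep only title/link/description from each item in results[]."""
    keep = ("title", "link", "description")
    return [
        {key: str(item.get(key, "")) for key in keep}
        for item in response.get("results", [])
    ]


def split_batch(
    response: Dict[str, Any]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split batch items into (succeeded, failed) using an assumed `ok` flag."""
    succeeded: List[Dict[str, Any]] = []
    failed: List[Dict[str, Any]] = []
    for item in response.get("results", []):
        (succeeded if item.get("ok") else failed).append(item)
    return succeeded, failed
```

Handling failures per item, rather than failing the whole batch, matches the per-item `ok/error` reporting the README describes for `search_engine_batch` and `unlocker` `batch_fetch`.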
@@ -0,0 +1,26 @@
+ thordata_mcp/__init__.py,sha256=EMlQmThUq4YcxjIewML7g2ZyT3W3TXaYGoVciewYbwg,61
+ thordata_mcp/aria_snapshot.py,sha256=SW8d_MxudDUlrl0OPjfG1brg67WyC-I4fMnwPOpk9oU,3293
+ thordata_mcp/browser_session.py,sha256=p3rM00kHcL8DVSBRoqZfwl8X8Kfi4sx6AsCX9ACi-z4,17654
+ thordata_mcp/config.py,sha256=M5gTGrYdvKOlUCI6ZHoSMEWPWjAz7Onu5XXAnat9bts,2957
+ thordata_mcp/context.py,sha256=rptQ55f7SP47tbHb9mrebkdrSvPm4gZaabSr15WpnLU,1304
+ thordata_mcp/debug_http.py,sha256=6bioJS7M8luF2wr39cQeD9jqgYBPeTAPWxbing6JhD0,7045
+ thordata_mcp/main.py,sha256=qcqTdF6YM0lwmp3Zj2R7vlA8LmczflykANCvcOKJZcQ,3743
+ thordata_mcp/monitoring.py,sha256=JIFJa0mJb1f9tyzSyERVtOsm_YbKv3i9-E_8aBhI7Ho,7458
+ thordata_mcp/registry.py,sha256=4Vp49bNB8BeBAFfyPJO9SPRU2A4Ez6EM5OClAEY1NYE,1339
+ thordata_mcp/utils.py,sha256=c_U3jle0oLFOyBYz68ECZxUpRX2-NdyEk12LK1nLnH8,15997
+ thordata_mcp/tools/__init__.py,sha256=6XOtYo7kqcHuBQLE0xZLGilP5NxNmZArIjy_tKIMep0,618
+ thordata_mcp/tools/debug.py,sha256=Lunt2cm7HvAiQJK1T1damy2mVSVm1RUqr4cPTl8IXko,5468
+ thordata_mcp/tools/params_utils.py,sha256=u3QCD9cS6p7APZxMFJM5gkfvQxTeWB_4d0mVWeiozwk,3483
+ thordata_mcp/tools/product.py,sha256=sY39fAOSBLcErgyroN3x6W0dftVnYvHH6178DRz9c_4,75573
+ thordata_mcp/tools/product_compact.py,sha256=d_WwW1dBDPOHM7E-WSq3eyVxfmZRElrMPM1EowiOXiU,105964
+ thordata_mcp/tools/utils.py,sha256=BaD3150Dt_vhLDZliep-_LZaAiN7IAC3HPV2Db8WBZc,4538
+ thordata_mcp/tools/data/__init__.py,sha256=KMD19WJlkPe3Y3e-U1v9IjAO3E_qEu2dtzS5yj_h1XU,398
+ thordata_mcp/tools/data/browser.py,sha256=KVhNtYUpIV5FwE7uCpSbCM1TWMMezNk679UPaXbDp1Q,20307
+ thordata_mcp/tools/data/serp.py,sha256=BdxVMiQ5WwHqK1sPIfk1Ehx9MCG-QFZKLu8ZlZE2IlA,3439
+ thordata_mcp/tools/data/tasks.py,sha256=_spv5Uz3ZGdP4sSCrJT3nJg5UQILGxEijjK4Z-zJZVA,16274
+ thordata_mcp/tools/data/universal.py,sha256=JUQUOwN2x_p7coRt71bCykQxOzZCnyiN4NvSq4dfZaI,9482
+ thordata_mcp_server-0.5.0.dist-info/METADATA,sha256=keM0rodUXSyI35OaNCxF0IrHHPf6qhKM1NXuf6YR8zE,7666
+ thordata_mcp_server-0.5.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
+ thordata_mcp_server-0.5.0.dist-info/entry_points.txt,sha256=DI4aWMlUKBmTJTbNanK9m1LWoKeFvuLYS4sPd0n5GIQ,56
+ thordata_mcp_server-0.5.0.dist-info/top_level.txt,sha256=9ODiyY_ikjIGCkF7_zk3HAd07KUkH63Uu7Ax-cz7DA8,13
+ thordata_mcp_server-0.5.0.dist-info/RECORD,,
@@ -1,24 +0,0 @@
- thordata_mcp/__init__.py,sha256=yPIlzOnjL77vd9e1iuXpJs7wUiGYSAnSZtrrdrdQlOg,61
- thordata_mcp/aria_snapshot.py,sha256=SW8d_MxudDUlrl0OPjfG1brg67WyC-I4fMnwPOpk9oU,3293
- thordata_mcp/browser_session.py,sha256=I804ixmgl2d4B4MC3bqVniBUlYZ5693V618mZ7tYGkU,11425
- thordata_mcp/config.py,sha256=HJzFZxRnhhu2wbl4xMgEJIFy4cifSMQ-TYyVGcyVSug,2655
- thordata_mcp/context.py,sha256=WmfoEbk_y3lxkssWIsDRpr00fwvuomzclskIUnMlQs0,1305
- thordata_mcp/debug_http.py,sha256=6bioJS7M8luF2wr39cQeD9jqgYBPeTAPWxbing6JhD0,7045
- thordata_mcp/main.py,sha256=qcqTdF6YM0lwmp3Zj2R7vlA8LmczflykANCvcOKJZcQ,3743
- thordata_mcp/monitoring.py,sha256=JIFJa0mJb1f9tyzSyERVtOsm_YbKv3i9-E_8aBhI7Ho,7458
- thordata_mcp/registry.py,sha256=4Vp49bNB8BeBAFfyPJO9SPRU2A4Ez6EM5OClAEY1NYE,1339
- thordata_mcp/utils.py,sha256=bxrQiJPJu1TvlOSQXUR-LFwFN5r-6qIrKLNEkSAgyIM,13251
- thordata_mcp/tools/__init__.py,sha256=6XOtYo7kqcHuBQLE0xZLGilP5NxNmZArIjy_tKIMep0,618
- thordata_mcp/tools/product.py,sha256=38Se8OAKLmuRm6bXyLS28PBOXa_LVJ-lYPg6u-cVUhM,72230
- thordata_mcp/tools/product_compact.py,sha256=mVOPP3Bde2TzD3m6PV3uHiDMdaqnYb5nhF2fSwIdURQ,48705
- thordata_mcp/tools/utils.py,sha256=76YoU2hk1G9GOOxQhvNwYK5mZsQKJT8DnKzhzu-b4r8,4375
- thordata_mcp/tools/data/__init__.py,sha256=KMD19WJlkPe3Y3e-U1v9IjAO3E_qEu2dtzS5yj_h1XU,398
- thordata_mcp/tools/data/browser.py,sha256=1NXFlqeMCCHWfmklCYAohcs9A_ceBpqO9f7ZNRJZR6c,16122
- thordata_mcp/tools/data/serp.py,sha256=BdxVMiQ5WwHqK1sPIfk1Ehx9MCG-QFZKLu8ZlZE2IlA,3439
- thordata_mcp/tools/data/tasks.py,sha256=_spv5Uz3ZGdP4sSCrJT3nJg5UQILGxEijjK4Z-zJZVA,16274
- thordata_mcp/tools/data/universal.py,sha256=JUQUOwN2x_p7coRt71bCykQxOzZCnyiN4NvSq4dfZaI,9482
- thordata_mcp_server-0.4.4.dist-info/METADATA,sha256=9_X3K4gIA2U641GWkvbbRpqBbEb51opEWD7pHZ2uJxA,8000
- thordata_mcp_server-0.4.4.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
- thordata_mcp_server-0.4.4.dist-info/entry_points.txt,sha256=DI4aWMlUKBmTJTbNanK9m1LWoKeFvuLYS4sPd0n5GIQ,56
- thordata_mcp_server-0.4.4.dist-info/top_level.txt,sha256=9ODiyY_ikjIGCkF7_zk3HAd07KUkH63Uu7Ax-cz7DA8,13
- thordata_mcp_server-0.4.4.dist-info/RECORD,,
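
The two RECORD listings above can be compared mechanically. The sketch below assumes each entry follows the `path,sha256=<digest>,<size>` layout visible in the hunks (the final `RECORD,,` entry carries no hash or size) and reports which paths were added, removed, or changed. The function names are illustrative, and the sample entries are a small subset copied from the listings.

```python
import csv
import io


def parse_record(text):
    """Map each path in a RECORD listing to its (hash, size) pair."""
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        # Pad short rows so the RECORD file's own hash-less entry still parses.
        path, digest, size = (row + ["", ""])[:3]
        entries[path] = (digest, size)
    return entries


def compare_records(old_text, new_text):
    """Return sorted lists of (added, removed, changed) paths."""
    old, new = parse_record(old_text), parse_record(new_text)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed


OLD = """\
thordata_mcp/main.py,sha256=qcqTdF6YM0lwmp3Zj2R7vlA8LmczflykANCvcOKJZcQ,3743
thordata_mcp/utils.py,sha256=bxrQiJPJu1TvlOSQXUR-LFwFN5r-6qIrKLNEkSAgyIM,13251
"""
NEW = """\
thordata_mcp/main.py,sha256=qcqTdF6YM0lwmp3Zj2R7vlA8LmczflykANCvcOKJZcQ,3743
thordata_mcp/utils.py,sha256=c_U3jle0oLFOyBYz68ECZxUpRX2-NdyEk12LK1nLnH8,15997
thordata_mcp/tools/debug.py,sha256=Lunt2cm7HvAiQJK1T1damy2mVSVm1RUqr4cPTl8IXko,5468
"""

added, removed, changed = compare_records(OLD, NEW)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Run against the full listings, this kind of comparison would also report every `.dist-info` entry as removed and re-added, because the directory name embeds the version (`0.4.4` versus `0.5.0`).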