thordata-mcp-server 0.4.4__py3-none-any.whl → 0.5.0__py3-none-any.whl
- thordata_mcp/__init__.py +1 -1
- thordata_mcp/browser_session.py +157 -12
- thordata_mcp/config.py +14 -3
- thordata_mcp/context.py +1 -1
- thordata_mcp/tools/data/browser.py +124 -18
- thordata_mcp/tools/debug.py +125 -0
- thordata_mcp/tools/params_utils.py +107 -0
- thordata_mcp/tools/product.py +83 -5
- thordata_mcp/tools/product_compact.py +2108 -962
- thordata_mcp/tools/utils.py +2 -0
- thordata_mcp/utils.py +393 -322
- {thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/METADATA +29 -54
- thordata_mcp_server-0.5.0.dist-info/RECORD +26 -0
- thordata_mcp_server-0.4.4.dist-info/RECORD +0 -24
- {thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/WHEEL +0 -0
- {thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/entry_points.txt +0 -0
- {thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: thordata-mcp-server
-Version: 0.
+Version: 0.5.0
 Summary: Official MCP Server for Thordata.
 Author-email: Thordata Developer Team <support@thordata.com>
 License-Expression: MIT
@@ -8,7 +8,7 @@ Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 Requires-Dist: mcp[cli]>=1.0.0
 Requires-Dist: sse-starlette>=1.6.1
-Requires-Dist: thordata-sdk>=1.
+Requires-Dist: thordata-sdk>=1.7.0
 Requires-Dist: pydantic-settings
 Requires-Dist: markdownify
 Requires-Dist: html2text
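The hunk above raises the `thordata-sdk` floor to `>=1.7.0`. As a minimal sketch of how an environment could be checked against that floor before upgrading, the snippet below compares release numbers using only the standard library. The helper names (`release_tuple`, `meets_minimum`, `sdk_is_new_enough`) are hypothetical and not part of the package, and the comparison ignores pre-release suffixes; `packaging.version` is the more robust choice where it is available.

```python
from importlib import metadata
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segments, e.g. '1.7.0rc1' -> (1, 7, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (release segments only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '2' compares like '2.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def sdk_is_new_enough(minimum: str = "1.7.0") -> bool:
    """Check the installed thordata-sdk distribution against the new floor."""
    try:
        installed = metadata.version("thordata-sdk")
    except metadata.PackageNotFoundError:
        return False  # not installed at all
    return meets_minimum(installed, minimum)
```

Calling `sdk_is_new_enough()` in the target environment returns `False` when the SDK is missing or older than the floor declared in this METADATA.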
@@ -23,14 +23,14 @@ Requires-Dist: uvicorn
 
 **Give your AI Agents real-time web scraping superpowers.**
 
-This MCP Server version has been **streamlined to focus on scraping**, concentrating on
+This MCP Server version has been **streamlined to focus on scraping**, concentrating on a compact, LLM‑friendly tool surface:
 
-- **
+- **Search Engine** (LLM-friendly web search wrapper)
+- **SERP API** (Search result scraping, internal plumbing)
 - **Web Unlocker / Universal Scraper** (Universal page unlocking & scraping)
-- **Web Scraper API** (Structured task flow)
 - **Scraping Browser** (Browser-level scraping)
 
-Earlier versions exposed `proxy.*` / `account.*` / `proxy_users.*` proxy and account management tools. This version
+Earlier versions exposed `proxy.*` / `account.*` / `proxy_users.*` proxy and account management tools, and a large `web_scraper` task surface. This version removes those control plane interfaces from MCP, keeping only scraping-related capabilities that are easy for LLMs to use.
 
 ## 🚀 Features
 
@@ -76,38 +76,15 @@ THORDATA_BROWSER_PASSWORD=your_password
 
 ### Tool Exposure Modes
 
-Current implementation provides **
+Current implementation provides a **compact scraping tool surface**, optimized for Cursor / LLM tool callers:
 
--
--
--
--
--
+- **`search_engine`** (recommended for LLMs): high-level web search wrapper, returns a light `results[]` array with `title/link/description`. Internally delegates to the SERP backend.
+- **`search_engine_batch`**: batch variant of `search_engine` with per-item `ok/error` results.
+- **`unlocker`**: actions `fetch`, `batch_fetch` – universal page unlock & content extraction (HTML/Markdown), with per-item error reporting for batch.
+- **`browser`**: action `snapshot` – navigate (optional `url`) and capture an ARIA-focused snapshot for interactive elements.
+- **`smart_scrape`**: auto-picks the best scraper (SERP, Web Scraper, Unlocker) for a given URL and returns a unified, LLM-friendly response.
 
-
-
-### Web Scraper discovery (100+ tools, no extra env required)
-
-Use `web_scraper` with `action="catalog"` / `action="groups"` to discover tools.
-This keeps Cursor/LLMs usable while still supporting **100+ tools** under a single entrypoint.
-
-```env
-# Default: curated + limit 60
-THORDATA_TASKS_LIST_MODE=curated
-THORDATA_TASKS_LIST_DEFAULT_LIMIT=60
-
-# Which groups are included when mode=curated
-THORDATA_TASKS_GROUPS=ecommerce,social,video,search,travel,code,professional
-
-# Optional safety/UX: restrict which tools can actually run
-# (comma-separated prefixes or exact tool keys)
-# Example:
-# THORDATA_TASKS_ALLOWLIST=thordata.tools.video.,thordata.tools.ecommerce.Amazon.ProductByAsin
-THORDATA_TASKS_ALLOWLIST=
-```
-
-If you want Cursor to **never** see the full 300+ tool list, keep `THORDATA_TASKS_LIST_MODE=curated`
-and optionally set `THORDATA_TASKS_ALLOWLIST` to the small subset you actually want to support.
+Internally, the server still uses structured SERP and Web Scraper capabilities, but they are not exposed as large tool surfaces by default to avoid overwhelming LLMs.
 
 ### Deployment (Optional)
 
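The new README text in this hunk describes `search_engine` as returning a light `results[]` array with `title/link/description`, and the batch variant as reporting per-item `ok/error`. The sketch below shows one way a caller might consume such responses. The exact response shape is an assumption inferred from that wording: the `error` key, the helper names `split_batch` and `format_hits`, and the sample data are all hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_batch(response: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[Tuple[int, str]]]:
    """Separate successful batch items from failed ones.

    Assumes each entry in response["results"] carries a truthy "ok" flag on
    success and an "error" message otherwise (hypothetical shape).
    """
    successes: List[Dict[str, Any]] = []
    failures: List[Tuple[int, str]] = []
    for index, item in enumerate(response.get("results", [])):
        if item.get("ok"):
            successes.append(item)
        else:
            failures.append((index, str(item.get("error", "unknown error"))))
    return successes, failures


def format_hits(results: List[Dict[str, Any]]) -> str:
    """Render title/link/description hits as a compact numbered list."""
    lines = []
    for n, hit in enumerate(results, start=1):
        lines.append(f"{n}. {hit.get('title', '')} <{hit.get('link', '')}>")
        description = hit.get("description")
        if description:
            lines.append(f"   {description}")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
sample_batch = {
    "results": [
        {"ok": True, "results": [{"title": "Example", "link": "https://example.com", "description": "Demo page"}]},
        {"ok": False, "error": "timeout"},
    ]
}
ok_items, failed_items = split_batch(sample_batch)
```

Keeping failures as `(index, message)` pairs lets a caller retry only the queries that failed instead of resubmitting the whole batch.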
@@ -162,19 +139,17 @@ Add this to your `claude_desktop_config.json`:
 Notes:
 - `THORDATA_BROWSER_USERNAME` / `THORDATA_BROWSER_PASSWORD` are required for `browser.*` tools (Scraping Browser).
 
-## 🛠️ Available Tools
-
-### Available Tools (All directly related to scraping)
+## 🛠️ Available Tools (Compact Surface)
 
-
+By default, the MCP server exposes a **small, LLM-friendly tool set**:
 
-- **`
-- **`
-- **`
-- **`browser`**:
-- **`smart_scrape`**:
+- **`search_engine`**: single-query web search (`params.q`, optional `params.num`, `params.engine`).
+- **`search_engine_batch`**: batch web search with per-item `ok/error` in `results[]`.
+- **`unlocker`**: universal scraping via `fetch` / `batch_fetch`.
+- **`browser`**: `snapshot` with optional `url`, `max_items`, and `max_chars`.
+- **`smart_scrape`**: smart router for `url` with optional preview limit parameters.
 
-
+Advanced / internal tools (e.g. low-level `serp.*`, full `web_scraper.*` surfaces, proxy/account control plane) remain available via HTTP APIs and SDKs, but are not exposed directly as MCP tools to keep the surface manageable for agents and LLMs.
 
 ## 🏗️ Architecture
 
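The tool list in this hunk names the parameters each compact tool accepts (`params.q`, `params.num`, `params.engine` for `search_engine`; `url`, `max_items`, `max_chars` for the `browser` `snapshot` action). The sketch below builds argument dictionaries from those names. The envelope itself is an assumption: a `params` object, plus an `action` key for `browser`, is inferred from the `params.q` notation and the word "action" in the README. The helper names are hypothetical too. Check the server's published tool schemas before relying on this shape.

```python
from typing import Any, Dict, Optional


def search_engine_args(q: str, num: Optional[int] = None, engine: Optional[str] = None) -> Dict[str, Any]:
    """Build arguments for the `search_engine` tool (assumed {"params": {...}} envelope)."""
    if not q.strip():
        raise ValueError("q must be a non-empty query string")
    params: Dict[str, Any] = {"q": q}
    if num is not None:
        if num < 1:
            raise ValueError("num must be a positive integer")
        params["num"] = num
    if engine is not None:
        params["engine"] = engine
    return {"params": params}


def browser_snapshot_args(
    url: Optional[str] = None,
    max_items: Optional[int] = None,
    max_chars: Optional[int] = None,
) -> Dict[str, Any]:
    """Build arguments for the `browser` tool's `snapshot` action (assumed envelope)."""
    candidates = (("url", url), ("max_items", max_items), ("max_chars", max_chars))
    # Only send the optional keys the caller actually set.
    params = {key: value for key, value in candidates if value is not None}
    return {"action": "snapshot", "params": params}
```

The resulting dictionaries would be passed as the arguments of an MCP client's tool call. Omitting unset optional keys leaves the server's own defaults in effect.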
@@ -189,14 +164,14 @@ thordata_mcp/
 ├── utils.py # Common utilities (error handling, responses)
 ├── browser_session.py # Browser session management (Playwright)
 ├── aria_snapshot.py # ARIA snapshot filtering
-└── tools/
-
-
-
-
-
-
-
+└── tools/
+├── product_compact.py # Streamlined MCP entrypoint (search_engine / unlocker / browser / smart_scrape, plus batch variants)
+├── product.py # Full product implementation for internal use (reused by compact version)
+├── data/ # Data plane tools (only scraping-related namespaces retained)
+│ ├── serp.py # SERP backend integration
+│ ├── universal.py # Universal / Unlocker backend integration
+│ ├── browser.py # Browser / Playwright helpers
+│ └── tasks.py # Structured scraping tasks (used by smart_scrape and internal flows)
 ```
 
 ## 🎯 Design Principles
@@ -0,0 +1,26 @@
+thordata_mcp/__init__.py,sha256=EMlQmThUq4YcxjIewML7g2ZyT3W3TXaYGoVciewYbwg,61
+thordata_mcp/aria_snapshot.py,sha256=SW8d_MxudDUlrl0OPjfG1brg67WyC-I4fMnwPOpk9oU,3293
+thordata_mcp/browser_session.py,sha256=p3rM00kHcL8DVSBRoqZfwl8X8Kfi4sx6AsCX9ACi-z4,17654
+thordata_mcp/config.py,sha256=M5gTGrYdvKOlUCI6ZHoSMEWPWjAz7Onu5XXAnat9bts,2957
+thordata_mcp/context.py,sha256=rptQ55f7SP47tbHb9mrebkdrSvPm4gZaabSr15WpnLU,1304
+thordata_mcp/debug_http.py,sha256=6bioJS7M8luF2wr39cQeD9jqgYBPeTAPWxbing6JhD0,7045
+thordata_mcp/main.py,sha256=qcqTdF6YM0lwmp3Zj2R7vlA8LmczflykANCvcOKJZcQ,3743
+thordata_mcp/monitoring.py,sha256=JIFJa0mJb1f9tyzSyERVtOsm_YbKv3i9-E_8aBhI7Ho,7458
+thordata_mcp/registry.py,sha256=4Vp49bNB8BeBAFfyPJO9SPRU2A4Ez6EM5OClAEY1NYE,1339
+thordata_mcp/utils.py,sha256=c_U3jle0oLFOyBYz68ECZxUpRX2-NdyEk12LK1nLnH8,15997
+thordata_mcp/tools/__init__.py,sha256=6XOtYo7kqcHuBQLE0xZLGilP5NxNmZArIjy_tKIMep0,618
+thordata_mcp/tools/debug.py,sha256=Lunt2cm7HvAiQJK1T1damy2mVSVm1RUqr4cPTl8IXko,5468
+thordata_mcp/tools/params_utils.py,sha256=u3QCD9cS6p7APZxMFJM5gkfvQxTeWB_4d0mVWeiozwk,3483
+thordata_mcp/tools/product.py,sha256=sY39fAOSBLcErgyroN3x6W0dftVnYvHH6178DRz9c_4,75573
+thordata_mcp/tools/product_compact.py,sha256=d_WwW1dBDPOHM7E-WSq3eyVxfmZRElrMPM1EowiOXiU,105964
+thordata_mcp/tools/utils.py,sha256=BaD3150Dt_vhLDZliep-_LZaAiN7IAC3HPV2Db8WBZc,4538
+thordata_mcp/tools/data/__init__.py,sha256=KMD19WJlkPe3Y3e-U1v9IjAO3E_qEu2dtzS5yj_h1XU,398
+thordata_mcp/tools/data/browser.py,sha256=KVhNtYUpIV5FwE7uCpSbCM1TWMMezNk679UPaXbDp1Q,20307
+thordata_mcp/tools/data/serp.py,sha256=BdxVMiQ5WwHqK1sPIfk1Ehx9MCG-QFZKLu8ZlZE2IlA,3439
+thordata_mcp/tools/data/tasks.py,sha256=_spv5Uz3ZGdP4sSCrJT3nJg5UQILGxEijjK4Z-zJZVA,16274
+thordata_mcp/tools/data/universal.py,sha256=JUQUOwN2x_p7coRt71bCykQxOzZCnyiN4NvSq4dfZaI,9482
+thordata_mcp_server-0.5.0.dist-info/METADATA,sha256=keM0rodUXSyI35OaNCxF0IrHHPf6qhKM1NXuf6YR8zE,7666
+thordata_mcp_server-0.5.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
+thordata_mcp_server-0.5.0.dist-info/entry_points.txt,sha256=DI4aWMlUKBmTJTbNanK9m1LWoKeFvuLYS4sPd0n5GIQ,56
+thordata_mcp_server-0.5.0.dist-info/top_level.txt,sha256=9ODiyY_ikjIGCkF7_zk3HAd07KUkH63Uu7Ax-cz7DA8,13
+thordata_mcp_server-0.5.0.dist-info/RECORD,,
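Each RECORD row above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, and the row for RECORD itself leaves both fields empty. The following sketch, with hypothetical helper names, shows how a single row could be checked against a file's bytes.

```python
import base64
import csv
import hashlib
from typing import Optional, Tuple


def record_digest(data: bytes) -> str:
    """Compute the 'sha256=<urlsafe-base64-without-padding>' form used in RECORD."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str) -> Tuple[str, str, Optional[int]]:
    """Split one RECORD row into (path, digest, size); RECORD is CSV-formatted."""
    path, digest, size = next(csv.reader([line]))
    return path, digest, int(size) if size else None


def verify_entry(line: str, data: bytes) -> bool:
    """Return True if `data` matches the size and hash recorded in `line`."""
    _path, digest, size = parse_record_line(line)
    if not digest:
        return True  # the RECORD file lists itself without a hash
    return size == len(data) and record_digest(data) == digest
```

In practice `data` would be read from the matching member of the wheel archive, for example through `zipfile.ZipFile(...).read(path)`.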
@@ -1,24 +0,0 @@
-thordata_mcp/__init__.py,sha256=yPIlzOnjL77vd9e1iuXpJs7wUiGYSAnSZtrrdrdQlOg,61
-thordata_mcp/aria_snapshot.py,sha256=SW8d_MxudDUlrl0OPjfG1brg67WyC-I4fMnwPOpk9oU,3293
-thordata_mcp/browser_session.py,sha256=I804ixmgl2d4B4MC3bqVniBUlYZ5693V618mZ7tYGkU,11425
-thordata_mcp/config.py,sha256=HJzFZxRnhhu2wbl4xMgEJIFy4cifSMQ-TYyVGcyVSug,2655
-thordata_mcp/context.py,sha256=WmfoEbk_y3lxkssWIsDRpr00fwvuomzclskIUnMlQs0,1305
-thordata_mcp/debug_http.py,sha256=6bioJS7M8luF2wr39cQeD9jqgYBPeTAPWxbing6JhD0,7045
-thordata_mcp/main.py,sha256=qcqTdF6YM0lwmp3Zj2R7vlA8LmczflykANCvcOKJZcQ,3743
-thordata_mcp/monitoring.py,sha256=JIFJa0mJb1f9tyzSyERVtOsm_YbKv3i9-E_8aBhI7Ho,7458
-thordata_mcp/registry.py,sha256=4Vp49bNB8BeBAFfyPJO9SPRU2A4Ez6EM5OClAEY1NYE,1339
-thordata_mcp/utils.py,sha256=bxrQiJPJu1TvlOSQXUR-LFwFN5r-6qIrKLNEkSAgyIM,13251
-thordata_mcp/tools/__init__.py,sha256=6XOtYo7kqcHuBQLE0xZLGilP5NxNmZArIjy_tKIMep0,618
-thordata_mcp/tools/product.py,sha256=38Se8OAKLmuRm6bXyLS28PBOXa_LVJ-lYPg6u-cVUhM,72230
-thordata_mcp/tools/product_compact.py,sha256=mVOPP3Bde2TzD3m6PV3uHiDMdaqnYb5nhF2fSwIdURQ,48705
-thordata_mcp/tools/utils.py,sha256=76YoU2hk1G9GOOxQhvNwYK5mZsQKJT8DnKzhzu-b4r8,4375
-thordata_mcp/tools/data/__init__.py,sha256=KMD19WJlkPe3Y3e-U1v9IjAO3E_qEu2dtzS5yj_h1XU,398
-thordata_mcp/tools/data/browser.py,sha256=1NXFlqeMCCHWfmklCYAohcs9A_ceBpqO9f7ZNRJZR6c,16122
-thordata_mcp/tools/data/serp.py,sha256=BdxVMiQ5WwHqK1sPIfk1Ehx9MCG-QFZKLu8ZlZE2IlA,3439
-thordata_mcp/tools/data/tasks.py,sha256=_spv5Uz3ZGdP4sSCrJT3nJg5UQILGxEijjK4Z-zJZVA,16274
-thordata_mcp/tools/data/universal.py,sha256=JUQUOwN2x_p7coRt71bCykQxOzZCnyiN4NvSq4dfZaI,9482
-thordata_mcp_server-0.4.4.dist-info/METADATA,sha256=9_X3K4gIA2U641GWkvbbRpqBbEb51opEWD7pHZ2uJxA,8000
-thordata_mcp_server-0.4.4.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
-thordata_mcp_server-0.4.4.dist-info/entry_points.txt,sha256=DI4aWMlUKBmTJTbNanK9m1LWoKeFvuLYS4sPd0n5GIQ,56
-thordata_mcp_server-0.4.4.dist-info/top_level.txt,sha256=9ODiyY_ikjIGCkF7_zk3HAd07KUkH63Uu7Ax-cz7DA8,13
-thordata_mcp_server-0.4.4.dist-info/RECORD,,
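Because both RECORD files list a hash per path, the set of changed files can be recovered by comparing the two listings. The sketch below does this for a three-row excerpt copied from the RECORD entries of the two versions. The function names are hypothetical, and the excerpt leaves out the `.dist-info` rows, whose directory name differs between versions and would otherwise show up as added or removed.

```python
import csv
from typing import Dict, List

# Excerpt of the 0.4.4 RECORD (rows copied from this diff).
OLD_RECORD = """\
thordata_mcp/__init__.py,sha256=yPIlzOnjL77vd9e1iuXpJs7wUiGYSAnSZtrrdrdQlOg,61
thordata_mcp/aria_snapshot.py,sha256=SW8d_MxudDUlrl0OPjfG1brg67WyC-I4fMnwPOpk9oU,3293
"""

# Excerpt of the 0.5.0 RECORD (rows copied from this diff).
NEW_RECORD = """\
thordata_mcp/__init__.py,sha256=EMlQmThUq4YcxjIewML7g2ZyT3W3TXaYGoVciewYbwg,61
thordata_mcp/aria_snapshot.py,sha256=SW8d_MxudDUlrl0OPjfG1brg67WyC-I4fMnwPOpk9oU,3293
thordata_mcp/tools/debug.py,sha256=Lunt2cm7HvAiQJK1T1damy2mVSVm1RUqr4cPTl8IXko,5468
"""


def load_record(text: str) -> Dict[str, str]:
    """Map each path in a RECORD listing to its recorded digest."""
    return {row[0]: row[1] for row in csv.reader(text.splitlines()) if row}


def compare_records(old_text: str, new_text: str) -> Dict[str, List[str]]:
    """Classify paths as added, removed, or changed between two RECORD listings."""
    old, new = load_record(old_text), load_record(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


summary = compare_records(OLD_RECORD, NEW_RECORD)
```

For this excerpt, `summary` reports `thordata_mcp/tools/debug.py` as added and `thordata_mcp/__init__.py` as changed, which agrees with the per-file change counts listed at the top of the diff.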
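{thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/WHEEL
RENAMED
File without changes
{thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/entry_points.txt
RENAMED
File without changes
{thordata_mcp_server-0.4.4.dist-info → thordata_mcp_server-0.5.0.dist-info}/top_level.txt
RENAMED
File without changes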