thordata-sdk 1.3.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- thordata/__init__.py +151 -0
- thordata/_example_utils.py +77 -0
- thordata/_utils.py +190 -0
- thordata/async_client.py +1733 -0
- thordata/client.py +1721 -0
- thordata/demo.py +138 -0
- thordata/enums.py +384 -0
- thordata/exceptions.py +355 -0
- thordata/models.py +1197 -0
- thordata/retry.py +382 -0
- thordata/serp_engines.py +166 -0
- thordata_sdk-1.3.0.dist-info/METADATA +208 -0
- thordata_sdk-1.3.0.dist-info/RECORD +16 -0
- thordata_sdk-1.3.0.dist-info/WHEEL +5 -0
- thordata_sdk-1.3.0.dist-info/licenses/LICENSE +21 -0
- thordata_sdk-1.3.0.dist-info/top_level.txt +1 -0
|
@@ -0,0 +1,208 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: thordata-sdk
|
|
3
|
+
Version: 1.3.0
|
|
4
|
+
Summary: The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
|
|
5
|
+
Author-email: Thordata Developer Team <support@thordata.com>
|
|
6
|
+
License: MIT
|
|
7
|
+
Project-URL: Homepage, https://www.thordata.com
|
|
8
|
+
Project-URL: Documentation, https://github.com/Thordata/thordata-python-sdk#readme
|
|
9
|
+
Project-URL: Source, https://github.com/Thordata/thordata-python-sdk
|
|
10
|
+
Project-URL: Tracker, https://github.com/Thordata/thordata-python-sdk/issues
|
|
11
|
+
Project-URL: Changelog, https://github.com/Thordata/thordata-python-sdk/blob/main/CHANGELOG.md
|
|
12
|
+
Keywords: web scraping,proxy,residential proxy,datacenter proxy,ai,llm,data-mining,serp,thordata,web scraper,anti-bot bypass
|
|
13
|
+
Classifier: Development Status :: 4 - Beta
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
16
|
+
Classifier: Topic :: Internet :: WWW/HTTP
|
|
17
|
+
Classifier: Topic :: Internet :: Proxy Servers
|
|
18
|
+
Classifier: Programming Language :: Python :: 3
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
22
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
23
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
24
|
+
Classifier: Operating System :: OS Independent
|
|
25
|
+
Classifier: Typing :: Typed
|
|
26
|
+
Requires-Python: >=3.9
|
|
27
|
+
Description-Content-Type: text/markdown
|
|
28
|
+
License-File: LICENSE
|
|
29
|
+
Requires-Dist: requests>=2.25.0
|
|
30
|
+
Requires-Dist: aiohttp>=3.9.0
|
|
31
|
+
Requires-Dist: PySocks>=1.7.1
|
|
32
|
+
Provides-Extra: dev
|
|
33
|
+
Requires-Dist: pytest>=7.0.0; extra == "dev"
|
|
34
|
+
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
|
|
35
|
+
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
|
|
36
|
+
Requires-Dist: pytest-httpserver>=1.0.0; extra == "dev"
|
|
37
|
+
Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
|
|
38
|
+
Requires-Dist: black>=23.0.0; extra == "dev"
|
|
39
|
+
Requires-Dist: ruff>=0.1.0; extra == "dev"
|
|
40
|
+
Requires-Dist: mypy>=1.0.0; extra == "dev"
|
|
41
|
+
Requires-Dist: types-requests>=2.28.0; extra == "dev"
|
|
42
|
+
Requires-Dist: aioresponses>=0.7.6; extra == "dev"
|
|
43
|
+
Dynamic: license-file
|
|
44
|
+
|
|
45
|
+
# Thordata Python SDK
|
|
46
|
+
|
|
47
|
+
<div align="center">
|
|
48
|
+
|
|
49
|
+
<img src="https://img.shields.io/badge/Thordata-AI%20Infrastructure-blue?style=for-the-badge" alt="Thordata Logo">
|
|
50
|
+
|
|
51
|
+
**The Official Python Client for Thordata APIs**
|
|
52
|
+
|
|
53
|
+
*Proxy Network • SERP API • Web Unlocker • Web Scraper API*
|
|
54
|
+
|
|
55
|
+
|
|
56
|
+
|
|
57
|
+
|
|
58
|
+
|
|
59
|
+
|
|
60
|
+
</div>
|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## Introduction
|
|
65
|
+
|
|
66
|
+
This SDK provides a robust, high-performance interface to Thordata's AI data infrastructure. It is designed for high-concurrency scraping, reliable proxy tunneling, and seamless data extraction.
|
|
67
|
+
|
|
68
|
+
**Key Features:**
|
|
69
|
+
* **Production Ready:** Built on `urllib3` connection pooling for low-latency proxy requests.
|
|
70
|
+
* **⚡ Async Support:** Native `aiohttp` client for high-concurrency SERP/Universal scraping.
|
|
71
|
+
* **🛡️ Robust:** Handles TLS-in-TLS tunneling, retries, and error parsing automatically.
|
|
72
|
+
* **✨ Developer Experience:** Fully typed (`mypy` compatible) with intuitive IDE autocomplete.
|
|
73
|
+
* **🧩 Lazy Validation:** Only validate credentials for the features you actually use.
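
The "retries" item above can be pictured with a generic exponential-backoff loop. This is a minimal sketch under stated assumptions, not the contents of the SDK's `thordata/retry.py`: the names `backoff_delays` and `call_with_retry`, the default delays, and the choice of retryable exceptions are all hypothetical.

```python
import random
import time


def backoff_delays(retries, base=0.5, cap=8.0):
    """Upper bound of the delay before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def call_with_retry(func, retries=3, base=0.5, cap=8.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `func()`; on a retryable error, wait and try again up to `retries` times.

    Hypothetical helper for illustration only; the SDK's own retry logic may differ.
    """
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Full jitter: sleep a random time in [0, delay] to spread clients apart.
            sleep(random.uniform(0, delays[attempt]))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.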
|
|
74
|
+
|
|
75
|
+
---
|
|
76
|
+
|
|
77
|
+
## 📦 Installation
|
|
78
|
+
|
|
79
|
+
```bash
|
|
80
|
+
pip install thordata-sdk
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
---
|
|
84
|
+
|
|
85
|
+
## Configuration
|
|
86
|
+
|
|
87
|
+
Set environment variables to avoid hardcoding credentials. You only need to set the variables for the features you use.
|
|
88
|
+
|
|
89
|
+
```bash
|
|
90
|
+
# [Required for SERP & Web Unlocker]
|
|
91
|
+
export THORDATA_SCRAPER_TOKEN="your_token_here"
|
|
92
|
+
|
|
93
|
+
# [Required for Proxy Network]
|
|
94
|
+
export THORDATA_RESIDENTIAL_USERNAME="your_username"
|
|
95
|
+
export THORDATA_RESIDENTIAL_PASSWORD="your_password"
|
|
96
|
+
export THORDATA_PROXY_HOST="vpnXXXX.pr.thordata.net"
|
|
97
|
+
|
|
98
|
+
# [Required for Task Management]
|
|
99
|
+
export THORDATA_PUBLIC_TOKEN="public_token"
|
|
100
|
+
export THORDATA_PUBLIC_KEY="public_key"
|
|
101
|
+
```
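
Before constructing a client, it can help to check which of these variables are actually set for the feature you intend to use. The sketch below uses only the standard library and the variable names from the block above; the feature keys (`"scraper"`, `"proxy"`, `"tasks"`) and the helper `missing_vars` are assumptions made for illustration and are not part of the SDK.

```python
import os

# Variable names come from the configuration block above.
# The grouping keys are hypothetical labels chosen for this sketch.
REQUIRED_VARS = {
    "scraper": ["THORDATA_SCRAPER_TOKEN"],
    "proxy": [
        "THORDATA_RESIDENTIAL_USERNAME",
        "THORDATA_RESIDENTIAL_PASSWORD",
        "THORDATA_PROXY_HOST",
    ],
    "tasks": ["THORDATA_PUBLIC_TOKEN", "THORDATA_PUBLIC_KEY"],
}


def missing_vars(feature, env=None):
    """Return the variables for `feature` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[feature] if not env.get(name)]


if __name__ == "__main__":
    for feature in REQUIRED_VARS:
        missing = missing_vars(feature)
        state = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {state}")
```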
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Quick Start
|
|
106
|
+
|
|
107
|
+
### 1. SERP Search (Google/Bing/Yandex)
|
|
108
|
+
|
|
109
|
+
```python
|
|
110
|
+
from thordata import ThordataClient, Engine
|
|
111
|
+
|
|
112
|
+
client = ThordataClient() # Loads THORDATA_SCRAPER_TOKEN from env
|
|
113
|
+
|
|
114
|
+
# Simple Search
|
|
115
|
+
print("Searching...")
|
|
116
|
+
results = client.serp_search("latest AI trends", engine=Engine.GOOGLE_NEWS)
|
|
117
|
+
|
|
118
|
+
for news in results.get("news_results", [])[:3]:
|
|
119
|
+
    print(f"- {news['title']} ({news['source']})")
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
### 2. Universal Scrape (Web Unlocker)
|
|
123
|
+
|
|
124
|
+
Bypass Cloudflare/Akamai and render JavaScript automatically.
|
|
125
|
+
|
|
126
|
+
```python
|
|
127
|
+
html = client.universal_scrape(
|
|
128
|
+
    url="https://example.com/protected-page",
|
|
129
|
+
    js_render=True,
|
|
130
|
+
    wait_for=".content-loaded",
|
|
131
|
+
    country="us"
|
|
132
|
+
)
|
|
133
|
+
print(f"Scraped {len(html)} bytes")
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
### 3. High-Performance Proxy
|
|
137
|
+
|
|
138
|
+
Use Thordata's residential IPs with automatic connection pooling.
|
|
139
|
+
|
|
140
|
+
```python
|
|
141
|
+
from thordata import ProxyConfig, ProxyProduct
|
|
142
|
+
|
|
143
|
+
# Config is optional if env vars are set, but allows granular control
|
|
144
|
+
proxy = ProxyConfig(
|
|
145
|
+
    product=ProxyProduct.RESIDENTIAL,
|
|
146
|
+
    country="jp",
|
|
147
|
+
    city="tokyo",
|
|
148
|
+
    session_id="session-001",
|
|
149
|
+
    session_duration=10  # Sticky IP for 10 mins
|
|
150
|
+
)
|
|
151
|
+
|
|
152
|
+
# Use the client to make requests (Reuses TCP connections)
|
|
153
|
+
response = client.get("https://httpbin.org/ip", proxy_config=proxy)
|
|
154
|
+
print(response.json())
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
---
|
|
158
|
+
|
|
159
|
+
## Advanced Usage
|
|
160
|
+
|
|
161
|
+
### Async Client (High Concurrency)
|
|
162
|
+
|
|
163
|
+
For building AI agents or high-throughput spiders.
|
|
164
|
+
|
|
165
|
+
```python
|
|
166
|
+
import asyncio
|
|
167
|
+
from thordata import AsyncThordataClient
|
|
168
|
+
|
|
169
|
+
async def main():
|
|
170
|
+
    async with AsyncThordataClient() as client:
|
|
171
|
+
        # Fire off multiple requests in parallel
|
|
172
|
+
        tasks = [
|
|
173
|
+
            client.serp_search(f"query {i}")
|
|
174
|
+
            for i in range(5)
|
|
175
|
+
        ]
|
|
176
|
+
        results = await asyncio.gather(*tasks)
|
|
177
|
+
        print(f"Completed {len(results)} searches")
|
|
178
|
+
|
|
179
|
+
asyncio.run(main())
|
|
180
|
+
```
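
When the number of queries grows, launching everything at once with `asyncio.gather` may exceed rate limits. One common pattern is to bound the number of in-flight calls with `asyncio.Semaphore`. The sketch below illustrates the pattern with a stand-in coroutine, `fake_search`, instead of a real SDK call; `gather_limited` and `demo` are hypothetical names, and nothing here is taken from the SDK's implementation.

```python
import asyncio


async def gather_limited(coro_factories, limit=2):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results come back in the same order as the factories.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(f) for f in coro_factories))


async def fake_search(query):
    # Stand-in for a network call such as a SERP request (assumption).
    await asyncio.sleep(0.01)
    return {"query": query}


def demo():
    # `i=i` binds the loop variable at definition time.
    factories = [lambda i=i: fake_search(f"query {i}") for i in range(5)]
    return asyncio.run(gather_limited(factories, limit=2))


if __name__ == "__main__":
    print(f"Completed {len(demo())} searches")
```

Factories are passed rather than coroutine objects so that each call is only created once a semaphore slot is free.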
|
|
181
|
+
|
|
182
|
+
### Web Scraper API (Task Management)
|
|
183
|
+
|
|
184
|
+
Create and manage large-scale scraping tasks asynchronously.
|
|
185
|
+
|
|
186
|
+
```python
|
|
187
|
+
# 1. Create a task
|
|
188
|
+
task_id = client.create_scraper_task(
|
|
189
|
+
    file_name="daily_scrape",
|
|
190
|
+
    spider_id="universal",
|
|
191
|
+
    spider_name="universal",
|
|
192
|
+
    parameters={"url": "https://example.com"}
|
|
193
|
+
)
|
|
194
|
+
|
|
195
|
+
# 2. Wait for completion (Polling)
|
|
196
|
+
status = client.wait_for_task(task_id)
|
|
197
|
+
|
|
198
|
+
# 3. Get results
|
|
199
|
+
if status == "ready":
|
|
200
|
+
    url = client.get_task_result(task_id)
|
|
201
|
+
    print(f"Download Data: {url}")
|
|
202
|
+
```
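
The polling step above can be pictured as a loop with a fixed interval and an overall deadline. The following is a generic sketch, not the SDK's `wait_for_task`: the helper `poll_until`, its parameters, and the `"failed"` terminal state are assumptions; only the `"ready"` state appears in the example above.

```python
import time


def poll_until(get_status, done_states=("ready", "failed"), interval=2.0,
               timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call `get_status()` until it returns a terminal state or `timeout` elapses.

    Hypothetical helper: `get_status` is any zero-argument callable returning
    a status string, for example a wrapper around a task-status request.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} s")
        sleep(interval)


if __name__ == "__main__":
    # Simulated task that becomes ready on the third check.
    statuses = iter(["running", "running", "ready"])
    print(poll_until(lambda: next(statuses), interval=0.01))
```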
|
|
203
|
+
|
|
204
|
+
---
|
|
205
|
+
|
|
206
|
+
## License
|
|
207
|
+
|
|
208
|
+
MIT License. See [LICENSE](LICENSE) for details.
|
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
thordata/__init__.py,sha256=O9R2zY6qQXWIkQQ8Hcqqcwshymk4-2Bo_Pmm-Ma3SeI,3195
|
|
2
|
+
thordata/_example_utils.py,sha256=T9QtVq9BHhubOShgtGp2GSusYYd-ZFUJFJAw7ubIsa4,2199
|
|
3
|
+
thordata/_utils.py,sha256=Acr_6sHgdZXU7SQozd6FEYTZV6iHw__nlhpBTDwb66U,4917
|
|
4
|
+
thordata/async_client.py,sha256=9swh6AoAgvLRavFzXaM1rUz9Zm66r7GKjThjfILLiSI,58082
|
|
5
|
+
thordata/client.py,sha256=IvkPjm4v9ViiNMUM4uUbUOY3kNtJdw3mzSd3BT3yD0A,59039
|
|
6
|
+
thordata/demo.py,sha256=HQzgaUM33bWD7mBQ6HEkK5K6zqFnSAHLvaam6BwPgFA,3762
|
|
7
|
+
thordata/enums.py,sha256=MpZnS9_8sg2vtcFqM6UicB94cKZm5R1t83L3ejNSbLs,8502
|
|
8
|
+
thordata/exceptions.py,sha256=P9czrxkFhT439DxW3LE5W-koS595ObH4-mAQOfaDM18,9976
|
|
9
|
+
thordata/models.py,sha256=qtB7jE0v5zNEQfSpmOqdiacB5DgM2QfVR2PaYs-DisM,38206
|
|
10
|
+
thordata/retry.py,sha256=5kRwULl3X68Nx8PlSzr9benfyCL0nRSpVQXrwjWr45M,11456
|
|
11
|
+
thordata/serp_engines.py,sha256=iuMWncelcGOskCHXFzpcPMMTL5qfiLkazHB1uj3zpZo,5985
|
|
12
|
+
thordata_sdk-1.3.0.dist-info/licenses/LICENSE,sha256=bAxpWgQIzb-5jl3nhLdOwOJ_vlbHLtSG7yev2B7vioY,1088
|
|
13
|
+
thordata_sdk-1.3.0.dist-info/METADATA,sha256=x7p4JY94WCbvVmLiSaPDcUiKKiSw_4bRn3sLr1PRBPM,6600
|
|
14
|
+
thordata_sdk-1.3.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
|
|
15
|
+
thordata_sdk-1.3.0.dist-info/top_level.txt,sha256=Z8R_07m0lXCCSb1hapL9_nxMtyO3rf_9wOvq4n9u2Hg,9
|
|
16
|
+
thordata_sdk-1.3.0.dist-info/RECORD,,
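
Each RECORD row above is a CSV line of `path,hash,size`, where the hash is the SHA-256 digest encoded as URL-safe Base64 with the trailing `=` padding removed, and the RECORD file's own row leaves hash and size empty. A minimal sketch of checking a file against its row follows; the helper names `record_hash`, `parse_record_line`, and `matches` are hypothetical.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Hash field in wheel RECORD style: sha256=<urlsafe base64, no padding>."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str):
    """Split one RECORD row into (path, hash_field, size or None)."""
    path, hash_field, size = next(csv.reader([line]))
    return path, hash_field, int(size) if size else None


def matches(line: str, data: bytes) -> bool:
    """True if `data` has the hash and size recorded in `line`."""
    _, hash_field, size = parse_record_line(line)
    return hash_field == record_hash(data) and size == len(data)
```

To verify an installed file, read it in binary mode and pass its bytes to `matches` together with the corresponding row.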
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Thordata · AI Proxy & Web Data
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
thordata
|