thordata-sdk 0.3.1__py3-none-any.whl → 0.5.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- thordata/__init__.py +130 -11
- thordata/_utils.py +126 -0
- thordata/async_client.py +672 -185
- thordata/client.py +809 -300
- thordata/enums.py +301 -11
- thordata/exceptions.py +344 -0
- thordata/models.py +725 -0
- thordata/parameters.py +7 -6
- thordata/retry.py +380 -0
- thordata_sdk-0.5.0.dist-info/METADATA +896 -0
- thordata_sdk-0.5.0.dist-info/RECORD +14 -0
- thordata_sdk-0.5.0.dist-info/licenses/LICENSE +21 -0
- thordata_sdk-0.3.1.dist-info/METADATA +0 -200
- thordata_sdk-0.3.1.dist-info/RECORD +0 -10
- thordata_sdk-0.3.1.dist-info/licenses/LICENSE +0 -201
- {thordata_sdk-0.3.1.dist-info → thordata_sdk-0.5.0.dist-info}/WHEEL +0 -0
- {thordata_sdk-0.3.1.dist-info → thordata_sdk-0.5.0.dist-info}/top_level.txt +0 -0

@@ -0,0 +1,896 @@

Metadata-Version: 2.4
Name: thordata-sdk
Version: 0.5.0
Summary: The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
Author-email: Thordata Developer Team <support@thordata.com>
License: MIT
Project-URL: Homepage, https://www.thordata.com
Project-URL: Documentation, https://github.com/Thordata/thordata-python-sdk#readme
Project-URL: Source, https://github.com/Thordata/thordata-python-sdk
Project-URL: Tracker, https://github.com/Thordata/thordata-python-sdk/issues
Project-URL: Changelog, https://github.com/Thordata/thordata-python-sdk/blob/main/CHANGELOG.md
Keywords: web scraping,proxy,residential proxy,datacenter proxy,ai,llm,data-mining,serp,thordata,web scraper,anti-bot bypass
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: Proxy Servers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: aiohttp>=3.9.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-httpserver>=1.0.0; extra == "dev"
Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-requests>=2.28.0; extra == "dev"
Dynamic: license-file

# Thordata Python SDK

<div align="center">

**Official Python client for Thordata's Proxy Network, SERP API, Web Unlocker, and Web Scraper API.**

*Async-ready, type-safe, built for AI agents and large-scale data collection.*

[Documentation](https://doc.thordata.com) • [Dashboard](https://www.thordata.com) • [Examples](examples/) • [Changelog](CHANGELOG.md)

</div>

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| 🌐 **Proxy Network** | Residential, Mobile, Datacenter, and ISP proxies with geo-targeting |
| 🔍 **SERP API** | Search results from Google, Bing, Yandex, DuckDuckGo, and Baidu |
| 🔓 **Web Unlocker** | Automatically bypasses Cloudflare, CAPTCHAs, and other anti-bot systems |
| 🕷️ **Web Scraper** | Asynchronous, task-based scraping for complex sites |
| ⚡ **Async Support** | Full async/await support built on aiohttp |
| 🔄 **Auto Retry** | Configurable retries with exponential backoff |
| 📝 **Type Safe** | Full type annotations for IDE autocompletion |

---

## 📦 Installation

```bash
pip install thordata-sdk
```

For development (the quotes stop shells such as zsh from expanding the brackets):

```bash
pip install "thordata-sdk[dev]"
```

---

## 🚀 Quick Start

### Get Your Credentials

1. Sign up at [thordata.com](https://www.thordata.com)
2. Open your Dashboard
3. Copy your Scraper Token, Public Token, and Public Key

### Basic Usage

```python
from thordata import ThordataClient

# Initialize the client
client = ThordataClient(
    scraper_token="your_scraper_token",
    public_token="your_public_token",  # Optional; used by task and location APIs
    public_key="your_public_key",      # Optional; used by task and location APIs
)

# Make a request through the proxy network
response = client.get("https://httpbin.org/ip")
print(response.json())
# Example output: {'origin': '123.45.67.89'}  (a residential IP)
```

### Environment Variables

Create a `.env` file:

```env
THORDATA_SCRAPER_TOKEN=your_scraper_token
THORDATA_PUBLIC_TOKEN=your_public_token
THORDATA_PUBLIC_KEY=your_public_key
```

Then load it with python-dotenv:

```python
import os

from dotenv import load_dotenv
from thordata import ThordataClient

load_dotenv()  # Reads .env into os.environ

client = ThordataClient(
    scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"),
    public_token=os.getenv("THORDATA_PUBLIC_TOKEN"),
    public_key=os.getenv("THORDATA_PUBLIC_KEY"),
)
```
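
`os.getenv` returns `None` for a missing variable, so a typo in `.env` only shows up later as an authentication error. If you would rather fail fast, validate the environment before building the client. This is a minimal sketch: `load_thordata_credentials` is a hypothetical helper, not part of the SDK, and it treats only the scraper token as required.

```python
import os


def load_thordata_credentials(env=os.environ):
    """Read Thordata credentials from environment variables (hypothetical helper).

    Variable names match the .env example above; only the scraper token
    is required, the public token and key fall back to None.
    """
    scraper_token = env.get("THORDATA_SCRAPER_TOKEN")
    if not scraper_token:
        raise RuntimeError("THORDATA_SCRAPER_TOKEN is not set")
    return {
        "scraper_token": scraper_token,
        "public_token": env.get("THORDATA_PUBLIC_TOKEN"),
        "public_key": env.get("THORDATA_PUBLIC_KEY"),
    }
```

Usage: `client = ThordataClient(**load_thordata_credentials())`.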

---

## 📖 Usage Guide

### 1. Proxy Network

#### Basic Proxy Request

```python
from thordata import ThordataClient

client = ThordataClient(scraper_token="your_token")

# GET request through proxy
response = client.get("https://example.com")
print(response.text)

# POST request through proxy
response = client.post("https://httpbin.org/post", json={"key": "value"})
print(response.json())
```

#### Geo-Targeting

```python
from thordata import ThordataClient, ProxyConfig

client = ThordataClient(scraper_token="your_token")

# Create a proxy config with geo-targeting
config = ProxyConfig(
    username="your_username",
    password="your_password",
    country="us",         # Target country
    state="california",   # Target state
    city="los_angeles",   # Target city
)

response = client.get("https://httpbin.org/ip", proxy_config=config)
print(response.json())
```

#### Sticky Sessions

Keep the same IP for multiple requests:

```python
from thordata import ThordataClient, StickySession

client = ThordataClient(scraper_token="your_token")

# Create a sticky session (same IP for 10 minutes)
session = StickySession(
    username="your_username",
    password="your_password",
    country="gb",
    duration_minutes=10,
)

# All requests use the same IP
for i in range(5):
    response = client.get("https://httpbin.org/ip", proxy_config=session)
    print(f"Request {i + 1}: {response.json()['origin']}")
```
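
`StickySession` handles this for you. If you build a `ProxyConfig` directly instead, its `session_id` and `session_duration` fields (see the ProxyConfig table under Configuration Reference) play the same role. The sketch below generates a random session ID and enforces the documented 1-90 minute range. The ID length and alphabet are assumptions, and `new_session_id` and `sticky_params` are hypothetical helpers, not SDK functions.

```python
import secrets
import string

_ALPHABET = string.ascii_lowercase + string.digits


def new_session_id(length=10):
    """Return a random alphanumeric session ID (assumed format)."""
    return "".join(secrets.choice(_ALPHABET) for _ in range(length))


def sticky_params(session_duration, session_id=None):
    """Build session_id/session_duration values for a ProxyConfig.

    The 1-90 minute bound comes from the ProxyConfig table in this README.
    """
    if not 1 <= session_duration <= 90:
        raise ValueError("session_duration must be between 1 and 90 minutes")
    return {
        "session_id": session_id or new_session_id(),
        "session_duration": session_duration,
    }
```

Usage, assuming `ProxyConfig` accepts the fields listed in its table: `ProxyConfig(username="your_username", password="your_password", country="gb", **sticky_params(10))`. Reuse the same `session_id` across requests that should share an IP.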

#### Different Proxy Products

```python
from thordata import ProxyConfig, ProxyProduct

# Residential proxy (default, port 9999)
residential = ProxyConfig(
    username="user", password="pass",
    product=ProxyProduct.RESIDENTIAL,
)

# Mobile proxy (port 5555)
mobile = ProxyConfig(
    username="user", password="pass",
    product=ProxyProduct.MOBILE,
)

# Datacenter proxy (port 7777)
datacenter = ProxyConfig(
    username="user", password="pass",
    product=ProxyProduct.DATACENTER,
)
```

### 2. SERP API (Search Engine Results)

#### Basic Search

```python
from thordata import ThordataClient, Engine

client = ThordataClient(scraper_token="your_token")

# Google search
results = client.serp_search(
    query="python programming",
    engine=Engine.GOOGLE,
    num=10,
)

# Print organic results
for result in results.get("organic", []):
    print(f"{result['title']}: {result['link']}")
```

#### General Calling Pattern

```python
from thordata import ThordataClient, Engine

client = ThordataClient(scraper_token="YOUR_SCRAPER_TOKEN")

results = client.serp_search(
    query="pizza",
    engine=Engine.GOOGLE,  # or "google"
    num=10,
    country="us",
    language="en",
    search_type="news",  # corresponds to tbm=nws
    # Any other parameters are passed through **kwargs
    ibp="some_ibp_value",
    lsig="some_lsig_value",
)
```

**Note**: The SDK translates each argument above into the corresponding Thordata SERP API request parameter; extra keyword arguments are forwarded unchanged.

#### Advanced Search Options

```python
from thordata import ThordataClient, SerpRequest

client = ThordataClient(scraper_token="your_token")

# Create a detailed search request
request = SerpRequest(
    query="best laptops 2024",
    engine="google",
    num=20,
    country="us",
    language="en",
    search_type="shopping",  # shopping, news, images, videos
    time_filter="month",     # hour, day, week, month, year
    safe_search=True,
    device="mobile",         # desktop, mobile, tablet
)

results = client.serp_search_advanced(request)
```

#### Multiple Search Engines

```python
from thordata import ThordataClient, Engine

client = ThordataClient(scraper_token="your_token")

# Google
google_results = client.serp_search("AI news", engine=Engine.GOOGLE)

# Bing
bing_results = client.serp_search("AI news", engine=Engine.BING)

# Yandex (Russian search engine)
yandex_results = client.serp_search("AI news", engine=Engine.YANDEX)

# DuckDuckGo
ddg_results = client.serp_search("AI news", engine=Engine.DUCKDUCKGO)
```

---

## 🔧 SERP API Parameter Mapping

Thordata's SERP API supports multiple search engines and sub-features (Google Search, Shopping, News, etc.).
This SDK wraps the common parameters in `ThordataClient.serp_search` and `SerpRequest`; any other parameter can be passed directly through `**kwargs`.

### Google Search Parameter Mapping

| Document Parameter | SDK Field/Usage | Description |
|-------------------|-----------------|-------------|
| q | query | Search keyword |
| engine | engine | Engine.GOOGLE / "google" |
| google_domain | google_domain | e.g., "google.co.uk" |
| gl | country | Country/region, e.g., "us" |
| hl | language | Language, e.g., "en", "zh-CN" |
| cr | countries_filter | Multi-country filter, e.g., `"countryFR"`; join several values with `\|` |
| lr | languages_filter | Multi-language filter, e.g., `"lang_en\|lang_fr"` |
| location | location | Exact location, e.g., "India" |
| uule | uule | Base64-encoded location string |
| tbm | search_type | "images" → tbm=isch, "shopping" → tbm=shop, "news" → tbm=nws, "videos" → tbm=vid; other values are passed through as-is |
| start | start | Result offset for pagination |
| num | num | Number of results per page |
| ludocid | ludocid | Google Place ID |
| kgmid | kgmid | Google Knowledge Graph ID |
| ibp | ibp="..." (kwargs) | Passed through **kwargs |
| lsig | lsig="..." (kwargs) | Passed through **kwargs |
| si | si="..." (kwargs) | Passed through **kwargs |
| uds | uds="ADV" (kwargs) | Passed through **kwargs |
| tbs | time_filter or tbs="..." | time_filter="week" generates tbs=qdr:w; you can also pass a complete tbs value directly |
| safe | safe_search | True → safe=active, False → safe=off |
| nfpr | no_autocorrect | True → nfpr=1 |
| filter | filter_duplicates | True → filter=1, False → filter=0 |
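
To make the table concrete, here is a minimal re-implementation of its transformations as a plain function. It illustrates the documented mapping and is not the SDK's actual code. Assumptions: the `qdr` letters for hour, day, month, and year follow Google's usual convention (the table states only week → `qdr:w`), and flag values are sent as strings.

```python
# Illustrative sketch of the Google Search mapping table; not the SDK's code.
TBM = {"images": "isch", "shopping": "shop", "news": "nws", "videos": "vid"}
QDR = {"hour": "h", "day": "d", "week": "w", "month": "m", "year": "y"}


def google_params(query, country=None, language=None, search_type=None,
                  time_filter=None, safe_search=None, no_autocorrect=False,
                  filter_duplicates=None, **kwargs):
    params = {"engine": "google", "q": query}
    if country:
        params["gl"] = country
    if language:
        params["hl"] = language
    if search_type:
        # Known types map to tbm codes; anything else is passed through as-is
        params["tbm"] = TBM.get(search_type, search_type)
    if time_filter:
        # Raises KeyError for values outside hour/day/week/month/year
        params["tbs"] = f"qdr:{QDR[time_filter]}"
    if safe_search is not None:
        params["safe"] = "active" if safe_search else "off"
    if no_autocorrect:
        params["nfpr"] = "1"
    if filter_duplicates is not None:
        params["filter"] = "1" if filter_duplicates else "0"
    # Extra keyword arguments (ibp, lsig, si, uds, a raw tbs, ...) pass straight through
    params.update(kwargs)
    return params
```

For example, `google_params("pizza", country="us", search_type="news", time_filter="week")` produces `tbm=nws` and `tbs=qdr:w`. Because `kwargs` are applied last, a raw `tbs="..."` overrides the value derived from `time_filter`.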

**Example: Basic Google Search**

```python
results = client.serp_search(
    query="python web scraping best practices",
    engine=Engine.GOOGLE,
    country="us",
    language="en",
    num=10,
    time_filter="week",  # Last week
    safe_search=True,    # Adult content filter
)
```

### Google Shopping Parameter Mapping

Shopping also uses `engine="google"`; pass `search_type="shopping"` to select Shopping mode:

```python
results = client.serp_search(
    query="iPhone 15",
    engine=Engine.GOOGLE,
    search_type="shopping",  # tbm=shop
    country="us",
    language="en",
    num=20,
    # The parameters below are passed through **kwargs
    min_price=500,
    max_price=1500,
    sort_by=1,  # 1 = price low to high, 2 = high to low
    free_shipping=True,
    on_sale=True,
    small_business=True,
    direct_link=True,
    shoprs="FILTER_ID_HERE",
)
shopping_items = results.get("shopping_results", [])
```

| Document Parameter | SDK Field/Usage | Description |
|-------------------|-----------------|-------------|
| q | query | Search keyword |
| google_domain | google_domain | Google domain, as in Google Search |
| gl | country | Country/region |
| hl | language | Language |
| location | location | Exact location |
| uule | uule | Encoded location |
| start | start | Result offset |
| num | num | Number of results |
| tbs | time_filter or tbs="..." | Time and advanced filtering |
| shoprs | shoprs="..." (kwargs) | Filter ID |
| min_price | min_price=... (kwargs) | Minimum price |
| max_price | max_price=... (kwargs) | Maximum price |
| sort_by | sort_by=1/2 (kwargs) | Sort order: 1 = price low to high, 2 = high to low |
| free_shipping | free_shipping=True/False (kwargs) | Free-shipping filter |
| on_sale | on_sale=True/False (kwargs) | On-sale filter |
| small_business | small_business=True/False (kwargs) | Small-business filter |
| direct_link | direct_link=True/False (kwargs) | Include direct links |

### Google Local Parameter Mapping

Google Local covers location-based searches.
In the SDK, set `search_type="local"` to select Local mode (the value is passed through as `tbm=local`), and combine it with `location` and `uule`.

```python
results = client.serp_search(
    query="pizza near me",
    engine=Engine.GOOGLE,
    search_type="local",
    google_domain="google.com",
    country="us",
    language="en",
    location="San Francisco",
    uule="w+CAIQICIFU2FuIEZyYW5jaXNjbw",  # Example value
    start=0,  # Local only accepts 0, 20, 40, ...
)
local_results = results.get("local_results", results.get("organic", []))
```
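
Because Local accepts only multiples of 20 for `start`, paging is easiest when you generate the offsets instead of hand-typing them. This is a minimal sketch; `local_offsets` and `check_local_start` are hypothetical helpers built on the "0, 20, 40, ..." rule above.

```python
LOCAL_PAGE_SIZE = 20  # Google Local start offsets step by 20


def local_offsets(pages):
    """Return start offsets for the first `pages` pages of Local results."""
    if pages < 1:
        raise ValueError("pages must be >= 1")
    return [page * LOCAL_PAGE_SIZE for page in range(pages)]


def check_local_start(start):
    """Reject offsets that Google Local would not accept."""
    if start < 0 or start % LOCAL_PAGE_SIZE:
        raise ValueError(f"Google Local start must be 0, 20, 40, ...; got {start}")
    return start
```

Usage: `for start in local_offsets(3): client.serp_search(query="pizza near me", engine=Engine.GOOGLE, search_type="local", start=start)` fetches the first three pages.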

| Document Parameter | SDK Field/Usage | Description |
|-------------------|-----------------|-------------|
| q | query | Search term |
| google_domain | google_domain | Domain |
| gl | country | Country |
| hl | language | Language |
| location | location | Local location |
| uule | uule | Encoded location |
| start | start | Offset (must be 0, 20, 40, ...) |
| ludocid | ludocid | Place ID (commonly used in Local results) |
| tbs | time_filter or tbs="..." | Advanced filtering |

### Google Videos Parameter Mapping

```python
results = client.serp_search(
    query="python async tutorial",
    engine=Engine.GOOGLE,
    search_type="videos",  # tbm=vid
    country="us",
    language="en",
    languages_filter="lang_en|lang_fr",
    location="United States",
    uule="ENCODED_LOCATION_HERE",
    num=10,
    time_filter="month",
    safe_search=True,
    filter_duplicates=True,
)
video_results = results.get("video_results", results.get("organic", []))
```

| Document Parameter | SDK Field/Usage | Description |
|-------------------|-----------------|-------------|
| q | query | Search term |
| google_domain | google_domain | Domain |
| gl | country | Country |
| hl | language | Language |
| lr | languages_filter | Multi-language filter |
| location | location | Geographic location |
| uule | uule | Encoded location |
| start | start | Result offset |
| num | num | Number of results |
| tbs | time_filter or tbs="..." | Time and advanced filtering |
| safe | safe_search | Adult content filter |
| nfpr | no_autocorrect | Disable auto-correction |
| filter | filter_duplicates | Remove duplicates |

### Google News Parameter Mapping

Google News accepts a set of News-specific token parameters that restrict results to a particular topic, publication, section, or story.

```python
results = client.serp_search(
    query="AI regulation",
    engine=Engine.GOOGLE,
    search_type="news",  # tbm=nws
    country="us",
    language="en",
    topic_token="YOUR_TOPIC_TOKEN",              # Optional
    publication_token="YOUR_PUBLICATION_TOKEN",  # Optional
    section_token="YOUR_SECTION_TOKEN",          # Optional
    story_token="YOUR_STORY_TOKEN",              # Optional
    so=1,  # 0 = relevance, 1 = time
)
news_results = results.get("news_results", results.get("organic", []))
```

| Document Parameter | SDK Field/Usage | Description |
|-------------------|-----------------|-------------|
| q | query | Search term |
| gl | country | Country |
| hl | language | Language |
| topic_token | topic_token="..." (kwargs) | Topic token |
| publication_token | publication_token="..." (kwargs) | Publication (media outlet) token |
| section_token | section_token="..." (kwargs) | Section token |
| story_token | story_token="..." (kwargs) | Story token |
| so | so=0/1 (kwargs) | Sort: 0 = relevance, 1 = time |

---

👉 For more SERP modes and parameter mappings, see [docs/serp_reference.md](docs/serp_reference.md).

## 🔓 Web Unlocker (Universal Scraping API)

Automatically bypass anti-bot protections:

#### Basic Usage

```python
from thordata import ThordataClient

client = ThordataClient(scraper_token="your_token")

# Get HTML content
html = client.universal_scrape(
    url="https://example.com",
    js_render=True,  # Enable JavaScript rendering
)
print(html[:500])
```

#### Advanced Options

```python
from thordata import ThordataClient, UniversalScrapeRequest

client = ThordataClient(scraper_token="your_token")

request = UniversalScrapeRequest(
    url="https://example.com",
    js_render=True,
    output_format="html",
    country="us",
    block_resources="image,font",  # Speed up by blocking resources
    clean_content="js,css",        # Remove JS/CSS from output
    wait=5000,                     # Wait 5 seconds (5000 ms) after load
    wait_for=".content-loaded",    # Wait for a CSS selector
    headers=[
        {"name": "Accept-Language", "value": "en-US"},
    ],
    cookies=[
        {"name": "session", "value": "abc123"},
    ],
)

html = client.universal_scrape_advanced(request)
```
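
`headers` and `cookies` take lists of `{"name": ..., "value": ...}` objects. If you already hold them as a dict, or as a raw `Cookie` header copied from a browser, a small converter saves some typing. Both helpers below are hypothetical conveniences rather than SDK functions; the cookie parser uses the standard library's `http.cookies.SimpleCookie`.

```python
from http.cookies import SimpleCookie


def to_name_value_list(mapping):
    """Convert {"Accept-Language": "en-US"} into [{"name": ..., "value": ...}]."""
    return [{"name": str(name), "value": str(value)} for name, value in mapping.items()]


def cookies_from_header(header):
    """Parse a raw "a=1; b=2" Cookie header into the same name/value shape."""
    jar = SimpleCookie()
    jar.load(header)
    return [{"name": morsel.key, "value": morsel.value} for morsel in jar.values()]
```

Usage: `UniversalScrapeRequest(url=..., headers=to_name_value_list({"Accept-Language": "en-US"}), cookies=cookies_from_header("session=abc123"))`.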

#### Take Screenshots

```python
from thordata import ThordataClient

client = ThordataClient(scraper_token="your_token")

# Get PNG screenshot
png_bytes = client.universal_scrape(
    url="https://example.com",
    js_render=True,
    output_format="png",
)

# Save to file
with open("screenshot.png", "wb") as f:
    f.write(png_bytes)
```
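
This section does not say what `universal_scrape` returns when a screenshot fails. As a defensive sketch, you can check the standard 8-byte PNG file signature before writing, so an error page never ends up saved as `screenshot.png`. `save_png` is a hypothetical helper.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # First 8 bytes of every PNG file


def save_png(data, path):
    """Write data to path only if it looks like a PNG; return bytes written."""
    if not isinstance(data, (bytes, bytearray)) or not data.startswith(PNG_SIGNATURE):
        preview = data[:80] if data else data
        raise ValueError(f"Expected PNG bytes, got: {preview!r}")
    Path(path).write_bytes(data)
    return len(data)
```

Usage: `save_png(png_bytes, "screenshot.png")` can replace the `with open(...)` block above.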

### Web Scraper API (Async Tasks)

For complex scraping jobs that run asynchronously:

```python
from thordata import ThordataClient

client = ThordataClient(
    scraper_token="your_token",
    public_token="your_public_token",
    public_key="your_public_key",
)

# Create a scraping task
task_id = client.create_scraper_task(
    file_name="youtube_channel_data",
    spider_id="youtube_video-post_by-url",  # From Dashboard
    spider_name="youtube.com",
    parameters={
        "url": "https://www.youtube.com/@PewDiePie/videos",
        "num_of_posts": "50",
    },
)
print(f"Task created: {task_id}")

# Wait for completion (with timeout)
status = client.wait_for_task(task_id, max_wait=300)
print(f"Task status: {status}")

# Get results
if status in ("ready", "success"):
    download_url = client.get_task_result(task_id)
    print(f"Download: {download_url}")
```
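
`wait_for_task` does the waiting for you. If you need different behavior, such as another poll interval, your own set of terminal statuses, or a status source of your own, the underlying pattern looks roughly like the sketch below. `poll_until` is hypothetical and is not the SDK's implementation. The `"ready"`/`"success"` statuses come from the example above; failure status names are not listed here, so the caller supplies the full terminal set.

```python
import time


def poll_until(get_status, done_statuses, max_wait=300.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call get_status() until it returns a value in done_statuses.

    Raises TimeoutError once max_wait seconds have elapsed; returns the final status.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait
    status = get_status()
    while status not in done_statuses:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"Task still {status!r} after {max_wait} s")
        sleep(min(interval, remaining))  # Never sleep past the deadline
        status = get_status()
    return status
```

Pass any zero-argument callable that returns the task's current status string.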

### Async Client (High Concurrency)

For maximum performance with concurrent requests:

```python
import asyncio

from thordata import AsyncThordataClient


async def main():
    async with AsyncThordataClient(
        scraper_token="your_token",
        public_token="your_public_token",
        public_key="your_public_key",
    ) as client:

        # Concurrent proxy requests
        urls = [
            "https://httpbin.org/ip",
            "https://httpbin.org/headers",
            "https://httpbin.org/user-agent",
        ]

        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)

        for resp in responses:
            print(await resp.json())


asyncio.run(main())
```
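
`asyncio.gather` starts every coroutine at once. That is fine for three URLs, but with thousands you will usually want to cap the number of in-flight requests to keep local sockets and any rate limits under control. A common pattern uses an `asyncio.Semaphore`. `gather_limited` is a hypothetical helper, and its default limit of 10 is arbitrary, not a Thordata recommendation.

```python
import asyncio


async def gather_limited(coro_factories, limit=10):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results are returned in the same order as the input factories.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            # The coroutine is only created once a slot is free
            return await factory()

    return await asyncio.gather(*(run(f) for f in coro_factories))
```

Usage inside `main()`: `responses = await gather_limited([lambda u=u: client.get(u) for u in urls], limit=20)`. The `u=u` default binds each URL to its own lambda.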

#### Async SERP Search

```python
import asyncio

from thordata import AsyncThordataClient, Engine


async def search_multiple():
    async with AsyncThordataClient(scraper_token="your_token") as client:
        queries = ["python", "javascript", "rust", "go"]

        tasks = [
            client.serp_search(q, engine=Engine.GOOGLE)
            for q in queries
        ]

        results = await asyncio.gather(*tasks)

        for query, result in zip(queries, results):
            count = len(result.get("organic", []))
            print(f"{query}: {count} results")


asyncio.run(search_multiple())
```
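
By default, `asyncio.gather` re-raises the first exception it sees, so one failed query (a rate limit, say) prevents you from collecting the others' results. Passing `return_exceptions=True` keeps every outcome instead. The hypothetical helper below sorts the outcomes into successes and failures, keyed by name.

```python
import asyncio


async def gather_settled(named_coros):
    """Await a dict of name -> coroutine and split the outcomes.

    Returns (ok, failed): two dicts keyed by the same names, holding
    results and exceptions respectively.
    """
    names = list(named_coros)
    outcomes = await asyncio.gather(*named_coros.values(), return_exceptions=True)
    ok, failed = {}, {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failed[name] = outcome
        else:
            ok[name] = outcome
    return ok, failed
```

Usage inside `search_multiple()`: `ok, failed = await gather_settled({q: client.serp_search(q, engine=Engine.GOOGLE) for q in queries})`. You can then retry only the keys in `failed`.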

### Location APIs

Discover available geo-targeting options:

```python
from thordata import ThordataClient, ProxyType

client = ThordataClient(
    scraper_token="your_token",
    public_token="your_public_token",
    public_key="your_public_key",
)

# List all supported countries
countries = client.list_countries(proxy_type=ProxyType.RESIDENTIAL)
print(f"Supported countries: {len(countries)}")

# List states for a country
states = client.list_states("US")
for state in states[:5]:
    print(f"  {state['state_code']}: {state['state_name']}")

# List cities
cities = client.list_cities("US", state_code="california")
print(f"Cities in California: {len(cities)}")

# List ASNs (for ISP targeting)
asns = client.list_asn("US")
for asn in asns[:5]:
    print(f"  {asn['asn_code']}: {asn['asn_name']}")
```
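
As the example above shows, the location APIs return lists of dicts (`state_code`/`state_name`, `asn_code`/`asn_name`), so you can filter them locally without extra API calls. For instance, to find an ASN by provider name before setting `asn` on a `ProxyConfig`, you could use a sketch like this. `find_by_name` is a hypothetical helper.

```python
def find_by_name(items, name_key, needle):
    """Case-insensitive substring search over a list of dicts.

    Items missing name_key are treated as having an empty name.
    """
    needle = needle.casefold()
    return [item for item in items if needle in str(item.get(name_key, "")).casefold()]
```

Usage: `matches = find_by_name(asns, "asn_name", "comcast")`, then take `matches[0]["asn_code"]`.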

### Error Handling

```python
from thordata import (
    ThordataClient,
    ThordataError,
    ThordataAuthError,
    ThordataRateLimitError,
    ThordataNetworkError,
    ThordataTimeoutError,
)

client = ThordataClient(scraper_token="your_token")

try:
    result = client.serp_search("test query")
except ThordataAuthError as e:
    print(f"Authentication failed: {e}")
    print(f"Check your token. Status code: {e.status_code}")
except ThordataRateLimitError as e:
    print(f"Rate limited: {e}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")
except ThordataTimeoutError as e:
    print(f"Request timed out: {e}")
except ThordataNetworkError as e:
    print(f"Network error: {e}")
except ThordataError as e:
    # Catch-all for any other SDK error; keep the most general clause last
    print(f"General error: {e}")
```
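
If you handle rate limits yourself (for example, around a long batch job), you can honor the `retry_after` hint shown above rather than retrying immediately. This is a minimal sketch: `call_respecting_retry_after` is hypothetical, and you pass the exception class to catch (for example `ThordataRateLimitError`). This section does not say whether the client's built-in retries already cover rate-limit responses, so check before stacking your own retries on top.

```python
import time


def call_respecting_retry_after(fn, rate_limit_exc, max_attempts=3,
                                default_wait=1.0, sleep=time.sleep):
    """Call fn(); on rate_limit_exc, wait retry_after seconds (or default_wait) and retry.

    Re-raises the last exception once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except rate_limit_exc as exc:
            if attempt == max_attempts:
                raise
            # Fall back to default_wait when retry_after is missing or falsy
            sleep(getattr(exc, "retry_after", None) or default_wait)
```

Usage: `result = call_respecting_retry_after(lambda: client.serp_search("test query"), ThordataRateLimitError)`.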

### Retry Configuration

Customize automatic retry behavior:

```python
from thordata import ThordataClient, RetryConfig

# Custom retry configuration
retry_config = RetryConfig(
    max_retries=5,       # Maximum retry attempts
    backoff_factor=2.0,  # Exponential backoff multiplier
    max_backoff=120.0,   # Maximum wait between retries
    jitter=True,         # Add randomness to prevent thundering herd
)

client = ThordataClient(
    scraper_token="your_token",
    retry_config=retry_config,
)

# Requests will automatically retry on transient failures
response = client.get("https://example.com")
```
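
To see what these settings mean in wall-clock time, the sketch below computes a retry schedule. The parameter names mirror `RetryConfig`, but the formula is an assumption: a common exponential-backoff-with-full-jitter scheme, not necessarily what `thordata.retry` implements. `base_delay` is likewise a hypothetical parameter.

```python
import random


def backoff_delays(max_retries=5, backoff_factor=2.0, max_backoff=120.0,
                   jitter=True, base_delay=1.0, rng=random.random):
    """Return the wait in seconds before each retry.

    Assumed formula: base_delay * backoff_factor ** attempt, capped at
    max_backoff. With jitter, each delay is scaled by a random factor in
    [0, 1) ("full jitter"), which spreads out clients that failed together.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(max_backoff, base_delay * backoff_factor ** attempt)
        if jitter:
            delay *= rng()
        delays.append(delay)
    return delays
```

Under these assumptions, with jitter off, `backoff_delays(jitter=False)` gives `[1.0, 2.0, 4.0, 8.0, 16.0]`, about 31 seconds in total before the last attempt. The 120-second cap only takes effect from the eighth retry onward.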

---

## 🔧 Configuration Reference

### ThordataClient Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| scraper_token | str | required | API token from Dashboard |
| public_token | str | None | Public API token (for tasks/locations) |
| public_key | str | None | Public API key |
| proxy_host | str | "pr.thordata.net" | Proxy gateway host |
| proxy_port | int | 9999 | Proxy gateway port |
| timeout | int | 30 | Default request timeout (seconds) |
| retry_config | RetryConfig | None | Retry configuration |

### ProxyConfig Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| username | str | required | Proxy username |
| password | str | required | Proxy password |
| product | ProxyProduct | RESIDENTIAL | Proxy type |
| country | str | None | ISO 3166-1 alpha-2 code |
| state | str | None | State name (lowercase) |
| city | str | None | City name (lowercase) |
| continent | str | None | Continent code (af/an/as/eu/na/oc/sa) |
| asn | str | None | ASN code (requires country) |
| session_id | str | None | Session ID for sticky sessions |
| session_duration | int | None | Session duration (1-90 minutes) |
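
Several fields in this table carry constraints: a two-letter country code, the seven continent codes, `asn` requiring `country`, and the 1-90 minute session range. Checking them locally gives clearer errors before a request goes out. This is a hypothetical pre-flight check written from the table, not validation the SDK is documented to perform.

```python
CONTINENTS = {"af", "an", "as", "eu", "na", "oc", "sa"}


def validate_proxy_targeting(country=None, continent=None, asn=None,
                             session_duration=None):
    """Check the ProxyConfig table's constraints; return a list of problems."""
    problems = []
    if country is not None and (len(country) != 2 or not country.isalpha()):
        problems.append(f"country must be an ISO 3166-1 alpha-2 code, got {country!r}")
    if continent is not None and continent.lower() not in CONTINENTS:
        problems.append(f"unknown continent code {continent!r}")
    if asn is not None and country is None:
        problems.append("asn requires country")
    if session_duration is not None and not 1 <= session_duration <= 90:
        problems.append("session_duration must be 1-90 minutes")
    return problems
```

An empty list means the values pass these checks; otherwise, raise or log the returned messages before building the `ProxyConfig`.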

### Proxy Products & Ports

| Product | Port | Description |
|---------|------|-------------|
| RESIDENTIAL | 9999 | Rotating residential IPs |
| MOBILE | 5555 | Mobile carrier IPs |
| DATACENTER | 7777 | Datacenter IPs |
| ISP | 6666 | Static ISP IPs |
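
Outside the SDK (for example, with a plain HTTP client), this table translates into a standard proxy URL. This is a minimal sketch under assumptions not stated in this README: every product is reached through the default gateway host `pr.thordata.net` on its listed port, and the gateway accepts ordinary HTTP proxying with username/password authentication. Geo-targeting or session options that may be encoded in the username are not covered, because their format is not documented here. `proxy_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Ports from the table above; the host is ThordataClient's default proxy_host
PROXY_PORTS = {"RESIDENTIAL": 9999, "MOBILE": 5555, "DATACENTER": 7777, "ISP": 6666}
DEFAULT_HOST = "pr.thordata.net"


def proxy_url(username, password, product="RESIDENTIAL", host=DEFAULT_HOST):
    """Build an http://user:pass@host:port proxy URL.

    Credentials are percent-encoded so characters like '@' or ':' cannot
    break the URL.
    """
    port = PROXY_PORTS[product.upper()]
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

The resulting URL can be passed to `urllib.request.ProxyHandler({"http": url, "https": url})`, or as the `proxies` argument of `requests`.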

---

## 📁 Project Structure

```
thordata-python-sdk/
├── src/thordata/
│   ├── __init__.py        # Public API exports
│   ├── client.py          # Sync client
│   ├── async_client.py    # Async client
│   ├── models.py          # Data models (ProxyConfig, SerpRequest, etc.)
│   ├── enums.py           # Enumerations
│   ├── exceptions.py      # Exception hierarchy
│   ├── retry.py           # Retry mechanism
│   └── _utils.py          # Internal utilities
├── tests/                 # Test suite
├── examples/              # Usage examples
├── pyproject.toml         # Package configuration
└── README.md
```

---

## 🧪 Development

### Setup

```bash
# Clone the repository
git clone https://github.com/Thordata/thordata-python-sdk.git
cd thordata-python-sdk

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install with dev dependencies
pip install -e ".[dev]"
```

### Run Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=thordata --cov-report=html

# Run specific test file
pytest tests/test_client.py -v
```

### Code Quality

```bash
# Format code
black src tests

# Lint
ruff check src tests

# Type check
mypy src
```

---

## 📝 Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

---

## 🤝 Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🆘 Support

- 📧 **Email**: support@thordata.com
- 📚 **Documentation**: [doc.thordata.com](https://doc.thordata.com)
- 🐛 **Issues**: [GitHub Issues](https://github.com/Thordata/thordata-python-sdk/issues)
- 💬 **Dashboard**: [thordata.com](https://www.thordata.com)

<div align="center">
<sub>Built with ❤️ by the Thordata Team</sub>
</div>