thordata-sdk 0.5.0__py3-none-any.whl → 0.7.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,896 +1,1053 @@
1
- Metadata-Version: 2.4
2
- Name: thordata-sdk
3
- Version: 0.5.0
4
- Summary: The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
5
- Author-email: Thordata Developer Team <support@thordata.com>
6
- License: MIT
7
- Project-URL: Homepage, https://www.thordata.com
8
- Project-URL: Documentation, https://github.com/Thordata/thordata-python-sdk#readme
9
- Project-URL: Source, https://github.com/Thordata/thordata-python-sdk
10
- Project-URL: Tracker, https://github.com/Thordata/thordata-python-sdk/issues
11
- Project-URL: Changelog, https://github.com/Thordata/thordata-python-sdk/blob/main/CHANGELOG.md
12
- Keywords: web scraping,proxy,residential proxy,datacenter proxy,ai,llm,data-mining,serp,thordata,web scraper,anti-bot bypass
13
- Classifier: Development Status :: 4 - Beta
14
- Classifier: Intended Audience :: Developers
15
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
16
- Classifier: Topic :: Internet :: WWW/HTTP
17
- Classifier: Topic :: Internet :: Proxy Servers
18
- Classifier: Programming Language :: Python :: 3
19
- Classifier: Programming Language :: Python :: 3.9
20
- Classifier: Programming Language :: Python :: 3.10
21
- Classifier: Programming Language :: Python :: 3.11
22
- Classifier: Programming Language :: Python :: 3.12
23
- Classifier: License :: OSI Approved :: MIT License
24
- Classifier: Operating System :: OS Independent
25
- Classifier: Typing :: Typed
26
- Requires-Python: >=3.9
27
- Description-Content-Type: text/markdown
28
- License-File: LICENSE
29
- Requires-Dist: requests>=2.25.0
30
- Requires-Dist: aiohttp>=3.9.0
31
- Provides-Extra: dev
32
- Requires-Dist: pytest>=7.0.0; extra == "dev"
33
- Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
34
- Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
35
- Requires-Dist: pytest-httpserver>=1.0.0; extra == "dev"
36
- Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
37
- Requires-Dist: black>=23.0.0; extra == "dev"
38
- Requires-Dist: ruff>=0.1.0; extra == "dev"
39
- Requires-Dist: mypy>=1.0.0; extra == "dev"
40
- Requires-Dist: types-requests>=2.28.0; extra == "dev"
41
- Dynamic: license-file
42
-
43
- # Thordata Python SDK
44
-
45
- <div align="center">
46
-
47
- **Official Python client for Thordata's Proxy Network, SERP API, Web Unlocker, and Web Scraper API.**
48
-
49
- *Async-ready, type-safe, built for AI agents and large-scale data collection.*
50
-
51
- [![CI](https://github.com/Thordata/thordata-python-sdk/actions/workflows/ci.yml/badge.svg)](https://github.com/Thordata/thordata-python-sdk/actions/workflows/ci.yml)
52
- [![PyPI version](https://img.shields.io/pypi/v/thordata-sdk?color=blue)](https://pypi.org/project/thordata-sdk/)
53
- [![Python](https://img.shields.io/badge/python-3.8+-blue)](https://python.org)
54
- [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
55
- [![Typed](https://img.shields.io/badge/typing-typed-purple)](https://github.com/Thordata/thordata-python-sdk)
56
-
57
- [Documentation](https://doc.thordata.com) • [Dashboard](https://www.thordata.com) • [Examples](examples/) • [Changelog](CHANGELOG.md)
58
-
59
- </div>
60
-
61
- ---
62
-
63
- ## ✨ Features
64
-
65
- | Feature | Description |
66
- |---------|-------------|
67
- | 🌐 **Proxy Network** | Residential, Mobile, Datacenter, ISP proxies with geo-targeting |
68
- | 🔍 **SERP API** | Google, Bing, Yandex, DuckDuckGo, Baidu search results |
69
- | 🔓 **Web Unlocker** | Bypass Cloudflare, CAPTCHAs, anti-bot systems automatically |
70
- | 🕷️ **Web Scraper** | Async task-based scraping for complex sites |
71
- | ⚡ **Async Support** | Full async/await support with aiohttp |
72
- | 🔄 **Auto Retry** | Configurable retry with exponential backoff |
73
- | 📝 **Type Safe** | Full type annotations for IDE autocomplete |
74
-
75
- ---
76
-
77
- ## 📦 Installation
78
-
79
- ```bash
80
- pip install thordata-sdk
81
- ```
82
-
83
- For development:
84
-
85
- ```bash
86
- pip install thordata-sdk[dev]
87
- ```
88
-
89
- ---
90
-
91
- ## 🚀 Quick Start
92
-
93
- ### Get Your Credentials
94
-
95
- 1. Sign up at [thordata.com](https://www.thordata.com)
96
- 2. Navigate to your Dashboard
97
- 3. Copy your Scraper Token, Public Token, and Public Key
98
-
99
- ### Basic Usage
100
-
101
- ```python
102
- from thordata import ThordataClient
103
-
104
- # Initialize the client
105
- client = ThordataClient(
106
- scraper_token="your_scraper_token",
107
- public_token="your_public_token", # Optional, for task APIs
108
- public_key="your_public_key" # Optional, for task APIs
109
- )
110
-
111
- # Make a request through the proxy network
112
- response = client.get("https://httpbin.org/ip")
113
- print(response.json())
114
- # {'origin': '123.45.67.89'} # Residential IP
115
- ```
116
-
117
- ### Environment Variables
118
-
119
- Create a `.env` file:
120
-
121
- ```env
122
- THORDATA_SCRAPER_TOKEN=your_scraper_token
123
- THORDATA_PUBLIC_TOKEN=your_public_token
124
- THORDATA_PUBLIC_KEY=your_public_key
125
- ```
126
-
127
- Then use with python-dotenv:
128
-
129
- ```python
130
- import os
131
- from dotenv import load_dotenv
132
- from thordata import ThordataClient
133
-
134
- load_dotenv()
135
-
136
- client = ThordataClient(
137
- scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"),
138
- public_token=os.getenv("THORDATA_PUBLIC_TOKEN"),
139
- public_key=os.getenv("THORDATA_PUBLIC_KEY"),
140
- )
141
- ```
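`os.getenv` returns `None` when a variable is unset, so a missing token only surfaces later as an authentication error. A minimal fail-fast sketch, assuming only the three variable names shown above; `load_thordata_credentials` is a hypothetical helper, not part of the SDK:

```python
import os
from typing import Dict, Mapping, Optional


def load_thordata_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Hypothetical helper: read Thordata credentials, failing fast if the required one is missing."""
    env = os.environ if env is None else env
    scraper_token = env.get("THORDATA_SCRAPER_TOKEN")
    if not scraper_token:
        # The scraper token is the only credential the README marks as required.
        raise RuntimeError("THORDATA_SCRAPER_TOKEN is not set")
    return {
        "scraper_token": scraper_token,
        # Public token/key are optional (only needed for task and location APIs).
        "public_token": env.get("THORDATA_PUBLIC_TOKEN"),
        "public_key": env.get("THORDATA_PUBLIC_KEY"),
    }
```

After `load_dotenv()`, the result can be unpacked straight into the constructor shown above, e.g. `ThordataClient(**load_thordata_credentials())`.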
142
-
143
- ---
144
-
145
- ## 📖 Usage Guide
146
-
147
- ### 1. Proxy Network
148
-
149
- #### Basic Proxy Request
150
-
151
- ```python
152
- from thordata import ThordataClient
153
-
154
- client = ThordataClient(scraper_token="your_token")
155
-
156
- # GET request through proxy
157
- response = client.get("https://example.com")
158
- print(response.text)
159
-
160
- # POST request through proxy
161
- response = client.post("https://httpbin.org/post", json={"key": "value"})
162
- print(response.json())
163
- ```
164
-
165
- #### Geo-Targeting
166
-
167
- ```python
168
- from thordata import ThordataClient, ProxyConfig
169
-
170
- client = ThordataClient(scraper_token="your_token")
171
-
172
- # Create a proxy config with geo-targeting
173
- config = ProxyConfig(
174
- username="your_username",
175
- password="your_password",
176
- country="us", # Target country
177
- state="california", # Target state
178
- city="los_angeles", # Target city
179
- )
180
-
181
- response = client.get("https://httpbin.org/ip", proxy_config=config)
182
- print(response.json())
183
- ```
184
-
185
- #### Sticky Sessions
186
-
187
- Keep the same IP for multiple requests:
188
-
189
- ```python
190
- from thordata import ThordataClient, StickySession
191
-
192
- client = ThordataClient(scraper_token="your_token")
193
-
194
- # Create a sticky session (same IP for 10 minutes)
195
- session = StickySession(
196
- username="your_username",
197
- password="your_password",
198
- country="gb",
199
- duration_minutes=10,
200
- )
201
-
202
- # All requests use the same IP
203
- for i in range(5):
204
- response = client.get("https://httpbin.org/ip", proxy_config=session)
205
- print(f"Request {i+1}: {response.json()['origin']}")
206
- ```
207
-
208
- #### Different Proxy Products
209
-
210
- ```python
211
- from thordata import ProxyConfig, ProxyProduct
212
-
213
- # Residential proxy (default, port 9999)
214
- residential = ProxyConfig(
215
- username="user", password="pass",
216
- product=ProxyProduct.RESIDENTIAL
217
- )
218
-
219
- # Mobile proxy (port 5555)
220
- mobile = ProxyConfig(
221
- username="user", password="pass",
222
- product=ProxyProduct.MOBILE
223
- )
224
-
225
- # Datacenter proxy (port 7777)
226
- datacenter = ProxyConfig(
227
- username="user", password="pass",
228
- product=ProxyProduct.DATACENTER
229
- )
230
- ```
231
-
232
- ### 2. SERP API (Search Engine Results)
233
-
234
- #### Basic Search
235
-
236
- ```python
237
- from thordata import ThordataClient, Engine
238
-
239
- client = ThordataClient(scraper_token="your_token")
240
-
241
- # Google search
242
- results = client.serp_search(
243
- query="python programming",
244
- engine=Engine.GOOGLE,
245
- num=10
246
- )
247
-
248
- # Print organic results
249
- for result in results.get("organic", []):
250
- print(f"{result['title']}: {result['link']}")
251
- ```
252
-
253
- #### General Calling Method
254
-
255
- ```python
256
- from thordata import ThordataClient, Engine
257
-
258
- client = ThordataClient(scraper_token="YOUR_SCRAPER_TOKEN")
259
-
260
- results = client.serp_search(
261
- query="pizza",
262
- engine=Engine.GOOGLE, # or "google"
263
- num=10,
264
- country="us",
265
- language="en",
266
- search_type="news", # corresponds to tbm=nws
267
- # Other parameters are passed in via kwargs
268
- ibp="some_ibp_value",
269
- lsig="some_lsig_value",
270
- )
271
- ```
272
-
273
- **Note**: All parameters above will be assembled into Thordata SERP API request parameters.
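To illustrate what "assembled" means, here is a minimal sketch assuming named arguments are renamed to the document parameters from the mapping table (`query`→`q`, `country`→`gl`, `language`→`hl`) and unknown keyword arguments are forwarded unchanged. `build_serp_params` is hypothetical; the SDK's real logic may differ, for example in how it encodes `engine` or translates `search_type`:

```python
from typing import Any, Dict, Optional


def build_serp_params(
    query: str,
    engine: str = "google",
    num: int = 10,
    country: Optional[str] = None,
    language: Optional[str] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical sketch: merge named SDK arguments with pass-through kwargs."""
    params: Dict[str, Any] = {"q": query, "engine": engine, "num": num}
    # Rename SDK-style arguments to the SERP API's parameter names.
    if country is not None:
        params["gl"] = country
    if language is not None:
        params["hl"] = language
    # Unrecognized options (ibp, lsig, ...) are forwarded as-is; None values are skipped.
    params.update({key: value for key, value in kwargs.items() if value is not None})
    return params
```

For example, `build_serp_params("pizza", country="us", language="en", ibp="some_ibp_value")` yields `{"q": "pizza", "engine": "google", "num": 10, "gl": "us", "hl": "en", "ibp": "some_ibp_value"}`.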
274
-
275
- #### Advanced Search Options
276
-
277
- ```python
278
- from thordata import ThordataClient, SerpRequest
279
-
280
- client = ThordataClient(scraper_token="your_token")
281
-
282
- # Create a detailed search request
283
- request = SerpRequest(
284
- query="best laptops 2024",
285
- engine="google",
286
- num=20,
287
- country="us",
288
- language="en",
289
- search_type="shopping", # shopping, news, images, videos
290
- time_filter="month", # hour, day, week, month, year
291
- safe_search=True,
292
- device="mobile", # desktop, mobile, tablet
293
- )
294
-
295
- results = client.serp_search_advanced(request)
296
- ```
297
-
298
- #### Multiple Search Engines
299
-
300
- ```python
301
- from thordata import ThordataClient, Engine
302
-
303
- client = ThordataClient(scraper_token="your_token")
304
-
305
- # Google
306
- google_results = client.serp_search("AI news", engine=Engine.GOOGLE)
307
-
308
- # Bing
309
- bing_results = client.serp_search("AI news", engine=Engine.BING)
310
-
311
- # Yandex (Russian search engine)
312
- yandex_results = client.serp_search("AI news", engine=Engine.YANDEX)
313
-
314
- # DuckDuckGo
315
- ddg_results = client.serp_search("AI news", engine=Engine.DUCKDUCKGO)
316
- ```
317
-
318
- ---
319
-
320
- ## 🔧 SERP API Parameter Mapping
321
-
322
- Thordata's SERP API supports multiple search engines and sub-features (Google Search/Shopping/News, etc.).
323
- This SDK wraps common parameters through `ThordataClient.serp_search` and `SerpRequest`, while other parameters can be passed directly through `**kwargs`.
324
-
325
- ### Google Search Parameter Mapping
326
-
327
- | Document Parameter | SDK Field/Usage | Description |
328
- |-------------------|-----------------|-------------|
329
- | q | query | Search keyword |
330
- | engine | engine | Engine.GOOGLE / "google" |
331
- | google_domain | google_domain | e.g., "google.co.uk" |
332
- | gl | country | Country/region, e.g., "us" |
333
- | hl | language | Language, e.g., "en", "zh-CN" |
334
- | cr | countries_filter | Multi-country filter, e.g., "countryFR" |
335
- | lr | languages_filter | Multi-language filter, e.g., "lang_en\|lang_fr" |
336
- | location | location | Exact location, e.g., "India" |
337
- | uule | uule | Base64 encoded location string |
338
- | tbm | search_type | "images"→tbm=isch, "shopping"→tbm=shop, "news"→tbm=nws, "videos"→tbm=vid, other values passed through as-is |
339
- | start | start | Result offset for pagination |
340
- | num | num | Number of results per page |
341
- | ludocid | ludocid | Google Place ID |
342
- | kgmid | kgmid | Google Knowledge Graph ID |
343
- | ibp | ibp="..." (kwargs) | Passed through **kwargs |
344
- | lsig | lsig="..." (kwargs) | Same as above |
345
- | si | si="..." (kwargs) | Same as above |
346
- | uds | uds="ADV" (kwargs) | Same as above |
347
- | tbs | time_filter or tbs="..." | time_filter="week" generates tbs=qdr:w, can also pass complete tbs directly |
348
- | safe | safe_search | True → safe=active, False → safe=off |
349
- | nfpr | no_autocorrect | True → nfpr=1 |
350
- | filter | filter_duplicates | True → filter=1, False → filter=0 |
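The value translations in this table can be written as one small pure function. The sketch follows the table row by row; only `time_filter="week"` → `qdr:w` is stated above, so the other `qdr:` codes (`h`, `d`, `m`, `y`) are Google's usual `tbs` values and are an assumption about the SDK. `translate_google_options` is hypothetical:

```python
from typing import Dict, Optional

# search_type -> tbm, per the table; unlisted values (e.g. "local") pass through unchanged.
TBM_BY_SEARCH_TYPE = {"images": "isch", "shopping": "shop", "news": "nws", "videos": "vid"}

# time_filter -> tbs. Only "week" -> "qdr:w" is documented; the others are assumed.
TBS_BY_TIME_FILTER = {
    "hour": "qdr:h",
    "day": "qdr:d",
    "week": "qdr:w",
    "month": "qdr:m",
    "year": "qdr:y",
}


def translate_google_options(
    search_type: Optional[str] = None,
    time_filter: Optional[str] = None,
    safe_search: Optional[bool] = None,
    no_autocorrect: Optional[bool] = None,
    filter_duplicates: Optional[bool] = None,
) -> Dict[str, str]:
    """Hypothetical sketch of the SDK-field -> Google-parameter translations."""
    params: Dict[str, str] = {}
    if search_type is not None:
        params["tbm"] = TBM_BY_SEARCH_TYPE.get(search_type, search_type)
    if time_filter is not None:
        # Raises KeyError for unknown filters; pass a complete tbs="..." instead.
        params["tbs"] = TBS_BY_TIME_FILTER[time_filter]
    if safe_search is not None:
        params["safe"] = "active" if safe_search else "off"
    if no_autocorrect:
        # The table only documents True -> nfpr=1.
        params["nfpr"] = "1"
    if filter_duplicates is not None:
        params["filter"] = "1" if filter_duplicates else "0"
    return params
```

For instance, `translate_google_options(search_type="news", time_filter="week", safe_search=True)` gives `{"tbm": "nws", "tbs": "qdr:w", "safe": "active"}`.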
351
-
352
- **Example: Google Search Basic Usage**
353
-
354
- ```python
355
- results = client.serp_search(
356
- query="python web scraping best practices",
357
- engine=Engine.GOOGLE,
358
- country="us",
359
- language="en",
360
- num=10,
361
- time_filter="week", # Last week
362
- safe_search=True, # Adult content filter
363
- )
364
- ```
365
-
366
- ### Google Shopping Parameter Mapping
367
-
368
- Shopping also uses `engine="google"`; set `search_type="shopping"` (tbm=shop) to select Shopping mode:
369
-
370
- ```python
371
- results = client.serp_search(
372
- query="iPhone 15",
373
- engine=Engine.GOOGLE,
374
- search_type="shopping", # tbm=shop
375
- country="us",
376
- language="en",
377
- num=20,
378
- min_price=500, # Parameters below passed through kwargs
379
- max_price=1500,
380
- sort_by=1, # 1=price low to high, 2=high to low
381
- free_shipping=True,
382
- on_sale=True,
383
- small_business=True,
384
- direct_link=True,
385
- shoprs="FILTER_ID_HERE",
386
- )
387
- shopping_items = results.get("shopping_results", [])
388
- ```
389
-
390
- | Document Parameter | SDK Field/Usage | Description |
391
- |-------------------|-----------------|-------------|
392
- | q | query | Search keyword |
393
- | google_domain | google_domain | Same as above |
394
- | gl | country | Same as above |
395
- | hl | language | Same as above |
396
- | location | location | Same as above |
397
- | uule | uule | Same as above |
398
- | start | start | Offset |
399
- | num | num | Quantity |
400
- | tbs | time_filter or tbs="..." | Same as above |
401
- | shoprs | shoprs="..." (kwargs) | Filter ID |
402
- | min_price | min_price=... (kwargs) | Minimum price |
403
- | max_price | max_price=... (kwargs) | Maximum price |
404
- | sort_by | sort_by=1/2 (kwargs) | Sort order |
405
- | free_shipping | free_shipping=True/False (kwargs) | Free shipping |
406
- | on_sale | on_sale=True/False (kwargs) | On sale |
407
- | small_business | small_business=True/False (kwargs) | Small business |
408
- | direct_link | direct_link=True/False (kwargs) | Include direct links |
409
-
410
- ### Google Local Parameter Mapping
411
-
412
- Google Local covers location-based searches.
413
- In the SDK, set `search_type="local"` to select Local mode (the value is passed through unchanged as `tbm=local`) and combine it with `location` and `uule`.
414
-
415
- ```python
416
- results = client.serp_search(
417
- query="pizza near me",
418
- engine=Engine.GOOGLE,
419
- search_type="local",
420
- google_domain="google.com",
421
- country="us",
422
- language="en",
423
- location="San Francisco",
424
- uule="w+CAIQICIFU2FuIEZyYW5jaXNjbw", # Example value
425
- start=0, # Local only accepts 0, 20, 40...
426
- )
427
- local_results = results.get("local_results", results.get("organic", []))
428
- ```
429
-
430
- | Document Parameter | SDK Field/Usage | Description |
431
- |-------------------|-----------------|-------------|
432
- | q | query | Search term |
433
- | google_domain | google_domain | Domain |
434
- | gl | country | Country |
435
- | hl | language | Language |
436
- | location | location | Local location |
437
- | uule | uule | Encoded location |
438
- | start | start | Offset (must be 0,20,40...) |
439
- | ludocid | ludocid | Place ID (commonly used in Local results) |
440
- | tbs | time_filter or tbs="..." | Advanced filtering |
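Because Local accepts only `start` values of 0, 20, 40, ..., page offsets are multiples of 20. A minimal sketch, assuming 20 results per Local page as those offsets imply; both helpers are hypothetical:

```python
LOCAL_PAGE_SIZE = 20  # assumed from the documented offsets 0, 20, 40, ...


def local_start_offset(page: int) -> int:
    """Return the `start` value for a zero-based Local results page."""
    if page < 0:
        raise ValueError("page must be >= 0")
    return page * LOCAL_PAGE_SIZE


def is_valid_local_start(start: int) -> bool:
    """Check a `start` value against the 0, 20, 40, ... rule before sending it."""
    return start >= 0 and start % LOCAL_PAGE_SIZE == 0
```

The third page would then be requested with `start=local_start_offset(2)`, i.e. `start=40`.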
441
-
442
- ### Google Videos Parameter Mapping
443
-
444
- ```python
445
- results = client.serp_search(
446
- query="python async tutorial",
447
- engine=Engine.GOOGLE,
448
- search_type="videos", # tbm=vid
449
- country="us",
450
- language="en",
451
- languages_filter="lang_en|lang_fr",
452
- location="United States",
453
- uule="ENCODED_LOCATION_HERE",
454
- num=10,
455
- time_filter="month",
456
- safe_search=True,
457
- filter_duplicates=True,
458
- )
459
- video_results = results.get("video_results", results.get("organic", []))
460
- ```
461
-
462
- | Document Parameter | SDK Field/Usage | Description |
463
- |-------------------|-----------------|-------------|
464
- | q | query | Search term |
465
- | google_domain | google_domain | Domain |
466
- | gl | country | Country |
467
- | hl | language | Language |
468
- | lr | languages_filter | Multi-language filter |
469
- | location | location | Geographic location |
470
- | uule | uule | Encoded location |
471
- | start | start | Offset |
472
- | num | num | Quantity |
473
- | tbs | time_filter or tbs="..." | Time and advanced filtering |
474
- | safe | safe_search | Adult content filter |
475
- | nfpr | no_autocorrect | Disable auto-correction |
476
- | filter | filter_duplicates | Remove duplicates |
477
-
478
- ### Google News Parameter Mapping
479
-
480
- Google News adds News-only token parameters for precise control over topics, publications, sections, and stories.
481
-
482
- ```python
483
- results = client.serp_search(
484
- query="AI regulation",
485
- engine=Engine.GOOGLE,
486
- search_type="news", # tbm=nws
487
- country="us",
488
- language="en",
489
- topic_token="YOUR_TOPIC_TOKEN", # Optional
490
- publication_token="YOUR_PUBLICATION_TOKEN", # Optional
491
- section_token="YOUR_SECTION_TOKEN", # Optional
492
- story_token="YOUR_STORY_TOKEN", # Optional
493
- so=1, # 0=relevance, 1=time
494
- )
495
- news_results = results.get("news_results", results.get("organic", []))
496
- ```
497
-
498
- | Document Parameter | SDK Field/Usage | Description |
499
- |-------------------|-----------------|-------------|
500
- | q | query | Search term |
501
- | gl | country | Country |
502
- | hl | language | Language |
503
- | topic_token | topic_token="..." (kwargs) | Topic token |
504
- | publication_token | publication_token="..." (kwargs) | Media token |
505
- | section_token | section_token="..." (kwargs) | Section token |
506
- | story_token | story_token="..." (kwargs) | Story token |
507
- | so | so=0/1 (kwargs) | Sort: 0=relevance, 1=time |
508
-
509
- ---
510
-
511
- 👉 For more SERP modes and parameter mappings, see docs/serp_reference.md.
512
-
513
- ## 🔓 Web Unlocker (Universal Scraping API)
514
-
515
- Automatically bypass anti-bot protections:
516
-
517
- #### Basic Usage
518
-
519
- ```python
520
- from thordata import ThordataClient
521
-
522
- client = ThordataClient(scraper_token="your_token")
523
-
524
- # Get HTML content
525
- html = client.universal_scrape(
526
- url="https://example.com",
527
- js_render=True, # Enable JavaScript rendering
528
- )
529
- print(html[:500])
530
- ```
531
-
532
- #### Advanced Options
533
-
534
- ```python
535
- from thordata import ThordataClient, UniversalScrapeRequest
536
-
537
- client = ThordataClient(scraper_token="your_token")
538
-
539
- request = UniversalScrapeRequest(
540
- url="https://example.com",
541
- js_render=True,
542
- output_format="html",
543
- country="us",
544
- block_resources="image,font", # Speed up by blocking resources
545
- clean_content="js,css", # Remove JS/CSS from output
546
- wait=5000, # Wait 5 seconds after load
547
- wait_for=".content-loaded", # Wait for CSS selector
548
- headers=[
549
- {"name": "Accept-Language", "value": "en-US"}
550
- ],
551
- cookies=[
552
- {"name": "session", "value": "abc123"}
553
- ],
554
- )
555
-
556
- html = client.universal_scrape_advanced(request)
557
- ```
558
-
559
- #### Take Screenshots
560
-
561
- ```python
562
- from thordata import ThordataClient
563
-
564
- client = ThordataClient(scraper_token="your_token")
565
-
566
- # Get PNG screenshot
567
- png_bytes = client.universal_scrape(
568
- url="https://example.com",
569
- js_render=True,
570
- output_format="png",
571
- )
572
-
573
- # Save to file
574
- with open("screenshot.png", "wb") as f:
575
- f.write(png_bytes)
576
- ```
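If a request fails or returns something other than an image, writing it blindly produces a corrupt `screenshot.png`. A small guard, relying only on the fact that every PNG file begins with the fixed 8-byte PNG signature; `save_png` is a hypothetical helper, not an SDK function:

```python
from pathlib import Path
from typing import Union

# Every PNG file starts with this fixed 8-byte signature (PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(data: Union[bytes, str], path: Union[str, Path]) -> Path:
    """Hypothetical helper: write screenshot bytes only if they look like a PNG."""
    if not isinstance(data, bytes) or not data.startswith(PNG_SIGNATURE):
        raise ValueError("response does not start with the PNG signature")
    target = Path(path)
    target.write_bytes(data)
    return target
```

`save_png(png_bytes, "screenshot.png")` can then replace the bare `open(...).write(...)` above.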
577
-
578
- ### Web Scraper API (Async Tasks)
579
-
580
- For complex scraping jobs that run asynchronously:
581
-
582
- ```python
583
- from thordata import ThordataClient
584
-
585
- client = ThordataClient(
586
- scraper_token="your_token",
587
- public_token="your_public_token",
588
- public_key="your_public_key",
589
- )
590
-
591
- # Create a scraping task
592
- task_id = client.create_scraper_task(
593
- file_name="youtube_channel_data",
594
- spider_id="youtube_video-post_by-url", # From Dashboard
595
- spider_name="youtube.com",
596
- parameters={
597
- "url": "https://www.youtube.com/@PewDiePie/videos",
598
- "num_of_posts": "50"
599
- }
600
- )
601
- print(f"Task created: {task_id}")
602
-
603
- # Wait for completion (with timeout)
604
- status = client.wait_for_task(task_id, max_wait=300)
605
- print(f"Task status: {status}")
606
-
607
- # Get results
608
- if status in ("ready", "success"):
609
- download_url = client.get_task_result(task_id)
610
- print(f"Download: {download_url}")
611
- ```
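`wait_for_task` blocks until the task finishes or `max_wait` seconds elapse. The general technique is a poll-until-deadline loop. This is a sketch, not the SDK's implementation: `get_status` stands in for whatever status call the client exposes, the terminal statuses `"failed"`/`"error"` are guesses alongside the documented `"ready"`/`"success"`, and the 5-second interval is arbitrary:

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    get_status: Callable[[], str],
    max_wait: float = 300.0,
    interval: float = 5.0,
    terminal: Iterable[str] = ("ready", "success", "failed", "error"),  # last two assumed
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `get_status` until it returns a terminal status or `max_wait` seconds pass."""
    done = {s.lower() for s in terminal}
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status.lower() in done:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"task still {status!r} after {max_wait} seconds")
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; `time.monotonic` is used so wall-clock adjustments cannot shorten or extend the timeout.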
612
-
613
- ### Async Client (High Concurrency)
614
-
615
- For maximum performance with concurrent requests:
616
-
617
- ```python
618
- import asyncio
619
- from thordata import AsyncThordataClient
620
-
621
- async def main():
622
- async with AsyncThordataClient(
623
- scraper_token="your_token",
624
- public_token="your_public_token",
625
- public_key="your_public_key",
626
- ) as client:
627
-
628
- # Concurrent proxy requests
629
- urls = [
630
- "https://httpbin.org/ip",
631
- "https://httpbin.org/headers",
632
- "https://httpbin.org/user-agent",
633
- ]
634
-
635
- tasks = [client.get(url) for url in urls]
636
- responses = await asyncio.gather(*tasks)
637
-
638
- for resp in responses:
639
- print(await resp.json())
640
-
641
- asyncio.run(main())
642
- ```
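`asyncio.gather` starts every coroutine at once, which is fine for three URLs but can trigger rate limits with thousands. A common way to cap in-flight requests is an `asyncio.Semaphore`. This sketch uses a stand-in `fetch` coroutine in place of `client.get` so it runs without the SDK; `gather_bounded` is a hypothetical helper:

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


async def gather_bounded(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[T]],
    limit: int = 10,
) -> List[T]:
    """Run `worker` on every item with at most `limit` calls in flight, keeping input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: str) -> T:
        # Waits here while `limit` calls are already running.
        async with semaphore:
            return await worker(item)

    return list(await asyncio.gather(*(run_one(item) for item in items)))


async def demo(limit: int = 5) -> Tuple[List[str], int]:
    """Exercise gather_bounded with a fake fetch and report peak concurrency."""
    in_flight = 0
    peak = 0

    async def fetch(url: str) -> str:  # stand-in for `client.get(url)`
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        in_flight -= 1
        return url

    urls = [f"https://example.com/page/{i}" for i in range(20)]
    results = await gather_bounded(urls, fetch, limit=limit)
    return results, peak
```

Inside the `async with AsyncThordataClient(...)` block, something like `await gather_bounded(urls, client.get, limit=5)` would apply the same cap, assuming `client.get` accepts the URL as its first positional argument as in the example above.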
643
-
644
- #### Async SERP Search
645
-
646
- ```python
647
- import asyncio
648
- from thordata import AsyncThordataClient, Engine
649
-
650
- async def search_multiple():
651
- async with AsyncThordataClient(scraper_token="your_token") as client:
652
- queries = ["python", "javascript", "rust", "go"]
653
-
654
- tasks = [
655
- client.serp_search(q, engine=Engine.GOOGLE)
656
- for q in queries
657
- ]
658
-
659
- results = await asyncio.gather(*tasks)
660
-
661
- for query, result in zip(queries, results):
662
- count = len(result.get("organic", []))
663
- print(f"{query}: {count} results")
664
-
665
- asyncio.run(search_multiple())
666
- ```
667
-
668
- ### Location APIs
669
-
670
- Discover available geo-targeting options:
671
-
672
- ```python
673
- from thordata import ThordataClient, ProxyType
674
-
675
- client = ThordataClient(
676
- scraper_token="your_token",
677
- public_token="your_public_token",
678
- public_key="your_public_key",
679
- )
680
-
681
- # List all supported countries
682
- countries = client.list_countries(proxy_type=ProxyType.RESIDENTIAL)
683
- print(f"Supported countries: {len(countries)}")
684
-
685
- # List states for a country
686
- states = client.list_states("US")
687
- for state in states[:5]:
688
- print(f" {state['state_code']}: {state['state_name']}")
689
-
690
- # List cities
691
- cities = client.list_cities("US", state_code="california")
692
- print(f"Cities in California: {len(cities)}")
693
-
694
- # List ASNs (for ISP targeting)
695
- asns = client.list_asn("US")
696
- for asn in asns[:5]:
697
- print(f" {asn['asn_code']}: {asn['asn_name']}")
698
- ```
699
-
700
- ### Error Handling
701
-
702
- ```python
703
- from thordata import (
704
- ThordataClient,
705
- ThordataError,
706
- ThordataAuthError,
707
- ThordataRateLimitError,
708
- ThordataNetworkError,
709
- ThordataTimeoutError,
710
- )
711
-
712
- client = ThordataClient(scraper_token="your_token")
713
-
714
- try:
715
- result = client.serp_search("test query")
716
- except ThordataAuthError as e:
717
- print(f"Authentication failed: {e}")
718
- print(f"Check your token. Status code: {e.status_code}")
719
- except ThordataRateLimitError as e:
720
- print(f"Rate limited: {e}")
721
- if e.retry_after:
722
- print(f"Retry after {e.retry_after} seconds")
723
- except ThordataTimeoutError as e:
724
- print(f"Request timed out: {e}")
725
- except ThordataNetworkError as e:
726
- print(f"Network error: {e}")
727
- except ThordataError as e:
728
- print(f"General error: {e}")
729
- ```
730
-
731
- ### Retry Configuration
732
-
733
- Customize automatic retry behavior:
734
-
735
- ```python
736
- from thordata import ThordataClient, RetryConfig
737
-
738
- # Custom retry configuration
739
- retry_config = RetryConfig(
740
- max_retries=5, # Maximum retry attempts
741
- backoff_factor=2.0, # Exponential backoff multiplier
742
- max_backoff=120.0, # Maximum wait between retries
743
- jitter=True, # Add randomness to prevent thundering herd
744
- )
745
-
746
- client = ThordataClient(
747
- scraper_token="your_token",
748
- retry_config=retry_config,
749
- )
750
-
751
- # Requests will automatically retry on transient failures
752
- response = client.get("https://example.com")
753
- ```
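The README does not spell out how `backoff_factor`, `max_backoff`, and `jitter` combine. One common scheme, assumed here rather than taken from the SDK's `retry.py`, is capped exponential backoff, `min(max_backoff, backoff_factor ** attempt)`, with "full jitter" drawing the actual wait uniformly between 0 and that cap:

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    backoff_factor: float = 2.0,
    max_backoff: float = 120.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based), under an assumed formula."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Exponential growth, capped so late retries never wait longer than max_backoff.
    delay = min(max_backoff, backoff_factor ** attempt)
    if jitter:
        # Full jitter: spread retries across [0, delay] so clients don't retry in lockstep.
        delay = (rng or random).uniform(0.0, delay)
    return delay
```

Without jitter and with the settings above, attempts 0 through 7 wait 1, 2, 4, 8, 16, 32, 64, and 120 seconds (the last one capped from 128).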
754
-
755
- ---
756
-
757
- ## 🔧 Configuration Reference
758
-
759
- ### ThordataClient Parameters
760
-
761
- | Parameter | Type | Default | Description |
762
- |-----------|------|---------|-------------|
763
- | scraper_token | str | required | API token from Dashboard |
764
- | public_token | str | None | Public API token (for tasks/locations) |
765
- | public_key | str | None | Public API key |
766
- | proxy_host | str | "pr.thordata.net" | Proxy gateway host |
767
- | proxy_port | int | 9999 | Proxy gateway port |
768
- | timeout | int | 30 | Default request timeout (seconds) |
769
- | retry_config | RetryConfig | None | Retry configuration |
770
-
771
- ### ProxyConfig Parameters
772
-
773
- | Parameter | Type | Default | Description |
774
- |-----------|------|---------|-------------|
775
- | username | str | required | Proxy username |
776
- | password | str | required | Proxy password |
777
- | product | ProxyProduct | RESIDENTIAL | Proxy type |
778
- | country | str | None | ISO 3166-1 alpha-2 code |
779
- | state | str | None | State name (lowercase) |
780
- | city | str | None | City name (lowercase) |
781
- | continent | str | None | Continent code (af/an/as/eu/na/oc/sa) |
782
- | asn | str | None | ASN code (requires country) |
783
- | session_id | str | None | Session ID for sticky sessions |
784
- | session_duration | int | None | Session duration (1-90 minutes) |
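Several constraints in this table are easy to violate silently: two-letter country codes, lowercase state/city names, `asn` requiring `country`, and the 1-90 minute session window. A hypothetical pre-flight check follows; the rule that spaces in place names become underscores is inferred from the `city="los_angeles"` example and is not documented:

```python
import re
from typing import Dict, Optional

CONTINENTS = {"af", "an", "as", "eu", "na", "oc", "sa"}


def normalize_place(name: str) -> str:
    """'Los Angeles' -> 'los_angeles' (underscore convention inferred, not documented)."""
    return re.sub(r"\s+", "_", name.strip().lower())


def check_proxy_targeting(
    country: Optional[str] = None,
    state: Optional[str] = None,
    city: Optional[str] = None,
    continent: Optional[str] = None,
    asn: Optional[str] = None,
    session_duration: Optional[int] = None,
) -> Dict[str, object]:
    """Hypothetical validator for the documented ProxyConfig constraints."""
    if country is not None and not re.fullmatch(r"[A-Za-z]{2}", country):
        raise ValueError("country must be an ISO 3166-1 alpha-2 code, e.g. 'us'")
    if continent is not None and continent.lower() not in CONTINENTS:
        raise ValueError(f"continent must be one of {sorted(CONTINENTS)}")
    if asn is not None and country is None:
        raise ValueError("asn requires country")
    if session_duration is not None and not 1 <= session_duration <= 90:
        raise ValueError("session_duration must be between 1 and 90 minutes")
    checked = {
        "country": country.lower() if country else None,
        "state": normalize_place(state) if state else None,
        "city": normalize_place(city) if city else None,
        "continent": continent.lower() if continent else None,
        "asn": asn,
        "session_duration": session_duration,
    }
    # Drop unset fields so the result can be passed as ProxyConfig(**...) keyword arguments.
    return {key: value for key, value in checked.items() if value is not None}
```

For example, `check_proxy_targeting(country="US", state="California", city="Los Angeles")` returns `{"country": "us", "state": "california", "city": "los_angeles"}`.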
785
-
786
- ### Proxy Products & Ports
787
-
788
- | Product | Port | Description |
789
- |---------|------|-------------|
790
- | RESIDENTIAL | 9999 | Rotating residential IPs |
791
- | MOBILE | 5555 | Mobile carrier IPs |
792
- | DATACENTER | 7777 | Datacenter IPs |
793
- | ISP | 6666 | Static ISP IPs |
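The product-to-port table maps directly onto a lookup. The sketch below builds a `requests`-style `proxies` dict under loud assumptions: every product is reachable on the default gateway host `pr.thordata.net` from the client parameters table (the README only documents that host together with port 9999), the proxy scheme is `http://`, and the encoding of geo-targeting options into the username (not shown in the README) is left out. `Product` and `build_proxy_url` are hypothetical stand-ins, not the SDK's `ProxyProduct`:

```python
from enum import Enum
from typing import Dict
from urllib.parse import quote


class Product(Enum):
    """Local stand-in mirroring the documented product ports."""
    RESIDENTIAL = 9999
    MOBILE = 5555
    DATACENTER = 7777
    ISP = 6666


def build_proxy_url(
    username: str,
    password: str,
    product: Product = Product.RESIDENTIAL,
    host: str = "pr.thordata.net",  # assumed to serve every product
) -> str:
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{product.value}"


def requests_proxies(url: str) -> Dict[str, str]:
    """Route both http:// and https:// targets through the same proxy URL."""
    return {"http": url, "https": url}
```

With `requests`, this would be used as `requests.get(url, proxies=requests_proxies(build_proxy_url("user", "pass", Product.MOBILE)))`.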
794
-
795
- ---
796
-
797
- ## 📁 Project Structure
798
-
799
- ```
800
- thordata-python-sdk/
801
- ├── src/thordata/
802
- │ ├── __init__.py # Public API exports
803
- │ ├── client.py # Sync client
804
- │ ├── async_client.py # Async client
805
- │ ├── models.py # Data models (ProxyConfig, SerpRequest, etc.)
806
- │ ├── enums.py # Enumerations
807
- │ ├── exceptions.py # Exception hierarchy
808
- │ ├── retry.py # Retry mechanism
809
- │ └── _utils.py # Internal utilities
810
- ├── tests/ # Test suite
811
- ├── examples/ # Usage examples
812
- ├── pyproject.toml # Package configuration
813
- └── README.md
814
- ```
815
-
816
- ---
817
-
818
- ## 🧪 Development
819
-
820
- ### Setup
821
-
822
- ```bash
823
- # Clone the repository
824
- git clone https://github.com/Thordata/thordata-python-sdk.git
825
- cd thordata-python-sdk
826
-
827
- # Create virtual environment
828
- python -m venv venv
829
- source venv/bin/activate # On Windows: venv\Scripts\activate
830
-
831
- # Install with dev dependencies
832
- pip install -e ".[dev]"
833
- ```
834
-
835
- ### Run Tests
836
-
837
- ```bash
838
- # Run all tests
839
- pytest
840
-
841
- # Run with coverage
842
- pytest --cov=thordata --cov-report=html
843
-
844
- # Run specific test file
845
- pytest tests/test_client.py -v
846
- ```
847
-
848
- ### Code Quality
849
-
850
- ```bash
851
- # Format code
852
- black src tests
853
-
854
- # Lint
855
- ruff check src tests
856
-
857
- # Type check
858
- mypy src
859
- ```
860
-
861
- ---
862
-
863
- ## 📝 Changelog
864
-
865
- See [CHANGELOG.md](CHANGELOG.md) for version history.
866
-
867
- ---
868
-
869
- ## 🤝 Contributing
870
-
871
- Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
872
-
873
- 1. Fork the repository
874
- 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
875
- 3. Commit your changes (`git commit -m 'Add amazing feature'`)
876
- 4. Push to the branch (`git push origin feature/amazing-feature`)
877
- 5. Open a Pull Request
878
-
879
- ---
880
-
881
- ## 📄 License
882
-
883
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
884
-
885
- ---
886
-
887
- ## 🆘 Support
888
-
889
- - 📧 **Email**: support@thordata.com
890
- - 📚 **Documentation**: [doc.thordata.com](https://doc.thordata.com)
891
- - 🐛 **Issues**: [GitHub Issues](https://github.com/Thordata/thordata-python-sdk/issues)
892
- - 💬 **Dashboard**: [thordata.com](https://www.thordata.com)
893
-
894
- <div align="center">
895
- <sub>Built with ❤️ by the Thordata Team</sub>
896
- </div>
1
+ Metadata-Version: 2.4
2
+ Name: thordata-sdk
3
+ Version: 0.7.0
4
+ Summary: The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
5
+ Author-email: Thordata Developer Team <support@thordata.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://www.thordata.com
8
+ Project-URL: Documentation, https://github.com/Thordata/thordata-python-sdk#readme
9
+ Project-URL: Source, https://github.com/Thordata/thordata-python-sdk
10
+ Project-URL: Tracker, https://github.com/Thordata/thordata-python-sdk/issues
11
+ Project-URL: Changelog, https://github.com/Thordata/thordata-python-sdk/blob/main/CHANGELOG.md
12
+ Keywords: web scraping,proxy,residential proxy,datacenter proxy,ai,llm,data-mining,serp,thordata,web scraper,anti-bot bypass
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
16
+ Classifier: Topic :: Internet :: WWW/HTTP
17
+ Classifier: Topic :: Internet :: Proxy Servers
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.9
20
+ Classifier: Programming Language :: Python :: 3.10
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: License :: OSI Approved :: MIT License
24
+ Classifier: Operating System :: OS Independent
25
+ Classifier: Typing :: Typed
26
+ Requires-Python: >=3.9
27
+ Description-Content-Type: text/markdown
28
+ License-File: LICENSE
29
+ Requires-Dist: requests>=2.25.0
30
+ Requires-Dist: aiohttp>=3.9.0
31
+ Provides-Extra: dev
32
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
33
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
34
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
35
+ Requires-Dist: pytest-httpserver>=1.0.0; extra == "dev"
36
+ Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
37
+ Requires-Dist: black>=23.0.0; extra == "dev"
38
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
39
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
40
+ Requires-Dist: types-requests>=2.28.0; extra == "dev"
41
+ Dynamic: license-file
42
+
43
+ # Thordata Python SDK
44
+
45
+ <div align="center">
46
+
47
+ **Official Python client for Thordata's Proxy Network, SERP API, Web Unlocker, and Web Scraper API.**
48
+
49
+ *Async-ready, type-safe, built for AI agents and large-scale data collection.*
50
+
51
+ [![CI](https://github.com/Thordata/thordata-python-sdk/actions/workflows/ci.yml/badge.svg)](https://github.com/Thordata/thordata-python-sdk/actions/workflows/ci.yml)
52
+ [![PyPI version](https://img.shields.io/pypi/v/thordata-sdk?color=blue)](https://pypi.org/project/thordata-sdk/)
53
+ [![Python](https://img.shields.io/badge/python-3.9+-blue)](https://python.org)
54
+ [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
55
+ [![Typed](https://img.shields.io/badge/typing-typed-purple)](https://github.com/Thordata/thordata-python-sdk)
56
+
57
+ [Documentation](https://doc.thordata.com) • [Dashboard](https://www.thordata.com) • [Examples](examples/) • [Changelog](CHANGELOG.md)
58
+
59
+ </div>
60
+
61
+ ---
62
+
63
+ ## ✨ Features
64
+
65
+ | Feature | Description |
66
+ |---------|-------------|
67
+ | 🌐 **Proxy Network** | Residential, Mobile, Datacenter, ISP proxies with geo-targeting |
68
+ | 🔍 **SERP API** | Google, Bing, Yandex, DuckDuckGo, Baidu search results |
69
+ | 🔓 **Web Unlocker** | Automatically bypass Cloudflare, CAPTCHAs, and other anti-bot systems |
70
+ | 🕷️ **Web Scraper** | Async task-based scraping for complex sites |
71
+ | ⚡ **Async Support** | Full async/await support with aiohttp |
72
+ | 🔄 **Auto Retry** | Configurable retry with exponential backoff |
73
+ | 📝 **Type Safe** | Full type annotations for IDE autocomplete |
74
+
75
+ ---
76
+
77
+ ## 📦 Installation
78
+
79
+ ```bash
80
+ pip install thordata-sdk
81
+ ```
82
+
83
+ To include the development dependencies (testing, linting, and type checking):
84
+
85
+ ```bash
86
+ pip install "thordata-sdk[dev]"
87
+ ```
88
+
89
+ ---
90
+
91
+ ## 🚀 Quick Start
92
+
93
+ ### Get Your Credentials
94
+
95
+ 1. Sign up at [thordata.com](https://www.thordata.com)
96
+ 2. Navigate to your Dashboard
97
+ 3. Copy your Scraper Token, Public Token, and Public Key
98
+
99
+ ### Basic Usage
100
+
101
+ ```python
102
+ from thordata import ThordataClient
103
+
104
+ # Initialize the client
105
+ client = ThordataClient(
106
+ scraper_token="your_scraper_token",
107
+ public_token="your_public_token", # Optional, for task APIs
108
+ public_key="your_public_key" # Optional, for task APIs
109
+ )
110
+
111
+ # Make a request through the proxy network
112
+ response = client.get("https://httpbin.org/ip")
113
+ print(response.json())
114
+ # {'origin': '123.45.67.89'} # Residential IP
115
+ ```
116
+
117
+ ### Environment Variables
118
+
119
+ Create a `.env` file:
120
+
121
+ ```env
122
+ THORDATA_SCRAPER_TOKEN=your_scraper_token
123
+ THORDATA_PUBLIC_TOKEN=your_public_token
124
+ THORDATA_PUBLIC_KEY=your_public_key
125
+ ```
126
+
127
+ Then load the variables with python-dotenv (included in the `dev` extra, or install it with `pip install python-dotenv`):
128
+
129
+ ```python
130
+ import os
131
+ from dotenv import load_dotenv
132
+ from thordata import ThordataClient
133
+
134
+ load_dotenv()
135
+
136
+ client = ThordataClient(
137
+ scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"),
138
+ public_token=os.getenv("THORDATA_PUBLIC_TOKEN"),
139
+ public_key=os.getenv("THORDATA_PUBLIC_KEY"),
140
+ )
141
+ ```
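`os.getenv` returns `None` when a variable is unset, so a missing entry in `.env` only shows up later as an authentication failure. Below is a small sketch that fails fast instead. `require_env` is a hypothetical helper and is not part of the SDK:

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable, or raise if it is unset or blank.

    Hypothetical helper: the SDK does not ship this function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value  # Returned with surrounding whitespace removed
```

For example, pass `scraper_token=require_env("THORDATA_SCRAPER_TOKEN")` and keep `os.getenv` for the optional public token and key.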
142
+
143
+ ---
144
+
145
+ ## 📖 Usage Guide
146
+
147
+ ### 1. Proxy Network
148
+
149
+ #### Basic Proxy Request
150
+
151
+ ```python
152
+ from thordata import ThordataClient
153
+
154
+ client = ThordataClient(scraper_token="your_token")
155
+
156
+ # GET request through proxy
157
+ response = client.get("https://example.com")
158
+ print(response.text)
159
+
160
+ # POST request through proxy
161
+ response = client.post("https://httpbin.org/post", json={"key": "value"})
162
+ print(response.json())
163
+ ```
164
+
165
+ #### Geo-Targeting
166
+
167
+ ```python
168
+ from thordata import ThordataClient, ProxyConfig
169
+
170
+ client = ThordataClient(scraper_token="your_token")
171
+
172
+ # Create a proxy config with geo-targeting
173
+ config = ProxyConfig(
174
+ username="your_username",
175
+ password="your_password",
176
+ country="us", # Target country
177
+ state="california", # Target state
178
+ city="los_angeles", # Target city
179
+ )
180
+
181
+ response = client.get("https://httpbin.org/ip", proxy_config=config)
182
+ print(response.json())
183
+ ```
184
+
185
+ #### Sticky Sessions
186
+
187
+ Keep the same IP for multiple requests:
188
+
189
+ ```python
190
+ from thordata import ThordataClient, StickySession
191
+
192
+ client = ThordataClient(scraper_token="your_token")
193
+
194
+ # Create a sticky session (same IP for 10 minutes)
195
+ session = StickySession(
196
+ username="your_username",
197
+ password="your_password",
198
+ country="gb",
199
+ duration_minutes=10,
200
+ )
201
+
202
+ # All requests use the same IP
203
+ for i in range(5):
204
+ response = client.get("https://httpbin.org/ip", proxy_config=session)
205
+ print(f"Request {i+1}: {response.json()['origin']}")
206
+ ```
207
+
208
+ #### Proxy Credentials
209
+
210
+ Each proxy product uses **separate credentials**, available in the Thordata Dashboard:
211
+
212
+ ```env
213
+ # Residential Proxy (port 9999)
214
+ THORDATA_RESIDENTIAL_USERNAME=your_residential_username
215
+ THORDATA_RESIDENTIAL_PASSWORD=your_residential_password
216
+
217
+ # Datacenter Proxy (port 7777)
218
+ THORDATA_DATACENTER_USERNAME=your_datacenter_username
219
+ THORDATA_DATACENTER_PASSWORD=your_datacenter_password
220
+
221
+ # Mobile Proxy (port 5555)
222
+ THORDATA_MOBILE_USERNAME=your_mobile_username
223
+ THORDATA_MOBILE_PASSWORD=your_mobile_password
224
+
225
+ # Static ISP Proxy (port 6666, direct IP connection)
226
+ THORDATA_ISP_HOST=your_static_ip_address
227
+ THORDATA_ISP_USERNAME=your_isp_username
228
+ THORDATA_ISP_PASSWORD=your_isp_password
229
+ ```
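Your code may need to pick whichever products are configured. A sketch like the following can collect these variables. The loader and the `ProxyCredentials` class are hypothetical and not part of the SDK. The variable names come from the block above, and the ports come from the Proxy Products & Ports table later in this README.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Ports as listed in the "Proxy Products & Ports" table.
PRODUCT_PORTS: Dict[str, int] = {
    "RESIDENTIAL": 9999,
    "DATACENTER": 7777,
    "MOBILE": 5555,
    "ISP": 6666,
}


@dataclass
class ProxyCredentials:
    product: str
    username: str
    password: str
    port: int
    host: Optional[str] = None  # Only used by static ISP, which connects to your own IP


def load_proxy_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, ProxyCredentials]:
    """Return credentials for every product whose USERNAME and PASSWORD are both set."""
    env = os.environ if env is None else env
    found: Dict[str, ProxyCredentials] = {}
    for product, port in PRODUCT_PORTS.items():
        username = env.get(f"THORDATA_{product}_USERNAME")
        password = env.get(f"THORDATA_{product}_PASSWORD")
        if not username or not password:
            continue  # Product not configured; skip it
        host = env.get("THORDATA_ISP_HOST") if product == "ISP" else None
        found[product.lower()] = ProxyCredentials(product.lower(), username, password, port, host)
    return found
```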
230
+
231
+ #### Residential Proxy
232
+
233
+ ```python
234
+ import requests
+ from thordata import ProxyConfig, ProxyProduct
235
+
236
+ proxy = ProxyConfig(
237
+ username="your_username",
238
+ password="your_password",
239
+ product=ProxyProduct.RESIDENTIAL,
240
+ country="us",
241
+ )
242
+
243
+ response = requests.get(
244
+ "http://httpbin.org/ip",
245
+ proxies=proxy.to_proxies_dict(),
246
+ )
247
+ print(response.json())
248
+ ```
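The example passes the result of `to_proxies_dict()` straight to the `proxies=` argument of `requests`. The sketch below builds a mapping of that shape by hand. It assumes the gateway host `pr.thordata.net` and port `9999` from the configuration reference. It does not reproduce how the SDK encodes geo-targeting options into the username, because this README does not document that format. `build_proxies_dict` is hypothetical.

```python
from urllib.parse import quote


def build_proxies_dict(host: str, port: int, username: str, password: str) -> dict:
    """Build a requests-style proxies mapping for an authenticated HTTP proxy.

    Hypothetical helper; the SDK's ProxyConfig.to_proxies_dict() may format
    the username differently (for example, to carry geo-targeting options).
    """
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # requests picks the proxy by the scheme of the *target* URL, so set both keys.
    return {"http": proxy_url, "https": proxy_url}
```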
249
+
250
+ #### Datacenter Proxy
251
+
252
+ ```python
253
+ from thordata import ProxyConfig, ProxyProduct
+
+ proxy = ProxyConfig(
254
+ username="your_username",
255
+ password="your_password",
256
+ product=ProxyProduct.DATACENTER,
257
+ )
258
+ ```
259
+
260
+ #### Mobile Proxy
261
+
262
+ ```python
263
+ from thordata import ProxyConfig, ProxyProduct
+
+ proxy = ProxyConfig(
264
+ username="your_username",
265
+ password="your_password",
266
+ product=ProxyProduct.MOBILE,
267
+ country="gb",
268
+ )
269
+ ```
270
+
271
+ #### Static ISP Proxy
272
+
273
+ Static ISP proxies connect directly to your purchased IP address:
274
+
275
+ ```python
276
+ import requests
+ from thordata import StaticISPProxy
277
+
278
+ proxy = StaticISPProxy(
279
+ host="your_static_ip_address", # Your purchased IP
280
+ username="your_username",
281
+ password="your_password",
282
+ )
283
+
284
+ response = requests.get(
285
+ "http://httpbin.org/ip",
286
+ proxies=proxy.to_proxies_dict(),
287
+ )
288
+ # Returns your purchased static IP
289
+ ```
290
+
291
+ #### Proxy Examples
292
+
293
+ ```bash
294
+ python examples/proxy_residential.py
295
+ python examples/proxy_datacenter.py
296
+ python examples/proxy_mobile.py
297
+ python examples/proxy_isp.py
298
+ ```
299
+
300
+ ### Run All Examples
301
+
302
+ ```bash
303
+ # SERP API examples
304
+ python examples/demo_serp_api.py
305
+ python examples/demo_serp_google_news.py
306
+
307
+ # Universal API examples
308
+ python examples/demo_universal.py
309
+ python examples/demo_scraping_browser.py
310
+
311
+ # Web Scraper API examples
312
+ python examples/demo_web_scraper_api.py
313
+
314
+ # Proxy Network examples
315
+ python examples/proxy_residential.py
316
+ python examples/proxy_datacenter.py
317
+ python examples/proxy_mobile.py
318
+ python examples/proxy_isp.py
319
+
320
+ # Async high concurrency example
321
+ python examples/async_high_concurrency.py
322
+ ```
323
+
324
+ ### 2. SERP API (Search Engine Results)
325
+
326
+ #### Basic Search
327
+
328
+ ```python
329
+ from thordata import ThordataClient, Engine
330
+
331
+ client = ThordataClient(scraper_token="your_token")
332
+
333
+ # Google search
334
+ results = client.serp_search(
335
+ query="python programming",
336
+ engine=Engine.GOOGLE,
337
+ num=10
338
+ )
339
+
340
+ # Print organic results
341
+ for result in results.get("organic", []):
342
+ print(f"{result['title']}: {result['link']}")
343
+ ```
344
+
345
+ #### Google Verticals: Dedicated Engines vs. `search_type`
346
+
347
+ ```python
348
+ from thordata import ThordataClient
349
+
350
+ client = ThordataClient(scraper_token="YOUR_SCRAPER_TOKEN")
351
+
352
+ # Recommended: use dedicated engines for Google verticals when available
353
+ news = client.serp_search(
354
+ query="pizza",
355
+ engine="google_news",
356
+ country="us",
357
+ language="en",
358
+ num=10,
359
+ so=1, # 0=relevance, 1=date (Google News)
360
+ )
361
+
362
+ # Alternative: use Google generic engine + tbm via `search_type`
363
+ # Note: `search_type` maps to Google tbm and is mainly intended for engine="google".
364
+ results = client.serp_search(
365
+ query="pizza",
366
+ engine="google",
367
+ num=10,
368
+ country="us",
369
+ language="en",
370
+ search_type="news", # tbm=nws (Google generic engine)
371
+ ibp="some_ibp_value",
372
+ lsig="some_lsig_value",
373
+ )
374
+ ```
375
+
376
+ **Note**: The SDK converts all of the arguments above, including extra keyword arguments such as `ibp` and `lsig`, into Thordata SERP API request parameters.
377
+
378
+ #### Advanced Search Options
379
+
380
+ ```python
381
+ from thordata import ThordataClient, SerpRequest
382
+
383
+ client = ThordataClient(scraper_token="your_token")
384
+
385
+ # Create a detailed search request
386
+ request = SerpRequest(
387
+ query="best laptops 2024",
388
+ engine="google_shopping",
389
+ num=20,
390
+ country="us",
391
+ language="en",
392
+ safe_search=True,
393
+ device="mobile",
394
+ # Shopping-specific params can be passed via extra_params
395
+ # e.g. min_price=500, max_price=1500, sort_by=1, shoprs="..."
396
+ )
397
+
398
+ results = client.serp_search_advanced(request)
399
+ ```
400
+
401
+ #### Multiple Search Engines
402
+
403
+ ```python
404
+ from thordata import ThordataClient, Engine
405
+
406
+ client = ThordataClient(scraper_token="your_token")
407
+
408
+ # Google
409
+ google_results = client.serp_search("AI news", engine=Engine.GOOGLE)
410
+
411
+ # Bing
412
+ bing_results = client.serp_search("AI news", engine=Engine.BING)
413
+
414
+ # Yandex (Russian search engine)
415
+ yandex_results = client.serp_search("AI news", engine=Engine.YANDEX)
416
+
417
+ # DuckDuckGo
418
+ ddg_results = client.serp_search("AI news", engine=Engine.DUCKDUCKGO)
419
+ ```
420
+
421
+ ---
422
+
423
+ ## 🔧 SERP API Parameter Mapping
424
+
425
+ Thordata's SERP API supports multiple search engines and sub-features (Google Search/Shopping/News, etc.).
426
+ This SDK exposes common parameters as named arguments of `ThordataClient.serp_search` and `SerpRequest`; any other parameter can be passed directly through `**kwargs`.
427
+
428
+ ### Google Search Parameter Mapping
429
+
430
+ | Document Parameter | SDK Field/Usage | Description |
431
+ |-------------------|-----------------|-------------|
432
+ | q | query | Search keyword |
433
+ | engine | engine | Engine.GOOGLE / "google" |
434
+ | google_domain | google_domain | e.g., "google.co.uk" |
435
+ | gl | country | Country/region, e.g., "us" |
436
+ | hl | language | Language, e.g., "en", "zh-CN" |
437
+ | cr | countries_filter | Multi-country filter, e.g., "countryFR"; separate multiple values with \| |
438
+ | lr | languages_filter | Multi-language filter, e.g., "lang_en\|lang_fr" |
439
+ | location | location | Exact location, e.g., "India" |
440
+ | uule | uule | Base64 encoded location string |
441
+ | tbm | search_type | "images"→tbm=isch, "shopping"→tbm=shop, "news"→tbm=nws, "videos"→tbm=vid, other values passed through as-is |
442
+ | start | start | Result offset for pagination |
443
+ | num | num | Number of results per page |
444
+ | ludocid | ludocid | Google Place ID |
445
+ | kgmid | kgmid | Google Knowledge Graph ID |
446
+ | ibp | ibp="..." (kwargs) | Passed through **kwargs |
447
+ | lsig | lsig="..." (kwargs) | Same as above |
448
+ | si | si="..." (kwargs) | Same as above |
449
+ | uds | uds="ADV" (kwargs) | Same as above |
450
+ | tbs | time_filter or tbs="..." | time_filter="week" generates tbs=qdr:w, can also pass complete tbs directly |
451
+ | safe | safe_search | True → safe=active, False → safe=off |
452
+ | nfpr | no_autocorrect | True → nfpr=1 |
453
+ | filter | filter_duplicates | True → filter=1, False → filter=0 |
454
+
455
+ **Example: Google Search Basic Usage**
456
+
457
+ ```python
458
+ results = client.serp_search(
459
+ query="python web scraping best practices",
460
+ engine=Engine.GOOGLE,
461
+ country="us",
462
+ language="en",
463
+ num=10,
464
+ time_filter="week", # Last week
465
+ safe_search=True, # Adult content filter
466
+ )
467
+ ```
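The sketch below makes the mapping concrete. It turns SDK-style arguments into the raw parameter names from the table. It is illustrative only: `to_serp_params` is hypothetical and does not call the API, and the SDK's actual serialization may differ. For example, the SDK may handle booleans or `time_filter` values other than `"week"` in another way.

```python
from typing import Any, Dict, Optional

# tbm values documented in the mapping table; any other value passes through as-is.
_TBM = {"images": "isch", "shopping": "shop", "news": "nws", "videos": "vid"}

# Only "week" -> "qdr:w" is documented above; the other entries follow Google's
# qdr convention and are an assumption about what the SDK does.
_QDR = {"hour": "qdr:h", "day": "qdr:d", "week": "qdr:w", "month": "qdr:m", "year": "qdr:y"}


def to_serp_params(
    query: str,
    engine: str = "google",
    country: Optional[str] = None,
    language: Optional[str] = None,
    num: Optional[int] = None,
    search_type: Optional[str] = None,
    time_filter: Optional[str] = None,
    safe_search: Optional[bool] = None,
    no_autocorrect: bool = False,
    filter_duplicates: Optional[bool] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Translate SDK-style arguments into raw SERP parameters, following the table."""
    params: Dict[str, Any] = {"q": query, "engine": engine}
    if country:
        params["gl"] = country
    if language:
        params["hl"] = language
    if num is not None:
        params["num"] = num
    if search_type:
        params["tbm"] = _TBM.get(search_type, search_type)
    if time_filter:
        params["tbs"] = _QDR.get(time_filter, time_filter)  # Unknown values passed as raw tbs
    if safe_search is not None:
        params["safe"] = "active" if safe_search else "off"
    if no_autocorrect:
        params["nfpr"] = 1
    if filter_duplicates is not None:
        params["filter"] = 1 if filter_duplicates else 0
    params.update(kwargs)  # Pass-through extras such as ibp, lsig, si, uds, or a raw tbs
    return params
```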
468
+
469
+ ### Google Shopping Parameter Mapping
470
+
471
+ Recommended: use the dedicated Google Shopping engine (`engine="google_shopping"`):
472
+
473
+ ```python
474
+ results = client.serp_search(
475
+ query="iPhone 15",
476
+ engine="google_shopping",
477
+ country="us",
478
+ language="en",
479
+ num=20,
480
+ # Shopping parameters are passed through kwargs
481
+ min_price=500,
482
+ max_price=1500,
483
+ sort_by=1,
484
+ free_shipping=True,
485
+ on_sale=True,
486
+ small_business=True,
487
+ direct_link=True,
488
+ shoprs="FILTER_ID_HERE",
489
+ )
490
+ shopping_items = results.get("shopping_results", [])
491
+ ```
492
+ Alternative: use `engine="google"` with `search_type="shopping"` (tbm=shop).
493
+
494
+ | Document Parameter | SDK Field/Usage | Description |
495
+ |-------------------|-----------------|-------------|
496
+ | q | query | Search keyword |
497
+ | google_domain | google_domain | Same as above |
498
+ | gl | country | Same as above |
499
+ | hl | language | Same as above |
500
+ | location | location | Same as above |
501
+ | uule | uule | Same as above |
502
+ | start | start | Offset |
503
+ | num | num | Quantity |
504
+ | tbs | time_filter or tbs="..." | Same as above |
505
+ | shoprs | shoprs="..." (kwargs) | Filter ID |
506
+ | min_price | min_price=... (kwargs) | Minimum price |
507
+ | max_price | max_price=... (kwargs) | Maximum price |
508
+ | sort_by | sort_by=1/2 (kwargs) | Sort order |
509
+ | free_shipping | free_shipping=True/False (kwargs) | Free shipping |
510
+ | on_sale | on_sale=True/False (kwargs) | On sale |
511
+ | small_business | small_business=True/False (kwargs) | Small business |
512
+ | direct_link | direct_link=True/False (kwargs) | Include direct links |
513
+
514
+ ### Google Local Parameter Mapping
515
+
516
+ Google Local covers location-based local searches.
517
+ In the SDK, set `search_type="local"` to select Local mode (the value is passed through as `tbm=local`), and combine it with `location` and/or `uule`.
518
+
519
+ ```python
520
+ results = client.serp_search(
521
+ query="pizza near me",
522
+ engine=Engine.GOOGLE,
523
+ search_type="local",
524
+ google_domain="google.com",
525
+ country="us",
526
+ language="en",
527
+ location="San Francisco",
528
+ uule="w+CAIQICIFU2FuIEZyYW5jaXNjbw", # Example value
529
+ start=0, # Local only accepts 0, 20, 40...
530
+ )
531
+ local_results = results.get("local_results", results.get("organic", []))
532
+ ```
533
+
534
+ | Document Parameter | SDK Field/Usage | Description |
535
+ |-------------------|-----------------|-------------|
536
+ | q | query | Search term |
537
+ | google_domain | google_domain | Domain |
538
+ | gl | country | Country |
539
+ | hl | language | Language |
540
+ | location | location | Location |
541
+ | uule | uule | Encoded location |
542
+ | start | start | Offset (must be 0,20,40...) |
543
+ | ludocid | ludocid | Place ID (commonly used in Local results) |
544
+ | tbs | time_filter or tbs="..." | Advanced filtering |
545
+
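Local offsets must be multiples of 20, so page numbers have to be converted before they are passed as `start`. Below is a hypothetical pair of helpers (not part of the SDK) that assume the step of 20 stated in the table:

```python
def local_start_offset(page: int, step: int = 20) -> int:
    """Return the `start` value for a zero-based page of Google Local results."""
    if page < 0:
        raise ValueError("page must be >= 0")
    return page * step  # page 0 -> 0, page 1 -> 20, page 2 -> 40


def validate_local_start(start: int, step: int = 20) -> None:
    """Raise early for an offset that the mapping table says Local does not accept."""
    if start < 0 or start % step != 0:
        raise ValueError(f"start must be a non-negative multiple of {step}, got {start}")
```

For example, request the third page with `start=local_start_offset(2)`.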
546
+ ### Google Videos Parameter Mapping
547
+
548
+ ```python
549
+ results = client.serp_search(
550
+ query="python async tutorial",
551
+ engine=Engine.GOOGLE,
552
+ search_type="videos", # tbm=vid
553
+ country="us",
554
+ language="en",
555
+ languages_filter="lang_en|lang_fr",
556
+ location="United States",
557
+ uule="ENCODED_LOCATION_HERE",
558
+ num=10,
559
+ time_filter="month",
560
+ safe_search=True,
561
+ filter_duplicates=True,
562
+ )
563
+ video_results = results.get("video_results", results.get("organic", []))
564
+ ```
565
+
566
+ | Document Parameter | SDK Field/Usage | Description |
567
+ |-------------------|-----------------|-------------|
568
+ | q | query | Search term |
569
+ | google_domain | google_domain | Domain |
570
+ | gl | country | Country |
571
+ | hl | language | Language |
572
+ | lr | languages_filter | Multi-language filter |
573
+ | location | location | Geographic location |
574
+ | uule | uule | Encoded location |
575
+ | start | start | Offset |
576
+ | num | num | Quantity |
577
+ | tbs | time_filter or tbs="..." | Time and advanced filtering |
578
+ | safe | safe_search | Adult content filter |
579
+ | nfpr | no_autocorrect | Disable auto-correction |
580
+ | filter | filter_duplicates | Remove duplicates |
581
+
582
+ ### Google News Parameter Mapping
583
+
584
+ Google News accepts engine-specific token parameters that target a particular topic, publication, section, or story.
585
+
586
+ ```python
587
+ results = client.serp_search(
588
+ query="AI regulation",
589
+ engine="google_news",
590
+ country="us",
591
+ language="en",
592
+ topic_token="YOUR_TOPIC_TOKEN",
593
+ publication_token="YOUR_PUBLICATION_TOKEN",
594
+ section_token="YOUR_SECTION_TOKEN",
595
+ story_token="YOUR_STORY_TOKEN",
596
+ so=1, # 0=relevance, 1=date
597
+ )
598
+ news_results = results.get("news_results", results.get("organic", []))
599
+ ```
600
+
601
+ | Document Parameter | SDK Field/Usage | Description |
602
+ |-------------------|-----------------|-------------|
603
+ | q | query | Search term |
604
+ | gl | country | Country |
605
+ | hl | language | Language |
606
+ | topic_token | topic_token="..." (kwargs) | Topic token |
607
+ | publication_token | publication_token="..." (kwargs) | Media token |
608
+ | section_token | section_token="..." (kwargs) | Section token |
609
+ | story_token | story_token="..." (kwargs) | Story token |
610
+ | so | so=0/1 (kwargs) | Sort: 0=relevance, 1=date |
611
+
612
+ ---
613
+
614
+ 👉 For more SERP modes and parameter mappings, see [docs/serp_reference.md](docs/serp_reference.md).
615
+
616
+ ## 🔓 Web Unlocker (Universal Scraping API)
617
+
618
+ Automatically bypass anti-bot protections:
619
+
620
+ #### Basic Usage
621
+
622
+ ```python
623
+ from thordata import ThordataClient
624
+
625
+ client = ThordataClient(scraper_token="your_token")
626
+
627
+ # Get HTML content
628
+ html = client.universal_scrape(
629
+ url="https://example.com",
630
+ js_render=True, # Enable JavaScript rendering
631
+ )
632
+ print(html[:500])
633
+ ```
634
+
635
+ #### Advanced Options
636
+
637
+ ```python
638
+ from thordata import ThordataClient, UniversalScrapeRequest
639
+
640
+ client = ThordataClient(scraper_token="your_token")
641
+
642
+ request = UniversalScrapeRequest(
643
+ url="https://example.com",
644
+ js_render=True,
645
+ output_format="html",
646
+ country="us",
647
+ block_resources="image,font", # Speed up by blocking resources
648
+ clean_content="js,css", # Remove JS/CSS from output
649
+ wait=5000, # Wait 5 seconds after load
650
+ wait_for=".content-loaded", # Wait for CSS selector
651
+ headers=[
652
+ {"name": "Accept-Language", "value": "en-US"}
653
+ ],
654
+ cookies=[
655
+ {"name": "session", "value": "abc123"}
656
+ ],
657
+ )
658
+
659
+ html = client.universal_scrape_advanced(request)
660
+ ```
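The `headers` and `cookies` fields in the example take lists of `{"name": ..., "value": ...}` objects. If you already hold these values as a plain dict, a hypothetical one-line converter (not an SDK function) produces that shape:

```python
from typing import Dict, List


def to_name_value_list(pairs: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"Header-Name": "value"} into [{"name": ..., "value": ...}, ...].

    Insertion order of the input dict is preserved.
    """
    return [{"name": name, "value": value} for name, value in pairs.items()]
```

For example: `headers=to_name_value_list({"Accept-Language": "en-US"})`.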
661
+
662
+ #### Take Screenshots
663
+
664
+ ```python
665
+ from thordata import ThordataClient
666
+
667
+ client = ThordataClient(scraper_token="your_token")
668
+
669
+ # Get PNG screenshot
670
+ png_bytes = client.universal_scrape(
671
+ url="https://example.com",
672
+ js_render=True,
673
+ output_format="png",
674
+ )
675
+
676
+ # Save to file
677
+ with open("screenshot.png", "wb") as f:
678
+ f.write(png_bytes)
679
+ ```
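The example writes whatever `universal_scrape` returns straight to disk. A defensive alternative is a hypothetical `save_png` helper that checks the 8-byte PNG file signature first. This prevents a text error body from being saved as `screenshot.png`:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # First 8 bytes of every PNG file


def save_png(data: bytes, path: str) -> None:
    """Write screenshot bytes to disk, refusing anything that is not PNG data."""
    if not isinstance(data, (bytes, bytearray)) or not data.startswith(PNG_SIGNATURE):
        # Likely an HTML or JSON error body rather than an image; do not write the file.
        raise ValueError("response is not PNG data")
    with open(path, "wb") as f:
        f.write(data)
```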
680
+
681
+ ### Web Scraper API (Async Tasks)
682
+
683
+ For complex scraping jobs that run asynchronously on Thordata's side (create a task, wait for it to finish, then download the results):
684
+
685
+ ```python
686
+ from thordata import ThordataClient
687
+
688
+ client = ThordataClient(
689
+ scraper_token="your_token",
690
+ public_token="your_public_token",
691
+ public_key="your_public_key",
692
+ )
693
+
694
+ # Create a scraping task
695
+ task_id = client.create_scraper_task(
696
+ file_name="youtube_channel_data",
697
+ spider_id="youtube_video-post_by-url", # From Dashboard
698
+ spider_name="youtube.com",
699
+ parameters={
700
+ "url": "https://www.youtube.com/@PewDiePie/videos",
701
+ "num_of_posts": "50"
702
+ }
703
+ )
704
+ print(f"Task created: {task_id}")
705
+
706
+ # Wait for completion (with timeout)
707
+ status = client.wait_for_task(task_id, max_wait=300)
708
+ print(f"Task status: {status}")
709
+
710
+ # Get results
711
+ if status in ("ready", "success"):
712
+ download_url = client.get_task_result(task_id)
713
+ print(f"Download: {download_url}")
714
+ ```
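`wait_for_task` handles the polling in the example above. You may need similar behavior around a status check of your own, for example with a different interval or a custom failure status. The loop can be sketched as shown below.

This is a hypothetical helper, not the SDK's implementation. `"ready"` and `"success"` are the completion statuses checked in the example above. You supply the failure statuses yourself because this README does not list them. The `sleep` and `clock` parameters exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    get_status: Callable[[], str],
    done_statuses: Iterable[str] = ("ready", "success"),
    failed_statuses: Iterable[str] = (),
    max_wait: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or max_wait seconds pass.

    Returns the terminal status; raises TimeoutError if the deadline is reached first.
    """
    done = set(done_statuses)
    failed = set(failed_statuses)
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status in done or status in failed:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"task still {status!r} after {max_wait} seconds")
        sleep(min(interval, remaining))  # Never sleep past the deadline
```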
715
+
716
+ ### Async Client (High Concurrency)
717
+
718
+ For high-throughput workloads, `AsyncThordataClient` issues requests concurrently with `asyncio`:
719
+
720
+ ```python
721
+ import asyncio
722
+ from thordata import AsyncThordataClient
723
+
724
+ async def main():
725
+ async with AsyncThordataClient(
726
+ scraper_token="your_token",
727
+ public_token="your_public_token",
728
+ public_key="your_public_key",
729
+ ) as client:
730
+
731
+ # Concurrent proxy requests
732
+ urls = [
733
+ "https://httpbin.org/ip",
734
+ "https://httpbin.org/headers",
735
+ "https://httpbin.org/user-agent",
736
+ ]
737
+
738
+ tasks = [client.get(url) for url in urls]
739
+ responses = await asyncio.gather(*tasks)
740
+
741
+ for resp in responses:
742
+ print(await resp.json())
743
+
744
+ asyncio.run(main())
745
+ ```
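`asyncio.gather` starts every request at once, which can hit rate limits when the URL list is long. A common pattern is to cap the number of in-flight requests with a semaphore. This is plain `asyncio`, not an SDK feature, and the `gather_bounded` helper below is hypothetical:

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    func: Callable[[T], Awaitable[R]],
    items: Sequence[T],
    limit: int = 10,
) -> List[R]:
    """Run func(item) for every item with at most `limit` calls in flight at once.

    Results come back in the same order as `items`, as with asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:  # Waits here while `limit` calls are already running
            return await func(item)

    return list(await asyncio.gather(*(run_one(item) for item in items)))
```

Inside the `async with` block above, `responses = await gather_bounded(client.get, urls, limit=10)` would replace the unbounded `gather`. The limit of 10 is an arbitrary starting point.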
746
+
747
+ #### Async SERP Search
748
+
749
+ ```python
750
+ import asyncio
751
+ from thordata import AsyncThordataClient, Engine
752
+
753
+ async def search_multiple():
754
+ async with AsyncThordataClient(scraper_token="your_token") as client:
755
+ queries = ["python", "javascript", "rust", "go"]
756
+
757
+ tasks = [
758
+ client.serp_search(q, engine=Engine.GOOGLE)
759
+ for q in queries
760
+ ]
761
+
762
+ results = await asyncio.gather(*tasks)
763
+
764
+ for query, result in zip(queries, results):
765
+ count = len(result.get("organic", []))
766
+ print(f"{query}: {count} results")
767
+
768
+ asyncio.run(search_multiple())
769
+ ```
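If any search raises, `asyncio.gather` propagates the first exception to the caller, and you lose access to the other results. Passing `return_exceptions=True` keeps partial results. Below is a sketch of that pattern as a hypothetical helper (not part of the SDK):

```python
import asyncio
from typing import Any, Awaitable, Dict, List, Tuple


async def gather_partial(
    named: Dict[str, Awaitable[Any]],
) -> Tuple[Dict[str, Any], Dict[str, BaseException]]:
    """Await all awaitables concurrently and split successes from failures by key."""
    keys: List[str] = list(named)
    # return_exceptions=True returns raised exceptions as values instead of propagating them.
    outcomes = await asyncio.gather(*named.values(), return_exceptions=True)
    ok: Dict[str, Any] = {}
    failed: Dict[str, BaseException] = {}
    for key, outcome in zip(keys, outcomes):
        if isinstance(outcome, BaseException):
            failed[key] = outcome
        else:
            ok[key] = outcome
    return ok, failed
```

For example: `ok, failed = await gather_partial({q: client.serp_search(q, engine=Engine.GOOGLE) for q in queries})`.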
770
+
771
+ ### Location APIs
772
+
773
+ Discover available geo-targeting options:
774
+
775
+ ```python
776
+ from thordata import ThordataClient, ProxyType
777
+
778
+ client = ThordataClient(
779
+ scraper_token="your_token",
780
+ public_token="your_public_token",
781
+ public_key="your_public_key",
782
+ )
783
+
784
+ # List all supported countries
785
+ countries = client.list_countries(proxy_type=ProxyType.RESIDENTIAL)
786
+ print(f"Supported countries: {len(countries)}")
787
+
788
+ # List states for a country
789
+ states = client.list_states("US")
790
+ for state in states[:5]:
791
+ print(f" {state['state_code']}: {state['state_name']}")
792
+
793
+ # List cities
794
+ cities = client.list_cities("US", state_code="california")
795
+ print(f"Cities in California: {len(cities)}")
796
+
797
+ # List ASNs (for ISP targeting)
798
+ asns = client.list_asn("US")
799
+ for asn in asns[:5]:
800
+ print(f" {asn['asn_code']}: {asn['asn_name']}")
801
+ ```
802
+
803
+ ### Error Handling
804
+
805
+ ```python
806
+ from thordata import (
807
+ ThordataClient,
808
+ ThordataError,
809
+ ThordataAuthError,
810
+ ThordataRateLimitError,
811
+ ThordataNetworkError,
812
+ ThordataTimeoutError,
813
+ )
814
+
815
+ client = ThordataClient(scraper_token="your_token")
816
+
817
+ try:
818
+ result = client.serp_search("test query")
819
+ except ThordataAuthError as e:
820
+ print(f"Authentication failed: {e}")
821
+ print(f"Check your token. Status code: {e.status_code}")
822
+ except ThordataRateLimitError as e:
823
+ print(f"Rate limited: {e}")
824
+ if e.retry_after:
825
+ print(f"Retry after {e.retry_after} seconds")
826
+ except ThordataTimeoutError as e:
827
+ print(f"Request timed out: {e}")
828
+ except ThordataNetworkError as e:
829
+ print(f"Network error: {e}")
830
+ except ThordataError as e:
831
+ print(f"General error: {e}")
832
+ ```
833
+
834
+ ### Retry Configuration
835
+
836
+ Customize automatic retry behavior:
837
+
838
+ ```python
839
+ from thordata import ThordataClient, RetryConfig
840
+
841
+ # Custom retry configuration
842
+ retry_config = RetryConfig(
843
+ max_retries=5, # Maximum retry attempts
844
+ backoff_factor=2.0, # Exponential backoff multiplier
845
+ max_backoff=120.0, # Maximum wait between retries
846
+ jitter=True, # Add randomness to prevent thundering herd
847
+ )
848
+
849
+ client = ThordataClient(
850
+ scraper_token="your_token",
851
+ retry_config=retry_config,
852
+ )
853
+
854
+ # Requests will automatically retry on transient failures
855
+ response = client.get("https://example.com")
856
+ ```
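This README does not state the exact formula behind `backoff_factor` and `max_backoff`. The sketch below shows one common interpretation: exponential growth capped at `max_backoff`, with optional "full jitter". Treat the formula as an assumption rather than the SDK's behavior. `backoff_delays` is hypothetical.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 5,
    backoff_factor: float = 2.0,
    max_backoff: float = 120.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait in seconds before each retry attempt.

    Assumed formula: delay = min(max_backoff, backoff_factor ** attempt); with jitter,
    the delay is drawn uniformly from [0, delay] so that clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays: List[float] = []
    for attempt in range(max_retries):
        delay = min(max_backoff, backoff_factor ** attempt)
        if jitter:
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

Under this assumption, `backoff_delays(jitter=False)` yields `[1.0, 2.0, 4.0, 8.0, 16.0]` for the settings above.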
857
+
858
+ ---
859
+
860
+ ## 🔧 Configuration Reference
861
+
862
+ ### ThordataClient Parameters
863
+
864
+ | Parameter | Type | Default | Description |
865
+ |-----------|------|---------|-------------|
866
+ | scraper_token | str | required | API token from Dashboard |
867
+ | public_token | str | None | Public API token (for tasks/locations) |
868
+ | public_key | str | None | Public API key |
869
+ | proxy_host | str | "pr.thordata.net" | Proxy gateway host |
870
+ | proxy_port | int | 9999 | Proxy gateway port |
871
+ | timeout | int | 30 | Default request timeout (seconds) |
872
+ | retry_config | RetryConfig | None | Retry configuration |
873
+
874
+ ### ProxyConfig Parameters
875
+
876
+ | Parameter | Type | Default | Description |
877
+ |-----------|------|---------|-------------|
878
+ | username | str | required | Proxy username |
879
+ | password | str | required | Proxy password |
880
+ | product | ProxyProduct | RESIDENTIAL | Proxy type |
881
+ | country | str | None | ISO 3166-1 alpha-2 code |
882
+ | state | str | None | State name (lowercase) |
883
+ | city | str | None | City name (lowercase) |
884
+ | continent | str | None | Continent code (af/an/as/eu/na/oc/sa) |
885
+ | asn | str | None | ASN code (requires country) |
886
+ | session_id | str | None | Session ID for sticky sessions |
887
+ | session_duration | int | None | Session duration (1-90 minutes) |
888
+
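The constraints in this table can be checked before a request is made. Below is a hypothetical pre-flight check (not part of the SDK) that covers only the rules stated above. The SDK may validate these options differently or not at all.

```python
import re
from typing import List, Optional

CONTINENTS = {"af", "an", "as", "eu", "na", "oc", "sa"}  # Codes listed in the table


def check_proxy_options(
    country: Optional[str] = None,
    continent: Optional[str] = None,
    asn: Optional[str] = None,
    session_duration: Optional[int] = None,
) -> List[str]:
    """Return a list of problems with the given targeting options (empty if none)."""
    problems: List[str] = []
    if country is not None and not re.fullmatch(r"[A-Za-z]{2}", country):
        problems.append(f"country must be an ISO 3166-1 alpha-2 code, got {country!r}")
    if continent is not None and continent.lower() not in CONTINENTS:
        problems.append(f"unknown continent code {continent!r}")
    if asn is not None and country is None:
        problems.append("asn requires country")
    if session_duration is not None and not 1 <= session_duration <= 90:
        problems.append("session_duration must be between 1 and 90 minutes")
    return problems
```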
889
+ ### Proxy Products & Ports
890
+
891
+ | Product | Port | Description |
892
+ |---------|------|-------------|
893
+ | RESIDENTIAL | 9999 | Rotating residential IPs |
894
+ | MOBILE | 5555 | Mobile carrier IPs |
895
+ | DATACENTER | 7777 | Datacenter IPs |
896
+ | ISP | 6666 | Static ISP IPs |
897
+
898
+ ---
899
+
900
+ ## 📁 Project Structure
901
+
902
+ ```
903
+ thordata-python-sdk/
904
+ ├── src/thordata/
905
+ │ ├── __init__.py # Public API exports
906
+ │ ├── client.py # Sync client
907
+ │ ├── async_client.py # Async client
908
+ │ ├── models.py # Data models (ProxyConfig, SerpRequest, etc.)
909
+ │ ├── enums.py # Enumerations
910
+ │ ├── exceptions.py # Exception hierarchy
911
+ │ ├── retry.py # Retry mechanism
912
+ │ ├── parameters.py # Parameter definitions
913
+ │ ├── demo.py # Demo functionality
914
+ │ └── _utils.py # Internal utilities
915
+ ├── tests/
916
+ │ ├── __init__.py # Test initialization
917
+ │ ├── conftest.py # Pytest configuration
918
+ │ ├── test_client.py # Client tests
919
+ │ ├── test_async_client.py # Async client tests
920
+ │ ├── test_client_errors.py # Client error tests
921
+ │ ├── test_async_client_errors.py # Async client error tests
922
+ │ ├── test_enums.py # Enums tests
923
+ │ ├── test_models.py # Models tests
924
+ │ ├── test_exceptions.py # Exceptions tests
925
+ │ ├── test_demo_entrypoint.py # Demo entrypoint tests
926
+ │ ├── test_task_status_and_wait.py # Task status tests
927
+ │ ├── test_user_agent.py # User agent tests
928
+ │ ├── test_examples_demo_serp_api.py # SERP API examples tests
929
+ │ ├── test_examples_demo_universal.py # Universal API examples tests
930
+ │ ├── test_examples_demo_web_scraper_api.py # Web scraper examples tests
931
+ │ └── test_examples_async_high_concurrency.py # Async high concurrency tests
932
+ ├── examples/
933
+ │ ├── demo_serp_api.py # SERP API demo
934
+ │ ├── demo_serp_google_news.py # Google News demo
935
+ │ ├── demo_universal.py # Universal API demo
936
+ │ ├── demo_web_scraper_api.py # Web scraper demo
937
+ │ ├── demo_scraping_browser.py # Scraping browser demo
938
+ │ ├── async_high_concurrency.py # Async high concurrency demo
939
+ │ ├── proxy_residential.py # Residential proxy example
940
+ │ ├── proxy_datacenter.py # Datacenter proxy example
941
+ │ ├── proxy_mobile.py # Mobile proxy example
942
+ │ ├── proxy_isp.py # Static ISP proxy example
943
+ │ └── .env.example # Environment variables template
944
+ ├── docs/
945
+ │ ├── serp_reference.md # SERP API reference
946
+ │ ├── serp_reference_legacy.md # Legacy SERP reference
947
+ │ └── universal_reference.md # Universal API reference
948
+ ├── .github/
949
+ │ ├── dependabot.yml # Dependabot configuration
950
+ │ ├── pull_request_template.md # PR template
951
+ │ ├── ISSUE_TEMPLATE/
952
+ │ │ ├── bug_report.md # Bug report template
953
+ │ │ └── feature_request.md # Feature request template
954
+ │ └── workflows/
955
+ │ ├── ci.yml # Continuous integration
956
+ │ └── pypi-publish.yml # PyPI publishing workflow
957
+ ├── .env.example # Environment variables template
958
+ ├── .prettierrc # Prettier configuration
959
+ ├── .prettierignore # Prettier ignore patterns
960
+ ├── eslint.config.cjs # ESLint configuration
961
+ ├── LICENSE # License file
962
+ ├── package.json # Package configuration
963
+ ├── py.typed # Type hints marker
964
+ ├── pyproject.toml # Python package configuration
965
+ ├── pytest.ini # Pytest configuration
966
+ ├── requirements.txt # Python dependencies
967
+ ├── tsconfig.json # TypeScript configuration
968
+ ├── tsconfig.build.json # TypeScript build configuration
969
+ ├── vitest.config.ts # Vitest testing configuration
970
+ └── README.md # This file
971
+ ```
972
+
973
+ ---
974
+
975
+ ## 🧪 Development
976
+
977
+ ### Setup
978
+
979
+ ```bash
980
+ # Clone the repository
981
+ git clone https://github.com/Thordata/thordata-python-sdk.git
982
+ cd thordata-python-sdk
983
+
984
+ # Create virtual environment
985
+ python -m venv venv
986
+ source venv/bin/activate # On Windows: venv\Scripts\activate
987
+
988
+ # Install with dev dependencies
989
+ pip install -e ".[dev]"
990
+ ```
991
+
992
+ ### Run Tests
993
+
994
+ ```bash
995
+ # Run all tests
996
+ pytest
997
+
998
+ # Run with coverage
999
+ pytest --cov=thordata --cov-report=html
1000
+
1001
+ # Run specific test file
1002
+ pytest tests/test_client.py -v
1003
+ ```
1004
+
1005
+ ### Code Quality
1006
+
1007
+ ```bash
1008
+ # Format code
1009
+ black src tests
1010
+
1011
+ # Lint
1012
+ ruff check src tests
1013
+
1014
+ # Type check
1015
+ mypy src
1016
+ ```
1017
+
1018
+ ---
1019
+
1020
+ ## 📝 Changelog
1021
+
1022
+ See [CHANGELOG.md](CHANGELOG.md) for version history.
1023
+
1024
+ ---
1025
+
1026
+ ## 🤝 Contributing
1027
+
1028
+ Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
1029
+
1030
+ 1. Fork the repository
1031
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
1032
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
1033
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
1034
+ 5. Open a Pull Request
1035
+
1036
+ ---
1037
+
1038
+ ## 📄 License
1039
+
1040
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
1041
+
1042
+ ---
1043
+
1044
+ ## 🆘 Support
1045
+
1046
+ - 📧 **Email**: support@thordata.com
1047
+ - 📚 **Documentation**: [doc.thordata.com](https://doc.thordata.com)
1048
+ - 🐛 **Issues**: [GitHub Issues](https://github.com/Thordata/thordata-python-sdk/issues)
1049
+ - 💬 **Dashboard**: [thordata.com](https://www.thordata.com)
1050
+
1051
+ <div align="center">
1052
+ <sub>Built with ❤️ by the Thordata Team</sub>
1053
+ </div>