thordata-sdk 0.6.0__py3-none-any.whl

@@ -0,0 +1,1053 @@
1
+ Metadata-Version: 2.4
2
+ Name: thordata-sdk
3
+ Version: 0.6.0
4
+ Summary: The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
5
+ Author-email: Thordata Developer Team <support@thordata.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://www.thordata.com
8
+ Project-URL: Documentation, https://github.com/Thordata/thordata-python-sdk#readme
9
+ Project-URL: Source, https://github.com/Thordata/thordata-python-sdk
10
+ Project-URL: Tracker, https://github.com/Thordata/thordata-python-sdk/issues
11
+ Project-URL: Changelog, https://github.com/Thordata/thordata-python-sdk/blob/main/CHANGELOG.md
12
+ Keywords: web scraping,proxy,residential proxy,datacenter proxy,ai,llm,data-mining,serp,thordata,web scraper,anti-bot bypass
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
16
+ Classifier: Topic :: Internet :: WWW/HTTP
17
+ Classifier: Topic :: Internet :: Proxy Servers
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.9
20
+ Classifier: Programming Language :: Python :: 3.10
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: License :: OSI Approved :: MIT License
24
+ Classifier: Operating System :: OS Independent
25
+ Classifier: Typing :: Typed
26
+ Requires-Python: >=3.9
27
+ Description-Content-Type: text/markdown
28
+ License-File: LICENSE
29
+ Requires-Dist: requests>=2.25.0
30
+ Requires-Dist: aiohttp>=3.9.0
31
+ Provides-Extra: dev
32
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
33
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
34
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
35
+ Requires-Dist: pytest-httpserver>=1.0.0; extra == "dev"
36
+ Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
37
+ Requires-Dist: black>=23.0.0; extra == "dev"
38
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
39
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
40
+ Requires-Dist: types-requests>=2.28.0; extra == "dev"
41
+ Dynamic: license-file
42
+
43
+ # Thordata Python SDK
44
+
45
+ <div align="center">
46
+
47
+ **Official Python client for Thordata's Proxy Network, SERP API, Web Unlocker, and Web Scraper API.**
48
+
49
+ *Async-ready, type-safe, built for AI agents and large-scale data collection.*
50
+
51
+ [![CI](https://github.com/Thordata/thordata-python-sdk/actions/workflows/ci.yml/badge.svg)](https://github.com/Thordata/thordata-python-sdk/actions/workflows/ci.yml)
52
+ [![PyPI version](https://img.shields.io/pypi/v/thordata-sdk?color=blue)](https://pypi.org/project/thordata-sdk/)
53
+ [![Python](https://img.shields.io/badge/python-3.9+-blue)](https://python.org)
54
+ [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
55
+ [![Typed](https://img.shields.io/badge/typing-typed-purple)](https://github.com/Thordata/thordata-python-sdk)
56
+
57
+ [Documentation](https://doc.thordata.com) • [Dashboard](https://www.thordata.com) • [Examples](examples/) • [Changelog](CHANGELOG.md)
58
+
59
+ </div>
60
+
61
+ ---
62
+
63
+ ## ✨ Features
64
+
65
+ | Feature | Description |
66
+ |---------|-------------|
67
+ | 🌐 **Proxy Network** | Residential, Mobile, Datacenter, ISP proxies with geo-targeting |
68
+ | 🔍 **SERP API** | Google, Bing, Yandex, DuckDuckGo, Baidu search results |
69
+ | 🔓 **Web Unlocker** | Bypass Cloudflare, CAPTCHAs, anti-bot systems automatically |
70
+ | 🕷️ **Web Scraper** | Async task-based scraping for complex sites |
71
+ | ⚡ **Async Support** | Full async/await support with aiohttp |
72
+ | 🔄 **Auto Retry** | Configurable retry with exponential backoff |
73
+ | 📝 **Type Safe** | Full type annotations for IDE autocomplete |
74
+
75
+ ---
76
+
77
+ ## 📦 Installation
78
+
79
+ ```bash
80
+ pip install thordata-sdk
81
+ ```
82
+
83
+ For development:
84
+
85
+ ```bash
86
+ pip install thordata-sdk[dev]
87
+ ```
88
+
89
+ ---
90
+
91
+ ## 🚀 Quick Start
92
+
93
+ ### Get Your Credentials
94
+
95
+ 1. Sign up at [thordata.com](https://www.thordata.com)
96
+ 2. Navigate to your Dashboard
97
+ 3. Copy your Scraper Token, Public Token, and Public Key
98
+
99
+ ### Basic Usage
100
+
101
+ ```python
102
+ from thordata import ThordataClient
103
+
104
+ # Initialize the client
105
+ client = ThordataClient(
106
+ scraper_token="your_scraper_token",
107
+ public_token="your_public_token", # Optional, for task APIs
108
+ public_key="your_public_key" # Optional, for task APIs
109
+ )
110
+
111
+ # Make a request through the proxy network
112
+ response = client.get("https://httpbin.org/ip")
113
+ print(response.json())
114
+ # {'origin': '123.45.67.89'} # Residential IP
115
+ ```
116
+
117
+ ### Environment Variables
118
+
119
+ Create a `.env` file:
120
+
121
+ ```env
122
+ THORDATA_SCRAPER_TOKEN=your_scraper_token
123
+ THORDATA_PUBLIC_TOKEN=your_public_token
124
+ THORDATA_PUBLIC_KEY=your_public_key
125
+ ```
126
+
127
+ Then use with python-dotenv:
128
+
129
+ ```python
130
+ import os
131
+ from dotenv import load_dotenv
132
+ from thordata import ThordataClient
133
+
134
+ load_dotenv()
135
+
136
+ client = ThordataClient(
137
+ scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"),
138
+ public_token=os.getenv("THORDATA_PUBLIC_TOKEN"),
139
+ public_key=os.getenv("THORDATA_PUBLIC_KEY"),
140
+ )
141
+ ```
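
Because only the scraper token is required and the other two values are optional, it can help to validate the environment before constructing the client. The following is a minimal sketch, assuming the variable names from the `.env` file above; `read_thordata_credentials` is a hypothetical helper and is not part of the SDK.

```python
import os
from typing import Dict, Mapping, Optional


def read_thordata_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect Thordata credentials from an environment mapping.

    Hypothetical helper for illustration; not part of thordata-sdk.
    """
    env = os.environ if env is None else env

    # The scraper token is the only required credential.
    scraper_token = env.get("THORDATA_SCRAPER_TOKEN", "").strip()
    if not scraper_token:
        raise RuntimeError("THORDATA_SCRAPER_TOKEN is not set")

    creds = {"scraper_token": scraper_token}

    # The public token and key are optional (task APIs only), so they are
    # included only when present and non-empty.
    optional = (
        ("public_token", "THORDATA_PUBLIC_TOKEN"),
        ("public_key", "THORDATA_PUBLIC_KEY"),
    )
    for kwarg, var in optional:
        value = env.get(var, "").strip()
        if value:
            creds[kwarg] = value
    return creds
```

The returned dictionary uses the keyword names shown in the Quick Start, so it could be passed as `ThordataClient(**read_thordata_credentials())` after `load_dotenv()` has run.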
142
+
143
+ ---
144
+
145
+ ## 📖 Usage Guide
146
+
147
+ ### 1. Proxy Network
148
+
149
+ #### Basic Proxy Request
150
+
151
+ ```python
152
+ from thordata import ThordataClient
153
+
154
+ client = ThordataClient(scraper_token="your_token")
155
+
156
+ # GET request through proxy
157
+ response = client.get("https://example.com")
158
+ print(response.text)
159
+
160
+ # POST request through proxy
161
+ response = client.post("https://httpbin.org/post", json={"key": "value"})
162
+ print(response.json())
163
+ ```
164
+
165
+ #### Geo-Targeting
166
+
167
+ ```python
168
+ from thordata import ThordataClient, ProxyConfig
169
+
170
+ client = ThordataClient(scraper_token="your_token")
171
+
172
+ # Create a proxy config with geo-targeting
173
+ config = ProxyConfig(
174
+ username="your_username",
175
+ password="your_password",
176
+ country="us", # Target country
177
+ state="california", # Target state
178
+ city="los_angeles", # Target city
179
+ )
180
+
181
+ response = client.get("https://httpbin.org/ip", proxy_config=config)
182
+ print(response.json())
183
+ ```
184
+
185
+ #### Sticky Sessions
186
+
187
+ Keep the same IP for multiple requests:
188
+
189
+ ```python
190
+ from thordata import ThordataClient, StickySession
191
+
192
+ client = ThordataClient(scraper_token="your_token")
193
+
194
+ # Create a sticky session (same IP for 10 minutes)
195
+ session = StickySession(
196
+ username="your_username",
197
+ password="your_password",
198
+ country="gb",
199
+ duration_minutes=10,
200
+ )
201
+
202
+ # All requests use the same IP
203
+ for i in range(5):
204
+     response = client.get("https://httpbin.org/ip", proxy_config=session)
205
+     print(f"Request {i+1}: {response.json()['origin']}")
206
+ ```
207
+
208
+ #### Proxy Credentials
209
+
210
+ Each proxy product requires **separate credentials** from the Thordata Dashboard:
211
+
212
+ ```env
213
+ # Residential Proxy (port 9999)
214
+ THORDATA_RESIDENTIAL_USERNAME=your_residential_username
215
+ THORDATA_RESIDENTIAL_PASSWORD=your_residential_password
216
+
217
+ # Datacenter Proxy (port 7777)
218
+ THORDATA_DATACENTER_USERNAME=your_datacenter_username
219
+ THORDATA_DATACENTER_PASSWORD=your_datacenter_password
220
+
221
+ # Mobile Proxy (port 5555)
222
+ THORDATA_MOBILE_USERNAME=your_mobile_username
223
+ THORDATA_MOBILE_PASSWORD=your_mobile_password
224
+
225
+ # Static ISP Proxy (port 6666, direct IP connection)
226
+ THORDATA_ISP_HOST=your_static_ip_address
227
+ THORDATA_ISP_USERNAME=your_isp_username
228
+ THORDATA_ISP_PASSWORD=your_isp_password
229
+ ```
230
+
231
+ #### Residential Proxy
232
+
233
+ ```python
234
+ import requests
+ from thordata import ProxyConfig, ProxyProduct
235
+
236
+ proxy = ProxyConfig(
237
+ username="your_username",
238
+ password="your_password",
239
+ product=ProxyProduct.RESIDENTIAL,
240
+ country="us",
241
+ )
242
+
243
+ response = requests.get(
244
+ "http://httpbin.org/ip",
245
+ proxies=proxy.to_proxies_dict(),
246
+ )
247
+ print(response.json())
248
+ ```
249
+
250
+ #### Datacenter Proxy
251
+
252
+ ```python
253
+ proxy = ProxyConfig(
254
+ username="your_username",
255
+ password="your_password",
256
+ product=ProxyProduct.DATACENTER,
257
+ )
258
+ ```
259
+
260
+ #### Mobile Proxy
261
+
262
+ ```python
263
+ proxy = ProxyConfig(
264
+ username="your_username",
265
+ password="your_password",
266
+ product=ProxyProduct.MOBILE,
267
+ country="gb",
268
+ )
269
+ ```
270
+
271
+ #### Static ISP Proxy
272
+
273
+ Static ISP proxies connect directly to your purchased IP address:
274
+
275
+ ```python
276
+ import requests
+ from thordata import StaticISPProxy
277
+
278
+ proxy = StaticISPProxy(
279
+ host="your_static_ip_address", # Your purchased IP
280
+ username="your_username",
281
+ password="your_password",
282
+ )
283
+
284
+ response = requests.get(
285
+ "http://httpbin.org/ip",
286
+ proxies=proxy.to_proxies_dict(),
287
+ )
288
+ # Returns your purchased static IP
289
+ ```
290
+
291
+ #### Proxy Examples
292
+
293
+ ```bash
294
+ python examples/proxy_residential.py
295
+ python examples/proxy_datacenter.py
296
+ python examples/proxy_mobile.py
297
+ python examples/proxy_isp.py
298
+ ```
299
+
300
+ ### Run All Examples
301
+
302
+ ```bash
303
+ # SERP API examples
304
+ python examples/demo_serp_api.py
305
+ python examples/demo_serp_google_news.py
306
+
307
+ # Universal API examples
308
+ python examples/demo_universal.py
309
+ python examples/demo_scraping_browser.py
310
+
311
+ # Web Scraper API examples
312
+ python examples/demo_web_scraper_api.py
313
+
314
+ # Proxy Network examples
315
+ python examples/proxy_residential.py
316
+ python examples/proxy_datacenter.py
317
+ python examples/proxy_mobile.py
318
+ python examples/proxy_isp.py
319
+
320
+ # Async high concurrency example
321
+ python examples/async_high_concurrency.py
322
+ ```
323
+
324
+ ### 2. SERP API (Search Engine Results)
325
+
326
+ #### Basic Search
327
+
328
+ ```python
329
+ from thordata import ThordataClient, Engine
330
+
331
+ client = ThordataClient(scraper_token="your_token")
332
+
333
+ # Google search
334
+ results = client.serp_search(
335
+ query="python programming",
336
+ engine=Engine.GOOGLE,
337
+ num=10
338
+ )
339
+
340
+ # Print organic results
341
+ for result in results.get("organic", []):
342
+     print(f"{result['title']}: {result['link']}")
343
+ ```
344
+
345
+ #### General Calling Method
346
+
347
+ ```python
348
+ from thordata import ThordataClient
349
+
350
+ client = ThordataClient(scraper_token="YOUR_SCRAPER_TOKEN")
351
+
352
+ # Recommended: use dedicated engines for Google verticals when available
353
+ news = client.serp_search(
354
+ query="pizza",
355
+ engine="google_news",
356
+ country="us",
357
+ language="en",
358
+ num=10,
359
+ so=1, # 0=relevance, 1=date (Google News)
360
+ )
361
+
362
+ # Alternative: use Google generic engine + tbm via `search_type`
363
+ # Note: `search_type` maps to Google tbm and is mainly intended for engine="google".
364
+ results = client.serp_search(
365
+ query="pizza",
366
+ engine="google",
367
+ num=10,
368
+ country="us",
369
+ language="en",
370
+ search_type="news", # tbm=nws (Google generic engine)
371
+ ibp="some_ibp_value",
372
+ lsig="some_lsig_value",
373
+ )
374
+ ```
375
+
376
+ **Note**: The SDK assembles all of the parameters above into the Thordata SERP API request.
377
+
378
+ #### Advanced Search Options
379
+
380
+ ```python
381
+ from thordata import ThordataClient, SerpRequest
382
+
383
+ client = ThordataClient(scraper_token="your_token")
384
+
385
+ # Create a detailed search request
386
+ request = SerpRequest(
387
+ query="best laptops 2024",
388
+ engine="google_shopping",
389
+ num=20,
390
+ country="us",
391
+ language="en",
392
+ safe_search=True,
393
+ device="mobile",
394
+ # Shopping-specific params can be passed via extra_params
395
+ # e.g. min_price=500, max_price=1500, sort_by=1, shoprs="..."
396
+ )
397
+
398
+ results = client.serp_search_advanced(request)
399
+ ```
400
+
401
+ #### Multiple Search Engines
402
+
403
+ ```python
404
+ from thordata import ThordataClient, Engine
405
+
406
+ client = ThordataClient(scraper_token="your_token")
407
+
408
+ # Google
409
+ google_results = client.serp_search("AI news", engine=Engine.GOOGLE)
410
+
411
+ # Bing
412
+ bing_results = client.serp_search("AI news", engine=Engine.BING)
413
+
414
+ # Yandex (Russian search engine)
415
+ yandex_results = client.serp_search("AI news", engine=Engine.YANDEX)
416
+
417
+ # DuckDuckGo
418
+ ddg_results = client.serp_search("AI news", engine=Engine.DUCKDUCKGO)
419
+ ```
420
+
421
+ ---
422
+
423
+ ## 🔧 SERP API Parameter Mapping
424
+
425
+ Thordata's SERP API supports multiple search engines and sub-features (Google Search/Shopping/News, etc.).
426
+ This SDK wraps common parameters through `ThordataClient.serp_search` and `SerpRequest`, while other parameters can be passed directly through `**kwargs`.
427
+
428
+ ### Google Search Parameter Mapping
429
+
430
+ | Document Parameter | SDK Field/Usage | Description |
431
+ |-------------------|-----------------|-------------|
432
+ | q | query | Search keyword |
433
+ | engine | engine | Engine.GOOGLE / "google" |
434
+ | google_domain | google_domain | e.g., "google.co.uk" |
435
+ | gl | country | Country/region, e.g., "us" |
436
+ | hl | language | Language, e.g., "en", "zh-CN" |
437
+ | cr | countries_filter | Multi-country filter, e.g., "countryFR" |
438
+ | lr | languages_filter | Multi-language filter, e.g., "lang_en" |
439
+ | location | location | Exact location, e.g., "India" |
440
+ | uule | uule | Base64 encoded location string |
441
+ | tbm | search_type | "images"→tbm=isch, "shopping"→tbm=shop, "news"→tbm=nws, "videos"→tbm=vid, other values passed through as-is |
442
+ | start | start | Result offset for pagination |
443
+ | num | num | Number of results per page |
444
+ | ludocid | ludocid | Google Place ID |
445
+ | kgmid | kgmid | Google Knowledge Graph ID |
446
+ | ibp | ibp="..." (kwargs) | Passed through **kwargs |
447
+ | lsig | lsig="..." (kwargs) | Same as above |
448
+ | si | si="..." (kwargs) | Same as above |
449
+ | uds | uds="ADV" (kwargs) | Same as above |
450
+ | tbs | time_filter or tbs="..." | time_filter="week" generates tbs=qdr:w, can also pass complete tbs directly |
451
+ | safe | safe_search | True → safe=active, False → safe=off |
452
+ | nfpr | no_autocorrect | True → nfpr=1 |
453
+ | filter | filter_duplicates | True → filter=1, False → filter=0 |
454
+
455
+ **Example: Google Search Basic Usage**
456
+
457
+ ```python
458
+ results = client.serp_search(
459
+ query="python web scraping best practices",
460
+ engine=Engine.GOOGLE,
461
+ country="us",
462
+ language="en",
463
+ num=10,
464
+ time_filter="week", # Last week
465
+ safe_search=True, # Adult content filter
466
+ )
467
+ ```
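
To make the mapping table concrete, here is a rough sketch of how SDK-style fields could be translated into raw API parameters. It is an illustration only and not the SDK's actual implementation: `to_serp_params` is a hypothetical function, and only the `"week"` time filter is documented above, so the other `time_filter` values are assumptions based on Google's usual `qdr` codes.

```python
from typing import Any, Dict, Optional

# Documented in the table: known search types map to tbm codes,
# any other value is passed through unchanged.
_TBM = {"images": "isch", "shopping": "shop", "news": "nws", "videos": "vid"}

# Only "week" -> "qdr:w" is documented; the other entries are assumptions.
_QDR = {"hour": "qdr:h", "day": "qdr:d", "week": "qdr:w", "month": "qdr:m", "year": "qdr:y"}


def to_serp_params(
    query: str,
    engine: str = "google",
    country: Optional[str] = None,
    language: Optional[str] = None,
    search_type: Optional[str] = None,
    time_filter: Optional[str] = None,
    safe_search: Optional[bool] = None,
    no_autocorrect: bool = False,
    filter_duplicates: Optional[bool] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical translation of SDK fields into SERP API parameters."""
    params: Dict[str, Any] = {"q": query, "engine": engine}
    if country:
        params["gl"] = country
    if language:
        params["hl"] = language
    if search_type:
        params["tbm"] = _TBM.get(search_type, search_type)
    if time_filter:
        if time_filter not in _QDR:
            raise ValueError(f"unknown time_filter: {time_filter!r}")
        params["tbs"] = _QDR[time_filter]
    if safe_search is not None:
        params["safe"] = "active" if safe_search else "off"
    if no_autocorrect:
        params["nfpr"] = "1"
    if filter_duplicates is not None:
        params["filter"] = "1" if filter_duplicates else "0"
    # Remaining keyword arguments (ibp, lsig, si, uds, a full tbs string, ...)
    # are passed through as-is and override anything set above.
    params.update(extra)
    return params
```

Applying `extra` last means an explicit `tbs="..."` wins over `time_filter`, which matches the table's note that a complete `tbs` value can be passed directly.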
468
+
469
+ ### Google Shopping Parameter Mapping
470
+
471
+ Recommended: use the dedicated Google Shopping engine (`engine="google_shopping"`):
472
+
473
+ ```python
474
+ results = client.serp_search(
475
+ query="iPhone 15",
476
+ engine="google_shopping",
477
+ country="us",
478
+ language="en",
479
+ num=20,
480
+ # Shopping parameters are passed through kwargs
481
+ min_price=500,
482
+ max_price=1500,
483
+ sort_by=1,
484
+ free_shipping=True,
485
+ on_sale=True,
486
+ small_business=True,
487
+ direct_link=True,
488
+ shoprs="FILTER_ID_HERE",
489
+ )
490
+ shopping_items = results.get("shopping_results", [])
491
+ ```
492
+ Alternative: use `engine="google"` with `search_type="shopping"` (tbm=shop).
493
+
494
+ | Document Parameter | SDK Field/Usage | Description |
495
+ |-------------------|-----------------|-------------|
496
+ | q | query | Search keyword |
497
+ | google_domain | google_domain | Same as above |
498
+ | gl | country | Same as above |
499
+ | hl | language | Same as above |
500
+ | location | location | Same as above |
501
+ | uule | uule | Same as above |
502
+ | start | start | Offset |
503
+ | num | num | Quantity |
504
+ | tbs | time_filter or tbs="..." | Same as above |
505
+ | shoprs | shoprs="..." (kwargs) | Filter ID |
506
+ | min_price | min_price=... (kwargs) | Minimum price |
507
+ | max_price | max_price=... (kwargs) | Maximum price |
508
+ | sort_by | sort_by=1/2 (kwargs) | Sort order |
509
+ | free_shipping | free_shipping=True/False (kwargs) | Free shipping |
510
+ | on_sale | on_sale=True/False (kwargs) | On sale |
511
+ | small_business | small_business=True/False (kwargs) | Small business |
512
+ | direct_link | direct_link=True/False (kwargs) | Include direct links |
513
+
514
+ ### Google Local Parameter Mapping
515
+
516
+ Google Local handles location-based searches.
517
+ In the SDK, set `search_type="local"` to select Local mode (the value is passed through to `tbm` as `"local"`) and combine it with `location` and `uule`.
518
+
519
+ ```python
520
+ results = client.serp_search(
521
+ query="pizza near me",
522
+ engine=Engine.GOOGLE,
523
+ search_type="local",
524
+ google_domain="google.com",
525
+ country="us",
526
+ language="en",
527
+ location="San Francisco",
528
+ uule="w+CAIQICIFU2FuIEZyYW5jaXNjbw", # Example value
529
+ start=0, # Local only accepts 0, 20, 40...
530
+ )
531
+ local_results = results.get("local_results", results.get("organic", []))
532
+ ```
533
+
534
+ | Document Parameter | SDK Field/Usage | Description |
535
+ |-------------------|-----------------|-------------|
536
+ | q | query | Search term |
537
+ | google_domain | google_domain | Domain |
538
+ | gl | country | Country |
539
+ | hl | language | Language |
540
+ | location | location | Local location |
541
+ | uule | uule | Encoded location |
542
+ | start | start | Offset (must be 0,20,40...) |
543
+ | ludocid | ludocid | Place ID (commonly used in Local results) |
544
+ | tbs | time_filter or tbs="..." | Advanced filtering |
545
+
546
+ ### Google Videos Parameter Mapping
547
+
548
+ ```python
549
+ results = client.serp_search(
550
+ query="python async tutorial",
551
+ engine=Engine.GOOGLE,
552
+ search_type="videos", # tbm=vid
553
+ country="us",
554
+ language="en",
555
+ languages_filter="lang_en|lang_fr",
556
+ location="United States",
557
+ uule="ENCODED_LOCATION_HERE",
558
+ num=10,
559
+ time_filter="month",
560
+ safe_search=True,
561
+ filter_duplicates=True,
562
+ )
563
+ video_results = results.get("video_results", results.get("organic", []))
564
+ ```
565
+
566
+ | Document Parameter | SDK Field/Usage | Description |
567
+ |-------------------|-----------------|-------------|
568
+ | q | query | Search term |
569
+ | google_domain | google_domain | Domain |
570
+ | gl | country | Country |
571
+ | hl | language | Language |
572
+ | lr | languages_filter | Multi-language filter |
573
+ | location | location | Geographic location |
574
+ | uule | uule | Encoded location |
575
+ | start | start | Offset |
576
+ | num | num | Quantity |
577
+ | tbs | time_filter or tbs="..." | Time and advanced filtering |
578
+ | safe | safe_search | Adult content filter |
579
+ | nfpr | no_autocorrect | Disable auto-correction |
580
+ | filter | filter_duplicates | Remove duplicates |
581
+
582
+ ### Google News Parameter Mapping
583
+
584
+ Google News has dedicated token parameters for selecting specific topics, publications, sections, and stories.
585
+
586
+ ```python
587
+ results = client.serp_search(
588
+ query="AI regulation",
589
+ engine="google_news",
590
+ country="us",
591
+ language="en",
592
+ topic_token="YOUR_TOPIC_TOKEN",
593
+ publication_token="YOUR_PUBLICATION_TOKEN",
594
+ section_token="YOUR_SECTION_TOKEN",
595
+ story_token="YOUR_STORY_TOKEN",
596
+ so=1, # 0=relevance, 1=date
597
+ )
598
+ news_results = results.get("news_results", results.get("organic", []))
599
+ ```
600
+
601
+ | Document Parameter | SDK Field/Usage | Description |
602
+ |-------------------|-----------------|-------------|
603
+ | q | query | Search term |
604
+ | gl | country | Country |
605
+ | hl | language | Language |
606
+ | topic_token | topic_token="..." (kwargs) | Topic token |
607
+ | publication_token | publication_token="..." (kwargs) | Media token |
608
+ | section_token | section_token="..." (kwargs) | Section token |
609
+ | story_token | story_token="..." (kwargs) | Story token |
610
+ | so | so=0/1 (kwargs) | Sort: 0=relevance, 1=time |
611
+
612
+ ---
613
+
614
+ 👉 For more SERP modes and parameter mappings, see docs/serp_reference.md.
615
+
616
+ ## 🔓 Web Unlocker (Universal Scraping API)
617
+
618
+ Automatically bypass anti-bot protections:
619
+
620
+ #### Basic Usage
621
+
622
+ ```python
623
+ from thordata import ThordataClient
624
+
625
+ client = ThordataClient(scraper_token="your_token")
626
+
627
+ # Get HTML content
628
+ html = client.universal_scrape(
629
+ url="https://example.com",
630
+ js_render=True, # Enable JavaScript rendering
631
+ )
632
+ print(html[:500])
633
+ ```
634
+
635
+ #### Advanced Options
636
+
637
+ ```python
638
+ from thordata import ThordataClient, UniversalScrapeRequest
639
+
640
+ client = ThordataClient(scraper_token="your_token")
641
+
642
+ request = UniversalScrapeRequest(
643
+ url="https://example.com",
644
+ js_render=True,
645
+ output_format="html",
646
+ country="us",
647
+ block_resources="image,font", # Speed up by blocking resources
648
+ clean_content="js,css", # Remove JS/CSS from output
649
+ wait=5000, # Wait 5 seconds after load
650
+ wait_for=".content-loaded", # Wait for CSS selector
651
+ headers=[
652
+ {"name": "Accept-Language", "value": "en-US"}
653
+ ],
654
+ cookies=[
655
+ {"name": "session", "value": "abc123"}
656
+ ],
657
+ )
658
+
659
+ html = client.universal_scrape_advanced(request)
660
+ ```
661
+
662
+ #### Take Screenshots
663
+
664
+ ```python
665
+ from thordata import ThordataClient
666
+
667
+ client = ThordataClient(scraper_token="your_token")
668
+
669
+ # Get PNG screenshot
670
+ png_bytes = client.universal_scrape(
671
+ url="https://example.com",
672
+ js_render=True,
673
+ output_format="png",
674
+ )
675
+
676
+ # Save to file
677
+ with open("screenshot.png", "wb") as f:
678
+     f.write(png_bytes)
679
+ ```
680
+
681
+ ### Web Scraper API (Async Tasks)
682
+
683
+ For complex scraping jobs that run asynchronously:
684
+
685
+ ```python
686
+ from thordata import ThordataClient
687
+
688
+ client = ThordataClient(
689
+ scraper_token="your_token",
690
+ public_token="your_public_token",
691
+ public_key="your_public_key",
692
+ )
693
+
694
+ # Create a scraping task
695
+ task_id = client.create_scraper_task(
696
+ file_name="youtube_channel_data",
697
+ spider_id="youtube_video-post_by-url", # From Dashboard
698
+ spider_name="youtube.com",
699
+ parameters={
700
+ "url": "https://www.youtube.com/@PewDiePie/videos",
701
+ "num_of_posts": "50"
702
+ }
703
+ )
704
+ print(f"Task created: {task_id}")
705
+
706
+ # Wait for completion (with timeout)
707
+ status = client.wait_for_task(task_id, max_wait=300)
708
+ print(f"Task status: {status}")
709
+
710
+ # Get results
711
+ if status in ("ready", "success"):
712
+     download_url = client.get_task_result(task_id)
713
+     print(f"Download: {download_url}")
714
+ ```
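
Waiting for a task with a timeout is, conceptually, a polling loop with a deadline. The sketch below shows that general pattern and makes no claim about how `wait_for_task` is implemented. `wait_until_done` and its parameters are hypothetical, and treating `"failed"` as a terminal state is an assumption (the example above only shows `"ready"` and `"success"`).

```python
import time
from typing import Callable, Tuple


def wait_until_done(
    get_status: Callable[[], str],
    max_wait: float = 300.0,
    poll_interval: float = 5.0,
    done_states: Tuple[str, ...] = ("ready", "success", "failed"),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until it reports a terminal state or `max_wait` elapses."""
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status.lower() in done_states:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"task still {status!r} after {max_wait} seconds")
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while the loop is running.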
715
+
716
+ ### Async Client (High Concurrency)
717
+
718
+ For maximum performance with concurrent requests:
719
+
720
+ ```python
721
+ import asyncio
722
+ from thordata import AsyncThordataClient
723
+
724
+ async def main():
725
+     async with AsyncThordataClient(
726
+         scraper_token="your_token",
727
+         public_token="your_public_token",
728
+         public_key="your_public_key",
729
+     ) as client:
730
+
731
+         # Concurrent proxy requests
732
+         urls = [
733
+             "https://httpbin.org/ip",
734
+             "https://httpbin.org/headers",
735
+             "https://httpbin.org/user-agent",
736
+         ]
737
+
738
+         tasks = [client.get(url) for url in urls]
739
+         responses = await asyncio.gather(*tasks)
740
+
741
+         for resp in responses:
742
+             print(await resp.json())
743
+
744
+ asyncio.run(main())
745
+ ```
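
With many URLs, an unbounded `asyncio.gather` starts every request at once. One common way to cap the number of in-flight requests is a semaphore. The following is a generic sketch using only the standard library; `gather_bounded` is a hypothetical helper, not an SDK function, and `worker` stands in for any coroutine function such as `client.get`.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 10,
) -> List[R]:
    """Run `worker` over `items` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        # Each call waits for a free slot before starting.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its results.
    return await asyncio.gather(*(run_one(item) for item in items))
```

Inside the `async with` block above, this could be called as `await gather_bounded(urls, client.get, limit=20)`, assuming `client.get` accepts a URL as its only required argument.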
746
+
747
+ #### Async SERP Search
748
+
749
+ ```python
750
+ import asyncio
751
+ from thordata import AsyncThordataClient, Engine
752
+
753
+ async def search_multiple():
754
+     async with AsyncThordataClient(scraper_token="your_token") as client:
755
+         queries = ["python", "javascript", "rust", "go"]
756
+
757
+         tasks = [
758
+             client.serp_search(q, engine=Engine.GOOGLE)
759
+             for q in queries
760
+         ]
761
+
762
+         results = await asyncio.gather(*tasks)
763
+
764
+         for query, result in zip(queries, results):
765
+             count = len(result.get("organic", []))
766
+             print(f"{query}: {count} results")
767
+
768
+ asyncio.run(search_multiple())
769
+ ```
770
+
771
+ ### Location APIs
772
+
773
+ Discover available geo-targeting options:
774
+
775
+ ```python
776
+ from thordata import ThordataClient, ProxyType
777
+
778
+ client = ThordataClient(
779
+ scraper_token="your_token",
780
+ public_token="your_public_token",
781
+ public_key="your_public_key",
782
+ )
783
+
784
+ # List all supported countries
785
+ countries = client.list_countries(proxy_type=ProxyType.RESIDENTIAL)
786
+ print(f"Supported countries: {len(countries)}")
787
+
788
+ # List states for a country
789
+ states = client.list_states("US")
790
+ for state in states[:5]:
791
+     print(f" {state['state_code']}: {state['state_name']}")
792
+
793
+ # List cities
794
+ cities = client.list_cities("US", state_code="california")
795
+ print(f"Cities in California: {len(cities)}")
796
+
797
+ # List ASNs (for ISP targeting)
798
+ asns = client.list_asn("US")
799
+ for asn in asns[:5]:
800
+     print(f" {asn['asn_code']}: {asn['asn_name']}")
801
+ ```
802
+
803
+ ### Error Handling
804
+
805
+ ```python
806
+ from thordata import (
807
+ ThordataClient,
808
+ ThordataError,
809
+ ThordataAuthError,
810
+ ThordataRateLimitError,
811
+ ThordataNetworkError,
812
+ ThordataTimeoutError,
813
+ )
814
+
815
+ client = ThordataClient(scraper_token="your_token")
816
+
817
+ try:
818
+     result = client.serp_search("test query")
819
+ except ThordataAuthError as e:
820
+     print(f"Authentication failed: {e}")
821
+     print(f"Check your token. Status code: {e.status_code}")
822
+ except ThordataRateLimitError as e:
823
+     print(f"Rate limited: {e}")
824
+     if e.retry_after:
825
+         print(f"Retry after {e.retry_after} seconds")
826
+ except ThordataTimeoutError as e:
827
+     print(f"Request timed out: {e}")
828
+ except ThordataNetworkError as e:
829
+     print(f"Network error: {e}")
830
+ except ThordataError as e:
831
+     print(f"General error: {e}")
832
+ ```
833
+
834
+ ### Retry Configuration
835
+
836
+ Customize automatic retry behavior:
837
+
838
+ ```python
839
+ from thordata import ThordataClient, RetryConfig
840
+
841
+ # Custom retry configuration
842
+ retry_config = RetryConfig(
843
+ max_retries=5, # Maximum retry attempts
844
+ backoff_factor=2.0, # Exponential backoff multiplier
845
+ max_backoff=120.0, # Maximum wait between retries
846
+ jitter=True, # Add randomness to prevent thundering herd
847
+ )
848
+
849
+ client = ThordataClient(
850
+ scraper_token="your_token",
851
+ retry_config=retry_config,
852
+ )
853
+
854
+ # Requests will automatically retry on transient failures
855
+ response = client.get("https://example.com")
856
+ ```
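
To give a feel for what these settings control, here is a sketch of a typical exponential backoff calculation with a cap and optional jitter. The exact formula used by `RetryConfig` is not documented here, so `backoff_factor * 2 ** attempt` and the "full jitter" strategy are assumptions, and `backoff_delay` is a hypothetical function.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    backoff_factor: float = 2.0,
    max_backoff: float = 120.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential growth, capped at max_backoff.
    delay = min(backoff_factor * (2 ** attempt), max_backoff)
    if jitter:
        # Full jitter: pick a random delay between 0 and the capped value so
        # that many clients do not retry at the same instant.
        source = rng if rng is not None else random
        delay = source.uniform(0, delay)
    return delay
```

Under these assumptions and without jitter, the first retries would wait 2, 4, 8, and 16 seconds, and no wait would exceed `max_backoff`.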
857
+
858
+ ---
859
+
860
+ ## 🔧 Configuration Reference
861
+
862
+ ### ThordataClient Parameters
863
+
864
+ | Parameter | Type | Default | Description |
865
+ |-----------|------|---------|-------------|
866
+ | scraper_token | str | required | API token from Dashboard |
867
+ | public_token | str | None | Public API token (for tasks/locations) |
868
+ | public_key | str | None | Public API key |
869
+ | proxy_host | str | "pr.thordata.net" | Proxy gateway host |
870
+ | proxy_port | int | 9999 | Proxy gateway port |
871
+ | timeout | int | 30 | Default request timeout (seconds) |
872
+ | retry_config | RetryConfig | None | Retry configuration |
873
+
874
+ ### ProxyConfig Parameters
875
+
876
+ | Parameter | Type | Default | Description |
877
+ |-----------|------|---------|-------------|
878
+ | username | str | required | Proxy username |
879
+ | password | str | required | Proxy password |
880
+ | product | ProxyProduct | RESIDENTIAL | Proxy type |
881
+ | country | str | None | ISO 3166-1 alpha-2 code |
882
+ | state | str | None | State name (lowercase) |
883
+ | city | str | None | City name (lowercase) |
884
+ | continent | str | None | Continent code (af/an/as/eu/na/oc/sa) |
885
+ | asn | str | None | ASN code (requires country) |
886
+ | session_id | str | None | Session ID for sticky sessions |
887
+ | session_duration | int | None | Session duration (1-90 minutes) |
888
+
889
+ ### Proxy Products & Ports
890
+
891
+ | Product | Port | Description |
892
+ |---------|------|-------------|
893
+ | RESIDENTIAL | 9999 | Rotating residential IPs |
894
+ | MOBILE | 5555 | Mobile carrier IPs |
895
+ | DATACENTER | 7777 | Datacenter IPs |
896
+ | ISP | 6666 | Static ISP IPs |
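
As an illustration of how the port table relates to a `requests`-style `proxies` dictionary, the sketch below builds a proxy URL from a product name. It is not the SDK's `to_proxies_dict()`: `build_proxies` is hypothetical, the default host comes from the `proxy_host` default above, and a real username may also encode geo-targeting options, which this sketch ignores. For static ISP proxies, the host would be the purchased IP address rather than the gateway.

```python
from typing import Dict
from urllib.parse import quote

# Ports from the "Proxy Products & Ports" table.
PRODUCT_PORTS = {"RESIDENTIAL": 9999, "MOBILE": 5555, "DATACENTER": 7777, "ISP": 6666}


def build_proxies(
    username: str,
    password: str,
    product: str = "RESIDENTIAL",
    host: str = "pr.thordata.net",
) -> Dict[str, str]:
    """Build a requests-compatible proxies dict for the given product."""
    port = PRODUCT_PORTS[product.upper()]
    # Percent-encode credentials so characters such as "@" or ":" cannot
    # break the URL structure.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    url = f"http://{user}:{pwd}@{host}:{port}"
    return {"http": url, "https": url}
```

The result can be passed as `requests.get(url, proxies=build_proxies(...))`.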
897
+
898
+ ---
899
+
900
+ ## 📁 Project Structure
901
+
902
+ ```
903
+ thordata-python-sdk/
904
+ ├── src/thordata/
905
+ │   ├── __init__.py # Public API exports
906
+ │   ├── client.py # Sync client
907
+ │   ├── async_client.py # Async client
908
+ │   ├── models.py # Data models (ProxyConfig, SerpRequest, etc.)
909
+ │   ├── enums.py # Enumerations
910
+ │   ├── exceptions.py # Exception hierarchy
911
+ │   ├── retry.py # Retry mechanism
912
+ │   ├── parameters.py # Parameter definitions
913
+ │   ├── demo.py # Demo functionality
914
+ │   └── _utils.py # Internal utilities
915
+ ├── tests/
916
+ │   ├── __init__.py # Test initialization
917
+ │   ├── conftest.py # Pytest configuration
918
+ │   ├── test_client.py # Client tests
919
+ │   ├── test_async_client.py # Async client tests
920
+ │   ├── test_client_errors.py # Client error tests
921
+ │   ├── test_async_client_errors.py # Async client error tests
922
+ │   ├── test_enums.py # Enums tests
923
+ │   ├── test_models.py # Models tests
924
+ │   ├── test_exceptions.py # Exceptions tests
925
+ │   ├── test_demo_entrypoint.py # Demo entrypoint tests
926
+ │   ├── test_task_status_and_wait.py # Task status tests
927
+ │   ├── test_user_agent.py # User agent tests
928
+ │   ├── test_examples_demo_serp_api.py # SERP API examples tests
929
+ │   ├── test_examples_demo_universal.py # Universal API examples tests
930
+ │   ├── test_examples_demo_web_scraper_api.py # Web scraper examples tests
931
+ │   └── test_examples_async_high_concurrency.py # Async high concurrency tests
932
+ ├── examples/
933
+ │   ├── demo_serp_api.py # SERP API demo
934
+ │   ├── demo_serp_google_news.py # Google News demo
935
+ │   ├── demo_universal.py # Universal API demo
936
+ │   ├── demo_web_scraper_api.py # Web scraper demo
937
+ │   ├── demo_scraping_browser.py # Scraping browser demo
938
+ │   ├── async_high_concurrency.py # Async high concurrency demo
939
+ │   ├── proxy_residential.py # Residential proxy example
940
+ │   ├── proxy_datacenter.py # Datacenter proxy example
941
+ │   ├── proxy_mobile.py # Mobile proxy example
942
+ │   ├── proxy_isp.py # Static ISP proxy example
943
+ │   └── .env.example # Environment variables template
944
+ ├── docs/
945
+ │   ├── serp_reference.md # SERP API reference
946
+ │   ├── serp_reference_legacy.md # Legacy SERP reference
947
+ │   └── universal_reference.md # Universal API reference
948
+ ├── .github/
949
+ │   ├── dependabot.yml # Dependabot configuration
950
+ │   ├── pull_request_template.md # PR template
951
+ │   ├── ISSUE_TEMPLATE/
952
+ │   │   ├── bug_report.md # Bug report template
953
+ │   │   └── feature_request.md # Feature request template
954
+ │   └── workflows/
955
+ │       ├── ci.yml # Continuous integration
956
+ │       └── pypi-publish.yml # PyPI publishing workflow
957
+ ├── .env.example # Environment variables template
958
+ ├── .prettierrc # Prettier configuration
959
+ ├── .prettierignore # Prettier ignore patterns
960
+ ├── eslint.config.cjs # ESLint configuration
961
+ ├── LICENSE # License file
962
+ ├── package.json # Package configuration
963
+ ├── py.typed # Type hints marker
964
+ ├── pyproject.toml # Python package configuration
965
+ ├── pytest.ini # Pytest configuration
966
+ ├── requirements.txt # Python dependencies
967
+ ├── tsconfig.json # TypeScript configuration
968
+ ├── tsconfig.build.json # TypeScript build configuration
969
+ ├── vitest.config.ts # Vitest testing configuration
970
+ └── README.md # This file
971
+ ```
972
+
973
+ ---
974
+
975
+ ## 🧪 Development
976
+
977
+ ### Setup
978
+
979
+ ```bash
980
+ # Clone the repository
981
+ git clone https://github.com/Thordata/thordata-python-sdk.git
982
+ cd thordata-python-sdk
983
+
984
+ # Create virtual environment
985
+ python -m venv venv
986
+ source venv/bin/activate # On Windows: venv\Scripts\activate
987
+
988
+ # Install with dev dependencies
989
+ pip install -e ".[dev]"
990
+ ```
991
+
992
+ ### Run Tests
993
+
994
+ ```bash
995
+ # Run all tests
996
+ pytest
997
+
998
+ # Run with coverage
999
+ pytest --cov=thordata --cov-report=html
1000
+
1001
+ # Run specific test file
1002
+ pytest tests/test_client.py -v
1003
+ ```
1004
+
1005
+ ### Code Quality
1006
+
1007
+ ```bash
1008
+ # Format code
1009
+ black src tests
1010
+
1011
+ # Lint
1012
+ ruff check src tests
1013
+
1014
+ # Type check
1015
+ mypy src
1016
+ ```
1017
+
1018
+ ---
1019
+
1020
+ ## 📝 Changelog
1021
+
1022
+ See [CHANGELOG.md](CHANGELOG.md) for version history.
1023
+
1024
+ ---
1025
+
1026
+ ## 🤝 Contributing
1027
+
1028
+ Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
1029
+
1030
+ 1. Fork the repository
1031
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
1032
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
1033
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
1034
+ 5. Open a Pull Request
1035
+
1036
+ ---
1037
+
1038
+ ## 📄 License
1039
+
1040
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
1041
+
1042
+ ---
1043
+
1044
+ ## 🆘 Support
1045
+
1046
+ - 📧 **Email**: support@thordata.com
1047
+ - 📚 **Documentation**: [doc.thordata.com](https://doc.thordata.com)
1048
+ - 🐛 **Issues**: [GitHub Issues](https://github.com/Thordata/thordata-python-sdk/issues)
1049
+ - 💬 **Dashboard**: [thordata.com](https://www.thordata.com)
1050
+
1051
+ <div align="center">
1052
+ <sub>Built with ❤️ by the Thordata Team</sub>
1053
+ </div>