thordata-sdk 1.1.0__tar.gz → 1.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. thordata_sdk-1.3.0/PKG-INFO +208 -0
  2. thordata_sdk-1.3.0/README.md +164 -0
  3. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/pyproject.toml +5 -2
  4. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/__init__.py +1 -1
  5. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/async_client.py +88 -11
  6. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/client.py +792 -48
  7. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/models.py +35 -19
  8. thordata_sdk-1.3.0/src/thordata/serp_engines.py +166 -0
  9. thordata_sdk-1.3.0/src/thordata_sdk.egg-info/PKG-INFO +208 -0
  10. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata_sdk.egg-info/SOURCES.txt +2 -0
  11. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata_sdk.egg-info/requires.txt +1 -0
  12. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_async_client.py +14 -0
  13. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_async_client_errors.py +2 -6
  14. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_client.py +9 -2
  15. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_client_errors.py +5 -7
  16. thordata_sdk-1.3.0/tests/test_integration_proxy_protocols.py +113 -0
  17. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_task_status_and_wait.py +1 -5
  18. thordata_sdk-1.1.0/PKG-INFO +0 -271
  19. thordata_sdk-1.1.0/README.md +0 -228
  20. thordata_sdk-1.1.0/src/thordata_sdk.egg-info/PKG-INFO +0 -271
  21. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/LICENSE +0 -0
  22. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/setup.cfg +0 -0
  23. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/_example_utils.py +0 -0
  24. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/_utils.py +0 -0
  25. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/demo.py +0 -0
  26. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/enums.py +0 -0
  27. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/exceptions.py +0 -0
  28. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata/retry.py +0 -0
  29. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata_sdk.egg-info/dependency_links.txt +0 -0
  30. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/src/thordata_sdk.egg-info/top_level.txt +0 -0
  31. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_enums.py +0 -0
  32. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_examples.py +0 -0
  33. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_exceptions.py +0 -0
  34. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_models.py +0 -0
  35. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_spec_parity.py +0 -0
  36. {thordata_sdk-1.1.0 → thordata_sdk-1.3.0}/tests/test_user_agent.py +0 -0
@@ -0,0 +1,208 @@
+ Metadata-Version: 2.4
+ Name: thordata-sdk
+ Version: 1.3.0
+ Summary: The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network.
+ Author-email: Thordata Developer Team <support@thordata.com>
+ License: MIT
+ Project-URL: Homepage, https://www.thordata.com
+ Project-URL: Documentation, https://github.com/Thordata/thordata-python-sdk#readme
+ Project-URL: Source, https://github.com/Thordata/thordata-python-sdk
+ Project-URL: Tracker, https://github.com/Thordata/thordata-python-sdk/issues
+ Project-URL: Changelog, https://github.com/Thordata/thordata-python-sdk/blob/main/CHANGELOG.md
+ Keywords: web scraping,proxy,residential proxy,datacenter proxy,ai,llm,data-mining,serp,thordata,web scraper,anti-bot bypass
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: Internet :: WWW/HTTP
+ Classifier: Topic :: Internet :: Proxy Servers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Typing :: Typed
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: requests>=2.25.0
+ Requires-Dist: aiohttp>=3.9.0
+ Requires-Dist: PySocks>=1.7.1
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: pytest-httpserver>=1.0.0; extra == "dev"
+ Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
+ Requires-Dist: black>=23.0.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+ Requires-Dist: types-requests>=2.28.0; extra == "dev"
+ Requires-Dist: aioresponses>=0.7.6; extra == "dev"
+ Dynamic: license-file
+
+ # Thordata Python SDK
+
+ <div align="center">
+
+ <img src="https://img.shields.io/badge/Thordata-AI%20Infrastructure-blue?style=for-the-badge" alt="Thordata Logo">
+
+ **The Official Python Client for Thordata APIs**
+
+ *Proxy Network • SERP API • Web Unlocker • Web Scraper API*
+
+ [![PyPI version](https://img.shields.io/pypi/v/thordata-sdk.svg?style=flat-square)](https://pypi.org/project/thordata-sdk/)
+ [![Python Versions](https://img.shields.io/pypi/pyversions/thordata-sdk.svg?style=flat-square)](https://pypi.org/project/thordata-sdk/)
+ [![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)](LICENSE)
+ [![CI Status](https://img.shields.io/github/actions/workflow/status/Thordata/thordata-python-sdk/ci.yml?branch=main&style=flat-square)](https://github.com/Thordata/thordata-python-sdk/actions)
+
+ </div>
+
+ ---
+
+ ## 📖 Introduction
+
+ This SDK provides a robust, high-performance interface to Thordata's AI data infrastructure. It is designed for high-concurrency scraping, reliable proxy tunneling, and seamless data extraction.
+
+ **Key Features:**
+ * **🚀 Production Ready:** Built on `urllib3` connection pooling for low-latency proxy requests.
+ * **⚡ Async Support:** Native `aiohttp` client for high-concurrency SERP/Universal scraping.
+ * **🛡️ Robust:** Handles TLS-in-TLS tunneling, retries, and error parsing automatically.
+ * **✨ Developer Experience:** Fully typed (`mypy` compatible) with intuitive IDE autocomplete.
+ * **🧩 Lazy Validation:** Only validate credentials for the features you actually use.
+
+ ---
+
+ ## 📦 Installation
+
+ ```bash
+ pip install thordata-sdk
+ ```
+
+ ---
+
+ ## 🔐 Configuration
+
+ Set environment variables to avoid hardcoding credentials. You only need to set the variables for the features you use.
+
+ ```bash
+ # [Required for SERP & Web Unlocker]
+ export THORDATA_SCRAPER_TOKEN="your_token_here"
+
+ # [Required for Proxy Network]
+ export THORDATA_RESIDENTIAL_USERNAME="your_username"
+ export THORDATA_RESIDENTIAL_PASSWORD="your_password"
+ export THORDATA_PROXY_HOST="vpnXXXX.pr.thordata.net"
+
+ # [Required for Task Management]
+ export THORDATA_PUBLIC_TOKEN="public_token"
+ export THORDATA_PUBLIC_KEY="public_key"
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### 1. SERP Search (Google/Bing/Yandex)
+
+ ```python
+ from thordata import ThordataClient, Engine
+
+ client = ThordataClient()  # Loads THORDATA_SCRAPER_TOKEN from env
+
+ # Simple Search
+ print("Searching...")
+ results = client.serp_search("latest AI trends", engine=Engine.GOOGLE_NEWS)
+
+ for news in results.get("news_results", [])[:3]:
+     print(f"- {news['title']} ({news['source']})")
+ ```
+
+ ### 2. Universal Scrape (Web Unlocker)
+
+ Bypass Cloudflare/Akamai and render JavaScript automatically.
+
+ ```python
+ html = client.universal_scrape(
+     url="https://example.com/protected-page",
+     js_render=True,
+     wait_for=".content-loaded",
+     country="us"
+ )
+ print(f"Scraped {len(html)} bytes")
+ ```
+
+ ### 3. High-Performance Proxy
+
+ Use Thordata's residential IPs with automatic connection pooling.
+
+ ```python
+ from thordata import ProxyConfig, ProxyProduct
+
+ # Config is optional if env vars are set, but allows granular control
+ proxy = ProxyConfig(
+     product=ProxyProduct.RESIDENTIAL,
+     country="jp",
+     city="tokyo",
+     session_id="session-001",
+     session_duration=10  # Sticky IP for 10 mins
+ )
+
+ # Use the client to make requests (Reuses TCP connections)
+ response = client.get("https://httpbin.org/ip", proxy_config=proxy)
+ print(response.json())
+ ```
+
+ ---
+
+ ## ⚙️ Advanced Usage
+
+ ### Async Client (High Concurrency)
+
+ For building AI agents or high-throughput spiders.
+
+ ```python
+ import asyncio
+ from thordata import AsyncThordataClient
+
+ async def main():
+     async with AsyncThordataClient() as client:
+         # Fire off multiple requests in parallel
+         tasks = [
+             client.serp_search(f"query {i}")
+             for i in range(5)
+         ]
+         results = await asyncio.gather(*tasks)
+         print(f"Completed {len(results)} searches")
+
+ asyncio.run(main())
+ ```
+
+ ### Web Scraper API (Task Management)
+
+ Create and manage large-scale scraping tasks asynchronously.
+
+ ```python
+ # 1. Create a task
+ task_id = client.create_scraper_task(
+     file_name="daily_scrape",
+     spider_id="universal",
+     spider_name="universal",
+     parameters={"url": "https://example.com"}
+ )
+
+ # 2. Wait for completion (Polling)
+ status = client.wait_for_task(task_id)
+
+ # 3. Get results
+ if status == "ready":
+     url = client.get_task_result(task_id)
+     print(f"Download Data: {url}")
+ ```
+
+ ---
+
+ ## 📄 License
+
+ MIT License. See [LICENSE](LICENSE) for details.
@@ -0,0 +1,164 @@
+ # Thordata Python SDK
+
+ <div align="center">
+
+ <img src="https://img.shields.io/badge/Thordata-AI%20Infrastructure-blue?style=for-the-badge" alt="Thordata Logo">
+
+ **The Official Python Client for Thordata APIs**
+
+ *Proxy Network • SERP API • Web Unlocker • Web Scraper API*
+
+ [![PyPI version](https://img.shields.io/pypi/v/thordata-sdk.svg?style=flat-square)](https://pypi.org/project/thordata-sdk/)
+ [![Python Versions](https://img.shields.io/pypi/pyversions/thordata-sdk.svg?style=flat-square)](https://pypi.org/project/thordata-sdk/)
+ [![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)](LICENSE)
+ [![CI Status](https://img.shields.io/github/actions/workflow/status/Thordata/thordata-python-sdk/ci.yml?branch=main&style=flat-square)](https://github.com/Thordata/thordata-python-sdk/actions)
+
+ </div>
+
+ ---
+
+ ## 📖 Introduction
+
+ This SDK provides a robust, high-performance interface to Thordata's AI data infrastructure. It is designed for high-concurrency scraping, reliable proxy tunneling, and seamless data extraction.
+
+ **Key Features:**
+ * **🚀 Production Ready:** Built on `urllib3` connection pooling for low-latency proxy requests.
+ * **⚡ Async Support:** Native `aiohttp` client for high-concurrency SERP/Universal scraping.
+ * **🛡️ Robust:** Handles TLS-in-TLS tunneling, retries, and error parsing automatically.
+ * **✨ Developer Experience:** Fully typed (`mypy` compatible) with intuitive IDE autocomplete.
+ * **🧩 Lazy Validation:** Only validate credentials for the features you actually use.
+
+ ---
+
+ ## 📦 Installation
+
+ ```bash
+ pip install thordata-sdk
+ ```
+
+ ---
+
+ ## 🔐 Configuration
+
+ Set environment variables to avoid hardcoding credentials. You only need to set the variables for the features you use.
+
+ ```bash
+ # [Required for SERP & Web Unlocker]
+ export THORDATA_SCRAPER_TOKEN="your_token_here"
+
+ # [Required for Proxy Network]
+ export THORDATA_RESIDENTIAL_USERNAME="your_username"
+ export THORDATA_RESIDENTIAL_PASSWORD="your_password"
+ export THORDATA_PROXY_HOST="vpnXXXX.pr.thordata.net"
+
+ # [Required for Task Management]
+ export THORDATA_PUBLIC_TOKEN="public_token"
+ export THORDATA_PUBLIC_KEY="public_key"
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### 1. SERP Search (Google/Bing/Yandex)
+
+ ```python
+ from thordata import ThordataClient, Engine
+
+ client = ThordataClient()  # Loads THORDATA_SCRAPER_TOKEN from env
+
+ # Simple Search
+ print("Searching...")
+ results = client.serp_search("latest AI trends", engine=Engine.GOOGLE_NEWS)
+
+ for news in results.get("news_results", [])[:3]:
+     print(f"- {news['title']} ({news['source']})")
+ ```
+
+ ### 2. Universal Scrape (Web Unlocker)
+
+ Bypass Cloudflare/Akamai and render JavaScript automatically.
+
+ ```python
+ html = client.universal_scrape(
+     url="https://example.com/protected-page",
+     js_render=True,
+     wait_for=".content-loaded",
+     country="us"
+ )
+ print(f"Scraped {len(html)} bytes")
+ ```
+
+ ### 3. High-Performance Proxy
+
+ Use Thordata's residential IPs with automatic connection pooling.
+
+ ```python
+ from thordata import ProxyConfig, ProxyProduct
+
+ # Config is optional if env vars are set, but allows granular control
+ proxy = ProxyConfig(
+     product=ProxyProduct.RESIDENTIAL,
+     country="jp",
+     city="tokyo",
+     session_id="session-001",
+     session_duration=10  # Sticky IP for 10 mins
+ )
+
+ # Use the client to make requests (Reuses TCP connections)
+ response = client.get("https://httpbin.org/ip", proxy_config=proxy)
+ print(response.json())
+ ```
+
+ ---
+
+ ## ⚙️ Advanced Usage
+
+ ### Async Client (High Concurrency)
+
+ For building AI agents or high-throughput spiders.
+
+ ```python
+ import asyncio
+ from thordata import AsyncThordataClient
+
+ async def main():
+     async with AsyncThordataClient() as client:
+         # Fire off multiple requests in parallel
+         tasks = [
+             client.serp_search(f"query {i}")
+             for i in range(5)
+         ]
+         results = await asyncio.gather(*tasks)
+         print(f"Completed {len(results)} searches")
+
+ asyncio.run(main())
+ ```
+
+ ### Web Scraper API (Task Management)
+
+ Create and manage large-scale scraping tasks asynchronously.
+
+ ```python
+ # 1. Create a task
+ task_id = client.create_scraper_task(
+     file_name="daily_scrape",
+     spider_id="universal",
+     spider_name="universal",
+     parameters={"url": "https://example.com"}
+ )
+
+ # 2. Wait for completion (Polling)
+ status = client.wait_for_task(task_id)
+
+ # 3. Get results
+ if status == "ready":
+     url = client.get_task_result(task_id)
+     print(f"Download Data: {url}")
+ ```
+
+ ---
+
+ ## 📄 License
+
+ MIT License. See [LICENSE](LICENSE) for details.
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "thordata-sdk"
- version = "1.1.0"
+ version = "1.3.0"
  description = "The Official Python SDK for Thordata - AI Data Infrastructure & Proxy Network."
  readme = "README.md"
  requires-python = ">=3.9"
@@ -45,6 +45,7 @@ classifiers = [
  dependencies = [
      "requests>=2.25.0",
      "aiohttp>=3.9.0",
+     "PySocks>=1.7.1",
  ]

  [project.optional-dependencies]
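
Note: `PySocks` becomes a hard dependency in 1.3.0; it is the library that gives `requests` its `socks5://` proxy scheme support, which the new `test_integration_proxy_protocols.py` suite exercises. A minimal sketch outside the SDK, with placeholder host, port, and credentials (the Thordata endpoint details are not shown in this diff):

```python
import requests

# Placeholder SOCKS5 proxy URL in the residential-proxy format; with PySocks
# installed, requests can tunnel both HTTP and HTTPS through it.
proxies = {
    "http": "socks5://user:pass@pr.thordata.net:1080",
    "https": "socks5://user:pass@pr.thordata.net:1080",
}
print(requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30).json())
```
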
@@ -83,6 +84,7 @@ include = '\.pyi?$'
  [tool.ruff]
  line-length = 88
  target-version = "py39"
+ extend-exclude = ["sdk-spec"]

  [tool.ruff.lint]
  select = [
@@ -123,6 +125,7 @@ ignore_missing_imports = true
  testpaths = ["tests"]
  asyncio_mode = "auto"
  addopts = "-v --cov=thordata --cov-report=term-missing"
+ markers = ["integration: live tests that require real credentials"]

  # Coverage setup
  [tool.coverage.run]
@@ -135,4 +138,4 @@ exclude_lines = [
      "def __repr__",
      "raise NotImplementedError",
      "if TYPE_CHECKING:",
- ]
+ ]
@@ -35,7 +35,7 @@ Async Usage:
  >>> asyncio.run(main())
  """

- __version__ = "1.1.0"
+ __version__ = "1.3.0"
  __author__ = "Thordata Developer Team"
  __email__ = "support@thordata.com"

@@ -60,6 +60,7 @@ from .models import (
      VideoTaskConfig,
  )
  from .retry import RetryConfig
+ from .serp_engines import AsyncSerpNamespace

  logger = logging.getLogger(__name__)

@@ -85,7 +86,10 @@ class AsyncThordataClient:
  ...     public_token="pub_token",
  ...     public_key="pub_key"
  ... ) as client:
+ ...     # Old style
  ...     results = await client.serp_search("python")
+ ...     # New style (Namespaced)
+ ...     maps_results = await client.serp.google.maps("coffee", "@40.7,-74.0,14z")
  """

  # API Endpoints (same as sync client)
@@ -96,7 +100,7 @@ class AsyncThordataClient:

  def __init__(
      self,
-     scraper_token: str,
+     scraper_token: str | None = None,  # Change: Optional
      public_token: str | None = None,
      public_key: str | None = None,
      proxy_host: str = "pr.thordata.net",
@@ -111,8 +115,6 @@ class AsyncThordataClient:
      locations_base_url: str | None = None,
  ) -> None:
      """Initialize the Async Thordata Client."""
-     if not scraper_token:
-         raise ThordataConfigError("scraper_token is required")

      self.scraper_token = scraper_token
      self.public_token = public_token
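
Note: these two hunks implement the "Lazy Validation" feature advertised in the README: the constructor no longer rejects a missing `scraper_token`, and (per the later hunks) each API method checks only for the credentials it actually needs. A sketch of the resulting behavior, assuming `ThordataConfigError` is importable from `thordata.exceptions` as the package layout suggests:

```python
import asyncio
from thordata import AsyncThordataClient
from thordata.exceptions import ThordataConfigError  # import path assumed

async def main():
    # No scraper_token: construction succeeds in 1.3.0 (it raised in 1.1.0).
    async with AsyncThordataClient(public_token="pub", public_key="key") as client:
        try:
            # SERP calls still require scraper_token, so this raises lazily.
            await client.serp_search("python")
        except ThordataConfigError as exc:
            print(f"Expected config error: {exc}")

asyncio.run(main())
```
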
@@ -195,7 +197,7 @@ class AsyncThordataClient:
  self._whitelist_url = f"{whitelist_base}/whitelisted-ips"

  proxy_api_base = os.getenv(
-     "THORDATA_PROXY_API_BASE_URL", "https://api.thordata.com/api"
+     "THORDATA_PROXY_API_BASE_URL", "https://openapi.thordata.com/api"
  )
  self._proxy_list_url = f"{proxy_api_base}/proxy/proxy-list"
  self._proxy_expiration_url = f"{proxy_api_base}/proxy/expiration-time"
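
Note: this hunk moves the default proxy-management API base from `api.thordata.com` to `openapi.thordata.com` while keeping `THORDATA_PROXY_API_BASE_URL` as an override. A small illustrative sketch of pinning a different base (the value must be set before the client is constructed, since it is read in `__init__`):

```python
import os

# Override the proxy-management API base; read once at client construction.
os.environ["THORDATA_PROXY_API_BASE_URL"] = "https://openapi.thordata.com/api"
```
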
@@ -203,6 +205,9 @@ class AsyncThordataClient:
  # Session initialized lazily
  self._session: aiohttp.ClientSession | None = None

+ # Namespaced Access (e.g. client.serp.google.maps(...))
+ self.serp = AsyncSerpNamespace(self)
+
  async def __aenter__(self) -> AsyncThordataClient:
      """Async context manager entry."""
      if self._session is None or self._session.closed:
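
Note: with `AsyncSerpNamespace` attached in the constructor, engine-specific helpers become attribute access, as previewed in the docstring hunk above. A usage sketch (the `google.maps` call is taken from that docstring; any other engines on the namespace live in the new `serp_engines.py` and are not shown in this diff):

```python
import asyncio
from thordata import AsyncThordataClient

async def main():
    async with AsyncThordataClient(scraper_token="your_token") as client:
        # Old style: generic entry point
        results = await client.serp_search("python")
        # New style: namespaced, engine-specific helper
        maps_results = await client.serp.google.maps("coffee", "@40.7,-74.0,14z")
        print(type(results), type(maps_results))

asyncio.run(main())
```
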
@@ -386,6 +391,9 @@ class AsyncThordataClient:
  Returns:
      Parsed JSON results or dict with 'html' key.
  """
+ if not self.scraper_token:
+     raise ThordataConfigError("scraper_token is required for SERP API")
+
  session = self._get_session()

  engine_str = engine.value if isinstance(engine, Engine) else engine.lower()
@@ -405,7 +413,8 @@ class AsyncThordataClient:
  )

  payload = request.to_payload()
- headers = build_auth_headers(self.scraper_token, mode=self._auth_mode)
+ token = self.scraper_token or ""
+ headers = build_auth_headers(token, mode=self._auth_mode)

  logger.info(f"Async SERP Search: {engine_str} - {query}")

@@ -451,6 +460,8 @@ class AsyncThordataClient:
  Execute an async SERP search using a SerpRequest object.
  """
  session = self._get_session()
+ if not self.scraper_token:
+     raise ThordataConfigError("scraper_token is required for SERP API")

  payload = request.to_payload()
  headers = build_auth_headers(self.scraper_token, mode=self._auth_mode)
@@ -545,6 +556,8 @@ class AsyncThordataClient:
  Async scrape using a UniversalScrapeRequest object.
  """
  session = self._get_session()
+ if not self.scraper_token:
+     raise ThordataConfigError("scraper_token is required for Universal API")

  payload = request.to_payload()
  headers = build_auth_headers(self.scraper_token, mode=self._auth_mode)
@@ -621,6 +634,8 @@ class AsyncThordataClient:
  """
  self._require_public_credentials()
  session = self._get_session()
+ if not self.scraper_token:
+     raise ThordataConfigError("scraper_token is required for Task Builder")

  payload = config.to_payload()
  # Builder needs 3 headers: token, key, Authorization Bearer
@@ -637,7 +652,7 @@ class AsyncThordataClient:
      self._builder_url, data=payload, headers=headers
  ) as response:
      response.raise_for_status()
-     data = await response.json()
+     data = await response.json(content_type=None)

      code = data.get("code")
      if code != 200:
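
Note: the `content_type=None` argument here (and in the three later hunks) disables aiohttp's MIME-type check. By default, `ClientResponse.json()` raises `aiohttp.ContentTypeError` when the server does not respond with `application/json`, which breaks on APIs that return JSON bodies labeled `text/html` or `text/plain`. A standalone illustration (the URL is a placeholder):

```python
import aiohttp

async def fetch_json(url: str) -> dict:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # Without content_type=None, a JSON body served as text/plain
            # would raise aiohttp.ContentTypeError here.
            return await response.json(content_type=None)
```
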
@@ -682,6 +697,10 @@ class AsyncThordataClient:

  self._require_public_credentials()
  session = self._get_session()
+ if not self.scraper_token:
+     raise ThordataConfigError(
+         "scraper_token is required for Video Task Builder"
+     )

  payload = config.to_payload()
  headers = build_builder_headers(
@@ -744,7 +763,7 @@ class AsyncThordataClient:
      self._status_url, data=payload, headers=headers
  ) as response:
      response.raise_for_status()
-     data = await response.json()
+     data = await response.json(content_type=None)

      if isinstance(data, dict):
          code = data.get("code")
@@ -807,7 +826,7 @@ class AsyncThordataClient:
  async with session.post(
      self._download_url, data=payload, headers=headers
  ) as response:
-     data = await response.json()
+     data = await response.json(content_type=None)
      code = data.get("code")

      if code == 200 and data.get("data"):
@@ -860,7 +879,7 @@ class AsyncThordataClient:
      timeout=self._api_timeout,
  ) as response:
      response.raise_for_status()
-     data = await response.json()
+     data = await response.json(content_type=None)

      code = data.get("code")
      if code != 200:
@@ -914,6 +933,64 @@ class AsyncThordataClient:

  raise TimeoutError(f"Task {task_id} did not complete within {max_wait} seconds")

+ async def run_task(
+     self,
+     file_name: str,
+     spider_id: str,
+     spider_name: str,
+     parameters: dict[str, Any],
+     universal_params: dict[str, Any] | None = None,
+     *,
+     max_wait: float = 600.0,
+     initial_poll_interval: float = 2.0,
+     max_poll_interval: float = 10.0,
+     include_errors: bool = True,
+ ) -> str:
+     """
+     Async high-level wrapper to run a Web Scraper task and wait for the result.
+
+     Lifecycle: Create -> Poll (Backoff) -> Get Download URL.
+
+     Returns:
+         str: The download URL.
+     """
+     # 1. Create Task
+     config = ScraperTaskConfig(
+         file_name=file_name,
+         spider_id=spider_id,
+         spider_name=spider_name,
+         parameters=parameters,
+         universal_params=universal_params,
+         include_errors=include_errors,
+     )
+     task_id = await self.create_scraper_task_advanced(config)
+     logger.info(f"Async Task created: {task_id}. Polling...")
+
+     # 2. Poll Status
+     import time
+
+     start_time = time.monotonic()
+     current_poll = initial_poll_interval
+
+     while (time.monotonic() - start_time) < max_wait:
+         status = await self.get_task_status(task_id)
+         status_lower = status.lower()
+
+         if status_lower in {"ready", "success", "finished"}:
+             logger.info(f"Task {task_id} ready.")
+             # 3. Get Result
+             return await self.get_task_result(task_id)
+
+         if status_lower in {"failed", "error", "cancelled"}:
+             raise ThordataNetworkError(
+                 f"Task {task_id} failed with status: {status}"
+             )
+
+         await asyncio.sleep(current_poll)
+         current_poll = min(current_poll * 1.5, max_poll_interval)
+
+     raise ThordataTimeoutError(f"Async Task {task_id} timed out after {max_wait}s")
+
  # =========================================================================
  # Proxy Account Management Methods
  # =========================================================================
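
Note: the new `run_task` method collapses the create/poll/download lifecycle shown in the README's Task Management example into a single awaitable call, with exponential poll backoff (multiplied by 1.5 per attempt, capped at `max_poll_interval`). A usage sketch based on the signature above (tokens and spider IDs are placeholders):

```python
import asyncio
from thordata import AsyncThordataClient

async def main():
    async with AsyncThordataClient(
        scraper_token="token", public_token="pub", public_key="key"
    ) as client:
        # Create -> poll with backoff -> return the download URL in one call.
        download_url = await client.run_task(
            file_name="daily_scrape",
            spider_id="universal",
            spider_name="universal",
            parameters={"url": "https://example.com"},
            max_wait=300.0,  # give up after 5 minutes
        )
        print(f"Download Data: {download_url}")

asyncio.run(main())
```
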
@@ -1527,8 +1604,8 @@ class AsyncThordataClient:
  self._require_public_credentials()

  params = {
-     "token": self.public_token,
-     "key": self.public_key,
+     "token": self.public_token or "",
+     "key": self.public_key or "",
  }

  for key, value in kwargs.items():