scrapling 0.3.4.tar.gz → 0.3.5.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {scrapling-0.3.4/scrapling.egg-info → scrapling-0.3.5}/PKG-INFO +10 -10
- {scrapling-0.3.4 → scrapling-0.3.5}/README.md +5 -5
- {scrapling-0.3.4 → scrapling-0.3.5}/pyproject.toml +4 -4
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/__init__.py +1 -1
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/cli.py +4 -4
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/custom_types.py +2 -2
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/shell.py +4 -4
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/_browsers/_base.py +2 -28
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/_browsers/_camoufox.py +39 -38
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/_browsers/_controllers.py +41 -50
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/_browsers/_page.py +1 -42
- scrapling-0.3.5/scrapling/engines/_browsers/_validators.py +229 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/static.py +2 -4
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/navigation.py +1 -1
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/parser.py +3 -3
- {scrapling-0.3.4 → scrapling-0.3.5/scrapling.egg-info}/PKG-INFO +10 -10
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling.egg-info/requires.txt +4 -4
- {scrapling-0.3.4 → scrapling-0.3.5}/setup.cfg +1 -1
- scrapling-0.3.4/scrapling/engines/_browsers/_validators.py +0 -164
- {scrapling-0.3.4 → scrapling-0.3.5}/LICENSE +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/MANIFEST.in +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/__init__.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/_html_utils.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/_types.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/ai.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/mixins.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/storage.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/translator.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/utils/__init__.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/utils/_shell.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/core/utils/_utils.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/__init__.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/_browsers/__init__.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/_browsers/_config_tools.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/constants.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/__init__.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/bypasses/navigator_plugins.js +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/bypasses/notification_permission.js +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/bypasses/playwright_fingerprint.js +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/bypasses/screen_props.js +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/bypasses/webdriver_fully.js +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/bypasses/window_chrome.js +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/convertor.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/custom.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/engines/toolbelt/fingerprints.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/fetchers.py +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling/py.typed +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling.egg-info/SOURCES.txt +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling.egg-info/dependency_links.txt +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling.egg-info/entry_points.txt +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling.egg-info/not-zip-safe +0 -0
- {scrapling-0.3.4 → scrapling-0.3.5}/scrapling.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: scrapling
-Version: 0.3.4
+Version: 0.3.5
 Summary: Scrapling is an undetectable, powerful, flexible, high-performance Python library that makes Web Scraping easy and effortless as it should be!
 Home-page: https://github.com/D4Vinci/Scrapling
 Author: Karim Shoair
@@ -69,15 +69,15 @@ Requires-Dist: cssselect>=1.3.0
 Requires-Dist: orjson>=3.11.3
 Requires-Dist: tldextract>=5.3.0
 Provides-Extra: fetchers
-Requires-Dist: click>=8.…
+Requires-Dist: click>=8.3.0; extra == "fetchers"
 Requires-Dist: curl_cffi>=0.13.0; extra == "fetchers"
-Requires-Dist: playwright>=1.…
-Requires-Dist: …
+Requires-Dist: playwright>=1.55.0; extra == "fetchers"
+Requires-Dist: patchright>=1.55.2; extra == "fetchers"
 Requires-Dist: camoufox>=0.4.11; extra == "fetchers"
 Requires-Dist: geoip2>=5.1.0; extra == "fetchers"
 Requires-Dist: msgspec>=0.19.0; extra == "fetchers"
 Provides-Extra: ai
-Requires-Dist: mcp>=1.14.…
+Requires-Dist: mcp>=1.14.1; extra == "ai"
 Requires-Dist: markdownify>=1.2.0; extra == "ai"
 Requires-Dist: scrapling[fetchers]; extra == "ai"
 Provides-Extra: shell
@@ -157,12 +157,13 @@ Built for the modern Web, Scrapling has its own rapid parsing engine and its fet
 
 <!-- sponsors -->
 
+<a href="https://www.thordata.com/?ls=github&lk=D4Vinci" target="_blank" title="A global network of over 60M+ residential proxies with 99.7% availability, ensuring stable and reliable web data scraping to support AI, BI, and workflows."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/thordata.jpg"></a>
 <a href="https://evomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling" target="_blank" title="Evomi is your Swiss Quality Proxy Provider, starting at $0.49/GB"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/evomi.png"></a>
-<a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/petrosky.png"></a>
 <a href="https://visit.decodo.com/Dy6W0b" target="_blank" title="Try the Most Efficient Residential Proxies for Free"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/decodo.png"></a>
+<a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/petrosky.png"></a>
 <a href="https://www.swiftproxy.net/" target="_blank" title="Unlock Reliable Proxy Services with Swiftproxy!"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/swiftproxy.png"></a>
-<a href="https://serpapi.com/?utm_source=scrapling" target="_blank" title="Scrape Google and other search engines with SerpApi"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png"></a>
 <a href="https://www.nstproxy.com/?type=flow&utm_source=scrapling" target="_blank" title="One Proxy Service, Infinite Solutions at Unbeatable Prices!"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/NSTproxy.png"></a>
+<a href="https://serpapi.com/?utm_source=scrapling" target="_blank" title="Scrape Google and other search engines with SerpApi"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png"></a>
 
 <!-- /sponsors -->
 
@@ -411,10 +412,9 @@ This project includes code adapted from:
 ## Thanks and References
 
 - [Daijro](https://github.com/daijro)'s brilliant work on [BrowserForge](https://github.com/daijro/browserforge) and [Camoufox](https://github.com/daijro/camoufox)
-- [Vinyzu](https://github.com/Vinyzu)'s work on [Botright](https://github.com/Vinyzu/Botright)
+- [Vinyzu](https://github.com/Vinyzu)'s brilliant work on [Botright](https://github.com/Vinyzu/Botright) and [PatchRight](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)
 - [brotector](https://github.com/kaliiiiiiiiii/brotector) for browser detection bypass techniques
-- [fakebrowser](https://github.com/kkoooqq/fakebrowser) for fingerprinting research
-- [rebrowser-patches](https://github.com/rebrowser/rebrowser-patches) for stealth improvements
+- [fakebrowser](https://github.com/kkoooqq/fakebrowser) and [BotBrowser](https://github.com/botswin/BotBrowser) for fingerprinting research
 
 ---
 <div align="center"><small>Designed & crafted with ❤️ by Karim Shoair.</small></div><br>
@@ -67,12 +67,13 @@ Built for the modern Web, Scrapling has its own rapid parsing engine and its fet
 
 <!-- sponsors -->
 
+<a href="https://www.thordata.com/?ls=github&lk=D4Vinci" target="_blank" title="A global network of over 60M+ residential proxies with 99.7% availability, ensuring stable and reliable web data scraping to support AI, BI, and workflows."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/thordata.jpg"></a>
 <a href="https://evomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling" target="_blank" title="Evomi is your Swiss Quality Proxy Provider, starting at $0.49/GB"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/evomi.png"></a>
-<a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/petrosky.png"></a>
 <a href="https://visit.decodo.com/Dy6W0b" target="_blank" title="Try the Most Efficient Residential Proxies for Free"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/decodo.png"></a>
+<a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/petrosky.png"></a>
 <a href="https://www.swiftproxy.net/" target="_blank" title="Unlock Reliable Proxy Services with Swiftproxy!"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/swiftproxy.png"></a>
-<a href="https://serpapi.com/?utm_source=scrapling" target="_blank" title="Scrape Google and other search engines with SerpApi"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png"></a>
 <a href="https://www.nstproxy.com/?type=flow&utm_source=scrapling" target="_blank" title="One Proxy Service, Infinite Solutions at Unbeatable Prices!"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/NSTproxy.png"></a>
+<a href="https://serpapi.com/?utm_source=scrapling" target="_blank" title="Scrape Google and other search engines with SerpApi"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png"></a>
 
 <!-- /sponsors -->
 
@@ -321,10 +322,9 @@ This project includes code adapted from:
 ## Thanks and References
 
 - [Daijro](https://github.com/daijro)'s brilliant work on [BrowserForge](https://github.com/daijro/browserforge) and [Camoufox](https://github.com/daijro/camoufox)
-- [Vinyzu](https://github.com/Vinyzu)'s work on [Botright](https://github.com/Vinyzu/Botright)
+- [Vinyzu](https://github.com/Vinyzu)'s brilliant work on [Botright](https://github.com/Vinyzu/Botright) and [PatchRight](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)
 - [brotector](https://github.com/kaliiiiiiiiii/brotector) for browser detection bypass techniques
-- [fakebrowser](https://github.com/kkoooqq/fakebrowser) for fingerprinting research
-- [rebrowser-patches](https://github.com/rebrowser/rebrowser-patches) for stealth improvements
+- [fakebrowser](https://github.com/kkoooqq/fakebrowser) and [BotBrowser](https://github.com/botswin/BotBrowser) for fingerprinting research
 
 ---
 <div align="center"><small>Designed & crafted with ❤️ by Karim Shoair.</small></div><br>
@@ -64,16 +64,16 @@ dependencies = [
 
 [project.optional-dependencies]
 fetchers = [
-    "click>=8.…",
+    "click>=8.3.0",
     "curl_cffi>=0.13.0",
-    "playwright>=1.…",
-    "…",
+    "playwright>=1.55.0",
+    "patchright>=1.55.2",
     "camoufox>=0.4.11",
     "geoip2>=5.1.0",
    "msgspec>=0.19.0",
 ]
 ai = [
-    "mcp>=1.14.…",
+    "mcp>=1.14.1",
     "markdownify>=1.2.0",
     "scrapling[fetchers]",
 ]
@@ -32,8 +32,8 @@ def __ParseJSONData(json_string: Optional[str] = None) -> Optional[Dict[str, Any
 
     try:
         return json_loads(json_string)
-    except JSONDecodeError as …
-        raise ValueError(f"Invalid JSON data '{json_string}': {…
+    except JSONDecodeError as err:  # pragma: no cover
+        raise ValueError(f"Invalid JSON data '{json_string}': {err}")
 
 
 def __Request_and_Save(
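The CLI change above only renames the caught exception variable and marks the branch as excluded from coverage. For context, here is a minimal standalone sketch of the same wrap-and-re-raise pattern, using the stdlib json module rather than the CLI's own imports:

```python
from json import loads, JSONDecodeError

def parse_json_or_fail(json_string: str) -> dict:
    # Surface both the offending input and the decoder error in one ValueError,
    # mirroring the CLI helper's behaviour.
    try:
        return loads(json_string)
    except JSONDecodeError as err:
        raise ValueError(f"Invalid JSON data '{json_string}': {err}")

print(parse_json_or_fail('{"ok": true}'))  # {'ok': True}
# parse_json_or_fail('{broken')            # ValueError: Invalid JSON data '{broken': ...
```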
@@ -65,8 +65,8 @@ def __ParseExtractArguments(
     for key, value in _CookieParser(cookies):
         try:
             parsed_cookies[key] = value
-        except Exception as …
-            raise ValueError(f"Could not parse cookies '{cookies}': {…
+        except Exception as err:
+            raise ValueError(f"Could not parse cookies '{cookies}': {err}")
 
     parsed_json = __ParseJSONData(json)
     parsed_params = {}
@@ -145,7 +145,7 @@ class TextHandler(str):
         clean_match: bool = False,
         case_sensitive: bool = True,
         check_match: Literal[False] = False,
-    ) -> "TextHandlers…
+    ) -> "TextHandlers": ...
 
     def re(
         self,
@@ -241,7 +241,7 @@ class TextHandlers(List[TextHandler]):
         replace_entities: bool = True,
         clean_match: bool = False,
         case_sensitive: bool = True,
-    ) -> "TextHandlers…
+    ) -> "TextHandlers":
         """Call the ``.re()`` method for each element in this list and return
         their results flattened as TextHandlers.
 
@@ -201,7 +201,7 @@ class CurlParser:
             data_payload = parsed_args.data_binary  # Fallback to string
 
         elif parsed_args.data_raw is not None:
-            data_payload = parsed_args.data_raw
+            data_payload = parsed_args.data_raw.lstrip("$")
 
         elif parsed_args.data is not None:
             data_payload = parsed_args.data
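The new `.lstrip("$")` most plausibly targets curl commands that wrap their payload in Bash ANSI-C quoting (`--data-raw $'…'`) — an assumption, since the diff itself gives no rationale. After shell-style tokenization the leading `$` survives and would corrupt the body:

```python
import shlex

# Hypothetical tail of a copied curl command:  --data-raw $'{"id": 1}'
tokens = shlex.split("""--data-raw $'{"id": 1}'""")
payload = tokens[1]
print(payload)              # ${"id": 1}   <- leading "$" left over from $'...'
print(payload.lstrip("$"))  # {"id": 1}    <- what the parser now forwards
```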
@@ -317,8 +317,8 @@ def show_page_in_browser(page: Selector):  # pragma: no cover
 
     try:
         fd, fname = make_temp_file(prefix="scrapling_view_", suffix=".html")
-        with open(fd, "…
-            f.write(page.…
+        with open(fd, "w", encoding=page.encoding) as f:
+            f.write(page.html_content)
 
         open_in_browser(f"file://{fname}")
     except IOError as e:
@@ -545,7 +545,7 @@ class Convertor:
         for page in pages:
             match extraction_type:
                 case "markdown":
-                    yield cls._convert_to_markdown(page.…
+                    yield cls._convert_to_markdown(page.html_content)
                 case "html":
                     yield page.body
                 case "text":
@@ -1,4 +1,4 @@
-from time import time…
+from time import time
 from asyncio import sleep as asyncio_sleep, Lock
 
 from camoufox import DefaultAddons
@@ -44,23 +44,7 @@ class SyncSession:
     ) -> PageInfo:  # pragma: no cover
         """Get a new page to use"""
 
-        # …
-        self.page_pool.close_all_finished_pages()
-
-        # If we're at max capacity after cleanup, wait for busy pages to finish
-        if self.page_pool.pages_count >= self.max_pages:
-            start_time = time()
-            while time() - start_time < self._max_wait_for_page:
-                # Wait for any pages to finish, then clean them up
-                sleep(0.05)
-                self.page_pool.close_all_finished_pages()
-                if self.page_pool.pages_count < self.max_pages:
-                    break
-            else:
-                raise TimeoutError(
-                    f"No pages finished to clear place in the pool within the {self._max_wait_for_page}s timeout period"
-                )
-
+        # No need to check if a page is available or not in sync code because the code blocked before reaching here till the page closed, ofc.
         page = self.context.new_page()
         page.set_default_navigation_timeout(timeout)
         page.set_default_timeout(timeout)
@@ -76,11 +60,6 @@ class SyncSession:
 
         return self.page_pool.add_page(page)
 
-    @staticmethod
-    def _get_with_precedence(request_value: Any, session_value: Any, sentinel_value: object) -> Any:
-        """Get value with request-level priority over session-level"""
-        return request_value if request_value is not sentinel_value else session_value
-
     def get_pool_stats(self) -> Dict[str, int]:
        """Get statistics about the current page pool"""
         return {
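The removed static helper is not gone conceptually; the same request-over-session precedence now lives in `validate_fetch()` in the new `_validators.py` shown later in this diff. A minimal sketch of the sentinel pattern it relies on (names illustrative):

```python
# A module-level sentinel means "argument not passed", so even None or False
# can be given explicitly as a per-request override.
_UNSET = object()

def pick(request_value, session_value, sentinel=_UNSET):
    return request_value if request_value is not sentinel else session_value

session_timeout = 30000
print(pick(_UNSET, session_timeout))  # 30000 -> falls back to the session default
print(pick(5000, session_timeout))    # 5000  -> explicit request value wins
```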
@@ -105,16 +84,11 @@ class AsyncSession(SyncSession):
     ) -> PageInfo:  # pragma: no cover
         """Get a new page to use"""
         async with self._lock:
-            # Close all finished pages to ensure clean state
-            await self.page_pool.aclose_all_finished_pages()
-
             # If we're at max capacity after cleanup, wait for busy pages to finish
             if self.page_pool.pages_count >= self.max_pages:
                 start_time = time()
                 while time() - start_time < self._max_wait_for_page:
-                    # Wait for any pages to finish, then clean them up
                     await asyncio_sleep(0.05)
-                    await self.page_pool.aclose_all_finished_pages()
                     if self.page_pool.pages_count < self.max_pages:
                         break
                 else:
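The async path keeps its bounded polling loop but no longer closes "finished" pages inside it. A self-contained sketch of the wait-for-capacity pattern, with illustrative names rather than Scrapling's actual session API:

```python
import asyncio
from time import time

async def wait_for_free_slot(pages_count, max_pages: int, max_wait: float = 60.0) -> None:
    # Poll every 50 ms until the pool drops below capacity, else give up.
    start_time = time()
    while time() - start_time < max_wait:
        await asyncio.sleep(0.05)
        if pages_count() < max_pages:
            return
    raise TimeoutError(f"No pages finished to clear place in the pool within the {max_wait}s timeout period")

# A pool reporting zero busy pages frees up after a single poll tick.
asyncio.run(wait_for_free_slot(lambda: 0, max_pages=1))
```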
@@ -16,7 +16,7 @@ from playwright.async_api import (
 )
 from playwright._impl._errors import Error as PlaywrightError
 
-from ._validators import …
+from ._validators import validate_fetch as _validate
 from ._base import SyncSession, AsyncSession, StealthySessionMixin
 from scrapling.core.utils import log
 from scrapling.core._types import (
@@ -297,23 +297,22 @@ class StealthySession(StealthySessionMixin, SyncSession):
         :param selector_config: The arguments that will be passed in the end while creating the final Selector's class.
         :return: A `Response` object.
         """
-        …
-            CamoufoxConfig,
+        params = _validate(
+            [
+                ("google_search", google_search, self.google_search),
+                ("timeout", timeout, self.timeout),
+                ("wait", wait, self.wait),
+                ("page_action", page_action, self.page_action),
+                ("extra_headers", extra_headers, self.extra_headers),
+                ("disable_resources", disable_resources, self.disable_resources),
+                ("wait_selector", wait_selector, self.wait_selector),
+                ("wait_selector_state", wait_selector_state, self.wait_selector_state),
+                ("network_idle", network_idle, self.network_idle),
+                ("load_dom", load_dom, self.load_dom),
+                ("solve_cloudflare", solve_cloudflare, self.solve_cloudflare),
+                ("selector_config", selector_config, self.selector_config),
+            ],
+            _UNSET,
         )
 
         if self._closed:  # pragma: no cover
@@ -381,8 +380,9 @@ class StealthySession(StealthySessionMixin, SyncSession):
                 page_info.page, first_response, final_response, params.selector_config
             )
 
-            # …
-            page_info.…
+            # Close the page, to free up resources
+            page_info.page.close()
+            self.page_pool.pages.remove(page_info)
 
             return response
 
@@ -616,22 +616,22 @@ class AsyncStealthySession(StealthySessionMixin, AsyncSession):
         :param selector_config: The arguments that will be passed in the end while creating the final Selector's class.
         :return: A `Response` object.
         """
-        params = …
-        …
+        params = _validate(
+            [
+                ("google_search", google_search, self.google_search),
+                ("timeout", timeout, self.timeout),
+                ("wait", wait, self.wait),
+                ("page_action", page_action, self.page_action),
+                ("extra_headers", extra_headers, self.extra_headers),
+                ("disable_resources", disable_resources, self.disable_resources),
+                ("wait_selector", wait_selector, self.wait_selector),
+                ("wait_selector_state", wait_selector_state, self.wait_selector_state),
+                ("network_idle", network_idle, self.network_idle),
+                ("load_dom", load_dom, self.load_dom),
+                ("solve_cloudflare", solve_cloudflare, self.solve_cloudflare),
+                ("selector_config", selector_config, self.selector_config),
+            ],
+            _UNSET,
         )
 
         if self._closed:  # pragma: no cover
@@ -701,8 +701,9 @@ class AsyncStealthySession(StealthySessionMixin, AsyncSession):
                 page_info.page, first_response, final_response, params.selector_config
             )
 
-            # …
-            page_info.…
+            # Close the page, to free up resources
+            await page_info.page.close()
+            self.page_pool.pages.remove(page_info)
 
             return response
 
@@ -11,14 +11,12 @@ from playwright.async_api import (
     Playwright as AsyncPlaywright,
     Locator as AsyncLocator,
 )
-from …
-from …
-    async_playwright as async_rebrowser_playwright,
-)
+from patchright.sync_api import sync_playwright as sync_patchright
+from patchright.async_api import async_playwright as async_patchright
 
 from scrapling.core.utils import log
 from ._base import SyncSession, AsyncSession, DynamicSessionMixin
-from ._validators import …
+from ._validators import validate_fetch as _validate
 from scrapling.core._types import (
     Dict,
     List,
@@ -154,10 +152,7 @@ class DynamicSession(DynamicSessionMixin, SyncSession):
 
     def __create__(self):
         """Create a browser for this instance and context."""
-        sync_context = …
-        if not self.stealth or self.real_chrome:
-            # Because rebrowser_playwright doesn't play well with real browsers
-            sync_context = sync_playwright
+        sync_context = sync_patchright if self.stealth else sync_playwright
 
         self.playwright: Playwright = sync_context().start()
 
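With rebrowser-playwright dropped, the stealth flag alone now decides which driver backs a dynamic session. A minimal usage sketch of that selection, assuming both playwright and patchright are installed (patchright mirrors Playwright's API; the real session keeps the driver alive via `.start()` rather than a with-block):

```python
from playwright.sync_api import sync_playwright
from patchright.sync_api import sync_playwright as sync_patchright

stealth = True
# Same one-liner as the new __create__: patchright for stealth, vanilla Playwright otherwise.
sync_context = sync_patchright if stealth else sync_playwright

with sync_context() as pw:
    browser = pw.chromium.launch(headless=True)
    browser.close()
```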
@@ -229,22 +224,21 @@ class DynamicSession(DynamicSessionMixin, SyncSession):
         :param selector_config: The arguments that will be passed in the end while creating the final Selector's class.
         :return: A `Response` object.
         """
-        …
-            PlaywrightConfig,
+        params = _validate(
+            [
+                ("google_search", google_search, self.google_search),
+                ("timeout", timeout, self.timeout),
+                ("wait", wait, self.wait),
+                ("page_action", page_action, self.page_action),
+                ("extra_headers", extra_headers, self.extra_headers),
+                ("disable_resources", disable_resources, self.disable_resources),
+                ("wait_selector", wait_selector, self.wait_selector),
+                ("wait_selector_state", wait_selector_state, self.wait_selector_state),
+                ("network_idle", network_idle, self.network_idle),
+                ("load_dom", load_dom, self.load_dom),
+                ("selector_config", selector_config, self.selector_config),
+            ],
+            _UNSET,
         )
 
         if self._closed:  # pragma: no cover
@@ -305,8 +299,9 @@ class DynamicSession(DynamicSessionMixin, SyncSession):
                 page_info.page, first_response, final_response, params.selector_config
             )
 
-            # …
-            page_info.…
+            # Close the page, to free up resources
+            page_info.page.close()
+            self.page_pool.pages.remove(page_info)
 
             return response
 
@@ -402,10 +397,7 @@ class AsyncDynamicSession(DynamicSessionMixin, AsyncSession):
 
     async def __create__(self):
         """Create a browser for this instance and context."""
-        async_context = …
-        if not self.stealth or self.real_chrome:
-            # Because rebrowser_playwright doesn't play well with real browsers
-            async_context = async_playwright
+        async_context = async_patchright if self.stealth else async_playwright
 
         self.playwright: AsyncPlaywright = await async_context().start()
 
@@ -478,22 +470,21 @@ class AsyncDynamicSession(DynamicSessionMixin, AsyncSession):
         :param selector_config: The arguments that will be passed in the end while creating the final Selector's class.
         :return: A `Response` object.
         """
-        …
-            PlaywrightConfig,
+        params = _validate(
+            [
+                ("google_search", google_search, self.google_search),
+                ("timeout", timeout, self.timeout),
+                ("wait", wait, self.wait),
+                ("page_action", page_action, self.page_action),
+                ("extra_headers", extra_headers, self.extra_headers),
+                ("disable_resources", disable_resources, self.disable_resources),
+                ("wait_selector", wait_selector, self.wait_selector),
+                ("wait_selector_state", wait_selector_state, self.wait_selector_state),
+                ("network_idle", network_idle, self.network_idle),
+                ("load_dom", load_dom, self.load_dom),
+                ("selector_config", selector_config, self.selector_config),
+            ],
+            _UNSET,
         )
 
         if self._closed:  # pragma: no cover
@@ -554,9 +545,9 @@ class AsyncDynamicSession(DynamicSessionMixin, AsyncSession):
                 page_info.page, first_response, final_response, params.selector_config
             )
 
-            # …
-            page_info.…
-
+            # Close the page, to free up resources
+            await page_info.page.close()
+            self.page_pool.pages.remove(page_info)
             return response
 
         except Exception as e:  # pragma: no cover
@@ -6,7 +6,7 @@ from playwright.async_api import Page as AsyncPage
 
 from scrapling.core._types import Optional, List, Literal
 
-PageState = Literal["…
+PageState = Literal["ready", "busy", "error"]  # States that a page can be in
 
 
 @dataclass
@@ -23,11 +23,6 @@ class PageInfo:
         self.state = "busy"
         self.url = url
 
-    def mark_finished(self):
-        """Mark the page as finished for new requests"""
-        self.state = "finished"
-        self.url = ""
-
     def mark_error(self):
         """Mark the page as having an error"""
         self.state = "error"
@@ -67,12 +62,6 @@ class PagePool:
         """Get the total number of pages"""
         return len(self.pages)
 
-    @property
-    def finished_count(self) -> int:
-        """Get the number of finished pages"""
-        with self._lock:
-            return sum(1 for p in self.pages if p.state == "finished")
-
     @property
     def busy_count(self) -> int:
         """Get the number of busy pages"""
@@ -83,33 +72,3 @@ class PagePool:
         """Remove pages in error state"""
         with self._lock:
             self.pages = [p for p in self.pages if p.state != "error"]
-
-    def close_all_finished_pages(self):
-        """Close all pages in finished state and remove them from the pool"""
-        with self._lock:
-            pages_to_remove = []
-            for page_info in self.pages:
-                if page_info.state == "finished":
-                    try:
-                        page_info.page.close()
-                    except Exception:
-                        pass
-                    pages_to_remove.append(page_info)
-
-            for page_info in pages_to_remove:
-                self.pages.remove(page_info)
-
-    async def aclose_all_finished_pages(self):
-        """Async version: Close all pages in finished state and remove them from the pool"""
-        with self._lock:
-            pages_to_remove = []
-            for page_info in self.pages:
-                if page_info.state == "finished":
-                    try:
-                        await page_info.page.close()
-                    except Exception:
-                        pass
-                    pages_to_remove.append(page_info)
-
-            for page_info in pages_to_remove:
-                self.pages.remove(page_info)
@@ -0,0 +1,229 @@
+from pathlib import Path
+from typing import Annotated
+from dataclasses import dataclass
+from urllib.parse import urlparse
+
+from msgspec import Struct, Meta, convert, ValidationError
+
+from scrapling.core._types import (
+    Dict,
+    List,
+    Tuple,
+    Optional,
+    Callable,
+    SelectorWaitStates,
+)
+from scrapling.engines.toolbelt.navigation import construct_proxy_dict
+
+
+# Custom validators for msgspec
+def _validate_file_path(value: str):
+    """Fast file path validation"""
+    path = Path(value)
+    if not path.exists():
+        raise ValueError(f"Init script path not found: {value}")
+    if not path.is_file():
+        raise ValueError(f"Init script is not a file: {value}")
+    if not path.is_absolute():
+        raise ValueError(f"Init script is not a absolute path: {value}")
+
+
+def _validate_addon_path(value: str):
+    """Fast addon path validation"""
+    path = Path(value)
+    if not path.exists():
+        raise FileNotFoundError(f"Addon path not found: {value}")
+    if not path.is_dir():
+        raise ValueError(f"Addon path must be a directory of the extracted addon: {value}")
+
+
+def _validate_cdp_url(cdp_url: str):
+    """Fast CDP URL validation"""
+    try:
+        # Check the scheme
+        if not cdp_url.startswith(("ws://", "wss://")):
+            raise ValueError("CDP URL must use 'ws://' or 'wss://' scheme")
+
+        # Validate hostname and port
+        if not urlparse(cdp_url).netloc:
+            raise ValueError("Invalid hostname for the CDP URL")
+
+    except AttributeError as e:
+        raise ValueError(f"Malformed CDP URL: {cdp_url}: {str(e)}")
+
+    except Exception as e:
+        raise ValueError(f"Invalid CDP URL '{cdp_url}': {str(e)}")
+
+
+# Type aliases for cleaner annotations
+PagesCount = Annotated[int, Meta(ge=1, le=50)]
+Seconds = Annotated[int, float, Meta(ge=0)]
+
+
+class PlaywrightConfig(Struct, kw_only=True, frozen=False):
+    """Configuration struct for validation"""
+
+    max_pages: PagesCount = 1
+    cdp_url: Optional[str] = None
+    headless: bool = True
+    google_search: bool = True
+    hide_canvas: bool = False
+    disable_webgl: bool = False
+    real_chrome: bool = False
+    stealth: bool = False
+    wait: Seconds = 0
+    page_action: Optional[Callable] = None
+    proxy: Optional[str | Dict[str, str]] = None  # The default value for proxy in Playwright's source is `None`
+    locale: str = "en-US"
+    extra_headers: Optional[Dict[str, str]] = None
+    useragent: Optional[str] = None
+    timeout: Seconds = 30000
+    init_script: Optional[str] = None
+    disable_resources: bool = False
+    wait_selector: Optional[str] = None
+    cookies: Optional[List[Dict]] = None
+    network_idle: bool = False
+    load_dom: bool = True
+    wait_selector_state: SelectorWaitStates = "attached"
+    selector_config: Optional[Dict] = None
+
+    def __post_init__(self):
+        """Custom validation after msgspec validation"""
+        if self.page_action and not callable(self.page_action):
+            raise TypeError(f"page_action must be callable, got {type(self.page_action).__name__}")
+        if self.proxy:
+            self.proxy = construct_proxy_dict(self.proxy, as_tuple=True)
+        if self.cdp_url:
+            _validate_cdp_url(self.cdp_url)
+
+        if not self.cookies:
+            self.cookies = []
+        if not self.selector_config:
+            self.selector_config = {}
+
+        if self.init_script is not None:
+            _validate_file_path(self.init_script)
+
+
+class CamoufoxConfig(Struct, kw_only=True, frozen=False):
+    """Configuration struct for validation"""
+
+    max_pages: PagesCount = 1
+    headless: bool = True  # noqa: F821
+    block_images: bool = False
+    disable_resources: bool = False
+    block_webrtc: bool = False
+    allow_webgl: bool = True
+    network_idle: bool = False
+    load_dom: bool = True
+    humanize: bool | float = True
+    solve_cloudflare: bool = False
+    wait: Seconds = 0
+    timeout: Seconds = 30000
+    init_script: Optional[str] = None
+    page_action: Optional[Callable] = None
+    wait_selector: Optional[str] = None
+    addons: Optional[List[str]] = None
+    wait_selector_state: SelectorWaitStates = "attached"
+    cookies: Optional[List[Dict]] = None
+    google_search: bool = True
+    extra_headers: Optional[Dict[str, str]] = None
+    proxy: Optional[str | Dict[str, str]] = None  # The default value for proxy in Playwright's source is `None`
+    os_randomize: bool = False
+    disable_ads: bool = False
+    geoip: bool = False
+    selector_config: Optional[Dict] = None
+    additional_args: Optional[Dict] = None
+
+    def __post_init__(self):
+        """Custom validation after msgspec validation"""
+        if self.page_action and not callable(self.page_action):
+            raise TypeError(f"page_action must be callable, got {type(self.page_action).__name__}")
+        if self.proxy:
+            self.proxy = construct_proxy_dict(self.proxy, as_tuple=True)
+
+        if self.addons and isinstance(self.addons, list):
+            for addon in self.addons:
+                _validate_addon_path(addon)
+        else:
+            self.addons = []
+
+        if self.init_script is not None:
+            _validate_file_path(self.init_script)
+
+        if not self.cookies:
+            self.cookies = []
+        # Cloudflare timeout adjustment
+        if self.solve_cloudflare and self.timeout < 60_000:
+            self.timeout = 60_000
+        if not self.selector_config:
+            self.selector_config = {}
+        if not self.additional_args:
+            self.additional_args = {}
+
+
+# Code parts to validate `fetch` in the least possible numbers of lines overall
+class FetchConfig(Struct, kw_only=True):
+    """Configuration struct for `fetch` calls validation"""
+
+    google_search: bool = True
+    timeout: Seconds = 30000
+    wait: Seconds = 0
+    page_action: Optional[Callable] = None
+    extra_headers: Optional[Dict[str, str]] = None
+    disable_resources: bool = False
+    wait_selector: Optional[str] = None
+    wait_selector_state: SelectorWaitStates = "attached"
+    network_idle: bool = False
+    load_dom: bool = True
+    solve_cloudflare: bool = False
+    selector_config: Optional[Dict] = {}
+
+    def to_dict(self):
+        return {f: getattr(self, f) for f in self.__struct_fields__}
+
+
+@dataclass
+class _fetch_params:
+    """A dataclass of all parameters used by `fetch` calls"""
+
+    google_search: bool
+    timeout: Seconds
+    wait: Seconds
+    page_action: Optional[Callable]
+    extra_headers: Optional[Dict[str, str]]
+    disable_resources: bool
+    wait_selector: Optional[str]
+    wait_selector_state: SelectorWaitStates
+    network_idle: bool
+    load_dom: bool
+    solve_cloudflare: bool
+    selector_config: Optional[Dict]
+
+
+def validate_fetch(params: List[Tuple], sentinel=None) -> _fetch_params:
+    result = {}
+    overrides = {}
+
+    for arg, request_value, session_value in params:
+        if request_value is not sentinel:
+            overrides[arg] = request_value
+        else:
+            result[arg] = session_value
+
+    if overrides:
+        overrides = validate(overrides, FetchConfig).to_dict()
+        overrides.update(result)
+        return _fetch_params(**overrides)
+
+    if not result.get("solve_cloudflare"):
+        result["solve_cloudflare"] = False
+
+    return _fetch_params(**result)
+
+
+def validate(params: Dict, model) -> PlaywrightConfig | CamoufoxConfig | FetchConfig:
+    try:
+        return convert(params, model)
+    except ValidationError as e:
+        raise TypeError(f"Invalid argument type: {e}") from e
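The new module delegates type checks and range constraints to msgspec in a single `convert()` call instead of the hand-written if-chains of the old validators. A small standalone demo of that style (struct and field names here are illustrative, not Scrapling's):

```python
from typing import Annotated, Optional
from msgspec import Struct, Meta, convert, ValidationError

class DemoConfig(Struct, kw_only=True):
    # Same constraint style as the new PagesCount / Seconds aliases
    max_pages: Annotated[int, Meta(ge=1, le=50)] = 1
    timeout: Annotated[float, Meta(ge=0)] = 30000.0
    wait_selector: Optional[str] = None

print(convert({"max_pages": 3}, DemoConfig))  # DemoConfig(max_pages=3, timeout=30000.0, wait_selector=None)

try:
    convert({"max_pages": 999}, DemoConfig)   # violates le=50
except ValidationError as e:
    print(f"Invalid argument type: {e}")      # validate() re-raises this as a TypeError
```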
@@ -94,8 +94,8 @@ class FetcherSession:
         self.default_http3 = http3
         self.selector_config = selector_config or {}
 
-        self._curl_session: Optional[CurlSession] = None
-        self._async_curl_session: Optional[AsyncCurlSession] = None
+        self._curl_session: Optional[CurlSession] | bool = None
+        self._async_curl_session: Optional[AsyncCurlSession] | bool = None
 
     def _merge_request_args(self, **kwargs) -> Dict[str, Any]:
         """Merge request-specific arguments with default session arguments."""
@@ -239,7 +239,6 @@ class FetcherSession:
         Perform an HTTP request using the configured session.
 
         :param method: HTTP method to be used, supported methods are ["GET", "POST", "PUT", "DELETE"]
-        :param url: Target URL for the request.
         :param request_args: Arguments to be passed to the session's `request()` method.
         :param max_retries: Maximum number of retries for the request.
         :param retry_delay: Number of seconds to wait between retries.
@@ -280,7 +279,6 @@ class FetcherSession:
         Perform an HTTP request using the configured session.
 
         :param method: HTTP method to be used, supported methods are ["GET", "POST", "PUT", "DELETE"]
-        :param url: Target URL for the request.
         :param request_args: Arguments to be passed to the session's `request()` method.
         :param max_retries: Maximum number of retries for the request.
         :param retry_delay: Number of seconds to wait between retries.
@@ -4,7 +4,7 @@ Functions related to files and URLs
 
 from pathlib import Path
 from functools import lru_cache
-from urllib.parse import …
+from urllib.parse import urlparse
 
 from playwright.async_api import Route as async_Route
 from msgspec import Struct, structs, convert, ValidationError
@@ -239,7 +239,7 @@ class Selector(SelectorsGeneration):
         )
 
     def __handle_element(
-        self, element: HtmlElement | _ElementUnicodeResult
+        self, element: Optional[HtmlElement | _ElementUnicodeResult]
     ) -> Optional[Union[TextHandler, "Selector"]]:
         """Used internally in all functions to convert a single element to type (Selector|TextHandler) when possible"""
         if element is None:
@@ -345,7 +345,7 @@ class Selector(SelectorsGeneration):
         return TextHandler(content)
 
     @property
-    def body(self):
+    def body(self) -> str | bytes:
         """Return the raw body of the current `Selector` without any processing. Useful for binary and non-HTML requests."""
         return self._raw_body
 
@@ -1259,7 +1259,7 @@ class Selectors(List[Selector]):
         :param clean_match: if enabled, this will ignore all whitespaces and consecutive spaces while matching
         :param case_sensitive: if disabled, the function will set the regex to ignore the letters case while compiling it
         """
-        results = [n.…
+        results = [n.re(regex, replace_entities, clean_match, case_sensitive) for n in self]
         return TextHandlers(flatten(results))
 
     def re_first(
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: scrapling
-Version: 0.3.4
+Version: 0.3.5
 Summary: Scrapling is an undetectable, powerful, flexible, high-performance Python library that makes Web Scraping easy and effortless as it should be!
 Home-page: https://github.com/D4Vinci/Scrapling
 Author: Karim Shoair

@@ -69,15 +69,15 @@ Requires-Dist: cssselect>=1.3.0
 Requires-Dist: orjson>=3.11.3
 Requires-Dist: tldextract>=5.3.0
 Provides-Extra: fetchers
-Requires-Dist: click>=8.…
+Requires-Dist: click>=8.3.0; extra == "fetchers"
 Requires-Dist: curl_cffi>=0.13.0; extra == "fetchers"
-Requires-Dist: playwright>=1.…
-Requires-Dist: …
+Requires-Dist: playwright>=1.55.0; extra == "fetchers"
+Requires-Dist: patchright>=1.55.2; extra == "fetchers"
 Requires-Dist: camoufox>=0.4.11; extra == "fetchers"
 Requires-Dist: geoip2>=5.1.0; extra == "fetchers"
 Requires-Dist: msgspec>=0.19.0; extra == "fetchers"
 Provides-Extra: ai
-Requires-Dist: mcp>=1.14.…
+Requires-Dist: mcp>=1.14.1; extra == "ai"
 Requires-Dist: markdownify>=1.2.0; extra == "ai"
 Requires-Dist: scrapling[fetchers]; extra == "ai"
 Provides-Extra: shell

@@ -157,12 +157,13 @@ Built for the modern Web, Scrapling has its own rapid parsing engine and its fet
 
 <!-- sponsors -->
 
+<a href="https://www.thordata.com/?ls=github&lk=D4Vinci" target="_blank" title="A global network of over 60M+ residential proxies with 99.7% availability, ensuring stable and reliable web data scraping to support AI, BI, and workflows."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/thordata.jpg"></a>
 <a href="https://evomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling" target="_blank" title="Evomi is your Swiss Quality Proxy Provider, starting at $0.49/GB"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/evomi.png"></a>
-<a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/petrosky.png"></a>
 <a href="https://visit.decodo.com/Dy6W0b" target="_blank" title="Try the Most Efficient Residential Proxies for Free"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/decodo.png"></a>
+<a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting."><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/petrosky.png"></a>
 <a href="https://www.swiftproxy.net/" target="_blank" title="Unlock Reliable Proxy Services with Swiftproxy!"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/swiftproxy.png"></a>
-<a href="https://serpapi.com/?utm_source=scrapling" target="_blank" title="Scrape Google and other search engines with SerpApi"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png"></a>
 <a href="https://www.nstproxy.com/?type=flow&utm_source=scrapling" target="_blank" title="One Proxy Service, Infinite Solutions at Unbeatable Prices!"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/NSTproxy.png"></a>
+<a href="https://serpapi.com/?utm_source=scrapling" target="_blank" title="Scrape Google and other search engines with SerpApi"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png"></a>
 
 <!-- /sponsors -->
 

@@ -411,10 +412,9 @@ This project includes code adapted from:
 ## Thanks and References
 
 - [Daijro](https://github.com/daijro)'s brilliant work on [BrowserForge](https://github.com/daijro/browserforge) and [Camoufox](https://github.com/daijro/camoufox)
-- [Vinyzu](https://github.com/Vinyzu)'s work on [Botright](https://github.com/Vinyzu/Botright)
+- [Vinyzu](https://github.com/Vinyzu)'s brilliant work on [Botright](https://github.com/Vinyzu/Botright) and [PatchRight](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)
 - [brotector](https://github.com/kaliiiiiiiiii/brotector) for browser detection bypass techniques
-- [fakebrowser](https://github.com/kkoooqq/fakebrowser) for fingerprinting research
-- [rebrowser-patches](https://github.com/rebrowser/rebrowser-patches) for stealth improvements
+- [fakebrowser](https://github.com/kkoooqq/fakebrowser) and [BotBrowser](https://github.com/botswin/BotBrowser) for fingerprinting research
 
 ---
 <div align="center"><small>Designed & crafted with ❤️ by Karim Shoair.</small></div><br>
@@ -4,7 +4,7 @@ orjson>=3.11.3
 tldextract>=5.3.0
 
 [ai]
-mcp>=1.14.…
+mcp>=1.14.1
 markdownify>=1.2.0
 scrapling[fetchers]
 
@@ -12,10 +12,10 @@ scrapling[fetchers]
 scrapling[ai,shell]
 
 [fetchers]
-click>=8.…
+click>=8.3.0
 curl_cffi>=0.13.0
-playwright>=1.…
-…
+playwright>=1.55.0
+patchright>=1.55.2
 camoufox>=0.4.11
 geoip2>=5.1.0
 msgspec>=0.19.0
@@ -1,6 +1,6 @@
 [metadata]
 name = scrapling
-version = 0.3.4
+version = 0.3.5
 author = Karim Shoair
 author_email = karim.shoair@pm.me
 description = Scrapling is an undetectable, powerful, flexible, high-performance Python library that makes Web Scraping easy and effortless as it should be!
@@ -1,164 +0,0 @@
-from msgspec import Struct, convert, ValidationError
-from urllib.parse import urlparse
-from pathlib import Path
-
-from scrapling.core._types import (
-    Optional,
-    Dict,
-    Callable,
-    List,
-    SelectorWaitStates,
-)
-from scrapling.engines.toolbelt.navigation import construct_proxy_dict
-
-
-class PlaywrightConfig(Struct, kw_only=True, frozen=False):
-    """Configuration struct for validation"""
-
-    max_pages: int = 1
-    cdp_url: Optional[str] = None
-    headless: bool = True
-    google_search: bool = True
-    hide_canvas: bool = False
-    disable_webgl: bool = False
-    real_chrome: bool = False
-    stealth: bool = False
-    wait: int | float = 0
-    page_action: Optional[Callable] = None
-    proxy: Optional[str | Dict[str, str]] = None  # The default value for proxy in Playwright's source is `None`
-    locale: str = "en-US"
-    extra_headers: Optional[Dict[str, str]] = None
-    useragent: Optional[str] = None
-    timeout: int | float = 30000
-    init_script: Optional[str] = None
-    disable_resources: bool = False
-    wait_selector: Optional[str] = None
-    cookies: Optional[List[Dict]] = None
-    network_idle: bool = False
-    load_dom: bool = True
-    wait_selector_state: SelectorWaitStates = "attached"
-    selector_config: Optional[Dict] = None
-
-    def __post_init__(self):
-        """Custom validation after msgspec validation"""
-        if self.max_pages < 1 or self.max_pages > 50:
-            raise ValueError("max_pages must be between 1 and 50")
-        if self.timeout < 0:
-            raise ValueError("timeout must be >= 0")
-        if self.page_action and not callable(self.page_action):
-            raise TypeError(f"page_action must be callable, got {type(self.page_action).__name__}")
-        if self.proxy:
-            self.proxy = construct_proxy_dict(self.proxy, as_tuple=True)
-        if self.cdp_url:
-            self.__validate_cdp(self.cdp_url)
-        if not self.cookies:
-            self.cookies = []
-        if not self.selector_config:
-            self.selector_config = {}
-
-        if self.init_script is not None:
-            script_path = Path(self.init_script)
-            if not script_path.exists():
-                raise ValueError("Init script path not found")
-            elif not script_path.is_file():
-                raise ValueError("Init script is not a file")
-            elif not script_path.is_absolute():
-                raise ValueError("Init script is not a absolute path")
-
-    @staticmethod
-    def __validate_cdp(cdp_url):
-        try:
-            # Check the scheme
-            if not cdp_url.startswith(("ws://", "wss://")):
-                raise ValueError("CDP URL must use 'ws://' or 'wss://' scheme")
-
-            # Validate hostname and port
-            if not urlparse(cdp_url).netloc:
-                raise ValueError("Invalid hostname for the CDP URL")
-
-        except AttributeError as e:
-            raise ValueError(f"Malformed CDP URL: {cdp_url}: {str(e)}")
-
-        except Exception as e:
-            raise ValueError(f"Invalid CDP URL '{cdp_url}': {str(e)}")
-
-
-class CamoufoxConfig(Struct, kw_only=True, frozen=False):
-    """Configuration struct for validation"""
-
-    max_pages: int = 1
-    headless: bool = True  # noqa: F821
-    block_images: bool = False
-    disable_resources: bool = False
-    block_webrtc: bool = False
-    allow_webgl: bool = True
-    network_idle: bool = False
-    load_dom: bool = True
-    humanize: bool | float = True
-    solve_cloudflare: bool = False
-    wait: int | float = 0
-    timeout: int | float = 30000
-    init_script: Optional[str] = None
-    page_action: Optional[Callable] = None
-    wait_selector: Optional[str] = None
-    addons: Optional[List[str]] = None
-    wait_selector_state: SelectorWaitStates = "attached"
-    cookies: Optional[List[Dict]] = None
-    google_search: bool = True
-    extra_headers: Optional[Dict[str, str]] = None
-    proxy: Optional[str | Dict[str, str]] = None  # The default value for proxy in Playwright's source is `None`
-    os_randomize: bool = False
-    disable_ads: bool = False
-    geoip: bool = False
-    selector_config: Optional[Dict] = None
-    additional_args: Optional[Dict] = None
-
-    def __post_init__(self):
-        """Custom validation after msgspec validation"""
-        if self.max_pages < 1 or self.max_pages > 50:
-            raise ValueError("max_pages must be between 1 and 50")
-        if self.timeout < 0:
-            raise ValueError("timeout must be >= 0")
-        if self.page_action and not callable(self.page_action):
-            raise TypeError(f"page_action must be callable, got {type(self.page_action).__name__}")
-        if self.proxy:
-            self.proxy = construct_proxy_dict(self.proxy, as_tuple=True)
-
-        if not self.addons:
-            self.addons = []
-        else:
-            for addon in self.addons:
-                addon_path = Path(addon)
-                if not addon_path.exists():
-                    raise FileNotFoundError(f"Addon's path not found: {addon}")
-                elif not addon_path.is_dir():
-                    raise ValueError(
-                        f"Addon's path is not a folder, you need to pass a folder of the extracted addon: {addon}"
-                    )
-
-        if self.init_script is not None:
-            script_path = Path(self.init_script)
-            if not script_path.exists():
-                raise ValueError("Init script path not found")
-            elif not script_path.is_file():
-                raise ValueError("Init script is not a file")
-            elif not script_path.is_absolute():
-                raise ValueError("Init script is not a absolute path")
-
-        if not self.cookies:
-            self.cookies = []
-        if self.solve_cloudflare and self.timeout < 60_000:
-            self.timeout = 60_000
-        if not self.selector_config:
-            self.selector_config = {}
-        if not self.additional_args:
-            self.additional_args = {}
-
-
-def validate(params, model):
-    try:
-        config = convert(params, model)
-    except ValidationError as e:
-        raise TypeError(f"Invalid argument type: {e}")
-
-    return config