scrapling 0.2.1.tar.gz → 0.2.3.tar.gz
- {scrapling-0.2.1 → scrapling-0.2.3}/MANIFEST.in +1 -0
- {scrapling-0.2.1/scrapling.egg-info → scrapling-0.2.3}/PKG-INFO +16 -5
- {scrapling-0.2.1 → scrapling-0.2.3}/README.md +14 -3
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/__init__.py +1 -1
- scrapling-0.2.3/scrapling/defaults.py +6 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/camo.py +2 -2
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/pw.py +2 -2
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/static.py +2 -2
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/navigator_plugins.js +40 -0
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/notification_permission.js +5 -0
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/pdf_viewer.js +5 -0
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/playwright_fingerprint.js +2 -0
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/screen_props.js +27 -0
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/webdriver_fully.js +27 -0
- scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/window_chrome.js +213 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/toolbelt/custom.py +3 -4
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/parser.py +11 -2
- {scrapling-0.2.1 → scrapling-0.2.3/scrapling.egg-info}/PKG-INFO +16 -5
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling.egg-info/SOURCES.txt +8 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling.egg-info/requires.txt +1 -1
- {scrapling-0.2.1 → scrapling-0.2.3}/setup.cfg +1 -1
- {scrapling-0.2.1 → scrapling-0.2.3}/setup.py +2 -2
- {scrapling-0.2.1 → scrapling-0.2.3}/LICENSE +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/__init__.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/_types.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/custom_types.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/mixins.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/storage_adaptors.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/translator.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/core/utils.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/__init__.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/constants.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/toolbelt/__init__.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/toolbelt/fingerprints.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/toolbelt/navigation.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/fetchers.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling/py.typed +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling.egg-info/dependency_links.txt +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling.egg-info/not-zip-safe +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/scrapling.egg-info/top_level.txt +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/__init__.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/fetchers/__init__.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/fetchers/test_camoufox.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/fetchers/test_httpx.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/fetchers/test_playwright.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/parser/__init__.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/parser/test_automatch.py +0 -0
- {scrapling-0.2.1 → scrapling-0.2.3}/tests/parser/test_general.py +0 -0
{scrapling-0.2.1/scrapling.egg-info → scrapling-0.2.3}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: scrapling
-Version: 0.2.1
+Version: 0.2.3
 Summary: Scrapling is a powerful, flexible, and high-performance web scraping library for Python. It
 Home-page: https://github.com/D4Vinci/Scrapling
 Author: Karim Shoair
@@ -41,7 +41,7 @@ Requires-Dist: tldextract
 Requires-Dist: httpx[brotli,zstd]
 Requires-Dist: playwright
 Requires-Dist: rebrowser-playwright
-Requires-Dist: camoufox>=0.3.
+Requires-Dist: camoufox>=0.3.10
 Requires-Dist: browserforge
 
 # 🕷️ Scrapling: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
@@ -52,9 +52,9 @@ Dealing with failing web scrapers due to anti-bot protections or website changes
 Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
 
 ```python
->> from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher
+>> from scrapling.default import Fetcher, StealthyFetcher, PlayWrightFetcher
 # Fetch websites' source under the radar!
->> page = StealthyFetcher
+>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
 >> print(page.status)
 200
 >> products = page.css('.product', auto_save=True)  # Scrape data that survives website design changes!
@@ -257,12 +257,21 @@ python -m browserforge update
 ```
 
 ## Fetching Websites Features
-All fetcher-type classes are imported in the same way
+You might be a little bit confused by now so let me clear things up. All fetcher-type classes are imported in the same way
 ```python
 from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher
 ```
 And all of them can take these initialization arguments: `auto_match`, `huge_tree`, `keep_comments`, `storage`, `storage_args`, and `debug` which are the same ones you give to the `Adaptor` class.
 
+If you don't want to pass arguments to the generated `Adaptor` object and want to use the default values, you can use this import instead for cleaner code:
+```python
+from scrapling.default import Fetcher, StealthyFetcher, PlayWrightFetcher
+```
+then use it right away without initializing like:
+```python
+page = StealthyFetcher.fetch('https://example.com')
+```
+
 Also, the `Response` object returned from all fetchers is the same as `Adaptor` object except it has these added attributes: `status`, `reason`, `cookies`, `headers`, and `request_headers`. All `cookies`, `headers`, and `request_headers` are always of type `dictionary`.
 > [!NOTE]
 > The `auto_match` argument is enabled by default which is the one you should care about the most as you will see later.
@@ -803,6 +812,8 @@ Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its st
 
 ## More Sponsors!
 [](https://www.capsolver.com/?utm_source=github&utm_medium=repo&utm_campaign=scraping&utm_term=Scrapling)
+<a href="https://serpapi.com/?utm_source=scrapling"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png" height="500" width="500" alt="SerpApi Banner" ></a>
+
 
 ## Contributing
 Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
{scrapling-0.2.1 → scrapling-0.2.3}/README.md

@@ -6,9 +6,9 @@ Dealing with failing web scrapers due to anti-bot protections or website changes
 Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
 
 ```python
->> from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher
+>> from scrapling.default import Fetcher, StealthyFetcher, PlayWrightFetcher
 # Fetch websites' source under the radar!
->> page = StealthyFetcher
+>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
 >> print(page.status)
 200
 >> products = page.css('.product', auto_save=True)  # Scrape data that survives website design changes!
@@ -211,12 +211,21 @@ python -m browserforge update
 ```
 
 ## Fetching Websites Features
-All fetcher-type classes are imported in the same way
+You might be a little bit confused by now so let me clear things up. All fetcher-type classes are imported in the same way
 ```python
 from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher
 ```
 And all of them can take these initialization arguments: `auto_match`, `huge_tree`, `keep_comments`, `storage`, `storage_args`, and `debug` which are the same ones you give to the `Adaptor` class.
 
+If you don't want to pass arguments to the generated `Adaptor` object and want to use the default values, you can use this import instead for cleaner code:
+```python
+from scrapling.default import Fetcher, StealthyFetcher, PlayWrightFetcher
+```
+then use it right away without initializing like:
+```python
+page = StealthyFetcher.fetch('https://example.com')
+```
+
 Also, the `Response` object returned from all fetchers is the same as `Adaptor` object except it has these added attributes: `status`, `reason`, `cookies`, `headers`, and `request_headers`. All `cookies`, `headers`, and `request_headers` are always of type `dictionary`.
 > [!NOTE]
 > The `auto_match` argument is enabled by default which is the one you should care about the most as you will see later.
@@ -757,6 +766,8 @@ Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its st
 
 ## More Sponsors!
 [](https://www.capsolver.com/?utm_source=github&utm_medium=repo&utm_campaign=scraping&utm_term=Scrapling)
+<a href="https://serpapi.com/?utm_source=scrapling"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png" height="500" width="500" alt="SerpApi Banner" ></a>
+
 
 ## Contributing
 Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
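The README hunks above show the two import styles side by side. A minimal sketch of both, with `https://example.com` as a placeholder target and only the init arguments the README itself lists:

```python
# Style 1: import the classes and configure the generated Adaptor at init time.
from scrapling import StealthyFetcher

fetcher = StealthyFetcher(auto_match=True, keep_comments=False)
page = fetcher.fetch('https://example.com', headless=True, network_idle=True)

# Style 2, new in this release: pre-initialized instances with default settings.
# (The README hunks spell the module `scrapling.default`; the file added in
# this release is `scrapling/defaults.py`.)
# from scrapling.default import StealthyFetcher
# page = StealthyFetcher.fetch('https://example.com')

print(page.status, page.reason)  # response attributes layered on top of Adaptor
products = page.css('.product')  # then query it like any Adaptor
```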
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling/__init__.py

@@ -4,7 +4,7 @@ from scrapling.parser import Adaptor, Adaptors
 from scrapling.core.custom_types import TextHandler, AttributesHandler
 
 __author__ = "Karim Shoair (karim.shoair@pm.me)"
-__version__ = "0.2.1"
+__version__ = "0.2.3"
 __copyright__ = "Copyright (c) 2024 Karim Shoair"
 
 
scrapling-0.2.3/scrapling/defaults.py

@@ -0,0 +1,6 @@
+from .fetchers import Fetcher, StealthyFetcher, PlayWrightFetcher
+
+# If you are going to use Fetchers with the default settings, import them from this file instead for a cleaner looking code
+Fetcher = Fetcher()
+StealthyFetcher = StealthyFetcher()
+PlayWrightFetcher = PlayWrightFetcher()
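What the new defaults module does is plain name shadowing: after the import, each class name is rebound to an instance built with default arguments, so importers receive ready-to-use objects. A stripped-down sketch of the pattern with a toy stand-in class, not Scrapling's real one:

```python
class Fetcher:
    def __init__(self, auto_match: bool = True):
        self.auto_match = auto_match

    def fetch(self, url: str) -> str:
        return f'fetched {url} (auto_match={self.auto_match})'

# Rebinding the name shadows the class with a configured default instance,
# so `from this_module import Fetcher` hands callers the instance directly.
Fetcher = Fetcher()

print(Fetcher.fetch('https://example.com'))
```

The trade-off is that the class itself is no longer reachable through this module, which is why the original class-based import path stays available.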
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/camo.py

@@ -114,14 +114,14 @@ class CamoufoxEngine:
            response = Response(
                url=res.url,
                text=page.content(),
-
+                body=res.body(),
                status=res.status,
                reason=res.status_text,
                encoding=encoding,
                cookies={cookie['name']: cookie['value'] for cookie in page.context.cookies()},
                headers=res.all_headers(),
                request_headers=res.request.all_headers(),
-
+                **self.adaptor_arguments
            )
            page.close()
 
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/pw.py

@@ -224,14 +224,14 @@ class PlaywrightEngine:
            response = Response(
                url=res.url,
                text=page.content(),
-
+                body=res.body(),
                status=res.status,
                reason=res.status_text,
                encoding=encoding,
                cookies={cookie['name']: cookie['value'] for cookie in page.context.cookies()},
                headers=res.all_headers(),
                request_headers=res.request.all_headers(),
-
+                **self.adaptor_arguments
            )
            page.close()
            return response
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/static.py

@@ -53,14 +53,14 @@ class StaticEngine:
        return Response(
            url=str(response.url),
            text=response.text,
-
+            body=response.content,
            status=response.status_code,
            reason=response.reason_phrase,
            encoding=response.encoding or 'utf-8',
            cookies=dict(response.cookies),
            headers=dict(response.headers),
            request_headers=dict(response.request.headers),
-
+            **self.adaptor_arguments
        )
 
    def get(self, url: str, stealthy_headers: Optional[bool] = True, **kwargs: Dict) -> Response:
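The camo.py, pw.py, and static.py hunks make the same two changes in all three engines: the raw response bytes now reach `Response` as `body`, and the adaptor arguments stored at fetcher init are splatted into the call. A rough sketch of what the static engine now assembles, using httpx directly; the URL is a placeholder and `adaptor_arguments` stands in for whatever was captured at init:

```python
import httpx

adaptor_arguments = {'auto_match': True}  # captured when the fetcher was created
response = httpx.get('https://example.com')

response_kwargs = dict(
    url=str(response.url),
    text=response.text,
    body=response.content,                 # raw bytes, newly forwarded in 0.2.3
    status=response.status_code,
    reason=response.reason_phrase,
    encoding=response.encoding or 'utf-8',
    cookies=dict(response.cookies),
    headers=dict(response.headers),
    request_headers=dict(response.request.headers),
    **adaptor_arguments,                   # newly splatted through in 0.2.3
)
# Response(**response_kwargs) — the matching signature appears in the
# custom.py hunk further down.
```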
scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/navigator_plugins.js

@@ -0,0 +1,40 @@
+if(navigator.plugins.length == 0){
+    Object.defineProperty(navigator, 'plugins', {
+        get: () => {
+            const PDFViewerPlugin = Object.create(Plugin.prototype, {
+                description: { value: 'Portable Document Format', enumerable: false },
+                filename: { value: 'internal-pdf-viewer', enumerable: false },
+                name: { value: 'PDF Viewer', enumerable: false },
+            });
+            const ChromePDFViewer = Object.create(Plugin.prototype, {
+                description: { value: 'Portable Document Format', enumerable: false },
+                filename: { value: 'internal-pdf-viewer', enumerable: false },
+                name: { value: 'Chrome PDF Viewer', enumerable: false },
+            });
+            const ChromiumPDFViewer = Object.create(Plugin.prototype, {
+                description: { value: 'Portable Document Format', enumerable: false },
+                filename: { value: 'internal-pdf-viewer', enumerable: false },
+                name: { value: 'Chromium PDF Viewer', enumerable: false },
+            });
+            const EdgePDFViewer = Object.create(Plugin.prototype, {
+                description: { value: 'Portable Document Format', enumerable: false },
+                filename: { value: 'internal-pdf-viewer', enumerable: false },
+                name: { value: 'Microsoft Edge PDF Viewer', enumerable: false },
+            });
+            const WebKitPDFPlugin = Object.create(Plugin.prototype, {
+                description: { value: 'Portable Document Format', enumerable: false },
+                filename: { value: 'internal-pdf-viewer', enumerable: false },
+                name: { value: 'WebKit built-in PDF', enumerable: false },
+            });
+
+            return Object.create(PluginArray.prototype, {
+                length: { value: 5 },
+                0: { value: PDFViewerPlugin },
+                1: { value: ChromePDFViewer },
+                2: { value: ChromiumPDFViewer },
+                3: { value: EdgePDFViewer },
+                4: { value: WebKitPDFPlugin },
+            });
+        },
+    });
+}
scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/screen_props.js

@@ -0,0 +1,27 @@
+const windowScreenProps = {
+    // Dimensions
+    innerHeight: 0,
+    innerWidth: 0,
+    outerHeight: 754,
+    outerWidth: 1313,
+
+    // Position
+    screenX: 19,
+    pageXOffset: 0,
+    pageYOffset: 0,
+
+    // Display
+    devicePixelRatio: 2
+};
+
+try {
+    for (const [prop, value] of Object.entries(windowScreenProps)) {
+        if (value > 0) {
+            // The 0 values are introduced by collecting in the hidden iframe.
+            // They are document sizes anyway so no need to test them or inject them.
+            window[prop] = value;
+        }
+    }
+} catch (e) {
+    console.warn(e);
+};
scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/webdriver_fully.js

@@ -0,0 +1,27 @@
+// Create a function that looks like a native getter
+const nativeGetter = function get webdriver() {
+    return false;
+};
+
+// Copy over native function properties
+Object.defineProperties(nativeGetter, {
+    name: { value: 'get webdriver', configurable: true },
+    length: { value: 0, configurable: true },
+    toString: {
+        value: function() {
+            return `function get webdriver() { [native code] }`;
+        },
+        configurable: true
+    }
+});
+
+// Make it look native
+Object.setPrototypeOf(nativeGetter, Function.prototype);
+
+// Apply the modified descriptor
+Object.defineProperty(Navigator.prototype, 'webdriver', {
+    get: nativeGetter,
+    set: undefined,
+    enumerable: true,
+    configurable: true
+});
scrapling-0.2.3/scrapling/engines/toolbelt/bypasses/window_chrome.js

@@ -0,0 +1,213 @@
+// To escape `HEADCHR_CHROME_OBJ` test in headless mode => https://github.com/antoinevastel/fp-collect/blob/master/src/fpCollect.js#L322
+// Faking window.chrome fully
+
+if (!window.chrome) {
+    // First, save all existing properties
+    const originalKeys = Object.getOwnPropertyNames(window);
+    const tempObj = {};
+
+    // Recreate all properties in original order
+    for (const key of originalKeys) {
+        const descriptor = Object.getOwnPropertyDescriptor(window, key);
+        const value = window[key];
+        // delete window[key];
+        Object.defineProperty(tempObj, key, descriptor);
+    }
+
+    // Use the exact property descriptor found in headful Chrome
+    // fetch it via `Object.getOwnPropertyDescriptor(window, 'chrome')`
+    const mockChrome = {
+        loadTimes: {},
+        csi: {},
+        app: {
+            isInstalled: false
+        },
+        // Add other Chrome-specific properties
+    };
+
+    Object.defineProperty(tempObj, 'chrome', {
+        writable: true,
+        enumerable: true,
+        configurable: false,
+        value: mockChrome
+    });
+    for (const key of Object.getOwnPropertyNames(tempObj)) {
+        try {
+            Object.defineProperty(window, key,
+                Object.getOwnPropertyDescriptor(tempObj, key));
+        } catch (e) {}
+    };
+    // todo: solve this
+    // Using line below bypasses the hasHighChromeIndex test in creepjs ==> https://github.com/abrahamjuliot/creepjs/blob/master/src/headless/index.ts#L121
+    // Chrome object have to be in the end of the window properties
+    // Object.assign(window, tempObj);
+    // But makes window.chrome unreadable on 'https://bot.sannysoft.com/'
+}
+
+// That means we're running headful and don't need to mock anything
+if ('app' in window.chrome) {
+    return; // Nothing to do here
+}
+const makeError = {
+    ErrorInInvocation: fn => {
+        const err = new TypeError(`Error in invocation of app.${fn}()`);
+        return utils.stripErrorWithAnchor(
+            err,
+            `at ${fn} (eval at <anonymous>`,
+        );
+    },
+};
+// check with: `JSON.stringify(window.chrome['app'])`
+const STATIC_DATA = JSON.parse(
+    `
+{
+    "isInstalled": false,
+    "InstallState": {
+        "DISABLED": "disabled",
+        "INSTALLED": "installed",
+        "NOT_INSTALLED": "not_installed"
+    },
+    "RunningState": {
+        "CANNOT_RUN": "cannot_run",
+        "READY_TO_RUN": "ready_to_run",
+        "RUNNING": "running"
+    }
+}
+    `.trim(),
+);
+window.chrome.app = {
+    ...STATIC_DATA,
+
+    get isInstalled() {
+        return false;
+    },
+
+    getDetails: function getDetails() {
+        if (arguments.length) {
+            throw makeError.ErrorInInvocation(`getDetails`);
+        }
+        return null;
+    },
+    getIsInstalled: function getDetails() {
+        if (arguments.length) {
+            throw makeError.ErrorInInvocation(`getIsInstalled`);
+        }
+        return false;
+    },
+    runningState: function getDetails() {
+        if (arguments.length) {
+            throw makeError.ErrorInInvocation(`runningState`);
+        }
+        return 'cannot_run';
+    },
+};
+// Check that the Navigation Timing API v1 is available, we need that
+if (!window.performance || !window.performance.timing) {
+    return;
+}
+const {timing} = window.performance;
+window.chrome.csi = function () {
+    return {
+        onloadT: timing.domContentLoadedEventEnd,
+        startE: timing.navigationStart,
+        pageT: Date.now() - timing.navigationStart,
+        tran: 15, // Transition type or something
+    };
+};
+if (!window.PerformancePaintTiming){
+    return;
+}
+const {performance} = window;
+// Some stuff is not available on about:blank as it requires a navigation to occur,
+// let's harden the code to not fail then:
+const ntEntryFallback = {
+    nextHopProtocol: 'h2',
+    type: 'other',
+};
+
+// The API exposes some funky info regarding the connection
+const protocolInfo = {
+    get connectionInfo() {
+        const ntEntry =
+            performance.getEntriesByType('navigation')[0] || ntEntryFallback;
+        return ntEntry.nextHopProtocol;
+    },
+    get npnNegotiatedProtocol() {
+        // NPN is deprecated in favor of ALPN, but this implementation returns the
+        // HTTP/2 or HTTP2+QUIC/39 requests negotiated via ALPN.
+        const ntEntry =
+            performance.getEntriesByType('navigation')[0] || ntEntryFallback;
+        return ['h2', 'hq'].includes(ntEntry.nextHopProtocol)
+            ? ntEntry.nextHopProtocol
+            : 'unknown';
+    },
+    get navigationType() {
+        const ntEntry =
+            performance.getEntriesByType('navigation')[0] || ntEntryFallback;
+        return ntEntry.type;
+    },
+    get wasAlternateProtocolAvailable() {
+        // The Alternate-Protocol header is deprecated in favor of Alt-Svc
+        // (https://www.mnot.net/blog/2016/03/09/alt-svc), so technically this
+        // should always return false.
+        return false;
+    },
+    get wasFetchedViaSpdy() {
+        // SPDY is deprecated in favor of HTTP/2, but this implementation returns
+        // true for HTTP/2 or HTTP2+QUIC/39 as well.
+        const ntEntry =
+            performance.getEntriesByType('navigation')[0] || ntEntryFallback;
+        return ['h2', 'hq'].includes(ntEntry.nextHopProtocol);
+    },
+    get wasNpnNegotiated() {
+        // NPN is deprecated in favor of ALPN, but this implementation returns true
+        // for HTTP/2 or HTTP2+QUIC/39 requests negotiated via ALPN.
+        const ntEntry =
+            performance.getEntriesByType('navigation')[0] || ntEntryFallback;
+        return ['h2', 'hq'].includes(ntEntry.nextHopProtocol);
+    },
+};
+
+// Truncate number to specific number of decimals, most of the `loadTimes` stuff has 3
+function toFixed(num, fixed) {
+    var re = new RegExp('^-?\\d+(?:.\\d{0,' + (fixed || -1) + '})?');
+    return num.toString().match(re)[0];
+}
+
+const timingInfo = {
+    get firstPaintAfterLoadTime() {
+        // This was never actually implemented and always returns 0.
+        return 0;
+    },
+    get requestTime() {
+        return timing.navigationStart / 1000;
+    },
+    get startLoadTime() {
+        return timing.navigationStart / 1000;
+    },
+    get commitLoadTime() {
+        return timing.responseStart / 1000;
+    },
+    get finishDocumentLoadTime() {
+        return timing.domContentLoadedEventEnd / 1000;
+    },
+    get finishLoadTime() {
+        return timing.loadEventEnd / 1000;
+    },
+    get firstPaintTime() {
+        const fpEntry = performance.getEntriesByType('paint')[0] || {
+            startTime: timing.loadEventEnd / 1000, // Fallback if no navigation occured (`about:blank`)
+        };
+        return toFixed(
+            (fpEntry.startTime + performance.timeOrigin) / 1000,
+            3,
+        );
+    },
+};
+
+window.chrome.loadTimes = function () {
+    return {
+        ...protocolInfo,
+        ...timingInfo,
+    };
+};
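The seven bypass scripts are shipped as plain .js files inside the package. The usual Playwright mechanism for running such files is `add_init_script`, which executes them in every new document before any page script; the sketch below shows that mechanism only and is an assumption, not a copy of Scrapling's actual wiring (the path is illustrative):

```python
from pathlib import Path
from playwright.sync_api import sync_playwright

BYPASSES = Path('scrapling/engines/toolbelt/bypasses')  # illustrative location

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    for script in sorted(BYPASSES.glob('*.js')):
        # Runs the file in every new document, before the page's own scripts
        page.add_init_script(path=str(script))
    page.goto('https://example.com')
    browser.close()
```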
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling/engines/toolbelt/custom.py

@@ -12,15 +12,14 @@ from scrapling.core._types import Any, List, Type, Union, Optional, Dict, Callab
 class Response(Adaptor):
     """This class is returned by all engines as a way to unify response type between different libraries."""
 
-    def __init__(self, url: str, text: str,
+    def __init__(self, url: str, text: str, body: bytes, status: int, reason: str, cookies: Dict, headers: Dict, request_headers: Dict, encoding: str = 'utf-8', **adaptor_arguments: Dict):
         automatch_domain = adaptor_arguments.pop('automatch_domain', None)
-        super().__init__(text=text, body=content, url=automatch_domain or url, encoding=encoding, **adaptor_arguments)
-
         self.status = status
         self.reason = reason
         self.cookies = cookies
         self.headers = headers
         self.request_headers = request_headers
+        super().__init__(text=text, body=body, url=automatch_domain or url, encoding=encoding, **adaptor_arguments)
         # For back-ward compatibility
         self.adaptor = self
 
@@ -31,7 +30,7 @@ class Response(Adaptor):
 class BaseFetcher:
     def __init__(
             self, huge_tree: bool = True, keep_comments: Optional[bool] = False, auto_match: Optional[bool] = True,
-            storage: Any = SQLiteStorageSystem, storage_args: Optional[Dict] = None, debug: Optional[bool] =
+            storage: Any = SQLiteStorageSystem, storage_args: Optional[Dict] = None, debug: Optional[bool] = False,
            automatch_domain: Optional[str] = None,
    ):
        """Arguments below are the same from the Adaptor class so you can pass them directly, the rest of Adaptor's arguments
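The reordering in the `Response.__init__` hunk is load-bearing: `Adaptor.__init__` (parser.py hunk below) snapshots response attributes with a `hasattr(self, 'status')` check, so those attributes must exist before `super().__init__()` runs. A stripped-down illustration with toy classes, not the real ones:

```python
class Adaptor:
    def __init__(self):
        # Snapshot response data if the subclass set it before calling up
        self.response_data = {'status': self.status} if hasattr(self, 'status') else {}


class Response(Adaptor):
    def __init__(self, status: int):
        self.status = status  # set BEFORE super().__init__ ...
        super().__init__()    # ... so the hasattr check sees it


print(Response(200).response_data)  # -> {'status': 200}
```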
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling/parser.py

@@ -32,6 +32,7 @@ class Adaptor(SelectorsGeneration):
            storage: Any = SQLiteStorageSystem,
            storage_args: Optional[Dict] = None,
            debug: Optional[bool] = True,
+            **kwargs
    ):
        """The main class that works as a wrapper for the HTML input data. Using this class, you can search for elements
        with expressions in CSS, XPath, or with simply text. Check the docs for more info.
@@ -117,6 +118,10 @@ class Adaptor(SelectorsGeneration):
         self.__attributes = None
         self.__tag = None
         self.__debug = debug
+        # No need to check if all response attributes exist or not because if `status` exist, then the rest exist (Save some CPU cycles for speed)
+        self.__response_data = {
+            key: getattr(self, key) for key in ('status', 'reason', 'cookies', 'headers', 'request_headers',)
+        } if hasattr(self, 'status') else {}
 
     # Node functionalities, I wanted to move to separate Mixin class but it had slight impact on performance
     @staticmethod
@@ -138,10 +143,14 @@ class Adaptor(SelectorsGeneration):
             return TextHandler(str(element))
         else:
             if issubclass(type(element), html.HtmlMixin):
+
                 return self.__class__(
-                    root=element,
+                    root=element,
+                    text='', body=b'',  # Since root argument is provided, both `text` and `body` will be ignored so this is just a filler
+                    url=self.url, encoding=self.encoding, auto_match=self.__auto_match_enabled,
                     keep_comments=True,  # if the comments are already removed in initialization, no need to try to delete them in sub-elements
-                    huge_tree=self.__huge_tree_enabled, debug=self.__debug
+                    huge_tree=self.__huge_tree_enabled, debug=self.__debug,
+                    **self.__response_data
                 )
             return element
 
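Net effect of the parser.py hunks: since `__response_data` is forwarded whenever a sub-`Adaptor` is built, elements selected from a fetched page should carry the response attributes as well. A hedged sketch (placeholder URL, module spelling per the README hunks, and it assumes the page actually has a `.product` element):

```python
from scrapling.default import Fetcher

page = Fetcher.get('https://example.com')
product = page.css_first('.product')

# The sub-element is built via self.__class__(..., **__response_data),
# so it exposes the same response attributes as the page itself:
print(product.status, product.reason)
```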
{scrapling-0.2.1 → scrapling-0.2.3/scrapling.egg-info}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: scrapling
-Version: 0.2.1
+Version: 0.2.3
 Summary: Scrapling is a powerful, flexible, and high-performance web scraping library for Python. It
 Home-page: https://github.com/D4Vinci/Scrapling
 Author: Karim Shoair
@@ -41,7 +41,7 @@ Requires-Dist: tldextract
 Requires-Dist: httpx[brotli,zstd]
 Requires-Dist: playwright
 Requires-Dist: rebrowser-playwright
-Requires-Dist: camoufox>=0.3.
+Requires-Dist: camoufox>=0.3.10
 Requires-Dist: browserforge
 
 # 🕷️ Scrapling: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
@@ -52,9 +52,9 @@ Dealing with failing web scrapers due to anti-bot protections or website changes
 Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
 
 ```python
->> from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher
+>> from scrapling.default import Fetcher, StealthyFetcher, PlayWrightFetcher
 # Fetch websites' source under the radar!
->> page = StealthyFetcher
+>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
 >> print(page.status)
 200
 >> products = page.css('.product', auto_save=True)  # Scrape data that survives website design changes!
@@ -257,12 +257,21 @@ python -m browserforge update
 ```
 
 ## Fetching Websites Features
-All fetcher-type classes are imported in the same way
+You might be a little bit confused by now so let me clear things up. All fetcher-type classes are imported in the same way
 ```python
 from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher
 ```
 And all of them can take these initialization arguments: `auto_match`, `huge_tree`, `keep_comments`, `storage`, `storage_args`, and `debug` which are the same ones you give to the `Adaptor` class.
 
+If you don't want to pass arguments to the generated `Adaptor` object and want to use the default values, you can use this import instead for cleaner code:
+```python
+from scrapling.default import Fetcher, StealthyFetcher, PlayWrightFetcher
+```
+then use it right away without initializing like:
+```python
+page = StealthyFetcher.fetch('https://example.com')
+```
+
 Also, the `Response` object returned from all fetchers is the same as `Adaptor` object except it has these added attributes: `status`, `reason`, `cookies`, `headers`, and `request_headers`. All `cookies`, `headers`, and `request_headers` are always of type `dictionary`.
 > [!NOTE]
 > The `auto_match` argument is enabled by default which is the one you should care about the most as you will see later.
@@ -803,6 +812,8 @@ Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its st
 
 ## More Sponsors!
 [](https://www.capsolver.com/?utm_source=github&utm_medium=repo&utm_campaign=scraping&utm_term=Scrapling)
+<a href="https://serpapi.com/?utm_source=scrapling"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png" height="500" width="500" alt="SerpApi Banner" ></a>
+
 
 ## Contributing
 Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
{scrapling-0.2.1 → scrapling-0.2.3}/scrapling.egg-info/SOURCES.txt

@@ -4,6 +4,7 @@ README.md
 setup.cfg
 setup.py
 scrapling/__init__.py
+scrapling/defaults.py
 scrapling/fetchers.py
 scrapling/parser.py
 scrapling/py.typed
@@ -29,6 +30,13 @@ scrapling/engines/toolbelt/__init__.py
 scrapling/engines/toolbelt/custom.py
 scrapling/engines/toolbelt/fingerprints.py
 scrapling/engines/toolbelt/navigation.py
+scrapling/engines/toolbelt/bypasses/navigator_plugins.js
+scrapling/engines/toolbelt/bypasses/notification_permission.js
+scrapling/engines/toolbelt/bypasses/pdf_viewer.js
+scrapling/engines/toolbelt/bypasses/playwright_fingerprint.js
+scrapling/engines/toolbelt/bypasses/screen_props.js
+scrapling/engines/toolbelt/bypasses/webdriver_fully.js
+scrapling/engines/toolbelt/bypasses/window_chrome.js
 tests/__init__.py
 tests/fetchers/__init__.py
 tests/fetchers/test_camoufox.py
{scrapling-0.2.1 → scrapling-0.2.3}/setup.py

@@ -6,7 +6,7 @@ with open("README.md", "r", encoding="utf-8") as fh:
 
 setup(
     name="scrapling",
-    version="0.2.1",
+    version="0.2.3",
     description="""Scrapling is a powerful, flexible, and high-performance web scraping library for Python. It
 simplifies the process of extracting data from websites, even when they undergo structural changes, and offers
 impressive speed improvements over many popular scraping tools.""",
@@ -57,7 +57,7 @@ setup(
         'httpx[brotli,zstd]',
         'playwright',
         'rebrowser-playwright',
-        'camoufox>=0.3.
+        'camoufox>=0.3.10',
         'browserforge',
     ],
     python_requires=">=3.8",