fraudcrawler 0.3.6__tar.gz → 0.3.7__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of fraudcrawler might be problematic.
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/PKG-INFO +1 -1
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/scraping/serp.py +26 -2
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/pyproject.toml +1 -1
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/LICENSE +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/README.md +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/__init__.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/base/__init__.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/base/base.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/base/client.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/base/google-languages.json +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/base/google-locations.json +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/base/orchestrator.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/launch_demo_pipeline.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/processing/__init__.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/processing/processor.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/scraping/__init__.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/scraping/enrich.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/scraping/zyte.py +0 -0
- {fraudcrawler-0.3.6 → fraudcrawler-0.3.7}/fraudcrawler/settings.py +0 -0
fraudcrawler/scraping/serp.py

@@ -196,6 +196,27 @@ class SerpApi(AsyncClient):
                     filtered_at_stage=filtered_at_stage,
                 )
 
+    @staticmethod
+    def _is_excluded(domain: str, excluded_urls: List[Host]) -> bool:
+        """Checks if the domain is in the excluded URLs.
+
+        Note:
+            By checking `if dom == excl or dom.endswith(f".{excl}")` we also
+            check for subdomains. For example, if the domain is
+            `link.springer.com` and the excluded URL is `springer.com`,
+            it will be excluded.
+
+        Args:
+            domain: The domain to check.
+            excluded_urls: The list of excluded URLs.
+        """
+        dom = domain.lower()
+        excl_doms = [dom.lower() for excl in excluded_urls for dom in excl.domains]
+        for excl in excl_doms:
+            if dom == excl or dom.endswith(f".{excl}"):
+                return True
+        return False
+
     async def apply(
         self,
         search_term: str,
@@ -242,8 +263,11 @@ class SerpApi(AsyncClient):
 
         # Filter out the excluded URLs
         if excluded_urls:
-
-
+            results = [
+                res
+                for res in results
+                if not self._is_excluded(res.domain, excluded_urls)
+            ]
 
         logger.info(
             f'Produced {len(results)} results from SerpApi search with q="{search_string}".'
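The substantive change in 0.3.7 is this subdomain-aware exclusion check. Below is a minimal standalone sketch of the same logic that can be run outside the package; the `Host` dataclass here is a hypothetical stand-in, since the diff only shows that the real `Host` class exposes a `domains` list.

from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    """Hypothetical stand-in: the diff only shows that Host exposes `.domains`."""
    domains: List[str]


def is_excluded(domain: str, excluded_urls: List[Host]) -> bool:
    """Return True if `domain` equals, or is a subdomain of, any excluded domain."""
    dom = domain.lower()
    excl_doms = [d.lower() for excl in excluded_urls for d in excl.domains]
    # `endswith(f".{excl}")` catches subdomains: "link.springer.com" ends
    # with ".springer.com", so it is excluded by "springer.com".
    return any(dom == excl or dom.endswith(f".{excl}") for excl in excl_doms)


excluded = [Host(domains=["springer.com"])]
assert is_excluded("link.springer.com", excluded)    # subdomain is excluded
assert is_excluded("SPRINGER.COM", excluded)         # comparison is case-insensitive
assert not is_excluded("notspringer.com", excluded)  # a bare suffix match is not enough

Matching with `dom == excl or dom.endswith(f".{excl}")` rather than a plain substring or suffix test is what keeps look-alike domains such as `notspringer.com` from being filtered out along with genuine subdomains.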