PyPI - softhauzpy - Versions diffs - 0.0.81__tar.gz → 0.0.91__tar.gz - Mend

softhauzpy 0.0.81 (tar.gz) → 0.0.91 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: softhauzpy
-Version: 0.0.81
+Version: 0.0.91
 Summary: is a comprehensive Python toolkit built for developers creating intelligent, data-driven web applications. It provides a powerful suite of web utilities including web scraping tools, crawling systems, content extraction pipelines, and search engine components that help developers build fully customizable in-house website search solutions.
 Home-page: https://softhauz.ca
 Author: Karen Urate

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/setup.py RENAMED

@@ -5,7 +5,7 @@ with open("README.md", "r", encoding="utf-8") as f:
 setup(
     name='softhauzpy',
-    version='0.0.81',
+    version='0.0.91',
     author='Karen Urate',
     author_email='karen.urate@softhauz.ca',
     packages=find_packages(),

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy/main.py RENAMED

@@ -94,12 +94,13 @@ def detect_input_type(value: str) -> str:
     Parameters
     ----------
-    url           : String - The URL to fetch.
+    page_url      : String - Accepts either the URL to fetch, a local path to an HTML file, or the text/HTML String content of the page.
     title         : String - Optional document title (included in the returned text header when provided).
     author        : String - Optional document author (included in the returned text header when provided).
     description   : String - Optional description (included in the returned text header when provided).
     creation_date : String - Optional creation date string (included in the returned text header when provided).
     modified_date : String - Optional last-modified date string (included in the returned text header when provided).
+    assigned_location : String - Optional the URL to assign if the value passed in page_url is the text/HTML String content of the page.
     Returns
@@ -127,7 +128,8 @@ def extract_pure_text(
         author: str | None = None,
         description: str | None = None,
         creation_date: str | None = None,
-        modified_date: str | None = None) -> dict:
+        modified_date: str | None = None,
+        assigned_location: str | None = None) -> dict:
     input_type = detect_input_type(page_url)
@@ -163,11 +165,12 @@ def extract_pure_text(
     if modified_date:
         header_parts.append(f"Last Modified: {modified_date}")
     if page_url:
-        header_parts.append(f"URL:           {page_url}")
+        if (input_type == "url") or (input_type == "html"):
+            header_parts.append(f"URL:           {page_url}")
     header = " ".join(header_parts)
     result = {
-        "url": page_url,
+        "url": page_url if ((input_type == "url") or (input_type == "html")) else assigned_location,
         "title": title,
         "author": author,
         "description": description,
@@ -220,16 +223,20 @@ def get_search_results_list(page_list=[], keywords='') -> list:
         if len(url) == 0 or len(url) < 1:
             continue
         title = page[1] or ''
         author = page[2] or ''
         description = page[3] or ''
         creation_date = page[4] or ''
         modified_date = page[5] or ''
-        if keywords in (extract_pure_text(url, title=title, author=author, description=description, creation_date=creation_date, modified_date=modified_date)["content"]).lower():
-            results.append((url, title, author, description, creation_date, modified_date))
+        assigned_location = page[6] if len(page[6])>3 else ''
+        if keywords in (extract_pure_text(url, title=title, author=author, description=description, creation_date=creation_date, modified_date=modified_date, assigned_location=assigned_location)["content"]).lower():
+            if detect_input_type(url) != "url":
+                results.append((assigned_location, title, author, description, creation_date, modified_date))
+            else:
+                results.append((url, title, author, description, creation_date, modified_date))
     return results
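
The main.py hunks above change how the reported URL is chosen: page_url is kept when detect_input_type() returns "url" or "html", and assigned_location is used otherwise. The following is a minimal standalone sketch of that rule, not the package's own code. The names resolve_result_url and read_assigned_location are hypothetical, and the "text" label in the usage lines is a placeholder, since the diff does not show the full set of labels detect_input_type() returns. The second helper assumes each page row is a tuple or list and illustrates one way to accept rows that still carry only six fields.

```python
from typing import Optional, Sequence

# Input types for which the diff keeps page_url as the reported URL.
# Only these two labels appear in the diff; any others are unknown here.
URL_LIKE_TYPES = ("url", "html")


def resolve_result_url(page_url: str, input_type: str,
                       assigned_location: Optional[str] = None) -> Optional[str]:
    """Mirror the new expression used for the "url" key of the result dict."""
    if input_type in URL_LIKE_TYPES:
        return page_url
    return assigned_location


def read_assigned_location(page: Sequence[Optional[str]]) -> str:
    """Hypothetical helper: tolerate six-field rows and None values.

    The diff reads page[6] directly. In plain Python that raises
    IndexError for a six-item tuple and TypeError when the item is None.
    """
    value = page[6] if len(page) > 6 else None
    # Keep the diff's own rule: values of 3 characters or fewer are ignored.
    return value if value and len(value) > 3 else ""


# Usage with placeholder data (example.com addresses are illustrative only).
print(resolve_result_url("https://example.com/a", "url", "https://example.com/b"))
print(resolve_result_url("<p>hello</p>", "text", "https://example.com/b"))
print(read_assigned_location(("u", "t", "a", "d", "c", "m")))
```

Whether callers of get_search_results_list() in 0.0.91 are expected to always pass seven-field rows is not stated in the diff, so the length check is a defensive assumption rather than documented behavior.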

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: softhauzpy
-Version: 0.0.81
+Version: 0.0.91
 Summary: is a comprehensive Python toolkit built for developers creating intelligent, data-driven web applications. It provides a powerful suite of web utilities including web scraping tools, crawling systems, content extraction pipelines, and search engine components that help developers build fully customizable in-house website search solutions.
 Home-page: https://softhauz.ca
 Author: Karen Urate

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/README.md RENAMED

File without changes

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/setup.cfg RENAMED

File without changes

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy/__init__.py RENAMED

File without changes

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy.egg-info/SOURCES.txt RENAMED

File without changes

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy.egg-info/dependency_links.txt RENAMED

File without changes

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy.egg-info/requires.txt RENAMED

File without changes

{softhauzpy-0.0.81 → softhauzpy-0.0.91}/softhauzpy.egg-info/top_level.txt RENAMED

File without changes
