softhauzpy 0.0.1__tar.gz → 0.0.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/PKG-INFO +6 -4
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/README.md +3 -3
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/setup.py +4 -2
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy/main.py +1 -4
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/PKG-INFO +6 -4
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/setup.cfg +0 -0
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy/__init__.py +0 -0
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/SOURCES.txt +0 -0
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/dependency_links.txt +0 -0
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/requires.txt +0 -0
- {softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/top_level.txt +0 -0

--- softhauzpy-0.0.1/PKG-INFO
+++ softhauzpy-0.0.3/PKG-INFO
@@ -1,15 +1,17 @@
 Metadata-Version: 2.1
 Name: softhauzpy
-Version: 0.0.
+Version: 0.0.3
+Author: Karen Urate
+Author-email: karen.urate@softhauz.ca
 Description-Content-Type: text/markdown
 
 # SofthauzPy
 **SofthauzPy** is a comprehensive Python toolkit built for developers creating intelligent, data-driven web applications. It provides a powerful suite of web utilities including web scraping tools, crawling systems, content extraction pipelines, and search engine components that help developers build fully customizable in-house website search solutions.
 
-Designed for scalability and flexibility,
+Designed for scalability and flexibility, **SofthauzPy** enables teams to collect, process, index, and search website content efficiently — all within a clean Python-first development ecosystem.
 
-Built for developers who need scalable web data tools and intelligent search capabilities,
-From lightweight crawlers to fully customizable in-house search engine functionality,
+Built for developers who need scalable web data tools and intelligent search capabilities, **SofthauzPy** simplifies the process of scraping, processing, indexing, and searching website content.
+From lightweight crawlers to fully customizable in-house search engine functionality, **SofthauzPy** helps developers build smarter web applications without relying heavily on external search services.
 
 
 ## Key Features
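
The 0.0.3 metadata adds `Author` and `Author-email` fields. PKG-INFO follows the Python core metadata format, whose header section is made of RFC 822-style `Name: value` lines, so the standard library's `email.parser.HeaderParser` can read it. The sketch below parses the header block of the 0.0.3 PKG-INFO; the `read_core_metadata` helper is illustrative only and is not part of softhauzpy.

```python
from email.parser import HeaderParser

# Header block copied from the 0.0.3 PKG-INFO; the Markdown long description
# that follows the first blank line is treated as the message body.
PKG_INFO = """\
Metadata-Version: 2.1
Name: softhauzpy
Version: 0.0.3
Author: Karen Urate
Author-email: karen.urate@softhauz.ca
Description-Content-Type: text/markdown

# SofthauzPy
"""


def read_core_metadata(text: str) -> dict[str, str]:
    """Hypothetical helper: return the header fields of a PKG-INFO file as a dict."""
    msg = HeaderParser().parsestr(text)
    # items() yields (header name, value) pairs; the body is not included.
    return dict(msg.items())


meta = read_core_metadata(PKG_INFO)
print(meta["Version"], meta["Author"], meta["Author-email"])  # 0.0.3 Karen Urate karen.urate@softhauz.ca
```

For an installed copy of the package, `importlib.metadata.metadata("softhauzpy")["Author"]` should return the same field without reading the file directly.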

--- softhauzpy-0.0.1/README.md
+++ softhauzpy-0.0.3/README.md
@@ -1,10 +1,10 @@
 # SofthauzPy
 **SofthauzPy** is a comprehensive Python toolkit built for developers creating intelligent, data-driven web applications. It provides a powerful suite of web utilities including web scraping tools, crawling systems, content extraction pipelines, and search engine components that help developers build fully customizable in-house website search solutions.
 
-Designed for scalability and flexibility,
+Designed for scalability and flexibility, **SofthauzPy** enables teams to collect, process, index, and search website content efficiently — all within a clean Python-first development ecosystem.
 
-Built for developers who need scalable web data tools and intelligent search capabilities,
-From lightweight crawlers to fully customizable in-house search engine functionality,
+Built for developers who need scalable web data tools and intelligent search capabilities, **SofthauzPy** simplifies the process of scraping, processing, indexing, and searching website content.
+From lightweight crawlers to fully customizable in-house search engine functionality, **SofthauzPy** helps developers build smarter web applications without relying heavily on external search services.
 
 
 ## Key Features

--- softhauzpy-0.0.1/setup.py
+++ softhauzpy-0.0.3/setup.py
@@ -1,11 +1,13 @@
 from setuptools import setup, find_packages
 
-with open("README.md", "r") as f:
+with open("README.md", "r", encoding="utf-8") as f:
     description = f.read()
 
 setup(
     name='softhauzpy',
-    version='0.0.
+    version='0.0.3',
+    author='Karen Urate',
+    author_email='karen.urate@softhauz.ca',
     packages=find_packages(),
     install_requires=[
         'requests>=2.32.3',
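
Apart from the version bump and the new author fields, setup.py now passes `encoding="utf-8"` when reading the README. Without that argument, `open()` in text mode decodes with the locale's preferred encoding, and the 0.0.3 README contains a non-ASCII dash (U+2014). The sketch below assumes the README is saved as UTF-8. It contrasts an explicit UTF-8 read, as in the new setup.py, with decoding the same bytes as cp1252, a common default on Windows; the `read_readme` helper is hypothetical.

```python
import tempfile
from pathlib import Path

# A line from the 0.0.3 README; \u2014 is the non-ASCII EM DASH it contains.
README_LINE = (
    "Designed for scalability and flexibility, **SofthauzPy** enables teams to "
    "collect, process, index, and search website content efficiently \u2014 all "
    "within a clean Python-first development ecosystem."
)


def read_readme(path: Path) -> str:
    # Mirrors the 0.0.3 setup.py: decode explicitly as UTF-8 instead of
    # relying on the platform's locale encoding.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_bytes(README_LINE.encode("utf-8"))  # assumption: README stored as UTF-8

    good = read_readme(readme)
    # Simulate what a cp1252 locale default would produce for the same bytes.
    bad = readme.read_bytes().decode("cp1252")

print(good == README_LINE)  # True
print(bad == README_LINE)   # False
```

With cp1252 these particular bytes decode without an error but yield the wrong characters; byte values that cp1252 leaves undefined raise `UnicodeDecodeError` instead, so the explicit encoding avoids both failure modes.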

--- softhauzpy-0.0.1/softhauzpy/main.py
+++ softhauzpy-0.0.3/softhauzpy/main.py
@@ -989,21 +989,18 @@ def incremental_update(
     fp = fingerprint_page(text)
 
     if fingerprints.get(url) == fp:
-        return False
+        return False
 
     fingerprints[url] = fp
 
-    # Remove stale entries from index
     for token in list(index.keys()):
         index[token] = [(doc_id, freq) for doc_id, freq in index[token] if doc_id != url]
         if not index[token]:
             del index[token]
 
-    # Remove stale tfidf and metadata entries
     tfidf.pop(url, None)
     metadata[:] = [m for m in metadata if m.get("url") != url]
 
-    # Build fresh entries for this page
     token_freq = Counter(tokenize(text))
     total = len(list(token_freq.elements())) or 1
     for token, freq in token_freq.items():
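
The main.py hunk only removes three comments and appears to change whitespace on one `return False` line, but its context shows the shape of `incremental_update`: skip a page whose fingerprint is unchanged, purge its old postings, scores, and metadata, then rebuild term counts. The full signature and everything after `for token, freq in token_freq.items():` are not shown, so the following is a self-contained approximation rather than softhauzpy's code. `tokenize`, `fingerprint_page`, `incremental_update_sketch`, the SHA-256 fingerprint, the plain term-frequency weights, and the metadata record are all hypothetical stand-ins.

```python
import hashlib
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def fingerprint_page(text: str) -> str:
    # Hypothetical fingerprint: SHA-256 of whitespace-normalized text.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def incremental_update_sketch(url, text, index, tfidf, metadata, fingerprints):
    """Re-index one page only if its content changed; return True if it was re-indexed."""
    fp = fingerprint_page(text)
    if fingerprints.get(url) == fp:
        return False  # unchanged page: nothing to do

    fingerprints[url] = fp

    # Purge this URL's postings and drop tokens left with no postings.
    for token in list(index.keys()):
        index[token] = [(doc_id, freq) for doc_id, freq in index[token] if doc_id != url]
        if not index[token]:
            del index[token]

    # Purge per-document weights and metadata for this URL.
    tfidf.pop(url, None)
    metadata[:] = [m for m in metadata if m.get("url") != url]

    # Rebuild. sum(values()) equals len(list(elements())) here but does not
    # materialize one list entry per token occurrence.
    token_freq = Counter(tokenize(text))
    total = sum(token_freq.values()) or 1
    weights = {}
    for token, freq in token_freq.items():
        index.setdefault(token, []).append((url, freq))
        weights[token] = freq / total  # assumption: plain term frequency, no IDF
    tfidf[url] = weights
    metadata.append({"url": url})  # hypothetical minimal metadata record
    return True


index, tfidf, metadata, fingerprints = {}, {}, [], {}
url = "https://example.com/a"
print(incremental_update_sketch(url, "Python search tools", index, tfidf, metadata, fingerprints))     # True
print(incremental_update_sketch(url, "Python  search   tools", index, tfidf, metadata, fingerprints))  # False
print(incremental_update_sketch(url, "Python crawler", index, tfidf, metadata, fingerprints))          # True
print(sorted(index))  # ['crawler', 'python']
```

The purge loop scans every token in `index`, so each update costs O(V) for a vocabulary of size V. Keeping a per-URL list of tokens would let the purge touch only that page's postings, at the cost of extra memory.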

--- softhauzpy-0.0.1/softhauzpy.egg-info/PKG-INFO
+++ softhauzpy-0.0.3/softhauzpy.egg-info/PKG-INFO
@@ -1,15 +1,17 @@
 Metadata-Version: 2.1
 Name: softhauzpy
-Version: 0.0.
+Version: 0.0.3
+Author: Karen Urate
+Author-email: karen.urate@softhauz.ca
 Description-Content-Type: text/markdown
 
 # SofthauzPy
 **SofthauzPy** is a comprehensive Python toolkit built for developers creating intelligent, data-driven web applications. It provides a powerful suite of web utilities including web scraping tools, crawling systems, content extraction pipelines, and search engine components that help developers build fully customizable in-house website search solutions.
 
-Designed for scalability and flexibility,
+Designed for scalability and flexibility, **SofthauzPy** enables teams to collect, process, index, and search website content efficiently — all within a clean Python-first development ecosystem.
 
-Built for developers who need scalable web data tools and intelligent search capabilities,
-From lightweight crawlers to fully customizable in-house search engine functionality,
+Built for developers who need scalable web data tools and intelligent search capabilities, **SofthauzPy** simplifies the process of scraping, processing, indexing, and searching website content.
+From lightweight crawlers to fully customizable in-house search engine functionality, **SofthauzPy** helps developers build smarter web applications without relying heavily on external search services.
 
 
 ## Key Features
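
{softhauzpy-0.0.1 → softhauzpy-0.0.3}/setup.cfg: File without changes
{softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy/__init__.py: File without changes
{softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/SOURCES.txt: File without changes
{softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/dependency_links.txt: File without changes
{softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/requires.txt: File without changes
{softhauzpy-0.0.1 → softhauzpy-0.0.3}/softhauzpy.egg-info/top_level.txt: File without changes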