smol-html 0.1.3__py3-none-any.whl → 0.1.5__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
smol_html-0.1.3.dist-info/METADATA → smol_html-0.1.5.dist-info/METADATA
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: smol-html
-Version: 0.1.3
+Version: 0.1.5
 Summary: Small, dependable HTML cleaner/minifier with sensible defaults
 Project-URL: Homepage, https://github.com/NosibleAI/smol-html
 Project-URL: Repository, https://github.com/NosibleAI/smol-html
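The METADATA file diffed above uses the core-metadata email-header format, so the version bump can be read programmatically. The following is a minimal sketch that relies only on the standard library's `email.parser`. The `parse_metadata` helper and the `METADATA_0_1_5` excerpt are hypothetical: the excerpt is rebuilt from the header lines shown in the hunk, and the real file contains more fields that this diff elides.

```python
from email.message import Message
from email.parser import HeaderParser

# Hypothetical excerpt: only the header lines visible in the 0.1.5 hunk above.
METADATA_0_1_5 = """\
Metadata-Version: 2.4
Name: smol-html
Version: 0.1.5
Summary: Small, dependable HTML cleaner/minifier with sensible defaults
Project-URL: Homepage, https://github.com/NosibleAI/smol-html
Project-URL: Repository, https://github.com/NosibleAI/smol-html
"""


def parse_metadata(text: str) -> Message:
    """Parse core-metadata header fields into an email Message (hypothetical helper)."""
    # HeaderParser only splits out the header fields; anything after the first
    # blank line (the long description in a real METADATA file) stays as the
    # raw payload instead of being parsed as fields.
    return HeaderParser().parsestr(text)


meta = parse_metadata(METADATA_0_1_5)
print(meta["Name"], meta["Version"])  # smol-html 0.1.5
# Repeated fields such as Project-URL come back as a list.
print(meta.get_all("Project-URL"))
```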
@@ -28,7 +28,7 @@ Requires-Dist: lxml[html-clean]>=1.3.2
 Requires-Dist: minify-html>=0.2.6
 Description-Content-Type: text/markdown
 
-
+
 
 
 # smol-html
@@ -39,7 +39,7 @@ Small, dependable HTML cleaner/minifier with sensible defaults.
 
 Nosible is a search engine, which means we need to store and process a very large number of webpages. To make this tractable, we strip out visual chrome and other non-essential components that don’t matter for downstream tasks (indexing, ranking, retrieval, and LLM pipelines) while preserving the important content and structure. This package cleans and minifies HTML, greatly reducing size on disk; combined with Brotli compression (by Google), the savings are even larger.
 
-
+
 
 ### 📦 Installation
 
@@ -53,6 +53,15 @@ pip install smol-html
 uv pip install smol-html
 ```
 
+### Requirements
+
+- Python: 3.9
+- Dependencies:
+- beautifulsoup4>=4.0.1
+- brotli>=0.5.2
+- lxml[html-clean]>=1.3.2
+- minify-html>=0.2.6
+
 ## Quick Start
 
 Clean an HTML string (or page contents):
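The new Requirements section lists a Python version and four dependencies. The following is a rough sketch of checking an environment against that list with the standard library's `importlib.metadata`. The helper names `distribution_name` and `installed_versions` are hypothetical. "Python: 3.9" is assumed to mean a minimum version. The sketch only reports whether each dependency is installed and at which version; it does not evaluate the `>=` specifiers.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Requirement strings as listed in the new README "Requirements" section.
REQUIREMENTS = [
    "beautifulsoup4>=4.0.1",
    "brotli>=0.5.2",
    "lxml[html-clean]>=1.3.2",
    "minify-html>=0.2.6",
]

# Assumption: "Python: 3.9" is read as a minimum version.
MIN_PYTHON = (3, 9)

# Leading distribution name, stopping before extras ("[") or specifiers (">=").
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def distribution_name(requirement: str) -> str:
    """Strip extras like "[html-clean]" and specifiers like ">=1.3.2" (hypothetical helper)."""
    match = _NAME_RE.match(requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(0)


def installed_versions(requirements: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    result: dict[str, Optional[str]] = {}
    for req in requirements:
        name = distribution_name(req)
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    ok_python = sys.version_info[:2] >= MIN_PYTHON
    print(f"Python {sys.version.split()[0]}: {'ok' if ok_python else 'too old'}")
    for name, ver in installed_versions(REQUIREMENTS).items():
        print(f"{name}: {ver or 'not installed'}")
```

For full specifier matching, the third-party `packaging` library's `Requirement` class parses the name, extras and specifier from strings such as `lxml[html-clean]>=1.3.2`.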
smol_html-0.1.5.dist-info/RECORD
ADDED
@@ -0,0 +1,4 @@
+smol_html-0.1.5.dist-info/METADATA,sha256=jriTNIRVbdSkr7EXyEa1ssdm_rZmwo4IV_FLVEJTJrE,8539
+smol_html-0.1.5.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+smol_html-0.1.5.dist-info/licenses/LICENSE,sha256=88yg3BujRGq8MYlWhbrzB2YMNWJaXnBck3c7l23labs,1089
+smol_html-0.1.5.dist-info/RECORD,,
smol_html-0.1.3.dist-info/RECORD
DELETED
@@ -1,4 +0,0 @@
-smol_html-0.1.3.dist-info/METADATA,sha256=MApb1E7-tzyEYuRymzRjUTg8TD14vFBsfceTsY07r3s,8279
-smol_html-0.1.3.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
-smol_html-0.1.3.dist-info/licenses/LICENSE,sha256=88yg3BujRGq8MYlWhbrzB2YMNWJaXnBck3c7l23labs,1089
-smol_html-0.1.3.dist-info/RECORD,,
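Each RECORD line has the form `path,sha256=<digest>,<size>`. The digest is the SHA-256 hash encoded as urlsafe base64 with the `=` padding stripped, which is the wheel format's convention. The WHEEL and LICENSE entries carry identical hashes and sizes in both versions, which is consistent with the "File without changes" entries. METADATA's hash and size changed, from 8279 to 8539 bytes.

The following is a minimal sketch of checking a wheel's files against its RECORD using only the standard library. `record_hash` and `verify_wheel` are hypothetical helpers. The demo builds a tiny in-memory archive instead of reading the real smol-html wheel.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_hash(data: bytes) -> str:
    """RECORD-style hash: sha256 digest, urlsafe base64, '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_wheel(wheel: zipfile.ZipFile, record_path: str) -> list[str]:
    """Return the paths whose content does not match their RECORD entry."""
    mismatches = []
    text = wheel.read(record_path).decode("utf-8")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        path, hash_field, size = row
        if not hash_field:  # RECORD lists itself with empty hash and size fields
            continue
        data = wheel.read(path)
        if record_hash(data) != hash_field or len(data) != int(size):
            mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    # Build a tiny in-memory "wheel" so the sketch runs without a download.
    buf = io.BytesIO()
    license_text = b"MIT License\n"
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo-0.1.dist-info/LICENSE", license_text)
        record = (
            f"demo-0.1.dist-info/LICENSE,{record_hash(license_text)},{len(license_text)}\n"
            "demo-0.1.dist-info/RECORD,,\n"
        )
        zf.writestr("demo-0.1.dist-info/RECORD", record)
    with zipfile.ZipFile(buf) as zf:
        print(verify_wheel(zf, "demo-0.1.dist-info/RECORD"))  # []
```

For a downloaded wheel, open it with `zipfile.ZipFile` and pass the RECORD path, for example `smol_html-0.1.5.dist-info/RECORD`.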
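smol_html-0.1.3.dist-info/WHEEL → smol_html-0.1.5.dist-info/WHEEL
File without changes
smol_html-0.1.3.dist-info/licenses/LICENSE → smol_html-0.1.5.dist-info/licenses/LICENSE
File without changes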