justhtml-0.24.0-py3-none-any.whl → justhtml-0.38.0-py3-none-any.whl


Potentially problematic release.

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: justhtml
- Version: 0.24.0
+ Version: 0.38.0
  Summary: A pure Python HTML5 parser that just works.
  Project-URL: Homepage, https://github.com/emilstenstrom/justhtml
  Project-URL: Issues, https://github.com/emilstenstrom/justhtml/issues
@@ -52,7 +52,7 @@ A pure Python HTML5 parser that just works. No C extensions to compile. No syste
  # Requires: [intentionally left blank]
  ```
 
- - **Just... Secure 🔒** — Safe-by-default output for untrusted HTML — built-in Bleach-style allowlist sanitization on `to_text()`, `to_html()`, `to_markdown()` (override with `safe=False`). Can sanitize inline CSS rules. ([Sanitization & Security](docs/sanitization.md))
+ - **Just... Secure 🔒** — Safe-by-default sanitization at construction time — built-in Bleach-style allowlist sanitization on `JustHTML(...)` (disable with `safe=False`). Can sanitize inline CSS rules. ([Sanitization & Security](docs/sanitization.md))
 
  ```python
  JustHTML(
@@ -74,12 +74,29 @@ A pure Python HTML5 parser that just works. No C extensions to compile. No syste
  # => <p class="x">Hi</p>
  ```
 
+ - **Just... Transform 🏗️** — Built-in DOM transforms for: drop/unwrap nodes, rewrite attributes, linkify text, and compose safe pipelines. ([Transforms](docs/transforms.md))
+
+ ```python
+ from justhtml import JustHTML, Linkify, SetAttrs, Unwrap
+
+ doc = JustHTML(
+     "<p>Hello <span class=\"x\">world</span> example.com</p>",
+     transforms=[
+         Unwrap("span.x"),
+         Linkify(),
+         SetAttrs("a", rel="nofollow"),
+     ],
+ )
+ print(doc.to_html(pretty=False))
+ # => <p>Hello world <a href="https://example.com" rel="nofollow">example.com</a></p>
+ ```
+
  - **Just... Fast Enough ⚡** — Fast for the common case (fastest pure-Python HTML5 parser available); for terabytes, use a C/Rust parser like `html5ever`. ([Benchmarks](benchmarks/performance.py))
 
  ```bash
- TIMEFORMAT='%3R s' time curl -Ls https://en.wikipedia.org/wiki/HTML \
- | python -m justhtml - > /dev/null
- # 0.365 s
+ /usr/bin/time -f '%e s' bash -lc \
+ "curl -Ls https://en.wikipedia.org/wiki/HTML | python -m justhtml - > /dev/null"
+ # 0.41 s
  ```
 
  ## Comparison
  ## Comparison
@@ -95,7 +112,7 @@ A pure Python HTML5 parser that just works. No C extensions to compile. No syste
  | **`selectolax`**<br>Python wrapper of C-based Lexbor | 🟡 68% | 🚀 Very Fast | ✅ CSS selectors | ❌ Needs sanitization | Very fast but less compliant. |
  | **`html.parser`**<br>Python stdlib | 🔴 4% | ⚡ Fast | ❌ None | ❌ Needs sanitization | Standard library. Chokes on malformed HTML. |
  | **`BeautifulSoup`**<br>Pure Python | 🔴 4% (default) | 🐢 Slow | 🟡 Custom API | ❌ Needs sanitization | Wraps `html.parser` (default). Can use lxml or html5lib. |
- | **`lxml`**<br>Python wrapper of C-based libxml2 | 🔴 1% | 🚀 Very Fast | 🟡 XPath | 🔴 [Not considered safe](https://lxml-html-clean.readthedocs.io/en/latest/usage.html) | Fast but not HTML5 compliant. |
+ | **`lxml`**<br>Python wrapper of C-based libxml2 | 🔴 1% | 🚀 Very Fast | 🟡 XPath | Needs sanitization | Fast but not HTML5 compliant. Don't use the old lxml.html.clean module! |
 
  [1]: Parser compliance scores are from a strict run of the [html5lib-tests](https://github.com/html5lib/html5lib-tests) tree-construction fixtures (1,743 non-script tests). See [docs/correctness.md](docs/correctness.md) for details.
 
@@ -170,9 +187,13 @@ A pure Python HTML5 parser that just works. No C extensions to compile. No syste
 
  - **Just... Correct ✅** — Spec-perfect HTML5 parsing with browser-grade error recovery — passes the official 9k+ [html5lib-tests](https://github.com/html5lib/html5lib-tests) suite, with 100% line+branch coverage. ([Correctness](/EmilStenstrom/justhtml/blob/main/docs/correctness.md))
  - **Just... Python 🐍** — Pure Python, zero dependencies — no C extensions or system libraries, easy to debug, and works anywhere Python runs (including PyPy and Pyodide). ([Quickstart](/EmilStenstrom/justhtml/blob/main/docs/quickstart.md))
- - **Just... Secure 🔒** — Safe-by-default output for untrusted HTML — built-in Bleach-style allowlist sanitization on `to_html()` / `to_markdown()` (override with `safe=False`), plus URL/CSS rules. ([Sanitization & Security](/EmilStenstrom/justhtml/blob/main/docs/sanitization.md))
+ - **Just... Secure 🔒** — Safe-by-default sanitization at construction time — built-in Bleach-style allowlist sanitization on `JustHTML(...)` (disable with `safe=False`), plus URL/CSS rules. ([Sanitization & Security](/EmilStenstrom/justhtml/blob/main/docs/sanitization.md))
  ```
 
+ ## Security
+
+ For security policy and vulnerability reporting, please see [SECURITY.md](SECURITY.md).
+
  ## Contributing
 
  See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
@@ -0,0 +1,26 @@
+ justhtml/__init__.py,sha256=cyFtwOsxM_m-xG3vNdO4YvBQvEp0HOWUN3EnfGwGotc,1183
+ justhtml/__main__.py,sha256=aupMvpS2_C4b11GcSNm5_JdlDkllaQLE3_CR8ttUmmk,6559
+ justhtml/constants.py,sha256=85cNNHS3fCSwvFGsQSV7uk_G1Ce0llHBkg3sW8k7WZ8,11881
+ justhtml/context.py,sha256=Ac4mV-a3ZgJILQbstFu-EB6bRA5oYlSkHqpTxMlMfk0,293
+ justhtml/encoding.py,sha256=9mscoXtBb57zehG_BxzN6aTTJHaNfywk5gwxrnH92K8,11310
+ justhtml/entities.py,sha256=_cQ3MBrV2hJwAUPVF8JJf7zbrdrxycKOe3Z_thg93Ng,11161
+ justhtml/errors.py,sha256=XVTgiXmfh1tX3PjGKBuhiCQ-72gNVuimBUXexHW9pKo,11045
+ justhtml/linkify.py,sha256=qTrEJ4UeSC8fVbryst6HfZkgAs69YvaNWkM2sB3zS74,14112
+ justhtml/node.py,sha256=A9IetRR8_MC2QCmmcEiAV5nIg97rorUnlDZ9-LfkjOM,27857
+ justhtml/parser.py,sha256=STLG33TkMvb0Z_RH5gUDmcsEjWF_QQ2aabRDXhuUF1I,9984
+ justhtml/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ justhtml/sanitize.py,sha256=D0aOgy_iFtCnyNZjFaxoAoZvLvoHhUWgeL02p_M9d7k,33188
+ justhtml/selector.py,sha256=FLW-rOZwJxGf4uD6ZdHYI7QcEGzstBOOrf-Ubo-37uA,36015
+ justhtml/serialize.py,sha256=AZIGuIFJ8oLfpzz938svNN4wGgxYNA_EGheMzuwoi2s,32766
+ justhtml/stream.py,sha256=n8pKtVAivG0VerCWEcXSEBwzj8Tm1ltEAL7F46RGUVM,3431
+ justhtml/tokenizer.py,sha256=_v3dpjAuq89gjPJMbZLOKgrTc6GmV-QhuDSKGQA_3Pk,107171
+ justhtml/tokens.py,sha256=mk3VBdiula7voCKahRFJ45F14_Qh9Ega-XQ4wwavjMg,7695
+ justhtml/transforms.py,sha256=ptHXJ26AtbGTz0zZNIZQP47JphbATme3TyKK7x-qzw4,95289
+ justhtml/treebuilder.py,sha256=7RQCtHhRTj4uGlALPZtIzVD-ZoEK0ezyn1-Tto9yw3k,60972
+ justhtml/treebuilder_modes.py,sha256=8xupHR4IMaCyLGwKX6lcGDMwalMFlgne3B_fhMvyAE0,98887
+ justhtml/treebuilder_utils.py,sha256=LjK9tg9sNYR-sJdXKemJCzzzgh6lQW1KBqyvhpWtaoQ,2912
+ justhtml-0.38.0.dist-info/METADATA,sha256=yU5XJ-gqssbudTodF55FvNeRQchuPgTu3bFvI7Y9OuU,10171
+ justhtml-0.38.0.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
+ justhtml-0.38.0.dist-info/entry_points.txt,sha256=UN06mPn7J0cBM1dqyf245FvmU9mF3ivgplSr5ppdp6g,52
+ justhtml-0.38.0.dist-info/licenses/LICENSE,sha256=_IBvKQiU5PIZRnE1-yHzMEj41agX8PgoQkbXLaKdVy4,1256
+ justhtml-0.38.0.dist-info/RECORD,,
@@ -2,7 +2,7 @@ MIT License
 
  Copyright (c) 2025 Emil Stenström (JustHTML)
  Copyright (c) 2014-2017, The html5ever Project Developers (html5ever inspiration)
- Copyright (c) 2006-2013 James Graham, Geoffrey Sneddon, and
+ Copyright (c) 2006-2013 James Graham, Sam Sneddon, and
  other contributors (html5lib-tests)
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -1,24 +0,0 @@
- justhtml/__init__.py,sha256=fDm2MolicILd_aORC05rrF0VpROIT7U5DbgyzDwyPhs,586
- justhtml/__main__.py,sha256=2qH55lmN9F14K3bqljm5B0YTvSdTC-r1U5BoymAI-uw,5204
- justhtml/constants.py,sha256=-UATvXXQ7ueFWxJHW79c2eMmMWaSKoqwwcNIGesTAj0,11603
- justhtml/context.py,sha256=Ac4mV-a3ZgJILQbstFu-EB6bRA5oYlSkHqpTxMlMfk0,293
- justhtml/encoding.py,sha256=9mscoXtBb57zehG_BxzN6aTTJHaNfywk5gwxrnH92K8,11310
- justhtml/entities.py,sha256=_cQ3MBrV2hJwAUPVF8JJf7zbrdrxycKOe3Z_thg93Ng,11161
- justhtml/errors.py,sha256=cxoYDDOxGoC_sCIP85pHSDWb1Pm_sfZLALWiTMhb8kc,10754
- justhtml/node.py,sha256=UnavBYOa_T7Yr7CVcb7tK2nVmt3s9Rs0nTlp9xNioMY,26916
- justhtml/parser.py,sha256=huuBeS9bQSjfCyFbfYiLEHVxHLE0XTj3V96rnzb6v_4,6364
- justhtml/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- justhtml/sanitize.py,sha256=efa56hNQct4a5pitj1JunPlKpsyz3DKkg7XFNEDbiXM,24884
- justhtml/selector.py,sha256=ZQDOgHlmPHSBRBZmRVTCDWOqHniy8iZew1MzonAUt3s,35836
- justhtml/serialize.py,sha256=3LSpyRG2IGIvhLsQFstUafBhLIIZiuM53uWw820gBb4,20475
- justhtml/stream.py,sha256=n8pKtVAivG0VerCWEcXSEBwzj8Tm1ltEAL7F46RGUVM,3431
- justhtml/tokenizer.py,sha256=wSiLfu0KtfH6XDV8XN2FsUBQhV1En_zGKb9itdiGa8w,103018
- justhtml/tokens.py,sha256=7SGTlB9mjMFU2QvBPnnGiMJXLk1oiEMvvCmbMUXE_Kc,7051
- justhtml/treebuilder.py,sha256=DZcrEW6p1IkC_jPu0Q0SsgITDIP2G2GVJJGbo2jZdkw,57712
- justhtml/treebuilder_modes.py,sha256=84NzalfDmb6_hwd6a3nBst3S_q1CndEopCt5wCRcpUA,97691
- justhtml/treebuilder_utils.py,sha256=LjK9tg9sNYR-sJdXKemJCzzzgh6lQW1KBqyvhpWtaoQ,2912
- justhtml-0.24.0.dist-info/METADATA,sha256=7wGboRVZbhcPKMinUzk-d_dRUhEPRZk8tSlgusTWeLM,9520
- justhtml-0.24.0.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
- justhtml-0.24.0.dist-info/entry_points.txt,sha256=UN06mPn7J0cBM1dqyf245FvmU9mF3ivgplSr5ppdp6g,52
- justhtml-0.24.0.dist-info/licenses/LICENSE,sha256=QGxhcdDa0J9T8bc3rQFQFR0sY9zPFwRw2X5h3NgBDe0,1261
- justhtml-0.24.0.dist-info/RECORD,,
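
The RECORD entries in this diff use the wheel layout `path,sha256=<digest>,size`, where the digest is URL-safe base64 with the trailing `=` padding removed. A minimal sketch, assuming that layout, of how one entry could be parsed and its hash recomputed from file bytes. The helper names `record_hash` and `parse_record_line` are illustrative only and are not part of justhtml or any packaging tool.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash field for the given file contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 and drops the '=' padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str):
    """Split a RECORD line into (path, hash_field, size).

    Splitting from the right keeps any commas in the path intact.
    The RECORD file's own entry has empty hash and size fields.
    """
    path, hash_field, size = line.rsplit(",", 2)
    return path, hash_field, int(size) if size else None


# The zero-byte py.typed marker hashes to the SHA-256 of empty input.
print(record_hash(b""))
print(parse_record_line("justhtml-0.38.0.dist-info/RECORD,,"))
```

The zero-byte `justhtml/py.typed` entry is a convenient check: its hash field is identical in both RECORD listings and equals `record_hash(b"")`. Entries whose hash or size differ between the two listings, such as `justhtml/parser.py`, are the files whose contents changed between 0.24.0 and 0.38.0.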