htmltree-view 0.2.1__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Hadi Cahyadi
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,237 @@
1
+ Metadata-Version: 2.4
2
+ Name: htmltree-view
3
+ Version: 0.2.1
4
+ Summary: Visualize HTML DOM structure as a depth-limited, colorized ASCII tree
5
+ License: MIT
6
+ Project-URL: Homepage, https://github.com/cumulus13/htmltree
7
+ Project-URL: Repository, https://github.com/cumulus13/htmltree
8
+ Project-URL: Issues, https://github.com/cumulus13/htmltree/issues
9
+ Keywords: html,dom,tree,visualizer,beautifulsoup,cli,debug,structure,ascii
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Environment :: Console
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.8
16
+ Classifier: Programming Language :: Python :: 3.9
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
21
+ Classifier: Topic :: Text Processing :: Markup :: HTML
22
+ Classifier: Topic :: Utilities
23
+ Requires-Python: >=3.8
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Requires-Dist: beautifulsoup4>=4.12
27
+ Provides-Extra: lxml
28
+ Requires-Dist: lxml; extra == "lxml"
29
+ Provides-Extra: html5lib
30
+ Requires-Dist: html5lib; extra == "html5lib"
31
+ Provides-Extra: dev
32
+ Requires-Dist: pytest>=7; extra == "dev"
33
+ Requires-Dist: pytest-cov; extra == "dev"
34
+ Requires-Dist: lxml; extra == "dev"
35
+ Requires-Dist: html5lib; extra == "dev"
36
+ Dynamic: license-file
37
+
38
+ # htmltree-view
39
+
40
+ > Visualize HTML DOM structure as a **depth-limited, colorized ASCII tree** — like the `tree` command, but for HTML files.
41
+
42
+ ```
43
+ <html> lang="en" [ L0 2ch ]
44
+ ├── <head> [ L0 4ch ]
45
+ │ ├── <meta> charset="utf-8" [ L1 empty ]
46
+ │ ├── <meta> name="viewport" content="width=device-width" [ L1 empty ]
47
+ │ ├── <title> [ L1 empty ]
48
+ │ │ └── "My Page"
49
+ │ └── <link> rel="stylesheet" href="style.css" [ L1 empty ]
50
+ └── <body> [ L1 3ch ]
51
+ ├── <header> [ L2 2ch ]
52
+ │ └── … (2 children hidden)
53
+ ├── <main> id="main-content" [ L2 2ch ]
54
+ │ └── … (2 children hidden)
55
+ └── <footer> [ L2 2ch ]
56
+ └── … (2 children hidden)
57
+
58
+ ────────────────────────────────────────────────────
59
+ Tags: 8 Text nodes: 1 Max depth: 2 (capped at 2)
60
+ Top tags: meta×2, html×1, head×1, title×1, link×1
61
+ ```
62
+
63
+ ## Features
64
+
65
+ - **Depth limiting** — `-d N` stops at level N; truncated sub-trees show a `… (X children hidden)` hint
66
+ - **CSS selector zoom** — `-s "#app"` or `-s "body > main"` focuses any sub-tree
67
+ - **Semantic tag colors** — headings in amber, structural in blue, forms in pink, links in cyan, etc.
68
+ - **Depth-cycling pipe colors** — guide lines change shade per nesting level
69
+ - **`[L3 5ch]` badges** — depth level + direct child-tag count on every node
70
+ - **Text nodes** — quoted inline, with `--text-limit` truncation and whitespace collapsing
71
+ - **Attribute filtering** — `--attrs id class href` shows only what you care about; `--attrs` hides all
72
+ - **Attribute value truncation** — `--attr-limit 80` prevents base64/data-URI blowout
73
+ - **HTML comments** — hidden by default, shown with `--show-comments`
74
+ - **URL fetching** — `htmltree https://example.com -d 3`
75
+ - **stdin pipe** — `curl ... | htmltree -` or `echo '<div/>' | htmltree -`
76
+ - **Output to file** — `-o tree.txt` (auto-disables color)
77
+ - **Auto color detection** — ANSI disabled when stdout is not a TTY; respects `NO_COLOR` / `FORCE_COLOR` env vars
78
+ - **Streaming output** — `iter_lines()` yields one line at a time; never builds the full string unless you ask
79
+ - **No recursion** — iterative DFS walk; handles arbitrarily deep HTML without `RecursionError`
80
+ - **Stats summary** — total tags, text nodes, comments, max depth seen, top-5 tag frequencies
81
+
82
+ ## Install
83
+
84
+ ```bash
85
+ pip install htmltree-view
86
+
87
+ # With faster lxml parser:
88
+ pip install "htmltree-view[lxml]"
89
+
90
+ # With html5lib (most spec-accurate):
91
+ pip install "htmltree-view[html5lib]"
92
+ ```
93
+
94
+ ## CLI
95
+
96
+ ```bash
97
+ # Full tree
98
+ htmltree index.html
99
+
100
+ # Limit depth to 3 levels
101
+ htmltree index.html -d 3
102
+
103
+ # Focus on a CSS-selected sub-tree
104
+ htmltree index.html -s "body > main"
105
+ htmltree index.html -s "#app"
106
+ htmltree index.html -s ".container"
107
+
108
+ # Fetch from URL
109
+ htmltree https://example.com -d 4
110
+
111
+ # Read from stdin
112
+ curl https://example.com | htmltree -
113
+ echo '<div><p>hi</p></div>' | htmltree -
114
+
115
+ # Show only id and class attributes
116
+ htmltree index.html --attrs id class
117
+
118
+ # Hide all attributes
119
+ htmltree index.html --attrs
120
+
121
+ # Hide text nodes (structure only)
122
+ htmltree index.html --no-text
123
+
124
+ # Show HTML comments
125
+ htmltree index.html --show-comments
126
+
127
+ # Truncate text/attr at 40 chars
128
+ htmltree index.html --text-limit 40 --attr-limit 40
129
+
130
+ # Save to file (color auto-disabled)
131
+ htmltree index.html -o structure.txt
132
+
133
+ # Pipe to less with color preserved
134
+ htmltree index.html --force-color | less -R
135
+
136
+ # Use lxml backend (faster)
137
+ htmltree index.html --parser lxml
138
+
139
+ # Plain output (no ANSI)
140
+ htmltree index.html --no-color
141
+ ```
142
+
143
+ ## Python API
144
+
145
+ ```python
146
+ from htmltree import HtmlTree
147
+
148
+ html = open("index.html").read()
149
+
150
+ # Basic usage
151
+ tree = HtmlTree(html)
152
+ tree.print()
153
+
154
+ # Limit depth, filter attributes
155
+ tree = HtmlTree(html, max_depth=3, show_attrs=["id", "class"])
156
+ tree.print()
157
+
158
+ # Zoom into a sub-tree
159
+ tree = HtmlTree(html, max_depth=5, show_text=False)
160
+ tree.print(root_selector="body > main")
161
+
162
+ # Render to string
163
+ tree = HtmlTree(html, max_depth=2, force_color=False)
164
+ output = tree.render(root_selector="body")
165
+ print(output)
166
+
167
+ # Stream line by line (memory-efficient for large pages)
168
+ tree = HtmlTree(html, max_depth=4)
169
+ for line in tree.iter_lines(root_selector="#content"):
170
+ print(line)
171
+
172
+ # Access stats after render
173
+ tree.render()
174
+ print(tree.stats.total_tags)
175
+ print(tree.stats.tag_counts) # dict: tag name → count
176
+ print(tree.stats.max_depth_seen)
177
+ print(tree.stats.total_text_nodes)
178
+ print(tree.stats.total_comments)
179
+ ```
180
+
181
+ ## CLI reference
182
+
183
+ | Flag | Default | Description |
184
+ |------|---------|-------------|
185
+ | `SOURCE` | — | HTML file path, http/https URL, or `-` for stdin |
186
+ | `-d N` / `--depth N` | unlimited | Max depth; negatives clamped to 0 |
187
+ | `-s CSS` / `--selector CSS` | `<html>` | CSS selector for tree root |
188
+ | `--attrs [NAME …]` | all | Attributes to show; no names = hide all |
189
+ | `--no-text` | off | Hide text nodes |
190
+ | `--show-comments` | off | Show HTML comment nodes |
191
+ | `--text-limit N` | 60 | Max chars per text node |
192
+ | `--attr-limit N` | 80 | Max chars per attribute value |
193
+ | `--no-color` | off | Disable ANSI colors |
194
+ | `--force-color` | off | Force colors even when piped |
195
+ | `--no-summary` | off | Suppress stats footer |
196
+ | `-o FILE` / `--output FILE` | stdout | Write to file |
197
+ | `--parser BACKEND` | `html.parser` | `html.parser`, `lxml`, `html5lib` |
198
+ | `--version` | — | Print version and exit |
199
+
200
+ ## Tree legend
201
+
202
+ | Symbol | Meaning |
203
+ |--------|---------|
204
+ | `[L3]` | Node is at depth 3 |
205
+ | `[5ch]` | 5 direct tag children |
206
+ | `[empty]` | No children |
207
+ | `"text"` | Text node content (may be truncated) |
208
+ | `<!-- … -->` | HTML comment (with `--show-comments`) |
209
+ | `… (N children hidden)` | Sub-tree cut at depth limit |
210
+
211
+ ## Environment variables
212
+
213
+ | Variable | Effect |
214
+ |----------|--------|
215
+ | `NO_COLOR` | Any non-empty value disables ANSI colors (https://no-color.org/) |
216
+ | `FORCE_COLOR` | Any non-empty value forces ANSI colors even when piped |
217
+
218
+ ## Requirements
219
+
220
+ - Python ≥ 3.8
221
+ - `beautifulsoup4 ≥ 4.12`
222
+ - Optional: `lxml`, `html5lib`
223
+
224
+ ## License
225
+
226
+ [MIT](LICENSE)
227
+
228
+ ## 👤 Author
229
+
230
+ [Hadi Cahyadi](mailto:cumulus13@gmail.com)
231
+
232
+
233
+ [![Buy Me a Coffee](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/cumulus13)
234
+
235
+ [![Donate via Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/cumulus13)
236
+
237
+ [Support me on Patreon](https://www.patreon.com/cumulus13)
@@ -0,0 +1,200 @@
1
+ # htmltree-view
2
+
3
+ > Visualize HTML DOM structure as a **depth-limited, colorized ASCII tree** — like the `tree` command, but for HTML files.
4
+
5
+ ```
6
+ <html> lang="en" [ L0 2ch ]
7
+ ├── <head> [ L0 4ch ]
8
+ │ ├── <meta> charset="utf-8" [ L1 empty ]
9
+ │ ├── <meta> name="viewport" content="width=device-width" [ L1 empty ]
10
+ │ ├── <title> [ L1 empty ]
11
+ │ │ └── "My Page"
12
+ │ └── <link> rel="stylesheet" href="style.css" [ L1 empty ]
13
+ └── <body> [ L1 3ch ]
14
+ ├── <header> [ L2 2ch ]
15
+ │ └── … (2 children hidden)
16
+ ├── <main> id="main-content" [ L2 2ch ]
17
+ │ └── … (2 children hidden)
18
+ └── <footer> [ L2 2ch ]
19
+ └── … (2 children hidden)
20
+
21
+ ────────────────────────────────────────────────────
22
+ Tags: 8 Text nodes: 1 Max depth: 2 (capped at 2)
23
+ Top tags: meta×2, html×1, head×1, title×1, link×1
24
+ ```
25
+
26
+ ## Features
27
+
28
+ - **Depth limiting** — `-d N` stops at level N; truncated sub-trees show a `… (X children hidden)` hint
29
+ - **CSS selector zoom** — `-s "#app"` or `-s "body > main"` focuses any sub-tree
30
+ - **Semantic tag colors** — headings in amber, structural in blue, forms in pink, links in cyan, etc.
31
+ - **Depth-cycling pipe colors** — guide lines change shade per nesting level
32
+ - **`[L3 5ch]` badges** — depth level + direct child-tag count on every node
33
+ - **Text nodes** — quoted inline, with `--text-limit` truncation and whitespace collapsing
34
+ - **Attribute filtering** — `--attrs id class href` shows only what you care about; `--attrs` hides all
35
+ - **Attribute value truncation** — `--attr-limit 80` prevents base64/data-URI blowout
36
+ - **HTML comments** — hidden by default, shown with `--show-comments`
37
+ - **URL fetching** — `htmltree https://example.com -d 3`
38
+ - **stdin pipe** — `curl ... | htmltree -` or `echo '<div/>' | htmltree -`
39
+ - **Output to file** — `-o tree.txt` (auto-disables color)
40
+ - **Auto color detection** — ANSI disabled when stdout is not a TTY; respects `NO_COLOR` / `FORCE_COLOR` env vars
41
+ - **Streaming output** — `iter_lines()` yields one line at a time; never builds the full string unless you ask
42
+ - **No recursion** — iterative DFS walk; handles arbitrarily deep HTML without `RecursionError`
43
+ - **Stats summary** — total tags, text nodes, comments, max depth seen, top-5 tag frequencies
44
+
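The "No recursion" and "Depth limiting" bullets above describe an iterative depth-first walk that stops at a depth cap. The sketch below illustrates that idea only. It uses a hypothetical `(tag, children)` tuple as the node shape and a hypothetical `walk` generator, since the README does not show htmltree's internal node types or function names.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical node shape for illustration: (tag_name, [child nodes]).
Node = Tuple[str, list]


def walk(root: Node, max_depth: Optional[int] = None) -> Iterator[str]:
    """Yield one indented line per node using an explicit stack (no recursion)."""
    stack: List[Tuple[Node, int]] = [(root, 0)]
    while stack:
        (name, children), depth = stack.pop()
        yield "  " * depth + f"<{name}>"
        if max_depth is not None and depth >= max_depth:
            # At the cap: report how many children were skipped, then stop descending.
            if children:
                yield "  " * (depth + 1) + f"… ({len(children)} children hidden)"
            continue
        # Push in reverse so the first child is popped (visited) first.
        for child in reversed(children):
            stack.append((child, depth + 1))


# A chain far deeper than Python's default recursion limit.
deep: Node = ("div", [])
for _ in range(5000):
    deep = ("div", [deep])

doc: Node = ("html", [("head", [("title", [])]), ("body", [("main", [("p", [])])])])
lines = list(walk(doc, max_depth=2))
```

Because `walk` is a generator, a caller can print lines as they are produced or join them into one string, which is the trade-off the "Streaming output" bullet refers to.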
45
+ ## Install
46
+
47
+ ```bash
48
+ pip install htmltree-view
49
+
50
+ # With faster lxml parser:
51
+ pip install "htmltree-view[lxml]"
52
+
53
+ # With html5lib (most spec-accurate):
54
+ pip install "htmltree-view[html5lib]"
55
+ ```
56
+
57
+ ## CLI
58
+
59
+ ```bash
60
+ # Full tree
61
+ htmltree index.html
62
+
63
+ # Limit depth to 3 levels
64
+ htmltree index.html -d 3
65
+
66
+ # Focus on a CSS-selected sub-tree
67
+ htmltree index.html -s "body > main"
68
+ htmltree index.html -s "#app"
69
+ htmltree index.html -s ".container"
70
+
71
+ # Fetch from URL
72
+ htmltree https://example.com -d 4
73
+
74
+ # Read from stdin
75
+ curl https://example.com | htmltree -
76
+ echo '<div><p>hi</p></div>' | htmltree -
77
+
78
+ # Show only id and class attributes
79
+ htmltree index.html --attrs id class
80
+
81
+ # Hide all attributes
82
+ htmltree index.html --attrs
83
+
84
+ # Hide text nodes (structure only)
85
+ htmltree index.html --no-text
86
+
87
+ # Show HTML comments
88
+ htmltree index.html --show-comments
89
+
90
+ # Truncate text/attr at 40 chars
91
+ htmltree index.html --text-limit 40 --attr-limit 40
92
+
93
+ # Save to file (color auto-disabled)
94
+ htmltree index.html -o structure.txt
95
+
96
+ # Pipe to less with color preserved
97
+ htmltree index.html --force-color | less -R
98
+
99
+ # Use lxml backend (faster)
100
+ htmltree index.html --parser lxml
101
+
102
+ # Plain output (no ANSI)
103
+ htmltree index.html --no-color
104
+ ```
105
+
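The `--attrs` examples above rely on three distinct states: flag absent (show all attributes), flag with no names (hide all), and flag with names (show only those). One way to get that behaviour from `argparse` is sketched below. The parser name and the `build_parser` / `parse` helpers are assumptions for illustration, not the package's actual CLI code.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="htmltree-sketch")
    parser.add_argument("source")
    # nargs="*" with default=None gives three distinguishable states:
    #   flag absent      -> None        (show every attribute)
    #   flag, no names   -> []          (hide every attribute)
    #   flag with names  -> ["id", ...] (show only those)
    parser.add_argument("--attrs", nargs="*", default=None, metavar="NAME")
    parser.add_argument("-d", "--depth", type=int, default=None)
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.depth is not None:
        # The CLI reference says negative depths are clamped to 0.
        args.depth = max(0, args.depth)
    return args
```

In this sketch `SOURCE` should come before `--attrs`, because `nargs="*"` would otherwise consume the file name as an attribute name.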
106
+ ## Python API
107
+
108
+ ```python
109
+ from htmltree import HtmlTree
110
+
111
+ html = open("index.html").read()
112
+
113
+ # Basic usage
114
+ tree = HtmlTree(html)
115
+ tree.print()
116
+
117
+ # Limit depth, filter attributes
118
+ tree = HtmlTree(html, max_depth=3, show_attrs=["id", "class"])
119
+ tree.print()
120
+
121
+ # Zoom into a sub-tree
122
+ tree = HtmlTree(html, max_depth=5, show_text=False)
123
+ tree.print(root_selector="body > main")
124
+
125
+ # Render to string
126
+ tree = HtmlTree(html, max_depth=2, force_color=False)
127
+ output = tree.render(root_selector="body")
128
+ print(output)
129
+
130
+ # Stream line by line (memory-efficient for large pages)
131
+ tree = HtmlTree(html, max_depth=4)
132
+ for line in tree.iter_lines(root_selector="#content"):
133
+ print(line)
134
+
135
+ # Access stats after render
136
+ tree.render()
137
+ print(tree.stats.total_tags)
138
+ print(tree.stats.tag_counts) # dict: tag name → count
139
+ print(tree.stats.max_depth_seen)
140
+ print(tree.stats.total_text_nodes)
141
+ print(tree.stats.total_comments)
142
+ ```
143
+
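The stats footer in the sample output lists the most frequent tags as `name×count`, and `tree.stats.tag_counts` is documented above as a dict of tag name to count. A minimal sketch of turning such a dict into that "Top tags" line follows. The `top_tags` helper is hypothetical, and the tie-breaking rule (first-seen order) is an assumption.

```python
from typing import Dict


def top_tags(tag_counts: Dict[str, int], n: int = 5) -> str:
    """Format the n most frequent tags as 'name×count', most frequent first."""
    # sorted() is stable, so tags with equal counts keep their insertion order.
    ranked = sorted(tag_counts.items(), key=lambda item: -item[1])[:n]
    return ", ".join(f"{name}×{count}" for name, count in ranked)


counts = {"html": 1, "head": 1, "meta": 2, "title": 1, "link": 1, "body": 1}
summary = top_tags(counts)
```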
144
+ ## CLI reference
145
+
146
+ | Flag | Default | Description |
147
+ |------|---------|-------------|
148
+ | `SOURCE` | — | HTML file path, http/https URL, or `-` for stdin |
149
+ | `-d N` / `--depth N` | unlimited | Max depth; negatives clamped to 0 |
150
+ | `-s CSS` / `--selector CSS` | `<html>` | CSS selector for tree root |
151
+ | `--attrs [NAME …]` | all | Attributes to show; no names = hide all |
152
+ | `--no-text` | off | Hide text nodes |
153
+ | `--show-comments` | off | Show HTML comment nodes |
154
+ | `--text-limit N` | 60 | Max chars per text node |
155
+ | `--attr-limit N` | 80 | Max chars per attribute value |
156
+ | `--no-color` | off | Disable ANSI colors |
157
+ | `--force-color` | off | Force colors even when piped |
158
+ | `--no-summary` | off | Suppress stats footer |
159
+ | `-o FILE` / `--output FILE` | stdout | Write to file |
160
+ | `--parser BACKEND` | `html.parser` | `html.parser`, `lxml`, `html5lib` |
161
+ | `--version` | — | Print version and exit |
162
+
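The `--text-limit` and `--attr-limit` rows above cap text nodes at 60 characters and attribute values at 80 by default, and the feature list mentions whitespace collapsing for text nodes. A rough sketch of both steps is below. The helper names and the choice to append a `…` marker after the cut are assumptions, as the README does not say how truncated values are marked.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", text).strip()


def truncate(value: str, limit: int, marker: str = "…") -> str:
    """Keep at most `limit` characters, appending a marker when text was cut."""
    if len(value) <= limit:
        return value
    return value[:limit] + marker


text_node = truncate(collapse_whitespace("  My\n   Page  "), 60)
data_uri = truncate("data:image/png;base64," + "A" * 500, 80)
```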
163
+ ## Tree legend
164
+
165
+ | Symbol | Meaning |
166
+ |--------|---------|
167
+ | `[L3]` | Node is at depth 3 |
168
+ | `[5ch]` | 5 direct tag children |
169
+ | `[empty]` | No children |
170
+ | `"text"` | Text node content (may be truncated) |
171
+ | `<!-- … -->` | HTML comment (with `--show-comments`) |
172
+ | `… (N children hidden)` | Sub-tree cut at depth limit |
173
+
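The legend entries combine into one badge per node, as in the sample output (`[ L0 2ch ]`, `[ L1 empty ]`). A small sketch of that formatting follows. `badge` is a hypothetical helper, and the spacing is copied from the sample tree rather than from the package source.

```python
def badge(depth: int, child_tags: int) -> str:
    """Build a '[ L<depth> <n>ch ]' badge, or '[ L<depth> empty ]' with no tag children."""
    children = f"{child_tags}ch" if child_tags > 0 else "empty"
    return f"[ L{depth} {children} ]"
```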
174
+ ## Environment variables
175
+
176
+ | Variable | Effect |
177
+ |----------|--------|
178
+ | `NO_COLOR` | Any non-empty value disables ANSI colors (https://no-color.org/) |
179
+ | `FORCE_COLOR` | Any non-empty value forces ANSI colors even when piped |
180
+
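The table above, together with the "Auto color detection" feature, implies a decision based on two environment variables and whether the output stream is a TTY. One plausible ordering is sketched below. The `use_color` function is hypothetical, and giving `NO_COLOR` precedence over `FORCE_COLOR` is an assumption, since the README does not state which wins when both are set.

```python
import io
import os
import sys
from typing import Mapping, Optional, TextIO


def use_color(stream: TextIO = sys.stdout,
              env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to emit ANSI colors for the given stream."""
    env = os.environ if env is None else env
    if env.get("NO_COLOR"):      # any non-empty value disables color
        return False
    if env.get("FORCE_COLOR"):   # any non-empty value forces color
        return True
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())  # otherwise color only on a real terminal


piped = io.StringIO()  # stands in for a non-TTY stdout
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.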
181
+ ## Requirements
182
+
183
+ - Python ≥ 3.8
184
+ - `beautifulsoup4 ≥ 4.12`
185
+ - Optional: `lxml`, `html5lib`
186
+
187
+ ## License
188
+
189
+ [MIT](LICENSE)
190
+
191
+ ## 👤 Author
192
+
193
+ [Hadi Cahyadi](mailto:cumulus13@gmail.com)
194
+
195
+
196
+ [![Buy Me a Coffee](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/cumulus13)
197
+
198
+ [![Donate via Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/cumulus13)
199
+
200
+ [Support me on Patreon](https://www.patreon.com/cumulus13)
@@ -0,0 +1,29 @@
1
+ #!/usr/bin/env python3
2
+
3
+ # File: htmltree/__init__.py
4
+ # Author: Hadi Cahyadi <cumulus13@gmail.com>
5
+ # Date: 2026-06-28
6
+ # Description: htmltree-view — Visualize HTML DOM structure as a depth-limited, colorized ASCII tree.
7
+ # License: MIT
8
+
9
+ """
10
+ htmltree-view — Visualize HTML DOM structure as a depth-limited, colorized ASCII tree.
11
+
12
+ Quick start
13
+ -----------
14
+ >>> from htmltree import HtmlTree
15
+ >>> tree = HtmlTree(open("index.html").read(), max_depth=3)
16
+ >>> tree.print()
17
+
18
+ CLI
19
+ ---
20
+ htmltree index.html -d 3
21
+ htmltree https://example.com --no-text
22
+ echo '<div><p>hi</p></div>' | htmltree -
23
+ """
24
+
25
+ from .core import HtmlTree, TreeStats
26
+ from .cli import main
27
+
28
+ __version__ = "0.2.1"
29
+ __all__ = ["HtmlTree", "TreeStats", "main"]
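A hard-coded `__version__` has to be kept in step with the `Version:` field in the package metadata by hand. One common way to avoid drift is to read the installed distribution's version at import time with `importlib.metadata`, which is available from Python 3.8. This is shown as an option, not as something this package does. `package_version` and its fallback value are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str, fallback: str = "0.0.0") -> str:
    """Return the installed distribution's version, or a fallback if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Happens when running from a source checkout that was never installed.
        return fallback
```

In `htmltree/__init__.py` this would be used as `__version__ = package_version("htmltree-view")`.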