html-to-markdown 1.4.0__py3-none-any.whl → 1.6.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

@@ -1,249 +0,0 @@
- Metadata-Version: 2.4
- Name: html-to-markdown
- Version: 1.4.0
- Summary: A modern, type-safe Python library for converting HTML to Markdown with comprehensive tag support and customizable options
- Author-email: Na'aman Hirschfeld <nhirschfeld@gmail.com>
- License: MIT
- Project-URL: Changelog, https://github.com/Goldziher/html-to-markdown/releases
- Project-URL: Homepage, https://github.com/Goldziher/html-to-markdown
- Project-URL: Issues, https://github.com/Goldziher/html-to-markdown/issues
- Project-URL: Repository, https://github.com/Goldziher/html-to-markdown.git
- Keywords: beautifulsoup,cli-tool,converter,html,html2markdown,markdown,markup,text-extraction,text-processing
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Environment :: Console
- Classifier: Intended Audience :: Developers
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Programming Language :: Python :: 3 :: Only
- Classifier: Programming Language :: Python :: 3.9
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Programming Language :: Python :: 3.13
- Classifier: Topic :: Internet :: WWW/HTTP
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
- Classifier: Topic :: Text Processing
- Classifier: Topic :: Text Processing :: Markup
- Classifier: Topic :: Text Processing :: Markup :: HTML
- Classifier: Topic :: Text Processing :: Markup :: Markdown
- Classifier: Topic :: Utilities
- Classifier: Typing :: Typed
- Requires-Python: >=3.9
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: beautifulsoup4>=4.13.4
- Dynamic: license-file
-
- # html-to-markdown
-
- A modern, fully typed Python library for converting HTML to Markdown. This library is a completely rewritten fork
- of [markdownify](https://pypi.org/project/markdownify/) with a modernized codebase, strict type safety and support for
- Python 3.9+.
-
- ## Features
-
- - Full type safety with strict MyPy adherence
- - Functional API design
- - Extensive test coverage
- - Configurable conversion options
- - CLI tool for easy conversions
- - Support for pre-configured BeautifulSoup instances
- - Strict semver versioning
-
- ## Installation
-
- ```shell
- pip install html-to-markdown
- ```
-
- ## Quick Start
-
- Convert HTML to Markdown with a single function call:
-
- ```python
- from html_to_markdown import convert_to_markdown
-
- html = """
- <article>
- <h1>Welcome</h1>
- <p>This is a <strong>sample</strong> with a <a href="https://example.com">link</a>.</p>
- <ul>
- <li>Item 1</li>
- <li>Item 2</li>
- </ul>
- </article>
- """
-
- markdown = convert_to_markdown(html)
- print(markdown)
- ```
-
- Output:
-
- ```markdown
- # Welcome
-
- This is a **sample** with a [link](https://example.com).
-
- * Item 1
- * Item 2
- ```
-
- ### Working with BeautifulSoup
-
- If you need more control over HTML parsing, you can pass a pre-configured BeautifulSoup instance:
-
- ```python
- from bs4 import BeautifulSoup
- from html_to_markdown import convert_to_markdown
-
- # Configure BeautifulSoup with your preferred parser
- soup = BeautifulSoup(html, "lxml") # Note: lxml requires additional installation
- markdown = convert_to_markdown(soup)
- ```
-
- ## Advanced Usage
-
- ### Customizing Conversion Options
-
- The library offers extensive customization through various options:
-
- ```python
- from html_to_markdown import convert_to_markdown
-
- html = "<div>Your content here...</div>"
- markdown = convert_to_markdown(
-     html,
-     heading_style="atx", # Use # style headers
-     strong_em_symbol="*", # Use * for bold/italic
-     bullets="*+-", # Define bullet point characters
-     wrap=True, # Enable text wrapping
-     wrap_width=100, # Set wrap width
-     escape_asterisks=True, # Escape * characters
-     code_language="python", # Default code block language
- )
- ```
-
- ### Custom Converters
-
- You can provide your own conversion functions for specific HTML tags:
-
- ```python
- from bs4.element import Tag
- from html_to_markdown import convert_to_markdown
-
- # Define a custom converter for the <b> tag
- def custom_bold_converter(*, tag: Tag, text: str, **kwargs) -> str:
-     return f"IMPORTANT: {text}"
-
- html = "<p>This is a <b>bold statement</b>.</p>"
- markdown = convert_to_markdown(html, custom_converters={"b": custom_bold_converter})
- print(markdown)
- # Output: This is a IMPORTANT: bold statement.
- ```
-
- Custom converters take precedence over the built-in converters and can be used alongside other configuration options.
-
- ### Configuration Options
-
- | Option | Type | Default | Description |
- | -------------------- | ---- | -------------- | ------------------------------------------------------ |
- | `autolinks` | bool | `True` | Auto-convert URLs to Markdown links |
- | `bullets` | str | `'*+-'` | Characters to use for bullet points |
- | `code_language` | str | `''` | Default language for code blocks |
- | `heading_style` | str | `'underlined'` | Header style (`'underlined'`, `'atx'`, `'atx_closed'`) |
- | `escape_asterisks` | bool | `True` | Escape * characters |
- | `escape_underscores` | bool | `True` | Escape _ characters |
- | `wrap` | bool | `False` | Enable text wrapping |
- | `wrap_width` | int | `80` | Text wrap width |
-
- For a complete list of options, see the [Configuration](#configuration) section below.
-
- ## CLI Usage
-
- Convert HTML files directly from the command line:
-
- ```shell
- # Convert a file
- html_to_markdown input.html > output.md
-
- # Process stdin
- cat input.html | html_to_markdown > output.md
-
- # Use custom options
- html_to_markdown --heading-style atx --wrap --wrap-width 100 input.html > output.md
- ```
-
- View all available options:
-
- ```shell
- html_to_markdown --help
- ```
-
- ## Migration from Markdownify
-
- For existing projects using Markdownify, a compatibility layer is provided:
-
- ```python
- # Old code
- from markdownify import markdownify as md
-
- # New code - works the same way
- from html_to_markdown import markdownify as md
- ```
-
- The `markdownify` function is an alias for `convert_to_markdown` and provides identical functionality.
-
- ## Configuration
-
- Full list of configuration options:
-
- - `autolinks`: Convert valid URLs to Markdown links automatically
- - `bullets`: Characters to use for bullet points in lists
- - `code_language`: Default language for fenced code blocks
- - `code_language_callback`: Function to determine code block language
- - `convert`: List of HTML tags to convert (None = all supported tags)
- - `default_title`: Use default titles for elements like links
- - `escape_asterisks`: Escape * characters
- - `escape_misc`: Escape miscellaneous Markdown characters
- - `escape_underscores`: Escape _ characters
- - `heading_style`: Header style (underlined/atx/atx_closed)
- - `keep_inline_images_in`: Tags where inline images should be kept
- - `newline_style`: Style for handling newlines (spaces/backslash)
- - `strip`: Tags to remove from output
- - `strong_em_symbol`: Symbol for strong/emphasized text (\* or \_)
- - `sub_symbol`: Symbol for subscript text
- - `sup_symbol`: Symbol for superscript text
- - `wrap`: Enable text wrapping
- - `wrap_width`: Width for text wrapping
- - `convert_as_inline`: Treat content as inline elements
- - `custom_converters`: A mapping of HTML tag names to custom converter functions
-
- ## Contribution
-
- This library is open to contribution. Feel free to open issues or submit PRs. It's better to discuss issues before
- submitting PRs to avoid disappointment.
-
- ### Local Development
-
- 1. Clone the repo
-
- 1. Install the system dependencies
-
- 1. Install the full dependencies with `uv sync`
-
- 1. Install the pre-commit hooks with:
-
- ```shell
- pre-commit install && pre-commit install --hook-type commit-msg
- ```
-
- 1. Make your changes and submit a PR
-
- ## License
-
- This library uses the MIT license.
-
- ## Acknowledgments
-
- Special thanks to the original [markdownify](https://pypi.org/project/markdownify/) project creators and contributors.
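
The removed `METADATA` file above follows the usual core-metadata layout: RFC 822-style header fields, a blank line, then the README as the body. The sketch below shows one way such a file could be read with Python's standard `email` parser. It is only an illustration: `SAMPLE` is a shortened excerpt of the fields in this hunk, and `read_core_metadata` is a hypothetical helper name, not something shipped by html-to-markdown.

```python
from email import message_from_string

# Shortened excerpt of the METADATA fields in the hunk above (illustrative only).
SAMPLE = """\
Metadata-Version: 2.4
Name: html-to-markdown
Version: 1.4.0
License: MIT
Classifier: Programming Language :: Python :: 3.9
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: beautifulsoup4>=4.13.4

# html-to-markdown
"""


def read_core_metadata(text: str) -> dict:
    """Hypothetical helper: split METADATA text into header fields and body."""
    msg = message_from_string(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Classifier and Requires-Dist may repeat, so collect every value.
        "classifiers": msg.get_all("Classifier", []),
        "requires_dist": msg.get_all("Requires-Dist", []),
        # Everything after the first blank line is the long description.
        "description": msg.get_payload(),
    }


info = read_core_metadata(SAMPLE)
print(info["name"], info["version"], info["requires_dist"])
```

Reading the fields this way makes it easy to compare two releases field by field (for example `Requires-Dist` or `Requires-Python`) instead of scanning the raw hunk.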
@@ -1,14 +0,0 @@
- html_to_markdown/__init__.py,sha256=95S7_7mR_g88uTnFI0FaRNykrtAaSKb6sJbwSea2zjk,145
- html_to_markdown/__main__.py,sha256=DJyJX7NIK0BVPNS2r3BYJ0Ci_lKHhgVOpw7ZEqACH3c,323
- html_to_markdown/cli.py,sha256=Kfh2sF_ySE_fQ0qdwvUZ5Rqx-P4Y12uTpG8xF60gAq0,4789
- html_to_markdown/constants.py,sha256=Usk67k18tuRovJpKDsiEXdgH20KgqI9KOnK4Fbx-M5c,547
- html_to_markdown/converters.py,sha256=SHRAV1qIFQQdXSD_TToR_F_t8hw3-amz8rIs2Q84YbQ,12276
- html_to_markdown/processing.py,sha256=mzF6YNqhj2VoRN6_TafnZ4ZndOyFglsZXTNnOl4uvWM,10564
- html_to_markdown/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- html_to_markdown/utils.py,sha256=HJUDej5HSpXRtYv-OkCyD0hwnPnVfQCwY6rBRlIOt9s,1989
- html_to_markdown-1.4.0.dist-info/licenses/LICENSE,sha256=3J_HR5BWvUM1mlIrlkF32-uC1FM64gy8JfG17LBuheQ,1122
- html_to_markdown-1.4.0.dist-info/METADATA,sha256=LmjDer-QQkH6kSlkua1NBhnaKpIu4sVvyJZPPX5PgLk,8229
- html_to_markdown-1.4.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- html_to_markdown-1.4.0.dist-info/entry_points.txt,sha256=xmFijrTfgYW7lOrZxZGRPciicQHa5KiXKkUhBCmICtQ,116
- html_to_markdown-1.4.0.dist-info/top_level.txt,sha256=Ev6djb1c4dSKr_-n4K-FpEGDkzBigXY6LuZ5onqS7AE,17
- html_to_markdown-1.4.0.dist-info/RECORD,,
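
Each `RECORD` row above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. The sketch below shows how one row could be checked against file contents. `record_digest` and `row_matches` are hypothetical helper names introduced for illustration, and the only row exercised is the `py.typed` entry, whose contents are known to be empty from its recorded size of 0.

```python
import base64
import csv
import hashlib
import io


def record_digest(data: bytes) -> str:
    """Hypothetical helper: format a SHA-256 digest the way RECORD stores it."""
    raw = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def row_matches(row: str, data: bytes) -> bool:
    """Hypothetical helper: check one RECORD row against the file's bytes."""
    # RECORD is a CSV file, so let the csv module handle any quoting.
    path, digest, size = next(csv.reader(io.StringIO(row)))
    return digest == record_digest(data) and int(size) == len(data)


# Row copied from the RECORD hunk above; py.typed is an empty marker file.
ROW = "html_to_markdown/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
print(row_matches(ROW, b""))
```

The final `RECORD,,` row has an empty digest and size because the file cannot contain its own hash, so a real checker would need to skip that row rather than pass it to `row_matches`.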