@kreuzberg/html-to-markdown-wasm 3.4.0 → 3.5.0-rc.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright 2024-2025 Na'aman Hirschfeld
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,233 @@
1
+ # html-to-markdown
2
+
3
+ <div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
4
+ <a href="https://github.com/kreuzberg-dev/alef">
5
+ <img src="https://img.shields.io/badge/Bindings-alef%20%D7%90-007ec6" alt="Bindings">
6
+ </a>
7
+ <!-- Language Bindings -->
8
+ <a href="https://crates.io/crates/html-to-markdown-rs">
9
+ <img src="https://img.shields.io/crates/v/html-to-markdown-rs?label=Rust&color=007ec6" alt="Rust">
10
+ </a>
11
+ <a href="https://pypi.org/project/html-to-markdown/">
12
+ <img src="https://img.shields.io/pypi/v/html-to-markdown?label=Python&color=007ec6" alt="Python">
13
+ </a>
14
+ <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node">
15
+ <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-node?label=Node.js&color=007ec6" alt="Node.js">
16
+ </a>
17
+ <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm">
18
+ <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-wasm?label=WASM&color=007ec6" alt="WASM">
19
+ </a>
20
+ <a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown">
21
+ <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown?label=Java&color=007ec6" alt="Java">
22
+ </a>
23
+ <a href="https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown">
24
+ <img src="https://img.shields.io/github/v/tag/kreuzberg-dev/html-to-markdown?label=Go&color=007ec6&filter=v3*" alt="Go">
25
+ </a>
26
+ <a href="https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/">
27
+ <img src="https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown?label=C%23&color=007ec6" alt="C#">
28
+ </a>
29
+ <a href="https://packagist.org/packages/kreuzberg-dev/html-to-markdown">
30
+ <img src="https://img.shields.io/packagist/v/kreuzberg-dev/html-to-markdown?label=PHP&color=007ec6" alt="PHP">
31
+ </a>
32
+ <a href="https://rubygems.org/gems/html-to-markdown">
33
+ <img src="https://img.shields.io/gem/v/html-to-markdown?label=Ruby&color=007ec6" alt="Ruby">
34
+ </a>
35
+ <a href="https://hex.pm/packages/html_to_markdown">
36
+ <img src="https://img.shields.io/hexpm/v/html_to_markdown?label=Elixir&color=007ec6" alt="Elixir">
37
+ </a>
38
+ <a href="https://kreuzberg-dev.r-universe.dev/htmltomarkdown">
39
+ <img src="https://img.shields.io/badge/R-htmltomarkdown-007ec6" alt="R">
40
+ </a>
41
+ <a href="https://pub.dev/packages/h2m">
42
+ <img src="https://img.shields.io/pub/v/h2m?label=Dart&color=007ec6" alt="Dart">
43
+ </a>
44
+ <a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown-android">
45
+ <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown-android?label=Kotlin&color=007ec6" alt="Kotlin">
46
+ </a>
47
+ <a href="https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/swift">
48
+ <img src="https://img.shields.io/badge/Swift-SPM-007ec6" alt="Swift">
49
+ </a>
50
+ <a href="https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/zig">
51
+ <img src="https://img.shields.io/badge/Zig-package-007ec6" alt="Zig">
52
+ </a>
53
+ <a href="https://github.com/kreuzberg-dev/html-to-markdown/releases">
54
+ <img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C FFI">
55
+ </a>
56
+
57
+ <!-- Project Info -->
58
+ <a href="https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE">
59
+ <img src="https://img.shields.io/badge/License-MIT-007ec6" alt="License">
60
+ </a>
61
+ <a href="https://docs.html-to-markdown.kreuzberg.dev">
62
+ <img src="https://img.shields.io/badge/Docs-html--to--markdown-007ec6" alt="Documentation">
63
+ </a>
64
+ </div>
65
+
66
+ <div align="center" style="margin: 24px 0 0;">
67
+ <a href="https://kreuzberg.dev">
68
+ <img alt="html-to-markdown" src="https://github.com/user-attachments/assets/478a83da-237b-446b-b3a8-e564c13e00a8" />
69
+ </a>
70
+ </div>
71
+
72
+ <div align="center" style="display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;">
73
+ <a href="https://discord.gg/xt9WY3GnKR">
74
+ <img height="22" src="https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord&logoColor=white" alt="Join Discord">
75
+ </a>
76
+ <a href="https://docs.html-to-markdown.kreuzberg.dev/demo/">
77
+ <img height="22" src="https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly&logoColor=white" alt="Live Demo">
78
+ </a>
79
+ </div>
80
+
81
+ Pure WebAssembly build of html-to-markdown for browsers, Deno, Cloudflare Workers, and other JS
82
+ runtimes without a Node.js native-addon ABI. Single `.wasm` artifact loaded via `wasm-bindgen`,
83
+ shipped with TypeScript types and dist targets for nodejs, web, bundler, and deno.
84
+
85
+ ## What This Package Provides
86
+
87
+ - **Same renderer as every binding** — output matches Rust, Python, Node.js, Ruby, PHP, Go, Java, .NET, Elixir, R, Dart, Swift, Zig, C FFI, and WASM.
88
+ - **Structured conversion result** — Markdown plus metadata, links, headings, images, tables, and warnings where the binding exposes them.
89
+ - **Production defaults** — HTML is parsed with the Rust core, sanitized by default, and rendered without runtime-specific Markdown drift.
90
+
91
+ ## Installation
92
+
93
+ ```bash
94
+ pnpm add @kreuzberg/html-to-markdown-wasm
95
+ ```
96
+
97
+ ## Performance Snapshot
98
+
99
+ ## Quick Start
100
+
101
+ Basic conversion:
102
+
103
+ ```javascript
104
+ import init, { convert } from "@kreuzberg/html-to-markdown-wasm";
105
+
106
+ await init();
107
+
108
+ const html = "<h1>Hello</h1><p>This is <strong>fast</strong>!</p>";
109
+ const result = convert(html);
110
+ const markdown = result.content;
111
+ console.log(markdown);
112
+ ```
113
+
114
+ With conversion options:
115
+
116
+ ```javascript
117
+ import init, { convert } from "@kreuzberg/html-to-markdown-wasm";
118
+
119
+ await init();
120
+
121
+ const result = convert('<h1>Hello</h1><img src="pic.jpg">', {
122
+ headingStyle: "atx",
123
+ skipImages: true,
124
+ });
125
+ const markdown = result.content;
126
+ console.log(markdown);
127
+ ```
128
+
129
+ ## API Reference
130
+
131
+ ### Core Function
132
+
133
+ ### Options
134
+
135
+ **`ConversionOptions`** – Key configuration fields:
136
+
137
+ - `heading_style`: Heading format (`"underlined"` | `"atx"` | `"atx_closed"`) — default: `"underlined"`
138
+ - `list_indent_width`: Spaces per indent level — default: `2`
139
+ - `bullets`: Bullet characters cycle — default: `"*+-"`
140
+ - `wrap`: Enable text wrapping — default: `false`
141
+ - `wrap_width`: Wrap at column — default: `80`
142
+ - `code_language`: Default fenced code block language — default: none
143
+ - `extract_metadata`: Enable metadata extraction into `result.metadata` — default: `false`
144
+ - `extract_tables`: Enable structured table extraction into `result.tables` — default: `false`
145
+ - `output_format`: Output markup format (`"markdown"` | `"djot"` | `"plain"`) — default: `"markdown"`
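
One way to work with the fields listed above is to merge caller overrides over the documented defaults and reject unknown enum values before calling `convert()`. The sketch below does that with a hypothetical `resolveOptions` helper, which is not part of the package. The key names and defaults are copied from the list above; Quick Start spells keys in camelCase (`headingStyle`), so which spelling the WASM build accepts is an assumption here.

```javascript
// Defaults copied from the option list above (`code_language` has no default).
const DEFAULTS = Object.freeze({
  heading_style: "underlined",
  list_indent_width: 2,
  bullets: "*+-",
  wrap: false,
  wrap_width: 80,
  extract_metadata: false,
  extract_tables: false,
  output_format: "markdown",
});

// Enum-valued fields and the values the list above documents for them.
const ALLOWED = {
  heading_style: ["underlined", "atx", "atx_closed"],
  output_format: ["markdown", "djot", "plain"],
};

// Hypothetical helper, not part of the package API: merges overrides over
// the documented defaults and validates the enum-valued fields.
function resolveOptions(overrides = {}) {
  const merged = { ...DEFAULTS, ...overrides };
  for (const [key, allowed] of Object.entries(ALLOWED)) {
    if (!allowed.includes(merged[key])) {
      throw new RangeError(`${key} must be one of: ${allowed.join(", ")}`);
    }
  }
  return merged;
}

const options = resolveOptions({ heading_style: "atx", wrap: true });
console.log(options.heading_style, options.wrap_width);
```

Validating before the call keeps a misspelled enum value from reaching the converter, whose behavior on unknown values is not described in this README.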
146
+
147
+ ## Djot Output Format
148
+
149
+ The library supports converting HTML to [Djot](https://djot.net/), a lightweight markup language similar to Markdown but with a different syntax for some elements. Set `output_format` to `"djot"` to use this format.
150
+
151
+ ### Syntax Differences
152
+
153
+ | Element | Markdown | Djot |
154
+ | -------------- | ---------- | ---------- |
155
+ | Strong | `**text**` | `*text*` |
156
+ | Emphasis | `*text*` | `_text_` |
157
+ | Strikethrough | `~~text~~` | `{-text-}` |
158
+ | Inserted/Added | N/A | `{+text+}` |
159
+ | Highlighted | N/A | `{=text=}` |
160
+ | Subscript | N/A | `~text~` |
161
+ | Superscript | N/A | `^text^` |
162
+
163
+ ### Example Usage
164
+
165
+ Djot's extended syntax allows you to express more semantic meaning in lightweight text, making it useful for documents that require strikethrough, insertion tracking, or mathematical notation.
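
The syntax table above can be read as a lookup from element and output format to a pair of delimiters. The sketch below encodes that table in a hypothetical `wrapInline` function to make the differences concrete; it illustrates the table only and is not how the library renders output. Returning the bare text for elements marked N/A in Markdown is an assumption, since this README does not state the library's fallback.

```javascript
// Delimiters per element, transcribed from the syntax table above.
// `null` marks the "N/A" cells.
const INLINE_SYNTAX = {
  strong: { markdown: ["**", "**"], djot: ["*", "*"] },
  emphasis: { markdown: ["*", "*"], djot: ["_", "_"] },
  strikethrough: { markdown: ["~~", "~~"], djot: ["{-", "-}"] },
  inserted: { markdown: null, djot: ["{+", "+}"] },
  highlighted: { markdown: null, djot: ["{=", "=}"] },
  subscript: { markdown: null, djot: ["~", "~"] },
  superscript: { markdown: null, djot: ["^", "^"] },
};

// Hypothetical illustration helper, not part of the package API.
function wrapInline(element, text, format) {
  const entry = INLINE_SYNTAX[element];
  if (!entry) throw new Error(`unknown element: ${element}`);
  if (format !== "markdown" && format !== "djot") {
    throw new Error(`unknown format: ${format}`);
  }
  const delimiters = entry[format];
  // Assumption: with no equivalent syntax, emit the text unchanged.
  if (delimiters === null) return text;
  const [open, close] = delimiters;
  return `${open}${text}${close}`;
}

console.log(wrapInline("strong", "text", "markdown"));
console.log(wrapInline("strong", "text", "djot"));
console.log(wrapInline("inserted", "text", "djot"));
```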
166
+
167
+ ## Plain Text Output
168
+
169
+ Set `output_format` to `"plain"` to strip all markup and return only visible text. This bypasses the Markdown conversion pipeline entirely for maximum speed.
170
+
171
+ Plain text mode is useful for search indexing, text extraction, and feeding content to LLMs.
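
Plain-text output goes through the same `convert()` call as Markdown. The sketch below wraps that call in a hypothetical `toPlainText` helper that takes the converter as a parameter, so it can be run without loading the WASM module. It assumes the option key is spelled `output_format` as in the option list, and that the result exposes `content` as in Quick Start. `fakeConvert` is a stand-in for demonstration only.

```javascript
// Hypothetical helper, not part of the package API.
// `convert` is injected so the sketch runs without the WASM module loaded.
function toPlainText(convert, html) {
  // Assumption: the option key is `output_format`, as in the option list.
  const result = convert(html, { output_format: "plain" });
  return result.content;
}

// Stand-in converter for demonstration only: it records the options it
// receives and strips tags naively. The real `convert` does the actual work.
function fakeConvert(html, options) {
  fakeConvert.lastOptions = options;
  return { content: html.replace(/<[^>]*>/g, "") };
}

const text = toPlainText(fakeConvert, "<p>Hello <strong>world</strong></p>");
console.log(text);
```

In real use, the `convert` function imported from the package would be passed in place of `fakeConvert`, after `await init()`.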
172
+
173
+ ## Metadata Extraction
174
+
175
+ The metadata extraction feature enables comprehensive document analysis during conversion. Extract document properties, headers, links, images, and structured data in a single pass — all via the standard `convert()` function.
176
+
177
+ **Use Cases:**
178
+
179
+ - **SEO analysis** – Extract title, description, Open Graph tags, Twitter cards
180
+ - **Table of contents generation** – Build structured outlines from heading hierarchy
181
+ - **Content migration** – Document all external links and resources
182
+ - **Accessibility audits** – Check for images without alt text, empty links, invalid heading hierarchy
183
+ - **Link validation** – Classify and validate anchor, internal, external, email, and phone links
184
+
185
+ **Zero Overhead When Disabled:** Metadata extraction runs during the HTML parsing pass, so it adds negligible overhead when enabled and none when disabled. Pass `extract_metadata: true` in `ConversionOptions` to enable it; the result is available at `result.metadata`.
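
As a sketch of the table-of-contents use case listed above, the function below turns a list of extracted headings into a nested Markdown outline. The input shape (an array of `{ level, text }` objects) is an assumption for illustration; this README does not document the fields of `result.metadata`, so adapt the property names to what the binding actually returns.

```javascript
// Hypothetical helper, not part of the package API.
// Assumed input: [{ level: 1, text: "Intro" }, ...] in document order.
function buildOutline(headings, indentWidth = 2) {
  if (headings.length === 0) return "";
  // Indent relative to the shallowest heading so the outline starts flush left.
  const minLevel = Math.min(...headings.map((h) => h.level));
  return headings
    .map(({ level, text }) => {
      const indent = " ".repeat((level - minLevel) * indentWidth);
      return `${indent}- ${text}`;
    })
    .join("\n");
}

const outline = buildOutline([
  { level: 1, text: "Intro" },
  { level: 2, text: "Setup" },
  { level: 3, text: "Dependencies" },
  { level: 2, text: "Usage" },
]);
console.log(outline);
```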
186
+
187
+ ### Example: Quick Start
188
+
189
+ ## Examples
190
+
191
+ ## Links
192
+
193
+ - **GitHub:** [github.com/kreuzberg-dev/html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown)
194
+ - **Discord:** [discord.gg/xt9WY3GnKR](https://discord.gg/xt9WY3GnKR)
195
+
196
+ ## Part of Kreuzberg.dev
197
+
198
+ - [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 90+ formats with optional OCR.
199
+ - [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
200
+ - [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
201
+ - [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
202
+ - [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
203
+ - [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.
204
+ - [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.
205
+
206
+ ## Contributing
207
+
208
+ We welcome contributions! Please see our [Contributing Guide](https://github.com/kreuzberg-dev/html-to-markdown/blob/main/CONTRIBUTING.md) for details on:
209
+
210
+ - Setting up the development environment
211
+ - Running tests locally
212
+ - Submitting pull requests
213
+ - Reporting issues
214
+
215
+ All contributions must follow our code quality standards (enforced via pre-commit hooks):
216
+
217
+ - Proper test coverage (Rust 95%+, language bindings 80%+)
218
+ - Formatting and linting checks
219
+ - Documentation for public APIs
220
+
221
+ ## License
222
+
223
+ MIT License – see [LICENSE](https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE). Copyright © Kreuzberg, Inc.
224
+
225
+ ## Support
226
+
227
+ If you find this library useful, consider [sponsoring the project](https://github.com/sponsors/kreuzberg-dev).
228
+
229
+ Have questions or run into issues? We're here to help:
230
+
231
+ - **GitHub Issues:** [github.com/kreuzberg-dev/html-to-markdown/issues](https://github.com/kreuzberg-dev/html-to-markdown/issues)
233
+ - **Discord Community:** [discord.gg/xt9WY3GnKR](https://discord.gg/xt9WY3GnKR)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@kreuzberg/html-to-markdown-wasm",
3
- "version": "3.4.0",
3
+ "version": "3.5.0-rc.1",
4
4
  "private": false,
5
5
  "description": "High-performance HTML to Markdown converter",
6
6
  "license": "MIT",
@@ -16,9 +16,9 @@
16
16
  "README.md"
17
17
  ],
18
18
  "type": "module",
19
- "main": "pkg/nodejs/html-to-markdown_wasm.js",
20
- "module": "pkg/web/html-to-markdown_wasm.js",
21
- "types": "pkg/nodejs/html-to-markdown_wasm.d.ts",
19
+ "main": "pkg/nodejs/html_to_markdown_wasm.js",
20
+ "module": "pkg/web/html_to_markdown_wasm.js",
21
+ "types": "pkg/nodejs/html_to_markdown_wasm.d.ts",
22
22
  "scripts": {
23
23
  "build": "wasm-pack build --target nodejs --out-dir pkg/nodejs",
24
24
  "build:ci": "wasm-pack build --release --target nodejs --out-dir pkg/nodejs",