@kreuzberg/html-to-markdown-wasm 3.1.0 → 3.2.0

package/dist/README.md DELETED
@@ -1,154 +0,0 @@
- # html-to-markdown
-
- <div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
- <!-- Language Bindings -->
- <a href="https://crates.io/crates/html-to-markdown-rs">
- <img src="https://img.shields.io/crates/v/html-to-markdown-rs?label=Rust&color=007ec6" alt="Rust">
- </a>
- <a href="https://pypi.org/project/html-to-markdown/">
- <img src="https://img.shields.io/pypi/v/html-to-markdown?label=Python&color=007ec6" alt="Python">
- </a>
- <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node">
- <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-node?label=Node.js&color=007ec6" alt="Node.js">
- </a>
- <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm">
- <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-wasm?label=WASM&color=007ec6" alt="WASM">
- </a>
- <a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown">
- <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown?label=Java&color=007ec6" alt="Java">
- </a>
- <a href="https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown">
- <img src="https://img.shields.io/github/v/tag/kreuzberg-dev/html-to-markdown?label=Go&color=007ec6&filter=v3.0.0" alt="Go">
- </a>
- <a href="https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/">
- <img src="https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown?label=C%23&color=007ec6" alt="C#">
- </a>
- <a href="https://packagist.org/packages/kreuzberg-dev/html-to-markdown">
- <img src="https://img.shields.io/packagist/v/kreuzberg-dev/html-to-markdown?label=PHP&color=007ec6" alt="PHP">
- </a>
- <a href="https://rubygems.org/gems/html-to-markdown">
- <img src="https://img.shields.io/gem/v/html-to-markdown?label=Ruby&color=007ec6" alt="Ruby">
- </a>
- <a href="https://hex.pm/packages/html_to_markdown">
- <img src="https://img.shields.io/hexpm/v/html_to_markdown?label=Elixir&color=007ec6" alt="Elixir">
- </a>
- <a href="https://kreuzberg-dev.r-universe.dev/htmltomarkdown">
- <img src="https://img.shields.io/badge/R-htmltomarkdown-007ec6" alt="R">
- </a>
- <a href="https://github.com/kreuzberg-dev/html-to-markdown/releases">
- <img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C">
- </a>
- <a href="https://docs.html-to-markdown.kreuzberg.dev">
- <img src="https://img.shields.io/badge/Docs-kreuzberg.dev-007ec6" alt="Documentation">
- </a>
- <a href="https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE">
- <img src="https://img.shields.io/badge/License-MIT-007ec6" alt="License">
- </a>
- <a href="https://docs.html-to-markdown.kreuzberg.dev/demo/">
- <img src="https://img.shields.io/badge/%E2%96%B6%EF%B8%8F_Live_Demo-007ec6" alt="Live Demo">
- </a>
- </div>
-
- <img width="3384" height="573" alt="Banner" src="https://github.com/user-attachments/assets/478a83da-237b-446b-b3a8-e564c13e00a8" />
-
- <div align="center" style="margin-top: 20px;">
- <a href="https://discord.gg/pXxagNK2zN">
- <img height="22" src="https://img.shields.io/badge/Discord-Join%20our%20community-7289da?logo=discord&logoColor=white" alt="Discord">
- </a>
- </div>
-
- High-performance HTML to Markdown conversion powered by Rust. Ships as native bindings for **Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, Elixir, R, C (FFI), and WebAssembly** with identical rendering across all runtimes.
-
- **[Documentation](https://docs.html-to-markdown.kreuzberg.dev)** | **[Live Demo](https://docs.html-to-markdown.kreuzberg.dev/demo/)** | **[API Reference](https://docs.html-to-markdown.kreuzberg.dev/reference/api-rust/)**
-
- ## Highlights
-
- - **150-280 MB/s** throughput (10-80x faster than pure Python alternatives)
- - **12 language bindings** with consistent output across all runtimes
- - **Structured result** — `convert()` returns `ConversionResult` with `content`, `metadata`, `tables`, `images`, and `warnings`
- - **Metadata extraction** — title, headers, links, images, structured data (JSON-LD, Microdata, RDFa)
- - **Visitor pattern** — custom callbacks for content filtering, URL rewriting, domain-specific dialects
- - **Table extraction** — extract structured table data (cells, headers, rendered markdown) during conversion
- - **Secure by default** — built-in HTML sanitization via ammonia
-
- ## Quick Start
-
- ```bash
- # Rust
- cargo add html-to-markdown-rs
-
- # Python
- pip install html-to-markdown
-
- # TypeScript / Node.js
- npm install @kreuzberg/html-to-markdown-node
-
- # Ruby
- gem install html-to-markdown
-
- # CLI
- cargo install html-to-markdown-cli
- # or
- brew install kreuzberg-dev/tap/html-to-markdown
- ```
-
- See the **[Installation Guide](https://docs.html-to-markdown.kreuzberg.dev/getting-started/installation/)** for all languages including PHP, Go, Java, C#, Elixir, R, and WASM.
-
- ### Usage
-
- `convert()` is the single entry point. It returns a structured `ConversionResult`:
-
- ```python
- # Python
- from html_to_markdown import convert
-
- result = convert("<h1>Hello</h1><p>World</p>")
- print(result["content"]) # # Hello\n\nWorld
- print(result["metadata"]) # title, links, headings, …
- ```
-
- ```typescript
- // TypeScript / Node.js
- import { convert } from "@kreuzberg/html-to-markdown-node";
-
- const result = convert("<h1>Hello</h1><p>World</p>");
- console.log(result.content); // # Hello\n\nWorld
- console.log(result.metadata); // title, links, headings, …
- ```
-
- ```rust
- // Rust
- use html_to_markdown_rs::convert;
-
- let result = convert("<h1>Hello</h1><p>World</p>", None)?;
- println!("{}", result.content.unwrap_or_default());
- ```
-
- ## Language Bindings
-
- | Language | Package | Install |
- |----------|---------|---------|
- | Rust | [html-to-markdown-rs](https://crates.io/crates/html-to-markdown-rs) | `cargo add html-to-markdown-rs` |
- | Python | [html-to-markdown](https://pypi.org/project/html-to-markdown/) | `pip install html-to-markdown` |
- | TypeScript / Node.js | [@kreuzberg/html-to-markdown-node](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node) | `npm install @kreuzberg/html-to-markdown-node` |
- | WebAssembly | [@kreuzberg/html-to-markdown-wasm](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm) | `npm install @kreuzberg/html-to-markdown-wasm` |
- | Ruby | [html-to-markdown](https://rubygems.org/gems/html-to-markdown) | `gem install html-to-markdown` |
- | PHP | [kreuzberg-dev/html-to-markdown](https://packagist.org/packages/kreuzberg-dev/html-to-markdown) | `composer require kreuzberg-dev/html-to-markdown` |
- | Go | [htmltomarkdown](https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown) | `go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3` |
- | Java | [dev.kreuzberg:html-to-markdown](https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown) | Maven / Gradle |
- | C# | [KreuzbergDev.HtmlToMarkdown](https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/) | `dotnet add package KreuzbergDev.HtmlToMarkdown` |
- | Elixir | [html_to_markdown](https://hex.pm/packages/html_to_markdown) | `mix deps.get html_to_markdown` |
- | R | [htmltomarkdown](https://kreuzberg-dev.r-universe.dev/htmltomarkdown) | `install.packages("htmltomarkdown")` |
- | C (FFI) | [releases](https://github.com/kreuzberg-dev/html-to-markdown/releases) | Pre-built `.so` / `.dll` / `.dylib` |
-
- ## Part of the Kreuzberg Ecosystem
-
- html-to-markdown is developed by [kreuzberg.dev](https://kreuzberg.dev) and powers the HTML conversion pipeline in [Kreuzberg](https://docs.kreuzberg.dev), a document intelligence library for extracting text from PDFs, images, and office documents.
-
- ## Contributing
-
- Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions and guidelines.
-
- ## License
-
- MIT License — see [LICENSE](LICENSE) for details.
package/dist-node/LICENSE DELETED
@@ -1,21 +0,0 @@
- The MIT License (MIT)
-
- Copyright 2024-2025 Na'aman Hirschfeld
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
package/dist-web/LICENSE DELETED
@@ -1,21 +0,0 @@
- The MIT License (MIT)
-
- Copyright 2024-2025 Na'aman Hirschfeld
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.