@kreuzberg/html-to-markdown-wasm 2.29.0 → 3.0.0

package/README.md CHANGED
@@ -17,63 +17,9 @@ Runs anywhere: Node.js, Deno, Bun, browsers, and edge runtimes.
17
17
  [![RubyGems](https://badge.fury.io/rb/html-to-markdown.svg)](https://rubygems.org/gems/html-to-markdown)
18
18
  [![NuGet](https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown.svg)](https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/)
19
19
  [![Maven Central](https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown.svg)](https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown)
20
- [![Go Reference](https://pkg.go.dev/badge/github.com/kreuzberg-dev/html-to-markdown/packages/go/v2/htmltomarkdown.svg)](https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v2/htmltomarkdown)
20
+ [![Go Reference](https://pkg.go.dev/badge/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown.svg)](https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown)
21
21
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE)
22
22
 
23
- ## Migration Guide (v2.18.x → v2.19.0)
24
-
25
- > **⚠️ BREAKING CHANGE: Package Namespace Update**
26
- >
27
- > In v2.19.0, the npm package namespace changed from `html-to-markdown-wasm` to `@kreuzberg/html-to-markdown-wasm` to reflect the new Kreuzberg.dev organization.
28
-
29
- ### Install Updated Package
30
-
31
- **Before (v2.18.x):**
32
- ```bash
33
- npm install html-to-markdown-wasm
34
- ```
35
-
36
- **After (v2.19.0+):**
37
- ```bash
38
- npm install @kreuzberg/html-to-markdown-wasm
39
- ```
40
-
41
- ### Update Import Statements
42
-
43
- **Before:**
44
- ```typescript
45
- import { convert } from 'html-to-markdown-wasm';
46
- // or
47
- import { convert } from "npm:html-to-markdown-wasm"; // Deno
48
- ```
49
-
50
- **After:**
51
- ```typescript
52
- import { convert } from '@kreuzberg/html-to-markdown-wasm';
53
- // or
54
- import { convert } from "npm:@kreuzberg/html-to-markdown-wasm"; // Deno
55
- ```
56
-
57
- ### Update Browser ESM Imports
58
-
59
- **Before:**
60
- ```javascript
61
- import init, { convert } from 'https://unpkg.com/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
62
- ```
63
-
64
- **After:**
65
- ```javascript
66
- import init, { convert } from 'https://unpkg.com/@kreuzberg/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
67
- ```
68
-
69
- ### Summary of Changes
70
-
71
- - Package renamed from `html-to-markdown-wasm` to `@kreuzberg/html-to-markdown-wasm`
72
- - All APIs remain identical
73
- - Full backward compatibility after updating package name and imports
74
-
75
- ---
76
-
77
23
  ## Performance
78
24
 
79
25
  Universal WebAssembly bindings with **excellent performance** across all JavaScript runtimes.
@@ -94,8 +40,8 @@ Universal WebAssembly bindings with **excellent performance** across all JavaScr
94
40
 
95
41
  ### Comparison
96
42
 
97
- - **vs Native NAPI**: ~1.17× slower (WASM has minimal overhead)
98
- - **vs Python**: ~6. faster (no FFI overhead)
43
+ - **vs Native NAPI**: ~1.17x slower (WASM has minimal overhead)
44
+ - **vs Python**: ~6.3x faster (no FFI overhead)
99
45
  - **Best for**: Universal deployment (browsers, Deno, edge runtimes, cross-platform apps)
100
46
 
101
47
  ### Benchmark Fixtures (Apple M4)
@@ -109,9 +55,6 @@ Numbers captured via the shared fixture harness in `tools/benchmark-harness`:
109
55
  | Medium (Python) | 657 KB | 121 |
110
56
  | Large (Rust) | 567 KB | 124 |
111
57
  | Small (Intro) | 463 KB | 163 |
112
- | hOCR German PDF | 44 KB | 1,637 |
113
- | hOCR Invoice | 4 KB | 7,775 |
114
- | hOCR Embedded Tables | 37 KB | 1,667 |
115
58
 
116
59
  > Expect slightly higher numbers in long-lived browser/Deno workers once the WASM module is warm.
117
60
 
@@ -142,8 +85,8 @@ import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";
142
85
  import { convert } from '@kreuzberg/html-to-markdown-wasm';
143
86
 
144
87
  const html = '<h1>Hello World</h1><p>This is <strong>fast</strong>!</p>';
145
- const markdown = convert(html);
146
- console.log(markdown);
88
+ const result = convert(html);
89
+ console.log(result.content);
147
90
  // # Hello World
148
91
  //
149
92
  // This is **fast**!
@@ -151,45 +94,34 @@ console.log(markdown);
151
94
 
152
95
  > **Heads up for edge runtimes:** Cloudflare Workers, Vite dev servers, and other environments that instantiate `.wasm` files asynchronously must call `await initWasm()` (or `await wasmReady`) once during startup before invoking `convert`. Traditional bundlers (Webpack, Rollup) and Deno/Node imports continue to work without manual initialization.
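One way to satisfy the "call once during startup" requirement above is to memoize the initializer so every request handler can await it safely. This is a minimal sketch, not the package's own mechanism: `once`, `fakeInitWasm`, and `ensureReady` are hypothetical names, and the stub stands in for the real `initWasm` so the snippet runs without the WASM module.

```typescript
// Wraps any async initializer so that all callers share a single in-flight promise.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = init(); // first caller starts the initialization
    }
    return pending; // later callers reuse the same promise
  };
}

// Stub standing in for the package's `initWasm` (assumption); counts how often it runs.
let initCalls = 0;
const fakeInitWasm = async (): Promise<void> => {
  initCalls += 1;
};

const ensureReady = once(fakeInitWasm);

// Hypothetical handler: await readiness, then do the actual work.
async function handle(html: string): Promise<string> {
  await ensureReady(); // cheap after the first call
  return html; // placeholder for `convert(html).content ?? ""`
}
```

Note that this sketch also caches a rejected promise, so a failed initialization is not retried; reset `pending` in a `catch` handler if retries are needed.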
153
96
 
154
- **Working Examples:**
155
- - [**Browser with Rollup**](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/examples/wasm-rollup) - Using dist-web target in browser
156
- - [**Node.js**](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/examples/wasm-node) - Using dist-node target
157
- - [**Cloudflare Workers**](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/examples/wasm-cloudflare) - Using bundler target with Wrangler
97
+ ### WasmConversionResult Fields
158
98
 
159
- ### Reusing Options Handles
99
+ Every call to `convert()` returns a `WasmConversionResult` object with six fields:
160
100
 
161
- ```ts
162
- import {
163
- convertWithOptionsHandle,
164
- createConversionOptionsHandle,
165
- } from '@kreuzberg/html-to-markdown-wasm';
101
+ ```typescript
102
+ import { convert } from '@kreuzberg/html-to-markdown-wasm';
103
+
104
+ const result = convert(html);
166
105
 
167
- const handle = createConversionOptionsHandle({ hocrSpatialTables: false });
168
- const markdown = convertWithOptionsHandle('<h1>Reusable</h1>', handle);
106
+ result.content; // string | null -- converted Markdown (or djot/plain text)
107
+ result.document; // string | null -- structured document tree as JSON
108
+ result.metadata; // string | null -- extracted HTML metadata as JSON
109
+ result.tables; // Array -- all tables found in document order
110
+ result.images; // Array -- extracted inline images (data URIs, SVGs)
111
+ result.warnings; // Array -- non-fatal processing warnings
169
112
  ```
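Because `content` is typed `string | null`, callers migrating from the 2.x string return value may want a single place that handles the `null` case. The sketch below is an assumption-laden illustration: `ConversionResultLike` is a local stand-in for the two fields used (not the package's exported type), `requireContent` is a hypothetical helper, and the results are hand-built stubs so the code runs without the WASM module.

```typescript
// Local stand-in for the fields used here; not the package's own type definition.
interface ConversionResultLike {
  content: string | null;
  warnings: unknown[];
}

// Hypothetical helper: returns the converted text, or throws when `content` is null.
function requireContent(result: ConversionResultLike): string {
  if (result.content === null) {
    throw new Error(
      `conversion produced no content (${result.warnings.length} warning(s))`
    );
  }
  return result.content;
}

// Hand-built stub results (assumptions), in place of real `convert()` calls.
const ok: ConversionResultLike = { content: '# Hello World\n', warnings: [] };
const empty: ConversionResultLike = { content: null, warnings: ['stub warning'] };

console.log(requireContent(ok));
```

Where an empty string is an acceptable fallback, `result.content ?? ""` (as used in the Deno and Cloudflare examples in this README) is the shorter alternative.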
170
113
 
171
114
  ### Byte-Based Input (Buffers / Uint8Array)
172
115
 
173
- When you already have raw bytes (e.g., `fs.readFileSync`, Fetch API responses), skip re-encoding with `TextDecoder` by calling the byte-friendly helpers:
116
+ When you already have raw bytes (e.g., `fs.readFileSync`, Fetch API responses), skip the `TextDecoder` decoding step by calling the byte-friendly helper:
174
117
 
175
118
  ```ts
176
- import {
177
- convertBytes,
178
- convertBytesWithOptionsHandle,
179
- createConversionOptionsHandle,
180
- convertBytesWithInlineImages,
181
- } from '@kreuzberg/html-to-markdown-wasm';
119
+ import { convertBytes } from '@kreuzberg/html-to-markdown-wasm';
182
120
  import { readFileSync } from 'node:fs';
183
121
 
184
122
  const htmlBytes = readFileSync('input.html'); // Buffer -> Uint8Array
185
- const markdown = convertBytes(htmlBytes);
186
-
187
- const handle = createConversionOptionsHandle({ headingStyle: 'atx' });
188
- const markdownFromHandle = convertBytesWithOptionsHandle(htmlBytes, handle);
189
-
190
- const inlineExtraction = convertBytesWithInlineImages(htmlBytes, null, {
191
- maxDecodedSizeBytes: 5 * 1024 * 1024,
192
- });
123
+ const result = convertBytes(htmlBytes);
124
+ console.log(result.content);
193
125
  ```
194
126
 
195
127
  ### With Options
@@ -197,7 +129,7 @@ const inlineExtraction = convertBytesWithInlineImages(htmlBytes, null, {
197
129
  ```typescript
198
130
  import { convert } from '@kreuzberg/html-to-markdown-wasm';
199
131
 
200
- const markdown = convert(html, {
132
+ const result = convert(html, {
201
133
  headingStyle: 'atx',
202
134
  codeBlockStyle: 'backticks',
203
135
  listIndentWidth: 2,
@@ -205,9 +137,10 @@ const markdown = convert(html, {
205
137
  wrap: true,
206
138
  wrapWidth: 80
207
139
  });
140
+ console.log(result.content);
208
141
  ```
209
142
 
210
- ### Preserve Complex HTML (NEW in v2.5)
143
+ ### Preserve Complex HTML
211
144
 
212
145
  ```typescript
213
146
  import { convert } from '@kreuzberg/html-to-markdown-wasm';
@@ -220,22 +153,23 @@ const html = `
220
153
  </table>
221
154
  `;
222
155
 
223
- const markdown = convert(html, {
156
+ const result = convert(html, {
224
157
  preserveTags: ['table'] // Keep tables as HTML
225
158
  });
159
+ console.log(result.content);
226
160
  ```
227
161
 
228
162
  ### Deno
229
163
 
230
164
  ```typescript
231
- import { convert } from "npm:html-to-markdown-wasm";
165
+ import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";
232
166
 
233
167
  const html = await Deno.readTextFile("input.html");
234
- const markdown = convert(html, { headingStyle: "atx" });
235
- await Deno.writeTextFile("output.md", markdown);
168
+ const result = convert(html, { headingStyle: "atx" });
169
+ await Deno.writeTextFile("output.md", result.content ?? "");
236
170
  ```
237
171
 
238
- > **Performance Tip:** For Node.js/Bun, use [@kreuzberg/html-to-markdown-node](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node) for 1.17× better performance with native bindings.
172
+ > **Performance Tip:** For Node.js/Bun, use [@kreuzberg/html-to-markdown-node](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node) for 1.17x better performance with native bindings.
239
173
 
240
174
  ### Browser (ESM)
241
175
 
@@ -253,10 +187,10 @@ await Deno.writeTextFile("output.md", markdown);
253
187
  await init();
254
188
 
255
189
  const html = '<h1>Hello World</h1><p>This runs in the <strong>browser</strong>!</p>';
256
- const markdown = convert(html, { headingStyle: 'atx' });
190
+ const result = convert(html, { headingStyle: 'atx' });
257
191
 
258
- console.log(markdown);
259
- document.body.innerHTML = `<pre>${markdown}</pre>`;
192
+ console.log(result.content);
193
+ document.body.innerHTML = `<pre>${result.content ?? ''}</pre>`;
260
194
  </script>
261
195
  </body>
262
196
  </html>
@@ -267,10 +201,11 @@ await Deno.writeTextFile("output.md", markdown);
267
201
  ```typescript
268
202
  import { convert } from '@kreuzberg/html-to-markdown-wasm';
269
203
 
270
- const markdown = convert('<h1>Hello</h1>', {
204
+ const result = convert('<h1>Hello</h1>', {
271
205
  headingStyle: 'atx',
272
206
  codeBlockStyle: 'backticks'
273
207
  });
208
+ console.log(result.content);
274
209
  ```
275
210
 
276
211
  ### Cloudflare Workers
@@ -286,28 +221,22 @@ export default {
286
221
  async fetch(request: Request): Promise<Response> {
287
222
  await ready;
288
223
  const html = await request.text();
289
- const markdown = convert(html, { headingStyle: 'atx' });
224
+ const result = convert(html, { headingStyle: 'atx' });
290
225
 
291
- return new Response(markdown, {
226
+ return new Response(result.content ?? "", {
292
227
  headers: { 'Content-Type': 'text/markdown' }
293
228
  });
294
229
  }
295
230
  };
296
231
  ```
297
232
 
298
- > See the full [Cloudflare Workers example](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/examples/wasm-cloudflare) with Wrangler configuration.
299
233
 
300
234
  ## TypeScript
301
235
 
302
236
  Full TypeScript support with type definitions:
303
237
 
304
238
  ```typescript
305
- import {
306
- convert,
307
- convertWithInlineImages,
308
- WasmInlineImageConfig,
309
- type WasmConversionOptions
310
- } from '@kreuzberg/html-to-markdown-wasm';
239
+ import { convert, type WasmConversionOptions } from '@kreuzberg/html-to-markdown-wasm';
311
240
 
312
241
  const options: WasmConversionOptions = {
313
242
  headingStyle: 'atx',
@@ -317,40 +246,16 @@ const options: WasmConversionOptions = {
317
246
  wrapWidth: 80
318
247
  };
319
248
 
320
- const markdown = convert('<h1>Hello</h1>', options);
249
+ const result = convert('<h1>Hello</h1>', options);
250
+ console.log(result.content);
321
251
  ```
322
252
 
323
- ## Inline Images
253
+ ## Metadata and Tables
324
254
 
325
- Extract and decode inline images (data URIs, SVG):
255
+ Extract document metadata and structured tables from the conversion result:
326
256
 
327
257
  ```typescript
328
- import { convertWithInlineImages, WasmInlineImageConfig } from '@kreuzberg/html-to-markdown-wasm';
329
-
330
- const html = '<img src="data:image/png;base64,iVBORw0..." alt="Logo">';
331
-
332
- const config = new WasmInlineImageConfig(5 * 1024 * 1024); // 5MB max
333
- config.inferDimensions = true;
334
- config.filenamePrefix = 'img_';
335
- config.captureSvg = true;
336
-
337
- const result = convertWithInlineImages(html, null, config);
338
-
339
- console.log(result.markdown);
340
- console.log(`Extracted ${result.inlineImages.length} images`);
341
-
342
- for (const img of result.inlineImages) {
343
- console.log(`${img.filename}: ${img.format}, ${img.data.length} bytes`);
344
- // img.data is a Uint8Array - save to file or upload
345
- }
346
- ```
347
-
348
- ## Metadata Extraction
349
-
350
- Extract document metadata (headers, links, images, structured data) alongside Markdown conversion:
351
-
352
- ```typescript
353
- import { convertWithMetadata, WasmMetadataConfig } from '@kreuzberg/html-to-markdown-wasm';
258
+ import { convert } from '@kreuzberg/html-to-markdown-wasm';
354
259
 
355
260
  const html = `
356
261
  <html lang="en">
@@ -359,97 +264,23 @@ const html = `
359
264
  <h1>Main Title</h1>
360
265
  <p>Content with <a href="https://example.com">a link</a></p>
361
266
  <img src="https://example.com/image.jpg" alt="Example image">
267
+ <table>
268
+ <tr><th>Name</th><th>Value</th></tr>
269
+ <tr><td>Foo</td><td>42</td></tr>
270
+ </table>
362
271
  </body>
363
272
  </html>
364
273
  `;
365
274
 
366
- const config = new WasmMetadataConfig();
367
- config.extractHeaders = true;
368
- config.extractLinks = true;
369
- config.extractImages = true;
370
- config.extractStructuredData = true;
371
- config.maxStructuredDataSize = 1_000_000; // 1MB limit
372
-
373
- const result = convertWithMetadata(html, null, config);
374
-
375
- console.log(result.markdown);
376
- console.log('Document metadata:', result.metadata.document);
377
- // {
378
- // title: 'My Article',
379
- // language: 'en',
380
- // ...
381
- // }
382
-
383
- console.log('Headers:', result.metadata.headers);
384
- // [
385
- // { level: 1, text: 'Main Title', id: undefined, depth: 0, htmlOffset: ... }
386
- // ]
387
-
388
- console.log('Links:', result.metadata.links);
389
- // [
390
- // {
391
- // href: 'https://example.com',
392
- // text: 'a link',
393
- // linkType: 'external',
394
- // rel: [],
395
- // ...
396
- // }
397
- // ]
398
-
399
- console.log('Images:', result.metadata.images);
400
- // [
401
- // {
402
- // src: 'https://example.com/image.jpg',
403
- // alt: 'Example image',
404
- // imageType: 'external',
405
- // ...
406
- // }
407
- // ]
408
- ```
409
-
410
- ### Metadata Configuration
411
-
412
- The `WasmMetadataConfig` class controls what metadata is extracted:
413
-
414
- ```typescript
415
- import { WasmMetadataConfig } from '@kreuzberg/html-to-markdown-wasm';
416
-
417
- const config = new WasmMetadataConfig();
418
-
419
- // Enable/disable extraction types
420
- config.extractHeaders = true; // h1-h6 elements
421
- config.extractLinks = true; // <a> elements with link type classification
422
- config.extractImages = true; // <img> and <svg> elements
423
- config.extractStructuredData = true; // JSON-LD, Microdata, RDFa
424
-
425
- // Limit structured data size to prevent memory exhaustion
426
- config.maxStructuredDataSize = 1_000_000; // 1MB default
427
- ```
428
-
429
- ### Metadata Structure
430
-
431
- The returned metadata object includes:
432
-
433
- - **document**: Document-level metadata (title, description, keywords, language, OG tags, Twitter cards, etc.)
434
- - **headers**: Array of header elements with level, text, id, and document position
435
- - **links**: Array of links with href, text, type (anchor/internal/external/email/phone), and rel attributes
436
- - **images**: Array of images with src, alt text, dimensions, and type classification (dataUri/external/relative/svg)
437
- - **structuredData**: Array of JSON-LD, Microdata, and RDFa blocks
438
-
439
- ### Byte-Based Input
440
-
441
- Convert bytes directly with metadata extraction:
442
-
443
- ```typescript
444
- import { convertBytesWithMetadata, WasmMetadataConfig } from '@kreuzberg/html-to-markdown-wasm';
445
- import { readFileSync } from 'node:fs';
446
-
447
- const htmlBytes = readFileSync('article.html');
448
- const config = new WasmMetadataConfig();
275
+ const result = convert(html, {
276
+ extractMetadata: true,
277
+ });
449
278
 
450
- const result = convertBytesWithMetadata(htmlBytes, null, config);
451
- console.log(result.markdown);
452
- console.log(result.metadata);
279
+ console.log(result.content); // Markdown output
280
+ console.log(result.metadata); // JSON string with title, links, headers, etc.
281
+ console.log(result.tables.length); // Number of tables found
282
+ console.log(result.images.length); // Number of inline images extracted
283
+ console.log(result.warnings); // Any processing warnings
453
284
  ```
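Since `result.metadata` is described as a JSON string (or `null`), it has to be parsed before individual fields can be read. The following is a minimal sketch of defensive parsing: `parseMetadata` is a hypothetical helper, and the sample payload is illustrative only, because the actual key layout of the metadata JSON is not specified here.

```typescript
// Hypothetical helper: parses the metadata JSON string.
// Returns null when the input is null, malformed, or not a JSON object.
function parseMetadata(raw: string | null): Record<string, unknown> | null {
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    const isObject =
      typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed);
    return isObject ? (parsed as Record<string, unknown>) : null;
  } catch {
    return null; // malformed JSON
  }
}

// Illustrative payload only (assumption); real output may be shaped differently.
const sample = '{"title":"My Article"}';

const metadata = parseMetadata(sample);
console.log(metadata?.title);
```

Returning `null` for every failure mode keeps call sites simple; throw instead if malformed metadata should be treated as an error.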
454
285
 
455
286
  ## Build Targets
@@ -466,33 +297,33 @@ Three build targets are provided for different environments:
466
297
 
467
298
  | Runtime | Support | Package |
468
299
  | ------------------------- | ---------------------------- | -------------- |
469
- | **Node.js** 18+ | Full support | `dist-node` |
470
- | **Deno** | Full support | npm: specifier |
471
- | **Bun** | Full support (prefer native) | Default export |
472
- | **Browsers** | Full support | `dist-web` |
473
- | **Cloudflare Workers** | Full support | Default export |
474
- | **Deno Deploy** | Full support | npm: specifier |
300
+ | **Node.js** 18+ | Full support | `dist-node` |
301
+ | **Deno** | Full support | npm: specifier |
302
+ | **Bun** | Full support (prefer native) | Default export |
303
+ | **Browsers** | Full support | `dist-web` |
304
+ | **Cloudflare Workers** | Full support | Default export |
305
+ | **Deno Deploy** | Full support | npm: specifier |
475
306
 
476
307
  ## When to Use
477
308
 
478
309
  Choose `@kreuzberg/html-to-markdown-wasm` when:
479
310
 
480
- - 🌐 Running in browsers or edge runtimes
481
- - 🦕 Using Deno
482
- - ☁️ Deploying to Cloudflare Workers, Deno Deploy
483
- - 📦 Building universal libraries
484
- - 🔄 Need consistent behavior across all platforms
311
+ - Running in browsers or edge runtimes
312
+ - Using Deno
313
+ - Deploying to Cloudflare Workers or Deno Deploy
314
+ - Building universal libraries
315
+ - Needing consistent behavior across all platforms
485
316
 
486
317
  Use [@kreuzberg/html-to-markdown-node](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node) for:
487
318
 
488
- - Maximum performance in Node.js/Bun (~ faster)
489
- - 🖥️ Server-side only applications
319
+ - Maximum performance in Node.js/Bun (~3x faster)
320
+ - Server-side only applications
490
321
 
491
322
  ## Visitor Pattern Support
492
323
 
493
324
  **The WebAssembly binding does not support the visitor pattern.** The visitor pattern requires callbacks and stateful execution across the WebAssembly/JavaScript boundary, which has fundamental limitations:
494
325
 
495
- ### Why WASM Doesn't Support Visitors
326
+ ### Why WASM Does Not Support Visitors
496
327
 
497
328
  1. **Memory safety across FFI boundary**: The WASM/JS boundary cannot safely pass mutable function callbacks that maintain state across multiple invocations
498
329
  2. **Single-threaded execution model**: WASM runs on a single thread with no equivalent to Node.js's `ThreadsafeFunction` FFI primitive
@@ -504,10 +335,11 @@ Use [@kreuzberg/html-to-markdown-node](https://www.npmjs.com/package/@kreuzberg/
504
335
  Choose one of these approaches:
505
336
 
506
337
  #### 1. Use Node.js Binding (Recommended)
338
+
507
339
  For best performance with visitor support, use the native Node.js binding:
508
340
 
509
341
  ```typescript
510
- import { convertWithVisitor, type Visitor } from '@kreuzberg/html-to-markdown-node';
342
+ import { convert, type Visitor } from '@kreuzberg/html-to-markdown-node';
511
343
 
512
344
  const visitor: Visitor = {
513
345
  visitLink(ctx, href, text, title) {
@@ -516,28 +348,32 @@ const visitor: Visitor = {
516
348
  },
517
349
  };
518
350
 
519
- const markdown = convertWithVisitor(html, { visitor });
351
+ const result = convert(html, undefined, visitor);
352
+ console.log(result.content);
520
353
  ```
521
354
 
522
- **Performance:** ~ faster than WASM, full visitor pattern support.
355
+ **Performance:** ~3x faster than WASM, full visitor pattern support.
523
356
  **Use when:** Running on Node.js or Bun server-side.
524
357
 
525
358
  #### 2. Use Server-Side Bindings
359
+
526
360
  For other platforms, use Python, Ruby, or PHP bindings with visitor support:
527
361
 
528
362
  **Python:**
363
+
529
364
  ```python
530
- from html_to_markdown import convert_with_visitor
365
+ from html_to_markdown import convert
531
366
 
532
367
  class MyVisitor:
533
368
  def visit_link(self, ctx, href, text, title):
534
369
  # Your visitor logic here
535
370
  return {"type": "continue"}
536
371
 
537
- markdown = convert_with_visitor(html, visitor=MyVisitor())
372
+ result = convert(html, None, MyVisitor())
538
373
  ```
539
374
 
540
375
  **Ruby:**
376
+
541
377
  ```ruby
542
378
  require 'html_to_markdown'
543
379
 
@@ -547,10 +383,11 @@ class MyVisitor
547
383
  end
548
384
  end
549
385
 
550
- markdown = HtmlToMarkdown.convert_with_visitor(html, visitor: MyVisitor.new)
386
+ result = HtmlToMarkdown.convert(html, nil, MyVisitor.new)
551
387
  ```
552
388
 
553
389
  **PHP:**
390
+
554
391
  ```php
555
392
  use HtmlToMarkdown\Converter;
556
393
 
@@ -560,10 +397,11 @@ class MyVisitor {
560
397
  }
561
398
  }
562
399
 
563
- $markdown = Converter::convertWithVisitor($html, new MyVisitor());
400
+ $result = Converter::convert($html, null, new MyVisitor());
564
401
  ```
565
402
 
566
403
  #### 3. Preprocess HTML Before Conversion
404
+
567
405
  For simple transformations, manipulate the HTML before passing it to WASM:
568
406
 
569
407
  ```typescript
@@ -575,18 +413,20 @@ const processedHtml = html.replace(
575
413
  'https://new-cdn.com'
576
414
  );
577
415
 
578
- const markdown = convert(processedHtml);
416
+ const result = convert(processedHtml);
417
+ console.log(result.content);
579
418
  ```
580
419
 
581
420
  **Use when:** Only simple text replacements are needed.
582
421
 
583
422
  #### 4. Post-Process Markdown
423
+
584
424
  Transform the output Markdown after conversion:
585
425
 
586
426
  ```typescript
587
427
  import { convert } from '@kreuzberg/html-to-markdown-wasm';
588
428
 
589
- const markdown = convert(html);
429
+ const markdown = convert(html).content ?? "";
590
430
 
591
431
  // Post-process the markdown
592
432
  const transformed = markdown
@@ -600,30 +440,30 @@ const transformed = markdown
600
440
 
601
441
  | Binding | Visitor Support | Best For |
602
442
  |---------|-----------------|----------|
603
- | **Rust** | Yes | Core library, performance-critical code |
604
- | **Python** | Yes (sync & async) | Server-side, bulk processing |
605
- | **TypeScript/Node.js** | Yes (sync & async) | Server-side Node.js/Bun, best performance |
606
- | **Ruby** | Yes | Server-side Ruby on Rails, Sinatra |
607
- | **PHP** | Yes | Server-side PHP, content management |
608
- | **Go** | No | Basic conversion only |
609
- | **Java** | No | Basic conversion only |
610
- | **C#** | No | Basic conversion only |
611
- | **Elixir** | No | Basic conversion only |
612
- | **WebAssembly** | No | Browser, Edge, Deno (see alternatives above) |
613
-
614
- For comprehensive visitor pattern documentation with examples, see [Visitor Pattern Guide](../../examples/visitor-pattern/).
443
+ | **Rust** | Yes | Core library, performance-critical code |
444
+ | **Python** | Yes (sync and async) | Server-side, bulk processing |
445
+ | **TypeScript/Node.js** | Yes (sync and async) | Server-side Node.js/Bun, best performance |
446
+ | **Ruby** | Yes | Server-side Ruby on Rails, Sinatra |
447
+ | **PHP** | Yes | Server-side PHP, content management |
448
+ | **Go** | No | Basic conversion only |
449
+ | **Java** | No | Basic conversion only |
450
+ | **C#** | No | Basic conversion only |
451
+ | **Elixir** | No | Basic conversion only |
452
+ | **WebAssembly** | No | Browser, Edge, Deno (see alternatives above) |
453
+
454
+ For comprehensive visitor pattern documentation with examples, see the [full documentation](https://docs.html-to-markdown.kreuzberg.dev).
615
455
 
616
456
  ## Configuration Options
617
457
 
618
- See the [TypeScript definitions](./dist-node/html_to_markdown_wasm.d.ts) for all available options:
458
+ Available options:
619
459
 
620
460
  - Heading styles (atx, underlined, atxClosed)
621
461
  - Code block styles (indented, backticks, tildes)
622
462
  - List formatting (indent width, bullet characters)
623
463
  - Text escaping and formatting
624
464
  - Tag preservation (`preserveTags`) and stripping (`stripTags`)
465
+ - Metadata extraction (`extractMetadata`)
625
466
  - Preprocessing for web scraping
626
- - hOCR table extraction
627
467
  - And more...
628
468
 
629
469
  ## Examples
@@ -644,35 +484,36 @@ const html = `
644
484
  <p>After table</p>
645
485
  `;
646
486
 
647
- const markdown = convert(html, {
487
+ const result = convert(html, {
648
488
  preserveTags: ['table']
649
489
  });
650
490
 
651
- // Result includes the table as HTML
491
+ // result.content includes the table as HTML
652
492
  ```
653
493
 
654
494
  Combine with `stripTags`:
655
495
 
656
496
  ```typescript
657
- const markdown = convert(html, {
497
+ const result = convert(html, {
658
498
  preserveTags: ['table', 'form'], // Keep as HTML
659
499
  stripTags: ['script', 'style'] // Remove entirely
660
500
  });
501
+ console.log(result.content);
661
502
  ```
662
503
 
663
504
  ### Deno Web Server
664
505
 
665
506
  ```typescript
666
- import { convert } from "npm:html-to-markdown-wasm";
507
+ import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";
667
508
 
668
- Deno.serve((req) => {
509
+ Deno.serve(async (req) => {
669
510
  const url = new URL(req.url);
670
511
 
671
512
  if (url.pathname === "/convert" && req.method === "POST") {
672
513
  const html = await req.text();
673
- const markdown = convert(html, { headingStyle: "atx" });
514
+ const result = convert(html, { headingStyle: "atx" });
674
515
 
675
- return new Response(markdown, {
516
+ return new Response(result.content ?? "", {
676
517
  headers: { "Content-Type": "text/markdown" }
677
518
  });
678
519
  }
@@ -696,8 +537,8 @@ Deno.serve((req) => {
696
537
  window.convertFile = async () => {
697
538
  const file = document.getElementById('htmlFile').files[0];
698
539
  const html = await file.text();
699
- const markdown = convert(html, { headingStyle: 'atx' });
700
- document.getElementById('output').textContent = markdown;
540
+ const result = convert(html, { headingStyle: 'atx' });
541
+ document.getElementById('output').textContent = result.content ?? "";
701
542
  };
702
543
  </script>
703
544
  ```
@@ -705,42 +546,39 @@ Deno.serve((req) => {
705
546
  ### Web Scraping (Deno)
706
547
 
707
548
  ```typescript
708
- import { convert } from "npm:html-to-markdown-wasm";
549
+ import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";
709
550
 
710
551
  const response = await fetch("https://example.com");
711
552
  const html = await response.text();
712
553
 
713
- const markdown = convert(html, {
714
- preprocessing: {
715
- enabled: true,
716
- preset: "aggressive",
717
- removeNavigation: true,
718
- removeForms: true
719
- },
554
+ const result = convert(html, {
555
+ preprocess: true,
556
+ preset: "aggressive",
557
+ keepNavigation: false,
720
558
  headingStyle: "atx",
721
559
  codeBlockStyle: "backticks"
722
560
  });
723
561
 
724
- console.log(markdown);
562
+ console.log(result.content);
725
563
  ```
726
564
 
727
565
  ## Other Runtimes
728
566
 
729
567
  The same Rust engine ships as native bindings for other ecosystems:
730
568
 
731
- - 🖥️ Node.js / Bun: [`html-to-markdown-node`](https://www.npmjs.com/package/html-to-markdown-node)
732
- - 🐍 Python: [`html-to-markdown`](https://pypi.org/project/html-to-markdown/)
733
- - 💎 Ruby: [`html-to-markdown`](https://rubygems.org/gems/html-to-markdown)
734
- - 🐘 PHP: [`kreuzberg-dev/html-to-markdown`](https://packagist.org/packages/kreuzberg-dev/html-to-markdown)
735
- - 🦀 Rust crate & CLI: [`html-to-markdown-rs`](https://crates.io/crates/html-to-markdown-rs)
569
+ - Node.js / Bun: [`@kreuzberg/html-to-markdown-node`](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node)
570
+ - Python: [`html-to-markdown`](https://pypi.org/project/html-to-markdown/)
571
+ - Ruby: [`html-to-markdown`](https://rubygems.org/gems/html-to-markdown)
572
+ - PHP: [`kreuzberg-dev/html-to-markdown`](https://packagist.org/packages/kreuzberg-dev/html-to-markdown)
573
+ - Rust crate and CLI: [`html-to-markdown-rs`](https://crates.io/crates/html-to-markdown-rs)
736
574
 
737
575
  ## Links
738
576
 
739
577
  - [GitHub Repository](https://github.com/kreuzberg-dev/html-to-markdown)
740
578
  - [Full Documentation](https://github.com/kreuzberg-dev/html-to-markdown/blob/main/README.md)
741
- - [Native Node Package](https://www.npmjs.com/package/html-to-markdown-node)
579
+ - [Native Node Package](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node)
742
580
  - [Python Package](https://pypi.org/project/html-to-markdown/)
743
- - [PHP Extension & Helpers](https://packagist.org/packages/kreuzberg-dev/html-to-markdown)
581
+ - [PHP Extension and Helpers](https://packagist.org/packages/kreuzberg-dev/html-to-markdown)
744
582
  - [Rust Crate](https://crates.io/crates/html-to-markdown-rs)
745
583
 
746
584
  ## License