npm - html-to-markdown-wasm - Versions diffs - 2.15.0 → 2.16.0 - Mend

html-to-markdown-wasm 2.15.0 → 2.16.0

Files changed (18)

package/LICENSE +21 -0
package/README.md +1 -1
package/dist/LICENSE +21 -0
package/dist/README.md +16 -57
package/dist/html_to_markdown_wasm_bg.js +0 -5
package/dist/html_to_markdown_wasm_bg.wasm +0 -0
package/dist/package.json +2 -1
package/dist-node/LICENSE +21 -0
package/dist-node/README.md +16 -57
package/dist-node/html_to_markdown_wasm.js +0 -5
package/dist-node/html_to_markdown_wasm_bg.wasm +0 -0
package/dist-node/package.json +2 -1
package/dist-web/LICENSE +21 -0
package/dist-web/README.md +16 -57
package/dist-web/html_to_markdown_wasm.js +0 -4
package/dist-web/html_to_markdown_wasm_bg.wasm +0 -0
package/dist-web/package.json +2 -1
package/package.json +1 -1
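
The `+`/`-` counts above can be cross-checked against the `@@` hunk headers in the per-file diffs. For each hunk, the new-side line count minus the old-side line count is the net change, and the per-hunk values for a file should sum to its added count minus its removed count. The following is a minimal sketch of that check; `parseHunkHeader` is a hypothetical helper written for illustration and is not part of the package.

```javascript
// Parse a unified-diff hunk header such as "@@ -568,65 +568,24 @@".
// In the unified format an omitted count defaults to 1.
function parseHunkHeader(line) {
  const m = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(line);
  if (!m) return null;
  const oldCount = m[2] === undefined ? 1 : Number(m[2]);
  const newCount = m[4] === undefined ? 1 : Number(m[4]);
  return {
    oldStart: Number(m[1]),
    oldCount,
    newStart: Number(m[3]),
    newCount,
    net: newCount - oldCount, // lines added minus lines removed
  };
}

// The two hunk headers shown for package/dist/README.md in this diff.
const readmeHunks = [
  "@@ -99,7 +99,7 @@ const markdown = convert(html, {",
  "@@ -568,65 +568,24 @@ See the language-specific READMEs",
];

const readmeNet = readmeHunks
  .map((header) => parseHunkHeader(header).net)
  .reduce((sum, net) => sum + net, 0);

console.log(readmeNet); // -41, matching +16 -57 in the file list
```

The same check applied to `@@ -1147,11 +1147,6 @@` gives a net change of -5, which agrees with the `+0 -5` listed for `package/dist/html_to_markdown_wasm_bg.js`.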

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright 2024-2025 Na'aman Hirschfeld
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md CHANGED

@@ -46,7 +46,7 @@ Universal WebAssembly bindings with **excellent performance** across all JavaScr
 ### Benchmark Fixtures (Apple M4)
-Numbers captured via `task bench:bindings -- --language wasm` using the shared Wikipedia + hOCR suite:
+Numbers captured via the shared fixture harness in `tools/benchmark-harness`:
 | Document               | Size   | ops/sec (WASM) |
 | ---------------------- | ------ | -------------- |

package/dist/LICENSE ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright 2024-2025 Na'aman Hirschfeld
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/dist/README.md CHANGED

@@ -99,7 +99,7 @@ const markdown = convert(html, {
 });
 ```
-**Performance:** The shared fixture harness (`task bench:bindings`) now clocks C# at ~1.4k ops/sec (≈171 MB/s), Go at ~1.3k ops/sec (≈165 MB/s), Node, Python, and the Rust CLI at ~1.3–1.4k ops/sec (≈150 MB/s) on the 129 KB Wikipedia "Lists" page thanks to the new Buffer/Uint8Array fast paths and release-mode harness. Ruby stays close at ~1.2k ops/sec (≈150 MB/s), Java lands at ~1.0k ops/sec (≈126 MB/s), WASM hits ~0.85k ops/sec (≈108 MB/s), and PHP achieves ~0.3k ops/sec (≈35 MB/s)—all providing excellent throughput for production workloads.
+**Performance:** The shared fixture harness now lives in `tools/benchmark-harness` and is used to track Rust + binding throughput over time.
 See the JavaScript guides for full API documentation:
@@ -568,65 +568,24 @@ See the language-specific READMEs for complete configuration, hOCR workflows, an
 ## Performance
-Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):
+Benchmarked on Apple M4 using the shared fixture harness in `tools/benchmark-harness` (latest consolidated run: `20409971461`).
-### Operations per Second (higher is better)
+### Comparative Throughput (Median Across Fixtures)
-Derived directly from `tools/runtime-bench/results/latest.json` (Apple M4, shared fixtures):
+| Runtime | Median ops/sec | Median throughput (MB/s) | Peak memory (MB) | Successes |
+| ------- | -------------- | ------------------------ | ---------------- | --------- |
+| Rust | 1,060.3 | 116.4 | 171.3 | 56/56 |
+| Go | 1,496.3 | 131.1 | 22.9 | 16/16 |
+| Ruby | 2,155.5 | 300.4 | 280.3 | 48/48 |
+| PHP | 2,357.7 | 308.0 | 223.5 | 48/48 |
+| Elixir | 1,564.1 | 269.1 | 384.7 | 48/48 |
+| C# | 1,234.2 | 272.4 | 187.8 | 16/16 |
+| Java | 1,298.7 | 167.1 | 527.2 | 16/16 |
+| WASM | 1,485.8 | 157.6 | 95.3 | 48/48 |
+| Node.js (NAPI) | 2,054.2 | 306.5 | 95.4 | 48/48 |
+| Python (PyO3) | 3,120.3 | 307.5 | 83.5 | 48/48 |
-| Fixture                | Node.js (NAPI) | WASM | Python (PyO3) | Speedup (Node vs Python) |
-| ---------------------- | -------------- | ---- | ------------- | ------------------------ |
-| **Lists (Timeline)**   | 1,308          | 882  | 1,405         | **0.9×**                 |
-| **Tables (Countries)** | 331            | 242  | 352           | **0.9×**                 |
-| **Medium (Python)**    | 150            | 121  | 158           | **1.0×**                 |
-| **Large (Rust)**       | 163            | 124  | 183           | **0.9×**                 |
-| **Small (Intro)**      | 208            | 163  | 223           | **0.9×**                 |
-| **HOCR German PDF**    | 2,944          | 1,637| 2,991         | **1.0×**                 |
-| **HOCR Invoice**       | 27,326         | 7,775| 23,500        | **1.2×**                 |
-| **HOCR Tables**        | 3,475          | 1,667| 3,464         | **1.0×**                 |
-### Average Performance Summary
-| Implementation        | Avg ops/sec (fixtures) | vs Python | Notes |
-| --------------------- | ---------------------- | --------- | ----- |
-| **Rust CLI/Binary**   | **4,996**              | **1.2× faster** | Preprocessing now stays in one pass + reuses `parse_owned`, so the CLI leads every fixture |
-| **Node.js (NAPI-RS)** | **4,488**              | 1.0×      | Buffer/handle combo keeps Node within ~10 % of the Rust core while serving JS runtimes |
-| **Ruby (magnus)**     | **4,278**              | 0.9×      | Still extremely fast; ~25 k ops/sec on HOCR invoices without extra work |
-| **Python (PyO3)**     | **4,034**              | baseline  | Release-mode harness plus handle reuse keep it competitive, but it now trails Node/Rust |
-| **WebAssembly**       | **1,576**              | 0.4×      | Portable option for Deno/browsers/edge using the new byte APIs |
-| **PHP (ext)**         | **1,480**              | 0.4×      | Composer extension holds steady at 35–70 MB/s once the PIE build is installed |
-### Key Insights
-- **Rust now leads throughput**: the fused preprocessing + `parse_owned` pathway pushes the CLI to ~1.7 k ops/sec on the 129 KB lists page and ~31 k ops/sec on the HOCR invoice fixture.
-- **Node.js trails by only a few percent** after the buffer/handle work—~1.3 k ops/sec on the lists fixture and 27 k ops/sec on HOCR invoices without any UTF-16 copies.
-- **Python remains competitive** but now sits below Node/Rust (~4.0 k average ops/sec); stick to the v2 API to avoid the deprecated compatibility shim.
-- **Elixir matches the Rust core** because the Rustler NIF executes the same `ConversionOptions` pipeline—benchmarks land between 170–1,460 ops/sec on the Wikipedia fixtures and >20 k ops/sec on micro HOCR payloads.
-- **PHP and WASM stay in the 35–70 MB/s band**, which is plenty for Composer queues or edge runtimes as long as the extension/module is built ahead of time.
-- **Rust CLI results now mirror the bindings**, since `task bench:bindings` runs the harness with `cargo run --release` by default—profile there, then push optimizations down into each FFI layer.
-### Runtime Benchmarks (PHP / Ruby / Python / Node / WASM)
-Measured on Apple M4 using the fixture-driven runtime harness in `tools/runtime-bench` (`task bench:bindings`). Every binding consumes the exact same HTML fixtures and hOCR samples from `test_documents/`:
-| Document            | Size     | Ruby ops/sec | PHP ops/sec | Python ops/sec | Node ops/sec | WASM ops/sec | Elixir ops/sec | Rust ops/sec |
-| ------------------- | -------- | ------------ | ----------- | -------------- | ------------ | ------------ | -------------- | ------------ |
-| Lists (Timeline)    | 129 KB   | 1,349        | 533         | 1,405          | 1,308        | 882          | 1,463          | **1,700**    |
-| Tables (Countries)  | 360 KB   | 326          | 118         | 352            | 331          | 242          | 357            | **416**      |
-| Medium (Python)     | 657 KB   | 157          | 59          | 158            | 150          | 121          | 171            | **190**      |
-| Large (Rust)        | 567 KB   | 174          | 65          | 183            | 163          | 124          | 174            | **220**      |
-| Small (Intro)       | 463 KB   | 214          | 83          | 223            | 208          | 163          | 247            | **258**      |
-| HOCR German PDF     | 44 KB    | 2,936        | 1,007       | **2,991**      | 2,944        | 1,637        | 3,113          | 2,760        |
-| HOCR Invoice        | 4 KB     | 25,740       | 8,781       | 23,500         | 27,326       | 7,775        | 20,424         | **31,345**   |
-| HOCR Embedded Tables| 37 KB    | 3,328        | 1,194       | 3,464          | **3,475**    | 1,667        | 3,366          | 3,080        |
-The harness shells out to each runtime’s lightweight benchmark driver (`packages/*/bin/benchmark.*`, `crates/*/bin/benchmark.ts`), feeds fixtures defined in `tools/runtime-bench/fixtures/*.toml`, and writes machine-readable JSON reports (`tools/runtime-bench/results/latest.json`) for regression tracking. Add new languages or scenarios by extending those fixture files and drivers.
-Use `task bench:bindings` to regenerate throughput numbers across all bindings or `task bench:bindings:profile` to capture CPU/memory samples while the benchmarks run. To focus on specific languages or fixtures (for example, `task bench:bindings -- --language elixir`), pass `--language` / `--fixture` directly to `cargo run --manifest-path tools/runtime-bench/Cargo.toml -- …`.
-Need a call-stack view of the Rust core? Run `task flamegraph:rust` (or call the harness with `--language rust --flamegraph path.svg`) to profile a fixture and dump a ready-to-inspect flamegraph in `tools/runtime-bench/results/`.
-**Note on Python performance**: The current Python bindings have optimization opportunities. The v2 API with direct `convert()` calls performs best; avoid the v1 compatibility layer for performance-critical applications.
+Use `task bench:harness` to regenerate throughput numbers across the bindings, `task bench:harness:memory` for CPU/memory samples, and `task bench:harness:rust` for flamegraphs.
 ## Compatibility (v1 → v2)

package/dist/html_to_markdown_wasm_bg.js CHANGED

@@ -1147,11 +1147,6 @@ export function __wbg_done_62ea16af4ce34b24(arg0) {
     return ret;
 };
-export function __wbg_entries_83c79938054e065f(arg0) {
-    const ret = Object.entries(getObject(arg0));
-    return addHeapObject(ret);
-};
 export function __wbg_error_7534b8e9a36f1ab4(arg0, arg1) {
     let deferred0_0;
     let deferred0_1;
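
The removed `__wbg_entries_83c79938054e065f` import forwarded to the standard `Object.entries` and handed the result back to WebAssembly as an integer handle. Its removal from all three builds suggests, though the diff does not confirm it, that the 2.16.0 binary no longer enumerates a JavaScript object through this import. The following is a simplified sketch of what the shim did. The `heap` table here is an assumption standing in for the generated glue's object table, and the option keys are purely illustrative.

```javascript
// Simplified stand-in for the glue's object table (assumption: the
// generated code also recycles freed slots, which is omitted here).
const heap = [];

function addHeapObject(obj) {
  heap.push(obj);
  return heap.length - 1; // the index acts as the handle passed to WebAssembly
}

function getObject(idx) {
  return heap[idx];
}

// Same body as the removed shim, under a hypothetical name.
function entriesShim(arg0) {
  const ret = Object.entries(getObject(arg0));
  return addHeapObject(ret);
}

// Hypothetical options object; the keys are illustrative only.
const optionsHandle = addHeapObject({ headingStyle: "atx", wrap: true });
const entriesHandle = entriesShim(optionsHandle);
const entries = getObject(entriesHandle);

console.log(entries); // an array of [key, value] pairs
```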

package/dist/html_to_markdown_wasm_bg.wasm CHANGED

Binary file

package/dist/package.json CHANGED

@@ -4,7 +4,8 @@
   "collaborators": [
     "Na'aman Hirschfeld <nhirschfeld@gmail.com>"
   ],
-  "version": "2.15.0",
+  "description": "HTML to Markdown conversion for WebAssembly targets",
+  "version": "2.16.0",
   "license": "MIT",
   "repository": {
     "type": "git",

package/dist-node/LICENSE ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright 2024-2025 Na'aman Hirschfeld
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/dist-node/README.md CHANGED

@@ -99,7 +99,7 @@ const markdown = convert(html, {
 });
 ```
-**Performance:** The shared fixture harness (`task bench:bindings`) now clocks C# at ~1.4k ops/sec (≈171 MB/s), Go at ~1.3k ops/sec (≈165 MB/s), Node, Python, and the Rust CLI at ~1.3–1.4k ops/sec (≈150 MB/s) on the 129 KB Wikipedia "Lists" page thanks to the new Buffer/Uint8Array fast paths and release-mode harness. Ruby stays close at ~1.2k ops/sec (≈150 MB/s), Java lands at ~1.0k ops/sec (≈126 MB/s), WASM hits ~0.85k ops/sec (≈108 MB/s), and PHP achieves ~0.3k ops/sec (≈35 MB/s)—all providing excellent throughput for production workloads.
+**Performance:** The shared fixture harness now lives in `tools/benchmark-harness` and is used to track Rust + binding throughput over time.
 See the JavaScript guides for full API documentation:
@@ -568,65 +568,24 @@ See the language-specific READMEs for complete configuration, hOCR workflows, an
 ## Performance
-Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):
+Benchmarked on Apple M4 using the shared fixture harness in `tools/benchmark-harness` (latest consolidated run: `20409971461`).
-### Operations per Second (higher is better)
+### Comparative Throughput (Median Across Fixtures)
-Derived directly from `tools/runtime-bench/results/latest.json` (Apple M4, shared fixtures):
+| Runtime | Median ops/sec | Median throughput (MB/s) | Peak memory (MB) | Successes |
+| ------- | -------------- | ------------------------ | ---------------- | --------- |
+| Rust | 1,060.3 | 116.4 | 171.3 | 56/56 |
+| Go | 1,496.3 | 131.1 | 22.9 | 16/16 |
+| Ruby | 2,155.5 | 300.4 | 280.3 | 48/48 |
+| PHP | 2,357.7 | 308.0 | 223.5 | 48/48 |
+| Elixir | 1,564.1 | 269.1 | 384.7 | 48/48 |
+| C# | 1,234.2 | 272.4 | 187.8 | 16/16 |
+| Java | 1,298.7 | 167.1 | 527.2 | 16/16 |
+| WASM | 1,485.8 | 157.6 | 95.3 | 48/48 |
+| Node.js (NAPI) | 2,054.2 | 306.5 | 95.4 | 48/48 |
+| Python (PyO3) | 3,120.3 | 307.5 | 83.5 | 48/48 |
-| Fixture                | Node.js (NAPI) | WASM | Python (PyO3) | Speedup (Node vs Python) |
-| ---------------------- | -------------- | ---- | ------------- | ------------------------ |
-| **Lists (Timeline)**   | 1,308          | 882  | 1,405         | **0.9×**                 |
-| **Tables (Countries)** | 331            | 242  | 352           | **0.9×**                 |
-| **Medium (Python)**    | 150            | 121  | 158           | **1.0×**                 |
-| **Large (Rust)**       | 163            | 124  | 183           | **0.9×**                 |
-| **Small (Intro)**      | 208            | 163  | 223           | **0.9×**                 |
-| **HOCR German PDF**    | 2,944          | 1,637| 2,991         | **1.0×**                 |
-| **HOCR Invoice**       | 27,326         | 7,775| 23,500        | **1.2×**                 |
-| **HOCR Tables**        | 3,475          | 1,667| 3,464         | **1.0×**                 |
-### Average Performance Summary
-| Implementation        | Avg ops/sec (fixtures) | vs Python | Notes |
-| --------------------- | ---------------------- | --------- | ----- |
-| **Rust CLI/Binary**   | **4,996**              | **1.2× faster** | Preprocessing now stays in one pass + reuses `parse_owned`, so the CLI leads every fixture |
-| **Node.js (NAPI-RS)** | **4,488**              | 1.0×      | Buffer/handle combo keeps Node within ~10 % of the Rust core while serving JS runtimes |
-| **Ruby (magnus)**     | **4,278**              | 0.9×      | Still extremely fast; ~25 k ops/sec on HOCR invoices without extra work |
-| **Python (PyO3)**     | **4,034**              | baseline  | Release-mode harness plus handle reuse keep it competitive, but it now trails Node/Rust |
-| **WebAssembly**       | **1,576**              | 0.4×      | Portable option for Deno/browsers/edge using the new byte APIs |
-| **PHP (ext)**         | **1,480**              | 0.4×      | Composer extension holds steady at 35–70 MB/s once the PIE build is installed |
-### Key Insights
-- **Rust now leads throughput**: the fused preprocessing + `parse_owned` pathway pushes the CLI to ~1.7 k ops/sec on the 129 KB lists page and ~31 k ops/sec on the HOCR invoice fixture.
-- **Node.js trails by only a few percent** after the buffer/handle work—~1.3 k ops/sec on the lists fixture and 27 k ops/sec on HOCR invoices without any UTF-16 copies.
-- **Python remains competitive** but now sits below Node/Rust (~4.0 k average ops/sec); stick to the v2 API to avoid the deprecated compatibility shim.
-- **Elixir matches the Rust core** because the Rustler NIF executes the same `ConversionOptions` pipeline—benchmarks land between 170–1,460 ops/sec on the Wikipedia fixtures and >20 k ops/sec on micro HOCR payloads.
-- **PHP and WASM stay in the 35–70 MB/s band**, which is plenty for Composer queues or edge runtimes as long as the extension/module is built ahead of time.
-- **Rust CLI results now mirror the bindings**, since `task bench:bindings` runs the harness with `cargo run --release` by default—profile there, then push optimizations down into each FFI layer.
-### Runtime Benchmarks (PHP / Ruby / Python / Node / WASM)
-Measured on Apple M4 using the fixture-driven runtime harness in `tools/runtime-bench` (`task bench:bindings`). Every binding consumes the exact same HTML fixtures and hOCR samples from `test_documents/`:
-| Document            | Size     | Ruby ops/sec | PHP ops/sec | Python ops/sec | Node ops/sec | WASM ops/sec | Elixir ops/sec | Rust ops/sec |
-| ------------------- | -------- | ------------ | ----------- | -------------- | ------------ | ------------ | -------------- | ------------ |
-| Lists (Timeline)    | 129 KB   | 1,349        | 533         | 1,405          | 1,308        | 882          | 1,463          | **1,700**    |
-| Tables (Countries)  | 360 KB   | 326          | 118         | 352            | 331          | 242          | 357            | **416**      |
-| Medium (Python)     | 657 KB   | 157          | 59          | 158            | 150          | 121          | 171            | **190**      |
-| Large (Rust)        | 567 KB   | 174          | 65          | 183            | 163          | 124          | 174            | **220**      |
-| Small (Intro)       | 463 KB   | 214          | 83          | 223            | 208          | 163          | 247            | **258**      |
-| HOCR German PDF     | 44 KB    | 2,936        | 1,007       | **2,991**      | 2,944        | 1,637        | 3,113          | 2,760        |
-| HOCR Invoice        | 4 KB     | 25,740       | 8,781       | 23,500         | 27,326       | 7,775        | 20,424         | **31,345**   |
-| HOCR Embedded Tables| 37 KB    | 3,328        | 1,194       | 3,464          | **3,475**    | 1,667        | 3,366          | 3,080        |
-The harness shells out to each runtime’s lightweight benchmark driver (`packages/*/bin/benchmark.*`, `crates/*/bin/benchmark.ts`), feeds fixtures defined in `tools/runtime-bench/fixtures/*.toml`, and writes machine-readable JSON reports (`tools/runtime-bench/results/latest.json`) for regression tracking. Add new languages or scenarios by extending those fixture files and drivers.
-Use `task bench:bindings` to regenerate throughput numbers across all bindings or `task bench:bindings:profile` to capture CPU/memory samples while the benchmarks run. To focus on specific languages or fixtures (for example, `task bench:bindings -- --language elixir`), pass `--language` / `--fixture` directly to `cargo run --manifest-path tools/runtime-bench/Cargo.toml -- …`.
-Need a call-stack view of the Rust core? Run `task flamegraph:rust` (or call the harness with `--language rust --flamegraph path.svg`) to profile a fixture and dump a ready-to-inspect flamegraph in `tools/runtime-bench/results/`.
-**Note on Python performance**: The current Python bindings have optimization opportunities. The v2 API with direct `convert()` calls performs best; avoid the v1 compatibility layer for performance-critical applications.
+Use `task bench:harness` to regenerate throughput numbers across the bindings, `task bench:harness:memory` for CPU/memory samples, and `task bench:harness:rust` for flamegraphs.
 ## Compatibility (v1 → v2)

package/dist-node/html_to_markdown_wasm.js CHANGED

@@ -1154,11 +1154,6 @@ exports.__wbg_done_62ea16af4ce34b24 = function(arg0) {
     return ret;
 };
-exports.__wbg_entries_83c79938054e065f = function(arg0) {
-    const ret = Object.entries(getObject(arg0));
-    return addHeapObject(ret);
-};
 exports.__wbg_error_7534b8e9a36f1ab4 = function(arg0, arg1) {
     let deferred0_0;
     let deferred0_1;

package/dist-node/html_to_markdown_wasm_bg.wasm CHANGED

Binary file

package/dist-node/package.json CHANGED

@@ -3,7 +3,8 @@
   "collaborators": [
     "Na'aman Hirschfeld <nhirschfeld@gmail.com>"
   ],
-  "version": "2.15.0",
+  "description": "HTML to Markdown conversion for WebAssembly targets",
+  "version": "2.16.0",
   "license": "MIT",
   "repository": {
     "type": "git",

package/dist-web/LICENSE ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright 2024-2025 Na'aman Hirschfeld
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/dist-web/README.md CHANGED

@@ -99,7 +99,7 @@ const markdown = convert(html, {
 });
 ```
-**Performance:** The shared fixture harness (`task bench:bindings`) now clocks C# at ~1.4k ops/sec (≈171 MB/s), Go at ~1.3k ops/sec (≈165 MB/s), Node, Python, and the Rust CLI at ~1.3–1.4k ops/sec (≈150 MB/s) on the 129 KB Wikipedia "Lists" page thanks to the new Buffer/Uint8Array fast paths and release-mode harness. Ruby stays close at ~1.2k ops/sec (≈150 MB/s), Java lands at ~1.0k ops/sec (≈126 MB/s), WASM hits ~0.85k ops/sec (≈108 MB/s), and PHP achieves ~0.3k ops/sec (≈35 MB/s)—all providing excellent throughput for production workloads.
+**Performance:** The shared fixture harness now lives in `tools/benchmark-harness` and is used to track Rust + binding throughput over time.
 See the JavaScript guides for full API documentation:
@@ -568,65 +568,24 @@ See the language-specific READMEs for complete configuration, hOCR workflows, an
 ## Performance
-Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):
+Benchmarked on Apple M4 using the shared fixture harness in `tools/benchmark-harness` (latest consolidated run: `20409971461`).
-### Operations per Second (higher is better)
+### Comparative Throughput (Median Across Fixtures)
-Derived directly from `tools/runtime-bench/results/latest.json` (Apple M4, shared fixtures):
+| Runtime | Median ops/sec | Median throughput (MB/s) | Peak memory (MB) | Successes |
+| ------- | -------------- | ------------------------ | ---------------- | --------- |
+| Rust | 1,060.3 | 116.4 | 171.3 | 56/56 |
+| Go | 1,496.3 | 131.1 | 22.9 | 16/16 |
+| Ruby | 2,155.5 | 300.4 | 280.3 | 48/48 |
+| PHP | 2,357.7 | 308.0 | 223.5 | 48/48 |
+| Elixir | 1,564.1 | 269.1 | 384.7 | 48/48 |
+| C# | 1,234.2 | 272.4 | 187.8 | 16/16 |
+| Java | 1,298.7 | 167.1 | 527.2 | 16/16 |
+| WASM | 1,485.8 | 157.6 | 95.3 | 48/48 |
+| Node.js (NAPI) | 2,054.2 | 306.5 | 95.4 | 48/48 |
+| Python (PyO3) | 3,120.3 | 307.5 | 83.5 | 48/48 |
-| Fixture                | Node.js (NAPI) | WASM | Python (PyO3) | Speedup (Node vs Python) |
-| ---------------------- | -------------- | ---- | ------------- | ------------------------ |
-| **Lists (Timeline)**   | 1,308          | 882  | 1,405         | **0.9×**                 |
-| **Tables (Countries)** | 331            | 242  | 352           | **0.9×**                 |
-| **Medium (Python)**    | 150            | 121  | 158           | **1.0×**                 |
-| **Large (Rust)**       | 163            | 124  | 183           | **0.9×**                 |
-| **Small (Intro)**      | 208            | 163  | 223           | **0.9×**                 |
-| **HOCR German PDF**    | 2,944          | 1,637| 2,991         | **1.0×**                 |
-| **HOCR Invoice**       | 27,326         | 7,775| 23,500        | **1.2×**                 |
-| **HOCR Tables**        | 3,475          | 1,667| 3,464         | **1.0×**                 |
-### Average Performance Summary
-| Implementation        | Avg ops/sec (fixtures) | vs Python | Notes |
-| --------------------- | ---------------------- | --------- | ----- |
-| **Rust CLI/Binary**   | **4,996**              | **1.2× faster** | Preprocessing now stays in one pass + reuses `parse_owned`, so the CLI leads every fixture |
-| **Node.js (NAPI-RS)** | **4,488**              | 1.0×      | Buffer/handle combo keeps Node within ~10 % of the Rust core while serving JS runtimes |
-| **Ruby (magnus)**     | **4,278**              | 0.9×      | Still extremely fast; ~25 k ops/sec on HOCR invoices without extra work |
-| **Python (PyO3)**     | **4,034**              | baseline  | Release-mode harness plus handle reuse keep it competitive, but it now trails Node/Rust |
-| **WebAssembly**       | **1,576**              | 0.4×      | Portable option for Deno/browsers/edge using the new byte APIs |
-| **PHP (ext)**         | **1,480**              | 0.4×      | Composer extension holds steady at 35–70 MB/s once the PIE build is installed |
-### Key Insights
-- **Rust now leads throughput**: the fused preprocessing + `parse_owned` pathway pushes the CLI to ~1.7 k ops/sec on the 129 KB lists page and ~31 k ops/sec on the HOCR invoice fixture.
-- **Node.js trails by only a few percent** after the buffer/handle work—~1.3 k ops/sec on the lists fixture and 27 k ops/sec on HOCR invoices without any UTF-16 copies.
-- **Python remains competitive** but now sits below Node/Rust (~4.0 k average ops/sec); stick to the v2 API to avoid the deprecated compatibility shim.
-- **Elixir matches the Rust core** because the Rustler NIF executes the same `ConversionOptions` pipeline—benchmarks land between 170–1,460 ops/sec on the Wikipedia fixtures and >20 k ops/sec on micro HOCR payloads.
-- **PHP and WASM stay in the 35–70 MB/s band**, which is plenty for Composer queues or edge runtimes as long as the extension/module is built ahead of time.
-- **Rust CLI results now mirror the bindings**, since `task bench:bindings` runs the harness with `cargo run --release` by default—profile there, then push optimizations down into each FFI layer.
-### Runtime Benchmarks (PHP / Ruby / Python / Node / WASM)
-Measured on Apple M4 using the fixture-driven runtime harness in `tools/runtime-bench` (`task bench:bindings`). Every binding consumes the exact same HTML fixtures and hOCR samples from `test_documents/`:
-| Document            | Size     | Ruby ops/sec | PHP ops/sec | Python ops/sec | Node ops/sec | WASM ops/sec | Elixir ops/sec | Rust ops/sec |
-| ------------------- | -------- | ------------ | ----------- | -------------- | ------------ | ------------ | -------------- | ------------ |
-| Lists (Timeline)    | 129 KB   | 1,349        | 533         | 1,405          | 1,308        | 882          | 1,463          | **1,700**    |
-| Tables (Countries)  | 360 KB   | 326          | 118         | 352            | 331          | 242          | 357            | **416**      |
-| Medium (Python)     | 657 KB   | 157          | 59          | 158            | 150          | 121          | 171            | **190**      |
-| Large (Rust)        | 567 KB   | 174          | 65          | 183            | 163          | 124          | 174            | **220**      |
-| Small (Intro)       | 463 KB   | 214          | 83          | 223            | 208          | 163          | 247            | **258**      |
-| HOCR German PDF     | 44 KB    | 2,936        | 1,007       | **2,991**      | 2,944        | 1,637        | 3,113          | 2,760        |
-| HOCR Invoice        | 4 KB     | 25,740       | 8,781       | 23,500         | 27,326       | 7,775        | 20,424         | **31,345**   |
-| HOCR Embedded Tables| 37 KB    | 3,328        | 1,194       | 3,464          | **3,475**    | 1,667        | 3,366          | 3,080        |
-The harness shells out to each runtime’s lightweight benchmark driver (`packages/*/bin/benchmark.*`, `crates/*/bin/benchmark.ts`), feeds fixtures defined in `tools/runtime-bench/fixtures/*.toml`, and writes machine-readable JSON reports (`tools/runtime-bench/results/latest.json`) for regression tracking. Add new languages or scenarios by extending those fixture files and drivers.
-Use `task bench:bindings` to regenerate throughput numbers across all bindings or `task bench:bindings:profile` to capture CPU/memory samples while the benchmarks run. To focus on specific languages or fixtures (for example, `task bench:bindings -- --language elixir`), pass `--language` / `--fixture` directly to `cargo run --manifest-path tools/runtime-bench/Cargo.toml -- …`.
-Need a call-stack view of the Rust core? Run `task flamegraph:rust` (or call the harness with `--language rust --flamegraph path.svg`) to profile a fixture and dump a ready-to-inspect flamegraph in `tools/runtime-bench/results/`.
-**Note on Python performance**: The current Python bindings have optimization opportunities. The v2 API with direct `convert()` calls performs best; avoid the v1 compatibility layer for performance-critical applications.
+Use `task bench:harness` to regenerate throughput numbers across the bindings, `task bench:harness:memory` for CPU/memory samples, and `task bench:harness:rust` for flamegraphs.
 ## Compatibility (v1 → v2)

package/dist-web/html_to_markdown_wasm.js CHANGED

@@ -1158,10 +1158,6 @@ function __wbg_get_imports() {
         const ret = getObject(arg0).done;
         return ret;
     };
-    imports.wbg.__wbg_entries_83c79938054e065f = function(arg0) {
-        const ret = Object.entries(getObject(arg0));
-        return addHeapObject(ret);
-    };
     imports.wbg.__wbg_error_7534b8e9a36f1ab4 = function(arg0, arg1) {
         let deferred0_0;
         let deferred0_1;

package/dist-web/html_to_markdown_wasm_bg.wasm CHANGED

Binary file

package/dist-web/package.json CHANGED

@@ -4,7 +4,8 @@
   "collaborators": [
     "Na'aman Hirschfeld <nhirschfeld@gmail.com>"
   ],
-  "version": "2.15.0",
+  "description": "HTML to Markdown conversion for WebAssembly targets",
+  "version": "2.16.0",
   "license": "MIT",
   "repository": {
     "type": "git",

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "html-to-markdown-wasm",
-	"version": "2.15.0",
+	"version": "2.16.0",
 	"description": "High-performance HTML to Markdown converter - WebAssembly bindings",
 	"main": "dist/html_to_markdown_wasm.js",
 	"types": "dist/html_to_markdown_wasm.d.ts",