html-to-markdown 2.5.2 → 2.5.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: dbdd98e748fec609d1b033648e4e78aef576a320a127daaa83202fff9a662f51
- data.tar.gz: 6a9ea3fa2b0eaaabc6d9618369da96899ea5295585b9354bbf9348915b04bac6
+ metadata.gz: 0f6d03456f76e7f7b1157f052c51dc333e1bd68e72a6bb7664f4701f9a592ed2
+ data.tar.gz: e4a09afb52aba7580f9805b91981d96007f04c37fa7ba6059867d0e818e260e0
  SHA512:
- metadata.gz: 141b341e6b3729cb804864a1af4e7a7411dd46efcd37645a401169bc57385ae623726a7b136049e5a78010695f1750e56635e670887319bdfb22a62493d7b7e1
- data.tar.gz: 3d9b17aa842772c81ca3f212983d614b4318be7635adaa8581005aca3909829d332975275d40dc33d79aac2bd7c3d72d8c05ca24e3d7ecf44f0b1c2e07518758
+ metadata.gz: d37723f521d0765ff47578457ba24d6636fe5102af2672cc3f8485bef323770cdb8155512378db4025ea47aa4c31146291da960a69ccdacb93a461e4c8e64593
+ data.tar.gz: 31523b5389627bfb1476133f1daa39ae2273d32dfdd72dac0f7927cc45267e41def028ff07bec8a833c567d07d02babda7e8e4e53c7dc63f7490c8ad7dc00481
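A `.gem` file is a tar archive whose members include `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`, and the digests above are the values recorded for the first two. A minimal Ruby sketch that recomputes the SHA256 digests from a downloaded gem and compares them with the archive's own `checksums.yaml.gz`; `verify_gem_checksums` is a hypothetical helper (not part of RubyGems or this gem), and it checks SHA256 only.

```ruby
require 'digest'
require 'rubygems/package'
require 'stringio'
require 'yaml'
require 'zlib'

# Hypothetical helper: recompute the SHA256 digests of metadata.gz and
# data.tar.gz inside a .gem archive (any readable IO) and compare them with
# the values recorded in the archive's own checksums.yaml.gz.
# Returns a hash such as { "metadata.gz" => true, "data.tar.gz" => true }.
def verify_gem_checksums(io)
  actual = {}
  recorded = nil

  Gem::Package::TarReader.new(io).each do |entry|
    case entry.full_name
    when 'metadata.gz', 'data.tar.gz'
      actual[entry.full_name] = Digest::SHA256.hexdigest(entry.read.to_s)
    when 'checksums.yaml.gz'
      yaml = Zlib.gunzip(entry.read).force_encoding(Encoding::UTF_8)
      recorded = YAML.safe_load(yaml).fetch('SHA256')
    end
  end
  raise ArgumentError, 'checksums.yaml.gz not found in archive' if recorded.nil?

  actual.to_h { |name, hex| [name, recorded[name] == hex] }
end

# Usage with a downloaded gem (the file name is an assumption):
#   File.open('html-to-markdown-2.5.4.gem', 'rb') { |f| p verify_gem_checksums(f) }
```

Because the helper trusts the archive's own `checksums.yaml.gz`, it catches corruption rather than tampering; for the latter, compare `Digest::SHA256.hexdigest` of each member with independently published values such as the ones in this diff.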
data/Cargo.toml CHANGED
@@ -1,6 +1,6 @@
  [package]
  name = "html-to-markdown-rb"
- version = "2.5.2"
+ version = "2.5.4"
  edition = "2021"
  authors = ["Na'aman Hirschfeld <nhirschfeld@gmail.com>"]
  license = "MIT"
@@ -21,7 +21,7 @@ crate-type = ["cdylib", "rlib"]
  default = []

  [dependencies]
- html-to-markdown-rs = { version = "2.5.2", features = ["inline-images"] }
+ html-to-markdown-rs = { version = "2.5.4", features = ["inline-images"] }
  magnus = { git = "https://github.com/matsadler/magnus", rev = "f6db11769efb517427bf7f121f9c32e18b059b38", features = ["rb-sys"] }

  [dev-dependencies]
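Both version strings in Cargo.toml move from 2.5.2 to 2.5.4 in this release, matching the `VERSION` constant bump further down. A release script that wants to keep the three in lockstep could check them with something like the Ruby sketch below; `cargo_versions` and `versions_in_sync?` are hypothetical helpers, and the regular expressions only handle the single-line layout shown in this diff rather than general TOML.

```ruby
# Hypothetical helpers: pull the crate version and the html-to-markdown-rs
# dependency version out of Cargo.toml text. This is a regex sketch for the
# layout shown in the diff, not a TOML parser.
PACKAGE_VERSION = /^version\s*=\s*"([^"]+)"/
RS_DEPENDENCY_VERSION = /^html-to-markdown-rs\s*=\s*\{[^}]*\bversion\s*=\s*"([^"]+)"/

# The first line-initial `version = "..."` is taken as the [package] version.
def cargo_versions(cargo_toml)
  {
    package: cargo_toml[PACKAGE_VERSION, 1],
    dependency: cargo_toml[RS_DEPENDENCY_VERSION, 1]
  }
end

# True when the crate version, the Rust dependency and the gem's VERSION agree.
def versions_in_sync?(cargo_toml, gem_version)
  cargo_versions(cargo_toml).values.all? { |v| v == gem_version }
end

# Usage (paths and require are assumptions about the checkout layout):
#   versions_in_sync?(File.read('Cargo.toml'), HtmlToMarkdown::VERSION)
```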
data/README.md CHANGED
@@ -1,10 +1,22 @@
  # html-to-markdown-rb

- Ruby bindings for the `html-to-markdown` Rust engine the same core that powers the Python wheels, Node.js NAPI bindings, WebAssembly package, and CLI. The gem exposes fast HTML → Markdown conversion with identical rendering behaviour across every supported language.
+ Blazing-fast HTML Markdown conversion for Ruby, powered by the same Rust engine used by our Python, Node.js, and WebAssembly packages. Ship identical Markdown across every runtime while enjoying native extension performance.

- [![RubyGems](https://badge.fury.io/rb/html-to-markdown.svg)](https://rubygems.org/gems/html-to-markdown)
+ [![Crates.io](https://img.shields.io/crates/v/html-to-markdown-rs.svg)](https://crates.io/crates/html-to-markdown-rs)
+ [![npm version](https://badge.fury.io/js/html-to-markdown-node.svg)](https://www.npmjs.com/package/html-to-markdown-node)
+ [![PyPI version](https://badge.fury.io/py/html-to-markdown.svg)](https://pypi.org/project/html-to-markdown/)
+ [![Gem Version](https://badge.fury.io/rb/html-to-markdown.svg)](https://rubygems.org/gems/html-to-markdown)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/Goldziher/html-to-markdown/blob/main/LICENSE)

+ ## Features
+
+ - ⚡ **Rust-fast**: Ruby bindings around a highly optimised Rust core (60‑80× faster than BeautifulSoup-based converters).
+ - 🔁 **Identical output**: Shares logic with the Python wheels, npm bindings, WASM package, and CLI — consistent Markdown everywhere.
+ - ⚙️ **Rich configuration**: Control heading styles, list indentation, whitespace handling, HTML preprocessing, and more.
+ - 🖼️ **Inline image extraction**: Pull out embedded images (PNG/JPEG/SVG/data URIs) alongside Markdown.
+ - 🧰 **Bundled CLI proxy**: Call the Rust CLI straight from Ruby or shell scripts.
+ - 🛠️ **First-class Rails support**: Works with `Gem.win_platform?` builds, supports Trusted Publishing, and compiles on install if no native gem matches.
+
  ## Installation

  ```bash
@@ -29,6 +41,18 @@ ridk exec pacman -S --needed --noconfirm base-devel mingw-w64-ucrt-x86_64-toolch

  This provides the standard headers (including `strings.h`) required for the bindgen step.

+ ## Performance Snapshot
+
+ Apple M4 • Real Wikipedia documents • `HtmlToMarkdown.convert` (Ruby)
+
+ | Document | Size | Latency | Throughput | Docs/sec |
+ | ------------------- | ----- | ------- | ---------- | -------- |
+ | Lists (Timeline) | 129KB | 0.69ms | 187 MB/s | 1,450 |
+ | Tables (Countries) | 360KB | 2.19ms | 164 MB/s | 456 |
+ | Mixed (Python wiki) | 656KB | 4.88ms | 134 MB/s | 205 |
+
+ > Same core, same benchmarks: the Ruby extension stays within single-digit % of the Rust CLI and mirrors the Python/Node numbers.
+
  ## Quick Start

  ```ruby
@@ -53,9 +77,11 @@ puts markdown
  # - Identical output across languages
  ```

- ### Conversion with Options
+ ## API

- All configuration mirrors the Rust API. Options accept symbols or strings and match the same defaults as the other bindings.
+ ### Conversion Options
+
+ Pass a Ruby hash (string or symbol keys) to tweak rendering. Every option maps one-for-one with the Rust/Python/Node APIs.

  ```ruby
  require 'html_to_markdown'
@@ -64,10 +90,31 @@ markdown = HtmlToMarkdown.convert(
  '<pre><code class="language-ruby">puts "hi"</code></pre>',
  heading_style: :atx,
  code_block_style: :fenced,
- bullets: ['*', '-', '+'],
- wrap: true,
- wrap_width: 80,
- preserve_tags: %w[table figure]
+ bullets: '*+-',
+ list_indent_type: :spaces,
+ list_indent_width: 2,
+ whitespace_mode: :normalized,
+ highlight_style: :double_equal
+ )
+
+ puts markdown
+ ```
+
+ ### HTML Preprocessing
+
+ Clean up scraped HTML (navigation, forms, malformed markup) before conversion:
+
+ ```ruby
+ require 'html_to_markdown'
+
+ markdown = HtmlToMarkdown.convert(
+ html,
+ preprocessing: {
+ enabled: true,
+ preset: :aggressive, # :minimal, :standard, :aggressive
+ remove_navigation: true,
+ remove_forms: true
+ }
  )
  ```

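The Performance Snapshot table added to the README in this release reads as if Throughput were size ÷ latency and Docs/sec were 1 ÷ latency. A quick Ruby check of that reading, treating 1 MB as 1000 KB; this is an interpretation, since the README does not say how the columns were computed.

```ruby
# Rows copied from the README's Performance Snapshot: [document, size in KB, latency in ms]
SNAPSHOT_ROWS = [
  ['Lists (Timeline)', 129, 0.69],
  ['Tables (Countries)', 360, 2.19],
  ['Mixed (Python wiki)', 656, 4.88]
].freeze

# KB per millisecond equals MB per second when 1 MB = 1000 KB.
def throughput_mb_per_s(size_kb, latency_ms)
  size_kb / latency_ms
end

# One conversion every latency_ms milliseconds.
def docs_per_second(latency_ms)
  1000.0 / latency_ms
end

SNAPSHOT_ROWS.each do |name, size_kb, latency_ms|
  puts format('%-20s %6.1f MB/s %7.1f docs/s', name,
              throughput_mb_per_s(size_kb, latency_ms),
              docs_per_second(latency_ms))
end
```

The derived throughputs round to the published 187, 164 and 134 MB/s, while the derived docs/sec values (about 1,449, 457 and 205) land within one of the table's figures, so that column looks separately rounded.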
@@ -94,7 +141,7 @@ result.inline_images.each do |img|
  end
  ```

- ### CLI Proxy
+ ## CLI

  The gem bundles a small proxy for the Rust CLI binary. Use it when you need parity with the standalone `html-to-markdown` executable.

@@ -109,10 +156,17 @@ You can also call the CLI binary directly for scripting:

  ```ruby
  HtmlToMarkdown::CLIProxy.call(['--version'])
- # => "html-to-markdown 2.5.2"
+ # => "html-to-markdown 2.5.4"
+ ```
+
+ Rebuild the CLI locally if you see `CLI binary not built` during tests:
+
+ ```bash
+ bundle exec rake compile # builds the extension
+ bundle exec ruby scripts/prepare_ruby_gem.rb # copies the CLI into lib/bin/
  ```

- ### Error Handling
+ ## Error Handling

  Conversion errors raise `HtmlToMarkdown::Error` (wrapping the Rust error context). CLI invocations use specialised subclasses:

@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module HtmlToMarkdown
- VERSION = '2.5.2'
+ VERSION = '2.5.4'
  end
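`VERSION` moves from '2.5.2' to '2.5.4'. An application that wants to refuse the older release at boot could compare the constant with a `Gem::Requirement`; the constraint below is an example choice, not something the gem prescribes, and `supported_html_to_markdown?` is a hypothetical helper.

```ruby
require 'rubygems'

# Example constraint (an assumption for illustration): accept 2.5.4 and later 2.x releases.
HTML_TO_MARKDOWN_REQUIREMENT = Gem::Requirement.new('>= 2.5.4', '< 3.0')

# Hypothetical helper: true when the given version string satisfies the constraint.
def supported_html_to_markdown?(version_string)
  HTML_TO_MARKDOWN_REQUIREMENT.satisfied_by?(Gem::Version.new(version_string))
end

# In an application, after `require 'html_to_markdown'`:
#   unless supported_html_to_markdown?(HtmlToMarkdown::VERSION)
#     raise "html-to-markdown #{HtmlToMarkdown::VERSION} is too old"
#   end
```

In a Gemfile the same constraint would be written `gem 'html-to-markdown', '>= 2.5.4', '< 3.0'`, which lets Bundler reject the older release before the application loads.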
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: html-to-markdown
  version: !ruby/object:Gem::Version
- version: 2.5.2
+ version: 2.5.4
  platform: ruby
  authors:
  - Na'aman Hirschfeld
@@ -30,8 +30,9 @@ dependencies:
  - - "<"
  - !ruby/object:Gem::Version
  version: '1.0'
- description: High-performance HTML to Markdown conversion from Ruby using Magnus and
- rb-sys.
+ description: |-
+ html-to-markdown wraps our ultra-fast Rust converter with a Ruby-native API via Magnus and rb-sys.
+ Enjoy identical output to the Python, Node, and WASM bindings, a bundled CLI proxy, and seamless cross-platform installs.
  email:
  - nhirschfeld@gmail.com
  executables:
@@ -44,7 +45,6 @@ files:
  - README.md
  - exe/html-to-markdown
  - extconf.rb
- - lib/bin/html-to-markdown
  - lib/html_to_markdown.rb
  - lib/html_to_markdown/cli.rb
  - lib/html_to_markdown/cli_proxy.rb
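The 2.5.4 gemspec no longer lists `lib/bin/html-to-markdown` among its packaged files, which appears to line up with the README's new note about rebuilding the CLI into `lib/bin/` when tests report `CLI binary not built`. A minimal sketch for checking whether a given specification packages the binary; `ships_cli_binary?` is a hypothetical helper built on the standard `Gem::Specification#files` list.

```ruby
require 'rubygems'

CLI_BINARY_PATH = 'lib/bin/html-to-markdown'

# Hypothetical helper: does this gem specification package the prebuilt CLI binary?
def ships_cli_binary?(spec)
  spec.files.include?(CLI_BINARY_PATH)
end

# Usage against an installed gem (raises a Gem::LoadError subclass if it is not installed):
#   ships_cli_binary?(Gem::Specification.find_by_name('html-to-markdown'))
```

This only inspects the packaged file list; a binary produced locally by the rebuild steps in the README would exist on disk without appearing there.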
@@ -81,5 +81,5 @@ requirements: []
  rubygems_version: 3.5.22
  signing_key:
  specification_version: 4
- summary: Ruby bindings for the html-to-markdown Rust library
+ summary: Blazing-fast HTML to Markdown conversion for Ruby, powered by Rust.
  test_files: []
Binary file