html-to-markdown 2.5.4 → 2.5.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 0f6d03456f76e7f7b1157f052c51dc333e1bd68e72a6bb7664f4701f9a592ed2
- data.tar.gz: e4a09afb52aba7580f9805b91981d96007f04c37fa7ba6059867d0e818e260e0
+ metadata.gz: 1829c785edaf223adaa67b69c7350264e7bc55de02e1e9b40451f4b61222ae42
+ data.tar.gz: 303ef1c08e294512de540e896d5b7d0f652a507ffb307544d847b1b1277adc24
  SHA512:
- metadata.gz: d37723f521d0765ff47578457ba24d6636fe5102af2672cc3f8485bef323770cdb8155512378db4025ea47aa4c31146291da960a69ccdacb93a461e4c8e64593
- data.tar.gz: 31523b5389627bfb1476133f1daa39ae2273d32dfdd72dac0f7927cc45267e41def028ff07bec8a833c567d07d02babda7e8e4e53c7dc63f7490c8ad7dc00481
+ metadata.gz: b4df160116b63a80814855deaa288bd6456b47745567844347a8b99bb1ba6c4251e9fdfd176d9c45c3ed2fe77434d6c3d39bcb725683abf03c9c1abecf071899
+ data.tar.gz: 4c3fb6133b606408d9907b51b49bf71ec6b8dee8f02fffe6ea89eaa2b0d95152e37fd5cafa2c19e2a42298a0eebbd10bb9fde4b3b67e824b6ff0729280490305
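The digests in checksums.yaml can be checked locally. A `.gem` file is a plain tar archive that contains `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`, so after unpacking it the members can be hashed with Ruby's standard `digest` library. The following is a minimal sketch, not part of the gem: the helper name `sha256_matches?` and the file location are assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical helper (not part of html-to-markdown): compare a file's
# SHA256 digest against an expected hex string such as one from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Expected value copied from the 2.5.6 side of the checksums.yaml hunk.
expected_data_sha256 =
  '303ef1c08e294512de540e896d5b7d0f652a507ffb307544d847b1b1277adc24'

# Usage, assuming data.tar.gz was extracted from the .gem archive into the
# current directory:
#   sha256_matches?('data.tar.gz', expected_data_sha256)
```

The same pattern works for the SHA512 entries by swapping in `Digest::SHA512`.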
data/Cargo.toml CHANGED
@@ -1,6 +1,6 @@
  [package]
  name = "html-to-markdown-rb"
- version = "2.5.4"
+ version = "2.5.6"
  edition = "2021"
  authors = ["Na'aman Hirschfeld <nhirschfeld@gmail.com>"]
  license = "MIT"
@@ -21,7 +21,7 @@ crate-type = ["cdylib", "rlib"]
  default = []
 
  [dependencies]
- html-to-markdown-rs = { version = "2.5.4", features = ["inline-images"] }
+ html-to-markdown-rs = { version = "2.5.6", features = ["inline-images"] }
  magnus = { git = "https://github.com/matsadler/magnus", rev = "f6db11769efb517427bf7f121f9c32e18b059b38", features = ["rb-sys"] }
 
  [dev-dependencies]
data/README.md CHANGED
@@ -17,6 +17,13 @@ Blazing-fast HTML → Markdown conversion for Ruby, powered by the same Rust eng
  - 🧰 **Bundled CLI proxy**: Call the Rust CLI straight from Ruby or shell scripts.
  - 🛠️ **First-class Rails support**: Works with `Gem.win_platform?` builds, supports Trusted Publishing, and compiles on install if no native gem matches.
 
+ ## Documentation & Support
+
+ - [GitHub repository](https://github.com/Goldziher/html-to-markdown)
+ - [Issue tracker](https://github.com/Goldziher/html-to-markdown/issues)
+ - [Changelog](https://github.com/Goldziher/html-to-markdown/blob/main/CHANGELOG.md)
+ - [Live demo (WASM)](https://goldziher.github.io/html-to-markdown/)
+
  ## Installation
 
  ```bash
@@ -156,7 +163,7 @@ You can also call the CLI binary directly for scripting:
 
  ```ruby
  HtmlToMarkdown::CLIProxy.call(['--version'])
- # => "html-to-markdown 2.5.4"
+ # => "html-to-markdown 2.5.6"
  ```
 
  Rebuild the CLI locally if you see `CLI binary not built` during tests:
data/extconf.rb CHANGED
@@ -5,7 +5,7 @@ require 'rb_sys/mkmf'
  require 'rbconfig'
 
  if RbConfig::CONFIG['host_os'] =~ /mswin|mingw/
- devkit = ENV['RI_DEVKIT']
+ devkit = ENV.fetch('RI_DEVKIT', nil)
  prefix = ENV['MSYSTEM_PREFIX'] || '/ucrt64'
 
  if devkit
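The `ENV['RI_DEVKIT']` to `ENV.fetch('RI_DEVKIT', nil)` change in the extconf.rb hunk appears to be behaviour-preserving: both forms return `nil` when the variable is unset, so the later `if devkit` check is unaffected. The diff does not state the motivation; it resembles the rewrite suggested by RuboCop's `Style/FetchEnvVar` cop, but that is an assumption. A minimal sketch of the equivalence, using a made-up variable name so no real setting is touched:

```ruby
# Hypothetical variable name, chosen only for this demonstration.
key = 'HTML_TO_MARKDOWN_DEMO_UNSET'
ENV.delete(key)

bracket = ENV[key]             # nil when the variable is unset
fetched = ENV.fetch(key, nil)  # also nil, with the default spelled out

# Without a default, ENV.fetch raises KeyError instead of returning nil.
raised = begin
  ENV.fetch(key)
  false
rescue KeyError
  true
end

# When the variable is set, both forms return the same string.
ENV[key] = 'C:\\Ruby\\msys64'
same_when_set = ENV[key] == ENV.fetch(key, nil)
ENV.delete(key)
```

The explicit `nil` default documents that a missing variable is expected rather than an oversight.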
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module HtmlToMarkdown
- VERSION = '2.5.4'
+ VERSION = '2.5.6'
  end
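Because only the patch segment changes (2.5.4 to 2.5.6), whether a consumer picks up this release depends on how the dependency is constrained. The sketch below uses RubyGems' own `Gem::Version` and `Gem::Requirement` classes; the two constraints are hypothetical examples of what a Gemfile might contain, not taken from this package.

```ruby
require 'rubygems'

old_version = Gem::Version.new('2.5.4')
new_version = Gem::Version.new('2.5.6')

# Hypothetical consumer constraints, for illustration only.
pessimistic = Gem::Requirement.new('~> 2.5.4')  # means >= 2.5.4 and < 2.6
exact       = Gem::Requirement.new('= 2.5.4')

picks_up_new = pessimistic.satisfied_by?(new_version)  # pessimistic pin allows 2.5.6
stays_pinned = !exact.satisfied_by?(new_version)       # exact pin does not

# Major and minor segments are unchanged, so this is a patch-level bump.
patch_only = old_version.segments[0, 2] == new_version.segments[0, 2]
```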
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: html-to-markdown
  version: !ruby/object:Gem::Version
- version: 2.5.4
+ version: 2.5.6
  platform: ruby
  authors:
  - Na'aman Hirschfeld
@@ -30,16 +30,90 @@ dependencies:
  - - "<"
  - !ruby/object:Gem::Version
  version: '1.0'
- description: |-
- html-to-markdown wraps our ultra-fast Rust converter with a Ruby-native API via Magnus and rb-sys.
- Enjoy identical output to the Python, Node, and WASM bindings, a bundled CLI proxy, and seamless cross-platform installs.
+ description: "# html-to-markdown-rb\n\nBlazing-fast HTML → Markdown conversion for
+ Ruby, powered by the same Rust engine used by our Python, Node.js, and WebAssembly
+ packages. Ship identical Markdown across every runtime while enjoying native extension
+ performance.\n\n[![Crates.io](https://img.shields.io/crates/v/html-to-markdown-rs.svg)](https://crates.io/crates/html-to-markdown-rs)\n[![npm
+ version](https://badge.fury.io/js/html-to-markdown-node.svg)](https://www.npmjs.com/package/html-to-markdown-node)\n[![PyPI
+ version](https://badge.fury.io/py/html-to-markdown.svg)](https://pypi.org/project/html-to-markdown/)\n[![Gem
+ Version](https://badge.fury.io/rb/html-to-markdown.svg)](https://rubygems.org/gems/html-to-markdown)\n[![License:
+ MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/Goldziher/html-to-markdown/blob/main/LICENSE)\n\n##
+ Features\n\n- ⚡ **Rust-fast**: Ruby bindings around a highly optimised Rust core
+ (60‑80× faster than BeautifulSoup-based converters).\n- \U0001F501 **Identical output**:
+ Shares logic with the Python wheels, npm bindings, WASM package, and CLI — consistent
+ Markdown everywhere.\n- ⚙️ **Rich configuration**: Control heading styles, list
+ indentation, whitespace handling, HTML preprocessing, and more.\n- \U0001F5BC️ **Inline
+ image extraction**: Pull out embedded images (PNG/JPEG/SVG/data URIs) alongside
+ Markdown.\n- \U0001F9F0 **Bundled CLI proxy**: Call the Rust CLI straight from Ruby
+ or shell scripts.\n- \U0001F6E0️ **First-class Rails support**: Works with `Gem.win_platform?`
+ builds, supports Trusted Publishing, and compiles on install if no native gem matches.\n\n##
+ Documentation & Support\n\n- [GitHub repository](https://github.com/Goldziher/html-to-markdown)\n-
+ [Issue tracker](https://github.com/Goldziher/html-to-markdown/issues)\n- [Changelog](https://github.com/Goldziher/html-to-markdown/blob/main/CHANGELOG.md)\n-
+ [Live demo (WASM)](https://goldziher.github.io/html-to-markdown/)\n\n## Installation\n\n```bash\nbundle
+ add html-to-markdown\n# or\ngem install html-to-markdown\n```\n\nAdd the gem to
+ your project and Bundler will compile the native Rust extension on first install.\n\n###
+ Requirements\n\n- Ruby **3.2+** (Magnus relies on the fiber scheduler APIs added
+ in 3.2)\n- Rust toolchain **1.85+** with Cargo available on your `$PATH`\n- Ruby
+ development headers (`ruby-dev`, `ruby-devel`, or the platform equivalent)\n\n**Windows**:
+ install [RubyInstaller with MSYS2](https://rubyinstaller.org/) (UCRT64). Run once:\n\n```powershell\nridk
+ exec pacman -S --needed --noconfirm base-devel mingw-w64-ucrt-x86_64-toolchain\n```\n\nThis
+ provides the standard headers (including `strings.h`) required for the bindgen step.\n\n##
+ Performance Snapshot\n\nApple M4 • Real Wikipedia documents • `HtmlToMarkdown.convert`
+ (Ruby)\n\n| Document | Size | Latency | Throughput | Docs/sec |\n| -------------------
+ | ----- | ------- | ---------- | -------- |\n| Lists (Timeline) | 129KB | 0.69ms
+ \ | 187 MB/s | 1,450 |\n| Tables (Countries) | 360KB | 2.19ms | 164 MB/s
+ \ | 456 |\n| Mixed (Python wiki) | 656KB | 4.88ms | 134 MB/s | 205 |\n\n>
+ Same core, same benchmarks: the Ruby extension stays within single-digit % of the
+ Rust CLI and mirrors the Python/Node numbers.\n\n## Quick Start\n\n```ruby\nrequire
+ 'html_to_markdown'\n\nhtml = <<~HTML\n <h1>Welcome</h1>\n <p>This is <strong>Rust-fast</strong>
+ conversion!</p>\n <ul>\n <li>Native extension</li>\n <li>Identical output
+ across languages</li>\n </ul>\nHTML\n\nmarkdown = HtmlToMarkdown.convert(html)\nputs
+ markdown\n# # Welcome\n#\n# This is **Rust-fast** conversion!\n#\n# - Native extension\n#
+ - Identical output across languages\n```\n\n## API\n\n### Conversion Options\n\nPass
+ a Ruby hash (string or symbol keys) to tweak rendering. Every option maps one-for-one
+ with the Rust/Python/Node APIs.\n\n```ruby\nrequire 'html_to_markdown'\n\nmarkdown
+ = HtmlToMarkdown.convert(\n '<pre><code class=\"language-ruby\">puts \"hi\"</code></pre>',\n
+ \ heading_style: :atx,\n code_block_style: :fenced,\n bullets: '*+-',\n list_indent_type:
+ :spaces,\n list_indent_width: 2,\n whitespace_mode: :normalized,\n highlight_style:
+ :double_equal\n)\n\nputs markdown\n```\n\n### HTML Preprocessing\n\nClean up scraped
+ HTML (navigation, forms, malformed markup) before conversion:\n\n```ruby\nrequire
+ 'html_to_markdown'\n\nmarkdown = HtmlToMarkdown.convert(\n html,\n preprocessing:
+ {\n enabled: true,\n preset: :aggressive, # :minimal, :standard, :aggressive\n
+ \ remove_navigation: true,\n remove_forms: true\n }\n)\n```\n\n### Inline
+ Images\n\nExtract inline binary data (data URIs, SVG) together with the converted
+ Markdown.\n\n```ruby\nrequire 'html_to_markdown'\n\nresult = HtmlToMarkdown.convert_with_inline_images(\n
+ \ '<img src=\"data:image/png;base64,iVBORw0...\" alt=\"Pixel\">',\n image_config:
+ {\n max_decoded_size_bytes: 1 * 1024 * 1024,\n infer_dimensions: true,\n filename_prefix:
+ 'img_',\n capture_svg: true\n }\n)\n\nputs result.markdown\nresult.inline_images.each
+ do |img|\n puts \"#{img.filename} -> #{img.format} (#{img.data.bytesize} bytes)\"\nend\n```\n\n##
+ CLI\n\nThe gem bundles a small proxy for the Rust CLI binary. Use it when you need
+ parity with the standalone `html-to-markdown` executable.\n\n```ruby\nrequire 'html_to_markdown/cli'\n\nHtmlToMarkdown::CLI.run(%w[--heading-style
+ atx input.html], stdout: $stdout)\n# => writes converted Markdown to STDOUT\n```\n\nYou
+ can also call the CLI binary directly for scripting:\n\n```ruby\nHtmlToMarkdown::CLIProxy.call(['--version'])\n#
+ => \"html-to-markdown 2.5.6\"\n```\n\nRebuild the CLI locally if you see `CLI binary
+ not built` during tests:\n\n```bash\nbundle exec rake compile # builds
+ the extension\nbundle exec ruby scripts/prepare_ruby_gem.rb # copies the CLI into
+ lib/bin/\n```\n\n## Error Handling\n\nConversion errors raise `HtmlToMarkdown::Error`
+ (wrapping the Rust error context). CLI invocations use specialised subclasses:\n\n-
+ `HtmlToMarkdown::CLIProxy::MissingBinaryError`\n- `HtmlToMarkdown::CLIProxy::CLIExecutionError`\n\nRescue
+ them to provide clearer feedback in your application.\n\n## Consistent Across Languages\n\nThe
+ Ruby gem shares the exact Rust core with:\n\n- [Python wheels](https://pypi.org/project/html-to-markdown/)\n-
+ [Node.js / Bun bindings](https://www.npmjs.com/package/html-to-markdown-node)\n-
+ [WebAssembly package](https://www.npmjs.com/package/html-to-markdown-wasm)\n- The
+ Rust crate and CLI\n\nUse whichever runtime fits your stack while keeping formatting
+ behaviour identical.\n\n## Development\n\n```bash\nbundle exec rake compile #
+ build the native extension\nbundle exec rspec # run test suite\n```\n\nThe
+ extension uses [Magnus](https://github.com/matsadler/magnus) plus `rb-sys` for bindgen.
+ When editing the Rust code under `src/`, rerun `rake compile`.\n\n## License\n\nMIT
+ © Na'aman Hirschfeld\n"
  email:
  - nhirschfeld@gmail.com
  executables:
  - html-to-markdown
  extensions:
  - extconf.rb
- extra_rdoc_files: []
+ extra_rdoc_files:
+ - README.md
  files:
  - Cargo.toml
  - README.md
@@ -62,7 +136,7 @@ metadata:
  source_code_uri: https://github.com/Goldziher/html-to-markdown
  bug_tracker_uri: https://github.com/Goldziher/html-to-markdown/issues
  changelog_uri: https://github.com/Goldziher/html-to-markdown/releases
- documentation_uri: https://github.com/Goldziher/html-to-markdown/blob/main/README.md
+ documentation_uri: https://github.com/Goldziher/html-to-markdown/blob/main/crates/html-to-markdown-rb/README.md
  post_install_message:
  rdoc_options: []
  require_paths: