html-to-markdown 2.5.3-x86_64-darwin-22

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 6ba3a68d8e55234ced5cb5ea5f505514a988ab25ae5a65d2f67bb22a9914b5f0
4
+ data.tar.gz: 61ba5353c6ec8d206c65f132131acd62bc17717b4c8b73e902d291dcccb3a509
5
+ SHA512:
6
+ metadata.gz: 070f6630e702bf828449309a140a823166c43985c938e752b5bdfee5fed89a7481dec8cccfc6645f640bd6938e374756dda9639c3b8c897979682a9b75a0c903
7
+ data.tar.gz: ca3add76b1fd08e3c2e713b45194d8268cf286bb6eee9d48419ac18fb9bdf3f20f58004ccad8c11cb4d296c852f00352e72899155d8b152d63a151eb378164e5
data/README.md ADDED
@@ -0,0 +1,149 @@
1
+ # html-to-markdown-rb
2
+
3
+ Ruby bindings for the `html-to-markdown` Rust engine – the same core that powers the Python wheels, Node.js NAPI bindings, WebAssembly package, and CLI. The gem exposes fast HTML → Markdown conversion with identical rendering behaviour across every supported language.
4
+
5
+ [![Crates.io](https://img.shields.io/crates/v/html-to-markdown-rs.svg)](https://crates.io/crates/html-to-markdown-rs)
6
+ [![npm version](https://badge.fury.io/js/html-to-markdown-node.svg)](https://www.npmjs.com/package/html-to-markdown-node)
7
+ [![PyPI version](https://badge.fury.io/py/html-to-markdown.svg)](https://pypi.org/project/html-to-markdown/)
8
+ [![Gem Version](https://badge.fury.io/rb/html-to-markdown.svg)](https://rubygems.org/gems/html-to-markdown)
9
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/Goldziher/html-to-markdown/blob/main/LICENSE)
10
+
11
+ ## Installation
12
+
13
+ ```bash
14
+ bundle add html-to-markdown
15
+ # or
16
+ gem install html-to-markdown
17
+ ```
18
+
19
+ Add the gem to your project and Bundler will compile the native Rust extension on first install.
20
+
21
+ ### Requirements
22
+
23
+ - Ruby **3.2+** (Magnus relies on the fiber scheduler APIs added in 3.2)
24
+ - Rust toolchain **1.85+** with Cargo available on your `$PATH`
25
+ - Ruby development headers (`ruby-dev`, `ruby-devel`, or the platform equivalent)
26
+
27
+ **Windows**: install [RubyInstaller with MSYS2](https://rubyinstaller.org/) (UCRT64). Run once:
28
+
29
+ ```powershell
30
+ ridk exec pacman -S --needed --noconfirm base-devel mingw-w64-ucrt-x86_64-toolchain
31
+ ```
32
+
33
+ This provides the standard headers (including `strings.h`) required for the bindgen step.
34
+
35
+ ## Quick Start
36
+
37
+ ```ruby
38
+ require 'html_to_markdown'
39
+
40
+ html = <<~HTML
41
+ <h1>Welcome</h1>
42
+ <p>This is <strong>Rust-fast</strong> conversion!</p>
43
+ <ul>
44
+ <li>Native extension</li>
45
+ <li>Identical output across languages</li>
46
+ </ul>
47
+ HTML
48
+
49
+ markdown = HtmlToMarkdown.convert(html)
50
+ puts markdown
51
+ # # Welcome
52
+ #
53
+ # This is **Rust-fast** conversion!
54
+ #
55
+ # - Native extension
56
+ # - Identical output across languages
57
+ ```
58
+
59
+ ### Conversion with Options
60
+
61
+ All configuration mirrors the Rust API. Options can be given as symbols or strings and use the same defaults as the other bindings.
62
+
63
+ ```ruby
64
+ require 'html_to_markdown'
65
+
66
+ markdown = HtmlToMarkdown.convert(
67
+ '<pre><code class="language-ruby">puts "hi"</code></pre>',
68
+ heading_style: :atx,
69
+ code_block_style: :fenced,
70
+ bullets: ['*', '-', '+'],
71
+ wrap: true,
72
+ wrap_width: 80,
73
+ preserve_tags: %w[table figure]
74
+ )
75
+ ```
76
+
77
+ ### Inline Images
78
+
79
+ Extract inline binary data (data URIs, SVG) together with the converted Markdown.
80
+
81
+ ```ruby
82
+ require 'html_to_markdown'
83
+
84
+ result = HtmlToMarkdown.convert_with_inline_images(
85
+ '<img src="..." alt="Pixel">',
86
+ nil, { # options = nil (defaults); the image config is the third positional argument
87
+ max_decoded_size_bytes: 1 * 1024 * 1024,
88
+ infer_dimensions: true,
89
+ filename_prefix: 'img_',
90
+ capture_svg: true
91
+ }
92
+ )
93
+
94
+ puts result[:markdown]
95
+ result[:inline_images].each do |img|
96
+ puts "#{img[:filename]} -> #{img[:format]} (#{img[:data].bytesize} bytes)"
97
+ end
98
+ ```
99
+
100
+ ### CLI Proxy
101
+
102
+ The gem bundles a small proxy for the Rust CLI binary. Use it when you need parity with the standalone `html-to-markdown` executable.
103
+
104
+ ```ruby
105
+ require 'html_to_markdown/cli'
106
+
107
+ HtmlToMarkdown::CLI.run(%w[--heading-style atx input.html], stdout: $stdout)
108
+ # writes the converted Markdown to stdout and returns the exit status (0 on success)
109
+ ```
110
+
111
+ For scripting, you can also call the proxy directly; it returns the CLI's standard output as a string:
112
+
113
+ ```ruby
114
+ HtmlToMarkdown::CLIProxy.call(['--version'])
115
+ # => "html-to-markdown 2.5.3"
116
+ ```
117
+
118
+ ### Error Handling
119
+
120
+ Conversion errors raise `HtmlToMarkdown::Error` (wrapping the Rust error context). CLI invocations raise their own errors, both subclasses of `HtmlToMarkdown::CLIProxy::Error` (itself a `StandardError`):
121
+
122
+ - `HtmlToMarkdown::CLIProxy::MissingBinaryError`
123
+ - `HtmlToMarkdown::CLIProxy::CLIExecutionError`
124
+
125
+ Rescue them to provide clearer feedback in your application.
126
+
127
+ ## Consistent Across Languages
128
+
129
+ The Ruby gem shares the exact Rust core with:
130
+
131
+ - [Python wheels](https://pypi.org/project/html-to-markdown/)
132
+ - [Node.js / Bun bindings](https://www.npmjs.com/package/html-to-markdown-node)
133
+ - [WebAssembly package](https://www.npmjs.com/package/html-to-markdown-wasm)
134
+ - The Rust crate and CLI
135
+
136
+ Use whichever runtime fits your stack while keeping formatting behaviour identical.
137
+
138
+ ## Development
139
+
140
+ ```bash
141
+ bundle exec rake compile # build the native extension
142
+ bundle exec rspec # run test suite
143
+ ```
144
+
145
+ The extension uses [Magnus](https://github.com/matsadler/magnus) plus `rb-sys` for bindgen. When editing the Rust code under `src/`, rerun `rake compile`.
146
+
147
+ ## License
148
+
149
+ MIT © Na'aman Hirschfeld
data/exe/html-to-markdown ADDED
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'html_to_markdown/cli'
5
+
6
+ exit HtmlToMarkdown::CLI.run(ARGV)
data/lib/bin/html-to-markdown ADDED
Binary file
data/lib/html_to_markdown/cli.rb ADDED
@@ -0,0 +1,21 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'html_to_markdown/cli_proxy'
4
+
5
+ module HtmlToMarkdown
6
+ module CLI
7
+ module_function
8
+
9
+ def run(argv = ARGV, stdout: $stdout, stderr: $stderr)
10
+ output = CLIProxy.call(argv)
11
+ stdout.print(output)
12
+ 0
13
+ rescue CLIProxy::CLIExecutionError => e
14
+ stderr.print(e.stderr)
15
+ e.status || 1
16
+ rescue CLIProxy::Error => e # also covers MissingBinaryError, a subclass of Error
17
+ stderr.puts(e.message)
18
+ 1
19
+ end
20
+ end
21
+ end
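The stream and exit-code mapping in `CLI.run` can be tried out without the gem installed. The following is a minimal sketch under stated assumptions: `run_and_report` and `demo_exit_codes` are hypothetical names that are not part of the gem, and the running Ruby interpreter (`RbConfig.ruby`) stands in for the `html-to-markdown` binary. It only illustrates the pattern of printing stdout on success, printing stderr on failure, and returning a process-style exit code.

```ruby
require 'open3'
require 'rbconfig'
require 'stringio'

# Hypothetical helper (not part of the gem): runs a command, prints its stdout
# on success or its stderr on failure, and returns a process-style exit code.
def run_and_report(command, args, stdout: $stdout, stderr: $stderr)
  out, err, status = Open3.capture3(command, *args)
  if status.success?
    stdout.print(out)
    0
  else
    stderr.print(err)
    status.exitstatus || 1
  end
end

# Exercises both branches, using the current Ruby interpreter as a stand-in
# for the CLI binary.
def demo_exit_codes
  ok_out = StringIO.new
  ok_code = run_and_report(RbConfig.ruby, ['-e', 'print "ok"'],
                           stdout: ok_out, stderr: StringIO.new)

  bad_err = StringIO.new
  bad_code = run_and_report(RbConfig.ruby, ['-e', 'warn "boom"; exit 3'],
                            stdout: StringIO.new, stderr: bad_err)

  { ok_code: ok_code, ok_out: ok_out.string, bad_code: bad_code, bad_err: bad_err.string }
end

demo_exit_codes.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Injecting `stdout:` and `stderr:` as keyword arguments, as the real `run` does, is what lets the spec substitute `StringIO` objects for the process streams.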
data/lib/html_to_markdown/cli_proxy.rb ADDED
@@ -0,0 +1,71 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'open3'
4
+ require 'pathname'
5
+
6
+ module HtmlToMarkdown
7
+ module CLIProxy
8
+ Error = Class.new(StandardError)
9
+ MissingBinaryError = Class.new(Error)
10
+
11
+ class CLIExecutionError < Error
12
+ attr_reader :stderr, :status
13
+
14
+ def initialize(message, stderr:, status:)
15
+ super(message)
16
+ @stderr = stderr
17
+ @status = status
18
+ end
19
+ end
20
+
21
+ module_function
22
+
23
+ def call(argv)
24
+ binary = find_cli_binary
25
+ args = Array(argv).map(&:to_s)
26
+ stdout, stderr, status = Open3.capture3(binary.to_s, *args)
27
+ return stdout if status.success?
28
+
29
+ raise CLIExecutionError.new(
30
+ "html-to-markdown CLI exited with status #{status.exitstatus}",
31
+ stderr: stderr,
32
+ status: status.exitstatus
33
+ )
34
+ end
35
+
36
+ def find_cli_binary
37
+ binary_name = Gem.win_platform? ? 'html-to-markdown.exe' : 'html-to-markdown'
38
+ found = search_paths(binary_name).find(&:file?)
39
+ return found if found
40
+
41
+ raise MissingBinaryError, missing_binary_message
42
+ end
43
+
44
+ def root_path
45
+ @root_path ||= Pathname(__dir__).join('../..').expand_path
46
+ end
47
+
48
+ def lib_path
49
+ @lib_path ||= Pathname(__dir__).join('..').expand_path
50
+ end
51
+
52
+ def search_paths(binary_name)
53
+ paths = [
54
+ root_path.join('target', 'release', binary_name),
55
+ lib_path.join('bin', binary_name),
56
+ lib_path.join(binary_name)
57
+ ]
58
+
59
+ workspace_root = root_path.parent&.parent
60
+ paths << workspace_root.join('target', 'release', binary_name) if workspace_root
61
+ paths
62
+ end
63
+
64
+ def missing_binary_message
65
+ <<~MSG.strip
66
+ html-to-markdown CLI binary not found. Build it with
67
+ `cargo build --release --package html-to-markdown-cli`.
68
+ MSG
69
+ end
70
+ end
71
+ end
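The lookup in `find_cli_binary` comes down to "take the first candidate path that exists as a regular file". A small sketch of that idea, assuming hypothetical helper names (`first_existing_file`, `demo_lookup`) and a made-up binary name (`demo-cli`) inside a temporary directory, none of which come from the gem:

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper mirroring the lookup idea: return the first candidate
# that exists as a regular file, or nil when none does.
def first_existing_file(candidates)
  candidates.find(&:file?)
end

# Builds a throwaway directory tree and resolves a made-up binary name in it.
def demo_lookup
  Dir.mktmpdir do |dir|
    root = Pathname(dir)
    candidates = [
      root.join('target', 'release', 'demo-cli'),
      root.join('lib', 'bin', 'demo-cli'),
      root.join('lib', 'demo-cli')
    ]

    # Nothing has been created yet, so no candidate matches.
    before = first_existing_file(candidates)

    # Create only the second candidate (lib/bin/demo-cli).
    candidates[1].dirname.mkpath
    candidates[1].write('')
    after = first_existing_file(candidates)

    { before: before, after: after&.relative_path_from(root)&.to_s }
  end
end

result = demo_lookup
puts "before: #{result[:before].inspect}, after: #{result[:after].inspect}"
```

Because the candidates are checked in order, a locally built `target/release` binary would take precedence over the packaged copy under `lib/bin` in the real search list.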
data/lib/html_to_markdown/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module HtmlToMarkdown
4
+ VERSION = '2.5.3'
5
+ end
data/lib/html_to_markdown.rb ADDED
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'html_to_markdown/version'
4
+ require 'html_to_markdown_rb'
5
+
6
+ module HtmlToMarkdown
7
+ autoload :CLI, 'html_to_markdown/cli'
8
+ autoload :CLIProxy, 'html_to_markdown/cli_proxy'
9
+
10
+ class << self
11
+ alias native_convert convert
12
+ alias native_convert_with_inline_images convert_with_inline_images
13
+ end
14
+
15
+ module_function
16
+
17
+ def convert(html, options = nil)
18
+ native_convert(html.to_s, options)
19
+ end
20
+
21
+ def convert_with_inline_images(html, options = nil, image_config = nil)
22
+ native_convert_with_inline_images(html.to_s, options, image_config)
23
+ end
24
+ end
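The wrapper file relies on a common Ruby pattern: alias the natively defined singleton method, then redefine the public name with a friendlier signature. The sketch below reproduces that pattern with a pure-Ruby stand-in; `WrapDemo` and its fake "native" `convert` are assumptions for illustration only and say nothing about what the real extension returns.

```ruby
module WrapDemo
  # Stand-in for a method that a native extension would define on the module.
  # It simply echoes its arguments so the wrapper's effect is visible.
  def self.convert(html, options)
    [html, options]
  end

  class << self
    # Keep a handle on the "native" method before it is redefined below.
    alias native_convert convert
  end

  module_function

  # Ruby-level wrapper: coerces the input to a String and makes options optional.
  def convert(html, options = nil)
    native_convert(html.to_s, options)
  end
end

# Trailing key: value pairs are collected into the positional options hash.
p WrapDemo.convert(:title, heading_style: :atx)
# Omitting options passes nil through to the aliased method.
p WrapDemo.convert('<p>x</p>')
```

Since the wrappers take positional parameters only, keyword-style arguments all land in `options`. With the signature `convert_with_inline_images(html, options = nil, image_config = nil)`, the image configuration therefore has to be passed as the third positional argument.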
data/lib/html_to_markdown_rb.bundle ADDED
Binary file
data/spec/cli_proxy_spec.rb ADDED
@@ -0,0 +1,42 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'spec_helper'
4
+ require 'html_to_markdown/cli_proxy'
5
+ require 'html_to_markdown/cli'
6
+ require 'stringio'
7
+
8
+ RSpec.describe HtmlToMarkdown::CLIProxy do
9
+ describe '.call' do
10
+ it 'executes the CLI binary' do
11
+ begin
12
+ binary = described_class.find_cli_binary
13
+ rescue HtmlToMarkdown::CLIProxy::MissingBinaryError
14
+ skip 'CLI binary not built'
15
+ end
16
+
17
+ expect(binary).to be_file
18
+
19
+ output = described_class.call(['--version'])
20
+ expect(output).to include(HtmlToMarkdown::VERSION)
21
+ end
22
+ end
23
+
24
+ describe HtmlToMarkdown::CLI do
25
+ it 'writes CLI output to stdout' do
26
+ begin
27
+ HtmlToMarkdown::CLIProxy.find_cli_binary
28
+ rescue HtmlToMarkdown::CLIProxy::MissingBinaryError
29
+ skip 'CLI binary not built'
30
+ end
31
+
32
+ stdout = StringIO.new
33
+ stderr = StringIO.new
34
+
35
+ exit_code = described_class.run(['--version'], stdout: stdout, stderr: stderr)
36
+
37
+ expect(exit_code).to eq(0)
38
+ expect(stdout.string).to include(HtmlToMarkdown::VERSION)
39
+ expect(stderr.string).to be_empty
40
+ end
41
+ end
42
+ end
data/spec/convert_spec.rb ADDED
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'spec_helper'
4
+
5
+ RSpec.describe HtmlToMarkdown do
6
+ describe '.convert' do
7
+ it 'converts simple headings' do
8
+ expect(described_class.convert('<h1>Hello</h1>')).to eq("# Hello\n")
9
+ end
10
+
11
+ it 'accepts options hash' do
12
+ result = described_class.convert(
13
+ '<h1>Hello</h1>',
14
+ heading_style: :atx_closed,
15
+ default_title: true
16
+ )
17
+ expect(result).to include('Hello')
18
+ end
19
+ end
20
+
21
+ describe '.convert_with_inline_images' do
22
+ it 'returns inline images metadata' do
23
+ html = '<p><img src="" alt="fake"></p>'
24
+ extraction = described_class.convert_with_inline_images(html)
25
+ expect(extraction).to include(:markdown, :inline_images, :warnings)
26
+ expect(extraction[:inline_images].first[:description]).to eq('fake')
27
+ end
28
+ end
29
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/setup'
4
+ require 'html_to_markdown'
5
+
6
+ RSpec.configure do |config|
7
+ config.expect_with :rspec do |c|
8
+ c.syntax = :expect
9
+ end
10
+ end
metadata ADDED
@@ -0,0 +1,65 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: html-to-markdown
3
+ version: !ruby/object:Gem::Version
4
+ version: 2.5.3
5
+ platform: x86_64-darwin-22
6
+ authors:
7
+ - Na'aman Hirschfeld
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2025-10-29 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: High-performance HTML to Markdown conversion from Ruby using Magnus and
14
+ rb-sys.
15
+ email:
16
+ - nhirschfeld@gmail.com
17
+ executables:
18
+ - html-to-markdown
19
+ extensions: []
20
+ extra_rdoc_files: []
21
+ files:
22
+ - README.md
23
+ - exe/html-to-markdown
24
+ - lib/bin/html-to-markdown
25
+ - lib/html_to_markdown.rb
26
+ - lib/html_to_markdown/cli.rb
27
+ - lib/html_to_markdown/cli_proxy.rb
28
+ - lib/html_to_markdown/version.rb
29
+ - lib/html_to_markdown_rb.bundle
30
+ - spec/cli_proxy_spec.rb
31
+ - spec/convert_spec.rb
32
+ - spec/spec_helper.rb
33
+ homepage: https://github.com/Goldziher/html-to-markdown
34
+ licenses:
35
+ - MIT
36
+ metadata:
37
+ rubygems_mfa_required: 'true'
38
+ homepage_uri: https://github.com/Goldziher/html-to-markdown
39
+ source_code_uri: https://github.com/Goldziher/html-to-markdown
40
+ bug_tracker_uri: https://github.com/Goldziher/html-to-markdown/issues
41
+ changelog_uri: https://github.com/Goldziher/html-to-markdown/releases
42
+ documentation_uri: https://github.com/Goldziher/html-to-markdown/blob/main/README.md
43
+ post_install_message:
44
+ rdoc_options: []
45
+ require_paths:
46
+ - lib
47
+ required_ruby_version: !ruby/object:Gem::Requirement
48
+ requirements:
49
+ - - ">="
50
+ - !ruby/object:Gem::Version
51
+ version: '3.3'
52
+ - - "<"
53
+ - !ruby/object:Gem::Version
54
+ version: 3.4.dev
55
+ required_rubygems_version: !ruby/object:Gem::Requirement
56
+ requirements:
57
+ - - ">="
58
+ - !ruby/object:Gem::Version
59
+ version: '0'
60
+ requirements: []
61
+ rubygems_version: 3.5.22
62
+ signing_key:
63
+ specification_version: 4
64
+ summary: Ruby bindings for the html-to-markdown Rust library
65
+ test_files: []