markdownator 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: fba67d2ba95d9e160fde97514331e9a40a72bee5219f49f422738700a799af23
4
+ data.tar.gz: a277a37a4de92899f4d045c9f4d4c76a1ebf72d76d014fe13887952ccf522eb0
5
+ SHA512:
6
+ metadata.gz: 2cd3f9f43a382e333b5a1fd76bba182d09b27da73a7bc47072fd0f366aa53fe01ace0eae14daa5ac6aded3d8e6d8b3f1031c27e85132f7cc7f6f13988db9f5e9
7
+ data.tar.gz: faa3b029ce56dc20a3e54c2ea541b2749038c36fe07c0eb3a56bf7625113219a58eb2003bd46a20564f84a21c16c03d18a7ff4a92543586f8e9464f2c092bef9
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,20 @@
1
+ AllCops:
2
+ TargetRubyVersion: 2.7
3
+ NewCops: enable
4
+
5
+ Metrics:
6
+ Enabled: false
7
+
8
+ Style/Documentation:
9
+ Enabled: false
10
+
11
+ Style/StringLiterals:
12
+ Enabled: true
13
+ EnforcedStyle: double_quotes
14
+
15
+ Style/StringLiteralsInInterpolation:
16
+ Enabled: true
17
+ EnforcedStyle: double_quotes
18
+
19
+ Layout/LineLength:
20
+ Max: 120
data/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
1
+ ## [Unreleased]
2
+
3
+ ## [0.1.0] - 2026-06-12
4
+
5
+ - Initial release.
6
+ - Converter-registry engine (`Markdownator.convert`) dispatching local paths, URLs, and IO streams.
7
+ - Converters for plain text, HTML, CSV, JSON, XML, DOCX, XLSX, PPTX, PDF, EPUB, ZIP (recursive), and image metadata.
8
+ - Optional/lazy dependency loading with helpful errors for missing format gems.
9
+ - Pluggable LLM image captioner hook.
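Note on the "optional/lazy dependency loading" entry: the converters in this release call `Markdownator.require_optional(name, feature:)`, but its definition is not part of this excerpt. The following is a minimal sketch of how such a helper could behave, assuming it wraps `require` and turns a `LoadError` into a descriptive error. The names `demo_require_optional` and `DemoMissingDependencyError`, the `gem_name:` parameter, and the message wording are all illustrative assumptions, not the gem's actual code.

```ruby
# Hypothetical stand-in for Markdownator::MissingDependencyError (real class not shown here).
class DemoMissingDependencyError < StandardError; end

# Loads +library+ only when a converter needs it. If the library is not
# installed, raises an error that names the gem to add to the Gemfile.
# +gem_name+ is an assumed extra for gems whose require path differs from
# the gem name (for example, rubyzip is required as "zip").
def demo_require_optional(library, feature:, gem_name: library)
  require library
  true
rescue LoadError
  raise DemoMissingDependencyError,
        "#{feature} needs the '#{gem_name}' gem. Add `gem \"#{gem_name}\"` to your Gemfile."
end

demo_require_optional("json", feature: "JSON conversion") # stdlib, loads fine
```

Because the `require` happens inside the converter rather than at load time, an application that never touches a given format never needs that format's gem installed.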
@@ -0,0 +1,84 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
6
+
7
+ We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
8
+
9
+ ## Our Standards
10
+
11
+ Examples of behavior that contributes to a positive environment for our community include:
12
+
13
+ * Demonstrating empathy and kindness toward other people
14
+ * Being respectful of differing opinions, viewpoints, and experiences
15
+ * Giving and gracefully accepting constructive feedback
16
+ * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
17
+ * Focusing on what is best not just for us as individuals, but for the overall community
18
+
19
+ Examples of unacceptable behavior include:
20
+
21
+ * The use of sexualized language or imagery, and sexual attention or
22
+ advances of any kind
23
+ * Trolling, insulting or derogatory comments, and personal or political attacks
24
+ * Public or private harassment
25
+ * Publishing others' private information, such as a physical or email
26
+ address, without their explicit permission
27
+ * Other conduct which could reasonably be considered inappropriate in a
28
+ professional setting
29
+
30
+ ## Enforcement Responsibilities
31
+
32
+ Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
33
+
34
+ Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
35
+
36
+ ## Scope
37
+
38
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
39
+
40
+ ## Enforcement
41
+
42
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at alexrupom@hotmail.com. All complaints will be reviewed and investigated promptly and fairly.
43
+
44
+ All community leaders are obligated to respect the privacy and security of the reporter of any incident.
45
+
46
+ ## Enforcement Guidelines
47
+
48
+ Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
49
+
50
+ ### 1. Correction
51
+
52
+ **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
53
+
54
+ **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
55
+
56
+ ### 2. Warning
57
+
58
+ **Community Impact**: A violation through a single incident or series of actions.
59
+
60
+ **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
61
+
62
+ ### 3. Temporary Ban
63
+
64
+ **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
65
+
66
+ **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
67
+
68
+ ### 4. Permanent Ban
69
+
70
+ **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
71
+
72
+ **Consequence**: A permanent ban from any sort of public interaction within the community.
73
+
74
+ ## Attribution
75
+
76
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0,
77
+ available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
78
+
79
+ Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
80
+
81
+ [homepage]: https://www.contributor-covenant.org
82
+
83
+ For answers to common questions about this code of conduct, see the FAQ at
84
+ https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
data/Gemfile ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ # Specify your gem's dependencies in markdownator.gemspec
6
+ gemspec
7
+
8
+ gem "rake", "~> 13.0"
9
+
10
+ gem "rspec", "~> 3.0"
11
+
12
+ gem "rubocop", "~> 1.21"
13
+
14
+ # Optional format libraries, declared here so the test suite can exercise every
15
+ # converter. Applications install only the ones for the formats they use; the
16
+ # gem itself requires them lazily and does not declare them as runtime dependencies.
17
+ gem "exifr", "~> 1.3"
18
+ gem "nokogiri", "~> 1.15"
19
+ gem "pdf-reader", "~> 2.12"
20
+ gem "reverse_markdown", "~> 2.1"
21
+ gem "roo", "~> 2.10"
22
+ gem "rubyzip", "~> 2.3"
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2026 alexrupom
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,109 @@
1
+ # Markdownator
2
+
3
+ Convert files into clean, LLM-friendly **Markdown**. Point Markdownator at a PDF,
4
+ Office document, web page, archive, or image and get Markdown back.
5
+
6
+ It uses a pluggable converter-registry architecture and loads heavy format
7
+ libraries **lazily**, so you only install the gems for the formats you actually use.
8
+
9
+ ## Supported formats
10
+
11
+ | Format | Extensions | Extra gem required |
12
+ |--------|------------|--------------------|
13
+ | Plain text / Markdown | `.txt`, `.md` | — (built in) |
14
+ | CSV | `.csv` | — (built in) |
15
+ | JSON | `.json` | — (built in) |
16
+ | HTML | `.html`, `.htm` | `reverse_markdown` (+ `nokogiri`) |
17
+ | XML | `.xml` | `nokogiri` |
18
+ | Word | `.docx` | `rubyzip`, `nokogiri` |
19
+ | Excel | `.xlsx` | `roo` |
20
+ | PowerPoint | `.pptx` | `rubyzip`, `nokogiri` |
21
+ | PDF | `.pdf` | `pdf-reader` |
22
+ | EPUB | `.epub` | `rubyzip`, `nokogiri`, `reverse_markdown` |
23
+ | ZIP (recurses) | `.zip` | `rubyzip` |
24
+ | Images (metadata) | `.jpg`, `.png`, `.tiff`, … | `exifr` (for EXIF) |
25
+
26
+ If a required gem is missing, the converter raises `Markdownator::MissingDependencyError`
27
+ telling you exactly what to add to your `Gemfile`.
28
+
29
+ ## Installation
30
+
31
+ ```ruby
32
+ gem "markdownator"
33
+ ```
34
+
35
+ Then add the gems for the formats you need, e.g.:
36
+
37
+ ```ruby
38
+ gem "pdf-reader" # PDF
39
+ gem "roo" # XLSX
40
+ gem "rubyzip" # DOCX, PPTX, EPUB, ZIP
41
+ gem "nokogiri" # HTML, XML, DOCX, PPTX, EPUB
42
+ gem "reverse_markdown" # HTML, EPUB
43
+ gem "exifr" # image EXIF
44
+ ```
45
+
46
+ ## Usage
47
+
48
+ ```ruby
49
+ require "markdownator"
50
+
51
+ # From a local path — format is detected from the extension.
52
+ result = Markdownator.convert("report.pdf")
53
+ puts result.markdown
54
+ puts result.title # when the format exposes one (HTML, EPUB)
55
+ puts result.metadata # e.g. { page_count: 12 } for PDF
56
+
57
+ # From a URL.
58
+ Markdownator.convert("https://example.com").markdown
59
+
60
+ # From an open stream — pass hints via StreamInfo.
61
+ File.open("data.csv", "rb") do |io|
62
+ info = Markdownator::StreamInfo.new(extension: "csv")
63
+ Markdownator.convert_stream(io, info).markdown
64
+ end
65
+ ```
66
+
67
+ `Result#to_s` and `Result#text_content` both return the Markdown, so a result is
68
+ convenient to print or interpolate directly.
69
+
70
+ ### Image captioning (optional)
71
+
72
+ Image conversion emits EXIF metadata by default. To add a natural-language
73
+ description, pass any object that responds to `#caption(io, stream_info)` and
74
+ returns a `String`:
75
+
76
+ ```ruby
77
+ class ClaudeCaptioner
78
+ def caption(io, stream_info)
79
+ # Send io.read to your vision model (e.g. Claude) and return its description.
80
+ end
81
+ end
82
+
83
+ Markdownator.convert("photo.jpg", captioner: ClaudeCaptioner.new).markdown
84
+ ```
85
+
86
+ No LLM gem is bundled; the hook is off unless you provide a captioner.
87
+
88
+ ## Development
89
+
90
+ After checking out the repo, run `bin/setup` to install dependencies. Then run
91
+ `rake spec` to run the tests, or `bin/console` for an interactive prompt.
92
+
93
+ To install this gem onto your local machine, run `bundle exec rake install`.
94
+
95
+ ## Contributing
96
+
97
+ Bug reports and pull requests are welcome on GitHub at
98
+ https://github.com/alexrupom/markdownator.
99
+
100
+ ## License
101
+
102
+ The gem is available as open source under the terms of the
103
+ [MIT License](https://opensource.org/licenses/MIT).
104
+
105
+ ## Code of Conduct
106
+
107
+ Everyone interacting in the Markdownator project's codebases, issue trackers, chat
108
+ rooms and mailing lists is expected to follow the
109
+ [code of conduct](https://github.com/alexrupom/markdownator/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
@@ -0,0 +1,69 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Markdownator
4
+ module Converters
5
+ # Abstract base class for all converters.
6
+ #
7
+ # A converter is asked, in priority order, whether it `accepts?` a stream;
8
+ # the first one that does has its `convert` called. Subclasses must override
9
+ # both methods. The stream passed in is a rewound IO positioned at byte 0.
10
+ class Base
11
+ # @return [Boolean] whether this converter can handle the given stream.
12
+ def accepts?(_io, _stream_info)
13
+ raise NotImplementedError, "#{self.class} must implement #accepts?"
14
+ end
15
+
16
+ # @return [Markdownator::Result]
17
+ def convert(_io, _stream_info, **_options)
18
+ raise NotImplementedError, "#{self.class} must implement #convert"
19
+ end
20
+
21
+ private
22
+
23
+ # True when the stream's extension or guessed mimetype matches.
24
+ def matches?(stream_info, extensions: [], mimetypes: [])
25
+ ext = stream_info.extension
26
+ return true if ext && extensions.include?(ext)
27
+
28
+ mime = stream_info.guessed_mimetype
29
+ return true if mime && mimetypes.include?(mime)
30
+
31
+ false
32
+ end
33
+
34
+ # Reads the full stream as a String, honoring the charset hint when present.
35
+ def read_all(io, stream_info)
36
+ data = io.read
37
+ return "" if data.nil?
38
+
39
+ data = data.dup
40
+ encoding = stream_info.charset
41
+ data.force_encoding(encoding) if encoding && Encoding.name_list.include?(encoding)
42
+ data.valid_encoding? ? data : data.encode("UTF-8", invalid: :replace, undef: :replace)
43
+ end
44
+
45
+ # Builds a GitHub-flavored Markdown table from an array of rows; the first row is the header.
46
+ # Empty input yields an empty string.
47
+ def markdown_table(rows)
48
+ rows = rows.map { |row| Array(row).map { |cell| format_cell(cell) } }
49
+ rows.reject!(&:empty?)
50
+ return "" if rows.empty?
51
+
52
+ width = rows.map(&:length).max
53
+ rows.each { |row| row.fill("", row.length...width) }
54
+
55
+ header = rows.first
56
+ body = rows[1..] || []
57
+ lines = []
58
+ lines << "| #{header.join(" | ")} |"
59
+ lines << "| #{Array.new(width, "---").join(" | ")} |"
60
+ body.each { |row| lines << "| #{row.join(" | ")} |" }
61
+ lines.join("\n")
62
+ end
63
+
64
+ def format_cell(cell)
65
+ cell.to_s.gsub(/\s+/, " ").strip.gsub("|", "\\|")
66
+ end
67
+ end
68
+ end
69
+ end
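Note on the base class contract: the comment says converters are asked in priority order whether they accept a stream, and the first that does has `convert` called. The registry that performs this dispatch is not included in this excerpt, so the sketch below only illustrates the idea. `DemoStreamInfo`, `DemoResult`, `ShoutingText`, `Fallback`, and `dispatch` are hypothetical stand-ins, and the real engine may differ.

```ruby
require "stringio"

# Hypothetical stand-ins for the gem's StreamInfo and Result classes.
DemoStreamInfo = Struct.new(:extension, :guessed_mimetype, keyword_init: true)
DemoResult = Struct.new(:markdown, keyword_init: true)

# A converter only needs #accepts? and #convert.
class ShoutingText
  def accepts?(_io, stream_info)
    stream_info.extension == "txt"
  end

  def convert(io, _stream_info, **_options)
    DemoResult.new(markdown: io.read.upcase)
  end
end

# Lowest-priority converter that accepts anything and passes text through.
class Fallback
  def accepts?(_io, _stream_info)
    true
  end

  def convert(io, _stream_info, **_options)
    DemoResult.new(markdown: io.read)
  end
end

# Ask each converter in order; the first that accepts the stream converts it.
# The stream is rewound before every call so each converter sees byte 0.
def dispatch(converters, io, stream_info, **options)
  converter = converters.find do |candidate|
    io.rewind
    candidate.accepts?(io, stream_info)
  end
  raise ArgumentError, "no converter accepts this stream" if converter.nil?

  io.rewind
  converter.convert(io, stream_info, **options)
end

registry = [ShoutingText.new, Fallback.new]
puts dispatch(registry, StringIO.new("hello"), DemoStreamInfo.new(extension: "txt")).markdown
```

Ordering the list from most to least specific is what lets a catch-all converter sit at the end without shadowing the others.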
@@ -0,0 +1,21 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "csv"
4
+
5
+ module Markdownator
6
+ module Converters
7
+ # Converts CSV into a GitHub-flavored Markdown table.
8
+ class Csv < Base
9
+ def accepts?(_io, stream_info)
10
+ matches?(stream_info, extensions: %w[csv], mimetypes: %w[text/csv application/csv])
11
+ end
12
+
13
+ def convert(io, stream_info, **_options)
14
+ rows = CSV.parse(read_all(io, stream_info))
15
+ Result.new(markdown: markdown_table(rows))
16
+ rescue CSV::MalformedCSVError => e
17
+ raise FileConversionError, "Could not parse CSV: #{e.message}"
18
+ end
19
+ end
20
+ end
21
+ end
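Note on the CSV converter's output: it hands the parsed rows to the base class's `markdown_table` helper. To make the resulting shape concrete, here is a standalone restatement of that helper applied to rows of the kind `CSV.parse` returns (arrays of strings, with `nil` for empty fields). It is a sketch for illustration only. The pipe escaping uses the block form of `gsub`, which should be equivalent to the original's replacement string.

```ruby
# Collapse whitespace runs and escape pipes so a cell cannot break its row.
def format_cell(cell)
  cell.to_s.gsub(/\s+/, " ").strip.gsub("|") { "\\|" }
end

# First row becomes the header; short rows are padded with empty cells.
def markdown_table(rows)
  rows = rows.map { |row| Array(row).map { |cell| format_cell(cell) } }
  rows.reject!(&:empty?)
  return "" if rows.empty?

  width = rows.map(&:length).max
  rows.each { |row| row.fill("", row.length...width) }

  header, *body = rows
  lines = ["| #{header.join(" | ")} |", "| #{Array.new(width, "---").join(" | ")} |"]
  body.each { |row| lines << "| #{row.join(" | ")} |" }
  lines.join("\n")
end

puts markdown_table([["name", "note"], ["widget", "a|b"], ["gadget", nil]])
# | name | note |
# | --- | --- |
# | widget | a\|b |
# | gadget |  |
```

Blank CSV lines parse to empty arrays and are dropped, so they do not produce empty table rows.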
@@ -0,0 +1,80 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Markdownator
4
+ module Converters
5
+ # Converts a Word .docx (Office Open XML) document into Markdown.
6
+ #
7
+ # A .docx is a ZIP whose `word/document.xml` holds the body. We map heading
8
+ # styles to `#` levels, list paragraphs to bullets, and `w:tbl` to Markdown
9
+ # tables.
10
+ class Docx < Base
11
+ W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
12
+
13
+ def accepts?(_io, stream_info)
14
+ matches?(
15
+ stream_info,
16
+ extensions: %w[docx],
17
+ mimetypes: %w[application/vnd.openxmlformats-officedocument.wordprocessingml.document]
18
+ )
19
+ end
20
+
21
+ def convert(io, _stream_info, **_options)
22
+ Markdownator.require_optional("zip", feature: "DOCX conversion")
23
+ Markdownator.require_optional("nokogiri", feature: "DOCX conversion")
24
+
25
+ xml = read_entry(io, "word/document.xml")
26
+ raise FileConversionError, "DOCX is missing word/document.xml" if xml.nil?
27
+
28
+ doc = Nokogiri::XML(xml)
29
+ doc.remove_namespaces!
30
+ body = doc.at_xpath("//body")
31
+ blocks = body.nil? ? [] : body.element_children.filter_map { |node| render_block(node) }
32
+ Result.new(markdown: blocks.join("\n\n"))
33
+ end
34
+
35
+ private
36
+
37
+ def read_entry(io, name)
38
+ ::Zip::File.open_buffer(io) do |zip|
39
+ entry = zip.find_entry(name)
40
+ return entry&.get_input_stream&.read
41
+ end
42
+ end
43
+
44
+ def render_block(node)
45
+ case node.name
46
+ when "p" then render_paragraph(node)
47
+ when "tbl" then render_table(node)
48
+ end
49
+ end
50
+
51
+ def render_paragraph(para)
52
+ text = para.xpath(".//t").map(&:text).join.strip
53
+ return nil if text.empty?
54
+
55
+ style = para.at_xpath(".//pStyle/@val")&.value.to_s
56
+ if (level = heading_level(style))
57
+ "#{"#" * level} #{text}"
58
+ elsif style.match?(/ListParagraph/i) || !para.at_xpath(".//numPr").nil?
59
+ "- #{text}"
60
+ else
61
+ text
62
+ end
63
+ end
64
+
65
+ def heading_level(style)
66
+ match = style.match(/\AHeading(\d)/i) || style.match(/\ATitle\z/i)
67
+ return nil if match.nil?
68
+
69
+ match[1] ? match[1].to_i.clamp(1, 6) : 1
70
+ end
71
+
72
+ def render_table(table)
73
+ rows = table.xpath("./tr").map do |tr|
74
+ tr.xpath("./tc").map { |tc| tc.xpath(".//t").map(&:text).join.strip }
75
+ end
76
+ rows.empty? ? nil : markdown_table(rows)
77
+ end
78
+ end
79
+ end
80
+ end
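Note on the DOCX paragraph mapping: the rules in `render_paragraph` and `heading_level` can be exercised without a real document. The sketch below takes the already extracted text, the `pStyle` value, and a flag for the presence of `numPr` as plain arguments. The function name `paragraph_to_markdown` and its keyword parameters are illustrative assumptions, and the XML parsing step is left out.

```ruby
# Maps one paragraph to Markdown using the same precedence as the converter:
# heading styles first, then list paragraphs, then plain text.
def paragraph_to_markdown(text, style: "", numbered: false)
  text = text.strip
  return nil if text.empty?

  if (match = style.match(/\AHeading(\d)/i))
    # Markdown has six heading levels, so out-of-range digits are clamped.
    "#{"#" * match[1].to_i.clamp(1, 6)} #{text}"
  elsif style.match?(/\ATitle\z/i)
    "# #{text}"
  elsif numbered || style.match?(/ListParagraph/i)
    "- #{text}"
  else
    text
  end
end

puts paragraph_to_markdown("Overview", style: "Heading2")    # ## Overview
puts paragraph_to_markdown("First item", style: "ListParagraph") # - First item
```

One consequence of this mapping is that numbered lists come out as `-` bullets, since the numbering definition is not consulted.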
@@ -0,0 +1,64 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Markdownator
4
+ module Converters
5
+ # Converts an EPUB into Markdown by reading the OPF spine order and running
6
+ # each XHTML chapter through the HTML converter.
7
+ class Epub < Base
8
+ CONTAINER_PATH = "META-INF/container.xml"
9
+
10
+ def accepts?(_io, stream_info)
11
+ matches?(stream_info, extensions: %w[epub], mimetypes: %w[application/epub+zip])
12
+ end
13
+
14
+ def convert(io, _stream_info, **_options)
15
+ Markdownator.require_optional("zip", feature: "EPUB conversion")
16
+ Markdownator.require_optional("nokogiri", feature: "EPUB conversion")
17
+
18
+ ::Zip::File.open_buffer(io) do |zip|
19
+ opf_path = locate_opf(zip)
20
+ raise FileConversionError, "EPUB is missing its OPF package document" if opf_path.nil?
21
+
22
+ opf = Nokogiri::XML(read(zip, opf_path))
23
+ opf.remove_namespaces!
24
+ base = File.dirname(opf_path)
25
+ title = opf.at_xpath("//metadata/title")&.text&.strip
26
+ chapters = spine_documents(zip, opf, base)
27
+ return Result.new(markdown: chapters.join("\n\n"), title: title)
28
+ end
29
+ end
30
+
31
+ private
32
+
33
+ def locate_opf(zip)
34
+ container = zip.find_entry(CONTAINER_PATH)
35
+ return nil if container.nil?
36
+
37
+ doc = Nokogiri::XML(container.get_input_stream.read)
38
+ doc.remove_namespaces!
39
+ doc.at_xpath("//rootfile/@full-path")&.value
40
+ end
41
+
42
+ def spine_documents(zip, opf, base)
43
+ manifest = opf.xpath("//manifest/item").to_h do |item|
44
+ [item["id"], item["href"]]
45
+ end
46
+
47
+ opf.xpath("//spine/itemref").filter_map do |ref|
48
+ href = manifest[ref["idref"]]
49
+ next if href.nil?
50
+
51
+ path = base == "." ? href : File.join(base, href)
52
+ html = read(zip, path)
53
+ next if html.nil?
54
+
55
+ Html.html_to_markdown(html)
56
+ end
57
+ end
58
+
59
+ def read(zip, path)
60
+ zip.find_entry(path)&.get_input_stream&.read
61
+ end
62
+ end
63
+ end
64
+ end
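Note on EPUB spine resolution: reading order comes from the spine's `idref` values, which are looked up in the manifest and resolved relative to the directory of the OPF file. The sketch below shows only that lookup, using plain hashes and arrays in place of the parsed XML. `spine_paths` and its arguments are hypothetical names for illustration.

```ruby
# manifest:     item id => href, as declared in the OPF <manifest>
# spine_idrefs: idref values in <spine> order
# opf_path:     path of the OPF file inside the archive
def spine_paths(manifest, spine_idrefs, opf_path)
  base = File.dirname(opf_path)
  spine_idrefs.filter_map do |idref|
    href = manifest[idref]
    next if href.nil? # spine entry with no matching manifest item

    base == "." ? href : File.join(base, href)
  end
end

manifest = { "c1" => "text/ch1.xhtml", "c2" => "text/ch2.xhtml" }
p spine_paths(manifest, %w[c2 missing c1], "OEBPS/content.opf")
# ["OEBPS/text/ch2.xhtml", "OEBPS/text/ch1.xhtml"]
```

`File.join` does not normalize `..` segments or decode percent-escapes, so an `href` containing them would probably not match a ZIP entry name under this approach.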
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Markdownator
4
+ module Converters
5
+ # Converts HTML into Markdown using reverse_markdown (Nokogiri-backed).
6
+ class Html < Base
7
+ def accepts?(_io, stream_info)
8
+ matches?(stream_info, extensions: %w[html htm], mimetypes: %w[text/html application/xhtml+xml])
9
+ end
10
+
11
+ def convert(io, stream_info, **_options)
12
+ html = read_all(io, stream_info)
13
+ Result.new(markdown: self.class.html_to_markdown(html), title: self.class.extract_title(html))
14
+ end
15
+
16
+ # Shared so other container converters (EPUB) can reuse HTML conversion.
17
+ def self.html_to_markdown(html)
18
+ Markdownator.require_optional("reverse_markdown", feature: "HTML conversion")
19
+ ReverseMarkdown.convert(html, unknown_tags: :bypass, github_flavored: true).strip
20
+ end
21
+
22
+ def self.extract_title(html)
23
+ Markdownator.require_optional("nokogiri", feature: "HTML conversion")
24
+ title = Nokogiri::HTML(html).at_css("title")&.text&.strip
25
+ title unless title.nil? || title.empty?
26
+ end
27
+ end
28
+ end
29
+ end
@@ -0,0 +1,88 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Markdownator
4
+ module Converters
5
+ # Converts an image into Markdown metadata (filename + EXIF fields). When a
6
+ # captioner is supplied via the `captioner:` option, its description is
7
+ # appended. A captioner is any object responding to
8
+ # `#caption(io, stream_info) -> String`.
9
+ class Image < Base
10
+ EXTENSIONS = %w[jpg jpeg png gif tif tiff].freeze
11
+ MIMETYPES = %w[image/jpeg image/png image/gif image/tiff].freeze
12
+
13
+ # EXIF fields worth surfacing, in display order.
14
+ EXIF_FIELDS = %i[
15
+ date_time make model orientation
16
+ f_number exposure_time iso_speed_ratings focal_length
17
+ gps_latitude gps_longitude image_description
18
+ ].freeze
19
+
20
+ def accepts?(_io, stream_info)
21
+ matches?(stream_info, extensions: EXTENSIONS, mimetypes: MIMETYPES)
22
+ end
23
+
24
+ def convert(io, stream_info, **options)
25
+ lines = []
26
+ lines << "# #{stream_info.filename}" if stream_info.filename
27
+
28
+ metadata = exif_metadata(io, stream_info)
29
+ metadata.each { |key, value| lines << "- **#{key}**: #{value}" }
30
+
31
+ caption = caption_for(io, stream_info, options[:captioner])
32
+ lines << "\n#{caption}" if caption
33
+
34
+ Result.new(markdown: lines.join("\n").strip, metadata: metadata)
35
+ end
36
+
37
+ private
38
+
39
+ def exif_metadata(io, stream_info)
40
+ return {} unless jpeg_or_tiff?(stream_info)
41
+
42
+ Markdownator.require_optional("exifr", feature: "image metadata extraction")
43
+ io.rewind if io.respond_to?(:rewind)
44
+ reader = exif_reader(stream_info, io)
45
+ return {} if reader.nil?
46
+
47
+ EXIF_FIELDS.each_with_object({}) do |field, acc|
48
+ next unless reader.respond_to?(field)
49
+
50
+ value = reader.public_send(field)
51
+ acc[field.to_s] = value.to_s unless value.nil? || value.to_s.empty?
52
+ end
53
+ rescue StandardError
54
+ {}
55
+ ensure
56
+ io.rewind if io.respond_to?(:rewind)
57
+ end
58
+
59
+ def exif_reader(stream_info, io)
60
+ if tiff?(stream_info)
61
+ EXIFR::TIFF.new(io)
62
+ else
63
+ EXIFR::JPEG.new(io)
64
+ end
65
+ end
66
+
67
+ def jpeg_or_tiff?(stream_info)
68
+ ext = stream_info.extension
69
+ mime = stream_info.guessed_mimetype
70
+ %w[jpg jpeg tif tiff].include?(ext) || %w[image/jpeg image/tiff].include?(mime)
71
+ end
72
+
73
+ def tiff?(stream_info)
74
+ stream_info.extension&.start_with?("tif") || stream_info.guessed_mimetype == "image/tiff"
75
+ end
76
+
77
+ def caption_for(io, stream_info, captioner)
78
+ return nil unless captioner.respond_to?(:caption)
79
+
80
+ io.rewind if io.respond_to?(:rewind)
81
+ text = captioner.caption(io, stream_info)
82
+ text unless text.nil? || text.to_s.strip.empty?
83
+ rescue StandardError
84
+ nil
85
+ end
86
+ end
87
+ end
88
+ end
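Note on the captioner hook: a captioner is any object with `#caption(io, stream_info)`, and `caption_for` treats a missing method, a blank result, or a raised error as "no caption". The sketch below reproduces that guard with two toy captioners to show both outcomes. `DemoStreamInfo`, `ByteCountCaptioner`, and `BrokenCaptioner` are hypothetical stand-ins, since no real vision model is involved.

```ruby
require "stringio"

# Hypothetical stand-in for Markdownator::StreamInfo (real class not shown here).
DemoStreamInfo = Struct.new(:filename, :extension, keyword_init: true)

# Describes the image by size only; a real captioner would call a vision model.
class ByteCountCaptioner
  def caption(io, stream_info)
    "#{stream_info.filename}: #{io.read.bytesize} bytes of image data"
  end
end

# Simulates a captioner whose backing service is down.
class BrokenCaptioner
  def caption(_io, _stream_info)
    raise IOError, "vision service unavailable"
  end
end

# Same guard as the converter: rewind, call, and swallow failures.
def caption_for(io, stream_info, captioner)
  return nil unless captioner.respond_to?(:caption)

  io.rewind if io.respond_to?(:rewind)
  text = captioner.caption(io, stream_info)
  text unless text.nil? || text.to_s.strip.empty?
rescue StandardError
  nil
end

io = StringIO.new("\xFF\xD8\xFF".b)
info = DemoStreamInfo.new(filename: "photo.jpg", extension: "jpg")
puts caption_for(io, info, ByteCountCaptioner.new)
p caption_for(io, info, BrokenCaptioner.new) # nil, the error is swallowed
```

Swallowing errors keeps image conversion working when the captioner fails, at the cost of hiding the failure from the caller. A captioner that needs visibility would have to log inside `#caption` itself.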
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+
5
+ module Markdownator
6
+ module Converters
7
+ # Renders JSON as a pretty-printed fenced code block (re-serialized, so the original formatting is not kept).
8
+ class Json < Base
9
+ def accepts?(_io, stream_info)
10
+ matches?(stream_info, extensions: %w[json], mimetypes: %w[application/json text/json])
11
+ end
12
+
13
+ def convert(io, stream_info, **_options)
14
+ raw = read_all(io, stream_info)
15
+ pretty = JSON.pretty_generate(JSON.parse(raw))
16
+ Result.new(markdown: "```json\n#{pretty}\n```")
17
+ rescue JSON::ParserError => e
18
+ raise FileConversionError, "Could not parse JSON: #{e.message}"
19
+ end
20
+ end
21
+ end
22
+ end
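Note on the JSON converter: its whole transformation is parse, pretty-print, and wrap in a `json` code fence. The standalone sketch below shows that round trip using only the standard library. `json_to_markdown` and `DemoFileConversionError` are hypothetical names standing in for the converter method and the gem's error class, which is not defined in this excerpt.

```ruby
require "json"

# Hypothetical stand-in for Markdownator::FileConversionError.
class DemoFileConversionError < StandardError; end

# Parses +raw+, re-serializes it with indentation, and wraps it in a code fence.
def json_to_markdown(raw)
  fence = "`" * 3 # built at runtime to avoid a literal fence inside this example
  pretty = JSON.pretty_generate(JSON.parse(raw))
  "#{fence}json\n#{pretty}\n#{fence}"
rescue JSON::ParserError => e
  raise DemoFileConversionError, "Could not parse JSON: #{e.message}"
end

puts json_to_markdown('{"name":"markdownator","formats":["csv","json"]}')
```

Since the input is parsed and regenerated, the original whitespace is replaced by two-space indentation, and input that is not valid JSON raises an error instead of being passed through.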