site-to-md 1.0.5

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: b2d16ad412ed98a97e3871cf72fd3b83665b4ce9f802eb6264e8f76c77b9f3a7
4
+ data.tar.gz: 7f2aa545d6ee2a3e97f8a80fc69e6c2e68b0f117b908660fc4370282f34008f3
5
+ SHA512:
6
+ metadata.gz: 0527d93e4d21d06ae60bab265edd19b500ffd584623bcd3a349521797ce60a8c466fef932724cbaa229bef6ddad3ecc1de1bc3845db74172792c826f566f571d
7
+ data.tar.gz: 8c1631e55a27e3f2274cf18e2a0a5f2b9af148a2869e78432a4767da9f4456cb14e9ea28ebdf3a4de1d5ff18bf7f0eaea70f7e2a5359d2c4c5dca9f268f96fbe
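The digests listed in checksums.yaml can be checked against the files of an unpacked gem. The following is a minimal sketch, not part of site-to-md or RubyGems: `verify_sha256` is a hypothetical helper, and it assumes the real YAML file nests the file names under the `SHA256` key (the indentation is not visible in this rendering).

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (an assumption, not a RubyGems API): compares the SHA256
# entries of a checksums.yaml document against the files found in `dir`.
# Returns a hash of file name => true/false.
def verify_sha256(checksums_yaml, dir)
  expected = YAML.safe_load(checksums_yaml).fetch('SHA256')
  expected.to_h do |name, hex|
    actual = Digest::SHA256.file(File.join(dir, name)).hexdigest
    [name, actual == hex.to_s.downcase]
  end
end
```

Assuming `metadata.gz`, `data.tar.gz` and an uncompressed `checksums.yaml` sit in the current directory, `verify_sha256(File.read('checksums.yaml'), '.')` would report which of the two files match.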
data/CHANGELOG.md ADDED
@@ -0,0 +1,55 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [1.0.5] - 2024-12-27
11
+
12
+ ### Fixed
13
+
14
+ - Fix Release process
15
+
16
+ ## [1.0.4] - 2024-12-27
17
+
18
+ ### Fixed
19
+
20
+ - Fix Release process
21
+
22
+ ## [1.0.3] - 2024-12-27
23
+
24
+ ### Fixed
25
+
26
+ - Fix Release process
27
+
28
+ ## [1.0.2] - 2024-12-27
29
+
30
+ ### Fixed
31
+
32
+ - Fix Release process
33
+
34
+ ## [1.0.1] - 2024-12-27
35
+
36
+ ## 1.0.0 - 2024-12-26
37
+
38
+ ### Added
39
+
40
+ - Command-line interface (CLI) for converting HTML files to a single markdown document
41
+ - Support for extracting content from `<main>` tag with `<body>` fallback
42
+ - Preservation of frontmatter metadata in markdown output
43
+ - Removal of unnecessary HTML elements to optimize markdown for AI tools
44
+ - Enhanced description emphasizing the tool's ease of use for AI tools like Claude AI or ChatGPT
45
+ - Initial test suite for ensuring code quality and reliability
46
+ - Continuous Integration (CI) setup using GitHub Actions
47
+ - Dockerfile to make the CLI tool available as a Docker image (useful for CI)
48
+ - Dependabot configuration for automated dependency updates
49
+
50
+ [1.0.1]: https://github.com/tmaier/site-to-md/compare/v1.0.0...v1.0.1
51
+ [1.0.2]: https://github.com/tmaier/site-to-md/compare/v1.0.1...v1.0.2
52
+ [1.0.3]: https://github.com/tmaier/site-to-md/compare/v1.0.2...v1.0.3
53
+ [1.0.4]: https://github.com/tmaier/site-to-md/compare/v1.0.3...v1.0.4
54
+
55
+ [1.0.5]: https://github.com/tmaier/site-to-md/compare/v1.0.4...v1.0.5
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 maier.io UG (haftungsbeschränkt)
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,206 @@
1
+ # site-to-md
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/site-to-md.svg)](https://badge.fury.io/rb/site-to-md)
4
+ [![Tests](https://github.com/tmaier/site-to-md/workflows/Tests/badge.svg)](https://github.com/tmaier/site-to-md/actions?query=workflow%3ATests)
5
+ [![RuboCop](https://github.com/tmaier/site-to-md/workflows/RuboCop/badge.svg)](https://github.com/tmaier/site-to-md/actions?query=workflow%3ARuboCop)
6
+ [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
7
+
8
+ This command-line tool aggregates content from HTML pages into a single, streamlined markdown file.
9
+ It removes unnecessary HTML elements to reduce token usage and provides an easily uploadable format for AI tools like Claude AI or ChatGPT.
10
+
11
+ This tool can also be used to create a [llms.txt or llms-full.txt](https://llmstxt.org/).
12
+ For this use case, consider adding this tool to the build pipeline of your static website.
13
+
14
+ ## Features
15
+
16
+ - Converts all HTML files in a directory (and subdirectories) to markdown
17
+ - Extracts content from `<main>` tag (falls back to `<body>` if not found)
18
+ - Preserves document structure with frontmatter metadata
19
+ - Maintains proper markdown formatting for:
20
+   - Headers
21
+   - Lists
22
+   - Tables
23
+   - Code blocks
24
+   - Links
25
+   - And more...
26
+ - Command-line interface for easy integration
27
+
28
+ ## Installation
29
+
30
+ ```bash
31
+ gem install site-to-md
32
+ ```
33
+
34
+ Or add to your Gemfile:
35
+
36
+ ```ruby
37
+ gem 'site-to-md'
38
+ ```
39
+
40
+ ## Usage
41
+
42
+ ### Command Line
43
+
44
+ Basic usage:
45
+
46
+ ```bash
47
+ site-to-md convert path/to/site
48
+ ```
49
+
50
+ Specify output file:
51
+
52
+ ```bash
53
+ site-to-md convert path/to/site -o output.md
54
+ ```
55
+
56
+ Get help:
57
+
58
+ ```bash
59
+ site-to-md help
60
+ ```
61
+
62
+ ### Docker Image
63
+
64
+ You can use the [site-to-md tool via a Docker image](https://github.com/tmaier/site-to-md/pkgs/container/site-to-md), making it convenient to include in your build pipeline for static websites.
65
+
66
+ #### Example: GitLab CI
67
+
68
+ Here's an example GitLab CI configuration.
69
+ This configuration includes a job `llms-full-txt` that uses the Docker image to convert HTML files in the public folder and generates the llms-full.txt file in the same folder.
70
+
71
+ ```yaml
72
+ llms-full-txt:
73
+   image: ghcr.io/tmaier/site-to-md:latest
74
+   script:
75
+     - site-to-md convert public -o public/llms-full.txt
76
+   artifacts:
77
+     paths:
78
+       - public/llms-full.txt
79
+ ```
80
+
81
+ ### Ruby API
82
+
83
+ ```ruby
84
+ require 'site_to_md'
85
+
86
+ processor = SiteToMd::Processor.new('path/to/site', 'output.md')
87
+ processor.process
88
+ ```
89
+
90
+ ## Output Format
91
+
92
+ The generated markdown file contains all HTML documents concatenated with frontmatter metadata:
93
+
94
+ ```markdown
95
+ ---
96
+ path: relative/path/to/file.html
97
+ title: Page Title
98
+ ---
99
+
100
+ # Content starts here
101
+
102
+ Document content in markdown format...
103
+
104
+ ================================================================
105
+
106
+ ---
107
+
108
+ path: another/file.html
109
+ title: Another Page
110
+
111
+ ---
112
+
113
+ More content...
114
+ ```
115
+
116
+ ## Development
117
+
118
+ ### Requirements
119
+
120
+ - Ruby 3.2 or higher
121
+ - Bundler
122
+
123
+ ### Getting Started
124
+
125
+ 1. Clone the repository
126
+ 2. Open in VSCode with Dev Containers extension installed
127
+ 3. Click "Reopen in Container" when prompted
128
+
129
+ The development container will set up everything you need:
130
+
131
+ - Ruby development environment
132
+ - Ruby LSP for code intelligence
133
+ - RuboCop for code style checking
134
+ - Development dependencies
135
+
136
+ Alternatively, you can run without Dev Containers:
137
+
138
+ ```bash
139
+ bin/setup
140
+ ```
141
+
142
+ ### Testing
143
+
144
+ Run the test suite:
145
+
146
+ ```bash
147
+ bin/rake test
148
+ ```
149
+
150
+ Run the linter:
151
+
152
+ ```bash
153
+ bin/rubocop
154
+ ```
155
+
156
+ ### Dependency Management
157
+
158
+ We use Dependabot to keep dependencies up to date.
159
+ Dependabot creates pull requests to update:
160
+
161
+ - Ruby gem dependencies (weekly)
162
+ - GitHub Actions (weekly)
163
+ - Dockerfile (weekly)
164
+
165
+ ### Release Process
166
+
167
+ To create a new release, run `bin/cut-new-release {major|minor|patch}`.
168
+ Follow semantic versioning guidelines and ensure the [CHANGELOG](CHANGELOG.md) is up to date with the latest changes.
169
+
170
+ `bin/cut-new-release` will:
171
+
172
+ - Ensure some basic checks pass (e.g., right branch, clean working directory, tests pass)
173
+ - Update the version number
174
+ - Update the [CHANGELOG](CHANGELOG.md) with the new version number and release date
175
+ - Commit the changes and create the new tag
176
+ - Push the changes to GitHub
177
+
178
+ You need to create a new release manually on GitHub and update it with the content of the [CHANGELOG](CHANGELOG.md).
179
+
180
+ ## Contributing
181
+
182
+ Bug reports and pull requests are welcome on GitHub at <https://github.com/tmaier/site-to-md>.
183
+ This project is intended to be a safe, welcoming space for collaboration.
184
+
185
+ 1. Fork it
186
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
187
+ 3. Make your changes:
188
+    - Add tests for new functionality
189
+    - Update documentation if needed
190
+    - Ensure tests pass (`rake test`)
191
+    - Ensure code style checks pass (`bundle exec rubocop`)
192
+ 4. Commit your changes (`git commit -am 'Add some feature'`)
193
+ 5. Push to the branch (`git push origin my-new-feature`)
194
+ 6. Create new Pull Request
195
+
196
+ ## License
197
+
198
+ The gem is available as open source under the terms of the [MIT License](LICENSE).
199
+
200
+ ## About
201
+
202
+ site-to-md is maintained by [maier.io UG (haftungsbeschränkt)](https://maier.io) and [Tobias L. Maier](https://tobiasmaier.info).
203
+
204
+ ## Related Projects
205
+
206
+ - [reverse_markdown](https://github.com/xijo/reverse_markdown) - The HTML to Markdown converter used by this gem
data/exe/site-to-md ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'bundler/setup'
5
+ require 'site_to_md'
6
+
7
+ SiteToMd::CLI.start(ARGV)
data/lib/site_to_md/cli.rb ADDED
@@ -0,0 +1,23 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'thor'
4
+
5
+ module SiteToMd
6
+ # CLI class handles the command line interface for site-to-markdown conversion.
7
+ class CLI < Thor
8
+ desc 'convert DIRECTORY', 'Convert HTML files from DIRECTORY to markdown'
9
+ method_option :output, aliases: '-o', desc: 'Output file path', default: 'site_content.md'
10
+ def convert(directory)
11
+ processor = Processor.new(directory, options[:output])
12
+ processor.process
13
+ puts "Successfully converted HTML files to #{options[:output]}"
14
+ rescue StandardError => e
15
+ puts "Error: #{e.message}"
16
+ exit 1
17
+ end
18
+
19
+ def self.exit_on_failure?
20
+ true
21
+ end
22
+ end
23
+ end
data/lib/site_to_md/errors.rb ADDED
@@ -0,0 +1,27 @@
1
+ # frozen_string_literal: true
2
+
3
+ module SiteToMd
4
+ # Base error class for all SiteToMd errors
5
+ class Error < StandardError; end
6
+
7
+ # Error raised when no converter is available for a given file type
8
+ class UnsupportedFileTypeError < Error
9
+ def initialize(file)
10
+ super("No converter available for #{file}")
11
+ end
12
+ end
13
+
14
+ # Error raised when no files are found in the source directory
15
+ class NoFilesFoundError < Error
16
+ def initialize
17
+ super('No files found in the source directory')
18
+ end
19
+ end
20
+
21
+ # Error raised when the source directory is invalid
22
+ class InvalidSourceDirectoryError < ArgumentError
23
+ def initialize
24
+ super('Source directory is required and must exist')
25
+ end
26
+ end
27
+ end
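One consequence of the hierarchy above: `InvalidSourceDirectoryError` inherits from `ArgumentError`, so a `rescue SiteToMd::Error` clause would not catch it. The sketch below mirrors the three class definitions in a stand-in module (`ErrorSketch` and `classify` are hypothetical names used only for illustration) to show which rescue clause handles each error.

```ruby
# Stand-in module mirroring the error classes above; the name ErrorSketch and
# the classify helper are assumptions for illustration, not library code.
module ErrorSketch
  class Error < StandardError; end

  class NoFilesFoundError < Error
    def initialize
      super('No files found in the source directory')
    end
  end

  class InvalidSourceDirectoryError < ArgumentError
    def initialize
      super('Source directory is required and must exist')
    end
  end

  # Raises the given error class and reports which rescue clause caught it.
  def self.classify(error_class)
    raise error_class
  rescue Error => e
    [:library_error, e.message]
  rescue ArgumentError => e
    [:argument_error, e.message]
  end
end
```

Callers that want to handle every failure raised by the library would therefore need to rescue both `SiteToMd::Error` and `ArgumentError`, or `StandardError` as the CLI does.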
data/lib/site_to_md/file_converter.rb ADDED
@@ -0,0 +1,60 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'nokogiri'
4
+
5
+ module SiteToMd
6
+ # FileConverter is responsible for converting individual files to a format based on the given converter.
7
+ class FileConverter
8
+ def initialize(file_path, base_directory, converter)
9
+ @file_path = file_path
10
+ @base_directory = base_directory
11
+ @converter = converter
12
+ end
13
+
14
+ def convert
15
+ return nil if content.nil? || content.strip.empty?
16
+
17
+ format_document
18
+ end
19
+
20
+ private
21
+
22
+ def relative_path
23
+ @file_path.sub("#{@base_directory}/", '')
24
+ end
25
+
26
+ def document
27
+ @document ||= Nokogiri::HTML(File.read(@file_path))
28
+ end
29
+
30
+ def title
31
+ document.at_css('title')&.text || 'Untitled'
32
+ end
33
+
34
+ def content_element
35
+ @content_element ||= document.at_css('main') || document.at_css('body')
36
+ end
37
+
38
+ def html_content
39
+ content_element.to_html
40
+ end
41
+
42
+ def content
43
+ @content ||= @converter.convert(html_content)
44
+ end
45
+
46
+ def format_document
47
+ <<~DOCUMENT
48
+ ---
49
+ path: #{relative_path}
50
+ title: #{title}
51
+ ---
52
+
53
+ #{content.strip}
54
+
55
+ ================================================================
56
+
57
+ DOCUMENT
58
+ end
59
+ end
60
+ end
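The layout produced by `format_document` (frontmatter, body, then a line of `=` characters) can be split back into individual documents. The sketch below is an illustration under stated assumptions, not part of the gem: `split_documents`, `SEPARATOR` and `FRONTMATTER` are hypothetical names, and the parser assumes that no document body contains its own line of 16 or more `=` characters and that titles do not span multiple lines.

```ruby
# Hypothetical constants and helper; not part of site-to-md.
SEPARATOR = /^={16,}$/
FRONTMATTER = /\A---\n(?<meta>.*?)\n---\n(?<body>.*)\z/m

# Splits a combined markdown file into hashes with :path, :title and :body.
# Chunks without a leading frontmatter block are skipped.
def split_documents(combined)
  combined.split(SEPARATOR).filter_map do |chunk|
    match = FRONTMATTER.match(chunk.strip)
    next unless match

    # Split each "key: value" line on the first ": " only, so values may
    # themselves contain colons.
    meta = match[:meta].lines.filter_map do |line|
      key, value = line.chomp.split(': ', 2)
      [key, value] if value
    end.to_h
    { path: meta['path'], title: meta['title'], body: match[:body].strip }
  end
end
```

Because the frontmatter is written by plain string interpolation rather than a YAML emitter, the sketch splits lines by hand instead of using a YAML parser.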
data/lib/site_to_md/html_converter.rb ADDED
@@ -0,0 +1,21 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'reverse_markdown'
4
+
5
+ module SiteToMd
6
+ # HTMLConverter uses ReverseMarkdown to convert HTML to markdown.
7
+ class HTMLConverter
8
+ def initialize
9
+ @config = {
10
+ unknown_tags: :bypass,
11
+ github_flavored: true,
12
+ tables: true,
13
+ tag_border: ' '
14
+ }
15
+ end
16
+
17
+ def convert(html)
18
+ ReverseMarkdown.convert(html, @config)
19
+ end
20
+ end
21
+ end
data/lib/site_to_md/processor.rb ADDED
@@ -0,0 +1,54 @@
1
+ # frozen_string_literal: true
2
+
3
+ module SiteToMd
4
+ # Processor collects HTML files and converts them using FileConverter.
5
+ class Processor
6
+ CONVERTERS = { '.html' => HTMLConverter.new }.freeze
7
+
8
+ def initialize(source_directory, output_file = 'site_content.md')
9
+ raise InvalidSourceDirectoryError if source_directory.nil? || source_directory.empty?
10
+ raise InvalidSourceDirectoryError unless Dir.exist?(source_directory)
11
+
12
+ @source_directory = source_directory
13
+ @output_file = output_file
14
+ end
15
+
16
+ def process
17
+ files = collect_files
18
+ raise NoFilesFoundError if files.empty?
19
+
20
+ content = convert_files(files)
21
+ write_output(content)
22
+ end
23
+
24
+ private
25
+
26
+ def collect_files
27
+ extensions_pattern = CONVERTERS.keys.map { |ext| ext.delete_prefix('.') }.join(',')
28
+ Dir.glob(File.join(@source_directory, '**', "*.{#{extensions_pattern}}"))
29
+ end
30
+
31
+ def convert_files(files)
32
+ files.each_with_object([]) do |file, output|
33
+ content = convert_file(file)
34
+ output << content if content
35
+ end.join("\n")
36
+ end
37
+
38
+ def convert_file(file)
39
+ extension = File.extname(file)
40
+ converter = CONVERTERS.fetch(extension) { raise SiteToMd::UnsupportedFileTypeError, file }
41
+ document = FileConverter.new(file, @source_directory, converter)
42
+ document.convert
43
+ rescue SiteToMd::UnsupportedFileTypeError
44
+ raise
45
+ rescue StandardError => e
46
+ warn "Error processing #{file}: #{e.message}"
47
+ nil
48
+ end
49
+
50
+ def write_output(content)
51
+ File.write(@output_file, content)
52
+ end
53
+ end
54
+ end
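`collect_files` turns the keys of `CONVERTERS` into a brace pattern for `Dir.glob`, so registering another extension would widen the search without further changes. The standalone sketch below reproduces that pattern construction outside the class; passing the extensions as an argument and the `.htm` entry in the usage note are assumptions for illustration, since the gem only registers `.html`.

```ruby
# Standalone version of the glob construction used by Processor#collect_files.
# The `extensions` parameter is an assumption; the gem reads CONVERTERS.keys.
def collect_files(root, extensions)
  names = extensions.map { |ext| ext.delete_prefix('.') }.join(',')
  # "**" matches zero or more directory levels, so files directly inside
  # `root` are included as well as files in subdirectories.
  Dir.glob(File.join(root, '**', "*.{#{names}}"))
end
```

For example, `collect_files('public', ['.html', '.htm'])` would search with the pattern `public/**/*.{html,htm}`.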
data/lib/site_to_md/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module SiteToMd
4
+ VERSION = '1.0.5'
5
+ end
data/lib/site_to_md.rb ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'site_to_md/cli'
4
+ require 'site_to_md/errors'
5
+ require 'site_to_md/file_converter'
6
+ require 'site_to_md/html_converter'
7
+ require 'site_to_md/processor'
8
+ require 'site_to_md/version'
9
+
10
+ # SiteToMd module serves as the namespace for all classes related to the site-to-markdown conversion.
11
+ module SiteToMd
12
+ end
data/test/file_converter_test.rb ADDED
@@ -0,0 +1,50 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'test_helper'
4
+
5
+ class FileConverterTest < Minitest::Test
6
+ include TestHelpers
7
+
8
+ def setup
9
+ @sample_site = fixture_path('sample_site')
10
+ @markdown_converter = SiteToMd::HTMLConverter.new
11
+ end
12
+
13
+ def test_converts_document_with_main_tag # rubocop:disable Minitest/MultipleAssertions
14
+ file_path = File.join(@sample_site, 'index.html')
15
+ converter = SiteToMd::FileConverter.new(file_path, @sample_site, @markdown_converter)
16
+ result = converter.convert
17
+
18
+ assert_match(/^---$/, result)
19
+ assert_match(/^path: index\.html$/, result)
20
+ assert_match(/^title: Home Page$/, result)
21
+ assert_match(/# Welcome/, result)
22
+ assert_match(/This is a test page/, result)
23
+ end
24
+
25
+ def test_converts_document_without_main_tag # rubocop:disable Minitest/MultipleAssertions
26
+ file_path = File.join(@sample_site, 'about.html')
27
+ converter = SiteToMd::FileConverter.new(file_path, @sample_site, @markdown_converter)
28
+ result = converter.convert
29
+
30
+ assert_match(/^---$/, result)
31
+ assert_match(/^path: about\.html$/, result)
32
+ assert_match(/^title: About$/, result)
33
+ assert_match(/About page content/, result)
34
+ end
35
+
36
+ def test_handles_missing_title
37
+ html = '<html><body><p>Content</p></body></html>'
38
+ temp_file = File.join(@sample_site, 'no_title.html')
39
+ File.write(temp_file, html)
40
+
41
+ begin
42
+ converter = SiteToMd::FileConverter.new(temp_file, @sample_site, @markdown_converter)
43
+ result = converter.convert
44
+
45
+ assert_match(/^title: Untitled$/, result)
46
+ ensure
47
+ File.delete(temp_file)
48
+ end
49
+ end
50
+ end
data/test/fixtures/sample_site/about.html ADDED
@@ -0,0 +1,9 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <title>About</title>
5
+ </head>
6
+ <body>
7
+ <p>About page content</p>
8
+ </body>
9
+ </html>
data/test/fixtures/sample_site/index.html ADDED
@@ -0,0 +1,12 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <title>Home Page</title>
5
+ </head>
6
+ <body>
7
+ <main>
8
+ <h1>Welcome</h1>
9
+ <p>This is a test page</p>
10
+ </main>
11
+ </body>
12
+ </html>
data/test/html_converter_test.rb ADDED
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'test_helper'
4
+
5
+ class HTMLConverterTest < Minitest::Test
6
+ def setup
7
+ @converter = SiteToMd::HTMLConverter.new
8
+ end
9
+
10
+ def test_converts_basic_html
11
+ html = '<h1>Title</h1><p>Content</p>'
12
+ result = @converter.convert(html)
13
+
14
+ assert_equal "# Title\n\nContent", result.strip
15
+ end
16
+
17
+ def test_converts_links
18
+ html = '<a href="https://example.com">Link</a>'
19
+ result = @converter.convert(html)
20
+
21
+ assert_equal '[Link](https://example.com)', result.strip
22
+ end
23
+
24
+ def test_converts_lists
25
+ html = '<ul><li>Item 1</li><li>Item 2</li></ul>'
26
+ result = @converter.convert(html)
27
+
28
+ assert_match(/- Item 1\n- Item 2/, result.strip)
29
+ end
30
+
31
+ def test_converts_tables
32
+ html = '<table><tr><th>Header</th></tr><tr><td>Data</td></tr></table>'
33
+ result = @converter.convert(html)
34
+
35
+ assert_match(/\| Header \|/, result)
36
+ assert_match(/\| Data \|/, result)
37
+ end
38
+ end
data/test/processor_test.rb ADDED
@@ -0,0 +1,58 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'test_helper'
4
+
5
+ class ProcessorTest < Minitest::Test
6
+ include TestHelpers
7
+
8
+ def setup
9
+ @temp_dir = create_temp_dir
10
+ @sample_site = fixture_path('sample_site')
11
+ @output_file = File.join(@temp_dir, 'output.md')
12
+ @processor = SiteToMd::Processor.new(@sample_site, @output_file)
13
+ end
14
+
15
+ def teardown
16
+ remove_temp_dir(@temp_dir)
17
+ end
18
+
19
+ def test_raises_error_for_nonexistent_directory
20
+ assert_raises(ArgumentError) do
21
+ SiteToMd::Processor.new('nonexistent_directory')
22
+ end
23
+ end
24
+
25
+ def test_raises_error_for_unsupported_extension
26
+ unsupported_file = File.join(@sample_site, 'unsupported.foobar')
27
+ File.write(unsupported_file, 'This is test content.')
28
+
29
+ begin
30
+ assert_raises(SiteToMd::UnsupportedFileTypeError) do
31
+ @processor.send(:convert_file, unsupported_file)
32
+ end
33
+ ensure
34
+ FileUtils.rm_f(unsupported_file)
35
+ end
36
+ end
37
+
38
+ def test_processes_html_files # rubocop:disable Minitest/MultipleAssertions
39
+ @processor.process
40
+
41
+ assert_path_exists @output_file
42
+ content = File.read(@output_file)
43
+
44
+ assert_match(/path: index\.html/, content)
45
+ assert_match(/title: Home Page/, content)
46
+ assert_match(/# Welcome/, content)
47
+ assert_match(/This is a test page/, content)
48
+ end
49
+
50
+ def test_handles_files_without_main_tag
51
+ @processor.process
52
+ content = File.read(@output_file)
53
+
54
+ assert_match(/path: about\.html/, content)
55
+ assert_match(/title: About/, content)
56
+ assert_match(/About page content/, content)
57
+ end
58
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'fileutils'
4
+ require 'minitest/autorun'
5
+ require 'minitest/pride'
6
+ require 'site_to_md'
7
+
8
+ module TestHelpers
9
+ def fixture_path(path)
10
+ File.join(File.expand_path('fixtures', __dir__), path)
11
+ end
12
+
13
+ def create_temp_dir
14
+ dir = File.join(Dir.tmpdir, "site-to-md-#{Time.now.to_i}")
15
+ FileUtils.mkdir_p(dir)
16
+ dir
17
+ end
18
+
19
+ def remove_temp_dir(dir)
20
+ FileUtils.rm_rf(dir) if dir && File.directory?(dir)
21
+ end
22
+ end
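`create_temp_dir` names the directory after the current epoch second, so two calls within the same second would return the same path. It also calls `Dir.tmpdir`, which is defined by the `tmpdir` standard library; the helper does not require that library itself, so it presumably relies on another file loading it. A possible alternative, sketched here under the assumption that a unique directory per call is wanted, is `Dir.mktmpdir`. The module name `TempDirSketch` is hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical variant of the TestHelpers temp-dir methods (an assumption,
# not the gem's code): Dir.mktmpdir creates a uniquely named directory.
module TempDirSketch
  def create_temp_dir
    # Without a block, mktmpdir returns the path and leaves cleanup to the caller.
    Dir.mktmpdir('site-to-md-')
  end

  def remove_temp_dir(dir)
    FileUtils.rm_rf(dir) if dir && File.directory?(dir)
  end
end
```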
metadata ADDED
@@ -0,0 +1,113 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: site-to-md
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.5
5
+ platform: ruby
6
+ authors:
7
+ - maier.io
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-12-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.18'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.18'
27
+ - !ruby/object:Gem::Dependency
28
+ name: reverse_markdown
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '3.0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '3.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: thor
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '1.3'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '1.3'
55
+ description: |
56
+ A tool that extracts and combines text from HTML files into a single, streamlined markdown document.
57
+ It provides a command-line interface for easy usage, removes unnecessary HTML elements to reduce
58
+ token usage, and creates an easily uploadable format for AI tools like Claude AI or ChatGPT.
59
+ The tool preserves document structure and includes frontmatter metadata.
60
+ email:
61
+ - hello@maier.io
62
+ executables:
63
+ - site-to-md
64
+ extensions: []
65
+ extra_rdoc_files: []
66
+ files:
67
+ - CHANGELOG.md
68
+ - LICENSE
69
+ - README.md
70
+ - exe/site-to-md
71
+ - lib/site_to_md.rb
72
+ - lib/site_to_md/cli.rb
73
+ - lib/site_to_md/errors.rb
74
+ - lib/site_to_md/file_converter.rb
75
+ - lib/site_to_md/html_converter.rb
76
+ - lib/site_to_md/processor.rb
77
+ - lib/site_to_md/version.rb
78
+ - test/file_converter_test.rb
79
+ - test/fixtures/sample_site/about.html
80
+ - test/fixtures/sample_site/index.html
81
+ - test/html_converter_test.rb
82
+ - test/processor_test.rb
83
+ - test/test_helper.rb
84
+ homepage: https://github.com/tmaier/site-to-md
85
+ licenses:
86
+ - MIT
87
+ metadata:
88
+ homepage_uri: https://github.com/tmaier/site-to-md
89
+ source_code_uri: https://github.com/tmaier/site-to-md
90
+ changelog_uri: https://github.com/tmaier/site-to-md/blob/main/CHANGELOG.md
91
+ bug_tracker_uri: https://github.com/tmaier/site-to-md/issues
92
+ documentation_uri: https://github.com/tmaier/site-to-md/blob/main/README.md
93
+ rubygems_mfa_required: 'true'
94
+ post_install_message:
95
+ rdoc_options: []
96
+ require_paths:
97
+ - lib
98
+ required_ruby_version: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - ">="
101
+ - !ruby/object:Gem::Version
102
+ version: 3.2.0
103
+ required_rubygems_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - ">="
106
+ - !ruby/object:Gem::Version
107
+ version: '0'
108
+ requirements: []
109
+ rubygems_version: 3.4.19
110
+ signing_key:
111
+ specification_version: 4
112
+ summary: Convert static site HTML to a single markdown file
113
+ test_files: []
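The runtime dependencies in the metadata use pessimistic (`~>`) constraints, and the Ruby version uses `>=`. As a rough illustration of what those strings accept, the sketch below evaluates them with RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the helper name `satisfies?` and the sample version numbers are assumptions chosen for the example.

```ruby
require 'rubygems'

# Hypothetical helper: true when `version` satisfies the requirement string.
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# "~> 1.18" allows any 1.x release from 1.18 upward, but not 2.0.
puts satisfies?('~> 1.18', '1.18.3')
puts satisfies?('~> 1.18', '2.0.0')
# ">= 3.2.0" is the required_ruby_version constraint from the metadata.
puts satisfies?('>= 3.2.0', '3.1.4')
```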