llm-docs-builder 0.10.0 → 0.11.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e6407836216f436b728247009f614ea4ea5c2b4de0edf855717129460df4b309
- data.tar.gz: 8a556fc0b6307529f5c615c05b082521bd021e9e9f34ca30c8bb21d22b21deb2
+ metadata.gz: deeae74a329018b4a43d7845a3be8b7c31347699c3ff7abd93d7b697a48982a3
+ data.tar.gz: f9c842caa93a45b4d75c45a15e116e6f98d8463e5268e2de32e498c725877e4f
  SHA512:
- metadata.gz: 3a0b657545415c35187fa1595f3fcc5bd5c27a0c1a292bf00635d3c6f14221ac0a254f008f83acc1f7351ef76c6649e6b0540ac585cd64a670b05ef725b0308e
- data.tar.gz: 68e95142e374ebae3c292db724a9163c396fd423eaa04983bbdaefd6f6cbb0bc72822dfbc5af9fd8f357f1545320db70bf5a6c8a2413fc5f796e73195adcdd13
+ metadata.gz: 94575eced147bd6740b5395acd41d3f46ffcadf40908831df081d5e03f56b35a2e1e9acfdfc7642af775b2aa86fe48ea322dd11baf48512d2f2ef43a1a491079
+ data.tar.gz: 52b7d40d4a95acd20a408f4d453f32c7154637ce42ec210cf67010ce10dbe14ad711c4cbfd4060d8ce5018b91668315d552732ea7118374e69228a869792ff0f
@@ -24,7 +24,7 @@ jobs:
  fetch-depth: 0
  
  - name: Set up Ruby
- uses: ruby/setup-ruby@ab177d40ee5483edb974554986f56b33477e21d0 # v1.265.0
+ uses: ruby/setup-ruby@4ff6f3611a42bc75eee1e5138240eb1613f48c8f # v1.266.0
  with:
  bundler-cache: false
 
data/AGENTS.md ADDED
@@ -0,0 +1,20 @@
+ # Repository Guidelines
+
+ ## Project Structure & Module Organization
+ Core gem code lives in `lib/llm_docs_builder`, with single-responsibility modules such as `generator.rb`, `validator.rb`, and the CLI glue in `cli.rb`. Shared entrypoint `lib/llm_docs_builder.rb` wires dependencies. Executables reside in `bin/`: `llm-docs-builder` boots the CLI, while `rspecs` runs the full test matrix. Specs mirror library files under `spec/` with command-level coverage in `spec/integrations`. Static assets (logos, diff screenshots) are in `misc/`. Example configuration templates live at `llm-docs-builder.yml.example`.
+
+ ## Build, Test, and Development Commands
+ - `bundle install` — sync gem dependencies defined in `Gemfile`.
+ - `bundle exec rake` — default task; runs RSpec and RuboCop together.
+ - `bundle exec rspec` or `bin/rspecs` — execute unit and integration specs with doc formatter.
+ - `bundle exec rubocop` — enforce the Ruby style guide; mirrors CI.
+ - `bin/llm-docs-builder transform --docs README.md` — smoke-test the CLI against a local file.
+
+ ## Coding Style & Naming Conventions
+ Target Ruby 3.2 with two-space indentation and trailing newline. Prefer single-quoted strings; enable `# frozen_string_literal: true` headers on Ruby files. Keep lines ≤120 characters except where the RuboCop config allows. Use descriptive module/class names (e.g., `LlmDocsBuilder::Generator`) and predicate methods ending with `?` when returning booleans. Place supporting fixtures in `spec/support` if added, and name files after the class they extend.
+
+ ## Testing Guidelines
+ RSpec is the sole testing framework. Name files `*_spec.rb` and align describe blocks with constant paths. Integration scenarios belong in `spec/integrations` to capture CLI behaviors. SimpleCov is enabled by default for line and branch coverage; export `SIMPLECOV=false` for quick local runs. Persist example statuses with the automatically managed `spec/examples.txt`.
+
+ ## Commit & Pull Request Guidelines
+ Keep commit subjects short, present-tense, and focused (e.g., `Align CLI config (#27)`). Group related changes together so `git log` remains readable. Pull requests should describe motivation, summarize behavioral impact, link related issues or discussions, and include CLI output or screenshots when touching generated docs. Ensure CI passes (`bundle exec rake`) before requesting review, and note any follow-up work in the PR description.
data/CHANGELOG.md CHANGED
@@ -1,5 +1,11 @@
  # Changelog
  
+ ## 0.11.0 (2025-11-03)
+ - [Feature] **Transform from URL** — The `transform` command now accepts a remote URL via `--url` and processes fetched content through the standard transformer pipeline.
+ - Example: `llm-docs-builder transform --url https://example.com/docs/page.html`
+ - Applies all configured transformations and output options identically to local files
+ - By @Eric-Guo and @codex in PR #28.
+
  ## 0.10.0 (2025-10-27)
  - [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
  - **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- llm-docs-builder (0.10.0)
+ llm-docs-builder (0.11.0)
  zeitwerk (~> 2.6)
  
  GEM
@@ -12,15 +12,15 @@ GEM
  coderay (1.1.3)
  diff-lcs (1.6.2)
  docile (1.4.1)
- json (2.13.2)
+ json (2.15.2)
  language_server-protocol (3.17.0.5)
  lint_roller (1.1.0)
  method_source (1.1.0)
  parallel (1.27.0)
- parser (3.3.9.0)
+ parser (3.3.10.0)
  ast (~> 2.4.1)
  racc
- prism (1.4.0)
+ prism (1.6.0)
  pry (0.15.2)
  coderay (~> 1.1)
  method_source (~> 1.0)
@@ -29,22 +29,22 @@ GEM
  pry (>= 0.13, < 0.16)
  racc (1.8.1)
  rainbow (3.1.1)
- rake (13.3.0)
- regexp_parser (2.11.2)
- rspec (3.13.1)
+ rake (13.3.1)
+ regexp_parser (2.11.3)
+ rspec (3.13.2)
  rspec-core (~> 3.13.0)
  rspec-expectations (~> 3.13.0)
  rspec-mocks (~> 3.13.0)
- rspec-core (3.13.5)
+ rspec-core (3.13.6)
  rspec-support (~> 3.13.0)
  rspec-expectations (3.13.5)
  diff-lcs (>= 1.2.0, < 2.0)
  rspec-support (~> 3.13.0)
- rspec-mocks (3.13.5)
+ rspec-mocks (3.13.6)
  diff-lcs (>= 1.2.0, < 2.0)
  rspec-support (~> 3.13.0)
- rspec-support (3.13.5)
- rubocop (1.80.0)
+ rspec-support (3.13.6)
+ rubocop (1.81.6)
  json (~> 2.3)
  language_server-protocol (~> 3.17.0.2)
  lint_roller (~> 1.1.0)
@@ -52,10 +52,10 @@ GEM
  parser (>= 3.3.0.2)
  rainbow (>= 2.2.2, < 4.0)
  regexp_parser (>= 2.9.3, < 3.0)
- rubocop-ast (>= 1.46.0, < 2.0)
+ rubocop-ast (>= 1.47.1, < 2.0)
  ruby-progressbar (~> 1.7)
  unicode-display_width (>= 2.4.0, < 4.0)
- rubocop-ast (1.46.0)
+ rubocop-ast (1.47.1)
  parser (>= 3.3.7.2)
  prism (~> 1.4)
  ruby-progressbar (1.13.0)
@@ -65,9 +65,9 @@ GEM
  simplecov_json_formatter (~> 0.1)
  simplecov-html (0.13.2)
  simplecov_json_formatter (0.1.4)
- unicode-display_width (3.1.5)
- unicode-emoji (~> 4.0, >= 4.0.4)
- unicode-emoji (4.0.4)
+ unicode-display_width (3.2.0)
+ unicode-emoji (~> 4.1)
+ unicode-emoji (4.1.0)
  zeitwerk (2.7.3)
  
  PLATFORMS
@@ -85,4 +85,4 @@ DEPENDENCIES
  simplecov (~> 0.21)
  
  BUNDLED WITH
- 2.7.1
+ 2.7.2
data/README.md CHANGED
@@ -61,6 +61,9 @@ Factor: 2.8x smaller
  # Single file
  llm-docs-builder transform --docs README.md
  
+ # Fetch and transform a remote page
+ llm-docs-builder transform --url https://yoursite.com/docs/page.html
+
  # Bulk transform with config
  llm-docs-builder bulk-transform --config llm-docs-builder.yml
  ```
@@ -68,8 +68,9 @@ module LlmDocsBuilder
  # @param argv [Array<String>] command-line arguments
  # @return [Hash] parsed options including :command, :config, :docs, :output, :verbose
  def parse_options(argv)
+ command_token = argv.first
  options = {
- command: argv.first&.match?(/^[a-z-]+$/) ? argv.shift : nil
+ command: command_token&.match?(/\A[a-z](?:[a-z-]*[a-z])?\z/) ? argv.shift : nil
  }
  
  OptionParser.new do |opts|
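The tightened command pattern in the hunk above can be checked in isolation. This is a minimal sketch rather than code from the gem: the constant names and the `command_token?` helper are hypothetical, while both regular expressions are copied from the removed and added lines. It suggests what the change buys: the old pattern accepts option-like tokens such as `--url` as a command name, and its `^`/`$` anchors match per line rather than against the whole string.

```ruby
# Patterns copied from the diff; the constant names are illustrative only.
OLD_COMMAND_PATTERN = /^[a-z-]+$/
NEW_COMMAND_PATTERN = /\A[a-z](?:[a-z-]*[a-z])?\z/

# Hypothetical helper mirroring the `command_token&.match?(...)` check.
def command_token?(token, pattern)
  !token.nil? && token.match?(pattern)
end

# Compare both patterns on real commands and on option-like tokens.
%w[transform bulk-transform --url -u transform-].each do |token|
  old_result = command_token?(token, OLD_COMMAND_PATTERN)
  new_result = command_token?(token, NEW_COMMAND_PATTERN)
  puts format('%-15s old=%-5s new=%s', token, old_result, new_result)
end
```

Under the new pattern only tokens that start and end with a lowercase letter are shifted off `argv` as the command, so a leading `--url` or `-u` is left for `OptionParser` to handle.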
@@ -100,7 +101,7 @@ module LlmDocsBuilder
  options[:output] = path
  end
  
- opts.on('-u', '--url URL', 'URL to fetch for comparison') do |url|
+ opts.on('-u', '--url URL', 'URL to fetch for transform or comparison') do |url|
  options[:url] = url
  end
 
@@ -185,21 +186,42 @@ module LlmDocsBuilder
  config = LlmDocsBuilder::Config.new(options[:config])
  merged_options = config.merge_with_options(options)
  
- file_path = merged_options[:docs]
+ url = options[:url]
+ cli_file_path = options[:docs]
+ config_file_path = config['docs']
+ file_path = url ? cli_file_path : (cli_file_path || config_file_path)
  
- unless file_path
- puts 'File path required for transform command (use -d/--docs)'
+ if url && cli_file_path
+ puts 'Cannot use both --docs and --url for transform command'
  exit 1
  end
  
- unless File.exist?(file_path)
- puts "File not found: #{file_path}"
- exit 1
+ unless file_path
+ unless url
+ puts 'File path required for transform command (use -d/--docs)'
+ exit 1
+ end
  end
  
- puts "Transforming #{file_path}..." if merged_options[:verbose]
+ content =
+ if url
+ puts "Fetching #{url}..." if merged_options[:verbose]
+ fetcher = LlmDocsBuilder::UrlFetcher.new(verbose: merged_options[:verbose])
+ remote_content = fetcher.fetch(url)
+ puts "Transforming content from #{url}..." if merged_options[:verbose]
+ transform_options = merged_options.merge(content: remote_content, docs: nil, source_url: url)
+ LlmDocsBuilder.transform_markdown(nil, transform_options)
+ else
+ unless File.exist?(file_path)
+ puts "File not found: #{file_path}"
+ exit 1
+ end
  
- content = LlmDocsBuilder.transform_markdown(file_path, merged_options)
+ puts "Transforming #{file_path}..." if merged_options[:verbose]
+
+ merged_options[:docs] = file_path
+ LlmDocsBuilder.transform_markdown(file_path, merged_options)
+ end
  
  if merged_options[:output] && merged_options[:output] != 'llms.txt'
  File.write(merged_options[:output], content)
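The precedence rules introduced in this hunk can be summarised as a small decision function. The sketch below is an illustration under stated assumptions: `resolve_transform_source` is a hypothetical helper that does not exist in the gem, and it raises `ArgumentError` where the CLI prints a message and calls `exit 1`. The branch order follows the added lines: `--url` together with `--docs` is rejected, a URL on its own wins over any `docs` value from the config file, and otherwise the CLI path is preferred over the config path.

```ruby
# Hypothetical helper; the gem performs these checks inline in the CLI and
# exits with a message instead of raising.
def resolve_transform_source(url:, cli_docs:, config_docs:)
  raise ArgumentError, 'Cannot use both --docs and --url for transform command' if url && cli_docs
  return [:url, url] if url

  path = cli_docs || config_docs
  raise ArgumentError, 'File path required for transform command (use -d/--docs)' unless path

  [:file, path]
end

# A URL ignores the config-level docs path.
p resolve_transform_source(url: 'https://example.com/docs/page.html', cli_docs: nil, config_docs: 'docs/')
# Without a URL, the config value is the fallback for a missing --docs.
p resolve_transform_source(url: nil, cli_docs: nil, config_docs: 'README.md')
```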
@@ -1,8 +1,5 @@
  # frozen_string_literal: true
  
- require 'net/http'
- require 'uri'
-
  module LlmDocsBuilder
  # Compares content sizes between human and AI versions
  #
@@ -30,7 +27,7 @@ module LlmDocsBuilder
  AI_USER_AGENT = 'Claude-Web/1.0 (Anthropic AI Assistant)'
  
  # Maximum number of redirects to follow before raising an error
- MAX_REDIRECTS = 10
+ MAX_REDIRECTS = UrlFetcher::MAX_REDIRECTS
  
  # @return [String] URL to compare
  attr_reader :url
@@ -133,78 +130,11 @@ module LlmDocsBuilder
  # @return [String] response body
  # @raise [Errors::GenerationError] if fetch fails or too many redirects
  def fetch_url(url_string, user_agent, redirect_count = 0)
- if redirect_count >= MAX_REDIRECTS
- raise(
- Errors::GenerationError,
- "Too many redirects (#{MAX_REDIRECTS}) when fetching #{url_string}"
- )
- end
-
- uri = validate_and_parse_url(url_string)
-
- http = Net::HTTP.new(uri.host, uri.port)
- http.use_ssl = uri.scheme == 'https'
- http.open_timeout = 10
- http.read_timeout = 30
-
- request = Net::HTTP::Get.new(uri.request_uri)
- request['User-Agent'] = user_agent
-
- response = http.request(request)
-
- case response
- when Net::HTTPSuccess
- response.body
- when Net::HTTPRedirection
- # Follow redirect with incremented counter
- redirect_url = response['location']
- puts " Redirecting to #{redirect_url}..." if options[:verbose] && redirect_count.positive?
- fetch_url(redirect_url, user_agent, redirect_count + 1)
- else
- raise(
- Errors::GenerationError,
- "Failed to fetch #{url_string}: #{response.code} #{response.message}"
- )
- end
- rescue Errors::GenerationError
- raise
- rescue StandardError => e
- raise(
- Errors::GenerationError,
- "Error fetching #{url_string}: #{e.message}"
- )
- end
-
- # Validates and parses URL to prevent malformed URLs
- #
- # @param url_string [String] URL to validate and parse
- # @return [URI::HTTP, URI::HTTPS] parsed URI
- # @raise [Errors::GenerationError] if URL is invalid or uses unsupported scheme
- def validate_and_parse_url(url_string)
- uri = URI.parse(url_string)
-
- # Only allow HTTP and HTTPS schemes
- unless %w[http https].include?(uri.scheme&.downcase)
- raise(
- Errors::GenerationError,
- "Unsupported URL scheme: #{uri.scheme || 'none'} (only http/https allowed)"
- )
- end
-
- # Ensure host is present
- if uri.host.nil? || uri.host.empty?
- raise(
- Errors::GenerationError,
- "Invalid URL: missing host in #{url_string}"
- )
- end
-
- uri
- rescue URI::InvalidURIError => e
- raise(
- Errors::GenerationError,
- "Invalid URL format: #{e.message}"
+ fetcher = UrlFetcher.new(
+ user_agent: user_agent,
+ verbose: options[:verbose]
  )
+ fetcher.fetch(url_string, redirect_count)
  end
  
  # Calculate comparison statistics
@@ -57,7 +57,11 @@ module LlmDocsBuilder
  def merge_with_options(options)
  # CLI options override config file, config file provides defaults
  {
- docs: options[:docs] || self['docs'] || '.',
+ docs: if options.key?(:docs)
+ options[:docs]
+ else
+ self['docs'] || '.'
+ end,
  base_url: options[:base_url] || self['base_url'],
  title: options[:title] || self['title'],
  description: options[:description] || self['description'],
@@ -171,7 +175,10 @@ module LlmDocsBuilder
  else
  self['calculate_compression'] || false
  end
- }
+ }.tap do |merged|
+ merged[:content] = options[:content] if options.key?(:content)
+ merged[:source_url] = options[:source_url] if options.key?(:source_url)
+ end
  end
  
  # Check if a config file was found and exists
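The `docs:` change in `merge_with_options` replaces an `||` chain with an `options.key?(:docs)` check. The sketch below only illustrates the semantic difference between the two forms. `CONFIG`, `docs_with_or` and `docs_with_key_check` are hypothetical names, and the diff itself does not state the motivation, although the URL branch of the CLI does build options with an explicit `docs: nil`.

```ruby
# Hypothetical stand-in for values loaded from the YAML config file.
CONFIG = { 'docs' => 'docs/' }.freeze

# Old behaviour: a nil or missing CLI value falls through to the config.
def docs_with_or(options)
  options[:docs] || CONFIG['docs'] || '.'
end

# New behaviour: any explicitly supplied key wins, even when its value is nil.
def docs_with_key_check(options)
  options.key?(:docs) ? options[:docs] : (CONFIG['docs'] || '.')
end

p docs_with_or(docs: nil)         # explicit nil is replaced by the config value
p docs_with_key_check(docs: nil)  # explicit nil is kept
p docs_with_key_check({})         # a missing key still uses the config value
```

With the key check, a caller can deliberately signal "no file path" without the config default being substituted back in.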
@@ -55,7 +55,7 @@ module LlmDocsBuilder
  #
  # @return [String] transformed markdown content
  def transform
- content = File.read(file_path)
+ content = load_content
  
  # Build and execute transformation pipeline
  content = cleanup_transformer.transform(content, options)
@@ -124,5 +124,16 @@ module LlmDocsBuilder
  }
  compressor.compress(content, compression_methods)
  end
+
+ # Load source content either from provided string or file path
+ #
+ # @return [String] markdown content to transform
+ def load_content
+ if options[:content]
+ options[:content].dup
+ else
+ File.read(file_path)
+ end
+ end
  end
  end
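`load_content` returns `options[:content].dup` rather than the string itself. The diff does not say why, so the following is only a plausible reading: `String#dup` returns an unfrozen copy, which keeps any in-place mutation away from the caller's string. The `load_content` method below is a hypothetical reduction that takes the options hash as an argument and omits the `File.read` fallback.

```ruby
# Hypothetical reduction of the in-memory branch of `load_content`.
def load_content(options)
  options[:content].dup
end

source = 'Remote page body'.freeze
working_copy = load_content(content: source)
working_copy << ' (transformed)' # mutating the copy does not raise

puts source               # → Remote page body
puts working_copy         # → Remote page body (transformed)
puts working_copy.frozen? # → false
```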
@@ -0,0 +1,120 @@
+ # frozen_string_literal: true
+
+ require 'net/http'
+ require 'uri'
+
+ module LlmDocsBuilder
+ # Lightweight HTTP client for fetching remote documentation pages.
+ #
+ # Provides common functionality needed by multiple commands (transform, compare)
+ # including strict scheme validation, redirect handling and sensible timeouts.
+ class UrlFetcher
+ DEFAULT_USER_AGENT = 'llm-docs-builder/1.0 (+https://github.com/mensfeld/llm-docs-builder)'
+ MAX_REDIRECTS = 10
+
+ # @param user_agent [String] HTTP user agent header value
+ # @param verbose [Boolean] enable redirect logging
+ # @param output [IO] IO stream used for redirect logging
+ def initialize(user_agent: DEFAULT_USER_AGENT, verbose: false, output: $stdout)
+ @user_agent = user_agent
+ @verbose = verbose
+ @output = output
+ end
+
+ # Fetch remote URL content while following redirects.
+ #
+ # @param url_string [String] URL to fetch
+ # @param redirect_count [Integer] current redirect depth (internal use)
+ # @return [String] response body
+ # @raise [Errors::GenerationError] on invalid URLs, network failures, or redirect loops
+ def fetch(url_string, redirect_count = 0)
+ if redirect_count >= MAX_REDIRECTS
+ raise(
+ Errors::GenerationError,
+ "Too many redirects (#{MAX_REDIRECTS}) when fetching #{url_string}"
+ )
+ end
+
+ uri = validate_and_parse_url(url_string)
+
+ http = Net::HTTP.new(uri.host, uri.port)
+ http.use_ssl = uri.scheme == 'https'
+ http.open_timeout = 10
+ http.read_timeout = 30
+
+ request = Net::HTTP::Get.new(uri.request_uri)
+ request['User-Agent'] = @user_agent
+
+ response = http.request(request)
+
+ case response
+ when Net::HTTPSuccess
+ response.body
+ when Net::HTTPRedirection
+ redirect_url = absolute_redirect_url(uri, response['location'])
+ log_redirect(redirect_url)
+ fetch(redirect_url, redirect_count + 1)
+ else
+ raise(
+ Errors::GenerationError,
+ "Failed to fetch #{url_string}: #{response.code} #{response.message}"
+ )
+ end
+ rescue Errors::GenerationError
+ raise
+ rescue StandardError => e
+ raise(
+ Errors::GenerationError,
+ "Error fetching #{url_string}: #{e.message}"
+ )
+ end
+
+ private
+
+ def validate_and_parse_url(url_string)
+ uri = URI.parse(url_string)
+
+ unless %w[http https].include?(uri.scheme&.downcase)
+ raise(
+ Errors::GenerationError,
+ "Unsupported URL scheme: #{uri.scheme || 'none'} (only http/https allowed)"
+ )
+ end
+
+ if uri.host.nil? || uri.host.empty?
+ raise(
+ Errors::GenerationError,
+ "Invalid URL: missing host in #{url_string}"
+ )
+ end
+
+ uri
+ rescue URI::InvalidURIError => e
+ raise(
+ Errors::GenerationError,
+ "Invalid URL format: #{e.message}"
+ )
+ end
+
+ def absolute_redirect_url(base_uri, location)
+ raise(
+ Errors::GenerationError,
+ "Redirect missing location header for #{base_uri}"
+ ) if location.nil? || location.empty?
+
+ URI.join(base_uri, location).to_s
+ rescue URI::InvalidURIError => e
+ raise(
+ Errors::GenerationError,
+ "Invalid redirect URL from #{base_uri}: #{e.message}"
+ )
+ end
+
+ def log_redirect(url)
+ return unless @verbose
+
+ @output.puts(" Redirecting to #{url}...")
+ end
+ end
+ end
+
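One behavioural difference from the removed `fetch_url` code is that `UrlFetcher` resolves the `Location` header against the current URL with `URI.join` before following it. The removed code passed `response['location']` straight back into the fetch, where a relative value such as `/guides/start` has no scheme and would be rejected by the scheme check. The sketch below shows the resolution step on its own. `resolve_redirect` is a hypothetical helper that drops the error handling, and the URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper mirroring `absolute_redirect_url`, minus error handling.
def resolve_redirect(base_url, location)
  URI.join(base_url, location).to_s
end

base = 'https://example.com/docs/page.html'
puts resolve_redirect(base, '/guides/start')            # → https://example.com/guides/start
puts resolve_redirect(base, 'other.html')               # → https://example.com/docs/other.html
puts resolve_redirect(base, 'https://cdn.example.org/') # → https://cdn.example.org/
```

Absolute `Location` values pass through unchanged, so the same call covers both relative and absolute redirects.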
@@ -2,5 +2,5 @@
  
  module LlmDocsBuilder
  # Current version of the LlmDocsBuilder gem
- VERSION = '0.10.0'
+ VERSION = '0.11.0'
  end
@@ -25,6 +25,7 @@ module LlmDocsBuilder
  # @option options [Boolean] :convert_urls convert HTML URLs to markdown format (overrides
  # config)
  # @option options [Boolean] :verbose enable verbose output (overrides config)
+ # @option options [String] :content raw markdown content (used for remote sources)
  # @return [String] generated llms.txt content
  #
  # @example Generate from docs directory
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm-docs-builder
  version: !ruby/object:Gem::Version
- version: 0.10.0
+ version: 0.11.0
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -116,6 +116,7 @@ files:
  - ".rspec"
  - ".rubocop.yml"
  - ".ruby-version"
+ - AGENTS.md
  - CHANGELOG.md
  - Dockerfile
  - Gemfile
@@ -143,6 +144,7 @@ files:
  - lib/llm_docs_builder/transformers/heading_transformer.rb
  - lib/llm_docs_builder/transformers/link_transformer.rb
  - lib/llm_docs_builder/transformers/whitespace_transformer.rb
+ - lib/llm_docs_builder/url_fetcher.rb
  - lib/llm_docs_builder/validator.rb
  - lib/llm_docs_builder/version.rb
  - llm-docs-builder.gemspec