llm-docs-builder 0.9.4 → 0.11.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 66983a07a7271c966999350d03fbde1b1080ef0ac05a7209452cbb8720074e1b
-  data.tar.gz: 5d7cb81a700db6a43c17145af56ffcd2f6cea4a9a5182d4a2b14fd7772c8ee07
+  metadata.gz: deeae74a329018b4a43d7845a3be8b7c31347699c3ff7abd93d7b697a48982a3
+  data.tar.gz: f9c842caa93a45b4d75c45a15e116e6f98d8463e5268e2de32e498c725877e4f
 SHA512:
-  metadata.gz: 416b6f94c7e7dbac3bf3e6ae8793adcf9893d685aec29cc83665c2f9a6ab312bd422542e0074bbeec961f23740698aac6e517ebb7c87f3cb0fef3c8c6067c662
-  data.tar.gz: 2fba65092d82dbbeea60ce05317a781ed82f20367313d4eab9a69425e6e51f70ec106ca5f56218bbe255d1485752732de2a876359fde7018e706b948d51053fb
+  metadata.gz: 94575eced147bd6740b5395acd41d3f46ffcadf40908831df081d5e03f56b35a2e1e9acfdfc7642af775b2aa86fe48ea322dd11baf48512d2f2ef43a1a491079
+  data.tar.gz: 52b7d40d4a95acd20a408f4d453f32c7154637ce42ec210cf67010ce10dbe14ad711c4cbfd4060d8ce5018b91668315d552732ea7118374e69228a869792ff0f
@@ -24,7 +24,7 @@ jobs:
           fetch-depth: 0
 
       - name: Set up Ruby
-        uses: ruby/setup-ruby@ab177d40ee5483edb974554986f56b33477e21d0 # v1.265.0
+        uses: ruby/setup-ruby@4ff6f3611a42bc75eee1e5138240eb1613f48c8f # v1.266.0
         with:
           bundler-cache: false
 
data/AGENTS.md ADDED
@@ -0,0 +1,20 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+Core gem code lives in `lib/llm_docs_builder`, with single-responsibility modules such as `generator.rb`, `validator.rb`, and the CLI glue in `cli.rb`. Shared entrypoint `lib/llm_docs_builder.rb` wires dependencies. Executables reside in `bin/`: `llm-docs-builder` boots the CLI, while `rspecs` runs the full test matrix. Specs mirror library files under `spec/` with command-level coverage in `spec/integrations`. Static assets (logos, diff screenshots) are in `misc/`. Example configuration templates live at `llm-docs-builder.yml.example`.
+
+## Build, Test, and Development Commands
+- `bundle install` — sync gem dependencies defined in `Gemfile`.
+- `bundle exec rake` — default task; runs RSpec and RuboCop together.
+- `bundle exec rspec` or `bin/rspecs` — execute unit and integration specs with doc formatter.
+- `bundle exec rubocop` — enforce the Ruby style guide; mirrors CI.
+- `bin/llm-docs-builder transform --docs README.md` — smoke-test the CLI against a local file.
+
+## Coding Style & Naming Conventions
+Target Ruby 3.2 with two-space indentation and trailing newline. Prefer single-quoted strings; enable `# frozen_string_literal: true` headers on Ruby files. Keep lines ≤120 characters except where the RuboCop config allows. Use descriptive module/class names (e.g., `LlmDocsBuilder::Generator`) and predicate methods ending with `?` when returning booleans. Place supporting fixtures in `spec/support` if added, and name files after the class they extend.
+
+## Testing Guidelines
+RSpec is the sole testing framework. Name files `*_spec.rb` and align describe blocks with constant paths. Integration scenarios belong in `spec/integrations` to capture CLI behaviors. SimpleCov is enabled by default for line and branch coverage; export `SIMPLECOV=false` for quick local runs. Persist example statuses with the automatically managed `spec/examples.txt`.
+
+## Commit & Pull Request Guidelines
+Keep commit subjects short, present-tense, and focused (e.g., `Align CLI config (#27)`). Group related changes together so `git log` remains readable. Pull requests should describe motivation, summarize behavioral impact, link related issues or discussions, and include CLI output or screenshots when touching generated docs. Ensure CI passes (`bundle exec rake`) before requesting review, and note any follow-up work in the PR description.
data/CHANGELOG.md CHANGED
@@ -1,5 +1,25 @@
 # Changelog
 
+## 0.11.0 (2025-11-03)
+- [Feature] **Transform from URL** — The `transform` command now accepts a remote URL via `--url` and processes fetched content through the standard transformer pipeline.
+  - Example: `llm-docs-builder transform --url https://example.com/docs/page.html`
+  - Applies all configured transformations and output options identically to local files
+  - By @Eric-Guo and @codex in PR #28.
+
+## 0.10.0 (2025-10-27)
+- [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+  - **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+  - **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+  - **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+  - **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+  - Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+  - Empty sections are automatically omitted from output
+  - Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+  - Fixed multiline regex greedy matching issue that was capturing only one link per section
+- [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+- [Docs] Updated README with multi-section organization examples and body content usage
+- **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+
 ## 0.9.4 (2025-10-27)
 - [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
   - Adds `include_hidden: false` as default behavior
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    llm-docs-builder (0.9.4)
+    llm-docs-builder (0.11.0)
       zeitwerk (~> 2.6)
 
 GEM
@@ -12,15 +12,15 @@ GEM
     coderay (1.1.3)
     diff-lcs (1.6.2)
     docile (1.4.1)
-    json (2.13.2)
+    json (2.15.2)
     language_server-protocol (3.17.0.5)
     lint_roller (1.1.0)
     method_source (1.1.0)
     parallel (1.27.0)
-    parser (3.3.9.0)
+    parser (3.3.10.0)
       ast (~> 2.4.1)
       racc
-    prism (1.4.0)
+    prism (1.6.0)
     pry (0.15.2)
       coderay (~> 1.1)
       method_source (~> 1.0)
@@ -29,22 +29,22 @@ GEM
       pry (>= 0.13, < 0.16)
     racc (1.8.1)
     rainbow (3.1.1)
-    rake (13.3.0)
-    regexp_parser (2.11.2)
-    rspec (3.13.1)
+    rake (13.3.1)
+    regexp_parser (2.11.3)
+    rspec (3.13.2)
       rspec-core (~> 3.13.0)
       rspec-expectations (~> 3.13.0)
       rspec-mocks (~> 3.13.0)
-    rspec-core (3.13.5)
+    rspec-core (3.13.6)
       rspec-support (~> 3.13.0)
     rspec-expectations (3.13.5)
       diff-lcs (>= 1.2.0, < 2.0)
       rspec-support (~> 3.13.0)
-    rspec-mocks (3.13.5)
+    rspec-mocks (3.13.6)
       diff-lcs (>= 1.2.0, < 2.0)
       rspec-support (~> 3.13.0)
-    rspec-support (3.13.5)
-    rubocop (1.80.0)
+    rspec-support (3.13.6)
+    rubocop (1.81.6)
       json (~> 2.3)
       language_server-protocol (~> 3.17.0.2)
       lint_roller (~> 1.1.0)
@@ -52,10 +52,10 @@ GEM
       parser (>= 3.3.0.2)
       rainbow (>= 2.2.2, < 4.0)
       regexp_parser (>= 2.9.3, < 3.0)
-      rubocop-ast (>= 1.46.0, < 2.0)
+      rubocop-ast (>= 1.47.1, < 2.0)
       ruby-progressbar (~> 1.7)
       unicode-display_width (>= 2.4.0, < 4.0)
-    rubocop-ast (1.46.0)
+    rubocop-ast (1.47.1)
       parser (>= 3.3.7.2)
       prism (~> 1.4)
     ruby-progressbar (1.13.0)
@@ -65,9 +65,9 @@ GEM
       simplecov_json_formatter (~> 0.1)
     simplecov-html (0.13.2)
     simplecov_json_formatter (0.1.4)
-    unicode-display_width (3.1.5)
-      unicode-emoji (~> 4.0, >= 4.0.4)
-    unicode-emoji (4.0.4)
+    unicode-display_width (3.2.0)
+      unicode-emoji (~> 4.1)
+    unicode-emoji (4.1.0)
     zeitwerk (2.7.3)
 
 PLATFORMS
@@ -85,4 +85,4 @@ DEPENDENCIES
   simplecov (~> 0.21)
 
 BUNDLED WITH
-   2.7.1
+   2.7.2
data/README.md CHANGED
@@ -61,6 +61,9 @@ Factor: 2.8x smaller
 # Single file
 llm-docs-builder transform --docs README.md
 
+# Fetch and transform a remote page
+llm-docs-builder transform --url https://yoursite.com/docs/page.html
+
 # Bulk transform with config
 llm-docs-builder bulk-transform --config llm-docs-builder.yml
 ```
@@ -109,6 +112,7 @@ docs: ./docs
 base_url: https://myproject.io
 title: My Project
 description: Brief description
+body: Optional body content between description and sections
 output: llms.txt
 suffix: .llm
 verbose: false
@@ -289,9 +293,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab
 
 **Enhanced llms.txt (with metadata enabled):**
 ```markdown
-- [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
-- [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
-- [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+- [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+- [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+- [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
 ```
 
 **Benefits:**
@@ -309,6 +313,68 @@ include_priority: true # Show priority labels (high/medium/low)
 calculate_compression: true # Show compression ratios (slower, requires transformation)
 ```
 
+**Note:** Metadata is formatted according to the llms.txt specification, appearing within the description field using parentheses and comma separators for spec compliance.
+
+### Multi-Section Organization
+
+Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+
+**Priority-based categorization:**
+- **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+- **Examples** (priority 4-5): Tutorials and example files
+- **Optional** (priority 6-7): Advanced topics and reference documentation
+
+**Example output:**
+```markdown
+# My Project
+
+> Project description
+
+## Documentation
+
+- [README](README.md): Main documentation
+- [Getting Started](getting-started.md): Quick start guide
+
+## Examples
+
+- [Basic Tutorial](tutorial.md): Step-by-step tutorial
+- [Code Examples](examples.md): Example code
+
+## Optional
+
+- [Advanced Topics](advanced.md): Deep dive into advanced features
+- [API Reference](reference.md): Complete API reference
+```
+
+Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
+
+### Body Content
+
+Add custom body content between the description and documentation sections:
+
+```yaml
+# llm-docs-builder.yml
+title: My Project
+description: Brief description
+body: |
+  This framework is built on Ruby and focuses on performance.
+  Key concepts: streaming, batching, and parallel processing.
+docs: ./docs
+```
+
+This produces:
+```markdown
+# My Project
+
+> Brief description
+
+This framework is built on Ruby and focuses on performance.
+Key concepts: streaming, batching, and parallel processing.
+
+## Documentation
+...
+```
+
 ## Advanced Compression Options
 
 All compression features can be used individually for fine-grained control:
@@ -68,8 +68,9 @@ module LlmDocsBuilder
     # @param argv [Array<String>] command-line arguments
     # @return [Hash] parsed options including :command, :config, :docs, :output, :verbose
     def parse_options(argv)
+      command_token = argv.first
       options = {
-        command: argv.first&.match?(/^[a-z-]+$/) ? argv.shift : nil
+        command: command_token&.match?(/\A[a-z](?:[a-z-]*[a-z])?\z/) ? argv.shift : nil
       }
 
       OptionParser.new do |opts|
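The effect of the tightened command pattern is easiest to see side by side. The sketch below copies both patterns from this hunk and runs them against a few made-up tokens; the constant names, the `command_matches` helper, and the sample tokens are assumptions for illustration and are not part of the gem.

```ruby
# Patterns copied from the hunk above; everything else here is illustrative.
OLD_COMMAND = /^[a-z-]+$/
NEW_COMMAND = /\A[a-z](?:[a-z-]*[a-z])?\z/

# Report whether a token would be accepted as a command by each pattern.
def command_matches(token)
  { old: token.match?(OLD_COMMAND), new: token.match?(NEW_COMMAND) }
end

%w[transform bulk-transform --docs -d transform-].each do |token|
  puts format('%-15s %s', token, command_matches(token))
end
```

The old pattern accepts option-like tokens such as `--docs`, because a leading hyphen is inside its character class. The new one requires a letter at both ends, and `\A`/`\z` anchor to the whole string rather than to a single line.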
@@ -100,7 +101,7 @@ module LlmDocsBuilder
           options[:output] = path
         end
 
-        opts.on('-u', '--url URL', 'URL to fetch for comparison') do |url|
+        opts.on('-u', '--url URL', 'URL to fetch for transform or comparison') do |url|
           options[:url] = url
         end
 
@@ -185,21 +186,42 @@ module LlmDocsBuilder
       config = LlmDocsBuilder::Config.new(options[:config])
       merged_options = config.merge_with_options(options)
 
-      file_path = merged_options[:docs]
+      url = options[:url]
+      cli_file_path = options[:docs]
+      config_file_path = config['docs']
+      file_path = url ? cli_file_path : (cli_file_path || config_file_path)
 
-      unless file_path
-        puts 'File path required for transform command (use -d/--docs)'
+      if url && cli_file_path
+        puts 'Cannot use both --docs and --url for transform command'
         exit 1
       end
 
-      unless File.exist?(file_path)
-        puts "File not found: #{file_path}"
-        exit 1
+      unless file_path
+        unless url
+          puts 'File path required for transform command (use -d/--docs)'
+          exit 1
+        end
       end
 
-      puts "Transforming #{file_path}..." if merged_options[:verbose]
+      content =
+        if url
+          puts "Fetching #{url}..." if merged_options[:verbose]
+          fetcher = LlmDocsBuilder::UrlFetcher.new(verbose: merged_options[:verbose])
+          remote_content = fetcher.fetch(url)
+          puts "Transforming content from #{url}..." if merged_options[:verbose]
+          transform_options = merged_options.merge(content: remote_content, docs: nil, source_url: url)
+          LlmDocsBuilder.transform_markdown(nil, transform_options)
+        else
+          unless File.exist?(file_path)
+            puts "File not found: #{file_path}"
+            exit 1
+          end
 
-      content = LlmDocsBuilder.transform_markdown(file_path, merged_options)
+          puts "Transforming #{file_path}..." if merged_options[:verbose]
+
+          merged_options[:docs] = file_path
+          LlmDocsBuilder.transform_markdown(file_path, merged_options)
+        end
 
       if merged_options[:output] && merged_options[:output] != 'llms.txt'
         File.write(merged_options[:output], content)
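The branch order in this hunk decides which source wins when `--url`, `--docs`, and a config `docs` entry are combined. As a rough sketch, the same decisions can be written as a pure function. `resolve_transform_source` is a hypothetical helper, not something the gem defines, and it returns a tagged pair instead of calling `exit` so it can be checked in isolation.

```ruby
# Hypothetical helper mirroring the branch order of the transform hunk above.
# It returns [:url, value], [:file, value], or [:error, message].
def resolve_transform_source(url:, cli_docs:, config_docs:)
  return [:error, 'Cannot use both --docs and --url for transform command'] if url && cli_docs
  return [:url, url] if url

  # Without a URL, the CLI path takes precedence over the config file's docs entry.
  file_path = cli_docs || config_docs
  return [:error, 'File path required for transform command (use -d/--docs)'] unless file_path

  [:file, file_path]
end

p resolve_transform_source(url: 'https://example.com/page.html', cli_docs: nil, config_docs: './docs')
p resolve_transform_source(url: nil, cli_docs: nil, config_docs: './docs')
p resolve_transform_source(url: 'https://example.com/page.html', cli_docs: 'README.md', config_docs: nil)
```

Note that a `docs` value from the config file does not conflict with `--url`; only an explicit `--docs` on the command line does.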
@@ -1,8 +1,5 @@
 # frozen_string_literal: true
 
-require 'net/http'
-require 'uri'
-
 module LlmDocsBuilder
   # Compares content sizes between human and AI versions
   #
@@ -30,7 +27,7 @@ module LlmDocsBuilder
     AI_USER_AGENT = 'Claude-Web/1.0 (Anthropic AI Assistant)'
 
     # Maximum number of redirects to follow before raising an error
-    MAX_REDIRECTS = 10
+    MAX_REDIRECTS = UrlFetcher::MAX_REDIRECTS
 
     # @return [String] URL to compare
     attr_reader :url
@@ -133,78 +130,11 @@ module LlmDocsBuilder
     # @return [String] response body
     # @raise [Errors::GenerationError] if fetch fails or too many redirects
     def fetch_url(url_string, user_agent, redirect_count = 0)
-      if redirect_count >= MAX_REDIRECTS
-        raise(
-          Errors::GenerationError,
-          "Too many redirects (#{MAX_REDIRECTS}) when fetching #{url_string}"
-        )
-      end
-
-      uri = validate_and_parse_url(url_string)
-
-      http = Net::HTTP.new(uri.host, uri.port)
-      http.use_ssl = uri.scheme == 'https'
-      http.open_timeout = 10
-      http.read_timeout = 30
-
-      request = Net::HTTP::Get.new(uri.request_uri)
-      request['User-Agent'] = user_agent
-
-      response = http.request(request)
-
-      case response
-      when Net::HTTPSuccess
-        response.body
-      when Net::HTTPRedirection
-        # Follow redirect with incremented counter
-        redirect_url = response['location']
-        puts " Redirecting to #{redirect_url}..." if options[:verbose] && redirect_count.positive?
-        fetch_url(redirect_url, user_agent, redirect_count + 1)
-      else
-        raise(
-          Errors::GenerationError,
-          "Failed to fetch #{url_string}: #{response.code} #{response.message}"
-        )
-      end
-    rescue Errors::GenerationError
-      raise
-    rescue StandardError => e
-      raise(
-        Errors::GenerationError,
-        "Error fetching #{url_string}: #{e.message}"
-      )
-    end
-
-    # Validates and parses URL to prevent malformed URLs
-    #
-    # @param url_string [String] URL to validate and parse
-    # @return [URI::HTTP, URI::HTTPS] parsed URI
-    # @raise [Errors::GenerationError] if URL is invalid or uses unsupported scheme
-    def validate_and_parse_url(url_string)
-      uri = URI.parse(url_string)
-
-      # Only allow HTTP and HTTPS schemes
-      unless %w[http https].include?(uri.scheme&.downcase)
-        raise(
-          Errors::GenerationError,
-          "Unsupported URL scheme: #{uri.scheme || 'none'} (only http/https allowed)"
-        )
-      end
-
-      # Ensure host is present
-      if uri.host.nil? || uri.host.empty?
-        raise(
-          Errors::GenerationError,
-          "Invalid URL: missing host in #{url_string}"
-        )
-      end
-
-      uri
-    rescue URI::InvalidURIError => e
-      raise(
-        Errors::GenerationError,
-        "Invalid URL format: #{e.message}"
+      fetcher = UrlFetcher.new(
+        user_agent: user_agent,
+        verbose: options[:verbose]
       )
+      fetcher.fetch(url_string, redirect_count)
     end
 
     # Calculate comparison statistics
@@ -57,10 +57,15 @@ module LlmDocsBuilder
     def merge_with_options(options)
       # CLI options override config file, config file provides defaults
       {
-        docs: options[:docs] || self['docs'] || '.',
+        docs: if options.key?(:docs)
+                options[:docs]
+              else
+                self['docs'] || '.'
+              end,
         base_url: options[:base_url] || self['base_url'],
         title: options[:title] || self['title'],
         description: options[:description] || self['description'],
+        body: options[:body] || self['body'],
         output: options[:output] || self['output'] || 'llms.txt',
         convert_urls: if options.key?(:convert_urls)
                         options[:convert_urls]
@@ -170,7 +175,10 @@ module LlmDocsBuilder
                                else
                                  self['calculate_compression'] || false
                                end
-      }
+      }.tap do |merged|
+        merged[:content] = options[:content] if options.key?(:content)
+        merged[:source_url] = options[:source_url] if options.key?(:source_url)
+      end
     end
 
     # Check if a config file was found and exists
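The move from `||` to `options.key?(:docs)` changes what happens when a caller passes `docs: nil` on purpose, presumably so that the `docs: nil` set for URL sources in the CLI is not replaced by the config default. The sketch below isolates the two merge styles. Both helpers are hypothetical stand-ins, and `config_docs` plays the role of `self['docs']`.

```ruby
# Hypothetical stand-ins for the old and new merge styles in the hunk above.
def docs_with_or(options, config_docs)
  options[:docs] || config_docs || '.'
end

def docs_with_key_check(options, config_docs)
  options.key?(:docs) ? options[:docs] : (config_docs || '.')
end

explicit_nil = { docs: nil }
p docs_with_or(explicit_nil, './docs')         # falls back to the config value
p docs_with_key_check(explicit_nil, './docs')  # keeps the explicit nil
p docs_with_key_check({}, './docs')            # key absent, so the config value is used
```

With `||`, an explicit `nil` cannot be told apart from a missing key; `key?` keeps that distinction.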
@@ -210,7 +210,11 @@ module LlmDocsBuilder
 
     # Constructs llms.txt content from analyzed documentation files
     #
-    # Combines title, description, and documentation links into formatted output
+    # Combines title, description, body content, and documentation links into formatted output.
+    # Organizes documents into sections based on priority:
+    # - Priority 1-3: Documentation (essential docs like README, getting started)
+    # - Priority 4-5: Examples (tutorials, example files)
+    # - Priority 6-7: Optional (advanced topics, reference docs)
     #
     # @param docs [Array<Hash>] analyzed file metadata
     # @return [String] formatted llms.txt content
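The priority bands documented in this comment, together with the entry format `- [title](url): description (metadata)` from the 0.10.0 changelog entry, can be sketched in a few lines. This is a standalone approximation, not the gem's generator: `section_for`, `format_entry`, `SECTION_ORDER`, and the sample documents are all hypothetical, and integer priorities are assumed.

```ruby
# Standalone sketch; helper names and sample data are hypothetical.
SECTION_ORDER = %w[Documentation Examples Optional].freeze

# Map an integer priority to its section (1-3, 4-5, 6 and above).
def section_for(priority)
  if priority <= 3
    'Documentation'
  elsif priority <= 5
    'Examples'
  else
    'Optional'
  end
end

# Build one list entry; metadata is an already formatted "(k:v, ...)" string.
def format_entry(doc, metadata: nil)
  base = "- [#{doc[:title]}](#{doc[:url]})"
  has_description = doc[:description] && !doc[:description].empty?
  base = "#{base}: #{doc[:description]}" if has_description
  return base unless metadata

  # Entries without a description still get a colon before the metadata.
  has_description ? "#{base} #{metadata}" : "#{base}: #{metadata}"
end

docs = [
  { title: 'Reference', url: 'reference.md', description: 'API reference', priority: 7 },
  { title: 'README', url: 'README.md', description: 'Main documentation', priority: 1 },
  { title: 'Tutorial', url: 'tutorial.md', description: '', priority: 4 }
]

grouped = docs.group_by { |doc| section_for(doc[:priority]) }
SECTION_ORDER.each do |section|
  entries = grouped.fetch(section, [])
  next if entries.empty? # empty sections are omitted

  puts "## #{section}", ''
  entries.each { |doc| puts format_entry(doc, metadata: '(tokens:450, priority:high)') }
  puts
end
```

Iterating over a fixed section order keeps `Documentation` first regardless of input order. The colon before metadata on description-less entries follows the generator change in this diff, which joins the two with `": "` even though its inline comment shows the form without a colon.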
@@ -224,31 +228,60 @@ module LlmDocsBuilder
       content << "> #{description}" if description
       content << ''
 
-      if docs.any?
-        content << '## Documentation'
+      # Add optional body content
+      if options[:body] && !options[:body].empty?
+        content << options[:body]
         content << ''
+      end
 
-        docs.each do |doc|
-          url = build_url(doc[:path])
-          line = if doc[:description] && !doc[:description].empty?
-                   "- [#{doc[:title]}](#{url}): #{doc[:description]}"
-                 else
-                   "- [#{doc[:title]}](#{url})"
-                 end
-
-          # Append metadata if enabled
-          if options[:include_metadata]
-            metadata_parts = []
-            metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
-            metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
-            metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
-            metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
-
-            line += " #{metadata_parts.join(' ')}" unless metadata_parts.empty?
+      if docs.any?
+        # Categorize docs by priority into sections
+        sections = {
+          'Documentation' => docs.select { |d| d[:priority] <= 3 },
+          'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+          'Optional' => docs.select { |d| d[:priority] >= 6 }
+        }
+
+        # Build each section (skip empty ones)
+        sections.each do |section_name, section_docs|
+          next if section_docs.empty?
+
+          content << "## #{section_name}"
+          content << ''
+
+          section_docs.each do |doc|
+            url = build_url(doc[:path])
+
+            # Build metadata string if enabled
+            metadata_str = nil
+            if options[:include_metadata]
+              metadata_parts = []
+              metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+              metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+              metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+              metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+
+              metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+            end
+
+            # Build line according to spec: - [title](url): description (metadata)
+            line = if doc[:description] && !doc[:description].empty?
+                     base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+                     metadata_str ? "#{base} #{metadata_str}" : base
+                   else
+                     # No description: - [title](url) (metadata)
+                     base = "- [#{doc[:title]}](#{url})"
+                     metadata_str ? "#{base}: #{metadata_str}" : base
+                   end
+
+            content << line
           end
 
-          content << line
+          content << ''
         end
+
+        # Remove trailing empty line
+        content.pop if content.last == ''
       end
 
       "#{content.join("\n")}\n"
@@ -55,7 +55,7 @@ module LlmDocsBuilder
     #
     # @return [String] transformed markdown content
     def transform
-      content = File.read(file_path)
+      content = load_content
 
       # Build and execute transformation pipeline
       content = cleanup_transformer.transform(content, options)
@@ -124,5 +124,16 @@ module LlmDocsBuilder
       }
       compressor.compress(content, compression_methods)
     end
+
+    # Load source content either from provided string or file path
+    #
+    # @return [String] markdown content to transform
+    def load_content
+      if options[:content]
+        options[:content].dup
+      else
+        File.read(file_path)
+      end
+    end
   end
 end
@@ -85,7 +85,7 @@ module LlmDocsBuilder
 
     # Extracts markdown links from section content into structured format
     #
-    # Scans for markdown list items with links and descriptions. Returns raw content
+    # Scans for markdown list items with links and optional descriptions. Returns raw content
     # if no links are found in the expected format.
     #
     # @param content [String] raw section content
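The two parser patterns quoted in the 0.10.0 changelog entry can be compared on a small sample to see both fixes at once: optional descriptions, and no matching across lines. The patterns below are copied from this diff; the constant names, the `extract_links` helper, and the sample section are assumptions made for the sketch.

```ruby
# Patterns copied from this diff; names and sample text are illustrative.
OLD_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m
NEW_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

SECTION = <<~MARKDOWN
  - [README](README.md): Main documentation
  - [Changelog](CHANGELOG.md)
  - [Guide](guide.md): Quick start guide
MARKDOWN

# Scan a section and return one hash per matched link.
def extract_links(content, pattern)
  content.scan(pattern).map do |title, url, description|
    { title: title, url: url, description: description&.strip || '' }
  end
end

puts "old pattern: #{extract_links(SECTION, OLD_LINK).size} link(s)"
puts "new pattern: #{extract_links(SECTION, NEW_LINK).size} link(s)"
```

Under the `/m` flag, `.` also matches newlines, so the greedy `(.*)` in the old pattern swallows the rest of the section into the first description. The new pattern stops each description at the end of its line and lets the description-less `Changelog` entry match with an empty description.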
@@ -93,11 +93,13 @@ module LlmDocsBuilder
     def parse_section_content(content)
       links = []
 
-      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m) do |title, url, description|
+      # Updated regex: description is optional (non-capturing group with ?)
+      # Use [^\n]* instead of .* to avoid matching across lines
+      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
         links << {
           title: title,
           url: url,
-          description: description.strip
+          description: description&.strip || ''
         }
       end
 
@@ -0,0 +1,120 @@
+# frozen_string_literal: true
+
+require 'net/http'
+require 'uri'
+
+module LlmDocsBuilder
+  # Lightweight HTTP client for fetching remote documentation pages.
+  #
+  # Provides common functionality needed by multiple commands (transform, compare)
+  # including strict scheme validation, redirect handling and sensible timeouts.
+  class UrlFetcher
+    DEFAULT_USER_AGENT = 'llm-docs-builder/1.0 (+https://github.com/mensfeld/llm-docs-builder)'
+    MAX_REDIRECTS = 10
+
+    # @param user_agent [String] HTTP user agent header value
+    # @param verbose [Boolean] enable redirect logging
+    # @param output [IO] IO stream used for redirect logging
+    def initialize(user_agent: DEFAULT_USER_AGENT, verbose: false, output: $stdout)
+      @user_agent = user_agent
+      @verbose = verbose
+      @output = output
+    end
+
+    # Fetch remote URL content while following redirects.
+    #
+    # @param url_string [String] URL to fetch
+    # @param redirect_count [Integer] current redirect depth (internal use)
+    # @return [String] response body
+    # @raise [Errors::GenerationError] on invalid URLs, network failures, or redirect loops
+    def fetch(url_string, redirect_count = 0)
+      if redirect_count >= MAX_REDIRECTS
+        raise(
+          Errors::GenerationError,
+          "Too many redirects (#{MAX_REDIRECTS}) when fetching #{url_string}"
+        )
+      end
+
+      uri = validate_and_parse_url(url_string)
+
+      http = Net::HTTP.new(uri.host, uri.port)
+      http.use_ssl = uri.scheme == 'https'
+      http.open_timeout = 10
+      http.read_timeout = 30
+
+      request = Net::HTTP::Get.new(uri.request_uri)
+      request['User-Agent'] = @user_agent
+
+      response = http.request(request)
+
+      case response
+      when Net::HTTPSuccess
+        response.body
+      when Net::HTTPRedirection
+        redirect_url = absolute_redirect_url(uri, response['location'])
+        log_redirect(redirect_url)
+        fetch(redirect_url, redirect_count + 1)
+      else
+        raise(
+          Errors::GenerationError,
+          "Failed to fetch #{url_string}: #{response.code} #{response.message}"
+        )
+      end
+    rescue Errors::GenerationError
+      raise
+    rescue StandardError => e
+      raise(
+        Errors::GenerationError,
+        "Error fetching #{url_string}: #{e.message}"
+      )
+    end
+
+    private
+
+    def validate_and_parse_url(url_string)
+      uri = URI.parse(url_string)
+
+      unless %w[http https].include?(uri.scheme&.downcase)
+        raise(
+          Errors::GenerationError,
+          "Unsupported URL scheme: #{uri.scheme || 'none'} (only http/https allowed)"
+        )
+      end
+
+      if uri.host.nil? || uri.host.empty?
+        raise(
+          Errors::GenerationError,
+          "Invalid URL: missing host in #{url_string}"
+        )
+      end
+
+      uri
+    rescue URI::InvalidURIError => e
+      raise(
+        Errors::GenerationError,
+        "Invalid URL format: #{e.message}"
+      )
+    end
+
+    def absolute_redirect_url(base_uri, location)
+      raise(
+        Errors::GenerationError,
+        "Redirect missing location header for #{base_uri}"
+      ) if location.nil? || location.empty?
+
+      URI.join(base_uri, location).to_s
+    rescue URI::InvalidURIError => e
+      raise(
+        Errors::GenerationError,
+        "Invalid redirect URL from #{base_uri}: #{e.message}"
+      )
+    end
+
+    def log_redirect(url)
+      return unless @verbose
+
+      @output.puts(" Redirecting to #{url}...")
+    end
+  end
+end
+
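One behavioral difference from the code removed from the comparator is that `Location` headers are now resolved against the requesting URL with `URI.join`, so relative redirects work. The sketch below shows that resolution and a predicate form of the scheme and host checks, using only Ruby's standard `uri` library. `resolve_redirect` and `fetchable?` are hypothetical names, the URLs are placeholders, and no network request is made.

```ruby
require 'uri'

# Resolve a Location header against the URL that produced it, as
# absolute_redirect_url does via URI.join. Hypothetical helper name.
def resolve_redirect(base_url, location)
  URI.join(base_url, location).to_s
end

# Hypothetical predicate form of the scheme and host checks; the real
# method raises Errors::GenerationError instead of returning false.
def fetchable?(url_string)
  uri = URI.parse(url_string)
  %w[http https].include?(uri.scheme&.downcase) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts resolve_redirect('https://example.com/docs/old.html', '/docs/new.html')
puts resolve_redirect('https://example.com/docs/old.html', 'new.html')
puts resolve_redirect('https://example.com/docs/old.html', 'https://cdn.example.com/new.html')
puts fetchable?('https://example.com/docs')
puts fetchable?('ftp://example.com/docs')
puts fetchable?('file:///etc/passwd')
```

Because each redirect target goes back through `fetch`, the scheme check also applies to every hop, not only to the URL the user supplied.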
@@ -2,5 +2,5 @@
 
 module LlmDocsBuilder
   # Current version of the LlmDocsBuilder gem
-  VERSION = '0.9.4'
+  VERSION = '0.11.0'
 end
@@ -25,6 +25,7 @@ module LlmDocsBuilder
     # @option options [Boolean] :convert_urls convert HTML URLs to markdown format (overrides
     # config)
     # @option options [Boolean] :verbose enable verbose output (overrides config)
+    # @option options [String] :content raw markdown content (used for remote sources)
     # @return [String] generated llms.txt content
     #
     # @example Generate from docs directory
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm-docs-builder
 version: !ruby/object:Gem::Version
-  version: 0.9.4
+  version: 0.11.0
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -116,6 +116,7 @@ files:
 - ".rspec"
 - ".rubocop.yml"
 - ".ruby-version"
+- AGENTS.md
 - CHANGELOG.md
 - Dockerfile
 - Gemfile
@@ -143,6 +144,7 @@ files:
 - lib/llm_docs_builder/transformers/heading_transformer.rb
 - lib/llm_docs_builder/transformers/link_transformer.rb
 - lib/llm_docs_builder/transformers/whitespace_transformer.rb
+- lib/llm_docs_builder/url_fetcher.rb
 - lib/llm_docs_builder/validator.rb
 - lib/llm_docs_builder/version.rb
 - llm-docs-builder.gemspec