llm-docs-builder 0.9.4 → 0.10.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 66983a07a7271c966999350d03fbde1b1080ef0ac05a7209452cbb8720074e1b
-   data.tar.gz: 5d7cb81a700db6a43c17145af56ffcd2f6cea4a9a5182d4a2b14fd7772c8ee07
+   metadata.gz: e6407836216f436b728247009f614ea4ea5c2b4de0edf855717129460df4b309
+   data.tar.gz: 8a556fc0b6307529f5c615c05b082521bd021e9e9f34ca30c8bb21d22b21deb2
  SHA512:
-   metadata.gz: 416b6f94c7e7dbac3bf3e6ae8793adcf9893d685aec29cc83665c2f9a6ab312bd422542e0074bbeec961f23740698aac6e517ebb7c87f3cb0fef3c8c6067c662
-   data.tar.gz: 2fba65092d82dbbeea60ce05317a781ed82f20367313d4eab9a69425e6e51f70ec106ca5f56218bbe255d1485752732de2a876359fde7018e706b948d51053fb
+   metadata.gz: 3a0b657545415c35187fa1595f3fcc5bd5c27a0c1a292bf00635d3c6f14221ac0a254f008f83acc1f7351ef76c6649e6b0540ac585cd64a670b05ef725b0308e
+   data.tar.gz: 68e95142e374ebae3c292db724a9163c396fd423eaa04983bbdaefd6f6cbb0bc72822dfbc5af9fd8f357f1545320db70bf5a6c8a2413fc5f796e73195adcdd13
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
  # Changelog

+ ## 0.10.0 (2025-10-27)
+ - [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+ - **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+ - **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+ - **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+ - **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+ - Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+ - Empty sections are automatically omitted from output
+ - Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+ - Fixed multiline regex greedy matching issue that was capturing only one link per section
+ - [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+ - [Docs] Updated README with multi-section organization examples and body content usage
+ - **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+
  ## 0.9.4 (2025-10-27)
  - [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
  - Adds `include_hidden: false` as default behavior
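Note on the breaking change in 0.10.0: anything that reads the old space-separated metadata suffix will need updating. As a minimal sketch (not part of the gem), a consumer could split the new parenthesized suffix off a description roughly as follows. `split_metadata` is a hypothetical helper name, and the sketch assumes the metadata is always the final parenthesized group and that every entry has the `key:value` shape shown in the changelog.

```ruby
# Hypothetical helper, not part of llm-docs-builder.
# Splits "text (k:v, k:v)" into ["text", { "k" => "v", ... }].
def split_metadata(description)
  # Assumption: metadata is the last parenthesized group in the description.
  match = description.match(/\s*\(([^()]*)\)\z/)
  return [description, {}] unless match

  pairs = match[1].split(',').map { |part| part.strip.split(':', 2) }
  # Treat the parentheses as ordinary prose unless every entry is key:value.
  return [description, {}] if pairs.empty? || pairs.any? { |pair| pair.length != 2 }

  [match.pre_match, pairs.to_h]
end

text, meta = split_metadata('Quick start guide (tokens:450, updated:2025-10-13, priority:high)')
puts text
puts meta.inspect
```

Values are left as strings here; converting `tokens` to an integer or `updated` to a date is a choice for the consumer.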
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     llm-docs-builder (0.9.4)
+     llm-docs-builder (0.10.0)
        zeitwerk (~> 2.6)

  GEM
data/README.md CHANGED
@@ -109,6 +109,7 @@ docs: ./docs
  base_url: https://myproject.io
  title: My Project
  description: Brief description
+ body: Optional body content between description and sections
  output: llms.txt
  suffix: .llm
  verbose: false
@@ -289,9 +290,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab

  **Enhanced llms.txt (with metadata enabled):**
  ```markdown
- - [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
- - [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
- - [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+ - [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+ - [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+ - [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
  ```

  **Benefits:**
@@ -309,6 +310,68 @@ include_priority: true # Show priority labels (high/medium/low)
  calculate_compression: true # Show compression ratios (slower, requires transformation)
  ```

+ **Note:** Metadata is formatted according to the llms.txt specification, appearing within the description field using parentheses and comma separators for spec compliance.
+
+ ### Multi-Section Organization
+
+ Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+
+ **Priority-based categorization:**
+ - **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+ - **Examples** (priority 4-5): Tutorials and example files
+ - **Optional** (priority 6-7): Advanced topics and reference documentation
+
+ **Example output:**
+ ```markdown
+ # My Project
+
+ > Project description
+
+ ## Documentation
+
+ - [README](README.md): Main documentation
+ - [Getting Started](getting-started.md): Quick start guide
+
+ ## Examples
+
+ - [Basic Tutorial](tutorial.md): Step-by-step tutorial
+ - [Code Examples](examples.md): Example code
+
+ ## Optional
+
+ - [Advanced Topics](advanced.md): Deep dive into advanced features
+ - [API Reference](reference.md): Complete API reference
+ ```
+
+ Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
+
+ ### Body Content
+
+ Add custom body content between the description and documentation sections:
+
+ ```yaml
+ # llm-docs-builder.yml
+ title: My Project
+ description: Brief description
+ body: |
+   This framework is built on Ruby and focuses on performance.
+   Key concepts: streaming, batching, and parallel processing.
+ docs: ./docs
+ ```
+
+ This produces:
+ ```markdown
+ # My Project
+
+ > Brief description
+
+ This framework is built on Ruby and focuses on performance.
+ Key concepts: streaming, batching, and parallel processing.
+
+ ## Documentation
+ ...
+ ```
+
  ## Advanced Compression Options

  All compression features can be used individually for fine-grained control:
@@ -61,6 +61,7 @@ module LlmDocsBuilder
  base_url: options[:base_url] || self['base_url'],
  title: options[:title] || self['title'],
  description: options[:description] || self['description'],
+ body: options[:body] || self['body'],
  output: options[:output] || self['output'] || 'llms.txt',
  convert_urls: if options.key?(:convert_urls)
                  options[:convert_urls]
@@ -210,7 +210,11 @@ module LlmDocsBuilder

  # Constructs llms.txt content from analyzed documentation files
  #
- # Combines title, description, and documentation links into formatted output
+ # Combines title, description, body content, and documentation links into formatted output.
+ # Organizes documents into sections based on priority:
+ # - Priority 1-3: Documentation (essential docs like README, getting started)
+ # - Priority 4-5: Examples (tutorials, example files)
+ # - Priority 6-7: Optional (advanced topics, reference docs)
  #
  # @param docs [Array<Hash>] analyzed file metadata
  # @return [String] formatted llms.txt content
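The priority rule in the comment above can be tried in isolation. The following is a rough sketch under stated assumptions, not the gem's code: `categorize` is a hypothetical helper name, and it uses the comparisons `<= 3`, `4..5`, and `>= 6`, so values outside 1-7 still land in the first or last bucket. A doc whose `:priority` is not numeric would raise in this sketch.

```ruby
# Hypothetical standalone helper, not part of llm-docs-builder.
# Buckets docs by priority and drops sections that end up empty.
def categorize(docs)
  {
    'Documentation' => docs.select { |d| d[:priority] <= 3 },
    'Examples' => docs.select { |d| d[:priority].between?(4, 5) },
    'Optional' => docs.select { |d| d[:priority] >= 6 }
  }.reject { |_name, section_docs| section_docs.empty? }
end

docs = [
  { title: 'README', priority: 1 },
  { title: 'Getting Started', priority: 2 },
  { title: 'Out of range', priority: 9 } # falls through to Optional
]
sections = categorize(docs)
sections.each { |name, items| puts "#{name}: #{items.map { |d| d[:title] }.join(', ')}" }
```

With no priority 4-5 documents in the input, the `Examples` section is absent from the result, which mirrors the "empty sections are omitted" behavior.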
@@ -224,31 +228,60 @@ module LlmDocsBuilder
  content << "> #{description}" if description
  content << ''

- if docs.any?
-   content << '## Documentation'
+ # Add optional body content
+ if options[:body] && !options[:body].empty?
+   content << options[:body]
    content << ''
+ end

-   docs.each do |doc|
-     url = build_url(doc[:path])
-     line = if doc[:description] && !doc[:description].empty?
-              "- [#{doc[:title]}](#{url}): #{doc[:description]}"
-            else
-              "- [#{doc[:title]}](#{url})"
-            end
-
-     # Append metadata if enabled
-     if options[:include_metadata]
-       metadata_parts = []
-       metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
-       metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
-       metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
-       metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
-
-       line += " #{metadata_parts.join(' ')}" unless metadata_parts.empty?
+ if docs.any?
+   # Categorize docs by priority into sections
+   sections = {
+     'Documentation' => docs.select { |d| d[:priority] <= 3 },
+     'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+     'Optional' => docs.select { |d| d[:priority] >= 6 }
+   }
+
+   # Build each section (skip empty ones)
+   sections.each do |section_name, section_docs|
+     next if section_docs.empty?
+
+     content << "## #{section_name}"
+     content << ''
+
+     section_docs.each do |doc|
+       url = build_url(doc[:path])
+
+       # Build metadata string if enabled
+       metadata_str = nil
+       if options[:include_metadata]
+         metadata_parts = []
+         metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+         metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+         metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+         metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+
+         metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+       end
+
+       # Build line according to spec: - [title](url): description (metadata)
+       line = if doc[:description] && !doc[:description].empty?
+                base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+                metadata_str ? "#{base} #{metadata_str}" : base
+              else
+                # No description: - [title](url) (metadata)
+                base = "- [#{doc[:title]}](#{url})"
+                metadata_str ? "#{base}: #{metadata_str}" : base
+              end
+
+       content << line
      end

-     content << line
+     content << ''
    end
+
+   # Remove trailing empty line
+   content.pop if content.last == ''
  end

  "#{content.join("\n")}\n"
@@ -85,7 +85,7 @@ module LlmDocsBuilder

  # Extracts markdown links from section content into structured format
  #
- # Scans for markdown list items with links and descriptions. Returns raw content
+ # Scans for markdown list items with links and optional descriptions. Returns raw content
  # if no links are found in the expected format.
  #
  # @param content [String] raw section content
@@ -93,11 +93,13 @@ module LlmDocsBuilder
  def parse_section_content(content)
    links = []

-   content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m) do |title, url, description|
+   # Updated regex: description is optional (non-capturing group with ?)
+   # Use [^\n]* instead of .* to avoid matching across lines
+   content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
      links << {
        title: title,
        url: url,
-       description: description.strip
+       description: description&.strip || ''
      }
    end

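To see why the pattern change matters, the two regexes from this hunk can be run side by side on a small section. This is an illustrative sketch only: `extract_links`, `OLD_LINK`, and `NEW_LINK` are hypothetical names, and the sample section is made up. In Ruby the `/m` flag lets `.` match newlines, so the old greedy `(.*)` runs to the end of the section and swallows every later link into the first description.

```ruby
# Patterns copied from the diff above; the constant names are illustrative.
OLD_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m
NEW_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

# Hypothetical stand-in for the parser loop, parameterized by pattern.
def extract_links(content, pattern)
  content.scan(pattern).map do |title, url, description|
    { title: title, url: url, description: description&.strip || '' }
  end
end

section = <<~MD
  - [Getting Started](getting-started.md): Quick start guide
  - [API Reference](reference.md)
MD

old_links = extract_links(section, OLD_LINK)
new_links = extract_links(section, NEW_LINK)

puts "old pattern found #{old_links.length} link(s)"
puts "new pattern found #{new_links.length} link(s)"
```

With the old pattern the second list item ends up inside the first link's description; with the new one both links are returned and the description-less link gets an empty string.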
@@ -2,5 +2,5 @@

  module LlmDocsBuilder
    # Current version of the LlmDocsBuilder gem
-   VERSION = '0.9.4'
+   VERSION = '0.10.0'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm-docs-builder
  version: !ruby/object:Gem::Version
-   version: 0.9.4
+   version: 0.10.0
  platform: ruby
  authors:
  - Maciej Mensfeld