llm-docs-builder 0.9.3 → 0.10.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 75429a83cfd019e7059f76a5fe7df200bff3bc14066c5f83e0739c85a5a68b63
-   data.tar.gz: 850d0568023dd1602cad0b48af5e6437b61f34f760c77b99adb565056313d241
+   metadata.gz: e6407836216f436b728247009f614ea4ea5c2b4de0edf855717129460df4b309
+   data.tar.gz: 8a556fc0b6307529f5c615c05b082521bd021e9e9f34ca30c8bb21d22b21deb2
  SHA512:
-   metadata.gz: f0384e50c10837ec00e4d195115882d4fdaade5e522bba02929b028ac315d686324baf189bfc16803120b77718b28c10f2026e10f2f2bd3dd1ea7aab09f36549
-   data.tar.gz: 184e427780364d704067b1f84962fa7c658782ff2e048c991b34c06ca2643029d107872df1a0beac92cfb07076ae72cf9289527d534cad6274a99fdeaef22ac0
+   metadata.gz: 3a0b657545415c35187fa1595f3fcc5bd5c27a0c1a292bf00635d3c6f14221ac0a254f008f83acc1f7351ef76c6649e6b0540ac585cd64a670b05ef725b0308e
+   data.tar.gz: 68e95142e374ebae3c292db724a9163c396fd423eaa04983bbdaefd6f6cbb0bc72822dfbc5af9fd8f357f1545320db70bf5a6c8a2413fc5f796e73195adcdd13
data/CHANGELOG.md CHANGED
@@ -1,5 +1,34 @@
  # Changelog
 
+ ## 0.10.0 (2025-10-27)
+ - [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+ - **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+ - **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+ - **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+ - **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+ - Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+ - Empty sections are automatically omitted from output
+ - Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+ - Fixed multiline regex greedy matching issue that was capturing only one link per section
+ - [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+ - [Docs] Updated README with multi-section organization examples and body content usage
+ - **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+
+ ## 0.9.4 (2025-10-27)
+ - [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
+ - Adds `include_hidden: false` as default behavior
+ - Set `include_hidden: true` in config to include hidden directories if needed
+ - Uses `Find.prune` for efficient directory tree traversal
+ - Prevents scanning of common directories like `.lint`, `.gh`, `.git`, `node_modules` (if hidden)
+ - Fixed bug where root directory `.` was being pruned when used as docs_path
+ - [Fix] **Excludes Pattern Matching** - Fixed fnmatch pattern handling for better glob pattern support.
+ - Fixed `**/.dir/**` patterns now correctly match root-level directories
+ - Normalized patterns ending with `/**` to `/**/*` for proper fnmatch behavior
+ - Handles `**/` prefix matching for zero-directory cases
+ - Fixed relative path calculation to avoid "different prefix" errors
+ - [Test] Added unit tests for hidden directory exclusion feature (5 tests)
+ - [Test] Added integration tests for hidden directory behavior (3 tests)
+
  ## 0.9.3 (2025-10-27)
  - [Fix] **Generate Command Excludes Support** - The `generate` command now properly respects the `excludes` configuration option to filter out files from llms.txt generation.
  - Added `should_exclude?` method to Generator class that matches files against glob patterns
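The 0.10.0 parser regex quoted in the changelog above can be tried in isolation. The sketch below is illustrative only: `LINK_PATTERN`, `parse_links`, and `SAMPLE` are hypothetical names, not part of the gem, and only the pattern and the `description&.strip || ''` fallback come from this diff.

```ruby
# Hypothetical names; the pattern is the 0.10.0 regex quoted in the changelog.
LINK_PATTERN = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

# Returns one hash per markdown list link; a missing description becomes ''.
def parse_links(content)
  content.scan(LINK_PATTERN).map do |title, url, description|
    { title: title, url: url, description: description&.strip || '' }
  end
end

SAMPLE = <<~LLMS
  - [README](README.md): Main documentation
  - [Changelog](CHANGELOG.md)
  - [Guide](guide.md): Quick start (tokens:450, updated:2025-10-13)
LLMS

parse_links(SAMPLE).each { |link| p link }
```

Under this sketch the parenthesized metadata stays inside the description string; splitting it out would be a separate step that this diff does not show.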
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     llm-docs-builder (0.9.3)
+     llm-docs-builder (0.10.0)
        zeitwerk (~> 2.6)
 
  GEM
data/README.md CHANGED
@@ -109,6 +109,7 @@ docs: ./docs
  base_url: https://myproject.io
  title: My Project
  description: Brief description
+ body: Optional body content between description and sections
  output: llms.txt
  suffix: .llm
  verbose: false
@@ -289,9 +290,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab
 
  **Enhanced llms.txt (with metadata enabled):**
  ```markdown
- - [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
- - [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
- - [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+ - [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+ - [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+ - [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
  ```
 
  **Benefits:**
@@ -309,6 +310,68 @@ include_priority: true # Show priority labels (high/medium/low)
  calculate_compression: true # Show compression ratios (slower, requires transformation)
  ```
 
+ **Note:** Metadata is formatted according to the llms.txt specification, appearing within the description field using parentheses and comma separators for spec compliance.
+
+ ### Multi-Section Organization
+
+ Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+
+ **Priority-based categorization:**
+ - **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+ - **Examples** (priority 4-5): Tutorials and example files
+ - **Optional** (priority 6-7): Advanced topics and reference documentation
+
+ **Example output:**
+ ```markdown
+ # My Project
+
+ > Project description
+
+ ## Documentation
+
+ - [README](README.md): Main documentation
+ - [Getting Started](getting-started.md): Quick start guide
+
+ ## Examples
+
+ - [Basic Tutorial](tutorial.md): Step-by-step tutorial
+ - [Code Examples](examples.md): Example code
+
+ ## Optional
+
+ - [Advanced Topics](advanced.md): Deep dive into advanced features
+ - [API Reference](reference.md): Complete API reference
+ ```
+
+ Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
+
+ ### Body Content
+
+ Add custom body content between the description and documentation sections:
+
+ ```yaml
+ # llm-docs-builder.yml
+ title: My Project
+ description: Brief description
+ body: |
+   This framework is built on Ruby and focuses on performance.
+   Key concepts: streaming, batching, and parallel processing.
+ docs: ./docs
+ ```
+
+ This produces:
+ ```markdown
+ # My Project
+
+ > Brief description
+
+ This framework is built on Ruby and focuses on performance.
+ Key concepts: streaming, batching, and parallel processing.
+
+ ## Documentation
+ ...
+ ```
+
  ## Advanced Compression Options
 
  All compression features can be used individually for fine-grained control:
@@ -61,6 +61,7 @@ module LlmDocsBuilder
  base_url: options[:base_url] || self['base_url'],
  title: options[:title] || self['title'],
  description: options[:description] || self['description'],
+ body: options[:body] || self['body'],
  output: options[:output] || self['output'] || 'llms.txt',
  convert_urls: if options.key?(:convert_urls)
                  options[:convert_urls]
@@ -100,6 +101,7 @@ module LlmDocsBuilder
  suffix: options[:suffix] || self['suffix'] || '.llm',
  excludes: options[:excludes] || self['excludes'] || [],
  bulk: options.key?(:bulk) ? options[:bulk] : (self['bulk'] || false),
+ include_hidden: options.key?(:include_hidden) ? options[:include_hidden] : (self['include_hidden'] || false),
  # New compression options
  remove_code_examples: if options.key?(:remove_code_examples)
                          options[:remove_code_examples]
@@ -76,6 +76,13 @@ module LlmDocsBuilder
  files = []
 
  Find.find(docs_path) do |path|
+   # Skip hidden directories unless explicitly enabled
+   # Don't prune the root docs_path itself (even if it's ".")
+   if File.directory?(path) && path != docs_path && File.basename(path).start_with?('.') && !options[:include_hidden]
+     Find.prune
+     next
+   end
+
    next unless File.file?(path)
    next unless path.match?(/\.md$/i)
    next if File.basename(path).start_with?('.')
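The traversal change above can be exercised on its own with a throwaway directory. This is a minimal sketch under stated assumptions: `markdown_files` is a hypothetical helper rather than the gem's method, and it takes `include_hidden` as a keyword argument instead of reading an options hash.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not the gem's API: collect markdown files under
# docs_path, pruning hidden directories unless include_hidden is true.
def markdown_files(docs_path, include_hidden: false)
  files = []
  Find.find(docs_path) do |path|
    if File.directory?(path)
      # The root itself is never pruned, even when it is "." or hidden.
      hidden = path != docs_path && File.basename(path).start_with?('.')
      # Find.prune skips the whole subtree and moves on to the next entry.
      Find.prune if hidden && !include_hidden
      next
    end
    files << path if path.match?(/\.md$/i)
  end
  files.sort
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, '.github'))
  FileUtils.mkdir_p(File.join(root, 'guides'))
  File.write(File.join(root, '.github', 'CONTRIBUTING.md'), '# hidden')
  File.write(File.join(root, 'guides', 'intro.md'), '# visible')

  p markdown_files(root).map { |f| File.basename(f) }
  p markdown_files(root, include_hidden: true).map { |f| File.basename(f) }
end
```

Pruning at the directory level means files under `.git` and similar trees are never visited, which is cheaper than filtering them one by one afterwards.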
@@ -203,7 +210,11 @@ module LlmDocsBuilder
 
  # Constructs llms.txt content from analyzed documentation files
  #
- # Combines title, description, and documentation links into formatted output
+ # Combines title, description, body content, and documentation links into formatted output.
+ # Organizes documents into sections based on priority:
+ # - Priority 1-3: Documentation (essential docs like README, getting started)
+ # - Priority 4-5: Examples (tutorials, example files)
+ # - Priority 6-7: Optional (advanced topics, reference docs)
  #
  # @param docs [Array<Hash>] analyzed file metadata
  # @return [String] formatted llms.txt content
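The priority grouping described in the comment above can be sketched independently of the generator. `group_by_section` and `render_sections` are hypothetical helpers, and the doc hashes carry only `:title`, `:path`, and `:priority` for illustration; the comparisons follow the ranges shown in this diff.

```ruby
# Hypothetical helper: bucket docs by priority and drop empty sections.
def group_by_section(docs)
  sections = {
    'Documentation' => docs.select { |d| d[:priority] <= 3 },
    'Examples' => docs.select { |d| d[:priority].between?(4, 5) },
    'Optional' => docs.select { |d| d[:priority] >= 6 }
  }
  sections.reject { |_name, section_docs| section_docs.empty? }
end

# Hypothetical helper: render each non-empty section as a heading plus links.
def render_sections(docs)
  lines = group_by_section(docs).flat_map do |name, section_docs|
    ["## #{name}", '', *section_docs.map { |d| "- [#{d[:title]}](#{d[:path]})" }, '']
  end
  lines.join("\n").rstrip
end

docs = [
  { title: 'README', path: 'README.md', priority: 1 },
  { title: 'Advanced', path: 'advanced.md', priority: 6 }
]
puts render_sections(docs)
```

With no priority 4-5 entries in the input, the `Examples` heading is simply absent from the output rather than rendered empty.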
@@ -217,31 +228,60 @@ module LlmDocsBuilder
  content << "> #{description}" if description
  content << ''
 
- if docs.any?
-   content << '## Documentation'
+ # Add optional body content
+ if options[:body] && !options[:body].empty?
+   content << options[:body]
    content << ''
+ end
 
-   docs.each do |doc|
-     url = build_url(doc[:path])
-     line = if doc[:description] && !doc[:description].empty?
-              "- [#{doc[:title]}](#{url}): #{doc[:description]}"
-            else
-              "- [#{doc[:title]}](#{url})"
-            end
-
-     # Append metadata if enabled
-     if options[:include_metadata]
-       metadata_parts = []
-       metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
-       metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
-       metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
-       metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
-
-       line += " #{metadata_parts.join(' ')}" unless metadata_parts.empty?
+ if docs.any?
+   # Categorize docs by priority into sections
+   sections = {
+     'Documentation' => docs.select { |d| d[:priority] <= 3 },
+     'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+     'Optional' => docs.select { |d| d[:priority] >= 6 }
+   }
+
+   # Build each section (skip empty ones)
+   sections.each do |section_name, section_docs|
+     next if section_docs.empty?
+
+     content << "## #{section_name}"
+     content << ''
+
+     section_docs.each do |doc|
+       url = build_url(doc[:path])
+
+       # Build metadata string if enabled
+       metadata_str = nil
+       if options[:include_metadata]
+         metadata_parts = []
+         metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+         metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+         metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+         metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+
+         metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+       end
+
+       # Build line according to spec: - [title](url): description (metadata)
+       line = if doc[:description] && !doc[:description].empty?
+                base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+                metadata_str ? "#{base} #{metadata_str}" : base
+              else
+                # No description: - [title](url) (metadata)
+                base = "- [#{doc[:title]}](#{url})"
+                metadata_str ? "#{base}: #{metadata_str}" : base
+              end
+
+       content << line
      end
 
-     content << line
+     content << ''
    end
+
+   # Remove trailing empty line
+   content.pop if content.last == ''
  end
 
  "#{content.join("\n")}\n"
@@ -308,16 +348,38 @@ module LlmDocsBuilder
    return false if excludes.empty?
 
    # Get relative path from docs_path for matching
-   relative_path = if File.directory?(docs_path)
-                     Pathname.new(file_path).relative_path_from(Pathname.new(docs_path)).to_s
-                   else
-                     File.basename(file_path)
-                   end
+   relative_path = begin
+     if File.directory?(docs_path)
+       # Convert both to absolute paths first to avoid "different prefix" error
+       abs_file = File.expand_path(file_path)
+       abs_docs = File.expand_path(docs_path)
+       Pathname.new(abs_file).relative_path_from(Pathname.new(abs_docs)).to_s
+     else
+       File.basename(file_path)
+     end
+   rescue ArgumentError
+     # If paths can't be made relative (different roots), use basename
+     File.basename(file_path)
+   end
 
    excludes.any? do |pattern|
+     # Normalize pattern: ensure /** is followed by something
+     # fnmatch requires /** to be followed by at least one component
+     normalized_pattern = pattern.end_with?('/**') ? "#{pattern}/*" : pattern
+
      # Check both absolute and relative paths
-     File.fnmatch(pattern, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
-       File.fnmatch(pattern, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+     matches = File.fnmatch(normalized_pattern, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
+               File.fnmatch(normalized_pattern, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+
+     # If pattern starts with **/, also try without it (for root-level matches)
+     # Since **/ in fnmatch doesn't match zero directories
+     if !matches && normalized_pattern.start_with?('**/')
+       pattern_without_prefix = normalized_pattern.sub(%r{^\*\*/}, '')
+       matches = File.fnmatch(pattern_without_prefix, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
+                 File.fnmatch(pattern_without_prefix, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+     end
+
+     matches
    end
  end
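The `/**` to `/**/*` normalization in the hunk above can be checked with a small standalone sketch. `normalize_pattern` and `excluded?` are hypothetical names, the helper matches relative paths only, and the `**/` prefix fallback from the diff is left out for brevity.

```ruby
FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

# Hypothetical helper: append "/*" to patterns that end in "/**".
def normalize_pattern(pattern)
  pattern.end_with?('/**') ? "#{pattern}/*" : pattern
end

# Hypothetical helper: true when any normalized pattern matches the path.
def excluded?(relative_path, patterns)
  patterns.any? { |pattern| File.fnmatch(normalize_pattern(pattern), relative_path, FLAGS) }
end

nested = 'drafts/wip/notes.md'
p File.fnmatch('drafts/**', nested, FLAGS)     # raw pattern, no normalization
p excluded?(nested, ['drafts/**'])             # normalized to drafts/**/*
p excluded?('guides/intro.md', ['drafts/**'])  # unrelated path
```

With `File::FNM_PATHNAME`, a trailing `**` that is not followed by `/` behaves like a single `*` and stops at the next path separator, which is why the raw pattern misses the nested file while the normalized one matches it.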
 
@@ -85,7 +85,7 @@ module LlmDocsBuilder
 
  # Extracts markdown links from section content into structured format
  #
- # Scans for markdown list items with links and descriptions. Returns raw content
+ # Scans for markdown list items with links and optional descriptions. Returns raw content
  # if no links are found in the expected format.
  #
  # @param content [String] raw section content
@@ -93,11 +93,13 @@ module LlmDocsBuilder
  def parse_section_content(content)
    links = []
 
-   content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m) do |title, url, description|
+   # Updated regex: description is optional (non-capturing group with ?)
+   # Use [^\n]* instead of .* to avoid matching across lines
+   content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
      links << {
        title: title,
        url: url,
-       description: description.strip
+       description: description&.strip || ''
      }
    end
 
@@ -2,5 +2,5 @@
 
  module LlmDocsBuilder
    # Current version of the LlmDocsBuilder gem
-   VERSION = '0.9.3'
+   VERSION = '0.10.0'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm-docs-builder
  version: !ruby/object:Gem::Version
-   version: 0.9.3
+   version: 0.10.0
  platform: ruby
  authors:
  - Maciej Mensfeld