RubyGems - llm-docs-builder - Versions diffs - 0.9.3 → 0.10.0 - Mend

llm-docs-builder 0.9.3 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +29 -0
data/Gemfile.lock +1 -1
data/README.md +66 -3
data/lib/llm_docs_builder/config.rb +2 -0
data/lib/llm_docs_builder/generator.rb +90 -28
data/lib/llm_docs_builder/parser.rb +5 -3
data/lib/llm_docs_builder/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 75429a83cfd019e7059f76a5fe7df200bff3bc14066c5f83e0739c85a5a68b63
-  data.tar.gz: 850d0568023dd1602cad0b48af5e6437b61f34f760c77b99adb565056313d241
+  metadata.gz: e6407836216f436b728247009f614ea4ea5c2b4de0edf855717129460df4b309
+  data.tar.gz: 8a556fc0b6307529f5c615c05b082521bd021e9e9f34ca30c8bb21d22b21deb2
 SHA512:
-  metadata.gz: f0384e50c10837ec00e4d195115882d4fdaade5e522bba02929b028ac315d686324baf189bfc16803120b77718b28c10f2026e10f2f2bd3dd1ea7aab09f36549
-  data.tar.gz: 184e427780364d704067b1f84962fa7c658782ff2e048c991b34c06ca2643029d107872df1a0beac92cfb07076ae72cf9289527d534cad6274a99fdeaef22ac0
+  metadata.gz: 3a0b657545415c35187fa1595f3fcc5bd5c27a0c1a292bf00635d3c6f14221ac0a254f008f83acc1f7351ef76c6649e6b0540ac585cd64a670b05ef725b0308e
+  data.tar.gz: 68e95142e374ebae3c292db724a9163c396fd423eaa04983bbdaefd6f6cbb0bc72822dfbc5af9fd8f357f1545320db70bf5a6c8a2413fc5f796e73195adcdd13

data/CHANGELOG.md CHANGED

@@ -1,5 +1,34 @@
 # Changelog
+## 0.10.0 (2025-10-27)
+- [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+  - **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+  - **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+  - **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+  - **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+  - Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+  - Empty sections are automatically omitted from output
+  - Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+  - Fixed multiline regex greedy matching issue that was capturing only one link per section
+- [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+- [Docs] Updated README with multi-section organization examples and body content usage
+- **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+## 0.9.4 (2025-10-27)
+- [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
+  - Adds `include_hidden: false` as default behavior
+  - Set `include_hidden: true` in config to include hidden directories if needed
+  - Uses `Find.prune` for efficient directory tree traversal
+  - Prevents scanning of common directories like `.lint`, `.gh`, `.git`, `node_modules` (if hidden)
+  - Fixed bug where root directory `.` was being pruned when used as docs_path
+- [Fix] **Excludes Pattern Matching** - Fixed fnmatch pattern handling for better glob pattern support.
+  - `**/.dir/**` patterns now correctly match root-level directories
+  - Normalized patterns ending with `/**` to `/**/*` for proper fnmatch behavior
+  - Handles `**/` prefix matching for zero-directory cases
+  - Fixed relative path calculation to avoid "different prefix" errors
+- [Test] Added unit tests for hidden directory exclusion feature (5 tests)
+- [Test] Added integration tests for hidden directory behavior (3 tests)
 ## 0.9.3 (2025-10-27)
 - [Fix] **Generate Command Excludes Support** - The `generate` command now properly respects the `excludes` configuration option to filter out files from llms.txt generation.
   - Added `should_exclude?` method to Generator class that matches files against glob patterns
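
The 0.10.0 entry above describes two rules: priority decides the section, and metadata goes in parentheses after the description. The following is a minimal sketch of both rules, not the gem's implementation. `section_for` and `link_line` are hypothetical names, and passing metadata as a Hash is an assumption made for illustration.

```ruby
# Hypothetical helpers for illustration; the names are not taken from the gem.

# Maps a numeric priority to the section named in the changelog.
def section_for(priority)
  if priority <= 3
    'Documentation'
  elsif priority <= 5
    'Examples'
  else
    'Optional'
  end
end

# Builds one link line; metadata is a Hash such as { tokens: 450 }.
def link_line(title, url, description: nil, metadata: {})
  pairs = metadata.map { |key, value| "#{key}:#{value}" }
  meta = pairs.empty? ? nil : "(#{pairs.join(', ')})"
  text = [description, meta].compact.reject(&:empty?).join(' ')
  base = "- [#{title}](#{url})"
  # Without description or metadata the bare link is valid per the spec.
  text.empty? ? base : "#{base}: #{text}"
end

puts section_for(2)
puts link_line('Getting Started', 'https://myproject.io/docs/Getting-Started.md',
               description: 'Quick start guide',
               metadata: { tokens: 450, updated: '2025-10-13', priority: 'high' })
```

With these inputs the second call prints the same line shape as the README example in this release: `- [Getting Started](...): Quick start guide (tokens:450, updated:2025-10-13, priority:high)`.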

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    llm-docs-builder (0.9.3)
+    llm-docs-builder (0.10.0)
       zeitwerk (~> 2.6)
 GEM

data/README.md CHANGED

@@ -109,6 +109,7 @@ docs: ./docs
 base_url: https://myproject.io
 title: My Project
 description: Brief description
+body: Optional body content between description and sections
 output: llms.txt
 suffix: .llm
 verbose: false
@@ -289,9 +290,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab
 **Enhanced llms.txt (with metadata enabled):**
 ```markdown
-- [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
-- [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
-- [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+- [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+- [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+- [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
 ```
 **Benefits:**
@@ -309,6 +310,68 @@ include_priority: true      # Show priority labels (high/medium/low)
 calculate_compression: true # Show compression ratios (slower, requires transformation)
 ```
+**Note:** Metadata is formatted according to the llms.txt specification, appearing within the description field using parentheses and comma separators for spec compliance.
+### Multi-Section Organization
+Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+**Priority-based categorization:**
+- **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+- **Examples** (priority 4-5): Tutorials and example files
+- **Optional** (priority 6-7): Advanced topics and reference documentation
+**Example output:**
+```markdown
+# My Project
+> Project description
+## Documentation
+- [README](README.md): Main documentation
+- [Getting Started](getting-started.md): Quick start guide
+## Examples
+- [Basic Tutorial](tutorial.md): Step-by-step tutorial
+- [Code Examples](examples.md): Example code
+## Optional
+- [Advanced Topics](advanced.md): Deep dive into advanced features
+- [API Reference](reference.md): Complete API reference
+```
+Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
+### Body Content
+Add custom body content between the description and documentation sections:
+```yaml
+# llm-docs-builder.yml
+title: My Project
+description: Brief description
+body: |
+  This framework is built on Ruby and focuses on performance.
+  Key concepts: streaming, batching, and parallel processing.
+docs: ./docs
+```
+This produces:
+```markdown
+# My Project
+> Brief description
+This framework is built on Ruby and focuses on performance.
+Key concepts: streaming, batching, and parallel processing.
+## Documentation
+...
+```
 ## Advanced Compression Options
 All compression features can be used individually for fine-grained control:

data/lib/llm_docs_builder/config.rb CHANGED

@@ -61,6 +61,7 @@ module LlmDocsBuilder
         base_url: options[:base_url] || self['base_url'],
         title: options[:title] || self['title'],
         description: options[:description] || self['description'],
+        body: options[:body] || self['body'],
         output: options[:output] || self['output'] || 'llms.txt',
         convert_urls: if options.key?(:convert_urls)
                         options[:convert_urls]
@@ -100,6 +101,7 @@ module LlmDocsBuilder
         suffix: options[:suffix] || self['suffix'] || '.llm',
         excludes: options[:excludes] || self['excludes'] || [],
         bulk: options.key?(:bulk) ? options[:bulk] : (self['bulk'] || false),
+        include_hidden: options.key?(:include_hidden) ? options[:include_hidden] : (self['include_hidden'] || false),
         # New compression options
         remove_code_examples: if options.key?(:remove_code_examples)
                                 options[:remove_code_examples]
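
The `include_hidden` line uses `options.key?` rather than a plain `||` chain, and the difference matters for boolean settings: with `||`, an explicit `false` from the caller would fall through to the file value. A small sketch of that behavior follows. `resolve_flag` is a hypothetical helper, and plain Hashes stand in for the CLI options and the YAML config.

```ruby
# Hypothetical helper, not part of the gem: resolves a boolean setting from
# caller options (symbol keys) with a fallback to file config (string keys).
def resolve_flag(options, file_config, name)
  return options[name] if options.key?(name)

  file_config[name.to_s] || false
end

file_config = { 'include_hidden' => true }

# An explicit false from the caller wins over the file value.
puts resolve_flag({ include_hidden: false }, file_config, :include_hidden)

# A naive || chain cannot tell "false" from "not given" and falls through.
naive = { include_hidden: false }[:include_hidden] || file_config['include_hidden']
puts naive
```

The first `puts` prints `false` and the second prints `true`, which is the case the `key?` check guards against.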

data/lib/llm_docs_builder/generator.rb CHANGED

@@ -76,6 +76,13 @@ module LlmDocsBuilder
       files = []
       Find.find(docs_path) do |path|
+        # Skip hidden directories unless explicitly enabled
+        # Don't prune the root docs_path itself (even if it's ".")
+        if File.directory?(path) && path != docs_path && File.basename(path).start_with?('.') && !options[:include_hidden]
+          Find.prune
+          next
+        end
         next unless File.file?(path)
         next unless path.match?(/\.md$/i)
         next if File.basename(path).start_with?('.')
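
To see the pruning in the hunk above in isolation, here is a self-contained sketch that runs the same check against a temporary directory. `collect_markdown` and `demo_hidden_dirs` are hypothetical names, not the gem's API. `Find.prune` works by throwing to `Find.find`, so no `next` is needed after it in this sketch.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper mirroring the hunk above; not the gem's public API.
def collect_markdown(docs_path, include_hidden: false)
  files = []
  Find.find(docs_path) do |path|
    hidden_dir = File.directory?(path) && path != docs_path &&
                 File.basename(path).start_with?('.')
    # Find.prune skips this directory and moves on to the next entry.
    Find.prune if hidden_dir && !include_hidden

    next unless File.file?(path) && path.match?(/\.md$/i)
    next if File.basename(path).start_with?('.')

    files << path
  end
  files.sort
end

# Builds a tiny docs tree and returns [default_result, include_hidden_result].
def demo_hidden_dirs
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, '.github'))
    FileUtils.mkdir_p(File.join(root, 'guides'))
    File.write(File.join(root, '.github', 'CONTRIBUTING.md'), '# hidden')
    File.write(File.join(root, 'guides', 'intro.md'), '# intro')

    relative = ->(paths) { paths.map { |p| p.sub("#{root}/", '') } }
    [relative.call(collect_markdown(root)),
     relative.call(collect_markdown(root, include_hidden: true))]
  end
end

default_files, all_files = demo_hidden_dirs
puts default_files.inspect
puts all_files.inspect
```

By default only `guides/intro.md` is collected. With `include_hidden: true` the file under `.github` is collected as well.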
@@ -203,7 +210,11 @@ module LlmDocsBuilder
     # Constructs llms.txt content from analyzed documentation files
     #
-    # Combines title, description, and documentation links into formatted output
+    # Combines title, description, body content, and documentation links into formatted output.
+    # Organizes documents into sections based on priority:
+    # - Priority 1-3: Documentation (essential docs like README, getting started)
+    # - Priority 4-5: Examples (tutorials, example files)
+    # - Priority 6-7: Optional (advanced topics, reference docs)
     #
     # @param docs [Array<Hash>] analyzed file metadata
     # @return [String] formatted llms.txt content
@@ -217,31 +228,60 @@ module LlmDocsBuilder
       content << "> #{description}" if description
       content << ''
-      if docs.any?
-        content << '## Documentation'
+      # Add optional body content
+      if options[:body] && !options[:body].empty?
+        content << options[:body]
         content << ''
+      end
-        docs.each do |doc|
-          url = build_url(doc[:path])
-          line = if doc[:description] && !doc[:description].empty?
-                   "- [#{doc[:title]}](#{url}): #{doc[:description]}"
-                 else
-                   "- [#{doc[:title]}](#{url})"
-                 end
-          # Append metadata if enabled
-          if options[:include_metadata]
-            metadata_parts = []
-            metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
-            metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
-            metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
-            metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
-            line += " #{metadata_parts.join(' ')}" unless metadata_parts.empty?
+      if docs.any?
+        # Categorize docs by priority into sections
+        sections = {
+          'Documentation' => docs.select { |d| d[:priority] <= 3 },
+          'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+          'Optional' => docs.select { |d| d[:priority] >= 6 }
+        }
+        # Build each section (skip empty ones)
+        sections.each do |section_name, section_docs|
+          next if section_docs.empty?
+          content << "## #{section_name}"
+          content << ''
+          section_docs.each do |doc|
+            url = build_url(doc[:path])
+            # Build metadata string if enabled
+            metadata_str = nil
+            if options[:include_metadata]
+              metadata_parts = []
+              metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+              metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+              metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+              metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+              metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+            end
+            # Build line according to spec: - [title](url): description (metadata)
+            line = if doc[:description] && !doc[:description].empty?
+                     base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+                     metadata_str ? "#{base} #{metadata_str}" : base
+                   else
+                     # No description: - [title](url): (metadata)
+                     base = "- [#{doc[:title]}](#{url})"
+                     metadata_str ? "#{base}: #{metadata_str}" : base
+                   end
+            content << line
           end
-          content << line
+          content << ''
         end
+        # Remove trailing empty line
+        content.pop if content.last == ''
       end
       "#{content.join("\n")}\n"
@@ -308,16 +348,38 @@ module LlmDocsBuilder
       return false if excludes.empty?
       # Get relative path from docs_path for matching
-      relative_path = if File.directory?(docs_path)
-                        Pathname.new(file_path).relative_path_from(Pathname.new(docs_path)).to_s
-                      else
-                        File.basename(file_path)
-                      end
+      relative_path = begin
+        if File.directory?(docs_path)
+          # Convert both to absolute paths first to avoid "different prefix" error
+          abs_file = File.expand_path(file_path)
+          abs_docs = File.expand_path(docs_path)
+          Pathname.new(abs_file).relative_path_from(Pathname.new(abs_docs)).to_s
+        else
+          File.basename(file_path)
+        end
+      rescue ArgumentError
+        # If paths can't be made relative (different roots), use basename
+        File.basename(file_path)
+      end
       excludes.any? do |pattern|
+        # Normalize pattern: ensure /** is followed by something
+        # fnmatch requires /** to be followed by at least one component
+        normalized_pattern = pattern.end_with?('/**') ? "#{pattern}/*" : pattern
         # Check both absolute and relative paths
-        File.fnmatch(pattern, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
-          File.fnmatch(pattern, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+        matches = File.fnmatch(normalized_pattern, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
+                  File.fnmatch(normalized_pattern, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+        # If pattern starts with **/, also try without it (for root-level matches)
+        # Since **/ in fnmatch doesn't match zero directories
+        if !matches && normalized_pattern.start_with?('**/')
+          pattern_without_prefix = normalized_pattern.sub(%r{^\*\*/}, '')
+          matches = File.fnmatch(pattern_without_prefix, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
+                    File.fnmatch(pattern_without_prefix, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+        end
+        matches
       end
     end
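
The pattern normalization in `should_exclude?` can be tried on its own. The sketch below is a simplified version that checks only a relative path, whereas the method above also tests the absolute path. `excluded?` and `match_flags` are hypothetical names. The two direct `File.fnmatch` calls show why a trailing `/**` is rewritten to `/**/*`: under `File::FNM_PATHNAME`, a `**` that is not followed by `/` behaves like `*` and does not cross directory separators.

```ruby
# Flags used by the diff above: wildcards stop at "/", and leading dots match.
def match_flags
  File::FNM_PATHNAME | File::FNM_DOTMATCH
end

# Hypothetical standalone helper; relative paths only, for illustration.
def excluded?(relative_path, patterns)
  patterns.any? do |pattern|
    # "dir/**" becomes "dir/**/*" so that nested files can match.
    normalized = pattern.end_with?('/**') ? "#{pattern}/*" : pattern
    candidates = [normalized]
    # Mirror the diff's fallback: also try the pattern without a leading "**/".
    candidates << normalized.delete_prefix('**/') if normalized.start_with?('**/')
    candidates.any? { |candidate| File.fnmatch(candidate, relative_path, match_flags) }
  end
end

puts File.fnmatch('drafts/**', 'drafts/2024/notes.md', match_flags)
puts File.fnmatch('drafts/**/*', 'drafts/2024/notes.md', match_flags)
puts excluded?('.github/workflows/ci.md', ['**/.github/**'])
puts excluded?('guides/intro.md', ['**/.github/**'])
```

The first call prints `false` and the second prints `true`. The helper excludes the `.github` file and keeps `guides/intro.md`.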

data/lib/llm_docs_builder/parser.rb CHANGED

@@ -85,7 +85,7 @@ module LlmDocsBuilder
     # Extracts markdown links from section content into structured format
     #
-    # Scans for markdown list items with links and descriptions. Returns raw content
+    # Scans for markdown list items with links and optional descriptions. Returns raw content
     # if no links are found in the expected format.
     #
     # @param content [String] raw section content
@@ -93,11 +93,13 @@ module LlmDocsBuilder
     def parse_section_content(content)
       links = []
-      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m) do |title, url, description|
+      # Updated regex: description is optional (non-capturing group with ?)
+      # Use [^\n]* instead of .* to avoid matching across lines
+      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
         links << {
           title: title,
           url: url,
-          description: description.strip
+          description: description&.strip || ''
         }
       end
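
The effect of the regex change is easiest to see by running both patterns over the same section text. In the sketch below the two patterns are copied from the diff, while `parse_links`, `old_link_pattern`, `new_link_pattern`, and `sample_section` are hypothetical names and the sample content is made up. With the `m` flag, `.` also matches newlines in Ruby, so the old greedy `(.*)` runs to the end of the section and swallows every following link.

```ruby
# Hypothetical wrapper around the two patterns quoted in the diff above.
def parse_links(content, pattern)
  content.scan(pattern).map do |title, url, description|
    { title: title, url: url, description: description&.strip || '' }
  end
end

# Pattern before 0.10.0: description required, "." matches newlines (m flag).
def old_link_pattern
  /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m
end

# Pattern from 0.10.0: description optional and limited to a single line.
def new_link_pattern
  /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/
end

# Made-up section content for the demonstration.
def sample_section
  <<~SECTION
    - [README](README.md): Main documentation
    - [Changelog](CHANGELOG.md)
    - [Guide](guide.md): Quick start (tokens:450, updated:2025-10-13)
  SECTION
end

puts parse_links(sample_section, old_link_pattern).length
puts parse_links(sample_section, new_link_pattern).map { |link| link[:title] }.inspect
```

The old pattern yields a single link whose description contains the remaining lines. The new pattern yields three links, with an empty description for `Changelog`. The parenthesized metadata stays part of the description string in this sketch.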

data/lib/llm_docs_builder/version.rb CHANGED

@@ -2,5 +2,5 @@
 module LlmDocsBuilder
   # Current version of the LlmDocsBuilder gem
-  VERSION = '0.9.3'
+  VERSION = '0.10.0'
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm-docs-builder
 version: !ruby/object:Gem::Version
-  version: 0.9.3
+  version: 0.10.0
 platform: ruby
 authors:
 - Maciej Mensfeld