llm-docs-builder 0.9.3 → 0.10.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +29 -0
- data/Gemfile.lock +1 -1
- data/README.md +66 -3
- data/lib/llm_docs_builder/config.rb +2 -0
- data/lib/llm_docs_builder/generator.rb +90 -28
- data/lib/llm_docs_builder/parser.rb +5 -3
- data/lib/llm_docs_builder/version.rb +1 -1
- metadata +1 -1
checksums.yaml (changed)
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e6407836216f436b728247009f614ea4ea5c2b4de0edf855717129460df4b309
+  data.tar.gz: 8a556fc0b6307529f5c615c05b082521bd021e9e9f34ca30c8bb21d22b21deb2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3a0b657545415c35187fa1595f3fcc5bd5c27a0c1a292bf00635d3c6f14221ac0a254f008f83acc1f7351ef76c6649e6b0540ac585cd64a670b05ef725b0308e
+  data.tar.gz: 68e95142e374ebae3c292db724a9163c396fd423eaa04983bbdaefd6f6cbb0bc72822dfbc5af9fd8f357f1545320db70bf5a6c8a2413fc5f796e73195adcdd13
data/CHANGELOG.md (changed)
@@ -1,5 +1,34 @@
 # Changelog
 
+## 0.10.0 (2025-10-27)
+- [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+- **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+- **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+- **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+- **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+- Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+- Empty sections are automatically omitted from output
+- Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+- Fixed multiline regex greedy matching issue that was capturing only one link per section
+- [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+- [Docs] Updated README with multi-section organization examples and body content usage
+- **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+
+## 0.9.4 (2025-10-27)
+- [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
+- Adds `include_hidden: false` as default behavior
+- Set `include_hidden: true` in config to include hidden directories if needed
+- Uses `Find.prune` for efficient directory tree traversal
+- Prevents scanning of common directories like `.lint`, `.gh`, `.git`, `node_modules` (if hidden)
+- Fixed bug where root directory `.` was being pruned when used as docs_path
+- [Fix] **Excludes Pattern Matching** - Fixed fnmatch pattern handling for better glob pattern support.
+- Fixed `**/.dir/**` patterns now correctly match root-level directories
+- Normalized patterns ending with `/**` to `/**/*` for proper fnmatch behavior
+- Handles `**/` prefix matching for zero-directory cases
+- Fixed relative path calculation to avoid "different prefix" errors
+- [Test] Added unit tests for hidden directory exclusion feature (5 tests)
+- [Test] Added integration tests for hidden directory behavior (3 tests)
+
 ## 0.9.3 (2025-10-27)
 - [Fix] **Generate Command Excludes Support** - The `generate` command now properly respects the `excludes` configuration option to filter out files from llms.txt generation.
 - Added `should_exclude?` method to Generator class that matches files against glob patterns
data/Gemfile.lock (changed)
data/README.md (changed)
@@ -109,6 +109,7 @@ docs: ./docs
 base_url: https://myproject.io
 title: My Project
 description: Brief description
+body: Optional body content between description and sections
 output: llms.txt
 suffix: .llm
 verbose: false
@@ -289,9 +290,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab
 
 **Enhanced llms.txt (with metadata enabled):**
 ```markdown
-- [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
-- [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
-- [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+- [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+- [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+- [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
 ```
 
 **Benefits:**
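Tools that consumed the old space-separated suffix need updating for the new parenthesized format. The following is a minimal consumer-side sketch, not part of llm-docs-builder, that splits one spec-style line into title, URL, description and metadata. The name `parse_llms_line` is hypothetical. The sketch assumes metadata, when present, is the final `(...)` group and contains only comma-separated `key:value` pairs. A parenthesized description without colons, such as `(advanced)`, is therefore left alone.

```ruby
# Hypothetical consumer-side helper (not part of llm-docs-builder).
# Splits "- [title](url): description (k:v, k:v)" into its parts.
# Assumption: metadata, if any, is the LAST "(...)" group on the line and
# holds only comma-separated key:value pairs.
LINK_LINE = /\A[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?\z/
METADATA_SUFFIX = /\s*\(((?:[\w-]+:[^,()]+)(?:,\s*[\w-]+:[^,()]+)*)\)\z/

def parse_llms_line(line)
  match = LINK_LINE.match(line.strip)
  return nil unless match # e.g. old-style "... tokens:450 updated:..." lines

  title, url, rest = match.captures
  rest = rest.to_s # nil when the line has no ": ..." part
  metadata = {}

  if (meta = METADATA_SUFFIX.match(rest))
    meta[1].split(/,\s*/).each do |pair|
      key, value = pair.split(':', 2)
      metadata[key.to_sym] = value
    end
    rest = rest[0...meta.begin(0)] # strip the metadata group from the description
  end

  { title: title, url: url, description: rest.strip, metadata: metadata }
end

line = '- [Getting Started](https://myproject.io/docs/Getting-Started.md): ' \
       'Quick start guide (tokens:450, updated:2025-10-13, priority:high)'
p parse_llms_line(line)[:metadata]
# => {:tokens=>"450", :updated=>"2025-10-13", :priority=>"high"}
```

Values come back as strings. A real consumer would convert `tokens` to an integer and `updated` to a `Date` as needed.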
@@ -309,6 +310,68 @@ include_priority: true # Show priority labels (high/medium/low)
 calculate_compression: true # Show compression ratios (slower, requires transformation)
 ```
 
+**Note:** Metadata is formatted according to the llms.txt specification, appearing within the description field using parentheses and comma separators for spec compliance.
+
+### Multi-Section Organization
+
+Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+
+**Priority-based categorization:**
+- **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+- **Examples** (priority 4-5): Tutorials and example files
+- **Optional** (priority 6-7): Advanced topics and reference documentation
+
+**Example output:**
+```markdown
+# My Project
+
+> Project description
+
+## Documentation
+
+- [README](README.md): Main documentation
+- [Getting Started](getting-started.md): Quick start guide
+
+## Examples
+
+- [Basic Tutorial](tutorial.md): Step-by-step tutorial
+- [Code Examples](examples.md): Example code
+
+## Optional
+
+- [Advanced Topics](advanced.md): Deep dive into advanced features
+- [API Reference](reference.md): Complete API reference
+```
+
+Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
+
+### Body Content
+
+Add custom body content between the description and documentation sections:
+
+```yaml
+# llm-docs-builder.yml
+title: My Project
+description: Brief description
+body: |
+  This framework is built on Ruby and focuses on performance.
+  Key concepts: streaming, batching, and parallel processing.
+docs: ./docs
+```
+
+This produces:
+```markdown
+# My Project
+
+> Brief description
+
+This framework is built on Ruby and focuses on performance.
+Key concepts: streaming, batching, and parallel processing.
+
+## Documentation
+...
+```
+
 ## Advanced Compression Options
 
 All compression features can be used individually for fine-grained control:
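The layout the README describes has four parts in order: the title, a `>` description, an optional body, and then the non-empty priority sections. Below is a small standalone Ruby sketch of that layout. It approximates the generator change further down and is not the gem's code. `render_llms_txt` is a hypothetical name, and docs are plain hashes with `:title`, `:url`, `:description` and `:priority`. The priority bounds mirror the generator's `<= 3`, `4..5` and `>= 6` checks. Metadata is left out. One choice here may differ from the gem: the body is `chomp`ed so that the trailing newline of a YAML `|` block does not add an extra blank line. The beginless range needs Ruby 2.7 or later.

```ruby
# Illustrative sketch (hypothetical method, not the gem's API).
# Priority bounds mirror the generator: <= 3, 4..5, >= 6.
SECTIONS = {
  'Documentation' => (..3),
  'Examples' => (4..5),
  'Optional' => (6..)
}.freeze

def render_llms_txt(title:, description: nil, body: nil, docs: [])
  lines = ["# #{title}", '']
  if description
    lines << "> #{description}"
    lines << ''
  end
  if body && !body.strip.empty?
    lines << body.chomp # drop the trailing newline of a YAML "|" block
    lines << ''
  end

  SECTIONS.each do |name, range|
    section_docs = docs.select { |d| range.cover?(d[:priority]) }
    next if section_docs.empty? # empty sections are omitted entirely

    lines << "## #{name}"
    lines << ''
    section_docs.each do |d|
      entry = "- [#{d[:title]}](#{d[:url]})"
      entry += ": #{d[:description]}" if d[:description] && !d[:description].empty?
      lines << entry
    end
    lines << ''
  end

  lines.pop while lines.last == '' # no trailing blank lines
  "#{lines.join("\n")}\n"
end

puts render_llms_txt(
  title: 'My Project',
  description: 'Brief description',
  docs: [
    { title: 'README', url: 'README.md', description: 'Main documentation', priority: 1 },
    { title: 'API Reference', url: 'reference.md', description: 'Complete API reference', priority: 7 }
  ]
)
```

With no priority 4-5 documents, no `## Examples` heading is printed. This matches the README's statement that empty sections are omitted.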
data/lib/llm_docs_builder/config.rb (changed)
@@ -61,6 +61,7 @@ module LlmDocsBuilder
 base_url: options[:base_url] || self['base_url'],
 title: options[:title] || self['title'],
 description: options[:description] || self['description'],
+body: options[:body] || self['body'],
 output: options[:output] || self['output'] || 'llms.txt',
 convert_urls: if options.key?(:convert_urls)
   options[:convert_urls]
@@ -100,6 +101,7 @@ module LlmDocsBuilder
 suffix: options[:suffix] || self['suffix'] || '.llm',
 excludes: options[:excludes] || self['excludes'] || [],
 bulk: options.key?(:bulk) ? options[:bulk] : (self['bulk'] || false),
+include_hidden: options.key?(:include_hidden) ? options[:include_hidden] : (self['include_hidden'] || false),
 # New compression options
 remove_code_examples: if options.key?(:remove_code_examples)
   options[:remove_code_examples]
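The config hunk resolves options in two ways. String options such as `body` use a plain `||` chain. Boolean flags such as `include_hidden` use `options.key?(...) ? ... : (...)`. The sketch below shows why the boolean form matters: with `||`, an explicit `false` passed in code or on the command line could never override `true` from the config file. `resolve_flag` and `resolve_flag_naive` are hypothetical helpers. A plain Hash with string keys stands in for the gem's `self['...']` lookup.

```ruby
# Hypothetical helpers; `config` is a plain Hash standing in for self['...'].

# Mirrors the include_hidden/bulk pattern: an explicitly passed option wins,
# even when it is false; otherwise use the config value, defaulting to false.
def resolve_flag(options, config, key)
  if options.key?(key)
    options[key]
  else
    config[key.to_s] || false
  end
end

# The naive form: `false || true` evaluates to true, so an explicit false is lost.
def resolve_flag_naive(options, config, key)
  options[key] || config[key.to_s] || false
end

config = { 'include_hidden' => true }
p resolve_flag({ include_hidden: false }, config, :include_hidden)       # => false
p resolve_flag_naive({ include_hidden: false }, config, :include_hidden) # => true
```

The `||` chain is still safe for `body`, `title` and similar options, because their "unset" value is `nil`, not a meaningful `false`.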
data/lib/llm_docs_builder/generator.rb (changed)
@@ -76,6 +76,13 @@ module LlmDocsBuilder
 files = []
 
 Find.find(docs_path) do |path|
+  # Skip hidden directories unless explicitly enabled
+  # Don't prune the root docs_path itself (even if it's ".")
+  if File.directory?(path) && path != docs_path && File.basename(path).start_with?('.') && !options[:include_hidden]
+    Find.prune
+    next
+  end
+
   next unless File.file?(path)
   next unless path.match?(/\.md$/i)
   next if File.basename(path).start_with?('.')
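The pruning logic above can be run on its own against a throwaway directory tree. The sketch below is a standalone copy of the hunk's checks, not the gem's method. `collect_markdown` is a hypothetical name, and the file layout is invented for the demo. `Find.prune` works by `throw`ing back into `Find.find`. Nothing after it runs for that path, and the directory's children are never visited. Hidden files such as `.draft.md` are skipped in both modes by the separate basename check.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Standalone copy of the hidden-directory pruning (not the gem's method).
# Returns Markdown paths relative to root, sorted.
def collect_markdown(root, include_hidden: false)
  files = []
  Find.find(root) do |path|
    # Never prune the root itself, even if it is "." or starts with a dot
    if File.directory?(path) && path != root &&
       File.basename(path).start_with?('.') && !include_hidden
      Find.prune # skip this directory and everything below it
    end
    next unless File.file?(path)
    next unless path.match?(/\.md$/i)
    next if File.basename(path).start_with?('.') # hidden files are always skipped

    files << path.delete_prefix("#{root}/")
  end
  files.sort
end

default_files = nil
all_files = nil
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'guides'))
  FileUtils.mkdir_p(File.join(root, '.github'))
  File.write(File.join(root, 'README.md'), "# Readme\n")
  File.write(File.join(root, 'guides', 'intro.md'), "# Intro\n")
  File.write(File.join(root, '.github', 'CONTRIBUTING.md'), "# Contributing\n")
  File.write(File.join(root, '.draft.md'), "# Draft\n")

  default_files = collect_markdown(root)
  all_files = collect_markdown(root, include_hidden: true)
end

p default_files # => ["README.md", "guides/intro.md"]
p all_files     # => [".github/CONTRIBUTING.md", "README.md", "guides/intro.md"]
```

The `path != root` guard is the 0.9.4 fix for `docs_path: .`. Without it, `File.basename('.')` starts with a dot and the whole traversal would be pruned at the first step.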
@@ -203,7 +210,11 @@ module LlmDocsBuilder
 
 # Constructs llms.txt content from analyzed documentation files
 #
-# Combines title, description, and documentation links into formatted output
+# Combines title, description, body content, and documentation links into formatted output.
+# Organizes documents into sections based on priority:
+# - Priority 1-3: Documentation (essential docs like README, getting started)
+# - Priority 4-5: Examples (tutorials, example files)
+# - Priority 6-7: Optional (advanced topics, reference docs)
 #
 # @param docs [Array<Hash>] analyzed file metadata
 # @return [String] formatted llms.txt content
@@ -217,31 +228,60 @@ module LlmDocsBuilder
 content << "> #{description}" if description
 content << ''
 
-
-
+# Add optional body content
+if options[:body] && !options[:body].empty?
+  content << options[:body]
   content << ''
+end
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+if docs.any?
+  # Categorize docs by priority into sections
+  sections = {
+    'Documentation' => docs.select { |d| d[:priority] <= 3 },
+    'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+    'Optional' => docs.select { |d| d[:priority] >= 6 }
+  }
+
+  # Build each section (skip empty ones)
+  sections.each do |section_name, section_docs|
+    next if section_docs.empty?
+
+    content << "## #{section_name}"
+    content << ''
+
+    section_docs.each do |doc|
+      url = build_url(doc[:path])
+
+      # Build metadata string if enabled
+      metadata_str = nil
+      if options[:include_metadata]
+        metadata_parts = []
+        metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+        metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+        metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+        metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+
+        metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+      end
+
+      # Build line according to spec: - [title](url): description (metadata)
+      line = if doc[:description] && !doc[:description].empty?
+               base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+               metadata_str ? "#{base} #{metadata_str}" : base
+             else
+               # No description: - [title](url) (metadata)
+               base = "- [#{doc[:title]}](#{url})"
+               metadata_str ? "#{base}: #{metadata_str}" : base
+             end
+
+      content << line
     end
 
-    content <<
+    content << ''
   end
+
+  # Remove trailing empty line
+  content.pop if content.last == ''
 end
 
 "#{content.join("\n")}\n"
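The per-entry formatting above has four cases: with or without a description, and with or without metadata. The standalone sketch below follows those branches. `format_entry` is a hypothetical name. The `compression` part is left out, and the gem's `priority_label(doc[:priority])` is replaced by an explicit `priority:` string, because that helper's implementation is not shown in this diff. The README example suggests it renders as `priority:high`.

```ruby
# Standalone sketch of the entry formatting (hypothetical name, not the gem's API).
# `priority` is passed as a ready-made label such as "high".
def format_entry(title:, url:, description: nil, tokens: nil, updated: nil, priority: nil)
  parts = []
  parts << "tokens:#{tokens}" if tokens
  parts << "updated:#{updated}" if updated
  parts << "priority:#{priority}" if priority
  metadata = parts.empty? ? nil : "(#{parts.join(', ')})"

  if description && !description.empty?
    base = "- [#{title}](#{url}): #{description}"
    metadata ? "#{base} #{metadata}" : base
  else
    # Without a description, metadata still follows a colon
    base = "- [#{title}](#{url})"
    metadata ? "#{base}: #{metadata}" : base
  end
end

puts format_entry(title: 'Getting Started', url: 'https://myproject.io/docs/Getting-Started.md',
                  description: 'Quick start guide', tokens: 450, updated: '2025-10-13', priority: 'high')
# => - [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
```

A colon is emitted only when something follows it. An entry with neither a description nor metadata therefore stays a bare `- [title](url)`, which the updated parser regex accepts.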
@@ -308,16 +348,38 @@ module LlmDocsBuilder
   return false if excludes.empty?
 
   # Get relative path from docs_path for matching
-  relative_path =
-
-
-
-
+  relative_path = begin
+    if File.directory?(docs_path)
+      # Convert both to absolute paths first to avoid "different prefix" error
+      abs_file = File.expand_path(file_path)
+      abs_docs = File.expand_path(docs_path)
+      Pathname.new(abs_file).relative_path_from(Pathname.new(abs_docs)).to_s
+    else
+      File.basename(file_path)
+    end
+  rescue ArgumentError
+    # If paths can't be made relative (different roots), use basename
+    File.basename(file_path)
+  end
 
   excludes.any? do |pattern|
+    # Normalize pattern: ensure /** is followed by something
+    # fnmatch requires /** to be followed by at least one component
+    normalized_pattern = pattern.end_with?('/**') ? "#{pattern}/*" : pattern
+
     # Check both absolute and relative paths
-    File.fnmatch(
-
+    matches = File.fnmatch(normalized_pattern, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
+              File.fnmatch(normalized_pattern, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+
+    # If pattern starts with **/, also try without it (for root-level matches)
+    # Since **/ in fnmatch doesn't match zero directories
+    if !matches && normalized_pattern.start_with?('**/')
+      pattern_without_prefix = normalized_pattern.sub(%r{^\*\*/}, '')
+      matches = File.fnmatch(pattern_without_prefix, file_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
+                File.fnmatch(pattern_without_prefix, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+    end
+
+    matches
   end
 end
 
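The sketch below isolates two details of the exclusion check. First, it shows why `dir/**` is rewritten to `dir/**/*`. Under `File::FNM_PATHNAME`, a trailing `**` behaves like a single `*` and cannot cross `/`, so `drafts/**` alone misses `drafts/2025/wip.md`. Second, it shows why both paths are expanded before `Pathname#relative_path_from`: mixing a relative and an absolute `Pathname` raises `ArgumentError` ("different prefix"). `excluded?` and `different_prefix_error?` are hypothetical simplified helpers. They leave out the `**/` fallback branch and the absolute-path match from the hunk.

```ruby
require 'pathname'

# Flags used by the hunk: '*' must not cross '/', and dotfiles are matchable.
FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

# Same normalization as the diff: "dir/**" becomes "dir/**/*".
def normalize_pattern(pattern)
  pattern.end_with?('/**') ? "#{pattern}/*" : pattern
end

# Hypothetical simplified check (no "**/" fallback, relative path only).
def excluded?(file_path, docs_path, patterns)
  relative = Pathname.new(File.expand_path(file_path))
                     .relative_path_from(Pathname.new(File.expand_path(docs_path)))
                     .to_s
  patterns.any? { |pattern| File.fnmatch(normalize_pattern(pattern), relative, FLAGS) }
end

# Mixing relative and absolute Pathnames raises ArgumentError ("different prefix").
def different_prefix_error?
  Pathname.new('docs/a.md').relative_path_from(Pathname.new('/srv/docs'))
  false
rescue ArgumentError
  true
end

p File.fnmatch('drafts/**', 'drafts/2025/wip.md', FLAGS)            # => false
p excluded?('/srv/docs/drafts/2025/wip.md', '/srv/docs', ['drafts/**']) # => true
p different_prefix_error?                                            # => true
```

Because `FNM_DOTMATCH` is set, the normalized pattern also matches dotfiles such as `drafts/.scratch.md`.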
data/lib/llm_docs_builder/parser.rb (changed)
@@ -85,7 +85,7 @@ module LlmDocsBuilder
 
 # Extracts markdown links from section content into structured format
 #
-# Scans for markdown list items with links and descriptions. Returns raw content
+# Scans for markdown list items with links and optional descriptions. Returns raw content
 # if no links are found in the expected format.
 #
 # @param content [String] raw section content
@@ -93,11 +93,13 @@ module LlmDocsBuilder
 def parse_section_content(content)
   links = []
 
-
+  # Updated regex: description is optional (non-capturing group with ?)
+  # Use [^\n]* instead of .* to avoid matching across lines
+  content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
     links << {
       title: title,
       url: url,
-      description: description
+      description: description&.strip || ''
     }
   end
 
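Both regexes quoted in the 0.10.0 changelog can be tried on a short section to show the two parser fixes. With the old `/m` flag, `.` also matches newlines, so the first description's `(.*)` swallows the rest of the section and only one link is returned. The new pattern stops at the end of each line and makes the `: description` part optional. The sample section text and the `extract_links` name are illustrative. The hash-building step mirrors the updated `parse_section_content` block.

```ruby
# Both regexes are copied verbatim from the 0.10.0 changelog entry.
OLD_LINK_RE = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m
NEW_LINK_RE = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

SECTION = <<~MD
  - [README](README.md): Main documentation
  - [Getting Started](getting-started.md)
  - [API Reference](reference.md): Complete API reference
MD

# Mirrors the updated block in parse_section_content: a missing description becomes ''.
def extract_links(content, regex)
  content.scan(regex).map do |title, url, description|
    { title: title, url: url, description: description&.strip || '' }
  end
end

old_links = extract_links(SECTION, OLD_LINK_RE)
new_links = extract_links(SECTION, NEW_LINK_RE)

puts old_links.size # => 1 (with /m, (.*) runs to the end of the section)
puts new_links.size # => 3
```

The second entry also shows the optional-description rule. `- [Getting Started](getting-started.md)` has no colon, so the third capture is `nil`, and the `&.strip || ''` fallback turns it into an empty string.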