llm-docs-builder 0.9.4 → 0.10.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/Gemfile.lock +1 -1
- data/README.md +66 -3
- data/lib/llm_docs_builder/config.rb +1 -0
- data/lib/llm_docs_builder/generator.rb +54 -21
- data/lib/llm_docs_builder/parser.rb +5 -3
- data/lib/llm_docs_builder/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e6407836216f436b728247009f614ea4ea5c2b4de0edf855717129460df4b309
+  data.tar.gz: 8a556fc0b6307529f5c615c05b082521bd021e9e9f34ca30c8bb21d22b21deb2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3a0b657545415c35187fa1595f3fcc5bd5c27a0c1a292bf00635d3c6f14221ac0a254f008f83acc1f7351ef76c6649e6b0540ac585cd64a670b05ef725b0308e
+  data.tar.gz: 68e95142e374ebae3c292db724a9163c396fd423eaa04983bbdaefd6f6cbb0bc72822dfbc5af9fd8f357f1545320db70bf5a6c8a2413fc5f796e73195adcdd13
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,19 @@
 # Changelog

+## 0.10.0 (2025-10-27)
+- [Feature] **llms.txt Specification Compliance** - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
+- **Metadata Format**: Metadata now appears within the description field using parentheses and comma separators: `- [title](url): description (tokens:450, updated:2025-10-13, priority:high)`
+- **Optional Descriptions**: Parser now correctly handles links without descriptions: `- [title](url)` per spec
+- **Multi-Section Support**: Documents automatically organized into `Documentation`, `Examples`, and `Optional` sections based on priority
+- **Body Content Support**: Added optional `body` config parameter for custom content between description and sections
+- Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
+- Empty sections are automatically omitted from output
+- Updated parser regex from `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m` to `/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/` to make descriptions optional
+- Fixed multiline regex greedy matching issue that was capturing only one link per section
+- [Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
+- [Docs] Updated README with multi-section organization examples and body content usage
+- **Breaking Change**: Metadata format has changed from `tokens:450 updated:2025-10-13` to `(tokens:450, updated:2025-10-13)` for spec compliance
+
 ## 0.9.4 (2025-10-27)
 - [Feature] **Auto-Exclude Hidden Directories** - Hidden directories (starting with `.`) are now automatically excluded by default to prevent noise from `.git`, `.lint`, `.github`, etc.
 - Adds `include_hidden: false` as default behavior
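The breaking change above moves metadata into the description as a parenthesized, comma-separated suffix, so a tool that reads the generated llms.txt may need to split it back out. The following is a minimal consumer-side sketch, not part of llm-docs-builder: `split_metadata` and `METADATA_SUFFIX` are hypothetical names, and the pattern assumes lowercase keys and values that contain no commas or parentheses.

```ruby
# Hypothetical helper (not part of llm-docs-builder): split a 0.10.0-style
# description such as "Quick start guide (tokens:450, updated:2025-10-13)"
# into its prose and a Hash of metadata.
# Assumes lowercase keys and values without commas or parentheses.
METADATA_SUFFIX = /\s*\(([a-z_]+:[^,()]+(?:,\s*[a-z_]+:[^,()]+)*)\)\z/

def split_metadata(description)
  match = description.match(METADATA_SUFFIX)
  return [description, {}] unless match

  # "tokens:450" -> ["tokens", "450"]; limit 2 keeps later colons in the value
  pairs = match[1].split(/,\s*/).map { |pair| pair.split(':', 2) }
  [match.pre_match, pairs.to_h]
end

text, meta = split_metadata('Quick start guide (tokens:450, updated:2025-10-13, priority:high)')
puts text           # Quick start guide
puts meta['tokens'] # 450
```

A link without a description comes out as `": (tokens:450)"` after the URL, which this helper returns as an empty prose string plus the metadata. Because the line format does not mark metadata separately from free text, a description that itself ends in a `(key:value)` parenthetical would be misread as metadata.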
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -109,6 +109,7 @@ docs: ./docs
 base_url: https://myproject.io
 title: My Project
 description: Brief description
+body: Optional body content between description and sections
 output: llms.txt
 suffix: .llm
 verbose: false
@@ -289,9 +290,9 @@ Generate enriched llms.txt files with token counts, timestamps, and priority lab

 **Enhanced llms.txt (with metadata enabled):**
 ```markdown
-- [Getting Started](https://myproject.io/docs/Getting-Started.md) tokens:450 updated:2025-10-13 priority:high
-- [Configuration](https://myproject.io/docs/Configuration.md) tokens:2800 updated:2025-10-12 priority:high
-- [Advanced Topics](https://myproject.io/docs/Advanced.md) tokens:5200 updated:2025-09-15 priority:medium
+- [Getting Started](https://myproject.io/docs/Getting-Started.md): Quick start guide (tokens:450, updated:2025-10-13, priority:high)
+- [Configuration](https://myproject.io/docs/Configuration.md): Configuration options (tokens:2800, updated:2025-10-12, priority:high)
+- [Advanced Topics](https://myproject.io/docs/Advanced.md): Deep dive topics (tokens:5200, updated:2025-09-15, priority:medium)
 ```

 **Benefits:**
@@ -309,6 +310,68 @@ include_priority: true # Show priority labels (high/medium/low)
 calculate_compression: true # Show compression ratios (slower, requires transformation)
 ```

+**Note:** Metadata is formatted according to the llms.txt specification, appearing within the description field using parentheses and comma separators for spec compliance.
+
+### Multi-Section Organization
+
+Documents are automatically organized into multiple sections based on priority, following the llms.txt specification:
+
+**Priority-based categorization:**
+- **Documentation** (priority 1-3): Essential docs like README, getting started guides, user guides
+- **Examples** (priority 4-5): Tutorials and example files
+- **Optional** (priority 6-7): Advanced topics and reference documentation
+
+**Example output:**
+```markdown
+# My Project
+
+> Project description
+
+## Documentation
+
+- [README](README.md): Main documentation
+- [Getting Started](getting-started.md): Quick start guide
+
+## Examples
+
+- [Basic Tutorial](tutorial.md): Step-by-step tutorial
+- [Code Examples](examples.md): Example code
+
+## Optional
+
+- [Advanced Topics](advanced.md): Deep dive into advanced features
+- [API Reference](reference.md): Complete API reference
+```
+
+Empty sections are automatically omitted. The "Optional" section aligns with the llms.txt spec for marking secondary content that can be skipped when context windows are limited.
+
+### Body Content
+
+Add custom body content between the description and documentation sections:
+
+```yaml
+# llm-docs-builder.yml
+title: My Project
+description: Brief description
+body: |
+  This framework is built on Ruby and focuses on performance.
+  Key concepts: streaming, batching, and parallel processing.
+docs: ./docs
+```
+
+This produces:
+```markdown
+# My Project
+
+> Brief description
+
+This framework is built on Ruby and focuses on performance.
+Key concepts: streaming, batching, and parallel processing.
+
+## Documentation
+...
+```
+
 ## Advanced Compression Options

 All compression features can be used individually for fine-grained control:
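The README's `body: |` example relies on a YAML literal block scalar. The sketch below loads that config with Ruby's standard `yaml` library and assembles the header the way the 0.10.0 generator hunk does: description line, blank line, then the body followed by a blank line. `header_lines` is a hypothetical name, and the `# title` line is an assumption, since the diff does not show how the title is emitted.

```ruby
require 'yaml'

# Same keys as the README example; `|` is a YAML literal block scalar.
config = YAML.safe_load(<<~YML)
  title: My Project
  description: Brief description
  body: |
    This framework is built on Ruby and focuses on performance.
    Key concepts: streaming, batching, and parallel processing.
YML

# Hypothetical mirror of the generator's header assembly. The title line is
# an assumption; the description/body steps follow the 0.10.0 diff.
def header_lines(config)
  content = ["# #{config['title']}", '']
  content << "> #{config['description']}" if config['description']
  content << ''
  body = config['body']
  if body && !body.empty?
    content << body
    content << ''
  end
  content
end

puts header_lines(config).join("\n")

# The default "clip" chomping of `|` keeps one trailing newline...
p config['body'].end_with?("processing.\n") # true
# ...while `|-` strips it.
p YAML.safe_load("body: |-\n  a\n  b\n")['body'] # "a\nb"
```

Going by the generator code in this diff, which appends the body verbatim and then pushes `''` before joining with `"\n"`, a body that ends in a newline would likely leave two blank lines before the first `##` section rather than the one shown in the README output; `body: |-` avoids that. The released gem may normalize the body somewhere not shown here.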
data/lib/llm_docs_builder/config.rb
CHANGED
@@ -61,6 +61,7 @@ module LlmDocsBuilder
         base_url: options[:base_url] || self['base_url'],
         title: options[:title] || self['title'],
         description: options[:description] || self['description'],
+        body: options[:body] || self['body'],
         output: options[:output] || self['output'] || 'llms.txt',
         convert_urls: if options.key?(:convert_urls)
           options[:convert_urls]
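The added `body:` entry follows the same `options[:key] || self['key']` pattern as its neighbours, so a value in the options hash takes precedence over the config file. The sketch below isolates that rule; `resolve_body` is a hypothetical name, and plain Hashes stand in for the real options and config objects.

```ruby
# Hypothetical stand-in for the `body:` line in config.rb:
# an explicit option wins, otherwise fall back to the file value.
def resolve_body(options, file_config)
  options[:body] || file_config['body']
end

file_config = { 'body' => 'From the YAML file' }

p resolve_body({}, file_config)                       # "From the YAML file"
p resolve_body({ body: 'From options' }, file_config) # "From options"
p resolve_body({ body: nil }, file_config)            # "From the YAML file"
p resolve_body({ body: '' }, file_config)             # "" (empty strings are truthy in Ruby)
```

Because `||` only falls through on `nil` and `false`, an empty string in `options[:body]` replaces the file's body, and the generator's `!options[:body].empty?` check then omits the body entirely. `convert_urls` uses `options.key?` instead, which is the pattern that lets an explicit `false` win over the file value.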
data/lib/llm_docs_builder/generator.rb
CHANGED
@@ -210,7 +210,11 @@ module LlmDocsBuilder

     # Constructs llms.txt content from analyzed documentation files
     #
-    # Combines title, description, and documentation links into formatted output
+    # Combines title, description, body content, and documentation links into formatted output.
+    # Organizes documents into sections based on priority:
+    # - Priority 1-3: Documentation (essential docs like README, getting started)
+    # - Priority 4-5: Examples (tutorials, example files)
+    # - Priority 6-7: Optional (advanced topics, reference docs)
     #
     # @param docs [Array<Hash>] analyzed file metadata
     # @return [String] formatted llms.txt content
@@ -224,31 +228,60 @@ module LlmDocsBuilder
       content << "> #{description}" if description
       content << ''

-
-
+      # Add optional body content
+      if options[:body] && !options[:body].empty?
+        content << options[:body]
         content << ''
+      end

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+      if docs.any?
+        # Categorize docs by priority into sections
+        sections = {
+          'Documentation' => docs.select { |d| d[:priority] <= 3 },
+          'Examples' => docs.select { |d| d[:priority] >= 4 && d[:priority] <= 5 },
+          'Optional' => docs.select { |d| d[:priority] >= 6 }
+        }
+
+        # Build each section (skip empty ones)
+        sections.each do |section_name, section_docs|
+          next if section_docs.empty?
+
+          content << "## #{section_name}"
+          content << ''
+
+          section_docs.each do |doc|
+            url = build_url(doc[:path])
+
+            # Build metadata string if enabled
+            metadata_str = nil
+            if options[:include_metadata]
+              metadata_parts = []
+              metadata_parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
+              metadata_parts << "compression:#{doc[:compression]}" if doc[:compression]
+              metadata_parts << "updated:#{doc[:updated]}" if doc[:updated]
+              metadata_parts << priority_label(doc[:priority]) if options[:include_priority]
+
+              metadata_str = "(#{metadata_parts.join(', ')})" unless metadata_parts.empty?
+            end
+
+            # Build line according to spec: - [title](url): description (metadata)
+            line = if doc[:description] && !doc[:description].empty?
+                     base = "- [#{doc[:title]}](#{url}): #{doc[:description]}"
+                     metadata_str ? "#{base} #{metadata_str}" : base
+                   else
+                     # No description: - [title](url) (metadata)
+                     base = "- [#{doc[:title]}](#{url})"
+                     metadata_str ? "#{base}: #{metadata_str}" : base
+                   end
+
+            content << line
             end

-        content <<
+          content << ''
         end
+
+        # Remove trailing empty line
+        content.pop if content.last == ''
       end

       "#{content.join("\n")}\n"
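The new generator logic comes down to two steps: bucket docs by priority, then render each entry as `- [title](url): description (metadata)`. The sketch below reproduces those steps outside the class so they can run on their own. It is an approximation under stated assumptions: `build_url` and `priority_label` are not shown in the diff, so each doc carries a ready-made `:url`, the priority label and compression ratio are left out, and metadata is always rendered instead of checking `include_metadata`. `categorize`, `format_line`, and `render_sections` are hypothetical names.

```ruby
# Hypothetical, standalone approximation of the 0.10.0 section layout.
# Differences from the real Generator: docs carry a ready-made :url (no
# build_url), metadata is always rendered, and compression and the
# priority label are omitted.

# Bucket docs by priority with the same comparisons as the diff.
def categorize(docs)
  {
    'Documentation' => docs.select { |d| d[:priority] <= 3 },
    'Examples' => docs.select { |d| d[:priority].between?(4, 5) },
    'Optional' => docs.select { |d| d[:priority] >= 6 }
  }
end

# "- [title](url): description (metadata)", or "- [title](url): (metadata)"
# when the description is missing or empty.
def format_line(doc, metadata_parts)
  metadata = metadata_parts.empty? ? nil : "(#{metadata_parts.join(', ')})"
  base = "- [#{doc[:title]}](#{doc[:url]})"
  if doc[:description] && !doc[:description].empty?
    base = "#{base}: #{doc[:description]}"
    metadata ? "#{base} #{metadata}" : base
  else
    metadata ? "#{base}: #{metadata}" : base
  end
end

def render_sections(docs)
  lines = []
  categorize(docs).each do |name, section_docs|
    next if section_docs.empty? # empty sections are omitted

    lines << "## #{name}" << ''
    section_docs.each do |doc|
      parts = []
      parts << "tokens:#{doc[:tokens]}" if doc[:tokens]
      parts << "updated:#{doc[:updated]}" if doc[:updated]
      lines << format_line(doc, parts)
    end
    lines << ''
  end
  lines.pop if lines.last == '' # drop the trailing blank line
  lines.join("\n")
end

docs = [
  { title: 'README', url: 'README.md', description: 'Main documentation', priority: 1, tokens: 450 },
  { title: 'API Reference', url: 'reference.md', description: '', priority: 7, updated: '2025-10-13' }
]

puts render_sections(docs)
# ## Documentation
#
# - [README](README.md): Main documentation (tokens:450)
#
# ## Optional
#
# - [API Reference](reference.md): (updated:2025-10-13)
```

The open-ended comparisons mean priorities above 7 also land in Optional and priorities below 1 in Documentation. A doc with no `:priority` would raise `NoMethodError` on `nil <= 3`, so the comparisons rely on every doc having a priority by this point.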
data/lib/llm_docs_builder/parser.rb
CHANGED
@@ -85,7 +85,7 @@ module LlmDocsBuilder

     # Extracts markdown links from section content into structured format
     #
-    # Scans for markdown list items with links and descriptions. Returns raw content
+    # Scans for markdown list items with links and optional descriptions. Returns raw content
     # if no links are found in the expected format.
     #
     # @param content [String] raw section content
@@ -93,11 +93,13 @@ module LlmDocsBuilder
     def parse_section_content(content)
       links = []

-
+      # Updated regex: description is optional (non-capturing group with ?)
+      # Use [^\n]* instead of .* to avoid matching across lines
+      content.scan(/^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/) do |title, url, description|
         links << {
           title: title,
           url: url,
-          description: description
+          description: description&.strip || ''
         }
       end
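The changelog's two regex notes can be checked directly. The sketch below runs the 0.9.4 pattern quoted in the changelog and the 0.10.0 pattern from `parse_section_content` over a three-link section. In Ruby, `^` and `$` always match at line boundaries, and the old `/m` flag additionally lets `.` match newlines, so the greedy `(.*)` runs to the end of the section and the scan yields a single link; the old pattern also requires a `:` after the URL, so a link without a description cannot match it. `parse_links` is a simplified, hypothetical stand-in for `parse_section_content`.

```ruby
# Patterns as quoted in the 0.10.0 changelog entry.
OLD_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m
NEW_LINK = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

# Simplified, hypothetical stand-in for Parser#parse_section_content.
def parse_links(content, pattern)
  links = []
  content.scan(pattern) do |title, url, description|
    links << { title: title, url: url, description: description&.strip || '' }
  end
  links
end

section = <<~MD
  - [README](README.md): Main documentation
  - [Getting Started](getting-started.md)
  - [API Reference](reference.md): Complete API reference
MD

puts parse_links(section, OLD_LINK).size # 1: with /m, (.*) swallows the rest of the section
puts parse_links(section, NEW_LINK).size # 3
parse_links(section, NEW_LINK).each do |link|
  puts "#{link[:title]} -> #{link[:description].inspect}"
end
```

Dropping `/m` is what fixes the one-link problem; without that flag `.` already stops at a newline, so `[^\n]*` behaves the same as `.*` here. One edge case remains in the new pattern: `\s*` after the colon can still match a newline, so a line ending in a bare `:` pulls the following line in as its description.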