llm-docs-builder 0.8.2 → 0.9.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +18 -0
- data/Gemfile.lock +1 -1
- data/README.md +12 -122
- data/lib/llm_docs_builder/output_formatter.rb +46 -0
- data/lib/llm_docs_builder/transformers/heading_transformer.rb +10 -0
- data/lib/llm_docs_builder/version.rb +1 -1
- data/misc/diff.png +0 -0
- data/misc/logo.png +0 -0
- data/misc/logo_wide.png +0 -0
- metadata +4 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ddb04b5a0e30d913d2043a79a3d7e14d35bd0166cf7104a300457887b1019cf
+  data.tar.gz: 5301d904225d0d139a2c2dd8184695eb66e2d858b8f3b5a86026b16a1ccb8c44
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3daacfdde22c93023677e7e0e7487158b3e1f30a9c4e7ec2bf015bc220ff32296334411c74523aaa411d1ef21b1aa47a2e8cb2c7cada797922d89d5690f7ab0a
+  data.tar.gz: 50a8f8d29f9e79e6f5ab2774111385f6098378491b2732bb84a9d3eee0b2f5be4b6beae196ef76739a6ac1c81562da9ec6f3e66da562e657828874c55ec06ce1
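The SHA256 entries above can be checked against a downloaded copy of the package. A minimal sketch, assuming the gem archive has already been unpacked and the checksums file decompressed to plain YAML; `verify_checksum` is a hypothetical helper, not part of llm-docs-builder or RubyGems.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: compare a file's SHA256 digest with the value
# recorded under the "SHA256" key of a checksums.yaml document.
def verify_checksum(checksums_yaml, file_path, entry_name)
  expected = YAML.safe_load(checksums_yaml).fetch('SHA256').fetch(entry_name).to_s
  actual = Digest::SHA256.file(file_path).hexdigest
  actual == expected
end
```

Calling `verify_checksum(File.read('checksums.yaml'), 'data.tar.gz', 'data.tar.gz')` would return `true` only when the archive matches the recorded digest; the file names here are assumptions about the unpacked layout.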
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,23 @@
 # Changelog

+## 0.9.1 (2025-10-17)
+- [Fix] Fixed HeadingTransformer incorrectly treating hash symbols in code blocks as headings.
+  - Now properly tracks code block boundaries (fenced with ``` or ~~~)
+  - Skips heading processing for lines inside code blocks
+  - Prevents Ruby/Python/Shell comments from being interpreted as markdown headings
+  - Added comprehensive test coverage for code block handling
+
+## 0.9.0 (2025-10-17)
+- [Feature] **No AI Version Detection** - The `compare` command now detects when websites don't serve AI-optimized versions.
+  - Triggers when reduction is <5% (nearly identical content for human and AI User-Agents)
+  - Displays prominent warning: "WARNING: NO DEDICATED AI VERSION DETECTED"
+  - Shows potential savings estimates based on typical 83% reduction rate
+  - Provides page-specific calculations (estimated token savings, potential size)
+  - Includes implementation guide with actionable steps
+  - Helps identify opportunities to optimize documentation
+- [Enhancement] Updated `OutputFormatter#display_comparison_results` to include marketing message for unoptimized sites.
+- [Enhancement] Added utility script `probe_karafka_simple.rb` for batch comparison testing.
+
 ## 0.8.2 (2025-10-17)
 - [Fix] Fixed Docker workflow test to properly invoke help command (use `generate --help` instead of `--help`).

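The <5% trigger described in the 0.9.0 entry can be sketched as follows. This assumes the reduction is measured as the percentage drop in byte size between the human and AI responses; `reduction_percent` and `no_ai_version?` are hypothetical helper names, while the threshold of 5 matches the `NO_AI_VERSION_THRESHOLD` constant that appears in this diff.

```ruby
# Threshold value taken from the diff; the helpers below are illustrative.
NO_AI_VERSION_THRESHOLD = 5

# Percentage by which the AI response is smaller than the human one.
def reduction_percent(human_size, ai_size)
  return 0.0 if human_size.zero?

  ((human_size - ai_size) * 100.0 / human_size).round(1)
end

# True when the AI response is the same size as, or less than 5% smaller
# than, the human response. A larger AI response is not flagged here.
def no_ai_version?(human_size, ai_size)
  return false if ai_size > human_size

  reduction_percent(human_size, ai_size) < NO_AI_VERSION_THRESHOLD
end
```

Under these assumptions a page that shrinks from 1000 to 990 bytes is flagged, while one that shrinks to 170 bytes is not.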
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -1,3 +1,7 @@
+<p align="center">
+  <img src="misc/logo_wide.png" alt="llm-docs-builder logo">
+</p>
+
 # llm-docs-builder

 [](
@@ -11,10 +15,13 @@ llm-docs-builder transforms markdown documentation to be AI-friendly and generat

 When LLMs fetch documentation, they typically get HTML pages designed for humans - complete with navigation bars, footers, JavaScript, CSS, and other overhead. This wastes 70-90% of your context window on content that doesn't help answer questions.

-**Real
-
-
-
+**Real-world results from [Karafka documentation](https://karafka.io/docs/) (10 pages analyzed):**
+
+<p align="center">
+  <img src="misc/diff.png" alt="Karafka documentation optimization results">
+</p>
+
+**Average reduction: 83% fewer tokens**

 ## Quick Start

@@ -138,11 +145,6 @@ excludes:
   - "**/drafts/**"
 ```

-**Configuration precedence:**
-1. CLI flags (highest)
-2. Config file
-3. Defaults
-
 ## CLI Commands

 ```bash
@@ -164,38 +166,6 @@ llm-docs-builder version # Show version
 -v, --verbose Detailed output
 ```

-## Ruby API
-
-```ruby
-require 'llm_docs_builder'
-
-# Transform single file with custom options
-transformed = LlmDocsBuilder.transform_markdown(
-  'README.md',
-  base_url: 'https://myproject.io',
-  remove_code_examples: true,
-  remove_images: true,
-  generate_toc: true,
-  custom_instruction: 'AI-optimized documentation'
-)
-
-# Bulk transform
-files = LlmDocsBuilder.bulk_transform(
-  './docs',
-  base_url: 'https://myproject.io',
-  suffix: '.llm',
-  remove_duplicates: true,
-  generate_toc: true
-)
-
-# Generate llms.txt
-content = LlmDocsBuilder.generate_from_docs(
-  './docs',
-  base_url: 'https://myproject.io',
-  title: 'My Project'
-)
-```
-
 ## Serving Optimized Docs to AI Bots

 After using `bulk-transform` with `suffix: .llm`, configure your web server to serve optimized versions to AI bots:
@@ -221,27 +191,6 @@ location ~ ^/docs/(.*)\.md$ {
 }
 ```

-## Real-World Results: Karafka Framework
-
-**Before:** 140+ lines of custom transformation code
-
-**After:** 6 lines of configuration
-```yaml
-docs: ./online/docs
-base_url: https://karafka.io/docs
-convert_urls: true
-remove_comments: true
-remove_badges: true
-remove_frontmatter: true
-normalize_whitespace: true
-suffix: "" # In-place for build pipeline
-```
-
-**Results:**
-- 93% average token reduction
-- 20-36x smaller files
-- Automated via GitHub Actions
-
 ## Docker Usage

 ```bash
@@ -281,43 +230,21 @@ layout: docs

 [Click here to see the complete API documentation](./api.md)

-```ruby
 api = API.new
 ```

-
-```
-
 **After transformation (with default options):**
+
 ```markdown
 # API Documentation

 [complete API documentation](./api.md)

-```ruby
 api = API.new
 ```
-```

 **Token reduction:** ~40-60% depending on configuration

-## FAQ
-
-**Q: Do I need to use llms.txt?**
-No. The compare and transform commands work independently.
-
-**Q: Will this change how humans see my docs?**
-Not with default `suffix: .llm`. Separate files are served only to AI bots.
-
-**Q: Can I use this in my build pipeline?**
-Yes. Use `suffix: ""` for in-place transformation.
-
-**Q: How do I know if it's working?**
-Use `llm-docs-builder compare` to measure before and after.
-
-**Q: What about private documentation?**
-Use the `excludes` option to skip sensitive files.
-
 ## RAG Enhancement Features

 ### Heading Normalization
@@ -405,43 +332,6 @@ All compression features can be used individually for fine-grained control:
 - `convert_urls: true` - Convert `.html`/`.htm` URLs to `.md` format
 - `normalize_whitespace: true` - Reduce excessive blank lines and remove trailing whitespace

-### Example Usage
-
-```ruby
-# Fine-grained control
-LlmDocsBuilder.transform_markdown(
-  'README.md',
-  remove_frontmatter: true,
-  remove_badges: true,
-  remove_images: true,
-  simplify_links: true,
-  generate_toc: true,
-  normalize_whitespace: true
-)
-```
-
-Or configure via YAML:
-
-```yaml
-# llm-docs-builder.yml
-docs: ./docs
-base_url: https://myproject.io
-suffix: .llm
-
-# Pick exactly what you need
-remove_frontmatter: true
-remove_comments: true
-remove_badges: true
-remove_images: true
-simplify_links: true
-generate_toc: true
-normalize_whitespace: true
-```
-
-## Contributing
-
-Bug reports and pull requests welcome at [github.com/mensfeld/llm-docs-builder](https://github.com/mensfeld/llm-docs-builder).
-
 ## License

 Available as open source under the [MIT License](https://opensource.org/licenses/MIT).
data/lib/llm_docs_builder/output_formatter.rb
CHANGED
@@ -8,6 +8,8 @@ module LlmDocsBuilder
   #
   # @api private
   class OutputFormatter
+    # Threshold percentage below which we consider there's no AI-optimized version
+    NO_AI_VERSION_THRESHOLD = 5
     # Format bytes into human-readable string
     #
     # @param bytes [Integer] number of bytes
@@ -56,10 +58,16 @@ module LlmDocsBuilder

       if result[:reduction_bytes].positive?
         display_reduction(result)
+
+        # Detect if there's no dedicated AI version
+        if result[:reduction_percent] < NO_AI_VERSION_THRESHOLD
+          display_no_ai_version_message(result)
+        end
       elsif result[:reduction_bytes].negative?
         display_increase(result)
       else
         puts 'Same size'
+        display_no_ai_version_message(result)
       end

       puts '=' * 60
@@ -89,5 +97,43 @@ module LlmDocsBuilder
       puts "Token increase: #{format_number(token_increase)} tokens (#{token_increase_percent}%)"
       puts "Factor: #{result[:factor]}x larger"
     end
+
+    # Display message when no dedicated AI version is detected
+    #
+    # @param result [Hash] comparison results
+    # @api private
+    def self.display_no_ai_version_message(result)
+      puts ''
+      puts 'WARNING: NO DEDICATED AI VERSION DETECTED'
+      puts ''
+      puts 'The server is returning nearly identical content to both human and AI'
+      puts 'User-Agents, indicating no AI-optimized version is currently served.'
+      puts ''
+      puts 'POTENTIAL SAVINGS WITH AI OPTIMIZATION:'
+      puts ''
+      puts 'Based on typical documentation optimization results, you could expect:'
+      puts ' • 67-95% token reduction (average 83%)'
+      puts ' • 3-20x smaller file sizes'
+      puts ' • Faster LLM processing times'
+      puts ' • Reduced API costs for AI queries'
+      puts ' • Improved response accuracy'
+      puts ''
+      puts "For this page specifically (~#{format_number(result[:human_tokens])} tokens):"
+      puts " • Estimated savings: ~#{format_number((result[:human_tokens] * 0.83).round)} tokens (83% reduction)"
+      puts " • Could reduce to: ~#{format_number((result[:human_tokens] * 0.17).round)} tokens"
+      puts " • Potential size: ~#{format_bytes((result[:human_size] * 0.17).round)}"
+      puts ''
+      puts 'HOW TO IMPLEMENT AI-OPTIMIZED DOCUMENTATION:'
+      puts ''
+      puts '1. Transform your docs with llm-docs-builder:'
+      puts ' llm-docs-builder bulk-transform --docs ./docs --config llm-docs-builder.yml'
+      puts ''
+      puts '2. Configure your web server to serve .md files to AI bots:'
+      puts ' See: https://github.com/mensfeld/llm-docs-builder#serving-optimized-docs'
+      puts ''
+      puts '3. Measure your actual savings:'
+      puts ' llm-docs-builder compare --url <your-url> --file <local-md>'
+      puts ''
+    end
   end
 end
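The page-specific estimates printed by `display_no_ai_version_message` reduce to a few multiplications. A small sketch of that arithmetic, assuming a result hash with the `:human_tokens` and `:human_size` keys used in the diff; `estimate_savings` and `TYPICAL_REDUCTION` are hypothetical names, and the 0.83 factor is the average quoted in the message rather than a guaranteed outcome.

```ruby
# Average reduction quoted in the warning text (an estimate, not a measurement).
TYPICAL_REDUCTION = 0.83

# Hypothetical helper mirroring the arithmetic in the message above.
def estimate_savings(result)
  remaining = 1 - TYPICAL_REDUCTION
  {
    saved_tokens: (result[:human_tokens] * TYPICAL_REDUCTION).round,
    remaining_tokens: (result[:human_tokens] * remaining).round,
    remaining_bytes: (result[:human_size] * remaining).round
  }
end
```

Keeping the factor in one constant means the saved and remaining figures stay complementary, whereas the diff hard-codes 0.83 and 0.17 separately.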
data/lib/llm_docs_builder/transformers/heading_transformer.rb
CHANGED
@@ -38,8 +38,18 @@ module LlmDocsBuilder
         separator = options[:heading_separator] || ' / '
         heading_stack = []
         lines = content.lines
+        in_code_block = false

         transformed_lines = lines.map do |line|
+          # Track code block boundaries (fenced code blocks with ``` or ~~~)
+          if line.match?(/^```|^~~~/)
+            in_code_block = !in_code_block
+            next line
+          end
+
+          # Skip heading processing if inside code block
+          next line if in_code_block
+
           # Match markdown headings (1-6 hash symbols followed by space and text)
           heading_match = line.match(/^(#+)\s+(.+)$/)

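The fence tracking added in this hunk can be exercised in isolation. The sketch below is a simplified stand-in, not the gem's `HeadingTransformer`: `heading_lines` is a hypothetical helper that only collects the lines that would be treated as headings, using the same fence and heading patterns as the diff.

```ruby
# Returns the lines treated as markdown headings, ignoring anything
# inside fenced code blocks, in the same way as the hunk above.
def heading_lines(content)
  in_code_block = false

  content.lines.each_with_object([]) do |line, headings|
    # A fence line toggles the code-block state and is never a heading.
    if line.match?(/^```|^~~~/)
      in_code_block = !in_code_block
      next
    end

    # Comments such as "# not a heading" inside a fence are skipped.
    next if in_code_block

    headings << line.chomp if line.match?(/^(#+)\s+(.+)$/)
  end
end
```

Because the state is a plain toggle, any fence line flips it, so a backtick fence nested inside a tilde fence would end the block early; that is a property of this simple approach rather than something the source discusses.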
data/misc/diff.png
ADDED
Binary file
data/misc/logo.png
ADDED
Binary file
data/misc/logo_wide.png
ADDED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm-docs-builder
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.9.1
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -147,6 +147,9 @@ files:
 - lib/llm_docs_builder/version.rb
 - llm-docs-builder.gemspec
 - llm-docs-builder.yml.example
+- misc/diff.png
+- misc/logo.png
+- misc/logo_wide.png
 - renovate.json
 homepage: https://github.com/mensfeld/llm-docs-builder
 licenses: