llm-docs-builder 0.9.0 → 0.9.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/Gemfile.lock +1 -1
- data/README.md +8 -118
- data/lib/llm_docs_builder/transformers/heading_transformer.rb +10 -0
- data/lib/llm_docs_builder/version.rb +1 -1
- data/misc/diff.png +0 -0
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e2b8530a9f890cb3deed824d3520f0a267a10b1e60a38e6bc8948b6447f50200
+  data.tar.gz: 7fdca5e5403050d9f4845bf7deb29fd64fc7df8758004a8f97d5e5ff4c519d27
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0bd796067af5a3246ff99989d0190c33882218a5f5ea0a6f8dfc9a6291ab5fba122269fc1b522acebd2782d07f1ff99a38557efcf616e5e88bfb5ad1c34ec75e
+  data.tar.gz: 9b2e456821ccc73790fb7feb8f1046c89d4111c47be4f6755ba2059ea6ab2b5433229e5053a883b0b976b73875889af085f266089d57468084550aa6deb248d3
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,19 @@
 # Changelog

+## 0.9.2 (2025-10-17)
+- [Fix] Tackle one more block boundaries tracking edge-case.
+
+## 0.9.1 (2025-10-17)
+- [Fix] Fixed HeadingTransformer incorrectly treating hash symbols in code blocks as headings.
+  - Now properly tracks code block boundaries (fenced with ``` or ~~~)
+  - Fixed regex pattern from `/^```|^~~~/` to `/^(```|~~~)/` for correct operator precedence
+  - Skips heading processing for lines inside code blocks
+  - Prevents Ruby/Python/Shell comments from being interpreted as markdown headings
+  - Added 5 comprehensive test cases covering multiple scenarios to prevent regression
+  - Skips heading processing for lines inside code blocks
+  - Prevents Ruby/Python/Shell comments from being interpreted as markdown headings
+  - Added comprehensive test coverage for code block handling
+
 ## 0.9.0 (2025-10-17)
 - [Feature] **No AI Version Detection** - The `compare` command now detects when websites don't serve AI-optimized versions.
   - Triggers when reduction is <5% (nearly identical content for human and AI User-Agents)
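The "<5% reduction" rule in the 0.9.0 entry can be illustrated with a little arithmetic. This is a minimal sketch, not the gem's implementation: `reduction_percent`, `no_ai_version?`, and `NO_AI_VERSION_THRESHOLD` are hypothetical names, and it assumes the reduction is computed from two content sizes (the changelog does not say whether `compare` measures bytes or tokens).

```ruby
# Hypothetical helpers for illustration only; not part of llm-docs-builder's API.
NO_AI_VERSION_THRESHOLD = 5.0 # percent, taken from the changelog's "<5%" rule

# Percentage by which the AI-facing response is smaller than the human-facing one.
def reduction_percent(human_size, ai_size)
  return 0.0 if human_size.zero? # avoid dividing by zero on empty responses

  (human_size - ai_size) * 100.0 / human_size
end

# True when both User-Agents receive nearly identical content.
def no_ai_version?(human_size, ai_size)
  reduction_percent(human_size, ai_size) < NO_AI_VERSION_THRESHOLD
end

puts no_ai_version?(1000, 980) # 2% reduction, below the threshold
puts no_ai_version?(1000, 170) # 83% reduction, above the threshold
```

With these assumed inputs, the first call reports that no AI-optimized version is served and the second reports that one is.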
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -15,10 +15,13 @@ llm-docs-builder transforms markdown documentation to be AI-friendly and generat

 When LLMs fetch documentation, they typically get HTML pages designed for humans - complete with navigation bars, footers, JavaScript, CSS, and other overhead. This wastes 70-90% of your context window on content that doesn't help answer questions.

-**Real
-
-
-
+**Real-world results from [Karafka documentation](https://karafka.io/docs/) (10 pages analyzed):**
+
+<p align="center">
+  <img src="misc/diff.png" alt="Karafka documentation optimization results">
+</p>
+
+**Average reduction: 83% fewer tokens**

 ## Quick Start

@@ -142,11 +145,6 @@ excludes:
 - "**/drafts/**"
 ```

-**Configuration precedence:**
-1. CLI flags (highest)
-2. Config file
-3. Defaults
-
 ## CLI Commands

 ```bash
@@ -168,38 +166,6 @@ llm-docs-builder version # Show version
 -v, --verbose Detailed output
 ```

-## Ruby API
-
-```ruby
-require 'llm_docs_builder'
-
-# Transform single file with custom options
-transformed = LlmDocsBuilder.transform_markdown(
-  'README.md',
-  base_url: 'https://myproject.io',
-  remove_code_examples: true,
-  remove_images: true,
-  generate_toc: true,
-  custom_instruction: 'AI-optimized documentation'
-)
-
-# Bulk transform
-files = LlmDocsBuilder.bulk_transform(
-  './docs',
-  base_url: 'https://myproject.io',
-  suffix: '.llm',
-  remove_duplicates: true,
-  generate_toc: true
-)
-
-# Generate llms.txt
-content = LlmDocsBuilder.generate_from_docs(
-  './docs',
-  base_url: 'https://myproject.io',
-  title: 'My Project'
-)
-```
-
 ## Serving Optimized Docs to AI Bots

 After using `bulk-transform` with `suffix: .llm`, configure your web server to serve optimized versions to AI bots:
@@ -225,27 +191,6 @@ location ~ ^/docs/(.*)\.md$ {
 }
 ```

-## Real-World Results: Karafka Framework
-
-**Before:** 140+ lines of custom transformation code
-
-**After:** 6 lines of configuration
-```yaml
-docs: ./online/docs
-base_url: https://karafka.io/docs
-convert_urls: true
-remove_comments: true
-remove_badges: true
-remove_frontmatter: true
-normalize_whitespace: true
-suffix: "" # In-place for build pipeline
-```
-
-**Results:**
-- 93% average token reduction
-- 20-36x smaller files
-- Automated via GitHub Actions
-
 ## Docker Usage

 ```bash
@@ -285,43 +230,21 @@ layout: docs

 [Click here to see the complete API documentation](./api.md)

-```ruby
 api = API.new
 ```

-
-```
-
 **After transformation (with default options):**
+
 ```markdown
 # API Documentation

 [complete API documentation](./api.md)

-```ruby
 api = API.new
 ```
-```

 **Token reduction:** ~40-60% depending on configuration

-## FAQ
-
-**Q: Do I need to use llms.txt?**
-No. The compare and transform commands work independently.
-
-**Q: Will this change how humans see my docs?**
-Not with default `suffix: .llm`. Separate files are served only to AI bots.
-
-**Q: Can I use this in my build pipeline?**
-Yes. Use `suffix: ""` for in-place transformation.
-
-**Q: How do I know if it's working?**
-Use `llm-docs-builder compare` to measure before and after.
-
-**Q: What about private documentation?**
-Use the `excludes` option to skip sensitive files.
-
 ## RAG Enhancement Features

 ### Heading Normalization
@@ -409,39 +332,6 @@ All compression features can be used individually for fine-grained control:
 - `convert_urls: true` - Convert `.html`/`.htm` URLs to `.md` format
 - `normalize_whitespace: true` - Reduce excessive blank lines and remove trailing whitespace

-### Example Usage
-
-```ruby
-# Fine-grained control
-LlmDocsBuilder.transform_markdown(
-  'README.md',
-  remove_frontmatter: true,
-  remove_badges: true,
-  remove_images: true,
-  simplify_links: true,
-  generate_toc: true,
-  normalize_whitespace: true
-)
-```
-
-Or configure via YAML:
-
-```yaml
-# llm-docs-builder.yml
-docs: ./docs
-base_url: https://myproject.io
-suffix: .llm
-
-# Pick exactly what you need
-remove_frontmatter: true
-remove_comments: true
-remove_badges: true
-remove_images: true
-simplify_links: true
-generate_toc: true
-normalize_whitespace: true
-```
-
 ## License

 Available as open source under the [MIT License](https://opensource.org/licenses/MIT).
data/lib/llm_docs_builder/transformers/heading_transformer.rb
CHANGED
@@ -38,8 +38,18 @@ module LlmDocsBuilder
 separator = options[:heading_separator] || ' / '
 heading_stack = []
 lines = content.lines
+in_code_block = false

 transformed_lines = lines.map do |line|
+  # Track code block boundaries (fenced code blocks with ``` or ~~~)
+  if line.match?(/^(```|~~~)/)
+    in_code_block = !in_code_block
+    next line
+  end
+
+  # Skip heading processing if inside code block
+  next line if in_code_block
+
   # Match markdown headings (1-6 hash symbols followed by space and text)
   heading_match = line.match(/^(#+)\s+(.+)$/)

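The fence tracking added in this hunk can be tried in isolation. The following is a minimal standalone sketch, not the gem's actual method: `headings_outside_code_blocks` is a hypothetical helper that reuses the fence and heading patterns from the hunk and simply collects heading text. Like the hunk, it toggles state on any line that starts with a fence marker, so it does not try to pair opening and closing fences by type or length.

```ruby
# Hypothetical helper for illustration; not part of llm-docs-builder.
# Returns the text of markdown headings that appear outside fenced code blocks.
def headings_outside_code_blocks(content)
  in_code_block = false

  content.lines.each_with_object([]) do |line, headings|
    # A line starting with ``` or ~~~ opens or closes a fenced block.
    if line.match?(/^(```|~~~)/)
      in_code_block = !in_code_block
      next
    end

    # Inside a fence, '#' lines are code comments, not headings.
    next if in_code_block

    heading_match = line.match(/^(#+)\s+(.+)$/)
    headings << heading_match[2] if heading_match
  end
end

sample = [
  '# Title',
  '',
  '~~~ruby',
  '# a comment, not a heading',
  '~~~',
  '',
  '## Section'
].join("\n")

p headings_outside_code_blocks(sample) # => ["Title", "Section"]
```

Without the `in_code_block` guard, the Ruby comment inside the fence would be reported as a third heading, which is the bug the 0.9.1 changelog entry describes.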
data/misc/diff.png
ADDED
Binary file
|
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm-docs-builder
 version: !ruby/object:Gem::Version
-  version: 0.9.
+  version: 0.9.2
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -147,6 +147,7 @@ files:
 - lib/llm_docs_builder/version.rb
 - llm-docs-builder.gemspec
 - llm-docs-builder.yml.example
+- misc/diff.png
 - misc/logo.png
 - misc/logo_wide.png
 - renovate.json