llm-docs-builder 0.8.2 → 0.9.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 574ebb294e2ebf55146cad9a17120f32aefac4ad22f01c72c35eb6dbb2b8cdf5
-   data.tar.gz: d1980a9b4186249f03016a7924376624de8b5cad632aa9a96b77cb089048e229
+   metadata.gz: 6ddb04b5a0e30d913d2043a79a3d7e14d35bd0166cf7104a300457887b1019cf
+   data.tar.gz: 5301d904225d0d139a2c2dd8184695eb66e2d858b8f3b5a86026b16a1ccb8c44
  SHA512:
-   metadata.gz: 39c8e6ebf9fccfd73be6e7ccff8345a16d869cb27051aceef1efb1f12da6c8fc1906ee5dd5fd5e485511dbf5cf7c28a707c24623451448bfde171213eff5a7ed
-   data.tar.gz: cb5f2301ad7d535e917565a9b6f0794596dd0dcf443ae2e43d973a592a7e4cf4c04b0c61c523ba05735f29cb6ce67d1e02a20c876274ba5b313485242df8683d
+   metadata.gz: 3daacfdde22c93023677e7e0e7487158b3e1f30a9c4e7ec2bf015bc220ff32296334411c74523aaa411d1ef21b1aa47a2e8cb2c7cada797922d89d5690f7ab0a
+   data.tar.gz: 50a8f8d29f9e79e6f5ab2774111385f6098378491b2732bb84a9d3eee0b2f5be4b6beae196ef76739a6ac1c81562da9ec6f3e66da562e657828874c55ec06ce1
data/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
  # Changelog

+ ## 0.9.1 (2025-10-17)
+ - [Fix] Fixed HeadingTransformer incorrectly treating hash symbols in code blocks as headings.
+   - Now properly tracks code block boundaries (fenced with ``` or ~~~)
+   - Skips heading processing for lines inside code blocks
+   - Prevents Ruby/Python/Shell comments from being interpreted as markdown headings
+   - Added comprehensive test coverage for code block handling
+
+ ## 0.9.0 (2025-10-17)
+ - [Feature] **No AI Version Detection** - The `compare` command now detects when websites don't serve AI-optimized versions.
+   - Triggers when reduction is <5% (nearly identical content for human and AI User-Agents)
+   - Displays prominent warning: "WARNING: NO DEDICATED AI VERSION DETECTED"
+   - Shows potential savings estimates based on typical 83% reduction rate
+   - Provides page-specific calculations (estimated token savings, potential size)
+   - Includes implementation guide with actionable steps
+   - Helps identify opportunities to optimize documentation
+ - [Enhancement] Updated `OutputFormatter#display_comparison_results` to include marketing message for unoptimized sites.
+ - [Enhancement] Added utility script `probe_karafka_simple.rb` for batch comparison testing.
+
  ## 0.8.2 (2025-10-17)
  - [Fix] Fixed Docker workflow test to properly invoke help command (use `generate --help` instead of `--help`).

data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     llm-docs-builder (0.8.2)
+     llm-docs-builder (0.9.1)
        zeitwerk (~> 2.6)

  GEM
data/README.md CHANGED
@@ -1,3 +1,7 @@
+ <p align="center">
+ <img src="misc/logo_wide.png" alt="llm-docs-builder logo">
+ </p>
+
  # llm-docs-builder

  [![CI](https://github.com/mensfeld/llm-docs-builder/actions/workflows/ci.yml/badge.svg)](
@@ -11,10 +15,13 @@ llm-docs-builder transforms markdown documentation to be AI-friendly and generat

  When LLMs fetch documentation, they typically get HTML pages designed for humans - complete with navigation bars, footers, JavaScript, CSS, and other overhead. This wastes 70-90% of your context window on content that doesn't help answer questions.

- **Real example from Karafka documentation:**
- - Human HTML version: 104.4 KB (~26,735 tokens)
- - AI markdown version: 21.5 KB (~5,496 tokens)
- - **Result: 79% reduction, 21,239 tokens saved, 5x smaller**
+ **Real-world results from [Karafka documentation](https://karafka.io/docs/) (10 pages analyzed):**
+
+ <p align="center">
+ <img src="misc/diff.png" alt="Karafka documentation optimization results">
+ </p>
+
+ **Average reduction: 83% fewer tokens**

  ## Quick Start

@@ -138,11 +145,6 @@ excludes:
  - "**/drafts/**"
  ```

- **Configuration precedence:**
- 1. CLI flags (highest)
- 2. Config file
- 3. Defaults
-
  ## CLI Commands

  ```bash
@@ -164,38 +166,6 @@ llm-docs-builder version # Show version
  -v, --verbose Detailed output
  ```

- ## Ruby API
-
- ```ruby
- require 'llm_docs_builder'
-
- # Transform single file with custom options
- transformed = LlmDocsBuilder.transform_markdown(
-   'README.md',
-   base_url: 'https://myproject.io',
-   remove_code_examples: true,
-   remove_images: true,
-   generate_toc: true,
-   custom_instruction: 'AI-optimized documentation'
- )
-
- # Bulk transform
- files = LlmDocsBuilder.bulk_transform(
-   './docs',
-   base_url: 'https://myproject.io',
-   suffix: '.llm',
-   remove_duplicates: true,
-   generate_toc: true
- )
-
- # Generate llms.txt
- content = LlmDocsBuilder.generate_from_docs(
-   './docs',
-   base_url: 'https://myproject.io',
-   title: 'My Project'
- )
- ```
-
  ## Serving Optimized Docs to AI Bots

  After using `bulk-transform` with `suffix: .llm`, configure your web server to serve optimized versions to AI bots:
@@ -221,27 +191,6 @@ location ~ ^/docs/(.*)\.md$ {
  }
  ```

- ## Real-World Results: Karafka Framework
-
- **Before:** 140+ lines of custom transformation code
-
- **After:** 6 lines of configuration
- ```yaml
- docs: ./online/docs
- base_url: https://karafka.io/docs
- convert_urls: true
- remove_comments: true
- remove_badges: true
- remove_frontmatter: true
- normalize_whitespace: true
- suffix: "" # In-place for build pipeline
- ```
-
- **Results:**
- - 93% average token reduction
- - 20-36x smaller files
- - Automated via GitHub Actions
-
  ## Docker Usage

  ```bash
@@ -281,43 +230,21 @@ layout: docs

  [Click here to see the complete API documentation](./api.md)

- ```ruby
  api = API.new
  ```

- ![Diagram](./diagram.png)
- ```
-
  **After transformation (with default options):**
+
  ```markdown
  # API Documentation

  [complete API documentation](./api.md)

- ```ruby
  api = API.new
  ```
- ```

  **Token reduction:** ~40-60% depending on configuration

- ## FAQ
-
- **Q: Do I need to use llms.txt?**
- No. The compare and transform commands work independently.
-
- **Q: Will this change how humans see my docs?**
- Not with default `suffix: .llm`. Separate files are served only to AI bots.
-
- **Q: Can I use this in my build pipeline?**
- Yes. Use `suffix: ""` for in-place transformation.
-
- **Q: How do I know if it's working?**
- Use `llm-docs-builder compare` to measure before and after.
-
- **Q: What about private documentation?**
- Use the `excludes` option to skip sensitive files.
-
  ## RAG Enhancement Features

  ### Heading Normalization
@@ -405,43 +332,6 @@ All compression features can be used individually for fine-grained control:
  - `convert_urls: true` - Convert `.html`/`.htm` URLs to `.md` format
  - `normalize_whitespace: true` - Reduce excessive blank lines and remove trailing whitespace

- ### Example Usage
-
- ```ruby
- # Fine-grained control
- LlmDocsBuilder.transform_markdown(
-   'README.md',
-   remove_frontmatter: true,
-   remove_badges: true,
-   remove_images: true,
-   simplify_links: true,
-   generate_toc: true,
-   normalize_whitespace: true
- )
- ```
-
- Or configure via YAML:
-
- ```yaml
- # llm-docs-builder.yml
- docs: ./docs
- base_url: https://myproject.io
- suffix: .llm
-
- # Pick exactly what you need
- remove_frontmatter: true
- remove_comments: true
- remove_badges: true
- remove_images: true
- simplify_links: true
- generate_toc: true
- normalize_whitespace: true
- ```
-
- ## Contributing
-
- Bug reports and pull requests welcome at [github.com/mensfeld/llm-docs-builder](https://github.com/mensfeld/llm-docs-builder).
-
  ## License

  Available as open source under the [MIT License](https://opensource.org/licenses/MIT).
@@ -8,6 +8,8 @@ module LlmDocsBuilder
    #
    # @api private
    class OutputFormatter
+     # Threshold percentage below which we consider there's no AI-optimized version
+     NO_AI_VERSION_THRESHOLD = 5
      # Format bytes into human-readable string
      #
      # @param bytes [Integer] number of bytes
@@ -56,10 +58,16 @@ module LlmDocsBuilder

        if result[:reduction_bytes].positive?
          display_reduction(result)
+
+         # Detect if there's no dedicated AI version
+         if result[:reduction_percent] < NO_AI_VERSION_THRESHOLD
+           display_no_ai_version_message(result)
+         end
        elsif result[:reduction_bytes].negative?
          display_increase(result)
        else
          puts 'Same size'
+         display_no_ai_version_message(result)
        end

        puts '=' * 60
@@ -89,5 +97,43 @@ module LlmDocsBuilder
        puts "Token increase: #{format_number(token_increase)} tokens (#{token_increase_percent}%)"
        puts "Factor: #{result[:factor]}x larger"
      end
+
+     # Display message when no dedicated AI version is detected
+     #
+     # @param result [Hash] comparison results
+     # @api private
+     def self.display_no_ai_version_message(result)
+       puts ''
+       puts 'WARNING: NO DEDICATED AI VERSION DETECTED'
+       puts ''
+       puts 'The server is returning nearly identical content to both human and AI'
+       puts 'User-Agents, indicating no AI-optimized version is currently served.'
+       puts ''
+       puts 'POTENTIAL SAVINGS WITH AI OPTIMIZATION:'
+       puts ''
+       puts 'Based on typical documentation optimization results, you could expect:'
+       puts ' • 67-95% token reduction (average 83%)'
+       puts ' • 3-20x smaller file sizes'
+       puts ' • Faster LLM processing times'
+       puts ' • Reduced API costs for AI queries'
+       puts ' • Improved response accuracy'
+       puts ''
+       puts "For this page specifically (~#{format_number(result[:human_tokens])} tokens):"
+       puts " • Estimated savings: ~#{format_number((result[:human_tokens] * 0.83).round)} tokens (83% reduction)"
+       puts " • Could reduce to: ~#{format_number((result[:human_tokens] * 0.17).round)} tokens"
+       puts " • Potential size: ~#{format_bytes((result[:human_size] * 0.17).round)}"
+       puts ''
+       puts 'HOW TO IMPLEMENT AI-OPTIMIZED DOCUMENTATION:'
+       puts ''
+       puts '1. Transform your docs with llm-docs-builder:'
+       puts ' llm-docs-builder bulk-transform --docs ./docs --config llm-docs-builder.yml'
+       puts ''
+       puts '2. Configure your web server to serve .md files to AI bots:'
+       puts ' See: https://github.com/mensfeld/llm-docs-builder#serving-optimized-docs'
+       puts ''
+       puts '3. Measure your actual savings:'
+       puts ' llm-docs-builder compare --url <your-url> --file <local-md>'
+       puts ''
+     end
    end
  end
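The detection rule and the per-page estimate added in the hunks above can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the gem's own code: the module `NoAiVersionSketch` and its method names are hypothetical, and the `result` hash is assumed to carry `:reduction_percent`, `:human_tokens`, and `:human_size` as the diff suggests. The 5% threshold and the 0.83 / 0.17 factors are the values that appear in the diff.

```ruby
# Hypothetical stand-alone sketch; the module and method names are
# illustrative and are not part of llm-docs-builder.
module NoAiVersionSketch
  NO_AI_VERSION_THRESHOLD = 5 # percent, same value as in the diff
  TYPICAL_REDUCTION = 0.83    # average reduction quoted in the warning text
  TYPICAL_REMAINDER = 0.17    # share of the page left after that reduction

  # True when the AI response is barely smaller than the human one.
  def self.no_ai_version?(result)
    result[:reduction_percent] < NO_AI_VERSION_THRESHOLD
  end

  # Rough per-page estimate, mirroring the arithmetic in the warning message.
  def self.estimate(result)
    {
      saved_tokens: (result[:human_tokens] * TYPICAL_REDUCTION).round,
      remaining_tokens: (result[:human_tokens] * TYPICAL_REMAINDER).round,
      remaining_bytes: (result[:human_size] * TYPICAL_REMAINDER).round
    }
  end
end

# Made-up sample numbers, chosen only to exercise the sketch.
sample = { reduction_percent: 2, human_tokens: 10_000, human_size: 100_000 }
p NoAiVersionSketch.estimate(sample) if NoAiVersionSketch.no_ai_version?(sample)
```

Note that in the diff the warning is also printed in the `'Same size'` branch, which this sketch would cover as well, provided `:reduction_percent` is 0 in that case (an assumption).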
@@ -38,8 +38,18 @@ module LlmDocsBuilder
  separator = options[:heading_separator] || ' / '
  heading_stack = []
  lines = content.lines
+ in_code_block = false

  transformed_lines = lines.map do |line|
+   # Track code block boundaries (fenced code blocks with ``` or ~~~)
+   if line.match?(/^```|^~~~/)
+     in_code_block = !in_code_block
+     next line
+   end
+
+   # Skip heading processing if inside code block
+   next line if in_code_block
+
    # Match markdown headings (1-6 hash symbols followed by space and text)
    heading_match = line.match(/^(#+)\s+(.+)$/)

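The effect of the fence tracking in the hunk above can be checked with a small stand-alone script. This is a sketch, not the gem's `HeadingTransformer`: the helper `heading_lines` and the constants `FENCE` and `HEADING` are hypothetical names, and only the two regular expressions are taken from the diff. The sample uses `~~~` fences so that it can sit inside a fenced example.

```ruby
# Hypothetical helper: returns the text of every line that would be treated
# as a heading once fenced code blocks are skipped.
FENCE = /^```|^~~~/          # same pattern as in the diff
HEADING = /^(#+)\s+(.+)$/    # same pattern as in the diff

def heading_lines(content)
  in_code_block = false
  content.lines.each_with_object([]) do |line, found|
    if line.match?(FENCE)
      in_code_block = !in_code_block # any fence line toggles the state
      next
    end
    next if in_code_block            # comments inside code are not headings

    match = line.match(HEADING)
    found << match[2] if match
  end
end

sample = <<~MD
  # Title

  ~~~ruby
  # just a comment
  x = 1
  ~~~

  ## Section
MD

p heading_lines(sample) # => ["Title", "Section"]
```

Two limitations follow from the pattern itself and apply to this sketch: any fence line toggles the state, so a `~~~` line inside a backtick-fenced block would end the block early, and a fence indented with leading spaces is not recognized because the pattern is anchored at the start of the line.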
data/lib/llm_docs_builder/version.rb CHANGED
@@ -2,5 +2,5 @@

  module LlmDocsBuilder
    # Current version of the LlmDocsBuilder gem
-   VERSION = '0.8.2'
+   VERSION = '0.9.1'
  end
data/misc/diff.png ADDED
Binary file
data/misc/logo.png ADDED
Binary file
data/misc/logo_wide.png ADDED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm-docs-builder
  version: !ruby/object:Gem::Version
-   version: 0.8.2
+   version: 0.9.1
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -147,6 +147,9 @@ files:
  - lib/llm_docs_builder/version.rb
  - llm-docs-builder.gemspec
  - llm-docs-builder.yml.example
+ - misc/diff.png
+ - misc/logo.png
+ - misc/logo_wide.png
  - renovate.json
  homepage: https://github.com/mensfeld/llm-docs-builder
  licenses: