llm-docs-builder 0.9.0 → 0.9.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7f0653d0b7cbe76ef97429cd96ebf225783d2d454c765346c1c13279b73b7527
- data.tar.gz: 206472b3a91c477827c3adeb73679758ebe1b3d73b47ef9c60a04afcce183c79
+ metadata.gz: 6ddb04b5a0e30d913d2043a79a3d7e14d35bd0166cf7104a300457887b1019cf
+ data.tar.gz: 5301d904225d0d139a2c2dd8184695eb66e2d858b8f3b5a86026b16a1ccb8c44
  SHA512:
- metadata.gz: eca112eb185027b8fd905d062d1abf54ed147025d16608176aae6ee6caf836af658d66bb315f5e11137d109e10fb6fce85aedd6d730afbafd9379c49061e1bbd
- data.tar.gz: 2b6b7c2227e1b3926378238f743825b36f350cbfa97abaffad2a26e1061063e6185eafb18762152c1174cbcd3dd1c7132cce6affc1b2cbf1ad19be8eda83dfc0
+ metadata.gz: 3daacfdde22c93023677e7e0e7487158b3e1f30a9c4e7ec2bf015bc220ff32296334411c74523aaa411d1ef21b1aa47a2e8cb2c7cada797922d89d5690f7ab0a
+ data.tar.gz: 50a8f8d29f9e79e6f5ab2774111385f6098378491b2732bb84a9d3eee0b2f5be4b6beae196ef76739a6ac1c81562da9ec6f3e66da562e657828874c55ec06ce1
data/CHANGELOG.md CHANGED
@@ -1,5 +1,12 @@
  # Changelog
 
+ ## 0.9.1 (2025-10-17)
+ - [Fix] Fixed HeadingTransformer incorrectly treating hash symbols in code blocks as headings.
+ - Now properly tracks code block boundaries (fenced with ``` or ~~~)
+ - Skips heading processing for lines inside code blocks
+ - Prevents Ruby/Python/Shell comments from being interpreted as markdown headings
+ - Added comprehensive test coverage for code block handling
+
  ## 0.9.0 (2025-10-17)
  - [Feature] **No AI Version Detection** - The `compare` command now detects when websites don't serve AI-optimized versions.
  - Triggers when reduction is <5% (nearly identical content for human and AI User-Agents)
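The "<5%" trigger mentioned in the 0.9.0 entry comes down to a percentage comparison. The following is a minimal sketch, assuming reduction is measured as the share of tokens saved relative to the human-facing version. The method names `reduction_percent` and `no_ai_version?` are hypothetical and are not taken from the gem.

```ruby
# Hypothetical helpers (not the gem's API): illustrate the "<5% reduction" check.

# Percentage of tokens saved by the AI version relative to the human version.
def reduction_percent(human_tokens, ai_tokens)
  return 0.0 if human_tokens.zero?

  (human_tokens - ai_tokens) * 100.0 / human_tokens
end

# True when both versions are nearly identical, i.e. the reduction is
# below the threshold (5% by default, as stated in the changelog).
def no_ai_version?(human_tokens, ai_tokens, threshold: 5.0)
  reduction_percent(human_tokens, ai_tokens) < threshold
end

# Token counts quoted in the README lines removed by this release.
puts reduction_percent(26_735, 5_496).round   # => 79
puts no_ai_version?(26_735, 5_496)            # => false
puts no_ai_version?(1_000, 980)               # => true (2% reduction)
```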
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- llm-docs-builder (0.9.0)
+ llm-docs-builder (0.9.1)
  zeitwerk (~> 2.6)
 
  GEM
data/README.md CHANGED
@@ -15,10 +15,13 @@ llm-docs-builder transforms markdown documentation to be AI-friendly and generat
 
  When LLMs fetch documentation, they typically get HTML pages designed for humans - complete with navigation bars, footers, JavaScript, CSS, and other overhead. This wastes 70-90% of your context window on content that doesn't help answer questions.
 
- **Real example from Karafka documentation:**
- - Human HTML version: 104.4 KB (~26,735 tokens)
- - AI markdown version: 21.5 KB (~5,496 tokens)
- - **Result: 79% reduction, 21,239 tokens saved, 5x smaller**
+ **Real-world results from [Karafka documentation](https://karafka.io/docs/) (10 pages analyzed):**
+
+ <p align="center">
+ <img src="misc/diff.png" alt="Karafka documentation optimization results">
+ </p>
+
+ **Average reduction: 83% fewer tokens**
 
  ## Quick Start
 
@@ -142,11 +145,6 @@ excludes:
  - "**/drafts/**"
  ```
 
- **Configuration precedence:**
- 1. CLI flags (highest)
- 2. Config file
- 3. Defaults
-
  ## CLI Commands
 
  ```bash
@@ -168,38 +166,6 @@ llm-docs-builder version # Show version
  -v, --verbose Detailed output
  ```
 
- ## Ruby API
-
- ```ruby
- require 'llm_docs_builder'
-
- # Transform single file with custom options
- transformed = LlmDocsBuilder.transform_markdown(
- 'README.md',
- base_url: 'https://myproject.io',
- remove_code_examples: true,
- remove_images: true,
- generate_toc: true,
- custom_instruction: 'AI-optimized documentation'
- )
-
- # Bulk transform
- files = LlmDocsBuilder.bulk_transform(
- './docs',
- base_url: 'https://myproject.io',
- suffix: '.llm',
- remove_duplicates: true,
- generate_toc: true
- )
-
- # Generate llms.txt
- content = LlmDocsBuilder.generate_from_docs(
- './docs',
- base_url: 'https://myproject.io',
- title: 'My Project'
- )
- ```
-
  ## Serving Optimized Docs to AI Bots
 
  After using `bulk-transform` with `suffix: .llm`, configure your web server to serve optimized versions to AI bots:
@@ -225,27 +191,6 @@ location ~ ^/docs/(.*)\.md$ {
  }
  ```
 
- ## Real-World Results: Karafka Framework
-
- **Before:** 140+ lines of custom transformation code
-
- **After:** 6 lines of configuration
- ```yaml
- docs: ./online/docs
- base_url: https://karafka.io/docs
- convert_urls: true
- remove_comments: true
- remove_badges: true
- remove_frontmatter: true
- normalize_whitespace: true
- suffix: "" # In-place for build pipeline
- ```
-
- **Results:**
- - 93% average token reduction
- - 20-36x smaller files
- - Automated via GitHub Actions
-
  ## Docker Usage
 
  ```bash
@@ -285,43 +230,21 @@ layout: docs
 
  [Click here to see the complete API documentation](./api.md)
 
- ```ruby
  api = API.new
  ```
 
- ![Diagram](./diagram.png)
- ```
-
  **After transformation (with default options):**
+
  ```markdown
  # API Documentation
 
  [complete API documentation](./api.md)
 
- ```ruby
  api = API.new
  ```
- ```
 
  **Token reduction:** ~40-60% depending on configuration
 
- ## FAQ
-
- **Q: Do I need to use llms.txt?**
- No. The compare and transform commands work independently.
-
- **Q: Will this change how humans see my docs?**
- Not with default `suffix: .llm`. Separate files are served only to AI bots.
-
- **Q: Can I use this in my build pipeline?**
- Yes. Use `suffix: ""` for in-place transformation.
-
- **Q: How do I know if it's working?**
- Use `llm-docs-builder compare` to measure before and after.
-
- **Q: What about private documentation?**
- Use the `excludes` option to skip sensitive files.
-
  ## RAG Enhancement Features
 
  ### Heading Normalization
@@ -409,39 +332,6 @@ All compression features can be used individually for fine-grained control:
  - `convert_urls: true` - Convert `.html`/`.htm` URLs to `.md` format
  - `normalize_whitespace: true` - Reduce excessive blank lines and remove trailing whitespace
 
- ### Example Usage
-
- ```ruby
- # Fine-grained control
- LlmDocsBuilder.transform_markdown(
- 'README.md',
- remove_frontmatter: true,
- remove_badges: true,
- remove_images: true,
- simplify_links: true,
- generate_toc: true,
- normalize_whitespace: true
- )
- ```
-
- Or configure via YAML:
-
- ```yaml
- # llm-docs-builder.yml
- docs: ./docs
- base_url: https://myproject.io
- suffix: .llm
-
- # Pick exactly what you need
- remove_frontmatter: true
- remove_comments: true
- remove_badges: true
- remove_images: true
- simplify_links: true
- generate_toc: true
- normalize_whitespace: true
- ```
-
  ## License
 
  Available as open source under the [MIT License](https://opensource.org/licenses/MIT).
@@ -38,8 +38,18 @@ module LlmDocsBuilder
  separator = options[:heading_separator] || ' / '
  heading_stack = []
  lines = content.lines
+ in_code_block = false
 
  transformed_lines = lines.map do |line|
+ # Track code block boundaries (fenced code blocks with ``` or ~~~)
+ if line.match?(/^```|^~~~/)
+ in_code_block = !in_code_block
+ next line
+ end
+
+ # Skip heading processing if inside code block
+ next line if in_code_block
+
  # Match markdown headings (1-6 hash symbols followed by space and text)
  heading_match = line.match(/^(#+)\s+(.+)$/)
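The fence tracking added in this hunk can be exercised in isolation. The following is a minimal sketch, not the gem's actual HeadingTransformer: the method name `heading_lines` is hypothetical, and it only reports which lines would be treated as headings, reusing the two regular expressions shown in the hunk.

```ruby
# Hypothetical helper (not part of llm-docs-builder): returns the lines that
# would be treated as headings, skipping anything inside fenced code blocks.
def heading_lines(content)
  in_code_block = false

  content.lines.each_with_object([]) do |line, headings|
    # Same boundary check as the hunk: a line starting with a backtick or
    # tilde fence toggles the "inside a code block" state and is never a heading.
    if line.match?(/^```|^~~~/)
      in_code_block = !in_code_block
      next
    end

    # Lines inside a fenced block are passed over.
    next if in_code_block

    # Same heading pattern as the hunk: hashes, whitespace, then text.
    headings << line.chomp if line.match?(/^(#+)\s+(.+)$/)
  end
end

sample = <<~MARKDOWN
  # Setup

  ~~~ruby
  # install dependencies
  system('bundle install')
  ~~~

  ## Usage
MARKDOWN

p heading_lines(sample) # => ["# Setup", "## Usage"]
```

The Ruby comment inside the fenced block is not reported as a heading. Because the state is a plain toggle, a tilde fence line appearing inside a backtick-fenced block (or the reverse) would also flip it; whether that matters depends on the documents being processed.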
45
55
 
@@ -2,5 +2,5 @@
 
  module LlmDocsBuilder
  # Current version of the LlmDocsBuilder gem
- VERSION = '0.9.0'
+ VERSION = '0.9.1'
  end
data/misc/diff.png ADDED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm-docs-builder
  version: !ruby/object:Gem::Version
- version: 0.9.0
+ version: 0.9.1
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -147,6 +147,7 @@ files:
  - lib/llm_docs_builder/version.rb
  - llm-docs-builder.gemspec
  - llm-docs-builder.yml.example
+ - misc/diff.png
  - misc/logo.png
  - misc/logo_wide.png
  - renovate.json