llm-docs-builder 0.6.0 → 0.8.0
- checksums.yaml +4 -4
- data/.rspec +3 -0
- data/CHANGELOG.md +59 -0
- data/Gemfile.lock +1 -1
- data/README.md +241 -541
- data/bin/rspecs +2 -1
- data/lib/llm_docs_builder/cli.rb +1 -62
- data/lib/llm_docs_builder/comparator.rb +4 -16
- data/lib/llm_docs_builder/config.rb +74 -5
- data/lib/llm_docs_builder/generator.rb +67 -8
- data/lib/llm_docs_builder/markdown_transformer.rb +61 -126
- data/lib/llm_docs_builder/output_formatter.rb +93 -0
- data/lib/llm_docs_builder/parser.rb +1 -59
- data/lib/llm_docs_builder/text_compressor.rb +164 -0
- data/lib/llm_docs_builder/token_estimator.rb +52 -0
- data/lib/llm_docs_builder/transformers/base_transformer.rb +30 -0
- data/lib/llm_docs_builder/transformers/content_cleanup_transformer.rb +106 -0
- data/lib/llm_docs_builder/transformers/enhancement_transformer.rb +95 -0
- data/lib/llm_docs_builder/transformers/heading_transformer.rb +72 -0
- data/lib/llm_docs_builder/transformers/link_transformer.rb +84 -0
- data/lib/llm_docs_builder/transformers/whitespace_transformer.rb +44 -0
- data/lib/llm_docs_builder/version.rb +1 -1
- metadata +11 -3
- data/CLAUDE.md +0 -178
- data/llm-docs-builder.yml +0 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 85874aed492c7a756acd3fec52228dedc3e3961b2e413a482aafd155e9c01d5e
+  data.tar.gz: de9d71e8bf15aace848995366cf1f6e46f758cbc86f9fa9b5bdd40be9e4695ce
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ee2f1c859b24726a812a3d162f6ea59b79e5a19c86d0df4202b4c4b3ad9fcfbe11ef28ea61784ce3594260952fec937b47858f01a2f0e37929173e367aab4547
+  data.tar.gz: 4d40ca6817d4b7b5d1ce9d9e6290e76a7cd145159b0ea8ac984e0e12d127cdc861e5a9e351e1cfbf10f93243c1e63956c552b5e3db4edbfdf9cd76c97859efea
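The checksums.yaml entries above are hex digests of the two archive members of the packaged gem. As a minimal sketch of how such values could be recomputed and compared, assuming the member has already been extracted and read into memory: `archive_digests` and `checksum_matches?` are hypothetical helper names introduced here for illustration, not part of llm-docs-builder or RubyGems.

```ruby
require "digest/sha2"

# Hypothetical helper: returns the two digests that a checksums.yaml
# entry would hold for one archive member, given its raw bytes.
def archive_digests(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: true when the bytes hash to the recorded SHA256
# value. Surrounding whitespace and letter case in the recorded value
# are ignored.
def checksum_matches?(bytes, recorded_sha256)
  archive_digests(bytes)["SHA256"] == recorded_sha256.strip.downcase
end
```

With an extracted `data.tar.gz` in the working directory, a check against the 0.8.0 value listed above might look like `checksum_matches?(File.binread("data.tar.gz"), "de9d71e8bf15aace848995366cf1f6e46f758cbc86f9fa9b5bdd40be9e4695ce")`.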
data/.rspec
ADDED
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,64 @@
 # Changelog
 
+## 0.8.0 (2025-10-14)
+- [Feature] **RAG Enhancement: Heading Normalization** - Transform headings to include hierarchical context for better RAG retrieval.
+  - Adds parent context to H2-H6 headings (e.g., "Configuration / Consumer Settings / auto_offset_reset")
+  - Makes each section self-contained when documents are chunked
+  - Configurable separator (default: " / ")
+  - Enable with `normalize_headings: true`
+  - Perfect for vector databases and RAG systems
+- [Feature] **RAG Enhancement: Enhanced llms.txt Metadata** - Generate enriched llms.txt files with machine-readable metadata.
+  - Token counts per document (helps AI agents manage context windows)
+  - Last modified timestamps (helps prefer recent docs)
+  - Priority labels: high/medium/low (helps guide which docs to fetch first)
+  - Optional compression ratios (shows optimization effectiveness)
+  - Enable with `include_metadata: true`, `include_tokens: true`, `include_timestamps: true`, `include_priority: true`
+- [Enhancement] Added `HeadingTransformer` class with comprehensive heading hierarchy tracking.
+- [Enhancement] Added priority calculation in Generator (README=high, getting started=high, tutorials=medium, etc.).
+- [Enhancement] Updated `Config#merge_with_options` to support all new RAG options.
+- [Testing] Added 10 comprehensive tests for HeadingTransformer covering edge cases.
+- [Testing] All 303 tests passing with 96.94% line coverage and 85.59% branch coverage.
+- [Documentation] Added "RAG Enhancement Features" section to README with examples and use cases.
+- [Documentation] Added detailed implementation guide in RAG_FEATURES.md.
+- [Documentation] Added example RAG configuration in examples/rag-config.yml.
+
+## 0.7.0 (2025-10-09)
+- [Feature] **Advanced Token Optimization** - Added 8 new compression options to reduce token consumption:
+  - `remove_code_examples`: Remove code blocks and inline code
+  - `remove_images`: Remove all image syntax
+  - `simplify_links`: Simplify verbose link text (e.g., "Click here to see the docs" → "docs")
+  - `remove_blockquotes`: Remove blockquote formatting while preserving content
+  - `generate_toc`: Generate table of contents from headings with anchor links
+  - `custom_instruction`: Inject AI context messages at document top
+  - `remove_stopwords`: Remove common stopwords from prose (preserves code blocks)
+  - `remove_duplicates`: Remove duplicate paragraphs using fuzzy matching
+- [Feature] **Compression Presets** - 6 built-in presets for easy usage:
+  - `conservative`: 15-25% reduction (safest transformations)
+  - `moderate`: 30-45% reduction (balanced approach)
+  - `aggressive`: 50-70% reduction (maximum compression)
+  - `documentation`: 35-50% reduction (preserves code examples)
+  - `tutorial`: 20% reduction (minimal compression for learning materials)
+  - `api_reference`: 40% reduction (optimized for API documentation)
+- [Enhancement] **Refactored Architecture** - Split monolithic `MarkdownTransformer` into focused transformer classes following SRP:
+  - `BaseTransformer`: Common interface for all transformers
+  - `LinkTransformer`: Link expansion, URL conversion, link simplification
+  - `ContentCleanupTransformer`: All removal operations
+  - `EnhancementTransformer`: TOC generation and custom instructions
+  - `WhitespaceTransformer`: Whitespace normalization
+  - `MarkdownTransformer`: Pipeline orchestrator
+- [Enhancement] Added `TextCompressor` class for advanced text compression (stopwords, duplicates).
+- [Enhancement] Added `TokenEstimator` class for token count estimation.
+- [Enhancement] Added `OutputFormatter` class for formatted output (extracted from CLI).
+- [Enhancement] Added `CompressionPresets` class with preset configurations.
+- [Enhancement] Custom instructions now adapt to blockquote removal setting (no blockquote format when `remove_blockquotes: true`).
+- [Enhancement] Updated `Config#merge_with_options` to support all new compression options.
+- [Testing] Added 20 new integration tests for compression features and presets.
+- [Testing] Added automatic config file backup/restore in test suite to prevent interference.
+- [Testing] All 110 tests passing with 79.44% code coverage.
+- [Documentation] **Shortened README.md by 47%** (729 → 381 lines) while adding all new features.
+- [Documentation] Added comprehensive compression examples and use cases.
+- [Documentation] Added preset comparison table showing what each preset does.
+
 ## 0.6.0 (2025-10-09)
 - [Breaking] **Project renamed from `llms-txt-ruby` to `llm-docs-builder`** to better reflect expanded functionality beyond just llms.txt generation.
   - Gem name: `llms-txt-ruby` → `llm-docs-builder`
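The 0.8.0 changelog entry describes heading normalization only in prose: H2-H6 headings gain their parent headings as a prefix, joined by a configurable separator that defaults to " / ". The following is a minimal sketch of that idea, not the gem's actual `HeadingTransformer`. The function name `normalize_headings` and its `separator:` keyword are assumptions made for this illustration, as is the choice to skip headings inside fenced code blocks.

```ruby
# Hypothetical sketch of hierarchical heading normalization.
# Not the gem's implementation; names and behavior are assumptions.
def normalize_headings(markdown, separator: " / ")
  stack = []        # stack[level - 1] holds the current title at that level
  in_fence = false  # true while inside a ``` fenced code block

  markdown.each_line.map do |line|
    in_fence = !in_fence if line.start_with?("```")
    heading = in_fence ? nil : line.match(/\A([#]{1,6})\s+(.+?)\s*\z/)
    next line unless heading

    level = heading[1].length
    stack = stack.first(level - 1) # drop titles at this level and deeper
    stack[level - 1] = heading[2]
    next line if level == 1        # H1 has no parent context to add

    eol = line.end_with?("\n") ? "\n" : ""
    # compact skips levels the document jumped over (e.g. H1 directly to H3)
    "#{heading[1]} #{stack.compact.join(separator)}#{eol}"
  end.join
end
```

Applied to a document with `# Configuration`, `## Consumer Settings`, and `### auto_offset_reset`, the sketch rewrites the last heading to `### Configuration / Consumer Settings / auto_offset_reset`, which matches the example given in the changelog.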