llms-txt-ruby 0.0.0 → 0.1.1

data/README.md CHANGED
@@ -1,23 +1,30 @@
  # llms-txt-ruby
 
- > ⚠️ **Work in Progress** - This gem is currently under active development and not yet ready for any use.
+ [![CI](https://github.com/mensfeld/llms-txt-ruby/actions/workflows/ci.yml/badge.svg)](
+ https://github.com/mensfeld/llms-txt-ruby/actions/workflows/ci.yml)
 
- A Ruby gem that automatically generates [llms.txt](https://llmstxt.org/) files for Ruby projects using AI. This gem analyzes your Ruby codebase, extracts documentation from YARD comments, README files, and gemspec metadata, then uses a Large Language Model to create a properly formatted llms.txt file following the official specification.
+ A Ruby tool for generating [llms.txt](https://llmstxt.org/) files from existing markdown
+ documentation. Transform your docs to be AI-friendly.
 
  ## What is llms.txt?
 
- The llms.txt file is a proposed standard for providing LLM-friendly content on websites. It offers brief background information, guidance, and links to detailed markdown files, helping Large Language Models understand and navigate your project more effectively.
+ The llms.txt file is a proposed standard for providing LLM-friendly content on websites. It
+ offers brief background information, guidance, and links to detailed markdown files, helping
+ Large Language Models understand and navigate your project more effectively.
 
  Learn more at [llmstxt.org](https://llmstxt.org/).
 
- ## Features
+ ## What This Tool Does
 
- - 🤖 **AI-powered generation**: Uses Claude or GPT models to create natural, comprehensive llms.txt files
- - 📚 **YARD integration**: Extracts rich documentation from YARD comments and tags
- - 🔧 **Configurable**: Supports multiple LLM providers and customizable options
- - 🖥️ **CLI + API**: Use from command line or integrate into your Ruby applications
- - 📁 **Project awareness**: Understands Ruby project structure and conventions
- - 🎯 **Spec compliant**: Generates files that strictly follow the llms.txt specification
+ This library converts existing human-first documentation into LLM-friendly formats:
+
+ 1. **Generates llms.txt** - Transforms your existing markdown documentation into a structured
+    overview that helps LLMs understand your project's layout and find relevant information
+ 2. **Transforms markdown** - Converts individual markdown files from human-readable format to
+    AI-optimized format by expanding relative links to absolute URLs and normalizing link
+    structures
+ 3. **Bulk transforms** - Processes all markdown files in a directory recursively, creating
+    LLM-friendly versions alongside originals with customizable exclusion patterns
 
  ## Installation
 
@@ -39,38 +46,443 @@ Or install it yourself as:
  $ gem install llms-txt-ruby
  ```
 
- ## Example Output
+ ## Quick Start
 
- Here's what a generated llms.txt file might look like:
+ ### Option 1: Using Config File (Recommended)
 
- ```markdown
- # MyAwesomeGem
+ Create a `llms-txt.yml` file in your project root:
+
+ ```yaml
+ # llms-txt.yml
+ docs: ./docs
+ base_url: https://myproject.io
+ title: My Awesome Project
+ description: A Ruby library that helps developers build amazing applications
+ output: llms.txt
+ convert_urls: true
+ verbose: false
+ ```
 
- > MyAwesomeGem is a Ruby library for processing data with advanced algorithms and providing a clean API for developers.
+ Then simply run:
 
- This gem provides a comprehensive toolkit for data processing, featuring both synchronous and asynchronous processing capabilities. It includes built-in caching, error handling, and extensive configuration options.
+ ```bash
+ llms-txt generate
+ ```
 
- ## Documentation
+ ### Option 2: Using CLI Only
 
- - [Getting Started Guide](docs/getting_started.md): Quick introduction and basic usage examples
- - [API Documentation](https://rubydoc.info/gems/my_awesome_gem): Complete API reference
- - [Configuration Guide](docs/configuration.md): Detailed configuration options
+ ```bash
+ # Generate from docs directory
+ llms-txt generate --docs ./docs
+
+ # Transform a single file
+ llms-txt transform README.md
 
- ## Examples
+ # Transform all markdown files in directory
+ llms-txt bulk-transform --docs ./docs
+
+ # Use custom config file
+ llms-txt generate --config my-config.yml
+ ```
 
- - [Basic Usage Examples](examples/basic_usage.rb): Simple examples to get started
- - [Advanced Patterns](examples/advanced_patterns.rb): Complex usage patterns and best practices
+ ## CLI Reference
 
- ## Optional
+ ### Commands
 
- - [Contributing Guidelines](CONTRIBUTING.md): How to contribute to this project
- - [Changelog](CHANGELOG.md): Version history and changes
+ ```bash
+ llms-txt generate [options]        # Generate llms.txt from documentation (default)
+ llms-txt transform [file]          # Transform a markdown file to be AI-friendly
+ llms-txt bulk-transform [options]  # Transform all markdown files in directory
+ llms-txt parse [file]              # Parse existing llms.txt file
+ llms-txt validate [file]           # Validate llms.txt file
+ llms-txt version                   # Show version
  ```
 
- ## License
+ ### Options
+
+ ```bash
+ -c, --config PATH    Configuration file path (default: llms-txt.yml)
+ -d, --docs PATH      Path to documentation directory or file
+ -o, --output PATH    Output file path
+ -v, --verbose        Verbose output
+ -h, --help           Show help message
+ ```
+
+ *For advanced options like base_url, title, description, and convert_urls, use a config file.*
+
+ ## Configuration File
+
+ The recommended way to use llms-txt is with a `llms-txt.yml` config file. This allows you to:
+
+ - ✅ Store all your settings in one place
+ - ✅ Version control your llms.txt configuration
+ - ✅ Avoid typing long CLI commands repeatedly
+ - ✅ Share configuration across team members
+
+ ### Config File Options
+
+ ```yaml
+ # Path to documentation directory or file
+ docs: ./docs
+
+ # Base URL for expanding relative links (optional)
+ base_url: https://myproject.io
+
+ # Project information (optional - auto-detected if not provided)
+ title: My Project Name
+ description: Brief description of what your project does
+
+ # Output file (optional, default: llms.txt)
+ output: llms.txt
+
+ # Transformation options (optional)
+ convert_urls: true # Convert .html links to .md
+ verbose: false # Enable verbose output
+ ```
+
+ The config file will be automatically found if named:
+ - `llms-txt.yml`
+ - `llms-txt.yaml`
+ - `.llms-txt.yml`
+
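Editor's sketch: the automatic lookup described above could look roughly like this in Ruby. It is a minimal illustration under stated assumptions, not the gem's implementation. `find_config` and `load_config` are hypothetical helper names, and the search order simply follows the order of the list above, since the README names the files without stating precedence.

```ruby
require 'yaml'

# Candidate file names, in the order the README lists them (precedence assumed).
CONFIG_NAMES = %w[llms-txt.yml llms-txt.yaml .llms-txt.yml].freeze

# Hypothetical helper: return the first candidate that exists in dir, or nil.
def find_config(dir = Dir.pwd)
  CONFIG_NAMES
    .map { |name| File.join(dir, name) }
    .find { |path| File.file?(path) }
end

# Hypothetical helper: load the config as a Hash with string keys.
# Falls back to an empty Hash when no file is found or the file is empty.
def load_config(dir = Dir.pwd)
  path = find_config(dir)
  return {} unless path

  YAML.safe_load(File.read(path)) || {}
end
```

Calling `load_config` in a project root would then return something like `{ 'docs' => './docs', 'verbose' => false }` for a file with those two keys.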
+ ## Bulk Transformation
+
+ The `bulk-transform` command processes all markdown files in a directory recursively, creating
+ AI-friendly versions alongside the originals. This is perfect for transforming entire
+ documentation trees.
+
+ ### Key Features
+
+ - **Recursive processing** - Finds and transforms all `.md` files in nested directories
+ - **Preserves structure** - Maintains your existing directory layout
+ - **Exclusion patterns** - Skip files/directories using glob patterns
+ - **Custom suffixes** - Choose how transformed files are named
+ - **LLM optimizations** - Expands relative links, converts HTML URLs, etc.
+
+ ### Usage
+
+ ```bash
+ # Transform all files with default settings
+ llms-txt bulk-transform --docs ./wiki
+
+ # Using config file (recommended for complex setups)
+ llms-txt bulk-transform --config karafka-config.yml
+ ```
+
+ ### Example Config for Bulk Transformation
+
+ ```yaml
+ # karafka-config.yml
+ docs: ./wiki
+ base_url: https://karafka.io
+ suffix: .llm
+ convert_urls: true
+ excludes:
+   - "**/private/**" # Skip private directories
+   - "**/draft-*.md" # Skip draft files
+   - "**/old-docs/**" # Skip legacy documentation
+ ```
+
+ ### Example Output
+
+ With the config above, these files:
+ ```
+ wiki/
+ ├── Home.md
+ ├── getting-started.md
+ ├── api/
+ │   ├── consumers.md
+ │   └── producers.md
+ └── private/
+     └── internal.md
+ ```
+
+ Become:
+ ```
+ wiki/
+ ├── Home.md
+ ├── Home.llm.md ← AI-friendly version
+ ├── getting-started.md
+ ├── getting-started.llm.md
+ ├── api/
+ │   ├── consumers.md
+ │   ├── consumers.llm.md
+ │   ├── producers.md
+ │   └── producers.llm.md
+ └── private/
+     └── internal.md ← Excluded, no .llm.md version
+ ```
+
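Editor's sketch: the naming and exclusion behavior shown above can be approximated in a few lines of Ruby. This is an illustration only, assuming excludes are matched with `File.fnmatch` in its default mode (where `*` also matches `/`); how the gem actually matches patterns is not stated in the README. `plan_bulk_transform` is a hypothetical helper that returns a source-to-target mapping without touching the file system.

```ruby
# Hypothetical helper: decide which markdown files get a transformed sibling
# and what that sibling is called. Nothing is read or written here.
def plan_bulk_transform(paths, suffix: '.llm', excludes: [])
  paths.each_with_object({}) do |path, plan|
    next unless path.end_with?('.md')
    # Assumption: glob patterns are checked with File.fnmatch (default flags).
    next if excludes.any? { |pattern| File.fnmatch(pattern, path) }

    # wiki/Home.md becomes wiki/Home.llm.md for suffix ".llm"
    plan[path] = path.sub(/\.md\z/, "#{suffix}.md")
  end
end

paths = %w[wiki/Home.md wiki/api/consumers.md wiki/private/internal.md]
plan = plan_bulk_transform(paths, excludes: ['**/private/**', '**/draft-*.md'])
plan.each { |source, target| puts "#{source} -> #{target}" }
```

With the three sample paths, only `wiki/Home.md` and `wiki/api/consumers.md` receive a target name; `wiki/private/internal.md` is skipped by the first exclude pattern.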
+ ## Serving LLM-Friendly Documentation
+
+ After using `bulk-transform` to create `.llm.md` versions of your documentation, you can configure your web server to automatically serve these LLM-optimized versions to AI bots while showing the original versions to human visitors.
+
+ ### How It Works
+
+ The strategy is simple:
+
+ 1. **Detect AI bots** by their User-Agent strings
+ 2. **Serve `.llm.md` files** to detected AI bots
+ 3. **Serve original `.md` files** to human visitors
+ 4. **Automatic selection** - no manual switching needed
+
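Editor's sketch: the four steps above amount to a small decision function. The Ruby below is a framework-free illustration, assuming the same User-Agent patterns that the server configurations in this README use; `llm_bot?` and `path_to_serve` are hypothetical names, not part of the gem.

```ruby
# Patterns copied from the server examples in this README.
LLM_BOT_PATTERNS = [
  /openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot/i,
  /perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate/i,
  /langchain|llamaindex|semantic|embedding|vector|rag/i,
  /ollama|mistral|cohere|together|fireworks|groq/i
].freeze

# Step 1: detect AI bots by User-Agent (nil is treated as an empty string).
def llm_bot?(user_agent)
  LLM_BOT_PATTERNS.any? { |pattern| pattern.match?(user_agent.to_s) }
end

# Steps 2-4: pick the path to serve for a request.
def path_to_serve(request_path, user_agent, suffix: '.llm')
  return request_path unless llm_bot?(user_agent)
  return request_path unless request_path.end_with?('.md')
  return request_path if request_path.end_with?("#{suffix}.md")

  request_path.sub(/\.md\z/, "#{suffix}.md")
end

puts path_to_serve('/docs/intro.md', 'ClaudeBot/1.0')
puts path_to_serve('/docs/intro.md', 'Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0')
```

One thing the sketch makes visible: the patterns match substrings, so short tokens such as `rag` also classify a user agent like `Dragonfly/1.0` as a bot. Tightening the patterns (for example with word boundaries) may be worth considering before deploying them.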
+ ### Apache Configuration
+
+ Add this to your `.htaccess` file:
+
+ ```apache
+ # Detect LLM bots by User-Agent
+ SetEnvIf User-Agent "(?i)(openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot)" IS_LLM_BOT
+ SetEnvIf User-Agent "(?i)(perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate)" IS_LLM_BOT
+ SetEnvIf User-Agent "(?i)(langchain|llamaindex|semantic|embedding|vector|rag)" IS_LLM_BOT
+ SetEnvIf User-Agent "(?i)(ollama|mistral|cohere|together|fireworks|groq)" IS_LLM_BOT
+
+ # Serve .md files as text/plain
+ <FilesMatch "\.md$">
+   Header set Content-Type "text/plain; charset=utf-8"
+   ForceType text/plain
+ </FilesMatch>
+
+ # Enable rewrite engine
+ RewriteEngine On
+
+ # For LLM bots: rewrite requests to serve .llm.md versions
+ RewriteCond %{ENV:IS_LLM_BOT} !^$
+ RewriteCond %{REQUEST_URI} ^/docs/.*\.md$ [NC]
+ RewriteCond %{REQUEST_URI} !\.llm\.md$ [NC]
+ RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
+ RewriteRule ^(.*)\.md$ $1.llm.md [L]
+
+ # For LLM bots: handle clean URLs by appending .llm.md
+ RewriteCond %{ENV:IS_LLM_BOT} !^$
+ RewriteCond %{REQUEST_URI} ^/docs/ [NC]
+ RewriteCond %{REQUEST_URI} !\.md$ [NC]
+ RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.llm.md -f
+ RewriteRule ^(.*)$ $1.llm.md [L]
+
+ # For regular users: serve original .md files or clean URLs as usual
+ # (add your normal URL handling rules here)
+ ```
+
+ ### Nginx Configuration
+
+ Add this to your nginx server block:
 
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+ ```nginx
+ # Map to detect LLM bots
+ map $http_user_agent $is_llm_bot {
+     default 0;
+     "~*(?i)(openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot)" 1;
+     "~*(?i)(perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate)" 1;
+     "~*(?i)(langchain|llamaindex|semantic|embedding|vector|rag)" 1;
+     "~*(?i)(ollama|mistral|cohere|together|fireworks|groq)" 1;
+ }
 
- ---
+ server {
+     # ... your server configuration ...
+
+     # Serve .md files as text/plain
+     location ~ \.md$ {
+         default_type text/plain;
+         charset utf-8;
+     }
+
+     # For LLM bots requesting .md files, serve .llm.md version
+     location ~ ^/docs/(.*)\.md$ {
+         if ($is_llm_bot) {
+             rewrite ^(.*)\.md$ $1.llm.md last;
+         }
+         try_files $uri $uri/ =404;
+     }
+
+     # For LLM bots requesting clean URLs, serve .llm.md version
+     location ~ ^/docs/ {
+         if ($is_llm_bot) {
+             try_files $uri.llm.md $uri $uri/ =404;
+         }
+         try_files $uri $uri.md $uri/ =404;
+     }
+ }
+ ```
+
+ ### Cloudflare Workers
+
+ For serverless deployments, use Cloudflare Workers:
+
+ ```javascript
+ export default {
+   async fetch(request) {
+     const url = new URL(request.url);
+     const userAgent = request.headers.get('user-agent') || '';
+
+     // Detect LLM bots
+     const llmBotPatterns = [
+       /openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot/i,
+       /perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate/i,
+       /langchain|llamaindex|semantic|embedding|vector|rag/i,
+       /ollama|mistral|cohere|together|fireworks|groq/i
+     ];
+
+     const isLLMBot = llmBotPatterns.some(pattern => pattern.test(userAgent));
+
+     // If LLM bot and requesting docs
+     if (isLLMBot && url.pathname.startsWith('/docs/')) {
+       // Try to serve .llm.md version
+       const llmPath = url.pathname.replace(/\.md$/, '.llm.md');
+       if (!url.pathname.endsWith('.llm.md')) {
+         url.pathname = llmPath;
+       }
+     }
+
+     return fetch(url);
+   }
+ }
+ ```
+
+ ### Custom Suffix
+
+ If you used a different suffix with the `bulk-transform` command (e.g., `--suffix .ai`), update your web server rules accordingly.
+
+ **Apache:**
+ ```apache
+ RewriteRule ^(.*)\.md$ $1.ai.md [L]
+ ```
+
+ **Nginx:**
+ ```nginx
+ rewrite ^(.*)\.md$ $1.ai.md last;
+ ```
+
+ **Cloudflare Workers:**
+ ```javascript
+ const llmPath = url.pathname.replace(/\.md$/, '.ai.md');
+ ```
+
+ ### Example Setup
+
+ ```yaml
+ # llms-txt.yml
+ docs: ./docs
+ base_url: https://myproject.io
+ suffix: .llm
+ convert_urls: true
+ ```
+
+ ```bash
+ # Generate LLM-friendly versions
+ llms-txt bulk-transform --config llms-txt.yml
+
+ # Deploy both original and .llm.md files to your web server
+ # The server will automatically serve the right version to each visitor
+ ```
+
+ ## Ruby API
+
+ ### Basic Usage
+
+ ```ruby
+ require 'llms_txt'
+
+ # Option 1: Using config file (recommended)
+ content = LlmsTxt.generate_from_docs(config_file: 'llms-txt.yml')
+
+ # Option 2: Direct options (overrides config)
+ content = LlmsTxt.generate_from_docs('./docs',
+   base_url: 'https://myproject.io',
+   title: 'My Project',
+   description: 'A great project'
+ )
+
+ # Option 3: Mix config file with overrides
+ content = LlmsTxt.generate_from_docs('./docs',
+   config_file: 'my-config.yml',
+   title: 'Override Title' # This overrides config file title
+ )
+
+ # Transform markdown with config
+ transformed = LlmsTxt.transform_markdown('README.md',
+   config_file: 'llms-txt.yml'
+ )
+
+ # Transform with direct options
+ transformed = LlmsTxt.transform_markdown('README.md',
+   base_url: 'https://myproject.io',
+   convert_urls: true
+ )
+
+ # Bulk transform all files in directory
+ transformed_files = LlmsTxt.bulk_transform('./wiki',
+   base_url: 'https://karafka.io',
+   suffix: '.llm',
+   excludes: ['**/private/**', '**/draft-*.md']
+ )
+ puts "Transformed #{transformed_files.size} files"
+
+ # Bulk transform with config file
+ transformed_files = LlmsTxt.bulk_transform('./wiki',
+   config_file: 'karafka-config.yml'
+ )
+
+ # Parse and validate (unchanged)
+ parsed = LlmsTxt.parse('llms.txt')
+ puts parsed.title
+ puts parsed.description
+
+ valid = LlmsTxt.validate(content)
+ ```
+
+ ## How It Works
+
+ ### Generation Process
+
+ 1. **Scan for markdown files** - Finds all `.md` files in specified directory
+ 2. **Extract metadata** - Gets title and description from each file
+ 3. **Prioritize docs** - Orders by importance (README first, then guides, APIs, etc.)
+ 4. **Build llms.txt** - Creates properly formatted output with links and descriptions
+
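Editor's sketch: steps 2 and 4 above can be illustrated with a small Ruby example. It assumes that the title is the first `# ` heading and the description is the first non-empty, non-heading line; the README does not say how the gem extracts either, so treat both rules, and the names `Doc`, `extract_metadata`, and `build_llms_txt`, as hypothetical.

```ruby
Doc = Struct.new(:title, :description, :url)

# Assumed rule: title = first H1, description = first non-heading text line.
def extract_metadata(markdown, url)
  lines = markdown.lines.map(&:strip)
  title = lines.find { |line| line.start_with?('# ') }&.delete_prefix('# ')
  description = lines.find { |line| !line.empty? && !line.start_with?('#') }
  Doc.new(title, description, url)
end

# Render the llms.txt layout: H1 title, blockquote summary, then a link list.
def build_llms_txt(title:, description:, docs:)
  out = ["# #{title}", '', "> #{description}", '', '## Documentation', '']
  docs.each { |doc| out << "- [#{doc.title}](#{doc.url}): #{doc.description}" }
  "#{out.join("\n")}\n"
end

doc = extract_metadata(
  "# Getting Started\n\nQuick start guide with examples\n",
  'https://myproject.io/getting-started.md'
)
puts build_llms_txt(title: 'My Project', description: 'A great project', docs: [doc])
```

Scanning and prioritization (steps 1 and 3) are left out here to keep the example focused on metadata and output format.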
+ ### Transformation Process
+
+ 1. **Expand relative links** - Convert `./docs/api.md` to `https://myproject.io/docs/api.md`
+ 2. **Convert URLs** - Change `.html` links to `.md` for better AI understanding
+ 3. **Preserve content** - No content modification, just link processing
+
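Editor's sketch: the link processing described above could be approximated with Ruby's standard `uri` library. This is not the gem's code. `transform_links` is a hypothetical helper, the regular expression only handles simple inline links of the form `[text](url)`, and links with a `#fragment` after `.html` are deliberately left unconverted to keep the example short.

```ruby
require 'uri'

# Matches simple inline markdown links: [text](url)
INLINE_LINK = /\[([^\]]*)\]\(([^)\s]+)\)/

# Hypothetical helper: expand relative links and optionally map .html to .md.
def transform_links(markdown, base_url:, convert_urls: false)
  base = base_url.end_with?('/') ? base_url : "#{base_url}/"

  markdown.gsub(INLINE_LINK) do
    text = Regexp.last_match(1)
    url = Regexp.last_match(2)

    # Leave absolute URLs (scheme:...) and in-page anchors (#...) untouched.
    relative = !url.match?(/\A(?:[a-z][a-z0-9+.-]*:|#)/i)
    url = URI.join(base, url).to_s if relative
    url = url.sub(/\.html\z/, '.md') if convert_urls

    "[#{text}](#{url})"
  end
end

puts transform_links(
  'See [API](./docs/api.md) and [guide](guide.html).',
  base_url: 'https://myproject.io',
  convert_urls: true
)
```

Everything outside the matched links passes through unchanged, which mirrors the "preserve content" step.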
+ ### File Prioritization
+
+ When generating llms.txt, files are automatically prioritized:
+
+ 1. **README files** - Always listed first
+ 2. **Getting Started guides** - Quick start documentation
+ 3. **Guides and tutorials** - Step-by-step content
+ 4. **API references** - Technical documentation
+ 5. **Other files** - Everything else
+
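Editor's sketch: an ordering like the one above can be expressed as a ranked list of filename rules. The patterns below are guesses at what each category might match; the README does not specify the gem's actual matching rules, and `priority_for` and `prioritize` are hypothetical names.

```ruby
# One rule per category, in priority order. The patterns are assumptions.
PRIORITY_RULES = [
  /readme/i,                                  # 1. README files
  /getting[-_ ]?started|quick[-_ ]?start/i,   # 2. Getting Started guides
  /guide|tutorial/i,                          # 3. Guides and tutorials
  /api|reference/i                            # 4. API references
].freeze

# Index of the first matching rule; unmatched files rank last (5. Other files).
def priority_for(path)
  name = File.basename(path, '.md')
  PRIORITY_RULES.index { |rule| name.match?(rule) } || PRIORITY_RULES.size
end

# Sort by priority, then by path so the order is deterministic within a group.
def prioritize(paths)
  paths.sort_by { |path| [priority_for(path), path] }
end

files = %w[docs/api-reference.md docs/changelog.md docs/getting-started.md docs/README.md]
puts prioritize(files)
```

Sorting on a `[priority, path]` pair keeps files in the same category in a stable alphabetical order.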
+ ## Example Output
+
+ Given a `docs/` directory with:
+ - `README.md`
+ - `getting-started.md`
+ - `api-reference.md`
+
+ Running `llms-txt generate --docs ./docs --base-url https://myproject.io` creates:
+
+ ```markdown
+ # My Project
+
+ > This is a Ruby library that helps developers build amazing applications with a clean, simple API.
+
+ ## Documentation
+
+ - [README](https://myproject.io/README.md): Complete overview and installation instructions
+ - [Getting Started](https://myproject.io/getting-started.md): Quick start guide with examples
+ - [API Reference](https://myproject.io/api-reference.md): Detailed API documentation and method
+   signatures
+ ```
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/mensfeld/llms-txt-ruby.
+
+ ## License
 
- Made with ❤️ for the Ruby community
+ The gem is available as open source under the terms of the
+ [MIT License](https://opensource.org/licenses/MIT).
data/Rakefile ADDED
@@ -0,0 +1,10 @@
+ # frozen_string_literal: true
+
+ require 'bundler/gem_tasks'
+ require 'rspec/core/rake_task'
+ require 'rubocop/rake_task'
+
+ RSpec::Core::RakeTask.new(:spec)
+ RuboCop::RakeTask.new
+
+ task default: %i[spec rubocop]