@pointsharp/antora-llm-generator 1.1.4 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +22 -34
- package/llm-generator.js +52 -102
- package/package.json +1 -5
package/README.md

```diff
@@ -1,11 +1,16 @@
 # Antora LLM Generator Extension
 
-An [Antora](https://antora.org) extension that creates
+An [Antora](https://antora.org) extension that creates an auxiliary text file after each site build following the [llmstxt.org specification](https://llmstxt.org/):
 
 - **`llms.txt`** - A structured index with links to all documentation pages, organized into sections
-- **`llms-full.txt`** - Complete page content in markdown format with URLs for each page
 
-
+The extension also generates compatibility and discovery files for crawlers and tools:
+
+- **`llm.txt`** - Same content as `llms.txt`
+- **`sitemap-llms.xml`** - Sitemap containing the `llms.txt` URL
+- Updates the main `sitemap.xml` sitemap index to include `sitemap-llms.xml`
+
+These files help large-language models ingest your documentation with proper structure, URLs, and context.
 
 ---
 
```
```diff
@@ -35,9 +40,9 @@ antora:
 
 ### Configuration options
 
-- **`summary`** - Optional. Appears as a blockquote at the top of
+- **`summary`** - Optional. Appears as a blockquote at the top of `llms.txt` (following llmstxt.org spec).
 - **`details`** - Optional. Appears as regular text after the summary. Can be multi-line markdown.
-- **`skippaths`** - Optional. Array of glob patterns. Files matching these patterns are omitted from
+- **`skippaths`** - Optional. Array of glob patterns. Files matching these patterns are omitted from `llms.txt`.
 - **`debug`** - Optional. Set to `true` to enable verbose logging during build. Default: `false`.
 
 ### Navigation-based organization (default)
```
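The `skippaths` option presumably relies on the `minimatch` module that `llm-generator.js` imports. The sketch below is a minimal, dependency-free illustration of how a pattern such as `**/internal/**` selects files. `globToRegExp` and `isSkipped` are hypothetical names. The matcher only understands `**`, `*` and `?`, with no braces, negation or dotfile rules, so it is not a substitute for minimatch. The diff also does not show which path string the extension tests against.

```javascript
// Simplified stand-in for minimatch, for illustration only.
// "**/" matches zero or more leading directories, a trailing "**" matches
// everything below, "*" matches within one segment, "?" matches one character.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        re += "(?:.*/)?";
        i += 2;
      } else {
        re += ".*";
        i += 1;
      }
    } else if (ch === "*") {
      re += "[^/]*";
    } else if (ch === "?") {
      re += "[^/]";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Hypothetical helper: true if `path` matches any of the `skippaths` patterns.
function isSkipped(path, skippaths = []) {
  return skippaths.some((pattern) => globToRegExp(pattern).test(path));
}
```

With `skippaths: ["**/internal/**"]`, a path such as `modules/ROOT/pages/internal/setup.adoc` is skipped and `modules/ROOT/pages/public/setup.adoc` is kept.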
````diff
@@ -77,12 +82,10 @@ You can control how individual pages are included using AsciiDoc page attributes
 
 ```adoc
 :page-llms-ignore: true
-:page-llms-full-ignore: true
 :description: Brief description of this page
 ```
 
 - **`:page-llms-ignore:`** - Omit this page from `llms.txt` entirely
-- **`:page-llms-full-ignore:`** - Include in `llms.txt` but omit from `llms-full.txt`
 - **`:description:`** - Standard AsciiDoc attribute used for both HTML metadata and page descriptions in `llms.txt`
 
 ### Page descriptions
````
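The page attributes above amount to a filter-then-format pass over pages. The sketch below makes several assumptions:

- Page objects are shaped like the ones the code reads (`page.asciidoc.attributes`, `page.title`), plus an `out.path` used to build the URL.
- `isLlmsIgnored`, `formatIndexEntry` and `buildEntries` are hypothetical helpers.
- The link format for a page without a description is a guess.

One design point: an attribute declared with no value (`:page-llms-ignore:`) is stored as an empty string, which is falsy in JavaScript. The sketch therefore checks for presence rather than truthiness.

```javascript
// Hypothetical helpers illustrating :page-llms-ignore: and :description:.
// Pages mimic Antora page objects, but only the fields used here.
function isLlmsIgnored(page) {
  const attrs = (page.asciidoc && page.asciidoc.attributes) || {};
  const value = attrs["page-llms-ignore"];
  // ":page-llms-ignore:" with no value yields "", so test presence;
  // an explicit "false" is treated as not set.
  return value !== undefined && value !== "false";
}

// One llms.txt line; the ": description" suffix appears only when set
// (the no-description format is an assumption).
function formatIndexEntry(page, url) {
  const description = page.asciidoc.attributes["description"];
  return description
    ? `- [${page.title}](${url}): ${description}`
    : `- [${page.title}](${url})`;
}

// Drops ignored pages, then formats the rest using an assumed out.path.
function buildEntries(pages, siteUrl) {
  return pages
    .filter((page) => !isLlmsIgnored(page))
    .map((page) => formatIndexEntry(page, `${siteUrl}/${page.out.path}`));
}
```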
````diff
@@ -137,31 +140,14 @@ Optional details from playbook config
 - [Page Title](https://url): Description
 ```
 
-###
-
-Contains complete page content with clear URL references:
-
-```markdown
-# Your Site Title - Complete Documentation
-
-> Optional summary from playbook config
-
----
-
-## Page Title
-
-**URL:** https://example.com/page
-
-[Page content in markdown format...]
+### Sitemap integration
 
-
-
-## Another Page Title
+The extension also adds `llms.txt` to sitemap discovery without pretending it is an Antora page:
 
-
+- Generates `sitemap-llms.xml` with the `llms.txt` URL
+- Updates the main `sitemap.xml` sitemap index to reference `sitemap-llms.xml`
 
-
-```
+This matches Antora best practice by keeping `llms.txt` in the site catalog instead of inserting it into the content catalog as a fake page.
 
 ---
 
````
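With these entries in place, a crawler that reads the `sitemap.xml` index reaches `llms.txt` in two hops: first to `sitemap-llms.xml`, then to `llms.txt`. The sketch below follows those hops with a regex over `<loc>` elements. That is adequate only for small machine-generated files like these. `extractLocs`, `findLlmsTxtUrl`, the `readSitemap` callback and the `https://docs.example.com` URL are all illustrative.

```javascript
// Collects the text of every <loc> element. A regex is enough for the small,
// machine-generated sitemaps discussed here; general XML needs a real parser.
function extractLocs(xml) {
  const locs = [];
  const locRegex = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match;
  while ((match = locRegex.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Follows sitemap.xml (index) -> sitemap-llms.xml -> llms.txt.
// `readSitemap` is a hypothetical callback returning the XML text for a URL
// (for example, from a cache or an earlier HTTP fetch).
function findLlmsTxtUrl(indexXml, readSitemap) {
  const llmsSitemapUrl = extractLocs(indexXml).find((loc) => loc.endsWith("/sitemap-llms.xml"));
  if (!llmsSitemapUrl) return null;
  const llmsUrl = extractLocs(readSitemap(llmsSitemapUrl)).find((loc) => loc.endsWith("/llms.txt"));
  return llmsUrl ?? null;
}
```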
````diff
@@ -173,11 +159,13 @@ Run your Antora build as usual:
 antora antora-playbook.yml
 ```
 
-After completion,
+After completion, these files appear in the build output directory:
+
 - `llms.txt` and `llm.txt` (both contain the structured index with links)
-- `llms
+- `sitemap-llms.xml` (contains the `llms.txt` URL)
+- `sitemap.xml` is updated to include `sitemap-llms.xml`
 
-The duplicate filenames ensure compatibility with different naming conventions.
+The duplicate `llms.txt` and `llm.txt` filenames ensure compatibility with different naming conventions.
 
 ---
 
````
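A post-build check can confirm that the output matches the files listed above. This sketch reads the build output directory, verifies that `llms.txt` and `llm.txt` are identical, and checks for the UTF-8 BOM that the extension prepends. `checkLlmsOutput` is a hypothetical name. In the generator code, `sitemap-llms.xml` is written, and `sitemap.xml` updated, only when a site URL (`siteUrl`) is configured. A build without one would therefore report `sitemap-llms.xml` as missing.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const BOM = "\uFEFF";

// Hypothetical post-build check. Returns a list of problems;
// an empty list means everything looked as expected.
function checkLlmsOutput(outputDir) {
  const problems = [];
  // Reads a file as UTF-8 text (Buffer decoding keeps a leading BOM), or null if absent.
  const read = (name) => {
    const file = path.join(outputDir, name);
    return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : null;
  };

  const llms = read("llms.txt");
  const llm = read("llm.txt");
  if (llms === null) problems.push("llms.txt is missing");
  if (llm === null) problems.push("llm.txt is missing");
  if (llms !== null && llm !== null && llms !== llm) problems.push("llms.txt and llm.txt differ");
  if (llms !== null && !llms.startsWith(BOM)) problems.push("llms.txt does not start with a UTF-8 BOM");

  // Only written when a site URL is configured.
  if (read("sitemap-llms.xml") === null) problems.push("sitemap-llms.xml is missing");

  // sitemap.xml comes from Antora's site mapper; only inspect it when present.
  const sitemap = read("sitemap.xml");
  if (sitemap !== null && !sitemap.includes("sitemap-llms.xml")) {
    problems.push("sitemap.xml does not reference sitemap-llms.xml");
  }
  return problems;
}
```

For example, `checkLlmsOutput("build/site")` would inspect Antora's default output directory. A playbook that sets a different `output.dir` needs that path instead.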
````diff
@@ -186,7 +174,7 @@ The duplicate filenames ensure compatibility with different naming conventions.
 The extension runs **silently** by default, showing only a success message when complete:
 
 ```
-Generated llms.txt
+Generated llms.txt
 ```
 
 ### Enabling verbose output
````
````diff
@@ -235,7 +223,7 @@ antora:
 - "**/internal/**"
 ```
 
-**This produces:** The summary appears as a blockquote and details as regular text at the top of
+**This produces:** The summary appears as a blockquote and details as regular text at the top of `llms.txt`, providing important context for LLMs before they see the documentation links.
 
 ---
 
````
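The placement of `summary` and `details` at the top of `llms.txt` can be sketched as a small string builder. It has these properties:

- It starts with an H1 title line, as the llmstxt.org format requires.
- It mirrors the `> ${summary}` and `\n${details}\n` string patterns that appear in `llm-generator.js`.
- `buildIndexHeader` is a hypothetical name, and the extension's exact spacing may differ.

A multi-line `summary` would need `> ` on every line to stay a single blockquote. This sketch does not handle that case.

```javascript
// Hypothetical helper: H1 site title, optional summary as a blockquote,
// then optional details as plain markdown.
function buildIndexHeader(siteTitle, { summary, details } = {}) {
  let header = `# ${siteTitle}\n`;
  if (summary) header += `\n> ${summary}\n`;
  if (details) header += `\n${details}\n`;
  return header;
}
```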
package/llm-generator.js

```diff
@@ -1,11 +1,7 @@
 "use strict";
 
-const { NodeHtmlMarkdown } = require("node-html-markdown");
 const { minimatch } = require("minimatch");
 
-// HTML to Markdown converter for page content
-const nhm = new NodeHtmlMarkdown();
-
 // UTF-8 BOM (Byte Order Mark) prepended to output files
 // This helps browsers and editors recognize the file as UTF-8 encoded
 const UTF8_BOM = '\uFEFF';
```
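The `UTF8_BOM` constant is the code point U+FEFF, which UTF-8 encodes as the three bytes `EF BB BF`. Not every consumer expects it. A parser that looks for `#` as the very first character of `llms.txt` would see the BOM instead, so readers may want to strip it. Here is a minimal sketch; `encodeWithBom` and `decodeWithoutBom` are hypothetical helper names.

```javascript
const UTF8_BOM = "\uFEFF";

// Encode text the way the extension does for llms.txt: BOM first, then UTF-8.
function encodeWithBom(text) {
  return Buffer.from(UTF8_BOM + text, "utf8");
}

// Buffer#toString("utf8") keeps the BOM, so strip it explicitly after decoding.
function decodeWithoutBom(buffer) {
  const text = buffer.toString("utf8");
  return text.startsWith(UTF8_BOM) ? text.slice(1) : text;
}
```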
```diff
@@ -78,12 +74,6 @@ module.exports.register = function (context, { config }) {
       indexContent += `\n${details}\n`;
     }
 
-    // llms-full.txt: Complete page content in markdown
-    let fullContent = `# ${siteTitle} - Complete Documentation\n\n`;
-    if (summary) {
-      fullContent += `> ${summary}\n\n`;
-    }
-
     // =============================================================================
     // STEP 3: Build page index
     // =============================================================================
```
```diff
@@ -147,57 +137,12 @@ module.exports.register = function (context, { config }) {
     }
 
     // =============================================================================
-    // STEP 6:
+    // STEP 6: Write output files
     // =============================================================================
-    //
-    for (const [path, page] of pagesByPath.entries()) {
-      // Skip pages with :page-llms-full-ignore: attribute
-      if (page.asciidoc.attributes["page-llms-full-ignore"]) {
-        if (debug) logger.info(`Skipping page from full content with 'page-llms-full-ignore' attribute: ${page.src.path}`);
-        continue;
-      }
-
-      const pageUrl = `${siteUrl}/${path}`;
-
-      // Convert HTML content to markdown (decode as UTF-8)
-      const htmlContent = page.contents.toString('utf8');
-      let plainText = nhm.translate(htmlContent);
-
-      // Convert relative URLs to absolute (e.g., "page.html" -> "https://site.com/path/page.html")
-      plainText = convertRelativeLinksToAbsolute(plainText, pageUrl, logger);
-
-      // Add page section with title, URL, optional description, and content
-      fullContent += `\n---\n\n`;
-      fullContent += `## ${page.title}\n\n`;
-      fullContent += `**URL:** ${pageUrl}\n\n`;
-
-      // Include :description: attribute if present
-      if (page.asciidoc.attributes["description"]) {
-        fullContent += `*${page.asciidoc.attributes["description"]}*\n\n`;
-      }
-
-      fullContent += plainText;
-      fullContent += `\n`;
-    }
-
-    // =============================================================================
-    // STEP 7: Write output files
-    // =============================================================================
-    // Create 4 files total (both singular and plural forms for compatibility):
-    // - llms-full.txt / llm-full.txt: Complete page content
+    // Create 2 files (both singular and plural forms for compatibility):
     // - llms.txt / llm.txt: Navigation index
     //
     // All files are encoded as UTF-8 with BOM for proper character display
-    siteCatalog.addFile({
-      out: { path: "llms-full.txt" },
-      contents: Buffer.from(UTF8_BOM + fullContent, 'utf8'),
-    });
-
-    siteCatalog.addFile({
-      out: { path: "llm-full.txt" },
-      contents: Buffer.from(UTF8_BOM + fullContent, 'utf8'),
-    });
-
     siteCatalog.addFile({
       out: { path: "llm.txt" },
       contents: Buffer.from(UTF8_BOM + indexContent, 'utf8'),
```
```diff
@@ -208,7 +153,56 @@ module.exports.register = function (context, { config }) {
       contents: Buffer.from(UTF8_BOM + indexContent, 'utf8'),
     });
 
-
+    // =============================================================================
+    // STEP 8: Add llms.txt to the sitemap
+    // =============================================================================
+    // Per Dan Allen's recommendation: modify the sitemap XML directly rather than
+    // faking llms.txt as a page in the contentCatalog.
+    //
+    // Strategy:
+    // 1. Create sitemap-llms.xml containing only the llms.txt URL.
+    // 2. Find sitemap.xml (the sitemap index) in the siteCatalog and inject a
+    // <sitemap> entry pointing to sitemap-llms.xml.
+    //
+    // This keeps llms.txt discoverable by crawlers without misrepresenting it as
+    // a documentation page.
+    if (siteUrl) {
+      try {
+        const llmsSitemapXml = [
+          '<?xml version="1.0" encoding="UTF-8"?>',
+          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
+          ' <url>',
+          ` <loc>${siteUrl}/llms.txt</loc>`,
+          ' </url>',
+          '</urlset>',
+        ].join('\n');
+
+        siteCatalog.addFile({
+          out: { path: 'sitemap-llms.xml' },
+          contents: Buffer.from(llmsSitemapXml, 'utf8'),
+        });
+
+        // Find the sitemap index produced by Antora's site-mapper and add our entry
+        const sitemapIndexFile = siteCatalog.getFiles().find((f) => f.out?.path === 'sitemap.xml');
+        if (sitemapIndexFile) {
+          const current = sitemapIndexFile.contents.toString('utf8');
+          const entry = ` <sitemap>\n <loc>${siteUrl}/sitemap-llms.xml</loc>\n </sitemap>\n`;
+          sitemapIndexFile.contents = Buffer.from(
+            current.replace('</sitemapindex>', entry + '</sitemapindex>'),
+            'utf8'
+          );
+          if (debug) logger.info('Added sitemap-llms.xml to sitemap index.');
+        } else {
+          logger.warn('llm-generator: sitemap.xml not found in site catalog — sitemap update skipped. Ensure the site-mapper is enabled.');
+        }
+      } catch (err) {
+        logger.warn(`llm-generator: Could not update sitemap with llms.txt: ${err.message}`);
+      }
+    } else {
+      if (debug) logger.info('No site URL configured — skipping sitemap update for llms.txt.');
+    }
+
+    logger.info("Generated llms.txt");
   });
 };
 
```
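STEP 8 relies on `current.replace('</sitemapindex>', ...)`. Antora's documentation describes a plain `<urlset>`, rather than an index, as the `sitemap.xml` layout for a site with a single component. For such a file the replace finds nothing and leaves the file unchanged, while the debug message still reports that the entry was added.

The sketch below is a hypothetical variant, `addLlmsToSitemap`, and is not the extension's code. It handles both shapes and, as a defensive choice, skips the insert when the entry is already present. It works on strings only, so a caller would still wrap the result in `Buffer.from(..., 'utf8')` as the original code does.

```javascript
// Hypothetical variant of STEP 8's string injection (not the extension's code).
function addLlmsToSitemap(sitemapXml, siteUrl) {
  const llmsSitemapUrl = `${siteUrl}/sitemap-llms.xml`;
  const llmsTxtUrl = `${siteUrl}/llms.txt`;

  if (sitemapXml.includes("</sitemapindex>")) {
    // Sitemap index: point at the dedicated sitemap-llms.xml file.
    if (sitemapXml.includes(`<loc>${llmsSitemapUrl}</loc>`)) return sitemapXml;
    const entry = `  <sitemap>\n    <loc>${llmsSitemapUrl}</loc>\n  </sitemap>\n`;
    // A function replacer avoids "$" patterns in the URL being interpreted.
    return sitemapXml.replace("</sitemapindex>", () => `${entry}</sitemapindex>`);
  }

  if (sitemapXml.includes("</urlset>")) {
    // Single urlset: a <sitemap> element is not valid here, so list llms.txt directly.
    if (sitemapXml.includes(`<loc>${llmsTxtUrl}</loc>`)) return sitemapXml;
    const entry = `  <url>\n    <loc>${llmsTxtUrl}</loc>\n  </url>\n`;
    return sitemapXml.replace("</urlset>", () => `${entry}</urlset>`);
  }

  return sitemapXml; // unrecognized document: leave it untouched
}
```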
```diff
@@ -339,50 +333,6 @@ function buildSectionsFromAntoraNavigation(contentCatalog, siteUrl, pagesByPath,
   return componentSections;
 }
 
-/**
- * Convert relative URLs in markdown links to absolute URLs
- *
- * LLMs need absolute URLs to properly reference pages. This function converts:
- * - Relative links: [text](page.html) -> [text](https://site.com/path/page.html)
- * - Fragment links: [text](#section) -> [text](https://site.com/page#section)
- * - Already absolute links are left unchanged
- *
- * @param {string} markdown - Markdown content with potential relative links
- * @param {string} baseUrl - Full URL of the current page (used as base for resolution)
- * @param {Object} logger - Antora logger instance
- * @returns {string} Markdown with all relative URLs converted to absolute
- */
-function convertRelativeLinksToAbsolute(markdown, baseUrl, logger) {
-  // Match markdown links: [text](url)
-  const linkRegex = /\[([^\]]*)\]\(([^)]+)\)/g;
-
-  return markdown.replace(linkRegex, (match, text, url) => {
-    // Skip if URL is already absolute (http://, https://, mailto:, ftp:, etc.)
-    if (/^[a-z][a-z0-9+.-]*:/i.test(url)) {
-      return match;
-    }
-
-    // Handle fragment-only links (e.g., #section-name)
-    if (url.startsWith('#')) {
-      // Remove any existing fragment from base URL, then append new fragment
-      const baseWithoutFragment = baseUrl.split('#')[0];
-      return `[${text}](${baseWithoutFragment}${url})`;
-    }
-
-    // Handle relative URLs (e.g., page.html, ../other.html)
-    try {
-      // Remove existing fragment from baseUrl for clean resolution
-      const baseWithoutFragment = baseUrl.split('#')[0];
-      // Use JavaScript URL API to properly resolve relative paths
-      const absoluteUrl = new URL(url, baseWithoutFragment).href;
-      return `[${text}](${absoluteUrl})`;
-    } catch (error) {
-      // Return original if resolution fails
-      return match;
-    }
-  });
-}
-
 /**
  * Recursively collect navigation items from Antora's navigation tree
  *
```
package/package.json

```diff
@@ -1,7 +1,7 @@
 {
   "name": "@pointsharp/antora-llm-generator",
   "private": false,
-  "version": "1.
+  "version": "1.2.0",
   "description": "An Antora extension to generate llms.txt files for LLM consumption following llmstxt.org specification.",
   "main": "llm-generator.js",
   "keywords": [
```
```diff
@@ -16,9 +16,5 @@
   "repository": {
     "type": "git",
     "url": "git+https://github.com/pointsharp/antora-llm-generator.git"
-  },
-  "dependencies": {
-    "minimatch": "10",
-    "node-html-markdown": "^1.3.0"
   }
 }
```
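In 1.2.0 the `dependencies` block is removed from `package.json` entirely, including `minimatch`. However, the unchanged opening lines of `llm-generator.js` still call `require("minimatch")`. The extension would then load only if `minimatch` happens to be resolvable from its install location, for example because another package in the project pulled it in.

One quick way to check is sketched below with a hypothetical `canResolve` helper. It is built on Node's `require.resolve` with the `paths` option, which locates a module without executing it.

```javascript
// Returns true when `moduleName` resolves from `fromDir` using Node's normal
// node_modules lookup; the module itself is not loaded or executed.
function canResolve(moduleName, fromDir = process.cwd()) {
  try {
    require.resolve(moduleName, { paths: [fromDir] });
    return true;
  } catch {
    return false;
  }
}
```

To test resolution from the extension's own location, run the check from the playbook project as `canResolve("minimatch", path.dirname(require.resolve("@pointsharp/antora-llm-generator")))`. This assumes `path` is `require("node:path")`.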