@pointsharp/antora-llm-generator 1.0.0

Files changed (3)
  1. package/README.md +226 -0
  2. package/llm-generator.js +470 -0
  3. package/package.json +24 -0
package/README.md ADDED
@@ -0,0 +1,226 @@
1
+ # Antora LLM Generator Extension
2
+
3
+ An [Antora](https://antora.org) extension that creates two auxiliary text files after each site build following the [llmstxt.org specification](https://llmstxt.org/):
4
+
5
+ - **`llms.txt`** - A structured index with links to all documentation pages, organized into sections
6
+ - **`llms-full.txt`** - Complete page content in markdown format with URLs for each page
7
+
8
+ Both files help large-language models ingest your documentation with proper structure, URLs, and context.
9
+
10
+ ---
11
+
12
+ ## Installation
13
+
14
+ This extension is used locally. Ensure the required dependencies are installed:
15
+
16
+ ```bash
17
+ npm install node-html-markdown minimatch
18
+ ```
19
+
20
+ ---
21
+
22
+ ## Playbook configuration
23
+
24
+ Add the extension to your `antora-playbook.yml`:
25
+
26
+ ```yaml
27
+ antora:
28
+   extensions:
29
+     - require: ./extensions/antora-llm-generator/llm-generator.js
30
+       summary: "Brief summary about your documentation site"
31
+       details: |
32
+         Optional longer description or important notes.
33
+         Can be multi-line markdown text.
34
+       skippaths:
35
+         - "someGlob/**/path"
36
+ ```
37
+
38
+ ### Configuration options
39
+
40
+ - **`summary`** - Optional. Appears as a blockquote at the top of both output files (following llmstxt.org spec).
41
+ - **`details`** - Optional. Appears as regular text after the summary in `llms.txt`. Can be multi-line markdown.
42
+ - **`skippaths`** - Optional. Array of glob patterns. Files matching these patterns are omitted from both output files.
43
+
44
+ ### Navigation-based organization (default)
45
+
46
+ By default, the extension uses your Antora navigation structure from `nav.adoc` files to organize pages in `llms.txt`. This works automatically with:
47
+
48
+ - **Native Antora navigation** - Always works
49
+ - **Navigator extension** - If you use it, the same navigation structure is used
50
+ - **Any custom navigation** - As long as it's in your `nav.adoc` files
51
+
52
+ Pages are organized by component titles as H2 sections, and the navigation hierarchy is preserved.
53
+
54
+ **Example output:**
55
+
56
+ ```markdown
57
+ # Your Site Title
58
+
59
+ > Optional summary
60
+
61
+ ## Product A
62
+
63
+ - [What is Product A?](https://example.com/product-a/latest/index.html)
64
+ - **Installation**
65
+   - [Install on Windows](https://example.com/product-a/latest/installation-windows.html)
66
+   - [Install on Linux](https://example.com/product-a/latest/installation-linux.html)
67
+
68
+ ## Product B
69
+
70
+ - [Introduction to Product B](https://example.com/product-b/index.html)
71
+ ```
72
+
73
+ ---
74
+
75
+ ## Page-level attributes
76
+
77
+ You can control how individual pages are included using AsciiDoc page attributes:
78
+
79
+ ```adoc
80
+ :page-llms-ignore: true
81
+ :page-llms-full-ignore: true
82
+ :description: Brief description of this page
83
+ ```
84
+
85
+ - **`:page-llms-ignore:`** - Omit this page from both `llms.txt` and `llms-full.txt`
86
+ - **`:page-llms-full-ignore:`** - Include in `llms.txt` but omit from `llms-full.txt`
87
+ - **`:description:`** - Standard AsciiDoc attribute used for both HTML metadata and page descriptions in `llms.txt`
88
+
89
+ ### Page descriptions
90
+
91
+ The extension uses the standard `:description:` attribute to add optional descriptions to page links in `llms.txt`, following the [llmstxt.org specification](https://llmstxt.org/):
92
+
93
+ **Without description:**
94
+ ```markdown
95
+ - [Getting Started](https://example.com/getting-started)
96
+ ```
97
+
98
+ **With description:**
99
+ ```markdown
100
+ - [Getting Started](https://example.com/getting-started): Quick start guide for new users
101
+ ```
102
+
103
+ **Example:**
104
+
105
+ ```adoc
106
+ = System Requirements
107
+ :description: This section describes the system requirements for Product A.
108
+ ```
109
+
110
+ The `:description:` attribute is a standard AsciiDoc attribute that serves dual purposes:
111
+ - **HTML metadata** - Used in `<meta name="description">` tags for SEO
112
+ - **LLM files** - Provides context for each page link in `llms.txt`
113
+
114
+ The extension automatically resolves any attribute references within the description using the standard `{attribute}` syntax, so you can compose descriptions from other attributes if needed.
115
+
116
+ ---
117
+
118
+ ## Output structure
119
+
120
+ ### llms.txt structure
121
+
122
+ Following the [llmstxt.org specification](https://llmstxt.org/), the file contains:
123
+
124
+ ```markdown
125
+ # Your Site Title
126
+
127
+ > Optional summary from playbook config
128
+
129
+ Optional details from playbook config
130
+
131
+ ## Section Name
132
+
133
+ - [Page Title](https://url): Optional description
134
+ - [Another Page](https://url)
135
+
136
+ ## Another Section
137
+
138
+ - [Page Title](https://url): Description
139
+ ```
140
+
141
+ ### llms-full.txt structure
142
+
143
+ Contains complete page content with clear URL references:
144
+
145
+ ```markdown
146
+ # Your Site Title - Complete Documentation
147
+
148
+ > Optional summary from playbook config
149
+
150
+ ---
151
+
152
+ ## Page Title
153
+
154
+ **URL:** https://example.com/page
155
+
156
+ [Page content in markdown format...]
157
+
158
+ ---
159
+
160
+ ## Another Page Title
161
+
162
+ **URL:** https://example.com/another
163
+
164
+ [Page content in markdown format...]
165
+ ```
166
+
167
+ ---
168
+
169
+ ## Building the site
170
+
171
+ Run your Antora build as usual:
172
+
173
+ ```bash
174
+ antora antora-playbook.yml
175
+ ```
176
+
177
+ After completion, four files appear in the build output directory:
178
+ - `llms.txt` and `llm.txt` (both contain the structured index with links)
179
+ - `llms-full.txt` and `llm-full.txt` (both contain complete page content)
180
+
181
+ The duplicate filenames ensure compatibility with different naming conventions.
182
+
183
+ ---
184
+
185
+ ## Example configuration
186
+
187
+ Here's a complete example following the [llmstxt.org](https://llmstxt.org/) best practices:
188
+
189
+ ```yaml
190
+ antora:
191
+   extensions:
192
+     - require: ./extensions/antora-llm-generator/llm-generator.js
193
+       summary: "Comprehensive documentation for the Acme API, including authentication, endpoints, and best practices."
194
+       details: |
195
+         Important notes:
196
+
197
+         - All API endpoints require authentication via API key
198
+         - Rate limits apply to all endpoints (1000 requests/hour)
199
+         - WebSocket connections are available for real-time updates
200
+       skippaths:
201
+         - "admin/**"
202
+         - "**/internal/**"
203
+ ```
204
+
205
+ **This produces:** The summary appears as a blockquote at the top of both `llms.txt` and `llms-full.txt`. The details follow it as regular text in `llms.txt`, giving LLMs important context before they see the documentation links.
206
+
207
+ ---
208
+
209
+ ## Tips for effective llms.txt files
210
+
211
+ Following [llmstxt.org guidelines](https://llmstxt.org/):
212
+
213
+ - **Use concise, clear language** in summaries and descriptions
214
+ - **Include brief, informative descriptions** for important pages
215
+ - **Avoid ambiguous terms** or unexplained jargon
216
+ - **Use the "Optional" section** for secondary information that can be skipped for shorter context
217
+ - **Organize related pages** into logical sections
218
+ - **Test with actual LLMs** to ensure they can answer questions about your content
219
+
220
+ ---
221
+
222
+ ## Compatibility
223
+
224
+ This extension follows the official [llmstxt.org specification](https://llmstxt.org/) for maximum compatibility with LLM tools and agents.
225
+
226
+ For more details about the specification, visit [https://llmstxt.org/](https://llmstxt.org/).
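Reviewer note: the README's description of navigation-based organization and page descriptions can be illustrated with a minimal sketch. It is not the package's code. The navigation tree, the `attributesByUrl` map, and the helper names `resolveAttributes` and `toIndexLines` are all hypothetical stand-ins, and the sketch assumes two spaces of indentation per nesting level.

```javascript
"use strict";

// Hypothetical navigation tree: entries with `url` are pages,
// entries without `url` but with children are section headers.
const navItems = [
  { content: "What is Product A?", url: "/product-a/latest/index.html" },
  {
    content: "Installation",
    items: [
      { content: "Install on Windows", url: "/product-a/latest/installation-windows.html" },
    ],
  },
];

// Hypothetical per-page attributes keyed by URL (a stand-in for AsciiDoc page attributes).
const attributesByUrl = new Map([
  [
    "/product-a/latest/index.html",
    { description: "Overview of {product-name}.", "product-name": "Product A" },
  ],
]);

// Replace {name} references with attribute values; unknown names are left untouched.
function resolveAttributes(text, attributes) {
  return text.replace(/\{([^}]+)\}/g, (match, name) => attributes[name] || match);
}

// Flatten the tree into markdown list lines, indenting two spaces per level.
function toIndexLines(items, siteUrl, indent = "") {
  const lines = [];
  for (const item of items) {
    const hasChildren = Array.isArray(item.items) && item.items.length > 0;
    if (item.url) {
      const attrs = attributesByUrl.get(item.url) || {};
      const description = attrs.description ? resolveAttributes(attrs.description, attrs) : "";
      const link = `${indent}- [${item.content}](${siteUrl}${item.url})`;
      lines.push(description ? `${link}: ${description}` : link);
    } else if (item.content && hasChildren) {
      lines.push(`${indent}- **${item.content}**`);
    }
    if (hasChildren) {
      lines.push(...toIndexLines(item.items, siteUrl, indent + "  "));
    }
  }
  return lines;
}

const indexLines = toIndexLines(navItems, "https://example.com");
console.log(indexLines.join("\n"));
```

Running the sketch prints a three-line list: a described link, a bold section header, and one nested link indented by two spaces.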
package/llm-generator.js ADDED
@@ -0,0 +1,470 @@
1
+ "use strict";
2
+
3
+ const { NodeHtmlMarkdown } = require("node-html-markdown");
4
+ const { minimatch } = require("minimatch");
5
+
6
+ // HTML to Markdown converter for page content
7
+ const nhm = new NodeHtmlMarkdown();
8
+
9
+ // UTF-8 BOM (Byte Order Mark) prepended to output files
10
+ // This helps browsers and editors recognize the file as UTF-8 encoded
11
+ const UTF8_BOM = '\uFEFF';
12
+
13
+ /**
14
+ * Antora Extension: LLM Text Generator
15
+ *
16
+ * Generates two files following the llmstxt.org specification:
17
+ * - llms.txt: Navigation index with page links and descriptions
18
+ * - llms-full.txt: Complete page content in markdown format
19
+ *
20
+ * Both files help LLMs understand and answer questions about your documentation.
21
+ *
22
+ * Configuration options (in playbook):
23
+ * - summary: Optional blockquote at the top of both files
24
+ * - details: Optional text after summary
25
+ * - skippaths: Array of glob patterns to exclude pages
26
+ */
27
+ module.exports.register = function (context, { config }) {
28
+ const logger = context.getLogger("antora-llm-generator");
29
+ const { playbook } = context.getVariables();
30
+ const siteTitle = playbook.site?.title || "Documentation";
31
+ const siteUrl = playbook.site?.url || ""; // Fall back to root-relative links when site.url is unset
32
+
33
+ // Extract configuration options
34
+ const skipPaths = config.skippaths || [];
35
+ const summary = config.summary || null;
36
+ const details = config.details || null;
37
+
38
+ logger.info(`Skip paths: ${JSON.stringify(skipPaths)}`);
39
+
40
+ // Helper: Check if a page path matches any skip pattern
41
+ const shouldSkipPath = (path) => {
42
+ return skipPaths.some((pattern) => minimatch(path, pattern));
43
+ };
44
+
45
+ context.on("beforePublish", ({ contentCatalog, siteCatalog }) => {
46
+ logger.info("Assembling content for LLM text files using Antora navigation.");
47
+
48
+ // =============================================================================
49
+ // STEP 1: Detect navigation source
50
+ // =============================================================================
51
+ // Two navigation sources are supported:
52
+ // 1. Antora native navigation (from nav.adoc files)
53
+ // 2. Navigator extension (if installed, creates site-navigation-data.js)
54
+ const navDataFile = siteCatalog.getFiles().find(f => f.out?.path === 'site-navigation-data.js');
55
+ const useNavigatorData = !!navDataFile;
56
+
57
+ if (useNavigatorData) {
58
+ logger.info("Found site-navigation-data.js - will use navigator extension data");
59
+ } else {
60
+ logger.info("Navigator extension not detected - will use Antora native navigation");
61
+ }
62
+
63
+ // =============================================================================
64
+ // STEP 2: Initialize output file content
65
+ // =============================================================================
66
+
67
+ // llms.txt: Navigation index with page links
68
+ let indexContent = `# ${siteTitle}\n`;
69
+
70
+ // Add optional summary as blockquote (llmstxt.org spec)
71
+ if (summary) {
72
+ indexContent += `\n> ${summary}\n`;
73
+ }
74
+
75
+ // Add optional details as regular text
76
+ if (details) {
77
+ indexContent += `\n${details}\n`;
78
+ }
79
+
80
+ // llms-full.txt: Complete page content in markdown
81
+ let fullContent = `# ${siteTitle} - Complete Documentation\n\n`;
82
+ if (summary) {
83
+ fullContent += `> ${summary}\n\n`;
84
+ }
85
+
86
+ // =============================================================================
87
+ // STEP 3: Build page index
88
+ // =============================================================================
89
+ // Create a Map for fast page lookup by URL path
90
+ // Key: page.out.path (e.g., "net-id-client/1.3/index.html")
91
+ // Value: page object with contents, attributes, etc.
92
+ const pagesByPath = new Map();
93
+ const pages = contentCatalog.findBy({ family: "page" });
94
+
95
+ for (const page of pages) {
96
+ // Skip pages without output path
97
+ if (!page.out) continue;
98
+
99
+ // Skip pages matching skippath patterns from config
100
+ if (shouldSkipPath(page.out.path)) {
101
+ logger.info(`Skipping page matching skip pattern: ${page.out.path}`);
102
+ continue;
103
+ }
104
+
105
+ // Skip pages with :page-llms-ignore: attribute
106
+ if (page.asciidoc.attributes["page-llms-ignore"]) {
107
+ logger.info(`Skipping page with 'page-llms-ignore' attribute: ${page.src.path}`);
108
+ continue;
109
+ }
110
+
111
+ pagesByPath.set(page.out.path, page);
112
+ }
113
+
114
+ logger.info(`Built pagesByPath map with ${pagesByPath.size} pages`);
115
+
116
+ // =============================================================================
117
+ // STEP 4: Build navigation sections
118
+ // =============================================================================
119
+ // Returns Map: sectionName -> array of formatted navigation items
120
+ // Each item is a markdown string: "- [Page Title](url): Optional description"
121
+ let componentSections;
122
+ if (useNavigatorData) {
123
+ // Use navigator extension's site-navigation-data.js
124
+ logger.info("Using navigator extension data");
125
+ componentSections = buildSectionsFromNavigatorData(navDataFile, siteUrl, pagesByPath, logger);
126
+ } else {
127
+ // Use Antora's native navigation structure from component versions
128
+ logger.info("Using Antora native navigation");
129
+ componentSections = buildSectionsFromAntoraNavigation(contentCatalog, siteUrl, pagesByPath, logger);
130
+ }
131
+
132
+ logger.info(`Component sections map has ${componentSections.size} sections`);
133
+ for (const [name, items] of componentSections.entries()) {
134
+ logger.info(`Section "${name}" has ${items.length} items`);
135
+ }
136
+
137
+ // =============================================================================
138
+ // STEP 5: Build llms.txt navigation index
139
+ // =============================================================================
140
+ // Format: H2 sections with bulleted lists of page links
141
+ for (const [sectionName, items] of componentSections.entries()) {
142
+ if (items.length > 0) {
143
+ indexContent += `\n## ${sectionName}\n\n`;
144
+ items.forEach(item => {
145
+ indexContent += `${item}\n`;
146
+ });
147
+ }
148
+ }
149
+
150
+ // =============================================================================
151
+ // STEP 6: Build llms-full.txt complete page content
152
+ // =============================================================================
153
+ // For each page: convert HTML to markdown, resolve relative links, add URL header
154
+ for (const [path, page] of pagesByPath.entries()) {
155
+ // Skip pages with :page-llms-full-ignore: attribute
156
+ if (page.asciidoc.attributes["page-llms-full-ignore"]) {
157
+ logger.info(`Skipping page from full content with 'page-llms-full-ignore' attribute: ${page.src.path}`);
158
+ continue;
159
+ }
160
+
161
+ const pageUrl = `${siteUrl}/${path}`;
162
+
163
+ // Convert HTML content to markdown (decode as UTF-8)
164
+ const htmlContent = page.contents.toString('utf8');
165
+ let plainText = nhm.translate(htmlContent);
166
+
167
+ // Convert relative URLs to absolute (e.g., "page.html" -> "https://site.com/path/page.html")
168
+ plainText = convertRelativeLinksToAbsolute(plainText, pageUrl, logger);
169
+
170
+ // Add page section with title, URL, optional description, and content
171
+ fullContent += `\n---\n\n`;
172
+ fullContent += `## ${page.title}\n\n`;
173
+ fullContent += `**URL:** ${pageUrl}\n\n`;
174
+
175
+ // Include :description: attribute if present
176
+ if (page.asciidoc.attributes["description"]) {
177
+ fullContent += `*${page.asciidoc.attributes["description"]}*\n\n`;
178
+ }
179
+
180
+ fullContent += plainText;
181
+ fullContent += `\n`;
182
+ }
183
+
184
+ // =============================================================================
185
+ // STEP 7: Write output files
186
+ // =============================================================================
187
+ // Create 4 files total (both singular and plural forms for compatibility):
188
+ // - llms-full.txt / llm-full.txt: Complete page content
189
+ // - llms.txt / llm.txt: Navigation index
190
+ //
191
+ // All files are encoded as UTF-8 with BOM for proper character display
192
+ siteCatalog.addFile({
193
+ out: { path: "llms-full.txt" },
194
+ contents: Buffer.from(UTF8_BOM + fullContent, 'utf8'),
195
+ });
196
+
197
+ siteCatalog.addFile({
198
+ out: { path: "llm-full.txt" },
199
+ contents: Buffer.from(UTF8_BOM + fullContent, 'utf8'),
200
+ });
201
+
202
+ siteCatalog.addFile({
203
+ out: { path: "llm.txt" },
204
+ contents: Buffer.from(UTF8_BOM + indexContent, 'utf8'),
205
+ });
206
+
207
+ siteCatalog.addFile({
208
+ out: { path: "llms.txt" },
209
+ contents: Buffer.from(UTF8_BOM + indexContent, 'utf8'),
210
+ });
211
+
212
+ logger.info("llms.txt and llms-full.txt files have been generated successfully.");
213
+ });
214
+ };
215
+
216
+ /**
217
+ * Build sections from navigator extension's site-navigation-data.js
218
+ *
219
+ * The navigator extension creates a JavaScript file with navigation data in this format:
220
+ * siteNavigationData=[{name:"component",title:"Title",versions:[...]}]
221
+ *
222
+ * @param {Object} navDataFile - The site-navigation-data.js file from siteCatalog
223
+ * @param {string} siteUrl - Base URL of the site
224
+ * @param {Map} pagesByPath - Map of page.out.path -> page object
225
+ * @param {Object} logger - Antora logger instance
226
+ * @returns {Map} Map of sectionName -> array of formatted navigation items
227
+ */
228
+ function buildSectionsFromNavigatorData(navDataFile, siteUrl, pagesByPath, logger) {
229
+ const componentSections = new Map();
230
+
231
+ try {
232
+ // Parse the JavaScript file to extract JSON data
233
+ const navContent = navDataFile.contents.toString();
234
+
235
+ // Extract JSON array from: siteNavigationData=[...]
236
+ const jsonMatch = navContent.match(/siteNavigationData=(\[.*\])/);
237
+ if (!jsonMatch) {
238
+ logger.warn("Could not parse site-navigation-data.js format");
239
+ return componentSections;
240
+ }
241
+
242
+ const siteNavigationData = JSON.parse(jsonMatch[1]);
243
+ logger.info(`Parsed ${siteNavigationData.length} components from navigator data`);
244
+
245
+ // Process each component
246
+ for (const component of siteNavigationData) {
247
+ const componentTitle = component.title || component.name;
248
+
249
+ // Initialize section array for this component
250
+ if (!componentSections.has(componentTitle)) {
251
+ componentSections.set(componentTitle, []);
252
+ }
253
+
254
+ const sectionArray = componentSections.get(componentTitle);
255
+
256
+ // Process all versions (e.g., 1.0, 1.1, latest)
257
+ for (const version of component.versions || []) {
258
+ // Process all navigation sets within version
259
+ for (const set of version.sets || []) {
260
+ if (set.items && set.items.length > 0) {
261
+ // Recursively collect navigation items
262
+ const counts = collectNavItems(set.items, sectionArray, siteUrl, pagesByPath, logger);
263
+ logger.debug(`Component ${component.name}: collected ${counts.found} items`);
264
+ }
265
+ }
266
+ }
267
+ }
268
+ } catch (error) {
269
+ logger.error(`Error parsing navigator data: ${error.message}`);
270
+ }
271
+
272
+ return componentSections;
273
+ }
274
+
275
+ /**
276
+ * Build sections using Antora's native navigation structure
277
+ *
278
+ * Antora stores parsed navigation in version.navigation array after processing nav.adoc files.
279
+ * Each component version can have multiple navigation trees (if multiple nav files exist).
280
+ *
281
+ * Structure:
282
+ * - component.versions[] -> each version has:
283
+ * - version.navigation[] -> array of nav trees, each has:
284
+ * - navTree.items[] -> recursive navigation items
285
+ *
286
+ * @param {Object} contentCatalog - Antora content catalog
287
+ * @param {string} siteUrl - Base URL of the site
288
+ * @param {Map} pagesByPath - Map of page.out.path -> page object
289
+ * @param {Object} logger - Antora logger instance
290
+ * @returns {Map} Map of sectionName -> array of formatted navigation items
291
+ */
292
+ function buildSectionsFromAntoraNavigation(contentCatalog, siteUrl, pagesByPath, logger) {
293
+ const componentSections = new Map();
294
+ const components = contentCatalog.getComponents();
295
+
296
+ for (const component of components) {
297
+ for (const version of component.versions) {
298
+ const componentName = version.name;
299
+ const componentTitle = version.title || componentName;
300
+ const versionString = version.version;
301
+
302
+ logger.info(`Processing component: ${componentName} (${versionString || 'unversioned'})`);
303
+
304
+ // Check if version has parsed navigation tree
305
+ // (version.navigation is populated by Antora after parsing nav.adoc files)
306
+ if (!version.navigation || !Array.isArray(version.navigation) || version.navigation.length === 0) {
307
+ logger.debug(`No navigation tree found for ${componentName} ${versionString || 'unversioned'}`);
308
+ continue;
309
+ }
310
+
311
+ // Use component title as section name (H2 heading in llms.txt)
312
+ const sectionName = componentTitle;
313
+ if (!componentSections.has(sectionName)) {
314
+ componentSections.set(sectionName, []);
315
+ }
316
+
317
+ // Process each navigation tree (a component version can have multiple nav files)
318
+ const sectionArray = componentSections.get(sectionName);
319
+ for (const navTree of version.navigation) {
320
+ if (navTree.items && navTree.items.length > 0) {
321
+ // Recursively collect all navigation items
322
+ const counts = collectNavItems(
323
+ navTree.items,
324
+ sectionArray,
325
+ siteUrl,
326
+ pagesByPath,
327
+ logger
328
+ );
329
+ logger.info(`Section "${sectionName}": collected ${counts.found} pages from navigation`);
330
+ }
331
+ }
332
+ }
333
+ }
334
+
335
+ return componentSections;
336
+ }
337
+
338
+ /**
339
+ * Convert relative URLs in markdown links to absolute URLs
340
+ *
341
+ * LLMs need absolute URLs to properly reference pages. This function converts:
342
+ * - Relative links: [text](page.html) -> [text](https://site.com/path/page.html)
343
+ * - Fragment links: [text](#section) -> [text](https://site.com/page#section)
344
+ * - Already absolute links are left unchanged
345
+ *
346
+ * @param {string} markdown - Markdown content with potential relative links
347
+ * @param {string} baseUrl - Full URL of the current page (used as base for resolution)
348
+ * @param {Object} logger - Antora logger instance
349
+ * @returns {string} Markdown with all relative URLs converted to absolute
350
+ */
351
+ function convertRelativeLinksToAbsolute(markdown, baseUrl, logger) {
352
+ // Match markdown links: [text](url)
353
+ const linkRegex = /\[([^\]]*)\]\(([^)]+)\)/g;
354
+
355
+ return markdown.replace(linkRegex, (match, text, url) => {
356
+ // Skip if URL is already absolute (http://, https://, mailto:, ftp:, etc.)
357
+ if (/^[a-z][a-z0-9+.-]*:/i.test(url)) {
358
+ return match;
359
+ }
360
+
361
+ // Handle fragment-only links (e.g., #section-name)
362
+ if (url.startsWith('#')) {
363
+ // Remove any existing fragment from base URL, then append new fragment
364
+ const baseWithoutFragment = baseUrl.split('#')[0];
365
+ return `[${text}](${baseWithoutFragment}${url})`;
366
+ }
367
+
368
+ // Handle relative URLs (e.g., page.html, ../other.html)
369
+ try {
370
+ // Remove existing fragment from baseUrl for clean resolution
371
+ const baseWithoutFragment = baseUrl.split('#')[0];
372
+ // Use JavaScript URL API to properly resolve relative paths
373
+ const absoluteUrl = new URL(url, baseWithoutFragment).href;
374
+ return `[${text}](${absoluteUrl})`;
375
+ } catch (error) {
376
+ logger.debug(`Failed to resolve relative URL "${url}" against base "${baseUrl}": ${error.message}`);
377
+ return match; // Return original if resolution fails
378
+ }
379
+ });
380
+ }
381
+
382
+ /**
383
+ * Recursively collect navigation items from Antora's navigation tree
384
+ *
385
+ * Navigation items can be:
386
+ * - Links: Have both url and content -> formatted as markdown links
387
+ * - Section headers: Have content but no url -> formatted as bold text
388
+ * - Nested items: Have items[] array -> processed recursively with increased indentation
389
+ *
390
+ * The function preserves navigation hierarchy through indentation:
391
+ * - Top-level items: "- [Page](url)"
392
+ * - Nested items: "  - [Page](url)"
393
+ * - Deeply nested: "    - [Page](url)"
394
+ *
395
+ * @param {Array} items - Array of navigation items to process
396
+ * @param {Array} collector - Array to collect formatted navigation strings
397
+ * @param {string} siteUrl - Base URL of the site
398
+ * @param {Map} pagesByPath - Map of page.out.path -> page object
399
+ * @param {Object} logger - Antora logger instance
400
+ * @param {string} indent - Current indentation level (increases with nesting)
401
+ * @returns {Object} Count object: { found: number, notFound: number }
402
+ */
403
+ function collectNavItems(items, collector, siteUrl, pagesByPath, logger, indent = '') {
404
+ let foundCount = 0;
405
+ let notFoundCount = 0;
406
+
407
+ for (const item of items) {
408
+ // Process navigation items with URLs (actual page links)
409
+ if (item.url) {
410
+ // Remove leading slash from URL for pagesByPath lookup
411
+ // item.url format: "/component/version/page.html"
412
+ // pagesByPath key format: "component/version/page.html"
413
+ const urlPath = item.url.replace(/^\//, '');
414
+ const page = pagesByPath.get(urlPath);
415
+
416
+ if (page) {
417
+ foundCount++;
418
+
419
+ // Get page description from :description: attribute
420
+ let description = page.asciidoc.attributes["description"];
421
+
422
+ // Resolve attribute references in description (e.g., {product-name})
423
+ if (description) {
424
+ description = description.replace(/\{([^}]+)\}/g, (match, attrName) => {
425
+ return page.asciidoc.attributes[attrName] || match;
426
+ });
427
+ }
428
+
429
+ // Build full URL (item.url is relative to site root)
430
+ const pageUrl = `${siteUrl}${item.url}`;
431
+
432
+ // Format as markdown link with optional description
433
+ // Format: "- [Page Title](url): Description" or "- [Page Title](url)"
434
+ const link = description
435
+ ? `${indent}- [${item.content}](${pageUrl}): ${description}`
436
+ : `${indent}- [${item.content}](${pageUrl})`;
437
+
438
+ collector.push(link);
439
+ } else {
440
+ // URL in navigation doesn't match any page in pagesByPath
441
+ notFoundCount++;
442
+ logger.debug(`Navigation item URL not found in pages: ${urlPath}`);
443
+ }
444
+ }
445
+
446
+ // Process nested navigation items (recursive)
447
+ if (item.items && item.items.length > 0) {
448
+ // If this is a section header (has content but no URL), add it as bold text
449
+ if (!item.url && item.content) {
450
+ collector.push(`${indent}- **${item.content}**`);
451
+ }
452
+
453
+ // Recursively process nested items with increased indentation
454
+ const nestedCounts = collectNavItems(
455
+ item.items,
456
+ collector,
457
+ siteUrl,
458
+ pagesByPath,
459
+ logger,
460
+ indent + '  ' // Add 2 spaces for each nesting level
461
+ );
462
+
463
+ // Accumulate counts from nested items
464
+ foundCount += nestedCounts.found;
465
+ notFoundCount += nestedCounts.notFound;
466
+ }
467
+ }
468
+
469
+ return { found: foundCount, notFound: notFoundCount };
470
+ }
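Reviewer note: the link handling in `convertRelativeLinksToAbsolute` relies on the WHATWG `URL` constructor built into Node.js. The sketch below isolates that resolution step for a single link target. The helper name `resolveLink` and the example URLs are hypothetical; only the `URL` constructor and the scheme test are taken from the diff.

```javascript
"use strict";

// Resolve one link target against the URL of the page that contains it.
// Targets that already carry a scheme (https:, mailto:, ...) pass through unchanged.
function resolveLink(target, pageUrl) {
  if (/^[a-z][a-z0-9+.-]*:/i.test(target)) return target;
  const base = pageUrl.split("#")[0]; // drop any fragment from the base
  try {
    return new URL(target, base).href;
  } catch {
    return target; // an invalid base leaves the link as written
  }
}

const pageUrl = "https://example.com/product-a/latest/index.html";
const resolved = {
  sibling: resolveLink("installation-linux.html", pageUrl),
  parent: resolveLink("../other.html", pageUrl),
  fragment: resolveLink("#requirements", pageUrl),
  absolute: resolveLink("mailto:docs@example.com", pageUrl),
  badBase: resolveLink("page.html", "undefined/index.html"),
};
console.log(resolved);
```

The `badBase` case mirrors what happens when the base is not an absolute URL, for example when the playbook has no `site.url`: the constructor throws and the link is returned unchanged, so such links stay relative in `llms-full.txt`.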
package/package.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "name": "@pointsharp/antora-llm-generator",
3
+ "private": false,
4
+ "version": "1.0.0",
5
+ "description": "An Antora extension to generate llms.txt files for LLM consumption following llmstxt.org specification.",
6
+ "main": "llm-generator.js",
7
+ "keywords": [
8
+ "antora",
9
+ "antora-extension",
10
+ "llm",
11
+ "llmstxt",
12
+ "rag"
13
+ ],
14
+ "author": "Pointsharp AB",
15
+ "license": "Apache-2.0",
16
+ "repository": {
17
+ "type": "git",
18
+ "url": "git+https://github.com/pointsharp/antora-llm-generator.git"
19
+ },
20
+ "dependencies": {
21
+ "minimatch": "10",
22
+ "node-html-markdown": "^1.3.0"
23
+ }
24
+ }