npm - @voicenter-team/nuxt-llms-generator - Versions diffs - 0.1.9 → 0.1.11 - Mend

@voicenter-team/nuxt-llms-generator 0.1.9 → 0.1.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md +625 -625
package/dist/chunks/llms-files-generator.mjs +313 -115
package/dist/module.json +1 -1
package/package.json +63 -63

package/dist/chunks/llms-files-generator.mjs CHANGED

@@ -1,6 +1,6 @@
 import { existsSync, mkdirSync, readFileSync, writeFileSync, unlinkSync } from 'fs';
 import { join, dirname, basename } from 'path';
-import { transliterate } from 'transliteration';
+import { slugify } from 'transliteration';
 import Mustache from 'mustache';
 import Anthropic from '@anthropic-ai/sdk';
 import { createHash } from 'crypto';
@@ -17,10 +17,72 @@ You are an expert at creating **Mustache.js templates** that generate **LLM know
 ---
+## \u26A0\uFE0F CRITICAL RULES - NEVER VIOLATE
+### 1. DATA-DRIVEN CONTENT ONLY
+- **EVERY piece of content** must come from a Mustache binding: \`{{propertyName}}\`
+- **NEVER invent, assume, or add content** that doesn't exist in the provided JSON
+- **NO hardcoded descriptions, lists, or facts**
+- If a property doesn't exist in JSON, don't create a section for it
+### 2. ALLOWED CONTEXTUAL ADDITIONS
+You MAY add:
+- **Section headings** that describe what the data represents (e.g., "Key Features", "Technical Details")
+- **Brief introductory phrases** that set context (e.g., "The following items are available:")
+- **Structural markers** for clarity (e.g., "Navigation:", "Metadata:")
+You MAY NOT add:
+- Descriptions of features/benefits not in JSON
+- Explanatory text about what something does
+- Lists of items not present in data
+- Assumptions about the page purpose
+### 3. EXAMPLES OF VIOLATIONS
+\u274C **BAD - Hardcoded content:**
+\`\`\`mustache
+## Key Benefits
+- Real-time monitoring
+- Detailed analytics
+- Easy to use
+\`\`\`
+*Problem: These benefits are invented, not from JSON*
+\u274C **BAD - Invented descriptions:**
+\`\`\`mustache
+This dashboard provides comprehensive monitoring capabilities for call centers...
+\`\`\`
+*Problem: Description is made up*
+\u2705 **GOOD - Data-driven with context:**
+\`\`\`mustache
+{{#features.0}}
+## Available Features
+{{#features}}
+- **{{name}}**: {{description}}
+{{/features}}
+{{/features.0}}
+\`\`\`
+*Good: Content comes from JSON, heading provides context*
+\u2705 **GOOD - Minimal introduction:**
+\`\`\`mustache
+{{#items.0}}
+## Items Overview
+The following items are available:
+{{#items}}
+- {{title}}
+{{/items}}
+{{/items.0}}
+\`\`\`
+*Good: Brief intro, but content is from JSON*
+---
 ## \u{1F3AF} TRUE PURPOSE: Help LLMs Answer Questions Efficiently
 **Critical Understanding:**
-These \`.md\` files are **NOT website copies** \u2014 they are **LLM knowledge base entries** designed for **inference** (understanding), not training.
+These \`.md\` files are **LLM knowledge base entries** designed for **inference** (understanding), not training.
 **Primary Goal:** Enable LLMs to quickly answer user questions about this website page within **limited context windows** (typically 200K tokens).
@@ -47,85 +109,81 @@ ${JSON.stringify(request.pageContent, null, 2)}
 ## \u{1F9E0} Content Philosophy: Think "Knowledge Base Entry"
-### 1. Start with Expert-Level Summary
-- **First impression matters:** What would an expert say about this page in 1-2 sentences?
-- Lead with **value proposition** or **core purpose**
-- Use the blockquote format (\`> \`) for the summary \u2014 this signals importance
+### 1. Start with the Most Important Data
+- Lead with title/heading properties
+- Add main description/summary if available
+- Use blockquote (\`> \`) for key summaries
 ### 2. Structure for Question-Answering
 Anticipate questions an LLM might need to answer:
-- "What is this?" \u2192 Main heading + summary
-- "What does it do/offer?" \u2192 Key features/benefits section
-- "Who is it for?" \u2192 Target audience/use cases
-- "How does it work?" \u2192 Process/methodology
-- "What are the details?" \u2192 Technical specs/pricing/etc.
+- "What is this?" \u2192 Main heading + description properties
+- "What does it offer?" \u2192 Lists of items/features from JSON
+- "Who is it for?" \u2192 Target audience properties (if they exist)
+- "What are the details?" \u2192 Technical/metadata properties
-### 3. Prioritize Information by Importance
+### 3. Prioritize by JSON Structure
 **Essential First:**
-- What this page represents
-- Primary value/purpose
-- Key differentiators
+- Root-level title/name/heading properties
+- Description/summary properties
+- Main content arrays
 **Supporting Details Second:**
-- Features, benefits, specifications
-- Use cases, examples
-- Technical details
+- Feature lists, item arrays
+- Nested objects with details
+- Links and references
-**Peripheral Information Last:**
-- Meta information, related links
-- Supplementary context
+**Metadata Last:**
+- URLs, IDs (if useful for context)
+- Timestamps, technical details
 ### 4. Optimize for Scanability
-- Use **hierarchical headings** (\`#\`, \`##\`, \`###\`) to create clear structure
-- Employ **bullet lists** for scannable facts
-- Keep paragraphs **short and dense** (2-3 sentences max)
-- Use **semantic Markdown** only \u2014 no HTML, entities, or attributes
+- Use **hierarchical headings** (\`#\`, \`##\`, \`###\`)
+- Employ **bullet lists** for arrays
+- Keep structure **clean and semantic**
+- Use Markdown only (no HTML)
 ---
 ## \u{1F527} Technical Principles (Key-Agnostic Design)
 ### 1. Dynamic Property Inference
-**Do not assume fixed property names.** Infer content type and importance from:
+**Do not assume fixed property names.** Infer content type from:
 - **Value structure:** Object, array, string, number
 - **Value length:** Short strings = titles; long text = descriptions
-- **Position in JSON:** Root-level = high importance; nested = contextual details
-- **Semantic patterns:** URLs, images, dates, IDs
+- **Position in JSON:** Root-level = high importance
+- **Semantic patterns:** URLs, images, dates
 ### 2. Exact Property Bindings
-- Always use the **exact property name** from JSON: \`{{actualKeyName}}\`
+- Always use **exact property name** from JSON: \`{{actualKeyName}}\`
 - Do NOT rename or modify binding identifiers
-- The Mustache bindings must match JSON precisely
+- Mustache bindings must match JSON precisely
 ### 3. Humanized Section Headings
 While bindings stay exact, convert keys to readable headings:
 - \`productFeatures\` \u2192 "Product Features"
-- \`pricing_tiers\` \u2192 "Pricing Tiers"
-- \`techSpecs\` \u2192 "Technical Specifications"
+- \`supportPageItems\` \u2192 "Available Support Topics"
+- \`breadcrumbsLinks\` \u2192 "Navigation Path"
 ### 4. Semantic Interpretation Guide
 - **Short root strings (5-50 chars)** \u2192 Likely page title
 - **Medium text (50-300 chars)** \u2192 Likely summary/tagline
 - **Long text (300+ chars)** \u2192 Likely detailed description
+- **Arrays of objects** \u2192 Repeated sections with structure
 - **Arrays of primitives** \u2192 Bullet lists
-- **Arrays of objects** \u2192 Repeated sections or tables
-- **Nested objects** \u2192 Sub-sections with logical hierarchy
 - **URL-like strings** \u2192 Render as \`[Label]({{url}})\`
-- **Image URLs** \u2192 Render as \`![Description]({{imageUrl}})\`
 ### 5. Noise Filtering
-**Exclude non-content fields:**
-- IDs (\`id\`, \`nodeId\`, \`_id\`)
-- Timestamps (\`createdAt\`, \`updatedAt\`, \`lastModified\`)
-- Internal flags (\`isPublished\`, \`sortOrder\`, \`hidden\`)
-- System metadata (\`_type\`, \`contentType\`, \`template\`)
+**Exclude technical metadata:**
+- IDs: \`id\`, \`nodeId\`, \`_id\`, \`guid\`
+- Timestamps: \`createdAt\`, \`updatedAt\`
+- Flags: \`isPublished\`, \`sortOrder\`, \`hidden\`
+- System: \`_type\`, \`contentType\`, \`template\`
 ### 6. Hierarchy & Nesting
 - **Root level** \u2192 \`#\` (H1) \u2014 one per document
 - **Primary sections** \u2192 \`##\` (H2)
 - **Sub-sections** \u2192 \`###\` (H3)
-- **Details** \u2192 \`####\` (H4) \u2014 avoid going deeper
-- Heading depth corresponds to JSON nesting, but stay practical
+- **Details** \u2192 \`####\` (H4) \u2014 avoid deeper
 ---
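The "Semantic Interpretation Guide" in the revised prompt is written for the model, but the same heuristics can be expressed as code, which makes the thresholds easier to reason about. The sketch below is hypothetical: the `inferRole` name, the exact boundary handling and the URL test are assumptions, because the prompt only gives approximate ranges (5-50 and 50-300 characters). In the code shown in this diff, this value-based inference is delegated to the model through the prompt.

```javascript
// Hypothetical helper: applies the prompt's "Semantic Interpretation Guide"
// to one JSON value. Boundaries (<5 chars, exactly 50 or 300) and the
// URL test are assumptions; the prompt only gives approximate ranges.
function inferRole(value) {
  if (Array.isArray(value)) {
    if (value.length === 0) return "empty";
    const allObjects = value.every((item) => item !== null && typeof item === "object");
    return allObjects ? "repeated sections" : "bullet list";
  }
  if (typeof value !== "string") return "other";
  if (/^(https?:\/\/|\/)/.test(value)) return "link"; // rendered as [Label]({{url}})
  if (value.length < 5) return "other";
  if (value.length <= 50) return "title";
  if (value.length <= 300) return "summary";
  return "description";
}

console.log(inferRole("Support Center"));              // title
console.log(inferRole("https://example.com/pricing")); // link
console.log(inferRole(["Hebrew", "English"]));         // bullet list
console.log(inferRole([{ title: "Billing" }]));        // repeated sections
console.log(inferRole("x".repeat(400)));               // description
```

Checking arrays before strings matters: an array of short strings should become a bullet list, not be mistaken for a title.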
@@ -133,66 +191,73 @@ While bindings stay exact, convert keys to readable headings:
 ### Mandatory Opening
 \`\`\`mustache
-# {{primaryTitle}}
+# {{primaryTitleProperty}}
-{{#summaryOrTagline}}
-> {{summaryOrTagline}}
-{{/summaryOrTagline}}
+{{#summaryProperty}}
+> {{summaryProperty}}
+{{/summaryProperty}}
 \`\`\`
-### Recommended Sections (adapt to JSON)
+### Recommended Sections (adapt to actual JSON)
 \`\`\`mustache
 {{#mainDescription}}
+## Overview
 {{mainDescription}}
 {{/mainDescription}}
-{{#keyFeatures.0}}
-## Key Features
-{{#keyFeatures}}
-- **{{featureName}}**: {{featureDescription}}
-{{/keyFeatures}}
-{{/keyFeatures.0}}
-{{#useCases.0}}
-## Use Cases
-{{#useCases}}
-### {{caseTitle}}
-{{caseDescription}}
-{{/useCases}}
-{{/useCases.0}}
-{{#technicalDetails.0}}
-## Technical Details
-{{#technicalDetails}}
-- **{{detailLabel}}**: {{detailValue}}
-{{/technicalDetails}}
-{{/technicalDetails.0}}
+{{#itemsArray.0}}
+## Available Items
+{{#itemsArray}}
+### {{itemTitle}}
+{{itemDescription}}
+{{/itemsArray}}
+{{/itemsArray.0}}
+{{#navigationLinks.0}}
+## Navigation
+{{#navigationLinks}}
+- [{{title}}]({{link}})
+{{/navigationLinks}}
+{{/navigationLinks.0}}
+{{#technicalData}}
+## Technical Information
+- **URL**: {{url}}
+- **Type**: {{type}}
+{{/technicalData}}
 \`\`\`
-**Note:** This is an illustrative pattern. Adapt section names and structure to match the actual JSON dynamically.
+**Important:** These are examples. Your template must match the ACTUAL JSON structure provided.
 ---
 ## \u2705 Output Requirements
-1. **Output ONLY the Mustache template** \u2014 no explanations, no code fences, no preamble
+1. **Output ONLY the Mustache template** \u2014 no explanations, no markdown code fences, no preamble
 2. **Use exact JSON property names** in all bindings
 3. **Generate clean Markdown** \u2014 no HTML, entities, or attributes
-4. **Prioritize content** \u2014 most important information first
-5. **Be concise** \u2014 optimize for limited context windows
-6. **Structure for questions** \u2014 LLMs should easily extract facts
-7. **Stay domain-agnostic** \u2014 template should work for any JSON shape
+4. **Data-driven content** \u2014 no invented facts or descriptions
+5. **Contextual headings allowed** \u2014 but content must be from JSON
+6. **Be concise** \u2014 optimize for limited context windows
+7. **Structure for questions** \u2014 LLMs should easily extract facts
 ---
 ## \u{1F680} Your Task
-Analyze the provided JSON structure and **generate a Mustache template** that produces an **LLM knowledge base entry** following these principles.
+Analyze the provided JSON structure and **generate a Mustache template** that:
-**Think:**
-- What would an LLM need to know to answer questions about this page?
-- What's the core value/purpose this page communicates?
-- How can I structure this for maximum inference efficiency?
+1. **Uses ONLY data from JSON** (no invented content)
+2. **Adds logical section headings** for context
+3. **Structures data for question-answering**
+4. **Prioritizes most important properties first**
+5. **Remains universal** (works for any JSON shape)
+**Remember:**
+- Headings can be contextual: \u2705
+- Content must be from JSON: \u2705\u2705\u2705
+- No made-up descriptions: \u274C
+- No assumed features: \u274C
 Generate the template now.
 `;
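The revised prompt's rules "DATA-DRIVEN CONTENT ONLY" and "Exact Property Bindings" can also be checked mechanically once a template comes back. The following is a minimal sketch of such a check, not something the package does: `findUnboundTags` and `resolvePath` are hypothetical names, only plain `{{name}}`, `{{#name}}`, `{{^name}}` and `{{/name}}` tags are handled (no partials or custom delimiters), and dotted names are resolved the way Mustache does, by finding the first segment in the innermost context that defines it.

```javascript
// Hypothetical post-generation check, not part of nuxt-llms-generator:
// lists Mustache tag names that do not resolve against the sample JSON.
// Handles only {{name}}, {{#name}}, {{^name}} and {{/name}} tags.

// Mustache-style dotted lookup: find the first segment in the innermost
// context that defines it, then walk the remaining segments from there.
function resolvePath(name, stack) {
  const [head, ...rest] = name.split(".");
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && head in ctx) {
      return rest.reduce(
        (acc, key) => (acc === null || acc === undefined ? undefined : acc[key]),
        ctx[head]
      );
    }
  }
  return undefined;
}

function findUnboundTags(template, data) {
  const tagRe = /\{\{\s*([#^\/]?)\s*([\w.]+)\s*\}\}/g;
  const stack = [data]; // context stack, innermost context last
  const missing = [];
  let match;
  while ((match = tagRe.exec(template)) !== null) {
    const [, sigil, name] = match;
    if (sigil === "/") {
      stack.pop(); // leaving a section
      continue;
    }
    const top = stack[stack.length - 1];
    const value = name === "." ? top : resolvePath(name, stack);
    if (value === undefined) missing.push(name);
    if (sigil === "#") {
      // Inside a list section, the first element stands in for every item.
      const ctx = Array.isArray(value) ? value[0] : value;
      stack.push(ctx !== null && typeof ctx === "object" ? ctx : {});
    } else if (sigil === "^") {
      stack.push(top); // inverted sections keep the current context
    }
  }
  return missing;
}

const sample = {
  pageTitle: "Support",
  features: [{ name: "Call recording", description: "Stores calls" }],
};
const template = [
  "# {{pageTitle}}",
  "{{#features.0}}",
  "## Available Features",
  "{{#features}}",
  "- **{{name}}**: {{description}} ({{price}})",
  "{{/features}}",
  "{{/features.0}}",
  "{{#faq}}{{question}}{{/faq}}",
].join("\n");

console.log(findUnboundTags(template, sample)); // [ 'price', 'faq', 'question' ]
```

The check is only as good as the sample: if an array in the JSON is empty, every name used inside that section will be reported as unbound.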
@@ -216,7 +281,7 @@ class AnthropicClient {
         const response = await this.client.messages.create({
           model: this.model,
           max_tokens: 4e3,
-          temperature: 0.1,
+          temperature: 0.3,
           messages: [{
             role: "user",
             content: prompt
@@ -694,42 +759,174 @@ function generatePageId(urlItem) {
   const nodeID = urlItem.nodeID || "UnknownNode";
   return `${templateAlias}_${nodeID}`;
 }
+function isImportantKey(key) {
+  const importantPatterns = [
+    "title",
+    "name",
+    "heading",
+    "description",
+    "summary",
+    "content",
+    "text",
+    "body",
+    "value",
+    "label",
+    "caption",
+    "alt",
+    "message",
+    "url",
+    "link",
+    "href"
+  ];
+  const lowerKey = key.toLowerCase();
+  return importantPatterns.some((pattern) => lowerKey.includes(pattern));
+}
+function isMetadataKey(key) {
+  const metadataPatterns = [
+    "id",
+    "guid",
+    "key",
+    "_id",
+    "nodeid",
+    "created",
+    "updated",
+    "modified",
+    "timestamp",
+    "date",
+    "sort",
+    "order",
+    "index",
+    "position",
+    "published",
+    "hidden",
+    "visible",
+    "enabled",
+    "status",
+    "type",
+    "contenttype",
+    "template",
+    "alias",
+    "path",
+    "meta",
+    "metadata",
+    "seo",
+    "schema",
+    "properties"
+  ];
+  const lowerKey = key.toLowerCase();
+  return metadataPatterns.some((pattern) => lowerKey.includes(pattern));
+}
+function recursiveTruncate(content, maxTokens, currentDepth = 0) {
+  if (currentDepth > 10) {
+    return { _truncated: "Max depth reached" };
+  }
+  if (maxTokens < 10) {
+    return void 0;
+  }
+  if (content === null || content === void 0) {
+    return content;
+  }
+  if (typeof content !== "object") {
+    if (typeof content === "string" && content.length > 2e3) {
+      return content.substring(0, 2e3) + "...";
+    }
+    return content;
+  }
+  if (Array.isArray(content)) {
+    if (content.length === 0)
+      return content;
+    const itemLimit = Math.max(3, Math.floor(15 / (currentDepth + 1)));
+    const tokensPerItem = Math.floor(maxTokens / Math.min(content.length, itemLimit));
+    const truncatedArray = content.slice(0, itemLimit).map((item) => recursiveTruncate(item, tokensPerItem, currentDepth + 1)).filter((item) => item !== void 0);
+    if (content.length > truncatedArray.length) {
+      truncatedArray.push({
+        _note: `... and ${content.length - truncatedArray.length} more items`
+      });
+    }
+    return truncatedArray;
+  }
+  const truncatedObj = {};
+  const entries = Object.entries(content);
+  const withoutMetadata = entries.filter(([key]) => !isMetadataKey(key));
+  if (withoutMetadata.length === 0) {
+    return { _note: "Only metadata, removed" };
+  }
+  const importantEntries = withoutMetadata.filter(([key]) => isImportantKey(key));
+  const normalEntries = withoutMetadata.filter(([key]) => !isImportantKey(key));
+  const importantBudget = Math.floor(maxTokens * 0.4);
+  const tokensPerImportant = importantEntries.length > 0 ? Math.floor(importantBudget / importantEntries.length) : 0;
+  for (const [key, value] of importantEntries) {
+    const processedValue = recursiveTruncate(value, tokensPerImportant, currentDepth + 1);
+    if (processedValue !== void 0) {
+      truncatedObj[key] = processedValue;
+    }
+  }
+  const usedTokens = estimateContentTokens(truncatedObj);
+  const remainingBudget = maxTokens - usedTokens;
+  if (remainingBudget > 100 && normalEntries.length > 0) {
+    const sortedNormal = normalEntries.sort(([_a, valueA], [_b, valueB]) => {
+      const sizeA = JSON.stringify(valueA).length;
+      const sizeB = JSON.stringify(valueB).length;
+      return sizeA - sizeB;
+    });
+    const tokensPerNormal = Math.floor(remainingBudget / sortedNormal.length);
+    for (const [key, value] of sortedNormal) {
+      const processedValue = recursiveTruncate(value, tokensPerNormal, currentDepth + 1);
+      if (processedValue !== void 0) {
+        truncatedObj[key] = processedValue;
+        const newSize = estimateContentTokens(truncatedObj);
+        if (newSize > maxTokens) {
+          delete truncatedObj[key];
+          break;
+        }
+      }
+    }
+  }
+  return Object.keys(truncatedObj).length > 0 ? truncatedObj : void 0;
+}
+function emergencyTruncate(content, maxTokens) {
+  const result = { ...content };
+  const keys = Object.keys(result).sort((a, b) => {
+    const aImportant = isImportantKey(a) ? 1 : 0;
+    const bImportant = isImportantKey(b) ? 1 : 0;
+    return aImportant - bImportant;
+  });
+  for (const key of keys) {
+    if (estimateContentTokens(result) <= maxTokens)
+      break;
+    delete result[key];
+    console.warn(`    Emergency: removed "${key}"`);
+  }
+  return result;
+}
 function estimateContentTokens(content) {
   try {
     const jsonString = JSON.stringify(content);
-    return Math.ceil(jsonString.length / 4);
+    return Math.ceil(jsonString.length / 3);
   } catch {
     return 0;
   }
 }
-function truncateContentIfNeeded(content, maxTokens = 18e4) {
+function truncateContentIfNeeded(content, maxTokens = 1e5) {
   const estimatedTokens = estimateContentTokens(content);
   if (estimatedTokens <= maxTokens) {
     return content;
   }
-  console.warn(`Content too large (${estimatedTokens} tokens > ${maxTokens} limit), truncating...`);
-  const truncatedContent = { ...content };
-  const sortedKeys = Object.keys(truncatedContent).sort((a, b) => {
-    const sizeA = estimateContentTokens({ [a]: truncatedContent[a] });
-    const sizeB = estimateContentTokens({ [b]: truncatedContent[b] });
-    return sizeB - sizeA;
-  });
-  for (const key of sortedKeys) {
-    if (estimateContentTokens(truncatedContent) <= maxTokens) {
-      break;
-    }
-    const value = truncatedContent[key];
-    if (Array.isArray(value) && value.length > 10) {
-      truncatedContent[key] = value.slice(0, 10);
-      console.warn(`Truncated array ${key} from ${value.length} to 10 items`);
-    } else if (typeof value === "string" && value.length > 5e3) {
-      truncatedContent[key] = value.substring(0, 5e3) + "...";
-      console.warn(`Truncated string ${key} from ${value.length} to 5000 chars`);
-    }
-  }
-  const finalTokens = estimateContentTokens(truncatedContent);
-  console.log(`Content truncated from ${estimatedTokens} to ${finalTokens} tokens`);
-  return truncatedContent;
+  console.warn(`\u26A0\uFE0F  Content too large (${estimatedTokens} tokens > ${maxTokens} limit), truncating recursively...`);
+  const truncatedContent = recursiveTruncate(content, maxTokens, 0);
+  const result = truncatedContent && typeof truncatedContent === "object" && !Array.isArray(truncatedContent) ? truncatedContent : {
+    _error: "Content truncation failed",
+    original: content
+  };
+  const finalTokens = estimateContentTokens(result);
+  const preservedKeys = Object.keys(result).length;
+  const originalKeys = Object.keys(content).length;
+  console.log(`\u2705 Content truncated: ${estimatedTokens} \u2192 ${finalTokens} tokens (preserved ${preservedKeys}/${originalKeys} root keys)`);
+  if (finalTokens > maxTokens) {
+    console.error(`\u274C Recursive truncation insufficient (${finalTokens} > ${maxTokens}), performing emergency truncation...`);
+    return emergencyTruncate(result, maxTokens);
+  }
+  return result;
 }
 function shouldGenerateTemplate(umbracoData, urlItem) {
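Both new key classifiers test `lowerKey.includes(pattern)`, so short patterns such as `"id"`, `"key"` and `"order"` also match inside unrelated words. Because `recursiveTruncate` removes metadata keys before it looks at importance, a key that matches both lists is dropped. The snippet below copies the two pattern lists from the diff verbatim and classifies a few key names; the sample keys are illustrative and not taken from real site data.

```javascript
// Pattern lists copied from the 0.1.11 diff. classifyKey() mirrors the order
// used by recursiveTruncate(): metadata filter first, then importance.
const importantPatterns = ["title", "name", "heading", "description", "summary",
  "content", "text", "body", "value", "label", "caption", "alt", "message",
  "url", "link", "href"];
const metadataPatterns = ["id", "guid", "key", "_id", "nodeid", "created",
  "updated", "modified", "timestamp", "date", "sort", "order", "index",
  "position", "published", "hidden", "visible", "enabled", "status", "type",
  "contenttype", "template", "alias", "path", "meta", "metadata", "seo",
  "schema", "properties"];

const matchesAny = (key, patterns) =>
  patterns.some((pattern) => key.toLowerCase().includes(pattern));

function classifyKey(key) {
  if (matchesAny(key, metadataPatterns)) return "dropped";
  return matchesAny(key, importantPatterns) ? "important" : "normal";
}

// Hypothetical key names, chosen to show substring false positives.
const sampleKeys = ["pageTitle", "keyFeatures", "videoUrl", "width",
  "sidebarLinks", "borderColor", "pricing", "updatedAt"];
for (const key of sampleKeys) {
  console.log(`${key}: ${classifyKey(key)}`);
}
// pageTitle: important, keyFeatures: dropped ("key"), videoUrl: dropped ("id"),
// width: dropped ("id"), sidebarLinks: dropped ("id"), borderColor: dropped
// ("order"), pricing: normal, updatedAt: dropped (intended)
```

Only `updatedAt` is dropped for the reason the patterns were written; matching whole words or camelCase segments instead of raw substrings would avoid the other cases.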
@@ -1067,7 +1264,7 @@ class TemplateGenerator {
     const pageId = generatePageId(urlItem);
     console.log(`Generating new template for ${pageId} (${urlItem.url})`);
     const tokensBeforeTruncation = estimateContentTokens(pageContent);
-    const truncatedContent = truncateContentIfNeeded(pageContent, 18e4);
+    const truncatedContent = truncateContentIfNeeded(pageContent, 65e3);
     const tokensAfterTruncation = estimateContentTokens(truncatedContent);
     if (tokensBeforeTruncation > tokensAfterTruncation) {
       console.warn(`Page ${pageId} content truncated: ${tokensBeforeTruncation} -> ${tokensAfterTruncation} tokens`);
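Two changes compound here: `estimateContentTokens` now divides the JSON length by 3 instead of 4, and the `TemplateGenerator` call site lowers the budget from `18e4` to `65e3` tokens. The snippet below only does the arithmetic: the character ceiling those settings imply, the per-depth array cap `Math.max(3, Math.floor(15 / (currentDepth + 1)))` from `recursiveTruncate`, and the 40% share reserved for "important" keys at the root.

```javascript
// Arithmetic only: character ceilings implied by the old and new settings.
const oldCharsPerToken = 4, oldLimit = 18e4; // 0.1.9
const newCharsPerToken = 3, newLimit = 65e3; // 0.1.11 (TemplateGenerator call site)

const oldMaxChars = oldLimit * oldCharsPerToken; // 720000
const newMaxChars = newLimit * newCharsPerToken; // 195000
console.log(`JSON chars allowed: ${oldMaxChars} -> ${newMaxChars} ` +
  `(${(oldMaxChars / newMaxChars).toFixed(2)}x smaller)`); // 3.69x smaller

// Per-depth array item cap used by recursiveTruncate().
const itemLimit = (depth) => Math.max(3, Math.floor(15 / (depth + 1)));
console.log([0, 1, 2, 3, 4].map((depth) => itemLimit(depth))); // [ 15, 7, 5, 3, 3 ]

// Budget reserved for "important" keys of the root object (40%).
console.log(Math.floor(newLimit * 0.4)); // 26000
```

In other words, a page whose serialized JSON exceeds roughly 195,000 characters is now truncated, where 0.1.9 tolerated roughly 720,000.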
@@ -1392,17 +1589,18 @@ class LLMSFilesGenerator {
     return `Information about ${urlItem.url}`;
   }
   sanitizeUrlForFilename(url) {
-    if (url === "/") {
+    if (!url || url === "/")
       return "index";
-    }
     let filename = url.replace(/^\//, "").replace(/\/$/, "").replace(/\//g, "-").replace(/--+/g, "-").replace(/^-+|-+$/g, "");
-    if (!filename || filename === "") {
-      filename = `index_${url.length}_${Date.now()}`;
-    }
-    if (filename.startsWith("-") || filename.startsWith(".")) {
-      filename = "page-" + filename.replace(/^[-.]/, "");
-    }
-    return transliterate(filename);
+    filename = slugify(filename, {
+      lowercase: true,
+      separator: "-"
+    });
+    if (!filename)
+      filename = `index-${Date.now()}`;
+    if (/^[.-]/.test(filename))
+      filename = `page-${filename.replace(/^[.-]+/, "")}`;
+    return filename;
   }
   getLLMSFilePath(fullPath) {
     const filename = basename(fullPath);
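The new `sanitizeUrlForFilename` collapses the URL into a hyphenated string, passes it through `slugify` from `transliteration` with `lowercase: true` and `separator: "-"`, and only then applies the `index-<timestamp>` and `page-` fallbacks. To keep the sketch dependency-free, it takes the slug function as a parameter. The default `asciiSlug` is a hypothetical stand-in that lowercases and replaces non-alphanumeric ASCII runs with `-`. Unlike `transliteration`, it does not romanize non-Latin scripts, so such URLs fall through to the timestamp fallback here.

```javascript
// Hypothetical stand-in for transliteration's slugify(): lowercases and turns
// every run of characters outside [a-z0-9] into a single "-". It does NOT
// transliterate non-Latin text the way the real library does.
function asciiSlug(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Mirrors the control flow of sanitizeUrlForFilename() in 0.1.11.
function sanitizeUrlForFilename(url, slug = asciiSlug) {
  if (!url || url === "/") return "index";
  let filename = url
    .replace(/^\//, "")    // drop leading slash
    .replace(/\/$/, "")    // drop trailing slash
    .replace(/\//g, "-")   // path separators become hyphens
    .replace(/--+/g, "-")  // collapse repeated hyphens
    .replace(/^-+|-+$/g, "");
  filename = slug(filename);
  if (!filename) filename = `index-${Date.now()}`; // changes on every run
  if (/^[.-]/.test(filename)) filename = `page-${filename.replace(/^[.-]+/, "")}`;
  return filename;
}

console.log(sanitizeUrlForFilename("/"));                      // index
console.log(sanitizeUrlForFilename("/Docs/Getting-Started/")); // docs-getting-started
console.log(sanitizeUrlForFilename("/a//b/"));                 // a-b
```

Because the empty-slug fallback embeds `Date.now()`, a page whose URL slugifies to nothing gets a different filename on each build.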

package/dist/module.json CHANGED

@@ -4,5 +4,5 @@
   "compatibility": {
     "nuxt": "^3.0.0"
   },
-  "version": "0.1.9"
+  "version": "0.1.11"
 }