npm - @uniweb/semantic-parser - Versions diffs - 1.1.5 → 1.1.7 - Mend

@uniweb/semantic-parser 1.1.5 → 1.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/AGENTS.md +3 -7
package/README.md +2 -159
package/package.json +2 -5
package/src/builders/doc.js +204 -0
package/src/index.js +2 -2
package/src/processors/groups.js +1 -8
package/src/processors/sequence.js +18 -17
package/docs/api.md +0 -350
package/docs/entity-consolidation.md +0 -470
package/docs/file-structure.md +0 -50
package/docs/guide.md +0 -206
package/docs/mapping-patterns.md +0 -928
package/docs/text-component-reference.md +0 -515
package/reference/README.md +0 -195
package/reference/Text.js +0 -188
package/src/mappers/accessor.js +0 -312
package/src/mappers/extractors.js +0 -416
package/src/mappers/helpers.js +0 -234
package/src/mappers/index.js +0 -28
package/src/mappers/types.js +0 -495
package/src/processors/groups_backup.js +0 -379
package/src/processors/groups_doc.md +0 -179
package/src/processors/sequence_backup.js +0 -402
package/src/processors_old/byType.js +0 -129
package/src/processors_old/groups.js +0 -240
package/src/processors_old/sequence.js +0 -140

package/AGENTS.md CHANGED

@@ -61,7 +61,6 @@ The parser returns a flat content structure:
   title: '',       // Main heading
   pretitle: '',    // Heading before main title
   subtitle: '',    // Heading after main title
-  subtitle2: '',   // Third heading level
   paragraphs: [],
   links: [],       // All link-like entities (including buttons, documents)
   images: [],
@@ -71,7 +70,7 @@ The parser returns a flat content structure:
   quotes: [],
   snippets: [],    // Fenced code — [{ language, code }]
   data: {},        // Structured data (tagged data blocks, forms, cards)
-  headings: [],    // Overflow headings after title/subtitle/subtitle2
+  headings: [],    // Headings after subtitle, in document order
   items: [],       // Child content groups (same structure recursively)
 }
 ```
@@ -126,8 +125,6 @@ Editor-specific nodes are mapped to standard entities:
 - `card-group` → `data[cardType]` arrays (e.g., `data.person`, `data.event`)
 - `document-group` → `links[]` with `role: "document"` and `download: true`
-See `docs/entity-consolidation.md` for complete mapping documentation.
 ### Tagged Data Blocks
 Data blocks with tags route parsed data to the `data` object:
@@ -165,14 +162,14 @@ Lists maintain hierarchy through nested structure. The `processListItems()` func
 ## Content Writing Conventions
-The parser implements the semantic conventions documented in `docs/guide.md`. Key patterns:
+Key patterns:
 - **Pretitle Pattern**: Any heading followed by a more important heading (e.g., H3→H1, H2→H1, H6→H5, etc.)
 - **Banner Pattern**: Image (with banner role or followed by heading) at start of first group
 - **Divider Mode**: Presence of any `horizontalRule` switches entire document to divider-based grouping
 - **Heading Groups**: Consecutive headings with increasing levels are consumed together
 - **Main Content**: First group is main if it's the only group OR has lower heading level than second group
-- **Body Headings**: Headings that overflow the header slots (title, subtitle, subtitle2) are automatically collected in `body.headings`
+- **Body Headings**: Headings after the title and subtitle slots are collected in `body.headings` in document order
 ## Testing Structure
@@ -188,6 +185,5 @@ Tests are organized by processor:
 - The parser never modifies the original ProseMirror document
 - Text content can include inline HTML for formatting (bold → `<strong>`, italic → `<em>`, links → `<a>`)
-- The `processors_old/` directory contains legacy implementations - do not modify
 - Context information in byType includes position, previous/next elements, and nearest heading
 - Group splitting logic differs significantly between heading mode and divider mode

package/README.md CHANGED

@@ -68,7 +68,6 @@ result = {
   pretitle: "",             // Heading before main title
   title: "Welcome",         // Main heading
   subtitle: "",             // Heading after main title
-  subtitle2: "",            // Third heading level
   // Body fields
   paragraphs: ["Get started today."],
@@ -78,8 +77,8 @@ result = {
   icons: [],
   lists: [],
   quotes: [],
-  data: {},                 // Structured data (tagged code blocks, forms, cards)
-  headings: [],             // Overflow headings after title/subtitle/subtitle2
+  data: {},                 // Structured data (tagged data blocks, forms, cards)
+  headings: [],             // Headings after subtitle, in document order
   // Additional content groups (from headings after content)
   items: [
@@ -143,154 +142,6 @@ sequence.forEach(element => {
 });
 ```
-## Content Mapping Utilities
-The parser includes optional mapping utilities to transform parsed content into component-specific formats. Perfect for visual editors and component-based systems.
-### Type System (Recommended)
-Automatically transform content based on field types with context-aware behavior:
-```js
-const schema = {
-  title: {
-    path: "title",
-    type: "plaintext",  // Auto-strips <strong>, <em>, etc.
-    maxLength: 60       // Auto-truncates intelligently
-  },
-  excerpt: {
-    path: "paragraphs",
-    type: "excerpt",    // Auto-creates excerpt from paragraphs
-    maxLength: 150
-  },
-  image: {
-    path: "images[0].url",
-    type: "image",
-    defaultValue: "/placeholder.jpg"
-  }
-};
-// Visual editor mode (default) - silent, graceful cleanup
-const data = mappers.extractBySchema(parsed, schema);
-// Build mode - validates and warns
-const data = mappers.extractBySchema(parsed, schema, { mode: 'build' });
-```
-**Field Types:** `plaintext`, `richtext`, `excerpt`, `number`, `image`, `link`
-### Using Pre-Built Extractors
-```js
-import { parseContent, mappers } from "@uniweb/semantic-parser";
-const parsed = parseContent(doc);
-// Extract hero component data
-const heroData = mappers.extractors.hero(parsed);
-// { title, subtitle, kicker, description, image, cta, ... }
-// Extract card data
-const cards = mappers.extractors.card(parsed, { useItems: true });
-// Extract statistics
-const stats = mappers.extractors.stats(parsed);
-// [{ value: "12", label: "Partner Labs" }, ...]
-// Extract navigation menu
-const nav = mappers.extractors.navigation(parsed);
-// Extract features list
-const features = mappers.extractors.features(parsed);
-```
-### Schema-Based Mapping
-Define custom mappings using schemas:
-```js
-const schema = {
-  brand: "pretitle",
-  title: "title",
-  subtitle: "subtitle",
-  image: {
-    path: "images[0].url",
-    defaultValue: "/placeholder.jpg"
-  },
-  actions: {
-    path: "links",
-    transform: links => links.map(l => ({ label: l.label, type: "primary" }))
-  }
-};
-const componentData = mappers.accessor.extractBySchema(parsed, schema);
-```
-### Available Extractors
-- `hero` - Hero/banner sections
-- `card` - Card components
-- `article` - Article/blog content
-- `stats` - Statistics/metrics
-- `navigation` - Navigation menus
-- `features` - Feature lists
-- `testimonial` - Testimonials
-- `faq` - FAQ sections
-- `pricing` - Pricing tiers
-- `team` - Team members
-- `gallery` - Image galleries
-See **[Mapping Patterns Guide](./docs/mapping-patterns.md)** for complete documentation.
-## Rendering Content
-After extracting content, render it using a Text component that handles paragraph arrays, rich HTML, and formatting marks.
-### Text Component Pattern
-```jsx
-import { parseContent, mappers } from '@uniweb/semantic-parser';
-import { H1, P } from './components/Text';
-const parsed = parseContent(doc);
-const hero = mappers.extractors.hero(parsed);
-// Render extracted content
-<>
-  <H1 text={hero.title} />
-  <P text={hero.description} />  {/* Handles arrays automatically */}
-</>
-```
-The Text component:
-- **Handles arrays** - Renders `["Para 1", "Para 2"]` as separate paragraphs
-- **Supports rich HTML** - Preserves formatting marks
-- **Multi-line headings** - Wraps multiple lines in semantic heading tags
-- **Color marks** - Supports `<mark>` and `<span>` for visual emphasis
-See **[Text Component Reference](./docs/text-component-reference.md)** for implementation guide.
-### Sanitization
-Sanitize content at the engine level (during data preparation), not in components:
-```javascript
-import { parseContent, mappers } from '@uniweb/semantic-parser';
-function prepareData(parsed) {
-  const hero = mappers.extractors.hero(parsed);
-  return {
-    ...hero,
-    title: mappers.types.sanitizeHtml(hero.title, {
-      allowedTags: ['strong', 'em', 'mark', 'span'],
-      allowedAttr: ['class', 'data-variant']
-    })
-  };
-}
-```
-The parser provides sanitization utilities but doesn't enforce their use. Your engine decides when to sanitize based on security requirements.
 ## Content Grouping
 The parser supports two grouping modes:
@@ -346,14 +197,6 @@ Bracketed spans (`[text]{.class}`) are converted to `<span>` elements with their
 Spans can have classes, IDs, and custom attributes. They combine with other marks—a span with bold becomes `<strong><span class="...">text</span></strong>`.
-## Documentation
-- **[Content Writing Guide](./docs/guide.md)**: Learn how to structure content for optimal parsing
-- **[API Reference](./docs/api.md)**: Complete API documentation with all element types
-- **[Mapping Patterns Guide](./docs/mapping-patterns.md)**: Transform content to component-specific formats
-- **[Text Component Reference](./docs/text-component-reference.md)**: Reference implementation for rendering parsed content
-- **[File Structure](./docs/file-structure.md)**: Codebase organization
 ## Use Cases
 - **Component-based websites**: Extract structured data for React/Vue components

package/package.json CHANGED

@@ -1,13 +1,11 @@
 {
   "name": "@uniweb/semantic-parser",
-  "version": "1.1.5",
+  "version": "1.1.7",
   "description": "Semantic parser for ProseMirror/TipTap content structures",
   "type": "module",
   "main": "./src/index.js",
   "exports": {
-    ".": "./src/index.js",
-    "./mappers": "./src/mappers/index.js",
-    "./mappers/*": "./src/mappers/*.js"
+    ".": "./src/index.js"
   },
   "keywords": [
     "prosemirror",
@@ -30,7 +28,6 @@
   },
   "homepage": "https://github.com/uniweb/semantic-parser#readme",
   "directories": {
-    "doc": "docs",
     "test": "tests"
   },
   "dependencies": {

package/src/builders/doc.js ADDED

@@ -0,0 +1,204 @@
+/**
+ * Reverse conversion: content structure → TipTap document.
+ *
+ * Mirrors the forward parser (processors/sequence.js + processors/groups.js)
+ * so that parseContent(buildDoc(content)) roundtrips cleanly.
+ *
+ * Starter content uses plain strings (no HTML marks), so the conversion
+ * is straightforward — no need to reverse inline HTML formatting.
+ */
+// --- TipTap node builders ---
+function textNode(text) {
+  return { type: 'text', text }
+}
+function heading(level, text) {
+  if (!text) return null
+  // Multi-line title: string[] → multiple headings at same level
+  if (Array.isArray(text)) {
+    return text.map(t => heading(level, t)).filter(Boolean)
+  }
+  return {
+    type: 'heading',
+    attrs: { level },
+    content: [textNode(text)],
+  }
+}
+function paragraph(text) {
+  if (!text) return null
+  return {
+    type: 'paragraph',
+    content: [textNode(text)],
+  }
+}
+function linkParagraph({ text, href, target }) {
+  if (!text || !href) return null
+  const mark = { type: 'link', attrs: { href } }
+  if (target) mark.attrs.target = target
+  return {
+    type: 'paragraph',
+    content: [{ type: 'text', text, marks: [mark] }],
+  }
+}
+function imageBlock({ src, alt = '', caption = '', direction, role, width, height }) {
+  const attrs = { url: src, alt }
+  if (caption) attrs.caption = caption
+  if (direction) attrs.direction = direction
+  if (role) attrs.role = role
+  if (width && height) {
+    attrs.aspect_ratio = { width, height, ratio: (height / width) * 100 }
+  }
+  return { type: 'ImageBlock', attrs }
+}
+function iconNode({ src, svg, library, name, size, color }) {
+  // UniwebIcon supports multiple source types
+  const attrs = {}
+  if (svg || src) attrs.svg = svg || src
+  if (library) attrs.library = library
+  if (name) attrs.name = name
+  if (size) attrs.size = size
+  if (color) attrs.color = color
+  return { type: 'UniwebIcon', attrs }
+}
+function videoNode({ src, caption, direction, coverImg }) {
+  const attrs = { src }
+  if (caption) attrs.caption = caption
+  if (direction) attrs.direction = direction
+  if (coverImg) attrs.coverImg = coverImg
+  return { type: 'Video', attrs }
+}
+function dividerBlock() {
+  return { type: 'DividerBlock' }
+}
+function bulletList(items) {
+  if (!items || !items.length) return null
+  return {
+    type: 'bulletList',
+    content: items.map(item => ({
+      type: 'listItem',
+      content: [paragraph(item)].filter(Boolean),
+    })),
+  }
+}
+// --- Group builder ---
+/**
+ * Build TipTap nodes from a content group (main or item).
+ *
+ * @param {Object} group - Content structure: { pretitle, title, subtitle, paragraphs, images, ... }
+ * @param {number} titleLevel - Heading level for title (1 for main, 2 for items)
+ * @returns {Array} Array of TipTap nodes
+ */
+function buildGroupNodes(group, titleLevel = 1) {
+  const nodes = []
+  // 1. Headings: pretitle → title → subtitle
+  // Pretitle uses a higher level number (less important) than title
+  // e.g., H3 before H1 — mirrors isPreTitle() in groups.js
+  if (group.pretitle) {
+    const pre = heading(titleLevel + 2, group.pretitle)
+    if (Array.isArray(pre)) nodes.push(...pre)
+    else if (pre) nodes.push(pre)
+  }
+  if (group.title) {
+    const t = heading(titleLevel, group.title)
+    if (Array.isArray(t)) nodes.push(...t)
+    else if (t) nodes.push(t)
+  }
+  // Subtitle is one level below title
+  if (group.subtitle) {
+    const sub = heading(titleLevel + 1, group.subtitle)
+    if (Array.isArray(sub)) nodes.push(...sub)
+    else if (sub) nodes.push(sub)
+  }
+  // 2. Body fields in document order
+  if (group.paragraphs) {
+    for (const p of group.paragraphs) {
+      const node = paragraph(p)
+      if (node) nodes.push(node)
+    }
+  }
+  if (group.images) {
+    for (const img of group.images) {
+      nodes.push(imageBlock(img))
+    }
+  }
+  if (group.links) {
+    for (const link of group.links) {
+      const node = linkParagraph(link)
+      if (node) nodes.push(node)
+    }
+  }
+  if (group.icons) {
+    for (const icon of group.icons) {
+      nodes.push(iconNode(icon))
+    }
+  }
+  if (group.videos) {
+    for (const video of group.videos) {
+      nodes.push(videoNode(video))
+    }
+  }
+  if (group.lists) {
+    for (const list of group.lists) {
+      const node = bulletList(list)
+      if (node) nodes.push(node)
+    }
+  }
+  return nodes
+}
+// --- Main export ---
+/**
+ * Build a TipTap document from a content structure.
+ *
+ * This is the reverse of parseContent(): given a flat content object
+ * (title, paragraphs, items, etc.), produce a TipTap document that
+ * roundtrips through parseContent() to yield the same structure.
+ *
+ * @param {Object} content - Content structure (same shape as parseContent output / starter)
+ * @returns {Object|null} TipTap document { type: 'doc', content: [...] }, or null if empty
+ */
+function buildDoc(content) {
+  if (!content) return null
+  const nodes = []
+  // Main group content (title level 1)
+  nodes.push(...buildGroupNodes(content, 1))
+  // Items: separated by DividerBlock (mirrors divider-based grouping in groups.js)
+  if (content.items && content.items.length > 0) {
+    for (const item of content.items) {
+      nodes.push(dividerBlock())
+      // Item headings use level 2 (one below main H1)
+      nodes.push(...buildGroupNodes(item, 2))
+    }
+  }
+  if (nodes.length === 0) return null
+  return { type: 'doc', content: nodes }
+}
+export { buildDoc }
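
A usage sketch for the new `buildDoc` export, assuming a starter-style content object with plain-string fields. So that it runs on its own, the block carries a condensed copy of the `heading`, `paragraph`, `buildGroupNodes` and `buildDoc` helpers from the added file, restricted to `title`, `subtitle`, `paragraphs` and `items`. The sample content and the `demoDoc` name are illustrative only and do not come from the package.

```javascript
// Condensed from src/builders/doc.js: only title, subtitle, paragraphs
// and items are handled here; array-valued titles are left out.
function textNode(text) {
  return { type: 'text', text }
}

function heading(level, text) {
  if (!text) return null
  return { type: 'heading', attrs: { level }, content: [textNode(text)] }
}

function paragraph(text) {
  if (!text) return null
  return { type: 'paragraph', content: [textNode(text)] }
}

function buildGroupNodes(group, titleLevel = 1) {
  const nodes = [
    heading(titleLevel, group.title),
    heading(titleLevel + 1, group.subtitle), // subtitle is one level below title
    ...(group.paragraphs || []).map(paragraph),
  ]
  return nodes.filter(Boolean) // drop empty slots
}

function buildDoc(content) {
  if (!content) return null
  const nodes = buildGroupNodes(content, 1)
  for (const item of content.items || []) {
    nodes.push({ type: 'DividerBlock' })    // each item starts after a divider
    nodes.push(...buildGroupNodes(item, 2)) // item titles are H2
  }
  if (nodes.length === 0) return null
  return { type: 'doc', content: nodes }
}

// Illustrative sample content (hypothetical, not from the package).
const demoDoc = buildDoc({
  title: 'Welcome',
  paragraphs: ['Get started today.'],
  items: [{ title: 'Feature one', paragraphs: ['Details.'] }],
})

console.log(demoDoc.content.map(n => n.type))
// [ 'heading', 'paragraph', 'DividerBlock', 'heading', 'paragraph' ]
```

With the published package, the equivalent call would presumably be `import { buildDoc } from "@uniweb/semantic-parser"`, since `src/index.js` now exports `buildDoc` alongside `parseContent`.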

package/src/index.js CHANGED

@@ -1,6 +1,6 @@
 import { processSequence } from "./processors/sequence.js";
 import { processGroups } from "./processors/groups.js";
-import * as mappers from "./mappers/index.js";
+import { buildDoc } from "./builders/doc.js";
 /**
  * Parse ProseMirror/TipTap content into semantic structure
@@ -30,4 +30,4 @@ function parseContent(doc, options = {}) {
     };
 }
-export { parseContent, mappers };
+export { parseContent, buildDoc };

package/src/processors/groups.js CHANGED

@@ -9,7 +9,6 @@ function flattenGroup(group) {
         title: group.header.title || '',
         pretitle: group.header.pretitle || '',
         subtitle: group.header.subtitle || '',
-        subtitle2: group.header.subtitle2 || '',
         paragraphs: group.body.paragraphs || [],
         links: group.body.links || [],
         images: group.body.images || [],
@@ -37,7 +36,6 @@ function processGroups(sequence, options = {}) {
             title: '',
             pretitle: '',
             subtitle: '',
-            subtitle2: '',
             paragraphs: [],
             links: [],
             images: [],
@@ -75,7 +73,6 @@ function processGroups(sequence, options = {}) {
         title: '',
         pretitle: '',
         subtitle: '',
-        subtitle2: '',
         paragraphs: [],
         links: [],
         images: [],
@@ -227,7 +224,6 @@ function processGroupContent(elements) {
         pretitle: "",
         title: "",
         subtitle: "",
-        subtitle2: "",
     };
     const body = {
@@ -290,11 +286,8 @@ function processGroupContent(elements) {
             } else if (!header.subtitle) {
                 header.subtitle = element.text;
                 lastSlot = 'subtitle';
-            } else if (!header.subtitle2) {
-                header.subtitle2 = element.text;
-                lastSlot = 'subtitle2';
             } else {
-                // After subtitle2, we're in body - collect heading
+                // After subtitle, remaining headings go to body
                 body.headings.push(element.text);
                 lastSlot = null;
             }

package/src/processors/sequence.js CHANGED

@@ -412,21 +412,25 @@ function processInlineElements(content) {
     return items;
 }
-function makeAssetUrl(info) {
-    let url = "";
-    let src = info?.src || info?.url || "";
+const ASSET_BASE_URL = "https://assets.uniweb.app/";
-    if (src) {
-        url = src;
-    } else if (info?.identifier) {
-        url =
-            new uniweb.Profile(`docufolio/profile`, "_template").getAssetInfo(
-                info.identifier
-            )?.src || "";
-    }
+/**
+ * Resolve an asset identifier ({version}/{filename}) to a direct URL.
+ * Assets are hosted at assets.uniweb.app under dist/{version}/base.{ext}.
+ */
+function resolveAssetIdentifier(identifier) {
+    if (!identifier || typeof identifier !== "string") return "";
+    const [version, filename] = identifier.split("/");
+    if (!filename) return "";
+    const ext = filename.substring(filename.lastIndexOf(".") + 1);
+    return `${ASSET_BASE_URL}dist/${version}/base.${ext}`;
+}
-    return url;
+function makeAssetUrl(info) {
+    const src = info?.src || info?.url || "";
+    if (src) return src;
+    if (info?.identifier) return resolveAssetIdentifier(info.identifier);
+    return "";
 }
 function parseCardBlock(itemAttrs) {
@@ -467,10 +471,7 @@ function parseDocumentBlock(itemAttrs) {
         const { identifier = "" } = info;
         if (identifier) {
-            ele.downloadUrl = new uniweb.Profile(
-                `docufolio/profile`,
-                "_template"
-            ).getAssetInfo(identifier)?.href;
+            ele.downloadUrl = resolveAssetIdentifier(identifier);
         }
     }
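
To make the new asset resolution concrete, the sketch below copies `resolveAssetIdentifier` and `makeAssetUrl` from the added lines in `src/processors/sequence.js` and runs them on a few inputs. The identifiers and the `samples` object are hypothetical, chosen only to exercise each branch; real identifiers may look different.

```javascript
const ASSET_BASE_URL = "https://assets.uniweb.app/";

// Copied from the added lines in src/processors/sequence.js.
function resolveAssetIdentifier(identifier) {
    if (!identifier || typeof identifier !== "string") return "";
    const [version, filename] = identifier.split("/");
    if (!filename) return "";
    const ext = filename.substring(filename.lastIndexOf(".") + 1);
    return `${ASSET_BASE_URL}dist/${version}/base.${ext}`;
}

function makeAssetUrl(info) {
    const src = info?.src || info?.url || "";
    if (src) return src;
    if (info?.identifier) return resolveAssetIdentifier(info.identifier);
    return "";
}

// Hypothetical inputs, one per branch.
const samples = {
    // "{version}/{filename}" resolves to dist/{version}/base.{ext}
    fromIdentifier: makeAssetUrl({ identifier: "1.0.0/photo.png" }),
    // an explicit src (or url) takes precedence over the identifier
    explicitSrcWins: makeAssetUrl({ src: "/local.jpg", identifier: "1.0.0/photo.png" }),
    // no "/" means there is no filename part, so the result is empty
    noSlash: resolveAssetIdentifier("photo.png"),
    // no "." in the filename: the whole name is used as the extension
    noExtension: resolveAssetIdentifier("1.0.0/photo"),
};

console.log(samples);
// fromIdentifier:  "https://assets.uniweb.app/dist/1.0.0/base.png"
// explicitSrcWins: "/local.jpg"
// noSlash:         ""
// noExtension:     "https://assets.uniweb.app/dist/1.0.0/base.photo"
```

Two things follow from the code as written: the original file name is discarded and only its extension survives in the URL, and a file name without a dot ends up as the "extension". `parseDocumentBlock` now sets `downloadUrl` through the same function, so document downloads resolve to the same `dist/{version}/base.{ext}` scheme.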