npm - @uniweb/semantic-parser - Versions diffs - 1.0.8 → 1.0.10 - Mend

@uniweb/semantic-parser 1.0.8 → 1.0.10

Files changed (11)

package/AGENTS.md +42 -25
package/README.md +52 -104
package/docs/api.md +38 -40
package/docs/mapping-patterns.md +47 -47
package/docs/text-component-reference.md +3 -3
package/package.json +4 -1
package/src/index.js +5 -7
package/src/mappers/extractors.js +113 -120
package/src/processors/groups.js +105 -30
package/src/processors/sequence.js +59 -11
package/src/processors/byType.js +0 -130

package/AGENTS.md CHANGED

@@ -52,35 +52,26 @@ const result = parseContent(doc);
 // }
 ```
-### Content Group Structure
+### Content Output Structure
-Groups follow a specific structure defined in `processGroupContent()`:
+The parser returns a flat content structure:
 ```js
 {
-  header: {
-    pretitle: '',  // H3 before main title
-    title: '',     // Main heading (H1 or H2)
-    subtitle: ''   // Heading after main title
-  },
-  body: {
-    imgs: [],
-    icons: [],
-    videos: [],
-    paragraphs: [],
-    links: [],
-    lists: [],
-    buttons: [],
-    properties: [],
-    propertyBlocks: [],
-    cards: [],
-    headings: []
-  },
-  banner: null,    // Image with banner role or image before heading
-  metadata: {
-    level: null,   // Heading level that started this group
-    contentTypes: Set()
-  }
+  title: '',       // Main heading
+  pretitle: '',    // Heading before main title
+  subtitle: '',    // Heading after main title
+  paragraphs: [],
+  links: [],
+  imgs: [],
+  icons: [],
+  videos: [],
+  lists: [],
+  buttons: [],
+  data: {},        // Tagged code blocks (keyed by tag name)
+  cards: [],
+  headings: [],
+  items: [],       // Child content groups
 }
 ```
@@ -102,6 +93,32 @@ The sequence processor identifies several special element types by inspecting pa
 These are extracted into dedicated element types for easier downstream processing.
+### Tagged Code Blocks
+Code blocks with tags route parsed data to the `data` object:
+```markdown
+```json:nav-links
+[{ "label": "Home", "href": "/" }]
+```
+```yaml:config
+title: My Site
+theme: dark
+```
+```
+Results in:
+```js
+content.data['nav-links'] = [{ label: "Home", href: "/" }]
+content.data['config'] = { title: "My Site", theme: "dark" }
+```
+**Parsing rules:**
+- Tagged blocks with `json` language: parsed as JSON
+- Tagged blocks with `yaml`/`yml` language: parsed as YAML
+- Untagged blocks: not parsed (stay as raw text in sequence for display)
 ### List Processing
 Lists maintain hierarchy through nested structure. The `processListItems()` function in sequence.js handles nested lists, while `processListContent()` in groups.js applies full group content processing to each list item, allowing lists to contain rich content (images, paragraphs, nested lists, etc.).
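
The tagged code block rules added in this hunk can be illustrated with a small sketch. It is not taken from the package source: `routeTaggedBlock` is a hypothetical helper, and splitting the fence info string on the first `:` is an assumption based only on the `json:nav-links` example above. YAML handling is left out because it would need a YAML library.

```javascript
// Hypothetical helper (not the package's implementation): routes a tagged
// code block into a `data` object keyed by tag name.
function routeTaggedBlock(data, infoString, text) {
  const sep = infoString.indexOf(":");
  if (sep === -1) return false; // untagged: caller keeps the raw text in the sequence

  const language = infoString.slice(0, sep).trim().toLowerCase();
  const tag = infoString.slice(sep + 1).trim();
  if (!tag) return false; // "json:" with an empty tag is treated as untagged

  if (language === "json") {
    data[tag] = JSON.parse(text); // throws on malformed JSON
    return true;
  }

  // "yaml"/"yml" blocks would be parsed here with a YAML library; omitted in this sketch.
  return false;
}

const data = {};
routeTaggedBlock(data, "json:nav-links", '[{ "label": "Home", "href": "/" }]');
console.log(data["nav-links"][0].label);
```

The helper returns a boolean so a caller can decide whether the block still belongs in the display sequence; that return convention is a design choice of this sketch, not something the diff specifies.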

package/README.md CHANGED

@@ -4,11 +4,10 @@ A semantic parser for ProseMirror/TipTap content structures that helps bridge th
 ## What it Does
-The parser transforms rich text editor content (ProseMirror/TipTap) into structured, semantic groups that web components can easily consume. It provides three complementary views of your content:
+The parser transforms rich text editor content (ProseMirror/TipTap) into structured, semantic groups that web components can easily consume. It provides two complementary views of your content:
-1. **Sequence**: A flat, ordered list of all content elements
-2. **Groups**: Content organized into semantic sections with identified main content
-3. **ByType**: Elements categorized by type for easy filtering and queries
+1. **Sequence**: An ordered list of all content elements (for rendering in document order)
+2. **Groups**: Content organized into semantic sections (main content + items)
 ## Installation
@@ -41,16 +40,16 @@ const doc = {
 const result = parseContent(doc);
 // Access different views
-console.log(result.sequence);  // Flat array of elements
-console.log(result.groups);    // Semantic groups with main/items
-console.log(result.byType);    // Elements organized by type
+console.log(result.sequence);  // Ordered array of elements
+console.log(result.title);     // Main content fields at top level
+console.log(result.items);     // Additional content groups
 ```
 ## Output Structure
 ### Sequence View
-A flat array of semantic elements preserving document order:
+An ordered array of semantic elements preserving document order:
 ```js
 result.sequence = [
@@ -59,72 +58,37 @@ result.sequence = [
 ]
 ```
-### Groups View
+### Content Structure
-Content organized into semantic groups:
+Main content fields are at the top level. The `items` array contains additional content groups (created when headings appear after content), each with the same field structure:
 ```js
-result.groups = {
-  main: {
-    header: {
-      pretitle: "",           // H3 before main title
-      title: "Welcome",       // Main heading
-      subtitle: ""            // Heading after main title
-    },
-    body: {
-      paragraphs: ["Get started today."],
-      imgs: [],
-      videos: [],
-      links: [],
-      lists: [],
-      // ... more content types
-    },
-    banner: null,             // Optional banner image
-    metadata: { level: 1 }
-  },
-  items: [],                  // Additional content groups
-  metadata: {
-    dividerMode: false,       // Using dividers vs headings
-    groups: 0
-  }
-}
-```
-### ByType View
+result = {
+  // Main content fields
+  pretitle: "",             // Heading before main title
+  title: "Welcome",         // Main heading
+  subtitle: "",             // Heading after main title
+  paragraphs: ["Get started today."],
+  imgs: [],
+  videos: [],
+  links: [],
+  lists: [],
+  icons: [],
+  buttons: [],
+  banner: null,             // Optional banner image
+  // ... more content types
+  // Additional content groups (from headings after content)
+  items: [
+    { title: "Feature 1", paragraphs: [...], links: [...] },
+    { title: "Feature 2", paragraphs: [...], links: [...] }
+  ],
-Elements organized by type with context:
+  // Ordered sequence for document-order rendering
+  sequence: [...],
-```js
-result.byType = {
-  headings: [
-    {
-      type: "heading",
-      level: 1,
-      content: "Welcome",
-      context: {
-        position: 0,
-        previousElement: null,
-        nextElement: { type: "paragraph", ... },
-        nearestHeading: null
-      }
-    }
-  ],
-  paragraphs: [ /* ... */ ],
-  images: {
-    background: [],
-    content: [],
-    gallery: [],
-    icon: []
-  },
-  lists: [],
-  metadata: {
-    totalElements: 2,
-    dominantType: "paragraph",
-    hasMedia: false
-  },
-  // Helper methods
-  getHeadingsByLevel(level),
-  getElementsByHeadingContext(filter)
+  // Original document
+  raw: { type: "doc", content: [...] }
 }
 ```
@@ -133,45 +97,29 @@ result.byType = {
 ### Extracting Main Content
 ```js
-const { groups } = parseContent(doc);
+const content = parseContent(doc);
-const title = groups.main.header.title;
-const description = groups.main.body.paragraphs.join(" ");
-const image = groups.main.banner?.url;
+const title = content.title;
+const description = content.paragraphs.join(" ");
+const image = content.banner?.url;
 ```
 ### Processing Content Sections
 ```js
-const { groups } = parseContent(doc);
+const content = parseContent(doc);
 // Main content
-console.log("Main:", groups.main.header.title);
+console.log("Title:", content.title);
+console.log("Description:", content.paragraphs);
-// Additional sections
-groups.items.forEach(item => {
-  console.log("Section:", item.header.title);
-  console.log("Content:", item.body.paragraphs);
+// Additional content groups
+content.items.forEach(item => {
+  console.log("Section:", item.title);
+  console.log("Content:", item.paragraphs);
 });
 ```
-### Finding Specific Elements
-```js
-const { byType } = parseContent(doc);
-// Get all H2 headings
-const subheadings = byType.getHeadingsByLevel(2);
-// Get all background images
-const backgrounds = byType.images.background;
-// Get content under specific headings
-const features = byType.getElementsByHeadingContext(
-  h => h.content.includes("Features")
-);
-```
 ### Sequential Processing
 ```js
@@ -203,17 +151,17 @@ Automatically transform content based on field types with context-aware behavior
 ```js
 const schema = {
   title: {
-    path: "groups.main.header.title",
+    path: "title",
     type: "plaintext",  // Auto-strips <strong>, <em>, etc.
     maxLength: 60       // Auto-truncates intelligently
   },
   excerpt: {
-    path: "groups.main.body.paragraphs",
+    path: "paragraphs",
     type: "excerpt",    // Auto-creates excerpt from paragraphs
     maxLength: 150
   },
   image: {
-    path: "groups.main.body.imgs[0].url",
+    path: "imgs[0].url",
     type: "image",
     defaultValue: "/placeholder.jpg"
   }
@@ -259,15 +207,15 @@ Define custom mappings using schemas:
 ```js
 const schema = {
-  brand: "groups.main.header.pretitle",
-  title: "groups.main.header.title",
-  subtitle: "groups.main.header.subtitle",
+  brand: "pretitle",
+  title: "title",
+  subtitle: "subtitle",
   image: {
-    path: "groups.main.body.imgs[0].url",
+    path: "imgs[0].url",
     defaultValue: "/placeholder.jpg"
   },
   actions: {
-    path: "groups.main.body.links",
+    path: "links",
     transform: links => links.map(l => ({ label: l.label, type: "primary" }))
   }
 };
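
The schema changes in this file all drop the `groups.main.header.` or `groups.main.body.` prefix from mapping paths. A consumer upgrading from 1.0.8 might rewrite existing schemas mechanically. The sketch below is one way to do that; `migratePath` and `migrateSchema` are hypothetical names, not part of the package, and the regex covers only `groups.main.*` paths of the kinds shown in the diff.

```javascript
// Hypothetical migration helpers (assumed, not provided by the package).
// Strips the 1.0.8-style prefix so "groups.main.header.title" becomes "title".
const LEGACY_MAIN_PREFIX = /^groups\.main\.(?:(?:header|body)\.)?/;

function migratePath(path) {
  return path.replace(LEGACY_MAIN_PREFIX, "");
}

function migrateSchema(schema) {
  const out = {};
  for (const [key, field] of Object.entries(schema)) {
    if (typeof field === "string") {
      out[key] = migratePath(field); // shorthand form: the value is the path itself
    } else if (field && typeof field.path === "string") {
      out[key] = { ...field, path: migratePath(field.path) }; // keep other options as-is
    } else {
      out[key] = field; // unknown shape: leave untouched
    }
  }
  return out;
}

const legacySchema = {
  brand: "groups.main.header.pretitle",
  image: { path: "groups.main.body.imgs[0].url", defaultValue: "/placeholder.jpg" }
};
const flatSchema = migrateSchema(legacySchema);
console.log(flatSchema.brand, flatSchema.image.path);
```

Paths that point into `groups.items` or `byType` are deliberately not handled, since the diff does not show a one-to-one replacement for them.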

package/docs/api.md CHANGED

@@ -118,51 +118,49 @@ A flat array of semantic elements extracted from the document tree.
 ### `groups`
-Content organized into semantic groups with identified main content and items.
+Content organized into semantic groups with identified main content and items. The structure is flat - header and body fields are merged at the top level.
 ```js
 {
   main: {
-    header: {
-      pretitle: "PRETITLE TEXT",  // H3 before main title
-      title: "Main Title",         // First heading in group
-      subtitle: "Subtitle"         // Second heading in group
-    },
-    body: {
-      paragraphs: ["paragraph text", ...],
-      imgs: [
-        { url: "...", caption: "...", alt: "..." }
-      ],
-      icons: ["<svg>...</svg>", ...],
-      videos: [
-        { src: "...", caption: "...", alt: "..." }
-      ],
-      links: [
-        { href: "...", label: "..." }
-      ],
-      lists: [
-        [/* processed list items */]
-      ],
-      buttons: [
-        { content: "...", attrs: {...} }
-      ],
-      properties: [],       // Code block content
-      propertyBlocks: [],   // Array of code blocks
-      cards: [],            // Not yet implemented
-      headings: []          // Used in list items
-    },
+    // Header fields (flat)
+    pretitle: "PRETITLE TEXT",    // H3 before main title
+    title: "Main Title",           // First heading in group
+    subtitle: "Subtitle",          // Second heading in group
+    // Body fields (flat)
+    paragraphs: ["paragraph text", ...],
+    imgs: [
+      { url: "...", caption: "...", alt: "..." }
+    ],
+    icons: ["<svg>...</svg>", ...],
+    videos: [
+      { src: "...", caption: "...", alt: "..." }
+    ],
+    links: [
+      { href: "...", label: "..." }
+    ],
+    lists: [
+      [/* processed list items */]
+    ],
+    buttons: [
+      { content: "...", attrs: {...} }
+    ],
+    properties: [],       // Code block content
+    propertyBlocks: [],   // Array of code blocks
+    cards: [],            // Not yet implemented
+    headings: [],         // Used in list items
+    // Banner (flat)
     banner: {
       url: "path/to/banner.jpg",
       caption: "Banner caption",
       alt: "Banner alt text"
-    } | null,
-    metadata: {
-      level: 1,             // Heading level that started this group
-      contentTypes: {}      // Set of content types in group
-    }
+    } | null
   },
   items: [
-    // Array of groups with same structure as main
+    // Array of groups with same flat structure as main
+    // { title, pretitle, subtitle, paragraphs, imgs, ... }
   ],
   metadata: {
     dividerMode: false,     // Whether dividers were used for grouping
@@ -268,14 +266,14 @@ const result = parseContent(doc);
 ```js
 const { groups } = parseContent(doc);
-// Access main content
-console.log(groups.main.header.title);
-console.log(groups.main.body.paragraphs);
+// Access main content (flat structure)
+console.log(groups.main.title);
+console.log(groups.main.paragraphs);
 // Iterate through content items
 groups.items.forEach(item => {
-  console.log(item.header.title);
-  console.log(item.body.paragraphs);
+  console.log(item.title);
+  console.log(item.paragraphs);
 });
 ```
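
For code that still holds data in the 1.0.8 shape (nested `header` and `body`), the flattening described in this file can be approximated with a small adapter. This is a sketch under assumptions: `flattenGroup` and `flattenGroups` are hypothetical names, the per-group `metadata` is dropped because the new structure in the diff no longer lists it, and nested list items are not converted.

```javascript
// Hypothetical adapter (not part of the package): merges the legacy
// `header` and `body` objects into one flat group, keeping `banner`.
function flattenGroup(group) {
  const { header = {}, body = {}, banner = null } = group;
  return { ...header, ...body, banner };
}

function flattenGroups(groups) {
  return {
    main: groups.main ? flattenGroup(groups.main) : null,
    items: (groups.items || []).map(flattenGroup),
    metadata: groups.metadata // top-level metadata (e.g. dividerMode) is passed through
  };
}

const legacyGroups = {
  main: {
    header: { pretitle: "", title: "Welcome", subtitle: "" },
    body: { paragraphs: ["Get started today."], imgs: [], links: [] },
    banner: null,
    metadata: { level: 1 }
  },
  items: [],
  metadata: { dividerMode: false, groups: 0 }
};

const flatGroups = flattenGroups(legacyGroups);
console.log(flatGroups.main.title, flatGroups.main.paragraphs);
```

Because `body` is spread after `header`, a key present in both would take the `body` value; the field lists in the diff do not overlap, so this ordering should not matter in practice.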