npm - @uniweb/semantic-parser - Versions diffs - 1.0.0 - Mend

@uniweb/semantic-parser 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/.claude/settings.local.json +9 -0
package/.eslintrc.json +28 -0
package/LICENSE +674 -0
package/README.md +395 -0
package/docs/api.md +352 -0
package/docs/file-structure.md +50 -0
package/docs/guide.md +206 -0
package/docs/mapping-patterns.md +928 -0
package/docs/text-component-reference.md +515 -0
package/package.json +41 -0
package/reference/README.md +195 -0
package/reference/Text.js +188 -0
package/src/index.js +35 -0
package/src/mappers/accessor.js +312 -0
package/src/mappers/extractors.js +397 -0
package/src/mappers/helpers.js +234 -0
package/src/mappers/index.js +28 -0
package/src/mappers/types.js +495 -0
package/src/processors/byType.js +129 -0
package/src/processors/groups.js +330 -0
package/src/processors/groups_backup.js +379 -0
package/src/processors/groups_doc.md +179 -0
package/src/processors/sequence.js +573 -0
package/src/processors/sequence_backup.js +402 -0
package/src/utils/role.js +53 -0

package/docs/api.md ADDED

@@ -0,0 +1,352 @@
+# API Reference
+## parseContent(doc, options)
+Parses a ProseMirror/TipTap document into three semantic views.
+### Import
+```js
+import { parseContent } from '@uniweb/semantic-parser';
+```
+### Parameters
+- `doc` (Object): A ProseMirror/TipTap document object with `type: "doc"` and `content` array
+- `options` (Object, optional): Parsing options
+  - `parseCodeAsJson` (boolean): Parse code blocks as JSON for properties. Default: false
+**Note:** Body headings are always collected automatically - no configuration needed.
+### Returns
+An object with four properties providing different views of the content:
+```js
+{
+  raw: Object,      // Original ProseMirror document
+  sequence: Array,  // Flat sequence of elements
+  groups: Object,   // Semantic content groups
+  byType: Object    // Elements organized by type
+}
+```
+## Return Value Structure
+### `raw`
+The original ProseMirror document passed as input, unchanged.
+### `sequence`
+A flat array of semantic elements extracted from the document tree.
+**Element Types:**
+```js
+// Heading
+{
+  type: "heading",
+  level: 1,              // 1-6
+  content: "Text content with <strong>HTML</strong> formatting"
+}
+// Paragraph
+{
+  type: "paragraph",
+  content: "Text with <em>inline</em> <a href=\"...\">formatting</a>"
+}
+// List
+{
+  type: "list",
+  style: "bullet" | "ordered",
+  items: [
+    {
+      content: [/* array of elements */],
+      items: [/* nested list items */]
+    }
+  ]
+}
+// Image
+{
+  type: "image",
+  src: "path/to/image.jpg",
+  alt: "Alt text",
+  caption: "Caption text",
+  role: "background" | "content" | "banner" | "icon"
+}
+// Icon (SVG)
+{
+  type: "icon",
+  svg: "<svg>...</svg>"
+}
+// Video
+{
+  type: "video",
+  src: "path/to/video.mp4",
+  alt: "Alt text",
+  caption: "Caption text"
+}
+// Link (paragraph containing only a link)
+{
+  type: "link",
+  content: {
+    href: "https://example.com",
+    label: "Link text"
+  }
+}
+// Button
+{
+  type: "button",
+  content: "Button text",
+  attrs: {
+    // Button-specific attributes
+  }
+}
+// Divider (horizontal rule)
+{
+  type: "divider"
+}
+```
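Nested lists are the one recursive shape in the sequence. As a rough illustration, the sketch below walks a hand-written list element and flattens it into indented text lines. `flattenListItems` and the sample data are hypothetical and not part of the package; the sketch assumes each nested entry in `items` has the same `{ content, items }` shape as its parent.

```js
// Hypothetical sample shaped like the "List" element documented above.
const sampleList = {
  type: "list",
  style: "bullet",
  items: [
    {
      content: [{ type: "paragraph", content: "Enterprise" }],
      items: [
        { content: [{ type: "paragraph", content: "Audit logs" }], items: [] }
      ]
    },
    { content: [{ type: "paragraph", content: "Team" }], items: [] }
  ]
};

// Hypothetical helper: depth-first walk that emits one line per list item.
function flattenListItems(items, depth = 0) {
  const lines = [];
  for (const item of items) {
    // Only paragraph text is used for the label in this sketch.
    const text = item.content
      .filter((el) => el.type === "paragraph")
      .map((el) => el.content)
      .join(" ");
    lines.push(`${"  ".repeat(depth)}- ${text}`);
    // Recurse into nested items, if any.
    lines.push(...flattenListItems(item.items || [], depth + 1));
  }
  return lines;
}

const listLines = flattenListItems(sampleList.items);
console.log(listLines.join("\n"));
```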
+### `groups`
+Content organized into semantic groups with identified main content and items.
+```js
+{
+  main: {
+    header: {
+      pretitle: "PRETITLE TEXT",  // H3 before main title
+      title: "Main Title",         // First heading in group
+      subtitle: "Subtitle"         // Second heading in group
+    },
+    body: {
+      paragraphs: ["paragraph text", ...],
+      imgs: [
+        { url: "...", caption: "...", alt: "..." }
+      ],
+      icons: ["<svg>...</svg>", ...],
+      videos: [
+        { src: "...", caption: "...", alt: "..." }
+      ],
+      links: [
+        { href: "...", label: "..." }
+      ],
+      lists: [
+        [/* processed list items */]
+      ],
+      buttons: [
+        { content: "...", attrs: {...} }
+      ],
+      properties: [],       // Code block content
+      propertyBlocks: [],   // Array of code blocks
+      cards: [],            // Not yet implemented
+      headings: []          // Used in list items
+    },
+    banner: {
+      url: "path/to/banner.jpg",
+      caption: "Banner caption",
+      alt: "Banner alt text"
+    } | null,
+    metadata: {
+      level: 1,             // Heading level that started this group
+      contentTypes: {}      // Set of content types in group
+    }
+  },
+  items: [
+    // Array of groups with same structure as main
+  ],
+  metadata: {
+    dividerMode: false,     // Whether dividers were used for grouping
+    groups: 0               // Total number of groups
+  }
+}
+```
+**Grouping Modes:**
+1. **Heading-based grouping** (default): Groups start with heading patterns
+2. **Divider-based grouping**: When any `horizontalRule` is present, groups are split by dividers
+**Main Content Identification:**
+- Single group → always main content
+- Multiple groups → the first group is main if its heading level is numerically lower than the second group's (for example, an H1 group followed by an H2 group)
+- Divider mode starting with divider → no main content, all items
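The rules above can be restated as a small decision function. This is only a sketch and not the package's implementation: `pickMain`, its option names, and the fallback for cases the rules do not cover are assumptions, and "lower heading level" is read as numerically lower (H1 before H2).

```js
// Hypothetical helper restating the main-content rules listed above.
// `groupList` is an array of groups that each carry `metadata.level`.
function pickMain(groupList, { dividerMode = false, startsWithDivider = false } = {}) {
  if (groupList.length === 0) return { main: null, items: [] };

  // Single group: always main content.
  if (groupList.length === 1) return { main: groupList[0], items: [] };

  // Divider mode starting with a divider: no main content, all items.
  if (dividerMode && startsWithDivider) return { main: null, items: groupList };

  // Multiple groups: the first is main if its heading level is
  // numerically lower than the second group's.
  const [first, second, ...rest] = groupList;
  if (first.metadata.level < second.metadata.level) {
    return { main: first, items: [second, ...rest] };
  }

  // Assumption: anything the stated rules do not cover has no main content.
  return { main: null, items: groupList };
}

const picked = pickMain([
  { metadata: { level: 1 } },
  { metadata: { level: 2 } },
  { metadata: { level: 2 } }
]);
console.log(picked.main.metadata.level, picked.items.length);
```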
+### `byType`
+Elements organized by type with positional context.
+```js
+{
+  headings: [
+    {
+      type: "heading",
+      level: 1,
+      content: "Title",
+      context: {
+        position: 0,
+        previousElement: null,
+        nextElement: { type: "paragraph", ... },
+        nearestHeading: null
+      }
+    }
+  ],
+  paragraphs: [
+    {
+      type: "paragraph",
+      content: "Text",
+      context: { ... }
+    }
+  ],
+  images: {
+    background: [/* images with role="background" */],
+    content: [/* images with role="content" */],
+    gallery: [/* images with role="gallery" */],
+    icon: [/* images with role="icon" */]
+  },
+  lists: [/* list elements with context */],
+  dividers: [/* divider elements with context */],
+  metadata: {
+    totalElements: 10,
+    dominantType: "paragraph",
+    hasMedia: true
+  },
+  // Helper methods
+  getHeadingsByLevel(level),
+  getElementsByHeadingContext(headingFilter)
+}
+```
+**Helper Methods:**
+```js
+// Get all H1 headings
+byType.getHeadingsByLevel(1)
+// Get all elements under headings matching a filter
+byType.getElementsByHeadingContext((heading) => heading.level === 2)
+```
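To make the two helpers more concrete, here is a standalone approximation written against hand-made data. The function names, the sample object, and the assumption that `context.nearestHeading` holds a heading-like object are all hypothetical. The real helpers are methods on `byType` and may behave differently, for example by searching more than paragraphs.

```js
// Hypothetical sample shaped like the `byType` view documented above.
const sampleByType = {
  headings: [
    { type: "heading", level: 1, content: "Title", context: { position: 0, nearestHeading: null } },
    { type: "heading", level: 2, content: "Features", context: { position: 2, nearestHeading: null } }
  ],
  paragraphs: [
    { type: "paragraph", content: "Intro", context: { position: 1, nearestHeading: { level: 1, content: "Title" } } },
    { type: "paragraph", content: "Fast", context: { position: 3, nearestHeading: { level: 2, content: "Features" } } }
  ]
};

// Approximation of getHeadingsByLevel: keep headings with the given level.
function headingsByLevel(byType, level) {
  return byType.headings.filter((heading) => heading.level === level);
}

// Approximation of getElementsByHeadingContext: keep elements whose
// nearest heading passes the filter (only paragraphs, for brevity).
function elementsUnderHeadings(byType, headingFilter) {
  return byType.paragraphs.filter(
    (el) => el.context.nearestHeading && headingFilter(el.context.nearestHeading)
  );
}

const h2s = headingsByLevel(sampleByType, 2);
const underH2 = elementsUnderHeadings(sampleByType, (h) => h.level === 2);
console.log(h2s.length, underH2.map((el) => el.content));
```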
+## Usage Examples
+### Basic Usage
+```js
+import { parseContent } from "@uniweb/semantic-parser";
+const doc = {
+  type: "doc",
+  content: [
+    {
+      type: "heading",
+      attrs: { level: 1 },
+      content: [{ type: "text", text: "Welcome" }]
+    },
+    {
+      type: "paragraph",
+      content: [{ type: "text", text: "Get started today." }]
+    }
+  ]
+};
+const result = parseContent(doc);
+```
+### Working with Groups
+```js
+const { groups } = parseContent(doc);
+// Access main content
+console.log(groups.main.header.title);
+console.log(groups.main.body.paragraphs);
+// Iterate through content items
+groups.items.forEach(item => {
+  console.log(item.header.title);
+  console.log(item.body.paragraphs);
+});
+```
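Because some documents produce no main content (see "Main Content Identification"), it may be safer to guard the `main` access. The sketch below uses a hand-written `sampleGroups` object and assumes a missing main group is represented as `null` or `undefined`; the reference above does not state which.

```js
// Hypothetical groups object with no main content, only items.
const sampleGroups = {
  main: null,
  items: [
    { header: { title: "First" }, body: { paragraphs: ["One"] } },
    { header: { title: "Second" }, body: { paragraphs: [] } }
  ]
};

// Optional chaining covers both null and undefined main groups.
const mainTitle = sampleGroups.main?.header?.title ?? "(no main content)";

// Items are read the same way; a missing title falls back to an empty string.
const itemTitles = sampleGroups.items.map((item) => item.header?.title ?? "");

console.log(mainTitle, itemTitles);
```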
+### Working with byType
+```js
+const { byType } = parseContent(doc);
+// Get all images
+const allImages = Object.values(byType.images).flat();
+// Get all H2 headings
+const h2Headings = byType.getHeadingsByLevel(2);
+// Get content under specific headings
+const featuresContent = byType.getElementsByHeadingContext(
+  h => h.content.includes("Features")
+);
+```
+### Working with Sequence
+```js
+const { sequence } = parseContent(doc);
+// Process elements in order
+sequence.forEach(element => {
+  switch(element.type) {
+    case 'heading':
+      console.log(`H${element.level}: ${element.content}`);
+      break;
+    case 'paragraph':
+      console.log(`P: ${element.content}`);
+      break;
+  }
+});
+```
+## Text Formatting
+The parser preserves inline formatting as HTML tags within text content:
+- **Bold**: `<strong>text</strong>`
+- **Italic**: `<em>text</em>`
+- **Links**: `<a href="url">text</a>`
+```js
+// Input
+{
+  type: "paragraph",
+  content: [
+    { type: "text", text: "Normal " },
+    { type: "text", marks: [{ type: "bold" }], text: "bold" }
+  ]
+}
+// Output
+{
+  type: "paragraph",
+  content: "Normal <strong>bold</strong>"
+}
+```
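The mark-to-tag mapping can be sketched as follows. This is an illustration, not the parser's code: `renderInline` and `MARK_WRAPPERS` are hypothetical names, the `italic` and `link` mark names (with `attrs.href`) are assumed from TipTap's default extensions, and no HTML escaping is performed.

```js
// Assumed mark names: "bold" appears in the example above; "italic" and
// "link" (with attrs.href) follow TipTap's default extensions.
const MARK_WRAPPERS = {
  bold: (text) => `<strong>${text}</strong>`,
  italic: (text) => `<em>${text}</em>`,
  link: (text, attrs = {}) => `<a href="${attrs.href}">${text}</a>`
};

// Hypothetical helper: turns a paragraph's inline nodes into an HTML string.
function renderInline(nodes) {
  return nodes
    .filter((node) => node.type === "text")
    .map((node) =>
      // Apply each known mark in order; unknown marks leave the text as-is.
      (node.marks || []).reduce((text, mark) => {
        const wrap = MARK_WRAPPERS[mark.type];
        return wrap ? wrap(text, mark.attrs) : text;
      }, node.text)
    )
    .join("");
}

const inlineHtml = renderInline([
  { type: "text", text: "Normal " },
  { type: "text", marks: [{ type: "bold" }], text: "bold" }
]);
console.log(inlineHtml); // Normal <strong>bold</strong>
```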
+## Special Element Detection
+The parser detects special patterns and extracts them as dedicated element types:
+- **Paragraph with only a link** → `type: "link"`
+- **Paragraph with only an image** (role: image/banner) → `type: "image"`
+- **Paragraph with only an icon** (role: icon) → `type: "icon"`
+- **Paragraph with only a button mark** → `type: "button"`
+- **Paragraph with only a video** (role: video) → `type: "video"`
+This makes it easier to identify and handle these special cases in downstream processing.
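As an example of one such pattern, the sketch below detects a paragraph that contains only a link and converts it to the `link` element shape from the `sequence` section. `toLinkElement` is a hypothetical helper, and the input assumes TipTap's default `link` mark with `attrs.href`; the parser's actual detection logic may differ.

```js
// Hypothetical helper: returns a link element for link-only paragraphs,
// or null when the paragraph does not match the pattern.
function toLinkElement(paragraph) {
  const nodes = paragraph.content || [];
  if (paragraph.type !== "paragraph" || nodes.length !== 1) return null;

  const [node] = nodes;
  const linkMark = (node.marks || []).find((mark) => mark.type === "link");
  if (node.type !== "text" || !linkMark) return null;

  return {
    type: "link",
    content: { href: linkMark.attrs.href, label: node.text }
  };
}

const linkOnly = toLinkElement({
  type: "paragraph",
  content: [
    {
      type: "text",
      text: "Link text",
      marks: [{ type: "link", attrs: { href: "https://example.com" } }]
    }
  ]
});
console.log(linkOnly);
```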

package/docs/file-structure.md ADDED

@@ -0,0 +1,50 @@
+# File Structure
+```
+semantic-parser/
+├── package.json
+├── README.md
+├── CLAUDE.md            # Guidance for Claude Code
+├── src/
+│   ├── index.js         # Main entry point and API
+│   ├── processors/
+│   │   ├── sequence.js  # Flattens ProseMirror doc to sequence
+│   │   ├── groups.js    # Creates semantic content groups
+│   │   └── byType.js    # Organizes elements by type
+│   ├── processors_old/  # Legacy implementations (deprecated)
+│   │   ├── sequence.js
+│   │   ├── groups.js
+│   │   └── byType.js
+│   └── utils/
+│       └── role.js      # Role detection utilities
+├── tests/
+│   ├── parser.test.js   # Integration tests
+│   ├── processors/
+│   │   ├── sequence.test.js
+│   │   ├── groups.test.js
+│   │   └── byType.test.js
+│   ├── utils/
+│   │   └── role.test.js
+│   └── fixtures/
+│       ├── basic.js     # Simple test cases
+│       ├── groups.js    # Group formation test cases
+│       └── complex.js   # Complex scenarios
+└── docs/
+    ├── guide.md         # Content writing guide
+    ├── api.md           # API reference documentation
+    └── file-structure.md # This file
+```
+## Key Directories
+### `src/processors/`
+Contains the three-stage processing pipeline that transforms ProseMirror documents into semantic structures.
+### `src/processors_old/`
+Legacy implementations kept for reference. Do not modify these files.
+### `tests/`
+Comprehensive test suite organized by processor with shared fixtures.
+### `docs/`
+End-user documentation including content writing guide and API reference.

package/docs/guide.md ADDED

@@ -0,0 +1,206 @@
+# Content Writing Guide
+This guide explains how to write content that works well with our semantic parser. The parser helps web components understand and render your content effectively by identifying its structure and meaning.
+## Core Concepts
+The parser recognizes two key elements in your content:
+- **Main Content**: The primary content that introduces your section
+- **Groups**: Additional content blocks that follow a consistent structure
+## Main Content
+Main content provides the primary context for your section. Here's how to write it:
+```markdown
+### SOLUTIONS
+# Build Better Websites
+## For Everyone
+Transform how you create web content with our powerful platform.
+```
+Your main content must have exactly one main title, which can be either:
+- A single H1 heading, or
+- A single H2 heading (if no H1 exists)
+You can also add:
+- A pretitle (H3 before main title)
+- A subtitle (the next-level heading after the main title)
+For line breaks in titles, use HTML break tags:
+```markdown
+# Build Better<br>Websites Today
+```
+## Content Groups
+After your main content, you can add multiple content groups. Groups can start with any heading level, and each group can have its own structure.
+H3s can serve two different roles depending on context:
+1. As a pretitle when followed by a higher-level heading (H1 or H2):
+```markdown
+### SPEED MATTERS
+## Performance Features
+Modern websites need to be fast. Our platform ensures quick load times
+across all devices.
+```
+2. As a regular group title when followed by content or lower-level headings:
+```markdown
+### Getting Started
+Start building your website in minutes.
+### Installation Guide
+#### Prerequisites
+Make sure you have Node.js installed...
+```
+Each group can have:
+- A title (any heading level that starts the group)
+- A pretitle (H3 followed by higher-level heading)
+- A subtitle (lower-level heading after title)
+- Content (text, lists, media)
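For readers who think in code, the H3 rule can be expressed as a check on neighbouring elements in the parsed sequence. `isPretitle` is a hypothetical helper written for this guide; it only restates the rule above and is not how the parser is necessarily implemented.

```js
// Hypothetical helper: an H3 is a pretitle when the very next element
// is a higher-level heading (H1 or H2).
function isPretitle(sequence, index) {
  const current = sequence[index];
  const next = sequence[index + 1];
  return Boolean(
    current &&
      current.type === "heading" &&
      current.level === 3 &&
      next &&
      next.type === "heading" &&
      next.level < 3
  );
}

const pretitleCase = [
  { type: "heading", level: 3, content: "SPEED MATTERS" },
  { type: "heading", level: 2, content: "Performance Features" },
  { type: "paragraph", content: "Modern websites need to be fast." }
];
const titleCase = [
  { type: "heading", level: 3, content: "Getting Started" },
  { type: "paragraph", content: "Start building your website in minutes." }
];

console.log(isPretitle(pretitleCase, 0), isPretitle(titleCase, 0));
```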
+## Creating Groups
+There are two ways to create groups: using headings or using dividers.
+### Using Headings
+Headings naturally create groups when they appear after content:
+```markdown
+# Main Features
+Our platform offers powerful capabilities.
+## Fast Performance
+Lightning quick response times...
+## Easy Integration
+Connect with your existing tools...
+```
+Multiple H1s or multiple H2s (with no H1) will create separate groups:
+```markdown
+# First Group
+Content for first group...
+# Second Group
+Content for second group...
+```
+### Using Dividers
+Alternatively, you can use dividers (---) to explicitly separate groups:
+```markdown
+# Welcome Section
+Our main welcome message.
+---
+Get started with our platform
+with these simple steps.
+---
+Contact us to learn more
+about enterprise solutions.
+```
+Important: Once you use a divider, you must use dividers for all group separations in that section. Don't mix heading-based and divider-based group creation.
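Conceptually, divider mode cuts the content wherever a divider appears. The sketch below shows that idea on a hand-written sequence. `splitByDividers` is a hypothetical helper, and the parser's real grouping logic may handle more cases.

```js
// Hypothetical helper: start a new group at every divider element.
function splitByDividers(sequence) {
  const groups = [[]];
  for (const element of sequence) {
    if (element.type === "divider") {
      groups.push([]);
    } else {
      groups[groups.length - 1].push(element);
    }
  }
  // Drop empty groups produced by leading, trailing, or doubled dividers.
  return groups.filter((group) => group.length > 0);
}

const dividedSequence = [
  { type: "heading", level: 1, content: "Welcome Section" },
  { type: "paragraph", content: "Our main welcome message." },
  { type: "divider" },
  { type: "paragraph", content: "Get started with our platform with these simple steps." },
  { type: "divider" },
  { type: "paragraph", content: "Contact us to learn more about enterprise solutions." }
];

const dividedGroups = splitByDividers(dividedSequence);
console.log(dividedGroups.map((group) => group.length));
```

Note that the empty-group filter hides whether the content began with a divider; a caller that cares can check `sequence[0]?.type === "divider"` separately.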
+## Rich Content
+### Lists
+Lists maintain their hierarchy, making them perfect for structured data:
+```markdown
+## Features
+- Enterprise
+  - Role-based access
+  - Audit logs
+- Team
+  - Collaboration
+  - API access
+```
+### Media
+Images and videos can have explicit roles:
+```markdown
+![Hero](hero.jpg){role="background"}
+![Icon](icon.svg){role="icon"}
+![](photo.jpg) # Default role is "content"
+```
+Common roles include:
+- background
+- content
+- gallery
+- icon
+### Links
+Links can have roles to indicate their purpose:
+```markdown
+[Get Started](./start){role="button-primary"}
+[Learn More](./docs){role="button"}
+[Privacy](./legal){role="footer-link"}
+```
+Common link roles include:
+- button-primary
+- button
+- button-outline
+- nav-link
+- footer-link
+## Important Notes
+1. Main content is only recognized when there's exactly one main title (H1 or H2).
+2. Avoid repeating the top-level heading, as in the pattern below; it results in no main content:
+```markdown
+# First Title
+Content...
+# Second Title
+Content...
+```
+3. All content is optional - components may choose what to render based on their needs and configuration.
+4. Be consistent with your group creation method - use either headings or dividers, not both.
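The "exactly one main title" rule from note 1 can be checked mechanically. `hasSingleMainTitle` below is a hypothetical helper that only restates the rule as written in this guide (one H1, or one H2 when there is no H1). It is a sketch, not the parser's implementation.

```js
// Hypothetical helper: true when the headings contain exactly one main title.
function hasSingleMainTitle(headings) {
  const h1Count = headings.filter((h) => h.level === 1).length;
  // If any H1 exists, there must be exactly one.
  if (h1Count > 0) return h1Count === 1;
  // Otherwise a single H2 acts as the main title.
  const h2Count = headings.filter((h) => h.level === 2).length;
  return h2Count === 1;
}

const goodHeadings = [{ level: 3 }, { level: 1 }, { level: 2 }];
const repeatedH1 = [{ level: 1 }, { level: 1 }];

console.log(hasSingleMainTitle(goodHeadings), hasSingleMainTitle(repeatedH1));
```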