@uniweb/semantic-parser 1.0.0

package/docs/api.md ADDED
@@ -0,0 +1,352 @@
1
+ # API Reference
2
+
3
+ ## parseContent(doc, options)
4
+
5
+ Parses a ProseMirror/TipTap document into three semantic views.
6
+
7
+ ### Import
8
+
9
+ ```js
10
+ import { parseContent } from '@uniwebcms/semantic-parser';
11
+ ```
12
+
13
+ ### Parameters
14
+
15
+ - `doc` (Object): A ProseMirror/TipTap document object with `type: "doc"` and `content` array
16
+ - `options` (Object, optional): Parsing options
17
+ - `parseCodeAsJson` (boolean): Parse code blocks as JSON for properties. Default: false
18
+
19
+ **Note:** Body headings are always collected automatically - no configuration needed.
20
+
21
+ ### Returns
22
+
23
+ An object with four properties: the original document (`raw`) plus three views of the content:
24
+
25
+ ```js
26
+ {
27
+ raw: Object, // Original ProseMirror document
28
+ sequence: Array, // Flat sequence of elements
29
+ groups: Object, // Semantic content groups
30
+ byType: Object // Elements organized by type
31
+ }
32
+ ```
33
+
34
+ ## Return Value Structure
35
+
36
+ ### `raw`
37
+
38
+ The original ProseMirror document passed as input, unchanged.
39
+
40
+ ### `sequence`
41
+
42
+ A flat array of semantic elements extracted from the document tree.
43
+
44
+ **Element Types:**
45
+
46
+ ```js
47
+ // Heading
48
+ {
49
+ type: "heading",
50
+ level: 1, // 1-6
51
+ content: "Text content with <strong>HTML</strong> formatting"
52
+ }
53
+
54
+ // Paragraph
55
+ {
56
+ type: "paragraph",
57
+ content: "Text with <em>inline</em> <a href=\"...\">formatting</a>"
58
+ }
59
+
60
+ // List
61
+ {
62
+ type: "list",
63
+ style: "bullet" | "ordered",
64
+ items: [
65
+ {
66
+ content: [/* array of elements */],
67
+ items: [/* nested list items */]
68
+ }
69
+ ]
70
+ }
71
+
72
+ // Image
73
+ {
74
+ type: "image",
75
+ src: "path/to/image.jpg",
76
+ alt: "Alt text",
77
+ caption: "Caption text",
78
+ role: "background" | "content" | "banner" | "icon"
79
+ }
80
+
81
+ // Icon (SVG)
82
+ {
83
+ type: "icon",
84
+ svg: "<svg>...</svg>"
85
+ }
86
+
87
+ // Video
88
+ {
89
+ type: "video",
90
+ src: "path/to/video.mp4",
91
+ alt: "Alt text",
92
+ caption: "Caption text"
93
+ }
94
+
95
+ // Link (paragraph containing only a link)
96
+ {
97
+ type: "link",
98
+ content: {
99
+ href: "https://example.com",
100
+ label: "Link text"
101
+ }
102
+ }
103
+
104
+ // Button
105
+ {
106
+ type: "button",
107
+ content: "Button text",
108
+ attrs: {
109
+ // Button-specific attributes
110
+ }
111
+ }
112
+
113
+ // Divider (horizontal rule)
114
+ {
115
+ type: "divider"
116
+ }
117
+ ```
118
+
119
+ ### `groups`
120
+
121
+ Content organized into semantic groups with identified main content and items.
122
+
123
+ ```js
124
+ {
125
+ main: {
126
+ header: {
127
+ pretitle: "PRETITLE TEXT", // H3 before main title
128
+ title: "Main Title", // First heading in group
129
+ subtitle: "Subtitle" // Second heading in group
130
+ },
131
+ body: {
132
+ paragraphs: ["paragraph text", ...],
133
+ imgs: [
134
+ { url: "...", caption: "...", alt: "..." }
135
+ ],
136
+ icons: ["<svg>...</svg>", ...],
137
+ videos: [
138
+ { src: "...", caption: "...", alt: "..." }
139
+ ],
140
+ links: [
141
+ { href: "...", label: "..." }
142
+ ],
143
+ lists: [
144
+ [/* processed list items */]
145
+ ],
146
+ buttons: [
147
+ { content: "...", attrs: {...} }
148
+ ],
149
+ properties: [], // Code block content
150
+ propertyBlocks: [], // Array of code blocks
151
+ cards: [], // Not yet implemented
152
+ headings: [] // Used in list items
153
+ },
154
+ banner: {
155
+ url: "path/to/banner.jpg",
156
+ caption: "Banner caption",
157
+ alt: "Banner alt text"
158
+ } | null,
159
+ metadata: {
160
+ level: 1, // Heading level that started this group
161
+ contentTypes: {} // Set of content types in group
162
+ }
163
+ },
164
+ items: [
165
+ // Array of groups with same structure as main
166
+ ],
167
+ metadata: {
168
+ dividerMode: false, // Whether dividers were used for grouping
169
+ groups: 0 // Total number of groups
170
+ }
171
+ }
172
+ ```
173
+
174
+ **Grouping Modes:**
175
+
176
+ 1. **Heading-based grouping** (default): Groups start with heading patterns
177
+ 2. **Divider-based grouping**: When any `horizontalRule` is present, groups are split by dividers
178
+
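
The choice between the two grouping modes can be sketched as a small check on the raw document. This is only an illustration of the rule as stated above: `detectGroupingMode` is a hypothetical helper, not part of the package API, and it assumes that only top-level nodes are inspected.

```js
// Hypothetical helper: reports which grouping mode the documented rule implies.
// Assumption: only top-level nodes of the document are checked for dividers.
function detectGroupingMode(doc) {
  const nodes = Array.isArray(doc && doc.content) ? doc.content : [];
  // Any horizontalRule switches the whole section to divider-based grouping.
  const hasDivider = nodes.some((node) => node.type === "horizontalRule");
  return hasDivider ? "divider" : "heading";
}

const headingDoc = {
  type: "doc",
  content: [
    { type: "heading", attrs: { level: 1 }, content: [{ type: "text", text: "Welcome" }] },
    { type: "paragraph", content: [{ type: "text", text: "Get started today." }] }
  ]
};

const dividerDoc = {
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "First block" }] },
    { type: "horizontalRule" },
    { type: "paragraph", content: [{ type: "text", text: "Second block" }] }
  ]
};

console.log(detectGroupingMode(headingDoc)); // "heading"
console.log(detectGroupingMode(dividerDoc)); // "divider"
```
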
179
+ **Main Content Identification:**
180
+
181
+ - Single group → always main content
182
+ - Multiple groups → first group is main if its heading level is numerically lower (more prominent) than the second group's, e.g. an H1 group followed by H2 groups
183
+ - Divider mode starting with divider → no main content, all items
184
+
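
The three identification rules above might be expressed roughly as follows. This is a sketch under stated assumptions, not the parser's actual code: `splitMainAndItems` is a hypothetical name, "no main content" is represented as `null`, and "lower heading level" is read as a smaller level number (H1 before H2).

```js
// Hypothetical helper mirroring the documented rules.
// `groupList` is an ordered array of groups, each with `metadata.level`.
function splitMainAndItems(groupList, startsWithDivider = false) {
  if (groupList.length === 0) return { main: null, items: [] };
  // Divider mode starting with a divider: no main content, everything is an item.
  if (startsWithDivider) return { main: null, items: groupList };
  // A single group is always main content.
  if (groupList.length === 1) return { main: groupList[0], items: [] };
  const [first, second, ...rest] = groupList;
  // The first group is main only if its heading is more prominent than the second's.
  if (first.metadata.level < second.metadata.level) {
    return { main: first, items: [second, ...rest] };
  }
  return { main: null, items: groupList };
}

const h1Group = { header: { title: "Main Features" }, metadata: { level: 1 } };
const h2GroupA = { header: { title: "Fast Performance" }, metadata: { level: 2 } };
const h2GroupB = { header: { title: "Easy Integration" }, metadata: { level: 2 } };

const withMain = splitMainAndItems([h1Group, h2GroupA, h2GroupB]);
const withoutMain = splitMainAndItems([h2GroupA, h2GroupB]);

console.log(withMain.main.header.title); // "Main Features"
console.log(withoutMain.main);           // null
```
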
185
+ ### `byType`
186
+
187
+ Elements organized by type with positional context.
188
+
189
+ ```js
190
+ {
191
+ headings: [
192
+ {
193
+ type: "heading",
194
+ level: 1,
195
+ content: "Title",
196
+ context: {
197
+ position: 0,
198
+ previousElement: null,
199
+ nextElement: { type: "paragraph", ... },
200
+ nearestHeading: null
201
+ }
202
+ }
203
+ ],
204
+ paragraphs: [
205
+ {
206
+ type: "paragraph",
207
+ content: "Text",
208
+ context: { ... }
209
+ }
210
+ ],
211
+ images: {
212
+ background: [/* images with role="background" */],
213
+ content: [/* images with role="content" */],
214
+ gallery: [/* images with role="gallery" */],
215
+ icon: [/* images with role="icon" */]
216
+ },
217
+ lists: [/* list elements with context */],
218
+ dividers: [/* divider elements with context */],
219
+ metadata: {
220
+ totalElements: 10,
221
+ dominantType: "paragraph",
222
+ hasMedia: true
223
+ },
224
+
225
+ // Helper methods
226
+ getHeadingsByLevel(level),
227
+ getElementsByHeadingContext(headingFilter)
228
+ }
229
+ ```
230
+
231
+ **Helper Methods:**
232
+
233
+ ```js
234
+ // Get all H1 headings
235
+ byType.getHeadingsByLevel(1)
236
+
237
+ // Get all elements under headings matching a filter
238
+ byType.getElementsByHeadingContext((heading) => heading.level === 2)
239
+ ```
240
+
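
To make the helper semantics concrete, here are stand-in versions that operate on a plain `byType`-shaped object. They are illustrative only and the package's own helpers may behave differently. In particular, the sketch assumes that "under a heading" means the element's `context.nearestHeading` passes the filter, and that only `paragraphs` and `lists` are searched.

```js
// Stand-in for byType.getHeadingsByLevel (illustrative, not the package code).
function getHeadingsByLevel(byType, level) {
  return byType.headings.filter((heading) => heading.level === level);
}

// Stand-in for byType.getElementsByHeadingContext.
// Assumption: an element is "under" a heading when context.nearestHeading matches.
function getElementsByHeadingContext(byType, headingFilter) {
  return [...byType.paragraphs, ...byType.lists].filter((element) => {
    const nearest = element.context && element.context.nearestHeading;
    return Boolean(nearest) && headingFilter(nearest);
  });
}

const titleHeading = { type: "heading", level: 1, content: "Title" };
const featuresHeading = { type: "heading", level: 2, content: "Features" };

const sampleByType = {
  headings: [titleHeading, featuresHeading],
  paragraphs: [
    { type: "paragraph", content: "Intro", context: { nearestHeading: titleHeading } },
    { type: "paragraph", content: "Fast", context: { nearestHeading: featuresHeading } }
  ],
  lists: []
};

const h2Only = getHeadingsByLevel(sampleByType, 2);
const underH2 = getElementsByHeadingContext(sampleByType, (h) => h.level === 2);

console.log(h2Only.map((h) => h.content));  // [ 'Features' ]
console.log(underH2.map((p) => p.content)); // [ 'Fast' ]
```
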
241
+ ## Usage Examples
242
+
243
+ ### Basic Usage
244
+
245
+ ```js
246
+ import { parseContent } from "@uniwebcms/semantic-parser";
247
+
248
+ const doc = {
249
+ type: "doc",
250
+ content: [
251
+ {
252
+ type: "heading",
253
+ attrs: { level: 1 },
254
+ content: [{ type: "text", text: "Welcome" }]
255
+ },
256
+ {
257
+ type: "paragraph",
258
+ content: [{ type: "text", text: "Get started today." }]
259
+ }
260
+ ]
261
+ };
262
+
263
+ const result = parseContent(doc);
264
+ ```
265
+
266
+ ### Working with Groups
267
+
268
+ ```js
269
+ const { groups } = parseContent(doc);
270
+
271
+ // Access main content (optional chaining guards against sections with no main content)
272
+ console.log(groups.main?.header.title);
273
+ console.log(groups.main?.body.paragraphs);
274
+
275
+ // Iterate through content items
276
+ groups.items.forEach(item => {
277
+ console.log(item.header.title);
278
+ console.log(item.body.paragraphs);
279
+ });
280
+ ```
281
+
282
+ ### Working with byType
283
+
284
+ ```js
285
+ const { byType } = parseContent(doc);
286
+
287
+ // Get all images
288
+ const allImages = Object.values(byType.images).flat();
289
+
290
+ // Get all H2 headings
291
+ const h2Headings = byType.getHeadingsByLevel(2);
292
+
293
+ // Get content under specific headings
294
+ const featuresContent = byType.getElementsByHeadingContext(
295
+ h => h.content.includes("Features")
296
+ );
297
+ ```
298
+
299
+ ### Working with Sequence
300
+
301
+ ```js
302
+ const { sequence } = parseContent(doc);
303
+
304
+ // Process elements in order
305
+ sequence.forEach(element => {
306
+ switch(element.type) {
307
+ case 'heading':
308
+ console.log(`H${element.level}: ${element.content}`);
309
+ break;
310
+ case 'paragraph':
311
+ console.log(`P: ${element.content}`);
312
+ break;
313
+ }
314
+ });
315
+ ```
316
+
317
+ ## Text Formatting
318
+
319
+ The parser preserves inline formatting as HTML tags within text content:
320
+
321
+ - **Bold**: `<strong>text</strong>`
322
+ - **Italic**: `<em>text</em>`
323
+ - **Links**: `<a href="url">text</a>`
324
+
325
+ ```js
326
+ // Input
327
+ {
328
+ type: "paragraph",
329
+ content: [
330
+ { type: "text", text: "Normal " },
331
+ { type: "text", marks: [{ type: "bold" }], text: "bold" }
332
+ ]
333
+ }
334
+
335
+ // Output
336
+ {
337
+ type: "paragraph",
338
+ content: "Normal <strong>bold</strong>"
339
+ }
340
+ ```
341
+
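
The input/output pair above can be reproduced with a small mark-to-tag mapping. This is a minimal sketch, not the parser's implementation: `renderInline` and `MARK_TAGS` are hypothetical names, the `italic` and `link` mark names follow common TipTap conventions and are assumptions here, and no HTML escaping is performed.

```js
// Hypothetical mapping from mark type to HTML tag (only `bold` appears in the doc's example).
const MARK_TAGS = { bold: "strong", italic: "em" };

// Converts an array of ProseMirror text nodes into an HTML string.
// Note: text is not escaped in this sketch.
function renderInline(nodes) {
  return nodes
    .map((node) => {
      const marks = node.marks || [];
      // Wrap the text once per mark, starting from the first mark listed.
      return marks.reduce((html, mark) => {
        if (mark.type === "link") {
          const href = (mark.attrs && mark.attrs.href) || "";
          return `<a href="${href}">${html}</a>`;
        }
        const tag = MARK_TAGS[mark.type];
        return tag ? `<${tag}>${html}</${tag}>` : html;
      }, node.text);
    })
    .join("");
}

const inlineNodes = [
  { type: "text", text: "Normal " },
  { type: "text", marks: [{ type: "bold" }], text: "bold" }
];

console.log(renderInline(inlineNodes)); // Normal <strong>bold</strong>
```
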
342
+ ## Special Element Detection
343
+
344
+ The parser detects special patterns and extracts them as dedicated element types:
345
+
346
+ - **Paragraph with only a link** → `type: "link"`
347
+ - **Paragraph with only an image** (role: image/banner) → `type: "image"`
348
+ - **Paragraph with only an icon** (role: icon) → `type: "icon"`
349
+ - **Paragraph with only a button mark** → `type: "button"`
350
+ - **Paragraph with only a video** (role: video) → `type: "video"`
351
+
352
+ This makes it easier to identify and handle these special cases in downstream processing.
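
As an illustration of the first pattern, a link-only paragraph could be detected along these lines. The function name `toLinkElement` is hypothetical, and the sketch assumes a link is represented as a `link` mark with `attrs.href` on a single text node; the parser's real detection may cover more cases.

```js
// Hypothetical detector: returns a `link` element for a paragraph that
// contains exactly one text node carrying a link mark, otherwise null.
function toLinkElement(paragraph) {
  const children = paragraph.content || [];
  if (children.length !== 1 || children[0].type !== "text") return null;
  const linkMark = (children[0].marks || []).find((mark) => mark.type === "link");
  if (!linkMark) return null;
  return {
    type: "link",
    content: { href: linkMark.attrs.href, label: children[0].text }
  };
}

const linkOnlyParagraph = {
  type: "paragraph",
  content: [
    {
      type: "text",
      text: "Link text",
      marks: [{ type: "link", attrs: { href: "https://example.com" } }]
    }
  ]
};

const plainParagraph = {
  type: "paragraph",
  content: [{ type: "text", text: "Get started today." }]
};

console.log(toLinkElement(linkOnlyParagraph));
// { type: 'link', content: { href: 'https://example.com', label: 'Link text' } }
console.log(toLinkElement(plainParagraph)); // null
```
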
package/docs/file-structure.md ADDED
@@ -0,0 +1,50 @@
1
+ # File Structure
2
+
3
+ ```
4
+ semantic-parser/
5
+ ├── package.json
6
+ ├── README.md
7
+ ├── CLAUDE.md # Guidance for Claude Code
8
+ ├── src/
9
+ │ ├── index.js # Main entry point and API
10
+ │ ├── processors/
11
+ │ │ ├── sequence.js # Flattens ProseMirror doc to sequence
12
+ │ │ ├── groups.js # Creates semantic content groups
13
+ │ │ └── byType.js # Organizes elements by type
14
+ │ ├── processors_old/ # Legacy implementations (deprecated)
15
+ │ │ ├── sequence.js
16
+ │ │ ├── groups.js
17
+ │ │ └── byType.js
18
+ │ └── utils/
19
+ │ └── role.js # Role detection utilities
20
+ ├── tests/
21
+ │ ├── parser.test.js # Integration tests
22
+ │ ├── processors/
23
+ │ │ ├── sequence.test.js
24
+ │ │ ├── groups.test.js
25
+ │ │ └── byType.test.js
26
+ │ ├── utils/
27
+ │ │ └── role.test.js
28
+ │ └── fixtures/
29
+ │ ├── basic.js # Simple test cases
30
+ │ ├── groups.js # Group formation test cases
31
+ │ └── complex.js # Complex scenarios
32
+ └── docs/
33
+ ├── guide.md # Content writing guide
34
+ ├── api.md # API reference documentation
35
+ └── file-structure.md # This file
36
+ ```
37
+
38
+ ## Key Directories
39
+
40
+ ### `src/processors/`
41
+ Contains the three-stage processing pipeline that transforms ProseMirror documents into semantic structures.
42
+
43
+ ### `src/processors_old/`
44
+ Legacy implementations kept for reference. Do not modify these files.
45
+
46
+ ### `tests/`
47
+ Comprehensive test suite organized by processor with shared fixtures.
48
+
49
+ ### `docs/`
50
+ End-user documentation including content writing guide and API reference.
package/docs/guide.md ADDED
@@ -0,0 +1,206 @@
1
+ # Content Writing Guide
2
+
3
+ This guide explains how to write content that works well with our semantic parser. The parser helps web components understand and render your content effectively by identifying its structure and meaning.
4
+
5
+ ## Core Concepts
6
+
7
+ The parser recognizes two key elements in your content:
8
+
9
+ - **Main Content**: The primary content that introduces your section
10
+ - **Groups**: Additional content blocks that follow a consistent structure
11
+
12
+ ## Main Content
13
+
14
+ Main content provides the primary context for your section. Here's how to write it:
15
+
16
+ ```markdown
17
+ ### SOLUTIONS
18
+
19
+ # Build Better Websites
20
+
21
+ ## For Everyone
22
+
23
+ Transform how you create web content with our powerful platform.
24
+ ```
25
+
26
+ Your main content must have exactly one main title, which can be either:
27
+
28
+ - A single H1 heading, or
29
+ - A single H2 heading (if no H1 exists)
30
+
31
+ You can also add:
32
+
33
+ - A pretitle (H3 before main title)
34
+ - A subtitle (next level heading after main title)
35
+
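
The title rules above can be sketched as a function over the headings that open a section. This is an interpretation for illustration only: `buildHeader` is a hypothetical helper, and the parser may apply additional conditions.

```js
// Hypothetical helper: assigns pretitle/title/subtitle from the leading headings.
// Each heading is { level, content } as in the parser's heading elements.
function buildHeader(headings) {
  const header = { pretitle: null, title: null, subtitle: null };
  let rest = headings;
  // An H3 directly followed by a more prominent heading (H1 or H2) acts as a pretitle.
  if (rest.length > 1 && rest[0].level === 3 && rest[1].level < 3) {
    header.pretitle = rest[0].content;
    rest = rest.slice(1);
  }
  if (rest.length > 0) header.title = rest[0].content;
  // A less prominent heading right after the title is treated as the subtitle.
  if (rest.length > 1 && rest[1].level > rest[0].level) {
    header.subtitle = rest[1].content;
  }
  return header;
}

const mainHeadings = [
  { type: "heading", level: 3, content: "SOLUTIONS" },
  { type: "heading", level: 1, content: "Build Better Websites" },
  { type: "heading", level: 2, content: "For Everyone" }
];

const mainHeader = buildHeader(mainHeadings);
console.log(mainHeader);
// { pretitle: 'SOLUTIONS', title: 'Build Better Websites', subtitle: 'For Everyone' }
```
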
36
+ For line breaks in titles, use HTML break tags:
37
+
38
+ ```markdown
39
+ # Build Better<br>Websites Today
40
+ ```
41
+
42
+ ## Content Groups
43
+
44
+ After your main content, you can add multiple content groups. Groups can start with any heading level, and each group can have its own structure.
45
+
46
+ H3s can serve two different roles depending on context:
47
+
48
+ 1. As a pretitle when followed by a higher-level heading:
49
+
50
+ ```markdown
51
+ ### SPEED MATTERS
52
+
53
+ ## Performance Features
54
+
55
+ Modern websites need to be fast. Our platform ensures quick load times
56
+ across all devices.
57
+ ```
58
+
59
+ 2. As a regular group title when followed by content or lower-level headings:
60
+
61
+ ```markdown
62
+ ### Getting Started
63
+
64
+ Start building your website in minutes.
65
+
66
+ ### Installation Guide
67
+
68
+ #### Prerequisites
69
+
70
+ Make sure you have Node.js installed...
71
+ ```
72
+
73
+ Each group can have:
74
+
75
+ - A title (any heading level that starts the group)
76
+ - A pretitle (H3 followed by higher-level heading)
77
+ - A subtitle (lower-level heading after title)
78
+ - Content (text, lists, media)
79
+
80
+ ## Creating Groups
81
+
82
+ There are two ways to create groups: using headings or using dividers.
83
+
84
+ ### Using Headings
85
+
86
+ Headings naturally create groups when they appear after content:
87
+
88
+ ```markdown
89
+ # Main Features
90
+
91
+ Our platform offers powerful capabilities.
92
+
93
+ ## Fast Performance
94
+
95
+ Lightning quick response times...
96
+
97
+ ## Easy Integration
98
+
99
+ Connect with your existing tools...
100
+ ```
101
+
102
+ Multiple H1s or multiple H2s (with no H1) will create separate groups:
103
+
104
+ ```markdown
105
+ # First Group
106
+
107
+ Content for first group...
108
+
109
+ # Second Group
110
+
111
+ Content for second group...
112
+ ```
113
+
114
+ ### Using Dividers
115
+
116
+ Alternatively, you can use dividers (---) to explicitly separate groups:
117
+
118
+ ```markdown
119
+ # Welcome Section
120
+
121
+ Our main welcome message.
122
+
123
+ ---
124
+
125
+ Get started with our platform
126
+ with these simple steps.
127
+
128
+ ---
129
+
130
+ Contact us to learn more
131
+ about enterprise solutions.
132
+ ```
133
+
134
+ Important: Once you use a divider, you must use dividers for all group separations in that section. Don't mix heading-based and divider-based group creation.
135
+
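
For readers consuming the parser output, divider-based grouping behaves roughly like splitting the flat element sequence at each divider. The sketch below shows that idea on hand-written data; `splitByDividers` is a hypothetical helper, and dropping empty chunks is an assumption of this example.

```js
// Hypothetical helper: splits a flat sequence of elements into chunks at each divider.
function splitByDividers(sequence) {
  const chunks = [[]];
  for (const element of sequence) {
    if (element.type === "divider") {
      chunks.push([]); // a divider closes the current chunk and opens a new one
    } else {
      chunks[chunks.length - 1].push(element);
    }
  }
  // Assumption: leading, trailing, or doubled dividers do not create empty groups.
  return chunks.filter((chunk) => chunk.length > 0);
}

const dividerSequence = [
  { type: "heading", level: 1, content: "Welcome Section" },
  { type: "paragraph", content: "Our main welcome message." },
  { type: "divider" },
  { type: "paragraph", content: "Get started with our platform with these simple steps." },
  { type: "divider" },
  { type: "paragraph", content: "Contact us to learn more about enterprise solutions." }
];

const dividerChunks = splitByDividers(dividerSequence);
console.log(dividerChunks.length);             // 3
console.log(dividerChunks.map((c) => c.length)); // [ 2, 1, 1 ]
```
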
136
+ ## Rich Content
137
+
138
+ ### Lists
139
+
140
+ Lists maintain their hierarchy, making them perfect for structured data:
141
+
142
+ ```markdown
143
+ ## Features
144
+
145
+ - Enterprise
146
+ - Role-based access
147
+ - Audit logs
148
+ - Team
149
+ - Collaboration
150
+ - API access
151
+ ```
152
+
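
Because the hierarchy is preserved, a component can walk the nested items recursively. The sketch below uses the list item shape from the API reference (`content` as an array of elements, `items` as nested list items) on hand-written data; `flattenListItems` is a hypothetical helper.

```js
// Hypothetical helper: flattens nested list items into { depth, text } rows.
function flattenListItems(items, depth = 0, rows = []) {
  for (const item of items) {
    // Join the text of the item's own elements (typically one paragraph).
    const text = (item.content || []).map((element) => element.content).join(" ");
    rows.push({ depth, text });
    flattenListItems(item.items || [], depth + 1, rows);
  }
  return rows;
}

// Small factory to keep the sample data readable.
const listItem = (text, items = []) => ({
  content: [{ type: "paragraph", content: text }],
  items
});

const featureList = {
  type: "list",
  style: "bullet",
  items: [
    listItem("Enterprise", [listItem("Role-based access"), listItem("Audit logs")]),
    listItem("Team", [listItem("Collaboration"), listItem("API access")])
  ]
};

const listRows = flattenListItems(featureList.items);
for (const row of listRows) {
  console.log(`${"  ".repeat(row.depth)}- ${row.text}`);
}
```
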
153
+ ### Media
154
+
155
+ Images and videos can have explicit roles:
156
+
157
+ ```markdown
158
+ ![Hero](hero.jpg){role="background"}
159
+ ![Icon](icon.svg){role="icon"}
160
+ ![](photo.jpg) # Default role is "content"
161
+ ```
162
+
163
+ Common roles include:
164
+
165
+ - background
166
+ - content
167
+ - gallery
168
+ - icon
169
+
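
On the consuming side, roles make it straightforward to bucket images. The following is a minimal sketch over hand-written image elements; `groupImagesByRole` is a hypothetical helper, and falling back to `"content"` when no role is set follows the default mentioned above.

```js
// Hypothetical helper: buckets image elements by role, defaulting to "content".
function groupImagesByRole(elements) {
  const byRole = {};
  for (const element of elements) {
    if (element.type !== "image") continue;
    const role = element.role || "content";
    if (!byRole[role]) byRole[role] = [];
    byRole[role].push(element);
  }
  return byRole;
}

const mediaElements = [
  { type: "image", src: "hero.jpg", alt: "Hero", role: "background" },
  { type: "image", src: "icon.svg", alt: "Icon", role: "icon" },
  { type: "image", src: "photo.jpg", alt: "" },
  { type: "paragraph", content: "Not an image" }
];

const imagesByRole = groupImagesByRole(mediaElements);
console.log(Object.keys(imagesByRole));   // [ 'background', 'icon', 'content' ]
console.log(imagesByRole.content[0].src); // photo.jpg
```
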
170
+ ### Links
171
+
172
+ Links can have roles to indicate their purpose:
173
+
174
+ ```markdown
175
+ [Get Started](./start){role="button-primary"}
176
+ [Learn More](./docs){role="button"}
177
+ [Privacy](./legal){role="footer-link"}
178
+ ```
179
+
180
+ Common link roles include:
181
+
182
+ - button-primary
183
+ - button
184
+ - button-outline
185
+ - nav-link
186
+ - footer-link
187
+
188
+ ## Important Notes
189
+
190
+ 1. Main content is only recognized when there's exactly one main title (H1 or H2).
191
+
192
+ 2. Avoid these patterns as they'll result in no main content:
193
+
194
+ ```markdown
195
+ # First Title
196
+
197
+ Content...
198
+
199
+ # Second Title
200
+
201
+ Content...
202
+ ```
203
+
204
+ 3. All content is optional - components may choose what to render based on their needs and configuration.
205
+
206
+ 4. Be consistent with your group creation method - use either headings or dividers, not both.