@uniweb/semantic-parser 1.1.4 → 1.1.6

@@ -1,470 +0,0 @@
1
- # Semantic Parser Entity Consolidation
2
-
3
- This document defines the standard semantic entities output by the parser and how editor-specific node types map to them.
4
-
5
- ## Design Principle
6
-
7
- **Editor nodes are authoring conveniences → Parser outputs standardized semantic entities**
8
-
9
- The semantic parser accepts ProseMirror/TipTap documents from two sources:
10
- 1. **File-based markdown** via `@uniweb/content-reader`
11
- 2. **Visual editor** via TipTap with custom node types
12
-
13
- Both sources must produce the same standardized output. Editor-specific node types (such as the `card-group`, `FormBlock`, and `button` nodes) are conveniences that map to standard entities.
14
-
15
- ---
16
-
17
- ## Standard Entity Set
18
-
19
- After consolidation, the parser outputs this flat structure:
20
-
21
- ```js
22
- {
23
- // Header fields (from headings)
24
- title: '',
25
- pretitle: '',
26
- subtitle: '',
27
- subtitle2: '',
28
-
29
- // Body fields
30
- paragraphs: [], // Text blocks with inline HTML formatting
31
- links: [], // All link-like entities (buttons, documents, nav links)
32
- imgs: [], // All images (with role distinguishing purpose)
33
- videos: [], // Video embeds
34
- icons: [], // Standalone icons
35
- lists: [], // Bullet/ordered lists (recursive structure)
36
- quotes: [], // Blockquotes (recursive structure)
37
- data: {}, // Structured data (tagged code blocks, forms, cards)
38
- headings: [], // Overflow headings after title/subtitle/subtitle2
39
-
40
- items: [], // Semantic groups (same structure recursively)
41
- }
42
- ```
43
-
44
- ### Removed Fields
45
-
46
- | Field | Status | Reason |
47
- |-------|--------|--------|
48
- | `alignment` | **Deprecated** | Editor-only concept, not expressible in markdown |
49
- | `buttons` | **Merged into `links`** | Buttons are styled links |
50
- | `cards` | **Merged into `data`** | Structured data with schema tag |
51
- | `documents` | **Merged into `links`** | Documents are downloadable links |
52
- | `forms` | **Merged into `data`** | Structured data with `form` tag |
53
-
54
- ---
55
-
56
- ## Entity Specifications
57
-
58
- ### Links
59
-
60
- All link-like content merges into the `links` array. The `role` attribute distinguishes behavior.
61
-
62
- ```js
63
- {
64
- href: "/contact",
65
- label: "Contact Us",
66
-
67
- // Role distinguishes link type
68
- role: "link", // Default: standard hyperlink
69
- | "button" // Call-to-action button
70
- | "button-primary" // Primary CTA
71
- | "button-outline" // Outline style button
72
- | "nav-link" // Navigation link
73
- | "footer-link" // Footer navigation
74
- | "document" // Downloadable file
75
-
76
- // Button-specific attributes (when role is "button" or starts with "button-")
77
- variant: "primary" | "secondary" | "outline" | "ghost",
78
- size: "sm" | "md" | "lg",
79
- icon: "icon-name",
80
-
81
- // Link behavior
82
- target: "_blank" | "_self",
83
- rel: "noopener noreferrer",
84
- download: true | "filename.pdf",
85
- }
86
- ```
87
-
88
- **Markdown syntax:**
89
- ```markdown
90
- [Standard link](/page)
91
- [Button link](button:/action){variant=primary}
92
- [Download](report.pdf){download}
93
- ```
94
-
95
- ### Images
96
-
97
- All image content uses the `imgs` array. The `role` attribute distinguishes purpose.
98
-
99
- ```js
100
- {
101
- url: "/images/hero.jpg",
102
- alt: "Hero image",
103
- caption: "Optional caption",
104
-
105
- // Role distinguishes image purpose
106
- role: "image", // Default: content image
107
- | "icon" // Small icon/logo
108
- | "background" // Section background
109
- | "gallery" // Gallery item
110
- | "banner" // Hero/banner image
111
-
112
- // Layout attributes
113
- direction: "left" | "right" | "center",
114
- size: "basic" | "lg" | "full",
115
-
116
- // Styling
117
- filter: "grayscale" | "blur",
118
- theme: "light" | "dark",
119
-
120
- // Link wrapper (clickable image)
121
- href: "/link-target",
122
- }
123
- ```
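Because every image lands in the same `imgs` array, a component would typically bucket the entries by `role` before rendering. The following is a minimal sketch, assuming that an entry without a `role` falls back to the documented default `"image"`. The helper name `groupImagesByRole` is hypothetical and is not part of the parser's API.

```js
// Hypothetical helper (not part of the parser): bucket images by role.
// Assumption: an entry with no `role` uses the documented default "image".
function groupImagesByRole(imgs) {
  const byRole = {};
  for (const img of imgs || []) {
    const role = img.role || 'image';
    if (!byRole[role]) byRole[role] = [];
    byRole[role].push(img);
  }
  return byRole;
}

// Usage with illustrative data
const grouped = groupImagesByRole([
  { url: '/images/hero.jpg', role: 'background' },
  { url: '/images/logo.svg', role: 'icon' },
  { url: '/images/team.jpg' }, // no role: treated as "image"
]);
console.log(Object.keys(grouped));
```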
124
-
125
- ### Data (Structured Content)
126
-
127
- The `data` object holds all structured content from tagged code blocks and editor widgets.
128
-
129
- ```js
130
- {
131
- // From tagged code blocks
132
- "form": { fields: [...], submitLabel: "Send" },
133
- "nav-links": [{ label: "Home", href: "/" }],
134
- "config": { theme: "dark" },
135
-
136
- // From editor card widgets (mapped by type)
137
- "person": [
138
- { name: "John", title: "CEO", ... },
139
- { name: "Jane", title: "CTO", ... },
140
- ],
141
- "event": [
142
- { title: "Launch Party", date: "2024-01-15", location: "NYC", ... },
143
- ],
144
- }
145
- ```
146
-
147
- **Markdown syntax for structured data:**
148
- ```markdown
149
- ```yaml:form
150
- fields:
151
-   - name: email
152
-     type: email
153
-     required: true
154
- submitLabel: Subscribe
155
- ```
156
-
157
- ```yaml:nav-links
158
- - label: Home
159
-   href: /
160
- ```
161
- ```
162
-
163
- JSON is also supported (`json:tag-name`) if you prefer.
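As a rough illustration of the tagged-fence convention, the sketch below splits an info string such as `yaml:form` into a format and a tag. The helper name `parseFenceInfo` is hypothetical and is not taken from the parser's source; it assumes only the `format:tag` shape shown above and the two formats this section mentions.

```js
// Hypothetical helper: split a code-fence info string into format and tag.
// Assumes the "format:tag" convention described above (e.g. "yaml:form").
function parseFenceInfo(info) {
  const [format, ...rest] = String(info).trim().split(':');
  const tag = rest.join(':'); // keep any further colons inside the tag
  if ((format !== 'yaml' && format !== 'json') || tag === '') {
    return null; // untagged or unsupported block: not structured data
  }
  return { format, tag };
}

console.log(parseFenceInfo('yaml:form'));      // { format: 'yaml', tag: 'form' }
console.log(parseFenceInfo('json:nav-links')); // { format: 'json', tag: 'nav-links' }
console.log(parseFenceInfo('js'));             // null
```

Under this reading, the returned `tag` is the key under which the parsed block would appear in `data`.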
164
-
165
- ---
166
-
167
- ## Editor Node Mappings
168
-
169
- This section documents how TipTap/editor-specific nodes map to standard entities.
170
-
171
- ### `button` Node → `links[]`
172
-
173
- **Editor input:**
174
- ```js
175
- {
176
- type: "button",
177
- content: [{ type: "text", text: "Click me" }],
178
- attrs: {
179
- href: "/action",
180
- variant: "primary",
181
- size: "lg",
182
- icon: "arrow-right"
183
- }
184
- }
185
- ```
186
-
187
- **Standard output:**
188
- ```js
189
- links: [{
190
- href: "/action",
191
- label: "Click me",
192
- role: "button",
193
- variant: "primary",
194
- size: "lg",
195
- icon: "arrow-right"
196
- }]
197
- ```
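The mapping above can be sketched as a small function. This is an illustration under stated assumptions, not the parser's implementation: the name `buttonNodeToLink` is hypothetical, and the label is assumed to be the concatenated text of the node's child text nodes.

```js
// Sketch: map an editor `button` node to a standard link entity.
// `buttonNodeToLink` is an illustrative name, not the parser's actual function.
function buttonNodeToLink(node) {
  const { href, variant, size, icon } = node.attrs || {};
  // Assumption: the label is the joined text of all child text nodes
  const label = (node.content || [])
    .filter((child) => child.type === 'text')
    .map((child) => child.text)
    .join('');
  const link = { href, label, role: 'button' };
  // Copy optional button attributes only when the editor set them
  if (variant) link.variant = variant;
  if (size) link.size = size;
  if (icon) link.icon = icon;
  return link;
}

const buttonLink = buttonNodeToLink({
  type: 'button',
  content: [{ type: 'text', text: 'Click me' }],
  attrs: { href: '/action', variant: 'primary', size: 'lg', icon: 'arrow-right' },
});
console.log(buttonLink);
```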
198
-
199
- ### `FormBlock` Node → `data.form`
200
-
201
- **Editor input:**
202
- ```js
203
- {
204
- type: "FormBlock",
205
- attrs: {
206
- data: {
207
- fields: [{ name: "email", type: "email" }],
208
- submitLabel: "Subscribe"
209
- }
210
- }
211
- }
212
- ```
213
-
214
- **Standard output:**
215
- ```js
216
- data: {
217
- form: {
218
- fields: [{ name: "email", type: "email" }],
219
- submitLabel: "Subscribe"
220
- }
221
- }
222
- ```
223
-
224
- ### `card-group` Node → `data[cardType]`
225
-
226
- Cards are editor widgets for structured entities such as people, events, and addresses. Each card type becomes a key in `data`, holding an array of all cards of that type. This follows the same pattern as tagged code blocks.
227
-
228
- **Editor input:**
229
- ```js
230
- {
231
- type: "card-group",
232
- content: [
233
- {
234
- type: "card",
235
- attrs: {
236
- cardType: "person",
237
- title: "Jane Doe",
238
- subtitle: "CEO",
239
- coverImg: { src: "/jane.jpg" },
240
- address: '{"city": "NYC"}',
241
- icon: { svg: "..." }
242
- }
243
- },
244
- {
245
- type: "card",
246
- attrs: {
247
- cardType: "person",
248
- title: "John Smith",
249
- subtitle: "CTO",
250
- coverImg: { src: "/john.jpg" }
251
- }
252
- },
253
- {
254
- type: "card",
255
- attrs: {
256
- cardType: "event",
257
- title: "Launch Party",
258
- date: "2024-03-15",
259
- location: "San Francisco"
260
- }
261
- }
262
- ]
263
- }
264
- ```
265
-
266
- **Standard output:**
267
- ```js
268
- data: {
269
- person: [
270
- {
271
- title: "Jane Doe",
272
- subtitle: "CEO",
273
- coverImg: "/jane.jpg",
274
- address: { city: "NYC" },
275
- icon: { svg: "..." }
276
- },
277
- {
278
- title: "John Smith",
279
- subtitle: "CTO",
280
- coverImg: "/john.jpg"
281
- }
282
- ],
283
- event: [
284
- {
285
- title: "Launch Party",
286
- date: "2024-03-15",
287
- location: "San Francisco"
288
- }
289
- ]
290
- }
291
- ```
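One way to read the input/output pair above is as a grouping step plus two normalizations. The sketch below is inferred from that example only: `cardGroupToData` is a hypothetical name, and the rules for flattening `coverImg` and parsing `address` are assumptions rather than the parser's documented behavior.

```js
// Sketch: collect the cards of a `card-group` node under data[cardType].
// The normalization rules are inferred from the example, not from the source.
function cardGroupToData(node, data = {}) {
  for (const card of node.content || []) {
    if (card.type !== 'card') continue;
    const { cardType, ...fields } = card.attrs || {};
    if (!cardType) continue; // no schema tag, nothing to key on

    // Flatten { src } image objects to a plain URL string
    if (fields.coverImg && typeof fields.coverImg === 'object') {
      fields.coverImg = fields.coverImg.src;
    }
    // The example stores `address` as a JSON string; parse it defensively
    if (typeof fields.address === 'string') {
      try {
        fields.address = JSON.parse(fields.address);
      } catch (err) {
        // keep the raw string if it is not valid JSON
      }
    }
    if (!data[cardType]) data[cardType] = [];
    data[cardType].push(fields);
  }
  return data;
}

const cardData = cardGroupToData({
  type: 'card-group',
  content: [
    { type: 'card', attrs: { cardType: 'person', title: 'Jane Doe', address: '{"city": "NYC"}' } },
    { type: 'card', attrs: { cardType: 'event', title: 'Launch Party' } },
  ],
});
console.log(Object.keys(cardData)); // [ 'person', 'event' ]
```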
292
-
293
- **Accessing cards by type:**
294
- ```js
295
- // Get all person cards
296
- const people = content.data.person || [];
297
-
298
- // Get all event cards
299
- const events = content.data.event || [];
300
- ```
301
-
302
- **Card schemas:**
303
- | Schema | Common Fields |
304
- |--------|---------------|
305
- | `person` | title (name), subtitle (role), coverImg (photo), address |
306
- | `event` | title, date, location, description |
307
- | `address` | street, city, state, country, postal |
308
- | `document` | title, href, coverImg (preview), fileType |
309
-
310
- ### `document-group` Node → `links[]`
311
-
312
- Documents are downloadable files. They map to links with `role: "document"`.
313
-
314
- **Editor input:**
315
- ```js
316
- {
317
- type: "document-group",
318
- content: [
319
- {
320
- type: "document",
321
- attrs: {
322
- title: "Annual Report",
323
- src: "/reports/annual-2024.pdf",
324
- coverImg: { src: "/preview.jpg" }
325
- }
326
- }
327
- ]
328
- }
329
- ```
330
-
331
- **Standard output:**
332
- ```js
333
- links: [{
334
- href: "/reports/annual-2024.pdf",
335
- label: "Annual Report",
336
- role: "document",
337
- download: true,
338
- preview: "/preview.jpg"
339
- }]
340
- ```
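The same mapping, written out as a sketch. The function name `documentGroupToLinks` is hypothetical; the field renames (`src` to `href`, `title` to `label`, `coverImg.src` to `preview`) are read directly off the example above and may not cover every attribute the editor can emit.

```js
// Sketch: map a `document-group` node to link entities with role "document".
// Illustrative only; field renames follow the example input/output above.
function documentGroupToLinks(node) {
  return (node.content || [])
    .filter((child) => child.type === 'document')
    .map(({ attrs = {} }) => {
      const link = {
        href: attrs.src,
        label: attrs.title,
        role: 'document',
        download: true,
      };
      // The preview image is optional
      if (attrs.coverImg && attrs.coverImg.src) {
        link.preview = attrs.coverImg.src;
      }
      return link;
    });
}

const docLinks = documentGroupToLinks({
  type: 'document-group',
  content: [
    {
      type: 'document',
      attrs: {
        title: 'Annual Report',
        src: '/reports/annual-2024.pdf',
        coverImg: { src: '/preview.jpg' },
      },
    },
  ],
});
console.log(docLinks);
```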
341
-
342
- ---
343
-
344
- ## Deprecation: `alignment`
345
-
346
- The `alignment` field was extracted from the heading's `textAlign` attribute in the editor. This is an editor-specific styling concern that:
347
- - Cannot be expressed in file-based markdown
348
- - Is a presentation concern, not semantic content
349
- - Should be handled by component styling, not content structure
350
-
351
- **Migration:** Components relying on `content.alignment` should either:
352
- 1. Use CSS/Tailwind for text alignment, or
353
- 2. Accept alignment as a component `param` in frontmatter
354
-
355
- ---
356
-
357
- ## Migration Path
358
-
359
- ### Phase 1: Add Mappings (Non-Breaking)
360
-
361
- 1. Continue outputting legacy fields (`buttons`, `cards`, `documents`, `forms`, `alignment`)
362
- 2. Also populate new locations (`links` for buttons/documents, `data` for cards/forms)
363
- 3. Components can migrate gradually
364
-
365
- ### Phase 2: Deprecation Warnings
366
-
367
- 1. Log warnings when legacy fields are accessed
368
- 2. Document migration for each field
369
- 3. Provide codemod or migration script
370
-
371
- ### Phase 3: Remove Legacy Fields
372
-
373
- 1. Remove `buttons`, `cards`, `documents`, `forms`, `alignment` from output
374
- 2. Update all components to use new structure
375
- 3. Update documentation
376
-
377
- ---
378
-
379
- ## Backwards Compatibility
380
-
381
- During migration, the parser can provide a compatibility layer:
382
-
383
- ```js
384
- // Parser option
385
- const content = parse(doc, {
386
- legacyFields: true // Include deprecated fields
387
- });
388
-
389
- // Or via getter that warns
390
- Object.defineProperty(content, 'buttons', {
391
- get() {
392
- console.warn('content.buttons is deprecated, use content.links with role="button"');
393
- return content.links.filter(l => l.role?.startsWith('button'));
394
- }
395
- });
396
- ```
397
-
398
- ---
399
-
400
- ## Component Migration Examples
401
-
402
- ### Before: Using `buttons`
403
-
404
- ```jsx
405
- function CTA({ content }) {
406
- const { links, buttons } = content;
407
- return (
408
- <div>
409
- {links.map(link => <a href={link.href}>{link.label}</a>)}
410
- {buttons.map(btn => <button>{btn.content}</button>)}
411
- </div>
412
- );
413
- }
414
- ```
415
-
416
- ### After: Unified `links`
417
-
418
- ```jsx
419
- function CTA({ content }) {
420
- const { links } = content;
421
- const buttons = links.filter(l => l.role?.startsWith('button'));
422
- const plainLinks = links.filter(l => !l.role?.startsWith('button'));
423
-
424
- return (
425
- <div>
426
- {plainLinks.map((link, i) => <a key={i} href={link.href}>{link.label}</a>)}
427
- {buttons.map((btn, i) => (
428
- <a key={i} href={btn.href} className={`btn btn-${btn.variant}`}>
429
- {btn.label}
430
- </a>
431
- ))}
432
- </div>
433
- );
434
- }
435
- ```
436
-
437
- ### Or: Role-based rendering
438
-
439
- ```jsx
440
- function CTA({ content }) {
441
- return (
442
- <div>
443
- {content.links.map((link, i) => {
444
- if (link.role?.startsWith('button')) {
445
- return <Button key={i} href={link.href} variant={link.variant}>{link.label}</Button>;
446
- }
447
- if (link.role === 'document') {
448
- return <DownloadLink key={i} href={link.href}>{link.label}</DownloadLink>;
449
- }
450
- return <a key={i} href={link.href}>{link.label}</a>;
451
- })}
452
- </div>
453
- );
454
- }
455
- ```
456
-
457
- ---
458
-
459
- ## Implementation Checklist
460
-
461
- - [ ] Update `processGroupContent` in `groups.js` to map button → links
462
- - [ ] Update `processGroupContent` to map card-group → data[cardType]
463
- - [ ] Update `processGroupContent` to map document-group → links
464
- - [ ] Update `processGroupContent` to map FormBlock → data.form
465
- - [ ] Remove `alignment` from header extraction
466
- - [ ] Add `legacyFields` option for backwards compatibility
467
- - [ ] Update `flattenGroup` to use new structure
468
- - [ ] Update tests for new entity structure
469
- - [ ] Update AGENTS.md and README.md
470
- - [ ] Create migration guide for components
@@ -1,50 +0,0 @@
1
- # File Structure
2
-
3
- ```
4
- semantic-parser/
5
- ├── package.json
6
- ├── README.md
7
- ├── CLAUDE.md # Guidance for Claude Code
8
- ├── src/
9
- │   ├── index.js # Main entry point and API
10
- │   ├── processors/
11
- │   │   ├── sequence.js # Flattens ProseMirror doc to sequence
12
- │   │   ├── groups.js # Creates semantic content groups
13
- │   │   └── byType.js # Organizes elements by type
14
- │   ├── processors_old/ # Legacy implementations (deprecated)
15
- │   │   ├── sequence.js
16
- │   │   ├── groups.js
17
- │   │   └── byType.js
18
- │   └── utils/
19
- │       └── role.js # Role detection utilities
20
- ├── tests/
21
- │   ├── parser.test.js # Integration tests
22
- │   ├── processors/
23
- │   │   ├── sequence.test.js
24
- │   │   ├── groups.test.js
25
- │   │   └── byType.test.js
26
- │   ├── utils/
27
- │   │   └── role.test.js
28
- │   └── fixtures/
29
- │       ├── basic.js # Simple test cases
30
- │       ├── groups.js # Group formation test cases
31
- │       └── complex.js # Complex scenarios
32
- └── docs/
33
-     ├── guide.md # Content writing guide
34
-     ├── api.md # API reference documentation
35
-     └── file-structure.md # This file
36
- ```
37
-
38
- ## Key Directories
39
-
40
- ### `src/processors/`
41
- Contains the three-stage processing pipeline that transforms ProseMirror documents into semantic structures.
42
-
43
- ### `src/processors_old/`
44
- Legacy implementations kept for reference. Do not modify these files.
45
-
46
- ### `tests/`
47
- Comprehensive test suite organized by processor with shared fixtures.
48
-
49
- ### `docs/`
50
- End-user documentation including content writing guide and API reference.
package/docs/guide.md DELETED
@@ -1,206 +0,0 @@
1
- # Content Writing Guide
2
-
3
- This guide explains how to write content that works well with our semantic parser. The parser helps web components understand and render your content effectively by identifying its structure and meaning.
4
-
5
- ## Core Concepts
6
-
7
- The parser recognizes two key elements in your content:
8
-
9
- - **Main Content**: The primary content that introduces your section
10
- - **Groups**: Additional content blocks that follow a consistent structure
11
-
12
- ## Main Content
13
-
14
- Main content provides the primary context for your section. Here's how to write it:
15
-
16
- ```markdown
17
- ### SOLUTIONS
18
-
19
- # Build Better Websites
20
-
21
- ## For Everyone
22
-
23
- Transform how you create web content with our powerful platform.
24
- ```
25
-
26
- Your main content must have exactly one main title, which can be either:
27
-
28
- - A single H1 heading, or
29
- - A single H2 heading (if no H1 exists)
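
The title rule above can be expressed as a small check over heading levels. This is a sketch of the rule as the guide states it, not the parser's actual code; `findMainTitleLevel` is a hypothetical name, and the input is assumed to be the list of heading levels in the section.

```js
// Sketch of the main-title rule from this guide (not the parser's code).
// `levels` is the list of heading levels in a section, e.g. [3, 1, 2].
// Returns 1 or 2 for the main title's level, or null if there is no
// single main title (so no main content would be recognized).
function findMainTitleLevel(levels) {
  const count = (level) => levels.filter((l) => l === level).length;
  if (count(1) === 1) return 1;                   // exactly one H1
  if (count(1) === 0 && count(2) === 1) return 2; // no H1, exactly one H2
  return null;
}

console.log(findMainTitleLevel([3, 1, 2])); // 1
console.log(findMainTitleLevel([2, 3]));    // 2
console.log(findMainTitleLevel([1, 1]));    // null
```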
30
-
31
- You can also add:
32
-
33
- - A pretitle (H3 before main title)
34
- - A subtitle (next level heading after main title)
35
-
36
- For line breaks in titles, use HTML break tags:
37
-
38
- ```markdown
39
- # Build Better<br>Websites Today
40
- ```
41
-
42
- ## Content Groups
43
-
44
- After your main content, you can add multiple content groups. Groups can start with any heading level, and each group can have its own structure.
45
-
46
- H3s can serve two different roles depending on context:
47
-
48
- 1. As a pretitle when followed by a higher-level heading:
49
-
50
- ```markdown
51
- ### SPEED MATTERS
52
-
53
- ## Performance Features
54
-
55
- Modern websites need to be fast. Our platform ensures quick load times
56
- across all devices.
57
- ```
58
-
59
- 2. As a regular group title when followed by content or lower-level headings:
60
-
61
- ```markdown
62
- ### Getting Started
63
-
64
- Start building your website in minutes.
65
-
66
- ### Installation Guide
67
-
68
- #### Prerequisites
69
-
70
- Make sure you have Node.js installed...
71
- ```
72
-
73
- Each group can have:
74
-
75
- - A title (any heading level that starts the group)
76
- - A pretitle (H3 followed by higher-level heading)
77
- - A subtitle (lower-level heading after title)
78
- - Content (text, lists, media)
79
-
80
- ## Creating Groups
81
-
82
- There are two ways to create groups: using headings or using dividers.
83
-
84
- ### Using Headings
85
-
86
- Headings naturally create groups when they appear after content:
87
-
88
- ```markdown
89
- # Main Features
90
-
91
- Our platform offers powerful capabilities.
92
-
93
- ## Fast Performance
94
-
95
- Lightning quick response times...
96
-
97
- ## Easy Integration
98
-
99
- Connect with your existing tools...
100
- ```
101
-
102
- Multiple H1s or multiple H2s (with no H1) will create separate groups:
103
-
104
- ```markdown
105
- # First Group
106
-
107
- Content for first group...
108
-
109
- # Second Group
110
-
111
- Content for second group...
112
- ```
113
-
114
- ### Using Dividers
115
-
116
- Alternatively, you can use dividers (---) to explicitly separate groups:
117
-
118
- ```markdown
119
- # Welcome Section
120
-
121
- Our main welcome message.
122
-
123
- ---
124
-
125
- Get started with our platform
126
- with these simple steps.
127
-
128
- ---
129
-
130
- Contact us to learn more
131
- about enterprise solutions.
132
- ```
133
-
134
- Important: Once you use a divider, you must use dividers for all group separations in that section. Don't mix heading-based and divider-based group creation.
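
Conceptually, divider-based grouping is a split over the flattened element sequence. The sketch below illustrates that idea only: the element shape and the type name `divider` are assumptions made for the example, and `splitOnDividers` is a hypothetical helper, not a function exported by the parser.

```js
// Sketch: split a flat element sequence into groups at each divider.
// Assumption: dividers appear as elements with type "divider".
function splitOnDividers(elements) {
  const groups = [[]];
  for (const el of elements) {
    if (el.type === 'divider') {
      groups.push([]); // a divider closes the current group
    } else {
      groups[groups.length - 1].push(el);
    }
  }
  // Drop empty groups produced by leading, trailing, or doubled dividers
  return groups.filter((group) => group.length > 0);
}

const dividerGroups = splitOnDividers([
  { type: 'heading', level: 1, text: 'Welcome Section' },
  { type: 'paragraph', text: 'Our main welcome message.' },
  { type: 'divider' },
  { type: 'paragraph', text: 'Get started with our platform with these simple steps.' },
  { type: 'divider' },
  { type: 'paragraph', text: 'Contact us to learn more about enterprise solutions.' },
]);
console.log(dividerGroups.length); // 3
```

In the example from this guide, the first chunk would correspond to the main content and the remaining two to groups.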
135
-
136
- ## Rich Content
137
-
138
- ### Lists
139
-
140
- Lists maintain their hierarchy, making them perfect for structured data:
141
-
142
- ```markdown
143
- ## Features
144
-
145
- - Enterprise
146
-   - Role-based access
147
-   - Audit logs
148
- - Team
149
-   - Collaboration
150
-   - API access
151
- ```
152
-
153
- ### Media
154
-
155
- Images and videos can have explicit roles:
156
-
157
- ```markdown
158
- ![Hero](hero.jpg){role="background"}
159
- ![Icon](icon.svg){role="icon"}
160
- ![](photo.jpg) # Default role is "content"
161
- ```
162
-
163
- Common roles include:
164
-
165
- - background
166
- - content
167
- - gallery
168
- - icon
169
-
170
- ### Links
171
-
172
- Links can have roles to indicate their purpose:
173
-
174
- ```markdown
175
- [Get Started](./start){role="button-primary"}
176
- [Learn More](./docs){role="button"}
177
- [Privacy](./legal){role="footer-link"}
178
- ```
179
-
180
- Common link roles include:
181
-
182
- - button-primary
183
- - button
184
- - button-outline
185
- - nav-link
186
- - footer-link
187
-
188
- ## Important Notes
189
-
190
- 1. Main content is only recognized when there's exactly one main title (H1 or H2).
191
-
192
- 2. Avoid these patterns as they'll result in no main content:
193
-
194
- ```markdown
195
- # First Title
196
-
197
- Content...
198
-
199
- # Second Title
200
-
201
- Content...
202
- ```
203
-
204
- 3. All content is optional - components may choose what to render based on their needs and configuration.
205
-
206
- 4. Be consistent with your group creation method - use either headings or dividers, not both.