@uniweb/semantic-parser 1.0.0

# Text Component Reference

A reference implementation of a smart typography component for rendering content from the semantic parser. This component is designed to handle the common patterns of rendering headings, paragraphs, and rich text content.

> **📦 Ready-to-use implementation:** [`reference/Text.js`](../reference/Text.js)
>
> **Installation guide:** [`reference/README.md`](../reference/README.md)

This is a **complete, production-ready implementation** that you can copy directly into your React project. See the [Installation](#installation) section below.

## Installation

**1. Copy the component to your project:**

```bash
cp reference/Text.js src/components/Text.js
```

**2. No additional dependencies needed.** The component only requires React.

**3. Sanitize at the engine level.** See [Sanitization Tools](#sanitization-tools) below.

**4. Use it in your components:**

```jsx
import { H1, P } from './components/Text';
import { parseContent, mappers } from '@uniweb/semantic-parser';

// `doc` is the ProseMirror/TipTap document to render
function Hero({ doc }) {
  const parsed = parseContent(doc);
  const hero = mappers.extractors.hero(parsed);

  return (
    <>
      <H1 text={hero.title} />
      <P text={hero.description} />
    </>
  );
}
```

See [`reference/README.md`](../reference/README.md) for TypeScript setup and customization options.

## Overview

The Text component provides a unified interface for rendering text content, whether it is plain text, rich HTML, a single string, or an array of paragraphs. It handles:

- Rendering paragraph arrays as separate elements (spacing is left to your CSS)
- Supporting rich HTML formatting (bold, italic, color marks)
- Semantic heading structures
- Empty content filtering

## Architecture Decision: Where to Sanitize

**Recommended: sanitize at the engine level, not in the component.**

The semantic parser works with TipTap/ProseMirror editors, whose output is constrained by the editor schema. The parser extracts and transforms this content, and your **engine** (the application layer that prepares data for components) should handle sanitization.

### Why Engine-Level Sanitization?

1. **Performance** - Sanitize once during data preparation, not on every render
2. **Context-aware** - The engine knows whether content comes from a trusted TipTap editor or an external source
3. **Cacheable** - Sanitized content can be memoized
4. **Clear responsibility** - The engine owns the data pipeline

### Data Flow

```
TipTap Editor (schema-controlled)
        ↓
Parser (extraction + transformation)
        ↓
Engine (PRIMARY SANITIZATION HERE)
        ↓
Components (trust the data, just render)
```

The parser provides sanitization utilities (see [Sanitization Tools](#sanitization-tools)) but doesn't enforce their use. Your engine decides when and how to sanitize based on your security requirements.
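
The engine step in this data flow can be sketched in plain JavaScript. This is a minimal, hypothetical helper (`prepareForRender` is not part of `@uniweb/semantic-parser`): it assumes extracted data is a tree of plain objects, arrays, and strings, and that the engine supplies its own `sanitize(string) => string` function, for example a wrapper around DOMPurify or the parser's `sanitizeHtml`. It sanitizes every string once, drops paragraphs that end up empty, and caches the result per input object so repeated renders reuse it.

```javascript
// Hypothetical engine helper (not part of @uniweb/semantic-parser):
// sanitize extracted data once, drop empty paragraphs, and reuse the result.

// One cache per sanitizer: sanitize fn -> WeakMap(input object -> prepared copy)
const cachesBySanitizer = new WeakMap();

function prepareValue(value, sanitize) {
  if (typeof value === 'string') return sanitize(value);
  if (Array.isArray(value)) {
    return value
      .map((item) => prepareValue(item, sanitize))
      // Filter after sanitizing, so strings emptied by the sanitizer are dropped too
      .filter((item) => !(typeof item === 'string' && item.trim() === ''));
  }
  if (value !== null && typeof value === 'object') {
    const copy = {};
    for (const [key, child] of Object.entries(value)) {
      copy[key] = prepareValue(child, sanitize);
    }
    return copy;
  }
  return value; // numbers, booleans, null and undefined pass through unchanged
}

function prepareForRender(data, sanitize) {
  // WeakMap keys must be objects, so plain strings are prepared without caching
  if (data === null || typeof data !== 'object') return prepareValue(data, sanitize);

  let cache = cachesBySanitizer.get(sanitize);
  if (!cache) {
    cache = new WeakMap();
    cachesBySanitizer.set(sanitize, cache);
  }
  if (!cache.has(data)) cache.set(data, prepareValue(data, sanitize));
  return cache.get(data);
}

// Usage with a toy sanitizer that escapes all markup. It is for illustration only;
// a real engine would use an allow-list sanitizer instead.
const escapeAll = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

const hero = { title: 'Hello <b>world</b>', description: ['First', '  ', 'Second'] };
const prepared = prepareForRender(hero, escapeAll);
// prepared.title       -> 'Hello &lt;b&gt;world&lt;/b&gt;'
// prepared.description -> ['First', 'Second']
// prepareForRender(hero, escapeAll) === prepared (served from the cache)
```

The cache is keyed on object identity, so this sketch assumes extracted data is treated as immutable; mutating an input object after preparing it would keep returning the stale copy.
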

## Implementation

### Basic Text Component

```jsx
import React from 'react';

/**
 * Text - A smart typography component for rendering content from the semantic parser
 *
 * @param {Object} props
 * @param {string|string[]} props.text - Content to render (string or array of paragraphs)
 * @param {string} [props.as='p'] - HTML tag for wrapper/primary element
 * @param {string} [props.className] - CSS class for styling
 * @param {string} [props.lineAs] - Tag for array items (default: 'div' for headings, 'p' for others)
 */
const Text = ({ text, as = 'p', className, lineAs }) => {
  const isArray = Array.isArray(text);
  const Tag = as;
  const isHeading = as === 'h1' || as === 'h2' || as === 'h3' || as === 'h4' || as === 'h5' || as === 'h6';

  // Single string: render nothing for missing, non-string, or whitespace-only content
  if (!isArray) {
    if (typeof text !== 'string' || text.trim() === '') return null;
    return (
      <Tag
        className={className}
        dangerouslySetInnerHTML={{ __html: text }}
      />
    );
  }

  // Array of strings - filter empty content
  const filteredText = text.filter(
    (item) => typeof item === 'string' && item.trim() !== ''
  );
  if (filteredText.length === 0) return null;

  const LineTag = lineAs || (isHeading ? 'div' : 'p');

  // Headings: wrap all lines in one heading tag
  if (isHeading) {
    return (
      <Tag className={className}>
        {filteredText.map((line, i) => (
          <LineTag
            key={i}
            dangerouslySetInnerHTML={{ __html: line }}
          />
        ))}
      </Tag>
    );
  }

  // Non-headings: render each line as a separate element
  return (
    <>
      {filteredText.map((line, i) => (
        <LineTag
          key={i}
          className={className}
          dangerouslySetInnerHTML={{ __html: line }}
        />
      ))}
    </>
  );
};

export default Text;
```

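The branching in the component can be restated without React, which makes the rules easy to check in isolation. The sketch below is a hypothetical helper (`describeText` is not part of the reference implementation) that returns a plain description of what the component would render, or `null`; it mirrors the same rules but does not model `className`.

```javascript
// Hypothetical, React-free mirror of the Text component's branching rules
// (not part of the reference implementation). Returns a plain description
// of what the component would render, or null. className is not modeled.
const HEADING_TAGS = new Set(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']);

function describeText({ text, as = 'p', lineAs } = {}) {
  // Single string: one element, or nothing for missing/whitespace-only content
  if (!Array.isArray(text)) {
    if (typeof text !== 'string' || text.trim() === '') return null;
    return { tag: as, html: text };
  }

  // Arrays: keep only non-empty strings
  const lines = text.filter((item) => typeof item === 'string' && item.trim() !== '');
  if (lines.length === 0) return null;

  const isHeading = HEADING_TAGS.has(as);
  const lineTag = lineAs || (isHeading ? 'div' : 'p');
  const children = lines.map((html) => ({ tag: lineTag, html }));

  // Headings wrap every line in one heading element; other tags render siblings
  return isHeading ? { tag: as, children } : { fragment: children };
}

describeText({ text: ['Welcome to', 'Our Platform'], as: 'h1' });
// -> { tag: 'h1', children: [{ tag: 'div', html: 'Welcome to' }, { tag: 'div', html: 'Our Platform' }] }

describeText({ text: ['Valid content', '', ' ', 'More content'] });
// -> { fragment: [{ tag: 'p', html: 'Valid content' }, { tag: 'p', html: 'More content' }] }

describeText({ text: [] }); // -> null
```

Written out this way, one property of the component becomes visible: for a non-heading array, the `as` tag itself is never rendered. Each line uses `lineAs` (default `p`), so an array passed to a `div` wrapper produces sibling `<p>` elements rather than an enclosing `<div>`.
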
### Semantic Wrapper Components

For a better developer experience, create semantic shortcuts in the same file as `Text`:

```jsx
// Heading components
export const H1 = (props) => <Text {...props} as="h1" />;
export const H2 = (props) => <Text {...props} as="h2" />;
export const H3 = (props) => <Text {...props} as="h3" />;
export const H4 = (props) => <Text {...props} as="h4" />;
export const H5 = (props) => <Text {...props} as="h5" />;
export const H6 = (props) => <Text {...props} as="h6" />;

// Paragraph component
export const P = (props) => <Text {...props} as="p" />;

// Div wrapper for flexible content
export const Div = (props) => <Text {...props} as="div" />;
```

## Usage with Semantic Parser

### Basic Examples

```jsx
import { parseContent } from '@uniweb/semantic-parser';
import { extractors } from '@uniweb/semantic-parser/mappers';
import { H1, H2, P } from './components/Text';

// Parse content (`doc` is your ProseMirror/TipTap document)
const parsed = parseContent(doc);

// Extract hero data
const hero = extractors.hero(parsed);

// Render with Text components
<>
  <H1 text={hero.title} />
  {hero.subtitle && <H2 text={hero.subtitle} />}
  <P text={hero.description} />
</>
```

### Handling Arrays vs Strings

The parser's extractors return paragraph arrays by default:

```jsx
import { joinParagraphs } from '@uniweb/semantic-parser/mappers/helpers';

// hero.description is an array: ["Para 1", "Para 2"]
<P text={hero.description} />
// Renders: <p>Para 1</p><p>Para 2</p>

// If you need a single string, use the joinParagraphs helper
<P text={joinParagraphs(hero.description)} />
// Renders: <p>Para 1 Para 2</p>
```

### Multi-line Headings

```jsx
// heading.title might be an array for multi-line headings
<H1 text={heading.title} />

// Example with array: ["Welcome to", "Our Platform"]
// Renders: <h1><div>Welcome to</div><div>Our Platform</div></h1>
```

### Color Marks Support

The parser supports color marks for headings using `<mark>` or `<span>` tags:

```jsx
// Content with a color mark
const title = "Welcome to <mark class='brand'>Our Platform</mark>";

<H1 text={title} />
// Renders the <mark> element as-is, provided your sanitizer allows it
```

**Sanitization Configuration for Color Marks:**

```javascript
// In your engine, when sanitizing
import { sanitizeHtml } from '@uniweb/semantic-parser/mappers/types';

const safeTitleContent = sanitizeHtml(titleContent, {
  allowedTags: ['strong', 'em', 'mark', 'span'],
  allowedAttr: ['class', 'data-variant']
});
```

### Empty Content Handling

The component automatically filters empty content:

```jsx
<P text={["Valid content", "", " ", "More content"]} />
// Renders: <p>Valid content</p><p>More content</p>

<P text={[]} />
// Renders: null (nothing)
```

## Integration Patterns

### With Extractors

```jsx
import { parseContent, mappers } from '@uniweb/semantic-parser';
import { H3, P } from './components/Text';

const { extractors } = mappers;

const parsed = parseContent(doc);
const card = extractors.card(parsed);

function Card({ data }) {
  return (
    <div className="card">
      <H3 text={data.title} />
      <P text={data.description} />
      {data.image && <img src={data.image} alt={data.imageAlt} />}
    </div>
  );
}

<Card data={card} />
```

### With Custom Schemas

```jsx
import { getByPath, extractBySchema } from '@uniweb/semantic-parser/mappers/accessor';
import { H1, H2, P } from './components/Text';

// Each field maps to a dot-separated path in the parsed content
const schema = {
  title: { path: 'groups.main.header.title' },
  subtitle: { path: 'groups.main.header.subtitle' },
  content: { path: 'groups.main.body.paragraphs' }
};

const data = extractBySchema(parsed, schema);

<>
  <H1 text={data.title} />
  <H2 text={data.subtitle} />
  <P text={data.content} />
</>
```

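To make the idea of path-based schemas concrete, here is a small stand-in written from scratch. It is an assumption-laden sketch, not the parser's `getByPath` or `extractBySchema`, whose exact behavior is not documented here: it assumes each schema entry is `{ path: 'dot.separated.path' }` and that a missing segment anywhere along the path yields `undefined`. The names `resolvePath` and `extractFields` are hypothetical.

```javascript
// Hypothetical stand-ins for schema-based extraction; NOT the parser's
// getByPath/extractBySchema. Assumes each schema entry is
// { path: 'dot.separated.path' } and that a missing segment yields undefined.
function resolvePath(source, path) {
  return path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), source);
}

function extractFields(source, schema) {
  const result = {};
  for (const [field, { path }] of Object.entries(schema)) {
    result[field] = resolvePath(source, path);
  }
  return result;
}

// Toy input shaped to match the schema paths above (real parser output has more fields)
const parsedExample = {
  groups: {
    main: {
      header: { title: 'Welcome', subtitle: 'Start here' },
      body: { paragraphs: ['First paragraph', 'Second paragraph'] },
    },
  },
};

const fields = extractFields(parsedExample, {
  title: { path: 'groups.main.header.title' },
  content: { path: 'groups.main.body.paragraphs' },
  missing: { path: 'groups.sidebar.header.title' },
});
// fields.title   -> 'Welcome'
// fields.content -> ['First paragraph', 'Second paragraph']
// fields.missing -> undefined
```

Returning `undefined` for a missing path pairs well with the Text component, which renders nothing when `text` is absent.
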
### Rendering Lists

```jsx
const features = extractors.features(parsed);

<div className="features">
  {features.map((feature, i) => (
    // Index keys are acceptable for a static list; use a stable id if items can be reordered
    <div key={i} className="feature">
      <H3 text={feature.title} />
      <P text={feature.description} />
    </div>
  ))}
</div>
```

## Styling

The component is unstyled by default. Add your own CSS:

```css
/* Paragraph spacing */
p + p {
  margin-top: 1.5rem;
}

/* Multi-line headings */
h1 > div + div {
  margin-top: 0.25rem;
}

/* Color marks */
mark.brand {
  background: linear-gradient(120deg, var(--brand-color) 0%, var(--brand-color) 100%);
  background-repeat: no-repeat;
  background-size: 100% 40%;
  background-position: 0 85%;
  color: inherit;
}
```

## Sanitization Tools

The parser exports sanitization utilities for use in your engine:

```javascript
import { sanitizeHtml, stripMarkup } from '@uniweb/semantic-parser/mappers/types';

// Sanitize HTML content
const safe = sanitizeHtml(content, {
  allowedTags: ['strong', 'em', 'mark', 'span', 'a'],
  allowedAttr: ['href', 'class', 'data-variant']
});

// Strip all HTML (for plain text)
const plain = stripMarkup(content);
```

### When to Sanitize

**Always sanitize** when:
- Content comes from external sources
- Content is user-generated
- You're unsure of the source

**Optional sanitization** when:
- Content is from your controlled TipTap editor
- TipTap schema is locked down
- You trust the content pipeline

**Never needed** when:
- Content is hard-coded in your app
- Content is from your CMS with known schemas

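Because the engine knows where each piece of content came from, these rules can be encoded as a small lookup. The sketch below is hypothetical: the source labels (`external`, `tiptap`, `cms`, and so on) are illustrative names to map your own content sources onto, and `sanitizationPolicy`/`shouldSanitize` are not parser APIs.

```javascript
// Hypothetical engine policy encoding the "When to Sanitize" lists.
// The source labels are illustrative; map your own content sources onto them.
const SANITIZE_POLICY = {
  external: 'always',
  'user-generated': 'always',
  tiptap: 'optional', // your own editor with a locked-down schema
  'hard-coded': 'never',
  cms: 'never', // your CMS with known schemas
};

function sanitizationPolicy(source) {
  // Unknown or missing source counts as "unsure", which means always sanitize
  return Object.prototype.hasOwnProperty.call(SANITIZE_POLICY, source)
    ? SANITIZE_POLICY[source]
    : 'always';
}

function shouldSanitize(source, { sanitizeOptional = true } = {}) {
  const policy = sanitizationPolicy(source);
  if (policy === 'always') return true;
  if (policy === 'optional') return sanitizeOptional;
  return false; // 'never'
}

shouldSanitize('external');                            // -> true
shouldSanitize('tiptap');                              // -> true (conservative default)
shouldSanitize('tiptap', { sanitizeOptional: false }); // -> false
shouldSanitize('hard-coded');                          // -> false
shouldSanitize(undefined);                             // -> true
```

Treating unknown sources as `'always'` follows the "unsure of the source" rule; defaulting `sanitizeOptional` to `true` is a conservative choice made by this sketch, not something the rules above prescribe.
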
## Advanced Customizations

### Custom Line Spacing

Add a `spacing` prop for different paragraph spacing:

```jsx
const Text = ({ text, as = 'p', className, lineAs, spacing = 'normal' }) => {
  const spacingClass = spacing !== 'normal' ? `spacing-${spacing}` : '';
  const combinedClass = [className, spacingClass].filter(Boolean).join(' ');

  // ... rest of implementation, using combinedClass wherever className was used
};

// Usage
<P text={paragraphs} spacing="comfortable" />
```

For paragraph arrays the class is applied to each rendered `<p>` (there is no wrapper element), so the CSS targets adjacent siblings that share the class:

```css
.spacing-compact + .spacing-compact { margin-top: 0.75rem; }
.spacing-comfortable + .spacing-comfortable { margin-top: 1.5rem; }
.spacing-relaxed + .spacing-relaxed { margin-top: 2rem; }
```

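### Plain Text Mode

Add an opt-out for HTML rendering:

```jsx
const Text = ({ text, as = 'p', className, lineAs, plainText = false }) => {
  // ... existing setup (isArray, Tag, isHeading)

  if (plainText) {
    // React escapes string children, so markup such as "<tags>" is displayed literally
    if (!isArray) return <Tag className={className}>{text}</Tag>;

    // Arrays need one element per line; passing the array directly would run the strings together
    const LineTag = lineAs || (isHeading ? 'div' : 'p');
    const lines = text.map((line, i) => (
      <LineTag key={i} className={isHeading ? undefined : className}>{line}</LineTag>
    ));
    return isHeading ? <Tag className={className}>{lines}</Tag> : <>{lines}</>;
  }

  // ... rest of implementation
};

// Usage
<Text text="Show <tags> literally" plainText={true} />
```
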
## Best Practices

### 1. Sanitize at Engine Level

```jsx
// ✅ Good - sanitize during data preparation
import { extractors } from '@uniweb/semantic-parser/mappers';
import { sanitizeHtml } from '@uniweb/semantic-parser/mappers/types';

function prepareHeroData(parsed) {
  const hero = extractors.hero(parsed);
  return {
    ...hero,
    title: sanitizeHtml(hero.title),
    description: hero.description.map((p) => sanitizeHtml(p))
  };
}

const heroData = prepareHeroData(parsed);
<H1 text={heroData.title} />
```

```jsx
// ❌ Avoid - sanitizing in the component on every render
function Hero({ data }) {
  const safeTitle = sanitizeHtml(data.title); // Runs on every render!
  return <H1 text={safeTitle} />;
}
```

### 2. Handle Empty Content

```jsx
// ✅ Good - the component handles it
<P text={description} />

// ❌ Avoid - manual checks everywhere
{description && description.length > 0 && <P text={description} />}
```

### 3. Use Semantic Wrappers

```jsx
// ✅ Good - clear intent
<H1 text={title} />
<P text={content} />

// ❌ Avoid - verbose
<Text text={title} as="h1" />
<Text text={content} as="p" />
```

### 4. Preserve Arrays When Possible

```jsx
// ✅ Good - preserves paragraph structure
<P text={hero.description} />

// ⚠️ Consider whether you really need this
<P text={joinParagraphs(hero.description)} />
```

## TypeScript Support

```typescript
import React from 'react';

interface TextProps {
  text: string | string[];
  as?: 'h1' | 'h2' | 'h3' | 'h4' | 'h5' | 'h6' | 'p' | 'div' | 'span';
  className?: string;
  lineAs?: string;
  spacing?: 'compact' | 'normal' | 'comfortable' | 'relaxed'; // see Custom Line Spacing
  plainText?: boolean; // see Plain Text Mode
}

const Text: React.FC<TextProps> = ({ text, as = 'p', className, lineAs }) => {
  // ... implementation as above
};
```

## Performance Considerations

1. **Sanitize once** - At the engine level, not in the component
2. **Memoize data** - Cache parsed/extracted data in the engine, or with `useMemo` in the component that calls the parser
3. **Filter early** - Remove empty content during extraction if possible
4. **Use proper keys** - In lists that can be reordered or filtered, use stable unique keys rather than array indices
5. **Batch updates** - Prepare all data before rendering

**Note:** The Text component itself is simple and fast. There is no need for `React.memo` unless profiling shows it is a bottleneck.

## Browser Support

- Works in all modern browsers (Chrome, Firefox, Safari, Edge)
- Uses `dangerouslySetInnerHTML` (supported in all React versions)
- Server-side rendering compatible

## Security Notes

1. **Trust your pipeline** - If the engine sanitizes, the component can trust the data
2. **DOMPurify recommended** - Use it in the engine for sanitization
3. **TipTap content** - Generally safe due to schema control
4. **External content** - Always sanitize before rendering
5. **Color marks** - Ensure `class` and `data-variant` attributes are allowed

## Summary

- **Component is simple** - Just renders, doesn't sanitize
- **Engine sanitizes** - Once during data preparation
- **Parser provides tools** - Utilities available but not enforced
- **Flexible** - Handles strings, arrays, plain and rich text
- **Semantic** - Smart defaults for headings vs paragraphs
- **Performant** - No per-render sanitization; empty content is filtered automatically

Copy this implementation and adapt it to your needs. The key is keeping the component simple and moving complexity to your engine layer, where you have full context and control.

package/package.json ADDED

```json
{
  "name": "@uniweb/semantic-parser",
  "version": "1.0.0",
  "description": "Semantic parser for ProseMirror/TipTap content structures",
  "type": "module",
  "main": "./src/index.js",
  "exports": {
    ".": "./src/index.js",
    "./mappers": "./src/mappers/index.js",
    "./mappers/*": "./src/mappers/*.js"
  },
  "scripts": {
    "test": "NODE_OPTIONS=--experimental-vm-modules jest",
    "test-report": "NODE_OPTIONS=--experimental-vm-modules jest --json > test-results.json 2>&1",
    "test:groups": "NODE_OPTIONS=--experimental-vm-modules jest tests/processors/groups.test.js"
  },
  "keywords": [
    "prosemirror",
    "tiptap",
    "parser",
    "semantic",
    "content"
  ],
  "author": "Proximify Inc.",
  "license": "GPL-3.0-or-later",
  "devDependencies": {
    "jest": "^29.7.0"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/uniweb/semantic-parser.git"
  },
  "bugs": {
    "url": "https://github.com/uniweb/semantic-parser/issues"
  },
  "homepage": "https://github.com/uniweb/semantic-parser#readme",
  "directories": {
    "doc": "docs",
    "test": "tests"
  }
}
```