@uniweb/semantic-parser 1.0.0
- package/.claude/settings.local.json +9 -0
- package/.eslintrc.json +28 -0
- package/LICENSE +674 -0
- package/README.md +395 -0
- package/docs/api.md +352 -0
- package/docs/file-structure.md +50 -0
- package/docs/guide.md +206 -0
- package/docs/mapping-patterns.md +928 -0
- package/docs/text-component-reference.md +515 -0
- package/package.json +41 -0
- package/reference/README.md +195 -0
- package/reference/Text.js +188 -0
- package/src/index.js +35 -0
- package/src/mappers/accessor.js +312 -0
- package/src/mappers/extractors.js +397 -0
- package/src/mappers/helpers.js +234 -0
- package/src/mappers/index.js +28 -0
- package/src/mappers/types.js +495 -0
- package/src/processors/byType.js +129 -0
- package/src/processors/groups.js +330 -0
- package/src/processors/groups_backup.js +379 -0
- package/src/processors/groups_doc.md +179 -0
- package/src/processors/sequence.js +573 -0
- package/src/processors/sequence_backup.js +402 -0
- package/src/utils/role.js +53 -0
@@ -0,0 +1,928 @@

# Content Mapping Patterns

This guide shows how to use the mapping utilities to transform parsed content into component-specific formats.

## Overview

The parser provides mapping utilities designed for two contexts:

- **Visual Editor Mode** (default): Gracefully handles content with silent cleanup, perfect for non-technical users
- **Build Mode**: Validates content and warns about issues, ideal for development workflows

### Mapping Tools

1. **Type System**: Automatic transformation based on field types (plaintext, richtext, excerpt, etc.)
2. **Helpers**: General-purpose utility functions
3. **Accessor**: Path-based extraction with schema support
4. **Extractors**: Pre-built patterns for common components

## Type System (Recommended)

The type system automatically transforms content based on component requirements, making it perfect for visual editors where users don't know about HTML/markdown.

### Visual Editor Mode (Default)

Gracefully handles content issues with silent, automatic cleanup:

```js
const schema = {
  title: {
    path: "groups.main.header.title",
    type: "plaintext", // Auto-strips HTML markup
    maxLength: 60 // Auto-truncates with smart boundaries
  },
  description: {
    path: "groups.main.body.paragraphs",
    type: "excerpt", // Auto-creates excerpt from paragraphs
    maxLength: 150
  },
  image: {
    path: "groups.main.body.imgs[0].url",
    type: "image", // Normalizes image data
    defaultValue: "/placeholder.jpg",
    treatEmptyAsDefault: true
  }
};

// Visual editor mode (default) - silent cleanup
const data = mappers.extractBySchema(parsed, schema);
// {
//   title: "Welcome to Our Platform", // <strong> tags stripped
//   description: "Get started with...", // Truncated, markup removed
//   image: "/hero.jpg" or "/placeholder.jpg"
// }
```

### Build Mode

Validates content and provides warnings for developers:

```js
const data = mappers.extractBySchema(parsed, schema, { mode: 'build' });

// Console output:
// ⚠️ [title] Field contains HTML markup but expects plain text (auto-fixed)
// ⚠️ [title] Text is 65 characters (max: 60) (auto-fixed)
```

### Available Field Types

#### `plaintext`

Strips all HTML markup, returning clean text. Perfect for titles, labels, and anywhere HTML shouldn't appear.

```js
{
  title: {
    path: "groups.main.header.title",
    type: "plaintext",
    maxLength: 60, // Auto-truncate
    boundary: "word", // or "sentence", "character"
    ellipsis: "...",
    transform: (text) => text.toUpperCase() // Additional transform
  }
}

// Input: "Welcome to <strong>Our Platform</strong>"
// Output: "Welcome to Our Platform"
```

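As a rough illustration of what `plaintext` cleanup involves, the sketch below strips tags and truncates at a word boundary. `toPlainText` is a hypothetical stand-alone helper written for this guide, not the library's implementation, and a regex tag stripper like this is not a security sanitizer.

```js
// Hypothetical helper (assumption): mimics the documented plaintext behavior.
function toPlainText(html, { maxLength = Infinity, ellipsis = "..." } = {}) {
  // Drop anything that looks like a tag, then collapse whitespace.
  const text = String(html).replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
  if (text.length <= maxLength) return text;

  // Leave room for the ellipsis, then back up to the last full word.
  const budget = Math.max(0, maxLength - ellipsis.length);
  const slice = text.slice(0, budget);
  const lastSpace = slice.lastIndexOf(" ");
  const cut = lastSpace > 0 ? slice.slice(0, lastSpace) : slice;
  return cut + ellipsis;
}

console.log(toPlainText("Welcome to <strong>Our Platform</strong>"));
console.log(toPlainText("Welcome to Our Platform", { maxLength: 15 }));
```

The word-boundary cut trades a few characters of length for never ending mid-word, which is the behavior the `boundary: "word"` option describes.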
#### `richtext`

Preserves safe HTML while removing dangerous tags (script, iframe, etc.).

```js
{
  description: {
    path: "groups.main.body.paragraphs[0]",
    type: "richtext",
    allowedTags: ["strong", "em", "a", "br"], // Customize allowed tags
    stripTags: ["script", "style"] // Additional tags to remove
  }
}

// Input: "Text with <strong>bold</strong> and <script>bad</script>"
// Output: "Text with <strong>bold</strong> and "
```

#### `excerpt`

Auto-generates excerpt from content, stripping markup and truncating intelligently.

```js
{
  excerpt: {
    path: "groups.main.body.paragraphs",
    type: "excerpt",
    maxLength: 150,
    boundary: "word", // or "sentence"
    preferFirstSentence: true // Use first sentence if short enough
  }
}

// Input: ["Long paragraph with <em>formatting</em>...", "More text..."]
// Output: "Long paragraph with formatting..."
```

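The excerpt options can be pictured with a small stand-alone sketch. `excerptFrom` is a hypothetical function, and its sentence detection (first `.`, `!`, or `?` followed by whitespace or end of text) is an assumption; the library's actual rules may differ.

```js
// Hypothetical helper (assumption): builds an excerpt from paragraph strings.
function excerptFrom(paragraphs, { maxLength = 150, preferFirstSentence = true } = {}) {
  // Strip tags from each paragraph and merge into one normalized string.
  const text = paragraphs
    .map((p) => String(p).replace(/<[^>]*>/g, ""))
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();

  // Use the first sentence when it fits within the limit.
  if (preferFirstSentence) {
    const match = text.match(/^.*?[.!?](?=\s|$)/);
    if (match && match[0].length <= maxLength) return match[0];
  }

  if (text.length <= maxLength) return text;

  // Otherwise cut at the last word boundary and append an ellipsis.
  const slice = text.slice(0, Math.max(0, maxLength - 3));
  const lastSpace = slice.lastIndexOf(" ");
  return (lastSpace > 0 ? slice.slice(0, lastSpace) : slice) + "...";
}

console.log(excerptFrom(["Short <em>intro</em>. More detail here.", "Second."]));
```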
#### `number`

Parses and optionally formats numbers.

```js
{
  price: {
    path: "groups.main.header.title",
    type: "number",
    format: {
      decimals: 2,
      thousands: ",",
      decimal: "."
    }
  }
}

// Input: "1234.567"
// Output: "1,234.57"
```

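To make the `format` options concrete, here is a minimal sketch of parsing and formatting with configurable separators. `formatNumber` is a hypothetical function whose option names simply mirror the schema above; it is not the library's code.

```js
// Hypothetical helper (assumption): parse a string and format it with separators.
function formatNumber(input, { decimals = 0, thousands = ",", decimal = "." } = {}) {
  const value = Number.parseFloat(input);
  if (Number.isNaN(value)) return null; // Unparseable input yields null in this sketch

  // Round to the requested precision, then split integer and fraction parts.
  const [intPart, fracPart] = Math.abs(value).toFixed(decimals).split(".");

  // Insert the thousands separator every three digits from the right.
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, thousands);
  const sign = value < 0 ? "-" : "";
  return sign + grouped + (fracPart ? decimal + fracPart : "");
}

console.log(formatNumber("1234.567", { decimals: 2 }));
```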
#### `image`

Normalizes image data structure.

```js
{
  image: {
    path: "groups.main.body.imgs[0]",
    type: "image",
    defaultValue: "/placeholder.jpg",
    defaultAlt: "Image"
  }
}

// Input: "/hero.jpg" or { url: "/hero.jpg", alt: "Hero" }
// Output: { url: "/hero.jpg", alt: "Hero", caption: null }
```

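Normalization here means accepting either a bare URL string or an object and always returning the same shape. The sketch below shows one way that could look; `normalizeImage` is hypothetical, and how the real type handles empty input is an assumption.

```js
// Hypothetical helper (assumption): coerce string or object input to one shape.
function normalizeImage(input, { defaultValue = null, defaultAlt = "" } = {}) {
  // Fall back to the default when the input is missing or empty.
  const source = input == null || input === "" ? defaultValue : input;
  if (source == null) return null;

  // A bare string is treated as the image URL.
  if (typeof source === "string") {
    return { url: source, alt: defaultAlt, caption: null };
  }
  return {
    url: source.url,
    alt: source.alt ?? defaultAlt,
    caption: source.caption ?? null
  };
}

console.log(normalizeImage({ url: "/hero.jpg", alt: "Hero" }));
```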
#### `link`

Normalizes link data structure.

```js
{
  cta: {
    path: "groups.main.body.links[0]",
    type: "link"
  }
}

// Input: "http://example.com" or { href: "/page", label: "Click" }
// Output: { href: "/page", label: "Click", target: "_self" }
```

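A link normalizer can follow the same pattern as the other types. In this sketch `normalizeLink` is a hypothetical function, and using the URL itself as the label for bare-string input is an assumption rather than documented behavior.

```js
// Hypothetical helper (assumption): coerce string or object input to a link object.
function normalizeLink(input) {
  if (input == null || input === "") return null;

  // A bare string is treated as the href; the label falls back to the URL.
  if (typeof input === "string") {
    return { href: input, label: input, target: "_self" };
  }
  return {
    href: input.href,
    label: input.label ?? input.href,
    target: input.target ?? "_self"
  };
}

console.log(normalizeLink({ href: "/page", label: "Click" }));
```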
### Validation for UI Hints

Get validation results without extracting data - perfect for showing hints in visual editors:

```js
const hints = mappers.validateSchema(parsed, schema, { mode: 'visual-editor' });

// {
//   title: [{
//     type: 'max_length',
//     severity: 'info',
//     message: 'Text is 65 characters (max: 60)',
//     autoFix: true
//   }],
//   image: [{
//     type: 'required',
//     severity: 'error',
//     message: 'Required image is missing',
//     autoFix: false
//   }]
// }

// Use in UI:
// Title field: ℹ️ "Title is a bit long (will be trimmed to fit)"
// Image field: ⚠️ "Image is required"
```

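One way an editor might turn a hints object of the shape shown above into display strings is sketched here. `toFieldMessages` and `hasBlockingIssues` are hypothetical names, and the message wording is illustrative only.

```js
// Hypothetical helpers (assumption): consume the hints shape shown above.
function toFieldMessages(hints) {
  const messages = {};
  for (const [field, issues] of Object.entries(hints)) {
    messages[field] = issues.map((issue) => {
      // Reassure the user when the engine will repair the issue itself.
      const suffix = issue.autoFix ? " (fixed automatically)" : "";
      return `[${issue.severity}] ${issue.message}${suffix}`;
    });
  }
  return messages;
}

// An issue blocks publishing when it is an error that cannot be auto-fixed.
function hasBlockingIssues(hints) {
  return Object.values(hints).some((issues) =>
    issues.some((issue) => issue.severity === "error" && !issue.autoFix)
  );
}

const sampleHints = {
  title: [{ type: "max_length", severity: "info", message: "Text is 65 characters (max: 60)", autoFix: true }],
  image: [{ type: "required", severity: "error", message: "Required image is missing", autoFix: false }]
};
console.log(toFieldMessages(sampleHints), hasBlockingIssues(sampleHints));
```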
### Real-World Example

```js
// Component declares its content requirements
const componentSchema = {
  brand: {
    path: "groups.main.header.pretitle",
    type: "plaintext",
    maxLength: 20,
    transform: (text) => text.toUpperCase()
  },
  title: {
    path: "groups.main.header.title",
    type: "plaintext",
    maxLength: 60,
    required: true
  },
  subtitle: {
    path: "groups.main.header.subtitle",
    type: "plaintext",
    maxLength: 100
  },
  description: {
    path: "groups.main.body.paragraphs",
    type: "excerpt",
    maxLength: 200
  },
  image: {
    path: "groups.main.body.imgs[0].url",
    type: "image",
    defaultValue: "/placeholder.jpg"
  },
  cta: {
    path: "groups.main.body.links[0]",
    type: "link"
  }
};

// Engine extracts and transforms for component
const componentData = mappers.extractBySchema(parsed, componentSchema);

// Component receives clean, validated data:
// {
//   brand: "NEW PRODUCT",
//   title: "Welcome to Our Platform",
//   subtitle: "Get started today",
//   description: "Transform how you create content...",
//   image: "/hero.jpg",
//   cta: { href: "/signup", label: "Get Started", target: "_self" }
// }
```

---

## Quick Start

```js
import { parseContent, mappers } from "@uniwebcms/semantic-parser";

const parsed = parseContent(doc);

// Use a pre-built extractor
const heroData = mappers.extractors.hero(parsed);

// Or use schema-based extraction
const customData = mappers.extractBySchema(parsed, {
  title: "groups.main.header.title",
  image: { path: "groups.main.body.imgs[0].url", defaultValue: "/placeholder.jpg" }
});
```

## Helper Utilities

### Array Helpers

```js
const { helpers } = mappers;

// Get first item with default
const image = helpers.first(images, "/default.jpg");

// Get last item
const lastParagraph = helpers.last(paragraphs);

// Transform array
const titles = helpers.transformArray(items, item => item.header.title);

// Filter and transform
const h2s = helpers.filterArray(headings, h => h.level === 2, h => h.content);

// Join text
const description = helpers.joinText(paragraphs, " ");

// Compact (remove null/undefined/empty)
const cleanArray = helpers.compact([null, "text", "", undefined, "more"]);
// => ["text", "more"]
```

### Object Helpers

```js
// Get nested value safely
const title = helpers.get(parsed, "groups.main.header.title", "Untitled");

// Pick specific properties
const metadata = helpers.pick(parsed.groups.main, ["header", "banner"]);

// Omit properties
const withoutMetadata = helpers.omit(item, ["metadata"]);
```

### Validation

```js
// Check if value exists (not null/undefined/empty string)
if (helpers.exists(title)) {
  // title has a value
}

// Validate required fields
const validation = helpers.validateRequired(data, ["title", "image"]);
if (!validation.valid) {
  console.log("Missing fields:", validation.missing);
}
```

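The `{ valid, missing }` result shape used above can be reproduced with a few lines. `checkRequired` below is a hypothetical stand-in written to show the idea, assuming "missing" means `null`, `undefined`, or an empty string.

```js
// Hypothetical helper (assumption): report which required fields have no value.
function checkRequired(data, fields) {
  const missing = fields.filter((field) => {
    const value = data[field];
    return value === null || value === undefined || value === "";
  });
  return { valid: missing.length === 0, missing };
}

const result = checkRequired({ title: "Welcome", image: "" }, ["title", "image"]);
console.log(result);
```

Returning the list of missing fields, rather than a bare boolean, lets the caller decide whether to log, warn, or block rendering.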
### Safe Extraction

```js
// Wrap extraction in try-catch
const safeExtractor = helpers.safe((parsed) => {
  return parsed.groups.main.header.title.toUpperCase();
}, "DEFAULT");

const title = safeExtractor(parsed); // Won't throw if path is invalid
```

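The wrapping pattern itself is small enough to sketch. `makeSafe` is a hypothetical equivalent written for illustration; treating an `undefined` result as "use the default" is an assumption about the desired behavior.

```js
// Hypothetical helper (assumption): wrap a function so failures yield a default.
function makeSafe(fn, defaultValue) {
  return (...args) => {
    try {
      const result = fn(...args);
      return result === undefined ? defaultValue : result;
    } catch {
      // Any thrown error (for example, reading a property of undefined) falls back.
      return defaultValue;
    }
  };
}

const upperTitle = makeSafe((doc) => doc.groups.main.header.title.toUpperCase(), "DEFAULT");
console.log(upperTitle({}));
console.log(upperTitle({ groups: { main: { header: { title: "hello" } } } }));
```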
## Path-Based Accessor

### Basic Usage

```js
const { accessor } = mappers;

// Simple path
const title = accessor.getByPath(parsed, "groups.main.header.title");

// Array index notation
const firstImage = accessor.getByPath(parsed, "groups.main.body.imgs[0].url");

// With default value
const image = accessor.getByPath(parsed, "groups.main.body.imgs[0].url", {
  defaultValue: "/placeholder.jpg"
});

// With transformation
const description = accessor.getByPath(parsed, "groups.main.body.paragraphs", {
  transform: (paragraphs) => paragraphs.join(" ")
});

// Required field (throws if missing)
// Named differently from `title` above to avoid redeclaring a const
const requiredTitle = accessor.getByPath(parsed, "groups.main.header.title", {
  required: true
});
```

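For readers curious how a path string such as `groups.main.body.imgs[0].url` can be resolved, here is a minimal sketch. `readPath` is a hypothetical function that supports only dot and numeric-index notation; it is not the accessor's actual code.

```js
// Hypothetical helper (assumption): resolve "a.b[0].c" style paths.
function readPath(obj, path, { defaultValue, transform } = {}) {
  // Convert "imgs[0]" to "imgs.0", then split into keys.
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);

  let current = obj;
  for (const key of keys) {
    if (current == null) break; // Stop early instead of throwing on a missing branch
    current = current[key];
  }

  if (current === undefined || current === null) return defaultValue;
  return transform ? transform(current) : current;
}

const sample = {
  groups: { main: { body: { imgs: [{ url: "/hero.jpg" }], paragraphs: ["a", "b"] } } }
};
console.log(readPath(sample, "groups.main.body.imgs[0].url"));
console.log(readPath(sample, "groups.main.body.imgs[1].url", { defaultValue: "/placeholder.jpg" }));
```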
### Schema-Based Extraction

Extract multiple fields at once using a schema:

```js
const schema = {
  // Shorthand: just the path
  title: "groups.main.header.title",

  // Full config with options
  image: {
    path: "groups.main.body.imgs[0].url",
    defaultValue: "/placeholder.jpg"
  },

  description: {
    path: "groups.main.body.paragraphs",
    transform: (p) => p.join(" ")
  },

  cta: {
    path: "groups.main.body.links[0]",
    required: false
  }
};

const data = accessor.extractBySchema(parsed, schema);
// {
//   title: "...",
//   image: "..." or "/placeholder.jpg",
//   description: "...",
//   cta: {...} or null
// }
```

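The shorthand-versus-config distinction can be sketched as a loop over schema entries. `extractFields` and `lookup` are hypothetical names; returning `null` for a missing optional field follows the example output above, while the error message format is an assumption.

```js
// Hypothetical path reader (assumption): dot and [index] notation only.
const lookup = (obj, path) =>
  path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter(Boolean)
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);

// Hypothetical helper (assumption): apply each schema entry to the parsed content.
function extractFields(parsed, schema) {
  const result = {};
  for (const [field, config] of Object.entries(schema)) {
    // A string entry is shorthand for { path: "..." }.
    const { path, defaultValue = null, transform, required = false } =
      typeof config === "string" ? { path: config } : config;

    const value = lookup(parsed, path);
    if (value == null) {
      if (required) throw new Error(`Missing required field: ${field}`);
      result[field] = defaultValue;
    } else {
      result[field] = transform ? transform(value) : value;
    }
  }
  return result;
}

const content = {
  groups: { main: { header: { title: "Hello" }, body: { paragraphs: ["a", "b"] } } }
};
const extracted = extractFields(content, {
  title: "groups.main.header.title",
  image: { path: "groups.main.body.imgs[0].url", defaultValue: "/placeholder.jpg" },
  description: { path: "groups.main.body.paragraphs", transform: (p) => p.join(" ") },
  cta: { path: "groups.main.body.links[0]", required: false }
});
console.log(extracted);
```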
### Array Mapping

Extract data from an array of items:

```js
// Simple: extract single field from each item
const titles = accessor.mapArray(parsed, "groups.items", "header.title");
// ["Item 1", "Item 2", "Item 3"]

// Complex: extract multiple fields from each item
const cards = accessor.mapArray(parsed, "groups.items", {
  title: "header.title",
  text: { path: "body.paragraphs", transform: p => p.join(" ") },
  image: { path: "body.imgs[0].url", defaultValue: "/default.jpg" }
});
// [
//   { title: "...", text: "...", image: "..." },
//   { title: "...", text: "...", image: "..." }
// ]
```

### Path Helpers

```js
// Check if path exists
if (accessor.hasPath(parsed, "groups.main.banner.url")) {
  // Banner exists
}

// Get first existing path
const image = accessor.getFirstExisting(parsed, [
  "groups.main.banner.url",
  "groups.main.body.imgs[0].url",
  "groups.items[0].body.imgs[0].url"
], "/fallback.jpg");
```

## Pre-Built Extractors

### Hero Component

Large header with title, image, and CTA:

```js
const heroData = mappers.extractors.hero(parsed);
// {
//   title: "Welcome",
//   subtitle: "Get started today",
//   kicker: "NEW",
//   description: "Join thousands of users...",
//   image: "/hero.jpg",
//   imageAlt: "Hero image",
//   banner: "/banner.jpg",
//   cta: { href: "/signup", label: "Get Started" },
//   button: { content: "Learn More", attrs: {...} }
// }
```

### Card Component

```js
// Single card from main content
const card = mappers.extractors.card(parsed);

// Multiple cards from items
const cards = mappers.extractors.card(parsed, { useItems: true });

// Specific card by index
const firstCard = mappers.extractors.card(parsed, { useItems: true, itemIndex: 0 });
```

### Article Content

```js
const article = mappers.extractors.article(parsed);
// {
//   title: "Article Title",
//   subtitle: "Subtitle",
//   kicker: "FEATURED",
//   author: "John Doe",
//   date: "2024-01-01",
//   banner: "/banner.jpg",
//   content: ["paragraph 1", "paragraph 2"],
//   images: [...],
//   videos: [...],
//   links: [...]
// }
```

### Statistics

```js
const stats = mappers.extractors.stats(parsed);
// [
//   { value: "12", label: "Partner Labs", description: "..." },
//   { value: "$25M", label: "Grant Funding", description: "..." }
// ]
```

### Navigation Menu

```js
const nav = mappers.extractors.navigation(parsed);
// [
//   {
//     label: "Products",
//     href: "/products",
//     children: [
//       { label: "Product 1", href: "/products/1", icon: "..." }
//     ]
//   }
// ]
```

### Features List

```js
const features = mappers.extractors.features(parsed);
// [
//   {
//     title: "Fast Performance",
//     subtitle: "Lightning quick",
//     description: "Our platform is optimized...",
//     icon: "<svg>...</svg>",
//     image: "/feature.jpg",
//     link: { href: "/learn-more", label: "Learn More" }
//   }
// ]
```

### Testimonials

```js
// Single testimonial
const testimonial = mappers.extractors.testimonial(parsed);

// Multiple testimonials from items
const testimonials = mappers.extractors.testimonial(parsed, { useItems: true });
// [
//   {
//     quote: "This product changed our workflow completely!",
//     author: "Jane Smith",
//     role: "CEO",
//     company: "Acme Inc",
//     image: "/jane.jpg",
//     imageAlt: "Jane Smith"
//   }
// ]
```

### FAQ

```js
const faqs = mappers.extractors.faq(parsed);
// [
//   {
//     question: "How does it work?",
//     answer: "Our platform uses advanced algorithms...",
//     links: [...]
//   }
// ]
```

### Pricing Tiers

```js
const tiers = mappers.extractors.pricing(parsed);
// [
//   {
//     name: "Pro",
//     price: "$29/month",
//     description: "For growing teams",
//     features: ["Unlimited users", "API access", "Priority support"],
//     cta: { href: "/signup", label: "Start Free Trial" },
//     highlighted: true
//   }
// ]
```

### Team Members

```js
const team = mappers.extractors.team(parsed);
// [
//   {
//     name: "Dr. Sarah Chen",
//     role: "Lead Researcher",
//     department: "Neuroscience",
//     bio: "Dr. Chen specializes in...",
//     image: "/sarah.jpg",
//     imageAlt: "Dr. Sarah Chen",
//     links: [{ href: "https://twitter.com/...", label: "Twitter" }]
//   }
// ]
```

### Gallery

```js
// All images
const allImages = mappers.extractors.gallery(parsed);

// Only from main content
const mainImages = mappers.extractors.gallery(parsed, { source: "main" });

// Only from items
const itemImages = mappers.extractors.gallery(parsed, { source: "items" });
// [
//   { url: "/image1.jpg", alt: "Image 1", caption: "Caption 1" },
//   { url: "/image2.jpg", alt: "Image 2", caption: "Caption 2" }
// ]
```

## Combining Utilities

You can combine helpers, accessors, and extractors for complex transformations:

```js
const { helpers, accessor, extractors } = mappers;

// Start with a pre-built extractor
const baseData = extractors.hero(parsed);

// Enhance with custom fields
const enhancedData = {
  ...baseData,
  // Add custom field using accessor
  customField: accessor.getByPath(parsed, "groups.main.metadata.custom"),

  // Transform array using helper
  relatedPosts: helpers.transformArray(
    accessor.getByPath(parsed, "groups.items", { defaultValue: [] }),
    item => ({
      title: item.header.title,
      link: helpers.first(item.body.links)
    })
  ),

  // Safe extraction with fallback.
  // helpers.safe returns a wrapped function (see Safe Extraction), so call it
  // immediately to store the value rather than the wrapper.
  safeData: helpers.safe(() => {
    return parsed.groups.main.complexPath.deepValue.toUpperCase();
  }, "DEFAULT")()
};
```

## Engine Integration Example

In your component engine, you might use mappers like this:

```js
// Component provides a schema
const componentSchema = {
  content: {
    type: "hero", // Use pre-built extractor
    // OR
    mapping: { // Use custom mapping
      brand: "groups.main.header.pretitle",
      title: "groups.main.header.title",
      subtitle: "groups.main.header.subtitle",
      image: { path: "groups.main.body.imgs[0].url", defaultValue: "/default.jpg" },
      actions: {
        path: "groups.main.body.links",
        transform: links => links.map(l => ({ label: l.label, type: "primary" }))
      }
    }
  }
};

// Engine maps content before passing to component
function prepareComponentData(doc, schema) {
  const parsed = parseContent(doc);

  if (schema.content.type) {
    // Use named extractor
    return mappers.extractors[schema.content.type](parsed);
  } else if (schema.content.mapping) {
    // Use custom schema
    return mappers.accessor.extractBySchema(parsed, schema.content.mapping);
  }

  // Fallback to standard parsed structure
  return parsed;
}
```

## Rendering Extracted Content

After extracting content, you need to render it in your components. The parser works with content that may contain paragraph arrays, rich HTML, and formatting marks.

### Text Component Pattern

A **Text component** is recommended for rendering extracted content. See the [Text Component Reference](./text-component-reference.md) for a complete implementation guide.

#### Why Use a Text Component?

The parser's extractors return content in flexible formats:
- **Arrays of paragraphs** - `["Para 1", "Para 2"]`
- **Rich HTML** - `"Welcome to <strong>our platform</strong>"`
- **Color marks** - `"Title with <mark class='brand'>highlight</mark>"`

A Text component handles all these cases automatically.

#### Quick Example

```jsx
import { parseContent, mappers } from '@uniwebcms/semantic-parser';
import { H1, H2, P } from './components/Text'; // See docs/text-component-reference.md

const parsed = parseContent(doc);
const hero = mappers.extractors.hero(parsed);

// Simple rendering
<>
  <H1 text={hero.title} />
  {hero.subtitle && <H2 text={hero.subtitle} />}
  <P text={hero.description} />
</>
```

#### Handling Paragraph Arrays

Extractors now return paragraph arrays to preserve structure:

```jsx
// hero.description is an array: ["First para", "Second para"]
<P text={hero.description} />
// Renders: <p>First para</p><p>Second para</p>

// If you need a single string, use joinParagraphs
import { joinParagraphs } from '@uniwebcms/semantic-parser/mappers/helpers';

<P text={joinParagraphs(hero.description, '\n\n')} />
// Renders: <p>First para\n\nSecond para</p>
```

#### Multi-line Headings

```jsx
// heading.title might be an array for multi-line titles
<H1 text={heading.title} />

// Example: ["Welcome to", "Our Platform"]
// Renders: <h1><div>Welcome to</div><div>Our Platform</div></h1>
```

#### Complete Integration Example

```jsx
import { parseContent, mappers } from '@uniwebcms/semantic-parser';
import { H1, H2, H3, P } from './components/Text';

function HeroSection({ document }) {
  // Parse and extract
  const parsed = parseContent(document);
  const hero = mappers.extractors.hero(parsed);

  return (
    <section className="hero">
      {hero.kicker && <div className="kicker">{hero.kicker}</div>}
      <H1 text={hero.title} className="hero-title" />
      {hero.subtitle && <H2 text={hero.subtitle} className="hero-subtitle" />}
      <P text={hero.description} className="hero-description" />
      {hero.image && <img src={hero.image} alt={hero.imageAlt} />}
      {hero.cta && (
        <a href={hero.cta.href} className="cta-button">
          {/* The hero extractor returns cta as { href, label } */}
          {hero.cta.label}
        </a>
      )}
    </section>
  );
}
```

#### Rendering Lists

```jsx
function FeaturesList({ document }) {
  const parsed = parseContent(document);
  const features = mappers.extractors.features(parsed);

  return (
    <div className="features-grid">
      {features.map((feature, i) => (
        <div key={i} className="feature-card">
          {feature.icon && <img src={feature.icon} alt="" />}
          <H3 text={feature.title} />
          {feature.subtitle && <P text={feature.subtitle} className="subtitle" />}
          <P text={feature.description} />
        </div>
      ))}
    </div>
  );
}
```

### Sanitization Strategy

**Important:** Sanitize at the engine level, not in components.

```javascript
// ✅ Good - sanitize during data preparation
import { sanitizeHtml } from '@uniwebcms/semantic-parser/mappers/types';

function prepareHeroData(parsed) {
  const hero = mappers.extractors.hero(parsed);

  return {
    ...hero,
    title: sanitizeHtml(hero.title, {
      allowedTags: ['strong', 'em', 'mark', 'span'],
      allowedAttr: ['class', 'data-variant']
    }),
    description: hero.description.map(p => sanitizeHtml(p))
  };
}

const safeHeroData = prepareHeroData(parsed);
<H1 text={safeHeroData.title} />
```

```javascript
// ❌ Avoid - sanitizing in component on every render
function Hero({ data }) {
  const safeTitle = sanitizeHtml(data.title); // Runs every render!
  return <H1 text={safeTitle} />;
}
```

#### When to Sanitize

- **Always**: External content, user-generated content
- **Optional**: Trusted TipTap editor with locked schema
- **Never needed**: Hard-coded content in your app

See [Text Component Reference - Sanitization](./text-component-reference.md#sanitization-tools) for detailed guidance.

### Helper Functions for Rendering

```javascript
import {
  joinParagraphs,
  excerptFromParagraphs,
  countWords
} from '@uniwebcms/semantic-parser/mappers/helpers';

// Join paragraphs for single-string display
const singlePara = joinParagraphs(hero.description, ' ');

// Create excerpt for preview
const excerpt = excerptFromParagraphs(article.content, {
  maxLength: 150
});

// Count words for reading time estimate
const wordCount = countWords(article.content);
const readingTime = Math.ceil(wordCount / 200); // ~200 words/min
```

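The reading-time arithmetic above can be tried without the library. In this sketch `wordCountOf` and `readingTimeMinutes` are hypothetical functions; stripping tags before counting and enforcing a one-minute minimum are assumptions, not documented behavior of `countWords`.

```js
// Hypothetical helper (assumption): count words across a string or paragraph array.
function wordCountOf(paragraphs) {
  const list = Array.isArray(paragraphs) ? paragraphs : [paragraphs];
  return list
    .map((p) => String(p).replace(/<[^>]*>/g, " ").trim()) // Tags become spaces
    .filter(Boolean) // Skip empty paragraphs
    .reduce((sum, p) => sum + p.split(/\s+/).length, 0);
}

// Round up so a short article never reports zero minutes.
function readingTimeMinutes(paragraphs, wordsPerMinute = 200) {
  return Math.max(1, Math.ceil(wordCountOf(paragraphs) / wordsPerMinute));
}

console.log(wordCountOf(["Hello <strong>brave</strong> world", "Two words"]));
console.log(readingTimeMinutes(["Hello <strong>brave</strong> world", "Two words"]));
```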
### Color Marks in Headings

The parser supports color marks for visual emphasis:

```jsx
// Content with color mark
const title = "Welcome to <mark class='brand'>Our Platform</mark>";

<H1 text={title} />
```

**CSS for Color Marks:**

```css
mark.brand {
  background: linear-gradient(
    120deg,
    var(--brand-color) 0%,
    var(--brand-color) 100%
  );
  background-repeat: no-repeat;
  background-size: 100% 40%;
  background-position: 0 85%;
  color: inherit;
  padding: 0;
}
```

**Ensure sanitization allows marks:**

```javascript
sanitizeHtml(content, {
  allowedTags: ['strong', 'em', 'mark', 'span'],
  allowedAttr: ['class', 'data-variant']
});
```

## Best Practices

1. **Start with extractors**: Use pre-built patterns when they match your needs
2. **Customize gradually**: Override specific fields from extractors if needed
3. **Use schemas for clarity**: Schema-based extraction is self-documenting
4. **Provide defaults**: Always specify default values for optional fields
5. **Safe extraction**: Use `helpers.safe()` when accessing uncertain paths
6. **Validate**: Use `validateRequired()` for critical fields
7. **Type safety**: Consider adding TypeScript definitions for your schemas
8. **Sanitize at engine level**: Sanitize once during data preparation, not in components
9. **Preserve arrays**: Keep paragraph arrays when possible for better rendering control
10. **Use Text component**: Adopt the reference Text component for consistent rendering

## Contributing Patterns

If you develop a common pattern that could benefit others, consider contributing it as a new extractor. Common patterns include:

- Product cards
- Event listings
- Timeline entries
- Contact forms
- Newsletter signups
- Social proof sections
- Comparison tables