@visulima/html 0.0.1 → 1.0.0-alpha.2

package/README.md CHANGED
@@ -1,45 +1,960 @@
1
- # @visulima/html
1
+ <!-- START_PACKAGE_OG_IMAGE_PLACEHOLDER -->
2
2
 
3
- ## ⚠️ IMPORTANT NOTICE ⚠️
3
+ <a href="https://www.anolilab.com/open-source" align="center">
4
4
 
5
- **This package is created solely for the purpose of setting up OIDC (OpenID Connect) trusted publishing with npm.**
5
+ <img src="__assets__/package-og.svg" alt="html" />
6
6
 
7
- This is **NOT** a functional package and contains **NO** code or functionality beyond the OIDC setup configuration.
7
+ </a>
8
8
 
9
- ## Purpose
9
+ <h3 align="center">Functions for HTML, such as escaping or unescaping HTML entities</h3>
10
10
 
11
- This package exists to:
12
- 1. Configure OIDC trusted publishing for the package name `@visulima/html`
13
- 2. Enable secure, token-less publishing from CI/CD workflows
14
- 3. Establish provenance for packages published under this name
11
+ <!-- END_PACKAGE_OG_IMAGE_PLACEHOLDER -->
15
12
 
16
- ## What is OIDC Trusted Publishing?
13
+ <br />
17
14
 
18
- OIDC trusted publishing allows package maintainers to publish packages directly from their CI/CD workflows without needing to manage npm access tokens. Instead, it uses OpenID Connect to establish trust between the CI/CD provider (like GitHub Actions) and npm.
15
+ <div align="center">
19
16
 
20
- ## Setup Instructions
17
+ [![typescript-image][typescript-badge]][typescript-url]
18
+ [![mit licence][license-badge]][license]
19
+ [![npm downloads][npm-downloads-badge]][npm-downloads]
20
+ [![Chat][chat-badge]][chat]
21
+ [![PRs Welcome][prs-welcome-badge]][prs-welcome]
21
22
 
22
- To properly configure OIDC trusted publishing for this package:
23
+ </div>
23
24
 
24
- 1. Go to [npmjs.com](https://www.npmjs.com/) and navigate to your package settings
25
- 2. Configure the trusted publisher (e.g., GitHub Actions)
26
- 3. Specify the repository and workflow that should be allowed to publish
27
- 4. Use the configured workflow to publish your actual package
25
+ ---
26
+
27
+ <div align="center">
28
+ <p>
29
+ <sup>
30
+ Daniel Bannert's open source work is supported by the community on <a href="https://github.com/sponsors/prisis">GitHub Sponsors</a>
31
+ </sup>
32
+ </p>
33
+ </div>
34
+
35
+ ---
36
+
37
+ ## Features
38
+
39
+ ### HTML Escaping
40
+
41
+ - **Fast HTML Escaping**: Optimized HTML escaping function from Svelte
42
+ - **Minimal Allocations**: Efficient string escaping with minimal memory allocations
43
+ - **Dual Mode**: Supports both content escaping and attribute escaping
44
+ - **XSS Protection**: Escapes HTML special characters to prevent XSS attacks
45
+ - **HTML Template Tag**: Template literal function for HTML strings with automatic escaping of interpolated values
46
+ - **TypeScript Support**: Full TypeScript definitions included
47
+
48
+ ### CSS & JavaScript Escaping
49
+
50
+ - **CSS Escaping**: Escape strings for safe interpolation into CSS stylesheets, `<style>` elements, or CSS selectors
51
+ - **CSS Template Tag**: Template literal function for CSS strings with optional escaping
52
+ - **CSS Object Support**: Convert CSS objects (camelCase properties) to CSS strings with full TypeScript autocomplete
53
+ - **JavaScript Escaping**: Escape JavaScript objects and data for safe interpolation inside `<script>` tags
54
+ - **Injection Prevention**: Prevents CSS injection and JavaScript injection attacks
55
+ - **Context-Aware**: Properly handles different escaping requirements for CSS and JavaScript contexts
56
+ - **TypeScript Support**: Full TypeScript definitions included
28
57
 
29
- ## DO NOT USE THIS PACKAGE
58
+ ### Custom Element Validation
30
59
 
31
- This package is a placeholder for OIDC configuration only. It:
32
- - Contains no executable code
33
- - Provides no functionality
34
- - Should not be installed as a dependency
35
- - Exists only for administrative purposes
60
+ - **Name Validation**: Check if a string is a valid custom element name per HTML specification
61
+ - **Specification Compliant**: Follows the official HTML custom element naming rules
62
+ - **Hyphen Requirement**: Validates that custom element names contain required hyphens
63
+ - **TypeScript Support**: Full TypeScript definitions included
36
64
 
37
- ## More Information
65
+ ### HTML Entity Encoding & Decoding
38
66
 
39
- For more details about npm's trusted publishing feature, see:
40
- - [npm Trusted Publishing Documentation](https://docs.npmjs.com/generating-provenance-statements)
41
- - [GitHub Actions OIDC Documentation](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect)
67
+ - **Fastest HTML Entities Library**: High-performance HTML entity encoding and decoding
68
+ - **Multiple Standards**: Support for HTML5, HTML4, and XML entity standards
69
+ - **Flexible Encoding Modes**:
70
+ - `specialChars`: Encode only HTML special characters (`<`, `>`, `"`, `'`, `&`) - default
71
+ - `nonAscii`: Encode HTML special characters and everything outside the ASCII character range
72
+ - `nonAsciiPrintable`: Encode HTML special characters and everything outside of the ASCII printable characters
73
+ - `nonAsciiPrintableOnly`: Encode everything outside of the ASCII printable characters, keeping HTML special characters intact
74
+ - `extensive`: Encode all non-printable characters, non-ASCII characters and all characters with named references
75
+ - **Numeric Encoding**: Support for decimal (`&#169;`) and hexadecimal (`&#xa9;`) numeric entities
76
+ - **Comprehensive Character Support**: Handles named entities, numeric entities, and hex entities
77
+ - **TypeScript & Flow Types**: Comes with both TypeScript and Flow type definitions
78
+
79
+ ### HTML Tag Lists
80
+
81
+ - **Standard HTML Tags**: Comprehensive list of all standard HTML tags (excluding obsolete ones)
82
+ - **Void Tags**: List of self-closing/void HTML tags (e.g., `br`, `img`, `hr`)
83
+ - **TypeScript Support**: Full TypeScript definitions included
84
+ - **Useful for Validation**: Perfect for validating HTML tags when working with sanitization
85
+
86
+ ### HTML Sanitization
87
+
88
+ - **Secure by Default**: Clean up user-submitted HTML by preserving allowlisted elements and attributes
89
+ - **Per-Element Configuration**: Fine-grained control over allowed tags and attributes
90
+ - **XSS Protection**: Remove potentially dangerous HTML and scripts
91
+ - **Customizable**: Extend or override default allowed tags and attributes
92
+ - **URL Validation**: Control allowed URL schemes (http, https, mailto, etc.)
93
+ - **Iframe Support**: Safe embedding of content from trusted sources
94
+
95
+ ### HTML Tag Stripping
96
+
97
+ - **No Parser Required**: Lightweight HTML tag stripping without a full HTML parser
98
+ - **Plain Text Extraction**: Removes HTML tags to extract plain text content
99
+ - **Prevents Concatenation**: Automatically adds spaces between text nodes to prevent accidental string concatenation
100
+ - **Smart Bracket Detection**: Detects raw, legitimate brackets (like comparison operators) and preserves them
101
+ - **Configurable Tag Removal**: Strip specific tags together with their contents (e.g., script, style, pre)
102
+ - **Mixed Source Support**: Handles mixed HTML and plain text sources gracefully
103
+ - **TypeScript Support**: Full TypeScript definitions included
42
104
 
43
105
  ---
44
106
 
45
- **Maintained for OIDC setup purposes only**
107
+ ## Install
108
+
109
+ ```sh
110
+ npm install @visulima/html
111
+ ```
112
+
113
+ ```sh
114
+ yarn add @visulima/html
115
+ ```
116
+
117
+ ```sh
118
+ pnpm add @visulima/html
119
+ ```
120
+
121
+ ## Usage
122
+
123
+ ### When to Use Escaping vs Sanitization vs Stripping
124
+
125
+ Understanding when to use **escaping**, **sanitization**, or **stripping** is crucial for web security:
126
+
127
+ **Use Escaping (`escapeHtml`) when:**
128
+
129
+ - You're inserting **plain text** into HTML (text content or attribute values)
130
+ - The content should be displayed as-is, without any HTML rendering
131
+ - You want maximum performance and minimal processing overhead
132
+ - You're building HTML strings manually (template literals, string concatenation)
133
+ - The input is expected to be plain text (user names, comments, form inputs)
134
+
135
+ **Example:** User comments, form field values, JSON data displayed in HTML
136
+
137
+ **Use Sanitization (`sanitizeHtml`) when:**
138
+
139
+ - Users are allowed to submit **HTML content** that should be rendered
140
+ - You need to preserve some HTML tags while removing dangerous ones
141
+ - You want to allow rich text formatting (bold, italic, links, etc.)
142
+ - The content needs to be displayed as HTML, not as plain text
143
+ - You need fine-grained control over which HTML elements and attributes are allowed
144
+
145
+ **Example:** Rich text editors, blog post content, user-generated HTML content
146
+
147
+ **Use Stripping (`stripHtml`) when:**
148
+
149
+ - You need to extract **plain text** from HTML content
150
+ - You want to completely remove all HTML structure and tags
151
+ - You're preparing content for plain text display (emails, SMS, search indexes)
152
+ - You need to prevent accidental string concatenation from adjacent text nodes
153
+ - You want to preserve legitimate brackets (like comparison operators) while removing HTML tags
154
+
155
+ **Example:** Email text versions, search result snippets, plain text previews, content summaries
156
+
157
+ **Key Differences:**
158
+
159
+ - **Escaping** converts special characters to entities (`<` → `&lt;`), preventing HTML interpretation
160
+ - **Sanitization** removes or allows specific HTML tags, enabling safe HTML rendering
161
+ - **Stripping** removes all HTML tags and extracts plain text content
162
+
163
+ > **Security Note:** Do not sanitize content that has already been escaped, and do not place sanitized HTML into attribute values or other non-HTML contexts without escaping it for that context.
164
+
165
+ ### HTML Escaping
166
+
167
+ The `escapeHtml` function escapes text for HTML content or, when its second argument is `true`, for attribute values.
168
+
169
+ #### Basic Escaping
170
+
171
+ ```typescript
172
+ import { escapeHtml } from "@visulima/html";
173
+
174
+ // Escape HTML content (escapes & and <)
175
+ const escaped = escapeHtml('<script>alert("xss")</script>');
176
+ // Result: '&lt;script>alert("xss")&lt;/script>'
177
+
178
+ // Escape HTML attributes (also escapes double quotes)
179
+ const attrEscaped = escapeHtml('value="test"', true);
180
+ // Result: 'value=&quot;test&quot;'
181
+ ```
182
+
183
+ #### Content Escaping
184
+
185
+ ```typescript
186
+ import { escapeHtml } from "@visulima/html";
187
+
188
+ // Escape content for HTML body (default mode)
189
+ escapeHtml("<div>Hello & World</div>");
190
+ // Result: '&lt;div>Hello &amp; World&lt;/div>'
191
+
192
+ // Handles null/undefined gracefully
193
+ escapeHtml(null);
194
+ // Result: ''
195
+
196
+ escapeHtml(undefined);
197
+ // Result: ''
198
+ ```
199
+
200
+ #### Attribute Escaping
201
+
202
+ ```typescript
203
+ import { escapeHtml } from "@visulima/html";
204
+
205
+ // Escape for HTML attributes (escapes &, <, and ")
206
+ const attrValue = escapeHtml('data-value="test"', true);
207
+ // Result: 'data-value=&quot;test&quot;'
208
+
209
+ // Use in HTML attributes
210
+ const html = `<div data-content="${escapeHtml(userInput, true)}">Content</div>`;
211
+ ```
212
+
213
+ #### Performance
214
+
215
+ The `escapeHtml` function is optimized for performance:
216
+
217
+ - Minimal string allocations
218
+ - Efficient regex-based pattern matching
219
+ - Fast path for strings without special characters
220
+
221
+ ```typescript
222
+ import { escapeHtml } from "@visulima/html";
223
+
224
+ // Fast escaping for user-generated content
225
+ const safeHtml = escapeHtml(userInput);
226
+
227
+ // Safe attribute values
228
+ const safeAttr = escapeHtml(userInput, true);
229
+ ```
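The performance notes above (a regex-based scan, few allocations, and a fast path for clean strings) can be illustrated with a small standalone sketch. This is an assumption-laden approximation in the style of Svelte's escaper, not the package's actual source; `escapeSketch` and both pattern constants are hypothetical names.

```typescript
// Hypothetical sketch, NOT the source of @visulima/html.
// Content mode escapes & and <; attribute mode also escapes ".
const CONTENT_PATTERN = /[&<]/g;
const ATTRIBUTE_PATTERN = /[&<"]/g;

function escapeSketch(value: unknown, isAttribute = false): string {
    // null and undefined become an empty string, everything else is stringified.
    const input = value == null ? "" : String(value);
    const pattern = isAttribute ? ATTRIBUTE_PATTERN : CONTENT_PATTERN;

    // A global regex keeps state between calls, so reset it first.
    pattern.lastIndex = 0;

    let escaped = "";
    let last = 0;

    // If nothing matches, the loop body never runs and the input is
    // returned through a single substring call (the fast path).
    while (pattern.test(input)) {
        const index = pattern.lastIndex - 1;
        const char = input[index];

        escaped += input.substring(last, index) + (char === "&" ? "&amp;" : char === '"' ? "&quot;" : "&lt;");
        last = index + 1;
    }

    return escaped + input.substring(last);
}
```

Only the characters that can open a tag or an entity are replaced; `>` is left alone, which is consistent with the results shown in the examples above.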
230
+
231
+ > **Note:** This function is based on Svelte's optimized escaping implementation. See the source file for copyright information.
232
+
233
+ ### HTML Template Tag
234
+
235
+ The `html` function provides a convenient template tag for creating XSS-safe HTML strings. Template strings are used as-is, but all interpolated values are automatically HTML-escaped to prevent XSS attacks.
236
+
237
+ #### Template Tag Usage
238
+
239
+ ```typescript
240
+ import { html } from "@visulima/html";
241
+
242
+ // Template tag returns HTML as-is (template strings are trusted)
243
+ const markup = html`<div class="container">Hello World</div>`;
244
+ // Result: '<div class="container">Hello World</div>'
245
+
246
+ // With template values - interpolations are automatically escaped
247
+ const className = "test-class";
248
+ const content = "Hello";
249
+ const result = html`<div class="${className}">${content}</div>`;
250
+ // Result: '<div class="test-class">Hello</div>'
251
+
252
+ // XSS protection: interpolated values are escaped
253
+ const userInput = '<script>alert("xss")</script>';
254
+ const safe = html`<div>${userInput}</div>`;
255
+ // Result: '<div>&lt;script>alert("xss")&lt;/script></div>'
256
+ ```
257
+
258
+ #### Function Usage with Escaping Control
259
+
260
+ ```typescript
261
+ import { html } from "@visulima/html";
262
+
263
+ // Return HTML as-is (no escaping; use only for trusted markup)
264
+ const safeHtml = html("<div></div>", false);
265
+ // Result: '<div></div>'
266
+
267
+ // Escape HTML for attributes (escapes &, <, and ")
268
+ const escapedHtml = html('<script>alert("xss")</script>', true);
269
+ // Result: '&lt;script>alert(&quot;xss&quot;)&lt;/script>'
270
+
271
+ // Default behavior: return as-is (no escaping)
272
+ const defaultHtml = html("<div>Content</div>");
273
+ // Result: '<div>Content</div>'
274
+ ```
275
+
276
+ #### Use Cases
277
+
278
+ - **Template Literals**: Use the template tag for HTML with automatic escaping of interpolated values
279
+ - **Dynamic Content**: Interpolated values are automatically escaped, making it safe for user-generated content
280
+ - **Trusted HTML**: Use the function with `false` when you need to insert trusted HTML without escaping
281
+ - **Performance**: Template tag has minimal overhead, perfect for HTML generation with automatic XSS protection
282
+
283
+ ### CSS Escaping
284
+
285
+ The `escapeCss` function escapes a string for safe interpolation into an external CSS style sheet, within a `<style>` element, or in a CSS selector. This helps prevent CSS injection vulnerabilities.
286
+
287
+ #### Basic CSS Escaping
288
+
289
+ ```typescript
290
+ import { escapeCss } from "@visulima/html";
291
+
292
+ // Escape CSS content for safe interpolation
293
+ const unsafeCss = "body { background-image: url('http://example.com/foo.jpg?</style><script>alert(1)</script>'); }";
294
+ const safeCss = escapeCss(unsafeCss);
295
+ // Result: Escaped CSS that prevents injection attacks
296
+
297
+ // Use in style elements
298
+ const html = `<style>${escapeCss(userCss)}</style>`;
299
+
300
+ // Use in inline styles
301
+ const inlineStyle = `background-image: url('${escapeCss(userUrl)}');`;
302
+ ```
303
+
304
+ #### CSS Selector Escaping
305
+
306
+ ```typescript
307
+ import { escapeCss } from "@visulima/html";
308
+
309
+ // Escape CSS selector values
310
+ const selector = escapeCss(userInput);
311
+ const css = `.${selector} { color: red; }`;
312
+ ```
313
+
314
+ > **Security Note:** Always use `escapeCss` when interpolating user-generated content into CSS contexts to prevent CSS injection attacks.
315
+
316
+ ### CSS Template Tag
317
+
318
+ The `css` function provides a convenient template tag for creating CSS strings, with support for CSS objects and optional escaping.
319
+
320
+ #### Template Tag Usage
321
+
322
+ ```typescript
323
+ import { css } from "@visulima/html";
324
+
325
+ // Template tag returns CSS as-is
326
+ const styles = css`
327
+ :where(.UnderlineNav-actions ul) {
328
+ animation: 1ms rgh-selector-observer;
329
+ }
330
+ `;
331
+ // Result: CSS string with preserved formatting
332
+
333
+ // With template values
334
+ const selector = ".test-class";
335
+ const color = "red";
336
+ const result = css`
337
+ ${selector} {
338
+ color: ${color};
339
+ }
340
+ `;
341
+ ```
342
+
343
+ #### Function Usage with String Input
344
+
345
+ ```typescript
346
+ import { css } from "@visulima/html";
347
+
348
+ // Return CSS as-is (no escaping)
349
+ const cssString = css(":where(.test) { color: red; }", false);
350
+ // Result: ':where(.test) { color: red; }'
351
+
352
+ // Escape CSS for safe interpolation
353
+ const escapedCss = css(":where(.test) { color: red; }", true);
354
+ // Result: Escaped CSS string
355
+
356
+ // Default behavior: return as-is
357
+ const defaultCss = css(":where(.test) { color: red; }");
358
+ // Result: ':where(.test) { color: red; }'
359
+ ```
360
+
361
+ #### Function Usage with CSS Object
362
+
363
+ The `css` function accepts CSS objects with camelCase properties, providing full TypeScript autocomplete support via `csstype`:
364
+
365
+ ```typescript
366
+ import { css } from "@visulima/html";
367
+
368
+ // Convert CSS object to CSS string (no escaping)
369
+ const styles = css({ padding: "1px", margin: "2px" }, false);
370
+ // Result: 'padding: 1px; margin: 2px;'
371
+
372
+ // With escaping
373
+ const escapedStyles = css({ padding: "1px" }, true);
374
+ // Result: Escaped CSS string
375
+
376
+ // CamelCase properties are automatically converted to kebab-case
377
+ const result = css({ paddingTop: "10px", marginBottom: "20px" }, false);
378
+ // Result: 'padding-top: 10px; margin-bottom: 20px;'
379
+
380
+ // Full TypeScript autocomplete for CSS properties
381
+ const typedStyles = css(
382
+ {
383
+ padding: "1px",
384
+ margin: "2px",
385
+ color: "red",
386
+ display: "block",
387
+ // TypeScript will autocomplete all valid CSS properties!
388
+ },
389
+ false,
390
+ );
391
+ ```
392
+
393
+ #### TypeScript Autocomplete
394
+
395
+ The `css` function uses `csstype` for full TypeScript autocomplete support:
396
+
397
+ - **All CSS Properties**: Autocomplete for all standard CSS properties
398
+ - **Type Safety**: TypeScript will validate CSS property names and values
399
+ - **CamelCase Support**: Use camelCase property names (e.g., `paddingTop`) which are automatically converted to kebab-case
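The camelCase-to-kebab-case conversion described above can be sketched in a few lines. This is a minimal illustration of the idea only, assuming string or number values and no escaping; `toKebabCase` and `cssObjectToString` are hypothetical names, not exports of the package.

```typescript
// Hypothetical helpers for illustration; the package's css() may differ.
function toKebabCase(property: string): string {
    // Each uppercase letter becomes "-" plus its lowercase form.
    return property.replace(/[A-Z]/g, (letter) => `-${letter.toLowerCase()}`);
}

function cssObjectToString(styles: Record<string, string | number>): string {
    return Object.keys(styles)
        .map((property) => `${toKebabCase(property)}: ${styles[property]};`)
        .join(" ");
}
```

With this sketch, `cssObjectToString({ paddingTop: "10px", marginBottom: "20px" })` yields `padding-top: 10px; margin-bottom: 20px;`, the same shape as the object example earlier in this section.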
400
+
401
+ #### Use Cases
402
+
403
+ - **Template Literals**: Use the template tag for static CSS or CSS with template variables
404
+ - **CSS Objects**: Use CSS objects when you need TypeScript autocomplete and type safety
405
+ - **Dynamic Styles**: Convert JavaScript objects to CSS strings for dynamic styling
406
+ - **Safe Interpolation**: Pass `true` as the second argument to escape user-generated CSS before interpolating it
407
+
408
+ ### JavaScript Escaping
409
+
410
+ The `escapeJs` function escapes a JavaScript object or other data for safe interpolation inside a `<script>` tag. This ensures that the data does not break the JavaScript context or introduce security risks.
411
+
412
+ #### Basic JavaScript Escaping
413
+
414
+ ```typescript
415
+ import { escapeJs } from "@visulima/html";
416
+
417
+ // Escape JavaScript content for safe interpolation
418
+ const unsafeJs = "console.log('Hello, world!');</script><script>alert('XSS');</script>";
419
+ const safeJs = escapeJs(unsafeJs);
420
+ // Result: Escaped JavaScript that prevents script injection
421
+
422
+ // Use in script tags
423
+ const html = `<script>const data = ${escapeJs(userData)};</script>`;
424
+
425
+ // Escape JSON data for inline scripts
426
+ const jsonData = { name: "John", value: "</script><script>alert(1)" };
427
+ const safeJson = escapeJs(JSON.stringify(jsonData));
428
+ const script = `<script>window.config = ${safeJson};</script>`;
429
+ ```
430
+
431
+ #### Escaping JavaScript Objects
432
+
433
+ ```typescript
434
+ import { escapeJs } from "@visulima/html";
435
+
436
+ // Escape complex objects for safe interpolation
437
+ const config = {
438
+ apiUrl: "https://api.example.com",
439
+ userInput: userProvidedValue,
440
+ };
441
+
442
+ const escapedConfig = escapeJs(JSON.stringify(config));
443
+ const html = `<script>window.appConfig = ${escapedConfig};</script>`;
444
+ ```
445
+
446
+ > **Security Note:** Always use `escapeJs` when interpolating user-generated content into JavaScript contexts to prevent XSS attacks and script injection.
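To show why unescaped data is dangerous inside a `<script>` element, here is a sketch of one common mitigation: serialize to JSON, then replace the characters that could close the tag or break the script with `\uXXXX` escapes. This is an assumption about the general technique, not a description of how `escapeJs` is implemented; `serializeForScriptSketch` is a hypothetical name.

```typescript
// Hypothetical sketch of a common technique; escapeJs may work differently.
// <, > and & can terminate or confuse the surrounding HTML;
// U+2028 and U+2029 are line terminators in older JavaScript engines.
const UNSAFE_CHARACTERS = /[<>&\u2028\u2029]/g;

function serializeForScriptSketch(data: unknown): string {
    // JSON.stringify returns undefined at runtime for values such as functions.
    const json = JSON.stringify(data) ?? "null";

    return json.replace(UNSAFE_CHARACTERS, (character) => {
        const hex = character.charCodeAt(0).toString(16);

        return `\\u${("0000" + hex).slice(-4)}`;
    });
}
```

The output is still valid JSON and valid JavaScript, so parsing it returns the original value, but the literal sequence `</script>` can no longer appear in the page source.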
447
+
448
+ ### Custom Element Name Validation
449
+
450
+ The `isValidCustomElementName` function checks whether a given string is a valid custom element name according to the [HTML specification](https://html.spec.whatwg.org/multipage/custom-elements.html#valid-custom-element-name).
451
+
452
+ #### Basic Validation
453
+
454
+ ```typescript
455
+ import { isValidCustomElementName } from "@visulima/html";
456
+
457
+ // Valid custom element names (must contain a hyphen)
458
+ console.log(isValidCustomElementName("my-element")); // true
459
+ console.log(isValidCustomElementName("my-custom-element")); // true
460
+ console.log(isValidCustomElementName("app-header")); // true
461
+
462
+ // Invalid custom element names
463
+ console.log(isValidCustomElementName("MyElement")); // false (no hyphen)
464
+ console.log(isValidCustomElementName("my_element")); // false (no hyphen; underscores themselves are allowed)
465
+ console.log(isValidCustomElementName("myelement")); // false (no hyphen)
466
+ console.log(isValidCustomElementName("div")); // false (standard HTML tag)
467
+ ```
468
+
469
+ #### Using with Custom Element Registration
470
+
471
+ ```typescript
472
+ import { isValidCustomElementName } from "@visulima/html";
473
+
474
+ function registerCustomElement(name: string, constructor: CustomElementConstructor) {
475
+ if (!isValidCustomElementName(name)) {
476
+ throw new Error(`Invalid custom element name: ${name}. Custom element names must contain a hyphen.`);
477
+ }
478
+
479
+ customElements.define(name, constructor);
480
+ }
481
+
482
+ // Valid usage
483
+ registerCustomElement("my-component", class extends HTMLElement {});
484
+
485
+ // Invalid usage - will throw error
486
+ registerCustomElement("MyComponent", class extends HTMLElement {}); // Error!
487
+ ```
488
+
489
+ #### Custom Element Name Rules
490
+
491
+ According to the HTML specification, a valid custom element name must:
492
+
493
+ - Start with an ASCII lowercase letter
494
+ - Contain at least one hyphen (`-`)
495
+ - Contain no ASCII uppercase letters
496
+ - Not be a standard HTML tag name
497
+ - Not start with certain reserved prefixes (like `html-`, `xml-`, etc.)
498
+
499
+ > **Note:** Custom element names are case-sensitive and must follow the naming conventions defined in the HTML specification to ensure proper browser support.
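The core naming rules can be approximated with a short check. The sketch below is a simplified, ASCII-only reading of the classic specification grammar plus its list of reserved hyphenated names; the specification also permits certain non-ASCII characters, and `isValidCustomElementName` may accept or reject more than this. `isValidNameSketch` is a hypothetical name.

```typescript
// Hypothetical ASCII-only approximation; not the package's implementation.
// Hyphenated names that the HTML specification reserves for SVG and MathML.
const RESERVED_NAMES = new Set([
    "annotation-xml",
    "color-profile",
    "font-face",
    "font-face-src",
    "font-face-uri",
    "font-face-format",
    "font-face-name",
    "missing-glyph",
]);

// A lowercase letter, then any mix of lowercase letters, digits, ".", "_"
// and "-", with at least one "-" somewhere after the first character.
const ASCII_NAME_PATTERN = /^[a-z][a-z0-9._-]*-[a-z0-9._-]*$/;

function isValidNameSketch(name: string): boolean {
    return ASCII_NAME_PATTERN.test(name) && !RESERVED_NAMES.has(name);
}
```

Under these assumptions `my-element` passes, while `MyElement`, `my_element`, and `div` fail because none of them contains a hyphen, and `font-face` fails because it is reserved.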
500
+
501
+ ### HTML Entity Encoding & Decoding
502
+
503
+ The package exports all functions from `html-entities` for encoding and decoding HTML entities.
504
+
505
+ #### Basic Encoding
506
+
507
+ ```typescript
508
+ import { encode } from "@visulima/html";
509
+
510
+ // Encode HTML special characters
511
+ const encoded = encode("< > \" ' & © ∆");
512
+ // Result: '&lt; &gt; &quot; &apos; &amp; © ∆'
513
+ ```
514
+
515
+ #### Basic Decoding
516
+
517
+ ```typescript
518
+ import { decode } from "@visulima/html";
519
+
520
+ // Decode HTML entities
521
+ const decoded = decode("&lt; &gt; &quot; &apos; &amp; &copy; &Delta;");
522
+ // Result: '< > " \' & © ∆'
523
+ ```
524
+
525
+ #### Encoding Options
526
+
527
+ ```typescript
528
+ import { encode } from "@visulima/html";
529
+
530
+ // Encode with HTML5 standard (default)
531
+ encode("< > \" ' & ©", { level: "html5" });
532
+ // Result: '&lt; &gt; &quot; &apos; &amp; ©'
533
+
534
+ // Encode with HTML4 standard
535
+ encode("< > \" ' & ©", { level: "html4" });
536
+ // Result: '&lt; &gt; &quot; &apos; &amp; ©'
537
+
538
+ // Encode with XML standard
539
+ encode("< > \" ' & ©", { level: "xml" });
540
+ // Result: '&lt; &gt; &quot; &apos; &amp; ©' (the default `specialChars` mode leaves © untouched)
541
+
542
+ // Encode only special characters (default mode)
543
+ encode("< > \" ' & ©", { mode: "specialChars" });
544
+ // Result: '&lt; &gt; &quot; &apos; &amp; ©'
545
+
546
+ // Encode HTML special characters and everything outside ASCII
547
+ encode("< ©", { mode: "nonAscii" });
548
+ // Result: '&lt; &copy;'
549
+
550
+ // Encode HTML special characters and everything outside ASCII printable
551
+ encode("< ©", { mode: "nonAsciiPrintable" });
552
+ // Result: '&lt; &copy;'
553
+
554
+ // Encode with XML level and non-ASCII printable mode
555
+ encode("< ©", { mode: "nonAsciiPrintable", level: "xml" });
556
+ // Result: '&lt; &#169;'
557
+
558
+ // Encode only non-ASCII printable characters (keep HTML special chars intact)
559
+ encode("< > \" ' & ©", { mode: "nonAsciiPrintableOnly", level: "xml" });
560
+ // Result: '< > " \' & &#169;'
561
+
562
+ // Encode extensively (all non-printable, non-ASCII, and named references)
563
+ encode("< > \" ' & ©", { mode: "extensive" });
564
+ // Result: '&lt; &gt; &quot; &apos; &amp; &copy;'
565
+
566
+ // Use hexadecimal numeric entities
567
+ encode("< ©", { mode: "nonAsciiPrintable", level: "xml", numeric: "hexadecimal" });
568
+ // Result: '&lt; &#xa9;'
569
+ ```
570
+
571
+ **Encode Options:**
572
+
573
+ - `level`: `'all'` (alias to `'html5'`) | `'html5'` (default) | `'html4'` | `'xml'` - Specifies the standard to use for named character references
574
+ - `mode`: `'specialChars'` (default) | `'nonAscii'` | `'nonAsciiPrintable'` | `'nonAsciiPrintableOnly'` | `'extensive'` - Determines which characters to encode
575
+ - `numeric`: `'decimal'` (default) | `'hexadecimal'` - Uses decimal (`&#169;`) or hexadecimal (`&#xa9;`) numbers when encoding entities
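The difference between the two `numeric` settings comes down to how the character's code point is written. The sketch below shows that mapping in isolation; `toNumericEntity` is a hypothetical helper for illustration and is not part of `html-entities` or this package.

```typescript
// Hypothetical helper; it only illustrates the decimal and hexadecimal forms.
function toNumericEntity(character: string, numeric: "decimal" | "hexadecimal" = "decimal"): string {
    const codePoint = character.codePointAt(0);

    // An empty string has no code point to encode.
    if (codePoint === undefined) {
        return "";
    }

    return numeric === "hexadecimal" ? `&#x${codePoint.toString(16)};` : `&#${codePoint};`;
}
```

For `©` (U+00A9) this produces `&#169;` in decimal and `&#xa9;` in hexadecimal, matching the forms listed in the options above.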
576
+
577
+ #### Decoding Options
578
+
579
+ ```typescript
580
+ import { decode } from "@visulima/html";
581
+
582
+ // Decode with HTML5 standard (default)
583
+ decode("&lt; &gt; &quot; &apos; &amp; &#169; &#8710;");
584
+ // Result: '< > " \' & © ∆'
585
+
586
+ // Decode with HTML5 level
587
+ decode("&copy;", { level: "html5" });
588
+ // Result: '©'
589
+
590
+ // Decode with XML level (doesn't recognize &copy;)
591
+ decode("&copy;", { level: "xml" });
592
+ // Result: '&copy;' (unknown entity left as is)
593
+
594
+ // Decode with body scope (default) - emulates browser parsing tag bodies
595
+ decode("&lt &gt", { scope: "body" });
596
+ // Result: '< >' (entities without semicolon are replaced)
597
+
598
+ // Decode with attribute scope - emulates browser parsing tag attributes
599
+ decode("&lt &gt", { scope: "attribute" });
600
+ // Result: '< >' (entities without semicolon replaced when not followed by =)
601
+
602
+ // Decode with strict scope - ignores entities without semicolon
603
+ decode("&lt &gt", { scope: "strict" });
604
+ // Result: '&lt &gt' (entities without semicolon ignored)
605
+ ```
606
+
607
+ **Decode Options:**
608
+
609
+ - `level`: `'all'` (alias to `'html5'`) | `'html5'` (default) | `'html4'` | `'xml'` - Specifies the standard to use for named character references
610
+ - `scope`: `'body'` (default) | `'attribute'` | `'strict'` - Controls how entities without semicolons are handled
611
+ - `'body'`: Emulates browser behavior when parsing tag bodies - entities without semicolon are also replaced
612
+ - `'attribute'`: Emulates browser behavior when parsing tag attributes - entities without semicolon are replaced when not followed by equality sign `=`
613
+ - `'strict'`: Ignores entities without semicolon
614
+
615
+ #### Decode Single Entity
616
+
617
+ ```typescript
618
+ import { decodeEntity } from "@visulima/html";
619
+
620
+ // Decode a single HTML entity
621
+ decodeEntity("&lt;");
622
+ // Result: '<'
623
+
624
+ // Decode with HTML5 level
625
+ decodeEntity("&copy;", { level: "html5" });
626
+ // Result: '©'
627
+
628
+ // Decode with XML level (doesn't recognize &copy;)
629
+ decodeEntity("&copy;", { level: "xml" });
630
+ // Result: '&copy;' (unknown entity left as is)
631
+ ```
632
+
633
+ **DecodeEntity Options:**
634
+
635
+ - `level`: `'all'` (alias to `'html5'`) | `'html5'` (default) | `'html4'` | `'xml'` - Specifies the standard to use for named character references
636
+
637
+ ### HTML Tag Lists
638
+
639
+ The package exports `htmlTags` and `voidHtmlTags` from `html-tags` for working with HTML tag lists.
640
+
641
+ #### Standard HTML Tags
642
+
643
+ ```typescript
644
+ import { htmlTags } from "@visulima/html";
645
+
646
+ // Get all standard HTML tags
647
+ console.log(htmlTags);
648
+ // => ['a', 'abbr', 'acronym', 'address', 'applet', 'area', 'article', ...]
649
+
650
+ // Check if a tag is a standard HTML tag
651
+ const isValidTag = htmlTags.includes("div");
652
+ // => true
653
+
654
+ const isInvalidTag = htmlTags.includes("custom-tag");
655
+ // => false
656
+
657
+ // Use with sanitizeHtml to validate allowed tags
658
+ import { sanitizeHtml, htmlTags } from "@visulima/html";
659
+
660
+ const clean = sanitizeHtml(dirtyHtml, {
661
+ allowedTags: htmlTags.filter((tag) => ["p", "a", "img", "div"].includes(tag)),
662
+ });
663
+ ```
664
+
665
+ #### Void HTML Tags
666
+
667
+ ```typescript
668
+ import { voidHtmlTags } from "@visulima/html";
669
+
670
+ // Get all void/self-closing HTML tags
671
+ console.log(voidHtmlTags);
672
+ // => ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', ...]
673
+
674
+ // Check if a tag is a void tag
675
+ const isVoidTag = voidHtmlTags.includes("br");
676
+ // => true
677
+
678
+ const isNotVoidTag = voidHtmlTags.includes("div");
679
+ // => false
680
+
681
+ // Use with sanitizeHtml to configure self-closing tags
682
+ import { sanitizeHtml, voidHtmlTags } from "@visulima/html";
683
+
684
+ const clean = sanitizeHtml(html, {
685
+ allowedTags: ["p", "br", "img"],
686
+ selfClosing: voidHtmlTags.filter((tag) => ["br", "img"].includes(tag)),
687
+ });
688
+ ```
689
+
690
+ ### HTML Sanitization
691
+
692
+ The package exports `sanitizeHtml` from `sanitize-html` for cleaning user-submitted HTML.
693
+
694
+ #### Basic Sanitization
695
+
696
+ ```typescript
697
+ import { sanitizeHtml } from "@visulima/html";
698
+
699
+ // Basic usage - removes potentially dangerous HTML
700
+ const dirty = '<p>Hello <script>alert("xss")</script>World</p>';
701
+ const clean = sanitizeHtml(dirty);
702
+ // Result: '<p>Hello World</p>'
703
+ ```
704
+
705
+ #### Custom Allowed Tags and Attributes
706
+
707
+ ```typescript
708
+ import { sanitizeHtml } from "@visulima/html";
709
+
710
+ // Specify allowed tags and attributes
711
+ const html = '<p>Hello <a href="http://example.com">Link</a></p>';
712
+ const clean = sanitizeHtml(html, {
713
+ allowedTags: ["b", "i", "em", "strong", "a", "p"],
714
+ allowedAttributes: {
715
+ a: ["href"],
716
+ },
717
+ });
718
+ ```
719
+
720
+ #### Extending Default Allowed Tags
721
+
722
+ ```typescript
723
+ import { sanitizeHtml } from "@visulima/html";
724
+
725
+ // Extend the default set of allowed tags
726
+ const clean = sanitizeHtml(html, {
727
+ allowedTags: sanitizeHtml.defaults.allowedTags.concat(["img", "iframe"]),
728
+ });
729
+ ```
730
+
731
+ #### Advanced Sanitization Options
732
+
733
+ ```typescript
734
+ import { sanitizeHtml } from "@visulima/html";
735
+
736
+ const clean = sanitizeHtml(html, {
737
+ // Allowed HTML tags
738
+ allowedTags: ["h1", "h2", "p", "a", "img"],
739
+
740
+ // Allowed attributes per tag
741
+ allowedAttributes: {
742
+ a: ["href", "name", "target"],
743
+ img: ["src", "alt", "width", "height"],
744
+ },
745
+
746
+ // Self-closing tags
747
+ selfClosing: ["img", "br", "hr"],
748
+
749
+ // Allowed URL schemes
750
+ allowedSchemes: ["http", "https", "mailto"],
751
+
752
+ // Allowed schemes for specific tags
753
+ allowedSchemesByTag: {
754
+ img: ["http", "https", "data"],
755
+ },
756
+
757
+ // Attributes that scheme validation applies to
758
+ allowedSchemesAppliedToAttributes: ["href", "src", "cite"],
759
+
760
+ // Allow protocol-relative URLs
761
+ allowProtocolRelative: true,
762
+
763
+ // Allowed iframe hostnames
764
+ allowedIframeHostnames: ["www.youtube.com", "player.vimeo.com"],
765
+
766
+ // Transform tags
767
+ transformTags: {
768
+ a: (tagName, attribs) => {
769
+ // Add rel="nofollow" to every anchor while keeping its other attributes
770
+ return {
771
+ tagName: "a",
772
+ attribs: {
773
+ ...attribs,
774
+ rel: "nofollow",
775
+ },
776
+ };
777
+ },
778
+ },
779
+
780
+ // Text filter
781
+ textFilter: (text) => {
782
+ // Filter or transform text content
783
+ return text.trim();
784
+ },
785
+ });
786
+ ```
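
The `transformTags` callback shown above receives `(tagName, attribs)` and returns `{ tagName, attribs }`, so the transform can be written and tested as a plain function before it is wired into the sanitizer. The sketch below is only an illustration of that idea: `markExternalLinks`, `isExternal`, and `OWN_HOST` are hypothetical names that belong to neither `@visulima/html` nor `sanitize-html`, and the host value is an assumption.

```typescript
// Hypothetical helper, not part of @visulima/html or sanitize-html.
type Attributes = Record<string, string>;

interface TransformedTag {
    tagName: string;
    attribs: Attributes;
}

// Assumed host of the site that renders the sanitized HTML.
const OWN_HOST = "example.com";

const isExternal = (href: string | undefined): boolean => {
    if (!href) {
        return false;
    }

    try {
        const url = new URL(href);
        const isWebLink = url.protocol === "http:" || url.protocol === "https:";

        return isWebLink && url.hostname !== OWN_HOST;
    } catch {
        // Relative and protocol-relative URLs cannot be parsed without a base;
        // this sketch treats them as internal.
        return false;
    }
};

// Same shape as the transformTags callback: (tagName, attribs) => { tagName, attribs }.
const markExternalLinks = (tagName: string, attribs: Attributes): TransformedTag => {
    if (!isExternal(attribs.href)) {
        return { tagName, attribs };
    }

    return {
        tagName,
        attribs: { ...attribs, rel: "nofollow noopener", target: "_blank" },
    };
};
```

Under these assumptions it could be passed as `transformTags: { a: markExternalLinks }`. Note that `rel` and `target` would also need to be listed in `allowedAttributes` for `a`, otherwise the sanitizer may drop them again.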
787
+
788
+ #### Default Configuration
789
+
790
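+ The defaults come from the bundled `sanitize-html` release and can change between versions, so treat the list below as a reference point and check `sanitizeHtml.defaults` for the exact values: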
791
+
792
+ - **allowedTags**: `['h3', 'h4', 'h5', 'h6', 'blockquote', 'p', 'a', 'ul', 'ol', 'nl', 'li', 'b', 'i', 'strong', 'em', 'strike', 'code', 'hr', 'br', 'div', 'table', 'thead', 'caption', 'tbody', 'tr', 'th', 'td', 'pre']`
793
+ - **allowedAttributes**: `{ a: ['href', 'name', 'target'], img: ['src'] }`
794
+ - **allowedSchemes**: `['http', 'https', 'ftp', 'mailto']`
795
+ - **allowProtocolRelative**: `true`
796
+ - **allowedIframeHostnames**: `['www.youtube.com', 'player.vimeo.com']`
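
When a default list such as `allowedTags` is extended with `concat`, a tag that is already present ends up in the array twice. That is harmless, but it makes the resulting configuration harder to read and compare. A minimal sketch of a deduplicating merge follows; `mergeAllowedTags` is a hypothetical helper, and `baseTags` is a shortened stand-in for `sanitizeHtml.defaults.allowedTags` so that the example runs without any imports.

```typescript
// Hypothetical helper: merge extra tags into a base list, keeping the first
// occurrence of each tag and preserving order.
const mergeAllowedTags = (base: readonly string[], extra: readonly string[]): string[] =>
    [...base, ...extra].filter((tag, index, all) => all.indexOf(tag) === index);

// Shortened stand-in for sanitizeHtml.defaults.allowedTags (assumption for this sketch).
const baseTags = ["p", "a", "ul", "li", "b", "i"];

const allowedTags = mergeAllowedTags(baseTags, ["img", "p"]);

console.log(allowedTags); // [ 'p', 'a', 'ul', 'li', 'b', 'i', 'img' ]
```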
797
+
798
+ ### HTML Tag Stripping
799
+
800
+ The package exports `stripHtml` from `string-strip-html` for removing HTML tags and extracting plain text.
801
+
802
+ #### Basic Stripping
803
+
804
+ ```typescript
805
+ import { stripHtml } from "@visulima/html";
806
+
807
+ // Strip HTML tags from string
808
+ const result = stripHtml("Some text <b>and</b> text.");
809
+ console.log(result.result); // 'Some text and text.'
810
+
811
+ // Prevents accidental string concatenation
812
+ const result2 = stripHtml("aaa<div>bbb</div>ccc");
813
+ console.log(result2.result); // 'aaa bbb ccc'
814
+
815
+ // Access the stripped text
816
+ const plainText = stripHtml("<div>Hello <strong>World</strong></div>").result;
817
+ // plainText: 'Hello World'
818
+ ```
819
+
820
+ #### Tag Pairs with Content
821
+
822
+ ```typescript
823
+ import { stripHtml } from "@visulima/html";
824
+
825
+ // Strip tags together with their contents
826
+ const result = stripHtml("a <pre><code>void a;</code></pre> b", {
827
+ stripTogetherWithTheirContents: ["script", "style", "xml", "pre"],
828
+ });
829
+ console.log(result.result); // 'a b'
830
+
831
+ // Script and style tags are stripped together with their contents by default
832
+ const result2 = stripHtml("Text <script>alert('xss')</script> more text");
833
+ console.log(result2.result); // 'Text more text'
834
+
835
+ // Strip style tags with their content
836
+ const result3 = stripHtml("Text <style>body { color: red; }</style> more text");
837
+ console.log(result3.result); // 'Text more text'
838
+ ```
839
+
840
+ #### Raw Bracket Detection
841
+
842
+ ```typescript
843
+ import { stripHtml } from "@visulima/html";
844
+
845
+ // Detects raw, legit brackets and preserves them
846
+ const result = stripHtml("a < b and c > d");
847
+ console.log(result.result); // 'a < b and c > d'
848
+
849
+ // Handles comparison operators in text
850
+ const result2 = stripHtml("5 < 10 and 20 > 15");
851
+ console.log(result2.result); // '5 < 10 and 20 > 15'
852
+
853
+ // Handles mixed HTML tags and comparison operators
854
+ const result3 = stripHtml("Value <b>5</b> < 10");
855
+ console.log(result3.result); // 'Value 5 < 10'
856
+ ```
857
+
858
+ #### Advanced Options
859
+
860
+ ```typescript
861
+ import { stripHtml } from "@visulima/html";
862
+
863
+ // Custom tag stripping configuration
864
+ const result = stripHtml(html, {
865
+ // Strip these tags together with their contents
866
+ stripTogetherWithTheirContents: ["script", "style", "xml", "pre", "code"],
867
+
868
+ // Other options available from string-strip-html
869
+ // See: https://codsen.com/os/string-strip-html/
870
+ });
871
+
872
+ // Access the stripped text
873
+ const plainText = result.result;
874
+
875
+ // The result object also contains other metadata
876
+ // See: https://codsen.com/os/string-strip-html/ for full API
877
+ ```
878
+
879
+ #### Edge Cases
880
+
881
+ ```typescript
882
+ import { stripHtml } from "@visulima/html";
883
+
884
+ // Handles empty strings
885
+ const result1 = stripHtml("");
886
+ console.log(result1.result); // ''
887
+
888
+ // Handles strings with only HTML tags
889
+ const result2 = stripHtml("<div><span></span></div>");
890
+ console.log(result2.result); // ''
891
+
892
+ // Handles strings with no HTML tags
893
+ const result3 = stripHtml("Plain text without tags");
894
+ console.log(result3.result); // 'Plain text without tags'
895
+
896
+ // Handles nested HTML tags
897
+ const result4 = stripHtml("<div><p>Text <b>bold</b></p></div>");
898
+ console.log(result4.result); // 'Text bold'
899
+
900
+ // Handles self-closing tags
901
+ const result5 = stripHtml("Line 1<br/>Line 2<br />Line 3");
902
+ console.log(result5.result); // 'Line 1 Line 2 Line 3'
903
+
904
+ // Handles HTML entities (they are decoded)
905
+ const result6 = stripHtml("<p>Hello &amp; World</p>");
906
+ console.log(result6.result); // 'Hello & World'
907
+ ```
908
+
909
+ **When to Use Stripping vs Sanitization:**
910
+
911
+ - **Use `stripHtml`** when you need plain text output and want to completely remove HTML structure
912
+ - **Use `sanitizeHtml`** when you need to preserve some HTML structure while removing dangerous elements
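
One way to apply this rule consistently is to choose the cleaner by output channel in a single place. The sketch below assumes two channels and injects both cleaners as plain functions so that it stays free of imports. `cleanFor`, `Cleaners`, and `OutputChannel` are hypothetical names, not part of the package.

```typescript
// Hypothetical wiring; the cleaners are injected so the sketch needs no imports.
type Cleaner = (input: string) => string;

interface Cleaners {
    strip: Cleaner; // e.g. (input) => stripHtml(input).result
    sanitize: Cleaner; // e.g. (input) => sanitizeHtml(input)
}

type OutputChannel = "plain-text" | "html";

// Plain-text channels (email subjects, search indexes) get stripped text;
// HTML channels keep the markup that survives sanitization.
const cleanFor = (channel: OutputChannel, input: string, cleaners: Cleaners): string =>
    channel === "plain-text" ? cleaners.strip(input) : cleaners.sanitize(input);
```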
913
+
914
+ ## Related
915
+
916
+ - [sanitize-html](https://github.com/apostrophecms/sanitize-html) - HTML sanitizer with a clear API
917
+ - [string-strip-html](https://github.com/codsen/codsen/tree/main/packages/string-strip-html) - Strip HTML tags from strings
918
+ - [html-entities](https://github.com/mdevils/html-entities) - Fast HTML entity encoding/decoding
919
+ - [html-tags](https://github.com/sindresorhus/html-tags) - List of standard HTML tags
920
+ - [DOMPurify](https://github.com/cure53/DOMPurify) - DOM-only, super-fast, uber-tolerant XSS sanitizer
921
+ - [xss](https://github.com/leizongmin/js-xss) - XSS filter
922
+ - [htmlnano](https://github.com/maltsev/htmlnano) - HTML minifier
923
+ - [cssnano](https://github.com/cssnano/cssnano) - CSS minifier
924
+
925
+ ## Supported Node.js Versions
926
+
927
+ Libraries in this ecosystem make the best effort to track [Node.js’ release schedule](https://github.com/nodejs/release#release-schedule).
928
+ Here’s [a post on why we think this is important](https://medium.com/the-node-js-collection/maintainers-should-consider-following-node-js-release-schedule-ab08ed4de71a).
929
+
930
+ ## Contributing
931
+
932
+ If you would like to help, take a look at the [list of issues](https://github.com/visulima/visulima/issues) and check our [Contributing](.github/CONTRIBUTING.md) guidelines.
932
+
933
+ > **Note:** This project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.
935
+
936
+ ## Credits
937
+
938
+ - [Daniel Bannert](https://github.com/prisis)
939
+ - [All Contributors](https://github.com/visulima/visulima/graphs/contributors)
940
+
941
+ ## Made with ❤️ at Anolilab
942
+
943
+ This is an open source project and will always remain free to use. If you think it's cool, please star it 🌟. [Anolilab](https://www.anolilab.com/open-source) is a Development and AI Studio. Contact us at [hello@anolilab.com](mailto:hello@anolilab.com) if you need any help with these technologies or just want to say hi!
944
+
945
+ ## License
946
+
947
+ The visulima html package is open-source software licensed under the [MIT][license] license.
948
+
949
+ <!-- badges -->
950
+
951
+ [license-badge]: https://img.shields.io/npm/l/@visulima/html?style=for-the-badge
952
+ [license]: https://github.com/visulima/visulima/blob/main/LICENSE
953
+ [npm-downloads-badge]: https://img.shields.io/npm/dm/@visulima/html?style=for-the-badge
954
+ [npm-downloads]: https://www.npmjs.com/package/@visulima/html
955
+ [prs-welcome-badge]: https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=for-the-badge
956
+ [prs-welcome]: https://github.com/visulima/visulima/blob/main/.github/CONTRIBUTING.md
957
+ [chat-badge]: https://img.shields.io/discord/932323359193186354.svg?style=for-the-badge
958
+ [chat]: https://discord.gg/TtFJY8xkFK
959
+ [typescript-badge]: https://img.shields.io/badge/Typescript-294E80.svg?style=for-the-badge&logo=typescript
960
+ [typescript-url]: https://www.typescriptlang.org/