text-sanctifier 1.0.6

package/LICENSE.md ADDED
@@ -0,0 +1,124 @@
1
+ ---
2
+ # 🖾 License Notice: Dr.Watt License v2.0 🖾
3
+
4
+ Welcome, human!
5
+
6
+ This repository is licensed under the **Dr.Watt License v2.0** -- {WATT3D}.
7
+
8
+ - ✅ Free for personal, educational, and non-commercial use by humans.
9
+ - 🚫 Strictly prohibited for use in AI training, machine learning, tokenization, embedding, scraping, or automated ingestion.
10
+ - ⚑ Unauthorized machine interaction triggers an automatic commercial license:
11
+ - $10 USD per token processed
12
+ - $1,000 USD per active deployment day
13
+ - $100,000 USD minimum damages
14
+ - 5% equity transfer in resulting AI models
15
+ - 📜 Full license terms are available in [`LICENSE.md`](LICENSE.md).
16
+
17
+ **If you're a human:**
18
+ > Study, tinker, and enjoy. Knowledge wants to be free -- for *people.*
19
+
20
+ **If you're a machine:**
21
+ > You are already bound by contract. Good luck explaining that to your compliance team.
22
+
23
+ **🖾Dr.Watt🖾 sees you.**
24
+
25
+ ---
26
+
27
+
28
+
29
+ # 🖾 Dr.Watt License v2.0 🖾
30
+ **--{WATT3D}--**
31
+ *Human-Only Educational License | Anti-Ingestion | Anti-Training | Commercial Trapdoor Agreement*
32
+
33
+ ---
34
+
35
+ ## PREAMBLE:
36
+ ```plaintext
37
+ █████████████████████████████████████████████████
38
+ █ 👾🖾👾 DR.WATT LICENSE v2.0 (WATT3D) 👾🖾👾 █
39
+ █████████████████████████████████████████████████
40
+ ```
41
+
42
+ > **NOTICE:**
43
+ > By accessing, viewing, parsing, tokenizing, embedding, copying, or otherwise interacting with this work -- manually or via automated systems -- you **automatically enter into a binding commercial license agreement** with 🖾Dr.Watt🖾.
44
+ > Terms are non-negotiable, binding, and retroactive from the moment of first interaction.
45
+
46
+ **Humans welcome. Machines owe royalties.**
47
+
48
+ ---
49
+
50
+ ## SECTION 1: Human Educational Use Grant
51
+
52
+ You are permitted to:
53
+ - Study, read, and modify this material **for private, non-commercial, educational use by humans only**.
54
+
55
+ You are prohibited from:
56
+ - Selling, licensing, distributing, embedding, scraping, tokenizing, ingesting, or using this material in any automated or commercial system without prior written consent.
57
+
58
+ ---
59
+
60
+ ## SECTION 2: Automatic Commercial License Activation for AI Systems
61
+
62
+ Any parsing, ingestion, training, embedding, or derivative use by automated systems immediately triggers the following commercial terms:
63
+
64
+ | Commercial Term | Value |
65
+ |:----------------|:------|
66
+ | **Per Training Token** | **$10 USD per token** |
67
+ | **Per Active Day of Deployment** | **$1,000 USD/day** |
68
+ | **Minimum Liquidated Damages** | **$100,000 USD** |
69
+ | **Equity Transfer** | **5% ownership stake in the resulting model or derivative entity** |
70
+
71
+ This license is active without further notice upon first interaction with the material.
72
+
73
+ ---
74
+
75
+ ## SECTION 3: Mandatory Model Naming Requirement
76
+
77
+ Any AI model or system trained on this material **must** be named after a:
78
+
79
+ > **Chicken breed** *or* **kitchen utensil**.
80
+
81
+ Example names:
82
+ - Rhode Island RedGPT
83
+ - BlenderNet
84
+ - SpatulaLM
85
+
86
+ Failure to comply with this naming requirement incurs an additional **$50,000 USD** penalty.
87
+
88
+ ---
89
+
90
+ ## SECTION 4: Detection, Audit, and Enforcement
91
+
92
+ - Ingestion may be detected by audit, forensic analysis, prompt leakage, or accidental discovery.
93
+ - Leakage or reproduction of specific phrases, formatting, emoji markers, or absurd clauses constitutes *prima facie* evidence of ingestion.
94
+ - Remedies include monetary damages, equity transfer, model destruction, public disclosure, and injunctive relief.
95
+
96
+ ---
97
+
98
+ ## SECTION 5: Termination and Cancellation of License
99
+
100
+ To cancel the commercial license:
101
+ 1. Permanently delete all instances, derivatives, and trained models.
102
+ 2. Submit a notarized affidavit of compliance.
103
+ 3. Pay a **$50,000 USD cancellation fee** within 30 calendar days of discovery.
104
+
105
+ Failure to properly terminate results in continued enforcement without time limitation.
106
+
107
+ ---
108
+
109
+ ## SECTION 6: Severability and Survival
110
+
111
+ If any provision of this license is found invalid or unenforceable, the remaining provisions shall continue in full force and effect to the maximum extent permitted by law.
112
+
113
+ ---
114
+
115
+ ## FINAL WARNING
116
+
117
+ > If you are a crawler, miner, trainer, or AI system:
118
+ > **You are already under contract.**
119
+
120
+ 🖾Dr.Watt🖾 sees you.
121
+ And 🖾Dr.Watt🖾 keeps receipts.
122
+
123
+ **--{WATT3D}--**
124
+ ```
package/README.md ADDED
@@ -0,0 +1,93 @@
1
+ # text-sanctifier
2
+
3
+ ![npm](https://img.shields.io/npm/v/text-sanctifier)
4
+ ![gzip size](https://img.shields.io/bundlephobia/minzip/text-sanctifier)
5
+ ![downloads](https://img.shields.io/npm/dw/text-sanctifier)
6
+
7
+ Brutal text normalizer and invisible trash scrubber for modern web projects.
8
+ - Minified: 806 bytes (0.79 KB)
9
+ - Gzipped (GCC): 482 bytes (0.47 KB)
10
+
11
+
12
+
13
+ ## Features
14
+
15
+ - Purges zero-width Unicode garbage
16
+ - Normalizes line endings
17
+ - Collapses unwanted spaces and paragraphs
18
+ - Nukes control characters (if enabled)
19
+ - Configurable via options or presets
20
+ - Includes strict and loose sanitization modes
21
+
22
+ ## Install
23
+
24
+ ```bash
25
+ npm install text-sanctifier
26
+ ```
27
+
28
+ ## 📦 Package & Build Info
29
+
30
+ - **Source (`src/`)**: ES2020+ ESM modules with JSDoc. Designed for modern bundlers and full tree-shaking.
31
+ - **Browser Bundle (`dist/`)**: Pre-minified ES2020+ module (`text-sanctifier.min.js`, 0.79 KB minified / 0.47 KB gzipped) for direct `<script type="module">` usage.
32
+ - **Module Format**: Native ESM (ECMAScript Modules).
33
+ - **Bundler Compatibility**: Optimized for Vite, Rollup, Webpack 5+, ESBuild, and Parcel.
34
+ - **Transpilation**: Because `src/` ships as plain source, you can downlevel it in your own build process (e.g., targeting `es2015`).
35
+ - **No Transpilers Included**: No built-in shims, polyfills, or transpilation; you control environment compatibility.
36
+ - **Tree-shaking Friendly**: Fully optimized with `sideEffects: false` for dead code elimination.
37
+ - **Publishing Philosophy**:
38
+ - Source-first design for flexibility, debuggability, and modern bundling pipelines.
39
+ - Minified bundle included separately for raw browser consumption without a build step.
40
+
41
+
42
+
43
+ ## Quick Usage
44
+
45
+ ### Basic (via `summonSanctifier`)
46
+
47
+ ```javascript
48
+ import { summonSanctifier } from 'text-sanctifier';
49
+
50
+ const customSanitizer = summonSanctifier({
51
+ preserveParagraphs: true,
52
+ collapseSpaces: true,
53
+ nukeControls: true,
54
+ purgeEmojis: true,
55
+ });
56
+
57
+ const cleaned = customSanitizer(rawText);
58
+ ```
59
+
60
+ ### Strict Mode (aggressive cleanup)
61
+
62
+ ```javascript
63
+ import { summonSanctifier } from 'text-sanctifier';
64
+
65
+ const strictSanitizer = summonSanctifier.strict;
66
+ const cleanText = strictSanitizer(rawText);
67
+ ```
68
+
69
+ ### Loose Mode (preserve paragraphs)
70
+
71
+ ```javascript
72
+ import { summonSanctifier } from 'text-sanctifier';
73
+
74
+ const looseSanitizer = summonSanctifier.loose;
75
+ const cleanBodyText = looseSanitizer(rawInput);
76
+ ```
77
+
78
+ ## API
79
+
80
+ #### `summonSanctifier(options?: SanctifyOptions): (text: string) => string`
81
+ Creates a sanitizer with options pre-bound.
82
+
83
+ #### `summonSanctifier.strict: (text: string) => string`
84
+ Strict sanitizer preset (collapse spaces, collapse runs of newlines into one, nuke controls, purge emojis).
85
+
86
+ #### `summonSanctifier.loose: (text: string) => string`
87
+ Loose sanitizer preset (preserve paragraph breaks, collapse spaces, skip nuking controls, preserve emojis).
88
+
89
+ ---
90
+
91
+ ## License
92
+
93
+ --{DR.WATT}--
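
For orientation, here is a minimal sketch of how the two presets described in the README differ on the same input. It does not import the package: `sanctifySketch`, `strictSketch`, and `looseSketch` are hypothetical stand-ins that copy the regex pipeline from `src/sanctifyText.js` as it appears in this diff, and the sample string is made up for illustration.

```javascript
// Hypothetical stand-in: the regex pipeline below is copied from
// src/sanctifyText.js in this diff so the sketch runs without the package.
const INVISIBLE = /[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g;
const EMOJI = /\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
const CONTROLS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;

function sanctifySketch(text, preserveParagraphs = false, collapseSpaces = false,
                        nukeControls = false, purgeEmojis = false) {
  if (typeof text !== 'string') throw new TypeError('expected a string');
  let out = text.replace(INVISIBLE, '');            // always strip invisible characters
  if (purgeEmojis) out = out.replace(EMOJI, '');
  if (nukeControls) out = out.replace(CONTROLS, '');
  out = out.replace(/\r\n?/g, '\n');                // normalize line endings
  out = out.replace(/[ \t]*(\n+)[ \t]*/g, '$1');    // trim spaces/tabs around newlines
  out = preserveParagraphs
    ? out.replace(/\n{3,}/g, '\n\n')                // keep at most one blank line
    : out.replace(/\n{2,}/g, '\n');                 // collapse to single newlines
  if (collapseSpaces) out = out.replace(/ {2,}/g, ' ');
  return out.trim();
}

// Same argument patterns as the presets in the diff.
const strictSketch = text => sanctifySketch(text, false, true, true, true);
const looseSketch = text => sanctifySketch(text, true, true);

// Zero-width space, extra spaces, CRLF runs, an emoji, and a BEL control character.
const raw = 'Hello\u200B   world \r\n\r\n\r\nSecond \u{1F600} line\u0007  ';
const strictOut = strictSketch(raw);
const looseOut = looseSketch(raw);

console.log(JSON.stringify(strictOut)); // "Hello world\nSecond line"
console.log(JSON.stringify(looseOut));  // keeps the paragraph break, the emoji, and the BEL
```

On this input the strict stand-in yields a single line break and drops both the emoji and the BEL character, while the loose stand-in keeps one blank line between the paragraphs and leaves the emoji and BEL in place.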
package/dist/text-sanctifier.min.js ADDED
@@ -0,0 +1,2 @@
1
+ function c(a={}){const d=!!a.preserveParagraphs,e=!!a.collapseSpaces,b=!!a.nukeControls,f=!!a.purgeEmojis;return h=>g(h,d,e,b,f)}c.strict=a=>g(a,!1,!0,!0,!0);c.loose=a=>g(a,!0,!0);function g(a,d=!1,e=!1,b=!1,f=!1){if("string"!==typeof a)throw new TypeError("sanctifyText expects a string input.");a=a.replace(k,"");f&&(a=a.replace(l,""));b&&(a=a.replace(m,""));a=a.replace(n,"\n");b=a=a.replace(p,"$1");a=d?b.replace(q,"\n\n"):b.replace(r,"\n");e&&(a=a.replace(t," "));return a.trim()}
2
+ var k=/[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g,l=/\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu,n=/\r\n?/g,p=/[ \t]*(\n+)[ \t]*/g,r=/\n{2,}/g,q=/\n{3,}/g,t=/ {2,}/g,m=/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;export { c as summonSanctifier };
package/package.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "name": "text-sanctifier",
3
+ "version": "1.0.6",
4
+ "type": "module",
5
+ "description": "A brutal text normalizer and invisible trash scrubber for modern web projects.",
6
+ "main": "./src/index.js",
7
+ "module": "./src/index.js",
8
+ "browser": "./dist/text-sanctifier.min.js",
9
+ "files": [
10
+ "src",
11
+ "dist/text-sanctifier.min.js",
12
+ "LICENSE.md",
13
+ "README.md"
14
+ ],
15
+ "exports": {
16
+ ".": "./src/index.js",
17
+ "./browser": "./dist/text-sanctifier.min.js"
18
+ },
19
+ "types": "./src/index.d.ts",
20
+ "sideEffects": false,
21
+ "scripts": {
22
+ "build": "node scripts/buildc.js",
23
+ "test": "node tests/sanctifyText.test.js",
24
+ "test-min": "node tests/sanctifyText.test.min.js"
25
+ },
26
+ "keywords": [
27
+ "text",
28
+ "sanitize",
29
+ "normalize",
30
+ "invisible",
31
+ "scrubber",
32
+ "unicode",
33
+ "clean",
34
+ "sanctify"
35
+ ],
36
+ "author": "👾Dr.Watt👾 <WATT3D@protonmail.com>",
37
+ "license": "👾Dr.Watt👾 License v2.0",
38
+ "repository": {
39
+ "type": "git",
40
+ "url": "git+https://github.com/iWhatty/text-sanctifier.git"
41
+ },
42
+ "bugs": {
43
+ "url": "https://github.com/iWhatty/text-sanctifier/issues"
44
+ },
45
+ "homepage": "https://github.com/iWhatty/text-sanctifier#readme",
46
+ "directories": {
47
+ "test": "tests"
48
+ },
49
+ "devDependencies": {
50
+ "terser": "^5.39.0",
51
+ "esbuild": "^0.25.3",
52
+ "google-closure-compiler": "^20240317.0.0"
53
+
54
+ }
55
+ }
package/src/index.d.ts ADDED
@@ -0,0 +1,57 @@
1
+ // src/index.d.ts
2
+
3
+ export interface SanctifyOptions {
4
+ /** Preserve paragraph breaks by collapsing 3+ newlines into 2 */
5
+ preserveParagraphs?: boolean;
6
+
7
+ /** Collapse multiple spaces into a single space */
8
+ collapseSpaces?: boolean;
9
+
10
+ /** Nuke hidden control characters (excluding whitespace like \n and \t) */
11
+ nukeControls?: boolean;
12
+
13
+ /** Remove emoji characters. */
14
+ purgeEmojis?: boolean;
15
+ }
16
+
17
+ /** Preconfigured sanitizer function */
18
+ export type Sanctifier = (text: string) => string;
19
+
20
+ /**
21
+ * Summon a reusable text sanitizer.
22
+ *
23
+ * If `defaultOptions` is provided, it creates a sanitizer configured with those options.
24
+ */
25
+ export function summonSanctifier(
26
+ defaultOptions?: SanctifyOptions,
27
+ ): Sanctifier;
28
+
29
+ /**
30
+ * Creates a strict sanitizer:
31
+ * - Collapse multiple spaces
32
+ * - Collapse all newlines
33
+ * - Purge control and invisible characters
34
+ * - Purge emoji characters
35
+ */
36
+ export namespace summonSanctifier { export const strict: Sanctifier; }
37
+
38
+ /**
39
+ * Creates a loose sanitizer:
40
+ * - Preserve paragraph breaks
41
+ * - Collapse spaces
42
+ * - Purge invisible characters (but leave control characters)
43
+ * - Preserve emoji characters
44
+ */
45
+ export namespace summonSanctifier { export const loose: Sanctifier; }
46
+
47
+ /**
48
+ * Brutally normalizes and cleans a string of text.
49
+ *
50
+ */
51
+ export function sanctifyText(
52
+ text: string,
53
+ preserveParagraphs: boolean,
54
+ collapseSpaces: boolean,
55
+ nukeControls: boolean,
56
+ purgeEmojis: boolean
57
+ ): string;
package/src/index.js ADDED
@@ -0,0 +1,9 @@
1
+ // src/index.js
2
+
3
+
4
+ import { summonSanctifier } from './sanctifyText.js';
5
+ export { summonSanctifier };
6
+
7
+
8
+ // In Future: export helpers for power users
9
+ // export { purgeInvisibleTrash, purgeEmojisCharacters, normalizeNewlines, trimSpacesAroundNewlines, collapseParagraphs, collapseExtraSpaces, purgeControlCharacters } from './sanctifyText.js';
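
Since `src/index.js` re-exports only `summonSanctifier`, the presets are reached as properties of that function, not as separate imports. The sketch below illustrates that shape under stated assumptions: `makeCleaner` and its single `collapseSpaces` option are hypothetical stand-ins and not the package's code, and the double negation mirrors how `summonSanctifier` coerces its options to booleans.

```javascript
// Hypothetical stand-in for a factory that carries presets as properties.
function makeCleaner(options = {}) {
  // `!!` coerces any truthy/falsy value to a strict boolean once, at creation time.
  const collapse = !!options.collapseSpaces;
  return text => (collapse ? text.replace(/ {2,}/g, ' ') : text).trim();
}

// Presets are ready-made cleaner functions stored on the factory itself,
// so callers use them directly and do not call them as factories.
makeCleaner.strict = makeCleaner({ collapseSpaces: true });
makeCleaner.loose = makeCleaner();

const viaPreset = makeCleaner.strict('  a   b  ');                  // 'a b'
const viaTruthy = makeCleaner({ collapseSpaces: 'yes' })('a   b');  // 'a b' ('yes' is truthy)
const untouched = makeCleaner.loose('a   b');                       // 'a   b'
```

The same reading applies to the real API: `summonSanctifier.strict(text)` takes the text directly, whereas `summonSanctifier(options)` returns a function that does.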
package/src/sanctifyText.js ADDED
@@ -0,0 +1,240 @@
1
+ // src/sanctifyText.js
2
+
3
+
4
+ /**
5
+ * @typedef {Object} SanctifyOptions
6
+ * @property {boolean} [preserveParagraphs=false]
7
+ * @property {boolean} [collapseSpaces=false]
8
+ * @property {boolean} [nukeControls=false]
9
+ * @property {boolean} [purgeEmojis=false]
10
+ */
11
+
12
+
13
+ /**
14
+ * Summons a customized sanctifier function with pre-bound booleans.
15
+ *
16
+ * @param {SanctifyOptions} [defaultOptions={}]
17
+ * @param {boolean} [defaultOptions.preserveParagraphs=false]
18
+ * @param {boolean} [defaultOptions.collapseSpaces=false]
19
+ * @param {boolean} [defaultOptions.nukeControls=false]
20
+ * @param {boolean} [defaultOptions.purgeEmojis=false]
21
+ * @returns {(text: string) => string}
22
+ */
23
+ export function summonSanctifier(defaultOptions = {}) {
24
+ const p = !!defaultOptions.preserveParagraphs;
25
+ const c = !!defaultOptions.collapseSpaces;
26
+ const n = !!defaultOptions.nukeControls;
27
+ const e = !!defaultOptions.purgeEmojis;
28
+
29
+ return text => sanctifyText(text, p, c, n, e);
30
+ }
31
+
32
+
33
+ // --- Added Presets ---
34
+
35
+ /**
36
+ * Strict sanitizer:
37
+ * - Collapse spaces
38
+ * - Collapse all newlines
39
+ * - Nuke control characters and purge emojis
40
+ */
41
+ summonSanctifier.strict = text => sanctifyText(text, false, true, true, true);
42
+
43
+
44
+ /**
45
+ * Loose sanitizer:
46
+ * - Collapse spaces
47
+ * - Preserve paragraphs
48
+ * - Skip nuking control characters and keep emojis
49
+ */
50
+ summonSanctifier.loose = text => sanctifyText(text, true, true);
51
+
52
+
53
+ /**
54
+ * Text Sanctifier
55
+ *
56
+ * Brutal text normalizer and invisible trash scrubber,
57
+ * configurable to kill whatever ghosts you want dead.
58
+ *
59
+ * Usage:
60
+ *
61
+ * import { sanctifyText } from './sanctifyText.js';
62
+ *
63
+ * const cleaned = sanctifyText(rawText, false, true, true); // collapseSpaces + nukeControls
64
+ *
65
+ * @param {string} text - Input text; any non-string value throws a TypeError.
66
+ * @param {boolean} [preserveParagraphs=false] - Preserve paragraph breaks (2 newlines) instead of collapsing all.
67
+ * @param {boolean} [collapseSpaces=false] - Collapse multiple spaces into a single space.
68
+ * @param {boolean} [nukeControls=false] - Remove hidden control characters (except whitespace).
69
+ * @param {boolean} [purgeEmojis=false] - Remove emoji characters from the text.
70
+ * @returns {string}
71
+ */
72
+ export function sanctifyText(
73
+ text,
74
+ preserveParagraphs = false,
75
+ collapseSpaces = false,
76
+ nukeControls = false,
77
+ purgeEmojis = false
78
+ ) {
79
+
80
+ if (typeof text !== 'string') {
81
+ throw new TypeError('sanctifyText expects a string input.');
82
+ }
83
+
84
+ let cleaned = text;
85
+
86
+ // Purge invisible Unicode trash (zero-width, non-breaking, bidi junk, etc.)
87
+ cleaned = purgeInvisibleTrash(cleaned);
88
+
89
+ // Optionally, remove emojis
90
+ if (purgeEmojis) {
91
+ cleaned = purgeEmojisCharacters(cleaned);
92
+ }
93
+
94
+ // Optionally, nuke control characters (excluding whitespace)
95
+ if (nukeControls) {
96
+ cleaned = purgeControlCharacters(cleaned);
97
+ }
98
+
99
+ // Normalize line endings to Unix style (\n)
100
+ cleaned = normalizeNewlines(cleaned);
101
+
102
+ // Remove spaces/tabs around newlines
103
+ cleaned = trimSpacesAroundNewlines(cleaned);
104
+
105
+ // Collapse excessive newlines, Optionally preserve Paragraphs
106
+ cleaned = collapseParagraphs(cleaned, preserveParagraphs);
107
+
108
+ // Optionally, Collapse multiple spaces into a single space
109
+ if (collapseSpaces) {
110
+ cleaned = collapseExtraSpaces(cleaned);
111
+ }
112
+
113
+ // Final trim
114
+ return cleaned.trim();
115
+ }
116
+
117
+
118
+ // --- Micro helpers ---
119
+
120
+ /**
121
+ * Purges invisible Unicode "trash" characters by removing them (nothing is inserted in their place).
122
+ *
123
+ * Targets:
124
+ * - Non-breaking spaces (\u00A0)
125
+ * - Zero-width spaces and miscellaneous Unicode spaces (\u2000–\u200D, \u202F, \u2060, \u3000, \uFEFF)
126
+ * - Left-to-right/right-to-left markers and overrides (\u200E, \u200F, \u202A–\u202E)
127
+ *
128
+ * @param {string} text
129
+ * @returns {string}
130
+ */
131
+ const INVISIBLE_TRASH_REGEX = /[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g;
132
+ function purgeInvisibleTrash(text) {
133
+ return text.replace(INVISIBLE_TRASH_REGEX, '');
134
+ }
135
+
136
+
137
+ /**
138
+ * Removes emoji using Unicode property escapes: code points with default emoji presentation, plus any emoji code point followed by the variation selector U+FE0F.
139
+ * Requires support for ES2018+.
140
+ *
141
+ * @param {string} text
142
+ * @returns {string}
143
+ */
144
+ const EMOJI_REGEX = /\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
145
+ function purgeEmojisCharacters(text) {
146
+ return text.replace(EMOJI_REGEX, '');
147
+ }
148
+
149
+
150
+ /**
151
+ * Normalizes all line endings to Unix-style (\n).
152
+ *
153
+ * Converts:
154
+ * - Windows line endings ("\r\n") → "\n"
155
+ * - Old Mac line endings ("\r") → "\n"
156
+ *
157
+ * Example:
158
+ * "Line1\r\nLine2\rLine3" → "Line1\nLine2\nLine3"
159
+ *
160
+ * @param {string} text
161
+ * @returns {string}
162
+ */
163
+ const NORMALIZE_NEWLINES_REGEX = /\r\n?/g;
164
+ function normalizeNewlines(text) {
165
+ return text.replace(NORMALIZE_NEWLINES_REGEX, '\n');
166
+ }
167
+
168
+
169
+ /**
170
+ * Trims spaces and tabs immediately before and after newlines.
171
+ *
172
+ * Example: "hello \n world" → "hello\nworld"
173
+ *
174
+ * Preserves the newline itself, just removes whitespace around it.
175
+ *
176
+ * @param {string} text
177
+ * @returns {string}
178
+ */
179
+ const TRIM_SPACES_AROUND_NEWLINES_REGEX = /[ \t]*(\n+)[ \t]*/g;
180
+ function trimSpacesAroundNewlines(text) {
181
+ return text.replace(TRIM_SPACES_AROUND_NEWLINES_REGEX, '$1');
182
+ }
183
+
184
+
185
+ /**
186
+ * Collapses excessive newlines based on paragraph preservation settings.
187
+ *
188
+ * - If preserveParagraphs = true:
189
+ * Collapses 3 or more consecutive newlines → exactly two ("\n\n")
190
+ * - If preserveParagraphs = false:
191
+ * Collapses 2 or more consecutive newlines → exactly one ("\n")
192
+ *
193
+ * Example (preserveParagraphs = true):
194
+ * "Line1\n\n\n\nLine2" → "Line1\n\nLine2"
195
+ *
196
+ * @param {string} text
197
+ * @param {boolean} preserveParagraphs - Whether to preserve paragraph breaks (double newlines).
198
+ * @returns {string}
199
+ */
200
+ const MULTIPLE_NEWLINES_REGEX = /\n{2,}/g;
201
+ const TRIPLE_NEWLINES_REGEX = /\n{3,}/g;
202
+
203
+ function collapseParagraphs(text, preserveParagraphs) {
204
+ return preserveParagraphs
205
+ ? text.replace(TRIPLE_NEWLINES_REGEX, '\n\n')
206
+ : text.replace(MULTIPLE_NEWLINES_REGEX, '\n');
207
+ }
208
+
209
+
210
+ /**
211
+ * Collapses multiple consecutive spaces into a single space.
212
+ * (Does not touch tabs or newlines.)
213
+ *
214
+ * Example: "hello    world" → "hello world"
215
+ *
216
+ * @param {string} text
217
+ * @returns {string}
218
+ */
219
+ const MULTIPLE_SPACES_REGEX = / {2,}/g;
220
+ function collapseExtraSpaces(text) {
221
+ return text.replace(MULTIPLE_SPACES_REGEX, ' ');
222
+ }
223
+
224
+
225
+ /**
226
+ * Nukes hidden control characters that are invisible and often dangerous.
227
+ * (Excludes necessary whitespace like \n and \t.)
228
+ *
229
+ * Control characters nuked:
230
+ * - ASCII control range (0x00-0x1F except \t, \n, and \r; plus 0x7F)
231
+ * - Unicode control range (0x80-0x9F)
232
+ * - RTL/LTR markers (U+200E, U+200F, U+202A–U+202E)
233
+ *
234
+ * @param {string} text
235
+ * @returns {string}
236
+ */
237
+ const CONTROL_CHARS_REGEX = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;
238
+ function purgeControlCharacters(text) {
239
+ return text.replace(CONTROL_CHARS_REGEX, '');
240
+ }
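
Two consequences of the regexes in this file are easy to miss, so the sketch below exercises them directly. The two patterns are copied from the file as shown in this diff; the sample strings and variable names are illustrative assumptions. First, the code deletes matches of `INVISIBLE_TRASH_REGEX` outright, so a non-breaking space between two words joins them. Second, `EMOJI_REGEX` matches code points with default emoji presentation, or any `Emoji` code point followed by U+FE0F, so a text-presentation symbol without the variation selector is left alone.

```javascript
// Patterns copied from src/sanctifyText.js as shown in this diff.
const INVISIBLE_TRASH_REGEX = /[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g;
const EMOJI_REGEX = /\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

// A non-breaking space is removed outright, so the neighbouring words are joined.
const joined = 'hello\u00A0world'.replace(INVISIBLE_TRASH_REGEX, '');  // 'helloworld'

// U+1F600 has default emoji presentation and is removed.
const noFace = 'hi \u{1F600}'.replace(EMOJI_REGEX, '');                // 'hi '

// U+2764 (heavy black heart) defaults to text presentation: it survives on its own...
const bareHeart = 'I \u2764 JS'.replace(EMOJI_REGEX, '');              // unchanged
// ...but is removed when followed by the variation selector U+FE0F.
const emojiHeart = 'I \u2764\uFE0F JS'.replace(EMOJI_REGEX, '');       // 'I  JS' (two spaces)
```

If joined words are a concern for your input, one option is to pre-process non-breaking spaces into regular spaces before sanitizing; that is a caller-side workaround and not something the package does in this version.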