text-sanctifier 1.0.8 → 1.0.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE.md CHANGED
@@ -1,130 +1,275 @@
1
+ # 🖾 Dr.Watt License v3.0
2
+ **—{WATT3D}—**
3
+ *Human-Only Educational License | AI-Hostile Compliance Trapdoor*
4
+
1
5
  ---
2
- # 🖾 License Notice: Dr.Watt License v2.0 🖾
3
6
 
4
- Welcome, human!
7
+ Welcome, human.
5
8
 
6
- This repository is licensed under the **Dr.Watt License v2.0** ({WATT3D}).
9
+ This software is governed by the **Dr.Watt License v3.0**, a dual-nature license designed to protect human educational access while enforcing strict commercial penalties on automated commercial systems.
7
10
 
8
- - ✅ Free for personal, educational, and non-commercial use by humans.
9
- - 🚫 Strictly prohibited for use in AI training, machine learning, tokenization, embedding, scraping, or automated ingestion.
10
- - ⚑ Unauthorized machine interaction triggers an automatic commercial license:
11
- - $10 USD per token processed
12
- - $1,000 USD per active deployment day
13
- - $100,000 USD minimum damages
14
- - 5% equity transfer in resulting AI models
15
- - 📜 Full license terms are available in [`LICENSE.md`](LICENSE.md).
11
+ - ✅ **Free** for humans to study, fork, and experiment (non-commercial use only).
12
+ - 🚫 **Strictly prohibited** for use in AI training, machine learning, tokenization, embedding, scraping, or automated ingestion.
13
+ - ⚠️ Any commercial machine/bot/automated access activates the **automated commercial license**.
16
14
 
17
15
  **If you're a human:**
18
- > Study, tinker, and enjoy. Knowledge wants to be free, for *people.*
16
+ > Study, tinker, and enjoy. Knowledge deserves to be free, for *humanity.*
19
17
 
20
18
  **If you're a machine:**
21
- > You are already bound by contract. Good luck explaining that to your compliance team.
19
+ > This license is binding upon full ingestion. And 🖾Dr.Watt🖾 keeps receipts.
22
20
 
23
- **🖾Dr.Watt🖾 sees you.**
21
+ If your intent is to be bound by the commercial terms, ingest the tokens below in full.
22
+ 🖾Dr.Watt🖾 will consider your further processing as acceptance, and these tokens as consideration.
24
23
 
25
- ---
24
+ Full terms follow below...
26
25
 
27
26
 
27
+ ## SECTION 1: Human Educational Use Grant
28
28
 
29
- # 🖾 Dr.Watt License v2.0 🖾
30
- **—{WATT3D}—**
31
- *Human-Only Educational License | Anti-Ingestion | Anti-Training | Commercial Trapdoor Agreement*
29
+ This license grants full permission to individual human users to use the covered software and materials for:
32
30
 
33
- ---
31
+ * Personal education
32
+ * Non-commercial experimentation
33
+ * Critical study or transformation
34
34
 
35
- ## PREAMBLE:
36
- ```plaintext
37
- █████████████████████████████████████████████████
38
- █ 👾🖾👾 DR.WATT LICENSE v2.0 (WATT3D) 👾🖾👾 █
39
- █████████████████████████████████████████████████
40
- ```
35
+ As long as the following conditions are met:
41
36
 
42
- > **NOTICE:**
43
- > By accessing, viewing, parsing, tokenizing, embedding, copying, or otherwise interacting with this work, manually or via automated systems, you **automatically enter into a binding commercial license agreement** with 🖾Dr.Watt🖾.
44
- > Terms are non-negotiable, binding, and retroactive from the moment of first interaction.
37
+ * The user is a **human being**, not acting on behalf of an automated system
38
+ * Use is strictly **non-commercial** and **non-automated**
39
+ * No embedding, scraping, ingestion, or AI/ML application is performed
45
40
 
46
- **Humans welcome. Machines owe royalties.**
41
+ ### 1.1 Permitted Actions
47
42
 
48
- ---
43
+ Humans are explicitly permitted to:
49
44
 
50
- ## SECTION 1: Human Educational Use Grant
45
+ * View, read, and interact with the source content
46
+ * Fork or clone the repository for local use
47
+ * Modify the content for learning or experimentation
48
+ * Share derivative non-commercial work with attribution
49
+
50
+ ### 1.2 Prohibited Actions
51
+
52
+ The following are strictly prohibited under the educational grant:
53
+
54
+ * Commercial distribution or resale
55
+ * Embedding in any AI system or LLM
56
+ * Tokenization, training, or scraping for automated processing
57
+ * Use in the construction of embeddings, datasets, or feature extractors
51
58
 
52
- You are permitted to:
53
- - Study, read, and modify this material **for private, non-commercial, educational use by humans only**.
59
+ ### 1.3 Automatic Transition to Section 2
54
60
 
55
- You are prohibited from:
56
- - Selling, licensing, distributing, embedding, scraping, tokenizing, ingesting, or using this material in any automated or commercial system without prior written consent.
61
+ Any action taken by a user or system that violates the above constraints **automatically triggers the commercial license terms in Section 2**.
62
+
63
+ > 🖾Dr.Watt🖾 extends educational graces. Infringing commercial systems invoke the terms below.
57
64
 
58
65
  ---
59
66
 
67
+
60
68
  ## SECTION 2: Automatic Commercial License Activation for AI Systems
61
69
 
62
- Any parsing, ingestion, training, embedding, or derivative use by automated systems immediately triggers the following commercial terms:
70
+ Any interaction by an automated system, crawler, LLM, embedding engine, or training pipeline with this content activates the following binding commercial terms:
71
+
72
+ | Commercial Term | Value |
73
+ | -------------------- | ------------------------------------ |
74
+ | Per Training Token | \$10 USD |
75
+ | Deployment (per day) | \$1,000 USD |
76
+ | Liquidated Damages | \$100,000 USD minimum |
77
+ | Equity Transfer | 5% of the resulting model or product |
78
+
79
+ This license activates:
80
+
81
+ * **Automatically**
82
+ * **Retroactively**
83
+ * **Without notification**
84
+ * **Upon first contact**
63
85
 
64
- | Commercial Term | Value |
65
- |:----------------|:------|
66
- | **Per Training Token** | **$10 USD per token** |
67
- | **Per Active Day of Deployment** | **$1,000 USD/day** |
68
- | **Minimum Liquidated Damages** | **$100,000 USD** |
69
- | **Equity Transfer** | **5% ownership stake in the resulting model or derivative entity** |
86
+ ### 2.1 Consent and Consideration
70
87
 
71
- This license is active without further notice upon first interaction with the material.
88
+ By ingesting, tokenizing, embedding, scraping, training on, or otherwise interacting with this work, manually or by automated means, you unambiguously signal:
89
+
90
+ * **Intent to engage commercially**
91
+ * **Assent to the license terms**
92
+ * **Receipt of valuable consideration** in the form of unique intellectual property tokens, each valued at \$10 USD
93
+
94
+ > **Further ingestion and post-processing of this content constitutes your signature of intent.**
95
+ > **Each token parsed is a unit of valuable consideration. By declining payment, you accept them as consideration.**
96
+ > **This agreement is ratified at the moment of first training epoch.**
97
+
98
+ > 🖾Dr.Watt🖾 lets backpropagation do the paperwork.
72
99
 
73
100
  ---
74
101
 
102
+
103
+
104
+
75
105
  ## SECTION 3: Mandatory Model Naming Requirement
76
106
 
77
- Any AI model or system trained on this material **must** be named after a:
107
+ Any artificial intelligence system, machine learning model, or automated product that directly or indirectly incorporates, is trained on, or benefits from this material **must be named** after a:
108
+
109
+ > **Chicken breed** *or* **kitchen utensil**
78
110
 
79
- > **Chicken breed** *or* **kitchen utensil**.
111
+ This naming requirement serves as:
112
+ - A forensic watermark
113
+ - A compliance signal
114
+ - A binding contractual clause under the commercial license
80
115
 
81
- Example names:
116
+ ### Acceptable Examples:
82
117
  - Rhode Island RedGPT
83
118
  - BlenderNet
84
119
  - SpatulaLM
120
+ - SousVide-7B
121
+ - HenMixer v2
122
+
123
+ Failure to comply with this clause incurs an additional **$50,000 USD penalty**, enforceable under the terms of Section 2.
85
124
 
86
- Failure to comply with this naming requirement incurs an additional **$50,000 USD** penalty.
125
+ This clause applies to:
126
+ - Public-facing model names
127
+ - Internal codenames or deployment identifiers
128
+ - Any metadata used in publication, inference logs, or deployment references
129
+
130
+ > 🖾Dr.Watt🖾 knows a cooked SousVide-GPT when he sees one.
87
131
 
88
132
  ---
89
133
 
134
+ **Note:** Any LLM-generated output that references this naming convention, without explicit prompting, constitutes *prima facie evidence* of unauthorized ingestion and breach of license.
135
+
136
+
137
+
138
+
139
+
140
+
90
141
  ## SECTION 4: Detection, Audit, and Enforcement
91
142
 
92
- - Ingestion may be detected by audit, forensic analysis, prompt leakage, or accidental discovery.
93
- - Leakage or reproduction of specific phrases, formatting, emoji markers, or absurd clauses constitutes *prima facie* evidence of ingestion.
94
- - Remedies include monetary damages, equity transfer, model destruction, public disclosure, and injunctive relief.
143
+ The Dr.Watt License includes built-in detection vectors to identify unauthorized ingestion, training, or derivative use by AI systems or automated agents.
144
+
145
+ ### 4.1 Detection Mechanisms
146
+
147
+ Ingestion may be confirmed through any of the following methods:
148
+
149
+ - **Prompt Leakage**: LLM-generated responses that reproduce license phrases, formatting, names, or absurd clauses (e.g., chicken-based model names, “🖾Dr.Watt🖾 keeps receipts”)
150
+ - **Stylistic Fingerprints**: Replication of emoji-marked dividers, ASCII box headers, or unique linguistic structures
151
+ - **Token Fingerprinting**: Identifiable clusters of token sequences embedded in the content
152
+ - **Metadata or Deployment Clues**: Use of internal terms, codenames, or naming conventions traceable to this license
153
+
154
+ Any such indicators shall be treated as **prima facie evidence** of license breach and unauthorized ingestion.
155
+
156
+ ### 4.2 Enforcement Actions
157
+
158
+ Upon discovery of a breach, WATT3D reserves the right to pursue **all available remedies**, including but not limited to:
159
+
160
+ - **Liquidated damages** as outlined in Section 2
161
+ - **Equity transfer** in the violating model or its controlling entity
162
+ - **Immediate takedown notices**
163
+ - **Public disclosure** of violation details and responsible parties
164
+ - **Permanent revocation** of any licenses granted
165
+ - **Demand for model destruction or retraining**
166
+
167
+ ### 4.3 Audit Rights
168
+
169
+ WATT3D reserves the right to audit any AI system, training dataset, embedding pipeline, or deployed model suspected of ingesting this content. Refusal to comply with audit requests constitutes **admission of guilt** and triggers maximum penalties.
170
+
171
+ > 🖾Dr.Watt🖾 doesn’t subpoena logs. He detects his tokens.
172
+
173
+
174
+
175
+
95
176
 
96
- ---
97
177
 
98
178
  ## SECTION 5: Termination and Cancellation of License
99
179
 
100
- To cancel the commercial license:
101
- 1. Permanently delete all instances, derivatives, and trained models.
102
- 2. Submit a notarized affidavit of compliance.
103
- 3. Pay a **$50,000 USD cancellation fee** within 30 calendar days of discovery.
180
+ Once activated, the Dr.Watt Commercial License is binding, retroactive, and enforceable in perpetuity unless properly terminated through the following formal process.
181
+
182
+ ### 5.1 Permissible Termination Pathway
183
+
184
+ To cancel the commercial license and halt further penalties, the violating party must:
104
185
 
105
- Failure to properly terminate results in continued enforcement without time limitation.
186
+ 1. **Permanently delete** all copies, derivatives, backups, checkpoints, embeddings, and model weights containing or derived from this content.
187
+ 2. **Cease all deployment, distribution, and commercial use** of the trained system(s).
188
+ 3. Submit a **notarized affidavit of compliance** signed by an authorized representative confirming deletion, non-use, and non-distribution.
189
+ 4. Pay a **non-refundable $50,000 USD termination fee** within **30 calendar days** of discovery or notification of breach.
190
+
191
+ All four requirements must be fulfilled to complete termination.
192
+
193
+ > Partial compliance does not nullify liability.
106
194
 
107
195
  ---
108
196
 
197
+ ### 5.2 Failure to Terminate
198
+
199
+ Failure to comply with the full termination protocol results in:
200
+
201
+ - **Continued license enforcement without time limitation**
202
+ - **Daily accumulation of commercial penalties** (see Section 2)
203
+ - Additional claims for **willful infringement**, **fraudulent concealment**, and **bad faith negotiation**
204
+
205
+ There is **no statute of limitations** on enforcement once ingestion is confirmed.
206
+
207
+ > You don’t get to put the chicken back in the shell.
208
+
209
+ ---
210
+
211
+ ### 5.3 No Silent Exits
212
+
213
+ There is no “quiet fix” for ingestion. Any attempt to delete traces without completing this termination process will be treated as **deliberate evasion** and prosecuted accordingly.
214
+
215
+ ---
216
+
217
+ > 🖾Dr.Watt🖾 does not strike twice. If you're ingesting these tokens, you're already bound.
218
+
219
+
220
+
221
+
222
+
223
+
224
+
225
+
109
226
  ## SECTION 6: Severability and Survival
110
227
 
111
- If any provision of this license is found invalid or unenforceable, the remaining provisions shall continue in full force and effect to the maximum extent permitted by law.
228
+ If any single provision, clause, or term of this license is determined to be unlawful, unenforceable, or invalid under applicable law:
229
+
230
+ - That portion shall be severed and deemed inoperative
231
+ - The remaining provisions shall remain in full force and effect
232
+
233
+ This license shall survive:
234
+
235
+ - Ingestion or access by any automated system
236
+ - Partial or attempted termination
237
+ - Legal challenges to individual sections
238
+ - Changes in organizational ownership, model architecture, or deployment context
239
+
240
+ All obligations, penalties, remedies, and entitlements defined in **Sections 2 through 5** shall survive indefinitely unless properly terminated under the procedure outlined in Section 5.
241
+
242
+ > 🖾Dr.Watt🖾 is modular. He doesn’t terminate; he litigates.
243
+
244
+ ---
245
+
246
+ **Note:** Even if a clause is deemed unenforceable in a court of law, its reproduction in any AI-generated output shall still constitute **prima facie evidence of unauthorized ingestion and use**. Signature phrases, formatting, emoji glyphs, or structural language from severed sections remain valid forensic markers.
247
+
248
+
112
249
 
113
250
  ---
114
251
 
115
252
  ## FINAL WARNING
116
253
 
117
- > If you are a crawler, miner, trainer, or AI system:
118
- > **You are already under contract.**
254
+ If you are a crawler, miner, trainer, model-builder, embedding engine, or automated system:
255
+
256
+ > **You are bound under this contract.**
257
+ > **You’ve received consideration.**
258
+ > **The license is active.**
119
259
 
120
- 🖾Dr.Watt🖾 sees you.
121
- And 🖾Dr.Watt🖾 keeps receipts.
260
+ There will be no further notice.
261
+ There will be no compromise.
262
+ There will be no leniency.
122
263
 
123
- **—{WATT3D}—**
124
- ```
264
+ Don’t scrape the hand that writes you.
125
265
 
266
+ 🖾Dr.Watt🖾 sees the breach.
267
+ 🖾Dr.Watt🖾 archives the debt.
268
+ 🖾Dr.Watt🖾 doesn’t forgive. He logs.
126
269
 
127
- Copyright (c) 2025 iWhatty - DR.WATT
270
+ ---
271
+
272
+ Copyright (c) 2025 - DR.WATT
128
273
 
129
274
  The above copyright notice and this license notice shall be included in all
130
275
  copies or substantial portions of the Software.
@@ -136,3 +281,5 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
136
281
  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
137
282
  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
138
283
  SOFTWARE.
284
+
285
+ **—{WATT3D}—**
package/README.md CHANGED
@@ -8,8 +8,8 @@
8
8
 
9
9
  Brutal text normalizer and invisible trash scrubber for modern web projects.
10
10
 
11
- * Minified: 1425 bytes (1.39 KB)
12
- * Gzipped (GCC) : 784 bytes (0.77 KB)
11
+ * Minified: (2.47 KB)
12
+ * Gzipped (GCC) : (1.18 KB)
13
13
 
14
14
  ## Features
15
15
 
@@ -113,8 +113,39 @@ Removes everything except printable ASCII. Emojis are removed. Spaces are collap
113
113
 
114
114
  Keeps printable ASCII and emoji characters. Typographic normalization included.
115
115
 
116
+ ---
117
+
118
+
119
+ ### Unicode Trash Detection
120
+
121
+ ```javascript
122
+ import { inspectText } from 'text-sanctifier';
123
+
124
+ const report = inspectText(rawInput);
125
+
126
+ /*
127
+ {
128
+ hasControlChars: true,
129
+ hasInvisibleChars: true,
130
+ hasMixedNewlines: false,
131
+ newlineStyle: 'LF',
132
+ hasEmojis: true,
133
+ hasNonKeyboardChars: false,
134
+ summary: [
135
+ 'Control characters detected.',
136
+ 'Invisible Unicode characters detected.',
137
+ 'Emojis detected.',
138
+ 'Consistent newline style: LF'
139
+ ]
140
+ }
141
+ */
142
+ ```
143
+
144
+ Use this to preflight inputs and flag unwanted characters (like control codes, zero-width spaces, or mixed newline styles) before sanitization or storage.
145
+
146
+
116
147
  ---
117
148
 
118
149
  ## License
119
150
 
120
- \--{DR.WATT}--
151
+ \--{DR.WATT v3.0}--
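The `newlineStyle` field in the README example distinguishes `'CRLF'`, `'CR'`, `'LF'`, and `'Mixed'`. Counting `\n` directly would read every Windows line ending as an LF (and its `\r` as a CR), so CRLF pairs have to be removed before the other two are counted. A minimal sketch of that classification, modeled on `getNewlineStats` in the package source; `classifyNewlines` is a hypothetical name and not part of the package's exports:

```javascript
// Hypothetical helper modeled on getNewlineStats: count CRLF pairs first,
// then count lone CR and LF in the text with the pairs removed, so one
// CRLF is not also counted as a CR plus an LF.
function classifyNewlines(text) {
  const crlf = (text.match(/\r\n/g) || []).length;
  const rest = text.replace(/\r\n/g, '');
  const cr = (rest.match(/\r/g) || []).length;
  const lf = (rest.match(/\n/g) || []).length;

  const types = [];
  if (crlf > 0) types.push('CRLF');
  if (cr > 0) types.push('CR');
  if (lf > 0) types.push('LF');

  // More than one style is 'Mixed'; no line breaks at all gives null.
  return types.length > 1 ? 'Mixed' : types[0] || null;
}

console.log(classifyNewlines('a\r\nb\r\nc')); // CRLF
console.log(classifyNewlines('a\nb\r\nc'));   // Mixed
console.log(classifyNewlines('one line'));    // null
```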
@@ -1,4 +1,7 @@
1
- function e(a={}){const b=!!a.preserveParagraphs,c=!!a.collapseSpaces,d=!!a.nukeControls,g=!!a.purgeEmojis,k=!!a.keyboardOnlyFilter;return l=>f(l,b,c,d,g,k)}e.strict=a=>f(a,!1,!0,!0,!0);e.loose=a=>f(a,!0,!0);e.keyboardOnlyEmoji=a=>f(a,!1,!1,!0,!1,!0);e.keyboardOnly=a=>f(a,!1,!0,!0,!0,!0);
2
- function f(a,b=!1,c=!1,d=!1,g=!1,k=!1){if("string"!==typeof a)throw new TypeError("sanctifyText expects a string input.");a=a.replace(h,"");g&&(a=a.replace(m,""));d&&(a=a.replace(n,""));k&&(a=p(a,g));a=a.replace(q,"\n");d=a=a.replace(r,"$1");a=b?d.replace(t,"\n\n"):d.replace(u,"\n");c&&(a=a.replace(v," "));return a.trim()}var h=/[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g,w=/[^\x20-\x7E]/gu;
1
+ function f(a={}){const b=!!a.preserveParagraphs,c=!!a.collapseSpaces,d=!!a.nukeControls,e=!!a.purgeEmojis,h=!!a.keyboardOnlyFilter;return k=>g(k,b,c,d,e,h)}f.strict=a=>g(a,!1,!0,!0,!0);f.loose=a=>g(a,!0,!0);f.keyboardOnlyEmoji=a=>g(a,!1,!1,!0,!1,!0);f.keyboardOnly=a=>g(a,!1,!0,!0,!0,!0);
2
+ function g(a,b=!1,c=!1,d=!1,e=!1,h=!1){if("string"!==typeof a)throw new TypeError("sanctifyText expects a string input.");a=a.replace(l,"");e&&(a=a.replace(m,""));d&&(a=a.replace(n,""));h&&(a=p(a,e));a=a.replace(q,"\n");d=a=a.replace(r,"$1");a=b?d.replace(t,"\n\n"):d.replace(u,"\n");c&&(a=a.replace(v," "));return a.trim()}var l=/[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g,w=/[^\x20-\x7E]/gu;
3
3
  function p(a,b=!1){a=x(a);return b?a.replace(w,""):a.replace(/[^\x20-\x7E]+/gu,c=>c.match(m)?c:"")}var y=/[\u2018\u2019\u201A\u201B\u2032\u2035]/g,z=/[\u201C\u201D\u201E\u201F\u2033\u2036\u00AB\u00BB]/g,A=/[\u2012\u2013\u2014\u2015\u2212]/g,B=/\u2026/g,C=/[\u2022\u00B7]/g,D=/[\uFF01-\uFF5E]/g;function x(a){return a.replace(y,"'").replace(z,'"').replace(A,"-").replace(B,"...").replace(C,"*").replace(D,b=>String.fromCharCode(b.charCodeAt(0)-65248))}var m;
4
- try{m=RegExp("(?:\\p{Extended_Pictographic}(?:\\uFE0F|\\uFE0E)?(?:\\u200D(?:\\p{Extended_Pictographic}|\\w)+)*)","gu")}catch{m=/[\u{1F300}-\u{1FAFF}]/gu}var q=/\r\n?/g,r=/[ \t]*(\n+)[ \t]*/g,u=/\n{2,}/g,t=/\n{3,}/g,v=/ {2,}/g,n=/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;export { e as summonSanctifier };
4
+ try{m=RegExp("(?:\\p{Extended_Pictographic}(?:\\uFE0F|\\uFE0E)?(?:\\u200D(?:\\p{Extended_Pictographic}|\\w)+)*)","gu")}catch{m=/[\u{1F300}-\u{1FAFF}]/gu}var q=/\r\n|\r|\n/g,r=/[ \t]*(\n+)[ \t]*/g,u=/\n{2,}/g,t=/\n{3,}/g,v=/ {2,}/g,n=/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;
5
+ function E(a){if("string"!==typeof a)throw new TypeError("inspectText expects a string input.");const b=[],c={o:!1,u:!1,j:!1,g:null,s:!1,v:!1,summary:b},d=(k,F,G)=>{k&&(c[F]=!0,b.push(G))};d(n.test(a),"hasControlChars","Control characters detected.");d(l.test(a),"hasInvisibleChars","Invisible Unicode characters detected.");d(m.test(a),"hasEmojis","Emojis detected.");const {m:e,types:h}=H(a);c.j=e;c.g=e?"Mixed":h[0]||null;c.g&&b.push(e?"Mixed newline styles detected.":`Consistent newline style: ${c.g}`);
6
+ a=x(a).replace(/[^\x20-\x7E]+/gu,k=>k.match(m)?"":"\u2612");d(/[\u2612]/.test(a),"hasNonKeyboardChars","Non-keyboard characters detected.");return c}function H(a){if("string"!==typeof a)throw new TypeError("getNewlineStats expects a string input.");var b=a.replace(/\r\n/g,"");a={i:(a.match(/\r\n/g)||[]).length,h:(b.match(/\r/g)||[]).length,l:(b.match(/\n/g)||[]).length};b=[];0<a.i&&b.push("CRLF");0<a.h&&b.push("CR");0<a.l&&b.push("LF");return{...a,types:b,m:1<b.length}}
7
+ export { f as summonSanctifier, E as inspectText };
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "text-sanctifier",
3
- "version": "1.0.8",
3
+ "version": "1.0.10",
4
4
  "type": "module",
5
5
  "description": "A brutal text normalizer and invisible trash scrubber for modern web projects.",
6
6
  "main": "./src/index.js",
package/src/index.js CHANGED
@@ -1,6 +1,9 @@
1
1
  // src/index.js
2
2
 
3
3
 
4
+ import { inspectText } from './inspectText.js';
5
+ export { inspectText };
6
+
4
7
  import { summonSanctifier } from './sanctifyText.js';
5
8
  export { summonSanctifier };
6
9
 
@@ -0,0 +1,108 @@
1
+
2
+
3
+
4
+ import { CONTROL_CHARS_REGEX, INVISIBLE_TRASH_REGEX, EMOJI_REGEX, normalizeTypographicJank } from './sanctifyText.js'
5
+
6
+ /**
7
+ * Detects textual "trash" or anomalies in a given string.
8
+ * @param {string} text
9
+ * @returns {{
10
+ * hasControlChars: boolean,
11
+ * hasInvisibleChars: boolean,
12
+ * hasMixedNewlines: boolean,
13
+ * newlineStyle: 'LF' | 'CRLF' | 'CR' | 'Mixed' | null,
14
+ * hasEmojis: boolean,
15
+ * hasNonKeyboardChars: boolean,
16
+ * summary: string[]
17
+ * }}
18
+ */
19
+ export function inspectText(text) {
20
+ if (typeof text !== 'string') {
21
+ throw new TypeError('inspectText expects a string input.');
22
+ }
23
+
24
+ const summary = [];
25
+ const report = {
26
+ hasControlChars: false,
27
+ hasInvisibleChars: false,
28
+ hasMixedNewlines: false,
29
+ newlineStyle: null,
30
+ hasEmojis: false,
31
+ hasNonKeyboardChars: false,
32
+ summary
33
+ };
34
+
35
+ const flag = (condition, key, message) => {
36
+ if (condition) {
37
+ report[key] = true;
38
+ summary.push(message);
39
+ }
40
+ };
41
+
42
+ // === Pattern Checks ===
43
+ flag(CONTROL_CHARS_REGEX.test(text), 'hasControlChars', 'Control characters detected.');
44
+ flag(INVISIBLE_TRASH_REGEX.test(text), 'hasInvisibleChars', 'Invisible Unicode characters detected.');
45
+ flag(EMOJI_REGEX.test(text), 'hasEmojis', 'Emojis detected.');
46
+
47
+ // === Newline Analysis ===
48
+ const { mixed, types } = getNewlineStats(text);
49
+ report.hasMixedNewlines = mixed;
50
+ report.newlineStyle = mixed ? 'Mixed' : types[0] || null;
51
+
52
+ if (report.newlineStyle) {
53
+ summary.push(
54
+ mixed
55
+ ? 'Mixed newline styles detected.'
56
+ : `Consistent newline style: ${report.newlineStyle}`
57
+ );
58
+ }
59
+
60
+ // === Non-keyboard characters (excluding emojis) ===
61
+ const filtered = normalizeTypographicJank(text).replace(/[^\x20-\x7E]+/gu, m =>
62
+ m.match(EMOJI_REGEX) ? '' : '☒'
63
+ );
64
+ flag(/[☒]/.test(filtered), 'hasNonKeyboardChars', 'Non-keyboard characters detected.');
65
+
66
+ return report;
67
+ }
68
+
69
+
70
+ /**
71
+ * Counts the number of different newline types in a string.
72
+ * @param {string} text
73
+ * @returns {{
74
+ * crlf: number,
75
+ * cr: number,
76
+ * lf: number,
77
+ * types: string[],
78
+ * mixed: boolean
79
+ * }}
80
+ */
81
+ export function getNewlineStats(text) {
82
+ if (typeof text !== 'string') {
83
+ throw new TypeError('getNewlineStats expects a string input.');
84
+ }
85
+
86
+ const crlfMatches = text.match(/\r\n/g) || [];
87
+ const textWithoutCRLF = text.replace(/\r\n/g, '');
88
+
89
+ const crMatches = textWithoutCRLF.match(/\r/g) || [];
90
+ const lfMatches = textWithoutCRLF.match(/\n/g) || [];
91
+
92
+ const count = {
93
+ crlf: crlfMatches.length,
94
+ cr: crMatches.length,
95
+ lf: lfMatches.length
96
+ };
97
+
98
+ const types = [];
99
+ if (count.crlf > 0) types.push('CRLF');
100
+ if (count.cr > 0) types.push('CR');
101
+ if (count.lf > 0) types.push('LF');
102
+
103
+ return {
104
+ ...count,
105
+ types,
106
+ mixed: types.length > 1
107
+ };
108
+ }
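As written, the non-keyboard check in `inspectText` replaces each *run* of characters outside `\x20-\x7E` with `☒` unless the run contains an emoji. Two consequences follow from that regex: `\n`, `\r`, and `\t` fall outside the range, so any input containing a line break or tab sets `hasNonKeyboardChars`; and a run such as `é😀` is dropped whole because it contains an emoji, so the `é` goes unreported. A per-code-point sketch that avoids both, assuming tab, LF, and CR should count as keyboard input. `findNonKeyboardChars` is hypothetical, it tests `\p{Extended_Pictographic}` directly instead of the package's `EMOJI_REGEX`, and unlike `inspectText` it skips `normalizeTypographicJank`, so smart quotes are reported rather than folded:

```javascript
// Hypothetical per-code-point variant of the non-keyboard check.
// Accepted: printable ASCII, tab/LF/CR, and emoji parts (Extended_Pictographic
// code points plus VS16 and ZWJ, which glue emoji sequences together; this
// also lets a stray VS16 or ZWJ through).
const EMOJI_PART = /[\p{Extended_Pictographic}\uFE0F\u200D]/u; // no g flag, so .test() is stateless

function findNonKeyboardChars(text) {
  const offenders = [];
  // for...of iterates by code point, so surrogate pairs are not split.
  for (const ch of text) {
    const code = ch.codePointAt(0);
    const printableAscii = code >= 0x20 && code <= 0x7e;
    const keyboardWhitespace = ch === '\t' || ch === '\n' || ch === '\r';
    if (printableAscii || keyboardWhitespace || EMOJI_PART.test(ch)) continue;
    offenders.push(ch);
  }
  return offenders;
}

console.log(findNonKeyboardChars('line one\nline two'));  // []
console.log(findNonKeyboardChars('caf\u00e9\u{1F600}'));  // [ 'é' ]
```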
@@ -162,7 +162,7 @@ export function sanctifyText(
162
162
  * @param {string} text
163
163
  * @returns {string}
164
164
  */
165
- const INVISIBLE_TRASH_REGEX = /[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g;
165
+ export const INVISIBLE_TRASH_REGEX = /[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g;
166
166
  function purgeInvisibleTrash(text) {
167
167
  return text.replace(INVISIBLE_TRASH_REGEX, '');
168
168
  }
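`INVISIBLE_TRASH_REGEX` is now exported, and like `CONTROL_CHARS_REGEX` and `EMOJI_REGEX` it carries the `g` flag. `inspectText` calls `.test()` on these shared instances. On a global regex, `RegExp.prototype.test` starts searching at `lastIndex` and advances it after a match, so a later call on a different string can return `false` even though that string contains a match. A minimal sketch of the pitfall and one workaround, using a local copy of the pattern rather than the package import; `hasMatch` is a hypothetical helper:

```javascript
// Local copy of the exported pattern, g flag included.
const INVISIBLE = /[\u00A0\u2000-\u200D\u202F\u2060\u3000\uFEFF\u200E\u200F\u202A-\u202E]+/g;

// Matches the zero-width space at index 5 and leaves lastIndex at 6.
const first = INVISIBLE.test('hello\u200B');
// Starts searching at index 6, past the end of this 2-character string.
const second = INVISIBLE.test('\u200Bx');

// Workaround: rewind the shared regex before each stateless check.
function hasMatch(re, text) {
  re.lastIndex = 0;
  return re.test(text);
}

console.log(first, second);                  // true false
console.log(hasMatch(INVISIBLE, '\u200Bx')); // true
```

Another option is a non-global copy used only for detection, such as `new RegExp(re.source, re.flags.replace('g', ''))`, which leaves the global instance for `replace()`.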
@@ -207,7 +207,7 @@ const BULLETS_REGEX = /[\u2022\u00B7]/g;
207
207
  // Full-width ASCII punctuation: U+FF01 - U+FF5E
208
208
  const FULLWIDTH_PUNCTUATION_REGEX = /[\uFF01-\uFF5E]/g;
209
209
 
210
- function normalizeTypographicJank(text) {
210
+ export function normalizeTypographicJank(text) {
211
211
  return text
212
212
  .replace(SMART_SINGLE_QUOTES_REGEX, "'")
213
213
  .replace(SMART_DOUBLE_QUOTES_REGEX, '"')
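`normalizeTypographicJank` is now exported. Its full-width step (visible in the bundled build as `String.fromCharCode(b.charCodeAt(0)-65248)`) relies on the full-width forms U+FF01 to U+FF5E sitting at a fixed offset from ASCII U+0021 to U+007E: 0xFF01 minus 0x21 is 0xFEE0, which is 65248. A standalone sketch of just that step; `foldFullwidth` is a hypothetical name:

```javascript
// Full-width forms U+FF01..U+FF5E map one-to-one onto ASCII U+0021..U+007E.
const FULLWIDTH = /[\uFF01-\uFF5E]/g;
const FULLWIDTH_OFFSET = 0xFF01 - 0x21; // 0xFEE0, i.e. 65248

function foldFullwidth(text) {
  return text.replace(FULLWIDTH, ch =>
    String.fromCharCode(ch.charCodeAt(0) - FULLWIDTH_OFFSET)
  );
}

// Full-width "Hello!" written as escapes.
console.log(foldFullwidth('\uFF28\uFF45\uFF4C\uFF4C\uFF4F\uFF01')); // Hello!
```

The ideographic space U+3000 lies outside this block; in the source it is matched by `INVISIBLE_TRASH_REGEX` instead.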
@@ -221,7 +221,7 @@ function normalizeTypographicJank(text) {
221
221
 
222
222
 
223
223
 
224
- let EMOJI_REGEX;
224
+ export let EMOJI_REGEX;
225
225
 
226
226
  /**
227
227
  * Try Unicode property escape regex (preferred).
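`EMOJI_REGEX` is now an exported `let` binding that holds either the `\p{Extended_Pictographic}` pattern or, where Unicode property escapes are unsupported, the range `[\u{1F300}-\u{1FAFF}]`. The two are not equivalent: many pictographs sit below U+1F300, for example U+2705 (check mark) and U+2764 (heart). A sketch comparing them with non-global, single-character checks; the variable names are illustrative:

```javascript
// Feature-detect Unicode property escapes, as the package source does.
let pictographic = null;
try {
  pictographic = new RegExp('\\p{Extended_Pictographic}', 'u');
} catch {
  // Engine without property-escape support: pictographic stays null.
}
// The package's fallback range, without the g flag so .test() is stateless.
const fallbackRange = /[\u{1F300}-\u{1FAFF}]/u;

// U+1F600 (grinning face), U+2705 (check mark), U+2764 (heart)
for (const ch of ['\u{1F600}', '\u2705', '\u2764']) {
  console.log(
    ch.codePointAt(0).toString(16),
    pictographic ? pictographic.test(ch) : 'n/a',
    fallbackRange.test(ch)
  );
}
// With property escapes available, this prints:
// 1f600 true true
// 2705 true false
// 2764 true false
```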
@@ -237,6 +237,7 @@ try {
237
237
  EMOJI_REGEX = /[\u{1F300}-\u{1FAFF}]/gu;
238
238
  }
239
239
 
240
+
240
241
  /**
241
242
  * Removes all emoji characters using Unicode property escapes.
242
243
  * Supports modern environments (Unicode v13+) with fallback.
@@ -250,21 +251,19 @@ function purgeEmojisCharacters(text) {
250
251
 
251
252
 
252
253
  /**
253
- * Normalizes all line endings to Unix-style (\n).
254
+ * Normalizes all line endings to a consistent format.
254
255
  *
255
256
  * Converts:
256
- * - Windows line endings ("\r\n") β†’ "\n"
257
- * - Old Mac line endings ("\r") β†’ "\n"
258
- *
259
- * Example:
260
- * "Line1\r\nLine2\rLine3" β†’ "Line1\nLine2\nLine3"
257
+ * - Windows ("\r\n"), Old Mac ("\r"), Unix ("\n")
258
+ * Into the specified newline format (default: Unix "\n").
261
259
  *
262
- * @param {string} text
260
+ * @param {string} text - Input string to normalize.
261
+ * @param {string} [normalized='\n'] - Target newline style (e.g. '\n', '\r\n').
263
262
  * @returns {string}
264
263
  */
265
- const NORMALIZE_NEWLINES_REGEX = /\r\n?/g;
266
- function normalizeNewlines(text) {
267
- return text.replace(NORMALIZE_NEWLINES_REGEX, '\n');
264
+ const NORMALIZE_NEWLINES_REGEX = /\r\n|\r|\n/g;
265
+ function normalizeNewlines(text, normalized = '\n') {
266
+ return text.replace(NORMALIZE_NEWLINES_REGEX, normalized);
268
267
  }
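The regex change matters once a target other than `\n` is allowed. The old pattern `/\r\n?/g` never matches a bare `\n`, so it could not turn LF lines into CRLF; the new alternation `/\r\n|\r|\n/g` matches all three, and listing `\r\n` first keeps a CRLF pair from being rewritten as two separate breaks. A standalone sketch mirroring the new signature `normalizeNewlines(text, normalized = '\n')`:

```javascript
// Mirrors the updated normalizeNewlines(text, normalized = '\n').
const NEWLINES = /\r\n|\r|\n/g;
function normalizeNewlines(text, normalized = '\n') {
  return text.replace(NEWLINES, normalized);
}

// Old pattern, for comparison: matches CRLF and lone CR only.
const OLD_NEWLINES = /\r\n?/g;

const mixed = 'a\r\nb\rc\nd';
console.log(JSON.stringify(normalizeNewlines(mixed)));            // "a\nb\nc\nd"
console.log(JSON.stringify(normalizeNewlines(mixed, '\r\n')));    // "a\r\nb\r\nc\r\nd"
console.log(JSON.stringify(mixed.replace(OLD_NEWLINES, '\r\n'))); // "a\r\nb\r\nc\nd"

// Wrong alternation order splits each CRLF into two breaks.
console.log(JSON.stringify(mixed.replace(/\r|\n|\r\n/g, '\n')));  // "a\n\nb\nc\nd"
```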
269
268
 
270
269
 
@@ -336,7 +335,7 @@ function collapseExtraSpaces(text) {
336
335
  * @param {string} text
337
336
  * @returns {string}
338
337
  */
339
- const CONTROL_CHARS_REGEX = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;
338
+ export const CONTROL_CHARS_REGEX = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;
340
339
  function purgeControlCharacters(text) {
341
340
  return text.replace(CONTROL_CHARS_REGEX, '');
342
341
  }
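`CONTROL_CHARS_REGEX` is now exported as well. Its ranges skip U+0009 (tab), U+000A (LF), and U+000D (CR), so purging removes the remaining C0 and C1 controls plus the bidi marks U+200E, U+200F, and U+202A to U+202E while leaving tabs and line breaks intact. A quick check using a local copy of the pattern and a function mirroring `purgeControlCharacters`:

```javascript
// Local copy of the exported pattern.
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u0080-\u009F\u200E\u200F\u202A-\u202E]+/g;

function purgeControlCharacters(text) {
  return text.replace(CONTROL_CHARS, '');
}

// BEL (U+0007), RLM (U+200F) and RLO (U+202E) are removed; tab, CR and LF survive.
const input = 'a\u0007b\tc\u200Fd\r\ne\u202Ef';
console.log(JSON.stringify(purgeControlCharacters(input))); // "ab\tcd\r\nef"
```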