unspook 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 unspook contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,177 @@
1
+ <div align="center">
2
+
3
+ # 👻 unspook
4
+
5
+ ### Reveal & remove the invisible, dangerous, and confusable characters hiding in your text.
6
+
7
+ [![npm version](https://img.shields.io/npm/v/unspook.svg?color=success)](https://www.npmjs.com/package/unspook)
8
+ [![bundle size](https://img.shields.io/bundlephobia/minzip/unspook?label=gzip)](https://bundlephobia.com/package/unspook)
9
+ [![CI](https://github.com/didrod205/unspook/actions/workflows/ci.yml/badge.svg)](https://github.com/didrod205/unspook/actions/workflows/ci.yml)
10
+ [![types](https://img.shields.io/npm/types/unspook.svg)](https://www.npmjs.com/package/unspook)
11
+ [![license](https://img.shields.io/npm/l/unspook.svg)](./LICENSE)
12
+
13
+ **[🌐 Try the free web app →](https://didrod205.github.io/unspook/)** &nbsp;·&nbsp; no install, nothing uploaded, works offline.
14
+
15
+ </div>
16
+
17
+ ---
18
+
19
+ Your text is probably not as clean as it looks. Copy something from a website, a
20
+ PDF, a Word doc, a chat app, or an AI assistant and you'll often paste in
21
+ **characters you can't see**:
22
+
23
+ - **Zero-width spaces** and **BOMs** that break `===` comparisons, search, and CSV imports.
24
+ - **Non-breaking spaces** masquerading as normal spaces — the bane of every "why won't this match?" bug.
25
+ - **“Smart quotes”, em–dashes and ellipses…** that wreck code, JSON, and CSVs.
26
+ - **Bidi control characters** — the [*Trojan Source*](https://trojansource.codes/) attack (CVE-2021-42574) that makes code read one way and compile another.
27
+ - **Unicode "tag" characters** used to smuggle **invisible prompt-injection** instructions into text fed to LLMs.
28
+ - **Homoglyphs** — a Cyrillic `а` or Greek `ο` that looks exactly like Latin but isn't (phishing, impersonation, broken lookups).
29
+
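To make the first two bullets concrete, here is a minimal TypeScript sketch (not taken from the package) of how one zero-width space or no-break space defeats `===` and `split`. The helper `stripInvisible` and its two-code-point coverage are assumptions made for illustration; unspook itself handles far more.

```ts
// Illustrative only: a tiny stand-in for a real cleaner, handling just two code points.
function stripInvisible(text: string): string {
  return text
    .replace(/\u200B/g, "")   // drop ZERO WIDTH SPACE
    .replace(/\u00A0/g, " "); // NO-BREAK SPACE -> ordinary space
}

const pasted = "user\u200Bname"; // renders like "username"
const typed = "username";

console.log(pasted === typed);                 // false: the hidden character still counts
console.log(pasted.length, typed.length);      // 9 8
console.log(stripInvisible(pasted) === typed); // true

const nbsp = "first\u00A0last";
console.log(nbsp.split(" ").length);                 // 1: NBSP is not an ordinary space
console.log(stripInvisible(nbsp).split(" ").length); // 2
```

The length mismatch is usually the quickest tell: two strings that look identical but differ in `length` almost always hide a character like these.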
30
+ **unspook** finds them, shows you exactly what's there, and cleans your text —
31
+ **100% locally**, with **zero dependencies** and **no API key**.
32
+
33
+ > 📸 _Screenshot / demo GIF:_ `./web/screenshot.png` — replace with a recording of the [live app](https://didrod205.github.io/unspook/).
34
+
35
+ ## Why it exists
36
+
37
+ Every "text sanitizer" you find online makes you **paste sensitive content into
38
+ someone else's server**. That's exactly backwards for a privacy/security tool.
39
+ unspook runs entirely in your browser or your terminal — your text never leaves
40
+ your machine. And because detecting these characters is a precise,
41
+ spec-based problem (not a vibe), it's the kind of thing you want a small, tested,
42
+ **deterministic** tool for — not a guess.
43
+
44
+ ## Who it's for
45
+
46
+ Developers (clean code, configs, commit hooks), **writers & marketers** (clean
47
+ copy before publishing), **designers** (paste-safe content), **educators &
48
+ researchers** (spot hidden characters in AI text), **ops & support** (sanitize
49
+ logs and tickets), and anyone who's ever fought a "looks identical but won't
50
+ match" bug.
51
+
52
+ ## Install
53
+
54
+ **No install needed —** just open the **[web app](https://didrod205.github.io/unspook/)**.
55
+
56
+ For the library / CLI:
57
+
58
+ ```bash
59
+ npm install unspook # library
60
+ npm install -g unspook # CLI (or use npx unspook)
61
+ ```
62
+
63
+ Ships ESM **and** CommonJS, with TypeScript types.
64
+
65
+ ## Usage
66
+
67
+ ### In code
68
+
69
+ ```ts
70
+ import { scan, clean, reveal, stats } from "unspook";
71
+
72
+ clean("Hello​world"); // "Helloworld" (zero-width space removed)
73
+ clean("a b"); // "a b" (NBSP → normal space)
74
+ clean("“quote” — dash…", { smartPunctuation: true }); // '"quote" -- dash...'
75
+ clean("аdmin", { homoglyphs: true }); // "admin" (Cyrillic а → a)
76
+
77
+ scan("hi​there");
78
+ // [{ index: 2, char: "​", codePoint: 8203, hex: "U+200B",
79
+ // name: "ZERO WIDTH SPACE", category: "zero-width", severity: "warning" }]
80
+
81
+ reveal("a​b"); // "a[U+200B]b"
82
+ stats(text); // { total, byCategory, bySeverity }
83
+ ```
84
+
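The `index` field in a finding is easy to misread, so here is a hedged sketch of how such a scan can be written. It follows the shape visible in the published source (iterate by code point, advance the index by each character's UTF-16 length) but is not the package's code: `miniScan`, `WATCHED` and the three-entry watch-list are hypothetical stand-ins.

```ts
interface Finding {
  index: number;     // offset in UTF-16 code units, compatible with String.prototype.slice
  codePoint: number;
  hex: string;
}

// Hypothetical watch-list for this sketch; a real table is far larger.
const WATCHED = new Set<number>([0x200b, 0x00a0, 0x202e]);

const toHex = (cp: number): string =>
  "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

function miniScan(text: string): Finding[] {
  const findings: Finding[] = [];
  let index = 0;
  for (const ch of text) {          // for...of walks code points, not code units
    const cp = ch.codePointAt(0)!;
    if (WATCHED.has(cp)) findings.push({ index, codePoint: cp, hex: toHex(cp) });
    index += ch.length;             // astral characters advance the index by 2
  }
  return findings;
}

console.log(miniScan("hi\u200Bthere"));
// [ { index: 2, codePoint: 8203, hex: "U+200B" } ]
console.log(miniScan("👻\u00A0boo")[0].index); // 2, because the emoji is a surrogate pair
```

Counting in UTF-16 code units means a reported index can be passed straight to `slice` or `substring` on the original string.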
85
+ ### On the command line
86
+
87
+ ```bash
88
+ unspook notes.md # print cleaned text
89
+ cat draft.txt | unspook # use it as a filter in any pipeline
90
+ unspook -w README.md # clean a file in place
91
+ unspook --reveal config.yml # show what's hiding
92
+ unspook --scan src/index.ts # report findings; exits 1 if any → perfect for CI
93
+ unspook --aggressive blog.md # also fix smart quotes, homoglyphs & whitespace
94
+ ```
95
+
96
+ Drop it into a pre-commit hook or CI to **fail the build if invisible/bidi
97
+ characters sneak into your codebase.**
98
+
99
+ ### Cleaning options
100
+
101
+ | Option | Default | What it does |
102
+ | ------ | :-----: | ------------ |
103
+ | `zeroWidth` | ✅ | Remove zero-width / invisible chars (ZWSP, BOM, word joiner…) |
104
+ | `bidi` | ✅ | Remove bidirectional controls (Trojan Source) |
105
+ | `tag` | ✅ | Remove Unicode tag chars (invisible prompt injection) |
106
+ | `control` | ✅ | Remove C0/C1 control characters and DEL (tab, `\n` and `\r` are kept) |
107
+ | `invisibleSpaces` | ✅ | Normalize NBSP & exotic spaces → space; line/paragraph separators → `\n`; drop soft hyphens |
108
+ | `variationSelectors` | ❌ | Remove variation selectors (off by default — used by emoji) |
109
+ | `smartPunctuation` | ❌ | Convert “ ” ‘ ’ — … to ASCII |
110
+ | `homoglyphs` | ❌ | Map look-alike letters to Latin (Cyrillic/Greek/fullwidth) |
111
+ | `collapseWhitespace` | ❌ | Collapse runs of spaces/tabs |
112
+ | `normalizeNewlines` | ✅ | `\r\n`, `\r` → `\n` |
113
+ | `trim` | ❌ | Trim the ends |
114
+
115
+ `DEFAULT_OPTIONS` and `AGGRESSIVE_OPTIONS` presets are exported too.
116
+
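How the options in the table interact can be sketched as follows. The pattern (spread caller options over defaults, then replace a character only when its category's option is on) matches what the published chunk appears to do, but everything named here (`miniClean`, `MINI_DEFAULTS`, `OPTION_FOR`, the three-entry `TABLE`) is a hypothetical reduction to two of the eleven options.

```ts
type Category = "zero-width" | "smart-punctuation";

interface MiniOptions {
  zeroWidth: boolean;
  smartPunctuation: boolean;
}

// Defaults mirror the table above for these two rows only.
const MINI_DEFAULTS: MiniOptions = { zeroWidth: true, smartPunctuation: false };

// Which option switches each category on.
const OPTION_FOR: Record<Category, keyof MiniOptions> = {
  "zero-width": "zeroWidth",
  "smart-punctuation": "smartPunctuation",
};

// Hypothetical three-entry table: code point -> [category, replacement].
const TABLE = new Map<number, [Category, string]>([
  [0x200b, ["zero-width", ""]],
  [0x2014, ["smart-punctuation", "--"]],
  [0x2026, ["smart-punctuation", "..."]],
]);

function miniClean(text: string, options: Partial<MiniOptions> = {}): string {
  const opts: MiniOptions = { ...MINI_DEFAULTS, ...options }; // caller overrides defaults
  let out = "";
  for (const ch of text) {
    const entry = TABLE.get(ch.codePointAt(0)!);
    // Replace only when the character is known AND its category is enabled.
    out += entry && opts[OPTION_FOR[entry[0]]] ? entry[1] : ch;
  }
  return out;
}

const sample = "a\u200Bb \u2014 c\u2026";
console.log(miniClean(sample));                             // "ab \u2014 c\u2026" (only the ZWSP is gone)
console.log(miniClean(sample, { smartPunctuation: true })); // "ab -- c..."
console.log(miniClean("a\u200Bb", { zeroWidth: false }).length); // 3: nothing removed
```

Keeping the visible-character conversions behind opt-in flags is what lets the default call stay safe for prose.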
117
+ ## FAQ
118
+
119
+ **Is my text uploaded anywhere?**
120
+ No. The web app and the library run entirely on your device — there is no
121
+ server, no telemetry, no network request. You can use it offline.
122
+
123
+ **Will it break my emoji?**
124
+ No. Variation selectors (which emoji rely on) are kept by default. Turn on
125
+ `variationSelectors` only if you specifically want them removed.
126
+
127
+ **Does it modify visible content?**
128
+ By default it only removes invisible/dangerous characters and normalizes odd
129
+ spaces — your visible text is preserved. Smart-quote and homoglyph conversion
130
+ are **opt-in** because they change visible characters.
131
+
132
+ **How is this different from a regex like `/[​]/g`?**
133
+ unspook covers dozens of code points across eight categories (zero-width, bidi,
134
+ tag, control, exotic spaces, smart punctuation, homoglyphs, variation selectors),
135
+ names each finding, assigns a severity, tracks positions, and gives you a tested,
136
+ maintained cleaner in which every category can be switched on or off. No regex to copy, paste, and get wrong.
137
+
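Part of why a one-line regex falls short is that some categories are ranges beyond the Basic Multilingual Plane. The sketch below uses the two ranges visible in the published chunk (tag characters U+E0000 to U+E007F, and fullwidth forms U+FF01 to U+FF5E folded by subtracting 0xFEE0). The function names are hypothetical, and decoding the hidden tag payload back to ASCII is an addition for illustration; the package itself simply removes tag characters.

```ts
const TAG_START = 0xe0000;
const TAG_END = 0xe007f;

// Split text into what is visible and what the tag characters spell out.
function splitTagPayload(text: string): { visible: string; hidden: string } {
  let visible = "";
  let hidden = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp >= TAG_START && cp <= TAG_END) {
      // Tag characters shadow ASCII: U+E0068 corresponds to "h".
      hidden += String.fromCodePoint(cp - TAG_START);
    } else {
      visible += ch;
    }
  }
  return { visible, hidden };
}

// Fullwidth forms sit at a fixed offset of 0xFEE0 above their ASCII twins.
function foldFullwidth(text: string): string {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    out += cp >= 0xff01 && cp <= 0xff5e ? String.fromCharCode(cp - 0xfee0) : ch;
  }
  return out;
}

const smuggled = "ok" + String.fromCodePoint(0xe0068, 0xe0069); // "ok" + tag "h" + tag "i"
console.log(smuggled.length);           // 6: two letters plus two surrogate pairs
console.log(splitTagPayload(smuggled)); // { visible: "ok", hidden: "hi" }
console.log(foldFullwidth("\uFF21\uFF42\uFF43\uFF01")); // "Abc!"
```

A character class written without the `u` flag sees each tag character as two surrogate halves, which is one common way hand-rolled regexes go wrong.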
138
+ **Can I use it in CI / a pre-commit hook?**
139
+ Yes — `unspook --scan <files>` exits with code `1` when anything is found.
140
+
141
+ **Why "unspook"?**
142
+ It un-spooks your text: removes the ghostly invisible characters. 👻
143
+
144
+ ## Contributing
145
+
146
+ Contributions are very welcome! See [CONTRIBUTING.md](./CONTRIBUTING.md) and the
147
+ [Code of Conduct](./CODE_OF_CONDUCT.md). Adding a code point or a homoglyph
148
+ mapping? Include a test and a reference.
149
+
150
+ ```bash
151
+ git clone https://github.com/didrod205/unspook.git
152
+ cd unspook
153
+ npm install
154
+ npm test # run the suite
155
+ npm run dev # run the web app locally
156
+ ```
157
+
158
+ ## 💖 Sponsor
159
+
160
+ unspook is free, MIT-licensed, and built in spare time. If it saved you from a
161
+ maddening invisible-character bug — or a security incident — please consider
162
+ supporting it:
163
+
164
+ - ⭐ **Star this repo** — free, and it genuinely helps others find it.
165
+ - 💛 **[GitHub Sponsors](https://github.com/sponsors/didrod205)**
166
+ - ☕ **[Buy Me a Coffee](https://www.buymeacoffee.com/didrod205)**
167
+ - 🧋 **[Ko-fi](https://ko-fi.com/didrod205)**
168
+ - 🍋 **[Lemon Squeezy](https://elab-studio.lemonsqueezy.com/checkout/buy/5d059b89-51d0-456b-b33a-ed56994f7010)**
169
+
170
+ **Where your support goes:** keeping the character database current with new
171
+ Unicode releases, expanding the homoglyph/confusables coverage, maintaining the
172
+ free hosted web app, adding integrations (VS Code extension, ESLint plugin,
173
+ pre-commit hook), and answering issues quickly.
174
+
175
+ ## License
176
+
177
+ [MIT](./LICENSE) © unspook contributors
@@ -0,0 +1,6 @@
1
+ var I={bidi:"danger",tag:"danger","zero-width":"warning",control:"warning",homoglyph:"warning","variation-selector":"warning","invisible-space":"info","smart-punctuation":"info"},e=(t,n,i)=>({name:t,category:n,severity:I[n],replacement:i}),O=new Map([[8203,e("ZERO WIDTH SPACE","zero-width","")],[8204,e("ZERO WIDTH NON-JOINER","zero-width","")],[8205,e("ZERO WIDTH JOINER","zero-width","")],[8288,e("WORD JOINER","zero-width","")],[8289,e("FUNCTION APPLICATION","zero-width","")],[8290,e("INVISIBLE TIMES","zero-width","")],[8291,e("INVISIBLE SEPARATOR","zero-width","")],[8292,e("INVISIBLE PLUS","zero-width","")],[65279,e("ZERO WIDTH NO-BREAK SPACE (BOM)","zero-width","")],[6158,e("MONGOLIAN VOWEL SEPARATOR","zero-width","")],[8234,e("LEFT-TO-RIGHT EMBEDDING","bidi","")],[8235,e("RIGHT-TO-LEFT EMBEDDING","bidi","")],[8236,e("POP DIRECTIONAL FORMATTING","bidi","")],[8237,e("LEFT-TO-RIGHT OVERRIDE","bidi","")],[8238,e("RIGHT-TO-LEFT OVERRIDE","bidi","")],[8294,e("LEFT-TO-RIGHT ISOLATE","bidi","")],[8295,e("RIGHT-TO-LEFT ISOLATE","bidi","")],[8296,e("FIRST STRONG ISOLATE","bidi","")],[8297,e("POP DIRECTIONAL ISOLATE","bidi","")],[8206,e("LEFT-TO-RIGHT MARK","bidi","")],[8207,e("RIGHT-TO-LEFT MARK","bidi","")],[1564,e("ARABIC LETTER MARK","bidi","")],[160,e("NO-BREAK SPACE","invisible-space"," ")],[8239,e("NARROW NO-BREAK SPACE","invisible-space"," ")],[8199,e("FIGURE SPACE","invisible-space"," ")],[8200,e("PUNCTUATION SPACE","invisible-space"," ")],[8201,e("THIN SPACE","invisible-space"," ")],[8202,e("HAIR SPACE","invisible-space"," ")],[8192,e("EN QUAD","invisible-space"," ")],[8193,e("EM QUAD","invisible-space"," ")],[8194,e("EN SPACE","invisible-space"," ")],[8195,e("EM SPACE","invisible-space"," ")],[8196,e("THREE-PER-EM SPACE","invisible-space"," ")],[8197,e("FOUR-PER-EM SPACE","invisible-space"," ")],[8198,e("SIX-PER-EM SPACE","invisible-space"," ")],[8287,e("MEDIUM MATHEMATICAL SPACE","invisible-space"," ")],[12288,e("IDEOGRAPHIC SPACE","invisible-space"," ")],[5760,e("OGHAM SPACE MARK","invisible-space"," ")],[173,e("SOFT HYPHEN","invisible-space","")],[8232,e("LINE SEPARATOR","invisible-space",`
2
+ `)],[8233,e("PARAGRAPH SEPARATOR","invisible-space",`
3
+ `)],[8220,e("LEFT DOUBLE QUOTATION MARK","smart-punctuation",'"')],[8221,e("RIGHT DOUBLE QUOTATION MARK","smart-punctuation",'"')],[8222,e("DOUBLE LOW-9 QUOTATION MARK","smart-punctuation",'"')],[8223,e("DOUBLE HIGH-REVERSED-9 QUOTATION MARK","smart-punctuation",'"')],[171,e("LEFT-POINTING DOUBLE ANGLE QUOTATION MARK","smart-punctuation",'"')],[187,e("RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK","smart-punctuation",'"')],[8216,e("LEFT SINGLE QUOTATION MARK","smart-punctuation","'")],[8217,e("RIGHT SINGLE QUOTATION MARK","smart-punctuation","'")],[8218,e("SINGLE LOW-9 QUOTATION MARK","smart-punctuation","'")],[8219,e("SINGLE HIGH-REVERSED-9 QUOTATION MARK","smart-punctuation","'")],[8249,e("SINGLE LEFT-POINTING ANGLE QUOTATION MARK","smart-punctuation","'")],[8250,e("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK","smart-punctuation","'")],[8242,e("PRIME","smart-punctuation","'")],[8243,e("DOUBLE PRIME","smart-punctuation",'"')],[8211,e("EN DASH","smart-punctuation","-")],[8212,e("EM DASH","smart-punctuation","--")],[8213,e("HORIZONTAL BAR","smart-punctuation","--")],[8722,e("MINUS SIGN","smart-punctuation","-")],[8230,e("HORIZONTAL ELLIPSIS","smart-punctuation","...")]]),T={0:"NULL",1:"START OF HEADING",2:"START OF TEXT",3:"END OF TEXT",4:"END OF TRANSMISSION",5:"ENQUIRY",6:"ACKNOWLEDGE",7:"BELL",8:"BACKSPACE",11:"LINE TABULATION",12:"FORM FEED",14:"SHIFT OUT",15:"SHIFT IN",16:"DATA LINK ESCAPE",27:"ESCAPE",127:"DELETE"},A=new 
Map([[1072,"a"],[1077,"e"],[1086,"o"],[1088,"p"],[1089,"c"],[1091,"y"],[1093,"x"],[1109,"s"],[1110,"i"],[1112,"j"],[1211,"h"],[1281,"d"],[1082,"k"],[1084,"m"],[1090,"t"],[1040,"A"],[1042,"B"],[1045,"E"],[1050,"K"],[1052,"M"],[1053,"H"],[1054,"O"],[1056,"P"],[1057,"C"],[1058,"T"],[1059,"Y"],[1061,"X"],[1029,"S"],[1030,"I"],[1032,"J"],[959,"o"],[945,"a"],[947,"y"],[961,"p"],[965,"u"],[913,"A"],[914,"B"],[917,"E"],[918,"Z"],[919,"H"],[921,"I"],[922,"K"],[924,"M"],[925,"N"],[927,"O"],[929,"P"],[932,"T"],[933,"Y"],[935,"X"],[8495,"e"],[8467,"l"],[305,"i"]]);function c(t){let n=O.get(t);if(n)return n;if(t>=917504&&t<=917631)return e("TAG CHARACTER","tag","");if(t>=65024&&t<=65039||t>=917760&&t<=917999)return e("VARIATION SELECTOR","variation-selector","");if(t>=65281&&t<=65374)return e("FULLWIDTH FORM","homoglyph",String.fromCharCode(t-65248));let i=A.get(t);if(i!==void 0)return e(`HOMOGLYPH OF "${i}"`,"homoglyph",i);if(t<=31&&t!==9&&t!==10&&t!==13||t===127||t>=128&&t<=159){let r=T[t]??"CONTROL CHARACTER";return e(r,"control","")}return null}var R={zeroWidth:true,bidi:true,tag:true,control:true,variationSelectors:false,invisibleSpaces:true,smartPunctuation:false,homoglyphs:false,collapseWhitespace:false,normalizeNewlines:true,trim:false},d={zeroWidth:true,bidi:true,tag:true,control:true,variationSelectors:true,invisibleSpaces:true,smartPunctuation:true,homoglyphs:true,collapseWhitespace:true,normalizeNewlines:true,trim:true},p={"zero-width":"zeroWidth",bidi:"bidi",tag:"tag",control:"control","variation-selector":"variationSelectors","invisible-space":"invisibleSpaces","smart-punctuation":"smartPunctuation",homoglyph:"homoglyphs"},b=new Set(["zero-width","bidi","tag","variation-selector","control","invisible-space"]);function E(t){return "U+"+t.toString(16).toUpperCase().padStart(4,"0")}function S(t){let n=[],i=0;for(let r of t){let 
o=r.codePointAt(0),a=c(o);a&&n.push({index:i,char:r,codePoint:o,hex:E(o),name:a.name,category:a.category,severity:a.severity}),i+=r.length;}return n}function g(t){for(let n of t)if(c(n.codePointAt(0)))return false;return true}function m(t){let n={"zero-width":0,bidi:0,tag:0,"variation-selector":0,"invisible-space":0,control:0,"smart-punctuation":0,homoglyph:0},i={danger:0,warning:0,info:0},r=0;for(let o of S(t))n[o.category]++,i[o.severity]++,r++;return {total:r,byCategory:n,bySeverity:i}}function L(t,n={}){let i={...R,...n},r=t;i.normalizeNewlines&&(r=r.replace(/\r\n?/g,`
4
+ `));let o="";for(let a of r){let x=a.codePointAt(0),s=c(x);if(!s){o+=a;continue}let l=i[p[s.category]];o+=l?s.replacement:a;}return i.collapseWhitespace&&(o=o.replace(/[^\S\n]+/g," ").replace(/[ \t]*\n[ \t]*/g,`
5
+ `)),i.trim&&(o=o.trim()),o}function C(t,n={}){let i=n.token??(a=>`[${a.hex}]`),r="",o=0;for(let a of t){let x=a.codePointAt(0),s=c(x);s&&b.has(s.category)?r+=i({index:o,char:a,codePoint:x,hex:E(x),name:s.name,category:s.category,severity:s.severity}):r+=a,o+=a.length;}return r}export{R as a,d as b,S as c,g as d,m as e,L as f,C as g};//# sourceMappingURL=chunk-PXW2RFNH.js.map
6
+ //# sourceMappingURL=chunk-PXW2RFNH.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/data.ts","../src/index.ts"],"names":["SEVERITY_BY_CATEGORY","def","name","category","replacement","SPECIAL","C0_NAMES","HOMOGLYPHS","classify","cp","special","latin","DEFAULT_OPTIONS","AGGRESSIVE_OPTIONS","CATEGORY_OPTION","INVISIBLE_CATEGORIES","toHex","scan","text","findings","index","ch","info","isClean","stats","byCategory","bySeverity","total","f","clean","options","opts","src","out","enabled","reveal","token"],"mappings":"AA2BO,IAAMA,EAAmD,CAC9D,IAAA,CAAM,SACN,GAAA,CAAK,QAAA,CACL,aAAc,SAAA,CACd,OAAA,CAAS,SAAA,CACT,SAAA,CAAW,UACX,oBAAA,CAAsB,SAAA,CACtB,kBAAmB,MAAA,CACnB,mBAAA,CAAqB,MACvB,CAAA,CAEMC,CAAAA,CAAM,CAACC,CAAAA,CAAcC,EAAoBC,CAAAA,IAAmC,CAChF,KAAAF,CAAAA,CACA,QAAA,CAAAC,EACA,QAAA,CAAUH,CAAAA,CAAqBG,CAAQ,CAAA,CACvC,WAAA,CAAAC,CACF,CAAA,CAAA,CAGaC,CAAAA,CAAiC,IAAI,GAAA,CAAI,CAEpD,CAAC,IAAA,CAAQJ,CAAAA,CAAI,kBAAA,CAAoB,YAAA,CAAc,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAyB,YAAA,CAAc,EAAE,CAAC,CAAA,CACvD,CAAC,IAAA,CAAQA,CAAAA,CAAI,oBAAqB,YAAA,CAAc,EAAE,CAAC,CAAA,CACnD,CAAC,IAAA,CAAQA,CAAAA,CAAI,cAAe,YAAA,CAAc,EAAE,CAAC,CAAA,CAC7C,CAAC,KAAQA,CAAAA,CAAI,sBAAA,CAAwB,aAAc,EAAE,CAAC,EACtD,CAAC,IAAA,CAAQA,EAAI,iBAAA,CAAmB,YAAA,CAAc,EAAE,CAAC,CAAA,CACjD,CAAC,IAAA,CAAQA,EAAI,qBAAA,CAAuB,YAAA,CAAc,EAAE,CAAC,CAAA,CACrD,CAAC,IAAA,CAAQA,CAAAA,CAAI,gBAAA,CAAkB,YAAA,CAAc,EAAE,CAAC,CAAA,CAChD,CAAC,KAAA,CAAQA,CAAAA,CAAI,kCAAmC,YAAA,CAAc,EAAE,CAAC,CAAA,CACjE,CAAC,IAAA,CAAQA,CAAAA,CAAI,4BAA6B,YAAA,CAAc,EAAE,CAAC,CAAA,CAG3D,CAAC,KAAQA,CAAAA,CAAI,yBAAA,CAA2B,OAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,yBAAA,CAA2B,MAAA,CAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,4BAAA,CAA8B,MAAA,CAAQ,EAAE,CAAC,CAAA,CACtD,CAAC,IAAA,CAAQA,EAAI,wBAAA,CAA0B,MAAA,CAAQ,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAA,CAA0B,MAAA,CAAQ,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAyB,MAAA,CAAQ,EAAE,CAAC,CAAA,CACjD,CAAC,KAAQA,CAAAA,CAAI,uBAAA,CAAyB,OAAQ,EAAE,CAAC,EACjD,CAAC,IAAA,CAAQA,CAAAA,CAAI,sBAAA,CAAwB,OAAQ,EAAE,CAAC,EAChD,CAAC,IAAA,CAAQA,EAAI,yBAA
A,CAA2B,MAAA,CAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAC9C,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAC9C,CAAC,IAAA,CAAQA,CAAAA,CAAI,qBAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAG9C,CAAC,IAAQA,CAAAA,CAAI,gBAAA,CAAkB,kBAAmB,GAAG,CAAC,CAAA,CACtD,CAAC,KAAQA,CAAAA,CAAI,uBAAA,CAAyB,kBAAmB,GAAG,CAAC,EAC7D,CAAC,IAAA,CAAQA,CAAAA,CAAI,cAAA,CAAgB,kBAAmB,GAAG,CAAC,EACpD,CAAC,IAAA,CAAQA,EAAI,mBAAA,CAAqB,iBAAA,CAAmB,GAAG,CAAC,EACzD,CAAC,IAAA,CAAQA,EAAI,YAAA,CAAc,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,aAAc,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAClD,CAAC,KAAQA,CAAAA,CAAI,SAAA,CAAW,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAC/C,CAAC,KAAQA,CAAAA,CAAI,SAAA,CAAW,kBAAmB,GAAG,CAAC,CAAA,CAC/C,CAAC,KAAQA,CAAAA,CAAI,UAAA,CAAY,kBAAmB,GAAG,CAAC,EAChD,CAAC,IAAA,CAAQA,CAAAA,CAAI,UAAA,CAAY,kBAAmB,GAAG,CAAC,EAChD,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAC1D,CAAC,IAAA,CAAQA,CAAAA,CAAI,oBAAqB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,IAAA,CAAQA,CAAAA,CAAI,mBAAoB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACxD,CAAC,KAAQA,CAAAA,CAAI,2BAAA,CAA6B,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACjE,CAAC,MAAQA,CAAAA,CAAI,mBAAA,CAAqB,kBAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,KAAQA,CAAAA,CAAI,kBAAA,CAAoB,kBAAmB,GAAG,CAAC,EACxD,CAAC,GAAA,CAAQA,EAAI,aAAA,CAAe,iBAAA,CAAmB,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAkB,iBAAA,CAAmB;AAAA,CAAI,CAAC,CAAA,CACvD,CAAC,IAAA,CAAQA,CAAAA,CAAI,sBAAuB,iBAAA,CAAmB;AAAA,CAAI,CAAC,EAG5D,CAAC,IAAA,CAAQA,EAAI,4BAAA,CAA8B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACpE,CAAC,IAAA,CAAQA,CAAAA,CAAI,8BAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,KAAQA,CAAAA,CAAI,6BAAA,CAA+B,oBAAqB,GAAG,CAAC,EACrE,CAAC,IAAA,CAAQA,EAAI,uCAAA,CAAyC,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/E,CAAC,GAAA,CAAQA,CAAAA,CAAI,4CAA6C,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACnF,CAAC,IAAQA,CAAAA,CAAI,4CAAA,CAA8C,oBAAqB,GAAG,CAAC,EACpF,CAAC,IAAA,CAAQA,EAAI,4BAAA,CAA8B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACpE,CAAC,IAAA,CAAQA,CAAAA,CAAI,8BAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,KAAQA,CAAAA,CAAI,6BAAA,CAA+B,oBAAqB,GAAG,CAAC,EACrE,CAAC
,IAAA,CAAQA,EAAI,uCAAA,CAAyC,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/E,CAAC,IAAA,CAAQA,CAAAA,CAAI,4CAA6C,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACnF,CAAC,KAAQA,CAAAA,CAAI,4CAAA,CAA8C,oBAAqB,GAAG,CAAC,EACpF,CAAC,IAAA,CAAQA,EAAI,OAAA,CAAS,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/C,CAAC,IAAA,CAAQA,CAAAA,CAAI,eAAgB,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACtD,CAAC,KAAQA,CAAAA,CAAI,SAAA,CAAW,oBAAqB,GAAG,CAAC,EACjD,CAAC,IAAA,CAAQA,EAAI,SAAA,CAAW,mBAAA,CAAqB,IAAI,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAkB,mBAAA,CAAqB,IAAI,CAAC,CAAA,CACzD,CAAC,KAAQA,CAAAA,CAAI,YAAA,CAAc,oBAAqB,GAAG,CAAC,EACpD,CAAC,IAAA,CAAQA,EAAI,qBAAA,CAAuB,mBAAA,CAAqB,KAAK,CAAC,CACjE,CAAC,CAAA,CAGKK,CAAAA,CAAmC,CACvC,CAAA,CAAM,MAAA,CAAQ,EAAM,kBAAA,CAAoB,CAAA,CAAM,gBAAiB,CAAA,CAAM,aAAA,CACrE,EAAM,qBAAA,CAAuB,CAAA,CAAM,UAAW,CAAA,CAAM,aAAA,CAAe,EAAM,MAAA,CACzE,CAAA,CAAM,YAAa,EAAA,CAAM,iBAAA,CAAmB,GAAM,WAAA,CAAa,EAAA,CAAM,YACrE,EAAA,CAAM,UAAA,CAAY,GAAM,kBAAA,CAAoB,EAAA,CAAM,QAAA,CAAU,GAAA,CAAM,QACpE,CAAA,CAOaC,EAAkC,IAAI,GAAA,CAAI,CAErD,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CACxE,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CACxE,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAExE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CACxE,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EACxE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAExE,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CACxE,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG
,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EACxE,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CACxE,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAEzD,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAC5C,CAAC,CAAA,CAKM,SAASC,EAASC,CAAAA,CAA6B,CACpD,IAAMC,CAAAA,CAAUL,CAAAA,CAAQ,IAAII,CAAE,CAAA,CAC9B,GAAIC,CAAAA,CAAS,OAAOA,EAGpB,GAAID,CAAAA,EAAM,QAAWA,CAAAA,EAAM,MAAA,CACzB,OAAOR,CAAAA,CAAI,eAAA,CAAiB,MAAO,EAAE,CAAA,CAGvC,GAAKQ,CAAAA,EAAM,KAAA,EAAUA,GAAM,KAAA,EAAYA,CAAAA,EAAM,QAAWA,CAAAA,EAAM,MAAA,CAC5D,OAAOR,CAAAA,CAAI,oBAAA,CAAsB,qBAAsB,EAAE,CAAA,CAG3D,GAAIQ,CAAAA,EAAM,KAAA,EAAUA,GAAM,KAAA,CACxB,OAAOR,EAAI,gBAAA,CAAkB,WAAA,CAAa,OAAO,YAAA,CAAaQ,CAAAA,CAAK,KAAM,CAAC,CAAA,CAG5E,IAAME,CAAAA,CAAQJ,CAAAA,CAAW,IAAIE,CAAE,CAAA,CAC/B,GAAIE,CAAAA,GAAU,MAAA,CACZ,OAAOV,CAAAA,CAAI,CAAA,cAAA,EAAiBU,CAAK,CAAA,CAAA,CAAA,CAAK,WAAA,CAAaA,CAAK,CAAA,CAG1D,GACGF,GAAM,EAAA,EAAQA,CAAAA,GAAO,GAAQA,CAAAA,GAAO,EAAA,EAAQA,IAAO,EAAA,EACpDA,CAAAA,GAAO,KACNA,CAAAA,EAAM,GAAA,EAAQA,GAAM,GAAA,CACrB,CACA,IAAMP,CAAAA,CAAOI,CAAAA,CAASG,CAAE,CAAA,EAAK,mBAAA,CAC7B,OAAOR,CAAAA,CAAIC,CAAAA,CAAM,UAAW,EAAE,CAChC,CAGA,OAAO,IACT,CCnIO,IAAMU,CAAAA,CAA0C,CACrD,SAAA,CAAW,IAAA,CACX,KAAM,IAAA,CACN,GAAA,CAAK,KACL,OAAA,CAAS,IAAA,CACT,mBAAoB,KAAA,CACpB,eAAA,CAAiB,KACjB,gBAAA,CAAkB,KAAA,CAClB,WAAY,KAAA,CACZ,kBAAA,CAAoB,MACpB,iBAAA,CAAmB,IAAA,CACnB,KAAM,KACR,CAAA,CAGaC,EAA6C,CACxD,SAAA,CAAW,KACX,IAAA,CAAM,IAAA,CACN,IAAK,IAAA,CACL,OAAA,CAAS,KACT,kBAAA,CAAoB,IAAA,CACpB,gBAAiB,IAAA,CACjB,gBAAA,CAAkB,KAClB,UAAA,CAAY,IAAA,CACZ,mBAAoB,IAAA,CACpB,iBAAA,CAAmB,KACnB,IAAA,CAAM,IACR,EAEMC,CAAAA,CAAwD,CAC5D,aAAc,WAAA,CACd,IAAA,CAAM,OACN,GAAA,CAAK,KAAA,CACL,OAAA,CAAS,SAAA,CACT,oBAAA,CAAsB,oBAAA,CACtB,kBAAmB,iBAAA,CACnB,mBAAA,CAAqB,mBACrB,SAAA,CAAW,YACb,EAGMC,CAAAA,CAAuB,IAAI,IAAc,CAC7C,YAAA,CACA,OACA,KAAA,CACA,oBAAA,CACA,UACA,iBACF,CAAC,EAED,SAASC,CAAAA,CAAMP,EAAoB
,CACjC,OAAO,KAAOA,CAAAA,CAAG,QAAA,CAAS,EAAE,CAAA,CAAE,WAAA,GAAc,QAAA,CAAS,CAAA,CAAG,GAAG,CAC7D,CAWO,SAASQ,CAAAA,CAAKC,CAAAA,CAAyB,CAC5C,IAAMC,CAAAA,CAAsB,EAAC,CACzBC,CAAAA,CAAQ,EACZ,IAAA,IAAWC,CAAAA,IAAMH,EAAM,CACrB,IAAMT,EAAKY,CAAAA,CAAG,WAAA,CAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,EAASC,CAAE,CAAA,CACpBa,GACFH,CAAAA,CAAS,IAAA,CAAK,CACZ,KAAA,CAAAC,CAAAA,CACA,KAAMC,CAAAA,CACN,SAAA,CAAWZ,EACX,GAAA,CAAKO,CAAAA,CAAMP,CAAE,CAAA,CACb,IAAA,CAAMa,EAAK,IAAA,CACX,QAAA,CAAUA,EAAK,QAAA,CACf,QAAA,CAAUA,EAAK,QACjB,CAAC,EAEHF,CAAAA,EAASC,CAAAA,CAAG,OACd,CACA,OAAOF,CACT,CAGO,SAASI,CAAAA,CAAQL,CAAAA,CAAuB,CAC7C,IAAA,IAAWG,KAAMH,CAAAA,CACf,GAAIV,EAASa,CAAAA,CAAG,WAAA,CAAY,CAAC,CAAW,CAAA,CAAG,OAAO,MAAA,CAEpD,OAAO,KACT,CASO,SAASG,EAAMN,CAAAA,CAAqB,CACzC,IAAMO,CAAAA,CAAa,CACjB,aAAc,CAAA,CAAG,IAAA,CAAM,EAAG,GAAA,CAAK,CAAA,CAAG,qBAAsB,CAAA,CACxD,iBAAA,CAAmB,EAAG,OAAA,CAAS,CAAA,CAAG,oBAAqB,CAAA,CAAG,SAAA,CAAW,CACvE,CAAA,CACMC,CAAAA,CAAa,CAAE,MAAA,CAAQ,CAAA,CAAG,QAAS,CAAA,CAAG,IAAA,CAAM,CAAE,CAAA,CAChDC,CAAAA,CAAQ,EACZ,IAAA,IAAWC,CAAAA,IAAKX,EAAKC,CAAI,CAAA,CACvBO,EAAWG,CAAAA,CAAE,QAAQ,IACrBF,CAAAA,CAAWE,CAAAA,CAAE,QAAQ,CAAA,EAAA,CACrBD,CAAAA,EAAAA,CAEF,OAAO,CAAE,KAAA,CAAAA,EAAO,UAAA,CAAAF,CAAAA,CAAY,WAAAC,CAAW,CACzC,CAYO,SAASG,CAAAA,CAAMX,EAAcY,CAAAA,CAAwB,GAAY,CACtE,IAAMC,EAAO,CAAE,GAAGnB,EAAiB,GAAGkB,CAAQ,EAE1CE,CAAAA,CAAMd,CAAAA,CACNa,EAAK,iBAAA,GAAmBC,CAAAA,CAAMA,CAAAA,CAAI,OAAA,CAAQ,QAAA,CAAU;AAAA,CAAI,CAAA,CAAA,CAE5D,IAAIC,CAAAA,CAAM,EAAA,CACV,QAAWZ,CAAAA,IAAMW,CAAAA,CAAK,CACpB,IAAMvB,CAAAA,CAAKY,CAAAA,CAAG,YAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,CAAA,CACxB,GAAI,CAACa,CAAAA,CAAM,CACTW,CAAAA,EAAOZ,CAAAA,CACP,QACF,CACA,IAAMa,CAAAA,CAAUH,EAAKjB,CAAAA,CAAgBQ,CAAAA,CAAK,QAAQ,CAAC,CAAA,CACnDW,CAAAA,EAAOC,CAAAA,CAAUZ,CAAAA,CAAK,WAAA,CAAcD,EACtC,CAEA,OAAIU,CAAAA,CAAK,kBAAA,GACPE,CAAAA,CAAMA,CAAAA,CAAI,QAAQ,WAAA,CAAa,GAAG,CAAA,CAAE,OAAA,CAAQ,iBAAA,CAAmB;AAAA,CAAI,GAEjEF,CAAAA,CAAK,IAAA,GAAME,CAAAA,CAAMA,CAAAA,CAAI,MAAK,CAAA,CAEvBA,CACT,CAeO,SAASE,EAAOjB,CAAAA,CAAcY,CAAAA,CAAyB,EAAC,CAAW,
CACxE,IAAMM,CAAAA,CAAQN,CAAAA,CAAQ,KAAA,GAAWF,CAAAA,EAAe,IAAIA,CAAAA,CAAE,GAAG,CAAA,CAAA,CAAA,CAAA,CACrDK,CAAAA,CAAM,GACNb,CAAAA,CAAQ,CAAA,CACZ,IAAA,IAAWC,CAAAA,IAAMH,EAAM,CACrB,IAAMT,EAAKY,CAAAA,CAAG,WAAA,CAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,EACpBa,CAAAA,EAAQP,CAAAA,CAAqB,GAAA,CAAIO,CAAAA,CAAK,QAAQ,CAAA,CAChDW,CAAAA,EAAOG,CAAAA,CAAM,CACX,MAAAhB,CAAAA,CACA,IAAA,CAAMC,EACN,SAAA,CAAWZ,CAAAA,CACX,IAAKO,CAAAA,CAAMP,CAAE,CAAA,CACb,IAAA,CAAMa,EAAK,IAAA,CACX,QAAA,CAAUA,CAAAA,CAAK,QAAA,CACf,SAAUA,CAAAA,CAAK,QACjB,CAAC,CAAA,CAEDW,GAAOZ,CAAAA,CAETD,CAAAA,EAASC,EAAG,OACd,CACA,OAAOY,CACT","file":"chunk-PXW2RFNH.js","sourcesContent":["/**\n * Character data tables for unspook.\n *\n * Every code point we care about, grouped by category, with a human-readable\n * name, a severity, and (where relevant) an ASCII/Latin replacement.\n */\n\nexport type Category =\n | \"zero-width\" // invisible, no width — ZWSP, BOM, word joiner, …\n | \"bidi\" // bidirectional controls — the \"Trojan Source\" attack class\n | \"tag\" // Unicode tag chars — invisible prompt-injection / watermarking\n | \"variation-selector\" // VS1–256 — can be used to hide data on a base char\n | \"invisible-space\" // looks like a space but isn't (NBSP, soft hyphen, …)\n | \"control\" // C0/C1 control characters\n | \"smart-punctuation\" // curly quotes, em dash, ellipsis → ASCII\n | \"homoglyph\"; // letters that look Latin but aren't (Cyrillic а, Greek ο, …)\n\nexport type Severity = \"danger\" | \"warning\" | \"info\";\n\nexport interface CharInfo {\n name: string;\n category: Category;\n severity: Severity;\n /** What to put in place of this char when cleaning. `\"\"` = drop entirely. 
*/\n replacement: string;\n}\n\nexport const SEVERITY_BY_CATEGORY: Record<Category, Severity> = {\n bidi: \"danger\",\n tag: \"danger\",\n \"zero-width\": \"warning\",\n control: \"warning\",\n homoglyph: \"warning\",\n \"variation-selector\": \"warning\",\n \"invisible-space\": \"info\",\n \"smart-punctuation\": \"info\",\n};\n\nconst def = (name: string, category: Category, replacement: string): CharInfo => ({\n name,\n category,\n severity: SEVERITY_BY_CATEGORY[category],\n replacement,\n});\n\n/** Explicitly enumerated special characters, keyed by code point. */\nexport const SPECIAL: Map<number, CharInfo> = new Map([\n // — Zero-width / invisible —\n [0x200b, def(\"ZERO WIDTH SPACE\", \"zero-width\", \"\")],\n [0x200c, def(\"ZERO WIDTH NON-JOINER\", \"zero-width\", \"\")],\n [0x200d, def(\"ZERO WIDTH JOINER\", \"zero-width\", \"\")],\n [0x2060, def(\"WORD JOINER\", \"zero-width\", \"\")],\n [0x2061, def(\"FUNCTION APPLICATION\", \"zero-width\", \"\")],\n [0x2062, def(\"INVISIBLE TIMES\", \"zero-width\", \"\")],\n [0x2063, def(\"INVISIBLE SEPARATOR\", \"zero-width\", \"\")],\n [0x2064, def(\"INVISIBLE PLUS\", \"zero-width\", \"\")],\n [0xfeff, def(\"ZERO WIDTH NO-BREAK SPACE (BOM)\", \"zero-width\", \"\")],\n [0x180e, def(\"MONGOLIAN VOWEL SEPARATOR\", \"zero-width\", \"\")],\n\n // — Bidirectional controls (Trojan Source, CVE-2021-42574) —\n [0x202a, def(\"LEFT-TO-RIGHT EMBEDDING\", \"bidi\", \"\")],\n [0x202b, def(\"RIGHT-TO-LEFT EMBEDDING\", \"bidi\", \"\")],\n [0x202c, def(\"POP DIRECTIONAL FORMATTING\", \"bidi\", \"\")],\n [0x202d, def(\"LEFT-TO-RIGHT OVERRIDE\", \"bidi\", \"\")],\n [0x202e, def(\"RIGHT-TO-LEFT OVERRIDE\", \"bidi\", \"\")],\n [0x2066, def(\"LEFT-TO-RIGHT ISOLATE\", \"bidi\", \"\")],\n [0x2067, def(\"RIGHT-TO-LEFT ISOLATE\", \"bidi\", \"\")],\n [0x2068, def(\"FIRST STRONG ISOLATE\", \"bidi\", \"\")],\n [0x2069, def(\"POP DIRECTIONAL ISOLATE\", \"bidi\", \"\")],\n [0x200e, def(\"LEFT-TO-RIGHT MARK\", \"bidi\", \"\")],\n [0x200f, 
def(\"RIGHT-TO-LEFT MARK\", \"bidi\", \"\")],\n [0x061c, def(\"ARABIC LETTER MARK\", \"bidi\", \"\")],\n\n // — Spaces that aren't a normal space —\n [0x00a0, def(\"NO-BREAK SPACE\", \"invisible-space\", \" \")],\n [0x202f, def(\"NARROW NO-BREAK SPACE\", \"invisible-space\", \" \")],\n [0x2007, def(\"FIGURE SPACE\", \"invisible-space\", \" \")],\n [0x2008, def(\"PUNCTUATION SPACE\", \"invisible-space\", \" \")],\n [0x2009, def(\"THIN SPACE\", \"invisible-space\", \" \")],\n [0x200a, def(\"HAIR SPACE\", \"invisible-space\", \" \")],\n [0x2000, def(\"EN QUAD\", \"invisible-space\", \" \")],\n [0x2001, def(\"EM QUAD\", \"invisible-space\", \" \")],\n [0x2002, def(\"EN SPACE\", \"invisible-space\", \" \")],\n [0x2003, def(\"EM SPACE\", \"invisible-space\", \" \")],\n [0x2004, def(\"THREE-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x2005, def(\"FOUR-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x2006, def(\"SIX-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x205f, def(\"MEDIUM MATHEMATICAL SPACE\", \"invisible-space\", \" \")],\n [0x3000, def(\"IDEOGRAPHIC SPACE\", \"invisible-space\", \" \")],\n [0x1680, def(\"OGHAM SPACE MARK\", \"invisible-space\", \" \")],\n [0x00ad, def(\"SOFT HYPHEN\", \"invisible-space\", \"\")],\n [0x2028, def(\"LINE SEPARATOR\", \"invisible-space\", \"\\n\")],\n [0x2029, def(\"PARAGRAPH SEPARATOR\", \"invisible-space\", \"\\n\")],\n\n // — Smart / typographic punctuation → ASCII —\n [0x201c, def(\"LEFT DOUBLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201d, def(\"RIGHT DOUBLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201e, def(\"DOUBLE LOW-9 QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201f, def(\"DOUBLE HIGH-REVERSED-9 QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x00ab, def(\"LEFT-POINTING DOUBLE ANGLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x00bb, def(\"RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x2018, def(\"LEFT SINGLE QUOTATION MARK\", 
\"smart-punctuation\", \"'\")],\n [0x2019, def(\"RIGHT SINGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x201a, def(\"SINGLE LOW-9 QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x201b, def(\"SINGLE HIGH-REVERSED-9 QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2039, def(\"SINGLE LEFT-POINTING ANGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x203a, def(\"SINGLE RIGHT-POINTING ANGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2032, def(\"PRIME\", \"smart-punctuation\", \"'\")],\n [0x2033, def(\"DOUBLE PRIME\", \"smart-punctuation\", '\"')],\n [0x2013, def(\"EN DASH\", \"smart-punctuation\", \"-\")],\n [0x2014, def(\"EM DASH\", \"smart-punctuation\", \"--\")],\n [0x2015, def(\"HORIZONTAL BAR\", \"smart-punctuation\", \"--\")],\n [0x2212, def(\"MINUS SIGN\", \"smart-punctuation\", \"-\")],\n [0x2026, def(\"HORIZONTAL ELLIPSIS\", \"smart-punctuation\", \"...\")],\n]);\n\n/** Names for the C0 control characters worth labelling. */\nconst C0_NAMES: Record<number, string> = {\n 0x00: \"NULL\", 0x01: \"START OF HEADING\", 0x02: \"START OF TEXT\", 0x03: \"END OF TEXT\",\n 0x04: \"END OF TRANSMISSION\", 0x05: \"ENQUIRY\", 0x06: \"ACKNOWLEDGE\", 0x07: \"BELL\",\n 0x08: \"BACKSPACE\", 0x0b: \"LINE TABULATION\", 0x0c: \"FORM FEED\", 0x0e: \"SHIFT OUT\",\n 0x0f: \"SHIFT IN\", 0x10: \"DATA LINK ESCAPE\", 0x1b: \"ESCAPE\", 0x7f: \"DELETE\",\n};\n\n/**\n * Homoglyphs: characters that render like a Latin letter/number but aren't.\n * Mapped to their Latin look-alike. 
(A curated, high-confidence subset of the\n * Unicode confusables data.)\n */\nexport const HOMOGLYPHS: Map<number, string> = new Map([\n // Cyrillic (lowercase)\n [0x0430, \"a\"], [0x0435, \"e\"], [0x043e, \"o\"], [0x0440, \"p\"], [0x0441, \"c\"],\n [0x0443, \"y\"], [0x0445, \"x\"], [0x0455, \"s\"], [0x0456, \"i\"], [0x0458, \"j\"],\n [0x04bb, \"h\"], [0x0501, \"d\"], [0x043a, \"k\"], [0x043c, \"m\"], [0x0442, \"t\"],\n // Cyrillic (uppercase)\n [0x0410, \"A\"], [0x0412, \"B\"], [0x0415, \"E\"], [0x041a, \"K\"], [0x041c, \"M\"],\n [0x041d, \"H\"], [0x041e, \"O\"], [0x0420, \"P\"], [0x0421, \"C\"], [0x0422, \"T\"],\n [0x0423, \"Y\"], [0x0425, \"X\"], [0x0405, \"S\"], [0x0406, \"I\"], [0x0408, \"J\"],\n // Greek\n [0x03bf, \"o\"], [0x03b1, \"a\"], [0x03b3, \"y\"], [0x03c1, \"p\"], [0x03c5, \"u\"],\n [0x0391, \"A\"], [0x0392, \"B\"], [0x0395, \"E\"], [0x0396, \"Z\"], [0x0397, \"H\"],\n [0x0399, \"I\"], [0x039a, \"K\"], [0x039c, \"M\"], [0x039d, \"N\"], [0x039f, \"O\"],\n [0x03a1, \"P\"], [0x03a4, \"T\"], [0x03a5, \"Y\"], [0x03a7, \"X\"],\n // Latin look-alikes / letterlike symbols\n [0x212f, \"e\"], [0x2113, \"l\"], [0x0131, \"i\"],\n]);\n\nconst ASCII_PRINTABLE = (cp: number): boolean => cp >= 0x20 && cp <= 0x7e;\n\n/** Classify a single code point, or return `null` if it's unremarkable. 
*/\nexport function classify(cp: number): CharInfo | null {\n const special = SPECIAL.get(cp);\n if (special) return special;\n\n // Tag characters (used for invisible prompt injection / watermarking).\n if (cp >= 0xe0000 && cp <= 0xe007f) {\n return def(\"TAG CHARACTER\", \"tag\", \"\");\n }\n // Variation selectors (can hide data on a base glyph).\n if ((cp >= 0xfe00 && cp <= 0xfe0f) || (cp >= 0xe0100 && cp <= 0xe01ef)) {\n return def(\"VARIATION SELECTOR\", \"variation-selector\", \"\");\n }\n // Fullwidth ASCII forms → normal ASCII.\n if (cp >= 0xff01 && cp <= 0xff5e) {\n return def(\"FULLWIDTH FORM\", \"homoglyph\", String.fromCharCode(cp - 0xfee0));\n }\n // Explicit homoglyph table.\n const latin = HOMOGLYPHS.get(cp);\n if (latin !== undefined) {\n return def(`HOMOGLYPH OF \"${latin}\"`, \"homoglyph\", latin);\n }\n // C0/C1 control characters (excluding tab, newline, carriage return).\n if (\n (cp <= 0x1f && cp !== 0x09 && cp !== 0x0a && cp !== 0x0d) ||\n cp === 0x7f ||\n (cp >= 0x80 && cp <= 0x9f)\n ) {\n const name = C0_NAMES[cp] ?? \"CONTROL CHARACTER\";\n return def(name, \"control\", \"\");\n }\n // Everything else (incl. normal ASCII, emoji, CJK, accented Latin) is fine.\n void ASCII_PRINTABLE;\n return null;\n}\n","/**\n * unspook — find and remove the invisible, dangerous, and confusable\n * characters hiding in your text.\n *\n * Zero dependencies. Pure functions. Runs anywhere JavaScript does.\n */\n\nimport { classify, type Category, type Severity } from \"./data.js\";\n\nexport type { Category, Severity } from \"./data.js\";\n\nexport interface Finding {\n /** UTF-16 index of the character in the source string. */\n index: number;\n /** The offending character itself. */\n char: string;\n /** Unicode code point. */\n codePoint: number;\n /** Formatted code point, e.g. `\"U+200B\"`. */\n hex: string;\n /** Human-readable Unicode name. 
*/\n name: string;\n category: Category;\n severity: Severity;\n}\n\nexport interface CleanOptions {\n /** Remove zero-width / invisible characters (ZWSP, BOM, word joiner…). Default `true`. */\n zeroWidth?: boolean;\n /** Remove bidirectional control characters (the \"Trojan Source\" class). Default `true`. */\n bidi?: boolean;\n /** Remove Unicode tag characters (invisible prompt-injection / watermarks). Default `true`. */\n tag?: boolean;\n /** Remove C0/C1 control characters. Default `true`. */\n control?: boolean;\n /** Remove variation selectors. Default `false` (they're legitimate in emoji). */\n variationSelectors?: boolean;\n /** Normalize exotic spaces (NBSP→space, soft hyphen→removed, line sep→newline). Default `true`. */\n invisibleSpaces?: boolean;\n /** Convert smart/typographic punctuation to ASCII (“ ”→\", —→--, …→...). Default `false`. */\n smartPunctuation?: boolean;\n /** Map homoglyphs to their Latin look-alike (Cyrillic а→a, fullwidth A→A). Default `false`. */\n homoglyphs?: boolean;\n /** Collapse runs of spaces/tabs into one space. Default `false`. */\n collapseWhitespace?: boolean;\n /** Normalize `\\r\\n` and `\\r` to `\\n`. Default `true`. */\n normalizeNewlines?: boolean;\n /** Trim leading/trailing whitespace from the whole string. Default `false`. */\n trim?: boolean;\n}\n\n/** The default, safe cleaning profile: strip the dangerous & invisible, keep meaning. */\nexport const DEFAULT_OPTIONS: Required<CleanOptions> = {\n zeroWidth: true,\n bidi: true,\n tag: true,\n control: true,\n variationSelectors: false,\n invisibleSpaces: true,\n smartPunctuation: false,\n homoglyphs: false,\n collapseWhitespace: false,\n normalizeNewlines: true,\n trim: false,\n};\n\n/** Turn everything on — for when you want maximally plain ASCII-ish text. 
*/\nexport const AGGRESSIVE_OPTIONS: Required<CleanOptions> = {\n zeroWidth: true,\n bidi: true,\n tag: true,\n control: true,\n variationSelectors: true,\n invisibleSpaces: true,\n smartPunctuation: true,\n homoglyphs: true,\n collapseWhitespace: true,\n normalizeNewlines: true,\n trim: true,\n};\n\nconst CATEGORY_OPTION: Record<Category, keyof CleanOptions> = {\n \"zero-width\": \"zeroWidth\",\n bidi: \"bidi\",\n tag: \"tag\",\n control: \"control\",\n \"variation-selector\": \"variationSelectors\",\n \"invisible-space\": \"invisibleSpaces\",\n \"smart-punctuation\": \"smartPunctuation\",\n homoglyph: \"homoglyphs\",\n};\n\n/** Categories whose characters are genuinely invisible (used by {@link reveal}). */\nconst INVISIBLE_CATEGORIES = new Set<Category>([\n \"zero-width\",\n \"bidi\",\n \"tag\",\n \"variation-selector\",\n \"control\",\n \"invisible-space\",\n]);\n\nfunction toHex(cp: number): string {\n return \"U+\" + cp.toString(16).toUpperCase().padStart(4, \"0\");\n}\n\n/**\n * Find every suspicious character in `text`.\n *\n * ```ts\n * scan(\"hi​there\");\n * // [{ index: 2, char: \"​\", codePoint: 8203, hex: \"U+200B\",\n * // name: \"ZERO WIDTH SPACE\", category: \"zero-width\", severity: \"warning\" }]\n * ```\n */\nexport function scan(text: string): Finding[] {\n const findings: Finding[] = [];\n let index = 0;\n for (const ch of text) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (info) {\n findings.push({\n index,\n char: ch,\n codePoint: cp,\n hex: toHex(cp),\n name: info.name,\n category: info.category,\n severity: info.severity,\n });\n }\n index += ch.length;\n }\n return findings;\n}\n\n/** `true` if `text` contains no suspicious characters at all. 
*/\nexport function isClean(text: string): boolean {\n for (const ch of text) {\n if (classify(ch.codePointAt(0) as number)) return false;\n }\n return true;\n}\n\nexport interface Stats {\n total: number;\n byCategory: Record<Category, number>;\n bySeverity: Record<Severity, number>;\n}\n\n/** Summarize what's lurking in `text` without listing every occurrence. */\nexport function stats(text: string): Stats {\n const byCategory = {\n \"zero-width\": 0, bidi: 0, tag: 0, \"variation-selector\": 0,\n \"invisible-space\": 0, control: 0, \"smart-punctuation\": 0, homoglyph: 0,\n } as Record<Category, number>;\n const bySeverity = { danger: 0, warning: 0, info: 0 } as Record<Severity, number>;\n let total = 0;\n for (const f of scan(text)) {\n byCategory[f.category]++;\n bySeverity[f.severity]++;\n total++;\n }\n return { total, byCategory, bySeverity };\n}\n\n/**\n * Return a cleaned copy of `text`. By default it strips the dangerous and\n * invisible characters while preserving the visible meaning of your text.\n *\n * ```ts\n * clean(\"Hello​world\"); // \"Helloworld\"\n * clean(\"“quote”\", { smartPunctuation: true }); // '\"quote\"'\n * clean(\"аdmin\", { homoglyphs: true }); // \"admin\" (Cyrillic а → a)\n * ```\n */\nexport function clean(text: string, options: CleanOptions = {}): string {\n const opts = { ...DEFAULT_OPTIONS, ...options };\n\n let src = text;\n if (opts.normalizeNewlines) src = src.replace(/\\r\\n?/g, \"\\n\");\n\n let out = \"\";\n for (const ch of src) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (!info) {\n out += ch;\n continue;\n }\n const enabled = opts[CATEGORY_OPTION[info.category]];\n out += enabled ? info.replacement : ch;\n }\n\n if (opts.collapseWhitespace) {\n out = out.replace(/[^\\S\\n]+/g, \" \").replace(/[ \\t]*\\n[ \\t]*/g, \"\\n\");\n }\n if (opts.trim) out = out.trim();\n\n return out;\n}\n\nexport interface RevealOptions {\n /** Custom token renderer. Default: `(f) => \"[\" + f.hex + \"]\"`. 
*/\n token?: (finding: Finding) => string;\n}\n\n/**\n * Make invisible characters visible by replacing them with a readable token,\n * leaving everything else untouched — handy for logs and terminals.\n *\n * ```ts\n * reveal(\"a​b\"); // \"a[U+200B]b\"\n * ```\n */\nexport function reveal(text: string, options: RevealOptions = {}): string {\n const token = options.token ?? ((f: Finding) => `[${f.hex}]`);\n let out = \"\";\n let index = 0;\n for (const ch of text) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (info && INVISIBLE_CATEGORIES.has(info.category)) {\n out += token({\n index,\n char: ch,\n codePoint: cp,\n hex: toHex(cp),\n name: info.name,\n category: info.category,\n severity: info.severity,\n });\n } else {\n out += ch;\n }\n index += ch.length;\n }\n return out;\n}\n"]}
package/dist/cli.cjs ADDED
@@ -0,0 +1,30 @@
1
+ #!/usr/bin/env node
2
+ 'use strict';var fs=require('fs');var T={bidi:"danger",tag:"danger","zero-width":"warning",control:"warning",homoglyph:"warning","variation-selector":"warning","invisible-space":"info","smart-punctuation":"info"},e=(t,s,r)=>({name:t,category:s,severity:T[s],replacement:r}),f=new Map([[8203,e("ZERO WIDTH SPACE","zero-width","")],[8204,e("ZERO WIDTH NON-JOINER","zero-width","")],[8205,e("ZERO WIDTH JOINER","zero-width","")],[8288,e("WORD JOINER","zero-width","")],[8289,e("FUNCTION APPLICATION","zero-width","")],[8290,e("INVISIBLE TIMES","zero-width","")],[8291,e("INVISIBLE SEPARATOR","zero-width","")],[8292,e("INVISIBLE PLUS","zero-width","")],[65279,e("ZERO WIDTH NO-BREAK SPACE (BOM)","zero-width","")],[6158,e("MONGOLIAN VOWEL SEPARATOR","zero-width","")],[8234,e("LEFT-TO-RIGHT EMBEDDING","bidi","")],[8235,e("RIGHT-TO-LEFT EMBEDDING","bidi","")],[8236,e("POP DIRECTIONAL FORMATTING","bidi","")],[8237,e("LEFT-TO-RIGHT OVERRIDE","bidi","")],[8238,e("RIGHT-TO-LEFT OVERRIDE","bidi","")],[8294,e("LEFT-TO-RIGHT ISOLATE","bidi","")],[8295,e("RIGHT-TO-LEFT ISOLATE","bidi","")],[8296,e("FIRST STRONG ISOLATE","bidi","")],[8297,e("POP DIRECTIONAL ISOLATE","bidi","")],[8206,e("LEFT-TO-RIGHT MARK","bidi","")],[8207,e("RIGHT-TO-LEFT MARK","bidi","")],[1564,e("ARABIC LETTER MARK","bidi","")],[160,e("NO-BREAK SPACE","invisible-space"," ")],[8239,e("NARROW NO-BREAK SPACE","invisible-space"," ")],[8199,e("FIGURE SPACE","invisible-space"," ")],[8200,e("PUNCTUATION SPACE","invisible-space"," ")],[8201,e("THIN SPACE","invisible-space"," ")],[8202,e("HAIR SPACE","invisible-space"," ")],[8192,e("EN QUAD","invisible-space"," ")],[8193,e("EM QUAD","invisible-space"," ")],[8194,e("EN SPACE","invisible-space"," ")],[8195,e("EM SPACE","invisible-space"," ")],[8196,e("THREE-PER-EM SPACE","invisible-space"," ")],[8197,e("FOUR-PER-EM SPACE","invisible-space"," ")],[8198,e("SIX-PER-EM SPACE","invisible-space"," ")],[8287,e("MEDIUM MATHEMATICAL SPACE","invisible-space"," ")],[12288,e("IDEOGRAPHIC 
SPACE","invisible-space"," ")],[5760,e("OGHAM SPACE MARK","invisible-space"," ")],[173,e("SOFT HYPHEN","invisible-space","")],[8232,e("LINE SEPARATOR","invisible-space",`
3
+ `)],[8233,e("PARAGRAPH SEPARATOR","invisible-space",`
4
+ `)],[8220,e("LEFT DOUBLE QUOTATION MARK","smart-punctuation",'"')],[8221,e("RIGHT DOUBLE QUOTATION MARK","smart-punctuation",'"')],[8222,e("DOUBLE LOW-9 QUOTATION MARK","smart-punctuation",'"')],[8223,e("DOUBLE HIGH-REVERSED-9 QUOTATION MARK","smart-punctuation",'"')],[171,e("LEFT-POINTING DOUBLE ANGLE QUOTATION MARK","smart-punctuation",'"')],[187,e("RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK","smart-punctuation",'"')],[8216,e("LEFT SINGLE QUOTATION MARK","smart-punctuation","'")],[8217,e("RIGHT SINGLE QUOTATION MARK","smart-punctuation","'")],[8218,e("SINGLE LOW-9 QUOTATION MARK","smart-punctuation","'")],[8219,e("SINGLE HIGH-REVERSED-9 QUOTATION MARK","smart-punctuation","'")],[8249,e("SINGLE LEFT-POINTING ANGLE QUOTATION MARK","smart-punctuation","'")],[8250,e("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK","smart-punctuation","'")],[8242,e("PRIME","smart-punctuation","'")],[8243,e("DOUBLE PRIME","smart-punctuation",'"')],[8211,e("EN DASH","smart-punctuation","-")],[8212,e("EM DASH","smart-punctuation","--")],[8213,e("HORIZONTAL BAR","smart-punctuation","--")],[8722,e("MINUS SIGN","smart-punctuation","-")],[8230,e("HORIZONTAL ELLIPSIS","smart-punctuation","...")]]),d={0:"NULL",1:"START OF HEADING",2:"START OF TEXT",3:"END OF TEXT",4:"END OF TRANSMISSION",5:"ENQUIRY",6:"ACKNOWLEDGE",7:"BELL",8:"BACKSPACE",11:"LINE TABULATION",12:"FORM FEED",14:"SHIFT OUT",15:"SHIFT IN",16:"DATA LINK ESCAPE",27:"ESCAPE",127:"DELETE"},g=new 
Map([[1072,"a"],[1077,"e"],[1086,"o"],[1088,"p"],[1089,"c"],[1091,"y"],[1093,"x"],[1109,"s"],[1110,"i"],[1112,"j"],[1211,"h"],[1281,"d"],[1082,"k"],[1084,"m"],[1090,"t"],[1040,"A"],[1042,"B"],[1045,"E"],[1050,"K"],[1052,"M"],[1053,"H"],[1054,"O"],[1056,"P"],[1057,"C"],[1058,"T"],[1059,"Y"],[1061,"X"],[1029,"S"],[1030,"I"],[1032,"J"],[959,"o"],[945,"a"],[947,"y"],[961,"p"],[965,"u"],[913,"A"],[914,"B"],[917,"E"],[918,"Z"],[919,"H"],[921,"I"],[922,"K"],[924,"M"],[925,"N"],[927,"O"],[929,"P"],[932,"T"],[933,"Y"],[935,"X"],[8495,"e"],[8467,"l"],[305,"i"]]);function l(t){let s=f.get(t);if(s)return s;if(t>=917504&&t<=917631)return e("TAG CHARACTER","tag","");if(t>=65024&&t<=65039||t>=917760&&t<=917999)return e("VARIATION SELECTOR","variation-selector","");if(t>=65281&&t<=65374)return e("FULLWIDTH FORM","homoglyph",String.fromCharCode(t-65248));let r=g.get(t);if(r!==void 0)return e(`HOMOGLYPH OF "${r}"`,"homoglyph",r);if(t<=31&&t!==9&&t!==10&&t!==13||t===127||t>=128&&t<=159){let o=d[t]??"CONTROL CHARACTER";return e(o,"control","")}return null}var b={zeroWidth:true,bidi:true,tag:true,control:true,variationSelectors:false,invisibleSpaces:true,smartPunctuation:false,homoglyphs:false,collapseWhitespace:false,normalizeNewlines:true,trim:false},x={zeroWidth:true,bidi:true,tag:true,control:true,variationSelectors:true,invisibleSpaces:true,smartPunctuation:true,homoglyphs:true,collapseWhitespace:true,normalizeNewlines:true,trim:true},h={"zero-width":"zeroWidth",bidi:"bidi",tag:"tag",control:"control","variation-selector":"variationSelectors","invisible-space":"invisibleSpaces","smart-punctuation":"smartPunctuation",homoglyph:"homoglyphs"},m=new Set(["zero-width","bidi","tag","variation-selector","control","invisible-space"]);function u(t){return "U+"+t.toString(16).toUpperCase().padStart(4,"0")}function p(t){let s=[],r=0;for(let o of t){let 
n=o.codePointAt(0),i=l(n);i&&s.push({index:r,char:o,codePoint:n,hex:u(n),name:i.name,category:i.category,severity:i.severity}),r+=o.length;}return s}function E(t,s={}){let r={...b,...s},o=t;r.normalizeNewlines&&(o=o.replace(/\r\n?/g,`
5
+ `));let n="";for(let i of o){let c=i.codePointAt(0),a=l(c);if(!a){n+=i;continue}let A=r[h[a.category]];n+=A?a.replacement:i;}return r.collapseWhitespace&&(n=n.replace(/[^\S\n]+/g," ").replace(/[ \t]*\n[ \t]*/g,`
6
+ `)),r.trim&&(n=n.trim()),n}function O(t,s={}){let r=s.token??(i=>`[${i.hex}]`),o="",n=0;for(let i of t){let c=i.codePointAt(0),a=l(c);a&&m.has(a.category)?o+=r({index:n,char:i,codePoint:c,hex:u(c),name:a.name,category:a.category,severity:a.severity}):o+=i,n+=i.length;}return o}var N=`unspook \u2014 find & remove invisible / dangerous / confusable characters
7
+
8
+ Usage:
9
+ unspook [options] [files...]
10
+ cat file | unspook [options]
11
+
12
+ Options:
13
+ -s, --scan Report findings instead of cleaning; exit 1 if any found
14
+ -w, --write Rewrite files in place (with cleaned output)
15
+ -r, --reveal Print text with invisible characters made visible
16
+ -a, --aggressive Also normalize smart punctuation, homoglyphs & whitespace
17
+ --smart-quotes Convert smart punctuation to ASCII
18
+ --homoglyphs Map look-alike letters to Latin (Cyrillic \u0430 \u2192 a)
19
+ --collapse Collapse runs of whitespace
20
+ --trim Trim leading/trailing whitespace
21
+ --keep-nbsp Keep non-breaking & exotic spaces
22
+ -h, --help Show this help
23
+ -v, --version Show version
24
+ `;function v(t){let s=new Set,r=[],o={"-s":"--scan","-w":"--write","-r":"--reveal","-a":"--aggressive","-h":"--help","-v":"--version"};for(let n of t)n.startsWith("-")?s.add(o[n]??n):r.push(n);return {flags:s,files:r}}function y(){try{return fs.readFileSync(0,"utf8")}catch{return ""}}function C(t){return t.has("--aggressive")?x:{smartPunctuation:t.has("--smart-quotes"),homoglyphs:t.has("--homoglyphs"),collapseWhitespace:t.has("--collapse"),trim:t.has("--trim"),invisibleSpaces:!t.has("--keep-nbsp")}}function L(t,s){let r=p(s),o=t||"<stdin>";if(r.length===0)return process.stderr.write(`\u2713 ${o}: clean
25
+ `),0;process.stderr.write(`\u2717 ${o}: ${r.length} finding(s)
26
+ `);for(let n of r)process.stderr.write(` ${n.hex.padEnd(10)} ${n.severity.padEnd(8)} ${n.category.padEnd(18)} ${n.name} @ ${n.index}
27
+ `);return 1}async function P(){let{flags:t,files:s}=v(process.argv.slice(2));if(t.has("--help"))return process.stdout.write(N),0;if(t.has("--version"))return process.stdout.write(`unspook 0.1.0
28
+ `),0;let r=C(t),o=s.length>0?s.map(i=>({name:i,text:fs.readFileSync(i,"utf8")})):[{name:"",text:y()}];if(t.has("--scan")){let i=0;for(let{name:c,text:a}of o)i=L(c,a)||i;return i}let n=0;for(let{name:i,text:c}of o){let a=t.has("--reveal")?O(c):E(c,r);t.has("--write")&&i?(fs.writeFileSync(i,a),process.stderr.write(`\u2713 cleaned ${i}
29
+ `)):process.stdout.write(a);}return n}P().then(t=>process.exit(t));//# sourceMappingURL=cli.cjs.map
30
+ //# sourceMappingURL=cli.cjs.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/data.ts","../src/index.ts","../src/cli.ts"],"names":["SEVERITY_BY_CATEGORY","def","name","category","replacement","SPECIAL","C0_NAMES","HOMOGLYPHS","classify","cp","special","latin","DEFAULT_OPTIONS","AGGRESSIVE_OPTIONS","CATEGORY_OPTION","INVISIBLE_CATEGORIES","toHex","scan","text","findings","index","ch","info","clean","options","opts","src","out","enabled","reveal","token","f","HELP","parseArgs","argv","flags","files","alias","arg","readStdin","readFileSync","optionsFromFlags","report","label","main","inputs","code","exitCode","output","writeFileSync"],"mappings":";kCA2BO,IAAMA,EAAmD,CAC9D,IAAA,CAAM,QAAA,CACN,GAAA,CAAK,SACL,YAAA,CAAc,SAAA,CACd,QAAS,SAAA,CACT,SAAA,CAAW,UACX,oBAAA,CAAsB,SAAA,CACtB,iBAAA,CAAmB,MAAA,CACnB,oBAAqB,MACvB,CAAA,CAEMC,EAAM,CAACC,CAAAA,CAAcC,EAAoBC,CAAAA,IAAmC,CAChF,KAAAF,CAAAA,CACA,QAAA,CAAAC,EACA,QAAA,CAAUH,CAAAA,CAAqBG,CAAQ,CAAA,CACvC,WAAA,CAAAC,CACF,CAAA,CAAA,CAGaC,CAAAA,CAAiC,IAAI,GAAA,CAAI,CAEpD,CAAC,IAAA,CAAQJ,EAAI,kBAAA,CAAoB,YAAA,CAAc,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAyB,YAAA,CAAc,EAAE,CAAC,CAAA,CACvD,CAAC,KAAQA,CAAAA,CAAI,mBAAA,CAAqB,YAAA,CAAc,EAAE,CAAC,CAAA,CACnD,CAAC,KAAQA,CAAAA,CAAI,aAAA,CAAe,aAAc,EAAE,CAAC,EAC7C,CAAC,IAAA,CAAQA,EAAI,sBAAA,CAAwB,YAAA,CAAc,EAAE,CAAC,CAAA,CACtD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAA,CAAmB,YAAA,CAAc,EAAE,CAAC,CAAA,CACjD,CAAC,IAAA,CAAQA,CAAAA,CAAI,sBAAuB,YAAA,CAAc,EAAE,CAAC,CAAA,CACrD,CAAC,KAAQA,CAAAA,CAAI,gBAAA,CAAkB,aAAc,EAAE,CAAC,EAChD,CAAC,KAAA,CAAQA,CAAAA,CAAI,iCAAA,CAAmC,aAAc,EAAE,CAAC,EACjE,CAAC,IAAA,CAAQA,EAAI,2BAAA,CAA6B,YAAA,CAAc,EAAE,CAAC,CAAA,CAG3D,CAAC,IAAA,CAAQA,CAAAA,CAAI,0BAA2B,MAAA,CAAQ,EAAE,CAAC,CAAA,CACnD,CAAC,IAAA,CAAQA,CAAAA,CAAI,0BAA2B,MAAA,CAAQ,EAAE,CAAC,CAAA,CACnD,CAAC,KAAQA,CAAAA,CAAI,4BAAA,CAA8B,OAAQ,EAAE,CAAC,EACtD,CAAC,IAAA,CAAQA,EAAI,wBAAA,CAA0B,MAAA,CAAQ,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,EAAI,wBAAA,CAA0B,MAAA,CAAQ,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAyB,MAAA,CAAQ,EAAE,CAAC,CAAA,CACjD,CAAC,KAAQA,CAAAA,CAAI,
uBAAA,CAAyB,OAAQ,EAAE,CAAC,CAAA,CACjD,CAAC,KAAQA,CAAAA,CAAI,sBAAA,CAAwB,OAAQ,EAAE,CAAC,EAChD,CAAC,IAAA,CAAQA,CAAAA,CAAI,yBAAA,CAA2B,OAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,MAAA,CAAQ,EAAE,CAAC,EAC9C,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAC9C,CAAC,IAAA,CAAQA,CAAAA,CAAI,qBAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAG9C,CAAC,IAAQA,CAAAA,CAAI,gBAAA,CAAkB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACtD,CAAC,KAAQA,CAAAA,CAAI,uBAAA,CAAyB,kBAAmB,GAAG,CAAC,EAC7D,CAAC,IAAA,CAAQA,EAAI,cAAA,CAAgB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACpD,CAAC,IAAA,CAAQA,CAAAA,CAAI,mBAAA,CAAqB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,IAAA,CAAQA,CAAAA,CAAI,aAAc,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAClD,CAAC,KAAQA,CAAAA,CAAI,YAAA,CAAc,kBAAmB,GAAG,CAAC,EAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,SAAA,CAAW,kBAAmB,GAAG,CAAC,EAC/C,CAAC,IAAA,CAAQA,EAAI,SAAA,CAAW,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAC/C,CAAC,IAAA,CAAQA,CAAAA,CAAI,WAAY,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAChD,CAAC,IAAA,CAAQA,CAAAA,CAAI,WAAY,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAChD,CAAC,KAAQA,CAAAA,CAAI,oBAAA,CAAsB,kBAAmB,GAAG,CAAC,EAC1D,CAAC,IAAA,CAAQA,EAAI,mBAAA,CAAqB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,IAAA,CAAQA,EAAI,kBAAA,CAAoB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACxD,CAAC,IAAA,CAAQA,CAAAA,CAAI,4BAA6B,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACjE,CAAC,MAAQA,CAAAA,CAAI,mBAAA,CAAqB,kBAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,KAAQA,CAAAA,CAAI,kBAAA,CAAoB,kBAAmB,GAAG,CAAC,EACxD,CAAC,GAAA,CAAQA,EAAI,aAAA,CAAe,iBAAA,CAAmB,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAkB,iBAAA,CAAmB;AAAA,CAAI,CAAC,CAAA,CACvD,CAAC,IAAA,CAAQA,CAAAA,CAAI,sBAAuB,iBAAA,CAAmB;AAAA,CAAI,CAAC,CAAA,CAG5D,CAAC,IAAA,CAAQA,CAAAA,CAAI,6BAA8B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACpE,CAAC,IAAA,CAAQA,CAAAA,CAAI,8BAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,IAAA,CAAQA,CAAAA,CAAI,8BAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,KAAQA,CAAAA,CAAI,uCAAA,CAAyC,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/E,CAAC,IAAQA,CAAAA,CAAI,2CAAA,CAA6C,oBAAqB,GAAG,CAAC,CAAA,CACnF,CAAC,IAAQA,CAAAA,CAAI,4CAAA,CAA8C,oBAAqB,GAAG,CAAC,EACpF,CAAC,IAAA,CAAQA,CAAAA,CAAI,4BAAA,CAA8B,oBAAqB,GAAG,CAA
C,EACpE,CAAC,IAAA,CAAQA,EAAI,6BAAA,CAA+B,mBAAA,CAAqB,GAAG,CAAC,EACrE,CAAC,IAAA,CAAQA,EAAI,6BAAA,CAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,IAAA,CAAQA,EAAI,uCAAA,CAAyC,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/E,CAAC,IAAA,CAAQA,CAAAA,CAAI,2CAAA,CAA6C,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACnF,CAAC,IAAA,CAAQA,EAAI,4CAAA,CAA8C,mBAAA,CAAqB,GAAG,CAAC,EACpF,CAAC,IAAA,CAAQA,EAAI,OAAA,CAAS,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/C,CAAC,IAAA,CAAQA,EAAI,cAAA,CAAgB,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACtD,CAAC,IAAA,CAAQA,CAAAA,CAAI,SAAA,CAAW,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACjD,CAAC,IAAA,CAAQA,CAAAA,CAAI,UAAW,mBAAA,CAAqB,IAAI,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAkB,mBAAA,CAAqB,IAAI,CAAC,CAAA,CACzD,CAAC,IAAA,CAAQA,CAAAA,CAAI,aAAc,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACpD,CAAC,KAAQA,CAAAA,CAAI,qBAAA,CAAuB,mBAAA,CAAqB,KAAK,CAAC,CACjE,CAAC,EAGKK,CAAAA,CAAmC,CACvC,EAAM,MAAA,CAAQ,CAAA,CAAM,kBAAA,CAAoB,CAAA,CAAM,gBAAiB,CAAA,CAAM,aAAA,CACrE,EAAM,qBAAA,CAAuB,CAAA,CAAM,UAAW,CAAA,CAAM,aAAA,CAAe,CAAA,CAAM,MAAA,CACzE,EAAM,WAAA,CAAa,EAAA,CAAM,iBAAA,CAAmB,EAAA,CAAM,YAAa,EAAA,CAAM,WAAA,CACrE,EAAA,CAAM,UAAA,CAAY,GAAM,kBAAA,CAAoB,EAAA,CAAM,SAAU,GAAA,CAAM,QACpE,EAOaC,CAAAA,CAAkC,IAAI,GAAA,CAAI,CAErD,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CACxE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EACxE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAExE,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EACxE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CACxE,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAExE,CAAC,GAAA,CAAQ,GAAG,EAAG,
CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CACxE,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CACxE,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CACxE,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAEzD,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,CAC5C,CAAC,CAAA,CAKM,SAASC,CAAAA,CAASC,EAA6B,CACpD,IAAMC,EAAUL,CAAAA,CAAQ,GAAA,CAAII,CAAE,CAAA,CAC9B,GAAIC,CAAAA,CAAS,OAAOA,EAGpB,GAAID,CAAAA,EAAM,QAAWA,CAAAA,EAAM,MAAA,CACzB,OAAOR,CAAAA,CAAI,eAAA,CAAiB,KAAA,CAAO,EAAE,EAGvC,GAAKQ,CAAAA,EAAM,KAAA,EAAUA,CAAAA,EAAM,OAAYA,CAAAA,EAAM,MAAA,EAAWA,CAAAA,EAAM,MAAA,CAC5D,OAAOR,CAAAA,CAAI,oBAAA,CAAsB,qBAAsB,EAAE,CAAA,CAG3D,GAAIQ,CAAAA,EAAM,KAAA,EAAUA,CAAAA,EAAM,KAAA,CACxB,OAAOR,CAAAA,CAAI,gBAAA,CAAkB,YAAa,MAAA,CAAO,YAAA,CAAaQ,EAAK,KAAM,CAAC,CAAA,CAG5E,IAAME,EAAQJ,CAAAA,CAAW,GAAA,CAAIE,CAAE,CAAA,CAC/B,GAAIE,IAAU,MAAA,CACZ,OAAOV,CAAAA,CAAI,CAAA,cAAA,EAAiBU,CAAK,CAAA,CAAA,CAAA,CAAK,WAAA,CAAaA,CAAK,CAAA,CAG1D,GACGF,GAAM,EAAA,EAAQA,CAAAA,GAAO,CAAA,EAAQA,CAAAA,GAAO,IAAQA,CAAAA,GAAO,EAAA,EACpDA,IAAO,GAAA,EACNA,CAAAA,EAAM,KAAQA,CAAAA,EAAM,GAAA,CACrB,CACA,IAAMP,EAAOI,CAAAA,CAASG,CAAE,GAAK,mBAAA,CAC7B,OAAOR,EAAIC,CAAAA,CAAM,SAAA,CAAW,EAAE,CAChC,CAGA,OAAO,IACT,CCnIO,IAAMU,CAAAA,CAA0C,CACrD,SAAA,CAAW,IAAA,CACX,IAAA,CAAM,IAAA,CACN,IAAK,IAAA,CACL,OAAA,CAAS,IAAA,CACT,kBAAA,CAAoB,MACpB,eAAA,CAAiB,IAAA,CACjB,gBAAA,CAAkB,KAAA,CAClB,WAAY,KAAA,CACZ,kBAAA,CAAoB,MACpB,iBAAA,CAAmB,IAAA,CACnB,KAAM,KACR,CAAA,CAGaC,CAAAA,CAA6C,CACxD,UAAW,IAAA,CACX,IAAA,CAAM,KACN,GAAA,CAAK,IAAA,CACL,QAAS,IAAA,CACT,kBAAA,CAAoB,IAAA,CACpB,eAAA,CAAiB,KACjB,gBAAA,CAAkB,IAAA,CAClB,WAAY,IAAA,CACZ,kBAAA,CAAoB,KACpB,iBAAA,CAAmB,IAAA,CACnB,IAAA,CAAM,IACR,EAEMC,CAAAA,CAAwD,CAC5D,aAAc,WAAA,CACd,IAAA,CAAM,OACN,GAAA,CAA
K,KAAA,CACL,OAAA,CAAS,SAAA,CACT,qBAAsB,oBAAA,CACtB,iBAAA,CAAmB,kBACnB,mBAAA,CAAqB,kBAAA,CACrB,UAAW,YACb,CAAA,CAGMC,CAAAA,CAAuB,IAAI,IAAc,CAC7C,YAAA,CACA,OACA,KAAA,CACA,oBAAA,CACA,UACA,iBACF,CAAC,CAAA,CAED,SAASC,EAAMP,CAAAA,CAAoB,CACjC,OAAO,IAAA,CAAOA,CAAAA,CAAG,SAAS,EAAE,CAAA,CAAE,WAAA,EAAY,CAAE,SAAS,CAAA,CAAG,GAAG,CAC7D,CAWO,SAASQ,CAAAA,CAAKC,CAAAA,CAAyB,CAC5C,IAAMC,EAAsB,EAAC,CACzBC,EAAQ,CAAA,CACZ,IAAA,IAAWC,KAAMH,CAAAA,CAAM,CACrB,IAAMT,CAAAA,CAAKY,EAAG,WAAA,CAAY,CAAC,EACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,CAAA,CACpBa,CAAAA,EACFH,CAAAA,CAAS,IAAA,CAAK,CACZ,KAAA,CAAAC,CAAAA,CACA,KAAMC,CAAAA,CACN,SAAA,CAAWZ,EACX,GAAA,CAAKO,CAAAA,CAAMP,CAAE,CAAA,CACb,KAAMa,CAAAA,CAAK,IAAA,CACX,SAAUA,CAAAA,CAAK,QAAA,CACf,SAAUA,CAAAA,CAAK,QACjB,CAAC,CAAA,CAEHF,GAASC,CAAAA,CAAG,OACd,CACA,OAAOF,CACT,CA0CO,SAASI,CAAAA,CAAML,CAAAA,CAAcM,CAAAA,CAAwB,EAAC,CAAW,CACtE,IAAMC,CAAAA,CAAO,CAAE,GAAGb,CAAAA,CAAiB,GAAGY,CAAQ,CAAA,CAE1CE,EAAMR,CAAAA,CACNO,CAAAA,CAAK,oBAAmBC,CAAAA,CAAMA,CAAAA,CAAI,QAAQ,QAAA,CAAU;AAAA,CAAI,CAAA,CAAA,CAE5D,IAAIC,CAAAA,CAAM,EAAA,CACV,QAAWN,CAAAA,IAAMK,CAAAA,CAAK,CACpB,IAAMjB,CAAAA,CAAKY,CAAAA,CAAG,YAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,CAAA,CACxB,GAAI,CAACa,CAAAA,CAAM,CACTK,CAAAA,EAAON,CAAAA,CACP,QACF,CACA,IAAMO,CAAAA,CAAUH,EAAKX,CAAAA,CAAgBQ,CAAAA,CAAK,QAAQ,CAAC,CAAA,CACnDK,CAAAA,EAAOC,CAAAA,CAAUN,CAAAA,CAAK,WAAA,CAAcD,EACtC,CAEA,OAAII,CAAAA,CAAK,kBAAA,GACPE,CAAAA,CAAMA,CAAAA,CAAI,QAAQ,WAAA,CAAa,GAAG,CAAA,CAAE,OAAA,CAAQ,iBAAA,CAAmB;AAAA,CAAI,GAEjEF,CAAAA,CAAK,IAAA,GAAME,CAAAA,CAAMA,CAAAA,CAAI,MAAK,CAAA,CAEvBA,CACT,CAeO,SAASE,EAAOX,CAAAA,CAAcM,CAAAA,CAAyB,EAAC,CAAW,CACxE,IAAMM,CAAAA,CAAQN,CAAAA,CAAQ,KAAA,GAAWO,CAAAA,EAAe,IAAIA,CAAAA,CAAE,GAAG,KACrDJ,CAAAA,CAAM,EAAA,CACNP,EAAQ,CAAA,CACZ,IAAA,IAAWC,CAAAA,IAAMH,CAAAA,CAAM,CACrB,IAAMT,CAAAA,CAAKY,EAAG,WAAA,CAAY,CAAC,EACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,CAAA,CACpBa,GAAQP,CAAAA,CAAqB,GAAA,CAAIO,EAAK,QAAQ,CAAA,CAChDK,GAAOG,CAAAA,CAAM,CACX,KAAA,CAAAV,CAAAA,CACA,KAAMC,CAAAA,CACN,SAAA,CAAWZ,EACX,GAAA,CAAKO,CAAAA,CAAMP,CAAE,C
AAA,CACb,IAAA,CAAMa,CAAAA,CAAK,IAAA,CACX,SAAUA,CAAAA,CAAK,QAAA,CACf,SAAUA,CAAAA,CAAK,QACjB,CAAC,CAAA,CAEDK,CAAAA,EAAON,CAAAA,CAETD,CAAAA,EAASC,EAAG,OACd,CACA,OAAOM,CACT,CClOA,IAAMK,CAAAA,CAAO,CAAA;;AAAA;AAAA;AAAA;;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA,CAAA,CAoBb,SAASC,CAAAA,CAAUC,CAAAA,CAAgB,CACjC,IAAMC,CAAAA,CAAQ,IAAI,GAAA,CACZC,CAAAA,CAAkB,EAAC,CACnBC,CAAAA,CAAgC,CACpC,IAAA,CAAM,QAAA,CAAU,IAAA,CAAM,SAAA,CAAW,IAAA,CAAM,UAAA,CACvC,IAAA,CAAM,cAAA,CAAgB,IAAA,CAAM,QAAA,CAAU,IAAA,CAAM,WAC9C,CAAA,CACA,IAAA,IAAWC,CAAAA,IAAOJ,EACZI,CAAAA,CAAI,UAAA,CAAW,GAAG,CAAA,CAAGH,CAAAA,CAAM,GAAA,CAAIE,CAAAA,CAAMC,CAAG,CAAA,EAAKA,CAAG,CAAA,CAC/CF,CAAAA,CAAM,IAAA,CAAKE,CAAG,CAAA,CAErB,OAAO,CAAE,KAAA,CAAAH,CAAAA,CAAO,KAAA,CAAAC,CAAM,CACxB,CAEA,SAASG,CAAAA,EAAoB,CAC3B,GAAI,CACF,OAAOC,eAAAA,CAAa,CAAA,CAAG,MAAM,CAC/B,CAAA,KAAQ,CACN,OAAO,EACT,CACF,CAEA,SAASC,CAAAA,CAAiBN,CAAAA,CAAkC,CAC1D,OAAIA,CAAAA,CAAM,GAAA,CAAI,cAAc,CAAA,CAAUtB,CAAAA,CAC/B,CACL,gBAAA,CAAkBsB,CAAAA,CAAM,GAAA,CAAI,gBAAgB,CAAA,CAC5C,UAAA,CAAYA,CAAAA,CAAM,GAAA,CAAI,cAAc,CAAA,CACpC,kBAAA,CAAoBA,CAAAA,CAAM,GAAA,CAAI,YAAY,CAAA,CAC1C,KAAMA,CAAAA,CAAM,GAAA,CAAI,QAAQ,CAAA,CACxB,eAAA,CAAiB,CAACA,CAAAA,CAAM,GAAA,CAAI,aAAa,CAC3C,CACF,CAEA,SAASO,CAAAA,CAAOxC,CAAAA,CAAcgB,CAAAA,CAAsB,CAClD,IAAMC,CAAAA,CAAWF,CAAAA,CAAKC,CAAI,CAAA,CACpByB,CAAAA,CAAQzC,CAAAA,EAAQ,SAAA,CACtB,GAAIiB,CAAAA,CAAS,MAAA,GAAW,CAAA,CACtB,OAAA,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM,UAAKwB,CAAK,CAAA;AAAA,CAAW,CAAA,CACnC,EAET,OAAA,CAAQ,MAAA,CAAO,MAAM,CAAA,OAAA,EAAKA,CAAK,CAAA,EAAA,EAAKxB,CAAAA,CAAS,MAAM,CAAA;AAAA,CAAe,CAAA,CAClE,IAAA,IAAWY,CAAAA,IAAKZ,CAAAA,CACd,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM,CAAA,IAAA,EAAOY,CAAAA,CAAE,GAAA,CAAI,MAAA,CAAO,EAAE,CAAC,CAAA,CAAA,EAAIA,CAAAA,CAAE,QAAA,CAAS,MAAA,CAAO,CAAC,CAAC,CAAA,CAAA,EAAIA,CAAAA,CAAE,QAAA,CAAS,MAAA,CAAO,EAAE,CAAC,CAAA,CAAA,EAAIA,CAAAA,CAAE,IAAI,CAAA,GAAA,EAAMA,EAAE,KAAK;AAAA,CAAI,EAE1H,OAAO,CACT,CAEA,eAAea,GAAwB,CACrC,GAAM,CAAE,KAAA,CAAAT,EAAO,KAAA,CAAAC,CAAM,EAAIH,CAAAA,CAAU,OAAA,CAAQ,KAAK,KAAA,CAAM,CA
AC,CAAC,CAAA,CAExD,GAAIE,CAAAA,CAAM,GAAA,CAAI,QAAQ,CAAA,CACpB,OAAA,OAAA,CAAQ,OAAO,KAAA,CAAMH,CAAI,CAAA,CAClB,CAAA,CAET,GAAIG,CAAAA,CAAM,GAAA,CAAI,WAAW,CAAA,CACvB,OAAA,OAAA,CAAQ,OAAO,KAAA,CAAM,CAAA;AAAA,CAAiB,CAAA,CAC/B,EAGT,IAAMX,CAAAA,CAAUiB,EAAiBN,CAAK,CAAA,CAChCU,EACJT,CAAAA,CAAM,MAAA,CAAS,EACXA,CAAAA,CAAM,GAAA,CAAKlC,IAAU,CAAE,IAAA,CAAAA,EAAM,IAAA,CAAMsC,eAAAA,CAAatC,CAAAA,CAAM,MAAM,CAAE,CAAA,CAAE,EAChE,CAAC,CAAE,KAAM,EAAA,CAAI,IAAA,CAAMqC,GAAY,CAAC,EAEtC,GAAIJ,CAAAA,CAAM,IAAI,QAAQ,CAAA,CAAG,CACvB,IAAIW,CAAAA,CAAO,EACX,IAAA,GAAW,CAAE,IAAA,CAAA5C,CAAAA,CAAM,IAAA,CAAAgB,CAAK,IAAK2B,CAAAA,CAAQC,CAAAA,CAAOJ,EAAOxC,CAAAA,CAAMgB,CAAI,GAAK4B,CAAAA,CAClE,OAAOA,CACT,CAEA,IAAIC,EAAW,CAAA,CACf,IAAA,GAAW,CAAE,IAAA,CAAA7C,CAAAA,CAAM,KAAAgB,CAAK,CAAA,GAAK2B,CAAAA,CAAQ,CACnC,IAAMG,CAAAA,CAASb,EAAM,GAAA,CAAI,UAAU,EAAIN,CAAAA,CAAOX,CAAI,EAAIK,CAAAA,CAAML,CAAAA,CAAMM,CAAO,CAAA,CACrEW,CAAAA,CAAM,GAAA,CAAI,SAAS,CAAA,EAAKjC,CAAAA,EAC1B+C,iBAAc/C,CAAAA,CAAM8C,CAAM,EAC1B,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM,CAAA,eAAA,EAAa9C,CAAI;AAAA,CAAI,GAE1C,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM8C,CAAM,EAE/B,CACA,OAAOD,CACT,CAEAH,GAAK,CAAE,IAAA,CAAME,GAAS,OAAA,CAAQ,IAAA,CAAKA,CAAI,CAAC,CAAA","file":"cli.cjs","sourcesContent":["/**\n * Character data tables for unspook.\n *\n * Every code point we care about, grouped by category, with a human-readable\n * name, a severity, and (where relevant) an ASCII/Latin replacement.\n */\n\nexport type Category =\n | \"zero-width\" // invisible, no width — ZWSP, BOM, word joiner, …\n | \"bidi\" // bidirectional controls — the \"Trojan Source\" attack class\n | \"tag\" // Unicode tag chars — invisible prompt-injection / watermarking\n | \"variation-selector\" // VS1–256 — can be used to hide data on a base char\n | \"invisible-space\" // looks like a space but isn't (NBSP, soft hyphen, …)\n | \"control\" // C0/C1 control characters\n | \"smart-punctuation\" // curly quotes, em dash, ellipsis → ASCII\n | \"homoglyph\"; // letters that look Latin but aren't (Cyrillic а, Greek 
ο, …)\n\nexport type Severity = \"danger\" | \"warning\" | \"info\";\n\nexport interface CharInfo {\n name: string;\n category: Category;\n severity: Severity;\n /** What to put in place of this char when cleaning. `\"\"` = drop entirely. */\n replacement: string;\n}\n\nexport const SEVERITY_BY_CATEGORY: Record<Category, Severity> = {\n bidi: \"danger\",\n tag: \"danger\",\n \"zero-width\": \"warning\",\n control: \"warning\",\n homoglyph: \"warning\",\n \"variation-selector\": \"warning\",\n \"invisible-space\": \"info\",\n \"smart-punctuation\": \"info\",\n};\n\nconst def = (name: string, category: Category, replacement: string): CharInfo => ({\n name,\n category,\n severity: SEVERITY_BY_CATEGORY[category],\n replacement,\n});\n\n/** Explicitly enumerated special characters, keyed by code point. */\nexport const SPECIAL: Map<number, CharInfo> = new Map([\n // — Zero-width / invisible —\n [0x200b, def(\"ZERO WIDTH SPACE\", \"zero-width\", \"\")],\n [0x200c, def(\"ZERO WIDTH NON-JOINER\", \"zero-width\", \"\")],\n [0x200d, def(\"ZERO WIDTH JOINER\", \"zero-width\", \"\")],\n [0x2060, def(\"WORD JOINER\", \"zero-width\", \"\")],\n [0x2061, def(\"FUNCTION APPLICATION\", \"zero-width\", \"\")],\n [0x2062, def(\"INVISIBLE TIMES\", \"zero-width\", \"\")],\n [0x2063, def(\"INVISIBLE SEPARATOR\", \"zero-width\", \"\")],\n [0x2064, def(\"INVISIBLE PLUS\", \"zero-width\", \"\")],\n [0xfeff, def(\"ZERO WIDTH NO-BREAK SPACE (BOM)\", \"zero-width\", \"\")],\n [0x180e, def(\"MONGOLIAN VOWEL SEPARATOR\", \"zero-width\", \"\")],\n\n // — Bidirectional controls (Trojan Source, CVE-2021-42574) —\n [0x202a, def(\"LEFT-TO-RIGHT EMBEDDING\", \"bidi\", \"\")],\n [0x202b, def(\"RIGHT-TO-LEFT EMBEDDING\", \"bidi\", \"\")],\n [0x202c, def(\"POP DIRECTIONAL FORMATTING\", \"bidi\", \"\")],\n [0x202d, def(\"LEFT-TO-RIGHT OVERRIDE\", \"bidi\", \"\")],\n [0x202e, def(\"RIGHT-TO-LEFT OVERRIDE\", \"bidi\", \"\")],\n [0x2066, def(\"LEFT-TO-RIGHT ISOLATE\", \"bidi\", \"\")],\n [0x2067, 
def(\"RIGHT-TO-LEFT ISOLATE\", \"bidi\", \"\")],\n [0x2068, def(\"FIRST STRONG ISOLATE\", \"bidi\", \"\")],\n [0x2069, def(\"POP DIRECTIONAL ISOLATE\", \"bidi\", \"\")],\n [0x200e, def(\"LEFT-TO-RIGHT MARK\", \"bidi\", \"\")],\n [0x200f, def(\"RIGHT-TO-LEFT MARK\", \"bidi\", \"\")],\n [0x061c, def(\"ARABIC LETTER MARK\", \"bidi\", \"\")],\n\n // — Spaces that aren't a normal space —\n [0x00a0, def(\"NO-BREAK SPACE\", \"invisible-space\", \" \")],\n [0x202f, def(\"NARROW NO-BREAK SPACE\", \"invisible-space\", \" \")],\n [0x2007, def(\"FIGURE SPACE\", \"invisible-space\", \" \")],\n [0x2008, def(\"PUNCTUATION SPACE\", \"invisible-space\", \" \")],\n [0x2009, def(\"THIN SPACE\", \"invisible-space\", \" \")],\n [0x200a, def(\"HAIR SPACE\", \"invisible-space\", \" \")],\n [0x2000, def(\"EN QUAD\", \"invisible-space\", \" \")],\n [0x2001, def(\"EM QUAD\", \"invisible-space\", \" \")],\n [0x2002, def(\"EN SPACE\", \"invisible-space\", \" \")],\n [0x2003, def(\"EM SPACE\", \"invisible-space\", \" \")],\n [0x2004, def(\"THREE-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x2005, def(\"FOUR-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x2006, def(\"SIX-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x205f, def(\"MEDIUM MATHEMATICAL SPACE\", \"invisible-space\", \" \")],\n [0x3000, def(\"IDEOGRAPHIC SPACE\", \"invisible-space\", \" \")],\n [0x1680, def(\"OGHAM SPACE MARK\", \"invisible-space\", \" \")],\n [0x00ad, def(\"SOFT HYPHEN\", \"invisible-space\", \"\")],\n [0x2028, def(\"LINE SEPARATOR\", \"invisible-space\", \"\\n\")],\n [0x2029, def(\"PARAGRAPH SEPARATOR\", \"invisible-space\", \"\\n\")],\n\n // — Smart / typographic punctuation → ASCII —\n [0x201c, def(\"LEFT DOUBLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201d, def(\"RIGHT DOUBLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201e, def(\"DOUBLE LOW-9 QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201f, def(\"DOUBLE HIGH-REVERSED-9 QUOTATION MARK\", \"smart-punctuation\", 
'\"')],\n [0x00ab, def(\"LEFT-POINTING DOUBLE ANGLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x00bb, def(\"RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x2018, def(\"LEFT SINGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2019, def(\"RIGHT SINGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x201a, def(\"SINGLE LOW-9 QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x201b, def(\"SINGLE HIGH-REVERSED-9 QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2039, def(\"SINGLE LEFT-POINTING ANGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x203a, def(\"SINGLE RIGHT-POINTING ANGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2032, def(\"PRIME\", \"smart-punctuation\", \"'\")],\n [0x2033, def(\"DOUBLE PRIME\", \"smart-punctuation\", '\"')],\n [0x2013, def(\"EN DASH\", \"smart-punctuation\", \"-\")],\n [0x2014, def(\"EM DASH\", \"smart-punctuation\", \"--\")],\n [0x2015, def(\"HORIZONTAL BAR\", \"smart-punctuation\", \"--\")],\n [0x2212, def(\"MINUS SIGN\", \"smart-punctuation\", \"-\")],\n [0x2026, def(\"HORIZONTAL ELLIPSIS\", \"smart-punctuation\", \"...\")],\n]);\n\n/** Names for the C0 control characters worth labelling. */\nconst C0_NAMES: Record<number, string> = {\n 0x00: \"NULL\", 0x01: \"START OF HEADING\", 0x02: \"START OF TEXT\", 0x03: \"END OF TEXT\",\n 0x04: \"END OF TRANSMISSION\", 0x05: \"ENQUIRY\", 0x06: \"ACKNOWLEDGE\", 0x07: \"BELL\",\n 0x08: \"BACKSPACE\", 0x0b: \"LINE TABULATION\", 0x0c: \"FORM FEED\", 0x0e: \"SHIFT OUT\",\n 0x0f: \"SHIFT IN\", 0x10: \"DATA LINK ESCAPE\", 0x1b: \"ESCAPE\", 0x7f: \"DELETE\",\n};\n\n/**\n * Homoglyphs: characters that render like a Latin letter/number but aren't.\n * Mapped to their Latin look-alike. 
(A curated, high-confidence subset of the\n * Unicode confusables data.)\n */\nexport const HOMOGLYPHS: Map<number, string> = new Map([\n // Cyrillic (lowercase)\n [0x0430, \"a\"], [0x0435, \"e\"], [0x043e, \"o\"], [0x0440, \"p\"], [0x0441, \"c\"],\n [0x0443, \"y\"], [0x0445, \"x\"], [0x0455, \"s\"], [0x0456, \"i\"], [0x0458, \"j\"],\n [0x04bb, \"h\"], [0x0501, \"d\"], [0x043a, \"k\"], [0x043c, \"m\"], [0x0442, \"t\"],\n // Cyrillic (uppercase)\n [0x0410, \"A\"], [0x0412, \"B\"], [0x0415, \"E\"], [0x041a, \"K\"], [0x041c, \"M\"],\n [0x041d, \"H\"], [0x041e, \"O\"], [0x0420, \"P\"], [0x0421, \"C\"], [0x0422, \"T\"],\n [0x0423, \"Y\"], [0x0425, \"X\"], [0x0405, \"S\"], [0x0406, \"I\"], [0x0408, \"J\"],\n // Greek\n [0x03bf, \"o\"], [0x03b1, \"a\"], [0x03b3, \"y\"], [0x03c1, \"p\"], [0x03c5, \"u\"],\n [0x0391, \"A\"], [0x0392, \"B\"], [0x0395, \"E\"], [0x0396, \"Z\"], [0x0397, \"H\"],\n [0x0399, \"I\"], [0x039a, \"K\"], [0x039c, \"M\"], [0x039d, \"N\"], [0x039f, \"O\"],\n [0x03a1, \"P\"], [0x03a4, \"T\"], [0x03a5, \"Y\"], [0x03a7, \"X\"],\n // Latin look-alikes / letterlike symbols\n [0x212f, \"e\"], [0x2113, \"l\"], [0x0131, \"i\"],\n]);\n\nconst ASCII_PRINTABLE = (cp: number): boolean => cp >= 0x20 && cp <= 0x7e;\n\n/** Classify a single code point, or return `null` if it's unremarkable. 
*/\nexport function classify(cp: number): CharInfo | null {\n const special = SPECIAL.get(cp);\n if (special) return special;\n\n // Tag characters (used for invisible prompt injection / watermarking).\n if (cp >= 0xe0000 && cp <= 0xe007f) {\n return def(\"TAG CHARACTER\", \"tag\", \"\");\n }\n // Variation selectors (can hide data on a base glyph).\n if ((cp >= 0xfe00 && cp <= 0xfe0f) || (cp >= 0xe0100 && cp <= 0xe01ef)) {\n return def(\"VARIATION SELECTOR\", \"variation-selector\", \"\");\n }\n // Fullwidth ASCII forms → normal ASCII.\n if (cp >= 0xff01 && cp <= 0xff5e) {\n return def(\"FULLWIDTH FORM\", \"homoglyph\", String.fromCharCode(cp - 0xfee0));\n }\n // Explicit homoglyph table.\n const latin = HOMOGLYPHS.get(cp);\n if (latin !== undefined) {\n return def(`HOMOGLYPH OF \"${latin}\"`, \"homoglyph\", latin);\n }\n // C0/C1 control characters (excluding tab, newline, carriage return).\n if (\n (cp <= 0x1f && cp !== 0x09 && cp !== 0x0a && cp !== 0x0d) ||\n cp === 0x7f ||\n (cp >= 0x80 && cp <= 0x9f)\n ) {\n const name = C0_NAMES[cp] ?? \"CONTROL CHARACTER\";\n return def(name, \"control\", \"\");\n }\n // Everything else (incl. normal ASCII, emoji, CJK, accented Latin) is fine.\n void ASCII_PRINTABLE;\n return null;\n}\n","/**\n * unspook — find and remove the invisible, dangerous, and confusable\n * characters hiding in your text.\n *\n * Zero dependencies. Pure functions. Runs anywhere JavaScript does.\n */\n\nimport { classify, type Category, type Severity } from \"./data.js\";\n\nexport type { Category, Severity } from \"./data.js\";\n\nexport interface Finding {\n /** UTF-16 index of the character in the source string. */\n index: number;\n /** The offending character itself. */\n char: string;\n /** Unicode code point. */\n codePoint: number;\n /** Formatted code point, e.g. `\"U+200B\"`. */\n hex: string;\n /** Human-readable Unicode name. 
*/\n name: string;\n category: Category;\n severity: Severity;\n}\n\nexport interface CleanOptions {\n /** Remove zero-width / invisible characters (ZWSP, BOM, word joiner…). Default `true`. */\n zeroWidth?: boolean;\n /** Remove bidirectional control characters (the \"Trojan Source\" class). Default `true`. */\n bidi?: boolean;\n /** Remove Unicode tag characters (invisible prompt-injection / watermarks). Default `true`. */\n tag?: boolean;\n /** Remove C0/C1 control characters. Default `true`. */\n control?: boolean;\n /** Remove variation selectors. Default `false` (they're legitimate in emoji). */\n variationSelectors?: boolean;\n /** Normalize exotic spaces (NBSP→space, soft hyphen→removed, line sep→newline). Default `true`. */\n invisibleSpaces?: boolean;\n /** Convert smart/typographic punctuation to ASCII (“ ”→\", —→--, …→...). Default `false`. */\n smartPunctuation?: boolean;\n /** Map homoglyphs to their Latin look-alike (Cyrillic а→a, fullwidth A→A). Default `false`. */\n homoglyphs?: boolean;\n /** Collapse runs of spaces/tabs into one space. Default `false`. */\n collapseWhitespace?: boolean;\n /** Normalize `\\r\\n` and `\\r` to `\\n`. Default `true`. */\n normalizeNewlines?: boolean;\n /** Trim leading/trailing whitespace from the whole string. Default `false`. */\n trim?: boolean;\n}\n\n/** The default, safe cleaning profile: strip the dangerous & invisible, keep meaning. */\nexport const DEFAULT_OPTIONS: Required<CleanOptions> = {\n zeroWidth: true,\n bidi: true,\n tag: true,\n control: true,\n variationSelectors: false,\n invisibleSpaces: true,\n smartPunctuation: false,\n homoglyphs: false,\n collapseWhitespace: false,\n normalizeNewlines: true,\n trim: false,\n};\n\n/** Turn everything on — for when you want maximally plain ASCII-ish text. 
*/\nexport const AGGRESSIVE_OPTIONS: Required<CleanOptions> = {\n zeroWidth: true,\n bidi: true,\n tag: true,\n control: true,\n variationSelectors: true,\n invisibleSpaces: true,\n smartPunctuation: true,\n homoglyphs: true,\n collapseWhitespace: true,\n normalizeNewlines: true,\n trim: true,\n};\n\nconst CATEGORY_OPTION: Record<Category, keyof CleanOptions> = {\n \"zero-width\": \"zeroWidth\",\n bidi: \"bidi\",\n tag: \"tag\",\n control: \"control\",\n \"variation-selector\": \"variationSelectors\",\n \"invisible-space\": \"invisibleSpaces\",\n \"smart-punctuation\": \"smartPunctuation\",\n homoglyph: \"homoglyphs\",\n};\n\n/** Categories whose characters are genuinely invisible (used by {@link reveal}). */\nconst INVISIBLE_CATEGORIES = new Set<Category>([\n \"zero-width\",\n \"bidi\",\n \"tag\",\n \"variation-selector\",\n \"control\",\n \"invisible-space\",\n]);\n\nfunction toHex(cp: number): string {\n return \"U+\" + cp.toString(16).toUpperCase().padStart(4, \"0\");\n}\n\n/**\n * Find every suspicious character in `text`.\n *\n * ```ts\n * scan(\"hi​there\");\n * // [{ index: 2, char: \"​\", codePoint: 8203, hex: \"U+200B\",\n * // name: \"ZERO WIDTH SPACE\", category: \"zero-width\", severity: \"warning\" }]\n * ```\n */\nexport function scan(text: string): Finding[] {\n const findings: Finding[] = [];\n let index = 0;\n for (const ch of text) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (info) {\n findings.push({\n index,\n char: ch,\n codePoint: cp,\n hex: toHex(cp),\n name: info.name,\n category: info.category,\n severity: info.severity,\n });\n }\n index += ch.length;\n }\n return findings;\n}\n\n/** `true` if `text` contains no suspicious characters at all. 
*/\nexport function isClean(text: string): boolean {\n for (const ch of text) {\n if (classify(ch.codePointAt(0) as number)) return false;\n }\n return true;\n}\n\nexport interface Stats {\n total: number;\n byCategory: Record<Category, number>;\n bySeverity: Record<Severity, number>;\n}\n\n/** Summarize what's lurking in `text` without listing every occurrence. */\nexport function stats(text: string): Stats {\n const byCategory = {\n \"zero-width\": 0, bidi: 0, tag: 0, \"variation-selector\": 0,\n \"invisible-space\": 0, control: 0, \"smart-punctuation\": 0, homoglyph: 0,\n } as Record<Category, number>;\n const bySeverity = { danger: 0, warning: 0, info: 0 } as Record<Severity, number>;\n let total = 0;\n for (const f of scan(text)) {\n byCategory[f.category]++;\n bySeverity[f.severity]++;\n total++;\n }\n return { total, byCategory, bySeverity };\n}\n\n/**\n * Return a cleaned copy of `text`. By default it strips the dangerous and\n * invisible characters while preserving the visible meaning of your text.\n *\n * ```ts\n * clean(\"Hello​world\"); // \"Helloworld\"\n * clean(\"“quote”\", { smartPunctuation: true }); // '\"quote\"'\n * clean(\"аdmin\", { homoglyphs: true }); // \"admin\" (Cyrillic а → a)\n * ```\n */\nexport function clean(text: string, options: CleanOptions = {}): string {\n const opts = { ...DEFAULT_OPTIONS, ...options };\n\n let src = text;\n if (opts.normalizeNewlines) src = src.replace(/\\r\\n?/g, \"\\n\");\n\n let out = \"\";\n for (const ch of src) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (!info) {\n out += ch;\n continue;\n }\n const enabled = opts[CATEGORY_OPTION[info.category]];\n out += enabled ? info.replacement : ch;\n }\n\n if (opts.collapseWhitespace) {\n out = out.replace(/[^\\S\\n]+/g, \" \").replace(/[ \\t]*\\n[ \\t]*/g, \"\\n\");\n }\n if (opts.trim) out = out.trim();\n\n return out;\n}\n\nexport interface RevealOptions {\n /** Custom token renderer. Default: `(f) => \"[\" + f.hex + \"]\"`. 
*/\n token?: (finding: Finding) => string;\n}\n\n/**\n * Make invisible characters visible by replacing them with a readable token,\n * leaving everything else untouched — handy for logs and terminals.\n *\n * ```ts\n * reveal(\"a​b\"); // \"a[U+200B]b\"\n * ```\n */\nexport function reveal(text: string, options: RevealOptions = {}): string {\n const token = options.token ?? ((f: Finding) => `[${f.hex}]`);\n let out = \"\";\n let index = 0;\n for (const ch of text) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (info && INVISIBLE_CATEGORIES.has(info.category)) {\n out += token({\n index,\n char: ch,\n codePoint: cp,\n hex: toHex(cp),\n name: info.name,\n category: info.category,\n severity: info.severity,\n });\n } else {\n out += ch;\n }\n index += ch.length;\n }\n return out;\n}\n","#!/usr/bin/env node\n/**\n * unspook CLI — clean or scan text from files or stdin.\n *\n * unspook file.txt # print cleaned text\n * cat file.txt | unspook # works as a filter\n * unspook --scan README.md # report findings, exit 1 if any (great for CI)\n * unspook -w notes.md # clean in place\n * unspook --aggressive --reveal x # show what's hiding\n */\nimport { readFileSync, writeFileSync } from \"node:fs\";\nimport { AGGRESSIVE_OPTIONS, clean, reveal, scan, type CleanOptions } from \"./index.js\";\n\nconst HELP = `unspook — find & remove invisible / dangerous / confusable characters\n\nUsage:\n unspook [options] [files...]\n cat file | unspook [options]\n\nOptions:\n -s, --scan Report findings instead of cleaning; exit 1 if any found\n -w, --write Rewrite files in place (with cleaned output)\n -r, --reveal Print text with invisible characters made visible\n -a, --aggressive Also normalize smart punctuation, homoglyphs & whitespace\n --smart-quotes Convert smart punctuation to ASCII\n --homoglyphs Map look-alike letters to Latin (Cyrillic а → a)\n --collapse Collapse runs of whitespace\n --trim Trim leading/trailing whitespace\n --keep-nbsp Keep 
non-breaking & exotic spaces\n -h, --help Show this help\n -v, --version Show version\n`;\n\nfunction parseArgs(argv: string[]) {\n const flags = new Set<string>();\n const files: string[] = [];\n const alias: Record<string, string> = {\n \"-s\": \"--scan\", \"-w\": \"--write\", \"-r\": \"--reveal\",\n \"-a\": \"--aggressive\", \"-h\": \"--help\", \"-v\": \"--version\",\n };\n for (const arg of argv) {\n if (arg.startsWith(\"-\")) flags.add(alias[arg] ?? arg);\n else files.push(arg);\n }\n return { flags, files };\n}\n\nfunction readStdin(): string {\n try {\n return readFileSync(0, \"utf8\");\n } catch {\n return \"\";\n }\n}\n\nfunction optionsFromFlags(flags: Set<string>): CleanOptions {\n if (flags.has(\"--aggressive\")) return AGGRESSIVE_OPTIONS;\n return {\n smartPunctuation: flags.has(\"--smart-quotes\"),\n homoglyphs: flags.has(\"--homoglyphs\"),\n collapseWhitespace: flags.has(\"--collapse\"),\n trim: flags.has(\"--trim\"),\n invisibleSpaces: !flags.has(\"--keep-nbsp\"),\n };\n}\n\nfunction report(name: string, text: string): number {\n const findings = scan(text);\n const label = name || \"<stdin>\";\n if (findings.length === 0) {\n process.stderr.write(`✓ ${label}: clean\\n`);\n return 0;\n }\n process.stderr.write(`✗ ${label}: ${findings.length} finding(s)\\n`);\n for (const f of findings) {\n process.stderr.write(` ${f.hex.padEnd(10)} ${f.severity.padEnd(8)} ${f.category.padEnd(18)} ${f.name} @ ${f.index}\\n`);\n }\n return 1;\n}\n\nasync function main(): Promise<number> {\n const { flags, files } = parseArgs(process.argv.slice(2));\n\n if (flags.has(\"--help\")) {\n process.stdout.write(HELP);\n return 0;\n }\n if (flags.has(\"--version\")) {\n process.stdout.write(\"unspook 0.1.0\\n\");\n return 0;\n }\n\n const options = optionsFromFlags(flags);\n const inputs: { name: string; text: string }[] =\n files.length > 0\n ? 
files.map((name) => ({ name, text: readFileSync(name, \"utf8\") }))\n : [{ name: \"\", text: readStdin() }];\n\n if (flags.has(\"--scan\")) {\n let code = 0;\n for (const { name, text } of inputs) code = report(name, text) || code;\n return code;\n }\n\n let exitCode = 0;\n for (const { name, text } of inputs) {\n const output = flags.has(\"--reveal\") ? reveal(text) : clean(text, options);\n if (flags.has(\"--write\") && name) {\n writeFileSync(name, output);\n process.stderr.write(`✓ cleaned ${name}\\n`);\n } else {\n process.stdout.write(output);\n }\n }\n return exitCode;\n}\n\nmain().then((code) => process.exit(code));\n"]}
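Note on the `scan` source embedded in the map above: it iterates the string by code point (`for (const ch of text)`) but reports `index` as a UTF-16 offset, advancing by `ch.length` each step. The sketch below is a minimal illustration of that bookkeeping, not the package's code: `KNOWN`, `MiniFinding`, and `miniScan` are hypothetical names, and the three-entry table is a stand-in for the package's much larger `classify` data.

```typescript
// Hypothetical, trimmed-down stand-in for the package's classify() table.
// Only three code points are recognised here, purely for illustration.
const KNOWN: Map<number, string> = new Map([
  [0x200b, "ZERO WIDTH SPACE"],
  [0x202e, "RIGHT-TO-LEFT OVERRIDE"],
  [0x00a0, "NO-BREAK SPACE"],
]);

interface MiniFinding {
  index: number; // UTF-16 offset into the source string
  hex: string;   // formatted code point, e.g. "U+200B"
  name: string;
}

// Same formatting rule as the toHex helper in the embedded source.
function toHex(cp: number): string {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

function miniScan(text: string): MiniFinding[] {
  const findings: MiniFinding[] = [];
  let index = 0;
  // for...of walks the string by code point, so an astral character
  // (one surrogate pair) arrives as a single `ch` of length 2.
  for (const ch of text) {
    const cp = ch.codePointAt(0) as number;
    const name = KNOWN.get(cp);
    if (name !== undefined) findings.push({ index, hex: toHex(cp), name });
    // Advance by UTF-16 code units so `index` stays usable with
    // String.prototype.slice / charAt on the original string.
    index += ch.length;
  }
  return findings;
}

// "a" occupies offset 0, the emoji offsets 1-2, the zero-width space offset 3.
console.log(miniScan("a\u{1F47B}\u200Bb"));
```

Because the offset counts UTF-16 code units, a finding's `index` can be passed straight to `text.slice(index, index + 1)` for BMP characters, while characters above U+FFFF (tag characters, the supplementary variation selectors) span two units.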
package/dist/cli.js ADDED
@@ -0,0 +1,26 @@
1
+ #!/usr/bin/env node
2
+ import {g as g$1,f,b,c}from'./chunk-PXW2RFNH.js';import {readFileSync,writeFileSync}from'fs';var d=`unspook \u2014 find & remove invisible / dangerous / confusable characters
3
+
4
+ Usage:
5
+ unspook [options] [files...]
6
+ cat file | unspook [options]
7
+
8
+ Options:
9
+ -s, --scan Report findings instead of cleaning; exit 1 if any found
10
+ -w, --write Rewrite files in place (with cleaned output)
11
+ -r, --reveal Print text with invisible characters made visible
12
+ -a, --aggressive Also normalize smart punctuation, homoglyphs & whitespace
13
+ --smart-quotes Convert smart punctuation to ASCII
14
+ --homoglyphs Map look-alike letters to Latin (Cyrillic \u0430 \u2192 a)
15
+ --collapse Collapse runs of whitespace
16
+ --trim Trim leading/trailing whitespace
17
+ --keep-nbsp Keep non-breaking & exotic spaces
18
+ -h, --help Show this help
19
+ -v, --version Show version
20
+ `;function g(e){let r=new Set,n=[],i={"-s":"--scan","-w":"--write","-r":"--reveal","-a":"--aggressive","-h":"--help","-v":"--version"};for(let t of e)t.startsWith("-")?r.add(i[t]??t):n.push(t);return {flags:r,files:n}}function m(){try{return readFileSync(0,"utf8")}catch{return ""}}function v(e){return e.has("--aggressive")?b:{smartPunctuation:e.has("--smart-quotes"),homoglyphs:e.has("--homoglyphs"),collapseWhitespace:e.has("--collapse"),trim:e.has("--trim"),invisibleSpaces:!e.has("--keep-nbsp")}}function w(e,r){let n=c(r),i=e||"<stdin>";if(n.length===0)return process.stderr.write(`\u2713 ${i}: clean
21
+ `),0;process.stderr.write(`\u2717 ${i}: ${n.length} finding(s)
22
+ `);for(let t of n)process.stderr.write(` ${t.hex.padEnd(10)} ${t.severity.padEnd(8)} ${t.category.padEnd(18)} ${t.name} @ ${t.index}
23
+ `);return 1}async function y(){let{flags:e,files:r}=g(process.argv.slice(2));if(e.has("--help"))return process.stdout.write(d),0;if(e.has("--version"))return process.stdout.write(`unspook 0.1.0
24
+ `),0;let n=v(e),i=r.length>0?r.map(s=>({name:s,text:readFileSync(s,"utf8")})):[{name:"",text:m()}];if(e.has("--scan")){let s=0;for(let{name:o,text:a}of i)s=w(o,a)||s;return s}let t=0;for(let{name:s,text:o}of i){let a=e.has("--reveal")?g$1(o):f(o,n);e.has("--write")&&s?(writeFileSync(s,a),process.stderr.write(`\u2713 cleaned ${s}
25
+ `)):process.stdout.write(a);}return t}y().then(e=>process.exit(e));//# sourceMappingURL=cli.js.map
26
+ //# sourceMappingURL=cli.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/cli.ts"],"names":["HELP","parseArgs","argv","flags","files","alias","arg","readStdin","readFileSync","optionsFromFlags","AGGRESSIVE_OPTIONS","report","name","text","findings","scan","label","f","main","options","inputs","code","exitCode","output","reveal","clean","writeFileSync"],"mappings":";6FAaA,IAAMA,CAAAA,CAAO,CAAA;;AAAA;AAAA;AAAA;;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA,CAAA,CAoBb,SAASC,CAAAA,CAAUC,CAAAA,CAAgB,CACjC,IAAMC,CAAAA,CAAQ,IAAI,GAAA,CACZC,CAAAA,CAAkB,EAAC,CACnBC,CAAAA,CAAgC,CACpC,IAAA,CAAM,QAAA,CAAU,IAAA,CAAM,SAAA,CAAW,IAAA,CAAM,UAAA,CACvC,IAAA,CAAM,cAAA,CAAgB,IAAA,CAAM,QAAA,CAAU,IAAA,CAAM,WAC9C,CAAA,CACA,IAAA,IAAWC,CAAAA,IAAOJ,EACZI,CAAAA,CAAI,UAAA,CAAW,GAAG,CAAA,CAAGH,CAAAA,CAAM,GAAA,CAAIE,CAAAA,CAAMC,CAAG,CAAA,EAAKA,CAAG,CAAA,CAC/CF,CAAAA,CAAM,IAAA,CAAKE,CAAG,CAAA,CAErB,OAAO,CAAE,KAAA,CAAAH,CAAAA,CAAO,KAAA,CAAAC,CAAM,CACxB,CAEA,SAASG,CAAAA,EAAoB,CAC3B,GAAI,CACF,OAAOC,YAAAA,CAAa,CAAA,CAAG,MAAM,CAC/B,CAAA,KAAQ,CACN,OAAO,EACT,CACF,CAEA,SAASC,CAAAA,CAAiBN,CAAAA,CAAkC,CAC1D,OAAIA,CAAAA,CAAM,GAAA,CAAI,cAAc,CAAA,CAAUO,CAAAA,CAC/B,CACL,gBAAA,CAAkBP,CAAAA,CAAM,GAAA,CAAI,gBAAgB,CAAA,CAC5C,UAAA,CAAYA,CAAAA,CAAM,GAAA,CAAI,cAAc,CAAA,CACpC,kBAAA,CAAoBA,CAAAA,CAAM,GAAA,CAAI,YAAY,CAAA,CAC1C,KAAMA,CAAAA,CAAM,GAAA,CAAI,QAAQ,CAAA,CACxB,eAAA,CAAiB,CAACA,CAAAA,CAAM,GAAA,CAAI,aAAa,CAC3C,CACF,CAEA,SAASQ,CAAAA,CAAOC,CAAAA,CAAcC,CAAAA,CAAsB,CAClD,IAAMC,CAAAA,CAAWC,CAAAA,CAAKF,CAAI,CAAA,CACpBG,CAAAA,CAAQJ,CAAAA,EAAQ,SAAA,CACtB,GAAIE,CAAAA,CAAS,MAAA,GAAW,CAAA,CACtB,OAAA,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM,UAAKE,CAAK,CAAA;AAAA,CAAW,CAAA,CACnC,EAET,OAAA,CAAQ,MAAA,CAAO,MAAM,CAAA,OAAA,EAAKA,CAAK,CAAA,EAAA,EAAKF,CAAAA,CAAS,MAAM,CAAA;AAAA,CAAe,CAAA,CAClE,IAAA,IAAWG,CAAAA,IAAKH,CAAAA,CACd,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM,CAAA,IAAA,EAAOG,CAAAA,CAAE,GAAA,CAAI,MAAA,CAAO,EAAE,CAAC,CAAA,CAAA,EAAIA,CAAAA,CAAE,QAAA,CAAS,MAAA,CAAO,CAAC,CAAC,CAAA,CAAA,EAAIA,CAAAA,CAAE,QAAA,CAAS,MAAA,CAAO,EAAE,CAAC,CAAA,CAAA,EAAIA,CAAAA,CAAE,IAAI,CAAA,GAAA,E
AAMA,EAAE,KAAK;AAAA,CAAI,EAE1H,OAAO,CACT,CAEA,eAAeC,GAAwB,CACrC,GAAM,CAAE,KAAA,CAAAf,EAAO,KAAA,CAAAC,CAAM,EAAIH,CAAAA,CAAU,OAAA,CAAQ,KAAK,KAAA,CAAM,CAAC,CAAC,CAAA,CAExD,GAAIE,CAAAA,CAAM,GAAA,CAAI,QAAQ,CAAA,CACpB,OAAA,OAAA,CAAQ,OAAO,KAAA,CAAMH,CAAI,CAAA,CAClB,CAAA,CAET,GAAIG,CAAAA,CAAM,GAAA,CAAI,WAAW,CAAA,CACvB,OAAA,OAAA,CAAQ,OAAO,KAAA,CAAM,CAAA;AAAA,CAAiB,CAAA,CAC/B,EAGT,IAAMgB,CAAAA,CAAUV,EAAiBN,CAAK,CAAA,CAChCiB,EACJhB,CAAAA,CAAM,MAAA,CAAS,EACXA,CAAAA,CAAM,GAAA,CAAKQ,IAAU,CAAE,IAAA,CAAAA,EAAM,IAAA,CAAMJ,YAAAA,CAAaI,CAAAA,CAAM,MAAM,CAAE,CAAA,CAAE,EAChE,CAAC,CAAE,KAAM,EAAA,CAAI,IAAA,CAAML,GAAY,CAAC,EAEtC,GAAIJ,CAAAA,CAAM,IAAI,QAAQ,CAAA,CAAG,CACvB,IAAIkB,CAAAA,CAAO,EACX,IAAA,GAAW,CAAE,IAAA,CAAAT,CAAAA,CAAM,IAAA,CAAAC,CAAK,IAAKO,CAAAA,CAAQC,CAAAA,CAAOV,EAAOC,CAAAA,CAAMC,CAAI,GAAKQ,CAAAA,CAClE,OAAOA,CACT,CAEA,IAAIC,EAAW,CAAA,CACf,IAAA,GAAW,CAAE,IAAA,CAAAV,CAAAA,CAAM,KAAAC,CAAK,CAAA,GAAKO,CAAAA,CAAQ,CACnC,IAAMG,CAAAA,CAASpB,EAAM,GAAA,CAAI,UAAU,EAAIqB,GAAAA,CAAOX,CAAI,EAAIY,CAAAA,CAAMZ,CAAAA,CAAMM,CAAO,CAAA,CACrEhB,CAAAA,CAAM,GAAA,CAAI,SAAS,CAAA,EAAKS,CAAAA,EAC1Bc,cAAcd,CAAAA,CAAMW,CAAM,EAC1B,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAM,CAAA,eAAA,EAAaX,CAAI;AAAA,CAAI,GAE1C,OAAA,CAAQ,MAAA,CAAO,KAAA,CAAMW,CAAM,EAE/B,CACA,OAAOD,CACT,CAEAJ,GAAK,CAAE,IAAA,CAAMG,GAAS,OAAA,CAAQ,IAAA,CAAKA,CAAI,CAAC,CAAA","file":"cli.js","sourcesContent":["#!/usr/bin/env node\n/**\n * unspook CLI — clean or scan text from files or stdin.\n *\n * unspook file.txt # print cleaned text\n * cat file.txt | unspook # works as a filter\n * unspook --scan README.md # report findings, exit 1 if any (great for CI)\n * unspook -w notes.md # clean in place\n * unspook --aggressive --reveal x # show what's hiding\n */\nimport { readFileSync, writeFileSync } from \"node:fs\";\nimport { AGGRESSIVE_OPTIONS, clean, reveal, scan, type CleanOptions } from \"./index.js\";\n\nconst HELP = `unspook — find & remove invisible / dangerous / confusable characters\n\nUsage:\n unspook [options] [files...]\n cat file | unspook 
[options]\n\nOptions:\n -s, --scan Report findings instead of cleaning; exit 1 if any found\n -w, --write Rewrite files in place (with cleaned output)\n -r, --reveal Print text with invisible characters made visible\n -a, --aggressive Also normalize smart punctuation, homoglyphs & whitespace\n --smart-quotes Convert smart punctuation to ASCII\n --homoglyphs Map look-alike letters to Latin (Cyrillic а → a)\n --collapse Collapse runs of whitespace\n --trim Trim leading/trailing whitespace\n --keep-nbsp Keep non-breaking & exotic spaces\n -h, --help Show this help\n -v, --version Show version\n`;\n\nfunction parseArgs(argv: string[]) {\n const flags = new Set<string>();\n const files: string[] = [];\n const alias: Record<string, string> = {\n \"-s\": \"--scan\", \"-w\": \"--write\", \"-r\": \"--reveal\",\n \"-a\": \"--aggressive\", \"-h\": \"--help\", \"-v\": \"--version\",\n };\n for (const arg of argv) {\n if (arg.startsWith(\"-\")) flags.add(alias[arg] ?? arg);\n else files.push(arg);\n }\n return { flags, files };\n}\n\nfunction readStdin(): string {\n try {\n return readFileSync(0, \"utf8\");\n } catch {\n return \"\";\n }\n}\n\nfunction optionsFromFlags(flags: Set<string>): CleanOptions {\n if (flags.has(\"--aggressive\")) return AGGRESSIVE_OPTIONS;\n return {\n smartPunctuation: flags.has(\"--smart-quotes\"),\n homoglyphs: flags.has(\"--homoglyphs\"),\n collapseWhitespace: flags.has(\"--collapse\"),\n trim: flags.has(\"--trim\"),\n invisibleSpaces: !flags.has(\"--keep-nbsp\"),\n };\n}\n\nfunction report(name: string, text: string): number {\n const findings = scan(text);\n const label = name || \"<stdin>\";\n if (findings.length === 0) {\n process.stderr.write(`✓ ${label}: clean\\n`);\n return 0;\n }\n process.stderr.write(`✗ ${label}: ${findings.length} finding(s)\\n`);\n for (const f of findings) {\n process.stderr.write(` ${f.hex.padEnd(10)} ${f.severity.padEnd(8)} ${f.category.padEnd(18)} ${f.name} @ ${f.index}\\n`);\n }\n return 1;\n}\n\nasync function 
main(): Promise<number> {\n const { flags, files } = parseArgs(process.argv.slice(2));\n\n if (flags.has(\"--help\")) {\n process.stdout.write(HELP);\n return 0;\n }\n if (flags.has(\"--version\")) {\n process.stdout.write(\"unspook 0.1.0\\n\");\n return 0;\n }\n\n const options = optionsFromFlags(flags);\n const inputs: { name: string; text: string }[] =\n files.length > 0\n ? files.map((name) => ({ name, text: readFileSync(name, \"utf8\") }))\n : [{ name: \"\", text: readStdin() }];\n\n if (flags.has(\"--scan\")) {\n let code = 0;\n for (const { name, text } of inputs) code = report(name, text) || code;\n return code;\n }\n\n let exitCode = 0;\n for (const { name, text } of inputs) {\n const output = flags.has(\"--reveal\") ? reveal(text) : clean(text, options);\n if (flags.has(\"--write\") && name) {\n writeFileSync(name, output);\n process.stderr.write(`✓ cleaned ${name}\\n`);\n } else {\n process.stdout.write(output);\n }\n }\n return exitCode;\n}\n\nmain().then((code) => process.exit(code));\n"]}
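Note on `parseArgs` in the embedded `cli.ts` above: every argument starting with `-` is added to the flag set after a single alias lookup, so bundled short flags such as `-sw` are not expanded and unrecognised flags are kept rather than rejected. The sketch below re-types that logic to show the consequence; `ALIAS` and `parseArgsSketch` are illustrative names, not exports of the package.

```typescript
// Alias table as it appears in the embedded cli.ts source.
const ALIAS: Record<string, string> = {
  "-s": "--scan", "-w": "--write", "-r": "--reveal",
  "-a": "--aggressive", "-h": "--help", "-v": "--version",
};

// Illustrative re-typing of the parsing loop; the name is hypothetical.
function parseArgsSketch(argv: string[]): { flags: Set<string>; files: string[] } {
  const flags = new Set<string>();
  const files: string[] = [];
  for (const arg of argv) {
    // Anything beginning with "-" is treated as a flag. Unknown flags
    // fall through the alias lookup and are stored unchanged.
    if (arg.startsWith("-")) flags.add(ALIAS[arg] ?? arg);
    else files.push(arg);
  }
  return { flags, files };
}

const parsed = parseArgsSketch(["-s", "-sw", "README.md"]);
// "-s" becomes "--scan"; "-sw" is stored as-is and does not enable --write.
console.log(Array.from(parsed.flags), parsed.files);
```

Under this logic, each short flag has to be passed separately (`-s -w`) to take effect, and a mistyped option would be ignored silently rather than reported.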
package/dist/index.cjs ADDED
@@ -0,0 +1,6 @@
1
+ 'use strict';var I={bidi:"danger",tag:"danger","zero-width":"warning",control:"warning",homoglyph:"warning","variation-selector":"warning","invisible-space":"info","smart-punctuation":"info"},e=(t,n,i)=>({name:t,category:n,severity:I[n],replacement:i}),O=new Map([[8203,e("ZERO WIDTH SPACE","zero-width","")],[8204,e("ZERO WIDTH NON-JOINER","zero-width","")],[8205,e("ZERO WIDTH JOINER","zero-width","")],[8288,e("WORD JOINER","zero-width","")],[8289,e("FUNCTION APPLICATION","zero-width","")],[8290,e("INVISIBLE TIMES","zero-width","")],[8291,e("INVISIBLE SEPARATOR","zero-width","")],[8292,e("INVISIBLE PLUS","zero-width","")],[65279,e("ZERO WIDTH NO-BREAK SPACE (BOM)","zero-width","")],[6158,e("MONGOLIAN VOWEL SEPARATOR","zero-width","")],[8234,e("LEFT-TO-RIGHT EMBEDDING","bidi","")],[8235,e("RIGHT-TO-LEFT EMBEDDING","bidi","")],[8236,e("POP DIRECTIONAL FORMATTING","bidi","")],[8237,e("LEFT-TO-RIGHT OVERRIDE","bidi","")],[8238,e("RIGHT-TO-LEFT OVERRIDE","bidi","")],[8294,e("LEFT-TO-RIGHT ISOLATE","bidi","")],[8295,e("RIGHT-TO-LEFT ISOLATE","bidi","")],[8296,e("FIRST STRONG ISOLATE","bidi","")],[8297,e("POP DIRECTIONAL ISOLATE","bidi","")],[8206,e("LEFT-TO-RIGHT MARK","bidi","")],[8207,e("RIGHT-TO-LEFT MARK","bidi","")],[1564,e("ARABIC LETTER MARK","bidi","")],[160,e("NO-BREAK SPACE","invisible-space"," ")],[8239,e("NARROW NO-BREAK SPACE","invisible-space"," ")],[8199,e("FIGURE SPACE","invisible-space"," ")],[8200,e("PUNCTUATION SPACE","invisible-space"," ")],[8201,e("THIN SPACE","invisible-space"," ")],[8202,e("HAIR SPACE","invisible-space"," ")],[8192,e("EN QUAD","invisible-space"," ")],[8193,e("EM QUAD","invisible-space"," ")],[8194,e("EN SPACE","invisible-space"," ")],[8195,e("EM SPACE","invisible-space"," ")],[8196,e("THREE-PER-EM SPACE","invisible-space"," ")],[8197,e("FOUR-PER-EM SPACE","invisible-space"," ")],[8198,e("SIX-PER-EM SPACE","invisible-space"," ")],[8287,e("MEDIUM MATHEMATICAL SPACE","invisible-space"," ")],[12288,e("IDEOGRAPHIC 
SPACE","invisible-space"," ")],[5760,e("OGHAM SPACE MARK","invisible-space"," ")],[173,e("SOFT HYPHEN","invisible-space","")],[8232,e("LINE SEPARATOR","invisible-space",`
2
+ `)],[8233,e("PARAGRAPH SEPARATOR","invisible-space",`
3
+ `)],[8220,e("LEFT DOUBLE QUOTATION MARK","smart-punctuation",'"')],[8221,e("RIGHT DOUBLE QUOTATION MARK","smart-punctuation",'"')],[8222,e("DOUBLE LOW-9 QUOTATION MARK","smart-punctuation",'"')],[8223,e("DOUBLE HIGH-REVERSED-9 QUOTATION MARK","smart-punctuation",'"')],[171,e("LEFT-POINTING DOUBLE ANGLE QUOTATION MARK","smart-punctuation",'"')],[187,e("RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK","smart-punctuation",'"')],[8216,e("LEFT SINGLE QUOTATION MARK","smart-punctuation","'")],[8217,e("RIGHT SINGLE QUOTATION MARK","smart-punctuation","'")],[8218,e("SINGLE LOW-9 QUOTATION MARK","smart-punctuation","'")],[8219,e("SINGLE HIGH-REVERSED-9 QUOTATION MARK","smart-punctuation","'")],[8249,e("SINGLE LEFT-POINTING ANGLE QUOTATION MARK","smart-punctuation","'")],[8250,e("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK","smart-punctuation","'")],[8242,e("PRIME","smart-punctuation","'")],[8243,e("DOUBLE PRIME","smart-punctuation",'"')],[8211,e("EN DASH","smart-punctuation","-")],[8212,e("EM DASH","smart-punctuation","--")],[8213,e("HORIZONTAL BAR","smart-punctuation","--")],[8722,e("MINUS SIGN","smart-punctuation","-")],[8230,e("HORIZONTAL ELLIPSIS","smart-punctuation","...")]]),T={0:"NULL",1:"START OF HEADING",2:"START OF TEXT",3:"END OF TEXT",4:"END OF TRANSMISSION",5:"ENQUIRY",6:"ACKNOWLEDGE",7:"BELL",8:"BACKSPACE",11:"LINE TABULATION",12:"FORM FEED",14:"SHIFT OUT",15:"SHIFT IN",16:"DATA LINK ESCAPE",27:"ESCAPE",127:"DELETE"},A=new 
Map([[1072,"a"],[1077,"e"],[1086,"o"],[1088,"p"],[1089,"c"],[1091,"y"],[1093,"x"],[1109,"s"],[1110,"i"],[1112,"j"],[1211,"h"],[1281,"d"],[1082,"k"],[1084,"m"],[1090,"t"],[1040,"A"],[1042,"B"],[1045,"E"],[1050,"K"],[1052,"M"],[1053,"H"],[1054,"O"],[1056,"P"],[1057,"C"],[1058,"T"],[1059,"Y"],[1061,"X"],[1029,"S"],[1030,"I"],[1032,"J"],[959,"o"],[945,"a"],[947,"y"],[961,"p"],[965,"u"],[913,"A"],[914,"B"],[917,"E"],[918,"Z"],[919,"H"],[921,"I"],[922,"K"],[924,"M"],[925,"N"],[927,"O"],[929,"P"],[932,"T"],[933,"Y"],[935,"X"],[8495,"e"],[8467,"l"],[305,"i"]]);function c(t){let n=O.get(t);if(n)return n;if(t>=917504&&t<=917631)return e("TAG CHARACTER","tag","");if(t>=65024&&t<=65039||t>=917760&&t<=917999)return e("VARIATION SELECTOR","variation-selector","");if(t>=65281&&t<=65374)return e("FULLWIDTH FORM","homoglyph",String.fromCharCode(t-65248));let i=A.get(t);if(i!==void 0)return e(`HOMOGLYPH OF "${i}"`,"homoglyph",i);if(t<=31&&t!==9&&t!==10&&t!==13||t===127||t>=128&&t<=159){let r=T[t]??"CONTROL CHARACTER";return e(r,"control","")}return null}var R={zeroWidth:true,bidi:true,tag:true,control:true,variationSelectors:false,invisibleSpaces:true,smartPunctuation:false,homoglyphs:false,collapseWhitespace:false,normalizeNewlines:true,trim:false},d={zeroWidth:true,bidi:true,tag:true,control:true,variationSelectors:true,invisibleSpaces:true,smartPunctuation:true,homoglyphs:true,collapseWhitespace:true,normalizeNewlines:true,trim:true},p={"zero-width":"zeroWidth",bidi:"bidi",tag:"tag",control:"control","variation-selector":"variationSelectors","invisible-space":"invisibleSpaces","smart-punctuation":"smartPunctuation",homoglyph:"homoglyphs"},b=new Set(["zero-width","bidi","tag","variation-selector","control","invisible-space"]);function E(t){return "U+"+t.toString(16).toUpperCase().padStart(4,"0")}function S(t){let n=[],i=0;for(let r of t){let 
o=r.codePointAt(0),a=c(o);a&&n.push({index:i,char:r,codePoint:o,hex:E(o),name:a.name,category:a.category,severity:a.severity}),i+=r.length;}return n}function g(t){for(let n of t)if(c(n.codePointAt(0)))return false;return true}function m(t){let n={"zero-width":0,bidi:0,tag:0,"variation-selector":0,"invisible-space":0,control:0,"smart-punctuation":0,homoglyph:0},i={danger:0,warning:0,info:0},r=0;for(let o of S(t))n[o.category]++,i[o.severity]++,r++;return {total:r,byCategory:n,bySeverity:i}}function L(t,n={}){let i={...R,...n},r=t;i.normalizeNewlines&&(r=r.replace(/\r\n?/g,`
4
+ `));let o="";for(let a of r){let x=a.codePointAt(0),s=c(x);if(!s){o+=a;continue}let l=i[p[s.category]];o+=l?s.replacement:a;}return i.collapseWhitespace&&(o=o.replace(/[^\S\n]+/g," ").replace(/[ \t]*\n[ \t]*/g,`
5
+ `)),i.trim&&(o=o.trim()),o}function C(t,n={}){let i=n.token??(a=>`[${a.hex}]`),r="",o=0;for(let a of t){let x=a.codePointAt(0),s=c(x);s&&b.has(s.category)?r+=i({index:o,char:a,codePoint:x,hex:E(x),name:s.name,category:s.category,severity:s.severity}):r+=a,o+=a.length;}return r}exports.AGGRESSIVE_OPTIONS=d;exports.DEFAULT_OPTIONS=R;exports.clean=L;exports.isClean=g;exports.reveal=C;exports.scan=S;exports.stats=m;//# sourceMappingURL=index.cjs.map
6
+ //# sourceMappingURL=index.cjs.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/data.ts","../src/index.ts"],"names":["SEVERITY_BY_CATEGORY","def","name","category","replacement","SPECIAL","C0_NAMES","HOMOGLYPHS","classify","cp","special","latin","DEFAULT_OPTIONS","AGGRESSIVE_OPTIONS","CATEGORY_OPTION","INVISIBLE_CATEGORIES","toHex","scan","text","findings","index","ch","info","isClean","stats","byCategory","bySeverity","total","f","clean","options","opts","src","out","enabled","reveal","token"],"mappings":"aA2BO,IAAMA,EAAmD,CAC9D,IAAA,CAAM,SACN,GAAA,CAAK,QAAA,CACL,aAAc,SAAA,CACd,OAAA,CAAS,SAAA,CACT,SAAA,CAAW,UACX,oBAAA,CAAsB,SAAA,CACtB,kBAAmB,MAAA,CACnB,mBAAA,CAAqB,MACvB,CAAA,CAEMC,CAAAA,CAAM,CAACC,CAAAA,CAAcC,EAAoBC,CAAAA,IAAmC,CAChF,KAAAF,CAAAA,CACA,QAAA,CAAAC,EACA,QAAA,CAAUH,CAAAA,CAAqBG,CAAQ,CAAA,CACvC,WAAA,CAAAC,CACF,CAAA,CAAA,CAGaC,CAAAA,CAAiC,IAAI,GAAA,CAAI,CAEpD,CAAC,IAAA,CAAQJ,CAAAA,CAAI,kBAAA,CAAoB,YAAA,CAAc,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAyB,YAAA,CAAc,EAAE,CAAC,CAAA,CACvD,CAAC,IAAA,CAAQA,CAAAA,CAAI,oBAAqB,YAAA,CAAc,EAAE,CAAC,CAAA,CACnD,CAAC,IAAA,CAAQA,CAAAA,CAAI,cAAe,YAAA,CAAc,EAAE,CAAC,CAAA,CAC7C,CAAC,KAAQA,CAAAA,CAAI,sBAAA,CAAwB,aAAc,EAAE,CAAC,EACtD,CAAC,IAAA,CAAQA,EAAI,iBAAA,CAAmB,YAAA,CAAc,EAAE,CAAC,CAAA,CACjD,CAAC,IAAA,CAAQA,EAAI,qBAAA,CAAuB,YAAA,CAAc,EAAE,CAAC,CAAA,CACrD,CAAC,IAAA,CAAQA,CAAAA,CAAI,gBAAA,CAAkB,YAAA,CAAc,EAAE,CAAC,CAAA,CAChD,CAAC,KAAA,CAAQA,CAAAA,CAAI,kCAAmC,YAAA,CAAc,EAAE,CAAC,CAAA,CACjE,CAAC,IAAA,CAAQA,CAAAA,CAAI,4BAA6B,YAAA,CAAc,EAAE,CAAC,CAAA,CAG3D,CAAC,KAAQA,CAAAA,CAAI,yBAAA,CAA2B,OAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,yBAAA,CAA2B,MAAA,CAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,4BAAA,CAA8B,MAAA,CAAQ,EAAE,CAAC,CAAA,CACtD,CAAC,IAAA,CAAQA,EAAI,wBAAA,CAA0B,MAAA,CAAQ,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAA,CAA0B,MAAA,CAAQ,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,wBAAyB,MAAA,CAAQ,EAAE,CAAC,CAAA,CACjD,CAAC,KAAQA,CAAAA,CAAI,uBAAA,CAAyB,OAAQ,EAAE,CAAC,EACjD,CAAC,IAAA,CAAQA,CAAAA,CAAI,sBAAA,CAAwB,OAAQ,EAAE,CAAC,EAChD,CAAC,IAAA,CAAQA,EAAI,yBAA
A,CAA2B,MAAA,CAAQ,EAAE,CAAC,EACnD,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAC9C,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAC9C,CAAC,IAAA,CAAQA,CAAAA,CAAI,qBAAsB,MAAA,CAAQ,EAAE,CAAC,CAAA,CAG9C,CAAC,IAAQA,CAAAA,CAAI,gBAAA,CAAkB,kBAAmB,GAAG,CAAC,CAAA,CACtD,CAAC,KAAQA,CAAAA,CAAI,uBAAA,CAAyB,kBAAmB,GAAG,CAAC,EAC7D,CAAC,IAAA,CAAQA,CAAAA,CAAI,cAAA,CAAgB,kBAAmB,GAAG,CAAC,EACpD,CAAC,IAAA,CAAQA,EAAI,mBAAA,CAAqB,iBAAA,CAAmB,GAAG,CAAC,EACzD,CAAC,IAAA,CAAQA,EAAI,YAAA,CAAc,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,aAAc,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAClD,CAAC,KAAQA,CAAAA,CAAI,SAAA,CAAW,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAC/C,CAAC,KAAQA,CAAAA,CAAI,SAAA,CAAW,kBAAmB,GAAG,CAAC,CAAA,CAC/C,CAAC,KAAQA,CAAAA,CAAI,UAAA,CAAY,kBAAmB,GAAG,CAAC,EAChD,CAAC,IAAA,CAAQA,CAAAA,CAAI,UAAA,CAAY,kBAAmB,GAAG,CAAC,EAChD,CAAC,IAAA,CAAQA,EAAI,oBAAA,CAAsB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CAC1D,CAAC,IAAA,CAAQA,CAAAA,CAAI,oBAAqB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,IAAA,CAAQA,CAAAA,CAAI,mBAAoB,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACxD,CAAC,KAAQA,CAAAA,CAAI,2BAAA,CAA6B,iBAAA,CAAmB,GAAG,CAAC,CAAA,CACjE,CAAC,MAAQA,CAAAA,CAAI,mBAAA,CAAqB,kBAAmB,GAAG,CAAC,CAAA,CACzD,CAAC,KAAQA,CAAAA,CAAI,kBAAA,CAAoB,kBAAmB,GAAG,CAAC,EACxD,CAAC,GAAA,CAAQA,EAAI,aAAA,CAAe,iBAAA,CAAmB,EAAE,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAkB,iBAAA,CAAmB;AAAA,CAAI,CAAC,CAAA,CACvD,CAAC,IAAA,CAAQA,CAAAA,CAAI,sBAAuB,iBAAA,CAAmB;AAAA,CAAI,CAAC,EAG5D,CAAC,IAAA,CAAQA,EAAI,4BAAA,CAA8B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACpE,CAAC,IAAA,CAAQA,CAAAA,CAAI,8BAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,KAAQA,CAAAA,CAAI,6BAAA,CAA+B,oBAAqB,GAAG,CAAC,EACrE,CAAC,IAAA,CAAQA,EAAI,uCAAA,CAAyC,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/E,CAAC,GAAA,CAAQA,CAAAA,CAAI,4CAA6C,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACnF,CAAC,IAAQA,CAAAA,CAAI,4CAAA,CAA8C,oBAAqB,GAAG,CAAC,EACpF,CAAC,IAAA,CAAQA,EAAI,4BAAA,CAA8B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACpE,CAAC,IAAA,CAAQA,CAAAA,CAAI,8BAA+B,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACrE,CAAC,KAAQA,CAAAA,CAAI,6BAAA,CAA+B,oBAAqB,GAAG,CAAC,EACrE,CAAC
,IAAA,CAAQA,EAAI,uCAAA,CAAyC,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/E,CAAC,IAAA,CAAQA,CAAAA,CAAI,4CAA6C,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACnF,CAAC,KAAQA,CAAAA,CAAI,4CAAA,CAA8C,oBAAqB,GAAG,CAAC,EACpF,CAAC,IAAA,CAAQA,EAAI,OAAA,CAAS,mBAAA,CAAqB,GAAG,CAAC,CAAA,CAC/C,CAAC,IAAA,CAAQA,CAAAA,CAAI,eAAgB,mBAAA,CAAqB,GAAG,CAAC,CAAA,CACtD,CAAC,KAAQA,CAAAA,CAAI,SAAA,CAAW,oBAAqB,GAAG,CAAC,EACjD,CAAC,IAAA,CAAQA,EAAI,SAAA,CAAW,mBAAA,CAAqB,IAAI,CAAC,CAAA,CAClD,CAAC,IAAA,CAAQA,CAAAA,CAAI,iBAAkB,mBAAA,CAAqB,IAAI,CAAC,CAAA,CACzD,CAAC,KAAQA,CAAAA,CAAI,YAAA,CAAc,oBAAqB,GAAG,CAAC,EACpD,CAAC,IAAA,CAAQA,EAAI,qBAAA,CAAuB,mBAAA,CAAqB,KAAK,CAAC,CACjE,CAAC,CAAA,CAGKK,CAAAA,CAAmC,CACvC,CAAA,CAAM,MAAA,CAAQ,EAAM,kBAAA,CAAoB,CAAA,CAAM,gBAAiB,CAAA,CAAM,aAAA,CACrE,EAAM,qBAAA,CAAuB,CAAA,CAAM,UAAW,CAAA,CAAM,aAAA,CAAe,EAAM,MAAA,CACzE,CAAA,CAAM,YAAa,EAAA,CAAM,iBAAA,CAAmB,GAAM,WAAA,CAAa,EAAA,CAAM,YACrE,EAAA,CAAM,UAAA,CAAY,GAAM,kBAAA,CAAoB,EAAA,CAAM,QAAA,CAAU,GAAA,CAAM,QACpE,CAAA,CAOaC,EAAkC,IAAI,GAAA,CAAI,CAErD,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CACxE,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CACxE,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAExE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CACxE,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EACxE,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAAG,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,KAAQ,GAAG,CAAA,CAExE,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CACxE,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG
,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EACxE,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CACxE,CAAC,GAAA,CAAQ,GAAG,EAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,GAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAAA,CAEzD,CAAC,IAAA,CAAQ,GAAG,EAAG,CAAC,IAAA,CAAQ,GAAG,CAAA,CAAG,CAAC,IAAQ,GAAG,CAC5C,CAAC,CAAA,CAKM,SAASC,EAASC,CAAAA,CAA6B,CACpD,IAAMC,CAAAA,CAAUL,CAAAA,CAAQ,IAAII,CAAE,CAAA,CAC9B,GAAIC,CAAAA,CAAS,OAAOA,EAGpB,GAAID,CAAAA,EAAM,QAAWA,CAAAA,EAAM,MAAA,CACzB,OAAOR,CAAAA,CAAI,eAAA,CAAiB,MAAO,EAAE,CAAA,CAGvC,GAAKQ,CAAAA,EAAM,KAAA,EAAUA,GAAM,KAAA,EAAYA,CAAAA,EAAM,QAAWA,CAAAA,EAAM,MAAA,CAC5D,OAAOR,CAAAA,CAAI,oBAAA,CAAsB,qBAAsB,EAAE,CAAA,CAG3D,GAAIQ,CAAAA,EAAM,KAAA,EAAUA,GAAM,KAAA,CACxB,OAAOR,EAAI,gBAAA,CAAkB,WAAA,CAAa,OAAO,YAAA,CAAaQ,CAAAA,CAAK,KAAM,CAAC,CAAA,CAG5E,IAAME,CAAAA,CAAQJ,CAAAA,CAAW,IAAIE,CAAE,CAAA,CAC/B,GAAIE,CAAAA,GAAU,MAAA,CACZ,OAAOV,CAAAA,CAAI,CAAA,cAAA,EAAiBU,CAAK,CAAA,CAAA,CAAA,CAAK,WAAA,CAAaA,CAAK,CAAA,CAG1D,GACGF,GAAM,EAAA,EAAQA,CAAAA,GAAO,GAAQA,CAAAA,GAAO,EAAA,EAAQA,IAAO,EAAA,EACpDA,CAAAA,GAAO,KACNA,CAAAA,EAAM,GAAA,EAAQA,GAAM,GAAA,CACrB,CACA,IAAMP,CAAAA,CAAOI,CAAAA,CAASG,CAAE,CAAA,EAAK,mBAAA,CAC7B,OAAOR,CAAAA,CAAIC,CAAAA,CAAM,UAAW,EAAE,CAChC,CAGA,OAAO,IACT,CCnIO,IAAMU,CAAAA,CAA0C,CACrD,SAAA,CAAW,IAAA,CACX,KAAM,IAAA,CACN,GAAA,CAAK,KACL,OAAA,CAAS,IAAA,CACT,mBAAoB,KAAA,CACpB,eAAA,CAAiB,KACjB,gBAAA,CAAkB,KAAA,CAClB,WAAY,KAAA,CACZ,kBAAA,CAAoB,MACpB,iBAAA,CAAmB,IAAA,CACnB,KAAM,KACR,CAAA,CAGaC,EAA6C,CACxD,SAAA,CAAW,KACX,IAAA,CAAM,IAAA,CACN,IAAK,IAAA,CACL,OAAA,CAAS,KACT,kBAAA,CAAoB,IAAA,CACpB,gBAAiB,IAAA,CACjB,gBAAA,CAAkB,KAClB,UAAA,CAAY,IAAA,CACZ,mBAAoB,IAAA,CACpB,iBAAA,CAAmB,KACnB,IAAA,CAAM,IACR,EAEMC,CAAAA,CAAwD,CAC5D,aAAc,WAAA,CACd,IAAA,CAAM,OACN,GAAA,CAAK,KAAA,CACL,OAAA,CAAS,SAAA,CACT,oBAAA,CAAsB,oBAAA,CACtB,kBAAmB,iBAAA,CACnB,mBAAA,CAAqB,mBACrB,SAAA,CAAW,YACb,EAGMC,CAAAA,CAAuB,IAAI,IAAc,CAC7C,YAAA,CACA,OACA,KAAA,CACA,oBAAA,CACA,UACA,iBACF,CAAC,EAED,SAASC,CAAAA,CAAMP,EAAoB
,CACjC,OAAO,KAAOA,CAAAA,CAAG,QAAA,CAAS,EAAE,CAAA,CAAE,WAAA,GAAc,QAAA,CAAS,CAAA,CAAG,GAAG,CAC7D,CAWO,SAASQ,CAAAA,CAAKC,CAAAA,CAAyB,CAC5C,IAAMC,CAAAA,CAAsB,EAAC,CACzBC,CAAAA,CAAQ,EACZ,IAAA,IAAWC,CAAAA,IAAMH,EAAM,CACrB,IAAMT,EAAKY,CAAAA,CAAG,WAAA,CAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,EAASC,CAAE,CAAA,CACpBa,GACFH,CAAAA,CAAS,IAAA,CAAK,CACZ,KAAA,CAAAC,CAAAA,CACA,KAAMC,CAAAA,CACN,SAAA,CAAWZ,EACX,GAAA,CAAKO,CAAAA,CAAMP,CAAE,CAAA,CACb,IAAA,CAAMa,EAAK,IAAA,CACX,QAAA,CAAUA,EAAK,QAAA,CACf,QAAA,CAAUA,EAAK,QACjB,CAAC,EAEHF,CAAAA,EAASC,CAAAA,CAAG,OACd,CACA,OAAOF,CACT,CAGO,SAASI,CAAAA,CAAQL,CAAAA,CAAuB,CAC7C,IAAA,IAAWG,KAAMH,CAAAA,CACf,GAAIV,EAASa,CAAAA,CAAG,WAAA,CAAY,CAAC,CAAW,CAAA,CAAG,OAAO,MAAA,CAEpD,OAAO,KACT,CASO,SAASG,EAAMN,CAAAA,CAAqB,CACzC,IAAMO,CAAAA,CAAa,CACjB,aAAc,CAAA,CAAG,IAAA,CAAM,EAAG,GAAA,CAAK,CAAA,CAAG,qBAAsB,CAAA,CACxD,iBAAA,CAAmB,EAAG,OAAA,CAAS,CAAA,CAAG,oBAAqB,CAAA,CAAG,SAAA,CAAW,CACvE,CAAA,CACMC,CAAAA,CAAa,CAAE,MAAA,CAAQ,CAAA,CAAG,QAAS,CAAA,CAAG,IAAA,CAAM,CAAE,CAAA,CAChDC,CAAAA,CAAQ,EACZ,IAAA,IAAWC,CAAAA,IAAKX,EAAKC,CAAI,CAAA,CACvBO,EAAWG,CAAAA,CAAE,QAAQ,IACrBF,CAAAA,CAAWE,CAAAA,CAAE,QAAQ,CAAA,EAAA,CACrBD,CAAAA,EAAAA,CAEF,OAAO,CAAE,KAAA,CAAAA,EAAO,UAAA,CAAAF,CAAAA,CAAY,WAAAC,CAAW,CACzC,CAYO,SAASG,CAAAA,CAAMX,EAAcY,CAAAA,CAAwB,GAAY,CACtE,IAAMC,EAAO,CAAE,GAAGnB,EAAiB,GAAGkB,CAAQ,EAE1CE,CAAAA,CAAMd,CAAAA,CACNa,EAAK,iBAAA,GAAmBC,CAAAA,CAAMA,CAAAA,CAAI,OAAA,CAAQ,QAAA,CAAU;AAAA,CAAI,CAAA,CAAA,CAE5D,IAAIC,CAAAA,CAAM,EAAA,CACV,QAAWZ,CAAAA,IAAMW,CAAAA,CAAK,CACpB,IAAMvB,CAAAA,CAAKY,CAAAA,CAAG,YAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,CAAA,CACxB,GAAI,CAACa,CAAAA,CAAM,CACTW,CAAAA,EAAOZ,CAAAA,CACP,QACF,CACA,IAAMa,CAAAA,CAAUH,EAAKjB,CAAAA,CAAgBQ,CAAAA,CAAK,QAAQ,CAAC,CAAA,CACnDW,CAAAA,EAAOC,CAAAA,CAAUZ,CAAAA,CAAK,WAAA,CAAcD,EACtC,CAEA,OAAIU,CAAAA,CAAK,kBAAA,GACPE,CAAAA,CAAMA,CAAAA,CAAI,QAAQ,WAAA,CAAa,GAAG,CAAA,CAAE,OAAA,CAAQ,iBAAA,CAAmB;AAAA,CAAI,GAEjEF,CAAAA,CAAK,IAAA,GAAME,CAAAA,CAAMA,CAAAA,CAAI,MAAK,CAAA,CAEvBA,CACT,CAeO,SAASE,EAAOjB,CAAAA,CAAcY,CAAAA,CAAyB,EAAC,CAAW,
CACxE,IAAMM,CAAAA,CAAQN,CAAAA,CAAQ,KAAA,GAAWF,CAAAA,EAAe,IAAIA,CAAAA,CAAE,GAAG,CAAA,CAAA,CAAA,CAAA,CACrDK,CAAAA,CAAM,GACNb,CAAAA,CAAQ,CAAA,CACZ,IAAA,IAAWC,CAAAA,IAAMH,EAAM,CACrB,IAAMT,EAAKY,CAAAA,CAAG,WAAA,CAAY,CAAC,CAAA,CACrBC,CAAAA,CAAOd,CAAAA,CAASC,CAAE,EACpBa,CAAAA,EAAQP,CAAAA,CAAqB,GAAA,CAAIO,CAAAA,CAAK,QAAQ,CAAA,CAChDW,CAAAA,EAAOG,CAAAA,CAAM,CACX,MAAAhB,CAAAA,CACA,IAAA,CAAMC,EACN,SAAA,CAAWZ,CAAAA,CACX,IAAKO,CAAAA,CAAMP,CAAE,CAAA,CACb,IAAA,CAAMa,EAAK,IAAA,CACX,QAAA,CAAUA,CAAAA,CAAK,QAAA,CACf,SAAUA,CAAAA,CAAK,QACjB,CAAC,CAAA,CAEDW,GAAOZ,CAAAA,CAETD,CAAAA,EAASC,EAAG,OACd,CACA,OAAOY,CACT","file":"index.cjs","sourcesContent":["/**\n * Character data tables for unspook.\n *\n * Every code point we care about, grouped by category, with a human-readable\n * name, a severity, and (where relevant) an ASCII/Latin replacement.\n */\n\nexport type Category =\n | \"zero-width\" // invisible, no width — ZWSP, BOM, word joiner, …\n | \"bidi\" // bidirectional controls — the \"Trojan Source\" attack class\n | \"tag\" // Unicode tag chars — invisible prompt-injection / watermarking\n | \"variation-selector\" // VS1–256 — can be used to hide data on a base char\n | \"invisible-space\" // looks like a space but isn't (NBSP, soft hyphen, …)\n | \"control\" // C0/C1 control characters\n | \"smart-punctuation\" // curly quotes, em dash, ellipsis → ASCII\n | \"homoglyph\"; // letters that look Latin but aren't (Cyrillic а, Greek ο, …)\n\nexport type Severity = \"danger\" | \"warning\" | \"info\";\n\nexport interface CharInfo {\n name: string;\n category: Category;\n severity: Severity;\n /** What to put in place of this char when cleaning. `\"\"` = drop entirely. 
*/\n replacement: string;\n}\n\nexport const SEVERITY_BY_CATEGORY: Record<Category, Severity> = {\n bidi: \"danger\",\n tag: \"danger\",\n \"zero-width\": \"warning\",\n control: \"warning\",\n homoglyph: \"warning\",\n \"variation-selector\": \"warning\",\n \"invisible-space\": \"info\",\n \"smart-punctuation\": \"info\",\n};\n\nconst def = (name: string, category: Category, replacement: string): CharInfo => ({\n name,\n category,\n severity: SEVERITY_BY_CATEGORY[category],\n replacement,\n});\n\n/** Explicitly enumerated special characters, keyed by code point. */\nexport const SPECIAL: Map<number, CharInfo> = new Map([\n // — Zero-width / invisible —\n [0x200b, def(\"ZERO WIDTH SPACE\", \"zero-width\", \"\")],\n [0x200c, def(\"ZERO WIDTH NON-JOINER\", \"zero-width\", \"\")],\n [0x200d, def(\"ZERO WIDTH JOINER\", \"zero-width\", \"\")],\n [0x2060, def(\"WORD JOINER\", \"zero-width\", \"\")],\n [0x2061, def(\"FUNCTION APPLICATION\", \"zero-width\", \"\")],\n [0x2062, def(\"INVISIBLE TIMES\", \"zero-width\", \"\")],\n [0x2063, def(\"INVISIBLE SEPARATOR\", \"zero-width\", \"\")],\n [0x2064, def(\"INVISIBLE PLUS\", \"zero-width\", \"\")],\n [0xfeff, def(\"ZERO WIDTH NO-BREAK SPACE (BOM)\", \"zero-width\", \"\")],\n [0x180e, def(\"MONGOLIAN VOWEL SEPARATOR\", \"zero-width\", \"\")],\n\n // — Bidirectional controls (Trojan Source, CVE-2021-42574) —\n [0x202a, def(\"LEFT-TO-RIGHT EMBEDDING\", \"bidi\", \"\")],\n [0x202b, def(\"RIGHT-TO-LEFT EMBEDDING\", \"bidi\", \"\")],\n [0x202c, def(\"POP DIRECTIONAL FORMATTING\", \"bidi\", \"\")],\n [0x202d, def(\"LEFT-TO-RIGHT OVERRIDE\", \"bidi\", \"\")],\n [0x202e, def(\"RIGHT-TO-LEFT OVERRIDE\", \"bidi\", \"\")],\n [0x2066, def(\"LEFT-TO-RIGHT ISOLATE\", \"bidi\", \"\")],\n [0x2067, def(\"RIGHT-TO-LEFT ISOLATE\", \"bidi\", \"\")],\n [0x2068, def(\"FIRST STRONG ISOLATE\", \"bidi\", \"\")],\n [0x2069, def(\"POP DIRECTIONAL ISOLATE\", \"bidi\", \"\")],\n [0x200e, def(\"LEFT-TO-RIGHT MARK\", \"bidi\", \"\")],\n [0x200f, 
def(\"RIGHT-TO-LEFT MARK\", \"bidi\", \"\")],\n [0x061c, def(\"ARABIC LETTER MARK\", \"bidi\", \"\")],\n\n // — Spaces that aren't a normal space —\n [0x00a0, def(\"NO-BREAK SPACE\", \"invisible-space\", \" \")],\n [0x202f, def(\"NARROW NO-BREAK SPACE\", \"invisible-space\", \" \")],\n [0x2007, def(\"FIGURE SPACE\", \"invisible-space\", \" \")],\n [0x2008, def(\"PUNCTUATION SPACE\", \"invisible-space\", \" \")],\n [0x2009, def(\"THIN SPACE\", \"invisible-space\", \" \")],\n [0x200a, def(\"HAIR SPACE\", \"invisible-space\", \" \")],\n [0x2000, def(\"EN QUAD\", \"invisible-space\", \" \")],\n [0x2001, def(\"EM QUAD\", \"invisible-space\", \" \")],\n [0x2002, def(\"EN SPACE\", \"invisible-space\", \" \")],\n [0x2003, def(\"EM SPACE\", \"invisible-space\", \" \")],\n [0x2004, def(\"THREE-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x2005, def(\"FOUR-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x2006, def(\"SIX-PER-EM SPACE\", \"invisible-space\", \" \")],\n [0x205f, def(\"MEDIUM MATHEMATICAL SPACE\", \"invisible-space\", \" \")],\n [0x3000, def(\"IDEOGRAPHIC SPACE\", \"invisible-space\", \" \")],\n [0x1680, def(\"OGHAM SPACE MARK\", \"invisible-space\", \" \")],\n [0x00ad, def(\"SOFT HYPHEN\", \"invisible-space\", \"\")],\n [0x2028, def(\"LINE SEPARATOR\", \"invisible-space\", \"\\n\")],\n [0x2029, def(\"PARAGRAPH SEPARATOR\", \"invisible-space\", \"\\n\")],\n\n // — Smart / typographic punctuation → ASCII —\n [0x201c, def(\"LEFT DOUBLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201d, def(\"RIGHT DOUBLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201e, def(\"DOUBLE LOW-9 QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x201f, def(\"DOUBLE HIGH-REVERSED-9 QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x00ab, def(\"LEFT-POINTING DOUBLE ANGLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x00bb, def(\"RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK\", \"smart-punctuation\", '\"')],\n [0x2018, def(\"LEFT SINGLE QUOTATION MARK\", 
\"smart-punctuation\", \"'\")],\n [0x2019, def(\"RIGHT SINGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x201a, def(\"SINGLE LOW-9 QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x201b, def(\"SINGLE HIGH-REVERSED-9 QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2039, def(\"SINGLE LEFT-POINTING ANGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x203a, def(\"SINGLE RIGHT-POINTING ANGLE QUOTATION MARK\", \"smart-punctuation\", \"'\")],\n [0x2032, def(\"PRIME\", \"smart-punctuation\", \"'\")],\n [0x2033, def(\"DOUBLE PRIME\", \"smart-punctuation\", '\"')],\n [0x2013, def(\"EN DASH\", \"smart-punctuation\", \"-\")],\n [0x2014, def(\"EM DASH\", \"smart-punctuation\", \"--\")],\n [0x2015, def(\"HORIZONTAL BAR\", \"smart-punctuation\", \"--\")],\n [0x2212, def(\"MINUS SIGN\", \"smart-punctuation\", \"-\")],\n [0x2026, def(\"HORIZONTAL ELLIPSIS\", \"smart-punctuation\", \"...\")],\n]);\n\n/** Names for the C0 control characters worth labelling. */\nconst C0_NAMES: Record<number, string> = {\n 0x00: \"NULL\", 0x01: \"START OF HEADING\", 0x02: \"START OF TEXT\", 0x03: \"END OF TEXT\",\n 0x04: \"END OF TRANSMISSION\", 0x05: \"ENQUIRY\", 0x06: \"ACKNOWLEDGE\", 0x07: \"BELL\",\n 0x08: \"BACKSPACE\", 0x0b: \"LINE TABULATION\", 0x0c: \"FORM FEED\", 0x0e: \"SHIFT OUT\",\n 0x0f: \"SHIFT IN\", 0x10: \"DATA LINK ESCAPE\", 0x1b: \"ESCAPE\", 0x7f: \"DELETE\",\n};\n\n/**\n * Homoglyphs: characters that render like a Latin letter/number but aren't.\n * Mapped to their Latin look-alike. 
(A curated, high-confidence subset of the\n * Unicode confusables data.)\n */\nexport const HOMOGLYPHS: Map<number, string> = new Map([\n // Cyrillic (lowercase)\n [0x0430, \"a\"], [0x0435, \"e\"], [0x043e, \"o\"], [0x0440, \"p\"], [0x0441, \"c\"],\n [0x0443, \"y\"], [0x0445, \"x\"], [0x0455, \"s\"], [0x0456, \"i\"], [0x0458, \"j\"],\n [0x04bb, \"h\"], [0x0501, \"d\"], [0x043a, \"k\"], [0x043c, \"m\"], [0x0442, \"t\"],\n // Cyrillic (uppercase)\n [0x0410, \"A\"], [0x0412, \"B\"], [0x0415, \"E\"], [0x041a, \"K\"], [0x041c, \"M\"],\n [0x041d, \"H\"], [0x041e, \"O\"], [0x0420, \"P\"], [0x0421, \"C\"], [0x0422, \"T\"],\n [0x0423, \"Y\"], [0x0425, \"X\"], [0x0405, \"S\"], [0x0406, \"I\"], [0x0408, \"J\"],\n // Greek\n [0x03bf, \"o\"], [0x03b1, \"a\"], [0x03b3, \"y\"], [0x03c1, \"p\"], [0x03c5, \"u\"],\n [0x0391, \"A\"], [0x0392, \"B\"], [0x0395, \"E\"], [0x0396, \"Z\"], [0x0397, \"H\"],\n [0x0399, \"I\"], [0x039a, \"K\"], [0x039c, \"M\"], [0x039d, \"N\"], [0x039f, \"O\"],\n [0x03a1, \"P\"], [0x03a4, \"T\"], [0x03a5, \"Y\"], [0x03a7, \"X\"],\n // Latin look-alikes / letterlike symbols\n [0x212f, \"e\"], [0x2113, \"l\"], [0x0131, \"i\"],\n]);\n\nconst ASCII_PRINTABLE = (cp: number): boolean => cp >= 0x20 && cp <= 0x7e;\n\n/** Classify a single code point, or return `null` if it's unremarkable. 
*/\nexport function classify(cp: number): CharInfo | null {\n const special = SPECIAL.get(cp);\n if (special) return special;\n\n // Tag characters (used for invisible prompt injection / watermarking).\n if (cp >= 0xe0000 && cp <= 0xe007f) {\n return def(\"TAG CHARACTER\", \"tag\", \"\");\n }\n // Variation selectors (can hide data on a base glyph).\n if ((cp >= 0xfe00 && cp <= 0xfe0f) || (cp >= 0xe0100 && cp <= 0xe01ef)) {\n return def(\"VARIATION SELECTOR\", \"variation-selector\", \"\");\n }\n // Fullwidth ASCII forms → normal ASCII.\n if (cp >= 0xff01 && cp <= 0xff5e) {\n return def(\"FULLWIDTH FORM\", \"homoglyph\", String.fromCharCode(cp - 0xfee0));\n }\n // Explicit homoglyph table.\n const latin = HOMOGLYPHS.get(cp);\n if (latin !== undefined) {\n return def(`HOMOGLYPH OF \"${latin}\"`, \"homoglyph\", latin);\n }\n // C0/C1 control characters (excluding tab, newline, carriage return).\n if (\n (cp <= 0x1f && cp !== 0x09 && cp !== 0x0a && cp !== 0x0d) ||\n cp === 0x7f ||\n (cp >= 0x80 && cp <= 0x9f)\n ) {\n const name = C0_NAMES[cp] ?? \"CONTROL CHARACTER\";\n return def(name, \"control\", \"\");\n }\n // Everything else (incl. normal ASCII, emoji, CJK, accented Latin) is fine.\n void ASCII_PRINTABLE;\n return null;\n}\n","/**\n * unspook — find and remove the invisible, dangerous, and confusable\n * characters hiding in your text.\n *\n * Zero dependencies. Pure functions. Runs anywhere JavaScript does.\n */\n\nimport { classify, type Category, type Severity } from \"./data.js\";\n\nexport type { Category, Severity } from \"./data.js\";\n\nexport interface Finding {\n /** UTF-16 index of the character in the source string. */\n index: number;\n /** The offending character itself. */\n char: string;\n /** Unicode code point. */\n codePoint: number;\n /** Formatted code point, e.g. `\"U+200B\"`. */\n hex: string;\n /** Human-readable Unicode name. 
*/\n name: string;\n category: Category;\n severity: Severity;\n}\n\nexport interface CleanOptions {\n /** Remove zero-width / invisible characters (ZWSP, BOM, word joiner…). Default `true`. */\n zeroWidth?: boolean;\n /** Remove bidirectional control characters (the \"Trojan Source\" class). Default `true`. */\n bidi?: boolean;\n /** Remove Unicode tag characters (invisible prompt-injection / watermarks). Default `true`. */\n tag?: boolean;\n /** Remove C0/C1 control characters. Default `true`. */\n control?: boolean;\n /** Remove variation selectors. Default `false` (they're legitimate in emoji). */\n variationSelectors?: boolean;\n /** Normalize exotic spaces (NBSP→space, soft hyphen→removed, line sep→newline). Default `true`. */\n invisibleSpaces?: boolean;\n /** Convert smart/typographic punctuation to ASCII (“ ”→\", —→--, …→...). Default `false`. */\n smartPunctuation?: boolean;\n /** Map homoglyphs to their Latin look-alike (Cyrillic а→a, fullwidth A→A). Default `false`. */\n homoglyphs?: boolean;\n /** Collapse runs of spaces/tabs into one space. Default `false`. */\n collapseWhitespace?: boolean;\n /** Normalize `\\r\\n` and `\\r` to `\\n`. Default `true`. */\n normalizeNewlines?: boolean;\n /** Trim leading/trailing whitespace from the whole string. Default `false`. */\n trim?: boolean;\n}\n\n/** The default, safe cleaning profile: strip the dangerous & invisible, keep meaning. */\nexport const DEFAULT_OPTIONS: Required<CleanOptions> = {\n zeroWidth: true,\n bidi: true,\n tag: true,\n control: true,\n variationSelectors: false,\n invisibleSpaces: true,\n smartPunctuation: false,\n homoglyphs: false,\n collapseWhitespace: false,\n normalizeNewlines: true,\n trim: false,\n};\n\n/** Turn everything on — for when you want maximally plain ASCII-ish text. 
*/\nexport const AGGRESSIVE_OPTIONS: Required<CleanOptions> = {\n zeroWidth: true,\n bidi: true,\n tag: true,\n control: true,\n variationSelectors: true,\n invisibleSpaces: true,\n smartPunctuation: true,\n homoglyphs: true,\n collapseWhitespace: true,\n normalizeNewlines: true,\n trim: true,\n};\n\nconst CATEGORY_OPTION: Record<Category, keyof CleanOptions> = {\n \"zero-width\": \"zeroWidth\",\n bidi: \"bidi\",\n tag: \"tag\",\n control: \"control\",\n \"variation-selector\": \"variationSelectors\",\n \"invisible-space\": \"invisibleSpaces\",\n \"smart-punctuation\": \"smartPunctuation\",\n homoglyph: \"homoglyphs\",\n};\n\n/** Categories whose characters are genuinely invisible (used by {@link reveal}). */\nconst INVISIBLE_CATEGORIES = new Set<Category>([\n \"zero-width\",\n \"bidi\",\n \"tag\",\n \"variation-selector\",\n \"control\",\n \"invisible-space\",\n]);\n\nfunction toHex(cp: number): string {\n return \"U+\" + cp.toString(16).toUpperCase().padStart(4, \"0\");\n}\n\n/**\n * Find every suspicious character in `text`.\n *\n * ```ts\n * scan(\"hi​there\");\n * // [{ index: 2, char: \"​\", codePoint: 8203, hex: \"U+200B\",\n * // name: \"ZERO WIDTH SPACE\", category: \"zero-width\", severity: \"warning\" }]\n * ```\n */\nexport function scan(text: string): Finding[] {\n const findings: Finding[] = [];\n let index = 0;\n for (const ch of text) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (info) {\n findings.push({\n index,\n char: ch,\n codePoint: cp,\n hex: toHex(cp),\n name: info.name,\n category: info.category,\n severity: info.severity,\n });\n }\n index += ch.length;\n }\n return findings;\n}\n\n/** `true` if `text` contains no suspicious characters at all. 
*/\nexport function isClean(text: string): boolean {\n for (const ch of text) {\n if (classify(ch.codePointAt(0) as number)) return false;\n }\n return true;\n}\n\nexport interface Stats {\n total: number;\n byCategory: Record<Category, number>;\n bySeverity: Record<Severity, number>;\n}\n\n/** Summarize what's lurking in `text` without listing every occurrence. */\nexport function stats(text: string): Stats {\n const byCategory = {\n \"zero-width\": 0, bidi: 0, tag: 0, \"variation-selector\": 0,\n \"invisible-space\": 0, control: 0, \"smart-punctuation\": 0, homoglyph: 0,\n } as Record<Category, number>;\n const bySeverity = { danger: 0, warning: 0, info: 0 } as Record<Severity, number>;\n let total = 0;\n for (const f of scan(text)) {\n byCategory[f.category]++;\n bySeverity[f.severity]++;\n total++;\n }\n return { total, byCategory, bySeverity };\n}\n\n/**\n * Return a cleaned copy of `text`. By default it strips the dangerous and\n * invisible characters while preserving the visible meaning of your text.\n *\n * ```ts\n * clean(\"Hello​world\"); // \"Helloworld\"\n * clean(\"“quote”\", { smartPunctuation: true }); // '\"quote\"'\n * clean(\"аdmin\", { homoglyphs: true }); // \"admin\" (Cyrillic а → a)\n * ```\n */\nexport function clean(text: string, options: CleanOptions = {}): string {\n const opts = { ...DEFAULT_OPTIONS, ...options };\n\n let src = text;\n if (opts.normalizeNewlines) src = src.replace(/\\r\\n?/g, \"\\n\");\n\n let out = \"\";\n for (const ch of src) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (!info) {\n out += ch;\n continue;\n }\n const enabled = opts[CATEGORY_OPTION[info.category]];\n out += enabled ? info.replacement : ch;\n }\n\n if (opts.collapseWhitespace) {\n out = out.replace(/[^\\S\\n]+/g, \" \").replace(/[ \\t]*\\n[ \\t]*/g, \"\\n\");\n }\n if (opts.trim) out = out.trim();\n\n return out;\n}\n\nexport interface RevealOptions {\n /** Custom token renderer. Default: `(f) => \"[\" + f.hex + \"]\"`. 
*/\n token?: (finding: Finding) => string;\n}\n\n/**\n * Make invisible characters visible by replacing them with a readable token,\n * leaving everything else untouched — handy for logs and terminals.\n *\n * ```ts\n * reveal(\"a​b\"); // \"a[U+200B]b\"\n * ```\n */\nexport function reveal(text: string, options: RevealOptions = {}): string {\n const token = options.token ?? ((f: Finding) => `[${f.hex}]`);\n let out = \"\";\n let index = 0;\n for (const ch of text) {\n const cp = ch.codePointAt(0) as number;\n const info = classify(cp);\n if (info && INVISIBLE_CATEGORIES.has(info.category)) {\n out += token({\n index,\n char: ch,\n codePoint: cp,\n hex: toHex(cp),\n name: info.name,\n category: info.category,\n severity: info.severity,\n });\n } else {\n out += ch;\n }\n index += ch.length;\n }\n return out;\n}\n"]}
@@ -0,0 +1,103 @@
1
+ /**
2
+ * Character data tables for unspook.
3
+ *
4
+ * Every code point we care about, grouped by category, with a human-readable
5
+ * name, a severity, and (where relevant) an ASCII/Latin replacement.
6
+ */
7
+ type Category = "zero-width" | "bidi" | "tag" | "variation-selector" | "invisible-space" | "control" | "smart-punctuation" | "homoglyph";
8
+ type Severity = "danger" | "warning" | "info";
9
+
10
+ /**
11
+ * unspook — find and remove the invisible, dangerous, and confusable
12
+ * characters hiding in your text.
13
+ *
14
+ * Zero dependencies. Pure functions. Runs anywhere JavaScript does.
15
+ */
16
+
17
+ interface Finding {
18
+ /** UTF-16 index of the character in the source string. */
19
+ index: number;
20
+ /** The offending character itself. */
21
+ char: string;
22
+ /** Unicode code point. */
23
+ codePoint: number;
24
+ /** Formatted code point, e.g. `"U+200B"`. */
25
+ hex: string;
26
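+ /** Human-readable name: the Unicode name where one is tabulated, otherwise a generic label such as `TAG CHARACTER` or `HOMOGLYPH OF "a"`. */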
27
+ name: string;
28
+ category: Category;
29
+ severity: Severity;
30
+ }
31
+ interface CleanOptions {
32
+ /** Remove zero-width / invisible characters (ZWSP, BOM, word joiner…). Default `true`. */
33
+ zeroWidth?: boolean;
34
+ /** Remove bidirectional control characters (the "Trojan Source" class). Default `true`. */
35
+ bidi?: boolean;
36
+ /** Remove Unicode tag characters (invisible prompt-injection / watermarks). Default `true`. */
37
+ tag?: boolean;
38
+ /** Remove C0/C1 control characters. Default `true`. */
39
+ control?: boolean;
40
+ /** Remove variation selectors. Default `false` (they're legitimate in emoji). */
41
+ variationSelectors?: boolean;
42
+ /** Normalize exotic spaces (NBSP→space, soft hyphen→removed, line sep→newline). Default `true`. */
43
+ invisibleSpaces?: boolean;
44
+ /** Convert smart/typographic punctuation to ASCII (“ ”→", —→--, …→...). Default `false`. */
45
+ smartPunctuation?: boolean;
46
+ /** Map homoglyphs to their Latin look-alike (Cyrillic а→a, fullwidth Ａ→A). Default `false`. */
47
+ homoglyphs?: boolean;
48
+ /** Collapse runs of non-newline whitespace into one space and strip spaces/tabs around newlines. Default `false`. */
49
+ collapseWhitespace?: boolean;
50
+ /** Normalize `\r\n` and `\r` to `\n`. Default `true`. */
51
+ normalizeNewlines?: boolean;
52
+ /** Trim leading/trailing whitespace from the whole string. Default `false`. */
53
+ trim?: boolean;
54
+ }
55
+ /** The default, safe cleaning profile: strip the dangerous & invisible, keep meaning. */
56
+ declare const DEFAULT_OPTIONS: Required<CleanOptions>;
57
+ /** Turn everything on — for when you want maximally plain ASCII-ish text. */
58
+ declare const AGGRESSIVE_OPTIONS: Required<CleanOptions>;
59
+ /**
60
+ * Find every suspicious character in `text`.
61
+ *
62
+ * ```ts
63
+ * scan("hi​there");
+ * // [{ index: 2, char: "​", codePoint: 8203, hex: "U+200B",
+ * // name: "ZERO WIDTH SPACE", category: "zero-width", severity: "warning" }]
+ * ```
+ */
+ declare function scan(text: string): Finding[];
+ /** `true` if `text` contains no suspicious characters at all. */
+ declare function isClean(text: string): boolean;
+ interface Stats {
+ total: number;
+ byCategory: Record<Category, number>;
+ bySeverity: Record<Severity, number>;
+ }
+ /** Summarize what's lurking in `text` without listing every occurrence. */
+ declare function stats(text: string): Stats;
+ /**
+ * Return a cleaned copy of `text`. By default it strips the dangerous and
+ * invisible characters while preserving the visible meaning of your text.
+ *
+ * ```ts
+ * clean("Hello​world"); // "Helloworld"
+ * clean("“quote”", { smartPunctuation: true }); // '"quote"'
+ * clean("аdmin", { homoglyphs: true }); // "admin" (Cyrillic а → a)
+ * ```
+ */
+ declare function clean(text: string, options?: CleanOptions): string;
+ interface RevealOptions {
+ /** Custom token renderer. Default: `(f) => "[" + f.hex + "]"`. */
+ token?: (finding: Finding) => string;
+ }
+ /**
+ * Make invisible characters visible by replacing them with a readable token,
+ * leaving everything else untouched — handy for logs and terminals.
+ *
+ * ```ts
+ * reveal("a​b"); // "a[U+200B]b"
+ * ```
+ */
+ declare function reveal(text: string, options?: RevealOptions): string;
+
+ export { AGGRESSIVE_OPTIONS, type Category, type CleanOptions, DEFAULT_OPTIONS, type Finding, type RevealOptions, type Severity, type Stats, clean, isClean, reveal, scan, stats };
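The declarations above give the signatures of `scan` and `reveal` but not their bodies. As a rough illustration of how a table-driven scanner of this shape might work, here is a minimal sketch. It is not unspook's implementation: the names `sketchScan`, `sketchReveal`, `toHex`, and `SKETCH_TABLE` are hypothetical, the table holds only two code points, and the `"danger"` severity assigned to U+202E is an assumption (only the ZERO WIDTH SPACE entry is taken from the documented example).

```typescript
// Hypothetical sketch, NOT the package's source. Types mirror the declarations above.
type Category = "zero-width" | "bidi" | "tag" | "variation-selector" | "invisible-space" | "control" | "smart-punctuation" | "homoglyph";
type Severity = "danger" | "warning" | "info";

interface Finding {
  index: number;
  char: string;
  codePoint: number;
  hex: string;
  name: string;
  category: Category;
  severity: Severity;
}

// Tiny illustrative table; the real data table is assumed to be far larger.
const SKETCH_TABLE = new Map<number, { name: string; category: Category; severity: Severity }>([
  [0x200b, { name: "ZERO WIDTH SPACE", category: "zero-width", severity: "warning" }],
  // Severity below is a guess made for this sketch.
  [0x202e, { name: "RIGHT-TO-LEFT OVERRIDE", category: "bidi", severity: "danger" }],
]);

// Format a code point as "U+XXXX" with at least four uppercase hex digits.
function toHex(codePoint: number): string {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Walk the string by code point while tracking the UTF-16 index of each one.
function sketchScan(text: string): Finding[] {
  const findings: Finding[] = [];
  let index = 0;
  for (const char of text) {
    const codePoint = char.codePointAt(0)!;
    const entry = SKETCH_TABLE.get(codePoint);
    if (entry) {
      findings.push({ index, char, codePoint, hex: toHex(codePoint), ...entry });
    }
    index += char.length; // astral characters occupy two UTF-16 units
  }
  return findings;
}

// Replace each finding with a visible token and copy everything else verbatim.
function sketchReveal(
  text: string,
  token: (finding: Finding) => string = (f) => "[" + f.hex + "]",
): string {
  let out = "";
  let cursor = 0;
  for (const f of sketchScan(text)) {
    out += text.slice(cursor, f.index) + token(f);
    cursor = f.index + f.char.length;
  }
  return out + text.slice(cursor);
}
```

Iterating with `for...of` keeps surrogate pairs together, which matters because `index` is documented as a UTF-16 offset while `codePoint` is a full Unicode code point.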
@@ -0,0 +1,103 @@
+ /**
+ * Character data tables for unspook.
+ *
+ * Every code point we care about, grouped by category, with a human-readable
+ * name, a severity, and (where relevant) an ASCII/Latin replacement.
+ */
+ type Category = "zero-width" | "bidi" | "tag" | "variation-selector" | "invisible-space" | "control" | "smart-punctuation" | "homoglyph";
+ type Severity = "danger" | "warning" | "info";
+
+ /**
+ * unspook — find and remove the invisible, dangerous, and confusable
+ * characters hiding in your text.
+ *
+ * Zero dependencies. Pure functions. Runs anywhere JavaScript does.
+ */
+
+ interface Finding {
+ /** UTF-16 index of the character in the source string. */
+ index: number;
+ /** The offending character itself. */
+ char: string;
+ /** Unicode code point. */
+ codePoint: number;
+ /** Formatted code point, e.g. `"U+200B"`. */
+ hex: string;
+ /** Human-readable Unicode name. */
+ name: string;
+ category: Category;
+ severity: Severity;
+ }
+ interface CleanOptions {
+ /** Remove zero-width / invisible characters (ZWSP, BOM, word joiner…). Default `true`. */
+ zeroWidth?: boolean;
+ /** Remove bidirectional control characters (the "Trojan Source" class). Default `true`. */
+ bidi?: boolean;
+ /** Remove Unicode tag characters (invisible prompt-injection / watermarks). Default `true`. */
+ tag?: boolean;
+ /** Remove C0/C1 control characters. Default `true`. */
+ control?: boolean;
+ /** Remove variation selectors. Default `false` (they're legitimate in emoji). */
+ variationSelectors?: boolean;
+ /** Normalize exotic spaces (NBSP→space, soft hyphen→removed, line sep→newline). Default `true`. */
+ invisibleSpaces?: boolean;
+ /** Convert smart/typographic punctuation to ASCII (“ ”→", —→--, …→...). Default `false`. */
+ smartPunctuation?: boolean;
+ /** Map homoglyphs to their Latin look-alike (Cyrillic а→a, fullwidth A→A). Default `false`. */
+ homoglyphs?: boolean;
+ /** Collapse runs of spaces/tabs into one space. Default `false`. */
+ collapseWhitespace?: boolean;
+ /** Normalize `\r\n` and `\r` to `\n`. Default `true`. */
+ normalizeNewlines?: boolean;
+ /** Trim leading/trailing whitespace from the whole string. Default `false`. */
+ trim?: boolean;
+ }
+ /** The default, safe cleaning profile: strip the dangerous & invisible, keep meaning. */
+ declare const DEFAULT_OPTIONS: Required<CleanOptions>;
+ /** Turn everything on — for when you want maximally plain ASCII-ish text. */
+ declare const AGGRESSIVE_OPTIONS: Required<CleanOptions>;
+ /**
+ * Find every suspicious character in `text`.
+ *
+ * ```ts
+ * scan("hi​there");
+ * // [{ index: 2, char: "​", codePoint: 8203, hex: "U+200B",
+ * // name: "ZERO WIDTH SPACE", category: "zero-width", severity: "warning" }]
+ * ```
+ */
+ declare function scan(text: string): Finding[];
+ /** `true` if `text` contains no suspicious characters at all. */
+ declare function isClean(text: string): boolean;
+ interface Stats {
+ total: number;
+ byCategory: Record<Category, number>;
+ bySeverity: Record<Severity, number>;
+ }
+ /** Summarize what's lurking in `text` without listing every occurrence. */
+ declare function stats(text: string): Stats;
+ /**
+ * Return a cleaned copy of `text`. By default it strips the dangerous and
+ * invisible characters while preserving the visible meaning of your text.
+ *
+ * ```ts
+ * clean("Hello​world"); // "Helloworld"
+ * clean("“quote”", { smartPunctuation: true }); // '"quote"'
+ * clean("аdmin", { homoglyphs: true }); // "admin" (Cyrillic а → a)
+ * ```
+ */
+ declare function clean(text: string, options?: CleanOptions): string;
+ interface RevealOptions {
+ /** Custom token renderer. Default: `(f) => "[" + f.hex + "]"`. */
+ token?: (finding: Finding) => string;
+ }
+ /**
+ * Make invisible characters visible by replacing them with a readable token,
+ * leaving everything else untouched — handy for logs and terminals.
+ *
+ * ```ts
+ * reveal("a​b"); // "a[U+200B]b"
+ * ```
+ */
+ declare function reveal(text: string, options?: RevealOptions): string;
+
+ export { AGGRESSIVE_OPTIONS, type Category, type CleanOptions, DEFAULT_OPTIONS, type Finding, type RevealOptions, type Severity, type Stats, clean, isClean, reveal, scan, stats };
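`CleanOptions` documents a default for every flag, and `DEFAULT_OPTIONS` is typed `Required<CleanOptions>`, which suggests that caller-supplied options are layered over a complete default profile. The sketch below shows that pattern for four of the flags only. It is an assumption-laden illustration, not the package's code: `SketchOptions`, `SKETCH_DEFAULTS`, and `sketchClean` are hypothetical names, and the character sets handled are limited to the ones named in the doc comments (ZWSP, word joiner, BOM, curly double quotes, em dash, ellipsis).

```typescript
// Hypothetical sketch, NOT the package's source. Covers a subset of CleanOptions.
interface SketchOptions {
  zeroWidth?: boolean;
  smartPunctuation?: boolean;
  normalizeNewlines?: boolean;
  trim?: boolean;
}

// Defaults copied from the doc comments in the declarations above.
const SKETCH_DEFAULTS: Required<SketchOptions> = {
  zeroWidth: true,
  smartPunctuation: false,
  normalizeNewlines: true,
  trim: false,
};

function sketchClean(text: string, options: SketchOptions = {}): string {
  // Caller options override the defaults one flag at a time.
  const opts: Required<SketchOptions> = { ...SKETCH_DEFAULTS, ...options };
  let out = text;
  if (opts.normalizeNewlines) {
    out = out.replace(/\r\n?/g, "\n"); // "\r\n" and lone "\r" both become "\n"
  }
  if (opts.zeroWidth) {
    out = out.replace(/[\u200B\u2060\uFEFF]/g, ""); // ZWSP, word joiner, BOM
  }
  if (opts.smartPunctuation) {
    out = out
      .replace(/[\u201C\u201D]/g, '"') // curly double quotes
      .replace(/\u2014/g, "--") // em dash
      .replace(/\u2026/g, "..."); // horizontal ellipsis
  }
  if (opts.trim) {
    out = out.trim();
  }
  return out;
}
```

Under these assumptions, `sketchClean("Hello\u200Bworld")` yields `"Helloworld"` with no options passed, while smart quotes survive unless `smartPunctuation: true` is set, which matches the opt-in behaviour the doc comments describe.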
package/dist/index.js ADDED
@@ -0,0 +1,2 @@
+ export{b as AGGRESSIVE_OPTIONS,a as DEFAULT_OPTIONS,f as clean,d as isClean,g as reveal,c as scan,e as stats}from'./chunk-PXW2RFNH.js';//# sourceMappingURL=index.js.map
+ //# sourceMappingURL=index.js.map
@@ -0,0 +1 @@
+ {"version":3,"sources":[],"names":[],"mappings":"","file":"index.js"}
package/package.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "name": "unspook",
+   "version": "0.1.0",
+   "description": "Find & remove invisible, dangerous, and confusable characters hiding in your text — zero-width spaces, BOMs, bidi 'Trojan Source' controls, Unicode tag prompt-injection, homoglyphs, smart quotes & NBSP. Zero dependencies, runs locally.",
+   "type": "module",
+   "main": "./dist/index.js",
+   "module": "./dist/index.js",
+   "types": "./dist/index.d.ts",
+   "exports": {
+     ".": {
+       "types": "./dist/index.d.ts",
+       "import": "./dist/index.js",
+       "require": "./dist/index.cjs"
+     }
+   },
+   "bin": {
+     "unspook": "./dist/cli.js"
+   },
+   "files": [
+     "dist"
+   ],
+   "sideEffects": false,
+   "engines": {
+     "node": ">=18"
+   },
+   "scripts": {
+     "build": "tsup",
+     "build:web": "vite build",
+     "dev": "vite",
+     "test": "vitest run",
+     "test:watch": "vitest",
+     "typecheck": "tsc --noEmit",
+     "lint": "tsc --noEmit",
+     "prepublishOnly": "npm run build"
+   },
+   "keywords": [
+     "invisible-characters",
+     "zero-width-space",
+     "unicode",
+     "sanitize",
+     "homoglyph",
+     "trojan-source",
+     "bidi",
+     "smart-quotes",
+     "non-breaking-space",
+     "text-cleaner",
+     "prompt-injection",
+     "security",
+     "zero-dependency"
+   ],
+   "author": "didrod205 (https://github.com/didrod205)",
+   "license": "MIT",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/didrod205/unspook.git"
+   },
+   "bugs": {
+     "url": "https://github.com/didrod205/unspook/issues"
+   },
+   "homepage": "https://didrod205.github.io/unspook/",
+   "devDependencies": {
+     "@types/node": "^25.9.1",
+     "tsup": "^8.3.5",
+     "typescript": "^5.7.2",
+     "vite": "^6.0.0",
+     "vitest": "^2.1.8"
+   }
+ }