check-unicode 0.4.0__tar.gz
@@ -0,0 +1,12 @@
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.egg-info/
+ dist/
+ build/
+ .venv/
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ .coverage
+ uv.lock
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 mit-d
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,149 @@
+ Metadata-Version: 2.4
+ Name: check-unicode
+ Version: 0.4.0
+ Summary: Pre-commit hook to detect and fix non-ASCII Unicode characters
+ Project-URL: Changelog, https://github.com/mit-d/check-unicode/blob/main/CHANGELOG.md
+ Project-URL: Issues, https://github.com/mit-d/check-unicode/issues
+ Project-URL: Repository, https://github.com/mit-d/check-unicode
+ Author-email: mit-d <derekmttn@gmail.com>
+ License-Expression: MIT
+ License-File: LICENSE
+ Keywords: linting,pre-commit,security,trojan-source,unicode
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Classifier: Topic :: Software Development :: Quality Assurance
+ Requires-Python: >=3.11
+ Provides-Extra: dev
+ Requires-Dist: bump-my-version; extra == 'dev'
+ Requires-Dist: mypy; extra == 'dev'
+ Requires-Dist: pytest; extra == 'dev'
+ Requires-Dist: pytest-cov; extra == 'dev'
+ Requires-Dist: ruff; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # check-unicode
+
+ [![CI](https://github.com/mit-d/check-unicode/actions/workflows/test.yml/badge.svg)](https://github.com/mit-d/check-unicode/actions/workflows/test.yml)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)
+
+ A pre-commit hook to detect and fix non-ASCII Unicode characters in text files.
+
+ Catches smart quotes, em dashes, fancy spaces, dangerous invisible characters
+ (Trojan Source bidi attacks, zero-width chars), and other copy-paste artifacts.
+
+ ## Installation
+
+ ### As a pre-commit hook
+
+ ```yaml
+ repos:
+   - repo: https://github.com/mit-d/check-unicode
+     rev: v0.4.0
+     hooks:
+       - id: check-unicode
+       # or for auto-fix:
+       - id: fix-unicode
+ ```
+
+ ### Standalone
+
+ ```bash
+ pip install check-unicode
+ check-unicode path/to/file.txt
+ ```
+
+ ## Usage
+
+ ```text
+ check-unicode [OPTIONS] [FILES...]
+ ```
+
+ | Flag | Description | Default |
+ | ----------------------- | ----------------------------------------------------- | ------------- |
+ | `--fix` | Replace known offenders with ASCII, exit 1 if changed | off |
+ | `--allow-range RANGE` | Allow a Unicode range (e.g. `U+00A0-U+00FF`). Repeat. | none |
+ | `--allow-codepoint CP` | Allow codepoints (e.g. `U+00B0`). Repeat/comma-sep. | none |
+ | `--allow-category CAT` | Allow Unicode category (e.g. `Sc`). Repeatable. | none |
+ | `--allow-printable` | Allow all printable chars (only flag invisibles) | off |
+ | `--allow-script SCRIPT` | Allow Unicode script (e.g. `Latin`). Repeatable. | none |
+ | `--check-confusables` | Detect mixed-script homoglyph/confusable characters | off |
+ | `--severity LEVEL` | `error` (exit 1) or `warning` (print, exit 0) | `error` |
+ | `--no-color` | Disable ANSI color | auto-detect |
+ | `--config FILE` | Path to TOML config | auto-discover |
+ | `-q` / `--quiet` | Summary only | off |
+ | `-V` / `--version` | Print version | |
+
+ ## What it catches
+
+ - **Smart quotes**: `\u201c` `\u201d` `\u2018` `\u2019` and variants
+ - **Dashes**: em dash `\u2014`, en dash `\u2013`, minus sign `\u2212`
+ - **Fancy spaces**: non-breaking space, em space, thin space, etc.
+ - **Ellipsis**: `\u2026`
+ - **Dangerous invisible characters** (always flagged):
+   - Bidi control (Trojan Source CVE-2021-42574): U+202A-202E, U+2066-2069
+   - Zero-width: U+200B-200F, U+FEFF (mid-file), U+2060-2064, U+180E
+   - Replacement character: U+FFFD
+ - **Confusable homoglyphs** (with `--check-confusables`):
+   - Mixed-script identifiers where minority-script chars look like Latin
+   - Cyrillic/Greek/Armenian letters that visually resemble Latin letters
+   - e.g. Cyrillic `a` (U+0430) mixed with Latin `ccess_level`
+
+ ## Auto-fix
+
+ `--fix` replaces known offenders with ASCII equivalents:
+
+ | Unicode | Replacement |
+ | ------------ | ----------- |
+ | Smart quotes | `'` or `"` |
+ | En/em dashes | `--` |
+ | Minus sign | `-` |
+ | Fancy spaces | ` ` |
+ | Ellipsis | `...` |
+
+ Dangerous invisible characters are **never auto-fixed** -- they require manual
+ review.
+
+ ## Configuration
+
+ Create `.check-unicode.toml` or add to `pyproject.toml`:
+
+ ```toml
+ [tool.check-unicode]
+ allow-codepoints = ["U+00B0", "U+2192"]
+ allow-ranges = ["U+00A0-U+00FF"]
+ allow-categories = ["Sc"]
+ allow-printable = true
+ allow-scripts = ["Latin", "Cyrillic"]
+ check-confusables = true
+ severity = "error"
+ ```
+
+ ## Output
+
+ ```text
+ path/to/file.txt:42:17: U+201C LEFT DOUBLE QUOTATION MARK [Pi]
+ He said \u201chello\u201d to the crowd
+ ^
+ Found 5 non-ASCII characters in 2 files (3 fixable, 1 dangerous)
+ ```
+
+ ## Development
+
+ ```bash
+ uv venv && uv pip install -e ".[dev]"
+ .venv/bin/pytest -v --cov
+ .venv/bin/ruff check src/ tests/
+ .venv/bin/mypy src/
+ ```
+
+ ## License
+
+ MIT
@@ -0,0 +1,119 @@
+ # check-unicode
+
+ [![CI](https://github.com/mit-d/check-unicode/actions/workflows/test.yml/badge.svg)](https://github.com/mit-d/check-unicode/actions/workflows/test.yml)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)
+
+ A pre-commit hook to detect and fix non-ASCII Unicode characters in text files.
+
+ Catches smart quotes, em dashes, fancy spaces, dangerous invisible characters
+ (Trojan Source bidi attacks, zero-width chars), and other copy-paste artifacts.
+
+ ## Installation
+
+ ### As a pre-commit hook
+
+ ```yaml
+ repos:
+   - repo: https://github.com/mit-d/check-unicode
+     rev: v0.4.0
+     hooks:
+       - id: check-unicode
+       # or for auto-fix:
+       - id: fix-unicode
+ ```
+
+ ### Standalone
+
+ ```bash
+ pip install check-unicode
+ check-unicode path/to/file.txt
+ ```
+
+ ## Usage
+
+ ```text
+ check-unicode [OPTIONS] [FILES...]
+ ```
+
+ | Flag | Description | Default |
+ | ----------------------- | ----------------------------------------------------- | ------------- |
+ | `--fix` | Replace known offenders with ASCII, exit 1 if changed | off |
+ | `--allow-range RANGE` | Allow a Unicode range (e.g. `U+00A0-U+00FF`). Repeat. | none |
+ | `--allow-codepoint CP` | Allow codepoints (e.g. `U+00B0`). Repeat/comma-sep. | none |
+ | `--allow-category CAT` | Allow Unicode category (e.g. `Sc`). Repeatable. | none |
+ | `--allow-printable` | Allow all printable chars (only flag invisibles) | off |
+ | `--allow-script SCRIPT` | Allow Unicode script (e.g. `Latin`). Repeatable. | none |
+ | `--check-confusables` | Detect mixed-script homoglyph/confusable characters | off |
+ | `--severity LEVEL` | `error` (exit 1) or `warning` (print, exit 0) | `error` |
+ | `--no-color` | Disable ANSI color | auto-detect |
+ | `--config FILE` | Path to TOML config | auto-discover |
+ | `-q` / `--quiet` | Summary only | off |
+ | `-V` / `--version` | Print version | |
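
The `U+XXXX` and `U+XXXX-U+YYYY` notations accepted by `--allow-codepoint` and `--allow-range` can be illustrated with a small parser. This is only a sketch: the helper names (`parse_codepoint`, `parse_range`, `parse_codepoint_list`, `is_allowed`) are hypothetical and are not taken from the package's source, and the accepted digit count (4 to 6 hex digits) is an assumption.

```python
import re

# Hypothetical helpers for illustration; not the package's own API.
# Assumption: a codepoint is written as "U+" followed by 4 to 6 hex digits.
_CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")
MAX_CODEPOINT = 0x10FFFF


def parse_codepoint(text: str) -> int:
    """Turn 'U+00B0' into the integer 0xB0."""
    match = _CODEPOINT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected U+XXXX, got {text!r}")
    value = int(match.group(1), 16)
    if value > MAX_CODEPOINT:
        raise ValueError(f"codepoint out of range: {text!r}")
    return value


def parse_range(text: str) -> range:
    """Turn 'U+00A0-U+00FF' into an inclusive range of codepoints."""
    low_text, sep, high_text = text.partition("-")
    if not sep:
        raise ValueError(f"expected U+XXXX-U+YYYY, got {text!r}")
    low, high = parse_codepoint(low_text), parse_codepoint(high_text)
    if low > high:
        raise ValueError(f"range start exceeds end: {text!r}")
    return range(low, high + 1)


def parse_codepoint_list(text: str) -> set[int]:
    """Turn a comma-separated value such as 'U+00B0,U+00A9' into a set."""
    return {parse_codepoint(part) for part in text.split(",") if part.strip()}


def is_allowed(ch: str, ranges: list[range], codepoints: set[int]) -> bool:
    """True when the character is covered by an allowed range or codepoint."""
    cp = ord(ch)
    return cp in codepoints or any(cp in r for r in ranges)
```

With these helpers, `is_allowed("\u00e9", [parse_range("U+00A0-U+00FF")], set())` is true, while an em dash (U+2014) falls outside that range and would still be reported.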
+
+ ## What it catches
+
+ - **Smart quotes**: `\u201c` `\u201d` `\u2018` `\u2019` and variants
+ - **Dashes**: em dash `\u2014`, en dash `\u2013`, minus sign `\u2212`
+ - **Fancy spaces**: non-breaking space, em space, thin space, etc.
+ - **Ellipsis**: `\u2026`
+ - **Dangerous invisible characters** (always flagged):
+   - Bidi control (Trojan Source CVE-2021-42574): U+202A-202E, U+2066-2069
+   - Zero-width: U+200B-200F, U+FEFF (mid-file), U+2060-2064, U+180E
+   - Replacement character: U+FFFD
+ - **Confusable homoglyphs** (with `--check-confusables`):
+   - Mixed-script identifiers where minority-script chars look like Latin
+   - Cyrillic/Greek/Armenian letters that visually resemble Latin letters
+   - e.g. Cyrillic `a` (U+0430) mixed with Latin `ccess_level`
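
The "dangerous invisible" ranges listed above can be checked with a few lines of Python. The sketch below uses only the codepoint ranges stated in the list; the names `DANGEROUS_RANGES`, `is_dangerous` and `find_dangerous` are hypothetical, and treating a U+FEFF at the very start of the text as an acceptable byte-order mark is an assumption drawn from the "(mid-file)" note. It does not claim to mirror how the package itself is implemented.

```python
from collections.abc import Iterator

# Codepoint ranges copied from the list above (inclusive on both ends).
DANGEROUS_RANGES = [
    (0x202A, 0x202E),  # bidi embedding/override controls
    (0x2066, 0x2069),  # bidi isolate controls
    (0x200B, 0x200F),  # zero-width characters and directional marks
    (0x2060, 0x2064),  # word joiner and invisible operators
    (0x180E, 0x180E),  # Mongolian vowel separator
    (0xFEFF, 0xFEFF),  # zero-width no-break space / BOM
    (0xFFFD, 0xFFFD),  # replacement character
]


def is_dangerous(ch: str) -> bool:
    """True when the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in DANGEROUS_RANGES)


def find_dangerous(text: str) -> Iterator[tuple[int, int, str]]:
    """Yield (line, column, character) for each hit, both 1-based."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not is_dangerous(ch):
                continue
            # Assumption: a BOM in the first position of the file is not
            # reported, since the list only mentions U+FEFF "mid-file".
            if ch == "\ufeff" and line_no == 1 and col == 1:
                continue
            yield line_no, col, ch
```

For example, `list(find_dangerous("a\u202eb"))` reports a right-to-left override at line 1, column 2.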
+
+ ## Auto-fix
+
+ `--fix` replaces known offenders with ASCII equivalents:
+
+ | Unicode | Replacement |
+ | ------------ | ----------- |
+ | Smart quotes | `'` or `"` |
+ | En/em dashes | `--` |
+ | Minus sign | `-` |
+ | Fancy spaces | ` ` |
+ | Ellipsis | `...` |
+
+ Dangerous invisible characters are **never auto-fixed** -- they require manual
+ review.
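
A replacement table like the one above maps naturally onto `str.translate`. The following is a minimal sketch under stated assumptions: the `REPLACEMENTS` mapping covers only a handful of the characters named in this README (the package's real table is presumably larger), and `fix_text` and `fix_file` are hypothetical names. The write goes through a temporary file followed by `os.replace`, which is one common way to get the "temp file + rename" behavior the man page describes.

```python
import contextlib
import os
import tempfile
from pathlib import Path

# Illustrative subset only; characters are taken from the README's lists.
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',    # double smart quotes
    "\u2018": "'", "\u2019": "'",    # single smart quotes
    "\u2013": "--", "\u2014": "--",  # en and em dash
    "\u2212": "-",                   # minus sign
    "\u00a0": " ", "\u2003": " ", "\u2009": " ",  # NBSP, em space, thin space
    "\u2026": "...",                 # horizontal ellipsis
}
_TABLE = str.maketrans(REPLACEMENTS)


def fix_text(text: str) -> str:
    """Replace known offenders; anything not in the table is left alone."""
    return text.translate(_TABLE)


def fix_file(path: Path) -> bool:
    """Rewrite the file in place; return True if its content changed."""
    original = path.read_bytes().decode("utf-8")
    fixed = fix_text(original)
    if fixed == original:
        return False
    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(fixed.encode("utf-8"))
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return True
```

Because bidi controls and zero-width characters are absent from the table, `fix_text` passes them through unchanged, which matches the rule that they are never auto-fixed.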
+
+ ## Configuration
+
+ Create `.check-unicode.toml` or add to `pyproject.toml`:
+
+ ```toml
+ [tool.check-unicode]
+ allow-codepoints = ["U+00B0", "U+2192"]
+ allow-ranges = ["U+00A0-U+00FF"]
+ allow-categories = ["Sc"]
+ allow-printable = true
+ allow-scripts = ["Latin", "Cyrillic"]
+ check-confusables = true
+ severity = "error"
+ ```
+
+ ## Output
+
+ ```text
+ path/to/file.txt:42:17: U+201C LEFT DOUBLE QUOTATION MARK [Pi]
+ He said \u201chello\u201d to the crowd
+ ^
+ Found 5 non-ASCII characters in 2 files (3 fixable, 1 dangerous)
+ ```
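
The per-finding line (file, line, column, codepoint, Unicode name, general category) can be produced from the standard-library `unicodedata` module. This is a sketch only: `describe` and `render` are hypothetical names, the two-space indent of the context lines is an assumption, and the caret placement assumes every character is one column wide. The bracketed category is whatever `unicodedata.category` returns, which is `Pi` (initial quote punctuation) for U+201C.

```python
import unicodedata
from collections.abc import Iterator


def describe(path: str, line_no: int, col: int, ch: str) -> str:
    """Format one finding as path:line:col: U+XXXX NAME [Category]."""
    name = unicodedata.name(ch, "<unnamed>")
    category = unicodedata.category(ch)
    return f"{path}:{line_no}:{col}: U+{ord(ch):04X} {name} [{category}]"


def render(path: str, line_no: int, line: str) -> Iterator[str]:
    """Yield a header, the source line, and a caret for each non-ASCII char."""
    for col, ch in enumerate(line, start=1):
        if ord(ch) > 0x7F:
            yield describe(path, line_no, col, ch)
            yield "  " + line
            # Assumes one display column per character before the finding.
            yield "  " + " " * (col - 1) + "^"
```

Calling `describe("path/to/file.txt", 42, 17, "\u201c")` gives `path/to/file.txt:42:17: U+201C LEFT DOUBLE QUOTATION MARK [Pi]`.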
+
+ ## Development
+
+ ```bash
+ uv venv && uv pip install -e ".[dev]"
+ .venv/bin/pytest -v --cov
+ .venv/bin/ruff check src/ tests/
+ .venv/bin/mypy src/
+ ```
+
+ ## License
+
+ MIT
@@ -0,0 +1,357 @@
+ .\" Man page for check-unicode
+ .\" View with: man ./docs/check-unicode.1
+ .TH CHECK\-UNICODE 1 "2026-02-20" "check-unicode 0.4.0" "User Commands"
+ .
+ .SH NAME
+ check\-unicode \- detect and fix non\-ASCII Unicode characters in text files
+ .
+ .SH SYNOPSIS
+ .B check\-unicode
+ .RI [ OPTIONS ]
+ .IR FILE ...
+ .
+ .SH DESCRIPTION
+ .B check\-unicode
+ scans text files for non\-ASCII Unicode characters and reports their location,
+ codepoint, name, and category.
+ It is designed to catch copy\-paste artifacts such as smart quotes, em dashes,
+ fancy spaces, and dangerous invisible characters
+ (Trojan Source bidi attacks, zero\-width chars).
+ .PP
+ In
+ .B \-\-fix
+ mode it replaces known offenders with ASCII equivalents.
+ Dangerous invisible characters are never auto\-fixed and always require manual
+ review.
+ .PP
+ .B check\-unicode
+ is commonly used as a
+ .BR pre\-commit (1)
+ hook but also works as a standalone CLI tool.
+ .
+ .SH POSITIONAL ARGUMENTS
+ .TP
+ .I FILE ...
+ One or more files to check.
+ At least one file is required; the program exits with code\ 2 if none are
+ provided.
+ .
+ .SH OPTIONS
+ .SS Mode
+ .TP
+ .B \-\-fix
+ Replace known offenders (smart quotes, en/em dashes, fancy spaces, ellipsis)
+ with their ASCII equivalents using an atomic write (temp file + rename).
+ Exits\ 1 if any file was modified.
+ Dangerous invisible characters are never auto\-fixed.
+ .TP
+ .BR \-V ", " \-\-version
+ Print the program version and exit.
+ .
+ .SS Allow\-list options
+ These flags suppress findings for specific characters.
+ They extend (never replace) any values set in the config file.
+ Dangerous invisible characters are always flagged unless explicitly allowed by
+ .BR \-\-allow\-codepoint .
+ .TP
+ .BI \-\-allow\-range " RANGE"
+ Allow a Unicode range.
+ The format is
+ .IR U+XXXX\-U+YYYY .
+ May be repeated for multiple ranges.
+ .RS
+ .PP
+ Example:
+ .B \-\-allow\-range U+00A0\-U+00FF
+ .RE
+ .TP
+ .BI \-\-allow\-codepoint " CP"
+ Allow specific Unicode codepoints.
+ Accepts
+ .I U+XXXX
+ notation, comma\-separated and/or repeated.
+ This is the
+ .B only
+ flag that can suppress dangerous invisible characters.
+ .RS
+ .PP
+ Example:
+ .B \-\-allow\-codepoint U+00B0,U+00A9
+ .RE
+ .TP
+ .BI \-\-allow\-category " CAT"
+ Allow a Unicode general category.
+ May be repeated for multiple categories.
+ Use
+ .B \-\-list\-categories
+ to see all valid values.
+ .RS
+ .PP
+ Example:
+ .B \-\-allow\-category Sc
+ (Symbol, currency)
+ .RE
+ .TP
+ .B \-\-allow\-printable
+ Allow all printable non\-ASCII characters.
+ Only invisible and control characters will be flagged.
+ .TP
+ .BI \-\-allow\-script " SCRIPT"
+ Allow all characters from a Unicode script.
+ May be repeated.
+ Script names are case\-insensitive and normalized to title case.
+ Use
+ .B \-\-list\-scripts
+ to see all valid names.
+ .RS
+ .PP
+ Example:
+ .B \-\-allow\-script Cyrillic \-\-allow\-script Greek
+ .RE
+ .TP
+ .B \-\-list\-categories
+ Print all 30 Unicode general categories with descriptions and examples,
+ then exit.
+ Useful for discovering valid values for
+ .BR \-\-allow\-category .
+ .TP
+ .B \-\-list\-scripts
+ Print all known Unicode script names, then exit.
+ Useful for discovering valid values for
+ .BR \-\-allow\-script .
+ .
+ .SS Detection options
+ .TP
+ .B \-\-check\-confusables
+ Detect mixed\-script homoglyph/confusable characters, such as a Cyrillic
+ .B a
+ (U+0430) mixed into a Latin identifier.
+ This check is
+ .B not
+ suppressed by
+ .BR \-\-allow\-script .
+ .
+ .SS Output options
+ .TP
+ .BI \-\-severity " LEVEL"
+ Set exit\-code behavior.
+ .I LEVEL
+ must be
+ .B error
+ (exit\ 1 on findings) or
+ .B warning
+ (print findings but exit\ 0).
+ Default:
+ .BR error .
+ .TP
+ .B \-\-no\-color
+ Disable ANSI color output.
+ Color is also disabled when the
+ .B NO_COLOR
+ environment variable is set or stdout is not a TTY.
+ .TP
+ .BR \-q ", " \-\-quiet
+ Print the summary line only; suppress per\-finding details.
+ .
+ .SS Configuration
+ .TP
+ .BI \-\-config " FILE"
+ Path to a TOML config file.
+ If omitted, the program auto\-discovers
+ .I .check\-unicode.toml
+ in the current directory, or
+ .I [tool.check\-unicode]
+ in
+ .IR pyproject.toml .
+ .TP
+ .BI \-\-exclude\-pattern " PATTERN"
+ Exclude files matching a glob pattern.
+ May be repeated.
+ Extends any
+ .B exclude\-patterns
+ set in the config file.
+ Patterns are matched against both the full path and the basename.
+ .RS
+ .PP
+ Example:
+ .B \-\-exclude\-pattern '*.min.js' \-\-exclude\-pattern 'vendor/*'
+ .RE
+ .
+ .SH CONFIGURATION FILE
+ Settings can be stored in
+ .I .check\-unicode.toml
+ (standalone) or under the
+ .B [tool.check\-unicode]
+ table in
+ .IR pyproject.toml .
+ CLI flags always extend config\-file values; they never replace them.
+ .PP
+ .nf
+ .RS
+ [tool.check\-unicode]
+ allow\-codepoints = ["U+00B0", "U+2192"]
+ allow\-ranges = ["U+00A0\-U+00FF"]
+ allow\-categories = ["Sc"]
+ allow\-printable = true
+ allow\-scripts = ["Latin", "Cyrillic"]
+ check\-confusables = true
+ severity = "error"
+ exclude\-patterns = ["*.min.js", "vendor/*"]
+ .RE
+ .fi
+ .
+ .SH EXIT CODES
+ .TP
+ .B 0
+ No findings were detected, or
+ .B \-\-severity=warning
+ was used.
+ .TP
+ .B 1
+ Non\-ASCII findings were detected, or files were modified in
+ .B \-\-fix
+ mode.
+ .TP
+ .B 2
+ Usage error (invalid arguments, no files specified, etc.).
+ .
+ .SH WHAT IT CATCHES
+ .SS Copy\-paste artifacts (fixable with \-\-fix)
+ .TP
+ .B Smart quotes
+ \(lq\(rq \(oq\(cq and variants \(-> replaced with ASCII quotes
+ .TP
+ .B Dashes
+ Em dash (U+2014), en dash (U+2013), minus sign (U+2212) \(-> replaced with
+ .B \-\-
+ or
+ .BR \- .
+ .TP
+ .B Fancy spaces
+ Non\-breaking space, em space, thin space, and 14 other Unicode space characters
+ \(-> replaced with a regular space.
+ .TP
+ .B Ellipsis
+ Horizontal ellipsis (U+2026) \(-> replaced with
+ .BR ... .
+ .
+ .SS Dangerous invisible characters (never auto\-fixed)
+ .TP
+ .B Bidi controls (Trojan Source CVE\-2021\-42574)
+ U+202A\-202E (embedding/override), U+2066\-2069 (isolate).
+ These can make source code appear to do something different from what it
+ actually does.
+ .TP
+ .B Zero\-width characters
+ U+200B\-200F, U+FEFF (mid\-file BOM), U+2060\-2064, U+180E.
+ Invisible characters that can break identifiers or hide malicious code.
+ .TP
+ .B Replacement character
+ U+FFFD, usually indicates an encoding error.
+ .
+ .SS Confusable homoglyphs (with \-\-check\-confusables)
+ Mixed\-script identifiers where minority\-script characters visually resemble
+ Latin letters (e.g.\& Cyrillic
+ .I a
+ U+0430 in a Latin word).
+ .
+ .SH OUTPUT FORMAT
+ For each finding, the program prints the file, line, column, codepoint,
+ Unicode name, and general category:
+ .PP
+ .nf
+ .RS
+ path/to/file.txt:42:17: U+201C LEFT DOUBLE QUOTATION MARK [Pi]
+ He said \(lqhello\(rq to the crowd
+ ^
+ .RE
+ .fi
+ .PP
+ After all findings, a summary line is printed:
+ .PP
+ .nf
+ .RS
+ Found 5 non\-ASCII characters in 2 files (3 fixable, 1 dangerous)
+ .RE
+ .fi
+ .
+ .SH ENVIRONMENT
+ .TP
+ .B NO_COLOR
+ If set (to any value), ANSI color output is disabled.
+ See
+ .IR https://no\-color.org/ .
+ .
+ .SH EXAMPLES
+ Check all Python files in a project:
+ .PP
+ .RS
+ .B check\-unicode src/**/*.py
+ .RE
+ .PP
+ Auto\-fix smart quotes and dashes:
+ .PP
+ .RS
+ .B check\-unicode \-\-fix *.txt
+ .RE
+ .PP
+ Allow printable characters, flag only invisibles:
+ .PP
+ .RS
+ .B check\-unicode \-\-allow\-printable src/
+ .RE
+ .PP
+ Detect confusables while allowing Cyrillic script:
+ .PP
+ .RS
+ .B check\-unicode \-\-check\-confusables \-\-allow\-script Cyrillic src/
+ .RE
+ .PP
+ Warn without failing CI, disable color:
+ .PP
+ .RS
+ .B check\-unicode \-\-severity warning \-\-no\-color src/
+ .RE
+ .PP
+ List all valid Unicode script names:
+ .PP
+ .RS
+ .B check\-unicode \-\-list\-scripts
+ .RE
+ .PP
+ List all valid Unicode general categories:
+ .PP
+ .RS
+ .B check\-unicode \-\-list\-categories
+ .RE
+ .PP
+ Use with pre\-commit:
+ .PP
+ .nf
+ .RS
+ repos:
+   \- repo: https://github.com/mit\-d/check\-unicode
+     rev: v0.4.0
+     hooks:
+       \- id: check\-unicode
+       # or for auto\-fix:
+       \- id: fix\-unicode
+ .RE
+ .fi
+ .
+ .SH SEE ALSO
+ .BR pre\-commit (1),
+ .BR python3 (1),
+ .BR unicode (7)
+ .PP
+ Project repository:
+ .I https://github.com/mit\-d/check\-unicode
+ .
+ .SH AUTHORS
+ mit\-d <derekmttn@gmail.com>
+ .
+ .SH LICENSE
+ MIT License.
+ See the
+ .I LICENSE
+ file in the source distribution.