wcwidth 0.2.12__tar.gz → 0.2.14__tar.gz
- {wcwidth-0.2.12/wcwidth.egg-info → wcwidth-0.2.14}/PKG-INFO +35 -11
- {wcwidth-0.2.12 → wcwidth-0.2.14}/README.rst +20 -6
- {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/update-tables.py +51 -22
- {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/verify-table-integrity.py +38 -9
- {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/wcwidth-browser.py +10 -15
- {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/wcwidth-libc-comparator.py +3 -18
- {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/conf.py +1 -2
- {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/intro.rst +20 -6
- wcwidth-0.2.14/docs/requirements.txt +60 -0
- wcwidth-0.2.14/docs/specs.rst +80 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/unicode_version.rst +15 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-docs.in +1 -1
- {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-tests37.txt +2 -2
- wcwidth-0.2.14/requirements-tests39.in +4 -0
- wcwidth-0.2.14/requirements-tests39.txt +38 -0
- wcwidth-0.2.14/requirements-update.txt +27 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/setup.cfg +0 -3
- {wcwidth-0.2.12 → wcwidth-0.2.14}/setup.py +4 -5
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/test_core.py +89 -46
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/test_emojis.py +39 -46
- wcwidth-0.2.14/tests/test_table_integrity.py +18 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/test_ucslevel.py +6 -7
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tox.ini +17 -45
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/__init__.py +1 -1
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/table_vs16.py +2 -2
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/table_wide.py +339 -69
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/table_zero.py +729 -20
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/unicode_versions.py +3 -1
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/wcwidth.py +20 -36
- {wcwidth-0.2.12 → wcwidth-0.2.14/wcwidth.egg-info}/PKG-INFO +35 -11
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/SOURCES.txt +1 -3
- wcwidth-0.2.12/code_templates/python_table_width.py.j2 +0 -0
- wcwidth-0.2.12/docs/requirements.txt +0 -57
- wcwidth-0.2.12/docs/specs.rst +0 -58
- wcwidth-0.2.12/requirements-tests39.in +0 -15
- wcwidth-0.2.12/requirements-tests39.txt +0 -98
- wcwidth-0.2.12/requirements-update.txt +0 -26
- wcwidth-0.2.12/wcwidth.egg-info/requires.txt +0 -3
- {wcwidth-0.2.12 → wcwidth-0.2.14}/LICENSE +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/MANIFEST.in +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/new-wide-by-version.py +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/code_templates/python_table.py.j2 +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/code_templates/unicode_version.rst.j2 +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/code_templates/unicode_versions.py.j2 +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/api.rst +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/index.rst +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-develop.txt +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-tests36.txt +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-tests37.in +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-update.in +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/__init__.py +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/emoji-variation-sequences.txt +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/emoji-zwj-sequences.txt +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/dependency_links.txt +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/top_level.txt +0 -0
- {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/zip-safe +0 -0
@@ -1,6 +1,6 @@
-Metadata-Version: 2.
+Metadata-Version: 2.4
 Name: wcwidth
-Version: 0.2.
+Version: 0.2.14
 Summary: Measures the displayed width of unicode strings in a terminal
 Home-page: https://github.com/jquast/wcwidth
 Author: Jeff Quast
@@ -13,8 +13,6 @@ Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: POSIX
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Programming Language :: Python :: 3.5
 Classifier: Programming Language :: Python :: 3.6
 Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: 3.8
@@ -22,12 +20,24 @@ Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Software Development :: Libraries
 Classifier: Topic :: Software Development :: Localization
 Classifier: Topic :: Software Development :: Internationalization
 Classifier: Topic :: Terminals
+Requires-Python: >=3.6
 License-File: LICENSE
-
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: home-page
+Dynamic: keywords
+Dynamic: license
+Dynamic: license-file
+Dynamic: requires-python
+Dynamic: summary
 
 |pypi_downloads| |codecov| |license|
 
@@ -63,7 +73,7 @@ Example
     >>> text = u'コンニチハ'
 
 Python **incorrectly** uses the *string length* of 5 codepoints rather than the
-*
+*printable length* of 10 cells, so that when using the `rjust` function, the
 output length is wrong::
 
     >>> print(len('コンニチハ'))
@@ -126,7 +136,7 @@ Briefly, return values of function ``wcwidth()`` are:
 Function ``wcswidth()`` simply returns the sum of all values for each character
 along a string, or ``-1`` when it occurs anywhere along a string.
 
-Full API Documentation at https://wcwidth.readthedocs.
+Full API Documentation at https://wcwidth.readthedocs.io
 
 ==========
 Developing
@@ -136,9 +146,9 @@ Install wcwidth in editable mode::
 
     pip install -e .
 
-Execute unit tests using tox_::
+Execute unit tests using tox_ for all supported Python versions::
 
-    tox -e
+    tox -e py36,py37,py38,py39,py310,py311,py312,py313,py314
 
 Updating Unicode Version
 ------------------------
@@ -247,8 +257,19 @@ Other Languages
 =======
 History
 =======
+
+0.2.14 *2025-09-22*
+  * **Drop Support** for Python 2.7 and 3.5. `PR #117`_.
+  * **Update** tables to include Unicode Specifications 16.0.0 and 17.0.0.
+    `PR #146`_.
+  * **Bugfix** U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through
+    0.2.13 measured as 0. `PR #149`_.
+
+0.2.13 *2024-01-06*
+  * **Bugfix** zero-width support for Hangul Jamo (Korean)
+
 0.2.12 *2023-11-21*
-  * re-release to remove .pyi file misplaced in wheel files `Issue #101
+  * re-release to remove .pyi file misplaced in wheel files `Issue #101`_.
 
 0.2.11 *2023-11-20*
   * Include tests files in the source distribution (`PR #98`_, `PR #100`_).
@@ -286,7 +307,7 @@ History
     Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
     See the `jquast/ucs-detect`_ CLI utility for automatic detection.
   * **Enhancement**:
-    API Documentation is published to readthedocs.
+    API Documentation is published to readthedocs.io.
   * **Updated** tables for *all* Unicode Specifications with files
     published in a programmatically consumable format, versions 4.1.0
     through 13.0
@@ -364,6 +385,9 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
 .. _`PR #97`: https://github.com/jquast/wcwidth/pull/97
 .. _`PR #98`: https://github.com/jquast/wcwidth/pull/98
 .. _`PR #100`: https://github.com/jquast/wcwidth/pull/100
+.. _`PR #117`: https://github.com/jquast/wcwidth/pull/117
+.. _`PR #146`: https://github.com/jquast/wcwidth/pull/146
+.. _`PR #149`: https://github.com/jquast/wcwidth/pull/149
 .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
 .. _`jquast/blessed`: https://github.com/jquast/blessed
 .. _`selectel/pyte`: https://github.com/selectel/pyte
@@ -32,7 +32,7 @@ Example
     >>> text = u'コンニチハ'
 
 Python **incorrectly** uses the *string length* of 5 codepoints rather than the
-*
+*printable length* of 10 cells, so that when using the `rjust` function, the
 output length is wrong::
 
     >>> print(len('コンニチハ'))
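The README passage in this hunk contrasts codepoint count with printable cells. A minimal sketch of that gap, using only the standard library rather than wcwidth itself: `cell_width` is a hypothetical helper that counts East Asian Wide and Fullwidth characters as two cells and everything else as one, which ignores zero-width and combining characters that the real library handles.

```python
import unicodedata


def cell_width(text):
    """Approximate terminal cells: 2 for East Asian Wide/Fullwidth, else 1.

    Hypothetical helper for illustration; it does not handle zero-width
    or combining characters.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)


text = 'コンニチハ'
codepoints = len(text)       # what str.rjust() counts
cells = cell_width(text)     # what the terminal displays

# str.rjust pads by codepoints, so the displayed result overshoots 20 cells.
naive = text.rjust(20)

# Padding by cells keeps the displayed width at 20.
padded = ' ' * max(0, 20 - cells) + text
```

Here `naive` is 20 codepoints long but occupies more than 20 cells, while `padded` occupies exactly 20 cells under the assumptions above.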
@@ -95,7 +95,7 @@ Briefly, return values of function ``wcwidth()`` are:
 Function ``wcswidth()`` simply returns the sum of all values for each character
 along a string, or ``-1`` when it occurs anywhere along a string.
 
-Full API Documentation at https://wcwidth.readthedocs.
+Full API Documentation at https://wcwidth.readthedocs.io
 
 ==========
 Developing
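The summing rule described for ``wcswidth()`` in this hunk can be sketched as follows. This is not the library's implementation: `wcswidth_sketch` and `toy_width` are hypothetical names, and `toy_width` is a deliberately crude per-character rule (NUL is 0, other C0 controls and DEL are -1, everything else is 1) chosen only to exercise the logic.

```python
def toy_width(ch):
    """Hypothetical per-character width: 0 for NUL, -1 for controls, else 1."""
    code = ord(ch)
    if code == 0:
        return 0
    if code < 32 or code == 0x7F:
        return -1
    return 1


def wcswidth_sketch(text, width_of=toy_width):
    """Sum per-character widths, or return -1 as soon as any width is -1."""
    total = 0
    for ch in text:
        width = width_of(ch)
        if width < 0:
            return -1
        total += width
    return total
```

A single non-printable character anywhere in the string makes the whole result -1, which is why callers usually check for a negative return before using the value for padding.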
@@ -105,9 +105,9 @@ Install wcwidth in editable mode::
 
     pip install -e .
 
-Execute unit tests using tox_::
+Execute unit tests using tox_ for all supported Python versions::
 
-    tox -e
+    tox -e py36,py37,py38,py39,py310,py311,py312,py313,py314
 
 Updating Unicode Version
 ------------------------
@@ -216,8 +216,19 @@ Other Languages
 =======
 History
 =======
+
+0.2.14 *2025-09-22*
+  * **Drop Support** for Python 2.7 and 3.5. `PR #117`_.
+  * **Update** tables to include Unicode Specifications 16.0.0 and 17.0.0.
+    `PR #146`_.
+  * **Bugfix** U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through
+    0.2.13 measured as 0. `PR #149`_.
+
+0.2.13 *2024-01-06*
+  * **Bugfix** zero-width support for Hangul Jamo (Korean)
+
 0.2.12 *2023-11-21*
-  * re-release to remove .pyi file misplaced in wheel files `Issue #101
+  * re-release to remove .pyi file misplaced in wheel files `Issue #101`_.
 
 0.2.11 *2023-11-20*
   * Include tests files in the source distribution (`PR #98`_, `PR #100`_).
@@ -255,7 +266,7 @@ History
     Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
     See the `jquast/ucs-detect`_ CLI utility for automatic detection.
   * **Enhancement**:
-    API Documentation is published to readthedocs.
+    API Documentation is published to readthedocs.io.
   * **Updated** tables for *all* Unicode Specifications with files
     published in a programmatically consumable format, versions 4.1.0
     through 13.0
@@ -333,6 +344,9 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
 .. _`PR #97`: https://github.com/jquast/wcwidth/pull/97
 .. _`PR #98`: https://github.com/jquast/wcwidth/pull/98
 .. _`PR #100`: https://github.com/jquast/wcwidth/pull/100
+.. _`PR #117`: https://github.com/jquast/wcwidth/pull/117
+.. _`PR #146`: https://github.com/jquast/wcwidth/pull/146
+.. _`PR #149`: https://github.com/jquast/wcwidth/pull/149
 .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
 .. _`jquast/blessed`: https://github.com/jquast/blessed
 .. _`selectel/pyte`: https://github.com/selectel/pyte
@@ -21,7 +21,7 @@ import unicodedata
 from pathlib import Path
 from dataclasses import field, fields, dataclass
 
-from typing import Any, Mapping, Iterable, Iterator, Sequence,
+from typing import Any, Mapping, Iterable, Iterator, Sequence, Collection
 
 try:
     from typing import Self
@@ -54,6 +54,19 @@ FETCH_BLOCKSIZE = int(os.environ.get('FETCH_BLOCKSIZE', '4096'))
 MAX_RETRIES = int(os.environ.get('MAX_RETRIES', '6'))
 BACKOFF_FACTOR = float(os.environ.get('BACKOFF_FACTOR', '0.1'))
 
+# Hangul Jamo is a decomposed form of Hangul Syllables, see
+# see https://www.unicode.org/faq/korean.html#3
+# https://github.com/ridiculousfish/widecharwidth/pull/17
+# https://github.com/jquast/ucs-detect/issues/9
+# https://devblogs.microsoft.com/oldnewthing/20201009-00/?p=104351
+# "Conjoining Jamo are divided into three classes: L, V, T (Leading
+# consonant, Vowel, Trailing consonant). A Hangul Syllable consists of
+# <LV> or <LVT> sequences."
+HANGUL_JAMO_ZEROWIDTH = (
+    *range(0x1160, 0x1200), # Hangul Jungseong Filler .. Hangul Jongseong Ssangnieun
+    *range(0xD7B0, 0xD800), # Hangul Jungseong O-Yeo .. Undefined Character of Hangul Jamo Extended-B
+)
+
 
 def _bisearch(ucs, table):
     """A copy of wcwwidth._bisearch, to prevent having issues when depending on code that imports
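As a small check on what the `HANGUL_JAMO_ZEROWIDTH` tuple added in this hunk contains, the sketch below rebuilds it and inspects its size and boundaries. The `covers` helper is hypothetical; only the two `range()` calls come from the diff.

```python
# Rebuild the tuple exactly as written in the hunk: two half-open ranges
# unpacked into one flat tuple of codepoints.
HANGUL_JAMO_ZEROWIDTH = (
    *range(0x1160, 0x1200),
    *range(0xD7B0, 0xD800),
)


def covers(codepoint):
    """Hypothetical helper: is this codepoint in the zero-width Jamo set?"""
    return codepoint in HANGUL_JAMO_ZEROWIDTH


count = len(HANGUL_JAMO_ZEROWIDTH)   # 0xA0 + 0x50 codepoints
leading_consonant = covers(0x1100)   # Jamo before 0x1160 is not included
first_vowel = covers(0x1160)         # start of the first range
past_first_range = covers(0x1200)    # range() stop values are exclusive
```

Because `range()` excludes its stop value, 0x11FF and 0xD7FF are the last codepoints covered, and 0x1200 and 0xD800 are not.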
@@ -77,7 +90,7 @@ def _bisearch(ucs, table):
 
 @dataclass(order=True, frozen=True)
 class UnicodeVersion:
-    """A class for
+    """A class for comparable unicode version."""
     major: int
     minor: int
     micro: int | None
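The `@dataclass(order=True, frozen=True)` decorator in this hunk is what makes version objects comparable field by field. A minimal sketch of the idea follows, using a hypothetical `Version` class and `parse` method; unlike the `UnicodeVersion` class in the diff, it pads a missing micro component with 0 instead of allowing `None`, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Version:
    """Hypothetical stand-in: compares as the tuple (major, minor, micro)."""
    major: int
    minor: int
    micro: int

    @classmethod
    def parse(cls, text):
        """Parse 'X.Y' or 'X.Y.Z', padding missing components with 0."""
        parts = [int(part) for part in text.split('.')]
        while len(parts) < 3:
            parts.append(0)
        return cls(*parts[:3])


versions = [Version.parse(v) for v in ('15.1.0', '9.0.0', '13.0')]
newest = max(versions)
ordered = sorted(versions)
```

Comparing the raw strings would put '9.0.0' after '15.1.0'; comparing integer fields avoids that.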
@@ -112,11 +125,11 @@ class TableEntry:
     properties: tuple[str, ...]
     comment: str
 
-    def
+    def filter_by_category_width(self, wide: int) -> bool:
         """
-        Return whether entry matches
+        Return whether entry matches displayed width.
 
-
+        Parses both DerivedGeneralCategory.txt and EastAsianWidth.txt
         """
         if self.code_range is None:
             return False
@@ -146,13 +159,12 @@ class TableEntry:
         return wide == 1
 
     @staticmethod
-    def
-
-    wide: int) -> set[tuple[int, int]]:
+    def parse_width_category_values(table_iter: Iterator[TableEntry],
+                                    wide: int) -> set[tuple[int, int]]:
         """Parse value ranges of unicode data files, by given category and width."""
         return {n
                 for entry in table_iter
-                if entry.
+                if entry.filter_by_category_width(wide)
                 for n in list(range(entry.code_range[0], entry.code_range[1]))}
 
 
@@ -326,18 +338,19 @@ def fetch_table_wide_data() -> UnicodeTableRenderCtx:
     for version in fetch_unicode_versions():
         # parse typical 'wide' characters by categories 'W' and 'F',
         table[version] = parse_category(fname=UnicodeDataFile.EastAsianWidth(version),
-                                        category_codes=('W', 'F'),
                                         wide=2)
 
         # subtract(!) wide characters that were defined above as 'W' category in EastAsianWidth,
         # but also zero-width category 'Mn' or 'Mc' in DerivedGeneralCategory!
-        table[version].values.
-
-
+        table[version].values = table[version].values.difference(parse_category(
+            fname=UnicodeDataFile.DerivedGeneralCategory(version),
+            wide=0).values)
+
+        # Also subtract Hangul Jamo Vowels and Hangul Trailing Consonants
+        table[version].values = table[version].values.difference(HANGUL_JAMO_ZEROWIDTH)
 
         # finally, join with atypical 'wide' characters defined by category 'Sk',
         table[version].values.update(parse_category(fname=UnicodeDataFile.DerivedGeneralCategory(version),
-                                                    category_codes=('Sk',),
                                                     wide=2).values)
     return UnicodeTableRenderCtx('WIDE_EASTASIAN', table)
 
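The order of set operations in this hunk (start from the wide set, subtract zero-width values, subtract the Hangul Jamo values, then add the atypical wide values) can be sketched on plain Python sets. `build_wide` and its parameter names are hypothetical, and the input values below are illustrative placeholders rather than data read from the Unicode files.

```python
def build_wide(east_asian_wide, zero_width, jamo_zero, atypical_wide):
    """Hypothetical sketch of the wide-table assembly order shown in the hunk."""
    values = set(east_asian_wide)
    # difference() returns a new set, so the result must be reassigned.
    values = values.difference(zero_width)
    values = values.difference(jamo_zero)
    # update() mutates in place, so no reassignment is needed.
    values.update(atypical_wide)
    return values


# Illustrative inputs only.
wide = build_wide(
    east_asian_wide={0x1100, 0x1160, 0x302A, 0x4E00},
    zero_width={0x302A},
    jamo_zero={0x1160},
    atypical_wide={0x309B},
)
```

The reassignment matters: calling `values.difference(...)` without using its return value leaves `values` unchanged.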
@@ -352,11 +365,28 @@ def fetch_table_zero_data() -> UnicodeTableRenderCtx:
     for version in fetch_unicode_versions():
         # Determine values of zero-width character lookup table by the following category codes
         table[version] = parse_category(fname=UnicodeDataFile.DerivedGeneralCategory(version),
-                                        category_codes=('Me', 'Mn', 'Mc', 'Cf', 'Zl', 'Zp', 'Sk'),
                                         wide=0)
 
-        #
+        # Include NULL
         table[version].values.add(0)
+
+        # Add Hangul Jamo Vowels and Hangul Trailing Consonants
+        table[version].values.update(HANGUL_JAMO_ZEROWIDTH)
+
+        # Remove u+00AD categoryCode=Cf name="SOFT HYPHEN",
+        # > https://www.unicode.org/faq/casemap_charprop.html
+        #
+        # > Q: Unicode now treats the SOFT HYPHEN as format control (Cf)
+        # > character when formerly it was a punctuation character (Pd).
+        # > Doesn't this break ISO 8859-1 compatibility?
+        #
+        # > [..] In a terminal emulation environment, particularly in
+        # > ISO-8859-1 contexts, one could display the SOFT HYPHEN as a hyphen
+        # > in all circumstances.
+        #
+        # This value was wrongly measured as a width of '0' in this wcwidth
+        # versions 0.2.9 - 0.2.13. Fixed in 0.2.14
+        table[version].values.discard(0x00AD) # SOFT HYPHEN
     return UnicodeTableRenderCtx('ZERO_WIDTH', table)
 
 
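The SOFT HYPHEN exception in this hunk follows from its general category: a table built from category `Cf` picks up U+00AD unless it is removed explicitly. A minimal sketch with the standard `unicodedata` module, assuming a toy table restricted to the Latin-1 range and to the `Cf` category only; the real script reads DerivedGeneralCategory.txt and several categories.

```python
import unicodedata

SOFT_HYPHEN = 0x00AD

# Toy zero-width set: every Latin-1 codepoint whose category is 'Cf'.
zero_width = {cp for cp in range(0x100)
              if unicodedata.category(chr(cp)) == 'Cf'}
had_soft_hyphen = SOFT_HYPHEN in zero_width

# Mirror the two adjustments in the hunk: include NULL, drop SOFT HYPHEN.
zero_width.add(0)
zero_width.discard(SOFT_HYPHEN)
```

`set.discard()` is used rather than `remove()` so that the step stays safe for any data version in which the value is absent.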
@@ -379,7 +409,6 @@ def fetch_table_vs16_data() -> UnicodeTableRenderCtx:
     For that reason, and that these values are not expected to change,
     only this single shared table is exported.
 
-
     One example, where v3.2 became v1.1 ("-" 12.0, "+" 15.1)::
 
         -2620 FE0F ; Basic_Emoji ; skull and crossbones # 3.2 [1] (☠️)
@@ -482,7 +511,7 @@ def parse_unicode_table(file: Iterable[str]) -> Iterator[TableEntry]:
 
 
 def parse_vs16_table(fp: Iterable[str]) -> Iterator[TableEntry]:
-    """Parse emoji-variation-sequences.txt for codepoints that
+    """Parse emoji-variation-sequences.txt for codepoints that precede 0xFE0F."""
     hex_str_vs16 = 'FE0F'
     for line in fp:
         data, _, comment = line.partition('#')
@@ -496,14 +525,14 @@ def parse_vs16_table(fp: Iterable[str]) -> Iterator[TableEntry]:
             continue
         code_points = code_points_str.split()
         if len(code_points) == 2 and code_points[1] == hex_str_vs16:
-            #
+            # yield a single "code range" entry for a single value that precedes FE0F
             yield TableEntry((int(code_points[0], 16), int(code_points[0], 16)), tuple(properties), comment)
 
 
 @functools.cache
-def parse_category(fname: str,
+def parse_category(fname: str, wide: int) -> TableDef:
     """Parse value ranges of unicode data files, by given categories into string tables."""
-    print(f'parsing {fname}
+    print(f'parsing {fname}, wide={wide}: ', end='', flush=True)
 
     with open(fname, encoding='utf-8') as f:
         table_iter = parse_unicode_table(f)
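The `parse_vs16_table` lines visible in the hunks above split each line on `#`, then keep only two-codepoint sequences whose second value is `FE0F`. A self-contained sketch of that filter follows; `parse_vs16_lines` is a hypothetical name, it returns bare integers rather than `TableEntry` objects, and the sample lines are invented in the style of the one line quoted in the diff.

```python
def parse_vs16_lines(lines):
    """Yield the codepoint that precedes FE0F in each matching line."""
    for line in lines:
        # Drop the trailing comment, then take the first ';'-separated field.
        data, _, _comment = line.partition('#')
        fields = [field.strip() for field in data.split(';')]
        if not fields[0]:
            continue  # blank line or comment-only line
        code_points = fields[0].split()
        if len(code_points) == 2 and code_points[1] == 'FE0F':
            yield int(code_points[0], 16)


# Invented sample input for illustration.
sample = [
    '# comment-only line',
    '',
    '2620 FE0E ; text style ; # skull and crossbones',
    '2620 FE0F ; emoji style ; # skull and crossbones',
    '1F3F4 E0067 E0062 ; not a two-codepoint sequence',
]
result = list(parse_vs16_lines(sample))
```

Only the `FE0F` line contributes a value; the `FE0E` (text presentation) line and longer sequences are skipped.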
@@ -512,7 +541,7 @@ def parse_category(fname: str, category_codes: Container[str], wide: int) -> Tab
         version = next(table_iter).comment.strip()
         # and "date string" from second line
         date = next(table_iter).comment.split(':', 1)[1].strip()
-        values = TableEntry.parse_width_category_values(table_iter, wide)
+        values = TableEntry.parse_width_category_values(table_iter, wide)
     print('ok')
     return TableDef(version, date, values)
 
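`parse_category` is wrapped in `@functools.cache`, so repeated calls with the same `(fname, wide)` pair reuse one result. A small sketch of that behaviour, assuming Python 3.9 or later (where `functools.cache` exists); `parse_sketch`, `calls`, and the file names are hypothetical and no file is actually read.

```python
import functools

calls = []  # records each real invocation, for illustration only


@functools.cache
def parse_sketch(fname, wide):
    """Hypothetical stand-in for an expensive parse keyed on (fname, wide)."""
    calls.append((fname, wide))
    return {wide}  # a mutable result, to show that it is shared


first = parse_sketch('DerivedGeneralCategory.txt', 0)
second = parse_sketch('DerivedGeneralCategory.txt', 0)   # served from cache
other = parse_sketch('DerivedGeneralCategory.txt', 2)    # different key

# The cached object is shared: mutating it is visible to later callers.
first.add(99)
third = parse_sketch('DerivedGeneralCategory.txt', 0)
```

Arguments must be hashable for the cache key, and a caller that mutates a cached return value in place changes what every later caller receives.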
@@ -63,9 +63,31 @@ Category code was changed from 'Mc' to 'Lo':
 import logging
 
 
+def bisearch_pair(ucs, table):
+    """
+    A copy of wcwidth._bisearch() but also returns the range of matched values.
+    """
+    lbound = 0
+    ubound = len(table) - 1
+
+    if ucs < table[0][0] or ucs > table[ubound][1]:
+        return (0, None, None)
+    while ubound >= lbound:
+        mid = (lbound + ubound) // 2
+        if ucs > table[mid][1]:
+            lbound = mid + 1
+        elif ucs < table[mid][0]:
+            ubound = mid - 1
+        else:
+            return (1, table[mid][0], table[mid][1])
+
+    return (0, None, None)
+
+
 def main(log: logging.Logger):
     # local
-    from wcwidth import ZERO_WIDTH, WIDE_EASTASIAN,
+    from wcwidth import ZERO_WIDTH, WIDE_EASTASIAN, list_versions
+
     reversed_uni_versions = list(reversed(list_versions()))
     tables = {'ZERO_WIDTH': ZERO_WIDTH,
               'WIDE_EASTASIAN': WIDE_EASTASIAN}
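To see what `bisearch_pair` returns, the sketch below copies the function from this hunk and runs it against a small hand-made table of sorted, non-overlapping, inclusive `(start, end)` ranges. The table values are illustrative and are not taken from the generated wcwidth tables.

```python
def bisearch_pair(ucs, table):
    """Binary search over inclusive ranges; returns (found, start, end)."""
    lbound = 0
    ubound = len(table) - 1

    # Fast exit when the value lies outside the whole table.
    if ucs < table[0][0] or ucs > table[ubound][1]:
        return (0, None, None)
    while ubound >= lbound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return (1, table[mid][0], table[mid][1])

    return (0, None, None)


# Illustrative table: must be sorted and non-overlapping for the search to work.
sample_table = [(0x0300, 0x036F), (0x1160, 0x11FF)]

hit = bisearch_pair(0x0301, sample_table)
gap = bisearch_pair(0x0400, sample_table)      # between the two ranges
outside = bisearch_pair(0x0020, sample_table)  # below the first range
```

A hit reports the enclosing range, which is what lets the integrity check name both conflicting ranges in its error message.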
@@ -81,14 +103,21 @@ def main(log: logging.Logger):
             other_table = tables[other_table_name][version]
             for start_range, stop_range in curr_table:
                 for unichar_n in range(start_range, stop_range):
-
-
-
-
-
-
-
-
+                    result, _, _ = bisearch_pair(unichar_n, next_table)
+                    if not result:
+                        log.info(
+                            f'value 0x{unichar_n:05x} in table_name={table_name}'
+                            f' version={version} is not defined in next_version={next_version}'
+                            f' from inclusive range {hex(start_range)}-{hex(stop_range)}'
+                        )
+                    result, lbound, ubound = bisearch_pair(unichar_n, other_table)
+                    if result:
+                        log.error(
+                            f'value 0x{unichar_n:05x} in table_name={table_name}'
+                            f' version={version} is duplicated in other_table_name={other_table_name}'
+                            f' from inclusive range 0x{start_range:05x}-0x{stop_range:05x} of'
+                            f' {table_name} against 0x{lbound:05x}-0x{ubound:05x} in {other_table_name}'
+                        )
                         errors += 1
     if errors:
         log.error(f'{errors} errors, exit 1')
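The loop in this hunk checks each codepoint of one table against the other table to find duplicates. The same question can be asked per range instead of per codepoint. The sketch below uses that different technique, a pairwise interval-overlap test, and is not how the script does it; `find_overlaps` and the sample tables are hypothetical.

```python
def find_overlaps(table_a, table_b):
    """Return pairs of inclusive (start, end) ranges that share a codepoint."""
    overlaps = []
    for a_start, a_stop in table_a:
        for b_start, b_stop in table_b:
            # Two inclusive ranges intersect when each starts before the other ends.
            if a_start <= b_stop and b_start <= a_stop:
                overlaps.append(((a_start, a_stop), (b_start, b_stop)))
    return overlaps


# Illustrative tables only.
zero_table = [(0x0300, 0x036F), (0x1160, 0x11FF)]
wide_ok = [(0x1100, 0x115F), (0x2E80, 0x303E)]
wide_bad = [(0x1100, 0x1160)]  # last value collides with the zero table

clean = find_overlaps(zero_table, wide_ok)
conflict = find_overlaps(zero_table, wide_bad)
```

This version is quadratic in the number of ranges, which is acceptable for a sketch; the per-codepoint binary search in the diff additionally reports the exact conflicting value.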
@@ -22,7 +22,6 @@ Options:
 # Invalid constant name "echo"
 # Invalid constant name "flushout" (col 4)
 # Invalid module name "wcwidth-browser"
-from __future__ import division, print_function
 
 # std imports
 import sys
@@ -44,7 +43,7 @@ flushout = functools.partial(print, end='', flush=True)
 
 #: printable length of highest unicode character description
 LIMIT_UCS = 0x3fffd
-UCS_PRINTLEN = len('{
+UCS_PRINTLEN = len(f'{LIMIT_UCS:0x}')
 
 
 def readline(term, width):
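The f-string in this hunk computes how many hexadecimal digits the largest displayed codepoint needs. A short sketch of that formatting, alongside the zero-padded `05x` form used in the log messages of the table-integrity script; `label` is a hypothetical helper.

```python
LIMIT_UCS = 0x3fffd

# 'x' renders lowercase hexadecimal without a prefix.
hex_text = f'{LIMIT_UCS:0x}'
UCS_PRINTLEN = len(hex_text)


def label(codepoint):
    """Hypothetical helper: fixed-width hex label, zero-padded to 5 digits."""
    return f'0x{codepoint:05x}'


soft_hyphen_label = label(0x00AD)
limit_label = label(LIMIT_UCS)
```

Padding every label to the same width keeps columns aligned when values are printed in a table or a log.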
@@ -69,7 +68,7 @@ def readline(term, width):
     return text
 
 
-class WcWideCharacterGenerator
+class WcWideCharacterGenerator:
     """Generator yields unicode characters of the given ``width``."""
 
     # pylint: disable=R0903
@@ -101,7 +100,7 @@ class WcWideCharacterGenerator(object):
         return (ucs, name)
 
 
-class WcCombinedCharacterGenerator
+class WcCombinedCharacterGenerator:
     """Generator yields unicode characters with combining."""
 
     # pylint: disable=R0903
@@ -116,7 +115,7 @@ class WcCombinedCharacterGenerator(object):
         """
         self.characters = []
         letters_o = ('o' * width)
-        for (begin, end) in ZERO_WIDTH[unicode_version]:
+        for (begin, end) in ZERO_WIDTH[_wcmatch_version(unicode_version)]:
             for val in [_val for _val in
                         range(begin, end + 1)
                         if _val <= LIMIT_UCS]:
@@ -148,11 +147,8 @@ class WcCombinedCharacterGenerator(object):
                 continue
             return (ucs, name)
 
-    # python 2.6 - 3.3 compatibility
-    next = __next__
 
-
-class Style(object):
+class Style:
     """Styling decorator class instance for terminal output."""
 
     # pylint: disable=R0903
@@ -184,7 +180,7 @@ class Style(object):
             setattr(self, key, val)
 
 
-class Screen
+class Screen:
     """Represents terminal style, data dimensions, and drawables."""
 
     intro_msg_fmt = ('Delimiters ({delim}) should align, '
@@ -217,8 +213,7 @@ class Screen(object):
         """Text of a single column heading."""
         delimiter = self.style.attr_minor(self.style.delimiter)
         hint = self.style.header_hint * self.wide
-        heading =
-        .format(delimiter=delimiter, hint=hint))
+        heading = f'{delimiter}{hint}{delimiter}'
 
         def alignment(*args):
             if self.style.alignment == 'right':
@@ -264,7 +259,7 @@ class Screen(object):
         return self.num_rows * self.num_columns
 
 
-class Pager
+class Pager:
     """A less(1)-like browser for browsing unicode characters."""
     # pylint: disable=too-many-instance-attributes
 
@@ -570,10 +565,10 @@ class Pager(object):
         if idx == self.last_page:
             last_end = '(END)'
         else:
-            last_end = '/{
+            last_end = f'/{self.last_page}'
         txt = ('Page {idx}{last_end} - '
                '{q} to quit, [keys: {keyset}]'
-               .format(idx=style.attr_minor('{
+               .format(idx=style.attr_minor(f'{idx}'),
                        last_end=style.attr_major(last_end),
                        keyset=style.attr_major('kjfbvc12-='),
                        q=style.attr_minor('q')))
@@ -1,5 +1,4 @@
 #!/usr/bin/env python
-# coding: utf-8
 """
 Manual tests comparing wcwidth.py to libc's wcwidth(3) and wcswidth(3).
 
@@ -18,7 +17,6 @@ level for our library to use when comparing to libc.
 # Invalid module name "wcwidth-libc-comparator"
 
 # standard imports
-from __future__ import print_function
 
 # std imports
 import sys
@@ -64,25 +62,12 @@ def report_ucs_msg(ucs, wcwidth_libc, wcwidth_local):
            .decode('ascii')
            .upper()
            .lstrip('0'))
-    url = "http://codepoints.net/U+{}"
+    url = f"http://codepoints.net/U+{ucp}"
     name = unicodedata.name(ucs)
-    return (
+    return ("libc,ours={},{} [--o{}o--] name={} val={} {}"
             " ".format(wcwidth_libc, wcwidth_local, ucs, name, ord(ucs), url))
 
 
-# use chr() for py3.x,
-# unichr() for py2.x
-try:
-    _ = unichr(0)
-except NameError as err:
-    if err.args[0] == "name 'unichr' is not defined":
-        # pylint: disable=W0622
-        # Redefining built-in 'unichr' (col 8)
-
-        unichr = chr
-    else:
-        raise
-
 if sys.maxunicode < 1114111:
     warnings.warn('narrow Python build, only a small subset of '
                   'characters may be tested.')
@@ -108,7 +93,7 @@ def main(using_locale=('en_US', 'UTF-8',)):
     report a detailed AssertionError to stdout.
     """
     all_ucs = (ucs for ucs in
-               [
+               [chr(val) for val in range(sys.maxunicode)]
                if is_named(ucs) and is_not_combining(ucs))
 
     libc_name = ctypes.util.find_library('c')
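The generator in this hunk walks every codepoint and keeps those that are named and not combining. The definitions of `is_named` and `is_not_combining` are not shown in the diff, so the versions below are assumptions built on the standard `unicodedata` module, and the scan is limited to a small range to keep the sketch quick.

```python
import sys
import unicodedata


def is_named(ucs):
    """Assumed definition: the character has a Unicode name."""
    return bool(unicodedata.name(ucs, ''))


def is_not_combining(ucs):
    """Assumed definition: canonical combining class is 0."""
    return unicodedata.combining(ucs) == 0


# Same shape as the generator in the hunk, over a reduced range.
candidates = [ucs for ucs in
              (chr(val) for val in range(0x400))
              if is_named(ucs) and is_not_combining(ucs)]

# range(sys.maxunicode) stops one short of the final codepoint.
last_scanned = sys.maxunicode - 1
```

Control characters such as NUL have no name and are filtered out, as are combining marks such as U+0301.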
@@ -1,5 +1,4 @@
 #!/usr/bin/env python3
-# -*- coding: utf-8 -*-
 #
 # wcwidth documentation build configuration file, created by
 # sphinx-quickstart on Fri Oct 20 15:18:02 2017.
@@ -69,7 +68,7 @@ release = version = wcwidth.__version__
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language =
+language = 'en'
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
@@ -32,7 +32,7 @@ Example
     >>> text = u'コンニチハ'
 
 Python **incorrectly** uses the *string length* of 5 codepoints rather than the
-*
+*printable length* of 10 cells, so that when using the `rjust` function, the
 output length is wrong::
 
     >>> print(len('コンニチハ'))
@@ -95,7 +95,7 @@ Briefly, return values of function ``wcwidth()`` are:
 Function ``wcswidth()`` simply returns the sum of all values for each character
 along a string, or ``-1`` when it occurs anywhere along a string.
 
-Full API Documentation at https://wcwidth.readthedocs.
+Full API Documentation at https://wcwidth.readthedocs.io
 
 ==========
 Developing
@@ -105,9 +105,9 @@ Install wcwidth in editable mode::
 
     pip install -e .
 
-Execute unit tests using tox_::
+Execute unit tests using tox_ for all supported Python versions::
 
-    tox -e
+    tox -e py36,py37,py38,py39,py310,py311,py312,py313,py314
 
 Updating Unicode Version
 ------------------------
@@ -216,8 +216,19 @@ Other Languages
 =======
 History
 =======
+
+0.2.14 *2025-09-22*
+  * **Drop Support** for Python 2.7 and 3.5. `PR #117`_.
+  * **Update** tables to include Unicode Specifications 16.0.0 and 17.0.0.
+    `PR #146`_.
+  * **Bugfix** U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through
+    0.2.13 measured as 0. `PR #149`_.
+
+0.2.13 *2024-01-06*
+  * **Bugfix** zero-width support for Hangul Jamo (Korean)
+
 0.2.12 *2023-11-21*
-  * re-release to remove .pyi file misplaced in wheel files `Issue #101
+  * re-release to remove .pyi file misplaced in wheel files `Issue #101`_.
 
 0.2.11 *2023-11-20*
   * Include tests files in the source distribution (`PR #98`_, `PR #100`_).
@@ -255,7 +266,7 @@ History
     Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
     See the `jquast/ucs-detect`_ CLI utility for automatic detection.
   * **Enhancement**:
-    API Documentation is published to readthedocs.
+    API Documentation is published to readthedocs.io.
   * **Updated** tables for *all* Unicode Specifications with files
     published in a programmatically consumable format, versions 4.1.0
     through 13.0
@@ -333,6 +344,9 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
 .. _`PR #97`: https://github.com/jquast/wcwidth/pull/97
 .. _`PR #98`: https://github.com/jquast/wcwidth/pull/98
 .. _`PR #100`: https://github.com/jquast/wcwidth/pull/100
+.. _`PR #117`: https://github.com/jquast/wcwidth/pull/117
+.. _`PR #146`: https://github.com/jquast/wcwidth/pull/146
+.. _`PR #149`: https://github.com/jquast/wcwidth/pull/149
 .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
 .. _`jquast/blessed`: https://github.com/jquast/blessed
 .. _`selectel/pyte`: https://github.com/selectel/pyte