wcwidth 0.2.12__tar.gz → 0.2.14__tar.gz

Files changed (56)
  1. {wcwidth-0.2.12/wcwidth.egg-info → wcwidth-0.2.14}/PKG-INFO +35 -11
  2. {wcwidth-0.2.12 → wcwidth-0.2.14}/README.rst +20 -6
  3. {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/update-tables.py +51 -22
  4. {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/verify-table-integrity.py +38 -9
  5. {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/wcwidth-browser.py +10 -15
  6. {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/wcwidth-libc-comparator.py +3 -18
  7. {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/conf.py +1 -2
  8. {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/intro.rst +20 -6
  9. wcwidth-0.2.14/docs/requirements.txt +60 -0
  10. wcwidth-0.2.14/docs/specs.rst +80 -0
  11. {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/unicode_version.rst +15 -0
  12. {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-docs.in +1 -1
  13. {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-tests37.txt +2 -2
  14. wcwidth-0.2.14/requirements-tests39.in +4 -0
  15. wcwidth-0.2.14/requirements-tests39.txt +38 -0
  16. wcwidth-0.2.14/requirements-update.txt +27 -0
  17. {wcwidth-0.2.12 → wcwidth-0.2.14}/setup.cfg +0 -3
  18. {wcwidth-0.2.12 → wcwidth-0.2.14}/setup.py +4 -5
  19. {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/test_core.py +89 -46
  20. {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/test_emojis.py +39 -46
  21. wcwidth-0.2.14/tests/test_table_integrity.py +18 -0
  22. {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/test_ucslevel.py +6 -7
  23. {wcwidth-0.2.12 → wcwidth-0.2.14}/tox.ini +17 -45
  24. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/__init__.py +1 -1
  25. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/table_vs16.py +2 -2
  26. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/table_wide.py +339 -69
  27. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/table_zero.py +729 -20
  28. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/unicode_versions.py +3 -1
  29. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth/wcwidth.py +20 -36
  30. {wcwidth-0.2.12 → wcwidth-0.2.14/wcwidth.egg-info}/PKG-INFO +35 -11
  31. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/SOURCES.txt +1 -3
  32. wcwidth-0.2.12/code_templates/python_table_width.py.j2 +0 -0
  33. wcwidth-0.2.12/docs/requirements.txt +0 -57
  34. wcwidth-0.2.12/docs/specs.rst +0 -58
  35. wcwidth-0.2.12/requirements-tests39.in +0 -15
  36. wcwidth-0.2.12/requirements-tests39.txt +0 -98
  37. wcwidth-0.2.12/requirements-update.txt +0 -26
  38. wcwidth-0.2.12/wcwidth.egg-info/requires.txt +0 -3
  39. {wcwidth-0.2.12 → wcwidth-0.2.14}/LICENSE +0 -0
  40. {wcwidth-0.2.12 → wcwidth-0.2.14}/MANIFEST.in +0 -0
  41. {wcwidth-0.2.12 → wcwidth-0.2.14}/bin/new-wide-by-version.py +0 -0
  42. {wcwidth-0.2.12 → wcwidth-0.2.14}/code_templates/python_table.py.j2 +0 -0
  43. {wcwidth-0.2.12 → wcwidth-0.2.14}/code_templates/unicode_version.rst.j2 +0 -0
  44. {wcwidth-0.2.12 → wcwidth-0.2.14}/code_templates/unicode_versions.py.j2 +0 -0
  45. {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/api.rst +0 -0
  46. {wcwidth-0.2.12 → wcwidth-0.2.14}/docs/index.rst +0 -0
  47. {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-develop.txt +0 -0
  48. {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-tests36.txt +0 -0
  49. {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-tests37.in +0 -0
  50. {wcwidth-0.2.12 → wcwidth-0.2.14}/requirements-update.in +0 -0
  51. {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/__init__.py +0 -0
  52. {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/emoji-variation-sequences.txt +0 -0
  53. {wcwidth-0.2.12 → wcwidth-0.2.14}/tests/emoji-zwj-sequences.txt +0 -0
  54. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/dependency_links.txt +0 -0
  55. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/top_level.txt +0 -0
  56. {wcwidth-0.2.12 → wcwidth-0.2.14}/wcwidth.egg-info/zip-safe +0 -0
@@ -1,6 +1,6 @@
1
- Metadata-Version: 2.1
1
+ Metadata-Version: 2.4
2
2
  Name: wcwidth
3
- Version: 0.2.12
3
+ Version: 0.2.14
4
4
  Summary: Measures the displayed width of unicode strings in a terminal
5
5
  Home-page: https://github.com/jquast/wcwidth
6
6
  Author: Jeff Quast
@@ -13,8 +13,6 @@ Classifier: Development Status :: 5 - Production/Stable
13
13
  Classifier: Environment :: Console
14
14
  Classifier: License :: OSI Approved :: MIT License
15
15
  Classifier: Operating System :: POSIX
16
- Classifier: Programming Language :: Python :: 2.7
17
- Classifier: Programming Language :: Python :: 3.5
18
16
  Classifier: Programming Language :: Python :: 3.6
19
17
  Classifier: Programming Language :: Python :: 3.7
20
18
  Classifier: Programming Language :: Python :: 3.8
@@ -22,12 +20,24 @@ Classifier: Programming Language :: Python :: 3.9
22
20
  Classifier: Programming Language :: Python :: 3.10
23
21
  Classifier: Programming Language :: Python :: 3.11
24
22
  Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: Programming Language :: Python :: 3.13
24
+ Classifier: Programming Language :: Python :: 3.14
25
25
  Classifier: Topic :: Software Development :: Libraries
26
26
  Classifier: Topic :: Software Development :: Localization
27
27
  Classifier: Topic :: Software Development :: Internationalization
28
28
  Classifier: Topic :: Terminals
29
+ Requires-Python: >=3.6
29
30
  License-File: LICENSE
30
- Requires-Dist: backports.functools-lru-cache>=1.2.1; python_version < "3.2"
31
+ Dynamic: author
32
+ Dynamic: author-email
33
+ Dynamic: classifier
34
+ Dynamic: description
35
+ Dynamic: home-page
36
+ Dynamic: keywords
37
+ Dynamic: license
38
+ Dynamic: license-file
39
+ Dynamic: requires-python
40
+ Dynamic: summary
31
41
 
32
42
  |pypi_downloads| |codecov| |license|
33
43
 
@@ -63,7 +73,7 @@ Example
63
73
  >>> text = u'コンニチハ'
64
74
 
65
75
  Python **incorrectly** uses the *string length* of 5 codepoints rather than the
66
- *printible length* of 10 cells, so that when using the `rjust` function, the
76
+ *printable length* of 10 cells, so that when using the `rjust` function, the
67
77
  output length is wrong::
68
78
 
69
79
  >>> print(len('コンニチハ'))
@@ -126,7 +136,7 @@ Briefly, return values of function ``wcwidth()`` are:
126
136
  Function ``wcswidth()`` simply returns the sum of all values for each character
127
137
  along a string, or ``-1`` when it occurs anywhere along a string.
128
138
 
129
- Full API Documentation at https://wcwidth.readthedocs.org
139
+ Full API Documentation at https://wcwidth.readthedocs.io
130
140
 
131
141
  ==========
132
142
  Developing
@@ -136,9 +146,9 @@ Install wcwidth in editable mode::
136
146
 
137
147
  pip install -e .
138
148
 
139
- Execute unit tests using tox_::
149
+ Execute unit tests using tox_ for all supported Python versions::
140
150
 
141
- tox -e py27,py35,py36,py37,py38,py39,py310,py311,py312
151
+ tox -e py36,py37,py38,py39,py310,py311,py312,py313,py314
142
152
 
143
153
  Updating Unicode Version
144
154
  ------------------------
@@ -247,8 +257,19 @@ Other Languages
247
257
  =======
248
258
  History
249
259
  =======
260
+
261
+ 0.2.14 *2025-09-22*
262
+ * **Drop Support** for Python 2.7 and 3.5. `PR #117`_.
263
+ * **Update** tables to include Unicode Specifications 16.0.0 and 17.0.0.
264
+ `PR #146`_.
265
+ * **Bugfix** U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through
266
+ 0.2.13 measured as 0. `PR #149`_.
267
+
268
+ 0.2.13 *2024-01-06*
269
+ * **Bugfix** zero-width support for Hangul Jamo (Korean)
270
+
250
271
  0.2.12 *2023-11-21*
251
- * re-release to remove .pyi file misplaced in wheel files `Issue #101`.
272
+ * re-release to remove .pyi file misplaced in wheel files `Issue #101`_.
252
273
 
253
274
  0.2.11 *2023-11-20*
254
275
  * Include tests files in the source distribution (`PR #98`_, `PR #100`_).
@@ -286,7 +307,7 @@ History
286
307
  Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
287
308
  See the `jquast/ucs-detect`_ CLI utility for automatic detection.
288
309
  * **Enhancement**:
289
- API Documentation is published to readthedocs.org.
310
+ API Documentation is published to readthedocs.io.
290
311
  * **Updated** tables for *all* Unicode Specifications with files
291
312
  published in a programmatically consumable format, versions 4.1.0
292
313
  through 13.0
@@ -364,6 +385,9 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
364
385
  .. _`PR #97`: https://github.com/jquast/wcwidth/pull/97
365
386
  .. _`PR #98`: https://github.com/jquast/wcwidth/pull/98
366
387
  .. _`PR #100`: https://github.com/jquast/wcwidth/pull/100
388
+ .. _`PR #117`: https://github.com/jquast/wcwidth/pull/117
389
+ .. _`PR #146`: https://github.com/jquast/wcwidth/pull/146
390
+ .. _`PR #149`: https://github.com/jquast/wcwidth/pull/149
367
391
  .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
368
392
  .. _`jquast/blessed`: https://github.com/jquast/blessed
369
393
  .. _`selectel/pyte`: https://github.com/selectel/pyte
@@ -32,7 +32,7 @@ Example
32
32
  >>> text = u'コンニチハ'
33
33
 
34
34
  Python **incorrectly** uses the *string length* of 5 codepoints rather than the
35
- *printible length* of 10 cells, so that when using the `rjust` function, the
35
+ *printable length* of 10 cells, so that when using the `rjust` function, the
36
36
  output length is wrong::
37
37
 
38
38
  >>> print(len('コンニチハ'))
@@ -95,7 +95,7 @@ Briefly, return values of function ``wcwidth()`` are:
95
95
  Function ``wcswidth()`` simply returns the sum of all values for each character
96
96
  along a string, or ``-1`` when it occurs anywhere along a string.
97
97
 
98
- Full API Documentation at https://wcwidth.readthedocs.org
98
+ Full API Documentation at https://wcwidth.readthedocs.io
99
99
 
100
100
  ==========
101
101
  Developing
@@ -105,9 +105,9 @@ Install wcwidth in editable mode::
105
105
 
106
106
  pip install -e .
107
107
 
108
- Execute unit tests using tox_::
108
+ Execute unit tests using tox_ for all supported Python versions::
109
109
 
110
- tox -e py27,py35,py36,py37,py38,py39,py310,py311,py312
110
+ tox -e py36,py37,py38,py39,py310,py311,py312,py313,py314
111
111
 
112
112
  Updating Unicode Version
113
113
  ------------------------
@@ -216,8 +216,19 @@ Other Languages
216
216
  =======
217
217
  History
218
218
  =======
219
+
220
+ 0.2.14 *2025-09-22*
221
+ * **Drop Support** for Python 2.7 and 3.5. `PR #117`_.
222
+ * **Update** tables to include Unicode Specifications 16.0.0 and 17.0.0.
223
+ `PR #146`_.
224
+ * **Bugfix** U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through
225
+ 0.2.13 measured as 0. `PR #149`_.
226
+
227
+ 0.2.13 *2024-01-06*
228
+ * **Bugfix** zero-width support for Hangul Jamo (Korean)
229
+
219
230
  0.2.12 *2023-11-21*
220
- * re-release to remove .pyi file misplaced in wheel files `Issue #101`.
231
+ * re-release to remove .pyi file misplaced in wheel files `Issue #101`_.
221
232
 
222
233
  0.2.11 *2023-11-20*
223
234
  * Include tests files in the source distribution (`PR #98`_, `PR #100`_).
@@ -255,7 +266,7 @@ History
255
266
  Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
256
267
  See the `jquast/ucs-detect`_ CLI utility for automatic detection.
257
268
  * **Enhancement**:
258
- API Documentation is published to readthedocs.org.
269
+ API Documentation is published to readthedocs.io.
259
270
  * **Updated** tables for *all* Unicode Specifications with files
260
271
  published in a programmatically consumable format, versions 4.1.0
261
272
  through 13.0
@@ -333,6 +344,9 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
333
344
  .. _`PR #97`: https://github.com/jquast/wcwidth/pull/97
334
345
  .. _`PR #98`: https://github.com/jquast/wcwidth/pull/98
335
346
  .. _`PR #100`: https://github.com/jquast/wcwidth/pull/100
347
+ .. _`PR #117`: https://github.com/jquast/wcwidth/pull/117
348
+ .. _`PR #146`: https://github.com/jquast/wcwidth/pull/146
349
+ .. _`PR #149`: https://github.com/jquast/wcwidth/pull/149
336
350
  .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
337
351
  .. _`jquast/blessed`: https://github.com/jquast/blessed
338
352
  .. _`selectel/pyte`: https://github.com/selectel/pyte
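The README hunk above contrasts the string length of 5 codepoints with the printable length of 10 cells. A minimal sketch of that difference follows. It uses only the standard library's `unicodedata` module as a rough stand-in and is not how wcwidth itself measures width. The helper names `approx_display_width` and `rjust_cells` are hypothetical.

```python
import unicodedata


def approx_display_width(text: str) -> int:
    """Approximate the number of terminal cells occupied by text.

    Illustrative only: wcwidth uses its own generated tables, not this rule.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous cell.
            continue
        # East Asian Width 'W' (wide) and 'F' (fullwidth) take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width


def rjust_cells(text: str, total: int, fill: str = ' ') -> str:
    """Right-justify by cell width instead of by codepoint count."""
    return fill * max(0, total - approx_display_width(text)) + text


text = 'コンニチハ'
print(len(text))                   # number of codepoints
print(approx_display_width(text))  # approximate number of cells
print(repr(rjust_cells(text, 20, '_')))
```

Padding by cell count rather than by `len()` is the adjustment the README text argues for. The real library also handles zero-width and version-specific cases that this sketch ignores.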
@@ -21,7 +21,7 @@ import unicodedata
21
21
  from pathlib import Path
22
22
  from dataclasses import field, fields, dataclass
23
23
 
24
- from typing import Any, Mapping, Iterable, Iterator, Sequence, Container, Collection
24
+ from typing import Any, Mapping, Iterable, Iterator, Sequence, Collection
25
25
 
26
26
  try:
27
27
  from typing import Self
@@ -54,6 +54,19 @@ FETCH_BLOCKSIZE = int(os.environ.get('FETCH_BLOCKSIZE', '4096'))
54
54
  MAX_RETRIES = int(os.environ.get('MAX_RETRIES', '6'))
55
55
  BACKOFF_FACTOR = float(os.environ.get('BACKOFF_FACTOR', '0.1'))
56
56
 
57
+ # Hangul Jamo is a decomposed form of Hangul Syllables, see
58
+ # see https://www.unicode.org/faq/korean.html#3
59
+ # https://github.com/ridiculousfish/widecharwidth/pull/17
60
+ # https://github.com/jquast/ucs-detect/issues/9
61
+ # https://devblogs.microsoft.com/oldnewthing/20201009-00/?p=104351
62
+ # "Conjoining Jamo are divided into three classes: L, V, T (Leading
63
+ # consonant, Vowel, Trailing consonant). A Hangul Syllable consists of
64
+ # <LV> or <LVT> sequences."
65
+ HANGUL_JAMO_ZEROWIDTH = (
66
+ *range(0x1160, 0x1200), # Hangul Jungseong Filler .. Hangul Jongseong Ssangnieun
67
+ *range(0xD7B0, 0xD800), # Hangul Jungseong O-Yeo .. Undefined Character of Hangul Jamo Extended-B
68
+ )
69
+
57
70
 
58
71
  def _bisearch(ucs, table):
59
72
  """A copy of wcwwidth._bisearch, to prevent having issues when depending on code that imports
@@ -77,7 +90,7 @@ def _bisearch(ucs, table):
77
90
 
78
91
  @dataclass(order=True, frozen=True)
79
92
  class UnicodeVersion:
80
- """A class for camparable unicode version."""
93
+ """A class for comparable unicode version."""
81
94
  major: int
82
95
  minor: int
83
96
  micro: int | None
@@ -112,11 +125,11 @@ class TableEntry:
112
125
  properties: tuple[str, ...]
113
126
  comment: str
114
127
 
115
- def filter_by_category(self, category_codes: str, wide: int) -> bool:
128
+ def filter_by_category_width(self, wide: int) -> bool:
116
129
  """
117
- Return whether entry matches given category code and displayed width.
130
+ Return whether entry matches displayed width.
118
131
 
119
- Categories are described here, https://www.unicode.org/reports/tr44/#GC_Values_Table
132
+ Parses both DerivedGeneralCategory.txt and EastAsianWidth.txt
120
133
  """
121
134
  if self.code_range is None:
122
135
  return False
@@ -146,13 +159,12 @@ class TableEntry:
146
159
  return wide == 1
147
160
 
148
161
  @staticmethod
149
- def parse_category_values(category_codes: str,
150
- table_iter: Iterator[TableEntry],
151
- wide: int) -> set[tuple[int, int]]:
162
+ def parse_width_category_values(table_iter: Iterator[TableEntry],
163
+ wide: int) -> set[tuple[int, int]]:
152
164
  """Parse value ranges of unicode data files, by given category and width."""
153
165
  return {n
154
166
  for entry in table_iter
155
- if entry.filter_by_category(category_codes, wide)
167
+ if entry.filter_by_category_width(wide)
156
168
  for n in list(range(entry.code_range[0], entry.code_range[1]))}
157
169
 
158
170
 
@@ -326,18 +338,19 @@ def fetch_table_wide_data() -> UnicodeTableRenderCtx:
326
338
  for version in fetch_unicode_versions():
327
339
  # parse typical 'wide' characters by categories 'W' and 'F',
328
340
  table[version] = parse_category(fname=UnicodeDataFile.EastAsianWidth(version),
329
- category_codes=('W', 'F'),
330
341
  wide=2)
331
342
 
332
343
  # subtract(!) wide characters that were defined above as 'W' category in EastAsianWidth,
333
344
  # but also zero-width category 'Mn' or 'Mc' in DerivedGeneralCategory!
334
- table[version].values.discard(parse_category(fname=UnicodeDataFile.DerivedGeneralCategory(version),
335
- category_codes=('Mn', 'Mc'),
336
- wide=0).values)
345
+ table[version].values = table[version].values.difference(parse_category(
346
+ fname=UnicodeDataFile.DerivedGeneralCategory(version),
347
+ wide=0).values)
348
+
349
+ # Also subtract Hangul Jamo Vowels and Hangul Trailing Consonants
350
+ table[version].values = table[version].values.difference(HANGUL_JAMO_ZEROWIDTH)
337
351
 
338
352
  # finally, join with atypical 'wide' characters defined by category 'Sk',
339
353
  table[version].values.update(parse_category(fname=UnicodeDataFile.DerivedGeneralCategory(version),
340
- category_codes=('Sk',),
341
354
  wide=2).values)
342
355
  return UnicodeTableRenderCtx('WIDE_EASTASIAN', table)
343
356
 
@@ -352,11 +365,28 @@ def fetch_table_zero_data() -> UnicodeTableRenderCtx:
352
365
  for version in fetch_unicode_versions():
353
366
  # Determine values of zero-width character lookup table by the following category codes
354
367
  table[version] = parse_category(fname=UnicodeDataFile.DerivedGeneralCategory(version),
355
- category_codes=('Me', 'Mn', 'Mc', 'Cf', 'Zl', 'Zp', 'Sk'),
356
368
  wide=0)
357
369
 
358
- # And, include NULL
370
+ # Include NULL
359
371
  table[version].values.add(0)
372
+
373
+ # Add Hangul Jamo Vowels and Hangul Trailing Consonants
374
+ table[version].values.update(HANGUL_JAMO_ZEROWIDTH)
375
+
376
+ # Remove u+00AD categoryCode=Cf name="SOFT HYPHEN",
377
+ # > https://www.unicode.org/faq/casemap_charprop.html
378
+ #
379
+ # > Q: Unicode now treats the SOFT HYPHEN as format control (Cf)
380
+ # > character when formerly it was a punctuation character (Pd).
381
+ # > Doesn't this break ISO 8859-1 compatibility?
382
+ #
383
+ # > [..] In a terminal emulation environment, particularly in
384
+ # > ISO-8859-1 contexts, one could display the SOFT HYPHEN as a hyphen
385
+ # > in all circumstances.
386
+ #
387
+ # This value was wrongly measured as a width of '0' in this wcwidth
388
+ # versions 0.2.9 - 0.2.13. Fixed in 0.2.14
389
+ table[version].values.discard(0x00AD) # SOFT HYPHEN
360
390
  return UnicodeTableRenderCtx('ZERO_WIDTH', table)
361
391
 
362
392
 
@@ -379,7 +409,6 @@ def fetch_table_vs16_data() -> UnicodeTableRenderCtx:
379
409
  For that reason, and that these values are not expected to change,
380
410
  only this single shared table is exported.
381
411
 
382
-
383
412
  One example, where v3.2 became v1.1 ("-" 12.0, "+" 15.1)::
384
413
 
385
414
  -2620 FE0F ; Basic_Emoji ; skull and crossbones # 3.2 [1] (☠️)
@@ -482,7 +511,7 @@ def parse_unicode_table(file: Iterable[str]) -> Iterator[TableEntry]:
482
511
 
483
512
 
484
513
  def parse_vs16_table(fp: Iterable[str]) -> Iterator[TableEntry]:
485
- """Parse emoji-variation-sequences.txt for codepoints that preceed 0xFE0F."""
514
+ """Parse emoji-variation-sequences.txt for codepoints that precede 0xFE0F."""
486
515
  hex_str_vs16 = 'FE0F'
487
516
  for line in fp:
488
517
  data, _, comment = line.partition('#')
@@ -496,14 +525,14 @@ def parse_vs16_table(fp: Iterable[str]) -> Iterator[TableEntry]:
496
525
  continue
497
526
  code_points = code_points_str.split()
498
527
  if len(code_points) == 2 and code_points[1] == hex_str_vs16:
499
- # yeild a single "code range" entry for a single value that preceeds FE0F
528
+ # yield a single "code range" entry for a single value that precedes FE0F
500
529
  yield TableEntry((int(code_points[0], 16), int(code_points[0], 16)), tuple(properties), comment)
501
530
 
502
531
 
503
532
  @functools.cache
504
- def parse_category(fname: str, category_codes: Container[str], wide: int) -> TableDef:
533
+ def parse_category(fname: str, wide: int) -> TableDef:
505
534
  """Parse value ranges of unicode data files, by given categories into string tables."""
506
- print(f'parsing {fname} category_codes={",".join(category_codes)}: ', end='', flush=True)
535
+ print(f'parsing {fname}, wide={wide}: ', end='', flush=True)
507
536
 
508
537
  with open(fname, encoding='utf-8') as f:
509
538
  table_iter = parse_unicode_table(f)
@@ -512,7 +541,7 @@ def parse_category(fname: str, category_codes: Container[str], wide: int) -> Tab
512
541
  version = next(table_iter).comment.strip()
513
542
  # and "date string" from second line
514
543
  date = next(table_iter).comment.split(':', 1)[1].strip()
515
- values = TableEntry.parse_category_values(category_codes, table_iter, wide)
544
+ values = TableEntry.parse_width_category_values(table_iter, wide)
516
545
  print('ok')
517
546
  return TableDef(version, date, values)
518
547
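The update-tables.py hunks above replace a `values.discard(<set>)` call with `values.difference(<set>)`. The diff does not state the reason. One plausible reading is that `set.discard()` removes a single element, so passing a whole set to it removes nothing. The sketch below shows that behavior with made-up sample values. Only the two `range()` bounds are copied from the diff.

```python
# Hypothetical sample data, not the real wcwidth tables.
wide = {0x1100, 0x1160, 0x1161, 0x3000}
zero = {0x1160, 0x1161}

# set.discard() removes one *element*. Given a set argument, Python looks for
# an equivalent frozenset among the members, finds none, and does nothing.
unchanged = set(wide)
unchanged.discard(zero)

# set.difference() returns a new set without any member of the other iterable.
subtracted = wide.difference(zero)

# A tuple of code points, built like HANGUL_JAMO_ZEROWIDTH in the diff,
# can be subtracted the same way because difference() accepts any iterable.
hangul_jamo_zerowidth = (*range(0x1160, 0x1200), *range(0xD7B0, 0xD800))
without_jamo = wide.difference(hangul_jamo_zerowidth)

print(sorted(hex(n) for n in unchanged))
print(sorted(hex(n) for n in subtracted))
print(sorted(hex(n) for n in without_jamo))
```

Because `difference()` returns a new set instead of mutating in place, the result has to be assigned back, which matches the `table[version].values = ...` form used in the added lines.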
 
@@ -63,9 +63,31 @@ Category code was changed from 'Mc' to 'Lo':
63
63
  import logging
64
64
 
65
65
 
66
+ def bisearch_pair(ucs, table):
67
+ """
68
+ A copy of wcwidth._bisearch() but also returns the range of matched values.
69
+ """
70
+ lbound = 0
71
+ ubound = len(table) - 1
72
+
73
+ if ucs < table[0][0] or ucs > table[ubound][1]:
74
+ return (0, None, None)
75
+ while ubound >= lbound:
76
+ mid = (lbound + ubound) // 2
77
+ if ucs > table[mid][1]:
78
+ lbound = mid + 1
79
+ elif ucs < table[mid][0]:
80
+ ubound = mid - 1
81
+ else:
82
+ return (1, table[mid][0], table[mid][1])
83
+
84
+ return (0, None, None)
85
+
86
+
66
87
  def main(log: logging.Logger):
67
88
  # local
68
- from wcwidth import ZERO_WIDTH, WIDE_EASTASIAN, _bisearch, list_versions
89
+ from wcwidth import ZERO_WIDTH, WIDE_EASTASIAN, list_versions
90
+
69
91
  reversed_uni_versions = list(reversed(list_versions()))
70
92
  tables = {'ZERO_WIDTH': ZERO_WIDTH,
71
93
  'WIDE_EASTASIAN': WIDE_EASTASIAN}
@@ -81,14 +103,21 @@ def main(log: logging.Logger):
81
103
  other_table = tables[other_table_name][version]
82
104
  for start_range, stop_range in curr_table:
83
105
  for unichar_n in range(start_range, stop_range):
84
- if not _bisearch(unichar_n, next_table):
85
- log.info(f'value {hex(unichar_n)} in table_name={table_name}'
86
- f' version={version} is not defined in next_version={next_version}'
87
- f' from inclusive range {hex(start_range)}-{hex(stop_range)}')
88
- if _bisearch(unichar_n, other_table):
89
- log.error(f'value {hex(unichar_n)} in table_name={table_name}'
90
- f' version={version} is duplicated in other_table_name={other_table_name}'
91
- f' from inclusive range {hex(start_range)}-{hex(stop_range)}')
106
+ result, _, _ = bisearch_pair(unichar_n, next_table)
107
+ if not result:
108
+ log.info(
109
+ f'value 0x{unichar_n:05x} in table_name={table_name}'
110
+ f' version={version} is not defined in next_version={next_version}'
111
+ f' from inclusive range {hex(start_range)}-{hex(stop_range)}'
112
+ )
113
+ result, lbound, ubound = bisearch_pair(unichar_n, other_table)
114
+ if result:
115
+ log.error(
116
+ f'value 0x{unichar_n:05x} in table_name={table_name}'
117
+ f' version={version} is duplicated in other_table_name={other_table_name}'
118
+ f' from inclusive range 0x{start_range:05x}-0x{stop_range:05x} of'
119
+ f' {table_name} against 0x{lbound:05x}-0x{ubound:05x} in {other_table_name}'
120
+ )
92
121
  errors += 1
93
122
  if errors:
94
123
  log.error(f'{errors} errors, exit 1')
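The `bisearch_pair` helper added above is a binary search over a sorted table of inclusive `(start, end)` ranges that also reports which range matched. The function body below is reproduced from the diff with indentation restored. `SAMPLE_TABLE` is a hypothetical three-range table for illustration, not one of the generated wcwidth tables.

```python
def bisearch_pair(ucs, table):
    """Binary search that returns (found, range_start, range_end)."""
    lbound = 0
    ubound = len(table) - 1

    # Fast rejection when the value lies outside the whole table.
    if ucs < table[0][0] or ucs > table[ubound][1]:
        return (0, None, None)
    while ubound >= lbound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return (1, table[mid][0], table[mid][1])

    return (0, None, None)


# Hypothetical sample: sorted, non-overlapping, inclusive ranges.
SAMPLE_TABLE = (
    (0x0300, 0x036F),
    (0x1160, 0x11FF),
    (0xD7B0, 0xD7FF),
)

for value in (0x0041, 0x0400, 0x1161, 0xD7FF):
    found, start, end = bisearch_pair(value, SAMPLE_TABLE)
    if found:
        print(f'0x{value:05x} is in 0x{start:05x}-0x{end:05x}')
    else:
        print(f'0x{value:05x} is not in the table')
```

Returning the matched bounds is what lets the new log message name the conflicting range in the other table.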
@@ -22,7 +22,6 @@ Options:
22
22
  # Invalid constant name "echo"
23
23
  # Invalid constant name "flushout" (col 4)
24
24
  # Invalid module name "wcwidth-browser"
25
- from __future__ import division, print_function
26
25
 
27
26
  # std imports
28
27
  import sys
@@ -44,7 +43,7 @@ flushout = functools.partial(print, end='', flush=True)
44
43
 
45
44
  #: printable length of highest unicode character description
46
45
  LIMIT_UCS = 0x3fffd
47
- UCS_PRINTLEN = len('{value:0x}'.format(value=LIMIT_UCS))
46
+ UCS_PRINTLEN = len(f'{LIMIT_UCS:0x}')
48
47
 
49
48
 
50
49
  def readline(term, width):
@@ -69,7 +68,7 @@ def readline(term, width):
69
68
  return text
70
69
 
71
70
 
72
- class WcWideCharacterGenerator(object):
71
+ class WcWideCharacterGenerator:
73
72
  """Generator yields unicode characters of the given ``width``."""
74
73
 
75
74
  # pylint: disable=R0903
@@ -101,7 +100,7 @@ class WcWideCharacterGenerator(object):
101
100
  return (ucs, name)
102
101
 
103
102
 
104
- class WcCombinedCharacterGenerator(object):
103
+ class WcCombinedCharacterGenerator:
105
104
  """Generator yields unicode characters with combining."""
106
105
 
107
106
  # pylint: disable=R0903
@@ -116,7 +115,7 @@ class WcCombinedCharacterGenerator(object):
116
115
  """
117
116
  self.characters = []
118
117
  letters_o = ('o' * width)
119
- for (begin, end) in ZERO_WIDTH[unicode_version]:
118
+ for (begin, end) in ZERO_WIDTH[_wcmatch_version(unicode_version)]:
120
119
  for val in [_val for _val in
121
120
  range(begin, end + 1)
122
121
  if _val <= LIMIT_UCS]:
@@ -148,11 +147,8 @@ class WcCombinedCharacterGenerator(object):
148
147
  continue
149
148
  return (ucs, name)
150
149
 
151
- # python 2.6 - 3.3 compatibility
152
- next = __next__
153
150
 
154
-
155
- class Style(object):
151
+ class Style:
156
152
  """Styling decorator class instance for terminal output."""
157
153
 
158
154
  # pylint: disable=R0903
@@ -184,7 +180,7 @@ class Style(object):
184
180
  setattr(self, key, val)
185
181
 
186
182
 
187
- class Screen(object):
183
+ class Screen:
188
184
  """Represents terminal style, data dimensions, and drawables."""
189
185
 
190
186
  intro_msg_fmt = ('Delimiters ({delim}) should align, '
@@ -217,8 +213,7 @@ class Screen(object):
217
213
  """Text of a single column heading."""
218
214
  delimiter = self.style.attr_minor(self.style.delimiter)
219
215
  hint = self.style.header_hint * self.wide
220
- heading = ('{delimiter}{hint}{delimiter}'
221
- .format(delimiter=delimiter, hint=hint))
216
+ heading = f'{delimiter}{hint}{delimiter}'
222
217
 
223
218
  def alignment(*args):
224
219
  if self.style.alignment == 'right':
@@ -264,7 +259,7 @@ class Screen(object):
264
259
  return self.num_rows * self.num_columns
265
260
 
266
261
 
267
- class Pager(object):
262
+ class Pager:
268
263
  """A less(1)-like browser for browsing unicode characters."""
269
264
  # pylint: disable=too-many-instance-attributes
270
265
 
@@ -570,10 +565,10 @@ class Pager(object):
570
565
  if idx == self.last_page:
571
566
  last_end = '(END)'
572
567
  else:
573
- last_end = '/{0}'.format(self.last_page)
568
+ last_end = f'/{self.last_page}'
574
569
  txt = ('Page {idx}{last_end} - '
575
570
  '{q} to quit, [keys: {keyset}]'
576
- .format(idx=style.attr_minor('{0}'.format(idx)),
571
+ .format(idx=style.attr_minor(f'{idx}'),
577
572
  last_end=style.attr_major(last_end),
578
573
  keyset=style.attr_major('kjfbvc12-='),
579
574
  q=style.attr_minor('q')))
@@ -1,5 +1,4 @@
1
1
  #!/usr/bin/env python
2
- # coding: utf-8
3
2
  """
4
3
  Manual tests comparing wcwidth.py to libc's wcwidth(3) and wcswidth(3).
5
4
 
@@ -18,7 +17,6 @@ level for our library to use when comparing to libc.
18
17
  # Invalid module name "wcwidth-libc-comparator"
19
18
 
20
19
  # standard imports
21
- from __future__ import print_function
22
20
 
23
21
  # std imports
24
22
  import sys
@@ -64,25 +62,12 @@ def report_ucs_msg(ucs, wcwidth_libc, wcwidth_local):
64
62
  .decode('ascii')
65
63
  .upper()
66
64
  .lstrip('0'))
67
- url = "http://codepoints.net/U+{}".format(ucp)
65
+ url = f"http://codepoints.net/U+{ucp}"
68
66
  name = unicodedata.name(ucs)
69
- return (u"libc,ours={},{} [--o{}o--] name={} val={} {}"
67
+ return ("libc,ours={},{} [--o{}o--] name={} val={} {}"
70
68
  " ".format(wcwidth_libc, wcwidth_local, ucs, name, ord(ucs), url))
71
69
 
72
70
 
73
- # use chr() for py3.x,
74
- # unichr() for py2.x
75
- try:
76
- _ = unichr(0)
77
- except NameError as err:
78
- if err.args[0] == "name 'unichr' is not defined":
79
- # pylint: disable=W0622
80
- # Redefining built-in 'unichr' (col 8)
81
-
82
- unichr = chr
83
- else:
84
- raise
85
-
86
71
  if sys.maxunicode < 1114111:
87
72
  warnings.warn('narrow Python build, only a small subset of '
88
73
  'characters may be tested.')
@@ -108,7 +93,7 @@ def main(using_locale=('en_US', 'UTF-8',)):
108
93
  report a detailed AssertionError to stdout.
109
94
  """
110
95
  all_ucs = (ucs for ucs in
111
- [unichr(val) for val in range(sys.maxunicode)]
96
+ [chr(val) for val in range(sys.maxunicode)]
112
97
  if is_named(ucs) and is_not_combining(ucs))
113
98
 
114
99
  libc_name = ctypes.util.find_library('c')
@@ -1,5 +1,4 @@
1
1
  #!/usr/bin/env python3
2
- # -*- coding: utf-8 -*-
3
2
  #
4
3
  # wcwidth documentation build configuration file, created by
5
4
  # sphinx-quickstart on Fri Oct 20 15:18:02 2017.
@@ -69,7 +68,7 @@ release = version = wcwidth.__version__
69
68
  #
70
69
  # This is also used if you do content translation via gettext catalogs.
71
70
  # Usually you set "language" from the command line for these cases.
72
- language = None
71
+ language = 'en'
73
72
 
74
73
  # List of patterns, relative to source directory, that match files and
75
74
  # directories to ignore when looking for source files.
@@ -32,7 +32,7 @@ Example
32
32
  >>> text = u'コンニチハ'
33
33
 
34
34
  Python **incorrectly** uses the *string length* of 5 codepoints rather than the
35
- *printible length* of 10 cells, so that when using the `rjust` function, the
35
+ *printable length* of 10 cells, so that when using the `rjust` function, the
36
36
  output length is wrong::
37
37
 
38
38
  >>> print(len('コンニチハ'))
@@ -95,7 +95,7 @@ Briefly, return values of function ``wcwidth()`` are:
95
95
  Function ``wcswidth()`` simply returns the sum of all values for each character
96
96
  along a string, or ``-1`` when it occurs anywhere along a string.
97
97
 
98
- Full API Documentation at https://wcwidth.readthedocs.org
98
+ Full API Documentation at https://wcwidth.readthedocs.io
99
99
 
100
100
  ==========
101
101
  Developing
@@ -105,9 +105,9 @@ Install wcwidth in editable mode::
105
105
 
106
106
  pip install -e .
107
107
 
108
- Execute unit tests using tox_::
108
+ Execute unit tests using tox_ for all supported Python versions::
109
109
 
110
- tox -e py27,py35,py36,py37,py38,py39,py310,py311,py312
110
+ tox -e py36,py37,py38,py39,py310,py311,py312,py313,py314
111
111
 
112
112
  Updating Unicode Version
113
113
  ------------------------
@@ -216,8 +216,19 @@ Other Languages
216
216
  =======
217
217
  History
218
218
  =======
219
+
220
+ 0.2.14 *2025-09-22*
221
+ * **Drop Support** for Python 2.7 and 3.5. `PR #117`_.
222
+ * **Update** tables to include Unicode Specifications 16.0.0 and 17.0.0.
223
+ `PR #146`_.
224
+ * **Bugfix** U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through
225
+ 0.2.13 measured as 0. `PR #149`_.
226
+
227
+ 0.2.13 *2024-01-06*
228
+ * **Bugfix** zero-width support for Hangul Jamo (Korean)
229
+
219
230
  0.2.12 *2023-11-21*
220
- * re-release to remove .pyi file misplaced in wheel files `Issue #101`.
231
+ * re-release to remove .pyi file misplaced in wheel files `Issue #101`_.
221
232
 
222
233
  0.2.11 *2023-11-20*
223
234
  * Include tests files in the source distribution (`PR #98`_, `PR #100`_).
@@ -255,7 +266,7 @@ History
255
266
  Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
256
267
  See the `jquast/ucs-detect`_ CLI utility for automatic detection.
257
268
  * **Enhancement**:
258
- API Documentation is published to readthedocs.org.
269
+ API Documentation is published to readthedocs.io.
259
270
  * **Updated** tables for *all* Unicode Specifications with files
260
271
  published in a programmatically consumable format, versions 4.1.0
261
272
  through 13.0
@@ -333,6 +344,9 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
333
344
  .. _`PR #97`: https://github.com/jquast/wcwidth/pull/97
334
345
  .. _`PR #98`: https://github.com/jquast/wcwidth/pull/98
335
346
  .. _`PR #100`: https://github.com/jquast/wcwidth/pull/100
347
+ .. _`PR #117`: https://github.com/jquast/wcwidth/pull/117
348
+ .. _`PR #146`: https://github.com/jquast/wcwidth/pull/146
349
+ .. _`PR #149`: https://github.com/jquast/wcwidth/pull/149
336
350
  .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
337
351
  .. _`jquast/blessed`: https://github.com/jquast/blessed
338
352
  .. _`selectel/pyte`: https://github.com/selectel/pyte
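The History entries above describe two zero-width changes: Hangul Jamo vowels and trailing consonants are treated as zero-width, and U+00AD SOFT HYPHEN is no longer treated as zero-width even though its general category is Cf. A rough sketch of that rule follows, using only the standard library. The category set is an illustrative subset chosen here, and `sketch_is_zero_width` is a hypothetical name. The library itself consults generated lookup tables.

```python
import unicodedata

# Illustrative subset of general categories (an assumption of this sketch).
ZERO_WIDTH_CATEGORIES = {'Mn', 'Me', 'Cf'}

# Range bounds copied from HANGUL_JAMO_ZEROWIDTH in the update-tables.py diff.
HANGUL_JAMO_ZEROWIDTH = frozenset(
    (*range(0x1160, 0x1200), *range(0xD7B0, 0xD800))
)

SOFT_HYPHEN = 0x00AD


def sketch_is_zero_width(ch: str) -> bool:
    """Return True when this sketch would treat ch as occupying no cell."""
    cp = ord(ch)
    if cp == SOFT_HYPHEN:
        # Category Cf, but excluded explicitly, as in the 0.2.14 changelog.
        return False
    if cp == 0 or cp in HANGUL_JAMO_ZEROWIDTH:
        return True
    return unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES


for ch in ('\u00ad', '\u1161', '\u0301', '\u200d', 'a'):
    print(f'U+{ord(ch):04X} zero-width: {sketch_is_zero_width(ch)}')
```

The explicit exception comes before the category test because SOFT HYPHEN would otherwise match Cf and be counted as zero-width, which is the behavior the changelog reports for versions 0.2.9 through 0.2.13.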