micromark-extension-cjk-friendly-util 2.1.0 → 3.0.0

package/README.md CHANGED
@@ -1,10 +1,10 @@
1
1
  # micromark-extension-cjk-friendly-util
2
2
 
3
- [![Version](https://img.shields.io/npm/v/micromark-extension-cjk-friendly-util)](https://npmjs.com/package/micromark-extension-cjk-friendly-util) ![Node Current](https://img.shields.io/node/v/micromark-extension-cjk-friendly-util) [![NPM Downloads](https://img.shields.io/npm/dm/micromark-extension-cjk-friendly-util)](https://npmjs.com/package/micromark-extension-cjk-friendly-util) [![NPM Last Update](https://img.shields.io/npm/last-update/micromark-extension-cjk-friendly-util)](https://npmjs.com/package/micromark-extension-cjk-friendly-util)
3
+ [![Version](https://img.shields.io/npm/v/micromark-extension-cjk-friendly-util)](https://npmjs.com/package/micromark-extension-cjk-friendly-util) ![Node Current](https://img.shields.io/node/v/micromark-extension-cjk-friendly-util) [![NPM Downloads](https://img.shields.io/npm/dm/micromark-extension-cjk-friendly-util)](https://npmjs.com/package/micromark-extension-cjk-friendly-util) [![NPM Last Update](https://img.shields.io/npm/last-update/micromark-extension-cjk-friendly-util)](https://npmjs.com/package/micromark-extension-cjk-friendly-util) [![Socket Badge](https://badge.socket.dev/npm/package/micromark-extension-cjk-friendly-util)](https://socket.dev/npm/package/micromark-extension-cjk-friendly-util) [![Snyk Advisor Package Health Badge](https://snyk.io/advisor/npm-package/micromark-extension-cjk-friendly-util/badge.svg)](https://snyk.io/advisor/npm-package/micromark-extension-cjk-friendly-util)
4
4
 
5
5
  A utility library package for [micromark-extension-cjk-friendly](https://npmjs.com/package/micromark-extension-cjk-friendly), which is internally used by [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly), and its related packages.
6
6
 
7
- ## Problem / <span lang="ja">問題</span> / <span lang="zh-Hans-CN">问题</span> / <span lang="ko">문제점</span>
7
+ ## Problem / <span lang="ja">問題</span> / <span lang="zh-Hans-CN">问题</span> / <span lang="ko">문제</span>
8
8
 
9
9
  CommonMark has a problem that the following emphasis marks `**` are not recognized as emphasis marks in Japanese, Chinese, and Korean.
10
10
 
@@ -12,7 +12,7 @@ CommonMark has a problem that the following emphasis marks `**` are not recogniz
12
12
 
13
13
  <span lang="zh-Hans-CN">CommonMark存在以下问题:在中文、日语和韩语文本中,强调标记`**`不会被识别为强调标记。</span>
14
14
 
15
- <span lang="ko">CommonMark는 일본어와 중국어에서 다음과 같은 강조 표시 `**`가 강조 표시로 인식되지 않는 문제가 있습니다.</span>
15
+ <span lang="ko">CommonMark는 한국어, 일본어, 중국어에서 다음과 같은 강조 표시 `**`가 강조 표시로 인식되지 않는 문제가 있습니다.</span>
16
16
 
17
17
  ```md
18
18
  **このアスタリスクは強調記号として認識されず、そのまま表示されます。**この文のせいで。
@@ -40,15 +40,15 @@ Of course, not only the end side but also the start side has the same issue.
40
40
 
41
41
  CommonMark issue: https://github.com/commonmark/commonmark-spec/issues/650
42
42
 
43
- ## Runtime Requirements / <span lang="ja">実行環境の要件</span> / <span lang="zh-Hans-CN">运行环境要求</span> / <span lang="ko">업데이트 전략</span>
43
+ ## Runtime Requirements / <span lang="ja">実行環境の要件</span> / <span lang="zh-Hans-CN">运行环境要求</span> / <span lang="ko">런타임 요구 사항</span>
44
44
 
45
- This package is ESM-only. It requires Node.js 16 or later.
45
+ This package is ESM-only. It requires Node.js 18 or later. (I have only tested it on 20 and later. There is no factor that would prevent it from working on 18, but I do not guarantee its operation on 18.)
46
46
 
47
- <span lang="ja">本パッケージはESM専用です。Node.js 16以上が必要です。</span>
47
+ <span lang="ja">本パッケージはESM専用です。Node.js 18以上が必要です。(動作検証は20以降でのみ行っています。18での動作を妨げる要因はありませんが、動作の保証はありません)</span>
48
48
 
49
- <span lang="zh-Hans-CN">此包仅支持ESM。需要Node.js 16或更高版本。</span>
49
+ <span lang="zh-Hans-CN">此包仅支持ESM。需要Node.js 18或更高版本。(我只测试了20及以后的版本。没有因素会阻止它在18上工作,但我不保证在18上的操作。)</span>
50
50
 
51
- <span lang="ko">이 패키지는 ESM 사용을 위한 패키지입니다. Node.js 16或更高版本가 필요입니다.</span>
51
+ <span lang="ko">본 패키지는 ESM 전용입니다. Node.js 18 이상이 필요합니다. (동작 검증은 20 이후 버전에서만 수행했습니다. 18에서 동작을 방해하는 요인은 없으나, 동작을 보장하지는 않습니다)</span>
52
52
 
53
53
  ## Installation / <span lang="ja">インストール</span> / <span lang="zh-Hans-CN">安装</span> / <span lang="ko">설치</span>
54
54
 
@@ -70,7 +70,7 @@ If you use another package manager, please replace `npm install` with the comman
70
70
 
71
71
  <span lang="zh-Hans-CN">如果使用其他包管理器,请将 `npm install` 替换为当时包管理器的命令(例如:`pnpm add`、`yarn add`)。</span>
72
72
 
73
- <span lang="ko">다른 패키지 매니저를 사용하는 경우 `npm install`을 해당 패키지 매니저의 명령어(예: `pnpm add`, `yarn add`)로 바꾸어 주세요.</span>
73
+ <span lang="ko">npm이 아닌 다른 패키지 매니저를 사용하는 경우 `npm install`을 해당 패키지 매니저의 명령어(예: `pnpm add`, `yarn add`)로 바꿔 주세요.</span>
74
74
 
75
75
  ## Usage / <span lang="ja">使い方</span> / <span lang="zh-Hans-CN">用法</span> / <span lang="ko">사용법</span>
76
76
 
@@ -89,17 +89,17 @@ This package provides a function and a namespace based on the original micromark
89
89
 
90
90
  Also, this package provides some utility functions to check whether a character belongs to the category defined in the specification (e.g. CJK character), or to help you fetch the Unicode Code Point of a character around the emphasis mark.
91
91
 
92
- ## Specification / <span lang="ja">規格書</span> / <span lang="zh-Hans-CN">规范</span> / <span lang="ko">규정서</span>
92
+ ## Specification / <span lang="ja">規格書</span> / <span lang="zh-Hans-CN">规范</span> / <span lang="ko">설명서</span>
93
93
 
94
94
  https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md (English)
95
95
 
96
96
  ## Related packages / <span lang="ja">関連パッケージ</span> / <span lang="zh-Hans-CN">相关包</span> / <span lang="ko">관련 패키지</span>
97
97
 
98
- - [micromark-extension-cjk-friendly](https://npmjs.com/package/micromark-extension-cjk-friendly) [![Version](https://img.shields.io/npm/v/micromark-extension-cjk-friendly)](https://npmjs.com/package/micromark-extension-cjk-friendly) ![Node Current](https://img.shields.io/node/v/micromark-extension-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/micromark-extension-cjk-friendly)](https://npmjs.com/package/micromark-extension-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/micromark-extension-cjk-friendly)](https://npmjs.com/package/micromark-extension-cjk-friendly)
99
- - [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly) [![Version](https://img.shields.io/npm/v/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) ![Node Current](https://img.shields.io/node/v/remark-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly)
100
- - [markdown-it-cjk-friendly](https://npmjs.com/package/markdown-it-cjk-friendly) [![Version](https://img.shields.io/npm/v/markdown-it-cjk-friendly)](https://npmjs.com/package/markdown-it-cjk-friendly) ![Node Current](https://img.shields.io/node/v/markdown-it-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/markdown-it-cjk-friendly)](https://npmjs.com/package/markdown-it-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/markdown-it-cjk-friendly)](https://npmjs.com/package/markdown-it-cjk-friendly)
101
- - [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly) [![Version](https://img.shields.io/npm/v/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) ![Node Current](https://img.shields.io/node/v/remark-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly)
102
- - [micromark-extension-cjk-friendly-gfm-strikethrough](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) [![Version](https://img.shields.io/npm/v/micromark-extension-cjk-friendly-gfm-strikethrough)](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) ![Node Current](https://img.shields.io/node/v/micromark-extension-cjk-friendly-gfm-strikethrough) [![NPM Downloads](https://img.shields.io/npm/dm/micromark-extension-cjk-friendly-gfm-strikethrough)](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) [![NPM Last Update](https://img.shields.io/npm/last-update/micromark-extension-cjk-friendly-gfm-strikethrough)](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough)
98
+ - [micromark-extension-cjk-friendly](https://npmjs.com/package/micromark-extension-cjk-friendly) [![Version](https://img.shields.io/npm/v/micromark-extension-cjk-friendly)](https://npmjs.com/package/micromark-extension-cjk-friendly) ![Node Current](https://img.shields.io/node/v/micromark-extension-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/micromark-extension-cjk-friendly)](https://npmjs.com/package/micromark-extension-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/micromark-extension-cjk-friendly)](https://npmjs.com/package/micromark-extension-cjk-friendly) [![Socket Badge](https://badge.socket.dev/npm/package/micromark-extension-cjk-friendly)](https://socket.dev/npm/package/micromark-extension-cjk-friendly) [![Snyk Advisor Package Health Badge](https://snyk.io/advisor/npm-package/micromark-extension-cjk-friendly/badge.svg)](https://snyk.io/advisor/npm-package/micromark-extension-cjk-friendly)
99
+ - [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly) [![Version](https://img.shields.io/npm/v/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) ![Node Current](https://img.shields.io/node/v/remark-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) [![Socket Badge](https://badge.socket.dev/npm/package/remark-cjk-friendly)](https://socket.dev/npm/package/remark-cjk-friendly) [![Snyk Advisor Package Health Badge](https://snyk.io/advisor/npm-package/remark-cjk-friendly/badge.svg)](https://snyk.io/advisor/npm-package/remark-cjk-friendly)
100
+ - [markdown-it-cjk-friendly](https://npmjs.com/package/markdown-it-cjk-friendly) [![Version](https://img.shields.io/npm/v/markdown-it-cjk-friendly)](https://npmjs.com/package/markdown-it-cjk-friendly) ![Node Current](https://img.shields.io/node/v/markdown-it-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/markdown-it-cjk-friendly)](https://npmjs.com/package/markdown-it-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/markdown-it-cjk-friendly)](https://npmjs.com/package/markdown-it-cjk-friendly) [![Socket Badge](https://badge.socket.dev/npm/package/markdown-it-cjk-friendly)](https://socket.dev/npm/package/markdown-it-cjk-friendly) [![Snyk Advisor Package Health Badge](https://snyk.io/advisor/npm-package/markdown-it-cjk-friendly/badge.svg)](https://snyk.io/advisor/npm-package/markdown-it-cjk-friendly)
101
+ - [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly) [![Version](https://img.shields.io/npm/v/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) ![Node Current](https://img.shields.io/node/v/remark-cjk-friendly) [![NPM Downloads](https://img.shields.io/npm/dm/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) [![NPM Last Update](https://img.shields.io/npm/last-update/remark-cjk-friendly)](https://npmjs.com/package/remark-cjk-friendly) [![Socket Badge](https://badge.socket.dev/npm/package/remark-cjk-friendly)](https://socket.dev/npm/package/remark-cjk-friendly) [![Snyk Advisor Package Health Badge](https://snyk.io/advisor/npm-package/remark-cjk-friendly/badge.svg)](https://snyk.io/advisor/npm-package/remark-cjk-friendly)
102
+ - [micromark-extension-cjk-friendly-gfm-strikethrough](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) [![Version](https://img.shields.io/npm/v/micromark-extension-cjk-friendly-gfm-strikethrough)](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) ![Node Current](https://img.shields.io/node/v/micromark-extension-cjk-friendly-gfm-strikethrough) [![NPM Downloads](https://img.shields.io/npm/dm/micromark-extension-cjk-friendly-gfm-strikethrough)](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) [![NPM Last Update](https://img.shields.io/npm/last-update/micromark-extension-cjk-friendly-gfm-strikethrough)](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) [![Socket Badge](https://badge.socket.dev/npm/package/micromark-extension-cjk-friendly-gfm-strikethrough)](https://socket.dev/npm/package/micromark-extension-cjk-friendly-gfm-strikethrough) [![Snyk Advisor Package Health Badge](https://snyk.io/advisor/npm-package/micromark-extension-cjk-friendly-gfm-strikethrough/badge.svg)](https://snyk.io/advisor/npm-package/micromark-extension-cjk-friendly-gfm-strikethrough)
103
103
 
104
104
  ## Contributing / <span lang="ja">貢献</span> / <span lang="zh-Hans-CN">贡献</span> / <span lang="ko">기여</span>
105
105
 
@@ -1,13 +1,12 @@
1
- import { classifyCharacter } from './classifyCharacter.js';
2
- import 'micromark-util-symbol';
3
- import 'micromark-util-types';
1
+ import { classifyCharacter } from "./classifyCharacter.js";
4
2
 
3
+ //#region src/categoryUtil.d.ts
5
4
  type Category = ReturnType<typeof classifyCharacter>;
6
5
  /**
7
- * `true` if the code point represents an [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character).
6
+ * `true` if the code point represents a [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character).
8
7
  *
9
8
  * @param category the return value of `classifyCharacter`.
10
- * @returns `true` if the code point represents an Unicode whitespace character
9
+ * @returns `true` if the code point represents a Unicode whitespace character
11
10
  */
12
11
  declare function isUnicodeWhitespace(category: Category): boolean;
13
12
  /**
@@ -46,11 +45,11 @@ declare function isCjkOrIvs(category: Category): boolean;
46
45
  */
47
46
  declare function isNonEmojiGeneralUseVS(category: Category): boolean;
48
47
  /**
49
- * `true` if the code point represents an [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character) or an [Unicode punctuation character](https://spec.commonmark.org/0.31.2/#unicode-punctuation-character).
48
+ * `true` if the code point represents a [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character) or a [Unicode punctuation character](https://spec.commonmark.org/0.31.2/#unicode-punctuation-character).
50
49
  *
51
50
  * @param category the return value of `classifyCharacter`.
52
51
  * @returns `true` if the code point represents a space or punctuation
53
52
  */
54
53
  declare function isSpaceOrPunctuation(category: Category): boolean;
55
-
56
- export { isCjk, isCjkOrIvs, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace };
54
+ //#endregion
55
+ export { isCjk, isCjkOrIvs, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace };
@@ -1,49 +1,70 @@
1
- // src/categoryUtil.ts
2
- import { constants as constants2 } from "micromark-util-symbol";
1
+ import { constantsEx } from "./classifyCharacter.js";
2
+ import { constants } from "micromark-util-symbol";
3
3
 
4
- // src/classifyCharacter.ts
5
- import { markdownLineEndingOrSpace } from "micromark-util-character";
6
- import { codes, constants } from "micromark-util-symbol";
7
- var constantsEx;
8
- ((constantsEx2) => {
9
- constantsEx2.spaceOrPunctuation = 3;
10
- constantsEx2.cjk = 4096;
11
- constantsEx2.cjkPunctuation = 4098;
12
- constantsEx2.ivs = 8192;
13
- constantsEx2.cjkOrIvs = 12288;
14
- constantsEx2.nonEmojiGeneralUseVS = 16384;
15
- constantsEx2.variationSelector = 24576;
16
- constantsEx2.ivsToCjkRightShift = 1;
17
- })(constantsEx || (constantsEx = {}));
18
-
19
- // src/categoryUtil.ts
4
+ //#region src/categoryUtil.ts
5
+ /**
6
+ * `true` if the code point represents a [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character).
7
+ *
8
+ * @param category the return value of `classifyCharacter`.
9
+ * @returns `true` if the code point represents a Unicode whitespace character
10
+ */
20
11
  function isUnicodeWhitespace(category) {
21
- return Boolean(category & constants2.characterGroupWhitespace);
12
+ return Boolean(category & constants.characterGroupWhitespace);
22
13
  }
14
+ /**
15
+ * `true` if the code point represents a [non-CJK punctuation character](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#non-cjk-punctuation-character).
16
+ *
17
+ * @param category the return value of `classifyCharacter`.
18
+ * @returns `true` if the code point represents a non-CJK punctuation character
19
+ */
23
20
  function isNonCjkPunctuation(category) {
24
- return (category & constantsEx.cjkPunctuation) === constants2.characterGroupPunctuation;
21
+ return (category & constantsEx.cjkPunctuation) === constants.characterGroupPunctuation;
25
22
  }
23
+ /**
24
+ * `true` if the code point represents a [CJK character](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#cjk-character).
25
+ *
26
+ * @param category the return value of `classifyCharacter`.
27
+ * @returns `true` if the code point represents a CJK character
28
+ */
26
29
  function isCjk(category) {
27
- return Boolean(category & constantsEx.cjk);
30
+ return Boolean(category & constantsEx.cjk);
28
31
  }
32
+ /**
33
+ * `true` if the code point represents an [Ideographic Variation Selector](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#ideographi-variation-selector).
34
+ *
35
+ * @param category the return value of `classifyCharacter`.
36
+ * @returns `true` if the code point represents an IVS
37
+ */
29
38
  function isIvs(category) {
30
- return category === constantsEx.ivs;
39
+ return category === constantsEx.ivs;
31
40
  }
41
+ /**
42
+ * `true` if {@link isCjk} or {@link isIvs}.
43
+ *
44
+ * @param category the return value of {@link classifyCharacter}.
45
+ * @returns `true` if the code point represents a CJK or IVS
46
+ */
32
47
  function isCjkOrIvs(category) {
33
- return Boolean(category & constantsEx.cjkOrIvs);
48
+ return Boolean(category & constantsEx.cjkOrIvs);
34
49
  }
50
+ /**
51
+ * `true` if the code point represents a [Non-emoji General-use Variation Selector](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#non-emoji-general-use-variation-selector).
52
+ *
53
+ * @param category the return value of `classifyCharacter`.
54
+ * @returns `true` if the code point represents a Non-emoji General-use Variation Selector
55
+ */
35
56
  function isNonEmojiGeneralUseVS(category) {
36
- return category === constantsEx.nonEmojiGeneralUseVS;
57
+ return category === constantsEx.nonEmojiGeneralUseVS;
37
58
  }
59
+ /**
60
+ * `true` if the code point represents a [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character) or a [Unicode punctuation character](https://spec.commonmark.org/0.31.2/#unicode-punctuation-character).
61
+ *
62
+ * @param category the return value of `classifyCharacter`.
63
+ * @returns `true` if the code point represents a space or punctuation
64
+ */
38
65
  function isSpaceOrPunctuation(category) {
39
- return Boolean(category & constantsEx.spaceOrPunctuation);
66
+ return Boolean(category & constantsEx.spaceOrPunctuation);
40
67
  }
41
- export {
42
- isCjk,
43
- isCjkOrIvs,
44
- isIvs,
45
- isNonCjkPunctuation,
46
- isNonEmojiGeneralUseVS,
47
- isSpaceOrPunctuation,
48
- isUnicodeWhitespace
49
- };
68
+
69
+ //#endregion
70
+ export { isCjk, isCjkOrIvs, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace };
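The category helpers in this file are bit-flag tests over the value returned by `classifyCharacter`. The following is a minimal, self-contained sketch of that idea, not the package's own module: the `constantsEx` values are copied from this diff, while `characterGroupWhitespace = 1` and `characterGroupPunctuation = 2` are assumed values for the `micromark-util-symbol` constants (they are consistent with `spaceOrPunctuation = 3` and `cjkPunctuation = 4098`).

```typescript
// Values copied from the constantsEx namespace shown in this diff.
const constantsEx = {
  spaceOrPunctuation: 3,
  cjk: 4096,
  cjkPunctuation: 4098,
  ivs: 8192,
  cjkOrIvs: 12288,
  nonEmojiGeneralUseVS: 16384,
} as const;

// Assumption: stand-ins for the micromark-util-symbol constants.
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

// Punctuation bit set, CJK bit clear.
function isNonCjkPunctuation(category: number): boolean {
  return (category & constantsEx.cjkPunctuation) === characterGroupPunctuation;
}

// CJK bit set, regardless of the punctuation bit.
function isCjk(category: number): boolean {
  return Boolean(category & constantsEx.cjk);
}

// IVS is an exact match, not a mask test.
function isIvs(category: number): boolean {
  return category === constantsEx.ivs;
}

function isCjkOrIvs(category: number): boolean {
  return Boolean(category & constantsEx.cjkOrIvs);
}

function isSpaceOrPunctuation(category: number): boolean {
  return Boolean(category & constantsEx.spaceOrPunctuation);
}

// Hypothetical category values for illustration.
const asciiPunctuation: number = characterGroupPunctuation; // e.g. "!"
const cjkPunctuationMark: number = constantsEx.cjk | characterGroupPunctuation; // e.g. "。"
const cjkLetter: number = constantsEx.cjk; // e.g. "漢"

console.log(isNonCjkPunctuation(asciiPunctuation), isNonCjkPunctuation(cjkPunctuationMark));
console.log(isCjk(cjkPunctuationMark), isSpaceOrPunctuation(cjkLetter));
```

Masking with `cjkPunctuation` rather than with the punctuation bit alone is what lets `isNonCjkPunctuation` reject a character that is both CJK and punctuation.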
@@ -1,4 +1,6 @@
1
- import { Code } from 'micromark-util-types';
1
+ import { Code } from "micromark-util-types";
2
+
3
+ //#region src/characterWithNonBmp.d.ts
2
4
 
3
5
  /**
4
6
  * Check if `uc` is CJK or IVS
@@ -53,5 +55,5 @@ declare const unicodePunctuation: (code: Code) => boolean;
53
55
  * Whether it matches.
54
56
  */
55
57
  declare const unicodeWhitespace: (code: Code) => boolean;
56
-
57
- export { cjkOrIvs, isCjkAmbiguousPunctuation, nonEmojiGeneralUseVS, unicodePunctuation, unicodeWhitespace };
58
+ //#endregion
59
+ export { cjkOrIvs, isCjkAmbiguousPunctuation, nonEmojiGeneralUseVS, unicodePunctuation, unicodeWhitespace };
@@ -1,47 +1,99 @@
1
- // src/characterWithNonBmp.ts
2
1
  import { eastAsianWidthType } from "get-east-asian-width";
2
+
3
+ //#region src/characterWithNonBmp.ts
3
4
  function isEmoji(uc) {
4
- return /^\p{Emoji_Presentation}/u.test(String.fromCodePoint(uc));
5
+ return /^\p{Emoji_Presentation}/u.test(String.fromCodePoint(uc));
5
6
  }
7
+ /**
8
+ * Check if `uc` is CJK or IVS
9
+ *
10
+ * @param uc code point
11
+ * @returns `true` if `uc` is CJK, `null` if IVS, or `false` if neither
12
+ */
6
13
  function cjkOrIvs(uc) {
7
- if (!uc || uc < 4352) {
8
- return false;
9
- }
10
- const eaw = eastAsianWidthType(uc);
11
- switch (eaw) {
12
- case "fullwidth":
13
- case "halfwidth":
14
- return true;
15
- // never be emoji
16
- case "wide":
17
- return !isEmoji(uc);
18
- case "narrow":
19
- return false;
20
- case "ambiguous":
21
- return 917760 <= uc && uc <= 917999 ? null : false;
22
- case "neutral":
23
- return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
24
- }
14
+ if (!uc || uc < 4352) return false;
15
+ switch (eastAsianWidthType(uc)) {
16
+ case "fullwidth":
17
+ case "halfwidth": return true;
18
+ case "wide": return !isEmoji(uc);
19
+ case "narrow": return false;
20
+ case "ambiguous": return 917760 <= uc && uc <= 917999 ? null : false;
21
+ case "neutral": return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
22
+ }
25
23
  }
26
24
  function isCjkAmbiguousPunctuation(main, vs) {
27
- if (vs !== 65025 || !main || main < 8216) return false;
28
- return main === 8216 || main === 8217 || main === 8220 || main === 8221;
25
+ if (vs !== 65025 || !main || main < 8216) return false;
26
+ return main === 8216 || main === 8217 || main === 8220 || main === 8221;
29
27
  }
28
+ /**
29
+ * Check whether the character code represents Non-emoji General-use Variation Selector (U+FE00-U+FE0E).
30
+ */
30
31
  function nonEmojiGeneralUseVS(code) {
31
- return code !== null && code >= 65024 && code <= 65038;
32
+ return code !== null && code >= 65024 && code <= 65038;
32
33
  }
33
- var unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
34
- var unicodeWhitespace = regexCheck(/\s/);
34
+ /**
35
+ * Check whether the character code represents Unicode punctuation.
36
+ *
37
+ * A **Unicode punctuation** is a character in the Unicode `Pc` (Punctuation,
38
+ * Connector), `Pd` (Punctuation, Dash), `Pe` (Punctuation, Close), `Pf`
39
+ * (Punctuation, Final quote), `Pi` (Punctuation, Initial quote), `Po`
40
+ * (Punctuation, Other), or `Ps` (Punctuation, Open) categories, or an ASCII
41
+ * punctuation (see `asciiPunctuation`).
42
+ *
43
+ * See:
44
+ * **\[UNICODE]**:
45
+ * [The Unicode Standard](https://www.unicode.org/versions/).
46
+ * Unicode Consortium.
47
+ *
48
+ * @param code
49
+ * Code.
50
+ * @returns
51
+ * Whether it matches.
52
+ */
53
+ const unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
54
+ /**
55
+ * Check whether the character code represents Unicode whitespace.
56
+ *
57
+ * Note that this does handle micromark specific markdown whitespace characters.
58
+ * See `markdownLineEndingOrSpace` to check that.
59
+ *
60
+ * A **Unicode whitespace** is a character in the Unicode `Zs` (Separator,
61
+ * Space) category, or U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF),
62
+ * U+000C (FF), or U+000D CARRIAGE RETURN (CR) (**\[UNICODE]**).
63
+ *
64
+ * See:
65
+ * **\[UNICODE]**:
66
+ * [The Unicode Standard](https://www.unicode.org/versions/).
67
+ * Unicode Consortium.
68
+ *
69
+ * @param code
70
+ * Code.
71
+ * @returns
72
+ * Whether it matches.
73
+ */
74
+ const unicodeWhitespace = regexCheck(/\s/);
75
+ /**
76
+ * Create a code check from a regex.
77
+ *
78
+ * @param regex
79
+ * Expression.
80
+ * @returns
81
+ * Check.
82
+ */
35
83
  function regexCheck(regex) {
36
- return check;
37
- function check(code) {
38
- return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
39
- }
84
+ return check;
85
+ /**
86
+ * Check whether a code matches the bound regex.
87
+ *
88
+ * @param code
89
+ * Character code.
90
+ * @returns
91
+ * Whether the character code matches the bound regex.
92
+ */
93
+ function check(code) {
94
+ return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
95
+ }
40
96
  }
41
- export {
42
- cjkOrIvs,
43
- isCjkAmbiguousPunctuation,
44
- nonEmojiGeneralUseVS,
45
- unicodePunctuation,
46
- unicodeWhitespace
47
- };
97
+
98
+ //#endregion
99
+ export { cjkOrIvs, isCjkAmbiguousPunctuation, nonEmojiGeneralUseVS, unicodePunctuation, unicodeWhitespace };
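The code-point checks in this file can be exercised without micromark. The sketch below reimplements them under one assumption: `Code` is modeled locally as `number | null` instead of being imported from `micromark-util-types`. The decimal literals from the build output are written in hexadecimal here (65024 = U+FE00, 65038 = U+FE0E, 65025 = U+FE01, 8216 = U+2018, 8221 = U+201D). Note that the regex as shipped also matches the Unicode symbol (`S`) categories, although the doc comment lists only the `P` categories.

```typescript
// Assumption: local stand-in for the Code type from micromark-util-types.
type Code = number | null;

// Build a code check from a regex; null (eof) and negative virtual codes never match.
function regexCheck(regex: RegExp): (code: Code) => boolean {
  return (code: Code): boolean =>
    code !== null && code > -1 && regex.test(String.fromCodePoint(code));
}

const unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
const unicodeWhitespace = regexCheck(/\s/);

// U+FE00..U+FE0E; U+FE0F (emoji presentation selector) is excluded.
function nonEmojiGeneralUseVS(code: Code): boolean {
  return code !== null && code >= 0xfe00 && code <= 0xfe0e;
}

// Curly quotes followed by VS-2 (U+FE01).
function isCjkAmbiguousPunctuation(main: Code, vs: Code): boolean {
  if (vs !== 0xfe01 || !main || main < 0x2018) return false;
  return main === 0x2018 || main === 0x2019 || main === 0x201c || main === 0x201d;
}

console.log(unicodePunctuation(0x21)); // "!"
console.log(unicodePunctuation(0x2b)); // "+", a symbol rather than punctuation
console.log(unicodeWhitespace(0x3000)); // ideographic space
console.log(nonEmojiGeneralUseVS(0xfe0f)); // emoji presentation selector
console.log(isCjkAmbiguousPunctuation(0x201c, 0xfe01));
```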
@@ -1,15 +1,16 @@
1
- import { constants } from 'micromark-util-symbol';
2
- import { Code } from 'micromark-util-types';
1
+ import { constants } from "micromark-util-symbol";
2
+ import { Code } from "micromark-util-types";
3
3
 
4
+ //#region src/classifyCharacter.d.ts
4
5
  declare namespace constantsEx {
5
- const spaceOrPunctuation: 3;
6
- const cjk: 4096;
7
- const cjkPunctuation: 4098;
8
- const ivs: 8192;
9
- const cjkOrIvs: 12288;
10
- const nonEmojiGeneralUseVS: 16384;
11
- const variationSelector: 24576;
12
- const ivsToCjkRightShift: 1;
6
+ const spaceOrPunctuation: 3;
7
+ const cjk: 4096;
8
+ const cjkPunctuation: 4098;
9
+ const ivs: 8192;
10
+ const cjkOrIvs: 12288;
11
+ const nonEmojiGeneralUseVS: 16384;
12
+ const variationSelector: 24576;
13
+ const ivsToCjkRightShift: 1;
13
14
  }
14
15
  /**
15
16
  * Classify whether a code represents whitespace, punctuation, or something
@@ -38,5 +39,5 @@ declare function classifyCharacter(code: Code): typeof constants.characterGroupW
38
39
  * Group of the main code point of the preceding character. Use `isCjkOrIvs` to check whether it is CJK
39
40
  */
40
41
  declare function classifyPrecedingCharacter(before: ReturnType<typeof classifyCharacter>, get2Previous: () => Code, previous: Code): ReturnType<typeof classifyCharacter>;
41
-
42
- export { classifyCharacter, classifyPrecedingCharacter, constantsEx };
42
+ //#endregion
43
+ export { classifyCharacter, classifyPrecedingCharacter, constantsEx };
@@ -1,104 +1,69 @@
1
- // src/classifyCharacter.ts
1
+ import { cjkOrIvs, isCjkAmbiguousPunctuation, nonEmojiGeneralUseVS, unicodePunctuation, unicodeWhitespace } from "./characterWithNonBmp.js";
2
+ import { isNonEmojiGeneralUseVS, isUnicodeWhitespace } from "./categoryUtil.js";
3
+ import { codes, constants } from "micromark-util-symbol";
2
4
  import { markdownLineEndingOrSpace } from "micromark-util-character";
3
- import { codes, constants as constants2 } from "micromark-util-symbol";
4
5
 
5
- // src/categoryUtil.ts
6
- import { constants } from "micromark-util-symbol";
7
- function isUnicodeWhitespace(category) {
8
- return Boolean(category & constants.characterGroupWhitespace);
9
- }
10
- function isNonEmojiGeneralUseVS(category) {
11
- return category === constantsEx.nonEmojiGeneralUseVS;
12
- }
13
-
14
- // src/characterWithNonBmp.ts
15
- import { eastAsianWidthType } from "get-east-asian-width";
16
- function isEmoji(uc) {
17
- return /^\p{Emoji_Presentation}/u.test(String.fromCodePoint(uc));
18
- }
19
- function cjkOrIvs(uc) {
20
- if (!uc || uc < 4352) {
21
- return false;
22
- }
23
- const eaw = eastAsianWidthType(uc);
24
- switch (eaw) {
25
- case "fullwidth":
26
- case "halfwidth":
27
- return true;
28
- // never be emoji
29
- case "wide":
30
- return !isEmoji(uc);
31
- case "narrow":
32
- return false;
33
- case "ambiguous":
34
- return 917760 <= uc && uc <= 917999 ? null : false;
35
- case "neutral":
36
- return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
37
- }
38
- }
39
- function isCjkAmbiguousPunctuation(main, vs) {
40
- if (vs !== 65025 || !main || main < 8216) return false;
41
- return main === 8216 || main === 8217 || main === 8220 || main === 8221;
42
- }
43
- function nonEmojiGeneralUseVS(code) {
44
- return code !== null && code >= 65024 && code <= 65038;
45
- }
46
- var unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
47
- var unicodeWhitespace = regexCheck(/\s/);
48
- function regexCheck(regex) {
49
- return check;
50
- function check(code) {
51
- return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
52
- }
53
- }
54
-
55
- // src/classifyCharacter.ts
56
- var constantsEx;
57
- ((constantsEx2) => {
58
- constantsEx2.spaceOrPunctuation = 3;
59
- constantsEx2.cjk = 4096;
60
- constantsEx2.cjkPunctuation = 4098;
61
- constantsEx2.ivs = 8192;
62
- constantsEx2.cjkOrIvs = 12288;
63
- constantsEx2.nonEmojiGeneralUseVS = 16384;
64
- constantsEx2.variationSelector = 24576;
65
- constantsEx2.ivsToCjkRightShift = 1;
6
+ //#region src/classifyCharacter.ts
7
+ let constantsEx;
8
+ (function(_constantsEx) {
9
+ _constantsEx.spaceOrPunctuation = 3;
10
+ _constantsEx.cjk = 4096;
11
+ _constantsEx.cjkPunctuation = 4098;
12
+ _constantsEx.ivs = 8192;
13
+ _constantsEx.cjkOrIvs = 12288;
14
+ _constantsEx.nonEmojiGeneralUseVS = 16384;
15
+ _constantsEx.variationSelector = 24576;
16
+ _constantsEx.ivsToCjkRightShift = 1;
66
17
  })(constantsEx || (constantsEx = {}));
18
+ /**
19
+ * Classify whether a code represents whitespace, punctuation, or something
20
+ * else.
21
+ *
22
+ * Used for attention (emphasis, strong), whose sequences can open or close
23
+ * based on the class of surrounding characters.
24
+ *
25
+ * > 👉 **Note**: eof (`null`) is seen as whitespace.
26
+ *
27
+ * @param code
28
+ * Code.
29
+ * @returns
30
+ * Group.
31
+ */
67
32
  function classifyCharacter(code) {
68
- if (code === codes.eof || markdownLineEndingOrSpace(code) || unicodeWhitespace(code)) {
69
- return constants2.characterGroupWhitespace;
70
- }
71
- let value = 0;
72
- if (code >= 4352) {
73
- if (nonEmojiGeneralUseVS(code)) {
74
- return constantsEx.nonEmojiGeneralUseVS;
75
- }
76
- switch (cjkOrIvs(code)) {
77
- case null:
78
- return constantsEx.ivs;
79
- case true:
80
- value |= constantsEx.cjk;
81
- break;
82
- }
83
- }
84
- if (unicodePunctuation(code)) {
85
- value |= constants2.characterGroupPunctuation;
86
- }
87
- return value;
33
+ if (code === codes.eof || markdownLineEndingOrSpace(code) || unicodeWhitespace(code)) return constants.characterGroupWhitespace;
34
+ let value = 0;
35
+ if (code >= 4352) {
36
+ if (nonEmojiGeneralUseVS(code)) return constantsEx.nonEmojiGeneralUseVS;
37
+ switch (cjkOrIvs(code)) {
38
+ case null: return constantsEx.ivs;
39
+ case true:
40
+ value |= constantsEx.cjk;
41
+ break;
42
+ }
43
+ }
44
+ if (unicodePunctuation(code)) value |= constants.characterGroupPunctuation;
45
+ return value;
88
46
  }
47
+ /**
48
+ * Classify whether a code represents whitespace, punctuation, or something else.
49
+ *
50
+ * Recognizes general-use variation selectors. Use this instead of {@linkcode classifyCharacter} for previous character.
51
+ *
52
+ * @param before result of {@linkcode classifyCharacter} of the preceding character.
53
+ * @param get2Previous a function that returns the code point of the character before the preceding character. Use lambda or {@linkcode Function.prototype.bind}.
54
+ * @param previous code point of the preceding character
55
+ * @returns
56
+ * Group of the main code point of the preceding character. Use `isCjkOrIvs` to check whether it is CJK
57
+ */
89
58
  function classifyPrecedingCharacter(before, get2Previous, previous) {
90
- if (!isNonEmojiGeneralUseVS(before)) {
91
- return before;
92
- }
93
- const twoPrevious = get2Previous();
94
- const twoBefore = classifyCharacter(twoPrevious);
95
- return !twoPrevious || isUnicodeWhitespace(twoBefore) ? before : isCjkAmbiguousPunctuation(twoPrevious, previous) ? constantsEx.cjkPunctuation : stripIvs(twoBefore);
59
+ if (!isNonEmojiGeneralUseVS(before)) return before;
60
+ const twoPrevious = get2Previous();
61
+ const twoBefore = classifyCharacter(twoPrevious);
62
+ return !twoPrevious || isUnicodeWhitespace(twoBefore) ? before : isCjkAmbiguousPunctuation(twoPrevious, previous) ? constantsEx.cjkPunctuation : stripIvs(twoBefore);
96
63
  }
97
64
  function stripIvs(twoBefore) {
98
- return twoBefore & ~constantsEx.ivs;
65
+ return twoBefore & ~constantsEx.ivs;
99
66
  }
100
- export {
101
- classifyCharacter,
102
- classifyPrecedingCharacter,
103
- constantsEx
104
- };
67
+
68
+ //#endregion
69
+ export { classifyCharacter, classifyPrecedingCharacter, constantsEx };
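The categories returned by `classifyCharacter` are bit flags, so the helper predicates reduce to masking. The following is a minimal sketch of that idea, not the package's own code: the `constantsEx` values are copied from the 2.1.0 bundle shown in this diff, the values `1` and `2` for the whitespace and punctuation groups are assumed to match `micromark-util-symbol`, and the sample categories are built by hand for illustration.

```javascript
// Flag values as they appear in the constantsEx namespace of the 2.1.0 bundle.
const constantsEx = {
  spaceOrPunctuation: 3,
  cjk: 4096,
  cjkPunctuation: 4098,
  ivs: 8192,
  cjkOrIvs: 12288,
  nonEmojiGeneralUseVS: 16384,
};

// Assumption: micromark-util-symbol defines these two groups as 1 and 2.
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

// Predicates mirroring the ones in categoryUtil, reduced to plain masking.
const isCjk = (category) => Boolean(category & constantsEx.cjk);
const isNonCjkPunctuation = (category) =>
  (category & constantsEx.cjkPunctuation) === characterGroupPunctuation;
const isSpaceOrPunctuation = (category) =>
  Boolean(category & constantsEx.spaceOrPunctuation);
const stripIvs = (category) => category & ~constantsEx.ivs;

// Hypothetical categories, built by hand instead of calling classifyCharacter.
const asciiPeriod = characterGroupPunctuation;
const cjkFullStop = constantsEx.cjk | characterGroupPunctuation;
const cjkWithIvsBit = constantsEx.cjk | constantsEx.ivs;

console.log(isNonCjkPunctuation(asciiPeriod)); // true
console.log(isNonCjkPunctuation(cjkFullStop)); // false
console.log(isCjk(cjkFullStop)); // true
console.log(stripIvs(cjkWithIvsBit) === constantsEx.cjk); // true
```

Because `cjkPunctuation` is simply `cjk | characterGroupPunctuation`, a single mask can tell CJK punctuation apart from non-CJK punctuation.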
@@ -1,4 +1,6 @@
1
- import { Code, Point, TokenizeContext } from 'micromark-util-types';
1
+ import { Code, Point, TokenizeContext } from "micromark-util-types";
2
+
3
+ //#region src/codeUtil.d.ts
2
4
 
3
5
  /**
4
6
  * Check if the given code is a [High-Surrogate Code Unit](https://www.unicode.org/glossary/#high_surrogate_code_unit).
@@ -42,28 +44,28 @@ declare function tryGetCodeTwoBefore(previousCode: Exclude<Code, null>, nowPoint
42
44
  * @see {@link tryGetCodeTwoBefore}
43
45
  */
44
46
  declare class TwoPreviousCode {
45
- readonly previousCode: Exclude<Code, null>;
46
- readonly nowPoint: Point;
47
- readonly sliceSerialize: TokenizeContext["sliceSerialize"];
48
- private cachedValue;
49
- /**
50
- * @see {@link tryGetCodeTwoBefore}
51
- *
52
- * @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
53
- * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
54
- * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
55
- */
56
- constructor(previousCode: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]);
57
- /**
58
- * Returns the return value of {@link tryGetCodeTwoBefore}.
59
- *
60
- * If the value has not been computed yet, it will be computed and cached.
61
- *
62
- * @see {@link tryGetCodeTwoBefore}
63
- *
64
- * @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
65
- */
66
- value(): Code;
47
+ readonly previousCode: Exclude<Code, null>;
48
+ readonly nowPoint: Point;
49
+ readonly sliceSerialize: TokenizeContext["sliceSerialize"];
50
+ private cachedValue;
51
+ /**
52
+ * @see {@link tryGetCodeTwoBefore}
53
+ *
54
+ * @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
55
+ * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
56
+ * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
57
+ */
58
+ constructor(previousCode: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]);
59
+ /**
60
+ * Returns the return value of {@link tryGetCodeTwoBefore}.
61
+ *
62
+ * If the value has not been computed yet, it will be computed and cached.
63
+ *
64
+ * @see {@link tryGetCodeTwoBefore}
65
+ *
66
+ * @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
67
+ */
68
+ value(): Code;
67
69
  }
68
70
  /**
69
71
  * If `code` is a [High-Surrogate Code Unit](https://www.unicode.org/glossary/#high_surrogate_code_unit), try to get a genuine next [Unicode Scalar Value](https://www.unicode.org/glossary/#unicode_scalar_value) corresponding to the High-Surrogate Code Unit.
@@ -73,5 +75,5 @@ declare class TwoPreviousCode {
73
75
  * @returns a value greater than 65,535 if the next code point represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), or `code` otherwise
74
76
  */
75
77
  declare function tryGetGenuineNextCode(code: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]): Exclude<Code, null>;
76
-
77
- export { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode };
78
+ //#endregion
79
+ export { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode };
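`TwoPreviousCode` defers the lookup until `value()` is first called and then reuses the result. A minimal sketch of that caching pattern follows; `LazyValue` and its `compute` callback are hypothetical names used only for illustration and are not part of the package. Using `undefined` as the "not computed yet" sentinel lets a `null` result (not found) be cached as well.

```javascript
// Hypothetical stand-in for the caching behaviour of TwoPreviousCode.
class LazyValue {
  // `undefined` means "not computed yet"; `null` is a valid cached result.
  cachedValue = undefined;

  constructor(compute) {
    this.compute = compute;
  }

  value() {
    if (this.cachedValue === undefined) this.cachedValue = this.compute();
    return this.cachedValue;
  }
}

// Count how often the (potentially expensive) lookup actually runs.
let calls = 0;
const lazy = new LazyValue(() => {
  calls += 1;
  return null; // simulate "no code point two positions before"
});

const first = lazy.value();
const second = lazy.value();
console.log(first, second, calls); // null null 1
```

A callback that returned `undefined` would be recomputed on every call, which is one reason a lookup of this kind reports "not found" as `null`.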
package/dist/codeUtil.js CHANGED
@@ -1,104 +1,124 @@
1
- var __defProp = Object.defineProperty;
2
- var __defNormalProp = (obj, key, value) => key in obj ? __defProp(obj, key, { enumerable: true, configurable: true, writable: true, value }) : obj[key] = value;
3
- var __publicField = (obj, key, value) => __defNormalProp(obj, typeof key !== "symbol" ? key + "" : key, value);
4
-
5
- // src/codeUtil.ts
1
+ //#region src/codeUtil.ts
2
+ /**
3
+ * Check if the given code is a [High-Surrogate Code Unit](https://www.unicode.org/glossary/#high_surrogate_code_unit).
4
+ *
5
+ * A High-Surrogate Code Unit is the _first_ half of a [Surrogate Pair](https://www.unicode.org/glossary/#surrogate_pair).
6
+ *
7
+ * @param code Code.
8
+ * @returns `true` if the code is a High-Surrogate Code Unit, `false` otherwise.
9
+ */
6
10
  function isCodeHighSurrogate(code) {
7
- return Boolean(code && code >= 55296 && code <= 56319);
11
+ return Boolean(code && code >= 55296 && code <= 56319);
8
12
  }
13
+ /**
14
+ * Check if the given code is a [Low-Surrogate Code Unit](https://www.unicode.org/glossary/#low_surrogate_code_unit).
15
+ *
16
+ * A Low-Surrogate Code Unit is the _second_ half of a [Surrogate Pair](https://www.unicode.org/glossary/#surrogate_pair).
17
+ * @param code
18
+ * The character code to check.
19
+ * @returns
20
+ * True if the code is a Low-Surrogate Code Unit, false otherwise.
21
+ */
9
22
  function isCodeLowSurrogate(code) {
10
- return Boolean(code && code >= 56320 && code <= 57343);
23
+ return Boolean(code && code >= 56320 && code <= 57343);
11
24
  }
25
+ /**
26
+ * If `code` is a [Low-Surrogate Code Unit](https://www.unicode.org/glossary/#low_surrogate_code_unit), try to get a genuine previous [Unicode Scalar Value](https://www.unicode.org/glossary/#unicode_scalar_value) corresponding to the Low-Surrogate Code Unit.
27
+ * @param code a tentative previous [code unit](https://www.unicode.org/glossary/#code_unit) less than 65,536, including a Low-Surrogate one
28
+ * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
29
+ * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
30
+ * @returns a value greater than 65,535 if the previous code point represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), or `code` otherwise
31
+ */
12
32
  function tryGetGenuinePreviousCode(code, nowPoint, sliceSerialize) {
13
- if (nowPoint._bufferIndex < 2) {
14
- return code;
15
- }
16
- const previousBuffer = sliceSerialize({
17
- // take 2 characters (code units)
18
- start: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex - 2 },
19
- end: nowPoint
20
- });
21
- const previousCandidate = previousBuffer.codePointAt(0);
22
- return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
33
+ if (nowPoint._bufferIndex < 2) return code;
34
+ const previousCandidate = sliceSerialize({
35
+ start: {
36
+ ...nowPoint,
37
+ _bufferIndex: nowPoint._bufferIndex - 2
38
+ },
39
+ end: nowPoint
40
+ }).codePointAt(0);
41
+ return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
23
42
  }
43
+ /**
44
+ * Try to get the [Unicode Code Point](https://www.unicode.org/glossary/#code_point) two positions before the current position.
45
+ *
46
+ * @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
47
+ * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
48
+ * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
49
+ * @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
50
+ */
24
51
  function tryGetCodeTwoBefore(previousCode, nowPoint, sliceSerialize) {
25
- const previousWidth = previousCode >= 65536 ? 2 : 1;
26
- if (nowPoint._bufferIndex < 1 + previousWidth) {
27
- return null;
28
- }
29
- const idealStart = nowPoint._bufferIndex - previousWidth - 2;
30
- const twoPreviousBuffer = sliceSerialize({
31
- // take 1--2 character
32
- start: {
33
- ...nowPoint,
34
- _bufferIndex: idealStart >= 0 ? idealStart : 0
35
- },
36
- end: {
37
- ...nowPoint,
38
- _bufferIndex: nowPoint._bufferIndex - previousWidth
39
- }
40
- });
41
- const twoPreviousLast = twoPreviousBuffer.charCodeAt(
42
- twoPreviousBuffer.length - 1
43
- );
44
- if (Number.isNaN(twoPreviousLast)) {
45
- return null;
46
- }
47
- if (twoPreviousBuffer.length < 2 || twoPreviousLast < 56320 || 57343 < twoPreviousLast) {
48
- return twoPreviousLast;
49
- }
50
- const twoPreviousCandidate = twoPreviousBuffer.codePointAt(0);
51
- if (twoPreviousCandidate && twoPreviousCandidate >= 65536) {
52
- return twoPreviousCandidate;
53
- }
54
- return twoPreviousLast;
52
+ const previousWidth = previousCode >= 65536 ? 2 : 1;
53
+ if (nowPoint._bufferIndex < 1 + previousWidth) return null;
54
+ const idealStart = nowPoint._bufferIndex - previousWidth - 2;
55
+ const twoPreviousBuffer = sliceSerialize({
56
+ start: {
57
+ ...nowPoint,
58
+ _bufferIndex: idealStart >= 0 ? idealStart : 0
59
+ },
60
+ end: {
61
+ ...nowPoint,
62
+ _bufferIndex: nowPoint._bufferIndex - previousWidth
63
+ }
64
+ });
65
+ const twoPreviousLast = twoPreviousBuffer.charCodeAt(twoPreviousBuffer.length - 1);
66
+ if (Number.isNaN(twoPreviousLast)) return null;
67
+ if (twoPreviousBuffer.length < 2 || twoPreviousLast < 56320 || 57343 < twoPreviousLast) return twoPreviousLast;
68
+ const twoPreviousCandidate = twoPreviousBuffer.codePointAt(0);
69
+ if (twoPreviousCandidate && twoPreviousCandidate >= 65536) return twoPreviousCandidate;
70
+ return twoPreviousLast;
55
71
  }
72
+ /**
73
+ * Lazily get the [Unicode Code Point](https://www.unicode.org/glossary/#code_point) two positions before the current position only if necessary.
74
+ *
75
+ * @see {@link tryGetCodeTwoBefore}
76
+ */
56
77
  var TwoPreviousCode = class {
57
- /**
58
- * @see {@link tryGetCodeTwoBefore}
59
- *
60
- * @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
61
- * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
62
- * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
63
- */
64
- constructor(previousCode, nowPoint, sliceSerialize) {
65
- this.previousCode = previousCode;
66
- this.nowPoint = nowPoint;
67
- this.sliceSerialize = sliceSerialize;
68
- __publicField(this, "cachedValue");
69
- }
70
- /**
71
- * Returns the return value of {@link tryGetCodeTwoBefore}.
72
- *
73
- * If the value has not been computed yet, it will be computed and cached.
74
- *
75
- * @see {@link tryGetCodeTwoBefore}
76
- *
77
- * @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
78
- */
79
- value() {
80
- if (this.cachedValue === void 0) {
81
- this.cachedValue = tryGetCodeTwoBefore(
82
- this.previousCode,
83
- this.nowPoint,
84
- this.sliceSerialize
85
- );
86
- }
87
- return this.cachedValue;
88
- }
78
+ cachedValue = void 0;
79
+ /**
80
+ * @see {@link tryGetCodeTwoBefore}
81
+ *
82
+ * @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
83
+ * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
84
+ * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
85
+ */
86
+ constructor(previousCode, nowPoint, sliceSerialize) {
87
+ this.previousCode = previousCode;
88
+ this.nowPoint = nowPoint;
89
+ this.sliceSerialize = sliceSerialize;
90
+ }
91
+ /**
92
+ * Returns the return value of {@link tryGetCodeTwoBefore}.
93
+ *
94
+ * If the value has not been computed yet, it will be computed and cached.
95
+ *
96
+ * @see {@link tryGetCodeTwoBefore}
97
+ *
98
+ * @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
99
+ */
100
+ value() {
101
+ if (this.cachedValue === void 0) this.cachedValue = tryGetCodeTwoBefore(this.previousCode, this.nowPoint, this.sliceSerialize);
102
+ return this.cachedValue;
103
+ }
89
104
  };
105
+ /**
106
+ * If `code` is a [High-Surrogate Code Unit](https://www.unicode.org/glossary/#high_surrogate_code_unit), try to get a genuine next [Unicode Scalar Value](https://www.unicode.org/glossary/#unicode_scalar_value) corresponding to the High-Surrogate Code Unit.
107
+ * @param code a tentative next [code unit](https://www.unicode.org/glossary/#code_unit) less than 65,536, including a High-Surrogate one
108
+ * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
109
+ * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
110
+ * @returns a value greater than 65,535 if the next code point represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), or `code` otherwise
111
+ */
90
112
  function tryGetGenuineNextCode(code, nowPoint, sliceSerialize) {
91
- const nextCandidate = sliceSerialize({
92
- start: nowPoint,
93
- end: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex + 2 }
94
- }).codePointAt(0);
95
- return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
113
+ const nextCandidate = sliceSerialize({
114
+ start: nowPoint,
115
+ end: {
116
+ ...nowPoint,
117
+ _bufferIndex: nowPoint._bufferIndex + 2
118
+ }
119
+ }).codePointAt(0);
120
+ return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
96
121
  }
97
- export {
98
- TwoPreviousCode,
99
- isCodeHighSurrogate,
100
- isCodeLowSurrogate,
101
- tryGetCodeTwoBefore,
102
- tryGetGenuineNextCode,
103
- tryGetGenuinePreviousCode
104
- };
122
+
123
+ //#endregion
124
+ export { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode };
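The surrogate-pair handling in `tryGetGenuinePreviousCode` and `tryGetCodeTwoBefore` can be sketched on a plain string. This is a simplified illustration under stated assumptions: `codePointBefore` and `codePointTwoBefore` are hypothetical helpers that take a string and a UTF-16 index, whereas the package itself works with micromark's `Point` and `sliceSerialize`.

```javascript
// Hypothetical helper: code point ending just before `index` (a UTF-16 offset),
// or null when there is nothing before it.
function codePointBefore(text, index) {
  if (index < 1) return null;
  const last = text.charCodeAt(index - 1);
  const isLowSurrogate = last >= 0xdc00 && last <= 0xdfff;
  if (isLowSurrogate && index >= 2) {
    // Step back one more code unit and try to read a full surrogate pair.
    const candidate = text.codePointAt(index - 2);
    if (candidate !== undefined && candidate >= 0x10000) return candidate;
  }
  return last;
}

// Hypothetical helper: code point two positions before `index`.
function codePointTwoBefore(text, index) {
  const previous = codePointBefore(text, index);
  if (previous === null) return null;
  // A supplementary character occupies two UTF-16 code units.
  const previousWidth = previous >= 0x10000 ? 2 : 1;
  return codePointBefore(text, index - previousWidth);
}

// "a", U+20BB7 (a supplementary CJK ideograph), "b": four code units in total.
const sample = "a\u{20BB7}b";
console.log(codePointTwoBefore(sample, 4) === 0x20bb7); // true
console.log(codePointTwoBefore(sample, 3) === 0x61); // true
console.log(codePointTwoBefore(sample, 1)); // null
```

As in the package code, the width of the preceding character decides how far back the lookup has to go before it reads the next code point.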
package/dist/index.d.ts CHANGED
@@ -1,5 +1,4 @@
1
- export { isCjk, isCjkOrIvs, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace } from './categoryUtil.js';
2
- export { classifyCharacter, classifyPrecedingCharacter, constantsEx } from './classifyCharacter.js';
3
- export { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode } from './codeUtil.js';
4
- import 'micromark-util-symbol';
5
- import 'micromark-util-types';
1
+ import { classifyCharacter, classifyPrecedingCharacter, constantsEx } from "./classifyCharacter.js";
2
+ import { isCjk, isCjkOrIvs, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace } from "./categoryUtil.js";
3
+ import { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode } from "./codeUtil.js";
4
+ export { TwoPreviousCode, classifyCharacter, classifyPrecedingCharacter, constantsEx, isCjk, isCjkOrIvs, isCodeHighSurrogate, isCodeLowSurrogate, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode };
package/dist/index.js CHANGED
@@ -1,231 +1,5 @@
1
- var __defProp = Object.defineProperty;
2
- var __defNormalProp = (obj, key, value) => key in obj ? __defProp(obj, key, { enumerable: true, configurable: true, writable: true, value }) : obj[key] = value;
3
- var __publicField = (obj, key, value) => __defNormalProp(obj, typeof key !== "symbol" ? key + "" : key, value);
1
+ import { classifyCharacter, classifyPrecedingCharacter, constantsEx } from "./classifyCharacter.js";
2
+ import { isCjk, isCjkOrIvs, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace } from "./categoryUtil.js";
3
+ import { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode } from "./codeUtil.js";
4
4
 
5
- // src/categoryUtil.ts
6
- import { constants as constants2 } from "micromark-util-symbol";
7
-
8
- // src/classifyCharacter.ts
9
- import { markdownLineEndingOrSpace } from "micromark-util-character";
10
- import { codes, constants } from "micromark-util-symbol";
11
-
12
- // src/characterWithNonBmp.ts
13
- import { eastAsianWidthType } from "get-east-asian-width";
14
- function isEmoji(uc) {
15
- return /^\p{Emoji_Presentation}/u.test(String.fromCodePoint(uc));
16
- }
17
- function cjkOrIvs(uc) {
18
- if (!uc || uc < 4352) {
19
- return false;
20
- }
21
- const eaw = eastAsianWidthType(uc);
22
- switch (eaw) {
23
- case "fullwidth":
24
- case "halfwidth":
25
- return true;
26
- // never be emoji
27
- case "wide":
28
- return !isEmoji(uc);
29
- case "narrow":
30
- return false;
31
- case "ambiguous":
32
- return 917760 <= uc && uc <= 917999 ? null : false;
33
- case "neutral":
34
- return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
35
- }
36
- }
37
- function isCjkAmbiguousPunctuation(main, vs) {
38
- if (vs !== 65025 || !main || main < 8216) return false;
39
- return main === 8216 || main === 8217 || main === 8220 || main === 8221;
40
- }
41
- function nonEmojiGeneralUseVS(code) {
42
- return code !== null && code >= 65024 && code <= 65038;
43
- }
44
- var unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
45
- var unicodeWhitespace = regexCheck(/\s/);
46
- function regexCheck(regex) {
47
- return check;
48
- function check(code) {
49
- return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
50
- }
51
- }
52
-
53
- // src/classifyCharacter.ts
54
- var constantsEx;
55
- ((constantsEx2) => {
56
- constantsEx2.spaceOrPunctuation = 3;
57
- constantsEx2.cjk = 4096;
58
- constantsEx2.cjkPunctuation = 4098;
59
- constantsEx2.ivs = 8192;
60
- constantsEx2.cjkOrIvs = 12288;
61
- constantsEx2.nonEmojiGeneralUseVS = 16384;
62
- constantsEx2.variationSelector = 24576;
63
- constantsEx2.ivsToCjkRightShift = 1;
64
- })(constantsEx || (constantsEx = {}));
65
- function classifyCharacter(code) {
66
- if (code === codes.eof || markdownLineEndingOrSpace(code) || unicodeWhitespace(code)) {
67
- return constants.characterGroupWhitespace;
68
- }
69
- let value = 0;
70
- if (code >= 4352) {
71
- if (nonEmojiGeneralUseVS(code)) {
72
- return constantsEx.nonEmojiGeneralUseVS;
73
- }
74
- switch (cjkOrIvs(code)) {
75
- case null:
76
- return constantsEx.ivs;
77
- case true:
78
- value |= constantsEx.cjk;
79
- break;
80
- }
81
- }
82
- if (unicodePunctuation(code)) {
83
- value |= constants.characterGroupPunctuation;
84
- }
85
- return value;
86
- }
87
- function classifyPrecedingCharacter(before, get2Previous, previous) {
88
- if (!isNonEmojiGeneralUseVS(before)) {
89
- return before;
90
- }
91
- const twoPrevious = get2Previous();
92
- const twoBefore = classifyCharacter(twoPrevious);
93
- return !twoPrevious || isUnicodeWhitespace(twoBefore) ? before : isCjkAmbiguousPunctuation(twoPrevious, previous) ? constantsEx.cjkPunctuation : stripIvs(twoBefore);
94
- }
95
- function stripIvs(twoBefore) {
96
- return twoBefore & ~constantsEx.ivs;
97
- }
98
-
99
- // src/categoryUtil.ts
100
- function isUnicodeWhitespace(category) {
101
- return Boolean(category & constants2.characterGroupWhitespace);
102
- }
103
- function isNonCjkPunctuation(category) {
104
- return (category & constantsEx.cjkPunctuation) === constants2.characterGroupPunctuation;
105
- }
106
- function isCjk(category) {
107
- return Boolean(category & constantsEx.cjk);
108
- }
109
- function isIvs(category) {
110
- return category === constantsEx.ivs;
111
- }
112
- function isCjkOrIvs(category) {
113
- return Boolean(category & constantsEx.cjkOrIvs);
114
- }
115
- function isNonEmojiGeneralUseVS(category) {
116
- return category === constantsEx.nonEmojiGeneralUseVS;
117
- }
118
- function isSpaceOrPunctuation(category) {
119
- return Boolean(category & constantsEx.spaceOrPunctuation);
120
- }
121
-
122
- // src/codeUtil.ts
123
- function isCodeHighSurrogate(code) {
124
- return Boolean(code && code >= 55296 && code <= 56319);
125
- }
126
- function isCodeLowSurrogate(code) {
127
- return Boolean(code && code >= 56320 && code <= 57343);
128
- }
129
- function tryGetGenuinePreviousCode(code, nowPoint, sliceSerialize) {
130
- if (nowPoint._bufferIndex < 2) {
131
- return code;
132
- }
133
- const previousBuffer = sliceSerialize({
134
- // take 2 characters (code units)
135
- start: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex - 2 },
136
- end: nowPoint
137
- });
138
- const previousCandidate = previousBuffer.codePointAt(0);
139
- return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
140
- }
141
- function tryGetCodeTwoBefore(previousCode, nowPoint, sliceSerialize) {
142
- const previousWidth = previousCode >= 65536 ? 2 : 1;
143
- if (nowPoint._bufferIndex < 1 + previousWidth) {
144
- return null;
145
- }
146
- const idealStart = nowPoint._bufferIndex - previousWidth - 2;
147
- const twoPreviousBuffer = sliceSerialize({
148
- // take 1--2 character
149
- start: {
150
- ...nowPoint,
151
- _bufferIndex: idealStart >= 0 ? idealStart : 0
152
- },
153
- end: {
154
- ...nowPoint,
155
- _bufferIndex: nowPoint._bufferIndex - previousWidth
156
- }
157
- });
158
- const twoPreviousLast = twoPreviousBuffer.charCodeAt(
159
- twoPreviousBuffer.length - 1
160
- );
161
- if (Number.isNaN(twoPreviousLast)) {
162
- return null;
163
- }
164
- if (twoPreviousBuffer.length < 2 || twoPreviousLast < 56320 || 57343 < twoPreviousLast) {
165
- return twoPreviousLast;
166
- }
167
- const twoPreviousCandidate = twoPreviousBuffer.codePointAt(0);
168
- if (twoPreviousCandidate && twoPreviousCandidate >= 65536) {
169
- return twoPreviousCandidate;
170
- }
171
- return twoPreviousLast;
172
- }
173
- var TwoPreviousCode = class {
174
- /**
175
- * @see {@link tryGetCodeTwoBefore}
176
- *
177
- * @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
178
- * @param nowPoint `this.now()` (`this` = `TokenizeContext`)
179
- * @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
180
- */
181
- constructor(previousCode, nowPoint, sliceSerialize) {
182
- this.previousCode = previousCode;
183
- this.nowPoint = nowPoint;
184
- this.sliceSerialize = sliceSerialize;
185
- __publicField(this, "cachedValue");
186
- }
187
- /**
188
- * Returns the return value of {@link tryGetCodeTwoBefore}.
189
- *
190
- * If the value has not been computed yet, it will be computed and cached.
191
- *
192
- * @see {@link tryGetCodeTwoBefore}
193
- *
194
- * @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
195
- */
196
- value() {
197
- if (this.cachedValue === void 0) {
198
- this.cachedValue = tryGetCodeTwoBefore(
199
- this.previousCode,
200
- this.nowPoint,
201
- this.sliceSerialize
202
- );
203
- }
204
- return this.cachedValue;
205
- }
206
- };
207
- function tryGetGenuineNextCode(code, nowPoint, sliceSerialize) {
208
- const nextCandidate = sliceSerialize({
209
- start: nowPoint,
210
- end: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex + 2 }
211
- }).codePointAt(0);
212
- return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
213
- }
214
- export {
215
- TwoPreviousCode,
216
- classifyCharacter,
217
- classifyPrecedingCharacter,
218
- constantsEx,
219
- isCjk,
220
- isCjkOrIvs,
221
- isCodeHighSurrogate,
222
- isCodeLowSurrogate,
223
- isIvs,
224
- isNonCjkPunctuation,
225
- isNonEmojiGeneralUseVS,
226
- isSpaceOrPunctuation,
227
- isUnicodeWhitespace,
228
- tryGetCodeTwoBefore,
229
- tryGetGenuineNextCode,
230
- tryGetGenuinePreviousCode
231
- };
5
+ export { TwoPreviousCode, classifyCharacter, classifyPrecedingCharacter, constantsEx, isCjk, isCjkOrIvs, isCodeHighSurrogate, isCodeLowSurrogate, isIvs, isNonCjkPunctuation, isNonEmojiGeneralUseVS, isSpaceOrPunctuation, isUnicodeWhitespace, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode };
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "micromark-extension-cjk-friendly-util",
3
- "version": "2.1.0",
3
+ "version": "3.0.0",
4
4
  "type": "module",
5
5
  "exports": {
6
6
  ".": {
@@ -33,7 +33,7 @@
33
33
  "description": "common library for micromark-extension-cjk-friendly and its related packages",
34
34
  "sideEffects": false,
35
35
  "dependencies": {
36
- "get-east-asian-width": "^1.3.0",
36
+ "get-east-asian-width": "^1.4.0",
37
37
  "micromark-util-character": "^2.1.1",
38
38
  "micromark-util-symbol": "^2.0.1"
39
39
  },
@@ -46,15 +46,13 @@
46
46
  }
47
47
  },
48
48
  "engines": {
49
- "node": ">=16"
49
+ "node": ">=18"
50
50
  },
51
51
  "scripts": {
52
- "build:rslib": "rslib build",
53
- "build": "tsup",
54
- "build:lib": "tsup",
55
- "dev:rslib": "rslib build --watch",
56
- "dev": "tsup --watch",
57
- "dev:lib": "tsup --watch",
52
+ "build": "tsdown",
53
+ "build:lib": "tsdown",
54
+ "dev": "tsdown --watch",
55
+ "dev:lib": "tsdown --watch",
58
56
  "test": "vitest run",
59
57
  "test:lib": "vitest run",
60
58
  "test:up": "vitest -u",