micromark-extension-cjk-friendly-util 1.0.0 → 1.1.0
- package/README.md +15 -15
- package/dist/categoryUtil.d.ts +16 -12
- package/dist/categoryUtil.js +44 -0
- package/dist/characterWithNonBmp.d.ts +8 -5
- package/dist/characterWithNonBmp.js +54 -0
- package/dist/classifyCharacter.d.ts +7 -4
- package/dist/classifyCharacter.js +90 -0
- package/dist/codeUtil.d.ts +38 -6
- package/dist/codeUtil.js +104 -0
- package/dist/index.d.ts +5 -3
- package/dist/index.js +153 -93
- package/package.json +13 -5
- package/dist/index.cjs +0 -169
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# micromark-extension-cjk-friendly-util
|
|
2
2
|
|
|
3
|
-
[](https://npmjs.com/package/micromark-extension-cjk-friendly-util) [](https://npmjs.com/package/micromark-extension-cjk-friendly-util)  [](https://npmjs.com/package/micromark-extension-cjk-friendly-util) [](https://npmjs.com/package/micromark-extension-cjk-friendly-util)
|
|
4
4
|
|
|
5
5
|
A utility library package for [micromark-extension-cjk-friendly](https://npmjs.com/package/micromark-extension-cjk-friendly), which is internally used by [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly), and its related packages.
|
|
6
6
|
|
|
@@ -42,21 +42,21 @@ CommonMark issue: https://github.com/commonmark/commonmark-spec/issues/650
|
|
|
42
42
|
|
|
43
43
|
## Runtime Requirements / <span lang="ja">実行環境の要件</span> / <span lang="zh-Hans-CN">运行环境要求</span> / <span lang="ko">업데이트 전략</span>
|
|
44
44
|
|
|
45
|
-
This package
|
|
45
|
+
This package is ESM-only. It requires Node.js 16 or later.
|
|
46
46
|
|
|
47
|
-
<span lang="ja"
|
|
47
|
+
<span lang="ja">本パッケージはESM専用です。Node.js 16以上が必要です。</span>
|
|
48
48
|
|
|
49
|
-
<span lang="zh-CN"
|
|
49
|
+
<span lang="zh-Hans-CN">此包仅支持ESM。需要Node.js 16或更高版本。</span>
|
|
50
50
|
|
|
51
|
-
<span lang="ko">이 패키지는
|
|
51
|
+
<span lang="ko">이 패키지는 ESM 전용입니다. Node.js 16 이상이 필요합니다.</span>
|
|
52
52
|
|
|
53
|
-
|
|
53
|
+
This package uses the [`v` flag for regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets) introduced in ES2024, if available, to determine whether a character is an emoji. In the following compatible environments, it will comply with the Unicode version supported by the runtime. Otherwise, it will fall back to the snapshot as of Unicode 16.
|
|
54
54
|
|
|
55
|
-
<span lang="ja"
|
|
55
|
+
<span lang="ja">本パッケージは文字が絵文字かどうかを判定するために、ES2024で導入された[正規表現の`v`フラグ](https://developer.mozilla.org/ja/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets)が利用可能であれば使用します。以下の対応環境の場合、ランタイムが対応しているUnicodeバージョンに準拠します。それ以外の場合、Unicode 16時点のスナップショットにフォールバックします。</span>
|
|
56
56
|
|
|
57
|
-
<span lang="zh-CN"
|
|
57
|
+
<span lang="zh-Hans-CN">本包使用ES2024引入的[正则表达式`v`标志](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets)(如果可用)来判断字符是否为表情符号。在以下兼容环境中,将遵循运行时支持的Unicode版本。否则,将回退到Unicode 16的快照。</span>
|
|
58
58
|
|
|
59
|
-
<span lang="ko"
|
|
59
|
+
<span lang="ko">이 패키지는 문자가 이모지인지 판단하기 위해 ES2024에서 도입된 [정규표현식 `v` 플래그](https://developer.mozilla.org/ko/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets)를 사용할 수 있다면 사용합니다. 다음 호환 환경에서는 런타임이 지원하는 Unicode 버전을 따릅니다. 그렇지 않은 경우, Unicode 16 시점의 스냅샷으로 폴백합니다.</span>
|
|
60
60
|
|
|
61
61
|
- Chrome / Edge 112 or later
|
|
62
62
|
- Firefox 116 or later
|
|
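The README paragraph above describes a feature-detection pattern: try the `v` flag first, and fall back to a bundled snapshot when the engine rejects it. A minimal sketch of that pattern follows. The names `createEmojiPredicate` and `FALLBACK_EMOJI_RANGES` are hypothetical, and the fallback table here is a tiny illustrative subset, not the Unicode 16 snapshot the package ships.

```javascript
// Illustrative subset only. The package bundles a far larger table
// generated from Unicode 16 data.
const FALLBACK_EMOJI_RANGES = [
  [0x231a, 0x231b], // watch, hourglass
  [0x1f600, 0x1f64f], // emoticons block
];

/** Builds an emoji predicate once, preferring the runtime's own Unicode data. */
function createEmojiPredicate() {
  try {
    // Engines without `v` flag support throw a SyntaxError here.
    const regex = new RegExp("^\\p{RGI_Emoji}", "v");
    return (codePoint) => regex.test(String.fromCodePoint(codePoint));
  } catch (error) {
    if (!(error instanceof SyntaxError)) {
      throw error;
    }
    // Fallback: a plain range lookup against the bundled snapshot.
    return (codePoint) =>
      FALLBACK_EMOJI_RANGES.some(([lo, hi]) => lo <= codePoint && codePoint <= hi);
  }
}

const isEmoji = createEmojiPredicate();
```

Building the predicate once keeps the `try`/`catch` off the hot path, which is presumably why the bundled code caches its chosen function as well.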
@@ -100,7 +100,7 @@ This package provides a function and a namespace based on the original micromark
|
|
|
100
100
|
| `classifyCharacter` | function | [micromark-util-character](https://npmjs.com/package/micromark-util-character) | (same) | Tells whether a character is not only a punctuation or whitespace but also a CJK or variation selector |
|
|
101
101
|
| `constantsEx` | namespace | [micromark-util-symbol](https://npmjs.com/package/micromark-util-symbol) | `constants` | Constants meaning CJK and variation selectors; use it and the original `constants` together. |
|
|
102
102
|
|
|
103
|
-
Also, this package provides some utility functions to check whether a character belongs to the category defined in the specification (e.g. CJK
|
|
103
|
+
Also, this package provides some utility functions to check whether a character belongs to the category defined in the specification (e.g. CJK character), or to help you fetch the Unicode Code Point of a character around the emphasis mark.
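Fetching the code point next to an emphasis mark is not trivial when the tokenizer works on UTF-16 code units, because a character outside the BMP occupies two units. The sketch below shows the general idea under that assumption. All three function names are hypothetical and are not taken from this package's API.

```javascript
// Hypothetical helpers, not the package's exported names.
const isHighSurrogate = (code) => code !== null && 0xd800 <= code && code <= 0xdbff;
const isLowSurrogate = (code) => code !== null && 0xdc00 <= code && code <= 0xdfff;

/** Combines a surrogate pair into one Unicode code point. */
function combineSurrogates(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

/**
 * Returns the code point that ends just before `index` in `text`,
 * or `null` when there is nothing before `index`.
 */
function codePointBefore(text, index) {
  if (index <= 0) {
    return null;
  }
  const last = text.charCodeAt(index - 1);
  if (isLowSurrogate(last) && index >= 2) {
    const previous = text.charCodeAt(index - 2);
    if (isHighSurrogate(previous)) {
      return combineSurrogates(previous, last);
    }
  }
  return last;
}
```

For the string `"\u{1F600}*"`, the `*` sits at index 2, so `codePointBefore` has to look back two code units to recover U+1F600.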
|
|
104
104
|
|
|
105
105
|
## Specification / <span lang="ja">規格書</span> / <span lang="zh-Hans-CN">规范</span> / <span lang="ko">규정서</span>
|
|
106
106
|
|
|
@@ -108,11 +108,11 @@ https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md (Engl
|
|
|
108
108
|
|
|
109
109
|
## Related packages / <span lang="ja">関連パッケージ</span> / <span lang="zh-Hans-CN">相关包</span> / <span lang="ko">관련 패키지</span>
|
|
110
110
|
|
|
111
|
-
- [micromark-extension-cjk-friendly](https://npmjs.com/package/micromark-extension-cjk-friendly) [](https://npmjs.com/package/micromark-extension-cjk-friendly) [ [](https://npmjs.com/package/remark-cjk-friendly) [ [](https://npmjs.com/package/markdown-it-cjk-friendly) [ [](https://npmjs.com/package/remark-cjk-friendly) [ [](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough) [ [](https://npmjs.com/package/micromark-extension-cjk-friendly)  [](https://npmjs.com/package/micromark-extension-cjk-friendly) [](https://npmjs.com/package/micromark-extension-cjk-friendly)
|
|
112
|
+
- [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly)
|
|
113
|
+
- [markdown-it-cjk-friendly](https://npmjs.com/package/markdown-it-cjk-friendly)
|
|
114
|
+
- [remark-cjk-friendly](https://npmjs.com/package/remark-cjk-friendly)
|
|
115
|
+
- [micromark-extension-cjk-friendly-gfm-strikethrough](https://npmjs.com/package/micromark-extension-cjk-friendly-gfm-strikethrough)
|
|
116
116
|
|
|
117
117
|
## Contributing / <span lang="ja">貢献</span> / <span lang="zh-Hans-CN">贡献</span> / <span lang="ko">기여</span>
|
|
118
118
|
|
package/dist/categoryUtil.d.ts
CHANGED
|
@@ -1,4 +1,7 @@
|
|
|
1
|
-
import {
|
|
1
|
+
import { classifyCharacter } from './classifyCharacter.js';
|
|
2
|
+
import 'micromark-util-symbol';
|
|
3
|
+
import 'micromark-util-types';
|
|
4
|
+
|
|
2
5
|
type Category = ReturnType<typeof classifyCharacter>;
|
|
3
6
|
/**
|
|
4
7
|
* `true` if the code point represents a [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character).
|
|
@@ -6,40 +9,41 @@ type Category = ReturnType<typeof classifyCharacter>;
|
|
|
6
9
|
* @param category the return value of `classifyCharacter`.
|
|
7
10
|
* @returns `true` if the code point represents a Unicode whitespace character
|
|
8
11
|
*/
|
|
9
|
-
|
|
12
|
+
declare function isUnicodeWhitespace(category: Category): boolean;
|
|
10
13
|
/**
|
|
11
14
|
* `true` if the code point represents a [non-CJK punctuation character](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#non-cjk-punctuation-character).
|
|
12
15
|
*
|
|
13
16
|
* @param category the return value of `classifyCharacter`.
|
|
14
17
|
* @returns `true` if the code point represents a non-CJK punctuation character
|
|
15
18
|
*/
|
|
16
|
-
|
|
19
|
+
declare function isNonCjkPunctuation(category: Category): boolean;
|
|
17
20
|
/**
|
|
18
|
-
* `true` if the code point represents a [CJK character
|
|
21
|
+
* `true` if the code point represents a [CJK character](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#cjk-character).
|
|
19
22
|
*
|
|
20
23
|
* @param category the return value of `classifyCharacter`.
|
|
21
24
|
* @returns `true` if the code point represents a CJK character
|
|
22
25
|
*/
|
|
23
|
-
|
|
26
|
+
declare function isCjk(category: Category): boolean;
|
|
24
27
|
/**
|
|
25
|
-
* `true` if the code point represents an [
|
|
28
|
+
* `true` if the code point represents an [Ideographic Variation Selector](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#ideographi-variation-selector).
|
|
26
29
|
*
|
|
27
30
|
* @param category the return value of `classifyCharacter`.
|
|
28
31
|
* @returns `true` if the code point represents an IVS
|
|
29
32
|
*/
|
|
30
|
-
|
|
33
|
+
declare function isIvs(category: Category): boolean;
|
|
31
34
|
/**
|
|
32
|
-
* `true` if the code point represents a [
|
|
35
|
+
* `true` if the code point represents a [Standard Variation Selector that can follow CJK](https://github.com/tats-u/markdown-cjk-friendly/blob/main/specification.md#svs-that-can-follow-cjk).
|
|
33
36
|
*
|
|
34
37
|
* @param category the return value of `classifyCharacter`.
|
|
35
|
-
* @returns `true` if the code point represents an
|
|
38
|
+
* @returns `true` if the code point represents a Standard Variation Selector that can follow CJK
|
|
36
39
|
*/
|
|
37
|
-
|
|
40
|
+
declare function isSvsFollowingCjk(category: Category): boolean;
|
|
38
41
|
/**
|
|
39
42
|
* `true` if the code point represents a [Unicode whitespace character](https://spec.commonmark.org/0.31.2/#unicode-whitespace-character) or a [Unicode punctuation character](https://spec.commonmark.org/0.31.2/#unicode-punctuation-character).
|
|
40
43
|
*
|
|
41
44
|
* @param category the return value of `classifyCharacter`.
|
|
42
45
|
* @returns `true` if the code point represents a space or punctuation
|
|
43
46
|
*/
|
|
44
|
-
|
|
45
|
-
|
|
47
|
+
declare function isSpaceOrPunctuation(category: Category): boolean;
|
|
48
|
+
|
|
49
|
+
export { isCjk, isIvs, isNonCjkPunctuation, isSpaceOrPunctuation, isSvsFollowingCjk, isUnicodeWhitespace };
|
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
// src/categoryUtil.ts
|
|
2
|
+
import { constants as constants2 } from "micromark-util-symbol";
|
|
3
|
+
|
|
4
|
+
// src/classifyCharacter.ts
|
|
5
|
+
import { markdownLineEndingOrSpace } from "micromark-util-character";
|
|
6
|
+
import { constants, codes } from "micromark-util-symbol";
|
|
7
|
+
var constantsEx;
|
|
8
|
+
((constantsEx2) => {
|
|
9
|
+
constantsEx2.spaceOrPunctuation = 3;
|
|
10
|
+
constantsEx2.cjk = 4096;
|
|
11
|
+
constantsEx2.cjkPunctuation = 4098;
|
|
12
|
+
constantsEx2.ivs = 8192;
|
|
13
|
+
constantsEx2.cjkOrIvs = 12288;
|
|
14
|
+
constantsEx2.svsFollowingCjk = 16384;
|
|
15
|
+
constantsEx2.variationSelector = 28672;
|
|
16
|
+
})(constantsEx || (constantsEx = {}));
|
|
17
|
+
|
|
18
|
+
// src/categoryUtil.ts
|
|
19
|
+
function isUnicodeWhitespace(category) {
|
|
20
|
+
return Boolean(category & constants2.characterGroupWhitespace);
|
|
21
|
+
}
|
|
22
|
+
function isNonCjkPunctuation(category) {
|
|
23
|
+
return (category & constantsEx.cjkPunctuation) === constants2.characterGroupPunctuation;
|
|
24
|
+
}
|
|
25
|
+
function isCjk(category) {
|
|
26
|
+
return Boolean(category & constantsEx.cjk);
|
|
27
|
+
}
|
|
28
|
+
function isIvs(category) {
|
|
29
|
+
return category === constantsEx.ivs;
|
|
30
|
+
}
|
|
31
|
+
function isSvsFollowingCjk(category) {
|
|
32
|
+
return category === constantsEx.svsFollowingCjk;
|
|
33
|
+
}
|
|
34
|
+
function isSpaceOrPunctuation(category) {
|
|
35
|
+
return Boolean(category & constantsEx.spaceOrPunctuation);
|
|
36
|
+
}
|
|
37
|
+
export {
|
|
38
|
+
isCjk,
|
|
39
|
+
isIvs,
|
|
40
|
+
isNonCjkPunctuation,
|
|
41
|
+
isSpaceOrPunctuation,
|
|
42
|
+
isSvsFollowingCjk,
|
|
43
|
+
isUnicodeWhitespace
|
|
44
|
+
};
|
|
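The predicates in `categoryUtil.js` are bit-flag tests on the value returned by `classifyCharacter`. The following standalone sketch reproduces three of them so the masks can be tried in isolation. The `constantsEx` values are copied from the diff. The two `characterGroup*` values are an assumption about `micromark-util-symbol`, chosen to be consistent with `spaceOrPunctuation = 3` and `cjkPunctuation = 4098`.

```javascript
// Assumed values of micromark-util-symbol's constants (not shown in this diff).
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

// Values copied from the constantsEx namespace in the diff.
const constantsEx = {
  spaceOrPunctuation: 3, // whitespace | punctuation
  cjk: 4096, // 0x1000
  cjkPunctuation: 4098, // cjk | punctuation
  ivs: 8192, // 0x2000
  svsFollowingCjk: 16384, // 0x4000
};

// True when the CJK bit is set, whether or not the punctuation bit is set too.
const isCjk = (category) => Boolean(category & constantsEx.cjk);

// True only when the punctuation bit is set and the CJK bit is clear.
const isNonCjkPunctuation = (category) =>
  (category & constantsEx.cjkPunctuation) === characterGroupPunctuation;

// True when either the whitespace bit or the punctuation bit is set.
const isSpaceOrPunctuation = (category) =>
  Boolean(category & constantsEx.spaceOrPunctuation);
```

Masking with `cjkPunctuation` and comparing against the plain punctuation value is what lets one check reject CJK punctuation (4098) while accepting ordinary punctuation (2).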
@@ -1,17 +1,18 @@
|
|
|
1
|
-
import
|
|
1
|
+
import { Code } from 'micromark-util-types';
|
|
2
|
+
|
|
2
3
|
/**
|
|
3
4
|
* Check if `uc` is CJK or IVS
|
|
4
5
|
*
|
|
5
6
|
* @param uc code point
|
|
6
7
|
* @returns `true` if `uc` is CJK, `null` if IVS, or `false` if neither
|
|
7
8
|
*/
|
|
8
|
-
|
|
9
|
+
declare function cjkOrIvs(uc: Code): boolean | null;
|
|
9
10
|
/**
|
|
10
11
|
* Check whether the character code represents Standard Variation Sequence that can follow an ideographic character.
|
|
11
12
|
*
|
|
12
13
|
* U+FE0E is used for some CJK symbols (e.g. U+3299) that can also be
|
|
13
14
|
*/
|
|
14
|
-
|
|
15
|
+
declare const svsFollowingCjk: (code: Code) => boolean;
|
|
15
16
|
/**
|
|
16
17
|
* Check whether the character code represents Unicode punctuation.
|
|
17
18
|
*
|
|
@@ -31,7 +32,7 @@ export declare const svsFollowingCjk: (code: Code) => boolean;
|
|
|
31
32
|
* @returns
|
|
32
33
|
* Whether it matches.
|
|
33
34
|
*/
|
|
34
|
-
|
|
35
|
+
declare const unicodePunctuation: (code: Code) => boolean;
|
|
35
36
|
/**
|
|
36
37
|
* Check whether the character code represents Unicode whitespace.
|
|
37
38
|
*
|
|
@@ -52,4 +53,6 @@ export declare const unicodePunctuation: (code: Code) => boolean;
|
|
|
52
53
|
* @returns
|
|
53
54
|
* Whether it matches.
|
|
54
55
|
*/
|
|
55
|
-
|
|
56
|
+
declare const unicodeWhitespace: (code: Code) => boolean;
|
|
57
|
+
|
|
58
|
+
export { cjkOrIvs, svsFollowingCjk, unicodePunctuation, unicodeWhitespace };
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
// src/characterWithNonBmp.ts
|
|
2
|
+
import { eastAsianWidthType } from "get-east-asian-width";
|
|
3
|
+
var isEmoji = function(uc) {
|
|
4
|
+
if (this.fn !== null) {
|
|
5
|
+
return this.fn(uc);
|
|
6
|
+
}
|
|
7
|
+
try {
|
|
8
|
+
const regex = new RegExp("^\\p{RGI_Emoji}", "v");
|
|
9
|
+
this.fn = (uc_) => regex.test(String.fromCodePoint(uc_));
|
|
10
|
+
} catch (e) {
|
|
11
|
+
if (!(e instanceof SyntaxError)) {
|
|
12
|
+
throw e;
|
|
13
|
+
}
|
|
14
|
+
this.fn = (cp) => 8986 <= cp && cp <= 8987 || 9193 <= cp && cp <= 9196 || cp === 9200 || cp === 9203 || 9725 <= cp && cp <= 9726 || 9748 <= cp && cp <= 9749 || 9800 <= cp && cp <= 9811 || cp === 9855 || cp === 9875 || cp === 9889 || 9898 <= cp && cp <= 9899 || 9917 <= cp && cp <= 9918 || 9924 <= cp && cp <= 9925 || cp === 9934 || cp === 9940 || cp === 9962 || 9970 <= cp && cp <= 9971 || cp === 9973 || cp === 9978 || cp === 9981 || cp === 9989 || 9994 <= cp && cp <= 9995 || cp === 10024 || cp === 10060 || cp === 10062 || 10067 <= cp && cp <= 10069 || cp === 10071 || 10133 <= cp && cp <= 10135 || cp === 10160 || cp === 10175 || 11035 <= cp && cp <= 11036 || cp === 11088 || cp === 11093 || cp === 126980 || cp === 127183 || cp === 127374 || 127377 <= cp && cp <= 127386 || cp === 127489 || cp === 127514 || cp === 127535 || 127538 <= cp && cp <= 127542 || 127544 <= cp && cp <= 127546 || 127568 <= cp && cp <= 127569 || 127744 <= cp && cp <= 127756 || 127757 <= cp && cp <= 127758 || cp === 127759 || cp === 127760 || cp === 127761 || cp === 127762 || 127763 <= cp && cp <= 127765 || 127766 <= cp && cp <= 127768 || cp === 127769 || cp === 127770 || cp === 127771 || cp === 127772 || 127773 <= cp && cp <= 127774 || 127775 <= cp && cp <= 127776 || 127789 <= cp && cp <= 127791 || 127792 <= cp && cp <= 127793 || 127794 <= cp && cp <= 127795 || 127796 <= cp && cp <= 127797 || 127799 <= cp && cp <= 127818 || cp === 127819 || 127820 <= cp && cp <= 127823 || cp === 127824 || 127825 <= cp && cp <= 127867 || cp === 127868 || 127870 <= cp && cp <= 127871 || 127872 <= cp && cp <= 127891 || 127904 <= cp && cp <= 127940 || cp === 127941 || cp === 127942 || cp === 127943 || cp === 127944 || cp === 127945 || cp === 127946 || 127951 <= cp && cp <= 127955 || 127968 <= cp && cp <= 127971 || cp === 127972 || 127973 <= cp && cp <= 127984 || cp === 127988 || 127992 <= cp && cp <= 128007 || cp === 128008 || 128009 <= cp && cp <= 128011 || 128012 <= cp && cp <= 128014 || 128015 <= cp && cp <= 128016 
|| 128017 <= cp && cp <= 128018 || cp === 128019 || cp === 128020 || cp === 128021 || cp === 128022 || 128023 <= cp && cp <= 128041 || cp === 128042 || 128043 <= cp && cp <= 128062 || cp === 128064 || 128066 <= cp && cp <= 128100 || cp === 128101 || 128102 <= cp && cp <= 128107 || 128108 <= cp && cp <= 128109 || 128110 <= cp && cp <= 128172 || cp === 128173 || 128174 <= cp && cp <= 128181 || 128182 <= cp && cp <= 128183 || 128184 <= cp && cp <= 128235 || 128236 <= cp && cp <= 128237 || cp === 128238 || cp === 128239 || 128240 <= cp && cp <= 128244 || cp === 128245 || 128246 <= cp && cp <= 128247 || cp === 128248 || 128249 <= cp && cp <= 128252 || 128255 <= cp && cp <= 128258 || cp === 128259 || 128260 <= cp && cp <= 128263 || cp === 128264 || cp === 128265 || 128266 <= cp && cp <= 128276 || cp === 128277 || 128278 <= cp && cp <= 128299 || 128300 <= cp && cp <= 128301 || 128302 <= cp && cp <= 128317 || 128331 <= cp && cp <= 128334 || 128336 <= cp && cp <= 128347 || 128348 <= cp && cp <= 128359 || cp === 128378 || 128405 <= cp && cp <= 128406 || cp === 128420 || 128507 <= cp && cp <= 128511 || cp === 128512 || 128513 <= cp && cp <= 128518 || 128519 <= cp && cp <= 128520 || 128521 <= cp && cp <= 128525 || cp === 128526 || cp === 128527 || cp === 128528 || cp === 128529 || 128530 <= cp && cp <= 128532 || cp === 128533 || cp === 128534 || cp === 128535 || cp === 128536 || cp === 128537 || cp === 128538 || cp === 128539 || 128540 <= cp && cp <= 128542 || cp === 128543 || 128544 <= cp && cp <= 128549 || 128550 <= cp && cp <= 128551 || 128552 <= cp && cp <= 128555 || cp === 128556 || cp === 128557 || 128558 <= cp && cp <= 128559 || 128560 <= cp && cp <= 128563 || cp === 128564 || cp === 128565 || cp === 128566 || 128567 <= cp && cp <= 128576 || 128577 <= cp && cp <= 128580 || 128581 <= cp && cp <= 128591 || cp === 128640 || 128641 <= cp && cp <= 128642 || 128643 <= cp && cp <= 128645 || cp === 128646 || cp === 128647 || cp === 128648 || cp === 128649 || 128650 <= cp && cp 
<= 128651 || cp === 128652 || cp === 128653 || cp === 128654 || cp === 128655 || cp === 128656 || 128657 <= cp && cp <= 128659 || cp === 128660 || cp === 128661 || cp === 128662 || cp === 128663 || cp === 128664 || 128665 <= cp && cp <= 128666 || 128667 <= cp && cp <= 128673 || cp === 128674 || cp === 128675 || 128676 <= cp && cp <= 128677 || cp === 128678 || 128679 <= cp && cp <= 128685 || 128686 <= cp && cp <= 128689 || cp === 128690 || 128691 <= cp && cp <= 128693 || cp === 128694 || 128695 <= cp && cp <= 128696 || 128697 <= cp && cp <= 128702 || cp === 128703 || cp === 128704 || 128705 <= cp && cp <= 128709 || cp === 128716 || cp === 128720 || 128721 <= cp && cp <= 128722 || cp === 128725 || 128726 <= cp && cp <= 128727 || cp === 128732 || 128733 <= cp && cp <= 128735 || 128747 <= cp && cp <= 128748 || 128756 <= cp && cp <= 128758 || 128759 <= cp && cp <= 128760 || cp === 128761 || cp === 128762 || 128763 <= cp && cp <= 128764 || 128992 <= cp && cp <= 129003 || cp === 129008 || cp === 129292 || 129293 <= cp && cp <= 129295 || 129296 <= cp && cp <= 129304 || 129305 <= cp && cp <= 129310 || cp === 129311 || 129312 <= cp && cp <= 129319 || 129320 <= cp && cp <= 129327 || cp === 129328 || 129329 <= cp && cp <= 129330 || 129331 <= cp && cp <= 129338 || 129340 <= cp && cp <= 129342 || cp === 129343 || 129344 <= cp && cp <= 129349 || 129351 <= cp && cp <= 129355 || cp === 129356 || 129357 <= cp && cp <= 129359 || 129360 <= cp && cp <= 129374 || 129375 <= cp && cp <= 129387 || 129388 <= cp && cp <= 129392 || cp === 129393 || cp === 129394 || 129395 <= cp && cp <= 129398 || 129399 <= cp && cp <= 129400 || cp === 129401 || cp === 129402 || cp === 129403 || 129404 <= cp && cp <= 129407 || 129408 <= cp && cp <= 129412 || 129413 <= cp && cp <= 129425 || 129426 <= cp && cp <= 129431 || 129432 <= cp && cp <= 129442 || 129443 <= cp && cp <= 129444 || 129445 <= cp && cp <= 129450 || 129451 <= cp && cp <= 129453 || 129454 <= cp && cp <= 129455 || 129456 <= cp && cp <= 129465 || 
129466 <= cp && cp <= 129471 || cp === 129472 || 129473 <= cp && cp <= 129474 || 129475 <= cp && cp <= 129482 || cp === 129483 || cp === 129484 || 129485 <= cp && cp <= 129487 || 129488 <= cp && cp <= 129510 || 129511 <= cp && cp <= 129535 || 129648 <= cp && cp <= 129651 || cp === 129652 || 129653 <= cp && cp <= 129655 || 129656 <= cp && cp <= 129658 || 129659 <= cp && cp <= 129660 || 129664 <= cp && cp <= 129666 || 129667 <= cp && cp <= 129670 || 129671 <= cp && cp <= 129672 || cp === 129673 || cp === 129679 || 129680 <= cp && cp <= 129685 || 129686 <= cp && cp <= 129704 || 129705 <= cp && cp <= 129708 || 129709 <= cp && cp <= 129711 || 129712 <= cp && cp <= 129718 || 129719 <= cp && cp <= 129722 || 129723 <= cp && cp <= 129725 || cp === 129726 || cp === 129727 || 129728 <= cp && cp <= 129730 || 129731 <= cp && cp <= 129733 || cp === 129734 || 129742 <= cp && cp <= 129743 || 129744 <= cp && cp <= 129750 || 129751 <= cp && cp <= 129753 || 129754 <= cp && cp <= 129755 || cp === 129756 || cp === 129759 || 129760 <= cp && cp <= 129767 || cp === 129768 || cp === 129769 || 129776 <= cp && cp <= 129782 || 129783 <= cp && cp <= 129784;
|
|
15
|
+
}
|
|
16
|
+
return this.fn(uc);
|
|
17
|
+
}.bind({
|
|
18
|
+
fn: null
|
|
19
|
+
});
|
|
20
|
+
function cjkOrIvs(uc) {
|
|
21
|
+
if (!uc || uc < 0) {
|
|
22
|
+
return false;
|
|
23
|
+
}
|
|
24
|
+
const eaw = eastAsianWidthType(uc);
|
|
25
|
+
switch (eaw) {
|
|
26
|
+
case "fullwidth":
|
|
27
|
+
case "halfwidth":
|
|
28
|
+
return true;
|
|
29
|
+
// never be emoji
|
|
30
|
+
case "wide":
|
|
31
|
+
return !isEmoji(uc);
|
|
32
|
+
case "narrow":
|
|
33
|
+
return false;
|
|
34
|
+
case "ambiguous":
|
|
35
|
+
return 917760 <= uc && uc <= 917999 ? null : false;
|
|
36
|
+
case "neutral":
|
|
37
|
+
return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
|
|
38
|
+
}
|
|
39
|
+
}
|
|
40
|
+
var svsFollowingCjk = regexCheck(/[\uFE00-\uFE02\uFE0E]/u);
|
|
41
|
+
var unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
|
|
42
|
+
var unicodeWhitespace = regexCheck(/\s/);
|
|
43
|
+
function regexCheck(regex) {
|
|
44
|
+
return check;
|
|
45
|
+
function check(code) {
|
|
46
|
+
return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
|
|
47
|
+
}
|
|
48
|
+
}
|
|
49
|
+
export {
|
|
50
|
+
cjkOrIvs,
|
|
51
|
+
svsFollowingCjk,
|
|
52
|
+
unicodePunctuation,
|
|
53
|
+
unicodeWhitespace
|
|
54
|
+
};
|
|
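The `regexCheck` helper in `characterWithNonBmp.js` turns a regular expression into a predicate over micromark character codes, treating `null` (end of file) and negative codes as non-matching. A standalone sketch of the same idea is shown here. The regular expressions and the numeric IVS range are taken from the diff, while `isIvsCodePoint` is a hypothetical helper added only to restate that range in hexadecimal.

```javascript
/** Wraps a regex so it can be tested against a numeric character code. */
function regexCheck(regex) {
  return (code) =>
    code !== null && code > -1 && regex.test(String.fromCodePoint(code));
}

// Same patterns as in the bundled file.
const unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
const svsFollowingCjk = regexCheck(/[\uFE00-\uFE02\uFE0E]/u);

// Hypothetical helper: 917760..917999 in the diff is U+E0100..U+E01EF.
const isIvsCodePoint = (code) =>
  code !== null && 0xe0100 <= code && code <= 0xe01ef;
```

Note that `unicodePunctuation` also matches symbols (`\p{S}`), so a character such as `+` is reported as punctuation.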
@@ -1,6 +1,7 @@
|
|
|
1
|
-
import { constants } from
|
|
2
|
-
import
|
|
3
|
-
|
|
1
|
+
import { constants } from 'micromark-util-symbol';
|
|
2
|
+
import { Code } from 'micromark-util-types';
|
|
3
|
+
|
|
4
|
+
declare namespace constantsEx {
|
|
4
5
|
const spaceOrPunctuation: 3;
|
|
5
6
|
const cjk: 4096;
|
|
6
7
|
const cjkPunctuation: 4098;
|
|
@@ -23,4 +24,6 @@ export declare namespace constantsEx {
|
|
|
23
24
|
* @returns
|
|
24
25
|
* Group.
|
|
25
26
|
*/
|
|
26
|
-
|
|
27
|
+
declare function classifyCharacter(code: Code): typeof constants.characterGroupWhitespace | typeof constants.characterGroupPunctuation | typeof constantsEx.cjk | typeof constantsEx.cjkPunctuation | typeof constantsEx.ivs | typeof constantsEx.svsFollowingCjk | 0;
|
|
28
|
+
|
|
29
|
+
export { classifyCharacter, constantsEx };
|
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
// src/classifyCharacter.ts
|
|
2
|
+
import { markdownLineEndingOrSpace } from "micromark-util-character";
|
|
3
|
+
import { constants, codes } from "micromark-util-symbol";
|
|
4
|
+
|
|
5
|
+
// src/characterWithNonBmp.ts
|
|
6
|
+
import { eastAsianWidthType } from "get-east-asian-width";
|
|
7
|
+
var isEmoji = function(uc) {
|
|
8
|
+
if (this.fn !== null) {
|
|
9
|
+
return this.fn(uc);
|
|
10
|
+
}
|
|
11
|
+
try {
|
|
12
|
+
const regex = new RegExp("^\\p{RGI_Emoji}", "v");
|
|
13
|
+
this.fn = (uc_) => regex.test(String.fromCodePoint(uc_));
|
|
14
|
+
} catch (e) {
|
|
15
|
+
if (!(e instanceof SyntaxError)) {
|
|
16
|
+
throw e;
|
|
17
|
+
}
|
|
18
|
+
this.fn = (cp) => 8986 <= cp && cp <= 8987 || 9193 <= cp && cp <= 9196 || cp === 9200 || cp === 9203 || 9725 <= cp && cp <= 9726 || 9748 <= cp && cp <= 9749 || 9800 <= cp && cp <= 9811 || cp === 9855 || cp === 9875 || cp === 9889 || 9898 <= cp && cp <= 9899 || 9917 <= cp && cp <= 9918 || 9924 <= cp && cp <= 9925 || cp === 9934 || cp === 9940 || cp === 9962 || 9970 <= cp && cp <= 9971 || cp === 9973 || cp === 9978 || cp === 9981 || cp === 9989 || 9994 <= cp && cp <= 9995 || cp === 10024 || cp === 10060 || cp === 10062 || 10067 <= cp && cp <= 10069 || cp === 10071 || 10133 <= cp && cp <= 10135 || cp === 10160 || cp === 10175 || 11035 <= cp && cp <= 11036 || cp === 11088 || cp === 11093 || cp === 126980 || cp === 127183 || cp === 127374 || 127377 <= cp && cp <= 127386 || cp === 127489 || cp === 127514 || cp === 127535 || 127538 <= cp && cp <= 127542 || 127544 <= cp && cp <= 127546 || 127568 <= cp && cp <= 127569 || 127744 <= cp && cp <= 127756 || 127757 <= cp && cp <= 127758 || cp === 127759 || cp === 127760 || cp === 127761 || cp === 127762 || 127763 <= cp && cp <= 127765 || 127766 <= cp && cp <= 127768 || cp === 127769 || cp === 127770 || cp === 127771 || cp === 127772 || 127773 <= cp && cp <= 127774 || 127775 <= cp && cp <= 127776 || 127789 <= cp && cp <= 127791 || 127792 <= cp && cp <= 127793 || 127794 <= cp && cp <= 127795 || 127796 <= cp && cp <= 127797 || 127799 <= cp && cp <= 127818 || cp === 127819 || 127820 <= cp && cp <= 127823 || cp === 127824 || 127825 <= cp && cp <= 127867 || cp === 127868 || 127870 <= cp && cp <= 127871 || 127872 <= cp && cp <= 127891 || 127904 <= cp && cp <= 127940 || cp === 127941 || cp === 127942 || cp === 127943 || cp === 127944 || cp === 127945 || cp === 127946 || 127951 <= cp && cp <= 127955 || 127968 <= cp && cp <= 127971 || cp === 127972 || 127973 <= cp && cp <= 127984 || cp === 127988 || 127992 <= cp && cp <= 128007 || cp === 128008 || 128009 <= cp && cp <= 128011 || 128012 <= cp && cp <= 128014 || 128015 <= cp && cp <= 128016 
|| 128017 <= cp && cp <= 128018 || cp === 128019 || cp === 128020 || cp === 128021 || cp === 128022 || 128023 <= cp && cp <= 128041 || cp === 128042 || 128043 <= cp && cp <= 128062 || cp === 128064 || 128066 <= cp && cp <= 128100 || cp === 128101 || 128102 <= cp && cp <= 128107 || 128108 <= cp && cp <= 128109 || 128110 <= cp && cp <= 128172 || cp === 128173 || 128174 <= cp && cp <= 128181 || 128182 <= cp && cp <= 128183 || 128184 <= cp && cp <= 128235 || 128236 <= cp && cp <= 128237 || cp === 128238 || cp === 128239 || 128240 <= cp && cp <= 128244 || cp === 128245 || 128246 <= cp && cp <= 128247 || cp === 128248 || 128249 <= cp && cp <= 128252 || 128255 <= cp && cp <= 128258 || cp === 128259 || 128260 <= cp && cp <= 128263 || cp === 128264 || cp === 128265 || 128266 <= cp && cp <= 128276 || cp === 128277 || 128278 <= cp && cp <= 128299 || 128300 <= cp && cp <= 128301 || 128302 <= cp && cp <= 128317 || 128331 <= cp && cp <= 128334 || 128336 <= cp && cp <= 128347 || 128348 <= cp && cp <= 128359 || cp === 128378 || 128405 <= cp && cp <= 128406 || cp === 128420 || 128507 <= cp && cp <= 128511 || cp === 128512 || 128513 <= cp && cp <= 128518 || 128519 <= cp && cp <= 128520 || 128521 <= cp && cp <= 128525 || cp === 128526 || cp === 128527 || cp === 128528 || cp === 128529 || 128530 <= cp && cp <= 128532 || cp === 128533 || cp === 128534 || cp === 128535 || cp === 128536 || cp === 128537 || cp === 128538 || cp === 128539 || 128540 <= cp && cp <= 128542 || cp === 128543 || 128544 <= cp && cp <= 128549 || 128550 <= cp && cp <= 128551 || 128552 <= cp && cp <= 128555 || cp === 128556 || cp === 128557 || 128558 <= cp && cp <= 128559 || 128560 <= cp && cp <= 128563 || cp === 128564 || cp === 128565 || cp === 128566 || 128567 <= cp && cp <= 128576 || 128577 <= cp && cp <= 128580 || 128581 <= cp && cp <= 128591 || cp === 128640 || 128641 <= cp && cp <= 128642 || 128643 <= cp && cp <= 128645 || cp === 128646 || cp === 128647 || cp === 128648 || cp === 128649 || 128650 <= cp && cp 
<= 128651 || cp === 128652 || cp === 128653 || cp === 128654 || cp === 128655 || cp === 128656 || 128657 <= cp && cp <= 128659 || cp === 128660 || cp === 128661 || cp === 128662 || cp === 128663 || cp === 128664 || 128665 <= cp && cp <= 128666 || 128667 <= cp && cp <= 128673 || cp === 128674 || cp === 128675 || 128676 <= cp && cp <= 128677 || cp === 128678 || 128679 <= cp && cp <= 128685 || 128686 <= cp && cp <= 128689 || cp === 128690 || 128691 <= cp && cp <= 128693 || cp === 128694 || 128695 <= cp && cp <= 128696 || 128697 <= cp && cp <= 128702 || cp === 128703 || cp === 128704 || 128705 <= cp && cp <= 128709 || cp === 128716 || cp === 128720 || 128721 <= cp && cp <= 128722 || cp === 128725 || 128726 <= cp && cp <= 128727 || cp === 128732 || 128733 <= cp && cp <= 128735 || 128747 <= cp && cp <= 128748 || 128756 <= cp && cp <= 128758 || 128759 <= cp && cp <= 128760 || cp === 128761 || cp === 128762 || 128763 <= cp && cp <= 128764 || 128992 <= cp && cp <= 129003 || cp === 129008 || cp === 129292 || 129293 <= cp && cp <= 129295 || 129296 <= cp && cp <= 129304 || 129305 <= cp && cp <= 129310 || cp === 129311 || 129312 <= cp && cp <= 129319 || 129320 <= cp && cp <= 129327 || cp === 129328 || 129329 <= cp && cp <= 129330 || 129331 <= cp && cp <= 129338 || 129340 <= cp && cp <= 129342 || cp === 129343 || 129344 <= cp && cp <= 129349 || 129351 <= cp && cp <= 129355 || cp === 129356 || 129357 <= cp && cp <= 129359 || 129360 <= cp && cp <= 129374 || 129375 <= cp && cp <= 129387 || 129388 <= cp && cp <= 129392 || cp === 129393 || cp === 129394 || 129395 <= cp && cp <= 129398 || 129399 <= cp && cp <= 129400 || cp === 129401 || cp === 129402 || cp === 129403 || 129404 <= cp && cp <= 129407 || 129408 <= cp && cp <= 129412 || 129413 <= cp && cp <= 129425 || 129426 <= cp && cp <= 129431 || 129432 <= cp && cp <= 129442 || 129443 <= cp && cp <= 129444 || 129445 <= cp && cp <= 129450 || 129451 <= cp && cp <= 129453 || 129454 <= cp && cp <= 129455 || 129456 <= cp && cp <= 129465 || 
129466 <= cp && cp <= 129471 || cp === 129472 || 129473 <= cp && cp <= 129474 || 129475 <= cp && cp <= 129482 || cp === 129483 || cp === 129484 || 129485 <= cp && cp <= 129487 || 129488 <= cp && cp <= 129510 || 129511 <= cp && cp <= 129535 || 129648 <= cp && cp <= 129651 || cp === 129652 || 129653 <= cp && cp <= 129655 || 129656 <= cp && cp <= 129658 || 129659 <= cp && cp <= 129660 || 129664 <= cp && cp <= 129666 || 129667 <= cp && cp <= 129670 || 129671 <= cp && cp <= 129672 || cp === 129673 || cp === 129679 || 129680 <= cp && cp <= 129685 || 129686 <= cp && cp <= 129704 || 129705 <= cp && cp <= 129708 || 129709 <= cp && cp <= 129711 || 129712 <= cp && cp <= 129718 || 129719 <= cp && cp <= 129722 || 129723 <= cp && cp <= 129725 || cp === 129726 || cp === 129727 || 129728 <= cp && cp <= 129730 || 129731 <= cp && cp <= 129733 || cp === 129734 || 129742 <= cp && cp <= 129743 || 129744 <= cp && cp <= 129750 || 129751 <= cp && cp <= 129753 || 129754 <= cp && cp <= 129755 || cp === 129756 || cp === 129759 || 129760 <= cp && cp <= 129767 || cp === 129768 || cp === 129769 || 129776 <= cp && cp <= 129782 || 129783 <= cp && cp <= 129784;
|
|
19
|
+
}
|
|
20
|
+
return this.fn(uc);
|
|
21
|
+
}.bind({
|
|
22
|
+
fn: null
|
|
23
|
+
});
|
|
24
|
+
function cjkOrIvs(uc) {
|
|
25
|
+
if (!uc || uc < 0) {
|
|
26
|
+
return false;
|
|
27
|
+
}
|
|
28
|
+
const eaw = eastAsianWidthType(uc);
|
|
29
|
+
switch (eaw) {
|
|
30
|
+
case "fullwidth":
|
|
31
|
+
case "halfwidth":
|
|
32
|
+
return true;
|
|
33
|
+
// never be emoji
|
|
34
|
+
case "wide":
|
|
35
|
+
return !isEmoji(uc);
|
|
36
|
+
case "narrow":
|
|
37
|
+
return false;
|
|
38
|
+
case "ambiguous":
|
|
39
|
+
return 917760 <= uc && uc <= 917999 ? null : false;
|
|
40
|
+
case "neutral":
|
|
41
|
+
return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
|
|
42
|
+
}
|
|
43
|
+
}
|
|
44
|
+
var svsFollowingCjk = regexCheck(/[\uFE00-\uFE02\uFE0E]/u);
|
|
45
|
+
var unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
|
|
46
|
+
var unicodeWhitespace = regexCheck(/\s/);
|
|
47
|
+
function regexCheck(regex) {
|
|
48
|
+
return check;
|
|
49
|
+
function check(code) {
|
|
50
|
+
return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
|
|
51
|
+
}
|
|
52
|
+
}
|
|
53
|
+
|
|
54
|
+
// src/classifyCharacter.ts
|
|
55
|
+
var constantsEx;
|
|
56
|
+
((constantsEx2) => {
|
|
57
|
+
constantsEx2.spaceOrPunctuation = 3;
|
|
58
|
+
constantsEx2.cjk = 4096;
|
|
59
|
+
constantsEx2.cjkPunctuation = 4098;
|
|
60
|
+
constantsEx2.ivs = 8192;
|
|
61
|
+
constantsEx2.cjkOrIvs = 12288;
|
|
62
|
+
constantsEx2.svsFollowingCjk = 16384;
|
|
63
|
+
constantsEx2.variationSelector = 28672;
|
|
64
|
+
})(constantsEx || (constantsEx = {}));
|
|
65
|
+
function classifyCharacter(code) {
|
|
66
|
+
if (code === codes.eof || markdownLineEndingOrSpace(code) || unicodeWhitespace(code)) {
|
|
67
|
+
return constants.characterGroupWhitespace;
|
|
68
|
+
}
|
|
69
|
+
let value = 0;
|
|
70
|
+
if (code >= 4352) {
|
|
71
|
+
if (svsFollowingCjk(code)) {
|
|
72
|
+
return constantsEx.svsFollowingCjk;
|
|
73
|
+
}
|
|
74
|
+
switch (cjkOrIvs(code)) {
|
|
75
|
+
case null:
|
|
76
|
+
return constantsEx.ivs;
|
|
77
|
+
case true:
|
|
78
|
+
value |= constantsEx.cjk;
|
|
79
|
+
break;
|
|
80
|
+
}
|
|
81
|
+
}
|
|
82
|
+
if (unicodePunctuation(code)) {
|
|
83
|
+
value |= constants.characterGroupPunctuation;
|
|
84
|
+
}
|
|
85
|
+
return value;
|
|
86
|
+
}
|
|
87
|
+
export {
|
|
88
|
+
classifyCharacter,
|
|
89
|
+
constantsEx
|
|
90
|
+
};
|
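The `constantsEx` values in this file are easier to read as bit flags. The sketch below restates them in hexadecimal and re-implements three of the category predicates from `dist/index.js` on top of them. It assumes that `constants.characterGroupWhitespace` is `1` and `constants.characterGroupPunctuation` is `2` in `micromark-util-symbol`; those two values are not shown in this diff, so treat them as an assumption. The sample characters in the comments are illustrative.

```javascript
// Assumed values of micromark-util-symbol's character group constants
// (not shown in this diff).
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

// Same numbers as the decimal literals in the diff, written in hex.
const constantsEx = {
  spaceOrPunctuation: 0x0003, // whitespace | punctuation
  cjk: 0x1000,
  cjkPunctuation: 0x1002, // cjk | punctuation
  ivs: 0x2000,
  cjkOrIvs: 0x3000, // cjk | ivs
  svsFollowingCjk: 0x4000,
  variationSelector: 0x7000
};

// Predicates with the same bodies as in dist/index.js.
const isCjk = (category) => Boolean(category & constantsEx.cjk);
const isNonCjkPunctuation = (category) =>
  (category & constantsEx.cjkPunctuation) === characterGroupPunctuation;
const isSpaceOrPunctuation = (category) =>
  Boolean(category & constantsEx.spaceOrPunctuation);

// Hand-built category values, combined the way classifyCharacter combines them.
const asciiPunctuation = characterGroupPunctuation; // e.g. "*"
const cjkPunctuation = constantsEx.cjk | characterGroupPunctuation; // e.g. "。"
const cjkLetter = constantsEx.cjk; // e.g. "漢"

console.log(isNonCjkPunctuation(asciiPunctuation)); // true
console.log(isNonCjkPunctuation(cjkPunctuation)); // false
console.log(isCjk(cjkPunctuation), isSpaceOrPunctuation(cjkPunctuation)); // true true
console.log(isSpaceOrPunctuation(cjkLetter)); // false
```

Masking with `cjkPunctuation` and comparing against the plain punctuation flag is what lets `isNonCjkPunctuation` reject a value that carries both the punctuation bit and the CJK bit.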
package/dist/codeUtil.d.ts
CHANGED
|
@@ -1,4 +1,5 @@
|
|
|
1
|
-
import
|
|
1
|
+
import { Code, Point, TokenizeContext } from 'micromark-util-types';
|
|
2
|
+
|
|
2
3
|
/**
|
|
3
4
|
* Check if the given code is a [High-Surrogate Code Unit](https://www.unicode.org/glossary/#high_surrogate_code_unit).
|
|
4
5
|
*
|
|
@@ -7,7 +8,7 @@ import type { Code, Point, TokenizeContext } from "micromark-util-types";
|
|
|
7
8
|
* @param code Code.
|
|
8
9
|
* @returns `true` if the code is a High-Surrogate Code Unit, `false` otherwise.
|
|
9
10
|
*/
|
|
10
|
-
|
|
11
|
+
declare function isCodeHighSurrogate(code: Code): code is Exclude<Code, null>;
|
|
11
12
|
/**
|
|
12
13
|
* Check if the given code is a [Low-Surrogate Code Unit](https://www.unicode.org/glossary/#low_surrogate_code_unit).
|
|
13
14
|
*
|
|
@@ -17,7 +18,7 @@ export declare function isCodeHighSurrogate(code: Code): code is Exclude<Code, n
|
|
|
17
18
|
* @returns
|
|
18
19
|
* True if the code is a Low-Surrogate Code Unit, false otherwise.
|
|
19
20
|
*/
|
|
20
|
-
|
|
21
|
+
declare function isCodeLowSurrogate(code: Code): code is Exclude<Code, null>;
|
|
21
22
|
/**
|
|
22
23
|
* If `code` is a [Low-Surrogate Code Unit](https://www.unicode.org/glossary/#low_surrogate_code_unit), try to get a genuine previous [Unicode Scalar Value](https://www.unicode.org/glossary/#unicode_scalar_value) corresponding to the Low-Surrogate Code Unit.
|
|
23
24
|
* @param code a tentative previous [code unit](https://www.unicode.org/glossary/#code_unit) less than 65,536, including a Low-Surrogate one
|
|
@@ -25,7 +26,7 @@ export declare function isCodeLowSurrogate(code: Code): code is Exclude<Code, nu
|
|
|
25
26
|
* @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
|
|
26
27
|
* @returns a value greater than 65,535 if the previous code point represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), or `code` otherwise
|
|
27
28
|
*/
|
|
28
|
-
|
|
29
|
+
declare function tryGetGenuinePreviousCode(code: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]): Exclude<Code, null>;
|
|
29
30
|
/**
|
|
30
31
|
* Try to get the [Unicode Code Point](https://www.unicode.org/glossary/#code_point) two positions before the current position.
|
|
31
32
|
*
|
|
@@ -34,7 +35,36 @@ export declare function tryGetGenuinePreviousCode(code: Exclude<Code, null>, now
|
|
|
34
35
|
* @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
|
|
35
36
|
* @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
|
|
36
37
|
*/
|
|
37
|
-
|
|
38
|
+
declare function tryGetCodeTwoBefore(previousCode: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]): Code;
|
|
39
|
+
/**
|
|
40
|
+
* Lazily get the [Unicode Code Point](https://www.unicode.org/glossary/#code_point) two positions before the current position only if necessary.
|
|
41
|
+
*
|
|
42
|
+
* @see {@link tryGetCodeTwoBefore}
|
|
43
|
+
*/
|
|
44
|
+
declare class TwoPreviousCode {
|
|
45
|
+
readonly previousCode: Exclude<Code, null>;
|
|
46
|
+
readonly nowPoint: Point;
|
|
47
|
+
readonly sliceSerialize: TokenizeContext["sliceSerialize"];
|
|
48
|
+
private cachedValue;
|
|
49
|
+
/**
|
|
50
|
+
* @see {@link tryGetCodeTwoBefore}
|
|
51
|
+
*
|
|
52
|
+
* @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
|
|
53
|
+
* @param nowPoint `this.now()` (`this` = `TokenizeContext`)
|
|
54
|
+
* @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
|
|
55
|
+
*/
|
|
56
|
+
constructor(previousCode: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]);
|
|
57
|
+
/**
|
|
58
|
+
* Returns the return value of {@link tryGetCodeTwoBefore}.
|
|
59
|
+
*
|
|
60
|
+
* If the value has not been computed yet, it will be computed and cached.
|
|
61
|
+
*
|
|
62
|
+
* @see {@link tryGetCodeTwoBefore}
|
|
63
|
+
*
|
|
64
|
+
* @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
|
|
65
|
+
*/
|
|
66
|
+
value(): Code;
|
|
67
|
+
}
|
|
38
68
|
/**
|
|
39
69
|
* If `code` is a [High-Surrogate Code Unit](https://www.unicode.org/glossary/#high_surrogate_code_unit), try to get a genuine next [Unicode Scalar Value](https://www.unicode.org/glossary/#unicode_scalar_value) corresponding to the High-Surrogate Code Unit.
|
|
40
70
|
* @param code a tentative next [code unit](https://www.unicode.org/glossary/#code_unit) less than 65,536, including a High-Surrogate one
|
|
@@ -42,4 +72,6 @@ export declare function tryGetCodeTwoBefore(previousCode: Exclude<Code, null>, n
|
|
|
42
72
|
* @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
|
|
43
73
|
* @returns a value greater than 65,535 if the next code point represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), or `code` otherwise
|
|
44
74
|
*/
|
|
45
|
-
|
|
75
|
+
declare function tryGetGenuineNextCode(code: Exclude<Code, null>, nowPoint: Point, sliceSerialize: TokenizeContext["sliceSerialize"]): Exclude<Code, null>;
|
|
76
|
+
|
|
77
|
+
export { TwoPreviousCode, isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode };
|
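The `TwoPreviousCode` class declared above defers the lookup until `value()` is first called and then reuses the result. A minimal sketch of that caching pattern follows. `LazyCode` and its `compute` callback are hypothetical names used only for illustration; `compute` stands in for the call to `tryGetCodeTwoBefore`.

```javascript
// Generic version of the caching in TwoPreviousCode#value(); `compute` is a
// hypothetical stand-in for the call to tryGetCodeTwoBefore.
class LazyCode {
  constructor(compute) {
    this.compute = compute;
    this.cachedValue = undefined;
  }

  value() {
    // `undefined` means "not computed yet"; `null` is a legitimate cached result.
    if (this.cachedValue === undefined) {
      this.cachedValue = this.compute();
    }
    return this.cachedValue;
  }
}

let calls = 0;
const lazy = new LazyCode(() => {
  calls += 1;
  return null; // "not found", the result tryGetCodeTwoBefore documents
});

console.log(calls); // 0: nothing is computed until value() is called
lazy.value();
lazy.value();
console.log(calls); // 1: the second call reuses the cached null
```

Using `undefined` rather than `null` as the "not yet computed" marker matters here, because `null` is one of the documented return values and must be cached like any other.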
package/dist/codeUtil.js
ADDED
|
@@ -0,0 +1,104 @@
|
|
|
1
|
+
var __defProp = Object.defineProperty;
|
|
2
|
+
var __defNormalProp = (obj, key, value) => key in obj ? __defProp(obj, key, { enumerable: true, configurable: true, writable: true, value }) : obj[key] = value;
|
|
3
|
+
var __publicField = (obj, key, value) => __defNormalProp(obj, typeof key !== "symbol" ? key + "" : key, value);
|
|
4
|
+
|
|
5
|
+
// src/codeUtil.ts
|
|
6
|
+
function isCodeHighSurrogate(code) {
|
|
7
|
+
return Boolean(code && code >= 55296 && code <= 56319);
|
|
8
|
+
}
|
|
9
|
+
function isCodeLowSurrogate(code) {
|
|
10
|
+
return Boolean(code && code >= 56320 && code <= 57343);
|
|
11
|
+
}
|
|
12
|
+
function tryGetGenuinePreviousCode(code, nowPoint, sliceSerialize) {
|
|
13
|
+
if (nowPoint._bufferIndex < 2) {
|
|
14
|
+
return code;
|
|
15
|
+
}
|
|
16
|
+
const previousBuffer = sliceSerialize({
|
|
17
|
+
// take 2 characters (code units)
|
|
18
|
+
start: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex - 2 },
|
|
19
|
+
end: nowPoint
|
|
20
|
+
});
|
|
21
|
+
const previousCandidate = previousBuffer.codePointAt(0);
|
|
22
|
+
return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
|
|
23
|
+
}
|
|
24
|
+
function tryGetCodeTwoBefore(previousCode, nowPoint, sliceSerialize) {
|
|
25
|
+
const previousWidth = previousCode >= 65536 ? 2 : 1;
|
|
26
|
+
if (nowPoint._bufferIndex < 1 + previousWidth) {
|
|
27
|
+
return null;
|
|
28
|
+
}
|
|
29
|
+
const idealStart = nowPoint._bufferIndex - previousWidth - 2;
|
|
30
|
+
const twoPreviousBuffer = sliceSerialize({
|
|
31
|
+
// take 1 or 2 code units
|
|
32
|
+
start: {
|
|
33
|
+
...nowPoint,
|
|
34
|
+
_bufferIndex: idealStart >= 0 ? idealStart : 0
|
|
35
|
+
},
|
|
36
|
+
end: {
|
|
37
|
+
...nowPoint,
|
|
38
|
+
_bufferIndex: nowPoint._bufferIndex - previousWidth
|
|
39
|
+
}
|
|
40
|
+
});
|
|
41
|
+
const twoPreviousLast = twoPreviousBuffer.charCodeAt(
|
|
42
|
+
twoPreviousBuffer.length - 1
|
|
43
|
+
);
|
|
44
|
+
if (Number.isNaN(twoPreviousLast)) {
|
|
45
|
+
return null;
|
|
46
|
+
}
|
|
47
|
+
if (twoPreviousBuffer.length < 2 || twoPreviousLast < 56320 || 57343 < twoPreviousLast) {
|
|
48
|
+
return twoPreviousLast;
|
|
49
|
+
}
|
|
50
|
+
const twoPreviousCandidate = twoPreviousBuffer.codePointAt(0);
|
|
51
|
+
if (twoPreviousCandidate && twoPreviousCandidate >= 65536) {
|
|
52
|
+
return twoPreviousCandidate;
|
|
53
|
+
}
|
|
54
|
+
return twoPreviousLast;
|
|
55
|
+
}
|
|
56
|
+
var TwoPreviousCode = class {
|
|
57
|
+
/**
|
|
58
|
+
* @see {@link tryGetCodeTwoBefore}
|
|
59
|
+
*
|
|
60
|
+
* @param previousCode a previous code point. Should be greater than 65,535 if it represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character).
|
|
61
|
+
* @param nowPoint `this.now()` (`this` = `TokenizeContext`)
|
|
62
|
+
* @param sliceSerialize `this.sliceSerialize` (`this` = `TokenizeContext`)
|
|
63
|
+
*/
|
|
64
|
+
constructor(previousCode, nowPoint, sliceSerialize) {
|
|
65
|
+
this.previousCode = previousCode;
|
|
66
|
+
this.nowPoint = nowPoint;
|
|
67
|
+
this.sliceSerialize = sliceSerialize;
|
|
68
|
+
__publicField(this, "cachedValue");
|
|
69
|
+
}
|
|
70
|
+
/**
|
|
71
|
+
* Returns the return value of {@link tryGetCodeTwoBefore}.
|
|
72
|
+
*
|
|
73
|
+
* If the value has not been computed yet, it will be computed and cached.
|
|
74
|
+
*
|
|
75
|
+
* @see {@link tryGetCodeTwoBefore}
|
|
76
|
+
*
|
|
77
|
+
* @returns a value greater than 65,535 if the code point two positions before represents a [Supplementary Character](https://www.unicode.org/glossary/#supplementary_character), a value less than 65,536 for a [BMP Character](https://www.unicode.org/glossary/#bmp_character), or `null` if not found
|
|
78
|
+
*/
|
|
79
|
+
value() {
|
|
80
|
+
if (this.cachedValue === void 0) {
|
|
81
|
+
this.cachedValue = tryGetCodeTwoBefore(
|
|
82
|
+
this.previousCode,
|
|
83
|
+
this.nowPoint,
|
|
84
|
+
this.sliceSerialize
|
|
85
|
+
);
|
|
86
|
+
}
|
|
87
|
+
return this.cachedValue;
|
|
88
|
+
}
|
|
89
|
+
};
|
|
90
|
+
function tryGetGenuineNextCode(code, nowPoint, sliceSerialize) {
|
|
91
|
+
const nextCandidate = sliceSerialize({
|
|
92
|
+
start: nowPoint,
|
|
93
|
+
end: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex + 2 }
|
|
94
|
+
}).codePointAt(0);
|
|
95
|
+
return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
|
|
96
|
+
}
|
|
97
|
+
export {
|
|
98
|
+
TwoPreviousCode,
|
|
99
|
+
isCodeHighSurrogate,
|
|
100
|
+
isCodeLowSurrogate,
|
|
101
|
+
tryGetCodeTwoBefore,
|
|
102
|
+
tryGetGenuineNextCode,
|
|
103
|
+
tryGetGenuinePreviousCode
|
|
104
|
+
};
|
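To see what `tryGetGenuinePreviousCode` does with a surrogate pair, the sketch below copies the function from the diff and runs it against a mock. `makeSliceSerialize` is a hypothetical stand-in for `TokenizeContext.sliceSerialize`: it assumes the whole input is one string chunk and that `_bufferIndex` is a UTF-16 offset into it, which is a simplification of how micromark stores its input.

```javascript
// Copied from dist/codeUtil.js in the diff above.
function tryGetGenuinePreviousCode(code, nowPoint, sliceSerialize) {
  if (nowPoint._bufferIndex < 2) {
    return code;
  }
  const previousBuffer = sliceSerialize({
    // take 2 characters (code units)
    start: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex - 2 },
    end: nowPoint
  });
  const previousCandidate = previousBuffer.codePointAt(0);
  return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
}

// Hypothetical mock: one string chunk, `_bufferIndex` used as a UTF-16 offset.
function makeSliceSerialize(text) {
  return ({ start, end }) => text.slice(start._bufferIndex, end._bufferIndex);
}

const text = "a\u{20BB7}*"; // U+20BB7 is stored as the surrogate pair D842 DFB7
const sliceSerialize = makeSliceSerialize(text);
// The tokenizer is positioned at "*", i.e. UTF-16 offset 3.
const nowPoint = { line: 1, column: 4, offset: 3, _index: 0, _bufferIndex: 3 };

// The tentative previous code is only the trailing code unit (a low surrogate).
const previous = text.charCodeAt(2);
const genuine = tryGetGenuinePreviousCode(previous, nowPoint, sliceSerialize);

console.log(previous.toString(16)); // dfb7
console.log(genuine.toString(16)); // 20bb7
```

When the two code units before the current position do not form a surrogate pair, `codePointAt(0)` yields a value below 65,536 and the function returns `code` unchanged.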
package/dist/index.d.ts
CHANGED
|
@@ -1,3 +1,5 @@
|
|
|
1
|
-
export { isCjk, isIvs, isNonCjkPunctuation, isSpaceOrPunctuation, isSvsFollowingCjk, isUnicodeWhitespace
|
|
2
|
-
export { classifyCharacter, constantsEx } from
|
|
3
|
-
export { isCodeHighSurrogate, isCodeLowSurrogate, tryGetGenuineNextCode, tryGetGenuinePreviousCode
|
|
1
|
+
export { isCjk, isIvs, isNonCjkPunctuation, isSpaceOrPunctuation, isSvsFollowingCjk, isUnicodeWhitespace } from './categoryUtil.js';
|
|
2
|
+
export { classifyCharacter, constantsEx } from './classifyCharacter.js';
|
|
3
|
+
export { isCodeHighSurrogate, isCodeLowSurrogate, tryGetCodeTwoBefore, tryGetGenuineNextCode, tryGetGenuinePreviousCode } from './codeUtil.js';
|
|
4
|
+
import 'micromark-util-symbol';
|
|
5
|
+
import 'micromark-util-types';
|
package/dist/index.js
CHANGED
|
@@ -1,123 +1,183 @@
|
|
|
1
|
-
|
|
2
|
-
import
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
1
|
+
// src/categoryUtil.ts
|
|
2
|
+
import { constants as constants2 } from "micromark-util-symbol";
|
|
3
|
+
|
|
4
|
+
// src/classifyCharacter.ts
|
|
5
|
+
import { markdownLineEndingOrSpace } from "micromark-util-character";
|
|
6
|
+
import { constants, codes } from "micromark-util-symbol";
|
|
7
|
+
|
|
8
|
+
// src/characterWithNonBmp.ts
|
|
9
|
+
import { eastAsianWidthType } from "get-east-asian-width";
|
|
10
|
+
var isEmoji = function(uc) {
|
|
11
|
+
if (this.fn !== null) {
|
|
12
|
+
return this.fn(uc);
|
|
13
|
+
}
|
|
14
|
+
try {
|
|
15
|
+
const regex = new RegExp("^\\p{RGI_Emoji}", "v");
|
|
16
|
+
this.fn = (uc_) => regex.test(String.fromCodePoint(uc_));
|
|
17
|
+
} catch (e) {
|
|
18
|
+
if (!(e instanceof SyntaxError)) {
|
|
19
|
+
throw e;
|
|
19
20
|
}
|
|
21
|
+
this.fn = (cp) => 8986 <= cp && cp <= 8987 || 9193 <= cp && cp <= 9196 || cp === 9200 || cp === 9203 || 9725 <= cp && cp <= 9726 || 9748 <= cp && cp <= 9749 || 9800 <= cp && cp <= 9811 || cp === 9855 || cp === 9875 || cp === 9889 || 9898 <= cp && cp <= 9899 || 9917 <= cp && cp <= 9918 || 9924 <= cp && cp <= 9925 || cp === 9934 || cp === 9940 || cp === 9962 || 9970 <= cp && cp <= 9971 || cp === 9973 || cp === 9978 || cp === 9981 || cp === 9989 || 9994 <= cp && cp <= 9995 || cp === 10024 || cp === 10060 || cp === 10062 || 10067 <= cp && cp <= 10069 || cp === 10071 || 10133 <= cp && cp <= 10135 || cp === 10160 || cp === 10175 || 11035 <= cp && cp <= 11036 || cp === 11088 || cp === 11093 || cp === 126980 || cp === 127183 || cp === 127374 || 127377 <= cp && cp <= 127386 || cp === 127489 || cp === 127514 || cp === 127535 || 127538 <= cp && cp <= 127542 || 127544 <= cp && cp <= 127546 || 127568 <= cp && cp <= 127569 || 127744 <= cp && cp <= 127756 || 127757 <= cp && cp <= 127758 || cp === 127759 || cp === 127760 || cp === 127761 || cp === 127762 || 127763 <= cp && cp <= 127765 || 127766 <= cp && cp <= 127768 || cp === 127769 || cp === 127770 || cp === 127771 || cp === 127772 || 127773 <= cp && cp <= 127774 || 127775 <= cp && cp <= 127776 || 127789 <= cp && cp <= 127791 || 127792 <= cp && cp <= 127793 || 127794 <= cp && cp <= 127795 || 127796 <= cp && cp <= 127797 || 127799 <= cp && cp <= 127818 || cp === 127819 || 127820 <= cp && cp <= 127823 || cp === 127824 || 127825 <= cp && cp <= 127867 || cp === 127868 || 127870 <= cp && cp <= 127871 || 127872 <= cp && cp <= 127891 || 127904 <= cp && cp <= 127940 || cp === 127941 || cp === 127942 || cp === 127943 || cp === 127944 || cp === 127945 || cp === 127946 || 127951 <= cp && cp <= 127955 || 127968 <= cp && cp <= 127971 || cp === 127972 || 127973 <= cp && cp <= 127984 || cp === 127988 || 127992 <= cp && cp <= 128007 || cp === 128008 || 128009 <= cp && cp <= 128011 || 128012 <= cp && cp <= 128014 || 128015 <= cp && cp <= 128016 
|| 128017 <= cp && cp <= 128018 || cp === 128019 || cp === 128020 || cp === 128021 || cp === 128022 || 128023 <= cp && cp <= 128041 || cp === 128042 || 128043 <= cp && cp <= 128062 || cp === 128064 || 128066 <= cp && cp <= 128100 || cp === 128101 || 128102 <= cp && cp <= 128107 || 128108 <= cp && cp <= 128109 || 128110 <= cp && cp <= 128172 || cp === 128173 || 128174 <= cp && cp <= 128181 || 128182 <= cp && cp <= 128183 || 128184 <= cp && cp <= 128235 || 128236 <= cp && cp <= 128237 || cp === 128238 || cp === 128239 || 128240 <= cp && cp <= 128244 || cp === 128245 || 128246 <= cp && cp <= 128247 || cp === 128248 || 128249 <= cp && cp <= 128252 || 128255 <= cp && cp <= 128258 || cp === 128259 || 128260 <= cp && cp <= 128263 || cp === 128264 || cp === 128265 || 128266 <= cp && cp <= 128276 || cp === 128277 || 128278 <= cp && cp <= 128299 || 128300 <= cp && cp <= 128301 || 128302 <= cp && cp <= 128317 || 128331 <= cp && cp <= 128334 || 128336 <= cp && cp <= 128347 || 128348 <= cp && cp <= 128359 || cp === 128378 || 128405 <= cp && cp <= 128406 || cp === 128420 || 128507 <= cp && cp <= 128511 || cp === 128512 || 128513 <= cp && cp <= 128518 || 128519 <= cp && cp <= 128520 || 128521 <= cp && cp <= 128525 || cp === 128526 || cp === 128527 || cp === 128528 || cp === 128529 || 128530 <= cp && cp <= 128532 || cp === 128533 || cp === 128534 || cp === 128535 || cp === 128536 || cp === 128537 || cp === 128538 || cp === 128539 || 128540 <= cp && cp <= 128542 || cp === 128543 || 128544 <= cp && cp <= 128549 || 128550 <= cp && cp <= 128551 || 128552 <= cp && cp <= 128555 || cp === 128556 || cp === 128557 || 128558 <= cp && cp <= 128559 || 128560 <= cp && cp <= 128563 || cp === 128564 || cp === 128565 || cp === 128566 || 128567 <= cp && cp <= 128576 || 128577 <= cp && cp <= 128580 || 128581 <= cp && cp <= 128591 || cp === 128640 || 128641 <= cp && cp <= 128642 || 128643 <= cp && cp <= 128645 || cp === 128646 || cp === 128647 || cp === 128648 || cp === 128649 || 128650 <= cp && cp 
<= 128651 || cp === 128652 || cp === 128653 || cp === 128654 || cp === 128655 || cp === 128656 || 128657 <= cp && cp <= 128659 || cp === 128660 || cp === 128661 || cp === 128662 || cp === 128663 || cp === 128664 || 128665 <= cp && cp <= 128666 || 128667 <= cp && cp <= 128673 || cp === 128674 || cp === 128675 || 128676 <= cp && cp <= 128677 || cp === 128678 || 128679 <= cp && cp <= 128685 || 128686 <= cp && cp <= 128689 || cp === 128690 || 128691 <= cp && cp <= 128693 || cp === 128694 || 128695 <= cp && cp <= 128696 || 128697 <= cp && cp <= 128702 || cp === 128703 || cp === 128704 || 128705 <= cp && cp <= 128709 || cp === 128716 || cp === 128720 || 128721 <= cp && cp <= 128722 || cp === 128725 || 128726 <= cp && cp <= 128727 || cp === 128732 || 128733 <= cp && cp <= 128735 || 128747 <= cp && cp <= 128748 || 128756 <= cp && cp <= 128758 || 128759 <= cp && cp <= 128760 || cp === 128761 || cp === 128762 || 128763 <= cp && cp <= 128764 || 128992 <= cp && cp <= 129003 || cp === 129008 || cp === 129292 || 129293 <= cp && cp <= 129295 || 129296 <= cp && cp <= 129304 || 129305 <= cp && cp <= 129310 || cp === 129311 || 129312 <= cp && cp <= 129319 || 129320 <= cp && cp <= 129327 || cp === 129328 || 129329 <= cp && cp <= 129330 || 129331 <= cp && cp <= 129338 || 129340 <= cp && cp <= 129342 || cp === 129343 || 129344 <= cp && cp <= 129349 || 129351 <= cp && cp <= 129355 || cp === 129356 || 129357 <= cp && cp <= 129359 || 129360 <= cp && cp <= 129374 || 129375 <= cp && cp <= 129387 || 129388 <= cp && cp <= 129392 || cp === 129393 || cp === 129394 || 129395 <= cp && cp <= 129398 || 129399 <= cp && cp <= 129400 || cp === 129401 || cp === 129402 || cp === 129403 || 129404 <= cp && cp <= 129407 || 129408 <= cp && cp <= 129412 || 129413 <= cp && cp <= 129425 || 129426 <= cp && cp <= 129431 || 129432 <= cp && cp <= 129442 || 129443 <= cp && cp <= 129444 || 129445 <= cp && cp <= 129450 || 129451 <= cp && cp <= 129453 || 129454 <= cp && cp <= 129455 || 129456 <= cp && cp <= 129465 || 
129466 <= cp && cp <= 129471 || cp === 129472 || 129473 <= cp && cp <= 129474 || 129475 <= cp && cp <= 129482 || cp === 129483 || cp === 129484 || 129485 <= cp && cp <= 129487 || 129488 <= cp && cp <= 129510 || 129511 <= cp && cp <= 129535 || 129648 <= cp && cp <= 129651 || cp === 129652 || 129653 <= cp && cp <= 129655 || 129656 <= cp && cp <= 129658 || 129659 <= cp && cp <= 129660 || 129664 <= cp && cp <= 129666 || 129667 <= cp && cp <= 129670 || 129671 <= cp && cp <= 129672 || cp === 129673 || cp === 129679 || 129680 <= cp && cp <= 129685 || 129686 <= cp && cp <= 129704 || 129705 <= cp && cp <= 129708 || 129709 <= cp && cp <= 129711 || 129712 <= cp && cp <= 129718 || 129719 <= cp && cp <= 129722 || 129723 <= cp && cp <= 129725 || cp === 129726 || cp === 129727 || 129728 <= cp && cp <= 129730 || 129731 <= cp && cp <= 129733 || cp === 129734 || 129742 <= cp && cp <= 129743 || 129744 <= cp && cp <= 129750 || 129751 <= cp && cp <= 129753 || 129754 <= cp && cp <= 129755 || cp === 129756 || cp === 129759 || 129760 <= cp && cp <= 129767 || cp === 129768 || cp === 129769 || 129776 <= cp && cp <= 129782 || 129783 <= cp && cp <= 129784;
|
|
22
|
+
}
|
|
23
|
+
return this.fn(uc);
|
|
24
|
+
}.bind({
|
|
25
|
+
fn: null
|
|
26
|
+
});
|
|
27
|
+
function cjkOrIvs(uc) {
|
|
28
|
+
if (!uc || uc < 0) {
|
|
29
|
+
return false;
|
|
30
|
+
}
|
|
31
|
+
const eaw = eastAsianWidthType(uc);
|
|
32
|
+
switch (eaw) {
|
|
33
|
+
case "fullwidth":
|
|
34
|
+
case "halfwidth":
|
|
35
|
+
return true;
|
|
36
|
+
// fullwidth and halfwidth characters are never emoji
|
|
37
|
+
case "wide":
|
|
38
|
+
return !isEmoji(uc);
|
|
39
|
+
case "narrow":
|
|
40
|
+
return false;
|
|
41
|
+
case "ambiguous":
|
|
42
|
+
return 917760 <= uc && uc <= 917999 ? null : false;
|
|
43
|
+
case "neutral":
|
|
44
|
+
return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
|
|
45
|
+
}
|
|
20
46
|
}
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
47
|
+
var svsFollowingCjk = regexCheck(/[\uFE00-\uFE02\uFE0E]/u);
|
|
48
|
+
var unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
|
|
49
|
+
var unicodeWhitespace = regexCheck(/\s/);
|
|
24
50
|
function regexCheck(regex) {
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
51
|
+
return check;
|
|
52
|
+
function check(code) {
|
|
53
|
+
return code !== null && code > -1 && regex.test(String.fromCodePoint(code));
|
|
54
|
+
}
|
|
29
55
|
}
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
56
|
+
|
|
57
|
+
// src/classifyCharacter.ts
|
|
58
|
+
var constantsEx;
|
|
59
|
+
((constantsEx2) => {
|
|
60
|
+
constantsEx2.spaceOrPunctuation = 3;
|
|
61
|
+
constantsEx2.cjk = 4096;
|
|
62
|
+
constantsEx2.cjkPunctuation = 4098;
|
|
63
|
+
constantsEx2.ivs = 8192;
|
|
64
|
+
constantsEx2.cjkOrIvs = 12288;
|
|
65
|
+
constantsEx2.svsFollowingCjk = 16384;
|
|
66
|
+
constantsEx2.variationSelector = 28672;
|
|
67
|
+
})(constantsEx || (constantsEx = {}));
|
|
39
68
|
function classifyCharacter(code) {
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
case true:
|
|
48
|
-
value |= classifyCharacter_constantsEx.cjk;
|
|
49
|
-
break;
|
|
50
|
-
}
|
|
69
|
+
if (code === codes.eof || markdownLineEndingOrSpace(code) || unicodeWhitespace(code)) {
|
|
70
|
+
return constants.characterGroupWhitespace;
|
|
71
|
+
}
|
|
72
|
+
let value = 0;
|
|
73
|
+
if (code >= 4352) {
|
|
74
|
+
if (svsFollowingCjk(code)) {
|
|
75
|
+
return constantsEx.svsFollowingCjk;
|
|
51
76
|
}
|
|
52
|
-
|
|
53
|
-
|
|
77
|
+
switch (cjkOrIvs(code)) {
|
|
78
|
+
case null:
|
|
79
|
+
return constantsEx.ivs;
|
|
80
|
+
case true:
|
|
81
|
+
value |= constantsEx.cjk;
|
|
82
|
+
break;
|
|
83
|
+
}
|
|
84
|
+
}
|
|
85
|
+
if (unicodePunctuation(code)) {
|
|
86
|
+
value |= constants.characterGroupPunctuation;
|
|
87
|
+
}
|
|
88
|
+
return value;
|
|
54
89
|
}
|
|
55
|
-
|
|
90
|
+
|
|
91
|
+
// src/categoryUtil.ts
|
|
56
92
|
function isUnicodeWhitespace(category) {
|
|
57
|
-
|
|
93
|
+
return Boolean(category & constants2.characterGroupWhitespace);
|
|
58
94
|
}
|
|
59
95
|
function isNonCjkPunctuation(category) {
|
|
60
|
-
|
|
96
|
+
return (category & constantsEx.cjkPunctuation) === constants2.characterGroupPunctuation;
|
|
61
97
|
}
|
|
62
98
|
function isCjk(category) {
|
|
63
|
-
|
|
99
|
+
return Boolean(category & constantsEx.cjk);
|
|
64
100
|
}
|
|
65
101
|
function isIvs(category) {
|
|
66
|
-
|
|
102
|
+
return category === constantsEx.ivs;
|
|
67
103
|
}
|
|
68
104
|
function isSvsFollowingCjk(category) {
|
|
69
|
-
|
|
105
|
+
return category === constantsEx.svsFollowingCjk;
|
|
70
106
|
}
|
|
71
107
|
function isSpaceOrPunctuation(category) {
|
|
72
|
-
|
|
108
|
+
return Boolean(category & constantsEx.spaceOrPunctuation);
|
|
73
109
|
}
|
|
110
|
+
|
|
111
|
+
// src/codeUtil.ts
|
|
74
112
|
function isCodeHighSurrogate(code) {
|
|
75
|
-
|
|
113
|
+
return Boolean(code && code >= 55296 && code <= 56319);
|
|
76
114
|
}
|
|
77
115
|
function isCodeLowSurrogate(code) {
|
|
78
|
-
|
|
116
|
+
return Boolean(code && code >= 56320 && code <= 57343);
|
|
79
117
|
}
|
|
80
118
|
function tryGetGenuinePreviousCode(code, nowPoint, sliceSerialize) {
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
119
|
+
if (nowPoint._bufferIndex < 2) {
|
|
120
|
+
return code;
|
|
121
|
+
}
|
|
122
|
+
const previousBuffer = sliceSerialize({
|
|
123
|
+
// take 2 characters (code units)
|
|
124
|
+
start: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex - 2 },
|
|
125
|
+
end: nowPoint
|
|
126
|
+
});
|
|
127
|
+
const previousCandidate = previousBuffer.codePointAt(0);
|
|
128
|
+
return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
|
|
91
129
|
}
|
|
92
130
|
function tryGetCodeTwoBefore(previousCode, nowPoint, sliceSerialize) {
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
131
|
+
const previousWidth = previousCode >= 65536 ? 2 : 1;
|
|
132
|
+
if (nowPoint._bufferIndex < 1 + previousWidth) {
|
|
133
|
+
return null;
|
|
134
|
+
}
|
|
135
|
+
const idealStart = nowPoint._bufferIndex - previousWidth - 2;
|
|
136
|
+
const twoPreviousBuffer = sliceSerialize({
|
|
137
|
+
// take 1 or 2 code units
|
|
138
|
+
start: {
|
|
139
|
+
...nowPoint,
|
|
140
|
+
_bufferIndex: idealStart >= 0 ? idealStart : 0
|
|
141
|
+
},
|
|
142
|
+
end: {
|
|
143
|
+
...nowPoint,
|
|
144
|
+
_bufferIndex: nowPoint._bufferIndex - previousWidth
|
|
145
|
+
}
|
|
146
|
+
});
|
|
147
|
+
const twoPreviousLast = twoPreviousBuffer.charCodeAt(
|
|
148
|
+
twoPreviousBuffer.length - 1
|
|
149
|
+
);
|
|
150
|
+
if (Number.isNaN(twoPreviousLast)) {
|
|
151
|
+
return null;
|
|
152
|
+
}
|
|
153
|
+
if (twoPreviousBuffer.length < 2 || twoPreviousLast < 56320 || 57343 < twoPreviousLast) {
|
|
111
154
|
return twoPreviousLast;
|
|
155
|
+
}
|
|
156
|
+
const twoPreviousCandidate = twoPreviousBuffer.codePointAt(0);
|
|
157
|
+
if (twoPreviousCandidate && twoPreviousCandidate >= 65536) {
|
|
158
|
+
return twoPreviousCandidate;
|
|
159
|
+
}
|
|
160
|
+
return twoPreviousLast;
|
|
112
161
|
}
|
|
113
162
|
function tryGetGenuineNextCode(code, nowPoint, sliceSerialize) {
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
}
|
|
120
|
-
}).codePointAt(0);
|
|
121
|
-
return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
|
|
163
|
+
const nextCandidate = sliceSerialize({
|
|
164
|
+
start: nowPoint,
|
|
165
|
+
end: { ...nowPoint, _bufferIndex: nowPoint._bufferIndex + 2 }
|
|
166
|
+
}).codePointAt(0);
|
|
167
|
+
return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
|
|
122
168
|
}
|
|
123
|
-
export {
|
|
169
|
+
export {
|
|
170
|
+
classifyCharacter,
|
|
171
|
+
constantsEx,
|
|
172
|
+
isCjk,
|
|
173
|
+
isCodeHighSurrogate,
|
|
174
|
+
isCodeLowSurrogate,
|
|
175
|
+
isIvs,
|
|
176
|
+
isNonCjkPunctuation,
|
|
177
|
+
isSpaceOrPunctuation,
|
|
178
|
+
isSvsFollowingCjk,
|
|
179
|
+
isUnicodeWhitespace,
|
|
180
|
+
tryGetCodeTwoBefore,
|
|
181
|
+
tryGetGenuineNextCode,
|
|
182
|
+
tryGetGenuinePreviousCode
|
|
183
|
+
};
|
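The new `isEmoji` helper in this bundle builds its `RGI_Emoji` regex through the `RegExp` constructor inside a `try` block and falls back to a code point table when the engine rejects the `v` flag. The removed `dist/index.cjs` used a `/.../v` literal instead, which presumably fails at parse time on engines without that flag. A minimal sketch of the same feature detection follows. It is not the package's code: `createIsEmoji` is a hypothetical name, the result is memoised with a closure rather than a bound `this.fn`, and the fallback covers only the Emoticons block (U+1F600 to U+1F64F) instead of the full table.

```javascript
// Builds the predicate once; the package memoises it through a bound `this.fn`,
// this sketch uses a closure instead.
function createIsEmoji() {
  try {
    // A string pattern keeps this file parseable on engines without the `v` flag;
    // the constructor throws a SyntaxError there instead.
    const regex = new RegExp("^\\p{RGI_Emoji}", "v");
    return (cp) => regex.test(String.fromCodePoint(cp));
  } catch (e) {
    if (!(e instanceof SyntaxError)) {
      throw e;
    }
    // Illustrative fallback only: the Emoticons block, not the full table.
    return (cp) => 0x1f600 <= cp && cp <= 0x1f64f;
  }
}

const isEmoji = createIsEmoji();
console.log(isEmoji(0x1f600)); // true on either path (U+1F600 GRINNING FACE)
console.log(isEmoji(0x6f22)); // false on either path (U+6F22, a CJK ideograph)
```

Rethrowing anything that is not a `SyntaxError` mirrors the diff: only an unsupported flag or property should trigger the fallback.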
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "micromark-extension-cjk-friendly-util",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.1.0",
|
|
4
4
|
"type": "module",
|
|
5
5
|
"exports": {
|
|
6
6
|
".": {
|
|
@@ -8,6 +8,8 @@
|
|
|
8
8
|
"default": "./dist/index.js"
|
|
9
9
|
}
|
|
10
10
|
},
|
|
11
|
+
"module": "./dist/index.js",
|
|
12
|
+
"types": "./dist/index.d.ts",
|
|
11
13
|
"files": [
|
|
12
14
|
"dist",
|
|
13
15
|
"LICENSE",
|
|
@@ -44,14 +46,20 @@
|
|
|
44
46
|
}
|
|
45
47
|
},
|
|
46
48
|
"engines": {
|
|
47
|
-
"node": ">=
|
|
49
|
+
"node": ">=16"
|
|
48
50
|
},
|
|
49
51
|
"scripts": {
|
|
50
|
-
"build": "rslib build",
|
|
51
|
-
"
|
|
52
|
-
"
|
|
52
|
+
"build:rslib": "rslib build",
|
|
53
|
+
"build": "tsup",
|
|
54
|
+
"build:lib": "tsup",
|
|
55
|
+
"dev:rslib": "rslib build --watch",
|
|
56
|
+
"dev": "tsup --watch",
|
|
57
|
+
"dev:lib": "tsup --watch",
|
|
58
|
+
"test": "vitest run",
|
|
59
|
+
"test:lib": "vitest run",
|
|
53
60
|
"test:up": "vitest -u",
|
|
54
61
|
"test:watch": "vitest watch",
|
|
62
|
+
"test:lib:watch": "vitest watch",
|
|
55
63
|
"lint:type": "tsc --noEmit"
|
|
56
64
|
}
|
|
57
65
|
}
|
package/dist/index.cjs
DELETED
|
@@ -1,169 +0,0 @@
|
|
|
1
|
-
"use strict";
|
|
2
|
-
var __webpack_require__ = {};
|
|
3
|
-
(()=>{
|
|
4
|
-
__webpack_require__.d = function(exports1, definition) {
|
|
5
|
-
for(var key in definition)if (__webpack_require__.o(definition, key) && !__webpack_require__.o(exports1, key)) Object.defineProperty(exports1, key, {
|
|
6
|
-
enumerable: true,
|
|
7
|
-
get: definition[key]
|
|
8
|
-
});
|
|
9
|
-
};
|
|
10
|
-
})();
|
|
11
|
-
(()=>{
|
|
12
|
-
__webpack_require__.o = function(obj, prop) {
|
|
13
|
-
return Object.prototype.hasOwnProperty.call(obj, prop);
|
|
14
|
-
};
|
|
15
|
-
})();
|
|
16
|
-
(()=>{
|
|
17
|
-
__webpack_require__.r = function(exports1) {
|
|
18
|
-
if ('undefined' != typeof Symbol && Symbol.toStringTag) Object.defineProperty(exports1, Symbol.toStringTag, {
|
|
19
|
-
value: 'Module'
|
|
20
|
-
});
|
|
21
|
-
Object.defineProperty(exports1, '__esModule', {
|
|
22
|
-
value: true
|
|
23
|
-
});
|
|
24
|
-
};
|
|
25
|
-
})();
|
|
26
|
-
var __webpack_exports__ = {};
|
|
27
|
-
__webpack_require__.r(__webpack_exports__);
|
|
28
|
-
__webpack_require__.d(__webpack_exports__, {
|
|
29
|
-
constantsEx: ()=>classifyCharacter_constantsEx,
|
|
30
|
-
isCjk: ()=>isCjk,
|
|
31
|
-
isSpaceOrPunctuation: ()=>isSpaceOrPunctuation,
|
|
32
|
-
isNonCjkPunctuation: ()=>isNonCjkPunctuation,
|
|
33
|
-
isSvsFollowingCjk: ()=>isSvsFollowingCjk,
|
|
34
|
-
tryGetGenuinePreviousCode: ()=>tryGetGenuinePreviousCode,
|
|
35
|
-
isUnicodeWhitespace: ()=>isUnicodeWhitespace,
|
|
36
|
-
isCodeHighSurrogate: ()=>isCodeHighSurrogate,
|
|
37
|
-
isIvs: ()=>isIvs,
|
|
38
|
-
tryGetCodeTwoBefore: ()=>tryGetCodeTwoBefore,
|
|
39
|
-
classifyCharacter: ()=>classifyCharacter,
|
|
40
|
-
isCodeLowSurrogate: ()=>isCodeLowSurrogate,
|
|
41
|
-
tryGetGenuineNextCode: ()=>tryGetGenuineNextCode
|
|
42
|
-
});
|
|
43
|
-
const external_micromark_util_symbol_namespaceObject = require("micromark-util-symbol");
|
|
44
|
-
const external_micromark_util_character_namespaceObject = require("micromark-util-character");
|
|
45
|
-
const external_get_east_asian_width_namespaceObject = require("get-east-asian-width");
|
-function cjkOrIvs(uc) {
-    if (!uc || uc < 0) return false;
-    const eaw = (0, external_get_east_asian_width_namespaceObject.eastAsianWidthType)(uc);
-    switch(eaw){
-        case "fullwidth":
-        case "halfwidth":
-            return true;
-        case "wide":
-            return !/^\p{RGI_Emoji}/v.test(String.fromCodePoint(uc));
-        case "narrow":
-            return false;
-        case "ambiguous":
-            return 0xe0100 <= uc && uc <= 0xe01ef && null;
-        case "neutral":
-            return /^\p{sc=Hangul}/u.test(String.fromCodePoint(uc));
-    }
-}
-const svsFollowingCjk = regexCheck(/[\uFE00-\uFE02\uFE0E]/u);
-const unicodePunctuation = regexCheck(/\p{P}|\p{S}/u);
-const unicodeWhitespace = regexCheck(/\s/);
-function regexCheck(regex) {
-    return check;
-    function check(code) {
-        return null !== code && code > -1 && regex.test(String.fromCodePoint(code));
-    }
-}
-(function(constantsEx) {
-    constantsEx.spaceOrPunctuation = 3;
-    constantsEx.cjk = 0x1000;
-    constantsEx.cjkPunctuation = 0x1002;
-    constantsEx.ivs = 0x2000;
-    constantsEx.cjkOrIvs = 0x3000;
-    constantsEx.svsFollowingCjk = 0x4000;
-    constantsEx.variationSelector = 0x7000;
-})(classifyCharacter_constantsEx || (classifyCharacter_constantsEx = {}));
-function classifyCharacter(code) {
-    if (code === external_micromark_util_symbol_namespaceObject.codes.eof || (0, external_micromark_util_character_namespaceObject.markdownLineEndingOrSpace)(code) || unicodeWhitespace(code)) return external_micromark_util_symbol_namespaceObject.constants.characterGroupWhitespace;
-    let value = 0;
-    if (code >= 0x1100) {
-        if (svsFollowingCjk(code)) return classifyCharacter_constantsEx.svsFollowingCjk;
-        switch(cjkOrIvs(code)){
-            case null:
-                return classifyCharacter_constantsEx.ivs;
-            case true:
-                value |= classifyCharacter_constantsEx.cjk;
-                break;
-        }
-    }
-    if (unicodePunctuation(code)) value |= external_micromark_util_symbol_namespaceObject.constants.characterGroupPunctuation;
-    return value;
-}
-var classifyCharacter_constantsEx;
-function isUnicodeWhitespace(category) {
-    return Boolean(category & external_micromark_util_symbol_namespaceObject.constants.characterGroupWhitespace);
-}
-function isNonCjkPunctuation(category) {
-    return (category & classifyCharacter_constantsEx.cjkPunctuation) === external_micromark_util_symbol_namespaceObject.constants.characterGroupPunctuation;
-}
-function isCjk(category) {
-    return Boolean(category & classifyCharacter_constantsEx.cjk);
-}
-function isIvs(category) {
-    return category === classifyCharacter_constantsEx.ivs;
-}
-function isSvsFollowingCjk(category) {
-    return category === classifyCharacter_constantsEx.svsFollowingCjk;
-}
-function isSpaceOrPunctuation(category) {
-    return Boolean(category & classifyCharacter_constantsEx.spaceOrPunctuation);
-}
-function isCodeHighSurrogate(code) {
-    return Boolean(code && code >= 0xd800 && code <= 0xdbff);
-}
-function isCodeLowSurrogate(code) {
-    return Boolean(code && code >= 0xdc00 && code <= 0xdfff);
-}
-function tryGetGenuinePreviousCode(code, nowPoint, sliceSerialize) {
-    if (nowPoint._bufferIndex < 2) return code;
-    const previousBuffer = sliceSerialize({
-        start: {
-            ...nowPoint,
-            _bufferIndex: nowPoint._bufferIndex - 2
-        },
-        end: nowPoint
-    });
-    const previousCandidate = previousBuffer.codePointAt(0);
-    return previousCandidate && previousCandidate >= 65536 ? previousCandidate : code;
-}
-function tryGetCodeTwoBefore(previousCode, nowPoint, sliceSerialize) {
-    const previousWidth = previousCode >= 65536 ? 2 : 1;
-    if (nowPoint._bufferIndex < 1 + previousWidth) return null;
-    const idealStart = nowPoint._bufferIndex - previousWidth - 2;
-    const twoPreviousBuffer = sliceSerialize({
-        start: {
-            ...nowPoint,
-            _bufferIndex: idealStart >= 0 ? idealStart : 0
-        },
-        end: {
-            ...nowPoint,
-            _bufferIndex: nowPoint._bufferIndex - previousWidth
-        }
-    });
-    const twoPreviousLast = twoPreviousBuffer.charCodeAt(twoPreviousBuffer.length - 1);
-    if (Number.isNaN(twoPreviousLast)) return null;
-    if (twoPreviousBuffer.length < 2 || twoPreviousLast < 0xdc00 || 0xdfff < twoPreviousLast) return twoPreviousLast;
-    const twoPreviousCandidate = twoPreviousBuffer.codePointAt(0);
-    if (twoPreviousCandidate && twoPreviousCandidate >= 65536) return twoPreviousCandidate;
-    return twoPreviousLast;
-}
-function tryGetGenuineNextCode(code, nowPoint, sliceSerialize) {
-    const nextCandidate = sliceSerialize({
-        start: nowPoint,
-        end: {
-            ...nowPoint,
-            _bufferIndex: nowPoint._bufferIndex + 2
-        }
-    }).codePointAt(0);
-    return nextCandidate && nextCandidate >= 65536 ? nextCandidate : code;
-}
-var __webpack_export_target__ = exports;
-for(var __webpack_i__ in __webpack_exports__)__webpack_export_target__[__webpack_i__] = __webpack_exports__[__webpack_i__];
-if (__webpack_exports__.__esModule) Object.defineProperty(__webpack_export_target__, '__esModule', {
-    value: true
-});
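
Note on the removed category helpers: the predicates in the deleted `dist/index.cjs` treat the value returned by `classifyCharacter` as a set of bit flags. The sketch below is only an illustration of how those flags appear to combine, not the package's own code. It assumes that micromark's `characterGroupWhitespace` and `characterGroupPunctuation` are 1 and 2 (which is what the `spaceOrPunctuation = 3` mask implies); the local constants and the `cjkPunct` value are stand-ins introduced for this example.

```typescript
// Local stand-ins for micromark-util-symbol's character group constants.
// Assumption: whitespace = 1 and punctuation = 2, as implied by the
// spaceOrPunctuation = 3 mask in the removed file.
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

// Values copied from the constantsEx block of the removed dist/index.cjs.
const constantsEx = {
  spaceOrPunctuation: 3,
  cjk: 0x1000,
  cjkPunctuation: 0x1002,
  ivs: 0x2000,
  cjkOrIvs: 0x3000,
  svsFollowingCjk: 0x4000,
  variationSelector: 0x7000,
} as const;

// Predicates mirroring the removed helpers of the same names.
function isCjk(category: number): boolean {
  return Boolean(category & constantsEx.cjk);
}

function isNonCjkPunctuation(category: number): boolean {
  // Mask out everything except the cjk and punctuation bits, then require
  // that only the punctuation bit is set.
  return (category & constantsEx.cjkPunctuation) === characterGroupPunctuation;
}

function isSpaceOrPunctuation(category: number): boolean {
  return Boolean(category & constantsEx.spaceOrPunctuation);
}

// Hypothetical category for a character that is both CJK and punctuation.
const cjkPunct: number = constantsEx.cjk | characterGroupPunctuation;

console.log(isCjk(cjkPunct), isNonCjkPunctuation(cjkPunct), isSpaceOrPunctuation(cjkPunct));
```

Under these assumptions, a category with both the CJK and punctuation bits set counts as CJK and as space-or-punctuation, but not as non-CJK punctuation, while a bare punctuation category (2) does count as non-CJK punctuation.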