@chr33s/pdf-codepoints 5.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE.md +21 -0
- package/README.md +81 -0
- package/data/ArabicShaping.txt +894 -0
- package/data/Blocks.txt +336 -0
- package/data/CaseFolding.txt +1581 -0
- package/data/CompositionExclusions.txt +208 -0
- package/data/DerivedNormalizationProps.txt +9803 -0
- package/data/EastAsianWidth.txt +2473 -0
- package/data/IndicPositionalCategory.txt +755 -0
- package/data/IndicSyllabicCategory.txt +1286 -0
- package/data/PropertyValueAliases.txt +1541 -0
- package/data/Scripts.txt +2837 -0
- package/data/SpecialCasing.txt +281 -0
- package/data/UnicodeData.txt +32840 -0
- package/data/extracted/DerivedNumericValues.txt +2537 -0
- package/dist/index.d.ts +5 -0
- package/dist/index.js +6 -0
- package/dist/index.js.map +1 -0
- package/dist/parser.d.ts +35 -0
- package/dist/parser.js +308 -0
- package/dist/parser.js.map +1 -0
- package/package.json +40 -0
- package/scripts/update-data.ts +64 -0
- package/src/index.ts +7 -0
- package/src/parser.ts +428 -0
- package/test/parser.test.ts +77 -0
- package/tsconfig.json +10 -0
- package/tsconfig.typecheck.json +14 -0
- package/vitest.config.ts +8 -0
package/LICENSE.md
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2014-present Devon Govett
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,81 @@
+# @chr33s/pdf-codepoints
+
+> A parser for files in the Unicode database.
+
+Distributed as native ES modules with NodeNext resolution (use Node.js 18+ or a modern bundler).
+
+`@chr33s/pdf-codepoints` lives in the [`chr33s/pdf`](https://github.com/chr33s/pdf) monorepo and provides native ES modules with TypeScript declarations.
+
+## Overview
+
+Produces a giant array of codepoint objects for
+every character represented by Unicode, with many properties derived from files in the Unicode
+database.
+
+**BUILD SCRIPTS ONLY**: Use in production is not recommended
+as the parsers are not optimized for speed, the text files are huge, and the resulting array uses a
+huge amount of memory. To access this data in real world applications, use modules that have
+precompiled the data into a compressed form:
+
+* [@chr33s/pdf-unicode-properties](https://github.com/chr33s/pdf/tree/main/packages/unicode-properties)
+
+## Installation
+
+Install using npm:
+
+npm install @chr33s/pdf-codepoints
+
+## Usage
+
+Basic usage:
+
+```js
+import codepoints from "@chr33s/pdf-codepoints";
+```
+
+The parser generates data by reading the text files contained in the
+[Unicode Character Database](http://unicode.org/ucd/). By default, it will use the database
+bundled with this package. To use a custom version of UCD, use `@chr33s/pdf-codepoints/parser`
+instead, which accepts an optional path to a directory containing the uncompressed UCD data:
+
+```js
+import { parser } from "@chr33s/pdf-codepoints";
+const codepoints = parser("/path/to/UCD");
+```
+
+## Codepoint data
+
+Each element in the generated array is either `undefined` (for unassigned code
+points), or an object containing the following properties:
+
+* `code` - the code point index
+* `name` - character name
+* `unicode1Name` - legacy name used by Unicode 1
+* `category` - Unicode category
+* `block` - the block name this character is a part of
+* `script` - the script this character belongs to
+* `eastAsianWidth` - the East Asian width for this character
+* `combiningClass` - numeric combining class value
+* `combiningClassName` - a string name for the combining class
+* `bidiClass` - class for the Unicode bidirectional algorithm
+* `bidiMirrored` - whether the character is mirrored in the bidi algorithm
+* `numeric` - the numeric value for this character
+* `uppercase` - an array of code points mapping this character to upper case, if any
+* `lowercase` - an array of code points mapping this character to lower case, if any
+* `titlecase` - an array of code points mapping this character to title case, if any
+* `folded` - an array of code points mapping this character to a folded equivalent, if any
+* `caseConditions` - conditions used during case mapping for this character
+* `decomposition` - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
+* `compositions` - a dictionary mapping of compositions for this character
+* `isCompat` - whether the decomposition is a compatibility one
+* `isExcluded` - whether the character is excluded from composition
+* `NFC_QC` - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
+* `NFKC_QC` - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
+* `NFD_QC` - quickcheck value for NFD (0 = YES, 1 = NO)
+* `NFKD_QC` - quickcheck value for NFKD (0 = YES, 1 = NO)
+* `joiningType` - Arabic joining type
+* `joiningGroup` - Arabic joining group
+
+## License
+
+MIT
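Note on the README above: the "Codepoint data" section describes a sparse array indexed by code point, where unassigned code points are `undefined`. The sketch below illustrates how a build script might consume such a table. It is not taken from the package: `CodepointEntry` is a hypothetical subset of the documented properties with assumed field types (the README lists names only), `sample` is hand-written stand-in data used instead of the real default export so the snippet runs on its own, and the helper names are invented for illustration. The package's own `dist/*.d.ts` declarations remain authoritative.

```typescript
// Hypothetical subset of the entry shape, inferred from the README property list.
// Field types are assumptions; consult the package's own declarations for the real ones.
interface CodepointEntry {
  code: number;
  name: string;
  category: string;
  uppercase?: number[] | null; // code points of the upper-case mapping, if any
  NFC_QC: number; // README: 0 = YES, 1 = NO, 2 = MAYBE
}

// The README describes the table as an array with `undefined` for unassigned code points.
type CodepointTable = ReadonlyArray<CodepointEntry | undefined>;

const QUICK_CHECK_LABELS: readonly string[] = ["YES", "NO", "MAYBE"];

// Translate a numeric quick-check value into the label given in the README.
function quickCheckLabel(value: number): string {
  return QUICK_CHECK_LABELS[value] ?? "UNKNOWN";
}

// Format a code point as U+XXXX with at least four hex digits.
function formatCode(code: number): string {
  return "U+" + code.toString(16).toUpperCase().padStart(4, "0");
}

// One-line summary of a table entry, handling the `undefined` (unassigned) case.
function describeCodepoint(table: CodepointTable, code: number): string {
  const entry = table[code];
  if (entry === undefined) {
    return `${formatCode(code)} <unassigned>`;
  }
  return `${formatCode(code)} ${entry.name} [${entry.category}] NFC_QC=${quickCheckLabel(entry.NFC_QC)}`;
}

// Apply the simple upper-case mapping, falling back to the character itself.
function toUpper(table: CodepointTable, code: number): string {
  const mapped = table[code]?.uppercase;
  const target = mapped && mapped.length > 0 ? mapped : [code];
  return String.fromCodePoint(...target);
}

// Stand-in data: two hand-written entries in an otherwise empty sparse array.
const sample: Array<CodepointEntry | undefined> = [];
sample[0x61] = { code: 0x61, name: "LATIN SMALL LETTER A", category: "Ll", uppercase: [0x41], NFC_QC: 0 };
sample[0x301] = { code: 0x301, name: "COMBINING ACUTE ACCENT", category: "Mn", NFC_QC: 2 };

console.log(describeCodepoint(sample, 0x61));
console.log(describeCodepoint(sample, 0x378));
console.log(toUpper(sample, 0x61));
```

In a real build script, `sample` would presumably be replaced by the default export shown in the README's "Usage" section. Checking for `undefined` before touching any property matters because the table is sparse.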