@chr33s/pdf-codepoints 5.0.0

package/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2014-present Devon Govett
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,81 @@
1
+ # @chr33s/pdf-codepoints
2
+
3
+ > A parser for files in the Unicode database.
4
+
5
+ Distributed as native ES modules with NodeNext resolution (use Node.js 18+ or a modern bundler).
6
+
7
+ `@chr33s/pdf-codepoints` lives in the [`chr33s/pdf`](https://github.com/chr33s/pdf) monorepo and ships with TypeScript declarations.
8
+
9
+ ## Overview
10
+
11
+ Produces a giant array of codepoint objects for
12
+ every character represented by Unicode, with many properties derived from files in the Unicode
13
+ database.
14
+
15
+ **BUILD SCRIPTS ONLY**: Use in production is not recommended
16
+ as the parsers are not optimized for speed, the text files are huge, and the resulting array uses a
17
+ huge amount of memory. To access this data in real-world applications, use modules that have
18
+ precompiled the data into a compressed form:
19
+
20
+ * [@chr33s/pdf-unicode-properties](https://github.com/chr33s/pdf/tree/main/packages/unicode-properties)
21
+
22
+ ## Installation
23
+
24
+ Install using npm:
25
+
26
+ npm install @chr33s/pdf-codepoints
27
+
28
+ ## Usage
29
+
30
+ Basic usage:
31
+
32
+ ```js
33
+ import codepoints from "@chr33s/pdf-codepoints";
34
+ ```
35
+
36
+ The parser generates data by reading the text files contained in the
37
+ [Unicode Character Database](http://unicode.org/ucd/). By default, it will use the database
38
+ bundled with this package. To use a custom version of UCD, use `@chr33s/pdf-codepoints/parser`
39
+ instead, which accepts an optional path to a directory containing the uncompressed UCD data:
40
+
41
+ ```js
42
+ import { parser } from "@chr33s/pdf-codepoints";
43
+ const codepoints = parser("/path/to/UCD");
44
+ ```
45
+
46
+ ## Codepoint data
47
+
48
+ Each element in the generated array is either `undefined` (for unassigned code
49
+ points), or an object containing the following properties:
50
+
51
+ * `code` - the code point index
52
+ * `name` - character name
53
+ * `unicode1Name` - legacy name used by Unicode 1
54
+ * `category` - Unicode category
55
+ * `block` - the block name this character is a part of
56
+ * `script` - the script this character belongs to
57
+ * `eastAsianWidth` - the East Asian width for this character
58
+ * `combiningClass` - numeric combining class value
59
+ * `combiningClassName` - a string name for the combining class
60
+ * `bidiClass` - class for the Unicode bidirectional algorithm
61
+ * `bidiMirrored` - whether the character is mirrored in the bidi algorithm
62
+ * `numeric` - the numeric value for this character
63
+ * `uppercase` - an array of code points mapping this character to upper case, if any
64
+ * `lowercase` - an array of code points mapping this character to lower case, if any
65
+ * `titlecase` - an array of code points mapping this character to title case, if any
66
+ * `folded` - an array of code points mapping this character to a folded equivalent, if any
67
+ * `caseConditions` - conditions used during case mapping for this character
68
+ * `decomposition` - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
69
+ * `compositions` - a dictionary mapping of compositions for this character
70
+ * `isCompat` - whether the decomposition is a compatibility one
71
+ * `isExcluded` - whether the character is excluded from composition
72
+ * `NFC_QC` - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
73
+ * `NFKC_QC` - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
74
+ * `NFD_QC` - quickcheck value for NFD (0 = YES, 1 = NO)
75
+ * `NFKD_QC` - quickcheck value for NFKD (0 = YES, 1 = NO)
76
+ * `joiningType` - Arabic joining type
77
+ * `joiningGroup` - Arabic joining group
78
+
79
+ ## License
80
+
81
+ MIT
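
Note on the "Codepoint data" section of the README above: a minimal sketch of how a build script might consume the generated array. It is not taken from the package. The `sample` array is a hand-written stand-in with illustrative values, the helper names `countByCategory` and `buildUppercaseTable` are hypothetical, and the two-letter category values (`Lu`, `Ll`, `Nd`) are assumed to follow the general category abbreviations in the Unicode Character Database. Only the property names (`code`, `name`, `category`, `uppercase`, `lowercase`) and the `undefined` entries for unassigned code points come from the README.

```javascript
// Hand-written stand-in for the generated array (illustrative values only).
// In a build script the real array would come from the package instead:
//   import codepoints from "@chr33s/pdf-codepoints";
const sample = [];
sample[0x31] = { code: 0x31, name: "DIGIT ONE", category: "Nd" };
sample[0x41] = { code: 0x41, name: "LATIN CAPITAL LETTER A", category: "Lu", lowercase: [0x61] };
sample[0x61] = { code: 0x61, name: "LATIN SMALL LETTER A", category: "Ll", uppercase: [0x41] };
sample[0x62] = { code: 0x62, name: "LATIN SMALL LETTER B", category: "Ll", uppercase: [0x42] };

// Count assigned code points per category, skipping `undefined` entries.
function countByCategory(codepoints) {
  const counts = {};
  for (const cp of codepoints) {
    if (cp === undefined) continue; // unassigned code point
    counts[cp.category] = (counts[cp.category] ?? 0) + 1;
  }
  return counts;
}

// Build a compact code -> uppercase string table, the kind of
// precomputed output a build script might write to disk.
function buildUppercaseTable(codepoints) {
  const table = {};
  for (const cp of codepoints) {
    // Entries without an uppercase mapping (or unassigned slots) are skipped.
    if (cp?.uppercase) table[cp.code] = String.fromCodePoint(...cp.uppercase);
  }
  return table;
}

const categoryCounts = countByCategory(sample);
const uppercaseTable = buildUppercaseTable(sample);
console.log(categoryCounts, uppercaseTable);
```

Precomputing small tables like these at build time keeps the full array out of production code, in line with the README's "build scripts only" warning.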