rfc-bcp47 0.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) Gabriel Llamas
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,200 @@
1
+ # bcp47
2
+
3
+ <p align="center">
4
+ <img src="docs/anatomy-screenshot.png" alt="BCP 47 language tag anatomy" width="765">
5
+ </p>
6
+
7
+ > Zero-dependency [BCP 47](https://www.rfc-editor.org/info/bcp47) / [RFC 5646](https://datatracker.ietf.org/doc/html/rfc5646) language tag toolkit for JavaScript and TypeScript
8
+
9
+ [![npm version](https://img.shields.io/npm/v/bcp47.svg)](https://www.npmjs.com/package/bcp47)
10
+ [![npm downloads](https://img.shields.io/npm/dm/bcp47.svg)](https://www.npmjs.com/package/bcp47)
11
+ [![license](https://img.shields.io/npm/l/bcp47.svg)](./LICENSE)
12
+
13
+ - **Parse** any BCP 47 language tag into a structured, typed object
14
+ - **Stringify** a tag object back into a well-formed language tag string
15
+ - **Canonicalize** with case normalization and [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) data (deprecated subtags, suppress-script, extlang)
16
+ - **Match** language tags with `filter` and `lookup` per [RFC 4647](https://datatracker.ietf.org/doc/html/rfc4647)
17
+ - **Extension U/T** extraction for Unicode locales ([RFC 6067](https://datatracker.ietf.org/doc/html/rfc6067)) and transformed content ([RFC 6497](https://datatracker.ietf.org/doc/html/rfc6497))
18
+ - **Accept-Language** header parsing per [RFC 9110](https://datatracker.ietf.org/doc/html/rfc9110#section-12.5.4)
19
+ - **WCAG-ready** &mdash; use `parse()` to validate `lang` attributes per [WCAG 2.x SC 3.1.1](https://www.w3.org/WAI/WCAG22/Understanding/language-of-page) and [SC 3.1.2](https://www.w3.org/WAI/WCAG22/Understanding/language-of-parts)
20
+ - **TypeScript-first** with full type inference and strict types out of the box
21
+ - **Zero dependencies**, tree-shakeable, works in Node.js and browsers
22
+
23
+ ## Install
24
+
25
+ ```bash
26
+ npm install bcp47
27
+ ```
28
+
29
+ ## Operators
30
+
31
+ Tree-shakeable operators &mdash; import only what you need.
32
+
33
+ ### parse / stringify
34
+
35
+ ```ts
36
+ import { parse, stringify } from 'bcp47';
37
+
38
+ const tag = parse('en-Latn-US');
39
+
40
+ if (tag?.type === 'langtag') {
41
+ tag.langtag.language // 'en'
42
+ tag.langtag.script // 'Latn'
43
+ tag.langtag.region // 'US'
44
+ }
45
+
46
+ stringify(tag!); // 'en-Latn-US'
47
+
48
+ parse('invalid!'); // null
49
+ ```
50
+
51
+ `parse` returns one of three tag types or `null` for invalid input:
52
+
53
+ | `type` | When | Fields available |
54
+ |--------|------|-----------------|
55
+ | `'langtag'` | Standard language tags (`en-US`, `zh-Hant-TW`) | `langtag.language`, `langtag.script`, `langtag.region`, `langtag.extlang`, `langtag.variant`, `langtag.extension`, `langtag.privateuse` |
56
+ | `'privateuse'` | Private use tags (`x-custom`) | `privateuse` |
57
+ | `'grandfathered'` | Legacy registered tags (`i-klingon`) | `grandfathered.type`, `grandfathered.tag` |
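
Because the result is discriminated by `type`, a `switch` on that field lets TypeScript narrow to the matching branch. The sketch below illustrates the pattern only: the type names, the `describeTag` helper, and the exact field shapes (for example `privateuse` as a string array) are assumptions inferred from the table above, not the package's published types.

```typescript
// Hypothetical shapes inferred from the table; the real exported types may differ.
type LangtagResult = {
  type: 'langtag';
  langtag: { language: string; script?: string; region?: string };
};
type PrivateuseResult = { type: 'privateuse'; privateuse: string[] };
type GrandfatheredResult = {
  type: 'grandfathered';
  grandfathered: { type: string; tag: string };
};
type TagResult = LangtagResult | PrivateuseResult | GrandfatheredResult;

// Narrow on `type` before touching branch-specific fields.
function describeTag(tag: TagResult | null): string {
  if (tag === null) return 'invalid';
  switch (tag.type) {
    case 'langtag':
      return `language=${tag.langtag.language}`;
    case 'privateuse':
      return `private use: ${tag.privateuse.join('-')}`;
    case 'grandfathered':
      return `grandfathered: ${tag.grandfathered.tag}`;
  }
}
```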
58
+
59
+ ### langtag
60
+
61
+ Build a tag from known parts without parsing a string. Validates subtags and throws `RangeError` on invalid input:
62
+
63
+ ```ts
64
+ import { langtag, stringify } from 'bcp47';
65
+
66
+ const tag = langtag('en', { region: 'US' });
67
+ stringify(tag); // 'en-US'
68
+
69
+ langtag('!!!'); // RangeError — invalid language
70
+ ```
71
+
72
+ ### canonicalize
73
+
74
+ Reduce equivalent tags to a single canonical form. `canonicalize` applies case normalization, deprecated-subtag replacement, suppress-script removal, extlang promotion, and extension ordering using [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) data:
75
+
76
+ ```ts
77
+ import { canonicalize } from 'bcp47';
78
+
79
+ canonicalize('iw'); // 'he' (deprecated language)
80
+ canonicalize('zh-cmn'); // 'cmn' (extlang to preferred)
81
+ canonicalize('en-Latn'); // 'en' (suppress-script)
82
+ canonicalize('de-DD'); // 'de-DE' (deprecated region)
83
+ ```
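
The case-normalization step can be reproduced without registry data by following the rule in RFC 5646 §2.1.1: everything is lowercase, except that two-letter subtags become uppercase and four-letter subtags become titlecase when they are neither first nor after a singleton. The sketch below covers only that step; `normalizeCase` is a hypothetical helper, not part of this package, and it does not validate the tag or apply any registry mappings.

```typescript
// Minimal sketch of RFC 5646 section 2.1.1 case conventions (no validation).
function normalizeCase(tag: string): string {
  let afterSingleton = false; // true once an extension or private-use singleton is seen
  return tag
    .split('-')
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();
      if (lower.length === 1) {
        afterSingleton = true; // e.g. 'u', 't', 'x'
        return lower;
      }
      if (index === 0 || afterSingleton) return lower;
      if (lower.length === 2) return lower.toUpperCase(); // region, e.g. 'US'
      if (lower.length === 4) return lower.charAt(0).toUpperCase() + lower.slice(1); // script, e.g. 'Latn'
      return lower;
    })
    .join('-');
}

normalizeCase('EN-latn-us'); // 'en-Latn-US'
```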
84
+
85
+ ### filter
86
+
87
+ Find all matching tags with subtag-aware filtering per [RFC 4647 &sect;3.3.2](https://datatracker.ietf.org/doc/html/rfc4647#section-3.3.2):
88
+
89
+ ```ts
90
+ import { filter } from 'bcp47';
91
+
92
+ const tags = ['de', 'de-DE', 'de-Latn-DE', 'de-AT', 'en-US', 'fr-FR'];
93
+
94
+ filter(tags, 'de-DE'); // ['de-DE', 'de-Latn-DE'] (skips Latn to match DE)
95
+ filter(tags, 'de'); // ['de', 'de-DE', 'de-Latn-DE', 'de-AT'] (all German)
96
+ filter(tags, '*-DE'); // ['de-DE', 'de-Latn-DE'] (* wildcard = any language)
97
+ ```
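
For readers who want to see how extended filtering decides a match, here is a minimal sketch of the RFC 4647 §3.3.2 comparison. It is an independent illustration, not this package's implementation: `extendedMatch` and `filterTags` are hypothetical names, and the sketch assumes well-formed input and a single range.

```typescript
// Sketch of RFC 4647 section 3.3.2 extended filtering for one range and one tag.
function extendedMatch(range: string, tag: string): boolean {
  const r = range.toLowerCase().split('-');
  const t = tag.toLowerCase().split('-');

  // The first subtags must match unless the range starts with the wildcard.
  if (r[0] !== '*' && r[0] !== t[0]) return false;

  let ri = 1;
  let ti = 1;
  while (ri < r.length) {
    const rs = r[ri];
    if (rs === '*') { ri++; continue; }      // wildcard matches any run of subtags
    const ts = t[ti];
    if (ts === undefined) return false;      // tag ran out before the range did
    if (ts === rs) { ri++; ti++; continue; } // subtags agree, advance both
    if (ts.length === 1) return false;       // never skip past a singleton
    ti++;                                    // skip a non-matching tag subtag
  }
  return true;
}

function filterTags(tags: string[], range: string): string[] {
  return tags.filter((tag) => extendedMatch(range, tag));
}

const available = ['de', 'de-DE', 'de-Latn-DE', 'de-AT', 'en-US', 'fr-FR'];
filterTags(available, 'de-DE'); // ['de-DE', 'de-Latn-DE']
```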
98
+
99
+ ### lookup
100
+
101
+ Find the single best match via progressive truncation per [RFC 4647 &sect;3.4](https://datatracker.ietf.org/doc/html/rfc4647#section-3.4):
102
+
103
+ ```ts
104
+ import { lookup } from 'bcp47';
105
+
106
+ const tags = ['en', 'en-US', 'fr', 'de'];
107
+
108
+ lookup(tags, 'en-US-x-custom'); // 'en-US' (truncates to match)
109
+ lookup(tags, 'fr-CA'); // 'fr' (truncates to match)
110
+ lookup(tags, 'ja', 'en'); // 'en' (default fallback)
111
+ ```
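
Progressive truncation is easy to picture in code: drop the last subtag, and if that leaves a singleton (such as `x`) at the end, drop it too, then try again. The sketch below is a stand-alone illustration of RFC 4647 §3.4 for a single range. `lookupTag` is a hypothetical name, and the sketch does not handle `*` ranges or lists of preferences.

```typescript
// Sketch of RFC 4647 section 3.4 lookup for one language range.
function lookupTag(
  available: string[],
  range: string,
  fallback: string | null = null,
): string | null {
  // Compare case-insensitively but return the tag as the caller spelled it.
  const byLower = new Map(available.map((tag) => [tag.toLowerCase(), tag] as const));
  const parts = range.toLowerCase().split('-');

  while (parts.length > 0) {
    const hit = byLower.get(parts.join('-'));
    if (hit !== undefined) return hit;
    parts.pop(); // truncate from the end
    // A trailing singleton cannot end a tag, so remove it as well.
    while (parts.length > 0 && (parts[parts.length - 1] ?? '').length === 1) {
      parts.pop();
    }
  }
  return fallback;
}

lookupTag(['en', 'en-US', 'fr', 'de'], 'en-US-x-custom'); // 'en-US'
```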
112
+
113
+ Pair with `acceptLanguage` for HTTP content negotiation:
114
+
115
+ ```ts
116
+ import { acceptLanguage, lookup } from 'bcp47';
117
+
118
+ const prefs = acceptLanguage('fr-CA, en-US;q=0.8, en;q=0.5');
119
+ const best = lookup(['en', 'en-US', 'fr', 'fr-CA'], prefs.map((p) => p.tag));
120
+ // 'fr-CA'
121
+ ```
122
+
123
+ ### extensionU / extensionT
124
+
125
+ Extract Unicode locale and transformed content extensions:
126
+
127
+ ```ts
128
+ import { parse, extensionU, extensionT } from 'bcp47';
129
+
130
+ extensionU(parse('de-DE-u-co-phonebk-ca-buddhist')!);
131
+ // { attributes: [], keywords: { co: 'phonebk', ca: 'buddhist' } }
132
+
133
+ extensionT(parse('und-t-it-m0-ungegn')!);
134
+ // { source: 'it', fields: { m0: 'ungegn' } }
135
+ ```
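
To show where the `attributes` and `keywords` in that output come from, here is a rough sketch of reading a `u` extension directly from a tag string. It assumes a well-formed tag and follows the general layout of the extension: attributes first, then two-character keys each followed by their type subtags. `readUnicodeExtension` and the `UnicodeExtension` interface are hypothetical names for illustration, not this package's API.

```typescript
interface UnicodeExtension {
  attributes: string[];
  keywords: Record<string, string>;
}

// Sketch: extract the 'u' extension from a well-formed tag string.
function readUnicodeExtension(tag: string): UnicodeExtension | null {
  const subtags = tag.toLowerCase().split('-');

  // Find the 'u' singleton, ignoring anything inside private use ('x').
  let start = -1;
  for (let i = 1; i < subtags.length; i++) {
    if (subtags[i] === 'x') break;
    if (subtags[i] === 'u') { start = i; break; }
  }
  if (start === -1) return null;

  const result: UnicodeExtension = { attributes: [], keywords: {} };
  let key: string | null = null;
  for (const subtag of subtags.slice(start + 1)) {
    if (subtag.length === 1) break;          // next singleton ends the extension
    if (subtag.length === 2) {               // two characters: a new key
      key = subtag;
      result.keywords[key] = '';
    } else if (key === null) {               // before any key: an attribute
      result.attributes.push(subtag);
    } else {                                 // type subtag for the current key
      const prev = result.keywords[key];
      result.keywords[key] = prev ? `${prev}-${subtag}` : subtag;
    }
  }
  return result;
}

readUnicodeExtension('de-DE-u-co-phonebk-ca-buddhist');
// { attributes: [], keywords: { co: 'phonebk', ca: 'buddhist' } }
```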
136
+
137
+ ### acceptLanguage
138
+
139
+ Parse HTTP `Accept-Language` headers:
140
+
141
+ ```ts
142
+ import { acceptLanguage } from 'bcp47';
143
+
144
+ acceptLanguage('fr-CA, en-US;q=0.8, en;q=0.5, *;q=0.1');
145
+ // [
146
+ // { tag: 'fr-CA', quality: 1.0 },
147
+ // { tag: 'en-US', quality: 0.8 },
148
+ // { tag: 'en', quality: 0.5 },
149
+ // { tag: '*', quality: 0.1 }
150
+ // ]
151
+ ```
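
The header format itself is simple enough to sketch by hand: comma-separated ranges, each with an optional `q` weight that defaults to 1. The following is a simplified illustration of that structure, not this package's parser. `parseAcceptLanguage` and `LanguagePreference` are hypothetical names, and the sketch neither validates the language ranges nor drops `q=0` entries.

```typescript
interface LanguagePreference {
  tag: string;
  quality: number;
}

// Sketch: split an Accept-Language header and sort by descending weight.
function parseAcceptLanguage(header: string): LanguagePreference[] {
  const prefs: LanguagePreference[] = [];
  for (const item of header.split(',')) {
    const [rawTag = '', ...params] = item.split(';').map((s) => s.trim());
    if (rawTag === '') continue; // skip empty list elements

    let quality = 1; // a missing weight means q=1
    for (const param of params) {
      // qvalue grammar: 0 to 1 with at most three decimal digits
      const match = /^q=(0(?:\.\d{0,3})?|1(?:\.0{0,3})?)$/i.exec(param);
      if (match) quality = Number(match[1]);
    }
    prefs.push({ tag: rawTag, quality });
  }
  // Array.prototype.sort is stable, so equal weights keep header order.
  return prefs.sort((a, b) => b.quality - a.quality);
}

parseAcceptLanguage('en;q=0.5, fr-CA');
// [{ tag: 'fr-CA', quality: 1 }, { tag: 'en', quality: 0.5 }]
```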
152
+
153
+ See the [`examples/`](./examples) folder for more usage patterns.
154
+
155
+ ## Operator Reference
156
+
157
+ | Operator | Description |
158
+ |----------|-------------|
159
+ | `parse(tag)` | Parse a BCP 47 tag string into a structured object. Returns `null` for invalid input. |
160
+ | `stringify(tag)` | Convert a parsed tag object back into a well-formed string. |
161
+ | `langtag(language, options?)` | Build a langtag object from a language subtag and optional parts, without parsing a string. Validates subtags and throws `RangeError` on invalid input. |
162
+ | `canonicalize(tag)` | Normalize casing, sort extensions, apply [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) mappings (deprecated subtags, suppress-script, extlang). Returns `null` for invalid input. |
163
+ | `filter(tags, patterns)` | Subtag-aware filtering with `*` wildcard support per RFC 4647 &sect;3.3.2. Returns matched tags. |
164
+ | `lookup(tags, preferences, defaultValue?)` | Best-match lookup via progressive truncation per RFC 4647 &sect;3.4. Returns the first match, or `defaultValue`/`null` when nothing matches. |
165
+ | `extensionU(tag)` | Extract Unicode locale attributes and keywords from the `u` extension. Takes a `BCP47Tag` (not a string). Returns `null` if absent. |
166
+ | `extensionT(tag)` | Extract transformed content data from the `t` extension. Takes a `BCP47Tag` (not a string). Returns `null` if absent. |
167
+ | `acceptLanguage(header)` | Parse an `Accept-Language` header into sorted `{ tag, quality }` entries. |
168
+
169
+ ## CLDR Key References
170
+
171
+ Typed constants mapping extension keys to human-readable descriptions, sourced from the [CLDR BCP 47 data](https://github.com/unicode-org/cldr/tree/main/common/bcp47). They are tree-shaken when unused, so they add nothing to your bundle unless you import them.
172
+
173
+ | Constant | Description |
174
+ |----------|-------------|
175
+ | `UNICODE_LOCALE_KEYS` | U extension keys → descriptions (e.g. `ca` → `'Calendar'`, `nu` → `'Numbering system'`) |
176
+ | `TRANSFORM_KEYS` | T extension keys → descriptions (e.g. `m0` → `'Transform mechanism'`, `s0` → `'Transform source'`) |
177
+
178
+ ## Choosing an Operator
179
+
180
+ | I want to... | Use |
181
+ |--------------|-----|
182
+ | Validate a language tag string | `parse(tag) !== null` |
183
+ | Read subtags (language, script, region) | `parse(tag)` → access `.langtag.*` |
184
+ | Build a tag from known parts | `langtag(language, options)` → `stringify(tag)` |
185
+ | Normalize casing and deprecated subtags | `canonicalize(tag)` |
186
+ | Read Unicode locale preferences (calendar, collation) | `parse(tag)` → `extensionU(parsedTag)` |
187
+ | Read transformed content metadata (source language) | `parse(tag)` → `extensionT(parsedTag)` |
188
+ | Find all locales matching a preference | `filter(tags, patterns)` |
189
+ | Pick the single best locale for a user | `lookup(tags, preferences, defaultValue)` |
190
+ | Parse an HTTP Accept-Language header | `acceptLanguage(header)` → `lookup()` or `filter()` |
191
+
192
+ > **Note:** `canonicalize` and `acceptLanguage` take strings. `extensionU` and `extensionT` take a pre-parsed `BCP47Tag` from `parse()`. This avoids re-parsing when you need multiple operations on the same tag.
193
+
194
+ ## Changelog
195
+
196
+ See [CHANGELOG.md](./CHANGELOG.md) for breaking changes and release history.
197
+
198
+ ## License
199
+
200
+ [MIT](./LICENSE)