@astilba/core 0.1.0
- package/LICENSE +21 -0
- package/README.md +60 -0
- package/dist/cldr.d.ts +52 -0
- package/dist/cldr.js +258 -0
- package/dist/cldr.js.map +1 -0
- package/dist/harness.d.ts +68 -0
- package/dist/harness.js +47 -0
- package/dist/harness.js.map +1 -0
- package/dist/index.d.ts +98 -0
- package/dist/index.js +202 -0
- package/dist/index.js.map +1 -0
- package/dist/model-5mrSQGoC.d.ts +102 -0
- package/package.json +62 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Rees Morris
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,60 @@
+# @astilba/core — Phase 0
+
+The **format-neutral heart** of Astilba: a canonical i18n message model, vendored
+CLDR plural rules, the `AstilbaError` class, the **round-trip message-fidelity
+harness** (a generic driver plus the `FormatAdapter` / `RenderOracle` contracts a
+syntax adapter implements), and MT masking + placeholder validation over canonical
+tokens.
+
+This package knows nothing about any specific file format or message syntax — that
+lives in a syntax adapter such as
+[`@astilba/adapter-i18next-v4`](https://github.com/astilbahq/astilba/tree/main/packages/adapter-i18next-v4)
+(internal for now, not yet published), which depends on this package and plugs in the
+native i18next-v4 parser, exporter, and render oracle. **The adapter is native i18next v4 only; ICU is rejected loudly,
+never silently approximated.**
+
+```bash
+pnpm install
+vp test # the fidelity matrix IS the test suite
+vp run typecheck
+```
+
+## What "lossless" means here
+
+Equivalence is on **resolved `t()` output**, not file bytes. An adapter may
+renormalise freely (nested↔flat keys, key ordering, suffix re-derivation), so byte
+equality is explicitly _not_ the test. The harness drives an adapter to import a
+bundle `B → C` (canonical), export `C → B'`, and asserts that for every probe
+vector — each context value and each plural category **that has a value**, with a
+representative `count` per category — the adapter's render of `B'` is byte-identical
+to its render of `B`.
+
+## Module map
+
+| Module | Role |
+| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `model.ts` | The canonical model: `Key → contexts → PluralSet(kind, category→Value)`. Value text is preserved byte-exact (`Value.raw`); `tokens` is a derived view used for masking/validation. |
+| `cldr.ts` | **Vendored** CLDR plural rules — host-ICU-independent. Category sets, `select(n)`, and a representative count per category, for en/de/fr/ru/pl/ar/ja/zh/ko. Unknown languages fall back to `other`-only. |
+| `errors.ts` | `AstilbaError` — one error class with an open string `code`, so each adapter declares its own code set without coupling core to a syntax. |
+| `harness.ts` | The generic round-trip driver (`runRoundTrip`) and the `FormatAdapter` / `RenderOracle` contracts an adapter implements. Knows only the canonical model — nothing format-specific. |
+| `mask.ts` | MT masking/unmasking over canonical tokens plus the fail-closed, CI-failable placeholder validator. The string-entry validator takes a `Tokenizer` by injection, so it is callable standalone with any syntax adapter's tokenizer. |
+
+The model and CLDR rules are also reachable via the `@astilba/core/cldr` subpath.
+
+## MT masking & placeholder validation
+
+`maskTokens` replaces every non-text token (interpolation, `$t()` nesting, markup)
+with an opaque sentinel before an MT/LLM call — the formatter keyword and nesting
+ref live _inside_ the masked span, so the engine never translates them — and
+`unmask` restores them. `validatePlaceholderTokens` is a fail-closed check that
+every interpolation variable, formatter keyword, nesting ref (and its options), and
+markup tag survived translation unmodified; `validatePlaceholders(source,
+translated, tokenize)` is the raw-string entry point for CI.
+
+## Known Phase-0 boundaries (logged, not bugs)
+
+- **One plural _kind_ per (key, context):** the model holds either a cardinal or an
+  ordinal map per cell, so a key used as both is rejected by the adapter
+  (`INVALID_RESOURCE`) rather than silently merged.
+- The **vendored CLDR table** covers the corpus-plan languages; other languages fall
+  back to `other`-only unless a full table is vendored.
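The round-trip check described in the README's "lossless" section can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the package's `runRoundTrip`: `Bundle`, `ToyModel`, `ToyAdapter`, `ToyOracle`, `ToyMismatch`, and `roundTripSketch` are hypothetical names, and the toy bundle is a flat key-to-string map with no plurals or contexts.

```typescript
// Hypothetical, simplified stand-ins for illustration only; these are NOT
// the types or functions exported by @astilba/core.
type Bundle = Record<string, string>;
interface ToyModel {
  entries: Array<{ key: string; raw: string }>;
}
interface ToyAdapter {
  parse: (bundle: Bundle) => ToyModel;
  export: (model: ToyModel) => Bundle;
}
interface ToyOracle {
  vectors: (model: ToyModel) => string[];
  render: (bundle: Bundle, key: string) => string | undefined;
}
interface ToyMismatch {
  key: string;
  fromSource: string | undefined;
  afterRoundTrip: string | undefined;
}

// Import B -> C, export C -> B', then compare the renders of B and B' per probe.
const roundTripSketch = (
  adapter: ToyAdapter,
  oracle: ToyOracle,
  source: Bundle
): ToyMismatch[] => {
  const model = adapter.parse(source);
  const exported = adapter.export(model);
  const mismatches: ToyMismatch[] = [];
  for (const key of oracle.vectors(model)) {
    const fromSource = oracle.render(source, key);
    const afterRoundTrip = oracle.render(exported, key);
    if (fromSource !== afterRoundTrip) {
      mismatches.push({ key, fromSource, afterRoundTrip });
    }
  }
  return mismatches;
};

const parse = (bundle: Bundle): ToyModel => ({
  entries: Object.keys(bundle).map((key) => ({ key, raw: bundle[key] ?? "" })),
});

// Export with keys sorted alphabetically, applying `transform` to each value.
const buildBundle = (model: ToyModel, transform: (raw: string) => string): Bundle => {
  const out: Bundle = {};
  const sorted = model.entries
    .slice()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  for (const entry of sorted) {
    out[entry.key] = transform(entry.raw);
  }
  return out;
};

// Renormalises key order only: file bytes change, renders do not.
const sortingAdapter: ToyAdapter = { parse, export: (m) => buildBundle(m, (raw) => raw) };
// Trims whitespace on export: a lossy change the check should catch.
const lossyAdapter: ToyAdapter = { parse, export: (m) => buildBundle(m, (raw) => raw.trim()) };

const toyOracle: ToyOracle = {
  vectors: (model) => model.entries.map((entry) => entry.key),
  render: (bundle, key) => bundle[key],
};

const source: Bundle = { zebra: "Z ", apple: "A" };
const bytesDiffer =
  JSON.stringify(source) !== JSON.stringify(sortingAdapter.export(parse(source)));
const clean = roundTripSketch(sortingAdapter, toyOracle, source);
const lossy = roundTripSketch(lossyAdapter, toyOracle, source);
```

In this sketch the sorting adapter changes the serialized bytes yet yields no mismatches, while the trimming adapter is flagged for the one key whose rendered text changed. That is the distinction the README draws between byte equality and render equality.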
package/dist/cldr.d.ts
ADDED
@@ -0,0 +1,52 @@
+import { a as PluralKind, n as CLDRCategory } from "./model-5mrSQGoC.js";
+
+//#region src/cldr.d.ts
+/** CLDR plural operands for a non-negative number `n`. */
+interface Operands {
+  n: number;
+  i: number;
+  v: number;
+  f: number;
+  t: number;
+}
+declare const operands: (input: number) => Operands;
+interface PluralRule {
+  categories: CLDRCategory[];
+  select: (n: number) => CLDRCategory;
+  /** A representative count for each category in `categories`. */
+  representative: Partial<Record<CLDRCategory, number>>;
+}
+interface LanguagePlurals {
+  cardinal: PluralRule;
+  ordinal: PluralRule;
+}
+/** Primary subtags with vendored CLDR rules. */
+declare const SUPPORTED_LANGUAGES: string[];
+/**
+ * Reduce a BCP-47 tag to its primary language subtag (`en-US` -> `en`).
+ *
+ * KNOWN LIMITATION: this collapses the region, but a few languages have
+ * region-specific CLDR plural rules — most notably `pt-PT` differs from `pt`
+ * (Brazilian). Harmless while the vendored table is small (those languages
+ * aren't in it, so both fall back to `other`-only), but when the full
+ * `make-plural` table is vendored, region-specific entries (e.g. `pt-PT`) must
+ * be looked up before the primary subtag (a v1.0 item).
+ */
+declare const primarySubtag: (language: string) => string;
+declare const isSupportedLanguage: (language: string) => boolean;
+/**
+ * Resolve a language's plural rules. Unknown languages fall back to `other`-only
+ * (graceful) — callers that need to warn can check
+ * `isSupportedLanguage` first.
+ */
+declare const getPlurals: (language: string) => LanguagePlurals;
+declare const categoriesFor: (language: string, kind: PluralKind) => CLDRCategory[];
+/** The union of cardinal + ordinal categories for a language (used by disambiguation). */
+declare const allCategoriesFor: (language: string) => Set<CLDRCategory>;
+/** Select the category for a count, honoring the ordinal flag. */
+declare const selectCategory: (language: string, count: number, ordinal: boolean) => CLDRCategory;
+/** A representative count that selects `category` for the given language/kind. */
+declare const representativeCount: (language: string, kind: "cardinal" | "ordinal", category: CLDRCategory) => number;
+//#endregion
+export { LanguagePlurals, PluralRule, SUPPORTED_LANGUAGES, allCategoriesFor, categoriesFor, getPlurals, isSupportedLanguage, operands, primarySubtag, representativeCount, selectCategory };
+//# sourceMappingURL=cldr.d.ts.map
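The doc comment on `getPlurals` suggests checking `isSupportedLanguage` before relying on the silent `other`-only fallback. A minimal sketch of that pattern follows, assuming a local re-creation of the two helpers so the block runs on its own. `primarySubtagSketch`, `isSupportedSketch`, `resolveWithWarning`, and `Resolution` are hypothetical names, and the language list is copied from the README's module map.

```typescript
// Hypothetical re-creation for illustration; a real caller would import
// `primarySubtag` / `isSupportedLanguage` from the package instead.
const VENDORED_LANGUAGES: readonly string[] = [
  "ar", "de", "en", "fr", "ja", "ko", "pl", "ru", "zh",
];

// Lowercase the tag and keep everything before the first "-" or "_".
const primarySubtagSketch = (language: string): string => {
  const lower = language.toLowerCase();
  return lower.split(/[-_]/)[0] ?? lower;
};

const isSupportedSketch = (language: string): boolean =>
  VENDORED_LANGUAGES.indexOf(primarySubtagSketch(language)) !== -1;

interface Resolution {
  requested: string;
  subtag: string;
  fallback: boolean;
}

// Check support first so the otherwise silent fallback becomes visible.
const resolveWithWarning = (language: string, warnings: string[]): Resolution => {
  const subtag = primarySubtagSketch(language);
  const fallback = !isSupportedSketch(language);
  if (fallback) {
    warnings.push(`no vendored plural rules for "${language}"; using other-only`);
  }
  return { requested: language, subtag, fallback };
};

const warnings: string[] = [];
const resolved = ["en-US", "zh_Hant", "pt-PT", "pt-BR"].map((tag) =>
  resolveWithWarning(tag, warnings)
);
```

Both Portuguese tags collapse to `pt` and fall back here, which matches the known limitation noted on `primarySubtag`: region-specific rules cannot be distinguished once the tag is reduced to its primary subtag.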
package/dist/cldr.js
ADDED
@@ -0,0 +1,258 @@
+//#region src/cldr.ts
+const operands = (input) => {
+  const n = Math.abs(input);
+  const s = n.toString();
+  const dot = s.indexOf(".");
+  if (dot === -1) return {
+    f: 0,
+    i: n,
+    n,
+    t: 0,
+    v: 0
+  };
+  const intPart = s.slice(0, dot);
+  const fracPart = s.slice(dot + 1);
+  const trimmed = fracPart.replace(/0+$/u, "");
+  return {
+    f: fracPart === "" ? 0 : Number.parseInt(fracPart, 10),
+    i: Number.parseInt(intPart, 10),
+    n,
+    t: trimmed === "" ? 0 : Number.parseInt(trimmed, 10),
+    v: fracPart.length
+  };
+};
+const inRange = (x, lo, hi) => x >= lo && x <= hi;
+/** The trivial single-category rule (Japanese, Chinese, Korean, ... and the fallback). */
+const OTHER_ONLY = {
+  categories: ["other"],
+  representative: { other: 1 },
+  select: () => "other"
+};
+/**
+ * The vendored table. Keys are base languages (BCP-47 primary subtag).
+ * Languages whose ordinal has no special rule use OTHER_ONLY for ordinal.
+ */
+const TABLE = {
+  ar: {
+    cardinal: {
+      categories: [
+        "zero",
+        "one",
+        "two",
+        "few",
+        "many",
+        "other"
+      ],
+      representative: {
+        few: 3,
+        many: 11,
+        one: 1,
+        other: 100,
+        two: 2,
+        zero: 0
+      },
+      select: (n) => {
+        const o = operands(n);
+        if (o.n === 0) return "zero";
+        if (o.n === 1) return "one";
+        if (o.n === 2) return "two";
+        const mod100 = o.i % 100;
+        if (o.v === 0 && inRange(mod100, 3, 10)) return "few";
+        if (o.v === 0 && inRange(mod100, 11, 99)) return "many";
+        return "other";
+      }
+    },
+    ordinal: OTHER_ONLY
+  },
+  de: {
+    cardinal: {
+      categories: ["one", "other"],
+      representative: {
+        one: 1,
+        other: 2
+      },
+      select: (n) => {
+        const o = operands(n);
+        return o.i === 1 && o.v === 0 ? "one" : "other";
+      }
+    },
+    ordinal: OTHER_ONLY
+  },
+  en: {
+    cardinal: {
+      categories: ["one", "other"],
+      representative: {
+        one: 1,
+        other: 2
+      },
+      select: (n) => {
+        const o = operands(n);
+        return o.i === 1 && o.v === 0 ? "one" : "other";
+      }
+    },
+    ordinal: {
+      categories: [
+        "one",
+        "two",
+        "few",
+        "other"
+      ],
+      representative: {
+        few: 3,
+        one: 1,
+        other: 4,
+        two: 2
+      },
+      select: (n) => {
+        const { i } = operands(n);
+        const mod10 = i % 10;
+        const mod100 = i % 100;
+        if (mod10 === 1 && mod100 !== 11) return "one";
+        if (mod10 === 2 && mod100 !== 12) return "two";
+        if (mod10 === 3 && mod100 !== 13) return "few";
+        return "other";
+      }
+    }
+  },
+  fr: {
+    cardinal: {
+      categories: [
+        "one",
+        "many",
+        "other"
+      ],
+      representative: {
+        many: 1e6,
+        one: 1,
+        other: 2
+      },
+      select: (n) => {
+        const o = operands(n);
+        if (o.i === 0 || o.i === 1) return "one";
+        if (o.i % 1e6 === 0 && o.v === 0) return "many";
+        return "other";
+      }
+    },
+    ordinal: {
+      categories: ["one", "other"],
+      representative: {
+        one: 1,
+        other: 2
+      },
+      select: (n) => operands(n).n === 1 ? "one" : "other"
+    }
+  },
+  ja: {
+    cardinal: OTHER_ONLY,
+    ordinal: OTHER_ONLY
+  },
+  ko: {
+    cardinal: OTHER_ONLY,
+    ordinal: OTHER_ONLY
+  },
+  pl: {
+    cardinal: {
+      categories: [
+        "one",
+        "few",
+        "many",
+        "other"
+      ],
+      representative: {
+        few: 2,
+        many: 5,
+        one: 1,
+        other: 1.5
+      },
+      select: (n) => {
+        const o = operands(n);
+        if (o.v !== 0) return "other";
+        if (o.i === 1) return "one";
+        const mod10 = o.i % 10;
+        const mod100 = o.i % 100;
+        if (inRange(mod10, 2, 4) && !inRange(mod100, 12, 14)) return "few";
+        return "many";
+      }
+    },
+    ordinal: OTHER_ONLY
+  },
+  ru: {
+    cardinal: {
+      categories: [
+        "one",
+        "few",
+        "many",
+        "other"
+      ],
+      representative: {
+        few: 2,
+        many: 5,
+        one: 1,
+        other: 1.5
+      },
+      select: (n) => {
+        const o = operands(n);
+        if (o.v !== 0) return "other";
+        const mod10 = o.i % 10;
+        const mod100 = o.i % 100;
+        if (mod10 === 1 && mod100 !== 11) return "one";
+        if (inRange(mod10, 2, 4) && !inRange(mod100, 12, 14)) return "few";
+        return "many";
+      }
+    },
+    ordinal: OTHER_ONLY
+  },
+  zh: {
+    cardinal: OTHER_ONLY,
+    ordinal: OTHER_ONLY
+  }
+};
+const FALLBACK = {
+  cardinal: OTHER_ONLY,
+  ordinal: OTHER_ONLY
+};
+/** Primary subtags with vendored CLDR rules. */
+const SUPPORTED_LANGUAGES = Object.keys(TABLE);
+/**
+ * Reduce a BCP-47 tag to its primary language subtag (`en-US` -> `en`).
+ *
+ * KNOWN LIMITATION: this collapses the region, but a few languages have
+ * region-specific CLDR plural rules — most notably `pt-PT` differs from `pt`
+ * (Brazilian). Harmless while the vendored table is small (those languages
+ * aren't in it, so both fall back to `other`-only), but when the full
+ * `make-plural` table is vendored, region-specific entries (e.g. `pt-PT`) must
+ * be looked up before the primary subtag (a v1.0 item).
+ */
+const primarySubtag = (language) => language.toLowerCase().split(/[-_]/u)[0] ?? language.toLowerCase();
+const isSupportedLanguage = (language) => primarySubtag(language) in TABLE;
+/**
+ * Resolve a language's plural rules. Unknown languages fall back to `other`-only
+ * (graceful) — callers that need to warn can check
+ * `isSupportedLanguage` first.
+ */
+const getPlurals = (language) => TABLE[primarySubtag(language)] ?? FALLBACK;
+const categoriesFor = (language, kind) => {
+  if (kind === "none") return ["other"];
+  const p = getPlurals(language);
+  return kind === "ordinal" ? p.ordinal.categories : p.cardinal.categories;
+};
+/** The union of cardinal + ordinal categories for a language (used by disambiguation). */
+const allCategoriesFor = (language) => {
+  const p = getPlurals(language);
+  return new Set([...p.cardinal.categories, ...p.ordinal.categories]);
+};
+/** Select the category for a count, honoring the ordinal flag. */
+const selectCategory = (language, count, ordinal) => {
+  const p = getPlurals(language);
+  return ordinal ? p.ordinal.select(count) : p.cardinal.select(count);
+};
+/** A representative count that selects `category` for the given language/kind. */
+const representativeCount = (language, kind, category) => {
+  if (kind === "cardinal" && category === "zero") return 0;
+  const p = getPlurals(language);
+  return (kind === "ordinal" ? p.ordinal : p.cardinal).representative[category] ?? 1;
+};
+//#endregion
+export { SUPPORTED_LANGUAGES, allCategoriesFor, categoriesFor, getPlurals, isSupportedLanguage, operands, primarySubtag, representativeCount, selectCategory };
+
+//# sourceMappingURL=cldr.js.map
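Each rule in the table pairs `select` with a `representative` count per category, and the README says the harness probes one representative count per category. That only works if every representative actually selects its own category. Here is a minimal sketch of a self-check for that invariant, assuming a local transcription of the Polish cardinal rule. `RuleSketch`, `intAndFractionDigits`, `badRepresentatives`, and `brokenRule` are hypothetical names and are not part of the package.

```typescript
// Local transcription for illustration; these names are hypothetical and
// are not exported by @astilba/core.
type Category = "zero" | "one" | "two" | "few" | "many" | "other";

interface RuleSketch {
  categories: Category[];
  select: (n: number) => Category;
  representative: Partial<Record<Category, number>>;
}

// Integer part `i` and visible fraction-digit count `v` of |input|.
const intAndFractionDigits = (input: number): { i: number; v: number } => {
  const abs = Math.abs(input);
  const s = abs.toString();
  const dot = s.indexOf(".");
  if (dot === -1) return { i: abs, v: 0 };
  return { i: parseInt(s.slice(0, dot), 10), v: s.length - dot - 1 };
};

const between = (x: number, lo: number, hi: number): boolean => x >= lo && x <= hi;

// Polish cardinal rule, transcribed from the `pl` entry of the table.
const plCardinal: RuleSketch = {
  categories: ["one", "few", "many", "other"],
  representative: { few: 2, many: 5, one: 1, other: 1.5 },
  select: (n) => {
    const { i, v } = intAndFractionDigits(n);
    if (v !== 0) return "other";
    if (i === 1) return "one";
    if (between(i % 10, 2, 4) && !between(i % 100, 12, 14)) return "few";
    return "many";
  },
};

// Same rule with a deliberately wrong representative for `few` (5 selects `many`).
const brokenRule: RuleSketch = {
  categories: plCardinal.categories,
  select: plCardinal.select,
  representative: { few: 5, many: 5, one: 1, other: 1.5 },
};

// Categories whose representative is missing or selects a different category.
const badRepresentatives = (rule: RuleSketch): Category[] =>
  rule.categories.filter((category) => {
    const count = rule.representative[category];
    return count === undefined || rule.select(count) !== category;
  });

const plProblems = badRepresentatives(plCardinal);
const brokenProblems = badRepresentatives(brokenRule);
```

A check of this shape could run over every entry of a vendored table. Whether the package's own test suite does so is not shown in this diff.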
package/dist/cldr.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"cldr.js","names":[],"sources":["../src/cldr.ts"],"sourcesContent":["/**\n * Vendored CLDR plural rules.\n *\n * Why vendored and not `Intl.PluralRules`: native i18next resolves plurals\n * against the END USER's ambient browser/host ICU, which drifts across CLDR\n * versions and across Bun/Node/Workers. If our exporter emits a suffix set that\n * disagrees with what the client's ICU asks for, the user sees a silent miss.\n * So we pin one table here, used by BOTH the exporter (which suffix keys to emit)\n * and the round-trip resolver (which suffix a given count selects). This makes\n * emission host-independent and the harness fully deterministic.\n *\n * Each rule carries:\n * - `categories`: the ordered set of categories that apply to the language.\n * - `select(n)`: the CLDR rule, count -> category.\n * - `representative`: one count per category, for building round-trip test\n * vectors (a representative count).\n *\n * Rules transcribed from CLDR plural-rule data. This vendored table is the\n * Phase-0 stand-in for `make-plural`; the corpus-plan languages plus a few\n * common ones are covered. Unknown languages fall back to `other`-only\n * (custom variants are opaque tags that fall back gracefully).\n */\n\nimport type { CLDRCategory, PluralKind } from \"./model.ts\";\n\n/** CLDR plural operands for a non-negative number `n`. 
*/\ninterface Operands {\n // absolute value\n n: number;\n // integer digits\n i: number;\n // count of visible fraction digits (with trailing zeros)\n v: number;\n // visible fraction digits (with trailing zeros), as integer\n f: number;\n // visible fraction digits (without trailing zeros), as integer\n t: number;\n}\n\nexport const operands = (input: number): Operands => {\n const n = Math.abs(input);\n const s = n.toString();\n const dot = s.indexOf(\".\");\n if (dot === -1) {\n return { f: 0, i: n, n, t: 0, v: 0 };\n }\n const intPart = s.slice(0, dot);\n const fracPart = s.slice(dot + 1);\n const trimmed = fracPart.replace(/0+$/u, \"\");\n return {\n f: fracPart === \"\" ? 0 : Number.parseInt(fracPart, 10),\n i: Number.parseInt(intPart, 10),\n n,\n t: trimmed === \"\" ? 0 : Number.parseInt(trimmed, 10),\n v: fracPart.length,\n };\n};\n\nconst inRange = (x: number, lo: number, hi: number): boolean =>\n x >= lo && x <= hi;\n\nexport interface PluralRule {\n categories: CLDRCategory[];\n select: (n: number) => CLDRCategory;\n /** A representative count for each category in `categories`. */\n representative: Partial<Record<CLDRCategory, number>>;\n}\n\nexport interface LanguagePlurals {\n cardinal: PluralRule;\n ordinal: PluralRule;\n}\n\n/** The trivial single-category rule (Japanese, Chinese, Korean, ... and the fallback). */\nconst OTHER_ONLY: PluralRule = {\n categories: [\"other\"],\n representative: { other: 1 },\n select: () => \"other\",\n};\n\nconst EN_CARDINAL: PluralRule = {\n categories: [\"one\", \"other\"],\n representative: { one: 1, other: 2 },\n select: (n) => {\n const o = operands(n);\n return o.i === 1 && o.v === 0 ? 
\"one\" : \"other\";\n },\n};\n\nconst EN_ORDINAL: PluralRule = {\n categories: [\"one\", \"two\", \"few\", \"other\"],\n representative: { few: 3, one: 1, other: 4, two: 2 },\n select: (n) => {\n // CLDR ordinal categories key off the integer operand `i`, which operands()\n // derives with Math.abs — so negative counts select the same form as i18next\n // (raw `n % 10` would give a negative remainder and miss the `one`/`two`/`few`\n // branches, e.g. -1 -> \"other\" instead of \"one\").\n const { i } = operands(n);\n const mod10 = i % 10;\n const mod100 = i % 100;\n // 1st, 21st\n if (mod10 === 1 && mod100 !== 11) {\n return \"one\";\n }\n // 2nd, 22nd\n if (mod10 === 2 && mod100 !== 12) {\n return \"two\";\n }\n // 3rd, 23rd\n if (mod10 === 3 && mod100 !== 13) {\n return \"few\";\n }\n // 4th, 11th, 12th, 13th\n return \"other\";\n },\n};\n\nconst FR_CARDINAL: PluralRule = {\n categories: [\"one\", \"many\", \"other\"],\n representative: { many: 1_000_000, one: 1, other: 2 },\n select: (n) => {\n const o = operands(n);\n if (o.i === 0 || o.i === 1) {\n return \"one\";\n }\n // CLDR fr `many`: large round numbers (10^6, ...) with no fraction digits.\n // (i === 0 already returned `one` above, so no `i !== 0` guard is needed here.)\n if (o.i % 1_000_000 === 0 && o.v === 0) {\n return \"many\";\n }\n return \"other\";\n },\n};\n\nconst FR_ORDINAL: PluralRule = {\n categories: [\"one\", \"other\"],\n representative: { one: 1, other: 2 },\n // CLDR fr ordinal \"one\" is `n = 1` on the absolute value; operands().n applies\n // Math.abs so -1 selects \"one\" like i18next (raw `n === 1` missed it).\n select: (n) => (operands(n).n === 1 ? \"one\" : \"other\"),\n};\n\nconst DE_CARDINAL: PluralRule = {\n categories: [\"one\", \"other\"],\n representative: { one: 1, other: 2 },\n select: (n) => {\n const o = operands(n);\n return o.i === 1 && o.v === 0 ? 
\"one\" : \"other\";\n },\n};\n\nconst RU_CARDINAL: PluralRule = {\n categories: [\"one\", \"few\", \"many\", \"other\"],\n representative: { few: 2, many: 5, one: 1, other: 1.5 },\n select: (n) => {\n const o = operands(n);\n // fractions\n if (o.v !== 0) {\n return \"other\";\n }\n const mod10 = o.i % 10;\n const mod100 = o.i % 100;\n if (mod10 === 1 && mod100 !== 11) {\n return \"one\";\n }\n if (inRange(mod10, 2, 4) && !inRange(mod100, 12, 14)) {\n return \"few\";\n }\n // mod10 in {0,5..9} or mod100 in 11..14\n return \"many\";\n },\n};\n\nconst PL_CARDINAL: PluralRule = {\n categories: [\"one\", \"few\", \"many\", \"other\"],\n representative: { few: 2, many: 5, one: 1, other: 1.5 },\n select: (n) => {\n const o = operands(n);\n // fractions\n if (o.v !== 0) {\n return \"other\";\n }\n if (o.i === 1) {\n return \"one\";\n }\n const mod10 = o.i % 10;\n const mod100 = o.i % 100;\n if (inRange(mod10, 2, 4) && !inRange(mod100, 12, 14)) {\n return \"few\";\n }\n return \"many\";\n },\n};\n\nconst AR_CARDINAL: PluralRule = {\n categories: [\"zero\", \"one\", \"two\", \"few\", \"many\", \"other\"],\n representative: { few: 3, many: 11, one: 1, other: 100, two: 2, zero: 0 },\n select: (n) => {\n const o = operands(n);\n if (o.n === 0) {\n return \"zero\";\n }\n if (o.n === 1) {\n return \"one\";\n }\n if (o.n === 2) {\n return \"two\";\n }\n const mod100 = o.i % 100;\n if (o.v === 0 && inRange(mod100, 3, 10)) {\n return \"few\";\n }\n if (o.v === 0 && inRange(mod100, 11, 99)) {\n return \"many\";\n }\n return \"other\";\n },\n};\n\n/**\n * The vendored table. 
Keys are base languages (BCP-47 primary subtag).\n * Languages whose ordinal has no special rule use OTHER_ONLY for ordinal.\n */\nconst TABLE: Record<string, LanguagePlurals> = {\n ar: { cardinal: AR_CARDINAL, ordinal: OTHER_ONLY },\n de: { cardinal: DE_CARDINAL, ordinal: OTHER_ONLY },\n en: { cardinal: EN_CARDINAL, ordinal: EN_ORDINAL },\n fr: { cardinal: FR_CARDINAL, ordinal: FR_ORDINAL },\n ja: { cardinal: OTHER_ONLY, ordinal: OTHER_ONLY },\n ko: { cardinal: OTHER_ONLY, ordinal: OTHER_ONLY },\n pl: { cardinal: PL_CARDINAL, ordinal: OTHER_ONLY },\n ru: { cardinal: RU_CARDINAL, ordinal: OTHER_ONLY },\n zh: { cardinal: OTHER_ONLY, ordinal: OTHER_ONLY },\n};\n\nconst FALLBACK: LanguagePlurals = { cardinal: OTHER_ONLY, ordinal: OTHER_ONLY };\n\n/** Primary subtags with vendored CLDR rules. */\nexport const SUPPORTED_LANGUAGES: string[] = Object.keys(TABLE);\n\n/**\n * Reduce a BCP-47 tag to its primary language subtag (`en-US` -> `en`).\n *\n * KNOWN LIMITATION: this collapses the region, but a few languages have\n * region-specific CLDR plural rules — most notably `pt-PT` differs from `pt`\n * (Brazilian). Harmless while the vendored table is small (those languages\n * aren't in it, so both fall back to `other`-only), but when the full\n * `make-plural` table is vendored, region-specific entries (e.g. `pt-PT`) must\n * be looked up before the primary subtag (a v1.0 item).\n */\nexport const primarySubtag = (language: string): string =>\n language.toLowerCase().split(/[-_]/u)[0] ?? language.toLowerCase();\n\nexport const isSupportedLanguage = (language: string): boolean =>\n primarySubtag(language) in TABLE;\n\n/**\n * Resolve a language's plural rules. Unknown languages fall back to `other`-only\n * (graceful) — callers that need to warn can check\n * `isSupportedLanguage` first.\n */\nexport const getPlurals = (language: string): LanguagePlurals =>\n TABLE[primarySubtag(language)] ?? 
FALLBACK;\n\nexport const categoriesFor = (\n language: string,\n kind: PluralKind\n): CLDRCategory[] => {\n if (kind === \"none\") {\n return [\"other\"];\n }\n const p = getPlurals(language);\n return kind === \"ordinal\" ? p.ordinal.categories : p.cardinal.categories;\n};\n\n/** The union of cardinal + ordinal categories for a language (used by disambiguation). */\nexport const allCategoriesFor = (language: string): Set<CLDRCategory> => {\n const p = getPlurals(language);\n return new Set<CLDRCategory>([\n ...p.cardinal.categories,\n ...p.ordinal.categories,\n ]);\n};\n\n/** Select the category for a count, honoring the ordinal flag. */\nexport const selectCategory = (\n language: string,\n count: number,\n ordinal: boolean\n): CLDRCategory => {\n const p = getPlurals(language);\n return ordinal ? p.ordinal.select(count) : p.cardinal.select(count);\n};\n\n/** A representative count that selects `category` for the given language/kind. */\nexport const representativeCount = (\n language: string,\n kind: \"cardinal\" | \"ordinal\",\n category: CLDRCategory\n): number => {\n // i18next's `_zero` cardinal special-case is reached at count 0 for EVERY\n // language, even ones whose CLDR set has no `zero` category.\n if (kind === \"cardinal\" && category === \"zero\") {\n return 0;\n }\n const p = getPlurals(language);\n const rule = kind === \"ordinal\" ? p.ordinal : p.cardinal;\n const rep = rule.representative[category];\n return rep ?? 
1;\n};\n"],"mappings":";AAuCA,MAAa,YAAY,UAA4B;CACnD,MAAM,IAAI,KAAK,IAAI,KAAK;CACxB,MAAM,IAAI,EAAE,SAAS;CACrB,MAAM,MAAM,EAAE,QAAQ,GAAG;CACzB,IAAI,QAAQ,IACV,OAAO;EAAE,GAAG;EAAG,GAAG;EAAG;EAAG,GAAG;EAAG,GAAG;CAAE;CAErC,MAAM,UAAU,EAAE,MAAM,GAAG,GAAG;CAC9B,MAAM,WAAW,EAAE,MAAM,MAAM,CAAC;CAChC,MAAM,UAAU,SAAS,QAAQ,QAAQ,EAAE;CAC3C,OAAO;EACL,GAAG,aAAa,KAAK,IAAI,OAAO,SAAS,UAAU,EAAE;EACrD,GAAG,OAAO,SAAS,SAAS,EAAE;EAC9B;EACA,GAAG,YAAY,KAAK,IAAI,OAAO,SAAS,SAAS,EAAE;EACnD,GAAG,SAAS;CACd;AACF;AAEA,MAAM,WAAW,GAAW,IAAY,OACtC,KAAK,MAAM,KAAK;;AAelB,MAAM,aAAyB;CAC7B,YAAY,CAAC,OAAO;CACpB,gBAAgB,EAAE,OAAO,EAAE;CAC3B,cAAc;AAChB;;;;;AAiJA,MAAM,QAAyC;CAC7C,IAAI;EAAE,UAAU;GA7BhB,YAAY;IAAC;IAAQ;IAAO;IAAO;IAAO;IAAQ;GAAO;GACzD,gBAAgB;IAAE,KAAK;IAAG,MAAM;IAAI,KAAK;IAAG,OAAO;IAAK,KAAK;IAAG,MAAM;GAAE;GACxE,SAAS,MAAM;IACb,MAAM,IAAI,SAAS,CAAC;IACpB,IAAI,EAAE,MAAM,GACV,OAAO;IAET,IAAI,EAAE,MAAM,GACV,OAAO;IAET,IAAI,EAAE,MAAM,GACV,OAAO;IAET,MAAM,SAAS,EAAE,IAAI;IACrB,IAAI,EAAE,MAAM,KAAK,QAAQ,QAAQ,GAAG,EAAE,GACpC,OAAO;IAET,IAAI,EAAE,MAAM,KAAK,QAAQ,QAAQ,IAAI,EAAE,GACrC,OAAO;IAET,OAAO;GACT;EAQ0B;EAAG,SAAS;CAAW;CACjD,IAAI;EAAE,UAAU;GAlFhB,YAAY,CAAC,OAAO,OAAO;GAC3B,gBAAgB;IAAE,KAAK;IAAG,OAAO;GAAE;GACnC,SAAS,MAAM;IACb,MAAM,IAAI,SAAS,CAAC;IACpB,OAAO,EAAE,MAAM,KAAK,EAAE,MAAM,IAAI,QAAQ;GAC1C;EA6E0B;EAAG,SAAS;CAAW;CACjD,IAAI;EAAE,UAAU;GAjJhB,YAAY,CAAC,OAAO,OAAO;GAC3B,gBAAgB;IAAE,KAAK;IAAG,OAAO;GAAE;GACnC,SAAS,MAAM;IACb,MAAM,IAAI,SAAS,CAAC;IACpB,OAAO,EAAE,MAAM,KAAK,EAAE,MAAM,IAAI,QAAQ;GAC1C;EA4I0B;EAAG,SAAS;GAxItC,YAAY;IAAC;IAAO;IAAO;IAAO;GAAO;GACzC,gBAAgB;IAAE,KAAK;IAAG,KAAK;IAAG,OAAO;IAAG,KAAK;GAAE;GACnD,SAAS,MAAM;IAKb,MAAM,EAAE,MAAM,SAAS,CAAC;IACxB,MAAM,QAAQ,IAAI;IAClB,MAAM,SAAS,IAAI;IAEnB,IAAI,UAAU,KAAK,WAAW,IAC5B,OAAO;IAGT,IAAI,UAAU,KAAK,WAAW,IAC5B,OAAO;IAGT,IAAI,UAAU,KAAK,WAAW,IAC5B,OAAO;IAGT,OAAO;GACT;EAgH+C;CAAE;CACjD,IAAI;EAAE,UAAU;GA7GhB,YAAY;IAAC;IAAO;IAAQ;GAAO;GACnC,gBAAgB;IAAE,MAAM;IAAW,KAAK;IAAG,OAAO;GAAE;GACpD,SAAS,MAAM;IACb,MAAM,IAAI,SAAS,CAAC;IACpB,IAAI,EAAE,MAAM,KAAK,EAAE,MAAM,GACvB,OAAO;IAIT,IAAI,
EAAE,IAAI,QAAc,KAAK,EAAE,MAAM,GACnC,OAAO;IAET,OAAO;GACT;EAgG0B;EAAG,SAAS;GA5FtC,YAAY,CAAC,OAAO,OAAO;GAC3B,gBAAgB;IAAE,KAAK;IAAG,OAAO;GAAE;GAGnC,SAAS,MAAO,SAAS,CAAC,CAAC,CAAC,MAAM,IAAI,QAAQ;EAwFC;CAAE;CACjD,IAAI;EAAE,UAAU;EAAY,SAAS;CAAW;CAChD,IAAI;EAAE,UAAU;EAAY,SAAS;CAAW;CAChD,IAAI;EAAE,UAAU;GAxDhB,YAAY;IAAC;IAAO;IAAO;IAAQ;GAAO;GAC1C,gBAAgB;IAAE,KAAK;IAAG,MAAM;IAAG,KAAK;IAAG,OAAO;GAAI;GACtD,SAAS,MAAM;IACb,MAAM,IAAI,SAAS,CAAC;IAEpB,IAAI,EAAE,MAAM,GACV,OAAO;IAET,IAAI,EAAE,MAAM,GACV,OAAO;IAET,MAAM,QAAQ,EAAE,IAAI;IACpB,MAAM,SAAS,EAAE,IAAI;IACrB,IAAI,QAAQ,OAAO,GAAG,CAAC,KAAK,CAAC,QAAQ,QAAQ,IAAI,EAAE,GACjD,OAAO;IAET,OAAO;GACT;EAuC0B;EAAG,SAAS;CAAW;CACjD,IAAI;EAAE,UAAU;GA/EhB,YAAY;IAAC;IAAO;IAAO;IAAQ;GAAO;GAC1C,gBAAgB;IAAE,KAAK;IAAG,MAAM;IAAG,KAAK;IAAG,OAAO;GAAI;GACtD,SAAS,MAAM;IACb,MAAM,IAAI,SAAS,CAAC;IAEpB,IAAI,EAAE,MAAM,GACV,OAAO;IAET,MAAM,QAAQ,EAAE,IAAI;IACpB,MAAM,SAAS,EAAE,IAAI;IACrB,IAAI,UAAU,KAAK,WAAW,IAC5B,OAAO;IAET,IAAI,QAAQ,OAAO,GAAG,CAAC,KAAK,CAAC,QAAQ,QAAQ,IAAI,EAAE,GACjD,OAAO;IAGT,OAAO;GACT;EA6D0B;EAAG,SAAS;CAAW;CACjD,IAAI;EAAE,UAAU;EAAY,SAAS;CAAW;AAClD;AAEA,MAAM,WAA4B;CAAE,UAAU;CAAY,SAAS;AAAW;;AAG9E,MAAa,sBAAgC,OAAO,KAAK,KAAK;;;;;;;;;;;AAY9D,MAAa,iBAAiB,aAC5B,SAAS,YAAY,CAAC,CAAC,MAAM,OAAO,CAAC,CAAC,MAAM,SAAS,YAAY;AAEnE,MAAa,uBAAuB,aAClC,cAAc,QAAQ,KAAK;;;;;;AAO7B,MAAa,cAAc,aACzB,MAAM,cAAc,QAAQ,MAAM;AAEpC,MAAa,iBACX,UACA,SACmB;CACnB,IAAI,SAAS,QACX,OAAO,CAAC,OAAO;CAEjB,MAAM,IAAI,WAAW,QAAQ;CAC7B,OAAO,SAAS,YAAY,EAAE,QAAQ,aAAa,EAAE,SAAS;AAChE;;AAGA,MAAa,oBAAoB,aAAwC;CACvE,MAAM,IAAI,WAAW,QAAQ;CAC7B,OAAO,IAAI,IAAkB,CAC3B,GAAG,EAAE,SAAS,YACd,GAAG,EAAE,QAAQ,UACf,CAAC;AACH;;AAGA,MAAa,kBACX,UACA,OACA,YACiB;CACjB,MAAM,IAAI,WAAW,QAAQ;CAC7B,OAAO,UAAU,EAAE,QAAQ,OAAO,KAAK,IAAI,EAAE,SAAS,OAAO,KAAK;AACpE;;AAGA,MAAa,uBACX,UACA,MACA,aACW;CAGX,IAAI,SAAS,cAAc,aAAa,QACtC,OAAO;CAET,MAAM,IAAI,WAAW,QAAQ;CAG7B,QAFa,SAAS,YAAY,EAAE,UAAU,EAAE,SAAA,CAC/B,eAAe,aAClB;AAChB"}
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
import { i as Key, o as PluralSet, r as CanonicalModel } from "./model-5mrSQGoC.js";
|
|
2
|
+
|
|
3
|
+
//#region src/harness.d.ts
|
|
4
|
+
/**
|
|
5
|
+
* A format/syntax adapter's production contract: files <-> canonical model.
|
|
6
|
+
* `Files` is the adapter's on-disk shape (e.g. i18next's nested-JSON bundles);
|
|
7
|
+
* `Input` is whatever it parses from (e.g. `{ language, resources, options }`).
|
|
8
|
+
*/
|
|
9
|
+
interface FormatAdapter<Files = unknown, Input = unknown> {
|
|
10
|
+
readonly id: string;
|
|
11
|
+
parse: (input: Input) => CanonicalModel;
|
|
12
|
+
export: (model: CanonicalModel) => Files;
|
|
13
|
+
}
|
|
14
|
+
/**
|
|
15
|
+
* A probe point the harness compares between source and re-exported files. `key`
|
|
16
|
+
* and the plural classification are canonical (core); `args` is the adapter's
|
|
17
|
+
* render-argument shape (e.g. i18next `{ count, ordinal, context }`) and is
|
|
18
|
+
* opaque to the driver — it just hands it back to the oracle.
|
|
19
|
+
*/
|
|
20
|
+
interface Vector {
|
|
21
|
+
key: Key;
|
|
22
|
+
context: string;
|
|
23
|
+
kind: PluralSet["kind"];
|
|
24
|
+
category?: string;
|
|
25
|
+
args: unknown;
|
|
26
|
+
}
|
|
27
|
+
/**
|
|
28
|
+
* The per-adapter render oracle: enumerate the probe vectors for a model, and
|
|
29
|
+
* render one against a set of files (returning the resolved `t()` string, or
|
|
30
|
+
* `undefined` if it doesn't resolve). Test-only; supplied by the adapter.
|
|
31
|
+
*/
|
|
32
|
+
interface RenderOracle<Files = unknown> {
|
|
33
|
+
vectors: (model: CanonicalModel) => Vector[];
|
|
34
|
+
render: (files: Files, vector: Vector, language: string) => string | undefined;
|
|
35
|
+
}
|
|
36
|
+
interface RoundTripMismatch {
|
|
37
|
+
namespace: string;
|
|
38
|
+
base: string;
|
|
39
|
+
context: string;
|
|
40
|
+
kind: PluralSet["kind"];
|
|
41
|
+
category?: string;
|
|
42
|
+
args: unknown;
|
|
43
|
+
fromSource: string | undefined;
|
|
44
|
+
fromExported: string | undefined;
|
|
45
|
+
/** set when the source itself failed to resolve a vector the model produced */
|
|
46
|
+
note?: string;
|
|
47
|
+
}
|
|
48
|
+
interface RoundTripReport<Files = unknown> {
|
|
49
|
+
language: string;
|
|
50
|
+
verdict: "lossless" | "mismatch";
|
|
51
|
+
vectorCount: number;
|
|
52
|
+
mismatches: RoundTripMismatch[];
|
|
53
|
+
canonical: CanonicalModel;
|
|
54
|
+
exported: Files;
|
|
55
|
+
}
|
|
56
|
+
/**
|
|
57
|
+
* The generic driver: import → export → compare via the adapter's oracle. Equivalence
|
|
58
|
+
* is on resolved render output, so a re-export that renders identically to the
|
|
59
|
+
* source for every probe is lossless — regardless of how the files were renormalised.
|
|
60
|
+
*/
|
|
61
|
+
declare const runRoundTrip: <Files, Input extends {
|
|
62
|
+
resources: Files;
|
|
63
|
+
}>(adapter: FormatAdapter<Files, Input>, oracle: RenderOracle<Files>, input: Input) => RoundTripReport<Files>;
|
|
64
|
+
/** Format a non-lossless report into a human-readable failure message. */
|
|
65
|
+
declare const formatMismatches: (report: RoundTripReport) => string;
|
|
66
|
+
//#endregion
|
|
67
|
+
export { FormatAdapter, RenderOracle, RoundTripMismatch, RoundTripReport, Vector, formatMismatches, runRoundTrip };
|
|
68
|
+
//# sourceMappingURL=harness.d.ts.map
|
package/dist/harness.js
ADDED
|
@@ -0,0 +1,47 @@
|
|
|
1
|
+
//#region src/harness.ts
|
|
2
|
+
/**
|
|
3
|
+
* The generic driver: import → export → compare via the adapter's oracle. Equivalence
|
|
4
|
+
* is on resolved render output, so a re-export that renders identically to the
|
|
5
|
+
* source for every probe is lossless — regardless of how the files were renormalised.
|
|
6
|
+
*/
|
|
7
|
+
const runRoundTrip = (adapter, oracle, input) => {
|
|
8
|
+
const canonical = adapter.parse(input);
|
|
9
|
+
const exported = adapter.export(canonical);
|
|
10
|
+
const source = input.resources;
|
|
11
|
+
const { language } = canonical;
|
|
12
|
+
const vectors = oracle.vectors(canonical);
|
|
13
|
+
const mismatches = [];
|
|
14
|
+
for (const v of vectors) {
|
|
15
|
+
const fromSource = oracle.render(source, v, language);
|
|
16
|
+
const fromExported = oracle.render(exported, v, language);
|
|
17
|
+
const m = {
|
|
18
|
+
...v.category !== void 0 && v.category !== "" ? { category: v.category } : {},
|
|
19
|
+
args: v.args,
|
|
20
|
+
base: v.key.base,
|
|
21
|
+
context: v.context,
|
|
22
|
+
fromExported,
|
|
23
|
+
fromSource,
|
|
24
|
+
kind: v.kind,
|
|
25
|
+
namespace: v.key.namespace
|
|
26
|
+
};
|
|
27
|
+
if (fromSource === void 0) mismatches.push({
|
|
28
|
+
...m,
|
|
29
|
+
note: "source did not resolve a vector the model produced"
|
|
30
|
+
});
|
|
31
|
+
else if (fromSource !== fromExported) mismatches.push(m);
|
|
32
|
+
}
|
|
33
|
+
return {
|
|
34
|
+
canonical,
|
|
35
|
+
exported,
|
|
36
|
+
language,
|
|
37
|
+
mismatches,
|
|
38
|
+
vectorCount: vectors.length,
|
|
39
|
+
verdict: mismatches.length === 0 ? "lossless" : "mismatch"
|
|
40
|
+
};
|
|
41
|
+
};
|
|
42
|
+
/** Format a non-lossless report into a human-readable failure message. */
|
|
43
|
+
const formatMismatches = (report) => report.mismatches.slice(0, 5).map((m) => ` ${m.namespace}:${m.base} ctx="${m.context}" ${m.kind}${m.category !== void 0 && m.category !== "" ? `/${m.category}` : ""}: source=${JSON.stringify(m.fromSource)} exported=${JSON.stringify(m.fromExported)}`).join("\n");
|
|
44
|
+
//#endregion
|
|
45
|
+
export { formatMismatches, runRoundTrip };
|
|
46
|
+
|
|
47
|
+
//# sourceMappingURL=harness.js.map
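
The driver loop above can be sketched end to end with a toy format. This is a minimal, self-contained illustration and not the package's API: `Model`, `Adapter`, `Oracle`, `roundTrip`, `sortingAdapter` and `trimmingAdapter` are hypothetical stand-ins, and the model is flattened to a plain key/value map instead of the real `CanonicalModel`. It shows why equivalence is judged on rendered output: an export that reorders keys still counts as lossless, while one that alters a value does not.

```typescript
// Hypothetical, simplified stand-ins for the harness contracts (assumptions).
type Files = Record<string, string>;
interface Input { language: string; resources: Files }
interface Model { language: string; entries: Map<string, string> }
interface Vector { key: string; args: unknown }
interface Adapter { parse: (input: Input) => Model; export: (model: Model) => Files }
interface Oracle {
  vectors: (model: Model) => Vector[];
  render: (files: Files, vector: Vector) => string | undefined;
}
interface Mismatch {
  key: string;
  fromSource: string | undefined;
  fromExported: string | undefined;
  note?: string;
}

// Same shape as the driver: parse, export, then compare renders per probe vector.
const roundTrip = (adapter: Adapter, oracle: Oracle, input: Input) => {
  const model = adapter.parse(input);
  const exported = adapter.export(model);
  const vectors = oracle.vectors(model);
  const mismatches: Mismatch[] = [];
  for (const v of vectors) {
    const fromSource = oracle.render(input.resources, v);
    const fromExported = oracle.render(exported, v);
    if (fromSource === undefined) {
      // Guard: never let undefined === undefined pass as a match.
      mismatches.push({ key: v.key, fromSource, fromExported, note: "source did not resolve" });
    } else if (fromSource !== fromExported) {
      mismatches.push({ key: v.key, fromSource, fromExported });
    }
  }
  return { verdict: mismatches.length === 0 ? "lossless" : "mismatch", mismatches, exported };
};

const parse = (input: Input): Model => ({
  language: input.language,
  entries: new Map(Object.keys(input.resources).map((k): [string, string] => [k, input.resources[k]])),
});

// Renormalises key order only; values are untouched.
const sortingAdapter: Adapter = {
  parse,
  export: (model) => {
    const out: Files = {};
    const keys = Array.from(model.entries.keys()).sort();
    for (const k of keys) out[k] = model.entries.get(k) as string;
    return out;
  },
};

// Deliberately lossy: trims trailing whitespace from values.
const trimmingAdapter: Adapter = {
  parse,
  export: (model) => {
    const out: Files = {};
    model.entries.forEach((value, k) => { out[k] = value.trim(); });
    return out;
  },
};

const toyOracle: Oracle = {
  vectors: (model) => Array.from(model.entries.keys()).map((key) => ({ key, args: {} })),
  render: (files, v) => files[v.key],
};

const input: Input = { language: "en", resources: { b: "Bye ", a: "Hi" } };
const sorted = roundTrip(sortingAdapter, toyOracle, input);
const trimmed = roundTrip(trimmingAdapter, toyOracle, input);
console.log(sorted.verdict, trimmed.verdict, trimmed.mismatches.map((m) => m.key));
```

A real adapter would enumerate one vector per (key, context, plural category) rather than one per flat key, which is what the `Vector` interface in `harness.d.ts` is shaped for.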
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"harness.js","names":[],"sources":["../src/harness.ts"],"sourcesContent":["/**\n * The round-trip fidelity harness — DRIVER + contracts.\n *\n * Published as the `@astilba/core/harness` subpath, kept OUT of the stable\n * `@astilba/core` entry: the `FormatAdapter`/`RenderOracle` contracts are\n * unproven until a second (e.g. ICU) adapter implements them, so treat this\n * surface as unstable pre-1.0 — it may change in any minor.\n *\n * This is the format-neutral half: the generic loop\n *\n * model = adapter.parse(input) ; files' = adapter.export(model)\n * for each probe vector: assert oracle.render(sourceFiles) === oracle.render(files')\n *\n * \"Render\" is per-syntax (i18next resolves `_one`/`_ordinal_*`/`_male` suffix keys;\n * ICU resolves entirely differently), so each format adapter supplies its own\n * `RenderOracle`. The oracle is a TEST-SUPPORT capability — in production the\n * end-user's i18next does the rendering, never this. The driver knows only the\n * canonical model; it imports nothing format-specific.\n */\n\nimport type { CanonicalModel, Key, PluralSet } from \"./model.ts\";\n\n/**\n * A format/syntax adapter's production contract: files <-> canonical model.\n * `Files` is the adapter's on-disk shape (e.g. i18next's nested-JSON bundles);\n * `Input` is whatever it parses from (e.g. `{ language, resources, options }`).\n */\nexport interface FormatAdapter<Files = unknown, Input = unknown> {\n readonly id: string;\n parse: (input: Input) => CanonicalModel;\n export: (model: CanonicalModel) => Files;\n}\n\n/**\n * A probe point the harness compares between source and re-exported files. `key`\n * and the plural classification are canonical (core); `args` is the adapter's\n * render-argument shape (e.g. 
i18next `{ count, ordinal, context }`) and is\n * opaque to the driver — it just hands it back to the oracle.\n */\nexport interface Vector {\n key: Key;\n context: string;\n kind: PluralSet[\"kind\"];\n category?: string;\n args: unknown;\n}\n\n/**\n * The per-adapter render oracle: enumerate the probe vectors for a model, and\n * render one against a set of files (returning the resolved `t()` string, or\n * `undefined` if it doesn't resolve). Test-only; supplied by the adapter.\n */\nexport interface RenderOracle<Files = unknown> {\n vectors: (model: CanonicalModel) => Vector[];\n render: (\n files: Files,\n vector: Vector,\n language: string\n ) => string | undefined;\n}\n\nexport interface RoundTripMismatch {\n namespace: string;\n base: string;\n context: string;\n kind: PluralSet[\"kind\"];\n category?: string;\n args: unknown;\n fromSource: string | undefined;\n fromExported: string | undefined;\n /** set when the source itself failed to resolve a vector the model produced */\n note?: string;\n}\n\nexport interface RoundTripReport<Files = unknown> {\n language: string;\n verdict: \"lossless\" | \"mismatch\";\n vectorCount: number;\n mismatches: RoundTripMismatch[];\n canonical: CanonicalModel;\n exported: Files;\n}\n\n/**\n * The generic driver: import → export → compare via the adapter's oracle. 
Equivalence\n * is on resolved render output, so a re-export that renders identically to the\n * source for every probe is lossless — regardless of how the files were renormalised.\n */\nexport const runRoundTrip = <Files, Input extends { resources: Files }>(\n adapter: FormatAdapter<Files, Input>,\n oracle: RenderOracle<Files>,\n input: Input\n): RoundTripReport<Files> => {\n const canonical = adapter.parse(input);\n const exported = adapter.export(canonical);\n const source = input.resources;\n const { language } = canonical;\n\n const vectors = oracle.vectors(canonical);\n const mismatches: RoundTripMismatch[] = [];\n for (const v of vectors) {\n const fromSource = oracle.render(source, v, language);\n const fromExported = oracle.render(exported, v, language);\n const m: RoundTripMismatch = {\n // category is spread first (conditional key) so the remaining literal keys\n // form a single sorted run; it never collides with the keys below.\n ...(v.category !== undefined && v.category !== \"\"\n ? { category: v.category }\n : {}),\n args: v.args,\n base: v.key.base,\n context: v.context,\n fromExported,\n fromSource,\n kind: v.kind,\n namespace: v.key.namespace,\n };\n // The model built this vector from the SOURCE, so the source MUST resolve it.\n // If it doesn't, the harness — not the export — is broken; surface it instead\n // of letting undefined === undefined read as a (false) match.\n if (fromSource === undefined) {\n mismatches.push({\n ...m,\n note: \"source did not resolve a vector the model produced\",\n });\n } else if (fromSource !== fromExported) {\n mismatches.push(m);\n }\n }\n\n return {\n canonical,\n exported,\n language,\n mismatches,\n vectorCount: vectors.length,\n verdict: mismatches.length === 0 ? \"lossless\" : \"mismatch\",\n };\n};\n\n/** Format a non-lossless report into a human-readable failure message. 
*/\nexport const formatMismatches = (report: RoundTripReport): string =>\n report.mismatches\n .slice(0, 5)\n .map(\n (m) =>\n ` ${m.namespace}:${m.base} ctx=\"${m.context}\" ${m.kind}${\n m.category !== undefined && m.category !== \"\" ? `/${m.category}` : \"\"\n }: source=${JSON.stringify(m.fromSource)} exported=${JSON.stringify(m.fromExported)}`\n )\n .join(\"\\n\");\n"],"mappings":";;;;;;AAwFA,MAAa,gBACX,SACA,QACA,UAC2B;CAC3B,MAAM,YAAY,QAAQ,MAAM,KAAK;CACrC,MAAM,WAAW,QAAQ,OAAO,SAAS;CACzC,MAAM,SAAS,MAAM;CACrB,MAAM,EAAE,aAAa;CAErB,MAAM,UAAU,OAAO,QAAQ,SAAS;CACxC,MAAM,aAAkC,CAAC;CACzC,KAAK,MAAM,KAAK,SAAS;EACvB,MAAM,aAAa,OAAO,OAAO,QAAQ,GAAG,QAAQ;EACpD,MAAM,eAAe,OAAO,OAAO,UAAU,GAAG,QAAQ;EACxD,MAAM,IAAuB;GAG3B,GAAI,EAAE,aAAa,KAAA,KAAa,EAAE,aAAa,KAC3C,EAAE,UAAU,EAAE,SAAS,IACvB,CAAC;GACL,MAAM,EAAE;GACR,MAAM,EAAE,IAAI;GACZ,SAAS,EAAE;GACX;GACA;GACA,MAAM,EAAE;GACR,WAAW,EAAE,IAAI;EACnB;EAIA,IAAI,eAAe,KAAA,GACjB,WAAW,KAAK;GACd,GAAG;GACH,MAAM;EACR,CAAC;OACI,IAAI,eAAe,cACxB,WAAW,KAAK,CAAC;CAErB;CAEA,OAAO;EACL;EACA;EACA;EACA;EACA,aAAa,QAAQ;EACrB,SAAS,WAAW,WAAW,IAAI,aAAa;CAClD;AACF;;AAGA,MAAa,oBAAoB,WAC/B,OAAO,WACJ,MAAM,GAAG,CAAC,CAAC,CACX,KACE,MACC,KAAK,EAAE,UAAU,GAAG,EAAE,KAAK,QAAQ,EAAE,QAAQ,IAAI,EAAE,OACjD,EAAE,aAAa,KAAA,KAAa,EAAE,aAAa,KAAK,IAAI,EAAE,aAAa,GACpE,WAAW,KAAK,UAAU,EAAE,UAAU,EAAE,YAAY,KAAK,UAAU,EAAE,YAAY,GACtF,CAAC,CACA,KAAK,IAAI"}
|
package/dist/index.d.ts
ADDED
|
@@ -0,0 +1,98 @@
|
|
|
1
|
+
import { a as PluralKind, c as ValueToken, i as Key, l as keyId, n as CLDRCategory, o as PluralSet, r as CanonicalModel, s as Value, t as ALL_CLDR_CATEGORIES } from "./model-5mrSQGoC.js";
|
|
2
|
+
import { LanguagePlurals, PluralRule, SUPPORTED_LANGUAGES, allCategoriesFor, categoriesFor, getPlurals, isSupportedLanguage, operands, primarySubtag, representativeCount, selectCategory } from "./cldr.js";
|
|
3
|
+
|
|
4
|
+
//#region src/errors.d.ts
|
|
5
|
+
/**
|
|
6
|
+
* The base error type. `AstilbaError` is the one class consumers catch
|
|
7
|
+
* (`if (e instanceof AstilbaError) e.code`); `code` is an OPEN string so each
|
|
8
|
+
* format adapter owns its own code constants without a closed core enum coupling
|
|
9
|
+
* core to any one syntax. The i18next-v4 codes + their loud, fix-it-yourself
|
|
10
|
+
* factory functions live with the adapter (errors-i18next.ts).
|
|
11
|
+
*/
|
|
12
|
+
/** An error code is an open string; adapters define their own (e.g. `ICU_NOT_SUPPORTED`). */
|
|
13
|
+
type AstilbaErrorCode = string;
|
|
14
|
+
declare class AstilbaError extends Error {
|
|
15
|
+
readonly code: string;
|
|
16
|
+
/** The fully-qualified key (`namespace:flatKey`) the problem was found at, if any. */
|
|
17
|
+
readonly key?: string;
|
|
18
|
+
readonly details?: Record<string, unknown>;
|
|
19
|
+
constructor(code: string, message: string, opts?: {
|
|
20
|
+
key?: string;
|
|
21
|
+
details?: Record<string, unknown>;
|
|
22
|
+
});
|
|
23
|
+
}
|
|
24
|
+
//#endregion
|
|
25
|
+
//#region src/mask.d.ts
|
|
26
|
+
declare const sentinel: (index: number) => string;
|
|
27
|
+
/**
|
|
28
|
+
* Turns a raw value string into canonical tokens. Supplied by whichever syntax
|
|
29
|
+
* adapter is in use — the one syntax-specific dependency the string-entry validator
|
|
30
|
+
* needs, to re-tokenize an MT-returned translation that was never in the model.
|
|
31
|
+
*/
|
|
32
|
+
type Tokenizer = (raw: string) => ValueToken[];
|
|
33
|
+
interface MaskResult {
|
|
34
|
+
masked: string;
|
|
35
|
+
/** original token raws, indexed by sentinel number */
|
|
36
|
+
parts: string[];
|
|
37
|
+
}
|
|
38
|
+
/**
|
|
39
|
+
* Replace interpolation / nesting / markup tokens with sentinels. Rejects loudly
|
|
40
|
+
* if the literal text already contains a reserved sentinel delimiter — rare but
|
|
41
|
+
* legal in real values (e.g. private-use-area glyphs from icon fonts like Material
|
|
42
|
+
* Icons / Nerd Fonts) — rather than letting unmask() silently corrupt it.
|
|
43
|
+
*/
|
|
44
|
+
declare const maskTokens: (tokens: ValueToken[]) => MaskResult;
|
|
45
|
+
declare const unmask: (masked: string, parts: string[]) => string;
|
|
46
|
+
interface SentinelCheck {
|
|
47
|
+
ok: boolean;
|
|
48
|
+
errors: string[];
|
|
49
|
+
}
|
|
50
|
+
/**
|
|
51
|
+
* Validate that an MT engine returned every sentinel exactly once, unmodified,
|
|
52
|
+
* and invented none. Reordering is allowed (target languages reorder freely);
|
|
53
|
+
* pass `requireOrder` to also assert original order.
|
|
54
|
+
*/
|
|
55
|
+
declare const validateSentinels: (translated: string, parts: string[], opts?: {
|
|
56
|
+
requireOrder?: boolean;
|
|
57
|
+
}) => SentinelCheck;
|
|
58
|
+
interface PlaceholderCheck {
|
|
59
|
+
ok: boolean;
|
|
60
|
+
errors: string[];
|
|
61
|
+
}
|
|
62
|
+
/**
|
|
63
|
+
* Compare a source value's tokens against its translation's tokens. Fails closed:
|
|
64
|
+
* any added, dropped, or modified placeholder/markup is an error. Catches a
|
|
65
|
+
* translated variable name, a translated formatter keyword, a mangled `$t()` ref,
|
|
66
|
+
* or a dropped tag — the exact placeholder-corruption failure mode this guards.
|
|
67
|
+
*/
|
|
68
|
+
declare const validatePlaceholderTokens: (source: ValueToken[], translated: ValueToken[]) => PlaceholderCheck;
|
|
69
|
+
/**
|
|
70
|
+
* String-entry placeholder validator: tokenizes both sides with the supplied
|
|
71
|
+
* (syntax-specific) `tokenize` and defers to {@link validatePlaceholderTokens}.
|
|
72
|
+
* The free-utility CI check — the adapter pre-binds its tokenizer so callers get
|
|
73
|
+
* the ergonomic two-argument form.
|
|
74
|
+
*/
|
|
75
|
+
declare const validatePlaceholders: (source: string, translated: string, tokenize: Tokenizer) => PlaceholderCheck;
|
|
76
|
+
/** The actual tokens that differ between source and translation. */
|
|
77
|
+
interface PlaceholderDiff {
|
|
78
|
+
/** placeholders in `source` that the translation dropped (or has fewer of) */
|
|
79
|
+
dropped: ValueToken[];
|
|
80
|
+
/** placeholders the translation introduced that `source` did not have */
|
|
81
|
+
added: ValueToken[];
|
|
82
|
+
}
|
|
83
|
+
/**
|
|
84
|
+
* The structured placeholder diff: the tokens dropped from / added to a
|
|
85
|
+
* translation relative to its source. Shares {@link validatePlaceholderTokens}'s
|
|
86
|
+
* identity rule — both go through the same {@link signature}, so the diff is
|
|
87
|
+
* empty exactly when that validator reports `ok` — but returns the tokens so a
|
|
88
|
+
* caller can render a precise report and decide whether a dropped+added pair is
|
|
89
|
+
* an unambiguous "changed" or must be listed separately.
|
|
90
|
+
*
|
|
91
|
+
* @param source canonical tokens of the source value
|
|
92
|
+
* @param translated canonical tokens of the translation
|
|
93
|
+
* @returns dropped and added tokens; both empty ⇔ placeholders are preserved
|
|
94
|
+
*/
|
|
95
|
+
declare const placeholderDiff: (source: ValueToken[], translated: ValueToken[]) => PlaceholderDiff;
|
|
96
|
+
//#endregion
|
|
97
|
+
export { ALL_CLDR_CATEGORIES, AstilbaError, type AstilbaErrorCode, type CLDRCategory, type CanonicalModel, type Key, type LanguagePlurals, type MaskResult, type PlaceholderCheck, type PlaceholderDiff, type PluralKind, type PluralRule, type PluralSet, SUPPORTED_LANGUAGES, type SentinelCheck, type Tokenizer, type Value, type ValueToken, allCategoriesFor, categoriesFor, getPlurals, isSupportedLanguage, keyId, maskTokens, operands, placeholderDiff, primarySubtag, representativeCount, selectCategory, sentinel, unmask, validatePlaceholderTokens, validatePlaceholders, validateSentinels };
|
|
98
|
+
//# sourceMappingURL=index.d.ts.map
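
The mask, translate, validate, unmask flow declared above can be illustrated with a small self-contained sketch. It does not import the package; `mask`, `restore` and `checkSentinels` are hypothetical re-implementations of the declared behaviour, `Token` is a simplified stand-in for `ValueToken`, and the delimiters U+E000/U+E001 are taken from the error message in `dist/index.js`. The "MT output" is hard-coded to simulate an engine that reorders placeholders.

```typescript
// Simplified stand-in for ValueToken (assumption): only `type` and `raw` matter here.
type Token = { type: "text" | "interpolation" | "nesting" | "markup"; raw: string };

const OPEN = "\uE000";
const CLOSE = "\uE001";
const sentinel = (i: number): string => `${OPEN}${i}${CLOSE}`;
const SENTINEL_RE = new RegExp(`${OPEN}(\\d+)${CLOSE}`, "g");

// Replace every non-text token with a numbered sentinel; remember the originals.
const mask = (tokens: Token[]): { masked: string; parts: string[] } => {
  const parts: string[] = [];
  let masked = "";
  for (const tok of tokens) {
    if (tok.type === "text") {
      if (tok.raw.includes(OPEN) || tok.raw.includes(CLOSE)) {
        throw new Error("text contains a reserved sentinel delimiter");
      }
      masked += tok.raw;
    } else {
      masked += sentinel(parts.length);
      parts.push(tok.raw);
    }
  }
  return { masked, parts };
};

// Put the original token text back in place of each sentinel.
const restore = (text: string, parts: string[]): string =>
  text.replace(SENTINEL_RE, (_match: string, n: string) => {
    const part = parts[Number(n)];
    if (part === undefined) throw new Error(`unknown sentinel index ${n}`);
    return part;
  });

// Every sentinel must come back exactly once, and none may be invented.
const checkSentinels = (translated: string, parts: string[]): string[] => {
  const seen = new Map<number, number>();
  SENTINEL_RE.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = SENTINEL_RE.exec(translated)) !== null) {
    const idx = Number(m[1]);
    seen.set(idx, (seen.get(idx) ?? 0) + 1);
  }
  const errors: string[] = [];
  parts.forEach((raw, i) => {
    const count = seen.get(i) ?? 0;
    if (count === 0) errors.push(`placeholder #${i} (${raw}) was dropped`);
    else if (count > 1) errors.push(`placeholder #${i} (${raw}) was duplicated`);
  });
  seen.forEach((_count, idx) => {
    if (idx >= parts.length) errors.push(`unknown placeholder #${idx}`);
  });
  return errors;
};

const tokens: Token[] = [
  { type: "text", raw: "You have " },
  { type: "interpolation", raw: "{{count}}" },
  { type: "text", raw: " new " },
  { type: "markup", raw: "<b>" },
  { type: "text", raw: "messages" },
  { type: "markup", raw: "</b>" },
];
const { masked, parts } = mask(tokens);

// Simulated engine output: text translated and reordered, sentinels intact.
const mtGood = `Tienes ${sentinel(0)} ${sentinel(1)}mensajes${sentinel(2)} nuevos`;
// Simulated engine output that lost both markup sentinels.
const mtBad = `Tienes ${sentinel(0)} mensajes nuevos`;

const goodErrors = checkSentinels(mtGood, parts);
const dropErrors = checkSentinels(mtBad, parts);
const restored = restore(mtGood, parts);
console.log(restored, goodErrors, dropErrors);
```

Validating before unmasking matters: once sentinels are restored, a dropped placeholder is indistinguishable from text the translator chose to omit.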
|
package/dist/index.js
ADDED
|
@@ -0,0 +1,202 @@
|
|
|
1
|
+
import { SUPPORTED_LANGUAGES, allCategoriesFor, categoriesFor, getPlurals, isSupportedLanguage, operands, primarySubtag, representativeCount, selectCategory } from "./cldr.js";
|
|
2
|
+
//#region src/model.ts
|
|
3
|
+
const ALL_CLDR_CATEGORIES = [
|
|
4
|
+
"zero",
|
|
5
|
+
"one",
|
|
6
|
+
"two",
|
|
7
|
+
"few",
|
|
8
|
+
"many",
|
|
9
|
+
"other"
|
|
10
|
+
];
|
|
11
|
+
const keyId = (namespace, base) => `${namespace}:${base}`;
|
|
12
|
+
//#endregion
|
|
13
|
+
//#region src/errors.ts
|
|
14
|
+
var AstilbaError = class extends Error {
|
|
15
|
+
code;
|
|
16
|
+
/** The fully-qualified key (`namespace:flatKey`) the problem was found at, if any. */
|
|
17
|
+
key;
|
|
18
|
+
details;
|
|
19
|
+
constructor(code, message, opts = {}) {
|
|
20
|
+
super(message);
|
|
21
|
+
this.name = "AstilbaError";
|
|
22
|
+
this.code = code;
|
|
23
|
+
this.key = opts.key;
|
|
24
|
+
this.details = opts.details;
|
|
25
|
+
}
|
|
26
|
+
};
|
|
27
|
+
//#endregion
|
|
28
|
+
//#region src/mask.ts
|
|
29
|
+
/**
|
|
30
|
+
* MT masking & placeholder validation — the FORMAT-NEUTRAL core. It operates on
|
|
31
|
+
* canonical `ValueToken[]` (which the model
|
|
32
|
+
* already carries) instead of parsing strings itself: a syntax adapter tokenizes,
|
|
33
|
+
* core masks and validates. The one place a raw string must be re-tokenized is a
|
|
34
|
+
* translation returned from MT — it isn't in the model — so the string-entry
|
|
35
|
+
* `validatePlaceholders` takes the adapter's `Tokenizer` by injection.
|
|
36
|
+
*
|
|
37
|
+
* Two jobs:
|
|
38
|
+
* 1. maskTokens()/unmask(): replace every non-text token (the WHOLE token, incl.
|
|
39
|
+
* formatter and `$t()` ref name) with an opaque sentinel before an MT/LLM call,
|
|
40
|
+
* and restore it after. Because the formatter keyword and ref live INSIDE the
|
|
41
|
+
* masked span, MT never sees them — that alone defeats the "translated
|
|
42
|
+
* `one`/`other` -> `uno`/`otros`" class of bug.
|
|
43
|
+
* 2. validatePlaceholderTokens(): a fail-closed, CI-failable check comparing a
|
|
44
|
+
* source value's tokens against its translation's tokens (the caller restores
|
|
45
|
+
* any masked sentinels first) — every interpolation variable, formatter
|
|
46
|
+
* keyword, nesting ref, and markup tag must survive unmodified. This is the
|
|
47
|
+
* validator shipped in the free utility.
|
|
48
|
+
*
|
|
49
|
+
* Sentinels use private-use-area delimiters so they carry no linguistic content
|
|
50
|
+
* for an MT engine to "helpfully" translate, while still being detectable if the
|
|
51
|
+
* engine mangles them.
|
|
52
|
+
*/
|
|
53
|
+
const OPEN = "\uE000";
|
|
54
|
+
const CLOSE = "\uE001";
|
|
55
|
+
const sentinel = (index) => `${OPEN}${index}${CLOSE}`;
|
|
56
|
+
const SENTINEL_RE = new RegExp(`${OPEN}(\\d+)${CLOSE}`, "gu");
|
|
57
|
+
/**
|
|
58
|
+
* Replace interpolation / nesting / markup tokens with sentinels. Rejects loudly
|
|
59
|
+
* if the literal text already contains a reserved sentinel delimiter — rare but
|
|
60
|
+
* legal in real values (e.g. private-use-area glyphs from icon fonts like Material
|
|
61
|
+
* Icons / Nerd Fonts) — rather than letting unmask() silently corrupt it.
|
|
62
|
+
*/
|
|
63
|
+
const maskTokens = (tokens) => {
|
|
64
|
+
const parts = [];
|
|
65
|
+
let masked = "";
|
|
66
|
+
for (const tok of tokens) if (tok.type === "text") {
|
|
67
|
+
if (tok.raw.includes(OPEN) || tok.raw.includes(CLOSE)) throw new AstilbaError("MASK_VALIDATION", "Value text contains a reserved masking sentinel delimiter (U+E000/U+E001); it cannot be masked without ambiguity. Strip or escape these private-use-area characters before masking.");
|
|
68
|
+
masked += tok.raw;
|
|
69
|
+
} else {
|
|
70
|
+
masked += sentinel(parts.length);
|
|
71
|
+
parts.push(tok.raw);
|
|
72
|
+
}
|
|
73
|
+
return {
|
|
74
|
+
masked,
|
|
75
|
+
parts
|
|
76
|
+
};
|
|
77
|
+
};
|
|
78
|
+
const unmask = (masked, parts) => masked.replace(SENTINEL_RE, (_, n) => {
|
|
79
|
+
const part = parts[Number(n)];
|
|
80
|
+
if (part === void 0) throw new AstilbaError("MASK_VALIDATION", `Unknown sentinel index ${n} during unmask.`);
|
|
81
|
+
return part;
|
|
82
|
+
});
|
|
83
|
+
/**
|
|
84
|
+
* Validate that an MT engine returned every sentinel exactly once, unmodified,
|
|
85
|
+
* and invented none. Reordering is allowed (target languages reorder freely);
|
|
86
|
+
* pass `requireOrder` to also assert original order.
|
|
87
|
+
*/
|
|
88
|
+
const validateSentinels = (translated, parts, opts = {}) => {
|
|
89
|
+
const errors = [];
|
|
90
|
+
const seen = /* @__PURE__ */ new Map();
|
|
91
|
+
const order = [];
|
|
92
|
+
let m;
|
|
93
|
+
SENTINEL_RE.lastIndex = 0;
|
|
94
|
+
while ((m = SENTINEL_RE.exec(translated)) !== null) {
|
|
95
|
+
const idx = Number(m[1]);
|
|
96
|
+
seen.set(idx, (seen.get(idx) ?? 0) + 1);
|
|
97
|
+
order.push(idx);
|
|
98
|
+
}
|
|
99
|
+
for (let i = 0; i < parts.length; i += 1) {
|
|
100
|
+
const count = seen.get(i) ?? 0;
|
|
101
|
+
if (count === 0) errors.push(`placeholder #${i} (${parts[i]}) was dropped by MT`);
|
|
102
|
+
else if (count > 1) errors.push(`placeholder #${i} (${parts[i]}) was duplicated by MT`);
|
|
103
|
+
}
|
|
104
|
+
for (const idx of seen.keys()) if (idx >= parts.length) errors.push(`MT invented an unknown placeholder #${idx}`);
|
|
105
|
+
const stripped = translated.replace(SENTINEL_RE, "");
|
|
106
|
+
if (stripped.includes(OPEN) || stripped.includes(CLOSE)) errors.push("MT corrupted a placeholder sentinel (stray delimiter found)");
|
|
107
|
+
if (opts.requireOrder === true) {
|
|
108
|
+
const expected = [...order].toSorted((a, b) => a - b);
|
|
109
|
+
if (order.join(",") !== expected.join(",")) errors.push("placeholder order was changed");
|
|
110
|
+
}
|
|
111
|
+
return {
|
|
112
|
+
errors,
|
|
113
|
+
ok: errors.length === 0
|
|
114
|
+
};
|
|
115
|
+
};
|
|
116
|
+
const esc = (s) => s.replaceAll("\\", "\\\\").replaceAll("|", "\\|");
|
|
117
|
+
/**
|
|
118
|
+
* Canonical identity for placeholder equality, computed from the `ValueToken`
|
|
119
|
+
* fields directly — variable + format for interpolation, ref for nesting, raw for
|
|
120
|
+
* markup. No syntax-specific normalisation: a value and its own translation carry
|
|
121
|
+
* byte-identical placeholders, so the raw canonical fields ARE the faithful
|
|
122
|
+
* identity (an adapter wanting looser matching can pre-normalise its tokens before
|
|
123
|
+
* calling `validatePlaceholderTokens`). Returns `null` for text (not a placeholder).
|
|
124
|
+
*/
|
|
125
|
+
const signature = (tok) => {
|
|
126
|
+
switch (tok.type) {
|
|
127
|
+
case "interpolation": return `interp:${esc(tok.variable)}|${esc(tok.format ?? "")}`;
|
|
128
|
+
case "nesting": return `nest:${esc(tok.ref)}|${esc(tok.options ?? "")}`;
|
|
129
|
+
case "markup": return `markup:${tok.raw}`;
|
|
130
|
+
case "text": return null;
|
|
131
|
+
default: return tok;
|
|
132
|
+
}
|
|
133
|
+
};
|
|
134
|
+
const bySignature = (tokens) => {
|
|
135
|
+
const groups = /* @__PURE__ */ new Map();
|
|
136
|
+
for (const tok of tokens) {
|
|
137
|
+
const sig = signature(tok);
|
|
138
|
+
if (sig === null) continue;
|
|
139
|
+
const arr = groups.get(sig);
|
|
140
|
+
if (arr) arr.push(tok);
|
|
141
|
+
else groups.set(sig, [tok]);
|
|
142
|
+
}
|
|
143
|
+
return groups;
|
|
144
|
+
};
|
|
145
|
+
/**
|
|
146
|
+
* Compare a source value's tokens against its translation's tokens. Fails closed:
|
|
147
|
+
* any added, dropped, or modified placeholder/markup is an error. Catches a
|
|
148
|
+
* translated variable name, a translated formatter keyword, a mangled `$t()` ref,
|
|
149
|
+
* or a dropped tag — the exact placeholder-corruption failure mode this guards.
|
|
150
|
+
*/
|
|
151
|
+
const validatePlaceholderTokens = (source, translated) => {
|
|
152
|
+
const src = bySignature(source);
|
|
153
|
+
const dst = bySignature(translated);
|
|
154
|
+
const errors = [];
|
|
155
|
+
for (const [sig, toks] of src) if ((dst.get(sig)?.length ?? 0) < toks.length) errors.push(`source placeholder "${sig}" is missing or altered in the translation`);
|
|
156
|
+
for (const [sig, toks] of dst) if (toks.length > (src.get(sig)?.length ?? 0)) errors.push(`translation introduced an unexpected placeholder "${sig}"`);
|
|
157
|
+
return {
|
|
158
|
+
errors,
|
|
159
|
+
ok: errors.length === 0
|
|
160
|
+
};
|
|
161
|
+
};
|
|
162
|
+
/**
|
|
163
|
+
* String-entry placeholder validator: tokenizes both sides with the supplied
|
|
164
|
+
* (syntax-specific) `tokenize` and defers to {@link validatePlaceholderTokens}.
|
|
165
|
+
* The free-utility CI check — the adapter pre-binds its tokenizer so callers get
|
|
166
|
+
* the ergonomic two-argument form.
|
|
167
|
+
*/
|
|
168
|
+
const validatePlaceholders = (source, translated, tokenize) => validatePlaceholderTokens(tokenize(source), tokenize(translated));
|
|
169
|
+
/**
|
|
170
|
+
* The structured placeholder diff: the tokens dropped from / added to a
|
|
171
|
+
* translation relative to its source. Shares {@link validatePlaceholderTokens}'s
|
|
172
|
+
* identity rule — both go through the same {@link signature}, so the diff is
|
|
173
|
+
* empty exactly when that validator reports `ok` — but returns the tokens so a
|
|
174
|
+
* caller can render a precise report and decide whether a dropped+added pair is
|
|
175
|
+
* an unambiguous "changed" or must be listed separately.
|
|
176
|
+
*
|
|
177
|
+
* @param source canonical tokens of the source value
|
|
178
|
+
* @param translated canonical tokens of the translation
|
|
179
|
+
* @returns dropped and added tokens; both empty ⇔ placeholders are preserved
|
|
180
|
+
*/
|
|
181
|
+
const placeholderDiff = (source, translated) => {
|
|
182
|
+
const src = bySignature(source);
|
|
183
|
+
const dst = bySignature(translated);
|
|
184
|
+
const dropped = [];
|
|
185
|
+
const added = [];
|
|
186
|
+
for (const [sig, toks] of src) {
|
|
187
|
+
const have = dst.get(sig)?.length ?? 0;
|
|
188
|
+
for (let i = have; i < toks.length; i += 1) dropped.push(toks[i]);
|
|
189
|
+
}
|
|
190
|
+
for (const [sig, toks] of dst) {
|
|
191
|
+
const have = src.get(sig)?.length ?? 0;
|
|
192
|
+
for (let i = have; i < toks.length; i += 1) added.push(toks[i]);
|
|
193
|
+
}
|
|
194
|
+
return {
|
|
195
|
+
added,
|
|
196
|
+
dropped
|
|
197
|
+
};
|
|
198
|
+
};
|
|
199
|
+
//#endregion
|
|
200
|
+
export { ALL_CLDR_CATEGORIES, AstilbaError, SUPPORTED_LANGUAGES, allCategoriesFor, categoriesFor, getPlurals, isSupportedLanguage, keyId, maskTokens, operands, placeholderDiff, primarySubtag, representativeCount, selectCategory, sentinel, unmask, validatePlaceholderTokens, validatePlaceholders, validateSentinels };
|
|
201
|
+
|
|
202
|
+
//# sourceMappingURL=index.js.map
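
The signature-based comparison used by `validatePlaceholderTokens` and `placeholderDiff` is effectively a multiset comparison: tokens are grouped by a canonical identity string and the per-identity counts on each side are compared. The sketch below is a self-contained approximation under stated assumptions: `Token` is a reduced stand-in for `ValueToken` (nesting tokens omitted), and `countBySignature` and `check` are hypothetical names, not exports of the package.

```typescript
// Reduced stand-in for ValueToken (assumption): nesting tokens are left out.
type Token =
  | { type: "text"; raw: string }
  | { type: "interpolation"; raw: string; variable: string; format?: string }
  | { type: "markup"; raw: string };

// Escape the field separator so "a|b" + "" cannot collide with "a" + "b".
const esc = (s: string): string => s.replace(/\\/g, "\\\\").replace(/\|/g, "\\|");

// Canonical identity of a placeholder; text is not a placeholder, so null.
const signature = (tok: Token): string | null => {
  switch (tok.type) {
    case "interpolation":
      return `interp:${esc(tok.variable)}|${esc(tok.format ?? "")}`;
    case "markup":
      return `markup:${tok.raw}`;
    case "text":
      return null;
  }
};

const countBySignature = (tokens: Token[]): Map<string, number> => {
  const counts = new Map<string, number>();
  for (const tok of tokens) {
    const sig = signature(tok);
    if (sig !== null) counts.set(sig, (counts.get(sig) ?? 0) + 1);
  }
  return counts;
};

// Fail closed: any identity with a lower or higher count than the source is an error.
const check = (source: Token[], translated: Token[]): { ok: boolean; errors: string[] } => {
  const src = countBySignature(source);
  const dst = countBySignature(translated);
  const errors: string[] = [];
  src.forEach((n, sig) => {
    if ((dst.get(sig) ?? 0) < n) errors.push(`missing or altered: ${sig}`);
  });
  dst.forEach((n, sig) => {
    if (n > (src.get(sig) ?? 0)) errors.push(`unexpected: ${sig}`);
  });
  return { ok: errors.length === 0, errors };
};

const name: Token = { type: "interpolation", raw: "{{name}}", variable: "name" };
const count: Token = { type: "interpolation", raw: "{{count, number}}", variable: "count", format: "number" };
const text = (raw: string): Token => ({ type: "text", raw });

const source = [text("Hello "), name, text(", you have "), count, text(" items")];
// Reordered placeholders are fine: only identities and counts are compared.
const reordered = [count, text(" elementos para "), name];
// A translated variable name changes the identity, so it is dropped AND added.
const renamed = [
  text("Hola "),
  { type: "interpolation", raw: "{{nombre}}", variable: "nombre" } as Token,
  text(", tienes "),
  count,
];

const good = check(source, reordered);
const bad = check(source, renamed);
console.log(good, bad);
```

Because a renamed variable surfaces as one dropped identity plus one added identity, a caller that wants to report it as a single "changed" placeholder has to pair the two itself, which is the case the `placeholderDiff` doc comment describes.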
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"index.js","names":[],"sources":["../src/model.ts","../src/errors.ts","../src/mask.ts"],"sourcesContent":["/**\n * The canonical message model.\n *\n * This is the format-agnostic centre of Astilba: every file/syntax adapter maps\n * files <-> this model. Phase 0 ships one adapter (native i18next v4) but the\n * model is deliberately syntax-neutral so ICU and other dialects can map onto it\n * later without changing it.\n *\n * Invariants worth stating out loud:\n * - Value text is preserved EXACTLY (never mutate value bytes). `Value.raw` is\n * the source of truth; `Value.tokens` is a derived view\n * used only for masking/analysis and is never used to reconstruct output.\n * - Plurals are represented STRUCTURALLY as a CLDR-category -> value map, not as\n * suffixed flat keys. The suffix set is re-derived per target language on\n * export, never carried through.\n */\n\n/** The six CLDR plural categories. The set that actually applies is language-dependent. */\nexport type CLDRCategory = \"zero\" | \"one\" | \"two\" | \"few\" | \"many\" | \"other\";\n\nexport const ALL_CLDR_CATEGORIES: readonly CLDRCategory[] = [\n \"zero\",\n \"one\",\n \"two\",\n \"few\",\n \"many\",\n \"other\",\n];\n\n/**\n * Whether a key is pluralised, and if so how.\n * - \"none\" -> a plain string; no plural suffix on export.\n * - \"cardinal\" -> `_one`, `_other`, ... suffixes.\n * - \"ordinal\" -> `_ordinal_one`, ... suffixes (i18next v4 ordinal form).\n *\n * Note \"none\" is distinct from a single-category cardinal (e.g. Japanese, whose\n * only category is `other`): `foo` (none) and `foo_other` (cardinal) are\n * different keys and must round-trip differently.\n */\nexport type PluralKind = \"none\" | \"cardinal\" | \"ordinal\";\n\n/** A token within a value. Derived from `Value.raw`; used for masking + validation. 
*/\nexport type ValueToken =\n | { type: \"text\"; raw: string }\n /** `{{var}}`, `{{var, format}}`, `{{var, format(options)}}` */\n | {\n type: \"interpolation\";\n raw: string;\n variable: string;\n format?: string;\n }\n /** `$t(ref)`, `$t(ref, {\"opt\": ...})` */\n | {\n type: \"nesting\";\n raw: string;\n ref: string;\n options?: string;\n }\n /** an HTML/XML tag `<...>` or an entity `&...;` — opaque markup */\n | { type: \"markup\"; raw: string };\n\n/**\n * A translatable value. `raw` is byte-exact source; `tokens` is the parsed view.\n * `tokens.map(t => t.raw).join(\"\")` always reconstructs `raw` exactly.\n */\nexport interface Value {\n raw: string;\n tokens: ValueToken[];\n}\n\n/**\n * The set of values for one (base, context) cell.\n *\n * - kind \"none\": `values` holds a single \"other\" entry = the plain string.\n * - kind \"cardinal\"/\"ordinal\": `values` holds the per-category plural forms.\n * - `bare`: present only in the rare i18next case where a context key has BOTH\n * a suffix-less form (used when `t()` is called without `count`) AND plural\n * forms (used when `count` is given). Kept so both render paths are lossless.\n */\nexport interface PluralSet {\n kind: PluralKind;\n values: Map<CLDRCategory, Value>;\n bare?: Value;\n}\n\n/**\n * A logical message: a base key with one entry per context value.\n * The no-context case is a single entry keyed \"\" (empty string).\n *\n * `base` is the full key path WITHOUT namespace and WITHOUT plural/context\n * suffixes, using the project key separator (default \".\"), e.g. `account.friend`.\n */\nexport interface Key {\n namespace: string;\n base: string;\n /** contextValue -> its PluralSet. \"\" === no context. */\n contexts: Map<string, PluralSet>;\n}\n\n/**\n * A single language's worth of canonical data. Round-trip is per-language.\n *\n * IN-MEMORY ONLY: this is `Map`-based for fast lookup, so it is NOT directly\n * JSON-serializable (`JSON.stringify` yields `{}` for the Maps). 
A persistence/\n * transport DTO (plain objects or a `toJSON`/`fromJSON` pair) is a v1.0 item,\n * needed once the backend stores or ships the model.\n */\nexport interface CanonicalModel {\n /** BCP-47, e.g. `en`, `en-US`, `pt-BR`. */\n language: string;\n /** `${namespace}:${base}` -> Key */\n keys: Map<string, Key>;\n}\n\nexport const keyId = (namespace: string, base: string): string =>\n `${namespace}:${base}`;\n","/**\n * The base error type. `AstilbaError` is the one class consumers catch\n * (`if (e instanceof AstilbaError) e.code`); `code` is an OPEN string so each\n * format adapter owns its own code constants without a closed core enum coupling\n * core to any one syntax. The i18next-v4 codes + their loud, fix-it-yourself\n * factory functions live with the adapter (errors-i18next.ts).\n */\n\n/** An error code is an open string; adapters define their own (e.g. `ICU_NOT_SUPPORTED`). */\nexport type AstilbaErrorCode = string;\n\nexport class AstilbaError extends Error {\n readonly code: string;\n /** The fully-qualified key (`namespace:flatKey`) the problem was found at, if any. */\n readonly key?: string;\n readonly details?: Record<string, unknown>;\n\n constructor(\n code: string,\n message: string,\n opts: { key?: string; details?: Record<string, unknown> } = {}\n ) {\n super(message);\n this.name = \"AstilbaError\";\n this.code = code;\n this.key = opts.key;\n this.details = opts.details;\n }\n}\n","/**\n * MT masking & placeholder validation — the FORMAT-NEUTRAL core. It operates on\n * canonical `ValueToken[]` (which the model\n * already carries) instead of parsing strings itself: a syntax adapter tokenizes,\n * core masks and validates. The one place a raw string must be re-tokenized is a\n * translation returned from MT — it isn't in the model — so the string-entry\n * `validatePlaceholders` takes the adapter's `Tokenizer` by injection.\n *\n * Two jobs:\n * 1. 
maskTokens()/unmask(): replace every non-text token (the WHOLE token, incl.\n * formatter and `$t()` ref name) with an opaque sentinel before an MT/LLM call,\n * and restore it after. Because the formatter keyword and ref live INSIDE the\n * masked span, MT never sees them — that alone defeats the \"translated\n * `one`/`other` -> `uno`/`otros`\" class of bug.\n * 2. validatePlaceholderTokens(): a fail-closed, CI-failable check comparing a\n * source value's tokens against its translation's tokens (the caller restores\n * any masked sentinels first) — every interpolation variable, formatter\n * keyword, nesting ref, and markup tag must survive unmodified. This is the\n * validator shipped in the free utility.\n *\n * Sentinels use private-use-area delimiters so they carry no linguistic content\n * for an MT engine to \"helpfully\" translate, while still being detectable if the\n * engine mangles them.\n */\n\nimport { AstilbaError } from \"./errors.ts\";\nimport type { ValueToken } from \"./model.ts\";\n\nconst OPEN = \"\\uE000\";\nconst CLOSE = \"\\uE001\";\n\nexport const sentinel = (index: number): string => `${OPEN}${index}${CLOSE}`;\n\nconst SENTINEL_RE = new RegExp(`${OPEN}(\\\\d+)${CLOSE}`, \"gu\");\n\n/**\n * Turns a raw value string into canonical tokens. Supplied by whichever syntax\n * adapter is in use — the one syntax-specific dependency the string-entry validator\n * needs, to re-tokenize an MT-returned translation that was never in the model.\n */\nexport type Tokenizer = (raw: string) => ValueToken[];\n\nexport interface MaskResult {\n masked: string;\n /** original token raws, indexed by sentinel number */\n parts: string[];\n}\n\n/**\n * Replace interpolation / nesting / markup tokens with sentinels. Rejects loudly\n * if the literal text already contains a reserved sentinel delimiter — rare but\n * legal in real values (e.g. 
private-use-area glyphs from icon fonts like Material\n * Icons / Nerd Fonts) — rather than letting unmask() silently corrupt it.\n */\nexport const maskTokens = (tokens: ValueToken[]): MaskResult => {\n const parts: string[] = [];\n let masked = \"\";\n for (const tok of tokens) {\n if (tok.type === \"text\") {\n if (tok.raw.includes(OPEN) || tok.raw.includes(CLOSE)) {\n throw new AstilbaError(\n \"MASK_VALIDATION\",\n \"Value text contains a reserved masking sentinel delimiter \" +\n \"(U+E000/U+E001); it cannot be masked without ambiguity. Strip or \" +\n \"escape these private-use-area characters before masking.\"\n );\n }\n masked += tok.raw;\n } else {\n masked += sentinel(parts.length);\n parts.push(tok.raw);\n }\n }\n return { masked, parts };\n};\n\nexport const unmask = (masked: string, parts: string[]): string =>\n masked.replace(SENTINEL_RE, (_, n: string) => {\n const part = parts[Number(n)];\n if (part === undefined) {\n throw new AstilbaError(\n \"MASK_VALIDATION\",\n `Unknown sentinel index ${n} during unmask.`\n );\n }\n return part;\n });\n\nexport interface SentinelCheck {\n ok: boolean;\n errors: string[];\n}\n\n/**\n * Validate that an MT engine returned every sentinel exactly once, unmodified,\n * and invented none. Reordering is allowed (target languages reorder freely);\n * pass `requireOrder` to also assert original order.\n */\nexport const validateSentinels = (\n translated: string,\n parts: string[],\n opts: { requireOrder?: boolean } = {}\n): SentinelCheck => {\n const errors: string[] = [];\n const seen = new Map<number, number>();\n const order: number[] = [];\n let m: RegExpExecArray | null;\n SENTINEL_RE.lastIndex = 0;\n while ((m = SENTINEL_RE.exec(translated)) !== null) {\n const idx = Number(m[1]);\n seen.set(idx, (seen.get(idx) ?? 0) + 1);\n order.push(idx);\n }\n\n for (let i = 0; i < parts.length; i += 1) {\n const count = seen.get(i) ?? 
0;\n if (count === 0) {\n errors.push(`placeholder #${i} (${parts[i]}) was dropped by MT`);\n } else if (count > 1) {\n errors.push(`placeholder #${i} (${parts[i]}) was duplicated by MT`);\n }\n }\n for (const idx of seen.keys()) {\n if (idx >= parts.length) {\n errors.push(`MT invented an unknown placeholder #${idx}`);\n }\n }\n // Detect a corrupted sentinel: stray delimiter chars not part of a valid token.\n const stripped = translated.replace(SENTINEL_RE, \"\");\n if (stripped.includes(OPEN) || stripped.includes(CLOSE)) {\n errors.push(\"MT corrupted a placeholder sentinel (stray delimiter found)\");\n }\n if (opts.requireOrder === true) {\n const expected = [...order].toSorted((a, b) => a - b);\n if (order.join(\",\") !== expected.join(\",\")) {\n errors.push(\"placeholder order was changed\");\n }\n }\n\n return { errors, ok: errors.length === 0 };\n};\n\n// --- post-hoc placeholder validation (the free-utility CI check) -------------\n\n// Escape `\\` then `|` so the `|` field separator below is unambiguous even if a\n// variable / format / ref / options string itself contains one.\nconst esc = (s: string): string =>\n s.replaceAll(\"\\\\\", \"\\\\\\\\\").replaceAll(\"|\", \"\\\\|\");\n\n/**\n * Canonical identity for placeholder equality, computed from the `ValueToken`\n * fields directly — variable + format for interpolation, ref for nesting, raw for\n * markup. No syntax-specific normalisation: a value and its own translation carry\n * byte-identical placeholders, so the raw canonical fields ARE the faithful\n * identity (an adapter wanting looser matching can pre-normalise its tokens before\n * calling `validatePlaceholderTokens`). Returns `null` for text (not a placeholder).\n */\nconst signature = (tok: ValueToken): string | null => {\n switch (tok.type) {\n case \"interpolation\": {\n return `interp:${esc(tok.variable)}|${esc(tok.format ?? 
\"\")}`;\n }\n case \"nesting\": {\n // options are part of the placeholder's identity: $t(a, {\"count\": 3}) and\n // $t(a, {\"count\": 0}) render differently, so a mutated option must not pass.\n return `nest:${esc(tok.ref)}|${esc(tok.options ?? \"\")}`;\n }\n case \"markup\": {\n // one field (the whole raw tag), so no separator to disambiguate — and the\n // `markup:` prefix keeps it distinct from interp/nesting signatures.\n return `markup:${tok.raw}`;\n }\n case \"text\": {\n return null;\n }\n default: {\n // Exhaustive over ValueToken: a future variant becomes a compile error here\n // rather than silently slipping through this fail-closed validator as text.\n return tok satisfies never;\n }\n }\n};\n\n// Group tokens by signature; both the validator and the diff count from this.\nconst bySignature = (tokens: ValueToken[]): Map<string, ValueToken[]> => {\n const groups = new Map<string, ValueToken[]>();\n for (const tok of tokens) {\n const sig = signature(tok);\n if (sig === null) {\n continue;\n }\n const arr = groups.get(sig);\n if (arr) {\n arr.push(tok);\n } else {\n groups.set(sig, [tok]);\n }\n }\n return groups;\n};\n\nexport interface PlaceholderCheck {\n ok: boolean;\n errors: string[];\n}\n\n/**\n * Compare a source value's tokens against its translation's tokens. Fails closed:\n * any added, dropped, or modified placeholder/markup is an error. Catches a\n * translated variable name, a translated formatter keyword, a mangled `$t()` ref,\n * or a dropped tag — the exact placeholder-corruption failure mode this guards.\n */\nexport const validatePlaceholderTokens = (\n source: ValueToken[],\n translated: ValueToken[]\n): PlaceholderCheck => {\n const src = bySignature(source);\n const dst = bySignature(translated);\n const errors: string[] = [];\n\n for (const [sig, toks] of src) {\n if ((dst.get(sig)?.length ?? 
0) < toks.length) {\n errors.push(\n `source placeholder \"${sig}\" is missing or altered in the translation`\n );\n }\n }\n for (const [sig, toks] of dst) {\n if (toks.length > (src.get(sig)?.length ?? 0)) {\n errors.push(`translation introduced an unexpected placeholder \"${sig}\"`);\n }\n }\n\n return { errors, ok: errors.length === 0 };\n};\n\n/**\n * String-entry placeholder validator: tokenizes both sides with the supplied\n * (syntax-specific) `tokenize` and defers to {@link validatePlaceholderTokens}.\n * The free-utility CI check — the adapter pre-binds its tokenizer so callers get\n * the ergonomic two-argument form.\n */\nexport const validatePlaceholders = (\n source: string,\n translated: string,\n tokenize: Tokenizer\n): PlaceholderCheck =>\n validatePlaceholderTokens(tokenize(source), tokenize(translated));\n\n/** The actual tokens that differ between source and translation. */\nexport interface PlaceholderDiff {\n /** placeholders in `source` that the translation dropped (or has fewer of) */\n dropped: ValueToken[];\n /** placeholders the translation introduced that `source` did not have */\n added: ValueToken[];\n}\n\n/**\n * The structured placeholder diff: the tokens dropped from / added to a\n * translation relative to its source. 
Shares {@link validatePlaceholderTokens}'s\n * identity rule — both go through the same {@link signature}, so the diff is\n * empty exactly when that validator reports `ok` — but returns the tokens so a\n * caller can render a precise report and decide whether a dropped+added pair is\n * an unambiguous \"changed\" or must be listed separately.\n *\n * @param source canonical tokens of the source value\n * @param translated canonical tokens of the translation\n * @returns dropped and added tokens; both empty ⇔ placeholders are preserved\n */\nexport const placeholderDiff = (\n source: ValueToken[],\n translated: ValueToken[]\n): PlaceholderDiff => {\n const src = bySignature(source);\n const dst = bySignature(translated);\n const dropped: ValueToken[] = [];\n const added: ValueToken[] = [];\n for (const [sig, toks] of src) {\n const have = dst.get(sig)?.length ?? 0;\n for (let i = have; i < toks.length; i += 1) {\n dropped.push(toks[i]);\n }\n }\n for (const [sig, toks] of dst) {\n const have = src.get(sig)?.length ?? 
0;\n for (let i = have; i < toks.length; i += 1) {\n added.push(toks[i]);\n }\n }\n return { added, dropped };\n};\n"],"mappings":";;AAoBA,MAAa,sBAA+C;CAC1D;CACA;CACA;CACA;CACA;CACA;AACF;AAuFA,MAAa,SAAS,WAAmB,SACvC,GAAG,UAAU,GAAG;;;ACxGlB,IAAa,eAAb,cAAkC,MAAM;CACtC;;CAEA;CACA;CAEA,YACE,MACA,SACA,OAA4D,CAAC,GAC7D;EACA,MAAM,OAAO;EACb,KAAK,OAAO;EACZ,KAAK,OAAO;EACZ,KAAK,MAAM,KAAK;EAChB,KAAK,UAAU,KAAK;CACtB;AACF;;;;;;;;;;;;;;;;;;;;;;;;;;;ACAA,MAAM,OAAO;AACb,MAAM,QAAQ;AAEd,MAAa,YAAY,UAA0B,GAAG,OAAO,QAAQ;AAErE,MAAM,cAAc,IAAI,OAAO,GAAG,KAAK,QAAQ,SAAS,IAAI;;;;;;;AAqB5D,MAAa,cAAc,WAAqC;CAC9D,MAAM,QAAkB,CAAC;CACzB,IAAI,SAAS;CACb,KAAK,MAAM,OAAO,QAChB,IAAI,IAAI,SAAS,QAAQ;EACvB,IAAI,IAAI,IAAI,SAAS,IAAI,KAAK,IAAI,IAAI,SAAS,KAAK,GAClD,MAAM,IAAI,aACR,mBACA,qLAGF;EAEF,UAAU,IAAI;CAChB,OAAO;EACL,UAAU,SAAS,MAAM,MAAM;EAC/B,MAAM,KAAK,IAAI,GAAG;CACpB;CAEF,OAAO;EAAE;EAAQ;CAAM;AACzB;AAEA,MAAa,UAAU,QAAgB,UACrC,OAAO,QAAQ,cAAc,GAAG,MAAc;CAC5C,MAAM,OAAO,MAAM,OAAO,CAAC;CAC3B,IAAI,SAAS,KAAA,GACX,MAAM,IAAI,aACR,mBACA,0BAA0B,EAAE,gBAC9B;CAEF,OAAO;AACT,CAAC;;;;;;AAYH,MAAa,qBACX,YACA,OACA,OAAmC,CAAC,MAClB;CAClB,MAAM,SAAmB,CAAC;CAC1B,MAAM,uBAAO,IAAI,IAAoB;CACrC,MAAM,QAAkB,CAAC;CACzB,IAAI;CACJ,YAAY,YAAY;CACxB,QAAQ,IAAI,YAAY,KAAK,UAAU,OAAO,MAAM;EAClD,MAAM,MAAM,OAAO,EAAE,EAAE;EACvB,KAAK,IAAI,MAAM,KAAK,IAAI,GAAG,KAAK,KAAK,CAAC;EACtC,MAAM,KAAK,GAAG;CAChB;CAEA,KAAK,IAAI,IAAI,GAAG,IAAI,MAAM,QAAQ,KAAK,GAAG;EACxC,MAAM,QAAQ,KAAK,IAAI,CAAC,KAAK;EAC7B,IAAI,UAAU,GACZ,OAAO,KAAK,gBAAgB,EAAE,IAAI,MAAM,GAAG,oBAAoB;OAC1D,IAAI,QAAQ,GACjB,OAAO,KAAK,gBAAgB,EAAE,IAAI,MAAM,GAAG,uBAAuB;CAEtE;CACA,KAAK,MAAM,OAAO,KAAK,KAAK,GAC1B,IAAI,OAAO,MAAM,QACf,OAAO,KAAK,uCAAuC,KAAK;CAI5D,MAAM,WAAW,WAAW,QAAQ,aAAa,EAAE;CACnD,IAAI,SAAS,SAAS,IAAI,KAAK,SAAS,SAAS,KAAK,GACpD,OAAO,KAAK,6DAA6D;CAE3E,IAAI,KAAK,iBAAiB,MAAM;EAC9B,MAAM,WAAW,CAAC,GAAG,KAAK,CAAC,CAAC,UAAU,GAAG,MAAM,IAAI,CAAC;EACpD,IAAI,MAAM,KAAK,GAAG,MAAM,SAAS,KAAK,GAAG,GACvC,OAAO,KAAK,+BAA+B;CAE/C;CAEA,OAAO;EAAE;EAAQ,IAAI,OAAO,WAAW;CAAE;AAC3C;AAMA,MAAM,OAAO,MACX,EAAE,WAAW,MAAM,MAAM,CAAC
,CAAC,WAAW,KAAK,KAAK;;;;;;;;;AAUlD,MAAM,aAAa,QAAmC;CACpD,QAAQ,IAAI,MAAZ;EACE,KAAK,iBACH,OAAO,UAAU,IAAI,IAAI,QAAQ,EAAE,GAAG,IAAI,IAAI,UAAU,EAAE;EAE5D,KAAK,WAGH,OAAO,QAAQ,IAAI,IAAI,GAAG,EAAE,GAAG,IAAI,IAAI,WAAW,EAAE;EAEtD,KAAK,UAGH,OAAO,UAAU,IAAI;EAEvB,KAAK,QACH,OAAO;EAET,SAGE,OAAO;CAEX;AACF;AAGA,MAAM,eAAe,WAAoD;CACvE,MAAM,yBAAS,IAAI,IAA0B;CAC7C,KAAK,MAAM,OAAO,QAAQ;EACxB,MAAM,MAAM,UAAU,GAAG;EACzB,IAAI,QAAQ,MACV;EAEF,MAAM,MAAM,OAAO,IAAI,GAAG;EAC1B,IAAI,KACF,IAAI,KAAK,GAAG;OAEZ,OAAO,IAAI,KAAK,CAAC,GAAG,CAAC;CAEzB;CACA,OAAO;AACT;;;;;;;AAaA,MAAa,6BACX,QACA,eACqB;CACrB,MAAM,MAAM,YAAY,MAAM;CAC9B,MAAM,MAAM,YAAY,UAAU;CAClC,MAAM,SAAmB,CAAC;CAE1B,KAAK,MAAM,CAAC,KAAK,SAAS,KACxB,KAAK,IAAI,IAAI,GAAG,CAAC,EAAE,UAAU,KAAK,KAAK,QACrC,OAAO,KACL,uBAAuB,IAAI,2CAC7B;CAGJ,KAAK,MAAM,CAAC,KAAK,SAAS,KACxB,IAAI,KAAK,UAAU,IAAI,IAAI,GAAG,CAAC,EAAE,UAAU,IACzC,OAAO,KAAK,qDAAqD,IAAI,EAAE;CAI3E,OAAO;EAAE;EAAQ,IAAI,OAAO,WAAW;CAAE;AAC3C;;;;;;;AAQA,MAAa,wBACX,QACA,YACA,aAEA,0BAA0B,SAAS,MAAM,GAAG,SAAS,UAAU,CAAC;;;;;;;;;;;;;AAsBlE,MAAa,mBACX,QACA,eACoB;CACpB,MAAM,MAAM,YAAY,MAAM;CAC9B,MAAM,MAAM,YAAY,UAAU;CAClC,MAAM,UAAwB,CAAC;CAC/B,MAAM,QAAsB,CAAC;CAC7B,KAAK,MAAM,CAAC,KAAK,SAAS,KAAK;EAC7B,MAAM,OAAO,IAAI,IAAI,GAAG,CAAC,EAAE,UAAU;EACrC,KAAK,IAAI,IAAI,MAAM,IAAI,KAAK,QAAQ,KAAK,GACvC,QAAQ,KAAK,KAAK,EAAE;CAExB;CACA,KAAK,MAAM,CAAC,KAAK,SAAS,KAAK;EAC7B,MAAM,OAAO,IAAI,IAAI,GAAG,CAAC,EAAE,UAAU;EACrC,KAAK,IAAI,IAAI,MAAM,IAAI,KAAK,QAAQ,KAAK,GACvC,MAAM,KAAK,KAAK,EAAE;CAEtB;CACA,OAAO;EAAE;EAAO;CAAQ;AAC1B"}
|
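Note on the masking flow described in the `src/mask.ts` header comment embedded in the source map above: the sketch below shows, under stated assumptions, how whole non-text tokens can be swapped for private-use-area sentinels before an MT call and restored afterwards. It is not the package's code. `Token`, `mask`, and `restore` are local stand-ins written for this note, the token list is hand-built rather than produced by a real adapter tokenizer, and the "translated" string is simulated.

```typescript
// Local stand-in for the ValueToken union in src/model.ts (redeclared so the
// sketch is self-contained; not imported from the package).
type Token =
  | { type: "text"; raw: string }
  | { type: "interpolation"; raw: string; variable: string; format?: string }
  | { type: "nesting"; raw: string; ref: string; options?: string }
  | { type: "markup"; raw: string };

// Private-use-area delimiters, as in the source comment (U+E000 / U+E001).
const OPEN = "\uE000";
const CLOSE = "\uE001";
const SENTINEL_RE = new RegExp(`${OPEN}(\\d+)${CLOSE}`, "gu");

// Replace every non-text token with an indexed sentinel; keep the raws aside.
function mask(tokens: Token[]): { masked: string; parts: string[] } {
  const parts: string[] = [];
  let masked = "";
  for (const tok of tokens) {
    if (tok.type === "text") {
      // Reject text that already contains a delimiter, since restoring it
      // later would be ambiguous.
      if (tok.raw.includes(OPEN) || tok.raw.includes(CLOSE)) {
        throw new Error("text contains a reserved sentinel delimiter");
      }
      masked += tok.raw;
    } else {
      masked += `${OPEN}${parts.length}${CLOSE}`;
      parts.push(tok.raw);
    }
  }
  return { masked, parts };
}

// Put the original token raws back in place of their sentinels.
function restore(masked: string, parts: string[]): string {
  return masked.replace(SENTINEL_RE, (_match: string, n: string) => {
    const part = parts[Number(n)];
    if (part === undefined) {
      throw new Error(`unknown sentinel index ${n}`);
    }
    return part;
  });
}

// Hypothetical, hand-tokenized source value.
const sourceTokens: Token[] = [
  { type: "text", raw: "Hello " },
  { type: "interpolation", raw: "{{name}}", variable: "name" },
  { type: "text", raw: ", you have " },
  { type: "interpolation", raw: "{{count, number}}", variable: "count", format: "number" },
  { type: "text", raw: " messages" },
];

const { masked, parts } = mask(sourceTokens);

// Simulated MT output: the engine reordered the sentinels but left them intact.
const translated = `${OPEN}1${CLOSE} mensajes para ${OPEN}0${CLOSE}`;
const restored = restore(translated, parts);
```

Because the formatter keyword (`number`) sits inside the masked span, the simulated engine never sees it, which is the property the source comment relies on. Reordering is tolerated here because restoration is driven by the sentinel index, not by position.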
package/dist/model-5mrSQGoC.d.ts
ADDED
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
//#region src/model.d.ts
|
|
2
|
+
/**
|
|
3
|
+
* The canonical message model.
|
|
4
|
+
*
|
|
5
|
+
* This is the format-agnostic centre of Astilba: every file/syntax adapter maps
|
|
6
|
+
* files <-> this model. Phase 0 ships one adapter (native i18next v4) but the
|
|
7
|
+
* model is deliberately syntax-neutral so ICU and other dialects can map onto it
|
|
8
|
+
* later without changing it.
|
|
9
|
+
*
|
|
10
|
+
* Invariants worth stating out loud:
|
|
11
|
+
* - Value text is preserved EXACTLY (never mutate value bytes). `Value.raw` is
|
|
12
|
+
* the source of truth; `Value.tokens` is a derived view
|
|
13
|
+
* used only for masking/analysis and is never used to reconstruct output.
|
|
14
|
+
* - Plurals are represented STRUCTURALLY as a CLDR-category -> value map, not as
|
|
15
|
+
* suffixed flat keys. The suffix set is re-derived per target language on
|
|
16
|
+
* export, never carried through.
|
|
17
|
+
*/
|
|
18
|
+
/** The six CLDR plural categories. The set that actually applies is language-dependent. */
|
|
19
|
+
type CLDRCategory = "zero" | "one" | "two" | "few" | "many" | "other";
|
|
20
|
+
declare const ALL_CLDR_CATEGORIES: readonly CLDRCategory[];
|
|
21
|
+
/**
|
|
22
|
+
* Whether a key is pluralised, and if so how.
|
|
23
|
+
* - "none" -> a plain string; no plural suffix on export.
|
|
24
|
+
* - "cardinal" -> `_one`, `_other`, ... suffixes.
|
|
25
|
+
* - "ordinal" -> `_ordinal_one`, ... suffixes (i18next v4 ordinal form).
|
|
26
|
+
*
|
|
27
|
+
* Note "none" is distinct from a single-category cardinal (e.g. Japanese, whose
|
|
28
|
+
* only category is `other`): `foo` (none) and `foo_other` (cardinal) are
|
|
29
|
+
* different keys and must round-trip differently.
|
|
30
|
+
*/
|
|
31
|
+
type PluralKind = "none" | "cardinal" | "ordinal";
|
|
32
|
+
/** A token within a value. Derived from `Value.raw`; used for masking + validation. */
|
|
33
|
+
type ValueToken = {
|
|
34
|
+
type: "text";
|
|
35
|
+
raw: string;
|
|
36
|
+
} /** `{{var}}`, `{{var, format}}`, `{{var, format(options)}}` */ | {
|
|
37
|
+
type: "interpolation";
|
|
38
|
+
raw: string;
|
|
39
|
+
variable: string;
|
|
40
|
+
format?: string;
|
|
41
|
+
} /** `$t(ref)`, `$t(ref, {"opt": ...})` */ | {
|
|
42
|
+
type: "nesting";
|
|
43
|
+
raw: string;
|
|
44
|
+
ref: string;
|
|
45
|
+
options?: string;
|
|
46
|
+
} /** an HTML/XML tag `<...>` or an entity `&...;` — opaque markup */ | {
|
|
47
|
+
type: "markup";
|
|
48
|
+
raw: string;
|
|
49
|
+
};
|
|
50
|
+
/**
|
|
51
|
+
* A translatable value. `raw` is byte-exact source; `tokens` is the parsed view.
|
|
52
|
+
* `tokens.map(t => t.raw).join("")` always reconstructs `raw` exactly.
|
|
53
|
+
*/
|
|
54
|
+
interface Value {
|
|
55
|
+
raw: string;
|
|
56
|
+
tokens: ValueToken[];
|
|
57
|
+
}
|
|
58
|
+
/**
|
|
59
|
+
* The set of values for one (base, context) cell.
|
|
60
|
+
*
|
|
61
|
+
* - kind "none": `values` holds a single "other" entry = the plain string.
|
|
62
|
+
* - kind "cardinal"/"ordinal": `values` holds the per-category plural forms.
|
|
63
|
+
* - `bare`: present only in the rare i18next case where a context key has BOTH
|
|
64
|
+
* a suffix-less form (used when `t()` is called without `count`) AND plural
|
|
65
|
+
* forms (used when `count` is given). Kept so both render paths are lossless.
|
|
66
|
+
*/
|
|
67
|
+
interface PluralSet {
|
|
68
|
+
kind: PluralKind;
|
|
69
|
+
values: Map<CLDRCategory, Value>;
|
|
70
|
+
bare?: Value;
|
|
71
|
+
}
|
|
72
|
+
/**
|
|
73
|
+
* A logical message: a base key with one entry per context value.
|
|
74
|
+
* The no-context case is a single entry keyed "" (empty string).
|
|
75
|
+
*
|
|
76
|
+
* `base` is the full key path WITHOUT namespace and WITHOUT plural/context
|
|
77
|
+
* suffixes, using the project key separator (default "."), e.g. `account.friend`.
|
|
78
|
+
*/
|
|
79
|
+
interface Key {
|
|
80
|
+
namespace: string;
|
|
81
|
+
base: string;
|
|
82
|
+
/** contextValue -> its PluralSet. "" === no context. */
|
|
83
|
+
contexts: Map<string, PluralSet>;
|
|
84
|
+
}
|
|
85
|
+
/**
|
|
86
|
+
* A single language's worth of canonical data. Round-trip is per-language.
|
|
87
|
+
*
|
|
88
|
+
* IN-MEMORY ONLY: this is `Map`-based for fast lookup, so it is NOT directly
|
|
89
|
+
* JSON-serializable (`JSON.stringify` yields `{}` for the Maps). A persistence/
|
|
90
|
+
* transport DTO (plain objects or a `toJSON`/`fromJSON` pair) is a v1.0 item,
|
|
91
|
+
* needed once the backend stores or ships the model.
|
|
92
|
+
*/
|
|
93
|
+
interface CanonicalModel {
|
|
94
|
+
/** BCP-47, e.g. `en`, `en-US`, `pt-BR`. */
|
|
95
|
+
language: string;
|
|
96
|
+
/** `${namespace}:${base}` -> Key */
|
|
97
|
+
keys: Map<string, Key>;
|
|
98
|
+
}
|
|
99
|
+
declare const keyId: (namespace: string, base: string) => string;
|
|
100
|
+
//#endregion
|
|
101
|
+
export { PluralKind as a, ValueToken as c, Key as i, keyId as l, CLDRCategory as n, PluralSet as o, CanonicalModel as r, Value as s, ALL_CLDR_CATEGORIES as t };
|
|
102
|
+
//# sourceMappingURL=model-5mrSQGoC.d.ts.map
|
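Note on the `CanonicalModel` comment above ("IN-MEMORY ONLY ... `JSON.stringify` yields `{}` for the Maps"): the sketch below illustrates that behaviour and one possible workaround using a `JSON.stringify` replacer and a `JSON.parse` reviver. It is only an illustration, not the transport DTO the comment defers to v1.0. The `__map` tag, `mapReplacer`, and `mapReviver` are hypothetical names invented for this note, and the model literal is a simplified stand-in (the `tokens` field of `Value` is left out for brevity).

```typescript
// Encode any Map as a tagged array of entries so it survives JSON.stringify.
// The "__map" tag is a hypothetical convention chosen for this sketch.
const mapReplacer = (_key: string, value: unknown): unknown =>
  value instanceof Map ? { __map: Array.from(value.entries()) } : value;

// Rebuild Maps from the tagged form. JSON.parse calls the reviver bottom-up,
// so nested Maps are restored before the Map that contains them.
const mapReviver = (_key: string, value: unknown): unknown => {
  if (typeof value === "object" && value !== null && "__map" in value) {
    const entries = (value as { __map: [unknown, unknown][] }).__map;
    return new Map(entries);
  }
  return value;
};

// Simplified stand-in for one PluralSet (kind + per-category values).
const friendPlural = {
  kind: "cardinal",
  values: new Map([
    ["one", { raw: "{{count}} friend" }],
    ["other", { raw: "{{count}} friends" }],
  ]),
};

// Simplified stand-in for a CanonicalModel; the key id follows the
// `${namespace}:${base}` shape shown in the declaration above.
const model = {
  language: "en",
  keys: new Map([
    [
      "common:account.friend",
      {
        namespace: "common",
        base: "account.friend",
        contexts: new Map([["", friendPlural]]), // "" === no context
      },
    ],
  ]),
};

// Without a replacer the Map contents are silently lost.
const naive = JSON.stringify(model);

// With the replacer/reviver pair the nested Maps round-trip.
const json = JSON.stringify(model, mapReplacer);
const roundTripped = JSON.parse(json, mapReviver);
```

A tagged-entries encoding keeps insertion order and does not require string keys, at the cost of a format that other tools will not recognise without the matching reviver; a plain-object DTO would be the more conventional choice if all keys are known to be strings.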
package/package.json
ADDED
|
@@ -0,0 +1,62 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "@astilba/core",
|
|
3
|
+
"version": "0.1.0",
|
|
4
|
+
"description": "Format-neutral canonical i18n message model, vendored CLDR plural rules, and the round-trip message-fidelity harness (driver + FormatAdapter/RenderOracle contracts). Syntax adapters (e.g. @astilba/adapter-i18next-v4) plug in on top.",
|
|
5
|
+
"keywords": [
|
|
6
|
+
"cldr",
|
|
7
|
+
"i18n",
|
|
8
|
+
"i18next",
|
|
9
|
+
"icu",
|
|
10
|
+
"localization",
|
|
11
|
+
"tms",
|
|
12
|
+
"translation"
|
|
13
|
+
],
|
|
14
|
+
"homepage": "https://github.com/astilbahq/astilba/tree/main/packages/core#readme",
|
|
15
|
+
"bugs": {
|
|
16
|
+
"url": "https://github.com/astilbahq/astilba/issues"
|
|
17
|
+
},
|
|
18
|
+
"license": "MIT",
|
|
19
|
+
"author": "Rees Morris",
|
|
20
|
+
"repository": {
|
|
21
|
+
"type": "git",
|
|
22
|
+
"url": "git+https://github.com/astilbahq/astilba.git",
|
|
23
|
+
"directory": "packages/core"
|
|
24
|
+
},
|
|
25
|
+
"files": [
|
|
26
|
+
"dist"
|
|
27
|
+
],
|
|
28
|
+
"type": "module",
|
|
29
|
+
"sideEffects": false,
|
|
30
|
+
"main": "./dist/index.js",
|
|
31
|
+
"module": "./dist/index.js",
|
|
32
|
+
"types": "./dist/index.d.ts",
|
|
33
|
+
"exports": {
|
|
34
|
+
".": {
|
|
35
|
+
"types": "./dist/index.d.ts",
|
|
36
|
+
"default": "./dist/index.js"
|
|
37
|
+
},
|
|
38
|
+
"./cldr": {
|
|
39
|
+
"types": "./dist/cldr.d.ts",
|
|
40
|
+
"default": "./dist/cldr.js"
|
|
41
|
+
},
|
|
42
|
+
"./harness": {
|
|
43
|
+
"types": "./dist/harness.d.ts",
|
|
44
|
+
"default": "./dist/harness.js"
|
|
45
|
+
},
|
|
46
|
+
"./package.json": "./package.json"
|
|
47
|
+
},
|
|
48
|
+
"publishConfig": {
|
|
49
|
+
"access": "public"
|
|
50
|
+
},
|
|
51
|
+
"devDependencies": {
|
|
52
|
+
"@arethetypeswrong/cli": "^0.18.2",
|
|
53
|
+
"@types/node": "^22",
|
|
54
|
+
"publint": "^0.3.0"
|
|
55
|
+
},
|
|
56
|
+
"scripts": {
|
|
57
|
+
"build": "vp pack",
|
|
58
|
+
"test": "vp test",
|
|
59
|
+
"typecheck": "tsc --noEmit",
|
|
60
|
+
"lint:publish": "trap 'rm -f /tmp/astilba-core-*.tgz' EXIT; pnpm pack --pack-destination /tmp >/dev/null && publint --strict /tmp/astilba-core-*.tgz && attw /tmp/astilba-core-*.tgz --profile esm-only"
|
|
61
|
+
}
|
|
62
|
+
}
|