fontisan 0.4.6 → 0.4.7
- checksums.yaml +4 -4
- data/BUG-stitcher-drops-isolated-cps.md +58 -0
- data/BUG-stitcher-drops-plane1-codepoints.md +310 -0
- data/BUG-stitcher-gid-cap-65535.md +110 -0
- data/CHANGELOG.md +106 -0
- data/README.adoc +121 -68
- data/benchmark/compile_benchmark.rb +70 -0
- data/docs/CFF2_SUPPORT.adoc +184 -0
- data/docs/STITCHER_GUIDE.adoc +151 -0
- data/docs/SVG_TO_GLYF.adoc +118 -0
- data/docs/UFO_COMPILATION.adoc +119 -0
- data/lib/fontisan/collection/writer.rb +5 -6
- data/lib/fontisan/error.rb +31 -0
- data/lib/fontisan/stitcher/deduplicator.rb +47 -0
- data/lib/fontisan/stitcher/glyph_limit.rb +53 -0
- data/lib/fontisan/stitcher/glyph_signature.rb +51 -0
- data/lib/fontisan/stitcher.rb +188 -167
- data/lib/fontisan/svg_to_glyf/assembler.rb +132 -0
- data/lib/fontisan/svg_to_glyf/document.rb +83 -0
- data/lib/fontisan/svg_to_glyf/geometry/affine_transform.rb +112 -0
- data/lib/fontisan/svg_to_glyf/geometry/normalizer.rb +45 -0
- data/lib/fontisan/svg_to_glyf/geometry/transform_parser.rb +91 -0
- data/lib/fontisan/svg_to_glyf/geometry.rb +13 -0
- data/lib/fontisan/svg_to_glyf/path/command.rb +18 -0
- data/lib/fontisan/svg_to_glyf/path/contour_builder.rb +140 -0
- data/lib/fontisan/svg_to_glyf/path/parser.rb +98 -0
- data/lib/fontisan/svg_to_glyf/path/state.rb +79 -0
- data/lib/fontisan/svg_to_glyf/path.rb +14 -0
- data/lib/fontisan/svg_to_glyf.rb +62 -0
- data/lib/fontisan/tables/cff/cff2_charstring_builder.rb +216 -0
- data/lib/fontisan/tables/cff.rb +1 -0
- data/lib/fontisan/tables/cff2/dict_encoder.rb +94 -0
- data/lib/fontisan/tables/cff2/fd_select.rb +69 -0
- data/lib/fontisan/tables/cff2/header.rb +34 -0
- data/lib/fontisan/tables/cff2/index_builder.rb +79 -0
- data/lib/fontisan/tables/cff2.rb +4 -0
- data/lib/fontisan/ufo/compile/cbdt_cblc.rb +103 -0
- data/lib/fontisan/ufo/compile/cff2.rb +181 -0
- data/lib/fontisan/ufo/compile/cff2_subroutines.rb +39 -0
- data/lib/fontisan/ufo/compile/colr.rb +80 -0
- data/lib/fontisan/ufo/compile/cpal.rb +61 -0
- data/lib/fontisan/ufo/compile/math.rb +143 -0
- data/lib/fontisan/ufo/compile/meta.rb +51 -0
- data/lib/fontisan/ufo/compile/otf2_compiler.rb +46 -0
- data/lib/fontisan/ufo/compile/sbix.rb +99 -0
- data/lib/fontisan/ufo/compile/svg_table.rb +60 -0
- data/lib/fontisan/ufo/compile/variable_otf.rb +75 -0
- data/lib/fontisan/ufo/compile.rb +11 -0
- data/lib/fontisan/version.rb +1 -1
- data/lib/fontisan.rb +3 -0
- metadata +41 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9957716f86dad58d8ff9b5f0bc49bc7044ab20f073bd26b5b87f09d8a5aa1cb1
+  data.tar.gz: 5ac5083c8db47532bc1331c82209c02bc6d8b7095b8f2c27c41fecced8230b32
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e1fa7608f49e922735d6b677456cbf4864ced9a32e3e771960229f685208c67be6079f9b1bfcfc6f393c83bf8d5b7e0e158c397b618080229a14dd274718724e
+  data.tar.gz: 730148059f3819b6dd49c1f8b86ab9d2e4d8508671b54b59cc240e638472b0da0dd5e5b7187f30b09bbd12d8e149a93903097068a45ac281224f2482464fa2ea
@@ -0,0 +1,58 @@
+# BUG: Stitcher drops isolated codepoints from NotoSansCuneiform (U+12399)
+
+## Status
+
+NEW — discovered 2026-07-01. Not yet addressed. Same class as the
+earlier Plane 1 drop bugs (BUG-stitcher-drops-plane1-codepoints.md)
+but at a different gid position (gid 925 of 1238).
+
+## Reproducer
+
+```ruby
+require "fontisan"
+
+src = Fontisan::FontLoader.load("NotoSansCuneiform-Regular.ttf")
+cmap = src.table("cmap").unicode_mappings
+# U+12399 → gid 925 (font has 1239 glyphs total)
+
+stitcher = Fontisan::Stitcher.new
+stitcher.add_source(:cuneiform, src)
+stitcher.include_notdef(from: :cuneiform)
+stitcher.include_codepoints([0x12399], from: :cuneiform)
+stitcher.write_to("/tmp/out.ttf", format: :ttf)
+
+out = Fontisan::FontLoader.load("/tmp/out.ttf")
+out.table("cmap").unicode_mappings.key?(0x12399)
+# => false ← SILENTLY DROPPED
+```
+
+## Source font
+
+- File: `NotoSansCuneiform-Regular.ttf` (essenfont repo)
+- num_glyphs: 1239
+- cmap size: 1238
+- U+12399 → gid 925 (well within range, NOT at the top end)
+
+## Other donors affected
+
+Same pattern may affect other donors with similar cmap topology.
+Worth running the full donor_audit + coverage gap analysis to
+identify all affected codepoints.
+
+## Impact
+
+essenfont output is missing U+12399 (Cuneiform) and possibly other
+codepoints from various donors. Each donor with a similar gid layout
+may lose 1-N codepoints silently.
+
+## Suggested investigation
+
+The O(1) per-glyph extraction path (commit `f07e788`) may still have
+edge cases. gid 925 isn't at the boundary of the font — the drop is
+not a "top-end gid" issue like Kedebideri. This is an isolated drop
+in the middle of the gid range.
+
+Check the glyph at gid 925 in the source — is it empty, a composite,
+or otherwise different from its neighbors? The `Convert::FromBinData`
+fix that addressed the `next unless simple` issue may still have
+gaps for certain glyph types.
@@ -0,0 +1,310 @@
+# BUG: Stitcher silently drops Plane 1 codepoints from some sources (and partial Plane 1 loss from others)
+
+## RESOLVED in fontisan 0.4.2 (PR #53)
+
+All four failure modes are fixed. Final status:
+
+| Failure | 0.4.0 | 0.4.1 | 0.4.2 |
+|---|---|---|---|
+| Lentariso TTF Plane 1 (Imperial Aramaic, Phoenician, Sidetic) | broken | fixed | **fixed** ✓ |
+| Kedebideri TTF top gids (Beria Erfe U+16EB5–B8, U+16ED0–D3) | broken | fixed | **fixed** ✓ |
+| Tangut OTF/CFF Plane 1 (NotoSerifTangut) | ok | broken | **fixed** ✓ |
+| CBDT passthrough (NotoColorEmoji) | ok | ok | **fixed** ✓ |
+
+### Root cause (PR #53)
+
+`Convert::FromBinData` had `next unless simple` which skipped empty
+glyphs, breaking gid → array index alignment. The O(1) per-glyph
+extraction introduced in 0.4.1 (`f07e788`) relied on that alignment
+and silently dropped cps whose gids mapped to skipped slots.
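
The alignment failure described in the root cause can be illustrated with a minimal sketch. It does not use fontisan's data structures: `SOURCE_GLYPHS`, `COMPACTED`, `ALIGNED`, and `glyph_at` are hypothetical names, and `nil` stands in for an empty glyph.

```ruby
# Hypothetical glyph table indexed by gid; nil marks an empty glyph
# (no outline), such as a spacing glyph.
SOURCE_GLYPHS = [:notdef, nil, :a, nil, :b, :c].freeze

# Skipping empty glyphs (the effect of `next unless simple`) shifts
# every later glyph to a lower array index.
COMPACTED = SOURCE_GLYPHS.compact

# Keeping a placeholder for empty glyphs preserves gid == array index.
ALIGNED = SOURCE_GLYPHS.map { |glyph| glyph || :empty }

# O(1) lookup by gid, as a per-glyph extraction path would perform.
def glyph_at(glyphs, gid)
  glyphs[gid]
end

glyph_at(COMPACTED, 5) # => nil (slot no longer exists, so the cp is dropped)
glyph_at(COMPACTED, 2) # => :b  (gid 2 was :a in the source)
glyph_at(ALIGNED, 5)   # => :c
```

With the compacted array, high gids fall off the end and lower gids resolve to the wrong glyph, which matches the "top-end" and "silent" character of the reported drops.
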
+
+PR #53 contains the following changes:
+
+1. **`Convert::FromBinData`**: removed `next unless simple` — always
+adds glyph to UFO (even if empty), maintaining gid → array index
+alignment. **Root cause fix for both the original TTF bug and the
+CFF regression.**
+2. **`Stitcher::Source#extract_cff_glyph_safe(gid)`**: for CFF
+sources, falls back to full UFO conversion (which now has no
+gaps).
+3. **Regression spec**: OTF/CFF Tangut test added to
+`spec/fontisan/stitcher/cmap_preservation_spec.rb` — verifies
+Tangut codepoints from `NotoSerifTangut.otf` survive in output.
+4. **GPOS kern writer** (TODO 08 partial): new `Compile::Gpos`
+module builds minimal GPOS table from UFO kerning pairs.
+
+### Consumer-side validation (essenfont)
+
+Build with fontisan 0.4.2 + the full donor manifest produces:
+
+- 65,535 glyphs / 45.9MB
+- 181,515 cmap entries (was 159,764 in issue #3, +21,751)
+- 59.75% Unicode 17 coverage (was 53.00%, +6.75 pp)
+- 53 blocks complete (was 49)
+
+All affected donors round-trip 100%:
+
+| Donor | Block | CMAP → output |
+|---|---|---|
+| Lentariso | Imperial Aramaic U+10840–1085F | 31/31 |
+| Lentariso | Phoenician U+10900–1091F | 29/29 |
+| Lentariso | Sidetic U+10940–1095F | 26/26 |
+| Kedebideri | Beria Erfe U+16EA0–16EDF | 50/50 |
+| NotoSerifTangut (OTF/CFF) | Tangut U+17000–187FF | 6136/6136 |
+| NotoSerifTangut | Tangut Components U+18800–18AFF | 768/768 |
+| NotoColorEmoji (CBDT) | Emoticons U+1F600–1F64F | 80/80 |
+
+The sections below are preserved for historical reference.
+
+---
+
+## Summary
+
+Two distinct failures in `Fontisan::Stitcher` (fontisan 0.4.0) when
+stitching codepoints from certain TTF/OTF sources into a target TTF:
+
+1. **Total Plane 1 drop** — for sources that have cmap entries in
+Plane 1 (U+10000..U+1FFFF), the Stitcher writes **none** of them
+into the output font, even though the donor's cmap loads
+correctly via `Fontisan::FontLoader.load`.
+2. **Partial upper-end drop** — for some sources, a small contiguous
+run of the highest-cp Plane 1 entries is dropped, while the rest
+of the block is preserved.
+
+
+Both failures are silent — no exception, no warning. The output font
+is missing the affected codepoints without any indication.
+
+## Reproducers
+
+### Bug 1 — Lentariso (total Plane 1 drop)
+
+```ruby
+require "fontisan"
+
+src = Fontisan::FontLoader.load("references/input-fonts/Lentariso-Regular.ttf")
+cmap = src.table("cmap")
+maps = cmap.unicode_mappings
+plane1_cps = maps.keys.select { |cp| cp >= 0x10000 && cp <= 0x1FFFF }
+# plane1_cps.size == 1934
+
+stitcher = Fontisan::Stitcher.new
+stitcher.add_source(:lentariso, src)
+stitcher.include_notdef(from: :lentariso)
+stitcher.include_codepoints(plane1_cps, from: :lentariso)
+stitcher.write_to("/tmp/out.ttf", format: :ttf)
+
+out = Fontisan::FontLoader.load("/tmp/out.ttf")
+out_plane1 = out.table("cmap").unicode_mappings.keys
+  .count { |cp| cp >= 0x10000 && cp <= 0x1FFFF }
+# out_plane1 == 0
+```
+
+Sub-ranges showing the same pattern:
+- U+10840..U+1085F (Imperial Aramaic): 31 cps in donor → **0 in output**
+- U+10900..U+1091F (Phoenician): 29 cps in donor → **0 in output**
+- U+10940..U+1095F (Sidetic): 26 cps in donor → **0 in output**
+
+Other donors with Plane 1 coverage work correctly:
+- `NewGardiner.ttf` (4,537 Plane 1 cps) → all 10/10 stitched in spot-check
+- `references/input-fonts/UniHieroglyphica.ttf` (Egyptian Hieroglyphs Ext-A) → all 4,000/4,000 stitched
+
+### Bug 2 — Kedebideri (partial upper-end drop)
+
+```ruby
+require "fontisan"
+
+src = Fontisan::FontLoader.load("references/input-fonts/Kedebideri-Regular.ttf")
+cmap = src.table("cmap")
+maps = cmap.unicode_mappings
+beria_erfe = maps.keys.select { |cp| cp >= 0x16EA0 && cp <= 0x16EDF }
+# beria_erfe.size == 50
+
+stitcher = Fontisan::Stitcher.new
+stitcher.add_source(:kedebideri, src)
+stitcher.include_notdef(from: :kedebideri)
+stitcher.include_codepoints(beria_erfe, from: :kedebideri)
+stitcher.write_to("/tmp/out.ttf", format: :ttf)
+
+out = Fontisan::FontLoader.load("/tmp/out.ttf")
+out_beria = out.table("cmap").unicode_mappings.keys
+  .select { |cp| cp >= 0x16EA0 && cp <= 0x16EDF }
+# out_beria.size == 42 (8 dropped)
+```
+
+The 8 dropped codepoints (always the same set):
+- `U+16EB5`, `U+16EB6`, `U+16EB7`, `U+16EB8` (consecutive)
+- `U+16ED0`, `U+16ED1`, `U+16ED2`, `U+16ED3` (consecutive)
+
+These all map to **gids 341–348** in the source font, which is the
+top end of the source's gid range (2..348). Dropped gids span the
+topmost consecutive range of the cmap; the gap at gid 340 in the
+source cmap (a missing glyph) precedes the first dropped range.
+
+## Expected
+
+All codepoints present in the source font's cmap should appear in the
+stitched output's cmap, with the same or remapped glyph references.
+
+## Actual
+
+- Lentariso: **0 / 1934** Plane 1 codepoints make it to output (100% loss)
+- Kedebideri: **42 / 50** Beria Erfe codepoints in output (16% loss, top-end)
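
The expected-versus-actual comparison above amounts to a set difference between the requested codepoints and the keys of the output cmap. The sketch below shows that check on synthetic data; `dropped_codepoints`, `REQUESTED`, and `OUTPUT_CMAP` are hypothetical names, and the hash stands in for whatever `unicode_mappings` returns.

```ruby
# Returns the requested codepoints that are missing from the output
# cmap, plus the loss as a percentage of the request.
def dropped_codepoints(requested, output_cmap)
  missing = requested.reject { |cp| output_cmap.key?(cp) }
  loss = requested.empty? ? 0.0 : (100.0 * missing.size / requested.size)
  [missing, loss.round(2)]
end

# Synthetic data shaped like the partial-drop case: 50 requested
# codepoints, of which only the first 42 reach the output cmap.
REQUESTED = (0x10000...(0x10000 + 50)).to_a
OUTPUT_CMAP = REQUESTED.first(42).each_with_index.to_h { |cp, i| [cp, i + 1] }

MISSING, LOSS = dropped_codepoints(REQUESTED, OUTPUT_CMAP)
# MISSING.size == 8, LOSS == 16.0
```

Running a check of this shape after every write would turn a silent drop into a visible number, independent of which internal path caused it.
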
+
+## Observed patterns
+
+- **Only Plane 1 is affected**. Plane 0 (BMP) codepoints from the
+same sources work fine. Plane 2+ codepoints (FSung-2, ~61,000 cps
+in CJK Extension B-E) work fine.
+- **The total-drop case (Lentariso) is consistent across multiple
+runs** — every Plane 1 cp is dropped, not just some.
+- **The partial-drop case (Kedebideri) drops exactly the topmost
+consecutive range of cmap gids** that correspond to Plane 1 entries
+whose gids are adjacent.
+- Other Plane 1 sources work fine (NewGardiner, UniHieroglyphica),
+so the bug is not a blanket "Plane 1 unsupported" — something about
+the cmap structure or font topology triggers it.
+
+## What's been ruled out
+
+- cmap gids exceeding `maxp.num_glyphs` — verified all source gids
+are within num_glyphs for both Lentariso (maxp=6063, cmap=5767)
+and Kedebideri (maxp=380, cmap=347).
+- File format / validation — both sources load without warning via
+`Fontisan::FontLoader.load`; their cmaps are introspectable and
+show all expected entries.
+- Lentariso Plane 0 cps work (e.g., U+0041 → 'A' round-trips).
+- FSung-2's Plane 2 cps (CJK Ext B, ~61k entries) work end-to-end.
+
+## Suggested investigation directions
+
+1. **cmap format detection** — sources may be reported as format 4
+(BMP-only) by fontisan's parser when they actually have a format
+12 subtable. If the Stitcher writes only format 4 to the output,
+all Plane 1 entries would be dropped.
+2. **`Stitcher::Source#bitmap_mode`** — added in fontisan 0.4 (per
+CHANGELOG entry for the recent Stitcher rewrite). If Lentariso is
+misclassified as `:cbdt` or `:none` despite being a pure `glyf`
+source, its cmap could be ignored or partially processed.
+3. **gid adjacency in cmap** — the Kedebideri drop (gids 341–348)
+sits at the topmost consecutive run of cmap gids. If the Stitcher
+has a cutoff based on "n gids before the cmap max gid", this
+could explain it. The exact mechanism is unclear without reading
+the Stitcher internals.
+4. **Lookback into Stitcher's `assemble_cmap`** — the function that
+merges source cmaps into the output cmap is the most likely site
+of the bug. Check whether it iterates source cmap entries
+sorted by glyph id or by codepoint, and whether it has any
+boundary conditions on either sort key.
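
Investigation direction 1 rests on a property of the `cmap` table: format 4 stores 16-bit character codes and so can hold only BMP entries, while format 12 stores 32-bit codes. The sketch below shows what a format-4-only writer would lose. It illustrates the hypothesis only (the resolved root cause stated at the top of this file is different), and `plane_of`, `required_cmap_formats`, and `lost_without_format12` are hypothetical helpers, not fontisan API.

```ruby
BMP_MAX = 0xFFFF

# Unicode plane of a codepoint: 0 for the BMP, 1 for U+10000..U+1FFFF.
def plane_of(codepoint)
  codepoint >> 16
end

# cmap subtable formats needed to represent every mapping.
# Format 4 holds 16-bit codes only; format 12 covers all planes.
def required_cmap_formats(mappings)
  formats = []
  formats << 4 if mappings.keys.any? { |cp| cp <= BMP_MAX }
  formats << 12 if mappings.keys.any? { |cp| cp > BMP_MAX }
  formats
end

# Entries that a writer emitting only format 4 would drop.
def lost_without_format12(mappings)
  mappings.reject { |cp, _gid| cp <= BMP_MAX }
end

SAMPLE = { 0x41 => 1, 0x10840 => 2, 0x20000 => 3 }.freeze
required_cmap_formats(SAMPLE) # => [4, 12]
lost_without_format12(SAMPLE) # => { 0x10840 => 2, 0x20000 => 3 }
```

Note that such a writer would drop Plane 2 entries as well, which is one reason the observed "only Plane 1 is affected" pattern pointed away from this explanation.
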
+
+## Impact
+
+- **essenfont** (this project's v0.1.0 output font) silently loses
+86+ codepoints in Sidetic / Imperial Aramaic / Phoenician, and 8
+in Beria Erfe. Coverage in `scripts/coverage_report.rb` shows the
+affected blocks as 0% (or partial), even though the source fonts
+contain valid glyphs.
+- Any downstream consumer of the Stitcher that combines Plane 1
+donors is affected.
+
+## Related
+
+- fontisan 0.4.0 added single-source CBDT/CBLC passthrough (commit
+`b612058`); this Stitcher bug may have been introduced in the
+same refactor. If so, the fix should land in 0.4.x.
+
+## Test fixtures
+
+The above reproducers use donor fonts committed to the essenfont
+repo at `references/input-fonts/`:
+- `Lentariso-Regular.ttf` (Bryndan W. Meyerholt, OFL, sha256 514ac442c9c5c361625da0755e84baaee36bbafe098138312f5285c3bd2fa0d3)
+- `Kedebideri-Regular.ttf` (SIL International, OFL, sha256 f95907a8c39c68d557c2264fcf593a858eb4751cd5e0c3c7c53e1ef354444064)
+
+Both files are also available from the upstream sources documented in
+`essenfont/references/input-fonts/ATTRIBUTIONS.md`.
+
+## Reporter
+
+essenfont project, issue #3 (https://github.com/fontist/essenfont/issues/3)
+
+## Regression introduced by the fix (2026-06-30)
+
+The `feature/stitcher-perf` branch fixes the Lentariso + Kedebideri
+bugs but introduces a **new** silent Plane 1 drop for OTF/CFF sources.
+
+### New reproducer
+
+```ruby
+require "fontisan"
+
+# NotoSerifTangut is OTF (CFF outlines, no glyf table)
+src = Fontisan::FontLoader.load("NotoSerifTangut-Regular.otf")
+cmap = src.table("cmap")
+maps = cmap.unicode_mappings
+tangut = maps.keys.select { |cp| cp >= 0x17000 && cp <= 0x187FF }
+# tangut.size == 6136
+
+stitcher = Fontisan::Stitcher.new
+stitcher.add_source(:tangut, src)
+stitcher.include_notdef(from: :tangut)
+stitcher.include_codepoints(tangut.first(10), from: :tangut)
+stitcher.write_to("/tmp/out.ttf", format: :ttf)
+
+out = Fontisan::FontLoader.load("/tmp/out.ttf")
+outmaps = out.table("cmap").unicode_mappings
+present = tangut.first(10).count { |cp| outmaps.key?(cp) }
+# present == 0 ← ALL DROPPED
+```
+
+Source has tables: `["CFF ", "GDEF", "GPOS", "GSUB", "OS/2", "VORG",
+"cmap", "head", "hhea", "hmtx", "maxp", "name", "post", "vhea",
+"vmtx"]`. Note `CFF ` (CFF outlines), no `glyf`.
+
+### Affected donors (essenfont manifest)
+
+| Donor | Plane 1 cps | In output | Loss |
+|---|---|---|---|
+| **noto-serif-tangut** (OTF/CFF) | 6916 | **0** | **100%** ← full regression |
+| noto-sans (TTF) | 88 | 84 | 4 cps |
+| noto-sans-sharada (TTF) | 96 | 95 | 1 cp |
+| noto-sans-math (TTF) | 1228 | 1200 | 28 cps |
+| noto-music (TTF) | 549 | 537 | 12 cps |
+| noto-sans-symbols-2 (TTF) | 1608 | 1596 | 12 cps |
+| lentariso (TTF) | 1934 | 1930 | 4 cps |
+| egyptian-text (TTF) | 1131 | 1118 | 13 cps |
+| uni-hieroglyphica (TTF) | 5108 | 5095 | 13 cps |
+| new-gardiner (TTF) | 4537 | 4524 | 13 cps |
+
+**Total: 6,916 Tangut cps + ~100 cps across 9 TTF donors = ~7,000 cps lost.**
+
+### Suggested cause
+
+The `b3e52af` commit ("stitcher: O(1) per-glyph extraction instead
+of O(n) full-donor conversion") likely changed the per-glyph
+extraction path to assume `glyf` outlines. For OTF/CFF sources,
+extraction silently fails and the cp is dropped from the output cmap.
+
+The smaller losses (~1-3% per TTF donor) may be a separate issue —
+perhaps an off-by-one or boundary condition in the new
+extraction loop. Worth grepping the new spec files for boundary
+test cases.
+
+### Net impact on essenfont
+
+- Before fix (0.4.0): 181,354 cmap entries, 59.69% Unicode 17 coverage
+- After fix (feature/stitcher-perf): 174,599 entries, 57.47% coverage
+- **Net change: −6,755 entries, −2.22 percentage points**
+
+The fix trades Lentariso + Kedebideri coverage (94 cps regained) for
+Noto Serif Tangut coverage (6,916 cps lost). Net negative.
+
+### Recommended next step
+
+The Stitcher's per-glyph extraction needs to handle:
+- TTF / `glyf` sources (current path)
+- OTF / `CFF ` sources (currently broken — likely the regression)
+- CBDT/CBLC sources (works in 0.4.0+ for single-source passthrough)
+
+All three glyph-storage modes should round-trip identically. A
+parametric spec covering each mode for the same test codepoint
+would prevent this class of regression.
@@ -0,0 +1,110 @@
+# BUG: Stitcher repair pass drops 3,004 cmap entries (gid remap)
+
+## Status
+
+NEW — discovered 2026-07-01 with fontisan 0.4.6. Introduced by the
+repair pass added alongside the compound-glyph flatten fix
+(commit `73820f1`). Not yet addressed.
+
+## Summary
+
+The build output's post-write repair step reports:
+
+```
+repairing: 3004 cmap entries pointed to non-existent gids (max gid = 65534)
+repaired: 132623 valid cmap entries retained
+```
+
+3,004 cmap entries that were correctly assembled by the Stitcher
+are silently dropped because the output font's gid space caps at
+65,534 and the repair step removes any cmap entry whose gid exceeds
+that limit.
+
+## Reproducer
+
+Build essenfont with the full donor manifest (30+ donors, ~160k
+codepoints in the union). The build output shows:
+
+```
+158247/158247 codepoints stitched
+=== Writing Essenfont-Regular.ttf ===
+repairing: 3004 cmap entries pointed to non-existent gids (max gid = 65534)
+repaired: 132623 valid cmap entries retained
+```
+
+3,004 cps are dropped, reducing coverage from expected ~96% to ~82%.
+
+## Root cause
+
+The output TTF has `maxp.num_glyphs = 65535` (the 16-bit maximum).
+The Stitcher assembles glyphs from 30+ donors whose combined glyph
+count exceeds 65,535. When the output font is written:
+
+1. Glyphs are assigned gids sequentially from 0 to ~160k.
+2. `maxp.num_glyphs` is clamped to 65,535 (16-bit limit).
+3. Any cmap entry whose gid ≥ 65,535 is now invalid (points past
+the end of the glyph array).
+4. The repair pass removes those entries.
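
The four steps above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the Stitcher's write path: it assumes one glyph per codepoint plus `.notdef` at gid 0, and `simulate_write` is a hypothetical helper.

```ruby
MAX_GLYPHS = 65_535 # largest value a uint16 glyph-count field can hold

# Assigns gids sequentially, then mimics a writer that clamps the glyph
# count and a repair pass that drops cmap entries pointing past the end.
def simulate_write(codepoints)
  # Step 1: sequential gids; gid 0 is reserved for .notdef.
  cmap = codepoints.each_with_index.to_h { |cp, i| [cp, i + 1] }
  # Step 2: the glyph count is clamped to the 16-bit limit.
  num_glyphs = [codepoints.size + 1, MAX_GLYPHS].min
  # Steps 3 and 4: entries with gid >= num_glyphs are invalid and removed.
  valid, invalid = cmap.partition { |_cp, gid| gid < num_glyphs }
  { num_glyphs: num_glyphs, retained: valid.size, dropped: invalid.size }
end

RESULT = simulate_write((0x20000...(0x20000 + 70_000)).to_a)
# RESULT == { num_glyphs: 65535, retained: 65534, dropped: 4466 }
```

The highest surviving gid is 65,534, which matches the `max gid = 65534` figure in the repair log quoted earlier.
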
+
+The cap of 65,535 is a hard limit of the TrueType `maxp` table
+(`numGlyphs` is uint16). Fonts with more than 65,535
+glyphs require CFF (OTF) format or `maxp` version 1.5 with the
+extended glyph range — but fontisan doesn't use either.
+
+## What's lost
+
+3,004 cmap entries are dropped. These fall into two groups:
+
+1. **Large CJK donors** (FSung-2, NotoSansKR, NotoSansNushu):
+codepoints assigned to gids > 65,534 are silently dropped.
+Affects ~1,000 cps from Plane 2 CJK (CJK Ext I, Compat Supp)
++ ~800 cps from Hangul + ~200 cps from Nushu.
+
+2. **Synthetic SVG donors** (Khitan, Tulu-Tigalari, etc.):
+codepoints from late-added synthetic donors get assigned gids
+beyond 65,534. Affects ~1,000 cps from chart-extracted glyphs.
+
+## Suggested fixes
+
+### Option A: Switch to CFF (OTF) output
+
+CFF fonts have no 65,535 glyph cap. fontisan's `OtfCompiler` can
+write OTF output. The build would produce `Essenfont-Regular.otf`
+instead of `.ttf`. This is the cleanest fix but changes the output
+format.
+
+### Option B: Glyph deduplication
+
+Many donor glyphs are identical (e.g., the .notdef glyph appears
+in every donor). Deduplicating these before writing would reduce
+the glyph count below 65,535. Estimated savings: ~10,000 duplicate
+.notdef + whitespace glyphs.
+
+### Option C: Subfont splitting
+
+Split the output into multiple TTF files by Unicode plane (Plane 0,
+Plane 1, Plane 2, etc.) with a TTC (TrueType Collection) wrapper.
+Each subfont stays under 65,535 glyphs.
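
Option C can be sketched as a grouping of cmap entries by Unicode plane. The code below is an illustration only: `split_by_plane` and `fits?` are hypothetical helpers, and the size check assumes one glyph per codepoint plus `.notdef`, which real donors may not satisfy.

```ruby
MAX_GLYPHS_PER_FONT = 65_535

# Groups cmap entries (codepoint => gid) by Unicode plane.
def split_by_plane(mappings)
  mappings.group_by { |cp, _gid| cp >> 16 }
          .transform_values(&:to_h)
end

# True when every group fits in one font, leaving one slot for .notdef.
def fits?(groups)
  groups.values.all? { |m| m.size + 1 <= MAX_GLYPHS_PER_FONT }
end

MAPPINGS = { 0x41 => 1, 0x10840 => 2, 0x10900 => 3, 0x20000 => 4 }.freeze
GROUPS = split_by_plane(MAPPINGS)
# GROUPS.keys == [0, 1, 2]
# GROUPS[1]   == { 0x10840 => 2, 0x10900 => 3 }
```

A plane with more than 65,534 mapped glyphs would still fail `fits?`, so plane boundaries alone do not guarantee a valid split.
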
+
+### Option D: Remove non-contributing donors
+
+FSung-X (38,126 Plane 16 PUA cps) currently contributes zero
+useful codepoints to the output. Removing it from the build would
+reduce the glyph count by ~38,000, well within the 65,535 cap.
+
+Similarly, NotoSansKR's variable font has 23,174 cps but only
+11,172 are Hangul — the rest are duplicated by other donors.
+Subsetting KR to just Hangul would save ~12,000 glyphs.
+
+## Impact
+
+Coverage drops from expected ~96% to actual ~82.76%. The 3,004
+lost cps span 8+ Unicode blocks that have valid glyphs in the
+donor fonts but are silently dropped at write time.
+
+## References
+
+- Build log with repair message: `/tmp/build-svg6.log`
+- Affected blocks: CJK Ext I, CJK Compat Supp, Nushu,
+Tulu-Tigalari, Khitan Small Script (partial), Hangul (partial)
+- Consumer: essenfont issue #3
data/CHANGELOG.md
CHANGED
@@ -9,6 +9,112 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- `Stitcher` explicit subfont declaration model: every `include_*`
+method accepts an `into:` keyword that names the target subfont.
+The user controls collection structure upfront rather than relying
+on after-the-fact splitting. Backward-compatible: without `into:`,
+bindings route to `:default` (single-font behavior unchanged).
+- `Stitcher#write_collection(path, format:)` — writes all declared
+subfonts as a TTC/OTC with table sharing via the existing
+`Collection::Builder`. Each subfont compiled to TTF/OTF/CFF2 per
+the `format:` argument. Collection format auto-selected: `:ttf` →
+TTC, `:otf`/`:otf2` → OTC.
+- `Stitcher#subfonts` — hash of name → bindings for inspection.
+- `Stitcher#subfont_names` — declared subfont names in order.
+- `Stitcher#write_to(path, format:, subfont:)` — writes a specific
+named subfont as a single file (default: `:default`).
+
+### Architecture note
+
+The previous design planned an after-the-fact "Splitter" that would
+break bindings into plane-based groups at write time. This was
+replaced with **explicit subfont declaration**: the user decides
+which codepoints go into which subfont, and the Stitcher serializes
+that declared structure. This is model-driven (the subfont
+assignment IS the model) and editorially honest (collection
+structure is an editorial decision, not an algorithmic one).
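
The declaration model described above can be pictured as a registry keyed by subfont name, with `:default` used when `into:` is omitted. The class below is a hypothetical stand-in written for illustration. It borrows the keyword names from the changelog entries but is not fontisan's `Stitcher`, whose internals are not shown in this diff.

```ruby
# Hypothetical stand-in for the declaration model; not fontisan's Stitcher.
class SubfontPlan
  DEFAULT = :default

  def initialize
    # name => array of bindings; Hash preserves declaration order.
    @subfonts = {}
  end

  # Records codepoints from a source under the named subfont.
  # Without `into:`, the binding goes to the :default subfont.
  def include_codepoints(codepoints, from:, into: DEFAULT)
    (@subfonts[into] ||= []) << { source: from, codepoints: codepoints }
    self
  end

  attr_reader :subfonts

  def subfont_names
    @subfonts.keys
  end
end

PLAN = SubfontPlan.new
PLAN.include_codepoints([0x41, 0x42], from: :latin)
PLAN.include_codepoints([0x10900], from: :phoenician, into: :plane1)
PLAN.include_codepoints([0x10840], from: :aramaic, into: :plane1)
PLAN.subfont_names # => [:default, :plane1]
```

Because the caller names the target at declaration time, no later pass has to guess where a codepoint belongs, which is the point the architecture note makes.
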
+
+### Added
+
+- `Fontisan::Tables::Cff2::Header` — CFF2 5-byte header builder
+(majorVersion=2, minorVersion=0, headerSize=5, topDictSize).
+- `Fontisan::Tables::Cff2::IndexBuilder` — CFF2 INDEX builder with
+uint32 count (vs card16 in CFF1). Supports > 65,535 entries in the
+INDEX structure itself.
+- `Fontisan::Tables::Cff2::DictEncoder` — encodes CFF DICT operands
+(integers + BCD reals) and operators (1-byte and 2-byte escape).
+- `Fontisan::Ufo::Compile::Cff2` — from-scratch CFF2 table builder
+for UFO glyphs. Produces: Header + Top DICT (CharStrings + FontDICT
+offsets) + Global Subr INDEX + CharStrings INDEX + FontDICT INDEX
+(wrapping one FontDICT with empty PrivateDICT reference).
+- `Fontisan::Ufo::Compile::Otf2Compiler` — compiles UFO → OTF with
+CFF2 outlines (table tag `CFF2` instead of `CFF `). Same OTTO sfnt
+signature as CFF1.
+- `Stitcher#write_to` now accepts `format: :otf2` for CFF2 output.
+- `GlyphLimit` recognizes `:otf2` format.
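
The header and INDEX layouts named in the entries above are small enough to show as byte packing. The sketch follows the CFF2 layout as described in the OpenType specification (three uint8 fields plus a uint16 Top DICT length; an INDEX with a uint32 count, a one-byte offset size, and 1-based offsets). `cff2_header` and `cff2_index` are hypothetical function names, not the API of `Cff2::Header` or `Cff2::IndexBuilder`.

```ruby
# CFF2 header: majorVersion, minorVersion, headerSize (uint8 each),
# then topDictLength (uint16, big-endian); 5 bytes in total.
def cff2_header(top_dict_length)
  [2, 0, 5, top_dict_length].pack("CCCn")
end

# CFF2 INDEX: uint32 count, uint8 offSize, (count + 1) offsets, data.
# Offsets are 1-based. An empty INDEX is only the 4-byte count.
def cff2_index(items)
  return [0].pack("N") if items.empty?

  offsets = [1]
  items.each { |item| offsets << offsets.last + item.bytesize }
  # Smallest byte width (1..4) that can hold the final offset.
  off_size = (1..4).find { |n| offsets.last < (1 << (8 * n)) }

  # Encode each offset as big-endian and keep the low off_size bytes.
  encoded = offsets.map { |o| [o].pack("N")[-off_size, off_size] }.join
  [items.size].pack("N") + [off_size].pack("C") + encoded + items.map(&:b).join
end

cff2_header(12).bytes           # => [2, 0, 5, 0, 12]
cff2_index(["ab", "c"]).bytes   # => [0, 0, 0, 2, 1, 1, 3, 4, 97, 98, 99]
```

The uint32 count is what lets the INDEX itself hold more than 65,535 entries; whether a font may use that many glyphs is a separate constraint.
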
+
+### Note on glyph cap
+
+CFF2 does **not** bypass the 65,535 glyph cap. Per the OpenType spec,
+the CFF2 CharStrings INDEX count must match `maxp.numGlyphs`, which is
+uint16 in all font versions. For > 65,535 glyphs, TTC (TrueType
+Collection) splitting is required. CFF2's value lies in variable font
+support (`blend`/`vsindex` operators + VariationStore), CID-keyed
+fonts (FDSelect), and improved subroutinization — not glyph count.
+
+### Added
+
+- `Fontisan::Stitcher::GlyphSignature` — deterministic SHA-256 signature
+of a glyph's outline identity (advance width + contours + components).
+Used to detect visually identical glyphs from different donors.
+- `Fontisan::Stitcher::Deduplicator` — registry mapping signatures to
+canonical glyph names, enabling signature-based deduplication during
+Stitcher assembly. Merges identical outlines from different donors
+into a single gid, reducing the glyph count.
+- `Fontisan::Stitcher::GlyphLimit` — format-specific glyph-count caps
+(TTF: 65,535; OTF: unlimited) and enforcement via `check!`.
+- `Fontisan::GlyphLimitExceededError` — raised when the Stitcher's
+output exceeds the format's glyph cap, with actionable guidance
+(switch to OTF, reduce donors, split into TTC).
+- `Stitcher.new(deduplicate: true)` — signature-based dedup is now the
+default; pass `deduplicate: false` to disable.
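
Signature-based deduplication, as described in the entries above, can be sketched with Ruby's standard `Digest::SHA256`. The `Glyph` struct, `glyph_signature`, and `deduplicate` are hypothetical and simplified (components are omitted, and the serialization is just `inspect`); the real `GlyphSignature` and `Deduplicator` may differ in what they hash and how.

```ruby
require "digest"

# Hypothetical glyph record: advance width plus contours, each contour
# an array of [x, y] points.
Glyph = Struct.new(:name, :advance, :contours)

# Deterministic signature of the outline identity. The name is left out
# so identical outlines from different donors produce the same digest.
def glyph_signature(glyph)
  Digest::SHA256.hexdigest([glyph.advance, glyph.contours].inspect)
end

# Maps each glyph name to the first-seen name with the same signature.
def deduplicate(glyphs)
  canonical = {}
  glyphs.each_with_object({}) do |glyph, remap|
    sig = glyph_signature(glyph)
    canonical[sig] ||= glyph.name
    remap[glyph.name] = canonical[sig]
  end
end

GLYPHS = [
  Glyph.new("space.a", 250, []),
  Glyph.new("A", 600, [[[0, 0], [300, 700], [600, 0]]]),
  Glyph.new("space.b", 250, [])
].freeze

REMAP = deduplicate(GLYPHS)
# REMAP == { "space.a" => "space.a", "A" => "A", "space.b" => "space.a" }
```

Here three input glyphs collapse to two distinct outlines, which is the mechanism by which deduplication lowers the glyph count before the cap is checked.
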
+
+### Fixed
+
+- Stitcher no longer silently produces an invalid TTF (or OTF) when
+the glyph count exceeds 65,535. Both TTF and OTF (CFF1) cap at
+65,535 glyphs because `maxp.num_glyphs` is uint16 and the CFF1
+CharStrings INDEX count is card16. The cap is now enforced BEFORE
+writing, and signature-based deduplication merges identical outlines
+to reduce the count. When dedup alone isn't enough,
+`GlyphLimitExceededError` is raised with actionable options
+(split into TTC, reduce donors, wait for CFF2) instead of the
+previous behavior (silent truncation + dropped cmap entries at
+the repair pass).
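
Enforcing the cap before writing can be sketched as a guard that raises instead of truncating. The error class, the `GLYPH_CAPS` table, and `check_glyph_limit!` below are hypothetical; they follow the caps stated in this entry (65,535 for both TTF and CFF1-based OTF) and are not the signatures of fontisan's `GlyphLimit.check!` or `GlyphLimitExceededError`.

```ruby
# Hypothetical error; fontisan's GlyphLimitExceededError may differ.
class GlyphLimitExceeded < StandardError; end

# Caps as stated in the Fixed entry: both formats are bound by uint16.
GLYPH_CAPS = { ttf: 65_535, otf: 65_535 }.freeze

# Returns the count when it fits; raises before anything is written.
def check_glyph_limit!(glyph_count, format:)
  cap = GLYPH_CAPS.fetch(format)
  return glyph_count if glyph_count <= cap

  raise GlyphLimitExceeded,
        "#{glyph_count} glyphs exceed the #{format} cap of #{cap}; " \
        "split into a collection or reduce donors"
end

check_glyph_limit!(65_535, format: :ttf) # => 65535
```

Calling the guard after deduplication and before serialization gives the behavior the entry describes: the build fails loudly at the limit rather than losing cmap entries later.
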
+
+### Added (previous)
+
+- `Fontisan::SvgToGlyf` — converts SVG path data (from ucode code-chart
+extraction) into `Ufo::Glyph` objects that feed directly into the
+existing Stitcher + TtfCompiler pipeline. The converter handles SVG
+path commands (M/L/H/V/C/S/Q/T/Z), relative and absolute coordinates,
+smooth-curve reflection, SVG `<g transform>` accumulation, viewBox
+coordinate normalization, and Y-axis flipping. Cubic-to-quadratic
+conversion and contour winding correction are handled automatically
+by the existing `Ufo::Compile::Filters` when compiling to TTF.
+- `Fontisan::SvgToGlyf::Geometry::AffineTransform` — 2×3 matrix with
+compose/apply, used uniformly for all coordinate operations.
+- `Fontisan::SvgToGlyf::Geometry::TransformParser` — parses SVG
+`transform="..."` attribute (translate, scale, rotate, matrix, skew).
+- `Fontisan::SvgToGlyf::Geometry::Normalizer` — composes the viewBox →
+font UPM normalization (Y-flip + scale) with group transforms.
+- `Fontisan::SvgToGlyf::Path::Parser` — tokenizes and parses SVG path
+`d` strings into typed Command objects with implicit-repetition support.
+- `Fontisan::SvgToGlyf::Path::ContourBuilder` — converts commands to
+`Ufo::Contour` objects, tracking current-point, subpath-start, and
+control-point state for smooth-curve reflection.
+- `Fontisan::SvgToGlyf::Document` — walks SVG XML (via Nokogiri),
+accumulating ancestor `<g>` transforms per `<path>`.
 - `Fontisan::Ufo::Compile::Avar` — builds the OpenType `avar` (Axis
 Variation) table with per-axis non-linear maps (defaults to identity
 -1/0/1 mapping).