fontisan 0.4.6 → 0.4.7

Files changed (51)
  1. checksums.yaml +4 -4
  2. data/BUG-stitcher-drops-isolated-cps.md +58 -0
  3. data/BUG-stitcher-drops-plane1-codepoints.md +310 -0
  4. data/BUG-stitcher-gid-cap-65535.md +110 -0
  5. data/CHANGELOG.md +106 -0
  6. data/README.adoc +121 -68
  7. data/benchmark/compile_benchmark.rb +70 -0
  8. data/docs/CFF2_SUPPORT.adoc +184 -0
  9. data/docs/STITCHER_GUIDE.adoc +151 -0
  10. data/docs/SVG_TO_GLYF.adoc +118 -0
  11. data/docs/UFO_COMPILATION.adoc +119 -0
  12. data/lib/fontisan/collection/writer.rb +5 -6
  13. data/lib/fontisan/error.rb +31 -0
  14. data/lib/fontisan/stitcher/deduplicator.rb +47 -0
  15. data/lib/fontisan/stitcher/glyph_limit.rb +53 -0
  16. data/lib/fontisan/stitcher/glyph_signature.rb +51 -0
  17. data/lib/fontisan/stitcher.rb +188 -167
  18. data/lib/fontisan/svg_to_glyf/assembler.rb +132 -0
  19. data/lib/fontisan/svg_to_glyf/document.rb +83 -0
  20. data/lib/fontisan/svg_to_glyf/geometry/affine_transform.rb +112 -0
  21. data/lib/fontisan/svg_to_glyf/geometry/normalizer.rb +45 -0
  22. data/lib/fontisan/svg_to_glyf/geometry/transform_parser.rb +91 -0
  23. data/lib/fontisan/svg_to_glyf/geometry.rb +13 -0
  24. data/lib/fontisan/svg_to_glyf/path/command.rb +18 -0
  25. data/lib/fontisan/svg_to_glyf/path/contour_builder.rb +140 -0
  26. data/lib/fontisan/svg_to_glyf/path/parser.rb +98 -0
  27. data/lib/fontisan/svg_to_glyf/path/state.rb +79 -0
  28. data/lib/fontisan/svg_to_glyf/path.rb +14 -0
  29. data/lib/fontisan/svg_to_glyf.rb +62 -0
  30. data/lib/fontisan/tables/cff/cff2_charstring_builder.rb +216 -0
  31. data/lib/fontisan/tables/cff.rb +1 -0
  32. data/lib/fontisan/tables/cff2/dict_encoder.rb +94 -0
  33. data/lib/fontisan/tables/cff2/fd_select.rb +69 -0
  34. data/lib/fontisan/tables/cff2/header.rb +34 -0
  35. data/lib/fontisan/tables/cff2/index_builder.rb +79 -0
  36. data/lib/fontisan/tables/cff2.rb +4 -0
  37. data/lib/fontisan/ufo/compile/cbdt_cblc.rb +103 -0
  38. data/lib/fontisan/ufo/compile/cff2.rb +181 -0
  39. data/lib/fontisan/ufo/compile/cff2_subroutines.rb +39 -0
  40. data/lib/fontisan/ufo/compile/colr.rb +80 -0
  41. data/lib/fontisan/ufo/compile/cpal.rb +61 -0
  42. data/lib/fontisan/ufo/compile/math.rb +143 -0
  43. data/lib/fontisan/ufo/compile/meta.rb +51 -0
  44. data/lib/fontisan/ufo/compile/otf2_compiler.rb +46 -0
  45. data/lib/fontisan/ufo/compile/sbix.rb +99 -0
  46. data/lib/fontisan/ufo/compile/svg_table.rb +60 -0
  47. data/lib/fontisan/ufo/compile/variable_otf.rb +75 -0
  48. data/lib/fontisan/ufo/compile.rb +11 -0
  49. data/lib/fontisan/version.rb +1 -1
  50. data/lib/fontisan.rb +3 -0
  51. metadata +41 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 519952f147a10de9f78f3e30eb4456a077f35cb24099bb65781e7b1c2c4e59a3
4
- data.tar.gz: c37e0f9d37bf0512cd7c0ed9be19e0eac5a9c74eb057c951d1a2dba4237c5f11
3
+ metadata.gz: 9957716f86dad58d8ff9b5f0bc49bc7044ab20f073bd26b5b87f09d8a5aa1cb1
4
+ data.tar.gz: 5ac5083c8db47532bc1331c82209c02bc6d8b7095b8f2c27c41fecced8230b32
5
5
  SHA512:
6
- metadata.gz: c5702491e2326da5f357b5c89904896763c0f57dc807022b11b1d203ab1b4158372595c5059f0fff3143631323048a543e9b2f3046ded97e868d5cf105e518cc
7
- data.tar.gz: 2f9f010ab0079fedfec7932313231bf7d002b2fed7af6476c459918e92e6d9c01300dac46594b64d16874b95924026bcd1a9779fe53ae1d2ea65240464256616
6
+ metadata.gz: e1fa7608f49e922735d6b677456cbf4864ced9a32e3e771960229f685208c67be6079f9b1bfcfc6f393c83bf8d5b7e0e158c397b618080229a14dd274718724e
7
+ data.tar.gz: 730148059f3819b6dd49c1f8b86ab9d2e4d8508671b54b59cc240e638472b0da0dd5e5b7187f30b09bbd12d8e149a93903097068a45ac281224f2482464fa2ea
data/BUG-stitcher-drops-isolated-cps.md ADDED
@@ -0,0 +1,58 @@
1
+ # BUG: Stitcher drops isolated codepoints from NotoSansCuneiform (U+12399)
2
+
3
+ ## Status
4
+
5
+ NEW — discovered 2026-07-01. Not yet addressed. Same class as the
6
+ earlier Plane 1 drop bugs (BUG-stitcher-drops-plane1-codepoints.md)
7
+ but at a different gid position (gid 925 of 1239 glyphs).
8
+
9
+ ## Reproducer
10
+
11
+ ```ruby
12
+ require "fontisan"
13
+
14
+ src = Fontisan::FontLoader.load("NotoSansCuneiform-Regular.ttf")
15
+ cmap = src.table("cmap").unicode_mappings
16
+ # U+12399 → gid 925 (font has 1239 glyphs total)
17
+
18
+ stitcher = Fontisan::Stitcher.new
19
+ stitcher.add_source(:cuneiform, src)
20
+ stitcher.include_notdef(from: :cuneiform)
21
+ stitcher.include_codepoints([0x12399], from: :cuneiform)
22
+ stitcher.write_to("/tmp/out.ttf", format: :ttf)
23
+
24
+ out = Fontisan::FontLoader.load("/tmp/out.ttf")
25
+ out.table("cmap").unicode_mappings.key?(0x12399)
26
+ # => false ← SILENTLY DROPPED
27
+ ```
28
+
29
+ ## Source font
30
+
31
+ - File: `NotoSansCuneiform-Regular.ttf` (essenfont repo)
32
+ - num_glyphs: 1239
33
+ - cmap size: 1238
34
+ - U+12399 → gid 925 (well within range, NOT at the top end)
35
+
36
+ ## Other donors affected
37
+
38
+ Same pattern may affect other donors with similar cmap topology.
39
+ Worth running the full donor_audit + coverage gap analysis to
40
+ identify all affected codepoints.
41
+
42
+ ## Impact
43
+
44
+ essenfont output is missing U+12399 (Cuneiform) and possibly other
45
+ codepoints from various donors. Each donor with a similar gid layout
46
+ may lose 1-N codepoints silently.
47
+
48
+ ## Suggested investigation
49
+
50
+ The O(1) per-glyph extraction path (commit `f07e788`) may still have
51
+ edge cases. gid 925 isn't at the boundary of the font — the drop is
52
+ not a "top-end gid" issue like Kedebideri. This is an isolated drop
53
+ in the middle of the gid range.
54
+
55
+ Check the glyph at gid 925 in the source — is it empty, a composite,
56
+ or otherwise different from its neighbors? The `Convert::FromBinData`
57
+ fix that addressed the `next unless simple` issue may still have
58
+ gaps for certain glyph types.
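
A minimal sketch of the check the reproducer performs, written against plain Hashes so it runs without any fonts. The helper names `dropped_codepoints` and `format_codepoints` are hypothetical and not part of fontisan; with real fonts, the Hash would come from `out.table("cmap").unicode_mappings` as in the reproducer.

```ruby
# Hypothetical helper, not part of fontisan: list the requested codepoints
# that are absent from an output cmap (a Hash of codepoint => gid).
def dropped_codepoints(requested, output_mappings)
  requested.reject { |cp| output_mappings.key?(cp) }
end

# Render codepoints in the usual U+XXXX notation.
def format_codepoints(codepoints)
  codepoints.map { |cp| format("U+%04X", cp) }
end

# Stand-in data: U+12399 was requested but is missing from the output.
requested = [0x12398, 0x12399, 0x1239A]
output_mappings = { 0x12398 => 1, 0x1239A => 2 }

missing = dropped_codepoints(requested, output_mappings)
puts format_codepoints(missing).join(", ")
```

Running a check like this over every donor's requested codepoints would turn a silent drop into a visible list.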
data/BUG-stitcher-drops-plane1-codepoints.md ADDED
@@ -0,0 +1,310 @@
1
+ # BUG: Stitcher silently drops Plane 1 codepoints from some sources (and partial Plane 1 loss from others)
2
+
3
+ ## RESOLVED in fontisan 0.4.2 (PR #53)
4
+
5
+ All four failure modes are fixed. Final status:
6
+
7
+ | Failure | 0.4.0 | 0.4.1 | 0.4.2 |
8
+ |---|---|---|---|
9
+ | Lentariso TTF Plane 1 (Imperial Aramaic, Phoenician, Sidetic) | broken | fixed | **fixed** ✓ |
10
+ | Kedebideri TTF top gids (Beria Erfe U+16EB5–B8, U+16ED0–D3) | broken | fixed | **fixed** ✓ |
11
+ | Tangut OTF/CFF Plane 1 (NotoSerifTangut) | ok | broken | **fixed** ✓ |
12
+ | CBDT passthrough (NotoColorEmoji) | ok | ok | **fixed** ✓ |
13
+
14
+ ### Root cause (PR #53)
15
+
16
+ `Convert::FromBinData` had `next unless simple` which skipped empty
17
+ glyphs, breaking gid → array index alignment. The O(1) per-glyph
18
+ extraction introduced in 0.4.1 (`f07e788`) relied on that alignment
19
+ and silently dropped cps whose gids mapped to skipped slots.
20
+
21
+ PR #53 contains the following changes:
22
+
23
+ 1. **`Convert::FromBinData`**: removed `next unless simple` — always
24
+ adds glyph to UFO (even if empty), maintaining gid → array index
25
+ alignment. **Root cause fix for both the original TTF bug and the
26
+ CFF regression.**
27
+ 2. **`Stitcher::Source#extract_cff_glyph_safe(gid)`**: for CFF
28
+ sources, falls back to full UFO conversion (which now has no
29
+ gaps).
30
+ 3. **Regression spec**: OTF/CFF Tangut test added to
31
+ `spec/fontisan/stitcher/cmap_preservation_spec.rb` — verifies
32
+ Tangut codepoints from `NotoSerifTangut.otf` survive in output.
33
+ 4. **GPOS kern writer** (TODO 08 partial): new `Compile::Gpos`
34
+ module builds minimal GPOS table from UFO kerning pairs.
35
+
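
The alignment problem described under "Root cause" can be illustrated with a toy model. This is a sketch under stated assumptions: glyphs are plain Hashes and the two converter methods are hypothetical stand-ins, not the `Convert::FromBinData` code.

```ruby
# Toy source font: the gid is the glyph's index in this array.
SOURCE_GLYPHS = [
  { name: ".notdef", contours: 1 },
  { name: "space",   contours: 0 }, # empty glyph
  { name: "A",       contours: 2 },
].freeze

# Behaviour like `next unless simple`: empty glyphs are skipped, so every
# later glyph moves down one slot and no longer sits at its gid.
def convert_skipping_empty(glyphs)
  glyphs.reject { |glyph| glyph[:contours].zero? }
end

# Fixed behaviour: keep every glyph so that array index == gid.
def convert_keeping_all(glyphs)
  glyphs.map(&:dup)
end

gid_of_a = 2
puts convert_skipping_empty(SOURCE_GLYPHS)[gid_of_a].inspect # nil
puts convert_keeping_all(SOURCE_GLYPHS)[gid_of_a][:name]     # A
```

An index-based lookup on the skipped array either returns nothing or returns the wrong glyph, which matches the silent drops reported above.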
36
+ ### Consumer-side validation (essenfont)
37
+
38
+ Build with fontisan 0.4.2 + the full donor manifest produces:
39
+
40
+ - 65,535 glyphs / 45.9MB
41
+ - 181,515 cmap entries (was 159,764 in issue #3, +21,751)
42
+ - 59.75% Unicode 17 coverage (was 53.00%, +6.75 pp)
43
+ - 53 blocks complete (was 49)
44
+
45
+ All affected donors round-trip 100%:
46
+
47
+ | Donor | Block | CMAP → output |
48
+ |---|---|---|
49
+ | Lentariso | Imperial Aramaic U+10840–1085F | 31/31 |
50
+ | Lentariso | Phoenician U+10900–1091F | 29/29 |
51
+ | Lentariso | Sidetic U+10940–1095F | 26/26 |
52
+ | Kedebideri | Beria Erfe U+16EA0–16EDF | 50/50 |
53
+ | NotoSerifTangut (OTF/CFF) | Tangut U+17000–187FF | 6136/6136 |
54
+ | NotoSerifTangut | Tangut Components U+18800–18AFF | 768/768 |
55
+ | NotoColorEmoji (CBDT) | Emoticons U+1F600–1F64F | 80/80 |
56
+
57
+ The sections below are preserved for historical reference.
58
+
59
+ ---
60
+
61
+ ## Summary
62
+
63
+ Two distinct failures in `Fontisan::Stitcher` (fontisan 0.4.0) when
64
+ stitching codepoints from certain TTF/OTF sources into a target TTF:
65
+
66
+ 1. **Total Plane 1 drop** — for sources that have cmap entries in
67
+ Plane 1 (U+10000..U+1FFFF), the Stitcher writes **none** of them
68
+ into the output font, even though the donor's cmap loads
69
+ correctly via `Fontisan::FontLoader.load`.
70
+ 2. **Partial upper-end drop** — for some sources, a small contiguous
71
+ run of the highest-cp Plane 1 entries is dropped, while the rest
72
+ of the block is preserved.
73
+
74
+
75
+ Both failures are silent — no exception, no warning. The output font
76
+ is missing the affected codepoints without any indication.
77
+
78
+ ## Reproducers
79
+
80
+ ### Bug 1 — Lentariso (total Plane 1 drop)
81
+
82
+ ```ruby
83
+ require "fontisan"
84
+
85
+ src = Fontisan::FontLoader.load("references/input-fonts/Lentariso-Regular.ttf")
86
+ cmap = src.table("cmap")
87
+ maps = cmap.unicode_mappings
88
+ plane1_cps = maps.keys.select { |cp| cp >= 0x10000 && cp <= 0x1FFFF }
89
+ # plane1_cps.size == 1934
90
+
91
+ stitcher = Fontisan::Stitcher.new
92
+ stitcher.add_source(:lentariso, src)
93
+ stitcher.include_notdef(from: :lentariso)
94
+ stitcher.include_codepoints(plane1_cps, from: :lentariso)
95
+ stitcher.write_to("/tmp/out.ttf", format: :ttf)
96
+
97
+ out = Fontisan::FontLoader.load("/tmp/out.ttf")
98
+ out_plane1 = out.table("cmap").unicode_mappings.keys
99
+ .count { |cp| cp >= 0x10000 && cp <= 0x1FFFF }
100
+ # out_plane1 == 0
101
+ ```
102
+
103
+ Sub-ranges showing the same pattern:
104
+ - U+10840..U+1085F (Imperial Aramaic): 31 cps in donor → **0 in output**
105
+ - U+10900..U+1091F (Phoenician): 29 cps in donor → **0 in output**
106
+ - U+10940..U+1095F (Sidetic): 26 cps in donor → **0 in output**
107
+
108
+ Other donors with Plane 1 coverage work correctly:
109
+ - `NewGardiner.ttf` (4,537 Plane 1 cps) → all 10/10 stitched in spot-check
110
+ - `references/input-fonts/UniHieroglyphica.ttf` (Egyptian Hieroglyphs Ext-A) → all 4,000/4,000 stitched
111
+
112
+ ### Bug 2 — Kedebideri (partial upper-end drop)
113
+
114
+ ```ruby
115
+ require "fontisan"
116
+
117
+ src = Fontisan::FontLoader.load("references/input-fonts/Kedebideri-Regular.ttf")
118
+ cmap = src.table("cmap")
119
+ maps = cmap.unicode_mappings
120
+ beria_erfe = maps.keys.select { |cp| cp >= 0x16EA0 && cp <= 0x16EDF }
121
+ # beria_erfe.size == 50
122
+
123
+ stitcher = Fontisan::Stitcher.new
124
+ stitcher.add_source(:kedebideri, src)
125
+ stitcher.include_notdef(from: :kedebideri)
126
+ stitcher.include_codepoints(beria_erfe, from: :kedebideri)
127
+ stitcher.write_to("/tmp/out.ttf", format: :ttf)
128
+
129
+ out = Fontisan::FontLoader.load("/tmp/out.ttf")
130
+ out_beria = out.table("cmap").unicode_mappings.keys
131
+ .select { |cp| cp >= 0x16EA0 && cp <= 0x16EDF }
132
+ # out_beria.size == 42 (8 dropped)
133
+ ```
134
+
135
+ The 8 dropped codepoints (always the same set):
136
+ - `U+16EB5`, `U+16EB6`, `U+16EB7`, `U+16EB8` (consecutive)
137
+ - `U+16ED0`, `U+16ED1`, `U+16ED2`, `U+16ED3` (consecutive)
138
+
139
+ These all map to **gids 341–348** in the source font, which is the
140
+ top end of the source's gid range (2..348). Dropped gids span the
141
+ topmost consecutive range of the cmap; the gap at gid 340 in the
142
+ source cmap (a missing glyph) precedes the first dropped range.
143
+
144
+ ## Expected
145
+
146
+ All codepoints present in the source font's cmap should appear in the
147
+ stitched output's cmap, with the same or remapped glyph references.
148
+
149
+ ## Actual
150
+
151
+ - Lentariso: **0 / 1934** Plane 1 codepoints make it to output (100% loss)
152
+ - Kedebideri: **42 / 50** Beria Erfe codepoints in output (16% loss, top-end)
153
+
154
+ ## Observed patterns
155
+
156
+ - **Only Plane 1 is affected**. Plane 0 (BMP) codepoints from the
157
+ same sources work fine. Plane 2+ codepoints (FSung-2, ~61,000 cps
158
+ in CJK Extension B-E) work fine.
159
+ - **The total-drop case (Lentariso) is consistent across multiple
160
+ runs** — every Plane 1 cp is dropped, not just some.
161
+ - **The partial-drop case (Kedebideri) drops exactly the topmost
162
+ consecutive range of cmap gids** that correspond to Plane 1 entries
163
+ whose gids are adjacent.
164
+ - Other Plane 1 sources work fine (NewGardiner, UniHieroglyphica),
165
+ so the bug is not a blanket "Plane 1 unsupported" — something about
166
+ the cmap structure or font topology triggers it.
167
+
168
+ ## What's been ruled out
169
+
170
+ - cmap gids exceeding `maxp.num_glyphs` — verified all source gids
171
+ are within num_glyphs for both Lentariso (maxp=6063, cmap=5767)
172
+ and Kedebideri (maxp=380, cmap=347).
173
+ - File format / validation — both sources load without warning via
174
+ `Fontisan::FontLoader.load`; their cmaps are introspectable and
175
+ show all expected entries.
176
+ - Lentariso Plane 0 cps work (e.g., U+0041 → 'A' round-trips).
177
+ - FSung-2's Plane 2 cps (CJK Ext B, ~61k entries) work end-to-end.
178
+
179
+ ## Suggested investigation directions
180
+
181
+ 1. **cmap format detection** — sources may be reported as format 4
182
+ (BMP-only) by fontisan's parser when they actually have a format
183
+ 12 subtable. If the Stitcher writes only format 4 to the output,
184
+ all Plane 1 entries would be dropped.
185
+ 2. **`Stitcher::Source#bitmap_mode`** — added in fontisan 0.4 (per
186
+ CHANGELOG entry for the recent Stitcher rewrite). If Lentariso is
187
+ misclassified as `:cbdt` or `:none` despite being a pure `glyf`
188
+ source, its cmap could be ignored or partially processed.
189
+ 3. **gid adjacency in cmap** — the Kedebideri drop (gids 341–348)
190
+ sits at the topmost consecutive run of cmap gids. If the Stitcher
191
+ has a cutoff based on "n gids before the cmap max gid", this
192
+ could explain it. The exact mechanism is unclear without reading
193
+ the Stitcher internals.
194
+ 4. **Lookback into Stitcher's `assemble_cmap`** — the function that
195
+ merges source cmaps into the output cmap is the most likely site
196
+ of the bug. Check whether it iterates source cmap entries
197
+ sorted by glyph id or by codepoint, and whether it has any
198
+ boundary conditions on either sort key.
199
+
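
For direction 1, it may help to see why a format-4-only writer would lose Plane 1 entirely: cmap format 4 stores 16-bit character codes, while format 12 covers the full range. The helpers below are hypothetical and only classify codepoints; they do not read or write cmap tables.

```ruby
# Hypothetical helper: the smallest Unicode cmap subtable format that can
# encode every codepoint. Format 4 holds 16-bit codes, so any codepoint
# above U+FFFF needs format 12.
def required_cmap_format(codepoints)
  codepoints.any? { |cp| cp > 0xFFFF } ? 12 : 4
end

# Codepoints that a writer emitting only format 4 could not represent.
def lost_in_format4(codepoints)
  codepoints.select { |cp| cp > 0xFFFF }
end

cps = [0x0041, 0x10840, 0x10900]
puts required_cmap_format(cps)
puts lost_in_format4(cps).map { |cp| format("U+%X", cp) }.join(" ")
```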
200
+ ## Impact
201
+
202
+ - **essenfont** (this project's v0.1.0 output font) silently loses
203
+ 86+ codepoints in Sidetic / Imperial Aramaic / Phoenician, and 8
204
+ in Beria Erfe. Coverage in `scripts/coverage_report.rb` shows the
205
+ affected blocks as 0% (or partial), even though the source fonts
206
+ contain valid glyphs.
207
+ - Any downstream consumer of the Stitcher that combines Plane 1
208
+ donors is affected.
209
+
210
+ ## Related
211
+
212
+ - fontisan 0.4.0 added single-source CBDT/CBLC passthrough (commit
213
+ `b612058`); this Stitcher bug may have been introduced in the
214
+ same refactor. If so, the fix should land in 0.4.x.
215
+
216
+ ## Test fixtures
217
+
218
+ The above reproducers use donor fonts committed to the essenfont
219
+ repo at `references/input-fonts/`:
220
+ - `Lentariso-Regular.ttf` (Bryndan W. Meyerholt, OFL, sha256 514ac442c9c5c361625da0755e84baaee36bbafe098138312f5285c3bd2fa0d3)
221
+ - `Kedebideri-Regular.ttf` (SIL International, OFL, sha256 f95907a8c39c68d557c2264fcf593a858eb4751cd5e0c3c7c53e1ef354444064)
222
+
223
+ Both files are also available from the upstream sources documented in
224
+ `essenfont/references/input-fonts/ATTRIBUTIONS.md`.
225
+
226
+ ## Reporter
227
+
228
+ essenfont project, issue #3 (https://github.com/fontist/essenfont/issues/3)
229
+
230
+ ## Regression introduced by the fix (2026-06-30)
231
+
232
+ The `feature/stitcher-perf` branch fixes the Lentariso + Kedebideri
233
+ bugs but introduces a **new** silent Plane 1 drop for OTF/CFF sources.
234
+
235
+ ### New reproducer
236
+
237
+ ```ruby
238
+ require "fontisan"
239
+
240
+ # NotoSerifTangut is OTF (CFF outlines, no glyf table)
241
+ src = Fontisan::FontLoader.load("NotoSerifTangut-Regular.otf")
242
+ cmap = src.table("cmap")
243
+ maps = cmap.unicode_mappings
244
+ tangut = maps.keys.select { |cp| cp >= 0x17000 && cp <= 0x187FF }
245
+ # tangut.size == 6136
246
+
247
+ stitcher = Fontisan::Stitcher.new
248
+ stitcher.add_source(:tangut, src)
249
+ stitcher.include_notdef(from: :tangut)
250
+ stitcher.include_codepoints(tangut.first(10), from: :tangut)
251
+ stitcher.write_to("/tmp/out.ttf", format: :ttf)
252
+
253
+ out = Fontisan::FontLoader.load("/tmp/out.ttf")
254
+ outmaps = out.table("cmap").unicode_mappings
255
+ present = tangut.first(10).count { |cp| outmaps.key?(cp) }
256
+ # present == 0 ← ALL DROPPED
257
+ ```
258
+
259
+ Source has tables: `["CFF ", "GDEF", "GPOS", "GSUB", "OS/2", "VORG",
260
+ "cmap", "head", "hhea", "hmtx", "maxp", "name", "post", "vhea",
261
+ "vmtx"]`. Note `CFF ` (CFF outlines), no `glyf`.
262
+
263
+ ### Affected donors (essenfont manifest)
264
+
265
+ | Donor | Plane 1 cps | In output | Loss |
266
+ |---|---|---|---|
267
+ | **noto-serif-tangut** (OTF/CFF) | 6916 | **0** | **100%** ← full regression |
268
+ | noto-sans (TTF) | 88 | 84 | 4 cps |
269
+ | noto-sans-sharada (TTF) | 96 | 95 | 1 cp |
270
+ | noto-sans-math (TTF) | 1228 | 1200 | 28 cps |
271
+ | noto-music (TTF) | 549 | 537 | 12 cps |
272
+ | noto-sans-symbols-2 (TTF) | 1608 | 1596 | 12 cps |
273
+ | lentariso (TTF) | 1934 | 1930 | 4 cps |
274
+ | egyptian-text (TTF) | 1131 | 1118 | 13 cps |
275
+ | uni-hieroglyphica (TTF) | 5108 | 5095 | 13 cps |
276
+ | new-gardiner (TTF) | 4537 | 4524 | 13 cps |
277
+
278
+ **Total: 6,916 Tangut cps + ~100 cps across 9 TTF donors = ~7,000 cps lost.**
279
+
280
+ ### Suggested cause
281
+
282
+ The `b3e52af` commit ("stitcher: O(1) per-glyph extraction instead
283
+ of O(n) full-donor conversion") likely changed the per-glyph
284
+ extraction path to assume `glyf` outlines. For OTF/CFF sources,
285
+ extraction silently fails and the cp is dropped from the output cmap.
286
+
287
+ The smaller losses (roughly 0.2–4.5% per TTF donor) may be a separate issue,
288
+ perhaps an off-by-one or boundary condition in the new
289
+ extraction loop. Worth grepping the new spec files for boundary
290
+ test cases.
291
+
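
The per-donor loss rates for the TTF donors can be recomputed from the "Affected donors" table. The figures below are copied from that table; the `loss_report` method is only an illustration.

```ruby
# Plane 1 codepoint counts from the table: donor => [in source, in output].
TTF_DONORS = {
  "noto-sans"           => [88, 84],
  "noto-sans-sharada"   => [96, 95],
  "noto-sans-math"      => [1228, 1200],
  "noto-music"          => [549, 537],
  "noto-sans-symbols-2" => [1608, 1596],
  "lentariso"           => [1934, 1930],
  "egyptian-text"       => [1131, 1118],
  "uni-hieroglyphica"   => [5108, 5095],
  "new-gardiner"        => [4537, 4524],
}.freeze

# Lost count and loss percentage (two decimals) for each donor.
def loss_report(donors)
  donors.transform_values do |(in_source, in_output)|
    lost = in_source - in_output
    { lost: lost, percent: (100.0 * lost / in_source).round(2) }
  end
end

report = loss_report(TTF_DONORS)
report.each do |name, row|
  puts format("%-20s %3d cps  %5.2f%%", name, row[:lost], row[:percent])
end
puts "total lost: #{report.values.sum { |row| row[:lost] }}"
```

The total comes to 100 codepoints across the nine TTF donors, consistent with the "~100 cps" figure above.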
292
+ ### Net impact on essenfont
293
+
294
+ - Before fix (0.4.0): 181,354 cmap entries, 59.69% Unicode 17 coverage
295
+ - After fix (feature/stitcher-perf): 174,599 entries, 57.47% coverage
296
+ - **Net change: −6,755 entries, −2.22 percentage points**
297
+
298
+ The fix trades Lentariso + Kedebideri coverage (94 cps regained) for
299
+ Noto Serif Tangut coverage (6,916 cps lost). Net negative.
300
+
301
+ ### Recommended next step
302
+
303
+ The Stitcher's per-glyph extraction needs to handle:
304
+ - TTF / `glyf` sources (current path)
305
+ - OTF / `CFF ` sources (currently broken — likely the regression)
306
+ - CBDT/CBLC sources (works in 0.4.0+ for single-source passthrough)
307
+
308
+ All three glyph-storage modes should round-trip identically. A
309
+ parametric spec covering each mode for the same test codepoint
310
+ would prevent this class of regression.
data/BUG-stitcher-gid-cap-65535.md ADDED
@@ -0,0 +1,110 @@
1
+ # BUG: Stitcher repair pass drops 3,004 cmap entries (gid remap)
2
+
3
+ ## Status
4
+
5
+ NEW — discovered 2026-07-01 with fontisan 0.4.6. Introduced by the
6
+ repair pass added alongside the compound-glyph flatten fix
7
+ (commit `73820f1`). Not yet addressed.
8
+
9
+ ## Summary
10
+
11
+ The build output's post-write repair step reports:
12
+
13
+ ```
14
+ repairing: 3004 cmap entries pointed to non-existent gids (max gid = 65534)
15
+ repaired: 132623 valid cmap entries retained
16
+ ```
17
+
18
+ 3,004 cmap entries that were correctly assembled by the Stitcher
19
+ are silently dropped because the output font's gid space caps at
20
+ 65,534 and the repair step removes any cmap entry whose gid exceeds
21
+ that limit.
22
+
23
+ ## Reproducer
24
+
25
+ Build essenfont with the full donor manifest (30+ donors, ~160k
26
+ codepoints in the union). The build output shows:
27
+
28
+ ```
29
+ 158247/158247 codepoints stitched
30
+ === Writing Essenfont-Regular.ttf ===
31
+ repairing: 3004 cmap entries pointed to non-existent gids (max gid = 65534)
32
+ repaired: 132623 valid cmap entries retained
33
+ ```
34
+
35
+ 3,004 cps are dropped, reducing coverage from expected ~96% to ~82%.
36
+
37
+ ## Root cause
38
+
39
+ The output TTF has `maxp.num_glyphs = 65535` (the 16-bit maximum).
40
+ The Stitcher assembles glyphs from 30+ donors whose combined glyph
41
+ count exceeds 65,535. When the output font is written:
42
+
43
+ 1. Glyphs are assigned gids sequentially from 0 to ~160k.
44
+ 2. `maxp.num_glyphs` is clamped to 65,535 (16-bit limit).
45
+ 3. Any cmap entry whose gid ≥ 65,535 is now invalid (points past
46
+ the end of the glyph array).
47
+ 4. The repair pass removes those entries.
48
+
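
The four steps above can be simulated in a few lines. This is a sketch only: `assign_gids` and `repair` are hypothetical names, the cmap is a plain Hash, and the 70,000-entry size is an arbitrary value chosen to exceed the cap.

```ruby
MAX_GLYPHS = 0xFFFF # 65,535, the largest value a uint16 glyph count can hold

# Step 1: assign gids sequentially, one per codepoint.
def assign_gids(codepoints)
  codepoints.each_with_index.to_h
end

# Steps 3 and 4: entries whose gid lies past the glyph array are invalid;
# a repair pass that filters them out drops them without raising.
def repair(cmap, num_glyphs)
  cmap.partition { |_cp, gid| gid < num_glyphs }.map(&:to_h)
end

codepoints = (0x20000...(0x20000 + 70_000)).to_a
cmap = assign_gids(codepoints)
num_glyphs = [cmap.size, MAX_GLYPHS].min # step 2: clamp the glyph count

kept, dropped = repair(cmap, num_glyphs)
puts "max gid = #{num_glyphs - 1}"
puts "retained #{kept.size}, dropped #{dropped.size}"
```

Every codepoint was stitched correctly in step 1; the loss only appears once the count is clamped.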
49
+ The cap of 65,535 is a hard limit of the TrueType `maxp` table
50
+ (`numGlyphs` is uint16). Fonts with more than 65,535
51
+ glyphs have to be split across a collection (TTC): CFF (OTF) output is
52
+ bound by the same uint16 glyph count, and `maxp` has no wider version.
53
+
54
+ ## What's lost
55
+
56
+ 3,004 cmap entries are dropped. These fall into two groups:
57
+
58
+ 1. **Large CJK donors** (FSung-2, NotoSansKR, NotoSansNushu):
59
+ codepoints assigned to gids > 65,534 are silently dropped.
60
+ Affects ~1,000 cps from Plane 2 CJK (CJK Ext I, Compat Supp)
61
+ + ~800 cps from Hangul + ~200 cps from Nushu.
62
+
63
+ 2. **Synthetic SVG donors** (Khitan, Tulu-Tigalari, etc.):
64
+ codepoints from late-added synthetic donors get assigned gids
65
+ beyond 65,534. Affects ~1,000 cps from chart-extracted glyphs.
66
+
67
+ ## Suggested fixes
68
+
69
+ ### Option A: Switch to CFF (OTF) output
70
+
71
+ CFF (OTF) output does not lift the cap: the CFF1 CharStrings INDEX
72
+ count is card16, and `maxp.num_glyphs` stays uint16. fontisan's
73
+ `OtfCompiler` can write `Essenfont-Regular.otf` instead of `.ttf`,
74
+ but that changes the output format without raising the glyph limit.
75
+
76
+ ### Option B: Glyph deduplication
77
+
78
+ Many donor glyphs are identical (e.g., the .notdef glyph appears
79
+ in every donor). Deduplicating these before writing would reduce
80
+ the glyph count below 65,535. Estimated savings: ~10,000 duplicate
81
+ .notdef + whitespace glyphs.
82
+
83
+ ### Option C: Subfont splitting
84
+
85
+ Split the output into multiple TTF files by Unicode plane (Plane 0,
86
+ Plane 1, Plane 2, etc.) with a TTC (TrueType Collection) wrapper.
87
+ Each subfont stays under 65,535 glyphs.
88
+
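
Grouping codepoints by plane, as Option C proposes, is a one-line operation. The sketch below uses hypothetical helper names and only builds the grouping; it assumes nothing about how fontisan would write the subfonts, and each group's glyph count would still need checking against the cap.

```ruby
# The Unicode plane of a codepoint is everything above its low 16 bits.
def plane_of(codepoint)
  codepoint >> 16
end

# Hash of plane number => codepoints in that plane, in input order.
def group_by_plane(codepoints)
  codepoints.group_by { |cp| plane_of(cp) }
end

groups = group_by_plane([0x0041, 0x4E00, 0x12399, 0x17000, 0x20000])
groups.each do |plane, cps|
  puts "plane #{plane}: #{cps.map { |cp| format('U+%04X', cp) }.join(' ')}"
end
```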
89
+ ### Option D: Remove non-contributing donors
90
+
91
+ FSung-X (38,126 Plane 16 PUA cps) currently contributes zero
92
+ useful codepoints to the output. Removing it from the build would
93
+ reduce the glyph count by ~38,000, well within the 65,535 cap.
94
+
95
+ Similarly, NotoSansKR's variable font has 23,174 cps but only
96
+ 11,172 are Hangul — the rest are duplicated by other donors.
97
+ Subsetting KR to just Hangul would save ~12,000 glyphs.
98
+
99
+ ## Impact
100
+
101
+ Coverage drops from expected ~96% to actual ~82.76%. The 3,004
102
+ lost cps span 8+ Unicode blocks that have valid glyphs in the
103
+ donor fonts but are silently dropped at write time.
104
+
105
+ ## References
106
+
107
+ - Build log with repair message: `/tmp/build-svg6.log`
108
+ - Affected blocks: CJK Ext I, CJK Compat Supp, Nushu,
109
+ Tulu-Tigalari, Khitan Small Script (partial), Hangul (partial)
110
+ - Consumer: essenfont issue #3
data/CHANGELOG.md CHANGED
@@ -9,6 +9,112 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
9
9
 
10
10
  ### Added
11
11
 
12
+ - `Stitcher` explicit subfont declaration model: every `include_*`
13
+ method accepts an `into:` keyword that names the target subfont.
14
+ The user controls collection structure upfront rather than relying
15
+ on after-the-fact splitting. Backward-compatible: without `into:`,
16
+ bindings route to `:default` (single-font behavior unchanged).
17
+ - `Stitcher#write_collection(path, format:)` — writes all declared
18
+ subfonts as a TTC/OTC with table sharing via the existing
19
+ `Collection::Builder`. Each subfont compiled to TTF/OTF/CFF2 per
20
+ the `format:` argument. Collection format auto-selected: `:ttf` →
21
+ TTC, `:otf`/`:otf2` → OTC.
22
+ - `Stitcher#subfonts` — hash of name → bindings for inspection.
23
+ - `Stitcher#subfont_names` — declared subfont names in order.
24
+ - `Stitcher#write_to(path, format:, subfont:)` — writes a specific
25
+ named subfont as a single file (default: `:default`).
26
+
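
A toy model of the routing described in these entries, to make the `into:` default and the collection-format mapping concrete. The `SubfontPlan` class and its internals are illustrative assumptions, not fontisan's implementation; only the keyword names and the `:ttf` → TTC, `:otf`/`:otf2` → OTC mapping come from the entries above.

```ruby
# Illustrative stand-in for explicit subfont declaration.
class SubfontPlan
  COLLECTION_FORMAT = { ttf: :ttc, otf: :otc, otf2: :otc }.freeze

  def initialize
    @subfonts = Hash.new { |hash, name| hash[name] = [] }
  end

  # Bindings without an explicit target are routed to :default.
  def include_codepoints(codepoints, from:, into: :default)
    @subfonts[into] << { source: from, codepoints: codepoints }
    self
  end

  # Declared subfont names, in declaration order.
  def subfont_names
    @subfonts.keys
  end

  def collection_format(format)
    COLLECTION_FORMAT.fetch(format)
  end
end

plan = SubfontPlan.new
plan.include_codepoints([0x41], from: :latin)
plan.include_codepoints([0x12399], from: :cuneiform, into: :plane1)

puts plan.subfont_names.inspect
puts plan.collection_format(:otf2)
```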
27
+ ### Architecture note
28
+
29
+ The previous design planned an after-the-fact "Splitter" that would
30
+ break bindings into plane-based groups at write time. This was
31
+ replaced with **explicit subfont declaration**: the user decides
32
+ which codepoints go into which subfont, and the Stitcher serializes
33
+ that declared structure. This is model-driven (the subfont
34
+ assignment IS the model) and editorially honest (collection
35
+ structure is an editorial decision, not an algorithmic one).
36
+
37
+ ### Added
38
+
39
+ - `Fontisan::Tables::Cff2::Header` — CFF2 5-byte header builder
40
+ (majorVersion=2, minorVersion=0, headerSize=5, topDictSize).
41
+ - `Fontisan::Tables::Cff2::IndexBuilder` — CFF2 INDEX builder with
42
+ uint32 count (vs card16 in CFF1). Supports > 65,535 entries in the
43
+ INDEX structure itself.
44
+ - `Fontisan::Tables::Cff2::DictEncoder` — encodes CFF DICT operands
45
+ (integers + BCD reals) and operators (1-byte and 2-byte escape).
46
+ - `Fontisan::Ufo::Compile::Cff2` — from-scratch CFF2 table builder
47
+ for UFO glyphs. Produces: Header + Top DICT (CharStrings + FontDICT
48
+ offsets) + Global Subr INDEX + CharStrings INDEX + FontDICT INDEX
49
+ (wrapping one FontDICT with empty PrivateDICT reference).
50
+ - `Fontisan::Ufo::Compile::Otf2Compiler` — compiles UFO → OTF with
51
+ CFF2 outlines (table tag `CFF2` instead of `CFF `). Same OTTO sfnt
52
+ signature as CFF1.
53
+ - `Stitcher#write_to` now accepts `format: :otf2` for CFF2 output.
54
+ - `GlyphLimit` recognizes `:otf2` format.
55
+
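
The header and INDEX layouts named above are small enough to sketch with `Array#pack`. This follows the CFF2 layout as described in the OpenType specification (uint8 version and size fields, a uint16 Top DICT length, a uint32 INDEX count); the method names are hypothetical and this is not the code in `header.rb` or `index_builder.rb`.

```ruby
# CFF2 header: majorVersion, minorVersion, headerSize (uint8 each), then
# the Top DICT length as a big-endian uint16. Five bytes in total.
def cff2_header(top_dict_size)
  [2, 0, 5, top_dict_size].pack("CCCn")
end

# CFF2 INDEX: uint32 count; for a non-empty INDEX, offSize, count + 1
# offsets (the first is always 1), then the concatenated object data.
def cff2_index(items)
  return [0].pack("N") if items.empty?

  offsets = [1]
  items.each { |item| offsets << offsets.last + item.bytesize }

  # Smallest offset width (in bytes) that can hold the final offset.
  off_size = 1
  off_size += 1 while offsets.last >= (1 << (8 * off_size))

  encoded_offsets = offsets.map do |offset|
    [offset].pack("N")[-off_size, off_size] # low bytes, big-endian
  end

  [items.size].pack("N") + [off_size].pack("C") +
    encoded_offsets.join + items.map(&:b).join
end

puts cff2_header(12).bytes.inspect
puts cff2_index(["ab", "cde"]).bytes.inspect
```

The uint32 count is the only structural difference from a CFF1 INDEX in this sketch; the offset array and data area are laid out the same way.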
56
+ ### Note on glyph cap
57
+
58
+ CFF2 does **not** bypass the 65,535 glyph cap. Per the OpenType spec,
59
+ the CFF2 CharStrings INDEX count must match `maxp.numGlyphs`, which is
60
+ uint16 in all font versions. For > 65,535 glyphs, TTC (TrueType
61
+ Collection) splitting is required. CFF2's value lies in variable font
62
+ support (`blend`/`vsindex` operators + VariationStore), CID-keyed
63
+ fonts (FDSelect), and improved subroutinization — not glyph count.
64
+
65
+ ### Added
66
+
67
+ - `Fontisan::Stitcher::GlyphSignature` — deterministic SHA-256 signature
68
+ of a glyph's outline identity (advance width + contours + components).
69
+ Used to detect visually identical glyphs from different donors.
70
+ - `Fontisan::Stitcher::Deduplicator` — registry mapping signatures to
71
+ canonical glyph names, enabling signature-based deduplication during
72
+ Stitcher assembly. Merges identical outlines from different donors
73
+ into a single gid, reducing the glyph count.
74
+ - `Fontisan::Stitcher::GlyphLimit` — format-specific glyph-count caps
75
+ (TTF: 65,535; OTF: unlimited) and enforcement via `check!`.
76
+ - `Fontisan::GlyphLimitExceededError` — raised when the Stitcher's
77
+ output exceeds the format's glyph cap, with actionable guidance
78
+ (switch to OTF, reduce donors, split into TTC).
79
+ - `Stitcher.new(deduplicate: true)` — signature-based dedup is now the
80
+ default; pass `deduplicate: false` to disable.
81
+
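
A minimal sketch of signature-based deduplication, assuming a glyph can be reduced to its advance width, contours, and components as the `GlyphSignature` entry describes. The `Glyph` struct, the serialization via `inspect`, and the `SignatureRegistry` class are illustrative choices, not fontisan's implementation.

```ruby
require "digest"

# Illustrative glyph record; fontisan's Ufo::Glyph has its own shape.
Glyph = Struct.new(:name, :advance_width, :contours, :components,
                   keyword_init: true)

# Serialize the outline-defining fields in a fixed order, then hash them.
def glyph_signature(glyph)
  canonical = [glyph.advance_width, glyph.contours, glyph.components].inspect
  Digest::SHA256.hexdigest(canonical)
end

# The first glyph seen for a signature becomes the canonical one.
class SignatureRegistry
  def initialize
    @canonical = {}
  end

  def canonical_name(glyph)
    @canonical[glyph_signature(glyph)] ||= glyph.name
  end

  def size
    @canonical.size
  end
end

square = [[[0, 0], [0, 500], [500, 500], [500, 0]]]
glyphs = [
  Glyph.new(name: "donorA.notdef", advance_width: 600, contours: square, components: []),
  Glyph.new(name: "donorB.notdef", advance_width: 600, contours: square, components: []),
  Glyph.new(name: "donorB.space",  advance_width: 250, contours: [],     components: []),
]

registry = SignatureRegistry.new
puts glyphs.map { |glyph| registry.canonical_name(glyph) }.inspect
puts registry.size
```

Three input glyphs collapse to two canonical entries here, because the two `.notdef` outlines hash to the same signature.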
82
+ ### Fixed
83
+
84
+ - Stitcher no longer silently produces an invalid TTF (or OTF) when
85
+ the glyph count exceeds 65,535. Both TTF and OTF (CFF1) cap at
86
+ 65,535 glyphs because `maxp.num_glyphs` is uint16 and the CFF1
87
+ CharStrings INDEX count is card16. The cap is now enforced BEFORE
88
+ writing, and signature-based deduplication merges identical outlines
89
+ to reduce the count. When dedup alone isn't enough,
90
+ `GlyphLimitExceededError` is raised with actionable options
91
+ (split into TTC, reduce donors, wait for CFF2) instead of the
92
+ previous behavior (silent truncation + dropped cmap entries at
93
+ the repair pass).
94
+
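
The idea of enforcing the cap before writing, rather than repairing afterwards, can be sketched as a guard that raises. The error class, method name, and message text below are hypothetical stand-ins and do not reproduce `GlyphLimit.check!` or `GlyphLimitExceededError`.

```ruby
# Hypothetical error type for this sketch.
class GlyphCapError < StandardError; end

GLYPH_CAP = 65_535 # the glyph count in maxp is a uint16

# Return the count unchanged when it fits; otherwise refuse to continue.
def check_glyph_count!(count, format:)
  return count if count <= GLYPH_CAP

  raise GlyphCapError,
        "#{count} glyphs exceeds the #{GLYPH_CAP}-glyph cap for #{format}; " \
        "split into a TTC or reduce donors"
end

puts check_glyph_count!(1_239, format: :ttf)

begin
  check_glyph_count!(70_000, format: :ttf)
rescue GlyphCapError => e
  puts "refused: #{e.message}"
end
```

Failing loudly at this point keeps the cmap and the glyph array consistent, so no later pass has to discard entries.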
95
+ ### Added (previous)
96
+
97
+ - `Fontisan::SvgToGlyf` — converts SVG path data (from ucode code-chart
98
+ extraction) into `Ufo::Glyph` objects that feed directly into the
99
+ existing Stitcher + TtfCompiler pipeline. The converter handles SVG
100
+ path commands (M/L/H/V/C/S/Q/T/Z), relative and absolute coordinates,
101
+ smooth-curve reflection, SVG `<g transform>` accumulation, viewBox
102
+ coordinate normalization, and Y-axis flipping. Cubic-to-quadratic
103
+ conversion and contour winding correction are handled automatically
104
+ by the existing `Ufo::Compile::Filters` when compiling to TTF.
105
+ - `Fontisan::SvgToGlyf::Geometry::AffineTransform` — 2×3 matrix with
106
+ compose/apply, used uniformly for all coordinate operations.
107
+ - `Fontisan::SvgToGlyf::Geometry::TransformParser` — parses SVG
108
+ `transform="..."` attribute (translate, scale, rotate, matrix, skew).
109
+ - `Fontisan::SvgToGlyf::Geometry::Normalizer` — composes the viewBox →
110
+ font UPM normalization (Y-flip + scale) with group transforms.
111
+ - `Fontisan::SvgToGlyf::Path::Parser` — tokenizes and parses SVG path
112
+ `d` strings into typed Command objects with implicit-repetition support.
113
+ - `Fontisan::SvgToGlyf::Path::ContourBuilder` — converts commands to
114
+ `Ufo::Contour` objects, tracking current-point, subpath-start, and
115
+ control-point state for smooth-curve reflection.
116
+ - `Fontisan::SvgToGlyf::Document` — walks SVG XML (via Nokogiri),
117
+ accumulating ancestor `<g>` transforms per `<path>`.
12
118
  - `Fontisan::Ufo::Compile::Avar` — builds the OpenType `avar` (Axis
13
119
  Variation) table with per-axis non-linear maps (defaults to identity
14
120
  -1/0/1 mapping).