ucode 0.1.0 → 0.1.1

Files changed (174)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +72 -0
  3. data/Gemfile.lock +2 -2
  4. data/TODO.full/00-README.md +116 -0
  5. data/TODO.full/01-panglyph-vision.md +112 -0
  6. data/TODO.full/02-panglyph-repo-bootstrap.md +184 -0
  7. data/TODO.full/03-panglyph-font-builder.md +201 -0
  8. data/TODO.full/04-panglyph-publish-pipeline.md +126 -0
  9. data/TODO.full/05-ucode-0-1-1-release.md +139 -0
  10. data/TODO.full/06-fontisan-remove-audit.md +142 -0
  11. data/TODO.full/07-fontisan-remove-ucd.md +125 -0
  12. data/TODO.full/08-archive-private-bin-build.md +143 -0
  13. data/TODO.full/09-archive-public-structure.md +164 -0
  14. data/TODO.full/10-fontist-org-woff-glyphs.md +131 -0
  15. data/TODO.full/11-fontist-org-audit-coverage.md +140 -0
  16. data/TODO.full/12-implementation-order.md +216 -0
  17. data/TODO.full/13-fontisan-font-writer-api.md +189 -0
  18. data/TODO.full/14-fontisan-table-writers.md +66 -0
  19. data/TODO.full/15-panglyph-builder-real.md +82 -0
  20. data/TODO.full/16-archive-public-sync-workflows.md +167 -0
  21. data/TODO.full/17-fontist-org-font-picker.md +73 -0
  22. data/TODO.full/18-comprehensive-spec-coverage.md +64 -0
  23. data/TODO.full/19-ucode-0-1-2-patch.md +32 -0
  24. data/TODO.full/20-fontisan-0-2-23-release.md +52 -0
  25. data/TODO.new/00-README.md +30 -0
  26. data/TODO.new/23-universal-glyph-set-source-map.md +312 -0
  27. data/TODO.new/24-universal-glyph-set-build.md +189 -0
  28. data/TODO.new/25-font-audit-against-universal-set.md +195 -0
  29. data/TODO.new/26-missing-glyph-reporter.md +189 -0
  30. data/TODO.new/27-fontist-org-consumer-integration.md +200 -0
  31. data/TODO.new/28-implementation-order-update.md +187 -0
  32. data/TODO.new/29-universal-set-curation-uc17.md +312 -0
  33. data/TODO.new/30-tier1-font-acquisition.md +241 -0
  34. data/TODO.new/31-universal-set-production-build.md +205 -0
  35. data/TODO.new/32-uc17-coverage-matrix.md +165 -0
  36. data/TODO.new/33-specialist-font-acquisition-refresh.md +138 -0
  37. data/TODO.new/34-pillar2-content-stream-correlator.md +147 -0
  38. data/TODO.new/35-universal-set-production-run.md +160 -0
  39. data/TODO.new/36-per-font-coverage-audit.md +145 -0
  40. data/TODO.new/37-coverage-highlight-reporter.md +125 -0
  41. data/TODO.new/38-fontist-org-glyph-consumer.md +141 -0
  42. data/TODO.new/39-implementation-order-update-32-38.md +258 -0
  43. data/TODO.new/40-archive-private-uses-ucode-audit.md +124 -0
  44. data/TODO.new/41-ucode-unicode-archive-bridge.md +160 -0
  45. data/config/specialist_fonts.yml +102 -0
  46. data/config/unicode17_tier1_fonts.yml +42 -0
  47. data/config/unicode17_universal_glyph_set.yml +293 -0
  48. data/lib/ucode/audit/block_aggregator.rb +57 -29
  49. data/lib/ucode/audit/browser/face_page.rb +128 -0
  50. data/lib/ucode/audit/browser/glyph_panel.rb +124 -0
  51. data/lib/ucode/audit/browser/library_page.rb +74 -0
  52. data/lib/ucode/audit/browser/missing_glyph_page.rb +87 -0
  53. data/lib/ucode/audit/browser/template.rb +47 -0
  54. data/lib/ucode/audit/browser/templates/face.css +200 -0
  55. data/lib/ucode/audit/browser/templates/face.html.erb +41 -0
  56. data/lib/ucode/audit/browser/templates/face.js +298 -0
  57. data/lib/ucode/audit/browser/templates/library.css +119 -0
  58. data/lib/ucode/audit/browser/templates/library.html.erb +42 -0
  59. data/lib/ucode/audit/browser/templates/library.js +99 -0
  60. data/lib/ucode/audit/browser/templates/missing_glyph_page.css +119 -0
  61. data/lib/ucode/audit/browser/templates/missing_glyph_page.html.erb +58 -0
  62. data/lib/ucode/audit/browser/templates/missing_glyph_page.js +2 -0
  63. data/lib/ucode/audit/browser.rb +32 -0
  64. data/lib/ucode/audit/context.rb +27 -1
  65. data/lib/ucode/audit/coverage_reference.rb +103 -0
  66. data/lib/ucode/audit/differ.rb +121 -0
  67. data/lib/ucode/audit/emitter/block_emitter.rb +52 -0
  68. data/lib/ucode/audit/emitter/codepoint_emitter.rb +87 -0
  69. data/lib/ucode/audit/emitter/collection_emitter.rb +80 -0
  70. data/lib/ucode/audit/emitter/face_directory.rb +212 -0
  71. data/lib/ucode/audit/emitter/glyph_emitter.rb +48 -0
  72. data/lib/ucode/audit/emitter/index_emitter.rb +149 -0
  73. data/lib/ucode/audit/emitter/library_emitter.rb +96 -0
  74. data/lib/ucode/audit/emitter/paths.rb +312 -0
  75. data/lib/ucode/audit/emitter/plane_emitter.rb +29 -0
  76. data/lib/ucode/audit/emitter/script_emitter.rb +29 -0
  77. data/lib/ucode/audit/emitter.rb +29 -0
  78. data/lib/ucode/audit/extractors/aggregations.rb +31 -2
  79. data/lib/ucode/audit/face_auditor.rb +86 -0
  80. data/lib/ucode/audit/formatters/audit_diff_text.rb +112 -0
  81. data/lib/ucode/audit/formatters/audit_text.rb +411 -0
  82. data/lib/ucode/audit/formatters/color.rb +48 -0
  83. data/lib/ucode/audit/formatters/library_summary_text.rb +98 -0
  84. data/lib/ucode/audit/formatters/text_formatter.rb +83 -0
  85. data/lib/ucode/audit/formatters.rb +23 -0
  86. data/lib/ucode/audit/library_aggregator.rb +86 -0
  87. data/lib/ucode/audit/library_auditor.rb +105 -0
  88. data/lib/ucode/audit/release/emitter.rb +152 -0
  89. data/lib/ucode/audit/release/face_card.rb +93 -0
  90. data/lib/ucode/audit/release/formula_audits.rb +50 -0
  91. data/lib/ucode/audit/release/library_index_builder.rb +78 -0
  92. data/lib/ucode/audit/release/manifest_builder.rb +127 -0
  93. data/lib/ucode/audit/release.rb +42 -0
  94. data/lib/ucode/audit/ucd_only_reference.rb +81 -0
  95. data/lib/ucode/audit/universal_set_reference.rb +136 -0
  96. data/lib/ucode/audit.rb +31 -0
  97. data/lib/ucode/cli.rb +339 -33
  98. data/lib/ucode/commands/audit/browser_command.rb +82 -0
  99. data/lib/ucode/commands/audit/collection_command.rb +103 -0
  100. data/lib/ucode/commands/audit/compare_command.rb +188 -0
  101. data/lib/ucode/commands/audit/font_command.rb +140 -0
  102. data/lib/ucode/commands/audit/library_command.rb +87 -0
  103. data/lib/ucode/commands/audit/reference_builder.rb +64 -0
  104. data/lib/ucode/commands/audit.rb +20 -0
  105. data/lib/ucode/commands/block_feed.rb +73 -0
  106. data/lib/ucode/commands/canonical_build.rb +138 -0
  107. data/lib/ucode/commands/fetch.rb +37 -1
  108. data/lib/ucode/commands/release.rb +115 -0
  109. data/lib/ucode/commands/universal_set.rb +211 -0
  110. data/lib/ucode/commands.rb +5 -0
  111. data/lib/ucode/coordinator/indices.rb +11 -0
  112. data/lib/ucode/coordinator.rb +138 -5
  113. data/lib/ucode/error.rb +30 -2
  114. data/lib/ucode/fetch/font_fetcher/result.rb +39 -0
  115. data/lib/ucode/fetch/font_fetcher.rb +16 -0
  116. data/lib/ucode/fetch/specialist_font_fetcher.rb +280 -0
  117. data/lib/ucode/fetch.rb +7 -3
  118. data/lib/ucode/glyphs/real_fonts/cmap_cache.rb +74 -0
  119. data/lib/ucode/glyphs/real_fonts.rb +1 -0
  120. data/lib/ucode/glyphs/resolver.rb +62 -0
  121. data/lib/ucode/glyphs/source.rb +48 -0
  122. data/lib/ucode/glyphs/source_builder.rb +61 -0
  123. data/lib/ucode/glyphs/source_config/coverage_assertion.rb +79 -0
  124. data/lib/ucode/glyphs/source_config/gap_report.rb +54 -0
  125. data/lib/ucode/glyphs/source_config.rb +104 -0
  126. data/lib/ucode/glyphs/sources/pillar1_embedded_tounicode.rb +63 -0
  127. data/lib/ucode/glyphs/sources/pillar3_last_resort.rb +51 -0
  128. data/lib/ucode/glyphs/sources/tier1_real_font.rb +104 -0
  129. data/lib/ucode/glyphs/sources.rb +20 -0
  130. data/lib/ucode/glyphs/universal_set/builder.rb +161 -0
  131. data/lib/ucode/glyphs/universal_set/coverage_report.rb +139 -0
  132. data/lib/ucode/glyphs/universal_set/idempotency.rb +86 -0
  133. data/lib/ucode/glyphs/universal_set/manifest_accumulator.rb +195 -0
  134. data/lib/ucode/glyphs/universal_set/manifest_writer.rb +61 -0
  135. data/lib/ucode/glyphs/universal_set/pre_build_check.rb +197 -0
  136. data/lib/ucode/glyphs/universal_set/validator.rb +204 -0
  137. data/lib/ucode/glyphs/universal_set.rb +45 -0
  138. data/lib/ucode/glyphs.rb +6 -0
  139. data/lib/ucode/models/audit/baseline.rb +6 -0
  140. data/lib/ucode/models/audit/block_summary.rb +7 -0
  141. data/lib/ucode/models/audit/codepoint_provenance.rb +39 -0
  142. data/lib/ucode/models/audit/release_face.rb +42 -0
  143. data/lib/ucode/models/audit/release_formula.rb +33 -0
  144. data/lib/ucode/models/audit/release_manifest.rb +43 -0
  145. data/lib/ucode/models/audit/release_universal_set.rb +37 -0
  146. data/lib/ucode/models/audit.rb +9 -0
  147. data/lib/ucode/models/block.rb +2 -0
  148. data/lib/ucode/models/build_report.rb +109 -0
  149. data/lib/ucode/models/codepoint/glyph.rb +42 -0
  150. data/lib/ucode/models/codepoint.rb +3 -0
  151. data/lib/ucode/models/glyph_source.rb +86 -0
  152. data/lib/ucode/models/glyph_source_map.rb +138 -0
  153. data/lib/ucode/models/specialist_font.rb +70 -0
  154. data/lib/ucode/models/specialist_font_manifest.rb +48 -0
  155. data/lib/ucode/models/unihan_entry.rb +81 -9
  156. data/lib/ucode/models/unihan_field.rb +21 -0
  157. data/lib/ucode/models/universal_set_entry.rb +47 -0
  158. data/lib/ucode/models/universal_set_manifest.rb +78 -0
  159. data/lib/ucode/models/validation_report.rb +99 -0
  160. data/lib/ucode/models.rb +9 -0
  161. data/lib/ucode/parsers/named_sequences.rb +5 -5
  162. data/lib/ucode/parsers/unihan.rb +50 -19
  163. data/lib/ucode/repo/aggregate_writer.rb +34 -2
  164. data/lib/ucode/repo/block_feed_emitter.rb +153 -0
  165. data/lib/ucode/repo/build_report_accumulator.rb +138 -0
  166. data/lib/ucode/repo/build_report_writer.rb +46 -0
  167. data/lib/ucode/repo/build_validator.rb +229 -0
  168. data/lib/ucode/repo/codepoint_writer.rb +50 -1
  169. data/lib/ucode/repo/paths.rb +8 -0
  170. data/lib/ucode/repo.rb +4 -0
  171. data/lib/ucode/version.rb +1 -1
  172. data/schema/block-feed.output.schema.yml +134 -0
  173. metadata +143 -2
  174. data/ucode.gemspec +0 -56
@@ -0,0 +1,312 @@
1
+ # 29 — Universal glyph set: full Unicode 17 curation (Part 1)
2
+
3
+ ## Goal
4
+
5
+ Fill `config/unicode17_universal_glyph_set.yml` with concrete Tier 1
6
+ font recommendations for every Unicode 17 block (~340 entries). This
7
+ is **Part 1** of the user's three-part directive — produce the FULL
8
+ base of glyph coverage so font audits (TODO 25, shipped) and
9
+ missing-glyph reports (TODO 26) compare against a real reference.
10
+
11
+ Today the config has 315 of 340 blocks with `sources: []`. The
12
+ universal-set build (TODO 24) would produce pillar-3 tofu for every
13
+ block without a Tier 1 source — i.e. most of Unicode 17. This TODO
14
+ closes that gap by encoding the user's August 2025 font investigation
15
+ into the config.
16
+
17
+ ## Why a separate TODO
18
+
19
+ TODO 23 built the **mechanism** (loader, validator, models). TODO 24
20
+ built the **build pipeline**. TODO 21 referenced Tier 1 fonts in its
21
+ example. None of them actually **curated** the per-block font choices
22
+ — they all deferred to "filled in from baseline audit (TODO 05)."
23
+
24
+ This TODO is that filling-in pass. The analysis comes from the
25
+ user's investigation (Lentariso, Kedebideri, NotoSerifTaiYo,
26
+ UniHieroglyphica, Egyptian Text, FSung-*), cross-referenced with the
27
+ Noto Fonts dashboard and the BBAW egyptological font list.
28
+
29
+ ## Architectural improvements
30
+
31
+ ### `default_sources` at the YAML top level (DRY)
32
+
33
+ Most blocks (~250 of ~340) use Noto Sans as their Tier 1 source. The
34
+ current YAML forces each block to repeat:
35
+
36
+ ```yaml
37
+ Basic_Latin:
38
+ sources:
39
+ - kind: fontist
40
+ label: noto-sans
41
+ priority: 1
42
+ license: OFL
43
+ Latin-1_Supplement:
44
+ sources:
45
+ - kind: fontist
46
+ label: noto-sans
47
+ priority: 1
48
+ license: OFL
49
+ # ... 248 more copies of the same entry
50
+ ```
51
+
52
+ ~1250 lines of noise for a single rule. Add a top-level
53
+ `default_sources` that applies when a block's `sources:` is empty or
54
+ absent:
55
+
56
+ ```yaml
57
+ default_sources:
58
+ - kind: fontist
59
+ label: noto-sans
60
+ priority: 1
61
+ license: OFL
62
+ provenance: "Universal fallback for Latin-family scripts"
63
+ ```
64
+
65
+ The curated specialists (Sidetic, Beria Erfe, Tai Yo, etc.) stand out
66
+ as the entries that actually carry policy. Reviewers see "what's
67
+ different" instead of wading through copy-paste.
68
+
69
+ ### `sources_for(block_id)` on the map (single source of truth)
70
+
71
+ The map answers `sources_for(block_id)`. Internally it falls through:
72
+ block-specific sources → `default_sources` → empty. The loader
73
+ returns the map unmodified; the resolver asks the map.
74
+
75
+ ```ruby
76
+ class Ucode::Models::GlyphSourceMap
77
+ def sources_for(block_id)
78
+ entry = map[block_id]
79
+ return entry.sources if entry && entry.sources.any?
80
+ return default_sources if default_sources.any?
81
+
82
+ []
83
+ end
84
+ end
85
+ ```
86
+
87
+ This keeps the map as the single source of truth — no separate
88
+ "default-application" pass that mutates state.
89
+
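The fall-through described above can be exercised end to end with a small, self-contained sketch. It is not the shipped `GlyphSourceMap`: `SketchSource`, `SketchSourceMap`, the `blocks:` wrapper key, and the `kind: file` value are hypothetical names chosen for illustration, and plain `Struct`s stand in for the lutaml-model classes.

```ruby
require "yaml"

# Hypothetical stand-in for Ucode::Models::GlyphSource.
SketchSource = Struct.new(:kind, :label, :priority, :license, keyword_init: true)

# Hypothetical stand-in for Ucode::Models::GlyphSourceMap.
class SketchSourceMap
  attr_reader :default_sources

  def initialize(blocks:, default_sources:)
    @blocks = blocks
    @default_sources = default_sources
  end

  def self.from_hash(hash)
    to_sources = lambda do |list|
      (list || []).map { |h| SketchSource.new(**h.transform_keys(&:to_sym)) }
    end
    blocks = (hash["blocks"] || {}).transform_values do |entry|
      to_sources.call((entry || {})["sources"])
    end
    new(blocks: blocks, default_sources: to_sources.call(hash["default_sources"]))
  end

  # Block-specific sources first, then default_sources (which may be empty).
  def sources_for(block_id)
    specific = @blocks.fetch(block_id, [])
    return specific unless specific.empty?

    default_sources
  end
end

# The `blocks:` wrapper key is an assumption made for this sketch only.
SKETCH_CONFIG = YAML.safe_load(<<~CONFIG)
  default_sources:
    - kind: fontist
      label: noto-sans
      priority: 1
      license: OFL
  blocks:
    Basic_Latin:
      sources: []
    Sidetic:
      sources:
        - kind: file
          label: Lentariso
          priority: 1
          license: OFL
CONFIG

SKETCH_MAP = SketchSourceMap.from_hash(SKETCH_CONFIG)
SKETCH_MAP.sources_for("Sidetic").map(&:label)     # block-specific entry is used
SKETCH_MAP.sources_for("Basic_Latin").map(&:label) # empty sources fall back to the default
```

Because the fallback lives in the reader method, nothing is copied into the per-block entries and the loaded data stays identical to what the YAML says.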
90
+ ### Coverage assertion (reviewability)
91
+
92
+ When the loader builds the `GlyphSourceMap`, it does NOT assert
93
+ coverage (load stays cheap). A separate `CoverageAssertion` walker
94
+ iterates every block, opens each Tier 1 font's cmap, and reports
95
+ which assigned codepoints have no Tier 1 source. Output:
96
+
97
+ ```ruby
98
+ report = Ucode::Glyphs::SourceConfig::CoverageAssertion.new(
99
+ source_map: map,
100
+ database: Ucode::Database.open("17.0.0"),
101
+ font_cmaps: Ucode::Glyphs::RealFonts::CmapCache.new(fonts_in(map)),
102
+ ).call
103
+
104
+ report.gaps_by_block
105
+ # => { "Combining_Diacritical_Marks_Extended" => [6863, 6864, ...],
106
+ # "Supplemental_Arrows_C" => [129232, 129233] }
107
+ ```
108
+
109
+ This is a **development-time check** — the build still runs, gaps
110
+ fall through to pillar 1-2-3. The report makes curation reviewable:
111
+ "we have 4321 codepoints with no Tier 1 font; here they are by block."
112
+
113
+ Without this assertion, gaps are silent.
114
+
115
+ ### `script_defaults` (out of scope, future improvement)
116
+
117
+ A further DRY step: map Unicode `Script` property values to default
118
+ fonts. Loader resolves block → primary script → font. Saves another
119
+ ~100 lines for the per-script Noto variants (Hebrew, Arabic, Devanagari,
120
+ Bengali, etc.). Deferred — `default_sources` is enough for v0.2.
121
+
122
+ ## Curation matrix
123
+
124
+ ### New Unicode 17 blocks (11 blocks, fully curated)
125
+
126
+ | Block ID | Codepoints | Tier 1 source | Fallback |
127
+ |---|---:|---|---|
128
+ | `Sidetic` | 26 | `data/fonts/Lentariso.otf` (≥1.029) | fontist:noto-sans-sidetic |
129
+ | `Beria_Erfe` | 50 | `data/fonts/Kedebideri-Regular.ttf` (3.001) | pillar-2 |
130
+ | `Tai_Yo` | 54 | `data/fonts/NotoSerifTaiYo.ttf` | fontist:noto-sans-tai-yo |
131
+ | `Tolong_Siki` | 54 | fontist:noto-sans-tolong-siki | pillar-2 |
132
+ | `Sharada_Supplement` | 8 | fontist:noto-sans-sharada | pillar-2 |
133
+ | `CJK_Unified_Ideographs_Extension_J` | 4,298 | `~/Downloads/全宋體/FSung-*.ttf` (priority 1-9) | fontist:noto-sans-cjk-jp |
134
+ | `Symbols_for_Legacy_Computing_Supplement` | 9 | `data/fonts/BabelStonePseudographica.ttf` | pillar-2 (Unicode 17 additions may be missing) |
135
+ | `Supplemental_Arrows_C` | 9 | `data/fonts/Symbola.ttf` | pillar-2 (same caveat) |
136
+ | `Alchemical_Symbols` (4 new + existing) | 4 + 102 | fontist:noto-sans-symbols | `data/fonts/Symbola.ttf` |
137
+ | `Miscellaneous_Symbols_Supplement` | 34 | fontist:noto-sans-symbols-2 | `data/fonts/Symbola.ttf` |
138
+ | `Musical_Symbols` (UC17 additions) | TBD | fontist:noto-music | pillar-2 |
139
+
140
+ ### Egyptian Hieroglyphs family (4 blocks)
141
+
142
+ | Block ID | Range | Codepoints | Tier 1 source |
143
+ |---|---|---:|---|
144
+ | `Egyptian_Hieroglyphs` | U+13000..U+1342F (+28 in UC17) | ~1,072+28 | `data/fonts/UniHieroglyphica.ttf` (v16) |
145
+ | `Egyptian_Hieroglyphs_Format_Controls` | U+13430..U+1345F | 36 | `data/fonts/EgyptianText-Regular.ttf` (microsoft/font-tools) |
146
+ | `Egyptian_Hieroglyphs_Extended-A` | U+13460..U+143FF (+9 in UC17) | ~3,936+9 | `data/fonts/UniHieroglyphica.ttf` (v16) |
147
+ | `Egyptian_Hieroglyphs_Extended-B` | NEW in UC17 | ~600 | `data/fonts/UniHieroglyphica.ttf` (v16) |
148
+
149
+ UniHieroglyphica is the authoritative source for Egyptian Hieroglyph
150
+ outlines (https://aaew.bbaw.de/egyptological-unicode-fonts). Egyptian
151
+ Text (microsoft/font-tools, OFL) is the only source for the Format
152
+ Controls block.
153
+
154
+ ### Existing blocks with Unicode 17 additions (selected)
155
+
156
+ | Block | UC17 additions | Tier 1 source |
157
+ |---|---:|---|
158
+ | `Tangut` | 8 | fontist:noto-sans-tangut |
159
+ | `Tangut_Supplement` | 22 | fontist:noto-sans-tangut |
160
+ | `Tangut_Components` | 115 | fontist:noto-sans-tangut |
161
+ | `Adlam` | 29 | fontist:noto-sans-adlam |
162
+ | `Arabic_Extended-B` | (UC17) | fontist:noto-sans-arabic |
163
+ | `Arabic_Extended-C` (new) | TBD | fontist:noto-sans-arabic |
164
+ | `Telugu` | 1 | fontist:noto-sans-telugu |
165
+ | `Kannada` | 1 | fontist:noto-sans-kannada |
166
+ | `Combining_Diacritical_Marks_Extended` | +27 | pillar-2 (font support spotty) |
167
+ | `CJK_Unified_Ideographs_Extension_C` | additions | `~/Downloads/全宋體/FSung-C.ttf` |
168
+ | `CJK_Unified_Ideographs_Extension_E` | additions | `~/Downloads/全宋體/FSung-E.ttf` |
169
+ | `Chess_Symbols` | +4 | fontist:noto-sans-symbols-2 |
170
+ | `Transport_and_Map_Symbols` | +1 | fontist:noto-sans-symbols-2 |
171
+ | `Symbols_and_Pictographs_Extended-A` | +6 | fontist:noto-sans-symbols-2 |
172
+
173
+ ### Everything else (~250 blocks)
174
+
175
+ `default_sources` (noto-sans) covers:
176
+
177
+ - All Latin, Greek, Cyrillic, Armenian, Hebrew base + supplement blocks.
178
+ - All symbol blocks where Noto Sans covers (Mathematical Operators,
179
+ Box Drawing, Block Elements, Geometric Shapes, etc.).
180
+ - General punctuation, control pictures, etc.
181
+
182
+ When `default_sources` is exhausted (a codepoint is outside Noto
183
+ Sans's coverage), the resolver falls through to Pillar 1 → 2 → 3.
184
+
185
+ ## Curation rules (carry from TODO 23, refined)
186
+
187
+ 1. **One Tier 1 font per script family.** Specialist fonts only for
188
+ blocks the default can't cover.
189
+ 2. **Proprietary fonts never ship.** Sources with `license:
190
+ PROPRIETARY` are loaded for glyph extraction only; the extracted
191
+ SVG (open data) ships, the font file does not.
192
+ 3. **Provenance mandatory.** Every specialist entry cites where the
193
+ font comes from and why.
194
+ 4. **`priority` lower wins.** The resolver tries sources in priority
195
+ order; first hit wins.
196
+ 5. **Block IDs verbatim.** Use the exact Unicode block name with
197
+ underscores (e.g. `Greek_and_Coptic`, never slugified).
198
+
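Rule 4 can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped resolver: `PrioritizedSource` and `first_covering_source` are hypothetical names, and the `codepoints` array stands in for a parsed cmap.

```ruby
# Hypothetical source record; `codepoints` stands in for a font's cmap.
PrioritizedSource = Struct.new(:label, :priority, :codepoints, keyword_init: true)

# Try sources in ascending priority and return the first one covering the codepoint.
def first_covering_source(sources, codepoint)
  sources.sort_by(&:priority).find { |s| s.codepoints.include?(codepoint) }
end

DEMO_SOURCES = [
  PrioritizedSource.new(label: "fallback", priority: 2, codepoints: [0x41, 0x42]),
  PrioritizedSource.new(label: "specialist", priority: 1, codepoints: [0x41]),
].freeze

first_covering_source(DEMO_SOURCES, 0x41)&.label # "specialist": both cover it, lower priority wins
first_covering_source(DEMO_SOURCES, 0x42)&.label # "fallback": the specialist lacks this codepoint
first_covering_source(DEMO_SOURCES, 0x43)        # nil: the caller falls through to the pillars
```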
199
+ ## Files to change / create
200
+
201
+ - `config/unicode17_universal_glyph_set.yml` — full content (~150
202
+ lines thanks to `default_sources`).
203
+ - `lib/ucode/models/glyph_source_map.rb` — add `default_sources`
204
+ attribute (collection of `GlyphSource`); add `sources_for(block_id)`.
205
+ - `lib/ucode/models/glyph_source.rb` — no change (already supports
206
+ the shape; the YAML loader just populates `default_sources` from
207
+ the top-level key via the existing mapping).
208
+ - `lib/ucode/glyphs/source_config.rb` — no change to the loader
209
+ itself; it already returns the map. (Existing
210
+ `GlyphSourceMap#fonts_for(block_id)` callers migrate to
211
+ `sources_for(block_id)`; the old method is removed.)
212
+ - `lib/ucode/glyphs/source_config/coverage_assertion.rb` — new.
213
+ - `lib/ucode/glyphs/source_config/gap_report.rb` — new typed result.
214
+ - `lib/ucode/glyphs/source_config.rb` — re-open to add the
215
+ `CoverageAssertion` autoload (or place under
216
+ `lib/ucode/glyphs/source_config/` and add a namespace hub).
217
+ - Specs:
218
+ - Update `spec/ucode/glyphs/source_config_spec.rb` for
219
+ `default_sources` + `sources_for`.
220
+ - New `spec/ucode/glyphs/source_config/coverage_assertion_spec.rb`.
221
+ - Smoke spec: full config loads cleanly, every block resolves to
222
+ at least one source (count of `gaps == 0` for curated blocks).
223
+
224
+ ## Loader shape (target)
225
+
226
+ ```ruby
227
+ class Ucode::Glyphs::SourceConfig
228
+ DEFAULT_PATH = Pathname.new("config/unicode17_universal_glyph_set.yml")
229
+
230
+ def self.load(yaml_path = DEFAULT_PATH)
231
+ parsed = YAML.safe_load(yaml_path.read)
232
+ Ucode::Models::GlyphSourceMap.from_hash(parsed)
233
+ end
234
+ end
235
+ ```
236
+
237
+ The map's `from_hash` already handles a top-level `default_sources`
238
+ array via the existing lutaml-model mapping (the only change is
239
+ adding the attribute + the `sources_for` method).
240
+
241
+ ## Coverage assertion shape
242
+
243
+ ```ruby
244
+ class Ucode::Glyphs::SourceConfig::CoverageAssertion
245
+ def initialize(source_map:, database:, font_cmaps:)
246
+ @source_map = source_map
247
+ @database = database
248
+ @font_cmaps = font_cmaps
249
+ end
250
+
251
+ def call
252
+ gaps = Hash.new { |h, k| h[k] = [] }
253
+ @database.each_assigned_codepoint do |cp|
254
+ block_id = @database.lookup_block(cp)
255
+ next unless block_id
256
+
257
+ sources = @source_map.sources_for(block_id)
258
+ next if sources.empty? # uncurated block; not a gap, just unconfigured
259
+
260
+ next if sources.any? { |s| @font_cmaps.covers?(s.label, cp) }
261
+
262
+ gaps[block_id] << cp
263
+ end
264
+ GapReport.new(gaps_by_block: {}.merge(gaps).freeze) # copy without the default proc before freezing
265
+ end
266
+ end
267
+ ```
268
+
269
+ The assertion never raises — it returns a typed `GapReport`. Callers
270
+ decide whether to act on gaps (CI: warn; local: print; production
271
+ build: continue and let pillar 1-2-3 catch up).
272
+
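The walk above can be tried without a real database or font files. The sketch below uses in-memory doubles; `FakeDatabase`, `FakeCmaps`, `FakeSourceMap`, `SketchGapReport`, and the demo block names are all hypothetical, and only the three methods the assertion calls are modelled.

```ruby
SketchGapReport = Struct.new(:gaps_by_block, keyword_init: true)
LabeledSource = Struct.new(:label)

# Double for the database: { block_id => [codepoints] }.
class FakeDatabase
  def initialize(blocks)
    @blocks = blocks
  end

  def each_assigned_codepoint(&block)
    @blocks.values.flatten.each(&block)
  end

  def lookup_block(codepoint)
    found = @blocks.find { |_id, cps| cps.include?(codepoint) }
    found&.first
  end
end

# Double for the cmap cache: { font_label => [codepoints] }.
class FakeCmaps
  def initialize(coverage)
    @coverage = coverage
  end

  def covers?(label, codepoint)
    @coverage.fetch(label, []).include?(codepoint)
  end
end

# Double for the source map: { block_id => [font_labels] }.
class FakeSourceMap
  def initialize(labels_by_block)
    @labels_by_block = labels_by_block
  end

  def sources_for(block_id)
    @labels_by_block.fetch(block_id, []).map { |label| LabeledSource.new(label) }
  end
end

# Same walk as the class above, written as a function over the doubles.
def coverage_gaps(source_map:, database:, font_cmaps:)
  gaps = {}
  database.each_assigned_codepoint do |cp|
    block_id = database.lookup_block(cp)
    next unless block_id

    sources = source_map.sources_for(block_id)
    next if sources.empty? # unconfigured block, not a gap
    next if sources.any? { |s| font_cmaps.covers?(s.label, cp) }

    (gaps[block_id] ||= []) << cp
  end
  SketchGapReport.new(gaps_by_block: gaps.freeze)
end

DEMO_REPORT = coverage_gaps(
  source_map: FakeSourceMap.new("Demo_Curated" => ["demo-font"]),
  database: FakeDatabase.new("Demo_Curated" => [0x10, 0x11], "Demo_Uncurated" => [0x20]),
  font_cmaps: FakeCmaps.new("demo-font" => [0x10]),
)
DEMO_REPORT.gaps_by_block # only 0x11 in Demo_Curated is reported
```

The sketch builds the result with a plain hash rather than a default-proc hash, so reading a block with no gaps from the frozen report returns `nil` instead of attempting a write.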
273
+ ## Acceptance
274
+
275
+ - `config/unicode17_universal_glyph_set.yml` exists with:
276
+ - `default_sources` populated (noto-sans + fallback chain).
277
+ - All 11 new Unicode 17 blocks curated with specific Tier 1 sources.
278
+ - All 4 Egyptian Hieroglyphs blocks curated.
279
+ - `~/Downloads/全宋體/FSung-*` paths documented for CJK Ext J
280
+ (user-local fallback; warning emitted if absent).
281
+ - `GlyphSourceMap#sources_for(block_id)` returns block-specific
282
+ sources when present, otherwise `default_sources`, otherwise `[]`.
283
+ - `CoverageAssertion` produces a `GapReport` whose `gaps_by_block`
284
+ matches expectations: empty for curated blocks, populated for
285
+ known-gap blocks (Combining Diacritical Marks Extended, Symbols
286
+ for Legacy Computing Supplement UC17 additions, Supplemental Arrows-C
287
+ UC17 additions).
288
+ - Smoke spec on the full config: every block resolves to at least
289
+ one source (no `[]` results for any assigned block).
290
+ - Rubocop clean.
291
+
292
+ ## Out of scope
293
+
294
+ - Font acquisition (downloading Lentariso, Kedebideri, etc.) — TODO 30.
295
+ - Production build execution — TODO 31.
296
+ - Pillar 2 correlator hardening for residual gaps — TODO 31 (validate
297
+ during production build).
298
+ - CJK Ext J verification (FSung-* actually covers all 4,298
299
+ codepoints) — TODO 31 (validate during production build).
300
+ - `script_defaults` refinement — future TODO.
301
+
302
+ ## References
303
+
304
+ - Source map mechanism: `TODO.new/23-universal-glyph-set-source-map.md`
305
+ - Build pipeline: `TODO.new/24-universal-glyph-set-build.md`
306
+ - Font audit against universal set: `TODO.new/25-font-audit-against-universal-set.md`
307
+ - Font acquisition: `TODO.new/30-tier1-font-acquisition.md`
308
+ - Production build: `TODO.new/31-universal-set-production-build.md`
309
+ - Architecture: `docs/architecture.md` §"The 4-tier glyph sourcing strategy"
310
+ - BBAW font list: https://aaew.bbaw.de/egyptological-unicode-fonts
311
+ - Existing source config: `config/unicode17_universal_glyph_set.yml`
312
+ - Existing loader: `lib/ucode/glyphs/source_config.rb`
@@ -0,0 +1,241 @@
1
+ # 30 — Tier 1 font acquisition: specialist fonts
2
+
3
+ ## Goal
4
+
5
+ A fetcher subsystem that downloads the specialist Tier 1 fonts not
6
+ discoverable via fontist's index. These fonts have canonical sources
7
+ (GitHub releases, SIL downloads, personal academic sites) that
8
+ fontist's formulas don't cover.
9
+
10
+ This unblocks TODO 29's curation: the YAML references fonts like
11
+ `data/fonts/Lentariso.otf`, but those paths must be populated for
12
+ the universal-set build (TODO 24) to actually use them.
13
+
14
+ ## Why a separate TODO
15
+
16
+ fontist is the project's font discovery layer for redistributable
17
+ formulas. It does not (and should not) carry formulas for:
18
+
19
+ - **Lentariso** (github.com/Bry10022/Lentariso) — SFD source, GitHub
20
+ releases. Not in fontist/formulas.
21
+ - **Kedebideri** (software.sil.org/kedebideri) — UFO3 source with
22
+ TECkit mapping. SIL's downloads page is the canonical source.
23
+ - **NotoSerifTaiYo** (translationcommons.org) — pre-release Noto
24
+ variant, not yet on Google Fonts.
25
+ - **UniHieroglyphica** (suignard.com) — personal academic site, OFL.
26
+ - **Egyptian Text** (microsoft/font-tools) — bundled in a font-tools
27
+ release, not a standalone formula.
28
+ - **BabelStone Pseudographica** — personal academic site.
29
+ - **Symbola** — personal academic site.
30
+
31
+ These need their own fetcher. The fetcher is **not** a fontist
32
+ replacement — it's a complementary path for fonts that can't (yet)
33
+ go through fontist's formula process. The output is identical from
34
+ the consumer's perspective: a TTF/OTF on disk under `data/fonts/`.
35
+
36
+ ## Specialist fonts manifest
37
+
38
+ `config/specialist_fonts.yml`:
39
+
40
+ ```yaml
41
+ # Specialist Tier 1 fonts not in fontist's formula index.
42
+ # All entries must be OFL unless explicitly whitelisted.
43
+ fonts:
44
+ - label: Lentariso
45
+ version: "1.033"
46
+ license: OFL
47
+ url: "https://github.com/Bry10022/Lentariso/releases/download/1.033/Lentariso.otf"
48
+ sha256: "<filled in on first successful fetch>"
49
+ path: "data/fonts/Lentariso.otf"
50
+ extract: false
51
+ provenance: "github.com/Bry10022/Lentariso — covers Imperial Aramaic, Phoenician, Sidetic"
52
+
53
+ - label: Kedebideri
54
+ version: "3.001"
55
+ license: OFL
56
+ url: "https://software.sil.org/downloads/r/kedebideri/Kedebideri-3.001.zip"
57
+ sha256: "..."
58
+ path: "data/fonts/Kedebideri-Regular.ttf"
59
+ extract: true # zip: extract just the TTF
60
+ extract_member: "Kedebideri-Regular.ttf"
61
+ provenance: "SIL, first Unicode font for Beria Erfe"
62
+
63
+ - label: NotoSerifTaiYo
64
+ version: "draft-2025-09"
65
+ license: OFL
66
+ url: "https://translationcommons.org/wp-content/uploads/2025/09/NotoSerifTaiYo.ttf"
67
+ sha256: "..."
68
+ path: "data/fonts/NotoSerifTaiYo.ttf"
69
+ extract: false
70
+ provenance: "translationcommons.org, proven via correlate-v4"
71
+
72
+ - label: UniHieroglyphica
73
+ version: "16.0"
74
+ license: OFL
75
+ url: "https://www.suignard.com/UniHieroglyphica/UniHieroglyphica-16.0.zip"
76
+ sha256: "..."
77
+ path: "data/fonts/UniHieroglyphica.ttf"
78
+ extract: true
79
+ extract_member: "UniHieroglyphica.ttf"
80
+ provenance: "suignard.com, authoritative for Egyptian Hieroglyphs"
81
+
82
+ - label: EgyptianText
83
+ version: "1.0"
84
+ license: OFL
85
+ url: "https://github.com/microsoft/font-tools/releases/download/v1.0/EgyptianText-Regular.ttf"
86
+ sha256: "..."
87
+ path: "data/fonts/EgyptianText-Regular.ttf"
88
+ extract: false
89
+ provenance: "microsoft/font-tools — Format Controls block"
90
+
91
+ - label: BabelStonePseudographica
92
+ version: "2024-09-10"
93
+ license: OFL
94
+ url: "https://www.babelstone.co.uk/Fonts/Download/BabelStonePseudographica.zip"
95
+ sha256: "..."
96
+ path: "data/fonts/BabelStonePseudographica.ttf"
97
+ extract: true
98
+ extract_member: "BabelStonePseudographica.ttf"
99
+ provenance: "BabelStone, partial Unicode 17 coverage"
100
+
101
+ - label: Symbola
102
+ version: "13.0"
103
+ license: OFL
104
+ url: "https://dn-works.com/wp-content/uploads/2020/ufas/Symbola.zip"
105
+ sha256: "..."
106
+ path: "data/fonts/Symbola.ttf"
107
+ extract: true
108
+ extract_member: "Symbola.ttf"
109
+ provenance: "dn-works.com, broad Unicode symbol coverage"
110
+
111
+ - label: FSung
112
+ version: "2024"
113
+ license: OFL # Taiwan MOE 全宋體, user-local
114
+ url: null # local-only; user must place under ~/Downloads/全宋體/
115
+ path: "~/Downloads/全宋體/FSung-*.ttf" # glob expanded at load time
116
+ extract: false
117
+ provenance: "Taiwan MOE 全宋體, user-supplied"
118
+ ```
119
+
120
+ URLs are illustrative; as part of this TODO, each one is checked to
121
+ resolve (an HTTP HEAD request via curl) before merge. SHA256 hashes are
122
+ filled in on the first successful download (computed locally, committed as a checkpoint).
123
+
124
+ ## Architectural notes
125
+
126
+ ### Single manifest, not ad-hoc downloads
127
+
128
+ Every specialist font lives in one YAML. Adding a new font = one
129
+ entry in the manifest; no Ruby changes. The fetcher iterates the
130
+ manifest mechanically.
131
+
132
+ ### Typed result, not exceptions
133
+
134
+ Each font produces a `Result` value object (`:downloaded`, `:skipped`,
135
+ `:failed`, `:local`). The fetcher never raises for a single font
136
+ failure; the aggregate result lists successes and failures separately.
137
+ This lets CI report which fonts broke without abandoning the run.
138
+
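A minimal sketch of the typed-result idea, assuming the fetcher rescues at the per-font boundary so that one failure becomes a `:failed` result instead of aborting the run. `SketchResult`, `FETCH_STATUSES`, and `fetch_each` are hypothetical names, and the block passed to `fetch_each` stands in for the real download step.

```ruby
FETCH_STATUSES = %i[downloaded skipped failed local].freeze

# Hypothetical value object for one font's outcome.
SketchResult = Struct.new(:label, :status, :path, :error, keyword_init: true) do
  def ok?
    status != :failed
  end
end

# Yield each label to the caller's fetch step; turn any StandardError into a
# :failed result so the remaining fonts are still attempted.
def fetch_each(labels)
  labels.map do |label|
    begin
      status, path = yield(label)
      raise ArgumentError, "unknown status #{status.inspect}" unless FETCH_STATUSES.include?(status)

      SketchResult.new(label: label, status: status, path: path)
    rescue StandardError => e
      SketchResult.new(label: label, status: :failed, error: e.message)
    end
  end
end

DEMO_RESULTS = fetch_each(%w[Lentariso Kedebideri FSung]) do |label|
  case label
  when "Kedebideri" then raise "checksum mismatch" # simulated failure
  when "FSung" then [:local, "~/Downloads/全宋體/FSung-*.ttf"]
  else [:downloaded, "data/fonts/#{label}.otf"]
  end
end

succeeded, failed = DEMO_RESULTS.partition(&:ok?)
```

Partitioning on `ok?` gives the separate success and failure lists that a CI summary needs.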
139
+ ### License hard-guard
140
+
141
+ Any entry with `license != OFL` requires `--allow-proprietary` to
142
+ fetch. This is a hard guard against accidentally pulling non-OFL
143
+ fonts into the redistributable `data/fonts/` directory. FSung is OFL
144
+ but `url: null` (local-only) — different code path.
145
+
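The guard itself is small. A sketch, assuming the manifest stores the license as the literal string `OFL` and that the CLI flag arrives as a boolean; `SketchLicenseError` and `guard_license!` are hypothetical names standing in for whatever the implementation uses.

```ruby
# Hypothetical stand-in for the typed license error.
class SketchLicenseError < StandardError; end

# Allow OFL entries unconditionally; anything else needs the explicit opt-in.
def guard_license!(label:, license:, allow_proprietary: false)
  return true if license == "OFL" || allow_proprietary

  raise SketchLicenseError, "#{label}: license #{license} requires --allow-proprietary"
end

guard_license!(label: "Lentariso", license: "OFL")
guard_license!(label: "DemoFont", license: "PROPRIETARY", allow_proprietary: true)
```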
146
+ ### Cmap pre-warming (optional, future)
147
+
148
+ After download, the fetcher can pre-warm the cmap cache by loading
149
+ each font once and recording its codepoint set. Saves the
150
+ universal-set build (TODO 24) a re-parse. Out of scope for v0.2.
151
+
152
+ ## Files to create
153
+
154
+ - `lib/ucode/fetchers.rb` — autoload hub for the new namespace (or
155
+ extend existing if there's already a fetchers module).
156
+ - `lib/ucode/fetchers/font_fetcher.rb` — abstract base.
157
+ - `lib/ucode/fetchers/font_fetcher/result.rb` — typed result.
158
+ - `lib/ucode/fetchers/specialist_font_fetcher.rb` — concrete, reads
159
+ the manifest, fetches each font.
160
+ - `lib/ucode/models/specialist_font.rb` — one manifest entry.
161
+ - `lib/ucode/models/specialist_font_manifest.rb` — full manifest.
162
+ - `config/specialist_fonts.yml` — the manifest.
163
+ - `lib/ucode/commands/fetch.rb` — autoload `Fonts` (extend existing
164
+ fetch namespace).
165
+ - `lib/ucode/commands/fetch/fonts.rb` — CLI command class.
166
+ - Specs:
167
+ - `spec/ucode/fetchers/font_fetcher_spec.rb`
168
+ - `spec/ucode/fetchers/specialist_font_fetcher_spec.rb`
169
+ - `spec/ucode/commands/fetch/fonts_spec.rb`
170
+ - `spec/fixtures/specialist_fonts.yml` — small fixture
171
+ - `spec/fixtures/fonts/.gitkeep`
172
+
173
+ ## Fetcher behavior
174
+
175
+ - **Idempotent.** Skip if `path` exists and SHA256 matches.
176
+ - **Hashed.** Compute SHA256 on download; compare to manifest entry.
177
+ Mismatch raises `Ucode::Fetchers::FontChecksumError` (typed, not
178
+ generic `RuntimeError`).
179
+ - **License-checked.** Refuse to download any font with `license !=
180
+ OFL` unless `--allow-proprietary` is passed. Hard guard.
181
+ - **Extracted.** `extract: true` entries unzip to a temp dir; only
182
+ `extract_member` is moved into place.
183
+ - **Local-only paths.** `url: null` entries print "place <font> at
184
+ <path>" and skip the download. The result is `:local`.
185
+
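The idempotency and checksum rules above can be sketched with the standard library's `Digest::SHA256`. The helper names (`fetch_action`, `verify_download!`) and `SketchChecksumError` are hypothetical; the real fetcher may structure this differently.

```ruby
require "digest"
require "tempfile"

# Hypothetical stand-in for the typed checksum error this TODO names.
class SketchChecksumError < StandardError; end

# :skip when the file on disk already matches the manifest hash, else :download.
def fetch_action(path:, expected_sha256:)
  return :download unless File.file?(path)

  Digest::SHA256.file(path).hexdigest == expected_sha256 ? :skip : :download
end

# Verify a freshly downloaded file before it is moved into place.
def verify_download!(path:, expected_sha256:)
  actual = Digest::SHA256.file(path).hexdigest
  return path if actual == expected_sha256

  raise SketchChecksumError, "#{path}: expected #{expected_sha256}, got #{actual}"
end

# Usage with a throwaway file standing in for a font.
Tempfile.create("demo-font") do |file|
  file.write("not a real font")
  file.flush
  good = Digest::SHA256.hexdigest("not a real font")
  fetch_action(path: file.path, expected_sha256: good)     # matches on disk: skip
  fetch_action(path: file.path, expected_sha256: "0" * 64) # stale or corrupt: download again
end
```

Checking the hash before any network call is what makes a re-run cheap: an unchanged file costs one local read and no download.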
186
+ ## CLI
187
+
188
+ ```bash
189
+ bin/ucode fetch fonts # fetch all listed fonts
190
+ bin/ucode fetch fonts --label Lentariso # fetch just one
191
+ bin/ucode fetch fonts --dry-run # show what would be fetched
192
+ bin/ucode fetch fonts --allow-proprietary # bypass license guard
193
+ ```
194
+
195
+ Output: per-font status line:
196
+
197
+ ```
198
+ Lentariso downloaded data/fonts/Lentariso.otf (1.2 MB, OFL)
199
+ Kedebideri downloaded data/fonts/Kedebideri-Regular.ttf (450 KB, OFL)
200
+ NotoSerifTaiYo downloaded data/fonts/NotoSerifTaiYo.ttf (180 KB, OFL)
201
+ UniHieroglyphica downloaded data/fonts/UniHieroglyphica.ttf (3.4 MB, OFL)
202
+ EgyptianText downloaded data/fonts/EgyptianText-Regular.ttf (220 KB, OFL)
203
+ FSung local ~/Downloads/全宋體/FSung-*.ttf (user-supplied)
204
+ ```
205
+
206
+ ## Files to change
207
+
208
+ - `lib/ucode/cli.rb` — register `fetch fonts` subcommand.
209
+ - `lib/ucode/commands/fetch.rb` — add `Fonts` autoload.
210
+ - `lib/ucode/exceptions.rb` (or wherever exceptions live) — add
211
+ `FontChecksumError`, `FontLicenseError` if not present.
212
+
213
+ ## Acceptance
214
+
215
+ - `bin/ucode fetch fonts` downloads all 7 specialist fonts into
216
+ `data/fonts/`.
217
+ - Re-running skips already-downloaded (idempotency; SHA256 verified).
218
+ - SHA256 mismatch raises typed `FontChecksumError`.
219
+ - `--allow-proprietary` is required for any font with non-OFL license.
220
+ - Local-only entries (FSung) print a clear "please place at <path>"
221
+ message; no network attempt; result is `:local`.
222
+ - Specs cover: happy path, idempotency, checksum mismatch, license
223
+ refusal, zip extraction, missing extract_member.
224
+ - Rubocop clean.
225
+
226
+ ## Out of scope
227
+
228
+ - Adding these fonts to fontist's formulas (separate upstream effort).
229
+ - The Tier 1 source map curation — TODO 29.
230
+ - The universal-set build that consumes these — TODO 24, TODO 31.
231
+ - CJK FSung auto-download — these are user-local and not redistributable
232
+ via this repo. Documented in the manifest as local-only.
233
+
234
+ ## References
235
+
236
+ - Source map: `TODO.new/29-universal-set-curation-uc17.md`
237
+ - Build pipeline: `TODO.new/24-universal-glyph-set-build.md`
238
+ - Production build: `TODO.new/31-universal-set-production-build.md`
239
+ - Existing fetchers: `lib/ucode/fetchers/{ucd,unihan,charts}.rb` (if present)
240
+ - fontist's FontLocator: `lib/ucode/glyphs/real_fonts/font_locator.rb`
241
+ - BBAW font list: https://aaew.bbaw.de/egyptological-unicode-fonts