ucode 0.1.0 → 0.1.1

Files changed (174)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +72 -0
  3. data/Gemfile.lock +2 -2
  4. data/TODO.full/00-README.md +116 -0
  5. data/TODO.full/01-panglyph-vision.md +112 -0
  6. data/TODO.full/02-panglyph-repo-bootstrap.md +184 -0
  7. data/TODO.full/03-panglyph-font-builder.md +201 -0
  8. data/TODO.full/04-panglyph-publish-pipeline.md +126 -0
  9. data/TODO.full/05-ucode-0-1-1-release.md +139 -0
  10. data/TODO.full/06-fontisan-remove-audit.md +142 -0
  11. data/TODO.full/07-fontisan-remove-ucd.md +125 -0
  12. data/TODO.full/08-archive-private-bin-build.md +143 -0
  13. data/TODO.full/09-archive-public-structure.md +164 -0
  14. data/TODO.full/10-fontist-org-woff-glyphs.md +131 -0
  15. data/TODO.full/11-fontist-org-audit-coverage.md +140 -0
  16. data/TODO.full/12-implementation-order.md +216 -0
  17. data/TODO.full/13-fontisan-font-writer-api.md +189 -0
  18. data/TODO.full/14-fontisan-table-writers.md +66 -0
  19. data/TODO.full/15-panglyph-builder-real.md +82 -0
  20. data/TODO.full/16-archive-public-sync-workflows.md +167 -0
  21. data/TODO.full/17-fontist-org-font-picker.md +73 -0
  22. data/TODO.full/18-comprehensive-spec-coverage.md +64 -0
  23. data/TODO.full/19-ucode-0-1-2-patch.md +32 -0
  24. data/TODO.full/20-fontisan-0-2-23-release.md +52 -0
  25. data/TODO.new/00-README.md +30 -0
  26. data/TODO.new/23-universal-glyph-set-source-map.md +312 -0
  27. data/TODO.new/24-universal-glyph-set-build.md +189 -0
  28. data/TODO.new/25-font-audit-against-universal-set.md +195 -0
  29. data/TODO.new/26-missing-glyph-reporter.md +189 -0
  30. data/TODO.new/27-fontist-org-consumer-integration.md +200 -0
  31. data/TODO.new/28-implementation-order-update.md +187 -0
  32. data/TODO.new/29-universal-set-curation-uc17.md +312 -0
  33. data/TODO.new/30-tier1-font-acquisition.md +241 -0
  34. data/TODO.new/31-universal-set-production-build.md +205 -0
  35. data/TODO.new/32-uc17-coverage-matrix.md +165 -0
  36. data/TODO.new/33-specialist-font-acquisition-refresh.md +138 -0
  37. data/TODO.new/34-pillar2-content-stream-correlator.md +147 -0
  38. data/TODO.new/35-universal-set-production-run.md +160 -0
  39. data/TODO.new/36-per-font-coverage-audit.md +145 -0
  40. data/TODO.new/37-coverage-highlight-reporter.md +125 -0
  41. data/TODO.new/38-fontist-org-glyph-consumer.md +141 -0
  42. data/TODO.new/39-implementation-order-update-32-38.md +258 -0
  43. data/TODO.new/40-archive-private-uses-ucode-audit.md +124 -0
  44. data/TODO.new/41-ucode-unicode-archive-bridge.md +160 -0
  45. data/config/specialist_fonts.yml +102 -0
  46. data/config/unicode17_tier1_fonts.yml +42 -0
  47. data/config/unicode17_universal_glyph_set.yml +293 -0
  48. data/lib/ucode/audit/block_aggregator.rb +57 -29
  49. data/lib/ucode/audit/browser/face_page.rb +128 -0
  50. data/lib/ucode/audit/browser/glyph_panel.rb +124 -0
  51. data/lib/ucode/audit/browser/library_page.rb +74 -0
  52. data/lib/ucode/audit/browser/missing_glyph_page.rb +87 -0
  53. data/lib/ucode/audit/browser/template.rb +47 -0
  54. data/lib/ucode/audit/browser/templates/face.css +200 -0
  55. data/lib/ucode/audit/browser/templates/face.html.erb +41 -0
  56. data/lib/ucode/audit/browser/templates/face.js +298 -0
  57. data/lib/ucode/audit/browser/templates/library.css +119 -0
  58. data/lib/ucode/audit/browser/templates/library.html.erb +42 -0
  59. data/lib/ucode/audit/browser/templates/library.js +99 -0
  60. data/lib/ucode/audit/browser/templates/missing_glyph_page.css +119 -0
  61. data/lib/ucode/audit/browser/templates/missing_glyph_page.html.erb +58 -0
  62. data/lib/ucode/audit/browser/templates/missing_glyph_page.js +2 -0
  63. data/lib/ucode/audit/browser.rb +32 -0
  64. data/lib/ucode/audit/context.rb +27 -1
  65. data/lib/ucode/audit/coverage_reference.rb +103 -0
  66. data/lib/ucode/audit/differ.rb +121 -0
  67. data/lib/ucode/audit/emitter/block_emitter.rb +52 -0
  68. data/lib/ucode/audit/emitter/codepoint_emitter.rb +87 -0
  69. data/lib/ucode/audit/emitter/collection_emitter.rb +80 -0
  70. data/lib/ucode/audit/emitter/face_directory.rb +212 -0
  71. data/lib/ucode/audit/emitter/glyph_emitter.rb +48 -0
  72. data/lib/ucode/audit/emitter/index_emitter.rb +149 -0
  73. data/lib/ucode/audit/emitter/library_emitter.rb +96 -0
  74. data/lib/ucode/audit/emitter/paths.rb +312 -0
  75. data/lib/ucode/audit/emitter/plane_emitter.rb +29 -0
  76. data/lib/ucode/audit/emitter/script_emitter.rb +29 -0
  77. data/lib/ucode/audit/emitter.rb +29 -0
  78. data/lib/ucode/audit/extractors/aggregations.rb +31 -2
  79. data/lib/ucode/audit/face_auditor.rb +86 -0
  80. data/lib/ucode/audit/formatters/audit_diff_text.rb +112 -0
  81. data/lib/ucode/audit/formatters/audit_text.rb +411 -0
  82. data/lib/ucode/audit/formatters/color.rb +48 -0
  83. data/lib/ucode/audit/formatters/library_summary_text.rb +98 -0
  84. data/lib/ucode/audit/formatters/text_formatter.rb +83 -0
  85. data/lib/ucode/audit/formatters.rb +23 -0
  86. data/lib/ucode/audit/library_aggregator.rb +86 -0
  87. data/lib/ucode/audit/library_auditor.rb +105 -0
  88. data/lib/ucode/audit/release/emitter.rb +152 -0
  89. data/lib/ucode/audit/release/face_card.rb +93 -0
  90. data/lib/ucode/audit/release/formula_audits.rb +50 -0
  91. data/lib/ucode/audit/release/library_index_builder.rb +78 -0
  92. data/lib/ucode/audit/release/manifest_builder.rb +127 -0
  93. data/lib/ucode/audit/release.rb +42 -0
  94. data/lib/ucode/audit/ucd_only_reference.rb +81 -0
  95. data/lib/ucode/audit/universal_set_reference.rb +136 -0
  96. data/lib/ucode/audit.rb +31 -0
  97. data/lib/ucode/cli.rb +339 -33
  98. data/lib/ucode/commands/audit/browser_command.rb +82 -0
  99. data/lib/ucode/commands/audit/collection_command.rb +103 -0
  100. data/lib/ucode/commands/audit/compare_command.rb +188 -0
  101. data/lib/ucode/commands/audit/font_command.rb +140 -0
  102. data/lib/ucode/commands/audit/library_command.rb +87 -0
  103. data/lib/ucode/commands/audit/reference_builder.rb +64 -0
  104. data/lib/ucode/commands/audit.rb +20 -0
  105. data/lib/ucode/commands/block_feed.rb +73 -0
  106. data/lib/ucode/commands/canonical_build.rb +138 -0
  107. data/lib/ucode/commands/fetch.rb +37 -1
  108. data/lib/ucode/commands/release.rb +115 -0
  109. data/lib/ucode/commands/universal_set.rb +211 -0
  110. data/lib/ucode/commands.rb +5 -0
  111. data/lib/ucode/coordinator/indices.rb +11 -0
  112. data/lib/ucode/coordinator.rb +138 -5
  113. data/lib/ucode/error.rb +30 -2
  114. data/lib/ucode/fetch/font_fetcher/result.rb +39 -0
  115. data/lib/ucode/fetch/font_fetcher.rb +16 -0
  116. data/lib/ucode/fetch/specialist_font_fetcher.rb +280 -0
  117. data/lib/ucode/fetch.rb +7 -3
  118. data/lib/ucode/glyphs/real_fonts/cmap_cache.rb +74 -0
  119. data/lib/ucode/glyphs/real_fonts.rb +1 -0
  120. data/lib/ucode/glyphs/resolver.rb +62 -0
  121. data/lib/ucode/glyphs/source.rb +48 -0
  122. data/lib/ucode/glyphs/source_builder.rb +61 -0
  123. data/lib/ucode/glyphs/source_config/coverage_assertion.rb +79 -0
  124. data/lib/ucode/glyphs/source_config/gap_report.rb +54 -0
  125. data/lib/ucode/glyphs/source_config.rb +104 -0
  126. data/lib/ucode/glyphs/sources/pillar1_embedded_tounicode.rb +63 -0
  127. data/lib/ucode/glyphs/sources/pillar3_last_resort.rb +51 -0
  128. data/lib/ucode/glyphs/sources/tier1_real_font.rb +104 -0
  129. data/lib/ucode/glyphs/sources.rb +20 -0
  130. data/lib/ucode/glyphs/universal_set/builder.rb +161 -0
  131. data/lib/ucode/glyphs/universal_set/coverage_report.rb +139 -0
  132. data/lib/ucode/glyphs/universal_set/idempotency.rb +86 -0
  133. data/lib/ucode/glyphs/universal_set/manifest_accumulator.rb +195 -0
  134. data/lib/ucode/glyphs/universal_set/manifest_writer.rb +61 -0
  135. data/lib/ucode/glyphs/universal_set/pre_build_check.rb +197 -0
  136. data/lib/ucode/glyphs/universal_set/validator.rb +204 -0
  137. data/lib/ucode/glyphs/universal_set.rb +45 -0
  138. data/lib/ucode/glyphs.rb +6 -0
  139. data/lib/ucode/models/audit/baseline.rb +6 -0
  140. data/lib/ucode/models/audit/block_summary.rb +7 -0
  141. data/lib/ucode/models/audit/codepoint_provenance.rb +39 -0
  142. data/lib/ucode/models/audit/release_face.rb +42 -0
  143. data/lib/ucode/models/audit/release_formula.rb +33 -0
  144. data/lib/ucode/models/audit/release_manifest.rb +43 -0
  145. data/lib/ucode/models/audit/release_universal_set.rb +37 -0
  146. data/lib/ucode/models/audit.rb +9 -0
  147. data/lib/ucode/models/block.rb +2 -0
  148. data/lib/ucode/models/build_report.rb +109 -0
  149. data/lib/ucode/models/codepoint/glyph.rb +42 -0
  150. data/lib/ucode/models/codepoint.rb +3 -0
  151. data/lib/ucode/models/glyph_source.rb +86 -0
  152. data/lib/ucode/models/glyph_source_map.rb +138 -0
  153. data/lib/ucode/models/specialist_font.rb +70 -0
  154. data/lib/ucode/models/specialist_font_manifest.rb +48 -0
  155. data/lib/ucode/models/unihan_entry.rb +81 -9
  156. data/lib/ucode/models/unihan_field.rb +21 -0
  157. data/lib/ucode/models/universal_set_entry.rb +47 -0
  158. data/lib/ucode/models/universal_set_manifest.rb +78 -0
  159. data/lib/ucode/models/validation_report.rb +99 -0
  160. data/lib/ucode/models.rb +9 -0
  161. data/lib/ucode/parsers/named_sequences.rb +5 -5
  162. data/lib/ucode/parsers/unihan.rb +50 -19
  163. data/lib/ucode/repo/aggregate_writer.rb +34 -2
  164. data/lib/ucode/repo/block_feed_emitter.rb +153 -0
  165. data/lib/ucode/repo/build_report_accumulator.rb +138 -0
  166. data/lib/ucode/repo/build_report_writer.rb +46 -0
  167. data/lib/ucode/repo/build_validator.rb +229 -0
  168. data/lib/ucode/repo/codepoint_writer.rb +50 -1
  169. data/lib/ucode/repo/paths.rb +8 -0
  170. data/lib/ucode/repo.rb +4 -0
  171. data/lib/ucode/version.rb +1 -1
  172. data/schema/block-feed.output.schema.yml +134 -0
  173. metadata +143 -2
  174. data/ucode.gemspec +0 -56
@@ -0,0 +1,205 @@
1
+ # 31 — Universal set production build + coverage validation
2
+
3
+ ## Goal
4
+
5
+ Execute the universal-set build (TODO 24) end-to-end against the
6
+ curated source config (TODO 29) with the acquired fonts (TODO 30).
7
+ Validate the output: every assigned Unicode 17 codepoint has a glyph,
8
+ the manifest is complete, provenance is recorded, and per-tier
9
+ coverage matches the curated expectations.
10
+
11
+ This is the actual **production run**. It produces the artifact that
12
+ fontist.org (TODO 27) and the missing-glyph reporter (TODO 26)
13
+ consume.
14
+
15
+ ## Why a separate TODO
16
+
17
+ TODO 24 built the **mechanics**. TODO 29 curated the **policy**. TODO
18
+ 30 fetched the **fonts**. TODO 31 is **execution + validation**.
19
+
20
+ Splitting execution from mechanics lets us:
21
+
22
+ - Catch curation gaps (a font that doesn't actually cover a block).
23
+ - Catch resolver bugs (a Tier 1 font listed but never queried).
24
+ - Catch pillar fallback regressions (pillar-2 should produce
25
+ identical results to correlate-v4, but only if the catalog wiring
26
+ is correct).
27
+ - Produce an auditable coverage report alongside the manifest.
28
+
29
+ ## Pre-build validation
30
+
31
+ Before running the build, assert:
32
+
33
+ 1. **Source config loads cleanly.** `SourceConfig.load(path)` returns
34
+ a `GlyphSourceMap` with no schema errors.
35
+ 2. **All fonts present.** Every `path:` entry in the YAML exists on
36
+ disk (or is fontist-discoverable). Missing fonts = list + abort.
37
+ Don't start a 4-hour build with known-missing inputs.
38
+ 3. **Coverage assertion runs.** TODO 29's `CoverageAssertion` runs;
39
+ gaps are listed but don't abort (expected for some blocks).
40
+
41
+ If pre-build validation fails, abort with a typed
42
+ `Ucode::Glyphs::UniversalSet::PreBuildError` listing each failure.
43
+
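The "list every failure, then abort" behaviour described above can be sketched as follows. This is a minimal illustration, not the gem's implementation: `PreBuildError` here is a standalone stand-in for `Ucode::Glyphs::UniversalSet::PreBuildError`, and `pre_build_check!` with its injectable `exists:` predicate is a hypothetical helper.

```ruby
# Hypothetical stand-in for Ucode::Glyphs::UniversalSet::PreBuildError.
class PreBuildError < StandardError
  attr_reader :failures

  def initialize(failures)
    @failures = failures
    lines = failures.map { |failure| "  - #{failure}" }
    super("pre-build validation failed:\n#{lines.join("\n")}")
  end
end

# font_paths: the `path:` values taken from the source config.
# exists:     presence predicate, injectable so specs need no real fonts.
def pre_build_check!(font_paths, exists: File.method(:exist?))
  missing = font_paths.reject { |path| exists.call(path) }
  failures = missing.map { |path| "missing font: #{path}" }
  # Collect everything first so one run reports all missing inputs.
  raise PreBuildError.new(failures) unless failures.empty?

  true
end
```

Collecting all failures before raising means a single `pre-check` run surfaces every missing font, rather than one per attempt.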
44
+ ## Build execution
45
+
46
+ ```bash
47
+ bin/ucode universal-set build \
48
+ --version 17.0.0 \
49
+ --source-config config/unicode17_universal_glyph_set.yml \
50
+ --output output/universal_glyph_set \
51
+ --parallel 8
52
+ ```
53
+
54
+ Expected runtime: ~3-4 hours for full Unicode 17 (150,000+ codepoints).
55
+ CJK dominates the runtime (~45,000 ideographs via FSung).
56
+
57
+ ## Post-build validation
58
+
59
+ After the build, validate:
60
+
61
+ 1. **Completeness.** Every assigned codepoint has a `glyphs/<U+XXXX>.svg`.
62
+ 2. **Manifest integrity.** `manifest.json` parses, has an entry for
63
+ every assigned codepoint, totals reconcile.
64
+ 3. **Provenance recorded.** Every entry has non-nil `tier` and
65
+ `source` fields.
66
+ 4. **No tofu leaks.** Count pillar-3 entries; investigate any that
67
+ aren't documented as expected gaps (unassigned, PUA,
68
+ noncharacter — Last Resort is correct for these).
69
+ 5. **Idempotency.** Re-running with no source changes produces zero
70
+ file writes.
71
+
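As a rough sketch of the completeness check (item 1), the validator could compare the assigned codepoints against the filenames found in the glyphs directory. The helper names are hypothetical, and the filename convention (`U+` followed by at least four uppercase hex digits) is an assumption read off the `glyphs/<U+XXXX>.svg` pattern above.

```ruby
require "set"

# Assumed filename convention: "U+" plus at least four uppercase hex digits.
def glyph_basename(codepoint)
  format("U+%04X.svg", codepoint)
end

# Returns the assigned codepoints that have no glyph file.
# glyph_filenames would typically come from Dir.children on the glyphs dir.
def missing_glyphs(assigned_codepoints, glyph_filenames)
  present = glyph_filenames.to_set
  assigned_codepoints.reject { |cp| present.include?(glyph_basename(cp)) }
end
```

An empty result from `missing_glyphs` corresponds to the completeness check passing.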
72
+ ## Per-tier coverage report
73
+
74
+ `reports/by_tier.json`:
75
+
76
+ ```json
77
+ {
78
+ "tier-1": 148512,
79
+ "pillar-1": 800,
80
+ "pillar-2": 200,
81
+ "pillar-3": 1500,
82
+ "gaps": 0
83
+ }
84
+ ```
85
+
86
+ Target: tier-1 ≥ 95% of assigned codepoints. Pillar-3 (Last Resort
87
+ tofu) ≤ 1% of assigned codepoints. Last Resort is the correct source
88
+ for unassigned/PUA/noncharacter codepoints only; among assigned
89
+ codepoints the goal is zero pillar-3 entries.
90
+
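The two percentage targets can be checked mechanically against `reports/by_tier.json`. The sketch below assumes the denominator is the sum of the per-tier counts (the `gaps` key excluded); the function names and keyword defaults are illustrative only.

```ruby
# by_tier: the parsed contents of reports/by_tier.json (a Hash).
# Assumption: assigned total == sum of all tier counts, ignoring "gaps".
def tier_shares(by_tier)
  counts = by_tier.reject { |key, _count| key == "gaps" }
  total = counts.values.sum
  counts.transform_values { |count| count.to_f / total }
end

def targets_met?(by_tier, min_tier1: 0.95, max_pillar3: 0.01)
  shares = tier_shares(by_tier)
  shares.fetch("tier-1", 0.0) >= min_tier1 &&
    shares.fetch("pillar-3", 0.0) <= max_pillar3
end
```

With the example counts above, tier-1 is 148,512 of 151,012 (about 98.3%) and pillar-3 is 1,500 of 151,012 (about 0.99%), so both thresholds hold.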
91
+ ## Per-block coverage report
92
+
93
+ `reports/by_block.json`:
94
+
95
+ ```json
96
+ {
97
+ "Sidetic": {
98
+ "assigned": 26, "tier-1": 26, "pillar-1": 0, "pillar-2": 0, "pillar-3": 0
99
+ },
100
+ "Beria_Erfe": {
101
+ "assigned": 50, "tier-1": 50, "pillar-1": 0, "pillar-2": 0, "pillar-3": 0
102
+ },
103
+ "Combining_Diacritical_Marks_Extended": {
104
+ "assigned": 90, "tier-1": 63, "pillar-1": 0, "pillar-2": 27, "pillar-3": 0
105
+ }
106
+ }
107
+ ```
108
+
109
+ Each block's per-tier breakdown makes it obvious where Tier 1 coverage
110
+ is incomplete. In the example, Combining Diacritical Marks Extended
111
+ has 27 codepoints that fell through to pillar-2 — the residual gap
112
+ the curation (TODO 29) flagged.
113
+
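A reviewer could surface the incomplete blocks from `reports/by_block.json` with something like the following sketch. Both helpers are hypothetical; they only assume the row shape shown in the example.

```ruby
TIER_KEYS = %w[tier-1 pillar-1 pillar-2 pillar-3].freeze

# Maps block name => number of assigned codepoints not served by tier-1.
def incomplete_blocks(by_block)
  by_block.each_with_object({}) do |(name, row), out|
    shortfall = row.fetch("assigned") - row.fetch("tier-1")
    out[name] = shortfall if shortfall.positive?
  end
end

# True when the per-tier counts add up to the assigned total.
def reconciles?(row)
  row.values_at(*TIER_KEYS).sum == row.fetch("assigned")
end
```

For the example report this yields only `Combining_Diacritical_Marks_Extended`, with a shortfall of 27.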
114
+ ## Gap investigation
115
+
116
+ `reports/gaps.json` lists every assigned codepoint that ended up at
117
+ pillar-3 (tofu) — these are **bugs**:
118
+
119
+ ```json
120
+ [
121
+ { "codepoint": 119808, "block": "Mathematical_Alphanumeric_Symbols",
122
+ "reason": "tier-1:noto-sans-math did not cover; pillar-2 catalog miss" }
123
+ ]
124
+ ```
125
+
126
+ Each gap entry records the path through the resolver that led to tofu.
127
+ Zero gaps = perfect coverage. Non-zero gaps = actionable curation
128
+ follow-ups (typically: "add font X to block Y's source list").
129
+
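Because `gaps.json` stores codepoints as decimal integers, a small formatter makes follow-up lists easier to read. This is an illustrative helper, not part of the gem.

```ruby
# Renders one gaps.json entry as a human-readable line.
def describe_gap(gap)
  format("U+%04X (%s): %s",
         gap.fetch("codepoint"), gap.fetch("block"), gap.fetch("reason"))
end
```

The example entry's codepoint 119808 is U+1D400, the first codepoint of Mathematical Alphanumeric Symbols.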
130
+ ## CJK Ext J verification
131
+
132
+ Special verification for the largest single block: CJK Unified
133
+ Ideographs Extension J (4,298 codepoints). The build should produce:
134
+
135
+ - `tier-1` count == 4,298 if FSung-* covers all of them.
136
+ - `tier-1` + `pillar-1` count == 4,298 if FSung-* misses some that
137
+ Code Charts PDF covers.
138
+
139
+ Either is acceptable. The `reports/by_block.json` row for Ext J
140
+ documents which path actually fired.
141
+
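The two acceptable outcomes can be told apart from the Ext J row alone. A minimal sketch, assuming the row has the same shape as the `by_block.json` examples; the function name and the returned symbols are made up for illustration.

```ruby
EXT_J_ASSIGNED = 4_298

# row: the by_block.json entry for CJK Unified Ideographs Extension J.
def ext_j_path(row)
  tier1 = row.fetch("tier-1")
  pillar1 = row.fetch("pillar-1")
  return :tier1_only if tier1 == EXT_J_ASSIGNED
  return :tier1_plus_pillar1 if tier1 + pillar1 == EXT_J_ASSIGNED

  :unexpected # neither acceptable path fired; investigate
end
```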
142
+ ## Files to create
143
+
144
+ - `lib/ucode/glyphs/universal_set/validator.rb` — post-build
145
+ validator. Reads manifest + glyphs dir, runs the 5 checks above.
146
+ - `lib/ucode/glyphs/universal_set/coverage_report.rb` — emits
147
+ per-tier + per-block + gaps JSON reports.
148
+ - `lib/ucode/glyphs/universal_set/pre_build_check.rb` — runs
149
+ pre-build validation (config + fonts + assertion).
150
+ - `lib/ucode/commands/universal_set.rb` — autoload hub (extend if
151
+ present).
152
+ - `lib/ucode/commands/universal_set/validate.rb` — CLI subcommand.
153
+ - Specs:
154
+ - `spec/ucode/glyphs/universal_set/validator_spec.rb`
155
+ - `spec/ucode/glyphs/universal_set/coverage_report_spec.rb`
156
+ - `spec/ucode/glyphs/universal_set/pre_build_check_spec.rb`
157
+
158
+ ## CLI
159
+
160
+ ```bash
161
+ bin/ucode universal-set build # TODO 24, existing
162
+ bin/ucode universal-set validate # TODO 31, new — post-build validation
163
+ bin/ucode universal-set report # TODO 31, new — emit coverage reports
164
+ bin/ucode universal-set pre-check # TODO 31, new — pre-build validation
165
+ ```
166
+
167
+ `build` runs `pre-check` automatically before starting; the standalone
168
+ `pre-check` is for iterating on curation without burning a 4-hour
169
+ build.
170
+
171
+ ## Acceptance
172
+
173
+ - `bin/ucode universal-set build` completes against Unicode 17.0 in
174
+ under 4 hours.
175
+ - `output/universal_glyph_set/manifest.json` shows
176
+ `codepoints_built == codepoints_assigned` (≥ 150,000).
177
+ - `reports/gaps.json` is empty for assigned codepoints outside the
178
+ documented residual cases (Combining Diacritical Marks Extended
179
+ additions, Symbols Legacy Supp additions, Supp Arrows-C additions).
180
+ - `reports/by_tier.json` shows tier-1 ≥ 95% (target: 100% for
181
+ assigned codepoints outside documented gaps).
182
+ - Re-running with no source changes produces zero file writes.
183
+ - The build correctly handles CJK Ext J: all 4,298 codepoints
184
+ resolved via FSung-* or noto-sans-cjk-jp fallback (no tofu leaks).
185
+ - Residual gaps fall through to Pillar 2 cleanly; no crashes, no
186
+ silent skips.
187
+ - `pre-check` aborts on missing font files with a clear list of
188
+ what's missing.
189
+ - Rubocop clean.
190
+
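The "zero file writes on re-run" criterion is commonly met with a compare-before-write helper. The sketch below shows that general technique; it is not a description of what `idempotency.rb` actually does, and `write_if_changed` is a hypothetical name.

```ruby
# Writes content to path only when the bytes on disk differ.
# Returns true if a write happened, false if the file was already current.
def write_if_changed(path, content)
  bytes = content.b # compare as raw bytes, independent of string encoding
  return false if File.exist?(path) && File.binread(path) == bytes

  File.binwrite(path, bytes)
  true
end
```

Counting the `true` results across a build gives the number of file writes; a second run over unchanged sources should count zero.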
191
+ ## Out of scope
192
+
193
+ - Source config curation — TODO 29.
194
+ - Font acquisition — TODO 30.
195
+ - fontist.org consumer integration — TODO 27.
196
+ - Site rendering of the universal set — TODO 26 / TODO 27.
197
+
198
+ ## References
199
+
200
+ - Build mechanics: `TODO.new/24-universal-glyph-set-build.md`
201
+ - Source config: `TODO.new/29-universal-set-curation-uc17.md`
202
+ - Font acquisition: `TODO.new/30-tier1-font-acquisition.md`
203
+ - Audit consumer: `TODO.new/25-font-audit-against-universal-set.md`
204
+ - Existing builder: `lib/ucode/glyphs/universal_set/builder.rb`
205
+ - Existing manifest model: `lib/ucode/models/universal_set_manifest.rb`
@@ -0,0 +1,165 @@
1
+ # 32 — Universal glyph set: full UC17 coverage matrix (Part 1 master)
2
+
3
+ ## Goal
4
+
5
+ Produce **one canonical Tier 1 font recommendation per Unicode 17 block**
6
+ (~346 entries). This is the master output of Part 1 — the artifact that
7
+ defines "full coverage" for ucode's universal glyph set. Once this
8
+ matrix is encoded in `config/unicode17_universal_glyph_set.yml`, every
9
+ downstream TODO (production build, per-font audit, missing-glyph
10
+ reporter, fontist.org consumer) treats it as ground truth.
11
+
12
+ The matrix does NOT require fonts to be installed or cmaps to be
13
+ verified — that's TODO 35 (production build) and TODO 36 (per-font
14
+ audit). This TODO is purely the **policy**: "for block X, use font Y
15
+ (fallback chain Z)."
16
+
17
+ ## Why a separate TODO
18
+
19
+ TODO 29 (UC17 curation) started this work but stopped at ~30 specialist
20
+ blocks. The remaining ~315 blocks rely on a single `default_sources`
21
+ entry pointing at `noto-sans` via fontist — which the fontist formula
22
+ repo doesn't actually carry as a generic package. So the current config
23
+ CLAIMS full coverage but the resolver can't materialize glyphs for most
24
+ blocks.
25
+
26
+ This TODO splits the policy work from the acquisition work:
27
+
28
+ - **TODO 32 (this)**: decide the canonical font per block (policy)
29
+ - **TODO 33**: fix the acquisition paths (URLs + fontist formulas)
30
+ - **TODO 35**: build the universal set end-to-end (run the policy)
31
+
32
+ Reviewers can sign off on the per-block choices here without waiting
33
+ for font availability.
34
+
35
+ ## Coverage policy (the recommendation)
36
+
37
+ ### Tier 1 default — Noto Sans family
38
+
39
+ Noto is the canonical Tier 1 source for ~250 of ~346 blocks. Where a
40
+ dedicated Noto Sans <Script> variant exists, use it; otherwise fall
41
+ back to `noto-sans` (Latin/core).
42
+
43
+ | Script family | Tier 1 font | Blocks covered |
44
+ |---|---|---|
45
+ | Latin + extensions + IPA + Spacing Modifier + Combining Diacriticals | `noto-sans` | ~20 blocks |
46
+ | Greek + Coptic | `noto-sans` | 2 |
47
+ | Cyrillic (all extensions) | `noto-sans` | 4 |
48
+ | Armenian, Hebrew | `noto-sans-armenian`, `noto-sans-hebrew` | 2 |
49
+ | Arabic + extensions + Supplement | `noto-naskh-arabic` / `noto-sans-arabic` | 4 |
50
+ | Syriac, Thaana, NKo, Samaritan, Mandaic | `noto-sans-<script>` | 5 |
51
+ | Brahmic (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala) | `noto-sans-<script>` | 10 |
52
+ | Tibetan, Myanmar, Georgian | `noto-sans-<script>` | 3+ |
53
+ | Hangul Jamo + compatibility | `noto-sans-hangul` or `noto-sans-kr` | 5 |
54
+ | Ethiopic + extensions | `noto-sans-ethiopic` | 3 |
55
+ | Cherokee, Canadian Aboriginal | `noto-sans-cherokee`, `noto-sans-canadian-aboriginal` | 2 |
56
+ | Khmer, Mongolian, Limbu, Tai Le, Tai Tham, Buginese | `noto-sans-<script>` | 6 |
57
+ | Symbol blocks (Math, Arrows, Misc, Geometric, Dingbats) | `noto-sans-symbols`, `noto-sans-symbols-2`, `noto-sans-math` | ~10 |
58
+ | Music | `noto-music` | 1 |
59
+
60
+ ### Tier 1 specialists (non-Noto)
61
+
62
+ These blocks need fonts outside the Noto family. Each must be acquired
63
+ via `ucode fetch fonts` (specialist manifest, TODO 30).
64
+
65
+ | Block | Range | Tier 1 font | Provenance | Confidence |
66
+ |---|---|---|---:|---|
67
+ | Sidetic | U+10940–1095F | Lentariso ≥1.029 | github.com/Bry10022/Lentariso | HIGH |
68
+ | Beria Erfe | U+16EA0–16EDF | Kedebideri 3.001 | software.sil.org/kedebideri | HIGH |
69
+ | Tai Yo | U+1E6C0–1E6FF | NotoSerifTaiYo | translationcommons.org | HIGH |
70
+ | Tolong Siki | U+11DB0–11DEF | Noto Sans Tolong Siki | notofonts.github.io | HIGH |
71
+ | Sharada Supplement | U+11B60–11B7F | Noto Sans Sharada | Google Fonts | HIGH |
72
+ | Egyptian Hieroglyphs | U+13000–1342F | UniHieroglyphica v16 | suignard.com/Ptolemaic/ | HIGH |
73
+ | Egyptian Hieroglyph Format Controls | U+13430–1345F | Egyptian Text | github.com/microsoft/font-tools | HIGH |
74
+ | Egyptian Hieroglyphs Extended-A | U+13460–143FF | UniHieroglyphica v16 | suignard.com | HIGH |
75
+ | Egyptian Hieroglyphs Extended-B (new UC17) | U+134A0.. | UniHieroglyphica v16 | suignard.com | HIGH |
76
+ | CJK Unified Ideographs | U+4E00–9FFF | FSung-1.ttf (local) + Noto Sans CJK JP fallback | ~/Downloads/全宋體 | HIGH |
77
+ | CJK Unified Ideographs Extension A | U+3400–4DBF | FSung + Noto Sans CJK JP | ~/Downloads/全宋體 | HIGH |
78
+ | CJK Unified Ideographs Extension B–H | various | FSung-2.ttf..FSung-X.ttf | ~/Downloads/全宋體 | HIGH |
79
+ | CJK Unified Ideographs Extension J (new UC17) | U+323B0–3347F | FSung (latest) + Noto Sans CJK JP | ~/Downloads/全宋體 | HIGH |
80
+ | Tangut + Components + Supplement | U+17000–187FF | Noto Sans Tangut | notofonts.github.io | HIGH |
81
+ | Symbols for Legacy Computing Supplement | U+1CC00–1CEBF | BabelStone Pseudographica | babelstone.co.uk | MEDIUM |
82
+ | Supplemental Arrows-C (UC17 additions) | U+1F800–1F8FF | Symbola | dn-works.com / github.com/zhm/symbola mirror | MEDIUM |
83
+
84
+ ### Tier 1 emoji
85
+
86
+ | Block | Range | Tier 1 font |
87
+ |---|---|---|
88
+ | Emoticons + Pictographs + Supplemental + Transport + Symbols & Pictographs Extended-A | various | Noto Emoji (monochrome; Noto Color Emoji for color rendering only) |
89
+ | Variation Selectors | U+FE00–FE0F | Noto Sans (special handling — invisible format chars) |
90
+
91
+ ### Pillar 2 fallback (no Tier 1 available)
92
+
93
+ Blocks with no redistributable Tier 1 font MUST go through pillar 2
94
+ (content-stream correlation). TODO 34 builds this; TODO 32 just
95
+ records the policy.
96
+
97
+ | Block | Why pillar 2 | Pillar 2 PDF source |
98
+ |---|---|---|
99
+ | Sidetic (if Lentariso unavailable) | Private foundry | U10940.pdf |
100
+ | Beria Erfe (if Kedebideri unavailable) | UFO source, complex extract | U16EA0.pdf |
101
+ | Egyptian Hieroglyph Format Controls (gap) | Egyptian Text limitations | U13430.pdf |
102
+
103
+ ### Pillar 3 last resort (always-on fallback)
104
+
105
+ When both Tier 1 and pillar 2 fail (or for unassigned/PUA ranges that
106
+ still need a placeholder glyph), the resolver emits a Last Resort Font
107
+ tofu box. This is encoded as the lowest-priority source on
108
+ `default_sources`, not per-block.
109
+
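The priority order (Tier 1, then pillar 2, then Last Resort) amounts to a first-match walk over an ordered source list. The sketch below is a simplified illustration with hypothetical names; the real resolver lives in `lib/ucode/glyphs/resolver.rb` and may differ.

```ruby
# sources: ordered list of { tier:, label:, lookup: } hashes, where lookup
# is a callable returning a glyph (any truthy value) or nil.
def resolve(codepoint, sources)
  sources.each do |source|
    glyph = source[:lookup].call(codepoint)
    next unless glyph

    # Record provenance alongside the glyph, as the manifest requires.
    return { tier: source[:tier], source: source[:label], glyph: glyph }
  end
  nil # unreachable when a Last Resort source that always answers is listed
end
```

Placing an always-answering Last Resort source at the end of the list guarantees every codepoint resolves, and the returned `tier` makes tofu entries countable afterwards.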
110
+ ## Scope
111
+
112
+ 1. **YAML structure** — extend `Models::GlyphSourceMap` to accept a
113
+ `default_sources` block at the top level (the current schema forces
114
+ ~315 repetitions of the same Noto Sans entry). See TODO 29
115
+ §"Architectural improvements" for the shape.
116
+
117
+ 2. **Curate every block** — walk `output/blocks/index.json`, decide
118
+ Tier 1 for each. Output: either ~346 explicit entries, or ~30
119
+ specialist entries plus `default_sources`.
120
+
121
+ 3. **Per-block rationale comment** — every non-default entry must
122
+ explain WHY (provenance URL, OFL check, known coverage gaps). This
123
+ becomes the documentation for the universal set; reviewers should
124
+ not need to chase external links to understand a choice.
125
+
126
+ 4. **Resolve the specialists named in TODO 29** that didn't have
127
+ concrete URLs:
128
+ - Lentariso: GitHub repo has no releases (the prior URL was 404).
129
+ Policy: vendor the TTFs from `TTFs/` folder of the repo ZIP
130
+ (downloadable via `git clone` or codeload ZIP).
131
+ - EgyptianText: Microsoft/font-tools has no releases. Policy: pull
132
+ from `EgyptianOpenType/` directory in the repo.
133
+ - UniHieroglyphica: canonical URL is `suignard.com/Ptolemaic/` (BBAW
134
+ page is authoritative), not the prior `/UniHieroglyphica/` path.
135
+ - Symbola: dn-works.com no longer hosts public downloads. Policy:
136
+ mirror via `github.com/zhm/symbola` (OFL, version-pinned).
137
+
138
+ 5. **Test fixtures** — for each curated specialist, capture a small
139
+ fixture (1–5 codepoint ids) and assert the source map returns the
140
+ expected font label. Tests run without the font installed.
141
+
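The fixture tests in item 5 only need the source map, not the fonts. A minimal sketch of the lookup follows, under two loudly stated assumptions: the YAML keys (`blocks`, `sources`, `label`) are hypothetical, and per-block sources are taken to precede the `default_sources` chain rather than replace it. `SourceMapSketch` is a stand-in, not `Ucode::Models::GlyphSourceMap`.

```ruby
# Hypothetical stand-in for the source map; key names are assumptions.
class SourceMapSketch
  def initialize(config)
    @default = config.fetch("default_sources", [])
    @blocks = config.fetch("blocks", {})
  end

  # Block-specific sources first, then the shared default chain
  # (which ends in Last Resort), with duplicates removed.
  def sources_for(block_id)
    explicit = @blocks.dig(block_id, "sources") || []
    (explicit + @default).uniq
  end
end
```

A curation spec can then assert on labels alone, for example that the first source for `Sidetic` is the Lentariso entry, without any font installed.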
142
+ ## Acceptance
143
+
144
+ - [ ] `config/unicode17_universal_glyph_set.yml` lists every Unicode
145
+ 17 block by id, with `sources:` per entry or implicit
146
+ `default_sources` fallback.
147
+ - [ ] Each specialist entry carries `provenance`, `license`, `url`
148
+ (or `path` for local), and a rationale comment.
149
+ - [ ] `Ucode::Models::GlyphSourceMap#sources_for(block_id)` returns
150
+ the right list for default AND specialist entries.
151
+ - [ ] Every specialist URL is HTTP 200-verifiable (or marked
152
+ `local_only: true` for user-supplied fonts like FSung).
153
+ - [ ] Curation specs cover at least: Basic_Latin (default),
154
+ Sidetic (specialist fontist), Tai Yo (specialist path),
155
+ CJK Unified Ideographs (specialist multi-source with fallback),
156
+ Egyptian Hieroglyphs (specialist path).
157
+
158
+ ## References
159
+
160
+ - [TODO 23](23-universal-glyph-set-source-map.md) — source map mechanism
161
+ - [TODO 29](29-universal-set-curation-uc17.md) — initial curation
162
+ - [TODO 33](33-specialist-font-acquisition-refresh.md) — fix URLs
163
+ - [TODO 35](35-universal-set-production-run.md) — build it
164
+ - `docs/architecture.md` — 4-tier glyph strategy
165
+ - BBAW Egyptological Unicode Fonts page — authoritative for Egyptian family
@@ -0,0 +1,138 @@
1
+ # 33 — Specialist font acquisition refresh
2
+
3
+ ## Goal
4
+
5
+ Fix the broken acquisition paths that block the universal-set build
6
+ (TODO 35) from completing. Six of the specialists in
7
+ `config/specialist_fonts.yml` return HTTP 404/301 today:
8
+
9
+ | Label | Current URL | Status | Working URL |
10
+ |---|---|---|---|
11
+ | Lentariso | `github.com/Bry10022/Lentariso/releases/download/1.033/Lentariso.otf` | 404 (no releases published) | Repo has no release artifacts; vendor `TTFs/*.ttf` from `archive/master.zip` |
12
+ | NotoSerifTaiYo | `translationcommons.org/wp-content/uploads/2025/09/NotoSerifTaiYo.ttf` | 404 | Path changed; needs upstream contact or alternate mirror |
13
+ | UniHieroglyphica | `suignard.com/UniHieroglyphica/UniHieroglyphica-16.0.zip` | 301 redirect | New path is `suignard.com/Ptolemaic/` per BBAW |
14
+ | EgyptianText | `github.com/microsoft/font-tools/releases/download/v1.0/EgyptianText-Regular.ttf` | 404 (no releases) | Vendor from `EgyptianOpenType/` in the repo |
15
+ | BabelStonePseudographica | `babelstone.co.uk/Fonts/Download/BabelStonePseudographica.zip` | 404 | Page exists; download path moved — needs page scrape |
16
+ | Symbola | `dn-works.com/wp-content/uploads/2020/ufas/Symbola.zip` | 404 (site no longer hosts downloads) | Mirror at `github.com/zhm/symbola` (version-pinned) |
17
+
18
+ Plus: `noto-sans`, `noto-sans-cjk-jp`, `noto-sans-arabic`, `noto-sans-telugu`,
19
+ `noto-sans-kannada`, `noto-sans-symbols`, `noto-sans-symbols-2`, `noto-music`,
20
+ `noto-sans-sharada`, `noto-sans-sidetic`, `noto-sans-tolong-siki`,
21
+ `noto-sans-tangut` — none are resolvable via `fontist install` (not in
22
+ the formulas repo).
23
+
24
+ ## Why a separate TODO
25
+
26
+ The fontist formulas repo (`github.com/fontist/formulas`) doesn't carry
27
+ most Noto variants as separate packages. ucode's pre-build check fails
28
+ hard on the first missing font; without fixes here, TODO 35 cannot
29
+ proceed.
30
+
31
+ Two distinct fixes are needed:
32
+
33
+ 1. **Direct-fetch URLs** — for specialists with known canonical sources
34
+ not in fontist (Lentariso, EgyptianText, UniHieroglyphica,
35
+ NotoSerifTaiYo, BabelStone, Symbola). These go through
36
+ `ucode fetch fonts` via `config/specialist_fonts.yml`.
37
+
38
+ 2. **fontist formula PRs** — for Noto variants that SHOULD be in
39
+ fontist but aren't yet. Upstream PRs to
40
+ `github.com/fontist/formulas`. Until merged, ucode can fall back
41
+ to direct GitHub release URLs (notofonts.github.io publishes
42
+ release artifacts).
43
+
+ ## Scope
+
+ ### Phase A: Specialist URL refresh (this ucode repo)
+
+ 1. **Lentariso**: change `url:` from the dead release URL to the
+    codeload archive: `https://codeload.github.com/Bry10022/Lentariso/zip/refs/heads/master`,
+    set `extract: true`, `extract_member: TTFs/Lentariso-Regular.ttf`
+    (and Bold/Italic if needed). Set `extract_multi: true` if the
+    fetcher needs to pull multiple members.
+
+ 2. **EgyptianText**: change `url:` to `https://codeload.github.com/microsoft/font-tools/zip/refs/heads/main`,
+    `extract: true`, `extract_member: EgyptianOpenType/EgyptianText-Regular.ttf`.
+    License is MIT per the repo; confirm against the LICENSE file
+    before recording.
+
+ 3. **UniHieroglyphica**: change `url:` to the new path under
+    `suignard.com/Ptolemaic/`. The exact filename is not yet known
+    (likely `UniHieroglyphica-16.0.zip` or `UniHieroglyphica.zip`);
+    probe the candidates with HEAD requests to find the one that
+    resolves. Record sha256 on first successful fetch.
+
+ 4. **BabelStonePseudographica**: fetch
+    `babelstone.co.uk/Fonts/Pseudographica.html`, parse for the actual
+    download link (likely `BabelStonePseudographica.ttf` direct, not
+    zip). Update URL accordingly.
+
+ 5. **Symbola**: change `url:` to
+    `https://raw.githubusercontent.com/zhm/symbola/master/fonts/Symbola.ttf`
+    (verified HTTP 200). License: OFL per the mirror; confirm upstream
+    license matches before recording.
+
+ 6. **NotoSerifTaiYo**: needs upstream contact (translationcommons.org
+    doesn't expose a current download). Options:
+    - Email the maintainers (out of scope for code)
+    - Mark `local_only: true` and document that the user must supply
+      the file
+    - Find a GitHub mirror with the font committed
+
+    **Recommendation:** mark `local_only: true` for now, document in
+    the entry's `provenance:` field. Pillar 2 (TODO 34) covers
+    U+1E6C0–U+1E6FF if the font isn't available.
+
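The filename probing in item 3 could be sketched roughly as below. This is a minimal illustration, not the fetcher's actual code: `first_reachable` and `HEAD_STATUS` are hypothetical names, and the `https://` scheme for `suignard.com` is an assumption. Only a plain HTTP 200 counts as success; redirects are not followed.

```ruby
require "net/http"
require "uri"

# Default prober (assumption): issue a HEAD request and return the status code.
HEAD_STATUS = lambda do |url|
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Return the first candidate URL that answers 200, or nil if none does.
# `prober` is injectable so the selection logic can run without network access.
def first_reachable(candidates, prober: HEAD_STATUS)
  candidates.find { |url| prober.call(url) == 200 }
end

# Usage (hits the network, so left commented out):
#   base = "https://suignard.com/Ptolemaic/" # scheme is an assumption
#   names = %w[UniHieroglyphica-16.0.zip UniHieroglyphica.zip]
#   first_reachable(names.map { |name| base + name })
```

Keeping the prober injectable means the candidate ordering can be covered by a spec without depending on the upstream host being reachable.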
+ ### Phase B: fontist formula PRs (external repo)
+
+ For each missing Noto variant, open a PR against
+ `github.com/fontist/formulas` adding a formula. Each formula is a
+ small YAML file carrying:
+
+ - Font metadata (name, license, copyright)
+ - One or more release URLs with sha256
+ - Per-platform install paths
+
+ Variants to upstream (in priority order):
+
+ 1. **noto-sans-cjk-jp**: covers the most codepoints, in a
+    user-visible block (CJK Unified Ideographs). Already documented at
+    `github.com/notofonts/noto-cjk`.
+ 2. **noto-sans-symbols** + **noto-sans-symbols-2**: cover ~10 symbol
+    blocks.
+ 3. **noto-music**: covers the Musical Symbols block.
+ 4. **noto-sans-sharada**, **noto-sans-sidetic**, **noto-sans-tolong-siki**:
+    UC17 specialists.
+ 5. **noto-sans-arabic**, **noto-sans-telugu**, **noto-sans-kannada**:
+    scripts where ucode needs the variant.
+ 6. **noto-sans-tangut**: Tangut block.
+
+ ### Phase C: Local fallbacks (until Phase B merges)
+
+ Until fontist/formulas merges the new formulas, ucode's fetcher
+ subsystem can pull directly from `notofonts.github.io` release
+ artifacts (e.g. `https://github.com/notofonts/notofonts.github.io/raw/main/fonts/NotoSansTolongSiki/hinted/ttf/NotoSansTolongSiki-Regular.ttf`).
+
+ Extend `specialist_fonts.yml` to include these as fallback entries
+ used when `kind: fontist` resolution fails. The fetcher already
+ supports `kind: path` for direct URLs, so the Noto variants only need
+ to be added as path-kind entries.
+
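The Phase C fallback order can be sketched as follows. This is an illustration under stated assumptions: the `kind` and `url` keys mirror the YAML fields named above, while the `name` key, the `resolve_font` helper, and the per-kind resolver lambdas are hypothetical and do not describe the real fetcher API.

```ruby
# Try fontist-kind entries first and fall back to path-kind entries.
# `entries` is an array of hashes loaded from the manifest (shape assumed);
# `resolvers` maps a kind string to a callable returning a local path or nil.
def resolve_font(name, entries, resolvers)
  ordered = entries.select { |entry| entry[:name] == name }
                   .sort_by { |entry| entry[:kind] == "fontist" ? 0 : 1 }
  ordered.each do |entry|
    resolver = resolvers[entry[:kind]]
    next unless resolver

    path = resolver.call(entry)
    return { path: path, via: entry[:kind] } if path
  end
  nil # nothing resolved; the caller decides how to report the missing font
end
```

Returning the `via` kind alongside the path lets a pre-check distinguish fonts that resolved through fontist from those still relying on a path-kind fallback.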
+ ## Acceptance
+
+ - [ ] `config/specialist_fonts.yml` URLs all return HTTP 200 (or are
+       marked `local_only: true` with documented user-supplied path)
+ - [ ] `ucode fetch fonts` succeeds for every entry (including the
+       previously-broken ones); sha256 recorded in the YAML
+ - [ ] Universal-set pre-check (`ucode universal-set pre-check 17.0.0`)
+       reports zero `fontist`-kind missing fonts (path-kind allowed
+       for not-yet-upstreamed Noto variants)
+ - [ ] At least 3 fontist/formulas PRs opened for the highest-priority
+       Noto variants (CJK JP, Symbols, Symbols 2)
+ - [ ] Each PR carries the upstream license + sha256 in the formula YAML
+
+ ## References
+
+ - [TODO 30](30-tier1-font-acquisition.md): original acquisition design
+ - [TODO 32](32-uc17-coverage-matrix.md): what we need these fonts FOR
+ - `config/specialist_fonts.yml`: current (broken) manifest
+ - `lib/ucode/commands/fetch.rb`: fetcher implementation
@@ -0,0 +1,147 @@
+ # 34: Pillar 2 ContentStreamCorrelator (generalize correlate-v4)
+
+ ## Goal
+
+ Promote pillar 2 (PDF content-stream positional correlation) from a
+ throwaway proof-of-concept into a first-class fallback in the
+ canonical 4-tier resolver. Today `lib/ucode/glyphs/embedded_fonts/catalog.rb`
+ bails at line 226 when `tu_ref` (ToUnicode CMap) is nil; this TODO
+ makes it delegate to a new `ContentStreamCorrelator` that recovers
+ CID→codepoint mappings from chart geometry alone.
+
+ Proven on Tai Yo (all 54 assigned codepoints correctly mapped without
+ ToUnicode; see `/tmp/correlate_v4.rb`). This TODO generalizes that
+ script and makes it the fallback for blocks where:
+
+ - Tier 1 font is unavailable (Sidetic if Lentariso unavailable,
+   Beria Erfe if Kedebideri unavailable, Egyptian Hieroglyph Format
+   Controls gaps)
+ - The Code Charts PDF embeds subsetted CIDFonts without ToUnicode
+   (common for private specimen fonts: Unicode Consortium uses 80+
+   such fonts that are not redistributable)
+
+ ## Why a separate TODO
+
+ The Catalog is the entry point for pillar 1 extraction. When
+ `tu_ref` is nil, today it returns nil, which means the resolver
+ silently drops to pillar 3 (Last Resort tofu). For blocks like
+ Egyptian Hieroglyphs (4k+ codepoints where source fonts are
+ private), this would mean 4k tofu boxes instead of real outlines.
+
+ Pillar 2 is the only path to real outlines for these blocks, and
+ generalizing the Tai Yo proof is what unlocks it.
+
+ ## Algorithm (extracted from correlate-v4)
+
+ ```ruby
+ # 1. Render the chart page to SVG via mutool:
+ #      `mutool draw -F svg <pdf> <page>` produces an SVG with:
+ #        <defs><path id="font_N_M"/> for every CID M in specimen font N
+ #        <use xlink:href="#font_N_M" transform="matrix(a,b,c,d,X,Y)"/>
+ #          for every placement
+ #
+ # 2. Partition <use> elements by their font index (the N in font_N_M):
+ #      - Labels: fonts that emit hex digits (typically font_3, font_8)
+ #      - Specimens: the CIDFont carrying the actual glyph outlines
+ #        (typically font_4 or font_6)
+ #
+ # 3. Cluster label uses by Y-row:
+ #      yb = (y / 1.5).round * 1.5    # quantize to row height
+ #      xb = (x / 50.0).round * 50.0  # quantize to column width
+ #      clusters[[yb, xb]] << label
+ #
+ # 4. Per cluster, sort members by X and join decoded text:
+ #      decode = ->(s) { s.gsub(/&#x([0-9a-fA-F]+);/) { [$1.to_i(16)].pack("U") } }
+ #      cp_hex = members.sort_by { |m| m[:x] }.map { |m| decode.call(m[:text]) }.join
+ #
+ # 5. The rightmost cluster per Y-row is the specimen codepoint label.
+ #    The rightmost <use> per Y-row in the specimen font is the
+ #    specimen glyph placement. CID(M) ↔ codepoint established.
+ #
+ # 6. Lift <path id="font_<specimen_idx>_<CID>"> outline, normalize
+ #    viewBox, emit glyph.svg.
+ ```
+
+ The Y-quantization (1.5) and X-quantization (50.0) come from the
+ Code Charts typesetting convention. They should be parameters, not
+ constants, because different charts may use different grid sizes.
+ They can be discovered empirically: walk all labels, find the
+ smallest non-trivial Y-gap, and use that as the quantization base.
+
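Steps 3 to 5 and the gap discovery could look roughly like the sketch below. It is not the code from `correlate_v4.rb`, which is not shown here: the helper names, the input hash shapes, and the `min_gap` jitter threshold are all assumptions, and label text is taken to be already decoded to plain hex digits.

```ruby
# Smallest gap between distinct sorted values that exceeds `min_gap`.
# Gaps at or below `min_gap` are treated as jitter. Returns nil if none.
def smallest_gap(values, min_gap: 0.5)
  values.uniq.sort.each_cons(2).map { |a, b| b - a }.select { |g| g > min_gap }.min
end

def quantize(value, step)
  (value / step).round * step
end

# labels:    [{ x:, y:, text: }] one hash per label <use> (shape assumed)
# specimens: [{ x:, y:, cid: }]  one hash per specimen-font <use>
# Returns { codepoint_int => cid_int }.
def correlate(labels, specimens, row_step:, col_step:)
  clusters = Hash.new { |hash, key| hash[key] = [] }
  labels.each do |label|
    clusters[[quantize(label[:y], row_step), quantize(label[:x], col_step)]] << label
  end

  # The rightmost label cluster in each row carries the codepoint hex string.
  label_by_row = {}
  clusters.each do |(yb, xb), members|
    best = label_by_row[yb]
    next if best && best[:xb] >= xb

    hex = members.sort_by { |m| m[:x] }.map { |m| m[:text] }.join
    label_by_row[yb] = { xb: xb, hex: hex }
  end

  # The rightmost specimen placement in each row carries the CID.
  cid_by_row = {}
  specimens.each do |spec|
    yb = quantize(spec[:y], row_step)
    cid_by_row[yb] = spec if cid_by_row[yb].nil? || spec[:x] > cid_by_row[yb][:x]
  end

  label_by_row.each_with_object({}) do |(yb, label), mapping|
    spec = cid_by_row[yb]
    next unless spec && label[:hex].match?(/\A\h+\z/)

    mapping[label[:hex].to_i(16)] = spec[:cid]
  end
end
```

Rows whose label does not parse as hex, or that have no specimen placement, are skipped rather than guessed, so a partial mapping stays trustworthy.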
+ ## Combinator caveat (managed)
+
+ Code Charts convention draws combining marks (Mn category) as
+ "dotted-circle + mark" side-by-side. The dotted circle is a separate
+ `<use>` element; it does NOT contaminate the mark's glyf outline.
+ Verified clean on all 5 Tai Yo Mn codepoints.
+
+ However, some foundries ship composite glyphs (mark + base in the
+ same glyf). For those we'd need a dotted-circle subtraction step:
+
+ 1. Detect the U+25CC outline in the extracted path (signature: ring
+    of small dots)
+ 2. Remove its subpaths from the final glyph
+
+ This is a follow-up if any block needs it. The initial
+ implementation just extracts the outline as-is; compositing
+ artifacts get flagged in the validator (TODO 35).
+
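One possible shape for the "ring of small dots" signature is sketched below. It is a heuristic illustration only: the function names, `min_dots`, and `tolerance` are hypothetical, and it assumes the path data uses absolute commands with plain decimal numbers, so every number pair in a subpath is a point or control point in user space.

```ruby
# Bounding-box centre and size for each subpath of an SVG path `d` string.
def subpath_boxes(d)
  d.scan(/[Mm][^Mm]*/).filter_map do |sub|
    pts = sub.scan(/-?\d+(?:\.\d+)?/).map(&:to_f).each_slice(2).select { |p| p.size == 2 }
    next if pts.empty?

    xs = pts.map(&:first)
    ys = pts.map(&:last)
    { cx: (xs.min + xs.max) / 2.0, cy: (ys.min + ys.max) / 2.0,
      size: [xs.max - xs.min, ys.max - ys.min].max }
  end
end

# True when the path contains at least `min_dots` similarly small subpaths
# whose centres lie at a near-constant distance from their common centroid.
def dotted_circle_ring?(d, min_dots: 8, tolerance: 0.15)
  boxes = subpath_boxes(d)
  return false if boxes.empty?

  smallest = boxes.map { |b| b[:size] }.min
  dots = boxes.select { |b| b[:size] <= smallest * 1.5 }
  return false if dots.size < min_dots

  cx = dots.sum { |b| b[:cx] } / dots.size
  cy = dots.sum { |b| b[:cy] } / dots.size
  radii = dots.map { |b| Math.hypot(b[:cx] - cx, b[:cy] - cy) }
  mean_r = radii.sum / radii.size
  return false if mean_r.zero?

  radii.all? { |r| (r - mean_r).abs <= tolerance * mean_r }
end
```

Because only the smallest subpaths are considered as dot candidates, a large mark outline in the same path does not prevent detection. Removing the detected subpaths (step 2) would be a separate pass.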
+ ## Scope
+
+ 1. **`Ucode::Glyphs::EmbeddedFonts::ContentStreamCorrelator`**:
+    new class next to `Catalog`. API:
+    ```ruby
+    correlator = ContentStreamCorrelator.new(pdf_page:, specimen_font_index:)
+    mapping = correlator.call # { codepoint_int => cid_int }
+    ```
+
+ 2. **Patch `Catalog#build_entry`**: when `tu_ref` is nil, instead
+    of returning nil, delegate to ContentStreamCorrelator. Callers
+    are unchanged: they see a populated entry regardless of whether
+    pillar 1 or pillar 2 produced the mapping.
+
+ 3. **Page-walk helper**: for a given block PDF, identify the
+    specimen font index automatically (currently hardcoded in
+    correlate-v4 as font_4). Heuristic: the font with the most
+    `<use>` placements AND the highest CID count in `<defs>` is the
+    specimen font.
+
+ 4. **Y-row quantization auto-discovery**: collect all label Y
+    positions, find the smallest non-trivial gap, use that as the
+    row-height quantization. Same for X-gap → column width.
+
+ 5. **Path lifting**: given the specimen font index and CID, find
+    `<path id="font_<idx>_<cid>">` in the SVG, extract its `d=`
+    attribute, normalize the viewBox (typical Code Charts cell is
+    ~1000×1000 user units).
+
+ 6. **mutool integration**: wrap the `mutool draw -F svg` shell
+    call. Cache the rendered SVG keyed by PDF path + page number
+    under `~/.cache/ucode/unicode/<version>/svg/<block_id>-<page>.svg`.
+
+ 7. **Specs**: fixture-based tests for:
+    - Tai Yo (proven baseline; must reproduce correlate-v4 output
+      exactly)
+    - Sidetic (no Tier 1 fallback available; pillar 2 mandatory)
+    - Beria Erfe (same)
+    - At least one block WITH ToUnicode to ensure pillar 1 still
+      works (regression guard)
+
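Items 3 and 5 might be sketched as below, assuming the SVG uses the `font_N_M` id scheme described in the algorithm. The helper names are hypothetical, the regex scan stands in for a real XML parser, and viewBox normalization is left out.

```ruby
# Font index N with both the most <use> placements and the most distinct
# CIDs in <defs>. Returns nil when the two criteria disagree, so the caller
# can fall back to an explicitly configured index.
def specimen_font_index(svg)
  placements = Hash.new(0)
  svg.scan(/<use[^>]*href="#font_(\d+)_(\d+)"/) { |n, _cid| placements[n.to_i] += 1 }

  cids = Hash.new { |hash, key| hash[key] = [] }
  svg.scan(/<path[^>]*\bid="font_(\d+)_(\d+)"/) { |n, cid| cids[n.to_i] << cid.to_i }
  return nil if placements.empty? || cids.empty?

  by_use = placements.max_by { |_n, count| count }.first
  by_cid = cids.max_by { |_n, list| list.uniq.size }.first
  by_use == by_cid ? by_use : nil
end

# Extract the `d` attribute of <path id="font_<idx>_<cid>">, or nil if absent.
# Attribute order inside the tag does not matter.
def lift_path_d(svg, idx, cid)
  tag = svg[/<path\b[^>]*\bid="font_#{idx}_#{cid}"[^>]*>/]
  tag && tag[/\bd="([^"]*)"/, 1]
end
```

Returning nil on disagreement is a deliberate choice in this sketch: label fonts reuse a handful of hex-digit glyphs many times, so placement count alone could point at the wrong font.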
+ ## Acceptance
+
+ - [ ] `ContentStreamCorrelator` class exists with documented API
+ - [ ] Catalog delegates to it when `tu_ref` is nil
+ - [ ] Tai Yo test fixture reproduces the correlate-v4 mapping (54/54
+       codepoints correctly attributed)
+ - [ ] Sidetic + Beria Erfe PDFs produce complete mappings via
+       pillar 2 (no tofu fallback)
+ - [ ] Combinator cleanliness check: every Mn codepoint's extracted
+       glyph passes the "no U+25CC sub-path" heuristic
+ - [ ] mutool SVG output is cached; re-runs reuse the cached SVG
+       instead of invoking mutool again
+
+ ## References
+
+ - `/tmp/correlate_v4.rb`: proven implementation (112 lines, Tai Yo)
+ - `lib/ucode/glyphs/embedded_fonts/catalog.rb:226`: bail point
+ - [TODO 20](20-canonical-resolver-4-tier.md): original 4-tier design
+ - [TODO 32](32-uc17-coverage-matrix.md): pillar 2 fallback policy