ucode 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +72 -0
- data/Gemfile.lock +2 -2
- data/TODO.full/00-README.md +116 -0
- data/TODO.full/01-panglyph-vision.md +112 -0
- data/TODO.full/02-panglyph-repo-bootstrap.md +184 -0
- data/TODO.full/03-panglyph-font-builder.md +201 -0
- data/TODO.full/04-panglyph-publish-pipeline.md +126 -0
- data/TODO.full/05-ucode-0-1-1-release.md +139 -0
- data/TODO.full/06-fontisan-remove-audit.md +142 -0
- data/TODO.full/07-fontisan-remove-ucd.md +125 -0
- data/TODO.full/08-archive-private-bin-build.md +143 -0
- data/TODO.full/09-archive-public-structure.md +164 -0
- data/TODO.full/10-fontist-org-woff-glyphs.md +131 -0
- data/TODO.full/11-fontist-org-audit-coverage.md +140 -0
- data/TODO.full/12-implementation-order.md +216 -0
- data/TODO.full/13-fontisan-font-writer-api.md +189 -0
- data/TODO.full/14-fontisan-table-writers.md +66 -0
- data/TODO.full/15-panglyph-builder-real.md +82 -0
- data/TODO.full/16-archive-public-sync-workflows.md +167 -0
- data/TODO.full/17-fontist-org-font-picker.md +73 -0
- data/TODO.full/18-comprehensive-spec-coverage.md +64 -0
- data/TODO.full/19-ucode-0-1-2-patch.md +32 -0
- data/TODO.full/20-fontisan-0-2-23-release.md +52 -0
- data/TODO.new/00-README.md +30 -0
- data/TODO.new/23-universal-glyph-set-source-map.md +312 -0
- data/TODO.new/24-universal-glyph-set-build.md +189 -0
- data/TODO.new/25-font-audit-against-universal-set.md +195 -0
- data/TODO.new/26-missing-glyph-reporter.md +189 -0
- data/TODO.new/27-fontist-org-consumer-integration.md +200 -0
- data/TODO.new/28-implementation-order-update.md +187 -0
- data/TODO.new/29-universal-set-curation-uc17.md +312 -0
- data/TODO.new/30-tier1-font-acquisition.md +241 -0
- data/TODO.new/31-universal-set-production-build.md +205 -0
- data/TODO.new/32-uc17-coverage-matrix.md +165 -0
- data/TODO.new/33-specialist-font-acquisition-refresh.md +138 -0
- data/TODO.new/34-pillar2-content-stream-correlator.md +147 -0
- data/TODO.new/35-universal-set-production-run.md +160 -0
- data/TODO.new/36-per-font-coverage-audit.md +145 -0
- data/TODO.new/37-coverage-highlight-reporter.md +125 -0
- data/TODO.new/38-fontist-org-glyph-consumer.md +141 -0
- data/TODO.new/39-implementation-order-update-32-38.md +258 -0
- data/TODO.new/40-archive-private-uses-ucode-audit.md +124 -0
- data/TODO.new/41-ucode-unicode-archive-bridge.md +160 -0
- data/config/specialist_fonts.yml +102 -0
- data/config/unicode17_tier1_fonts.yml +42 -0
- data/config/unicode17_universal_glyph_set.yml +293 -0
- data/lib/ucode/audit/block_aggregator.rb +57 -29
- data/lib/ucode/audit/browser/face_page.rb +128 -0
- data/lib/ucode/audit/browser/glyph_panel.rb +124 -0
- data/lib/ucode/audit/browser/library_page.rb +74 -0
- data/lib/ucode/audit/browser/missing_glyph_page.rb +87 -0
- data/lib/ucode/audit/browser/template.rb +47 -0
- data/lib/ucode/audit/browser/templates/face.css +200 -0
- data/lib/ucode/audit/browser/templates/face.html.erb +41 -0
- data/lib/ucode/audit/browser/templates/face.js +298 -0
- data/lib/ucode/audit/browser/templates/library.css +119 -0
- data/lib/ucode/audit/browser/templates/library.html.erb +42 -0
- data/lib/ucode/audit/browser/templates/library.js +99 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.css +119 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.html.erb +58 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.js +2 -0
- data/lib/ucode/audit/browser.rb +32 -0
- data/lib/ucode/audit/context.rb +27 -1
- data/lib/ucode/audit/coverage_reference.rb +103 -0
- data/lib/ucode/audit/differ.rb +121 -0
- data/lib/ucode/audit/emitter/block_emitter.rb +52 -0
- data/lib/ucode/audit/emitter/codepoint_emitter.rb +87 -0
- data/lib/ucode/audit/emitter/collection_emitter.rb +80 -0
- data/lib/ucode/audit/emitter/face_directory.rb +212 -0
- data/lib/ucode/audit/emitter/glyph_emitter.rb +48 -0
- data/lib/ucode/audit/emitter/index_emitter.rb +149 -0
- data/lib/ucode/audit/emitter/library_emitter.rb +96 -0
- data/lib/ucode/audit/emitter/paths.rb +312 -0
- data/lib/ucode/audit/emitter/plane_emitter.rb +29 -0
- data/lib/ucode/audit/emitter/script_emitter.rb +29 -0
- data/lib/ucode/audit/emitter.rb +29 -0
- data/lib/ucode/audit/extractors/aggregations.rb +31 -2
- data/lib/ucode/audit/face_auditor.rb +86 -0
- data/lib/ucode/audit/formatters/audit_diff_text.rb +112 -0
- data/lib/ucode/audit/formatters/audit_text.rb +411 -0
- data/lib/ucode/audit/formatters/color.rb +48 -0
- data/lib/ucode/audit/formatters/library_summary_text.rb +98 -0
- data/lib/ucode/audit/formatters/text_formatter.rb +83 -0
- data/lib/ucode/audit/formatters.rb +23 -0
- data/lib/ucode/audit/library_aggregator.rb +86 -0
- data/lib/ucode/audit/library_auditor.rb +105 -0
- data/lib/ucode/audit/release/emitter.rb +152 -0
- data/lib/ucode/audit/release/face_card.rb +93 -0
- data/lib/ucode/audit/release/formula_audits.rb +50 -0
- data/lib/ucode/audit/release/library_index_builder.rb +78 -0
- data/lib/ucode/audit/release/manifest_builder.rb +127 -0
- data/lib/ucode/audit/release.rb +42 -0
- data/lib/ucode/audit/ucd_only_reference.rb +81 -0
- data/lib/ucode/audit/universal_set_reference.rb +136 -0
- data/lib/ucode/audit.rb +31 -0
- data/lib/ucode/cli.rb +339 -33
- data/lib/ucode/commands/audit/browser_command.rb +82 -0
- data/lib/ucode/commands/audit/collection_command.rb +103 -0
- data/lib/ucode/commands/audit/compare_command.rb +188 -0
- data/lib/ucode/commands/audit/font_command.rb +140 -0
- data/lib/ucode/commands/audit/library_command.rb +87 -0
- data/lib/ucode/commands/audit/reference_builder.rb +64 -0
- data/lib/ucode/commands/audit.rb +20 -0
- data/lib/ucode/commands/block_feed.rb +73 -0
- data/lib/ucode/commands/canonical_build.rb +138 -0
- data/lib/ucode/commands/fetch.rb +37 -1
- data/lib/ucode/commands/release.rb +115 -0
- data/lib/ucode/commands/universal_set.rb +211 -0
- data/lib/ucode/commands.rb +5 -0
- data/lib/ucode/coordinator/indices.rb +11 -0
- data/lib/ucode/coordinator.rb +138 -5
- data/lib/ucode/error.rb +30 -2
- data/lib/ucode/fetch/font_fetcher/result.rb +39 -0
- data/lib/ucode/fetch/font_fetcher.rb +16 -0
- data/lib/ucode/fetch/specialist_font_fetcher.rb +280 -0
- data/lib/ucode/fetch.rb +7 -3
- data/lib/ucode/glyphs/real_fonts/cmap_cache.rb +74 -0
- data/lib/ucode/glyphs/real_fonts.rb +1 -0
- data/lib/ucode/glyphs/resolver.rb +62 -0
- data/lib/ucode/glyphs/source.rb +48 -0
- data/lib/ucode/glyphs/source_builder.rb +61 -0
- data/lib/ucode/glyphs/source_config/coverage_assertion.rb +79 -0
- data/lib/ucode/glyphs/source_config/gap_report.rb +54 -0
- data/lib/ucode/glyphs/source_config.rb +104 -0
- data/lib/ucode/glyphs/sources/pillar1_embedded_tounicode.rb +63 -0
- data/lib/ucode/glyphs/sources/pillar3_last_resort.rb +51 -0
- data/lib/ucode/glyphs/sources/tier1_real_font.rb +104 -0
- data/lib/ucode/glyphs/sources.rb +20 -0
- data/lib/ucode/glyphs/universal_set/builder.rb +161 -0
- data/lib/ucode/glyphs/universal_set/coverage_report.rb +139 -0
- data/lib/ucode/glyphs/universal_set/idempotency.rb +86 -0
- data/lib/ucode/glyphs/universal_set/manifest_accumulator.rb +195 -0
- data/lib/ucode/glyphs/universal_set/manifest_writer.rb +61 -0
- data/lib/ucode/glyphs/universal_set/pre_build_check.rb +197 -0
- data/lib/ucode/glyphs/universal_set/validator.rb +204 -0
- data/lib/ucode/glyphs/universal_set.rb +45 -0
- data/lib/ucode/glyphs.rb +6 -0
- data/lib/ucode/models/audit/baseline.rb +6 -0
- data/lib/ucode/models/audit/block_summary.rb +7 -0
- data/lib/ucode/models/audit/codepoint_provenance.rb +39 -0
- data/lib/ucode/models/audit/release_face.rb +42 -0
- data/lib/ucode/models/audit/release_formula.rb +33 -0
- data/lib/ucode/models/audit/release_manifest.rb +43 -0
- data/lib/ucode/models/audit/release_universal_set.rb +37 -0
- data/lib/ucode/models/audit.rb +9 -0
- data/lib/ucode/models/block.rb +2 -0
- data/lib/ucode/models/build_report.rb +109 -0
- data/lib/ucode/models/codepoint/glyph.rb +42 -0
- data/lib/ucode/models/codepoint.rb +3 -0
- data/lib/ucode/models/glyph_source.rb +86 -0
- data/lib/ucode/models/glyph_source_map.rb +138 -0
- data/lib/ucode/models/specialist_font.rb +70 -0
- data/lib/ucode/models/specialist_font_manifest.rb +48 -0
- data/lib/ucode/models/unihan_entry.rb +81 -9
- data/lib/ucode/models/unihan_field.rb +21 -0
- data/lib/ucode/models/universal_set_entry.rb +47 -0
- data/lib/ucode/models/universal_set_manifest.rb +78 -0
- data/lib/ucode/models/validation_report.rb +99 -0
- data/lib/ucode/models.rb +9 -0
- data/lib/ucode/parsers/named_sequences.rb +5 -5
- data/lib/ucode/parsers/unihan.rb +50 -19
- data/lib/ucode/repo/aggregate_writer.rb +34 -2
- data/lib/ucode/repo/block_feed_emitter.rb +153 -0
- data/lib/ucode/repo/build_report_accumulator.rb +138 -0
- data/lib/ucode/repo/build_report_writer.rb +46 -0
- data/lib/ucode/repo/build_validator.rb +229 -0
- data/lib/ucode/repo/codepoint_writer.rb +50 -1
- data/lib/ucode/repo/paths.rb +8 -0
- data/lib/ucode/repo.rb +4 -0
- data/lib/ucode/version.rb +1 -1
- data/schema/block-feed.output.schema.yml +134 -0
- metadata +143 -2
- data/ucode.gemspec +0 -56
|
@@ -0,0 +1,312 @@
|
|
|
1
|
+
# 23 — Universal glyph set: Tier 1 source map
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Pin the canonical "one best font per Unicode 17 block" map as a
|
|
6
|
+
first-class, versioned artifact. This is the single source of truth
|
|
7
|
+
that drives the universal glyph set build (TODO 24) and the
|
|
8
|
+
audit-against-universal-set pipeline (TODO 25).
|
|
9
|
+
|
|
10
|
+
The resolver in TODO 20 reads this config; the build in TODO 24
|
|
11
|
+
materializes it; audits in TODO 25 reference it. Without this file we
|
|
12
|
+
have resolver mechanics but no opinionated, full-coverage font choice.
|
|
13
|
+
|
|
14
|
+
## Why a separate file
|
|
15
|
+
|
|
16
|
+
Embedding the block→font table inside the resolver (as TODO 20's
|
|
17
|
+
example shows) blurs two concerns:
|
|
18
|
+
|
|
19
|
+
1. **Mechanism** (the priority-ordered dispatch loop) — belongs in
|
|
20
|
+
`Resolver`. Stable across Unicode versions.
|
|
21
|
+
2. **Policy** (which font wins for which block this Unicode version) —
|
|
22
|
+
belongs in a versioned data file. Changes every Unicode release.
|
|
23
|
+
|
|
24
|
+
Lifting policy out into `config/unicode17_universal_glyph_set.yml`
|
|
25
|
+
makes it reviewable on its own, diffable across versions, and editable
|
|
26
|
+
without touching Ruby.
|
|
27
|
+
|
|
28
|
+
## Files to create
|
|
29
|
+
|
|
30
|
+
- `config/unicode17_universal_glyph_set.yml` — the curated map.
|
|
31
|
+
- `lib/ucode/glyphs/source_config.rb` — loader/validator (returns a
|
|
32
|
+
frozen `SourceConfig` instance with `#fonts_for(block_id)`).
|
|
33
|
+
- `lib/ucode/models/glyph_source.rb` — typed model for one entry in
|
|
34
|
+
the yaml (label, kind, path_or_fontist_name, priority, license).
|
|
35
|
+
- `lib/ucode/models/glyph_source_map.rb` — typed model for the whole
|
|
36
|
+
yaml (top-level `unicode_version`, `map` keyed by block_id).
|
|
37
|
+
- `spec/ucode/glyphs/source_config_spec.rb` — loader specs (real
|
|
38
|
+
fixtures, no doubles).
|
|
39
|
+
- `spec/fixtures/glyph_source_map/minimal.yml` — small fixture.
|
|
40
|
+
- `spec/fixtures/glyph_source_map/full.yml` — symlink or copy of the
|
|
41
|
+
production config (exercised by one smoke spec).
|
|
42
|
+
|
|
43
|
+
## YAML shape
|
|
44
|
+
|
|
45
|
+
```yaml
|
|
46
|
+
# config/unicode17_universal_glyph_set.yml
|
|
47
|
+
unicode_version: "17.0.0"
|
|
48
|
+
ucode_version: "0.2.0"
|
|
49
|
+
generated_at: "2026-06-27T12:00:00Z"
|
|
50
|
+
|
|
51
|
+
# Block IDs use the verbatim Unicode original name with underscores
|
|
52
|
+
# (same convention as Blocks.txt folder names). One entry per block;
|
|
53
|
+
# the resolver tries fonts in listed order.
|
|
54
|
+
map:
|
|
55
|
+
Basic_Latin:
|
|
56
|
+
sources:
|
|
57
|
+
- kind: fontist
|
|
58
|
+
label: noto-sans
|
|
59
|
+
priority: 1
|
|
60
|
+
license: OFL
|
|
61
|
+
provenance: "Google Noto Sans, system fallback for Latin"
|
|
62
|
+
- kind: path
|
|
63
|
+
label: system-ui
|
|
64
|
+
path: "/System/Library/Fonts/Helvetica.ttc"
|
|
65
|
+
priority: 2
|
|
66
|
+
license: PROPRIETARY
|
|
67
|
+
provenance: "macOS system font, fallback only"
|
|
68
|
+
|
|
69
|
+
Greek_And_Coptic:
|
|
70
|
+
sources:
|
|
71
|
+
- kind: fontist
|
|
72
|
+
label: noto-sans
|
|
73
|
+
priority: 1
|
|
74
|
+
|
|
75
|
+
CJK_Unified_Ideographs:
|
|
76
|
+
sources:
|
|
77
|
+
- kind: path
|
|
78
|
+
label: FSung-1
|
|
79
|
+
path: "~/Downloads/全宋體/FSung-1.ttf"
|
|
80
|
+
priority: 1
|
|
81
|
+
license: OFL
|
|
82
|
+
provenance: "Taiwan MOE 全宋體, covers U+4E00..U+9FFF core"
|
|
83
|
+
- kind: path
|
|
84
|
+
label: FSung-2
|
|
85
|
+
path: "~/Downloads/全宋體/FSung-2.ttf"
|
|
86
|
+
priority: 2
|
|
87
|
+
# ... FSung-3 .. FSung-X cover the rest of CJK + extensions
|
|
88
|
+
- kind: fontist
|
|
89
|
+
label: noto-sans-cjk-jp
|
|
90
|
+
priority: 99
|
|
91
|
+
provenance: "Catch-all fallback for any CJK codepoint FSung misses"
|
|
92
|
+
|
|
93
|
+
CJK_Unified_Ideographs_Extension_J:
|
|
94
|
+
sources:
|
|
95
|
+
- kind: path
|
|
96
|
+
label: FSung-J
|
|
97
|
+
path: "~/Downloads/全宋體/FSung-J.ttf"
|
|
98
|
+
priority: 1
|
|
99
|
+
- kind: fontist
|
|
100
|
+
label: noto-sans-cjk-jp
|
|
101
|
+
priority: 2
|
|
102
|
+
|
|
103
|
+
Sidetic:
|
|
104
|
+
sources:
|
|
105
|
+
- kind: fontist
|
|
106
|
+
label: lentariso
|
|
107
|
+
priority: 1
|
|
108
|
+
license: OFL
|
|
109
|
+
provenance: "Lentariso ≥1.029 (github.com/Bry10022/Lentariso)"
|
|
110
|
+
- kind: fontist
|
|
111
|
+
label: noto-sans-sidetic
|
|
112
|
+
priority: 2
|
|
113
|
+
|
|
114
|
+
Beria_Erfe:
|
|
115
|
+
sources:
|
|
116
|
+
- kind: fontist
|
|
117
|
+
label: kedebideri
|
|
118
|
+
priority: 1
|
|
119
|
+
license: OFL
|
|
120
|
+
provenance: "Kedebideri 3.001 (software.sil.org/kedebideri)"
|
|
121
|
+
|
|
122
|
+
Tai_Yo:
|
|
123
|
+
sources:
|
|
124
|
+
- kind: path
|
|
125
|
+
label: NotoSerifTaiYo
|
|
126
|
+
path: "data/fonts/NotoSerifTaiYo.ttf"
|
|
127
|
+
priority: 1
|
|
128
|
+
license: OFL
|
|
129
|
+
provenance: "translationcommons.org, proven via correlate-v4"
|
|
130
|
+
|
|
131
|
+
Tolong_Siki:
|
|
132
|
+
sources:
|
|
133
|
+
- kind: fontist
|
|
134
|
+
label: noto-sans-tolong-siki
|
|
135
|
+
priority: 1
|
|
136
|
+
|
|
137
|
+
Sharada_Supplement:
|
|
138
|
+
sources:
|
|
139
|
+
- kind: fontist
|
|
140
|
+
label: noto-sans-sharada
|
|
141
|
+
priority: 1
|
|
142
|
+
|
|
143
|
+
Egyptian_Hieroglyphs:
|
|
144
|
+
sources:
|
|
145
|
+
- kind: path
|
|
146
|
+
label: UniHieroglyphica
|
|
147
|
+
path: "data/fonts/UniHieroglyphica.ttf"
|
|
148
|
+
priority: 1
|
|
149
|
+
license: OFL
|
|
150
|
+
provenance: "suignard.com, authoritative for Egyptian Hieroglyphs"
|
|
151
|
+
|
|
152
|
+
Egyptian_Hieroglyphs_Format_Controls:
|
|
153
|
+
sources:
|
|
154
|
+
- kind: path
|
|
155
|
+
label: Egyptian-Text
|
|
156
|
+
path: "data/fonts/EgyptianText-Regular.ttf"
|
|
157
|
+
priority: 1
|
|
158
|
+
license: OFL
|
|
159
|
+
provenance: "microsoft/font-tools, OFL"
|
|
160
|
+
|
|
161
|
+
Egyptian_Hieroglyphs_Extended_A:
|
|
162
|
+
sources:
|
|
163
|
+
- kind: path
|
|
164
|
+
label: UniHieroglyphica
|
|
165
|
+
path: "data/fonts/UniHieroglyphica.ttf"
|
|
166
|
+
priority: 1
|
|
167
|
+
|
|
168
|
+
Egyptian_Hieroglyphs_Extended_B:
|
|
169
|
+
sources:
|
|
170
|
+
- kind: path
|
|
171
|
+
label: UniHieroglyphica
|
|
172
|
+
path: "data/fonts/UniHieroglyphica.ttf"
|
|
173
|
+
priority: 1
|
|
174
|
+
|
|
175
|
+
Symbols_for_Legacy_Computing_Supplement:
|
|
176
|
+
sources:
|
|
177
|
+
- kind: fontist
|
|
178
|
+
label: babelstone-pseudographica
|
|
179
|
+
priority: 1
|
|
180
|
+
provenance: "BabelStone, partial Unicode 17 coverage"
|
|
181
|
+
|
|
182
|
+
Supplemental_Arrows_C:
|
|
183
|
+
sources:
|
|
184
|
+
- kind: fontist
|
|
185
|
+
label: symbola
|
|
186
|
+
priority: 1
|
|
187
|
+
|
|
188
|
+
Alchemical_Symbols:
|
|
189
|
+
sources:
|
|
190
|
+
- kind: fontist
|
|
191
|
+
label: noto-sans-symbols
|
|
192
|
+
priority: 1
|
|
193
|
+
- kind: fontist
|
|
194
|
+
label: symbola
|
|
195
|
+
priority: 2
|
|
196
|
+
|
|
197
|
+
Miscellaneous_Symbols_Supplement:
|
|
198
|
+
sources:
|
|
199
|
+
- kind: fontist
|
|
200
|
+
label: noto-sans-symbols-2
|
|
201
|
+
priority: 1
|
|
202
|
+
|
|
203
|
+
Musical_Symbols:
|
|
204
|
+
sources:
|
|
205
|
+
- kind: fontist
|
|
206
|
+
label: noto-music
|
|
207
|
+
priority: 1
|
|
208
|
+
|
|
209
|
+
  Tangut:
    sources:
      - kind: fontist
        label: noto-sans-tangut
        priority: 1

  Tangut_Supplement:
    sources:
      - kind: fontist
        label: noto-sans-tangut
        priority: 1

  Tangut_Components:
    sources:
      - kind: fontist
        label: noto-sans-tangut
        priority: 1
|
|
216
|
+
|
|
217
|
+
Adlam:
|
|
218
|
+
sources:
|
|
219
|
+
- kind: fontist
|
|
220
|
+
label: noto-sans-adlam
|
|
221
|
+
priority: 1
|
|
222
|
+
|
|
223
|
+
# ... one entry per Unicode 17 block (~340 total) ...
|
|
224
|
+
|
|
225
|
+
# Blocks with no known Tier 1 font. The resolver falls through to
|
|
226
|
+
# Pillar 1 → Pillar 2 → Pillar 3 for these. Listed here for explicit
|
|
227
|
+
# documentation; resolver treats absent block_id same as empty sources.
|
|
228
|
+
no_tier1_font:
|
|
229
|
+
- Combining_Diacritical_Marks_Extended # additions: font support spotty
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
## Source kinds
|
|
233
|
+
|
|
234
|
+
- `fontist` — fontist-resolvable name. `FontLocator` finds/installs.
|
|
235
|
+
- `path` — explicit filesystem path. Used for local-only fonts
|
|
236
|
+
(FSung, NotoSerifTaiYo before upstreaming).
|
|
237
|
+
- `system` — system font via fontist's system index (macOS `/System`,
|
|
238
|
+
Linux `/usr/share/fonts`). Reserve for fallbacks.
|
|
239
|
+
|
|
240
|
+
`priority` is a per-block integer; lower wins. The resolver iterates
|
|
241
|
+
the block's `sources` in priority order; first hit wins.
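A minimal sketch of the first-hit rule described above, assuming a hypothetical `Source` struct with a `covers?` predicate. The real `Resolver` and `GlyphSource` APIs are specified in TODO 20 and in this TODO's model files, and may differ; the coverage ranges below are invented for illustration only.

```ruby
# Hypothetical stand-in for one `sources:` entry. `covered` is an invented
# codepoint range used only to make the example runnable.
Source = Struct.new(:label, :priority, :covered, keyword_init: true) do
  def covers?(codepoint)
    covered.include?(codepoint)
  end
end

# Returns the first source, in ascending priority order, that covers the
# codepoint, or nil when no listed source covers it.
def first_hit(sources, codepoint)
  sources.sort_by(&:priority).find { |source| source.covers?(codepoint) }
end

cjk_sources = [
  Source.new(label: "noto-sans-cjk-jp", priority: 99, covered: 0x4E00..0x9FFF),
  Source.new(label: "FSung-1", priority: 1, covered: 0x4E00..0x62FF),
  Source.new(label: "FSung-2", priority: 2, covered: 0x6300..0x77FF),
]

puts first_hit(cjk_sources, 0x4E2D).label # FSung-1
puts first_hit(cjk_sources, 0x9F99).label # noto-sans-cjk-jp
```

Sorting inside `first_hit` means the YAML list order does not have to match the `priority` values. A `nil` result corresponds to the case where the resolver falls through to the pillar sources.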
|
|
242
|
+
|
|
243
|
+
## Curation rules
|
|
244
|
+
|
|
245
|
+
1. **One font per script family where possible.** Don't list three
|
|
246
|
+
Latin fonts; pick one (Noto Sans) and let pillar 1-3 catch misses.
|
|
247
|
+
2. **CJK is the exception** — FSung is split across many files; one
|
|
248
|
+
entry per file with monotonic priority. The resolver loads all
|
|
249
|
+
of them; `fontist` fallback ensures the long tail still hits.
|
|
250
|
+
3. **Proprietary fonts never ship.** Sources with `license:
|
|
251
|
+
PROPRIETARY` are loaded for glyph extraction only; the extracted
|
|
252
|
+
SVG (open data) ships, the font file does not.
|
|
253
|
+
4. **Provenance is mandatory.** Every entry cites where the font comes
|
|
254
|
+
from and why it's the chosen source. Without provenance, the entry
|
|
255
|
+
is unreviewable.
|
|
256
|
+
5. **Versioned.** Bump the `ucode_version` field on every config edit.
|
|
257
|
+
Consumers can detect config drift vs the dataset.
|
|
258
|
+
|
|
259
|
+
## Source config loader
|
|
260
|
+
|
|
261
|
+
```ruby
require "pathname"
require "yaml"

class Ucode::Glyphs::SourceConfig
  # Default location of the curated Tier 1 map, relative to the repo root.
  DEFAULT_PATH = Pathname.new("config/unicode17_universal_glyph_set.yml")

  # Parses the YAML config into the typed model.
  #
  # @param yaml_path [Pathname]
  # @return [Ucode::Models::GlyphSourceMap]
  def self.load(yaml_path = DEFAULT_PATH)
    parsed = YAML.safe_load(yaml_path.read)
    Ucode::Models::GlyphSourceMap.from_hash(parsed)
  end
end
```
|
|
273
|
+
|
|
274
|
+
The loader validates:
|
|
275
|
+
- `unicode_version` matches the active UCD baseline (`Ucode.configuration.unicode_version`).
|
|
276
|
+
- Every block_id in `Blocks.txt` has an entry (empty `sources:` allowed).
|
|
277
|
+
- Every `path:` resolves to an existing file (warning, not error, for
|
|
278
|
+
paths under `~/Downloads` since those are user-local).
|
|
279
|
+
- Every `fontist:` label is known to fontist's index (warning if not).
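The checks above could be sketched roughly as follows. This is an illustration under assumptions, not the loader's actual code: `validate_source_map` and `ValidationResult` are hypothetical names, the version mismatch is treated as a hard error (the text does not state its severity), every missing font file is downgraded to a warning, and the fontist index lookup is left out because it depends on fontist's API.

```ruby
# Hypothetical result container: errors abort the load, warnings do not.
ValidationResult = Struct.new(:errors, :warnings)

# parsed:         the Hash produced by YAML.safe_load
# active_version: the UCD baseline version, e.g. "17.0.0"
# block_ids:      block IDs taken from Blocks.txt
def validate_source_map(parsed, active_version:, block_ids:)
  result = ValidationResult.new([], [])

  unless parsed["unicode_version"] == active_version
    result.errors << "unicode_version mismatch: expected #{active_version}"
  end

  map = parsed.fetch("map", {})
  block_ids.each do |block_id|
    result.warnings << "no entry for block #{block_id}" unless map.key?(block_id)
  end

  map.each do |block_id, entry|
    # A nil entry or a nil `sources:` both count as an empty source list.
    Array(entry && entry["sources"]).each do |source|
      next unless source["kind"] == "path" && source["path"]

      expanded = File.expand_path(source["path"])
      next if File.exist?(expanded)

      result.warnings << "#{block_id}: missing font file #{source['path']}"
    end
  end

  result
end
```

Collecting findings into one result object, rather than raising on the first problem, lets a single run report every missing block and font file at once.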
|
|
280
|
+
|
|
281
|
+
## Acceptance
|
|
282
|
+
|
|
283
|
+
- `config/unicode17_universal_glyph_set.yml` exists with one entry per
|
|
284
|
+
Unicode 17 block (~340 entries).
|
|
285
|
+
- Every Unicode 17 new block (Sidetic, Beria Erfe, Tai Yo, Tolong
|
|
286
|
+
Siki, Sharada Supplement, CJK Ext J, Symbols Legacy Supp, Supp
|
|
287
|
+
Arrows-C, Alchemical Symbols ext, Misc Symbols Supp, Musical Symbols
|
|
288
|
+
Supp) has at least one Tier 1 source.
|
|
289
|
+
- Every Egyptian Hieroglyphs block has UniHieroglyphica + Egyptian
|
|
290
|
+
Text entries.
|
|
291
|
+
- Loader specs cover: happy path, missing block (warn), invalid yaml
|
|
292
|
+
(raise), missing font file (warn).
|
|
293
|
+
- Smoke spec against `full.yml` confirms the file parses and every
|
|
294
|
+
block_id resolves to a `GlyphSource` array.
|
|
295
|
+
- Rubocop clean.
|
|
296
|
+
|
|
297
|
+
## Out of scope
|
|
298
|
+
|
|
299
|
+
- The resolver mechanics — TODO 20.
|
|
300
|
+
- The build that materializes glyphs from this config — TODO 24.
|
|
301
|
+
- The audit pipeline that uses the universal set as reference — TODO 25.
|
|
302
|
+
- Pillar 1/2/3 sources — these are not in the yaml; the resolver
|
|
303
|
+
appends them implicitly as fallbacks after Tier 1 sources.
|
|
304
|
+
|
|
305
|
+
## References
|
|
306
|
+
|
|
307
|
+
- Resolver mechanics: `TODO.new/20-canonical-resolver-4-tier.md`
|
|
308
|
+
- Universal build: `TODO.new/24-universal-glyph-set-build.md`
|
|
309
|
+
- Baseline data: `TODO.new/05-baseline-unicode17-coverage-audit.md`
|
|
310
|
+
- Architecture: `docs/architecture.md` §"The 4-tier glyph sourcing
|
|
311
|
+
strategy"
|
|
312
|
+
- FontLocator: `lib/ucode/glyphs/real_fonts/font_locator.rb`
|
|
@@ -0,0 +1,189 @@
|
|
|
1
|
+
# 24 — Universal glyph set build
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Materialize the universal glyph set: one SVG file per assigned Unicode
|
|
6
|
+
17 codepoint, sourced via the 4-tier resolver using the curated Tier 1
|
|
7
|
+
config from TODO 23. The set is the canonical reference for "what
|
|
8
|
+
Unicode 17 looks like" — every codepoint has exactly one glyph, with
|
|
9
|
+
documented provenance.
|
|
10
|
+
|
|
11
|
+
This is Part 1 of the user's three-part directive: build the FULL base
|
|
12
|
+
with full coverage so it can serve as the reference for font audits.
|
|
13
|
+
|
|
14
|
+
## What "universal" means
|
|
15
|
+
|
|
16
|
+
The universal set is:
|
|
17
|
+
|
|
18
|
+
- **Total**: every assigned codepoint has a glyph.
|
|
19
|
+
- **Single-sourced**: exactly one glyph per codepoint (no alternatives).
|
|
20
|
+
- **Provenance-tagged**: each glyph records its tier + source font.
|
|
21
|
+
- **Stable**: re-running with the same config + Unicode version
|
|
22
|
+
produces byte-identical SVGs.
|
|
23
|
+
- **Public**: derived SVGs are open data even when the source font is
|
|
24
|
+
proprietary.
|
|
25
|
+
|
|
26
|
+
The set is distinct from the per-codepoint Mode 1 dataset (TODO 21).
|
|
27
|
+
Mode 1 puts glyph.svg inside each codepoint's directory along with
|
|
28
|
+
full UCD properties. The universal set is glyph-only, in a flat
|
|
29
|
+
layout, designed for fast lookup by audits.
|
|
30
|
+
|
|
31
|
+
## Files to create
|
|
32
|
+
|
|
33
|
+
```
|
|
34
|
+
lib/ucode/glyphs/universal_set.rb # namespace hub
|
|
35
|
+
lib/ucode/glyphs/universal_set/builder.rb # iterates codepoints, calls resolver, writes
|
|
36
|
+
lib/ucode/glyphs/universal_set/manifest.rb # builds manifest.json with provenance rollup
|
|
37
|
+
lib/ucode/glyphs/universal_set/idempotency.rb # mtime + content-hash check
|
|
38
|
+
lib/ucode/models/universal_set_entry.rb # one manifest entry
|
|
39
|
+
lib/ucode/models/universal_set_manifest.rb # full manifest model
|
|
40
|
+
lib/ucode/commands/universal_set.rb # CLI: bin/ucode universal-set build
|
|
41
|
+
spec/ucode/glyphs/universal_set/builder_spec.rb
|
|
42
|
+
spec/ucode/glyphs/universal_set/manifest_spec.rb
|
|
43
|
+
spec/ucode/commands/universal_set_spec.rb
|
|
44
|
+
spec/fixtures/universal_set/minimal/ # small slice for fixture-driven specs
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
## Output layout
|
|
48
|
+
|
|
49
|
+
```
|
|
50
|
+
output/universal_glyph_set/
|
|
51
|
+
├── manifest.json # one entry per codepoint with provenance
|
|
52
|
+
├── glyphs/
|
|
53
|
+
│ ├── U+0000.svg
|
|
54
|
+
│ ├── U+0001.svg
|
|
55
|
+
│ ├── ...
|
|
56
|
+
│ ├── U+1F6A0.svg
|
|
57
|
+
│ └── ...
|
|
58
|
+
└── reports/
|
|
59
|
+
├── by_tier.json # tier-1: N1, pillar-1: N2, ...
|
|
60
|
+
├── by_block.json # per-block tier breakdown
|
|
61
|
+
└── gaps.json # assigned codepoints with no glyph (should be empty)
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
Filename pattern: `<U+XXXX>.svg` with uppercase hex, zero-padded to 4
|
|
65
|
+
digits (6 for codepoints above U+FFFF). Same convention as Mode 1.
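A small sketch of the naming rule, assuming a hypothetical helper called `glyph_filename`. The prose gives six digits for codepoints above U+FFFF while the layout listing shows `U+1F6A0.svg`, so the supplementary width is kept as a parameter here instead of being hard-coded.

```ruby
# Builds the SVG filename for a codepoint: uppercase hex, zero-padded to
# 4 digits in the BMP and to `supplementary_width` digits above U+FFFF.
def glyph_filename(codepoint, supplementary_width: 6)
  width = codepoint > 0xFFFF ? supplementary_width : 4
  format("U+%0#{width}X.svg", codepoint)
end

puts glyph_filename(0x41)                             # U+0041.svg
puts glyph_filename(0x1F6A0)                          # U+01F6A0.svg
puts glyph_filename(0x1F6A0, supplementary_width: 5)  # U+1F6A0.svg
```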
|
|
66
|
+
|
|
67
|
+
## Manifest shape
|
|
68
|
+
|
|
69
|
+
```json
|
|
70
|
+
{
|
|
71
|
+
"unicode_version": "17.0.0",
|
|
72
|
+
"ucode_version": "0.2.0",
|
|
73
|
+
"generated_at": "2026-06-27T12:00:00Z",
|
|
74
|
+
"source_config_sha256": "abc...",
|
|
75
|
+
"totals": {
|
|
76
|
+
"codepoints_assigned": 150012,
|
|
77
|
+
"codepoints_built": 150012,
|
|
78
|
+
"codepoints_skipped": 0,
|
|
79
|
+
"codepoints_failed": 0
|
|
80
|
+
},
|
|
81
|
+
"by_tier": {
|
|
82
|
+
"tier-1": 147512,
|
|
83
|
+
"pillar-1": 800,
|
|
84
|
+
"pillar-2": 200,
|
|
85
|
+
"pillar-3": 1500
|
|
86
|
+
},
|
|
87
|
+
"entries": [
|
|
88
|
+
{ "codepoint": 65, "id": "U+0041", "tier": "tier-1",
|
|
89
|
+
"source": "noto-sans", "svg_sha256": "def...",
|
|
90
|
+
"svg_size_bytes": 412 },
|
|
91
|
+
{ "codepoint": 67904, "id": "U+10940", "tier": "tier-1",
|
|
92
|
+
"source": "lentariso", "svg_sha256": "...",
|
|
93
|
+
"svg_size_bytes": 1820 },
|
|
94
|
+
...
|
|
95
|
+
]
|
|
96
|
+
}
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
The manifest is the single index into the set. Audits (TODO 25) read
|
|
100
|
+
the manifest, not the SVGs, for the "is this codepoint in the
|
|
101
|
+
universal set?" check.
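A minimal sketch of a manifest-only membership check, assuming the JSON shape shown above. `ManifestIndex` is a hypothetical name; the actual adapter is `UniversalSetReference` in TODO 25, whose interface may differ.

```ruby
require "json"

# Indexes manifest entries by integer codepoint so that membership checks
# never have to touch the SVG files.
class ManifestIndex
  def initialize(manifest)
    @entries = manifest.fetch("entries").to_h do |entry|
      [entry.fetch("codepoint"), entry]
    end
  end

  def include?(codepoint)
    @entries.key?(codepoint)
  end

  def entry_for(codepoint)
    @entries[codepoint]
  end
end

manifest = JSON.parse(<<~MANIFEST)
  {
    "unicode_version": "17.0.0",
    "entries": [
      { "codepoint": 65, "id": "U+0041", "tier": "tier-1", "source": "noto-sans" }
    ]
  }
MANIFEST

index = ManifestIndex.new(manifest)
puts index.include?(65) # true
puts index.include?(66) # false
```

Building the hash once keeps each lookup constant-time, which matters when an audit checks every codepoint in a font's cmap.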
|
|
102
|
+
|
|
103
|
+
## Build flow
|
|
104
|
+
|
|
105
|
+
```ruby
|
|
106
|
+
builder = Ucode::Glyphs::UniversalSet::Builder.new(
|
|
107
|
+
output_root: Pathname.new("output/universal_glyph_set"),
|
|
108
|
+
resolver: Ucode::Glyphs::Resolver.new(sources: resolver_sources),
|
|
109
|
+
unicode_version: "17.0.0",
|
|
110
|
+
parallel_workers: Ucode.configuration.parallel_workers,
|
|
111
|
+
)
|
|
112
|
+
builder.build
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
The builder:
|
|
116
|
+
|
|
117
|
+
1. Reads the assigned-codepoints list from the active UCD baseline.
|
|
118
|
+
2. For each codepoint, calls `resolver.resolve(codepoint)` → `Result`.
|
|
119
|
+
3. Writes `glyphs/<U+XXXX>.svg` atomically (reuse
|
|
120
|
+
`Ucode::Repo::AtomicWrites`).
|
|
121
|
+
4. Records the entry in the manifest.
|
|
122
|
+
5. Emits the manifest + reports at the end.
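The five steps above can be sketched as a small loop. This is an illustration under assumptions: the resolver is replaced by a lambda, `ResolvedGlyph` stands in for the resolver's `Result`, and the atomic write uses a temp file plus rename because the internals of `Ucode::Repo::AtomicWrites` are not shown here.

```ruby
require "fileutils"
require "json"
require "tmpdir"

# Hypothetical stand-in for the resolver's Result object.
ResolvedGlyph = Struct.new(:svg, :tier, :source, keyword_init: true)

# Writes to a sibling temp file, then renames it over the target, so a
# reader never sees a half-written file.
def write_atomically(path, content)
  tmp_path = "#{path}.tmp-#{Process.pid}"
  File.binwrite(tmp_path, content)
  File.rename(tmp_path, path)
end

# Resolves each codepoint, writes its SVG, and emits the manifest last.
def build_universal_set(codepoints, resolver, output_root)
  glyph_dir = File.join(output_root, "glyphs")
  FileUtils.mkdir_p(glyph_dir)

  entries = codepoints.map do |codepoint|
    glyph = resolver.call(codepoint)
    id = format("U+%04X", codepoint)
    write_atomically(File.join(glyph_dir, "#{id}.svg"), glyph.svg)
    { "codepoint" => codepoint, "id" => id, "tier" => glyph.tier, "source" => glyph.source }
  end

  manifest_json = JSON.pretty_generate({ "entries" => entries })
  write_atomically(File.join(output_root, "manifest.json"), manifest_json)
  entries
end

fake_resolver = lambda do |codepoint|
  ResolvedGlyph.new(svg: "<svg><!-- #{codepoint} --></svg>", tier: "tier-1", source: "noto-sans")
end

Dir.mktmpdir do |root|
  build_universal_set([0x41, 0x42], fake_resolver, root)
  puts Dir.children(File.join(root, "glyphs")).sort.inspect # ["U+0041.svg", "U+0042.svg"]
end
```

The sketch is sequential; the real builder takes `parallel_workers`, which would require the manifest accumulation to be thread-safe.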
|
|
123
|
+
|
|
124
|
+
Idempotency follows Mode 1's pattern: a codepoint whose source font
|
|
125
|
+
mtime + content hash are unchanged is skipped. Re-running with one
|
|
126
|
+
new Tier 1 font re-resolves only the codepoints the new font covers.
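One way to picture the skip rule, as a sketch that assumes the previous build recorded a fingerprint per source font. `font_fingerprint` and `changed_sources` are hypothetical helper names, not the API of `idempotency.rb`.

```ruby
require "digest"
require "tmpdir"

# Fingerprint of a source font: modification time plus content hash.
def font_fingerprint(path)
  { "mtime" => File.mtime(path).to_i, "sha256" => Digest::SHA256.file(path).hexdigest }
end

# Labels of fonts whose fingerprint differs from the recorded one. A font
# with no recorded fingerprint (a newly added source) also counts as changed.
def changed_sources(font_paths, recorded)
  font_paths.reject { |label, path| recorded[label] == font_fingerprint(path) }.keys
end

Dir.mktmpdir do |dir|
  fonts = { "font-a" => File.join(dir, "a.ttf"), "font-b" => File.join(dir, "b.ttf") }
  fonts.each { |label, path| File.binwrite(path, label) }
  recorded = fonts.transform_values { |path| font_fingerprint(path) }

  File.binwrite(fonts["font-b"], "font-b, revised")
  puts changed_sources(fonts, recorded).inspect # ["font-b"]
end
```

Only codepoints resolved from a changed source would need to be rebuilt; everything else can be skipped without touching the output files.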
|
|
127
|
+
|
|
128
|
+
## CLI
|
|
129
|
+
|
|
130
|
+
```bash
|
|
131
|
+
bin/ucode universal-set build \
|
|
132
|
+
--version 17.0.0 \
|
|
133
|
+
--source-config config/unicode17_universal_glyph_set.yml \
|
|
134
|
+
--output output/universal_glyph_set \
|
|
135
|
+
[--parallel 8] \
|
|
136
|
+
[--block Sidetic] # optional: build only one block
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
Output: stdout reports progress; final manifest at the output root.
|
|
140
|
+
|
|
141
|
+
## Provenance recording
|
|
142
|
+
|
|
143
|
+
Every `Result` from the resolver carries `tier` and `provenance`. The
|
|
144
|
+
builder copies these into the manifest entry. Per-tier counts are
|
|
145
|
+
rolled up from the entry list.
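The rollup can be sketched in a few lines, assuming entries shaped like the manifest example above. `by_tier` here is a hypothetical helper, not the accumulator's real method.

```ruby
# Counts manifest entries per tier label.
def by_tier(entries)
  entries.map { |entry| entry.fetch("tier") }.tally
end

sample_entries = [
  { "codepoint" => 0x41, "tier" => "tier-1", "source" => "noto-sans" },
  { "codepoint" => 0x42, "tier" => "tier-1", "source" => "noto-sans" },
  { "codepoint" => 0xE000, "tier" => "pillar-3", "source" => "last-resort" },
]

by_tier(sample_entries).each { |tier, count| puts "#{tier}: #{count}" }
```

Deriving the counts from the entry list, instead of keeping separate counters, guarantees that the `by_tier` totals always sum to the number of entries.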
|
|
146
|
+
|
|
147
|
+
Special: pillar 3 (Last Resort) glyphs are visually identical tofu
|
|
148
|
+
boxes; their `provenance` is `"pillar-3:last-resort"` and their
|
|
149
|
+
`source` field records the Last Resort UFO version. This makes pillar
|
|
150
|
+
3 coverage visible in the audit drill-down (TODO 26) so users know
|
|
151
|
+
"this glyph is a placeholder; we don't have a real outline."
|
|
152
|
+
|
|
153
|
+
## Acceptance
|
|
154
|
+
|
|
155
|
+
- `bin/ucode universal-set build` completes against Unicode 17.0
|
|
156
|
+
without errors.
|
|
157
|
+
- `output/universal_glyph_set/manifest.json` shows
|
|
158
|
+
`codepoints_built == codepoints_assigned`.
|
|
159
|
+
- `reports/gaps.json` is empty (or documents each gap with a reason).
|
|
160
|
+
- Re-running with no source changes produces zero file writes
|
|
161
|
+
(idempotency check).
|
|
162
|
+
- `--block Sidetic` produces only the Sidetic glyphs (~26 files);
|
|
163
|
+
manifest reflects the partial build.
|
|
164
|
+
- A new Tier 1 font addition (e.g. adding a Sidetic font) re-resolves
|
|
165
|
+
only Sidetic; manifest delta shows old pillar-1 entries flipping to
|
|
166
|
+
tier-1.
|
|
167
|
+
- Specs cover: builder happy path (small fixture set), idempotency,
|
|
168
|
+
per-block scoping, manifest serialization round-trip.
|
|
169
|
+
- Rubocop clean.
|
|
170
|
+
|
|
171
|
+
## Out of scope
|
|
172
|
+
|
|
173
|
+
- The Tier 1 source config (TODO 23).
|
|
174
|
+
- Resolver mechanics (TODO 20).
|
|
175
|
+
- Audits that consume the set (TODO 25).
|
|
176
|
+
- Per-codepoint Mode 1 dataset (TODO 21). The universal set is
|
|
177
|
+
separate; it does not replace Mode 1.
|
|
178
|
+
- Site rendering of the universal set (that's a TODO 26 / fontist.org
|
|
179
|
+
concern).
|
|
180
|
+
|
|
181
|
+
## References
|
|
182
|
+
|
|
183
|
+
- Source config: `TODO.new/23-universal-glyph-set-source-map.md`
|
|
184
|
+
- Resolver: `TODO.new/20-canonical-resolver-4-tier.md`
|
|
185
|
+
- Mode 1 build: `TODO.new/21-canonical-unicode17-build.md`
|
|
186
|
+
- Audit consumer: `TODO.new/25-font-audit-against-universal-set.md`
|
|
187
|
+
- AtomicWrites: `lib/ucode/repo/atomic_writes.rb`
|
|
188
|
+
- Existing pillar implementations: `lib/ucode/glyphs/{real_fonts,
|
|
189
|
+
embedded_fonts,last_resort}/`
|
|
@@ -0,0 +1,195 @@
|
|
|
1
|
+
# 25 — Font audit against universal set
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Replace the current cmap-vs-UCD coverage audit with a cmap-vs-universal-set
|
|
6
|
+
audit. The font's coverage is now compared against the universal glyph set
|
|
7
|
+
(TODO 24) — one glyph per assigned codepoint — instead of against the
|
|
8
|
+
abstract UCD codepoint list.
|
|
9
|
+
|
|
10
|
+
This is Part 2 of the user's three-part directive. The universal set
|
|
11
|
+
becomes the reference for "what could be rendered." A font's coverage
|
|
12
|
+
report shows not just "1,500 codepoints covered" but "1,500 of the
|
|
13
|
+
150,012 Unicode-17-representable glyphs."
|
|
14
|
+
|
|
15
|
+
## Why universal-set reference, not UCD codepoint list
|
|
16
|
+
|
|
17
|
+
Today's audit (TODOs 04, 11, 13) compares a font's cmap against the
|
|
18
|
+
abstract set of assigned Unicode 17 codepoints. That's correct but
|
|
19
|
+
abstract — a consumer can't see "what does the missing codepoint
|
|
20
|
+
look like?"
|
|
21
|
+
|
|
22
|
+
By comparing against the universal glyph set instead:
|
|
23
|
+
|
|
24
|
+
- Every "missing" codepoint has a renderable glyph the consumer can
|
|
25
|
+
preview (TODO 26).
|
|
26
|
+
- Tier provenance is visible: "this font is missing U+10940 SIDETIC
LETTER N01, which the universal set sources from Lentariso."
|
|
28
|
+
- Audits across fonts are directly comparable: two fonts both missing
|
|
29
|
+
"all of Sidetic" show the same gap, in the same way.
|
|
30
|
+
|
|
31
|
+
Mechanically, the universal set's codepoint list == the assigned
|
|
32
|
+
codepoint list. The audit logic is identical; the difference is that
|
|
33
|
+
every codepoint has an attached glyph + provenance that the renderer
|
|
34
|
+
(TODO 14, TODO 26) can surface.
|
|
35
|
+
|
|
36
|
+
## Files to create / change
|
|
37
|
+
|
|
38
|
+
- `lib/ucode/audit/universal_set_reference.rb` — adapter that wraps
|
|
39
|
+
the universal-set manifest as a `CoverageReference` (interface below).
|
|
40
|
+
- `lib/ucode/audit/coverage_reference.rb` — common interface for any
|
|
41
|
+
"what's the assigned codepoint set" reference (UCD-only and
|
|
42
|
+
universal-set both implement).
|
|
43
|
+
- `lib/ucode/audit/extractors/aggregations.rb` — change to accept a
|
|
44
|
+
`CoverageReference` instead of always reading UCD directly. Default:
|
|
45
|
+
universal-set reference if available; fall back to UCD-only.
|
|
46
|
+
- `lib/ucode/audit/face_auditor.rb` — accept `reference:` kwarg;
|
|
47
|
+
thread it through to extractors.
|
|
48
|
+
- `lib/ucode/audit/library_auditor.rb` — same.
|
|
49
|
+
- `lib/ucode/commands/audit.rb` (new, was originally going to be TODO
|
|
50
|
+
16's CLI) — `ucode audit font` now takes
|
|
51
|
+
the `--reference-universal-set=<path>` flag (default: enabled if the
|
|
52
|
+
universal set exists).
|
|
53
|
+
- Specs:
|
|
54
|
+
- `spec/ucode/audit/universal_set_reference_spec.rb`
|
|
55
|
+
- `spec/ucode/audit/extractors/aggregations_with_universal_set_spec.rb`
|
|
56
|
+
- `spec/ucode/commands/audit_with_universal_set_spec.rb`
|
|
57
|
+
|
|
58
|
+
## CoverageReference interface
|
|
59
|
+
|
|
60
|
+
```ruby
|
|
61
|
+
class Ucode::Audit::CoverageReference
|
|
62
|
+
Entry = Struct.new(:codepoint, :id, :tier, :source, keyword_init: true)
|
|
63
|
+
|
|
64
|
+
# @param codepoint [Integer]
|
|
65
|
+
# @return [Boolean]
|
|
66
|
+
def include?(codepoint)
|
|
67
|
+
raise NotImplementedError
|
|
68
|
+
end
|
|
69
|
+
|
|
70
|
+
# @param block_id [String] verbatim block name
|
|
71
|
+
# @return [Array<Entry>] every assigned codepoint in the block,
|
|
72
|
+
# with tier + source from the universal-set manifest
|
|
73
|
+
def entries_for_block(block_id)
|
|
74
|
+
raise NotImplementedError
|
|
75
|
+
end
|
|
76
|
+
|
|
77
|
+
# @return [String] e.g. "ucd-17.0.0", "universal-set:17.0.0:sha256"
|
|
78
|
+
def reference_id
|
|
79
|
+
raise NotImplementedError
|
|
80
|
+
end
|
|
81
|
+
|
|
82
|
+
# @return [Hash{String=>String}] provenance metadata for the report
|
|
83
|
+
def baseline_metadata
|
|
84
|
+
raise NotImplementedError
|
|
85
|
+
end
|
|
86
|
+
end
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
Two concrete implementations:
|
|
90
|
+
|
|
91
|
+
- `Ucode::Audit::UcdOnlyReference` — reads `Blocks.txt` and assigned
|
|
92
|
+
codepoints from the active UCD database. `Entry#tier` and `Entry#source` are nil.
|
|
93
|
+
- `Ucode::Audit::UniversalSetReference` — reads the universal-set
|
|
94
|
+
manifest (TODO 24). Every entry carries tier + source.
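A minimal sketch of how a universal-set adapter could satisfy the interface above. The manifest shape (`unicode_version`, `entries`, and a per-entry `block` key), the `SketchUniversalSetReference` class name, and the `SAMPLE_MANIFEST` constant are assumptions made for illustration; the real manifest format is defined by TODO 24 and may differ. The codepoint values are copied from the example JSON in this document, and the digest suffix of `reference_id` is left out.

```ruby
# Stand-in for CoverageReference::Entry so the sketch runs on its own.
SketchEntry = Struct.new(:codepoint, :id, :tier, :source, keyword_init: true)

# Hypothetical manifest, already parsed from JSON. Every key is an assumption.
SAMPLE_MANIFEST = {
  "unicode_version" => "17.0.0",
  "entries" => [
    { "codepoint" => 10982, "id" => "g-10982", "block" => "Sidetic",
      "tier" => "tier-1", "source" => "lentariso" },
    { "codepoint" => 10981, "id" => "g-10981", "block" => "Sidetic",
      "tier" => "tier-1", "source" => "lentariso" }
  ]
}.freeze

class SketchUniversalSetReference
  def initialize(manifest)
    @version = manifest.fetch("unicode_version")
    @by_block = Hash.new { |hash, key| hash[key] = [] }
    @known = {}
    manifest.fetch("entries").each do |raw|
      entry = SketchEntry.new(codepoint: raw.fetch("codepoint"), id: raw["id"],
                              tier: raw["tier"], source: raw["source"])
      @by_block[raw.fetch("block")] << entry
      @known[entry.codepoint] = true
    end
  end

  # True when the codepoint has an entry in the manifest.
  def include?(codepoint)
    @known.key?(codepoint)
  end

  # Entries for one block, sorted by codepoint; empty for unknown blocks.
  def entries_for_block(block_id)
    @by_block.fetch(block_id, []).sort_by(&:codepoint)
  end

  def reference_id
    "universal-set:#{@version}"
  end

  def baseline_metadata
    { "reference_kind" => "universal-set", "unicode_version" => @version }
  end
end
```

A UCD-only counterpart would expose the same four methods and leave `tier` and `source` as nil on every entry.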
|
|
95
|
+
|
|
96
|
+
## Aggregation changes
|
|
97
|
+
|
|
98
|
+
`BlockAggregator` previously took `block_total_assigned:` integer from
|
|
99
|
+
the UCD-only baseline. It now takes a `CoverageReference` and calls
|
|
100
|
+
`reference.entries_for_block(block_id)` to get the per-codepoint list.
|
|
101
|
+
For each block, the summary includes:
|
|
102
|
+
|
|
103
|
+
- `covered_count` — codepoints in this block that the font's cmap covers.
|
|
104
|
+
- `missing_codepoints` — codepoints in this block that the font's cmap
|
|
105
|
+
does NOT cover, with universal-set entry attached for renderer drill-down.
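The per-block summary described above could be computed roughly as follows. This is a sketch, not the real `BlockAggregator`: the `summarize_block` helper, its `with_provenance:` keyword, and the `AggEntry` struct are hypothetical names, and the font's cmap is modelled as a plain `Set` of integers.

```ruby
require "set"

# Stand-in for CoverageReference::Entry so the sketch runs on its own.
AggEntry = Struct.new(:codepoint, :id, :tier, :source, keyword_init: true)

# entries:         what reference.entries_for_block(block_id) would return
# cmap_codepoints: Set of codepoints the font's cmap covers
# with_provenance: true only when the reference is a universal-set reference
def summarize_block(name, entries, cmap_codepoints, with_provenance:)
  covered, missing = entries.partition { |e| cmap_codepoints.include?(e.codepoint) }

  summary = {
    "name" => name,
    "covered_count" => covered.size,
    "missing_codepoints" => missing.map(&:codepoint)
  }

  # The provenance field is left out entirely for a UCD-only reference,
  # so the existing report schema stays unchanged in that case.
  if with_provenance
    summary["missing_codepoint_provenance"] = missing.map do |e|
      { "codepoint" => e.codepoint, "tier" => e.tier, "source" => e.source }
    end
  end

  summary
end
```

Passing the flag explicitly, rather than inferring it from nil tiers, keeps the field present (as an empty array) when a universal-set audit happens to have no missing codepoints in a block.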
|
|
106
|
+
|
|
107
|
+
The `AuditReport.baseline` field gains a `reference_kind` ("ucd" or
|
|
108
|
+
"universal-set") so consumers know which kind of reference produced
|
|
109
|
+
the per-block counts.
|
|
110
|
+
|
|
111
|
+
## Report shape delta
|
|
112
|
+
|
|
113
|
+
Existing `block_summaries[i]` (per TODO 03 + 04) carries
|
|
114
|
+
`missing_codepoints: [Integer]`. New optional field per
|
|
115
|
+
`BlockSummary`:
|
|
116
|
+
|
|
117
|
+
```json
|
|
118
|
+
{
|
|
119
|
+
"name": "Sidetic",
|
|
120
|
+
...
|
|
121
|
+
"missing_codepoints": [10981, 10982, ...],
|
|
122
|
+
"missing_codepoint_provenance": [
|
|
123
|
+
{ "codepoint": 10981, "tier": "tier-1", "source": "lentariso" },
|
|
124
|
+
...
|
|
125
|
+
]
|
|
126
|
+
}
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
`missing_codepoint_provenance` is only populated when the reference is
|
|
130
|
+
a `UniversalSetReference`. `UcdOnlyReference` produces the existing
|
|
131
|
+
schema (no provenance).
|
|
132
|
+
|
|
133
|
+
This is an additive change. Old consumers ignore the new field. The
|
|
134
|
+
contract (TODO 04) calls this out as a minor version bump.
|
|
135
|
+
|
|
136
|
+
## Backwards compatibility
|
|
137
|
+
|
|
138
|
+
- `ucode audit font` without a universal set behaves exactly as today
|
|
139
|
+
(UCD-only reference).
|
|
140
|
+
- `ucode audit font` with `--reference-universal-set=<path>` switches
|
|
141
|
+
to universal-set reference. Without the flag, it looks for the manifest
|
|
142
|
+
at `output/universal_glyph_set/manifest.json`; if present, use it;
|
|
143
|
+
if absent, warn and fall back to UCD-only.
|
|
144
|
+
|
|
145
|
+
This means CI runs that haven't built the universal set yet continue
|
|
146
|
+
to pass. The new functionality is opt-in via presence of the manifest.
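The fallback rule above can be sketched as a small resolver. The `resolve_reference` helper, its keyword names, and the returned `[kind, path]` pair are hypothetical. Treating a missing explicit path the same way as a missing default manifest is also an assumption, since this document does not say what happens in that case.

```ruby
DEFAULT_MANIFEST_PATH = "output/universal_glyph_set/manifest.json"

# flag_path: value of --reference-universal-set, or nil when the flag is absent
# exists:    callable used to probe the filesystem (injectable for specs)
# warn_io:   where the fallback warning is written
def resolve_reference(flag_path: nil, exists: File.method(:exist?), warn_io: $stderr)
  path = flag_path || DEFAULT_MANIFEST_PATH
  return [:universal_set, path] if exists.call(path)

  # No manifest on disk: warn, then keep today's UCD-only behaviour.
  warn_io.puts("universal-set manifest not found at #{path}; falling back to UCD-only")
  [:ucd, nil]
end
```

Writing the warning to a stream other than the report output is what would let the "no flag, no manifest" run stay byte-identical to today's report.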
|
|
147
|
+
|
|
148
|
+
## Cross-font comparison
|
|
149
|
+
|
|
150
|
+
A new optional output: `output/font_audit/_comparison/<label1>_vs_<label2>.json`
|
|
151
|
+
produced by:
|
|
152
|
+
|
|
153
|
+
```bash
|
|
154
|
+
bin/ucode audit compare <label1> <label2>
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
The command diffs two audits that share the same blocks and codepoints
|
|
158
|
+
but differ in coverage. It powers "Inter covers these N codepoints that Arial misses"
|
|
159
|
+
visualizations on fontist.org.
|
|
160
|
+
|
|
161
|
+
Implementation: extends `Ucode::Audit::Differ` to compare two
|
|
162
|
+
`AuditReport`s at the codepoint level (current `Differ` compares
|
|
163
|
+
fields and structural inventories; new mode compares per-block
|
|
164
|
+
coverage).
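A rough sketch of the codepoint-level comparison, assuming both reports were produced against the same reference and are available as parsed JSON hashes with the `block_summaries`, `name`, and `missing_codepoints` fields shown earlier. The `compare_coverage` helper and its output keys are hypothetical and are not part of the existing `Differ`.

```ruby
# Returns one hash per block whose coverage differs between the two reports.
def compare_coverage(report_a, report_b)
  blocks_b = report_b.fetch("block_summaries").to_h { |b| [b.fetch("name"), b] }

  report_a.fetch("block_summaries").filter_map do |block_a|
    block_b = blocks_b[block_a.fetch("name")]
    next unless block_b # block absent from the second report

    missing_a = block_a.fetch("missing_codepoints")
    missing_b = block_b.fetch("missing_codepoints")

    # With a shared reference, a codepoint missing from only one font
    # is necessarily covered by the other.
    only_a = missing_b - missing_a
    only_b = missing_a - missing_b
    next if only_a.empty? && only_b.empty?

    {
      "name" => block_a.fetch("name"),
      "covered_only_by_a" => only_a,
      "covered_only_by_b" => only_b
    }
  end
end
```

Blocks with identical coverage are dropped, so the result holds only the cells a visualization would need to highlight.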
|
|
165
|
+
|
|
166
|
+
## Acceptance
|
|
167
|
+
|
|
168
|
+
- `UniversalSetReference` round-trips the universal-set manifest into
|
|
169
|
+
the `CoverageReference` interface correctly (specs).
|
|
170
|
+
- `FaceAuditor` accepts `reference:` kwarg; defaults to UCD-only when
|
|
171
|
+
omitted.
|
|
172
|
+
- `BlockAggregator` produces `missing_codepoint_provenance` when given
|
|
173
|
+
a `UniversalSetReference`; omits the field for `UcdOnlyReference`.
|
|
174
|
+
- `bin/ucode audit font <path> --reference-universal-set=<manifest>`
|
|
175
|
+
produces a report where every missing codepoint carries provenance.
|
|
176
|
+
- `bin/ucode audit font <path>` (no flag, no manifest on disk) is
|
|
177
|
+
byte-identical to today's output (regression check).
|
|
178
|
+
- `bin/ucode audit compare` produces a per-block per-codepoint diff.
|
|
179
|
+
- Rubocop clean.
|
|
180
|
+
|
|
181
|
+
## Out of scope
|
|
182
|
+
|
|
183
|
+
- The drill-down HTML view that renders the universal glyphs next to
|
|
184
|
+
each missing codepoint — TODO 26.
|
|
185
|
+
- The fontist.org consumer side that surfaces the new field — TODO 27.
|
|
186
|
+
- The universal set build itself — TODO 24.
|
|
187
|
+
|
|
188
|
+
## References
|
|
189
|
+
|
|
190
|
+
- Universal set build: `TODO.new/24-universal-glyph-set-build.md`
|
|
191
|
+
- HTML browser: `TODO.new/14-html-face-browser.md`
|
|
192
|
+
- fontist.org contract: `TODO.new/04-fontist-org-contract.md`
|
|
193
|
+
- Existing Differ: `lib/ucode/audit/differ.rb`
|
|
194
|
+
- Existing aggregations extractor:
|
|
195
|
+
`lib/ucode/audit/extractors/aggregations.rb`
|