ucode 0.1.0 → 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +72 -0
- data/Gemfile.lock +2 -2
- data/TODO.full/00-README.md +116 -0
- data/TODO.full/01-panglyph-vision.md +112 -0
- data/TODO.full/02-panglyph-repo-bootstrap.md +184 -0
- data/TODO.full/03-panglyph-font-builder.md +201 -0
- data/TODO.full/04-panglyph-publish-pipeline.md +126 -0
- data/TODO.full/05-ucode-0-1-1-release.md +139 -0
- data/TODO.full/06-fontisan-remove-audit.md +142 -0
- data/TODO.full/07-fontisan-remove-ucd.md +125 -0
- data/TODO.full/08-archive-private-bin-build.md +143 -0
- data/TODO.full/09-archive-public-structure.md +164 -0
- data/TODO.full/10-fontist-org-woff-glyphs.md +131 -0
- data/TODO.full/11-fontist-org-audit-coverage.md +140 -0
- data/TODO.full/12-implementation-order.md +216 -0
- data/TODO.full/13-fontisan-font-writer-api.md +189 -0
- data/TODO.full/14-fontisan-table-writers.md +66 -0
- data/TODO.full/15-panglyph-builder-real.md +82 -0
- data/TODO.full/16-archive-public-sync-workflows.md +167 -0
- data/TODO.full/17-fontist-org-font-picker.md +73 -0
- data/TODO.full/18-comprehensive-spec-coverage.md +64 -0
- data/TODO.full/19-ucode-0-1-2-patch.md +32 -0
- data/TODO.full/20-fontisan-0-2-23-release.md +52 -0
- data/TODO.new/00-README.md +30 -0
- data/TODO.new/23-universal-glyph-set-source-map.md +312 -0
- data/TODO.new/24-universal-glyph-set-build.md +189 -0
- data/TODO.new/25-font-audit-against-universal-set.md +195 -0
- data/TODO.new/26-missing-glyph-reporter.md +189 -0
- data/TODO.new/27-fontist-org-consumer-integration.md +200 -0
- data/TODO.new/28-implementation-order-update.md +187 -0
- data/TODO.new/29-universal-set-curation-uc17.md +312 -0
- data/TODO.new/30-tier1-font-acquisition.md +241 -0
- data/TODO.new/31-universal-set-production-build.md +205 -0
- data/TODO.new/32-uc17-coverage-matrix.md +165 -0
- data/TODO.new/33-specialist-font-acquisition-refresh.md +138 -0
- data/TODO.new/34-pillar2-content-stream-correlator.md +147 -0
- data/TODO.new/35-universal-set-production-run.md +160 -0
- data/TODO.new/36-per-font-coverage-audit.md +145 -0
- data/TODO.new/37-coverage-highlight-reporter.md +125 -0
- data/TODO.new/38-fontist-org-glyph-consumer.md +141 -0
- data/TODO.new/39-implementation-order-update-32-38.md +258 -0
- data/TODO.new/40-archive-private-uses-ucode-audit.md +124 -0
- data/TODO.new/41-ucode-unicode-archive-bridge.md +160 -0
- data/config/specialist_fonts.yml +102 -0
- data/config/unicode17_tier1_fonts.yml +42 -0
- data/config/unicode17_universal_glyph_set.yml +293 -0
- data/lib/ucode/audit/block_aggregator.rb +57 -29
- data/lib/ucode/audit/browser/face_page.rb +128 -0
- data/lib/ucode/audit/browser/glyph_panel.rb +124 -0
- data/lib/ucode/audit/browser/library_page.rb +74 -0
- data/lib/ucode/audit/browser/missing_glyph_page.rb +87 -0
- data/lib/ucode/audit/browser/template.rb +47 -0
- data/lib/ucode/audit/browser/templates/face.css +200 -0
- data/lib/ucode/audit/browser/templates/face.html.erb +41 -0
- data/lib/ucode/audit/browser/templates/face.js +298 -0
- data/lib/ucode/audit/browser/templates/library.css +119 -0
- data/lib/ucode/audit/browser/templates/library.html.erb +42 -0
- data/lib/ucode/audit/browser/templates/library.js +99 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.css +119 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.html.erb +58 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.js +2 -0
- data/lib/ucode/audit/browser.rb +32 -0
- data/lib/ucode/audit/context.rb +27 -1
- data/lib/ucode/audit/coverage_reference.rb +103 -0
- data/lib/ucode/audit/differ.rb +121 -0
- data/lib/ucode/audit/emitter/block_emitter.rb +52 -0
- data/lib/ucode/audit/emitter/codepoint_emitter.rb +87 -0
- data/lib/ucode/audit/emitter/collection_emitter.rb +80 -0
- data/lib/ucode/audit/emitter/face_directory.rb +212 -0
- data/lib/ucode/audit/emitter/glyph_emitter.rb +48 -0
- data/lib/ucode/audit/emitter/index_emitter.rb +149 -0
- data/lib/ucode/audit/emitter/library_emitter.rb +96 -0
- data/lib/ucode/audit/emitter/paths.rb +312 -0
- data/lib/ucode/audit/emitter/plane_emitter.rb +29 -0
- data/lib/ucode/audit/emitter/script_emitter.rb +29 -0
- data/lib/ucode/audit/emitter.rb +29 -0
- data/lib/ucode/audit/extractors/aggregations.rb +31 -2
- data/lib/ucode/audit/face_auditor.rb +86 -0
- data/lib/ucode/audit/formatters/audit_diff_text.rb +112 -0
- data/lib/ucode/audit/formatters/audit_text.rb +411 -0
- data/lib/ucode/audit/formatters/color.rb +48 -0
- data/lib/ucode/audit/formatters/library_summary_text.rb +98 -0
- data/lib/ucode/audit/formatters/text_formatter.rb +83 -0
- data/lib/ucode/audit/formatters.rb +23 -0
- data/lib/ucode/audit/library_aggregator.rb +86 -0
- data/lib/ucode/audit/library_auditor.rb +105 -0
- data/lib/ucode/audit/release/emitter.rb +152 -0
- data/lib/ucode/audit/release/face_card.rb +93 -0
- data/lib/ucode/audit/release/formula_audits.rb +50 -0
- data/lib/ucode/audit/release/library_index_builder.rb +78 -0
- data/lib/ucode/audit/release/manifest_builder.rb +127 -0
- data/lib/ucode/audit/release.rb +42 -0
- data/lib/ucode/audit/ucd_only_reference.rb +81 -0
- data/lib/ucode/audit/universal_set_reference.rb +136 -0
- data/lib/ucode/audit.rb +31 -0
- data/lib/ucode/cli.rb +339 -33
- data/lib/ucode/commands/audit/browser_command.rb +82 -0
- data/lib/ucode/commands/audit/collection_command.rb +103 -0
- data/lib/ucode/commands/audit/compare_command.rb +188 -0
- data/lib/ucode/commands/audit/font_command.rb +140 -0
- data/lib/ucode/commands/audit/library_command.rb +87 -0
- data/lib/ucode/commands/audit/reference_builder.rb +64 -0
- data/lib/ucode/commands/audit.rb +20 -0
- data/lib/ucode/commands/block_feed.rb +73 -0
- data/lib/ucode/commands/canonical_build.rb +138 -0
- data/lib/ucode/commands/fetch.rb +37 -1
- data/lib/ucode/commands/release.rb +115 -0
- data/lib/ucode/commands/universal_set.rb +211 -0
- data/lib/ucode/commands.rb +5 -0
- data/lib/ucode/coordinator/indices.rb +11 -0
- data/lib/ucode/coordinator.rb +138 -5
- data/lib/ucode/error.rb +30 -2
- data/lib/ucode/fetch/font_fetcher/result.rb +39 -0
- data/lib/ucode/fetch/font_fetcher.rb +16 -0
- data/lib/ucode/fetch/specialist_font_fetcher.rb +280 -0
- data/lib/ucode/fetch.rb +7 -3
- data/lib/ucode/glyphs/real_fonts/cmap_cache.rb +74 -0
- data/lib/ucode/glyphs/real_fonts.rb +1 -0
- data/lib/ucode/glyphs/resolver.rb +62 -0
- data/lib/ucode/glyphs/source.rb +48 -0
- data/lib/ucode/glyphs/source_builder.rb +61 -0
- data/lib/ucode/glyphs/source_config/coverage_assertion.rb +79 -0
- data/lib/ucode/glyphs/source_config/gap_report.rb +54 -0
- data/lib/ucode/glyphs/source_config.rb +104 -0
- data/lib/ucode/glyphs/sources/pillar1_embedded_tounicode.rb +63 -0
- data/lib/ucode/glyphs/sources/pillar3_last_resort.rb +51 -0
- data/lib/ucode/glyphs/sources/tier1_real_font.rb +104 -0
- data/lib/ucode/glyphs/sources.rb +20 -0
- data/lib/ucode/glyphs/universal_set/builder.rb +161 -0
- data/lib/ucode/glyphs/universal_set/coverage_report.rb +139 -0
- data/lib/ucode/glyphs/universal_set/idempotency.rb +86 -0
- data/lib/ucode/glyphs/universal_set/manifest_accumulator.rb +195 -0
- data/lib/ucode/glyphs/universal_set/manifest_writer.rb +61 -0
- data/lib/ucode/glyphs/universal_set/pre_build_check.rb +197 -0
- data/lib/ucode/glyphs/universal_set/validator.rb +204 -0
- data/lib/ucode/glyphs/universal_set.rb +45 -0
- data/lib/ucode/glyphs.rb +6 -0
- data/lib/ucode/models/audit/baseline.rb +6 -0
- data/lib/ucode/models/audit/block_summary.rb +7 -0
- data/lib/ucode/models/audit/codepoint_provenance.rb +39 -0
- data/lib/ucode/models/audit/release_face.rb +42 -0
- data/lib/ucode/models/audit/release_formula.rb +33 -0
- data/lib/ucode/models/audit/release_manifest.rb +43 -0
- data/lib/ucode/models/audit/release_universal_set.rb +37 -0
- data/lib/ucode/models/audit.rb +9 -0
- data/lib/ucode/models/block.rb +2 -0
- data/lib/ucode/models/build_report.rb +109 -0
- data/lib/ucode/models/codepoint/glyph.rb +42 -0
- data/lib/ucode/models/codepoint.rb +3 -0
- data/lib/ucode/models/glyph_source.rb +86 -0
- data/lib/ucode/models/glyph_source_map.rb +138 -0
- data/lib/ucode/models/specialist_font.rb +70 -0
- data/lib/ucode/models/specialist_font_manifest.rb +48 -0
- data/lib/ucode/models/unihan_entry.rb +81 -9
- data/lib/ucode/models/unihan_field.rb +21 -0
- data/lib/ucode/models/universal_set_entry.rb +47 -0
- data/lib/ucode/models/universal_set_manifest.rb +78 -0
- data/lib/ucode/models/validation_report.rb +99 -0
- data/lib/ucode/models.rb +9 -0
- data/lib/ucode/parsers/named_sequences.rb +5 -5
- data/lib/ucode/parsers/unihan.rb +50 -19
- data/lib/ucode/repo/aggregate_writer.rb +34 -2
- data/lib/ucode/repo/block_feed_emitter.rb +153 -0
- data/lib/ucode/repo/build_report_accumulator.rb +138 -0
- data/lib/ucode/repo/build_report_writer.rb +46 -0
- data/lib/ucode/repo/build_validator.rb +229 -0
- data/lib/ucode/repo/codepoint_writer.rb +50 -1
- data/lib/ucode/repo/paths.rb +8 -0
- data/lib/ucode/repo.rb +4 -0
- data/lib/ucode/version.rb +1 -1
- data/schema/block-feed.output.schema.yml +134 -0
- metadata +143 -2
- data/ucode.gemspec +0 -56
|
@@ -0,0 +1,312 @@
|
|
|
1
|
+
# 29 — Universal glyph set: full Unicode 17 curation (Part 1)
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Fill `config/unicode17_universal_glyph_set.yml` with concrete Tier 1
|
|
6
|
+
font recommendations for every Unicode 17 block (~340 entries). This
|
|
7
|
+
is **Part 1** of the user's three-part directive — produce the FULL
|
|
8
|
+
base of glyph coverage so font audits (TODO 25, shipped) and
|
|
9
|
+
missing-glyph reports (TODO 26) compare against a real reference.
|
|
10
|
+
|
|
11
|
+
Today the config has 315 of 340 blocks with `sources: []`. The
|
|
12
|
+
universal-set build (TODO 24) would produce pillar-3 tofu for every
|
|
13
|
+
block without a Tier 1 source — i.e. most of Unicode 17. This TODO
|
|
14
|
+
closes that gap by encoding the user's August 2025 font investigation
|
|
15
|
+
into the config.
|
|
16
|
+
|
|
17
|
+
## Why a separate TODO
|
|
18
|
+
|
|
19
|
+
TODO 23 built the **mechanism** (loader, validator, models). TODO 24
|
|
20
|
+
built the **build pipeline**. TODO 21 referenced Tier 1 fonts in its
|
|
21
|
+
example. None of them actually **curated** the per-block font choices
|
|
22
|
+
— they all deferred to "filled in from baseline audit (TODO 05)."
|
|
23
|
+
|
|
24
|
+
This TODO is that filling-in pass. The analysis comes from the
|
|
25
|
+
user's investigation (Lentariso, Kedebideri, NotoSerifTaiYo,
|
|
26
|
+
UniHieroglyphica, Egyptian Text, FSung-*), cross-referenced with the
|
|
27
|
+
Noto Fonts dashboard and the BBAW egyptological font list.
|
|
28
|
+
|
|
29
|
+
## Architectural improvements
|
|
30
|
+
|
|
31
|
+
### `default_sources` at the YAML top level (DRY)
|
|
32
|
+
|
|
33
|
+
Most blocks (~250 of ~340) use Noto Sans as their Tier 1 source. The
|
|
34
|
+
current YAML forces each block to repeat:
|
|
35
|
+
|
|
36
|
+
```yaml
Basic_Latin:
  sources:
    - kind: fontist
      label: noto-sans
      priority: 1
      license: OFL
Latin-1_Supplement:
  sources:
    - kind: fontist
      label: noto-sans
      priority: 1
      license: OFL
# ... 248 more copies of the same entry
```
|
|
51
|
+
|
|
52
|
+
~1250 lines of noise for a single rule. Add a top-level
|
|
53
|
+
`default_sources` that applies when a block's `sources:` is empty or
|
|
54
|
+
absent:
|
|
55
|
+
|
|
56
|
+
```yaml
default_sources:
  - kind: fontist
    label: noto-sans
    priority: 1
    license: OFL
    provenance: "Universal fallback for Latin-family scripts"
```
|
|
64
|
+
|
|
65
|
+
The curated specialists (Sidetic, Beria Erfe, Tai Yo, etc.) stand out
|
|
66
|
+
as the entries that actually carry policy. Reviewers see "what's
|
|
67
|
+
different" instead of wading through copy-paste.
|
|
68
|
+
|
|
69
|
+
### `sources_for(block_id)` on the map (single source of truth)
|
|
70
|
+
|
|
71
|
+
The map answers `sources_for(block_id)`. Internally it falls through:
|
|
72
|
+
block-specific sources → `default_sources` → empty. The loader
|
|
73
|
+
returns the map unmodified; the resolver asks the map.
|
|
74
|
+
|
|
75
|
+
```ruby
class Ucode::Models::GlyphSourceMap
  def sources_for(block_id)
    entry = map[block_id]
    return entry.sources if entry && entry.sources.any?
    return default_sources if default_sources.any?

    []
  end
end
```
|
|
86
|
+
|
|
87
|
+
This keeps the map as the single source of truth — no separate
|
|
88
|
+
"default-application" pass that mutates state.
|
|
89
|
+
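The fall-through can be exercised in isolation. The following is a minimal sketch using plain-Ruby stand-ins; `SketchSource`, `SketchEntry`, `SketchSourceMap`, and the `path` kind are hypothetical names for illustration, not the real lutaml-model classes or config vocabulary.

```ruby
# Hypothetical stand-ins for GlyphSource and a per-block entry.
SketchSource = Struct.new(:kind, :label, :priority, keyword_init: true)
SketchEntry  = Struct.new(:sources, keyword_init: true)

# Hypothetical stand-in for GlyphSourceMap; it holds data and answers
# queries, and nothing mutates it after construction.
class SketchSourceMap
  def initialize(map:, default_sources: [])
    @map = map
    @default_sources = default_sources
  end

  # block-specific sources -> default_sources -> empty
  def sources_for(block_id)
    entry = @map[block_id]
    return entry.sources if entry && entry.sources.any?

    @default_sources
  end
end

noto    = SketchSource.new(kind: "fontist", label: "noto-sans", priority: 1)
sidetic = SketchSource.new(kind: "path", label: "data/fonts/Lentariso.otf", priority: 1)

sketch_map = SketchSourceMap.new(
  map: {
    "Sidetic" => SketchEntry.new(sources: [sidetic]),
    "Basic_Latin" => SketchEntry.new(sources: []),
  },
  default_sources: [noto],
)

sidetic_labels = sketch_map.sources_for("Sidetic").map(&:label)
latin_labels   = sketch_map.sources_for("Basic_Latin").map(&:label)
unknown_labels = sketch_map.sources_for("Not_A_Block").map(&:label)
```

A block with its own sources keeps them; a block with an empty list, or one missing from the map entirely, resolves to the defaults.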
|
|
90
|
+
### Coverage assertion (reviewability)
|
|
91
|
+
|
|
92
|
+
When the loader builds the `GlyphSourceMap`, it does NOT assert
|
|
93
|
+
coverage (load stays cheap). A separate `CoverageAssertion` walker
|
|
94
|
+
iterates every block, opens each Tier 1 font's cmap, and reports
|
|
95
|
+
which assigned codepoints have no Tier 1 source. Output:
|
|
96
|
+
|
|
97
|
+
```ruby
report = Ucode::Glyphs::SourceConfig::CoverageAssertion.new(
  source_map: map,
  database: Ucode::Database.open("17.0.0"),
  font_cmaps: Ucode::Glyphs::RealFonts::CmapCache.new(fonts_in(map)),
).call

report.gaps_by_block
# => { "Combining_Diacritical_Marks_Extended" => [6863, 6864, ...],  # U+1ACF, U+1AD0, ...
#      "Supplemental_Arrows_C" => [129232, 129233] }                 # U+1F8D0, U+1F8D1
```
|
|
108
|
+
|
|
109
|
+
This is a **development-time check**: the build still runs, and gaps
fall through to pillars 1, 2, and 3. The report makes curation reviewable:
"we have 4321 codepoints with no Tier 1 font; here they are by block."
|
|
112
|
+
|
|
113
|
+
Without this assertion, gaps are silent.
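As a rough illustration of how such a report could be turned into the reviewable summary quoted above, here is a minimal sketch. `summarize_gaps` is a hypothetical helper, and the input is assumed to be a Hash of block ID to an Array of Integer codepoints; the sample codepoints are illustrative values inside each block's range, not measured gaps.

```ruby
# Hypothetical helper: renders a gaps hash as human-readable lines.
def summarize_gaps(gaps_by_block)
  total = gaps_by_block.values.sum(&:size)
  lines = ["#{total} codepoints with no Tier 1 font"]

  # Sort blocks by name and codepoints by value so the output is stable.
  gaps_by_block.sort_by { |block_id, _| block_id }.each do |block_id, cps|
    hex = cps.sort.map { |cp| format("U+%04X", cp) }
    lines << "  #{block_id} (#{cps.size}): #{hex.join(' ')}"
  end
  lines
end

summary = summarize_gaps({
  "Supplemental_Arrows_C" => [0x1F8D1, 0x1F8D0],
  "Combining_Diacritical_Marks_Extended" => [0x1ACF],
})
puts summary
```

Sorting both levels keeps the output diff-friendly, which matters if the summary is committed or compared between curation passes.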
|
|
114
|
+
|
|
115
|
+
### `script_defaults` (out of scope, future improvement)
|
|
116
|
+
|
|
117
|
+
A further DRY step: map Unicode `Script` property values to default
|
|
118
|
+
fonts. Loader resolves block → primary script → font. Saves another
|
|
119
|
+
~100 lines for the per-script Noto variants (Hebrew, Arabic, Devanagari,
|
|
120
|
+
Bengali, etc.). Deferred — `default_sources` is enough for v0.2.
|
|
121
|
+
|
|
122
|
+
## Curation matrix
|
|
123
|
+
|
|
124
|
+
### New Unicode 17 blocks (11 blocks, fully curated)
|
|
125
|
+
|
|
126
|
+
| Block ID | Codepoints | Tier 1 source | Fallback |
|---|---:|---|---|
| `Sidetic` | 26 | `data/fonts/Lentariso.otf` (≥1.029) | fontist:noto-sans-sidetic |
| `Beria_Erfe` | 50 | `data/fonts/Kedebideri-Regular.ttf` (3.001) | pillar-2 |
| `Tai_Yo` | 54 | `data/fonts/NotoSerifTaiYo.ttf` | fontist:noto-sans-tai-yo |
| `Tolong_Siki` | 54 | fontist:noto-sans-tolong-siki | pillar-2 |
| `Sharada_Supplement` | 8 | fontist:noto-sans-sharada | pillar-2 |
| `CJK_Unified_Ideographs_Extension_J` | 4,298 | `~/Downloads/全宋體/FSung-*.ttf` (priority 1-9) | fontist:noto-sans-cjk-jp |
| `Symbols_for_Legacy_Computing_Supplement` | 9 | `data/fonts/BabelStonePseudographica.ttf` | pillar-2 (Unicode 17 additions may be missing) |
| `Supplemental_Arrows_C` | 9 | `data/fonts/Symbola.ttf` | pillar-2 (same caveat) |
| `Alchemical_Symbols` (4 new + existing) | 4 + 102 | fontist:noto-sans-symbols | `data/fonts/Symbola.ttf` |
| `Miscellaneous_Symbols_Supplement` | 34 | fontist:noto-sans-symbols-2 | `data/fonts/Symbola.ttf` |
| `Musical_Symbols` (UC17 additions) | TBD | fontist:noto-music | pillar-2 |
|
|
139
|
+
|
|
140
|
+
### Egyptian Hieroglyphs family (4 blocks)
|
|
141
|
+
|
|
142
|
+
| Block ID | Range | Codepoints | Tier 1 source |
|---|---|---:|---|
| `Egyptian_Hieroglyphs` | U+13000..U+1342F (+28 in UC17) | ~1,072+28 | `data/fonts/UniHieroglyphica.ttf` (v16) |
| `Egyptian_Hieroglyphs_Format_Controls` | U+13430..U+1345F | 36 | `data/fonts/EgyptianText-Regular.ttf` (microsoft/font-tools) |
| `Egyptian_Hieroglyphs_Extended-A` | U+13460..U+143FF (+9 in UC17) | ~3,936+9 | `data/fonts/UniHieroglyphica.ttf` (v16) |
| `Egyptian_Hieroglyphs_Extended-B` | NEW in UC17 | ~600 | `data/fonts/UniHieroglyphica.ttf` (v16) |
|
|
148
|
+
|
|
149
|
+
UniHieroglyphica is the authoritative source for Egyptian Hieroglyph
|
|
150
|
+
outlines (https://aaew.bbaw.de/egyptological-unicode-fonts). Egyptian
|
|
151
|
+
Text (microsoft/font-tools, OFL) is the only source for the Format
|
|
152
|
+
Controls block.
|
|
153
|
+
|
|
154
|
+
### Existing blocks with Unicode 17 additions (selected)
|
|
155
|
+
|
|
156
|
+
| Block | UC17 additions | Tier 1 source |
|---|---:|---|
| `Tangut` | 8 | fontist:noto-sans-tangut |
| `Tangut_Supplement` | 22 | fontist:noto-sans-tangut |
| `Tangut_Components` | 115 | fontist:noto-sans-tangut |
| `Adlam` | 29 | fontist:noto-sans-adlam |
| `Arabic_Extended-B` | (UC17) | fontist:noto-sans-arabic |
| `Arabic_Extended-C` (new) | TBD | fontist:noto-sans-arabic |
| `Telugu` | 1 | fontist:noto-sans-telugu |
| `Kannada` | 1 | fontist:noto-sans-kannada |
| `Combining_Diacritical_Marks_Extended` | +27 | pillar-2 (font support spotty) |
| `CJK_Unified_Ideographs_Extension_C` | additions | `~/Downloads/全宋體/FSung-C.ttf` |
| `CJK_Unified_Ideographs_Extension_E` | additions | `~/Downloads/全宋體/FSung-E.ttf` |
| `Chess_Symbols` | +4 | fontist:noto-sans-symbols-2 |
| `Transport_and_Map_Symbols` | +1 | fontist:noto-sans-symbols-2 |
| `Symbols_and_Pictographs_Extended-A` | +6 | fontist:noto-sans-symbols-2 |
|
|
172
|
+
|
|
173
|
+
### Everything else (~250 blocks)
|
|
174
|
+
|
|
175
|
+
`default_sources` (noto-sans) covers:
|
|
176
|
+
|
|
177
|
+
- All Latin, Greek, Cyrillic, Armenian, Hebrew base + supplement blocks.
|
|
178
|
+
- All symbol blocks where Noto Sans covers (Mathematical Operators,
|
|
179
|
+
Box Drawing, Block Elements, Geometric Shapes, etc.).
|
|
180
|
+
- General punctuation, control pictures, etc.
|
|
181
|
+
|
|
182
|
+
When `default_sources` is exhausted (a codepoint is outside Noto
|
|
183
|
+
Sans's coverage), the resolver falls through to Pillar 1 → 2 → 3.
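The fall-through order can be pictured as an ordered chain in which the first tier that answers wins. This is a minimal sketch under that assumption only; `SketchTier`, `resolve_glyph`, and the lambda bodies are hypothetical and do not reflect the real resolver or pillar classes.

```ruby
# Each tier is modelled as a name plus a callable that returns a glyph
# (any non-nil value) or nil when it cannot supply one.
SketchTier = Struct.new(:name, :lookup)

# Walks the tiers in order and returns [tier_name, glyph] for the first hit.
def resolve_glyph(tiers, codepoint)
  tiers.each do |tier|
    glyph = tier.lookup.call(codepoint)
    return [tier.name, glyph] if glyph
  end
  nil
end

tiers = [
  SketchTier.new(:tier1,   ->(cp) { cp == 0x41 ? "outline:A" : nil }),
  SketchTier.new(:pillar1, ->(_cp) { nil }),
  SketchTier.new(:pillar2, ->(_cp) { nil }),
  # The last resort always answers, so every codepoint resolves to something.
  SketchTier.new(:pillar3, ->(cp) { format("tofu:U+%04X", cp) }),
]

hit  = resolve_glyph(tiers, 0x41)
miss = resolve_glyph(tiers, 0x1F8D0)
```

Returning the tier name alongside the glyph makes it possible to record which tier served each codepoint.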
|
|
184
|
+
|
|
185
|
+
## Curation rules (carry from TODO 23, refined)
|
|
186
|
+
|
|
187
|
+
1. **One Tier 1 font per script family.** Specialist fonts only for
|
|
188
|
+
blocks the default can't cover.
|
|
189
|
+
2. **Proprietary fonts never ship.** Sources with `license:
|
|
190
|
+
PROPRIETARY` are loaded for glyph extraction only; the extracted
|
|
191
|
+
SVG (open data) ships, the font file does not.
|
|
192
|
+
3. **Provenance mandatory.** Every specialist entry cites where the
|
|
193
|
+
font comes from and why.
|
|
194
|
+
4. **Lower `priority` wins.** The resolver tries sources in ascending
`priority` order; the first source that covers the codepoint wins.
|
|
196
|
+
5. **Block IDs verbatim.** Use the exact Unicode block name with
|
|
197
|
+
underscores (e.g. `Greek_and_Coptic`, never slugified).
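Rule 4 can be sketched as a sort followed by a first-match search. The struct, the helper name, and the coverage sets below are hypothetical; the codepoints are illustrative and were not read from the actual fonts' cmaps.

```ruby
require "set"

# Hypothetical source record: `covered` stands in for a parsed cmap.
PrioritizedSource = Struct.new(:label, :priority, :covered, keyword_init: true)

# Tries sources in ascending priority and returns the label of the first
# one whose coverage includes `cp`, or nil when none covers it.
def first_covering_source(sources, cp)
  sources.sort_by(&:priority).find { |s| s.covered.include?(cp) }&.label
end

sources = [
  PrioritizedSource.new(label: "fontist:noto-sans-sidetic", priority: 2,
                        covered: Set[0x10940, 0x10941]),
  PrioritizedSource.new(label: "data/fonts/Lentariso.otf", priority: 1,
                        covered: Set[0x10940]),
]

both_cover    = first_covering_source(sources, 0x10940)
fallback_only = first_covering_source(sources, 0x10941)
nobody        = first_covering_source(sources, 0x10942)
```

Declaration order in the config does not matter in this sketch; only the `priority` value decides which covering source is used.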
|
|
198
|
+
|
|
199
|
+
## Files to change / create
|
|
200
|
+
|
|
201
|
+
- `config/unicode17_universal_glyph_set.yml` — full content (~150
|
|
202
|
+
lines thanks to `default_sources`).
|
|
203
|
+
- `lib/ucode/models/glyph_source_map.rb` — add `default_sources`
|
|
204
|
+
attribute (collection of `GlyphSource`); add `sources_for(block_id)`.
|
|
205
|
+
- `lib/ucode/models/glyph_source.rb` — no change (already supports
|
|
206
|
+
the shape; the YAML loader just populates `default_sources` from
|
|
207
|
+
the top-level key via the existing mapping).
|
|
208
|
+
- `lib/ucode/glyphs/source_config.rb` — no change to the loader
|
|
209
|
+
itself; it already returns the map. (Existing
|
|
210
|
+
`GlyphSourceMap#fonts_for(block_id)` callers migrate to
|
|
211
|
+
`sources_for(block_id)`; the old method is removed.)
|
|
212
|
+
- `lib/ucode/glyphs/source_config/coverage_assertion.rb` — new.
|
|
213
|
+
- `lib/ucode/glyphs/source_config/gap_report.rb` — new typed result.
|
|
214
|
+
- `lib/ucode/glyphs/source_config.rb` — re-open to add the
|
|
215
|
+
`CoverageAssertion` autoload (or place under
|
|
216
|
+
`lib/ucode/glyphs/source_config/` and add a namespace hub).
|
|
217
|
+
- Specs:
|
|
218
|
+
- Update `spec/ucode/glyphs/source_config_spec.rb` for
|
|
219
|
+
`default_sources` + `sources_for`.
|
|
220
|
+
- New `spec/ucode/glyphs/source_config/coverage_assertion_spec.rb`.
|
|
221
|
+
- Smoke spec: the full config loads cleanly and every block resolves to
at least one source (`gaps_by_block` is empty for curated blocks).
|
|
223
|
+
|
|
224
|
+
## Loader shape (target)
|
|
225
|
+
|
|
226
|
+
```ruby
require "pathname"
require "yaml"

class Ucode::Glyphs::SourceConfig
  DEFAULT_PATH = Pathname.new("config/unicode17_universal_glyph_set.yml")

  # Accepts a String or a Pathname; returns the map unmodified.
  def self.load(yaml_path = DEFAULT_PATH)
    parsed = YAML.safe_load(Pathname(yaml_path).read)
    Ucode::Models::GlyphSourceMap.from_hash(parsed)
  end
end
```
|
|
236
|
+
|
|
237
|
+
The map's `from_hash` already handles a top-level `default_sources`
|
|
238
|
+
array via the existing lutaml-model mapping (the only change is
|
|
239
|
+
adding the attribute + the `sources_for` method).
|
|
240
|
+
|
|
241
|
+
## Coverage assertion shape
|
|
242
|
+
|
|
243
|
+
```ruby
class Ucode::Glyphs::SourceConfig::CoverageAssertion
  def initialize(source_map:, database:, font_cmaps:)
    @source_map = source_map
    @database = database
    @font_cmaps = font_cmaps
  end

  # Walks every assigned codepoint and collects those with no Tier 1 cover.
  def call
    gaps = Hash.new { |h, k| h[k] = [] }
    @database.each_assigned_codepoint do |cp|
      block_id = @database.lookup_block(cp)
      next unless block_id

      sources = @source_map.sources_for(block_id)
      next if sources.empty? # uncurated block; not a gap, just unconfigured

      next if sources.any? { |s| @font_cmaps.covers?(s.label, cp) }

      gaps[block_id] << cp
    end
    # Drop the default block before freezing; otherwise looking up an
    # absent block on the frozen hash would raise FrozenError.
    gaps.default_proc = nil
    GapReport.new(gaps_by_block: gaps.freeze)
  end
end
```
|
|
268
|
+
|
|
269
|
+
The assertion never raises — it returns a typed `GapReport`. Callers
|
|
270
|
+
decide whether to act on gaps (CI: warn; local: print; production
|
|
271
|
+
build: continue and let pillar 1-2-3 catch up).
|
|
272
|
+
|
|
273
|
+
## Acceptance
|
|
274
|
+
|
|
275
|
+
- `config/unicode17_universal_glyph_set.yml` exists with:
|
|
276
|
+
- `default_sources` populated (noto-sans + fallback chain).
|
|
277
|
+
- All 11 new Unicode 17 blocks curated with specific Tier 1 sources.
|
|
278
|
+
- All 4 Egyptian Hieroglyphs blocks curated.
|
|
279
|
+
- `~/Downloads/全宋體/FSung-*` paths documented for CJK Ext J
|
|
280
|
+
(user-local fallback; warning emitted if absent).
|
|
281
|
+
- `GlyphSourceMap#sources_for(block_id)` returns block-specific
|
|
282
|
+
sources when present, otherwise `default_sources`, otherwise `[]`.
|
|
283
|
+
- `CoverageAssertion` produces a `GapReport` whose `gaps_by_block`
|
|
284
|
+
matches expectations: empty for curated blocks, populated for
|
|
285
|
+
known-gap blocks (Combining Diacritical Marks Extended, Symbols
|
|
286
|
+
for Legacy Computing Supp UC17 additions, Supplemental Arrows-C
|
|
287
|
+
UC17 additions).
|
|
288
|
+
- Smoke spec on the full config: every block resolves to at least
|
|
289
|
+
one source (no `[]` results for any assigned block).
|
|
290
|
+
- Rubocop clean.
|
|
291
|
+
|
|
292
|
+
## Out of scope
|
|
293
|
+
|
|
294
|
+
- Font acquisition (downloading Lentariso, Kedebideri, etc.) — TODO 30.
|
|
295
|
+
- Production build execution — TODO 31.
|
|
296
|
+
- Pillar 2 correlator hardening for residual gaps — TODO 31 (validate
|
|
297
|
+
during production build).
|
|
298
|
+
- CJK Ext J verification (FSung-* actually covers all 4,298
|
|
299
|
+
codepoints) — TODO 31 (validate during production build).
|
|
300
|
+
- `script_defaults` refinement — future TODO.
|
|
301
|
+
|
|
302
|
+
## References
|
|
303
|
+
|
|
304
|
+
- Source map mechanism: `TODO.new/23-universal-glyph-set-source-map.md`
|
|
305
|
+
- Build pipeline: `TODO.new/24-universal-glyph-set-build.md`
|
|
306
|
+
- Font audit against universal set: `TODO.new/25-font-audit-against-universal-set.md`
|
|
307
|
+
- Font acquisition: `TODO.new/30-tier1-font-acquisition.md`
|
|
308
|
+
- Production build: `TODO.new/31-universal-set-production-build.md`
|
|
309
|
+
- Architecture: `docs/architecture.md` §"The 4-tier glyph sourcing strategy"
|
|
310
|
+
- BBAW font list: https://aaew.bbaw.de/egyptological-unicode-fonts
|
|
311
|
+
- Existing source config: `config/unicode17_universal_glyph_set.yml`
|
|
312
|
+
- Existing loader: `lib/ucode/glyphs/source_config.rb`
|
|
@@ -0,0 +1,241 @@
|
|
|
1
|
+
# 30 — Tier 1 font acquisition: specialist fonts
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Build a fetcher subsystem that downloads the specialist Tier 1 fonts not
|
|
6
|
+
discoverable via fontist's index. These fonts have canonical sources
|
|
7
|
+
(GitHub releases, SIL downloads, personal academic sites) that
|
|
8
|
+
fontist's formulas don't cover.
|
|
9
|
+
|
|
10
|
+
This unblocks TODO 29's curation: the YAML references fonts like
|
|
11
|
+
`data/fonts/Lentariso.otf`, but those paths must be populated for
|
|
12
|
+
the universal-set build (TODO 24) to actually use them.
|
|
13
|
+
|
|
14
|
+
## Why a separate TODO
|
|
15
|
+
|
|
16
|
+
fontist is the project's font discovery layer for redistributable
|
|
17
|
+
formulas. It does not (and should not) carry formulas for:
|
|
18
|
+
|
|
19
|
+
- **Lentariso** (github.com/Bry10022/Lentariso) — SFD source, GitHub
|
|
20
|
+
releases. Not in fontist/formulas.
|
|
21
|
+
- **Kedebideri** (software.sil.org/kedebideri) — UFO3 source with
|
|
22
|
+
TECkit mapping. SIL's downloads page is the canonical source.
|
|
23
|
+
- **NotoSerifTaiYo** (translationcommons.org) — pre-release Noto
|
|
24
|
+
variant, not yet on Google Fonts.
|
|
25
|
+
- **UniHieroglyphica** (suignard.com) — personal academic site, OFL.
|
|
26
|
+
- **Egyptian Text** (microsoft/font-tools) — bundled in a font-tools
|
|
27
|
+
release, not a standalone formula.
|
|
28
|
+
- **BabelStone Pseudographica** — personal academic site.
|
|
29
|
+
- **Symbola** — personal academic site.
|
|
30
|
+
|
|
31
|
+
These need their own fetcher. The fetcher is **not** a fontist
|
|
32
|
+
replacement — it's a complementary path for fonts that can't (yet)
|
|
33
|
+
go through fontist's formula process. The output is identical from
|
|
34
|
+
the consumer's perspective: a TTF/OTF on disk under `data/fonts/`.
|
|
35
|
+
|
|
36
|
+
## Specialist fonts manifest
|
|
37
|
+
|
|
38
|
+
`config/specialist_fonts.yml`:
|
|
39
|
+
|
|
40
|
+
```yaml
# Specialist Tier 1 fonts not in fontist's formula index.
# All entries must be OFL unless explicitly whitelisted.
fonts:
  - label: Lentariso
    version: "1.033"
    license: OFL
    url: "https://github.com/Bry10022/Lentariso/releases/download/1.033/Lentariso.otf"
    sha256: "<filled in on first successful fetch>"
    path: "data/fonts/Lentariso.otf"
    extract: false
    provenance: "github.com/Bry10022/Lentariso — covers Imperial Aramaic, Phoenician, Sidetic"

  - label: Kedebideri
    version: "3.001"
    license: OFL
    url: "https://software.sil.org/downloads/r/kedebideri/Kedebideri-3.001.zip"
    sha256: "..."
    path: "data/fonts/Kedebideri-Regular.ttf"
    extract: true  # zip: extract just the TTF
    extract_member: "Kedebideri-Regular.ttf"
    provenance: "SIL, first Unicode font for Beria Erfe"

  - label: NotoSerifTaiYo
    version: "draft-2025-09"
    license: OFL
    url: "https://translationcommons.org/wp-content/uploads/2025/09/NotoSerifTaiYo.ttf"
    sha256: "..."
    path: "data/fonts/NotoSerifTaiYo.ttf"
    extract: false
    provenance: "translationcommons.org, proven via correlate-v4"

  - label: UniHieroglyphica
    version: "16.0"
    license: OFL
    url: "https://www.suignard.com/UniHieroglyphica/UniHieroglyphica-16.0.zip"
    sha256: "..."
    path: "data/fonts/UniHieroglyphica.ttf"
    extract: true
    extract_member: "UniHieroglyphica.ttf"
    provenance: "suignard.com, authoritative for Egyptian Hieroglyphs"

  - label: EgyptianText
    version: "1.0"
    license: OFL
    url: "https://github.com/microsoft/font-tools/releases/download/v1.0/EgyptianText-Regular.ttf"
    sha256: "..."
    path: "data/fonts/EgyptianText-Regular.ttf"
    extract: false
    provenance: "microsoft/font-tools — Format Controls block"

  - label: BabelStonePseudographica
    version: "2024-09-10"
    license: OFL
    url: "https://www.babelstone.co.uk/Fonts/Download/BabelStonePseudographica.zip"
    sha256: "..."
    path: "data/fonts/BabelStonePseudographica.ttf"
    extract: true
    extract_member: "BabelStonePseudographica.ttf"
    provenance: "BabelStone, partial Unicode 17 coverage"

  - label: Symbola
    version: "13.0"
    license: OFL
    url: "https://dn-works.com/wp-content/uploads/2020/ufas/Symbola.zip"
    sha256: "..."
    path: "data/fonts/Symbola.ttf"
    extract: true
    extract_member: "Symbola.ttf"
    provenance: "dn-works.com, broad Unicode symbol coverage"

  - label: FSung
    version: "2024"
    license: OFL  # Taiwan MOE 全宋體, user-local
    url: null  # local-only; user must place under ~/Downloads/全宋體/
    path: "~/Downloads/全宋體/FSung-*.ttf"  # glob expanded at load time
    extract: false
    provenance: "Taiwan MOE 全宋體, user-supplied"
```
|
|
119
|
+
|
|
120
|
+
URLs are illustrative; this TODO verifies that each one resolves (curl
|
|
121
|
+
HEAD) before merge. SHA256 hashes are filled in on first successful
|
|
122
|
+
download (computed locally, committed as a checkpoint).
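The pin-on-first-fetch flow could look roughly like the sketch below, which uses Ruby's standard `Digest::SHA256`. `checksum_status` and its three return symbols are hypothetical; a placeholder such as `"..."` is treated as "not yet pinned" rather than as a mismatch.

```ruby
require "digest"
require "tempfile"

# Returns :match, :mismatch, or :unpinned (manifest hash not filled in yet).
def checksum_status(path, expected_sha256)
  # Anything that is not 64 hex digits counts as a placeholder.
  return :unpinned if expected_sha256.nil? || expected_sha256 !~ /\A\h{64}\z/

  actual = Digest::SHA256.file(path).hexdigest
  actual == expected_sha256.downcase ? :match : :mismatch
end

# Demonstration against a throwaway file standing in for a downloaded font.
statuses = Tempfile.create("font-sketch") do |f|
  f.write("not a real font")
  f.flush
  pinned = Digest::SHA256.file(f.path).hexdigest
  [
    checksum_status(f.path, pinned),
    checksum_status(f.path, "0" * 64),
    checksum_status(f.path, "..."),
  ]
end
```

On an `:unpinned` result the caller would compute the hash locally and write it back to the manifest as the checkpoint; how that write-back happens is left open here.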
|
|
123
|
+
|
|
124
|
+
## Architectural notes
|
|
125
|
+
|
|
126
|
+
### Single manifest, not ad-hoc downloads
|
|
127
|
+
|
|
128
|
+
Every specialist font lives in one YAML. Adding a new font = one
|
|
129
|
+
entry in the manifest; no Ruby changes. The fetcher iterates the
|
|
130
|
+
manifest mechanically.
|
|
131
|
+
|
|
132
|
+
### Typed result, not exceptions
|
|
133
|
+
|
|
134
|
+
Each font produces a `Result` value object (`:downloaded`, `:skipped`,
|
|
135
|
+
`:failed`, `:local`). The fetcher never raises for a single font
|
|
136
|
+
failure; the aggregate result lists successes and failures separately.
|
|
137
|
+
This lets CI report which fonts broke without abandoning the run.
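One way to picture the typed-result approach is the sketch below. `FetchResult`, its fields, and `fetch_all` are hypothetical shapes chosen for illustration; the real `Result` class may carry different attributes.

```ruby
# Hypothetical result value object; status is one of
# :downloaded, :skipped, :failed, :local.
FetchResult = Struct.new(:label, :status, :detail, keyword_init: true) do
  def ok?
    status != :failed
  end
end

# Yields each label to the caller's block, which returns a status symbol.
# Any StandardError is captured as a :failed result instead of propagating,
# so one broken font does not abandon the run.
def fetch_all(labels)
  labels.map do |label|
    begin
      FetchResult.new(label: label, status: yield(label))
    rescue StandardError => e
      FetchResult.new(label: label, status: :failed, detail: e.message)
    end
  end
end

results = fetch_all(%w[Lentariso Kedebideri FSung]) do |label|
  case label
  when "Kedebideri" then raise IOError, "connection reset"
  when "FSung" then :local
  else :downloaded
  end
end

succeeded, failed = results.partition(&:ok?)
```

Partitioning at the end gives the two separate lists the aggregate result is meant to expose.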
|
|
138
|
+
|
|
139
|
+
### License hard-guard
|
|
140
|
+
|
|
141
|
+
Any entry with `license != OFL` requires `--allow-proprietary` to
|
|
142
|
+
fetch. This is a hard guard against accidentally pulling non-OFL
|
|
143
|
+
fonts into the redistributable `data/fonts/` directory. FSung is OFL
|
|
144
|
+
but `url: null` (local-only) — different code path.
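The guard and the local-only path can be sketched as a small decision function. `fetch_plan`, the Hash-shaped entry, and the returned symbols are hypothetical; only the `license`, `url`, and `--allow-proprietary` concepts come from the text above.

```ruby
# Decides what to do with one manifest entry (given here as a plain Hash).
# Returns :local, :refused, or :download.
def fetch_plan(entry, allow_proprietary: false)
  # Local-only fonts are never downloaded, so the license guard is skipped.
  return :local if entry[:url].nil?
  return :refused unless entry[:license] == "OFL" || allow_proprietary

  :download
end

ofl_entry   = { label: "Symbola", license: "OFL", url: "https://example.invalid/Symbola.zip" }
prop_entry  = { label: "SomeFont", license: "PROPRIETARY", url: "https://example.invalid/f.ttf" }
local_entry = { label: "FSung", license: "OFL", url: nil }

plans = [
  fetch_plan(ofl_entry),
  fetch_plan(prop_entry),
  fetch_plan(prop_entry, allow_proprietary: true),
  fetch_plan(local_entry),
]
```

Defaulting `allow_proprietary` to `false` means a non-OFL entry is refused unless the caller opts in explicitly.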
|
|
145
|
+
|
|
146
|
+
### Cmap pre-warming (optional, future)
|
|
147
|
+
|
|
148
|
+
After download, the fetcher can pre-warm the cmap cache by loading
|
|
149
|
+
each font once and recording its codepoint set. Saves the
|
|
150
|
+
universal-set build (TODO 24) a re-parse. Out of scope for v0.2.
|
|
151
|
+
|
|
152
|
+
## Files to create
|
|
153
|
+
|
|
154
|
+
- `lib/ucode/fetchers.rb` — autoload hub for the new namespace (or
|
|
155
|
+
extend existing if there's already a fetchers module).
|
|
156
|
+
- `lib/ucode/fetchers/font_fetcher.rb` — abstract base.
|
|
157
|
+
- `lib/ucode/fetchers/font_fetcher/result.rb` — typed result.
|
|
158
|
+
- `lib/ucode/fetchers/specialist_font_fetcher.rb` — concrete, reads
|
|
159
|
+
the manifest, fetches each font.
|
|
160
|
+
- `lib/ucode/models/specialist_font.rb` — one manifest entry.
|
|
161
|
+
- `lib/ucode/models/specialist_font_manifest.rb` — full manifest.
|
|
162
|
+
- `config/specialist_fonts.yml` — the manifest.
|
|
163
|
+
- `lib/ucode/commands/fetch.rb` — autoload `Fonts` (extend existing
|
|
164
|
+
fetch namespace).
|
|
165
|
+
- `lib/ucode/commands/fetch/fonts.rb` — CLI command class.
|
|
166
|
+
- Specs:
|
|
167
|
+
- `spec/ucode/fetchers/font_fetcher_spec.rb`
|
|
168
|
+
- `spec/ucode/fetchers/specialist_font_fetcher_spec.rb`
|
|
169
|
+
- `spec/ucode/commands/fetch/fonts_spec.rb`
|
|
170
|
+
- `spec/fixtures/specialist_fonts.yml` — small fixture
|
|
171
|
+
- `spec/fixtures/fonts/.gitkeep`
|
|
172
|
+
|
|
173
|
+
## Fetcher behavior
|
|
174
|
+
|
|
175
|
+
- **Idempotent.** Skip if `path` exists and SHA256 matches.
|
|
176
|
+
- **Hashed.** Compute SHA256 on download; compare to manifest entry.
|
|
177
|
+
Mismatch raises `Ucode::Fetchers::FontChecksumError` (typed, not
|
|
178
|
+
generic `RuntimeError`).
|
|
179
|
+
- **License-checked.** Refuse to download any font with `license !=
|
|
180
|
+
OFL` unless `--allow-proprietary` is passed. Hard guard.
|
|
181
|
+
- **Extracted.** `extract: true` entries unzip to a temp dir; only
|
|
182
|
+
`extract_member` is moved into place.
|
|
183
|
+
- **Local-only paths.** `url: null` entries print "place <font> at
|
|
184
|
+
<path>" and skip the download. The result is `:local`.
|
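
The idempotency and hashing rules above reduce to a single check, sketched here with Ruby's standard `Digest` library. The `up_to_date?` helper name is hypothetical; a fetcher would skip the download when it returns true and verify the hash again after downloading.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper: true only when the file exists and its SHA256
# matches the value recorded in the manifest.
def up_to_date?(path, expected_sha256)
  File.file?(path) && Digest::SHA256.file(path).hexdigest == expected_sha256
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "Demo.ttf")
  File.binwrite(path, "not a real font")
  expected = Digest::SHA256.hexdigest("not a real font")

  puts up_to_date?(path, expected)                           # matching hash
  puts up_to_date?(path, "0" * 64)                           # hash mismatch
  puts up_to_date?(File.join(dir, "Missing.ttf"), expected)  # file absent
end
```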
|
185
|
+
|
|
186
|
+
## CLI
|
|
187
|
+
|
|
188
|
+
```bash
|
|
189
|
+
bin/ucode fetch fonts # fetch all listed fonts
|
|
190
|
+
bin/ucode fetch fonts --label Lentariso # fetch just one
|
|
191
|
+
bin/ucode fetch fonts --dry-run # show what would be fetched
|
|
192
|
+
bin/ucode fetch fonts --allow-proprietary # bypass license guard
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
Output is one status line per font:
|
|
196
|
+
|
|
197
|
+
```
|
|
198
|
+
Lentariso downloaded data/fonts/Lentariso.otf (1.2 MB, OFL)
|
|
199
|
+
Kedebideri downloaded data/fonts/Kedebideri-Regular.ttf (450 KB, OFL)
|
|
200
|
+
NotoSerifTaiYo downloaded data/fonts/NotoSerifTaiYo.ttf (180 KB, OFL)
|
|
201
|
+
UniHieroglyphica downloaded data/fonts/UniHieroglyphica.ttf (3.4 MB, OFL)
|
|
202
|
+
EgyptianText downloaded data/fonts/EgyptianText-Regular.ttf (220 KB, OFL)
|
|
203
|
+
FSung local ~/Downloads/全宋體/FSung-*.ttf (user-supplied)
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
## Files to change
|
|
207
|
+
|
|
208
|
+
- `lib/ucode/cli.rb` — register `fetch fonts` subcommand.
|
|
209
|
+
- `lib/ucode/commands/fetch.rb` — add `Fonts` autoload.
|
|
210
|
+
- `lib/ucode/exceptions.rb` (or wherever exceptions live) — add
|
|
211
|
+
`FontChecksumError`, `FontLicenseError` if not present.
|
|
212
|
+
|
|
213
|
+
## Acceptance
|
|
214
|
+
|
|
215
|
+
- `bin/ucode fetch fonts` downloads all 7 specialist fonts into
|
|
216
|
+
`data/fonts/`.
|
|
217
|
+
- Re-running skips already-downloaded (idempotency; SHA256 verified).
|
|
218
|
+
- SHA256 mismatch raises typed `FontChecksumError`.
|
|
219
|
+
- `--allow-proprietary` is required for any font with non-OFL license.
|
|
220
|
+
- Local-only entries (FSung) print a clear "please place at <path>"
|
|
221
|
+
message; no network attempt; result is `:local`.
|
|
222
|
+
- Specs cover: happy path, idempotency, checksum mismatch, license
|
|
223
|
+
refusal, zip extraction, missing extract_member.
|
|
224
|
+
- Rubocop clean.
|
|
225
|
+
|
|
226
|
+
## Out of scope
|
|
227
|
+
|
|
228
|
+
- Adding these fonts to fontist's formulas (separate upstream effort).
|
|
229
|
+
- The Tier 1 source map curation — TODO 29.
|
|
230
|
+
- The universal-set build that consumes these — TODO 24, TODO 31.
|
|
231
|
+
- CJK FSung auto-download — these are user-local and not redistributable
|
|
232
|
+
via this repo. Documented in the manifest as local-only.
|
|
233
|
+
|
|
234
|
+
## References
|
|
235
|
+
|
|
236
|
+
- Source map: `TODO.new/29-universal-set-curation-uc17.md`
|
|
237
|
+
- Build pipeline: `TODO.new/24-universal-glyph-set-build.md`
|
|
238
|
+
- Production build: `TODO.new/31-universal-set-production-build.md`
|
|
239
|
+
- Existing fetchers: `lib/ucode/fetchers/{ucd,unihan,charts}.rb` (if present)
|
|
240
|
+
- fontist's FontLocator: `lib/ucode/glyphs/real_fonts/font_locator.rb`
|
|
241
|
+
- BBAW font list: https://aaew.bbaw.de/egyptological-unicode-fonts
|