ucode 0.1.0 → 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +72 -0
- data/Gemfile.lock +2 -2
- data/TODO.full/00-README.md +116 -0
- data/TODO.full/01-panglyph-vision.md +112 -0
- data/TODO.full/02-panglyph-repo-bootstrap.md +184 -0
- data/TODO.full/03-panglyph-font-builder.md +201 -0
- data/TODO.full/04-panglyph-publish-pipeline.md +126 -0
- data/TODO.full/05-ucode-0-1-1-release.md +139 -0
- data/TODO.full/06-fontisan-remove-audit.md +142 -0
- data/TODO.full/07-fontisan-remove-ucd.md +125 -0
- data/TODO.full/08-archive-private-bin-build.md +143 -0
- data/TODO.full/09-archive-public-structure.md +164 -0
- data/TODO.full/10-fontist-org-woff-glyphs.md +131 -0
- data/TODO.full/11-fontist-org-audit-coverage.md +140 -0
- data/TODO.full/12-implementation-order.md +216 -0
- data/TODO.full/13-fontisan-font-writer-api.md +189 -0
- data/TODO.full/14-fontisan-table-writers.md +66 -0
- data/TODO.full/15-panglyph-builder-real.md +82 -0
- data/TODO.full/16-archive-public-sync-workflows.md +167 -0
- data/TODO.full/17-fontist-org-font-picker.md +73 -0
- data/TODO.full/18-comprehensive-spec-coverage.md +64 -0
- data/TODO.full/19-ucode-0-1-2-patch.md +32 -0
- data/TODO.full/20-fontisan-0-2-23-release.md +52 -0
- data/TODO.new/00-README.md +30 -0
- data/TODO.new/23-universal-glyph-set-source-map.md +312 -0
- data/TODO.new/24-universal-glyph-set-build.md +189 -0
- data/TODO.new/25-font-audit-against-universal-set.md +195 -0
- data/TODO.new/26-missing-glyph-reporter.md +189 -0
- data/TODO.new/27-fontist-org-consumer-integration.md +200 -0
- data/TODO.new/28-implementation-order-update.md +187 -0
- data/TODO.new/29-universal-set-curation-uc17.md +312 -0
- data/TODO.new/30-tier1-font-acquisition.md +241 -0
- data/TODO.new/31-universal-set-production-build.md +205 -0
- data/TODO.new/32-uc17-coverage-matrix.md +165 -0
- data/TODO.new/33-specialist-font-acquisition-refresh.md +138 -0
- data/TODO.new/34-pillar2-content-stream-correlator.md +147 -0
- data/TODO.new/35-universal-set-production-run.md +160 -0
- data/TODO.new/36-per-font-coverage-audit.md +145 -0
- data/TODO.new/37-coverage-highlight-reporter.md +125 -0
- data/TODO.new/38-fontist-org-glyph-consumer.md +141 -0
- data/TODO.new/39-implementation-order-update-32-38.md +258 -0
- data/TODO.new/40-archive-private-uses-ucode-audit.md +124 -0
- data/TODO.new/41-ucode-unicode-archive-bridge.md +160 -0
- data/config/specialist_fonts.yml +102 -0
- data/config/unicode17_tier1_fonts.yml +42 -0
- data/config/unicode17_universal_glyph_set.yml +293 -0
- data/lib/ucode/audit/block_aggregator.rb +57 -29
- data/lib/ucode/audit/browser/face_page.rb +128 -0
- data/lib/ucode/audit/browser/glyph_panel.rb +124 -0
- data/lib/ucode/audit/browser/library_page.rb +74 -0
- data/lib/ucode/audit/browser/missing_glyph_page.rb +87 -0
- data/lib/ucode/audit/browser/template.rb +47 -0
- data/lib/ucode/audit/browser/templates/face.css +200 -0
- data/lib/ucode/audit/browser/templates/face.html.erb +41 -0
- data/lib/ucode/audit/browser/templates/face.js +298 -0
- data/lib/ucode/audit/browser/templates/library.css +119 -0
- data/lib/ucode/audit/browser/templates/library.html.erb +42 -0
- data/lib/ucode/audit/browser/templates/library.js +99 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.css +119 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.html.erb +58 -0
- data/lib/ucode/audit/browser/templates/missing_glyph_page.js +2 -0
- data/lib/ucode/audit/browser.rb +32 -0
- data/lib/ucode/audit/context.rb +27 -1
- data/lib/ucode/audit/coverage_reference.rb +103 -0
- data/lib/ucode/audit/differ.rb +121 -0
- data/lib/ucode/audit/emitter/block_emitter.rb +52 -0
- data/lib/ucode/audit/emitter/codepoint_emitter.rb +87 -0
- data/lib/ucode/audit/emitter/collection_emitter.rb +80 -0
- data/lib/ucode/audit/emitter/face_directory.rb +212 -0
- data/lib/ucode/audit/emitter/glyph_emitter.rb +48 -0
- data/lib/ucode/audit/emitter/index_emitter.rb +149 -0
- data/lib/ucode/audit/emitter/library_emitter.rb +96 -0
- data/lib/ucode/audit/emitter/paths.rb +312 -0
- data/lib/ucode/audit/emitter/plane_emitter.rb +29 -0
- data/lib/ucode/audit/emitter/script_emitter.rb +29 -0
- data/lib/ucode/audit/emitter.rb +29 -0
- data/lib/ucode/audit/extractors/aggregations.rb +31 -2
- data/lib/ucode/audit/face_auditor.rb +86 -0
- data/lib/ucode/audit/formatters/audit_diff_text.rb +112 -0
- data/lib/ucode/audit/formatters/audit_text.rb +411 -0
- data/lib/ucode/audit/formatters/color.rb +48 -0
- data/lib/ucode/audit/formatters/library_summary_text.rb +98 -0
- data/lib/ucode/audit/formatters/text_formatter.rb +83 -0
- data/lib/ucode/audit/formatters.rb +23 -0
- data/lib/ucode/audit/library_aggregator.rb +86 -0
- data/lib/ucode/audit/library_auditor.rb +105 -0
- data/lib/ucode/audit/release/emitter.rb +152 -0
- data/lib/ucode/audit/release/face_card.rb +93 -0
- data/lib/ucode/audit/release/formula_audits.rb +50 -0
- data/lib/ucode/audit/release/library_index_builder.rb +78 -0
- data/lib/ucode/audit/release/manifest_builder.rb +127 -0
- data/lib/ucode/audit/release.rb +42 -0
- data/lib/ucode/audit/ucd_only_reference.rb +81 -0
- data/lib/ucode/audit/universal_set_reference.rb +136 -0
- data/lib/ucode/audit.rb +31 -0
- data/lib/ucode/cli.rb +339 -33
- data/lib/ucode/commands/audit/browser_command.rb +82 -0
- data/lib/ucode/commands/audit/collection_command.rb +103 -0
- data/lib/ucode/commands/audit/compare_command.rb +188 -0
- data/lib/ucode/commands/audit/font_command.rb +140 -0
- data/lib/ucode/commands/audit/library_command.rb +87 -0
- data/lib/ucode/commands/audit/reference_builder.rb +64 -0
- data/lib/ucode/commands/audit.rb +20 -0
- data/lib/ucode/commands/block_feed.rb +73 -0
- data/lib/ucode/commands/canonical_build.rb +138 -0
- data/lib/ucode/commands/fetch.rb +37 -1
- data/lib/ucode/commands/release.rb +115 -0
- data/lib/ucode/commands/universal_set.rb +211 -0
- data/lib/ucode/commands.rb +5 -0
- data/lib/ucode/coordinator/indices.rb +11 -0
- data/lib/ucode/coordinator.rb +138 -5
- data/lib/ucode/error.rb +30 -2
- data/lib/ucode/fetch/font_fetcher/result.rb +39 -0
- data/lib/ucode/fetch/font_fetcher.rb +16 -0
- data/lib/ucode/fetch/specialist_font_fetcher.rb +280 -0
- data/lib/ucode/fetch.rb +7 -3
- data/lib/ucode/glyphs/real_fonts/cmap_cache.rb +74 -0
- data/lib/ucode/glyphs/real_fonts.rb +1 -0
- data/lib/ucode/glyphs/resolver.rb +62 -0
- data/lib/ucode/glyphs/source.rb +48 -0
- data/lib/ucode/glyphs/source_builder.rb +61 -0
- data/lib/ucode/glyphs/source_config/coverage_assertion.rb +79 -0
- data/lib/ucode/glyphs/source_config/gap_report.rb +54 -0
- data/lib/ucode/glyphs/source_config.rb +104 -0
- data/lib/ucode/glyphs/sources/pillar1_embedded_tounicode.rb +63 -0
- data/lib/ucode/glyphs/sources/pillar3_last_resort.rb +51 -0
- data/lib/ucode/glyphs/sources/tier1_real_font.rb +104 -0
- data/lib/ucode/glyphs/sources.rb +20 -0
- data/lib/ucode/glyphs/universal_set/builder.rb +161 -0
- data/lib/ucode/glyphs/universal_set/coverage_report.rb +139 -0
- data/lib/ucode/glyphs/universal_set/idempotency.rb +86 -0
- data/lib/ucode/glyphs/universal_set/manifest_accumulator.rb +195 -0
- data/lib/ucode/glyphs/universal_set/manifest_writer.rb +61 -0
- data/lib/ucode/glyphs/universal_set/pre_build_check.rb +197 -0
- data/lib/ucode/glyphs/universal_set/validator.rb +204 -0
- data/lib/ucode/glyphs/universal_set.rb +45 -0
- data/lib/ucode/glyphs.rb +6 -0
- data/lib/ucode/models/audit/baseline.rb +6 -0
- data/lib/ucode/models/audit/block_summary.rb +7 -0
- data/lib/ucode/models/audit/codepoint_provenance.rb +39 -0
- data/lib/ucode/models/audit/release_face.rb +42 -0
- data/lib/ucode/models/audit/release_formula.rb +33 -0
- data/lib/ucode/models/audit/release_manifest.rb +43 -0
- data/lib/ucode/models/audit/release_universal_set.rb +37 -0
- data/lib/ucode/models/audit.rb +9 -0
- data/lib/ucode/models/block.rb +2 -0
- data/lib/ucode/models/build_report.rb +109 -0
- data/lib/ucode/models/codepoint/glyph.rb +42 -0
- data/lib/ucode/models/codepoint.rb +3 -0
- data/lib/ucode/models/glyph_source.rb +86 -0
- data/lib/ucode/models/glyph_source_map.rb +138 -0
- data/lib/ucode/models/specialist_font.rb +70 -0
- data/lib/ucode/models/specialist_font_manifest.rb +48 -0
- data/lib/ucode/models/unihan_entry.rb +81 -9
- data/lib/ucode/models/unihan_field.rb +21 -0
- data/lib/ucode/models/universal_set_entry.rb +47 -0
- data/lib/ucode/models/universal_set_manifest.rb +78 -0
- data/lib/ucode/models/validation_report.rb +99 -0
- data/lib/ucode/models.rb +9 -0
- data/lib/ucode/parsers/named_sequences.rb +5 -5
- data/lib/ucode/parsers/unihan.rb +50 -19
- data/lib/ucode/repo/aggregate_writer.rb +34 -2
- data/lib/ucode/repo/block_feed_emitter.rb +153 -0
- data/lib/ucode/repo/build_report_accumulator.rb +138 -0
- data/lib/ucode/repo/build_report_writer.rb +46 -0
- data/lib/ucode/repo/build_validator.rb +229 -0
- data/lib/ucode/repo/codepoint_writer.rb +50 -1
- data/lib/ucode/repo/paths.rb +8 -0
- data/lib/ucode/repo.rb +4 -0
- data/lib/ucode/version.rb +1 -1
- data/schema/block-feed.output.schema.yml +134 -0
- metadata +143 -2
- data/ucode.gemspec +0 -56
|
@@ -0,0 +1,160 @@
|
|
|
1
|
+
# 35 — Universal set production run + glyph provenance (Part 1 close)
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Run `ucode universal-set build 17.0.0` end-to-end with the curated
|
|
6
|
+
coverage matrix (TODO 32) and acquired fonts (TODO 33). Produce one
|
|
7
|
+
SVG glyph per assigned Unicode 17 codepoint (~299,382 files), plus
|
|
8
|
+
a manifest recording which source (Tier 1 font, pillar PDF, or Last Resort) produced each glyph.
|
|
9
|
+
|
|
10
|
+
Output goes under `output/universal_glyph_set/`:
|
|
11
|
+
|
|
12
|
+
```
|
|
13
|
+
output/universal_glyph_set/
|
|
14
|
+
manifest.json # version, counts, generated_at
|
|
15
|
+
entries/
|
|
16
|
+
U+0041.json # { cp, source: { kind, label, tier }, sha256, ... }
|
|
17
|
+
U+0042.json
|
|
18
|
+
...
|
|
19
|
+
glyphs/
|
|
20
|
+
U+0041.svg
|
|
21
|
+
U+0042.svg
|
|
22
|
+
...
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
The manifest is the **glyph provenance** record — for every glyph,
|
|
26
|
+
which font (or pillar 2 PDF, or Last Resort) produced it. This is
|
|
27
|
+
what fontist.org's char page surfaces as "this glyph came from
|
|
28
|
+
NotoSerifTaiYo, OFL, version 1.0".
|
|
29
|
+
|
|
30
|
+
## Why a separate TODO
|
|
31
|
+
|
|
32
|
+
TODO 31 built the production infrastructure (`universal-set build` +
|
|
33
|
+
`validate` commands). It hasn't actually run against a complete
|
|
34
|
+
font set — every prior attempt failed at pre-check because fonts
|
|
35
|
+
were missing.
|
|
36
|
+
|
|
37
|
+
With TODO 32 (policy) and TODO 33 (acquisition) done, the production
|
|
38
|
+
run becomes possible. This TODO is the integration test: does the
|
|
39
|
+
end-to-end pipeline produce a complete, validated, provenance-tracked
|
|
40
|
+
universal set?
|
|
41
|
+
|
|
42
|
+
## Scope
|
|
43
|
+
|
|
44
|
+
### Phase A — Pre-check green
|
|
45
|
+
|
|
46
|
+
1. Run `ucode universal-set pre-check 17.0.0`. Must report zero
|
|
47
|
+
missing fonts. If anything's still missing, bounce back to
|
|
48
|
+
TODO 33.
|
|
49
|
+
|
|
50
|
+
2. Capture the pre-check report as a fixture under
|
|
51
|
+
`spec/fixtures/universal_set/pre_check_17.json`. Future
|
|
52
|
+
regressions are caught by diffing against this.
|
|
53
|
+
|
|
54
|
+
### Phase B — Full build
|
|
55
|
+
|
|
56
|
+
3. Run `ucode universal-set build 17.0.0 --to=./output/universal_glyph_set`.
|
|
57
|
+
Expected duration: ~30–60 minutes for 299,382 glyphs (dependent
|
|
58
|
+
on font cache warmth and pillar 2 PDF rendering for blocks that
|
|
59
|
+
need it).
|
|
60
|
+
|
|
61
|
+
4. Capture build metrics:
|
|
62
|
+
- Total codepoints processed
|
|
63
|
+
- Per-tier breakdown (Tier 1 / Pillar 1 / Pillar 2 / Pillar 3)
|
|
64
|
+
- Per-block coverage %
|
|
65
|
+
- Wall-clock time
|
|
66
|
+
|
|
67
|
+
5. Validate: `ucode universal-set validate ./output/universal_glyph_set`.
|
|
68
|
+
Must pass every check:
|
|
69
|
+
- manifest_loadable
|
|
70
|
+
- glyph_files_present (every codepoint has an SVG)
|
|
71
|
+
- totals_reconcile (manifest counts match file counts)
|
|
72
|
+
- provenance_complete (every entry has a source)
|
|
73
|
+
- structural_yaml_valid
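The checks listed above could be sketched roughly as follows. This is a minimal illustration and not the gem's actual `Validator`: the helper name `validate_universal_set` and the `counts.total` manifest key are assumptions. Only the directory layout and the check names come from this TODO, and `structural_yaml_valid` is left out because its target files are not described here.

```ruby
require "json"

# Hypothetical helper: returns { check_name => true/false }.
# Assumes manifest.json looks like { "counts" => { "total" => Integer } }.
def validate_universal_set(root)
  results = {}
  manifest = begin
    JSON.parse(File.read(File.join(root, "manifest.json")))
  rescue Errno::ENOENT, JSON::ParserError
    nil
  end
  results[:manifest_loadable] = !manifest.nil?

  entries_dir = File.join(root, "entries")
  glyphs_dir = File.join(root, "glyphs")
  entries = Dir.glob("*.json", base: entries_dir).sort.map do |name|
    JSON.parse(File.read(File.join(entries_dir, name)))
  end

  # Every entry needs a matching glyphs/<id>.svg file.
  results[:glyph_files_present] = entries.all? do |entry|
    File.exist?(File.join(glyphs_dir, "#{entry['id']}.svg"))
  end

  # Manifest total must equal both the entry count and the SVG count.
  glyph_count = Dir.glob("*.svg", base: glyphs_dir).size
  expected = manifest && manifest.dig("counts", "total")
  results[:totals_reconcile] = expected == entries.size && expected == glyph_count

  # Every entry must name where its glyph came from.
  results[:provenance_complete] = entries.all? do |entry|
    entry["source"].is_a?(Hash) && !entry["source"]["kind"].to_s.empty?
  end
  results
end
```

A CLI wrapper would presumably exit non-zero when any value in the returned hash is false.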
|
|
74
|
+
|
|
75
|
+
### Phase C — Provenance surfacing
|
|
76
|
+
|
|
77
|
+
6. Extend the manifest schema to include per-entry provenance that
|
|
78
|
+
fontist.org can render. Each `entries/U+XXXX.json` carries:
|
|
79
|
+
|
|
80
|
+
```json
|
|
81
|
+
{
|
|
82
|
+
"cp": 65,
|
|
83
|
+
"id": "U+0041",
|
|
84
|
+
"block_id": "Basic_Latin",
|
|
85
|
+
"source": {
|
|
86
|
+
"tier": 1,
|
|
87
|
+
"kind": "fontist",
|
|
88
|
+
"label": "noto-sans",
|
|
89
|
+
"version": "...",
|
|
90
|
+
"license": "OFL"
|
|
91
|
+
},
|
|
92
|
+
"sha256": "...",
|
|
93
|
+
"extracted_at": "2026-..."
|
|
94
|
+
}
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
7. Pillar 2 entries carry:
|
|
98
|
+
```json
|
|
99
|
+
{
|
|
100
|
+
"source": {
|
|
101
|
+
"tier": 2,
|
|
102
|
+
"kind": "pdf_charts",
|
|
103
|
+
"pdf_url": "https://www.unicode.org/charts/PDF/U1E6C0.pdf",
|
|
104
|
+
"pdf_sha256": "...",
|
|
105
|
+
"cid": 41
|
|
106
|
+
}
|
|
107
|
+
}
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
8. Pillar 3 entries (Last Resort tofu) carry:
|
|
111
|
+
```json
|
|
112
|
+
{
|
|
113
|
+
"source": {
|
|
114
|
+
"tier": 3,
|
|
115
|
+
"kind": "last_resort",
|
|
116
|
+
"label": "Last Resort Font"
|
|
117
|
+
}
|
|
118
|
+
}
|
|
119
|
+
```
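As a rough sketch of how one `entries/U+XXXX.json` payload might be assembled, assuming `sha256` is the digest of the SVG bytes (this TODO does not say what it hashes) and using a hypothetical `build_entry` helper:

```ruby
require "digest"
require "json"
require "time"

# Hypothetical helper: builds one per-codepoint provenance record.
# `source` is passed through as-is, so Tier 1, pillar 2 and pillar 3
# shapes all fit. Assumption: sha256 covers the SVG bytes.
def build_entry(cp, block_id, source, svg, extracted_at: Time.now.utc)
  {
    "cp" => cp,
    "id" => format("U+%04X", cp),
    "block_id" => block_id,
    "source" => source,
    "sha256" => Digest::SHA256.hexdigest(svg),
    "extracted_at" => extracted_at.iso8601
  }
end

entry = build_entry(
  0x41, "Basic_Latin",
  { "tier" => 1, "kind" => "fontist", "label" => "noto-sans", "license" => "OFL" },
  "<svg xmlns=\"http://www.w3.org/2000/svg\"/>"
)
puts JSON.pretty_generate(entry)
```

`extracted_at` is a parameter here because a fresh timestamp on every run would conflict with the byte-identical re-run requirement in Phase E; a real build would likely need to carry the earlier value forward for unchanged glyphs.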
|
|
120
|
+
|
|
121
|
+
### Phase D — HTML browser
|
|
122
|
+
|
|
123
|
+
9. Generate `output/universal_glyph_set/index.html` — a static page
|
|
124
|
+
summarizing the build:
|
|
125
|
+
|
|
126
|
+
- Top-level stats (total codepoints, per-tier pie chart)
|
|
127
|
+
- Per-block table with coverage % and a sample of glyphs
|
|
128
|
+
- Click a glyph → see full provenance
|
|
129
|
+
|
|
130
|
+
Reuse the existing audit browser generator (`ucode audit browser`)
|
|
131
|
+
pattern. Output is dev-server-friendly — no JS build required.
|
|
132
|
+
|
|
133
|
+
### Phase E — Idempotency
|
|
134
|
+
|
|
135
|
+
10. Re-run the build without changing inputs. Every file must be
|
|
136
|
+
byte-identical (mtime unchanged). The existing `Idempotency`
|
|
137
|
+
module handles this; just verify it holds end-to-end.
|
|
138
|
+
|
|
139
|
+
11. Re-run after touching one font (re-fetch Lentariso at the same
|
|
140
|
+
version). Only that font's glyphs should rewrite.
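The "mtime unchanged" property usually comes from skipping the write when the bytes already match. The sketch below shows that idea with a hypothetical `write_if_changed` helper; the gem's `Idempotency` module may work differently.

```ruby
# Hypothetical helper: writes only when the bytes differ, so an unchanged
# re-run leaves the file (and its mtime) untouched.
# Returns true when the file was written, false when it was skipped.
def write_if_changed(path, content)
  bytes = content.b
  return false if File.exist?(path) && File.binread(path) == bytes

  File.binwrite(path, bytes)
  true
end
```

Counting the `true` results per source font would give a quick check for step 11: after re-fetching one font, only that font's glyph paths should report a write.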
|
|
141
|
+
|
|
142
|
+
## Acceptance
|
|
143
|
+
|
|
144
|
+
- [ ] `output/universal_glyph_set/` exists with `manifest.json` +
|
|
145
|
+
`entries/` + `glyphs/`
|
|
146
|
+
- [ ] 299,382+ glyph SVGs present (one per assigned codepoint)
|
|
147
|
+
- [ ] 0 pillar 3 fallbacks for blocks with known Tier 1 sources
|
|
148
|
+
- [ ] `universal-set validate` exits 0
|
|
149
|
+
- [ ] HTML browser renders locally with no JS errors
|
|
150
|
+
- [ ] Re-running build is a byte-identical no-op
|
|
151
|
+
- [ ] Provenance JSON verified for U+0041 (Tier 1 noto-sans) and U+1E6C0
|
|
152
|
+
(Tier 1 NotoSerifTaiYo) and at least one pillar 2 entry
|
|
153
|
+
|
|
154
|
+
## References
|
|
155
|
+
|
|
156
|
+
- [TODO 24](24-universal-glyph-set-build.md) — build infrastructure
|
|
157
|
+
- [TODO 31](31-universal-set-production-build.md) — production design
|
|
158
|
+
- [TODO 32](32-uc17-coverage-matrix.md) — input policy
|
|
159
|
+
- [TODO 33](33-specialist-font-acquisition-refresh.md) — input fonts
|
|
160
|
+
- [TODO 38](38-fontist-org-glyph-consumer.md) — consumer wiring
|
|
@@ -0,0 +1,145 @@
|
|
|
1
|
+
# 36 — Per-font coverage audit against universal set (Part 2 master)
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Given the universal glyph set (TODO 35) as the **reference baseline**,
|
|
6
|
+
audit ANY font's cmap coverage against it. For each font in our
|
|
7
|
+
library (or any user-supplied font), produce:
|
|
8
|
+
|
|
9
|
+
- Per-block coverage % (e.g. "Lentariso covers 100% of Sidetic,
|
|
10
|
+
0% of Basic Latin")
|
|
11
|
+
- Per-block gap list (codepoints the font misses)
|
|
12
|
+
- Per-block extras (codepoints the font covers that aren't assigned
|
|
13
|
+
in Unicode 17 — rare but possible for old font versions)
|
|
14
|
+
- Overall coverage score weighted by Unicode 17 assigned count
|
|
15
|
+
|
|
16
|
+
This is **Part 2** of the user's directive: use the universal set
|
|
17
|
+
to highlight missing glyphs in specific fonts. The output drives
|
|
18
|
+
font selection decisions ("which font should fontist.org use for
|
|
19
|
+
this block?") and surfaces fonts that claim Unicode X.Y support
|
|
20
|
+
but actually have cmap gaps.
|
|
21
|
+
|
|
22
|
+
## Why a separate TODO
|
|
23
|
+
|
|
24
|
+
TODO 25 built the `CoverageReference` infrastructure (universal set
|
|
25
|
+
as the comparison baseline). TODO 26 built the missing-glyph
|
|
26
|
+
reporter. Neither has been RUN against real fonts because the
|
|
27
|
+
universal set wasn't built (TODO 35 unblocks that).
|
|
28
|
+
|
|
29
|
+
With TODO 35 done, this TODO is the actual audit: walk each font
|
|
30
|
+
in our library, compare cmap to universal set, emit per-font
|
|
31
|
+
coverage reports.
|
|
32
|
+
|
|
33
|
+
## Scope
|
|
34
|
+
|
|
35
|
+
### Phase A — Audit library command
|
|
36
|
+
|
|
37
|
+
1. Extend `ucode audit font` to accept an optional
|
|
38
|
+
`--reference-universal-set=<path>` flag. When provided, the
|
|
39
|
+
audit includes a `coverage` section comparing the font's cmap
|
|
40
|
+
to the universal set's per-block codepoint lists.
|
|
41
|
+
|
|
42
|
+
2. The audit output gains a new section:
|
|
43
|
+
```json
|
|
44
|
+
{
|
|
45
|
+
"coverage": {
|
|
46
|
+
"per_block": [
|
|
47
|
+
{
|
|
48
|
+
"block_id": "Sidetic",
|
|
49
|
+
"range": ["U+10940", "U+1095F"],
|
|
50
|
+
"assigned_count": 26,
|
|
51
|
+
"covered_count": 26,
|
|
52
|
+
"missing": [],
|
|
53
|
+
"extras": []
|
|
54
|
+
},
|
|
55
|
+
...
|
|
56
|
+
],
|
|
57
|
+
"overall": {
|
|
58
|
+
"total_assigned": 299382,
|
|
59
|
+
"total_covered": 145233,
|
|
60
|
+
"percentage": 48.5
|
|
61
|
+
}
|
|
62
|
+
}
|
|
63
|
+
}
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
3. Extend `ucode audit library` (walks a directory of fonts) to
|
|
67
|
+
produce a per-font summary table. Sortable by overall %, by
|
|
68
|
+
per-block coverage, or by font family.
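The `coverage` section above could be computed along these lines. This is a sketch under assumptions: `coverage_report` is a hypothetical helper, `by_block` maps block ids to assigned codepoints, and `cmap` is the set of codepoints the font maps. `extras` is omitted because it needs block ranges, which this input shape does not carry.

```ruby
require "set"

# Hypothetical helper mirroring the shape of the `coverage` section.
# by_block: { block_id => [assigned codepoints] }
# cmap:     enumerable of codepoints present in the font's cmap
def coverage_report(by_block, cmap)
  cmap = cmap.to_set
  per_block = by_block.map do |block_id, assigned|
    missing = assigned.reject { |cp| cmap.include?(cp) }
    {
      "block_id" => block_id,
      "assigned_count" => assigned.size,
      "covered_count" => assigned.size - missing.size,
      "missing" => missing
    }
  end

  # Overall score is weighted by assigned count, not averaged per block.
  total_assigned = per_block.sum { |b| b["assigned_count"] }
  total_covered = per_block.sum { |b| b["covered_count"] }
  percentage = total_assigned.zero? ? 0.0 : (100.0 * total_covered / total_assigned).round(1)
  {
    "per_block" => per_block,
    "overall" => {
      "total_assigned" => total_assigned,
      "total_covered" => total_covered,
      "percentage" => percentage
    }
  }
end
```

Summing counts before dividing keeps a tiny fully covered block from outweighing a large, mostly missing one.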
|
|
69
|
+
|
|
70
|
+
### Phase B — Reference baseline extraction
|
|
71
|
+
|
|
72
|
+
4. Build a fast-loading reference structure from the universal set:
|
|
73
|
+
|
|
74
|
+
```
|
|
75
|
+
output/universal_glyph_set/reference/
|
|
76
|
+
by_block.json # { block_id → [cp_int, ...] }
|
|
77
|
+
all_cps.bin # sorted array of cp_int, for fast bsearch
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
The audit loads this once and compares each font's cmap against
|
|
81
|
+
it. Avoids re-reading 299k individual entry JSONs.
|
|
82
|
+
|
|
83
|
+
5. The reference is generated as part of TODO 35's build step. This
|
|
84
|
+
TODO consumes it.
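The layout of `all_cps.bin` is not specified here. Assuming it is a flat run of little-endian unsigned 32-bit integers in ascending order, loading and membership testing might look like the sketch below (all three helper names are hypothetical):

```ruby
# Assumed on-disk format: codepoints as little-endian uint32, sorted ascending.
def write_all_cps(path, cps)
  File.binwrite(path, cps.uniq.sort.pack("V*"))
end

def load_all_cps(path)
  File.binread(path).unpack("V*")
end

# Binary search over the sorted array using Array#bsearch in find-any mode:
# the block returns 0 on a match, positive to search right, negative to search left.
def assigned?(sorted_cps, cp)
  !sorted_cps.bsearch { |x| cp <=> x }.nil?
end
```

At four bytes per codepoint the full Unicode 17 list stays near 1.2 MB, so reading it once per audit run is cheap compared with parsing 299k entry files.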
|
|
85
|
+
|
|
86
|
+
### Phase C — Per-font gap browser
|
|
87
|
+
|
|
88
|
+
6. Extend the HTML face browser (`ucode audit browser`) to surface
|
|
89
|
+
coverage gaps visually. For each font:
|
|
90
|
+
|
|
91
|
+
- Per-block table with coverage %, color-coded (green ≥95%,
|
|
92
|
+
yellow 50% to <95%, red <50%)
|
|
93
|
+
- Click a block → see the actual missing glyphs as a grid
|
|
94
|
+
(showing the universal set's glyph SVG for each missing cp,
|
|
95
|
+
so the user can see what the font is missing)
|
|
96
|
+
|
|
97
|
+
7. Library-level summary page:
|
|
98
|
+
- Top-N fonts by overall coverage
|
|
99
|
+
- Heatmap: font × block, cell color = coverage %
|
|
100
|
+
- "Best font per block" table (which font has the highest
|
|
101
|
+
coverage for each block)
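The color bands and the "best font per block" table reduce to two small functions. The sketch below assumes a hypothetical input shape of `{ font_name => { block_id => percentage } }` and treats exactly 95% as green.

```ruby
# Thresholds from the per-block table: green at 95 and above,
# yellow from 50 up to (not including) 95, red below 50.
def coverage_band(percentage)
  if percentage >= 95 then "green"
  elsif percentage >= 50 then "yellow"
  else "red"
  end
end

# coverage: { font_name => { block_id => percentage } } (hypothetical shape).
# Returns { block_id => [best_font, percentage] }; the first font seen wins ties.
def best_font_per_block(coverage)
  best = {}
  coverage.each do |font, blocks|
    blocks.each do |block_id, pct|
      best[block_id] = [font, pct] if best[block_id].nil? || pct > best[block_id][1]
    end
  end
  best
end
```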
|
|
102
|
+
|
|
103
|
+
### Phase D — Coverage regression detection
|
|
104
|
+
|
|
105
|
+
8. When a font is updated (re-installed via fontist, or new version
|
|
106
|
+
fetched), re-run the audit and DIFF against the prior run.
|
|
107
|
+
Surface:
|
|
108
|
+
- Newly-covered codepoints (good)
|
|
109
|
+
- Newly-missing codepoints (regression — flag for review)
|
|
110
|
+
|
|
111
|
+
9. CI mode: in the ucode release workflow, re-audit the universal
|
|
112
|
+
set's Tier 1 fonts against the latest universal set. Any
|
|
113
|
+
coverage regression blocks the release.
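Regression detection is a set difference between two audit runs. The sketch below is illustrative only and is not the gem's `Differ`; it assumes each run can be reduced to `{ block_id => [covered codepoints] }`.

```ruby
# prior/current: { block_id => [covered codepoints] } from two audit runs
# (hypothetical shape). Returns only the blocks whose coverage changed.
def coverage_diff(prior, current)
  (prior.keys | current.keys).sort.each_with_object({}) do |block_id, diff|
    before = prior.fetch(block_id, [])
    after = current.fetch(block_id, [])
    newly_covered = after - before
    newly_missing = before - after
    next if newly_covered.empty? && newly_missing.empty?

    diff[block_id] = { newly_covered: newly_covered, newly_missing: newly_missing }
  end
end

# True when any block lost a codepoint; a CI job could fail on this.
def regression?(diff)
  diff.any? { |_block_id, change| !change[:newly_missing].empty? }
end
```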
|
|
114
|
+
|
|
115
|
+
### Phase E — Public coverage dashboard
|
|
116
|
+
|
|
117
|
+
10. The HTML library browser can be published to
|
|
118
|
+
`fontist.org/unicode/coverage/` so users can search "which
|
|
119
|
+
font covers Cyrillic Extended-D?" and get an answer.
|
|
120
|
+
|
|
121
|
+
This is the fontist.org consumer integration for coverage
|
|
122
|
+
data — pairs with TODO 38 (glyph consumer).
|
|
123
|
+
|
|
124
|
+
## Acceptance
|
|
125
|
+
|
|
126
|
+
- [ ] `ucode audit font <path> --reference-universal-set=...` emits
|
|
127
|
+
a `coverage` section with per-block + overall stats
|
|
128
|
+
- [ ] `ucode audit library <dir>` walks every font and produces a
|
|
129
|
+
sortable summary
|
|
130
|
+
- [ ] HTML face browser shows per-block coverage with click-through
|
|
131
|
+
to missing-glyph grids
|
|
132
|
+
- [ ] Library browser has a heatmap view
|
|
133
|
+
- [ ] At least 10 fonts audited end-to-end as a smoke test:
|
|
134
|
+
Lentariso, Kedebideri, NotoSerifTaiYo, FSung-1, Noto Sans
|
|
135
|
+
CJK JP, Noto Sans Symbols, Noto Sans Symbols 2, Symbola,
|
|
136
|
+
Noto Music, Last Resort Font
|
|
137
|
+
|
|
138
|
+
## References
|
|
139
|
+
|
|
140
|
+
- [TODO 25](25-font-audit-against-universal-set.md) — CoverageReference
|
|
141
|
+
- [TODO 26](26-missing-glyph-reporter.md) — gap reporter
|
|
142
|
+
- [TODO 35](35-universal-set-production-run.md) — universal set (input)
|
|
143
|
+
- [TODO 37](37-coverage-highlight-reporter.md) — visualizer detail
|
|
144
|
+
- [TODO 38](38-fontist-org-glyph-consumer.md) — public dashboard
|
|
145
|
+
- [TODO 40](40-archive-private-uses-ucode-audit.md) — bin/build uses this audit in CI
|
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
# 37 — Coverage highlight reporter (missing-glyph visualizer)
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
A focused visualizer that takes ONE font + the universal set and
|
|
6
|
+
produces a per-block "missing glyph grid" — every codepoint the
|
|
7
|
+
font doesn't cover, rendered as the universal set's reference glyph
|
|
8
|
+
so the user can see at a glance what's missing.
|
|
9
|
+
|
|
10
|
+
Pairs with TODO 36 (the audit data layer). TODO 36 produces the
|
|
11
|
+
JSON-shaped gap lists; this TODO is the human-facing visualizer.
|
|
12
|
+
|
|
13
|
+
## Why a separate TODO
|
|
14
|
+
|
|
15
|
+
TODO 26 built a missing-glyph reporter, but its output is a flat
|
|
16
|
+
list of codepoint ids. For a font like Noto Sans CJK JP missing
|
|
17
|
+
200 codepoints across 5 CJK extensions, a flat list is useless —
|
|
18
|
+
you can't see the patterns. This TODO is the visual layer that
|
|
19
|
+
makes the data actionable.
|
|
20
|
+
|
|
21
|
+
The audience is font maintainers ("what's my font missing?") and
|
|
22
|
+
fontist.org ("which font should we use for this block?"). Both
|
|
23
|
+
need to see the actual glyphs, not hex strings.
|
|
24
|
+
|
|
25
|
+
## Scope
|
|
26
|
+
|
|
27
|
+
### Phase A — Per-font highlight page
|
|
28
|
+
|
|
29
|
+
1. New command: `ucode audit highlight <font-path>` — produces
|
|
30
|
+
`output/audit/highlight/<font-slug>/index.html`.
|
|
31
|
+
|
|
32
|
+
2. Page structure:
|
|
33
|
+
- Header: font name, version, license, overall coverage %
|
|
34
|
+
- Per-block sections, sorted by missing count (most-missing
|
|
35
|
+
first):
|
|
36
|
+
```
|
|
37
|
+
Block: CJK Unified Ideographs Extension J (U+323B0..U+3347F)
|
|
38
|
+
Missing: 4123 of 4298 codepoints (95.9% missing)
|
|
39
|
+
|
|
40
|
+
[Grid of missing glyphs, each cell showing:
|
|
41
|
+
- The reference glyph SVG (from universal set)
|
|
42
|
+
- Codepoint id (U+324B0)
|
|
43
|
+
- Codepoint name (CJK UNIFIED IDEOGRAPH-324B0)
|
|
44
|
+
]
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
3. Grid cell click → drill to per-codepoint page with full UCD
|
|
48
|
+
metadata (reuses the existing UnicodeCharPage shape).
|
|
49
|
+
|
|
50
|
+
### Phase B — Comparison view
|
|
51
|
+
|
|
52
|
+
4. New command: `ucode audit compare <left-font> <right-font>` —
|
|
53
|
+
side-by-side coverage diff:
|
|
54
|
+
- Left covers, right misses (red, right side)
|
|
55
|
+
- Left misses, right covers (red, left side)
|
|
56
|
+
- Both cover (no entry)
|
|
57
|
+
- Both miss (gray)
|
|
58
|
+
|
|
59
|
+
5. Use case: "FSung-1 vs Noto Sans CJK JP for CJK Ext J — which
|
|
60
|
+
should we use as Tier 1?"
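The four comparison buckets can be derived from the reference codepoints and the two cmaps. This is a minimal sketch with a hypothetical `compare_coverage` helper; the bucket names are illustrative.

```ruby
require "set"

# reference: assigned codepoints for the block(s) under comparison.
# left/right: codepoints covered by each font's cmap.
# Returns a hash with any of :both, :left_only, :right_only, :neither.
def compare_coverage(reference, left, right)
  left = left.to_set
  right = right.to_set
  reference.group_by do |cp|
    in_left = left.include?(cp)
    in_right = right.include?(cp)
    if in_left && in_right then :both
    elsif in_left then :left_only
    elsif in_right then :right_only
    else :neither
    end
  end
end
```

For the Tier 1 question in step 5, comparing the sizes of `:left_only` and `:right_only` for the block shows how many codepoints each font would add over the other.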
|
|
61
|
+
|
|
62
|
+
### Phase C — Library heatmap
|
|
63
|
+
|
|
64
|
+
6. Library-level heatmap page. Rows = fonts, columns = blocks,
|
|
65
|
+
cell color = coverage %.
|
|
66
|
+
|
|
67
|
+
7. Filter controls:
|
|
68
|
+
- Show only blocks with assigned_count > N
|
|
69
|
+
- Show only fonts with overall coverage > X%
|
|
70
|
+
- Sort by family / version / coverage
|
|
71
|
+
|
|
72
|
+
8. Cell click → drill to per-block per-font detail.
|
|
73
|
+
|
|
74
|
+
### Phase D — Embed reference glyphs efficiently
|
|
75
|
+
|
|
76
|
+
9. The highlight page embeds reference glyphs as inline SVG (not
|
|
77
|
+
`<img>` referencing SVG files — that's 200k HTTP requests for
|
|
78
|
+
a full CJK page). Inline SVG with `<symbol>` definitions:
|
|
79
|
+
|
|
80
|
+
```html
|
|
81
|
+
<svg style="display:none">
|
|
82
|
+
<defs>
|
|
83
|
+
<symbol id="U+4E00" viewBox="0 0 1000 1000">
|
|
84
|
+
<path d="..."/>
|
|
85
|
+
</symbol>
|
|
86
|
+
<symbol id="U+4E8C" viewBox="0 0 1000 1000">
|
|
87
|
+
<path d="..."/>
|
|
88
|
+
</symbol>
|
|
89
|
+
</defs>
|
|
90
|
+
</svg>
|
|
91
|
+
<svg viewBox="0 0 1000 1000"><use href="#U+4E00"/></svg>
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
10. For large blocks (CJK 20k+ glyphs), partition the page by
|
|
95
|
+
block-range subsets so the browser doesn't choke. Pagination
|
|
96
|
+
or lazy-load via IntersectionObserver.
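Generating the `<symbol>` sprite and splitting a large block into pages might look like the sketch below. Assumptions: glyphs arrive as `{ id => path data }`, the helper names are hypothetical, and the page size of 500 is an arbitrary placeholder rather than a measured limit.

```ruby
require "cgi"

# glyphs: { "U+4E00" => "M0 0 ..." } mapping ids to SVG path data
# (hypothetical shape; the real builder may carry whole SVG documents).
def symbol_sprite(glyphs, view_box: "0 0 1000 1000")
  symbols = glyphs.map do |id, path_d|
    %(    <symbol id="#{CGI.escapeHTML(id)}" viewBox="#{view_box}">\n) +
      %(      <path d="#{CGI.escapeHTML(path_d)}"/>\n) +
      %(    </symbol>)
  end
  %(<svg style="display:none">\n  <defs>\n#{symbols.join("\n")}\n  </defs>\n</svg>)
end

# Splits a large block into fixed-size pages so that no single page
# inlines tens of thousands of symbols.
def paginate(glyphs, per_page: 500)
  glyphs.each_slice(per_page).map(&:to_h)
end
```

Each page would then carry its own sprite, and grid cells would reference symbols by fragment, as in the `<use href="#U+4E00"/>` example above.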
|
|
97
|
+
|
|
98
|
+
### Phase E — Diff mode for font versions
|
|
99
|
+
|
|
100
|
+
11. `ucode audit diff <font-v1> <font-v2>` — for the same font
|
|
101
|
+
family across versions, surface:
|
|
102
|
+
- Codepoints added in v2 (good — coverage improved)
|
|
103
|
+
- Codepoints removed in v2 (regression — flag for review)
|
|
104
|
+
|
|
105
|
+
Useful for tracking Noto Sans releases across Unicode versions.
|
|
106
|
+
|
|
107
|
+
## Acceptance
|
|
108
|
+
|
|
109
|
+
- [ ] `ucode audit highlight <font>` produces an HTML page with
|
|
110
|
+
per-block missing-glyph grids
|
|
111
|
+
- [ ] `ucode audit compare <left> <right>` produces a side-by-side
|
|
112
|
+
diff page
|
|
113
|
+
- [ ] Library heatmap renders with no perf issues for ≤50 fonts ×
|
|
114
|
+
~340 blocks
|
|
115
|
+
- [ ] Reference glyphs inlined as `<symbol>` defs (no per-glyph
|
|
116
|
+
HTTP requests)
|
|
117
|
+
- [ ] CJK-scale block (20k+ glyphs) paginates or lazy-loads
|
|
118
|
+
- [ ] Cell clicks navigate to per-codepoint pages (existing
|
|
119
|
+
UnicodeCharPage)
|
|
120
|
+
|
|
121
|
+
## References
|
|
122
|
+
|
|
123
|
+
- [TODO 26](26-missing-glyph-reporter.md) — flat-list predecessor
|
|
124
|
+
- [TODO 36](36-per-font-coverage-audit.md) — audit data layer
|
|
125
|
+
- `lib/ucode/audit/browser.rb` — existing HTML browser generator
|
|
@@ -0,0 +1,141 @@
|
|
|
1
|
+
# 38 — fontist.org glyph consumer + provenance display
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Wire fontist.org's `UnicodeCharPage.vue` to consume the universal
|
|
6
|
+
glyph set (TODO 35) — replace the current system-font-fallback
|
|
7
|
+
glyph rendering with the actual SVG ucode extracted. Surface the
|
|
8
|
+
font source (provenance) next to the glyph so users see "this glyph
|
|
9
|
+
came from NotoSerifTaiYo, OFL".
|
|
10
|
+
|
|
11
|
+
This closes the loop: ucode extracts → fontist.org displays.
|
|
12
|
+
|
|
13
|
+
## Why a separate TODO
|
|
14
|
+
|
|
15
|
+
Today the char page renders glyphs via `displayChar(cp, charData.c)`
|
|
16
|
+
— browser-side font resolution. That means:
|
|
17
|
+
|
|
18
|
+
- Tai Yo / Sidetic / Beria Erfe / Egyptian Hieroglyphs show tofu
|
|
19
|
+
on most systems (no system font has them)
|
|
20
|
+
- The "which font is this from?" question has no answer
|
|
21
|
+
- ucode's universal glyph set isn't consumed anywhere
|
|
22
|
+
|
|
23
|
+
With the universal set built (TODO 35), fontist.org can fetch
|
|
24
|
+
`/unicode/glyphs/U+XXXX.svg` and render the actual extracted outline.
|
|
25
|
+
Provenance comes from the per-codepoint JSON the universal set
|
|
26
|
+
already emits.
|
|
27
|
+
|
|
28
|
+
## Scope
|
|
29
|
+
|
|
30
|
+
### Phase A — Sync glyphs into fontist.org's public/
|
|
31
|
+
|
|
32
|
+
1. Extend `scripts/fetch-data.sh` to copy `unicode/` from
|
|
33
|
+
`fontist-archive-public` (per TODO 41 — the artifacts land there
|
|
34
|
+
via ucode's publish workflow, NOT via direct raw.githubusercontent.com
|
|
35
|
+
fetch):
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
log "copying unicode/block-feed/ + universal-glyph-set/"
|
|
39
|
+
mkdir -p "$PUBLIC/unicode"
|
|
40
|
+
if [[ -d "$TMP/archive/unicode/block-feed" ]]; then
|
|
41
|
+
cp -r "$TMP/archive/unicode/block-feed/." "$PUBLIC/unicode/"
|
|
42
|
+
fi
|
|
43
|
+
if [[ -d "$TMP/archive/unicode/universal-glyph-set" ]]; then
|
|
44
|
+
mkdir -p "$PUBLIC/unicode/glyphs"
|
|
45
|
+
cp -r "$TMP/archive/unicode/universal-glyph-set/glyphs/." "$PUBLIC/unicode/glyphs/"
|
|
46
|
+
cp "$TMP/archive/unicode/universal-glyph-set/manifest.json" "$PUBLIC/unicode/manifest.json"
|
|
47
|
+
fi
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
2. Scale check: 299,382 SVG files at ~1KB each = ~300MB. Too big for
|
|
51
|
+
the fontist.org repo but fine for the public archive. For LOCAL
|
|
52
|
+
dev: full copy is OK. For deployment: rsync to CDN target (not
|
|
53
|
+
committed to git).
|
|
54
|
+
|
|
55
|
+
3. For per-codepoint JSONs (1.2 GB): add `--with-codepoints` flag
|
|
56
|
+
(default OFF). When ON, download + extract the Release asset
|
|
57
|
+
(per TODO 41 §Phase A.3) to `public/codepoints/`.
|
|
58
|
+
|
|
59
|
+
### Phase B — Char page glyph rendering
|
|
60
|
+
|
|
61
|
+
4. Update `UnicodeCharPage.vue`:
|
|
62
|
+
- Replace `displayChar(cp, charData.c)` with ``<img :src="`/unicode/glyphs/U+${hex}.svg`">``
|
|
63
|
+
when the SVG exists; fall back to `displayChar` for missing
|
|
64
|
+
glyphs (unassigned codepoints not in universal set)
|
|
65
|
+
- Add a "Source" caption: `<small>Glyph from {{ source.label }} ({{ source.license }})</small>`
|
|
66
|
+
|
|
67
|
+
5. Fetch the per-codepoint JSON (already wired in current PR #45)
|
|
68
|
+
to get `source.label`, `source.license`, `source.tier`. Show
|
|
69
|
+
tier as a badge: "Tier 1" / "Pillar 2" / "Last Resort".
|
|
70
|
+
|
|
71
|
+
### Phase C — Per-block glyph grid
|
|
72
|
+
|
|
73
|
+
6. On the block page (`UnicodeBlockPage.vue`), the existing char
|
|
74
|
+
grid currently uses `displayChar` for each cell. Replace with
|
|
75
|
+
inline SVG references via `<symbol>` defs (one def per glyph
|
|
76
|
+
on the page, cells `<use>` it). Pattern from TODO 37.
|
|
77
|
+
|
|
78
|
+
7. For CJK-scale blocks (20k+ glyphs), lazy-load on scroll. The
|
|
79
|
+
existing block grid already paginates; just swap the rendering.
|
|
80
|
+
|
|
81
|
+
### Phase D — Provenance badge component
|
|
82
|
+
|
|
83
|
+
8. New Vue component `GlyphSourceBadge.vue`:
|
|
84
|
+
```vue
|
|
85
|
+
<template>
|
|
86
|
+
<span class="gsb" :class="`gsb-${tier}`">
|
|
87
|
+
<span class="gsb-tier">{{ tierLabel }}</span>
|
|
88
|
+
<span class="gsb-label">{{ source.label }}</span>
|
|
89
|
+
<span class="gsb-license" v-if="source.license">{{ source.license }}</span>
|
|
90
|
+
</span>
|
|
91
|
+
</template>
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
9. Color coding:
|
|
95
|
+
- Tier 1 (real font): green
|
|
96
|
+
- Pillar 1 (PDF + ToUnicode): blue
|
|
97
|
+
- Pillar 2 (PDF correlation): yellow
|
|
98
|
+
- Pillar 3 (Last Resort): gray
|
|
99
|
+
|
|
100
|
+
### Phase E — Per-block coverage indicator
|
|
101
|
+
|
|
102
|
+
10. On the block list page (`/unicode`), each block entry shows
|
|
103
|
+
coverage stats from the universal set:
|
|
104
|
+
- "4123/4298 codepoints covered by Tier 1 (Noto Sans CJK JP)"
|
|
105
|
+
- Color-coded bar: green = full Tier 1, yellow = mixed, red =
|
|
106
|
+
pillar 3 (tofu)
|
|
107
|
+
|
|
108
|
+
11. Click the bar → drill to the per-block highlight page (TODO 37).
|
|
109
|
+
|
|
110
|
+
### Phase F — Glyph detail page
|
|
111
|
+
|
|
112
|
+
12. New route `/unicode/glyph/:hex` — dedicated glyph detail:
|
|
113
|
+
- Large SVG render with zoom/pan
|
|
114
|
+
- Full outline path data (collapsible `<pre>`)
|
|
115
|
+
- Provenance chain (font → cmap → GID → glyf outline → SVG)
|
|
116
|
+
- Comparison: this glyph vs other Tier 1 fonts covering same cp
|
|
117
|
+
|
|
118
|
+
Useful for font designers checking extraction quality.
|
|
119
|
+
|
|
120
|
+
## Acceptance
|
|
121
|
+
|
|
122
|
+
- [ ] `scripts/fetch-data.sh` copies `unicode/` from fontist-archive-public
|
|
123
|
+
(block-feed + universal-glyph-set; per-cp JSONs via optional flag)
|
|
124
|
+
- [ ] `UnicodeCharPage.vue` renders the universal-set SVG (not
|
|
125
|
+
system fallback) for codepoints in the universal set
|
|
126
|
+
- [ ] Provenance badge shows next to every glyph
|
|
127
|
+
- [ ] Block grid renders glyphs via inline SVG `<symbol>` defs
|
|
128
|
+
(no per-glyph HTTP requests)
|
|
129
|
+
- [ ] Block list page shows per-block Tier 1 coverage %
|
|
130
|
+
- [ ] `/unicode/glyph/:hex` route exists with full provenance view
|
|
131
|
+
- [ ] Tai Yo / Sidetic / Egyptian Hieroglyphs render real glyphs
|
|
132
|
+
(no tofu) when sourced from universal set
|
|
133
|
+
|
|
134
|
+
## References
|
|
135
|
+
|
|
136
|
+
- [TODO 27](27-fontist-org-consumer-integration.md) — original consumer TODO
|
|
137
|
+
- [TODO 35](35-universal-set-production-run.md) — universal set (input)
|
|
138
|
+
- [TODO 37](37-coverage-highlight-reporter.md) — visualizer patterns
|
|
139
|
+
- [TODO 41](41-ucode-unicode-archive-bridge.md) — publishing pipeline
|
|
140
|
+
- `src/pages/UnicodeCharPage.vue` — current char page
|
|
141
|
+
- `src/pages/UnicodeBlockPage.vue` — current block page
|