ucode 0.1.0 → 0.1.1

Files changed (174)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +72 -0
  3. data/Gemfile.lock +2 -2
  4. data/TODO.full/00-README.md +116 -0
  5. data/TODO.full/01-panglyph-vision.md +112 -0
  6. data/TODO.full/02-panglyph-repo-bootstrap.md +184 -0
  7. data/TODO.full/03-panglyph-font-builder.md +201 -0
  8. data/TODO.full/04-panglyph-publish-pipeline.md +126 -0
  9. data/TODO.full/05-ucode-0-1-1-release.md +139 -0
  10. data/TODO.full/06-fontisan-remove-audit.md +142 -0
  11. data/TODO.full/07-fontisan-remove-ucd.md +125 -0
  12. data/TODO.full/08-archive-private-bin-build.md +143 -0
  13. data/TODO.full/09-archive-public-structure.md +164 -0
  14. data/TODO.full/10-fontist-org-woff-glyphs.md +131 -0
  15. data/TODO.full/11-fontist-org-audit-coverage.md +140 -0
  16. data/TODO.full/12-implementation-order.md +216 -0
  17. data/TODO.full/13-fontisan-font-writer-api.md +189 -0
  18. data/TODO.full/14-fontisan-table-writers.md +66 -0
  19. data/TODO.full/15-panglyph-builder-real.md +82 -0
  20. data/TODO.full/16-archive-public-sync-workflows.md +167 -0
  21. data/TODO.full/17-fontist-org-font-picker.md +73 -0
  22. data/TODO.full/18-comprehensive-spec-coverage.md +64 -0
  23. data/TODO.full/19-ucode-0-1-2-patch.md +32 -0
  24. data/TODO.full/20-fontisan-0-2-23-release.md +52 -0
  25. data/TODO.new/00-README.md +30 -0
  26. data/TODO.new/23-universal-glyph-set-source-map.md +312 -0
  27. data/TODO.new/24-universal-glyph-set-build.md +189 -0
  28. data/TODO.new/25-font-audit-against-universal-set.md +195 -0
  29. data/TODO.new/26-missing-glyph-reporter.md +189 -0
  30. data/TODO.new/27-fontist-org-consumer-integration.md +200 -0
  31. data/TODO.new/28-implementation-order-update.md +187 -0
  32. data/TODO.new/29-universal-set-curation-uc17.md +312 -0
  33. data/TODO.new/30-tier1-font-acquisition.md +241 -0
  34. data/TODO.new/31-universal-set-production-build.md +205 -0
  35. data/TODO.new/32-uc17-coverage-matrix.md +165 -0
  36. data/TODO.new/33-specialist-font-acquisition-refresh.md +138 -0
  37. data/TODO.new/34-pillar2-content-stream-correlator.md +147 -0
  38. data/TODO.new/35-universal-set-production-run.md +160 -0
  39. data/TODO.new/36-per-font-coverage-audit.md +145 -0
  40. data/TODO.new/37-coverage-highlight-reporter.md +125 -0
  41. data/TODO.new/38-fontist-org-glyph-consumer.md +141 -0
  42. data/TODO.new/39-implementation-order-update-32-38.md +258 -0
  43. data/TODO.new/40-archive-private-uses-ucode-audit.md +124 -0
  44. data/TODO.new/41-ucode-unicode-archive-bridge.md +160 -0
  45. data/config/specialist_fonts.yml +102 -0
  46. data/config/unicode17_tier1_fonts.yml +42 -0
  47. data/config/unicode17_universal_glyph_set.yml +293 -0
  48. data/lib/ucode/audit/block_aggregator.rb +57 -29
  49. data/lib/ucode/audit/browser/face_page.rb +128 -0
  50. data/lib/ucode/audit/browser/glyph_panel.rb +124 -0
  51. data/lib/ucode/audit/browser/library_page.rb +74 -0
  52. data/lib/ucode/audit/browser/missing_glyph_page.rb +87 -0
  53. data/lib/ucode/audit/browser/template.rb +47 -0
  54. data/lib/ucode/audit/browser/templates/face.css +200 -0
  55. data/lib/ucode/audit/browser/templates/face.html.erb +41 -0
  56. data/lib/ucode/audit/browser/templates/face.js +298 -0
  57. data/lib/ucode/audit/browser/templates/library.css +119 -0
  58. data/lib/ucode/audit/browser/templates/library.html.erb +42 -0
  59. data/lib/ucode/audit/browser/templates/library.js +99 -0
  60. data/lib/ucode/audit/browser/templates/missing_glyph_page.css +119 -0
  61. data/lib/ucode/audit/browser/templates/missing_glyph_page.html.erb +58 -0
  62. data/lib/ucode/audit/browser/templates/missing_glyph_page.js +2 -0
  63. data/lib/ucode/audit/browser.rb +32 -0
  64. data/lib/ucode/audit/context.rb +27 -1
  65. data/lib/ucode/audit/coverage_reference.rb +103 -0
  66. data/lib/ucode/audit/differ.rb +121 -0
  67. data/lib/ucode/audit/emitter/block_emitter.rb +52 -0
  68. data/lib/ucode/audit/emitter/codepoint_emitter.rb +87 -0
  69. data/lib/ucode/audit/emitter/collection_emitter.rb +80 -0
  70. data/lib/ucode/audit/emitter/face_directory.rb +212 -0
  71. data/lib/ucode/audit/emitter/glyph_emitter.rb +48 -0
  72. data/lib/ucode/audit/emitter/index_emitter.rb +149 -0
  73. data/lib/ucode/audit/emitter/library_emitter.rb +96 -0
  74. data/lib/ucode/audit/emitter/paths.rb +312 -0
  75. data/lib/ucode/audit/emitter/plane_emitter.rb +29 -0
  76. data/lib/ucode/audit/emitter/script_emitter.rb +29 -0
  77. data/lib/ucode/audit/emitter.rb +29 -0
  78. data/lib/ucode/audit/extractors/aggregations.rb +31 -2
  79. data/lib/ucode/audit/face_auditor.rb +86 -0
  80. data/lib/ucode/audit/formatters/audit_diff_text.rb +112 -0
  81. data/lib/ucode/audit/formatters/audit_text.rb +411 -0
  82. data/lib/ucode/audit/formatters/color.rb +48 -0
  83. data/lib/ucode/audit/formatters/library_summary_text.rb +98 -0
  84. data/lib/ucode/audit/formatters/text_formatter.rb +83 -0
  85. data/lib/ucode/audit/formatters.rb +23 -0
  86. data/lib/ucode/audit/library_aggregator.rb +86 -0
  87. data/lib/ucode/audit/library_auditor.rb +105 -0
  88. data/lib/ucode/audit/release/emitter.rb +152 -0
  89. data/lib/ucode/audit/release/face_card.rb +93 -0
  90. data/lib/ucode/audit/release/formula_audits.rb +50 -0
  91. data/lib/ucode/audit/release/library_index_builder.rb +78 -0
  92. data/lib/ucode/audit/release/manifest_builder.rb +127 -0
  93. data/lib/ucode/audit/release.rb +42 -0
  94. data/lib/ucode/audit/ucd_only_reference.rb +81 -0
  95. data/lib/ucode/audit/universal_set_reference.rb +136 -0
  96. data/lib/ucode/audit.rb +31 -0
  97. data/lib/ucode/cli.rb +339 -33
  98. data/lib/ucode/commands/audit/browser_command.rb +82 -0
  99. data/lib/ucode/commands/audit/collection_command.rb +103 -0
  100. data/lib/ucode/commands/audit/compare_command.rb +188 -0
  101. data/lib/ucode/commands/audit/font_command.rb +140 -0
  102. data/lib/ucode/commands/audit/library_command.rb +87 -0
  103. data/lib/ucode/commands/audit/reference_builder.rb +64 -0
  104. data/lib/ucode/commands/audit.rb +20 -0
  105. data/lib/ucode/commands/block_feed.rb +73 -0
  106. data/lib/ucode/commands/canonical_build.rb +138 -0
  107. data/lib/ucode/commands/fetch.rb +37 -1
  108. data/lib/ucode/commands/release.rb +115 -0
  109. data/lib/ucode/commands/universal_set.rb +211 -0
  110. data/lib/ucode/commands.rb +5 -0
  111. data/lib/ucode/coordinator/indices.rb +11 -0
  112. data/lib/ucode/coordinator.rb +138 -5
  113. data/lib/ucode/error.rb +30 -2
  114. data/lib/ucode/fetch/font_fetcher/result.rb +39 -0
  115. data/lib/ucode/fetch/font_fetcher.rb +16 -0
  116. data/lib/ucode/fetch/specialist_font_fetcher.rb +280 -0
  117. data/lib/ucode/fetch.rb +7 -3
  118. data/lib/ucode/glyphs/real_fonts/cmap_cache.rb +74 -0
  119. data/lib/ucode/glyphs/real_fonts.rb +1 -0
  120. data/lib/ucode/glyphs/resolver.rb +62 -0
  121. data/lib/ucode/glyphs/source.rb +48 -0
  122. data/lib/ucode/glyphs/source_builder.rb +61 -0
  123. data/lib/ucode/glyphs/source_config/coverage_assertion.rb +79 -0
  124. data/lib/ucode/glyphs/source_config/gap_report.rb +54 -0
  125. data/lib/ucode/glyphs/source_config.rb +104 -0
  126. data/lib/ucode/glyphs/sources/pillar1_embedded_tounicode.rb +63 -0
  127. data/lib/ucode/glyphs/sources/pillar3_last_resort.rb +51 -0
  128. data/lib/ucode/glyphs/sources/tier1_real_font.rb +104 -0
  129. data/lib/ucode/glyphs/sources.rb +20 -0
  130. data/lib/ucode/glyphs/universal_set/builder.rb +161 -0
  131. data/lib/ucode/glyphs/universal_set/coverage_report.rb +139 -0
  132. data/lib/ucode/glyphs/universal_set/idempotency.rb +86 -0
  133. data/lib/ucode/glyphs/universal_set/manifest_accumulator.rb +195 -0
  134. data/lib/ucode/glyphs/universal_set/manifest_writer.rb +61 -0
  135. data/lib/ucode/glyphs/universal_set/pre_build_check.rb +197 -0
  136. data/lib/ucode/glyphs/universal_set/validator.rb +204 -0
  137. data/lib/ucode/glyphs/universal_set.rb +45 -0
  138. data/lib/ucode/glyphs.rb +6 -0
  139. data/lib/ucode/models/audit/baseline.rb +6 -0
  140. data/lib/ucode/models/audit/block_summary.rb +7 -0
  141. data/lib/ucode/models/audit/codepoint_provenance.rb +39 -0
  142. data/lib/ucode/models/audit/release_face.rb +42 -0
  143. data/lib/ucode/models/audit/release_formula.rb +33 -0
  144. data/lib/ucode/models/audit/release_manifest.rb +43 -0
  145. data/lib/ucode/models/audit/release_universal_set.rb +37 -0
  146. data/lib/ucode/models/audit.rb +9 -0
  147. data/lib/ucode/models/block.rb +2 -0
  148. data/lib/ucode/models/build_report.rb +109 -0
  149. data/lib/ucode/models/codepoint/glyph.rb +42 -0
  150. data/lib/ucode/models/codepoint.rb +3 -0
  151. data/lib/ucode/models/glyph_source.rb +86 -0
  152. data/lib/ucode/models/glyph_source_map.rb +138 -0
  153. data/lib/ucode/models/specialist_font.rb +70 -0
  154. data/lib/ucode/models/specialist_font_manifest.rb +48 -0
  155. data/lib/ucode/models/unihan_entry.rb +81 -9
  156. data/lib/ucode/models/unihan_field.rb +21 -0
  157. data/lib/ucode/models/universal_set_entry.rb +47 -0
  158. data/lib/ucode/models/universal_set_manifest.rb +78 -0
  159. data/lib/ucode/models/validation_report.rb +99 -0
  160. data/lib/ucode/models.rb +9 -0
  161. data/lib/ucode/parsers/named_sequences.rb +5 -5
  162. data/lib/ucode/parsers/unihan.rb +50 -19
  163. data/lib/ucode/repo/aggregate_writer.rb +34 -2
  164. data/lib/ucode/repo/block_feed_emitter.rb +153 -0
  165. data/lib/ucode/repo/build_report_accumulator.rb +138 -0
  166. data/lib/ucode/repo/build_report_writer.rb +46 -0
  167. data/lib/ucode/repo/build_validator.rb +229 -0
  168. data/lib/ucode/repo/codepoint_writer.rb +50 -1
  169. data/lib/ucode/repo/paths.rb +8 -0
  170. data/lib/ucode/repo.rb +4 -0
  171. data/lib/ucode/version.rb +1 -1
  172. data/schema/block-feed.output.schema.yml +134 -0
  173. metadata +143 -2
  174. data/ucode.gemspec +0 -56
@@ -0,0 +1,160 @@
1
+ # 35 — Universal set production run + glyph provenance (Part 1 close)
2
+
3
+ ## Goal
4
+
5
+ Run `ucode universal-set build 17.0.0` end-to-end with the curated
6
+ coverage matrix (TODO 32) and acquired fonts (TODO 33). Produce one
7
+ SVG glyph per assigned Unicode 17 codepoint (~299,382 files), plus
8
+ a manifest tracking which Tier 1 source produced each glyph.
9
+
10
+ Output goes under `output/universal_glyph_set/`:
11
+
12
+ ```
13
+ output/universal_glyph_set/
14
+ manifest.json # version, counts, generated_at
15
+ entries/
16
+ U+0041.json # { cp, source: { kind, label, tier }, sha256, ... }
17
+ U+0042.json
18
+ ...
19
+ glyphs/
20
+ U+0041.svg
21
+ U+0042.svg
22
+ ...
23
+ ```
24
+
25
+ The manifest is the **glyph provenance** record — for every glyph,
26
+ which font (or pillar 2 PDF, or Last Resort) produced it. This is
27
+ what fontist.org's char page surfaces as "this glyph came from
28
+ NotoSerifTaiYo, OFL, version 1.0".
29
+
30
+ ## Why a separate TODO
31
+
32
+ TODO 31 built the production infrastructure (`universal-set build` +
33
+ `validate` commands). It hasn't actually run against a complete
34
+ font set — every prior attempt failed at pre-check because fonts
35
+ were missing.
36
+
37
+ With TODO 32 (policy) and TODO 33 (acquisition) done, the production
38
+ run becomes possible. This TODO is the integration test: does the
39
+ end-to-end pipeline produce a complete, validated, provenance-tracked
40
+ universal set?
41
+
42
+ ## Scope
43
+
44
+ ### Phase A — Pre-check green
45
+
46
+ 1. Run `ucode universal-set pre-check 17.0.0`. Must report zero
47
+ missing fonts. If anything's still missing, bounce back to
48
+ TODO 33.
49
+
50
+ 2. Capture the pre-check report as a fixture under
51
+ `spec/fixtures/universal_set/pre_check_17.json`. Future
52
+ regressions are caught by diffing against this.
53
+
54
+ ### Phase B — Full build
55
+
56
+ 3. Run `ucode universal-set build 17.0.0 --to=./output/universal_glyph_set`.
57
+ Expected duration: ~30–60 minutes for 299,382 glyphs (dependent
58
+ on font cache warmth and pillar 2 PDF rendering for blocks that
59
+ need it).
60
+
61
+ 4. Capture build metrics:
62
+ - Total codepoints processed
63
+ - Per-tier breakdown (Tier 1 / Pillar 1 / Pillar 2 / Pillar 3)
64
+ - Per-block coverage %
65
+ - Wall-clock time
66
+
67
+ 5. Validate: `ucode universal-set validate ./output/universal_glyph_set`.
68
+ Must pass every check:
69
+ - manifest_loadable
70
+ - glyph_files_present (every codepoint has an SVG)
71
+ - totals_reconcile (manifest counts match file counts)
72
+ - provenance_complete (every entry has a source)
73
+ - structural_yaml_valid
74
+
75
+ ### Phase C — Provenance surfacing
76
+
77
+ 6. Extend the manifest schema to include per-entry provenance that
78
+ fontist.org can render. Each `entries/U+XXXX.json` carries:
79
+
80
+ ```json
81
+ {
82
+ "cp": 65,
83
+ "id": "U+0041",
84
+ "block_id": "Basic_Latin",
85
+ "source": {
86
+ "tier": 1,
87
+ "kind": "fontist",
88
+ "label": "noto-sans",
89
+ "version": "...",
90
+ "license": "OFL"
91
+ },
92
+ "sha256": "...",
93
+ "extracted_at": "2026-..."
94
+ }
95
+ ```
96
+
97
+ 7. Pillar 2 entries carry:
98
+ ```json
99
+ {
100
+ "source": {
101
+ "tier": 2,
102
+ "kind": "pdf_charts",
103
+ "pdf_url": "https://www.unicode.org/charts/PDF/U1E6C0.pdf",
104
+ "pdf_sha256": "...",
105
+ "cid": 41
106
+ }
107
+ }
108
+ ```
109
+
110
+ 8. Pillar 3 entries (Last Resort tofu) carry:
111
+ ```json
112
+ {
113
+ "source": {
114
+ "tier": 3,
115
+ "kind": "last_resort",
116
+ "label": "Last Resort Font"
117
+ }
118
+ }
119
+ ```
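The `provenance_complete` check from Phase B could be sketched against the three entry shapes above. This is a minimal sketch, not ucode's actual validator: `REQUIRED_SOURCE_KEYS` and `provenance_problems` are hypothetical names, and the required keys per tier are inferred only from the example entries (Pillar 1 is not shown above, so it is left out).

```ruby
# Hypothetical sketch of a provenance_complete check; not ucode's validator.
# Required keys per tier are inferred from the example entries in this TODO.
REQUIRED_SOURCE_KEYS = {
  1 => %w[kind label],
  2 => %w[kind pdf_url pdf_sha256 cid],
  3 => %w[kind label]
}.freeze

# Returns an array of human-readable problems; empty means the entry passes.
def provenance_problems(entry)
  source = entry["source"]
  return ["missing source"] unless source.is_a?(Hash)

  required = REQUIRED_SOURCE_KEYS[source["tier"]]
  return ["unknown tier: #{source['tier'].inspect}"] unless required

  required.reject { |key| source.key?(key) }.map { |key| "missing source.#{key}" }
end
```

Returning a list of problems rather than a boolean lets a validator report every gap in one pass.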
120
+
121
+ ### Phase D — HTML browser
122
+
123
+ 9. Generate `output/universal_glyph_set/index.html` — a static page
124
+ summarizing the build:
125
+
126
+ - Top-level stats (total codepoints, per-tier pie chart)
127
+ - Per-block table with coverage % and a sample of glyphs
128
+ - Click a glyph → see full provenance
129
+
130
+ Reuse the existing audit browser generator (`ucode audit browser`)
131
+ pattern. Output is dev-server-friendly — no JS build required.
132
+
133
+ ### Phase E — Idempotency
134
+
135
+ 10. Re-run the build without changing inputs. Every file must be
136
+ byte-identical (mtime unchanged). The existing `Idempotency`
137
+ module handles this; just verify it holds end-to-end.
138
+
139
+ 11. Re-run after touching one font (re-fetch Lentariso at the same
140
+ version). Only that font's glyphs should rewrite.
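The byte-identical, mtime-preserving behaviour in step 10 can be illustrated with a write-if-changed helper. This is a sketch under the assumption that idempotency is achieved by skipping unchanged writes; `write_if_changed` is a hypothetical name and is not taken from the existing `Idempotency` module.

```ruby
# Writes content only when it differs from what is already on disk, so an
# unchanged file keeps its mtime. Returns true if the file was written.
def write_if_changed(path, content)
  # Compare as raw bytes so string encoding differences do not force a rewrite.
  return false if File.exist?(path) && File.binread(path) == content.b

  File.binwrite(path, content)
  true
end
```

Counting the `true` results across a rebuild gives a direct measure for step 11: after re-fetching one font, only that font's glyph files should report a write.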
141
+
142
+ ## Acceptance
143
+
144
+ - [ ] `output/universal_glyph_set/` exists with `manifest.json` +
145
+ `entries/` + `glyphs/`
146
+ - [ ] 299,382+ glyph SVGs present (one per assigned codepoint)
147
+ - [ ] 0 pillar 3 fallbacks for blocks with known Tier 1 sources
148
+ - [ ] `universal-set validate` exits 0
149
+ - [ ] HTML browser renders locally with no JS errors
150
+ - [ ] Re-running build is a byte-identical no-op
151
+ - [ ] Provenance JSON present for U+0041 (Tier 1 noto-sans), U+1E6C0
152
+ (Tier 1 NotoSerifTaiYo), and at least one pillar 2 entry
153
+
154
+ ## References
155
+
156
+ - [TODO 24](24-universal-glyph-set-build.md) — build infrastructure
157
+ - [TODO 31](31-universal-set-production-build.md) — production design
158
+ - [TODO 32](32-uc17-coverage-matrix.md) — input policy
159
+ - [TODO 33](33-specialist-font-acquisition-refresh.md) — input fonts
160
+ - [TODO 38](38-fontist-org-glyph-consumer.md) — consumer wiring
@@ -0,0 +1,145 @@
1
+ # 36 — Per-font coverage audit against universal set (Part 2 master)
2
+
3
+ ## Goal
4
+
5
+ Given the universal glyph set (TODO 35) as the **reference baseline**,
6
+ audit ANY font's cmap coverage against it. For each font in our
7
+ library (or any user-supplied font), produce:
8
+
9
+ - Per-block coverage % (e.g. "Lentariso covers 100% of Sidetic,
10
+ 0% of Basic Latin")
11
+ - Per-block gap list (codepoints the font misses)
12
+ - Per-block extras (codepoints the font covers that aren't assigned
13
+ in Unicode 17 — rare but possible for old font versions)
14
+ - Overall coverage score weighted by Unicode 17 assigned count
15
+
16
+ This is **Part 2** of the user's directive: use the universal set
17
+ to highlight missing glyphs in specific fonts. The output drives
18
+ font selection decisions ("which font should fontist.org use for
19
+ this block?") and surfaces fonts that claim Unicode X.Y support
20
+ but actually have cmap gaps.
21
+
22
+ ## Why a separate TODO
23
+
24
+ TODO 25 built the `CoverageReference` infrastructure (universal set
25
+ as the comparison baseline). TODO 26 built the missing-glyph
26
+ reporter. Neither has been RUN against real fonts because the
27
+ universal set wasn't built (TODO 35 unblocks that).
28
+
29
+ With TODO 35 done, this TODO is the actual audit: walk each font
30
+ in our library, compare cmap to universal set, emit per-font
31
+ coverage reports.
32
+
33
+ ## Scope
34
+
35
+ ### Phase A — Audit library command
36
+
37
+ 1. Extend `ucode audit font` to accept an optional
38
+ `--reference-universal-set=<path>` flag. When provided, the
39
+ audit includes a `coverage` section comparing the font's cmap
40
+ to the universal set's per-block codepoint lists.
41
+
42
+ 2. The audit output gains a new section:
43
+ ```json
44
+ {
45
+ "coverage": {
46
+ "per_block": [
47
+ {
48
+ "block_id": "Sidetic",
49
+ "range": ["U+10940", "U+1095F"],
50
+ "assigned_count": 26,
51
+ "covered_count": 26,
52
+ "missing": [],
53
+ "extras": []
54
+ },
55
+ ...
56
+ ],
57
+ "overall": {
58
+ "total_assigned": 299382,
59
+ "total_covered": 145233,
60
+ "percentage": 48.5
61
+ }
62
+ }
63
+ }
64
+ ```
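As a rough illustration of how the `coverage` section above could be computed, the sketch below compares a font's cmap codepoints with per-block reference lists. The name `coverage_section` and the input shapes are assumptions, not ucode's API. `extras` is omitted because it needs the full set of assigned codepoints, not only the per-block lists.

```ruby
require "set"

# Hypothetical helper, not ucode's implementation.
# font_cps: codepoints present in the font's cmap (integers).
# by_block: { block_id => [cp_int, ...] } taken from the universal set.
def coverage_section(font_cps, by_block)
  font_set = font_cps.to_set
  per_block = by_block.map do |block_id, assigned|
    missing = assigned.reject { |cp| font_set.include?(cp) }
    {
      "block_id" => block_id,
      "assigned_count" => assigned.size,
      "covered_count" => assigned.size - missing.size,
      "missing" => missing
    }
  end
  total_assigned = per_block.sum { |block| block["assigned_count"] }
  total_covered = per_block.sum { |block| block["covered_count"] }
  percentage = total_assigned.zero? ? 0.0 : (100.0 * total_covered / total_assigned).round(1)
  {
    "per_block" => per_block,
    "overall" => {
      "total_assigned" => total_assigned,
      "total_covered" => total_covered,
      "percentage" => percentage
    }
  }
end
```

Because the overall figure divides total covered by total assigned, large blocks weigh more than small ones, which matches the "weighted by assigned count" goal.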
65
+
66
+ 3. Extend `ucode audit library` (walks a directory of fonts) to
67
+ produce a per-font summary table. Sortable by overall %, by
68
+ per-block coverage, or by font family.
69
+
70
+ ### Phase B — Reference baseline extraction
71
+
72
+ 4. Build a fast-loading reference structure from the universal set:
73
+
74
+ ```
75
+ output/universal_glyph_set/reference/
76
+ by_block.json # { block_id → [cp_int, ...] }
77
+ all_cps.bin # sorted array of cp_int, for fast bsearch
78
+ ```
79
+
80
+ The audit loads this once and compares each font's cmap against
81
+ it. Avoids re-reading 299k individual entry JSONs.
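One possible encoding for `all_cps.bin` is sketched below. The TODO does not specify the binary layout, so the 32-bit big-endian format is an assumption, and the helper names are hypothetical.

```ruby
# Assumption: all_cps.bin is a flat array of 32-bit big-endian unsigned
# integers, sorted ascending. The real layout is not specified in this TODO.
def pack_codepoints(cps)
  cps.uniq.sort.pack("N*")
end

def load_codepoints(bytes)
  bytes.unpack("N*")
end

# Array#bsearch in find-any mode: the block returns 0 on a match, a negative
# number to continue left, and a positive number to continue right.
def assigned?(sorted_cps, cp)
  !sorted_cps.bsearch { |candidate| cp <=> candidate }.nil?
end
```

At four bytes per codepoint, roughly 299k entries come to about 1.2 MB, which loads in a single read.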
82
+
83
+ 5. The reference is generated as part of TODO 35's build step. This
84
+ TODO consumes it.
85
+
86
+ ### Phase C — Per-font gap browser
87
+
88
+ 6. Extend the HTML face browser (`ucode audit browser`) to surface
89
+ coverage gaps visually. For each font:
90
+
91
+ - Per-block table with coverage %, color-coded (green ≥95%,
92
+ yellow 50–95%, red <50%)
93
+ - Click a block → see the actual missing glyphs as a grid
94
+ (showing the universal set's glyph SVG for each missing cp,
95
+ so the user can see what the font is missing)
96
+
97
+ 7. Library-level summary page:
98
+ - Top-N fonts by overall coverage
99
+ - Heatmap: font × block, cell color = coverage %
100
+ - "Best font per block" table (which font has the highest
101
+ coverage for each block)
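The color thresholds and the "best font per block" table above could be derived as follows. This is a sketch with hypothetical names and input shape; 95% is treated as green and 50% as yellow, which is one reading of the overlapping "50–95%" boundary.

```ruby
# Thresholds follow the color coding above: green >= 95, yellow 50..95, red < 50.
def coverage_color(percentage)
  if percentage >= 95 then "green"
  elsif percentage >= 50 then "yellow"
  else "red"
  end
end

# coverage_by_font: { font_name => { block_id => percentage } } (hypothetical shape).
# Returns { block_id => [font_name, percentage] }; on ties the first font listed wins.
def best_font_per_block(coverage_by_font)
  best = {}
  coverage_by_font.each do |font, blocks|
    blocks.each do |block_id, pct|
      best[block_id] = [font, pct] if best[block_id].nil? || pct > best[block_id][1]
    end
  end
  best
end
```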
102
+
103
+ ### Phase D — Coverage regression detection
104
+
105
+ 8. When a font is updated (re-installed via fontist, or new version
106
+ fetched), re-run the audit and DIFF against the prior run.
107
+ Surface:
108
+ - Newly-covered codepoints (good)
109
+ - Newly-missing codepoints (regression — flag for review)
110
+
111
+ 9. CI mode: in the ucode release workflow, re-audit the universal
112
+ set's Tier 1 fonts against the latest universal set. Any
113
+ coverage regression blocks the release.
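The regression check in steps 8 and 9 reduces to two set differences between the prior and current covered codepoints. A minimal sketch, assuming both runs are available as integer arrays (the function names are hypothetical):

```ruby
# previous_cps / current_cps: codepoints the font covered in each audit run.
def coverage_diff(previous_cps, current_cps)
  {
    "newly_covered" => (current_cps - previous_cps).sort,
    "newly_missing" => (previous_cps - current_cps).sort
  }
end

# A CI step could fail the release when this returns true.
def coverage_regression?(diff)
  !diff["newly_missing"].empty?
end
```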
114
+
115
+ ### Phase E — Public coverage dashboard
116
+
117
+ 10. The HTML library browser can be published to
118
+ `fontist.org/unicode/coverage/` so users can search "which
119
+ font covers Cyrillic Extended-D?" and get an answer.
120
+
121
+ This is the fontist.org consumer integration for coverage
122
+ data — pairs with TODO 38 (glyph consumer).
123
+
124
+ ## Acceptance
125
+
126
+ - [ ] `ucode audit font <path> --reference-universal-set=...` emits
127
+ a `coverage` section with per-block + overall stats
128
+ - [ ] `ucode audit library <dir>` walks every font and produces a
129
+ sortable summary
130
+ - [ ] HTML face browser shows per-block coverage with click-through
131
+ to missing-glyph grids
132
+ - [ ] Library browser has a heatmap view
133
+ - [ ] At least 10 fonts audited end-to-end as a smoke test:
134
+ Lentariso, Kedebideri, NotoSerifTaiYo, FSung-1, Noto Sans
135
+ CJK JP, Noto Sans Symbols, Noto Sans Symbols 2, Symbola,
136
+ Noto Music, Last Resort Font
137
+
138
+ ## References
139
+
140
+ - [TODO 25](25-font-audit-against-universal-set.md) — CoverageReference
141
+ - [TODO 26](26-missing-glyph-reporter.md) — gap reporter
142
+ - [TODO 35](35-universal-set-production-run.md) — universal set (input)
143
+ - [TODO 37](37-coverage-highlight-reporter.md) — visualizer detail
144
+ - [TODO 38](38-fontist-org-glyph-consumer.md) — public dashboard
145
+ - [TODO 40](40-archive-private-uses-ucode-audit.md) — bin/build uses this audit in CI
@@ -0,0 +1,125 @@
1
+ # 37 — Coverage highlight reporter (missing-glyph visualizer)
2
+
3
+ ## Goal
4
+
5
+ A focused visualizer that takes ONE font + the universal set and
6
+ produces a per-block "missing glyph grid" — every codepoint the
7
+ font doesn't cover, rendered as the universal set's reference glyph
8
+ so the user can see at a glance what's missing.
9
+
10
+ Pairs with TODO 36 (the audit data layer). TODO 36 produces the
11
+ JSON-shaped gap lists; this TODO is the human-facing visualizer.
12
+
13
+ ## Why a separate TODO
14
+
15
+ TODO 26 built a missing-glyph reporter, but its output is a flat
16
+ list of codepoint ids. For a font like Noto Sans CJK JP missing
17
+ 200 codepoints across 5 CJK extensions, a flat list is useless —
18
+ you can't see the patterns. This TODO is the visual layer that
19
+ makes the data actionable.
20
+
21
+ The audience is font maintainers ("what's my font missing?") and
22
+ fontist.org ("which font should we use for this block?"). Both
23
+ need to see the actual glyphs, not hex strings.
24
+
25
+ ## Scope
26
+
27
+ ### Phase A — Per-font highlight page
28
+
29
+ 1. New command: `ucode audit highlight <font-path>` — produces
30
+ `output/audit/highlight/<font-slug>/index.html`.
31
+
32
+ 2. Page structure:
33
+ - Header: font name, version, license, overall coverage %
34
+ - Per-block sections, sorted by missing count (most-missing
35
+ first):
36
+ ```
37
+ Block: CJK Unified Ideographs Extension J (U+323B0..U+3347F)
38
+ Missing: 4123 of 4298 codepoints (95.9% missing)
39
+
40
+ [Grid of missing glyphs, each cell showing:
41
+ - The reference glyph SVG (from universal set)
42
+ - Codepoint id (U+324B0)
43
+ - Codepoint name (CJK UNIFIED IDEOGRAPH-324B0)
44
+ ]
45
+ ```
46
+
47
+ 3. Grid cell click → drill to per-codepoint page with full UCD
48
+ metadata (reuses the existing UnicodeCharPage shape).
49
+
50
+ ### Phase B — Comparison view
51
+
52
+ 4. New command: `ucode audit compare <left-font> <right-font>` —
53
+ side-by-side coverage diff:
54
+ - Left covers, right misses (red, right side)
55
+ - Left misses, right covers (red, left side)
56
+ - Both cover (no entry)
57
+ - Both miss (gray)
58
+
59
+ 5. Use case: "FSung-1 vs Noto Sans CJK JP for CJK Ext J — which
60
+ should we use as Tier 1?"
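The four comparison buckets above could be computed in one pass over the reference codepoints. This is a sketch; `compare_coverage` and the bucket symbols are hypothetical and not the command's actual internals.

```ruby
require "set"

# reference_cps: codepoints of the block(s) under comparison.
# left_cps / right_cps: codepoints covered by each font's cmap.
# Returns a hash keyed by bucket; buckets with no codepoints are absent.
def compare_coverage(reference_cps, left_cps, right_cps)
  left = left_cps.to_set
  right = right_cps.to_set
  reference_cps.group_by do |cp|
    case [left.include?(cp), right.include?(cp)]
    when [true, false] then :left_only
    when [false, true] then :right_only
    when [true, true] then :both_cover
    else :both_miss
    end
  end
end
```

For the Tier 1 decision in step 5, comparing the sizes of `:left_only` and `:right_only` for the block gives a first answer before looking at glyph quality.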
61
+
62
+ ### Phase C — Library heatmap
63
+
64
+ 6. Library-level heatmap page. Rows = fonts, columns = blocks,
65
+ cell color = coverage %.
66
+
67
+ 7. Filter controls:
68
+ - Show only blocks with assigned_count > N
69
+ - Show only fonts with overall coverage > X%
70
+ - Sort by family / version / coverage
71
+
72
+ 8. Cell click → drill to per-block per-font detail.
73
+
74
+ ### Phase D — Embed reference glyphs efficiently
75
+
76
+ 9. The highlight page embeds reference glyphs as inline SVG (not
77
+ `<img>` referencing SVG files — that's 200k HTTP requests for
78
+ a full CJK page). Inline SVG with `<symbol>` definitions:
79
+
80
+ ```html
81
+ <svg style="display:none">
82
+ <defs>
83
+ <symbol id="U+4E00" viewBox="0 0 1000 1000">
84
+ <path d="..."/>
85
+ </symbol>
86
+ <symbol id="U+4E8C" viewBox="0 0 1000 1000">
87
+ <path d="..."/>
88
+ </symbol>
89
+ </defs>
90
+ </svg>
91
+ <svg viewBox="0 0 1000 1000"><use href="#U+4E00"/></svg>
92
+ ```
93
+
94
+ 10. For large blocks (CJK 20k+ glyphs), partition the page by
95
+ block-range subsets so the browser doesn't choke. Pagination
96
+ or lazy-load via IntersectionObserver.
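The `<symbol>` sprite and the block-range partitioning described above might be generated along these lines. This is a sketch: the helper names and the `{ id => path data }` input shape are assumptions, and the 1000-unit viewBox is copied from the example rather than read from real font metrics.

```ruby
require "cgi"

# glyphs: { "U+4E00" => "M0 0L10 10Z", ... } mapping codepoint id to SVG path
# data (hypothetical shape). Emits one hidden <svg> holding every <symbol>.
def symbol_sprite(glyphs, view_box: "0 0 1000 1000")
  symbols = glyphs.map do |id, path_data|
    %(    <symbol id="#{CGI.escapeHTML(id)}" viewBox="#{view_box}">\n) +
      %(      <path d="#{CGI.escapeHTML(path_data)}"/>\n) +
      %(    </symbol>\n)
  end
  %(<svg style="display:none">\n  <defs>\n#{symbols.join}  </defs>\n</svg>\n)
end

# One grid cell that references a symbol by id.
def glyph_use(id, view_box: "0 0 1000 1000")
  %(<svg viewBox="#{view_box}"><use href="##{CGI.escapeHTML(id)}"/></svg>)
end

# Splits a large block into page-sized subsets, preserving order.
def paginate(glyphs, per_page)
  glyphs.each_slice(per_page).map(&:to_h)
end
```

Each page would carry its own sprite, so a 20k-glyph block never puts all of its outlines into a single document.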
97
+
98
+ ### Phase E — Diff mode for font versions
99
+
100
+ 11. `ucode audit diff <font-v1> <font-v2>` — for the same font
101
+ family across versions, surface:
102
+ - Codepoints added in v2 (good — coverage improved)
103
+ - Codepoints removed in v2 (regression — flag for review)
104
+
105
+ Useful for tracking Noto Sans releases across Unicode versions.
106
+
107
+ ## Acceptance
108
+
109
+ - [ ] `ucode audit highlight <font>` produces an HTML page with
110
+ per-block missing-glyph grids
111
+ - [ ] `ucode audit compare <left> <right>` produces a side-by-side
112
+ diff page
113
+ - [ ] Library heatmap renders with no perf issues for ≤50 fonts ×
114
+ ~340 blocks
115
+ - [ ] Reference glyphs inlined as `<symbol>` defs (no per-glyph
116
+ HTTP requests)
117
+ - [ ] CJK-scale block (20k+ glyphs) paginates or lazy-loads
118
+ - [ ] Cell clicks navigate to per-codepoint pages (existing
119
+ UnicodeCharPage)
120
+
121
+ ## References
122
+
123
+ - [TODO 26](26-missing-glyph-reporter.md) — flat-list predecessor
124
+ - [TODO 36](36-per-font-coverage-audit.md) — audit data layer
125
+ - `lib/ucode/audit/browser.rb` — existing HTML browser generator
@@ -0,0 +1,141 @@
1
+ # 38 — fontist.org glyph consumer + provenance display
2
+
3
+ ## Goal
4
+
5
+ Wire fontist.org's `UnicodeCharPage.vue` to consume the universal
6
+ glyph set (TODO 35) — replace the current system-font-fallback
7
+ glyph rendering with the actual SVG ucode extracted. Surface the
8
+ font source (provenance) next to the glyph so users see "this glyph
9
+ came from NotoSerifTaiYo, OFL".
10
+
11
+ This closes the loop: ucode extracts → fontist.org displays.
12
+
13
+ ## Why a separate TODO
14
+
15
+ Today the char page renders glyphs via `displayChar(cp, charData.c)`
16
+ — browser-side font resolution. That means:
17
+
18
+ - Tai Yo / Sidetic / Beria Erfe / Egyptian Hieroglyphs show tofu
19
+ on most systems (no system font has them)
20
+ - The "which font is this from?" question has no answer
21
+ - ucode's universal glyph set isn't consumed anywhere
22
+
23
+ With the universal set built (TODO 35), fontist.org can fetch
24
+ `/unicode/glyphs/U+XXXX.svg` and render the actual extracted outline.
25
+ Provenance comes from the per-codepoint JSON the universal set
26
+ already emits.
27
+
28
+ ## Scope
29
+
30
+ ### Phase A — Sync glyphs into fontist.org's public/
31
+
32
+ 1. Extend `scripts/fetch-data.sh` to copy `unicode/` from
33
+ `fontist-archive-public` (per TODO 41 — the artifacts land there
34
+ via ucode's publish workflow, NOT via direct raw.githubusercontent.com
35
+ fetch):
36
+
37
+ ```bash
38
+ log "copying unicode/block-feed/ + universal-glyph-set/"
39
+ mkdir -p "$PUBLIC/unicode"
40
+ if [[ -d "$TMP/archive/unicode/block-feed" ]]; then
41
+ cp -r "$TMP/archive/unicode/block-feed/." "$PUBLIC/unicode/"
42
+ fi
43
+ if [[ -d "$TMP/archive/unicode/universal-glyph-set" ]]; then
44
+ mkdir -p "$PUBLIC/unicode/glyphs"
45
+ cp -r "$TMP/archive/unicode/universal-glyph-set/glyphs/." "$PUBLIC/unicode/glyphs/"
46
+ cp "$TMP/archive/unicode/universal-glyph-set/manifest.json" "$PUBLIC/unicode/manifest.json"
47
+ fi
48
+ ```
49
+
50
+ 2. Scale check: 299,382 SVG files at ~1KB each = ~300MB. Too big for
51
+ the fontist.org repo but fine for the public archive. For LOCAL
52
+ dev: full copy is OK. For deployment: rsync to CDN target (not
53
+ committed to git).
54
+
55
+ 3. For per-codepoint JSONs (1.2 GB): add `--with-codepoints` flag
56
+ (default OFF). When ON, download + extract the Release asset
57
+ (per TODO 41 §Phase A.3) to `public/codepoints/`.
58
+
59
+ ### Phase B — Char page glyph rendering
60
+
61
+ 4. Update `UnicodeCharPage.vue`:
62
+ - Replace `displayChar(cp, charData.c)` with `<img :src="`/unicode/glyphs/U+${hex}.svg`">`
63
+ when the SVG exists; fall back to `displayChar` for missing
64
+ glyphs (unassigned codepoints not in universal set)
65
+ - Add a "Source" caption: `<small>Glyph from {{ source.label }} ({{ source.license }})</small>`
66
+
67
+ 5. Fetch the per-codepoint JSON (already wired in current PR #45)
68
+ to get `source.label`, `source.license`, `source.tier`. Show
69
+ tier as a badge: "Tier 1" / "Pillar 2" / "Last Resort".
70
+
71
+ ### Phase C — Per-block glyph grid
72
+
73
+ 6. On the block page (`UnicodeBlockPage.vue`), the existing char
74
+ grid currently uses `displayChar` for each cell. Replace with
75
+ inline SVG references via `<symbol>` defs (one def per glyph
76
+ on the page, cells `<use>` it). Pattern from TODO 37.
77
+
78
+ 7. For CJK-scale blocks (20k+ glyphs), lazy-load on scroll. The
79
+ existing block grid already paginates; just swap the rendering.
80
+
81
+ ### Phase D — Provenance badge component
82
+
83
+ 8. New Vue component `GlyphSourceBadge.vue`:
84
+ ```vue
85
+ <template>
86
+ <span class="gsb" :class="`gsb-${tier}`">
87
+ <span class="gsb-tier">{{ tierLabel }}</span>
88
+ <span class="gsb-label">{{ source.label }}</span>
89
+ <span class="gsb-license" v-if="source.license">{{ source.license }}</span>
90
+ </span>
91
+ </template>
92
+ ```
93
+
94
+ 9. Color coding:
95
+ - Tier 1 (real font): green
96
+ - Pillar 1 (PDF + ToUnicode): blue
97
+ - Pillar 2 (PDF correlation): yellow
98
+ - Pillar 3 (Last Resort): gray
99
+
100
+ ### Phase E — Per-block coverage indicator
101
+
102
+ 10. On the block list page (`/unicode`), each block entry shows
103
+ coverage stats from the universal set:
104
+ - "4123/4298 codepoints covered by Tier 1 (Noto Sans CJK JP)"
105
+ - Color-coded bar: green = full Tier 1, yellow = mixed, red =
106
+ pillar 3 (tofu)
107
+
108
+ 11. Click the bar → drill to the per-block highlight page (TODO 37).
109
+
110
+ ### Phase F — Glyph detail page
111
+
112
+ 12. New route `/unicode/glyph/:hex` — dedicated glyph detail:
113
+ - Large SVG render with zoom/pan
114
+ - Full outline path data (collapsible `<pre>`)
115
+ - Provenance chain (font → cmap → GID → glyf outline → SVG)
116
+ - Comparison: this glyph vs other Tier 1 fonts covering same cp
117
+
118
+ Useful for font designers checking extraction quality.
119
+
120
+ ## Acceptance
121
+
122
+ - [ ] `scripts/fetch-data.sh` copies `unicode/` from fontist-archive-public
123
+ (block-feed + universal-glyph-set; per-cp JSONs via optional flag)
124
+ - [ ] `UnicodeCharPage.vue` renders the universal-set SVG (not
125
+ system fallback) for codepoints in the universal set
126
+ - [ ] Provenance badge shows next to every glyph
127
+ - [ ] Block grid renders glyphs via inline SVG `<symbol>` defs
128
+ (no per-glyph HTTP requests)
129
+ - [ ] Block list page shows per-block Tier 1 coverage %
130
+ - [ ] `/unicode/glyph/:hex` route exists with full provenance view
131
+ - [ ] Tai Yo / Sidetic / Egyptian Hieroglyphs render real glyphs
132
+ (no tofu) when sourced from universal set
133
+
134
+ ## References
135
+
136
+ - [TODO 27](27-fontist-org-consumer-integration.md) — original consumer TODO
137
+ - [TODO 35](35-universal-set-production-run.md) — universal set (input)
138
+ - [TODO 37](37-coverage-highlight-reporter.md) — visualizer patterns
139
+ - [TODO 41](41-ucode-unicode-archive-bridge.md) — publishing pipeline
140
+ - `src/pages/UnicodeCharPage.vue` — current char page
141
+ - `src/pages/UnicodeBlockPage.vue` — current block page