ucode 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (228)
  1. checksums.yaml +7 -0
  2. data/CLAUDE.md +211 -0
  3. data/Gemfile +22 -0
  4. data/Gemfile.lock +406 -0
  5. data/README.md +469 -0
  6. data/Rakefile +18 -0
  7. data/TODO.new/00-README.md +66 -0
  8. data/TODO.new/01-pillar-terminology-alignment.md +69 -0
  9. data/TODO.new/02-audit-schema-design.md +255 -0
  10. data/TODO.new/03-directory-output-spec.md +203 -0
  11. data/TODO.new/04-fontist-org-contract.md +173 -0
  12. data/TODO.new/05-baseline-unicode17-coverage-audit.md +144 -0
  13. data/TODO.new/06-audit-namespace-skeleton.md +105 -0
  14. data/TODO.new/07-audit-models-port.md +132 -0
  15. data/TODO.new/08-extractors-cheap-port.md +113 -0
  16. data/TODO.new/09-extractors-expensive-port.md +99 -0
  17. data/TODO.new/10-aggregations-ucd-rewrite.md +168 -0
  18. data/TODO.new/11-differ-and-library-auditor-port.md +102 -0
  19. data/TODO.new/12-formatters-port.md +115 -0
  20. data/TODO.new/13-directory-emitter.md +147 -0
  21. data/TODO.new/14-html-face-browser.md +144 -0
  22. data/TODO.new/15-html-library-browser.md +102 -0
  23. data/TODO.new/16-cli-audit-subcommands.md +142 -0
  24. data/TODO.new/17-fontisan-cleanup-audit.md +147 -0
  25. data/TODO.new/18-fontisan-cleanup-ucd.md +156 -0
  26. data/TODO.new/19-fontisan-docs-update.md +155 -0
  27. data/TODO.new/20-canonical-resolver-4-tier.md +182 -0
  28. data/TODO.new/21-canonical-unicode17-build.md +148 -0
  29. data/TODO.new/22-implementation-order.md +176 -0
  30. data/UCODE_CHANGELOG.md +97 -0
  31. data/exe/ucode +8 -0
  32. data/lib/ucode/aggregator.rb +77 -0
  33. data/lib/ucode/audit/block_aggregator.rb +90 -0
  34. data/lib/ucode/audit/codepoint_range_coalescer.rb +42 -0
  35. data/lib/ucode/audit/context.rb +137 -0
  36. data/lib/ucode/audit/discrepancy_detector.rb +213 -0
  37. data/lib/ucode/audit/extractors/aggregations.rb +70 -0
  38. data/lib/ucode/audit/extractors/base.rb +21 -0
  39. data/lib/ucode/audit/extractors/color_capabilities.rb +143 -0
  40. data/lib/ucode/audit/extractors/coverage.rb +55 -0
  41. data/lib/ucode/audit/extractors/hinting.rb +199 -0
  42. data/lib/ucode/audit/extractors/identity.rb +65 -0
  43. data/lib/ucode/audit/extractors/licensing.rb +75 -0
  44. data/lib/ucode/audit/extractors/metrics.rb +108 -0
  45. data/lib/ucode/audit/extractors/opentype_layout.rb +71 -0
  46. data/lib/ucode/audit/extractors/provenance.rb +34 -0
  47. data/lib/ucode/audit/extractors/style.rb +88 -0
  48. data/lib/ucode/audit/extractors/variation_detail.rb +101 -0
  49. data/lib/ucode/audit/extractors.rb +31 -0
  50. data/lib/ucode/audit/plane_aggregator.rb +37 -0
  51. data/lib/ucode/audit/registry.rb +63 -0
  52. data/lib/ucode/audit/script_aggregator.rb +92 -0
  53. data/lib/ucode/audit.rb +27 -0
  54. data/lib/ucode/cache.rb +113 -0
  55. data/lib/ucode/cli.rb +272 -0
  56. data/lib/ucode/commands/build.rb +68 -0
  57. data/lib/ucode/commands/cache.rb +46 -0
  58. data/lib/ucode/commands/fetch.rb +62 -0
  59. data/lib/ucode/commands/font_coverage.rb +57 -0
  60. data/lib/ucode/commands/glyphs.rb +136 -0
  61. data/lib/ucode/commands/lookup.rb +65 -0
  62. data/lib/ucode/commands/parse.rb +62 -0
  63. data/lib/ucode/commands/site.rb +33 -0
  64. data/lib/ucode/commands.rb +19 -0
  65. data/lib/ucode/config.rb +110 -0
  66. data/lib/ucode/coordinator/indices.rb +34 -0
  67. data/lib/ucode/coordinator.rb +397 -0
  68. data/lib/ucode/database.rb +214 -0
  69. data/lib/ucode/db_builder.rb +107 -0
  70. data/lib/ucode/error.rb +96 -0
  71. data/lib/ucode/fetch/code_charts.rb +57 -0
  72. data/lib/ucode/fetch/http.rb +83 -0
  73. data/lib/ucode/fetch/ucd_zip.rb +57 -0
  74. data/lib/ucode/fetch/unihan_zip.rb +57 -0
  75. data/lib/ucode/fetch.rb +14 -0
  76. data/lib/ucode/glyphs/cell_extractor.rb +130 -0
  77. data/lib/ucode/glyphs/dvisvgm_renderer.rb +29 -0
  78. data/lib/ucode/glyphs/embedded_fonts/catalog.rb +372 -0
  79. data/lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb +228 -0
  80. data/lib/ucode/glyphs/embedded_fonts/font_entry.rb +126 -0
  81. data/lib/ucode/glyphs/embedded_fonts/renderer.rb +47 -0
  82. data/lib/ucode/glyphs/embedded_fonts/source.rb +94 -0
  83. data/lib/ucode/glyphs/embedded_fonts/svg.rb +123 -0
  84. data/lib/ucode/glyphs/embedded_fonts/tounicode.rb +103 -0
  85. data/lib/ucode/glyphs/embedded_fonts/writer.rb +76 -0
  86. data/lib/ucode/glyphs/embedded_fonts.rb +50 -0
  87. data/lib/ucode/glyphs/grid.rb +30 -0
  88. data/lib/ucode/glyphs/grid_detector.rb +165 -0
  89. data/lib/ucode/glyphs/last_resort/cmap_index.rb +96 -0
  90. data/lib/ucode/glyphs/last_resort/contents.rb +74 -0
  91. data/lib/ucode/glyphs/last_resort/glif.rb +124 -0
  92. data/lib/ucode/glyphs/last_resort/renderer.rb +67 -0
  93. data/lib/ucode/glyphs/last_resort/source.rb +125 -0
  94. data/lib/ucode/glyphs/last_resort/svg.rb +247 -0
  95. data/lib/ucode/glyphs/last_resort/writer.rb +83 -0
  96. data/lib/ucode/glyphs/last_resort.rb +36 -0
  97. data/lib/ucode/glyphs/monolith_page_map.rb +181 -0
  98. data/lib/ucode/glyphs/mutool_renderer.rb +28 -0
  99. data/lib/ucode/glyphs/page_renderer.rb +221 -0
  100. data/lib/ucode/glyphs/path_bbox.rb +62 -0
  101. data/lib/ucode/glyphs/pdf2svg_renderer.rb +26 -0
  102. data/lib/ucode/glyphs/pdf_fetcher.rb +102 -0
  103. data/lib/ucode/glyphs/pdftocairo_renderer.rb +32 -0
  104. data/lib/ucode/glyphs/real_fonts/block_coverage.rb +45 -0
  105. data/lib/ucode/glyphs/real_fonts/coverage_auditor.rb +117 -0
  106. data/lib/ucode/glyphs/real_fonts/font_coverage_report.rb +45 -0
  107. data/lib/ucode/glyphs/real_fonts/font_locator.rb +95 -0
  108. data/lib/ucode/glyphs/real_fonts/unicode_17_blocks.rb +104 -0
  109. data/lib/ucode/glyphs/real_fonts/writer.rb +50 -0
  110. data/lib/ucode/glyphs/real_fonts.rb +32 -0
  111. data/lib/ucode/glyphs/writer.rb +250 -0
  112. data/lib/ucode/glyphs.rb +27 -0
  113. data/lib/ucode/index.rb +106 -0
  114. data/lib/ucode/index_builder.rb +94 -0
  115. data/lib/ucode/models/audit/audit_axis.rb +30 -0
  116. data/lib/ucode/models/audit/audit_diff.rb +77 -0
  117. data/lib/ucode/models/audit/audit_report.rb +137 -0
  118. data/lib/ucode/models/audit/baseline.rb +32 -0
  119. data/lib/ucode/models/audit/block_summary.rb +72 -0
  120. data/lib/ucode/models/audit/codepoint_detail.rb +45 -0
  121. data/lib/ucode/models/audit/codepoint_range.rb +39 -0
  122. data/lib/ucode/models/audit/codepoint_set_diff.rb +34 -0
  123. data/lib/ucode/models/audit/color_capabilities.rb +91 -0
  124. data/lib/ucode/models/audit/discrepancy.rb +38 -0
  125. data/lib/ucode/models/audit/duplicate_group.rb +23 -0
  126. data/lib/ucode/models/audit/embedding_type.rb +81 -0
  127. data/lib/ucode/models/audit/field_change.rb +28 -0
  128. data/lib/ucode/models/audit/fs_selection_flags.rb +65 -0
  129. data/lib/ucode/models/audit/gasp_range.rb +63 -0
  130. data/lib/ucode/models/audit/hinting.rb +99 -0
  131. data/lib/ucode/models/audit/library_summary.rb +40 -0
  132. data/lib/ucode/models/audit/licensing.rb +48 -0
  133. data/lib/ucode/models/audit/metrics.rb +111 -0
  134. data/lib/ucode/models/audit/named_instance.rb +41 -0
  135. data/lib/ucode/models/audit/opentype_layout.rb +38 -0
  136. data/lib/ucode/models/audit/plane_summary.rb +31 -0
  137. data/lib/ucode/models/audit/script_coverage_row.rb +26 -0
  138. data/lib/ucode/models/audit/script_features.rb +28 -0
  139. data/lib/ucode/models/audit/script_summary.rb +54 -0
  140. data/lib/ucode/models/audit/variation_detail.rb +42 -0
  141. data/lib/ucode/models/audit.rb +50 -0
  142. data/lib/ucode/models/bidi_bracket_pair.rb +20 -0
  143. data/lib/ucode/models/bidi_mirroring.rb +19 -0
  144. data/lib/ucode/models/binary_property_assignment.rb +26 -0
  145. data/lib/ucode/models/block.rb +36 -0
  146. data/lib/ucode/models/case_folding_rule.rb +23 -0
  147. data/lib/ucode/models/cjk_radical.rb +23 -0
  148. data/lib/ucode/models/codepoint/bidi.rb +28 -0
  149. data/lib/ucode/models/codepoint/break_segmentation.rb +22 -0
  150. data/lib/ucode/models/codepoint/case_folding.rb +25 -0
  151. data/lib/ucode/models/codepoint/casing.rb +32 -0
  152. data/lib/ucode/models/codepoint/decomposition.rb +27 -0
  153. data/lib/ucode/models/codepoint/display.rb +24 -0
  154. data/lib/ucode/models/codepoint/emoji.rb +29 -0
  155. data/lib/ucode/models/codepoint/hangul.rb +20 -0
  156. data/lib/ucode/models/codepoint/identifier.rb +30 -0
  157. data/lib/ucode/models/codepoint/indic.rb +20 -0
  158. data/lib/ucode/models/codepoint/joining.rb +20 -0
  159. data/lib/ucode/models/codepoint/normalization.rb +35 -0
  160. data/lib/ucode/models/codepoint/numeric_value.rb +35 -0
  161. data/lib/ucode/models/codepoint.rb +122 -0
  162. data/lib/ucode/models/name_alias.rb +21 -0
  163. data/lib/ucode/models/named_sequence.rb +19 -0
  164. data/lib/ucode/models/names_list_entry.rb +38 -0
  165. data/lib/ucode/models/plane.rb +36 -0
  166. data/lib/ucode/models/property_alias.rb +24 -0
  167. data/lib/ucode/models/property_value_alias.rb +26 -0
  168. data/lib/ucode/models/relationship/compat_equiv.rb +18 -0
  169. data/lib/ucode/models/relationship/cross_reference.rb +17 -0
  170. data/lib/ucode/models/relationship/footnote.rb +24 -0
  171. data/lib/ucode/models/relationship/informal_alias.rb +18 -0
  172. data/lib/ucode/models/relationship/sample_sequence.rb +24 -0
  173. data/lib/ucode/models/relationship/variation_sequence.rb +19 -0
  174. data/lib/ucode/models/relationship.rb +57 -0
  175. data/lib/ucode/models/script.rb +41 -0
  176. data/lib/ucode/models/special_casing_rule.rb +28 -0
  177. data/lib/ucode/models/standardized_variant.rb +24 -0
  178. data/lib/ucode/models/unihan_entry.rb +23 -0
  179. data/lib/ucode/models.rb +47 -0
  180. data/lib/ucode/parsers/auxiliary.rb +26 -0
  181. data/lib/ucode/parsers/base.rb +137 -0
  182. data/lib/ucode/parsers/bidi_brackets.rb +41 -0
  183. data/lib/ucode/parsers/bidi_mirroring.rb +37 -0
  184. data/lib/ucode/parsers/blocks.rb +63 -0
  185. data/lib/ucode/parsers/case_folding.rb +53 -0
  186. data/lib/ucode/parsers/cjk_radicals.rb +102 -0
  187. data/lib/ucode/parsers/derived_age.rb +59 -0
  188. data/lib/ucode/parsers/derived_core_properties.rb +60 -0
  189. data/lib/ucode/parsers/extracted_properties.rb +74 -0
  190. data/lib/ucode/parsers/name_aliases.rb +44 -0
  191. data/lib/ucode/parsers/named_sequences.rb +51 -0
  192. data/lib/ucode/parsers/names_list.rb +250 -0
  193. data/lib/ucode/parsers/property_aliases.rb +41 -0
  194. data/lib/ucode/parsers/property_value_aliases.rb +46 -0
  195. data/lib/ucode/parsers/script_extensions.rb +64 -0
  196. data/lib/ucode/parsers/scripts.rb +60 -0
  197. data/lib/ucode/parsers/special_casing.rb +62 -0
  198. data/lib/ucode/parsers/standardized_variants.rb +56 -0
  199. data/lib/ucode/parsers/unicode_data/hangul_name.rb +73 -0
  200. data/lib/ucode/parsers/unicode_data.rb +268 -0
  201. data/lib/ucode/parsers/unihan.rb +125 -0
  202. data/lib/ucode/parsers.rb +35 -0
  203. data/lib/ucode/range_entry.rb +58 -0
  204. data/lib/ucode/repo/aggregate_writer.rb +364 -0
  205. data/lib/ucode/repo/atomic_writes.rb +48 -0
  206. data/lib/ucode/repo/codepoint_writer.rb +96 -0
  207. data/lib/ucode/repo/paths.rb +122 -0
  208. data/lib/ucode/repo.rb +22 -0
  209. data/lib/ucode/site/config_emitter.rb +124 -0
  210. data/lib/ucode/site/generator.rb +178 -0
  211. data/lib/ucode/site/search_index.rb +68 -0
  212. data/lib/ucode/site/template/.gitignore +4 -0
  213. data/lib/ucode/site/template/.vitepress/config.ts +8 -0
  214. data/lib/ucode/site/template/.vitepress/theme/index.js +20 -0
  215. data/lib/ucode/site/template/char/[codepoint].md +13 -0
  216. data/lib/ucode/site/template/components/BlockView.vue +57 -0
  217. data/lib/ucode/site/template/components/CharView.vue +85 -0
  218. data/lib/ucode/site/template/components/PlaneView.vue +56 -0
  219. data/lib/ucode/site/template/components/SearchView.vue +66 -0
  220. data/lib/ucode/site/template/index.md +25 -0
  221. data/lib/ucode/site/template/package.json +18 -0
  222. data/lib/ucode/site/template/search.md +9 -0
  223. data/lib/ucode/site.rb +13 -0
  224. data/lib/ucode/version.rb +5 -0
  225. data/lib/ucode/version_resolver.rb +76 -0
  226. data/lib/ucode.rb +74 -0
  227. data/ucode.gemspec +56 -0
  228. metadata +404 -0
@@ -0,0 +1,144 @@
1
+ # 05 — Unicode 17 baseline coverage audit
2
+
3
+ ## Goal
4
+
5
+ Capture the actual per-tier coverage numbers for every Unicode 17
6
+ addition (11 new blocks + additions to ~12 existing blocks) before
7
+ locking the migration scope. The numbers ground every later decision:
8
+ which blocks need pillar 2 work, which need real-font mining, which
9
+ are already complete via tier 1.
10
+
11
+ The publisher-confirmed table below is the starting point (carried from
12
+ prior sessions). The deliverable is cmap-verified numbers, not
13
+ publisher claims.
14
+
15
+ ## Why measure first
16
+
17
+ Two open questions cannot be answered without running the audit:
18
+
19
+ 1. **Pillar coverage breakdown**: for each Unicode 17 block, how many
20
+ codepoints does Tier 1 cover? Pillar 1? Pillar 2? Pillar 3? The
21
+ migration's pillar-2 generalization work (already merged for Tai Yo)
22
+ needs to know which other blocks need it.
23
+ 2. **Real-font availability**: which Unicode 17 additions have a
24
+ public Tier 1 font vs which require pillar 2 (Sidetic, Beria Erfe)?
25
+ The canonical resolver config (TODO 20) needs this mapping.
26
+
27
+ ## Scope — Unicode 17 additions
28
+
29
+ ### 11 new blocks
30
+
31
+ | Block | Range | Assigned | Tier 1 source (publisher-confirmed) | Confidence |
32
+ |---|---|---:|---|---|
33
+ | Sidetic | U+10940–1095F | 26 | Lentariso ≥1.029 + Noto Sans Sidetic | HIGH |
34
+ | Sharada Supplement | U+11B60–11B7F | 8 | Noto Sans Sharada | HIGH |
35
+ | Tolong Siki | U+11DB0–11DEF | 54 | Noto Sans Tolong Siki | HIGH |
36
+ | Beria Erfe | U+16EA0–16EDF | 50 | Kedebideri 3.001 (SIL) | HIGH |
37
+ | Tai Yo | U+1E6C0–1E6F3 | 55 | Noto Sans Tai Yo + pillar 2 (proven) | HIGH |
38
+ | Symbols for Legacy Computing Supplement | U+1CC00–1CCFF | 9 | BabelStone Pseudographica | MEDIUM |
39
+ | Supplemental Arrows-C | U+1CF00–1CFCF | 9 | Symbola | MEDIUM |
40
+ | Alchemical Symbols (ext) | U+1F740–1F77F | 4 | Noto Sans Symbols + Symbola | HIGH |
41
+ | Miscellaneous Symbols Supplement | U+1FA70–1FAFF | 34 | Noto Sans Symbols 2 | HIGH |
42
+ | Musical Symbols Supplement (additions) | U+1D200–U+1D2FF | TBD | Noto Music | HIGH |
43
+ | CJK Extension J | U+323B0–U+3347F | 4298 | FSung + Noto Sans/Serif CJK | HIGH |
44
+
45
+ ### Additions to existing blocks
46
+
47
+ - Tangut (+8), Tangut Supplement (+22), Tangut Components Supp. (+115) → Noto Sans Tangut + grave-app.
48
+ - Adlam (+29) → Noto Sans Adlam.
49
+ - Arabic Extended-B/C → Noto Naskh Arabic.
50
+ - Telugu (+1), Kannada (+1) → existing Noto.
51
+ - Combining Diacritical Marks Extended (+27) → likely Pillar 2 (font support spotty).
52
+ - CJK Extension C/E additions → FSung.
53
+ - Chess Symbols (+4), Transport (+1), Symbols & Pictographs Ext-A (+6) → Noto Symbols 2.
54
+ - Egyptian Hieroglyphs (+28), Egyptian Hieroglyphs Format Controls (all), Egyptian Hieroglyphs Extended-A (+9), Egyptian Hieroglyphs Extended-B (~600 new) → UniHieroglyphica + Egyptian Text.
55
+
56
+ ## Procedure
57
+
58
+ For each block:
59
+
60
+ 1. **Acquire Tier 1 font** (if publisher-confirmed):
61
+ - Lentariso: github.com/Bry10022/Lentariso (SFD source, OFL).
62
+ - Kedebideri 3.001: software.sil.org/kedebideri/.
63
+ - Noto family: notofonts.github.io or Google Fonts.
64
+ - FSung: `~/Downloads/全宋體/FSung-*.ttf` (already local).
65
+ - BabelStone Pseudographica, Symbola: BabelStone site.
66
+ - UniHieroglyphica: suignard.com (OFL); Egyptian Text: microsoft/font-tools.
67
+ - NotoSerifTaiYo: translationcommons.org.
68
+
69
+ 2. **Run Tier 1 cmap audit** via the existing `ucode font-coverage` CLI
70
+ (renamed to `ucode audit font` in TODO 16; both names valid until
71
+ then):
72
+ ```
73
+ ucode font-coverage <font-path> --label <font-label> \
74
+ --unicode-version 17.0
75
+ ```
76
+ Output: `output/font_coverage/<label>.json` (becomes
77
+ `output/font_audit/<label>/index.json` post-migration).
78
+
79
+ 3. **Capture pillar 1-2 stats** by running `ucode glyphs` against each
80
+ per-block PDF:
81
+ ```
82
+ ucode glyphs --block <block-first-cp> --version 17.0
83
+ ```
84
+ The `Catalog` reports `#codepoints`, `#size`, `#font_count` — log
85
+ these per block.
86
+
87
+ 4. **Cross-check**: Tier 1 cmap count + pillar 1-2 chart count should
88
+ equal `total_assigned` for that block. Discrepancies flag where
89
+ pillar 2 needs to generalize (e.g. fonts without ToUnicode) or
90
+ where Tier 1 fonts are missing codepoints the chart shows.
91
+
92
+ 5. **Sidetic + Beria Erfe specifically** — re-audit with the now-merged
93
+ `ContentStreamCorrelator` (commit `24e6bfd`). The Tai Yo proof
94
+ should generalize; verify the bucket sizes match (Sidetic/Beria
95
+ Erfe may have tighter grid than Tai Yo).
96
+
97
+ ## Deliverable
98
+
99
+ A single markdown report at `docs/unicode17-coverage-baseline.md`
100
+ containing:
101
+
102
+ - One table per Unicode 17 block with: assigned count, Tier 1 covered,
103
+ Pillar 1 covered, Pillar 2 covered, Pillar 3 covered, gap, notes.
104
+ - A summary table aggregating across all blocks.
105
+ - Identified pillar 2 generalization needs (fonts the correlator must
106
+ handle).
107
+ - Identified Tier 1 font gaps (codepoints the publisher-confirmed font
108
+ doesn't actually cover).
109
+
110
+ This report becomes the input to TODO 20 (canonical resolver config:
111
+ block → preferred Tier 1 font) and TODO 21 (Unicode 17 dataset build
112
+ verification).
113
+
114
+ ## Acceptance
115
+
116
+ - All 11 new blocks have cmap-verified Tier 1 numbers (not just
117
+ publisher claims).
118
+ - All Unicode 17 additions to existing blocks have at least publisher
119
+ confirmation; cmap-verified where a font is locally available.
120
+ - `docs/unicode17-coverage-baseline.md` exists with the above tables.
121
+ - Sidetic and Beria Erfe show 26/26 and 50/50 respectively via the
122
+ merged pillar 2 path (validates the correlator generalization).
123
+ - CJK Extension J: FSung covers the 4,298 assigned codepoints (or
124
+ documents which subset is missing).
125
+
126
+ ## Out of scope
127
+
128
+ - Migrating any code (that's TODOs 06-19).
129
+ - The canonical 4-tier resolver (TODO 20) — this audit informs it but
130
+ doesn't build it.
131
+ - HTML browser (TODO 14-15) — the audit outputs JSON only.
132
+
133
+ ## References
134
+
135
+ - Mode 1 vs Mode 2: `docs/architecture.md` §"Two output modes"
136
+ - Tier 1 implementation: `lib/ucode/glyphs/real_fonts/`
137
+ - Pillar 1 implementation: `lib/ucode/glyphs/embedded_fonts/catalog.rb`
138
+ - Pillar 2 implementation:
139
+ `lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb`
140
+ - Proven Tai Yo correlator: `/tmp/correlate_v4.rb` (carried forward as
141
+ the spec fixture basis)
142
+ - PR #1 description: Tier-1 + Pillar-1 + Pillar-2 already validated
143
+ for Sidetic (26/26 via Lentariso), Beria Erfe (50/50 via
144
+ Kedebideri), Tai Yo (54/54 via pillar 2)
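The cross-check in step 4 of the procedure above can be sketched as a set comparison. This is a minimal sketch under one reading of that step, namely that each source (Tier 1 cmap, chart extraction) is compared against the block's assigned set on its own. The names `cross_check` and `format_codepoint` are hypothetical and not part of ucode, and the codepoint data is illustrative, not measured.

```ruby
require "set"

# Hypothetical helper (not part of ucode): compares two coverage sources
# against the assigned codepoints of a single block.
def cross_check(assigned:, tier1:, chart:)
  assigned_set = assigned.to_set
  tier1_set = tier1.to_set
  chart_set = chart.to_set

  {
    # Assigned codepoints the Tier 1 font's cmap does not map.
    tier1_missing: (assigned_set - tier1_set).sort,
    # Assigned codepoints the chart extraction did not recover.
    chart_missing: (assigned_set - chart_set).sort,
    # Codepoints reported by either source that are not assigned in the block.
    unexpected: ((tier1_set | chart_set) - assigned_set).sort,
  }
end

# Formats an integer codepoint in the usual "U+XXXX" notation.
def format_codepoint(codepoint)
  format("U+%04X", codepoint)
end

# Illustrative data only: a 26-codepoint run and a font missing the last one.
assigned = (0x10940..0x10959).to_a
tier1 = assigned - [0x10959]
chart = assigned

report = cross_check(assigned: assigned, tier1: tier1, chart: chart)
puts report[:tier1_missing].map { |cp| format_codepoint(cp) }.join(", ")
```

Non-empty `tier1_missing` or `chart_missing` lists are the discrepancies the step asks to flag; an `unexpected` entry would suggest the assigned set or the font label is wrong.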
@@ -0,0 +1,105 @@
1
+ # 06 — Audit namespace skeleton
2
+
3
+ ## Goal
4
+
5
+ Stand up the `Ucode::Audit` namespace hub, the `Registry`, and the
6
+ `Context`. No extractors, no models, no CLI yet — just the empty
7
+ orchestrator scaffolding that subsequent TODOs (07-12) populate.
8
+
9
+ This is the foundation; everything else in the migration slots into it.
10
+
11
+ ## Files to create
12
+
13
+ - `lib/ucode/audit.rb` — namespace hub. Declares the autoloads (Ruby
14
+ autoload — see project memory `feedback_require_relative.md`).
15
+ - `lib/ucode/audit/registry.rb` — ordered list of extractor classes,
16
+ iterated by `AuditCommand` for every face.
17
+ - `lib/ucode/audit/context.rb` — value object carrying everything an
18
+ extractor needs to do its job (font handle, codepoint set, UCD
19
+ baseline, options).
20
+ - `lib/ucode/audit/extractors.rb` — extractors namespace hub (empty;
21
+ filled by TODO 08 and 09).
22
+ - `spec/ucode/audit/registry_spec.rb` — empty registry iterates zero
23
+ extractors without error.
24
+ - `spec/ucode/audit/context_spec.rb` — context memoizes codepoints,
25
+ baseline, source_format.
26
+
27
+ ## Port from fontisan
28
+
29
+ Direct port of:
30
+ - `fontisan/lib/fontisan/audit.rb` (namespace hub pattern)
31
+ - `fontisan/lib/fontisan/audit/registry.rb` (Registry module)
32
+ - `fontisan/lib/fontisan/audit/context.rb` (Context class)
33
+ - `fontisan/lib/fontisan/audit/extractors.rb` (extractors namespace)
34
+
35
+ with namespace changes (`Fontisan::` → `Ucode::`).
36
+
37
+ ## Context adjustments vs fontisan
38
+
39
+ The fontisan `Context` carries:
40
+
41
+ - `font`, `font_path`, `font_index`, `num_fonts_in_source`, `options`
42
+ - `codepoints` (memoized cmap keys)
43
+ - `ucd` (memoized UCD database + version + warning)
44
+ - `cldr` (memoized CLDR index — **drop**, see below)
45
+ - `source_format`
46
+
47
+ ucode's `Context` drops:
48
+
49
+ - `cldr` and the entire `resolve_cldr` path. CLDR is out of scope
50
+ (decision in `TODO.new/00-README.md`).
51
+ - `Ucd::VersionResolver` calls → replace with `Ucode::VersionResolver`
52
+ (ucode's own; see `lib/ucode/version_resolver.rb`).
53
+ - `Ucd::Database.open` / `Ucd::CacheManager` calls → replace with
54
+ `Ucode::Database.open` and `Ucode::Cache` (ucode's own; see
55
+ `lib/ucode/database.rb` and `lib/ucode/cache.rb`).
56
+ - `Ucd::Downloader` calls → replace with `Ucode::Fetch::UcdZip`.
57
+
58
+ ucode's `Context` adds:
59
+
60
+ - `baseline` — pre-resolved baseline struct (the assigned-codepoint
61
+ set for the target Unicode version). Extractors read from this
62
+ rather than re-resolving.
63
+ - `renderer` — optional glyph renderer for `--with-glyphs` mode. Set
64
+ only when the option is on; nil otherwise. Avoids loading fontisan's
65
+ outline reader unless needed.
66
+
67
+ ## Registry adjustments
68
+
69
+ The fontisan registry has two extractor lists:
70
+
71
+ - `ORDERED_EXTRACTORS` — 12 extractors (full audit).
72
+ - `BRIEF_EXTRACTORS` — 5 extractors (cheap pass).
73
+
74
+ ucode's registry starts empty (no extractors ported yet). TODOs 08 and
75
+ 09 add them in order. The brief/full mode switch ports across unchanged.
76
+
77
+ Drop the `Extractors::LanguageCoverage` entry from both lists — CLDR
78
+ out of scope.
79
+
80
+ ## Acceptance
81
+
82
+ - `Ucode::Audit` constant exists; `Ucode::Audit::Registry` and
83
+ `Ucode::Audit::Context` are referable.
84
+ - `Ucode::Audit::Registry.each(mode: :full) { |e| }` iterates zero
85
+ extractors without error (empty list).
86
+ - `Ucode::Audit::Registry.each(mode: :brief) { |e| }` same.
87
+ - `Ucode::Audit::Context.new(font: ..., ...)` constructs and memoizes
88
+ `codepoints` on first call.
89
+ - `Context#baseline` returns a real `Ucode::Database`-backed struct
90
+ (or raises a clear error if the version is uncached).
91
+ - No `cldr` method exists on `Context` (verified by spec).
92
+ - All specs use real model instances; no `double()`.
93
+ - Rubocop clean.
94
+
95
+ ## References
96
+
97
+ - Source: `fontisan/lib/fontisan/audit.rb`, `audit/registry.rb`,
98
+ `audit/context.rb`, `audit/extractors.rb`
99
+ - ucode UCD infra: `lib/ucode/database.rb`, `lib/ucode/cache.rb`,
100
+ `lib/ucode/version_resolver.rb`, `lib/ucode/fetch/`
101
+ - Project memory: `feedback_require_relative.md` (autoload rule),
102
+ `feedback_use_fontist_only.md`
103
+ - Follow-ups: `TODO.new/07-audit-models-port.md`,
104
+ `TODO.new/08-extractors-cheap-port.md`,
105
+ `TODO.new/09-extractors-expensive-port.md`
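The acceptance criteria for TODO 06 above (an empty registry that iterates in both modes, and a context that memoizes `codepoints`) can be illustrated with a small stand-alone sketch. It is not the fontisan or ucode implementation: the `AuditSketch` module, the `extractors_for` helper, and the assumption that the font handle responds to `codepoints` are all hypothetical. The `CountingFont` struct follows the TODO's rule of using real objects or `Struct.new` rather than `double()`.

```ruby
module AuditSketch
  # Ordered extractor lists; both start empty, as in the skeleton TODO.
  module Registry
    ORDERED_EXTRACTORS = [].freeze
    BRIEF_EXTRACTORS = [].freeze

    # Picks the list for a mode and rejects unknown modes early.
    def self.extractors_for(mode)
      case mode
      when :full then ORDERED_EXTRACTORS
      when :brief then BRIEF_EXTRACTORS
      else raise ArgumentError, "unknown mode: #{mode.inspect}"
      end
    end

    def self.each(mode: :full, &block)
      extractors_for(mode).each(&block)
    end
  end

  # Value object handed to every extractor.
  class Context
    attr_reader :font, :options

    def initialize(font:, options: {})
      @font = font
      @options = options
    end

    # Memoized: the font handle is asked for its codepoints only once.
    # Assumes the handle responds to #codepoints (hypothetical interface).
    def codepoints
      @codepoints ||= font.codepoints.sort.freeze
    end
  end
end

# Stand-in font handle that counts how often it is queried.
CountingFont = Struct.new(:calls) do
  def codepoints
    self.calls += 1
    [0x42, 0x41]
  end
end

font = CountingFont.new(0)
context = AuditSketch::Context.new(font: font)
2.times { context.codepoints }
puts font.calls
```

Keeping both lists as frozen constants means later TODOs extend the registry by editing the list, not by mutating it at runtime.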
@@ -0,0 +1,132 @@
1
+ # 07 — Models::Audit port
2
+
3
+ ## Goal
4
+
5
+ Port the `Fontisan::Models::Audit::*` lutaml-model classes (15 files)
6
+ to `Ucode::Models::Audit::*` with the schema adjustments from
7
+ `TODO.new/02-audit-schema-design.md`. Pure data classes — no font
8
+ parsing logic.
9
+
10
+ ## Files to create
11
+
12
+ One class per file, plus the namespace hub:
13
+
14
+ ```
15
+ lib/ucode/models/audit.rb
16
+ lib/ucode/models/audit/audit_report.rb
17
+ lib/ucode/models/audit/baseline.rb # NEW
18
+ lib/ucode/models/audit/block_summary.rb # was AuditBlock
19
+ lib/ucode/models/audit/script_summary.rb # NEW (was string list)
20
+ lib/ucode/models/audit/plane_summary.rb # NEW
21
+ lib/ucode/models/audit/discrepancy.rb # NEW
22
+ lib/ucode/models/audit/codepoint_detail.rb # NEW
23
+ lib/ucode/models/audit/codepoint_range.rb
24
+ lib/ucode/models/audit/codepoint_set_diff.rb
25
+ lib/ucode/models/audit/audit_axis.rb
26
+ lib/ucode/models/audit/named_instance.rb
27
+ lib/ucode/models/audit/licensing.rb
28
+ lib/ucode/models/audit/metrics.rb
29
+ lib/ucode/models/audit/hinting.rb
30
+ lib/ucode/models/audit/color_capabilities.rb
31
+ lib/ucode/models/audit/variation_detail.rb
32
+ lib/ucode/models/audit/opentype_layout.rb
33
+ lib/ucode/models/audit/fs_selection_flags.rb
34
+ lib/ucode/models/audit/gasp_range.rb
35
+ lib/ucode/models/audit/embedding_type.rb
36
+ lib/ucode/models/audit/script_coverage_row.rb
37
+ lib/ucode/models/audit/script_features.rb
38
+ lib/ucode/models/audit/field_change.rb
39
+ lib/ucode/models/audit/duplicate_group.rb
40
+ lib/ucode/models/audit/library_summary.rb
41
+ lib/ucode/models/audit/audit_diff.rb
42
+ ```
43
+
44
+ Specs under `spec/ucode/models/audit/` — one spec per model, all
45
+ testing `to_hash` / `from_hash` round-trip.
46
+
47
+ ## Source material
48
+
49
+ Port these unchanged (just namespace swap):
50
+
51
+ - `fontisan/lib/fontisan/models/audit/codepoint_range.rb`
52
+ - `fontisan/lib/fontisan/models/audit/codepoint_set_diff.rb`
53
+ - `fontisan/lib/fontisan/models/audit/audit_axis.rb`
54
+ - `fontisan/lib/fontisan/models/audit/named_instance.rb`
55
+ - `fontisan/lib/fontisan/models/audit/licensing.rb`
56
+ - `fontisan/lib/fontisan/models/audit/metrics.rb`
57
+ - `fontisan/lib/fontisan/models/audit/hinting.rb`
58
+ - `fontisan/lib/fontisan/models/audit/color_capabilities.rb`
59
+ - `fontisan/lib/fontisan/models/audit/variation_detail.rb`
60
+ - `fontisan/lib/fontisan/models/audit/opentype_layout.rb`
61
+ - `fontisan/lib/fontisan/models/audit/fs_selection_flags.rb`
62
+ - `fontisan/lib/fontisan/models/audit/gasp_range.rb`
63
+ - `fontisan/lib/fontisan/models/audit/embedding_type.rb`
64
+ - `fontisan/lib/fontisan/models/audit/script_coverage_row.rb`
65
+ - `fontisan/lib/fontisan/models/audit/script_features.rb`
66
+ - `fontisan/lib/fontisan/models/audit/field_change.rb`
67
+ - `fontisan/lib/fontisan/models/audit/duplicate_group.rb`
68
+ - `fontisan/lib/fontisan/models/audit/library_summary.rb`
69
+ - `fontisan/lib/fontisan/models/audit/audit_diff.rb`
70
+
71
+ ## Schema changes vs fontisan
72
+
73
+ Per `TODO.new/02-audit-schema-design.md`:
74
+
75
+ - `AuditReport`:
76
+ - `fontisan_version` → `ucode_version`.
77
+ - Drop `cldr_version` and `language_coverage`.
78
+ - Drop `ucd_version` string → replace with `baseline` (Baseline model).
79
+ - Drop `unicode_scripts: String[]` → replace with `scripts: ScriptSummary[]`.
80
+ - Add `plane_summaries: PlaneSummary[]`.
81
+ - Add `discrepancies: Discrepancy[]`.
82
+ - `AuditBlock` → renamed `BlockSummary`. Add `missing_codepoints`,
83
+ `covered_codepoints` (verbose), `missing_count`, `coverage_percent`,
84
+ `status`, `plane`. Drop `complete` boolean (replaced by status).
85
+ - `AuditReport` uses `key_value do map "name", to: :name end` form —
86
+ same as fontisan. No `mapping do` (lutaml-model API; see project
87
+ memory `lutaml_model_polymorphism_api.md`).
88
+
89
+ ## lutaml-model conventions
90
+
91
+ - Parent class inherits via `< Lutaml::Model::Serializable` — never
92
+ `include Lutaml::Model::Serializable`. See project memory
93
+ `feedback_lutaml_model_inheritance.md`.
94
+ - Boolean attributes use `Lutaml::Model::Type::Boolean` (not Ruby
95
+ `:boolean` — same convention as fontisan).
96
+ - Key-value serialization uses `key_value do ... end` for JSON/YAML.
97
+ No custom `to_h`/`from_h`/`to_json`/`from_json`.
98
+ - Nested models reference other `Ucode::Models::Audit::*` classes
99
+ directly (no string namespacing).
100
+
101
+ ## Spec requirements
102
+
103
+ - One spec per model file under `spec/ucode/models/audit/`.
104
+ - Each spec:
105
+ - Constructs an instance with realistic attribute values (no
106
+ `nil` where the schema says non-nil).
107
+ - Round-trips through `to_hash` → `from_hash` → field equality.
108
+ - For collections, tests both empty and populated.
109
+ - No `double()` — use real instances or `Struct.new`.
110
+ - `AuditReport` spec additionally verifies every documented field
111
+ from `TODO.new/04-fontist-org-contract.md` is present.
112
+
113
+ ## Acceptance
114
+
115
+ - All 27 model files exist and load via autoload chain declared in
116
+ `lib/ucode/models/audit.rb`.
117
+ - All 27 spec files pass with no `double()` usage.
118
+ - `Ucode::Models::Audit::AuditReport.new(...)` accepts all fields
119
+ from `02-audit-schema-design.md`.
120
+ - `AuditReport#to_hash` produces a hash matching the
121
+ `04-fontist-org-contract.md` JSON shape (where overlapping).
122
+ - Rubocop clean.
123
+
124
+ ## References
125
+
126
+ - Schema source: `TODO.new/02-audit-schema-design.md`
127
+ - Contract: `TODO.new/04-fontist-org-contract.md`
128
+ - Source files: `fontisan/lib/fontisan/models/audit/`
129
+ - lutaml-model conventions: project memory
130
+ `lutaml_model_polymorphism_api.md`,
131
+ `feedback_lutaml_model_inheritance.md`
132
+ - Follow-ups: `TODO.new/08-extractors-cheap-port.md` (uses these models)
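The derived `BlockSummary` fields named in TODO 07 above (`missing_codepoints`, `missing_count`, `coverage_percent`, `status`, `plane`) and the hash round-trip its specs require can be sketched without lutaml-model. This is a plain `Struct` stand-in, not the real model: `BlockSummarySketch`, `summarize_block`, the `assigned_count` field, and the three status strings are assumptions chosen for illustration, since the actual schema lives in `02-audit-schema-design.md`.

```ruby
# Plain Struct stand-in for the lutaml-model class (hypothetical shape).
BlockSummarySketch = Struct.new(
  :name, :plane, :assigned_count, :missing_codepoints,
  :missing_count, :coverage_percent, :status,
  keyword_init: true
)

# Builds a summary from integer codepoint arrays.
def summarize_block(name:, assigned:, covered:)
  raise ArgumentError, "block has no assigned codepoints" if assigned.empty?

  missing = assigned - covered
  covered_count = assigned.size - missing.size

  # Status strings are illustrative; the real enum is defined elsewhere.
  status =
    if missing.empty? then "complete"
    elsif covered_count.zero? then "missing"
    else "partial"
    end

  BlockSummarySketch.new(
    name: name,
    plane: assigned.first >> 16, # plane number is the codepoint's top bits
    assigned_count: assigned.size,
    missing_codepoints: missing.map { |cp| format("U+%04X", cp) },
    missing_count: missing.size,
    coverage_percent: (covered_count * 100.0 / assigned.size).round(2),
    status: status
  )
end

# Illustrative data: 8 assigned codepoints, the first 6 covered.
assigned = (0x11B60..0x11B67).to_a
summary = summarize_block(
  name: "Sharada Supplement", assigned: assigned, covered: assigned.first(6)
)

# Round trip through a hash, mirroring the to_hash / from_hash spec pattern.
restored = BlockSummarySketch.new(**summary.to_h)
puts restored == summary
```

Deriving `status` from the counts, instead of storing a separate `complete` boolean, matches the TODO's decision to drop that boolean in favor of a status field.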
@@ -0,0 +1,113 @@
1
+ # 08 — Cheap extractors port
2
+
3
+ ## Goal
4
+
5
+ Port the 4 cheap extractors from fontisan to ucode. These are the
6
+ "brief mode" extractors — fast, name-table-only path that doesn't need
7
+ UCD or expensive table loads. Plus the Coverage extractor (cheap, but
8
+ excluded from brief mode in fontisan because it needs cmap; in ucode we
9
+ keep it cheap because cmap is the Tier 1 foundation).
10
+
11
+ After this TODO, `Ucode::Audit::Registry.each(mode: :brief)` produces
12
+ a minimal-but-real audit report (identity + style + coverage totals,
13
+ no aggregations).
14
+
15
+ ## Files to create
16
+
17
+ ```
18
+ lib/ucode/audit/extractors/
19
+ ├── base.rb # port from fontisan
20
+ ├── provenance.rb # port from fontisan
21
+ ├── identity.rb # port from fontisan
22
+ ├── style.rb # port from fontisan (the registry-listed Extractors::Style, not the older standalone StyleExtractor)
23
+ ├── licensing.rb # port from fontisan
24
+ └── coverage.rb # port from fontisan
25
+ ```
26
+
27
+ Plus update `lib/ucode/audit/registry.rb` to populate `BRIEF_EXTRACTORS`
28
+ and add these to `ORDERED_EXTRACTORS` (the latter stays incomplete
29
+ until TODO 09).
30
+
31
+ Specs: `spec/ucode/audit/extractors/<name>_spec.rb` for each.
32
+
33
+ ## Port from fontisan
34
+
35
+ - `fontisan/lib/fontisan/audit/extractors/base.rb`
36
+ - `fontisan/lib/fontisan/audit/extractors/provenance.rb`
37
+ - `fontisan/lib/fontisan/audit/extractors/identity.rb`
38
+ - `fontisan/lib/fontisan/audit/extractors/style.rb`
39
+ - `fontisan/lib/fontisan/audit/extractors/licensing.rb`
40
+ - `fontisan/lib/fontisan/audit/extractors/coverage.rb`
41
+
42
+ ## Adjustments vs fontisan
43
+
44
+ Each extractor returns a hash of `AuditReport` fields. The fontisan
45
+ versions read font tables via `Context#font.table(...)` — this stays
46
+ the same; ucode's `Context` still wraps a fontisan font handle.
47
+
48
+ ### Provenance
49
+
50
+ - `fontisan_version` → `ucode_version` (read from `Ucode::VERSION`).
51
+ - Otherwise unchanged.
52
+
53
+ ### Identity
54
+
55
+ - Unchanged. Reads `name` table via fontisan's public API.
56
+
57
+ ### Style
58
+
59
+ - The standalone `StyleExtractor` class
60
+ (`fontisan/lib/fontisan/audit/style_extractor.rb`) is older
61
+ fontisan code. The registry-listed `Extractors::Style` is the newer
62
+ thin version. Port the registry-listed version; do not port the
63
+ standalone `StyleExtractor` class.
64
+ - Reads OS/2 + head via fontisan's public API. Same shape.
65
+
66
+ ### Licensing
67
+
68
+ - Unchanged.
69
+
70
+ ### Coverage
71
+
72
+ - Output `codepoints` field uses `"U+XXXX"` string form (per
73
+ `02-audit-schema-design.md`).
74
+ - Output `codepoint_ranges` uses `CodepointRange` model — port the
75
+ `CodepointRangeCoalescer` helper too (`fontisan/lib/fontisan/audit/codepoint_range_coalescer.rb`).
76
+ - Does **not** emit aggregations (blocks/scripts) — that's the
77
+ Aggregations extractor in TODO 10. Coverage only emits the raw
78
+ codepoint set.
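
To make the two Coverage output shapes concrete, here is a minimal sketch of `"U+XXXX"` formatting and range coalescing. The `CodepointRange` struct, its `start`/`finish` members, and both helper names are hypothetical; the real model and `CodepointRangeCoalescer` come from the fontisan port and may differ.

```ruby
# Hypothetical stand-in for the CodepointRange model (member names are assumptions).
CodepointRange = Struct.new(:start, :finish)

# Formats an integer codepoint as "U+XXXX": uppercase hex, padded to at least
# four digits, longer for codepoints beyond the BMP.
def format_codepoint(codepoint)
  format("U+%04X", codepoint)
end

# Collapses integer codepoints into sorted ranges of consecutive values.
# Duplicates and input order are ignored.
def coalesce(codepoints)
  codepoints.uniq.sort.each_with_object([]) do |cp, ranges|
    if ranges.any? && ranges.last.finish + 1 == cp
      ranges.last.finish = cp # extend the current run
    else
      ranges << CodepointRange.new(cp, cp) # start a new run
    end
  end
end
```

For example, `coalesce([0x43, 0x41, 0x42, 0x61])` yields two ranges, `0x41..0x43` and `0x61..0x61`, and `format_codepoint(0x1F600)` gives `"U+1F600"`.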
79
+
80
+ ## Boundary with fontisan
81
+
82
+ These extractors call **only** fontisan's public font-reading API:
83
+
84
+ - `fontisan_font.table("name")`
85
+ - `fontisan_font.table("OS/2")`
86
+ - `fontisan_font.table("head")`
87
+ - `fontisan_font.table("cmap")`
88
+ - `fontisan_font.sfnt_table("cmap").parse.unicode_mappings`
89
+
90
+ No reaching into `Fontisan::Constants`, no `send`, no
91
+ `instance_variable_get`. If a field needs a table fontisan doesn't
92
+ expose, file a fontisan-side issue; do not work around it in ucode.
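
One way to keep this boundary honest is a source-level check that flags the three prohibited patterns. The sketch below is a hypothetical guard, not part of fontisan or ucode; the constant and method names are made up for illustration, and a regex scan is only a coarse approximation of the rule.

```ruby
# Hypothetical guard: labels mapped to patterns for the prohibited constructs.
FORBIDDEN_PATTERNS = {
  "Fontisan::Constants" => /Fontisan::Constants/,
  "send" => /\.(?:send|__send__)\b/,
  "instance_variable_get" => /\binstance_variable_get\b/,
}.freeze

# Returns the labels of every forbidden pattern found in the given source text,
# in the order they are declared above.
def boundary_violations(source)
  FORBIDDEN_PATTERNS.select { |_label, pattern| source.match?(pattern) }.keys
end
```

A spec could read each file under `lib/ucode/audit/extractors/` and assert that `boundary_violations(File.read(path))` is empty.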
93
+
94
+ ## Acceptance
95
+
96
+ - All 6 extractor files exist; each has a passing spec with real
97
+ fixture fonts (use `spec/fixtures/fonts/`).
98
+ - `Ucode::Audit::Registry.each(mode: :brief)` iterates these 5:
99
+ `Provenance, Identity, Style, Licensing, Coverage`.
100
+ - A "brief audit" of a fixture font produces an `AuditReport` with
101
+ provenance, identity, style, licensing, and coverage fields
102
+ populated. Aggregation fields (`baseline`, `blocks`, `scripts`,
103
+ `plane_summaries`) are nil.
104
+ - No `double()` in any spec.
105
+ - Rubocop clean.
106
+
107
+ ## References
108
+
109
+ - Models: `TODO.new/07-audit-models-port.md`
110
+ - Source: `fontisan/lib/fontisan/audit/extractors/{base,provenance,identity,style,licensing,coverage}.rb`
111
+ - Coalescer helper: `fontisan/lib/fontisan/audit/codepoint_range_coalescer.rb`
112
+ - fontisan API boundary: `docs/architecture.md` §"Dependency arrows"
113
+ - Follow-up: `TODO.new/09-extractors-expensive-port.md`
@@ -0,0 +1,99 @@
1
+ # 09 — Expensive extractors port
2
+
3
+ ## Goal
4
+
5
+ Port the 5 expensive extractors from fontisan. These read multiple
6
+ font tables and reconstruct sub-structures (metrics, hinting program,
7
+ color font capabilities, variable-font axes, OpenType layout rules).
8
+ They are excluded from brief mode.
9
+
10
+ After this TODO, `Ucode::Audit::Registry.each(mode: :full)` produces
11
+ a complete audit report, minus the UCD aggregations added in TODO 10.
12
+
13
+ ## Files to create
14
+
15
+ ```
16
+ lib/ucode/audit/extractors/
17
+ ├── metrics.rb # port from fontisan
18
+ ├── hinting.rb # port from fontisan
19
+ ├── color_capabilities.rb # port from fontisan
20
+ ├── variation_detail.rb # port from fontisan
21
+ └── opentype_layout.rb # port from fontisan
22
+ ```
23
+
24
+ Plus update `lib/ucode/audit/registry.rb` `ORDERED_EXTRACTORS` to
25
+ include these in their fontisan positions.
26
+
27
+ Specs: `spec/ucode/audit/extractors/<name>_spec.rb` for each.
28
+
29
+ ## Port from fontisan
30
+
31
+ - `fontisan/lib/fontisan/audit/extractors/metrics.rb`
32
+ - `fontisan/lib/fontisan/audit/extractors/hinting.rb`
33
+ - `fontisan/lib/fontisan/audit/extractors/color_capabilities.rb`
34
+ - `fontisan/lib/fontisan/audit/extractors/variation_detail.rb`
35
+ - `fontisan/lib/fontisan/audit/extractors/opentype_layout.rb`
36
+
37
+ ## Adjustments vs fontisan
38
+
39
+ Each extractor returns a hash with one or two keys mapping to a
40
+ model from `TODO.new/07-audit-models-port.md`. The fontisan versions
41
+ already return hashes in this shape — port unchanged.
42
+
43
+ ### Metrics
44
+
45
+ - Reads `head`, `hhea`, `OS/2`, `post` via fontisan's public API.
46
+ - Returns `{ metrics: Ucode::Models::Audit::Metrics.new(...) }`.
47
+ - Returns `{}` (empty hash) for Type 1 fonts.
48
+
49
+ ### Hinting
50
+
51
+ - Reads `fpgm`, `prep`, `cvt`, `gasp`, plus CFF charstrings for CFF fonts.
52
+ - Returns `{ hinting: ... }` or `{}`.
53
+
54
+ ### ColorCapabilities
55
+
56
+ - Reads `COLR`, `CPAL`, `SVG`, `CBDT`, `CBLC`, `sbix`.
57
+ - Returns `{ color_capabilities: ... }` or `{}`.
58
+
59
+ ### VariationDetail
60
+
61
+ - Reads `fvar`, `gvar`, `STAT`, `avar`, `HVAR`, `VVAR`, etc.
62
+ - Returns `{ variation: ... }` or `{}` for non-variable faces.
63
+
64
+ ### OpenTypeLayout
65
+
66
+ - Reads `GSUB`, `GPOS`.
67
+ - Returns `{ opentype_layout: ... }` or `{}`.
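
The `{ key: ... }` or `{}` convention used by the extractors above can be sketched as follows. Everything here is illustrative: `StubFont`, `OpenTypeLayoutSketch`, and `merge_results` are hypothetical names, the sketch assumes `table(tag)` returns `nil` for an absent table, and the real extractors return model objects rather than plain hashes.

```ruby
# Hypothetical stub font; assumes table(tag) returns nil when the table is absent.
StubFont = Struct.new(:tables) do
  def table(tag)
    tables[tag]
  end
end

# Illustrative extractor: contributes one key, or nothing when neither table exists.
class OpenTypeLayoutSketch
  def extract(font)
    gsub = font.table("GSUB")
    gpos = font.table("GPOS")
    return {} if gsub.nil? && gpos.nil?

    { opentype_layout: { has_gsub: !gsub.nil?, has_gpos: !gpos.nil? } }
  end
end

# Merges extractor results in order; a key that no extractor contributes
# simply stays absent from the merged hash.
def merge_results(extractors, font)
  extractors.reduce({}) { |report, extractor| report.merge(extractor.extract(font)) }
end
```

Returning an empty hash instead of a key with a `nil` value lets the merge step stay a plain `Hash#merge`, with no special casing for fonts that lack a capability.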
68
+
69
+ ## Boundary with fontisan
70
+
71
+ Same boundary as TODO 08: only public font-reading API. If a table
72
+ isn't exposed publicly, file a fontisan-side issue.
73
+
74
+ For complex table walks (e.g. GSUB script list iteration, COLR layer
75
+ tree), prefer asking fontisan to expose a higher-level reader (e.g.
76
+ `fontisan_font.gsub_scripts`) rather than parsing the raw table bytes
77
+ in ucode. ucode is the audit owner, not the font parser.
78
+
79
+ ## Acceptance
80
+
81
+ - All 5 extractor files exist; each has a passing spec with real
82
+ fixture fonts covering: static TrueType, CFF/OTF, variable font,
83
+ color font (COLR or CBDT), Type 1 (returns empty).
84
+ - `Ucode::Audit::Registry.each(mode: :full)` iterates all 10
85
+ extractors ported so far (5 cheap from TODO 08 + 5 expensive here).
86
+ Still missing: Aggregations (TODO 10).
87
+ - A full audit of a fixture variable font populates `variation.axes`,
88
+ `variation.named_instances`, and `opentype_layout` correctly.
89
+ - A full audit of a fixture COLR font populates
90
+ `color_capabilities.colr_layers`, `color_capabilities.cpal_palettes`.
91
+ - No `double()` in any spec.
92
+ - Rubocop clean.
93
+
94
+ ## References
95
+
96
+ - Models: `TODO.new/07-audit-models-port.md`
97
+ - Source: `fontisan/lib/fontisan/audit/extractors/{metrics,hinting,color_capabilities,variation_detail,opentype_layout}.rb`
98
+ - Fixtures: `spec/fixtures/fonts/` (port any missing from fontisan's spec/fixtures/)
99
+ - Follow-up: `TODO.new/10-aggregations-ucd-rewrite.md` (last extractor)