ucode 0.1.0

Files changed (228)
  1. checksums.yaml +7 -0
  2. data/CLAUDE.md +211 -0
  3. data/Gemfile +22 -0
  4. data/Gemfile.lock +406 -0
  5. data/README.md +469 -0
  6. data/Rakefile +18 -0
  7. data/TODO.new/00-README.md +66 -0
  8. data/TODO.new/01-pillar-terminology-alignment.md +69 -0
  9. data/TODO.new/02-audit-schema-design.md +255 -0
  10. data/TODO.new/03-directory-output-spec.md +203 -0
  11. data/TODO.new/04-fontist-org-contract.md +173 -0
  12. data/TODO.new/05-baseline-unicode17-coverage-audit.md +144 -0
  13. data/TODO.new/06-audit-namespace-skeleton.md +105 -0
  14. data/TODO.new/07-audit-models-port.md +132 -0
  15. data/TODO.new/08-extractors-cheap-port.md +113 -0
  16. data/TODO.new/09-extractors-expensive-port.md +99 -0
  17. data/TODO.new/10-aggregations-ucd-rewrite.md +168 -0
  18. data/TODO.new/11-differ-and-library-auditor-port.md +102 -0
  19. data/TODO.new/12-formatters-port.md +115 -0
  20. data/TODO.new/13-directory-emitter.md +147 -0
  21. data/TODO.new/14-html-face-browser.md +144 -0
  22. data/TODO.new/15-html-library-browser.md +102 -0
  23. data/TODO.new/16-cli-audit-subcommands.md +142 -0
  24. data/TODO.new/17-fontisan-cleanup-audit.md +147 -0
  25. data/TODO.new/18-fontisan-cleanup-ucd.md +156 -0
  26. data/TODO.new/19-fontisan-docs-update.md +155 -0
  27. data/TODO.new/20-canonical-resolver-4-tier.md +182 -0
  28. data/TODO.new/21-canonical-unicode17-build.md +148 -0
  29. data/TODO.new/22-implementation-order.md +176 -0
  30. data/UCODE_CHANGELOG.md +97 -0
  31. data/exe/ucode +8 -0
  32. data/lib/ucode/aggregator.rb +77 -0
  33. data/lib/ucode/audit/block_aggregator.rb +90 -0
  34. data/lib/ucode/audit/codepoint_range_coalescer.rb +42 -0
  35. data/lib/ucode/audit/context.rb +137 -0
  36. data/lib/ucode/audit/discrepancy_detector.rb +213 -0
  37. data/lib/ucode/audit/extractors/aggregations.rb +70 -0
  38. data/lib/ucode/audit/extractors/base.rb +21 -0
  39. data/lib/ucode/audit/extractors/color_capabilities.rb +143 -0
  40. data/lib/ucode/audit/extractors/coverage.rb +55 -0
  41. data/lib/ucode/audit/extractors/hinting.rb +199 -0
  42. data/lib/ucode/audit/extractors/identity.rb +65 -0
  43. data/lib/ucode/audit/extractors/licensing.rb +75 -0
  44. data/lib/ucode/audit/extractors/metrics.rb +108 -0
  45. data/lib/ucode/audit/extractors/opentype_layout.rb +71 -0
  46. data/lib/ucode/audit/extractors/provenance.rb +34 -0
  47. data/lib/ucode/audit/extractors/style.rb +88 -0
  48. data/lib/ucode/audit/extractors/variation_detail.rb +101 -0
  49. data/lib/ucode/audit/extractors.rb +31 -0
  50. data/lib/ucode/audit/plane_aggregator.rb +37 -0
  51. data/lib/ucode/audit/registry.rb +63 -0
  52. data/lib/ucode/audit/script_aggregator.rb +92 -0
  53. data/lib/ucode/audit.rb +27 -0
  54. data/lib/ucode/cache.rb +113 -0
  55. data/lib/ucode/cli.rb +272 -0
  56. data/lib/ucode/commands/build.rb +68 -0
  57. data/lib/ucode/commands/cache.rb +46 -0
  58. data/lib/ucode/commands/fetch.rb +62 -0
  59. data/lib/ucode/commands/font_coverage.rb +57 -0
  60. data/lib/ucode/commands/glyphs.rb +136 -0
  61. data/lib/ucode/commands/lookup.rb +65 -0
  62. data/lib/ucode/commands/parse.rb +62 -0
  63. data/lib/ucode/commands/site.rb +33 -0
  64. data/lib/ucode/commands.rb +19 -0
  65. data/lib/ucode/config.rb +110 -0
  66. data/lib/ucode/coordinator/indices.rb +34 -0
  67. data/lib/ucode/coordinator.rb +397 -0
  68. data/lib/ucode/database.rb +214 -0
  69. data/lib/ucode/db_builder.rb +107 -0
  70. data/lib/ucode/error.rb +96 -0
  71. data/lib/ucode/fetch/code_charts.rb +57 -0
  72. data/lib/ucode/fetch/http.rb +83 -0
  73. data/lib/ucode/fetch/ucd_zip.rb +57 -0
  74. data/lib/ucode/fetch/unihan_zip.rb +57 -0
  75. data/lib/ucode/fetch.rb +14 -0
  76. data/lib/ucode/glyphs/cell_extractor.rb +130 -0
  77. data/lib/ucode/glyphs/dvisvgm_renderer.rb +29 -0
  78. data/lib/ucode/glyphs/embedded_fonts/catalog.rb +372 -0
  79. data/lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb +228 -0
  80. data/lib/ucode/glyphs/embedded_fonts/font_entry.rb +126 -0
  81. data/lib/ucode/glyphs/embedded_fonts/renderer.rb +47 -0
  82. data/lib/ucode/glyphs/embedded_fonts/source.rb +94 -0
  83. data/lib/ucode/glyphs/embedded_fonts/svg.rb +123 -0
  84. data/lib/ucode/glyphs/embedded_fonts/tounicode.rb +103 -0
  85. data/lib/ucode/glyphs/embedded_fonts/writer.rb +76 -0
  86. data/lib/ucode/glyphs/embedded_fonts.rb +50 -0
  87. data/lib/ucode/glyphs/grid.rb +30 -0
  88. data/lib/ucode/glyphs/grid_detector.rb +165 -0
  89. data/lib/ucode/glyphs/last_resort/cmap_index.rb +96 -0
  90. data/lib/ucode/glyphs/last_resort/contents.rb +74 -0
  91. data/lib/ucode/glyphs/last_resort/glif.rb +124 -0
  92. data/lib/ucode/glyphs/last_resort/renderer.rb +67 -0
  93. data/lib/ucode/glyphs/last_resort/source.rb +125 -0
  94. data/lib/ucode/glyphs/last_resort/svg.rb +247 -0
  95. data/lib/ucode/glyphs/last_resort/writer.rb +83 -0
  96. data/lib/ucode/glyphs/last_resort.rb +36 -0
  97. data/lib/ucode/glyphs/monolith_page_map.rb +181 -0
  98. data/lib/ucode/glyphs/mutool_renderer.rb +28 -0
  99. data/lib/ucode/glyphs/page_renderer.rb +221 -0
  100. data/lib/ucode/glyphs/path_bbox.rb +62 -0
  101. data/lib/ucode/glyphs/pdf2svg_renderer.rb +26 -0
  102. data/lib/ucode/glyphs/pdf_fetcher.rb +102 -0
  103. data/lib/ucode/glyphs/pdftocairo_renderer.rb +32 -0
  104. data/lib/ucode/glyphs/real_fonts/block_coverage.rb +45 -0
  105. data/lib/ucode/glyphs/real_fonts/coverage_auditor.rb +117 -0
  106. data/lib/ucode/glyphs/real_fonts/font_coverage_report.rb +45 -0
  107. data/lib/ucode/glyphs/real_fonts/font_locator.rb +95 -0
  108. data/lib/ucode/glyphs/real_fonts/unicode_17_blocks.rb +104 -0
  109. data/lib/ucode/glyphs/real_fonts/writer.rb +50 -0
  110. data/lib/ucode/glyphs/real_fonts.rb +32 -0
  111. data/lib/ucode/glyphs/writer.rb +250 -0
  112. data/lib/ucode/glyphs.rb +27 -0
  113. data/lib/ucode/index.rb +106 -0
  114. data/lib/ucode/index_builder.rb +94 -0
  115. data/lib/ucode/models/audit/audit_axis.rb +30 -0
  116. data/lib/ucode/models/audit/audit_diff.rb +77 -0
  117. data/lib/ucode/models/audit/audit_report.rb +137 -0
  118. data/lib/ucode/models/audit/baseline.rb +32 -0
  119. data/lib/ucode/models/audit/block_summary.rb +72 -0
  120. data/lib/ucode/models/audit/codepoint_detail.rb +45 -0
  121. data/lib/ucode/models/audit/codepoint_range.rb +39 -0
  122. data/lib/ucode/models/audit/codepoint_set_diff.rb +34 -0
  123. data/lib/ucode/models/audit/color_capabilities.rb +91 -0
  124. data/lib/ucode/models/audit/discrepancy.rb +38 -0
  125. data/lib/ucode/models/audit/duplicate_group.rb +23 -0
  126. data/lib/ucode/models/audit/embedding_type.rb +81 -0
  127. data/lib/ucode/models/audit/field_change.rb +28 -0
  128. data/lib/ucode/models/audit/fs_selection_flags.rb +65 -0
  129. data/lib/ucode/models/audit/gasp_range.rb +63 -0
  130. data/lib/ucode/models/audit/hinting.rb +99 -0
  131. data/lib/ucode/models/audit/library_summary.rb +40 -0
  132. data/lib/ucode/models/audit/licensing.rb +48 -0
  133. data/lib/ucode/models/audit/metrics.rb +111 -0
  134. data/lib/ucode/models/audit/named_instance.rb +41 -0
  135. data/lib/ucode/models/audit/opentype_layout.rb +38 -0
  136. data/lib/ucode/models/audit/plane_summary.rb +31 -0
  137. data/lib/ucode/models/audit/script_coverage_row.rb +26 -0
  138. data/lib/ucode/models/audit/script_features.rb +28 -0
  139. data/lib/ucode/models/audit/script_summary.rb +54 -0
  140. data/lib/ucode/models/audit/variation_detail.rb +42 -0
  141. data/lib/ucode/models/audit.rb +50 -0
  142. data/lib/ucode/models/bidi_bracket_pair.rb +20 -0
  143. data/lib/ucode/models/bidi_mirroring.rb +19 -0
  144. data/lib/ucode/models/binary_property_assignment.rb +26 -0
  145. data/lib/ucode/models/block.rb +36 -0
  146. data/lib/ucode/models/case_folding_rule.rb +23 -0
  147. data/lib/ucode/models/cjk_radical.rb +23 -0
  148. data/lib/ucode/models/codepoint/bidi.rb +28 -0
  149. data/lib/ucode/models/codepoint/break_segmentation.rb +22 -0
  150. data/lib/ucode/models/codepoint/case_folding.rb +25 -0
  151. data/lib/ucode/models/codepoint/casing.rb +32 -0
  152. data/lib/ucode/models/codepoint/decomposition.rb +27 -0
  153. data/lib/ucode/models/codepoint/display.rb +24 -0
  154. data/lib/ucode/models/codepoint/emoji.rb +29 -0
  155. data/lib/ucode/models/codepoint/hangul.rb +20 -0
  156. data/lib/ucode/models/codepoint/identifier.rb +30 -0
  157. data/lib/ucode/models/codepoint/indic.rb +20 -0
  158. data/lib/ucode/models/codepoint/joining.rb +20 -0
  159. data/lib/ucode/models/codepoint/normalization.rb +35 -0
  160. data/lib/ucode/models/codepoint/numeric_value.rb +35 -0
  161. data/lib/ucode/models/codepoint.rb +122 -0
  162. data/lib/ucode/models/name_alias.rb +21 -0
  163. data/lib/ucode/models/named_sequence.rb +19 -0
  164. data/lib/ucode/models/names_list_entry.rb +38 -0
  165. data/lib/ucode/models/plane.rb +36 -0
  166. data/lib/ucode/models/property_alias.rb +24 -0
  167. data/lib/ucode/models/property_value_alias.rb +26 -0
  168. data/lib/ucode/models/relationship/compat_equiv.rb +18 -0
  169. data/lib/ucode/models/relationship/cross_reference.rb +17 -0
  170. data/lib/ucode/models/relationship/footnote.rb +24 -0
  171. data/lib/ucode/models/relationship/informal_alias.rb +18 -0
  172. data/lib/ucode/models/relationship/sample_sequence.rb +24 -0
  173. data/lib/ucode/models/relationship/variation_sequence.rb +19 -0
  174. data/lib/ucode/models/relationship.rb +57 -0
  175. data/lib/ucode/models/script.rb +41 -0
  176. data/lib/ucode/models/special_casing_rule.rb +28 -0
  177. data/lib/ucode/models/standardized_variant.rb +24 -0
  178. data/lib/ucode/models/unihan_entry.rb +23 -0
  179. data/lib/ucode/models.rb +47 -0
  180. data/lib/ucode/parsers/auxiliary.rb +26 -0
  181. data/lib/ucode/parsers/base.rb +137 -0
  182. data/lib/ucode/parsers/bidi_brackets.rb +41 -0
  183. data/lib/ucode/parsers/bidi_mirroring.rb +37 -0
  184. data/lib/ucode/parsers/blocks.rb +63 -0
  185. data/lib/ucode/parsers/case_folding.rb +53 -0
  186. data/lib/ucode/parsers/cjk_radicals.rb +102 -0
  187. data/lib/ucode/parsers/derived_age.rb +59 -0
  188. data/lib/ucode/parsers/derived_core_properties.rb +60 -0
  189. data/lib/ucode/parsers/extracted_properties.rb +74 -0
  190. data/lib/ucode/parsers/name_aliases.rb +44 -0
  191. data/lib/ucode/parsers/named_sequences.rb +51 -0
  192. data/lib/ucode/parsers/names_list.rb +250 -0
  193. data/lib/ucode/parsers/property_aliases.rb +41 -0
  194. data/lib/ucode/parsers/property_value_aliases.rb +46 -0
  195. data/lib/ucode/parsers/script_extensions.rb +64 -0
  196. data/lib/ucode/parsers/scripts.rb +60 -0
  197. data/lib/ucode/parsers/special_casing.rb +62 -0
  198. data/lib/ucode/parsers/standardized_variants.rb +56 -0
  199. data/lib/ucode/parsers/unicode_data/hangul_name.rb +73 -0
  200. data/lib/ucode/parsers/unicode_data.rb +268 -0
  201. data/lib/ucode/parsers/unihan.rb +125 -0
  202. data/lib/ucode/parsers.rb +35 -0
  203. data/lib/ucode/range_entry.rb +58 -0
  204. data/lib/ucode/repo/aggregate_writer.rb +364 -0
  205. data/lib/ucode/repo/atomic_writes.rb +48 -0
  206. data/lib/ucode/repo/codepoint_writer.rb +96 -0
  207. data/lib/ucode/repo/paths.rb +122 -0
  208. data/lib/ucode/repo.rb +22 -0
  209. data/lib/ucode/site/config_emitter.rb +124 -0
  210. data/lib/ucode/site/generator.rb +178 -0
  211. data/lib/ucode/site/search_index.rb +68 -0
  212. data/lib/ucode/site/template/.gitignore +4 -0
  213. data/lib/ucode/site/template/.vitepress/config.ts +8 -0
  214. data/lib/ucode/site/template/.vitepress/theme/index.js +20 -0
  215. data/lib/ucode/site/template/char/[codepoint].md +13 -0
  216. data/lib/ucode/site/template/components/BlockView.vue +57 -0
  217. data/lib/ucode/site/template/components/CharView.vue +85 -0
  218. data/lib/ucode/site/template/components/PlaneView.vue +56 -0
  219. data/lib/ucode/site/template/components/SearchView.vue +66 -0
  220. data/lib/ucode/site/template/index.md +25 -0
  221. data/lib/ucode/site/template/package.json +18 -0
  222. data/lib/ucode/site/template/search.md +9 -0
  223. data/lib/ucode/site.rb +13 -0
  224. data/lib/ucode/version.rb +5 -0
  225. data/lib/ucode/version_resolver.rb +76 -0
  226. data/lib/ucode.rb +74 -0
  227. data/ucode.gemspec +56 -0
  228. metadata +404 -0
@@ -0,0 +1,255 @@
1
+ # 02 — Audit schema design
2
+
3
+ ## Goal
4
+
5
+ Define the lutaml-model class hierarchy for the per-face font audit
6
+ report. This is the in-memory shape; serialization to the directory
7
+ tree is `03-directory-output-spec.md`.
8
+
9
+ The schema is the contract: fontist.org codes against it, the HTML
10
+ browser renders it, and the migration ports to it. Lock this before
11
+ touching any extractor.
12
+
13
+ ## Source material
14
+
15
+ Port from `fontisan/lib/fontisan/models/audit/` (15 files, ~750 lines
16
+ total) with the adjustments below. Do not invent new fields without a
17
+ documented consumer.
18
+
19
+ ## Top-level model
20
+
21
+ ```ruby
22
+ # lib/ucode/models/audit/audit_report.rb
23
+ class AuditReport < Lutaml::Model::Serializable
24
+ # --- Provenance ---
25
+ attribute :generated_at, :string
26
+ attribute :ucode_version, :string # was: fontisan_version
27
+ attribute :source_file, :string
28
+ attribute :source_sha256, :string
29
+ attribute :source_format, :string
30
+
31
+ # --- Source layout ---
32
+ attribute :font_index, :integer
33
+ attribute :num_fonts_in_source, :integer
34
+
35
+ # --- Identity (name table) ---
36
+ attribute :family_name, :string
37
+ attribute :subfamily_name, :string
38
+ attribute :full_name, :string
39
+ attribute :postscript_name, :string
40
+ attribute :version, :string
41
+ attribute :font_revision, :float
42
+
43
+ # --- Style (OS/2 + head) ---
44
+ attribute :weight_class, :integer
45
+ attribute :width_class, :integer
46
+ attribute :italic, Lutaml::Model::Type::Boolean
47
+ attribute :bold, Lutaml::Model::Type::Boolean
48
+ attribute :panose, :string
49
+
50
+ # --- Coverage ---
51
+ attribute :total_codepoints, :integer
52
+ attribute :total_glyphs, :integer
53
+ attribute :cmap_subtables, :integer, collection: true
54
+ attribute :codepoint_ranges, CodepointRange, collection: true
55
+ attribute :codepoints, :string, collection: true # "U+XXXX" form
56
+
57
+ # --- Aggregations (driven by ucode's own UCD, not ucd.all.flat.zip) ---
58
+ attribute :baseline, Baseline # see below — replaces ucd_version
59
+ attribute :blocks, BlockSummary, collection: true
60
+ attribute :scripts, ScriptSummary, collection: true # was: unicode_scripts (string list)
61
+
62
+ # --- Optional deep tables (nil for Type 1) ---
63
+ attribute :licensing, Licensing
64
+ attribute :metrics, Metrics
65
+ attribute :hinting, Hinting
66
+ attribute :color_capabilities, ColorCapabilities
67
+ attribute :variation, VariationDetail
68
+ attribute :opentype_layout, OpenTypeLayout
69
+
70
+ # --- Audit signals ---
71
+ attribute :discrepancies, Discrepancy, collection: true # NEW
72
+ attribute :warning, :string
73
+
74
+ key_value do
75
+ # ... one map line per attribute ...
76
+ end
77
+ end
78
+ ```
79
+
80
+ ## New / changed sub-models vs fontisan
81
+
82
+ ### `Baseline` (NEW — replaces fontisan's bare `ucd_version` string)
83
+
84
+ ```ruby
85
+ class Baseline < Lutaml::Model::Serializable
86
+ attribute :unicode_version, :string # "17.0.0"
87
+ attribute :ucode_version, :string
88
+ attribute :fontisan_version, :string # parser provenance
89
+ attribute :source, :string # "ucd-text + Unicode17Blocks overrides"
90
+ attribute :generated_at, :string
91
+ end
92
+ ```
93
+
94
+ ### `BlockSummary` (replaces fontisan's `AuditBlock`)
95
+
96
+ ```ruby
97
+ class BlockSummary < Lutaml::Model::Serializable
98
+ attribute :name, :string # original block name verbatim
99
+ attribute :first_cp, :integer
100
+ attribute :last_cp, :integer
101
+ attribute :range, :string # "U+0000–U+007F" (display form)
102
+ attribute :plane, :integer # 0-16
103
+ attribute :total_assigned, :integer # ucode's curated count
104
+ attribute :covered_count, :integer
105
+ attribute :missing_count, :integer
106
+ attribute :coverage_percent, :float
107
+ attribute :status, :string # see enum below
108
+ attribute :missing_codepoints, :integer, collection: true # always populated
109
+ attribute :covered_codepoints, :integer, collection: true # verbose only
110
+ end
111
+
112
+ # status enum (string — no symbol serialization):
113
+ # COMPLETE — covered_count == total_assigned
114
+ # PARTIAL — 0 < covered_count < total_assigned
115
+ # UNCOVERED_ASSIGNED — covered_count == 0 && total_assigned > 0
116
+ # NO_ASSIGNED_IN_BLOCK — total_assigned == 0 (rare; PUA blocks)
117
+ # OUTSIDE_BASELINE — block exists in font's cmap but not in baseline
118
+ ```
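The status enum above can be derived mechanically from the two counts. The following is a minimal sketch, not the shipped implementation: `block_status` and `coverage_percent` are hypothetical helper names, and the order of the checks is an assumption (the empty-block case is tested before `COMPLETE`, because `0 == 0` would otherwise classify an empty block as complete).

```ruby
# Hypothetical helper: derives the BlockSummary status string from counts.
# The check order is an assumption; total_assigned == 0 is tested first so
# that an empty block is not reported as COMPLETE.
def block_status(covered_count:, total_assigned:, in_baseline: true)
  return "OUTSIDE_BASELINE" unless in_baseline
  return "NO_ASSIGNED_IN_BLOCK" if total_assigned.zero?
  return "UNCOVERED_ASSIGNED" if covered_count.zero?
  return "COMPLETE" if covered_count == total_assigned

  "PARTIAL"
end

# Hypothetical helper: percentage rounded to two decimals, 0.0 for empty blocks.
def coverage_percent(covered_count, total_assigned)
  return 0.0 if total_assigned.zero?

  (covered_count * 100.0 / total_assigned).round(2)
end
```

With the figures used in the contract examples, 80 of 135 assigned codepoints yields `PARTIAL` at 59.26 percent.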
119
+
120
+ ### `ScriptSummary` (replaces fontisan's bare `unicode_scripts: String[]`)
121
+
122
+ ```ruby
123
+ class ScriptSummary < Lutaml::Model::Serializable
124
+ attribute :script_code, :string # "Latn", "Hani", ...
125
+ attribute :script_name, :string # "Latin", "Han", ...
126
+ attribute :blocks_total, :integer
127
+ attribute :assigned_total, :integer
128
+ attribute :covered_total, :integer
129
+ attribute :coverage_percent, :float
130
+ attribute :status, :string # same enum as BlockSummary minus OUTSIDE_BASELINE
131
+ end
132
+ ```
133
+
134
+ ### `Discrepancy` (NEW — cheap audit signal)
135
+
136
+ ```ruby
137
+ class Discrepancy < Lutaml::Model::Serializable
138
+ attribute :kind, :string # "os2_unicode_range_bit_without_cmap_codepoints"
139
+ attribute :detail, :string # human-readable explanation
140
+ attribute :block_name, :string # optional context
141
+ attribute :bit_position, :integer # optional (OS/2 ulUnicodeRange bit)
142
+ end
143
+ ```
144
+
145
+ ### Plane rollup
146
+
147
+ ```ruby
148
+ class PlaneSummary < Lutaml::Model::Serializable
149
+ attribute :plane, :integer
150
+ attribute :blocks_total, :integer
151
+ attribute :assigned_total, :integer
152
+ attribute :covered_total, :integer
153
+ attribute :coverage_percent, :float
154
+ end
155
+ ```
156
+
157
+ Carried on the report as `attribute :plane_summaries, PlaneSummary, collection: true`.
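A plane rollup is a group-and-sum over the block summaries. The sketch below uses plain `Struct` stand-ins (`BlockRow`, `PlaneRow`) instead of the lutaml-model classes so it runs without the gem. It assumes that `blocks_total` counts the blocks passed in for each plane, which the spec does not state explicitly.

```ruby
# Illustrative stand-ins for BlockSummary and PlaneSummary; names are hypothetical.
BlockRow = Struct.new(:plane, :total_assigned, :covered_count, keyword_init: true)
PlaneRow = Struct.new(:plane, :blocks_total, :assigned_total, :covered_total,
                      :coverage_percent, keyword_init: true)

# Groups block rows by plane and sums their counts into one row per plane,
# ordered by plane number.
def plane_rollup(blocks)
  blocks.group_by(&:plane).sort_by { |plane, _rows| plane }.map do |plane, rows|
    assigned = rows.sum(&:total_assigned)
    covered = rows.sum(&:covered_count)
    percent = assigned.zero? ? 0.0 : (covered * 100.0 / assigned).round(2)
    PlaneRow.new(plane: plane, blocks_total: rows.size, assigned_total: assigned,
                 covered_total: covered, coverage_percent: percent)
  end
end
```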
158
+
159
+ ### Codepoint detail (verbose only — emitted to a separate file)
160
+
161
+ ```ruby
162
+ class CodepointDetail < Lutaml::Model::Serializable
163
+ attribute :codepoint, :integer
164
+ attribute :name, :string
165
+ attribute :general_category, :string
166
+ attribute :script, :string
167
+ attribute :script_extensions, :string, collection: true
168
+ attribute :block_name, :string
169
+ attribute :age, :string
170
+ attribute :glyph_id, :integer # GID in the audited font
171
+ attribute :glyph_svg_path, :string # relative path under glyphs/, when emitted
172
+ end
173
+ ```
174
+
175
+ ## Ported unchanged from fontisan
176
+
177
+ These sub-models port across with namespace changes only (`Fontisan::`
178
+ → `Ucode::`):
179
+
180
+ - `Licensing`
181
+ - `Metrics`
182
+ - `Hinting`
183
+ - `ColorCapabilities`
184
+ - `VariationDetail`
185
+ - `OpenTypeLayout`
186
+ - `CodepointRange`, `CodepointSetDiff`
187
+ - `AuditAxis`, `NamedInstance`
188
+ - `FsSelectionFlags`, `GaspRange`, `EmbeddingType`
189
+ - `ScriptCoverageRow`, `ScriptFeatures`
190
+ - `FieldChange` (for Differ), `DuplicateGroup` (for LibraryAuditor)
191
+ - `LibrarySummary`
192
+ - `AuditDiff` (for compare command)
193
+
194
+ ## What's dropped vs fontisan
195
+
196
+ - **`language_coverage`** and `Models::Cldr::*` — CLDR is out of scope
197
+ (UCD Scripts.txt + ScriptExtensions.txt already define per-codepoint
198
+ script coverage; CLDR was overreach). See `00-README.md` decision.
199
+ - **`cldr_version`** on AuditReport — same reason.
200
+ - The `fontisan_version` field is renamed `ucode_version`. Fontisan is
201
+ now an internal parser, not the report's identity.
202
+
203
+ ## File layout
204
+
205
+ ```
206
+ lib/ucode/models/audit/
207
+ ├── audit_report.rb # top-level
208
+ ├── baseline.rb # NEW
209
+ ├── block_summary.rb # was AuditBlock
210
+ ├── script_summary.rb # NEW (was: string list)
211
+ ├── plane_summary.rb # NEW
212
+ ├── discrepancy.rb # NEW
213
+ ├── codepoint_detail.rb # NEW
214
+ ├── codepoint_range.rb
215
+ ├── codepoint_set_diff.rb
216
+ ├── audit_axis.rb
217
+ ├── named_instance.rb
218
+ ├── licensing.rb
219
+ ├── metrics.rb
220
+ ├── hinting.rb
221
+ ├── color_capabilities.rb
222
+ ├── variation_detail.rb
223
+ ├── opentype_layout.rb
224
+ ├── fs_selection_flags.rb
225
+ ├── gasp_range.rb
226
+ ├── embedding_type.rb
227
+ ├── script_coverage_row.rb
228
+ ├── script_features.rb
229
+ ├── field_change.rb
230
+ ├── duplicate_group.rb
231
+ ├── library_summary.rb
232
+ └── audit_diff.rb
233
+ ```
234
+
235
+ Plus the namespace hub `lib/ucode/models/audit.rb` declaring the
236
+ autoloads (Ruby autoload — see project memory `feedback_require_relative.md`
237
+ for the rule).
238
+
239
+ ## Acceptance
240
+
241
+ - All ~26 model classes ported and spec'd with round-trip
242
+ `to_hash` / `from_hash` tests. No hand-rolled `to_h`.
243
+ - No use of `double()` in any spec.
244
+ - The `AuditReport` shape produces the JSON described in
245
+ `04-fontist-org-contract.md` when serialized via the directory emitter
246
+ in `13-directory-emitter.md`.
247
+ - Rubocop clean on all new files.
248
+
249
+ ## References
250
+
251
+ - Source: `fontisan/lib/fontisan/models/audit/`
252
+ - Project memory: `lutaml_model_polymorphism_api.md`,
253
+ `feedback_lutaml_model_inheritance.md`
254
+ - Contract: `TODO.new/04-fontist-org-contract.md`
255
+ - Output: `TODO.new/03-directory-output-spec.md`
@@ -0,0 +1,203 @@
1
+ # 03 — Directory output spec
2
+
3
+ ## Goal
4
+
5
+ Lock the per-face output tree on disk. The HTML browser fetches chunks
6
+ lazily from this layout; fontist.org consumes the same files. One face
7
+ = one directory; the browser never loads the whole tree at once.
8
+
9
+ ## Why directory, not single file
10
+
11
+ A Unicode-17-complete CJK font carries ~50,000 codepoints across ~10
12
+ blocks. A single self-contained JSON would be tens of MB — too big for
13
+ a browser to parse without jank, and too big for fontist.org to fetch
14
+ just to render a coverage map. Splitting by concern lets the consumer
15
+ fetch only what it needs.
16
+
17
+ ## Layout
18
+
19
+ ```
20
+ output/font_audit/<label>/
21
+ ├── index.json # face metadata + totals + per-block stats only
22
+ ├── index.html # standalone browser (inlined CSS/JS, no chunks inlined)
23
+ ├── planes/
24
+ │ ├── 0.json # BMP rollup
25
+ │ ├── 2.json # CJK plane rollup
26
+ │ └── ... # 17 files max
27
+ ├── blocks/
28
+ │ ├── Basic_Latin.json # per-block: stats + missing_codepoints (always)
29
+ │ ├── CJK_Unified_Ideographs.json
30
+ │ └── ... # one per touched block
31
+ ├── scripts/
32
+ │ ├── Latin.json # per-script rollup
33
+ │ ├── Han.json
34
+ │ └── ...
35
+ ├── codepoints/ # verbose mode only (--verbose)
36
+ │ ├── Basic_Latin.json # per-block codepoint detail list
37
+ │ ├── CJK_Unified_Ideographs.json
38
+ │ └── ... # chunked per block; each file <1MB even for CJK
39
+ └── glyphs/ # opt-in (--with-glyphs); one SVG per codepoint
40
+ ├── U+0041.svg
41
+ ├── U+4E00.svg
42
+ └── ...
43
+ ```
44
+
45
+ For a TTC collection, sibling faces share the source directory:
46
+
47
+ ```
48
+ output/font_audit/<source_label>/
49
+ ├── index.json # collection-level summary (num_fonts_in_source, etc.)
50
+ ├── index.html # collection browser (lists faces)
51
+ ├── 00-<face_ps_name>/
52
+ │ ├── index.json
53
+ │ └── ... (per-face layout above)
54
+ ├── 01-<face_ps_name>/
55
+ │ └── ...
56
+ └── ...
57
+ ```
58
+
59
+ Filename pattern for collection faces:
60
+ `{font_index:02d}-{safe_filename(postscript_name)}` — same convention
61
+ fontisan uses today. The `00`-prefix guarantees face-order sort and
62
+ disambiguates broken fonts where two faces share a PostScript name.
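The naming pattern can be sketched in Ruby as follows. The spec names `safe_filename` but does not define it, so the character whitelist below is an assumption, and `face_directory_name` is a hypothetical wrapper.

```ruby
# Hypothetical sanitizer: replaces anything outside a conservative whitelist.
# The real safe_filename rule is not given in this document.
def safe_filename(name)
  name.to_s.gsub(/[^A-Za-z0-9._-]/, "_")
end

# Builds the per-face directory name: zero-padded face index, then the
# sanitized PostScript name.
def face_directory_name(font_index, postscript_name)
  format("%02d-%s", font_index, safe_filename(postscript_name))
end
```

Because the index is zero-padded to two digits, a plain lexical sort of the directory names matches face order for collections of up to 100 faces.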
63
+
64
+ ## Block filename encoding
65
+
66
+ Block names use the original Unicode verbatim form (e.g.
67
+ `Greek_And_Coptic`, `CJK_Ext_A`). They contain spaces and underscores
68
+ but never slashes — safe as filenames as-is. **Do not slugify.**
69
+
70
+ Replace only the characters filesystems reject: `/` → `_`. Unicode
71
+ block names contain no `/` today, so this is a defensive no-op.
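A minimal sketch of that rule, assuming a hypothetical helper named `block_filename`. It replaces only `/` and leaves every other character, including spaces and underscores, untouched.

```ruby
# Hypothetical helper: maps a block name to its chunk filename.
# Only "/" is replaced; nothing else is slugified.
def block_filename(block_name)
  "#{block_name.tr('/', '_')}.json"
end
```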
72
+
73
+ ## File contents
74
+
75
+ ### `index.json`
76
+
77
+ Compact face metadata + rollups. Carries everything a renderer needs
78
+ for the initial overview without expanding any block:
79
+
80
+ ```json
81
+ {
82
+ "generated_at": "2026-06-27T12:00:00Z",
83
+ "ucode_version": "0.2.0",
84
+ "font": { ... AuditReport identity + style + coverage-totals ... },
85
+ "baseline": { "unicode_version": "17.0.0", ... },
86
+ "totals": {
87
+ "assigned_codepoints_total": 150000,
88
+ "covered_codepoints_total": 2857,
89
+ "blocks_touched": 24,
90
+ "blocks_complete": 12,
91
+ "blocks_partial": 12,
92
+ "scripts_touched": 5
93
+ },
94
+ "discrepancies": [ ... ],
95
+ "plane_summaries": [ ... ],
96
+ "block_summaries": [
97
+ {
98
+ "name": "Basic Latin",
99
+ "first_cp": 0, "last_cp": 127, "plane": 0,
100
+ "total_assigned": 128, "covered_count": 128,
101
+ "missing_count": 0, "coverage_percent": 100.0,
102
+ "status": "COMPLETE",
103
+ "missing_codepoints": []
104
+ },
105
+ ...
106
+ ],
107
+ "script_summaries": [ ... ]
108
+ }
109
+ ```
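The block-level entries in `totals` can be recomputed from `block_summaries`, which gives a cheap consistency check for fixtures. This is a sketch under one assumption the spec does not spell out: a block counts as "touched" when its `covered_count` is greater than zero. `block_totals` is a hypothetical name.

```ruby
# Hypothetical helper: recomputes the block-level totals from parsed
# block_summaries entries (hashes with string keys, as JSON.parse returns).
def block_totals(block_summaries)
  touched = block_summaries.select { |b| b["covered_count"].positive? }
  {
    "covered_codepoints_total" => touched.sum { |b| b["covered_count"] },
    "blocks_touched" => touched.size,
    "blocks_complete" => touched.count { |b| b["status"] == "COMPLETE" },
    "blocks_partial" => touched.count { |b| b["status"] == "PARTIAL" }
  }
end
```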
110
+
111
+ Per-block `missing_codepoints` is **always** embedded (decision in
112
+ `00-README.md`). Per-block `covered_codepoints` is **never** in
113
+ `index.json` — fetch `codepoints/<NAME>.json` for that.
114
+
115
+ ### `blocks/<NAME>.json`
116
+
117
+ Single `BlockSummary` object (same shape as the entry in
118
+ `block_summaries`) plus optional `codepoints` detail if emitted in
119
+ verbose mode. Carries the full missing list. Cheap to fetch per-block
120
+ on demand.
121
+
122
+ ### `planes/<N>.json` and `scripts/<NAME>.json`
123
+
124
+ Rollup views. Useful for renderers that group by plane or script
125
+ without iterating all blocks.
126
+
127
+ ### `codepoints/<NAME>.json` (verbose only)
128
+
129
+ ```json
130
+ {
131
+ "block_name": "Basic Latin",
132
+ "codepoints": [
133
+ {
134
+ "codepoint": 65,
135
+ "name": "LATIN CAPITAL LETTER A",
136
+ "general_category": "Lu",
137
+ "script": "Latin",
138
+ "block_name": "Basic Latin",
139
+ "age": "1.1",
140
+ "glyph_id": 36,
141
+ "glyph_svg_path": "glyphs/U+0041.svg"
142
+ },
143
+ ...
144
+ ]
145
+ }
146
+ ```
147
+
148
+ Per-block chunking keeps each file under ~1MB even for CJK. The browser
149
+ fetches this only when the user expands a block to see per-character
150
+ detail.
151
+
152
+ ### `glyphs/U+XXXX.svg`
153
+
154
+ Plain SVG file (one glyph outline). The browser fetches individually
155
+ on click. Output via `fontisan` outline reading on the audited font
156
+ (decision: render from audited font, not Code Charts).
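The `U+XXXX` filename follows the usual convention of at least four uppercase hex digits. A small sketch, with hypothetical helper names, that reproduces the paths shown in the layout and in the `glyph_svg_path` example:

```ruby
# Formats a codepoint as "U+XXXX": uppercase hex, zero-padded to four digits,
# longer for supplementary-plane codepoints.
def codepoint_label(codepoint)
  format("U+%04X", codepoint)
end

# Relative path of the glyph SVG under the face directory.
def glyph_svg_path(codepoint)
  "glyphs/#{codepoint_label(codepoint)}.svg"
end
```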
157
+
158
+ ## Library mode
159
+
160
+ ```
161
+ output/font_audit/
162
+ ├── index.json # library summary (font count, totals)
163
+ ├── index.html # library browser (cards of audited fonts)
164
+ ├── <font_label_1>/
165
+ ├── <font_label_2>/
166
+ └── ...
167
+ ```
168
+
169
+ `Ucode::Audit::LibraryAuditor` walks a directory, audits each font into
170
+ its own subdirectory, then emits the library-level index pointing at
171
+ each face's `index.json`.
172
+
173
+ ## Idempotency
174
+
175
+ Every emitted file is content-hash compared (same pattern as
176
+ `Ucode::Repo::AtomicWrites`). Re-running `ucode audit font <path>` on
177
+ an unchanged source leaves existing files untouched. Re-running on a
178
+ changed source rewrites only the affected chunks.
179
+
180
+ Skip-newer check: if a chunk file's mtime is newer than the source
181
+ font's mtime AND the baseline UCD's mtime, skip the rewrite entirely.
182
+ This matches the canonical-dataset writer's idempotency rule from
183
+ `CLAUDE.md`.
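The two idempotency checks can be sketched with the Ruby standard library alone. This is not the `Ucode::Repo::AtomicWrites` implementation, which is not shown here; `skip_newer?` and `write_if_changed` are hypothetical names, and the write-to-temp-then-rename step is an assumption about what "atomic" means in that module.

```ruby
require "digest"
require "fileutils"

# Returns true when the chunk exists and is newer than every input
# (for example the source font and the baseline UCD), so it can be skipped.
def skip_newer?(chunk_path, *input_paths)
  return false unless File.exist?(chunk_path)

  chunk_mtime = File.mtime(chunk_path)
  input_paths.all? { |input| chunk_mtime > File.mtime(input) }
end

# Writes content only when its SHA-256 differs from the file on disk.
# Returns true if a write happened, false if the file was left untouched.
def write_if_changed(path, content)
  if File.exist?(path) &&
     Digest::SHA256.file(path).hexdigest == Digest::SHA256.hexdigest(content)
    return false
  end

  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp"
  File.binwrite(tmp, content) # write beside the target, then swap in place
  File.rename(tmp, path)
  true
end
```

A second run over unchanged input returns `false` from every `write_if_changed` call, which is the "zero file writes" behavior the acceptance list asks for.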
184
+
185
+ ## Acceptance
186
+
187
+ - A `--verbose` audit of a 50k-codepoint CJK font produces an
188
+ `index.json` under 200KB and no per-chunk file over 1MB.
189
+ - A non-verbose audit produces `index.json`, `planes/`, `blocks/`,
190
+ `scripts/` only — no `codepoints/` and no `glyphs/`.
191
+ - A `--with-glyphs` audit additionally produces `glyphs/U+XXXX.svg` per
192
+ covered codepoint.
193
+ - All filenames preserve original block names verbatim.
194
+ - Re-running the same audit twice produces zero file writes on the
195
+ second run.
196
+
197
+ ## References
198
+
199
+ - Schema: `TODO.new/02-audit-schema-design.md`
200
+ - Contract: `TODO.new/04-fontist-org-contract.md`
201
+ - Emitter impl: `TODO.new/13-directory-emitter.md`
202
+ - Browser impl: `TODO.new/14-html-face-browser.md`
203
+ - `Ucode::Repo::AtomicWrites` (existing pattern)
@@ -0,0 +1,173 @@
1
+ # 04 — fontist.org contract
2
+
3
+ ## Goal
4
+
5
+ Pin the exact JSON contract `fontist.org` consumes. Both sides code
6
+ against this doc. Any breaking change to the contract = major version
7
+ bump in ucode + a note here.
8
+
9
+ ## What fontist.org needs
10
+
11
+ A coverage map per font shows, per Unicode block, how many codepoints
12
+ the font covers vs how many are assigned. Renderer requirements:
13
+
14
+ 1. Face identity (name, foundry, version) — for the map's header.
15
+ 2. Per-block coverage stats — for the map's body.
16
+ 3. Per-block missing codepoints — for the "what's missing" drill-down.
17
+ 4. Plane rollup — for the map's overview band.
18
+ 5. Audit provenance — for the "data as of Unicode X.Y, generated at
19
+ <timestamp>" footer.
20
+
21
+ fontist.org does **not** need:
22
+ - Per-codepoint detail lists (verbose `codepoints/<NAME>.json`) — those
23
+ are for ucode's local browser.
24
+ - Per-codepoint glyph SVGs (`glyphs/`) — fontist.org renders its own
25
+ glyphs from its own font copies.
26
+
27
+ ## Endpoint shape
28
+
29
+ fontist.org fetches two URLs per audited font:
30
+
31
+ ### 1. `index.json` — the map
32
+
33
+ Self-contained for the map view. Schema is the `AuditReport` shape
34
+ minus the verbose-only fields, plus per-block `missing_codepoints`
35
+ embedded directly.
36
+
37
+ ```json
38
+ {
39
+ "generated_at": "2026-06-27T12:00:00Z",
40
+ "ucode_version": "0.2.0",
41
+ "baseline": {
42
+ "unicode_version": "17.0.0",
43
+ "ucode_version": "0.2.0",
44
+ "fontisan_version": "0.2.22",
45
+ "source": "ucd-text + Unicode17Blocks overrides",
46
+ "generated_at": "2026-06-27T12:00:00Z"
47
+ },
48
+ "font": {
49
+ "source_file": "Inter-Regular.ttf",
50
+ "source_sha256": "3b1a...",
51
+ "source_format": "ttf",
52
+ "font_index": null,
53
+ "num_fonts_in_source": 1,
54
+ "family_name": "Inter",
55
+ "subfamily_name": "Regular",
56
+ "full_name": "Inter Regular",
57
+ "postscript_name": "Inter-Regular",
58
+ "version": "Version 4.000;git-a52131595",
59
+ "font_revision": 4.0,
60
+ "weight_class": 400,
61
+ "width_class": 5,
62
+ "italic": false,
63
+ "bold": false,
64
+ "panose": "2 0 5 3 0 0 0 0 0 0",
65
+ "total_codepoints": 2857,
66
+ "total_glyphs": 1486,
67
+ "cmap_subtables": [4, 12, 14]
68
+ },
69
+ "totals": {
70
+ "assigned_codepoints_total": 150012,
71
+ "covered_codepoints_total": 2857,
72
+ "blocks_touched": 24,
73
+ "blocks_complete": 12,
74
+ "blocks_partial": 12,
75
+ "scripts_touched": 5,
76
+ "scripts_complete": 0
77
+ },
78
+ "plane_summaries": [
79
+ { "plane": 0, "blocks_total": 18, "assigned_total": 55000,
80
+ "covered_total": 2857, "coverage_percent": 5.19 },
81
+ ...
82
+ ],
83
+ "block_summaries": [
84
+ {
85
+ "name": "Basic Latin",
86
+ "first_cp": 0, "last_cp": 127, "plane": 0,
87
+ "total_assigned": 128, "covered_count": 128,
88
+ "missing_count": 0, "coverage_percent": 100.0,
89
+ "status": "COMPLETE",
90
+ "missing_codepoints": []
91
+ },
92
+ {
93
+ "name": "Greek and Coptic",
94
+ "first_cp": 880, "last_cp": 1023, "plane": 0,
95
+ "total_assigned": 135, "covered_count": 80,
96
+ "missing_count": 55, "coverage_percent": 59.26,
97
+ "status": "PARTIAL",
98
+ "missing_codepoints": [881, 883, 885, ...]
99
+ },
100
+ ...
101
+ ],
102
+ "script_summaries": [
103
+ { "script_code": "Latn", "script_name": "Latin",
103
+ "blocks_total": 4, "assigned_total": 1207,
104
+ "covered_total": 1207, "coverage_percent": 100.0,
106
+ "status": "COMPLETE" },
107
+ ...
108
+ ],
109
+ "discrepancies": [],
110
+ "warning": null
111
+ }
112
+ ```
113
+
114
+ ### 2. `blocks/<NAME>.json` — on-demand block expansion
115
+
116
+ If the renderer offers an "expand this block" interaction, it fetches
117
+ the per-block file. Same shape as the entry in `block_summaries` — but
118
+ already in its own file. Use this when iterating one block at a time
119
+ without re-parsing `index.json`.
120
+
121
+ ```json
122
+ {
123
+ "name": "CJK Unified Ideographs",
124
+ "first_cp": 19968, "last_cp": 40959, "plane": 0,
125
+ "total_assigned": 20992, "covered_count": 20950,
126
+ "missing_count": 42, "coverage_percent": 99.80,
127
+ "status": "PARTIAL",
128
+ "missing_codepoints": [19980, 19982, ...]
129
+ }
130
+ ```
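The examples imply a few arithmetic invariants that a consumer or a fixture spec could check. The sketch below assumes them rather than quoting them from the schema: `missing_count == total_assigned - covered_count`, and `coverage_percent` is the covered share rounded to two decimals. `block_summary_problems` is a hypothetical name.

```ruby
require "json"

# Returns a list of human-readable problems for one parsed block entry;
# an empty list means the entry is internally consistent.
def block_summary_problems(block)
  problems = []
  expected_missing = block["total_assigned"] - block["covered_count"]
  problems << "missing_count mismatch" unless block["missing_count"] == expected_missing
  if block["total_assigned"].positive?
    expected = (block["covered_count"] * 100.0 / block["total_assigned"]).round(2)
    delta = (block["coverage_percent"] - expected).abs
    problems << "coverage_percent mismatch" unless delta < 0.01
  end
  problems << "first_cp after last_cp" if block["first_cp"] > block["last_cp"]
  problems
end
```

Run against the CJK Unified Ideographs example above (20950 of 20992 covered, 42 missing, 99.80 percent), the check reports no problems.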
131
+
132
+ ## What fontist.org fetches but ignores
133
+
134
+ These sections are in `index.json` but fontist.org does not render them.
135
+ They exist for ucode's local browser and for archival consumers:
136
+
137
+ - `font.italic`, `font.bold`, `font.panose`, `font.weight_class`,
138
+ `font.width_class` — style metadata.
139
+ - `font.cmap_subtables` — internal parser provenance.
140
+ - `font.total_glyphs` — distinct from `total_codepoints`.
141
+ - `licensing`, `metrics`, `hinting`, `color_capabilities`, `variation`,
142
+ `opentype_layout` — full archival record fields. fontist.org may
143
+ surface these in a "details" tab; default behavior is to ignore.
144
+
145
+ ## Backwards-compatibility rules
146
+
147
+ - **Field additions**: minor ucode version bump, no fontist.org change
148
+ required. Renderer ignores unknown fields.
149
+ - **Field removals or renames**: major ucode version bump. Document in
150
+ this file with a "Migrating from X to Y" section. fontist.org must
151
+ update in lockstep.
152
+ - **Status enum expansion** (e.g. adding a new value to
153
+ `block_summaries[].status`): minor bump. Renderer treats unknown
154
+ status as `PARTIAL`.
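The unknown-status rule amounts to a lookup with a default on the consumer side. A minimal sketch, written in Ruby for consistency with the rest of this document (the language of the fontist.org renderer is not stated here); `KNOWN_BLOCK_STATUSES` and `render_status` are hypothetical names.

```ruby
# Status values documented in the schema; anything else is treated as PARTIAL.
KNOWN_BLOCK_STATUSES = %w[
  COMPLETE PARTIAL UNCOVERED_ASSIGNED NO_ASSIGNED_IN_BLOCK OUTSIDE_BASELINE
].freeze

# Returns the status to render, falling back to PARTIAL for unknown values
# so that a newer ucode release does not break an older renderer.
def render_status(status)
  KNOWN_BLOCK_STATUSES.include?(status) ? status : "PARTIAL"
end
```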
155
+
156
+ ## Acceptance
157
+
158
+ - A fontist.org fetch of `index.json` is sufficient to render a
159
+ coverage map. No secondary fetch needed for the initial view.
160
+ - A fontist.org fetch of `blocks/<NAME>.json` is sufficient to render
161
+ the per-block drill-down view.
162
+ - Total payload for the initial view is under 500KB for fonts up to
163
+ ~30k codepoints; under 200KB for typical Latin-only fonts.
164
+ - The contract is independently testable: a fixture `index.json` under
165
+ `spec/fixtures/audit/` exercises every documented field.
166
+
167
+ ## References
168
+
169
+ - Schema source: `TODO.new/02-audit-schema-design.md`
170
+ - Layout source: `TODO.new/03-directory-output-spec.md`
171
+ - fontist.org repo: `/Users/mulgogi/src/fontist/fontist.org` (consumer)
172
+ - Existing audit text renderer (fontisan):
173
+ `fontisan/lib/fontisan/formatters/audit_text_renderer.rb`