ucode 0.1.0
- checksums.yaml +7 -0
- data/CLAUDE.md +211 -0
- data/Gemfile +22 -0
- data/Gemfile.lock +406 -0
- data/README.md +469 -0
- data/Rakefile +18 -0
- data/TODO.new/00-README.md +66 -0
- data/TODO.new/01-pillar-terminology-alignment.md +69 -0
- data/TODO.new/02-audit-schema-design.md +255 -0
- data/TODO.new/03-directory-output-spec.md +203 -0
- data/TODO.new/04-fontist-org-contract.md +173 -0
- data/TODO.new/05-baseline-unicode17-coverage-audit.md +144 -0
- data/TODO.new/06-audit-namespace-skeleton.md +105 -0
- data/TODO.new/07-audit-models-port.md +132 -0
- data/TODO.new/08-extractors-cheap-port.md +113 -0
- data/TODO.new/09-extractors-expensive-port.md +99 -0
- data/TODO.new/10-aggregations-ucd-rewrite.md +168 -0
- data/TODO.new/11-differ-and-library-auditor-port.md +102 -0
- data/TODO.new/12-formatters-port.md +115 -0
- data/TODO.new/13-directory-emitter.md +147 -0
- data/TODO.new/14-html-face-browser.md +144 -0
- data/TODO.new/15-html-library-browser.md +102 -0
- data/TODO.new/16-cli-audit-subcommands.md +142 -0
- data/TODO.new/17-fontisan-cleanup-audit.md +147 -0
- data/TODO.new/18-fontisan-cleanup-ucd.md +156 -0
- data/TODO.new/19-fontisan-docs-update.md +155 -0
- data/TODO.new/20-canonical-resolver-4-tier.md +182 -0
- data/TODO.new/21-canonical-unicode17-build.md +148 -0
- data/TODO.new/22-implementation-order.md +176 -0
- data/UCODE_CHANGELOG.md +97 -0
- data/exe/ucode +8 -0
- data/lib/ucode/aggregator.rb +77 -0
- data/lib/ucode/audit/block_aggregator.rb +90 -0
- data/lib/ucode/audit/codepoint_range_coalescer.rb +42 -0
- data/lib/ucode/audit/context.rb +137 -0
- data/lib/ucode/audit/discrepancy_detector.rb +213 -0
- data/lib/ucode/audit/extractors/aggregations.rb +70 -0
- data/lib/ucode/audit/extractors/base.rb +21 -0
- data/lib/ucode/audit/extractors/color_capabilities.rb +143 -0
- data/lib/ucode/audit/extractors/coverage.rb +55 -0
- data/lib/ucode/audit/extractors/hinting.rb +199 -0
- data/lib/ucode/audit/extractors/identity.rb +65 -0
- data/lib/ucode/audit/extractors/licensing.rb +75 -0
- data/lib/ucode/audit/extractors/metrics.rb +108 -0
- data/lib/ucode/audit/extractors/opentype_layout.rb +71 -0
- data/lib/ucode/audit/extractors/provenance.rb +34 -0
- data/lib/ucode/audit/extractors/style.rb +88 -0
- data/lib/ucode/audit/extractors/variation_detail.rb +101 -0
- data/lib/ucode/audit/extractors.rb +31 -0
- data/lib/ucode/audit/plane_aggregator.rb +37 -0
- data/lib/ucode/audit/registry.rb +63 -0
- data/lib/ucode/audit/script_aggregator.rb +92 -0
- data/lib/ucode/audit.rb +27 -0
- data/lib/ucode/cache.rb +113 -0
- data/lib/ucode/cli.rb +272 -0
- data/lib/ucode/commands/build.rb +68 -0
- data/lib/ucode/commands/cache.rb +46 -0
- data/lib/ucode/commands/fetch.rb +62 -0
- data/lib/ucode/commands/font_coverage.rb +57 -0
- data/lib/ucode/commands/glyphs.rb +136 -0
- data/lib/ucode/commands/lookup.rb +65 -0
- data/lib/ucode/commands/parse.rb +62 -0
- data/lib/ucode/commands/site.rb +33 -0
- data/lib/ucode/commands.rb +19 -0
- data/lib/ucode/config.rb +110 -0
- data/lib/ucode/coordinator/indices.rb +34 -0
- data/lib/ucode/coordinator.rb +397 -0
- data/lib/ucode/database.rb +214 -0
- data/lib/ucode/db_builder.rb +107 -0
- data/lib/ucode/error.rb +96 -0
- data/lib/ucode/fetch/code_charts.rb +57 -0
- data/lib/ucode/fetch/http.rb +83 -0
- data/lib/ucode/fetch/ucd_zip.rb +57 -0
- data/lib/ucode/fetch/unihan_zip.rb +57 -0
- data/lib/ucode/fetch.rb +14 -0
- data/lib/ucode/glyphs/cell_extractor.rb +130 -0
- data/lib/ucode/glyphs/dvisvgm_renderer.rb +29 -0
- data/lib/ucode/glyphs/embedded_fonts/catalog.rb +372 -0
- data/lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb +228 -0
- data/lib/ucode/glyphs/embedded_fonts/font_entry.rb +126 -0
- data/lib/ucode/glyphs/embedded_fonts/renderer.rb +47 -0
- data/lib/ucode/glyphs/embedded_fonts/source.rb +94 -0
- data/lib/ucode/glyphs/embedded_fonts/svg.rb +123 -0
- data/lib/ucode/glyphs/embedded_fonts/tounicode.rb +103 -0
- data/lib/ucode/glyphs/embedded_fonts/writer.rb +76 -0
- data/lib/ucode/glyphs/embedded_fonts.rb +50 -0
- data/lib/ucode/glyphs/grid.rb +30 -0
- data/lib/ucode/glyphs/grid_detector.rb +165 -0
- data/lib/ucode/glyphs/last_resort/cmap_index.rb +96 -0
- data/lib/ucode/glyphs/last_resort/contents.rb +74 -0
- data/lib/ucode/glyphs/last_resort/glif.rb +124 -0
- data/lib/ucode/glyphs/last_resort/renderer.rb +67 -0
- data/lib/ucode/glyphs/last_resort/source.rb +125 -0
- data/lib/ucode/glyphs/last_resort/svg.rb +247 -0
- data/lib/ucode/glyphs/last_resort/writer.rb +83 -0
- data/lib/ucode/glyphs/last_resort.rb +36 -0
- data/lib/ucode/glyphs/monolith_page_map.rb +181 -0
- data/lib/ucode/glyphs/mutool_renderer.rb +28 -0
- data/lib/ucode/glyphs/page_renderer.rb +221 -0
- data/lib/ucode/glyphs/path_bbox.rb +62 -0
- data/lib/ucode/glyphs/pdf2svg_renderer.rb +26 -0
- data/lib/ucode/glyphs/pdf_fetcher.rb +102 -0
- data/lib/ucode/glyphs/pdftocairo_renderer.rb +32 -0
- data/lib/ucode/glyphs/real_fonts/block_coverage.rb +45 -0
- data/lib/ucode/glyphs/real_fonts/coverage_auditor.rb +117 -0
- data/lib/ucode/glyphs/real_fonts/font_coverage_report.rb +45 -0
- data/lib/ucode/glyphs/real_fonts/font_locator.rb +95 -0
- data/lib/ucode/glyphs/real_fonts/unicode_17_blocks.rb +104 -0
- data/lib/ucode/glyphs/real_fonts/writer.rb +50 -0
- data/lib/ucode/glyphs/real_fonts.rb +32 -0
- data/lib/ucode/glyphs/writer.rb +250 -0
- data/lib/ucode/glyphs.rb +27 -0
- data/lib/ucode/index.rb +106 -0
- data/lib/ucode/index_builder.rb +94 -0
- data/lib/ucode/models/audit/audit_axis.rb +30 -0
- data/lib/ucode/models/audit/audit_diff.rb +77 -0
- data/lib/ucode/models/audit/audit_report.rb +137 -0
- data/lib/ucode/models/audit/baseline.rb +32 -0
- data/lib/ucode/models/audit/block_summary.rb +72 -0
- data/lib/ucode/models/audit/codepoint_detail.rb +45 -0
- data/lib/ucode/models/audit/codepoint_range.rb +39 -0
- data/lib/ucode/models/audit/codepoint_set_diff.rb +34 -0
- data/lib/ucode/models/audit/color_capabilities.rb +91 -0
- data/lib/ucode/models/audit/discrepancy.rb +38 -0
- data/lib/ucode/models/audit/duplicate_group.rb +23 -0
- data/lib/ucode/models/audit/embedding_type.rb +81 -0
- data/lib/ucode/models/audit/field_change.rb +28 -0
- data/lib/ucode/models/audit/fs_selection_flags.rb +65 -0
- data/lib/ucode/models/audit/gasp_range.rb +63 -0
- data/lib/ucode/models/audit/hinting.rb +99 -0
- data/lib/ucode/models/audit/library_summary.rb +40 -0
- data/lib/ucode/models/audit/licensing.rb +48 -0
- data/lib/ucode/models/audit/metrics.rb +111 -0
- data/lib/ucode/models/audit/named_instance.rb +41 -0
- data/lib/ucode/models/audit/opentype_layout.rb +38 -0
- data/lib/ucode/models/audit/plane_summary.rb +31 -0
- data/lib/ucode/models/audit/script_coverage_row.rb +26 -0
- data/lib/ucode/models/audit/script_features.rb +28 -0
- data/lib/ucode/models/audit/script_summary.rb +54 -0
- data/lib/ucode/models/audit/variation_detail.rb +42 -0
- data/lib/ucode/models/audit.rb +50 -0
- data/lib/ucode/models/bidi_bracket_pair.rb +20 -0
- data/lib/ucode/models/bidi_mirroring.rb +19 -0
- data/lib/ucode/models/binary_property_assignment.rb +26 -0
- data/lib/ucode/models/block.rb +36 -0
- data/lib/ucode/models/case_folding_rule.rb +23 -0
- data/lib/ucode/models/cjk_radical.rb +23 -0
- data/lib/ucode/models/codepoint/bidi.rb +28 -0
- data/lib/ucode/models/codepoint/break_segmentation.rb +22 -0
- data/lib/ucode/models/codepoint/case_folding.rb +25 -0
- data/lib/ucode/models/codepoint/casing.rb +32 -0
- data/lib/ucode/models/codepoint/decomposition.rb +27 -0
- data/lib/ucode/models/codepoint/display.rb +24 -0
- data/lib/ucode/models/codepoint/emoji.rb +29 -0
- data/lib/ucode/models/codepoint/hangul.rb +20 -0
- data/lib/ucode/models/codepoint/identifier.rb +30 -0
- data/lib/ucode/models/codepoint/indic.rb +20 -0
- data/lib/ucode/models/codepoint/joining.rb +20 -0
- data/lib/ucode/models/codepoint/normalization.rb +35 -0
- data/lib/ucode/models/codepoint/numeric_value.rb +35 -0
- data/lib/ucode/models/codepoint.rb +122 -0
- data/lib/ucode/models/name_alias.rb +21 -0
- data/lib/ucode/models/named_sequence.rb +19 -0
- data/lib/ucode/models/names_list_entry.rb +38 -0
- data/lib/ucode/models/plane.rb +36 -0
- data/lib/ucode/models/property_alias.rb +24 -0
- data/lib/ucode/models/property_value_alias.rb +26 -0
- data/lib/ucode/models/relationship/compat_equiv.rb +18 -0
- data/lib/ucode/models/relationship/cross_reference.rb +17 -0
- data/lib/ucode/models/relationship/footnote.rb +24 -0
- data/lib/ucode/models/relationship/informal_alias.rb +18 -0
- data/lib/ucode/models/relationship/sample_sequence.rb +24 -0
- data/lib/ucode/models/relationship/variation_sequence.rb +19 -0
- data/lib/ucode/models/relationship.rb +57 -0
- data/lib/ucode/models/script.rb +41 -0
- data/lib/ucode/models/special_casing_rule.rb +28 -0
- data/lib/ucode/models/standardized_variant.rb +24 -0
- data/lib/ucode/models/unihan_entry.rb +23 -0
- data/lib/ucode/models.rb +47 -0
- data/lib/ucode/parsers/auxiliary.rb +26 -0
- data/lib/ucode/parsers/base.rb +137 -0
- data/lib/ucode/parsers/bidi_brackets.rb +41 -0
- data/lib/ucode/parsers/bidi_mirroring.rb +37 -0
- data/lib/ucode/parsers/blocks.rb +63 -0
- data/lib/ucode/parsers/case_folding.rb +53 -0
- data/lib/ucode/parsers/cjk_radicals.rb +102 -0
- data/lib/ucode/parsers/derived_age.rb +59 -0
- data/lib/ucode/parsers/derived_core_properties.rb +60 -0
- data/lib/ucode/parsers/extracted_properties.rb +74 -0
- data/lib/ucode/parsers/name_aliases.rb +44 -0
- data/lib/ucode/parsers/named_sequences.rb +51 -0
- data/lib/ucode/parsers/names_list.rb +250 -0
- data/lib/ucode/parsers/property_aliases.rb +41 -0
- data/lib/ucode/parsers/property_value_aliases.rb +46 -0
- data/lib/ucode/parsers/script_extensions.rb +64 -0
- data/lib/ucode/parsers/scripts.rb +60 -0
- data/lib/ucode/parsers/special_casing.rb +62 -0
- data/lib/ucode/parsers/standardized_variants.rb +56 -0
- data/lib/ucode/parsers/unicode_data/hangul_name.rb +73 -0
- data/lib/ucode/parsers/unicode_data.rb +268 -0
- data/lib/ucode/parsers/unihan.rb +125 -0
- data/lib/ucode/parsers.rb +35 -0
- data/lib/ucode/range_entry.rb +58 -0
- data/lib/ucode/repo/aggregate_writer.rb +364 -0
- data/lib/ucode/repo/atomic_writes.rb +48 -0
- data/lib/ucode/repo/codepoint_writer.rb +96 -0
- data/lib/ucode/repo/paths.rb +122 -0
- data/lib/ucode/repo.rb +22 -0
- data/lib/ucode/site/config_emitter.rb +124 -0
- data/lib/ucode/site/generator.rb +178 -0
- data/lib/ucode/site/search_index.rb +68 -0
- data/lib/ucode/site/template/.gitignore +4 -0
- data/lib/ucode/site/template/.vitepress/config.ts +8 -0
- data/lib/ucode/site/template/.vitepress/theme/index.js +20 -0
- data/lib/ucode/site/template/char/[codepoint].md +13 -0
- data/lib/ucode/site/template/components/BlockView.vue +57 -0
- data/lib/ucode/site/template/components/CharView.vue +85 -0
- data/lib/ucode/site/template/components/PlaneView.vue +56 -0
- data/lib/ucode/site/template/components/SearchView.vue +66 -0
- data/lib/ucode/site/template/index.md +25 -0
- data/lib/ucode/site/template/package.json +18 -0
- data/lib/ucode/site/template/search.md +9 -0
- data/lib/ucode/site.rb +13 -0
- data/lib/ucode/version.rb +5 -0
- data/lib/ucode/version_resolver.rb +76 -0
- data/lib/ucode.rb +74 -0
- data/ucode.gemspec +56 -0
- metadata +404 -0
|
@@ -0,0 +1,168 @@
|
|
|
1
|
+
# 10 — Aggregations rewrite on ucode UCD
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
The aggregations extractor is the only one that does **not** port
|
|
6
|
+
mechanically from fontisan. Fontisan's version reads `ucd.all.flat.zip`
|
|
7
|
+
via UCDXML. ucode's version uses ucode's own parsed UCD text files
|
|
8
|
+
(UnicodeData.txt, Blocks.txt, Scripts.txt, ScriptExtensions.txt) —
|
|
9
|
+
which is the whole reason for the migration.
|
|
10
|
+
|
|
11
|
+
This is also the extractor that produces the Unicode-coverage output
|
|
12
|
+
fontist.org consumes, so the schema (`blocks`, `scripts`,
|
|
13
|
+
`plane_summaries`, `discrepancies`) must match `02-audit-schema-design.md`
|
|
14
|
+
exactly.
|
|
15
|
+
|
|
16
|
+
## Files to create
|
|
17
|
+
|
|
18
|
+
- `lib/ucode/audit/extractors/aggregations.rb` — main extractor.
|
|
19
|
+
- `lib/ucode/audit/block_aggregator.rb` — given codepoints + ucode
|
|
20
|
+
baseline, produce `BlockSummary[]`.
|
|
21
|
+
- `lib/ucode/audit/script_aggregator.rb` — given codepoints + ucode
|
|
22
|
+
baseline, produce `ScriptSummary[]`.
|
|
23
|
+
- `lib/ucode/audit/plane_aggregator.rb` — roll up block summaries into
|
|
24
|
+
`PlaneSummary[]`.
|
|
25
|
+
- `lib/ucode/audit/discrepancy_detector.rb` — produce `Discrepancy[]`
|
|
26
|
+
from font OS/2 ulUnicodeRange bits vs cmap codepoints.
|
|
27
|
+
- Plus update `lib/ucode/audit/registry.rb` to add `Aggregations`
|
|
28
|
+
to `ORDERED_EXTRACTORS` (last entry; not in `BRIEF_EXTRACTORS`).
|
|
29
|
+
|
|
30
|
+
Specs:
|
|
31
|
+
- `spec/ucode/audit/extractors/aggregations_spec.rb`
|
|
32
|
+
- `spec/ucode/audit/block_aggregator_spec.rb`
|
|
33
|
+
- `spec/ucode/audit/script_aggregator_spec.rb`
|
|
34
|
+
- `spec/ucode/audit/plane_aggregator_spec.rb`
|
|
35
|
+
- `spec/ucode/audit/discrepancy_detector_spec.rb`
|
|
36
|
+
|
|
37
|
+
## What to use from ucode
|
|
38
|
+
|
|
39
|
+
ucode already provides (see `docs/FONTISAN_MIGRATION.md` API list):
|
|
40
|
+
|
|
41
|
+
- `Ucode::Database.open(version)` / `.cached?(version)` — SQLite-backed
|
|
42
|
+
lookup.
|
|
43
|
+
- `Ucode::Database#lookup_block(cp)` → block name (RangeEntry).
|
|
44
|
+
- `Ucode::Database#lookup_script(cp)` → script name.
|
|
45
|
+
- `Ucode::Database#each_block_overlapping(first, last)` — for block
|
|
46
|
+
range queries.
|
|
47
|
+
- `Ucode::Database#block_entries` → all `(first, last, name)` triples.
|
|
48
|
+
- `Ucode::Database#script_entries` → ditto for scripts.
|
|
49
|
+
- `Ucode::Aggregator.aggregate_blocks(codepoints, blocks_index)` —
|
|
50
|
+
existing helper, may need extension.
|
|
51
|
+
- `Ucode::Aggregator.aggregate_scripts(codepoints, scripts_index)` —
|
|
52
|
+
existing helper.
|
|
53
|
+
|
|
54
|
+
**Use these.** Do not re-implement UCD parsing in the audit namespace.
|
|
55
|
+
The aggregation logic IS new (it produces `BlockSummary` shapes with
|
|
56
|
+
status/missing/etc.), but the underlying UCD lookup is ucode's
|
|
57
|
+
existing API.
|
|
58
|
+
|
|
59
|
+
## Algorithm — BlockAggregator
|
|
60
|
+
|
|
61
|
+
Input: `codepoints` (sorted `Integer[]`, from Coverage extractor) +
|
|
62
|
+
`baseline` (the `Ucode::Database` for the target version).
|
|
63
|
+
|
|
64
|
+
Output: `Ucode::Models::Audit::BlockSummary[]` (one per touched block).
|
|
65
|
+
|
|
66
|
+
```
|
|
67
|
+
1. For each codepoint in the font:
|
|
68
|
+
- block_name = baseline.lookup_block(cp)
|
|
69
|
+
- tally[block_name] << cp
|
|
70
|
+
- track touched_blocks set
|
|
71
|
+
2. For each touched block:
|
|
72
|
+
- first_cp, last_cp = baseline.block_range(block_name)
|
|
73
|
+
- plane = first_cp >> 16
|
|
74
|
+
- total_assigned = count of codepoints in [first_cp, last_cp]
|
|
75
|
+
where baseline says "assigned" (not reserved/unassigned).
|
|
76
|
+
For Unicode 17 new blocks, use the curated Unicode17Blocks
|
|
77
|
+
table (handles reserved gaps like Beria Erfe U+16EB9-U+16EBA).
|
|
78
|
+
- covered_count = tally[block_name].size
|
|
79
|
+
- missing_codepoints = assigned_set - tally[block_name]
|
|
80
|
+
- status = pick from enum per 02-audit-schema-design.md
|
|
81
|
+
3. Return BlockSummary[] sorted by first_cp.
|
|
82
|
+
```
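The steps above can be sketched in Ruby roughly as follows. This is a minimal sketch under stated assumptions, not the shipped implementation: `SketchBaseline` (with `block_range` and `assigned_in`) and `BlockRow` are hypothetical stand-ins for `Ucode::Database` and `BlockSummary`, and the status values assume only `COMPLETE` and `PARTIAL` are needed.

```ruby
# Hypothetical stand-ins for Ucode::Database and BlockSummary.
BlockRow = Struct.new(:name, :first_cp, :last_cp, :plane, :total_assigned,
                      :covered_count, :missing_codepoints, :status,
                      keyword_init: true)

class SketchBaseline
  # blocks: { name => [first_cp, last_cp] }, assigned: Integer[]
  def initialize(blocks:, assigned:)
    @blocks = blocks
    @assigned = assigned.sort
  end

  def lookup_block(cp)
    found = @blocks.find { |_name, (first, last)| cp.between?(first, last) }
    found&.first
  end

  def block_range(name)
    @blocks.fetch(name)
  end

  def assigned_in(first, last)
    @assigned.select { |cp| cp.between?(first, last) }
  end
end

def aggregate_blocks(codepoints, baseline)
  # Step 1: tally the font's codepoints by block name.
  tally = Hash.new { |hash, key| hash[key] = [] }
  codepoints.each do |cp|
    name = baseline.lookup_block(cp)
    tally[name] << cp if name # codepoints outside any block are ignored here
  end

  # Step 2: build one row per touched block.
  rows = tally.map do |name, covered|
    first_cp, last_cp = baseline.block_range(name)
    assigned = baseline.assigned_in(first_cp, last_cp)
    missing = assigned - covered
    BlockRow.new(
      name: name, first_cp: first_cp, last_cp: last_cp,
      plane: first_cp >> 16,
      total_assigned: assigned.size,
      covered_count: covered.size,
      missing_codepoints: missing,
      status: missing.empty? ? "COMPLETE" : "PARTIAL" # assumed enum values
    )
  end

  # Step 3: sort by the block's first codepoint.
  rows.sort_by(&:first_cp)
end
```

Keeping the "assigned" lookup behind the baseline object mirrors the rule that the aggregator must not duplicate that logic.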
|
|
83
|
+
|
|
84
|
+
The "is this codepoint assigned?" check is the subtle bit. ucode's
|
|
85
|
+
baseline knows via UnicodeData.txt entries (a codepoint is assigned
|
|
86
|
+
iff it has its own entry or lies within a `<…, First>`..`<…, Last>` range pair). For blocks where
|
|
87
|
+
the curated `Unicode17Blocks` overrides apply, use those (Beria Erfe
|
|
88
|
+
reserved gap, etc.). This logic lives in `Ucode::Database` or a new
|
|
89
|
+
helper; do not duplicate it in the aggregator.
|
|
90
|
+
|
|
91
|
+
## Algorithm — ScriptAggregator
|
|
92
|
+
|
|
93
|
+
Same shape but keyed on `lookup_script(cp)`. Note: ScriptExtensions
|
|
94
|
+
means a codepoint can have multiple scripts. Use `ScriptExtensions.txt`
|
|
95
|
+
to expand — a codepoint in `ScriptExtensions: Latn;Grek` contributes
|
|
96
|
+
to both `Latn` and `Grek` tallies.
|
|
97
|
+
|
|
98
|
+
Output: `ScriptSummary[]` (one per touched script).
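A minimal sketch of the expansion step, assuming two hypothetical in-memory maps (`primary` for Scripts.txt, `extensions` for ScriptExtensions.txt) rather than the real `Ucode::Database` API. A codepoint listed in the extensions map contributes to every script named there; otherwise it falls back to its single primary script.

```ruby
# Hypothetical inputs:
#   primary    - { codepoint => script code } (from Scripts.txt)
#   extensions - { codepoint => [script codes] } (from ScriptExtensions.txt)
def tally_scripts(codepoints, primary:, extensions:)
  tally = Hash.new { |hash, key| hash[key] = [] }
  codepoints.each do |cp|
    # Script_Extensions, when present, decides the tallies; otherwise
    # the codepoint counts only toward its primary script.
    scripts = extensions.fetch(cp) { [primary[cp]].compact }
    scripts.each { |code| tally[code] << cp }
  end
  tally
end
```

The per-script tally can then feed the same covered/missing computation used for blocks.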
|
|
99
|
+
|
|
100
|
+
## Algorithm — PlaneAggregator
|
|
101
|
+
|
|
102
|
+
Roll up `BlockSummary[]` by `plane`. Straightforward sum.
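As a sketch of the rollup, assuming block rows are plain hashes with `:plane`, `:total_assigned`, and `:covered_count` keys (the real input is `BlockSummary[]`, and `PlaneRow` is a hypothetical stand-in for `PlaneSummary`):

```ruby
# Hypothetical stand-in for PlaneSummary.
PlaneRow = Struct.new(:plane, :block_count, :total_assigned, :covered_count,
                      keyword_init: true)

def roll_up_planes(block_rows)
  block_rows
    .group_by { |row| row[:plane] }
    .sort_by { |plane, _rows| plane }
    .map do |plane, rows|
      PlaneRow.new(
        plane: plane,
        block_count: rows.size,
        total_assigned: rows.sum { |r| r[:total_assigned] },
        covered_count: rows.sum { |r| r[:covered_count] }
      )
    end
end
```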
|
|
103
|
+
|
|
104
|
+
## Algorithm — DiscrepancyDetector
|
|
105
|
+
|
|
106
|
+
Read `font.table("OS/2").ul_unicode_range1..4` (4 × 32-bit = 128 bits).
|
|
107
|
+
Each bit corresponds to a Unicode range (per OpenType spec, "Unicode
|
|
108
|
+
Range Bits" table). For each set bit, look up the corresponding
|
|
109
|
+
codepoint range; if the cmap has zero codepoints in that range, emit
|
|
110
|
+
a `Discrepancy` of kind
|
|
111
|
+
`"os2_unicode_range_bit_without_cmap_codepoints"`.
|
|
112
|
+
|
|
113
|
+
Also detect the inverse: cmap codepoints in a range the OS/2 bits
|
|
114
|
+
don't claim. Less critical; emit as
|
|
115
|
+
`"cmap_codepoints_outside_os2_unicode_range"`.
|
|
116
|
+
|
|
117
|
+
Map of OS/2 ulUnicodeRange bit → range lives in OpenType spec. Embed
|
|
118
|
+
as a constant table in the detector.
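A rough sketch of both checks, assuming the four `ulUnicodeRange` words arrive as an array of integers and the cmap as an array of codepoints. `OS2_RANGE_BITS_SAMPLE` carries only three entries from the OpenType "Unicode Range Bits" table for illustration; the method name and the returned hash shape are hypothetical, not the `Discrepancy` model.

```ruby
# Three sample entries only; the real constant would embed the full table.
OS2_RANGE_BITS_SAMPLE = {
  0 => ["Basic Latin", [0x0000..0x007F]],
  1 => ["Latin-1 Supplement", [0x0080..0x00FF]],
  7 => ["Greek and Coptic", [0x0370..0x03FF]]
}.freeze

# range_words: [ulUnicodeRange1, ulUnicodeRange2, ulUnicodeRange3, ulUnicodeRange4]
def os2_discrepancies(range_words, codepoints, table: OS2_RANGE_BITS_SAMPLE)
  table.flat_map do |bit, (label, ranges)|
    claimed = range_words[bit / 32][bit % 32] == 1
    present = codepoints.any? { |cp| ranges.any? { |r| r.cover?(cp) } }

    if claimed && !present
      [{ kind: "os2_unicode_range_bit_without_cmap_codepoints", bit: bit, label: label }]
    elsif !claimed && present
      [{ kind: "cmap_codepoints_outside_os2_unicode_range", bit: bit, label: label }]
    else
      []
    end
  end
end
```

Storing a list of ranges per bit leaves room for bits that the specification maps to more than one range.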
|
|
119
|
+
|
|
120
|
+
## Output schema
|
|
121
|
+
|
|
122
|
+
The `Aggregations` extractor returns a hash:
|
|
123
|
+
|
|
124
|
+
```ruby
|
|
125
|
+
{
|
|
126
|
+
baseline: Baseline.new(unicode_version: ..., ...),
|
|
127
|
+
blocks: [...],
|
|
128
|
+
scripts: [...],
|
|
129
|
+
plane_summaries: [...],
|
|
130
|
+
discrepancies: [...],
|
|
131
|
+
}
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
The `AuditReport` constructor merges this into the right attributes.
|
|
135
|
+
|
|
136
|
+
## Acceptance
|
|
137
|
+
|
|
138
|
+
- Aggregations extractor produces a non-empty `blocks` array for any
|
|
139
|
+
font with at least one assigned codepoint.
|
|
140
|
+
- For a fixture font with known coverage (e.g. Noto Sans Sidetic
|
|
141
|
+
covering all 26 Sidetic codepoints), the audit reports:
|
|
142
|
+
- `block_summaries` entry for Sidetic with `status: "COMPLETE"`,
|
|
143
|
+
`covered_count: 26`, `missing_codepoints: []`.
|
|
144
|
+
- For a partial-coverage fixture (e.g. Inter covering 80/135 Greek):
|
|
145
|
+
- `block_summaries` entry with `status: "PARTIAL"`,
|
|
146
|
+
`missing_count: 55`, `missing_codepoints: [881, 883, ...]`.
|
|
147
|
+
- Plane rollups correctly sum multi-block planes (BMP has ~200
|
|
148
|
+
blocks; rollup counts all).
|
|
149
|
+
- Discrepancies detect a deliberately-broken fixture (OS/2 bit set
|
|
150
|
+
but cmap empty in that range).
|
|
151
|
+
- The `Baseline` struct reports `unicode_version`, `ucode_version`,
|
|
152
|
+
`source: "ucd-text + Unicode17Blocks overrides"`.
|
|
153
|
+
- All specs use real `Ucode::Database` instances (built from fixture
|
|
154
|
+
UCD slices under `spec/fixtures/ucd/`).
|
|
155
|
+
- No `double()`.
|
|
156
|
+
- Rubocop clean.
|
|
157
|
+
|
|
158
|
+
## References
|
|
159
|
+
|
|
160
|
+
- Schema: `TODO.new/02-audit-schema-design.md`
|
|
161
|
+
- ucode UCD API: `docs/FONTISAN_MIGRATION.md` §"Coordinator + Indices"
|
|
162
|
+
- Existing helpers: `lib/ucode/aggregator.rb`, `lib/ucode/database.rb`
|
|
163
|
+
- Curated overrides: `lib/ucode/glyphs/real_fonts/unicode_17_blocks.rb`
|
|
164
|
+
(move to `lib/ucode/ucd/unicode_17_overrides.rb` as part of this TODO
|
|
165
|
+
if it makes the dependency cleaner)
|
|
166
|
+
- Source being replaced:
|
|
167
|
+
`fontisan/lib/fontisan/audit/extractors/aggregations.rb` (reference
|
|
168
|
+
for the field shape, but the implementation is replaced)
|
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
# 11 — Differ + library auditor port
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Port fontisan's `Differ` (diffs two `AuditReport`s) and `LibraryAuditor`
|
|
6
|
+
/ `LibraryAggregator` (audits a directory of fonts and rolls up).
|
|
7
|
+
These are higher-level orchestration on top of the per-face audit
|
|
8
|
+
covered by TODOs 06-10.
|
|
9
|
+
|
|
10
|
+
## Files to create
|
|
11
|
+
|
|
12
|
+
- `lib/ucode/audit/differ.rb` — port from fontisan.
|
|
13
|
+
- `lib/ucode/audit/library_auditor.rb` — port from fontisan.
|
|
14
|
+
- `lib/ucode/audit/library_aggregator.rb` — port from fontisan.
|
|
15
|
+
- `lib/ucode/audit/codepoint_range_coalescer.rb` — port from fontisan
|
|
16
|
+
(already partially ported in TODO 08 if Coverage needs it; if not,
|
|
17
|
+
port here as a Differ dependency).
|
|
18
|
+
- Specs for each.
|
|
19
|
+
|
|
20
|
+
## Port from fontisan
|
|
21
|
+
|
|
22
|
+
- `fontisan/lib/fontisan/audit/differ.rb`
|
|
23
|
+
- `fontisan/lib/fontisan/audit/library_auditor.rb`
|
|
24
|
+
- `fontisan/lib/fontisan/audit/library_aggregator.rb`
|
|
25
|
+
- `fontisan/lib/fontisan/audit/codepoint_range_coalescer.rb`
|
|
26
|
+
- `fontisan/lib/fontisan/models/audit/audit_diff.rb` (already ported in
|
|
27
|
+
TODO 07; consumed by `Differ`)
|
|
28
|
+
- `fontisan/lib/fontisan/models/audit/codepoint_set_diff.rb` (ditto)
|
|
29
|
+
- `fontisan/lib/fontisan/models/audit/field_change.rb` (ditto)
|
|
30
|
+
- `fontisan/lib/fontisan/models/audit/duplicate_group.rb` (ditto)
|
|
31
|
+
- `fontisan/lib/fontisan/models/audit/library_summary.rb` (ditto)
|
|
32
|
+
|
|
33
|
+
## Differ adjustments
|
|
34
|
+
|
|
35
|
+
The fontisan `Differ` produces an `AuditDiff` containing:
|
|
36
|
+
|
|
37
|
+
- `field_changes` — `FieldChange[]` (per-field old/new).
|
|
38
|
+
- `codepoint_set_diff` — `CodepointSetDiff` (added/removed codepoints).
|
|
39
|
+
- `block_changes` — per-block coverage deltas.
|
|
40
|
+
|
|
41
|
+
Port unchanged. The ucode version operates on `Ucode::Models::Audit::AuditReport`
|
|
42
|
+
instances instead of fontisan's.
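To illustrate the shape of the first two sections, here is a minimal sketch assuming reports are reduced to plain hashes and codepoint arrays. The helper names are hypothetical; the ported `Differ` works on model objects and may compute these differently.

```ruby
require "set"

# Added/removed codepoints between two cmap codepoint lists.
def diff_codepoint_sets(left, right)
  l = left.to_set
  r = right.to_set
  { added: (r - l).sort, removed: (l - r).sort }
end

# Per-field old/new pairs for every field whose value changed.
def diff_fields(left, right)
  (left.keys | right.keys).filter_map do |field|
    next if left[field] == right[field]

    { field: field, old: left[field], new: right[field] }
  end
end
```

Identical inputs yield empty results from both helpers, which is the "no false positives" property the acceptance list asks for.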
|
|
43
|
+
|
|
44
|
+
## LibraryAuditor adjustments
|
|
45
|
+
|
|
46
|
+
The fontisan `LibraryAuditor`:
|
|
47
|
+
|
|
48
|
+
1. Walks a directory (optionally recursive).
|
|
49
|
+
2. For each font file, runs an `AuditCommand`.
|
|
50
|
+
3. Collects reports + tracks skipped files (non-font files, permission
|
|
51
|
+
errors, etc.).
|
|
52
|
+
4. Returns an array of reports.
|
|
53
|
+
|
|
54
|
+
Port unchanged. The ucode version delegates to `Ucode::Audit::Command`
|
|
55
|
+
(added in TODO 16 with the CLI; this TODO assumes its existence or
|
|
56
|
+
extracts the orchestrator earlier — see Implementation Order).
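The walk-and-collect loop can be sketched as below. Assumptions: fonts are recognised by file extension (the `FONT_EXTENSIONS` list is illustrative), the per-file audit is passed in as a block instead of calling `Ucode::Audit::Command`, and only system-level errors are recorded as skips. `walk_library` is a hypothetical name.

```ruby
require "pathname"

# Illustrative extension list; the real auditor may sniff file contents instead.
FONT_EXTENSIONS = %w[.ttf .otf .ttc .otc .woff .woff2].freeze

# Yields each candidate font path to the block and collects its result.
# Returns [reports, skipped].
def walk_library(dir, recursive: false)
  pattern = recursive ? "**/*" : "*"
  reports = []
  skipped = []

  Pathname.new(dir).glob(pattern).select(&:file?).sort.each do |path|
    unless FONT_EXTENSIONS.include?(path.extname.downcase)
      skipped << { path: path.to_s, reason: "not a font file" }
      next
    end

    begin
      reports << yield(path)
    rescue SystemCallError => e
      # e.g. permission errors while reading the file
      skipped << { path: path.to_s, reason: e.message }
    end
  end

  [reports, skipped]
end
```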
|
|
57
|
+
|
|
58
|
+
## LibraryAggregator adjustments
|
|
59
|
+
|
|
60
|
+
The fontisan `LibraryAggregator` takes an array of reports and produces
|
|
61
|
+
a `LibrarySummary`:
|
|
62
|
+
|
|
63
|
+
- Total font count.
|
|
64
|
+
- Per-block coverage aggregated across all fonts (max / union /
|
|
65
|
+
intersection).
|
|
66
|
+
- Duplicate detection — `DuplicateGroup[]` for fonts with identical
|
|
67
|
+
`source_sha256` or identical codepoint sets.
|
|
68
|
+
- Per-foundry totals (grouped by `font.family_name` or by
|
|
69
|
+
`licensing.manufacturer`).
|
|
70
|
+
|
|
71
|
+
Port unchanged.
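For orientation, duplicate detection and union coverage might look like the sketch below. It assumes each report is a plain hash with `:label`, `:source_sha256`, and `:codepoints` keys; these names are illustrative, and the real aggregator works on `AuditReport` objects and returns `DuplicateGroup[]`.

```ruby
# Groups of labels whose source files share a SHA-256.
def duplicate_groups(reports)
  reports
    .group_by { |report| report[:source_sha256] }
    .values
    .select { |group| group.size > 1 }
    .map { |group| group.map { |report| report[:label] } }
end

# Every codepoint covered by at least one font in the library.
def union_coverage(reports)
  reports.flat_map { |report| report[:codepoints] }.uniq.sort
end
```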
|
|
72
|
+
|
|
73
|
+
## CLI integration
|
|
74
|
+
|
|
75
|
+
The library auditor is wired to the CLI in TODO 16 as
|
|
76
|
+
`ucode audit library <dir>`. The compare/differ is wired as
|
|
77
|
+
`ucode audit compare <left> <right>`. This TODO just delivers the
|
|
78
|
+
orchestration classes; CLI is a separate concern.
|
|
79
|
+
|
|
80
|
+
## Acceptance
|
|
81
|
+
|
|
82
|
+
- `Ucode::Audit::Differ.new(left_report, right_report).diff` returns
|
|
83
|
+
an `AuditDiff` with all three sections populated for a
|
|
84
|
+
meaningfully different report pair.
|
|
85
|
+
- `Ucode::Audit::Differ` on identical reports returns an `AuditDiff`
|
|
86
|
+
with empty arrays (no false positives).
|
|
87
|
+
- `Ucode::Audit::LibraryAuditor.new(dir, recursive: true, options:).audit`
|
|
88
|
+
walks the directory and produces one report per font, skipping
|
|
89
|
+
non-font files with a record in `#skipped`.
|
|
90
|
+
- `Ucode::Audit::LibraryAggregator.aggregate(reports)` returns a
|
|
91
|
+
`LibrarySummary` with per-block union coverage and duplicate groups.
|
|
92
|
+
- Spec uses a fixture library directory with 3-5 small fonts (some
|
|
93
|
+
duplicates, some unique).
|
|
94
|
+
- No `double()`.
|
|
95
|
+
- Rubocop clean.
|
|
96
|
+
|
|
97
|
+
## References
|
|
98
|
+
|
|
99
|
+
- Reports: `TODO.new/07-audit-models-port.md`
|
|
100
|
+
- Source: `fontisan/lib/fontisan/audit/{differ,library_auditor,library_aggregator,codepoint_range_coalescer}.rb`
|
|
101
|
+
- CLI wiring: `TODO.new/16-cli-audit-subcommands.md`
|
|
102
|
+
- Output: `TODO.new/13-directory-emitter.md` (library mode layout)
|
|
@@ -0,0 +1,115 @@
|
|
|
1
|
+
# 12 — Formatters port
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Port fontisan's text-rendering formatters. These power the
|
|
6
|
+
human-readable output of `ucode audit font` (text rendering to stdout)
|
|
7
|
+
and `ucode audit compare` (diff rendering). They are presentation-only
|
|
8
|
+
— they take an `AuditReport` (or `AuditDiff`, `LibrarySummary`) and
|
|
9
|
+
return a string.
|
|
10
|
+
|
|
11
|
+
## Files to create
|
|
12
|
+
|
|
13
|
+
- `lib/ucode/audit/formatters.rb` — namespace hub.
|
|
14
|
+
- `lib/ucode/audit/formatters/audit_text.rb` — port from fontisan
|
|
15
|
+
`AuditTextRenderer`.
|
|
16
|
+
- `lib/ucode/audit/formatters/audit_diff_text.rb` — port from fontisan
|
|
17
|
+
`AuditDiffTextRenderer`.
|
|
18
|
+
- `lib/ucode/audit/formatters/library_summary_text.rb` — port from
|
|
19
|
+
fontisan `LibrarySummaryTextRenderer`.
|
|
20
|
+
- `lib/ucode/audit/formatters/text_formatter.rb` — port from fontisan
|
|
21
|
+
`TextFormatter` (shared utilities).
|
|
22
|
+
- Specs for each.
|
|
23
|
+
|
|
24
|
+
## Port from fontisan
|
|
25
|
+
|
|
26
|
+
- `fontisan/lib/fontisan/formatters/text_formatter.rb`
|
|
27
|
+
- `fontisan/lib/fontisan/formatters/audit_text_renderer.rb`
|
|
28
|
+
- `fontisan/lib/fontisan/formatters/audit_diff_text_renderer.rb`
|
|
29
|
+
- `fontisan/lib/fontisan/formatters/library_summary_text_renderer.rb`
|
|
30
|
+
|
|
31
|
+
## Adjustments vs fontisan
|
|
32
|
+
|
|
33
|
+
The formatters read from the report model. Since ucode's report
|
|
34
|
+
shape differs from fontisan's (see `02-audit-schema-design.md`):
|
|
35
|
+
|
|
36
|
+
- Read `report.baseline.unicode_version` instead of `report.ucd_version`.
|
|
37
|
+
- Read `report.scripts` (`ScriptSummary[]`) instead of
|
|
38
|
+
`report.unicode_scripts` (`String[]`). Render as a table with
|
|
39
|
+
coverage percentages, not a flat list.
|
|
40
|
+
- Read `report.blocks` (`BlockSummary[]` with `status`, `coverage_percent`)
|
|
41
|
+
instead of fontisan's `AuditBlock[]` (with `fill_ratio`, `complete`).
|
|
42
|
+
- Render the `discrepancies` array if non-empty (fontisan has no
|
|
43
|
+
equivalent section).
|
|
44
|
+
- Render `plane_summaries` if non-empty (fontisan has no equivalent).
|
|
45
|
+
|
|
46
|
+
The text formatter's job is to make the audit output scannable in a
|
|
47
|
+
terminal. Aim for the same density as `git diff` or `ls -la`: short
|
|
48
|
+
columns, alignment, color codes via ANSI (honor `NO_COLOR=` env var).
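A small sketch of the color switch, assuming a hypothetical `colorize` helper. It follows the common `NO_COLOR` convention that any non-empty value disables color. The `env:` parameter exists only so the behavior can be exercised without touching the real environment.

```ruby
# Wraps text in an ANSI SGR sequence unless NO_COLOR is set to a
# non-empty value.
def colorize(text, code, env: ENV)
  no_color = env["NO_COLOR"]
  return text if no_color && !no_color.empty?

  "\e[#{code}m#{text}\e[0m"
end
```

For example, `colorize("COMPLETE", 32)` would emit green text on a color terminal and the bare string when `NO_COLOR=1` is exported.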
|
|
49
|
+
|
|
50
|
+
## Output examples
|
|
51
|
+
|
|
52
|
+
### `ucode audit font Inter-Regular.ttf`
|
|
53
|
+
|
|
54
|
+
```
|
|
55
|
+
Inter Regular (Inter-Regular.ttf, ttf, sha256: 3b1a…)
|
|
56
|
+
Version 4.000;git-a52131595 fontRevision 4.0
|
|
57
|
+
Weight 400 Width 5 PANOSE 2 0 5 3 6 …
|
|
58
|
+
|
|
59
|
+
Coverage: 2,857 codepoints across 1,486 glyphs
|
|
60
|
+
Baseline: Unicode 17.0.0 (ucd-text + Unicode17Blocks overrides)
|
|
61
|
+
|
|
62
|
+
Plane 0 (BMP): 2,857 / 55,000 (5.2%)
|
|
63
|
+
Plane 1 (SMP): 0 / 12,000 (0.0%)
|
|
64
|
+
…
|
|
65
|
+
|
|
66
|
+
Blocks touched: 24 (12 complete, 12 partial)
|
|
67
|
+
Basic Latin U+0000–U+007F 128/128 COMPLETE
|
|
68
|
+
Greek and Coptic U+0370–U+03FF 80/135 PARTIAL (55 missing)
|
|
69
|
+
…
|
|
70
|
+
|
|
71
|
+
Scripts touched: 5
|
|
72
|
+
Latin 1,307/1,307 COMPLETE
|
|
73
|
+
Greek 80/135 PARTIAL
|
|
74
|
+
…
|
|
75
|
+
|
|
76
|
+
Discrepancies: 1
|
|
77
|
+
OS/2 ulUnicodeRange bit 30 (Greek Extended) set but cmap has 0
|
|
78
|
+
codepoints in U+1F00–U+1FFF.
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
### `ucode audit compare old.json new.json`
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
Inter-Regular → Inter-Regular (v4.0 → v4.1)
|
|
85
|
+
|
|
86
|
+
Field changes:
|
|
87
|
+
version Version 4.000 → Version 4.100
|
|
88
|
+
font_revision 4.0 → 4.1
|
|
89
|
+
total_codepoints 2,857 → 2,910 (+53)
|
|
90
|
+
|
|
91
|
+
Codepoint set:
|
|
92
|
+
+ 53 added - 0 removed = 2,910 final
|
|
93
|
+
Added (sample): U+037D, U+037E, U+0387, …
|
|
94
|
+
|
|
95
|
+
Block changes:
|
|
96
|
+
Greek and Coptic 80/135 → 133/135 (+53 covered)
|
|
97
|
+
Latin Extended-D 0/112 → 0/112 (no change)
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
## Acceptance
|
|
101
|
+
|
|
102
|
+
- All 4 formatter files exist; each has a passing spec.
|
|
103
|
+
- A fixture `AuditReport` renders to a stable text snapshot (use
|
|
104
|
+
rspec's `match` against a checked-in fixture string).
|
|
105
|
+
- ANSI color is suppressed when `ENV["NO_COLOR"]` is set.
|
|
106
|
+
- Long lists (e.g. 4,298 missing CJK codepoints) are truncated with
|
|
107
|
+
a `… (showing first 50; see blocks/<NAME>.json for full list)` footer.
|
|
108
|
+
- No `double()` in specs.
|
|
109
|
+
- Rubocop clean.
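The truncation rule could be implemented along these lines. This is a sketch with a hypothetical helper name, using the footer wording from the acceptance item and an assumed `U+XXXX` rendering for each codepoint.

```ruby
# Renders up to `limit` codepoints and appends the footer when the
# list was cut short.
def format_codepoint_list(codepoints, block_name, limit: 50)
  shown = codepoints.first(limit).map { |cp| format("U+%04X", cp) }.join(", ")
  return shown if codepoints.size <= limit

  "#{shown}\n… (showing first #{limit}; see blocks/#{block_name}.json for full list)"
end
```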
|
|
110
|
+
|
|
111
|
+
## References
|
|
112
|
+
|
|
113
|
+
- Models: `TODO.new/07-audit-models-port.md`
|
|
114
|
+
- Source: `fontisan/lib/fontisan/formatters/`
|
|
115
|
+
- CLI wiring: `TODO.new/16-cli-audit-subcommands.md`
|
|
@@ -0,0 +1,147 @@
|
|
|
1
|
+
# 13 — Directory emitter
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Walk an in-memory `AuditReport` (built by TODOs 06-12) and write it
|
|
6
|
+
to the directory tree specified in `03-directory-output-spec.md`.
|
|
7
|
+
Pure I/O — no audit logic, no font parsing. Idempotent via
|
|
8
|
+
content-hash comparison.
|
|
9
|
+
|
|
10
|
+
This is the Mode 2 output writer; equivalent in role to
|
|
11
|
+
`Ucode::Repo::CodepointWriter` for Mode 1.
|
|
12
|
+
|
|
13
|
+
## Files to create
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
lib/ucode/audit/emitter.rb # namespace hub
|
|
17
|
+
lib/ucode/audit/emitter/face_directory.rb # top-level orchestrator
|
|
18
|
+
lib/ucode/audit/emitter/index_emitter.rb # writes index.json
|
|
19
|
+
lib/ucode/audit/emitter/block_emitter.rb # writes blocks/<NAME>.json
|
|
20
|
+
lib/ucode/audit/emitter/plane_emitter.rb # writes planes/<N>.json
|
|
21
|
+
lib/ucode/audit/emitter/script_emitter.rb # writes scripts/<CODE>.json
|
|
22
|
+
lib/ucode/audit/emitter/codepoint_emitter.rb # writes codepoints/<NAME>.json (verbose)
|
|
23
|
+
lib/ucode/audit/emitter/glyph_emitter.rb # writes glyphs/U+XXXX.svg (opt-in)
|
|
24
|
+
lib/ucode/audit/emitter/collection_emitter.rb # writes <source>/00-<face>/ layout for TTC
|
|
25
|
+
lib/ucode/audit/emitter/library_emitter.rb # writes library-level index for directory mode
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
Specs under `spec/ucode/audit/emitter/`.
|
|
29
|
+
|
|
30
|
+
## Public API
|
|
31
|
+
|
|
32
|
+
```ruby
|
|
33
|
+
emitter = Ucode::Audit::Emitter::FaceDirectory.new(
|
|
34
|
+
output_root: Pathname.new("output/font_audit"),
|
|
35
|
+
verbose: false,
|
|
36
|
+
with_glyphs: false,
|
|
37
|
+
)
|
|
38
|
+
emitter.emit_face(label: "Inter-Regular", report: report)
|
|
39
|
+
# → writes output/font_audit/Inter-Regular/index.json + blocks/ + ...
|
|
40
|
+
|
|
41
|
+
emitter.emit_collection(label: "Inter", reports: array_of_reports)
|
|
42
|
+
# → writes output/font_audit/Inter/00-<face>/ + 01-<face>/ + ...
|
|
43
|
+
|
|
44
|
+
emitter.emit_library(reports_by_label:)
|
|
45
|
+
# → writes output/font_audit/<label>/... per font + library index
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
## Idempotency rules
|
|
49
|
+
|
|
50
|
+
Per `03-directory-output-spec.md`:
|
|
51
|
+
|
|
52
|
+
- Each output file is content-hash compared against the existing file
|
|
53
|
+
(if any). Same content → no write. Different content → atomic write
|
|
54
|
+
(write to `.tmp`, rename).
|
|
55
|
+
- Skip-newer check: if the source font's mtime is older than the
|
|
56
|
+
output chunk's mtime AND the baseline UCD's mtime is older too,
|
|
57
|
+
skip the chunk entirely. Saves work on no-op re-runs.
|
|
58
|
+
- Reuse `Ucode::Repo::AtomicWrites` (existing module) for the write
|
|
59
|
+
primitive. Do not reimplement.
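
The two rules above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: `IdempotentWrite`, `write_if_changed`, and `fresh?` are hypothetical names, and the real implementation is expected to delegate the write step to `Ucode::Repo::AtomicWrites`, whose API is not shown here.

```ruby
require "digest"
require "fileutils"

# Hypothetical helper for illustration; NOT the Ucode::Repo::AtomicWrites API.
module IdempotentWrite
  module_function

  # Returns true if the file was (re)written, false if the content on disk
  # already has the same SHA-256 digest.
  def write_if_changed(path, content)
    if File.exist?(path) &&
       Digest::SHA256.file(path).hexdigest == Digest::SHA256.hexdigest(content)
      return false
    end

    FileUtils.mkdir_p(File.dirname(path))
    tmp = "#{path}.tmp"
    File.binwrite(tmp, content)
    File.rename(tmp, path) # atomic when tmp and path share a filesystem
    true
  end

  # Skip-newer check: the chunk counts as fresh only when both the source
  # font and the baseline UCD are older than the existing chunk.
  def fresh?(chunk_path, font_path, ucd_path)
    return false unless File.exist?(chunk_path)

    chunk_mtime = File.mtime(chunk_path)
    File.mtime(font_path) < chunk_mtime && File.mtime(ucd_path) < chunk_mtime
  end
end
```

A caller would check `fresh?` first to skip the chunk entirely, and otherwise pass the serialized chunk to `write_if_changed`, counting the `true` results to verify the "zero file writes on the second run" acceptance item.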
|
|
60
|
+
|
|
61
|
+
## Emitter responsibilities
|
|
62
|
+
|
|
63
|
+
### IndexEmitter
|
|
64
|
+
|
|
65
|
+
Writes `index.json`. Serializes the `AuditReport` via lutaml-model's
|
|
66
|
+
`to_hash`, then:
|
|
67
|
+
|
|
68
|
+
- Drop `codepoint_details` (always — only emitted by CodepointEmitter).
|
|
69
|
+
- Drop `covered_codepoints` from each `block_summaries` entry (always
|
|
70
|
+
— IndexEmitter is for the compact form).
|
|
71
|
+
- Embed `missing_codepoints` per block (per decision in `00-README.md`).
|
|
72
|
+
- Add the `totals` summary computed from `block_summaries`.
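
As a rough sketch of the transform described above, the following operates on a plain Hash with string keys, which is what `to_hash` is assumed to return here. `compact_index` is a hypothetical name, and the `covered_count` / `assigned_count` fields and the shape of `totals` are assumptions for illustration; the real field names come from the model port in TODO 07.

```ruby
# Hypothetical stand-in for IndexEmitter's transform step.
def compact_index(report_hash)
  # Drop covered_codepoints from every block; missing_codepoints stays embedded.
  blocks = report_hash.fetch("block_summaries", []).map do |block|
    block.reject { |key, _| key == "covered_codepoints" }
  end

  # Assumed shape of the totals summary, derived from block_summaries only.
  totals = {
    "blocks" => blocks.size,
    "covered" => blocks.sum { |b| b.fetch("covered_count", 0) },
    "assigned" => blocks.sum { |b| b.fetch("assigned_count", 0) },
  }

  report_hash
    .reject { |key, _| key == "codepoint_details" }
    .merge("block_summaries" => blocks, "totals" => totals)
end
```

The function returns a new Hash and leaves its input untouched, so the same report can still be handed to CodepointEmitter afterwards.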
|
|
73
|
+
|
|
74
|
+
### BlockEmitter
|
|
75
|
+
|
|
76
|
+
Writes one file per `BlockSummary` under `blocks/<NAME>.json`. The
|
|
77
|
+
filename uses the block name verbatim (filesystem-safe per
|
|
78
|
+
`03-directory-output-spec.md` §"Block filename encoding"). Each file
|
|
79
|
+
contains the single `BlockSummary` serialized.
|
|
80
|
+
|
|
81
|
+
### PlaneEmitter / ScriptEmitter
|
|
82
|
+
|
|
83
|
+
Roll-up views. Same shape as `block_summaries` entries, but aggregated per plane (`planes/<N>.json`) and per script (`scripts/<CODE>.json`).
|
|
84
|
+
|
|
85
|
+
### CodepointEmitter (verbose only)
|
|
86
|
+
|
|
87
|
+
For each touched block, walk all codepoints in the font's cmap that
|
|
88
|
+
belong to that block and produce a `CodepointDetail` per codepoint.
|
|
89
|
+
The detail is enriched with ucode baseline data (name, gc, script,
|
|
90
|
+
age) via `Ucode::Database#lookup`.
|
|
91
|
+
|
|
92
|
+
Per-block chunking keeps each file under ~1MB even for CJK Extension J
|
|
93
|
+
(4,298 codepoints × ~200 bytes/detail ≈ 850KB).
|
|
94
|
+
|
|
95
|
+
### GlyphEmitter (opt-in)
|
|
96
|
+
|
|
97
|
+
For each covered codepoint:
|
|
98
|
+
|
|
99
|
+
1. Look up GID in the audited font's cmap.
|
|
100
|
+
2. Read outline via fontisan (glyf for TrueType, CharStrings for CFF).
|
|
101
|
+
3. Convert to SVG, normalize viewBox.
|
|
102
|
+
4. Write to `glyphs/U+XXXX.svg`.
|
|
103
|
+
|
|
104
|
+
Filename pattern: `U+%04X.svg`, zero-padded to at least four hex digits (so five digits for SMP, six for plane 16)
|
|
105
|
+
— same convention as Mode 1's glyph output.
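
Because the width in `%04X` is a minimum rather than a fixed width, a single format string covers every plane. A minimal sketch follows; `glyph_filename` is a hypothetical helper name, and only the `U+%04X.svg` pattern is taken from the plan.

```ruby
# Hypothetical helper; only the filename pattern comes from the plan.
def glyph_filename(codepoint)
  unless (0..0x10FFFF).cover?(codepoint)
    raise ArgumentError, "not a Unicode codepoint: #{codepoint}"
  end

  # %04X pads to at least four uppercase hex digits and never truncates.
  format("U+%04X.svg", codepoint)
end

glyph_filename(0x41)     # => "U+0041.svg"
glyph_filename(0x1F600)  # => "U+1F600.svg"
glyph_filename(0x10FFFF) # => "U+10FFFF.svg"
```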
|
|
106
|
+
|
|
107
|
+
This is the only emitter that calls fontisan's outline reading. Lazy:
|
|
108
|
+
construct the font handle once per face and reuse across codepoints.
|
|
109
|
+
|
|
110
|
+
### CollectionEmitter
|
|
111
|
+
|
|
112
|
+
For a TTC/OTC input, emit one `<source>/00-<face>/` directory per
|
|
113
|
+
face, plus a collection-level `index.json` with face metadata
|
|
114
|
+
(`num_fonts_in_source`, face labels, summary rollup).
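
One possible reading of the `<source>/00-<face>/` layout is a two-digit, zero-based face index followed by the face label. The sketch below assumes exactly that; `face_directories`, `collection_index`, and the `faces` key are hypothetical, and only `num_fonts_in_source` is named in the plan.

```ruby
# Hypothetical helpers illustrating the collection layout.
def face_directories(source_label, face_labels)
  face_labels.each_with_index.map do |face, index|
    # Two-digit index keeps faces sorted in their in-file order.
    File.join(source_label, format("%02d-%s", index, face))
  end
end

def collection_index(source_label, face_labels)
  {
    "num_fonts_in_source" => face_labels.size,
    "faces" => face_directories(source_label, face_labels), # assumed key name
  }
end
```

For a two-face collection labelled `Inter`, this yields `Inter/00-Inter-Regular` and `Inter/01-Inter-Bold`, matching the shape shown in the Public API comment.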
|
|
115
|
+
|
|
116
|
+
### LibraryEmitter
|
|
117
|
+
|
|
118
|
+
For directory-mode input, emit one `<label>/` per font (already
|
|
119
|
+
produced by FaceDirectory), plus a library-level `index.json` and
|
|
120
|
+
`index.html` (the latter via TODO 15).
|
|
121
|
+
|
|
122
|
+
## Acceptance
|
|
123
|
+
|
|
124
|
+
- A non-verbose audit produces `index.json`, `planes/`, `blocks/`,
|
|
125
|
+
`scripts/`. No `codepoints/`. No `glyphs/`.
|
|
126
|
+
- A `--verbose` audit additionally produces `codepoints/<NAME>.json`
|
|
127
|
+
per touched block.
|
|
128
|
+
- A `--with-glyphs` audit additionally produces `glyphs/U+XXXX.svg`
|
|
129
|
+
per covered codepoint.
|
|
130
|
+
- `index.json` size is under 200KB for a 50k-codepoint CJK font (no
|
|
131
|
+
per-codepoint detail inlined).
|
|
132
|
+
- Each `codepoints/<NAME>.json` chunk is under 1MB.
|
|
133
|
+
- Re-running the same audit twice produces zero file writes on the
|
|
134
|
+
second run.
|
|
135
|
+
- Re-running after touching the source font rewrites the affected
|
|
136
|
+
chunks only.
|
|
137
|
+
- Block filenames preserve original names verbatim (no slugifying).
|
|
138
|
+
- No `double()` in specs.
|
|
139
|
+
- Rubocop clean.
|
|
140
|
+
|
|
141
|
+
## References
|
|
142
|
+
|
|
143
|
+
- Spec: `TODO.new/03-directory-output-spec.md`
|
|
144
|
+
- Models: `TODO.new/07-audit-models-port.md`
|
|
145
|
+
- Atomic writes: `lib/ucode/repo/atomic_writes.rb` (existing)
|
|
146
|
+
- Mode 1 equivalent: `lib/ucode/repo/codepoint_writer.rb`
|
|
147
|
+
- Browser consumer: `TODO.new/14-html-face-browser.md`
|
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
# 14 — HTML face browser
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Generate a standalone `index.html` per audited face. The browser opens
|
|
6
|
+
one file, sees the overview, and fetches JSON chunks lazily as the
|
|
7
|
+
user expands blocks or drills into per-codepoint detail.
|
|
8
|
+
|
|
9
|
+
This is what makes the audit locally browsable without a server. No
|
|
10
|
+
build step, no JS toolchain — plain HTML + vanilla JS + inlined CSS.
|
|
11
|
+
|
|
12
|
+
## Files to create
|
|
13
|
+
|
|
14
|
+
- `lib/ucode/audit/browser.rb` — namespace hub.
|
|
15
|
+
- `lib/ucode/audit/browser/face_page.rb` — renders one face's
|
|
16
|
+
`index.html`.
|
|
17
|
+
- `lib/ucode/audit/browser/templates/face.html.erb` — the page template.
|
|
18
|
+
- `lib/ucode/audit/browser/templates/face.css` — inlined into the page.
|
|
19
|
+
- `lib/ucode/audit/browser/templates/face.js` — inlined into the page;
|
|
20
|
+
vanilla JS, no dependencies, uses `fetch()` for chunks.
|
|
21
|
+
- `spec/ucode/audit/browser/face_page_spec.rb`.
|
|
22
|
+
|
|
23
|
+
## Template structure
|
|
24
|
+
|
|
25
|
+
The `face.html.erb` template emits a single HTML document with three
|
|
26
|
+
sections:
|
|
27
|
+
|
|
28
|
+
### Header
|
|
29
|
+
|
|
30
|
+
- Font identity (family, subfamily, version, foundry, license).
|
|
31
|
+
- Source provenance (file, sha256, format).
|
|
32
|
+
- Baseline (Unicode version, ucode version, generated_at).
|
|
33
|
+
- Summary stats (X codepoints covered across Y blocks; Z% of baseline).
|
|
34
|
+
|
|
35
|
+
### Plane overview
|
|
36
|
+
|
|
37
|
+
17-row visual band (one row per plane). Each plane is a thin horizontal
|
|
38
|
+
strip subdivided into block rectangles, colored by coverage:
|
|
39
|
+
|
|
40
|
+
- Dark green: COMPLETE
|
|
41
|
+
- Light green: PARTIAL >50%
|
|
42
|
+
- Yellow: PARTIAL ≤50%
|
|
43
|
+
- Red: UNCOVERED_ASSIGNED
|
|
44
|
+
- Gray: not touched / NO_ASSIGNED
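
The color rules above amount to a small decision function. In the page itself this logic would presumably live in `face.js`; it is sketched in Ruby here only to pin down the thresholds. The status names and colors come from the list, while `coverage_class`, the class names, and the 0.0..1.0 `ratio` argument are illustrative assumptions.

```ruby
# Hypothetical mapping from block status (plus coverage ratio) to a color class.
def coverage_class(status, ratio = nil)
  case status
  when "COMPLETE" then "dark-green"
  when "PARTIAL" then ratio.to_f > 0.5 ? "light-green" : "yellow"
  when "UNCOVERED_ASSIGNED" then "red"
  else "gray" # NO_ASSIGNED, or a block the font does not touch
  end
end
```

Note that exactly 50% falls on the yellow side, matching the `≤50%` rule.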
|
|
45
|
+
|
|
46
|
+
Click a plane → scrolls to that plane's block list.
|
|
47
|
+
|
|
48
|
+
### Block drilldown
|
|
49
|
+
|
|
50
|
+
Sortable table of all touched blocks:
|
|
51
|
+
|
|
52
|
+
| Block | Range | Covered | Total | % | Status |
|
|
53
|
+
|---|---|---:|---:|---:|---|
|
|
54
|
+
| Basic Latin | U+0000–U+007F | 128 | 128 | 100% | COMPLETE |
|
|
55
|
+
| Greek and Coptic | U+0370–U+03FF | 80 | 135 | 59% | PARTIAL |
|
|
56
|
+
|
|
57
|
+
Click a row → fetches `blocks/<NAME>.json` (if not already loaded) and
|
|
58
|
+
expands to show the missing-codepoint list as a grid of small
|
|
59
|
+
codepoint chips (`U+037D U+0387 U+...`).
|
|
60
|
+
|
|
61
|
+
Click a codepoint chip → if verbose mode produced
|
|
62
|
+
`codepoints/<NAME>.json`, fetch and show name/gc/script/age detail. If
|
|
63
|
+
`--with-glyphs` was on, additionally fetch `glyphs/U+XXXX.svg` and
|
|
64
|
+
inline-render the outline.
|
|
65
|
+
|
|
66
|
+
### Discrepancies panel
|
|
67
|
+
|
|
68
|
+
If `discrepancies` is non-empty, show as a bulleted list at the bottom.
|
|
69
|
+
Otherwise hide.
|
|
70
|
+
|
|
71
|
+
## JS behavior
|
|
72
|
+
|
|
73
|
+
Vanilla JS, ~200 lines. No framework. Behavior:
|
|
74
|
+
|
|
75
|
+
- On load: read the inlined `index.json` data (see "Inline the overview data"), render header + plane overview + block
|
|
76
|
+
table.
|
|
77
|
+
- Block row click: lazy-fetch `blocks/<NAME>.json`, expand row.
|
|
78
|
+
- Codepoint chip click: lazy-fetch `codepoints/<NAME>.json` (if not
|
|
79
|
+
already fetched for this block), find detail, render.
|
|
80
|
+
- Glyph thumbnail click: lazy-fetch `glyphs/U+XXXX.svg`, inline into
|
|
81
|
+
detail panel.
|
|
82
|
+
- All fetches cached in a `Map` after first load — no duplicate
|
|
83
|
+
fetches.
|
|
84
|
+
- Errors (404 for missing chunk) show a friendly inline message, not
|
|
85
|
+
a broken page.
|
|
86
|
+
|
|
87
|
+
The JS resolves chunk paths relative to the page's own location, so
|
|
88
|
+
the entire `<label>/` directory is portable (can be opened via
|
|
89
|
+
`file://` or served by any static host).
|
|
90
|
+
|
|
91
|
+
## CSS
|
|
92
|
+
|
|
93
|
+
~150 lines. Plain CSS, no preprocessor. Honor `prefers-color-scheme`
|
|
94
|
+
for light/dark. Coverage colors must be readable in both.
|
|
95
|
+
|
|
96
|
+
## Standalone-ness
|
|
97
|
+
|
|
98
|
+
The generated `index.html` must work via `file://` with no server. All
|
|
99
|
+
JS and CSS inlined. Chunk fetches use relative URLs so they work
|
|
100
|
+
regardless of where the directory is mounted.
|
|
101
|
+
|
|
102
|
+
For `file://` URLs, some browsers block `fetch()` of local files. The
|
|
103
|
+
page's JS should detect the failed fetch and show a one-line hint: "Open via a local
|
|
104
|
+
server (e.g. `python3 -m http.server` in this directory) for full
|
|
105
|
+
functionality." Initial overview (from inlined `index.json` data, see
|
|
106
|
+
below) still renders.
|
|
107
|
+
|
|
108
|
+
### Inline the overview data
|
|
109
|
+
|
|
110
|
+
To make the initial overview render without any fetch, the template
|
|
111
|
+
inlines the `index.json` contents into a `<script type="application/json"
|
|
112
|
+
id="audit-overview">...</script>` block. The JS reads from this on
|
|
113
|
+
load. Subsequent chunk fetches still go to the JSON files (so the
|
|
114
|
+
overview data isn't duplicated in chunks).
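
A minimal sketch of the inlining step, assuming the overview is available as a plain Hash. The template here is a hypothetical miniature of `face.html.erb`, and `render_face_page` is an illustrative name; only the `<script>` element's `type` and `id` come from the plan. One detail worth handling: a string value containing `</script>` would end the element early, so the sketch escapes `</` as `<\/`, which is still valid JSON.

```ruby
require "erb"
require "json"

# Hypothetical miniature of face.html.erb.
TEMPLATE = ERB.new(<<~HTML)
  <!DOCTYPE html>
  <html>
  <head><meta charset="utf-8"><title><%= ERB::Util.html_escape(title) %></title></head>
  <body>
  <script type="application/json" id="audit-overview"><%= overview_json %></script>
  </body>
  </html>
HTML

def render_face_page(title:, overview:)
  # Block form of gsub so the backslash in the replacement is taken literally.
  overview_json = JSON.generate(overview).gsub("</") { "<\\/" }
  TEMPLATE.result_with_hash(title: title, overview_json: overview_json)
end
```

On the page side, the JS would read the element's text content and pass it to `JSON.parse`, which turns `<\/` back into `</`. A spec can do the same round trip in Ruby to assert that the inlined JSON matches the report.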
|
|
115
|
+
|
|
116
|
+
This is a deliberate tradeoff: the HTML file is larger (~200KB for a
|
|
117
|
+
typical font) but the initial render is instant and works via
|
|
118
|
+
`file://`.
|
|
119
|
+
|
|
120
|
+
## Acceptance
|
|
121
|
+
|
|
122
|
+
- `Ucode::Audit::Browser::FacePage.new(report:, output_dir:).write`
|
|
123
|
+
produces `<output_dir>/index.html` and reuses the JSON chunks from
|
|
124
|
+
TODO 13 (does NOT duplicate them).
|
|
125
|
+
- Opening `index.html` in a browser via `file://` shows the overview
|
|
126
|
+
immediately. Plane band + block table + header all render.
|
|
127
|
+
- Clicking a block row fetches the per-block JSON (when served) and
|
|
128
|
+
expands.
|
|
129
|
+
- The page is fully self-contained: no external CSS, no external JS,
|
|
130
|
+
no CDN dependencies.
|
|
131
|
+
- HTML validates (no missing close tags, etc.).
|
|
132
|
+
- Spec asserts the generated HTML contains expected anchor strings
|
|
133
|
+
(font family name, baseline unicode version) and that the inlined
|
|
134
|
+
JSON matches the report.
|
|
135
|
+
- Rubocop clean (the Ruby side; the JS isn't rubocop's concern).
|
|
136
|
+
|
|
137
|
+
## References
|
|
138
|
+
|
|
139
|
+
- Output spec: `TODO.new/03-directory-output-spec.md`
|
|
140
|
+
- Emitter: `TODO.new/13-directory-emitter.md` (FacePage is invoked
|
|
141
|
+
after FaceDirectory to add the HTML)
|
|
142
|
+
- Library browser: `TODO.new/15-html-library-browser.md`
|
|
143
|
+
- CLI flag: `TODO.new/16-cli-audit-subcommands.md` (`--browse`
|
|
144
|
+
auto-generates HTML alongside JSON)
|