ucode 0.1.0
- checksums.yaml +7 -0
- data/CLAUDE.md +211 -0
- data/Gemfile +22 -0
- data/Gemfile.lock +406 -0
- data/README.md +469 -0
- data/Rakefile +18 -0
- data/TODO.new/00-README.md +66 -0
- data/TODO.new/01-pillar-terminology-alignment.md +69 -0
- data/TODO.new/02-audit-schema-design.md +255 -0
- data/TODO.new/03-directory-output-spec.md +203 -0
- data/TODO.new/04-fontist-org-contract.md +173 -0
- data/TODO.new/05-baseline-unicode17-coverage-audit.md +144 -0
- data/TODO.new/06-audit-namespace-skeleton.md +105 -0
- data/TODO.new/07-audit-models-port.md +132 -0
- data/TODO.new/08-extractors-cheap-port.md +113 -0
- data/TODO.new/09-extractors-expensive-port.md +99 -0
- data/TODO.new/10-aggregations-ucd-rewrite.md +168 -0
- data/TODO.new/11-differ-and-library-auditor-port.md +102 -0
- data/TODO.new/12-formatters-port.md +115 -0
- data/TODO.new/13-directory-emitter.md +147 -0
- data/TODO.new/14-html-face-browser.md +144 -0
- data/TODO.new/15-html-library-browser.md +102 -0
- data/TODO.new/16-cli-audit-subcommands.md +142 -0
- data/TODO.new/17-fontisan-cleanup-audit.md +147 -0
- data/TODO.new/18-fontisan-cleanup-ucd.md +156 -0
- data/TODO.new/19-fontisan-docs-update.md +155 -0
- data/TODO.new/20-canonical-resolver-4-tier.md +182 -0
- data/TODO.new/21-canonical-unicode17-build.md +148 -0
- data/TODO.new/22-implementation-order.md +176 -0
- data/UCODE_CHANGELOG.md +97 -0
- data/exe/ucode +8 -0
- data/lib/ucode/aggregator.rb +77 -0
- data/lib/ucode/audit/block_aggregator.rb +90 -0
- data/lib/ucode/audit/codepoint_range_coalescer.rb +42 -0
- data/lib/ucode/audit/context.rb +137 -0
- data/lib/ucode/audit/discrepancy_detector.rb +213 -0
- data/lib/ucode/audit/extractors/aggregations.rb +70 -0
- data/lib/ucode/audit/extractors/base.rb +21 -0
- data/lib/ucode/audit/extractors/color_capabilities.rb +143 -0
- data/lib/ucode/audit/extractors/coverage.rb +55 -0
- data/lib/ucode/audit/extractors/hinting.rb +199 -0
- data/lib/ucode/audit/extractors/identity.rb +65 -0
- data/lib/ucode/audit/extractors/licensing.rb +75 -0
- data/lib/ucode/audit/extractors/metrics.rb +108 -0
- data/lib/ucode/audit/extractors/opentype_layout.rb +71 -0
- data/lib/ucode/audit/extractors/provenance.rb +34 -0
- data/lib/ucode/audit/extractors/style.rb +88 -0
- data/lib/ucode/audit/extractors/variation_detail.rb +101 -0
- data/lib/ucode/audit/extractors.rb +31 -0
- data/lib/ucode/audit/plane_aggregator.rb +37 -0
- data/lib/ucode/audit/registry.rb +63 -0
- data/lib/ucode/audit/script_aggregator.rb +92 -0
- data/lib/ucode/audit.rb +27 -0
- data/lib/ucode/cache.rb +113 -0
- data/lib/ucode/cli.rb +272 -0
- data/lib/ucode/commands/build.rb +68 -0
- data/lib/ucode/commands/cache.rb +46 -0
- data/lib/ucode/commands/fetch.rb +62 -0
- data/lib/ucode/commands/font_coverage.rb +57 -0
- data/lib/ucode/commands/glyphs.rb +136 -0
- data/lib/ucode/commands/lookup.rb +65 -0
- data/lib/ucode/commands/parse.rb +62 -0
- data/lib/ucode/commands/site.rb +33 -0
- data/lib/ucode/commands.rb +19 -0
- data/lib/ucode/config.rb +110 -0
- data/lib/ucode/coordinator/indices.rb +34 -0
- data/lib/ucode/coordinator.rb +397 -0
- data/lib/ucode/database.rb +214 -0
- data/lib/ucode/db_builder.rb +107 -0
- data/lib/ucode/error.rb +96 -0
- data/lib/ucode/fetch/code_charts.rb +57 -0
- data/lib/ucode/fetch/http.rb +83 -0
- data/lib/ucode/fetch/ucd_zip.rb +57 -0
- data/lib/ucode/fetch/unihan_zip.rb +57 -0
- data/lib/ucode/fetch.rb +14 -0
- data/lib/ucode/glyphs/cell_extractor.rb +130 -0
- data/lib/ucode/glyphs/dvisvgm_renderer.rb +29 -0
- data/lib/ucode/glyphs/embedded_fonts/catalog.rb +372 -0
- data/lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb +228 -0
- data/lib/ucode/glyphs/embedded_fonts/font_entry.rb +126 -0
- data/lib/ucode/glyphs/embedded_fonts/renderer.rb +47 -0
- data/lib/ucode/glyphs/embedded_fonts/source.rb +94 -0
- data/lib/ucode/glyphs/embedded_fonts/svg.rb +123 -0
- data/lib/ucode/glyphs/embedded_fonts/tounicode.rb +103 -0
- data/lib/ucode/glyphs/embedded_fonts/writer.rb +76 -0
- data/lib/ucode/glyphs/embedded_fonts.rb +50 -0
- data/lib/ucode/glyphs/grid.rb +30 -0
- data/lib/ucode/glyphs/grid_detector.rb +165 -0
- data/lib/ucode/glyphs/last_resort/cmap_index.rb +96 -0
- data/lib/ucode/glyphs/last_resort/contents.rb +74 -0
- data/lib/ucode/glyphs/last_resort/glif.rb +124 -0
- data/lib/ucode/glyphs/last_resort/renderer.rb +67 -0
- data/lib/ucode/glyphs/last_resort/source.rb +125 -0
- data/lib/ucode/glyphs/last_resort/svg.rb +247 -0
- data/lib/ucode/glyphs/last_resort/writer.rb +83 -0
- data/lib/ucode/glyphs/last_resort.rb +36 -0
- data/lib/ucode/glyphs/monolith_page_map.rb +181 -0
- data/lib/ucode/glyphs/mutool_renderer.rb +28 -0
- data/lib/ucode/glyphs/page_renderer.rb +221 -0
- data/lib/ucode/glyphs/path_bbox.rb +62 -0
- data/lib/ucode/glyphs/pdf2svg_renderer.rb +26 -0
- data/lib/ucode/glyphs/pdf_fetcher.rb +102 -0
- data/lib/ucode/glyphs/pdftocairo_renderer.rb +32 -0
- data/lib/ucode/glyphs/real_fonts/block_coverage.rb +45 -0
- data/lib/ucode/glyphs/real_fonts/coverage_auditor.rb +117 -0
- data/lib/ucode/glyphs/real_fonts/font_coverage_report.rb +45 -0
- data/lib/ucode/glyphs/real_fonts/font_locator.rb +95 -0
- data/lib/ucode/glyphs/real_fonts/unicode_17_blocks.rb +104 -0
- data/lib/ucode/glyphs/real_fonts/writer.rb +50 -0
- data/lib/ucode/glyphs/real_fonts.rb +32 -0
- data/lib/ucode/glyphs/writer.rb +250 -0
- data/lib/ucode/glyphs.rb +27 -0
- data/lib/ucode/index.rb +106 -0
- data/lib/ucode/index_builder.rb +94 -0
- data/lib/ucode/models/audit/audit_axis.rb +30 -0
- data/lib/ucode/models/audit/audit_diff.rb +77 -0
- data/lib/ucode/models/audit/audit_report.rb +137 -0
- data/lib/ucode/models/audit/baseline.rb +32 -0
- data/lib/ucode/models/audit/block_summary.rb +72 -0
- data/lib/ucode/models/audit/codepoint_detail.rb +45 -0
- data/lib/ucode/models/audit/codepoint_range.rb +39 -0
- data/lib/ucode/models/audit/codepoint_set_diff.rb +34 -0
- data/lib/ucode/models/audit/color_capabilities.rb +91 -0
- data/lib/ucode/models/audit/discrepancy.rb +38 -0
- data/lib/ucode/models/audit/duplicate_group.rb +23 -0
- data/lib/ucode/models/audit/embedding_type.rb +81 -0
- data/lib/ucode/models/audit/field_change.rb +28 -0
- data/lib/ucode/models/audit/fs_selection_flags.rb +65 -0
- data/lib/ucode/models/audit/gasp_range.rb +63 -0
- data/lib/ucode/models/audit/hinting.rb +99 -0
- data/lib/ucode/models/audit/library_summary.rb +40 -0
- data/lib/ucode/models/audit/licensing.rb +48 -0
- data/lib/ucode/models/audit/metrics.rb +111 -0
- data/lib/ucode/models/audit/named_instance.rb +41 -0
- data/lib/ucode/models/audit/opentype_layout.rb +38 -0
- data/lib/ucode/models/audit/plane_summary.rb +31 -0
- data/lib/ucode/models/audit/script_coverage_row.rb +26 -0
- data/lib/ucode/models/audit/script_features.rb +28 -0
- data/lib/ucode/models/audit/script_summary.rb +54 -0
- data/lib/ucode/models/audit/variation_detail.rb +42 -0
- data/lib/ucode/models/audit.rb +50 -0
- data/lib/ucode/models/bidi_bracket_pair.rb +20 -0
- data/lib/ucode/models/bidi_mirroring.rb +19 -0
- data/lib/ucode/models/binary_property_assignment.rb +26 -0
- data/lib/ucode/models/block.rb +36 -0
- data/lib/ucode/models/case_folding_rule.rb +23 -0
- data/lib/ucode/models/cjk_radical.rb +23 -0
- data/lib/ucode/models/codepoint/bidi.rb +28 -0
- data/lib/ucode/models/codepoint/break_segmentation.rb +22 -0
- data/lib/ucode/models/codepoint/case_folding.rb +25 -0
- data/lib/ucode/models/codepoint/casing.rb +32 -0
- data/lib/ucode/models/codepoint/decomposition.rb +27 -0
- data/lib/ucode/models/codepoint/display.rb +24 -0
- data/lib/ucode/models/codepoint/emoji.rb +29 -0
- data/lib/ucode/models/codepoint/hangul.rb +20 -0
- data/lib/ucode/models/codepoint/identifier.rb +30 -0
- data/lib/ucode/models/codepoint/indic.rb +20 -0
- data/lib/ucode/models/codepoint/joining.rb +20 -0
- data/lib/ucode/models/codepoint/normalization.rb +35 -0
- data/lib/ucode/models/codepoint/numeric_value.rb +35 -0
- data/lib/ucode/models/codepoint.rb +122 -0
- data/lib/ucode/models/name_alias.rb +21 -0
- data/lib/ucode/models/named_sequence.rb +19 -0
- data/lib/ucode/models/names_list_entry.rb +38 -0
- data/lib/ucode/models/plane.rb +36 -0
- data/lib/ucode/models/property_alias.rb +24 -0
- data/lib/ucode/models/property_value_alias.rb +26 -0
- data/lib/ucode/models/relationship/compat_equiv.rb +18 -0
- data/lib/ucode/models/relationship/cross_reference.rb +17 -0
- data/lib/ucode/models/relationship/footnote.rb +24 -0
- data/lib/ucode/models/relationship/informal_alias.rb +18 -0
- data/lib/ucode/models/relationship/sample_sequence.rb +24 -0
- data/lib/ucode/models/relationship/variation_sequence.rb +19 -0
- data/lib/ucode/models/relationship.rb +57 -0
- data/lib/ucode/models/script.rb +41 -0
- data/lib/ucode/models/special_casing_rule.rb +28 -0
- data/lib/ucode/models/standardized_variant.rb +24 -0
- data/lib/ucode/models/unihan_entry.rb +23 -0
- data/lib/ucode/models.rb +47 -0
- data/lib/ucode/parsers/auxiliary.rb +26 -0
- data/lib/ucode/parsers/base.rb +137 -0
- data/lib/ucode/parsers/bidi_brackets.rb +41 -0
- data/lib/ucode/parsers/bidi_mirroring.rb +37 -0
- data/lib/ucode/parsers/blocks.rb +63 -0
- data/lib/ucode/parsers/case_folding.rb +53 -0
- data/lib/ucode/parsers/cjk_radicals.rb +102 -0
- data/lib/ucode/parsers/derived_age.rb +59 -0
- data/lib/ucode/parsers/derived_core_properties.rb +60 -0
- data/lib/ucode/parsers/extracted_properties.rb +74 -0
- data/lib/ucode/parsers/name_aliases.rb +44 -0
- data/lib/ucode/parsers/named_sequences.rb +51 -0
- data/lib/ucode/parsers/names_list.rb +250 -0
- data/lib/ucode/parsers/property_aliases.rb +41 -0
- data/lib/ucode/parsers/property_value_aliases.rb +46 -0
- data/lib/ucode/parsers/script_extensions.rb +64 -0
- data/lib/ucode/parsers/scripts.rb +60 -0
- data/lib/ucode/parsers/special_casing.rb +62 -0
- data/lib/ucode/parsers/standardized_variants.rb +56 -0
- data/lib/ucode/parsers/unicode_data/hangul_name.rb +73 -0
- data/lib/ucode/parsers/unicode_data.rb +268 -0
- data/lib/ucode/parsers/unihan.rb +125 -0
- data/lib/ucode/parsers.rb +35 -0
- data/lib/ucode/range_entry.rb +58 -0
- data/lib/ucode/repo/aggregate_writer.rb +364 -0
- data/lib/ucode/repo/atomic_writes.rb +48 -0
- data/lib/ucode/repo/codepoint_writer.rb +96 -0
- data/lib/ucode/repo/paths.rb +122 -0
- data/lib/ucode/repo.rb +22 -0
- data/lib/ucode/site/config_emitter.rb +124 -0
- data/lib/ucode/site/generator.rb +178 -0
- data/lib/ucode/site/search_index.rb +68 -0
- data/lib/ucode/site/template/.gitignore +4 -0
- data/lib/ucode/site/template/.vitepress/config.ts +8 -0
- data/lib/ucode/site/template/.vitepress/theme/index.js +20 -0
- data/lib/ucode/site/template/char/[codepoint].md +13 -0
- data/lib/ucode/site/template/components/BlockView.vue +57 -0
- data/lib/ucode/site/template/components/CharView.vue +85 -0
- data/lib/ucode/site/template/components/PlaneView.vue +56 -0
- data/lib/ucode/site/template/components/SearchView.vue +66 -0
- data/lib/ucode/site/template/index.md +25 -0
- data/lib/ucode/site/template/package.json +18 -0
- data/lib/ucode/site/template/search.md +9 -0
- data/lib/ucode/site.rb +13 -0
- data/lib/ucode/version.rb +5 -0
- data/lib/ucode/version_resolver.rb +76 -0
- data/lib/ucode.rb +74 -0
- data/ucode.gemspec +56 -0
- metadata +404 -0
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
# 05 — Unicode 17 baseline coverage audit
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Capture the actual per-tier coverage numbers for every Unicode 17
|
|
6
|
+
addition (11 new blocks + additions to ~12 existing blocks) before
|
|
7
|
+
locking the migration scope. The numbers ground every later decision:
|
|
8
|
+
which blocks need pillar 2 work, which need real-font mining, which
|
|
9
|
+
are already complete via tier 1.
|
|
10
|
+
|
|
11
|
+
The publisher-confirmed table below is the starting point (carried from
|
|
12
|
+
prior sessions). The deliverable is cmap-verified numbers, not
|
|
13
|
+
publisher claims.
|
|
14
|
+
|
|
15
|
+
## Why measure first
|
|
16
|
+
|
|
17
|
+
Two open questions cannot be answered without running the audit:
|
|
18
|
+
|
|
19
|
+
1. **Pillar coverage breakdown**: for each Unicode 17 block, how many
|
|
20
|
+
codepoints does Tier 1 cover? Pillar 1? Pillar 2? Pillar 3? The
|
|
21
|
+
migration's pillar-2 generalization work (already merged for Tai Yo)
|
|
22
|
+
needs to know which other blocks need it.
|
|
23
|
+
2. **Real-font availability**: which Unicode 17 additions have a
|
|
24
|
+
public Tier 1 font, and which require pillar 2 (Sidetic, Beria Erfe)?
|
|
25
|
+
The canonical resolver config (TODO 20) needs this mapping.
|
|
26
|
+
|
|
27
|
+
## Scope — Unicode 17 additions
|
|
28
|
+
|
|
29
|
+
### 11 new blocks
|
|
30
|
+
|
|
31
|
+
| Block | Range | Assigned | Tier 1 source (publisher-confirmed) | Confidence |
|
|
32
|
+
|---|---|---:|---|---|
|
|
33
|
+
| Sidetic | U+10940–1095F | 26 | Lentariso ≥1.029 + Noto Sans Sidetic | HIGH |
|
|
34
|
+
| Sharada Supplement | U+11B60–11B7F | 8 | Noto Sans Sharada | HIGH |
|
|
35
|
+
| Tolong Siki | U+11DB0–11DEF | 54 | Noto Sans Tolong Siki | HIGH |
|
|
36
|
+
| Beria Erfe | U+16EA0–16EDF | 50 | Kedebideri 3.001 (SIL) | HIGH |
|
|
37
|
+
| Tai Yo | U+1E6C0–1E6F3 | 55 | Noto Sans Tai Yo + pillar 2 (proven) | HIGH |
|
|
38
|
+
| Symbols for Legacy Computing Supplement | U+1CC00–1CCFF | 9 | BabelStone Pseudographica | MEDIUM |
|
|
39
|
+
| Supplemental Arrows-C | U+1CF00–1CFCF | 9 | Symbola | MEDIUM |
|
|
40
|
+
| Alchemical Symbols (ext) | U+1F740–1F77F | 4 | Noto Sans Symbols + Symbola | HIGH |
|
|
41
|
+
| Miscellaneous Symbols Supplement | U+1FA70–1FAFF | 34 | Noto Sans Symbols 2 | HIGH |
|
|
42
|
+
| Musical Symbols Supplement (additions) | U+1D200–1D2FF | TBD | Noto Music | HIGH |
|
|
43
|
+
| CJK Extension J | U+323B0–3347F | 4298 | FSung + Noto Sans/Serif CJK | HIGH |
|
|
44
|
+
|
|
45
|
+
### Additions to existing blocks
|
|
46
|
+
|
|
47
|
+
- Tangut (+8), Tangut Supplement (+22), Tangut Components Supplement (+115) → Noto Sans Tangut + grave-app.
|
|
48
|
+
- Adlam (+29) → Noto Sans Adlam.
|
|
49
|
+
- Arabic Extended-B/C → Noto Naskh Arabic.
|
|
50
|
+
- Telugu (+1), Kannada (+1) → existing Noto.
|
|
51
|
+
- Combining Diacritical Marks Extended (+27) → likely Pillar 2 (font support spotty).
|
|
52
|
+
- CJK Extension C/E additions → FSung.
|
|
53
|
+
- Chess Symbols (+4), Transport and Map Symbols (+1), Symbols and Pictographs Extended-A (+6) → Noto Sans Symbols 2.
|
|
54
|
+
- Egyptian Hieroglyphs (+28), Egyptian Hieroglyphs Format Controls (all), Egyptian Hieroglyphs Extended-A (+9), Egyptian Hieroglyphs Extended-B (~600 new) → UniHieroglyphica + Egyptian Text.
|
|
55
|
+
|
|
56
|
+
## Procedure
|
|
57
|
+
|
|
58
|
+
For each block:
|
|
59
|
+
|
|
60
|
+
1. **Acquire Tier 1 font** (if publisher-confirmed):
|
|
61
|
+
- Lentariso: github.com/Bry10022/Lentariso (SFD source, OFL).
|
|
62
|
+
- Kedebideri 3.001: software.sil.org/kedebideri/.
|
|
63
|
+
- Noto family: notofonts.github.io or Google Fonts.
|
|
64
|
+
- FSung: `~/Downloads/全宋體/FSung-*.ttf` (already local).
|
|
65
|
+
- BabelStone Pseudographica, Symbola: BabelStone site.
|
|
66
|
+
- UniHieroglyphica: suignard.com (OFL); Egyptian Text: microsoft/font-tools.
|
|
67
|
+
- NotoSerifTaiYo: translationcommons.org.
|
|
68
|
+
|
|
69
|
+
2. **Run Tier 1 cmap audit** via the existing `ucode font-coverage` CLI
|
|
70
|
+
(to be renamed `ucode audit font` in TODO 16; both names remain valid until
|
|
71
|
+
then):
|
|
72
|
+
```
|
|
73
|
+
ucode font-coverage <font-path> --label <font-label> \
|
|
74
|
+
--unicode-version 17.0
|
|
75
|
+
```
|
|
76
|
+
Output: `output/font_coverage/<label>.json` (becomes
|
|
77
|
+
`output/font_audit/<label>/index.json` post-migration).
|
|
78
|
+
|
|
79
|
+
3. **Capture pillar 1-2 stats** by running `ucode glyphs` against each
|
|
80
|
+
per-block PDF:
|
|
81
|
+
```
|
|
82
|
+
ucode glyphs --block <block-first-cp> --version 17.0
|
|
83
|
+
```
|
|
84
|
+
The `Catalog` reports `#codepoints`, `#size`, `#font_count` — log
|
|
85
|
+
these per block.
|
|
86
|
+
|
|
87
|
+
4. **Cross-check**: Tier 1 cmap count + pillar 1-2 chart count should
|
|
88
|
+
equal `total_assigned` for that block. Discrepancies flag where
|
|
89
|
+
pillar 2 needs to generalize (e.g. fonts without ToUnicode) or
|
|
90
|
+
where Tier 1 fonts are missing codepoints the chart shows.
|
|
91
|
+
|
|
92
|
+
5. **Sidetic + Beria Erfe specifically** — re-audit with the now-merged
|
|
93
|
+
`ContentStreamCorrelator` (commit `24e6bfd`). The Tai Yo proof
|
|
94
|
+
should generalize; verify the bucket sizes match (Sidetic/Beria
|
|
95
|
+
Erfe may have a tighter grid than Tai Yo).
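
The cross-check in step 4 can be sketched as a small set comparison. This is only an illustration: `cross_check` and its keyword arguments are hypothetical names, not part of ucode, and the inputs are assumed to be plain collections of integer codepoints. It compares sets rather than raw counts so that a codepoint found by both Tier 1 and the chart path is not counted twice.

```ruby
require "set"

# Hypothetical helper (not a ucode API).
# assigned: codepoints assigned in the block for the target Unicode version
# tier1:    codepoints present in the Tier 1 font's cmap
# charts:   codepoints recovered from the code chart via pillars 1-2
def cross_check(assigned:, tier1:, charts:)
  assigned = assigned.to_set
  tier1_hit = tier1.to_set & assigned
  chart_hit = charts.to_set & assigned

  {
    total_assigned: assigned.size,
    tier1_covered: tier1_hit.size,
    chart_covered: chart_hit.size,
    # Assigned codepoints that neither path covers.
    gap: (assigned - (tier1_hit | chart_hit)).to_a.sort,
    # Codepoints the chart shows but the Tier 1 font lacks.
    chart_only: (chart_hit - tier1_hit).to_a.sort
  }
end

# Toy values for illustration only, not measured coverage.
report = cross_check(
  assigned: 0x10940..0x10943,
  tier1: [0x10940, 0x10941],
  charts: [0x10941, 0x10942]
)
# report[:gap] holds only 0x10943; report[:chart_only] holds only 0x10942.
```

A non-empty `gap` marks a block where neither path is enough; a non-empty `chart_only` marks codepoints the publisher-confirmed font does not actually cover.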
|
|
96
|
+
|
|
97
|
+
## Deliverable
|
|
98
|
+
|
|
99
|
+
A single markdown report at `docs/unicode17-coverage-baseline.md`
|
|
100
|
+
containing:
|
|
101
|
+
|
|
102
|
+
- One table per Unicode 17 block with: assigned count, Tier 1 covered,
|
|
103
|
+
Pillar 1 covered, Pillar 2 covered, Pillar 3 covered, gap, notes.
|
|
104
|
+
- A summary table aggregating across all blocks.
|
|
105
|
+
- Identified pillar 2 generalization needs (fonts the correlator must
|
|
106
|
+
handle).
|
|
107
|
+
- Identified Tier 1 font gaps (codepoints the publisher-confirmed font
|
|
108
|
+
doesn't actually cover).
|
|
109
|
+
|
|
110
|
+
This report becomes the input to TODO 20 (canonical resolver config:
|
|
111
|
+
block → preferred Tier 1 font) and TODO 21 (Unicode 17 dataset build
|
|
112
|
+
verification).
|
|
113
|
+
|
|
114
|
+
## Acceptance
|
|
115
|
+
|
|
116
|
+
- All 11 new blocks have cmap-verified Tier 1 numbers (not just
|
|
117
|
+
publisher claims).
|
|
118
|
+
- All Unicode 17 additions to existing blocks have at least publisher
|
|
119
|
+
confirmation; cmap-verified where a font is locally available.
|
|
120
|
+
- `docs/unicode17-coverage-baseline.md` exists with the above tables.
|
|
121
|
+
- Sidetic and Beria Erfe show 26/26 and 50/50 respectively via the
|
|
122
|
+
merged pillar 2 path (validates the correlator generalization).
|
|
123
|
+
- CJK Extension J: FSung covers the 4,298 assigned codepoints (or
|
|
124
|
+
documents which subset is missing).
|
|
125
|
+
|
|
126
|
+
## Out of scope
|
|
127
|
+
|
|
128
|
+
- Migrating any code (that's TODOs 06-19).
|
|
129
|
+
- The canonical 4-tier resolver (TODO 20) — this audit informs it but
|
|
130
|
+
doesn't build it.
|
|
131
|
+
- HTML browser (TODO 14-15) — the audit outputs JSON only.
|
|
132
|
+
|
|
133
|
+
## References
|
|
134
|
+
|
|
135
|
+
- Mode 1 vs Mode 2: `docs/architecture.md` §"Two output modes"
|
|
136
|
+
- Tier 1 implementation: `lib/ucode/glyphs/real_fonts/`
|
|
137
|
+
- Pillar 1 implementation: `lib/ucode/glyphs/embedded_fonts/catalog.rb`
|
|
138
|
+
- Pillar 2 implementation:
|
|
139
|
+
`lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb`
|
|
140
|
+
- Proven Tai Yo correlator: `/tmp/correlate_v4.rb` (carried forward as
|
|
141
|
+
the spec fixture basis)
|
|
142
|
+
- PR #1 description: Tier-1 + Pillar-1 + Pillar-2 already validated
|
|
143
|
+
for Sidetic (26/26 via Lentariso), Beria Erfe (50/50 via
|
|
144
|
+
Kedebideri), Tai Yo (54/54 via pillar 2)
|
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
# 06 — Audit namespace skeleton
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Stand up the `Ucode::Audit` namespace hub, the `Registry`, and the
|
|
6
|
+
`Context`. No extractors, no models, no CLI yet — just the empty
|
|
7
|
+
orchestrator scaffolding that subsequent TODOs (07-12) populate.
|
|
8
|
+
|
|
9
|
+
This is the foundation; everything else in the migration slots into it.
|
|
10
|
+
|
|
11
|
+
## Files to create
|
|
12
|
+
|
|
13
|
+
- `lib/ucode/audit.rb` — namespace hub. Declares the autoloads (Ruby
|
|
14
|
+
autoload — see project memory `feedback_require_relative.md`).
|
|
15
|
+
- `lib/ucode/audit/registry.rb` — ordered list of extractor classes,
|
|
16
|
+
iterated by `AuditCommand` for every face.
|
|
17
|
+
- `lib/ucode/audit/context.rb` — value object carrying everything an
|
|
18
|
+
extractor needs to do its job (font handle, codepoint set, UCD
|
|
19
|
+
baseline, options).
|
|
20
|
+
- `lib/ucode/audit/extractors.rb` — extractors namespace hub (empty;
|
|
21
|
+
filled by TODO 08 and 09).
|
|
22
|
+
- `spec/ucode/audit/registry_spec.rb` — empty registry iterates zero
|
|
23
|
+
extractors without error.
|
|
24
|
+
- `spec/ucode/audit/context_spec.rb` — context memoizes codepoints,
|
|
25
|
+
baseline, source_format.
|
|
26
|
+
|
|
27
|
+
## Port from fontisan
|
|
28
|
+
|
|
29
|
+
Direct port of:
|
|
30
|
+
- `fontisan/lib/fontisan/audit.rb` (namespace hub pattern)
|
|
31
|
+
- `fontisan/lib/fontisan/audit/registry.rb` (Registry module)
|
|
32
|
+
- `fontisan/lib/fontisan/audit/context.rb` (Context class)
|
|
33
|
+
- `fontisan/lib/fontisan/audit/extractors.rb` (extractors namespace)
|
|
34
|
+
|
|
35
|
+
with namespace changes (`Fontisan::` → `Ucode::`).
|
|
36
|
+
|
|
37
|
+
## Context adjustments vs fontisan
|
|
38
|
+
|
|
39
|
+
The fontisan `Context` carries:
|
|
40
|
+
|
|
41
|
+
- `font`, `font_path`, `font_index`, `num_fonts_in_source`, `options`
|
|
42
|
+
- `codepoints` (memoized cmap keys)
|
|
43
|
+
- `ucd` (memoized UCD database + version + warning)
|
|
44
|
+
- `cldr` (memoized CLDR index — **drop**, see below)
|
|
45
|
+
- `source_format`
|
|
46
|
+
|
|
47
|
+
ucode's `Context` drops or replaces:
|
|
48
|
+
|
|
49
|
+
- `cldr` and the entire `resolve_cldr` path. CLDR is out of scope
|
|
50
|
+
(decision in `TODO.new/00-README.md`).
|
|
51
|
+
- `Ucd::VersionResolver` calls → replace with `Ucode::VersionResolver`
|
|
52
|
+
(ucode's own; see `lib/ucode/version_resolver.rb`).
|
|
53
|
+
- `Ucd::Database.open` / `Ucd::CacheManager` calls → replace with
|
|
54
|
+
`Ucode::Database.open` and `Ucode::Cache` (ucode's own; see
|
|
55
|
+
`lib/ucode/database.rb` and `lib/ucode/cache.rb`).
|
|
56
|
+
- `Ucd::Downloader` calls → replace with `Ucode::Fetch::UcdZip`.
|
|
57
|
+
|
|
58
|
+
ucode's `Context` adds:
|
|
59
|
+
|
|
60
|
+
- `baseline` — pre-resolved baseline struct (the assigned-codepoint
|
|
61
|
+
set for the target Unicode version). Extractors read from this
|
|
62
|
+
rather than re-resolving.
|
|
63
|
+
- `renderer` — optional glyph renderer for `--with-glyphs` mode. Set
|
|
64
|
+
only when the option is on; nil otherwise. Avoids loading fontisan's
|
|
65
|
+
outline reader unless needed.
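
A minimal sketch of the memoizing value object described above, under stated assumptions: `SketchContext`, the `cmap_codepoints` method on the font handle, and the `baseline_loader` callable are hypothetical stand-ins, not fontisan or ucode APIs. It only illustrates the shape: derived data is computed once, `renderer` stays nil unless supplied, and there is no `cldr` accessor.

```ruby
# Sketch only; names are illustrative, not the real Ucode::Audit::Context.
class SketchContext
  attr_reader :font, :font_path, :options, :renderer

  def initialize(font:, font_path:, baseline_loader:, options: {}, renderer: nil)
    @font = font
    @font_path = font_path
    @baseline_loader = baseline_loader # hypothetical: any object responding to #call
    @options = options
    @renderer = renderer # nil unless glyph rendering was requested
  end

  # Computed on the first call, then reused by every extractor.
  def codepoints
    @codepoints ||= font.cmap_codepoints.sort.freeze
  end

  # Resolved once so extractors read it instead of re-resolving.
  def baseline
    @baseline ||= @baseline_loader.call
  end
end
```

Injecting the baseline loader keeps the sketch independent of any database or cache layer; the real `Context` would presumably resolve its baseline through `Ucode::Database` as the acceptance criteria describe.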
|
|
66
|
+
|
|
67
|
+
## Registry adjustments
|
|
68
|
+
|
|
69
|
+
The fontisan registry has two extractor lists:
|
|
70
|
+
|
|
71
|
+
- `ORDERED_EXTRACTORS` — 12 extractors (full audit).
|
|
72
|
+
- `BRIEF_EXTRACTORS` — 5 extractors (cheap pass).
|
|
73
|
+
|
|
74
|
+
ucode's registry starts empty (no extractors ported yet). TODOs 08 and
|
|
75
|
+
09 add them in order. The brief/full mode switch ports across unchanged.
|
|
76
|
+
|
|
77
|
+
Drop the `Extractors::LanguageCoverage` entry from both lists — CLDR
|
|
78
|
+
is out of scope.
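
The two-list shape and the brief/full switch could look roughly like the following. This is a sketch under assumptions: `SketchRegistry` is an illustrative name, the `extractors` helper is hypothetical, and only the `each(mode:)` call shape is taken from the acceptance criteria in this TODO. Both lists are empty, as they would be before any extractor is ported.

```ruby
# Sketch only; not the ported fontisan Registry.
module SketchRegistry
  ORDERED_EXTRACTORS = [].freeze # full audit
  BRIEF_EXTRACTORS = [].freeze   # cheap pass

  # Hypothetical helper: picks the list for the requested mode.
  def self.extractors(mode: :full)
    case mode
    when :full then ORDERED_EXTRACTORS
    when :brief then BRIEF_EXTRACTORS
    else raise ArgumentError, "unknown mode: #{mode.inspect}"
    end
  end

  # Yields each extractor class in order; with empty lists the block never runs.
  def self.each(mode: :full, &block)
    return extractors(mode: mode).each unless block

    extractors(mode: mode).each(&block)
  end
end

SketchRegistry.each(mode: :full) { |extractor| puts extractor }
SketchRegistry.each(mode: :brief) { |extractor| puts extractor }
```

Raising on an unknown mode is a design choice of this sketch; it makes a mistyped mode fail loudly instead of silently running zero extractors.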
|
|
79
|
+
|
|
80
|
+
## Acceptance
|
|
81
|
+
|
|
82
|
+
- `Ucode::Audit` constant exists; `Ucode::Audit::Registry` and
|
|
83
|
+
`Ucode::Audit::Context` can be referenced.
|
|
84
|
+
- `Ucode::Audit::Registry.each(mode: :full) { |e| }` iterates zero
|
|
85
|
+
extractors without error (empty list).
|
|
86
|
+
- `Ucode::Audit::Registry.each(mode: :brief) { |e| }` behaves the same.
|
|
87
|
+
- `Ucode::Audit::Context.new(font: ..., ...)` constructs and memoizes
|
|
88
|
+
`codepoints` on first call.
|
|
89
|
+
- `Context#baseline` returns a real `Ucode::Database`-backed struct
|
|
90
|
+
(or raises a clear error if the version is uncached).
|
|
91
|
+
- No `cldr` method exists on `Context` (verified by spec).
|
|
92
|
+
- All specs use real model instances; no `double()`.
|
|
93
|
+
- Rubocop clean.
|
|
94
|
+
|
|
95
|
+
## References
|
|
96
|
+
|
|
97
|
+
- Source: `fontisan/lib/fontisan/audit.rb`, `audit/registry.rb`,
|
|
98
|
+
`audit/context.rb`, `audit/extractors.rb`
|
|
99
|
+
- ucode UCD infra: `lib/ucode/database.rb`, `lib/ucode/cache.rb`,
|
|
100
|
+
`lib/ucode/version_resolver.rb`, `lib/ucode/fetch/`
|
|
101
|
+
- Project memory: `feedback_require_relative.md` (autoload rule),
|
|
102
|
+
`feedback_use_fontist_only.md`
|
|
103
|
+
- Follow-ups: `TODO.new/07-audit-models-port.md`,
|
|
104
|
+
`TODO.new/08-extractors-cheap-port.md`,
|
|
105
|
+
`TODO.new/09-extractors-expensive-port.md`
|
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
# 07 — Models::Audit port
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Port the `Fontisan::Models::Audit::*` lutaml-model classes (15 files)
|
|
6
|
+
to `Ucode::Models::Audit::*` with the schema adjustments from
|
|
7
|
+
`TODO.new/02-audit-schema-design.md`. Pure data classes — no font
|
|
8
|
+
parsing logic.
|
|
9
|
+
|
|
10
|
+
## Files to create
|
|
11
|
+
|
|
12
|
+
One class per file, plus the namespace hub:
|
|
13
|
+
|
|
14
|
+
```
|
|
15
|
+
lib/ucode/models/audit.rb
|
|
16
|
+
lib/ucode/models/audit/audit_report.rb
|
|
17
|
+
lib/ucode/models/audit/baseline.rb # NEW
|
|
18
|
+
lib/ucode/models/audit/block_summary.rb # was AuditBlock
|
|
19
|
+
lib/ucode/models/audit/script_summary.rb # NEW (was string list)
|
|
20
|
+
lib/ucode/models/audit/plane_summary.rb # NEW
|
|
21
|
+
lib/ucode/models/audit/discrepancy.rb # NEW
|
|
22
|
+
lib/ucode/models/audit/codepoint_detail.rb # NEW
|
|
23
|
+
lib/ucode/models/audit/codepoint_range.rb
|
|
24
|
+
lib/ucode/models/audit/codepoint_set_diff.rb
|
|
25
|
+
lib/ucode/models/audit/audit_axis.rb
|
|
26
|
+
lib/ucode/models/audit/named_instance.rb
|
|
27
|
+
lib/ucode/models/audit/licensing.rb
|
|
28
|
+
lib/ucode/models/audit/metrics.rb
|
|
29
|
+
lib/ucode/models/audit/hinting.rb
|
|
30
|
+
lib/ucode/models/audit/color_capabilities.rb
|
|
31
|
+
lib/ucode/models/audit/variation_detail.rb
|
|
32
|
+
lib/ucode/models/audit/opentype_layout.rb
|
|
33
|
+
lib/ucode/models/audit/fs_selection_flags.rb
|
|
34
|
+
lib/ucode/models/audit/gasp_range.rb
|
|
35
|
+
lib/ucode/models/audit/embedding_type.rb
|
|
36
|
+
lib/ucode/models/audit/script_coverage_row.rb
|
|
37
|
+
lib/ucode/models/audit/script_features.rb
|
|
38
|
+
lib/ucode/models/audit/field_change.rb
|
|
39
|
+
lib/ucode/models/audit/duplicate_group.rb
|
|
40
|
+
lib/ucode/models/audit/library_summary.rb
|
|
41
|
+
lib/ucode/models/audit/audit_diff.rb
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Specs under `spec/ucode/models/audit/` — one spec per model, all
|
|
45
|
+
testing `to_hash` / `from_hash` round-trip.
|
|
46
|
+
|
|
47
|
+
## Source material
|
|
48
|
+
|
|
49
|
+
Port these unchanged (just namespace swap):
|
|
50
|
+
|
|
51
|
+
- `fontisan/lib/fontisan/models/audit/codepoint_range.rb`
|
|
52
|
+
- `fontisan/lib/fontisan/models/audit/codepoint_set_diff.rb`
|
|
53
|
+
- `fontisan/lib/fontisan/models/audit/audit_axis.rb`
|
|
54
|
+
- `fontisan/lib/fontisan/models/audit/named_instance.rb`
|
|
55
|
+
- `fontisan/lib/fontisan/models/audit/licensing.rb`
|
|
56
|
+
- `fontisan/lib/fontisan/models/audit/metrics.rb`
|
|
57
|
+
- `fontisan/lib/fontisan/models/audit/hinting.rb`
|
|
58
|
+
- `fontisan/lib/fontisan/models/audit/color_capabilities.rb`
|
|
59
|
+
- `fontisan/lib/fontisan/models/audit/variation_detail.rb`
|
|
60
|
+
- `fontisan/lib/fontisan/models/audit/opentype_layout.rb`
|
|
61
|
+
- `fontisan/lib/fontisan/models/audit/fs_selection_flags.rb`
|
|
62
|
+
- `fontisan/lib/fontisan/models/audit/gasp_range.rb`
|
|
63
|
+
- `fontisan/lib/fontisan/models/audit/embedding_type.rb`
|
|
64
|
+
- `fontisan/lib/fontisan/models/audit/script_coverage_row.rb`
|
|
65
|
+
- `fontisan/lib/fontisan/models/audit/script_features.rb`
|
|
66
|
+
- `fontisan/lib/fontisan/models/audit/field_change.rb`
|
|
67
|
+
- `fontisan/lib/fontisan/models/audit/duplicate_group.rb`
|
|
68
|
+
- `fontisan/lib/fontisan/models/audit/library_summary.rb`
|
|
69
|
+
- `fontisan/lib/fontisan/models/audit/audit_diff.rb`
|
|
70
|
+
|
|
71
|
+
## Schema changes vs fontisan
|
|
72
|
+
|
|
73
|
+
Per `TODO.new/02-audit-schema-design.md`:
|
|
74
|
+
|
|
75
|
+
- `AuditReport`:
|
|
76
|
+
- `fontisan_version` → `ucode_version`.
|
|
77
|
+
- Drop `cldr_version` and `language_coverage`.
|
|
78
|
+
- Drop `ucd_version` string → replace with `baseline` (Baseline model).
|
|
79
|
+
- Drop `unicode_scripts: String[]` → replace with `scripts: ScriptSummary[]`.
|
|
80
|
+
- Add `plane_summaries: PlaneSummary[]`.
|
|
81
|
+
- Add `discrepancies: Discrepancy[]`.
|
|
82
|
+
- `AuditBlock` → renamed `BlockSummary`. Add `missing_codepoints`,
|
|
83
|
+
`covered_codepoints` (verbose), `missing_count`, `coverage_percent`,
|
|
84
|
+
`status`, `plane`. Drop `complete` boolean (replaced by status).
|
|
85
|
+
- `AuditReport` uses `key_value do map "name", to: :name end` form —
|
|
86
|
+
same as fontisan. No `mapping do` (lutaml-model API; see project
|
|
87
|
+
memory `lutaml_model_polymorphism_api.md`).
|
|
88
|
+
|
|
89
|
+
## lutaml-model conventions
|
|
90
|
+
|
|
91
|
+
- Parent class inherits via `< Lutaml::Model::Serializable` — never
|
|
92
|
+
`include Lutaml::Model::Serializable`. See project memory
|
|
93
|
+
`feedback_lutaml_model_inheritance.md`.
|
|
94
|
+
- Boolean attributes use `Lutaml::Model::Type::Boolean` (not Ruby
|
|
95
|
+
`:boolean` — same convention as fontisan).
|
|
96
|
+
- Key-value serialization uses `key_value do ... end` for JSON/YAML.
|
|
97
|
+
No custom `to_h`/`from_h`/`to_json`/`from_json`.
|
|
98
|
+
- Nested models reference other `Ucode::Models::Audit::*` classes
|
|
99
|
+
directly (no string namespacing).
|
|
100
|
+
|
|
101
|
+
## Spec requirements
|
|
102
|
+
|
|
103
|
+
- One spec per model file under `spec/ucode/models/audit/`.
|
|
104
|
+
- Each spec:
|
|
105
|
+
- Constructs an instance with realistic attribute values (no
|
|
106
|
+
`nil` where the schema says non-nil).
|
|
107
|
+
- Round-trips through `to_hash` → `from_hash` → field equality.
|
|
108
|
+
- For collections, tests both empty and populated.
|
|
109
|
+
- No `double()` — use real instances or `Struct.new`.
|
|
110
|
+
- `AuditReport` spec additionally verifies every documented field
|
|
111
|
+
from `TODO.new/04-fontist-org-contract.md` is present.
|
|
112
|
+
|
|
113
|
+
## Acceptance
|
|
114
|
+
|
|
115
|
+
- All 27 model files exist and load via autoload chain declared in
|
|
116
|
+
`lib/ucode/models/audit.rb`.
|
|
117
|
+
- All 27 spec files pass with no `double()` usage.
|
|
118
|
+
- `Ucode::Models::Audit::AuditReport.new(...)` accepts all fields
|
|
119
|
+
from `02-audit-schema-design.md`.
|
|
120
|
+
- `AuditReport#to_hash` produces a hash matching the
|
|
121
|
+
`04-fontist-org-contract.md` JSON shape (where overlapping).
|
|
122
|
+
- Rubocop clean.
|
|
123
|
+
|
|
124
|
+
## References
|
|
125
|
+
|
|
126
|
+
- Schema source: `TODO.new/02-audit-schema-design.md`
|
|
127
|
+
- Contract: `TODO.new/04-fontist-org-contract.md`
|
|
128
|
+
- Source files: `fontisan/lib/fontisan/models/audit/`
|
|
129
|
+
- lutaml-model conventions: project memory
|
|
130
|
+
`lutaml_model_polymorphism_api.md`,
|
|
131
|
+
`feedback_lutaml_model_inheritance.md`
|
|
132
|
+
- Follow-ups: `TODO.new/08-extractors-cheap-port.md` (uses these models)
|
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
# 08 — Cheap extractors port
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Port the 5 cheap extractors from fontisan to ucode. These are the
|
|
6
|
+
"brief mode" extractors — fast, name-table-only path that doesn't need
|
|
7
|
+
UCD or expensive table loads. The Coverage extractor is also ported (cheap, but
|
|
8
|
+
excluded from brief mode in fontisan because it needs cmap; in ucode we
|
|
9
|
+
keep it cheap because cmap is the Tier 1 foundation).
|
|
10
|
+
|
|
11
|
+
After this TODO, `Ucode::Audit::Registry.each(mode: :brief)` produces
|
|
12
|
+
a minimal-but-real audit report (identity + style + coverage totals,
|
|
13
|
+
no aggregations).
|
|
14
|
+
|
|
15
|
+
## Files to create
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
lib/ucode/audit/extractors/
|
|
19
|
+
├── base.rb # port from fontisan
|
|
20
|
+
├── provenance.rb # port from fontisan
|
|
21
|
+
├── identity.rb # port from fontisan
|
|
22
|
+
├── style.rb # port from fontisan (the registry-listed Extractors::Style, not the older standalone StyleExtractor)
|
|
23
|
+
├── licensing.rb # port from fontisan
|
|
24
|
+
└── coverage.rb # port from fontisan
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
Plus update `lib/ucode/audit/registry.rb` to populate `BRIEF_EXTRACTORS`
|
|
28
|
+
and add these to `ORDERED_EXTRACTORS` (the latter stays incomplete
|
|
29
|
+
until TODO 09).
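
A minimal sketch of how the two registry constants could relate, assuming
`Registry.each` simply picks a list by mode. The module name
`RegistrySketch`, the empty stand-in classes, and the choice to list the
brief extractors first are all assumptions for illustration; the real
registry and extractor classes come from the fontisan port.

```ruby
# Hypothetical sketch, not the ported Ucode::Audit::Registry.
module RegistrySketch
  # Empty stand-ins for the ported extractor classes.
  Provenance = Class.new
  Identity   = Class.new
  Style      = Class.new
  Licensing  = Class.new
  Coverage   = Class.new

  # The five extractors that run in brief mode.
  BRIEF_EXTRACTORS = [Provenance, Identity, Style, Licensing, Coverage].freeze

  # Full-mode list. It only holds the brief extractors here because the
  # expensive ones are added by a later TODO (ordering is an assumption).
  ORDERED_EXTRACTORS = [*BRIEF_EXTRACTORS].freeze

  # Yields each extractor class for the given mode; returns an Enumerator
  # when no block is given.
  def self.each(mode: :full, &block)
    list = mode == :brief ? BRIEF_EXTRACTORS : ORDERED_EXTRACTORS
    return list.each unless block

    list.each(&block)
  end
end

brief_names = RegistrySketch.each(mode: :brief).map do |klass|
  klass.name.split("::").last
end
puts brief_names.join(", ")
```

Keeping `BRIEF_EXTRACTORS` as a subset of `ORDERED_EXTRACTORS` means a
later TODO only has to extend one list.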
|
|
30
|
+
|
|
31
|
+
Specs: `spec/ucode/audit/extractors/<name>_spec.rb` for each.
|
|
32
|
+
|
|
33
|
+
## Port from fontisan
|
|
34
|
+
|
|
35
|
+
- `fontisan/lib/fontisan/audit/extractors/base.rb`
|
|
36
|
+
- `fontisan/lib/fontisan/audit/extractors/provenance.rb`
|
|
37
|
+
- `fontisan/lib/fontisan/audit/extractors/identity.rb`
|
|
38
|
+
- `fontisan/lib/fontisan/audit/extractors/style.rb`
|
|
39
|
+
- `fontisan/lib/fontisan/audit/extractors/licensing.rb`
|
|
40
|
+
- `fontisan/lib/fontisan/audit/extractors/coverage.rb`
|
|
41
|
+
|
|
42
|
+
## Adjustments vs fontisan
|
|
43
|
+
|
|
44
|
+
Each extractor returns a hash of `AuditReport` fields. The fontisan
|
|
45
|
+
versions read font tables via `Context#font.table(...)` — this stays
|
|
46
|
+
the same; ucode's `Context` still wraps a fontisan font handle.
|
|
47
|
+
|
|
48
|
+
### Provenance
|
|
49
|
+
|
|
50
|
+
- `fontisan_version` → `ucode_version` (read from `Ucode::VERSION`).
|
|
51
|
+
- Otherwise unchanged.
|
|
52
|
+
|
|
53
|
+
### Identity
|
|
54
|
+
|
|
55
|
+
- Unchanged. Reads `name` table via fontisan's public API.
|
|
56
|
+
|
|
57
|
+
### Style
|
|
58
|
+
|
|
59
|
+
- The standalone `StyleExtractor` class
|
|
60
|
+
(`fontisan/lib/fontisan/audit/style_extractor.rb`) is older
|
|
61
|
+
fontisan code. The registry-listed `Extractors::Style` is the newer
|
|
62
|
+
thin version. Port the registry-listed version; do not port the
|
|
63
|
+
standalone `StyleExtractor` class.
|
|
64
|
+
- Reads OS/2 + head via fontisan's public API. Same shape.
|
|
65
|
+
|
|
66
|
+
### Licensing
|
|
67
|
+
|
|
68
|
+
- Unchanged.
|
|
69
|
+
|
|
70
|
+
### Coverage
|
|
71
|
+
|
|
72
|
+
- Output `codepoints` field uses `"U+XXXX"` string form (per
|
|
73
|
+
`02-audit-schema-design.md`).
|
|
74
|
+
- Output `codepoint_ranges` uses `CodepointRange` model — port the
|
|
75
|
+
`CodepointRangeCoalescer` helper too (`fontisan/lib/fontisan/audit/codepoint_range_coalescer.rb`).
|
|
76
|
+
- Does **not** emit aggregations (blocks/scripts) — that's the
|
|
77
|
+
Aggregations extractor in TODO 10. Coverage only emits the raw
|
|
78
|
+
codepoint set.
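
The two output forms above can be sketched as follows. This is an
illustration only: `format_codepoint` and `coalesce_codepoints` are
hypothetical helper names, the ranges are plain `[first, last]` pairs
rather than the `CodepointRange` model, and the real
`CodepointRangeCoalescer` ported from fontisan may differ.

```ruby
# Hypothetical helpers, not the ported CodepointRangeCoalescer.

# Formats an integer codepoint as "U+XXXX" (at least four hex digits).
def format_codepoint(codepoint)
  format("U+%04X", codepoint)
end

# Collapses a set of integer codepoints into inclusive [first, last] runs
# of consecutive values.
def coalesce_codepoints(codepoints)
  codepoints.uniq.sort
            .slice_when { |prev, curr| curr != prev + 1 }
            .map { |run| [run.first, run.last] }
end

codepoints = [0x41, 0x42, 0x43, 0x20AC, 0x1F600, 0x1F601]

labels = codepoints.map { |cp| format_codepoint(cp) }
ranges = coalesce_codepoints(codepoints)

puts labels.inspect
puts ranges.map { |first, last| "#{format_codepoint(first)}..#{format_codepoint(last)}" }.inspect
```

Coalescing works on the sorted integer set, so the string form is only
applied at output time.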
|
|
79
|
+
|
|
80
|
+
## Boundary with fontisan
|
|
81
|
+
|
|
82
|
+
These extractors call **only** fontisan's public font-reading API:
|
|
83
|
+
|
|
84
|
+
- `fontisan_font.table("name")`
|
|
85
|
+
- `fontisan_font.table("OS/2")`
|
|
86
|
+
- `fontisan_font.table("head")`
|
|
87
|
+
- `fontisan_font.table("cmap")`
|
|
88
|
+
- `fontisan_font.sfnt_table("cmap").parse.unicode_mappings`
|
|
89
|
+
|
|
90
|
+
No reaching into `Fontisan::Constants`, no `send`, no
|
|
91
|
+
`instance_variable_get`. If a field needs a table fontisan doesn't
|
|
92
|
+
expose, file a fontisan-side issue; do not work around it in ucode.
|
|
93
|
+
|
|
94
|
+
## Acceptance
|
|
95
|
+
|
|
96
|
+
- All 6 files exist (`base.rb` plus the 5 extractors); each has a passing spec with real
|
|
97
|
+
fixture fonts (use `spec/fixtures/fonts/`).
|
|
98
|
+
- `Ucode::Audit::Registry.each(mode: :brief)` iterates these 5:
|
|
99
|
+
`Provenance, Identity, Style, Licensing, Coverage`.
|
|
100
|
+
- A "brief audit" of a fixture font produces an `AuditReport` with
|
|
101
|
+
provenance, identity, style, licensing, and coverage fields
|
|
102
|
+
populated. Aggregation fields (`baseline`, `blocks`, `scripts`,
|
|
103
|
+
`plane_summaries`) are nil.
|
|
104
|
+
- No `double()` in any spec.
|
|
105
|
+
- Rubocop clean.
|
|
106
|
+
|
|
107
|
+
## References
|
|
108
|
+
|
|
109
|
+
- Models: `TODO.new/07-audit-models-port.md`
|
|
110
|
+
- Source: `fontisan/lib/fontisan/audit/extractors/{base,provenance,identity,style,licensing,coverage}.rb`
|
|
111
|
+
- Coalescer helper: `fontisan/lib/fontisan/audit/codepoint_range_coalescer.rb`
|
|
112
|
+
- fontisan API boundary: `docs/architecture.md` §"Dependency arrows"
|
|
113
|
+
- Follow-up: `TODO.new/09-extractors-expensive-port.md`
|
|
@@ -0,0 +1,99 @@
|
|
|
1
|
+
# 09 — Expensive extractors port
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Port the 5 expensive extractors from fontisan. These read multiple
|
|
6
|
+
font tables and reconstruct sub-structures (metrics, hinting program,
|
|
7
|
+
color font capabilities, variable-font axes, OpenType layout rules).
|
|
8
|
+
They are excluded from brief mode.
|
|
9
|
+
|
|
10
|
+
After this TODO, `Ucode::Audit::Registry.each(mode: :full)` produces
|
|
11
|
+
a complete (but still no-UCD-aggregations) audit report.
|
|
12
|
+
|
|
13
|
+
## Files to create
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
lib/ucode/audit/extractors/
|
|
17
|
+
├── metrics.rb # port from fontisan
|
|
18
|
+
├── hinting.rb # port from fontisan
|
|
19
|
+
├── color_capabilities.rb # port from fontisan
|
|
20
|
+
├── variation_detail.rb # port from fontisan
|
|
21
|
+
└── opentype_layout.rb # port from fontisan
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
Plus update `lib/ucode/audit/registry.rb` `ORDERED_EXTRACTORS` to
|
|
25
|
+
include these in their fontisan positions.
|
|
26
|
+
|
|
27
|
+
Specs: `spec/ucode/audit/extractors/<name>_spec.rb` for each.
|
|
28
|
+
|
|
29
|
+
## Port from fontisan
|
|
30
|
+
|
|
31
|
+
- `fontisan/lib/fontisan/audit/extractors/metrics.rb`
|
|
32
|
+
- `fontisan/lib/fontisan/audit/extractors/hinting.rb`
|
|
33
|
+
- `fontisan/lib/fontisan/audit/extractors/color_capabilities.rb`
|
|
34
|
+
- `fontisan/lib/fontisan/audit/extractors/variation_detail.rb`
|
|
35
|
+
- `fontisan/lib/fontisan/audit/extractors/opentype_layout.rb`
|
|
36
|
+
|
|
37
|
+
## Adjustments vs fontisan
|
|
38
|
+
|
|
39
|
+
Each extractor returns a hash with one or two keys mapping to a
|
|
40
|
+
model from `TODO.new/07-audit-models-port.md`. The fontisan versions
|
|
41
|
+
already return hashes in this shape — port unchanged.
|
|
42
|
+
|
|
43
|
+
### Metrics
|
|
44
|
+
|
|
45
|
+
- Reads `head`, `hhea`, `OS/2`, `post` via fontisan's public API.
|
|
46
|
+
- Returns `{ metrics: Ucode::Models::Audit::Metrics.new(...) }`.
|
|
47
|
+
- Returns `{}` (empty hash) for Type 1 fonts.
|
|
48
|
+
|
|
49
|
+
### Hinting
|
|
50
|
+
|
|
51
|
+
- Reads `fpgm`, `prep`, `cvt`, `gasp`, plus CFF charstrings for CFF fonts.
|
|
52
|
+
- Returns `{ hinting: ... }` or `{}`.
|
|
53
|
+
|
|
54
|
+
### ColorCapabilities
|
|
55
|
+
|
|
56
|
+
- Reads `COLR`, `CPAL`, `SVG`, `CBDT`, `CBLC`, `sbix`.
|
|
57
|
+
- Returns `{ color_capabilities: ... }` or `{}`.
|
|
58
|
+
|
|
59
|
+
### VariationDetail
|
|
60
|
+
|
|
61
|
+
- Reads `fvar`, `gvar`, `STAT`, `avar`, `HVAR`, `VVAR`, etc.
|
|
62
|
+
- Returns `{ variation: ... }` or `{}` for non-variable faces.
|
|
63
|
+
|
|
64
|
+
### OpenTypeLayout
|
|
65
|
+
|
|
66
|
+
- Reads `GSUB`, `GPOS`.
|
|
67
|
+
- Returns `{ opentype_layout: ... }` or `{}`.
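
One reason the "hash or `{}`" return shape is convenient: results from
several extractors can be merged into one set of report fields, and an
empty hash contributes nothing. The sketch below assumes that merge step;
`SketchMetrics` and the literal result hashes are hypothetical stand-ins,
not the ucode models or extractor output.

```ruby
# Hypothetical stand-in for a model such as Ucode::Models::Audit::Metrics.
SketchMetrics = Struct.new(:units_per_em, keyword_init: true)

# Pretend results from three extractors run against one face.
extractor_results = [
  { metrics: SketchMetrics.new(units_per_em: 1000) },
  {},                      # an extractor with nothing to report for this face
  { hinting: :stub_value } # placeholder for a hinting model
]

# Merging left to right; empty hashes leave the accumulator unchanged.
report_fields = extractor_results.reduce({}) { |acc, fields| acc.merge(fields) }

puts report_fields.keys.inspect
```

Fields that no extractor supplied are simply absent from the merged hash,
so the report constructor can leave them nil.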
|
|
68
|
+
|
|
69
|
+
## Boundary with fontisan
|
|
70
|
+
|
|
71
|
+
Same boundary as TODO 08: only public font-reading API. If a table
|
|
72
|
+
isn't exposed publicly, file a fontisan-side issue.
|
|
73
|
+
|
|
74
|
+
For complex table walks (e.g. GSUB script list iteration, COLR layer
|
|
75
|
+
tree), prefer asking fontisan to expose a higher-level reader (e.g.
|
|
76
|
+
`fontisan_font.gsub_scripts`) rather than parsing the raw table bytes
|
|
77
|
+
in ucode. ucode is the audit owner, not the font parser.
|
|
78
|
+
|
|
79
|
+
## Acceptance
|
|
80
|
+
|
|
81
|
+
- All 5 extractor files exist; each has a passing spec with real
|
|
82
|
+
fixture fonts covering: static TrueType, CFF/OTF, variable font,
|
|
83
|
+
color font (COLR or CBDT), Type 1 (returns empty).
|
|
84
|
+
- `Ucode::Audit::Registry.each(mode: :full)` iterates all 10
|
|
85
|
+
extractors ported so far (5 cheap from TODO 08 + 5 expensive here).
|
|
86
|
+
Still missing: Aggregations (TODO 10).
|
|
87
|
+
- A full audit of a fixture variable font populates `variation.axes`,
|
|
88
|
+
`variation.named_instances`, and `opentype_layout` correctly.
|
|
89
|
+
- A full audit of a fixture COLR font populates
|
|
90
|
+
`color_capabilities.colr_layers`, `color_capabilities.cpal_palettes`.
|
|
91
|
+
- No `double()` in any spec.
|
|
92
|
+
- Rubocop clean.
|
|
93
|
+
|
|
94
|
+
## References
|
|
95
|
+
|
|
96
|
+
- Models: `TODO.new/07-audit-models-port.md`
|
|
97
|
+
- Source: `fontisan/lib/fontisan/audit/extractors/{metrics,hinting,color_capabilities,variation_detail,opentype_layout}.rb`
|
|
98
|
+
- Fixtures: `spec/fixtures/fonts/` (port any missing from fontisan's spec/fixtures/)
|
|
99
|
+
- Follow-up: `TODO.new/10-aggregations-ucd-rewrite.md` (last extractor)
|