@mailwoman/neural-weights-en-us 4.3.0 → 4.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/model-card.json CHANGED
@@ -1,17 +1,17 @@
1
1
  {
2
2
  "name": "neural-weights-en-us",
3
- "version": "4.3.0",
4
- "model_lineage": "v1.1.0-relabel-consolidation / step 40000 — the v4.2.0 consolidation recipe FROM SCRATCH on a label-consistent mix (#511 affix-split relabel pass: the base corpus's ~467M monolithic street labels rewritten to agree with the affix shard, ending the 1,039:1 contradiction the #492 probe ladder isolated), with both anti-contradiction compensations reverted (synth-affix 17.0→2.0, affix tag weights →1.5). Ships with the locale head exported (locale_logits) driving the address-system conventions mask (#478 slice 1) declared inference config addressSystemConventions:'auto'. Gate: docs/articles/evals/2026-06-11-v4.3.0-ship-gate.md; tokenizer 0.6.0-a0",
5
- "phase": "Stage 3v1.1 relabel: affix parity closed (data-consistency lever) + conventions layer slice 1",
3
+ "version": "4.5.0",
4
+ "model_lineage": "v1.4.0-charoffset / step 40000 — the FIRST char-offset-corpus model (#519). From scratch on corpus v0.5.0 (676.6M rows: a from-source rebuild in the char-offset span-label format + the 7 re-emitted parity overlays). IDENTICAL recipe to v4.4.0's v1.3.0-boundary-consolidation the ONLY variable is the label format (token BIO char-offset span triple), so the v4.4.0 floors are the hold-the-line contract. HEADLINE: the span bridge (bridgePunctuationGaps) is RETIRED the char-offset format lets the model learn dotted po_box spans intrinsically (po_box 90.0 graded bridge-OFF, ≥ the 89.1 the bridge achieved), and bridge-on vs bridge-off is byte-identical on every tag (confirmed no-op). One declared inference behavior remains: addressSystemConventions:'auto'. Gate: scripts/eval/gates/v0.5.0-bridge.json + docs/articles/evals/2026-06-12-v050-charoffset-launch.md; tokenizer 0.6.0-a0.",
5
+ "phase": "v1.4 char-offsetthe structural cure: span-label format ships, the decode-side span bridge retired (po_box learned intrinsically)",
6
6
  "license": "AGPL-3.0-only",
7
7
  "locale": "en-us",
8
8
  "training": {
9
- "corpus_version": "0.4.12-consolidation (+ loader affix-relabel pass, lexicon affix-relabel-v1)",
9
+ "corpus_version": "0.5.0 (char-offset span labels #519: from-source base + 7 re-emitted parity overlays)",
10
10
  "tokenizer_version": "0.6.0-a0",
11
11
  "steps": 40000,
12
12
  "best_step": 40000,
13
13
  "hardware": "NVIDIA A100-SXM4-40GB (Modal cloud)",
14
- "recipe": "v1.1.0-relabel-consolidation: from-scratch 40k on v0.4.12 with the #511 affix-split relabel pass at load time (trailing USPS suffix→street_suffix, leading directional→street_prefix, exact shard-builder parity, after augmentation), synth-affix 2.0 (natural weight — the contradiction it fought is gone), affix tag class-weights 1.5, gazetteer anchor + train-time choreography, CE-only, lr=1.5e-4, seed 42."
14
+ "recipe": "v1.4.0-charoffset: from-scratch 40k on the char-offset corpus, recipe held identical to v1.3.0-boundary (augment_glue_prob 0.25, synth-intersection 1.0, synth-po-box-cedex 1.5, affix relabel, gazetteer anchor + choreography, CE-only, lr=1.5e-4, seed 42) — the format is the single variable."
15
15
  },
16
16
  "architecture": {
17
17
  "hidden_size": 384,
@@ -21,7 +21,7 @@
21
21
  "max_position_embeddings": 128,
22
22
  "vocab_size": 48000,
23
23
  "num_labels": 33,
24
- "params": "29.3M (29M encoder + 9M embedding from 48K vocab)",
24
+ "params": "29.6M (29M encoder + 9M embedding from 48K vocab)",
25
25
  "crf_at_training": false,
26
26
  "crf_at_inference": true,
27
27
  "phrase_priors": true
@@ -79,9 +79,9 @@
79
79
  "intersection_a",
80
80
  "intersection_b"
81
81
  ],
82
- "notes": "v4.3.0 — the affix lever closed by data consistency, not weights. street_prefix 64.9→93.6, street_suffix 48.8→96.6 (int8, real-affix eval); on the new 193-row NAD-native eval the shipped v4.2.0 scored 18.2/8.9 F1 where this model scores 92.2/90.3 at P=100. First shipped slice of the conventions layer: the locale head (now exported as locale_logits) detects the address system and the decoder OBEYS it fr forbids the USPS affix decomposition and pins the 5-digit postcode shape (fixes the FR leading-postcode digit-split this run's first gate caught). Consumers: pass addressSystemConventions:'auto' (the declared ship config) or omit for byte-stable pre-#511 behavior. Known follow-ups: po_box (deferred lever, both Montréal gate rows), FR region tail (27.6→16.2, unfloored), de/gb conventions rows as evidence arrives.",
82
+ "notes": "v4.5.0 — the char-offset cure. The corpus now labels spans by character offset (#519), so the model learns dotted surfaces (P.O. Box, C.P.) as contiguous spans WITHOUT the decode-side span bridge: po_box 90.0 graded bridge-OFF (v4.4.0 needed the bridge to reach 89.1). The bridge is now a CONFIRMED NO-OP (bridge-on == bridge-off on every tag) and is retired it may be left enabled harmlessly. 15/17 tags hold flat-or-better actual-vs-v4.4.0. KNOWN REGRESSION (accepted trade, recover-next-run): fr.house_number 97.7 ~89.6 (−8pp). Root cause #560 the model is order-blind on FR house_numbers in REVERSED (postcode-first) order (\"47110 Locality, 69 Street\"), which the FR training (BAN, canonical-order) never taught; v4.4.0's 97.7 was bridge-rescued on exactly these. The FR house_number eval is 85% one reversed-order cluster, so the metric is hypersensitive. Fix queued: a reversed-order FR synth shard (the German both-order shape). Other follow-up: corpus-v0.5.1 code-point offset re-align (the #558 astral-skip's lasting fix).",
83
83
  "format": {
84
- "model": "ONNX int8 dynamic (quantized from fp32)",
84
+ "model": "ONNX int8 dynamic (quantized from fp32, max fp32↔int8 delta 0.3pp)",
85
85
  "tokenizer": "SentencePiece unigram, byte_fallback=true, vocab_size=48000",
86
86
  "max_sequence_length": 128,
87
87
  "opset": 17,
@@ -99,64 +99,51 @@
99
99
  "method": "isotonic-regression (PAVA) over per-span softmax confidence; OPT-IN via core/decoder createCalibrator",
100
100
  "held_out_ece_raw": 0.0673,
101
101
  "held_out_ece_calibrated": 0.0035,
102
- "note": "calibration.json is the global table; calibration-per-locale.json carries per-locale tables (the global table under-serves DE/NL). Apply via @mailwoman/core/decoder's createCalibrator; default parse output is byte-stable when omitted."
102
+ "note": "calibration.json + calibration-per-locale.json carried forward from v4.4.0 (architecture + label space unchanged). Re-fit recommended next train. Apply via @mailwoman/core/decoder's createCalibrator; default parse output is byte-stable when omitted."
103
103
  },
104
- "base_relpath": "/data/output-v097-unit-v3-s42/checkpoints/step-020000",
104
+ "base_relpath": "/data/output-v140-charoffset-s42/checkpoints/step-040000",
105
105
  "eval": {
106
- "ship_gate_2026_06_11": {
107
- "promotion_gate": "PASS 12/12 (gates/v4.3.0-relabel.json, int8-graded, max fp32-int8 delta 0.2pp)",
108
- "honest_eval_vt": {
109
- "n": 1428,
110
- "region_match_pct": 99.6,
111
- "coord_p50_km": 3.4,
112
- "coord_p90_km": 7.4,
113
- "pip_coverage_adj_pct": 46.9,
114
- "baseline_v420_region_pct": 99.9,
115
- "verdict": "PASS"
116
- },
117
- "demo_presets": "PASS — affix split fires live (1060 W Addison St → W + Addison + St)",
118
- "int8_vs_fp32": "PASS — all gate tags within 0.2pp",
119
- "affix_stability_20k_to_40k": "v2 prefix 88.3→91.6, suffix 89.4→90.3 (rose, no decay)"
106
+ "gate_2026_06_12_charoffset": {
107
+ "promotion_gate": "FAIL 16/17 (gates/v0.5.0-bridge.json, bridge-OFF) — the single miss is fr.house_number 89.6 vs floor 91 (accepted trade, #560). int8-vs-fp32 max delta 0.3pp (≤1.5 cap). bridge-retirement VALIDATED: po_box 90.0 bridge-OFF ≥ 89.1.",
108
+ "bridge_retired": true,
109
+ "gate_doc": "docs/articles/evals/2026-06-12-v050-charoffset-launch.md"
120
110
  },
121
- "per_component_int8_gazfed_conventions": {
111
+ "per_component_int8_shipconfig": {
122
112
  "us": {
123
- "postcode": 97.8,
124
- "country_homograph": 85.1,
125
- "micro": 85.1,
126
- "locality": 74.4,
127
- "region": 89.1,
128
- "street": 75.5,
129
- "street_prefix": 93.6,
130
- "street_suffix": 96.6,
131
- "unit": 92.1
113
+ "postcode": 98.5,
114
+ "country_homograph": 87.5,
115
+ "micro": 85.4,
116
+ "locality": 74.0,
117
+ "region": 89.5,
118
+ "street": 75.8,
119
+ "street_prefix": 98.0,
120
+ "street_suffix": 94.9,
121
+ "unit": 97.0,
122
+ "po_box_real": 90.0,
123
+ "intersection_real": 100.0
132
124
  },
133
125
  "fr": {
134
126
  "postcode": 99.7,
135
- "house_number": 97.7,
136
- "region": 16.2
127
+ "house_number": 89.3,
128
+ "region": 35.2,
129
+ "cedex_real": 96.6
137
130
  },
138
131
  "de": {
139
- "native_locality_anchor_on": 90.1
140
- },
141
- "affix_nad_native_v2": {
142
- "street_prefix_f1": 92.2,
143
- "street_suffix_f1": 90.3,
144
- "precision": 100.0,
145
- "baseline_v420": {
146
- "street_prefix_f1": 18.2,
147
- "street_suffix_f1": 8.9
148
- }
132
+ "native_locality_anchor_on": 91.0
149
133
  }
150
134
  },
151
- "known_regressions_vs_4_2_0": {
152
- "fr_region": -11.4,
153
- "us_street_exact": -0.7,
154
- "country_homograph": "-4.7 (NOT comparable: the v4.2.0 figure was measured under the gaz-starved country leg fixed this gate; both clear the 83.3 floor)",
155
- "mitigations": "FR region tail rides the next corpus pass; arbitration layer #478 continues"
135
+ "vs_4_4_0_actual": {
136
+ "po_box_real": "+0.9 (89.1 bridged → 90.0 intrinsic, bridge RETIRED)",
137
+ "unit_real": "+4.9",
138
+ "country_homograph": "+2.4",
139
+ "street_prefix": "+4.4",
140
+ "fr_region": "+9.6",
141
+ "fr_house_number": "-8.1 (97.7 → 89.6) — KNOWN, #560, reversed-order FR (recover next run)",
142
+ "note": "15/17 tags flat-or-better actual-vs-actual; fr.house_number is the one real casualty (accepted trade)."
156
143
  }
157
144
  },
158
145
  "files_md5": {
159
- "model.onnx": "9ab47793a4a454c8432c5de05567ad0f",
146
+ "model.onnx": "25f3956de77a252bb9440907eb5b2a37",
160
147
  "tokenizer.model": "b6137e8c52914c9715374268ecaa4bc6"
161
148
  }
162
149
  }
package/model.onnx CHANGED
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@mailwoman/neural-weights-en-us",
3
- "version": "4.3.0",
3
+ "version": "4.5.0",
4
4
  "description": "Mailwoman neural-classifier weights for locale 'en-us'. Data-only package — loaded by @mailwoman/neural at runtime.",
5
5
  "license": "AGPL-3.0-only",
6
6
  "repository": {