@mailwoman/neural-weights-en-us 2.0.6 → 2.2.0

package/README.md CHANGED
@@ -1,30 +1,33 @@
  # @mailwoman/neural-weights-en-us
 
- Phase 2 / Stage 1 (coarse) Mailwoman neural-classifier weights.
+ Stage 2 (coarse + venue/street/house_number) Mailwoman neural-classifier weights.
 
  - locale: **en-us**
- - corpus: **0.2.0**
- - training steps: **50000**
+ - corpus: **0.3.0**
+ - training steps: **2200**
  - hardware: **AMD Radeon 780M (gfx1103) bf16 ~14.6 GiB GTT**
 
- ## Phase 2 §6 status
+ ## Per-component F1 targets
 
- **⚠ Below Phase 2 §6 targets (≥95% F1):**
+ **⚠ Below per-component F1 targets:**
 
- - `country` F1 = **0.0000** (target ≥0.95)
- - `region` F1 = **0.8293** (target ≥0.95)
- - `locality` F1 = **0.6471** (target ≥0.95)
- - `postcode` F1 = **0.8594** (target ≥0.95)
+ - `country` F1 = **0.2112** (target ≥0.95)
+ - `region` F1 = **0.1883** (target ≥0.95)
+ - `locality` F1 = **0.2736** (target ≥0.95)
+ - `postcode` F1 = **0.6916** (target ≥0.95)
+ - `venue` F1 = **0.3886** (target ≥0.60)
+ - `street` F1 = **0.3016** (target ≥0.70)
+ - `house_number` F1 = **0.7866** (target ≥0.80)
 
  ## Eval (golden set)
 
- - entries: **74**
- - full-parse exact match: **0.5270**
- - mean token confidence: **0.9745**
+ - entries: **4535**
+ - full-parse exact match: **0.0818**
+ - mean token confidence: **0.8063**
 
  ## Components supported
 
- Stage 1 ships coarse-only: country / region / locality / dependent_locality / postcode / subregion / cedex. Street- and venue-level components are explicit future phases.
+ Stage 2 ships coarse (country / region / locality / dependent_locality / postcode / subregion / cedex) plus fine-grained venue / street / house_number. Token classifier emits 21 BIO labels.
 
  ## Files
 
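The per-component list in the new README can be checked mechanically. The following is a minimal sketch with the scores and targets copied from the 2.2.0 README hunk above; the name `below_target` is hypothetical and is not part of the package.

```python
# (component, f1, target) rows copied from the 2.2.0 README hunk.
README_F1 = [
    ("country", 0.2112, 0.95),
    ("region", 0.1883, 0.95),
    ("locality", 0.2736, 0.95),
    ("postcode", 0.6916, 0.95),
    ("venue", 0.3886, 0.60),
    ("street", 0.3016, 0.70),
    ("house_number", 0.7866, 0.80),
]


def below_target(rows):
    """Return (component, gap) pairs for rows under target, largest gap first.

    Hypothetical helper for illustration only.
    """
    gaps = [(name, round(target - f1, 4)) for name, f1, target in rows if f1 < target]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, gap in below_target(README_F1):
        print(f"{name}: {gap:.4f} below target")
```

With these figures all seven components fall short of their targets. `region` has the largest gap and `house_number` the smallest.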
package/model-card.json CHANGED
@@ -1,139 +1,87 @@
  {
  "name": "neural-weights-en-us",
- "version": "0.2.0",
- "phase": "Stage 1 (coarse)",
+ "version": "0.5.1",
+ "phase": "Stage 2 (coarse + venue/street/house_number) — CE-only, unchained",
  "license": "AGPL-3.0-only",
  "locale": "en-us",
  "training": {
- "corpus_version": "0.2.0",
- "tokenizer_version": "0.1.0",
- "steps": 50000,
- "hardware": "AMD Radeon 780M (gfx1103) bf16 ~14.6 GiB GTT",
- "duration_seconds": 23520.0,
- "started_at": null,
- "completed_at": "2026-05-18T21:33:27.368193Z"
+ "corpus_version": "0.4.0",
+ "tokenizer_version": "0.5.0-a1",
+ "steps": 95000,
+ "hardware": "NVIDIA A100-SXM4-40GB (Modal cloud)",
+ "duration_seconds": 2100,
+ "started_at": "2026-05-25T06:04:00Z",
+ "completed_at": "2026-05-25T06:39:00Z",
+ "recipe": "CE-only (crf_loss_weight=0.0), h384, batch=128 direct, constant LR=1.5e-4, phrase_priors=ON, class_weights tuned"
  },
- "components_supported": ["country", "region", "locality", "dependent_locality", "postcode", "subregion", "cedex"],
+ "architecture": {
+ "hidden_size": 384,
+ "num_hidden_layers": 6,
+ "num_attention_heads": 6,
+ "intermediate_size": 1536,
+ "max_position_embeddings": 128,
+ "params": "29M",
+ "crf_at_training": false,
+ "crf_at_inference": true,
+ "phrase_priors": true
+ },
+ "components_supported": [
+ "country",
+ "region",
+ "locality",
+ "dependent_locality",
+ "postcode",
+ "subregion",
+ "cedex",
+ "venue",
+ "street",
+ "house_number"
+ ],
+ "labels": [
+ "O",
+ "B-country",
+ "I-country",
+ "B-region",
+ "I-region",
+ "B-locality",
+ "I-locality",
+ "B-dependent_locality",
+ "I-dependent_locality",
+ "B-postcode",
+ "I-postcode",
+ "B-subregion",
+ "I-subregion",
+ "B-cedex",
+ "I-cedex",
+ "B-venue",
+ "I-venue",
+ "B-street",
+ "I-street",
+ "B-house_number",
+ "I-house_number"
+ ],
  "eval": {
- "n_entries": 74,
- "full_parse_exact_match": 0.527027027027027,
- "mean_token_confidence": 0.974534777700901,
- "per_component": {
- "country": {
- "precision": 0.0,
- "recall": 0.0,
- "f1": 0.0,
- "support": 6
- },
- "region": {
- "precision": 0.8499999999858334,
- "recall": 0.80952380951096,
- "f1": 0.8292682921697403,
- "support": 63
- },
- "locality": {
- "precision": 0.6874999999892578,
- "recall": 0.6111111111026234,
- "f1": 0.6470588230216262,
- "support": 72
- },
- "dependent_locality": {
- "precision": 0.0,
- "recall": 0.0,
- "f1": 0.0,
- "support": 1
- },
- "postcode": {
- "precision": 0.8730158730020157,
- "recall": 0.8461538461408283,
- "f1": 0.8593749994866943,
- "support": 65
- },
- "subregion": {
- "precision": 0.0,
- "recall": 0.0,
- "f1": 0.0,
- "support": 0
- },
- "cedex": {
- "precision": 0.0,
- "recall": 0.0,
- "f1": 0.0,
- "support": 1
- }
- },
- "calibration": [
- {
- "low": 0.0,
- "high": 0.1,
- "n": 0,
- "acc": 0.0
- },
- {
- "low": 0.1,
- "high": 0.2,
- "n": 0,
- "acc": 0.0
- },
- {
- "low": 0.2,
- "high": 0.3,
- "n": 0,
- "acc": 0.0
- },
- {
- "low": 0.3,
- "high": 0.4,
- "n": 5,
- "acc": 0.2
- },
- {
- "low": 0.4,
- "high": 0.5,
- "n": 9,
- "acc": 0.4444444444444444
- },
- {
- "low": 0.5,
- "high": 0.6,
- "n": 20,
- "acc": 0.4
- },
- {
- "low": 0.6,
- "high": 0.7,
- "n": 8,
- "acc": 0.5
- },
- {
- "low": 0.7,
- "high": 0.8,
- "n": 19,
- "acc": 0.3684210526315789
- },
- {
- "low": 0.8,
- "high": 0.9,
- "n": 25,
- "acc": 0.4
- },
- {
- "low": 0.9,
- "high": 1.0,
- "n": 1114,
- "acc": 0.8824057450628366
- }
- ]
+ "val_macro_f1": 0.638,
+ "val_loss": 0.281,
+ "golden_eval": {
+ "n_entries": 4535,
+ "hybrid_joint_exact_match": 0.102,
+ "hybrid_joint_macro_f1": 0.17,
+ "hybrid_joint_empty_parse": 0.0,
+ "rule_only_exact_match": 0.308,
+ "neural_macro_f1": 0.078
+ }
  },
  "known_failure_modes": [
- "underperforms on Hawaiian addresses (sparse in training corpus)",
- "particle-honorific kryptonite (e.g. FR 'Saint-Just-Saint-Rambert') if not in synth set",
- "non-Latin scripts (CJK, Cyrillic) fall through to byte-fallback tokens; F1 unknown"
+ "54.5% overconfident-wrong in neural-only mode (addressed by reconciler: 0.1%)",
+ "dependent_locality hallucination reduced by class_weights=0.3 but not eliminated",
+ "non-Latin scripts: A1 tokenizer has 18.2% byte-fallback (vs v0.1.0's 36.7%); model not trained on non-Latin addresses yet",
+ "particle-honorific kryptonite (e.g. FR 'Saint-Just-Saint-Rambert')"
  ],
- "notes": "Stage 1 coarse v0.2.0 \u2014 same architecture as v0.1.0 (8.87M params, 6L/256H/4-heads), trained on the expanded corpus-v0.2.0 (262.7M aligned rows, 6 train sources) with the loader rewrite from issue #43 (source-weighted multinomial sampler + relaxed coarse filter). The v0.1.0 positional-heuristic overfit was driven by a strict country-tag gate that dropped ~94% of v0.2.0 before any source weighting; with the gate relaxed and the loader interleaving sources at the row level, the model now sees a fixed mix of ban/tiger/nppes/state-tx/wof-admin/wof-postalcode per batch instead of mono-source blocks. See evals/scores-by-version.json for the v0.1.0 \u2192 v0.2.0 deltas.",
+ "notes": "v0.5.1 'unchained' iteration. Removes all hardware constraints from v0.5.0 (h256→h384, grad_accum→direct batch, 50K→100K steps, phrase priors OFF→ON, class weights uniform→tuned). CE-only training fix (crf_loss_weight=0) carries from v0.5.0 nine dual-loss runs diverged, CE-only is stable. Val_loss oscillates every ~20K steps (hard-cluster cycling, not overfitting). Best checkpoint at step-95K. +77% over v0.4.0 on training eval; +70% over v0.5.0 on golden exact-match (hybrid-joint 6.0%→10.2%).",
  "format": {
- "model": "ONNX int8 dynamic",
- "tokenizer": "SentencePiece unigram, byte_fallback=true, vocab_size=16000",
+ "model": "ONNX fp32 dynamic",
+ "tokenizer": "SentencePiece unigram, byte_fallback=true, vocab_size=48000",
  "max_sequence_length": 128,
  "opset": 17
  },
@@ -142,5 +90,5 @@
  "tokenizer": "tokenizer.model",
  "model_card": "model-card.json"
  },
- "base_relpath": "/data/models/checkpoints/stage1-coarse/step-050000"
+ "base_relpath": "/data/output/checkpoints/step-095000"
  }
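The new `labels` array is `O` plus a `B-`/`I-` pair for each of the ten supported components, which gives the 21 BIO labels the README mentions. The sketch below shows how such tags are commonly grouped into component spans. It is generic BIO decoding, not the package's CRF inference or reconciler; the function names are hypothetical, and whole words stand in for the SentencePiece pieces the real model consumes.

```python
# Component names copied from "components_supported" in the new model card.
COMPONENTS = [
    "country", "region", "locality", "dependent_locality", "postcode",
    "subregion", "cedex", "venue", "street", "house_number",
]


def build_labels(components):
    """Rebuild the label list: "O" followed by B-/I- pairs per component."""
    labels = ["O"]
    for name in components:
        labels += [f"B-{name}", f"I-{name}"]
    return labels


def decode_bio(tokens, labels):
    """Group (token, label) pairs into (component, text) spans.

    Hypothetical helper. A stray I- tag that does not continue the open
    span is treated leniently as the start of a new span.
    """
    spans = []
    open_span = None  # [component, [tokens...]] while a span is in progress
    for token, label in zip(tokens, labels):
        prefix, _, component = label.partition("-")
        if prefix == "I" and open_span is not None and open_span[0] == component:
            open_span[1].append(token)
            continue
        # Any other label closes the span in progress.
        if open_span is not None:
            spans.append((open_span[0], " ".join(open_span[1])))
            open_span = None
        if prefix in ("B", "I"):
            open_span = [component, [token]]
    if open_span is not None:
        spans.append((open_span[0], " ".join(open_span[1])))
    return spans


if __name__ == "__main__":
    tokens = ["1600", "Pennsylvania", "Ave", "Washington", "DC", "20500"]
    tags = ["B-house_number", "B-street", "I-street",
            "B-locality", "B-region", "B-postcode"]
    print(len(build_labels(COMPONENTS)))
    print(decode_bio(tokens, tags))
```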
package/model.onnx CHANGED
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@mailwoman/neural-weights-en-us",
- "version": "2.0.6",
+ "version": "2.2.0",
  "description": "Mailwoman neural-classifier weights for locale 'en-us'. Data-only package — loaded by @mailwoman/neural at runtime.",
  "license": "AGPL-3.0-only",
  "repository": {
package/tokenizer.model CHANGED
Binary file
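The tokenizer vocabulary grew from 16000 to 48000 entries in this release, and that change accounts for much of the `"params": "29M"` figure in the model card. The back-of-envelope count below uses the `architecture` fields and `vocab_size=48000` from the diff. It assumes a standard transformer encoder layout (four attention projections and a two-matrix feed-forward block per layer) and ignores biases, layer norms, and the classifier head; the source does not confirm the exact layer layout, so treat the result as an estimate.

```python
def estimate_params(hidden, layers, intermediate, max_positions, vocab):
    """Rough weight count for a standard transformer encoder.

    Assumed layout, not confirmed by the model card: token and position
    embeddings, then per layer Q/K/V/output projections (4 * hidden^2)
    and a feed-forward block (2 * hidden * intermediate).
    """
    token_embeddings = vocab * hidden
    position_embeddings = max_positions * hidden
    per_layer = 4 * hidden * hidden + 2 * hidden * intermediate
    encoder = layers * per_layer
    return {
        "token_embeddings": token_embeddings,
        "position_embeddings": position_embeddings,
        "encoder": encoder,
        "total": token_embeddings + position_embeddings + encoder,
    }


if __name__ == "__main__":
    # Values taken from the new model-card.json in this diff.
    counts = estimate_params(hidden=384, layers=6, intermediate=1536,
                             max_positions=128, vocab=48000)
    for name, value in counts.items():
        print(f"{name}: {value:,}")
```

Under these assumptions the total is about 29.1M weights, in line with the stated 29M, and the token embedding table alone makes up roughly 63% of it.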