@mailwoman/classifiers 4.10.0 → 4.12.0
- package/README.md +78 -0
- package/package.json +2 -2
package/README.md
ADDED
@@ -0,0 +1,78 @@

# @mailwoman/classifiers

**Mailwoman rule-based classifiers** — a library of deterministic token classifiers
that assign `Classification` labels to address tokens using pattern matching,
dictionaries, and heuristics.

These are the v0 rules engine's individual classifiers, each responsible for one
grammatical category (house numbers, street suffixes, postcodes, place names, etc.).
They're composed into a `CompositeClassifier` that runs them in priority order.
The neural classifier (`@mailwoman/neural`) largely supersedes these for
production parsing, but they remain valuable for:

- **Bootstrapping and corpus labeling**
- **Fallback classification** for token types the model struggles with
- **Arbitration** — comparing rule output against neural output to detect regressions
- **Diagnostic tooling** — understanding _why_ a token was classified a certain way

```ts
import { CompositeClassifier } from "@mailwoman/classifiers"

// `tokens` holds the address tokens to label; the token types are defined in
// @mailwoman/core, and producing the tokens is not shown here.
const classifier = new CompositeClassifier()
const classification = classifier.classify(tokens)
// tokens[0].classification → { house_number: "1600" }
// tokens[1].classification → { street: "Amphitheatre" }
```
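
The priority-ordered composition described above can be pictured with a small, self-contained sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `Token` shape, the `RuleClassifier` interface, the numeric `priority` field (assumed here to mean "lower runs first"), the `PriorityComposite` class, and its first-match-wins policy. It is not the package's actual `CompositeClassifier` implementation.

```ts
// Hypothetical types for illustration only; not the @mailwoman/classifiers API.
type Label = string

interface Token {
  text: string
  label?: Label
}

interface RuleClassifier {
  priority: number // assumed convention: lower values run first
  classify(token: Token): Label | undefined
}

// Runs classifiers in ascending priority order; the first label returned wins.
class PriorityComposite {
  private readonly ordered: RuleClassifier[]

  constructor(classifiers: RuleClassifier[]) {
    // Sort a copy so the caller's array is left untouched.
    this.ordered = [...classifiers].sort((a, b) => a.priority - b.priority)
  }

  classify(tokens: Token[]): Token[] {
    return tokens.map((token) => {
      for (const classifier of this.ordered) {
        const label = classifier.classify(token)
        if (label !== undefined) return { ...token, label }
      }
      return { ...token } // no rule matched: token is returned unchanged
    })
  }
}

// Toy rules: a numeric house number with an optional letter suffix (e.g. "12B")
// and a tiny street-suffix dictionary.
const ROAD_TYPES = new Set(["st", "ave", "rd", "blvd", "pkwy"])

const houseNumber: RuleClassifier = {
  priority: 10,
  classify: (t) => (/^\d+[A-Za-z]?$/.test(t.text) ? "house_number" : undefined),
}

const roadType: RuleClassifier = {
  priority: 20,
  classify: (t) =>
    ROAD_TYPES.has(t.text.toLowerCase().replace(/\.$/, "")) ? "road_type" : undefined,
}

// Deliberately passed out of order; the constructor sorts by priority.
const composite = new PriorityComposite([roadType, houseNumber])
const labeled = composite.classify(["1600", "Amphitheatre", "Pkwy"].map((text) => ({ text })))
console.log(labeled.map((t) => `${t.text}:${t.label ?? "-"}`).join(" "))
// prints "1600:house_number Amphitheatre:- Pkwy:road_type"
```

First-match-wins is only the simplest policy to show. A real composite may instead let later classifiers refine earlier labels or keep several candidates per token; the `{ house_number: "1600" }` shape in the example above suggests a map of labels rather than a single string.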

## Included classifiers

| Classifier                            | Detects                                            |
| ------------------------------------- | -------------------------------------------------- |
| `HouseNumberClassifier`               | Numeric house/building numbers                     |
| `PostcodeClassifier`                  | Postcode/ZIP patterns per locale                   |
| `RoadTypeClassifier`                  | Street suffixes (St, Ave, Rd, Blvd, etc.)          |
| `DirectionalClassifier`               | Cardinal directions (N, S, NE, Southwest, etc.)    |
| `PlaceClassifier`                     | Locality/region/country names (via WOF dictionary) |
| `IntersectionClassifier`              | Intersection connectors (&, at, and, @)            |
| `CompoundStreetClassifier`            | Multi-word street names                            |
| `CompoundUnitDesignatorClassifier`    | Unit designators (Apt, Ste, Unit, #, etc.)         |
| `OrdinalClassifier`                   | Ordinal numbers (1st, 2nd, 3rd floor)              |
| `LevelClassifier`                     | Floor/level numbers                                |
| `AlphaNumericClassifier`              | Alphanumeric identifiers                           |
| `StopWordClassifier`                  | Filler/stop words                                  |
| `PersonClassifier`                    | Person name components                             |
| `GivenNameClassifier`                 | Given/first names                                  |
| `MiddleInitialClassifier`             | Middle initials                                    |
| `PersonalTitleClassifier`             | Titles (Mr, Mrs, Dr, etc.)                         |
| `PersonalSuffixClassifier`            | Name suffixes (Jr, Sr, III, etc.)                  |
| `ChainClassifier`                     | Chain/business name patterns                       |
| `CentralEuropeanStreetNameClassifier` | Central European street name conventions           |
| `AdjacencyClassifier`                 | Adjacency-based disambiguation                     |
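
Most rows in this table rest on one of the two techniques named in the introduction: dictionary lookup or pattern matching. The sketch below shows one of each as plain functions. The abbreviation table, the canonical direction strings, the US-only ZIP pattern, and the function names `classifyDirectional` and `classifyUsPostcode` are illustrative assumptions, not the contents of `DirectionalClassifier` or `PostcodeClassifier`.

```ts
// Illustrative only: hypothetical stand-ins for a dictionary-based and a
// pattern-based classifier. Data and names are assumptions, not package data.

// Dictionary lookup: map a surface form to a canonical cardinal direction.
const DIRECTIONALS: ReadonlyMap<string, string> = new Map([
  ["n", "N"], ["north", "N"],
  ["s", "S"], ["south", "S"],
  ["e", "E"], ["east", "E"],
  ["w", "W"], ["west", "W"],
  ["ne", "NE"], ["northeast", "NE"],
  ["nw", "NW"], ["northwest", "NW"],
  ["se", "SE"], ["southeast", "SE"],
  ["sw", "SW"], ["southwest", "SW"],
])

function classifyDirectional(text: string): string | undefined {
  // Normalize case and drop one trailing period, so "N." becomes "n".
  return DIRECTIONALS.get(text.toLowerCase().replace(/\.$/, ""))
}

// Pattern matching: a US ZIP or ZIP+4 code. Real postcode rules vary per locale.
const US_ZIP = /^\d{5}(?:-\d{4})?$/

function classifyUsPostcode(text: string): boolean {
  return US_ZIP.test(text)
}

for (const text of ["Southwest", "N.", "Main", "94043", "94043-1351", "9404"]) {
  console.log(text, classifyDirectional(text) ?? "-", classifyUsPostcode(text))
}
```

A bare `N` is also a plausible middle initial, which is the kind of ambiguity that context-aware entries such as `AdjacencyClassifier` and `MiddleInitialClassifier` presumably exist to settle; a lookup like this one cannot resolve it on its own.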

## API

```ts
// Compose all classifiers with default priority
import { CompositeClassifier } from "@mailwoman/classifiers"
const composite = new CompositeClassifier()

// Or pick specific classifiers
import { HouseNumberClassifier, PostcodeClassifier } from "@mailwoman/classifiers"

// Base class for custom classifiers
import { Classifier } from "@mailwoman/classifiers"

// Type adapter for pipeline integration
import { classifierAdapter } from "@mailwoman/classifiers"
```
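
The **Arbitration** use case listed in the introduction amounts to diffing two label sequences. A minimal sketch follows, assuming each side yields one optional label string per token; that is a simplification, since the real `Classification` and `ClassificationMap` types in `@mailwoman/core` are richer. `findDisagreements` and `Disagreement` are hypothetical names.

```ts
// Hypothetical shapes for illustration; not the @mailwoman/core types.
interface Disagreement {
  index: number
  token: string
  rule: string | undefined
  neural: string | undefined
}

// Compares rule-based and neural labels token by token and reports every
// position where they differ. Both label arrays must align with `tokens`.
function findDisagreements(
  tokens: readonly string[],
  ruleLabels: readonly (string | undefined)[],
  neuralLabels: readonly (string | undefined)[],
): Disagreement[] {
  if (ruleLabels.length !== tokens.length || neuralLabels.length !== tokens.length) {
    throw new Error("label arrays must have one entry per token")
  }
  const out: Disagreement[] = []
  tokens.forEach((token, index) => {
    const rule = ruleLabels[index]
    const neural = neuralLabels[index]
    if (rule !== neural) out.push({ index, token, rule, neural })
  })
  return out
}

const diffs = findDisagreements(
  ["1600", "Amphitheatre", "Pkwy"],
  ["house_number", undefined, "road_type"],
  ["house_number", "street", "street"],
)
for (const d of diffs) {
  console.log(`#${d.index} ${d.token}: rule=${d.rule ?? "-"} neural=${d.neural ?? "-"}`)
}
// prints:
// #1 Amphitheatre: rule=- neural=street
// #2 Pkwy: rule=road_type neural=street
```

A disagreement is not automatically a neural regression. The two sides may use different label vocabularies (here one labels the suffix separately while the other folds it into `street`), so a mapping step before comparison is likely needed in practice.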

## Related

- [`@mailwoman/core`](../core) — `Classification`, `ClassificationMap`, token types
- [`@mailwoman/neural`](../neural) — the neural classifier that replaces these for production use
- [`@mailwoman/codex`](../codex) — postal reference data consumed by several classifiers
- [Rule-Based Classifiers concepts](https://mailwoman.sister.software/articles/concepts/rule-based-classifiers/)

## License

[AGPL-3.0-only](https://www.gnu.org/licenses/agpl-3.0.html)

package/package.json
CHANGED

```diff
@@ -1,6 +1,6 @@
 {
   "name": "@mailwoman/classifiers",
-  "version": "4.
+  "version": "4.12.0",
   "description": "Mailwoman rule-based classifiers.",
   "license": "AGPL-3.0-only",
   "repository": {
@@ -15,7 +15,7 @@
     "./*": "./out/*"
   },
   "dependencies": {
-    "@mailwoman/core": "4.
+    "@mailwoman/core": "4.12.0"
   },
   "files": [
     "out/**/*.js",
```