@mailwoman/address-id 4.10.0 → 4.11.0
- package/README.md +71 -0
- package/package.json +3 -3
package/README.md
ADDED
@@ -0,0 +1,71 @@

# @mailwoman/address-id

**Stable, parseable address primary keys** — the deterministic, exact-match
complement to the fuzzy matcher.

Where `@mailwoman/match` decides whether two messy records are _probably_ the
same entity, `@mailwoman/address-id` produces a content-addressed key you can
`GROUP BY` or `JOIN ON` without running the matcher at all — for the common
"same canonical address" case.

```ts
import { createPostalAddressID } from "@mailwoman/address-id"

const id = createPostalAddressID({
  components: { street: "123 Main St", locality: "Austin", region: "TX", postcode: "78701" },
  coordinate: { lat: 30.2672, lon: -97.7431 },
})
// → "tx.882830829dfffff.abc123def456"
```

## Key structure

```
<state>.<H3-cell>.<content-hash>
```

| Segment          | Purpose                                                                                                                                                                                         |
| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **State prefix** | Coarse region (`tx`, `ca`, `ny`, …) from a supplied state or plucked from the ZIP via `@mailwoman/codex`; `xx` when unknown. Makes the key region-sortable.                                     |
| **H3 cell**      | Jitter-stable locality token from the resolved coordinate (`h3-js` `latLngToCell` at resolution 9). Coarse on purpose: two geocodes of the same place a few metres apart land in the same cell. |
| **Content hash** | Hash of the address canonicalized by `@mailwoman/normalize`, so `123 Main St` and `123 MAIN STREET` hash identically. This is the identity; the cell + state localize and partition it.         |

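
To make the three-segment layout concrete, here is a minimal sketch of how such a key could be assembled from already-computed parts. `KeySegments` and `joinKeySegments` are hypothetical names used for illustration only; they are not exports of `@mailwoman/address-id`, and the package's real assembly logic may differ.

```typescript
// Hypothetical illustration of the <state>.<H3-cell>.<content-hash> layout.
// None of these names come from @mailwoman/address-id.
interface KeySegments {
  state?: string // two-letter region code, if one is known
  h3Cell: string // H3 cell index as a hex string
  hash: string // content hash of the canonicalized address
}

function joinKeySegments({ state, h3Cell, hash }: KeySegments): string {
  // Fall back to "xx" when no usable two-letter state is available,
  // mirroring the fallback described in the table above.
  const prefix = state && /^[a-z]{2}$/i.test(state) ? state.toLowerCase() : "xx"
  return [prefix, h3Cell, hash].join(".")
}
```

Because the state prefix comes first, plain lexicographic ordering of the resulting strings clusters keys by region.
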
## API

```ts
// Create a stable address primary key
createPostalAddressID(input: PostalAddressIDInput): string

// Parse a key back into its components
parsePostalAddressID(id: string): ParsedPostalAddressID
// → { state: "tx", h3Cell: "882830829dfffff", hash: "abc123def456" }
```

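
The parse direction works because no segment contains a dot, so splitting on `.` recovers the parts. The sketch below shows that idea under the assumption that a key always has exactly three non-empty segments. `splitKey` and `ParsedKey` are hypothetical stand-ins, not the package's `parsePostalAddressID`, whose validation rules are not described in the README.

```typescript
// Hypothetical stand-in for illustration; not the package's parser.
interface ParsedKey {
  state: string
  h3Cell: string
  hash: string
}

function splitKey(id: string): ParsedKey {
  const parts = id.split(".")
  // Reject anything that is not exactly three non-empty segments.
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Expected <state>.<H3-cell>.<content-hash>, got "${id}"`)
  }
  const [state, h3Cell, hash] = parts as [string, string, string]
  return { state, h3Cell, hash }
}
```
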
## Design

- **Self-contained** on `h3-js`, not `@mailwoman/spatial` (which wasn't
  published when `address-id` shipped). Small, focused dependency footprint.
- **Content-addressed, not assigned.** The key derives from the data itself
  — no central registry, no sequence numbers.
- **Jitter-stable.** The H3 cell at resolution 9 (~0.03 km²) absorbs the
  small coordinate differences that come from geocoding the same address
  on different passes.

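
The "content-addressed" property can be illustrated with a toy version: canonicalize the components, then hash the result, so equal addresses yield equal hashes with no registry involved. Everything here is an assumption made for the sketch. The tiny suffix map stands in for `@mailwoman/normalize`, and FNV-1a stands in for whatever hash function the package actually uses, which the README does not state.

```typescript
// Toy illustration only: the real canonicalization and hash are not shown in the README.
type Components = { street: string; locality: string; region: string; postcode: string }

// Deliberately tiny, hypothetical suffix table standing in for a real normalizer.
const SUFFIXES: Record<string, string> = { ST: "STREET", AVE: "AVENUE", RD: "ROAD" }

function canonicalize(value: string): string {
  return value
    .trim()
    .toUpperCase()
    .split(/\s+/)
    .map((token) => SUFFIXES[token] ?? token)
    .join(" ")
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16).padStart(8, "0")
}

function contentHash(c: Components): string {
  // A fixed field order keeps the hash independent of object key order.
  return fnv1a([c.street, c.locality, c.region, c.postcode].map(canonicalize).join("|"))
}
```

With this sketch, `123 Main St` and `123 MAIN STREET` in the same city produce the same hash, while `124 Main St` produces a different one. A toy map like this would also mangle names such as "St Louis", which is one reason a real normalizer is needed.
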
## Use cases

- **Deduplication** — `GROUP BY address_id` collapses records at the same
  canonical address without running the fuzzy matcher.
- **Cross-dataset joins** — deterministic exact-match join key for linking
  records across data sources.
- **Indexing** — ordered by state prefix for efficient range scans.

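
Outside a database, the deduplication case amounts to bucketing records by their key. The following is a minimal in-memory sketch of that grouping, assuming each record already carries a precomputed key. `AddressRecord` and `groupByAddressId` are hypothetical names, not part of the package.

```typescript
// Hypothetical record shape: any row that already has an address key attached.
interface AddressRecord {
  source: string
  addressId: string
}

// In-memory equivalent of `GROUP BY address_id`.
function groupByAddressId(records: AddressRecord[]): Map<string, AddressRecord[]> {
  const groups = new Map<string, AddressRecord[]>()
  for (const record of records) {
    const bucket = groups.get(record.addressId)
    if (bucket) {
      bucket.push(record)
    } else {
      groups.set(record.addressId, [record])
    }
  }
  return groups
}
```

Records from different sources that share a key land in the same bucket, which is also the basis of the cross-dataset join: an exact string comparison replaces a fuzzy match.
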
## Related

- [`@mailwoman/match`](../match) — the fuzzy matcher (complementary, not competing)
- [`@mailwoman/normalize`](../normalize) — canonicalization used by the content hash
- [`@mailwoman/codex`](../codex) — ZIP → state prefix resolution
- [`@mailwoman/formatter`](../formatter) — `canonicalKey` (also deterministic, used for blocking)

## License

[AGPL-3.0-only](https://www.gnu.org/licenses/agpl-3.0.html)

package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@mailwoman/address-id",
-"version": "4.
+"version": "4.11.0",
 "description": "Turn a canonicalized + geocoded address into a stable, parseable primary key (`<state>.<H3-cell>.<hash>`) for deterministic record joins / dedup — the exact-match complement to the fuzzy matcher.",
 "license": "AGPL-3.0-only",
 "repository": {
@@ -14,8 +14,8 @@
 ".": "./out/index.js"
 },
 "dependencies": {
-"@mailwoman/codex": "4.
-"@mailwoman/normalize": "4.
+"@mailwoman/codex": "4.11.0",
+"@mailwoman/normalize": "4.11.0",
 "h3-js": "^4.4.0"
 },
 "devDependencies": {