emailcanon 0.1.0__tar.gz
- emailcanon-0.1.0/.github/workflows/lint.yml +29 -0
- emailcanon-0.1.0/.github/workflows/test.yml +27 -0
- emailcanon-0.1.0/.gitignore +31 -0
- emailcanon-0.1.0/LICENSE +5 -0
- emailcanon-0.1.0/PKG-INFO +367 -0
- emailcanon-0.1.0/README.md +355 -0
- emailcanon-0.1.0/emailcanon/__init__.py +3 -0
- emailcanon-0.1.0/emailcanon/_types.py +68 -0
- emailcanon-0.1.0/emailcanon/core.py +231 -0
- emailcanon-0.1.0/emailcanon/providers.py +172 -0
- emailcanon-0.1.0/emailcanon/py.typed +0 -0
- emailcanon-0.1.0/pyproject.toml +53 -0
- emailcanon-0.1.0/tests/__init__.py +0 -0
- emailcanon-0.1.0/tests/config.py +165 -0
- emailcanon-0.1.0/tests/edgecases.py +88 -0
- emailcanon-0.1.0/tests/gmail.py +33 -0
- emailcanon-0.1.0/tests/providers.py +95 -0
emailcanon-0.1.0/.github/workflows/lint.yml
ADDED
@@ -0,0 +1,29 @@
name: Lint

on:
  push:
    branches: [master]
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dev dependencies
        run: pip install -e ".[dev]"

      - name: Ruff (lint)
        run: ruff check .

      - name: Ruff (format check)
        run: ruff format --check .

      - name: Mypy (type check)
        run: mypy
emailcanon-0.1.0/.github/workflows/test.yml
ADDED
@@ -0,0 +1,27 @@
name: Tests

on:
  push:
    branches: [master]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install package
        run: pip install -e ".[dev]"

      - name: Run tests
        run: python -m unittest discover -s tests -p "*.py" -v
emailcanon-0.1.0/.gitignore
ADDED
@@ -0,0 +1,31 @@
# Python
__pycache__/
*.py[cod]
*.pyo
*.pyd
*.so
*.egg
*.egg-info/
dist/
build/
.eggs/

# Virtual environments
.venv/
venv/
env/

# Type checking
.mypy_cache/

# Testing
.pytest_cache/
.coverage
htmlcov/

# Ruff
.ruff_cache/

# IDE
.vscode/
.idea/
emailcanon-0.1.0/LICENSE
ADDED
@@ -0,0 +1,5 @@
Copyright 2026 grMLEqomlkkU5Eeinz4brIrOVCUCkJuN

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
emailcanon-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,367 @@
Metadata-Version: 2.4
Name: emailcanon
Version: 0.1.0
Summary: A Python library for email canonicalization
License: MIT
License-File: LICENSE
Requires-Python: >=3.12
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# emailCanon

A Python library for email address canonicalization and normalization. Normalizes email addresses according to provider-specific rules (Gmail, Outlook, Yahoo, etc.) to help identify duplicate accounts and standardize email formats.

This is the Python equivalent of [@grml/nomadic](https://github.com/grMLEqomlkkU5Eeinz4brIrOVCUCkJuN/nomad).

## Features

- **Provider-aware normalization**: Applies provider-specific rules (sub-address stripping, dot removal, case-folding)
- **Canonical domain collapsing**: Maps alias domains to canonical domains (e.g., `googlemail.com` becomes `gmail.com`)
- **Sub-address stripping**: Removes subaddresses (e.g., `user+tag` becomes `user`) per provider
- **RFC-compliant**: Handles quoted local parts and validates email structure
- **Customizable**: Extend with custom providers or override default rules
- **Built-in providers**: Gmail, Microsoft (Outlook/Hotmail), Yahoo, Apple iCloud, Fastmail, ProtonMail, and 10+ others

## Installation

```bash
pip install emailcanon
```

## Quick Start

```python
from emailcanon import normalizeEmail, getEmailProvider, isSameEmail

# Basic normalization
normalized = normalizeEmail("Test.User+Tag@Gmail.com")
print(normalized)  # "testuser@gmail.com"

# Get the provider ID
provider = getEmailProvider("user@outlook.com")
print(provider)  # "microsoft"

# Check if two emails are equivalent
same = isSameEmail("john.doe+newsletter@gmail.com", "johndoe@googlemail.com")
print(same)  # True
```

## API Reference

### `normalizeEmail(email: str, options: NormalizeOptions | None = None) -> str`

Normalizes an email address to its canonical form.

Applies provider-specific rules including:
- Sub-address stripping (e.g., `+tag` for Gmail)
- Dot removal (Gmail ignores dots in the local part)
- Case folding of the local part
- Canonical domain mapping (alias domains to primary domain)

**Parameters:**
- `email`: The email address to normalize
- `options`: Optional normalization options (see `NormalizeOptions` below)

**Returns:** Canonical email string. The string is returned regardless of
whether the address is structurally valid; this function discards the `valid`
flag. If you need the validity flag or the parsed components, use
[`normalizeEmailDetailed`](#normalizeemaildetailedemail-str-options-normalizeoptions--none--none---normalizedemail).

**Raises:** `TypeError` if email is not a string

**Examples:**
```python
normalizeEmail("john.doe+newsletter@GMAIL.COM")  # "johndoe@gmail.com"
normalizeEmail("First.Name+tag@outlook.com")  # "first.name@outlook.com" (Microsoft keeps dots)
normalizeEmail("user-tag@yahoo.com")  # "user@yahoo.com"
normalizeEmail("\"quoted.local\"@example.com")  # "\"quoted.local\"@example.com"
```
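
To make the rule list above concrete, here is a minimal sketch of how such rules could compose. It is not emailCanon's implementation: `SketchRule`, `SKETCH_RULES`, and `sketch_normalize` are hypothetical names, the order in which the rules are applied is an assumption, and quoted local parts and validation are left out. The rule values are copied from the Supported Providers table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRule:
    """Hypothetical stand-in for a provider rule (not the library's ProviderRule)."""

    domains: tuple[str, ...]
    canonical_domain: str | None = None
    lowercase_local: bool = True
    remove_dots: bool = False
    separators: tuple[str, ...] = ()


# Values taken from the Supported Providers table; only two providers shown.
SKETCH_RULES = (
    SketchRule(("gmail.com", "googlemail.com"), "gmail.com", True, True, ("+",)),
    SketchRule(("yahoo.com", "ymail.com", "rocketmail.com"), None, True, False, ("-",)),
)


def sketch_normalize(email: str) -> str:
    """Apply the documented rules in one plausible order (an assumption)."""
    local, _, domain = email.rpartition("@")
    domain = domain.lower()
    rule = next((r for r in SKETCH_RULES if domain in r.domains), None)
    if rule is None:
        # Unknown domain: lowercase the domain only, keep the local part.
        return f"{local}@{domain}"
    for sep in rule.separators:
        local = local.split(sep, 1)[0]  # drop everything after the separator
    if rule.remove_dots:
        local = local.replace(".", "")
    if rule.lowercase_local:
        local = local.lower()
    return f"{local}@{rule.canonical_domain or domain}"


print(sketch_normalize("John.Doe+newsletter@GMAIL.COM"))  # johndoe@gmail.com
print(sketch_normalize("user-tag@yahoo.com"))  # user@yahoo.com
print(sketch_normalize("Person@Example.com"))  # Person@example.com
```

Keeping each rule as plain data means that supporting another provider in this sketch only requires adding one more `SketchRule` entry.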

### `normalizeEmailDetailed(email: str, options: NormalizeOptions | None = None) -> NormalizedEmail`

Normalizes an email and returns detailed information about the normalization.

**Returns:** `NormalizedEmail` object with:
- `normalized`: Canonical email string
- `local`: Normalized local part
- `domain`: Normalized domain
- `providerId`: ID of matched provider (e.g., "gmail"), or `None`
- `subaddress`: Extracted sub-address (e.g., "tag" from "user+tag"), or `None`
  (an empty tag such as `user+` yields `None`, just like having no separator)
- `valid`: Whether the email is structurally valid (see
  [Validity flag limitations](#validity-flag-limitations))

**Example:**
```python
result = normalizeEmailDetailed("john+newsletter@gmail.com")
# NormalizedEmail(
#     normalized="john@gmail.com",
#     local="john",
#     domain="gmail.com",
#     providerId="gmail",
#     subaddress="newsletter",
#     valid=True
# )
```

### `getEmailProvider(email: str, options: NormalizeOptions | None = None) -> str | None`

Returns the provider ID for an email address, or `None` if no provider matches.

**Parameters:**
- `email`: The email address
- `options`: Optional normalization options

**Returns:** Provider ID string (e.g., "gmail", "microsoft") or `None`

**Raises:** `TypeError` if email is not a string

**Example:**
```python
getEmailProvider("user@gmail.com")  # "gmail"
getEmailProvider("name@outlook.com")  # "microsoft"
getEmailProvider("person@example.com")  # None
```

### `isSameEmail(a: str, b: str, options: NormalizeOptions | None = None) -> bool`

Checks if two email addresses normalize to the same canonical form.

Useful for detecting duplicate accounts where users registered with different aliases.

**Parameters:**
- `a`, `b`: Email addresses to compare
- `options`: Optional normalization options

**Returns:** `True` if both emails normalize identically, `False` otherwise

**Example:**
```python
isSameEmail("john.doe@gmail.com", "johndoe+spam@googlemail.com")  # True
isSameEmail("john@example.com", "jane@example.com")  # False
```

## Configuration

### `NormalizeOptions`

Control normalization behavior via options:

```python
from emailcanon import NormalizeOptions, ProviderRule, normalizeEmail

options = NormalizeOptions(
    lowercaseDomain=True,  # Default: True
    providers=[  # Custom providers to add/override
        ProviderRule(
            id="custom",
            domains=["custom.example.com"],
            lowercaseLocal=True,
            removeDots=True,
            subaddressSeparators=["+"]
        )
    ],
    replaceDefaultProviders=False,  # Keep built-in providers
    defaultRule=None  # Rule for unknown domains
)

normalized = normalizeEmail("user@custom.example.com", options)
```

**Caching note:** The provider registry built from a `NormalizeOptions`
instance is memoized per options object (keyed on identity), so reusing the
same `options` across many calls (the typical bulk-deduplication pattern)
avoids rebuilding the registry every time. Because the cache is keyed on object
identity, changes made to an `options` object after its first use are not
picked up; construct a fresh `NormalizeOptions` instead of mutating one.
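
The behaviour described in the caching note can be illustrated with a small, self-contained sketch of identity-keyed memoization. This is an assumption about the general technique, not the library's code: `SketchOptions`, `sketch_registry`, and the module-level cache are hypothetical.

```python
from __future__ import annotations


class SketchOptions:
    """Hypothetical options object; stands in for NormalizeOptions."""

    def __init__(self, domains: list[str]) -> None:
        self.domains = domains


# Cache keyed on object identity. The options object is stored next to the
# registry so it stays alive and its id() cannot be reused by another object.
_registry_cache: dict[int, tuple[SketchOptions, frozenset[str]]] = {}
build_count = 0


def sketch_registry(options: SketchOptions) -> frozenset[str]:
    """Build the registry once per options object, then reuse it."""
    global build_count
    cached = _registry_cache.get(id(options))
    if cached is not None:
        return cached[1]
    build_count += 1
    registry = frozenset(d.lower() for d in options.domains)
    _registry_cache[id(options)] = (options, registry)
    return registry


shared = SketchOptions(["Custom.Example.com"])
first = sketch_registry(shared)
second = sketch_registry(shared)  # cache hit: nothing is rebuilt

shared.domains.append("late.example.com")  # mutation after first use
stale = sketch_registry(shared)  # still the registry built before the mutation

fresh = sketch_registry(SketchOptions(shared.domains))  # new object, rebuilt
print(build_count)  # 2
print("late.example.com" in stale, "late.example.com" in fresh)  # False True
```

The last two lines show the trade-off: reuse is cheap, but only a new options object triggers a rebuild.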

### `ProviderRule`

Define custom email provider rules:

```python
ProviderRule(
    id="my_provider",  # Unique identifier
    domains=["example.com", "mail.example.com"],  # Domain patterns
    lowercaseLocal=True,  # Convert local part to lowercase
    removeDots=True,  # Remove dots from local part
    subaddressSeparators=["+", "-"],  # Characters that separate subaddress
    canonicalDomain="example.com"  # Map all domains to this
)
```

## Supported Providers

| Provider | ID | Domains | Rules |
|----------|----|---------|-------|
| **Gmail** | `gmail` | gmail.com, googlemail.com | Lowercase, remove dots, `+` subaddress, maps to gmail.com |
| **Microsoft** | `microsoft` | outlook.com*, hotmail.com*, live.com*, msn.com, others | Lowercase, `+` subaddress |
| **Yahoo** | `yahoo` | yahoo.com*, ymail.com, rocketmail.com | Lowercase, `-` subaddress |
| **Apple iCloud** | `icloud` | icloud.com, me.com, mac.com | Lowercase, `+` subaddress |
| **Fastmail** | `fastmail` | fastmail.com, fastmail.fm | Lowercase, `+` subaddress |
| **ProtonMail** | `proton` | protonmail.com, protonmail.ch, proton.me, pm.me | Lowercase, `+` subaddress |
| **Yandex** | `yandex` | yandex.com, yandex.ru, ya.ru, others | Lowercase, `+` subaddress |
| **Zoho** | `zoho` | zoho.com, zohomail.com, zoho.eu | Lowercase, `+` subaddress |
| **Tutanota** | `tutanota` | tutanota.com, tutanota.de, tutamail.com, tuta.com, others | Lowercase, `+` subaddress |
| **Posteo** | `posteo` | posteo.de, posteo.net | Lowercase, `+` subaddress |
| **Mailbox.org** | `mailbox` | mailbox.org | Lowercase, `+` subaddress |
| **Mailfence** | `mailfence` | mailfence.com | Lowercase, `+` subaddress |
| **Runbox** | `runbox` | runbox.com | Lowercase, `+` subaddress |
| **Pobox** | `pobox` | pobox.com | Lowercase, `+` subaddress |
| **AOL** | `aol` | aol.com, aim.com | Lowercase |

*\* Multiple regional variants supported (com.au, co.uk, de, fr, etc.)*

## Examples

### Duplicate Account Detection

```python
from emailcanon import normalizeEmail

emails = [
    "john.doe@gmail.com",
    "johndoe+shopping@googlemail.com",
    "john.doe@yahoo.com",
    "J.DOE@GMAIL.COM",
]

# Group by normalized form
normalized_map = {}
for email in emails:
    norm = normalizeEmail(email)
    if norm not in normalized_map:
        normalized_map[norm] = []
    normalized_map[norm].append(email)

# Find duplicates
for norm_email, originals in normalized_map.items():
    if len(originals) > 1:
        print(f"Duplicates: {originals} map to {norm_email}")
# Output: Duplicates: ['john.doe@gmail.com', 'johndoe+shopping@googlemail.com'] map to johndoe@gmail.com
# ("J.DOE@GMAIL.COM" normalizes to jdoe@gmail.com, so it is not grouped with the others)
```

### Custom Provider Rules

```python
from emailcanon import normalizeEmail, NormalizeOptions, ProviderRule

# Add custom provider
options = NormalizeOptions(
    providers=[
        ProviderRule(
            id="company",
            domains=["company.com", "corp.company.com"],
            canonicalDomain="company.com",
            lowercaseLocal=True,
            subaddressSeparators=["+"]
        )
    ]
)

normalizeEmail("User+Team@corp.company.com", options)
# "user@company.com"
```

### Skip Default Providers

```python
from emailcanon import normalizeEmail, NormalizeOptions, ProviderRule

# Use only custom providers, ignore built-in ones
options = NormalizeOptions(
    replaceDefaultProviders=True,
    providers=[
        ProviderRule(
            id="custom",
            domains=["custom.local"],
            lowercaseLocal=True
        )
    ]
)

normalizeEmail("User@custom.local", options)
# "user@custom.local"
```

## Design Notes

- **Conservative by default**: Unknown domains get minimal normalization (lowercase domain only, local part unchanged)
- **Quoted strings**: RFC 5321 quoted local parts (e.g., `"user name"@example.com`) are preserved verbatim
- **Domain validation**: Domains must follow standard DNS naming (labels separated by dots, alphanumeric + hyphens)
- **Immutable rules**: Provider rules are frozen dataclasses; mutation is not possible
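
As an illustration of the "immutable rules" note, the sketch below shows how a frozen dataclass behaves in plain Python. `SketchProviderRule` is a hypothetical class whose field names mirror the README; it is not the library's `ProviderRule`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SketchProviderRule:
    """Hypothetical frozen rule; field names mirror the README, not the source."""

    id: str
    domains: tuple[str, ...]
    removeDots: bool = False


rule = SketchProviderRule(id="company", domains=("company.com",))

try:
    rule.removeDots = True  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# To change a rule, derive a new instance instead of mutating the old one.
updated = replace(rule, removeDots=True)
print(mutation_blocked, rule.removeDots, updated.removeDots)  # True False True
```

Deriving a new instance with `dataclasses.replace` fits the advice in the caching note: build a new object rather than changing one that is already in use.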

### Validity flag limitations

The `valid` flag returned by `normalizeEmailDetailed` is a pragmatic structural
check, not a full RFC 5321/5322 validator. In particular, the domain check has
two known limitations:

- **Single-label hosts are rejected.** The domain must contain at least one dot,
  so hosts like `localhost` are reported as `valid=False`, even though they are
  deliverable in some environments.
- **Non-ASCII / IDN domains are rejected.** The check is ASCII-only, so
  internationalized domains such as `münchen.de` are reported as
  `valid=False`. Pre-encode them to Punycode (`xn--mnchen-3ya.de`) if you need
  them to pass.

These limitations only affect the `valid` flag; normalization of the local part
and domain is still performed in all cases.
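
The two limitations can be reproduced with a short sketch. `sketch_domain_ok` is a hypothetical structural check written for illustration, and its label rule (no leading or trailing hyphen, at most 63 characters) is an assumption rather than the library's exact logic. The Punycode step uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
import re

# One DNS label: ASCII letters, digits, and hyphens, not starting or ending
# with a hyphen (assumed rule, see the lead-in).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def sketch_domain_ok(domain: str) -> bool:
    """Hypothetical structural check mirroring the two documented limitations."""
    labels = domain.split(".")
    if len(labels) < 2:  # single-label hosts such as "localhost" fail
        return False
    return all(_LABEL.fullmatch(label) for label in labels)


def to_punycode(domain: str) -> str:
    """Pre-encode an internationalized domain with the built-in IDNA codec."""
    return domain.encode("idna").decode("ascii")


print(sketch_domain_ok("localhost"))  # False
print(sketch_domain_ok("münchen.de"))  # False
print(to_punycode("münchen.de"))  # xn--mnchen-3ya.de
print(sketch_domain_ok(to_punycode("münchen.de")))  # True
```

Encoding before the check, as in the last line, is the workaround the second bullet describes.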

## Why Email Canonicalization?

Email addresses can look different but deliver to the same mailbox:

| Input | Gmail Reality |
|-------|---------------|
| `john.doe@gmail.com` | `johndoe@gmail.com` (dots ignored) |
| `johndoe+newsletter@gmail.com` | `johndoe@gmail.com` (subaddress stripped) |
| `johndoe@googlemail.com` | `johndoe@gmail.com` (domain alias) |

Without canonicalization, a user could register multiple accounts. emailCanon standardizes these to detect and prevent duplicate registrations.

## Development

Set up a local virtual environment and install the package with its dev
dependencies (mypy, ruff):

```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install the package in editable mode with dev extras
pip install -e ".[dev]"
```

Then run the tooling:

```bash
# Run the test suite (the test files are named after the area they cover,
# e.g. gmail.py, so discovery needs an explicit pattern)
python -m unittest discover -s tests -p "*.py"

# ...or run a single test module directly
python -m unittest tests.gmail

mypy  # type-check
ruff format  # format with tabs
ruff check  # lint
```
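
The explicit `-p "*.py"` matters because `unittest` discovery only looks for files matching `test*.py` by default. The sketch below demonstrates this in a throwaway directory. The module body and the `count_discovered` helper are hypothetical; only the file name `gmail.py` comes from the package's test layout.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path

# Hypothetical test module body; the file name "gmail.py" mirrors tests/gmail.py.
MODULE_SOURCE = textwrap.dedent(
    """
    import unittest

    class GmailNaming(unittest.TestCase):
        def test_placeholder(self):
            self.assertTrue(True)
    """
)


def count_discovered(start_dir: str, pattern: str) -> int:
    """Count the test cases that discovery finds for a file-name pattern."""
    suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    return suite.countTestCases()


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "gmail.py").write_text(MODULE_SOURCE)
    default_count = count_discovered(tmp, "test*.py")  # unittest's default pattern
    explicit_count = count_discovered(tmp, "*.py")  # pattern used by this project

print(default_count, explicit_count)  # 0 1
```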

When you're done, leave the environment with `deactivate`.

## References

Provider rules compiled from:
- Wikipedia, [Email address](https://en.wikipedia.org/wiki/Email_address) (sub-addressing section)
- [aaronbassett's Email sub-addressing gist](https://gist.github.com/aaronbassett/2f8b3a26cf54e5e1fc9c)
- [validator.js](https://github.com/validatorjs/validator.js) normalizeEmail conventions
- Official provider documentation (Fastmail, Microsoft Learn, Proton)

## License

MIT