jmap-email 0.1.0__tar.gz
- jmap_email-0.1.0/CHANGELOG.md +13 -0
- jmap_email-0.1.0/CONTRIBUTING.md +79 -0
- jmap_email-0.1.0/LICENSE +22 -0
- jmap_email-0.1.0/PKG-INFO +472 -0
- jmap_email-0.1.0/README.md +442 -0
- jmap_email-0.1.0/SECURITY.md +46 -0
- jmap_email-0.1.0/examples/compose_with_attachment.py +43 -0
- jmap_email-0.1.0/examples/encoded_word_subject.py +39 -0
- jmap_email-0.1.0/examples/import_eml_safely.py +63 -0
- jmap_email-0.1.0/examples/inline_image_roundtrip.py +62 -0
- jmap_email-0.1.0/examples/parse_and_print.py +37 -0
- jmap_email-0.1.0/jmap_email/__init__.py +112 -0
- jmap_email-0.1.0/jmap_email/composer.py +1081 -0
- jmap_email-0.1.0/jmap_email/helpers.py +198 -0
- jmap_email-0.1.0/jmap_email/limits.py +76 -0
- jmap_email-0.1.0/jmap_email/parser.py +1931 -0
- jmap_email-0.1.0/jmap_email/py.typed +0 -0
- jmap_email-0.1.0/jmap_email/types.py +234 -0
- jmap_email-0.1.0/pyproject.toml +180 -0
- jmap_email-0.1.0/tests/__init__.py +0 -0
- jmap_email-0.1.0/tests/test_address_fuzz.py +372 -0
- jmap_email-0.1.0/tests/test_composer.py +3118 -0
- jmap_email-0.1.0/tests/test_composer_fuzz.py +433 -0
- jmap_email-0.1.0/tests/test_helpers.py +224 -0
- jmap_email-0.1.0/tests/test_limits.py +241 -0
- jmap_email-0.1.0/tests/test_message_fuzz.py +724 -0
- jmap_email-0.1.0/tests/test_parser.py +4064 -0
- jmap_email-0.1.0/tests/test_parser_structure.py +917 -0
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to `jmap-email` are documented here.
|
|
4
|
+
|
|
5
|
+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
6
|
+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
|
+
|
|
8
|
+
## [0.1.0] - 2026-06-08
|
|
9
|
+
|
|
10
|
+
Initial release. Extracted from the
|
|
11
|
+
[Messages](https://github.com/suitenumerique/messages) project.
|
|
12
|
+
|
|
13
|
+
[0.1.0]: https://github.com/suitenumerique/messages/releases/tag/jmap-email-0.1.0
|
|
@@ -0,0 +1,79 @@
|
|
|
1
|
+
# Contributing to `jmap-email`
|
|
2
|
+
|
|
3
|
+
Thanks for considering a contribution. This package is small and
|
|
4
|
+
focused; the bar for accepting changes is that they make the library
|
|
5
|
+
more correct, more spec-conformant, or better-documented without
|
|
6
|
+
adding runtime dependencies.
|
|
7
|
+
|
|
8
|
+
## Development environment
|
|
9
|
+
|
|
10
|
+
Two paths are supported.
|
|
11
|
+
|
|
12
|
+
### Docker (matches CI)
|
|
13
|
+
|
|
14
|
+
From the repository root:
|
|
15
|
+
|
|
16
|
+
```bash
|
|
17
|
+
make test-jmap-email # full test suite
|
|
18
|
+
make typecheck-jmap-email # ty (Astral)
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
These spin up the same container image CI uses, so the only divergence
|
|
22
|
+
between local results and CI is the host architecture (arm64 vs x86_64).
|
|
23
|
+
|
|
24
|
+
### Native Python 3.14.5+
|
|
25
|
+
|
|
26
|
+
```bash
|
|
27
|
+
cd src/jmap-email
|
|
28
|
+
pip install -e '.[dev]'
|
|
29
|
+
|
|
30
|
+
pytest # default selection — fuzz tests excluded
|
|
31
|
+
pytest -m fuzz # property-based / Hypothesis fuzz tests
|
|
32
|
+
ruff check .
|
|
33
|
+
ruff format --check .
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
## Pull-request checklist
|
|
37
|
+
|
|
38
|
+
Every PR should:
|
|
39
|
+
|
|
40
|
+
- Add or update a test that fails without the change and passes with
|
|
41
|
+
it. For parser fixes, the test goes in `tests/test_parser.py` near
|
|
42
|
+
the closest existing class; for composer fixes,
|
|
43
|
+
`tests/test_composer.py`; for shape-helper fixes,
|
|
44
|
+
`tests/test_helpers.py`.
|
|
45
|
+
- Keep `make typecheck-jmap-email` green. `ty` is the source of truth
|
|
46
|
+
for type contracts.
|
|
47
|
+
- Not introduce a runtime dependency. The package's value comes
|
|
48
|
+
from being a clean stdlib wrapper; new deps are rejected unless they
|
|
49
|
+
ship a CVE fix the stdlib won't.
|
|
50
|
+
- Update `CHANGELOG.md` under the `Unreleased` heading when the change
|
|
51
|
+
is user-visible.
|
|
52
|
+
- Update `README.md` when public API surface, conformance status, or
|
|
53
|
+
resource defaults move.
|
|
54
|
+
|
|
55
|
+
## Coding conventions
|
|
56
|
+
|
|
57
|
+
- **PEP 585 / PEP 604 typing.** Use `list[X]` / `dict[K, V]` /
|
|
58
|
+
`X | None` rather than `typing.List[X]` etc. No `from __future__
|
|
59
|
+
import annotations` — the supported floor is 3.14.5.
|
|
60
|
+
- **No legacy stdlib imports inside hot paths.** If a regex or
|
|
61
|
+
`email.utils` helper is the wrong tool, write the loop.
|
|
62
|
+
- **Module-private symbols** are prefixed with `_`. Anything in
|
|
63
|
+
`__all__` is part of the wire contract — changes that rename or
|
|
64
|
+
remove a name require a major-version bump (post-1.0) or a clear
|
|
65
|
+
`CHANGELOG` Removed entry (during 0.x).
|
|
66
|
+
|
|
67
|
+
## Adding a regression test for a new CVE / paper
|
|
68
|
+
|
|
69
|
+
1. Add the test to the appropriate `tests/` module under the
|
|
70
|
+
`TestParserSecurityRegressions` or `TestComposerRFCAudit` class
|
|
71
|
+
(whichever fits).
|
|
72
|
+
2. Reference the CVE / paper by id in the test docstring.
|
|
73
|
+
3. Add the entry to the [defense matrix](README.md#defense-matrix) in
|
|
74
|
+
the README.
|
|
75
|
+
|
|
76
|
+
## Security-sensitive changes
|
|
77
|
+
|
|
78
|
+
See `SECURITY.md`. Don't open a public PR or issue for a vulnerability
|
|
79
|
+
before coordinating disclosure.
|
jmap_email-0.1.0/LICENSE
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Agence Nationale de la Cohésion des Territoires (ANCT)
|
|
4
|
+
and contributors.
|
|
5
|
+
|
|
6
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
7
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
8
|
+
in the Software without restriction, including without limitation the rights
|
|
9
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
10
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
11
|
+
furnished to do so, subject to the following conditions:
|
|
12
|
+
|
|
13
|
+
The above copyright notice and this permission notice shall be included in all
|
|
14
|
+
copies or substantial portions of the Software.
|
|
15
|
+
|
|
16
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
17
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
18
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
19
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
20
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
21
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
22
|
+
SOFTWARE.
|
|
@@ -0,0 +1,472 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: jmap-email
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: A strict-JMAP RFC 8621 Email object library for Python 3.14+ with lenient RFC 5322 / MIME parsing and strict-by-design composition. Zero runtime dependencies.
|
|
5
|
+
Project-URL: Homepage, https://github.com/suitenumerique/messages
|
|
6
|
+
Project-URL: Repository, https://github.com/suitenumerique/messages
|
|
7
|
+
Project-URL: Bug Tracker, https://github.com/suitenumerique/messages/issues
|
|
8
|
+
Project-URL: Changelog, https://github.com/suitenumerique/messages/blob/main/src/jmap-email/CHANGELOG.md
|
|
9
|
+
Author-email: ANCT <contact@suite.anct.gouv.fr>
|
|
10
|
+
License-Expression: MIT
|
|
11
|
+
License-File: LICENSE
|
|
12
|
+
Keywords: composer,email,jmap,mime,parser,rfc5322,rfc8621
|
|
13
|
+
Classifier: Development Status :: 4 - Beta
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
16
|
+
Classifier: Natural Language :: English
|
|
17
|
+
Classifier: Programming Language :: Python :: 3
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.14
|
|
19
|
+
Classifier: Topic :: Communications :: Email
|
|
20
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
21
|
+
Requires-Python: <4.0,>=3.14.5
|
|
22
|
+
Provides-Extra: dev
|
|
23
|
+
Requires-Dist: hypothesis>=6.151.0; extra == 'dev'
|
|
24
|
+
Requires-Dist: pylint>=4.0.4; extra == 'dev'
|
|
25
|
+
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
|
|
26
|
+
Requires-Dist: pytest>=9.0.0; extra == 'dev'
|
|
27
|
+
Requires-Dist: ruff>=0.15.0; extra == 'dev'
|
|
28
|
+
Requires-Dist: ty>=0.0.44; extra == 'dev'
|
|
29
|
+
Description-Content-Type: text/markdown
|
|
30
|
+
|
|
31
|
+
# jmap-email
|
|
32
|
+
|
|
33
|
+
A strict-JMAP RFC 8621 Email object library for Python 3.14+, with
|
|
34
|
+
lenient RFC 5322 / MIME parsing and strict-by-design composition.
|
|
35
|
+
**Zero runtime dependencies** — the package is a clean wrapper around
|
|
36
|
+
the Python stdlib `email` package, plus null-safe shape accessors over
|
|
37
|
+
the JMAP Email object.
|
|
38
|
+
|
|
39
|
+
The codebase came out of operating an inbound mail pipeline; every CVE
|
|
40
|
+
and research result in the [defense matrix](#defense-matrix) below has
|
|
41
|
+
a regression test under `tests/`.
|
|
42
|
+
|
|
43
|
+
> Status: **beta** while the public API stabilizes. Wire shape
|
|
44
|
+
> conforms to RFC 8621 §4 today; future 0.1.x releases will only add
|
|
45
|
+
> fields, never remove or rename them.
|
|
46
|
+
|
|
47
|
+
## Why a Python 3.14.5 floor?
|
|
48
|
+
|
|
49
|
+
The standard library `email` package receives frequent bug fixes
|
|
50
|
+
between patch releases, and this library wraps it directly — every fix
|
|
51
|
+
to header parsing, RFC 2047 encoded-words, address-list defects, etc.
|
|
52
|
+
surfaces immediately in our output. The 3.14.5 floor is not arbitrary:
|
|
53
|
+
it carries the fix for
|
|
54
|
+
[gh-128110](https://github.com/python/cpython/issues/128110)
|
|
55
|
+
(RFC 2047 §6.2 encoded-word adjacent-pair spacing under modern
|
|
56
|
+
policies), which materially affects the composer.
|
|
57
|
+
|
|
58
|
+
**Aligning on the latest 3.14.x patch is recommended for any
|
|
59
|
+
production deployment.** Each CPython patch release that touches
|
|
60
|
+
`email` is one less class of malformed-input edge case downstream
|
|
61
|
+
pipelines need to paper over manually.
|
|
62
|
+
|
|
63
|
+
## Quick start
|
|
64
|
+
|
|
65
|
+
```bash
|
|
66
|
+
pip install jmap-email
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
```python
|
|
70
|
+
import jmap_email
|
|
71
|
+
|
|
72
|
+
# Parse raw RFC 5322 bytes → JMAP Email object dict (RFC 8621 §4),
|
|
73
|
+
# or None when the input is fundamentally unparseable (empty, non-bytes,
|
|
74
|
+
# stdlib produced no Message, etc.). parse_email never raises — the
|
|
75
|
+
# failure mode is a single `is None` check at the call site.
|
|
76
|
+
email = jmap_email.parse_email(raw_bytes)
|
|
77
|
+
if email is None:
|
|
78
|
+
... # log + skip / 400 / quarantine — caller's choice
|
|
79
|
+
|
|
80
|
+
# Recoverable damage (a salvageable malformed header, an unknown
|
|
81
|
+
# charset that fell back to utf-8/replace, …) surfaces in
|
|
82
|
+
# email["_ext"]["defects"] when you opt into the project-extension
|
|
83
|
+
# namespace:
|
|
84
|
+
email_with_ext = jmap_email.parse_email(raw_bytes, extensions=True)
|
|
85
|
+
defects = (email_with_ext or {}).get("_ext", {}).get("defects") or []
|
|
86
|
+
email["subject"] # str | None (NFC normalised)
|
|
87
|
+
email["from"] # [{"name": str | None, "email": str}, ...] | None
|
|
88
|
+
email["sentAt"] # ISO-8601 with offset, e.g. "2026-06-08T14:30:00+02:00"
|
|
89
|
+
email["textBody"] # JMAP EmailBodyPart[]
|
|
90
|
+
email["bodyValues"] # {partId: {"value", "isEncodingProblem", "isTruncated"}}
|
|
91
|
+
email["headers"] # [{"name": "<wire-case>", "value": "<raw>"}, ...]
|
|
92
|
+
email["hasAttachment"] # bool
|
|
93
|
+
email["preview"] # str (≤256 chars, plain-text)
|
|
94
|
+
|
|
95
|
+
# Strict-by-design composer accepts the same JMAP shape on input.
|
|
96
|
+
# sentAt is required (RFC 5322 §3.6.1) — pass it explicitly.
|
|
97
|
+
raw = jmap_email.compose_email({
|
|
98
|
+
"from": [{"name": "Alice", "email": "alice@example.com"}],
|
|
99
|
+
"to": [{"name": "Bob", "email": "bob@example.com"}],
|
|
100
|
+
"subject": "hi",
|
|
101
|
+
"sentAt": "2026-06-08T12:00:00+00:00",
|
|
102
|
+
"textBody": [{"partId": "1", "type": "text/plain", "content": "hello"}],
|
|
103
|
+
})
|
|
104
|
+
# raw is RFC 5322 bytes ready for SMTP delivery (e.g.
|
|
105
|
+
# smtplib.SMTP.sendmail handles dot-stuffing for you).
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
## Conformance
|
|
109
|
+
|
|
110
|
+
`parse_email()` produces a JMAP Email object per RFC 8621 §4 with the
|
|
111
|
+
following defaults, matching `Email/get` `defaultProperties`:
|
|
112
|
+
|
|
113
|
+
| Property | Default emitted? | Notes |
|
|
114
|
+
| ------------------- | ---------------- | -------------------------------------- |
|
|
115
|
+
| Email metadata (`id`, `blobId`, `threadId`, `mailboxIds`, `keywords`, `size`, `receivedAt`) | No | Server-set; out of parser scope |
|
|
116
|
+
| `subject` | Yes | NFC-normalised; `null` when absent |
|
|
117
|
+
| `from` / `sender` / `to` / `cc` / `bcc` / `replyTo` | Yes | `EmailAddress[]` or `null` |
|
|
118
|
+
| `messageId` / `inReplyTo` / `references` | Yes | `String[]` (no `<>`) or `null` |
|
|
119
|
+
| `sentAt` | Yes | ISO-8601 with offset; `null` when absent |
|
|
120
|
+
| `headers` | Yes | `[{name, value}]` ordered; `value` is RFC 8621 Raw form (byte-faithful, NOT encoded-word-decoded) |
|
|
121
|
+
| `textBody` / `htmlBody` / `attachments` | Yes | `EmailBodyPart[]` per RFC 8621 §4.1.4 |
|
|
122
|
+
| `hasAttachment` | Yes | |
|
|
123
|
+
| `preview` | Yes | ≤256-char plain-text excerpt; HTML-stripped + whitespace-normalised |
|
|
124
|
+
| `bodyValues` | Yes | `{partId: EmailBodyValue}` per §4.1.4; text-body parts then carry metadata only |
|
|
125
|
+
| `bodyStructure` | Opt-in | `parse_email(raw, body_structure=True)` |
|
|
126
|
+
| `_ext` | Opt-in | `parse_email(raw, extensions=True)` — project extensions; see below |
|
|
127
|
+
|
|
128
|
+
Parser-only fields (`preview`, `bodyValues`, `bodyStructure`,
|
|
129
|
+
`hasAttachment`, `_ext`) are ignored on composer input; passing them
|
|
130
|
+
through `compose_email` is harmless.
|
|
131
|
+
|
|
132
|
+
### Project extensions (`_ext`)
|
|
133
|
+
|
|
134
|
+
`extensions=True` adds a single `_ext` sub-dict to the output.
|
|
135
|
+
These fields are NOT in RFC 8621 — they expose information the parser
|
|
136
|
+
already computes so consumers don't have to re-walk the message:
|
|
137
|
+
|
|
138
|
+
- `_ext.defects` — stdlib `MessageDefect` class names collected during
|
|
139
|
+
the parse walk; useful for message-store quarantine policies (the
|
|
140
|
+
Mailman pattern).
|
|
141
|
+
- `_ext.resent` — Resent-* typed projection (see below). Present only
|
|
142
|
+
when the wire carries at least one Resent-* header.
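To give a feel for what a list of defect class names looks like, the sketch below uses only the stdlib `email` package on a message that declares `multipart/mixed` without a boundary. The helper `defect_names` is hypothetical; how the library itself walks the message and fills `_ext.defects` is not shown in this document.

```python
import email
from email import policy

raw = (
    b"From: alice@example.com\r\n"
    b"Content-Type: multipart/mixed\r\n"  # no boundary parameter: a defect
    b"\r\n"
    b"body\r\n"
)

msg = email.message_from_bytes(raw, policy=policy.compat32)


def defect_names(message) -> list[str]:
    """Collect defect class names from a message and all of its sub-parts."""
    names: list[str] = []
    for part in message.walk():
        names.extend(type(defect).__name__ for defect in part.defects)
    return names


names = defect_names(msg)
```

A quarantine policy could then match on names such as `NoBoundaryInMultipartDefect` instead of re-parsing the message.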
|
|
143
|
+
|
|
144
|
+
### `EmailBodyPart` extensions
|
|
145
|
+
|
|
146
|
+
RFC 8621 §4.1.4 lists the `EmailBodyPart` shape as `partId`, `blobId`,
|
|
147
|
+
`size`, `headers`, `name`, `type`, `charset`, `disposition`, `cid`,
|
|
148
|
+
`language`, `location`, `subParts`. The library extends that shape
|
|
149
|
+
with two project fields. Where each shows up:
|
|
150
|
+
|
|
151
|
+
| Location | `content` | `sha256` |
|
|
152
|
+
|------------------------|--------------------------|----------|
|
|
153
|
+
| `attachments[i]` | always (`bytes`) | always |
|
|
154
|
+
| `textBody[i]` / `htmlBody[i]` with `body_values=False` | yes (`str` for text/*, base64 `str` for inline media) | no |
|
|
155
|
+
| `textBody[i]` / `htmlBody[i]` with `body_values=True` | absent — content moves to `bodyValues` per §4.1.4 | no |
|
|
156
|
+
| `bodyStructure` and its `subParts` tree | never | never |
|
|
157
|
+
|
|
158
|
+
- `content` exists because the library has no blob store to satisfy
|
|
159
|
+
the spec's `blobId` → fetch-by-blob contract. Callers need the
|
|
160
|
+
bytes somewhere on the part. Attachment `content` is never
|
|
161
|
+
stripped; text/html `content` follows the `body_values` flag.
|
|
162
|
+
- `sha256` is the hex digest of the part's decoded bytes — useful
|
|
163
|
+
for dedup / blob storage. Attachment parts only.
|
|
164
|
+
|
|
165
|
+
`bodyStructure` is pure RFC 8621 shape — no project fields appear
|
|
166
|
+
in that tree, so a strict JMAP consumer can ingest it as-is. Strict
|
|
167
|
+
consumers should ignore unknown keys elsewhere. Composer input that
|
|
168
|
+
includes these fields is harmless — the composer ignores parser-only
|
|
169
|
+
metadata.
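The dedup use case for `sha256` can be sketched as follows. This assumes attachment parts are dicts carrying `content` as `bytes`, as the table above describes; `sha256_hex` and `dedup_attachments` are hypothetical helpers, and the sample parts are made up for illustration.

```python
import hashlib


def sha256_hex(decoded: bytes) -> str:
    """Hex digest of a part's decoded bytes."""
    return hashlib.sha256(decoded).hexdigest()


def dedup_attachments(attachments: list[dict]) -> dict[str, bytes]:
    """Map digest -> bytes, storing each distinct payload only once."""
    store: dict[str, bytes] = {}
    for part in attachments:
        content = part["content"]
        # Prefer a digest already present on the part; compute it otherwise.
        digest = part.get("sha256") or sha256_hex(content)
        store.setdefault(digest, content)
    return store


attachments = [
    {"name": "a.pdf", "content": b"%PDF-1.7 demo"},
    {"name": "copy-of-a.pdf", "content": b"%PDF-1.7 demo"},
    {"name": "b.txt", "content": b"hello"},
]
store = dedup_attachments(attachments)  # two entries: the PDF bytes are shared
```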
|
|
170
|
+
|
|
171
|
+
### Duplicate scalar headers
|
|
172
|
+
|
|
173
|
+
RFC 5322 §3.6 marks From / Sender / Reply-To / To / Cc / Bcc /
|
|
174
|
+
Message-ID / In-Reply-To / References / Subject / Date as `max=1` —
|
|
175
|
+
each may appear at most once. Real-world senders sometimes emit
|
|
176
|
+
duplicates anyway. The parser follows the stdlib
|
|
177
|
+
`email.message.Message[name]` convention: when a header is repeated,
|
|
178
|
+
the first occurrence wins for the scalar JMAP projection. Every
|
|
179
|
+
occurrence still appears in the `headers` list in document order.
|
|
180
|
+
Background: see "Detection of Weak Links in Authentication Chains",
|
|
181
|
+
USENIX Security 2020.
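The first-occurrence convention is visible directly in the stdlib. The sketch below parses a message with two `From:` headers under `compat32`; it shows the stdlib behaviour the paragraph refers to, not the library's own projection code.

```python
import email
from email import policy

raw = (
    b"From: first@example.com\r\n"
    b"From: second@example.com\r\n"
    b"Subject: duplicate From demo\r\n"
    b"\r\n"
    b"body\r\n"
)
msg = email.message_from_bytes(raw, policy=policy.compat32)

scalar_from = msg["From"]       # indexing returns the first occurrence
all_from = msg.get_all("From")  # every occurrence, in document order
headers = [{"name": name, "value": value} for name, value in msg.items()]
```

Keeping every occurrence in `headers` lets a downstream consumer detect the duplication even though the scalar projection shows only one value.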
|
|
182
|
+
|
|
183
|
+
### Resent-* projection (`_ext.resent`)
|
|
184
|
+
|
|
185
|
+
RFC 8621 §4.1.3 names only the 11 base header convenience properties;
|
|
186
|
+
Resent-* is not on that list. The library pre-computes it as a §4.1.2
|
|
187
|
+
typed-projection idiom and exposes it under `_ext.resent` so forwarded /
|
|
188
|
+
resent mail handling doesn't need to walk `parsed["headers"]`. Sub-
|
|
189
|
+
fields mirror the base properties (`_ext.resent["from"]`,
|
|
190
|
+
`["sender"]`, `["replyTo"]`, `["to"]`, `["cc"]`, `["bcc"]`,
|
|
191
|
+
`["messageId"]`, `["date"]`), and the sub-dict is omitted entirely
|
|
192
|
+
when no Resent-* header is present on the wire.
|
|
193
|
+
|
|
194
|
+
### Pragmatic deviations from RFC 8621
|
|
195
|
+
|
|
196
|
+
Two places where the parser knowingly deviates from the spec text.
|
|
197
|
+
Both are conscious choices for downstream safety; flagging them so
|
|
198
|
+
the contract is explicit:
|
|
199
|
+
|
|
200
|
+
- **`headers[i].value` is not strictly "Raw" form.** RFC 8621 §4.1.2
|
|
201
|
+
defines "Raw" as byte-faithful except for `CRLF+WSP` unfolding.
|
|
202
|
+
We additionally:
|
|
203
|
+
- Strip NUL (`\x00`) bytes — PostgreSQL `TEXT` cannot store NUL, so a
|
|
204
|
+
spec-faithful value would crash any downstream insert. Carrying
|
|
205
|
+
them through and dropping them at the storage boundary would also
|
|
206
|
+
be wrong (different stores would handle them differently).
|
|
207
|
+
- Truncate at `max_header_value_bytes` (default 102 400) — the stdlib
|
|
208
|
+
`_header_value_parser` has quadratic-time hot spots on adversarial
|
|
209
|
+
inputs (gh-136063); truncating early bounds wall-clock.
|
|
210
|
+
The `EmailBodyPart.headers[i].value` field follows the same policy.
|
|
211
|
+
|
|
212
|
+
- **Inline media isn't added to `attachments` in the `multipart/alternative`
|
|
213
|
+
nullified-branch case.** The spec algorithm in §4.1.4 has a clause
|
|
214
|
+
`if ((!htmlBody || !textBody) && isInlineMediaType(part)) attachments.push(part)`.
|
|
215
|
+
We don't honor it. Effect: in the narrow case where a `multipart/
|
|
216
|
+
alternative` ancestor has nullified one body branch and the message
|
|
217
|
+
contains inline `image/*` / `audio/*` / `video/*`, the inline media
|
|
218
|
+
appears in the surviving body but not in `attachments`. Matches what
|
|
219
|
+
Gmail / Apple Mail render; differs from a strict spec walker.
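The first deviation (NUL stripping plus a byte cap on header values) can be sketched in a few lines. This is an assumption about the general approach: `sanitise_header_value` is a hypothetical name, and the order of the two steps and the handling of a multi-byte character at the cut point are choices made for this example only.

```python
DEFAULT_MAX_HEADER_VALUE_BYTES = 102_400  # default quoted above


def sanitise_header_value(
    value: str, max_bytes: int = DEFAULT_MAX_HEADER_VALUE_BYTES
) -> str:
    """Strip NUL characters, then cap the UTF-8 size at max_bytes."""
    cleaned = value.replace("\x00", "")
    encoded = cleaned.encode("utf-8", errors="replace")
    if len(encoded) <= max_bytes:
        return cleaned
    # Cut on a byte boundary; errors="ignore" drops a partial trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Truncating before any expensive structured parsing is what bounds the worst-case cost; truncating afterwards would not help against quadratic-time inputs.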
|
|
220
|
+
|
|
221
|
+
## Resource limits
|
|
222
|
+
|
|
223
|
+
The parser enforces hard caps against adversarial input. Caps are
|
|
224
|
+
passed per-call via a frozen `ParseLimits` instance; the default
|
|
225
|
+
applies when no value is supplied.
|
|
226
|
+
|
|
227
|
+
| Attribute | Default | Source |
|
|
228
|
+
| ---------------------------- | ------- | ---------------------------------------- |
|
|
229
|
+
| `max_mime_nesting_depth` | 100 | Postfix `mime_nesting_limit` |
|
|
230
|
+
| `max_mime_parts` | 1000 | Go `multipartmaxparts` |
|
|
231
|
+
| `max_header_value_bytes` | 102 400 | Postfix `header_size_limit` |
|
|
232
|
+
| `max_address_list_bytes` | 100 000 | Dovecot CVE-2024-23184 analogue |
|
|
233
|
+
|
|
234
|
+
Excess input is truncated without raising an exception, and the truncation is logged at WARNING level.
|
|
235
|
+
|
|
236
|
+
A single process can host multiple workloads with different caps —
|
|
237
|
+
the limits travel with the call, never via shared module state:
|
|
238
|
+
|
|
239
|
+
```python
|
|
240
|
+
from jmap_email import ParseLimits, parse_email
|
|
241
|
+
|
|
242
|
+
bulk = ParseLimits(max_mime_parts=5000, max_mime_nesting_depth=200)
|
|
243
|
+
gateway = ParseLimits(max_mime_parts=500)
|
|
244
|
+
|
|
245
|
+
parse_email(big_archive_message, limits=bulk)
|
|
246
|
+
parse_email(inbound_smtp_bytes, limits=gateway)
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
`ParseLimits` is frozen and hashable; instances can be reused freely
|
|
250
|
+
across threads and as cache keys.
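The "frozen and hashable" property can be illustrated with a frozen dataclass. `SketchLimits` below is a stand-in written for this example, with defaults copied from the table above; it is an assumption about the shape, not the actual `ParseLimits` definition.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SketchLimits:
    """Stand-in for a frozen limits object; defaults copied from the table."""

    max_mime_nesting_depth: int = 100
    max_mime_parts: int = 1000
    max_header_value_bytes: int = 102_400
    max_address_list_bytes: int = 100_000


gateway = SketchLimits(max_mime_parts=500)
same_gateway = SketchLimits(max_mime_parts=500)

# Equal field values give equal hashes, so instances work as dict / cache keys.
cache = {gateway: "gateway-profile"}
looked_up = cache[same_gateway]

try:
    gateway.max_mime_parts = 9999  # rejected: the dataclass is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because no instance can change after construction, sharing one across threads needs no locking.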
|
|
251
|
+
|
|
252
|
+
## Strict-compose, lenient-parse
|
|
253
|
+
|
|
254
|
+
The two entry points use **different stdlib `email.policy` instances
|
|
255
|
+
on purpose**:
|
|
256
|
+
|
|
257
|
+
| Direction | Policy | Why |
|
|
258
|
+
|---|---|---|
|
|
259
|
+
| **Compose** (`compose_email`) | `email.policy.SMTP` (cloned, CTE 7-bit) | Caller-controlled input → must produce strictly RFC-compliant output. Enforces address-list folding, RFC 2047 / 2231 encoding, CRLF, line-length limits. |
|
|
260
|
+
| **Parse** (`parse_email`) | `email.policy.compat32` | Real-world inbound MIME violates the spec routinely. `compat32` is lenient: it returns raw header strings and recovers what it can from broken Content-Transfer-Encoding, missing charsets, malformed structural delimiters. |
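The difference between the two policies shows up with the stdlib alone. In this sketch, `compat32` hands back the raw encoded-word while `email.policy.default` (used here purely for contrast) decodes it, and a cloned `SMTP` policy emits CRLF line endings on output. Variable names are illustrative; the library's internal policy objects are not shown in this document.

```python
import email
from email import policy
from email.message import EmailMessage

raw = b"Subject: =?utf-8?q?caf=C3=A9?=\r\n\r\nbody\r\n"

# Lenient parse: compat32 returns the header value as it appeared on the wire.
lenient = email.message_from_bytes(raw, policy=policy.compat32)
raw_subject = lenient["Subject"]

# For contrast: a modern policy decodes the RFC 2047 encoded-word.
modern = email.message_from_bytes(raw, policy=policy.default)
decoded_subject = str(modern["Subject"])

# Strict compose: SMTP policy cloned with a 7-bit content transfer encoding.
strict_policy = policy.SMTP.clone(cte_type="7bit")
out = EmailMessage(policy=strict_policy)
out["From"] = "alice@example.com"
out["To"] = "bob@example.com"
out["Subject"] = "hi"
out.set_content("hello")
wire = out.as_bytes()  # every line ends in CRLF
```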
|
|
261
|
+
|
|
262
|
+
### Parser failure mode
|
|
263
|
+
|
|
264
|
+
`parse_email` is total: it returns a `JmapEmail` dict on success or
|
|
265
|
+
`None` on fundamental failure (empty bytes, wrong type, stdlib
|
|
266
|
+
producing no `Message`, or any unhandled internal error). All failures
|
|
267
|
+
log at WARNING level. No exception escapes.
|
|
268
|
+
|
|
269
|
+
```python
|
|
270
|
+
parsed = parse_email(raw)
|
|
271
|
+
if parsed is None:
|
|
272
|
+
logger.warning("dropped unparseable message")
|
|
273
|
+
return
|
|
274
|
+
... # use parsed
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
Recoverable damage (a salvageable malformed header, an unknown
|
|
278
|
+
charset, etc.) keeps the parse on track — those are surfaced in
|
|
279
|
+
`parsed["_ext"]["defects"]` when the caller opts in via
|
|
280
|
+
`parse_email(raw, extensions=True)`.
|
|
281
|
+
|
|
282
|
+
### Composer error hierarchy
|
|
283
|
+
|
|
284
|
+
On invalid input, `compose_email` raises a typed exception that subclasses `ComposeError`.
|
|
285
|
+
Callers that don't want to discriminate can catch `ComposeError` only;
|
|
286
|
+
callers that do can dispatch on the subclass:
|
|
287
|
+
|
|
288
|
+
```text
|
|
289
|
+
ComposeError
|
|
290
|
+
├── InvalidAddressError # missing/malformed `from`, `to`, …
|
|
291
|
+
├── InvalidMessageIdError # Message-ID / In-Reply-To / References / Content-ID
|
|
292
|
+
├── InvalidDateError # `sentAt` missing or unparseable
|
|
293
|
+
├── AttachmentError # missing content, bad base64, bad MIME type, …
|
|
294
|
+
└── HeaderInjectionError # custom-header name not RFC 5322 ftext
|
|
295
|
+
```
|
|
296
|
+
|
|
297
|
+
The composer is strict on every input the caller controls. Silently
|
|
298
|
+
substituting `now()` for a missing `sentAt`, or quietly dropping a
|
|
299
|
+
broken attachment, would be invisible data loss for the sender.
|
|
300
|
+
|
|
301
|
+
- Want "now" for `sentAt`? Use the `now_sent_at()` helper:
|
|
302
|
+
`compose_email({..., "sentAt": now_sent_at(), ...})`.
|
|
303
|
+
- Handling flaky attachment input? Wrap the compose call in
|
|
304
|
+
`try / except ComposeError` (the base class catches every
|
|
305
|
+
composer error subclass — `InvalidAddressError`,
|
|
306
|
+
`AttachmentError`, etc. — at once).
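The dispatch pattern can be sketched with local stand-in classes that mirror the hierarchy shown above. These classes and the `fake_compose` function are defined here only so the example runs on its own; in real code the exception types would come from the package.

```python
class ComposeError(Exception):
    """Stand-in base class mirroring the hierarchy above."""


class InvalidAddressError(ComposeError): ...
class InvalidMessageIdError(ComposeError): ...
class InvalidDateError(ComposeError): ...
class AttachmentError(ComposeError): ...
class HeaderInjectionError(ComposeError): ...


def fake_compose(kind: str) -> bytes:
    """Hypothetical composer stand-in that fails according to `kind`."""
    if kind == "attachment":
        raise AttachmentError("bad base64")
    if kind == "date":
        raise InvalidDateError("sentAt missing")
    if kind == "address":
        raise InvalidAddressError("malformed `to`")
    return b"ok"


def outcome(kind: str) -> str:
    """Handle specific subclasses first, then fall back to the base class."""
    try:
        fake_compose(kind)
    except AttachmentError:
        return "attachment rejected"
    except InvalidDateError:
        return "date rejected"
    except ComposeError:
        return "other compose error"
    return "sent"
```

Listing the specific `except` clauses before the base class matters: Python takes the first matching clause, so `ComposeError` placed first would swallow every subclass.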
|
|
307
|
+
|
|
308
|
+
## Shape helpers
|
|
309
|
+
|
|
310
|
+
Most JMAP header fields are lists or `null`: `from`, `to`, `messageId`, `headers`, …
|
|
311
|
+
Reading them safely usually means writing `parsed.get("from") or []`,
|
|
312
|
+
then indexing, then `.get`. Skip that with these helpers:
|
|
313
|
+
|
|
314
|
+
```python
|
|
315
|
+
from jmap_email import (
|
|
316
|
+
first_address, first_address_email, first_address_name,
|
|
317
|
+
first_msgid, msgid_chain, sent_at_to_datetime,
|
|
318
|
+
find_header, find_headers, has_header,
|
|
319
|
+
body_part_text, body_text_joined,
|
|
320
|
+
)
|
|
321
|
+
```
|
|
322
|
+
|
|
323
|
+
About `body_part_text(parsed, part)`: a text body part can have its
|
|
324
|
+
text stored two ways depending on how `parse_email` was called. Either
|
|
325
|
+
the text is right on the part (`part["content"]`), or it's in a
|
|
326
|
+
separate map (`parsed["bodyValues"][part["partId"]]["value"]`). This
|
|
327
|
+
helper checks both, so your code keeps working if the parser default
|
|
328
|
+
ever flips.
|
|
329
|
+
|
|
330
|
+
About `now_sent_at()`: returns the current UTC time formatted as the
|
|
331
|
+
ISO-8601 string `compose_email` expects for `sentAt`. One-liner instead
|
|
332
|
+
of `datetime.now(timezone.utc).isoformat()`.
|
|
333
|
+
|
|
334
|
+
## Validators
|
|
335
|
+
|
|
336
|
+
Want to know if a string would be accepted by `compose_email` as a
|
|
337
|
+
Message-ID without actually trying to compose? Use `is_valid_msg_id`:
|
|
338
|
+
|
|
339
|
+
```python
|
|
340
|
+
from jmap_email import is_valid_msg_id
|
|
341
|
+
|
|
342
|
+
if is_valid_msg_id(parent_header):
|
|
343
|
+
reply["inReplyTo"] = [parent_header]
|
|
344
|
+
```
|
|
345
|
+
|
|
346
|
+
It applies exactly the same checks `compose_email` does — shape,
|
|
347
|
+
length ceiling, no embedded whitespace — but returns `True`/`False`
|
|
348
|
+
instead of raising. Useful for lenient parse paths (archive importers,
|
|
349
|
+
inbound salvaging) that need to decide between keeping a raw id and
|
|
350
|
+
falling back to synthesis without catching an exception.
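A boolean validator of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the name `looks_like_msg_id`, the regular expression, and the 998-character ceiling (borrowed from the RFC 5322 line-length limit) are not the library's actual rules.

```python
import re

# Assumed ceiling, borrowed from the RFC 5322 line-length limit.
MAX_MSG_ID_LENGTH = 998

# JMAP-style id without angle brackets: left@right, no whitespace.
_MSG_ID_SHAPE = re.compile(r"[^\s<>@]+@[^\s<>@]+")


def looks_like_msg_id(value: object) -> bool:
    """Return True/False instead of raising, mirroring the validator idea."""
    if not isinstance(value, str):
        return False
    if len(value) > MAX_MSG_ID_LENGTH:
        return False
    return _MSG_ID_SHAPE.fullmatch(value) is not None
```

Returning a boolean keeps importer loops free of `try / except` blocks when most ids are expected to be fine and a few are not.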
|
|
351
|
+
|
|
352
|
+
## Strict vs. lenient `parse_address`
|
|
353
|
+
|
|
354
|
+
`parse_address(s)` is **strict by default**: an input that can't be
|
|
355
|
+
parsed into a valid addr-spec returns `("", "")`. Use this for entry-
|
|
356
|
+
point validation (CLI flags, web form input) — `parse_address("no-at")`
|
|
357
|
+
returning `("", "")` lets the caller reject garbage without a second
|
|
358
|
+
`"@" in result` check.
|
|
359
|
+
|
|
360
|
+
Pass `lenient=True` for archive-import paths that must preserve the
|
|
361
|
+
original wire bytes even when invalid:
|
|
362
|
+
|
|
363
|
+
```python
|
|
364
|
+
parse_address("no-at-sign") # → ("", "")
|
|
365
|
+
parse_address("no-at-sign", lenient=True) # → ("", "no-at-sign")
|
|
366
|
+
```
|
|
367
|
+
|
|
368
|
+
`parse_addresses(s)` is always strict per-entry: tuples whose addr-spec
|
|
369
|
+
fails the shape check are silently dropped — so
|
|
370
|
+
`len(parse_addresses(header)) != header.count(",") + 1` is expected
|
|
371
|
+
when the header carries garbage between real entries.
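The strict and lenient behaviours described here can be mimicked on top of the stdlib `email.utils.getaddresses`. The shape check below is deliberately crude and the helper names (`has_addr_spec_shape`, `strict_pair`, `strict_list`) are hypothetical; the library's real addr-spec validation is stricter and is not shown in this document.

```python
from email.utils import getaddresses


def has_addr_spec_shape(addr: str) -> bool:
    """Crude check: exactly one '@', text on both sides, no spaces."""
    local, at, domain = addr.partition("@")
    return bool(at and local and domain) and "@" not in domain and " " not in addr


def strict_pair(name: str, addr: str, lenient: bool = False) -> tuple[str, str]:
    """Return ("", "") for a bad address unless lenient is set."""
    if has_addr_spec_shape(addr) or lenient:
        return (name, addr)
    return ("", "")


def strict_list(header: str) -> list[tuple[str, str]]:
    """Parse an address-list header and silently drop malformed entries."""
    return [
        (name, addr)
        for name, addr in getaddresses([header])
        if has_addr_spec_shape(addr)
    ]


header = "Alice <alice@example.com>, garbage, bob@example.com"
kept = strict_list(header)  # the "garbage" entry is dropped
```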
|
|
372
|
+
|
|
373
|
+
## Defense matrix
|
|
374
|
+
|
|
375
|
+
The parser explicitly defends against the documented attack classes
|
|
376
|
+
below. See the `tests/` directory for regression coverage of each.
|
|
377
|
+
|
|
378
|
+
- **CVE-2023-27043** — `parseaddr`/`getaddresses` display-name confusion
|
|
379
|
+
- **CVE-2024-6923** — header-injection via embedded newlines (compose)
|
|
380
|
+
- **CVE-2024-21742** — Apache James `\r\n` in fields
|
|
381
|
+
- **CVE-2024-23184** — Dovecot unbounded address-list allocation
|
|
382
|
+
- **CVE-2002-1337** — Sendmail `crackaddr` nested-comments shape
|
|
383
|
+
- **CVE-2002-2325** — Pine empty-boundary infinite loop
|
|
384
|
+
- **gh-114906** — embedded newline in RFC 2047 encoded-word
|
|
385
|
+
- **gh-136063** — quadratic-time hot spots in `_header_value_parser`
|
|
386
|
+
- **gh-137687** — base64 padding `==` truncation
|
|
387
|
+
- **PortSwigger "Splitting the Email Atom"** (DEF CON 32 2024) —
|
|
388
|
+
encoded-word smuggling of structural chars (`@`, `,`, `<`, `>`, NUL)
|
|
389
|
+
- **Inbox Invasion (CCS '24)** — duplicate boundary parser confusion
|
|
390
|
+
- **Mailsploit** — NUL-byte truncation in encoded-words
|
|
391
|
+
- **USENIX 2020 "Weak Links in Auth Chains"** — duplicate `From:`,
|
|
392
|
+
group-syntax, CFWS-in-address handling
|
|
393
|
+
|
|
394
|
+
## Compatibility
|
|
395
|
+
|
|
396
|
+
- **Python** 3.14.5+ (see [Why a Python 3.14.5 floor?](#why-a-python-3145-floor))
|
|
397
|
+
- **Platforms tested in CI:** Linux on x86_64 and arm64
|
|
398
|
+
- **macOS / Windows / PyPy / free-threaded build:** untested; expected
|
|
399
|
+
to work since the package has zero compiled extensions and zero
|
|
400
|
+
runtime dependencies. Reports of breakage welcome via the issue
|
|
401
|
+
tracker.
|
|
402
|
+
|
|
403
|
+
## Performance and concurrency
|
|
404
|
+
|
|
405
|
+
- **Thread-safe** at the public API level. Module-level state
|
|
406
|
+
(`_HEADER_FACTORY`, `_POLICY`) is constructed once at import and
|
|
407
|
+
never mutated after.
|
|
408
|
+
- **No I/O.** Every entry point operates on in-memory bytes or dicts.
|
|
409
|
+
- **No global rate limits or singletons** beyond the immutable
|
|
410
|
+
registries above. Multiple processes / asyncio tasks may call
|
|
411
|
+
`parse_email` / `compose_email` concurrently without coordination.
|
|
412
|
+
|
|
413
|
+
Ballpark wall time on an Apple M2 (single thread, in-process):
|
|
414
|
+
≈ 0.4 ms per typical 5 kB inbound message; ≈ 1 ms per 100 kB MIME
|
|
415
|
+
multipart with embedded images. Use your own corpus to measure for
|
|
416
|
+
your workload — message-shape variation dominates.
|
|
417
|
+
|
|
418
|
+
## Examples
|
|
419
|
+
|
|
420
|
+
Runnable scripts under `examples/`:
|
|
421
|
+
|
|
422
|
+
- `examples/parse_and_print.py` — parse raw bytes and pretty-print the
|
|
423
|
+
JMAP shape
|
|
424
|
+
- `examples/import_eml_safely.py` — read an `.eml` off disk, handle
|
|
425
|
+
the `None` failure path, surface defects, print key fields
|
|
426
|
+
- `examples/compose_with_attachment.py` — compose a multipart message
|
|
427
|
+
with a regular attachment
|
|
428
|
+
- `examples/inline_image_roundtrip.py` — compose + re-parse a message
|
|
429
|
+
with an inline image, asserting the CID survives
|
|
430
|
+
- `examples/encoded_word_subject.py` — compose a non-ASCII Subject
|
|
431
|
+
and re-parse it
|
|
432
|
+
|
|
433
|
+
## Development
|
|
434
|
+
|
|
435
|
+
The repository ships a docker-compose-based test environment so the
|
|
436
|
+
package can be exercised against the exact Python / pytest / hypothesis
|
|
437
|
+
versions CI uses:
|
|
438
|
+
|
|
439
|
+
```bash
|
|
440
|
+
make test-jmap-email # run the full test suite (zero infra deps)
|
|
441
|
+
make typecheck-jmap-email # static check via Astral's `ty` (Rust)
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
To run tests outside docker:
|
|
445
|
+
|
|
446
|
+
```bash
|
|
447
|
+
cd src/jmap-email
|
|
448
|
+
pip install -e '.[dev]'
|
|
449
|
+
pytest # default selection, fuzz tests excluded
|
|
450
|
+
pytest -m fuzz # property-based / Hypothesis fuzz
|
|
451
|
+
ruff check .
|
|
452
|
+
ruff format --check .
|
|
453
|
+
```
|
|
454
|
+
|
|
455
|
+
See `CONTRIBUTING.md` for the contribution workflow.
|
|
456
|
+
|
|
457
|
+
## License
|
|
458
|
+
|
|
459
|
+
MIT — see `LICENSE`.
|
|
460
|
+
|
|
461
|
+
## Versioning
|
|
462
|
+
|
|
463
|
+
Semantic. Public API is everything exported in `jmap_email.__all__`;
|
|
464
|
+
anything prefixed with `_` is internal and may change between patch
|
|
465
|
+
releases.
|
|
466
|
+
|
|
467
|
+
`__version__` is exposed at the module level.
|
|
468
|
+
|
|
469
|
+
## Security
|
|
470
|
+
|
|
471
|
+
Security-sensitive reports go through GitHub Security Advisories — see
|
|
472
|
+
`SECURITY.md` for the disclosure policy.
|