mail-parser 4.1.4__tar.gz → 4.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. mail_parser-4.2.1/.github/FUNDING.yml +3 -0
  2. mail_parser-4.2.1/.github/ISSUE_TEMPLATE/bug_report.md +33 -0
  3. mail_parser-4.2.1/.github/ISSUE_TEMPLATE/feature_request.md +17 -0
  4. mail_parser-4.2.1/.github/copilot-instructions.md +226 -0
  5. mail_parser-4.2.1/.github/instructions/containerization-docker-best-practices.instructions.md +681 -0
  6. mail_parser-4.2.1/.github/instructions/github-actions-ci-cd-best-practices.instructions.md +607 -0
  7. mail_parser-4.2.1/.github/instructions/markdown.instructions.md +63 -0
  8. mail_parser-4.2.1/.github/instructions/python.instructions.md +56 -0
  9. mail_parser-4.2.1/.github/workflows/main.yml +158 -0
  10. mail_parser-4.2.1/.gitignore +23 -0
  11. mail_parser-4.2.1/.markdownlint.json +15 -0
  12. mail_parser-4.2.1/.pre-commit-config.yaml +56 -0
  13. mail_parser-4.2.1/Dockerfile +29 -0
  14. mail_parser-4.2.1/Makefile +56 -0
  15. mail_parser-4.2.1/PKG-INFO +507 -0
  16. mail_parser-4.2.1/README.md +482 -0
  17. mail_parser-4.2.1/docker-compose.yml +10 -0
  18. mail_parser-4.2.1/docs/images/Bitcoin SpamScope.jpg +0 -0
  19. mail_parser-4.2.1/pyproject.toml +103 -0
  20. {mail_parser-4.1.4 → mail_parser-4.2.1}/src/mailparser/__init__.py +0 -2
  21. {mail_parser-4.1.4 → mail_parser-4.2.1}/src/mailparser/__main__.py +1 -3
  22. mail_parser-4.2.1/src/mailparser/const.py +101 -0
  23. {mail_parser-4.1.4 → mail_parser-4.2.1}/src/mailparser/core.py +89 -87
  24. {mail_parser-4.1.4 → mail_parser-4.2.1}/src/mailparser/exceptions.py +0 -1
  25. {mail_parser-4.1.4 → mail_parser-4.2.1}/src/mailparser/utils.py +218 -118
  26. {mail_parser-4.1.4 → mail_parser-4.2.1}/src/mailparser/version.py +1 -2
  27. mail_parser-4.2.1/tests/mails/mail_malformed_1 +1660 -0
  28. mail_parser-4.2.1/tests/mails/mail_malformed_2 +60 -0
  29. mail_parser-4.2.1/tests/mails/mail_malformed_3 +345 -0
  30. mail_parser-4.2.1/tests/mails/mail_outlook_1 +0 -0
  31. mail_parser-4.2.1/tests/mails/mail_test_1 +858 -0
  32. mail_parser-4.2.1/tests/mails/mail_test_10 +4186 -0
  33. mail_parser-4.2.1/tests/mails/mail_test_11 +856 -0
  34. mail_parser-4.2.1/tests/mails/mail_test_12 +17 -0
  35. mail_parser-4.2.1/tests/mails/mail_test_13 +1421 -0
  36. mail_parser-4.2.1/tests/mails/mail_test_14 +33 -0
  37. mail_parser-4.2.1/tests/mails/mail_test_15 +5684 -0
  38. mail_parser-4.2.1/tests/mails/mail_test_16 +26 -0
  39. mail_parser-4.2.1/tests/mails/mail_test_17 +102 -0
  40. mail_parser-4.2.1/tests/mails/mail_test_18 +14 -0
  41. mail_parser-4.2.1/tests/mails/mail_test_19 +12 -0
  42. mail_parser-4.2.1/tests/mails/mail_test_2 +19588 -0
  43. mail_parser-4.2.1/tests/mails/mail_test_3 +55 -0
  44. mail_parser-4.2.1/tests/mails/mail_test_4 +8257 -0
  45. mail_parser-4.2.1/tests/mails/mail_test_5 +2919 -0
  46. mail_parser-4.2.1/tests/mails/mail_test_6 +2414 -0
  47. mail_parser-4.2.1/tests/mails/mail_test_7 +1434 -0
  48. mail_parser-4.2.1/tests/mails/mail_test_8 +162 -0
  49. mail_parser-4.2.1/tests/mails/mail_test_9 +68 -0
  50. mail_parser-4.2.1/tests/test_improved_received_patterns.py +167 -0
  51. mail_parser-4.2.1/tests/test_mail_parser.py +1247 -0
  52. mail_parser-4.2.1/tests/test_main.py +360 -0
  53. mail_parser-4.2.1/tests/test_received_corpus.py +307 -0
  54. mail_parser-4.2.1/tests/test_utils.py +646 -0
  55. mail_parser-4.2.1/uv.lock +1322 -0
  56. mail_parser-4.1.4/PKG-INFO +0 -338
  57. mail_parser-4.1.4/README.md +0 -300
  58. mail_parser-4.1.4/pyproject.toml +0 -3
  59. mail_parser-4.1.4/setup.cfg +0 -72
  60. mail_parser-4.1.4/setup.py +0 -20
  61. mail_parser-4.1.4/src/mail_parser.egg-info/PKG-INFO +0 -338
  62. mail_parser-4.1.4/src/mail_parser.egg-info/SOURCES.txt +0 -21
  63. mail_parser-4.1.4/src/mail_parser.egg-info/dependency_links.txt +0 -1
  64. mail_parser-4.1.4/src/mail_parser.egg-info/entry_points.txt +0 -2
  65. mail_parser-4.1.4/src/mail_parser.egg-info/requires.txt +0 -14
  66. mail_parser-4.1.4/src/mail_parser.egg-info/top_level.txt +0 -1
  67. mail_parser-4.1.4/src/mailparser/const.py +0 -98
  68. mail_parser-4.1.4/tests/test_mail_parser.py +0 -663
  69. mail_parser-4.1.4/tests/test_main.py +0 -172
  70. {mail_parser-4.1.4 → mail_parser-4.2.1}/LICENSE.txt +0 -0
  71. {mail_parser-4.1.4 → mail_parser-4.2.1}/NOTICE.txt +0 -0
@@ -0,0 +1,3 @@
1
+ # These are supported funding model platforms
2
+
3
+ github: [fedelemantuano]
@@ -0,0 +1,33 @@
1
+ ---
2
+ name: Bug report
3
+ about: Create a report to help us improve
4
+
5
+ ---
6
+
7
+ **Describe the bug**
8
+ A clear and concise description of what the bug is.
9
+
10
+ **To Reproduce**
11
+ Steps to reproduce the behavior:
12
+
13
+ 1. `import mailparser`
14
+ 2. `mail = mailparser.parse_from_file(f)`
15
+ 3. '....'
16
+ 4. See error
17
+
18
+ **Expected behavior**
19
+ A clear and concise description of what you expected to happen.
20
+
21
+ **Raw mail**
22
+ The raw mail to reproduce the behavior.
23
+ You can use a `gist` like [this](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e).
24
+ The issues without raw mail will be closed.
25
+
26
+ **Environment:**
27
+
28
+ - OS: [e.g. Linux, Windows]
29
+ - Docker: [yes or no]
30
+ - mail-parser version [e.g. 3.6.0]
31
+
32
+ **Additional context**
33
+ Add any other context about the problem here (e.g. stack traceback error).
@@ -0,0 +1,17 @@
1
+ ---
2
+ name: Feature request
3
+ about: Suggest an idea for this project
4
+
5
+ ---
6
+
7
+ **Is your feature request related to a problem? Please describe.**
8
+ A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
9
+
10
+ **Describe the solution you'd like**
11
+ A clear and concise description of what you want to happen.
12
+
13
+ **Describe alternatives you've considered**
14
+ A clear and concise description of any alternative solutions or features you've considered.
15
+
16
+ **Additional context**
17
+ Add any other context or screenshots about the feature request here.
@@ -0,0 +1,226 @@
1
+ # Copilot Instructions for mail-parser
2
+
3
+ mail-parser is a **production-grade email parsing library** for Python that transforms raw email messages into
4
+ structured Python objects. Originally built as the foundation for [SpamScope](https://github.com/SpamScope/spamscope),
5
+ it excels at security analysis, forensics, and RFC-compliant email processing.
6
+
7
+ ## Core Architecture
8
+
9
+ ### Factory-Based API Pattern
10
+
11
+ **Always use factory functions** instead of direct `MailParser()` instantiation:
12
+
13
+ ```python
14
+ import mailparser
15
+ mail = mailparser.parse_from_file(filepath) # Standard email files
16
+ mail = mailparser.parse_from_string(raw_email) # Email as string
17
+ mail = mailparser.parse_from_bytes(email_bytes) # Email as bytes
18
+ mail = mailparser.parse_from_file_msg(msg_file) # Outlook .msg files
19
+ ```
20
+
21
+ ### Triple-Format Property Access
22
+
23
+ Every parsed component offers **three access patterns** (`src/mailparser/core.py:550-570`):
24
+
25
+ ```python
26
+ mail.subject # Python object (decoded string)
27
+ mail.subject_raw # Raw header value (JSON list)
28
+ mail.subject_json # JSON-serialized version
29
+ ```
30
+
31
+ This pattern applies to all properties via `__getattr__` magic in `core.py`.
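As a reading aid for the suffix-based access described above, here is a minimal sketch of how `__getattr__` can dispatch on a `_raw` or `_json` suffix. The `TripleAccess` class, its dict-backed storage, and the exact shape of the `_raw` value are hypothetical and are not taken from `core.py`.

```python
import json


class TripleAccess:
    """Hypothetical illustration of name-suffix dispatch in __getattr__."""

    def __init__(self, headers):
        # headers: dict mapping an attribute-style name to a decoded string
        self._headers = headers

    def _lookup(self, key):
        try:
            return self._headers[key]
        except KeyError:
            raise AttributeError(key) from None

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails.
        # Refuse private names so a missing _headers cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        if name.endswith("_json"):
            return json.dumps(self._lookup(name[: -len("_json")]))
        if name.endswith("_raw"):
            # Assumption: the raw form is a JSON list of header values.
            return json.dumps([self._lookup(name[: -len("_raw")])])
        return self._lookup(name)


mail = TripleAccess({"subject": "Hello"})
subject = mail.subject            # decoded Python string
subject_raw = mail.subject_raw    # JSON list, under the assumption above
subject_json = mail.subject_json  # JSON-serialized string
```

One consequence of this design is that a header whose own name ends in `_raw` or `_json` would be shadowed by the suffix rule, so a real implementation may need extra handling.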
32
+
33
+ ### Property Naming Convention
34
+
35
+ Headers with hyphens use **underscore substitution** (`core.py:__getattr__`):
36
+
37
+ ```python
38
+ mail.X_MSMail_Priority # Accesses "X-MSMail-Priority" header
39
+ mail.Content_Type # Accesses "Content-Type" header
40
+ ```
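The underscore substitution can be approximated with the standard-library `email` package alone. The sketch below is an assumption about the general technique, not the code in `core.py`; `HeaderProxy` is a hypothetical name.

```python
from email import message_from_string


class HeaderProxy:
    """Hypothetical wrapper that maps attribute names to header names."""

    def __init__(self, message):
        self._message = message

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # "X_MSMail_Priority" -> "X-MSMail-Priority"
        header_name = name.replace("_", "-")
        value = self._message.get(header_name)
        if value is None:
            raise AttributeError(name)
        return value


raw = "X-MSMail-Priority: High\nContent-Type: text/plain\n\nbody\n"
proxy = HeaderProxy(message_from_string(raw))
priority = proxy.X_MSMail_Priority
content_type = proxy.Content_Type
```

Header lookup in `email.message.Message` is case-insensitive, so the attribute's capitalisation does not need to match the header exactly in this sketch.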
41
+
42
+ ## Development Workflows
43
+
44
+ ### Dependency Management with uv
45
+
46
+ The project uses **[uv](https://github.com/astral-sh/uv)** (modern pip/virtualenv replacement) exclusively:
47
+
48
+ ```bash
49
+ uv sync # Install all dev/test dependencies (defined in pyproject.toml)
50
+ make install # Alias for uv sync
51
+ ```
52
+
53
+ Never use `pip` directly—all commands in Makefile use `uv run` prefix.
54
+
55
+ ### Testing Patterns
56
+
57
+ ```bash
58
+ make test # pytest with coverage (generates coverage.xml, junit.xml, htmlcov/)
59
+ make lint # ruff check .
60
+ make format # ruff format .
61
+ make check # lint + test
62
+ make pre-commit # Run all pre-commit hooks
63
+ ```
64
+
65
+ When adding features or fixing bugs you MUST follow these steps:
66
+
67
+ 1. Add relevant test email to `tests/mails/` if demonstrating new case
68
+ 2. Write tests in the corresponding test file following existing patterns, under `tests/`
69
+ 3. Run `make test` to verify all tests pass before committing
70
+ 4. Run `uv run mail-parser -f tests/mails/mail_test_11 -j` to manually verify JSON output and that new changes
71
+ work as expected
72
+ 5. Run `make pre-commit` to ensure code style compliance before pushing
73
+
74
+ **Test data location**: `tests/mails/` contains malformed emails, Outlook files, and various encodings
75
+ (`mail_test_1` through `mail_test_17`, `mail_malformed_1-3`, `mail_outlook_1`).
76
+
77
+ **Critical testing rule**: When modifying parsing logic, test against malformed emails to ensure security defect
78
+ detection still works.
79
+
80
+ ### Build & Release Process
81
+
82
+ ```bash
83
+ make build # uv build → creates dist/*.tar.gz and dist/*.whl
84
+ make release # build + twine upload to PyPI
85
+ ```
86
+
87
+ Version is **dynamically loaded** from `src/mailparser/version.py` (see
88
+ `pyproject.toml:tool.hatch.version`).
89
+
90
+ ## Security-First Parsing
91
+
92
+ ### Defect Detection System
93
+
94
+ The parser identifies RFC violations that could indicate malicious intent (`core.py:240-268`):
95
+
96
+ ```python
97
+ mail.has_defects # Boolean flag
98
+ mail.defects # List of defect dicts by content type
99
+ mail.defects_categories # Set of defect class names (e.g., "StartBoundaryNotFoundDefect")
100
+ ```
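For context on where defect class names such as `StartBoundaryNotFoundDefect` come from, the standard-library `email` parser records them on each message part. The following is a minimal sketch using only the standard library; how mail-parser itself collects and groups defects is not shown in this diff, so the loop below is an assumption.

```python
from email import message_from_string

# A multipart message that declares a boundary but never uses it.
raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "This body never contains the declared boundary.\n"
)

message = message_from_string(raw)

# Collect the class name of every defect found on any part.
defect_categories = set()
for part in message.walk():
    for defect in part.defects:
        defect_categories.add(type(defect).__name__)

has_defects = bool(defect_categories)
```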
101
+
102
+ **Epilogue defect handling** (`core.py:320-335`): When `EPILOGUE_DEFECTS` are detected, parser extracts hidden
103
+ content between MIME boundaries that could contain malicious payloads.
104
+
105
+ ### IP Address Extraction
106
+
107
+ `get_server_ipaddress(trust)` method (`core.py:487-528`) extracts sender IPs with **trust-level validation**:
108
+
109
+ ```python
110
+ # Finds first non-private IP in trusted headers
111
+ mail.get_server_ipaddress(trust="Received")
112
+ ```
113
+
114
+ Filters out private IP ranges using Python's `ipaddress` module.
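The private-range filter can be illustrated with the standard-library `ipaddress` module. This is a sketch of the filtering step only; `first_public_ip` and the candidate list are hypothetical, and the way `get_server_ipaddress` extracts candidates from headers is not reproduced here.

```python
import ipaddress


def first_public_ip(candidates):
    """Return the first candidate that parses as a non-private IP, else None."""
    for candidate in candidates:
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            # Not an IP address at all; skip it.
            continue
        if not ip.is_private:
            return str(ip)
    return None


hops = ["10.0.0.5", "not-an-ip", "192.168.1.20", "8.8.8.8"]
result = first_public_ip(hops)
```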
115
+
116
+ ### Received Header Parsing
117
+
118
+ Complex regex-based parsing (`utils.py:302-360`, patterns in `const.py:24-73`) extracts hop-by-hop routing:
119
+
120
+ ```python
121
+ # Returns list of dicts with: by, from, date, date_utc, delay, envelope_from, hop, with
122
+ mail.received
123
+ ```
124
+
125
+ **Key pattern**: `RECEIVED_COMPILED_LIST` contains pre-compiled regexes for "from", "by", "with", "id", "for",
126
+ "via", "envelope-from", "envelope-sender", and date patterns. Recent fixes addressed IBM gateway duplicate matches
127
+ (see comments in `const.py:26-38`).
128
+
129
+ If parsing fails, falls back to `receiveds_not_parsed()` returning `{"raw": <header>, "hop": <n>}`
130
+ structure.
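The parse-then-fall-back behaviour described above can be sketched as follows. The patterns here are deliberately simplified and hypothetical; the real `RECEIVED_COMPILED_LIST` in `const.py` is more elaborate, and applying the fallback per header is an assumption made for illustration.

```python
import re

# Hypothetical, simplified clause patterns (not the ones in const.py).
CLAUSE_PATTERNS = {
    "from": re.compile(r"\bfrom\s+([^\s;]+)", re.IGNORECASE),
    "by": re.compile(r"\bby\s+([^\s;]+)", re.IGNORECASE),
    "with": re.compile(r"\bwith\s+([^\s;]+)", re.IGNORECASE),
}


def parse_hop(header, hop):
    """Extract known clauses; fall back to the raw header if none match."""
    parsed = {}
    for key, pattern in CLAUSE_PATTERNS.items():
        match = pattern.search(header)
        if match:
            parsed[key] = match.group(1)
    if not parsed:
        return {"raw": header, "hop": hop}
    parsed["hop"] = hop
    return parsed


good = parse_hop(
    "from mail.example.org by mx.example.net with ESMTP; "
    "Mon, 1 Jan 2024 00:00:00 +0000",
    1,
)
bad = parse_hop("unparseable garbage", 2)
```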
131
+
132
+ ## Project Structure Specifics
133
+
134
+ ### src/ Layout
135
+
136
+ Package uses modern **src-layout** (`src/mailparser/`) for cleaner imports and testing isolation:
137
+
138
+ ```text
139
+ src/mailparser/
140
+ ├── __init__.py # Exports factory functions
141
+ ├── __main__.py # CLI entry point (mail-parser command)
142
+ ├── core.py # MailParser class (760 lines)
143
+ ├── utils.py # Parsing utilities (582 lines)
144
+ ├── const.py # Regex patterns and constants
145
+ ├── exceptions.py # Exception hierarchy
146
+ └── version.py # Version string
147
+ ```
148
+
149
+ ### External Dependency: Outlook Support
150
+
151
+ Outlook `.msg` file parsing requires **system-level Perl module**:
152
+
153
+ ```bash
154
+ apt-get install libemail-outlook-message-perl # Debian/Ubuntu
155
+ ```
156
+
157
+ Triggered via `msgconvert()` function in `utils.py` that shells out to Perl script. Raises `MailParserOutlookError`
158
+ if unavailable.
159
+
160
+ ### CLI Tool Pattern
161
+
162
+ `__main__.py` provides production CLI with mutually exclusive input modes (`-f`, `-s`, `-k`), JSON output (`-j`),
163
+ and selective printing (`-b`, `-a`, `-r`, `-t`).
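Mutually exclusive input modes of this kind are typically built with `argparse.add_mutually_exclusive_group`. The sketch below uses only the short flags named above; the `dest` names, the help texts, the `required=True` setting, and treating `-k` as a boolean switch are assumptions rather than a copy of `__main__.py`.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mail-parser")

    # Exactly one input mode may be given (assumed to be required).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="file", help="read mail from a file")
    source.add_argument("-s", dest="string", help="read mail from a string")
    source.add_argument(
        "-k", dest="stdin", action="store_true", help="read mail from stdin"
    )

    # Output option, independent of the input mode.
    parser.add_argument(
        "-j", dest="json", action="store_true", help="print JSON output"
    )
    return parser


parser = build_parser()
args = parser.parse_args(["-f", "tests/mails/mail_test_11", "-j"])
```

Passing two input flags at once, for example `-f` together with `-s`, makes `argparse` print a usage error and exit, which is what enforces the exclusivity.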
164
+
165
+ **Entry point defined** in `pyproject.toml:project.scripts`:
166
+
167
+ ```toml
168
+ [project.scripts]
169
+ mail-parser = "mailparser.__main__:main"
170
+ ```
171
+
172
+ ## Code Style & Tooling
173
+
174
+ ### Ruff Configuration
175
+
176
+ Single linter/formatter (replaces black, isort, flake8):
177
+
178
+ ```toml
179
+ [tool.ruff.lint]
180
+ select = ["E", "F", "I"] # pycodestyle, pyflakes, isort
181
+ # "UP", "B", "SIM", "S", "PT" commented out in pyproject.toml
182
+ ```
183
+
184
+ ### Pytest Configuration
185
+
186
+ Key markers in `pyproject.toml:tool.pytest.ini_options`:
187
+
188
+ - `integration`: marks integration tests
189
+ - Coverage outputs: XML (for CI), HTML (for local), terminal
190
+ - JUnit XML for CI integration
191
+
192
+ ## Common Pitfalls
193
+
194
+ 1. **Don't instantiate `MailParser()` directly**—use factory functions from `__init__.py`
195
+ 2. **Don't use `pip`**—always use `uv` or Makefile targets
196
+ 3. **Don't ignore defects**—they're critical for security analysis
197
+ 4. **Don't assume headers exist**—use `.get()` pattern or handle `None`
198
+ 5. **Test against malformed emails**—`tests/mails/mail_malformed_*` files exist for this reason
199
+
200
+ ## Docker Development
201
+
202
+ Dockerfile uses **Python 3.10-slim-bookworm** with Outlook dependencies pre-installed. Container runs as non-root
203
+ `mailparser` user.
204
+
205
+ ```bash
206
+ docker build -t mail-parser .
207
+ docker run mail-parser -f /path/to/email
208
+ ```
209
+
210
+ ## Key Reference Points
211
+
212
+ - **Property implementation**: `core.py:540-730` (all `@property` decorators)
213
+ - **Attachment extraction**: `core.py:355-475` (walks multipart, handles encoding)
214
+ - **Received parsing logic**: `utils.py:302-455` + `const.py:24-73` (regex patterns)
215
+ - **CLI implementation**: `__main__.py:30-347` (argparse + output formatting)
216
+ - **Exception hierarchy**: `exceptions.py:20-60` (5 exception types)
217
+
218
+ ## Testing Strategy
219
+
220
+ When adding features:
221
+
222
+ 1. Add test email to `tests/mails/` if demonstrating new case
223
+ 2. Write tests in `tests/test_mail_parser.py` following existing patterns
224
+ 3. Test both normal and `_raw`/`_json` property variants
225
+ 4. Verify defect detection for security-relevant changes
226
+ 5. Run `make check` before committing