mailsense 0.1.0__tar.gz

@@ -0,0 +1,459 @@
1
+ Metadata-Version: 2.4
2
+ Name: mailsense
3
+ Version: 0.1.0
4
+ Summary: Automated mail intelligence pipeline: Gmail → image extraction → Gemini AI analysis
5
+ Author-email: Samapriya Roy <samapriya.roy@gmail.com>
6
+ License-Expression: Apache-2.0
7
+ Project-URL: Homepage, https://github.com/samapriya/mailsense
8
+ Project-URL: Bug Tracker, https://github.com/samapriya/mailsense/issues
9
+ Keywords: gmail,usps,informed-delivery,gemini,mail,ocr,ai
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Environment :: Console
12
+ Classifier: Intended Audience :: End Users/Desktop
13
+ Classifier: Operating System :: OS Independent
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Communications :: Email
19
+ Classifier: Topic :: Utilities
20
+ Requires-Python: >=3.10
21
+ Description-Content-Type: text/markdown
22
+ Requires-Dist: tqdm>=4.66
23
+ Requires-Dist: Pillow>=10.0
24
+ Requires-Dist: google-generativeai>=0.7
25
+ Requires-Dist: rich>=13.0
26
+ Provides-Extra: dev
27
+ Requires-Dist: pytest>=8.0; extra == "dev"
28
+ Requires-Dist: build; extra == "dev"
29
+ Requires-Dist: twine; extra == "dev"
30
+
31
+ # mailsense
32
+
33
+ > **Automated mail intelligence pipeline.**
34
+ > Gmail → `.mbox` → image extraction → Gemini AI analysis — all from one CLI.
35
+
36
+ [![PyPI version](https://img.shields.io/pypi/v/mailsense.svg)](https://pypi.org/project/mailsense/)
37
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/)
38
+ [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)
39
+
40
+ ---
41
+
42
+ <p align="center">
43
+ <img src="https://i.imgur.com/1sixczI.png" alt="mailsense icon" width="150">
44
+ </p>
45
+
46
+ ## What is mailsense?
47
+
48
+ **mailsense** turns your USPS Informed Delivery emails (or any image-heavy Gmail label) into clean, structured JSON — automatically.
49
+
50
+ The pipeline has three stages:
54
+
55
+ ```
56
+ Gmail label
57
+
58
+ ▼ mailsense download
59
+ .mbox files
60
+
61
+ ▼ mailsense extract
62
+ images + metadata.json
63
+
64
+ ▼ mailsense analyze
65
+ structured JSON (sender, recipient, postage, document type, summary…)
66
+ ```
67
+
68
+ Run each stage independently, or fire them all at once with **`mailsense pipeline`**.
69
+
70
+ ---
71
+
72
+ ## Installation
73
+
74
+ ### From PyPI
75
+
76
+ ```bash
77
+ pip install mailsense
78
+ ```
79
+
80
+ ### From source
81
+
82
+ ```bash
83
+ git clone https://github.com/samapriya/mailsense.git
84
+ cd mailsense
85
+ pip install -e .
86
+ ```
87
+
88
+ ### Build a wheel for local distribution
89
+
90
+ ```bash
91
+ pip install build
92
+ python -m build # produces dist/mailsense-*.whl and dist/mailsense-*.tar.gz
93
+ pip install dist/mailsense-*.whl
94
+ ```
95
+
96
+ ---
97
+
98
+ ## Prerequisites
99
+
100
+ | Requirement | How to get it |
101
+ |---|---|
102
+ | **Python 3.10+** | [python.org](https://www.python.org/) |
103
+ | **Gmail IMAP enabled** | Gmail → Settings → See all settings → Forwarding and POP/IMAP → Enable IMAP |
104
+ | **Gmail App Password** | [myaccount.google.com](https://myaccount.google.com) → Security → 2-Step Verification → App Passwords |
105
+ | **Gemini API key** | [aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey) — free tier: 15 RPM / 1500 RPD |
106
+
107
+ ---
108
+
109
+ ## Quick Start
110
+
111
+ ### 1. Store your credentials once
112
+
113
+ Run the interactive setup wizard — it walks through every setting, masks sensitive input with `*` as you type (paste works too), and lets you press Enter to keep the current or default value:
114
+
115
+ ```bash
116
+ mailsense config configure
117
+ ```
118
+
119
+ Or set values individually:
120
+
121
+ ```bash
122
+ mailsense config set gmail_email you@gmail.com
123
+ mailsense config set gmail_password YOUR_APP_PASSWORD
124
+ mailsense config set gmail_label "USPS Informed Delivery"
125
+ mailsense config set gemini_api_key YOUR_GEMINI_KEY
126
+ ```
127
+
128
+ Credentials are stored in `~/.mailsense` with mode `0600`, so only the file's owner can read or write them.
129
+
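To illustrate what mode `0600` means in practice, here is a minimal sketch of writing a credentials file that only its owner can read or write. The JSON layout and the helper name `write_private_config` are assumptions made for illustration; the actual on-disk format of `~/.mailsense` is not documented here. The sketch assumes a POSIX system.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_private_config(path: Path, values: dict) -> None:
    """Write values as JSON (assumed format) with owner-only permissions."""
    # Create or truncate the file with mode 0600 from the start, so the
    # secrets are never briefly readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(values, handle, indent=2)
    # The mode passed to os.open only applies to new files, so tighten
    # permissions explicitly in case the file already existed.
    os.chmod(path, 0o600)


# Demo against a temporary file rather than the real ~/.mailsense.
demo_path = Path(tempfile.mkdtemp()) / "mailsense_demo"
write_private_config(demo_path, {"gmail_label": "USPS Informed Delivery"})
demo_mode = stat.S_IMODE(demo_path.stat().st_mode)
print(oct(demo_mode))
```

Passing the mode to `os.open` avoids the window that exists when a file is created with default permissions and restricted afterwards.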
130
+ ### 2. Run the full pipeline
131
+
132
+ ```bash
133
+ # If gmail_label is saved in config, --label can be omitted:
134
+ mailsense pipeline --work-dir ./my_mail
135
+
136
+ # Or specify the label explicitly:
137
+ mailsense pipeline --label "USPS Informed Delivery" --work-dir ./my_mail
138
+ ```
139
+
140
+ Results land in:
141
+
142
+ ```
143
+ my_mail/
144
+ mbox/ ← grouped .mbox files from Gmail
145
+ images/ ← extracted mail images + metadata.json
146
+ analyzed/ ← one .json per image with AI-extracted data
147
+ ```
148
+
149
+ ---
150
+
151
+ ## Commands
152
+
153
+ ### `mailsense config`
154
+
155
+ Manage credentials and defaults stored in `~/.mailsense`.
156
+
157
+ ```bash
158
+ # Interactive wizard — prompts for all settings with masked input for secrets
159
+ mailsense config configure
160
+
161
+ # Reconfigure specific keys only
162
+ mailsense config configure gmail_email gmail_label
163
+
164
+ # Show current config (secrets masked)
165
+ mailsense config show
166
+
167
+ # Set a single value
168
+ mailsense config set gmail_label "USPS Informed Delivery"
169
+ mailsense config set api_delay 2
170
+
171
+ # Remove a value
172
+ mailsense config unset gmail_password
173
+
174
+ # List all recognised keys and descriptions
175
+ mailsense config keys
176
+ ```
177
+
178
+ **Available config keys:**
179
+
180
+ | Key | Description |
181
+ |-----|-------------|
182
+ | `gmail_email` | Gmail address used for IMAP |
183
+ | `gmail_password` | Gmail App Password |
184
+ | `gmail_label` | Gmail label to download (e.g. `USPS Informed Delivery`) |
185
+ | `gemini_api_key` | Google Gemini API key |
186
+ | `sender_filter` | From-header substring filter used during extraction (default: `usps`) |
187
+ | `gemini_model` | Gemini model name (default: `gemini-2.0-flash`) |
188
+ | `api_delay` | Seconds between Gemini requests (default: `4`) |
189
+
190
+ > **`gmail_label` vs `sender_filter`:** `gmail_label` is the Gmail folder you download from. `sender_filter` is a substring matched against the `From:` header of each email within those downloads to identify the right messages — the default `usps` matches addresses like `informeddelivery@usps.com`. You rarely need to change `sender_filter`.
191
+
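The substring match described above can be sketched as follows. The function name `matches_sender` and the case-insensitive comparison are assumptions for illustration; the package's real matching logic may differ.

```python
from email.message import Message


def matches_sender(message: Message, sender_filter: str = "usps") -> bool:
    """Return True when sender_filter occurs in the From: header.

    Assumption: the comparison ignores case.
    """
    from_header = str(message.get("From", ""))
    return sender_filter.lower() in from_header.lower()


# Two hand-built messages: one from USPS, one unrelated.
digest = Message()
digest["From"] = "USPS Informed Delivery <InformedDelivery@informeddelivery.usps.com>"

newsletter = Message()
newsletter["From"] = "Deals <news@example.com>"

print(matches_sender(digest), matches_sender(newsletter))
```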
192
+ ---
193
+
194
+ ### `mailsense download`
195
+
196
+ Download emails from a Gmail label to `.mbox` files.
197
+
198
+ ```bash
199
+ # Uses gmail_label from config if --label is not specified
200
+ mailsense download --output-dir ./mbox
201
+
202
+ # Override the label explicitly
203
+ mailsense download --label "USPS Informed Delivery" --output-dir ./mbox
204
+
205
+ # Last 90 days only, grouped by week
206
+ mailsense download --start 90d --group-by week
207
+
208
+ # Specific date range, 14-day chunks
209
+ mailsense download \
210
+ --start 01-01-2025 \
211
+ --end 12-31-2025 \
212
+ --group-by days \
213
+ --days-per-file 14
214
+
215
+ # List all available Gmail labels
216
+ mailsense download --list-labels
217
+ ```
218
+
219
+ **Options:**
220
+
221
+ | Flag | Default | Description |
222
+ |------|---------|-------------|
223
+ | `--label`, `-l` | config `gmail_label` / prompted | Gmail label to download |
224
+ | `--email`, `-e` | config `gmail_email` / prompted | Gmail address |
225
+ | `--password`, `-p` | config `gmail_password` / prompted (masked) | App Password |
226
+ | `--output-dir`, `-o` | `mbox_export` | Directory for `.mbox` files |
227
+ | `--start` | — | Start date (`MM-DD-YYYY`, `YYYY-MM-DD`, or `90d`) |
228
+ | `--end` | — | End date (inclusive) |
229
+ | `--group-by` | `month` | `month` / `week` / `days` / `single` / `individual` |
230
+ | `--days-per-file` | `7` | Window size when `--group-by=days` |
231
+ | `--list-labels` | — | Print all Gmail labels and exit |
232
+ | `--no-resume` | — | Re-download everything, ignoring previous state |
233
+
234
+ **Date formats:** `02-25-2026` · `2026-02-25` · `02/25/2026` · `90d` (relative)
235
+
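As a rough sketch of how the accepted date formats could be parsed, the helper below tries the relative `Nd` form first and then each absolute format in turn. The name `parse_date_arg` and the choice to count relative dates back from today are assumptions, not the package's confirmed behaviour.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

# Absolute formats listed in the README: 02-25-2026, 2026-02-25, 02/25/2026.
ABSOLUTE_FORMATS = ("%m-%d-%Y", "%Y-%m-%d", "%m/%d/%Y")


def parse_date_arg(value: str, today: Optional[date] = None) -> date:
    """Parse a --start/--end style value into a date (hypothetical helper)."""
    text = value.strip()
    today = today or date.today()

    # Relative form such as "90d": assumed to mean N days before today.
    relative = re.fullmatch(r"(\d+)d", text)
    if relative:
        return today - timedelta(days=int(relative.group(1)))

    for fmt in ABSOLUTE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")


print(parse_date_arg("2026-02-25"))
```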
236
+ ---
237
+
238
+ ### `mailsense extract`
239
+
240
+ Extract images from `.mbox` files, writing `metadata.json`.
241
+
242
+ ```bash
243
+ # Single .mbox file
244
+ mailsense extract --input-dir inbox.mbox --output-dir ./images
245
+
246
+ # Entire folder of .mbox files (batch mode — one subdirectory per mbox)
247
+ mailsense extract --input-dir ./mbox_export --output-dir ./images
248
+
249
+ # Dry run — scan without writing
250
+ mailsense extract --input-dir ./mbox_export --dry-run
251
+ ```
252
+
253
+ **Options:**
254
+
255
+ | Flag | Default | Description |
256
+ |------|---------|-------------|
257
+ | `--input-dir`, `-i` | required | `.mbox` file or directory of `.mbox` files |
258
+ | `--output-dir`, `-o` | `img_extracts` | Root output directory |
259
+ | `--dry-run`, `-n` | — | Scan without writing |
260
+ | `--log-level` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
261
+ | `--log-file` | — | Write full debug log to a file |
262
+
263
+ The sender filter is read from config (`sender_filter`, default `usps`) and applied automatically — no flag needed.
264
+
265
+ **Image filtering rules (applied in order):**
266
+
267
+ 1. Extension must be `.jpg` `.jpeg` `.png` `.gif` `.webp` `.bmp` `.tiff`
268
+ 2. Filename must **not** start with `content`, `mailer`, or `ra` (email chrome/trackers)
269
+ 3. File must be ≥ 1 KB (eliminates tracking pixels)
270
+
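The three rules above can be expressed as a single predicate. This is a sketch only: the function name `keep_image`, the case-insensitive comparison, and treating 1 KB as 1024 bytes are assumptions.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}
SKIPPED_PREFIXES = ("content", "mailer", "ra")  # email chrome and trackers
MIN_SIZE_BYTES = 1024  # assumption: 1 KB means 1024 bytes


def keep_image(filename: str, size_bytes: int) -> bool:
    """Apply the extension, filename-prefix, and size rules in order."""
    name = Path(filename).name.lower()
    # Rule 1: extension must be a known image type.
    if Path(name).suffix not in ALLOWED_EXTENSIONS:
        return False
    # Rule 2: drop email chrome and tracker images by filename prefix.
    if name.startswith(SKIPPED_PREFIXES):
        return False
    # Rule 3: drop tiny files such as tracking pixels.
    return size_bytes >= MIN_SIZE_BYTES


print(keep_image("image_3_a1b2c3d4.jpg", 48_000))
```

Note that the `ra` prefix rule is broad: under this reading, any filename beginning with those two letters is skipped.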
271
+ ---
272
+
273
+ ### `mailsense analyze`
274
+
275
+ Analyze mail images with Gemini AI, producing structured JSON.
276
+
277
+ ```bash
278
+ mailsense analyze \
279
+ --input-dir ./images \
280
+ --output-dir ./analyzed
281
+
282
+ # Custom model and delay (paid tier — faster)
283
+ mailsense analyze \
284
+ --input-dir ./images \
285
+ --output-dir ./analyzed \
286
+ --model gemini-1.5-pro \
287
+ --delay 1
288
+
289
+ # Dry run
290
+ mailsense analyze --input-dir ./images --output-dir ./analyzed --dry-run
291
+ ```
292
+
293
+ **Options:**
294
+
295
+ | Flag | Default | Description |
296
+ |------|---------|-------------|
297
+ | `--input-dir`, `-i` | required | Output directory from the extract stage |
298
+ | `--output-dir`, `-o` | required | Directory for analyzed JSON files |
299
+ | `--api-key`, `-k` | config `gemini_api_key` / env | Gemini API key |
300
+ | `--model`, `-m` | config `gemini_model` (`gemini-2.0-flash`) | Gemini model |
301
+ | `--delay`, `-d` | config `api_delay` (`4`) | Seconds between API calls |
302
+ | `--dry-run`, `-n` | — | Show what would be processed |
303
+
304
+ **Rate limits:**
305
+
306
+ | Tier | RPM | RPD | Recommended `--delay` |
307
+ |------|-----|-----|-----------------------|
308
+ | Free | 15 | 1500 | `4` (default) |
309
+ | Paid | 1000+ | — | `1` or lower |
310
+
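The default delay follows from the free-tier limit: 60 seconds divided by 15 requests per minute gives 4 seconds between calls. The sketch below shows that arithmetic and a simple fixed-delay loop; the names `min_delay_seconds` and `call_with_delay` are hypothetical and not part of mailsense.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def min_delay_seconds(requests_per_minute: int) -> float:
    """Smallest fixed delay that stays within a per-minute request limit."""
    return 60.0 / requests_per_minute


def call_with_delay(
    items: Iterable[T],
    call: Callable[[T], R],
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `call` on each item, pausing `delay` seconds between calls."""
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # no pause is needed before the first request
        results.append(call(item))
    return results


print(min_delay_seconds(15))
```

At a 4-second delay, the free tier's 1500 requests per day would take about 100 minutes of wall-clock time (1500 × 4 s = 6000 s), ignoring request latency.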
311
+ **Output format (per image):**
312
+
313
+ ```json
314
+ {
315
+ "status": "Processed",
316
+ "is_marketing": false,
317
+ "sender": {
318
+ "name": "Capital One",
319
+ "organization": "Capital One Bank",
320
+ "address": { "street": "PO Box 30281", "city": "Salt Lake City", "state": "UT", "zip_code": "84130" }
321
+ },
322
+ "recipient": { "name": "Jane Smith", "address": { ... } },
323
+ "postage_details": { "type": "First Class Mail", "status": "Delivered" },
324
+ "document_info": { "document_type": "Credit Card Statement", "reference_numbers": ["..."] },
325
+ "content_summary": "Monthly credit card statement showing account balance and minimum payment due.",
326
+ "filename": "image_3_a1b2c3d4.jpg",
327
+ "mail_metadata": {
328
+ "date": "Thu, 12 Feb 2026 08:00:00 -0500",
329
+ "subject": "Your USPS Informed Delivery Daily Digest",
330
+ "from": "USPS Informed Delivery <InformedDelivery@informeddelivery.usps.com>"
331
+ }
332
+ }
333
+ ```
334
+
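Because each image yields its own JSON file, downstream scripts can work with the results using only the standard library. The following sketch lists non-marketing mail from an `analyzed/` directory; it relies only on the `status`, `is_marketing`, `sender.name`, and `content_summary` fields shown above, and the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_personal_mail(analyzed_dir: Path) -> Iterator[dict]:
    """Yield processed, non-marketing records found under analyzed_dir."""
    for json_path in sorted(analyzed_dir.rglob("*.json")):
        record = json.loads(json_path.read_text(encoding="utf-8"))
        if record.get("status") == "Processed" and not record.get("is_marketing"):
            yield record


def describe(record: dict) -> str:
    """One-line summary: sender name followed by the content summary."""
    sender = (record.get("sender") or {}).get("name") or "unknown sender"
    return f"{sender}: {record.get('content_summary', '')}"


# Demo with two made-up records in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.json").write_text(json.dumps({
    "status": "Processed", "is_marketing": False,
    "sender": {"name": "Capital One"}, "content_summary": "Monthly statement.",
}), encoding="utf-8")
(demo_dir / "b.json").write_text(json.dumps({
    "status": "Processed", "is_marketing": True,
    "sender": {"name": "Acme Coupons"}, "content_summary": "Flyer.",
}), encoding="utf-8")

summaries = [describe(r) for r in iter_personal_mail(demo_dir)]
print(summaries)
```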
335
+ ---
336
+
337
+ ### `mailsense pipeline`
338
+
339
+ Run all three stages end-to-end from a single command.
340
+
341
+ ```bash
342
+ # Uses gmail_label from config — no --label needed
343
+ mailsense pipeline --work-dir ./mail_run
344
+
345
+ # Override the label explicitly
346
+ mailsense pipeline --label "USPS Informed Delivery" --work-dir ./mail_run
347
+
348
+ # Last 90 days, skip re-downloading if mbox already exists
349
+ mailsense pipeline \
350
+ --start 90d \
351
+ --work-dir ./mail_run \
352
+ --skip-download
353
+
354
+ # Dry run of the entire pipeline
355
+ mailsense pipeline --dry-run
356
+ ```
357
+
358
+ **Workflow output structure:**
359
+
360
+ ```
361
+ mail_run/
362
+ mbox/ ← grouped .mbox files
363
+ images/ ← images + metadata.json per mbox
364
+ analyzed/ ← one .json per image
365
+ ```
366
+
367
+ **Resume behaviour:** Each stage is individually resumable — already-processed files are skipped automatically on re-runs. Use `--no-resume` to force a clean download, or `--skip-download` / `--skip-extract` / `--skip-analyze` to selectively re-run only the stages you need.
368
+
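One way to picture the resume behaviour for the analyze stage is a check for an existing result file per image. This is a sketch under stated assumptions: it supposes that each image's result is a `.json` file at the same relative path under `analyzed/`, which the README does not confirm, and `pending_images` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}


def pending_images(images_dir: Path, analyzed_dir: Path) -> List[Path]:
    """Return images that have no matching .json result yet."""
    pending: List[Path] = []
    for image in sorted(images_dir.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Assumed naming: images/x/y.jpg -> analyzed/x/y.json
        result = analyzed_dir / image.relative_to(images_dir).with_suffix(".json")
        if not result.exists():
            pending.append(image)
    return pending


# Demo: two images, one of which already has a result.
work_dir = Path(tempfile.mkdtemp())
images, analyzed = work_dir / "images", work_dir / "analyzed"
images.mkdir()
analyzed.mkdir()
(images / "a.jpg").write_bytes(b"\x00")
(images / "b.jpg").write_bytes(b"\x00")
(images / "metadata.json").write_text("{}", encoding="utf-8")
(analyzed / "a.json").write_text("{}", encoding="utf-8")

todo = [path.name for path in pending_images(images, analyzed)]
print(todo)
```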
369
+ **All options:**
370
+
371
+ | Flag | Default | Description |
372
+ |------|---------|-------------|
373
+ | `--work-dir`, `-w` | `mailsense_run` | Root directory for all pipeline outputs |
374
+ | `--label`, `-l` | config `gmail_label` / prompted | Gmail label to download |
375
+ | `--email`, `-e` | config `gmail_email` / prompted | Gmail address |
376
+ | `--password`, `-p` | config `gmail_password` / prompted | App Password |
377
+ | `--start` | — | Start date |
378
+ | `--end` | — | End date (inclusive) |
379
+ | `--group-by` | `month` | `.mbox` grouping strategy |
380
+ | `--days-per-file` | `7` | Window size when `--group-by=days` |
381
+ | `--no-resume` | — | Re-download everything |
382
+ | `--api-key`, `-k` | config / env | Gemini API key |
383
+ | `--model` | config / `gemini-2.0-flash` | Gemini model |
384
+ | `--delay` | config / `4` | Seconds between Gemini requests |
385
+ | `--dry-run`, `-n` | — | Dry run all stages |
386
+ | `--skip-download` | — | Skip download, use existing .mbox files |
387
+ | `--skip-extract` | — | Skip extract, use existing images |
388
+ | `--skip-analyze` | — | Skip analyze |
389
+
390
+ ---
391
+
392
+ ## Building for Distribution
393
+
394
+ ```bash
395
+ # Install build tools
396
+ pip install build twine
397
+
398
+ # Build sdist + wheel
399
+ python -m build
400
+
401
+ # Inspect the wheel contents
402
+ python -m zipfile -l dist/mailsense-0.1.0-py3-none-any.whl
403
+
404
+ # Upload to TestPyPI first (recommended)
405
+ twine upload --repository testpypi dist/*
406
+
407
+ # Then upload to PyPI
408
+ twine upload dist/*
409
+ ```
410
+
411
+ ---
412
+
413
+ ## Project Structure
414
+
415
+ ```
416
+ mailsense/
417
+ ├── mailsense/
418
+ │ ├── __init__.py # version, metadata
419
+ │ ├── cli.py # argparse root + dispatcher
420
+ │ ├── config.py # ~/.mailsense read/write + interactive wizard
421
+ │ └── commands/
422
+ │ ├── __init__.py
423
+ │ ├── config_cmd.py # mailsense config
424
+ │ ├── download.py # mailsense download — Gmail IMAP → .mbox
425
+ │ ├── extract.py # mailsense extract — .mbox → images
426
+ │ ├── analyze.py # mailsense analyze — images → JSON
427
+ │ └── pipeline.py # mailsense pipeline — end-to-end
428
+ ├── tests/
429
+ │ └── test_config.py
430
+ ├── pyproject.toml
431
+ ├── LICENSE
432
+ └── README.md
433
+ ```
434
+
435
+ ---
436
+
437
+ ## Environment Variables
438
+
439
+ | Variable | Equivalent config key |
440
+ |----------|-----------------------|
441
+ | `GEMINI_API_KEY` | `gemini_api_key` |
442
+
443
+ CLI flags always take precedence over both the config file and environment variables.
444
+
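A minimal sketch of that precedence for the Gemini API key is shown below. The README only states that CLI flags win; checking the config file before the environment variable is an assumption, and `resolve_api_key` is a hypothetical helper rather than the package's API.

```python
from typing import Mapping, Optional


def resolve_api_key(
    cli_value: Optional[str],
    config: Mapping[str, str],
    environ: Mapping[str, str],
) -> Optional[str]:
    """Pick the first non-empty value: CLI flag, then config, then env.

    Assumption: the config file is consulted before GEMINI_API_KEY.
    """
    candidates = (
        cli_value,
        config.get("gemini_api_key"),
        environ.get("GEMINI_API_KEY"),
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


print(resolve_api_key("from-flag", {"gemini_api_key": "from-config"}, {}))
```

Passing `os.environ` as the third argument would use the real environment; plain dictionaries keep the example easy to test.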
445
+ ---
446
+
447
+ ## Contributing
448
+
449
+ 1. Fork the repository
450
+ 2. Create a feature branch: `git checkout -b feature/my-feature`
451
+ 3. Make your changes and add tests
452
+ 4. Run tests: `pytest`
453
+ 5. Submit a pull request
454
+
455
+ ---
456
+
457
+ ## License
458
+
459
+ Copyright 2026 Samapriya Roy. Licensed under the [Apache License 2.0](LICENSE).