mcwaddams 2026.5.22__tar.gz

@@ -0,0 +1,89 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ pip-wheel-metadata/
20
+ share/python-wheels/
21
+ *.egg-info/
22
+ .installed.cfg
23
+ *.egg
24
+ MANIFEST
25
+
26
+ # PyInstaller
27
+ *.manifest
28
+ *.spec
29
+
30
+ # Unit test / coverage reports
31
+ htmlcov/
32
+ .tox/
33
+ .nox/
34
+ .coverage
35
+ .coverage.*
36
+ .cache
37
+ nosetests.xml
38
+ coverage.xml
39
+ *.cover
40
+ *.py,cover
41
+ .hypothesis/
42
+ .pytest_cache/
43
+
44
+ # Virtual environments
45
+ .env
46
+ .venv
47
+ env/
48
+ venv/
49
+ ENV/
50
+ env.bak/
51
+ venv.bak/
52
+
53
+ # IDEs
54
+ .vscode/
55
+ .idea/
56
+ *.swp
57
+ *.swo
58
+ *~
59
+
60
+ # OS
61
+ .DS_Store
62
+ .DS_Store?
63
+ ._*
64
+ .Spotlight-V100
65
+ .Trashes
66
+ ehthumbs.db
67
+ Thumbs.db
68
+
69
+ # Project specific
70
+ *.log
71
+ temp/
72
+ tmp/
73
+ *.office_temp
74
+
75
+ # uv
76
+ .uv/
77
+
78
+ # Temporary files created during processing
79
+ *.tmp
80
+ *.temp
81
+
82
+ # Test documents (personal/private)
83
+ ORIGINAL - The Other Side of the Bed*.docx
84
+
85
+ # Reading progress bookmarks (user-specific)
86
+ .*.reading_progress.json
87
+
88
+ # Local MCP config
89
+ .mcp.json
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 MCP Office Tools
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,431 @@
1
+ Metadata-Version: 2.4
2
+ Name: mcwaddams
3
+ Version: 2026.5.22
4
+ Summary: MCP server for Microsoft Office document processing. Named for Milton Waddams, who was relocated to the basement with boxes of legacy documents.
5
+ Project-URL: Homepage, https://mcwaddams.l.supported.systems
6
+ Project-URL: Repository, https://git.supported.systems/MCP/mcwaddams
7
+ Project-URL: Issues, https://git.supported.systems/MCP/mcwaddams/issues
8
+ Author-email: Ryan Malloy <ryan@supported.systems>
9
+ License: MIT
10
+ License-File: LICENSE
11
+ Keywords: document,docx,excel,legacy,mcp,milton,office,powerpoint,pptx,processing,word,xlsx
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Office/Business :: Office Suites
19
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
20
+ Classifier: Topic :: Text Processing
21
+ Requires-Python: >=3.11
22
+ Requires-Dist: aiofiles>=23.2.0
23
+ Requires-Dist: aiohttp>=3.9.0
24
+ Requires-Dist: beautifulsoup4>=4.12.0
25
+ Requires-Dist: chardet>=5.0.0
26
+ Requires-Dist: fastmcp>=0.5.0
27
+ Requires-Dist: lxml>=4.9.0
28
+ Requires-Dist: mammoth>=1.6.0
29
+ Requires-Dist: msoffcrypto-tool>=5.4.0
30
+ Requires-Dist: olefile>=0.47
31
+ Requires-Dist: openpyxl>=3.1.0
32
+ Requires-Dist: pandas>=2.0.0
33
+ Requires-Dist: pillow>=10.0.0
34
+ Requires-Dist: python-docx>=1.1.0
35
+ Requires-Dist: python-pptx>=1.0.0
36
+ Requires-Dist: xlrd>=2.0.0
37
+ Requires-Dist: xlsxwriter>=3.1.0
38
+ Requires-Dist: xlwt>=1.3.0
39
+ Provides-Extra: conversion
40
+ Requires-Dist: pypandoc>=1.11; extra == 'conversion'
41
+ Provides-Extra: dev
42
+ Requires-Dist: black>=23.0.0; extra == 'dev'
43
+ Requires-Dist: mypy>=1.5.0; extra == 'dev'
44
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
45
+ Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
46
+ Requires-Dist: pytest>=7.4.0; extra == 'dev'
47
+ Requires-Dist: ruff>=0.1.0; extra == 'dev'
48
+ Requires-Dist: types-beautifulsoup4; extra == 'dev'
49
+ Requires-Dist: types-chardet; extra == 'dev'
50
+ Requires-Dist: types-pillow; extra == 'dev'
51
+ Provides-Extra: enhanced
52
+ Requires-Dist: python-magic>=0.4.0; extra == 'enhanced'
53
+ Provides-Extra: nlp
54
+ Requires-Dist: nltk>=3.8; extra == 'nlp'
55
+ Requires-Dist: spacy>=3.7; extra == 'nlp'
56
+ Requires-Dist: textstat>=0.7; extra == 'nlp'
57
+ Description-Content-Type: text/markdown
58
+
59
+ <div align="center">
60
+
61
+ # 📎 mcwaddams
62
+
63
+ **MCP server for Microsoft Office document processing**
64
+
65
+ [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
66
+ [![FastMCP](https://img.shields.io/badge/FastMCP-0.5+-green.svg?style=flat-square)](https://gofastmcp.com)
67
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
68
+ [![MCP Protocol](https://img.shields.io/badge/MCP-Protocol-purple?style=flat-square)](https://modelcontextprotocol.io)
69
+
70
+ *"I was told there would be document extraction."*
71
+
72
+ [Installation](#-installation) • [Tools](#-available-tools) • [Examples](#-usage-examples) • [Testing](#-testing)
73
+
74
+ </div>
75
+
76
+ ---
77
+
78
+ ## The Backstory
79
+
80
+ Milton Waddams was relocated to the basement. They took his stapler. But down there, surrounded by boxes of `.doc` files from 1997 and `.xls` spreadsheets that predate Unicode, he became something else entirely: a document processing expert.
81
+
82
+ This MCP server channels that energy. It handles the legacy formats nobody else wants to touch. It extracts text from files that should have been migrated to Google Docs a decade ago. It reads the TPS reports.
83
+
84
+ ---
85
+
86
+ ## ✨ Features
87
+
88
+ - **Universal extraction**: Pull text, images, and metadata from any Office format
89
+ - **Format-specific tools**: Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
90
+ - **Automatic pagination**: Large documents get chunked so they don't blow up your context window
91
+ - **Fallback processing**: When one library chokes on a weird file, we try another
92
+ - **URL support**: Pass a URL instead of a file path; we'll download and cache it
93
+ - **Legacy formats**: Yes, even those `.doc` and `.xls` files from the basement
94
+
95
+ ---
96
+
97
+ ## 🚀 Installation
98
+
99
+ ```bash
100
+ # Quick install with uvx (recommended)
101
+ uvx mcwaddams
102
+
103
+ # Or install with uv/pip
104
+ uv add mcwaddams
105
+ pip install mcwaddams
106
+ ```
107
+
108
+ ### Claude Desktop Configuration
109
+
110
+ Add to your `claude_desktop_config.json`:
111
+
112
+ ```json
113
+ {
114
+   "mcpServers": {
115
+     "mcwaddams": {
116
+       "command": "uvx",
117
+       "args": ["mcwaddams"]
118
+     }
119
+   }
120
+ }
121
+ ```
122
+
123
+ ### Claude Code Configuration
124
+
125
+ ```bash
126
+ claude mcp add mcwaddams -- uvx mcwaddams
127
+ ```
128
+
129
+ ---
130
+
131
+ ## 🛠 Available Tools
132
+
133
+ ### Universal Tools
134
+ *Work with all Office formats: Word, Excel, PowerPoint, CSV*
135
+
136
+ | Tool | Description |
137
+ |------|-------------|
138
+ | `extract_text` | Extract text with optional formatting preservation |
139
+ | `extract_images` | Extract embedded images with size filtering |
140
+ | `extract_metadata` | Get document properties (author, dates, statistics) |
141
+ | `detect_office_format` | Identify format, version, encryption status |
142
+ | `analyze_document_health` | Check integrity, corruption, password protection |
143
+ | `get_supported_formats` | List all supported file extensions |
144
+ | `index_document` | Scan document and create resource URIs for on-demand fetching |
145
+
146
+ ### Word Tools
147
+
148
+ | Tool | Description |
149
+ |------|-------------|
150
+ | `convert_to_markdown` | Convert to Markdown with automatic pagination for large docs |
151
+ | `extract_word_tables` | Extract tables as structured JSON, CSV, or Markdown |
152
+ | `analyze_word_structure` | Analyze headings, sections, styles, and document hierarchy |
153
+ | `get_document_outline` | Get structured outline with chapter detection and word counts |
154
+ | `check_style_consistency` | Find formatting issues, missing chapters, style problems |
155
+ | `search_document` | Search text with context and chapter location |
156
+ | `extract_entities` | Extract people, places, organizations using pattern recognition |
157
+ | `get_chapter_summaries` | Generate chapter previews with opening sentences |
158
+ | `save_reading_progress` | Bookmark your reading position for later |
159
+ | `get_reading_progress` | Resume reading from saved position |
160
+
161
+ ### Excel Tools
162
+
163
+ | Tool | Description |
164
+ |------|-------------|
165
+ | `analyze_excel_data` | Statistical analysis: data types, missing values, outliers |
166
+ | `extract_excel_formulas` | Extract formulas with values and dependency analysis |
167
+ | `create_excel_chart_data` | Generate Chart.js/Plotly-ready data from spreadsheets |
168
+
169
+ ---
170
+
171
+ ## 📋 Format Support
172
+
173
+ Here's what works and what's "good enough" (legacy formats from Office 97-2003 have more limited extraction, but they still work):
174
+
175
+ | Format | Extension | Text | Images | Metadata | Tables | Formulas |
176
+ |--------|-----------|:----:|:------:|:--------:|:------:|:--------:|
177
+ | **Word (Modern)** | `.docx` | ✅ | ✅ | ✅ | ✅ | - |
178
+ | **Word (Legacy)** | `.doc` | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
179
+ | **Word Template** | `.dotx` | ✅ | ✅ | ✅ | ✅ | - |
180
+ | **Word Macro** | `.docm` | ✅ | ✅ | ✅ | ✅ | - |
181
+ | **Excel (Modern)** | `.xlsx` | ✅ | ✅ | ✅ | ✅ | ✅ |
182
+ | **Excel (Legacy)** | `.xls` | ✅ | ⚠️ | ⚠️ | ✅ | ⚠️ |
183
+ | **Excel Template** | `.xltx` | ✅ | ✅ | ✅ | ✅ | ✅ |
184
+ | **Excel Macro** | `.xlsm` | ✅ | ✅ | ✅ | ✅ | ✅ |
185
+ | **PowerPoint (Modern)** | `.pptx` | ✅ | ✅ | ✅ | ✅ | - |
186
+ | **PowerPoint (Legacy)** | `.ppt` | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
187
+ | **PowerPoint Template** | `.potx` | ✅ | ✅ | ✅ | ✅ | - |
188
+ | **CSV** | `.csv` | ✅ | - | ⚠️ | ✅ | - |
189
+
190
+ ✅ Full support • ⚠️ Basic/partial support • - Not applicable
191
+
192
+ ---
193
+
194
+ ## 🔗 MCP Resources
195
+
196
+ Instead of returning entire documents in tool responses, you can index a document once and fetch content on-demand via URI-based resources. This keeps context windows manageable when working with large files.
197
+
198
+ ### How It Works
199
+
200
+ 1. **Index the document**: `index_document` scans the file and returns URIs
201
+ 2. **Fetch what you need**: Request specific chapters, sheets, slides, or images by URI
202
+ 3. **Format on demand**: Append `.txt` or `.html` to get different output formats
203
+
204
+ ### Resource URI Patterns
205
+
206
+ | URI Pattern | Description | Example |
207
+ |-------------|-------------|---------|
208
+ | `chapter://{doc_id}/{n}` | Single chapter/section | `chapter://abc123/3` |
209
+ | `chapters://{doc_id}/{range}` | Multiple chapters | `chapters://abc123/1-5` |
210
+ | `section://{doc_id}/{n}` | Section by heading style | `section://abc123/2` |
211
+ | `paragraph://{doc_id}/{ch}/{p}` | Specific paragraph | `paragraph://abc123/3/7` |
212
+ | `sheet://{doc_id}/{name}` | Excel sheet as markdown table | `sheet://abc123/Revenue` |
213
+ | `slide://{doc_id}/{n}` | PowerPoint slide | `slide://abc123/5` |
214
+ | `slides://{doc_id}/{range}` | Multiple slides | `slides://abc123/1,3,5` |
215
+ | `image://{doc_id}/{n}` | Embedded image | `image://abc123/0` |
216
+
217
+ ### Format Suffixes
218
+
219
+ Append a format suffix to convert on the fly:
220
+
221
+ | Suffix | Output |
222
+ |--------|--------|
223
+ | `.md` (default) | Markdown |
224
+ | `.txt` | Plain text (no formatting) |
225
+ | `.html` | Basic HTML |
226
+
227
+ Examples:
228
+ - `chapter://abc123/3` → Markdown (default)
229
+ - `chapter://abc123/3.txt` → Plain text
230
+ - `chapter://abc123/3.html` → HTML
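As a rough illustration of how a client might split one of these URIs into its parts, here is a minimal sketch. `ResourceRef` and `parse_resource_uri` are hypothetical names, not part of mcwaddams, and the rule that only `.md`, `.txt`, and `.html` count as format suffixes is an assumption drawn from the table above.

```python
from typing import NamedTuple

# Suffixes listed in the Format Suffixes table; anything else stays in the path.
KNOWN_SUFFIXES = ("md", "txt", "html")


class ResourceRef(NamedTuple):
    scheme: str  # e.g. "chapter", "sheet", "slides"
    doc_id: str  # identifier returned by index_document
    path: str    # item number, range, or name, without the format suffix
    fmt: str     # "md" when no known suffix is present


def parse_resource_uri(uri: str) -> ResourceRef:
    """Split a URI such as chapter://abc123/3.txt into its components."""
    scheme, sep, rest = uri.partition("://")
    if not sep or not scheme:
        raise ValueError(f"missing scheme: {uri!r}")
    doc_id, sep, path = rest.partition("/")
    if not sep or not doc_id or not path:
        raise ValueError(f"expected scheme://doc_id/path: {uri!r}")
    fmt = "md"
    stem, dot, suffix = path.rpartition(".")
    # Only strip the suffix when it is one of the known formats, so a
    # sheet name that happens to contain a dot is left untouched.
    if dot and stem and suffix in KNOWN_SUFFIXES:
        path, fmt = stem, suffix
    return ResourceRef(scheme, doc_id, path, fmt)
```

For example, `parse_resource_uri("chapter://abc123/3.txt")` yields a `ResourceRef` with `path="3"` and `fmt="txt"`.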
231
+
232
+ ### Range Syntax
233
+
234
+ Fetch multiple items at once:
235
+ - `1-5` → Items 1 through 5
236
+ - `1,3,5` → Specific items
237
+ - `1-3,7,9-10` → Mixed ranges
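The range syntax above can be expanded with a few lines of Python. This is a sketch of one plausible interpretation, not the package's own parser: `parse_range` is a hypothetical helper, and rejecting descending ranges is an assumption.

```python
def parse_range(spec: str) -> list[int]:
    """Expand a spec such as "1-3,7,9-10" into a sorted list of unique item numbers."""
    items: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            items.update(range(start, end + 1))  # ranges are inclusive
        else:
            items.add(int(part))
    return sorted(items)
```

Under these assumptions, `parse_range("1-3,7,9-10")` returns `[1, 2, 3, 7, 9, 10]`.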
238
+
239
+ ### Section Detection
240
+
241
+ The indexer detects document structure automatically:
242
+
243
+ 1. **Heading 1 styles** (primary): Business docs, manuals, technical documents
244
+ 2. **"Chapter X" text patterns** (fallback): Books, manuscripts, narratives
245
+
246
+ Use `text_patterns_only=True` to skip heading style detection for documents with messy formatting.
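The two-stage detection described above might look roughly like the following. This is a sketch under stated assumptions, not the indexer's actual code: `detect_sections` and `CHAPTER_PATTERN` are hypothetical, paragraphs are modeled as plain `(style_name, text)` tuples, and the regular expression only recognizes numeric "Chapter N" lines.

```python
import re

# Assumed pattern: a line starting with "Chapter" followed by a number.
CHAPTER_PATTERN = re.compile(r"^\s*chapter\s+\d+\b", re.IGNORECASE)


def detect_sections(paragraphs, text_patterns_only=False):
    """Return (paragraph_index, title) pairs for detected section starts.

    `paragraphs` is a list of (style_name, text) tuples.
    """
    if not text_patterns_only:
        # Primary: paragraphs styled as "Heading 1".
        headings = [
            (i, text.strip())
            for i, (style, text) in enumerate(paragraphs)
            if style == "Heading 1" and text.strip()
        ]
        if headings:
            return headings
    # Fallback: "Chapter N" text patterns, regardless of style.
    return [
        (i, text.strip())
        for i, (_style, text) in enumerate(paragraphs)
        if CHAPTER_PATTERN.match(text)
    ]
```

Passing `text_patterns_only=True` skips the heading-style pass entirely, which mirrors the option described above for documents with messy formatting.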
247
+
248
+ ---
249
+
250
+ ## 🎯 MCP Prompts
251
+
252
+ Pre-built workflows that chain multiple tools together:
253
+
254
+ | Prompt | Level | Description |
255
+ |--------|-------|-------------|
256
+ | `explore-document` | Basic | Start with any new document - get structure and identify issues |
257
+ | `find-character` | Basic | Track all mentions of a person/character with context |
258
+ | `chapter-preview` | Basic | Quick overview of each chapter without full read |
259
+ | `resume-reading` | Intermediate | Check saved position and continue reading |
260
+ | `document-analysis` | Intermediate | Comprehensive multi-tool analysis |
261
+ | `character-journey` | Advanced | Track character arc through entire narrative |
262
+ | `document-comparison` | Advanced | Compare entities and themes between chapters |
263
+ | `full-reading-session` | Advanced | Guided reading with bookmarking |
264
+ | `manuscript-review` | Advanced | Complete editorial workflow for editors |
265
+
266
+ ---
267
+
268
+ ## 💡 Usage Examples
269
+
270
+ ### Extract Text from Any Document
271
+
272
+ ```python
273
+ # Simple extraction
274
+ result = await extract_text("report.docx")
275
+ print(result["text"])
276
+
277
+ # With formatting preserved
278
+ result = await extract_text(
279
+     file_path="report.docx",
280
+     preserve_formatting=True,
281
+     include_metadata=True
282
+ )
283
+ ```
284
+
285
+ ### Convert Word to Markdown (with Pagination)
286
+
287
+ Large documents get paginated automatically. Three ways to handle it:
288
+
289
+ ```python
290
+ # Option 1: Follow the cursor for each chunk
291
+ result = await convert_to_markdown("big-manual.docx")
292
+ if result.get("pagination", {}).get("has_more"):
293
+     next_page = await convert_to_markdown(
294
+         "big-manual.docx",
295
+         cursor_id=result["pagination"]["cursor_id"]
296
+     )
297
+
298
+ # Option 2: Grab specific pages
299
+ result = await convert_to_markdown("big-manual.docx", page_range="1-10")
300
+
301
+ # Option 3: Extract by chapter heading
302
+ result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
303
+ ```
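Option 1 generalizes to a loop that keeps following the cursor until `has_more` is false. The sketch below assumes only the `pagination.has_more` and `pagination.cursor_id` fields shown above; `collect_all_pages` is a hypothetical helper, and `fake_convert` is a stand-in for `convert_to_markdown` (its `"markdown"` key is invented for the demo) so the loop can run without the server.

```python
import asyncio


async def collect_all_pages(convert, file_path):
    """Call `convert` repeatedly, following cursor_id until has_more is false."""
    result = await convert(file_path)
    pages = [result]
    while result.get("pagination", {}).get("has_more"):
        cursor_id = result["pagination"]["cursor_id"]
        result = await convert(file_path, cursor_id=cursor_id)
        pages.append(result)
    return pages


async def fake_convert(file_path, cursor_id=None):
    """Stand-in converter that serves three fixed chunks, one per call."""
    chunks = ["# Part 1", "# Part 2", "# Part 3"]
    index = 0 if cursor_id is None else int(cursor_id)
    has_more = index + 1 < len(chunks)
    pagination = {"has_more": has_more}
    if has_more:
        pagination["cursor_id"] = str(index + 1)
    return {"markdown": chunks[index], "pagination": pagination}


pages = asyncio.run(collect_all_pages(fake_convert, "big-manual.docx"))
print(len(pages))  # 3
```

Collecting every page defeats the purpose of pagination for very large files, so a real client would more likely process each chunk as it arrives.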
304
+
305
+ ### Analyze Excel Data Quality
306
+
307
+ ```python
308
+ result = await analyze_excel_data(
309
+     file_path="sales-data.xlsx",
310
+     include_statistics=True,
311
+     check_data_quality=True
312
+ )
313
+
314
+ # Returns per-column analysis with quality issues
315
+ ```
316
+
317
+ ### Index Document for On-Demand Resource Fetching
318
+
319
+ ```python
320
+ # Index the document - returns URIs for all content
321
+ result = await index_document("novel.docx")
322
+
323
+ # Returns:
324
+ # {
325
+ #   "doc_id": "56036b0f171a",
326
+ #   "resources": {
327
+ #     "chapter": [
328
+ #       {"id": "1", "title": "Chapter 1", "uri": "chapter://56036b0f171a/1"},
329
+ #       ...
330
+ #     ],
331
+ #     "image": [
332
+ #       {"id": "0", "uri": "image://56036b0f171a/0"},
333
+ #       ...
334
+ #     ]
335
+ #   }
336
+ # }
337
+
338
+ # Fetch specific content via MCP resources:
339
+ # - chapter://56036b0f171a/1 → Chapter 1 as markdown
340
+ # - chapter://56036b0f171a/1.txt → Chapter 1 as plain text
341
+ # - chapters://56036b0f171a/1-3 → Chapters 1-3 combined
342
+ ```
343
+
344
+ ---
345
+
346
+ ## 🧪 Testing
347
+
348
+ ```bash
349
+ # Run tests and generate the dashboard
350
+ make test
351
+
352
+ # Just pytest
353
+ make test-pytest
354
+
355
+ # Open dashboard
356
+ make view-dashboard
357
+ ```
358
+
359
+ ---
360
+
361
+ ## 🏗 Architecture
362
+
363
+ The mixin pattern keeps things modular: universal tools work on everything, format-specific tools go deeper.
364
+
365
+ ```
366
+ mcwaddams/
367
+ ├── src/mcwaddams/
368
+ │   ├── server.py          # FastMCP server + resource templates
369
+ │   ├── resources.py       # Resource store for on-demand content
370
+ │   ├── mixins/
371
+ │   │   ├── universal.py   # Format-agnostic tools
372
+ │   │   ├── word.py        # Word-specific tools
373
+ │   │   ├── excel.py       # Excel-specific tools
374
+ │   │   └── powerpoint.py  # PowerPoint tools
375
+ │   ├── utils/             # Validation, caching, detection
376
+ │   └── pagination.py      # Large document pagination
377
+ ├── tests/
378
+ └── reports/
379
+ ```
380
+
381
+ ### Processing Libraries
382
+
383
+ | Format | Primary Library | Fallback |
384
+ |--------|----------------|----------|
385
+ | `.docx` | python-docx | mammoth |
386
+ | `.xlsx` | openpyxl | pandas |
387
+ | `.pptx` | python-pptx | - |
388
+ | `.doc`/`.xls`/`.ppt` | olefile | - |
389
+ | `.csv` | pandas | built-in csv |
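The primary/fallback pairing in this table can be pictured as trying each extractor in order and returning the first result that succeeds. The following is a generic sketch of that idea, not the package's implementation: `extract_with_fallback` is a hypothetical helper, and the two extractors are stubs rather than real python-docx or mammoth calls.

```python
from typing import Callable, Sequence


def extract_with_fallback(
    path: str,
    extractors: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, extractor) in order; return (name, text) from the first success."""
    errors = []
    for name, extractor in extractors:
        try:
            return name, extractor(path)
        except Exception as exc:  # any library failure triggers the next attempt
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all extractors failed for {path}: " + "; ".join(errors))


def primary_stub(path: str) -> str:
    """Stub standing in for a primary library that rejects the file."""
    raise ValueError("unsupported structure")


def fallback_stub(path: str) -> str:
    """Stub standing in for a fallback library that succeeds."""
    return f"text from {path}"


used, text = extract_with_fallback(
    "report.docx", [("primary", primary_stub), ("fallback", fallback_stub)]
)
```

Returning the name of the extractor that succeeded lets a caller report which library actually produced the output, which helps when results differ between libraries.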
390
+
391
+ ---
392
+
393
+ ## 🔧 Development
394
+
395
+ ```bash
396
+ git clone https://github.com/ryanmalloy/mcwaddams.git
397
+ cd mcwaddams
398
+ uv sync --dev
399
+
400
+ uv run pytest
401
+ uv run black src/ tests/
402
+ uv run ruff check src/ tests/
403
+ ```
404
+
405
+ ---
406
+
407
+ ## 👤 Author
408
+
409
+ **Ryan Malloy**: [ryanmalloy.com](https://ryanmalloy.com)
410
+
411
+ This package emerged from a human-AI collaboration session. The process raised questions about discernment, voice, and what makes tools actually useful:
412
+
413
+ - **[AI Isn't New. Your Discernment Is What Matters.](https://ryanmalloy.com/blog/ai-discernment)**: 40 years of writing code and why discernment matters more than the tools
414
+
415
+ ---
416
+
417
+ ## 📜 License
418
+
419
+ MIT License - see [LICENSE](LICENSE) for details.
420
+
421
+ ---
422
+
423
+ <div align="center">
424
+
425
+ *Named for Milton Waddams, who was relocated to the basement with the legacy documents.*
426
+
427
+ *"I could set the building on fire..."*
428
+
429
+ **Built with [FastMCP](https://gofastmcp.com) and the [Model Context Protocol](https://modelcontextprotocol.io)**
430
+
431
+ </div>