PyPI - mcwaddams - Versions diffs - 2026.5.22__tar.gz - Mend

mcwaddams 2026.5.22__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

mcwaddams-2026.5.22/.gitignore +89 -0
mcwaddams-2026.5.22/LICENSE +21 -0
mcwaddams-2026.5.22/PKG-INFO +431 -0
mcwaddams-2026.5.22/README.md +373 -0
mcwaddams-2026.5.22/pyproject.toml +221 -0
mcwaddams-2026.5.22/src/mcwaddams/__init__.py +13 -0
mcwaddams-2026.5.22/src/mcwaddams/mixins/__init__.py +8 -0
mcwaddams-2026.5.22/src/mcwaddams/mixins/excel.py +487 -0
mcwaddams-2026.5.22/src/mcwaddams/mixins/powerpoint.py +48 -0
mcwaddams-2026.5.22/src/mcwaddams/mixins/universal.py +682 -0
mcwaddams-2026.5.22/src/mcwaddams/mixins/word.py +1464 -0
mcwaddams-2026.5.22/src/mcwaddams/pagination.py +494 -0
mcwaddams-2026.5.22/src/mcwaddams/processors/__init__.py +7 -0
mcwaddams-2026.5.22/src/mcwaddams/resources.py +243 -0
mcwaddams-2026.5.22/src/mcwaddams/server.py +625 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/__init__.py +95 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/caching.py +530 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/decorators.py +126 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/excel_processing.py +203 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/file_detection.py +369 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/powerpoint_processing.py +177 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/processing.py +228 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/validation.py +373 -0
mcwaddams-2026.5.22/src/mcwaddams/utils/word_processing.py +1220 -0

mcwaddams-2026.5.22/.gitignore ADDED

@@ -0,0 +1,89 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+*.manifest
+*.spec
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+# Virtual environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# IDEs
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+# Project specific
+*.log
+temp/
+tmp/
+*.office_temp
+# uv
+.uv/
+# Temporary files created during processing
+*.tmp
+*.temp
+# Test documents (personal/private)
+ORIGINAL - The Other Side of the Bed*.docx
+# Reading progress bookmarks (user-specific)
+.*.reading_progress.json
+# Local MCP config
+.mcp.json

mcwaddams-2026.5.22/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2024 MCP Office Tools
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

mcwaddams-2026.5.22/PKG-INFO ADDED

@@ -0,0 +1,431 @@
+Metadata-Version: 2.4
+Name: mcwaddams
+Version: 2026.5.22
+Summary: MCP server for Microsoft Office document processing. Named for Milton Waddams, who was relocated to the basement with boxes of legacy documents.
+Project-URL: Homepage, https://mcwaddams.l.supported.systems
+Project-URL: Repository, https://git.supported.systems/MCP/mcwaddams
+Project-URL: Issues, https://git.supported.systems/MCP/mcwaddams/issues
+Author-email: Ryan Malloy <ryan@supported.systems>
+License: MIT
+License-File: LICENSE
+Keywords: document,docx,excel,legacy,mcp,milton,office,powerpoint,pptx,processing,word,xlsx
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Office/Business :: Office Suites
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Topic :: Text Processing
+Requires-Python: >=3.11
+Requires-Dist: aiofiles>=23.2.0
+Requires-Dist: aiohttp>=3.9.0
+Requires-Dist: beautifulsoup4>=4.12.0
+Requires-Dist: chardet>=5.0.0
+Requires-Dist: fastmcp>=0.5.0
+Requires-Dist: lxml>=4.9.0
+Requires-Dist: mammoth>=1.6.0
+Requires-Dist: msoffcrypto-tool>=5.4.0
+Requires-Dist: olefile>=0.47
+Requires-Dist: openpyxl>=3.1.0
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: pillow>=10.0.0
+Requires-Dist: python-docx>=1.1.0
+Requires-Dist: python-pptx>=1.0.0
+Requires-Dist: xlrd>=2.0.0
+Requires-Dist: xlsxwriter>=3.1.0
+Requires-Dist: xlwt>=1.3.0
+Provides-Extra: conversion
+Requires-Dist: pypandoc>=1.11; extra == 'conversion'
+Provides-Extra: dev
+Requires-Dist: black>=23.0.0; extra == 'dev'
+Requires-Dist: mypy>=1.5.0; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
+Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
+Requires-Dist: pytest>=7.4.0; extra == 'dev'
+Requires-Dist: ruff>=0.1.0; extra == 'dev'
+Requires-Dist: types-beautifulsoup4; extra == 'dev'
+Requires-Dist: types-chardet; extra == 'dev'
+Requires-Dist: types-pillow; extra == 'dev'
+Provides-Extra: enhanced
+Requires-Dist: python-magic>=0.4.0; extra == 'enhanced'
+Provides-Extra: nlp
+Requires-Dist: nltk>=3.8; extra == 'nlp'
+Requires-Dist: spacy>=3.7; extra == 'nlp'
+Requires-Dist: textstat>=0.7; extra == 'nlp'
+Description-Content-Type: text/markdown
+<div align="center">
+# 📎 mcwaddams
+**MCP server for Microsoft Office document processing**
+[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
+[![FastMCP](https://img.shields.io/badge/FastMCP-0.5+-green.svg?style=flat-square)](https://gofastmcp.com)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
+[![MCP Protocol](https://img.shields.io/badge/MCP-Protocol-purple?style=flat-square)](https://modelcontextprotocol.io)
+*"I was told there would be document extraction."*
+[Installation](#-installation) • [Tools](#-available-tools) • [Examples](#-usage-examples) • [Testing](#-testing)
+</div>
+---
+## The Backstory
+Milton Waddams was relocated to the basement. They took his stapler. But down there, surrounded by boxes of `.doc` files from 1997 and `.xls` spreadsheets that predate Unicode, he became something else entirely: a document processing expert.
+This MCP server channels that energy. It handles the legacy formats nobody else wants to touch. It extracts text from files that should have been migrated to Google Docs a decade ago. It reads the TPS reports.
+---
+## ✨ Features
+- **Universal extraction** — Pull text, images, and metadata from any Office format
+- **Format-specific tools** — Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
+- **Automatic pagination** — Large documents get chunked so they don't blow up your context window
+- **Fallback processing** — When one library chokes on a weird file, we try another
+- **URL support** — Pass a URL instead of a file path; we'll download and cache it
+- **Legacy formats** — Yes, even those `.doc` and `.xls` files from the basement
+---
+## 🚀 Installation
+```bash
+# Quick install with uvx (recommended)
+uvx mcwaddams
+# Or install with uv/pip
+uv add mcwaddams
+pip install mcwaddams
+```
+### Claude Desktop Configuration
+Add to your `claude_desktop_config.json`:
+```json
+{
+  "mcpServers": {
+    "mcwaddams": {
+      "command": "uvx",
+      "args": ["mcwaddams"]
+    }
+  }
+}
+```
+### Claude Code Configuration
+```bash
+claude mcp add mcwaddams "uvx mcwaddams"
+```
+---
+## 🛠 Available Tools
+### Universal Tools
+*Work with all Office formats: Word, Excel, PowerPoint, CSV*
+| Tool | Description |
+|------|-------------|
+| `extract_text` | Extract text with optional formatting preservation |
+| `extract_images` | Extract embedded images with size filtering |
+| `extract_metadata` | Get document properties (author, dates, statistics) |
+| `detect_office_format` | Identify format, version, encryption status |
+| `analyze_document_health` | Check integrity, corruption, password protection |
+| `get_supported_formats` | List all supported file extensions |
+| `index_document` | Scan document and create resource URIs for on-demand fetching |
+### Word Tools
+| Tool | Description |
+|------|-------------|
+| `convert_to_markdown` | Convert to Markdown with automatic pagination for large docs |
+| `extract_word_tables` | Extract tables as structured JSON, CSV, or Markdown |
+| `analyze_word_structure` | Analyze headings, sections, styles, and document hierarchy |
+| `get_document_outline` | Get structured outline with chapter detection and word counts |
+| `check_style_consistency` | Find formatting issues, missing chapters, style problems |
+| `search_document` | Search text with context and chapter location |
+| `extract_entities` | Extract people, places, organizations using pattern recognition |
+| `get_chapter_summaries` | Generate chapter previews with opening sentences |
+| `save_reading_progress` | Bookmark your reading position for later |
+| `get_reading_progress` | Resume reading from saved position |
+### Excel Tools
+| Tool | Description |
+|------|-------------|
+| `analyze_excel_data` | Statistical analysis: data types, missing values, outliers |
+| `extract_excel_formulas` | Extract formulas with values and dependency analysis |
+| `create_excel_chart_data` | Generate Chart.js/Plotly-ready data from spreadsheets |
+---
+## 📋 Format Support
+Here's what works and what's "good enough" — legacy formats from Office 97-2003 have more limited extraction, but they still work:
+| Format | Extension | Text | Images | Metadata | Tables | Formulas |
+|--------|-----------|:----:|:------:|:--------:|:------:|:--------:|
+| **Word (Modern)** | `.docx` | ✅ | ✅ | ✅ | ✅ | - |
+| **Word (Legacy)** | `.doc` | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
+| **Word Template** | `.dotx` | ✅ | ✅ | ✅ | ✅ | - |
+| **Word Macro** | `.docm` | ✅ | ✅ | ✅ | ✅ | - |
+| **Excel (Modern)** | `.xlsx` | ✅ | ✅ | ✅ | ✅ | ✅ |
+| **Excel (Legacy)** | `.xls` | ✅ | ⚠️ | ⚠️ | ✅ | ⚠️ |
+| **Excel Template** | `.xltx` | ✅ | ✅ | ✅ | ✅ | ✅ |
+| **Excel Macro** | `.xlsm` | ✅ | ✅ | ✅ | ✅ | ✅ |
+| **PowerPoint (Modern)** | `.pptx` | ✅ | ✅ | ✅ | ✅ | - |
+| **PowerPoint (Legacy)** | `.ppt` | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
+| **PowerPoint Template** | `.potx` | ✅ | ✅ | ✅ | ✅ | - |
+| **CSV** | `.csv` | ✅ | - | ⚠️ | ✅ | - |
+✅ Full support • ⚠️ Basic/partial support • - Not applicable
+---
+## 🔗 MCP Resources
+Instead of returning entire documents in tool responses, you can index a document once and fetch content on-demand via URI-based resources. This keeps context windows manageable when working with large files.
+### How It Works
+1. **Index the document** — `index_document` scans the file and returns URIs
+2. **Fetch what you need** — Request specific chapters, sheets, slides, or images by URI
+3. **Format on demand** — Append `.txt` or `.html` to get different output formats
+### Resource URI Patterns
+| URI Pattern | Description | Example |
+|-------------|-------------|---------|
+| `chapter://{doc_id}/{n}` | Single chapter/section | `chapter://abc123/3` |
+| `chapters://{doc_id}/{range}` | Multiple chapters | `chapters://abc123/1-5` |
+| `section://{doc_id}/{n}` | Section by heading style | `section://abc123/2` |
+| `paragraph://{doc_id}/{ch}/{p}` | Specific paragraph | `paragraph://abc123/3/7` |
+| `sheet://{doc_id}/{name}` | Excel sheet as markdown table | `sheet://abc123/Revenue` |
+| `slide://{doc_id}/{n}` | PowerPoint slide | `slide://abc123/5` |
+| `slides://{doc_id}/{range}` | Multiple slides | `slides://abc123/1,3,5` |
+| `image://{doc_id}/{n}` | Embedded image | `image://abc123/0` |
+### Format Suffixes
+Append a format suffix to convert on the fly:
+| Suffix | Output |
+|--------|--------|
+| `.md` (default) | Markdown |
+| `.txt` | Plain text (no formatting) |
+| `.html` | Basic HTML |
+Examples:
+- `chapter://abc123/3` → Markdown (default)
+- `chapter://abc123/3.txt` → Plain text
+- `chapter://abc123/3.html` → HTML
+### Range Syntax
+Fetch multiple items at once:
+- `1-5` → Items 1 through 5
+- `1,3,5` → Specific items
+- `1-3,7,9-10` → Mixed ranges
+### Section Detection
+The indexer detects document structure automatically:
+1. **Heading 1 styles** (primary) — Business docs, manuals, technical documents
+2. **"Chapter X" text patterns** (fallback) — Books, manuscripts, narratives
+Use `text_patterns_only=True` to skip heading style detection for documents with messy formatting.
+---
+## 🎯 MCP Prompts
+Pre-built workflows that chain multiple tools together:
+| Prompt | Level | Description |
+|--------|-------|-------------|
+| `explore-document` | Basic | Start with any new document - get structure and identify issues |
+| `find-character` | Basic | Track all mentions of a person/character with context |
+| `chapter-preview` | Basic | Quick overview of each chapter without full read |
+| `resume-reading` | Intermediate | Check saved position and continue reading |
+| `document-analysis` | Intermediate | Comprehensive multi-tool analysis |
+| `character-journey` | Advanced | Track character arc through entire narrative |
+| `document-comparison` | Advanced | Compare entities and themes between chapters |
+| `full-reading-session` | Advanced | Guided reading with bookmarking |
+| `manuscript-review` | Advanced | Complete editorial workflow for editors |
+---
+## 💡 Usage Examples
+### Extract Text from Any Document
+```python
+# Simple extraction
+result = await extract_text("report.docx")
+print(result["text"])
+# With formatting preserved
+result = await extract_text(
+    file_path="report.docx",
+    preserve_formatting=True,
+    include_metadata=True
+)
+```
+### Convert Word to Markdown (with Pagination)
+Large documents get paginated automatically. Three ways to handle it:
+```python
+# Option 1: Follow the cursor for each chunk
+result = await convert_to_markdown("big-manual.docx")
+if result.get("pagination", {}).get("has_more"):
+    next_page = await convert_to_markdown(
+        "big-manual.docx",
+        cursor_id=result["pagination"]["cursor_id"]
+    )
+# Option 2: Grab specific pages
+result = await convert_to_markdown("big-manual.docx", page_range="1-10")
+# Option 3: Extract by chapter heading
+result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
+```
+### Analyze Excel Data Quality
+```python
+result = await analyze_excel_data(
+    file_path="sales-data.xlsx",
+    include_statistics=True,
+    check_data_quality=True
+)
+# Returns per-column analysis with quality issues
+```
+### Index Document for On-Demand Resource Fetching
+```python
+# Index the document - returns URIs for all content
+result = await index_document("novel.docx")
+# Returns:
+# {
+#   "doc_id": "56036b0f171a",
+#   "resources": {
+#     "chapter": [
+#       {"id": "1", "title": "Chapter 1", "uri": "chapter://56036b0f171a/1"},
+#       ...
+#     ],
+#     "image": [
+#       {"id": "0", "uri": "image://56036b0f171a/0"},
+#       ...
+#     ]
+#   }
+# }
+# Fetch specific content via MCP resources:
+# - chapter://56036b0f171a/1      → Chapter 1 as markdown
+# - chapter://56036b0f171a/1.txt  → Chapter 1 as plain text
+# - chapters://56036b0f171a/1-3   → Chapters 1-3 combined
+```
+---
+## 🧪 Testing
+```bash
+# Run tests and generate the dashboard
+make test
+# Just pytest
+make test-pytest
+# Open dashboard
+make view-dashboard
+```
+---
+## 🏗 Architecture
+The mixin pattern keeps things modular — universal tools work on everything, format-specific tools go deeper.
+```
+mcwaddams/
+├── src/mcwaddams/
+│   ├── server.py              # FastMCP server + resource templates
+│   ├── resources.py           # Resource store for on-demand content
+│   ├── mixins/
+│   │   ├── universal.py       # Format-agnostic tools
+│   │   ├── word.py            # Word-specific tools
+│   │   ├── excel.py           # Excel-specific tools
+│   │   └── powerpoint.py      # PowerPoint tools
+│   ├── utils/                 # Validation, caching, detection
+│   └── pagination.py          # Large document pagination
+├── tests/
+└── reports/
+```
+### Processing Libraries
+| Format | Primary Library | Fallback |
+|--------|----------------|----------|
+| `.docx` | python-docx | mammoth |
+| `.xlsx` | openpyxl | pandas |
+| `.pptx` | python-pptx | - |
+| `.doc`/`.xls`/`.ppt` | olefile | - |
+| `.csv` | pandas | built-in csv |
+---
+## 🔧 Development
+```bash
+git clone https://github.com/ryanmalloy/mcwaddams.git
+cd mcwaddams
+uv sync --dev
+uv run pytest
+uv run black src/ tests/
+uv run ruff check src/ tests/
+```
+---
+## 👤 Author
+**Ryan Malloy** — [ryanmalloy.com](https://ryanmalloy.com)
+This package emerged from a human-AI collaboration session. The process raised questions about discernment, voice, and what makes tools actually useful:
+- **[AI Isn't New. Your Discernment Is What Matters.](https://ryanmalloy.com/blog/ai-discernment)** — 40 years of writing code and why discernment matters more than the tools
+---
+## 📜 License
+MIT License - see [LICENSE](LICENSE) for details.
+---
+<div align="center">
+*Named for Milton Waddams, who was relocated to the basement with the legacy documents.*
+*"I could set the building on fire..."*
+**Built with [FastMCP](https://gofastmcp.com) and the [Model Context Protocol](https://modelcontextprotocol.io)**
+</div>
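
Note on the README embedded in PKG-INFO above: its "Resource URI Patterns", "Format Suffixes", and "Range Syntax" sections together describe a small URI grammar. The sketch below shows one way such a URI could be split into its parts. It is not taken from the package (the contents of `resources.py` are not shown in this hunk). `ResourceRef`, `expand_range`, and `parse_resource_uri` are hypothetical names, and the parsing rules are assumptions inferred only from the README examples, such as `chapter://abc123/3.txt` and `1-3,7,9-10`.

```python
import re
from dataclasses import dataclass

# Suffixes listed in the README's "Format Suffixes" table; "md" is the default.
KNOWN_FORMATS = ("md", "txt", "html")


@dataclass
class ResourceRef:
    """Hypothetical container for the parts of a resource URI."""
    scheme: str    # e.g. "chapter", "chapters", "sheet", "slide"
    doc_id: str    # e.g. "abc123"
    selector: str  # e.g. "3", "1-5", "3/7", "Revenue"
    fmt: str       # "md", "txt", or "html"


def expand_range(spec: str) -> list[int]:
    """Expand a range spec such as '1-3,7,9-10' into a list of item numbers."""
    items: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            low, high = int(start), int(end)
            if low > high:
                raise ValueError(f"descending range: {part!r}")
            items.extend(range(low, high + 1))
        else:
            items.append(int(part))
    return items


def parse_resource_uri(uri: str) -> ResourceRef:
    """Split 'scheme://doc_id/selector[.fmt]' into its components."""
    match = re.fullmatch(r"([a-z]+)://([^/]+)/(.+)", uri)
    if match is None:
        raise ValueError(f"not a resource URI: {uri!r}")
    scheme, doc_id, selector = match.groups()

    # Treat a trailing ".md", ".txt", or ".html" as a format suffix (assumption:
    # any other dotted tail, such as a sheet name, is part of the selector).
    fmt = "md"
    head, dot, tail = selector.rpartition(".")
    if dot and tail in KNOWN_FORMATS:
        selector, fmt = head, tail
    return ResourceRef(scheme, doc_id, selector, fmt)
```

Under these assumptions, `parse_resource_uri("chapters://abc123/1-5")` yields the selector `1-5` with the default `md` format, and `expand_range` can then turn that selector into individual chapter numbers. A sheet whose name happens to end in `.txt` would be misread as a format suffix, which a real implementation would need to handle.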