databricks-advanced-mcp 0.0.2 (tar.gz)
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- databricks_advanced_mcp-0.0.2/.env.example +20 -0
- databricks_advanced_mcp-0.0.2/.github/workflows/workflow.yml +32 -0
- databricks_advanced_mcp-0.0.2/.gitignore +26 -0
- databricks_advanced_mcp-0.0.2/LICENSE +21 -0
- databricks_advanced_mcp-0.0.2/PKG-INFO +346 -0
- databricks_advanced_mcp-0.0.2/README.md +314 -0
- databricks_advanced_mcp-0.0.2/infra/INSTALL.md +451 -0
- databricks_advanced_mcp-0.0.2/infra/deploy.ps1 +193 -0
- databricks_advanced_mcp-0.0.2/infra/main.bicep +62 -0
- databricks_advanced_mcp-0.0.2/infra/main.parameters.json +18 -0
- databricks_advanced_mcp-0.0.2/pyproject.toml +68 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/__init__.py +7 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/client.py +31 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/config.py +42 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/graph/__init__.py +1 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/graph/builder.py +392 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/graph/cache.py +106 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/graph/models.py +209 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/parsers/__init__.py +1 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/parsers/dlt_parser.py +172 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/parsers/notebook_parser.py +257 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/parsers/sql_parser.py +185 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/reviewers/__init__.py +1 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/reviewers/performance.py +210 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/reviewers/standards.py +183 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/reviewers/suggestions.py +187 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/server.py +31 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/__init__.py +33 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/dependency_scanner.py +375 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/impact_analysis.py +408 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/job_pipeline_ops.py +431 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/notebook_reviewer.py +116 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/sql_executor.py +97 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/table_info.py +245 -0
- databricks_advanced_mcp-0.0.2/src/databricks_advanced_mcp/tools/workspace_listing.py +154 -0
- databricks_advanced_mcp-0.0.2/tests/conftest.py +170 -0
- databricks_advanced_mcp-0.0.2/tests/test_graph.py +105 -0
- databricks_advanced_mcp-0.0.2/tests/test_impact_analysis.py +142 -0
- databricks_advanced_mcp-0.0.2/tests/test_notebook_parser.py +139 -0
- databricks_advanced_mcp-0.0.2/tests/test_reviewers.py +113 -0
- databricks_advanced_mcp-0.0.2/tests/test_spec_compliance.py +608 -0
- databricks_advanced_mcp-0.0.2/tests/test_sql_parser.py +105 -0
- databricks_advanced_mcp-0.0.2/tests/test_tools_integration.py +100 -0
- databricks_advanced_mcp-0.0.2/tests/test_workspace_listing.py +314 -0
- databricks_advanced_mcp-0.0.2/uv.lock +1749 -0

databricks_advanced_mcp-0.0.2/.env.example
@@ -0,0 +1,20 @@
# Databricks Configuration (Required)
# Azure Databricks: https://adb-xxxx.azuredatabricks.net
# Databricks on AWS: https://dbc-xxxx.cloud.databricks.com
DATABRICKS_HOST=https://your-workspace.azuredatabricks.net

# Authentication — provide ONE of the following:
DATABRICKS_TOKEN=dapi_your_personal_access_token
# OR use Azure CLI / managed identity (no env vars needed if already configured)
# ARM_CLIENT_ID=your-service-principal-client-id
# ARM_TENANT_ID=your-azure-tenant-id
# ARM_CLIENT_SECRET=your-service-principal-secret

# SQL Warehouse (Required for SQL execution tools)
DATABRICKS_WAREHOUSE_ID=your-sql-warehouse-id

# Optional — defaults for unqualified table names
# Azure Databricks: typically "main"
# Databricks on AWS/GCP: typically "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default

databricks_advanced_mcp-0.0.2/.github/workflows/workflow.yml
@@ -0,0 +1,32 @@
name: Publish to PyPI

on:
  release:
    types: [published]

permissions:
  id-token: write

jobs:
  pypi-publish:
    name: Upload release to PyPI
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/databricks-advanced-mcp
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install build dependencies
        run: pip install build

      - name: Build package
        run: python -m build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1

databricks_advanced_mcp-0.0.2/.gitignore
@@ -0,0 +1,26 @@
# Environment
.env
.env_*
.venv/

# Python
__pycache__/
*.pyc
*.pyo
*.egg-info/
dist/
build/

# Testing
.pytest_cache/
.coverage
htmlcov/

# IDE
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

databricks_advanced_mcp-0.0.2/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Henry Bravo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

databricks_advanced_mcp-0.0.2/PKG-INFO
@@ -0,0 +1,346 @@
Metadata-Version: 2.4
Name: databricks-advanced-mcp
Version: 0.0.2
Summary: Advanced MCP server for Databricks workspace intelligence — dependency scanning, impact analysis, notebook review, and job/pipeline operations.
Project-URL: Homepage, https://github.com/henrybravo/databricks-advanced-mcp-server
Project-URL: Repository, https://github.com/henrybravo/databricks-advanced-mcp-server
Project-URL: Issues, https://github.com/henrybravo/databricks-advanced-mcp-server/issues
Author: Henry Bravo
License: MIT
License-File: LICENSE
Keywords: aws-databricks,azure-databricks,claude,copilot,databricks,databricks-cloud,fastmcp,mcp,mcp-server
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.11
Requires-Dist: databricks-sdk>=0.30.0
Requires-Dist: fastmcp>=2.0.0
Requires-Dist: networkx>=3.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: sqlglot>=25.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Description-Content-Type: text/markdown

# Databricks Advanced MCP Server

[](https://www.python.org/downloads/)
[](LICENSE)
[](https://modelcontextprotocol.io)

An advanced [Model Context Protocol (MCP)](https://modelcontextprotocol.io) server that gives AI assistants deep visibility into your Databricks workspace: dependency scanning, impact analysis, notebook review, job/pipeline operations, SQL execution, and table metadata inspection.

## Features

| Domain | What it does |
|---|---|
| **SQL Execution** | Run SQL queries against Databricks SQL warehouses with configurable result limits |
| **Table Information** | Inspect table metadata, schemas, column details, row counts, and storage info |
| **Dependency Scanning** | Scan notebooks, jobs, and DLT pipelines to build a workspace dependency graph (DAG) |
| **Impact Analysis** | Predict downstream breakage from column drops, schema changes, or pipeline failures |
| **Notebook Review** | Detect performance anti-patterns and coding-standard violations, and suggest optimizations |
| **Job & Pipeline Ops** | List jobs/pipelines, get run status with error diagnostics, trigger reruns |

## Quick Start

### Prerequisites

- **Python 3.11+**
- **[uv](https://docs.astral.sh/uv/)** — fast Python package manager
- A **Databricks workspace** with a SQL warehouse
- A Databricks **personal access token**

> **Other auth methods:** The Databricks SDK supports [unified authentication](https://docs.databricks.com/en/dev-tools/auth/unified-auth.html) — if you don't set `DATABRICKS_TOKEN`, it falls back to Azure CLI, managed identity, or `.databrickscfg`. The `.env` setup below uses a PAT for simplicity.
>
> **Don't have a Databricks workspace yet?** See [`infra/INSTALL.md`](infra/INSTALL.md) for a one-command Azure deployment using Bicep.
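
To check which identity the SDK actually resolves, here is a minimal sketch (illustrative, not part of this package) that uses only the `databricks-sdk` dependency:

```python
# Minimal auth sanity check (illustrative, not shipped with this package).
# With no arguments, WorkspaceClient() walks the SDK's unified-auth chain:
# DATABRICKS_HOST/DATABRICKS_TOKEN env vars, then .databrickscfg profiles,
# then Azure CLI / managed identity.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # raises if no auth method in the chain succeeds
print(w.current_user.me().user_name)  # prints the resolved identity
```
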
### 1. Install

#### Option A: Install from PyPI (recommended)

```bash
uv pip install databricks-advanced-mcp
```

Or with pip:

```bash
pip install databricks-advanced-mcp
```

#### Option B: Install from source

```bash
git clone https://github.com/henrybravo/databricks-advanced-mcp-server.git
cd databricks-advanced-mcp-server
```

Create and activate a virtual environment:

**Windows (PowerShell)**
```powershell
uv venv .venv
.\.venv\Scripts\Activate.ps1
uv pip install -e .
```

**macOS / Linux**
```bash
uv venv .venv
source .venv/bin/activate
uv pip install -e .
```

### 2. Configure

```bash
cp .env.example .env
```

Edit `.env` with your Databricks credentials:

```dotenv
# Azure Databricks:
DATABRICKS_HOST=https://adb-xxxx.azuredatabricks.net
# Databricks on AWS / GCP:
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com

DATABRICKS_TOKEN=dapi_your_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id

# Optional (defaults shown)
# Azure workspaces typically use "main"; AWS/GCP workspaces use "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default
```

### 3. Add to your IDE

Create `.vscode/mcp.json` in your project to register the MCP server with VS Code / GitHub Copilot.

#### Option A: PyPI install (recommended)

If you installed from PyPI (`pip install databricks-advanced-mcp`), the `databricks-mcp` CLI is available on your PATH:

```jsonc
{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_your_token",
        "DATABRICKS_WAREHOUSE_ID": "your_warehouse_id"
      }
    }
  }
}
```

#### Option B: Virtual environment (source install)

If you cloned the repo and installed into a local `.venv`, point directly to the Python interpreter:

**Windows**
```jsonc
{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/Scripts/python.exe",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}
```

**macOS / Linux**
```jsonc
{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}
```

#### Multiple Workspaces

Each MCP server instance connects to exactly one Databricks workspace. To work with multiple workspaces simultaneously, register a separate server entry per workspace — each with its own credentials:

```jsonc
{
  "servers": {
    // AWS / GCP workspace
    "databricks-cloud": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://dbc-xxxx.cloud.databricks.com",
        "DATABRICKS_TOKEN": "dapi_cloud_token",
        "DATABRICKS_WAREHOUSE_ID": "cloud_warehouse_id",
        "DATABRICKS_CATALOG": "workspace"
      }
    },
    // Azure workspace
    "databricks-azure": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_azure_token",
        "DATABRICKS_WAREHOUSE_ID": "azure_warehouse_id",
        "DATABRICKS_CATALOG": "main"
      }
    }
  }
}
```

Alternatively, with a source install you can use separate `.env` files per workspace:

```jsonc
{
  "servers": {
    "databricks-cloud": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    },
    "databricks-azure": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env_azure"
    }
  }
}
```

### 4. Start using

Once configured, your AI assistant can call any of the tools below. Try prompts like:

- *"List all tables in the `analytics` schema"*
- *"Review the notebook at `/Users/me/etl_pipeline` for performance issues"*
- *"What would break if I drop the `customer_id` column from `main.sales.orders`?"*
- *"Show me the status of job 12345"*

## MCP Tools

| Tool | Description |
|---|---|
| `execute_query` | Execute SQL against a Databricks SQL warehouse |
| `get_table_info` | Get table metadata — columns, row count, properties, storage |
| `list_tables` | List tables in a catalog.schema |
| `scan_notebook` | Scan a notebook for table/column references |
| `scan_jobs` | Scan all jobs for table dependencies |
| `scan_dlt_pipelines` | Scan all DLT pipelines for source/target tables |
| `build_dependency_graph` | Build the full workspace dependency graph |
| `get_table_dependencies` | Get upstream/downstream dependencies for a table |
| `refresh_graph` | Invalidate and rebuild the dependency graph cache |
| `analyze_impact` | Analyze impact of column drop / schema change / pipeline failure |
| `review_notebook` | Review a notebook for issues, anti-patterns, and optimizations |
| `list_jobs` | List jobs with status and schedule info |
| `get_job_status` | Get detailed job run status with error diagnostics |
| `list_pipelines` | List DLT pipelines with state and update status |
| `get_pipeline_status` | Get pipeline update details with event log |
| `trigger_rerun` | Trigger a job rerun (requires confirmation) |
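
You can also exercise these tools outside an IDE. The sketch below is illustrative, not part of this package; it assumes fastmcp's `Client` accepts an MCP-config-style dict (a fastmcp 2.x feature), that `databricks-mcp` is on your PATH, and that `execute_query` takes a parameter named `query` (the table above does not document parameter names):

```python
# Minimal sketch: calling this server's tools from a script via fastmcp.
# Assumptions: fastmcp's Client accepts an MCP-config-style dict (fastmcp 2.x),
# `databricks-mcp` is on PATH, and credentials are set in the environment.
import asyncio

from fastmcp import Client

config = {"mcpServers": {"databricks": {"command": "databricks-mcp"}}}

async def main() -> None:
    async with Client(config) as client:
        print([t.name for t in await client.list_tools()])
        # The "query" parameter name is an assumption, not documented above.
        result = await client.call_tool("execute_query", {"query": "SELECT 1"})
        print(result)

asyncio.run(main())
```
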
## Configuration Reference

| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABRICKS_HOST` | Yes | — | Workspace URL (`https://adb-xxx.azuredatabricks.net` for Azure, `https://dbc-xxx.cloud.databricks.com` for AWS/GCP) |
| `DATABRICKS_TOKEN` | Yes | — | Personal access token or service principal token |
| `DATABRICKS_WAREHOUSE_ID` | Yes | — | SQL warehouse ID for query execution |
| `DATABRICKS_CATALOG` | No | `main` | Default catalog for unqualified table names — use `workspace` for AWS/GCP |
| `DATABRICKS_SCHEMA` | No | `default` | Default schema for unqualified table names |
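
Since `config.py` loads these settings with pydantic-settings (see Architecture below), the table maps one-to-one onto an env-driven settings class. A minimal sketch follows; the field names are assumed for illustration, and only the env var names come from the table:

```python
# Minimal sketch of these settings as a pydantic-settings class; the package's
# real config.py may differ. Env var names match the table above.
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class DatabricksSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DATABRICKS_", env_file=".env")

    host: str                # DATABRICKS_HOST, required
    token: str               # DATABRICKS_TOKEN, required
    warehouse_id: str        # DATABRICKS_WAREHOUSE_ID, required
    catalog: str = "main"    # DATABRICKS_CATALOG
    # "schema" would shadow a pydantic BaseModel attribute, so alias the env var
    schema_name: str = Field(default="default", validation_alias="DATABRICKS_SCHEMA")

settings = DatabricksSettings()  # raises a validation error if required vars are missing
```
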
### Cloud Provider Notes

This server is tested against **Azure Databricks** and **Databricks on AWS** (`.cloud.databricks.com`). Key differences:

| Aspect | Azure | AWS / GCP |
|---|---|---|
| Host format | `https://adb-xxx.azuredatabricks.net` | `https://dbc-xxx.cloud.databricks.com` |
| Default catalog | `main` | `workspace` |
| Workspace root objects | `DIRECTORY` | `DIRECTORY` and `REPO` |

All tools work on both platforms. Set `DATABRICKS_CATALOG` to match your workspace's default catalog.

## Infrastructure (Optional)

If you need to provision a new Azure Databricks workspace, the `infra/` directory contains:

- **`main.bicep`** — Azure Bicep template (Premium SKU, Unity Catalog enabled)
- **`deploy.ps1`** — One-command PowerShell deployment script
- **`INSTALL.md`** — Detailed step-by-step deployment guide

```powershell
cd infra
./deploy.ps1 -ResourceGroupName rg-databricks-mcp -Location eastus2
```

## Development

```bash
# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/

# Type check
uv run mypy src/
```

## Architecture

```
src/databricks_advanced_mcp/
├── server.py              # FastMCP server + CLI entry point
├── config.py              # Pydantic settings from env vars
├── client.py              # Databricks SDK client factory
├── tools/                 # MCP tool implementations
│   ├── sql_executor.py
│   ├── table_info.py
│   ├── dependency_scanner.py
│   ├── impact_analysis.py
│   ├── notebook_reviewer.py
│   └── job_pipeline_ops.py
├── parsers/               # Code parsing engines
│   ├── sql_parser.py      # sqlglot-based SQL extraction
│   ├── notebook_parser.py # Databricks notebook cell parsing
│   └── dlt_parser.py      # DLT pipeline definition parsing
├── graph/                 # Dependency graph
│   ├── models.py          # Node, Edge, DependencyGraph data models
│   ├── builder.py         # Graph builder (orchestrates scans)
│   └── cache.py           # In-memory graph cache with TTL
└── reviewers/             # Notebook review rule engines
    ├── performance.py     # Performance anti-patterns
    ├── standards.py       # Coding standards checks
    └── suggestions.py     # Optimization suggestions
```
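
The dependency graph behind `analyze_impact` and `get_table_dependencies` builds on networkx (a declared dependency), where impact queries reduce to graph traversals. An illustrative sketch of the idea; the edge convention and table names here are invented, not the package's actual models:

```python
# Illustrative sketch of how a table dependency graph answers impact queries.
# Node/edge conventions are assumptions; the package's graph/models.py defines its own.
import networkx as nx

g = nx.DiGraph()
# Edge u -> v means "v reads from u" (v is downstream of u).
g.add_edge("main.sales.orders", "main.sales.daily_revenue")
g.add_edge("main.sales.daily_revenue", "main.reporting.exec_dashboard")
g.add_edge("main.crm.customers", "main.sales.daily_revenue")

# "What would break if main.sales.orders changes?" -> all transitive readers
print(sorted(nx.descendants(g, "main.sales.orders")))
# ['main.reporting.exec_dashboard', 'main.sales.daily_revenue']

# "What does the dashboard depend on?" -> all transitive sources
print(sorted(nx.ancestors(g, "main.reporting.exec_dashboard")))
# ['main.crm.customers', 'main.sales.daily_revenue', 'main.sales.orders']
```
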
## License

[MIT](LICENSE)