PyPI - mcp-vector-search - Versions diffs - 0.0.3__py3-none-any.whl → 0.4.12__py3-none-any.whl - Mend

mcp-vector-search 0.0.3__py3-none-any.whl → 0.4.12__py3-none-any.whl


Potentially problematic release.

This version of mcp-vector-search might be problematic.

Files changed (49)

mcp_vector_search/__init__.py +3 -2
mcp_vector_search/cli/commands/auto_index.py +397 -0
mcp_vector_search/cli/commands/config.py +88 -40
mcp_vector_search/cli/commands/index.py +198 -52
mcp_vector_search/cli/commands/init.py +471 -58
mcp_vector_search/cli/commands/install.py +284 -0
mcp_vector_search/cli/commands/mcp.py +495 -0
mcp_vector_search/cli/commands/search.py +241 -87
mcp_vector_search/cli/commands/status.py +184 -58
mcp_vector_search/cli/commands/watch.py +34 -35
mcp_vector_search/cli/didyoumean.py +184 -0
mcp_vector_search/cli/export.py +320 -0
mcp_vector_search/cli/history.py +292 -0
mcp_vector_search/cli/interactive.py +342 -0
mcp_vector_search/cli/main.py +175 -27
mcp_vector_search/cli/output.py +63 -45
mcp_vector_search/config/defaults.py +50 -36
mcp_vector_search/config/settings.py +49 -35
mcp_vector_search/core/auto_indexer.py +298 -0
mcp_vector_search/core/connection_pool.py +322 -0
mcp_vector_search/core/database.py +335 -25
mcp_vector_search/core/embeddings.py +73 -29
mcp_vector_search/core/exceptions.py +19 -2
mcp_vector_search/core/factory.py +310 -0
mcp_vector_search/core/git_hooks.py +345 -0
mcp_vector_search/core/indexer.py +237 -73
mcp_vector_search/core/models.py +21 -19
mcp_vector_search/core/project.py +73 -58
mcp_vector_search/core/scheduler.py +330 -0
mcp_vector_search/core/search.py +574 -86
mcp_vector_search/core/watcher.py +48 -46
mcp_vector_search/mcp/__init__.py +4 -0
mcp_vector_search/mcp/__main__.py +25 -0
mcp_vector_search/mcp/server.py +701 -0
mcp_vector_search/parsers/base.py +30 -31
mcp_vector_search/parsers/javascript.py +74 -48
mcp_vector_search/parsers/python.py +57 -49
mcp_vector_search/parsers/registry.py +47 -32
mcp_vector_search/parsers/text.py +179 -0
mcp_vector_search/utils/__init__.py +40 -0
mcp_vector_search/utils/gitignore.py +229 -0
mcp_vector_search/utils/timing.py +334 -0
mcp_vector_search/utils/version.py +47 -0
{mcp_vector_search-0.0.3.dist-info → mcp_vector_search-0.4.12.dist-info}/METADATA +173 -7
mcp_vector_search-0.4.12.dist-info/RECORD +54 -0
mcp_vector_search-0.0.3.dist-info/RECORD +0 -35
{mcp_vector_search-0.0.3.dist-info → mcp_vector_search-0.4.12.dist-info}/WHEEL +0 -0
{mcp_vector_search-0.0.3.dist-info → mcp_vector_search-0.4.12.dist-info}/entry_points.txt +0 -0
{mcp_vector_search-0.0.3.dist-info → mcp_vector_search-0.4.12.dist-info}/licenses/LICENSE +0 -0

mcp_vector_search/utils/timing.py ADDED

@@ -0,0 +1,334 @@
+"""Timing utilities for performance measurement and optimization."""
+import asyncio
+import json
+import statistics
+import time
+from collections.abc import Callable
+from contextlib import asynccontextmanager, contextmanager
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+from loguru import logger
+@dataclass
+class TimingResult:
+    """Result of a timing measurement."""
+    operation: str
+    duration: float  # in seconds
+    timestamp: float
+    metadata: dict[str, Any] = field(default_factory=dict)
+    @property
+    def duration_ms(self) -> float:
+        """Duration in milliseconds."""
+        return self.duration * 1000
+    @property
+    def duration_us(self) -> float:
+        """Duration in microseconds."""
+        return self.duration * 1_000_000
+class PerformanceProfiler:
+    """Performance profiler for measuring and analyzing operation timings."""
+    def __init__(self, name: str = "default"):
+        self.name = name
+        self.results: list[TimingResult] = []
+        self._active_timers: dict[str, float] = {}
+        self._nested_level = 0
+    def start_timer(self, operation: str) -> None:
+        """Start timing an operation."""
+        if operation in self._active_timers:
+            logger.warning(f"Timer '{operation}' already active, overwriting")
+        self._active_timers[operation] = time.perf_counter()
+    def stop_timer(
+        self, operation: str, metadata: dict[str, Any] | None = None
+    ) -> TimingResult:
+        """Stop timing an operation and record the result."""
+        if operation not in self._active_timers:
+            raise ValueError(f"Timer '{operation}' not found or not started")
+        start_time = self._active_timers.pop(operation)
+        duration = time.perf_counter() - start_time
+        result = TimingResult(
+            operation=operation,
+            duration=duration,
+            timestamp=time.time(),
+            metadata=metadata or {},
+        )
+        self.results.append(result)
+        return result
+    @contextmanager
+    def time_operation(self, operation: str, metadata: dict[str, Any] | None = None):
+        """Context manager for timing an operation."""
+        indent = "  " * self._nested_level
+        logger.debug(f"{indent}⏱️  Starting: {operation}")
+        self._nested_level += 1
+        start_time = time.perf_counter()
+        try:
+            yield
+        finally:
+            duration = time.perf_counter() - start_time
+            self._nested_level -= 1
+            result = TimingResult(
+                operation=operation,
+                duration=duration,
+                timestamp=time.time(),
+                metadata=metadata or {},
+            )
+            self.results.append(result)
+            indent = "  " * self._nested_level
+            logger.debug(f"{indent}✅ Completed: {operation} ({duration * 1000:.2f}ms)")
+    @asynccontextmanager
+    async def time_async_operation(
+        self, operation: str, metadata: dict[str, Any] | None = None
+    ):
+        """Async context manager for timing an operation."""
+        indent = "  " * self._nested_level
+        logger.debug(f"{indent}⏱️  Starting: {operation}")
+        self._nested_level += 1
+        start_time = time.perf_counter()
+        try:
+            yield
+        finally:
+            duration = time.perf_counter() - start_time
+            self._nested_level -= 1
+            result = TimingResult(
+                operation=operation,
+                duration=duration,
+                timestamp=time.time(),
+                metadata=metadata or {},
+            )
+            self.results.append(result)
+            indent = "  " * self._nested_level
+            logger.debug(f"{indent}✅ Completed: {operation} ({duration * 1000:.2f}ms)")
+    def get_stats(self, operation: str | None = None) -> dict[str, Any]:
+        """Get timing statistics for operations."""
+        if operation:
+            durations = [r.duration for r in self.results if r.operation == operation]
+        else:
+            durations = [r.duration for r in self.results]
+        if not durations:
+            return {}
+        return {
+            "count": len(durations),
+            "total": sum(durations),
+            "mean": statistics.mean(durations),
+            "median": statistics.median(durations),
+            "min": min(durations),
+            "max": max(durations),
+            "std_dev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
+            "p95": statistics.quantiles(durations, n=20)[18]
+            if len(durations) >= 20
+            else max(durations),
+            "p99": statistics.quantiles(durations, n=100)[98]
+            if len(durations) >= 100
+            else max(durations),
+        }
+    def get_operation_breakdown(self) -> dict[str, dict[str, Any]]:
+        """Get breakdown of all operations."""
+        operations = {r.operation for r in self.results}
+        return {op: self.get_stats(op) for op in operations}
+    def print_report(self, show_individual: bool = False, min_duration_ms: float = 0.0):
+        """Print a detailed performance report."""
+        if not self.results:
+            print("No timing results recorded.")
+            return
+        print(f"\n{'=' * 60}")
+        print(f"PERFORMANCE REPORT: {self.name}")
+        print(f"{'=' * 60}")
+        # Overall stats
+        overall_stats = self.get_stats()
+        print("\nOVERALL STATISTICS:")
+        print(f"  Total operations: {overall_stats['count']}")
+        print(f"  Total time: {overall_stats['total'] * 1000:.2f}ms")
+        print(f"  Average: {overall_stats['mean'] * 1000:.2f}ms")
+        print(f"  Median: {overall_stats['median'] * 1000:.2f}ms")
+        print(f"  Min: {overall_stats['min'] * 1000:.2f}ms")
+        print(f"  Max: {overall_stats['max'] * 1000:.2f}ms")
+        # Per-operation breakdown
+        breakdown = self.get_operation_breakdown()
+        print("\nPER-OPERATION BREAKDOWN:")
+        for operation, stats in sorted(
+            breakdown.items(), key=lambda x: x[1]["total"], reverse=True
+        ):
+            print(f"\n  {operation}:")
+            print(f"    Count: {stats['count']}")
+            print(
+                f"    Total: {stats['total'] * 1000:.2f}ms ({stats['total'] / overall_stats['total'] * 100:.1f}%)"
+            )
+            print(f"    Average: {stats['mean'] * 1000:.2f}ms")
+            print(
+                f"    Min/Max: {stats['min'] * 1000:.2f}ms / {stats['max'] * 1000:.2f}ms"
+            )
+            if stats["count"] > 1:
+                print(f"    StdDev: {stats['std_dev'] * 1000:.2f}ms")
+        # Individual results if requested
+        if show_individual:
+            print("\nINDIVIDUAL RESULTS:")
+            for result in self.results:
+                if result.duration_ms >= min_duration_ms:
+                    print(f"  {result.operation}: {result.duration_ms:.2f}ms")
+                    if result.metadata:
+                        print(f"    Metadata: {result.metadata}")
+    def save_results(self, file_path: Path):
+        """Save timing results to a JSON file."""
+        data = {
+            "profiler_name": self.name,
+            "timestamp": time.time(),
+            "results": [
+                {
+                    "operation": r.operation,
+                    "duration": r.duration,
+                    "timestamp": r.timestamp,
+                    "metadata": r.metadata,
+                }
+                for r in self.results
+            ],
+            "stats": self.get_operation_breakdown(),
+        }
+        with open(file_path, "w") as f:
+            json.dump(data, f, indent=2)
+    def clear(self):
+        """Clear all timing results."""
+        self.results.clear()
+        self._active_timers.clear()
+        self._nested_level = 0
+# Global profiler instance
+_global_profiler = PerformanceProfiler("global")
+def time_function(
+    operation_name: str | None = None, metadata: dict[str, Any] | None = None
+):
+    """Decorator for timing function execution."""
+    def decorator(func: Callable) -> Callable:
+        name = operation_name or f"{func.__module__}.{func.__name__}"
+        if asyncio.iscoroutinefunction(func):
+            async def async_wrapper(*args, **kwargs):
+                async with _global_profiler.time_async_operation(name, metadata):
+                    return await func(*args, **kwargs)
+            return async_wrapper
+        else:
+            def sync_wrapper(*args, **kwargs):
+                with _global_profiler.time_operation(name, metadata):
+                    return func(*args, **kwargs)
+            return sync_wrapper
+    return decorator
+@contextmanager
+def time_block(operation: str, metadata: dict[str, Any] | None = None):
+    """Context manager for timing a block of code using the global profiler."""
+    with _global_profiler.time_operation(operation, metadata):
+        yield
+@asynccontextmanager
+async def time_async_block(operation: str, metadata: dict[str, Any] | None = None):
+    """Async context manager for timing a block of code using the global profiler."""
+    async with _global_profiler.time_async_operation(operation, metadata):
+        yield
+def get_global_profiler() -> PerformanceProfiler:
+    """Get the global profiler instance."""
+    return _global_profiler
+def print_global_report(**kwargs):
+    """Print report from the global profiler."""
+    _global_profiler.print_report(**kwargs)
+def clear_global_profiler():
+    """Clear the global profiler."""
+    _global_profiler.clear()
+class SearchProfiler(PerformanceProfiler):
+    """Specialized profiler for search operations."""
+    def __init__(self):
+        super().__init__("search_profiler")
+    async def profile_search(
+        self, search_func: Callable, query: str, **search_kwargs
+    ) -> tuple[Any, dict[str, float]]:
+        """Profile a complete search operation with detailed breakdown."""
+        async with self.time_async_operation(
+            "total_search", {"query": query, "kwargs": search_kwargs}
+        ):
+            # Time the actual search
+            async with self.time_async_operation("search_execution", {"query": query}):
+                result = await search_func(query, **search_kwargs)
+            # Time result processing if we can measure it
+            async with self.time_async_operation(
+                "result_processing",
+                {"result_count": len(result) if hasattr(result, "__len__") else 0},
+            ):
+                # Simulate any post-processing that might happen
+                await asyncio.sleep(0)  # Placeholder for actual processing
+        # Return results and timing breakdown
+        timing_breakdown = {
+            op: self.get_stats(op)["mean"] * 1000  # Convert to ms
+            for op in ["total_search", "search_execution", "result_processing"]
+            if self.get_stats(op)
+        }
+        return result, timing_breakdown
+# Convenience function for quick search profiling
+async def profile_search_operation(
+    search_func: Callable, query: str, **kwargs
+) -> tuple[Any, dict[str, float]]:
+    """Quick function to profile a search operation."""
+    profiler = SearchProfiler()
+    return await profiler.profile_search(search_func, query, **kwargs)
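
The timing module above combines three ideas: a context manager that records durations in a `finally` block, a decorator that wraps functions with that context manager, and a p95 statistic that falls back to `max()` when fewer than 20 samples exist. The following is a minimal standalone sketch of those ideas, not the package's code. The names `timed`, `timed_function`, `p95`, and `busy_work` are hypothetical. Unlike `time_function` in the diff, this sketch applies `functools.wraps`, so the wrapped function keeps its name and docstring.

```python
import functools
import statistics
import time
from contextlib import contextmanager

# Hypothetical stand-in for PerformanceProfiler.results: (operation, seconds) pairs.
_results: list[tuple[str, float]] = []


@contextmanager
def timed(operation: str):
    """Record the duration of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # The finally clause runs even if the block raises, so failures are timed too.
        _results.append((operation, time.perf_counter() - start))


def timed_function(func):
    """Decorator variant; functools.wraps preserves func.__name__ and __doc__."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with timed(func.__name__):
            return func(*args, **kwargs)

    return wrapper


def p95(durations: list[float]) -> float:
    """95th percentile, falling back to max() below 20 samples as get_stats does."""
    if len(durations) >= 20:
        # n=20 yields 19 cut points; index 18 is the 95% boundary.
        return statistics.quantiles(durations, n=20)[18]
    return max(durations)


@timed_function
def busy_work(n: int) -> int:
    """Sum the first n integers."""
    return sum(range(n))


for _ in range(25):
    busy_work(10_000)

durations = [seconds for op, seconds in _results if op == "busy_work"]
print(f"count={len(durations)} p95={p95(durations) * 1000:.3f}ms")
```

With 25 samples the quantile branch is taken; with fewer than 20 the reported "p95" is simply the slowest run, which is worth keeping in mind when reading short reports.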

mcp_vector_search/utils/version.py ADDED

@@ -0,0 +1,47 @@
+"""Version utilities for MCP Vector Search.
+
+This module provides utilities for accessing and formatting version information.
+"""
+
+from typing import Any
+
+from .. import __author__, __build__, __email__, __version__
+
+
+def get_version_info() -> dict[str, Any]:
+    """Get complete version information.
+
+    Returns:
+        Dictionary containing version, build, and package metadata
+    """
+    return {
+        "version": __version__,
+        "build": __build__,
+        "author": __author__,
+        "email": __email__,
+        "package": "mcp-vector-search",
+        "version_string": f"{__version__} (build {__build__})",
+    }
+
+
+def get_version_string(include_build: bool = True) -> str:
+    """Get formatted version string.
+
+    Args:
+        include_build: Whether to include build number
+
+    Returns:
+        Formatted version string
+    """
+    if include_build:
+        return f"{__version__} (build {__build__})"
+    return __version__
+
+
+def get_user_agent() -> str:
+    """Get user agent string for HTTP requests.
+
+    Returns:
+        User agent string including version
+    """
+    return f"mcp-vector-search/{__version__}"
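
As a rough illustration of how the strings produced by this module might be used, the sketch below rebuilds the two formats with local constants and attaches the user agent to a request object. It is an assumption-laden example: `VERSION` is taken from the METADATA hunk in this diff, `BUILD` is a placeholder because the real build number is not shown, and the standard library's `urllib.request` stands in for whichever HTTP client the package actually uses. No network request is sent.

```python
import urllib.request

VERSION = "0.4.12"  # from the METADATA hunk in this diff
BUILD = "0"  # placeholder: the real build number is not shown in the diff


def version_string(include_build: bool = True) -> str:
    """Mirror the format of get_version_string()."""
    if include_build:
        return f"{VERSION} (build {BUILD})"
    return VERSION


def user_agent() -> str:
    """Mirror the format of get_user_agent()."""
    return f"mcp-vector-search/{VERSION}"


# Build (but do not send) a request carrying the user agent header.
request = urllib.request.Request(
    "https://example.invalid/",
    headers={"User-Agent": user_agent()},
)

print(version_string())
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```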

{mcp_vector_search-0.0.3.dist-info → mcp_vector_search-0.4.12.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mcp-vector-search
-Version: 0.0.3
+Version: 0.4.12
 Summary: CLI-first semantic code search with MCP integration
 Project-URL: Homepage, https://github.com/bobmatnyc/mcp-vector-search
 Project-URL: Documentation, https://mcp-vector-search.readthedocs.io
@@ -40,13 +40,15 @@ Classifier: Topic :: Software Development :: Libraries :: Python Modules
 Requires-Python: >=3.11
 Requires-Dist: aiofiles>=23.0.0
 Requires-Dist: chromadb>=0.5.0
+Requires-Dist: click-didyoumean>=0.3.0
 Requires-Dist: httpx>=0.25.0
 Requires-Dist: loguru>=0.7.0
+Requires-Dist: mcp>=1.12.4
 Requires-Dist: pydantic-settings>=2.1.0
 Requires-Dist: pydantic>=2.5.0
 Requires-Dist: rich>=13.0.0
 Requires-Dist: sentence-transformers>=2.2.2
-Requires-Dist: tree-sitter-languages>=1.8.0
+Requires-Dist: tree-sitter-language-pack>=0.9.0
 Requires-Dist: tree-sitter>=0.20.1
 Requires-Dist: typer>=0.9.0
 Requires-Dist: watchdog>=3.0.0
@@ -56,6 +58,10 @@ Description-Content-Type: text/markdown
 🔍 **CLI-first semantic code search with MCP integration**
+[![PyPI version](https://badge.fury.io/py/mcp-vector-search.svg)](https://badge.fury.io/py/mcp-vector-search)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 > ⚠️ **Alpha Release (v0.0.3)**: This is an early-stage project under active development. Expect breaking changes and rough edges. Feedback and contributions are welcome!
 A modern, fast, and intelligent code search tool that understands your codebase through semantic analysis and AST parsing. Built with Python, powered by ChromaDB, and designed for developer productivity.
@@ -75,24 +81,31 @@ A modern, fast, and intelligent code search tool that understands your codebase
 - **Rich Output**: Syntax highlighting, similarity scores, context
 - **Fast Performance**: Sub-second search responses, efficient indexing
 - **Modern Architecture**: Async-first, type-safe, modular design
+- **Semi-Automatic Reindexing**: Multiple strategies without daemon processes
 ### 🔧 **Technical Features**
-- **Vector Database**: ChromaDB for efficient similarity search
+- **Vector Database**: ChromaDB with connection pooling for 13.6% performance boost
 - **Embedding Models**: Configurable sentence transformers
-- **Incremental Updates**: Smart file watching and re-indexing
+- **Smart Reindexing**: Search-triggered, Git hooks, scheduled tasks, and manual options
 - **Extensible Parsers**: Plugin architecture for new languages
 - **Configuration Management**: Project-specific settings
+- **Production Ready**: Connection pooling, auto-indexing, comprehensive error handling
 ## 🚀 Quick Start
 ### Installation
 ```bash
-# Install with UV (recommended)
+# Install from PyPI
+pip install mcp-vector-search
+# Or with UV (recommended)
 uv add mcp-vector-search
-# Or with pip
-pip install mcp-vector-search
+# Or install from source
+git clone https://github.com/bobmatnyc/mcp-vector-search.git
+cd mcp-vector-search
+uv sync && uv pip install -e .
 ```
 ### Basic Usage
@@ -109,6 +122,9 @@ mcp-vector-search search "authentication logic"
 mcp-vector-search search "database connection setup"
 mcp-vector-search search "error handling patterns"
+# Setup automatic reindexing (recommended)
+mcp-vector-search auto-index setup --method all
 # Check project status
 mcp-vector-search status
@@ -116,6 +132,32 @@ mcp-vector-search status
 mcp-vector-search watch
 ```
+### Smart CLI with "Did You Mean" Suggestions
+The CLI includes intelligent command suggestions for typos:
+```bash
+# Typos are automatically detected and corrected
+$ mcp-vector-search serach "auth"
+No such command 'serach'. Did you mean 'search'?
+$ mcp-vector-search indx
+No such command 'indx'. Did you mean 'index'?
+```
+See [docs/CLI_FEATURES.md](docs/CLI_FEATURES.md) for more details.
+## Versioning & Releasing
+This project uses semantic versioning with an automated release workflow.
+### Quick Commands
+- `make version-show` - Display current version
+- `make release-patch` - Create patch release
+- `make publish` - Publish to PyPI
+See [docs/VERSIONING_WORKFLOW.md](docs/VERSIONING_WORKFLOW.md) for complete documentation.
 ## 📖 Documentation
 ### Commands
@@ -142,6 +184,18 @@ mcp-vector-search index /path/to/code
 # Force re-indexing
 mcp-vector-search index --force
+# Reindex entire project
+mcp-vector-search index reindex
+# Reindex entire project (explicit)
+mcp-vector-search index reindex --all
+# Reindex entire project without confirmation
+mcp-vector-search index reindex --force
+# Reindex specific file
+mcp-vector-search index reindex path/to/file.py
 ```
 #### `search` - Semantic Search
@@ -159,6 +213,25 @@ mcp-vector-search search "error handling" --limit 10
 mcp-vector-search search similar "path/to/function.py:25"
 ```
+#### `auto-index` - Automatic Reindexing
+```bash
+# Setup all auto-indexing strategies
+mcp-vector-search auto-index setup --method all
+# Setup specific strategies
+mcp-vector-search auto-index setup --method git-hooks
+mcp-vector-search auto-index setup --method scheduled --interval 60
+# Check for stale files and auto-reindex
+mcp-vector-search auto-index check --auto-reindex --max-files 10
+# View auto-indexing status
+mcp-vector-search auto-index status
+# Remove auto-indexing setup
+mcp-vector-search auto-index teardown --method all
+```
 #### `watch` - File Watching
 ```bash
 # Start watching for changes
@@ -194,6 +267,39 @@ mcp-vector-search config set embedding_model microsoft/codebert-base
 mcp-vector-search config models
 ```
+## 🚀 Performance Features
+### Connection Pooling
+Automatic connection pooling provides **13.6% performance improvement** with zero configuration:
+```python
+# Automatically enabled for high-throughput scenarios
+from mcp_vector_search.core.database import PooledChromaVectorDatabase
+database = PooledChromaVectorDatabase(
+    max_connections=10,    # Pool size
+    min_connections=2,     # Warm connections
+    max_idle_time=300.0,   # 5 minutes
+)
+```
+### Semi-Automatic Reindexing
+Multiple strategies to keep your index up-to-date without daemon processes:
+1. **Search-Triggered**: Automatically checks for stale files during searches
+2. **Git Hooks**: Triggers reindexing after commits, merges, checkouts
+3. **Scheduled Tasks**: System-level cron jobs or Windows tasks
+4. **Manual Checks**: On-demand via CLI commands
+5. **Periodic Checker**: In-process periodic checks for long-running apps
+```bash
+# Setup all strategies
+mcp-vector-search auto-index setup --method all
+# Check status
+mcp-vector-search auto-index status
+```
 ### Configuration
 Projects are configured via `.mcp-vector-search/config.json`:
@@ -316,6 +422,66 @@ Please [open an issue](https://github.com/bobmatnyc/mcp-vector-search/issues) or
 - [ ] Team collaboration features
 - [ ] Production-ready performance
+## 🛠️ Development
+### Three-Stage Development Workflow
+**Stage A: Local Development & Testing**
+```bash
+# Setup development environment
+uv sync && uv pip install -e .
+# Run development tests
+./scripts/dev-test.sh
+# Test CLI locally
+uv run mcp-vector-search version
+```
+**Stage B: Local Deployment Testing**
+```bash
+# Build and test clean deployment
+./scripts/deploy-test.sh
+# Test on other projects
+cd ~/other-project
+mcp-vector-search init && mcp-vector-search index
+```
+**Stage C: PyPI Publication**
+```bash
+# Publish to PyPI
+./scripts/publish.sh
+# Verify published version
+pip install mcp-vector-search --upgrade
+```
+### Quick Reference
+```bash
+./scripts/workflow.sh  # Show workflow overview
+```
+See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed development instructions.
+## 📚 Documentation
+For comprehensive documentation, see **[CLAUDE.md](CLAUDE.md)** - the main documentation index.
+### Quick Links
+- **[Installation & Deployment](docs/DEPLOY.md)** - Setup and deployment guide
+- **[Project Structure](docs/STRUCTURE.md)** - Architecture and file organization
+- **[Contributing Guidelines](docs/developer/CONTRIBUTING.md)** - How to contribute
+- **[API Reference](docs/developer/API.md)** - Internal API documentation
+- **[Testing Guide](docs/developer/TESTING.md)** - Testing strategies
+- **[Code Quality](docs/developer/LINTING.md)** - Linting and formatting
+- **[Versioning](docs/VERSIONING.md)** - Version management
+- **[Releases](docs/RELEASES.md)** - Release process
+## 🤝 Contributing
+Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
 ## 📄 License
 MIT License - see [LICENSE](LICENSE) file for details.
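
The README hunk above lists a search-triggered strategy that "checks for stale files during searches" but the diff shown here does not include how staleness is decided. One common approach, sketched below purely as an assumption, compares each file's modification time with the time recorded when it was last indexed. The function `find_stale_files` and the `indexed_mtimes` mapping are hypothetical and are not taken from the package.

```python
import os
import tempfile
from pathlib import Path


def find_stale_files(
    root: Path, indexed_mtimes: dict[str, float], suffixes: tuple[str, ...] = (".py",)
) -> list[Path]:
    """Return files that are new or modified since they were last indexed."""
    stale = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        last_indexed = indexed_mtimes.get(str(path))
        # A file is stale if it was never indexed or changed after indexing.
        if last_indexed is None or path.stat().st_mtime > last_indexed:
            stale.append(path)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fresh = root / "fresh.py"
    changed = root / "changed.py"
    fresh.write_text("x = 1\n")
    changed.write_text("y = 2\n")

    # Pretend both files were indexed at their current modification time.
    index = {str(p): p.stat().st_mtime for p in (fresh, changed)}

    # Simulate a later edit by moving one file's mtime 10 seconds forward.
    later = changed.stat().st_mtime + 10
    os.utime(changed, (later, later))

    stale_names = [p.name for p in find_stale_files(root, index)]
    print(stale_names)
```

A check like this is cheap enough to run before a search, which is presumably why the README can offer it without a daemon process. Content hashing would be more robust against clock skew at the cost of reading every file.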
