mcp-code-indexer: 2.2.1 (tar.gz) → 2.4.0 (tar.gz)

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (51)

{mcp_code_indexer-2.2.1/src/mcp_code_indexer.egg-info → mcp_code_indexer-2.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mcp-code-indexer
-Version: 2.2.1
+Version: 2.4.0
 Summary: MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews.
 Author: MCP Code Indexer Contributors
 Maintainer: MCP Code Indexer Contributors
@@ -59,8 +59,8 @@ Dynamic: requires-python
 # MCP Code Indexer 🚀
-[![PyPI version](https://badge.fury.io/py/mcp-code-indexer.svg?16)](https://badge.fury.io/py/mcp-code-indexer)
-[![Python](https://img.shields.io/pypi/pyversions/mcp-code-indexer.svg?16)](https://pypi.org/project/mcp-code-indexer/)
+[![PyPI version](https://badge.fury.io/py/mcp-code-indexer.svg?18)](https://badge.fury.io/py/mcp-code-indexer)
+[![Python](https://img.shields.io/pypi/pyversions/mcp-code-indexer.svg?18)](https://pypi.org/project/mcp-code-indexer/)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 A production-ready **Model Context Protocol (MCP) server** that revolutionizes how AI agents navigate and understand codebases. Built for high-concurrency environments with advanced database resilience, the server provides instant access to intelligent descriptions, semantic search, and context-aware recommendations while maintaining 800+ writes/sec throughput.

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/README.md RENAMED

@@ -1,7 +1,7 @@
 # MCP Code Indexer 🚀
-[![PyPI version](https://badge.fury.io/py/mcp-code-indexer.svg?16)](https://badge.fury.io/py/mcp-code-indexer)
-[![Python](https://img.shields.io/pypi/pyversions/mcp-code-indexer.svg?16)](https://pypi.org/project/mcp-code-indexer/)
+[![PyPI version](https://badge.fury.io/py/mcp-code-indexer.svg?18)](https://badge.fury.io/py/mcp-code-indexer)
+[![Python](https://img.shields.io/pypi/pyversions/mcp-code-indexer.svg?18)](https://pypi.org/project/mcp-code-indexer/)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 A production-ready **Model Context Protocol (MCP) server** that revolutionizes how AI agents navigate and understand codebases. Built for high-concurrency environments with advanced database resilience, the server provides instant access to intelligent descriptions, semantic search, and context-aware recommendations while maintaining 800+ writes/sec throughput.

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/docs/api-reference.md RENAMED

@@ -327,7 +327,14 @@ const result = await mcp.callTool("update_missing_descriptions", {
 ### search_descriptions
-Searches through all file descriptions in a project to find files related to specific functionality. Use this for large codebases instead of loading the entire structure. Returns files ranked by relevance.
+Searches through all file descriptions in a project to find files related to specific functionality using intelligent query preprocessing. Features include:
+- **Multi-word search**: `"grpc proto"` finds files containing both terms regardless of order
+- **Operator escaping**: FTS5 operators (`AND`, `OR`, `NOT`, `NEAR`) are treated as literal search terms
+- **Whole word matching**: Prevents partial matches for more precise results
+- **Case insensitive**: Works regardless of case in query or descriptions
+Use this for large codebases instead of loading the entire structure. Returns files ranked by relevance using BM25 scoring.
 #### Parameters
@@ -394,10 +401,50 @@ const result = await mcp.callTool("search_descriptions", {
 }
 ```
+#### Enhanced Search Examples
+**Multi-word search (order-agnostic):**
+```javascript
+// Both queries find the same results
+await mcp.callTool("search_descriptions", {
+  projectName: "api-service",
+  folderPath: "/projects/api-service",
+  branch: "main",
+  query: "grpc proto"        // Finds files with both "grpc" AND "proto"
+});
+await mcp.callTool("search_descriptions", {
+  projectName: "api-service",
+  folderPath: "/projects/api-service",
+  branch: "main",
+  query: "proto grpc"        // Same results as above
+});
+```
+**FTS5 operator escaping:**
+```javascript
+// Search for files containing literal "AND" as a term
+await mcp.callTool("search_descriptions", {
+  projectName: "error-handling",
+  folderPath: "/projects/error-handling",
+  branch: "main",
+  query: "logging AND error"  // Finds files with all three: "logging", "AND", "error"
+});
+```
+**Case insensitive matching:**
+```javascript
+// All variations return same results
+const queries = ["HTTP client", "http CLIENT", "Http Client"];
+// Each finds files containing both "http" and "client" regardless of case
+```
 🔍 **Search Tips**:
+- **Use multiple words**: "grpc proto" finds files with both terms
+- **Try different orders**: "api client" vs "client api" yield same results
 - **Be descriptive**: "authentication logic" vs "auth"
-- **Combine concepts**: "database connection pooling"
-- **Try variations**: If no results, try different terms
+- **Don't worry about operators**: "AND", "OR" are treated as literal search terms
+- **Case doesn't matter**: "HTTP", "http", "Http" all work the same
 - **Use technical terms**: "middleware", "controller", "utils"
 - **Search by purpose**: "error handling", "data validation"

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "mcp-code-indexer"
-version = "2.2.1"
+version = "2.4.0"
 description = "MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews."
 readme = "README.md"
 license = {text = "MIT"}

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/src/mcp_code_indexer/database/database.py RENAMED

@@ -30,6 +30,7 @@ from mcp_code_indexer.database.exceptions import (
 from mcp_code_indexer.database.connection_health import (
     ConnectionHealthMonitor, DatabaseMetricsCollector
 )
+from mcp_code_indexer.query_preprocessor import preprocess_search_query
 logger = logging.getLogger(__name__)
@@ -848,7 +849,16 @@ class DatabaseManager:
         query: str,
         max_results: int = 20
     ) -> List[SearchResult]:
-        """Search file descriptions using FTS5."""
+        """Search file descriptions using FTS5 with intelligent query preprocessing."""
+        # Preprocess query for optimal FTS5 search
+        preprocessed_query = preprocess_search_query(query)
+        if not preprocessed_query:
+            logger.debug(f"Empty query after preprocessing: '{query}'")
+            return []
+        logger.debug(f"Search query preprocessing: '{query}' -> '{preprocessed_query}'")
         async with self.get_connection() as db:
             cursor = await db.execute(
                 """
@@ -866,7 +876,7 @@ class DatabaseManager:
                 ORDER BY bm25(file_descriptions_fts)
                 LIMIT ?
                 """,
-                (query, project_id, branch, max_results)
+                (preprocessed_query, project_id, branch, max_results)
             )
             rows = await cursor.fetchall()
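
Note on the hunk above: the search path now hands FTS5 a quoted, AND-joined expression instead of the raw user string. The following is a minimal sketch of what that changes at the SQLite level, assuming a Python build whose bundled SQLite has FTS5 enabled. The `docs` table, its columns, and the `search` helper are hypothetical stand-ins; the package's real schema is not shown in this diff.

```python
import sqlite3
from typing import List

# Hypothetical table for illustration only (not the package's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, description)")
conn.executemany(
    "INSERT INTO docs (path, description) VALUES (?, ?)",
    [
        ("a.py", "Logging and error reporting helpers"),
        ("b.py", "Error logging utilities"),
        ("c.py", "gRPC proto definitions for the public API"),
        ("d.py", "Wrapper around the grpcio client"),
    ],
)


def search(match_expr: str) -> List[str]:
    """Run an FTS5 MATCH query and return paths ordered by BM25 rank."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT 20",
        (match_expr,),
    ).fetchall()
    return [row[0] for row in rows]


# Raw query: AND is parsed as an operator, so both logging files match.
print(sorted(search("logging AND error")))

# Preprocessed form: "AND" is a literal term, so only the description
# that actually contains the word "and" matches.
print(search('"logging" AND "AND" AND "error"'))

# Quoted terms match whole tokens, so "grpc" does not match "grpcio",
# and matching is case-insensitive with the default tokenizer.
print(search('"grpc"'))
print(search('"PROTO" AND "GRPC"'))
```

One consequence worth noting: a query that used to rely on `OR` or `NOT` as operators will behave differently after this change, because those words are now searched for literally.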

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/src/mcp_code_indexer/main.py RENAMED

@@ -294,6 +294,7 @@ async def handle_runcommand(args: argparse.Namespace) -> None:
             "update_codebase_overview": server._handle_update_codebase_overview,
             "get_word_frequency": server._handle_get_word_frequency,
             "merge_branch_descriptions": server._handle_merge_branch_descriptions,
+            "search_codebase_overview": server._handle_search_codebase_overview,
         }
         if tool_name not in tool_handlers:

mcp_code_indexer-2.4.0/src/mcp_code_indexer/query_preprocessor.py ADDED

@@ -0,0 +1,181 @@
+"""
+Query preprocessing module for intelligent FTS5 search.
+This module provides intelligent query preprocessing for SQLite FTS5 full-text search
+to enable multi-word search with case insensitive matching, whole word enforcement,
+and proper handling of FTS5 operators as literal search terms.
+Key features:
+- Multi-word queries: "grpc proto" becomes "grpc" AND "proto" for order-agnostic matching
+- FTS5 operator escaping: "AND OR" becomes '"AND" AND "OR"' to treat operators as literals
+- Whole word matching: prevents partial matches by relying on proper tokenization
+- Case insensitive: leverages FTS5 default behavior
+- Special character handling: preserves special characters in quoted terms
+"""
+import re
+import logging
+from typing import List, Set
+logger = logging.getLogger(__name__)
+class QueryPreprocessor:
+    """
+    Preprocesses user queries for optimal FTS5 search performance.
+    Handles multi-word queries, operator escaping, and special character preservation
+    while maintaining BM25 ranking performance.
+    """
+    # FTS5 operators that need to be escaped when used as literal search terms
+    FTS5_OPERATORS: Set[str] = {
+        'AND', 'OR', 'NOT', 'NEAR'
+    }
+    def __init__(self):
+        """Initialize the query preprocessor."""
+        pass
+    def preprocess_query(self, query: str) -> str:
+        """
+        Preprocess a user query for FTS5 search.
+        Args:
+            query: Raw user query string
+        Returns:
+            Preprocessed query string optimized for FTS5
+        Examples:
+            >>> preprocessor = QueryPreprocessor()
+            >>> preprocessor.preprocess_query("grpc proto")
+            '"grpc" AND "proto"'
+            >>> preprocessor.preprocess_query("error AND handling")
+            '"error" AND "AND" AND "handling"'
+            >>> preprocessor.preprocess_query('config "file system"')
+            '"config" AND "file system"'
+        """
+        if not query or not query.strip():
+            return ""
+        # Normalize whitespace
+        query = query.strip()
+        # Split into terms while preserving quoted phrases
+        terms = self._split_terms(query)
+        if not terms:
+            return ""
+        # Process each term: escape operators and add quotes
+        processed_terms = []
+        for term in terms:
+            processed_term = self._process_term(term)
+            if processed_term:  # Skip empty terms
+                processed_terms.append(processed_term)
+        if not processed_terms:
+            return ""
+        # Join with AND for multi-word matching
+        result = " AND ".join(processed_terms)
+        logger.debug(f"Preprocessed query: '{query}' -> '{result}'")
+        return result
+    def _split_terms(self, query: str) -> List[str]:
+        """
+        Split query into terms while preserving quoted phrases.
+        Args:
+            query: Input query string
+        Returns:
+            List of terms and quoted phrases
+        Examples:
+            'grpc proto' -> ['grpc', 'proto']
+            'config "file system"' -> ['config', '"file system"']
+            'error AND handling' -> ['error', 'AND', 'handling']
+        """
+        terms = []
+        # Regex to match quoted phrases or individual words
+        # This pattern captures:
+        # 1. Double-quoted strings (including the quotes)
+        # 2. Single words (sequences of non-whitespace characters)
+        pattern = r'"[^"]*"|\S+'
+        matches = re.findall(pattern, query)
+        for match in matches:
+            # Skip empty matches
+            if match.strip():
+                terms.append(match)
+        return terms
+    def _process_term(self, term: str) -> str:
+        """
+        Process a single term: escape operators and ensure proper quoting.
+        Args:
+            term: Single term or quoted phrase
+        Returns:
+            Processed term ready for FTS5
+        Examples:
+            'grpc' -> '"grpc"'
+            'AND' -> '"AND"'
+            '"file system"' -> '"file system"'
+            'c++' -> '"c++"'
+        """
+        if not term:
+            return ""
+        # If already quoted, return as-is (user intentional phrase)
+        if term.startswith('"') and term.endswith('"') and len(term) >= 2:
+            return term
+        # Check if term is an FTS5 operator (case-insensitive)
+        if term.upper() in self.FTS5_OPERATORS:
+            # Escape operator by quoting
+            escaped_term = f'"{term}"'
+            logger.debug(f"Escaped FTS5 operator: '{term}' -> '{escaped_term}'")
+            return escaped_term
+        # Quote all terms to ensure whole-word matching and handle special characters
+        return f'"{term}"'
+    def _escape_quotes_in_term(self, term: str) -> str:
+        """
+        Escape internal quotes in a term for FTS5 compatibility.
+        Args:
+            term: Term that may contain quotes
+        Returns:
+            Term with escaped quotes
+        Examples:
+            'say "hello"' -> 'say ""hello""'
+            "test's file" -> "test's file"
+        """
+        # In FTS5, quotes inside quoted strings are escaped by doubling them
+        return term.replace('"', '""')
+def preprocess_search_query(query: str) -> str:
+    """
+    Convenience function for preprocessing search queries.
+    Args:
+        query: Raw user query
+    Returns:
+        Preprocessed query ready for FTS5
+    """
+    preprocessor = QueryPreprocessor()
+    return preprocessor.preprocess_query(query)
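
Note on the file above: `_escape_quotes_in_term` is defined, but `_process_term` as shown never calls it, so a term with an embedded double quote such as `say"hi` would be emitted as `"say"hi"`, which FTS5 is likely to reject as an unterminated string. The sketch below shows one way the two steps could be combined. The names `quote_fts5_term` and `build_match_query` are hypothetical and not part of the package; the splitting pattern is the one used in `_split_terms`.

```python
import re
from typing import List

# Same pattern as _split_terms: a double-quoted phrase or a run of non-whitespace.
TERM_PATTERN = re.compile(r'"[^"]*"|\S+')


def quote_fts5_term(term: str) -> str:
    """Wrap a term in double quotes, doubling any embedded double quote."""
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        inner = term[1:-1]  # already a quoted phrase: re-quote its contents
    else:
        inner = term
    # FTS5 strings escape an embedded double quote by doubling it.
    return '"' + inner.replace('"', '""') + '"'


def build_match_query(query: str) -> str:
    """Quote every term and join with AND, mirroring preprocess_query."""
    terms: List[str] = TERM_PATTERN.findall(query)
    return " AND ".join(quote_fts5_term(term) for term in terms)


print(build_match_query("grpc proto"))            # "grpc" AND "proto"
print(build_match_query('config "file system"'))  # "config" AND "file system"
print(build_match_query('say"hi'))                # "say""hi"
```

Because every term ends up quoted, the separate operator check in `_process_term` becomes unnecessary in this variant: `AND`, `OR`, `NOT`, and `NEAR` are quoted like any other term.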

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/src/mcp_code_indexer/server/mcp_server.py RENAMED

@@ -478,6 +478,23 @@ src/
                         "properties": {},
                         "additionalProperties": False
                     }
+                ),
+                types.Tool(
+                    name="search_codebase_overview",
+                    description="Search for a single word in the codebase overview and return 2 sentences before and after where the word is found. Useful for quickly finding specific information in large overviews.",
+                    inputSchema={
+                        "type": "object",
+                        "properties": {
+                            "projectName": {"type": "string", "description": "The name of the project"},
+                            "folderPath": {"type": "string", "description": "Absolute path to the project folder on disk"},
+                            "branch": {"type": "string", "description": "Git branch name"},
+                            "remoteOrigin": {"type": "string", "description": "Git remote origin URL if available"},
+                            "upstreamOrigin": {"type": "string", "description": "Upstream repository URL if this is a fork"},
+                            "searchWord": {"type": "string", "description": "Single word to search for in the overview"}
+                        },
+                        "required": ["projectName", "folderPath", "branch", "searchWord"],
+                        "additionalProperties": False
+                    }
                 )
             ]
@@ -503,6 +520,7 @@ src/
                 "get_word_frequency": self._handle_get_word_frequency,
                 "merge_branch_descriptions": self._handle_merge_branch_descriptions,
                 "check_database_health": self._handle_check_database_health,
+                "search_codebase_overview": self._handle_search_codebase_overview,
             }
             if name not in tool_handlers:
@@ -889,18 +907,28 @@ src/
         # Use provided token limit or fall back to server default
         token_limit = arguments.get("tokenLimit", self.token_limit)
-        # Calculate total tokens
+        # Calculate total tokens for descriptions
         logger.info("Calculating total token count...")
-        total_tokens = self.token_counter.calculate_codebase_tokens(file_descriptions)
+        descriptions_tokens = self.token_counter.calculate_codebase_tokens(file_descriptions)
+        # Get overview tokens if available
+        overview = await self.db_manager.get_project_overview(project_id, resolved_branch)
+        overview_tokens = 0
+        if overview and overview.overview:
+            overview_tokens = self.token_counter.count_tokens(overview.overview)
+        total_tokens = descriptions_tokens + overview_tokens
         is_large = total_tokens > token_limit
         recommendation = "use_search" if is_large else "use_overview"
-        logger.info(f"Codebase analysis complete: {total_tokens} tokens, {len(file_descriptions)} files")
+        logger.info(f"Codebase analysis complete: {total_tokens} tokens total ({descriptions_tokens} descriptions + {overview_tokens} overview), {len(file_descriptions)} files")
         logger.info(f"Size assessment: {'LARGE' if is_large else 'SMALL'} (limit: {token_limit})")
         logger.info(f"Recommendation: {recommendation}")
         return {
             "totalTokens": total_tokens,
+            "descriptionsTokens": descriptions_tokens,
+            "overviewTokens": overview_tokens,
             "isLarge": is_large,
             "recommendation": recommendation,
             "tokenLimit": token_limit,
@@ -1205,6 +1233,54 @@ src/
             "totalUniqueTerms": result.total_unique_terms
         }
+    async def _handle_search_codebase_overview(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
+        """Handle search_codebase_overview tool calls."""
+        project_id = await self._get_or_create_project_id(arguments)
+        resolved_branch = await self._resolve_branch(project_id, arguments["branch"])
+        search_word = arguments["searchWord"].lower()
+        # Get the overview
+        overview = await self.db_manager.get_project_overview(project_id, resolved_branch)
+        if not overview or not overview.overview:
+            return {
+                "found": False,
+                "message": "No overview found for this project",
+                "searchWord": arguments["searchWord"]
+            }
+        # Split overview into sentences
+        import re
+        sentences = re.split(r'[.!?]+', overview.overview)
+        sentences = [s.strip() for s in sentences if s.strip()]
+        # Find matches
+        matches = []
+        for i, sentence in enumerate(sentences):
+            if search_word in sentence.lower():
+                # Get context: 2 sentences before and after
+                start_idx = max(0, i - 2)
+                end_idx = min(len(sentences), i + 3)
+                context_sentences = sentences[start_idx:end_idx]
+                context = '. '.join(context_sentences) + '.'
+                matches.append({
+                    "matchIndex": i,
+                    "matchSentence": sentence,
+                    "context": context,
+                    "contextStartIndex": start_idx,
+                    "contextEndIndex": end_idx - 1
+                })
+        return {
+            "found": len(matches) > 0,
+            "searchWord": arguments["searchWord"],
+            "matches": matches,
+            "totalMatches": len(matches),
+            "totalSentences": len(sentences)
+        }
     async def _handle_check_database_health(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
         """
         Handle check_database_health tool calls with comprehensive diagnostics.
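
Note on the size-check hunk in this file: the total now includes the overview's tokens as well as the descriptions', so a project can cross the limit on the strength of its overview alone. The sketch below restates that arithmetic in isolation. The function name `assess_codebase_size` and the whitespace-based counter are assumptions for illustration; the server uses its own `token_counter`, whose implementation is not part of this diff.

```python
from typing import Callable, Dict, List, Optional, Union


def assess_codebase_size(
    descriptions: List[str],
    overview: Optional[str],
    token_limit: int,
    count_tokens: Callable[[str], int],
) -> Dict[str, Union[int, bool, str]]:
    """Sum description and overview tokens and pick a recommendation."""
    descriptions_tokens = sum(count_tokens(text) for text in descriptions)
    overview_tokens = count_tokens(overview) if overview else 0
    total_tokens = descriptions_tokens + overview_tokens
    is_large = total_tokens > token_limit
    return {
        "totalTokens": total_tokens,
        "descriptionsTokens": descriptions_tokens,
        "overviewTokens": overview_tokens,
        "isLarge": is_large,
        "recommendation": "use_search" if is_large else "use_overview",
        "tokenLimit": token_limit,
    }


def whitespace_tokens(text: str) -> int:
    """Stand-in counter (an assumption): one token per whitespace-separated word."""
    return len(text.split())


descriptions = ["parses cli flags", "opens the database"]
print(assess_codebase_size(descriptions, None, 8, whitespace_tokens))
print(assess_codebase_size(descriptions, "a small indexing service", 8, whitespace_tokens))
```

With the stand-in counter, the descriptions alone stay under the limit of 8, while adding the four-word overview pushes the total to 10 and flips the recommendation to `use_search`.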

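Note on `_handle_search_codebase_overview` in the file above: the handler splits the overview on `.`, `!`, and `?`, then tests for the search word as a lowercase substring of each sentence. The standalone sketch below reproduces that windowing logic without the database layer. The function name `search_overview` and the sample overview text are hypothetical.

```python
import re
from typing import Any, Dict, List


def search_overview(overview: str, search_word: str, window: int = 2) -> List[Dict[str, Any]]:
    """Return each matching sentence with up to `window` sentences of context on each side."""
    needle = search_word.lower()
    sentences = [s.strip() for s in re.split(r"[.!?]+", overview) if s.strip()]
    matches: List[Dict[str, Any]] = []
    for i, sentence in enumerate(sentences):
        if needle in sentence.lower():  # substring test, not a whole-word test
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            matches.append({
                "matchIndex": i,
                "matchSentence": sentence,
                "context": ". ".join(sentences[start:end]) + ".",
                "contextStartIndex": start,
                "contextEndIndex": end - 1,
            })
    return matches


overview = (
    "The API layer handles routing. Auth lives in middleware. "
    "Storage uses SQLite. Search relies on FTS5. "
    "Logging is structured. Metrics are exported."
)

for hit in search_overview(overview, "sqlite"):
    print(hit["matchIndex"], hit["context"])
```

Two properties of this approach follow from the code: the substring test means a search for `log` also matches `Logging`, and splitting on every period breaks up text such as version numbers or file names, so context windows over such overviews may be shorter than two full sentences.
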
{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0/src/mcp_code_indexer.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mcp-code-indexer
-Version: 2.2.1
+Version: 2.4.0
 Summary: MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews.
 Author: MCP Code Indexer Contributors
 Maintainer: MCP Code Indexer Contributors
@@ -59,8 +59,8 @@ Dynamic: requires-python
 # MCP Code Indexer 🚀
-[![PyPI version](https://badge.fury.io/py/mcp-code-indexer.svg?16)](https://badge.fury.io/py/mcp-code-indexer)
-[![Python](https://img.shields.io/pypi/pyversions/mcp-code-indexer.svg?16)](https://pypi.org/project/mcp-code-indexer/)
+[![PyPI version](https://badge.fury.io/py/mcp-code-indexer.svg?18)](https://badge.fury.io/py/mcp-code-indexer)
+[![Python](https://img.shields.io/pypi/pyversions/mcp-code-indexer.svg?18)](https://pypi.org/project/mcp-code-indexer/)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 A production-ready **Model Context Protocol (MCP) server** that revolutionizes how AI agents navigate and understand codebases. Built for high-concurrency environments with advanced database resilience, the server provides instant access to intelligent descriptions, semantic search, and context-aware recommendations while maintaining 800+ writes/sec throughput.

{mcp_code_indexer-2.2.1 → mcp_code_indexer-2.4.0}/src/mcp_code_indexer.egg-info/SOURCES.txt RENAMED

@@ -26,6 +26,7 @@ src/mcp_code_indexer/git_hook_handler.py
 src/mcp_code_indexer/logging_config.py
 src/mcp_code_indexer/main.py
 src/mcp_code_indexer/merge_handler.py
+src/mcp_code_indexer/query_preprocessor.py
 src/mcp_code_indexer/token_counter.py
 src/mcp_code_indexer.egg-info/PKG-INFO
 src/mcp_code_indexer.egg-info/SOURCES.txt