PyPI - code-memory - Versions diffs - 1.0.13__tar.gz → 1.0.15__tar.gz - Mend

code-memory 1.0.13__tar.gz → 1.0.15__tar.gz


Files changed (40)

{code_memory-1.0.13 → code_memory-1.0.15}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: code-memory
-Version: 1.0.13
+Version: 1.0.15
 Summary: A deterministic, high-precision code intelligence MCP server
 Project-URL: Homepage, https://github.com/kapillamba4/code-memory
 Project-URL: Documentation, https://github.com/kapillamba4/code-memory#readme
@@ -44,46 +44,29 @@ Description-Content-Type: text/markdown
 # code-memory
+<img src="assets/logo.png" alt="code-memory logo" width="100%">
 A deterministic, high-precision **code intelligence layer** exposed as a [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server.
-`code-memory` gives your AI coding assistant structured access to your codebase through three focused pathways — eliminating context-window bloat and vague "search everything" queries.
+- **No API key required** — runs entirely locally with sentence-transformers
+- **1 min setup** — just `uvx code-memory` and you're ready
+- **Token saving by 50%** — precise code retrieval instead of dumping entire files
+**Please help star code-memory if you like this project!**
+## Why code-memory?
+Finding the right context from a large codebase is **expensive**, **inaccurate**, and **limited by context windows**. Dumping files into prompts wastes tokens, and LLMs lose track of the actual task as context fills up.
+Instead of manually hunting with `grep`/`find` or dumping raw file text, `code-memory` runs semantic searches against a locally indexed codebase. Inspired by [claude-context](https://github.com/redmonkez12/claude-context), but designed from the ground up for large-scale local search.
 ## Supported Languages
-### Full AST Support (Tree-sitter)
-These languages have structural parsing with symbol extraction (functions, classes, methods, etc.):
-| Language | Extensions |
-|----------|------------|
-| Python | `.py` |
-| JavaScript | `.js`, `.jsx` |
-| TypeScript | `.ts`, `.tsx` |
-| Java | `.java` |
-| Go | `.go` |
-| Rust | `.rs` |
-| C | `.c`, `.h` |
-| C++ | `.cpp`, `.hpp`, `.cc`, `.cxx` |
-| Ruby | `.rb` |
-| Kotlin | `.kt`, `.kts` |
-### Fallback Support (Whole-file Indexing)
-These file types are indexed as complete units for BM25 and semantic search:
-| Category | Extensions |
-|----------|------------|
-| C# | `.cs` |
-| Swift | `.swift` |
-| Scala | `.scala` |
-| Lua | `.lua` |
-| Shell | `.sh`, `.bash`, `.zsh` |
-| Config | `.yaml`, `.yml`, `.toml`, `.json` |
-| Web | `.html`, `.css`, `.scss` |
-| Database | `.sql` |
-| Docs | `.md`, `.txt` |
-> **Note:** Files and directories matching patterns in your `.gitignore` are automatically skipped during indexing. This excludes build artifacts, dependencies, and other generated files.
+**Full AST Support** (structural parsing with symbol extraction): Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, Ruby, Kotlin
+**Fallback Support** (whole-file indexing): C#, Swift, Scala, Lua, Shell, Config (yaml/toml/json), Web (html/css), SQL, Markdown
+> Files matching `.gitignore` patterns are automatically skipped.
 ## Architecture: Progressive Disclosure
@@ -283,12 +266,39 @@ For Windows:
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `CODE_MEMORY_LOG_LEVEL` | Logging verbosity (DEBUG, INFO, WARNING, ERROR) | INFO |
+| `EMBEDDING_MODEL` | HuggingFace model ID for embeddings | `nomic-ai/nomic-embed-text-v1.5` |
 Example:
 ```bash
 CODE_MEMORY_LOG_LEVEL=DEBUG uvx code-memory
 ```
+### Custom Embedding Model
+You can use a different embedding model by setting the `EMBEDDING_MODEL` environment variable:
+```bash
+EMBEDDING_MODEL="BAAI/bge-small-en-v1.5" uvx code-memory
+```
+For MCP hosts, add the environment variable to your configuration:
+```json
+{
+  "mcpServers": {
+    "code-memory": {
+      "command": "uvx",
+      "args": ["code-memory"],
+      "env": {
+        "EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5"
+      }
+    }
+  }
+}
+```
+> **Note:** Changing the embedding model will invalidate existing indexes. You'll need to re-run `index_codebase` after switching models.
 ## Tools
 ### `index_codebase`

{code_memory-1.0.13 → code_memory-1.0.15}/README.md RENAMED

@@ -1,45 +1,28 @@
 # code-memory
+<img src="assets/logo.png" alt="code-memory logo" width="100%">
 A deterministic, high-precision **code intelligence layer** exposed as a [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server.
-`code-memory` gives your AI coding assistant structured access to your codebase through three focused pathways — eliminating context-window bloat and vague "search everything" queries.
+- **No API key required** — runs entirely locally with sentence-transformers
+- **1 min setup** — just `uvx code-memory` and you're ready
+- **Token saving by 50%** — precise code retrieval instead of dumping entire files
+**Please help star code-memory if you like this project!**
+## Why code-memory?
+Finding the right context from a large codebase is **expensive**, **inaccurate**, and **limited by context windows**. Dumping files into prompts wastes tokens, and LLMs lose track of the actual task as context fills up.
+Instead of manually hunting with `grep`/`find` or dumping raw file text, `code-memory` runs semantic searches against a locally indexed codebase. Inspired by [claude-context](https://github.com/redmonkez12/claude-context), but designed from the ground up for large-scale local search.
 ## Supported Languages
-### Full AST Support (Tree-sitter)
-These languages have structural parsing with symbol extraction (functions, classes, methods, etc.):
-| Language | Extensions |
-|----------|------------|
-| Python | `.py` |
-| JavaScript | `.js`, `.jsx` |
-| TypeScript | `.ts`, `.tsx` |
-| Java | `.java` |
-| Go | `.go` |
-| Rust | `.rs` |
-| C | `.c`, `.h` |
-| C++ | `.cpp`, `.hpp`, `.cc`, `.cxx` |
-| Ruby | `.rb` |
-| Kotlin | `.kt`, `.kts` |
-### Fallback Support (Whole-file Indexing)
-These file types are indexed as complete units for BM25 and semantic search:
-| Category | Extensions |
-|----------|------------|
-| C# | `.cs` |
-| Swift | `.swift` |
-| Scala | `.scala` |
-| Lua | `.lua` |
-| Shell | `.sh`, `.bash`, `.zsh` |
-| Config | `.yaml`, `.yml`, `.toml`, `.json` |
-| Web | `.html`, `.css`, `.scss` |
-| Database | `.sql` |
-| Docs | `.md`, `.txt` |
-> **Note:** Files and directories matching patterns in your `.gitignore` are automatically skipped during indexing. This excludes build artifacts, dependencies, and other generated files.
+**Full AST Support** (structural parsing with symbol extraction): Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, Ruby, Kotlin
+**Fallback Support** (whole-file indexing): C#, Swift, Scala, Lua, Shell, Config (yaml/toml/json), Web (html/css), SQL, Markdown
+> Files matching `.gitignore` patterns are automatically skipped.
 ## Architecture: Progressive Disclosure
@@ -239,12 +222,39 @@ For Windows:
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `CODE_MEMORY_LOG_LEVEL` | Logging verbosity (DEBUG, INFO, WARNING, ERROR) | INFO |
+| `EMBEDDING_MODEL` | HuggingFace model ID for embeddings | `nomic-ai/nomic-embed-text-v1.5` |
 Example:
 ```bash
 CODE_MEMORY_LOG_LEVEL=DEBUG uvx code-memory
 ```
+### Custom Embedding Model
+You can use a different embedding model by setting the `EMBEDDING_MODEL` environment variable:
+```bash
+EMBEDDING_MODEL="BAAI/bge-small-en-v1.5" uvx code-memory
+```
+For MCP hosts, add the environment variable to your configuration:
+```json
+{
+  "mcpServers": {
+    "code-memory": {
+      "command": "uvx",
+      "args": ["code-memory"],
+      "env": {
+        "EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5"
+      }
+    }
+  }
+}
+```
+> **Note:** Changing the embedding model will invalidate existing indexes. You'll need to re-run `index_codebase` after switching models.
 ## Tools
 ### `index_codebase`

code_memory-1.0.15/assets/logo.png ADDED

Binary file

{code_memory-1.0.13 → code_memory-1.0.15}/db.py RENAMED

@@ -12,6 +12,7 @@ All writes use upsert semantics so re-indexing is idempotent.
 from __future__ import annotations
 import logging
+import os
 import sqlite3
 from contextlib import contextmanager
 from typing import TYPE_CHECKING
@@ -31,8 +32,9 @@ logger = logging.getLogger(__name__)
 _model = None
 _embedding_dim = None
-# Model identifier - change this if you switch to a different embedding model
-EMBEDDING_MODEL_NAME = "jinaai/jina-code-embeddings-0.5b"
+# Model identifier - can be overridden via EMBEDDING_MODEL environment variable
+DEFAULT_EMBEDDING_MODEL = "nomic-ai/nomic-embed-text-v1.5"
+EMBEDDING_MODEL_NAME = os.environ.get("EMBEDDING_MODEL", DEFAULT_EMBEDDING_MODEL)
 def get_embedding_model():
@@ -46,7 +48,7 @@ def get_embedding_model():
         )
         # Cache the embedding dimension from the model
         _embedding_dim = _model.get_sentence_embedding_dimension()
-        logger.info(f"Loaded embedding model with dimension: {_embedding_dim}")
+        logger.info(f"Loaded embedding model '{EMBEDDING_MODEL_NAME}' with dimension: {_embedding_dim}")
     return _model
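Reviewer note: the hunk above reads `EMBEDDING_MODEL` once, at module import, so changing the variable after `db` has been imported has no effect on `EMBEDDING_MODEL_NAME`. The sketch below isolates the same override pattern in a helper so it can be exercised without touching the real environment. The function name `resolve_embedding_model` and the choice to treat a blank value as unset are assumptions for illustration; the diff itself uses a plain `os.environ.get`, which would return an empty string unchanged.

```python
import os

# Default taken from the diff above.
DEFAULT_EMBEDDING_MODEL = "nomic-ai/nomic-embed-text-v1.5"


def resolve_embedding_model(environ=None) -> str:
    """Return the model ID from EMBEDDING_MODEL, falling back to the default.

    `environ` is injectable (hypothetical parameter) so the lookup can be
    tested with a plain dict instead of the process environment.
    """
    env = os.environ if environ is None else environ
    # Assumption: a blank or whitespace-only value is treated as "not set".
    value = env.get("EMBEDDING_MODEL", "").strip()
    return value or DEFAULT_EMBEDDING_MODEL


default_model = resolve_embedding_model({})
custom_model = resolve_embedding_model({"EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5"})
blank_model = resolve_embedding_model({"EMBEDDING_MODEL": "   "})
```

Passing a dict keeps the lookup deterministic in tests; calling the helper with no argument falls back to `os.environ`.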

{code_memory-1.0.13 → code_memory-1.0.15}/doc_parser.py RENAMED

@@ -337,12 +337,15 @@ def index_doc_file(
     }
-def index_doc_directory(dirpath: str, db) -> list[dict]:
+def index_doc_directory(dirpath: str, db, progress_callback=None, progress_offset: int = 0, progress_total: int = 0) -> list[dict]:
     """Recursively index all documentation in a directory.
     Args:
         dirpath: Root directory to search.
         db: Database connection.
+        progress_callback: Optional callback(current, total, message) for progress updates.
+        progress_offset: Offset to add to current count (for combined progress with code indexing).
+        progress_total: Total files across all indexing phases.
     Returns:
         List of result dicts from index_doc_file.
@@ -350,16 +353,23 @@ def index_doc_directory(dirpath: str, db) -> list[dict]:
     abs_dir = os.path.abspath(dirpath)
     results = []
+    # First pass: count files
+    doc_files = []
     for root, dirs, files in os.walk(abs_dir):
-        # Skip unwanted directories
         dirs[:] = [d for d in dirs if d not in SKIP_DIRS and not d.startswith(".")]
         for filename in files:
             ext = os.path.splitext(filename)[1].lower()
             if ext in DOC_EXTENSIONS:
-                filepath = os.path.join(root, filename)
-                result = index_doc_file(filepath, db)
-                results.append(result)
+                doc_files.append(os.path.join(root, filename))
+    # Index files with progress reporting
+    for i, filepath in enumerate(doc_files):
+        result = index_doc_file(filepath, db)
+        results.append(result)
+        if progress_callback:
+            current = progress_offset + i + 1
+            progress_callback(current, progress_total, f"Indexing docs: {os.path.basename(filepath)}")
     return results
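Reviewer note: `progress_offset` and `progress_total` let a second phase continue the count started by the first. For the reported `current` to stay within `total`, the total has to cover both phases. In the `server.py` hunk of this diff the value passed as `progress_total` is the code file count, so doc progress may report `current > total`. The sketch below shows the offset arithmetic with a combined total; `run_phase` and `record` are hypothetical stand-ins, not functions from the package.

```python
def run_phase(items, callback, offset=0, total=0, label="Indexing"):
    """Process items and report progress as (current, total, message)."""
    for i, item in enumerate(items):
        # ... the real per-file work would happen here ...
        if callback:
            callback(offset + i + 1, total, f"{label}: {item}")
    return len(items)


events = []


def record(current, total, message):
    """Collect progress events so they can be inspected afterwards."""
    events.append((current, total, message))


code_files = ["a.py", "b.py"]
doc_files = ["README.md"]
grand_total = len(code_files) + len(doc_files)

# Phase 1 starts at zero; phase 2 continues from where phase 1 stopped.
done = run_phase(code_files, record, 0, grand_total, "Indexing code")
run_phase(doc_files, record, done, grand_total, "Indexing docs")
```

Computing `grand_total` up front requires knowing both file lists before indexing starts, which is a design choice of this sketch rather than something the diff does.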

{code_memory-1.0.13 → code_memory-1.0.15}/parser.py RENAMED

@@ -451,7 +451,7 @@ def index_file(filepath: str, db) -> dict:
 # Directory indexer
 # ---------------------------------------------------------------------------
-def index_directory(dirpath: str, db) -> list[dict]:
+def index_directory(dirpath: str, db, progress_callback=None) -> list[dict]:
     """Recursively index all source files under *dirpath*.
     Skips directories in ``_SKIP_DIRS``, files matching ``.gitignore`` patterns
@@ -461,6 +461,7 @@ def index_directory(dirpath: str, db) -> list[dict]:
     Args:
         dirpath: Root directory to scan.
         db: An open ``sqlite3.Connection`` from ``db.get_db()``.
+        progress_callback: Optional callback(current, total, message) for progress updates.
     Returns:
         A list of per-file result dicts (see :func:`index_file`).
@@ -475,6 +476,28 @@ def index_directory(dirpath: str, db) -> list[dict]:
     gitignore = GitignoreMatcher(dirpath)
     logger.debug("Initialized gitignore matcher for %s", dirpath)
+    # First pass: count total files for progress reporting
+    total_files = 0
+    file_list = []
+    for root, dirs, files in os.walk(dirpath, topdown=True):
+        rel_root = os.path.relpath(root, dirpath)
+        if rel_root != ".":
+            gitignore.check_dir_for_gitignore(root, rel_root)
+        dirs[:] = [d for d in dirs if d not in _SKIP_DIRS and not d.endswith(".egg-info")
+                   and not gitignore.should_skip(os.path.join(rel_root, d) if rel_root != "." else d, is_dir=True)]
+        for fname in sorted(files):
+            rel_path = os.path.join(rel_root, fname) if rel_root != "." else fname
+            if gitignore.should_skip(rel_path, is_dir=False):
+                continue
+            ext = os.path.splitext(fname)[1].lower()
+            if ext in _SOURCE_EXTENSIONS or _load_language(ext) is not None:
+                file_list.append(os.path.join(root, fname))
+                total_files += 1
+    # Reset gitignore for actual indexing pass
+    gitignore = GitignoreMatcher(dirpath)
+    files_processed = 0
     for root, dirs, files in os.walk(dirpath, topdown=True):
         rel_root = os.path.relpath(root, dirpath)
@@ -519,6 +542,11 @@ def index_directory(dirpath: str, db) -> list[dict]:
                     "error": True,
                 })
+            # Report progress
+            files_processed += 1
+            if progress_callback:
+                progress_callback(files_processed, total_files, f"Indexing code: {fname}")
     # Log performance summary
     total_elapsed = time.perf_counter() - total_start
     total_symbols = sum(r.get("symbols_indexed", 0) for r in results)
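Reviewer note: the hunk above walks the tree twice (once to count, once to index) and rebuilds the `GitignoreMatcher` in between. One alternative, similar to what the `doc_parser.py` hunk does, is to walk once, collect the matching paths, and then iterate the list. The sketch below shows only that collection step; `collect_files` is a hypothetical helper, and it omits the `.gitignore` handling that the real code performs.

```python
import os
import tempfile


def collect_files(root_dir, extensions, skip_dirs=frozenset()):
    """Walk root_dir once and return paths whose extension is in `extensions`."""
    matches = []
    for root, dirs, files in os.walk(root_dir, topdown=True):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
        for fname in sorted(files):
            if os.path.splitext(fname)[1].lower() in extensions:
                matches.append(os.path.join(root, fname))
    return matches


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "node_modules"))
    for rel in ("a.py", "notes.txt", os.path.join("node_modules", "c.py")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("")
    found = [os.path.relpath(p, tmp) for p in collect_files(tmp, {".py"}, {"node_modules"})]
```

With the list in hand, `len(matches)` gives the total for progress reporting and the indexing loop no longer needs a second `os.walk`.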

{code_memory-1.0.13 → code_memory-1.0.15}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "code-memory"
-version = "1.0.13"
+version = "1.0.15"
 description = "A deterministic, high-precision code intelligence MCP server"
 readme = "README.md"
 license = "MIT"

{code_memory-1.0.13 → code_memory-1.0.15}/server.py RENAMED

@@ -12,9 +12,10 @@ architecture:
 from __future__ import annotations
+import asyncio
 from typing import Literal
-from mcp.server.fastmcp import FastMCP
+from mcp.server.fastmcp import Context, FastMCP
 import db as db_mod
 import doc_parser as doc_parser_mod
@@ -275,7 +276,7 @@ def search_code(
 # ── Tool 2: index_codebase ────────────────────────────────────────────────
 @mcp.tool()
-def index_codebase(directory: str) -> dict:
+async def index_codebase(directory: str, ctx: Context) -> dict:
     """YOU MUST CALL THIS TOOL FIRST before using search_code or search_docs. Use this tool to build the searchable index that powers all other code intelligence features.
     TRIGGER: Call this tool immediately when:
@@ -312,11 +313,36 @@ def index_codebase(directory: str) -> dict:
             database = db_mod.get_db(str(directory_path))
-            # Index code files
+            # Report initial progress
+            await ctx.report_progress(0, 100, "Starting indexing...")
+            # Create progress callback that schedules progress updates on the event loop
+            loop = asyncio.get_running_loop()
+            progress_state = {"current": 0, "total": 0, "phase": "code"}
+            def sync_progress_callback(current: int, total: int, message: str):
+                """Sync callback that schedules async progress reporting."""
+                progress_state["current"] = current
+                progress_state["total"] = total
+                # Schedule the async progress report on the event loop
+                asyncio.run_coroutine_threadsafe(
+                    ctx.report_progress(current, total, message),
+                    loop
+                )
+            # Index code files in a thread to allow progress reporting
             code_logger = logging_config.IndexingLogger("code")
             code_logger.start(str(directory_path))
-            code_results = parser_mod.index_directory(str(directory_path), database)
+            await ctx.report_progress(0, 100, "Scanning code files...")
+            code_results = await asyncio.to_thread(
+                parser_mod.index_directory,
+                str(directory_path),
+                database,
+                sync_progress_callback
+            )
             for r in code_results:
                 if r.get("skipped"):
                     code_logger.file_skipped(r.get("file", "unknown"), r.get("reason", "unknown"))
@@ -331,7 +357,21 @@ def index_codebase(directory: str) -> dict:
             doc_logger = logging_config.IndexingLogger("documentation")
             doc_logger.start(str(directory_path))
-            doc_results = doc_parser_mod.index_doc_directory(str(directory_path), database)
+            # Calculate progress offset for doc indexing
+            code_file_count = len(code_results)
+            doc_progress_offset = code_file_count
+            await ctx.report_progress(code_file_count, code_file_count, "Scanning documentation files...")
+            doc_results = await asyncio.to_thread(
+                doc_parser_mod.index_doc_directory,
+                str(directory_path),
+                database,
+                sync_progress_callback,
+                doc_progress_offset,
+                code_file_count  # Will be updated by callback
+            )
             for r in doc_results:
                 if r.get("skipped"):
                     doc_logger.file_skipped(r.get("file", "unknown"), r.get("reason", "unknown"))
@@ -343,12 +383,18 @@ def index_codebase(directory: str) -> dict:
             doc_skipped = [r for r in doc_results if r.get("skipped")]
             # Extract docstrings from indexed code
-            docstring_results = doc_parser_mod.extract_docstrings_from_code(database)
+            await ctx.report_progress(0, 0, "Extracting docstrings...")
+            docstring_results = await asyncio.to_thread(
+                doc_parser_mod.extract_docstrings_from_code,
+                database
+            )
             total_symbols = sum(r.get("symbols_indexed", 0) for r in indexed)
             total_chunks = sum(r.get("chunks_indexed", 0) for r in doc_indexed)
             log.set_result_count(total_symbols + total_chunks + len(docstring_results))
+            await ctx.report_progress(100, 100, "Indexing complete!")
             return {
                 "status": "ok",
                 "directory": str(directory_path),
@@ -563,6 +609,7 @@ def search_history(
 def main():
     """Entry point for the MCP server when installed as a package."""
     # Warm up embedding model to avoid cold-start latency
+    logger.info(f"Using embedding model: {db_mod.EMBEDDING_MODEL_NAME}")
     logger.info("Warming up embedding model...")
     db_mod.warmup_embedding_model()
     logger.info("Embedding model ready")
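Reviewer note: the `index_codebase` hunk runs blocking indexing in a worker thread via `asyncio.to_thread` (Python 3.9+) and has the thread schedule `ctx.report_progress` back onto the event loop with `asyncio.run_coroutine_threadsafe`. The diff discards the futures that call returns, so an exception raised inside a progress report would go unnoticed. The sketch below shows the same bridge in isolation and, as an assumption of this example rather than something the diff does, keeps the futures and awaits them before returning. `report_progress`, `blocking_work`, and `sink` are hypothetical stand-ins.

```python
import asyncio


async def report_progress(current, total, message, sink):
    """Stand-in for an async progress reporter such as ctx.report_progress."""
    sink.append((current, total, message))


def blocking_work(n, callback):
    """Synchronous work that reports progress through a plain callback."""
    for i in range(n):
        callback(i + 1, n, f"step {i + 1}")
    return n


async def main():
    loop = asyncio.get_running_loop()
    sink = []
    pending = []

    def sync_callback(current, total, message):
        # Called from the worker thread: hand the coroutine to the loop thread.
        future = asyncio.run_coroutine_threadsafe(
            report_progress(current, total, message, sink), loop
        )
        pending.append(future)

    result = await asyncio.to_thread(blocking_work, 3, sync_callback)
    # Wait for every scheduled report so failures surface here.
    await asyncio.gather(*(asyncio.wrap_future(f) for f in pending))
    return result, sink


result, sink = asyncio.run(main())
```

Because the event loop stays free while the worker thread runs, the scheduled reports can execute while indexing is still in progress.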

{code_memory-1.0.13 → code_memory-1.0.15}/tests/test_tools.py RENAMED

@@ -2,6 +2,15 @@
 from __future__ import annotations
+from unittest.mock import AsyncMock
+class MockContext:
+    """Mock MCP Context for testing."""
+    def __init__(self):
+        self.report_progress = AsyncMock()
 class TestSearchCodeValidation:
     """Tests for search_code tool input validation."""
@@ -84,8 +93,16 @@ class TestIndexCodebaseValidation:
     def test_nonexistent_directory_returns_error(self):
         """Test that nonexistent directory returns structured error."""
+        import asyncio
         import server
-        result = server.index_codebase("/nonexistent/directory")
+        ctx = MockContext()
+        async def run_test():
+            result = await server.index_codebase("/nonexistent/directory", ctx)
+            return result
+        result = asyncio.run(run_test())
         assert result.get("error") is True
         assert "ValidationError" in result.get("error_type", "")
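Reviewer note: because `MockContext.report_progress` is an `AsyncMock`, a test can also assert on how progress was reported, not only on the returned dict. The sketch below demonstrates this against a hypothetical coroutine, `fake_tool`, which stands in for an async tool and is not part of the package.

```python
import asyncio
from unittest.mock import AsyncMock


class MockContext:
    """Mock MCP Context exposing an awaitable report_progress."""

    def __init__(self):
        self.report_progress = AsyncMock()


async def fake_tool(directory, ctx):
    """Hypothetical async tool that reports progress twice."""
    await ctx.report_progress(0, 100, "Starting...")
    await ctx.report_progress(100, 100, "Done")
    return {"status": "ok", "directory": directory}


ctx = MockContext()
result = asyncio.run(fake_tool("/tmp/project", ctx))

# AsyncMock records each await, so call counts and arguments can be checked.
progress_calls = ctx.report_progress.await_count
```

`assert_awaited_with` checks the arguments of the most recent await, which is a convenient way to verify the final "complete" report.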

{code_memory-1.0.13 → code_memory-1.0.15}/uv.lock RENAMED

@@ -109,7 +109,7 @@ wheels = [
 [[package]]
 name = "code-memory"
-version = "1.0.11"
+version = "1.0.13"
 source = { editable = "." }
 dependencies = [
     { name = "gitpython" },

{code_memory-1.0.13 → code_memory-1.0.15}/.github/workflows/ci.yml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/.github/workflows/publish.yml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/.github/workflows/release-binaries.yml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/.gitignore RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/.python-version RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/CHANGELOG.md RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/CONTRIBUTING.md RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/LICENSE RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/Makefile RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/code-memory.spec RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/errors.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/git_search.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/hooks/hook-sentence_transformers.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/hooks/hook-sqlite_vec.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/hooks/hook-tree_sitter.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/hooks/hook-tree_sitter_languages.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/logging_config.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/prompts/milestone_1.xml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/prompts/milestone_2.xml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/prompts/milestone_3.xml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/prompts/milestone_4.xml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/prompts/milestone_5.xml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/prompts/milestone_6.xml RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/queries.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/tests/__init__.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/tests/conftest.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/tests/test_errors.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/tests/test_logging.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/tests/test_validation.py RENAMED

File without changes

{code_memory-1.0.13 → code_memory-1.0.15}/validation.py RENAMED

File without changes
