PyPI - codebase-mcp - Versions diffs - 0.1.0__tar.gz - Mend

codebase-mcp 0.1.0__tar.gz


Files changed (32)

codebase_mcp-0.1.0/.gitignore +38 -0
codebase_mcp-0.1.0/PKG-INFO +424 -0
codebase_mcp-0.1.0/README.md +385 -0
codebase_mcp-0.1.0/mcp-config-example.json +10 -0
codebase_mcp-0.1.0/pyproject.toml +67 -0
codebase_mcp-0.1.0/src/codebase_mcp/__init__.py +3 -0
codebase_mcp-0.1.0/src/codebase_mcp/__main__.py +524 -0
codebase_mcp-0.1.0/src/codebase_mcp/config.py +211 -0
codebase_mcp-0.1.0/src/codebase_mcp/db.py +541 -0
codebase_mcp-0.1.0/src/codebase_mcp/exporter.py +243 -0
codebase_mcp-0.1.0/src/codebase_mcp/handoff.py +317 -0
codebase_mcp-0.1.0/src/codebase_mcp/indexer.py +415 -0
codebase_mcp-0.1.0/src/codebase_mcp/models.py +46 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/__init__.py +15 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/base.py +157 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/config_parsers.py +462 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/generic.py +95 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/go.py +222 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/python.py +231 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/rust.py +205 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/typescript.py +303 -0
codebase_mcp-0.1.0/src/codebase_mcp/parsers/universal.py +625 -0
codebase_mcp-0.1.0/src/codebase_mcp/server.py +1291 -0
codebase_mcp-0.1.0/src/codebase_mcp/watcher.py +169 -0
codebase_mcp-0.1.0/src/codebase_mcp/webui.py +611 -0
codebase_mcp-0.1.0/tests/conftest.py +46 -0
codebase_mcp-0.1.0/tests/test_call_graph.py +313 -0
codebase_mcp-0.1.0/tests/test_db.py +386 -0
codebase_mcp-0.1.0/tests/test_exclude.py +124 -0
codebase_mcp-0.1.0/tests/test_indexer.py +266 -0
codebase_mcp-0.1.0/tests/test_parsers.py +346 -0
codebase_mcp-0.1.0/tests/test_scan.py +171 -0

codebase_mcp-0.1.0/.gitignore ADDED

@@ -0,0 +1,38 @@
+# Python
+__pycache__/
+*.py[cod]
+*.pyo
+*.pyd
+.Python
+*.egg-info/
+dist/
+build/
+.eggs/
+*.egg
+*.whl
+.venv/
+venv/
+env/
+pip-wheel-metadata/
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+.tox/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# Project index (each user builds their own)
+.codebase-mcp/
+# Exports / handoffs
+exports/
+# OS
+.DS_Store
+Thumbs.db

codebase_mcp-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,424 @@
+Metadata-Version: 2.4
+Name: codebase-mcp
+Version: 0.1.0
+Summary: Persistent, portable codebase intelligence MCP server with incremental indexing and decision memory
+License: MIT
+Keywords: agent,ai,codebase,indexer,mcp,tree-sitter
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.11
+Requires-Dist: click>=8.0
+Requires-Dist: mcp[cli]>=1.0.0
+Requires-Dist: rich>=13.0
+Requires-Dist: tree-sitter-language-pack>=0.1.0
+Requires-Dist: tree-sitter>=0.23.0
+Provides-Extra: all
+Requires-Dist: pyyaml>=6.0; extra == 'all'
+Requires-Dist: watchdog>=4.0; extra == 'all'
+Provides-Extra: dev
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest>=7.0; extra == 'dev'
+Provides-Extra: langs
+Requires-Dist: tree-sitter-go; extra == 'langs'
+Requires-Dist: tree-sitter-javascript; extra == 'langs'
+Requires-Dist: tree-sitter-python; extra == 'langs'
+Requires-Dist: tree-sitter-rust; extra == 'langs'
+Requires-Dist: tree-sitter-typescript; extra == 'langs'
+Provides-Extra: watch
+Requires-Dist: watchdog>=4.0; extra == 'watch'
+Provides-Extra: yaml
+Requires-Dist: pyyaml>=6.0; extra == 'yaml'
+Description-Content-Type: text/markdown
+# codebase-mcp
+A local MCP server that gives any AI agent or IDE a persistent, structured understanding of your codebase. It indexes your project once, stores everything in a local SQLite database, and answers structural questions instantly without the agent having to read any files.
+---
+## The problem it solves
+Every time you start a new session in Claude Code, Cursor, Cline, or any other AI tool, the agent has to re-read your files to understand the codebase. On large projects this burns thousands of tokens just on orientation, and the agent still only sees a shallow surface. It cannot answer questions like "what calls this function", "what changed since yesterday", or "what decisions were made and why" without reading everything again.
+codebase-mcp solves this by:
+- Parsing your code once with tree-sitter (Python, TypeScript, JavaScript, Go, Rust, and 50+ more languages)
+- Storing every function, class, method, import, and call site in a local database
+- Keeping that database up to date incrementally (only changed files are re-parsed)
+- Exposing a set of MCP tools so any agent can query the structure without reading files
+- Persisting architectural decisions, notes, and session history across every agent and IDE you use
+---
+## How it is different from standard approaches
+| | Standard approach | codebase-mcp |
+|---|---|---|
+| Codebase understanding | Agent reads files in context window | Pre-indexed, queried via tool calls |
+| Cost per session | Hundreds to thousands of tokens on orientation | Near zero — index is already built |
+| Call graph | Not available | Full caller/callee resolution with 3 strategies |
+| Decisions and notes | Lost when context resets | Stored in database, searchable forever |
+| Switching agents/IDEs | Start over from scratch | Export once, import anywhere |
+| Multi-language | Depends on the agent | 50+ languages via tree-sitter |
+| Incremental updates | Full re-read every time | SHA256-based; only changed files are re-parsed |
+The core idea is that the agent should never read source files to understand structure. It should call tools and get structured answers back. Reading files is for when you actually need to see the code, not for orientation.
+---
+## Installation
+Requires Python 3.11 or later.
+```bash
+pip install git+https://github.com/vatsal2025/CodeBase.git
+```
+Or clone and install in editable mode for development:
+```bash
+git clone https://github.com/vatsal2025/CodeBase.git
+cd CodeBase
+pip install -e .
+```
+---
+## Registering with your IDE or agent
+Run this once after installation. It writes the MCP server configuration into the config files for every supported tool automatically.
+```bash
+codebase-mcp setup
+```
+To target a specific tool:
+```bash
+codebase-mcp setup --ide claude-code
+codebase-mcp setup --ide cursor
+codebase-mcp setup --ide windsurf
+codebase-mcp setup --ide cline
+codebase-mcp setup --ide zed
+```
+For Claude Code global registration (available across all projects):
+```bash
+codebase-mcp setup --ide claude-code --global
+```
+After setup, restart your IDE or agent. The MCP server named `codebase-intel` will appear in the tool list.
+---
+## Indexing your project
+Before the agent can use the tools, you need to build the index. You can do this from the terminal or let the agent do it on first run.
+From the terminal:
+```bash
+cd /path/to/your/project
+codebase-mcp index .
+```
+Force a full re-index (ignores hash cache):
+```bash
+codebase-mcp index . --full
+```
+The index is stored at `.codebase-mcp/index.db` inside your project directory. On typical projects it builds in under 30 seconds.
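The hash-based change detection mentioned above can be illustrated with a minimal sketch. It is not the package's indexer: `file_sha256`, `find_changed_files`, and the `known_hashes` mapping are hypothetical names, and the sketch only assumes that a SHA-256 digest per file is compared against the digest recorded at the previous run.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(paths: list[Path], known_hashes: dict[str, str]) -> list[Path]:
    """Return the files whose current digest differs from the stored one.

    `known_hashes` maps a path string to the digest recorded at the last
    index run (a hypothetical stand-in for the hash cache). Files with no
    stored digest count as changed.
    """
    changed = []
    for path in paths:
        if known_hashes.get(str(path)) != file_sha256(path):
            changed.append(path)
    return changed
```

Under this scheme a `--full` run would correspond to passing an empty `known_hashes`, so every file is treated as changed.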
+---
+## Tools reference
+These are the tools the agent has access to. A well-configured agent should call these instead of reading files.
+### Session start
+**session_bootstrap(project_root)**
+Call this at the start of every session. Returns project stats, most-referenced files, active decisions, recent notes, and whether the index needs updating. One call gives the agent a complete orientation with minimal tokens.
+**what_changed(project_root)**
+Returns the files that changed since the last index run, with a diff of added and removed symbols. Use this when returning to a project after a gap.
+**index_project(project_root, full_reindex)**
+Builds or updates the index. Only changed files are re-parsed. Call this when session_bootstrap reports stale files.
+**get_index_status(project_root)**
+Check staleness without triggering a re-index.
+### Structural queries
+**search_symbols(query, kind, language, limit)**
+Full-text search across all symbols. Finds functions, classes, methods, structs, interfaces, and traits by name or docstring content. Works across all languages.
+```
+search_symbols("authenticate")
+search_symbols("User", kind="class")
+search_symbols("validate", language="typescript")
+```
+**get_symbol(qualified_name)**
+Get complete details for one symbol: its signature, docstring, file location, and its full list of callers and callees.
+```
+get_symbol("src.auth.jwt.verify_token")
+```
+**get_file_outline(path)**
+Get the complete structure of a file — all symbols organized hierarchically (methods grouped under their class), with signatures and line ranges. Does not require reading the file.
+**get_file_context(path)**
+Everything about a file in one call: its outline, who imports it, decisions linked to it, and notes attached to it.
+**get_call_graph(qualified_name, depth)**
+Trace callers and callees recursively. Shows exactly what calls a function and what that function calls, across files and languages.
+**find_references(name)**
+Find every place a symbol is used across the codebase.
+**search_code(pattern, file_pattern, language)**
+Grep-style search across source files. Returns matching lines with context. Use this when you need to see actual code, not just structure.
+**find_todos()**
+Returns all TODO, FIXME, HACK, BUG, and NOTE comments in the entire codebase in a single call.
+**query_symbols_sql(sql)**
+Run a raw SQL query against the symbol database for advanced filtering. Use this for anything the other search tools cannot express.
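As a rough illustration of the kind of query `query_symbols_sql` is meant for, the sketch below runs an aggregate query with Python's `sqlite3` module. The `symbols` table and its columns are assumptions made up for this example. The README does not document the schema of `.codebase-mcp/index.db`, so inspect the real database before writing queries against it.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables and columns
# in .codebase-mcp/index.db may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (qualified_name TEXT, kind TEXT, language TEXT, file_path TEXT)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        ("src.auth.jwt.verify_token", "function", "python", "src/auth/jwt.py"),
        ("src.auth.models.User", "class", "python", "src/auth/models.py"),
        ("web.api.validate", "function", "typescript", "web/api.ts"),
    ],
)


def count_functions_by_language(connection: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count function symbols per language, sorted by language name."""
    return connection.execute(
        "SELECT language, COUNT(*) FROM symbols "
        "WHERE kind = 'function' GROUP BY language ORDER BY language"
    ).fetchall()
```

A grouped count like this is hard to express through `search_symbols` alone, which is the case the raw SQL tool appears intended to cover.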
+### Knowledge persistence
+**add_decision(title, body, category, session_id)**
+Record an architectural decision. Categories: architecture, security, performance, api, database, general. These persist across every session, agent, and IDE.
+```
+add_decision(
+  title="Use JWT for stateless auth",
+  body="Chosen over sessions to support horizontal scaling. HS256 with 1h expiry.",
+  category="security"
+)
+```
+**search_decisions(query, category, status)**
+Search recorded decisions by keyword, category, or status (active/superseded/deprecated). Always check this at session start to recover context from previous sessions.
+**update_decision(decision_id, status, body)**
+Mark a decision as superseded or deprecated when the approach changes.
+**add_note(body, scope, scope_ref)**
+Attach a note to the whole project, a specific file, or a specific symbol. Notes persist and are returned by get_file_context.
+```
+add_note("Token refresh logic is intentionally synchronous — see issue #42", scope="file", scope_ref="src/auth/jwt.py")
+```
+**get_notes(scope, scope_ref)**
+Retrieve notes for the project, a file, or a symbol.
+### Knowledge transfer
+**export_context(project_root, output)**
+Export decisions, notes, and optionally the full symbol index to a JSON file. Use this before switching agents or onboarding a new team member.
+**import_context(import_file, project_root)**
+Import an exported context file. Merges decisions and notes into the local database.
+**create_handoff(project_root, output)**
+Create a complete handoff package: context export plus a human-readable summary of the project state, top files, and active decisions. Use this when switching from one agent to another.
+**index_github_repo(url)**
+Clone a GitHub repository, index it, and return a bootstrap summary. Use this to explore any open source project without manually cloning.
+---
+## How to use it to full potential
+### At the start of every session
+The agent should always call `session_bootstrap` first, before reading any files. If the index is stale, it should call `index_project` immediately afterwards, then call `search_decisions` to recover context from previous sessions.
+A good agent prompt to enforce this:
+```
+Before doing anything else in this project:
+1. Call session_bootstrap to orient yourself
+2. If index_stale is true, call index_project
+3. Call search_decisions to review past decisions
+4. Never read a source file to understand structure — use search_symbols, get_file_outline, or get_call_graph instead
+```
+### Recording decisions as you work
+Every significant decision made during a session should be recorded immediately with `add_decision`. This is the most important habit. When you or a future agent returns to the project, `search_decisions` recovers this context in one call instead of re-deriving it by reading code.
+What is worth recording:
+- Why a library or framework was chosen
+- Why a design pattern was picked over alternatives
+- Security constraints or compliance requirements
+- Non-obvious performance decisions
+- Anything that would take more than 5 minutes to figure out from reading the code
+### Using the call graph
+Before modifying a function, call `get_symbol` with its qualified name to see its callers. This tells you the blast radius of any change without reading files. `get_call_graph` with depth > 1 traces multi-level dependencies.
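The idea of a depth-limited blast radius can be sketched as a breadth-first walk over caller edges. This is an illustrative model, not the server's implementation: the `callers_of` mapping and the function name `callers_within_depth` are hypothetical, and the example graph is invented.

```python
from collections import deque


def callers_within_depth(
    callers_of: dict[str, list[str]], target: str, depth: int
) -> dict[str, int]:
    """Breadth-first walk over caller edges, returning {caller: distance}.

    `callers_of` maps a symbol name to the symbols that call it directly.
    """
    distances: dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        name, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for caller in callers_of.get(name, []):
            if caller != target and caller not in distances:
                distances[caller] = dist + 1
                queue.append((caller, dist + 1))
    return distances


# Invented example graph: verify_token is called by login and refresh,
# and login is called by handle_request.
example_graph = {
    "verify_token": ["login", "refresh"],
    "login": ["handle_request"],
}
```

With `depth=1` only the direct callers are returned. Raising the depth adds the callers of those callers, which is the multi-level view described above.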
+### Watching for changes
+If you run the watcher, the index stays current automatically:
+```bash
+codebase-mcp serve --watch
+```
+From within a session, you can also start it via the tool:
+```
+start_file_watcher(project_root="...")
+```
+---
+## Transferring knowledge when switching platforms
+The index and all knowledge (decisions, notes, session history) live in `.codebase-mcp/index.db` inside your project directory. There are three ways to transfer this to another platform or agent.
+### Option 1: Commit the database to git
+Add the `.codebase-mcp/` directory to git instead of ignoring it. Anyone who clones the repository gets the full index, all decisions, and all notes immediately. No re-indexing required.
+Remove the exclusion from your `.gitignore`:
+```
+# Remove or comment out this line:
+# .codebase-mcp/
+```
+Then commit:
+```bash
+git add .codebase-mcp/index.db
+git commit -m "Add codebase index and decision log"
+```
+This is the recommended approach for teams. New developers get the full context on clone.
+### Option 2: Export and import
+Export from the source machine:
+```bash
+codebase-mcp export . --output context.json
+```
+Import on the destination:
+```bash
+codebase-mcp import context.json /path/to/project
+```
+The export includes decisions, notes, and optionally the full symbol index. You can share it as a file attachment, a gist, or through any file transfer mechanism.
+The agent can also do this directly:
+```
+export_context(project_root="/path/to/project", output="context.json")
+import_context(import_file="context.json", project_root="/path/to/project")
+```
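The README says an import merges decisions into the local database but does not say how duplicates are handled. The sketch below shows one plausible merge policy, under the assumption that decisions are dictionaries with a `title` key and that an identical title means the same decision. Both the data shape and the function `merge_decisions` are hypothetical.

```python
def merge_decisions(local: list[dict], imported: list[dict]) -> list[dict]:
    """Append imported decisions whose title is not already present locally.

    Local entries are kept unchanged and keep their original order.
    """
    seen = {decision["title"] for decision in local}
    merged = list(local)
    for decision in imported:
        if decision["title"] not in seen:
            merged.append(decision)
            seen.add(decision["title"])
    return merged
```

Keeping the local copy on a title clash is only one choice. A real merge might instead compare timestamps or statuses, so check the behaviour of `import_context` before relying on it.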
+### Option 3: Create a handoff package
+When switching from one agent or IDE to another mid-session:
+```bash
+codebase-mcp handoff . --output handoff/
+```
+Or via tool:
+```
+create_handoff(project_root="...")
+```
+The handoff includes the export JSON plus a written summary of current project state, active decisions, and recent changes. Give this to the new agent at session start.
+### What transfers and what does not
+| Data | Transfers | Notes |
+|---|---|---|
+| Decisions | Yes | All statuses |
+| Notes | Yes | All scopes |
+| Symbol index | Optional | Rebuilt automatically by index_project |
+| Call graph | Rebuilt from index | Run index_project after import |
+| Session history | No | Sessions are local only |
+---
+## Configuration
+The config file lives at `.codebase-mcp/config.json` inside your project. It is created automatically on first index with sensible defaults.
+Key settings:
+```json
+{
+  "project_root": "/path/to/project",
+  "exclude_patterns": [
+    "**/.git/**",
+    "**/node_modules/**",
+    "**/__pycache__/**",
+    "**/dist/**",
+    "**/build/**",
+    "**/*.min.js",
+    "**/*.map"
+  ],
+  "max_file_size_kb": 500,
+  "include_extensions": []
+}
+```
+`exclude_patterns` accepts standard glob patterns. `include_extensions` restricts indexing to specific file types if set.
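To make the effect of these settings concrete, here is a minimal sketch of how a file could be filtered against them. It approximates glob matching with the standard `fnmatch` module and is not the package's matcher. The helper names are hypothetical, and it assumes `include_extensions` holds suffixes with a leading dot such as `".py"`, which the README does not specify.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(relative_path: str, exclude_patterns: list[str]) -> bool:
    """Return True if the project-relative POSIX path matches any pattern."""
    # Prefix with "/" so "**/name/**" also matches a top-level directory.
    candidate = "/" + relative_path.lstrip("/")
    return any(fnmatchcase(candidate, pattern) for pattern in exclude_patterns)


def should_index(relative_path: str, size_bytes: int, config: dict) -> bool:
    """Apply exclude_patterns, max_file_size_kb, and include_extensions."""
    if is_excluded(relative_path, config["exclude_patterns"]):
        return False
    if size_bytes > config["max_file_size_kb"] * 1024:
        return False
    extensions = config["include_extensions"]
    # An empty list means "no restriction" in this sketch.
    return not extensions or PurePosixPath(relative_path).suffix in extensions


example_config = {
    "exclude_patterns": ["**/.git/**", "**/node_modules/**", "**/*.min.js"],
    "max_file_size_kb": 500,
    "include_extensions": [],
}
```

Note that `fnmatch` lets `*` match across `/`, so this sketch is looser than a strict glob implementation and should be read as an approximation.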
+---
+## Supported languages
+Full tree-sitter parsing (functions, classes, methods, imports, call graph):
+- Python
+- TypeScript and JavaScript (including JSX/TSX)
+- Go
+- Rust
+Universal parser (symbols and structure, no call graph):
+- Java, Kotlin, Swift, C, C++, C#, Ruby, PHP, Scala, Dart, Lua, Bash, SQL, HTML, CSS, YAML, TOML, JSON, Dockerfile, Makefile, and 30+ more via tree-sitter-language-pack
+---
+## Development
+```bash
+git clone https://github.com/vatsal2025/CodeBase.git
+cd CodeBase
+pip install -e ".[dev]"
+pytest tests/
+```
+All 147 tests must pass before submitting changes.
+---
+## License
+MIT