
claude-context-compiler 0.1.0__tar.gz


Files changed (46)

claude_context_compiler-0.1.0/.claude/settings.local.json ADDED

@@ -0,0 +1,25 @@
+{
+  "permissions": {
+    "allow": [
+      "WebFetch(domain:claude.ai)",
+      "WebSearch",
+      "WebFetch(domain:github.com)",
+      "WebFetch(domain:aider.chat)",
+      "WebFetch(domain:arxiv.org)",
+      "WebFetch(domain:openspec.dev)",
+      "Bash(uv run:*)",
+      "Bash(python3:*)",
+      "Bash(tail:*)",
+      "Bash(/Users/punakkals/work/context-compiler/.venv/bin/pip install:*)",
+      "Bash(.venv/bin/pip install:*)",
+      "Bash(.venv/bin/pytest:*)",
+      "Bash(.venv/bin/python:*)",
+      "Bash(echo EXIT: $?:*)",
+      "Bash(.venv/bin/context-compiler explain:*)",
+      "Bash(ls:*)",
+      "Bash(head:*)",
+      "Bash(/Users/punakkals/work/context-compiler/.venv/bin/python:*)",
+      "Bash(claude mcp:*)"
+    ]
+  }
+}

claude_context_compiler-0.1.0/CLAUDE.md ADDED

@@ -0,0 +1,140 @@
+# context-compiler
+A local-first MCP server that indexes Python and TypeScript repositories into a dependency graph and returns the smallest correct context bundle for a given coding task — with a one-line rationale for every included file.
+## What this project does
+- Parses `.py`, `.ts`, `.tsx` files using tree-sitter into a KuzuDB graph
+- Classifies a task string (`"fix the retry logic"`) into BUG_FIX / NEW_FEATURE / REFACTOR using keyword scoring — no LLM calls
+- Finds entry nodes using BM25 over tokenised symbol names + file paths (fuzzy + optional semantic fallback)
+- Traverses the graph with a strategy tuned per task type
+- Scores candidates and compiles a token-budget-aware bundle
+- Returns the bundle with a rationale string per file and an excluded list with reasons
+- Exposes everything as an MCP server over stdio (no HTTP, no port)
+## Spec-driven development
+All component specs live in `openspec/specs/`. Read the relevant spec before implementing any component.
+| Component | Spec | Source requirements |
+|---|---|---|
+| Indexer | `openspec/specs/indexer/spec.md` | FR-01, UC-01 |
+| Classifier | `openspec/specs/classifier/spec.md` | FR-02, UC-08 |
+| Entry node matching | `openspec/specs/entry-node-matching/spec.md` | FR-03.1 (hard problem) |
+| Traversal | `openspec/specs/traversal/spec.md` | FR-03, UC-02/03/04 |
+| Scorer | `openspec/specs/scorer/spec.md` | FR-04, UC-07 |
+| Rationale | `openspec/specs/rationale/spec.md` | FR-05, UC-06 |
+| MCP server | `openspec/specs/mcp-server/spec.md` | FR-06, UC-09/10 |
+Full requirements: `SRS-context-compiler.md`
+Full use cases: `SUC-context-compiler.md`
+## Architecture
+```
+src/context_compiler/
+├── indexer/
+│   ├── parser.py          # tree-sitter extraction (Python + TypeScript)
+│   └── graph.py           # KuzuDB schema + read/write
+├── retrieval/
+│   ├── classifier.py      # keyword scorer → TaskType + confidence
+│   ├── entry_nodes.py     # BM25 + fuzzy + optional fastembed
+│   ├── traversal.py       # BFS per task type
+│   ├── scorer.py          # composite scoring + budget enforcement
+│   └── rationale.py       # template-based rationale generation
+├── server/
+│   └── mcp_server.py      # FastMCP stdio server (get_context, refresh)
+├── cli.py                 # index + explain + serve CLI entry points
+└── models.py              # Pydantic v2 models (ContextBundle, RefreshResult, Node, Edge)
+```
+## Key design decisions
+**No LLM calls in the critical path.** Classification, traversal, scoring, and rationale are all deterministic. The optional fastembed model is a static local ONNX model — not an API call.
+**stdio transport, not HTTP.** The MCP server communicates via stdin/stdout. No port is opened. Claude Code spawns it as a subprocess.
+**Multi-entry BFS.** Entry node matching returns top-K candidates (default K=5), not top-1. BFS runs from all K nodes and results are merged. Nodes reachable from multiple entry points score higher.
+**Fastembed is optional.** Install with `pip install context-compiler[semantic]` to enable embedding-based semantic fallback for entry node matching. Base install uses BM25 + fuzzy only.
+**Hard token ceiling.** The bundle never exceeds the token budget (`CC_TOKEN_BUDGET` env var or `budget` parameter). Partial file inclusion is not permitted.
+**Deterministic output.** Same repo state + same task = byte-identical bundle. Score ties broken by file path (lexicographic ascending).
+## Non-negotiable constraints
+- Source code, graph data, and task strings MUST NOT leave the machine
+- No LLM API calls in the classification, traversal, scoring, or rationale steps
+- The MCP server MUST NOT crash on bad input, missing graph, or parse errors — always return a structured response
+- The bundle MUST NEVER exceed the token budget
+## Tech stack
+| Concern | Library | Why |
+|---|---|---|
+| AST parsing | `tree-sitter`, `tree-sitter-python`, `tree-sitter-typescript` | Production-grade, no language server needed |
+| Graph storage | `kuzu` | Embedded, no server process, Python bindings |
+| BM25 retrieval | `rank-bm25` | Pure Python, zero native deps |
+| Fuzzy matching | `rapidfuzz` | C extension, fast, widely available |
+| Semantic fallback | `fastembed` (optional) | ONNX runtime, no PyTorch, 23MB model |
+| MCP server | `fastmcp` | Anthropic-maintained, stdio support |
+| CLI | `click` | Standard, well-tested |
+| Models | `pydantic` v2 | Native FastMCP support, JSON Schema export |
+| Packaging | `uv` / `pyproject.toml` | Supports `uvx context-compiler` invocation |
+## CLI commands
+```bash
+# Index a repository
+uvx context-compiler index --repo ./my-project
+# Inspect what bundle a task would produce
+uvx context-compiler explain --repo ./my-project --task "fix the retry logic"
+# Start the MCP server (Claude Code spawns this automatically)
+uvx context-compiler serve --repo ./my-project
+```
+## Environment variables
+| Variable | Default | Description |
+|---|---|---|
+| `CC_TOKEN_BUDGET` | `8000` | Default token budget for get_context |
+| `CC_REPO_PATH` | required | Repository path (overridden by --repo flag) |
+## Performance targets
+| Operation | Target |
+|---|---|
+| `get_context` response | ≤ 500ms p95 |
+| Initial index (10k files) | ≤ 120 seconds |
+| Entry node matching (BM25) | ≤ 50ms |
+| Server startup | ≤ 3 seconds |
+## Graph data model
+```
+Node
+  id:            string        # "{file_path}::{symbol_name}"
+  file_path:     string
+  symbol_name:   string | null
+  symbol_type:   FILE | FUNCTION | CLASS | METHOD
+  line_start:    int
+  line_end:      int
+  token_count:   int           # ceil(char_count / 4)
+  last_modified: timestamp
+  language:      PYTHON | TYPESCRIPT
+Edge
+  source_id:     string → Node.id
+  target_id:     string → Node.id
+  edge_type:     CALLS | IMPORTS | COVERS | DEFINED_IN
+```
+## Before implementing any component
+1. Read the relevant spec in `openspec/specs/<component>/spec.md`
+2. Check all scenarios pass — they are the acceptance criteria
+3. Do not add features not in the spec
+4. Do not make LLM API calls from any non-server component
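
The hard token ceiling and the deterministic tie-break described in CLAUDE.md above can be illustrated with a minimal sketch. The names `Candidate`, `estimate_tokens`, and `compile_bundle` are hypothetical and do not come from the package; only the `ceil(char_count / 4)` estimate, the whole-file rule, and the lexicographic tie-break are taken from the text. The file does not say whether the real scorer skips an oversize file and keeps going or stops at the first one, so the skip-and-continue behaviour here is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical scored file; not the package's actual model."""
    file_path: str
    score: float
    char_count: int


def estimate_tokens(char_count: int) -> int:
    # Token estimate as stated in the graph data model: ceil(char_count / 4).
    return math.ceil(char_count / 4)


def compile_bundle(candidates, budget=8000):
    # Highest score first; ties broken by file path, lexicographic ascending.
    ordered = sorted(candidates, key=lambda c: (-c.score, c.file_path))
    included, excluded, used = [], [], 0
    for c in ordered:
        cost = estimate_tokens(c.char_count)
        if used + cost <= budget:
            # Whole files only: a file is included fully or not at all.
            included.append(c.file_path)
            used += cost
        else:
            # Assumption: skip the file and keep trying smaller ones.
            excluded.append((c.file_path, "would exceed token budget"))
    return included, excluded, used
```

Because the sort key is fully determined by score and path, the same candidate list always yields the same bundle, which is the property the "Deterministic output" paragraph asks for.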

claude_context_compiler-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,184 @@
+Metadata-Version: 2.4
+Name: claude-context-compiler
+Version: 0.1.0
+Summary: Local context compiler for AI coding assistants — smallest correct context bundle with rationale
+Project-URL: Repository, https://github.com/punakkals/context-compiler
+Author: Punakkals
+License: Apache-2.0
+Keywords: claude,code-intelligence,context,llm,mcp,tree-sitter
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.11
+Requires-Dist: click>=8.0
+Requires-Dist: fastmcp>=2.0
+Requires-Dist: kuzu>=0.7
+Requires-Dist: pydantic>=2.0
+Requires-Dist: rank-bm25>=0.2.2
+Requires-Dist: rapidfuzz>=3.0
+Requires-Dist: tree-sitter-python>=0.23
+Requires-Dist: tree-sitter-typescript>=0.23
+Requires-Dist: tree-sitter>=0.23
+Provides-Extra: dev
+Requires-Dist: datamodel-code-generator>=0.25; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff>=0.4; extra == 'dev'
+Provides-Extra: semantic
+Requires-Dist: fastembed>=0.4; extra == 'semantic'
+Description-Content-Type: text/markdown
+# context-compiler
+A local-first MCP server that indexes your Python and TypeScript codebase into a dependency graph and returns the **smallest correct context bundle** for any coding task — with a one-line rationale for every included file.
+No cloud. No LLM API calls. No data leaves your machine.
+---
+## The problem
+When you ask Claude to fix a bug or add a feature, it reads files by guessing which ones are relevant. It over-reads (wastes tokens) or misses the file that actually matters. The bigger the codebase, the worse this gets.
+## How it works
+```
+Your task: "fix the keycloak token expiry"
+         ↓
+  Classify → BUG_FIX
+         ↓
+  Find entry nodes → keycloak.py (BM25 + docstring matching)
+         ↓
+  Traverse graph → keycloak.py + secured_view.py + test_keycloak_steps.py
+         ↓
+  Score + budget → 870 tokens (within 8000 limit)
+         ↓
+  Return bundle with rationale per file
+```
+Everything — classification, traversal, scoring, rationale — is deterministic. Same repo + same task = same bundle, every time.
+---
+## Installation
+```bash
+# Index a repository
+uvx context-compiler index --repo ./my-project
+# Preview what context a task would produce
+uvx context-compiler explain --repo ./my-project --task "fix the retry logic"
+# Start the MCP server (Claude Code does this automatically)
+uvx context-compiler serve --repo ./my-project
+```
+Requires Python 3.11+.
+### Optional: semantic fallback
+Install the optional fastembed model (23MB ONNX, no PyTorch) for better matching when task terms don't appear in symbol names:
+```bash
+pip install "context-compiler[semantic]"
+```
+---
+## Claude Code integration
+**1. Register the MCP server:**
+```bash
+claude mcp add --scope user context-compiler uvx -- context-compiler serve --repo /path/to/your/repo
+```
+**2. Add to your repo's `CLAUDE.md`:**
+```markdown
+## Context retrieval
+Before reading any source files, call `get_context` with the task description.
+Read only the files it returns.
+```
+**3. Use it:**
+```
+> Fix the keycloak token expiry bug
+```
+Claude calls `get_context("fix the keycloak token expiry bug")`, gets back the exact files to read, and starts working — no guessing.
+---
+## MCP tools
+### `get_context(task, budget=8000)`
+Returns the minimal file bundle for a coding task.
+```json
+{
+  "files": ["admin/keycloak.py", "admin/views/secured_view.py"],
+  "rationale": [
+    "Included Keycloak as primary task location (matched 'keycloak')",
+    "Included SecuredView._has_role because it is called by Keycloak (depth 1)"
+  ],
+  "token_estimate": 870,
+  "tokens_saved": 0,
+  "task_type": "BUG_FIX",
+  "confidence": 1.0
+}
+```
+### `refresh(changed_files)`
+Re-indexes the repository after file changes.
+---
+## What makes it different
+**Task-type-aware traversal.** A bug fix traverses inbound callers and test coverage at depth 2. A new feature traverses imports and sibling modules. A refactor traverses everything at depth 3. No other tool adjusts retrieval strategy based on what you're actually trying to do.
+**Rationale per file.** Every included file has a one-line explanation of why it's there. You can see what Claude will read before it reads it.
+**Hard token budget.** The bundle never exceeds the limit. Partial file inclusion is not permitted.
+**Local-first.** Embedded KuzuDB graph, no server, no port, no auth. Works offline.
+---
+## Supported languages
+| Language | Parsing | Docstrings |
+|---|---|---|
+| Python | tree-sitter-python | ✓ (first line of docstring) |
+| TypeScript / TSX | tree-sitter-typescript | ✓ (JSDoc `/** */`) |
+---
+## Environment variables
+| Variable | Default | Description |
+|---|---|---|
+| `CC_REPO_PATH` | required | Path to indexed repository |
+| `CC_TOKEN_BUDGET` | `8000` | Default token budget for `get_context` |
+---
+## Tech stack
+[tree-sitter](https://tree-sitter.github.io/) · [KuzuDB](https://kuzudb.com/) · [BM25 (rank-bm25)](https://github.com/dorianbrown/rank_bm25) · [rapidfuzz](https://github.com/maxbachmann/RapidFuzz) · [FastMCP](https://github.com/jlowin/fastmcp) · [fastembed](https://github.com/qdrant/fastembed) (optional)
+---
+## License
+Apache 2.0
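
The "Classify" step in the pipeline above is described as keyword scoring with no LLM calls. A minimal sketch of that idea follows. The keyword sets, the `classify` function name, the confidence formula (winning score divided by total matches), and the fallback for a task with no matches are all assumptions made for illustration; the package's actual lists and formula are not shown in this diff.

```python
import re

# Hypothetical keyword sets; the real classifier's vocabulary is not shown.
KEYWORDS = {
    "BUG_FIX": {"fix", "bug", "error", "crash", "broken"},
    "NEW_FEATURE": {"add", "implement", "create", "support", "new"},
    "REFACTOR": {"refactor", "rename", "extract", "cleanup", "simplify"},
}


def classify(task: str):
    """Return (task_type, confidence) using plain keyword counts."""
    tokens = re.findall(r"[a-z]+", task.lower())
    scores = {
        task_type: sum(token in words for token in tokens)
        for task_type, words in KEYWORDS.items()
    }
    # Iterate task types in sorted order so ties resolve deterministically.
    best = max(sorted(scores), key=scores.get)
    total = sum(scores.values())
    # Assumed confidence: share of keyword hits that belong to the winner.
    confidence = scores[best] / total if total else 0.0
    return best, confidence
```

With these assumed keyword sets, `classify("fix the keycloak token expiry")` yields `("BUG_FIX", 1.0)`, which lines up with the `task_type` and `confidence` fields in the example response.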

claude_context_compiler-0.1.0/README.md ADDED

@@ -0,0 +1,149 @@
+# context-compiler
+A local-first MCP server that indexes your Python and TypeScript codebase into a dependency graph and returns the **smallest correct context bundle** for any coding task — with a one-line rationale for every included file.
+No cloud. No LLM API calls. No data leaves your machine.
+---
+## The problem
+When you ask Claude to fix a bug or add a feature, it reads files by guessing which ones are relevant. It over-reads (wastes tokens) or misses the file that actually matters. The bigger the codebase, the worse this gets.
+## How it works
+```
+Your task: "fix the keycloak token expiry"
+         ↓
+  Classify → BUG_FIX
+         ↓
+  Find entry nodes → keycloak.py (BM25 + docstring matching)
+         ↓
+  Traverse graph → keycloak.py + secured_view.py + test_keycloak_steps.py
+         ↓
+  Score + budget → 870 tokens (within 8000 limit)
+         ↓
+  Return bundle with rationale per file
+```
+Everything — classification, traversal, scoring, rationale — is deterministic. Same repo + same task = same bundle, every time.
+---
+## Installation
+```bash
+# Index a repository
+uvx context-compiler index --repo ./my-project
+# Preview what context a task would produce
+uvx context-compiler explain --repo ./my-project --task "fix the retry logic"
+# Start the MCP server (Claude Code does this automatically)
+uvx context-compiler serve --repo ./my-project
+```
+Requires Python 3.11+.
+### Optional: semantic fallback
+Install the optional fastembed model (23MB ONNX, no PyTorch) for better matching when task terms don't appear in symbol names:
+```bash
+pip install "context-compiler[semantic]"
+```
+---
+## Claude Code integration
+**1. Register the MCP server:**
+```bash
+claude mcp add --scope user context-compiler uvx -- context-compiler serve --repo /path/to/your/repo
+```
+**2. Add to your repo's `CLAUDE.md`:**
+```markdown
+## Context retrieval
+Before reading any source files, call `get_context` with the task description.
+Read only the files it returns.
+```
+**3. Use it:**
+```
+> Fix the keycloak token expiry bug
+```
+Claude calls `get_context("fix the keycloak token expiry bug")`, gets back the exact files to read, and starts working — no guessing.
+---
+## MCP tools
+### `get_context(task, budget=8000)`
+Returns the minimal file bundle for a coding task.
+```json
+{
+  "files": ["admin/keycloak.py", "admin/views/secured_view.py"],
+  "rationale": [
+    "Included Keycloak as primary task location (matched 'keycloak')",
+    "Included SecuredView._has_role because it is called by Keycloak (depth 1)"
+  ],
+  "token_estimate": 870,
+  "tokens_saved": 0,
+  "task_type": "BUG_FIX",
+  "confidence": 1.0
+}
+```
+### `refresh(changed_files)`
+Re-indexes the repository after file changes.
+---
+## What makes it different
+**Task-type-aware traversal.** A bug fix traverses inbound callers and test coverage at depth 2. A new feature traverses imports and sibling modules. A refactor traverses everything at depth 3. No other tool adjusts retrieval strategy based on what you're actually trying to do.
+**Rationale per file.** Every included file has a one-line explanation of why it's there. You can see what Claude will read before it reads it.
+**Hard token budget.** The bundle never exceeds the limit. Partial file inclusion is not permitted.
+**Local-first.** Embedded KuzuDB graph, no server, no port, no auth. Works offline.
+---
+## Supported languages
+| Language | Parsing | Docstrings |
+|---|---|---|
+| Python | tree-sitter-python | ✓ (first line of docstring) |
+| TypeScript / TSX | tree-sitter-typescript | ✓ (JSDoc `/** */`) |
+---
+## Environment variables
+| Variable | Default | Description |
+|---|---|---|
+| `CC_REPO_PATH` | required | Path to indexed repository |
+| `CC_TOKEN_BUDGET` | `8000` | Default token budget for `get_context` |
+---
+## Tech stack
+[tree-sitter](https://tree-sitter.github.io/) · [KuzuDB](https://kuzudb.com/) · [BM25 (rank-bm25)](https://github.com/dorianbrown/rank_bm25) · [rapidfuzz](https://github.com/maxbachmann/RapidFuzz) · [FastMCP](https://github.com/jlowin/fastmcp) · [fastembed](https://github.com/qdrant/fastembed) (optional)
+---
+## License
+Apache 2.0
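
The depth-limited traversal mentioned in the README (depth 2 for bug fixes, depth 3 for refactors), combined with the multi-entry merge described in CLAUDE.md, can be sketched roughly as below. The function name `multi_entry_bfs`, the adjacency-dict graph representation, and the choice to record the minimum depth and the number of entry nodes that reach each node are assumptions; the package stores its graph in KuzuDB and its real traversal code is not shown here.

```python
from collections import deque


def multi_entry_bfs(graph, entries, max_depth):
    """Run a depth-limited BFS from each entry node and merge the results.

    graph: dict mapping node id -> iterable of neighbour ids (assumed shape).
    Returns: dict mapping node id -> (min depth, number of entries reaching it).
    """
    reached = {}
    for entry in entries:
        seen = {entry: 0}
        queue = deque([entry])
        while queue:
            node = queue.popleft()
            depth = seen[node]
            if depth == max_depth:
                continue  # do not expand beyond the depth limit
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen[neighbour] = depth + 1
                    queue.append(neighbour)
        # Merge this entry's results into the combined view.
        for node, depth in seen.items():
            best_depth, hits = reached.get(node, (depth, 0))
            reached[node] = (min(best_depth, depth), hits + 1)
    return reached
```

A scorer could then rank nodes with a higher hit count above others, which is one way to realise the statement that nodes reachable from multiple entry points score higher.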