PyPI - codegraph-cli - Versions diffs - 2.1.0__tar.gz → 2.1.2__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (72)

{codegraph_cli-2.1.0 → codegraph_cli-2.1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codegraph-cli
-Version: 2.1.0
+Version: 2.1.2
 Summary: AI-powered code intelligence CLI with multi-agent analysis, impact graphs, and conversational coding.
 Author-email: Ali Nasir <muhammadalinasir00786@gmail.com>
 License: MIT
@@ -35,22 +35,32 @@ Requires-Dist: tree-sitter>=0.24.0
 Requires-Dist: tree-sitter-python>=0.23.0
 Requires-Dist: tree-sitter-javascript>=0.23.0
 Requires-Dist: tree-sitter-typescript>=0.23.0
-Requires-Dist: litellm>=1.30.0
+Requires-Dist: rich>=13.0.0
+Requires-Dist: python-docx>=1.0.0
+Requires-Dist: pydantic>=2.0.0
 Provides-Extra: crew
 Requires-Dist: crewai>=0.80.0; extra == "crew"
+Provides-Extra: explore
+Requires-Dist: starlette>=0.27.0; extra == "explore"
+Requires-Dist: uvicorn>=0.24.0; extra == "explore"
 Provides-Extra: dev
 Requires-Dist: pytest>=7.4.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
 Requires-Dist: pytest-mock>=3.11.0; extra == "dev"
 Requires-Dist: build>=1.0.0; extra == "dev"
 Requires-Dist: twine>=5.0.0; extra == "dev"
+Provides-Extra: watch
+Requires-Dist: watchdog>=3.0.0; extra == "watch"
 Provides-Extra: embeddings
 Requires-Dist: torch>=2.0.0; extra == "embeddings"
 Requires-Dist: transformers<5.0.0,>=4.48.0; extra == "embeddings"
 Provides-Extra: all
 Requires-Dist: crewai>=0.80.0; extra == "all"
+Requires-Dist: starlette>=0.27.0; extra == "all"
+Requires-Dist: uvicorn>=0.24.0; extra == "all"
 Requires-Dist: torch>=2.0.0; extra == "all"
 Requires-Dist: transformers<5.0.0,>=4.48.0; extra == "all"
+Requires-Dist: watchdog>=3.0.0; extra == "all"
 Dynamic: license-file
 # CodeGraph CLI
@@ -59,7 +69,8 @@ Dynamic: license-file
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org)
-[![Version](https://img.shields.io/badge/version-2.0.0-blue.svg)](https://github.com/al1-nasir/codegraph-cli)
+[![Version](https://img.shields.io/badge/version-2.1.1-blue.svg)](https://github.com/al1-nasir/codegraph-cli)
+[![CI](https://github.com/al1-nasir/codegraph-cli/actions/workflows/ci.yml/badge.svg)](https://github.com/al1-nasir/codegraph-cli/actions/workflows/ci.yml)
 ---
@@ -84,12 +95,24 @@ Core capabilities:
 pip install codegraph-cli
 ```
+With neural embedding models (semantic code search):
+```bash
+pip install codegraph-cli[embeddings]
+```
 With CrewAI multi-agent support:
 ```bash
 pip install codegraph-cli[crew]
 ```
+Everything:
+```bash
+pip install codegraph-cli[all]
+```
 For development:
 ```bash
@@ -105,15 +128,15 @@ pip install -e ".[dev]"
 ### 1. Configure your LLM provider
 ```bash
-cg setup
+cg config setup
 ```
 This runs an interactive wizard that writes configuration to `~/.codegraph/config.toml`. Alternatively, switch providers directly:
 ```bash
-cg set-llm openrouter
-cg set-llm groq
-cg set-llm ollama
+cg config set-llm openrouter
+cg config set-llm groq
+cg config set-llm ollama
 ```
 ### 2. Index a project
@@ -140,18 +163,46 @@ cg chat start --crew    # multi-agent mode
 | Provider | Type | Configuration |
 |----------|------|---------------|
-| Ollama | Local, free | `cg set-llm ollama` |
-| Groq | Cloud, free tier | `cg set-llm groq` |
-| OpenAI | Cloud | `cg set-llm openai` |
-| Anthropic | Cloud | `cg set-llm anthropic` |
-| Gemini | Cloud | `cg set-llm gemini` |
-| OpenRouter | Cloud, multi-model | `cg set-llm openrouter` |
+| Ollama | Local, free | `cg config set-llm ollama` |
+| Groq | Cloud, free tier | `cg config set-llm groq` |
+| OpenAI | Cloud | `cg config set-llm openai` |
+| Anthropic | Cloud | `cg config set-llm anthropic` |
+| Gemini | Cloud | `cg config set-llm gemini` |
+| OpenRouter | Cloud, multi-model | `cg config set-llm openrouter` |
 All configuration is stored in `~/.codegraph/config.toml`. No environment variables required.
 ```bash
-cg show-llm        # view current provider, model, and endpoint
-cg unset-llm       # reset to defaults
+cg config show-llm        # view current provider, model, and endpoint
+cg config unset-llm       # reset to defaults
+```
+---
+## Embedding Models
+CodeGraph supports configurable embedding models for semantic code search. Choose based on your hardware and quality needs:
+| Model | Download | Dim | Quality | Command |
+|-------|----------|-----|---------|---------|
+| hash | 0 bytes | 256 | Keyword-only | `cg config set-embedding hash` |
+| minilm | ~80 MB | 384 | Decent | `cg config set-embedding minilm` |
+| bge-base | ~440 MB | 768 | Good | `cg config set-embedding bge-base` |
+| jina-code | ~550 MB | 768 | Code-aware | `cg config set-embedding jina-code` |
+| qodo-1.5b | ~6.2 GB | 1536 | Best | `cg config set-embedding qodo-1.5b` |
+The default is `hash` (zero-dependency, no download). Neural models require the `[embeddings]` extra and are downloaded on first use from HuggingFace.
+```bash
+cg config set-embedding jina-code    # switch to a neural model
+cg config show-embedding             # view current model and all options
+cg config unset-embedding            # reset to hash default
+```
+After changing the embedding model, re-index your project:
+```bash
+cg index /path/to/project
 ```
 ---
@@ -252,8 +303,9 @@ CLI Layer (Typer)
     |       |                           |
     |       +-- Parser (tree-sitter)    +-- VectorStore (LanceDB)
     |       +-- RAGRetriever            |
-    |       +-- LLM Adapter             +-- Embeddings
-    |
+    |       +-- LLM Adapter             +-- Embeddings (configurable)
+    |                                       hash | minilm | bge-base
+    |                                       jina-code | qodo-1.5b
     +-- ChatAgent (standard mode)
     |
     +-- CrewChatAgent (--crew mode)
@@ -264,6 +316,8 @@ CLI Layer (Typer)
             +-- Code Analysis Agent ---> 3 search/analysis tools
 ```
+**Embeddings**: Five models available via `cg config set-embedding`. Hash (default, zero-dependency) through Qodo-Embed-1-1.5B (best quality, 6 GB). Neural models use raw `transformers` + `torch` — no sentence-transformers overhead. Models are cached in `~/.codegraph/models/`.
 **Parser**: tree-sitter grammars for Python, JavaScript, and TypeScript. Extracts modules, classes, functions, imports, and call relationships into a directed graph.
 **Storage**: SQLite for the code graph (nodes + edges), LanceDB for vector embeddings. All data stored under `~/.codegraph/`.
@@ -278,14 +332,14 @@ CLI Layer (Typer)
 codegraph_cli/
     cli.py               # main Typer application, all top-level commands
     cli_chat.py           # interactive chat REPL with styled output
-    cli_setup.py          # setup wizard, set-llm, unset-llm, show-llm
+    cli_setup.py          # setup wizard, set-llm, unset-llm, set-embedding
     cli_v2.py             # v2 code generation commands
     config.py             # loads config from TOML
-    config_manager.py     # TOML read/write, provider validation
+    config_manager.py     # TOML read/write, provider and embedding config
     llm.py                # multi-provider LLM adapter
     parser.py             # tree-sitter AST parsing
     storage.py            # SQLite graph store
-    embeddings.py         # hash-based embedding model
+    embeddings.py         # configurable embedding engine (5 models)
     rag.py                # RAG retriever
     vector_store.py       # LanceDB vector store
     orchestrator.py       # coordinates parsing, search, impact
@@ -310,7 +364,7 @@ codegraph_cli/
 git clone https://github.com/al1-nasir/codegraph-cli.git
 cd codegraph-cli
 python -m venv .venv && source .venv/bin/activate
-pip install -e ".[dev,crew]"
+pip install -e ".[dev,crew,embeddings]"
 pytest
 ```
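
The embedding table in this hunk gives each model its own vector dimension, and the README asks for a re-index after switching models. The sketch below shows one way such a check could look, assuming the index metadata carries the `embedding_model` and `embedding_dim` keys that the `agents.py` change in this release writes. `EMBEDDING_DIMS` and `needs_reindex` are hypothetical names used for illustration and are not part of codegraph-cli.

```python
from typing import Dict

# Dimensions copied from the README table in this diff.
EMBEDDING_DIMS: Dict[str, int] = {
    "hash": 256,
    "minilm": 384,
    "bge-base": 768,
    "jina-code": 768,
    "qodo-1.5b": 1536,
}


def needs_reindex(metadata: Dict[str, object], current_model: str) -> bool:
    """Return True when the stored vectors were built with another model.

    Hypothetical helper: missing metadata is treated as the `hash`
    default, mirroring the fallbacks used in the agents.py hunk.
    """
    stored_model = metadata.get("embedding_model", "hash")
    stored_dim = metadata.get("embedding_dim", EMBEDDING_DIMS["hash"])
    return (
        stored_model != current_model
        or stored_dim != EMBEDDING_DIMS[current_model]
    )
```

Comparing the model key as well as the dimension matters here because `bge-base` and `jina-code` share 768 dimensions but presumably produce vectors that are not comparable with each other.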

{codegraph_cli-2.1.0 → codegraph_cli-2.1.2}/README.md RENAMED

@@ -4,7 +4,8 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org)
-[![Version](https://img.shields.io/badge/version-2.0.0-blue.svg)](https://github.com/al1-nasir/codegraph-cli)
+[![Version](https://img.shields.io/badge/version-2.1.1-blue.svg)](https://github.com/al1-nasir/codegraph-cli)
+[![CI](https://github.com/al1-nasir/codegraph-cli/actions/workflows/ci.yml/badge.svg)](https://github.com/al1-nasir/codegraph-cli/actions/workflows/ci.yml)
 ---
@@ -29,12 +30,24 @@ Core capabilities:
 pip install codegraph-cli
 ```
+With neural embedding models (semantic code search):
+```bash
+pip install codegraph-cli[embeddings]
+```
 With CrewAI multi-agent support:
 ```bash
 pip install codegraph-cli[crew]
 ```
+Everything:
+```bash
+pip install codegraph-cli[all]
+```
 For development:
 ```bash
@@ -50,15 +63,15 @@ pip install -e ".[dev]"
 ### 1. Configure your LLM provider
 ```bash
-cg setup
+cg config setup
 ```
 This runs an interactive wizard that writes configuration to `~/.codegraph/config.toml`. Alternatively, switch providers directly:
 ```bash
-cg set-llm openrouter
-cg set-llm groq
-cg set-llm ollama
+cg config set-llm openrouter
+cg config set-llm groq
+cg config set-llm ollama
 ```
 ### 2. Index a project
@@ -85,18 +98,46 @@ cg chat start --crew    # multi-agent mode
 | Provider | Type | Configuration |
 |----------|------|---------------|
-| Ollama | Local, free | `cg set-llm ollama` |
-| Groq | Cloud, free tier | `cg set-llm groq` |
-| OpenAI | Cloud | `cg set-llm openai` |
-| Anthropic | Cloud | `cg set-llm anthropic` |
-| Gemini | Cloud | `cg set-llm gemini` |
-| OpenRouter | Cloud, multi-model | `cg set-llm openrouter` |
+| Ollama | Local, free | `cg config set-llm ollama` |
+| Groq | Cloud, free tier | `cg config set-llm groq` |
+| OpenAI | Cloud | `cg config set-llm openai` |
+| Anthropic | Cloud | `cg config set-llm anthropic` |
+| Gemini | Cloud | `cg config set-llm gemini` |
+| OpenRouter | Cloud, multi-model | `cg config set-llm openrouter` |
 All configuration is stored in `~/.codegraph/config.toml`. No environment variables required.
 ```bash
-cg show-llm        # view current provider, model, and endpoint
-cg unset-llm       # reset to defaults
+cg config show-llm        # view current provider, model, and endpoint
+cg config unset-llm       # reset to defaults
+```
+---
+## Embedding Models
+CodeGraph supports configurable embedding models for semantic code search. Choose based on your hardware and quality needs:
+| Model | Download | Dim | Quality | Command |
+|-------|----------|-----|---------|---------|
+| hash | 0 bytes | 256 | Keyword-only | `cg config set-embedding hash` |
+| minilm | ~80 MB | 384 | Decent | `cg config set-embedding minilm` |
+| bge-base | ~440 MB | 768 | Good | `cg config set-embedding bge-base` |
+| jina-code | ~550 MB | 768 | Code-aware | `cg config set-embedding jina-code` |
+| qodo-1.5b | ~6.2 GB | 1536 | Best | `cg config set-embedding qodo-1.5b` |
+The default is `hash` (zero-dependency, no download). Neural models require the `[embeddings]` extra and are downloaded on first use from HuggingFace.
+```bash
+cg config set-embedding jina-code    # switch to a neural model
+cg config show-embedding             # view current model and all options
+cg config unset-embedding            # reset to hash default
+```
+After changing the embedding model, re-index your project:
+```bash
+cg index /path/to/project
 ```
 ---
@@ -197,8 +238,9 @@ CLI Layer (Typer)
     |       |                           |
     |       +-- Parser (tree-sitter)    +-- VectorStore (LanceDB)
     |       +-- RAGRetriever            |
-    |       +-- LLM Adapter             +-- Embeddings
-    |
+    |       +-- LLM Adapter             +-- Embeddings (configurable)
+    |                                       hash | minilm | bge-base
+    |                                       jina-code | qodo-1.5b
     +-- ChatAgent (standard mode)
     |
     +-- CrewChatAgent (--crew mode)
@@ -209,6 +251,8 @@ CLI Layer (Typer)
             +-- Code Analysis Agent ---> 3 search/analysis tools
 ```
+**Embeddings**: Five models available via `cg config set-embedding`. Hash (default, zero-dependency) through Qodo-Embed-1-1.5B (best quality, 6 GB). Neural models use raw `transformers` + `torch` — no sentence-transformers overhead. Models are cached in `~/.codegraph/models/`.
 **Parser**: tree-sitter grammars for Python, JavaScript, and TypeScript. Extracts modules, classes, functions, imports, and call relationships into a directed graph.
 **Storage**: SQLite for the code graph (nodes + edges), LanceDB for vector embeddings. All data stored under `~/.codegraph/`.
@@ -223,14 +267,14 @@ CLI Layer (Typer)
 codegraph_cli/
     cli.py               # main Typer application, all top-level commands
     cli_chat.py           # interactive chat REPL with styled output
-    cli_setup.py          # setup wizard, set-llm, unset-llm, show-llm
+    cli_setup.py          # setup wizard, set-llm, unset-llm, set-embedding
     cli_v2.py             # v2 code generation commands
     config.py             # loads config from TOML
-    config_manager.py     # TOML read/write, provider validation
+    config_manager.py     # TOML read/write, provider and embedding config
     llm.py                # multi-provider LLM adapter
     parser.py             # tree-sitter AST parsing
     storage.py            # SQLite graph store
-    embeddings.py         # hash-based embedding model
+    embeddings.py         # configurable embedding engine (5 models)
     rag.py                # RAG retriever
     vector_store.py       # LanceDB vector store
     orchestrator.py       # coordinates parsing, search, impact
@@ -255,7 +299,7 @@ codegraph_cli/
 git clone https://github.com/al1-nasir/codegraph-cli.git
 cd codegraph-cli
 python -m venv .venv && source .venv/bin/activate
-pip install -e ".[dev,crew]"
+pip install -e ".[dev,crew,embeddings]"
 pytest
 ```

{codegraph_cli-2.1.0 → codegraph_cli-2.1.2}/codegraph_cli/__init__.py RENAMED

@@ -1,4 +1,4 @@
 """CodeGraph CLI package."""
 __all__ = ["__version__"]
-__version__ = "2.0.1"
+__version__ = "2.1.2"
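
The version strings in this diff do not all agree: the 2.1.0 archive shipped `__version__ = "2.0.1"`, and the README badge in 2.1.2 reads 2.1.1. A minimal sketch of a consistency check follows, assuming the only two formats to recognise are the shields.io badge URL and the `__version__` assignment shown in these hunks. `find_version_mismatches` is a hypothetical helper, not something the package provides.

```python
import re
from typing import Dict, List

# Patterns for the two version formats visible in this diff.
_BADGE_RE = re.compile(r"badge/version-(\d+\.\d+\.\d+)-")
_DUNDER_RE = re.compile(r'__version__\s*=\s*"([^"]+)"')


def find_version_mismatches(expected: str, sources: Dict[str, str]) -> List[str]:
    """Return the names of sources whose version differs from `expected`."""
    mismatched = []
    for name, text in sources.items():
        match = _BADGE_RE.search(text) or _DUNDER_RE.search(text)
        # A source with no recognisable version is also reported.
        if match is None or match.group(1) != expected:
            mismatched.append(name)
    return mismatched


# Strings taken from the 2.1.2 side of the hunks in this diff.
sources = {
    "README.md": "https://img.shields.io/badge/version-2.1.1-blue.svg",
    "__init__.py": '__version__ = "2.1.2"',
}
stale = find_version_mismatches("2.1.2", sources)
```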

{codegraph_cli-2.1.0 → codegraph_cli-2.1.2}/codegraph_cli/agents.py RENAMED

@@ -2,17 +2,66 @@
 from __future__ import annotations
+import re
 from collections import deque
 from pathlib import Path
 from typing import Dict, List, Set
 from .embeddings import HashEmbeddingModel, TransformerEmbedder
 from .llm import LocalLLM
-from .models import ImpactReport
+from .models import ImpactReport, Node
 from .parser import PythonGraphParser
 from .rag import RAGRetriever
 from .storage import GraphStore
+# Regex to strip bare import lines from chunk text
+_IMPORT_RE = re.compile(r"^(?:from\s+\S+\s+)?import\s+.+$", re.MULTILINE)
+# Maximum characters to keep for a single chunk's code body.
+# Module-level nodes can be very large; truncating keeps embeddings
+# focused on the symbol's signature + docstring + first N lines.
+_MAX_CHUNK_CODE_CHARS = 1500
+def _build_chunk_text(node: Node) -> str:
+    """Build structured chunk text for embedding.
+    The text is formatted so that the embedding model captures:
+    - **file path** (helps retrieval when users mention filenames)
+    - **symbol name + type** (boosts exact-match semantics)
+    - **docstring** (captures purpose / intent)
+    - **code body** (captures implementation detail)
+    Import lines and decorators-only boilerplate are stripped to
+    reduce noise.  Module-level nodes are truncated to avoid huge
+    embeddings that dilute meaning.
+    """
+    parts: List[str] = [
+        f"file: {node.file_path}",
+        f"symbol: {node.qualname}",
+        f"type: {node.node_type}",
+    ]
+    if node.docstring:
+        parts.append(f"doc: {node.docstring.strip()}")
+    # Clean code: strip import lines for non-module nodes
+    code = node.code
+    if node.node_type != "module":
+        code = _IMPORT_RE.sub("", code).strip()
+    else:
+        # For modules keep only the first N chars to avoid huge chunks
+        code = code[:_MAX_CHUNK_CODE_CHARS]
+    # Truncate overly long code
+    if len(code) > _MAX_CHUNK_CODE_CHARS:
+        code = code[:_MAX_CHUNK_CODE_CHARS] + "\n# ... (truncated)"
+    if code:
+        parts.append(code)
+    return "\n".join(parts)
 class GraphAgent:
     """Responsible for parsing projects and maintaining graph memory."""
@@ -31,7 +80,7 @@ class GraphAgent:
         total_nodes = len(nodes)
         for idx, node in enumerate(nodes, 1):
-            text = "\n".join([node.qualname, node.docstring, node.code])
+            text = _build_chunk_text(node)
             emb = self.embedding_model.embed_text(text)
             node_payload.append((node, emb))
@@ -43,13 +92,20 @@ class GraphAgent:
         if show_progress:
             print(f"\r📊 Indexing: {total_nodes}/{total_nodes} nodes (100%)  ")
-        self.store.insert_nodes(node_payload)
+        emb_model_key = getattr(self.embedding_model, 'model_key', 'hash')
+        emb_dim = getattr(self.embedding_model, 'dim', 256)
+        self.store.insert_nodes(node_payload, model_key=emb_model_key)
         self.store.insert_edges(edges)
+        # Record embedding model info in project metadata
         self.store.set_metadata(
             {
                 "project_root": str(project_root),
                 "node_count": len(nodes),
                 "edge_count": len(edges),
+                "embedding_model": emb_model_key,
+                "embedding_dim": emb_dim,
             }
         )
         return {"nodes": len(nodes), "edges": len(edges)}
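
To see what the new chunk format looks like, the sketch below reproduces the logic of `_build_chunk_text` against a stand-in `Node`. The dataclass is an assumption based only on the attributes the hunk reads (`file_path`, `qualname`, `node_type`, `docstring`, `code`); the real `codegraph_cli.models.Node` may carry more fields. The sample node is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

_IMPORT_RE = re.compile(r"^(?:from\s+\S+\s+)?import\s+.+$", re.MULTILINE)
_MAX_CHUNK_CODE_CHARS = 1500


@dataclass
class Node:
    """Stand-in for the real Node model; fields inferred from the hunk."""
    file_path: str
    qualname: str
    node_type: str
    docstring: str
    code: str


def build_chunk_text(node: Node) -> str:
    """Same steps as the hunk: header lines, docstring, cleaned code."""
    parts: List[str] = [
        f"file: {node.file_path}",
        f"symbol: {node.qualname}",
        f"type: {node.node_type}",
    ]
    if node.docstring:
        parts.append(f"doc: {node.docstring.strip()}")
    code = node.code
    if node.node_type != "module":
        # Only imports that start in column 0 match the pattern.
        code = _IMPORT_RE.sub("", code).strip()
    else:
        code = code[:_MAX_CHUNK_CODE_CHARS]
    if len(code) > _MAX_CHUNK_CODE_CHARS:
        code = code[:_MAX_CHUNK_CODE_CHARS] + "\n# ... (truncated)"
    if code:
        parts.append(code)
    return "\n".join(parts)


node = Node(
    file_path="pkg/util.py",
    qualname="pkg.util.cwd",
    node_type="function",
    docstring="Return the working directory.",
    code="import os\ndef cwd():\n    return os.getcwd()",
)
chunk = build_chunk_text(node)
```

Two details follow from the code as written. Module nodes are cut to 1500 characters before the length check, so the `# ... (truncated)` marker is only ever appended to non-module nodes. Indented imports (for example inside a function body) are left in place, because the pattern is anchored to the start of a line.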

{codegraph_cli-2.1.0 → codegraph_cli-2.1.2}/codegraph_cli/chat_agent.py RENAMED

@@ -7,7 +7,7 @@ from typing import Optional
 from .chat_session import SessionManager
 from .codegen_agent import CodeGenAgent
-from .context_manager import assemble_context_for_llm, detect_intent
+from .context_manager import SymbolMemory, assemble_context_for_llm, detect_intent
 from .llm import LocalLLM
 from .models_v2 import ChatSession, CodeProposal
 from .orchestrator import MCPOrchestrator
@@ -59,11 +59,60 @@ class ChatAgent:
         self.rag_retriever = rag_retriever
         self.session_manager = SessionManager()
+        # Symbol memory — tracks recently discussed symbols & files
+        # so we can skip redundant RAG queries.
+        self.symbol_memory = SymbolMemory()
         # Initialize specialized agents
         from .codegen_agent import CodeGenAgent
         from .refactor_agent import RefactorAgent
         self.codegen_agent = CodeGenAgent(context.store, llm, project_context=context)
         self.refactor_agent = RefactorAgent(context.store)
+        # Build enhanced system prompt with auto-context
+        self.system_prompt = self._build_system_prompt()
+    def _build_system_prompt(self) -> str:
+        """Build system prompt enriched with project context.
+        Includes project name, source path, indexed file/symbol counts,
+        node-type breakdown, and recently modified files so the LLM has
+        immediate awareness of the codebase.
+        """
+        base = SYSTEM_PROMPT
+        try:
+            summary = self.context.get_project_summary()
+            parts = [
+                "\n\nProject Context:",
+                f"- Project: {summary.get('project_name', 'unknown')}",
+                f"- Source: {summary.get('source_path', 'N/A')}",
+                f"- Indexed: {summary.get('indexed_files', 0)} files, {summary.get('total_nodes', 0)} symbols",
+            ]
+            node_types = summary.get("node_types", {})
+            if node_types:
+                parts.append(
+                    f"- Breakdown: {node_types.get('function', 0)} functions, "
+                    f"{node_types.get('class', 0)} classes, "
+                    f"{node_types.get('module', 0)} modules"
+                )
+            # Recently modified files
+            if self.context.has_source_access:
+                try:
+                    items = self.context.list_directory(".")
+                    files = [f for f in items if f["type"] == "file"]
+                    files.sort(key=lambda f: f.get("modified", ""), reverse=True)
+                    recent = [f["name"] for f in files[:5]]
+                    if recent:
+                        parts.append(f"- Recently modified: {', '.join(recent)}")
+                except Exception:
+                    pass
+            return base + "\n".join(parts)
+        except Exception:
+            return base
     def process_message(
         self,
@@ -72,6 +121,10 @@ class ChatAgent:
     ) -> str:
         """Process user message and generate response.
+        Note: The caller (REPL) is responsible for adding messages to
+        the session.  This method does NOT add messages itself to avoid
+        duplicate entries.
         Args:
             user_message: User's message
             session: Current chat session
@@ -79,10 +132,6 @@ class ChatAgent:
         Returns:
             Assistant's response
         """
-        # Add user message to session
-        timestamp = datetime.now().isoformat()
-        session.add_message("user", user_message, timestamp)
         # Detect intent
         intent = detect_intent(user_message)
@@ -103,9 +152,6 @@ class ChatAgent:
             # General chat - use LLM with RAG context
             response = self._handle_chat(user_message, session)
-        # Add assistant response to session
-        session.add_message("assistant", response, datetime.now().isoformat())
         # Save session
         self.session_manager.save_session(session)
@@ -289,13 +335,14 @@ class ChatAgent:
     def _handle_chat(self, message: str, session: ChatSession) -> str:
         """Handle general chat with LLM and RAG context."""
-        # Assemble context using smart RAG strategy
+        # Assemble context using smart RAG strategy + symbol memory
         context_messages = assemble_context_for_llm(
             user_message=message,
             session=session,
             rag_retriever=self.rag_retriever,
-            system_prompt=SYSTEM_PROMPT,
-            max_tokens=8000
+            system_prompt=self.system_prompt,
+            max_tokens=8000,
+            symbol_memory=self.symbol_memory,
         )
         # Call LLM
