PyPI - repolix - Versions diffs - 0.1.0__tar.gz - Mend

repolix 0.1.0__tar.gz


Files changed (33)

repolix-0.1.0/LICENSE +23 -0
repolix-0.1.0/MANIFEST.in +4 -0
repolix-0.1.0/PKG-INFO +242 -0
repolix-0.1.0/README.md +207 -0
repolix-0.1.0/codesight/__init__.py +5 -0
repolix-0.1.0/codesight/api.py +275 -0
repolix-0.1.0/codesight/chunker.py +275 -0
repolix-0.1.0/codesight/cli.py +264 -0
repolix-0.1.0/codesight/dist/assets/index-BWIMglAM.js +40 -0
repolix-0.1.0/codesight/dist/dist/assets/index-BWIMglAM.js +40 -0
repolix-0.1.0/codesight/dist/dist/index.html +12 -0
repolix-0.1.0/codesight/dist/index.html +12 -0
repolix-0.1.0/codesight/llm.py +222 -0
repolix-0.1.0/codesight/retriever.py +289 -0
repolix-0.1.0/codesight/store.py +463 -0
repolix-0.1.0/codesight/walker.py +109 -0
repolix-0.1.0/frontend/dist/assets/index-BWIMglAM.js +40 -0
repolix-0.1.0/frontend/dist/index.html +12 -0
repolix-0.1.0/pyproject.toml +73 -0
repolix-0.1.0/repolix.egg-info/PKG-INFO +242 -0
repolix-0.1.0/repolix.egg-info/SOURCES.txt +31 -0
repolix-0.1.0/repolix.egg-info/dependency_links.txt +1 -0
repolix-0.1.0/repolix.egg-info/entry_points.txt +2 -0
repolix-0.1.0/repolix.egg-info/requires.txt +15 -0
repolix-0.1.0/repolix.egg-info/top_level.txt +3 -0
repolix-0.1.0/setup.cfg +4 -0
repolix-0.1.0/tests/test_api.py +161 -0
repolix-0.1.0/tests/test_chunker.py +376 -0
repolix-0.1.0/tests/test_cli.py +176 -0
repolix-0.1.0/tests/test_llm.py +227 -0
repolix-0.1.0/tests/test_retriever.py +203 -0
repolix-0.1.0/tests/test_store.py +303 -0
repolix-0.1.0/tests/test_walker.py +95 -0

repolix-0.1.0/LICENSE ADDED

@@ -0,0 +1,23 @@
+MIT License
+Copyright (c) 2026 Patrick Chung
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

repolix-0.1.0/MANIFEST.in ADDED

@@ -0,0 +1,4 @@
+include README.md
+include LICENSE
+recursive-include frontend/dist *
+recursive-include codesight *.py

repolix-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,242 @@
+Metadata-Version: 2.2
+Name: repolix
+Version: 0.1.0
+Summary: Local-first codebase context engine — ask plain English questions about any Python codebase
+Author: Patrick Chung
+License: MIT
+Project-URL: Homepage, https://github.com/TheAsianFish/repolix
+Project-URL: Issues, https://github.com/TheAsianFish/repolix/issues
+Keywords: codebase,search,embeddings,RAG,developer-tools,AST,code-search
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Topic :: Utilities
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: tree-sitter>=0.21
+Requires-Dist: tree-sitter-python>=0.21
+Requires-Dist: openai>=1.0
+Requires-Dist: chromadb>=0.4
+Requires-Dist: fastapi>=0.110
+Requires-Dist: uvicorn>=0.29
+Requires-Dist: click>=8.1
+Requires-Dist: python-dotenv>=1.0
+Requires-Dist: tiktoken>=0.7
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: pytest-cov>=5.0; extra == "dev"
+Requires-Dist: ruff>=0.4; extra == "dev"
+Requires-Dist: httpx>=0.27; extra == "dev"
+# codesight
+**Ask plain English questions about any Python codebase. Get answers
+with exact file and line citations. Runs entirely on your machine.**
+```bash
+codesight index ./myrepo
+codesight query "how does authentication work"
+```
+Searching...
+Generating answer...
+── Answer ────────────────────────────────────────────────
+authenticate_user() validates credentials by calling validate_token()
+[1] which checks expiry and signature. On success it creates a
+session via SessionService.create() [2].
+── Citations ─────────────────────────────────────────────
+[1] auth/validators.py:14-28    (validate_token)
+[2] auth/session.py:45-67       (SessionService.create)
+[confidence: high · 5 chunks · index: ./myrepo/.codesight]
+Your code never leaves your machine. No server. No accounts beyond
+an OpenAI API key.
+---
+## Why codesight
+Getting dropped into an unfamiliar codebase is painful. Documentation
+is outdated. Grep finds strings, not meaning. LLM chatbots hallucinate
+file names and function signatures because they have no access to your
+actual code.
+codesight indexes your code locally using AST-based chunking — every
+retrieved chunk is a complete function or class, never an arbitrary
+line slice. It runs entirely on your machine.
+---
+## How it works
+**1. AST chunking**
+Tree-sitter parses each file into a syntax tree. codesight splits only
+at function and class boundaries. Every chunk is semantically complete.
+Methods are tracked with their parent class for disambiguation.
+**2. Hybrid search**
+Queries run against OpenAI embeddings (vector search) and exact token
+matching (keyword search) simultaneously. Results are merged using
+Reciprocal Rank Fusion — a ranking algorithm that rewards consistency
+across search methods over dominance in just one.
+**3. Call graph expansion**
+After initial retrieval, codesight inspects each retrieved chunk's
+call graph and fetches called functions that did not rank highly
+enough on their own. This surfaces implementation details that live
+one function call away from the entry point.
+**4. Metadata re-ranking**
+Retrieved chunks are re-ranked using function names, file paths,
+docstrings, and call graph signals before being sent to the LLM.
+**5. Cited answers**
+The top chunks go to gpt-5.4-mini with instructions to synthesize
+across all chunks and cite every claim. Citations map back to exact
+file paths and line numbers.
+---
+## Quickstart
+### Requirements
+- Python 3.11+
+- OpenAI API key ([get one here](https://platform.openai.com/api-keys))
+> Node.js is **not required** for end users. The web UI is bundled
+> inside the package and served directly by FastAPI.
+### Install from PyPI
+```bash
+pip install codesight
+```
+Set your API key:
+```bash
+export OPENAI_API_KEY=sk-your-key-here
+# or add it to a .env file in your working directory
+```
+### Install from source (development)
+```bash
+git clone https://github.com/TheAsianFish/codesight
+cd codesight
+python -m venv .venv
+source .venv/bin/activate      # Windows: .venv\Scripts\activate
+pip install -e ".[dev]"
+cp .env.example .env
+# Edit .env and add your OPENAI_API_KEY
+```
+### CLI
+```bash
+# Index a repository (~$0.02 per 30k lines, one-time)
+codesight index ./path/to/repo
+# Ask a question
+codesight query "how does authentication work"
+# See raw retrieved chunks without an LLM call
+codesight query "where is UserService defined" --no-llm
+# Force re-index all files after a major refactor
+codesight index ./path/to/repo --force
+```
+### Web UI
+```bash
+# Start the server (the React UI is bundled — no npm needed)
+uvicorn codesight.api:app --port 8000
+# Open http://localhost:8000
+```
+**For frontend development** (hot reload via Vite):
+```bash
+# Requires Node.js 18+
+cd frontend && npm install && cd ..
+bash start.sh
+# Backend: http://localhost:8000  Frontend: http://localhost:3000
+```
+---
+## Cost
+| Action | Cost |
+|---|---|
+| Index 30k line repo | ~$0.02 (one-time) |
+| Re-index after small change | ~$0.001 (changed files only) |
+| Each query | ~$0.001 |
+Incremental indexing means re-indexing after a small change costs
+almost nothing — only changed files are re-embedded.
+---
+## Stack
+| Layer | Choice |
+|---|---|
+| AST parsing | Tree-sitter |
+| Embeddings | text-embedding-3-small |
+| Vector store | ChromaDB (local, no server needed) |
+| LLM | gpt-5.4-mini |
+| Backend | FastAPI |
+| Frontend | React + TypeScript |
+| CLI | Click |
+---
+## Output
+Each query produces:
+- A prose answer with inline citations `[1]`, `[2]` etc.
+- A citations section with exact file paths and line ranges.
+  Citations marked `[truncated]` mean the function exceeded the
+  300-token chunk cap — the answer is based on a partial view of
+  that function.
+- A confidence label (`high` / `medium` / `low`) derived from how
+  strongly the retrieved chunks matched the query across function
+  names, file paths, docstrings, and call graph signals.
+---
+## Limitations
+- Python repos only. TypeScript support planned for V2.
+- Best on repos up to ~30k lines.
+- Deeply nested functions are included in their parent chunk.
+- Large functions (>300 tokens) are truncated at the chunk cap.
+  The `[truncated]` marker in citations flags when this occurs.
+- Complex cross-file reasoning may require rephrasing the query.
+- Architecture-level questions (layer structure, dependency graphs)
+  require the V2 dependency graph feature to answer reliably.
+---
+## Roadmap
+**V2** — TypeScript support, VS Code extension, dependency graph
+**V3** — GitHub webhook re-indexing, multi-repo, Slack bot
+---
+## Contributing
+Bug reports and pull requests are welcome. Please open an issue
+before submitting a large change so we can discuss the approach.
+See .github/ISSUE_TEMPLATE/bug_report.md for the bug report format.
+---
+## License
+MIT © 2026 Patrick Chung
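
Reviewer note (not part of the package): the "Hybrid search" step in the description above merges vector and keyword results with Reciprocal Rank Fusion. The sketch below shows the general technique only. The function name `reciprocal_rank_fusion`, the sample chunk ids, and the constant `k=60` (a commonly used default for RRF) are assumptions for illustration; the diff shown here does not reveal how `retriever.py` actually implements the merge.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, where
    rank is 1-based. k=60 is an assumed default, not taken from repolix.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from the two search methods.
vector_hits = ["validate_token", "authenticate_user", "hash_password"]
keyword_hits = ["authenticate_user", "SessionService.create", "validate_token"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

In this example `authenticate_user` ranks first because it places near the top of both lists, while `hash_password` and `SessionService.create` each appear in only one list and fall to the bottom. That is the "consistency over dominance" behaviour the description refers to.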

repolix-0.1.0/README.md ADDED

@@ -0,0 +1,207 @@
+# codesight
+**Ask plain English questions about any Python codebase. Get answers
+with exact file and line citations. Runs entirely on your machine.**
+```bash
+codesight index ./myrepo
+codesight query "how does authentication work"
+```
+Searching...
+Generating answer...
+── Answer ────────────────────────────────────────────────
+authenticate_user() validates credentials by calling validate_token()
+[1] which checks expiry and signature. On success it creates a
+session via SessionService.create() [2].
+── Citations ─────────────────────────────────────────────
+[1] auth/validators.py:14-28    (validate_token)
+[2] auth/session.py:45-67       (SessionService.create)
+[confidence: high · 5 chunks · index: ./myrepo/.codesight]
+Your code never leaves your machine. No server. No accounts beyond
+an OpenAI API key.
+---
+## Why codesight
+Getting dropped into an unfamiliar codebase is painful. Documentation
+is outdated. Grep finds strings, not meaning. LLM chatbots hallucinate
+file names and function signatures because they have no access to your
+actual code.
+codesight indexes your code locally using AST-based chunking — every
+retrieved chunk is a complete function or class, never an arbitrary
+line slice. It runs entirely on your machine.
+---
+## How it works
+**1. AST chunking**
+Tree-sitter parses each file into a syntax tree. codesight splits only
+at function and class boundaries. Every chunk is semantically complete.
+Methods are tracked with their parent class for disambiguation.
+**2. Hybrid search**
+Queries run against OpenAI embeddings (vector search) and exact token
+matching (keyword search) simultaneously. Results are merged using
+Reciprocal Rank Fusion — a ranking algorithm that rewards consistency
+across search methods over dominance in just one.
+**3. Call graph expansion**
+After initial retrieval, codesight inspects each retrieved chunk's
+call graph and fetches called functions that did not rank highly
+enough on their own. This surfaces implementation details that live
+one function call away from the entry point.
+**4. Metadata re-ranking**
+Retrieved chunks are re-ranked using function names, file paths,
+docstrings, and call graph signals before being sent to the LLM.
+**5. Cited answers**
+The top chunks go to gpt-5.4-mini with instructions to synthesize
+across all chunks and cite every claim. Citations map back to exact
+file paths and line numbers.
+---
+## Quickstart
+### Requirements
+- Python 3.11+
+- OpenAI API key ([get one here](https://platform.openai.com/api-keys))
+> Node.js is **not required** for end users. The web UI is bundled
+> inside the package and served directly by FastAPI.
+### Install from PyPI
+```bash
+pip install codesight
+```
+Set your API key:
+```bash
+export OPENAI_API_KEY=sk-your-key-here
+# or add it to a .env file in your working directory
+```
+### Install from source (development)
+```bash
+git clone https://github.com/TheAsianFish/codesight
+cd codesight
+python -m venv .venv
+source .venv/bin/activate      # Windows: .venv\Scripts\activate
+pip install -e ".[dev]"
+cp .env.example .env
+# Edit .env and add your OPENAI_API_KEY
+```
+### CLI
+```bash
+# Index a repository (~$0.02 per 30k lines, one-time)
+codesight index ./path/to/repo
+# Ask a question
+codesight query "how does authentication work"
+# See raw retrieved chunks without an LLM call
+codesight query "where is UserService defined" --no-llm
+# Force re-index all files after a major refactor
+codesight index ./path/to/repo --force
+```
+### Web UI
+```bash
+# Start the server (the React UI is bundled — no npm needed)
+uvicorn codesight.api:app --port 8000
+# Open http://localhost:8000
+```
+**For frontend development** (hot reload via Vite):
+```bash
+# Requires Node.js 18+
+cd frontend && npm install && cd ..
+bash start.sh
+# Backend: http://localhost:8000  Frontend: http://localhost:3000
+```
+---
+## Cost
+| Action | Cost |
+|---|---|
+| Index 30k line repo | ~$0.02 (one-time) |
+| Re-index after small change | ~$0.001 (changed files only) |
+| Each query | ~$0.001 |
+Incremental indexing means re-indexing after a small change costs
+almost nothing — only changed files are re-embedded.
+---
+## Stack
+| Layer | Choice |
+|---|---|
+| AST parsing | Tree-sitter |
+| Embeddings | text-embedding-3-small |
+| Vector store | ChromaDB (local, no server needed) |
+| LLM | gpt-5.4-mini |
+| Backend | FastAPI |
+| Frontend | React + TypeScript |
+| CLI | Click |
+---
+## Output
+Each query produces:
+- A prose answer with inline citations `[1]`, `[2]` etc.
+- A citations section with exact file paths and line ranges.
+  Citations marked `[truncated]` mean the function exceeded the
+  300-token chunk cap — the answer is based on a partial view of
+  that function.
+- A confidence label (`high` / `medium` / `low`) derived from how
+  strongly the retrieved chunks matched the query across function
+  names, file paths, docstrings, and call graph signals.
+---
+## Limitations
+- Python repos only. TypeScript support planned for V2.
+- Best on repos up to ~30k lines.
+- Deeply nested functions are included in their parent chunk.
+- Large functions (>300 tokens) are truncated at the chunk cap.
+  The `[truncated]` marker in citations flags when this occurs.
+- Complex cross-file reasoning may require rephrasing the query.
+- Architecture-level questions (layer structure, dependency graphs)
+  require the V2 dependency graph feature to answer reliably.
+---
+## Roadmap
+**V2** — TypeScript support, VS Code extension, dependency graph
+**V3** — GitHub webhook re-indexing, multi-repo, Slack bot
+---
+## Contributing
+Bug reports and pull requests are welcome. Please open an issue
+before submitting a large change so we can discuss the approach.
+See .github/ISSUE_TEMPLATE/bug_report.md for the bug report format.
+---
+## License
+MIT © 2026 Patrick Chung
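
Reviewer note (not part of the package): the "AST chunking" step in the README above splits source only at function and class boundaries and tracks methods together with their parent class. The sketch below illustrates that idea using Python's standard-library `ast` module as a stand-in for Tree-sitter, which the package itself declares as a dependency. The function `chunk_source`, the chunk dictionary layout, and the sample source are hypothetical and are not taken from `chunker.py`.

```python
import ast


def chunk_source(source):
    """Return one chunk per top-level function, class, and method.

    Nested functions are not split out; they stay inside their parent
    chunk. Line numbers are 1-based and start at the def/class line.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []

    def add(node, parent=None):
        chunks.append({
            # Qualify methods with their class name for disambiguation.
            "name": f"{parent}.{node.name}" if parent else node.name,
            "start_line": node.lineno,
            "end_line": node.end_lineno,
            "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
        })

    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, function_types):
            add(node)
        elif isinstance(node, ast.ClassDef):
            add(node)
            for item in node.body:
                if isinstance(item, function_types):
                    add(item, parent=node.name)
    return chunks


# Hypothetical input file contents.
SAMPLE = '''\
def validate_token(token):
    return bool(token)

class SessionService:
    def create(self, user):
        return {"user": user}
'''

chunks = chunk_source(SAMPLE)
for chunk in chunks:
    print(chunk["name"], chunk["start_line"], chunk["end_line"])
```

Because every chunk carries its start and end line, a citation such as `auth/session.py:45-67 (SessionService.create)` can be produced directly from chunk metadata. Whether the package also emits a chunk for the enclosing class, as this sketch does, cannot be determined from the README alone.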

repolix-0.1.0/codesight/__init__.py ADDED

@@ -0,0 +1,5 @@
+"""
+codesight — local-first codebase context engine.
+"""
+__version__ = "0.1.0"