repolix-0.1.0.tar.gz

repolix-0.1.0/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Patrick Chung
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
20
+ BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
21
+ ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
22
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
23
+ SOFTWARE.
@@ -0,0 +1,4 @@
1
+ include README.md
2
+ include LICENSE
3
+ recursive-include frontend/dist *
4
+ recursive-include codesight *.py
repolix-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,242 @@
1
+ Metadata-Version: 2.2
2
+ Name: repolix
3
+ Version: 0.1.0
4
+ Summary: Local-first codebase context engine — ask plain English questions about any Python codebase
5
+ Author: Patrick Chung
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/TheAsianFish/repolix
8
+ Project-URL: Issues, https://github.com/TheAsianFish/repolix/issues
9
+ Keywords: codebase,search,embeddings,RAG,developer-tools,AST,code-search
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Programming Language :: Python :: 3.13
16
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
17
+ Classifier: Topic :: Utilities
18
+ Requires-Python: >=3.11
19
+ Description-Content-Type: text/markdown
20
+ License-File: LICENSE
21
+ Requires-Dist: tree-sitter>=0.21
22
+ Requires-Dist: tree-sitter-python>=0.21
23
+ Requires-Dist: openai>=1.0
24
+ Requires-Dist: chromadb>=0.4
25
+ Requires-Dist: fastapi>=0.110
26
+ Requires-Dist: uvicorn>=0.29
27
+ Requires-Dist: click>=8.1
28
+ Requires-Dist: python-dotenv>=1.0
29
+ Requires-Dist: tiktoken>=0.7
30
+ Provides-Extra: dev
31
+ Requires-Dist: pytest>=8.0; extra == "dev"
32
+ Requires-Dist: pytest-cov>=5.0; extra == "dev"
33
+ Requires-Dist: ruff>=0.4; extra == "dev"
34
+ Requires-Dist: httpx>=0.27; extra == "dev"
35
+
36
+ # codesight
37
+
38
+ **Ask plain English questions about any Python codebase. Get answers
39
+ with exact file and line citations. Runs entirely on your machine.**
40
+ ```bash
41
+ codesight index ./myrepo
42
+ codesight query "how does authentication work"
43
+ ```
44
+ Searching...
45
+ Generating answer...
46
+ ── Answer ────────────────────────────────────────────────
47
+ authenticate_user() validates credentials by calling validate_token()
48
+ [1] which checks expiry and signature. On success it creates a
49
+ session via SessionService.create() [2].
50
+ ── Citations ─────────────────────────────────────────────
51
+ [1] auth/validators.py:14-28 (validate_token)
52
+ [2] auth/session.py:45-67 (SessionService.create)
53
+
54
+ [confidence: high · 5 chunks · index: ./myrepo/.codesight]
55
+
56
+ Your code never leaves your machine. No server. No accounts beyond
57
+ an OpenAI API key.
58
+
59
+ ---
60
+
61
+ ## Why codesight
62
+
63
+ Getting dropped into an unfamiliar codebase is painful. Documentation
64
+ is outdated. Grep finds strings, not meaning. LLM chatbots hallucinate
65
+ file names and function signatures because they have no access to your
66
+ actual code.
67
+
68
+ codesight indexes your code locally using AST-based chunking — every
69
+ retrieved chunk is a complete function or class, never an arbitrary
70
+ line slice. It runs entirely on your machine.
71
+
72
+ ---
73
+
74
+ ## How it works
75
+
76
+ **1. AST chunking**
77
+ Tree-sitter parses each file into a syntax tree. codesight splits only
78
+ at function and class boundaries. Every chunk is semantically complete.
79
+ Methods are tracked with their parent class for disambiguation.
80
+
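As a rough illustration of the chunking step described above, the sketch below splits a module at top-level function, class, and method boundaries. It is a stand-in under stated assumptions: it uses Python's standard-library `ast` module rather than Tree-sitter, and the `Chunk` dataclass and `chunk_source` function are hypothetical names, not codesight's API.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str        # qualified name, e.g. "SessionService.create" (hypothetical field)
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    source: str


def chunk_source(source: str) -> list[Chunk]:
    """Return one chunk per top-level function, class, and method."""
    tree = ast.parse(source)
    chunks: list[Chunk] = []

    def add(node, qualified_name: str) -> None:
        text = ast.get_source_segment(source, node) or ""
        chunks.append(Chunk(qualified_name, node.lineno, node.end_lineno, text))

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            add(node, node.name)
        elif isinstance(node, ast.ClassDef):
            add(node, node.name)
            # Methods carry their parent class in the name for disambiguation.
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    add(child, f"{node.name}.{child.name}")
    return chunks
```

Nested functions are never split out in this sketch, so they stay inside their parent's chunk.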
81
+ **2. Hybrid search**
82
+ Queries run against OpenAI embeddings (vector search) and exact token
83
+ matching (keyword search) simultaneously. Results are merged using
84
+ Reciprocal Rank Fusion — a ranking algorithm that rewards consistency
85
+ across search methods over dominance in just one.
86
+
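Reciprocal Rank Fusion, as commonly defined, scores each item by summing `1 / (k + rank)` over every ranked list in which it appears. A minimal sketch follows; the function name and the constant `k = 60` (a widely used default) are assumptions, since the README does not state what codesight uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs; each input list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

For example, fusing a vector ranking `["a", "b", "c"]` with a keyword ranking `["b", "d", "a"]` yields `["b", "a", "d", "c"]`: `b` and `a` appear in both lists, so they outrank items found by only one method.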
87
+ **3. Call graph expansion**
88
+ After initial retrieval, codesight inspects each retrieved chunk's
89
+ call graph and fetches called functions that did not rank highly
90
+ enough on their own. This surfaces implementation details that live
91
+ one function call away from the entry point.
92
+
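One way to picture the expansion step above is a single-hop lookup in a caller-to-callee map. The sketch below is illustrative only: `expand_with_callees`, the `call_graph` dictionary shape, and the `max_extra` limit are hypothetical and not taken from the package.

```python
def expand_with_callees(
    retrieved: list[str],
    call_graph: dict[str, list[str]],
    max_extra: int = 3,
) -> list[str]:
    """Append direct callees of retrieved chunks that were not retrieved themselves."""
    seen = set(retrieved)
    extras: list[str] = []
    # Only the originally retrieved chunks are inspected, so expansion stays one hop deep.
    for chunk_id in retrieved:
        for callee in call_graph.get(chunk_id, []):
            if callee not in seen and len(extras) < max_extra:
                seen.add(callee)
                extras.append(callee)
    return retrieved + extras
```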
93
+ **4. Metadata re-ranking**
94
+ Retrieved chunks are re-ranked using function names, file paths,
95
+ docstrings, and call graph signals before being sent to the LLM.
96
+
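The README does not say how the metadata signals are weighted, so the following is only a sketch of the idea: add a small boost to each chunk's retrieval score for query terms found in its name, path, or docstring. The `Candidate` fields, the `rerank` function, and every weight below are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    path: str
    docstring: str
    base_score: float            # score from the retrieval stage
    via_call_graph: bool = False # True if added by call graph expansion


def _terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    query_terms = _terms(query)

    def score(c: Candidate) -> float:
        # Illustrative weights: name matches count most, then path and docstring.
        boost = 0.3 * len(query_terms & _terms(c.name))
        boost += 0.1 * len(query_terms & _terms(c.path))
        boost += 0.1 * len(query_terms & _terms(c.docstring))
        if c.via_call_graph:
            boost += 0.05
        return c.base_score + boost

    return sorted(candidates, key=score, reverse=True)
```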
97
+ **5. Cited answers**
98
+ The top chunks go to gpt-5.4-mini with instructions to synthesize
99
+ across all chunks and cite every claim. Citations map back to exact
100
+ file paths and line numbers.
101
+
102
+ ---
103
+
104
+ ## Quickstart
105
+
106
+ ### Requirements
107
+
108
+ - Python 3.11+
109
+ - OpenAI API key ([get one here](https://platform.openai.com/api-keys))
110
+
111
+ > Node.js is **not required** for end users. The web UI is bundled
112
+ > inside the package and served directly by FastAPI.
113
+
114
+ ### Install from PyPI
115
+ ```bash
116
+ pip install repolix
117
+ ```
118
+
119
+ Set your API key:
120
+ ```bash
121
+ export OPENAI_API_KEY=sk-your-key-here
122
+ # or add it to a .env file in your working directory
123
+ ```
124
+
125
+ ### Install from source (development)
126
+ ```bash
127
+ git clone https://github.com/TheAsianFish/repolix
128
+ cd repolix
129
+ python -m venv .venv
130
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
131
+ pip install -e ".[dev]"
132
+ cp .env.example .env
133
+ # Edit .env and add your OPENAI_API_KEY
134
+ ```
135
+
136
+ ### CLI
137
+ ```bash
138
+ # Index a repository (~$0.02 per 30k lines, one-time)
139
+ codesight index ./path/to/repo
140
+
141
+ # Ask a question
142
+ codesight query "how does authentication work"
143
+
144
+ # See raw retrieved chunks without an LLM call
145
+ codesight query "where is UserService defined" --no-llm
146
+
147
+ # Force re-index all files after a major refactor
148
+ codesight index ./path/to/repo --force
149
+ ```
150
+
151
+ ### Web UI
152
+ ```bash
153
+ # Start the server (the React UI is bundled — no npm needed)
154
+ uvicorn codesight.api:app --port 8000
155
+ # Open http://localhost:8000
156
+ ```
157
+
158
+ **For frontend development** (hot reload via Vite):
159
+ ```bash
160
+ # Requires Node.js 18+
161
+ cd frontend && npm install && cd ..
162
+ bash start.sh
163
+ # Backend: http://localhost:8000 Frontend: http://localhost:3000
164
+ ```
165
+
166
+ ---
167
+
168
+ ## Cost
169
+
170
+ | Action | Cost |
171
+ |---|---|
172
+ | Index a 30k-line repo | ~$0.02 (one-time) |
173
+ | Re-index after small change | ~$0.001 (changed files only) |
174
+ | Each query | ~$0.001 |
175
+
176
+ Incremental indexing means re-indexing after a small change costs
177
+ almost nothing — only changed files are re-embedded.
178
+
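Re-embedding only changed files requires some way to detect change. The README does not describe codesight's mechanism; a common approach, sketched here as an assumption, is to compare a content hash per file against the hashes saved by the previous run. `file_digest` and `files_to_reindex` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_reindex(repo: Path, previous: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return (new or changed *.py files, updated digest map) for the repo."""
    current = {
        str(p.relative_to(repo)): file_digest(p)
        for p in sorted(repo.rglob("*.py"))
    }
    # A file needs re-embedding if it is new or its digest differs from the last run.
    changed = [rel for rel, digest in current.items() if previous.get(rel) != digest]
    return changed, current
```

The returned digest map would be persisted next to the index and passed back in as `previous` on the next run.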
179
+ ---
180
+
181
+ ## Stack
182
+
183
+ | Layer | Choice |
184
+ |---|---|
185
+ | AST parsing | Tree-sitter |
186
+ | Embeddings | text-embedding-3-small |
187
+ | Vector store | ChromaDB (local, no server needed) |
188
+ | LLM | gpt-5.4-mini |
189
+ | Backend | FastAPI |
190
+ | Frontend | React + TypeScript |
191
+ | CLI | Click |
192
+
193
+ ---
194
+
195
+ ## Output
196
+
197
+ Each query produces:
198
+
199
+ - A prose answer with inline citations (`[1]`, `[2]`, etc.).
200
+ - A citations section with exact file paths and line ranges.
201
+ Citations marked `[truncated]` mean the function exceeded the
202
+ 300-token chunk cap — the answer is based on a partial view of
203
+ that function.
204
+ - A confidence label (`high` / `medium` / `low`) derived from how
205
+ strongly the retrieved chunks matched the query across function
206
+ names, file paths, docstrings, and call graph signals.
207
+
208
+ ---
209
+
210
+ ## Limitations
211
+
212
+ - Python repos only. TypeScript support planned for V2.
213
+ - Best on repos up to ~30k lines.
214
+ - Deeply nested functions are included in their parent chunk.
215
+ - Large functions (>300 tokens) are truncated at the chunk cap.
216
+ The `[truncated]` marker in citations flags when this occurs.
217
+ - Complex cross-file reasoning may require rephrasing the query.
218
+ - Architecture-level questions (layer structure, dependency graphs)
219
+ require the V2 dependency graph feature to answer reliably.
220
+
221
+ ---
222
+
223
+ ## Roadmap
224
+
225
+ **V2** — TypeScript support, VS Code extension, dependency graph
226
+
227
+ **V3** — GitHub webhook re-indexing, multi-repo, Slack bot
228
+
229
+ ---
230
+
231
+ ## Contributing
232
+
233
+ Bug reports and pull requests are welcome. Please open an issue
234
+ before submitting a large change so we can discuss the approach.
235
+
236
+ See .github/ISSUE_TEMPLATE/bug_report.md for the bug report format.
237
+
238
+ ---
239
+
240
+ ## License
241
+
242
+ MIT © 2026 Patrick Chung
@@ -0,0 +1,207 @@
1
+ # codesight
2
+
3
+ **Ask plain English questions about any Python codebase. Get answers
4
+ with exact file and line citations. Runs entirely on your machine.**
5
+ ```bash
6
+ codesight index ./myrepo
7
+ codesight query "how does authentication work"
8
+ ```
9
+ Searching...
10
+ Generating answer...
11
+ ── Answer ────────────────────────────────────────────────
12
+ authenticate_user() validates credentials by calling validate_token()
13
+ [1] which checks expiry and signature. On success it creates a
14
+ session via SessionService.create() [2].
15
+ ── Citations ─────────────────────────────────────────────
16
+ [1] auth/validators.py:14-28 (validate_token)
17
+ [2] auth/session.py:45-67 (SessionService.create)
18
+
19
+ [confidence: high · 5 chunks · index: ./myrepo/.codesight]
20
+
21
+ Your code never leaves your machine. No server. No accounts beyond
22
+ an OpenAI API key.
23
+
24
+ ---
25
+
26
+ ## Why codesight
27
+
28
+ Getting dropped into an unfamiliar codebase is painful. Documentation
29
+ is outdated. Grep finds strings, not meaning. LLM chatbots hallucinate
30
+ file names and function signatures because they have no access to your
31
+ actual code.
32
+
33
+ codesight indexes your code locally using AST-based chunking — every
34
+ retrieved chunk is a complete function or class, never an arbitrary
35
+ line slice. It runs entirely on your machine.
36
+
37
+ ---
38
+
39
+ ## How it works
40
+
41
+ **1. AST chunking**
42
+ Tree-sitter parses each file into a syntax tree. codesight splits only
43
+ at function and class boundaries. Every chunk is semantically complete.
44
+ Methods are tracked with their parent class for disambiguation.
45
+
46
+ **2. Hybrid search**
47
+ Queries run against OpenAI embeddings (vector search) and exact token
48
+ matching (keyword search) simultaneously. Results are merged using
49
+ Reciprocal Rank Fusion — a ranking algorithm that rewards consistency
50
+ across search methods over dominance in just one.
51
+
52
+ **3. Call graph expansion**
53
+ After initial retrieval, codesight inspects each retrieved chunk's
54
+ call graph and fetches called functions that did not rank highly
55
+ enough on their own. This surfaces implementation details that live
56
+ one function call away from the entry point.
57
+
58
+ **4. Metadata re-ranking**
59
+ Retrieved chunks are re-ranked using function names, file paths,
60
+ docstrings, and call graph signals before being sent to the LLM.
61
+
62
+ **5. Cited answers**
63
+ The top chunks go to gpt-5.4-mini with instructions to synthesize
64
+ across all chunks and cite every claim. Citations map back to exact
65
+ file paths and line numbers.
66
+
67
+ ---
68
+
69
+ ## Quickstart
70
+
71
+ ### Requirements
72
+
73
+ - Python 3.11+
74
+ - OpenAI API key ([get one here](https://platform.openai.com/api-keys))
75
+
76
+ > Node.js is **not required** for end users. The web UI is bundled
77
+ > inside the package and served directly by FastAPI.
78
+
79
+ ### Install from PyPI
80
+ ```bash
81
+ pip install repolix
82
+ ```
83
+
84
+ Set your API key:
85
+ ```bash
86
+ export OPENAI_API_KEY=sk-your-key-here
87
+ # or add it to a .env file in your working directory
88
+ ```
89
+
90
+ ### Install from source (development)
91
+ ```bash
92
+ git clone https://github.com/TheAsianFish/repolix
93
+ cd repolix
94
+ python -m venv .venv
95
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
96
+ pip install -e ".[dev]"
97
+ cp .env.example .env
98
+ # Edit .env and add your OPENAI_API_KEY
99
+ ```
100
+
101
+ ### CLI
102
+ ```bash
103
+ # Index a repository (~$0.02 per 30k lines, one-time)
104
+ codesight index ./path/to/repo
105
+
106
+ # Ask a question
107
+ codesight query "how does authentication work"
108
+
109
+ # See raw retrieved chunks without an LLM call
110
+ codesight query "where is UserService defined" --no-llm
111
+
112
+ # Force re-index all files after a major refactor
113
+ codesight index ./path/to/repo --force
114
+ ```
115
+
116
+ ### Web UI
117
+ ```bash
118
+ # Start the server (the React UI is bundled — no npm needed)
119
+ uvicorn codesight.api:app --port 8000
120
+ # Open http://localhost:8000
121
+ ```
122
+
123
+ **For frontend development** (hot reload via Vite):
124
+ ```bash
125
+ # Requires Node.js 18+
126
+ cd frontend && npm install && cd ..
127
+ bash start.sh
128
+ # Backend: http://localhost:8000 Frontend: http://localhost:3000
129
+ ```
130
+
131
+ ---
132
+
133
+ ## Cost
134
+
135
+ | Action | Cost |
136
+ |---|---|
137
+ | Index a 30k-line repo | ~$0.02 (one-time) |
138
+ | Re-index after small change | ~$0.001 (changed files only) |
139
+ | Each query | ~$0.001 |
140
+
141
+ Incremental indexing means re-indexing after a small change costs
142
+ almost nothing — only changed files are re-embedded.
143
+
144
+ ---
145
+
146
+ ## Stack
147
+
148
+ | Layer | Choice |
149
+ |---|---|
150
+ | AST parsing | Tree-sitter |
151
+ | Embeddings | text-embedding-3-small |
152
+ | Vector store | ChromaDB (local, no server needed) |
153
+ | LLM | gpt-5.4-mini |
154
+ | Backend | FastAPI |
155
+ | Frontend | React + TypeScript |
156
+ | CLI | Click |
157
+
158
+ ---
159
+
160
+ ## Output
161
+
162
+ Each query produces:
163
+
164
+ - A prose answer with inline citations (`[1]`, `[2]`, etc.).
165
+ - A citations section with exact file paths and line ranges.
166
+ Citations marked `[truncated]` mean the function exceeded the
167
+ 300-token chunk cap — the answer is based on a partial view of
168
+ that function.
169
+ - A confidence label (`high` / `medium` / `low`) derived from how
170
+ strongly the retrieved chunks matched the query across function
171
+ names, file paths, docstrings, and call graph signals.
172
+
173
+ ---
174
+
175
+ ## Limitations
176
+
177
+ - Python repos only. TypeScript support planned for V2.
178
+ - Best on repos up to ~30k lines.
179
+ - Deeply nested functions are included in their parent chunk.
180
+ - Large functions (>300 tokens) are truncated at the chunk cap.
181
+ The `[truncated]` marker in citations flags when this occurs.
182
+ - Complex cross-file reasoning may require rephrasing the query.
183
+ - Architecture-level questions (layer structure, dependency graphs)
184
+ require the V2 dependency graph feature to answer reliably.
185
+
186
+ ---
187
+
188
+ ## Roadmap
189
+
190
+ **V2** — TypeScript support, VS Code extension, dependency graph
191
+
192
+ **V3** — GitHub webhook re-indexing, multi-repo, Slack bot
193
+
194
+ ---
195
+
196
+ ## Contributing
197
+
198
+ Bug reports and pull requests are welcome. Please open an issue
199
+ before submitting a large change so we can discuss the approach.
200
+
201
+ See .github/ISSUE_TEMPLATE/bug_report.md for the bug report format.
202
+
203
+ ---
204
+
205
+ ## License
206
+
207
+ MIT © 2026 Patrick Chung
@@ -0,0 +1,5 @@
1
+ """
2
+ codesight — local-first codebase context engine.
3
+ """
4
+
5
+ __version__ = "0.1.0"