codexa 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +22 -0
- package/README.md +916 -0
- package/bin/codexa.js +2 -0
- package/dist/agent.js +64 -0
- package/dist/chunker.js +50 -0
- package/dist/cli.js +95 -0
- package/dist/config.js +97 -0
- package/dist/db.js +140 -0
- package/dist/embeddings/index.js +148 -0
- package/dist/ingest.js +69 -0
- package/dist/models/index.js +130 -0
- package/dist/retriever.js +22 -0
- package/dist/types.js +2 -0
- package/dist/utils/logger.js +13 -0
- package/package.json +79 -0
package/README.md
ADDED
@@ -0,0 +1,916 @@

<div align="center">
<h1>Codexa</h1>

<img width="1536" height="1024" alt="Image" src="https://github.com/user-attachments/assets/9d347801-9e39-494b-8645-17c0804223e3" />

<p>
<strong>A powerful CLI tool that ingests your codebase and allows you to ask questions about it using Retrieval-Augmented Generation (RAG).</strong>
</p>

<p>
<a href="https://www.npmjs.com/package/codexa"><img src="https://img.shields.io/npm/v/codexa?style=flat-square" alt="npm version"></a>
<a href="https://github.com/sahitya-chandra/codexa/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="License"></a>
<a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/language-TypeScript-blue.svg?style=flat-square" alt="TypeScript"></a>
<img src="https://img.shields.io/badge/node-%3E%3D20.0.0-brightgreen.svg?style=flat-square" alt="Node.js version">
</p>

<p>
<a href="#installation">Installation</a> •
<a href="#quick-start">Quick Start</a> •
<a href="#commands">Commands</a> •
<a href="#configuration">Configuration</a> •
<a href="#examples">Examples</a>
</p>
</div>

---

## Table of Contents

- [Features](#features)
- [Installation](#installation)
  - [Prerequisites](#prerequisites)
  - [Installation Methods](#installation-methods)
  - [Updating Codexa](#updating-codexa)
  - [LLM Setup](#llm-setup)
- [Quick Start](#quick-start)
- [Commands](#commands)
- [Configuration](#configuration)
- [Examples](#examples)
- [How It Works](#how-it-works)
- [Architecture](#architecture)
- [Troubleshooting](#troubleshooting)
- [FAQ](#faq)
- [Contributing](#contributing)
- [License](#license)

## Features

- 🔒 **Privacy-First**: All data processing happens locally by default
- ⚡ **Fast & Efficient**: Local embeddings and optimized vector search
- 🤖 **Multiple LLM Support**: Works with Ollama (local) and Groq (cloud)
- 💾 **Local Storage**: SQLite database for embeddings and context
- 🎯 **Smart Chunking**: Intelligent code splitting with configurable overlap
- 🔄 **Session Management**: Maintain conversation context across queries
- 📊 **Streaming Output**: Real-time response streaming for better UX
- 🎨 **Multiple File Types**: Supports TypeScript, JavaScript, Python, Go, Rust, Java, and more
- ⚙️ **Highly Configurable**: Fine-tune chunking, retrieval, and model parameters
- 🚀 **Zero Setup**: Works out of the box with sensible defaults

## Installation

### Prerequisites

Before installing Codexa, ensure you have the following:

- **Node.js**: v20.0.0 or higher
  ```bash
  node --version # Should be v20.0.0 or higher
  ```

- **For Local LLM (Ollama)**: [Ollama](https://ollama.com/) must be installed
- **For Cloud LLM (Groq)**: A Groq API key from [console.groq.com](https://console.groq.com/)

### Installation Methods

Choose the installation method that works best for your system:

#### Method 1: npm (Recommended)

Install Codexa globally using npm:

```bash
npm install -g codexa
```

Verify installation:

```bash
codexa --version
```

#### Method 2: Homebrew (macOS)

Install Codexa using Homebrew on macOS:

First, add the tap:
```bash
brew tap sahitya-chandra/codexa
```

Then install:
```bash
brew install codexa
```

### Updating Codexa

To update Codexa to the latest version:

**If installed via npm:**
```bash
npm install -g codexa@latest
```

**If installed via Homebrew:**
```bash
brew upgrade codexa
```

**Check your current version:**
```bash
codexa --version
```

**Check for updates:**
- Visit the [npm package page](https://www.npmjs.com/package/codexa) to see the latest version
- Or check the [GitHub releases](https://github.com/sahitya-chandra/codexa/releases)

> 💡 **Tip:** Keep Codexa updated to get the latest features, bug fixes, and security updates.

### LLM Setup

Codexa requires an LLM to generate answers. You can use either Groq (cloud) or Ollama (local); Groq is recommended for its speed and reliability.

#### Option 1: Using Groq (Cloud - Recommended)

Groq provides fast cloud-based LLMs with a generous free tier and is the recommended option for most users.

**Step 1: Get a Groq API Key**

1. Visit [console.groq.com](https://console.groq.com/)
2. Sign up or log in
3. Navigate to the API Keys section
4. Create a new API key
5. Copy your API key (starts with `gsk_`)

**Step 2: Set Environment Variable**

**macOS/Linux:**
```bash
# Add to your shell profile (~/.zshrc, ~/.bashrc, etc.)
export GROQ_API_KEY="gsk_your_api_key_here"

# Reload your shell or run:
source ~/.zshrc # or ~/.bashrc
```

**Windows (PowerShell):**
```powershell
$env:GROQ_API_KEY="gsk_your_api_key_here"

# Or add permanently:
[System.Environment]::SetEnvironmentVariable('GROQ_API_KEY', 'gsk_your_api_key_here', 'User')
```

**Windows (Command Prompt):**
```cmd
setx GROQ_API_KEY "gsk_your_api_key_here"
```

**Step 3: Verify API Key is Set**

```bash
echo $GROQ_API_KEY   # macOS/Linux
echo %GROQ_API_KEY%  # Windows CMD
```

**Step 4: Configure Codexa**

Codexa defaults to using Groq when you run `codexa init`. If you need to configure it manually, edit `.codexarc.json`:

```json
{
  "modelProvider": "groq",
  "model": "llama-3.1-8b-instant",
  "embeddingProvider": "local",
  "embeddingModel": "Xenova/all-MiniLM-L6-v2"
}
```

**Available Groq Models:**
- `llama-3.1-8b-instant` - Fast responses (recommended, default)
- `llama-3.1-70b-versatile` - Higher quality, slower

#### Option 2: Using Ollama (Local - Alternative)

Ollama runs LLMs locally on your machine, keeping your code completely private. This is an alternative option if you prefer local processing.

> ⚠️ **Note:** Models with more than 3 billion parameters may not work reliably with a local Ollama setup. We recommend 3B parameter models for best compatibility, or use Groq (Option 1) for better reliability.

**Step 1: Install Ollama**

- **macOS/Linux**: Visit [ollama.com](https://ollama.com/) and follow the installation instructions
- **Or use Homebrew on macOS**:
  ```bash
  brew install ollama
  ```

**Step 2: Start Ollama Service**

```bash
# Start Ollama (usually starts automatically after installation)
ollama serve

# Verify Ollama is running
curl http://localhost:11434/api/tags
```

**Step 3: Download a Model**

Pull a model that Codexa can use:

```bash
# Recommended: fast and lightweight - 3B parameters
ollama pull qwen2.5:3b-instruct

# Alternative small options:
ollama pull qwen2.5:1.5b-instruct  # Even faster, smaller
ollama pull phi3:mini              # Microsoft Phi-3 Mini

# ⚠️ Note: Larger models (8B+ like llama3:8b, mistral:7b) may not work locally.
# If you encounter issues, try a 3B model instead, or switch to Groq.
```

**Step 4: Verify the Model is Available**

```bash
ollama list
```

You should see your downloaded model in the list.

**Step 5: Configure Codexa**

Edit `.codexarc.json` after running `codexa init`:

```json
{
  "modelProvider": "local",
  "model": "qwen2.5:3b-instruct",
  "localModelUrl": "http://localhost:11434"
}
```

#### Quick Setup Summary

**For Groq (Recommended):**
```bash
# 1. Get an API key from console.groq.com
# 2. Set the environment variable
export GROQ_API_KEY="gsk_your_key"

# 3. Run codexa init (defaults to Groq)
codexa init

# 4. Ready to use!
```

**For Ollama (Alternative):**
```bash
# 1. Install Ollama
brew install ollama  # macOS
# or visit ollama.com for other platforms

# 2. Start Ollama
ollama serve

# 3. Pull a model (use 3B models only)
ollama pull qwen2.5:3b-instruct

# 4. Run codexa init, then edit .codexarc.json to set "modelProvider": "local"
codexa init
```

## Quick Start

Once Codexa is installed and your LLM is configured, you're ready to use it:

1. **Navigate to your project directory:**
   ```bash
   cd /path/to/your/project
   ```

2. **Initialize Codexa:**
   ```bash
   codexa init
   ```
   This creates a `.codexarc.json` configuration file with sensible defaults.

3. **Ingest your codebase:**
   ```bash
   codexa ingest
   ```
   This indexes your codebase and creates embeddings. First run may take a few minutes.

4. **Ask questions:**
   ```bash
   codexa ask "How does the authentication flow work?"
   codexa ask "What is the main entry point of this application?"
   codexa ask "Show me how error handling is implemented"
   ```

## Commands

### `init`

Creates a `.codexarc.json` configuration file in the current directory with default settings.

```bash
codexa init
```

**What it does:**
- Creates `.codexarc.json` in the project root
- Uses sensible defaults for all configuration options
- Can be safely run multiple times (won't overwrite existing config)

---

### `ingest`

Indexes the codebase and generates embeddings for semantic search.

```bash
codexa ingest [options]
```

**Options:**
- `-f, --force` - Clear existing index and rebuild from scratch

**Examples:**
```bash
# Standard ingestion
codexa ingest

# Force rebuild (useful if you've updated code significantly)
codexa ingest --force
```

**What it does:**
1. Scans your repository based on `includeGlobs` and `excludeGlobs` patterns
2. Chunks files into manageable segments
3. Generates vector embeddings for each chunk
4. Stores everything in `.codexa/index.db` (SQLite database)

**Note:** First ingestion may take a few minutes depending on your codebase size. Subsequent ingestions are faster as they only process changed files.

---

### `ask`

Ask natural language questions about your codebase.

```bash
codexa ask <question...> [options]
```

**Arguments:**
- `<question...>` - Your question (can be multiple words)

**Options:**
- `-s, --session <name>` - Session identifier to maintain conversation context (default: `"default"`)
- `--no-stream` - Disable streaming output (show full response at once)

**Examples:**
```bash
# Basic question
codexa ask "How does user authentication work?"

# Question with multiple words
codexa ask "What is the main entry point of this application?"

# Use a specific session for context
codexa ask "How does the login function work?" --session my-analysis

# Disable streaming
codexa ask "Summarize the codebase structure" --no-stream

# Follow-up question in the same session
codexa ask "Can you explain that in more detail?" --session my-analysis
```

**How it works** (a retrieval sketch follows this list):
1. Converts your question to a vector embedding
2. Searches the codebase for relevant chunks using vector similarity
3. Retrieves the top-K most relevant code sections
4. Sends question + context to the LLM
5. Returns a contextual answer about your codebase
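
The vector-similarity step in this flow is ordinary cosine similarity over the stored chunk embeddings. The sketch below is illustrative only (the function and type names are not part of Codexa's API); it shows how top-K retrieval over embedded chunks typically works.

```typescript
// Illustrative sketch of cosine-similarity top-K retrieval; not Codexa's actual code.
interface ScoredChunk {
  file: string;
  text: string;
  score: number;
}

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Score every stored chunk against the question embedding and keep the top K.
function retrieveTopK(
  questionEmbedding: number[],
  chunks: { file: string; text: string; embedding: number[] }[],
  topK: number,
): ScoredChunk[] {
  return chunks
    .map((c) => ({ file: c.file, text: c.text, score: cosineSimilarity(questionEmbedding, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The `topK` configuration option described below is the `topK` argument in this kind of ranking step.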

## Configuration

### Configuration File

Codexa uses a `.codexarc.json` file in your project root for configuration. This file is automatically created when you run `codexa init`.

**Location:** `.codexarc.json` (project root)

**Format:** JSON

### Environment Variables

Some settings can be configured via environment variables:

| Variable | Description | Required For |
|----------|-------------|--------------|
| `GROQ_API_KEY` | Groq API key for cloud LLM | Groq provider |
| `OPENAI_API_KEY` | OpenAI API key (for embeddings) | OpenAI embeddings |

**Example:**
```bash
export GROQ_API_KEY="gsk_your_key_here"
export OPENAI_API_KEY="sk-your_key_here" # If using OpenAI embeddings
```

### Configuration Options

#### `modelProvider`

**Type:** `"local" | "groq"`
**Default:** `"groq"` (recommended)

The LLM provider to use for generating answers.

- `"groq"` - Uses Groq's cloud API (recommended, requires `GROQ_API_KEY`)
- `"local"` - Uses Ollama running on your machine (alternative option)

#### `model`

**Type:** `string`
**Default:** `"llama-3.1-8b-instant"` (groq, recommended) or `"qwen2.5:3b-instruct"` (local)

The model identifier to use.

**Common Groq Models (Recommended):**
- `llama-3.1-8b-instant` - Fast responses (default, recommended)
- `llama-3.1-70b-versatile` - Higher quality, slower

**Common Local Models (Alternative):**
- `qwen2.5:3b-instruct` - Fast, lightweight - **3B parameters**
- `qwen2.5:1.5b-instruct` - Even faster, smaller - **1.5B parameters**
- `phi3:mini` - Microsoft Phi-3 Mini - **3.8B parameters**

> ⚠️ **Warning:** Models with more than 3 billion parameters (like `llama3:8b`, `mistral:7b`) may not work reliably with a local Ollama setup. If you encounter issues, please try a 3B parameter model instead, or switch to Groq.

#### `localModelUrl`

**Type:** `string`
**Default:** `"http://localhost:11434"`

Base URL for your local Ollama instance. Change this if Ollama runs on a different host or port.

#### `embeddingProvider`

**Type:** `"local"`
**Default:** `"local"`

The embedding provider for vector search.

- `"local"` - Uses `@xenova/transformers` (runs entirely locally)

#### `embeddingModel`

**Type:** `string`
**Default:** `"Xenova/all-MiniLM-L6-v2"`

The embedding model for generating vector representations. This model is downloaded automatically on first use.
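
For reference, this is roughly how the default local embedding model can be driven with the `@xenova/transformers` feature-extraction pipeline. It is a minimal sketch, not Codexa's internal code, and the `embedText` helper name is made up for illustration.

```typescript
// Minimal sketch: local embeddings with @xenova/transformers (illustrative only).
import { pipeline } from '@xenova/transformers';

// Downloads (and caches) the model on first use, then returns a normalized vector.
async function embedText(text: string): Promise<number[]> {
  const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}

// Example: embed a code chunk before storing or searching it.
embedText('export function add(a: number, b: number) { return a + b; }')
  .then((vector) => console.log(`embedding length: ${vector.length}`)); // 384 dims for all-MiniLM-L6-v2
```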

#### `maxChunkSize`

**Type:** `number`
**Default:** `200`

Maximum number of lines per code chunk. Larger values = more context per chunk but fewer chunks.

#### `chunkOverlap`

**Type:** `number`
**Default:** `20`

Number of lines to overlap between consecutive chunks. Helps maintain context at chunk boundaries.
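
To make the interaction between these two settings concrete, here is a hedged sketch of line-based chunking with overlap. It is illustrative only, and the real chunker may split differently, but it shows why a 200-line chunk size with a 20-line overlap repeats the last 20 lines of each chunk at the start of the next.

```typescript
// Illustrative line-based chunker (not Codexa's actual implementation).
function chunkLines(source: string, maxChunkSize = 200, chunkOverlap = 20): string[] {
  const lines = source.split('\n');
  const chunks: string[] = [];
  const step = Math.max(1, maxChunkSize - chunkOverlap); // advance by size minus overlap

  for (let start = 0; start < lines.length; start += step) {
    chunks.push(lines.slice(start, start + maxChunkSize).join('\n'));
    if (start + maxChunkSize >= lines.length) break; // last chunk reached
  }
  return chunks;
}

// With the defaults, a 500-line file yields chunks starting at lines 0, 180, and 360.
```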

#### `includeGlobs`

**Type:** `string[]`
**Default:** `["**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.py", "**/*.go", "**/*.rs", "**/*.java", "**/*.md", "**/*.json"]`

File patterns to include in indexing. Supports glob patterns.

**Examples:**
```json
{
  "includeGlobs": [
    "**/*.ts",
    "**/*.tsx",
    "src/**/*.js",
    "lib/**/*.py"
  ]
}
```

#### `excludeGlobs`

**Type:** `string[]`
**Default:** `["node_modules/**", ".git/**", "dist/**", "build/**", ".codexa/**", "package-lock.json"]`

File patterns to exclude from indexing.

**Examples:**
```json
{
  "excludeGlobs": [
    "node_modules/**",
    ".git/**",
    "dist/**",
    "**/*.test.ts",
    "coverage/**"
  ]
}
```

#### `historyDir`

**Type:** `string`
**Default:** `".codexa/sessions"`

Directory to store conversation history for session management.
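
As an illustration of how per-session context can be persisted under this directory, the sketch below appends each question/answer turn to a JSON file named after the session. The file layout and the `appendTurn` helper are assumptions made for illustration, not Codexa's documented format.

```typescript
// Hypothetical session-history helper (file layout assumed, not Codexa's documented format).
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

interface Turn {
  role: 'user' | 'assistant';
  content: string;
}

function appendTurn(historyDir: string, session: string, turn: Turn): Turn[] {
  mkdirSync(historyDir, { recursive: true });
  const file = join(historyDir, `${session}.json`);
  const history: Turn[] = existsSync(file) ? JSON.parse(readFileSync(file, 'utf8')) : [];
  history.push(turn);
  writeFileSync(file, JSON.stringify(history, null, 2));
  return history; // prior turns can be prepended to the next prompt for follow-up questions
}

// e.g. appendTurn('.codexa/sessions', 'my-analysis', { role: 'user', content: 'How does login work?' });
```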

#### `dbPath`

**Type:** `string`
**Default:** `".codexa/index.db"`

Path to the SQLite database storing code chunks and embeddings.

#### `temperature`

**Type:** `number`
**Default:** `0.2`

Controls randomness in LLM responses (0.0 = deterministic, 1.0 = creative).

- Lower values (0.0-0.3): More focused, deterministic answers
- Higher values (0.7-1.0): More creative, varied responses

#### `topK`

**Type:** `number`
**Default:** `4`

Number of code chunks to retrieve and use as context for each question. Higher values provide more context but may include less relevant information.

### Example Configurations

#### Groq Cloud Provider (Recommended - Default)

```json
{
  "modelProvider": "groq",
  "model": "llama-3.1-8b-instant",
  "embeddingProvider": "local",
  "embeddingModel": "Xenova/all-MiniLM-L6-v2",
  "maxChunkSize": 300,
  "chunkOverlap": 20,
  "temperature": 0.2,
  "topK": 4
}
```

**Remember:** Set the `GROQ_API_KEY` environment variable:
```bash
export GROQ_API_KEY="your-api-key"
```

#### Local Development (Alternative)

```json
{
  "modelProvider": "local",
  "model": "qwen2.5:3b-instruct",
  "localModelUrl": "http://localhost:11434",
  "embeddingProvider": "local",
  "embeddingModel": "Xenova/all-MiniLM-L6-v2",
  "maxChunkSize": 200,
  "chunkOverlap": 20,
  "temperature": 0.2,
  "topK": 4
}
```

#### Optimized for Large Codebases

```json
{
  "modelProvider": "local",
  "model": "qwen2.5:3b-instruct",
  "maxChunkSize": 150,
  "chunkOverlap": 15,
  "topK": 6,
  "temperature": 0.1,
  "includeGlobs": [
    "src/**/*.ts",
    "src/**/*.tsx",
    "lib/**/*.ts"
  ],
  "excludeGlobs": [
    "node_modules/**",
    "dist/**",
    "**/*.test.ts",
    "**/*.spec.ts",
    "coverage/**"
  ]
}
```

## Examples

### Basic Workflow

```bash
# 1. Initialize in your project
cd my-project
codexa init

# 2. Index your codebase
codexa ingest

# 3. Ask questions
codexa ask "What is the main purpose of this codebase?"
codexa ask "How does the user authentication work?"
codexa ask "Where is the API routing configured?"
```

<!-- ### Using Sessions for Context

```bash
# Start a new analysis session
codexa ask "What does the UserService class do?" --session user-analysis

# Follow up with context from previous question
codexa ask "How does it handle errors?" --session user-analysis

# Ask about related functionality
codexa ask "Show me where it's used in the codebase" --session user-analysis
``` -->

### Force Re-indexing

```bash
# After significant code changes
codexa ingest --force
```

### Working with Specific File Types

Update `.codexarc.json` to focus on specific languages:

```json
{
  "includeGlobs": [
    "**/*.ts",
    "**/*.tsx"
  ],
  "excludeGlobs": [
    "node_modules/**",
    "**/*.test.ts",
    "**/*.spec.ts"
  ]
}
```

## How It Works

Codexa uses Retrieval-Augmented Generation (RAG) to answer questions about your codebase:

### 1. Ingestion Phase

When you run `codexa ingest` (a storage sketch follows this list):

1. **File Discovery**: Scans your repository using glob patterns (`includeGlobs`/`excludeGlobs`)
2. **Code Chunking**: Splits files into manageable chunks with configurable overlap
3. **Embedding Generation**: Creates vector embeddings for each chunk using local transformers
4. **Storage**: Stores chunks and embeddings in a SQLite database (`.codexa/index.db`)
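
A minimal sketch of the storage step, assuming a `better-sqlite3`-style API and a simple one-table schema. Both the library choice and the schema are assumptions for illustration; the actual layout of `.codexa/index.db` is internal.

```typescript
// Illustrative only: persisting chunks + embeddings to SQLite (library and schema are assumptions).
import Database from 'better-sqlite3';

const db = new Database('.codexa/index.db');
db.exec(`CREATE TABLE IF NOT EXISTS chunks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  file TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding BLOB NOT NULL -- float32 vector serialized as bytes
)`);

function storeChunk(file: string, content: string, embedding: number[]): void {
  const blob = Buffer.from(new Float32Array(embedding).buffer); // serialize the vector
  db.prepare('INSERT INTO chunks (file, content, embedding) VALUES (?, ?, ?)').run(file, content, blob);
}
```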

### 2. Query Phase

When you run `codexa ask` (a prompt-assembly sketch follows this list):

1. **Question Embedding**: Converts your question into a vector embedding
2. **Vector Search**: Finds the most similar code chunks using cosine similarity
3. **Context Retrieval**: Selects the top-K most relevant chunks as context
4. **LLM Generation**: Sends question + context to your configured LLM
5. **Response**: Returns an answer grounded in your actual codebase
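
How the retrieved chunks and the question end up in a single LLM request is the one step not shown above. The sketch below is a plausible shape for that prompt; the wording and helper names are illustrative, not Codexa's actual prompt template.

```typescript
// Illustrative prompt assembly from retrieved chunks (not Codexa's actual prompt template).
interface RetrievedChunk {
  file: string;
  text: string;
}

function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  // Label each chunk with its source file so the answer can reference it.
  const context = chunks
    .map((c, i) => `### Chunk ${i + 1} (${c.file})\n${c.text}`)
    .join('\n\n');

  return [
    'You are a coding assistant. Answer using only the provided code context.',
    `Context:\n${context}`,
    `Question: ${question}`,
  ].join('\n\n');
}

// The resulting string is what gets sent to Ollama or Groq and streamed back to the terminal.
```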

### Benefits

- **Privacy**: All processing happens locally by default
- **Speed**: Local embeddings and vector search are very fast
- **Accuracy**: Answers are based on your actual code, not generic responses
- **Context-Aware**: Understands relationships across your codebase

## Architecture

```
┌─────────────────┐
│   User Query    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│   Embedding     │────▶│   Vector     │
│   Generation    │     │   Search     │
└─────────────────┘     └──────┬───────┘
                               │
                               ▼
                        ┌──────────────┐
                        │   Context    │
                        │  Retrieval   │
                        └──────┬───────┘
                               │
                               ▼
┌─────────────────┐     ┌──────────────┐
│   SQLite DB     │◀────│     LLM      │
│   (Chunks +     │     │   (Ollama/   │
│   Embeddings)   │     │    Groq)     │
└─────────────────┘     └──────┬───────┘
                               │
                               ▼
                        ┌──────────────┐
                        │    Answer    │
                        └──────────────┘
```

**Key Components:**
- **Chunker**: Splits code files into semantic chunks
- **Embedder**: Generates vector embeddings (local transformers)
- **Retriever**: Finds relevant chunks using vector similarity
- **LLM Client**: Generates answers (Ollama local or Groq cloud)
- **Database**: SQLite for storing chunks and embeddings

## Troubleshooting

### "Ollama not reachable" Error

**Problem:** Codexa can't connect to your local Ollama instance.

**Solutions:**
1. Ensure Ollama is running:
   ```bash
   ollama serve
   ```
2. Check if Ollama is running on the default port:
   ```bash
   curl http://localhost:11434/api/tags
   ```
3. If Ollama runs on a different host/port, update `.codexarc.json`:
   ```json
   {
     "localModelUrl": "http://your-host:port"
   }
   ```

### "Model not found" Error

**Problem:** The specified Ollama model isn't available.

**Solutions:**
1. List available models:
   ```bash
   ollama list
   ```
2. Pull the required model:
   ```bash
   ollama pull qwen2.5:3b-instruct
   ```
3. Or update `.codexarc.json` to use an available model:
   ```json
   {
     "model": "your-available-model"
   }
   ```

### "GROQ_API_KEY not set" Error

**Problem:** Using the Groq provider but the API key is missing.

**Solutions:**
1. Set the environment variable:
   ```bash
   export GROQ_API_KEY="your-api-key"
   ```
2. Or add it to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.)
3. Verify it's set:
   ```bash
   echo $GROQ_API_KEY
   ```

### Ingestion is Very Slow

**Problem:** First ingestion takes too long.

**Solutions:**
1. Reduce `maxChunkSize` to create more, smaller chunks
2. Add more patterns to `excludeGlobs` to skip unnecessary files
3. Be more specific with `includeGlobs` to focus on important files
4. Use `--force` only when necessary (incremental updates are faster)

### Poor Quality Answers

**Problem:** Answers are not relevant or accurate.

**Solutions:**
1. Increase `topK` to retrieve more context:
   ```json
   {
     "topK": 6
   }
   ```
2. Adjust `temperature` for more focused answers:
   ```json
   {
     "temperature": 0.1
   }
   ```
3. Re-index after significant code changes:
   ```bash
   codexa ingest --force
   ```
4. If using local Ollama, try a 3B parameter model (models larger than 3B may not work reliably locally)
5. Ask more specific questions

### Database Locked Error

**Problem:** The SQLite database is locked (multiple processes accessing it).

**Solutions:**
1. Ensure only one `codexa` process runs at a time
2. If using concurrent processes, each should use a different `dbPath`

### Missing Files in Index

**Problem:** Some files aren't being indexed.

**Solutions:**
1. Check `includeGlobs` patterns in `.codexarc.json`
2. Verify files aren't excluded by `excludeGlobs`
3. Run with `--force` to rebuild:
   ```bash
   codexa ingest --force
   ```
4. Check file permissions (ensure Codexa can read the files)

## FAQ

**Q: Can I use Codexa with private/confidential code?**
A: Yes! Codexa processes everything locally by default. Your code never leaves your machine unless you explicitly use cloud providers like Groq.

**Q: How much disk space does Codexa use?**
A: Typically 10-50MB per 1000 files, depending on file sizes. The SQLite database stores chunks and embeddings.

**Q: Can I use Codexa in CI/CD?**
A: Yes, but you'll need to ensure Ollama or your LLM provider is accessible. For CI/CD, consider using Groq (cloud) instead of local Ollama.

**Q: Does Codexa work with monorepos?**
A: Yes! Adjust `includeGlobs` and `excludeGlobs` to target specific packages or workspaces.

**Q: Can I use multiple LLM providers?**
A: You can switch providers by updating `modelProvider` in `.codexarc.json`. Each repository can have its own configuration.

**Q: How often should I re-index?**
A: Codexa only processes changed files on subsequent runs, so you can run `ingest` frequently. Use `--force` only when you need a complete rebuild.

**Q: Is there a way to query the database directly?**
A: The SQLite database (`.codexa/index.db`) can be queried directly, but the schema is internal. Use Codexa's commands for all operations.

**Q: Can I customize the prompt sent to the LLM?**
A: Currently, the prompt is fixed, but this may be configurable in future versions.

## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

Quick start:
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

For major changes, please open an issue first to discuss what you would like to change.

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

<div align="center">
<p>Made with ❤️ by the Codexa team</p>
<p>
<a href="https://github.com/sahitya-chandra/codexa/issues">Report Bug</a> •
<a href="https://github.com/sahitya-chandra/codexa/issues">Request Feature</a>
</p>
</div>