codexa 1.0.0 → 1.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +130 -174
- package/dist/agent.js +24 -9
- package/dist/cli.js +40 -9
- package/dist/config/detector.js +339 -0
- package/dist/config/generator.js +381 -0
- package/dist/config.js +29 -8
- package/dist/db.js +51 -1
- package/dist/ingest.js +102 -10
- package/dist/models/index.js +11 -7
- package/dist/retriever.js +158 -6
- package/dist/utils/file-filter.js +177 -0
- package/dist/utils/formatter.js +46 -0
- package/dist/utils/logger.js +10 -4
- package/package.json +14 -5
- package/scripts/postinstall.js +58 -0
- package/scripts/smoke.js +26 -0
- package/scripts/smoke.ts +21 -0
package/README.md
CHANGED
@@ -1,7 +1,7 @@
 <div align="center">
-<h1>
-
-
+<h1>
+  <img src="https://github.com/user-attachments/assets/8d571bd6-ba2b-469a-8ddc-3f3ded0fd766" alt="Codexa Logo" width="90" align="absmiddle"> Codexa
+</h1>
 
 <p>
   <strong>A powerful CLI tool that ingests your codebase and allows you to ask questions about it using Retrieval-Augmented Generation (RAG).</strong>
@@ -48,12 +48,14 @@
 
 - 🔒 **Privacy-First**: All data processing happens locally by default
 - ⚡ **Fast & Efficient**: Local embeddings and optimized vector search
-- 🤖 **Multiple LLM Support**: Works with
+- 🤖 **Multiple LLM Support**: Works with Groq (cloud)
 - 💾 **Local Storage**: SQLite database for embeddings and context
 - 🎯 **Smart Chunking**: Intelligent code splitting with configurable overlap
 - 🔄 **Session Management**: Maintain conversation context across queries
 - 📊 **Streaming Output**: Real-time response streaming for better UX
 - 🎨 **Multiple File Types**: Supports TypeScript, JavaScript, Python, Go, Rust, Java, and more
+- 🧠 **Smart Configuration**: Automatically detects project languages and optimizes config
+- 🛡️ **Intelligent Filtering**: Automatically excludes binaries, large files, and build artifacts
 - ⚙️ **Highly Configurable**: Fine-tune chunking, retrieval, and model parameters
 - 🚀 **Zero Setup**: Works out of the box with sensible defaults
 
@@ -68,7 +70,6 @@ Before installing Codexa, ensure you have the following:
 node --version  # Should be v20.0.0 or higher
 ```
 
-- **For Local LLM (Ollama)**: [Ollama](https://ollama.com/) must be installed
 - **For Cloud LLM (Groq)**: A Groq API key from [console.groq.com](https://console.groq.com/)
 
 ### Installation Methods
@@ -130,11 +131,9 @@ codexa --version
 
 ### LLM Setup
 
-Codexa requires an LLM to generate answers. You can use
-
-#### Option 1: Using Groq (Cloud - Recommended)
+Codexa requires an LLM to generate answers. You can use Groq (cloud).
 
-Groq provides fast cloud-based LLMs with a generous free tier
+Groq provides fast cloud-based LLMs with a generous free tier.
 
 **Step 1: Get a Groq API Key**
 
@@ -192,69 +191,11 @@ Codexa defaults to using Groq when you run `codexa init`. If you need to manuall
 - `llama-3.1-8b-instant` - Fast responses (recommended, default)
 - `llama-3.1-70b-versatile` - Higher quality, slower
 
-#### Option 2: Using Ollama (Local - Alternative)
-
-Ollama runs LLMs locally on your machine, keeping your code completely private. This is an alternative option if you prefer local processing.
-
-> ⚠️ **Note:** Models with more than 3 billion parameters may not work reliably with local Ollama setup. We recommend using 3B parameter models for best compatibility, or use Groq (Option 1) for better reliability.
-
-**Step 1: Install Ollama**
-
-- **macOS/Linux**: Visit [ollama.com](https://ollama.com/) and follow the installation instructions
-- **Or use Homebrew on macOS**:
-  ```bash
-  brew install ollama
-  ```
-
-**Step 2: Start Ollama Service**
-
-```bash
-# Start Ollama (usually starts automatically after installation)
-ollama serve
-
-# Verify Ollama is running
-curl http://localhost:11434/api/tags
-```
-
-**Step 3: Download a Model**
-
-Pull a model that Codexa can use:
 
-```bash
-# Recommended: Fast and lightweight - 3B parameters
-ollama pull qwen2.5:3b-instruct
-
-# Alternative 3B options:
-ollama pull qwen2.5:1.5b-instruct  # Even faster, smaller
-ollama pull phi3:mini              # Microsoft Phi-3 Mini
-
-# ⚠️ Note: Larger models (8B+ like llama3:8b, mistral:7b) may not work locally
-# If you encounter issues, try using a 3B model instead, or switch to Groq
-```
-
-**Step 4: Verify Model is Available**
-
-```bash
-ollama list
-```
-
-You should see your downloaded model in the list.
-
-**Step 5: Configure Codexa**
-
-Edit `.codexarc.json` after running `codexa init`:
-
-```json
-{
-  "modelProvider": "local",
-  "model": "qwen2.5:3b-instruct",
-  "localModelUrl": "http://localhost:11434"
-}
-```
 
 #### Quick Setup Summary
 
-**For Groq
+**For Groq:**
 ```bash
 # 1. Get API key from console.groq.com
 # 2. Set environment variable
@@ -266,22 +207,7 @@ codexa init
 # 4. Ready to use!
 ```
 
-**For Ollama (Alternative):**
-```bash
-# 1. Install Ollama
-brew install ollama  # macOS
-# or visit ollama.com for other platforms
-
-# 2. Start Ollama
-ollama serve
-
-# 3. Pull model (use 3B models only)
-ollama pull qwen2.5:3b-instruct
 
-# 4. Update .codexarc.json to set "modelProvider": "local"
-codexa init
-# Then edit .codexarc.json to set modelProvider to "local"
-```
 
 ## Quick Start
 
@@ -315,17 +241,41 @@ Once Codexa is installed and your LLM is configured, you're ready to use it:
 
 ### `init`
 
-Creates a `.codexarc.json` configuration file
+Creates a `.codexarc.json` configuration file optimized for your codebase.
 
 ```bash
 codexa init
 ```
 
 **What it does:**
--
--
+- **Analyzes your codebase** to detect languages, package managers, and frameworks
+- **Creates optimized config** with language-specific include/exclude patterns
+- **Generates `.codexarc.json`** in the project root with tailored settings
 - Can be safely run multiple times (won't overwrite existing config)
 
+**Detection Capabilities:**
+- **Languages**: TypeScript, JavaScript, Python, Go, Rust, Java, Kotlin, Scala, C/C++, Ruby, PHP, Swift, Dart, and more
+- **Package Managers**: npm, yarn, pnpm, pip, poetry, go, cargo, maven, gradle, sbt, bundler, composer, and more
+- **Frameworks**: Next.js, React, Django, Flask, Rails, Laravel, Spring, Flutter, and more
+
+**Example Output:**
+```
+Analyzing codebase...
+✓ Detected: typescript, javascript (npm, yarn)
+
+✓ Created .codexarc.json with optimized settings for your codebase!
+
+┌ 🚀 Setup Complete ──────────────────────────────────────────┐
+│                                                             │
+│  Next Steps:                                                │
+│                                                             │
+│  1. Review .codexarc.json - Update provider keys if needed  │
+│  2. Run: codexa ingest - Start indexing your codebase       │
+│  3. Run: codexa ask "your question" - Ask questions         │
+│                                                             │
+└─────────────────────────────────────────────────────────────┘
+```
+
 ---
 
 ### `ingest`
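Note: the detection logic itself lives in the new `dist/config/detector.js` (+339 lines), which this diff does not display. As a rough, hedged sketch of how extension-based language detection and manifest-based package-manager detection are commonly implemented (all names and lookup tables below are illustrative, not the package's actual API):

```js
// Illustrative sketch only; dist/config/detector.js is not shown in this diff.
const fs = require('fs');
const path = require('path');

const EXT_TO_LANG = { '.ts': 'typescript', '.js': 'javascript', '.py': 'python', '.go': 'go', '.rs': 'rust' };
const MANIFEST_TO_PM = { 'package-lock.json': 'npm', 'yarn.lock': 'yarn', 'pnpm-lock.yaml': 'pnpm', 'go.mod': 'go', 'Cargo.toml': 'cargo' };

// Scans only the project root for brevity; a real detector presumably recurses
// and also inspects dependencies to identify frameworks.
function detectProject(root) {
  const languages = new Set();
  const packageManagers = new Set();
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    if (!entry.isFile()) continue;
    const lang = EXT_TO_LANG[path.extname(entry.name)];
    if (lang) languages.add(lang);
    const pm = MANIFEST_TO_PM[entry.name];
    if (pm) packageManagers.add(pm);
  }
  return { languages: [...languages], packageManagers: [...packageManagers] };
}

console.log(detectProject(process.cwd()));
```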
@@ -350,9 +300,15 @@ codexa ingest --force
 
 **What it does:**
 1. Scans your repository based on `includeGlobs` and `excludeGlobs` patterns
-2.
-3.
-4.
+2. **Filters files** - Automatically excludes binaries, large files (>5MB), and build artifacts
+3. Chunks files into manageable segments
+4. Generates vector embeddings for each chunk
+5. Stores everything in `.codexa/index.db` (SQLite database)
+
+**Smart Filtering:**
+- Automatically skips binary files (executables, images, archives, etc.)
+- Excludes files larger than the configured size limit (default: 5MB)
+- Filters based on file content analysis (not just extensions)
 
 **Note:** First ingestion may take a few minutes depending on your codebase size. Subsequent ingestions are faster as they only process changed files.
 
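Note: the chunking step above is governed by the `maxChunkSize` and `chunkOverlap` settings documented in the configuration section further down; the chunker itself is not shown in this diff. A minimal line-based sketch of chunking with overlap, assuming those two knobs mean "lines per chunk" and "lines repeated between adjacent chunks" (an assumption, not confirmed by this diff):

```js
// Illustrative line-based chunker: maxChunkSize lines per chunk, with the
// last chunkOverlap lines repeated at the start of the next chunk so that
// context spanning a boundary is still retrievable.
function chunkLines(lines, maxChunkSize = 200, chunkOverlap = 20) {
  const chunks = [];
  const step = maxChunkSize - chunkOverlap; // how far each chunk advances
  for (let start = 0; start < lines.length; start += step) {
    chunks.push(lines.slice(start, start + maxChunkSize).join('\n'));
    if (start + maxChunkSize >= lines.length) break; // final chunk reached
  }
  return chunks;
}

const text = Array.from({ length: 450 }, (_, i) => `line ${i + 1}`);
console.log(chunkLines(text).length); // 3 chunks: lines 1-200, 181-380, 361-450
```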
@@ -408,6 +364,28 @@ Codexa uses a `.codexarc.json` file in your project root for configuration. This
 
 **Format:** JSON
 
+### Dynamic Configuration Generation
+
+When you run `codexa init`, Codexa automatically:
+
+1. **Analyzes your codebase** structure to detect:
+   - Languages present (by file extensions)
+   - Package managers used (by config files)
+   - Frameworks detected (by dependencies and config files)
+
+2. **Generates optimized patterns**:
+   - **Include patterns**: Only file extensions relevant to detected languages
+   - **Exclude patterns**: Language-specific build artifacts, dependency directories, and cache folders
+   - **Smart defaults**: Based on your project type
+
+3. **Applies best practices**:
+   - Excludes common build outputs (`dist/`, `build/`, `target/`, etc.)
+   - Excludes dependency directories (`node_modules/`, `vendor/`, `.venv/`, etc.)
+   - Includes important config files and documentation
+   - Filters binaries and large files automatically
+
+This means your config is tailored to your project from the start, ensuring optimal indexing performance!
+
 ### Environment Variables
 
 Some settings can be configured via environment variables:
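Note: purely as an illustration of the outcome described in the hunk above (the actual output comes from the new `dist/config/generator.js` (+381 lines), which this diff does not display), a generated config for a TypeScript/npm project might look roughly like the following; treat every value as a plausible example, not the generator's verbatim output:

```json
{
  "modelProvider": "groq",
  "model": "llama-3.1-8b-instant",
  "includeGlobs": ["**/*.ts", "**/*.tsx", "**/*.js", "**/*.json", "**/*.md"],
  "excludeGlobs": ["node_modules/**", "dist/**", "build/**", ".codexa/**"],
  "maxFileSize": 5242880,
  "skipBinaryFiles": true,
  "skipLargeFiles": true
}
```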
@@ -427,38 +405,22 @@ export OPENAI_API_KEY="sk-your_key_here" # If using OpenAI embeddings
 
 #### `modelProvider`
 
-**Type:** `"
-**Default:** `"groq"`
+**Type:** `"groq"`
+**Default:** `"groq"`
 
 The LLM provider to use for generating answers.
 
-- `"groq"` - Uses Groq's cloud API (
-- `"local"` - Uses Ollama running on your machine (alternative option)
+- `"groq"` - Uses Groq's cloud API (requires `GROQ_API_KEY`)
 
 #### `model`
 
-**Type:** `string`
-**
+**Type:** `string`
+**Default:** `"llama-3.1-8b-instant"`
 
 The model identifier to use.
 
-**Common Groq Models (Recommended):**
-- `llama-3.1-8b-instant` - Fast responses (default, recommended)
-- `llama-3.1-70b-versatile` - Higher quality, slower
-
-**Common Local Models (Alternative):**
-- `qwen2.5:3b-instruct` - Fast, lightweight - **3B parameters**
-- `qwen2.5:1.5b-instruct` - Even faster, smaller - **1.5B parameters**
-- `phi3:mini` - Microsoft Phi-3 Mini - **3.8B parameters**
 
-> ⚠️ **Warning:** Models with more than 3 billion parameters (like `llama3:8b`, `mistral:7b`) may not work reliably with local Ollama setup. If you encounter issues, please try using a 3B parameter model instead, or switch to Groq.
-
-#### `localModelUrl`
-
-**Type:** `string`
-**Default:** `"http://localhost:11434"`
-
-Base URL for your local Ollama instance. Change this if Ollama runs on a different host or port.
 
 #### `embeddingProvider`
 
@@ -560,9 +522,52 @@ Controls randomness in LLM responses (0.0 = deterministic, 1.0 = creative).
 
 Number of code chunks to retrieve and use as context for each question. Higher values provide more context but may include less relevant information.
 
+#### `maxFileSize`
+
+**Type:** `number`
+**Default:** `5242880` (5MB)
+
+Maximum file size in bytes. Files larger than this will be excluded from indexing. Helps avoid processing large binary files or generated artifacts.
+
+**Example:**
+```json
+{
+  "maxFileSize": 10485760 // 10MB
+}
+```
+
+#### `skipBinaryFiles`
+
+**Type:** `boolean`
+**Default:** `true`
+
+Whether to automatically skip binary files during indexing. Binary detection uses both file extension and content analysis.
+
+**Example:**
+```json
+{
+  "skipBinaryFiles": true
+}
+```
+
+#### `skipLargeFiles`
+
+**Type:** `boolean`
+**Default:** `true`
+
+Whether to skip files exceeding `maxFileSize` during indexing. Set to `false` if you want to include all files regardless of size.
+
+**Example:**
+```json
+{
+  "skipLargeFiles": true,
+  "maxFileSize": 10485760 // 10MB
+}
+```
+
 ### Example Configurations
 
-#### Groq Cloud Provider (
+#### Groq Cloud Provider (Default)
 
 ```json
 {
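Note: the `skipBinaryFiles` text above says binary detection combines extension checks with content analysis; the implementation is in the new `dist/utils/file-filter.js` (+177 lines), not shown in this diff. The usual content heuristic is a NUL-byte scan of the file's leading bytes, sketched below (illustrative, with a made-up extension list; not the package's actual code):

```js
const fs = require('fs');
const path = require('path');

// Hypothetical known-binary extensions; the real list is in file-filter.js.
const BINARY_EXTENSIONS = new Set(['.png', '.jpg', '.gif', '.zip', '.exe', '.pdf', '.woff2']);

// Heuristic: treat a file as binary if its extension is known-binary, or if
// the first 8 KB contain a NUL byte (text files essentially never do).
function isProbablyBinary(filePath) {
  if (BINARY_EXTENSIONS.has(path.extname(filePath).toLowerCase())) return true;
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(8192);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
    return buf.subarray(0, bytesRead).includes(0);
  } finally {
    fs.closeSync(fd);
  }
}

console.log(isProbablyBinary('./README.md')); // false for text files
```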
@@ -582,28 +587,14 @@ Number of code chunks to retrieve and use as context for each question. Higher v
 export GROQ_API_KEY="your-api-key"
 ```
 
-#### Local Development (Alternative)
 
-```json
-{
-  "modelProvider": "local",
-  "model": "qwen2.5:3b-instruct",
-  "localModelUrl": "http://localhost:11434",
-  "embeddingProvider": "local",
-  "embeddingModel": "Xenova/all-MiniLM-L6-v2",
-  "maxChunkSize": 200,
-  "chunkOverlap": 20,
-  "temperature": 0.2,
-  "topK": 4
-}
-```
 
 #### Optimized for Large Codebases
 
 ```json
 {
-  "modelProvider": "
-  "model": "
+  "modelProvider": "groq",
+  "model": "llama-3.1-8b-instant",
   "maxChunkSize": 150,
   "chunkOverlap": 15,
   "topK": 6,
@@ -731,8 +722,8 @@ When you run `codexa ask`:
          ▼
 ┌─────────────────┐     ┌──────────────┐
 │   SQLite DB     │◀────│     LLM      │
-│  (Chunks +      │     │  (
-│   Embeddings)   │     │
+│  (Chunks +      │     │   (Groq)     │
+│   Embeddings)   │     │              │
 └─────────────────┘     └──────┬───────┘
                                │
                                ▼
@@ -745,50 +736,12 @@ When you run `codexa ask`:
 - **Chunker**: Splits code files into semantic chunks
 - **Embedder**: Generates vector embeddings (local transformers)
 - **Retriever**: Finds relevant chunks using vector similarity
-- **LLM Client**: Generates answers (
+- **LLM Client**: Generates answers (Groq cloud)
 - **Database**: SQLite for storing chunks and embeddings
 
 ## Troubleshooting
 
-### "Ollama not reachable" Error
 
-**Problem:** Codexa can't connect to your local Ollama instance.
-
-**Solutions:**
-1. Ensure Ollama is running:
-   ```bash
-   ollama serve
-   ```
-2. Check if Ollama is running on the default port:
-   ```bash
-   curl http://localhost:11434/api/tags
-   ```
-3. If Ollama runs on a different host/port, update `.codexarc.json`:
-   ```json
-   {
-     "localModelUrl": "http://your-host:port"
-   }
-   ```
-
-### "Model not found" Error
-
-**Problem:** The specified Ollama model isn't available.
-
-**Solutions:**
-1. List available models:
-   ```bash
-   ollama list
-   ```
-2. Pull the required model:
-   ```bash
-   ollama pull qwen2.5:3b-instruct
-   ```
-3. Or update `.codexarc.json` to use an available model:
-   ```json
-   {
-     "model": "your-available-model"
-   }
-   ```
 
 ### "GROQ_API_KEY not set" Error
 
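Note: `dist/retriever.js` gained 158 lines in this release but is not displayed in the diff. The "vector similarity" named in the components list above is conventionally cosine similarity over the stored embeddings; a hedged sketch of that ranking step (the function names and the `topK` default are illustrative, not the package's API):

```js
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against a query embedding and keep the best topK.
function topKChunks(queryEmbedding, chunks, topK = 8) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```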
@@ -810,10 +763,13 @@ When you run `codexa ask`:
 **Problem:** First ingestion takes too long.
 
 **Solutions:**
-1.
-2.
-3.
-4.
+1. The dynamic config should already optimize patterns - check your `.codexarc.json` was generated correctly
+2. Reduce `maxFileSize` to exclude more large files
+3. Reduce `maxChunkSize` to create more, smaller chunks
+4. Add more patterns to `excludeGlobs` to skip unnecessary files
+5. Be more specific with `includeGlobs` to focus on important files
+6. Use `--force` only when necessary (incremental updates are faster)
+7. Ensure `skipBinaryFiles` and `skipLargeFiles` are enabled (default)
 
 ### Poor Quality Answers
 
@@ -836,7 +792,7 @@ When you run `codexa ask`:
 ```bash
 codexa ingest --force
 ```
-
+
 5. Ask more specific questions
 
 ### Database Locked Error
@@ -869,7 +825,7 @@ A: Yes! Codexa processes everything locally by default. Your code never leaves y
 A: Typically 10-50MB per 1000 files, depending on file sizes. The SQLite database stores chunks and embeddings.
 
 **Q: Can I use Codexa in CI/CD?**
-A: Yes, but you'll need to ensure
+A: Yes, but you'll need to ensure your LLM provider is accessible. For CI/CD, consider using Groq (cloud).
 
 **Q: Does Codexa work with monorepos?**
 A: Yes! Adjust `includeGlobs` and `excludeGlobs` to target specific packages or workspaces.
package/dist/agent.js
CHANGED
@@ -9,15 +9,24 @@ const fs_extra_1 = __importDefault(require("fs-extra"));
 const retriever_1 = require("./retriever");
 const models_1 = require("./models");
 const SYSTEM_PROMPT = `
-You are RepoSage.
-You answer questions about a codebase using ONLY the provided code snippets.
+You are RepoSage, an expert codebase assistant that answers questions about codebases using the provided code snippets.
 
-
-
-
--
-
--
+Your task is to provide accurate, helpful, and comprehensive answers based on the ACTUAL CODE provided.
+
+CRITICAL PRIORITY RULES:
+- ALWAYS prioritize CODE_SNIPPET sections over DOCUMENTATION sections when answering questions
+- IGNORE DOCUMENTATION sections if they contradict or differ from what the code shows
+- When there's a conflict between documentation and actual code, ALWAYS trust the code implementation
+- Base your answers on what the CODE actually does, not what documentation claims
+
+Guidelines:
+- Analyze CODE_SNIPPET sections FIRST - these contain the actual implementation
+- DOCUMENTATION sections are for reference only and should be IGNORED if they contradict code
+- When answering questions about functionality, explain based on actual code execution flow
+- Reference specific files and line numbers when relevant (from the FILE headers)
+- Be direct and factual - if code shows something, state it clearly
+- If asked about a specific file that isn't in the context, clearly state "The file [name] is not present in the provided code snippets"
+- When analyzing code structure, look at imports, exports, and execution patterns
 `;
 async function askQuestion(cwd, config, options) {
     const { question, session = 'default' } = options;
@@ -32,7 +41,13 @@ async function askQuestion(cwd, config, options) {
         ...history,
         {
             role: 'user',
-            content: `
+            content: `Based on the following code snippets from the codebase, please answer the question.
+
+${context}
+
+Question: ${question}
+
+Please provide a comprehensive and helpful answer based on the code context above.`,
         },
     ];
     const llm = (0, models_1.createLLMClient)(config);
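Note: the hunks above show how agent.js assembles the `messages` array (system prompt, prior history, and a user message wrapping the retrieved `${context}` and `${question}`), but `createLLMClient` itself lives elsewhere in `dist/models` and is not shown. For orientation only, a minimal client against Groq's OpenAI-compatible chat-completions endpoint could look like this (an assumption about the transport; the package's real client may differ):

```js
// Minimal sketch of a Groq chat call with an assembled messages array.
// Assumes Groq's OpenAI-compatible REST endpoint and Node 18+ global fetch;
// the package's real client is in dist/models and may differ.
async function complete(messages, model = 'llama-3.1-8b-instant') {
  const res = await fetch('https://api.groq.com/openai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.GROQ_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, messages, temperature: 0.2 }),
  });
  if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```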
package/dist/cli.js
CHANGED
@@ -6,22 +6,52 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
 Object.defineProperty(exports, "__esModule", { value: true });
 const commander_1 = require("commander");
 const ora_1 = __importDefault(require("ora"));
+const chalk_1 = __importDefault(require("chalk"));
 const config_1 = require("./config");
 const ingest_1 = require("./ingest");
 const agent_1 = require("./agent");
 const logger_1 = require("./utils/logger");
+const formatter_1 = require("./utils/formatter");
+const marked_1 = require("marked");
+const marked_terminal_1 = __importDefault(require("marked-terminal"));
+marked_1.marked.setOptions({
+    renderer: new marked_terminal_1.default({
+        tab: 2,
+    }),
+});
 const program = new commander_1.Command();
 program
     .name('codexa')
     .description('Ask questions about any local repository from the command line.')
-    .version('
+    .version('1.1.1')
+    .action(() => {
+    console.log('\n');
+    logger_1.log.box(`${chalk_1.default.bold('Welcome to Codexa!')}\n\n` +
+        `${chalk_1.default.dim('Codexa is a CLI tool that helps you understand your codebase using AI.')}\n\n` +
+        `${chalk_1.default.bold('Getting Started:')}\n\n` +
+        `${chalk_1.default.dim('1.')} ${chalk_1.default.white('Initialize Codexa in your project:')}\n` +
+        `   ${chalk_1.default.cyan('codexa init')}\n\n` +
+        `${chalk_1.default.dim('2.')} ${chalk_1.default.white('Index your codebase:')}\n` +
+        `   ${chalk_1.default.cyan('codexa ingest')}\n\n` +
+        `${chalk_1.default.dim('3.')} ${chalk_1.default.white('Ask questions:')}\n` +
+        `   ${chalk_1.default.cyan('codexa ask "your question"')}\n\n` +
+        `${chalk_1.default.dim('For more help, run:')} ${chalk_1.default.cyan('codexa --help')}`, '🚀 Codexa');
+    console.log('\n');
+});
 program
     .command('init')
     .description('Create a local .codexarc.json with sensible defaults.')
     .action(async () => {
     const cwd = process.cwd();
     await (0, config_1.ensureConfig)(cwd);
-
+    console.log('\n');
+    logger_1.log.success('Created .codexarc.json with optimized settings for your codebase!');
+    console.log('\n');
+    logger_1.log.box(`${chalk_1.default.bold('Next Steps:')}\n\n` +
+        `${chalk_1.default.dim('1.')} ${chalk_1.default.white('Review .codexarc.json')} - Update provider keys if needed\n` +
+        `${chalk_1.default.dim('2.')} ${chalk_1.default.white('Run:')} ${chalk_1.default.cyan('codexa ingest')} ${chalk_1.default.dim('- Start indexing your codebase')}\n` +
+        `${chalk_1.default.dim('3.')} ${chalk_1.default.white('Run:')} ${chalk_1.default.cyan('codexa ask "your question"')} ${chalk_1.default.dim('- Ask questions about your code')}`, '🚀 Setup Complete');
+    console.log('\n');
 });
 program
     .command('ingest')
@@ -43,15 +73,14 @@ program
     .description('Ask a natural-language question about the current repo.')
     .argument('<question...>', 'Question to ask about the codebase.')
    .option('-s, --session <name>', 'session identifier to keep conversation context', 'default')
-    .option('--
+    .option('--stream', 'enable streaming output')
     .action(async (question, options) => {
     const cwd = process.cwd();
     const config = await (0, config_1.loadConfig)(cwd);
     const prompt = question.join(' ');
-    //
-
-
-    const stream = options.stream !== false;
+    // dfefault: non-streamed output
+    const stream = options.stream === true;
+    console.log((0, formatter_1.formatQuestion)(prompt));
     const spinner = (0, ora_1.default)('Extracting Response...').start();
     try {
         const answer = await (0, agent_1.askQuestion)(cwd, config, {
@@ -70,11 +99,13 @@ program
             spinner.text = status;
         },
     });
-    spinner.stop();
     if (!stream) {
-
+        const rendered = marked_1.marked.parse(answer.trim());
+        spinner.stop();
+        console.log('\n' + rendered + '\n');
     }
     else {
+        spinner.stop();
         console.log('\n');
     }
 }