codevault 1.3.0-beta.7 → 1.3.0

package/README.md CHANGED
@@ -6,36 +6,118 @@ CodeVault is an intelligent code indexing and search system that enables AI assi
6
6
 
7
7
  ## 🌟 Features
8
8
 
9
- - **🔍 Semantic Code Search**: Find code by meaning, not just keywords
10
- - **🤖 MCP Integration**: Native support for Claude Desktop and other MCP clients
11
- - **🎯 Symbol-Aware Ranking**: Boost results based on function signatures and relationships
12
- - **⚡ Hybrid Search**: Combines vector embeddings with BM25 keyword matching
13
- - **🔄 Context Packs**: Save and reuse search scopes for different projects
9
+ - **🔍 Semantic Search**: Find code by meaning, not just keywords using vector embeddings
10
+ - **🤖 MCP Integration**: Native support for Claude Desktop and other MCP clients
11
+ - **🎯 Symbol-Aware Ranking**: Boost results based on function signatures, parameters, and relationships
12
+ - **⚡ Hybrid Retrieval**: Combines vector embeddings with BM25 keyword matching via Reciprocal Rank Fusion
13
+ - **🚀 Batch Processing**: Efficient API usage with configurable batching (50 chunks/batch by default)
14
+ - **📦 Smart Chunking**: Token-aware semantic code splitting with overlap for optimal context
15
+ - **🔄 Context Packs**: Save and reuse search scopes for different features/modules
16
+ - **🏠 Local-First**: Works with local models (Ollama) or cloud APIs (OpenAI, Nebius)
17
+ - **🔐 Optional Encryption**: AES-256-GCM encryption for indexed code chunks
18
+ - **⚙️ Global Configuration**: One-time setup with interactive wizard for CLI convenience
14
19
  - **📊 Multi-Language Support**: 25+ programming languages via Tree-sitter
15
- - **🏠 Local-First**: Works with local models (Ollama) or cloud APIs
16
- - **🔐 Optional Encryption**: Secure your indexed code chunks
17
- - **📈 Smart Chunking**: Token-aware code splitting for optimal context
20
+ - **🔎 File Watching**: Real-time index updates with debounced change detection
21
+ - **⏱️ Rate Limiting**: Intelligent request/token throttling with automatic retry
22
+ - **💾 Memory Efficient**: LRU caches with automatic cleanup for long-running processes
18
23
 
19
24
  ## 🚀 Quick Start
20
25
 
21
26
  ### Installation
22
27
 
28
+ #### NPM (Global - Recommended)
29
+
30
+ ```bash
31
+ # Install latest beta
32
+ npm install -g codevault@beta
33
+
34
+ # Interactive configuration setup (one-time)
35
+ codevault config init
36
+
37
+ # Index your project
38
+ cd /path/to/your/project
39
+ codevault index
40
+ ```
41
+
42
+ #### From Source
43
+
23
44
  ```bash
45
+ git clone https://github.com/shariqriazz/codevault.git
46
+ cd codevault
24
47
  npm install
25
48
  npm run build
26
49
  npm link
27
50
  ```
28
51
 
52
+ ### Configuration
53
+
54
+ CodeVault supports multiple configuration methods with clear priority:
55
+
56
+ **Priority:** Environment Variables > Project Config > Global Config > Defaults
57
+
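The priority order above can be read as a layered merge in which each higher-priority source overrides the ones below it. The following is a minimal sketch of that idea, not CodeVault's actual loader: the `Settings` shape, the `resolveConfig` helper, and the sample values are all hypothetical.

```typescript
// Hypothetical, simplified settings shape (the real config has more fields).
interface Settings {
  provider?: string;
  model?: string;
  maxTokens?: number;
}

// Sources are passed lowest priority first; later sources override earlier ones.
// Assumes unset keys are omitted rather than explicitly set to undefined.
function resolveConfig(...sourcesLowToHigh: Settings[]): Settings {
  return sourcesLowToHigh.reduce<Settings>(
    (merged, source) => ({ ...merged, ...source }),
    {},
  );
}

// Illustrative values only.
const defaults: Settings = { provider: "openai", maxTokens: 8192 };
const globalConfig: Settings = { model: "text-embedding-3-large" };
const projectConfig: Settings = { provider: "ollama", model: "nomic-embed-text" };
const envConfig: Settings = { maxTokens: 4096 };

// Defaults < Global Config < Project Config < Environment Variables
const effective = resolveConfig(defaults, globalConfig, projectConfig, envConfig);
```

In this sketch the project config wins over the global config for `provider` and `model`, while the environment layer wins for `maxTokens`.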
58
+ #### Option 1: Interactive Setup (Recommended for CLI)
59
+
60
+ ```bash
61
+ codevault config init
62
+ ```
63
+
64
+ Guides you through:
65
+ - Provider selection (OpenAI, Ollama, Custom API)
66
+ - API key configuration
67
+ - Model selection (preset or custom)
68
+ - Advanced settings (rate limits, encryption, reranking)
69
+
70
+ Configuration saved to `~/.codevault/config.json`
71
+
72
+ #### Option 2: Manual CLI Configuration
73
+
74
+ ```bash
75
+ # Set API key
76
+ codevault config set providers.openai.apiKey sk-your-key-here
77
+ codevault config set providers.openai.model text-embedding-3-large
78
+
79
+ # View configuration
80
+ codevault config list
81
+
82
+ # See all config sources
83
+ codevault config list --sources
84
+ ```
85
+
86
+ #### Option 3: Environment Variables (MCP / CI/CD)
87
+
88
+ ```bash
89
+ # OpenAI
90
+ export OPENAI_API_KEY=sk-your-key-here
91
+ export CODEVAULT_OPENAI_EMBEDDING_MODEL=text-embedding-3-large
92
+
93
+ # Ollama
94
+ export CODEVAULT_OLLAMA_MODEL=nomic-embed-text
95
+
96
+ # Custom settings
97
+ export CODEVAULT_MAX_TOKENS=8192
98
+ export CODEVAULT_DIMENSIONS=3072
99
+ ```
100
+
101
+ #### Option 4: Project-Specific Config
102
+
103
+ ```bash
104
+ # Set local config (project-specific)
105
+ codevault config set --local provider ollama
106
+ codevault config set --local providers.ollama.model nomic-embed-text
107
+ ```
108
+
109
+ See [`CONFIGURATION.md`](CONFIGURATION.md) for complete configuration guide.
110
+
29
111
  ### Index Your Project
30
112
 
31
113
  ```bash
32
- # Navigate to your project
33
- cd /path/to/your/project
114
+ # Using global config (if set via codevault config init)
115
+ codevault index
34
116
 
35
117
  # Using Ollama (local, no API key required)
36
118
  codevault index --provider ollama
37
119
 
38
- # Using OpenAI
120
+ # Using OpenAI with custom settings
39
121
  export OPENAI_API_KEY=your-key-here
40
122
  codevault index --provider openai
41
123
 
@@ -44,25 +126,58 @@ export OPENAI_API_KEY=your-nebius-api-key
44
126
  export OPENAI_BASE_URL=https://api.studio.nebius.com/v1/
45
127
  export CODEVAULT_OPENAI_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
46
128
  codevault index --provider openai
129
+
130
+ # With encryption
131
+ export CODEVAULT_ENCRYPTION_KEY=$(openssl rand -base64 32)
132
+ codevault index --encrypt on
133
+
134
+ # Watch for changes (auto-update index)
135
+ codevault watch --debounce 500
47
136
  ```
48
137
 
49
138
  ### Search Your Code
50
139
 
51
140
  ```bash
141
+ # Basic search
52
142
  codevault search "authentication function"
143
+
144
+ # Search with filters
53
145
  codevault search "stripe checkout" --tags stripe --lang php
146
+
147
+ # Search with full code chunks
148
+ codevault search-with-code "database connection" --limit 5
149
+
150
+ # View project stats
151
+ codevault info
54
152
  ```
55
153
 
56
154
  ### Use with Claude Desktop
57
155
 
58
156
  Add to your `claude_desktop_config.json`:
59
157
 
158
+ ```json
159
+ {
160
+ "mcpServers": {
161
+ "codevault": {
162
+ "command": "npx",
163
+ "args": ["-y", "codevault@beta", "mcp"],
164
+ "env": {
165
+ "OPENAI_API_KEY": "your-api-key-here",
166
+ "CODEVAULT_OPENAI_EMBEDDING_MODEL": "text-embedding-3-large"
167
+ }
168
+ }
169
+ }
170
+ }
171
+ ```
172
+
173
+ Or use local installation:
174
+
60
175
  ```json
61
176
  {
62
177
  "mcpServers": {
63
178
  "codevault": {
64
179
  "command": "node",
65
- "args": ["/path/to/codevault-v2/dist/mcp-server.js"],
180
+ "args": ["/path/to/codevault/dist/mcp-server.js"],
66
181
  "env": {
67
182
  "OPENAI_API_KEY": "your-api-key-here"
68
183
  }
@@ -73,102 +188,331 @@ Add to your `claude_desktop_config.json`:
73
188
 
74
189
  ## 📖 Documentation
75
190
 
76
- ### Supported Languages
77
-
78
- - **Web**: JavaScript, TypeScript, TSX, HTML, CSS, JSON, Markdown
79
- - **Backend**: Python, PHP, Go, Java, Kotlin, C#, Ruby, Scala, Swift
80
- - **Systems**: C, C++, Rust
81
- - **Functional**: Haskell, OCaml, Elixir
82
- - **Scripting**: Bash, Lua
83
-
84
- ### Embedding Providers
85
-
86
- | Provider | Model | Dimensions | Context | Best For | API Key Required |
87
- |----------|-------|------------|---------|----------|------------------|
88
- | **ollama** | nomic-embed-text | 768 | 8K | Local, no API costs | ❌ No |
89
- | **openai** | text-embedding-3-large | 3072 | 8K | Highest quality | ✅ Yes |
90
- | **openai** | Qwen/Qwen3-Embedding-8B | 4096 | 32K | Large context, high quality | ✅ Yes (Nebius AI) |
91
-
92
191
  ### CLI Commands
93
192
 
94
193
  ```bash
194
+ # Configuration Management
195
+ codevault config init # Interactive setup wizard
196
+ codevault config set <key> <value> # Set global config value
197
+ codevault config set --local <key> <val> # Set project config value
198
+ codevault config get <key> # Get config value
199
+ codevault config list # Show merged config
200
+ codevault config list --sources # Show all config sources
201
+ codevault config unset <key> # Remove config value
202
+ codevault config path # Show config file paths
203
+
95
204
  # Indexing
96
- codevault index [path] # Index project
97
- codevault update [path] # Update existing index
98
- codevault watch [path] # Watch for changes
99
-
100
- # Searching
101
- codevault search <query> [path] # Search code
102
- --limit <num> # Max results
103
- --provider <name> # Embedding provider
104
- --path_glob <pattern> # Filter by file pattern
105
- --tags <tag...> # Filter by tags
106
- --lang <language...> # Filter by language
205
+ codevault index [path] # Index project
206
+ codevault index --provider openai # Use specific provider
207
+ codevault index --encrypt on # Enable encryption
208
+ codevault update [path] # Update existing index
209
+ codevault watch [path] # Watch for changes
210
+ codevault watch --debounce 1000 # Custom debounce interval
211
+
212
+ # Searching
213
+ codevault search <query> [path] # Search code (metadata only)
214
+ --limit <num> # Max results (default: 10)
215
+ --provider <name> # Embedding provider
216
+ --path_glob <pattern> # Filter by file pattern
217
+ --tags <tag...> # Filter by tags
218
+ --lang <language...> # Filter by language
219
+ --reranker <off|api> # Enable API reranking
220
+ --hybrid <on|off> # Hybrid search (default: on)
221
+ --bm25 <on|off> # BM25 keyword search (default: on)
222
+ --symbol_boost <on|off> # Symbol boosting (default: on)
223
+
224
+ codevault search-with-code <query> # Search with full code chunks
225
+ --max-code-size <bytes> # Max code size per chunk
107
226
 
108
227
  # Context Packs
109
- codevault context list # List saved contexts
110
- codevault context show <name> # Show context pack
111
- codevault context use <name> # Activate context pack
228
+ codevault context list # List saved contexts
229
+ codevault context show <name> # Show context pack details
230
+ codevault context use <name> # Activate context pack
112
231
 
113
232
  # Utilities
114
- codevault info # Project statistics
115
- codevault mcp # Start MCP server
233
+ codevault info # Project statistics
234
+ codevault mcp # Start MCP server
235
+ codevault --version # Show version
116
236
  ```
117
237
 
118
238
  ### MCP Tools
119
239
 
120
- - **`search_code`**: Semantic code search with filters
121
- - **`search_code_with_chunks`**: Search + retrieve full code
122
- - **`get_code_chunk`**: Get specific code by SHA
240
+ When used via MCP, CodeVault provides these tools:
241
+
242
+ - **`search_code`**: Semantic search returning metadata (paths, symbols, scores, SHAs)
243
+ - **`search_code_with_chunks`**: Search + retrieve full code for each result
244
+ - **`get_code_chunk`**: Get specific code chunk by SHA
123
245
  - **`index_project`**: Index a new project
124
246
  - **`update_project`**: Update existing index
125
- - **`get_project_stats`**: Project overview
126
- - **`use_context_pack`**: Apply saved search context
247
+ - **`get_project_stats`**: Get project overview and statistics
248
+ - **`use_context_pack`**: Apply saved search context/scope
249
+
250
+ ### Supported Languages
251
+
252
+ - **Web**: JavaScript, TypeScript, TSX, HTML, CSS, JSON, Markdown
253
+ - **Backend**: Python, PHP, Go, Java, Kotlin, C#, Ruby, Scala, Swift
254
+ - **Systems**: C, C++, Rust
255
+ - **Functional**: Haskell, OCaml, Elixir
256
+ - **Scripting**: Bash, Lua
257
+
258
+ ### Embedding Providers
259
+
260
+ | Provider | Model | Dimensions | Context | Best For | API Key Required |
261
+ |----------|-------|------------|---------|----------|------------------|
262
+ | **ollama** | nomic-embed-text | 768 | 8K | Local, no API costs | ❌ No |
263
+ | **openai** | text-embedding-3-large | 3072 | 8K | Highest quality | ✅ Yes |
264
+ | **openai** | text-embedding-3-small | 1536 | 8K | Faster, cheaper | ✅ Yes |
265
+ | **openai** | Qwen/Qwen3-Embedding-8B | 4096 | 32K | Large context, high quality | ✅ Yes (Nebius) |
266
+ | **custom** | Your choice | Custom | Custom | Any OpenAI-compatible API | ✅ Yes |
127
267
 
128
268
  ### Environment Variables
129
269
 
130
270
  ```bash
131
- # OpenAI Configuration
271
+ # Provider Configuration
132
272
  OPENAI_API_KEY=sk-...
133
- OPENAI_BASE_URL=https://api.openai.com/v1
273
+ OPENAI_BASE_URL=https://api.openai.com/v1 # For custom endpoints
134
274
  CODEVAULT_OPENAI_EMBEDDING_MODEL=text-embedding-3-large
135
-
136
- # Ollama Configuration
137
275
  CODEVAULT_OLLAMA_MODEL=nomic-embed-text
138
276
 
139
- # Chunking
140
- CODEVAULT_MAX_TOKENS=8192
141
- CODEVAULT_DIMENSIONS=3072
277
+ # Chunking Configuration
278
+ CODEVAULT_MAX_TOKENS=8192 # Max tokens per chunk
279
+ CODEVAULT_DIMENSIONS=3072 # Embedding dimensions
142
280
 
143
281
  # Rate Limiting
144
- CODEVAULT_RATE_LIMIT_RPM=10000 # Requests per minute
145
- CODEVAULT_RATE_LIMIT_TPM=600000 # Tokens per minute
282
+ CODEVAULT_RATE_LIMIT_RPM=10000 # Requests per minute
283
+ CODEVAULT_RATE_LIMIT_TPM=600000 # Tokens per minute
284
+
285
+ # Encryption
286
+ CODEVAULT_ENCRYPTION_KEY=... # 32-byte key (base64 or hex)
146
287
 
147
- # Reranking
288
+ # API Reranking
148
289
  CODEVAULT_RERANK_API_URL=...
149
290
  CODEVAULT_RERANK_API_KEY=...
150
291
  CODEVAULT_RERANK_MODEL=...
151
292
 
152
- # Encryption
153
- CODEVAULT_ENCRYPTION_KEY=...
293
+ # Memory Management
294
+ CODEVAULT_CACHE_CLEAR_INTERVAL=3600000 # Cache cleanup interval (ms)
154
295
  ```
155
296
 
156
297
  ## 🏗️ Architecture
157
298
 
299
+ ### How It Works
300
+
301
+ 1. **Indexing Phase**
302
+ - Parses source files using Tree-sitter
303
+ - Extracts symbols, signatures, and relationships
304
+ - Creates semantic chunks (token-aware, with overlap)
305
+ - Batch generates embeddings (50 chunks/batch)
306
+ - Stores in SQLite + compressed chunks on disk
307
+
308
+ 2. **Search Phase**
309
+ - Generates query embedding
310
+ - Performs vector similarity search
311
+ - Runs BM25 keyword search (if enabled)
312
+ - Applies Reciprocal Rank Fusion
313
+ - Boosts results based on symbol matching
314
+ - Optionally applies API reranking
315
+ - Returns ranked results with metadata
316
+
317
+ 3. **Retrieval Phase**
318
+ - Fetches code chunks by SHA
319
+ - Decompresses and decrypts (if encrypted)
320
+ - Returns full code with context
321
+
158
322
  ### Project Structure
159
323
 
160
324
  ```
161
325
  .codevault/
162
- ├── codevault.db # SQLite database
163
- ├── chunks/ # Compressed code chunks
164
- └── contextpacks/ # Saved search contexts
165
- codevault.codemap.json # Lightweight index
326
+ ├── codevault.db # SQLite: embeddings + metadata
327
+ ├── chunks/ # Compressed code chunks
328
+ │ ├── <sha>.gz # Plain compressed
329
+ │ └── <sha>.gz.enc # Encrypted compressed
330
+ └── contextpacks/ # Saved search contexts
331
+ └── feature-auth.json # Example context pack
332
+
333
+ codevault.codemap.json # Lightweight index (symbol graph)
334
+
335
+ ~/.codevault/ # Global CLI configuration
336
+ └── config.json # User-wide settings
337
+ ```
338
+
339
+ ### Advanced Features
340
+
341
+ #### Batch Processing
342
+
343
+ Embeddings are generated in batches of 50 for optimal API efficiency:
344
+
345
+ ```typescript
346
+ // Automatic batching - no configuration needed
347
+ // Processes 50 chunks per API call
348
+ // Falls back to individual processing on error
349
+ ```
350
+
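The batching and fallback behaviour described in the comments above could look roughly like the sketch below. It is an illustration under assumptions, not the package's code: `EmbedFn` and `embedInBatches` are hypothetical names, and recording `null` for a chunk that fails on its own is one possible policy among several.

```typescript
// Hypothetical embedding call: takes texts, resolves to one vector per text.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

async function embedInBatches(
  chunks: string[],
  embed: EmbedFn,
  batchSize = 50,
): Promise<(number[] | null)[]> {
  const vectors: (number[] | null)[] = [];
  for (let start = 0; start < chunks.length; start += batchSize) {
    const batch = chunks.slice(start, start + batchSize);
    try {
      // One API call covers the whole batch.
      const batchVectors = await embed(batch);
      for (const vector of batchVectors) vectors.push(vector);
    } catch {
      // Fallback: retry each chunk individually so one bad chunk
      // does not discard the rest of the batch.
      for (const chunk of batch) {
        try {
          const [vector] = await embed([chunk]);
          vectors.push(vector ?? null);
        } catch {
          vectors.push(null); // record the failure and keep going
        }
      }
    }
  }
  return vectors;
}
```

With this shape, 1,000 chunks need 20 batch calls when nothing fails, and a failing batch costs at most one extra call per chunk in that batch.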
351
+ #### Smart Chunking
352
+
353
+ Token-aware semantic chunking with configurable limits:
354
+
355
+ - Respects function/class boundaries
356
+ - Applies overlap for context continuity
357
+ - Subdivides large functions intelligently
358
+ - Merges small chunks when beneficial
359
+
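The overlap step on its own can be sketched as a sliding window over a token list. This covers only the overlap arithmetic and assumes a flat token sequence; the boundary-aware splitting and merging listed above are not modelled, and `chunkWithOverlap` is a hypothetical helper.

```typescript
// Split tokens into windows of at most maxTokens, where each window
// starts with the last `overlap` tokens of the previous window.
function chunkWithOverlap<T>(tokens: T[], maxTokens: number, overlap: number): T[][] {
  if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
    throw new RangeError("require maxTokens > 0 and 0 <= overlap < maxTokens");
  }
  const chunks: T[][] = [];
  const step = maxTokens - overlap; // how far each window advances
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Ten tokens, windows of 4, overlap of 1:
// [1,2,3,4], [4,5,6,7], [7,8,9,10]
const windows = chunkWithOverlap([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4, 1);
```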
360
+ #### Symbol-Aware Ranking
361
+
362
+ Boosts search results based on:
363
+ - Exact symbol name matches
364
+ - Function signature matches
365
+ - Parameter name matches
366
+ - Symbol neighbor relationships (calls, imports)
367
+
368
+ #### Hybrid Search
369
+
370
+ Combines multiple ranking signals:
371
+ - Vector similarity (semantic understanding)
372
+ - BM25 keyword matching (exact term matches)
373
+ - Symbol boost (code structure awareness)
374
+ - Reciprocal Rank Fusion (combines rankings)
375
+
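Reciprocal Rank Fusion combines rankings by summing `1 / (k + rank)` for each result across the lists it appears in. The sketch below shows the general technique; the constant `k = 60` is a commonly used default and an assumption here, and the README does not state which value CodeVault uses. The function name and file names are illustrative.

```typescript
// score(id) = sum over rankings of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Illustrative result lists from the two retrievers.
const vectorHits = ["auth.ts", "session.ts", "token.ts"];
const bm25Hits = ["token.ts", "auth.ts", "login.ts"];

const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
const fusedOrder = fused.map(([id]) => id);
```

Here `auth.ts` ranks first because it sits near the top of both lists, ahead of `token.ts`, which is first in one list but third in the other.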
376
+ #### Context Packs
377
+
378
+ Save search scopes for reuse:
379
+
380
+ ```json
381
+ {
382
+ "key": "feature-auth",
383
+ "name": "Authentication Feature",
384
+ "description": "Login, signup, password reset",
385
+ "scope": {
386
+ "path_glob": ["src/auth/**", "src/middleware/auth.ts"],
387
+ "tags": ["auth", "security"],
388
+ "lang": ["typescript", "javascript"]
389
+ }
390
+ }
391
+ ```
392
+
393
+ Usage:
394
+ ```bash
395
+ codevault context use feature-auth
396
+ codevault search "token validation" # Scoped to auth files
397
+ ```
398
+
399
+ #### File Watching
400
+
401
+ Real-time index updates with intelligent debouncing:
402
+
403
+ ```bash
404
+ codevault watch --debounce 500
166
405
  ```
167
406
 
407
+ - Detects file changes, additions, deletions
408
+ - Batches rapid changes (debouncing)
409
+ - Updates only affected chunks
410
+ - Preserves index consistency
411
+
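Debouncing can be modelled as grouping change events into batches that close once no further event arrives within the debounce window. The sketch below works on recorded timestamps so it stays deterministic; a real watcher would use timers instead. `FileEvent` and `groupByQuietPeriod` are hypothetical names.

```typescript
interface FileEvent {
  path: string;
  atMs: number; // time of the change, in milliseconds
}

// A batch closes when the gap to the next event exceeds debounceMs.
function groupByQuietPeriod(events: FileEvent[], debounceMs: number): string[][] {
  const sorted = events.slice().sort((a, b) => a.atMs - b.atMs);
  const batches: string[][] = [];
  let current = new Set<string>();
  let lastAt: number | null = null;
  for (const event of sorted) {
    if (lastAt !== null && event.atMs - lastAt > debounceMs) {
      batches.push(Array.from(current));
      current = new Set<string>();
    }
    current.add(event.path); // repeated saves of one file collapse to one entry
    lastAt = event.atMs;
  }
  if (current.size > 0) batches.push(Array.from(current));
  return batches;
}

// Three rapid changes, then one change after a long pause, with a 500 ms window.
const batches = groupByQuietPeriod(
  [
    { path: "a.ts", atMs: 0 },
    { path: "b.ts", atMs: 100 },
    { path: "a.ts", atMs: 300 },
    { path: "c.ts", atMs: 1500 },
  ],
  500,
);
```

The first three events become one batch containing `a.ts` and `b.ts`, and the late change to `c.ts` forms a second batch.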
412
+ #### Encryption
413
+
414
+ AES-256-GCM encryption for code chunks:
415
+
416
+ ```bash
417
+ # Generate secure key
418
+ export CODEVAULT_ENCRYPTION_KEY=$(openssl rand -base64 32)
419
+
420
+ # Index with encryption
421
+ codevault index --encrypt on
422
+
423
+ # Files stored as .gz.enc instead of .gz
424
+ # Automatic decryption on read (requires key)
425
+ ```
426
+
427
+ ## 🔧 Performance & Optimization
428
+
429
+ ### Memory Management
430
+
431
+ - LRU caches with automatic eviction
432
+ - Periodic cache cleanup (configurable interval)
433
+ - Graceful shutdown handlers for MCP server
434
+ - Token counter caching for repeated operations
435
+
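An LRU cache with automatic eviction can be built on the insertion order of a `Map`. This is a generic sketch of the technique rather than CodeVault's cache; the class name and capacity handling are assumptions.

```typescript
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new LruCache<string, number>(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // "a" is now the most recently used entry
cache.set("c", 3); // evicts "b", the least recently used entry
```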
436
+ ### Rate Limiting
437
+
438
+ Intelligent throttling prevents API errors:
439
+
440
+ - Configurable RPM (requests per minute)
441
+ - Configurable TPM (tokens per minute)
442
+ - Automatic retry with exponential backoff
443
+ - Queue size limits prevent memory exhaustion
444
+
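Exponential backoff doubles the wait before each successive retry, usually up to a cap. The sketch below only computes the delay schedule; the base delay, the cap, and the function name are assumptions, since the README does not give CodeVault's actual values.

```typescript
// Delay before retry n (0-based) is baseMs * 2^n, capped at maxMs.
function backoffScheduleMs(retries: number, baseMs = 500, maxMs = 30000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(maxMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

// Five retries with a 500 ms base and a 3 s cap:
// 500, 1000, 2000, 3000, 3000
const schedule = backoffScheduleMs(5, 500, 3000);
```

Many implementations also add random jitter to each delay so that concurrent clients do not retry in lockstep; that is left out here for brevity.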
445
+ ### Batch Efficiency
446
+
447
+ - 50 chunks per embedding API call (vs 1 per call)
448
+ - Reduces API overhead by ~98%
449
+ - Automatic fallback for failed batches
450
+ - Preserves partial progress on errors
451
+
452
+ ## 🐛 Troubleshooting
453
+
454
+ ### Common Issues
455
+
456
+ **"Which config is being used?"**
457
+ ```bash
458
+ codevault config list --sources
459
+ ```
460
+
461
+ **"MCP not using my global config"**
462
+
463
+ This is correct! MCP uses environment variables by design. Global config is for CLI convenience only.
464
+
465
+ **"Rate limit errors"**
466
+ ```bash
467
+ # Reduce rate limits
468
+ codevault config set rateLimit.rpm 100
469
+ codevault config set rateLimit.tpm 10000
470
+ ```
471
+
472
+ **"Out of memory during indexing"**
473
+ ```bash
474
+ # Reduce batch size via environment
475
+ export BATCH_SIZE=25
476
+ codevault index
477
+ ```
478
+
479
+ **"Encryption key errors"**
480
+ ```bash
481
+ # Generate valid key (32 bytes)
482
+ export CODEVAULT_ENCRYPTION_KEY=$(openssl rand -base64 32)
483
+ ```
484
+
485
+ ## 🤝 Contributing
486
+
487
+ Contributions welcome! Please:
488
+
489
+ 1. Fork the repository
490
+ 2. Create a feature branch
491
+ 3. Make your changes
492
+ 4. Add tests if applicable
493
+ 5. Submit a pull request
494
+
168
495
  ## 📄 License
169
496
 
170
- MIT License
497
+ MIT License - see [LICENSE](LICENSE) file for details.
498
+
499
+ ## 🔗 Links
500
+
501
+ - **GitHub**: https://github.com/shariqriazz/codevault
502
+ - **NPM**: https://www.npmjs.com/package/codevault
503
+ - **Issues**: https://github.com/shariqriazz/codevault/issues
504
+ - **Configuration Guide**: [CONFIGURATION.md](CONFIGURATION.md)
505
+
506
+ ## 🙏 Acknowledgments
507
+
508
+ Built with:
509
+ - [Model Context Protocol](https://modelcontextprotocol.io/) - AI integration framework
510
+ - [Tree-sitter](https://tree-sitter.github.io/) - Parsing infrastructure
511
+ - [OpenAI](https://openai.com/) - Embedding models
512
+ - [Ollama](https://ollama.ai/) - Local model support
171
513
 
172
514
  ---
173
515
 
174
- **Built by Shariq Riaz**
516
+ **Version**: 1.3.0-beta.7
517
+ **Built by**: Shariq Riaz
518
+ **Last Updated**: January 2025
package/dist/cli.js CHANGED
File without changes
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "codevault",
3
- "version": "1.3.0-beta.7",
3
+ "version": "1.3.0",
4
4
  "description": "AI-powered semantic code search via Model Context Protocol",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -16,12 +16,12 @@
16
16
  ],
17
17
  "repository": {
18
18
  "type": "git",
19
- "url": "git+https://github.com/shariqriazz/codevault-v2.git"
19
+ "url": "git+https://github.com/shariqriazz/codevault.git"
20
20
  },
21
21
  "bugs": {
22
- "url": "https://github.com/shariqriazz/codevault-v2/issues"
22
+ "url": "https://github.com/shariqriazz/codevault/issues"
23
23
  },
24
- "homepage": "https://github.com/shariqriazz/codevault-v2#readme",
24
+ "homepage": "https://github.com/shariqriazz/codevault#readme",
25
25
  "scripts": {
26
26
  "build": "tsc",
27
27
  "dev": "tsc --watch",