kontext-engine 0.1.3 → 0.1.4
- package/README.md +123 -83
- package/dist/cli/index.js +594 -344
- package/dist/cli/index.js.map +1 -1
- package/dist/index.js +374 -85
- package/dist/index.js.map +1 -1
- package/package.json +3 -2
package/README.md
CHANGED
@@ -1,7 +1,7 @@
-# ctx
+# ctx - Context Engine for AI Coding Agents
 
 > Give your AI coding agent deep understanding of any codebase.
-> No plugins, no MCP
+> No plugins, no MCP - just a CLI.
 
 Any agent that can run bash can use `ctx`. Zero integration required.
 
@@ -17,7 +17,7 @@ ctx ask "how does the auth middleware validate tokens?" # LLM-steered natural l
 
 AI coding agents are blind. They either read the whole codebase (blows context windows), rely on grep (misses semantic meaning), or need hand-crafted AGENTS.md files that don't scale.
 
-`ctx` fixes this. One command indexes your codebase into a local SQLite database. Every search combines **five strategies**
+`ctx` fixes this. One command indexes your codebase into a local SQLite database. Every search combines **five strategies** - vector similarity, full-text, AST symbol lookup, path matching, and dependency tracing - then fuses the results with Reciprocal Rank Fusion.
 
 The result: your agent gets exactly the right files and line ranges, in milliseconds.
 
@@ -25,24 +25,27 @@ The result: your agent gets exactly the right files and line ranges, in millisec
 
 ## Features
 
--
--
--
--
--
--
--
--
+- **Multi-strategy search** - five search strategies fused with Reciprocal Rank Fusion (RRF)
+- **Semantic search** - vector embeddings via `all-MiniLM-L6-v2` (runs 100% locally)
+- **Full-text search** - SQLite FTS5 with BM25 ranking, sanitized query handling for special characters
+- **AST-aware symbol lookup** - Tree-sitter parsing for functions, classes, types, imports across 30+ languages
+- **Path and dependency tracing** - glob matching + BFS dependency graph traversal
+- **LLM-steered queries** - Gemini / OpenAI / Anthropic turn natural language into precise multi-strategy search plans
+- **Smart result ranking** - import deprioritization, test file penalty, small snippet penalty, file diversity, export/public API boost
+- **Incremental indexing** - SHA-256 hash comparison, only re-indexes changed files
+- **File watching** - `ctx watch` auto re-indexes on save
+- **100% local** - your code never leaves your machine (unless you opt into API embeddings or LLM steering)
 
 ---
 
 ## Installation
 
 ```bash
-npm install -g kontext
+npm install -g kontext-engine
 
-# Or run directly
-npx kontext init
+# Or run directly (any of these work)
+npx kontext-engine init
+npx ctx init
 ```
 
 Requires **Node.js 20+**.
@@ -56,7 +59,7 @@ Requires **Node.js 20+**.
 cd my-project
 ctx init
 
-# 2. Search (JSON output
+# 2. Search (JSON output - perfect for agents)
 ctx query "error handling"
 
 # 3. Search (human-readable text)
@@ -66,12 +69,54 @@ ctx query "error handling" -f text
 export CTX_GEMINI_KEY=your-key # or CTX_OPENAI_KEY / CTX_ANTHROPIC_KEY
 ctx ask "how does the payment flow handle failed charges?"
 
-# 5. Watch mode
+# 5. Watch mode - auto re-index on file changes
 ctx watch
 ```
 
 ---
 
+## Search Quality
+
+`ctx` goes beyond basic search fusion. Results are ranked through multiple passes to surface the most relevant code:
+
+### Reciprocal Rank Fusion (RRF)
+
+Results from all active strategies (vector, FTS, AST, path, dependency) are combined using RRF with K=60 and per-strategy weights. This produces a unified ranking without needing to normalize scores across different metrics.
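The fusion step described above can be sketched as follows. This is a minimal illustration, assuming each strategy returns an ordered list of chunk IDs; `K = 60` comes from the README, while the `StrategyResult` type, the `fuseRrf` function, and the example weights are hypothetical and not taken from the package source.

```typescript
// Hypothetical shape: one ranked list of chunk IDs per search strategy.
interface StrategyResult {
  strategy: string;
  weight: number;      // per-strategy weight (illustrative values only)
  rankedIds: string[]; // best match first
}

const RRF_K = 60; // constant stated in the README

// Weighted RRF: score(id) = sum over strategies of weight / (K + rank),
// where rank starts at 1 for the best match in each list.
function fuseRrf(results: StrategyResult[], k: number = RRF_K): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const { weight, rankedIds } of results) {
    rankedIds.forEach((id, index) => {
      const contribution = weight / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Usage: two strategies agree on auth.ts, so it rises to the top.
const fused = fuseRrf([
  { strategy: "fts", weight: 1.0, rankedIds: ["auth.ts", "login.ts"] },
  { strategy: "ast", weight: 0.8, rankedIds: ["auth.ts", "token.ts"] },
]);
```

Because only ranks enter the formula, a BM25 score and a cosine similarity never have to be put on the same scale, which is the property the README points to.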
+
+### Path Boosting
+
+Files whose path matches the query terms get a boost:
+- **1.5x** for directory name matches (e.g., querying "indexer" boosts files in `src/indexer/`)
+- **1.4x** for filename matches
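A rough sketch of how the two multipliers above might be applied. The 1.5 and 1.4 values come from the README; the `pathBoost` function, the case-insensitive substring matching, and the rule that the directory boost takes precedence without stacking are all assumptions made for illustration.

```typescript
const DIR_BOOST = 1.5;  // directory name match, from the README
const FILE_BOOST = 1.4; // filename match, from the README

// Returns the score multiplier for a file path given the query terms.
// Assumption: case-insensitive substring match; the directory boost wins
// when both match, and the two boosts do not stack.
function pathBoost(filePath: string, queryTerms: string[]): number {
  const segments = filePath.toLowerCase().split("/").filter((s) => s.length > 0);
  const fileName = segments.pop() ?? "";
  const terms = queryTerms.map((t) => t.toLowerCase()).filter((t) => t.length > 0);

  if (segments.some((dir) => terms.some((t) => dir.includes(t)))) return DIR_BOOST;
  if (terms.some((t) => fileName.includes(t))) return FILE_BOOST;
  return 1.0;
}

// Usage
const dirMatch = pathBoost("src/indexer/chunker.ts", ["indexer"]);
const fileMatch = pathBoost("src/search/indexer.ts", ["indexer"]);
const noMatch = pathBoost("src/cli/index.ts", ["auth"]);
```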
+
+### Import Deprioritization
+
+Import blocks (import statements, require calls) receive a **0.5x penalty** when non-import results exist. This prevents import blocks from outranking actual implementations.
+
+### Test File Deprioritization
+
+Test files (`tests/`, `__tests__/`, `*.test.*`, `*.spec.*`) receive a **0.65x penalty** when non-test results exist. Test code is useful but rarely the primary answer to "how does X work?"
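The two conditional penalties above could look roughly like this. The 0.5 and 0.65 factors and the four test-path patterns are from the README; the `Ranked` type, the `isImportBlock` flag, and the function names are hypothetical.

```typescript
interface Ranked {
  path: string;
  isImportBlock: boolean; // hypothetical flag, assumed to be set during chunking
  score: number;
}

const IMPORT_PENALTY = 0.5; // from the README
const TEST_PENALTY = 0.65;  // from the README

// Matches the patterns listed in the README: tests/, __tests__/, *.test.*, *.spec.*
function isTestFile(path: string): boolean {
  return /(^|\/)(tests|__tests__)\//.test(path) || /\.(test|spec)\.[^/]+$/.test(path);
}

// Each penalty applies only when at least one result of the other kind exists,
// so a result set made up entirely of tests (or imports) keeps its scores.
function deprioritize(results: Ranked[]): Ranked[] {
  const hasNonImport = results.some((r) => !r.isImportBlock);
  const hasNonTest = results.some((r) => !isTestFile(r.path));
  return results
    .map((r) => {
      let score = r.score;
      if (hasNonImport && r.isImportBlock) score *= IMPORT_PENALTY;
      if (hasNonTest && isTestFile(r.path)) score *= TEST_PENALTY;
      return { ...r, score };
    })
    .sort((a, b) => b.score - a.score);
}

// Usage: the implementation overtakes the higher-scoring test file.
const mixed = deprioritize([
  { path: "src/auth.test.ts", isImportBlock: false, score: 1.0 },
  { path: "src/auth.ts", isImportBlock: false, score: 0.8 },
]);
const testsOnly = deprioritize([{ path: "tests/auth.ts", isImportBlock: false, score: 1.0 }]);
```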
+
+### Small Snippet Penalty
+
+Results spanning only 1-3 lines (bare constants, trivial type aliases) get a mild penalty. A `const MAX_RETRIES = 3` should not outrank the retry logic itself.
+
+### File Diversity
+
+Diminishing returns per file prevent one file from dominating results:
+- 1st result from a file: 1.0x
+- 2nd result: 0.9x
+- 3rd result: 0.8x
+- 4th+: 0.7x
+
+This ensures results spread across the codebase, giving broader context.
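One way to read the diminishing-returns table above, as a sketch. The four factors are from the README; the `Hit` type, the `applyFileDiversity` name, and the choice to count occurrences in descending score order before re-sorting are assumptions.

```typescript
interface Hit {
  path: string;
  score: number;
}

const DIVERSITY_FACTORS = [1.0, 0.9, 0.8]; // 1st, 2nd, 3rd result from the same file
const DIVERSITY_FLOOR = 0.7;               // 4th and later

// Walks the hits best-first, scales each one by how many hits from the same
// file came before it, then re-sorts by the adjusted score.
function applyFileDiversity(hits: Hit[]): Hit[] {
  const seen = new Map<string, number>();
  const sorted = hits.slice().sort((a, b) => b.score - a.score);
  const adjusted = sorted.map((hit) => {
    const count = seen.get(hit.path) ?? 0;
    seen.set(hit.path, count + 1);
    const factor = DIVERSITY_FACTORS[count] ?? DIVERSITY_FLOOR;
    return { ...hit, score: hit.score * factor };
  });
  return adjusted.sort((a, b) => b.score - a.score);
}

// Usage: the second hit from a.ts drops below the single hit from b.ts.
const diversified = applyFileDiversity([
  { path: "a.ts", score: 1.0 },
  { path: "a.ts", score: 0.9 },
  { path: "b.ts", score: 0.85 },
]);
```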
+
+### Export Boost
+
+Exported/public API symbols get a mild boost over internal helpers. When you ask about "chunking", the exported `chunkFile()` function ranks higher than the private `canMerge()` helper.
+
+---
+
 ## CLI Reference
 
 ### `ctx init [path]`
@@ -83,7 +128,7 @@ ctx init # Index current directory
 ctx init ./my-project # Index specific path
 ```
 
-Runs incrementally on subsequent calls
+Runs incrementally on subsequent calls - only processes changed files.
 
 ### `ctx query <query>`
 
@@ -102,10 +147,10 @@ ctx query "auth" --language typescript # Filter by language
 | Flag | Description | Default |
 |---|---|---|
 | `-f, --format <fmt>` | Output format: `json` or `text` | `json` |
-| `-s, --strategy <list>` | Comma-separated: `vector,fts,ast,path` | `fts,ast` |
+| `-s, --strategy <list>` | Comma-separated: `vector,fts,ast,path` | `fts,ast,path` |
 | `-l, --limit <n>` | Maximum results | `10` |
 | `--language <lang>` | Filter by language | all |
-| `--no-vectors` | Skip vector search |
+| `--no-vectors` | Skip vector search | - |
 
 **JSON output (for agents):**
 
@@ -134,11 +179,11 @@ ctx query "auth" --language typescript # Filter by language
 ```
 Query: "authentication"
 
-src/middleware/auth.ts L14
+src/middleware/auth.ts L14-L89 (0.94)
 validateToken [function]
 export async function validateToken(token: string) { ... }
 
-src/routes/login.ts L45
+src/routes/login.ts L45-L112 (0.87)
 handleLogin [function]
 ...
 
@@ -167,7 +212,7 @@ ctx ask "auth flow" -p openai # Force specific provider
 | `-f, --format <fmt>` | Output format: `json` or `text` | `text` |
 | `-l, --limit <n>` | Maximum results | `10` |
 | `-p, --provider <name>` | LLM provider: `gemini`, `openai`, `anthropic` | auto-detect |
-| `--no-explain` | Skip explanation, return raw search results |
+| `--no-explain` | Skip explanation, return raw search results | - |
 
 **Requires an API key** (set via environment variable):
 
@@ -177,11 +222,13 @@ export CTX_OPENAI_KEY=your-key # GPT-4o-mini
 export CTX_ANTHROPIC_KEY=your-key # Claude 3.5 Haiku
 ```
 
-Falls back to
+Falls back to keyword-based multi-strategy search if no API key is available. A warning is shown when no LLM provider is detected.
+
+**Natural language handling:** Queries like "how does the indexer work?" are automatically processed - stop words are stripped, code identifiers (camelCase, snake_case, dotted names like `fs.readFileSync`) are preserved, and the cleaned terms are used across all search strategies.
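The query clean-up described above might be approximated like this. The sketch is illustrative only: the stop-word list, the identifier heuristics, and the `extractSearchTerms` name are assumptions, since the package's actual list and rules are not shown in this diff.

```typescript
// Illustrative stop-word list; the package's real list is not part of this diff.
const STOP_WORDS = new Set([
  "how", "does", "do", "the", "a", "an", "is", "are",
  "what", "where", "which", "in", "of", "to", "for",
]);

// Heuristic: camelCase, snake_case, and dotted names count as code identifiers.
function looksLikeIdentifier(token: string): boolean {
  return /[a-z][A-Z]/.test(token) || token.includes("_") || /\w\.\w/.test(token);
}

function extractSearchTerms(query: string): string[] {
  return query
    .split(/\s+/)
    .map((raw) => raw.replace(/^[^\w]+|[^\w]+$/g, "")) // trim surrounding punctuation
    .filter((token) => token.length > 0)
    // Identifiers are always kept; other tokens are dropped if they are stop words.
    .filter((token) => looksLikeIdentifier(token) || !STOP_WORDS.has(token.toLowerCase()))
    // Identifiers keep their original casing; plain words are lowercased.
    .map((token) => (looksLikeIdentifier(token) ? token : token.toLowerCase()));
}

// Usage
const plainTerms = extractSearchTerms("how does the indexer work?");
const codeTerms = extractSearchTerms("Where is fs.readFileSync called in parse_file?");
```

Keeping identifiers verbatim matters because lowercasing or splitting `fs.readFileSync` would stop it from matching symbol names exactly.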
 
 ### `ctx watch [path]`
 
-Watch mode
+Watch mode - monitors files and re-indexes automatically when you save.
 
 ```bash
 ctx watch # Watch current directory
@@ -209,7 +256,7 @@ ctx status
 ```
 
 ```
-Kontext Status
+Kontext Status - /path/to/project
 
 Initialized: Yes
 Database: .ctx/index.db (14.2 MB)
@@ -304,67 +351,58 @@ Configuration lives in `.ctx/config.json`, created automatically by `ctx init`.
 | `vector` | KNN cosine similarity on embeddings | Semantic/conceptual search |
 | `fts` | SQLite FTS5 full-text search with BM25 | Keyword/exact term search |
 | `ast` | Symbol name/type/parent matching | Finding specific functions, classes, types |
-| `path` | Glob-pattern file path matching | Finding files by name or directory |
+| `path` | Glob-pattern and keyword file path matching | Finding files by name or directory |
 | `dependency` | BFS traversal of import/require graph | Tracing what depends on what |
 
-
+Default strategies are `fts,ast,path`. Vector search is opt-in (add `vector` to the strategy list or configure in `.ctx/config.json`). Dependency tracing runs when queries match dependency patterns.
+
+Results from all strategies are fused using **Reciprocal Rank Fusion (RRF)** with K=60 and per-strategy weights, then re-ranked with path boosting, import/test deprioritization, file diversity, and export boosting.
 
 ---
 
 ## Architecture
 
-
-
-
-
-
-├──────────┴──────────────┴───────────────┴───────────────┤
-│ Storage (SQLite) │
-└─────────────────────────────────────────────────────────┘
-```
+| Layer | Components |
+|---|---|
+| **CLI** | `ctx init` / `ctx query` / `ctx ask` / `ctx watch` / `ctx status` / `ctx config` |
+| **Engine** | Indexer - Search Engine - Steering LLM - File Watcher |
+| **Storage** | SQLite (sqlite-vec vectors + FTS5 full-text + metadata) |
 
 ### Indexing pipeline
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-2. **Parsing** — Tree-sitter extracts functions, classes, methods, types, imports, constants with line ranges and docstrings
-3. **Chunking** — splits files into logical code units (not arbitrary line windows). Functions stay whole. Related imports group together. Small constants merge.
-4. **Embedding** — `all-MiniLM-L6-v2` via ONNX Runtime (384-dimensional vectors, runs locally)
-5. **Storage** — SQLite with sqlite-vec for vector KNN, FTS5 for full-text, plus metadata tables
+| Stage | What it does | Output |
+|---|---|---|
+| **Discovery** | Recursive file scan, respects `.gitignore` / `.ctxignore`, 30+ language extensions | File list |
+| **Parsing** | Tree-sitter extracts functions, classes, methods, types, imports, constants | AST nodes with line ranges |
+| **Chunking** | Groups nodes into logical code units, merges small chunks, keeps functions whole | Chunks with metadata |
+| **Embedding** | `all-MiniLM-L6-v2` via ONNX Runtime (384-dim vectors, runs locally) | Vector embeddings |
+| **Storage** | Writes to SQLite: sqlite-vec for KNN, FTS5 for full-text, plus file hashes | `.ctx/index.db` |
+
+1. **Discovery** - recursive file scan, respects `.gitignore` and `.ctxignore`, filters by 30+ language extensions
+2. **Parsing** - Tree-sitter extracts functions, classes, methods, types, imports, constants with line ranges and docstrings
+3. **Chunking** - splits files into logical code units (not arbitrary line windows). Functions stay whole. Related imports group together. Small constants merge.
+4. **Embedding** - `all-MiniLM-L6-v2` via ONNX Runtime (384-dimensional vectors, runs locally)
+5. **Storage** - SQLite with sqlite-vec for vector KNN, FTS5 for full-text, plus metadata tables
 
 ### Search pipeline
 
-
-
-
-
-
-
-
-
-│
-└── Optional: interprets query, picks strategies,
-synthesizes explanation after search
-```
+| Step | Description |
+|---|---|
+| **1. Query input** | Raw user query (natural language or code terms) |
+| **2. Steering (optional)** | LLM interprets query, selects strategies, optimizes search terms |
+| **3. Parallel search** | Runs selected strategies simultaneously: Vector (KNN), FTS (BM25), AST (symbol lookup), Path (glob/keyword), Dependency (BFS) |
+| **4. RRF Fusion** | Reciprocal Rank Fusion combines results across strategies (K=60, per-strategy weights) |
+| **5. Re-ranking** | Path boosting, import penalty, test file penalty, snippet penalty, file diversity, export boost |
+| **6. Synthesis (optional)** | LLM generates a concise explanation referencing specific files and line numbers |
 
 ### Key design decisions
 
-- **SQLite for everything**
-- **Tree-sitter for AST**
-- **Logical chunking**
-- **RRF fusion**
-- **
+- **SQLite for everything** - vectors, FTS, metadata, all in one file (`.ctx/index.db`). Zero infrastructure.
+- **Tree-sitter for AST** - language-agnostic parsing via WebAssembly grammars. Supports TypeScript, JavaScript, Python, Go, Rust, Java, C, C++, and more.
+- **Logical chunking** - chunks follow code structure (functions, classes, type blocks), not arbitrary line windows. This gives better search quality and more useful results.
+- **RRF fusion** - combines results from multiple strategies without needing to normalize scores across different metrics. Simple, effective, well-studied.
+- **Multi-pass re-ranking** - after fusion, results go through path boosting, import/test/snippet deprioritization, file diversity balancing, and export boosting for consistently relevant output.
+- **Incremental by default** - SHA-256 content hashing means re-indexing only processes files that actually changed.
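The last bullet above, hash-based incremental indexing, can be illustrated with a short sketch. It assumes Node.js and its built-in `node:crypto` module; `contentHash`, `findChangedFiles`, and the in-memory maps are hypothetical stand-ins for whatever the package stores in `.ctx/index.db`.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a file's content.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical helper: compares current contents against the hashes stored
// by the previous run and returns the paths that are new or modified.
function findChangedFiles(
  files: Map<string, string>,        // path -> current content
  storedHashes: Map<string, string>, // path -> hash from the previous run
): string[] {
  const changed: string[] = [];
  files.forEach((content, path) => {
    if (storedHashes.get(path) !== contentHash(content)) changed.push(path);
  });
  return changed;
}

// Usage: a.ts is unchanged, b.ts is new, so only b.ts needs re-indexing.
const stored = new Map([["a.ts", contentHash("const a = 1;")]]);
const current = new Map([
  ["a.ts", "const a = 1;"],
  ["b.ts", "const b = 2;"],
]);
const toReindex = findChangedFiles(current, stored);
```

This sketch does not handle deleted files; a fuller version would also report paths present in `storedHashes` but missing from `files`.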
 
 ---
 
@@ -395,10 +433,10 @@ ctx query "authentication middleware" -f json
 ### Tips for agent integration
 
 - Always use `-f json` for machine-readable output
--
+- Default strategies (`fts,ast,path`) work great without embeddings
 - Use `ctx ask` when the query is natural language and an LLM key is available
 - Run `ctx init` once, then `ctx watch` in the background to keep the index fresh
-- The index is stored in `.ctx/`
+- The index is stored in `.ctx/` - add it to `.gitignore` (done automatically by `ctx init`)
 
 ### Works with
 
@@ -406,7 +444,8 @@ ctx query "authentication middleware" -f json
 - **Claude Code** (Anthropic)
 - **Cursor** (AI IDE)
 - **Aider** (terminal)
-- **
+- **Windsurf** (AI IDE)
+- **LXT** (coding agent)
 - Any tool that can execute shell commands
 
 ---
@@ -421,13 +460,13 @@ TypeScript, JavaScript, Python, Go, Rust, Java, C, C++, C#, Ruby, PHP, Swift, Ko
 
 ```
 src/
-
-
-
-
-
-
-
+cli/ # CLI commands (init, query, ask, watch, status, config)
+indexer/ # File discovery, Tree-sitter parsing, chunking, embedding
+search/ # Vector, FTS, AST, path, dependency search + RRF fusion + re-ranking
+steering/ # LLM integration and prompts (Gemini, OpenAI, Anthropic)
+storage/ # SQLite database, sqlite-vec vectors
+watcher/ # File watching with chokidar
+utils/ # Error handling, logging
 ```
 
 ---
@@ -435,13 +474,14 @@ src/
 ## Development
 
 ```bash
-git clone https://github.com/
-cd
+git clone https://github.com/LuciferMornens/context-engine.git
+cd context-engine
 npm install
 npm run build # Build with tsup
-npm run test # Run tests (vitest)
+npm run test # Run tests (vitest) - 369 tests
 npm run lint # Lint (eslint)
 npm run typecheck # Type check (tsc --noEmit)
+npm run check # All of the above
 ```
 
 ---