kontext-engine 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Astrolight AI
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,451 @@
1
+ # ctx — Context Engine for AI Coding Agents
2
+
3
+ > Give your AI coding agent deep understanding of any codebase.
4
+ > No plugins, no MCP — just a CLI.
5
+
6
+ Any agent that can run bash can use `ctx`. Zero integration required.
7
+
8
+ ```bash
9
+ ctx init # Index your codebase (~5s for 1K files)
10
+ ctx query "authentication middleware" # Multi-strategy code search
11
+ ctx ask "how does the auth middleware validate tokens?" # LLM-steered natural language search
12
+ ```
13
+
14
+ ---
15
+
16
+ ## Why
17
+
18
+ AI coding agents are blind. They either read the whole codebase (blows context windows), rely on grep (misses semantic meaning), or need hand-crafted AGENTS.md files that don't scale.
19
+
20
+ `ctx` fixes this. One command indexes your codebase into a local SQLite database. Searches can combine up to **five strategies** (vector similarity, full-text, AST symbol lookup, path matching, and dependency tracing) and fuse the results with Reciprocal Rank Fusion.
21
+
22
+ The result: your agent gets exactly the right files and line ranges, in milliseconds.
23
+
24
+ ---
25
+
26
+ ## Features
27
+
28
+ - **🔍 Semantic search** — vector embeddings via `all-MiniLM-L6-v2` (runs 100% locally)
29
+ - **📝 Full-text search** — SQLite FTS5 with BM25 ranking
30
+ - **🌳 AST-aware symbol lookup** — Tree-sitter parsing for functions, classes, types, imports
31
+ - **📁 Path & dependency tracing** — glob matching + BFS dependency graph traversal
32
+ - **🤖 LLM-steered queries** — Gemini / OpenAI / Anthropic turn natural language into precise multi-strategy search plans
33
+ - **⚡ Incremental indexing** — SHA-256 hash comparison, only re-indexes changed files
34
+ - **👁️ File watching** — `ctx watch` auto re-indexes on save
35
+ - **🏠 100% local** — your code never leaves your machine (unless you opt into API embeddings)
36
+
37
+ ---
38
+
39
+ ## Installation
40
+
41
+ ```bash
42
+ npm install -g kontext-engine
43
+
44
+ # Or run directly
45
+ npx kontext-engine init
46
+ ```
47
+
48
+ Requires **Node.js 20+**.
49
+
50
+ ---
51
+
52
+ ## Quickstart
53
+
54
+ ```bash
55
+ # 1. Index your project
56
+ cd my-project
57
+ ctx init
58
+
59
+ # 2. Search (JSON output — perfect for agents)
60
+ ctx query "error handling"
61
+
62
+ # 3. Search (human-readable text)
63
+ ctx query "error handling" -f text
64
+
65
+ # 4. LLM-steered natural language search (needs API key)
66
+ export CTX_GEMINI_KEY=your-key # or CTX_OPENAI_KEY / CTX_ANTHROPIC_KEY
67
+ ctx ask "how does the payment flow handle failed charges?"
68
+
69
+ # 5. Watch mode — auto re-index on file changes
70
+ ctx watch
71
+ ```
72
+
73
+ ---
74
+
75
+ ## CLI Reference
76
+
77
+ ### `ctx init [path]`
78
+
79
+ Index a codebase. Discovers files, parses ASTs, creates chunks, generates embeddings, stores everything in `.ctx/index.db`.
80
+
81
+ ```bash
82
+ ctx init # Index current directory
83
+ ctx init ./my-project # Index specific path
84
+ ```
85
+
86
+ Runs incrementally on subsequent calls — only processes changed files.
87
+
88
+ ### `ctx query <query>`
89
+
90
+ Multi-strategy code search. Default output is JSON (agent-friendly).
91
+
92
+ ```bash
93
+ ctx query "authentication"
94
+ ctx query "auth" -f text # Human-readable output
95
+ ctx query "auth" -s fts,ast # Specific strategies
96
+ ctx query "auth" -l 20 # Limit results
97
+ ctx query "auth" --language typescript # Filter by language
98
+ ```
99
+
100
+ **Options:**
101
+
102
+ | Flag | Description | Default |
103
+ |---|---|---|
104
+ | `-f, --format <fmt>` | Output format: `json` or `text` | `json` |
105
+ | `-s, --strategy <list>` | Comma-separated: `vector,fts,ast,path` | `fts,ast` |
106
+ | `-l, --limit <n>` | Maximum results | `10` |
107
+ | `--language <lang>` | Filter by language | all |
108
+ | `--no-vectors` | Skip vector search | — |
109
+
110
+ **JSON output (for agents):**
111
+
112
+ ```json
113
+ {
114
+ "query": "authentication",
115
+ "results": [
116
+ {
117
+ "file": "src/middleware/auth.ts",
118
+ "lineStart": 14,
119
+ "lineEnd": 89,
120
+ "name": "validateToken",
121
+ "type": "function",
122
+ "score": 0.94,
123
+ "language": "typescript",
124
+ "text": "export async function validateToken(token: string) { ... }"
125
+ }
126
+ ],
127
+ "searchTimeMs": 12,
128
+ "totalResults": 3
129
+ }
130
+ ```
131
+
132
+ **Text output (for humans):**
133
+
134
+ ```
135
+ Query: "authentication"
136
+
137
+ src/middleware/auth.ts L14–L89 (0.94)
138
+ validateToken [function]
139
+ export async function validateToken(token: string) { ... }
140
+
141
+ src/routes/login.ts L45–L112 (0.87)
142
+ handleLogin [function]
143
+ ...
144
+
145
+ 3 results in 12ms
146
+ ```
147
+
148
+ ### `ctx find <query>`
149
+
150
+ Alias for `ctx query`. Identical behavior.
151
+
152
+ ### `ctx ask <query>`
153
+
154
+ LLM-steered natural language search. Sends your query to a steering LLM that creates a search plan; `ctx` then runs the multi-strategy search, and the LLM synthesizes an explanation of the results.
155
+
156
+ ```bash
157
+ ctx ask "how does the auth middleware validate tokens?"
158
+ ctx ask "what happens when a payment fails?" -f json
159
+ ctx ask "find all database models" --no-explain
160
+ ctx ask "auth flow" -p openai # Force specific provider
161
+ ```
162
+
163
+ **Options:**
164
+
165
+ | Flag | Description | Default |
166
+ |---|---|---|
167
+ | `-f, --format <fmt>` | Output format: `json` or `text` | `text` |
168
+ | `-l, --limit <n>` | Maximum results | `10` |
169
+ | `-p, --provider <name>` | LLM provider: `gemini`, `openai`, `anthropic` | auto-detect |
170
+ | `--no-explain` | Skip explanation, return raw search results | — |
171
+
172
+ **Requires an API key** for LLM steering (set via environment variable):
173
+
174
+ ```bash
175
+ export CTX_GEMINI_KEY=your-key # Gemini 2.0 Flash (cheapest)
176
+ export CTX_OPENAI_KEY=your-key # GPT-4o-mini
177
+ export CTX_ANTHROPIC_KEY=your-key # Claude 3.5 Haiku
178
+ ```
179
+
180
+ Falls back to basic multi-strategy search if no API key is available.
181
+
182
+ ### `ctx watch [path]`
183
+
184
+ Watch mode — monitors files and re-indexes automatically when you save.
185
+
186
+ ```bash
187
+ ctx watch # Watch current directory
188
+ ctx watch --init # Run full init first, then watch
189
+ ctx watch --debounce 1000 # Custom debounce (ms)
190
+ ctx watch --embed # Re-embed on changes (slower)
191
+ ```
192
+
193
+ **Options:**
194
+
195
+ | Flag | Description | Default |
196
+ |---|---|---|
197
+ | `--init` | Run `ctx init` before starting watch | off |
198
+ | `--debounce <ms>` | Debounce interval | `500` |
199
+ | `--embed` | Enable embedding during watch | off |
200
+
201
+ Press `Ctrl+C` to stop gracefully.
202
+
203
+ ### `ctx status [path]`
204
+
205
+ Show index statistics.
206
+
207
+ ```bash
208
+ ctx status
209
+ ```
210
+
211
+ ```
212
+ Kontext Status — /path/to/project
213
+
214
+ Initialized: Yes
215
+ Database: .ctx/index.db (14.2 MB)
216
+ Last indexed: 2025-01-15 14:30:22
217
+
218
+ Files: 847
219
+ Chunks: 3,241
220
+ Vectors: 3,241
221
+
222
+ Languages:
223
+ Typescript 420 files
224
+ Python 200 files
225
+ Javascript 127 files
226
+ Go 50 files
227
+ Rust 50 files
228
+
229
+ Embedder: local (all-MiniLM-L6-v2, 384 dims)
230
+ ```
231
+
232
+ ### `ctx config <subcommand>`
233
+
234
+ Manage project configuration stored in `.ctx/config.json`.
235
+
236
+ ```bash
237
+ ctx config show # Show full config
238
+ ctx config get search.defaultLimit # Get specific value
239
+ ctx config set search.defaultLimit 20 # Set value
240
+ ctx config set embedder.provider voyage # Switch embedder
241
+ ctx config set search.strategies '["fts","ast","vector"]'
242
+ ctx config reset # Reset to defaults
243
+ ```
244
+
245
+ Supports dot-notation for nested keys. Values are auto-parsed (numbers, booleans, JSON arrays, `null`).
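
The dot-notation and auto-parsing behaviour described above can be illustrated with a small Python sketch. This is not the package's code: `parse_value` and `set_path` are hypothetical helpers, and treating every value as JSON first, then falling back to a plain string, is only an assumption about how the auto-parsing might work.

```python
import json


def parse_value(raw: str):
    """Try to read the raw CLI argument as JSON; fall back to a plain string."""
    try:
        return json.loads(raw)  # handles numbers, booleans, null, arrays
    except json.JSONDecodeError:
        return raw  # e.g. "voyage" is not valid JSON, so keep it as text


def set_path(config: dict, dotted_key: str, raw_value: str) -> None:
    """Set a nested key such as 'search.defaultLimit', creating parents as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = parse_value(raw_value)


config = {"search": {"defaultLimit": 10}, "embedder": {"provider": "local"}}
set_path(config, "search.defaultLimit", "20")
set_path(config, "embedder.provider", "voyage")
set_path(config, "search.strategies", '["fts","ast","vector"]')
set_path(config, "llm.provider", "null")
```

After these calls, `search.defaultLimit` holds the integer `20` rather than the string `"20"`, and `llm.provider` holds `None`.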
246
+
247
+ ### Global Options
248
+
249
+ | Flag | Description |
250
+ |---|---|
251
+ | `--verbose` | Enable debug output (stderr) |
252
+ | `--version` | Show version |
253
+ | `--help` | Show help |
254
+
255
+ Debug logging can also be enabled by setting `CTX_DEBUG=1`.
256
+
257
+ ---
258
+
259
+ ## Configuration
260
+
261
+ Configuration lives in `.ctx/config.json`, created automatically by `ctx init`.
262
+
263
+ ```json
264
+ {
265
+ "embedder": {
266
+ "provider": "local",
267
+ "model": "Xenova/all-MiniLM-L6-v2",
268
+ "dimensions": 384
269
+ },
270
+ "search": {
271
+ "defaultLimit": 10,
272
+ "strategies": ["vector", "fts", "ast", "path"],
273
+ "weights": {
274
+ "vector": 1.0,
275
+ "fts": 0.8,
276
+ "ast": 0.9,
277
+ "path": 0.7,
278
+ "dependency": 0.6
279
+ }
280
+ },
281
+ "watch": {
282
+ "debounceMs": 500,
283
+ "ignored": []
284
+ },
285
+ "llm": {
286
+ "provider": null,
287
+ "model": null
288
+ }
289
+ }
290
+ ```
291
+
292
+ ### Embedder providers
293
+
294
+ | Provider | Model | Dimensions | Cost | Notes |
295
+ |---|---|---|---|---|
296
+ | `local` | all-MiniLM-L6-v2 | 384 | Free | Default. Runs on CPU via ONNX Runtime. |
297
+ | `voyage` | voyage-code-3 | 1024 | API pricing | Higher quality for code search. |
298
+ | `openai` | text-embedding-3-small | 1536 | API pricing | OpenAI's smallest embedding model. |
299
+
300
+ ### Search strategies
301
+
302
+ | Strategy | What it does | Best for |
303
+ |---|---|---|
304
+ | `vector` | KNN cosine similarity on embeddings | Semantic/conceptual search |
305
+ | `fts` | SQLite FTS5 full-text search with BM25 | Keyword/exact term search |
306
+ | `ast` | Symbol name/type/parent matching | Finding specific functions, classes, types |
307
+ | `path` | Glob-pattern file path matching | Finding files by name or directory |
308
+ | `dependency` | BFS traversal of import/require graph | Tracing what depends on what |
309
+
310
+ Results from all strategies are fused using **Reciprocal Rank Fusion (RRF)** with K=60 and per-strategy weights.
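
As a rough illustration of weighted Reciprocal Rank Fusion, here is a minimal Python sketch. K=60 and the weights come from this README; multiplying each reciprocal-rank term by the strategy weight is an assumption about how the weights are applied, and `rrf_fuse` and the chunk ids are hypothetical.

```python
from collections import defaultdict

RRF_K = 60  # smoothing constant stated in the README


def rrf_fuse(rankings, weights, k=RRF_K):
    """Fuse per-strategy rankings (best first) into one list of (id, score)."""
    scores = defaultdict(float)
    for strategy, ranked_ids in rankings.items():
        weight = weights.get(strategy, 1.0)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            # Each strategy contributes weight / (k + rank) for every hit.
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


rankings = {
    "fts": ["auth.ts:14", "login.ts:45", "session.ts:3"],
    "ast": ["login.ts:45", "auth.ts:14"],
}
weights = {"fts": 0.8, "ast": 0.9}
fused = rrf_fuse(rankings, weights)
# login.ts:45 ranks first: its top position comes from the higher-weighted strategy.
```

Only ranks enter the formula, so BM25 scores and cosine similarities never need to be put on a common scale.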
311
+
312
+ ---
313
+
314
+ ## Architecture
315
+
316
+ ```
317
+ ┌─────────────────────────────────────────────────────────┐
318
+ │ ctx CLI │
319
+ ├──────────┬──────────────┬───────────────┬───────────────┤
320
+ │ Indexer │ Search Engine │ Steering LLM │ File Watcher │
321
+ ├──────────┴──────────────┴───────────────┴───────────────┤
322
+ │ Storage (SQLite) │
323
+ └─────────────────────────────────────────────────────────┘
324
+ ```
325
+
326
+ ### Indexing pipeline
327
+
328
+ ```
329
+ Source Files → Discovery → Tree-sitter AST → Logical Chunks → Embeddings → SQLite
330
+ │ │ │ │
331
+ │ ├── functions ├── chunk text ├── vectors (sqlite-vec)
332
+ │ ├── classes ├── file path ├── FTS5 index
333
+ │ ├── imports ├── line range ├── AST metadata
334
+ │ └── types └── language └── file hashes
335
+
336
+ ├── .gitignore / .ctxignore filtering
337
+ └── 30+ language extensions
338
+ ```
339
+
340
+ 1. **Discovery** — recursive file scan, respects `.gitignore` and `.ctxignore`, filters by 30+ language extensions
341
+ 2. **Parsing** — Tree-sitter extracts functions, classes, methods, types, imports, constants with line ranges and docstrings
342
+ 3. **Chunking** — splits files into logical code units (not arbitrary line windows). Functions stay whole. Related imports group together. Small constants merge.
343
+ 4. **Embedding** — `all-MiniLM-L6-v2` via ONNX Runtime (384-dimensional vectors, runs locally)
344
+ 5. **Storage** — SQLite with sqlite-vec for vector KNN, FTS5 for full-text, plus metadata tables
345
+
346
+ ### Search pipeline
347
+
348
+ ```
349
+ Query → [Steering LLM] → Strategy Selection → Parallel Search → RRF Fusion → Ranked Results
350
+ │ │
351
+ │ ├── Vector similarity (KNN)
352
+ │ ├── Full-text search (BM25)
353
+ │ ├── AST symbol lookup
354
+ │ ├── Path glob matching
355
+ │ └── Dependency tracing (BFS)
356
+
357
+ └── Optional: interprets query, picks strategies,
358
+ synthesizes explanation after search
359
+ ```
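
The dependency-tracing branch in the diagram above is described as a BFS over the import graph. A minimal Python sketch of that idea follows; the graph shape, the `max_depth` cut-off, and the `trace_dependencies` name are assumptions made for illustration, not details of the actual implementation.

```python
from collections import deque


def trace_dependencies(graph, start, max_depth=2):
    """Breadth-first walk over {file: [imported files]}, returning {file: depth}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(current, []):
            if neighbor not in depths:  # first visit is the shortest path
                depths[neighbor] = depths[current] + 1
                queue.append(neighbor)
    return depths


graph = {
    "routes/login.ts": ["middleware/auth.ts", "db/users.ts"],
    "middleware/auth.ts": ["utils/jwt.ts"],
    "utils/jwt.ts": ["utils/base64.ts"],
}
reachable = trace_dependencies(graph, "routes/login.ts")
```

With `max_depth=2`, `utils/base64.ts` is left out because it sits three imports away from the starting file.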
360
+
361
+ ### Key design decisions
362
+
363
+ - **SQLite for everything** — vectors, FTS, metadata, all in one file (`.ctx/index.db`). Zero infrastructure.
364
+ - **Tree-sitter for AST** — language-agnostic parsing via WebAssembly grammars. Supports TypeScript, JavaScript, Python, Go, Rust, Java, C, C++, and more.
365
+ - **Logical chunking** — chunks follow code structure (functions, classes, type blocks), not arbitrary line windows. This gives better search quality and more useful results.
366
+ - **RRF fusion** — combines results from multiple strategies without needing to normalize scores across different metrics. Simple, effective, well-studied.
367
+ - **Incremental by default** — SHA-256 content hashing means re-indexing only processes files that actually changed.
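
The incremental design can be pictured as comparing fresh SHA-256 digests with the ones stored at the last index. The Python sketch below shows that comparison under stated assumptions: `plan_reindex` and the in-memory dictionaries are hypothetical stand-ins for whatever the indexer keeps in SQLite.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_reindex(current_files, stored_hashes):
    """Compare {path: bytes} on disk with {path: digest} from the last index.

    Returns (changed_or_new, deleted) as sorted lists of paths.
    """
    changed = sorted(
        path
        for path, data in current_files.items()
        if stored_hashes.get(path) != sha256_hex(data)
    )
    deleted = sorted(path for path in stored_hashes if path not in current_files)
    return changed, deleted


stored = {
    "a.ts": sha256_hex(b"const a = 1"),
    "b.ts": sha256_hex(b"old body"),
    "gone.ts": sha256_hex(b"removed later"),
}
current = {"a.ts": b"const a = 1", "b.ts": b"new body", "c.ts": b"added"}
changed, deleted = plan_reindex(current, stored)
```

Here only `b.ts` and `c.ts` would be re-parsed and re-embedded, while `a.ts` is skipped and `gone.ts` is flagged for removal.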
368
+
369
+ ---
370
+
371
+ ## For AI Agent Authors
372
+
373
+ `ctx` is designed to be called from any AI coding agent via shell. No SDK, no API server, no MCP needed.
374
+
375
+ ### Integration pattern
376
+
377
+ ```bash
378
+ # Your agent runs this in bash:
379
+ ctx query "authentication middleware" -f json
380
+
381
+ # Parse the JSON output, use the file paths and line ranges
382
+ # to read exactly the right code into the agent's context window.
383
+ ```
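
One way an agent might consume that JSON is sketched below in Python. The field names match the sample output in this README; treating `lineStart` and `lineEnd` as 1-based and inclusive is an assumption, and `top_ranges` and `slice_lines` are hypothetical helpers.

```python
import json


def top_ranges(ctx_json: str, limit: int = 3):
    """Return (file, lineStart, lineEnd) for the highest-scoring results."""
    payload = json.loads(ctx_json)
    results = sorted(payload["results"], key=lambda r: r["score"], reverse=True)
    return [(r["file"], r["lineStart"], r["lineEnd"]) for r in results[:limit]]


def slice_lines(source: str, line_start: int, line_end: int) -> str:
    """Extract an inclusive, 1-based line range from file contents."""
    lines = source.splitlines()
    return "\n".join(lines[line_start - 1:line_end])


# Sample payload copied from the JSON output shown in this README.
sample = json.dumps({
    "query": "authentication",
    "results": [{
        "file": "src/middleware/auth.ts",
        "lineStart": 14,
        "lineEnd": 89,
        "name": "validateToken",
        "type": "function",
        "score": 0.94,
        "language": "typescript",
        "text": "export async function validateToken(token: string) { ... }",
    }],
    "searchTimeMs": 12,
    "totalResults": 3,
})
ranges = top_ranges(sample)
```

In a real agent the string would come from the standard output of `ctx query "<relevant terms>" -f json`, and `slice_lines` would be applied to the contents of each returned file.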
384
+
385
+ ### Recommended agent workflow
386
+
387
+ ```
388
+ 1. Agent receives a task involving unfamiliar code
389
+ 2. Agent runs: ctx query "<relevant terms>" -f json
390
+ 3. Agent reads the top results (file paths + line ranges)
391
+ 4. Agent now has targeted context instead of the whole codebase
392
+ 5. Agent completes the task with precision
393
+ ```
394
+
395
+ ### Tips for agent integration
396
+
397
+ - Always use `-f json` for machine-readable output
398
+ - Use `-s fts,ast` for fast, embedding-free search
399
+ - Use `ctx ask` when the query is natural language and an LLM key is available
400
+ - Run `ctx init` once, then `ctx watch` in the background to keep the index fresh
401
+ - The index is stored in `.ctx/` — add it to `.gitignore` (done automatically by `ctx init`)
402
+
403
+ ### Works with
404
+
405
+ - **OpenAI Codex** (CLI)
406
+ - **Claude Code** (Anthropic)
407
+ - **Cursor** (AI IDE)
408
+ - **Aider** (terminal)
409
+ - **lxt** (coding agent)
410
+ - Any tool that can execute shell commands
411
+
412
+ ---
413
+
414
+ ## Supported Languages
415
+
416
+ TypeScript, JavaScript, Python, Go, Rust, Java, C, C++, C#, Ruby, PHP, Swift, Kotlin, Scala, Haskell, Lua, R, Dart, Elixir, Shell, SQL, HTML, CSS, SCSS, Vue, Svelte, JSON, YAML, TOML, Markdown, and more.
417
+
418
+ ---
419
+
420
+ ## Project Structure
421
+
422
+ ```
423
+ src/
424
+ ├── cli/ # CLI commands (init, query, ask, watch, status, config)
425
+ ├── indexer/ # File discovery, Tree-sitter parsing, chunking, embedding
426
+ ├── search/ # Vector, FTS, AST, path, dependency search + RRF fusion
427
+ ├── steering/ # LLM integration (Gemini, OpenAI, Anthropic)
428
+ ├── storage/ # SQLite database, sqlite-vec vectors
429
+ ├── watcher/ # File watching with chokidar
430
+ └── utils/ # Error handling, logging
431
+ ```
432
+
433
+ ---
434
+
435
+ ## Development
436
+
437
+ ```bash
438
+ git clone https://github.com/example/kontext.git
439
+ cd kontext
440
+ npm install
441
+ npm run build # Build with tsup
442
+ npm run test # Run tests (vitest)
443
+ npm run lint # Lint (eslint)
444
+ npm run typecheck # Type check (tsc --noEmit)
445
+ ```
446
+
447
+ ---
448
+
449
+ ## License
450
+
451
+ MIT