cto-ai-cli 7.1.0 → 8.0.1
- package/README.md +124 -56
- package/dist/cli/index.js +2018 -34
- package/dist/engine/index.d.ts +826 -3
- package/dist/engine/index.js +3078 -133
- package/dist/mcp/index.js +1978 -34
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -2,17 +2,20 @@
|
|
|
2
2
|
|
|
3
3
|
[](https://www.npmjs.com/package/cto-ai-cli)
|
|
4
4
|
[](LICENSE)
|
|
5
|
-
[](.)
|
|
6
6
|
|
|
7
|
-
**
|
|
7
|
+
**The most complete AI context selection engine in open source.** Picks the right code *chunks* (not just files), auto-redacts secrets, learns from feedback. 18 signals. Zero AI dependencies.
|
|
8
8
|
|
|
9
9
|
```bash
|
|
10
|
-
cto --context "fix the
|
|
11
|
-
cto --context "fix auth" --prompt "Refactor to use JWT" # → AI prompt
|
|
12
|
-
cto --accept # → learns
|
|
10
|
+
cto --context "fix the seller info cache invalidation on KVS delete" --stdout | pbcopy
|
|
13
11
|
```
|
|
14
12
|
|
|
15
|
-
|
|
13
|
+
```
|
|
14
|
+
→ 166 relevant chunks from 59 files (26K tokens, 0 secrets)
|
|
15
|
+
→ Full chain: DeleteEndpoint → Router → UseCase → CacheService → KvsRepository
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
202KB package · 1,133 tests · 96 source modules · Zero AI dependencies.
|
|
16
19
|
|
|
17
20
|
---
|
|
18
21
|
|
|
@@ -35,18 +38,31 @@ This runs a self-contained presentation that shows: project analysis, semantic m
|
|
|
35
38
|
|
|
36
39
|
## Benchmark Results
|
|
37
40
|
|
|
38
|
-
|
|
41
|
+
**Eval Harness v8.0** — 20-file Java enterprise project, 4 tasks with expert-labeled ground truth:
|
|
39
42
|
|
|
40
|
-
|
|
|
43
|
+
| Metric | Result |
|
|
44
|
+
|---|---|
|
|
45
|
+
| **Must-have recall** | **100%** (every critical file found) |
|
|
46
|
+
| **Precision** | **38–44%** |
|
|
47
|
+
| **F1** | **55%** |
|
|
48
|
+
| **Noise rate** | **11.3%** |
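The metrics in the table above can be sketched as follows. This is a minimal illustration, not the harness code from the package: the `must-have` / `relevant` / `noise` labels come from the README's own description of the evaluation harness, but the function name, the exact formulas, and the choice to leave unlabeled files out of both precision and noise rate are assumptions.

```typescript
type Label = "must-have" | "relevant" | "noise";

interface EvalResult {
  mustHaveRecall: number; // must-have files selected / all must-have files
  precision: number;      // selected files labeled must-have or relevant / selected files
  noiseRate: number;      // selected files explicitly labeled noise / selected files
}

// Hypothetical helper: scores one task's selection against its labeled ground truth.
function evaluateSelection(selected: string[], groundTruth: Map<string, Label>): EvalResult {
  // De-duplicate the selection while keeping its order.
  const unique = selected.filter((file, i) => selected.indexOf(file) === i);

  let mustHaveTotal = 0;
  let mustHaveFound = 0;
  groundTruth.forEach((label, file) => {
    if (label !== "must-have") return;
    mustHaveTotal += 1;
    if (unique.indexOf(file) !== -1) mustHaveFound += 1;
  });

  let useful = 0;
  let noise = 0;
  for (const file of unique) {
    const label = groundTruth.get(file);
    if (label === "noise") noise += 1;
    else if (label !== undefined) useful += 1; // unlabeled files count toward neither
  }

  const n = unique.length;
  return {
    mustHaveRecall: mustHaveTotal === 0 ? 1 : mustHaveFound / mustHaveTotal,
    precision: n === 0 ? 0 : useful / n,
    noiseRate: n === 0 ? 0 : noise / n,
  };
}
```

Under these assumed definitions precision and noise rate need not sum to 100%, which is at least consistent with the 38–44% and 11.3% figures reported side by side.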
|
|
49
|
+
|
|
50
|
+
**Real production repos** (Java monoliths):
|
|
51
|
+
|
|
52
|
+
| Repo | Files | Without CTO | With CTO v8.0 |
|
|
41
53
|
|---|---|---|---|
|
|
42
|
-
|
|
|
54
|
+
| seller-info-service | 219 | 212 files (97%) | **166 chunks from 59 files** |
|
|
55
|
+
| sizechart-middleend | 1,719 | 230 files | **72 chunks from 37 files** |
|
|
56
|
+
| charts-backend | 1,261 | 685 files (54%) | **142 chunks from 16 files** |
|
|
57
|
+
|
|
58
|
+
**Internal benchmark** (8 tasks, own codebase):
|
|
59
|
+
|
|
60
|
+
| Strategy | Precision | Recall | F1 |
|
|
61
|
+
|---|---|---|---|
|
|
62
|
+
| **CTO + Reranker** | **96.9%** | 100% | 98.4% |
|
|
43
63
|
| TF-IDF only | 54.6% | 87.5% | 62.0% |
|
|
44
|
-
| Risk-only | 20.8% | 18.8% | 15.0% |
|
|
45
|
-
| Alphabetical | 8.3% | 31.3% | 12.9% |
|
|
46
64
|
| Random | 7.7% | 6.3% | 2.8% |
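As a quick check on the table above, F1 is the harmonic mean of precision and recall. The sketch below (function name is illustrative) reproduces the "CTO + Reranker" row from its own precision and recall. The other rows do not reproduce this way: 54.6% precision and 87.5% recall give roughly 67%, not the listed 62.0%. One possible explanation, stated here only as an assumption, is that the F1 column is averaged per task rather than derived from the averaged columns.

```typescript
// Harmonic mean of precision and recall, both expressed as fractions in [0, 1].
function f1(precision: number, recall: number): number {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// "CTO + Reranker" row: 96.9% precision, 100% recall.
console.log((f1(0.969, 1.0) * 100).toFixed(1)); // 98.4

// "TF-IDF only" row: 54.6% precision, 87.5% recall.
console.log((f1(0.546, 0.875) * 100).toFixed(1)); // 67.2
```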
|
|
47
65
|
|
|
48
|
-
**CTO never misses a must-have file** (100% recall). 3.8× better F1 than alphabetical. 17× better than random.
|
|
49
|
-
|
|
50
66
|
## ROI
|
|
51
67
|
|
|
52
68
|
On a typical 130-file TypeScript project:
|
|
@@ -60,23 +76,34 @@ On a typical 130-file TypeScript project:
|
|
|
60
76
|
|
|
61
77
|
Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every `--accept` / `--reject`.
|
|
62
78
|
|
|
63
|
-
## How it Works
|
|
79
|
+
## How it Works (v8.0 Pipeline)
|
|
64
80
|
|
|
65
81
|
```
|
|
66
|
-
Task
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
82
|
+
Task → Query Intent Parser → structured action/entities/layers
|
|
83
|
+
│
|
|
84
|
+
▼
|
|
85
|
+
BM25 (weighted) ──────┐
|
|
86
|
+
TF-IDF Embedding ─────┤──→ RRF Fusion ─→ 8-signal Boosting ─→ Reranker
|
|
87
|
+
Multi-hop (auto) ─────┘ │
|
|
88
|
+
▼
|
|
89
|
+
Selection ─→ Chunk Extraction ─→ Output
|
|
90
|
+
(methods, not files)
|
|
71
91
|
```
|
|
72
92
|
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
93
|
+
**10-step pipeline:**
|
|
94
|
+
|
|
95
|
+
| # | Step | What it does |
|
|
96
|
+
|---|---|---|
|
|
97
|
+
| 0 | **Query Intent** | Parses "fix cache invalidation on delete" → `action:fix`, `entities:[cache,kvs]`, `layers:[cache]` |
|
|
98
|
+
| 1 | **BM25 + Embedding** | Lexical matching + TF-IDF cosine vectors, merged via Reciprocal Rank Fusion |
|
|
99
|
+
| 2 | **Multi-hop** | Complex queries auto-detected → iterative BM25 expansion via deps + call graph (2 hops) |
|
|
100
|
+
| 3 | **Path IDF Boost** | Query terms in file paths get boosted |
|
|
101
|
+
| 4 | **Layer Boost** | Architectural layer matching (controller, service, repository) |
|
|
102
|
+
| 5 | **Import Boost** | Dependencies of top-ranked files get pulled in |
|
|
103
|
+
| 6 | **Call Graph Boost** | Cross-file method calls traced (Java/TS/Python/Go) |
|
|
104
|
+
| 7 | **Git Co-Change** | Files frequently modified together (Jaccard similarity from commits) |
|
|
105
|
+
| 8 | **Reranker** | 5-signal quality gate: term coverage, specificity, bigram proximity, deps, path |
|
|
106
|
+
| 9 | **Chunk Extraction** | Extracts relevant functions/methods — not whole files. 10x token efficiency |
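Step 1 of the table above merges the BM25 and embedding rankings via Reciprocal Rank Fusion. A minimal sketch of weighted RRF follows. The 0.6/0.4 weights mirror the "60/40 BM25/embedding" split stated elsewhere in this README; the constant `k = 60` is the conventional default from the RRF literature, and the function and type names are hypothetical. None of this is taken from the package source.

```typescript
interface RankedList {
  weight: number;    // e.g. 0.6 for BM25, 0.4 for embeddings
  ranking: string[]; // file paths, best match first
}

// Weighted RRF: each list contributes weight / (k + rank) to a document's fused score.
function rrfFuse(lists: RankedList[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const { weight, ranking } of lists) {
    ranking.forEach((doc, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(doc, (scores.get(doc) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const fused = rrfFuse([
  { weight: 0.6, ranking: ["CacheService.java", "KvsRepository.java", "Router.java"] },
  { weight: 0.4, ranking: ["KvsRepository.java", "CacheService.java", "UseCase.java"] },
]);
console.log(fused.map(([file]) => file));
```

Because RRF uses only ranks, the two retrievers' raw scores never need to be put on a common scale, which is the usual reason to prefer it over a weighted sum of scores.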
|
|
80
107
|
|
|
81
108
|
**No AI is used for selection.** Same input → same output. Deterministic.
|
|
82
109
|
|
|
@@ -225,62 +252,103 @@ const selection = await selectContext({
|
|
|
225
252
|
});
|
|
226
253
|
```
|
|
227
254
|
|
|
228
|
-
##
|
|
255
|
+
## v8.0 — What's New
|
|
229
256
|
|
|
230
|
-
###
|
|
257
|
+
### Chunk-Level Retrieval (the big one)
|
|
231
258
|
|
|
232
|
-
|
|
233
|
-
- **Term coverage**: fraction of unique query terms matched per file
|
|
234
|
-
- **Term specificity**: IDF-weighted — rare terms matter more
|
|
235
|
-
- **Bigram proximity**: query terms appearing close together in the file
|
|
236
|
-
- **Dependency signal**: files in the dependency cone of top matches
|
|
237
|
-
- **Quality gate**: adaptive cutoff stops filling budget with noise
|
|
259
|
+
Instead of including entire files, CTO now extracts **only the relevant functions and methods**. A 2000-line file with 1 relevant method → 50 lines included, not 2000.
|
|
238
260
|
|
|
239
|
-
|
|
261
|
+
```
|
|
262
|
+
### src/main/java/com/example/cache/CacheService.java
|
|
263
|
+
```java
|
|
264
|
+
// L15-22: method invalidate
|
|
265
|
+
public void invalidate(String id) {
|
|
266
|
+
redis.delete("cache:seller:" + id);
|
|
267
|
+
}
|
|
268
|
+
|
|
269
|
+
// ... lines 23-45 omitted ...
|
|
270
|
+
|
|
271
|
+
// L46-52: method retrieve
|
|
272
|
+
public SellerDTO retrieve(String id) {
|
|
273
|
+
return redis.opsForValue().get("cache:seller:" + id);
|
|
274
|
+
}
|
|
275
|
+
```
|
|
240
276
|
|
|
241
|
-
|
|
277
|
+
Supports Java, TypeScript, Python, Go.
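The README describes chunk extraction as regex-based. A minimal sketch of that idea for Java-style methods is below. The regex, the brace-counting heuristic, and every name here are illustrative assumptions rather than the package's implementation, and the approach is deliberately naive: braces inside strings or comments would confuse it.

```typescript
interface Chunk {
  name: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Matches simple method headers such as "public void invalidate(String id) {".
const METHOD_HEADER =
  /^\s*(?:public|private|protected)\s+[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)\s*\{\s*$/;

function extractMethods(source: string): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i++) {
    const match = METHOD_HEADER.exec(lines[i]);
    if (!match) continue;
    let depth = 0;
    for (let j = i; j < lines.length; j++) {
      // Count braces on this line; the method ends when depth returns to zero.
      depth += lines[j].split("{").length - 1;
      depth -= lines[j].split("}").length - 1;
      if (depth <= 0) {
        chunks.push({
          name: match[1],
          startLine: i + 1,
          endLine: j + 1,
          text: lines.slice(i, j + 1).join("\n"),
        });
        i = j; // resume scanning after this method
        break;
      }
    }
  }
  return chunks;
}

// Keeps only the methods whose text mentions at least one query term.
function selectChunks(source: string, queryTerms: string[]): Chunk[] {
  const terms = queryTerms.map((t) => t.toLowerCase());
  return extractMethods(source).filter((chunk) => {
    const text = chunk.text.toLowerCase();
    return terms.some((t) => text.indexOf(t) !== -1);
  });
}
```

Calling `selectChunks(javaSource, ["invalidate"])` on a class like the `CacheService` example above would return only the `invalidate` method with its line range, leaving the rest of the file out of the context.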
|
|
242
278
|
|
|
243
|
-
###
|
|
279
|
+
### Query Intent Parsing
|
|
244
280
|
|
|
245
|
-
|
|
281
|
+
Before searching, CTO parses your task into structured intent:
|
|
246
282
|
|
|
247
|
-
```bash
|
|
248
|
-
# Works on Python, Go, Java, Rust projects — not just TypeScript
|
|
249
|
-
cto --context "fix auth handler" /path/to/go-project
|
|
250
283
|
```
|
|
284
|
+
"fix the seller cache invalidation on KVS delete"
|
|
285
|
+
→ action: fix
|
|
286
|
+
→ entities: [seller, kvs] (3× weight)
|
|
287
|
+
→ operations: [invalidate, delete] (2× weight)
|
|
288
|
+
→ layers: [cache]
|
|
289
|
+
```
|
|
290
|
+
|
|
291
|
+
Entities get 3× BM25 weight, operations get 2×. Much better precision on enterprise queries.
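The weighting described above can be sketched by multiplying each query term's BM25 contribution by a per-term factor. The 3× and 2× multipliers are the ones the README states; the BM25 formula is the textbook variant with the usual defaults `k1 = 1.2` and `b = 0.75`, and all names are hypothetical. How the package applies the weights internally is not visible in this diff.

```typescript
interface Intent {
  entities: string[];
  operations: string[];
}

// Entities get 3x, operations 2x, every other query term 1x.
function termWeights(queryTerms: string[], intent: Intent): Map<string, number> {
  const weights = new Map<string, number>();
  for (const term of queryTerms) {
    if (intent.entities.indexOf(term) !== -1) weights.set(term, 3);
    else if (intent.operations.indexOf(term) !== -1) weights.set(term, 2);
    else weights.set(term, 1);
  }
  return weights;
}

// Textbook BM25 over tokenized documents, with each term's score scaled by its weight.
// Assumes at least one document.
function weightedBm25(
  docs: string[][],
  weights: Map<string, number>,
  k1 = 1.2,
  b = 0.75,
): number[] {
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / n;
  return docs.map((doc) => {
    let score = 0;
    weights.forEach((weight, term) => {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) return;
      const df = docs.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += weight * idf * ((tf * (k1 + 1)) / norm);
    });
    return score;
  });
}
```

With two equally long documents, one matching only an entity term and the other only an unweighted term of the same rarity, the first scores exactly three times higher.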
|
|
251
292
|
|
|
252
|
-
###
|
|
293
|
+
### Embedding Search + RRF Fusion
|
|
253
294
|
|
|
254
|
-
|
|
295
|
+
TF-IDF cosine embedding vectors complement BM25 lexical matching. Merged via Reciprocal Rank Fusion (60/40 BM25/embedding). Catches semantic similarity that BM25 misses.
|
|
255
296
|
|
|
256
|
-
###
|
|
297
|
+
### Cross-File Call Graph
|
|
257
298
|
|
|
258
|
-
|
|
299
|
+
Traces method calls across files: `cacheService.invalidate()` in UseCase → finds `CacheService.java`. Regex-based, works for Java/TS/Python/Go.
|
|
259
300
|
|
|
260
|
-
###
|
|
301
|
+
### Git Co-Change Signal
|
|
261
302
|
|
|
262
|
-
|
|
303
|
+
Files frequently modified together in git history get boosted. Jaccard similarity from commit co-occurrence.
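Jaccard similarity over commits is the number of commits touching both files divided by the number touching either. A minimal sketch, assuming the commit history has already been reduced to a list of changed-file lists (the function name and input shape are illustrative, not the package's API):

```typescript
// Jaccard co-change score for two files: |commits with both| / |commits with either|.
function coChangeJaccard(commits: string[][], fileA: string, fileB: string): number {
  let both = 0;
  let either = 0;
  for (const files of commits) {
    const hasA = files.indexOf(fileA) !== -1;
    const hasB = files.indexOf(fileB) !== -1;
    if (hasA && hasB) both += 1;
    if (hasA || hasB) either += 1;
  }
  return either === 0 ? 0 : both / either;
}

const history = [
  ["CacheService.java", "KvsRepository.java"],
  ["CacheService.java"],
  ["KvsRepository.java", "Router.java"],
  ["CacheService.java", "KvsRepository.java", "Router.java"],
];
console.log(coChangeJaccard(history, "CacheService.java", "KvsRepository.java")); // 0.5
```

A score near 1 means the two files almost always change together, so a strong match on one is a reasonable hint to boost the other.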
|
|
263
304
|
|
|
264
|
-
###
|
|
305
|
+
### Multi-Hop Reasoning
|
|
265
306
|
|
|
266
|
-
|
|
307
|
+
Complex enterprise queries auto-detected. Iterative BM25: top matches → expand via deps + call graph → re-query. Traces full execution chains (4/4 hops).
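The expansion half of this loop can be sketched as a bounded breadth-first walk over a combined dependency and call-graph adjacency list. The 2-hop limit matches the pipeline table's description of this step; the graph shape and names are assumptions, and the re-query step that follows each expansion is omitted here.

```typescript
// Expands seed files up to maxHops edges away; returns each reached file with its hop distance.
function expandHops(
  seeds: string[],
  graph: Record<string, string[]>,
  maxHops = 2,
): Map<string, number> {
  const hopOf = new Map<string, number>();
  seeds.forEach((seed) => hopOf.set(seed, 0));
  let frontier = seeds.slice();
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const neighbor of graph[file] ?? []) {
        if (hopOf.has(neighbor)) continue; // already reached by a shorter path
        hopOf.set(neighbor, hop);
        next.push(neighbor);
      }
    }
    frontier = next;
  }
  return hopOf;
}

const callGraph: Record<string, string[]> = {
  DeleteEndpoint: ["Router"],
  Router: ["UseCase"],
  UseCase: ["CacheService"],
  CacheService: ["KvsRepository"],
};
console.log(expandHops(["DeleteEndpoint"], callGraph)); // DeleteEndpoint, Router, UseCase
```

In an iterative scheme, the files reached at each hop would be re-scored against the query and the best ones used as the next seeds, which is how a chain longer than two edges could still be followed.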
|
|
267
308
|
|
|
268
|
-
###
|
|
309
|
+
### Evaluation Harness
|
|
269
310
|
|
|
270
|
-
|
|
311
|
+
Ground truth benchmark with must-have/relevant/noise labels. 100% must-have recall on 4-task Java enterprise benchmark.
|
|
312
|
+
|
|
313
|
+
## Enterprise Features
|
|
314
|
+
|
|
315
|
+
- **AI Gateway** — transparent HTTP proxy with context injection, secret redaction, cost tracking
|
|
316
|
+
- **Team Auth** — per-team API keys, JWT (HS256/RS256), rate limiting, OIDC discovery
|
|
317
|
+
- **Policy Engine** — model overrides by task type, cost caps, block rules
|
|
318
|
+
- **Metrics** — Prometheus, Datadog JSON, StatsD UDP
|
|
319
|
+
- **A/B Testing** — context strategy experiments with z-test significance
|
|
320
|
+
- **LSP Bridge** — JSON-RPC 2.0 for VS Code, JetBrains, Neovim
|
|
321
|
+
- **Persistent Index Cache** — 50K-file repos: 5s → <100ms on warm cache
|
|
322
|
+
|
|
323
|
+
## Competitor Comparison
|
|
324
|
+
|
|
325
|
+
| Feature | CTO v8 | Cursor | Sourcegraph Cody |
|
|
326
|
+
|---|---|---|---|
|
|
327
|
+
| BM25 retrieval | ✅ | ✅ | ✅ |
|
|
328
|
+
| Embedding search | ✅ TF-IDF cosine+RRF | ✅ | ✅ |
|
|
329
|
+
| Chunk-level retrieval | ✅ 4 langs | ✅ | ✅ |
|
|
330
|
+
| Multi-signal RRF fusion | ✅ 8-signal | ❌ | ❌ |
|
|
331
|
+
| Cross-file call graph | ✅ | ❌ | ❌ |
|
|
332
|
+
| Git co-change signal | ✅ | ❌ | ❌ |
|
|
333
|
+
| Multi-hop reasoning | ✅ | ❌ | ❌ |
|
|
334
|
+
| Query intent parsing | ✅ | ❌ | ❌ |
|
|
335
|
+
| Feedback learning | ✅ | ❌ | ❌ |
|
|
336
|
+
| Secret redaction | ✅ | ❌ | ❌ |
|
|
337
|
+
| **Total signals** | **18** | **~3** | **~5** |
|
|
271
338
|
|
|
272
339
|
## Honest Limitations
|
|
273
340
|
|
|
274
|
-
- **TypeScript/JavaScript gets AST analysis.** Python/Go/Java/Rust get regex-based
|
|
275
|
-
- **
|
|
276
|
-
- **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure
|
|
277
|
-
- **
|
|
341
|
+
- **TypeScript/JavaScript gets AST analysis.** Python/Go/Java/Rust get regex-based parsing (good for graphs + chunking, not AST-precise).
|
|
342
|
+
- **Embeddings are TF-IDF cosine, not neural.** ONNX infrastructure ready — neural model would add ~5-10% recall.
|
|
343
|
+
- **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure pipeline.
|
|
344
|
+
- **Chunk extraction is regex-based** — works for standard methods/functions, may miss DSLs or deeply nested code.
|
|
345
|
+
- **Benchmarked against naive baselines.** Not compared against Cursor/Copilot internal context engines.
|
|
278
346
|
|
|
279
347
|
## Contributing
|
|
280
348
|
|
|
281
349
|
```bash
|
|
282
350
|
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
|
|
283
|
-
npm install && npm run build && npm test #
|
|
351
|
+
npm install && npm run build && npm test # 1,133 tests
|
|
284
352
|
```
|
|
285
353
|
|
|
286
354
|
## License
|