cto-ai-cli 6.1.0 → 8.0.0
- package/README.md +264 -63
- package/dist/cli/index.js +7732 -1729
- package/dist/engine/index.d.ts +1373 -14
- package/dist/engine/index.js +6731 -2110
- package/dist/mcp/index.js +3750 -430
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,81 +1,141 @@
|
|
|
1
|
-
# CTO — AI
|
|
1
|
+
# CTO — AI Context Selection Engine
|
|
2
2
|
|
|
3
3
|
[](https://www.npmjs.com/package/cto-ai-cli)
|
|
4
4
|
[](LICENSE)
|
|
5
|
+
[](.)
|
|
5
6
|
|
|
6
|
-
|
|
7
|
+
**The most complete AI context selection engine in open source.** Picks the right code *chunks* (not just files), auto-redacts secrets, learns from feedback. 18 signals. Zero AI dependencies.
|
|
7
8
|
|
|
8
9
|
```bash
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
# Generate a complete AI prompt
|
|
13
|
-
cto --context "fix the auth middleware" --prompt "Refactor this to use JWT"
|
|
10
|
+
cto --context "fix the seller info cache invalidation on KVS delete" --stdout | pbcopy
|
|
11
|
+
```
|
|
14
12
|
|
|
15
|
-
|
|
16
|
-
|
|
13
|
+
```
|
|
14
|
+
→ 166 relevant chunks from 59 files (26K tokens, 0 secrets)
|
|
15
|
+
→ Full chain: DeleteEndpoint → Router → UseCase → CacheService → KvsRepository
|
|
17
16
|
```
|
|
18
17
|
|
|
19
|
-
|
|
18
|
+
202KB package · 1,133 tests · 96 source modules · Zero AI dependencies.
|
|
20
19
|
|
|
21
20
|
---
|
|
22
21
|
|
|
23
|
-
##
|
|
22
|
+
## The Problem
|
|
24
23
|
|
|
25
|
-
When
|
|
24
|
+
When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:
|
|
26
25
|
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
3. **Sanitizes output** — API keys, tokens, passwords auto-redacted before they reach any AI
|
|
30
|
-
4. **Learns from feedback** — `--accept` / `--reject` teach it what you actually need
|
|
26
|
+
- **Send everything** → expensive, slow, hits token limits
|
|
27
|
+
- **Pick files manually** → miss dependencies, forget test files, leak secrets
|
|
31
28
|
|
|
32
|
-
|
|
29
|
+
CTO solves both: it **automatically selects the most relevant files** for any task, **sanitizes secrets** before they reach any AI provider, and **learns from feedback** to get better over time.
|
|
33
30
|
|
|
34
|
-
##
|
|
31
|
+
## Quick Demo
|
|
35
32
|
|
|
36
33
|
```bash
|
|
37
|
-
|
|
38
|
-
npx cto-ai-cli # or one-shot
|
|
34
|
+
cto --demo # Run a live showcase on your project
|
|
39
35
|
```
|
|
40
36
|
|
|
41
|
-
|
|
37
|
+
This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.
|
|
42
38
|
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
39
|
+
## Benchmark Results
|
|
40
|
+
|
|
41
|
+
**Eval Harness v8.0** — 20-file Java enterprise project, 4 tasks with expert-labeled ground truth:
|
|
42
|
+
|
|
43
|
+
| Metric | Result |
|
|
44
|
+
|---|---|
|
|
45
|
+
| **Must-have recall** | **100%** (every critical file found) |
|
|
46
|
+
| **Precision** | **38–44%** |
|
|
47
|
+
| **F1** | **55%** |
|
|
48
|
+
| **Noise rate** | **11.3%** |
|
|
49
|
+
|
|
50
|
+
**Real production repos** (Mercado Libre Java monoliths):
|
|
51
|
+
|
|
52
|
+
| Repo | Files | Without CTO | With CTO v8.0 |
|
|
53
|
+
|---|---|---|---|
|
|
54
|
+
| fury_supply-seller-info | 219 | 212 files (97%) | **166 chunks from 59 files** |
|
|
55
|
+
| sell-sizechart-middleend | 1,719 | 230 files | **72 chunks from 37 files** |
|
|
56
|
+
| charts-backend | 1,261 | 685 files (54%) | **142 chunks from 16 files** |
|
|
46
57
|
|
|
47
|
-
|
|
48
|
-
cto --context "fix login bug" --stdout | pbcopy
|
|
58
|
+
**Internal benchmark** (8 tasks, own codebase):
|
|
49
59
|
|
|
50
|
-
|
|
51
|
-
|
|
60
|
+
| Strategy | Precision | Recall | F1 |
|
|
61
|
+
|---|---|---|---|
|
|
62
|
+
| **CTO + Reranker** | **96.9%** | 100% | 98.4% |
|
|
63
|
+
| TF-IDF only | 54.6% | 87.5% | 62.0% |
|
|
64
|
+
| Random | 7.7% | 6.3% | 2.8% |
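For reference, F1 is the harmonic mean of precision and recall. The sketch below uses an illustrative `f1` helper (not part of the package) to reproduce the CTO + Reranker row from its reported precision and recall. Rows that were averaged per task need not match this single-pair formula exactly.

```typescript
// Illustrative helper, not part of cto-ai-cli: F1 as the harmonic mean
// of precision and recall, both given as fractions in [0, 1].
function f1(precision: number, recall: number): number {
  if (precision + recall === 0) return 0; // avoid division by zero
  return (2 * precision * recall) / (precision + recall);
}

// CTO + Reranker row: precision 96.9%, recall 100%.
const rerankerF1 = f1(0.969, 1.0);

// Expressed as a percentage rounded to one decimal place.
const rerankerF1Percent = Math.round(rerankerF1 * 1000) / 10; // 98.4
```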
|
|
52
65
|
|
|
53
|
-
|
|
54
|
-
cto --context "fix login" --prompt "Refactor to use async/await"
|
|
66
|
+
## ROI
|
|
55
67
|
|
|
56
|
-
|
|
57
|
-
|
|
68
|
+
On a typical 130-file TypeScript project:
|
|
69
|
+
|
|
70
|
+
| Metric | Without CTO | With CTO |
|
|
71
|
+
|---|---|---|
|
|
72
|
+
| Tokens per interaction | 370K (all files) | ~28K (selected) |
|
|
73
|
+
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
|
|
74
|
+
| **Monthly cost (10 devs, 40/day)** | **$8,880** | **$640** |
|
|
75
|
+
| **Annual savings** | — | **~$99,000** |
|
|
76
|
+
|
|
77
|
+
Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every `--accept` / `--reject`.
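The figures in the table can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the README does not state but that are consistent with its numbers: a price of $3 per million input tokens and 20 working days per month. The helper names are illustrative.

```typescript
// Assumptions inferred from the table, not stated in the README:
// $3 per million input tokens and 20 working days per month.
const USD_PER_MILLION_TOKENS = 3;
const WORKING_DAYS_PER_MONTH = 20;

// Cost of sending `tokens` input tokens once.
function costPerInteraction(tokens: number): number {
  return (tokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}

// Monthly cost for a team, given a per-interaction cost.
function monthlyCost(perInteraction: number, devs: number, perDay: number): number {
  return perInteraction * devs * perDay * WORKING_DAYS_PER_MONTH;
}

// 370K tokens per interaction, 10 developers, 40 interactions per day.
const monthlyWithout = monthlyCost(costPerInteraction(370_000), 10, 40); // about 8,880

// The table rounds the ~28K-token cost to $0.08 before multiplying.
const monthlyWith = monthlyCost(0.08, 10, 40); // about 640

const annualSavings = (monthlyWithout - monthlyWith) * 12; // about 98,880
```

Using the unrounded per-interaction cost for 28K tokens ($0.084) would give a slightly higher monthly figure than the table shows.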
|
|
78
|
+
|
|
79
|
+
## How it Works (v8.0 Pipeline)
|
|
58
80
|
|
|
59
|
-
|
|
60
|
-
|
|
81
|
+
```
|
|
82
|
+
Task → Query Intent Parser → structured action/entities/layers
|
|
83
|
+
│
|
|
84
|
+
▼
|
|
85
|
+
BM25 (weighted) ──────┐
|
|
86
|
+
TF-IDF Embedding ─────┤──→ RRF Fusion ─→ 8-signal Boosting ─→ Reranker
|
|
87
|
+
Multi-hop (auto) ─────┘ │
|
|
88
|
+
▼
|
|
89
|
+
Selection ─→ Chunk Extraction ─→ Output
|
|
90
|
+
(methods, not files)
|
|
61
91
|
```
|
|
62
92
|
|
|
63
|
-
|
|
93
|
+
**10-step pipeline:**
|
|
64
94
|
|
|
65
|
-
|
|
95
|
+
| # | Step | What it does |
|
|
96
|
+
|---|---|---|
|
|
97
|
+
| 0 | **Query Intent** | Parses "fix cache invalidation on delete" → `action:fix`, `entities:[cache,kvs]`, `layers:[cache]` |
|
|
98
|
+
| 1 | **BM25 + Embedding** | Lexical matching + TF-IDF cosine vectors, merged via Reciprocal Rank Fusion |
|
|
99
|
+
| 2 | **Multi-hop** | Complex queries auto-detected → iterative BM25 expansion via deps + call graph (2 hops) |
|
|
100
|
+
| 3 | **Path IDF Boost** | Query terms in file paths get boosted |
|
|
101
|
+
| 4 | **Layer Boost** | Architectural layer matching (controller, service, repository) |
|
|
102
|
+
| 5 | **Import Boost** | Dependencies of top-ranked files get pulled in |
|
|
103
|
+
| 6 | **Call Graph Boost** | Cross-file method calls traced (Java/TS/Python/Go) |
|
|
104
|
+
| 7 | **Git Co-Change** | Files frequently modified together (Jaccard similarity from commits) |
|
|
105
|
+
| 8 | **Reranker** | 5-signal quality gate: term coverage, specificity, bigram proximity, deps, path |
|
|
106
|
+
| 9 | **Chunk Extraction** | Extracts relevant functions/methods — not whole files. 10x token efficiency |
|
|
66
107
|
|
|
67
|
-
|
|
108
|
+
**No AI is used for selection.** Same input → same output. Deterministic.
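To make the "TF-IDF cosine vectors" in step 1 concrete, here is a minimal, deterministic sketch. It assumes documents are already tokenized and uses a smoothed IDF; the function names and the exact weighting are illustrative choices, not the package's implementation.

```typescript
// A sparse vector: term -> weight.
type Vector = Record<string, number>;

// Builds one TF-IDF vector per tokenized document.
function tfidfVectors(docs: string[][]): Vector[] {
  // Document frequency per term. Object.create(null) avoids inherited keys
  // such as "constructor" colliding with real terms.
  const df: Record<string, number> = Object.create(null);
  for (const doc of docs) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const term of doc) {
      if (!seen[term]) {
        seen[term] = true;
        df[term] = (df[term] ?? 0) + 1;
      }
    }
  }
  return docs.map((doc) => {
    const vec: Vector = Object.create(null);
    for (const term of doc) {
      // Smoothed inverse document frequency.
      const idf = Math.log((1 + docs.length) / (1 + df[term])) + 1;
      // Each occurrence adds idf / length, so the total is tf * idf.
      vec[term] = (vec[term] ?? 0) + idf / doc.length;
    }
    return vec;
  });
}

// Cosine similarity between two sparse vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const term of Object.keys(a)) {
    normA += a[term] * a[term];
    if (b[term] !== undefined) dot += a[term] * b[term];
  }
  for (const term of Object.keys(b)) normB += b[term] * b[term];
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const vectors = tfidfVectors([
  ["cache", "invalidate", "seller"],
  ["cache", "seller", "dto"],
  ["router", "endpoint"],
]);
const related = cosine(vectors[0], vectors[1]);   // shares "cache" and "seller"
const unrelated = cosine(vectors[0], vectors[2]); // no shared terms
```

Because nothing here is random or model-based, the same corpus and query always produce the same scores.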
|
|
109
|
+
|
|
110
|
+
## Install
|
|
68
111
|
|
|
69
112
|
```bash
|
|
70
|
-
|
|
71
|
-
cto
|
|
113
|
+
npm i -g cto-ai-cli # global
|
|
114
|
+
npx cto-ai-cli # or one-shot
|
|
115
|
+
```
|
|
72
116
|
|
|
73
|
-
|
|
74
|
-
cto --reject
|
|
75
|
-
cto --reject --missing src/types/auth.ts
|
|
117
|
+
## Context Selection
|
|
76
118
|
|
|
77
|
-
|
|
78
|
-
cto --
|
|
119
|
+
```bash
|
|
120
|
+
cto --context "refactor the auth middleware" # human-readable summary
|
|
121
|
+
cto --context "fix login bug" --stdout | pbcopy # pipe to clipboard
|
|
122
|
+
cto --context "add tests" --output context.md # save to file
|
|
123
|
+
cto --context "fix login" --prompt "Refactor to async/await" # full AI prompt
|
|
124
|
+
cto --context "debug scoring" --json # JSON for tooling
|
|
125
|
+
cto --context "fix auth" --budget 30000 # custom token budget
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. **Secrets are automatically redacted** — API keys, tokens, passwords, PII are replaced with `****` before output.
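Pattern-based redaction of this kind can be sketched as follows. The two regular expressions are illustrative shapes only and are not the patterns shipped in the package; `redact` is a hypothetical helper name.

```typescript
// Two illustrative patterns only; these are not the package's patterns.
const SECRET_PATTERNS: RegExp[] = [
  /\bAKIA[0-9A-Z]{16}\b/g,    // shape of an AWS access key ID
  /\bsk-[A-Za-z0-9]{20,}\b/g, // shape of an "sk-" prefixed API key
];

// Replaces every match with "****" and counts how many replacements were made.
function redact(text: string): { text: string; redactions: number } {
  let redactions = 0;
  let output = text;
  for (const pattern of SECRET_PATTERNS) {
    output = output.replace(pattern, () => {
      redactions++;
      return "****";
    });
  }
  return { text: output, redactions };
}

const redacted = redact('OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"');
// redacted.text is 'OPENAI_KEY = "****"' and redacted.redactions is 1
```

A real scanner would need many more patterns and a way to handle secrets that match no known shape.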
|
|
129
|
+
|
|
130
|
+
## Feedback Loop
|
|
131
|
+
|
|
132
|
+
CTO learns from real feedback, not from itself:
|
|
133
|
+
|
|
134
|
+
```bash
|
|
135
|
+
cto --accept # last selection was good
|
|
136
|
+
cto --reject # last selection was bad
|
|
137
|
+
cto --reject --missing src/auth.ts # this file was missing
|
|
138
|
+
cto --stats # see what CTO has learned
|
|
79
139
|
```
|
|
80
140
|
|
|
81
141
|
On `--reject`, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
|
|
@@ -89,7 +149,74 @@ cto --audit --full-scan # ignore cache, scan everything
|
|
|
89
149
|
cto --audit --json # machine-readable output
|
|
90
150
|
```
|
|
91
151
|
|
|
92
|
-
45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack,
|
|
152
|
+
45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: **audit protects context** — every `--stdout`, `--output`, and `--prompt` auto-sanitizes secrets before output.
|
|
153
|
+
|
|
154
|
+
```
|
|
155
|
+
Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
|
|
156
|
+
After: OPENAI_KEY = "sk-R********************De"
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
## AI Gateway (Enterprise)
|
|
160
|
+
|
|
161
|
+
A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
|
|
162
|
+
|
|
163
|
+
```bash
|
|
164
|
+
cto --gateway # Start on port 8787
|
|
165
|
+
cto --gateway --port 9000 # Custom port
|
|
166
|
+
cto --gateway --block-secrets # Block requests with critical secrets
|
|
167
|
+
cto --gateway --budget-daily 50 # $50/day budget limit
|
|
168
|
+
cto --gateway --budget-monthly 500 # $500/month budget limit
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
```
|
|
172
|
+
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
|
|
173
|
+
↓
|
|
174
|
+
Dashboard (http://localhost:8787/__cto)
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
**What the gateway does automatically:**
|
|
178
|
+
- **Injects CTO-selected context** into every AI request (TF-IDF + composite scoring)
|
|
179
|
+
- **Redacts secrets** before they leave the network (45+ patterns)
|
|
180
|
+
- **Tracks costs** per model, per day, per month with budget alerts
|
|
181
|
+
- **Streams responses** with zero-copy SSE passthrough
|
|
182
|
+
- **Serves a live dashboard** at `/__cto` with real-time metrics
|
|
183
|
+
|
|
184
|
+
Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
|
|
185
|
+
|
|
186
|
+
## Cross-Repo Context
|
|
187
|
+
|
|
188
|
+
When working on a task, CTO can pull relevant files from **sibling repositories** — not just the current project.
|
|
189
|
+
|
|
190
|
+
```bash
|
|
191
|
+
cto --context "fix payment webhook" --auto-repos # Auto-discover sibling repos
|
|
192
|
+
cto --context "fix payment webhook" --repos shared-types,payment-service
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
**How it works:**
|
|
196
|
+
1. Discovers sibling repos in parent directory (any dir with `package.json`, `tsconfig.json`, `Cargo.toml`, etc.)
|
|
197
|
+
2. Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
|
|
198
|
+
3. Queries each sibling with the task description
|
|
199
|
+
4. Returns ranked matches with repo attribution and content
|
|
200
|
+
|
|
201
|
+
Real use case: You're fixing a webhook handler in `api-gateway` — CTO finds the `Payment` interface in `shared-types` and the consumer in `notification-service` automatically.
|
|
202
|
+
|
|
203
|
+
## Cost-Aware Model Routing
|
|
204
|
+
|
|
205
|
+
CTO analyzes the **actual selected context** (not just the project) to recommend the cheapest model that can handle the task.
|
|
206
|
+
|
|
207
|
+
```bash
|
|
208
|
+
cto --context "update readme" --route # → Haiku ($0.08/call, 73% cheaper)
|
|
209
|
+
cto --context "fix auth bug" --route # → Opus ($1.33/call, critical complexity)
|
|
210
|
+
cto --context "refactor API" --route # → Sonnet ($0.30/call, balanced)
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
**Complexity is computed from real signals:**
|
|
214
|
+
- Token density (% of budget used)
|
|
215
|
+
- Risk concentration (top-5 file avg risk vs project max)
|
|
216
|
+
- Directory diversity (cross-cutting = harder)
|
|
217
|
+
- Dependency density among selected files
|
|
218
|
+
|
|
219
|
+
The gateway also uses this: every proxied request gets a model recommendation in the injected context.
|
|
93
220
|
|
|
94
221
|
## MCP Server
|
|
95
222
|
|
|
@@ -107,17 +234,6 @@ Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
|
|
|
107
234
|
|
|
108
235
|
MCP output is also auto-sanitized when `includeContents: true`.
|
|
109
236
|
|
|
110
|
-
## How it works
|
|
111
|
-
|
|
112
|
-
1. **Dependency graph** — parses imports, builds adjacency list, identifies hubs
|
|
113
|
-
2. **Risk scoring** — complexity × centrality × recency (continuous, log-scaled)
|
|
114
|
-
3. **TF-IDF/BM25 semantic matching** — task description scored against all file contents + path boosting
|
|
115
|
-
4. **Composite ranking** — `finalScore = risk × 0.4 + semantic × 0.4 + learner × 0.2`
|
|
116
|
-
5. **Greedy allocation** — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
|
|
117
|
-
6. **Bayesian learning** — exponential decay on priors, Wilson score confidence, per-task-type patterns
|
|
118
|
-
|
|
119
|
-
No AI is used for selection. Same input → same output. Deterministic.
|
|
120
|
-
|
|
121
237
|
## Programmatic API
|
|
122
238
|
|
|
123
239
|
```typescript
|
|
@@ -132,22 +248,107 @@ const selection = await selectContext({
|
|
|
132
248
|
task: 'fix auth',
|
|
133
249
|
analysis,
|
|
134
250
|
budget: 50_000,
|
|
135
|
-
semanticScores,
|
|
251
|
+
semanticScores,
|
|
136
252
|
});
|
|
137
253
|
```
|
|
138
254
|
|
|
139
|
-
##
|
|
255
|
+
## v8.0 — What's New
|
|
256
|
+
|
|
257
|
+
### Chunk-Level Retrieval (the big one)
|
|
258
|
+
|
|
259
|
+
Instead of including entire files, CTO now extracts **only the relevant functions and methods**. A 2000-line file with 1 relevant method → 50 lines included, not 2000.
|
|
260
|
+
|
|
261
|
+
```
|
|
262
|
+
### src/main/java/com/example/cache/CacheService.java
|
|
263
|
+
```java
|
|
264
|
+
// L15-22: method invalidate
|
|
265
|
+
public void invalidate(String id) {
|
|
266
|
+
redis.delete("cache:seller:" + id);
|
|
267
|
+
}
|
|
268
|
+
|
|
269
|
+
// ... lines 23-45 omitted ...
|
|
270
|
+
|
|
271
|
+
// L46-52: method retrieve
|
|
272
|
+
public SellerDTO retrieve(String id) {
|
|
273
|
+
return redis.opsForValue().get("cache:seller:" + id);
|
|
274
|
+
}
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
Supports Java, TypeScript, Python, Go.
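One way regex-based chunk extraction could work for Java-style sources is sketched below: a regular expression finds method headers, and a brace counter finds where each body ends. This is an assumption about the general technique, not the package's code. The `Chunk` type and `extractMethods` name are hypothetical, and braces inside strings or comments are deliberately not handled.

```typescript
type Chunk = { name: string; startLine: number; endLine: number; text: string };

// Finds Java-style method declarations with a regex, then walks braces to
// locate the end of each body. Only methods named in `wanted` are returned.
function extractMethods(source: string, wanted: string[]): Chunk[] {
  const lines = source.split("\n");
  const header = /^\s*(?:public|private|protected)\s+[\w<>\[\], ]+\s+(\w+)\s*\([^)]*\)\s*\{/;
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i++) {
    const match = header.exec(lines[i]);
    if (!match || wanted.indexOf(match[1]) === -1) continue;
    let depth = 0;
    let end = i; // falls back to the header line if braces never balance
    for (let j = i; j < lines.length; j++) {
      for (const ch of lines[j]) {
        if (ch === "{") depth++;
        else if (ch === "}") depth--;
      }
      if (depth === 0) {
        end = j;
        break;
      }
    }
    chunks.push({
      name: match[1],
      startLine: i + 1, // 1-based line numbers
      endLine: end + 1,
      text: lines.slice(i, end + 1).join("\n"),
    });
    i = end; // continue scanning after the extracted body
  }
  return chunks;
}

const javaSource = [
  "public class CacheService {",
  "  public void invalidate(String id) {",
  '    redis.delete("cache:seller:" + id);',
  "  }",
  "",
  "  public void warm() {",
  "    if (ready) { load(); }",
  "  }",
  "}",
].join("\n");

const chunks = extractMethods(javaSource, ["invalidate"]);
// One chunk: name "invalidate", lines 2 to 4
```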
|
|
278
|
+
|
|
279
|
+
### Query Intent Parsing
|
|
280
|
+
|
|
281
|
+
Before searching, CTO parses your task into structured intent:
|
|
282
|
+
|
|
283
|
+
```
|
|
284
|
+
"fix the seller cache invalidation on KVS delete"
|
|
285
|
+
→ action: fix
|
|
286
|
+
→ entities: [seller, kvs] (3× weight)
|
|
287
|
+
→ operations: [invalidate, delete] (2× weight)
|
|
288
|
+
→ layers: [cache]
|
|
289
|
+
```
|
|
290
|
+
|
|
291
|
+
Entities get 3× BM25 weight, operations get 2×. Much better precision on enterprise queries.
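The effect of these weights can be illustrated with a small weighted BM25 scorer. This is a sketch under assumptions: query terms are already tokenized and normalized, the `Intent` shape and function names are hypothetical, and `k1 = 1.2`, `b = 0.75` are the usual BM25 defaults rather than values taken from the package.

```typescript
// Hypothetical shape for illustration; the package's real types may differ.
type Intent = { entities: string[]; operations: string[] };

// Entities weigh 3x and operations 2x, as described above; other terms weigh 1x.
function termWeights(queryTerms: string[], intent: Intent): Record<string, number> {
  const weights: Record<string, number> = Object.create(null); // no inherited keys
  for (const term of queryTerms) {
    if (intent.entities.indexOf(term) !== -1) weights[term] = 3;
    else if (intent.operations.indexOf(term) !== -1) weights[term] = 2;
    else weights[term] = 1;
  }
  return weights;
}

// Standard BM25, with each term's contribution multiplied by its weight.
function weightedBm25(
  weights: Record<string, number>,
  doc: string[],
  corpus: string[][],
  k1: number = 1.2,
  b: number = 0.75,
): number {
  const avgLen = corpus.reduce((sum, d) => sum + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of Object.keys(weights)) {
    const tf = doc.filter((t) => t === term).length;
    if (tf === 0) continue;
    const df = corpus.filter((d) => d.indexOf(term) !== -1).length;
    const idf = Math.log(1 + (corpus.length - df + 0.5) / (df + 0.5));
    const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
    score += weights[term] * idf * ((tf * (k1 + 1)) / norm);
  }
  return score;
}

const weights = termWeights(
  ["fix", "seller", "cache", "invalidate", "kvs", "delete"],
  { entities: ["seller", "kvs"], operations: ["invalidate", "delete"] },
);
const corpus = [["seller", "dto"], ["cache", "dto"], ["router", "dto"]];
const sellerScore = weightedBm25(weights, corpus[0], corpus); // matches an entity
const cacheScore = weightedBm25(weights, corpus[1], corpus);  // matches a plain term
```

In this toy corpus the two documents are otherwise symmetric, so the entity match scores three times the plain-term match.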
|
|
292
|
+
|
|
293
|
+
### Embedding Search + RRF Fusion
|
|
294
|
+
|
|
295
|
+
TF-IDF cosine embedding vectors complement BM25 lexical matching. Merged via Reciprocal Rank Fusion (60/40 BM25/embedding). Catches semantic similarity that BM25 misses.
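A weighted form of Reciprocal Rank Fusion can be sketched as below. The 0.6 and 0.4 weights come from the 60/40 split mentioned above; the constant `k = 60` is the value commonly used for RRF and is an assumption here, as are the type and function names.

```typescript
// One ranked list of file identifiers (best first) and its fusion weight.
type RankedList = { ids: string[]; weight: number };

// Weighted RRF: each list contributes weight / (k + rank) for every id it ranks.
function rrfFuse(lists: RankedList[], k: number = 60): Array<[string, number]> {
  const scores: Record<string, number> = Object.create(null); // no inherited keys
  for (const list of lists) {
    list.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] ?? 0) + list.weight / (k + rank);
    });
  }
  // Highest fused score first.
  return Object.keys(scores)
    .map((id): [string, number] => [id, scores[id]])
    .sort((a, b) => b[1] - a[1]);
}

const fused = rrfFuse([
  { ids: ["CacheService.java", "KvsRepository.java", "Router.java"], weight: 0.6 }, // BM25
  { ids: ["KvsRepository.java", "CacheService.java", "UseCase.java"], weight: 0.4 }, // embedding
]);
// Order: CacheService.java, KvsRepository.java, Router.java, UseCase.java
```

Because RRF works on ranks rather than raw scores, the two retrievers do not need comparable score scales.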
|
|
296
|
+
|
|
297
|
+
### Cross-File Call Graph
|
|
298
|
+
|
|
299
|
+
Traces method calls across files: `cacheService.invalidate()` in UseCase → finds `CacheService.java`. Regex-based, works for Java/TS/Python/Go.
|
|
300
|
+
|
|
301
|
+
### Git Co-Change Signal
|
|
302
|
+
|
|
303
|
+
Files frequently modified together in git history get boosted. Jaccard similarity from commit co-occurrence.
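Jaccard similarity over commit co-occurrence is the number of commits touching both files divided by the number touching either. A minimal sketch, assuming the commit history has already been parsed into lists of file paths (the function name is illustrative):

```typescript
// Each commit is the list of file paths it touched.
function coChangeJaccard(commits: string[][], fileA: string, fileB: string): number {
  let both = 0;
  let either = 0;
  for (const files of commits) {
    const hasA = files.indexOf(fileA) !== -1;
    const hasB = files.indexOf(fileB) !== -1;
    if (hasA && hasB) both++;
    if (hasA || hasB) either++;
  }
  // Files that never appear in history have no co-change signal.
  return either === 0 ? 0 : both / either;
}

const history = [
  ["CacheService.java", "KvsRepository.java"],
  ["CacheService.java", "KvsRepository.java", "Router.java"],
  ["CacheService.java"],
  ["Router.java"],
];
const similarity = coChangeJaccard(history, "CacheService.java", "KvsRepository.java"); // 2 / 3
```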
|
|
304
|
+
|
|
305
|
+
### Multi-Hop Reasoning
|
|
306
|
+
|
|
307
|
+
Complex enterprise queries auto-detected. Iterative BM25: top matches → expand via deps + call graph → re-query. Traces full execution chains (4/4 hops).
|
|
308
|
+
|
|
309
|
+
### Evaluation Harness
|
|
310
|
+
|
|
311
|
+
Ground truth benchmark with must-have/relevant/noise labels. 100% must-have recall on 4-task Java enterprise benchmark.
|
|
312
|
+
|
|
313
|
+
## Enterprise Features
|
|
314
|
+
|
|
315
|
+
- **AI Gateway** — transparent HTTP proxy with context injection, secret redaction, cost tracking
|
|
316
|
+
- **Team Auth** — per-team API keys, JWT (HS256/RS256), rate limiting, OIDC discovery
|
|
317
|
+
- **Policy Engine** — model overrides by task type, cost caps, block rules
|
|
318
|
+
- **Metrics** — Prometheus, Datadog JSON, StatsD UDP
|
|
319
|
+
- **A/B Testing** — context strategy experiments with z-test significance
|
|
320
|
+
- **LSP Bridge** — JSON-RPC 2.0 for VS Code, JetBrains, Neovim
|
|
321
|
+
- **Persistent Index Cache** — 50K-file repos: 5s → <100ms on warm cache
|
|
322
|
+
|
|
323
|
+
## Competitor Comparison
|
|
324
|
+
|
|
325
|
+
| Feature | CTO v8 | Cursor | Sourcegraph Cody |
|
|
326
|
+
|---|---|---|---|
|
|
327
|
+
| BM25 retrieval | ✅ | ✅ | ✅ |
|
|
328
|
+
| Embedding search | ✅ TF-IDF cosine+RRF | ✅ | ✅ |
|
|
329
|
+
| Chunk-level retrieval | ✅ 4 langs | ✅ | ✅ |
|
|
330
|
+
| Multi-signal RRF fusion | ✅ 8-signal | ❌ | ❌ |
|
|
331
|
+
| Cross-file call graph | ✅ | ❌ | ❌ |
|
|
332
|
+
| Git co-change signal | ✅ | ❌ | ❌ |
|
|
333
|
+
| Multi-hop reasoning | ✅ | ❌ | ❌ |
|
|
334
|
+
| Query intent parsing | ✅ | ❌ | ❌ |
|
|
335
|
+
| Feedback learning | ✅ | ❌ | ❌ |
|
|
336
|
+
| Secret redaction | ✅ | ❌ | ❌ |
|
|
337
|
+
| **Total signals** | **18** | **~3** | **~5** |
|
|
338
|
+
|
|
339
|
+
## Honest Limitations
|
|
140
340
|
|
|
141
|
-
- **TypeScript/JavaScript gets
|
|
142
|
-
- **TF-IDF, not
|
|
143
|
-
- **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure
|
|
144
|
-
- **
|
|
341
|
+
- **TypeScript/JavaScript gets AST analysis.** Python/Go/Java/Rust get regex-based parsing (good for graphs + chunking, not AST-precise).
|
|
342
|
+
- **Embeddings are TF-IDF cosine, not neural.** ONNX infrastructure ready — neural model would add ~5-10% recall.
|
|
343
|
+
- **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure pipeline.
|
|
344
|
+
- **Chunk extraction is regex-based** — works for standard methods/functions, may miss DSLs or deeply nested code.
|
|
345
|
+
- **Benchmarked against naive baselines.** Not compared against Cursor/Copilot internal context engines.
|
|
145
346
|
|
|
146
347
|
## Contributing
|
|
147
348
|
|
|
148
349
|
```bash
|
|
149
350
|
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
|
|
150
|
-
npm install && npm run build && npm test #
|
|
351
|
+
npm install && npm run build && npm test # 1,133 tests
|
|
151
352
|
```
|
|
152
353
|
|
|
153
354
|
## License
|