cto-ai-cli 6.1.0 → 7.1.0
- package/README.md +195 -62
- package/dist/cli/index.js +5752 -1733
- package/dist/engine/index.d.ts +548 -12
- package/dist/engine/index.js +1974 -298
- package/dist/mcp/index.js +1822 -446
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,35 +1,84 @@
-# CTO — AI
+# CTO — AI Context Selection Engine

 [](https://www.npmjs.com/package/cto-ai-cli)
 [](LICENSE)
+[](.)

-Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback
+**Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback.**

 ```bash
-
-cto --context "fix
+cto --context "fix the auth middleware" --stdout | pbcopy # → clipboard
+cto --context "fix auth" --prompt "Refactor to use JWT" # → AI prompt
+cto --accept # → learns
+```
+
+76KB package · 606 tests · Zero AI dependencies.
+
+---
+
+## The Problem

-
-cto --context "fix the auth middleware" --prompt "Refactor this to use JWT"
+When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:

-
-
+- **Send everything** → expensive, slow, hits token limits
+- **Pick files manually** → miss dependencies, forget test files, leak secrets
+
+CTO solves both: it **automatically selects the most relevant files** for any task, **sanitizes secrets** before they reach any AI provider, and **learns from feedback** to get better over time.
+
+## Quick Demo
+
+```bash
+cto --demo # Run a live showcase on your project
 ```

-
+This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.

-
+## Benchmark Results

-
+Tested against 8 curated tasks with ground truth (known correct files):

-
+| Strategy | Precision | Must-have Recall | F1 |
+|---|---|---|---|
+| **CTO** | 33.6% | **100.0%** | **48.7%** |
+| TF-IDF only | 54.6% | 87.5% | 62.0% |
+| Risk-only | 20.8% | 18.8% | 15.0% |
+| Alphabetical | 8.3% | 31.3% | 12.9% |
+| Random | 7.7% | 6.3% | 2.8% |

-
-
-
-
+**CTO never misses a must-have file** (100% recall). 3.8× better F1 than alphabetical. 17× better than random.
+
+## ROI
+
+On a typical 130-file TypeScript project:
+
+| Metric | Without CTO | With CTO |
+|---|---|---|
+| Tokens per interaction | 370K (all files) | ~28K (selected) |
+| Cost per interaction (Sonnet) | $1.11 | $0.08 |
+| **Monthly cost (10 devs, 40/day)** | **$8,880** | **$640** |
+| **Annual savings** | — | **~$99,000** |
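The figures in the ROI table can be reproduced with a few lines of arithmetic. The sketch below assumes an input price of $3 per million tokens and 20 working days per month. Neither value is stated in the README; they are chosen here because they reproduce the published numbers. It also rounds the per-interaction cost to cents before multiplying, as the table appears to do.

```typescript
// Assumed inputs (not stated in the README): price and working days per month.
const PRICE_PER_MILLION_TOKENS_USD = 3;
const WORKING_DAYS_PER_MONTH = 20;
// Taken from the table: 10 developers, 40 interactions per developer per day.
const DEVS = 10;
const INTERACTIONS_PER_DEV_PER_DAY = 40;

// Cost of a single interaction that sends `tokens` input tokens.
function costPerInteraction(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS_USD;
}

// Team-wide monthly cost for a given per-interaction cost.
function monthlyCost(costPerCall: number): number {
  return costPerCall * DEVS * INTERACTIONS_PER_DEV_PER_DAY * WORKING_DAYS_PER_MONTH;
}

// Round to whole cents, matching the precision shown in the table.
const roundToCents = (usd: number): number => Math.round(usd * 100) / 100;

const withoutCto = roundToCents(costPerInteraction(370_000)); // all files
const withCto = roundToCents(costPerInteraction(28_000));     // selected files

const monthlyWithout = monthlyCost(withoutCto);
const monthlyWith = monthlyCost(withCto);
const annualSavings = (monthlyWithout - monthlyWith) * 12;
```

Under these assumptions the annual saving comes out slightly below the table's rounded "~$99,000".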
+
+Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every `--accept` / `--reject`.
+
+## How it Works
+
+```
+Task description ──→ TF-IDF/BM25 ──→ Semantic scores   ──┐
+                                                         │
+Project files ──→ Dependency graph ──→ Risk scores ──────┤──→ Composite ──→ Greedy ──→ Selection
+                                                         │    ranking       alloc
+Feedback history ──→ Bayesian learner ──→ Boosts ────────┘
+```
+
+1. **Dependency graph** — parses imports, builds adjacency list, identifies hubs
+2. **Risk scoring** — complexity × centrality × recency (continuous, log-scaled)
+3. **TF-IDF/BM25 semantic matching** — task description scored against file contents + path boosting
+4. **Composite ranking** — `finalScore = semantic × 0.55 + risk × 0.25 + learner × 0.2`
+5. **Noise filtering** — files with zero semantic relevance are excluded (benchmark-driven optimization)
+6. **Greedy allocation** — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
+7. **Bayesian learning** — exponential decay, Wilson score confidence, per-task-type patterns
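Steps 4 to 6 above can be sketched as follows. This is an illustrative sketch, not the package's implementation: the `FileScore` and `SelectedFile` shapes, the `rankAndAllocate` function, and the per-level token counts are hypothetical. Only the weights, the zero-relevance filter, and the full → signatures → skeleton cascade are taken from the list.

```typescript
type PruneLevel = "full" | "signatures" | "skeleton";

// Hypothetical input shape: three scores in [0, 1] plus the token cost
// of the file at each prune level.
interface FileScore {
  path: string;
  semantic: number;
  risk: number;
  learner: number;
  tokens: Record<PruneLevel, number>;
}

interface SelectedFile {
  path: string;
  level: PruneLevel;
  tokens: number;
  score: number;
}

// Prune levels, from most to least detailed.
const LEVELS: PruneLevel[] = ["full", "signatures", "skeleton"];

// Composite ranking with the weights quoted in step 4.
const finalScore = (f: FileScore): number =>
  f.semantic * 0.55 + f.risk * 0.25 + f.learner * 0.2;

function rankAndAllocate(files: FileScore[], budget: number): SelectedFile[] {
  const ranked = files
    .filter((f) => f.semantic > 0) // step 5: drop files with zero semantic relevance
    .map((f) => ({ file: f, score: finalScore(f) }))
    // Ties are broken by path so the same input always gives the same order.
    .sort((a, b) => b.score - a.score || a.file.path.localeCompare(b.file.path));

  const selected: SelectedFile[] = [];
  let remaining = budget;
  for (const { file, score } of ranked) {
    // Step 6: take the most detailed level that still fits the remaining budget.
    for (const level of LEVELS) {
      const cost = file.tokens[level];
      if (cost <= remaining) {
        selected.push({ path: file.path, level, tokens: cost, score });
        remaining -= cost;
        break;
      }
    }
  }
  return selected;
}
```

A file that does not fit even as a skeleton is skipped, and allocation continues with lower-ranked files. Whether the real allocator skips or stops at that point is not stated in the README.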

-
+**No AI is used for selection.** Same input → same output. Deterministic.

 ## Install

@@ -41,41 +90,25 @@ npx cto-ai-cli # or one-shot
 ## Context Selection

 ```bash
-#
-cto --context "
-
-
-cto --context "
-
-# Save to file (secrets auto-redacted)
-cto --context "add tests" --output context.md
-
-# Full AI prompt with instruction
-cto --context "fix login" --prompt "Refactor to use async/await"
-
-# JSON for tooling
-cto --context "debug scoring" --json
-
-# Custom token budget
-cto --context "fix auth" --budget 30000
+cto --context "refactor the auth middleware" # human-readable summary
+cto --context "fix login bug" --stdout | pbcopy # pipe to clipboard
+cto --context "add tests" --output context.md # save to file
+cto --context "fix login" --prompt "Refactor to async/await" # full AI prompt
+cto --context "debug scoring" --json # JSON for tooling
+cto --context "fix auth" --budget 30000 # custom token budget
 ```

-Output includes full file contents in markdown, ready
+Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. **Secrets are automatically redacted** — API keys, tokens, passwords, PII are replaced with `****` before output.

 ## Feedback Loop

 CTO learns from real feedback, not from itself:

 ```bash
-#
-cto --
-
-#
-cto --reject
-cto --reject --missing src/types/auth.ts
-
-# See what CTO has learned:
-cto --stats
+cto --accept # last selection was good
+cto --reject # last selection was bad
+cto --reject --missing src/auth.ts # this file was missing
+cto --stats # see what CTO has learned
 ```

 On `--reject`, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
@@ -89,7 +122,74 @@ cto --audit --full-scan # ignore cache, scan everything
 cto --audit --json # machine-readable output
 ```

-45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack,
+45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: **audit protects context** — every `--stdout`, `--output`, and `--prompt` auto-sanitizes secrets before output.
+
+```
+Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
+After:  OPENAI_KEY = "sk-R********************De"
+```
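The two ideas mentioned above, Shannon entropy analysis and partial masking, can be sketched as follows. This is a minimal sketch under stated assumptions, not the tool's actual detector: the function names, the 3.5 bits-per-character threshold, and the 16-character minimum length are hypothetical. The mask here emits one asterisk per hidden character, which may differ from the mask length the tool prints.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(s: string): number {
  // Prototype-less object used as a character -> count map.
  const counts: Record<string, number> = Object.create(null);
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] ?? 0) + 1;
  }
  let bits = 0;
  for (const ch of Object.keys(counts)) {
    const p = (counts[ch] ?? 0) / s.length;
    bits -= p * (Math.log(p) / Math.LN2); // log base 2
  }
  return bits;
}

// Keep the first `keepStart` and last `keepEnd` characters and replace the
// rest with asterisks. Values too short to keep anything are fully masked.
function maskSecret(secret: string, keepStart = 4, keepEnd = 2): string {
  const hidden = secret.length - keepStart - keepEnd;
  if (hidden <= 0) return new Array(secret.length + 1).join("*");
  return (
    secret.slice(0, keepStart) +
    new Array(hidden + 1).join("*") +
    secret.slice(secret.length - keepEnd)
  );
}

// Hypothetical heuristic: long, high-entropy values are treated as secrets.
const ENTROPY_THRESHOLD_BITS = 3.5;
const MIN_SECRET_LENGTH = 16;

function redactIfSuspicious(value: string): string {
  const suspicious =
    value.length >= MIN_SECRET_LENGTH &&
    shannonEntropy(value) >= ENTROPY_THRESHOLD_BITS;
  return suspicious ? maskSecret(value) : value;
}
```

Entropy alone produces false positives (hashes, UUIDs) and false negatives (short or patterned keys), which is presumably why the README pairs it with provider-specific patterns.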
+
+## AI Gateway (Enterprise)
+
+A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
+
+```bash
+cto --gateway # Start on port 8787
+cto --gateway --port 9000 # Custom port
+cto --gateway --block-secrets # Block requests with critical secrets
+cto --gateway --budget-daily 50 # $50/day budget limit
+cto --gateway --budget-monthly 500 # $500/month budget limit
+```
+
+```
+Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
+                 ↓
+            Dashboard (http://localhost:8787/__cto)
+```
+
+**What the gateway does automatically:**
+- **Injects CTO-selected context** into every AI request (TF-IDF + composite scoring)
+- **Redacts secrets** before they leave the network (45+ patterns)
+- **Tracks costs** per model, per day, per month with budget alerts
+- **Streams responses** with zero-copy SSE passthrough
+- **Serves a live dashboard** at `/__cto` with real-time metrics
+
+Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
+
+## Cross-Repo Context
+
+When working on a task, CTO can pull relevant files from **sibling repositories** — not just the current project.
+
+```bash
+cto --context "fix payment webhook" --auto-repos # Auto-discover sibling repos
+cto --context "fix payment webhook" --repos shared-types,payment-service
+```
+
+**How it works:**
+1. Discovers sibling repos in parent directory (any dir with `package.json`, `tsconfig.json`, `Cargo.toml`, etc.)
+2. Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
+3. Queries each sibling with the task description
+4. Returns ranked matches with repo attribution and content
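Steps 2 to 4 above can be sketched with an in-memory TF-IDF query. This is a simplified sketch, not the package's indexer: the `Repo` and `Match` shapes, the tokenizer, and the smoothed IDF formula are assumptions, and repository discovery on disk (step 1) is left out so the example stays self-contained.

```typescript
// Hypothetical in-memory repo: file path -> file content.
interface Repo {
  name: string;
  files: Record<string, string>;
}

interface Match {
  repo: string;
  path: string;
  score: number;
}

// Lowercase, split on non-alphanumerics, drop one-character tokens.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);

// Score every file in one repo against the task description.
function queryRepo(repo: Repo, task: string): Match[] {
  const paths = Object.keys(repo.files);
  const docs = paths.map((p) => tokenize(repo.files[p] ?? ""));

  // Document frequency per term. Prototype-less objects keep terms such as
  // "constructor" from colliding with inherited properties.
  const df: Record<string, number> = Object.create(null);
  for (const doc of docs) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const term of doc) {
      if (!seen[term]) {
        seen[term] = true;
        df[term] = (df[term] ?? 0) + 1;
      }
    }
  }

  const queryTerms = tokenize(task);
  const matches: Match[] = [];
  docs.forEach((doc, i) => {
    let score = 0;
    for (const q of queryTerms) {
      const tf = doc.filter((t) => t === q).length / Math.max(doc.length, 1);
      const idf = Math.log(1 + paths.length / (1 + (df[q] ?? 0))); // smoothed IDF
      score += tf * idf;
    }
    if (score > 0) matches.push({ repo: repo.name, path: paths[i] ?? "", score });
  });
  return matches;
}

// Query each sibling, then merge and rank the matches with repo attribution.
function queryRepos(repos: Repo[], task: string, limit = 5): Match[] {
  const all: Match[] = [];
  for (const repo of repos) all.push(...queryRepo(repo, task));
  return all.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Because each sibling has its own index, IDF values are local to a repo, so scores from different repos are only roughly comparable. How the real tool normalizes across repos is not described in the README.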
+
+Real use case: You're fixing a webhook handler in `api-gateway` — CTO finds the `Payment` interface in `shared-types` and the consumer in `notification-service` automatically.
+
+## Cost-Aware Model Routing
+
+CTO analyzes the **actual selected context** (not just the project) to recommend the cheapest model that can handle the task.
+
+```bash
+cto --context "update readme" --route # → Haiku ($0.08/call, 73% cheaper)
+cto --context "fix auth bug" --route # → Opus ($1.33/call, critical complexity)
+cto --context "refactor API" --route # → Sonnet ($0.30/call, balanced)
+```
+
+**Complexity is computed from real signals:**
+- Token density (% of budget used)
+- Risk concentration (top-5 file avg risk vs project max)
+- Directory diversity (cross-cutting = harder)
+- Dependency density among selected files
+
+The gateway also uses this: every proxied request gets a model recommendation in the injected context.

 ## MCP Server

@@ -107,17 +207,6 @@ Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).

 MCP output is also auto-sanitized when `includeContents: true`.

-## How it works
-
-1. **Dependency graph** — parses imports, builds adjacency list, identifies hubs
-2. **Risk scoring** — complexity × centrality × recency (continuous, log-scaled)
-3. **TF-IDF/BM25 semantic matching** — task description scored against all file contents + path boosting
-4. **Composite ranking** — `finalScore = risk × 0.4 + semantic × 0.4 + learner × 0.2`
-5. **Greedy allocation** — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
-6. **Bayesian learning** — exponential decay on priors, Wilson score confidence, per-task-type patterns
-
-No AI is used for selection. Same input → same output. Deterministic.
-
 ## Programmatic API

 ```typescript
@@ -132,22 +221,66 @@ const selection = await selectContext({
   task: 'fix auth',
   analysis,
   budget: 50_000,
-  semanticScores,
+  semanticScores,
 });
 ```

-##
+## v7.0 Enterprise Features
+
+### Precision Reranker (96.9% precision, was 33.6%)
+
+Multi-signal reranker between BM25 retrieval and greedy allocation:
+- **Term coverage**: fraction of unique query terms matched per file
+- **Term specificity**: IDF-weighted — rare terms matter more
+- **Bigram proximity**: query terms appearing close together in the file
+- **Dependency signal**: files in the dependency cone of top matches
+- **Quality gate**: adaptive cutoff stops filling budget with noise
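Two of the signals above, term coverage and the quality gate, are simple enough to sketch. The README does not say how the adaptive cutoff is computed, so the rule used here (keep candidates scoring at least a fixed fraction of the top score) is purely an assumption, as are the function names and the `Candidate` shape.

```typescript
// Term coverage: fraction of unique query terms that appear in the file.
function termCoverage(queryTerms: string[], fileTerms: string[]): number {
  const unique = queryTerms.filter((t, i) => queryTerms.indexOf(t) === i);
  if (unique.length === 0) return 0;
  const matched = unique.filter((t) => fileTerms.indexOf(t) !== -1);
  return matched.length / unique.length;
}

interface Candidate {
  path: string;
  score: number;
}

// Hypothetical quality gate: keep positive-scoring candidates whose score is
// at least `ratio` times the best score, so a weak tail never fills the budget.
function qualityGate(candidates: Candidate[], ratio = 0.5): Candidate[] {
  const sorted = candidates.slice().sort((a, b) => b.score - a.score);
  const best = sorted[0]?.score ?? 0;
  return sorted.filter((c) => c.score > 0 && c.score >= best * ratio);
}
```

A relative cutoff adapts to the query: a task with one strong match keeps few files, while a task with many similar matches keeps more.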
+
+### Persistent Index Cache
+
+TF-IDF index persisted to `.cto/index-cache.json` with per-file mtime tracking. Subsequent queries only re-tokenize changed files. 50K-file repos go from 5s → <100ms on warm cache.
+
+### Multi-Language Dependency Graphs
+
+Regex-based import parsing for **Python**, **Go**, **Java**, and **Rust** alongside ts-morph for TS/JS. Enables hub detection, risk scoring, and dependency expansion for polyglot codebases.
+
+```bash
+# Works on Python, Go, Java, Rust projects — not just TypeScript
+cto --context "fix auth handler" /path/to/go-project
+```
+
+### Team Authentication & SSO
+
+Per-team API keys, JWT validation (HS256/RS256), rate limiting, model allowlists. Teams stored in `.cto/gateway/teams.json`.
+
+### Metrics Export
+
+Prometheus exposition format at `/__cto/metrics`, Datadog JSON, and StatsD UDP. Counters, histograms, gauges for requests, tokens, cost, latency, secrets.
+
+### Per-Team Policy Engine
+
+Routing rules per team: model overrides by task type, cost caps per request, context budget limits, block rules. Preset policies: `createCostConscious()`, `createSecurityFirst()`.
+
+### Closed-Loop A/B Testing
+
+Real experimentation on context strategies with two-proportion z-test for statistical significance. Deterministic assignment (SHA-256 hashing), auto-conclusion when p < 0.05.
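For reference, a two-proportion z-test with a pooled standard error can be written in a few lines. This is a generic statistical sketch, not the package's code: the `Arm` shape and function names are hypothetical, and the normal CDF uses the Abramowitz and Stegun polynomial approximation of the error function (formula 7.1.26), which is accurate to roughly 1.5e-7. The SHA-256 assignment step is not shown.

```typescript
// Standard normal CDF via the Abramowitz and Stegun erf approximation 7.1.26.
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
      t * (-0.284496736 +
        t * (1.421413741 +
          t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// One experiment arm: how many selections were accepted out of how many shown.
interface Arm {
  successes: number;
  trials: number;
}

// Two-sided two-proportion z-test using the pooled proportion.
function twoProportionZTest(a: Arm, b: Arm): { z: number; pValue: number } {
  const pA = a.successes / a.trials;
  const pB = b.successes / b.trials;
  const pooled = (a.successes + b.successes) / (a.trials + b.trials);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / a.trials + 1 / b.trials));
  if (se === 0) return { z: 0, pValue: 1 }; // both arms all-success or all-failure
  const z = (pB - pA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { z, pValue };
}
```

With the README's rule, an experiment would conclude once `pValue` drops below 0.05. Note that checking significance repeatedly as data arrives inflates the false-positive rate unless the procedure corrects for it.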
+
+### LSP Bridge (IDE Plugin)
+
+JSON-RPC 2.0 server over stdin/stdout for any IDE: VS Code, JetBrains, Neovim, Emacs. Custom methods: `cto/selectContext`, `cto/score`, `cto/audit`, `cto/experiments`.
+
+## Honest Limitations

-- **TypeScript/JavaScript gets
-- **
+- **TypeScript/JavaScript gets AST analysis.** Python/Go/Java/Rust get regex-based import parsing (good for graphs, not AST-accurate).
+- **BM25 + reranker, not embeddings.** 96.9% precision on our benchmark. No neural model needed.
 - **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure graph + risk + semantic.
-- **Not compared against Cursor/Copilot internal context
+- **Benchmarked against naive baselines** (alphabetical, random, risk-only, TF-IDF-only). Not compared against Cursor/Copilot internal context engines.

 ## Contributing

 ```bash
 git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
-npm install && npm run build && npm test #
+npm install && npm run build && npm test # 776 tests
 ```

 ## License