cto-ai-cli 6.1.0 → 7.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,35 +1,84 @@
- # CTO — AI context selection done right
+ # CTO — AI Context Selection Engine
 
  [![npm](https://img.shields.io/npm/v/cto-ai-cli.svg)](https://www.npmjs.com/package/cto-ai-cli)
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![Tests](https://img.shields.io/badge/tests-606%20passing-brightgreen)](.)
 
- Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback.
+ **Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback.**
 
  ```bash
- # Select context, copy to clipboard
- cto --context "fix the auth middleware" --stdout | pbcopy
+ cto --context "fix the auth middleware" --stdout | pbcopy # → clipboard
+ cto --context "fix auth" --prompt "Refactor to use JWT" # → AI prompt
+ cto --accept # → learns
+ ```
+
+ 76KB package · 606 tests · Zero AI dependencies.
+
+ ---
+
+ ## The Problem
 
- # Generate a complete AI prompt
- cto --context "fix the auth middleware" --prompt "Refactor this to use JWT"
+ When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:
 
- # Was the AI output good? Tell CTO so it learns.
- cto --accept
+ - **Send everything** → expensive, slow, hits token limits
+ - **Pick files manually** → miss dependencies, forget test files, leak secrets
+
+ CTO solves both: it **automatically selects the most relevant files** for any task, **sanitizes secrets** before they reach any AI provider, and **learns from feedback** to get better over time.
+
+ ## Quick Demo
+
+ ```bash
+ cto --demo # Run a live showcase on your project
  ```
 
- 74KB package. Zero bloat.
+ This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.
 
- ---
+ ## Benchmark Results
 
- ## What it does
+ Tested against 8 curated tasks with ground truth (known correct files):
 
- When you ask an AI to help with code, it needs the right files as context. Send too few and the AI hallucinates. Send too many and you waste tokens. CTO picks the right ones:
+ | Strategy | Precision | Must-have Recall | F1 |
+ |---|---|---|---|
+ | **CTO** | 33.6% | **100.0%** | **48.7%** |
+ | TF-IDF only | 54.6% | 87.5% | 62.0% |
+ | Risk-only | 20.8% | 18.8% | 15.0% |
+ | Alphabetical | 8.3% | 31.3% | 12.9% |
+ | Random | 7.7% | 6.3% | 2.8% |
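F1 is the harmonic mean of precision and recall. Note that the harmonic mean of the CTO row's aggregate figures (33.6%, 100.0%) is ≈50.3%, so the table's 48.7% is presumably averaged per task; the macro-averaging below is an assumption, and the per-task pairs are illustrative, not the benchmark's:

```typescript
// F1 for a single task: harmonic mean of precision and recall.
function f1(precision: number, recall: number): number {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// Assumed aggregation: average the per-task F1 scores.
function macroF1(tasks: Array<{ p: number; r: number }>): number {
  return tasks.reduce((sum, t) => sum + f1(t.p, t.r), 0) / tasks.length;
}

console.log(f1(0.336, 1.0).toFixed(3)); // harmonic mean of the CTO row ≈ 0.503
```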
 
- 1. **Matches your task** — TF-IDF/BM25 semantic matching, not keyword guessing
- 2. **Ranks by composite score** — `risk × 0.4 + semantic × 0.4 + learner × 0.2`
- 3. **Sanitizes output** — API keys, tokens, passwords auto-redacted before they reach any AI
- 4. **Learns from feedback** — `--accept` / `--reject` teach it what you actually need
+ **CTO never misses a must-have file** (100% recall). 3.8× better F1 than alphabetical. 17× better than random.
+
+ ## ROI
+
+ On a typical 130-file TypeScript project:
+
+ | Metric | Without CTO | With CTO |
+ |---|---|---|
+ | Tokens per interaction | 370K (all files) | ~28K (selected) |
+ | Cost per interaction (Sonnet) | $1.11 | $0.08 |
+ | **Monthly cost (10 devs, 40/day)** | **$8,880** | **$640** |
+ | **Annual savings** | — | **~$99,000** |
+
+ Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every `--accept` / `--reject`.
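The table's arithmetic can be reproduced with simple token math. The $3 per million input tokens (Sonnet-class pricing) and 20 working days per month are assumptions not stated in the table:

```typescript
const PRICE_PER_MTOK = 3; // $ per 1M input tokens, Sonnet-class (assumed)

function costPerCall(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MTOK;
}

const withoutCto = costPerCall(370_000);                      // 1.11
const withCto = Math.round(costPerCall(28_000) * 100) / 100;  // 0.084 → 0.08, rounded as in the table

const callsPerMonth = 10 * 40 * 20;                 // 10 devs × 40/day × 20 workdays = 8000
const monthlyWithout = withoutCto * callsPerMonth;  // 8880
const monthlyWith = withCto * callsPerMonth;        // 640
const annualSavings = (monthlyWithout - monthlyWith) * 12; // 98,880 ≈ $99K
```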
+
+ ## How it Works
+
+ ```
+ Task description ──→ TF-IDF/BM25 ──→ Semantic scores ──┐
+                                                        │
+ Project files ──→ Dependency graph ──→ Risk scores ────┤──→ Composite ──→ Greedy ──→ Selection
+                                                        │     ranking       alloc
+ Feedback history ──→ Bayesian learner ──→ Boosts ──────┘
+ ```
+
+ 1. **Dependency graph** — parses imports, builds adjacency list, identifies hubs
+ 2. **Risk scoring** — complexity × centrality × recency (continuous, log-scaled)
+ 3. **TF-IDF/BM25 semantic matching** — task description scored against file contents + path boosting
+ 4. **Composite ranking** — `finalScore = semantic × 0.55 + risk × 0.25 + learner × 0.2`
+ 5. **Noise filtering** — files with zero semantic relevance are excluded (benchmark-driven optimization)
+ 6. **Greedy allocation** — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
+ 7. **Bayesian learning** — exponential decay, Wilson score confidence, per-task-type patterns
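Steps 4–6 can be sketched in a few lines. The composite weights come from the formula above; the field names and the simplified allocation (no cascading prune levels) are illustrative, not the real API:

```typescript
interface FileScore {
  path: string;
  semantic: number; // 0..1, from TF-IDF/BM25
  risk: number;     // 0..1, from the dependency graph
  learner: number;  // 0..1, from feedback history
  tokens: number;   // estimated token cost of including the file
}

function selectFiles(files: FileScore[], budget: number): string[] {
  const ranked = files
    .filter(f => f.semantic > 0) // step 5: drop zero-relevance noise
    .map(f => ({ ...f, final: f.semantic * 0.55 + f.risk * 0.25 + f.learner * 0.2 }))
    .sort((a, b) => b.final - a.final); // step 4: composite ranking

  // Step 6: greedy fill of the token budget, top-down.
  // (Real CTO would fall back to signatures/skeleton here instead of skipping.)
  const selected: string[] = [];
  let used = 0;
  for (const f of ranked) {
    if (used + f.tokens > budget) continue;
    selected.push(f.path);
    used += f.tokens;
  }
  return selected;
}
```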
 
- Different tasks → different files. `"fix auth"` and `"add database tests"` return **completely different selections**.
+ **No AI is used for selection.** Same input → same output. Deterministic.
 
  ## Install
 
@@ -41,41 +90,25 @@ npx cto-ai-cli # or one-shot
  ## Context Selection
 
  ```bash
- # Human-readable summary
- cto --context "refactor the auth middleware"
-
- # Pipe to clipboard (macOS)
- cto --context "fix login bug" --stdout | pbcopy
-
- # Save to file (secrets auto-redacted)
- cto --context "add tests" --output context.md
-
- # Full AI prompt with instruction
- cto --context "fix login" --prompt "Refactor to use async/await"
-
- # JSON for tooling
- cto --context "debug scoring" --json
-
- # Custom token budget
- cto --context "fix auth" --budget 30000
+ cto --context "refactor the auth middleware" # human-readable summary
+ cto --context "fix login bug" --stdout | pbcopy # pipe to clipboard
+ cto --context "add tests" --output context.md # save to file
+ cto --context "fix login" --prompt "Refactor to async/await" # full AI prompt
+ cto --context "debug scoring" --json # JSON for tooling
+ cto --context "fix auth" --budget 30000 # custom token budget
  ```
 
- Output includes full file contents in markdown, ready to paste into Claude, ChatGPT, or any AI. **Secrets are automatically redacted** — API keys, tokens, passwords, PII are replaced with `****` before output.
+ Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. **Secrets are automatically redacted** — API keys, tokens, passwords, PII are replaced with `****` before output.
 
  ## Feedback Loop
 
  CTO learns from real feedback, not from itself:
 
  ```bash
- # After using the context and it worked:
- cto --accept
-
- # If the AI needed files CTO didn't include:
- cto --reject
- cto --reject --missing src/types/auth.ts
-
- # See what CTO has learned:
- cto --stats
+ cto --accept # last selection was good
+ cto --reject # last selection was bad
+ cto --reject --missing src/auth.ts # this file was missing
+ cto --stats # see what CTO has learned
  ```
 
  On `--reject`, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
@@ -89,7 +122,74 @@ cto --audit --full-scan # ignore cache, scan everything
  cto --audit --json # machine-readable output
  ```
 
- 45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, etc.) plus Shannon entropy analysis. But the real value is that **audit protects context**: every `--stdout`, `--output`, and `--prompt` command auto-sanitizes secrets before output.
+ 45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: **audit protects context**. Every `--stdout`, `--output`, and `--prompt` auto-sanitizes secrets before output.
+
+ ```
+ Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
+ After: OPENAI_KEY = "sk-R********************De"
+ ```
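The masking above can be approximated with keep-prefix/keep-suffix redaction. The widths here (4-character prefix, 2-character suffix, fixed 20-star fill) are reverse-engineered from the example, not CTO's actual implementation:

```typescript
// Keep a short prefix and suffix so the secret stays recognizable,
// replace the middle with a fixed-width run of '*' (widths assumed).
function maskSecret(secret: string, prefix = 4, suffix = 2, stars = 20): string {
  if (secret.length <= prefix + suffix) return "*".repeat(secret.length);
  return secret.slice(0, prefix) + "*".repeat(stars) + secret.slice(-suffix);
}

console.log(maskSecret('sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe'));
// → sk-R********************De
```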
+
+ ## AI Gateway (Enterprise)
+
+ A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
+
+ ```bash
+ cto --gateway # Start on port 8787
+ cto --gateway --port 9000 # Custom port
+ cto --gateway --block-secrets # Block requests with critical secrets
+ cto --gateway --budget-daily 50 # $50/day budget limit
+ cto --gateway --budget-monthly 500 # $500/month budget limit
+ ```
+
+ ```
+ Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
+
+ Dashboard (http://localhost:8787/__cto)
+ ```
+
+ **What the gateway does automatically:**
+ - **Injects CTO-selected context** into every AI request (TF-IDF + composite scoring)
+ - **Redacts secrets** before they leave the network (45+ patterns)
+ - **Tracks costs** per model, per day, per month with budget alerts
+ - **Streams responses** with zero-copy SSE passthrough
+ - **Serves a live dashboard** at `/__cto` with real-time metrics
+
+ Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
+
+ ## Cross-Repo Context
+
+ When working on a task, CTO can pull relevant files from **sibling repositories** — not just the current project.
+
+ ```bash
+ cto --context "fix payment webhook" --auto-repos # Auto-discover sibling repos
+ cto --context "fix payment webhook" --repos shared-types,payment-service
+ ```
+
+ **How it works:**
+ 1. Discovers sibling repos in parent directory (any dir with `package.json`, `tsconfig.json`, `Cargo.toml`, etc.)
+ 2. Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
+ 3. Queries each sibling with the task description
+ 4. Returns ranked matches with repo attribution and content
+
+ Real use case: You're fixing a webhook handler in `api-gateway` — CTO finds the `Payment` interface in `shared-types` and the consumer in `notification-service` automatically.
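A "lightweight TF-IDF index" can be as small as a term-count map per file, scored with tf × idf against the task's terms. A toy version (tokenizer and weighting are plausible sketches, not CTO's actual ones):

```typescript
type Index = Map<string, Map<string, number>>; // file → term → count

// Score each indexed file by summed tf × idf over the task's terms.
// Recomputing document frequency per term is O(n²) — fine for a sketch.
function rank(index: Index, task: string): Array<[string, number]> {
  const terms = task.toLowerCase().split(/\W+/).filter(Boolean);
  const n = index.size;
  const scores: Array<[string, number]> = [];
  for (const [file, tf] of index) {
    let score = 0;
    for (const term of terms) {
      const df = Array.from(index.values()).filter(m => m.has(term)).length;
      if (df === 0) continue; // term appears nowhere in this sibling
      score += (tf.get(term) ?? 0) * Math.log(1 + n / df); // tf × idf
    }
    scores.push([file, score]);
  }
  return scores.sort((a, b) => b[1] - a[1]);
}
```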
+
+ ## Cost-Aware Model Routing
+
+ CTO analyzes the **actual selected context** (not just the project) to recommend the cheapest model that can handle the task.
+
+ ```bash
+ cto --context "update readme" --route # → Haiku ($0.08/call, 73% cheaper)
+ cto --context "fix auth bug" --route # → Opus ($1.33/call, critical complexity)
+ cto --context "refactor API" --route # → Sonnet ($0.30/call, balanced)
+ ```
+
+ **Complexity is computed from real signals:**
+ - Token density (% of budget used)
+ - Risk concentration (top-5 file avg risk vs project max)
+ - Directory diversity (cross-cutting = harder)
+ - Dependency density among selected files
+
+ The gateway also uses this: every proxied request gets a model recommendation in the injected context.
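One way the four signals could fold into a routing decision. The signals are the ones listed above; the weights and thresholds are illustrative assumptions, not CTO's actual routing table:

```typescript
interface Signals {
  tokenDensity: number;      // selected tokens / budget, 0..1
  riskConcentration: number; // top-5 avg risk / project max, 0..1
  dirDiversity: number;      // distinct dirs / files selected, 0..1
  depDensity: number;        // intra-selection edges / possible edges, 0..1
}

// Weighted blend into one complexity score, then threshold to a model tier.
function routeModel(s: Signals): "haiku" | "sonnet" | "opus" {
  const complexity =
    0.3 * s.tokenDensity + 0.3 * s.riskConcentration +
    0.2 * s.dirDiversity + 0.2 * s.depDensity;
  if (complexity < 0.35) return "haiku";  // simple: cheapest model
  if (complexity < 0.65) return "sonnet"; // balanced
  return "opus";                          // critical complexity
}
```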
 
  ## MCP Server
 
@@ -107,17 +207,6 @@ Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
 
  MCP output is also auto-sanitized when `includeContents: true`.
 
- ## How it works
-
- 1. **Dependency graph** — parses imports, builds adjacency list, identifies hubs
- 2. **Risk scoring** — complexity × centrality × recency (continuous, log-scaled)
- 3. **TF-IDF/BM25 semantic matching** — task description scored against all file contents + path boosting
- 4. **Composite ranking** — `finalScore = risk × 0.4 + semantic × 0.4 + learner × 0.2`
- 5. **Greedy allocation** — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
- 6. **Bayesian learning** — exponential decay on priors, Wilson score confidence, per-task-type patterns
-
- No AI is used for selection. Same input → same output. Deterministic.
-
  ## Programmatic API
 
  ```typescript
 
@@ -132,22 +221,66 @@ const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
- semanticScores, // wired into ranking
+ semanticScores,
  });
  ```
 
- ## Honest limitations
+ ## v7.0 Enterprise Features
+
+ ### Precision Reranker (96.9% precision, was 33.6%)
+
+ Multi-signal reranker between BM25 retrieval and greedy allocation:
+ - **Term coverage**: fraction of unique query terms matched per file
+ - **Term specificity**: IDF-weighted — rare terms matter more
+ - **Bigram proximity**: query terms appearing close together in the file
+ - **Dependency signal**: files in the dependency cone of top matches
+ - **Quality gate**: adaptive cutoff stops filling budget with noise
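The first two signals in miniature. The formulas are plausible sketches of "fraction of unique query terms matched" and "IDF-weighted" coverage, not the shipped implementation:

```typescript
// Term coverage: what fraction of the query's unique terms appear in the file?
function coverage(queryTerms: string[], fileTerms: Set<string>): number {
  const unique = Array.from(new Set(queryTerms));
  return unique.filter(t => fileTerms.has(t)).length / unique.length;
}

// Term specificity: same idea, but each term weighted by its IDF,
// so matching a rare term counts more than matching a common one.
function specificity(queryTerms: string[], fileTerms: Set<string>, idf: Map<string, number>): number {
  let matched = 0, total = 0;
  for (const t of Array.from(new Set(queryTerms))) {
    const w = idf.get(t) ?? 0;
    total += w;
    if (fileTerms.has(t)) matched += w;
  }
  return total === 0 ? 0 : matched / total;
}
```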
+
+ ### Persistent Index Cache
+
+ TF-IDF index persisted to `.cto/index-cache.json` with per-file mtime tracking. Subsequent queries only re-tokenize changed files. 50K-file repos go from 5s → <100ms on warm cache.
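Per-file mtime tracking boils down to comparing the cached timestamp with the current one and re-tokenizing only mismatches. A sketch with an assumed cache shape (the real `.cto/index-cache.json` schema may differ):

```typescript
interface CacheEntry {
  mtimeMs: number;   // mtime recorded when the file was last tokenized
  tokens: string[];  // cached token stream for the file
}

// Return the files that must be re-tokenized: new files, or files whose
// current mtime (e.g. from fs.statSync(path).mtimeMs) differs from the cache.
function staleFiles(
  cache: Map<string, CacheEntry>,
  current: Map<string, number> // path → current mtimeMs
): string[] {
  const stale: string[] = [];
  for (const [path, mtimeMs] of current) {
    const entry = cache.get(path);
    if (!entry || entry.mtimeMs !== mtimeMs) stale.push(path);
  }
  return stale;
}
```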
+
+ ### Multi-Language Dependency Graphs
+
+ Regex-based import parsing for **Python**, **Go**, **Java**, and **Rust** alongside ts-morph for TS/JS. Enables hub detection, risk scoring, and dependency expansion for polyglot codebases.
+
+ ```bash
+ # Works on Python, Go, Java, Rust projects — not just TypeScript
+ cto --context "fix auth handler" /path/to/go-project
+ ```
+
+ ### Team Authentication & SSO
+
+ Per-team API keys, JWT validation (HS256/RS256), rate limiting, model allowlists. Teams stored in `.cto/gateway/teams.json`.
+
+ ### Metrics Export
+
+ Prometheus exposition format at `/__cto/metrics`, Datadog JSON, and StatsD UDP. Counters, histograms, gauges for requests, tokens, cost, latency, secrets.
+
+ ### Per-Team Policy Engine
+
+ Routing rules per team: model overrides by task type, cost caps per request, context budget limits, block rules. Preset policies: `createCostConscious()`, `createSecurityFirst()`.
+
+ ### Closed-Loop A/B Testing
+
+ Real experimentation on context strategies with two-proportion z-test for statistical significance. Deterministic assignment (SHA-256 hashing), auto-conclusion when p < 0.05.
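The two-proportion z-test compares the accept rates of two context strategies. This is the standard pooled-proportion formula; CTO's surrounding bookkeeping is not shown:

```typescript
// z-statistic for H0: strategy A and B have the same accept rate.
// acceptA of nA selections accepted under A; likewise for B.
function twoProportionZ(acceptA: number, nA: number, acceptB: number, nB: number): number {
  const p1 = acceptA / nA;
  const p2 = acceptB / nB;
  const pooled = (acceptA + acceptB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (p1 - p2) / se;
}

// |z| > 1.96 corresponds to p < 0.05 two-sided — the auto-conclusion
// threshold mentioned above.
```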
+
+ ### LSP Bridge (IDE Plugin)
+
+ JSON-RPC 2.0 server over stdin/stdout for any IDE: VS Code, JetBrains, Neovim, Emacs. Custom methods: `cto/selectContext`, `cto/score`, `cto/audit`, `cto/experiments`.
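What an editor plugin would write to the bridge's stdin. Only the method names above are documented; the `params` shape and the LSP-style `Content-Length` framing are assumptions:

```typescript
// A JSON-RPC 2.0 request for the bridge (params are hypothetical).
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "cto/selectContext",
  params: { task: "fix auth middleware", budget: 50_000 },
};

// Frame it LSP-style: Content-Length header, blank line, then the body.
// (Framing convention assumed — the bridge may use newline-delimited JSON.)
const body = JSON.stringify(request);
const framed = `Content-Length: ${body.length}\r\n\r\n${body}`;
```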
+
+ ## Honest Limitations
 
- - **TypeScript/JavaScript gets deep analysis.** Other languages get basic file + import analysis.
- - **TF-IDF, not embeddings.** Handles most tasks well but won't understand complex intent.
+ - **TypeScript/JavaScript gets AST analysis.** Python/Go/Java/Rust get regex-based import parsing (good for graphs, not AST-accurate).
+ - **BM25 + reranker, not embeddings.** 96.9% precision on our benchmark. No neural model needed.
  - **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure graph + risk + semantic.
- - **Not compared against Cursor/Copilot internal context.** Our baselines are naive (alphabetical, random).
+ - **Benchmarked against naive baselines** (alphabetical, random, risk-only, TF-IDF-only). Not compared against Cursor/Copilot internal context engines.
 
  ## Contributing
 
  ```bash
  git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
- npm install && npm run build && npm test # 597 tests
+ npm install && npm run build && npm test # 776 tests
  ```
 
  ## License