cto-ai-cli 5.2.0 → 6.1.0

package/README.md CHANGED
@@ -1,435 +1,155 @@
1
- # CTO — Stop sending your entire codebase to AI
1
+ # CTO — AI context selection done right
2
2
 
3
- [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
4
- [![Tests](https://img.shields.io/badge/tests-550_passing-brightgreen.svg)](#)
5
- [![Coverage](https://img.shields.io/badge/coverage-91%25-brightgreen.svg)](#)
6
3
  [![npm](https://img.shields.io/npm/v/cto-ai-cli.svg)](https://www.npmjs.com/package/cto-ai-cli)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
7
5
 
8
- CTO analyzes your project and selects the **minimum set of files** your AI needs, saving tokens, reducing cost, and producing code that actually compiles.
6
+ Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback.
9
7
 
10
8
  ```bash
11
- npx cto-ai-cli
12
- ```
13
-
14
- **Runs in <1 second.** No API keys. No data leaves your machine.
15
-
16
- ---
17
-
18
- ## The Problem
9
+ # Select context, copy to clipboard
10
+ cto --context "fix the auth middleware" --stdout | pbcopy
19
11
 
20
- When you ask an AI to help with code, it needs context. Most approaches:
12
+ # Generate a complete AI prompt
13
+ cto --context "fix the auth middleware" --prompt "Refactor this to use JWT"
21
14
 
22
- - **Send everything**: expensive, noisy, AI gets confused
23
- - **Send open files** — misses types, dependencies, config
24
- - **Let the AI pick** — it doesn't know your dependency graph
25
-
26
- The result: AI generates code that **doesn't compile** because it never saw your type definitions.
27
-
28
- ## The Fix
29
-
30
- ```bash
31
- $ npx cto-ai-cli ./my-project
32
- ```
33
- ```
34
- ⚡ cto-score — analyzing your project...
35
-
36
- ╔══════════════════════════════════════════════════╗
37
- ║ ║
38
- ║ 🟢 Context Score™ 88 / 100 Grade: A- ║
39
- ║ ║
40
- ║ Efficiency ████████████████░░░░ 80% ║
41
- ║ Coverage ████████████████████ 100% ║
42
- ║ Risk Control ████████████████████ 100% ║
43
- ║ Structure █░░░░░░░░░░░░░░░░░░ 5% ║
44
- ║ Governance ██████████████████░ 90% ║
45
- ║ ║
46
- ║ 💰 vs. Sending Everything: ║
47
- ║ Tokens saved: 392K (88%) ║
48
- ║ Monthly savings: ~$943 ║
49
- ║ ║
50
- ╚══════════════════════════════════════════════════╝
51
-
52
- Scanned in 0.6s · 199 files · 443K tokens
15
+ # Was the AI output good? Tell CTO so it learns.
16
+ cto --accept
53
17
  ```
54
18
 
55
- ### What each number means
56
-
57
- | Metric | What it measures | Why it matters |
58
- |--------|-----------------|----------------|
59
- | **Context Score (88/100)** | Overall AI-readiness of your project | Higher = AI tools produce better output with your code |
60
- | **Efficiency (80%)** | How much CTO can compress without losing value | 80% means we send 20% of tokens for the same quality |
61
- | **Coverage (100%)** | % of important files included in the selection | 100% = every dependency and type file is captured |
62
- | **Risk Control (100%)** | Are high-risk files (hubs, complex code) prioritized? | Ensures AI sees the files most likely to cause bugs |
63
- | **Structure (5%)** | How well-organized your codebase is for AI | Low = too many large files, poor modularity |
64
- | **Governance (90%)** | Audit logging, policy enforcement, secret scanning | Enterprise readiness |
65
- | **Tokens saved (88%)** | Reduction vs. sending every file | Directly reduces your API costs |
66
- | **Monthly savings ($943)** | Estimated cost reduction at 800 interactions/month | Based on average GPT-4o pricing |
19
+ 74KB package. Zero bloat.
67
20
 
68
21
  ---
69
22
 
70
- ## Quick Start
23
+ ## What it does
71
24
 
72
- ### Score your project
25
+ When you ask an AI to help with code, it needs the right files as context. Send too few and the AI hallucinates. Send too many and you waste tokens. CTO picks the right ones:
73
26
 
74
- ```bash
75
- npx cto-ai-cli # Analyze current directory
76
- npx cto-ai-cli ./my-project # Analyze a specific project
77
- npx cto-ai-cli --json # Machine-readable JSON output
78
- ```
27
+ 1. **Matches your task** — TF-IDF/BM25 semantic matching, not keyword guessing
28
+ 2. **Ranks by composite score** — `risk × 0.4 + semantic × 0.4 + learner × 0.2`
29
+ 3. **Sanitizes output** — API keys, tokens, passwords auto-redacted before they reach any AI
30
+ 4. **Learns from feedback**: `--accept` / `--reject` teach it what you actually need
79
31
 
80
- ### Generate optimized context for AI
32
+ Different tasks yield different files. `"fix auth"` and `"add database tests"` return **completely different selections**.
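To make the composite score concrete, here is a minimal sketch of the weighted ranking. Only the weights (0.4 / 0.4 / 0.2) come from the README. The `FileSignals` shape, the function names, and the assumption that each signal is normalized to the range 0..1 are illustrative and may not match the package internals.

```typescript
// Hypothetical shape: each signal is assumed to be normalized to [0, 1].
interface FileSignals {
  filePath: string;
  risk: number;
  semantic: number;
  learner: number;
}

// Weights from the README formula: risk × 0.4 + semantic × 0.4 + learner × 0.2.
function compositeScore(f: FileSignals): number {
  return f.risk * 0.4 + f.semantic * 0.4 + f.learner * 0.2;
}

// Sort a copy so the caller's array is left untouched; highest score first.
function rankFiles(files: FileSignals[]): FileSignals[] {
  return files.slice().sort((a, b) => compositeScore(b) - compositeScore(a));
}

const ranked = rankFiles([
  { filePath: "src/db/pool.ts", risk: 0.9, semantic: 0.1, learner: 0.0 },
  { filePath: "src/middleware/auth.ts", risk: 0.5, semantic: 0.9, learner: 0.5 },
]);
console.log(ranked.map((f) => f.filePath));
```

In this made-up example the auth middleware outranks the riskier database file because its semantic match to the task is much stronger, which is the behavior the "different tasks, different selections" claim depends on.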
81
33
 
82
- ```bash
83
- npx cto-ai-cli --fix
84
- ```
85
-
86
- Creates `.cto/context.md` — paste this into any AI chat for optimal context. Also generates `.cto/config.json` and `.cto/.cteignore`.
34
+ ## Install
87
35
 
88
36
  ```bash
89
- npx cto-ai-cli --context "refactor the auth middleware"
37
+ npm i -g cto-ai-cli # global
38
+ npx cto-ai-cli # or one-shot
90
39
  ```
91
40
 
92
- Generates **task-specific** context — only files relevant to auth, including types, dependencies, and related tests.
93
-
94
- Example output:
95
- ```
96
- 📋 Context for: "refactor the auth middleware"
97
-
98
- Selected 12 files (8.2K tokens):
99
-
100
- ┌─ Core (3 files) ─────────────────────────────
101
- │ src/middleware/auth.ts 2,100 tokens
102
- │ src/types/auth.ts 450 tokens
103
- │ src/config/jwt.ts 320 tokens
104
-
105
- ├─ Dependencies (5 files) ─────────────────────
106
- │ src/models/user.ts 1,200 tokens
107
- │ src/services/token.ts 890 tokens
108
- │ ...
109
-
110
- └─ Tests (2 files) ────────────────────────────
111
- tests/auth.test.ts 1,800 tokens
112
- tests/middleware.test.ts 940 tokens
113
-
114
- Saved to .cto/context.md (8.2K tokens — 97% smaller than full project)
115
- ```
116
-
117
- ### Security audit
41
+ ## Context Selection
118
42
 
119
43
  ```bash
120
- npx cto-ai-cli --audit
121
- ```
44
+ # Human-readable summary
45
+ cto --context "refactor the auth middleware"
122
46
 
123
- Scans for **API keys, tokens, passwords, and PII** before they end up in an AI prompt. 45+ patterns (AWS, Stripe, GitHub, OpenAI, etc.) plus Shannon entropy analysis for unknown formats.
47
+ # Pipe to clipboard (macOS)
48
+ cto --context "fix login bug" --stdout | pbcopy
124
49
 
125
- ```
126
- 🔴 CRITICAL src/config/stripe.ts:8
127
- api-key: sk_l********************yZ
128
- 🔴 CRITICAL src/config/database.ts:14
129
- connection-string: post********************db
130
- 🟠 HIGH src/utils/email.ts:22
131
- pii: admi**********om
132
-
133
- 🚨 3 critical findings. Rotate credentials immediately.
134
- ```
50
+ # Save to file (secrets auto-redacted)
51
+ cto --context "add tests" --output context.md
135
52
 
136
- Run in CI to block PRs with secrets: `CI=true npx cto-ai-cli --audit`
53
+ # Full AI prompt with instruction
54
+ cto --context "fix login" --prompt "Refactor to use async/await"
137
55
 
138
- ### Code review intelligence
139
-
140
- ```bash
141
- npx cto-ai-cli --review
142
- ```
143
-
144
- Analyzes your git diff and generates a structured review:
56
+ # JSON for tooling
57
+ cto --context "debug scoring" --json
145
58
 
59
+ # Custom token budget
60
+ cto --context "fix auth" --budget 30000
146
61
  ```
147
- 📊 Review Quality: 82/100 (B+)
148
62
 
149
- Breaking Changes:
150
- 🔴 Removed export: UserService.findById (used by 4 files)
151
- 🟡 Changed signature: authenticate(token) → authenticate(token, opts)
63
+ Output includes full file contents in markdown, ready to paste into Claude, ChatGPT, or any AI. **Secrets are automatically redacted** — API keys, tokens, passwords, PII are replaced with `****` before output.
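As a rough illustration of what redaction before output can look like, the sketch below masks matches with `****`. The three rules are examples chosen for this sketch (two well-known token formats and a quoted password assignment). They are not CTO's actual pattern list, and `redact` is a hypothetical helper rather than part of the package API.

```typescript
// Hypothetical rule shape; the patterns below are illustrative only.
interface RedactionRule {
  pattern: RegExp;
  replacement: string;
}

const RULES: RedactionRule[] = [
  // AWS access key ID format: "AKIA" followed by 16 uppercase letters or digits.
  { pattern: /AKIA[0-9A-Z]{16}/g, replacement: "****" },
  // GitHub personal access token format: "ghp_" followed by 36 alphanumerics.
  { pattern: /ghp_[A-Za-z0-9]{36}/g, replacement: "****" },
  // Quoted password assignments: keep the key and the quotes, mask the value.
  { pattern: /(password\s*[:=]\s*)(["'])[^"']*\2/gi, replacement: "$1$2****$2" },
];

// Apply every rule in order and return the sanitized text.
function redact(text: string): string {
  return RULES.reduce((out, rule) => out.replace(rule.pattern, rule.replacement), text);
}

console.log(redact('const key = "AKIAIOSFODNN7EXAMPLE";'));
console.log(redact("password = 'hunter2'"));
```

Running the sanitizer as the last step before writing to stdout, a file, or a prompt means no output path can skip it.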
152
64
 
153
- Missing Files:
154
- ⚠️ No test file for src/services/auth.ts
155
- ⚠️ src/types/user.ts changed but barrel index not updated
156
-
157
- Impact Radius:
158
- Direct: 4 files | Transitive: 12 files | Tests: 3 files
159
-
160
- Saved review prompt to .cto/review-prompt.md
161
- ```
65
+ ## Feedback Loop
162
66
 
163
- | What it detects | Example |
164
- |-----------------|--------|
165
- | **Breaking changes** | Removed exports, changed function signatures, deleted files |
166
- | **Missing files** | Tests, type files, barrel exports, importers of changed code |
167
- | **Impact radius** | How many files are affected (direct + transitive via BFS) |
168
- | **Review quality** | Score based on PR size, focus, breaking changes, completeness |
169
-
170
- ### Learning mode
67
+ CTO learns from real feedback, not from itself:
171
68
 
172
69
  ```bash
173
- npx cto-ai-cli --learn # View feedback model & stats
174
- npx cto-ai-cli --predict # Predict relevant files for a task
175
- npx cto-ai-cli --learn --json # Export learning data for team sharing
176
- ```
177
-
178
- CTO learns from your usage patterns over time. Uses **EWMA temporal decay** (recent feedback weighs more) and **Bayesian confidence** (Wilson score — avoids over-trusting sparse data).
70
+ # After using the context and it worked:
71
+ cto --accept
179
72
 
180
- ### Quality gate for CI/CD
73
+ # If the AI needed files CTO didn't include:
74
+ cto --reject
75
+ cto --reject --missing src/types/auth.ts
181
76
 
182
- ```bash
183
- npx cto-ai-cli --ci # Run quality gate (exits 1 on failure)
184
- npx cto-ai-cli --ci --threshold 80 # Custom minimum score
185
- npx cto-ai-cli --ci --json # JSON for pipeline parsing
77
+ # See what CTO has learned:
78
+ cto --stats
186
79
  ```
187
80
 
188
- Block merges when context quality drops below your threshold. Tracks baselines and detects regressions.
81
+ On `--reject`, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
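The README lists Wilson score confidence as part of the learning step. The sketch below shows the standard Wilson lower bound applied to accept/reject counts. How CTO actually maps feedback to counts is not documented here, so `wilsonLowerBound` and its inputs are assumptions for illustration.

```typescript
// Lower bound of the Wilson score interval for a binomial proportion.
// accepts: number of positive feedback events; total: all feedback events.
// z = 1.96 corresponds to roughly 95% confidence.
function wilsonLowerBound(accepts: number, total: number, z: number = 1.96): number {
  if (total === 0) return 0; // no evidence yet
  const p = accepts / total;
  const z2 = z * z;
  const centre = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total));
  return (centre - margin) / (1 + z2 / total);
}

// One accept out of one try is weak evidence; fifty out of fifty is strong.
console.log(wilsonLowerBound(1, 1).toFixed(3));
console.log(wilsonLowerBound(50, 50).toFixed(3));
```

Both files in the example have a 100% accept rate, but the lower bound stays low until enough feedback has accumulated, which is how this kind of score avoids over-trusting sparse data.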
189
82
 
190
- ### Monorepo support
83
+ ## Secret Audit
191
84
 
192
85
  ```bash
193
- npx cto-ai-cli --monorepo # Analyze all packages
194
- npx cto-ai-cli --monorepo --package api # Focus on one package
86
+ cto --audit # scan all files
87
+ cto --audit --init-hook # install pre-commit hook
88
+ cto --audit --full-scan # ignore cache, scan everything
89
+ cto --audit --json # machine-readable output
195
90
  ```
196
91
 
197
- Detects npm/yarn/pnpm workspaces, Turborepo, Nx, and Lerna. Shows cross-package dependencies, isolation scores, and shared package analysis.
92
+ 45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, etc.) plus Shannon entropy analysis. But the real value is that **audit protects context**: every `--stdout`, `--output`, and `--prompt` command auto-sanitizes secrets before output.
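Entropy analysis is what catches secrets that match no known pattern. A minimal sketch of the idea follows. The 4.0 bits-per-character threshold and the 20-character minimum are assumptions picked for this example, not values taken from CTO.

```typescript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)).
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = (counts[ch] || 0) / s.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Hypothetical heuristic: long, high-entropy tokens look like generated secrets.
function isLikelySecret(token: string, threshold: number = 4.0, minLength: number = 20): boolean {
  return token.length >= minLength && shannonEntropy(token) > threshold;
}

console.log(isLikelySecret("aaaaaaaaaaaaaaaaaaaaaaaa"));
console.log(isLikelySecret("q7Zx2LmP9vKd4RtY8wNb3HcJ6sGf1UaE"));
```

Natural-language identifiers reuse a small set of characters and score low, while randomly generated keys spread across the alphabet and score high. A production scanner would need allowlists for things like hashes and UUIDs to keep false positives down.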
198
93
 
199
- ---
94
+ ## MCP Server
200
95
 
201
- ## All CLI Flags
96
+ Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
202
97
 
203
- ```bash
204
- # Analysis
205
- npx cto-ai-cli [path] # Score a project
206
- npx cto-ai-cli --json # JSON output
207
- npx cto-ai-cli --benchmark # CTO vs naive vs random comparison
208
- npx cto-ai-cli --compare # Compare vs popular OSS projects
209
- npx cto-ai-cli --report # Markdown report + badge
210
-
211
- # Context generation
212
- npx cto-ai-cli --fix # Auto-generate .cto/context.md
213
- npx cto-ai-cli --context "task" # Task-specific context
214
-
215
- # Security
216
- npx cto-ai-cli --audit # Secret & PII detection
217
- npx cto-ai-cli --audit --full-scan # Scan all files (ignore cache)
218
- npx cto-ai-cli --audit --init-hook # Install pre-commit hook
219
-
220
- # Code review
221
- npx cto-ai-cli --review # PR review analysis
222
- npx cto-ai-cli --review --json # Review data as JSON
223
-
224
- # Learning
225
- npx cto-ai-cli --learn # Feedback model dashboard
226
- npx cto-ai-cli --predict # File predictions for a task
227
- npx cto-ai-cli --learn --json # Export learning data
228
-
229
- # CI/CD
230
- npx cto-ai-cli --ci # Quality gate
231
- npx cto-ai-cli --ci --threshold 80 # Custom threshold
232
-
233
- # Monorepo
234
- npx cto-ai-cli --monorepo # Full monorepo analysis
235
- npx cto-ai-cli --monorepo --package X # Single package
236
-
237
- # Gateway (AI proxy)
238
- npx cto-gateway # Start proxy server
239
- npx cto-gateway --budget-daily 10 # With budget enforcement
240
- ```
241
-
242
- ---
98
+ **3 tools:** `cto_select_context`, `cto_audit_secrets`, `cto_explain`
243
99
 
244
- ## MCP Server (for AI Editors)
245
-
246
- CTO works as an [MCP server](https://modelcontextprotocol.io/) — plug it into Claude, Windsurf, or Cursor.
247
-
248
- **Windsurf** — add to `~/.codeium/windsurf/mcp_config.json`:
249
100
  ```json
250
- {
251
- "mcpServers": {
252
- "cto": { "command": "cto-mcp" }
253
- }
254
- }
255
- ```
101
+ // Windsurf: ~/.codeium/windsurf/mcp_config.json
102
+ { "mcpServers": { "cto": { "command": "cto-mcp" } } }
256
103
 
257
- **Claude Desktop:**
258
- ```json
259
- {
260
- "mcpServers": {
261
- "cto": { "command": "npx", "args": ["-y", "cto-ai-cli", "--mcp"] }
262
- }
263
- }
104
+ // Claude Desktop
105
+ { "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
264
106
  ```
265
107
 
266
- 19 tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, `cto_quality_benchmark`, `cto_compilability`, `cto_audit`, `cto_review`, `cto_monorepo`, and more.
108
+ MCP output is also auto-sanitized when `includeContents: true`.
267
109
 
268
- ---
110
+ ## How it works
111
+
112
+ 1. **Dependency graph** — parses imports, builds adjacency list, identifies hubs
113
+ 2. **Risk scoring** — complexity × centrality × recency (continuous, log-scaled)
114
+ 3. **TF-IDF/BM25 semantic matching** — task description scored against all file contents + path boosting
115
+ 4. **Composite ranking** — `finalScore = risk × 0.4 + semantic × 0.4 + learner × 0.2`
116
+ 5. **Greedy allocation** — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
117
+ 6. **Bayesian learning** — exponential decay on priors, Wilson score confidence, per-task-type patterns
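Step 5 above can be sketched as a greedy loop that tries the richest rendering of each file first and falls back to cheaper ones when the budget runs short. The `Candidate` shape, the per-level token costs, and the `allocate` function are hypothetical. Only the level names (full, signatures, skeleton) and the top-down order come from the list.

```typescript
type PruneLevel = "full" | "signatures" | "skeleton";

// Hypothetical candidate: a ranked file with a token cost per rendering.
interface Candidate {
  filePath: string;
  score: number;
  tokens: Record<PruneLevel, number>;
}

interface Allocation {
  filePath: string;
  level: PruneLevel;
  tokens: number;
}

// Cascade order: richest rendering first, cheapest last.
const LEVELS: PruneLevel[] = ["full", "signatures", "skeleton"];

function allocate(candidates: Candidate[], budget: number): Allocation[] {
  const byScore = candidates.slice().sort((a, b) => b.score - a.score);
  const result: Allocation[] = [];
  let remaining = budget;
  for (const c of byScore) {
    for (const level of LEVELS) {
      const cost = c.tokens[level];
      if (cost <= remaining) {
        result.push({ filePath: c.filePath, level: level, tokens: cost });
        remaining -= cost;
        break; // first level that fits wins
      }
    }
    // A file that does not fit at any level is skipped.
  }
  return result;
}

const plan = allocate(
  [
    { filePath: "a.ts", score: 0.9, tokens: { full: 700, signatures: 300, skeleton: 100 } },
    { filePath: "b.ts", score: 0.6, tokens: { full: 500, signatures: 250, skeleton: 80 } },
    { filePath: "c.ts", score: 0.3, tokens: { full: 400, signatures: 200, skeleton: 60 } },
  ],
  1000,
);
console.log(plan);
```

With a 1000-token budget, `a.ts` is included in full, `b.ts` falls back to signatures, and `c.ts` is dropped because even its skeleton exceeds the 50 tokens left. The loop uses no randomness, so the same inputs always produce the same plan.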
118
+
119
+ No AI is used for selection. Same input → same output. Deterministic.
269
120
 
270
121
  ## Programmatic API
271
122
 
272
123
  ```typescript
273
- import { analyzeProject, computeContextScore, selectContext } from 'cto-ai-cli';
124
+ import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';
274
125
 
275
- // Analyze a project
276
126
  const analysis = await analyzeProject('./my-project');
127
+ const index = buildIndex(files);
128
+ const semanticScores = query(index, 'fix auth', 50)
129
+ .map(m => ({ filePath: m.filePath, score: m.score }));
277
130
 
278
- // Get the Context Score
279
- const score = await computeContextScore(analysis);
280
- console.log(`Score: ${score.overall}/100 (${score.grade})`);
281
- console.log(`Tokens saved: ${score.comparison.savedPercent}%`);
282
-
283
- // Select optimal files for a task
284
131
  const selection = await selectContext({
285
- task: 'refactor the auth middleware',
132
+ task: 'fix auth',
286
133
  analysis,
287
- budget: 50_000, // 50K token budget
134
+ budget: 50_000,
135
+ semanticScores, // wired into ranking
288
136
  });
289
-
290
- console.log(`Selected ${selection.files.length} files`);
291
- console.log(`Coverage: ${selection.coverage.score}%`);
292
- for (const file of selection.files) {
293
- console.log(` ${file.relativePath} (${file.tokens} tokens, risk: ${file.riskScore})`);
294
- }
295
137
  ```
296
138
 
297
- ---
298
-
299
- ## How It Works
300
-
301
- 1. **Scan** — walks your project, parses imports, builds a dependency graph
302
- 2. **Score** — computes risk for each file (complexity, hub score, centrality, recency)
303
- 3. **Select** — deterministic greedy algorithm: picks highest-risk files first within token budget
304
- 4. **Prove** — measures coverage (% of important files included), compares vs naive strategies
305
-
306
- No AI is used for selection. Same input always produces the same output. Fully reproducible.
307
-
308
- ---
309
-
310
- ## Benchmark Proof
311
-
312
- CTO includes an automated benchmark that runs **real context selection** on this repository (or any repo) and compares CTO vs naive (alphabetical) vs random strategies.
313
-
314
- ```bash
315
- $ npx tsx scripts/benchmark.ts --json
316
- ```
317
-
318
- **Results on this repo (124 files, 346K tokens):**
319
-
320
- | Metric | Result |
321
- |--------|--------|
322
- | **CTO win rate** | 100% (20/20 runs across 5 tasks × 4 budgets) |
323
- | **Coverage gain vs random** | +81% average |
324
- | **Tokens saved vs naive** | 10% average |
325
- | **Compilability: CTO** | 92/100 |
326
- | **Compilability: Naive** | 40/100 |
327
- | **CTO fewer predicted errors** | 2 fewer type/import errors per task |
328
- | **Avg selection time** | 16ms |
139
+ ## Honest limitations
329
140
 
330
- The benchmark uses the same scoring engine as the CLI. No hardcoded numbers — run it yourself on any project.
331
-
332
- ---
333
-
334
- ## Gateway — AI Proxy with Security
335
-
336
- ```bash
337
- npx cto-gateway # Start proxy server
338
- npx cto-gateway --port 9000 # Custom port
339
- npx cto-gateway --budget-daily 10 # $10/day budget enforcement
340
- npx cto-gateway --block-secrets # Strip secrets from prompts
341
- npx cto-gateway --api-key <key> # Require authentication
342
- ```
343
-
344
- The gateway sits between your AI editor and the model API, adding:
345
-
346
- - **Context optimization** — injects relevant file contents into prompts automatically
347
- - **Secret scanning** — strips API keys and PII from outbound messages
348
- - **Budget enforcement** — daily/weekly spend limits with alerts
349
- - **Usage tracking** — JSONL logs of all requests with token counts and costs
350
- - **SSRF protection** — domain allowlist, private IP blocking, HTTPS-only
351
- - **Body size limits** — 10MB default, prevents abuse
352
- - **Upstream timeouts** — 120s default with socket cleanup
353
- - **Connection pooling** — keep-alive agents with 50 max sockets
354
-
355
- Supports **OpenAI, Anthropic, Google, and Azure** providers with SSE streaming.
356
-
357
- ---
358
-
359
- ## Multi-Model Optimization
360
-
361
- ```bash
362
- npx cto-ai-cli --benchmark
363
- ```
364
-
365
- CTO knows 8 model profiles and recommends the best model for your task:
366
-
367
- | Model | Context Window | Strengths |
368
- |-------|---------------|----------|
369
- | GPT-4o | 128K | General coding, debugging |
370
- | GPT-4o Mini | 128K | Fast, cheap, simple tasks |
371
- | Claude Sonnet 4 | 200K | Complex refactoring, architecture |
372
- | Claude 3.5 Haiku | 200K | Fast analysis, code review |
373
- | Gemini 2.0 Flash | 1M | Huge codebases, exploration |
374
- | Gemini 2.5 Pro | 1M | Deep reasoning, long context |
375
- | DeepSeek V3 | 128K | Cost-effective coding |
376
- | Codestral | 256K | Code completion, generation |
377
-
378
- For each model, CTO computes: **budget** (based on context window), **quality score** (strength match + coverage), **estimated cost**, and a **recommendation** (best quality, best value, cheapest).
379
-
380
- ---
381
-
382
- ## Security
383
-
384
- ### API Server Path Traversal Protection
385
-
386
- The API server (`cto-api`) validates all project paths:
387
-
388
- - **Forbidden system paths** — blocks `/etc`, `/usr`, `/var`, `/sys`, `/proc`, `/dev`, `/boot`, `/tmp`, `/root`
389
- - **Forbidden patterns** — blocks paths containing `.ssh`, `.gnupg`, `.aws`, `.env`, `passwd`, `shadow`
390
- - **Allowlist** — set `CTO_ALLOWED_ROOTS=/home/deploy,/opt/projects` to restrict access to specific directories
391
-
392
- ### Secret Detection
393
-
394
- 45+ patterns including AWS, Stripe, GitHub, OpenAI, Datadog, Sentry, Firebase, Supabase, and more. Plus:
395
-
396
- - **Shannon entropy analysis** for unknown secret formats
397
- - **PII detection** (emails, SSNs, phone numbers) with safe domain filtering
398
- - **Allowlist system** — SHA-256 fingerprinted exceptions in `.cto/audit/allowlist.json`
399
- - **Incremental scanning** — file hash cache, only re-scans changed files
400
- - **Pre-commit hook** — `npx cto-ai-cli --audit --init-hook` installs a git hook that blocks commits with secrets
401
-
402
- ---
403
-
404
- ## Honest Limitations
405
-
406
- - **TypeScript/JavaScript gets the deepest analysis.** Other languages (Python, Go, Rust, Java) get basic file + import analysis.
407
- - **Benchmarks use simple baselines** (alphabetical, random). Run `npx tsx scripts/benchmark.ts` on your own repo to see real numbers. We haven't compared against Cursor's or Copilot's internal context selection.
408
- - **Savings are estimates** based on average API pricing. Actual savings depend on your model and usage.
409
- - **Risk scoring uses a complexity proxy** instead of real git churn data (planned improvement).
410
-
411
- ---
141
+ - **TypeScript/JavaScript gets deep analysis.** Other languages get basic file + import analysis.
142
+ - **TF-IDF, not embeddings.** Handles most tasks well but won't understand complex intent.
143
+ - **Learning needs ~5 feedback cycles** to start influencing selection. First runs are pure graph + risk + semantic.
144
+ - **Not compared against Cursor/Copilot internal context.** Our baselines are naive (alphabetical, random).
412
145
 
413
146
  ## Contributing
414
147
 
415
148
  ```bash
416
- git clone https://github.com/cto-ai/cto-ai-cli.git
417
- cd cto-ai-cli
418
- npm install
419
- npm run build
420
- npm test # 550 tests, 91% coverage
421
- npm run typecheck # strict TypeScript, zero errors
422
- ```
423
-
424
- Run the automated benchmark to see CTO vs naive on this repo:
425
-
426
- ```bash
427
- npx tsx scripts/benchmark.ts # Human-readable report
428
- npx tsx scripts/benchmark.ts --json # Machine-readable JSON
149
+ git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
150
+ npm install && npm run build && npm test # 597 tests
429
151
  ```
430
152
 
431
- Full API docs, MCP server reference, and architecture are in [DOCS.md](DOCS.md).
432
-
433
153
  ## License
434
154
 
435
155
  [MIT](LICENSE)