cto-ai-cli 5.0.0 → 5.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,7 +1,8 @@
  # CTO — Stop sending your entire codebase to AI

  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
- [![Tests](https://img.shields.io/badge/tests-376_passing-brightgreen.svg)](#)
+ [![Tests](https://img.shields.io/badge/tests-550_passing-brightgreen.svg)](#)
+ [![Coverage](https://img.shields.io/badge/coverage-91%25-brightgreen.svg)](#)
  [![npm](https://img.shields.io/npm/v/cto-ai-cli.svg)](https://www.npmjs.com/package/cto-ai-cli)

  CTO analyzes your project and selects the **minimum set of files** your AI needs — saving tokens, reducing cost, and producing code that actually compiles.
@@ -262,7 +263,7 @@ CTO works as an [MCP server](https://modelcontextprotocol.io/) — plug it into
  }
  ```

- Tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, and more.
+ 19 tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, `cto_quality_benchmark`, `cto_compilability`, `cto_audit`, `cto_review`, `cto_monorepo`, and more.

  ---

@@ -306,10 +307,104 @@ No AI is used for selection. Same input always produces the same output. Fully r

  ---

+ ## Benchmark Proof
+
+ CTO includes an automated benchmark that runs **real context selection** on this repository (or any repo) and compares CTO vs naive (alphabetical) vs random strategies.
+
+ ```bash
+ $ npx tsx scripts/benchmark.ts --json
+ ```
+
+ **Results on this repo (124 files, 346K tokens):**
+
+ | Metric | Result |
+ |--------|--------|
+ | **CTO win rate** | 100% (20/20 runs across 5 tasks × 4 budgets) |
+ | **Coverage gain vs random** | +81% average |
+ | **Tokens saved vs naive** | 10% average |
+ | **Compilability: CTO** | 92/100 |
+ | **Compilability: Naive** | 40/100 |
+ | **CTO fewer predicted errors** | 2 fewer type/import errors per task |
+ | **Avg selection time** | 16ms |
+
+ The benchmark uses the same scoring engine as the CLI. No hardcoded numbers — run it yourself on any project.
+
+ ---
+
+ ## Gateway — AI Proxy with Security
+
+ ```bash
+ npx cto-gateway                    # Start proxy server
+ npx cto-gateway --port 9000        # Custom port
+ npx cto-gateway --budget-daily 10  # $10/day budget enforcement
+ npx cto-gateway --block-secrets    # Strip secrets from prompts
+ npx cto-gateway --api-key <key>    # Require authentication
+ ```
+
+ The gateway sits between your AI editor and the model API, adding:
+
+ - **Context optimization** — injects relevant file contents into prompts automatically
+ - **Secret scanning** — strips API keys and PII from outbound messages
+ - **Budget enforcement** — daily/weekly spend limits with alerts
+ - **Usage tracking** — JSONL logs of all requests with token counts and costs
+ - **SSRF protection** — domain allowlist, private IP blocking, HTTPS-only
+ - **Body size limits** — 10MB default, prevents abuse
+ - **Upstream timeouts** — 120s default with socket cleanup
+ - **Connection pooling** — keep-alive agents with 50 max sockets
+
+ Supports **OpenAI, Anthropic, Google, and Azure** providers with SSE streaming.
+
+ ---
+
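The SSRF-protection bullet in the added README text above mentions private IP blocking. As an illustrative sketch only (this is not the package's shipped code; the function name and range list are assumptions), blocking requests whose upstream host resolves to a private or link-local IPv4 address might look like:

```typescript
// Hypothetical sketch of a private-IPv4 check for SSRF protection.
// Returns true for addresses a proxy should refuse to contact.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return false; // not a well-formed IPv4 address
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // link-local, incl. cloud metadata endpoints
  );
}
```

A real gateway would also need to resolve hostnames before checking, so that a DNS name pointing at 169.254.169.254 cannot bypass the filter.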
+ ## Multi-Model Optimization
+
+ ```bash
+ npx cto-ai-cli --benchmark
+ ```
+
+ CTO knows 8 model profiles and recommends the best model for your task:
+
+ | Model | Context Window | Strengths |
+ |-------|----------------|-----------|
+ | GPT-4o | 128K | General coding, debugging |
+ | GPT-4o Mini | 128K | Fast, cheap, simple tasks |
+ | Claude Sonnet 4 | 200K | Complex refactoring, architecture |
+ | Claude 3.5 Haiku | 200K | Fast analysis, code review |
+ | Gemini 2.0 Flash | 1M | Huge codebases, exploration |
+ | Gemini 2.5 Pro | 1M | Deep reasoning, long context |
+ | DeepSeek V3 | 128K | Cost-effective coding |
+ | Codestral | 256K | Code completion, generation |
+
+ For each model, CTO computes: **budget** (based on context window), **quality score** (strength match + coverage), **estimated cost**, and a **recommendation** (best quality, best value, cheapest).
+
+ ---
+
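The per-model computation described above (budget from context window, estimated cost from token counts) can be sketched in a few lines. This is illustrative only: the interface shape, the 20% reply reserve, and the pricing fields are hypothetical placeholders, not CTO's real model table or formulas.

```typescript
// Hypothetical model profile; prices are placeholders, not real rates.
interface ModelProfile {
  contextWindow: number;      // total tokens the model accepts
  inputPricePerMTok: number;  // USD per 1M input tokens
  outputPricePerMTok: number; // USD per 1M output tokens
}

// Reserve a fraction of the window for the model's reply; the rest is
// the token budget available for selected context.
function contextBudget(m: ModelProfile, replyReserve = 0.2): number {
  return Math.floor(m.contextWindow * (1 - replyReserve));
}

// Straightforward linear cost estimate from token counts.
function estimatedCost(m: ModelProfile, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * m.inputPricePerMTok + (outputTokens / 1e6) * m.outputPricePerMTok;
}
```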
+ ## Security
+
+ ### API Server Path Traversal Protection
+
+ The API server (`cto-api`) validates all project paths:
+
+ - **Forbidden system paths** — blocks `/etc`, `/usr`, `/var`, `/sys`, `/proc`, `/dev`, `/boot`, `/tmp`, `/root`
+ - **Forbidden patterns** — blocks paths containing `.ssh`, `.gnupg`, `.aws`, `.env`, `passwd`, `shadow`
+ - **Allowlist** — set `CTO_ALLOWED_ROOTS=/home/deploy,/opt/projects` to restrict access to specific directories
+
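A check combining the three rules above (forbidden roots, forbidden substrings, optional allowlist) could be sketched as follows. This is a hypothetical re-implementation for illustration, using the lists quoted in the README; the actual `cto-api` logic may differ.

```typescript
import path from "node:path";

// Lists taken from the README bullets above; the check itself is a sketch.
const FORBIDDEN_ROOTS = ["/etc", "/usr", "/var", "/sys", "/proc", "/dev", "/boot", "/tmp", "/root"];
const FORBIDDEN_PATTERNS = [".ssh", ".gnupg", ".aws", ".env", "passwd", "shadow"];

function isAllowedProjectPath(p: string, allowedRoots: string[] = []): boolean {
  // Resolve first so "../" tricks cannot escape the checks below.
  const normalized = path.resolve(p);
  if (FORBIDDEN_ROOTS.some((r) => normalized === r || normalized.startsWith(r + "/"))) {
    return false;
  }
  if (FORBIDDEN_PATTERNS.some((pat) => normalized.includes(pat))) {
    return false;
  }
  // If an allowlist is configured, the path must fall under one of its roots.
  if (allowedRoots.length > 0 &&
      !allowedRoots.some((r) => normalized.startsWith(path.resolve(r) + "/"))) {
    return false;
  }
  return true;
}
```

Resolving before matching is the important detail: pattern checks on the raw string alone can be bypassed with `..` segments.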
+ ### Secret Detection
+
+ 45+ patterns including AWS, Stripe, GitHub, OpenAI, Datadog, Sentry, Firebase, Supabase, and more. Plus:
+
+ - **Shannon entropy analysis** for unknown secret formats
+ - **PII detection** (emails, SSNs, phone numbers) with safe domain filtering
+ - **Allowlist system** — SHA-256 fingerprinted exceptions in `.cto/audit/allowlist.json`
+ - **Incremental scanning** — file hash cache, only re-scans changed files
+ - **Pre-commit hook** — `npx cto-ai-cli --audit --init-hook` installs a git hook that blocks commits with secrets
+
+ ---
+
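Shannon entropy analysis, mentioned in the first bullet above, flags strings whose characters are unusually evenly distributed, as random keys are. A minimal sketch of the measure itself (the threshold and how CTO applies it are not specified here):

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2(p)) over the
// frequency p of each distinct character. Repetitive text scores low;
// random-looking tokens (API keys, hashes) score high.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of s) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}
```

A scanner would then flag long tokens whose entropy exceeds some cutoff (commonly in the 3.5 to 4.5 bits/char range for hex or base64 material), even when no known key pattern matches.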
  ## Honest Limitations

  - **TypeScript/JavaScript gets the deepest analysis.** Other languages (Python, Go, Rust, Java) get basic file + import analysis.
- - **Benchmarks use simple baselines** (alphabetical, random). We haven't compared against Cursor's or Copilot's internal context selection.
+ - **Benchmarks use simple baselines** (alphabetical, random). Run `npx tsx scripts/benchmark.ts` on your own repo to see real numbers. We haven't compared against Cursor's or Copilot's internal context selection.
  - **Savings are estimates** based on average API pricing. Actual savings depend on your model and usage.
  - **Risk scoring uses a complexity proxy** instead of real git churn data (planned improvement).

@@ -322,10 +417,17 @@ git clone https://github.com/cto-ai/cto-ai-cli.git
  cd cto-ai-cli
  npm install
  npm run build
- npm test          # 376 tests
+ npm test          # 550 tests, 91% coverage
  npm run typecheck # strict TypeScript, zero errors
  ```

+ Run the automated benchmark to see CTO vs naive on this repo:
+
+ ```bash
+ npx tsx scripts/benchmark.ts        # Human-readable report
+ npx tsx scripts/benchmark.ts --json # Machine-readable JSON
+ ```
+
  Full API docs, MCP server reference, and architecture are in [DOCS.md](DOCS.md).

  ## License
@@ -24179,8 +24179,8 @@ async function analyzeProject(projectPath, config) {
  maxDepth: mergedConfig.analysis.maxDepth
  });
  const tokenMethod = mergedConfig.tokens.method;
- const files = [];
- for (const entry of walkEntries) {
+ const BATCH_SIZE = 50;
+ async function estimateFileTokens(entry) {
  let tokens;
  if (tokenMethod === "tiktoken") {
  try {
@@ -24192,7 +24192,7 @@ async function analyzeProject(projectPath, config) {
  } else {
  tokens = countTokensChars4(entry.size);
  }
- files.push({
+ return {
  path: entry.path,
  relativePath: entry.relativePath,
  extension: entry.extension,
@@ -24201,16 +24201,20 @@ async function analyzeProject(projectPath, config) {
  lines: entry.lines,
  lastModified: entry.lastModified,
  kind: classifyFileKind(entry.relativePath),
- // Graph data — populated by graph analysis
  imports: [],
  importedBy: [],
  isHub: false,
  complexity: 0,
- // Risk data — populated by risk analysis
  riskScore: 0,
  riskFactors: [],
  exclusionImpact: "none"
- });
+ };
+ }
+ const files = [];
+ for (let i = 0; i < walkEntries.length; i += BATCH_SIZE) {
+ const batch = walkEntries.slice(i, i + BATCH_SIZE);
+ const results = await Promise.all(batch.map(estimateFileTokens));
+ files.push(...results);
  }
  const graph = buildProjectGraph(absPath, files);
  for (const file of files) {
@@ -592,8 +592,8 @@ async function analyzeProject(projectPath, config) {
  maxDepth: mergedConfig.analysis.maxDepth
  });
  const tokenMethod = mergedConfig.tokens.method;
- const files = [];
- for (const entry of walkEntries) {
+ const BATCH_SIZE = 50;
+ async function estimateFileTokens(entry) {
  let tokens;
  if (tokenMethod === "tiktoken") {
  try {
@@ -605,7 +605,7 @@ async function analyzeProject(projectPath, config) {
  } else {
  tokens = countTokensChars4(entry.size);
  }
- files.push({
+ return {
  path: entry.path,
  relativePath: entry.relativePath,
  extension: entry.extension,
@@ -614,16 +614,20 @@ async function analyzeProject(projectPath, config) {
  lines: entry.lines,
  lastModified: entry.lastModified,
  kind: classifyFileKind(entry.relativePath),
- // Graph data — populated by graph analysis
  imports: [],
  importedBy: [],
  isHub: false,
  complexity: 0,
- // Risk data — populated by risk analysis
  riskScore: 0,
  riskFactors: [],
  exclusionImpact: "none"
- });
+ };
+ }
+ const files = [];
+ for (let i = 0; i < walkEntries.length; i += BATCH_SIZE) {
+ const batch = walkEntries.slice(i, i + BATCH_SIZE);
+ const results = await Promise.all(batch.map(estimateFileTokens));
+ files.push(...results);
  }
  const graph = buildProjectGraph(absPath, files);
  for (const file of files) {
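The code hunks above (the same change appears in two copies of the bundled output) replace a sequential loop with fixed-size batches of concurrent token estimates. The general pattern can be sketched standalone; the name `mapInBatches` is mine, not an export of the package:

```typescript
// Process items in fixed-size slices: at most batchSize promises are in
// flight at once, and results come back in input order, matching the
// BATCH_SIZE loop in the diff above.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    out.push(...(await Promise.all(batch.map(fn))));
  }
  return out;
}
```

Compared with one `Promise.all` over every file, batching caps the number of concurrently open file handles, which matters when a repo has thousands of files.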