cto-ai-cli 5.1.0 → 5.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +104 -3
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -263,7 +263,7 @@ CTO works as an [MCP server](https://modelcontextprotocol.io/) — plug it into
  }
  ```
 
- Tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, and more.
+ 19 tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, `cto_quality_benchmark`, `cto_compilability`, `cto_audit`, `cto_review`, `cto_monorepo`, and more.
 
  ---
 
@@ -307,10 +307,104 @@ No AI is used for selection. Same input always produces the same output. Fully r
 
  ---
 
+ ## Benchmark Proof
+
+ CTO includes an automated benchmark that runs **real context selection** on this repository (or any repo) and compares CTO vs naive (alphabetical) vs random strategies.
+
+ ```bash
+ $ npx tsx scripts/benchmark.ts --json
+ ```
+
+ **Results on this repo (124 files, 346K tokens):**
+
+ | Metric | Result |
+ |--------|--------|
+ | **CTO win rate** | 100% (20/20 runs across 5 tasks × 4 budgets) |
+ | **Coverage gain vs random** | +81% average |
+ | **Tokens saved vs naive** | 10% average |
+ | **Compilability: CTO** | 92/100 |
+ | **Compilability: Naive** | 40/100 |
+ | **CTO fewer predicted errors** | 2 fewer type/import errors per task |
+ | **Avg selection time** | 16ms |
+
+ The benchmark uses the same scoring engine as the CLI. No hardcoded numbers — run it yourself on any project.
+
+ ---
+
+ ## Gateway — AI Proxy with Security
+
+ ```bash
+ npx cto-gateway # Start proxy server
+ npx cto-gateway --port 9000 # Custom port
+ npx cto-gateway --budget-daily 10 # $10/day budget enforcement
+ npx cto-gateway --block-secrets # Strip secrets from prompts
+ npx cto-gateway --api-key <key> # Require authentication
+ ```
+
+ The gateway sits between your AI editor and the model API, adding:
+
+ - **Context optimization** — injects relevant file contents into prompts automatically
+ - **Secret scanning** — strips API keys and PII from outbound messages
+ - **Budget enforcement** — daily/weekly spend limits with alerts
+ - **Usage tracking** — JSONL logs of all requests with token counts and costs
+ - **SSRF protection** — domain allowlist, private IP blocking, HTTPS-only
+ - **Body size limits** — 10MB default, prevents abuse
+ - **Upstream timeouts** — 120s default with socket cleanup
+ - **Connection pooling** — keep-alive agents with 50 max sockets
+
+ Supports **OpenAI, Anthropic, Google, and Azure** providers with SSE streaming.
+
+ ---
+
+ ## Multi-Model Optimization
+
+ ```bash
+ npx cto-ai-cli --benchmark
+ ```
+
+ CTO knows 8 model profiles and recommends the best model for your task:
+
+ | Model | Context Window | Strengths |
+ |-------|---------------|----------|
+ | GPT-4o | 128K | General coding, debugging |
+ | GPT-4o Mini | 128K | Fast, cheap, simple tasks |
+ | Claude Sonnet 4 | 200K | Complex refactoring, architecture |
+ | Claude 3.5 Haiku | 200K | Fast analysis, code review |
+ | Gemini 2.0 Flash | 1M | Huge codebases, exploration |
+ | Gemini 2.5 Pro | 1M | Deep reasoning, long context |
+ | DeepSeek V3 | 128K | Cost-effective coding |
+ | Codestral | 256K | Code completion, generation |
+
+ For each model, CTO computes: **budget** (based on context window), **quality score** (strength match + coverage), **estimated cost**, and a **recommendation** (best quality, best value, cheapest).
+
+ ---
+
+ ## Security
+
+ ### API Server Path Traversal Protection
+
+ The API server (`cto-api`) validates all project paths:
+
+ - **Forbidden system paths** — blocks `/etc`, `/usr`, `/var`, `/sys`, `/proc`, `/dev`, `/boot`, `/tmp`, `/root`
+ - **Forbidden patterns** — blocks paths containing `.ssh`, `.gnupg`, `.aws`, `.env`, `passwd`, `shadow`
+ - **Allowlist** — set `CTO_ALLOWED_ROOTS=/home/deploy,/opt/projects` to restrict access to specific directories
+
+ ### Secret Detection
+
+ 45+ patterns including AWS, Stripe, GitHub, OpenAI, Datadog, Sentry, Firebase, Supabase, and more. Plus:
+
+ - **Shannon entropy analysis** for unknown secret formats
+ - **PII detection** (emails, SSNs, phone numbers) with safe domain filtering
+ - **Allowlist system** — SHA-256 fingerprinted exceptions in `.cto/audit/allowlist.json`
+ - **Incremental scanning** — file hash cache, only re-scans changed files
+ - **Pre-commit hook** — `npx cto-ai-cli --audit --init-hook` installs a git hook that blocks commits with secrets
+
+ ---
+
  ## Honest Limitations
 
  - **TypeScript/JavaScript gets the deepest analysis.** Other languages (Python, Go, Rust, Java) get basic file + import analysis.
- - **Benchmarks use simple baselines** (alphabetical, random). We haven't compared against Cursor's or Copilot's internal context selection.
+ - **Benchmarks use simple baselines** (alphabetical, random). Run `npx tsx scripts/benchmark.ts` on your own repo to see real numbers. We haven't compared against Cursor's or Copilot's internal context selection.
  - **Savings are estimates** based on average API pricing. Actual savings depend on your model and usage.
  - **Risk scoring uses a complexity proxy** instead of real git churn data (planned improvement).
 
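The new Secret Detection section lists "Shannon entropy analysis for unknown secret formats" without showing how such a check works. The TypeScript sketch below illustrates the general technique only; it is not taken from cto-ai-cli's source. The function names `shannonEntropy` and `findHighEntropyTokens`, the token-splitting regex, the 20-character minimum length, and the 4.0 bits-per-character threshold are all illustrative assumptions.

```typescript
// Sketch of entropy-based secret detection (illustrative, not cto-ai-cli's code).
// The minimum length (20) and threshold (4.0 bits/char) are assumed values.

/** Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)). */
function shannonEntropy(s: string): number {
  const chars = Array.from(s); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

/** Hypothetical helper: return tokens long and random-looking enough to flag. */
function findHighEntropyTokens(
  text: string,
  minLength = 20,
  threshold = 4.0,
): string[] {
  // Split on whitespace, quotes, and punctuation that rarely appear inside keys.
  const tokens = text.split(/[\s"'`=:,;()[\]{}<>]+/);
  return tokens.filter(
    (t) => t.length >= minLength && shannonEntropy(t) >= threshold,
  );
}

// A repeated-character value is ignored (prints []); the varied 32-char value is flagged.
console.log(findHighEntropyTokens("const a = 'aaaaaaaaaaaaaaaaaaaaaaaa'"));
console.log(findHighEntropyTokens("token=0123456789abcdefghijklmnopqrstuv"));
```

A string's measured entropy can never exceed log2 of its alphabet size, so hex-encoded keys top out at 4 bits per character and a single 4.0 cutoff would tend to miss them. Detectors of this kind therefore often use separate thresholds per alphabet and pair entropy with pattern rules to keep false positives down.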
@@ -323,10 +417,17 @@ git clone https://github.com/cto-ai/cto-ai-cli.git
  cd cto-ai-cli
  npm install
  npm run build
- npm test # 376 tests
+ npm test # 550 tests, 91% coverage
  npm run typecheck # strict TypeScript, zero errors
  ```
 
+ Run the automated benchmark to see CTO vs naive on this repo:
+
+ ```bash
+ npx tsx scripts/benchmark.ts # Human-readable report
+ npx tsx scripts/benchmark.ts --json # Machine-readable JSON
+ ```
+
  Full API docs, MCP server reference, and architecture are in [DOCS.md](DOCS.md).
 
  ## License
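The path validation that the README's new Security section attributes to `cto-api` (forbidden system roots, forbidden path patterns, and the `CTO_ALLOWED_ROOTS` allowlist) can be sketched as follows. This is not `cto-api`'s actual code: the helper names `isWithin` and `validateProjectPath`, the substring matching of forbidden patterns, treating an empty allowlist as "no restriction", and resolving paths before checking them are all assumptions. POSIX paths are assumed.

```typescript
import * as path from "node:path";

// Lists copied from the README; how they are matched below is an assumption.
const FORBIDDEN_ROOTS = ["/etc", "/usr", "/var", "/sys", "/proc", "/dev", "/boot", "/tmp", "/root"];
const FORBIDDEN_PATTERNS = [".ssh", ".gnupg", ".aws", ".env", "passwd", "shadow"];

/** True if `child` is `root` itself or lies underneath it. */
function isWithin(child: string, root: string): boolean {
  const rel = path.relative(root, child);
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

/**
 * Hypothetical validator: returns the resolved path or throws.
 * `allowedRoots` mirrors the comma-separated CTO_ALLOWED_ROOTS variable.
 */
function validateProjectPath(
  requested: string,
  allowedRoots: string = process.env.CTO_ALLOWED_ROOTS ?? "",
): string {
  // Resolve first so "../" segments cannot bypass the checks below.
  const resolved = path.resolve(requested);

  for (const root of FORBIDDEN_ROOTS) {
    if (isWithin(resolved, root)) throw new Error(`forbidden system path: ${resolved}`);
  }
  // Substring match, following the README's "paths containing" wording.
  for (const pattern of FORBIDDEN_PATTERNS) {
    if (resolved.includes(pattern)) throw new Error(`forbidden pattern "${pattern}": ${resolved}`);
  }

  const roots = allowedRoots
    .split(",")
    .map((r) => r.trim())
    .filter((r) => r.length > 0)
    .map((r) => path.resolve(r));
  // Assumed: an empty allowlist means no restriction beyond the checks above.
  if (roots.length > 0 && !roots.some((r) => isWithin(resolved, r))) {
    throw new Error(`outside CTO_ALLOWED_ROOTS: ${resolved}`);
  }
  return resolved;
}
```

Resolving with `path.resolve` before comparing is what stops an input such as `/home/deploy/../../etc` from slipping past a prefix check, and comparing with `path.relative` avoids treating `/etcetera` as if it were inside `/etc`. The sketch does not follow symlinks; a stricter check would also run `fs.realpathSync` on the resolved path.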
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "cto-ai-cli",
- "version": "5.1.0",
+ "version": "5.2.0",
  "description": "Your AI is reading too much code. CTO analyzes your project and selects the optimal files for AI context — saving tokens, improving output quality, and ensuring type definitions are included.",
  "type": "module",
  "bin": {