cto-ai-cli 5.1.0 → 5.2.0
- package/README.md +104 -3
- package/package.json +1 -1
package/README.md
CHANGED
@@ -263,7 +263,7 @@ CTO works as an [MCP server](https://modelcontextprotocol.io/) — plug it into
 }
 ```
 
-
+19 tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, `cto_quality_benchmark`, `cto_compilability`, `cto_audit`, `cto_review`, `cto_monorepo`, and more.
 
 ---
 
@@ -307,10 +307,104 @@ No AI is used for selection. Same input always produces the same output. Fully r
 
 ---
 
+## Benchmark Proof
+
+CTO includes an automated benchmark that runs **real context selection** on this repository (or any repo) and compares CTO vs naive (alphabetical) vs random strategies.
+
+```bash
+$ npx tsx scripts/benchmark.ts --json
+```
+
+**Results on this repo (124 files, 346K tokens):**
+
+| Metric | Result |
+|--------|--------|
+| **CTO win rate** | 100% (20/20 runs across 5 tasks × 4 budgets) |
+| **Coverage gain vs random** | +81% average |
+| **Tokens saved vs naive** | 10% average |
+| **Compilability: CTO** | 92/100 |
+| **Compilability: Naive** | 40/100 |
+| **CTO fewer predicted errors** | 2 fewer type/import errors per task |
+| **Avg selection time** | 16ms |
+
+The benchmark uses the same scoring engine as the CLI. No hardcoded numbers — run it yourself on any project.
+
+---
+
+## Gateway — AI Proxy with Security
+
+```bash
+npx cto-gateway # Start proxy server
+npx cto-gateway --port 9000 # Custom port
+npx cto-gateway --budget-daily 10 # $10/day budget enforcement
+npx cto-gateway --block-secrets # Strip secrets from prompts
+npx cto-gateway --api-key <key> # Require authentication
+```
+
+The gateway sits between your AI editor and the model API, adding:
+
+- **Context optimization** — injects relevant file contents into prompts automatically
+- **Secret scanning** — strips API keys and PII from outbound messages
+- **Budget enforcement** — daily/weekly spend limits with alerts
+- **Usage tracking** — JSONL logs of all requests with token counts and costs
+- **SSRF protection** — domain allowlist, private IP blocking, HTTPS-only
+- **Body size limits** — 10MB default, prevents abuse
+- **Upstream timeouts** — 120s default with socket cleanup
+- **Connection pooling** — keep-alive agents with 50 max sockets
+
+Supports **OpenAI, Anthropic, Google, and Azure** providers with SSE streaming.
+
+---
+
+## Multi-Model Optimization
+
+```bash
+npx cto-ai-cli --benchmark
+```
+
+CTO knows 8 model profiles and recommends the best model for your task:
+
+| Model | Context Window | Strengths |
+|-------|---------------|----------|
+| GPT-4o | 128K | General coding, debugging |
+| GPT-4o Mini | 128K | Fast, cheap, simple tasks |
+| Claude Sonnet 4 | 200K | Complex refactoring, architecture |
+| Claude 3.5 Haiku | 200K | Fast analysis, code review |
+| Gemini 2.0 Flash | 1M | Huge codebases, exploration |
+| Gemini 2.5 Pro | 1M | Deep reasoning, long context |
+| DeepSeek V3 | 128K | Cost-effective coding |
+| Codestral | 256K | Code completion, generation |
+
+For each model, CTO computes: **budget** (based on context window), **quality score** (strength match + coverage), **estimated cost**, and a **recommendation** (best quality, best value, cheapest).
+
+---
+
+## Security
+
+### API Server Path Traversal Protection
+
+The API server (`cto-api`) validates all project paths:
+
+- **Forbidden system paths** — blocks `/etc`, `/usr`, `/var`, `/sys`, `/proc`, `/dev`, `/boot`, `/tmp`, `/root`
+- **Forbidden patterns** — blocks paths containing `.ssh`, `.gnupg`, `.aws`, `.env`, `passwd`, `shadow`
+- **Allowlist** — set `CTO_ALLOWED_ROOTS=/home/deploy,/opt/projects` to restrict access to specific directories
+
+### Secret Detection
+
+45+ patterns including AWS, Stripe, GitHub, OpenAI, Datadog, Sentry, Firebase, Supabase, and more. Plus:
+
+- **Shannon entropy analysis** for unknown secret formats
+- **PII detection** (emails, SSNs, phone numbers) with safe domain filtering
+- **Allowlist system** — SHA-256 fingerprinted exceptions in `.cto/audit/allowlist.json`
+- **Incremental scanning** — file hash cache, only re-scans changed files
+- **Pre-commit hook** — `npx cto-ai-cli --audit --init-hook` installs a git hook that blocks commits with secrets
+
+---
+
 ## Honest Limitations
 
 - **TypeScript/JavaScript gets the deepest analysis.** Other languages (Python, Go, Rust, Java) get basic file + import analysis.
-- **Benchmarks use simple baselines** (alphabetical, random). We haven't compared against Cursor's or Copilot's internal context selection.
+- **Benchmarks use simple baselines** (alphabetical, random). Run `npx tsx scripts/benchmark.ts` on your own repo to see real numbers. We haven't compared against Cursor's or Copilot's internal context selection.
 - **Savings are estimates** based on average API pricing. Actual savings depend on your model and usage.
 - **Risk scoring uses a complexity proxy** instead of real git churn data (planned improvement).
 
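The Secret Detection section added in the hunk above mentions Shannon entropy analysis for unknown secret formats but does not show how it works. The following is a minimal sketch of that general technique, not the package's actual code: the function names `shannonEntropy` and `looksLikeSecret`, and the length and entropy thresholds, are assumptions chosen for illustration.

```typescript
/** Shannon entropy of a string in bits per character (counted over UTF-16 code units). */
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  // Count how often each character occurs.
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  // H = -sum(p * log2(p)) over the observed characters.
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

/**
 * Flags tokens that are both long and high-entropy.
 * Both thresholds are assumed values, not taken from cto-ai-cli.
 */
function looksLikeSecret(token: string, minLength = 20, minEntropy = 4.0): boolean {
  return token.length >= minLength && shannonEntropy(token) >= minEntropy;
}
```

A repeated character carries no information, so a long run of `a` scores 0 bits and is not flagged, while a long token drawn from many distinct characters scores high. An entropy check of this kind is usually paired with pattern matching, since it produces false positives on hashes and minified code.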
@@ -323,10 +417,17 @@ git clone https://github.com/cto-ai/cto-ai-cli.git
 cd cto-ai-cli
 npm install
 npm run build
-npm test #
+npm test # 550 tests, 91% coverage
 npm run typecheck # strict TypeScript, zero errors
 ```
 
+Run the automated benchmark to see CTO vs naive on this repo:
+
+```bash
+npx tsx scripts/benchmark.ts # Human-readable report
+npx tsx scripts/benchmark.ts --json # Machine-readable JSON
+```
+
 Full API docs, MCP server reference, and architecture are in [DOCS.md](DOCS.md).
 
 ## License
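The README changes above describe three path checks in the API server: forbidden system paths, forbidden patterns, and an optional `CTO_ALLOWED_ROOTS` allowlist. As a rough illustration of how such checks could compose, here is a sketch. The two lists are copied from the README text; everything else (the function names, the normalization rules, and the order of the checks) is hypothetical and may not match what `cto-api` actually does.

```typescript
// Lists copied from the README text in this diff. The functions below are a
// hypothetical sketch; the real cto-api implementation may differ.
const FORBIDDEN_ROOTS = ["/etc", "/usr", "/var", "/sys", "/proc", "/dev", "/boot", "/tmp", "/root"];
const FORBIDDEN_PATTERNS = [".ssh", ".gnupg", ".aws", ".env", "passwd", "shadow"];

/** Collapses ".", "..", and repeated slashes; relative input is treated as rooted at "/". */
function normalizePath(input: string): string {
  const parts: string[] = [];
  for (const segment of input.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      parts.pop(); // step up one level; a no-op at the root
    } else {
      parts.push(segment);
    }
  }
  return "/" + parts.join("/");
}

/** True when `candidate` equals `root` or lies beneath it (both already normalized). */
function isInside(root: string, candidate: string): boolean {
  if (root === "/") return true;
  return candidate === root || candidate.indexOf(root + "/") === 0;
}

/** Parses a comma-separated CTO_ALLOWED_ROOTS value into normalized roots. */
function parseAllowedRoots(value: string | undefined): string[] {
  return (value || "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => normalizePath(entry));
}

/** Applies the three checks in order; an empty allowlist means "no restriction". */
function isAllowedProjectPath(input: string, allowedRoots: string[] = []): boolean {
  const resolved = normalizePath(input);
  if (FORBIDDEN_ROOTS.some((root) => isInside(root, resolved))) return false;
  if (FORBIDDEN_PATTERNS.some((pattern) => resolved.indexOf(pattern) !== -1)) return false;
  if (allowedRoots.length === 0) return true;
  return allowedRoots.some((root) => isInside(root, resolved));
}
```

Normalizing before checking is what defeats traversal: `/home/deploy/../../etc` collapses to `/etc` and is rejected even though it begins with an allowed root. In a server, the allowlist would presumably be built once at startup, for example from `parseAllowedRoots(process.env.CTO_ALLOWED_ROOTS)`. A production check would also need to resolve symlinks, which this sketch does not attempt.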
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "cto-ai-cli",
-"version": "5.
+"version": "5.2.0",
 "description": "Your AI is reading too much code. CTO analyzes your project and selects the optimal files for AI context — saving tokens, improving output quality, and ensuring type definitions are included.",
 "type": "module",
 "bin": {