cto-ai-cli 5.0.0 → 5.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,7 +1,8 @@
  # CTO — Stop sending your entire codebase to AI

  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
- [![Tests](https://img.shields.io/badge/tests-376_passing-brightgreen.svg)](#)
+ [![Tests](https://img.shields.io/badge/tests-550_passing-brightgreen.svg)](#)
+ [![Coverage](https://img.shields.io/badge/coverage-91%25-brightgreen.svg)](#)
  [![npm](https://img.shields.io/npm/v/cto-ai-cli.svg)](https://www.npmjs.com/package/cto-ai-cli)

  CTO analyzes your project and selects the **minimum set of files** your AI needs — saving tokens, reducing cost, and producing code that actually compiles.
@@ -262,7 +263,7 @@ CTO works as an [MCP server](https://modelcontextprotocol.io/) — plug it into
  }
  ```

- Tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, and more.
+ 19 tools available: `cto_analyze`, `cto_select_context`, `cto_score`, `cto_benchmark`, `cto_risk`, `cto_quality_benchmark`, `cto_compilability`, `cto_audit`, `cto_review`, `cto_monorepo`, and more.

  ---

@@ -306,10 +307,104 @@ No AI is used for selection. Same input always produces the same output. Fully r

  ---

+ ## Benchmark Proof
+
+ CTO includes an automated benchmark that runs **real context selection** on this repository (or any repo) and compares CTO vs naive (alphabetical) vs random strategies.
+
+ ```bash
+ $ npx tsx scripts/benchmark.ts --json
+ ```
+
+ **Results on this repo (124 files, 346K tokens):**
+
+ | Metric | Result |
+ |--------|--------|
+ | **CTO win rate** | 100% (20/20 runs across 5 tasks × 4 budgets) |
+ | **Coverage gain vs random** | +81% average |
+ | **Tokens saved vs naive** | 10% average |
+ | **Compilability: CTO** | 92/100 |
+ | **Compilability: Naive** | 40/100 |
+ | **CTO fewer predicted errors** | 2 fewer type/import errors per task |
+ | **Avg selection time** | 16ms |
+
+ The benchmark uses the same scoring engine as the CLI. No hardcoded numbers — run it yourself on any project.
+
+ ---
+
+ ## Gateway — AI Proxy with Security
+
+ ```bash
+ npx cto-gateway                    # Start proxy server
+ npx cto-gateway --port 9000        # Custom port
+ npx cto-gateway --budget-daily 10  # $10/day budget enforcement
+ npx cto-gateway --block-secrets    # Strip secrets from prompts
+ npx cto-gateway --api-key <key>    # Require authentication
+ ```
+
+ The gateway sits between your AI editor and the model API, adding:
+
+ - **Context optimization** — injects relevant file contents into prompts automatically
+ - **Secret scanning** — strips API keys and PII from outbound messages
+ - **Budget enforcement** — daily/weekly spend limits with alerts
+ - **Usage tracking** — JSONL logs of all requests with token counts and costs
+ - **SSRF protection** — domain allowlist, private IP blocking, HTTPS-only
+ - **Body size limits** — 10MB default, prevents abuse
+ - **Upstream timeouts** — 120s default with socket cleanup
+ - **Connection pooling** — keep-alive agents with 50 max sockets
+
+ Supports **OpenAI, Anthropic, Google, and Azure** providers with SSE streaming.
+
+ ---
+
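The SSRF-protection bullet in the added README text above mentions private IP blocking. As an illustrative sketch only (this is not the package's shipped code; the function name and range list are assumptions), blocking requests whose upstream host resolves to a private or link-local IPv4 address might look like:

```typescript
// Hypothetical sketch of a private-IPv4 check for SSRF protection.
// Returns true for addresses a proxy should refuse to contact.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return false; // not a well-formed IPv4 address
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // link-local, incl. cloud metadata endpoints
  );
}
```

A real gateway would also need to resolve hostnames before checking, so that a DNS name pointing at 169.254.169.254 cannot bypass the filter.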
+ ## Multi-Model Optimization
+
+ ```bash
+ npx cto-ai-cli --benchmark
+ ```
+
+ CTO knows 8 model profiles and recommends the best model for your task:
+
+ | Model | Context Window | Strengths |
+ |-------|----------------|-----------|
+ | GPT-4o | 128K | General coding, debugging |
+ | GPT-4o Mini | 128K | Fast, cheap, simple tasks |
+ | Claude Sonnet 4 | 200K | Complex refactoring, architecture |
+ | Claude 3.5 Haiku | 200K | Fast analysis, code review |
+ | Gemini 2.0 Flash | 1M | Huge codebases, exploration |
+ | Gemini 2.5 Pro | 1M | Deep reasoning, long context |
+ | DeepSeek V3 | 128K | Cost-effective coding |
+ | Codestral | 256K | Code completion, generation |
+
+ For each model, CTO computes: **budget** (based on context window), **quality score** (strength match + coverage), **estimated cost**, and a **recommendation** (best quality, best value, cheapest).
+
+ ---
+
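The per-model computation described above (budget from context window, estimated cost from token counts) can be sketched in a few lines. This is illustrative only: the interface shape, the 20% reply reserve, and the pricing fields are hypothetical placeholders, not CTO's real model table or formulas.

```typescript
// Hypothetical model profile; prices are placeholders, not real rates.
interface ModelProfile {
  contextWindow: number;      // total tokens the model accepts
  inputPricePerMTok: number;  // USD per 1M input tokens
  outputPricePerMTok: number; // USD per 1M output tokens
}

// Reserve a fraction of the window for the model's reply; the rest is
// the token budget available for selected context.
function contextBudget(m: ModelProfile, replyReserve = 0.2): number {
  return Math.floor(m.contextWindow * (1 - replyReserve));
}

// Straightforward linear cost estimate from token counts.
function estimatedCost(m: ModelProfile, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * m.inputPricePerMTok + (outputTokens / 1e6) * m.outputPricePerMTok;
}
```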
+ ## Security
+
+ ### API Server Path Traversal Protection
+
+ The API server (`cto-api`) validates all project paths:
+
+ - **Forbidden system paths** — blocks `/etc`, `/usr`, `/var`, `/sys`, `/proc`, `/dev`, `/boot`, `/tmp`, `/root`
+ - **Forbidden patterns** — blocks paths containing `.ssh`, `.gnupg`, `.aws`, `.env`, `passwd`, `shadow`
+ - **Allowlist** — set `CTO_ALLOWED_ROOTS=/home/deploy,/opt/projects` to restrict access to specific directories
+
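A check combining the three rules above (forbidden roots, forbidden substrings, optional allowlist) could be sketched as follows. This is a hypothetical re-implementation for illustration, using the lists quoted in the README; the actual `cto-api` logic may differ.

```typescript
import path from "node:path";

// Lists taken from the README bullets above; the check itself is a sketch.
const FORBIDDEN_ROOTS = ["/etc", "/usr", "/var", "/sys", "/proc", "/dev", "/boot", "/tmp", "/root"];
const FORBIDDEN_PATTERNS = [".ssh", ".gnupg", ".aws", ".env", "passwd", "shadow"];

function isAllowedProjectPath(p: string, allowedRoots: string[] = []): boolean {
  // Resolve first so "../" tricks cannot escape the checks below.
  const normalized = path.resolve(p);
  if (FORBIDDEN_ROOTS.some((r) => normalized === r || normalized.startsWith(r + "/"))) {
    return false;
  }
  if (FORBIDDEN_PATTERNS.some((pat) => normalized.includes(pat))) {
    return false;
  }
  // If an allowlist is configured, the path must fall under one of its roots.
  if (allowedRoots.length > 0 &&
      !allowedRoots.some((r) => normalized.startsWith(path.resolve(r) + "/"))) {
    return false;
  }
  return true;
}
```

Resolving before matching is the important detail: pattern checks on the raw string alone can be bypassed with `..` segments.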
+ ### Secret Detection
+
+ 45+ patterns including AWS, Stripe, GitHub, OpenAI, Datadog, Sentry, Firebase, Supabase, and more. Plus:
+
+ - **Shannon entropy analysis** for unknown secret formats
+ - **PII detection** (emails, SSNs, phone numbers) with safe domain filtering
+ - **Allowlist system** — SHA-256 fingerprinted exceptions in `.cto/audit/allowlist.json`
+ - **Incremental scanning** — file hash cache, only re-scans changed files
+ - **Pre-commit hook** — `npx cto-ai-cli --audit --init-hook` installs a git hook that blocks commits with secrets
+
+ ---
+
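Shannon entropy analysis, mentioned in the first bullet above, flags strings whose characters are unusually evenly distributed, as random keys are. A minimal sketch of the measure itself (the threshold and how CTO applies it are not specified here):

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2(p)) over the
// frequency p of each distinct character. Repetitive text scores low;
// random-looking tokens (API keys, hashes) score high.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of s) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}
```

A scanner would then flag long tokens whose entropy exceeds some cutoff (commonly in the 3.5 to 4.5 bits/char range for hex or base64 material), even when no known key pattern matches.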
  ## Honest Limitations

  - **TypeScript/JavaScript gets the deepest analysis.** Other languages (Python, Go, Rust, Java) get basic file + import analysis.
- - **Benchmarks use simple baselines** (alphabetical, random). We haven't compared against Cursor's or Copilot's internal context selection.
+ - **Benchmarks use simple baselines** (alphabetical, random). Run `npx tsx scripts/benchmark.ts` on your own repo to see real numbers. We haven't compared against Cursor's or Copilot's internal context selection.
  - **Savings are estimates** based on average API pricing. Actual savings depend on your model and usage.
  - **Risk scoring uses a complexity proxy** instead of real git churn data (planned improvement).

@@ -322,10 +417,17 @@ git clone https://github.com/cto-ai/cto-ai-cli.git
  cd cto-ai-cli
  npm install
  npm run build
- npm test          # 376 tests
+ npm test          # 550 tests, 91% coverage
  npm run typecheck # strict TypeScript, zero errors
  ```

+ Run the automated benchmark to see CTO vs naive on this repo:
+
+ ```bash
+ npx tsx scripts/benchmark.ts        # Human-readable report
+ npx tsx scripts/benchmark.ts --json # Machine-readable JSON
+ ```
+
  Full API docs, MCP server reference, and architecture are in [DOCS.md](DOCS.md).

  ## License
@@ -24179,8 +24179,8 @@ async function analyzeProject(projectPath, config) {
  maxDepth: mergedConfig.analysis.maxDepth
  });
  const tokenMethod = mergedConfig.tokens.method;
- const files = [];
- for (const entry of walkEntries) {
+ const BATCH_SIZE = 50;
+ async function estimateFileTokens(entry) {
  let tokens;
  if (tokenMethod === "tiktoken") {
  try {
@@ -24192,7 +24192,7 @@ async function analyzeProject(projectPath, config) {
  } else {
  tokens = countTokensChars4(entry.size);
  }
- files.push({
+ return {
  path: entry.path,
  relativePath: entry.relativePath,
  extension: entry.extension,
@@ -24201,16 +24201,20 @@ async function analyzeProject(projectPath, config) {
  lines: entry.lines,
  lastModified: entry.lastModified,
  kind: classifyFileKind(entry.relativePath),
- // Graph data — populated by graph analysis
  imports: [],
  importedBy: [],
  isHub: false,
  complexity: 0,
- // Risk data — populated by risk analysis
  riskScore: 0,
  riskFactors: [],
  exclusionImpact: "none"
- });
+ };
+ }
+ const files = [];
+ for (let i = 0; i < walkEntries.length; i += BATCH_SIZE) {
+ const batch = walkEntries.slice(i, i + BATCH_SIZE);
+ const results = await Promise.all(batch.map(estimateFileTokens));
+ files.push(...results);
  }
  const graph = buildProjectGraph(absPath, files);
  for (const file of files) {
@@ -592,8 +592,8 @@ async function analyzeProject(projectPath, config) {
  maxDepth: mergedConfig.analysis.maxDepth
  });
  const tokenMethod = mergedConfig.tokens.method;
- const files = [];
- for (const entry of walkEntries) {
+ const BATCH_SIZE = 50;
+ async function estimateFileTokens(entry) {
  let tokens;
  if (tokenMethod === "tiktoken") {
  try {
@@ -605,7 +605,7 @@ async function analyzeProject(projectPath, config) {
  } else {
  tokens = countTokensChars4(entry.size);
  }
- files.push({
+ return {
  path: entry.path,
  relativePath: entry.relativePath,
  extension: entry.extension,
@@ -614,16 +614,20 @@ async function analyzeProject(projectPath, config) {
  lines: entry.lines,
  lastModified: entry.lastModified,
  kind: classifyFileKind(entry.relativePath),
- // Graph data — populated by graph analysis
  imports: [],
  importedBy: [],
  isHub: false,
  complexity: 0,
- // Risk data — populated by risk analysis
  riskScore: 0,
  riskFactors: [],
  exclusionImpact: "none"
- });
+ };
+ }
+ const files = [];
+ for (let i = 0; i < walkEntries.length; i += BATCH_SIZE) {
+ const batch = walkEntries.slice(i, i + BATCH_SIZE);
+ const results = await Promise.all(batch.map(estimateFileTokens));
+ files.push(...results);
  }
  const graph = buildProjectGraph(absPath, files);
  for (const file of files) {
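The code hunks above (the same change appears in two copies of the bundled output) replace a sequential loop with fixed-size batches of concurrent token estimates. The general pattern can be sketched standalone; the name `mapInBatches` is mine, not an export of the package:

```typescript
// Process items in fixed-size slices: at most batchSize promises are in
// flight at once, and results come back in input order, matching the
// BATCH_SIZE loop in the diff above.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    out.push(...(await Promise.all(batch.map(fn))));
  }
  return out;
}
```

Compared with one `Promise.all` over every file, batching caps the number of concurrently open file handles, which matters when a repo has thousands of files.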