cto-ai-cli 3.2.0 → 4.0.0

package/DOCS.md CHANGED
@@ -6,6 +6,7 @@
 
  - [CLI Commands](#cli-commands)
  - [Security Audit](#security-audit---audit)
+ - [Context Gateway](#context-gateway)
  - [MCP Server](#mcp-server)
  - [API Server](#api-server)
  - [Programmatic API](#programmatic-api)
@@ -236,6 +237,206 @@ Based on findings, CTO generates actionable recommendations:
 
  ---
 
+ ## Context Gateway
+
+ A transparent HTTP proxy that sits between your application and any LLM provider. It intercepts every request, scans for secrets, optimizes context, tracks costs, and enforces budgets — all with zero external dependencies.
+
+ ### Quick start
+
+ ```bash
+ npx cto-gateway # Start on port 8787
+ npx cto-gateway --port 9000 # Custom port
+ npx cto-gateway --block-secrets # Hard block on critical secrets
+ npx cto-gateway --budget-daily 10 # $10/day limit
+ ```
+
+ Then point your app:
+ ```bash
+ export OPENAI_BASE_URL=http://localhost:8787
+ ```
+
+ Every request must include the target provider URL as a header:
+ ```
+ x-cto-target: https://api.openai.com/v1/chat/completions
+ ```
+
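A minimal sketch of what such a request could look like from TypeScript, assuming the Gateway accepts the POST at its base URL and reads the real destination from `x-cto-target`. The helper name `buildGatewayRequest`, the request path, and the pass-through of the provider credential are illustrative assumptions, not part of the package's documented API.

```typescript
// Hypothetical helper (not part of cto-ai-cli): assembles a request that is
// sent to the Gateway while naming the real provider endpoint in a header.
interface GatewayRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildGatewayRequest(
  gatewayUrl: string,
  targetUrl: string,
  apiKey: string,
  payload: unknown,
): GatewayRequest {
  return {
    url: gatewayUrl,
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      // Assumption: provider credentials are passed through unchanged.
      authorization: `Bearer ${apiKey}`,
      // Required by the Gateway: where the request should finally go.
      'x-cto-target': targetUrl,
    },
    body: JSON.stringify(payload),
  };
}

const gatewayRequest = buildGatewayRequest(
  'http://localhost:8787',
  'https://api.openai.com/v1/chat/completions',
  'sk-placeholder',
  { model: 'gpt-4o', messages: [{ role: 'user', content: 'Hello' }] },
);
```

The resulting object can be handed to any HTTP client, for example `fetch(gatewayRequest.url, { method: gatewayRequest.method, headers: gatewayRequest.headers, body: gatewayRequest.body })` on Node 18+.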
+ ### Architecture
+
+ ```
+ Your App → Gateway (localhost:8787) → Provider (api.openai.com)
+
+            ├── Secret Scanner    → redacts/blocks secrets in messages
+            ├── Context Optimizer → injects CTO-selected context
+            ├── Cost Tracker      → logs per-request cost to JSONL
+            ├── Budget Guard      → rejects requests over limit (429)
+            └── Dashboard         → live web UI at /__cto
+ ```
+
+ ### CLI flags
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `--port <n>` | `8787` | Port to listen on |
+ | `--host <addr>` | `127.0.0.1` | Host to bind to |
+ | `--project <path>` | `.` | Project to analyze for context optimization |
+ | `--block-secrets` | off | Hard block requests with critical secrets (403) |
+ | `--budget-daily <$>` | unlimited | Max cost per day — returns 429 when exceeded |
+ | `--budget-monthly <$>` | unlimited | Max cost per month |
+ | `--no-optimize` | optimization on | Disable CTO context injection |
+ | `--no-redact` | redaction on | Disable secret redaction |
+ | `--no-dashboard` | dashboard on | Disable web dashboard |
+
+ ### Provider detection
+
+ The Gateway auto-detects providers from the `x-cto-target` URL and request headers:
+
+ | Provider | Detection | Auth header |
+ |----------|-----------|-------------|
+ | OpenAI | `api.openai.com` or `/v1/chat/completions` | `Authorization: Bearer sk-...` |
+ | Anthropic | `api.anthropic.com` or `anthropic-version` header | `x-api-key` |
+ | Google AI | `generativelanguage.googleapis.com` | `x-goog-api-key` |
+ | Azure OpenAI | `*.openai.azure.com` or `api-key` header | `api-key` |
+ | Custom | Fallback — assumes OpenAI-compatible | `Authorization` |
+
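The detection rules in the table can be sketched as a small function. This is an illustration only: `detectProviderSketch` is a hypothetical name, the order in which hostname, header, and path rules are tried is an assumption, and the package's own `detectProvider` may behave differently.

```typescript
type ProviderName = 'openai' | 'anthropic' | 'google' | 'azure' | 'custom';

// Illustrative sketch only; the real detectProvider in cto-ai-cli may differ.
// Header names are expected in lowercase, as Node's http module delivers them.
function detectProviderSketch(
  targetUrl: string,
  headers: Record<string, string | undefined>,
): ProviderName {
  const { hostname, pathname } = new URL(targetUrl);

  // 1. Hostname matches are the most specific signal, so check them first.
  if (hostname === 'api.openai.com') return 'openai';
  if (hostname === 'api.anthropic.com') return 'anthropic';
  if (hostname === 'generativelanguage.googleapis.com') return 'google';
  if (hostname.endsWith('.openai.azure.com')) return 'azure';

  // 2. Fall back to provider-specific headers.
  if (headers['anthropic-version'] !== undefined) return 'anthropic';
  if (headers['api-key'] !== undefined) return 'azure';

  // 3. An OpenAI-style path on an unknown host.
  if (pathname.endsWith('/v1/chat/completions')) return 'openai';

  // 4. Anything else is treated as an OpenAI-compatible custom endpoint.
  return 'custom';
}
```

Checking hostnames before headers keeps a request to a known provider from being misclassified because of a stray header.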
+ ### Model pricing
+
+ Built-in pricing for accurate cost tracking:
+
+ | Model | Input ($/M tokens) | Output ($/M tokens) | Context window |
+ |-------|--------------------|--------------------|----------------|
+ | gpt-4o | $2.50 | $10.00 | 128K |
+ | gpt-4o-mini | $0.15 | $0.60 | 128K |
+ | o1 | $15.00 | $60.00 | 200K |
+ | o3-mini | $1.10 | $4.40 | 200K |
+ | claude-sonnet-4 | $3.00 | $15.00 | 200K |
+ | claude-3.5-haiku | $0.80 | $4.00 | 200K |
+ | gemini-2.5-pro | $1.25 | $10.00 | 1M |
+ | gemini-2.0-flash | $0.10 | $0.40 | 1M |
+
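To make the arithmetic behind these prices concrete, here is a small sketch that applies the per-million-token rates to a request. `estimateCostSketch` and the `PRICING` map are hypothetical names for illustration; the package's own `estimateCost` also takes a provider argument and may compute costs differently.

```typescript
// Prices in USD per million tokens, copied from the table above (subset).
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-sonnet-4': { input: 3, output: 15 },
  'gemini-2.0-flash': { input: 0.1, output: 0.4 },
};

// Hypothetical helper: returns the cost in USD, or null for an unknown model.
function estimateCostSketch(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number | null {
  const price = PRICING[model];
  if (price === undefined) return null; // no price available for this model
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// 1200 input tokens and 350 output tokens on gpt-4o:
// (1200 * 2.50 + 350 * 10.00) / 1,000,000 = 0.0065 USD
const exampleCost = estimateCostSketch('gpt-4o', 1200, 350);
```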
+ ### Request lifecycle
+
+ 1. **Receive** — Client sends POST request to Gateway
+ 2. **Budget check** — If daily/monthly budget exceeded → 429
+ 3. **Parse** — Detect provider, extract messages from provider-specific format
+ 4. **Scan secrets** — Run 30+ patterns against all message content
+ 5. **Redact or block** — Replace secrets with `***REDACTED***` or return 403
+ 6. **Optimize context** — If analysis ready, inject CTO-selected files into system prompt
+ 7. **Forward** — Proxy to provider (streaming SSE or buffered)
+ 8. **Track** — Log cost, tokens, savings, latency to JSONL
+ 9. **Respond** — Forward provider response to client (zero-copy for streams)
+
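Steps 4 and 5 of the lifecycle above could look roughly like the following. The two regular expressions are illustrative stand-ins (the Gateway's actual 30+ patterns are not listed in the docs), `scanAndRedact` is a hypothetical name, and treating every match as critical is a simplification.

```typescript
// Two illustrative patterns; the Gateway's real pattern set is not shown in the docs.
const SECRET_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: 'openai-style-key', regex: /sk-[A-Za-z0-9]{20,}/g },
  { name: 'aws-access-key-id', regex: /AKIA[0-9A-Z]{16}/g },
];

interface ScanResult {
  content: string;  // message text after redaction
  redacted: number; // number of matches replaced
  blocked: boolean; // true when the request should be rejected with 403
}

function scanAndRedact(content: string, blockSecrets: boolean): ScanResult {
  let redacted = 0;
  let output = content;
  for (const { regex } of SECRET_PATTERNS) {
    // The callback form of replace lets us count matches while substituting.
    output = output.replace(regex, () => {
      redacted += 1;
      return '***REDACTED***';
    });
  }
  // Simplification: every match counts as "critical" in this sketch.
  return { content: output, redacted, blocked: blockSecrets && redacted > 0 };
}
```

With `blockSecrets` set, a caller would return 403 instead of forwarding the redacted message.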
+ ### Streaming support
+
+ The Gateway fully supports Server-Sent Events (SSE) streaming:
+
+ - Detects streaming from `Content-Type: text/event-stream`
+ - Zero-copy passthrough: chunks are forwarded to client as they arrive
+ - Async token tracking: parses SSE events in background without blocking the stream
+ - Usage data extracted from final SSE chunk (when provider includes it)
+
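As a rough illustration of the last bullet, the sketch below pulls token usage out of buffered SSE text. It assumes OpenAI-style events, where each event is a `data:` line carrying JSON, the stream ends with `data: [DONE]`, and a late chunk may carry a `usage` object with `prompt_tokens` and `completion_tokens`. `extractUsageFromSse` is a hypothetical name, and other providers use different event shapes.

```typescript
interface StreamUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Returns the last usage object seen in the stream, or null if none was sent.
function extractUsageFromSse(sseText: string): StreamUsage | null {
  let usage: StreamUsage | null = null;
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blank lines and comments
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') continue;
    try {
      const event = JSON.parse(payload);
      if (event && event.usage) usage = event.usage as StreamUsage;
    } catch {
      // Ignore lines that are not complete JSON; tracking must never break the stream.
    }
  }
  return usage;
}

const sampleStream = [
  'data: {"choices":[{"delta":{"content":"Hi"}}]}',
  '',
  'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":3}}',
  '',
  'data: [DONE]',
  '',
].join('\n');

const streamUsage = extractUsageFromSse(sampleStream);
```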
+ ### Dashboard
+
+ Available at `http://localhost:8787/__cto` (configurable via `--dashboardPath`).
+
+ Shows:
+ - **Today**: requests, cost, tokens saved, secrets redacted
+ - **This month**: totals + budget progress
+ - **Feature status**: optimization, redaction, tracking, audit log
+ - **By model**: breakdown of requests, tokens, and cost per model
+ - **By provider**: requests and cost per provider
+ - Auto-refreshes every 30 seconds
+
+ ### Usage storage
+
+ | File | Format | Description |
+ |------|--------|-------------|
+ | `.cto/gateway/usage/YYYY-MM.jsonl` | JSON Lines | One line per request. Monthly files. |
+
+ Each line:
+ ```json
+ {
+   "id": "a1b2c3d4",
+   "timestamp": "2026-02-24T23:52:00.000Z",
+   "provider": "openai",
+   "model": "gpt-4o",
+   "inputTokens": 1200,
+   "outputTokens": 350,
+   "costUSD": 0.0065,
+   "originalTokens": 6200,
+   "optimizedTokens": 1200,
+   "savedTokens": 5000,
+   "savedUSD": 0.0130,
+   "secretsRedacted": 2,
+   "secretsBlocked": false,
+   "latencyMs": 152,
+   "stream": true
+ }
+ ```
+
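Because each record is one JSON object per line, the log is easy to aggregate outside the Gateway. The sketch below totals cost per model from JSONL text. `summarizeUsage` is a hypothetical helper, not the package's `UsageTracker`, and it reads only the `model` and `costUSD` fields shown in the record above.

```typescript
interface UsageLine {
  model: string;
  costUSD: number;
}

interface UsageTotals {
  requests: number;
  totalCostUSD: number;
  costByModel: Record<string, number>;
}

function summarizeUsage(jsonl: string): UsageTotals {
  const totals: UsageTotals = { requests: 0, totalCostUSD: 0, costByModel: {} };
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // tolerate a trailing newline
    const record = JSON.parse(line) as UsageLine;
    totals.requests += 1;
    totals.totalCostUSD += record.costUSD;
    totals.costByModel[record.model] =
      (totals.costByModel[record.model] ?? 0) + record.costUSD;
  }
  return totals;
}

const sampleLog = [
  '{"model":"gpt-4o","costUSD":0.0065}',
  '{"model":"gpt-4o-mini","costUSD":0.001}',
  '',
].join('\n');

const usageTotals = summarizeUsage(sampleLog);
```

In practice the input would come from a monthly file, for example via `fs.readFileSync` on a path matching `.cto/gateway/usage/YYYY-MM.jsonl`.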
+ ### Budget enforcement
+
+ When a budget is set, the Gateway checks cost totals before every request:
+
+ | Condition | HTTP response |
+ |-----------|---------------|
+ | Daily budget exceeded | `429 Too Many Requests` + `{ "error": "Daily budget exceeded", "budget": 10, "current": 10.42 }` |
+ | Monthly budget exceeded | `429 Too Many Requests` + `{ "error": "Monthly budget exceeded" }` |
+ | Critical secrets + `--block-secrets` | `403 Forbidden` + `{ "error": "Request blocked: secrets detected" }` |
+
+ Budget alerts are emitted at 80% of limit (configurable via `alertThreshold`).
+
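The daily check and the 80% alert can be sketched as a pure function. `checkDailyBudget` is a hypothetical name, and treating "exceeded" as `current >= budget` is an assumption; the Gateway may use a strict comparison or also account for the cost of the incoming request.

```typescript
interface BudgetDecision {
  status: 200 | 429;
  alert: boolean; // true once spending reaches the alert threshold
  body?: { error: string; budget: number; current: number };
}

function checkDailyBudget(
  current: number,
  budget: number,
  alertThreshold = 0.8,
): BudgetDecision {
  if (current >= budget) {
    // Mirrors the 429 response body shown in the table above.
    return {
      status: 429,
      alert: true,
      body: { error: 'Daily budget exceeded', budget, current },
    };
  }
  return { status: 200, alert: current >= budget * alertThreshold };
}

const overBudget = checkDailyBudget(10.42, 10); // rejected with 429
const nearBudget = checkDailyBudget(8.5, 10);   // allowed, alert raised
const underBudget = checkDailyBudget(1, 10);    // allowed, no alert
```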
+ ### Programmatic API
+
+ ```typescript
+ import { ContextGateway, UsageTracker } from 'cto-ai-cli/gateway';
+
+ const gateway = new ContextGateway({
+   port: 8787,
+   projectPath: '/path/to/project',
+   redactSecrets: true,
+   blockOnSecrets: false,
+   budgetDaily: 20,
+   budgetMonthly: 500,
+ });
+
+ // Listen to events
+ gateway.onEvent((event) => {
+   if (event.type === 'request') console.log(`${event.record.model}: $${event.record.costUSD}`);
+   if (event.type === 'budget-alert') console.log(`Budget warning: ${event.period}`);
+ });
+
+ await gateway.start();
+
+ // Get usage summary
+ const tracker = gateway.getTracker();
+ const summary = tracker.getSummary('month');
+ console.log(`This month: ${summary.totalRequests} requests, $${summary.totalCostUSD}`);
+
+ // Provider detection (standalone)
+ import { detectProvider, estimateCost } from 'cto-ai-cli/gateway';
+
+ const provider = detectProvider('https://api.openai.com/v1/chat/completions', {});
+ const cost = estimateCost(provider, 'gpt-4o', 5000, 1000);
+ ```
+
+ ### Interceptor (standalone)
+
+ ```typescript
+ import { interceptRequest } from 'cto-ai-cli/gateway';
+ import type { Message, GatewayConfig } from 'cto-ai-cli/gateway';
+
+ const messages: Message[] = [
+   { role: 'user', content: 'Deploy with key sk-live_abc123...' },
+ ];
+
+ const result = await interceptRequest(messages, config, analysis);
+ // result.secretsRedacted → 1
+ // result.messages[0].content → 'Deploy with key sk-l**********23...'
+ // result.decisions → ['Redacted 1 secret(s) in user message: api-key']
+ ```
+
+ ---
+
  ## MCP Server
 
  CTO exposes 19 tools via the Model Context Protocol.
package/README.md CHANGED
@@ -3,7 +3,7 @@
  > **Early access** — This is a test version. We'd love your feedback.
 
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
- [![Tests](https://img.shields.io/badge/tests-449_passing-brightgreen.svg)](#)
+ [![Tests](https://img.shields.io/badge/tests-573_passing-brightgreen.svg)](#)
 
  ## Try it now (zero install)
 
@@ -301,6 +301,72 @@ Every day, developers accidentally send secrets to AI tools:
 
  ---
 
+ ## 🌐 Context Gateway — AI proxy for your entire team
+
+ Every AI API call from your team passes through the Gateway. It sits between your app and any LLM provider, automatically optimizing context, redacting secrets, and tracking costs.
+
+ ```bash
+ npx cto-gateway
+ ```
+
+ ```
+ ⚡ CTO Context Gateway v4.0.0
+
+ 🌐 Proxy: http://127.0.0.1:8787
+ 📊 Dashboard: http://127.0.0.1:8787/__cto
+ 📁 Project: /your/project
+
+ ✅ Context optimization
+ ✅ Secret redaction
+ ✅ Cost tracking
+ ⬜ Daily budget (unlimited)
+
+ How to connect:
+ OPENAI_BASE_URL=http://127.0.0.1:8787
+ + set header: x-cto-target: https://api.openai.com/v1/chat/completions
+
+ Waiting for requests...
+
+ 18:52:34 openai/gpt-4o 1200 tokens $0.0075 (saved 5.2K tokens, $0.0130) [2 secrets redacted] 152ms
+ ```
+
+ ### What it does
+
+ | Feature | Description |
+ |---------|-------------|
+ | **Secret redaction** | Scans every message for API keys, tokens, passwords → auto-redacts before sending to the LLM |
+ | **Secret blocking** | Optional hard block — reject requests that contain critical secrets |
+ | **Context optimization** | Injects CTO-selected files, type definitions, and hub modules into system prompts |
+ | **Cost tracking** | Tracks per-request cost by model and provider. Persistent JSONL logs. |
+ | **Budget enforcement** | Set daily/monthly limits. Gateway returns 429 when exceeded. |
+ | **Live dashboard** | Dark-theme web UI at `/__cto` — today's stats, monthly breakdown, model costs |
+ | **SSE streaming** | Full passthrough of streaming responses with zero-copy. No added latency. |
+ | **Multi-provider** | OpenAI, Anthropic, Google AI, Azure OpenAI, and any OpenAI-compatible API |
+
+ ### Supported providers & models
+
+ | Provider | Models | Pricing tracked |
+ |----------|--------|----------------|
+ | **OpenAI** | GPT-4o, GPT-4o Mini, o1, o1-mini, o3-mini | ✅ |
+ | **Anthropic** | Claude Sonnet 4, Claude 3.5 Haiku, Claude 3 Opus | ✅ |
+ | **Google** | Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 1.5 Pro | ✅ |
+ | **Azure OpenAI** | Same as OpenAI (different hosting) | ✅ |
+ | **Custom** | Any OpenAI-compatible API (Ollama, LiteLLM, etc.) | Manual |
+
+ ### Configuration
+
+ ```bash
+ cto-gateway --port 9000 # Custom port
+ cto-gateway --block-secrets # Hard block on critical secrets
+ cto-gateway --budget-daily 10 # Max $10/day
+ cto-gateway --budget-monthly 200 # Max $200/month
+ cto-gateway --project ./my-app # Analyze a specific project
+ cto-gateway --no-optimize # Disable context injection
+ cto-gateway --no-redact # Disable secret redaction
+ ```
+
+ ---
+
  ## What you can do with CTO
 
  | Use case | How |
@@ -309,12 +375,14 @@ Every day, developers accidentally send secrets to AI tools:
  | **Auto-optimize context** | `npx cto-ai-cli --fix` → generates `.cto/context.md` to paste into AI |
  | **Task-specific context** | `npx cto-ai-cli --context "refactor auth"` → optimized for your task |
  | **Security audit** | `npx cto-ai-cli --audit` → detect secrets & PII before AI sees them |
+ | **AI proxy (Gateway)** | `npx cto-gateway` → proxy with secret redaction + cost tracking |
  | **Shareable report** | `npx cto-ai-cli --report` → markdown report + README badge |
  | **Compare vs open source** | `npx cto-ai-cli --compare` → your score vs Zod, Next.js, Express |
  | **Compare strategies** | `npx cto-ai-cli --benchmark` → CTO vs naive vs random |
  | **Get context for a task** | `cto2 interact "your task"` |
  | **Use in your AI editor** | Add MCP server (see setup above) |
  | **Block secrets in CI** | `CI=true npx cto-ai-cli --audit` |
+ | **Budget control** | `cto-gateway --budget-daily 10 --budget-monthly 200` |
  | **JSON output (scripting)** | `npx cto-ai-cli --json` |
 
  ---
@@ -350,7 +418,7 @@ git clone <repo-url>
  cd cto
  npm install
  npm run build
- npm test # 449 tests
+ npm test # 573 tests
  npm run typecheck # strict TypeScript
  ```