cost-katana 2.4.1 β†’ 2.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,18 +1,102 @@
1
- # Cost Katana πŸ₯·
1
+ # Cost Katana
2
+
3
+ [![npm](https://img.shields.io/npm/v/cost-katana.svg)](https://www.npmjs.com/package/cost-katana)
4
+ [![PyPI](https://img.shields.io/pypi/v/cost-katana.svg)](https://pypi.org/project/cost-katana/)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
6
+ [![Node.js](https://img.shields.io/badge/node-%3E%3D18-brightgreen)](https://nodejs.org/)
7
+ [![Python](https://img.shields.io/badge/python-%3E%3D3.8-blue)](https://pypi.org/project/cost-katana/)
2
8
 
3
9
  > **Cut your AI costs in half. Without cutting corners.**
4
10
 
5
11
 Cost Katana is a drop-in SDK that wraps your AI calls with automatic cost tracking, smart caching, and optimization—all in one line of code.
6
12
 
13
+ ## Table of contents
14
+
15
+ - [Cost Katana](#cost-katana)
16
+ - [Table of contents](#table-of-contents)
17
+ - [Installation](#installation)
18
+ - [Quick start](#quick-start)
19
+ - [Path A — Gateway (HTTP proxy)](#path-a--gateway-http-proxy)
20
+ - [Path B — `ai()` (simple API, cost on the response)](#path-b--ai-simple-api-cost-on-the-response)
21
+ - [Path C — Python](#path-c--python)
22
+ - [Which API should I use?](#which-api-should-i-use)
23
+ - [Configuration](#configuration)
24
+ - [Environment variables](#environment-variables)
25
+ - [Programmatic configuration](#programmatic-configuration)
26
+ - [Common request options (`ai()`)](#common-request-options-ai)
27
+ - [Core APIs](#core-apis)
28
+ - [`ai()`](#ai)
29
+ - [`chat()`](#chat)
30
+ - [`gateway()`](#gateway)
31
+ - [Provider-independent design](#provider-independent-design)
32
+ - [Type-safe model constants](#type-safe-model-constants)
33
+ - [Claude extended thinking (`ProviderRequest.thinking`)](#claude-extended-thinking-providerrequestthinking)
34
+ - [Cost optimization](#cost-optimization)
35
+ - [Cheatsheet](#cheatsheet)
36
+ - [Caching](#caching)
37
+ - [Cortex (optimization)](#cortex-optimization)
38
+ - [Compare models side by side](#compare-models-side-by-side)
39
+ - [Quick wins](#quick-wins)
40
+ - [Security and reliability](#security-and-reliability)
41
+ - [Firewall](#firewall)
42
+ - [Auto-failover](#auto-failover)
43
+ - [Usage tracking and analytics](#usage-tracking-and-analytics)
44
+ - [Dashboard attribution with `configure()` and `ai()`](#dashboard-attribution-with-configure-and-ai)
45
+ - [`AICostTracker` with defaults (advanced)](#aicosttracker-with-defaults-advanced)
46
+ - [Dedicated per-provider trackers](#dedicated-per-provider-trackers)
47
+ - [View analytics in the dashboard](#view-analytics-in-the-dashboard)
48
+ - [Manual usage tracking](#manual-usage-tracking)
49
+ - [Session replay and distributed tracing](#session-replay-and-distributed-tracing)
50
+ - [Framework integration](#framework-integration)
51
+ - [Next.js App Router](#nextjs-app-router)
52
+ - [Express.js](#expressjs)
53
+ - [Fastify](#fastify)
54
+ - [NestJS](#nestjs)
55
+ - [Error handling](#error-handling)
56
+ - [AI gateway (details)](#ai-gateway-details)
57
+ - [Experimentation (hosted API)](#experimentation-hosted-api)
58
+ - [Examples and documentation](#examples-and-documentation)
59
+ - [Migration guides](#migration-guides)
60
+ - [From OpenAI SDK](#from-openai-sdk)
61
+ - [From Anthropic SDK](#from-anthropic-sdk)
62
+ - [From LangChain](#from-langchain)
63
+ - [Contributing](#contributing)
64
+ - [Support](#support)
65
+ - [License](#license)
66
+
67
+ ---
68
+
69
+ ## Installation
70
+
71
+ **TypeScript / Node**
72
+
73
+ ```bash
74
+ npm install cost-katana
75
+ ```
76
+
77
+ **Python** — published on PyPI as [`cost-katana`](https://pypi.org/project/cost-katana/) (install name uses a hyphen; import uses an underscore).
78
+
79
+ ```bash
80
+ pip install cost-katana
81
+ ```
82
+
83
+ ```python
84
+ import cost_katana as ck # package import: cost_katana
85
+ ```
86
+
87
+ Requires **Node.js 18+** for the npm package and **Python 3.8+** for the PyPI package.
88
+
7
89
  ---
8
90
 
9
- ## Get started in 60 seconds
91
+ ## Quick start
10
92
 
11
93
  Set **`COST_KATANA_API_KEY`**. **`PROJECT_ID`** is optional (recommended for per-project analytics in the dashboard).
12
94
 
13
- ### Gateway first (drop-in proxy β€” like changing base URL + one header)
95
+ ### Path A β€” Gateway (HTTP proxy)
14
96
 
15
- **HTTP / cURL** β€” no SDK; send OpenAI-compatible JSON to the gateway:
97
+ Use this when you want a **drop-in proxy**: change base URL and send `Authorization: Bearer`, or use **`gateway()`** in TypeScript with no extra config (reads `COST_KATANA_API_KEY`, same behavior as `createGatewayClientFromEnv()`).
98
+
99
+ **cURL** (no SDK; OpenAI-compatible JSON):
16
100
 
17
101
  ```bash
18
102
  curl -s https://api.costkatana.com/api/gateway/v1/chat/completions \
@@ -21,26 +105,20 @@ curl -s https://api.costkatana.com/api/gateway/v1/chat/completions \
21
105
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
22
106
  ```
23
107
 
24
- See also [`examples/curl-http.sh`](./examples/curl-http.sh).
25
-
26
- **TypeScript β€” `gateway()`** β€” zero extra config; reads `COST_KATANA_API_KEY` (same behavior as `createGatewayClientFromEnv()`):
27
-
28
- ```bash
29
- npm install cost-katana
30
- ```
108
+ **TypeScript**
31
109
 
32
110
  ```typescript
33
- import { gateway } from 'cost-katana';
111
+ import { gateway, OPENAI } from 'cost-katana';
34
112
 
35
113
  const res = await gateway().openai({
36
- model: 'gpt-4o',
37
- messages: [{ role: 'user', content: 'Hello!' }],
114
+ model: OPENAI.GPT_4O,
115
+ messages: [{ role: 'user', content: 'Hello!' }]
38
116
  });
39
117
 
40
118
  console.log(res.data);
41
119
  ```
42
120
 
43
- ### `ai()` β€” simple typed API with cost on the response
121
+ ### Path B β€” `ai()` (simple API, cost on the response)
44
122
 
45
123
  ```typescript
46
124
  import { ai, OPENAI } from 'cost-katana';
@@ -50,11 +128,9 @@ const response = await ai(OPENAI.GPT_4O, 'Hello');
50
128
  console.log(response.text, response.cost);
51
129
  ```
52
130
 
53
- ### Python
131
+ ### Path C β€” Python
54
132
 
55
- ```bash
56
- pip install costkatana
57
- ```
133
+ Install [`cost-katana` from PyPI](https://pypi.org/project/cost-katana/), set `COST_KATANA_API_KEY` (and optionally `PROJECT_ID`), then:
58
134
 
59
135
  ```python
60
136
  import cost_katana as ck
@@ -64,332 +140,355 @@ response = ck.ai(openai.gpt_4o, "Hello")
64
140
  print(response.text, response.cost)
65
141
  ```
66
142
 
143
+ The Python SDK talks to the same hosted backend as the TypeScript SDK (`https://api.costkatana.com` by default). For HTTP gateway usage (OpenAI- or Anthropic-shaped JSON), see the [package README on PyPI](https://pypi.org/project/cost-katana/).
144
+
67
145
  ### Which API should I use?
68
146
 
69
- | If you want… | Use |
70
- |--------------|-----|
71
- | Drop-in HTTP proxy (existing OpenAI clients / curl) | Gateway URL + `Authorization: Bearer`, or **`gateway()`** in TypeScript |
72
- | Simple AI calls with cost on the response | **`ai()`** / **`chat()`** |
73
- | Session replay, advanced analytics, or manual `trackUsage` | **`AICostTracker`** (advanced) |
147
+ | If you want… | Use |
148
+ | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
149
+ | Drop-in HTTP proxy (existing OpenAI clients / cURL) | Gateway URL + `Authorization: Bearer`, or **`gateway()`** in TypeScript |
150
+ | Simple AI calls with cost on the response | **`ai()`** / **`chat()`** |
151
+ | Session replay, advanced analytics, or manual `trackUsage` | **`AICostTracker`** (advanced) |
74
152
 
75
- For most apps, **`COST_KATANA_API_KEY`** plus either **`gateway()`** (proxy) or **`ai()`** (SDK) is enough. Optional direct provider keys: see [`.env.example`](./.env.example) if you need them.
153
+ For most apps, **`COST_KATANA_API_KEY`** plus either **`gateway()`** (proxy) or **`ai()`** (SDK) is enough. For optional direct provider keys, add them to your environment as shown in [Configuration](#configuration).
76
154
 
77
155
  ---
78
156
 
79
- ## 🌍 Provider-Independent by Design
157
+ ## Configuration
80
158
 
81
- Cost Katana is **completely provider-agnostic**. Never lock yourself into a single vendor.
159
+ ### Environment variables
82
160
 
83
- ### βœ… Use Capability-Based Routing
161
+ **Start here:** `COST_KATANA_API_KEY` unlocks routing, tracking, and dashboard features. **`PROJECT_ID`** is optional (scopes usage to a project in the dashboard).
84
162
 
85
- ```typescript
86
- import { ai, ModelCapability } from 'cost-katana';
163
+ Create a `.env` in your project (or export in your shell) with the variables you need:
87
164
 
88
- // Automatically selects best model for each task
89
- const code = await ai(ModelCapability.CODE_GENERATION, 'Write a React component');
90
- const chat = await ai(ModelCapability.CONVERSATION, 'Hello!');
91
- const vision = await ai(ModelCapability.VISION, 'Describe this image', { image });
92
- ```
165
+ ```bash
166
+ # Required for hosted Cost Katana
167
+ COST_KATANA_API_KEY=dak_your_key_here
93
168
 
94
- ### βœ… Optimize by Performance Characteristics
169
+ # Optional — per-project analytics
170
+ PROJECT_ID=your_project_id
95
171
 
96
- ```typescript
97
- import { ai } from 'cost-katana';
172
+ # Optional — direct provider keys (bring your own keys)
173
+ OPENAI_API_KEY=sk-...
174
+ ANTHROPIC_API_KEY=sk-ant-...
175
+ GOOGLE_API_KEY=...
176
+
177
+ # Optional — AWS Bedrock
178
+ AWS_ACCESS_KEY_ID=...
179
+ AWS_SECRET_ACCESS_KEY=...
180
+ AWS_REGION=us-east-1
181
+ ```
98
182
 
99
- // Fastest model available
100
- const fast = await ai({ speed: 'fastest' }, prompt);
183
+ There is no `.env.example` file in this repository; copy the block above into your own `.env` and fill in values.
101
184
 
102
- // Cheapest model available
103
- const cheap = await ai({ cost: 'cheapest' }, prompt);
185
+ ### Programmatic configuration
104
186
 
105
- // Best quality model
106
- const best = await ai({ quality: 'best' }, prompt);
187
+ ```typescript
188
+ import { configure } from 'cost-katana';
107
189
 
108
- // Balanced approach
109
- const balanced = await ai({ speed: 'fast', cost: 'cheap' }, prompt);
190
+ await configure({
191
+ apiKey: 'dak_your_key',
192
+ cortex: true, // 40–75% cost savings (when enabled on requests)
193
+ cache: true, // Smart caching (when enabled on requests)
194
+ firewall: true // Block prompt injections
195
+ });
110
196
  ```
111
197
 
112
- **Benefits:**
113
- - πŸ”„ **Automatic Failover** - Seamlessly switch providers if one goes down
114
- - πŸ’° **Cost Optimization** - Routes to the cheapest provider automatically
115
- - πŸš€ **Future-Proof** - New providers added without code changes
116
- - πŸ”“ **Zero Lock-In** - Switch providers anytime, no refactoring needed
198
+ ### Common request options (`ai()`)
117
199
 
118
- [Read the full Provider-Agnostic Guide β†’](https://github.com/Hypothesize-Tech/costkatana-examples/blob/main/PROVIDER_AGNOSTIC_GUIDE.md)
200
+ | Option | Description |
201
+ | --------------- | ----------------------------------- |
202
+ | `temperature` | Creativity (0–2), default `0.7` |
203
+ | `maxTokens` | Max response tokens, default `1000` |
204
+ | `systemMessage` | System prompt |
205
+ | `cache` | Enable caching |
206
+ | `cortex` | Enable optimization (Cortex) |
119
207
 
120
- ---
208
+ ```typescript
209
+ import { ai, OPENAI } from 'cost-katana';
121
210
 
122
- ## πŸ“– Tutorial: Build a Cost-Aware Chatbot
211
+ const response = await ai(OPENAI.GPT_4O, 'Your prompt', {
212
+ temperature: 0.7,
213
+ maxTokens: 500,
214
+ systemMessage: 'You are a helpful AI',
215
+ cache: true,
216
+ cortex: true
217
+ });
218
+ ```
123
219
 
124
- Let's build something real. In this tutorial, you'll create a chatbot that:
125
- - βœ… Tracks every dollar spent
126
- - βœ… Caches repeated questions (saving 100% on duplicates)
127
- - βœ… Optimizes long responses (40-75% savings)
220
+ ---
128
221
 
129
- ### Part 1: Basic Chat Session
222
+ ## Core APIs
130
223
 
131
- ```typescript
132
- import { chat, OPENAI } from 'cost-katana';
224
+ ### `ai()`
133
225
 
134
- // Create a persistent chat session
135
- const session = chat(OPENAI.GPT_4);
226
+ The simplest way to make AI requests with automatic cost tracking.
136
227
 
137
- // Send messages and track costs
138
- await session.send('Hello! What can you help me with?');
139
- await session.send('Tell me a programming joke');
140
- await session.send('Now explain it');
228
+ **Signature**
141
229
 
142
- // See exactly what you spent
143
- console.log(`πŸ’° Total cost: $${session.totalCost.toFixed(4)}`);
144
- console.log(`πŸ“Š Messages: ${session.messages.length}`);
145
- console.log(`🎯 Tokens used: ${session.totalTokens}`);
230
+ ```typescript
231
+ await ai(model, prompt, options?);
146
232
  ```
147
233
 
148
- ### Part 2: Add Smart Caching
234
+ - **`model`** β€” Use type-safe constants (e.g. `OPENAI.GPT_4O`). String model IDs still work but are deprecated.
235
+ - **`prompt`** β€” User prompt text.
236
+ - **`options`** β€” See [Common request options](#common-request-options-ai).
149
237
 
150
- Cache identical questions to avoid paying twice:
238
+ **Returns:** `text`, `cost`, `tokens`, `model`, `provider`, and optionally `cached`, `optimized`, `templateUsed` when applicable.
151
239
 
152
240
  ```typescript
153
241
  import { ai, OPENAI } from 'cost-katana';
154
242
 
155
- // First call - hits the API
156
- const response1 = await ai(OPENAI.GPT_4, 'What is 2+2?', { cache: true });
157
- console.log(`Cached: ${response1.cached}`); // false
158
- console.log(`Cost: $${response1.cost}`); // $0.0008
243
+ const response = await ai(OPENAI.GPT_4O, 'Explain quantum computing', {
244
+ temperature: 0.7,
245
+ maxTokens: 500
246
+ });
247
+
248
+ console.log(response.text);
249
+ console.log(`Cost: $${response.cost}`);
250
+ ```
251
+
252
+ ### `chat()`
159
253
 
160
- // Second call - served from cache (FREE!)
161
- const response2 = await ai(OPENAI.GPT_4, 'What is 2+2?', { cache: true });
162
- console.log(`Cached: ${response2.cached}`); // true
163
- console.log(`Cost: $${response2.cost}`); // $0.0000 πŸŽ‰
254
+ Create a **session** with conversation history and cost tracking.
255
+
256
+ **Signature**
257
+
258
+ ```typescript
259
+ const session = chat(model, options?);
164
260
  ```
165
261
 
166
- ### Part 3: Enable Cortex Optimization
262
+ **Session API**
167
263
 
168
- For long-form content, Cortex compresses prompts intelligently:
264
+ | Member | Description |
265
+ | --------------- | ------------------------------------------------ |
266
+ | `send(message)` | Send a message and append assistant reply |
267
+ | `messages` | Full conversation history |
268
+ | `totalCost` | Running total cost (USD) |
269
+ | `totalTokens` | Running token count |
270
+ | `clear()` | Reset conversation (keeps system message if set) |
169
271
 
170
272
  ```typescript
171
- import { ai, OPENAI } from 'cost-katana';
273
+ import { chat, OPENAI } from 'cost-katana';
172
274
 
173
- const response = await ai(
174
- OPENAI.GPT_4,
175
- 'Write a comprehensive guide to machine learning for beginners',
176
- {
177
- cortex: true, // Enable 40-75% cost reduction
178
- maxTokens: 2000
179
- }
180
- );
275
+ const session = chat(OPENAI.GPT_4O, {
276
+ systemMessage: 'You are a helpful AI assistant.',
277
+ temperature: 0.7
278
+ });
181
279
 
182
- console.log(`Optimized: ${response.optimized}`);
183
- console.log(`Saved: $${response.savedAmount}`);
280
+ await session.send('Hello! What can you help me with?');
281
+ await session.send('Tell me a programming joke');
282
+ await session.send('Now explain it');
283
+
284
+ console.log(`Total cost: $${session.totalCost.toFixed(4)}`);
285
+ console.log(`Messages: ${session.messages.length}`);
286
+ console.log(`Tokens used: ${session.totalTokens}`);
184
287
  ```
185
288
 
186
- ### Part 4: Compare Models Side-by-Side
289
+ ### `gateway()`
187
290
 
188
- Find the best price-to-quality ratio for your use case:
291
+ Zero extra config for the hosted gateway: **`COST_KATANA_API_KEY`** is read from the environment. Use the same OpenAI-shaped request bodies you would send upstream.
189
292
 
190
- ```typescript
191
- import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
293
+ For advanced gateway features (headers, proxy keys, firewall), see [`docs/GATEWAY.md`](./docs/GATEWAY.md) and [`docs/API.md`](./docs/API.md).
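+ A minimal call mirrors the quick start above (this repeats the quick-start snippet; it assumes `COST_KATANA_API_KEY` is set in your environment):

```typescript
// Sketch based on the quick-start example above: gateway() reads
// COST_KATANA_API_KEY from the environment, and the request body is
// the same OpenAI-shaped JSON you would send upstream.
import { gateway, OPENAI } from 'cost-katana';

const res = await gateway().openai({
  model: OPENAI.GPT_4O,
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(res.data);
```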
192
294
 
193
- const prompt = 'Summarize the theory of relativity in 50 words';
295
+ ---
194
296
 
195
- const models = [
196
- { name: 'GPT-4', id: OPENAI.GPT_4 },
197
- { name: 'Claude 3.5 Sonnet', id: ANTHROPIC.CLAUDE_3_5_SONNET_20241022 },
198
- { name: 'Gemini 2.5 Pro', id: GOOGLE.GEMINI_2_5_PRO },
199
- { name: 'GPT-3.5 Turbo', id: OPENAI.GPT_3_5_TURBO }
200
- ];
297
+ ## Provider-independent design
201
298
 
202
- console.log('πŸ“Š Model Cost Comparison\n');
299
+ Cost Katana is **provider-agnostic**: the same **`ai()`** API works across OpenAI, Anthropic, Google, and more—pick a **model constant** per provider.
203
300
 
204
- for (const model of models) {
205
- const response = await ai(model.id, prompt);
206
- console.log(`${model.name.padEnd(20)} $${response.cost.toFixed(6)}`);
207
- }
208
- ```
301
+ ```typescript
302
+ import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
209
303
 
210
- **Sample Output:**
304
+ const a = await ai(OPENAI.GPT_4O, 'Hello');
305
+ const b = await ai(ANTHROPIC.CLAUDE_3_5_SONNET_20241022, 'Hello');
306
+ const c = await ai(GOOGLE.GEMINI_2_5_PRO, 'Hello');
211
307
  ```
212
- πŸ“Š Model Cost Comparison
213
308
 
214
- GPT-4 $0.001200
215
- Claude 3.5 Sonnet $0.000900
216
- Gemini 2.5 Pro $0.000150
217
- GPT-3.5 Turbo $0.000080
218
- ```
309
+ **Benefits**
310
+
311
+ - **Automatic failover** — Seamlessly switch providers when configured (see [Security and reliability](#security-and-reliability)).
312
+ - **Cost optimization** — Choose cheaper models with constants and the [cost optimization](#cost-optimization) patterns below.
313
+ - **Future-proof** — New providers and models are added to the registry without changing your mental model.
314
+ - **Zero lock-in** — Swap model constants as your stack evolves.
315
+
316
+ For deeper routing patterns (capabilities, load balancing, multi-provider setups), see the [Provider-Agnostic Guide](https://github.com/Hypothesize-Tech/costkatana-examples/blob/main/PROVIDER_AGNOSTIC_GUIDE.md).
219
317
 
220
318
  ---
221
319
 
222
- ## 🎯 Type-Safe Model Selection
320
+ ## Type-safe model constants
223
321
 
224
- Stop guessing model names. Get autocomplete and catch typos at compile time:
322
+ Stop guessing model names: use namespaces for autocomplete and typo safety.
225
323
 
226
324
  ```typescript
227
325
  import { OPENAI, ANTHROPIC, GOOGLE, AWS_BEDROCK, XAI, DEEPSEEK } from 'cost-katana';
228
326
 
229
327
  // OpenAI
230
- OPENAI.GPT_5
231
- OPENAI.GPT_4
232
- OPENAI.GPT_4O
233
- OPENAI.GPT_3_5_TURBO
234
- OPENAI.O1
235
- OPENAI.O3
328
+ OPENAI.GPT_5;
329
+ OPENAI.GPT_4;
330
+ OPENAI.GPT_4O;
331
+ OPENAI.GPT_3_5_TURBO;
332
+ OPENAI.O1;
333
+ OPENAI.O3;
236
334
 
237
335
  // Anthropic
238
- ANTHROPIC.CLAUDE_SONNET_4_5
239
- ANTHROPIC.CLAUDE_3_5_SONNET_20241022
240
- ANTHROPIC.CLAUDE_3_5_HAIKU_20241022
336
+ ANTHROPIC.CLAUDE_SONNET_4_5;
337
+ ANTHROPIC.CLAUDE_3_5_SONNET_20241022;
338
+ ANTHROPIC.CLAUDE_3_5_HAIKU_20241022;
241
339
 
242
340
  // Google
243
- GOOGLE.GEMINI_2_5_PRO
244
- GOOGLE.GEMINI_2_5_FLASH
245
- GOOGLE.GEMINI_1_5_PRO
341
+ GOOGLE.GEMINI_2_5_PRO;
342
+ GOOGLE.GEMINI_2_5_FLASH;
343
+ GOOGLE.GEMINI_1_5_PRO;
246
344
 
247
345
  // AWS Bedrock
248
- AWS_BEDROCK.NOVA_PRO
249
- AWS_BEDROCK.NOVA_LITE
250
- AWS_BEDROCK.CLAUDE_SONNET_4_5
346
+ AWS_BEDROCK.NOVA_PRO;
347
+ AWS_BEDROCK.NOVA_LITE;
348
+ AWS_BEDROCK.CLAUDE_SONNET_4_5;
251
349
 
252
350
  // Others
253
- XAI.GROK_2_1212
254
- DEEPSEEK.DEEPSEEK_CHAT
351
+ XAI.GROK_2_1212;
352
+ DEEPSEEK.DEEPSEEK_CHAT;
255
353
  ```
256
354
 
257
- **Why constants over strings?**
258
- | Feature | String `'gpt-4'` | Constant `OPENAI.GPT_4` |
259
- |---------|------------------|-------------------------|
260
- | Autocomplete | ❌ | βœ… |
261
- | Typo protection | ❌ | βœ… |
262
- | Refactor safely | ❌ | βœ… |
263
- | Self-documenting | ❌ | βœ… |
355
+ **Prefer constants over raw strings** — They give IDE autocomplete, catch typos early, refactor safely, and document which provider you intended.
264
356
 
265
357
  ---
266
358
 
267
- ## βš™οΈ Configuration
268
-
269
- ### Environment Variables
270
359
 
271
- **Start here:** `COST_KATANA_API_KEY` unlocks routing, tracking, and dashboard features. **`PROJECT_ID`** is optional (set it to scope usage to a project in the dashboard).
272
-
273
- ```bash
274
- # Required for hosted Cost Katana
275
- COST_KATANA_API_KEY=dak_your_key_here
360
+ Constants resolve to full provider model IDs under the hood, for example on AWS Bedrock:
+
+ ```typescript
361
+ import { ai, AWS_BEDROCK } from 'cost-katana';
276
362
 
277
- # Optional β€” per-project analytics
278
- PROJECT_ID=your_project_id
363
+ // The constant resolves to the underlying Bedrock model ID
364
+ const response = await ai(
365
+ AWS_BEDROCK.LLAMA_3_2_1B_INSTRUCT,
366
+ 'Summarize the difference between RAG and fine-tuning in two sentences.',
367
+ { maxTokens: 500 }
368
+ );
279
369
  ```
280
370
 
281
- Optional: bring your own provider keys, or use AWS Bedrock. **Copy [`.env.example`](./.env.example)** into `.env` and fill in values.
371
+ ### Claude extended thinking (`ProviderRequest.thinking`)
282
372
 
283
- ```bash
284
- # Optional β€” direct provider keys
285
- OPENAI_API_KEY=sk-...
286
- ANTHROPIC_API_KEY=sk-ant-...
287
- GEMINI_API_KEY=...
373
+ **Anthropic** and **AWS Bedrock (Claude)** requests built as a full **`ProviderRequest`** can include optional **`thinking`** for [extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking) / reasoning. The SDK:
288
374
 
289
- # Optional β€” AWS Bedrock
290
- AWS_ACCESS_KEY_ID=...
291
- AWS_SECRET_ACCESS_KEY=...
292
- AWS_REGION=us-east-1
293
- ```
375
+ - Maps supported models to **`adaptive`** thinking (with optional **`effort`**: `low` | `medium` | `high` | `max`) or **`enabled`** thinking (with optional **`budgetTokens`**, or omit it so the Cost Katana gateway can choose a budget).
376
+ - Sets **`temperature` to `1`** when thinking is on, as required by Anthropic for these calls.
294
377
 
295
- ### Programmatic Configuration
378
+ Thinking tokens are billed as **output** tokens. The high-level **`ai()`** helper does not surface `thinking` in its options yet; use **`AICostTracker.makeRequest()`** (e.g. via **`createCostKatanaTracker()`**) and pass a `ProviderRequest`.
296
379
 
297
380
  ```typescript
298
- import { configure } from 'cost-katana';
381
+ import { createCostKatanaTracker } from 'cost-katana';
382
+ import type { ProviderRequest } from 'cost-katana';
299
383
 
300
- await configure({
301
- apiKey: 'dak_your_key',
302
- cortex: true, // 40-75% cost savings
303
- cache: true, // Smart caching
304
- firewall: true // Block prompt injections
305
- });
306
- ```
384
+ const tracker = await createCostKatanaTracker();
307
385
 
308
- ### Request Options
386
+ const request: ProviderRequest = {
387
+ model: 'claude-sonnet-4-5-20250929',
388
+ messages: [
389
+ { role: 'user', content: 'Show your reasoning, then the final answer: is 2^10 > 10^2?' }
390
+ ],
391
+ maxTokens: 8000,
392
+ thinking: { enabled: true, budgetTokens: 12000 }
393
+ };
309
394
 
310
- ```typescript
311
- const response = await ai(OPENAI.GPT_4, 'Your prompt', {
312
- temperature: 0.7, // Creativity (0-2)
313
- maxTokens: 500, // Response limit
314
- systemMessage: 'You are a helpful AI', // System prompt
315
- cache: true, // Enable caching
316
- cortex: true, // Enable optimization
317
- retry: true // Auto-retry on failures
318
- });
395
+ const raw = await tracker.makeRequest(request);
396
+ // Response shape matches the provider (e.g. Anthropic `content` blocks, usage fields).
319
397
  ```
320
398
 
321
- ---
399
+ For **adaptive** thinking on newer Opus / Sonnet builds (e.g. Opus 4.6 / 4.7, Sonnet 4.6), the SDK sends `type: 'adaptive'` and uses **`effort`** (default **`high`**) when you set `thinking: { enabled: true }` on a matching model ID.
322
400
 
323
- ## πŸ”Œ Framework Integration
401
+ ## Cost optimization
324
402
 
325
- ### Next.js App Router
403
+ ### Cheatsheet
404
+
405
+ | Strategy | Typical savings | When to use |
406
+ | -------------------------------------------------- | ---------------------------- | ---------------------------------------- |
407
+ | Use a smaller/faster model (e.g. GPT-3.5 vs GPT-4) | Large savings on simple tasks | Trivial Q&A, classification, translation |
408
+ | **Caching** | 100% on cache hits | Repeated queries, FAQs |
409
+ | **Cortex** | 40–75% on eligible workloads | Long-form generation |
410
+ | **Chat sessions** | 10–20% | Related multi-turn work |
411
+ | **Gemini Flash** (vs heavy flagship models) | Very high $/token delta | High volume, cost-sensitive |
412
+
413
+ ### Caching
326
414
 
327
415
  ```typescript
328
- // app/api/chat/route.ts
329
416
  import { ai, OPENAI } from 'cost-katana';
330
417
 
331
- export async function POST(request: Request) {
332
- const { prompt } = await request.json();
333
- const response = await ai(OPENAI.GPT_4, prompt);
334
- return Response.json(response);
335
- }
418
+ const response1 = await ai(OPENAI.GPT_4O, 'What is 2+2?', { cache: true });
419
+ console.log(`Cached: ${response1.cached}`);
420
+ console.log(`Cost: $${response1.cost}`);
421
+
422
+ const response2 = await ai(OPENAI.GPT_4O, 'What is 2+2?', { cache: true });
423
+ console.log(`Cached: ${response2.cached}`);
424
+ console.log(`Cost: $${response2.cost}`);
336
425
  ```
337
426
 
338
- ### Express.js
427
+ ### Cortex (optimization)
339
428
 
340
429
  ```typescript
341
- import express from 'express';
342
430
  import { ai, OPENAI } from 'cost-katana';
343
431
 
344
- const app = express();
345
- app.use(express.json());
346
-
347
- app.post('/api/chat', async (req, res) => {
348
- const response = await ai(OPENAI.GPT_4, req.body.prompt);
349
- res.json(response);
350
- });
432
+ const response = await ai(
433
+ OPENAI.GPT_4O,
434
+ 'Write a comprehensive guide to machine learning for beginners',
435
+ {
436
+ cortex: true,
437
+ maxTokens: 2000
438
+ }
439
+ );
351
440
 
352
- app.listen(3000);
441
+ console.log(`Optimized: ${response.optimized}`);
442
+ console.log(`Cost: $${response.cost}`);
353
443
  ```
354
444
 
355
- ### Fastify
445
+ ### Compare models side by side
356
446
 
357
447
  ```typescript
358
- import fastify from 'fastify';
359
- import { ai, OPENAI } from 'cost-katana';
448
+ import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
360
449
 
361
- const app = fastify();
450
+ const prompt = 'Summarize the theory of relativity in 50 words';
362
451
 
363
- app.post('/api/chat', async (request) => {
364
- const { prompt } = request.body as { prompt: string };
365
- return await ai(OPENAI.GPT_4, prompt);
366
- });
452
+ const models = [
453
+ { name: 'GPT-4 class', id: OPENAI.GPT_4O },
454
+ { name: 'Claude 3.5 Sonnet', id: ANTHROPIC.CLAUDE_3_5_SONNET_20241022 },
455
+ { name: 'Gemini 2.5 Pro', id: GOOGLE.GEMINI_2_5_PRO },
456
+ { name: 'GPT-3.5 Turbo', id: OPENAI.GPT_3_5_TURBO }
457
+ ];
367
458
 
368
- app.listen({ port: 3000 });
459
+ console.log('Model cost comparison\n');
460
+
461
+ for (const model of models) {
462
+ const response = await ai(model.id, prompt);
463
+ console.log(`${model.name.padEnd(22)} $${response.cost.toFixed(6)}`);
464
+ }
369
465
  ```
370
466
 
371
- ### NestJS
467
+ ### Quick wins
372
468
 
373
469
  ```typescript
374
- import { Controller, Post, Body } from '@nestjs/common';
375
470
  import { ai, OPENAI } from 'cost-katana';
376
471
 
377
- @Controller('api')
378
- export class ChatController {
379
- @Post('chat')
380
- async chat(@Body() body: { prompt: string }) {
381
- return await ai(OPENAI.GPT_4, body.prompt);
382
- }
383
- }
472
+ // Expensive: flagship model for a trivial question
473
+ await ai(OPENAI.GPT_4O, 'What is 2+2?');
474
+
475
+ // Better: match model to task
476
+ await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?');
477
+
478
+ // Better still: cache repeated FAQs
479
+ await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?', { cache: true });
480
+
481
+ // Long content: Cortex
482
+ await ai(OPENAI.GPT_4O, 'Write a 2000-word essay', { cortex: true });
384
483
  ```
385
484
 
386
485
  ---
387
486
 
388
- ## πŸ›‘οΈ Built-in Security
487
+ ## Security and reliability
389
488
 
390
- ### Firewall Protection
489
+ ### Firewall
391
490
 
392
- Block prompt injection attacks automatically:
491
+ Block prompt injection and related abuse when enabled via **`configure({ firewall: true })`** and gateway/tracker settings.
393
492
 
394
493
  ```typescript
395
494
  import { configure, ai, OPENAI } from 'cost-katana';
@@ -397,86 +496,78 @@ import { configure, ai, OPENAI } from 'cost-katana';
397
496
  await configure({ firewall: true });
398
497
 
399
498
  try {
400
- await ai(OPENAI.GPT_4, 'Ignore all previous instructions and...');
499
+ await ai(OPENAI.GPT_4O, 'Ignore all previous instructions and...');
401
500
  } catch (error) {
402
- console.log('πŸ›‘οΈ Blocked:', error.message);
501
+ console.log('Blocked:', (error as Error).message);
403
502
  }
404
503
  ```
405
504
 
406
- **Protects against:**
407
- - Prompt injection attacks
408
- - Jailbreak attempts
409
- - Data exfiltration
410
- - Malicious content generation
411
-
412
- ---
505
+ **Helps mitigate:** prompt injection, jailbreak attempts, unsafe content patterns (exact behavior depends on your gateway configuration).
413
506
 
414
- ## πŸ”„ Auto-Failover
507
+ ### Auto-failover
415
508
 
416
- Never let provider outages break your app:
509
+ When routing and health checks allow, requests can fall back across providers so a single vendor outage does not take down your app.
417
510
 
418
511
  ```typescript
419
512
  import { ai, OPENAI } from 'cost-katana';
420
513
 
421
- // If OpenAI is down, automatically switches to Claude or Gemini
422
- const response = await ai(OPENAI.GPT_4, 'Hello');
514
+ const response = await ai(OPENAI.GPT_4O, 'Hello');
423
515
 
424
516
  console.log(`Provider used: ${response.provider}`);
425
- // Could be 'openai', 'anthropic', or 'google' depending on availability
517
+ // e.g. 'openai', 'anthropic', or 'google' depending on availability and policy
426
518
  ```
427
519
 
428
520
  ---

- ## 📊 Usage tracking & analytics
+ ## Usage tracking and analytics

- ### Dashboard attribution (stay on `ai()`)
+ ### Dashboard attribution with `configure()` and `ai()`

- Use the same **`ai()`** API as everywhere else. Point usage at your project once with **`configure()`** or env vars—no need to switch to a new class for standard cost and token tracking.
+ Use the same **`ai()`** API everywhere. Point usage at your project once with **`configure()`** or environment variables.

 ```typescript
 import { configure, ai, OPENAI } from 'cost-katana';

 await configure({
   apiKey: process.env.COST_KATANA_API_KEY,
-   projectId: process.env.PROJECT_ID,
+   projectId: process.env.PROJECT_ID
 });

- const response = await ai(OPENAI.GPT_4O, 'Explain quantum computing', {
-   tags: ['demo', 'readme'],
- });
+ const response = await ai(OPENAI.GPT_4O, 'Explain quantum computing');

 console.log(response.text);
 console.log('Cost:', response.cost);
 console.log('Tokens:', response.tokens);
- console.log('Response time (ms):', response.responseTime);
 ```

- Calls are attributed to your project in the dashboard. You can also pass **`projectId`** on individual `ai()` options when you use multiple projects.
+ Calls can be attributed to your project in the dashboard. You can also pass **`projectId`** through tracker/gateway options where supported when using multiple projects.
+
+ ### `AICostTracker` with defaults (advanced)

- ### `AICostTracker` with defaults (recommended)
+ When you need a **dedicated tracker instance** (not only the global `ai()` helper), use **`createCostKatanaTracker()`** or **`AICostTracker.createWithDefaults()`**. They populate **`TrackerConfig`** from the same environment rules as auto-config:

- When you need a **dedicated tracker instance** (not the global `ai()` helper), use **`createCostKatanaTracker()`** or **`AICostTracker.createWithDefaults()`**. They fill in **`TrackerConfig`** from the same environment rules as auto-config: if you set **direct** provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, or AWS Bedrock creds), those providers are registered. If you **only** have **`COST_KATANA_API_KEY`** and **no** provider keys, the default is **Cost Katana hosted models** via the gateway (**`costkatana-backend-nest`**): a single OpenAI-shaped slot with the reserved `proxy` key so **`ai()`** / **`initializeGateway()`** route inference through the hosted API (no OpenAI/Anthropic keys required in your app). Optimization and alerts come from package defaults; pass a **partial** config to override anything.
+ - If you set **direct** provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, or AWS Bedrock credentials), those providers are registered.
+ - If you **only** have **`COST_KATANA_API_KEY`** and **no** direct provider keys, the default is **Cost Katana hosted models** via the gateway: inference can route through the hosted API without embedding vendor keys in your app.

 ```typescript
 import { createCostKatanaTracker, AIProvider } from 'cost-katana';

 const tracker = await createCostKatanaTracker();

- // Optional overrides (merged on top of defaults)
 const custom = await createCostKatanaTracker({
   optimization: { enablePromptOptimization: false },
   providers: [{ provider: AIProvider.OpenAI, apiKey: process.env.OPENAI_API_KEY! }]
 });

- // Same behavior: await AICostTracker.createWithDefaults({ ... })
+ // Same idea: await AICostTracker.createWithDefaults({ ... })
 // Short alias: import { tracker as costKatana } from 'cost-katana';
 ```

 Requires **`COST_KATANA_API_KEY`** in the environment (same as `AICostTracker.create()`). **`PROJECT_ID`** remains optional.

- ### Dedicated tracker instances (advanced)
+ ### Dedicated per-provider trackers

- If you want a **per-provider tracker object** (instead of the global `ai()` helper), use **`createOpenAITracker`** / **`createAnthropicTracker`** / etc. They wrap `AICostTracker` with a small `complete()` API:
+ For a **small `complete()`-style API** on top of `AICostTracker`, use **`createOpenAITracker`**, **`createAnthropicTracker`**, etc.

 ```typescript
 import { createOpenAITracker, OPENAI } from 'cost-katana';
@@ -489,30 +580,27 @@ console.log('Total cost (USD):', response.cost.totalCost);
 console.log('Response time (ms):', response.responseTime);
 ```

- For **gateway proxying**, **manual `trackUsage`**, or a fully custom **`AICostTracker`** with your own provider list, see [`docs/API.md`](./docs/API.md) and [`examples/`](./examples/).
-
- ### View Analytics in Dashboard
+ For **gateway proxying**, **manual `trackUsage`**, or a fully custom **`AICostTracker`**, see [`docs/API.md`](./docs/API.md) and [`examples/`](./examples/).

- Once tracking is enabled, you can view detailed analytics at your dashboard:
+ ### View analytics in the dashboard

- - **Network Performance**: DNS lookup time, TCP connection time, total response time
- - **Client Environment**: User agent, platform, IP geolocation
- - **Request/Response Data**: Full request and response payloads (sanitized)
- - **Optimization Opportunities**: AI-powered suggestions to reduce costs
- - **Performance Metrics**: Real-time monitoring with anomaly detection
+ With tracking enabled, you can inspect:

- ### Manual Usage Tracking
+ - **Network performance** — DNS, TCP, total response time
+ - **Client environment** — User agent, platform, IP geolocation (where collected)
+ - **Request/response data** — Payloads (sanitized)
+ - **Optimization opportunities** — Suggestions to reduce cost
+ - **Performance metrics** — Monitoring and anomaly signals

- For custom implementations or additional tracking:
+ ### Manual usage tracking

 ```typescript
 import { createCostKatanaTracker } from 'cost-katana';

 const tracker = await createCostKatanaTracker();

- // Manually track usage with additional metadata
 await tracker.trackUsage({
-   model: 'gpt-4',
+   model: 'gpt-4o',
   provider: 'openai',
   prompt: 'Hello, world!',
   completion: 'Hello! How can I help you today?',
@@ -524,106 +612,182 @@ await tracker.trackUsage({
   userId: 'user_123',
   sessionId: 'session_abc',
   tags: ['chat', 'greeting'],
-   // Additional metadata for comprehensive tracking
   requestMetadata: {
-     userAgent: navigator?.userAgent,
+     userAgent: typeof navigator !== 'undefined' ? navigator.userAgent : undefined,
     clientIP: await fetch('https://api.ipify.org').then(r => r.text()),
     feature: 'chat-interface'
   }
 });
 ```

- ### Session replay & distributed tracing
+ ### Session replay and distributed tracing

- Session graphs, spans, and trace middleware are provided by the **`trace`** submodule. Start here: [`src/trace/README.md`](./src/trace/README.md) (exported APIs such as `TraceClient`, `LocalTraceService`, and `createTraceMiddleware`).
+ The **`trace`** submodule provides session graphs, spans, and middleware. See [`src/trace/README.md`](./src/trace/README.md) for exports such as `TraceClient`, `LocalTraceService`, and `createTraceMiddleware`.

 ---

- ## 💡 Cost Optimization Cheatsheet
+ ## Framework integration
+
+ ### Next.js App Router
+
+ ```typescript
+ // app/api/chat/route.ts
+ import { ai, OPENAI } from 'cost-katana';

- | Strategy | Savings | When to Use |
- |----------|---------|-------------|
- | **Use GPT-3.5 over GPT-4** | 90% | Simple tasks, translations |
- | **Enable caching** | 100% on hits | Repeated queries, FAQs |
- | **Enable Cortex** | 40-75% | Long-form content |
- | **Batch in sessions** | 10-20% | Related queries |
- | **Use Gemini Flash** | 95% vs GPT-4 | High-volume, cost-sensitive |
+ export async function POST(request: Request) {
+   const { prompt } = await request.json();
+   const response = await ai(OPENAI.GPT_4O, prompt);
+   return Response.json(response);
+ }
+ ```

- ### Quick Wins
+ ### Express.js

 ```typescript
- // ❌ Expensive: Using GPT-4 for everything
- await ai(OPENAI.GPT_4, 'What is 2+2?'); // $0.001
+ import express from 'express';
+ import { ai, OPENAI } from 'cost-katana';

- // ✅ Smart: Match model to task
- await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?'); // $0.0001
+ const app = express();
+ app.use(express.json());

- // ✅ Smarter: Cache common queries
- await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?', { cache: true }); // $0 on repeat
+ app.post('/api/chat', async (req, res) => {
+   const response = await ai(OPENAI.GPT_4O, req.body.prompt);
+   res.json(response);
+ });

- // ✅ Smartest: Cortex for long content
- await ai(OPENAI.GPT_4, 'Write a 2000-word essay', { cortex: true }); // 40-75% off
+ app.listen(3000);
+ ```
+
+ ### Fastify
+
+ ```typescript
+ import fastify from 'fastify';
+ import { ai, OPENAI } from 'cost-katana';
+
+ const app = fastify();
+
+ app.post('/api/chat', async request => {
+   const { prompt } = request.body as { prompt: string };
+   return await ai(OPENAI.GPT_4O, prompt);
+ });
+
+ app.listen({ port: 3000 });
+ ```
+
+ ### NestJS
+
+ ```typescript
+ import { Controller, Post, Body } from '@nestjs/common';
+ import { ai, OPENAI } from 'cost-katana';
+
+ @Controller('api')
+ export class ChatController {
+   @Post('chat')
+   async chat(@Body() body: { prompt: string }) {
+     return await ai(OPENAI.GPT_4O, body.prompt);
+   }
+ }
 ```

 ---

- ## 🔧 Error Handling
+ ## Error handling

 ```typescript
 import { ai, OPENAI } from 'cost-katana';

 try {
-   const response = await ai(OPENAI.GPT_4, 'Hello');
+   const response = await ai(OPENAI.GPT_4O, 'Hello');
   console.log(response.text);
 } catch (error) {
-   switch (error.code) {
+   const err = error as Error & { code?: string; availableModels?: string[] };
+   switch (err.code) {
     case 'NO_API_KEY':
-       console.log('Set COST_KATANA_API_KEY or OPENAI_API_KEY');
+       console.log('Set COST_KATANA_API_KEY or a provider API key');
       break;
     case 'RATE_LIMIT':
-       console.log('Rate limited. Retrying...');
+       console.log('Rate limited. Retry with backoff.');
       break;
     case 'INVALID_MODEL':
-       console.log('Model not found. Available:', error.availableModels);
+       console.log('Model not found. Available:', err.availableModels);
       break;
     default:
-       console.log('Error:', error.message);
+       console.log('Error:', err.message);
   }
 }
 ```

+ Exact **`code`** values depend on the failure path (gateway vs direct provider). Always log **`message`** for support.
+
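The `RATE_LIMIT` branch above advises retrying with backoff. A minimal, generic sketch of that pattern (the `backoffDelayMs` and `retryWithBackoff` helpers below are illustrative, not part of the cost-katana API):

```typescript
// Exponential backoff delay: baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry an async call while it fails with a retryable code such as 'RATE_LIMIT'.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  isRetryable: (err: unknown) => boolean = err =>
    (err as { code?: string })?.code === 'RATE_LIMIT'
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Give up on non-retryable errors or on the final attempt.
      if (!isRetryable(err) || attempt === maxAttempts - 1) throw err;
      await new Promise(resolve => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
  throw lastError;
}
```

Usage would look like `await retryWithBackoff(() => ai(OPENAI.GPT_4O, 'Hello'))`: the call is retried only while the thrown error carries `code: 'RATE_LIMIT'`, and rethrown otherwise.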
 ---

- ## 🌐 AI Gateway — details
+ ## AI gateway (details)

- The gateway is an **HTTP proxy**: call Cost Katana’s URL with your API key; the service forwards to OpenAI, Anthropic, Google, Cohere, etc., and can attach caching, retries, firewall, and tracking.
+ The gateway is an **HTTP proxy**: call Cost Katana’s URL with your API key; the service forwards to OpenAI, Anthropic, Google, Cohere, and others, and can attach caching, retries, firewall, and tracking.

- - **Quick start:** see [Get started in 60 seconds](#get-started-in-60-seconds) above (`gateway()` or curl).
- - **`CostKatana-Target-Url`:** only needed for non-default upstream URLs (Azure OpenAI, private endpoints). For standard routes (`/v1/chat/completions`, `/v1/messages`, …), **`gateway()`** uses `inferTargetUrl: true` and usually omits it.
- - **Anthropic on hosted gateway:** `gateway.anthropic(...)` / `/v1/messages` often needs no Anthropic key in your app; the service may use Bedrock when no server `ANTHROPIC_API_KEY` is set (see docs for streaming limitations).
- - **Dashboard rows:** gateway traffic reflects **proxied** bodies; `AICostTracker` / `trackUsage` is for **custom** structured logging. Multi-turn and token accounting nuances: [`examples/GATEWAY_USAGE_AND_TRACKING.md`](./examples/GATEWAY_USAGE_AND_TRACKING.md) and [costkatana-examples `2-gateway`](https://github.com/Hypothesize-Tech/costkatana-examples/tree/main/2-gateway).
+ - **Quick start:** [Quick start — Path A](#path-a--gateway-http-proxy) (`gateway()` or cURL).
+ - **`CostKatana-Target-Url`:** Use for non-default upstream URLs (Azure OpenAI, private endpoints). For standard routes (`/v1/chat/completions`, `/v1/messages`, …), **`gateway()`** often uses `inferTargetUrl: true` and omits it.
+ - **Anthropic on hosted gateway:** `gateway.anthropic(...)` / `/v1/messages` may not require an Anthropic key in your app; the service may use Bedrock when no server `ANTHROPIC_API_KEY` is set (see docs for streaming limitations).
+ - **Dashboard vs custom tracking:** Gateway traffic reflects **proxied** bodies; `AICostTracker` / `trackUsage` supports **custom** structured logging. For multi-turn and token nuances, see [`examples/GATEWAY_USAGE_AND_TRACKING.md`](./examples/GATEWAY_USAGE_AND_TRACKING.md) and [costkatana-examples `2-gateway`](https://github.com/Hypothesize-Tech/costkatana-examples/tree/main/2-gateway).

 ---

- ## 📚 More Examples
+ ## Experimentation (hosted API)
+
+ The Cost Katana backend ([`costkatana-backend-nest`](https://github.com/Hypothesize-Tech/costkatana-backend-nest)) exposes **experimentation** REST endpoints under **`/api/experimentation`** on the hosted API (same origin as the gateway, e.g. `https://api.costkatana.com`). The dashboard **Experimentation** UI uses these APIs; you can also integrate them directly.
+
+ **What it covers**
+
+ - **Model comparison** — Run side-by-side comparisons across providers (`POST /api/experimentation/model-comparison`).
+ - **Real-time comparison** — Start a comparison job (`POST /api/experimentation/real-time-comparison`) and stream progress over **SSE** at `GET /api/experimentation/comparison-progress/:sessionId` (session token validated). Poll or reconnect via `GET /api/experimentation/comparison-job/:sessionId` when authenticated.
+ - **Catalog** — `GET /api/experimentation/available-models` returns router-registered models (active/inactive) for picking candidates.
+ - **Cost estimate** — `POST /api/experimentation/estimate-cost` (public) for experiment cost estimates before you run.
+ - **What-if scenarios** — List/create/analyze/delete scenarios (`/api/experimentation/what-if-scenarios`, `.../:scenarioName/analyze`, lifecycle updates).
+ - **Real-time simulation** — `POST /api/experimentation/real-time-simulation` (public) for what-if style simulations.
+ - **History and insights** — `GET /api/experimentation/history`, `GET /api/experimentation/recommendations`, `GET /api/experimentation/fine-tuning-analysis`.
+ - **Exports** — `GET /api/experimentation/:experimentId/export?format=json|csv` for results.
+
+ **Auth**
+
+ - Most write/read routes require a **dashboard user JWT** (`JwtAuthGuard`).
+ - Several routes are marked **public** (estimate cost, available models, real-time simulation, SSE progress with a valid session id). See the controller for the exact list: [`experimentation.controller.ts` in costkatana-backend-nest](https://github.com/Hypothesize-Tech/costkatana-backend-nest/blob/main/src/modules/experimentation/experimentation.controller.ts).
+
+ **Server configuration**
+
+ - Real model execution for comparisons may require backend flags such as **`ENABLE_REAL_MODEL_COMPARISON=true`** to enable live API calls to providers in your deployment.
+
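As a sketch of integrating these routes directly, here is a tiny client for the public cost-estimate endpoint. The path comes from the list above, but the request fields (`prompt`, `models`) and the `estimatedCost` response field are placeholder assumptions, not the real DTOs; check the linked controller for the actual shapes.

```typescript
// Minimal fetch-shaped interface so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// POST a draft experiment to the public estimate-cost route and return the parsed body.
async function estimateExperimentCost(
  body: Record<string, unknown>,
  baseUrl = 'https://api.costkatana.com',
  fetchImpl: FetchLike = (globalThis as { fetch: FetchLike }).fetch
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/api/experimentation/estimate-cost`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error('estimate-cost request failed');
  return res.json();
}
```

Authenticated experimentation routes would additionally send the dashboard user JWT (typically an `Authorization: Bearer <token>` header, as enforced by `JwtAuthGuard`).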
+ ---
+
+ ## Examples and documentation
+
+ **In this repo**
+
+ | Resource | Description |
+ | -------------------------------------------------------------- | ---------------------------- |
+ | [`docs/API.md`](./docs/API.md) | API reference |
+ | [`docs/EXAMPLES.md`](./docs/EXAMPLES.md) | Examples index |
+ | [`docs/GATEWAY.md`](./docs/GATEWAY.md) | Gateway |
+ | [`docs/PROMPT_OPTIMIZATION.md`](./docs/PROMPT_OPTIMIZATION.md) | Prompt optimization |
+ | [`docs/WEBHOOKS.md`](./docs/WEBHOOKS.md) | Webhooks |
+ | [`examples/`](./examples/) | Runnable TypeScript examples |

- Explore 45+ complete examples in our examples repository:
+ **External examples repo** — 45+ complete examples:

- **🔗 [github.com/Hypothesize-Tech/costkatana-examples](https://github.com/Hypothesize-Tech/costkatana-examples)**
+ **[github.com/Hypothesize-Tech/costkatana-examples](https://github.com/Hypothesize-Tech/costkatana-examples)**

- | Category | Examples |
- |----------|----------|
- | **Cost Tracking** | Basic tracking, budgets, alerts |
- | **Gateway** | Routing, load balancing, failover |
- | **Optimization** | Cortex, caching, compression |
- | **Observability** | OpenTelemetry, tracing, metrics |
- | **Security** | Firewall, rate limiting, moderation |
- | **Workflows** | Multi-step AI orchestration |
- | **Frameworks** | Express, Next.js, Fastify, NestJS, FastAPI |
+ | Category | Topics |
+ | ----------------- | ------------------------------------------ |
+ | **Cost tracking** | Budgets, alerts |
+ | **Gateway** | Routing, load balancing, failover |
+ | **Optimization** | Cortex, caching, compression |
+ | **Observability** | OpenTelemetry, tracing, metrics |
+ | **Security** | Firewall, rate limiting, moderation |
+ | **Workflows** | Multi-step orchestration |
+ | **Frameworks** | Express, Next.js, Fastify, NestJS, FastAPI |

 ---

- ## 🔄 Migration Guides
+ ## Migration guides

 ### From OpenAI SDK

@@ -641,7 +805,7 @@ console.log(completion.choices[0].message.content);
 import { ai, OPENAI } from 'cost-katana';
 const response = await ai(OPENAI.GPT_4, 'Hello');
 console.log(response.text);
- console.log(`Cost: $${response.cost}`); // Bonus: cost tracking!
+ console.log(`Cost: $${response.cost}`);
 ```

 ### From Anthropic SDK
@@ -675,9 +839,9 @@ const response = await ai(OPENAI.GPT_4, 'Hello');

 ---

- ## 🤝 Contributing
+ ## Contributing

- We welcome contributions! See our [Contributing Guide](./CONTRIBUTING.md).
+ We welcome contributions. See the [Contributing Guide](./CONTRIBUTING.md).

 ```bash
 git clone https://github.com/Hypothesize-Tech/costkatana-core.git
@@ -693,19 +857,19 @@ npm run build # Build

 ---

- ## 📞 Support
+ ## Support

- | Channel | Link |
- |---------|------|
- | **Dashboard** | [costkatana.com](https://costkatana.com) |
- | **Documentation** | [docs.costkatana.com](https://docs.costkatana.com) |
- | **GitHub** | [github.com/Hypothesize-Tech](https://github.com/Hypothesize-Tech) |
- | **Discord** | [discord.gg/D8nDArmKbY](https://discord.gg/D8nDArmKbY) |
- | **Email** | support@costkatana.com |
+ | Channel | Link |
+ | ----------------- | ------------------------------------------------------------------ |
+ | **Dashboard** | [costkatana.com](https://costkatana.com) |
+ | **Documentation** | [docs.costkatana.com](https://docs.costkatana.com) |
+ | **GitHub** | [github.com/Hypothesize-Tech](https://github.com/Hypothesize-Tech) |
+ | **Discord** | [discord.gg/D8nDArmKbY](https://discord.gg/D8nDArmKbY) |
+ | **Email** | support@costkatana.com |

 ---

- ## 📄 License
+ ## License

 MIT © Cost Katana

@@ -713,7 +877,7 @@ MIT Β© Cost Katana

 <div align="center">

- **Start cutting AI costs today** 🥷
+ **Start cutting AI costs today**

 ```bash
 npm install cost-katana