cost-katana 2.4.1 → 2.4.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +490 -326
- package/dist/config/pricing/aws-bedrock.js +1 -1
- package/dist/config/pricing/aws-bedrock.js.map +1 -1
- package/dist/constants/models.d.ts +1 -1
- package/dist/constants/models.d.ts.map +1 -1
- package/dist/constants/models.js +1 -1
- package/dist/constants/models.js.map +1 -1
- package/dist/gateway/client.d.ts.map +1 -1
- package/dist/gateway/client.js +1 -2
- package/dist/gateway/client.js.map +1 -1
- package/dist/providers/anthropic.d.ts.map +1 -1
- package/dist/providers/anthropic.js +22 -2
- package/dist/providers/anthropic.js.map +1 -1
- package/dist/providers/bedrock.d.ts.map +1 -1
- package/dist/providers/bedrock.js +16 -2
- package/dist/providers/bedrock.js.map +1 -1
- package/dist/types/providers.d.ts +5 -0
- package/dist/types/providers.d.ts.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,18 +1,102 @@
-# Cost Katana
+# Cost Katana
+
+[](https://www.npmjs.com/package/cost-katana)
+[](https://pypi.org/project/cost-katana/)
+[](./LICENSE)
+[](https://nodejs.org/)
+[](https://pypi.org/project/cost-katana/)
 
 > **Cut your AI costs in half. Without cutting corners.**
 
 Cost Katana is a drop-in SDK that wraps your AI calls with automatic cost tracking, smart caching, and optimization—all in one line of code.
 
+## Table of contents
+
+- [Cost Katana](#cost-katana)
+- [Table of contents](#table-of-contents)
+- [Installation](#installation)
+- [Quick start](#quick-start)
+- [Path A — Gateway (HTTP proxy)](#path-a--gateway-http-proxy)
+- [Path B — `ai()` (simple API, cost on the response)](#path-b--ai-simple-api-cost-on-the-response)
+- [Path C — Python](#path-c--python)
+- [Which API should I use?](#which-api-should-i-use)
+- [Configuration](#configuration)
+- [Environment variables](#environment-variables)
+- [Programmatic configuration](#programmatic-configuration)
+- [Common request options (`ai()`)](#common-request-options-ai)
+- [Core APIs](#core-apis)
+- [`ai()`](#ai)
+- [`chat()`](#chat)
+- [`gateway()`](#gateway)
+- [Provider-independent design](#provider-independent-design)
+- [Type-safe model constants](#type-safe-model-constants)
+- [Claude extended thinking (`ProviderRequest.thinking`)](#claude-extended-thinking-providerrequestthinking)
+- [Cost optimization](#cost-optimization)
+- [Cheatsheet](#cheatsheet)
+- [Caching](#caching)
+- [Cortex (optimization)](#cortex-optimization)
+- [Compare models side by side](#compare-models-side-by-side)
+- [Quick wins](#quick-wins)
+- [Security and reliability](#security-and-reliability)
+- [Firewall](#firewall)
+- [Auto-failover](#auto-failover)
+- [Usage tracking and analytics](#usage-tracking-and-analytics)
+- [Dashboard attribution with `configure()` and `ai()`](#dashboard-attribution-with-configure-and-ai)
+- [`AICostTracker` with defaults (advanced)](#aicosttracker-with-defaults-advanced)
+- [Dedicated per-provider trackers](#dedicated-per-provider-trackers)
+- [View analytics in the dashboard](#view-analytics-in-the-dashboard)
+- [Manual usage tracking](#manual-usage-tracking)
+- [Session replay and distributed tracing](#session-replay-and-distributed-tracing)
+- [Framework integration](#framework-integration)
+- [Next.js App Router](#nextjs-app-router)
+- [Express.js](#expressjs)
+- [Fastify](#fastify)
+- [NestJS](#nestjs)
+- [Error handling](#error-handling)
+- [AI gateway (details)](#ai-gateway-details)
+- [Experimentation (hosted API)](#experimentation-hosted-api)
+- [Examples and documentation](#examples-and-documentation)
+- [Migration guides](#migration-guides)
+- [From OpenAI SDK](#from-openai-sdk)
+- [From Anthropic SDK](#from-anthropic-sdk)
+- [From LangChain](#from-langchain)
+- [Contributing](#contributing)
+- [Support](#support)
+- [License](#license)
+
+---
+
+## Installation
+
+**TypeScript / Node**
+
+```bash
+npm install cost-katana
+```
+
+**Python** — published on PyPI as [`cost-katana`](https://pypi.org/project/cost-katana/) (install name uses a hyphen; import uses an underscore).
+
+```bash
+pip install cost-katana
+```
+
+```python
+import cost_katana as ck  # package import: cost_katana
+```
+
+Requires **Node.js 18+** for the npm package and **Python 3.8+** for the PyPI package.
+
 ---
 
-##
+## Quick start
 
 Set **`COST_KATANA_API_KEY`**. **`PROJECT_ID`** is optional (recommended for per-project analytics in the dashboard).
 
-###
+### Path A — Gateway (HTTP proxy)
 
-
+Use this when you want a **drop-in proxy**: change base URL and send `Authorization: Bearer`, or use **`gateway()`** in TypeScript with no extra config (reads `COST_KATANA_API_KEY`, same behavior as `createGatewayClientFromEnv()`).
+
+**cURL** (no SDK; OpenAI-compatible JSON):
 
 ```bash
 curl -s https://api.costkatana.com/api/gateway/v1/chat/completions \
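The gateway quick start above is just OpenAI-shaped JSON plus a `Bearer` header against the hosted endpoint. A minimal sketch of building that same request for `fetch` in Node 18+ (`buildGatewayRequest` is a hypothetical helper for illustration, not part of the SDK; the URL and auth scheme come from the README text in the diff):

```typescript
// Builds the OpenAI-compatible payload the cURL example sends to the gateway.
interface GatewayRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildGatewayRequest(apiKey: string, model: string, content: string): GatewayRequest {
  return {
    url: 'https://api.costkatana.com/api/gateway/v1/chat/completions',
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    // Same shape as the cURL -d payload in the README.
    body: JSON.stringify({ model, messages: [{ role: 'user', content }] })
  };
}

// Usage: const res = await fetch(req.url, req);
const req = buildGatewayRequest('dak_example_key', 'gpt-4o', 'Hello!');
console.log(req.headers.Authorization); // Bearer dak_example_key
```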
@@ -21,26 +105,20 @@ curl -s https://api.costkatana.com/api/gateway/v1/chat/completions \
   -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
 ```
 
-
-
-**TypeScript — `gateway()`** — zero extra config; reads `COST_KATANA_API_KEY` (same behavior as `createGatewayClientFromEnv()`):
-
-```bash
-npm install cost-katana
-```
+**TypeScript**
 
 ```typescript
-import { gateway } from 'cost-katana';
+import { gateway, OPENAI } from 'cost-katana';
 
 const res = await gateway().openai({
-  model:
-  messages: [{ role: 'user', content: 'Hello!' }]
+  model: OPENAI.GPT_4O,
+  messages: [{ role: 'user', content: 'Hello!' }]
 });
 
 console.log(res.data);
 ```
 
-### `ai()`
+### Path B — `ai()` (simple API, cost on the response)
 
 ```typescript
 import { ai, OPENAI } from 'cost-katana';
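The `ai()` path above attaches a dollar `cost` to every response. That figure is, conceptually, token usage times per-token pricing. A self-contained sketch of the arithmetic (the prices below are placeholders for illustration, not Cost Katana's actual pricing registry, which ships in files like `dist/config/pricing/aws-bedrock.js`):

```typescript
// Placeholder per-1M-token prices, illustrative only.
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  'gpt-4o': { inputPerM: 2.5, outputPerM: 10 }
};

// Cost = input tokens * input rate + output tokens * output rate.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

console.log(estimateCost('gpt-4o', 1000, 500).toFixed(4)); // 0.0075
```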
@@ -50,11 +128,9 @@ const response = await ai(OPENAI.GPT_4O, 'Hello');
 console.log(response.text, response.cost);
 ```
 
-### Python
+### Path C — Python
 
-
-pip install costkatana
-```
+Install [`cost-katana` from PyPI](https://pypi.org/project/cost-katana/), set `COST_KATANA_API_KEY` (and optionally `PROJECT_ID`), then:
 
 ```python
 import cost_katana as ck
@@ -64,332 +140,355 @@ response = ck.ai(openai.gpt_4o, "Hello")
 print(response.text, response.cost)
 ```
 
+The Python SDK talks to the same hosted backend as TypeScript (`https://api.costkatana.com` by default). For HTTP gateway usage (OpenAI- or Anthropic-shaped JSON), see the [package README on PyPI](https://pypi.org/project/cost-katana/).
+
 ### Which API should I use?
 
-| If you want…
-
-| Drop-in HTTP proxy (existing OpenAI clients /
-| Simple AI calls with cost on the response
-| Session replay, advanced analytics, or manual `trackUsage` | **`AICostTracker`** (advanced)
+| If you want… | Use |
+| ---------------------------------------------------------- | ----------------------------------------------------------------------- |
+| Drop-in HTTP proxy (existing OpenAI clients / cURL) | Gateway URL + `Authorization: Bearer`, or **`gateway()`** in TypeScript |
+| Simple AI calls with cost on the response | **`ai()`** / **`chat()`** |
+| Session replay, advanced analytics, or manual `trackUsage` | **`AICostTracker`** (advanced) |
 
-For most apps, **`COST_KATANA_API_KEY`** plus either **`gateway()`** (proxy) or **`ai()`** (SDK) is enough.
+For most apps, **`COST_KATANA_API_KEY`** plus either **`gateway()`** (proxy) or **`ai()`** (SDK) is enough. For optional direct provider keys, add them to your environment as shown in [Configuration](#configuration).
 
 ---
 
-##
+## Configuration
 
-
+### Environment variables
 
-
+**Start here:** `COST_KATANA_API_KEY` unlocks routing, tracking, and dashboard features. **`PROJECT_ID`** is optional (scopes usage to a project in the dashboard).
 
-
-import { ai, ModelCapability } from 'cost-katana';
+Create a `.env` in your project (or export in your shell) with the variables you need:
 
-
-
-
-const vision = await ai(ModelCapability.VISION, 'Describe this image', { image });
-```
+```bash
+# Required for hosted Cost Katana
+COST_KATANA_API_KEY=dak_your_key_here
 
-
+# Optional — per-project analytics
+PROJECT_ID=your_project_id
 
-
-
+# Optional — direct provider keys (bring your own keys)
+OPENAI_API_KEY=sk-...
+ANTHROPIC_API_KEY=sk-ant-...
+GOOGLE_API_KEY=...
+
+# Optional — AWS Bedrock
+AWS_ACCESS_KEY_ID=...
+AWS_SECRET_ACCESS_KEY=...
+AWS_REGION=us-east-1
+```
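The README text in the diff describes a precedence for these variables: direct provider keys register those providers, while `COST_KATANA_API_KEY` alone falls back to the hosted gateway. A hypothetical sketch of that rule as a pure function (`resolveProviders` is illustrative, not the SDK's actual auto-config logic):

```typescript
type Env = Record<string, string | undefined>;

// Direct keys win; with only COST_KATANA_API_KEY, the hosted gateway is used.
function resolveProviders(env: Env): string[] {
  const direct: string[] = [];
  if (env.OPENAI_API_KEY) direct.push('openai');
  if (env.ANTHROPIC_API_KEY) direct.push('anthropic');
  if (env.GOOGLE_API_KEY) direct.push('google');
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) direct.push('bedrock');
  if (direct.length > 0) return direct;
  if (env.COST_KATANA_API_KEY) return ['cost-katana-hosted'];
  throw new Error('No credentials: set COST_KATANA_API_KEY or a direct provider key');
}

console.log(resolveProviders({ COST_KATANA_API_KEY: 'dak_x' }));
```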
 
-
-const fast = await ai({ speed: 'fastest' }, prompt);
+There is no `.env.example` file in this repository; copy the block above into your own `.env` and fill in values.
 
-
-const cheap = await ai({ cost: 'cheapest' }, prompt);
+### Programmatic configuration
 
-
-
+```typescript
+import { configure } from 'cost-katana';
 
-
-
+await configure({
+  apiKey: 'dak_your_key',
+  cortex: true, // 40–75% cost savings (when enabled on requests)
+  cache: true, // Smart caching (when enabled on requests)
+  firewall: true // Block prompt injections
+});
 ```
 
-
-- **Automatic Failover** - Seamlessly switch providers if one goes down
-- **Cost Optimization** - Routes to the cheapest provider automatically
-- **Future-Proof** - New providers added without code changes
-- **Zero Lock-In** - Switch providers anytime, no refactoring needed
+### Common request options (`ai()`)
 
-
+| Option | Description |
+| --------------- | ----------------------------------- |
+| `temperature` | Creativity (0–2), default `0.7` |
+| `maxTokens` | Max response tokens, default `1000` |
+| `systemMessage` | System prompt |
+| `cache` | Enable caching |
+| `cortex` | Enable optimization (Cortex) |
 
-
+```typescript
+import { ai, OPENAI } from 'cost-katana';
 
-
+const response = await ai(OPENAI.GPT_4O, 'Your prompt', {
+  temperature: 0.7,
+  maxTokens: 500,
+  systemMessage: 'You are a helpful AI',
+  cache: true,
+  cortex: true
+});
+```
 
-
-- ✅ Tracks every dollar spent
-- ✅ Caches repeated questions (saving 100% on duplicates)
-- ✅ Optimizes long responses (40-75% savings)
+---
 
-
+## Core APIs
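The defaults listed in the options table (`temperature` 0.7, `maxTokens` 1000) amount to a simple merge of caller options over defaults. A small sketch of that pattern under those assumptions (`withDefaults` is illustrative, not the SDK's internals):

```typescript
interface RequestOptions {
  temperature?: number;
  maxTokens?: number;
  systemMessage?: string;
  cache?: boolean;
  cortex?: boolean;
}

// Defaults taken from the README's options table.
const DEFAULTS = { temperature: 0.7, maxTokens: 1000 };

// Caller-supplied fields override defaults; unset fields fall back.
function withDefaults(opts: RequestOptions = {}): RequestOptions {
  return { ...DEFAULTS, ...opts };
}

console.log(withDefaults({ maxTokens: 500 }));
```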
 
-
-import { chat, OPENAI } from 'cost-katana';
+### `ai()`
 
-
-const session = chat(OPENAI.GPT_4);
+The simplest way to make AI requests with automatic cost tracking.
 
-
-await session.send('Hello! What can you help me with?');
-await session.send('Tell me a programming joke');
-await session.send('Now explain it');
+**Signature**
 
-
-
-console.log(`Messages: ${session.messages.length}`);
-console.log(`Tokens used: ${session.totalTokens}`);
+```typescript
+await ai(model, prompt, options?);
 ```
 
-
+- **`model`** — Use type-safe constants (e.g. `OPENAI.GPT_4O`). String model IDs still work but are deprecated.
+- **`prompt`** — User prompt text.
+- **`options`** — See [Common request options](#common-request-options-ai).
 
-
+**Returns:** `text`, `cost`, `tokens`, `model`, `provider`, and optionally `cached`, `optimized`, `templateUsed` when applicable.
 
 ```typescript
 import { ai, OPENAI } from 'cost-katana';
 
-
-
-
-
+const response = await ai(OPENAI.GPT_4O, 'Explain quantum computing', {
+  temperature: 0.7,
+  maxTokens: 500
+});
+
+console.log(response.text);
+console.log(`Cost: $${response.cost}`);
+```
+
+### `chat()`
 
-
-
-
-
+Create a **session** with conversation history and cost tracking.
+
+**Signature**
+
+```typescript
+const session = chat(model, options?);
 ```
 
-
+**Session API**
 
-
+| Member | Description |
+| --------------- | ------------------------------------------------ |
+| `send(message)` | Send a message and append assistant reply |
+| `messages` | Full conversation history |
+| `totalCost` | Running total cost (USD) |
+| `totalTokens` | Running token count |
+| `clear()` | Reset conversation (keeps system message if set) |
 
 ```typescript
-import {
+import { chat, OPENAI } from 'cost-katana';
 
-const
-
-
-
-  cortex: true, // Enable 40-75% cost reduction
-  maxTokens: 2000
-}
-);
+const session = chat(OPENAI.GPT_4O, {
+  systemMessage: 'You are a helpful AI assistant.',
+  temperature: 0.7
+});
 
-
-
+await session.send('Hello! What can you help me with?');
+await session.send('Tell me a programming joke');
+await session.send('Now explain it');
+
+console.log(`Total cost: $${session.totalCost.toFixed(4)}`);
+console.log(`Messages: ${session.messages.length}`);
+console.log(`Tokens used: ${session.totalTokens}`);
 ```
 
-###
+### `gateway()`
 
-
+Zero extra config for the hosted gateway: **`COST_KATANA_API_KEY`** is read from the environment. Use the same OpenAI-shaped request bodies you would send upstream.
 
-
-import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
+For advanced gateway features (headers, proxy keys, firewall), see [`docs/GATEWAY.md`](./docs/GATEWAY.md) and [`docs/API.md`](./docs/API.md).
 
-
+---
 
-
-  { name: 'GPT-4', id: OPENAI.GPT_4 },
-  { name: 'Claude 3.5 Sonnet', id: ANTHROPIC.CLAUDE_3_5_SONNET_20241022 },
-  { name: 'Gemini 2.5 Pro', id: GOOGLE.GEMINI_2_5_PRO },
-  { name: 'GPT-3.5 Turbo', id: OPENAI.GPT_3_5_TURBO }
-];
+## Provider-independent design
 
-
+Cost Katana is **provider-agnostic**: the same **`ai()`** API works across OpenAI, Anthropic, Google, and more—pick a **model constant** per provider.
 
-
-
-  console.log(`${model.name.padEnd(20)} $${response.cost.toFixed(6)}`);
-}
-```
+```typescript
+import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
 
-
+const a = await ai(OPENAI.GPT_4O, 'Hello');
+const b = await ai(ANTHROPIC.CLAUDE_3_5_SONNET_20241022, 'Hello');
+const c = await ai(GOOGLE.GEMINI_2_5_PRO, 'Hello');
 ```
-Model Cost Comparison
 
-
-
-
-
-
+**Benefits**
+
+- **Automatic failover** — Seamlessly switch providers when configured (see [Security and reliability](#security-and-reliability)).
+- **Cost optimization** — Choose cheaper models with constants and the [cost optimization](#cost-optimization) patterns below.
+- **Future-proof** — New providers and models are added to the registry without changing your mental model.
+- **Zero lock-in** — Swap model constants as your stack evolves.
+
+For deeper routing patterns (capabilities, load balancing, multi-provider setups), see the [Provider-Agnostic Guide](https://github.com/Hypothesize-Tech/costkatana-examples/blob/main/PROVIDER_AGNOSTIC_GUIDE.md).
 
 ---
 
-##
+## Type-safe model constants
 
-Stop guessing model names
+Stop guessing model names: use namespaces for autocomplete and typo safety.
 
 ```typescript
 import { OPENAI, ANTHROPIC, GOOGLE, AWS_BEDROCK, XAI, DEEPSEEK } from 'cost-katana';
 
 // OpenAI
-OPENAI.GPT_5
-OPENAI.GPT_4
-OPENAI.GPT_4O
-OPENAI.GPT_3_5_TURBO
-OPENAI.O1
-OPENAI.O3
+OPENAI.GPT_5;
+OPENAI.GPT_4;
+OPENAI.GPT_4O;
+OPENAI.GPT_3_5_TURBO;
+OPENAI.O1;
+OPENAI.O3;
 
 // Anthropic
-ANTHROPIC.CLAUDE_SONNET_4_5
-ANTHROPIC.CLAUDE_3_5_SONNET_20241022
-ANTHROPIC.CLAUDE_3_5_HAIKU_20241022
+ANTHROPIC.CLAUDE_SONNET_4_5;
+ANTHROPIC.CLAUDE_3_5_SONNET_20241022;
+ANTHROPIC.CLAUDE_3_5_HAIKU_20241022;
 
 // Google
-GOOGLE.GEMINI_2_5_PRO
-GOOGLE.GEMINI_2_5_FLASH
-GOOGLE.GEMINI_1_5_PRO
+GOOGLE.GEMINI_2_5_PRO;
+GOOGLE.GEMINI_2_5_FLASH;
+GOOGLE.GEMINI_1_5_PRO;
 
 // AWS Bedrock
-AWS_BEDROCK.NOVA_PRO
-AWS_BEDROCK.NOVA_LITE
-AWS_BEDROCK.CLAUDE_SONNET_4_5
+AWS_BEDROCK.NOVA_PRO;
+AWS_BEDROCK.NOVA_LITE;
+AWS_BEDROCK.CLAUDE_SONNET_4_5;
 
 // Others
-XAI.GROK_2_1212
-DEEPSEEK.DEEPSEEK_CHAT
+XAI.GROK_2_1212;
+DEEPSEEK.DEEPSEEK_CHAT;
 ```
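One common way to get the same autocomplete and typo protection for a model shortlist of your own is a plain `as const` object; the pattern below is a generic TypeScript sketch with illustrative string values, not the SDK's exported constants:

```typescript
// Illustrative only: real IDs come from cost-katana's exported namespaces.
const MY_MODELS = {
  FAST: 'gpt-3.5-turbo',
  SMART: 'gpt-4o'
} as const;

// Union of the literal values: 'gpt-3.5-turbo' | 'gpt-4o'.
type MyModelId = (typeof MY_MODELS)[keyof typeof MY_MODELS];

// A typo like MY_MODELS.SMRT is a compile-time error, unlike a raw string.
function pick(taskIsHard: boolean): MyModelId {
  return taskIsHard ? MY_MODELS.SMART : MY_MODELS.FAST;
}

console.log(pick(false));
```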
 
-**
-| Feature | String `'gpt-4'` | Constant `OPENAI.GPT_4` |
-|---------|------------------|-------------------------|
-| Autocomplete | ❌ | ✅ |
-| Typo protection | ❌ | ✅ |
-| Refactor safely | ❌ | ✅ |
-| Self-documenting | ❌ | ✅ |
+**Prefer constants over raw strings** — They give IDE autocomplete, catch typos early, refactor safely, and document which provider you intended.
 
 ---
 
-## ⚙️ Configuration
-
-### Environment Variables
 
-
-
-```bash
-# Required for hosted Cost Katana
-COST_KATANA_API_KEY=dak_your_key_here
+```typescript
+import { ai, AWS_BEDROCK } from 'cost-katana';
 
-
-
+// Resolves to meta.llama4-scout-17b-instruct-v1:0
+const response = await ai(
+  AWS_BEDROCK.LLAMA_3_2_1B_INSTRUCT,
+  'Summarize the difference between RAG and fine-tuning in two sentences.',
+  { maxTokens: 500 }
+);
 ```
 
-
+### Claude extended thinking (`ProviderRequest.thinking`)
 
-
-# Optional — direct provider keys
-OPENAI_API_KEY=sk-...
-ANTHROPIC_API_KEY=sk-ant-...
-GEMINI_API_KEY=...
+**Anthropic** and **AWS Bedrock (Claude)** requests built as a full **`ProviderRequest`** can include optional **`thinking`** for [extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking) / reasoning. The SDK:
 
-
-
-AWS_SECRET_ACCESS_KEY=...
-AWS_REGION=us-east-1
-```
+- Maps supported models to **`adaptive`** thinking (with optional **`effort`**: `low` | `medium` | `high` | `max`) or **`enabled`** thinking (with optional **`budgetTokens`**, or omit it so the Cost Katana gateway can choose a budget).
+- Sets **`temperature` to `1`** when thinking is on, as required by Anthropic for these calls.
 
-
+Thinking tokens are billed as **output** tokens. The high-level **`ai()`** helper does not surface `thinking` in its options yet; use **`AICostTracker.makeRequest()`** (e.g. via **`createCostKatanaTracker()`**) and pass a `ProviderRequest`.
 
 ```typescript
-import {
+import { createCostKatanaTracker } from 'cost-katana';
+import type { ProviderRequest } from 'cost-katana';
 
-await
-  apiKey: 'dak_your_key',
-  cortex: true, // 40-75% cost savings
-  cache: true, // Smart caching
-  firewall: true // Block prompt injections
-});
-```
+const tracker = await createCostKatanaTracker();
 
-
+const request: ProviderRequest = {
+  model: 'claude-sonnet-4-5-20250929',
+  messages: [
+    { role: 'user', content: 'Show your reasoning, then the final answer: is 2^10 > 10^2?' }
+  ],
+  maxTokens: 8000,
+  thinking: { enabled: true, budgetTokens: 12000 }
+};
 
-
-
-  temperature: 0.7, // Creativity (0-2)
-  maxTokens: 500, // Response limit
-  systemMessage: 'You are a helpful AI', // System prompt
-  cache: true, // Enable caching
-  cortex: true, // Enable optimization
-  retry: true // Auto-retry on failures
-});
+const raw = await tracker.makeRequest(request);
+// Response shape matches the provider (e.g. Anthropic `content` blocks, usage fields).
 ```
 
-
+For **adaptive** thinking on newer Opus / Sonnet builds (e.g. Opus 4.6 / 4.7, Sonnet 4.6), the SDK sends `type: 'adaptive'` and uses **`effort`** (default **`high`**) when you set `thinking: { enabled: true }` on a matching model ID.
 
-##
+## Cost optimization
 
-###
+### Cheatsheet
+
+| Strategy | Typical savings | When to use |
+| -------------------------------------------------- | ---------------------------- | ---------------------------------------- |
+| Use a smaller/faster model (e.g. GPT-3.5 vs GPT-4) | Large on simple tasks | Trivial Q&A, classification, translation |
+| **Caching** | 100% on cache hits | Repeated queries, FAQs |
+| **Cortex** | 40–75% on eligible workloads | Long-form generation |
+| **Chat sessions** | 10–20% | Related multi-turn work |
+| **Gemini Flash** (vs heavy flagship models) | Very high $/token delta | High volume, cost-sensitive |
+
+### Caching
 
 ```typescript
-// app/api/chat/route.ts
 import { ai, OPENAI } from 'cost-katana';
 
-
-
-
-
-}
+const response1 = await ai(OPENAI.GPT_4O, 'What is 2+2?', { cache: true });
+console.log(`Cached: ${response1.cached}`);
+console.log(`Cost: $${response1.cost}`);
+
+const response2 = await ai(OPENAI.GPT_4O, 'What is 2+2?', { cache: true });
+console.log(`Cached: ${response2.cached}`);
+console.log(`Cost: $${response2.cost}`);
 ```
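Conceptually, the `cached` flag in the example above is a lookup keyed on model plus prompt, with hits short-circuiting the billed call. A minimal in-memory sketch of that idea (not the SDK's cache, which runs server-side):

```typescript
interface AiResult { text: string; cost: number; cached: boolean; }

const cache = new Map<string, AiResult>();

// callModel stands in for the real (billed) provider call.
function cachedAi(
  model: string,
  prompt: string,
  callModel: () => { text: string; cost: number }
): AiResult {
  const key = `${model}\u0000${prompt}`;
  const hit = cache.get(key);
  if (hit) return { ...hit, cached: true, cost: 0 }; // hit: no new spend
  const fresh = callModel();
  const result: AiResult = { ...fresh, cached: false };
  cache.set(key, result);
  return result;
}

const first = cachedAi('gpt-4o', 'What is 2+2?', () => ({ text: '4', cost: 0.002 }));
const second = cachedAi('gpt-4o', 'What is 2+2?', () => ({ text: '4', cost: 0.002 }));
console.log(first.cached, second.cached, second.cost);
```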
 
-###
+### Cortex (optimization)
 
 ```typescript
-import express from 'express';
 import { ai, OPENAI } from 'cost-katana';
 
-const
-
-
-
-
-
-}
+const response = await ai(
+  OPENAI.GPT_4O,
+  'Write a comprehensive guide to machine learning for beginners',
+  {
+    cortex: true,
+    maxTokens: 2000
+  }
+);
 
-
+console.log(`Optimized: ${response.optimized}`);
+console.log(`Cost: $${response.cost}`);
 ```
 
-###
+### Compare models side by side
 
 ```typescript
-import
-import { ai, OPENAI } from 'cost-katana';
+import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
 
-const
+const prompt = 'Summarize the theory of relativity in 50 words';
 
-
-
-
-}
+const models = [
+  { name: 'GPT-4 class', id: OPENAI.GPT_4O },
+  { name: 'Claude 3.5 Sonnet', id: ANTHROPIC.CLAUDE_3_5_SONNET_20241022 },
+  { name: 'Gemini 2.5 Pro', id: GOOGLE.GEMINI_2_5_PRO },
+  { name: 'GPT-3.5 Turbo', id: OPENAI.GPT_3_5_TURBO }
+];
 
-
+console.log('Model cost comparison\n');
+
+for (const model of models) {
+  const response = await ai(model.id, prompt);
+  console.log(`${model.name.padEnd(22)} $${response.cost.toFixed(6)}`);
+}
 ```
 
-###
+### Quick wins
 
 ```typescript
-import { Controller, Post, Body } from '@nestjs/common';
 import { ai, OPENAI } from 'cost-katana';
 
-
-
-
-
-
-
-
+// Expensive: flagship model for a trivial question
+await ai(OPENAI.GPT_4O, 'What is 2+2?');
+
+// Better: match model to task
+await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?');
+
+// Better still: cache repeated FAQs
+await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?', { cache: true });
+
+// Long content: Cortex
+await ai(OPENAI.GPT_4O, 'Write a 2000-word essay', { cortex: true });
 ```
 
 ---
 
-##
+## Security and reliability
 
-### Firewall
+### Firewall
 
-Block prompt injection
+Block prompt injection and related abuse when enabled via **`configure({ firewall: true })`** and gateway/tracker settings.
 
 ```typescript
 import { configure, ai, OPENAI } from 'cost-katana';
@@ -397,86 +496,78 @@ import { configure, ai, OPENAI } from 'cost-katana';
 await configure({ firewall: true });
 
 try {
-  await ai(OPENAI.
+  await ai(OPENAI.GPT_4O, 'Ignore all previous instructions and...');
 } catch (error) {
-  console.log('
+  console.log('Blocked:', (error as Error).message);
 }
 ```
 
-**
-- Prompt injection attacks
-- Jailbreak attempts
-- Data exfiltration
-- Malicious content generation
-
----
+**Helps mitigate:** prompt injection, jailbreak attempts, unsafe content patterns (exact behavior depends on your gateway configuration).
 
-
+### Auto-failover
 
-
+When routing and health checks allow, requests can fall back across providers so a single vendor outage does not take down your app.
 
 ```typescript
 import { ai, OPENAI } from 'cost-katana';
 
-
-const response = await ai(OPENAI.GPT_4, 'Hello');
+const response = await ai(OPENAI.GPT_4O, 'Hello');
 
 console.log(`Provider used: ${response.provider}`);
-//
+// e.g. 'openai', 'anthropic', or 'google' depending on availability and policy
 ```
 
 ---
 
-##
+## Usage tracking and analytics
 
-### Dashboard attribution (
+### Dashboard attribution with `configure()` and `ai()`
 
-Use the same **`ai()`** API
+Use the same **`ai()`** API everywhere. Point usage at your project once with **`configure()`** or environment variables.
 
 ```typescript
 import { configure, ai, OPENAI } from 'cost-katana';
 
 await configure({
   apiKey: process.env.COST_KATANA_API_KEY,
-  projectId: process.env.PROJECT_ID
+  projectId: process.env.PROJECT_ID
 });
 
-const response = await ai(OPENAI.GPT_4O, 'Explain quantum computing'
-  tags: ['demo', 'readme'],
-});
+const response = await ai(OPENAI.GPT_4O, 'Explain quantum computing');
 
 console.log(response.text);
 console.log('Cost:', response.cost);
 console.log('Tokens:', response.tokens);
-console.log('Response time (ms):', response.responseTime);
 ```
 
-Calls
+Calls can be attributed to your project in the dashboard. You can also pass **`projectId`** through tracker/gateway options where supported when using multiple projects.
+
+### `AICostTracker` with defaults (advanced)
 
-
+When you need a **dedicated tracker instance** (not only the global `ai()` helper), use **`createCostKatanaTracker()`** or **`AICostTracker.createWithDefaults()`**. They populate **`TrackerConfig`** from the same environment rules as auto-config:
 
-
+- If you set **direct** provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, or AWS Bedrock credentials), those providers are registered.
+- If you **only** have **`COST_KATANA_API_KEY`** and **no** direct provider keys, the default is **Cost Katana hosted models** via the gateway: inference can route through the hosted API without embedding vendor keys in your app.
 
 ```typescript
 import { createCostKatanaTracker, AIProvider } from 'cost-katana';
 
 const tracker = await createCostKatanaTracker();
 
-// Optional overrides (merged on top of defaults)
 const custom = await createCostKatanaTracker({
   optimization: { enablePromptOptimization: false },
   providers: [{ provider: AIProvider.OpenAI, apiKey: process.env.OPENAI_API_KEY! }]
 });
 
-// Same
+// Same idea: await AICostTracker.createWithDefaults({ ... })
 // Short alias: import { tracker as costKatana } from 'cost-katana';
 ```
 
 Requires **`COST_KATANA_API_KEY`** in the environment (same as `AICostTracker.create()`). **`PROJECT_ID`** remains optional.
 
-### Dedicated
+### Dedicated per-provider trackers
 
-
+For a **small `complete()`-style API** on top of `AICostTracker`, use **`createOpenAITracker`**, **`createAnthropicTracker`**, etc.
 
 ```typescript
 import { createOpenAITracker, OPENAI } from 'cost-katana';
import { createOpenAITracker, OPENAI } from 'cost-katana';
|
|
@@ -489,30 +580,27 @@ console.log('Total cost (USD):', response.cost.totalCost);
|
|
|
489
580
|
console.log('Response time (ms):', response.responseTime);
|
|
490
581
|
```

For **gateway proxying**, **manual `trackUsage`**, or a fully custom **`AICostTracker`**, see [`docs/API.md`](./docs/API.md) and [`examples/`](./examples/).

### View analytics in the dashboard

With tracking enabled, you can inspect:

- **Network performance** – DNS, TCP, total response time
- **Client environment** – User agent, platform, IP geolocation (where collected)
- **Request/response data** – Payloads (sanitized)
- **Optimization opportunities** – Suggestions to reduce cost
- **Performance metrics** – Monitoring and anomaly signals

### Manual usage tracking

```typescript
import { createCostKatanaTracker } from 'cost-katana';

const tracker = await createCostKatanaTracker();

await tracker.trackUsage({
  model: 'gpt-4o',
  provider: 'openai',
  prompt: 'Hello, world!',
  completion: 'Hello! How can I help you today?',
  // ...
  userId: 'user_123',
  sessionId: 'session_abc',
  tags: ['chat', 'greeting'],
  requestMetadata: {
    userAgent: typeof navigator !== 'undefined' ? navigator.userAgent : undefined,
    clientIP: await fetch('https://api.ipify.org').then(r => r.text()),
    feature: 'chat-interface'
  }
});
```

### Session replay and distributed tracing

The **`trace`** submodule provides session graphs, spans, and middleware. See [`src/trace/README.md`](./src/trace/README.md) for exports such as `TraceClient`, `LocalTraceService`, and `createTraceMiddleware`.

---

## Framework integration

### Next.js App Router

```typescript
// app/api/chat/route.ts
import { ai, OPENAI } from 'cost-katana';

export async function POST(request: Request) {
  const { prompt } = await request.json();
  const response = await ai(OPENAI.GPT_4O, prompt);
  return Response.json(response);
}
```

### Express.js

```typescript
import express from 'express';
import { ai, OPENAI } from 'cost-katana';

const app = express();
app.use(express.json());

app.post('/api/chat', async (req, res) => {
  const response = await ai(OPENAI.GPT_4O, req.body.prompt);
  res.json(response);
});

app.listen(3000);
```

### Fastify

```typescript
import fastify from 'fastify';
import { ai, OPENAI } from 'cost-katana';

const app = fastify();

app.post('/api/chat', async request => {
  const { prompt } = request.body as { prompt: string };
  return await ai(OPENAI.GPT_4O, prompt);
});

app.listen({ port: 3000 });
```

### NestJS

```typescript
import { Controller, Post, Body } from '@nestjs/common';
import { ai, OPENAI } from 'cost-katana';

@Controller('api')
export class ChatController {
  @Post('chat')
  async chat(@Body() body: { prompt: string }) {
    return await ai(OPENAI.GPT_4O, body.prompt);
  }
}
```

---

## Error handling

```typescript
import { ai, OPENAI } from 'cost-katana';

try {
  const response = await ai(OPENAI.GPT_4O, 'Hello');
  console.log(response.text);
} catch (error) {
  const err = error as Error & { code?: string; availableModels?: string[] };
  switch (err.code) {
    case 'NO_API_KEY':
      console.log('Set COST_KATANA_API_KEY or a provider API key');
      break;
    case 'RATE_LIMIT':
      console.log('Rate limited. Retry with backoff.');
      break;
    case 'INVALID_MODEL':
      console.log('Model not found. Available:', err.availableModels);
      break;
    default:
      console.log('Error:', err.message);
  }
}
```

Exact **`code`** values depend on the failure path (gateway vs direct provider). Always log **`message`** for support.
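
The "retry with backoff" advice for `RATE_LIMIT` can be sketched as a small wrapper. This helper is not part of the SDK; it only assumes the error shape (`code` property) shown in the example above.

```typescript
// Sketch only; not a cost-katana API. Retries a call when it fails with a
// RATE_LIMIT code, doubling the delay each attempt and adding jitter.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const code = (error as { code?: string }).code;
      // Rethrow immediately for non-rate-limit errors or exhausted retries.
      if (code !== 'RATE_LIMIT' || attempt >= maxRetries) throw error;
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Hypothetical usage:
// const response = await withBackoff(() => ai(OPENAI.GPT_4O, 'Hello'));
```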

---

## AI gateway (details)

The gateway is an **HTTP proxy**: call Cost Katana's URL with your API key; the service forwards to OpenAI, Anthropic, Google, Cohere, and others, and can attach caching, retries, firewall, and tracking.

- **Quick start:** [Quick start – Path A](#path-a--gateway-http-proxy) (`gateway()` or cURL).
- **`CostKatana-Target-Url`:** Use for non-default upstream URLs (Azure OpenAI, private endpoints). For standard routes (`/v1/chat/completions`, `/v1/messages`, ...), **`gateway()`** often uses `inferTargetUrl: true` and omits it.
- **Anthropic on hosted gateway:** `gateway.anthropic(...)` / `/v1/messages` may not require an Anthropic key in your app; the service may use Bedrock when no server `ANTHROPIC_API_KEY` is set (see docs for streaming limitations).
- **Dashboard vs custom tracking:** Gateway traffic reflects **proxied** bodies; `AICostTracker` / `trackUsage` supports **custom** structured logging. For multi-turn and token nuances, see [`examples/GATEWAY_USAGE_AND_TRACKING.md`](./examples/GATEWAY_USAGE_AND_TRACKING.md) and [costkatana-examples `2-gateway`](https://github.com/Hypothesize-Tech/costkatana-examples/tree/main/2-gateway).
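
As a concrete sketch of the `CostKatana-Target-Url` flow, the helper below builds the headers for a raw gateway request. The header name comes from this section; the Azure URL and the exact gateway endpoint in the usage comment are placeholders, so check [`docs/GATEWAY.md`](./docs/GATEWAY.md) for the authoritative shapes.

```typescript
// Sketch: build headers for a gateway request. CostKatana-Target-Url is only
// needed when the upstream is non-default (Azure OpenAI, private endpoints).
function gatewayHeaders(apiKey: string, targetUrl?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  };
  if (targetUrl) {
    headers['CostKatana-Target-Url'] = targetUrl;
  }
  return headers;
}

// Hypothetical usage (URLs are placeholders):
// await fetch('https://api.costkatana.com/v1/chat/completions', {
//   method: 'POST',
//   headers: gatewayHeaders(
//     process.env.COST_KATANA_API_KEY!,
//     'https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01'
//   ),
//   body: JSON.stringify({ messages: [{ role: 'user', content: 'Hello' }] })
// });
```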

---

## Experimentation (hosted API)

The Cost Katana backend ([`costkatana-backend-nest`](https://github.com/Hypothesize-Tech/costkatana-backend-nest)) exposes **experimentation** REST endpoints under **`/api/experimentation`** on the hosted API (same origin as the gateway, e.g. `https://api.costkatana.com`). The dashboard **Experimentation** UI uses these APIs; you can also integrate with them directly.

**What it covers**

- **Model comparison** – Run side-by-side comparisons across providers (`POST /api/experimentation/model-comparison`).
- **Real-time comparison** – Start a comparison job (`POST /api/experimentation/real-time-comparison`) and stream progress over **SSE** at `GET /api/experimentation/comparison-progress/:sessionId` (session token validated). Poll or reconnect via `GET /api/experimentation/comparison-job/:sessionId` when authenticated.
- **Catalog** – `GET /api/experimentation/available-models` returns router-registered models (active/inactive) for picking candidates.
- **Cost estimate** – `POST /api/experimentation/estimate-cost` (public) for experiment cost estimates before you run.
- **What-if scenarios** – List/create/analyze/delete scenarios (`/api/experimentation/what-if-scenarios`, `.../:scenarioName/analyze`, lifecycle updates).
- **Real-time simulation** – `POST /api/experimentation/real-time-simulation` (public) for what-if style simulations.
- **History and insights** – `GET /api/experimentation/history`, `GET /api/experimentation/recommendations`, `GET /api/experimentation/fine-tuning-analysis`.
- **Exports** – `GET /api/experimentation/:experimentId/export?format=json|csv` for results.

**Auth**

- Most write/read routes require a **dashboard user JWT** (`JwtAuthGuard`).
- Several routes are marked **public** (estimate cost, available models, real-time simulation, SSE progress with a valid session id). See the controller for the exact list: [`experimentation.controller.ts` in costkatana-backend-nest](https://github.com/Hypothesize-Tech/costkatana-backend-nest/blob/main/src/modules/experimentation/experimentation.controller.ts).

**Server configuration**

- Real model execution for comparisons may require backend flags such as **`ENABLE_REAL_MODEL_COMPARISON=true`** where your deployment enables live API calls to providers.
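
A minimal sketch of calling the public cost-estimate endpoint from TypeScript. Only the path and its public status come from the list above; the request-body fields (`models`, `prompt`) are assumptions, so confirm the schema in the backend controller before relying on them.

```typescript
// Sketch: build a request for the public estimate-cost endpoint.
// Body fields are assumptions; check the backend controller for the schema.
const API_BASE = 'https://api.costkatana.com';

function buildEstimateCostRequest(models: string[], prompt: string) {
  return {
    url: `${API_BASE}/api/experimentation/estimate-cost`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ models, prompt })
    }
  };
}

// Hypothetical usage:
// const { url, init } = buildEstimateCostRequest(['gpt-4o'], 'Summarize this doc');
// const estimate = await fetch(url, init).then(r => r.json());
```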

---

## Examples and documentation

**In this repo**

| Resource                                                       | Description                  |
| -------------------------------------------------------------- | ---------------------------- |
| [`docs/API.md`](./docs/API.md)                                 | API reference                |
| [`docs/EXAMPLES.md`](./docs/EXAMPLES.md)                       | Examples index               |
| [`docs/GATEWAY.md`](./docs/GATEWAY.md)                         | Gateway                      |
| [`docs/PROMPT_OPTIMIZATION.md`](./docs/PROMPT_OPTIMIZATION.md) | Prompt optimization          |
| [`docs/WEBHOOKS.md`](./docs/WEBHOOKS.md)                       | Webhooks                     |
| [`examples/`](./examples/)                                     | Runnable TypeScript examples |

**External examples repo** – 45+ complete examples:

**[github.com/Hypothesize-Tech/costkatana-examples](https://github.com/Hypothesize-Tech/costkatana-examples)**

| Category          | Topics                                     |
| ----------------- | ------------------------------------------ |
| **Cost tracking** | Budgets, alerts                            |
| **Gateway**       | Routing, load balancing, failover          |
| **Optimization**  | Cortex, caching, compression               |
| **Observability** | OpenTelemetry, tracing, metrics            |
| **Security**      | Firewall, rate limiting, moderation        |
| **Workflows**     | Multi-step orchestration                   |
| **Frameworks**    | Express, Next.js, Fastify, NestJS, FastAPI |

---

## Migration guides

### From OpenAI SDK

```typescript
import { ai, OPENAI } from 'cost-katana';
const response = await ai(OPENAI.GPT_4, 'Hello');
console.log(response.text);
console.log(`Cost: $${response.cost}`);
```

### From Anthropic SDK

---

## Contributing

We welcome contributions. See the [Contributing Guide](./CONTRIBUTING.md).

```bash
git clone https://github.com/Hypothesize-Tech/costkatana-core.git
# ...
npm run build   # Build
```

---

## Support

| Channel           | Link                                                               |
| ----------------- | ------------------------------------------------------------------ |
| **Dashboard**     | [costkatana.com](https://costkatana.com)                           |
| **Documentation** | [docs.costkatana.com](https://docs.costkatana.com)                 |
| **GitHub**        | [github.com/Hypothesize-Tech](https://github.com/Hypothesize-Tech) |
| **Discord**       | [discord.gg/D8nDArmKbY](https://discord.gg/D8nDArmKbY)             |
| **Email**         | support@costkatana.com                                             |

---

## License

MIT © Cost Katana

---

<div align="center">

**Start cutting AI costs today**

```bash
npm install cost-katana
```