ai-token-estimator 1.0.3 → 1.2.0

package/README.md CHANGED
@@ -1,12 +1,28 @@
  # ai-token-estimator
  
- Estimate token counts and costs for LLM API calls based on character count and model-specific ratios.
-
- > **Important:** This is a rough estimation tool for budgeting purposes, not a precise tokenizer. Actual token counts may vary by ±20% depending on:
- > - Content type (code vs prose)
- > - Language (CJK languages use more tokens)
- > - API message framing overhead
- > - Special characters and formatting
+ [![npm](https://img.shields.io/npm/v/ai-token-estimator.svg)](https://www.npmjs.com/package/ai-token-estimator)
+ [![CI](https://github.com/BitsAndBytesAI/ai-token-estimator/actions/workflows/ci.yml/badge.svg)](https://github.com/BitsAndBytesAI/ai-token-estimator/actions/workflows/ci.yml)
+ [![license](https://img.shields.io/npm/l/ai-token-estimator.svg)](https://github.com/BitsAndBytesAI/ai-token-estimator/blob/main/LICENSE)
+
+ The best way to estimate **tokens + input cost** for LLM calls — with **exact OpenAI tokenization** (tiktoken-compatible BPE) and optional **official provider token counting** for Claude/Gemini.
+
+ > Accuracy depends on the tokenizer mode you choose:
+ > - **Exact** for OpenAI models when you use `openai_exact` / `encode()` / `decode()`.
+ > - **Exact** for Claude/Gemini when you use `estimateAsync()` with their official count-tokens endpoints.
+ > - **Heuristic** fallback is available for speed and resilience.
+
+ ## Features
+
+ - **Exact OpenAI tokenization** (tiktoken-compatible BPE): `encode()` / `decode()` / `openai_exact`
+ - **Official provider token counting** (async):
+   - Anthropic `POST /v1/messages/count_tokens` (`anthropic_count_tokens`)
+   - Gemini `models/:countTokens` (`gemini_count_tokens`)
+ - **Fast local fallback** options:
+   - Heuristic (`heuristic`, default)
+   - Local Gemma SentencePiece approximation (`gemma_sentencepiece`)
+ - Automatic fallback to heuristic on provider failures (`fallbackToHeuristicOnError`)
+ - **Cost estimation** using a weekly auto-updated pricing/model list (GitHub Actions)
+ - TypeScript-first, ships ESM + CJS
  
  ## Installation
  
@@ -17,7 +33,7 @@ npm install ai-token-estimator
  ## Usage
  
  ```typescript
- import { estimate, getAvailableModels } from 'ai-token-estimator';
+ import { countTokens, estimate, getAvailableModels } from 'ai-token-estimator';
  
  // Basic usage
  const result = estimate({
@@ -37,8 +53,126 @@ console.log(result);
  // List available models
  console.log(getAvailableModels());
  // ['gpt-5.2', 'gpt-4o', 'claude-opus-4.5', 'gemini-3-pro', ...]
+
+ // Exact tokens for OpenAI, heuristic for others
+ console.log(countTokens({ text: 'Hello, world!', model: 'gpt-5.1' }));
+ // { tokens: 4, exact: true, encoding: 'o200k_base' }
+ ```
+
+ ## Exact OpenAI tokenization (BPE)
+
+ This package includes **exact tokenization for OpenAI models** using a tiktoken-compatible BPE tokenizer (via `gpt-tokenizer`).
+
+ Notes:
+ - Encodings are **lazy-loaded on first use** (one-time cost per encoding).
+ - Exact tokenization is **slower** than heuristic estimation; `estimate()` defaults to `'heuristic'` to keep existing behavior fast.
+ - `encode` / `decode` and `estimate({ tokenizer: 'openai_exact' })` require **Node.js** (they use `node:module` under the hood).
+
+ ```ts
+ import { encode, decode } from 'ai-token-estimator';
+
+ const text = 'Hello, world!';
+ const tokens = encode(text, { model: 'gpt-5.1' }); // exact OpenAI token IDs
+ const roundTrip = decode(tokens, { model: 'gpt-5.1' });
+
+ console.log(tokens.length);
+ console.log(roundTrip); // "Hello, world!"
+ ```
+
+ Supported encodings:
+ `r50k_base`, `p50k_base`, `p50k_edit`, `cl100k_base`, `o200k_base`, `o200k_harmony`
+
+ ## Using the exact tokenizer with `estimate()`
+
+ `estimate()` is heuristic by default (fast). If you want exact OpenAI token counting instead:
+
+ ```ts
+ import { estimate } from 'ai-token-estimator';
+
+ const result = estimate({
+   text: 'Hello, world!',
+   model: 'gpt-5.1',
+   tokenizer: 'openai_exact',
+ });
+
+ console.log(result.tokenizerMode); // "openai_exact"
+ console.log(result.encodingUsed); // "o200k_base"
+ ```
+
+ Or use `tokenizer: 'auto'` to get exact counting for OpenAI models and heuristic estimation for everything else.
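+
+ A minimal sketch of `'auto'` mode; the output comments assume `tokenizerMode` reports the strategy that was actually applied rather than `'auto'` itself:
+
+ ```ts
+ import { estimate } from 'ai-token-estimator';
+
+ // OpenAI model: 'auto' should resolve to exact BPE tokenization
+ const openai = estimate({ text: 'Hello, world!', model: 'gpt-5.1', tokenizer: 'auto' });
+ console.log(openai.tokenizerMode, openai.encodingUsed); // "openai_exact" "o200k_base" (assumed)
+
+ // Non-OpenAI model: 'auto' should fall back to the fast heuristic
+ const claude = estimate({ text: 'Hello, world!', model: 'claude-opus-4.5', tokenizer: 'auto' });
+ console.log(claude.tokenizerMode); // "heuristic" (assumed)
+ ```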
+
+ ## Provider token counting (Claude / Gemini)
+
+ If you want **more accurate token counts** for Anthropic or Gemini models, you can call their official token-counting endpoints via `estimateAsync()`. These calls require API keys and should therefore run **server-side** (never in the browser).
+
+ If you want these modes to **fail open** (fall back to heuristic estimation) when the provider API is throttled or unavailable, or the API key is invalid, set `fallbackToHeuristicOnError: true`.
+
+ ### Anthropic: `POST /v1/messages/count_tokens`
+
+ - Env var: `ANTHROPIC_API_KEY`
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'Hello, Claude',
+   model: 'claude-sonnet-4-5',
+   tokenizer: 'anthropic_count_tokens',
+   fallbackToHeuristicOnError: true,
+   anthropic: {
+     // apiKey: '...' // optional; otherwise uses process.env.ANTHROPIC_API_KEY
+     system: 'You are a helpful assistant',
+   },
+ });
+
+ console.log(out.estimatedTokens);
  ```
  
+ ### Gemini: `models/:countTokens` (Google AI Studio)
+
+ - Env var: `GEMINI_API_KEY`
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'The quick brown fox jumps over the lazy dog.',
+   model: 'gemini-2.0-flash',
+   tokenizer: 'gemini_count_tokens',
+   fallbackToHeuristicOnError: true,
+   gemini: {
+     // apiKey: '...' // optional; otherwise uses process.env.GEMINI_API_KEY
+   },
+ });
+
+ console.log(out.estimatedTokens);
+ ```
+
+ ### Local Gemini option: Gemma SentencePiece (approximation)
+
+ If you want a **local** tokenizer option for Gemini-like models, you can use a SentencePiece tokenizer model (e.g. Gemma's `tokenizer.model`) via `sentencepiece-js`.
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'Hello!',
+   model: 'gemini-2.0-flash',
+   tokenizer: 'gemma_sentencepiece',
+   gemma: {
+     modelPath: '/path/to/tokenizer.model',
+   },
+ });
+
+ console.log(out.estimatedTokens);
+ ```
+
+ Note:
+ - This is **not** an official Gemini tokenizer; treat it as an approximation unless you have verified equivalence for your models.
+
  ## API Reference
  
  ### `estimate(input: EstimateInput): EstimateOutput`
@@ -52,9 +186,13 @@ interface EstimateInput {
    text: string; // The text to estimate tokens for
    model: string; // Model ID (e.g., 'gpt-4o', 'claude-opus-4.5')
    rounding?: 'ceil' | 'round' | 'floor'; // Rounding strategy (default: 'ceil')
+   tokenizer?: 'heuristic' | 'openai_exact' | 'auto'; // Token counting strategy (default: 'heuristic')
  }
  ```
  
+ Note:
+ - Provider-backed modes (`anthropic_count_tokens`, `gemini_count_tokens`, `gemma_sentencepiece`) are only supported in `estimateAsync()`.
+
  **Returns:**
  
  ```typescript
@@ -64,13 +202,50 @@ interface EstimateOutput {
    estimatedTokens: number; // Estimated token count (integer)
    estimatedInputCost: number; // Estimated cost in USD
    charsPerToken: number; // The ratio used for this model
+   tokenizerMode?: 'heuristic' | 'openai_exact' | 'auto'; // Which strategy was used
+   encodingUsed?: string; // OpenAI encoding when using exact tokenization
  }
  ```
  
+ ### `estimateAsync(input: EstimateAsyncInput): Promise<EstimateOutput>`
+
+ Async estimator that supports the provider-backed token counting modes:
+ - `anthropic_count_tokens` (Anthropic token count endpoint)
+ - `gemini_count_tokens` (Gemini token count endpoint)
+ - `gemma_sentencepiece` (local SentencePiece; requires `sentencepiece-js` and a model file)
+
+ API keys should be provided via env vars (`ANTHROPIC_API_KEY`, `GEMINI_API_KEY`) or passed explicitly in the config objects.
+
+ If you pass `fallbackToHeuristicOnError: true`, provider-backed modes fall back to heuristic estimation on:
+ - an invalid or expired API key (401/403)
+ - rate limiting (429)
+ - provider errors (5xx) or network issues
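+
+ If you need to detect that a fallback occurred, one option is to inspect the result; a sketch, assuming the output's `tokenizerMode` reflects the mode actually used:
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'Hello, Claude',
+   model: 'claude-sonnet-4-5',
+   tokenizer: 'anthropic_count_tokens',
+   fallbackToHeuristicOnError: true,
+ });
+
+ // If the count_tokens call failed (401/403, 429, 5xx, network), the
+ // result should be a heuristic estimate rather than an exact count.
+ if (out.tokenizerMode === 'heuristic') {
+   console.warn('Provider token counting unavailable; using heuristic estimate');
+ }
+ ```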
+
+ ### `countTokens(input: TokenCountInput): TokenCountOutput`
+
+ Counts tokens for a given model:
+ - OpenAI models: **exact** BPE tokenization
+ - Other providers: heuristic estimate
+
+ ```ts
+ import { countTokens } from 'ai-token-estimator';
+
+ const result = countTokens({ text: 'Hello, world!', model: 'gpt-5.1' });
+ // { tokens: 4, exact: true, encoding: 'o200k_base' }
+ ```
+
  ### `getAvailableModels(): string[]`
  
  Returns an array of all supported model IDs.
  
+ ### `encode(text: string, options?: EncodeOptions): number[]`
+
+ Encodes text into **OpenAI token IDs** using tiktoken-compatible BPE tokenization.
+
+ ### `decode(tokens: Iterable<number>, options?: { encoding?: OpenAIEncoding; model?: string }): string`
+
+ Decodes OpenAI token IDs back into text using the selected encoding/model.
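+
+ A round-trip sketch pinned to a specific encoding rather than a model ID, assuming `EncodeOptions` accepts the same `encoding` field that `decode()` does:
+
+ ```ts
+ import { encode, decode } from 'ai-token-estimator';
+
+ // Pin the encoding explicitly instead of resolving it from a model ID
+ const ids = encode('déjà vu', { encoding: 'cl100k_base' });
+ console.log(decode(ids, { encoding: 'cl100k_base' })); // "déjà vu"
+ ```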
+
  ### `getModelConfig(model: string): ModelConfig`
  
  Returns the configuration for a specific model. Throws if the model is not found.
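  
  For example (the `charsPerToken` field shown here is assumed from `EstimateOutput`; check `ModelConfig` for the exact shape):
  
  ```ts
  import { getModelConfig } from 'ai-token-estimator';
  
  const config = getModelConfig('gpt-4o');
  console.log(config.charsPerToken); // model-specific chars-per-token ratio
  
  try {
    getModelConfig('definitely-not-a-model'); // hypothetical ID, not in the list
  } catch (err) {
    console.error('Unknown model:', (err as Error).message);
  }
  ```
  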
@@ -108,6 +283,14 @@ This package counts Unicode code points, not UTF-16 code units. This means:
  - Accented characters count correctly
  - Most source code characters count as 1
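+
+ For illustration, the difference in plain JavaScript ('👍' is a single code point encoded as two UTF-16 code units):
+
+ ```ts
+ const text = '👍';
+ console.log(text.length);      // 2 (UTF-16 code units)
+ console.log([...text].length); // 1 (Unicode code points, what this package counts)
+ ```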
  
+ ## Benchmarks (repo only)
+
+ This repository includes a small benchmark script to compare heuristic vs exact OpenAI tokenization:
+
+ ```bash
+ npm run benchmark:tokenizer
+ ```
+
  <!-- SUPPORTED_MODELS_START -->
  ## Supported Models