ai-token-estimator 1.0.3 → 1.2.0

package/README.md CHANGED
@@ -1,12 +1,28 @@
  # ai-token-estimator
  
- Estimate token counts and costs for LLM API calls based on character count and model-specific ratios.
-
- > **Important:** This is a rough estimation tool for budgeting purposes, not a precise tokenizer. Actual token counts may vary by ±20% depending on:
- > - Content type (code vs prose)
- > - Language (CJK languages use more tokens)
- > - API message framing overhead
- > - Special characters and formatting
+ [![npm](https://img.shields.io/npm/v/ai-token-estimator.svg)](https://www.npmjs.com/package/ai-token-estimator)
+ [![CI](https://github.com/BitsAndBytesAI/ai-token-estimator/actions/workflows/ci.yml/badge.svg)](https://github.com/BitsAndBytesAI/ai-token-estimator/actions/workflows/ci.yml)
+ [![license](https://img.shields.io/npm/l/ai-token-estimator.svg)](https://github.com/BitsAndBytesAI/ai-token-estimator/blob/main/LICENSE)
+
+ The best way to estimate **tokens + input cost** for LLM calls — with **exact OpenAI tokenization** (tiktoken-compatible BPE) and optional **official provider token counting** for Claude/Gemini.
+
+ > Accuracy depends on the tokenizer mode you choose:
+ > - **Exact** for OpenAI models when you use `openai_exact` / `encode()` / `decode()`.
+ > - **Exact** for Claude/Gemini when you use `estimateAsync()` with their official count-tokens endpoints.
+ > - **Heuristic** fallback is available for speed and resilience.
+
+ ## Features
+
+ - **Exact OpenAI tokenization** (tiktoken-compatible BPE): `encode()` / `decode()` / `openai_exact`
+ - **Official provider token counting** (async):
+   - Anthropic `POST /v1/messages/count_tokens` (`anthropic_count_tokens`)
+   - Gemini `models/:countTokens` (`gemini_count_tokens`)
+ - **Fast local fallback** options:
+   - Heuristic (`heuristic`, default)
+   - Local Gemma SentencePiece approximation (`gemma_sentencepiece`)
+ - Automatic fallback to heuristic on provider failures (`fallbackToHeuristicOnError`)
+ - **Cost estimation** using a weekly auto-updated pricing/model list (GitHub Actions)
+ - TypeScript-first, ships ESM + CJS
  
  ## Installation
  
@@ -17,7 +33,7 @@ npm install ai-token-estimator
  ## Usage
  
  ```typescript
- import { estimate, getAvailableModels } from 'ai-token-estimator';
+ import { countTokens, estimate, getAvailableModels } from 'ai-token-estimator';
  
  // Basic usage
  const result = estimate({
@@ -37,8 +53,126 @@ console.log(result);
  // List available models
  console.log(getAvailableModels());
  // ['gpt-5.2', 'gpt-4o', 'claude-opus-4.5', 'gemini-3-pro', ...]
+
+ // Exact tokens for OpenAI, heuristic for others
+ console.log(countTokens({ text: 'Hello, world!', model: 'gpt-5.1' }));
+ // { tokens: 4, exact: true, encoding: 'o200k_base' }
+ ```
+
+ ## Exact OpenAI tokenization (BPE)
+
+ This package includes **exact tokenization for OpenAI models** using a tiktoken-compatible BPE tokenizer (via `gpt-tokenizer`).
+
+ Notes:
+ - Encodings are **lazy-loaded on first use** (one-time cost per encoding).
+ - Exact tokenization is **slower** than heuristic estimation; `estimate()` defaults to `'heuristic'` to keep existing behavior fast.
+ - `encode` / `decode` and `estimate({ tokenizer: 'openai_exact' })` require **Node.js** (they use `node:module` under the hood).
+
+ ```ts
+ import { encode, decode } from 'ai-token-estimator';
+
+ const text = 'Hello, world!';
+ const tokens = encode(text, { model: 'gpt-5.1' }); // exact OpenAI token IDs
+ const roundTrip = decode(tokens, { model: 'gpt-5.1' });
+
+ console.log(tokens.length);
+ console.log(roundTrip); // "Hello, world!"
+ ```
+
+ Supported encodings:
+ `r50k_base`, `p50k_base`, `p50k_edit`, `cl100k_base`, `o200k_base`, `o200k_harmony`
+
+ ## Using the exact tokenizer with `estimate()`
+
+ `estimate()` is heuristic by default (fast). If you want exact OpenAI token counting instead:
+
+ ```ts
+ import { estimate } from 'ai-token-estimator';
+
+ const result = estimate({
+   text: 'Hello, world!',
+   model: 'gpt-5.1',
+   tokenizer: 'openai_exact',
+ });
+
+ console.log(result.tokenizerMode); // "openai_exact"
+ console.log(result.encodingUsed); // "o200k_base"
+ ```
+
+ Or use `tokenizer: 'auto'` to get exact counting for OpenAI models and heuristic estimation for everything else.
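+
+ A minimal sketch of `'auto'` mode; the output comments assume `tokenizerMode` reports the strategy that was actually applied rather than `'auto'` itself:
+
+ ```ts
+ import { estimate } from 'ai-token-estimator';
+
+ // OpenAI model: 'auto' should resolve to exact BPE tokenization
+ const openai = estimate({ text: 'Hello, world!', model: 'gpt-5.1', tokenizer: 'auto' });
+ console.log(openai.tokenizerMode, openai.encodingUsed); // "openai_exact" "o200k_base" (assumed)
+
+ // Non-OpenAI model: 'auto' should fall back to the fast heuristic
+ const claude = estimate({ text: 'Hello, world!', model: 'claude-opus-4.5', tokenizer: 'auto' });
+ console.log(claude.tokenizerMode); // "heuristic" (assumed)
+ ```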
+
+ ## Provider token counting (Claude / Gemini)
+
+ If you want **more accurate token counts** for Anthropic or Gemini models, you can call their official token-counting endpoints via `estimateAsync()`. These calls require API keys and should therefore run **server-side** (never in the browser).
+
+ If you want these modes to **fail open** (fall back to heuristic estimation) when the provider API is throttled or unavailable, or the API key is invalid, set `fallbackToHeuristicOnError: true`.
+
+ ### Anthropic: `POST /v1/messages/count_tokens`
+
+ - Env var: `ANTHROPIC_API_KEY`
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'Hello, Claude',
+   model: 'claude-sonnet-4-5',
+   tokenizer: 'anthropic_count_tokens',
+   fallbackToHeuristicOnError: true,
+   anthropic: {
+     // apiKey: '...' // optional; otherwise uses process.env.ANTHROPIC_API_KEY
+     system: 'You are a helpful assistant',
+   },
+ });
+
+ console.log(out.estimatedTokens);
  ```
  
+ ### Gemini: `models/:countTokens` (Google AI Studio)
+
+ - Env var: `GEMINI_API_KEY`
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'The quick brown fox jumps over the lazy dog.',
+   model: 'gemini-2.0-flash',
+   tokenizer: 'gemini_count_tokens',
+   fallbackToHeuristicOnError: true,
+   gemini: {
+     // apiKey: '...' // optional; otherwise uses process.env.GEMINI_API_KEY
+   },
+ });
+
+ console.log(out.estimatedTokens);
+ ```
+
+ ### Local Gemini option: Gemma SentencePiece (approximation)
+
+ If you want a **local** tokenizer option for Gemini-like models, you can use a SentencePiece tokenizer model (e.g. Gemma's `tokenizer.model`) via `sentencepiece-js`.
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'Hello!',
+   model: 'gemini-2.0-flash',
+   tokenizer: 'gemma_sentencepiece',
+   gemma: {
+     modelPath: '/path/to/tokenizer.model',
+   },
+ });
+
+ console.log(out.estimatedTokens);
+ ```
+
+ Note:
+ - This is **not** an official Gemini tokenizer; treat it as an approximation unless you have verified equivalence for your models.
+
  ## API Reference
  
  ### `estimate(input: EstimateInput): EstimateOutput`
@@ -52,9 +186,13 @@ interface EstimateInput {
    text: string; // The text to estimate tokens for
    model: string; // Model ID (e.g., 'gpt-4o', 'claude-opus-4.5')
    rounding?: 'ceil' | 'round' | 'floor'; // Rounding strategy (default: 'ceil')
+   tokenizer?: 'heuristic' | 'openai_exact' | 'auto'; // Token counting strategy (default: 'heuristic')
  }
  ```
  
+ Note:
+ - Provider-backed modes (`anthropic_count_tokens`, `gemini_count_tokens`, `gemma_sentencepiece`) are only supported in `estimateAsync()`.
+
  **Returns:**
  
  ```typescript
@@ -64,13 +202,50 @@ interface EstimateOutput {
    estimatedTokens: number; // Estimated token count (integer)
    estimatedInputCost: number; // Estimated cost in USD
    charsPerToken: number; // The ratio used for this model
+   tokenizerMode?: 'heuristic' | 'openai_exact' | 'auto'; // Which strategy was used
+   encodingUsed?: string; // OpenAI encoding when using exact tokenization
  }
  ```
  
+ ### `estimateAsync(input: EstimateAsyncInput): Promise<EstimateOutput>`
+
+ Async estimator that supports the provider-backed token counting modes:
+ - `anthropic_count_tokens` (Anthropic token count endpoint)
+ - `gemini_count_tokens` (Gemini token count endpoint)
+ - `gemma_sentencepiece` (local SentencePiece; requires `sentencepiece-js` and a model file)
+
+ API keys should be provided via env vars (`ANTHROPIC_API_KEY`, `GEMINI_API_KEY`) or passed explicitly in the config objects.
+
+ If you pass `fallbackToHeuristicOnError: true`, provider-backed modes fall back to heuristic estimation on:
+ - an invalid or expired API key (401/403)
+ - rate limiting (429)
+ - provider errors (5xx) or network issues
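+
+ If you need to detect that a fallback occurred, one option is to inspect the result; a sketch, assuming the output's `tokenizerMode` reflects the mode actually used:
+
+ ```ts
+ import { estimateAsync } from 'ai-token-estimator';
+
+ const out = await estimateAsync({
+   text: 'Hello, Claude',
+   model: 'claude-sonnet-4-5',
+   tokenizer: 'anthropic_count_tokens',
+   fallbackToHeuristicOnError: true,
+ });
+
+ // If the count_tokens call failed (401/403, 429, 5xx, network), the
+ // result should be a heuristic estimate rather than an exact count.
+ if (out.tokenizerMode === 'heuristic') {
+   console.warn('Provider token counting unavailable; using heuristic estimate');
+ }
+ ```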
+
+ ### `countTokens(input: TokenCountInput): TokenCountOutput`
+
+ Counts tokens for a given model:
+ - OpenAI models: **exact** BPE tokenization
+ - Other providers: heuristic estimate
+
+ ```ts
+ import { countTokens } from 'ai-token-estimator';
+
+ const result = countTokens({ text: 'Hello, world!', model: 'gpt-5.1' });
+ // { tokens: 4, exact: true, encoding: 'o200k_base' }
+ ```
+
  ### `getAvailableModels(): string[]`
  
  Returns an array of all supported model IDs.
  
+ ### `encode(text: string, options?: EncodeOptions): number[]`
+
+ Encodes text into **OpenAI token IDs** using tiktoken-compatible BPE tokenization.
+
+ ### `decode(tokens: Iterable<number>, options?: { encoding?: OpenAIEncoding; model?: string }): string`
+
+ Decodes OpenAI token IDs back into text using the selected encoding/model.
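+
+ A round-trip sketch pinned to a specific encoding rather than a model ID, assuming `EncodeOptions` accepts the same `encoding` field that `decode()` does:
+
+ ```ts
+ import { encode, decode } from 'ai-token-estimator';
+
+ // Pin the encoding explicitly instead of resolving it from a model ID
+ const ids = encode('déjà vu', { encoding: 'cl100k_base' });
+ console.log(decode(ids, { encoding: 'cl100k_base' })); // "déjà vu"
+ ```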
+
  ### `getModelConfig(model: string): ModelConfig`
  
  Returns the configuration for a specific model. Throws if the model is not found.
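  
  For example (the `charsPerToken` field shown here is assumed from `EstimateOutput`; check `ModelConfig` for the exact shape):
  
  ```ts
  import { getModelConfig } from 'ai-token-estimator';
  
  const config = getModelConfig('gpt-4o');
  console.log(config.charsPerToken); // model-specific chars-per-token ratio
  
  try {
    getModelConfig('definitely-not-a-model'); // hypothetical ID, not in the list
  } catch (err) {
    console.error('Unknown model:', (err as Error).message);
  }
  ```
  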
@@ -108,6 +283,14 @@ This package counts Unicode code points, not UTF-16 code units. This means:
  - Accented characters count correctly
  - Most source code characters count as 1
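+
+ For illustration, the difference in plain JavaScript ('👍' is a single code point encoded as two UTF-16 code units):
+
+ ```ts
+ const text = '👍';
+ console.log(text.length);      // 2 (UTF-16 code units)
+ console.log([...text].length); // 1 (Unicode code points, what this package counts)
+ ```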
  
+ ## Benchmarks (repo only)
+
+ This repository includes a small benchmark script to compare heuristic vs exact OpenAI tokenization:
+
+ ```bash
+ npm run benchmark:tokenizer
+ ```
+
  <!-- SUPPORTED_MODELS_START -->
  ## Supported Models