ai-speedometer 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +309 -0
- package/ai-benchmark-config.json.template +21 -0
- package/cli.js +1773 -0
- package/dist/ai-speedometer +172 -0
- package/docs/README.md +147 -0
- package/docs/models-dev-integration.md +344 -0
- package/docs/token-counting-fallback.md +345 -0
- package/package.json +69 -0
package/docs/token-counting-fallback.md
ADDED

@@ -0,0 +1,345 @@

# Token Counting Fallback Mechanism

## Overview

The AI SDK benchmark tool includes a sophisticated token counting fallback mechanism that ensures accurate performance metrics even when provider APIs don't return reliable token usage data. This document explains how this mechanism works, what edge cases it handles, and why it's necessary.

## The Problem: Why Token Counting Fails

### Primary Issue: Provider API Limitations

Not all AI providers return consistent token usage data through their APIs. Common issues include:

1. **Missing Usage Object**: Some providers don't include `usage` in their API responses
2. **Incomplete Usage Data**: Providers may return a `usage` object with missing fields
3. **Streaming vs Non-Streaming**: Token counting works differently in streaming responses
4. **Provider-Specific Formats**: Different providers use different field names (`prompt_tokens` vs `input_tokens`)
5. **Network Errors**: API calls may succeed but usage data gets lost in transit
6. **Rate Limiting**: Usage data may be omitted during high load periods
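
The field-name differences in point 4 can be smoothed over by normalizing before reading; a minimal sketch (the `normalizeUsage` helper and its alias list are illustrative, not part of the tool):

```javascript
// Hypothetical helper: map the differently named usage fields some
// providers return into one camelCase shape; null marks missing data.
function normalizeUsage(raw) {
  if (!raw) return null;
  return {
    promptTokens: raw.promptTokens ?? raw.prompt_tokens ?? raw.input_tokens ?? null,
    completionTokens: raw.completionTokens ?? raw.completion_tokens ?? raw.output_tokens ?? null,
  };
}
```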

### Secondary Issue: AI SDK Abstraction

The AI SDK provides a unified interface but relies on underlying provider implementations:

```javascript
// This may fail if provider doesn't support usage reporting
const usage = await result.usage; // Can be null or undefined
```

## The Fallback Solution

### Dual-Layer Token Counting Strategy

The benchmark implements a two-tier approach to token counting:

#### Tier 1: Provider Token Counting (Preferred)

```javascript
// Try to get accurate token counts from provider
let usage = null;
try {
  usage = await result.usage;
} catch (e) {
  // Usage might not be available
}

// Use provider data if available
const completionTokens = usage?.completionTokens || null;
const promptTokens = usage?.promptTokens || null;
```

**When this works:**
- Provider returns comprehensive usage data
- AI SDK successfully extracts usage information
- Network conditions are good
- Provider supports usage reporting

**When this fails:**
- Provider API doesn't include usage data
- Network errors during usage retrieval
- Provider doesn't support token counting
- Rate limiting or API errors

#### Tier 2: Manual Token Estimation (Fallback)

```javascript
// Fall back to manual counting when provider data is unavailable
const completionTokens = usage?.completionTokens || tokenCount;
const promptTokens = usage?.promptTokens || Math.round(testPrompt.length / 4);
const totalTokens = usage?.totalTokens || (completionTokens + promptTokens);
```

### Manual Estimation Algorithm

#### Completion Tokens Estimation

```javascript
// Capture text as it streams in
let fullText = '';
for await (const textPart of result.textStream) {
  fullText += textPart;
  // Update token count in real-time
  tokenCount = Math.round(fullText.length / 4);
}
```

**Algorithm:**
1. **Aggregate All Text**: Combine all streaming text chunks into the complete response
2. **Character-to-Token Ratio**: Use the standard 4 characters per token ratio
3. **Real-time Updates**: Update the token count as each chunk arrives
4. **Final Calculation**: Round to the nearest whole number

#### Prompt Tokens Estimation

```javascript
const promptTokens = Math.round(testPrompt.length / 4);
```

**Algorithm:**
1. **Simple Calculation**: Divide the prompt length by 4
2. **Fixed Estimation**: The prompt doesn't change during execution
3. **Consistent Ratio**: Same 4:1 character-to-token ratio

## Edge Cases Handled

### Edge Case 1: Partial Usage Data

**Scenario**: Provider returns some usage data but not all fields

```javascript
// Provider returns: { prompt_tokens: 20, completion_tokens: null }
const completionTokens = usage?.completionTokens || tokenCount; // Uses fallback
const promptTokens = usage?.promptTokens || Math.round(testPrompt.length / 4); // Uses provider data
```

**Handling Logic:**
- Use available provider data where possible
- Apply the fallback only for missing fields
- Maintain data integrity by not mixing unreliable sources

### Edge Case 2: Streaming Timeouts

**Scenario**: Stream starts but doesn't complete within the timeout

```javascript
let chunkCount = 0;
for await (const textPart of result.textStream) {
  fullText += textPart;
  chunkCount++;
  if (chunkCount > 1000) {
    console.log('Stream timeout, using partial data');
    break;
  }
}
```

**Handling Logic:**
- Count the partial tokens received
- Log the timeout event
- Continue with available data
- Mark the result as potentially incomplete

### Edge Case 3: Empty Responses

**Scenario**: AI returns an empty or very short response

```javascript
if (fullText.length === 0) {
  tokenCount = 0; // No content = no tokens
  tokensPerSecond = 0; // Avoid division by zero
}
```

**Handling Logic:**
- Zero-length responses get zero tokens
- Prevent division by zero in rate calculations
- Log the empty response as a warning

### Edge Case 4: Network Interruptions

**Scenario**: Connection drops during streaming

```javascript
try {
  for await (const textPart of result.textStream) {
    fullText += textPart;
  }
} catch (networkError) {
  console.log('Network interruption, using partial data');
  // Continue with the partial data already collected
}
```

**Handling Logic:**
- Catch network errors during streaming
- Use tokens collected before the interruption
- Mark the result as potentially incomplete
- Log the network error for debugging

### Edge Case 5: Provider Rate Limiting

**Scenario**: Provider returns a rate limiting error

```javascript
if (error.message.includes('rate limit') || error.message.includes('429')) {
  console.log('Rate limited, marking as failed');
  return {
    success: false,
    error: 'Rate limited by provider',
    tokenCount: 0,
    totalTime: 0
  };
}
```

**Handling Logic:**
- Detect rate limiting errors
- Mark the benchmark as failed
- Don't use fallback token counting
- Provide a clear error message

## Accuracy Considerations

### Estimation Accuracy

**Character-to-Token Ratio:**
- **Standard Ratio**: 4 characters per token
- **Actual Range**: 3-5 characters per token depending on content
- **Error Margin**: ±20% typical estimation error

**Factors Affecting Accuracy:**
1. **Content Type**: Code vs prose vs technical content
2. **Language**: English vs other languages
3. **Formatting**: Markdown, code blocks, special characters
4. **Provider**: Different tokenization algorithms
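
To make the error margin concrete, here is the estimation arithmetic with the ratio made explicit (the `estimateTokens` name is illustrative; the tool inlines this calculation):

```javascript
// Character-count estimation with an explicit, adjustable ratio.
function estimateTokens(text, charsPerToken = 4) {
  return Math.round(text.length / charsPerToken);
}

const sample = 'x'.repeat(400); // 400 characters
estimateTokens(sample);        // 100 tokens at the standard 4:1 ratio
estimateTokens(sample, 3);     // 133 tokens if the content tokenizes densely
estimateTokens(sample, 5);     // 80 tokens if it tokenizes sparsely
```

The spread between the 3:1 and 5:1 results is exactly the ±20-ish% margin quoted above.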

### When to Trust Provider Data vs Fallback

**Trust Provider Data When:**
- Provider is OpenAI or Anthropic (reliable token counting)
- Network connection is stable
- Usage object is complete and well-formed
- Response is substantial (>100 tokens)

**Use Fallback When:**
- Provider is unknown or custom
- Usage object is missing or incomplete
- Network errors occur
- Response is very short (<50 tokens)
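
These heuristics can be folded into a single resolution step. A sketch under the assumption that a present, positive provider count always wins (`resolveCompletionTokens` is a hypothetical name, not the tool's own function):

```javascript
// Prefer the provider's count when it is a usable number; otherwise
// fall back to the character-based estimate, and record which path won.
function resolveCompletionTokens(usage, estimatedTokens) {
  const provided = usage?.completionTokens;
  if (typeof provided === 'number' && provided > 0) {
    return { tokens: provided, source: 'provider' };
  }
  return { tokens: estimatedTokens, source: 'estimate' };
}
```

Recording the `source` alongside the count also gives the logging layer something to report.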

## Performance Implications

### Overhead of Fallback Mechanism

**Additional Operations:**
1. **Text Aggregation**: Building the complete response string
2. **Real-time Calculation**: Updating the token count during streaming
3. **Error Handling**: Try/catch blocks and conditional logic
4. **Memory Usage**: Storing the complete response in memory

**Performance Impact:**
- **Memory**: ~10-50KB additional memory per response
- **CPU**: Minimal overhead from character counting
- **Network**: No additional network calls
- **Overall**: <5% performance impact

### Optimization Trade-offs

**Accuracy vs Performance:**
- **High Accuracy**: Always use provider data when available
- **High Performance**: Skip the fallback and use estimates only
- **Balanced Approach**: Current implementation (preferred)

**Memory vs Speed:**
- **Current Implementation**: Stores the full response for accuracy
- **Alternative**: Stream to disk for very large responses
- **Trade-off**: Memory usage vs disk I/O overhead

## Error Handling and Logging

### Error Categories

1. **Provider Errors**: API errors, authentication failures
2. **Network Errors**: Connection timeouts, interruptions
3. **Usage Errors**: Missing or incomplete usage data
4. **Calculation Errors**: Division by zero, invalid data

### Logging Strategy

```javascript
// Log fallback usage for debugging
if (usage?.completionTokens === undefined) {
  console.log('Using fallback token counting for', model.name);
  console.log('Estimated completion tokens:', tokenCount);
}

// Log estimation accuracy
if (usage && tokenCount > 0) {
  const accuracy = (usage.completionTokens / tokenCount) * 100;
  console.log('Token estimation accuracy:', accuracy.toFixed(1) + '%');
}
```

## Configuration and Tuning

### Adjustable Parameters

```javascript
// Character-to-token ratio (configurable)
const CHAR_TO_TOKEN_RATIO = 4;

// Maximum chunks before timeout (configurable)
const MAX_CHUNKS = 1000;

// Minimum response length for reliable counting (configurable)
const MIN_RESPONSE_LENGTH = 10;
```

### Environment Variables

```bash
# Override the default character-to-token ratio
TOKEN_COUNTING_RATIO=3.5

# Enable debug logging for token counting
DEBUG_TOKEN_COUNTING=true

# Maximum response size to store in memory (bytes)
MAX_RESPONSE_SIZE=1048576  # 1MB
```
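
Assuming these variables are read once at startup, the override logic might look like this (variable names are taken from the block above; the `loadTokenCountingConfig` helper itself is illustrative):

```javascript
// Read the tuning overrides from the environment, falling back to the
// documented defaults when a variable is unset or not a valid number.
function loadTokenCountingConfig(env = process.env) {
  const ratio = Number.parseFloat(env.TOKEN_COUNTING_RATIO);
  return {
    charToTokenRatio: Number.isFinite(ratio) && ratio > 0 ? ratio : 4,
    debug: env.DEBUG_TOKEN_COUNTING === 'true',
    maxResponseSize: Number.parseInt(env.MAX_RESPONSE_SIZE, 10) || 1048576,
  };
}
```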

## Future Enhancements

### Potential Improvements

1. **Provider-Specific Ratios**: Different ratios for different providers
2. **Machine Learning**: Train models for better estimation
3. **Historical Data**: Use past estimation accuracy to improve future estimates
4. **Provider Feedback**: Loop back to compare estimates with actual provider data
5. **Adaptive Algorithms**: Adjust the ratio based on content type and language
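
Improvement 1 could be as simple as a lookup table consulted before estimating. The provider names and ratio values below are placeholders for illustration, not measured figures:

```javascript
// Per-provider character-to-token ratios with a generic default.
// Entries here are placeholder values, not measurements.
const PROVIDER_RATIOS = { openai: 4, anthropic: 3.8, default: 4 };

function ratioFor(provider) {
  return PROVIDER_RATIOS[provider] ?? PROVIDER_RATIOS.default;
}
```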

### Integration with AI SDK

**Long-term Goal:**
- Contribute improvements back to the AI SDK
- Standardize token counting across providers
- Implement provider-level fallback mechanisms
- Add provider-specific token counting plugins

## Summary

The token counting fallback mechanism ensures that the benchmark tool provides reliable performance metrics even when provider APIs fail to deliver accurate token usage data. By combining provider token counting with manual estimation, the tool maintains accuracy while gracefully handling edge cases and errors.

**Key Benefits:**
- Always returns token counts, never fails
- Handles edge cases gracefully
- Provides debug information for troubleshooting
- Maintains a consistent interface across providers
- Minimal performance overhead

**Best Practices:**
- Always prefer provider token counting when available
- Log fallback usage for debugging
- Monitor estimation accuracy over time
- Configure ratios based on your specific use cases
- Update provider-specific configurations as needed

package/package.json
ADDED

@@ -0,0 +1,69 @@

{
  "name": "ai-speedometer",
  "version": "1.0.0",
  "description": "A comprehensive CLI tool for benchmarking AI models across multiple providers with parallel execution and professional metrics",
  "main": "cli.js",
  "bin": {
    "ai-speedometer": "dist/ai-speedometer",
    "aispeed": "dist/ai-speedometer"
  },
  "preferGlobal": true,
  "type": "module",
  "engines": {
    "node": ">=18.0.0"
  },
  "scripts": {
    "start": "node cli.js",
    "cli": "node cli.js",
    "cli:debug": "node cli.js --debug",
    "benchmark": "node benchmark-rest.js",
    "benchmark-all": "node benchmark-rest.js --all",
    "benchmark-track": "node benchmark-rest.js --all --track",
    "build": "esbuild cli.js --bundle --platform=node --outfile=dist/ai-speedometer --format=esm --minify --external:jsonc-parser --external:dotenv && sed -i '1s|^.*|#!/usr/bin/env node|' dist/ai-speedometer",
    "build:dev": "esbuild cli.js --bundle --platform=node --outfile=dist/ai-speedometer --format=esm --external:jsonc-parser --external:dotenv && sed -i '1s|^.*|#!/usr/bin/env node|' dist/ai-speedometer",
    "prepublishOnly": "npm run build",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "ai",
    "benchmark",
    "cli",
    "speedometer",
    "performance",
    "llm",
    "models",
    "testing",
    "openai",
    "anthropic",
    "ai-sdk",
    "parallel",
    "metrics"
  ],
  "author": "ai-speedometer",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/aptdnfapt/Ai-speedometer.git"
  },
  "bugs": {
    "url": "https://github.com/aptdnfapt/Ai-speedometer/issues"
  },
  "homepage": "https://github.com/aptdnfapt/Ai-speedometer#readme",
  "files": [
    "dist/ai-speedometer",
    "docs/",
    "README.md",
    "ai-benchmark-config.json.template"
  ],
  "dependencies": {
    "@ai-sdk/anthropic": "^2.0.17",
    "@ai-sdk/openai-compatible": "^1.0.17",
    "ai": "^5.0.44",
    "cli-table3": "^0.6.5",
    "dotenv": "^17.2.2",
    "jsonc-parser": "^3.3.1"
  },
  "devDependencies": {
    "esbuild": "^0.25.10"
  }
}