llm-cost-attribution 0.1.1 → 0.2.0
- package/README.md +56 -133
- package/bin/llm-cost.mjs +151 -0
- package/package.json +3 -2
- package/src/calibrate.mjs +310 -0
- package/src/forecast.mjs +425 -0
- package/src/index.mjs +83 -0
- package/src/project-forecast.mjs +329 -0
- package/src/quantiles.mjs +39 -0
- package/src/synthetic.mjs +153 -0
package/README.md
CHANGED
|
@@ -1,102 +1,48 @@
|
|
|
1
1
|
# llm-cost-attribution
|
|
2
2
|
|
|
3
|
-
Per-issue
|
|
3
|
+
Per-issue cost analytics for [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and [Codex CLI](https://github.com/openai/codex) sessions — how many **tokens** an issue burned, how many **turns** it took (one agent request → response is a turn), and how much of your Codex/Claude plan's rate-limit **quota** it ate. It reads the CLIs' own session logs (JSONL = one JSON record per line) — **no telemetry pipeline, no database, no API keys**.
|
|
4
4
|
|
|
5
5
|
```bash
|
|
6
6
|
npx llm-cost-attribution EPAC-1940
|
|
7
7
|
```
|
|
8
8
|
|
|
9
9
|
```
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
Models: gpt-5-codex
|
|
21
|
-
Turns: 340
|
|
22
|
-
Tokens:
|
|
23
|
-
input uncached 1,517,206
|
|
24
|
-
cache read 51,024,768
|
|
25
|
-
output (visible) 44,683
|
|
26
|
-
output (reasoning) 18,649
|
|
27
|
-
grand total 52,605,306
|
|
28
|
-
Quota (plan_type=pro, 345 samples):
|
|
29
|
-
5h window 58% → 64% used (peak 64%)
|
|
30
|
-
7d window 56% → 57% used (peak 57%)
|
|
10
|
+
LLM COST — EPAC-1940
|
|
11
|
+
Sessions: 5 Turns: 414 Tokens: 61,357,012
|
|
12
|
+
|
|
13
|
+
CODEX (4 sessions) Models: gpt-5-codex Turns: 340
|
|
14
|
+
input uncached 1,517,206
|
|
15
|
+
cache read 51,024,768
|
|
16
|
+
output (visible) 44,683
|
|
17
|
+
output (reasoning) 18,649
|
|
18
|
+
grand total 52,605,306
|
|
19
|
+
Quota (pro, 345 samples): 5h 58%→64% (peak 64%) 7d 56%→57% (peak 57%)
|
|
31
20
|
```
|
|
32
21
|
|
|
33
|
-
|
|
22
|
+
Reading that block: **cache read** is tokens the provider served from its prompt cache (cheap, and usually most of the total); **output (reasoning)** is the model's hidden thinking tokens, billed separately from the **visible** answer; **Quota** is how much of your Codex plan's two rolling rate-limit windows — a 5-hour and a 7-day one — these sessions used.
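As a quick sanity check on the sample report, the four Codex buckets add up to the printed grand total, and each quota figure can be read (naively, ignoring window resets) as end minus start. The sketch below copies its numbers from the block above; the object shape and function names are illustrative assumptions, not the package's API.

```javascript
// Token buckets copied from the sample CODEX block; field names are hypothetical.
const buckets = {
  inputUncached: 1_517_206,
  cacheRead: 51_024_768,
  outputVisible: 44_683,
  outputReasoning: 18_649,
};

// Grand total is the plain sum of the four buckets.
function grandTotal(b) {
  return b.inputUncached + b.cacheRead + b.outputVisible + b.outputReasoning;
}

// Fraction (0..1) of the total that was served from the prompt cache.
function cacheShare(b) {
  return b.cacheRead / grandTotal(b);
}

// Quota consumed across the sessions, in percentage points of one window.
function quotaDelta(startPct, endPct) {
  return endPct - startPct;
}

console.log(grandTotal(buckets)); // 52605306, the printed "grand total"
console.log(quotaDelta(58, 64), quotaDelta(56, 57)); // 6 1
```

In this sample the cache-read bucket is by far the largest, which is why the report calls it out separately from uncached input.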
|
|
34
23
|
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
- **§4.1.4 Workspace** — "Filesystem workspace assigned to one issue identifier."
|
|
38
|
-
- **Workspace path formula** — `<workspace.root>/<sanitized_issue_identifier>`.
|
|
39
|
-
- **Invariant 1** — "Run the coding agent only in the per-issue workspace path... validate: `cwd == workspace_path`."
|
|
40
|
-
|
|
41
|
-
Because of those requirements, the working directory of every Claude Code or Codex CLI session that Symphony (or any Symphony-spec-conformant orchestrator) launches always carries the issue identifier as its last path component. The CLI agents in turn record that `cwd` in every session JSONL they create. So the issue identifier is already in the transcript — no custom telemetry pipeline needed to join.
|
|
42
|
-
|
|
43
|
-
This package's default `--cwd-pattern` matches the two most common `workspace.root` configurations:
|
|
44
|
-
|
|
45
|
-
1. The Symphony spec default: `<system-temp>/symphony_workspaces/<ISSUE-ID>` (e.g. `/tmp/symphony_workspaces/EPAC-1940`).
|
|
46
|
-
2. A common in-repo override: `<repo>/.symphony/workspaces/<ISSUE-ID>` (used by Autopilot and the Riddim factory's Symphony config).
|
|
47
|
-
|
|
48
|
-
For any other `workspace.root` setting, pass `--cwd-pattern '<regex>'` with one capture group for the issue identifier — see "[The convention](#the-convention)" below.
|
|
24
|
+
Requires Node 20+. Zero runtime dependencies.
|
|
49
25
|
|
|
50
26
|
## How it works
|
|
51
27
|
|
|
52
|
-
Both CLIs persist every
|
|
53
|
-
|
|
54
|
-
- **Claude Code** writes `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` for every interactive and non-interactive run (encoded-cwd is the absolute working directory with `/` and `.` replaced by `-`).
|
|
55
|
-
- **Codex CLI** writes `~/.codex/sessions/YYYY/MM/DD/rollout-<timestamp>-<id>.jsonl` for every run, with the working directory recorded in the first `session_meta` event.
|
|
56
|
-
|
|
57
|
-
Each file carries provider-reported token usage per turn — the same numbers your Anthropic / OpenAI account is billed against:
|
|
58
|
-
|
|
59
|
-
| Provider | Tokens captured |
|
|
60
|
-
|---|---|
|
|
61
|
-
| Claude | `input_tokens`, `cache_read_input_tokens`, `cache_creation.{ephemeral_5m,1h}_input_tokens`, `output_tokens` |
|
|
62
|
-
| Codex | `input_tokens`, `cached_input_tokens`, `output_tokens`, `reasoning_output_tokens` (deltaed from cumulative) |
|
|
63
|
-
| Codex (additionally) | `rate_limits.{primary,secondary}.used_percent` per turn |
|
|
28
|
+
Both CLIs persist every run as JSONL — Claude Code in `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` (`<encoded-cwd>` is just the run's working directory with `/` and `.` rewritten to `-`), Codex in `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl` — and each file records, per turn, the provider-reported token counts (the same numbers your account is billed against) plus, for Codex, its rate-limit usage. This package walks both directories, keeps the sessions whose **working directory** matches the issue ID you ask for, and adds them up.
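The `<encoded-cwd>` rule described above is simple enough to sketch. This is an illustration of the stated rule only (`/` and `.` become `-`), under the assumption that no other characters are rewritten; `encodeCwd` is a hypothetical name, not a function exported by either CLI or by this package.

```javascript
// Sketch of the encoding described above: "/" and "." both become "-".
// Assumption: no other characters are rewritten.
function encodeCwd(absPath) {
  return absPath.replace(/[\/.]/g, '-');
}

console.log(encodeCwd('/tmp/symphony_workspaces/EPAC-1940'));
// -tmp-symphony_workspaces-EPAC-1940
console.log(encodeCwd('/home/me/repo/.symphony/workspaces/EPAC-1940'));
// -home-me-repo--symphony-workspaces-EPAC-1940
```

Note that such an encoding is not reversible: a literal `-` in a directory name is indistinguishable from a rewritten separator, so the issue ID is easier to recover from a recorded working directory than from the encoded folder name.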
|
|
64
29
|
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
## The convention
|
|
68
|
-
|
|
69
|
-
You map sessions to issues via the **working directory at session start**. By default this package matches the Symphony-spec convention:
|
|
70
|
-
|
|
71
|
-
```
|
|
72
|
-
<repo>/.symphony/workspaces/<ISSUE-ID>
|
|
73
|
-
```
|
|
74
|
-
|
|
75
|
-
A regex extracts `<ISSUE-ID>`. If your workflow uses a different layout, pass `--cwd-pattern '<regex>'` with one capture group:
|
|
30
|
+
How does a session get matched to an issue? By its **working directory** (`cwd`). Under [Symphony](https://github.com/openai/symphony/blob/main/SPEC.md)'s spec — Symphony being an orchestrator that runs coding agents one issue at a time — each agent runs in a directory dedicated to its issue (`<workspace.root>/<ISSUE-ID>`), so the issue ID is already baked into every transcript's path; no custom pipeline needed. The default `--cwd-pattern` (the regex that pulls the issue ID out of that path) matches both the spec default (`<tmp>/symphony_workspaces/<ID>`) and the common in-repo layout (`<repo>/.symphony/workspaces/<ID>`). For any other layout, pass your own regex with one capture group around the ID:
|
|
76
31
|
|
|
77
32
|
```bash
|
|
78
|
-
|
|
79
|
-
llm-cost
|
|
80
|
-
|
|
81
|
-
# Your workflow uses ~/issues/<id>/
|
|
82
|
-
llm-cost 1234 --cwd-pattern '/issues/(\d+)$'
|
|
33
|
+
llm-cost FOO-12 --cwd-pattern '-([A-Z]+-\d+)$' # ../repo-worktrees/<ID>
|
|
34
|
+
llm-cost 1234 --cwd-pattern '/issues/(\d+)$' # ~/issues/<id>/
|
|
83
35
|
```
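To see what "one capture group around the ID" means in practice, here is a minimal sketch of applying such a pattern to a working directory. `issueIdFromCwd` is a hypothetical helper written for illustration; how the package applies the pattern internally is not shown in this diff.

```javascript
// Hypothetical helper: apply a --cwd-pattern style regex to a cwd and
// return the first capture group (the issue ID), or null if nothing matches.
function issueIdFromCwd(cwd, pattern) {
  const match = new RegExp(pattern).exec(cwd);
  return match && match[1] !== undefined ? match[1] : null;
}

const pattern = '/issues/(\\d+)$';
console.log(issueIdFromCwd('/home/me/issues/1234', pattern)); // 1234
console.log(issueIdFromCwd('/home/me/issues/1234/', pattern)); // null: the trailing slash defeats the $ anchor
console.log(issueIdFromCwd('/home/me/notes', pattern)); // null
```

The `$` anchor ties the ID to the last path component, so a session started in a subdirectory of the issue folder would not match this particular pattern.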
|
|
84
36
|
|
|
85
|
-
If your workflow doesn't give each issue its own
|
|
37
|
+
If your workflow doesn't give each issue its own directory, this package can't disambiguate sessions; see "What it doesn't (and can't) do."
|
|
86
38
|
|
|
87
39
|
## Install
|
|
88
40
|
|
|
89
41
|
```bash
|
|
90
|
-
#
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
# Install globally
|
|
94
|
-
npm install -g llm-cost-attribution
|
|
95
|
-
llm-cost EPAC-1940
|
|
42
|
+
npx llm-cost-attribution EPAC-1940 # one-shot
|
|
43
|
+
npm install -g llm-cost-attribution # then: llm-cost EPAC-1940
|
|
96
44
|
```
|
|
97
45
|
|
|
98
|
-
Requires Node 20+. Zero runtime dependencies.
|
|
99
|
-
|
|
100
46
|
## CLI
|
|
101
47
|
|
|
102
48
|
```
|
|
@@ -104,47 +50,49 @@ llm-cost <ISSUE-ID> [options]
|
|
|
104
50
|
llm-cost <ISSUE-ID> --from-usage <usage.jsonl-or-dir>
|
|
105
51
|
llm-cost list
|
|
106
52
|
llm-cost backfill --out <usage.jsonl-path>
|
|
53
|
+
llm-cost calibrate <usage.jsonl-or-dir> [--seed N] [--holdout F]
|
|
107
54
|
llm-cost --help
|
|
108
55
|
|
|
109
56
|
Options:
|
|
110
|
-
--cwd-pattern <regex>
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
--
|
|
114
|
-
--
|
|
115
|
-
--
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
--
|
|
119
|
-
--json
|
|
120
|
-
-
|
|
57
|
+
--cwd-pattern <regex> JS regex matching the cwd; one capture group = issue ID.
|
|
58
|
+
--claude-dir <path> Override ~/.claude/projects.
|
|
59
|
+
--codex-dir <path> Override ~/.codex/sessions.
|
|
60
|
+
--from-usage <path> Read a baked usage.jsonl file/dir instead of transcripts.
|
|
61
|
+
--out <path> (backfill) Destination usage.jsonl. Appended.
|
|
62
|
+
--seed <int> (calibrate) Held-out split seed. Default 1.
|
|
63
|
+
--holdout <0..1> (calibrate) Fraction held out per cell. Default 0.2.
|
|
64
|
+
--quantile <0..1> (calibrate) Band to test. Default 0.8.
|
|
65
|
+
--threshold <0..1> (calibrate) Flag coverage drift beyond this. Default 0.1.
|
|
66
|
+
--json Emit JSON instead of a table.
|
|
67
|
+
--no-pricing Suppress the dollar block.
|
|
121
68
|
```
|
|
122
69
|
|
|
123
|
-
## Delete transcripts, keep cost history
|
|
70
|
+
## Delete transcripts, keep cost history
|
|
124
71
|
|
|
125
|
-
Transcripts are large
|
|
72
|
+
Transcripts are large (MBs per session, GBs across a factory) and mostly conversation content the cost tool doesn't need. `backfill` bakes every transcript into a small append-only JSONL (~1 KB/turn, no prompt/response content); queries then read that file, and the transcripts are safe to delete:
|
|
126
73
|
|
|
127
74
|
```bash
|
|
128
|
-
# Bake every transcript on this machine into one file.
|
|
129
75
|
llm-cost backfill --out ~/llm-cost-history.jsonl
|
|
130
|
-
|
|
131
|
-
# Cost queries now run against the much smaller file:
|
|
132
76
|
llm-cost EPAC-1940 --from-usage ~/llm-cost-history.jsonl
|
|
133
|
-
|
|
134
|
-
# Once you've verified the numbers match, transcripts are safe to delete:
|
|
135
|
-
rm -rf ~/.claude/projects ~/.codex/sessions
|
|
77
|
+
rm -rf ~/.claude/projects ~/.codex/sessions # once numbers verified
|
|
136
78
|
```
|
|
137
79
|
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
| | Before backfill | After backfill |
|
|
80
|
+
| | Before | After |
|
|
141
81
|
|---|---:|---:|
|
|
142
|
-
| Disk
|
|
143
|
-
|
|
|
82
|
+
| Disk | 5.0 GB | 125 MB (40× smaller) |
|
|
83
|
+
| Query time | ~3 min | ~0.3 s |
|
|
144
84
|
|
|
145
|
-
The
|
|
85
|
+
The bake is lossless for everything the analysis uses (quota windows, Claude cache tiers, Codex reasoning/visible split, totals, models, timestamps, workspace provenance). The format follows the [Symphony Cost Telemetry Extension spec](https://github.com/RiddimSoftware/groove/blob/main/specs/symphony-cost-telemetry-extension/SPEC.md), so a conformant orchestrator can emit `usage.jsonl` directly and skip the bake — optional interop, not required.
|
|
146
86
|
|
|
147
|
-
|
|
87
|
+
## Is the forecast trustworthy? (`calibrate`)
|
|
88
|
+
|
|
89
|
+
A **P80** is the 80th-percentile cost — the number 80% of comparable issues come in at or below. Claiming "P80 = 12K tokens" is only honest if, on issues the forecaster never saw, the real cost actually lands under 12K about 80% of the time; otherwise it's a horoscope. `calibrate` checks exactly that against a local `usage.jsonl` whose records are **estimate-tagged** (each one carries the issue's size estimate). It sorts the records into **cells** — groups of past issues sharing the same `{ size, model }` — holds out a reproducible slice of each cell (`--seed` makes the split repeatable), forecasts from what's left, and measures how often the held-out actuals really fell at or below the predicted P80. Any cell whose hit-rate drifts from 80% by more than `--threshold` is flagged ⚠. On a small dataset the coverage figures are themselves noisy — a cell with only a few held-out issues can read 0% or 100% by luck — so treat per-cell flags as directional until cells are well-populated.
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
llm-cost calibrate ~/backfill.out --seed 1 --holdout 0.2
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
Read-only and local — the input is never written back or committed (point it at a gitignored file). Committed tests use only synthetic fixtures (`test/forecast-recovers-known-dist.test.mjs`).
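The backtest idea described in this section can be sketched for a single cell as follows. This is an illustration under stated assumptions, not the package's implementation: the PRNG (mulberry32), the nearest-rank quantile, and every function name here are choices made for the example, and the real `calibrate` may split and estimate differently.

```javascript
// Small deterministic PRNG (mulberry32) so the split is repeatable per seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG; returns a new array.
function seededShuffle(items, seed) {
  const out = items.slice();
  const rand = mulberry32(seed);
  for (let i = out.length - 1; i > 0; i -= 1) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Nearest-rank quantile: the smallest value with at least q of the data at or below it.
function quantile(values, q) {
  const sorted = values.slice().sort((x, y) => x - y);
  const rank = Math.max(1, Math.ceil(q * sorted.length - 1e-9));
  return sorted[rank - 1];
}

// Hold out a fraction of one cell's costs, predict the quantile from the rest,
// and measure how many held-out costs landed at or below the prediction.
function backtestCell(costs, { seed = 1, holdoutFraction = 0.2, q = 0.8, threshold = 0.1 } = {}) {
  if (costs.length < 2) throw new Error('need at least two issues to split');
  const shuffled = seededShuffle(costs, seed);
  const holdoutN = Math.min(
    shuffled.length - 1,
    Math.max(1, Math.round(shuffled.length * holdoutFraction)),
  );
  const holdout = shuffled.slice(0, holdoutN);
  const train = shuffled.slice(holdoutN);
  const predicted = quantile(train, q);
  const coverage = holdout.filter((c) => c <= predicted).length / holdoutN;
  return {
    trainN: train.length,
    holdoutN,
    predicted,
    coverage,
    flagged: Math.abs(coverage - q) > threshold,
  };
}

// Hypothetical cell: 100 issues costing 1K..100K tokens.
const costs = Array.from({ length: 100 }, (_, i) => (i + 1) * 1000);
console.log(backtestCell(costs, { seed: 1 }));
```

With only 20 held-out issues, each one moves coverage by five points, which illustrates why small cells produce noisy flags.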
|
|
148
96
|
|
|
149
97
|
## Library
|
|
150
98
|
|
|
@@ -156,48 +104,23 @@ import {
|
|
|
156
104
|
listKnownIssues,
|
|
157
105
|
} from 'llm-cost-attribution';
|
|
158
106
|
|
|
159
|
-
|
|
160
|
-
const rollup = await computeIssueCost('EPAC-1940');
|
|
161
|
-
console.log(rollup.combinedTokens);
|
|
162
|
-
console.log(rollup.providerTotals.codex.quotaSamples);
|
|
163
|
-
|
|
164
|
-
// Or read from a backfilled usage.jsonl:
|
|
107
|
+
const rollup = await computeIssueCost('EPAC-1940');
|
|
165
108
|
const rollup2 = await computeIssueCostFromUsage('EPAC-1940', '~/llm-cost-history.jsonl');
|
|
166
|
-
|
|
167
|
-
// Backfill programmatically:
|
|
168
|
-
const result = await backfillUsageFromTranscripts({
|
|
169
|
-
outFile: '/tmp/usage.jsonl',
|
|
170
|
-
onProgress: ({ phase, processed, total }) => console.log(`${phase}: ${processed}/${total}`),
|
|
171
|
-
});
|
|
172
|
-
console.log(`Wrote ${result.recordsWritten} records`);
|
|
109
|
+
const result = await backfillUsageFromTranscripts({ outFile: '/tmp/usage.jsonl' });
|
|
173
110
|
```
|
|
174
111
|
|
|
175
|
-
Pass `{ cwdPattern, claudeProjectsDir, codexSessionsDir }` to override defaults
|
|
112
|
+
Pass `{ cwdPattern, claudeProjectsDir, codexSessionsDir }` to override defaults.
|
|
176
113
|
|
|
177
114
|
## What it doesn't (and can't) do
|
|
178
115
|
|
|
179
|
-
- **Story-point
|
|
180
|
-
- **Attempt counts
|
|
181
|
-
- **PR
|
|
182
|
-
- **
|
|
116
|
+
- **Story-point estimates** — live in your tracker, not the transcripts (see the sibling `llm-cost-estimation`).
|
|
117
|
+
- **Attempt counts** — the CLI doesn't record "attempt #N"; 5 runs look like 5 sessions with no winner marked.
|
|
118
|
+
- **PR / CI / reviewer state** — comes from GitHub, not the CLIs; out of scope (matches Symphony §2.2/§11.5).
|
|
119
|
+
- **Claude Desktop, claude.ai, ChatGPT, raw API SDK** — only Claude Code CLI and Codex CLI sessions are read.
|
|
183
120
|
|
|
184
121
|
## Pricing
|
|
185
122
|
|
|
186
|
-
`llm-cost` shows API-equivalent dollar cost per bucket
|
|
187
|
-
|
|
188
|
-
```
|
|
189
|
-
API-equivalent pricing (gpt-5.5 @ rates verified 2026-05-22):
|
|
190
|
-
input uncached $7.59 (1.5M × $5.00/1M)
|
|
191
|
-
cache read $25.51 (51.0M × $0.500/1M)
|
|
192
|
-
output (visible) $1.34 (44.7K × $30.00/1M)
|
|
193
|
-
output (reasoning) $0.56 (18.6K × $30.00/1M)
|
|
194
|
-
───────────────────────────────────────────
|
|
195
|
-
total API cost $35.00 [hypothetical — your Codex Pro plan covers this]
|
|
196
|
-
```
|
|
197
|
-
|
|
198
|
-
**This is a counterfactual, not your actual spend.** If you're on a subscription plan (Claude Max, Codex Pro, etc.), the dollar number represents what the same token volume would have cost on pay-as-you-go API — useful for comparison, but the marginal cost of running it on your actual plan is captured by the Codex quota readout above (`5h primary 58% → 64% used`), not by the dollar total.
|
|
199
|
-
|
|
200
|
-
The CLI warns when the bundled rate table is more than 90 days old. Pass `--no-pricing` to suppress the block entirely.
|
|
123
|
+
`llm-cost` shows API-equivalent dollar cost per bucket from a built-in rate table ([Anthropic](https://www.anthropic.com/pricing), [OpenAI](https://platform.openai.com/docs/pricing)). **This is a counterfactual, not your actual spend:** on a subscription plan (Claude Max, Codex Pro) it's what the same tokens would cost pay-as-you-go — your real marginal cost is the quota readout, not the dollar total. The CLI warns when the table is >90 days old; `--no-pricing` suppresses the block.
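The arithmetic behind the dollar block is tokens divided by one million, times a per-bucket rate. The sketch below uses the rates printed in the 0.1.1 README sample ($5.00, $0.50 and $30.00 per 1M tokens) purely as illustrative inputs; they may not match the bundled rate table, and the names are hypothetical.

```javascript
// Rates in USD per 1M tokens, copied from the 0.1.1 README sample output.
// Illustrative only; they may not match the bundled rate table.
const ratesPerMillion = {
  inputUncached: 5.0,
  cacheRead: 0.5,
  outputVisible: 30.0,
  outputReasoning: 30.0,
};

// Token counts from the sample CODEX block.
const tokens = {
  inputUncached: 1_517_206,
  cacheRead: 51_024_768,
  outputVisible: 44_683,
  outputReasoning: 18_649,
};

// API-equivalent cost per bucket and in total (a counterfactual, not actual spend).
function apiEquivalentCost(tokenCounts, rates) {
  const perBucket = {};
  let total = 0;
  for (const [bucket, count] of Object.entries(tokenCounts)) {
    const usd = (count / 1_000_000) * rates[bucket];
    perBucket[bucket] = usd;
    total += usd;
  }
  return { perBucket, total };
}

const { perBucket, total } = apiEquivalentCost(tokens, ratesPerMillion);
console.log(perBucket.cacheRead.toFixed(2)); // 25.51
console.log(total.toFixed(2)); // 35.00
```

Even at a tenth of the uncached input rate, cache reads dominate this sample's counterfactual total because of their sheer volume.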
|
|
201
124
|
|
|
202
125
|
## License
|
|
203
126
|
|
package/bin/llm-cost.mjs
CHANGED
|
@@ -21,10 +21,13 @@ import { resolve } from 'node:path';
|
|
|
21
21
|
import { parseArgs } from 'node:util';
|
|
22
22
|
import {
|
|
23
23
|
backfillUsageFromTranscripts,
|
|
24
|
+
calibrateCoverage,
|
|
24
25
|
computeIssueCost,
|
|
25
26
|
computeIssueCostFromUsage,
|
|
26
27
|
computeWorktreeCost,
|
|
27
28
|
listKnownIssues,
|
|
29
|
+
readUsageRecords,
|
|
30
|
+
validateUsageRecord,
|
|
28
31
|
} from '../src/index.mjs';
|
|
29
32
|
import { DEFAULT_CWD_PATTERN } from '../src/issue-pattern.mjs';
|
|
30
33
|
import { computeMultiIssueRollup, expandAllIssueArgs } from '../src/multi-issue.mjs';
|
|
@@ -47,6 +50,10 @@ async function main() {
|
|
|
47
50
|
'no-pricing': { type: 'boolean' },
|
|
48
51
|
worktree: { type: 'string' },
|
|
49
52
|
out: { type: 'string' },
|
|
53
|
+
seed: { type: 'string' },
|
|
54
|
+
holdout: { type: 'string' },
|
|
55
|
+
quantile: { type: 'string' },
|
|
56
|
+
threshold: { type: 'string' },
|
|
50
57
|
json: { type: 'boolean' },
|
|
51
58
|
help: { type: 'boolean', short: 'h' },
|
|
52
59
|
},
|
|
@@ -105,6 +112,44 @@ async function main() {
|
|
|
105
112
|
return;
|
|
106
113
|
}
|
|
107
114
|
|
|
115
|
+
// `llm-cost calibrate <path>` backtests the forecaster's P80 band against a
|
|
116
|
+
// local estimate-tagged usage.jsonl and prints an empirical coverage report.
|
|
117
|
+
// The input is read locally only — never written back, never committed.
|
|
118
|
+
if (command === 'calibrate') {
|
|
119
|
+
const inputPath = positionals[1];
|
|
120
|
+
if (inputPath === undefined || inputPath === '') {
|
|
121
|
+
console.error('error: calibrate requires a path to a usage.jsonl file or directory');
|
|
122
|
+
process.exit(1);
|
|
123
|
+
}
|
|
124
|
+
const calOptions = {};
|
|
125
|
+
if (values.seed !== undefined) calOptions.seed = parseIntOption(values.seed, 'seed');
|
|
126
|
+
if (values.holdout !== undefined) calOptions.holdoutFraction = parseFloatOption(values.holdout, 'holdout');
|
|
127
|
+
if (values.quantile !== undefined) calOptions.quantile = parseFloatOption(values.quantile, 'quantile');
|
|
128
|
+
if (values.threshold !== undefined) calOptions.deviationThreshold = parseFloatOption(values.threshold, 'threshold');
|
|
129
|
+
|
|
130
|
+
const records = [];
|
|
131
|
+
let invalidLines = 0;
|
|
132
|
+
for await (const rec of readUsageRecords(inputPath)) {
|
|
133
|
+
if (validateUsageRecord(rec) === null) records.push(rec);
|
|
134
|
+
else invalidLines += 1;
|
|
135
|
+
}
|
|
136
|
+
|
|
137
|
+
let report;
|
|
138
|
+
try {
|
|
139
|
+
report = await calibrateCoverage(records, calOptions);
|
|
140
|
+
} catch (err) {
|
|
141
|
+
console.error(`error: ${err.message}`);
|
|
142
|
+
process.exit(1);
|
|
143
|
+
}
|
|
144
|
+
|
|
145
|
+
if (values.json === true) {
|
|
146
|
+
console.log(JSON.stringify(report, null, 2));
|
|
147
|
+
return;
|
|
148
|
+
}
|
|
149
|
+
printCalibrationReport(report, inputPath, invalidLines);
|
|
150
|
+
return;
|
|
151
|
+
}
|
|
152
|
+
|
|
108
153
|
if (command === 'list') {
|
|
109
154
|
const ids = await listKnownIssues(options);
|
|
110
155
|
if (values.json === true) {
|
|
@@ -161,6 +206,100 @@ async function main() {
|
|
|
161
206
|
printMultiIssueRollup(multi, fromUsage !== undefined, withPricing);
|
|
162
207
|
}
|
|
163
208
|
|
|
209
|
+
/** Parse a CLI integer option, exiting with a clear error on bad input. */
|
|
210
|
+
function parseIntOption(raw, name) {
|
|
211
|
+
const n = Number(raw);
|
|
212
|
+
if (!Number.isInteger(n)) {
|
|
213
|
+
console.error(`error: --${name} must be an integer (got "${raw}")`);
|
|
214
|
+
process.exit(1);
|
|
215
|
+
}
|
|
216
|
+
return n;
|
|
217
|
+
}
|
|
218
|
+
|
|
219
|
+
/** Parse a CLI float option, exiting with a clear error on bad input. */
|
|
220
|
+
function parseFloatOption(raw, name) {
|
|
221
|
+
const n = Number(raw);
|
|
222
|
+
if (!Number.isFinite(n)) {
|
|
223
|
+
console.error(`error: --${name} must be a number (got "${raw}")`);
|
|
224
|
+
process.exit(1);
|
|
225
|
+
}
|
|
226
|
+
return n;
|
|
227
|
+
}
|
|
228
|
+
|
|
229
|
+
/**
|
|
230
|
+
* Print the calibration coverage report: per-cell and overall empirical
|
|
231
|
+
* coverage of the predicted P80 band, with flags for cells that drift from the
|
|
232
|
+
* target by more than the threshold. Low-confidence cells (too few train/held-out
|
|
233
|
+
* issues) are shown but never flagged.
|
|
234
|
+
*/
|
|
235
|
+
function printCalibrationReport(report, inputPath, invalidLines = 0) {
|
|
236
|
+
const pct = (q) => (q == null ? ' —' : `${(q * 100).toFixed(0)}%`);
|
|
237
|
+
const targetPct = (report.quantile * 100).toFixed(0);
|
|
238
|
+
const thresholdPp = (report.deviationThreshold * 100).toFixed(0);
|
|
239
|
+
|
|
240
|
+
console.log(HEAD);
|
|
241
|
+
console.log(`CALIBRATION COVERAGE — ${inputPath}`);
|
|
242
|
+
console.log(HEAD);
|
|
243
|
+
console.log(
|
|
244
|
+
`Target band: P${targetPct} Held-out: ${(report.holdoutFraction * 100).toFixed(0)}% ` +
|
|
245
|
+
`Seed: ${report.seed} Flag threshold: ±${thresholdPp}pp`,
|
|
246
|
+
);
|
|
247
|
+
console.log(
|
|
248
|
+
`Records: ${formatNumber(report.overall.recordsTotal)} read, ` +
|
|
249
|
+
`${formatNumber(report.overall.recordsSkipped)} skipped (no cell / unavailable)` +
|
|
250
|
+
(invalidLines > 0 ? `, ${formatNumber(invalidLines)} invalid` : ''),
|
|
251
|
+
);
|
|
252
|
+
console.log(`Issues: ${formatNumber(report.overall.issuesTotal)} across ${report.overall.cellsTotal} cell${report.overall.cellsTotal === 1 ? '' : 's'}`);
|
|
253
|
+
console.log();
|
|
254
|
+
|
|
255
|
+
if (report.cells.length === 0) {
|
|
256
|
+
console.log('No forecastable cells found — need records tagged with size (or estimate) and model.');
|
|
257
|
+
return;
|
|
258
|
+
}
|
|
259
|
+
|
|
260
|
+
const cellLabel = (c) => `${c.cell.size} / ${c.cell.model}${c.lowConfidence ? ' (low conf)' : ''}`;
|
|
261
|
+
const labelWidth = Math.max(20, ...report.cells.map((c) => cellLabel(c).length));
|
|
262
|
+
|
|
263
|
+
console.log(
|
|
264
|
+
padRight('Cell', labelWidth) +
|
|
265
|
+
' ' + padLeft('Train', 6) +
|
|
266
|
+
' ' + padLeft('Holdout', 7) +
|
|
267
|
+
' ' + padLeft(`Pred P${targetPct}`, 9) +
|
|
268
|
+
' ' + padLeft('Coverage', 8) +
|
|
269
|
+
' Flag',
|
|
270
|
+
);
|
|
271
|
+
console.log(SEP);
|
|
272
|
+
for (const c of report.cells) {
|
|
273
|
+
console.log(
|
|
274
|
+
padRight(cellLabel(c), labelWidth) +
|
|
275
|
+
' ' + padLeft(formatNumber(c.trainN), 6) +
|
|
276
|
+
' ' + padLeft(formatNumber(c.holdoutN), 7) +
|
|
277
|
+
' ' + padLeft(c.predictedP80 == null ? '—' : formatTokensCompact(c.predictedP80), 9) +
|
|
278
|
+
' ' + padLeft(pct(c.coverage), 8) +
|
|
279
|
+
' ' + (c.flagged ? '⚠ FLAG' : ''),
|
|
280
|
+
);
|
|
281
|
+
}
|
|
282
|
+
console.log(SEP);
|
|
283
|
+
console.log(
|
|
284
|
+
padRight('OVERALL', labelWidth) +
|
|
285
|
+
' ' + padLeft('', 6) +
|
|
286
|
+
' ' + padLeft(formatNumber(report.overall.holdoutN), 7) +
|
|
287
|
+
' ' + padLeft('', 9) +
|
|
288
|
+
' ' + padLeft(pct(report.overall.coverage), 8) +
|
|
289
|
+
' ' + (report.overall.flagged ? '⚠ FLAG' : ''),
|
|
290
|
+
);
|
|
291
|
+
|
|
292
|
+
const flagged = report.cells.filter((c) => c.flagged);
|
|
293
|
+
console.log();
|
|
294
|
+
if (flagged.length === 0) {
|
|
295
|
+
console.log(`✓ No cells deviate from P${targetPct} coverage by more than ${thresholdPp} points.`);
|
|
296
|
+
} else {
|
|
297
|
+
console.log(`⚠ ${flagged.length} cell${flagged.length === 1 ? '' : 's'} off target by >${thresholdPp}pp: ${flagged.map((c) => `${c.cell.size} / ${c.cell.model}`).join(', ')}`);
|
|
298
|
+
}
|
|
299
|
+
console.log();
|
|
300
|
+
console.log('Note: the input is read locally only — never written back or committed. Keep it gitignored.');
|
|
301
|
+
}
|
|
302
|
+
|
|
164
303
|
/**
|
|
165
304
|
* Returns an onProgress callback that writes a live scan counter to stderr,
|
|
166
305
|
* overwriting the same line each tick. Clears the line when the Codex phase
|
|
@@ -202,6 +341,7 @@ function printUsage() {
|
|
|
202
341
|
llm-cost <ISSUE-ID> --from-usage <usage.jsonl-or-dir>
|
|
203
342
|
llm-cost list
|
|
204
343
|
llm-cost backfill --out <usage.jsonl-path>
|
|
344
|
+
llm-cost calibrate <usage.jsonl-or-dir> [--seed N] [--holdout F]
|
|
205
345
|
llm-cost --help
|
|
206
346
|
|
|
207
347
|
Per-issue token, turn, and quota analytics for Claude Code and Codex CLI sessions.
|
|
@@ -227,6 +367,13 @@ Options:
|
|
|
227
367
|
\`usage*.jsonl\` files (per the cost-telemetry spec)
|
|
228
368
|
instead of from the CLI transcripts.
|
|
229
369
|
--out <path> (backfill only) Destination usage.jsonl path. Appended.
|
|
370
|
+
--seed <int> (calibrate only) Seed for the deterministic held-out
|
|
371
|
+
split. Default 1.
|
|
372
|
+
--holdout <0..1> (calibrate only) Fraction of each cell's issues to hold
|
|
373
|
+
out for backtesting. Default 0.2.
|
|
374
|
+
--quantile <0..1> (calibrate only) Quantile band to test. Default 0.8 (P80).
|
|
375
|
+
--threshold <0..1> (calibrate only) Flag a cell when coverage drifts from
|
|
376
|
+
the target by more than this. Default 0.1 (10 points).
|
|
230
377
|
--json Emit machine-readable JSON instead of the table.
|
|
231
378
|
-h, --help Print this message.
|
|
232
379
|
|
|
@@ -244,6 +391,10 @@ Examples:
|
|
|
244
391
|
# to rm -rf ~/.claude/projects and ~/.codex/sessions.
|
|
245
392
|
llm-cost backfill --out ~/llm-cost-history.jsonl
|
|
246
393
|
llm-cost EPAC-1940 --from-usage ~/llm-cost-history.jsonl
|
|
394
|
+
|
|
395
|
+
# Check whether the forecaster's P80 band is actually calibrated against a
|
|
396
|
+
# local, estimate-tagged dataset. The input stays local — never committed.
|
|
397
|
+
llm-cost calibrate ~/backfill.out --seed 1 --holdout 0.2
|
|
247
398
|
`);
|
|
248
399
|
}
|
|
249
400
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "llm-cost-attribution",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.2.0",
|
|
4
4
|
"description": "Per-issue token, turn, and quota analytics for Claude Code and Codex CLI sessions. Reads the CLIs' own session JSONLs — no telemetry pipeline required.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -14,7 +14,8 @@
|
|
|
14
14
|
"LICENSE"
|
|
15
15
|
],
|
|
16
16
|
"scripts": {
|
|
17
|
-
"test": "node --test"
|
|
17
|
+
"test": "node --test && npm run test:boundary",
|
|
18
|
+
"test:boundary": "node scripts/check-boundary.mjs"
|
|
18
19
|
},
|
|
19
20
|
"keywords": [
|
|
20
21
|
"claude",
|