ultracost 0.2.1 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +50 -1
- package/NOTICE +16 -3
- package/README.md +77 -12
- package/bin/cli.js +514 -117
- package/docs/ESTIMATES.md +24 -0
- package/docs/PUBLISHING.md +41 -34
- package/docs/architecture.md +19 -1
- package/docs/policy.md +25 -2
- package/package.json +1 -1
- package/src/classify.js +125 -0
- package/src/cost.js +54 -0
- package/src/detect.js +93 -0
- package/src/estimate.js +18 -0
- package/src/guard.js +244 -166
- package/src/index.js +7 -1
- package/src/lexer.js +227 -0
- package/src/log.js +20 -13
- package/src/loop.js +143 -0
- package/src/paths.js +10 -0
- package/src/policy.js +14 -0
- package/src/render.js +211 -0
- package/src/rules.js +17 -5
- package/src/transcript.js +186 -0
- package/templates/hooks/reinject.mjs +21 -18
- package/templates/hooks/workflow-gate.mjs +51 -45
- package/templates/policy.default.json +15 -2
package/CHANGELOG.md
CHANGED
@@ -6,6 +6,54 @@ All notable changes to this project are documented here. The format is based on
 
 ## [Unreleased]
 
+## [0.3.0] - 2026-06-14
+
+Phase 2 — closed-loop precision and a zero-dependency visual overhaul. Still zero runtime
+dependencies, Claude-Code-only, and offline on the hot path.
+
+### Added
+- **Closed-loop, self-calibrating estimates.** New `src/transcript.js` reads local Claude Code
+session transcripts offline (clean-room parse + dedup on `message.id`+`requestId`) and
+attributes tokens **per dynamic-workflow stage** via `subagents/workflows/wf_*/agent-*.jsonl`
++ `journal.jsonl`. `src/cost.js` prices real usage with cache multipliers.
+- `ultracost usage` — real token cost from your transcripts (main vs subagents vs workflow stages).
+- `ultracost reconcile [--last|<wfId>]` — estimate-vs-actual per stage for a real run.
+- `ultracost calibrate` — learns a token prior from your runs (outlier-filtered) into
+`~/.claude/ultracost/calibration.json`; `estimate`/`explain`/`simulate`/the gate use it automatically.
+- `ultracost ledger` (alias `savings`) — cumulative savings vs an all-opus baseline,
+persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per workflow id).
+- **Sharper static guard.** `src/guard.js` now runs on a hand-rolled zero-dep JS tokenizer
+(`src/lexer.js`) instead of regex: dynamic model values, template literals, spreads, and
+optional-call `agent?.()` are handled, and fan-out detection covers `forEach`, `for…of`,
+`Promise.all([...map])`, and `Array.from` (not just `.map`/`pipeline`).
+- **New guard codes.** `UC006` flags a pin that mismatches the work the prompt describes,
+`UC007` flags effort over the model's cap, `UC008` flags an `alwaysOpus` role pinned off-opus.
+Deterministic, offline tier scoring lives in `src/classify.js`.
+- **`ultracost explain` / `simulate` / `diff`.** Per-stage rationale (tier, effort, est cost,
+check flags); cost under all-opus / tiered / all-sonnet side by side; and a cost delta between
+two workflow versions, with `--ci` emitting an Infracost-style PR-comment table.
+- **Pre-flight budget guard.** `policy.budget.perRun` / `perDay` make the cost gate hard-deny an
+over-budget launch before it runs (per-day reads the savings ledger).
+- **Zero-dependency visual overhaul.** New `src/render.js` (truecolor/256/16 with NO_COLOR and
+FORCE_COLOR support, ANSI-aware width via `util.stripVTControlCharacters` + `Intl.Segmenter`,
+box-drawing tables, bars, sparklines, rounded panels) reskins every command and the cost gate's
+message (now an aligned multi-line cost table).
+
+### Changed
+- **`status` and `doctor` are plugin-aware.** New `src/detect.js` reports how ultracost is
+delivered (`plugin` / `cli` / `both` / `none`) by reading `enabledPlugins` (in `settings.json`
+**and** `settings.local.json`) and the plugin cache `hooks/hooks.json` — so they no longer
+report the active plugin as "off / N issues". Both surface the bypass-mode caveat.
+- **`init` refuses to double-install.** When the plugin already delivers ultracost, `init` stops
+(unless `--force`) so it can't write duplicate `~/.claude` rules that conflict with the plugin.
+CLI hints are `npx`-aware.
+- **One source for the routing prose.** The SessionStart hook (`reinject.mjs`) now compiles the
+injected policy from `src/rules.js` at runtime, and `skills/ultracost/SKILL.md` is generated from
+the same `compileRules()` (drift-tested), so the CLAUDE.md block, the hook, and the skill cannot
+diverge. The injected prose no longer assumes a global `ultracost` binary (plugin users have none).
+- **Policy `version` bumped to 2**: adds `classify.keywords`, `budget`, and
+`estimation.cacheMultipliers` (all optional, back-compatible).
+
 ## [0.2.1] - 2026-06-14
 
 ### Changed
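The 0.3.0 entry above says `src/transcript.js` parses transcripts clean-room and deduplicates on `message.id`+`requestId`. A minimal JavaScript sketch of that dedup step follows. It assumes each transcript line is one JSON object with `requestId`, `message.id`, and a `message.usage` block using Anthropic's usage field names (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); `sumUsage` is a hypothetical name, not ultracost's API.

```javascript
// Hypothetical sketch, not ultracost's src/transcript.js.
// Assumes one JSON object per line with Anthropic-style usage fields.
function sumUsage(jsonlText) {
  const seen = new Set();
  const totals = { input: 0, output: 0, cacheWrite: 0, cacheRead: 0 };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    let rec;
    try {
      rec = JSON.parse(line);
    } catch {
      continue; // tolerate a malformed or truncated line
    }
    const usage = rec?.message?.usage;
    if (!usage) continue; // records without usage (user turns, etc.)
    // The same assistant response can appear on more than one line;
    // keying on both ids makes it count once.
    const msgId = rec.message.id;
    const reqId = rec.requestId;
    if (msgId && reqId) {
      const key = `${msgId}:${reqId}`;
      if (seen.has(key)) continue;
      seen.add(key);
    }
    totals.input += usage.input_tokens ?? 0;
    totals.output += usage.output_tokens ?? 0;
    totals.cacheWrite += usage.cache_creation_input_tokens ?? 0;
    totals.cacheRead += usage.cache_read_input_tokens ?? 0;
  }
  return totals;
}

const sample = [
  JSON.stringify({ requestId: "req_1", message: { id: "msg_1", usage: { input_tokens: 10, output_tokens: 5 } } }),
  JSON.stringify({ requestId: "req_1", message: { id: "msg_1", usage: { input_tokens: 10, output_tokens: 5 } } }),
  JSON.stringify({ requestId: "req_2", message: { id: "msg_2", usage: { input_tokens: 3, output_tokens: 7, cache_read_input_tokens: 100 } } }),
  "",
].join("\n");

console.log(sumUsage(sample)); // { input: 13, output: 12, cacheWrite: 0, cacheRead: 100 }
```

Summing each `agent-*.jsonl` file under a `wf_*` directory separately is one plausible way to get the per-stage split the entry describes. Records that lack either id are counted without dedup here, which is a choice of this sketch.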
@@ -77,7 +125,8 @@ All notable changes to this project are documented here. The format is based on
 and tiers whose model is in `neverUse`).
 - `ultracost status`, `ultracost doctor`, `ultracost uninstall`.
 
-[Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.
+[Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.3.0...HEAD
+[0.3.0]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...v0.3.0
 [0.2.1]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/danielkremen818/ultracost/compare/v0.1.0...v0.2.0
 [0.1.0]: https://github.com/danielkremen818/ultracost/releases/tag/v0.1.0
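The `src/render.js` entry in the CHANGELOG measures width with `util.stripVTControlCharacters` and `Intl.Segmenter`, both built into Node. The sketch below shows one way the two combine to pad a table cell that contains color codes: strip the escape sequences, then count grapheme clusters rather than UTF-16 code units. `displayWidth`, `padCell`, and the short wide-glyph range list are hypothetical; a real implementation would use a complete East Asian Width table.

```javascript
const { stripVTControlCharacters } = require("node:util");

// One segment per user-perceived character (grapheme cluster).
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Rough wide-glyph check (assumption): a few CJK, Hangul, fullwidth,
// and emoji ranges occupy two terminal columns. Not a full table.
function isWide(cp) {
  return (
    (cp >= 0x1100 && cp <= 0x115f) || // Hangul Jamo
    (cp >= 0x2e80 && cp <= 0xa4cf) || // CJK radicals through Yi
    (cp >= 0xac00 && cp <= 0xd7a3) || // Hangul syllables
    (cp >= 0xf900 && cp <= 0xfaff) || // CJK compatibility ideographs
    (cp >= 0xff00 && cp <= 0xff60) || // fullwidth forms
    (cp >= 0x1f300 && cp <= 0x1faff) // emoji and pictographs
  );
}

// Visible width: strip ANSI escapes, then count graphemes (2 if wide).
function displayWidth(str) {
  let width = 0;
  for (const { segment } of segmenter.segment(stripVTControlCharacters(str))) {
    width += isWide(segment.codePointAt(0)) ? 2 : 1;
  }
  return width;
}

// Right-pad to a column width measured in visible characters (never truncates).
function padCell(str, width) {
  return str + " ".repeat(Math.max(0, width - displayWidth(str)));
}

const red = "\x1b[31mopus\x1b[0m";
console.log(red.length); // 13
console.log(displayWidth(red)); // 4
console.log(displayWidth("e\u0301")); // 1 (e + combining acute accent)
console.log(displayWidth(padCell(red, 6))); // 6
```

Using `String.prototype.length` instead would pad colored cells 9 columns short in this example, which is exactly the misalignment an ANSI-aware width avoids.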
package/NOTICE
CHANGED
@@ -11,8 +11,21 @@ Prior art / inspiration:
 - 0xrdan/claude-router — automatic per-prompt routing via hooks
 - gacabartosz/claude-smart-router — research-inspired complexity scoring
 - R4CK/claude-model-changer — quota-aware complexity routing
+- coyvalyss1/model-matchmaker — keyword->tier routing rubric
+- ryoppippi/ccusage (MIT) — the documented Claude Code transcript parse + dedup contract
+- Anthropic session-report skill (Apache-2.0) — local transcript token-attribution algorithm
+- krulewis/tokencast (MIT) — pre-flight estimate + calibration-from-actuals approach
+- Jwrede/tokentoll (MIT) — "Infracost for LLM" cost-diff-on-PR UX
+
+The wrong-tier keyword rubric (src/classify.js) was informed by the public router
+rubrics above; the transcript parse/dedup and calibration were reimplemented
+clean-room from the documented behavior of ccusage / the session-report skill —
+no code was copied. ultracost remains MIT and zero-dependency.
 
 ultracost's distinct contribution is per-stage routing for dynamic workflows
-(ultracode): a quality-first policy
-
-
+(ultracode): a quality-first policy; a static-analysis guard that inspects the
+workflow scripts Claude Code authors and flags stages that would silently inherit
+the session model, pin a model that mismatches the task, or exceed the effort cap;
+and a closed loop that reconciles its own estimates against real per-stage token
+usage. No other tool statically detects an unpinned inline agent()/pipeline() stage
+or a pin that mismatches the work the prompt describes.
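The NOTICE credits tokencast for the calibration-from-actuals approach, and the 0.3.0 CHANGELOG describes `ultracost calibrate` as learning an outlier-filtered token prior. The filter itself isn't specified in this diff. The sketch below assumes a common choice, Tukey's fences (drop samples below Q1 - 1.5·IQR or above Q3 + 1.5·IQR), followed by a median; `learnPrior` and the sample token counts are hypothetical.

```javascript
// Hypothetical sketch: learn a per-stage token prior from observed runs.
// Outlier rule (assumption): Tukey's fences, then the median of what is kept.

// Linearly interpolated quantile of an ascending array, q in [0, 1].
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function learnPrior(samples) {
  if (samples.length === 0) return null; // nothing to learn from yet
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const kept = sorted.filter((x) => x >= q1 - 1.5 * iqr && x <= q3 + 1.5 * iqr);
  return {
    prior: quantile(kept, 0.5), // median of the non-outliers
    kept: kept.length,
    dropped: sorted.length - kept.length,
  };
}

// Hypothetical tokens-per-stage from five past runs; one is a runaway.
console.log(learnPrior([42000, 38000, 45000, 40000, 400000]));
// { prior: 41000, kept: 4, dropped: 1 }
```

A median after fencing keeps one runaway run from dragging the prior upward, which a plain mean over all five samples (113000 here) would not.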
package/README.md
CHANGED
@@ -31,17 +31,22 @@ guard that fails any unpinned stage.
 
 No telemetry. No network on the hot path. MIT.
 
-**Setup:**
+**Setup (Claude Code plugin):**
+
+```text
+/plugin marketplace add danielkremen818/ultracost
+/plugin install ultracost@ultracost
+```
+
+**Or via the npm CLI** (CI / scripting):
 
 ```bash
 npx ultracost init
 ```
 
-> A Claude Code plugin is planned; today ultracost installs via the npm CLI.
-
 **Command:** `ultracost check [script]` (or `/ultracost:check` in Claude Code) — flag any `agent()` stage missing a model pin.
 
-**CLI verbs:** `init · check · audit · estimate · pricing · status · doctor · uninstall`.
+**CLI verbs:** `init · check · audit · estimate · explain · simulate · diff · usage · reconcile · calibrate · ledger · pricing · status · doctor · uninstall`.
 
 <div align="center">
 
@@ -78,7 +83,7 @@ ultracost makes the routing **explicit, policy-driven, and verifiable** — with
 
 ## Architecture
 
-One shared core in `src/`, two delivery surfaces:
+One shared core in `src/`, two delivery surfaces: a Claude Code **plugin** (primary) and an **npm CLI** (secondary). Both compile from the same `policy.json`.
 
 <div align="center">
 
@@ -90,11 +95,30 @@ The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The
 
 ## Install
 
+### Plugin (recommended)
+
+Inside Claude Code:
+
+```text
+/plugin marketplace add danielkremen818/ultracost
+/plugin install ultracost@ultracost
+```
+
+Then verify a workflow script at any time:
+
+```text
+/ultracost:check ./path/to/workflow.js
+```
+
+The plugin bundles — touching none of your own files — a `SessionStart` policy-injection hook, a `PreToolUse` cost gate on the `Workflow` tool (`ULTRACOST_GATE=off` to disable), the `/ultracost:check` command, and a routing-policy skill. Requires Claude Code with the `/plugin` command and dynamic workflows enabled.
+
+### npm CLI
+
 ```bash
 npx ultracost init
 ```
 
-This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config.
+This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config. Requires Node ≥ 24.
 
 Then verify a workflow script at any time:
 
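The plugin's `PreToolUse` cost gate, combined with the `policy.budget.perRun` / `perDay` caps from the 0.3.0 CHANGELOG, amounts to a small decision function. The sketch below is a hypothetical version of it: `decide` and its argument names are invented for illustration, the launch estimate and today's spend are assumed to be computed elsewhere, and the returned object follows the `hookSpecificOutput.permissionDecision` shape that Claude Code documents for `PreToolUse` hooks (check the hooks reference for your version).

```javascript
// Hypothetical sketch of a pre-flight budget check; not ultracost's hook code.
// Assumed inputs: toolName from the hook payload, a precomputed USD estimate
// for this launch, today's spend (e.g. summed from a ledger), and a budget
// object shaped like policy.budget ({ perRun, perDay }).
function decide({ toolName, estimateUsd, spentTodayUsd = 0, budget = {}, env = process.env }) {
  if (env.ULTRACOST_GATE === "off") return null; // gate disabled by the user
  if (toolName !== "Workflow") return null; // only workflow launches are gated

  const money = (n) => `$${n.toFixed(2)}`;
  const reasons = [];
  if (budget.perRun != null && estimateUsd > budget.perRun) {
    reasons.push(`estimate ${money(estimateUsd)} exceeds perRun ${money(budget.perRun)}`);
  }
  const projected = spentTodayUsd + estimateUsd;
  if (budget.perDay != null && projected > budget.perDay) {
    reasons.push(`today's total ${money(projected)} would exceed perDay ${money(budget.perDay)}`);
  }
  if (reasons.length === 0) return null; // within budget: no opinion

  // Deny via the PreToolUse JSON output documented for Claude Code hooks.
  return {
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: `ultracost: ${reasons.join("; ")}`,
    },
  };
}

const out = decide({
  toolName: "Workflow",
  estimateUsd: 3.5,
  spentTodayUsd: 18,
  budget: { perRun: 5, perDay: 20 },
  env: {},
});
console.log(out.hookSpecificOutput.permissionDecisionReason);
// ultracost: today's total $21.50 would exceed perDay $20.00
```

A hook script built this way would write `JSON.stringify(out)` to stdout when `out` is non-null and write nothing otherwise, letting the launch proceed.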
@@ -102,10 +126,21 @@ Then verify a workflow script at any time:
 ultracost check ./path/to/workflow.js
 ```
 
->
+> Use the npm path for CI/scripting or the CLAUDE.md-injection workflow; for day-to-day use in Claude Code, the plugin above is simpler.
 
 ## Uninstall
 
+### Plugin
+
+```text
+/plugin uninstall ultracost@ultracost
+/plugin marketplace remove ultracost
+```
+
+The plugin touches none of your own files, so removing it removes everything ultracost added.
+
+### npm CLI
+
 ```bash
 ultracost uninstall
 ```
@@ -115,12 +150,16 @@ Reverses everything `init` did: removes the routing block from `~/.claude/CLAUDE
 ## Quickstart (CLI)
 
 ```bash
-ultracost init # install policy + rules + hook
-ultracost status #
+ultracost init # install policy + rules + hook (refuses if the plugin already delivers it)
+ultracost status # active policy + how it's delivered (plugin/cli) + bypass caveat
 ultracost audit ~/.claude/projects # pin stats across your real workflow scripts
 ultracost check ./path/to/workflow # scan a workflow script (or a directory)
 ultracost check . --fix # auto-pin the default model on unpinned stages
 ultracost estimate ./workflow.js # agents, model mix, and cost vs all-opus baseline
+ultracost explain ./workflow.js # per-stage rationale + which checks fire
+ultracost reconcile --last # estimate vs actual for your latest real run
+ultracost calibrate # tune the estimator from your real token usage
+ultracost ledger # cumulative savings vs all-opus
 ultracost pricing refresh # update prices from Anthropic's official page
 ```
 
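The Quickstart lists `ultracost ledger` for cumulative savings, and the CHANGELOG says the ledger is persisted in `~/.claude/ultracost/ledger.jsonl` and is idempotent per workflow id. A minimal sketch of that idempotency follows, assuming each line is a JSON record with hypothetical `wfId` and `savedUsd` fields: it checks for an existing id before appending, so recording the same run twice does not double-count it. The demo writes to a temporary directory instead of `~/.claude`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Read every JSON record from a JSONL file (a missing file is an empty ledger).
function readLedger(file) {
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Append one record per workflow id; a repeated id is ignored (idempotent).
function recordRun(file, entry) {
  if (readLedger(file).some((e) => e.wfId === entry.wfId)) return false;
  fs.appendFileSync(file, JSON.stringify(entry) + "\n");
  return true;
}

// Cumulative savings across all recorded runs.
function totalSaved(file) {
  return readLedger(file).reduce((sum, e) => sum + e.savedUsd, 0);
}

// Demo in a temp directory rather than ~/.claude.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-"));
const ledger = path.join(dir, "ledger.jsonl");
console.log(recordRun(ledger, { wfId: "wf_a", savedUsd: 4.5 })); // true
console.log(recordRun(ledger, { wfId: "wf_a", savedUsd: 4.5 })); // false (already counted)
console.log(recordRun(ledger, { wfId: "wf_b", savedUsd: 1.25 })); // true
console.log(totalSaved(ledger)); // 5.75
```

This read-then-append is not safe against two processes writing at the same moment; a file lock would be needed for that.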
@@ -147,6 +186,29 @@ $ ultracost estimate ./workflow.js
 
 Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges; the interactive 3-option menu needs a TUI. Full detail, assumptions, and the gate's [#52343](https://github.com/anthropics/claude-code/issues/52343) limitation are in [`docs/ESTIMATES.md`](./docs/ESTIMATES.md).
 
+## The closed loop: measure, reconcile, calibrate
+
+ultracost doesn't just estimate — it reads its own results back and tunes itself. It parses your **local** Claude Code transcripts (offline; no network, no telemetry) and attributes tokens **per dynamic-workflow stage** via the `subagents/workflows/wf_*/agent-*.jsonl` files Claude Code writes. No other router does this.
+
+```bash
+ultracost usage # real token cost: main loop vs subagents vs workflow stages
+ultracost reconcile --last # estimate vs ACTUAL, per stage, for your latest workflow run
+ultracost calibrate # learn a token prior from your real runs (estimate uses it)
+ultracost ledger # cumulative $ saved vs an all-opus baseline, persisted
+```
+
+- **Self-calibrating.** `calibrate` learns real per-stage token sizes (outlier-filtered) into `~/.claude/ultracost/calibration.json`; `estimate`, `explain`, `simulate`, and the gate use it automatically — the estimate gets closer to your reality every run.
+- **Savings ledger.** `ledger` keeps a running tally of what the policy saved you versus running everything on Opus, persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per run).
+- **Pre-flight budget guard.** Set `budget.perRun` / `budget.perDay` in the policy and the cost gate **denies** a launch whose estimate would blow the cap — before it runs.
+
+## Understand and compare a workflow
+
+```bash
+ultracost explain ./wf.js # per-stage: tier, effort, est cost, and which UC checks fire
+ultracost simulate ./wf.js # cost under all-opus vs your tiered pins vs all-sonnet
+ultracost diff old.js new.js # cost delta between two versions (--ci → PR-comment table)
+```
+
 ## How routing is decided
 
 | Tier | Model | Use for |
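`ultracost usage` and `ledger` turn token counts into dollars, and the CHANGELOG says `src/cost.js` applies cache multipliers (configurable through the policy's `estimation.cacheMultipliers`). The sketch below shows that arithmetic plus the savings-versus-all-opus comparison. The per-million-token prices are placeholders, not Anthropic's rates, and the multipliers (cache writes at 1.25× and cache reads at 0.1× of the input price) are assumptions modeled on Anthropic's published prompt-caching ratios for 5-minute cache writes; `priceUsage` and `savingsVsOpus` are hypothetical names.

```javascript
// Placeholder USD prices per million tokens. NOT Anthropic's real rates;
// ultracost itself updates prices with `ultracost pricing refresh`.
const PRICES = {
  opus: { input: 10, output: 50 },
  sonnet: { input: 2, output: 10 },
};

// Assumed cache multipliers, relative to the model's input price.
const CACHE = { write: 1.25, read: 0.1 };

// Dollar cost of one stage's token usage on a given model.
function priceUsage(model, usage, cache = CACHE) {
  const p = PRICES[model];
  if (!p) throw new Error(`no price for model "${model}"`);
  return (
    (usage.input * p.input +
      usage.cacheWrite * p.input * cache.write +
      usage.cacheRead * p.input * cache.read +
      usage.output * p.output) /
    1e6 // prices are per million tokens
  );
}

// Savings = the same tokens priced on opus, minus their cost as actually routed.
function savingsVsOpus(stages) {
  let actual = 0;
  let baseline = 0;
  for (const { model, usage } of stages) {
    actual += priceUsage(model, usage);
    baseline += priceUsage("opus", usage);
  }
  return { actual, baseline, saved: baseline - actual };
}

const stages = [
  { model: "opus", usage: { input: 100_000, output: 20_000, cacheWrite: 0, cacheRead: 0 } },
  { model: "sonnet", usage: { input: 400_000, output: 100_000, cacheWrite: 0, cacheRead: 2_000_000 } },
];
const r = savingsVsOpus(stages);
console.log(r.actual.toFixed(2), r.baseline.toFixed(2), r.saved.toFixed(2)); // 4.20 13.00 8.80
```

Pricing the baseline from the same token counts keeps the comparison relative, in line with the README's note that estimates are tiered-vs-all-opus rather than a bill.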
@@ -175,9 +237,12 @@ wf.js:4:13 UC003 stage pins banned model "haiku" (policy.neverUse)
 | `UC002` | options object present, no `model` |
 | `UC003` | model resolves to a banned model (e.g. haiku) |
 | `UC004` | `model: 'inherit'` while `allowInherit` is false |
-| `UC005` | options
+| `UC005` | model/options is a dynamic expression — can't verify (warning) |
+| `UC006` | the pinned model mismatches the work the prompt describes (warning) |
+| `UC007` | `effort` exceeds the model's cap, e.g. `sonnet` @ `xhigh` (warning) |
+| `UC008` | an `alwaysOpus` role (orchestrator, consolidation, …) pins a cheaper tier (warning) |
 
-The scanner
+The scanner runs on a hand-rolled, zero-dependency JS tokenizer, so it's robust to template literals, spreads, optional-call `agent?.()`, and dynamic model values — and an `agent(` inside a prompt string or comment is prose, never a call. Fan-out detection covers `.map`/`.flatMap`/`forEach`/`for…of`/`Promise.all`/`Array.from`/`pipeline`. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. `UC006`–`UC008` are advisory warnings and never fail the build on their own; exit code is non-zero only when pin-presence errors (`UC001`–`UC004`) are found.
 
 ## Audit your history
 
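The scanner note in the README hunk above says an `agent(` inside a prompt string or comment is prose, never a call, which a regex over raw source cannot guarantee. A tokenizer-style single pass gets it right by consuming comments and quoted literals before it ever looks at identifiers. The sketch below is a hypothetical, much smaller stand-in for `src/lexer.js`: `findAgentCalls` reports the line of each `agent(` or `agent?.(` call, does not understand regex literals, and skips template literals whole, so a call nested inside `${...}` is missed.

```javascript
// Hypothetical sketch, far simpler than ultracost's src/lexer.js.
// Returns the 1-based line of each `agent(` / `agent?.(` call,
// ignoring comments and string/template literals.
const isIdent = (ch) => /[A-Za-z0-9_$]/.test(ch);

function findAgentCalls(src) {
  const calls = [];
  let line = 1;
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "\n") {
      line++;
      i++;
    } else if (ch === "/" && src[i + 1] === "/") {
      // Line comment: stop at the newline so the branch above counts it.
      while (i < src.length && src[i] !== "\n") i++;
    } else if (ch === "/" && src[i + 1] === "*") {
      // Block comment: skip past the closing */, counting newlines.
      i += 2;
      while (i < src.length && !(src[i] === "*" && src[i + 1] === "/")) {
        if (src[i] === "\n") line++;
        i++;
      }
      i += 2;
    } else if (ch === '"' || ch === "'" || ch === "`") {
      // String or template literal: skip to the matching unescaped quote.
      i++;
      while (i < src.length && src[i] !== ch) {
        if (src[i] === "\\") i++; // step over the escaped character
        if (src[i] === "\n") line++;
        i++;
      }
      i++; // closing quote
    } else if (isIdent(ch)) {
      // Read a whole identifier so `subagent` never matches `agent`.
      let j = i;
      while (j < src.length && isIdent(src[j])) j++;
      let k = j;
      while (src[k] === " " || src[k] === "\t") k++;
      if (src.slice(i, j) === "agent" && (src[k] === "(" || src.startsWith("?.(", k))) {
        calls.push(line);
      }
      i = j;
    } else {
      i++;
    }
  }
  return calls;
}

const wf = [
  "// agent( in a comment is not a call",
  "const p = 'please call agent() here';",
  "const a = agent('plan', { model: 'opus' });",
  "/* agent(",
  "   still a comment */",
  "const b = subagent('x');",
  "const c = agent?.(`review ${p}`);",
].join("\n");

console.log(findAgentCalls(wf)); // [ 3, 7 ]
```

Only lines 3 and 7 are real calls; a pattern such as `/agent\s*\(/` over the raw text would also report lines 1, 2, 4, and 6.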
@@ -231,7 +296,7 @@ Fails the build if any committed workflow script has a stage that would inherit
 
 ## How it compares
 
-ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer)) score every prompt and route the *main loop
+ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer), [model-matchmaker](https://github.com/coyvalyss1/model-matchmaker)) score every prompt and route the *main loop* at runtime. Linters like [claudelint](https://github.com/pdugan20/claudelint) validate a *file-based* agent's `model:` value. ultracost targets the **dynamic-workflow / ultracode** path and is, as far as we can tell, the only tool that **statically detects an unpinned inline `agent()`/`pipeline()` stage, flags a pin that mismatches the work the prompt describes, and reconciles its own cost estimate against real per-stage token usage**. Cost tooling like [ccusage](https://github.com/ryoppippi/ccusage), [tokencast](https://github.com/krulewis/tokencast), and [tokentoll](https://github.com/Jwrede/tokentoll) informed the transcript-parsing, calibration, and cost-diff approaches (reimplemented clean-room). See [`NOTICE`](./NOTICE) for prior-art credits.
 
 ## Documentation
 
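`ultracost diff old.js new.js --ci` is described in this release as emitting an Infracost-style PR-comment table of the cost delta between two workflow versions. Its exact columns aren't shown in this diff, so the sketch below assumes per-stage cost maps (`{ stageName: usd }`) for each version and renders a Markdown table with before, after, and delta columns plus a total row. `renderCostDiff` and the stage names are hypothetical.

```javascript
// Hypothetical sketch of a PR-comment cost table; not ultracost's output format.
const usd = (n) => `$${n.toFixed(2)}`;

// Signed dollar delta, rounded to whole cents before picking the sign.
function signed(n) {
  const cents = Math.round(n * 100);
  const sign = cents > 0 ? "+" : cents < 0 ? "-" : "";
  return sign + usd(Math.abs(cents) / 100);
}

function renderCostDiff(before, after) {
  // Union of stage names: old version's order first, then new stages.
  const names = [...new Set([...Object.keys(before), ...Object.keys(after)])];
  const rows = ["| Stage | Before | After | Delta |", "| --- | ---: | ---: | ---: |"];
  let totalBefore = 0;
  let totalAfter = 0;
  for (const name of names) {
    const b = before[name] ?? 0; // 0 when the stage is new
    const a = after[name] ?? 0; // 0 when the stage was removed
    totalBefore += b;
    totalAfter += a;
    rows.push(`| ${name} | ${usd(b)} | ${usd(a)} | ${signed(a - b)} |`);
  }
  rows.push(`| **Total** | ${usd(totalBefore)} | ${usd(totalAfter)} | ${signed(totalAfter - totalBefore)} |`);
  return rows.join("\n");
}

console.log(renderCostDiff(
  { plan: 0.9, implement: 2.4 },
  { plan: 0.9, implement: 1.1, review: 0.3 },
));
// | Stage | Before | After | Delta |
// | --- | ---: | ---: | ---: |
// | plan | $0.90 | $0.90 | $0.00 |
// | implement | $2.40 | $1.10 | -$1.30 |
// | review | $0.00 | $0.30 | +$0.30 |
// | **Total** | $3.30 | $2.30 | -$1.00 |
```

Rounding the delta to cents before choosing the sign keeps floating-point noise such as `-1e-16` from printing as `-$0.00`.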