ultracost 0.2.0 → 0.3.0

package/CHANGELOG.md CHANGED
@@ -6,6 +6,56 @@ All notable changes to this project are documented here. The format is based on
6
6
 
7
7
  ## [Unreleased]
8
8
 
9
+ ## [0.3.0] - 2026-06-14
10
+
11
+ Phase 2 — closed-loop precision and a zero-dependency visual overhaul. Still zero runtime
12
+ dependencies, Claude-Code-only, and offline on the hot path.
13
+
14
+ ### Added
15
+ - **Closed-loop, self-calibrating estimates.** New `src/transcript.js` reads local Claude Code
16
+ session transcripts offline (clean-room parse + dedup on `message.id`+`requestId`) and
17
+ attributes tokens **per dynamic-workflow stage** via `subagents/workflows/wf_*/agent-*.jsonl`
18
+ + `journal.jsonl`. `src/cost.js` prices real usage with cache multipliers.
19
+ - `ultracost usage` — real token cost from your transcripts (main vs subagents vs workflow stages).
20
+ - `ultracost reconcile [--last|<wfId>]` — estimate-vs-actual per stage for a real run.
21
+ - `ultracost calibrate` — learns a token prior from your runs (outlier-filtered) into
22
+ `~/.claude/ultracost/calibration.json`; `estimate`/`explain`/`simulate`/the gate use it automatically.
23
+ - `ultracost ledger` (alias `savings`) — cumulative savings vs an all-opus baseline,
24
+ persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per workflow id).
25
+ - **Sharper static guard.** `src/guard.js` now runs on a hand-rolled zero-dep JS tokenizer
26
+ (`src/lexer.js`) instead of regex: dynamic model values, template literals, spreads, and
27
+ optional-call `agent?.()` are handled, and fan-out detection covers `forEach`, `for…of`,
28
+ `Promise.all([...map])`, and `Array.from` (not just `.map`/`pipeline`).
29
+ - **New guard codes.** `UC006` flags a pin that mismatches the work the prompt describes,
30
+ `UC007` flags effort over the model's cap, `UC008` flags an `alwaysOpus` role pinned off-opus.
31
+ Deterministic, offline tier scoring lives in `src/classify.js`.
32
+ - **`ultracost explain` / `simulate` / `diff`.** Per-stage rationale (tier, effort, est cost,
33
+ check flags); cost under all-opus / tiered / all-sonnet side by side; and a cost delta between
34
+ two workflow versions, with `--ci` emitting an Infracost-style PR-comment table.
35
+ - **Pre-flight budget guard.** `policy.budget.perRun` / `perDay` make the cost gate hard-deny an
36
+ over-budget launch before it runs (per-day reads the savings ledger).
37
+ - **Zero-dependency visual overhaul.** New `src/render.js` (truecolor/256/16 with NO_COLOR and
38
+ FORCE_COLOR support, ANSI-aware width via `util.stripVTControlCharacters` + `Intl.Segmenter`,
39
+ box-drawing tables, bars, sparklines, rounded panels) reskins every command and the cost gate's
40
+ message (now an aligned multi-line cost table).
41
+
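The ANSI-aware width measurement named in the last entry can be sketched as follows. This is a minimal illustration, not the contents of `src/render.js`: the helper names `visibleLength` and `padEndVisible` are hypothetical, and counting graphemes is only an approximation of terminal columns, since wide characters such as CJK glyphs occupy two cells.

```javascript
const { stripVTControlCharacters } = require('node:util');

// Hypothetical helper (not ultracost's actual code): count user-perceived
// characters after removing ANSI escape sequences such as color codes.
function visibleLength(text) {
  const plain = stripVTControlCharacters(text);
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // Each segment is one grapheme cluster, so "e" + a combining accent counts once.
  return [...segmenter.segment(plain)].length;
}

// Hypothetical helper: pad a possibly colored string to a visible width,
// which is what keeps table columns aligned when cells contain escapes.
function padEndVisible(text, width) {
  const missing = Math.max(0, width - visibleLength(text));
  return text + ' '.repeat(missing);
}

console.log(`[${padEndVisible('\x1b[32mok\x1b[0m', 6)}]`);
```

Padding by visible length rather than `String.prototype.length` matters because the escape bytes contribute to `length` but not to what the terminal draws.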
42
+ ### Changed
43
+ - **`status` and `doctor` are plugin-aware.** New `src/detect.js` reports how ultracost is
44
+ delivered (`plugin` / `cli` / `both` / `none`) by reading `enabledPlugins` (in `settings.json`
45
+ **and** `settings.local.json`) and the plugin cache `hooks/hooks.json` — so they no longer
46
+ report the active plugin as "off / N issues". Both surface the bypass-mode caveat.
47
+ - **`init` refuses to double-install.** When the plugin already delivers ultracost, `init` stops
48
+ (unless `--force`) so it can't write duplicate `~/.claude` rules that conflict with the plugin.
49
+ CLI hints are `npx`-aware.
50
+ - **One source for the routing prose.** The SessionStart hook (`reinject.mjs`) now compiles the
51
+ injected policy from `src/rules.js` at runtime, and `skills/ultracost/SKILL.md` is generated from
52
+ the same `compileRules()` (drift-tested), so the CLAUDE.md block, the hook, and the skill cannot
53
+ diverge. The injected prose no longer assumes a global `ultracost` binary (plugin users have none).
54
+ - **Policy `version` bumped to 2**: adds `classify.keywords`, `budget`, and
55
+ `estimation.cacheMultipliers` (all optional, back-compatible).
56
+
57
+ ## [0.2.1] - 2026-06-14
58
+
9
59
  ### Changed
10
60
  - **The cost gate is now mode-aware and hard in every permission mode.** It reads
11
61
  `permission_mode` from the `PreToolUse` event: it asks (with the estimate) in
@@ -74,8 +124,9 @@ All notable changes to this project are documented here. The format is based on
74
124
  - Data-driven `policy.json` with load-time validation (rejects undefined default tiers
75
125
  and tiers whose model is in `neverUse`).
76
126
  - `ultracost status`, `ultracost doctor`, `ultracost uninstall`.
77
- - Docs: architecture, policy reference, and the ultracode rationale.
78
127
 
79
- [Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...HEAD
128
+ [Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.3.0...HEAD
129
+ [0.3.0]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...v0.3.0
130
+ [0.2.1]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...v0.2.1
80
131
  [0.2.0]: https://github.com/danielkremen818/ultracost/compare/v0.1.0...v0.2.0
81
132
  [0.1.0]: https://github.com/danielkremen818/ultracost/releases/tag/v0.1.0
package/NOTICE CHANGED
@@ -11,8 +11,21 @@ Prior art / inspiration:
11
11
  - 0xrdan/claude-router — automatic per-prompt routing via hooks
12
12
  - gacabartosz/claude-smart-router — research-inspired complexity scoring
13
13
  - R4CK/claude-model-changer — quota-aware complexity routing
14
+ - coyvalyss1/model-matchmaker — keyword->tier routing rubric
15
+ - ryoppippi/ccusage (MIT) — the documented Claude Code transcript parse + dedup contract
16
+ - Anthropic session-report skill (Apache-2.0) — local transcript token-attribution algorithm
17
+ - krulewis/tokencast (MIT) — pre-flight estimate + calibration-from-actuals approach
18
+ - Jwrede/tokentoll (MIT) — "Infracost for LLM" cost-diff-on-PR UX
19
+
20
+ The wrong-tier keyword rubric (src/classify.js) was informed by the public router
21
+ rubrics above; the transcript parse/dedup and calibration were reimplemented
22
+ clean-room from the documented behavior of ccusage / the session-report skill —
23
+ no code was copied. ultracost remains MIT and zero-dependency.
14
24
 
15
25
  ultracost's distinct contribution is per-stage routing for dynamic workflows
16
- (ultracode): a quality-first policy plus a static-analysis guard that inspects
17
- the workflow scripts Claude Code authors and flags subagent stages that would
18
- silently inherit the session model.
26
+ (ultracode): a quality-first policy; a static-analysis guard that inspects the
27
+ workflow scripts Claude Code authors and flags stages that would silently inherit
28
+ the session model, pin a model that mismatches the task, or exceed the effort cap;
29
+ and a closed loop that reconciles its own estimates against real per-stage token
30
+ usage. No other tool statically detects an unpinned inline agent()/pipeline() stage
31
+ or a pin that mismatches the work the prompt describes.
package/README.md CHANGED
@@ -1,5 +1,7 @@
1
1
  <div align="center">
2
2
 
3
+ <img src="./assets/logo.png" alt="ultracost logo" width="150">
4
+
3
5
  # ultracost
4
6
 
5
7
  **Per-stage model routing for Claude Code dynamic workflows.**
@@ -29,16 +31,22 @@ guard that fails any unpinned stage.
29
31
 
30
32
  No telemetry. No network on the hot path. MIT.
31
33
 
32
- **Setup (plugin):**
34
+ **Setup (Claude Code plugin):**
33
35
 
34
36
  ```text
35
37
  /plugin marketplace add danielkremen818/ultracost
36
38
  /plugin install ultracost@ultracost
37
39
  ```
38
40
 
39
- **Built-in command:** `/ultracost:check [script]` flag any `agent()` stage missing a model pin.
41
+ **Or via the npm CLI** (CI / scripting):
40
42
 
41
- **CLI verbs:** `init · check · audit · estimate · pricing · status · doctor · uninstall`.
43
+ ```bash
44
+ npx ultracost init
45
+ ```
46
+
47
+ **Command:** `ultracost check [script]` (or `/ultracost:check` in Claude Code) — flag any `agent()` stage missing a model pin.
48
+
49
+ **CLI verbs:** `init · check · audit · estimate · explain · simulate · diff · usage · reconcile · calibrate · ledger · pricing · status · doctor · uninstall`.
42
50
 
43
51
  <div align="center">
44
52
 
@@ -77,35 +85,11 @@ ultracost makes the routing **explicit, policy-driven, and verifiable** — with
77
85
 
78
86
  One shared core in `src/`, two delivery surfaces: a Claude Code **plugin** (primary) and an **npm CLI** (secondary). Both compile from the same `policy.json`.
79
87
 
80
- ```mermaid
81
- flowchart TD
82
- subgraph core["src/ — shared core"]
83
- POL["policy.js<br/>(policy.json — you own it)"]
84
- RUL["rules.js<br/>(rule compiler)"]
85
- GRD["guard.js<br/>(static analysis)"]
86
- end
87
-
88
- subgraph plugin["Claude Code plugin — PRIMARY"]
89
- SK["skills/ultracost<br/>routing policy (always-relevant)"]
90
- CMD["/ultracost:check command"]
91
- HK["hooks.json<br/>SessionStart (all sources)"]
92
- end
93
-
94
- subgraph cli["npm CLI — secondary"]
95
- BIN["bin/cli.js<br/>init · check · audit · doctor · status · uninstall"]
96
- end
97
-
98
- POL --> RUL
99
- RUL --> SK
100
- RUL --> BIN
101
- GRD --> CMD
102
- GRD --> BIN
103
- HK --> RE["reinject.mjs<br/>(node, no bash/jq)"]
104
- BIN --> RE
105
-
106
- classDef ft fill:#1f6feb,stroke:#0b3d91,color:#fff;
107
- class POL,RUL,GRD ft;
108
- ```
88
+ <div align="center">
89
+
90
+ <img src="./assets/architecture.svg" alt="ultracost architecture: policy.json compiles through the src/ core into a Claude Code plugin and an npm CLI; the guard and a PreToolUse cost gate verify every workflow stage at runtime" width="960">
91
+
92
+ </div>
109
93
 
110
94
  The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The guard is the enforcement layer the model can't talk its way out of. See [`docs/architecture.md`](./docs/architecture.md) for the full picture.
111
95
 
@@ -113,7 +97,7 @@ The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The
113
97
 
114
98
  ### Plugin (recommended)
115
99
 
116
- Inside Claude Code, add the marketplace and install the plugin:
100
+ Inside Claude Code:
117
101
 
118
102
  ```text
119
103
  /plugin marketplace add danielkremen818/ultracost
@@ -126,26 +110,23 @@ Then verify a workflow script at any time:
126
110
  /ultracost:check ./path/to/workflow.js
127
111
  ```
128
112
 
129
- The plugin bundles four things and **touches none of your existing files**:
113
+ The plugin bundles four things while touching none of your own files: a `SessionStart` policy-injection hook, a `PreToolUse` cost gate on the `Workflow` tool (`ULTRACOST_GATE=off` to disable), the `/ultracost:check` command, and a routing-policy skill. Requires Claude Code with the `/plugin` command and dynamic workflows enabled.
130
114
 
131
- - a **`SessionStart`** hook that injects the routing policy as context at session start and after compaction (the always-on guidance),
132
- - a **`PreToolUse` cost gate** on the `Workflow` tool that hard-stops every dynamic-workflow launch with an estimate (set `ULTRACOST_GATE=off` to disable; see [`docs/ESTIMATES.md`](./docs/ESTIMATES.md)),
133
- - the **`/ultracost:check`** command (the Workflow Guard), and
134
- - a **routing-policy skill** for explicit reference when Claude authors workflow/ultracode/subagent scripts.
115
+ ### npm CLI
135
116
 
136
- > Requires Claude Code with the `/plugin` command (run `/help` to confirm) and dynamic workflows enabled.
117
+ ```bash
118
+ npx ultracost init
119
+ ```
137
120
 
138
- ### npm CLI (professional secondary)
121
+ This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config. Requires Node ≥ 24.
139
122
 
140
- For CI, scripting, or if you prefer the `~/.claude/CLAUDE.md` injection path:
123
+ Then verify a workflow script at any time:
141
124
 
142
125
  ```bash
143
- npx ultracost init
126
+ ultracost check ./path/to/workflow.js
144
127
  ```
145
128
 
146
- This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config.
147
-
148
- > Requires Node ≥ 24.
129
+ > Use the npm path for CI/scripting or the CLAUDE.md-injection workflow; for day-to-day use in Claude Code, the plugin above is simpler.
149
130
 
150
131
  ## Uninstall
151
132
 
@@ -156,7 +137,7 @@ This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~
156
137
  /plugin marketplace remove ultracost
157
138
  ```
158
139
 
159
- The plugin touches none of your own files — its hook, command, and skill live inside the plugin package, so removing it removes everything; nothing is left in your `~/.claude/CLAUDE.md` or `settings.json`. If Claude Code offers to "disable in `settings.local.json`" instead (because the plugin was enabled in a shared `settings.json`), that has the same effect — accept it, or remove the marketplace as above.
140
+ The plugin touches none of your own files, so removing it removes everything ultracost added.
160
141
 
161
142
  ### npm CLI
162
143
 
@@ -169,12 +150,16 @@ Reverses everything `init` did: removes the routing block from `~/.claude/CLAUDE
169
150
  ## Quickstart (CLI)
170
151
 
171
152
  ```bash
172
- ultracost init # install policy + rules + hook
173
- ultracost status # see the active policy and install state
153
+ ultracost init # install policy + rules + hook (refuses if the plugin already delivers it)
154
+ ultracost status # active policy + how it's delivered (plugin/cli) + bypass caveat
174
155
  ultracost audit ~/.claude/projects # pin stats across your real workflow scripts
175
156
  ultracost check ./path/to/workflow # scan a workflow script (or a directory)
176
157
  ultracost check . --fix # auto-pin the default model on unpinned stages
177
158
  ultracost estimate ./workflow.js # agents, model mix, and cost vs all-opus baseline
159
+ ultracost explain ./workflow.js # per-stage rationale + which checks fire
160
+ ultracost reconcile --last # estimate vs actual for your latest real run
161
+ ultracost calibrate # tune the estimator from your real token usage
162
+ ultracost ledger # cumulative savings vs all-opus
178
163
  ultracost pricing refresh # update prices from Anthropic's official page
179
164
  ```
180
165
 
@@ -201,6 +186,29 @@ $ ultracost estimate ./workflow.js
201
186
 
202
187
  Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges; the interactive 3-option menu needs a TUI. Full detail, assumptions, and the gate's [#52343](https://github.com/anthropics/claude-code/issues/52343) limitation are in [`docs/ESTIMATES.md`](./docs/ESTIMATES.md).
203
188
 
189
+ ## The closed loop: measure, reconcile, calibrate
190
+
191
+ ultracost doesn't just estimate — it reads its own results back and tunes itself. It parses your **local** Claude Code transcripts (offline; no network, no telemetry) and attributes tokens **per dynamic-workflow stage** via the `subagents/workflows/wf_*/agent-*.jsonl` files Claude Code writes. No other router does this.
192
+
193
+ ```bash
194
+ ultracost usage # real token cost: main loop vs subagents vs workflow stages
195
+ ultracost reconcile --last # estimate vs ACTUAL, per stage, for your latest workflow run
196
+ ultracost calibrate # learn a token prior from your real runs (estimate uses it)
197
+ ultracost ledger # cumulative $ saved vs an all-opus baseline, persisted
198
+ ```
199
+
200
+ - **Self-calibrating.** `calibrate` learns real per-stage token sizes (outlier-filtered) into `~/.claude/ultracost/calibration.json`; `estimate`, `explain`, `simulate`, and the gate use it automatically — the estimate gets closer to your reality every run.
201
+ - **Savings ledger.** `ledger` keeps a running tally of what the policy saved you versus running everything on Opus, persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per run).
202
+ - **Pre-flight budget guard.** Set `budget.perRun` / `budget.perDay` in the policy and the cost gate **denies** a launch whose estimate would blow the cap — before it runs.
203
+
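The measure step above (deduplicate transcript entries, then price the usage with cache multipliers) might look roughly like this. It is a sketch under stated assumptions, not the code in `src/transcript.js` or `src/cost.js`: the record shape, the function names `dedupeEntries` and `priceUsage`, and every price and multiplier value are hypothetical placeholders.

```javascript
// Assumed record shape (hypothetical): one parsed JSONL line per entry, e.g.
// { requestId: 'req_1', message: { id: 'msg_1', usage: { input_tokens: 10 } } }
function dedupeEntries(entries) {
  const seen = new Set();
  const unique = [];
  for (const entry of entries) {
    const id = entry?.message?.id;
    const requestId = entry?.requestId;
    // Entries missing either identifier cannot be matched, so keep them.
    if (id == null || requestId == null) {
      unique.push(entry);
      continue;
    }
    const key = `${id}:${requestId}`;
    if (seen.has(key)) continue; // same message logged twice: count it once
    seen.add(key);
    unique.push(entry);
  }
  return unique;
}

// Placeholder numbers for illustration only; real values would come from
// the pricing data and the policy's cache multipliers.
const EXAMPLE_PRICE = { inputPerMTok: 3, outputPerMTok: 15 };
const EXAMPLE_MULTIPLIERS = { cacheWrite: 1.25, cacheRead: 0.1 };

// Price one usage object in dollars: cache writes and reads are billed as
// a multiple of the base input price.
function priceUsage(usage, price = EXAMPLE_PRICE, mult = EXAMPLE_MULTIPLIERS) {
  const input = usage.input_tokens ?? 0;
  const output = usage.output_tokens ?? 0;
  const cacheWrite = usage.cache_creation_input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const inputSide =
    (input + cacheWrite * mult.cacheWrite + cacheRead * mult.cacheRead) *
    price.inputPerMTok;
  return (inputSide + output * price.outputPerMTok) / 1e6;
}
```

Summing `priceUsage` over the deduplicated entries of one `agent-*.jsonl` file would give a per-stage actual to compare against that stage's estimate.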
204
+ ## Understand and compare a workflow
205
+
206
+ ```bash
207
+ ultracost explain ./wf.js # per-stage: tier, effort, est cost, and which UC checks fire
208
+ ultracost simulate ./wf.js # cost under all-opus vs your tiered pins vs all-sonnet
209
+ ultracost diff old.js new.js # cost delta between two versions (--ci → PR-comment table)
210
+ ```
211
+
204
212
  ## How routing is decided
205
213
 
206
214
  | Tier | Model | Use for |
@@ -229,9 +237,12 @@ wf.js:4:13 UC003 stage pins banned model "haiku" (policy.neverUse)
229
237
  | `UC002` | options object present, no `model` |
230
238
  | `UC003` | model resolves to a banned model (e.g. haiku) |
231
239
  | `UC004` | `model: 'inherit'` while `allowInherit` is false |
232
- | `UC005` | options passed as a variable — can't verify (warning) |
240
+ | `UC005` | model/options is a dynamic expression — can't verify (warning) |
241
+ | `UC006` | the pinned model mismatches the work the prompt describes (warning) |
242
+ | `UC007` | `effort` exceeds the model's cap, e.g. `sonnet` @ `xhigh` (warning) |
243
+ | `UC008` | an `alwaysOpus` role (orchestrator, consolidation, …) pins a cheaper tier (warning) |
233
244
 
234
- The scanner is string- and comment-aware: an `agent(` that appears inside a prompt string or a comment is prose, not a call, and is never flagged. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. Exit code is non-zero when errors are found.
245
+ The scanner runs on a hand-rolled, zero-dependency JS tokenizer, so it's robust to template literals, spreads, optional-call `agent?.()`, and dynamic model values — and an `agent(` inside a prompt string or comment is prose, never a call. Fan-out detection covers `.map`/`.flatMap`/`forEach`/`for…of`/`Promise.all`/`Array.from`/`pipeline`. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. `UC006`–`UC008` are advisory warnings and never fail the build on their own; exit code is non-zero only when pin-presence errors (`UC001`–`UC004`) are found.
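To illustrate why a scanner that tracks strings and comments avoids false positives, here is a deliberately small sketch. It is not ultracost's `src/lexer.js`: the function name `findAgentCalls` is hypothetical, and the sketch ignores regex literals and `${...}` expressions inside template literals, which a real tokenizer has to handle.

```javascript
// Matches `agent(` or the optional-call form `agent?.(` at a fixed offset.
const CALL = /agent\s*(?:\?\.)?\s*\(/y;

// Hypothetical sketch: report { line, column } for each agent call site,
// skipping anything that sits inside a comment or a string literal.
function findAgentCalls(source) {
  const calls = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: skip to the end of the line.
      const end = source.indexOf('\n', i);
      i = end === -1 ? source.length : end + 1;
    } else if (ch === '/' && next === '*') {
      // Block comment: skip past the closing marker.
      const end = source.indexOf('*/', i + 2);
      i = end === -1 ? source.length : end + 2;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      // String or template literal: skip to the matching unescaped quote.
      i += 1;
      while (i < source.length && source[i] !== ch) {
        i += source[i] === '\\' ? 2 : 1;
      }
      i += 1;
    } else {
      const prev = i > 0 ? source[i - 1] : '';
      CALL.lastIndex = i;
      // Reject matches that are the tail of a longer identifier (myagent).
      if (!/[\w$]/.test(prev) && CALL.test(source)) {
        const before = source.slice(0, i);
        calls.push({
          line: before.split('\n').length,
          column: i - before.lastIndexOf('\n'),
        });
        i = CALL.lastIndex;
      } else {
        i += 1;
      }
    }
  }
  return calls;
}
```

A plain regex over the whole file would also report the `agent(` inside a prompt string, which is exactly the false positive the paragraph above rules out.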
235
246
 
236
247
  ## Audit your history
237
248
 
@@ -285,7 +296,7 @@ Fails the build if any committed workflow script has a stage that would inherit
285
296
 
286
297
  ## How it compares
287
298
 
288
- ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer)) score every prompt and route the *main loop*. ultracost targets the **dynamic-workflow / ultracode** path and adds a **guard** that statically verifies stage-level pins which none of them do. See [`NOTICE`](./NOTICE) for prior-art credits.
299
+ ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer), [model-matchmaker](https://github.com/coyvalyss1/model-matchmaker)) score every prompt and route the *main loop* at runtime. Linters like [claudelint](https://github.com/pdugan20/claudelint) validate a *file-based* agent's `model:` value. ultracost targets the **dynamic-workflow / ultracode** path and is, as far as we can tell, the only tool that **statically detects an unpinned inline `agent()`/`pipeline()` stage, flags a pin that mismatches the work the prompt describes, and reconciles its own cost estimate against real per-stage token usage**. Cost tooling like [ccusage](https://github.com/ryoppippi/ccusage), [tokencast](https://github.com/krulewis/tokencast), and [tokentoll](https://github.com/Jwrede/tokentoll) informed the transcript-parsing, calibration, and cost-diff approaches (reimplemented clean-room). See [`NOTICE`](./NOTICE) for prior-art credits.
289
300
 
290
301
  ## Documentation
291
302