ultracost 0.2.1 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -6,6 +6,70 @@ All notable changes to this project are documented here. The format is based on
6
6
 
7
7
  ## [Unreleased]
8
8
 
9
+ ## [0.3.1] - 2026-06-14
10
+
11
+ ### Added
12
+ - **A slash command for every verb inside Claude Code.** The plugin previously shipped only
13
+ `/ultracost:check`, so the phase-2 features were reachable solely through the npm CLI. It now
14
+ ships `/ultracost:estimate · explain · simulate · diff · audit · usage · reconcile · calibrate ·
15
+ ledger · status` as well — each runs the bundled CLI via `${CLAUDE_PLUGIN_ROOT}/bin/cli.js`, so
16
+ plugin users need nothing on `PATH`. Generated from one table (`scripts/generate-commands.mjs`)
17
+ and drift-tested.
18
+
19
+ ### Changed
20
+ - **README is plugin-first.** The first-use example is `/ultracost:check ./path/to/workflow.js`
21
+ (the slash command), not the npm `ultracost check` — which plugin users don't have on `PATH`.
22
+ Added a full slash-command table and noted the `/ultracost:<verb>` equivalent next to each CLI
23
+ block. The npm CLI is now framed as the CI/scripting path.
24
+
25
+ ## [0.3.0] - 2026-06-14
26
+
27
+ Phase 2 — closed-loop precision and a zero-dependency visual overhaul. Still zero runtime
28
+ dependencies, Claude-Code-only, and offline on the hot path.
29
+
30
+ ### Added
31
+ - **Closed-loop, self-calibrating estimates.** New `src/transcript.js` reads local Claude Code
32
+ session transcripts offline (clean-room parse + dedup on `message.id`+`requestId`) and
33
+ attributes tokens **per dynamic-workflow stage** via `subagents/workflows/wf_*/agent-*.jsonl`
34
+ + `journal.jsonl`. `src/cost.js` prices real usage with cache multipliers.
35
+ - `ultracost usage` — real token cost from your transcripts (main vs subagents vs workflow stages).
36
+ - `ultracost reconcile [--last|<wfId>]` — estimate-vs-actual per stage for a real run.
37
+ - `ultracost calibrate` — learns a token prior from your runs (outlier-filtered) into
38
+ `~/.claude/ultracost/calibration.json`; `estimate`/`explain`/`simulate`/the gate use it automatically.
39
+ - `ultracost ledger` (alias `savings`) — cumulative savings vs an all-opus baseline,
40
+ persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per workflow id).
41
+ - **Sharper static guard.** `src/guard.js` now runs on a hand-rolled zero-dep JS tokenizer
42
+ (`src/lexer.js`) instead of regex: dynamic model values, template literals, spreads, and
43
+ optional-call `agent?.()` are handled, and fan-out detection covers `forEach`, `for…of`,
44
+ `Promise.all([...map])`, and `Array.from` (not just `.map`/`pipeline`).
45
+ - **New guard codes.** `UC006` flags a pin that mismatches the work the prompt describes,
46
+ `UC007` flags effort over the model's cap, `UC008` flags an `alwaysOpus` role pinned off-opus.
47
+ Deterministic, offline tier scoring lives in `src/classify.js`.
48
+ - **`ultracost explain` / `simulate` / `diff`.** Per-stage rationale (tier, effort, est cost,
49
+ check flags); cost under all-opus / tiered / all-sonnet side by side; and a cost delta between
50
+ two workflow versions, with `--ci` emitting an Infracost-style PR-comment table.
51
+ - **Pre-flight budget guard.** `policy.budget.perRun` / `perDay` make the cost gate hard-deny an
52
+ over-budget launch before it runs (per-day reads the savings ledger).
53
+ - **Zero-dependency visual overhaul.** New `src/render.js` (truecolor/256/16 with NO_COLOR and
54
+ FORCE_COLOR support, ANSI-aware width via `util.stripVTControlCharacters` + `Intl.Segmenter`,
55
+ box-drawing tables, bars, sparklines, rounded panels) reskins every command and the cost gate's
56
+ message (now an aligned multi-line cost table).
57
+
58
+ ### Changed
59
+ - **`status` and `doctor` are plugin-aware.** New `src/detect.js` reports how ultracost is
60
+ delivered (`plugin` / `cli` / `both` / `none`) by reading `enabledPlugins` (in `settings.json`
61
+ **and** `settings.local.json`) and the plugin cache `hooks/hooks.json` — so they no longer
62
+ report the active plugin as "off / N issues". Both surface the bypass-mode caveat.
63
+ - **`init` refuses to double-install.** When the plugin already delivers ultracost, `init` stops
64
+ (unless `--force`) so it can't write duplicate `~/.claude` rules that conflict with the plugin.
65
+ CLI hints are `npx`-aware.
66
+ - **One source for the routing prose.** The SessionStart hook (`reinject.mjs`) now compiles the
67
+ injected policy from `src/rules.js` at runtime, and `skills/ultracost/SKILL.md` is generated from
68
+ the same `compileRules()` (drift-tested), so the CLAUDE.md block, the hook, and the skill cannot
69
+ diverge. The injected prose no longer assumes a global `ultracost` binary (plugin users have none).
70
+ - **Policy `version` bumped to 2**: adds `classify.keywords`, `budget`, and
71
+ `estimation.cacheMultipliers` (all optional, back-compatible).
72
+
9
73
  ## [0.2.1] - 2026-06-14
10
74
 
11
75
  ### Changed
@@ -77,7 +141,9 @@ All notable changes to this project are documented here. The format is based on
77
141
  and tiers whose model is in `neverUse`).
78
142
  - `ultracost status`, `ultracost doctor`, `ultracost uninstall`.
79
143
 
80
- [Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...HEAD
144
+ [Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.3.1...HEAD
145
+ [0.3.1]: https://github.com/danielkremen818/ultracost/compare/v0.3.0...v0.3.1
146
+ [0.3.0]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...v0.3.0
81
147
  [0.2.1]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...v0.2.1
82
148
  [0.2.0]: https://github.com/danielkremen818/ultracost/compare/v0.1.0...v0.2.0
83
149
  [0.1.0]: https://github.com/danielkremen818/ultracost/releases/tag/v0.1.0
package/NOTICE CHANGED
@@ -11,8 +11,21 @@ Prior art / inspiration:
11
11
  - 0xrdan/claude-router — automatic per-prompt routing via hooks
12
12
  - gacabartosz/claude-smart-router — research-inspired complexity scoring
13
13
  - R4CK/claude-model-changer — quota-aware complexity routing
14
+ - coyvalyss1/model-matchmaker — keyword->tier routing rubric
15
+ - ryoppippi/ccusage (MIT) — the documented Claude Code transcript parse + dedup contract
16
+ - Anthropic session-report skill (Apache-2.0) — local transcript token-attribution algorithm
17
+ - krulewis/tokencast (MIT) — pre-flight estimate + calibration-from-actuals approach
18
+ - Jwrede/tokentoll (MIT) — "Infracost for LLM" cost-diff-on-PR UX
19
+
20
+ The wrong-tier keyword rubric (src/classify.js) was informed by the public router
21
+ rubrics above; the transcript parse/dedup and calibration were reimplemented
22
+ clean-room from the documented behavior of ccusage / the session-report skill —
23
+ no code was copied. ultracost remains MIT and zero-dependency.
14
24
 
15
25
  ultracost's distinct contribution is per-stage routing for dynamic workflows
16
- (ultracode): a quality-first policy plus a static-analysis guard that inspects
17
- the workflow scripts Claude Code authors and flags subagent stages that would
18
- silently inherit the session model.
26
+ (ultracode): a quality-first policy; a static-analysis guard that inspects the
27
+ workflow scripts Claude Code authors and flags stages that would silently inherit
28
+ the session model, pin a model that mismatches the task, or exceed the effort cap;
29
+ and a closed loop that reconciles its own estimates against real per-stage token
30
+ usage. No other tool statically detects an unpinned inline agent()/pipeline() stage
31
+ or a pin that mismatches the work the prompt describes.
package/README.md CHANGED
@@ -31,17 +31,22 @@ guard that fails any unpinned stage.
31
31
 
32
32
  No telemetry. No network on the hot path. MIT.
33
33
 
34
- **Setup:**
34
+ **Setup (Claude Code plugin):**
35
+
36
+ ```text
37
+ /plugin marketplace add danielkremen818/ultracost
38
+ /plugin install ultracost@ultracost
39
+ ```
40
+
41
+ **Or via the npm CLI** (CI / scripting):
35
42
 
36
43
  ```bash
37
44
  npx ultracost init
38
45
  ```
39
46
 
40
- > A Claude Code plugin is planned; today ultracost installs via the npm CLI.
41
-
42
- **Command:** `ultracost check [script]` (or `/ultracost:check` in Claude Code) — flag any `agent()` stage missing a model pin.
47
+ **First command (in Claude Code):** `/ultracost:check ./path/to/workflow.js` — flag any `agent()` stage that would silently inherit Opus. The plugin ships a slash command for every verb (`/ultracost:check · estimate · explain · simulate · diff · audit · usage · reconcile · calibrate · ledger · status`) — no global binary needed.
43
48
 
44
- **CLI verbs:** `init · check · audit · estimate · pricing · status · doctor · uninstall`.
49
+ **Same verbs on the npm CLI** (CI / scripting): `ultracost init · check · audit · estimate · explain · simulate · diff · usage · reconcile · calibrate · ledger · pricing · status · doctor · uninstall`.
45
50
 
46
51
  <div align="center">
47
52
 
@@ -78,7 +83,7 @@ ultracost makes the routing **explicit, policy-driven, and verifiable** — with
78
83
 
79
84
  ## Architecture
80
85
 
81
- One shared core in `src/`, two delivery surfaces: an **npm CLI** (the install path today) and a Claude Code **plugin** (planned). Both compile from the same `policy.json`.
86
+ One shared core in `src/`, two delivery surfaces: a Claude Code **plugin** (primary) and an **npm CLI** (secondary). Both compile from the same `policy.json`.
82
87
 
83
88
  <div align="center">
84
89
 
@@ -90,11 +95,46 @@ The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The
90
95
 
91
96
  ## Install
92
97
 
98
+ ### Plugin (recommended)
99
+
100
+ Inside Claude Code:
101
+
102
+ ```text
103
+ /plugin marketplace add danielkremen818/ultracost
104
+ /plugin install ultracost@ultracost
105
+ ```
106
+
107
+ Then, **without leaving Claude Code**, drive everything through slash commands — verify the workflow Claude just drafted, estimate it, or reconcile a finished run against what it actually cost:
108
+
109
+ ```text
110
+ /ultracost:check ./path/to/workflow.js
111
+ ```
112
+
113
+ The plugin bundles — touching none of your own files — a `SessionStart` policy-injection hook, a `PreToolUse` cost gate on the `Workflow` tool (`ULTRACOST_GATE=off` to disable), a routing-policy skill, and a slash command for every verb (each runs the bundled CLI via `${CLAUDE_PLUGIN_ROOT}`, so there's nothing to install on `PATH`):
114
+
115
+ | Command | What it does |
116
+ |---------|--------------|
117
+ | `/ultracost:check [path]` | Flag `agent()` stages that don't pin a model (or pin the wrong tier). Defaults to the most recent workflow script. |
118
+ | `/ultracost:estimate <script>` | Agent count, model mix, tiered cost vs an all-opus baseline. |
119
+ | `/ultracost:explain <script>` | Per-stage rationale: model, effort, the tier the prompt reads like, est cost, check flags. |
120
+ | `/ultracost:simulate <script>` | Cost under all-opus vs your tiered pins vs all-sonnet. |
121
+ | `/ultracost:diff <a> <b>` | Cost delta between two versions of a script. |
122
+ | `/ultracost:audit [dir]` | Pin stats across your real workflow scripts. |
123
+ | `/ultracost:usage [dir]` | Real token cost from local transcripts (main vs subagents vs stages). |
124
+ | `/ultracost:reconcile [--last\|<id>]` | Estimate vs **actual** per stage for a finished run. |
125
+ | `/ultracost:calibrate` | Tune the estimator from your real token usage. |
126
+ | `/ultracost:ledger` | Cumulative savings vs all-opus across recorded runs. |
127
+ | `/ultracost:status` | How ultracost is delivered (plugin/cli), the policy, and the bypass caveat. |
128
+
129
+ Requires Claude Code with the `/plugin` command and dynamic workflows enabled.
130
+
131
+ ### npm CLI
132
+
93
133
  ```bash
94
134
  npx ultracost init
95
135
  ```
96
136
 
97
- This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config.
137
+ This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config. Requires Node ≥ 24.
98
138
 
99
139
  Then verify a workflow script at any time:
100
140
 
@@ -102,25 +142,42 @@ Then verify a workflow script at any time:
102
142
  ultracost check ./path/to/workflow.js
103
143
  ```
104
144
 
105
- > Requires Node 24. A Claude Code plugin — bundling the `SessionStart` policy injection, a `PreToolUse` cost gate on the `Workflow` tool, the `/ultracost:check` command, and a routing-policy skill — ships in this repo and is planned as a separate install path; for now use the npm CLI above.
145
+ > Use the npm path for CI/scripting or the CLAUDE.md-injection workflow; for day-to-day use in Claude Code, the plugin above is simpler.
106
146
 
107
147
  ## Uninstall
108
148
 
149
+ ### Plugin
150
+
151
+ ```text
152
+ /plugin uninstall ultracost@ultracost
153
+ /plugin marketplace remove ultracost
154
+ ```
155
+
156
+ The plugin touches none of your own files, so removing it removes everything ultracost added.
157
+
158
+ ### npm CLI
159
+
109
160
  ```bash
110
161
  ultracost uninstall
111
162
  ```
112
163
 
113
164
  Reverses everything `init` did: removes the routing block from `~/.claude/CLAUDE.md`, deletes `~/.claude/ultracost/`, and unregisters the hook from `~/.claude/settings.json` (an invalid `settings.json` is reported, never overwritten).
114
165
 
115
- ## Quickstart (CLI)
166
+ ## Quickstart (npm CLI)
167
+
168
+ > Inside Claude Code, every verb below is also a slash command — `/ultracost:estimate`, `/ultracost:reconcile`, etc. (see the table above). This section is the npm path for CI, scripting, or the CLAUDE.md-injection workflow.
116
169
 
117
170
  ```bash
118
- ultracost init # install policy + rules + hook
119
- ultracost status # see the active policy and install state
171
+ ultracost init # install policy + rules + hook (refuses if the plugin already delivers it)
172
+ ultracost status # active policy + how it's delivered (plugin/cli) + bypass caveat
120
173
  ultracost audit ~/.claude/projects # pin stats across your real workflow scripts
121
174
  ultracost check ./path/to/workflow # scan a workflow script (or a directory)
122
175
  ultracost check . --fix # auto-pin the default model on unpinned stages
123
176
  ultracost estimate ./workflow.js # agents, model mix, and cost vs all-opus baseline
177
+ ultracost explain ./workflow.js # per-stage rationale + which checks fire
178
+ ultracost reconcile --last # estimate vs actual for your latest real run
179
+ ultracost calibrate # tune the estimator from your real token usage
180
+ ultracost ledger # cumulative savings vs all-opus
124
181
  ultracost pricing refresh # update prices from Anthropic's official page
125
182
  ```
126
183
 
@@ -147,6 +204,33 @@ $ ultracost estimate ./workflow.js
147
204
 
148
205
  Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges; the interactive 3-option menu needs a TUI. Full detail, assumptions, and the gate's [#52343](https://github.com/anthropics/claude-code/issues/52343) limitation are in [`docs/ESTIMATES.md`](./docs/ESTIMATES.md).
149
206
 
207
+ ## The closed loop: measure, reconcile, calibrate
208
+
209
+ ultracost doesn't just estimate — it reads its own results back and tunes itself. It parses your **local** Claude Code transcripts (offline; no network, no telemetry) and attributes tokens **per dynamic-workflow stage** via the `subagents/workflows/wf_*/agent-*.jsonl` files Claude Code writes. No other router does this.
210
+
211
+ ```bash
212
+ ultracost usage # real token cost: main loop vs subagents vs workflow stages
213
+ ultracost reconcile --last # estimate vs ACTUAL, per stage, for your latest workflow run
214
+ ultracost calibrate # learn a token prior from your real runs (estimate uses it)
215
+ ultracost ledger # cumulative $ saved vs an all-opus baseline, persisted
216
+ ```
217
+
218
+ In Claude Code these are `/ultracost:usage`, `/ultracost:reconcile`, `/ultracost:calibrate`, and `/ultracost:ledger` — same output, no CLI install.
219
+
220
+ - **Self-calibrating.** `calibrate` learns real per-stage token sizes (outlier-filtered) into `~/.claude/ultracost/calibration.json`; `estimate`, `explain`, `simulate`, and the gate use it automatically — the estimate gets closer to your reality every run.
221
+ - **Savings ledger.** `ledger` keeps a running tally of what the policy saved you versus running everything on Opus, persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per run).
222
+ - **Pre-flight budget guard.** Set `budget.perRun` / `budget.perDay` in the policy and the cost gate **denies** a launch whose estimate would blow the cap — before it runs.
223
+
224
+ ## Understand and compare a workflow
225
+
226
+ ```bash
227
+ ultracost explain ./wf.js # per-stage: tier, effort, est cost, and which UC checks fire
228
+ ultracost simulate ./wf.js # cost under all-opus vs your tiered pins vs all-sonnet
229
+ ultracost diff old.js new.js # cost delta between two versions (--ci → PR-comment table)
230
+ ```
231
+
232
+ Or `/ultracost:explain`, `/ultracost:simulate`, `/ultracost:diff` inside Claude Code.
233
+
150
234
  ## How routing is decided
151
235
 
152
236
  | Tier | Model | Use for |
@@ -175,9 +259,12 @@ wf.js:4:13 UC003 stage pins banned model "haiku" (policy.neverUse)
175
259
  | `UC002` | options object present, no `model` |
176
260
  | `UC003` | model resolves to a banned model (e.g. haiku) |
177
261
  | `UC004` | `model: 'inherit'` while `allowInherit` is false |
178
- | `UC005` | options passed as a variable — can't verify (warning) |
262
+ | `UC005` | model/options is a dynamic expression — can't verify (warning) |
263
+ | `UC006` | the pinned model mismatches the work the prompt describes (warning) |
264
+ | `UC007` | `effort` exceeds the model's cap, e.g. `sonnet` @ `xhigh` (warning) |
265
+ | `UC008` | an `alwaysOpus` role (orchestrator, consolidation, …) pins a cheaper tier (warning) |
179
266
 
180
- The scanner is string- and comment-aware: an `agent(` that appears inside a prompt string or a comment is prose, not a call, and is never flagged. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. Exit code is non-zero when errors are found.
267
+ The scanner runs on a hand-rolled, zero-dependency JS tokenizer, so it's robust to template literals, spreads, optional-call `agent?.()`, and dynamic model values — and an `agent(` inside a prompt string or comment is prose, never a call. Fan-out detection covers `.map`/`.flatMap`/`forEach`/`for…of`/`Promise.all`/`Array.from`/`pipeline`. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. `UC006`–`UC008` are advisory warnings and never fail the build on their own; exit code is non-zero only when pin-presence errors (`UC001`–`UC004`) are found.
181
268
 
182
269
  ## Audit your history
183
270
 
@@ -231,7 +318,7 @@ Fails the build if any committed workflow script has a stage that would inherit
231
318
 
232
319
  ## How it compares
233
320
 
234
- ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer)) score every prompt and route the *main loop*. ultracost targets the **dynamic-workflow / ultracode** path and adds a **guard** that statically verifies stage-level pins which none of them do. See [`NOTICE`](./NOTICE) for prior-art credits.
321
+ ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer), [model-matchmaker](https://github.com/coyvalyss1/model-matchmaker)) score every prompt and route the *main loop* at runtime. Linters like [claudelint](https://github.com/pdugan20/claudelint) validate a *file-based* agent's `model:` value. ultracost targets the **dynamic-workflow / ultracode** path and is, as far as we can tell, the only tool that **statically detects an unpinned inline `agent()`/`pipeline()` stage, flags a pin that mismatches the work the prompt describes, and reconciles its own cost estimate against real per-stage token usage**. Cost tooling like [ccusage](https://github.com/ryoppippi/ccusage), [tokencast](https://github.com/krulewis/tokencast), and [tokentoll](https://github.com/Jwrede/tokentoll) informed the transcript-parsing, calibration, and cost-diff approaches (reimplemented clean-room). See [`NOTICE`](./NOTICE) for prior-art credits.
235
322
 
236
323
  ## Documentation
237
324