npm - ultracost - Versions diffs - 0.2.0 → 0.3.0 - Mend

ultracost 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27) hide show

package/CHANGELOG.md +53 -2
package/NOTICE +16 -3
package/README.md +62 -51
package/bin/cli.js +514 -117
package/docs/ESTIMATES.md +24 -0
package/docs/PUBLISHING.md +37 -35
package/docs/TESTING.md +4 -15
package/docs/architecture.md +28 -0
package/docs/policy.md +25 -2
package/package.json +1 -1
package/src/classify.js +125 -0
package/src/cost.js +54 -0
package/src/detect.js +93 -0
package/src/estimate.js +18 -0
package/src/guard.js +244 -166
package/src/index.js +7 -1
package/src/lexer.js +227 -0
package/src/log.js +20 -13
package/src/loop.js +143 -0
package/src/paths.js +10 -0
package/src/policy.js +14 -0
package/src/render.js +211 -0
package/src/rules.js +17 -5
package/src/transcript.js +186 -0
package/templates/hooks/reinject.mjs +21 -18
package/templates/hooks/workflow-gate.mjs +51 -45
package/templates/policy.default.json +15 -2

package/CHANGELOG.md CHANGED Viewed

@@ -6,6 +6,56 @@ All notable changes to this project are documented here. The format is based on
 ## [Unreleased]
+## [0.3.0] - 2026-06-14
+Phase 2 — closed-loop precision and a zero-dependency visual overhaul. Still zero runtime
+dependencies, Claude-Code-only, and offline on the hot path.
+### Added
+- **Closed-loop, self-calibrating estimates.** New `src/transcript.js` reads local Claude Code
+  session transcripts offline (clean-room parse + dedup on `message.id`+`requestId`) and
+  attributes tokens **per dynamic-workflow stage** via `subagents/workflows/wf_*/agent-*.jsonl`
+  + `journal.jsonl`. `src/cost.js` prices real usage with cache multipliers.
+  - `ultracost usage` — real token cost from your transcripts (main vs subagents vs workflow stages).
+  - `ultracost reconcile [--last|<wfId>]` — estimate-vs-actual per stage for a real run.
+  - `ultracost calibrate` — learns a token prior from your runs (outlier-filtered) into
+    `~/.claude/ultracost/calibration.json`; `estimate`/`explain`/`simulate`/the gate use it automatically.
+  - `ultracost ledger` (alias `savings`) — cumulative savings vs an all-opus baseline,
+    persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per workflow id).
+- **Sharper static guard.** `src/guard.js` now runs on a hand-rolled zero-dep JS tokenizer
+  (`src/lexer.js`) instead of regex: dynamic model values, template literals, spreads, and
+  optional-call `agent?.()` are handled, and fan-out detection covers `forEach`, `for…of`,
+  `Promise.all([...map])`, and `Array.from` (not just `.map`/`pipeline`).
+- **New guard codes.** `UC006` flags a pin that mismatches the work the prompt describes,
+  `UC007` flags effort over the model's cap, `UC008` flags an `alwaysOpus` role pinned off-opus.
+  Deterministic, offline tier scoring lives in `src/classify.js`.
+- **`ultracost explain` / `simulate` / `diff`.** Per-stage rationale (tier, effort, est cost,
+  check flags); cost under all-opus / tiered / all-sonnet side by side; and a cost delta between
+  two workflow versions, with `--ci` emitting an Infracost-style PR-comment table.
+- **Pre-flight budget guard.** `policy.budget.perRun` / `perDay` make the cost gate hard-deny an
+  over-budget launch before it runs (per-day reads the savings ledger).
+- **Zero-dependency visual overhaul.** New `src/render.js` (truecolor/256/16 with NO_COLOR and
+  FORCE_COLOR support, ANSI-aware width via `util.stripVTControlCharacters` + `Intl.Segmenter`,
+  box-drawing tables, bars, sparklines, rounded panels) reskins every command and the cost gate's
+  message (now an aligned multi-line cost table).
+### Changed
+- **`status` and `doctor` are plugin-aware.** New `src/detect.js` reports how ultracost is
+  delivered (`plugin` / `cli` / `both` / `none`) by reading `enabledPlugins` (in `settings.json`
+  **and** `settings.local.json`) and the plugin cache `hooks/hooks.json` — so they no longer
+  report the active plugin as "off / N issues". Both surface the bypass-mode caveat.
+- **`init` refuses to double-install.** When the plugin already delivers ultracost, `init` stops
+  (unless `--force`) so it can't write duplicate `~/.claude` rules that conflict with the plugin.
+  CLI hints are `npx`-aware.
+- **One source for the routing prose.** The SessionStart hook (`reinject.mjs`) now compiles the
+  injected policy from `src/rules.js` at runtime, and `skills/ultracost/SKILL.md` is generated from
+  the same `compileRules()` (drift-tested), so the CLAUDE.md block, the hook, and the skill cannot
+  diverge. The injected prose no longer assumes a global `ultracost` binary (plugin users have none).
+- **Policy `version` bumped to 2**: adds `classify.keywords`, `budget`, and
+  `estimation.cacheMultipliers` (all optional, back-compatible).
+## [0.2.1] - 2026-06-14
 ### Changed
 - **The cost gate is now mode-aware and hard in every permission mode.** It reads
   `permission_mode` from the `PreToolUse` event: it asks (with the estimate) in
@@ -74,8 +124,9 @@ All notable changes to this project are documented here. The format is based on
 - Data-driven `policy.json` with load-time validation (rejects undefined default tiers
   and tiers whose model is in `neverUse`).
 - `ultracost status`, `ultracost doctor`, `ultracost uninstall`.
-- Docs: architecture, policy reference, and the ultracode rationale.
-[Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...HEAD
+[Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.3.0...HEAD
+[0.3.0]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...v0.3.0
+[0.2.1]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/danielkremen818/ultracost/compare/v0.1.0...v0.2.0
 [0.1.0]: https://github.com/danielkremen818/ultracost/releases/tag/v0.1.0

package/NOTICE CHANGED Viewed

@@ -11,8 +11,21 @@ Prior art / inspiration:
   - 0xrdan/claude-router          — automatic per-prompt routing via hooks
   - gacabartosz/claude-smart-router — research-inspired complexity scoring
   - R4CK/claude-model-changer     — quota-aware complexity routing
+  - coyvalyss1/model-matchmaker   — keyword->tier routing rubric
+  - ryoppippi/ccusage (MIT)       — the documented Claude Code transcript parse + dedup contract
+  - Anthropic session-report skill (Apache-2.0) — local transcript token-attribution algorithm
+  - krulewis/tokencast (MIT)      — pre-flight estimate + calibration-from-actuals approach
+  - Jwrede/tokentoll (MIT)        — "Infracost for LLM" cost-diff-on-PR UX
+The wrong-tier keyword rubric (src/classify.js) was informed by the public router
+rubrics above; the transcript parse/dedup and calibration were reimplemented
+clean-room from the documented behavior of ccusage / the session-report skill —
+no code was copied. ultracost remains MIT and zero-dependency.
 ultracost's distinct contribution is per-stage routing for dynamic workflows
-(ultracode): a quality-first policy plus a static-analysis guard that inspects
-the workflow scripts Claude Code authors and flags subagent stages that would
-silently inherit the session model.
+(ultracode): a quality-first policy; a static-analysis guard that inspects the
+workflow scripts Claude Code authors and flags stages that would silently inherit
+the session model, pin a model that mismatches the task, or exceed the effort cap;
+and a closed loop that reconciles its own estimates against real per-stage token
+usage. No other tool statically detects an unpinned inline agent()/pipeline() stage
+or a pin that mismatches the work the prompt describes.

package/README.md CHANGED Viewed

@@ -1,5 +1,7 @@
 <div align="center">
+<img src="./assets/logo.png" alt="ultracost logo" width="150">
 # ultracost
 **Per-stage model routing for Claude Code dynamic workflows.**
@@ -29,16 +31,22 @@ guard that fails any unpinned stage.
 No telemetry. No network on the hot path. MIT.
-**Setup (plugin):**
+**Setup (Claude Code plugin):**
 ```text
 /plugin marketplace add danielkremen818/ultracost
 /plugin install ultracost@ultracost
 ```
-**Built-in command:** `/ultracost:check [script]` — flag any `agent()` stage missing a model pin.
+**Or via the npm CLI** (CI / scripting):
-**CLI verbs:** `init · check · audit · estimate · pricing · status · doctor · uninstall`.
+```bash
+npx ultracost init
+```
+**Command:** `ultracost check [script]` (or `/ultracost:check` in Claude Code) — flag any `agent()` stage missing a model pin.
+**CLI verbs:** `init · check · audit · estimate · explain · simulate · diff · usage · reconcile · calibrate · ledger · pricing · status · doctor · uninstall`.
 <div align="center">
@@ -77,35 +85,11 @@ ultracost makes the routing **explicit, policy-driven, and verifiable** — with
 One shared core in `src/`, two delivery surfaces: a Claude Code **plugin** (primary) and an **npm CLI** (secondary). Both compile from the same `policy.json`.
-```mermaid
-flowchart TD
-    subgraph core["src/ — shared core"]
-        POL["policy.js<br/>(policy.json — you own it)"]
-        RUL["rules.js<br/>(rule compiler)"]
-        GRD["guard.js<br/>(static analysis)"]
-    end
-    subgraph plugin["Claude Code plugin — PRIMARY"]
-        SK["skills/ultracost<br/>routing policy (always-relevant)"]
-        CMD["/ultracost:check command"]
-        HK["hooks.json<br/>SessionStart (all sources)"]
-    end
-    subgraph cli["npm CLI — secondary"]
-        BIN["bin/cli.js<br/>init · check · audit · doctor · status · uninstall"]
-    end
-    POL --> RUL
-    RUL --> SK
-    RUL --> BIN
-    GRD --> CMD
-    GRD --> BIN
-    HK --> RE["reinject.mjs<br/>(node, no bash/jq)"]
-    BIN --> RE
-    classDef ft fill:#1f6feb,stroke:#0b3d91,color:#fff;
-    class POL,RUL,GRD ft;
-```
+<div align="center">
+<img src="./assets/architecture.svg" alt="ultracost architecture — policy.json compiles through the src/ core into a Claude Code plugin and an npm CLI; the guard and a PreToolUse cost gate verify every workflow stage at runtime" width="960">
+</div>
 The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The guard is the enforcement layer the model can't talk its way out of. See [`docs/architecture.md`](./docs/architecture.md) for the full picture.
@@ -113,7 +97,7 @@ The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The
 ### Plugin (recommended)
-Inside Claude Code, add the marketplace and install the plugin:
+Inside Claude Code:
 ```text
 /plugin marketplace add danielkremen818/ultracost
@@ -126,26 +110,23 @@ Then verify a workflow script at any time:
 /ultracost:check ./path/to/workflow.js
 ```
-The plugin bundles four things and **touches none of your existing files**:
+The plugin bundles — touching none of your own files — a `SessionStart` policy-injection hook, a `PreToolUse` cost gate on the `Workflow` tool (`ULTRACOST_GATE=off` to disable), the `/ultracost:check` command, and a routing-policy skill. Requires Claude Code with the `/plugin` command and dynamic workflows enabled.
-- a **`SessionStart`** hook that injects the routing policy as context at session start and after compaction (the always-on guidance),
-- a **`PreToolUse` cost gate** on the `Workflow` tool that hard-stops every dynamic-workflow launch with an estimate (set `ULTRACOST_GATE=off` to disable; see [`docs/ESTIMATES.md`](./docs/ESTIMATES.md)),
-- the **`/ultracost:check`** command (the Workflow Guard), and
-- a **routing-policy skill** for explicit reference when Claude authors workflow/ultracode/subagent scripts.
+### npm CLI
-> Requires Claude Code with the `/plugin` command (run `/help` to confirm) and dynamic workflows enabled.
+```bash
+npx ultracost init
+```
-### npm CLI (professional secondary)
+This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config. Requires Node ≥ 24.
-For CI, scripting, or if you prefer the `~/.claude/CLAUDE.md` injection path:
+Then verify a workflow script at any time:
 ```bash
-npx ultracost init
+ultracost check ./path/to/workflow.js
 ```
-This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config.
-> Requires Node ≥ 24.
+> Use the npm path for CI/scripting or the CLAUDE.md-injection workflow; for day-to-day use in Claude Code, the plugin above is simpler.
 ## Uninstall
@@ -156,7 +137,7 @@ This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~
 /plugin marketplace remove ultracost
 ```
-The plugin touches none of your own files — its hook, command, and skill live inside the plugin package, so removing it removes everything; nothing is left in your `~/.claude/CLAUDE.md` or `settings.json`. If Claude Code offers to "disable in `settings.local.json`" instead (because the plugin was enabled in a shared `settings.json`), that has the same effect — accept it, or remove the marketplace as above.
+The plugin touches none of your own files, so removing it removes everything ultracost added.
 ### npm CLI
@@ -169,12 +150,16 @@ Reverses everything `init` did: removes the routing block from `~/.claude/CLAUDE
 ## Quickstart (CLI)
 ```bash
-ultracost init                      # install policy + rules + hook
-ultracost status                    # see the active policy and install state
+ultracost init                      # install policy + rules + hook (refuses if the plugin already delivers it)
+ultracost status                    # active policy + how it's delivered (plugin/cli) + bypass caveat
 ultracost audit ~/.claude/projects  # pin stats across your real workflow scripts
 ultracost check ./path/to/workflow  # scan a workflow script (or a directory)
 ultracost check . --fix             # auto-pin the default model on unpinned stages
 ultracost estimate ./workflow.js    # agents, model mix, and cost vs all-opus baseline
+ultracost explain ./workflow.js     # per-stage rationale + which checks fire
+ultracost reconcile --last          # estimate vs actual for your latest real run
+ultracost calibrate                 # tune the estimator from your real token usage
+ultracost ledger                    # cumulative savings vs all-opus
 ultracost pricing refresh           # update prices from Anthropic's official page
 ```
@@ -201,6 +186,29 @@ $ ultracost estimate ./workflow.js
 Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges; the interactive 3-option menu needs a TUI. Full detail, assumptions, and the gate's [#52343](https://github.com/anthropics/claude-code/issues/52343) limitation are in [`docs/ESTIMATES.md`](./docs/ESTIMATES.md).
+## The closed loop: measure, reconcile, calibrate
+ultracost doesn't just estimate — it reads its own results back and tunes itself. It parses your **local** Claude Code transcripts (offline; no network, no telemetry) and attributes tokens **per dynamic-workflow stage** via the `subagents/workflows/wf_*/agent-*.jsonl` files Claude Code writes. No other router does this.
+```bash
+ultracost usage                 # real token cost: main loop vs subagents vs workflow stages
+ultracost reconcile --last      # estimate vs ACTUAL, per stage, for your latest workflow run
+ultracost calibrate             # learn a token prior from your real runs (estimate uses it)
+ultracost ledger                # cumulative $ saved vs an all-opus baseline, persisted
+```
+- **Self-calibrating.** `calibrate` learns real per-stage token sizes (outlier-filtered) into `~/.claude/ultracost/calibration.json`; `estimate`, `explain`, `simulate`, and the gate use it automatically — the estimate gets closer to your reality every run.
+- **Savings ledger.** `ledger` keeps a running tally of what the policy saved you versus running everything on Opus, persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per run).
+- **Pre-flight budget guard.** Set `budget.perRun` / `budget.perDay` in the policy and the cost gate **denies** a launch whose estimate would blow the cap — before it runs.
+## Understand and compare a workflow
+```bash
+ultracost explain ./wf.js       # per-stage: tier, effort, est cost, and which UC checks fire
+ultracost simulate ./wf.js      # cost under all-opus vs your tiered pins vs all-sonnet
+ultracost diff old.js new.js    # cost delta between two versions (--ci → PR-comment table)
+```
 ## How routing is decided
 | Tier | Model | Use for |
@@ -229,9 +237,12 @@ wf.js:4:13  UC003  stage pins banned model "haiku" (policy.neverUse)
 | `UC002` | options object present, no `model` |
 | `UC003` | model resolves to a banned model (e.g. haiku) |
 | `UC004` | `model: 'inherit'` while `allowInherit` is false |
-| `UC005` | options passed as a variable — can't verify (warning) |
+| `UC005` | model/options is a dynamic expression — can't verify (warning) |
+| `UC006` | the pinned model mismatches the work the prompt describes (warning) |
+| `UC007` | `effort` exceeds the model's cap, e.g. `sonnet` @ `xhigh` (warning) |
+| `UC008` | an `alwaysOpus` role (orchestrator, consolidation, …) pins a cheaper tier (warning) |
-The scanner is string- and comment-aware: an `agent(` that appears inside a prompt string or a comment is prose, not a call, and is never flagged. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. Exit code is non-zero when errors are found.
+The scanner runs on a hand-rolled, zero-dependency JS tokenizer, so it's robust to template literals, spreads, optional-call `agent?.()`, and dynamic model values — and an `agent(` inside a prompt string or comment is prose, never a call. Fan-out detection covers `.map`/`.flatMap`/`forEach`/`for…of`/`Promise.all`/`Array.from`/`pipeline`. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. `UC006`–`UC008` are advisory warnings and never fail the build on their own; exit code is non-zero only when pin-presence errors (`UC001`–`UC004`) are found.
 ## Audit your history
@@ -285,7 +296,7 @@ Fails the build if any committed workflow script has a stage that would inherit
 ## How it compares
-ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer)) score every prompt and route the *main loop*. ultracost targets the **dynamic-workflow / ultracode** path and adds a **guard** that statically verifies stage-level pins — which none of them do. See [`NOTICE`](./NOTICE) for prior-art credits.
+ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer), [model-matchmaker](https://github.com/coyvalyss1/model-matchmaker)) score every prompt and route the *main loop* at runtime. Linters like [claudelint](https://github.com/pdugan20/claudelint) validate a *file-based* agent's `model:` value. ultracost targets the **dynamic-workflow / ultracode** path and is, as far as we can tell, the only tool that **statically detects an unpinned inline `agent()`/`pipeline()` stage, flags a pin that mismatches the work the prompt describes, and reconciles its own cost estimate against real per-stage token usage**. Cost tooling like [ccusage](https://github.com/ryoppippi/ccusage), [tokencast](https://github.com/krulewis/tokencast), and [tokentoll](https://github.com/Jwrede/tokentoll) informed the transcript-parsing, calibration, and cost-diff approaches (reimplemented clean-room). See [`NOTICE`](./NOTICE) for prior-art credits.
 ## Documentation