npm - ultracost - Versions diffs - 0.2.1 → 0.3.1 - Mend

ultracost 0.2.1 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/CHANGELOG.md +67 -1
package/NOTICE +16 -3
package/README.md +101 -14
package/bin/cli.js +514 -117
package/docs/ESTIMATES.md +24 -0
package/docs/PUBLISHING.md +41 -34
package/docs/architecture.md +19 -1
package/docs/policy.md +25 -2
package/package.json +1 -1
package/src/classify.js +125 -0
package/src/cost.js +54 -0
package/src/detect.js +93 -0
package/src/estimate.js +18 -0
package/src/guard.js +244 -166
package/src/index.js +7 -1
package/src/lexer.js +227 -0
package/src/log.js +20 -13
package/src/loop.js +143 -0
package/src/paths.js +10 -0
package/src/policy.js +14 -0
package/src/render.js +211 -0
package/src/rules.js +17 -5
package/src/transcript.js +186 -0
package/templates/hooks/reinject.mjs +21 -18
package/templates/hooks/workflow-gate.mjs +51 -45
package/templates/policy.default.json +15 -2

package/CHANGELOG.md CHANGED

@@ -6,6 +6,70 @@ All notable changes to this project are documented here. The format is based on
 ## [Unreleased]
+## [0.3.1] - 2026-06-14
+### Added
+- **A slash command for every verb inside Claude Code.** The plugin previously shipped only
+  `/ultracost:check`, so the phase-2 features were reachable solely through the npm CLI. It now
+  ships `/ultracost:estimate · explain · simulate · diff · audit · usage · reconcile · calibrate ·
+  ledger · status` as well — each runs the bundled CLI via `${CLAUDE_PLUGIN_ROOT}/bin/cli.js`, so
+  plugin users need nothing on `PATH`. Generated from one table (`scripts/generate-commands.mjs`)
+  and drift-tested.
+### Changed
+- **README is plugin-first.** The first-use example is `/ultracost:check ./path/to/workflow.js`
+  (the slash command), not the npm `ultracost check` — which plugin users don't have on `PATH`.
+  Added a full slash-command table and noted the `/ultracost:<verb>` equivalent next to each CLI
+  block. The npm CLI is now framed as the CI/scripting path.
+## [0.3.0] - 2026-06-14
+Phase 2 — closed-loop precision and a zero-dependency visual overhaul. Still zero runtime
+dependencies, Claude-Code-only, and offline on the hot path.
+### Added
+- **Closed-loop, self-calibrating estimates.** New `src/transcript.js` reads local Claude Code
+  session transcripts offline (clean-room parse + dedup on `message.id`+`requestId`) and
+  attributes tokens **per dynamic-workflow stage** via `subagents/workflows/wf_*/agent-*.jsonl`
+  + `journal.jsonl`. `src/cost.js` prices real usage with cache multipliers.
+  - `ultracost usage` — real token cost from your transcripts (main vs subagents vs workflow stages).
+  - `ultracost reconcile [--last|<wfId>]` — estimate-vs-actual per stage for a real run.
+  - `ultracost calibrate` — learns a token prior from your runs (outlier-filtered) into
+    `~/.claude/ultracost/calibration.json`; `estimate`/`explain`/`simulate`/the gate use it automatically.
+  - `ultracost ledger` (alias `savings`) — cumulative savings vs an all-opus baseline,
+    persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per workflow id).
+- **Sharper static guard.** `src/guard.js` now runs on a hand-rolled zero-dep JS tokenizer
+  (`src/lexer.js`) instead of regex: dynamic model values, template literals, spreads, and
+  optional-call `agent?.()` are handled, and fan-out detection covers `forEach`, `for…of`,
+  `Promise.all([...map])`, and `Array.from` (not just `.map`/`pipeline`).
+- **New guard codes.** `UC006` flags a pin that mismatches the work the prompt describes,
+  `UC007` flags effort over the model's cap, `UC008` flags an `alwaysOpus` role pinned off-opus.
+  Deterministic, offline tier scoring lives in `src/classify.js`.
+- **`ultracost explain` / `simulate` / `diff`.** Per-stage rationale (tier, effort, est cost,
+  check flags); cost under all-opus / tiered / all-sonnet side by side; and a cost delta between
+  two workflow versions, with `--ci` emitting an Infracost-style PR-comment table.
+- **Pre-flight budget guard.** `policy.budget.perRun` / `perDay` make the cost gate hard-deny an
+  over-budget launch before it runs (per-day reads the savings ledger).
+- **Zero-dependency visual overhaul.** New `src/render.js` (truecolor/256/16 with NO_COLOR and
+  FORCE_COLOR support, ANSI-aware width via `util.stripVTControlCharacters` + `Intl.Segmenter`,
+  box-drawing tables, bars, sparklines, rounded panels) reskins every command and the cost gate's
+  message (now an aligned multi-line cost table).
+### Changed
+- **`status` and `doctor` are plugin-aware.** New `src/detect.js` reports how ultracost is
+  delivered (`plugin` / `cli` / `both` / `none`) by reading `enabledPlugins` (in `settings.json`
+  **and** `settings.local.json`) and the plugin cache `hooks/hooks.json` — so they no longer
+  report the active plugin as "off / N issues". Both surface the bypass-mode caveat.
+- **`init` refuses to double-install.** When the plugin already delivers ultracost, `init` stops
+  (unless `--force`) so it can't write duplicate `~/.claude` rules that conflict with the plugin.
+  CLI hints are `npx`-aware.
+- **One source for the routing prose.** The SessionStart hook (`reinject.mjs`) now compiles the
+  injected policy from `src/rules.js` at runtime, and `skills/ultracost/SKILL.md` is generated from
+  the same `compileRules()` (drift-tested), so the CLAUDE.md block, the hook, and the skill cannot
+  diverge. The injected prose no longer assumes a global `ultracost` binary (plugin users have none).
+- **Policy `version` bumped to 2**: adds `classify.keywords`, `budget`, and
+  `estimation.cacheMultipliers` (all optional, back-compatible).
 ## [0.2.1] - 2026-06-14
 ### Changed
@@ -77,7 +141,9 @@ All notable changes to this project are documented here. The format is based on
   and tiers whose model is in `neverUse`).
 - `ultracost status`, `ultracost doctor`, `ultracost uninstall`.
-[Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...HEAD
+[Unreleased]: https://github.com/danielkremen818/ultracost/compare/v0.3.1...HEAD
+[0.3.1]: https://github.com/danielkremen818/ultracost/compare/v0.3.0...v0.3.1
+[0.3.0]: https://github.com/danielkremen818/ultracost/compare/v0.2.1...v0.3.0
 [0.2.1]: https://github.com/danielkremen818/ultracost/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/danielkremen818/ultracost/compare/v0.1.0...v0.2.0
 [0.1.0]: https://github.com/danielkremen818/ultracost/releases/tag/v0.1.0
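
The 0.3.0 entry above describes a transcript parse that deduplicates on `message.id` + `requestId`. A minimal sketch of that idea follows, assuming each transcript line is a JSON object carrying `requestId` and `message.id`; the record shape and the `dedupeRecords` name are illustrative assumptions, not the code in `src/transcript.js`.

```javascript
// Hypothetical sketch: dedupe JSONL transcript records on message.id + requestId.
// The record shape is an assumption inferred from the changelog wording.
function dedupeRecords(jsonlText) {
  const seen = new Set();
  const unique = [];
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue; // ignore blank lines
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than abort the whole file
    }
    const messageId = record?.message?.id;
    const requestId = record?.requestId;
    // Records missing either id cannot be matched against others, so keep them.
    if (messageId == null || requestId == null) {
      unique.push(record);
      continue;
    }
    const key = `${messageId}:${requestId}`;
    if (seen.has(key)) continue; // repeat of an already-counted message
    seen.add(key);
    unique.push(record);
  }
  return unique;
}
```

Keeping the first occurrence is a design choice made for this sketch; a real parser might prefer the last record if later lines carry more complete usage figures.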

package/NOTICE CHANGED

@@ -11,8 +11,21 @@ Prior art / inspiration:
   - 0xrdan/claude-router          — automatic per-prompt routing via hooks
   - gacabartosz/claude-smart-router — research-inspired complexity scoring
   - R4CK/claude-model-changer     — quota-aware complexity routing
+  - coyvalyss1/model-matchmaker   — keyword->tier routing rubric
+  - ryoppippi/ccusage (MIT)       — the documented Claude Code transcript parse + dedup contract
+  - Anthropic session-report skill (Apache-2.0) — local transcript token-attribution algorithm
+  - krulewis/tokencast (MIT)      — pre-flight estimate + calibration-from-actuals approach
+  - Jwrede/tokentoll (MIT)        — "Infracost for LLM" cost-diff-on-PR UX
+The wrong-tier keyword rubric (src/classify.js) was informed by the public router
+rubrics above; the transcript parse/dedup and calibration were reimplemented
+clean-room from the documented behavior of ccusage / the session-report skill —
+no code was copied. ultracost remains MIT and zero-dependency.
 ultracost's distinct contribution is per-stage routing for dynamic workflows
-(ultracode): a quality-first policy plus a static-analysis guard that inspects
-the workflow scripts Claude Code authors and flags subagent stages that would
-silently inherit the session model.
+(ultracode): a quality-first policy; a static-analysis guard that inspects the
+workflow scripts Claude Code authors and flags stages that would silently inherit
+the session model, pin a model that mismatches the task, or exceed the effort cap;
+and a closed loop that reconciles its own estimates against real per-stage token
+usage. No other tool statically detects an unpinned inline agent()/pipeline() stage
+or a pin that mismatches the work the prompt describes.
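
The "closed loop" described in the NOTICE text compares estimates against real per-stage token usage. The sketch below shows one way such a comparison could be tabulated; the `{ stage, tokens }` shape, the stage names, and `reconcileStages` are all hypothetical and are not taken from the package.

```javascript
// Hypothetical sketch: estimate-vs-actual tokens per stage.
// Input shapes are assumptions for illustration only.
function reconcileStages(estimates, actuals) {
  const actualByStage = new Map(actuals.map((a) => [a.stage, a.tokens]));
  return estimates.map(({ stage, tokens: estimated }) => {
    const actual = actualByStage.get(stage);
    if (actual === undefined) {
      // No measured usage for this stage, so nothing to compare against.
      return { stage, estimated, actual: null, delta: null, ratio: null };
    }
    return {
      stage,
      estimated,
      actual,
      delta: actual - estimated, // positive means the estimate was too low
      ratio: estimated > 0 ? actual / estimated : null,
    };
  });
}
```

A calibration step could, under the same assumptions, aggregate the `ratio` values across runs to adjust later estimates.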

package/README.md CHANGED

@@ -31,17 +31,22 @@ guard that fails any unpinned stage.
 No telemetry. No network on the hot path. MIT.
-**Setup:**
+**Setup (Claude Code plugin):**
+```text
+/plugin marketplace add danielkremen818/ultracost
+/plugin install ultracost@ultracost
+```
+**Or via the npm CLI** (CI / scripting):
 ```bash
 npx ultracost init
 ```
-> A Claude Code plugin is planned; today ultracost installs via the npm CLI.
-**Command:** `ultracost check [script]` (or `/ultracost:check` in Claude Code) — flag any `agent()` stage missing a model pin.
+**First command (in Claude Code):** `/ultracost:check ./path/to/workflow.js` — flag any `agent()` stage that would silently inherit Opus. The plugin ships a slash command for every verb (`/ultracost:check · estimate · explain · simulate · diff · audit · usage · reconcile · calibrate · ledger · status`) — no global binary needed.
-**CLI verbs:** `init · check · audit · estimate · pricing · status · doctor · uninstall`.
+**Same verbs on the npm CLI** (CI / scripting): `ultracost init · check · audit · estimate · explain · simulate · diff · usage · reconcile · calibrate · ledger · pricing · status · doctor · uninstall`.
 <div align="center">
@@ -78,7 +83,7 @@ ultracost makes the routing **explicit, policy-driven, and verifiable** — with
 ## Architecture
-One shared core in `src/`, two delivery surfaces: an **npm CLI** (the install path today) and a Claude Code **plugin** (planned). Both compile from the same `policy.json`.
+One shared core in `src/`, two delivery surfaces: a Claude Code **plugin** (primary) and an **npm CLI** (secondary). Both compile from the same `policy.json`.
 <div align="center">
@@ -90,11 +95,46 @@ The plan lives in **data** (`policy.json`), not in prose buried in a prompt. The
 ## Install
+### Plugin (recommended)
+Inside Claude Code:
+```text
+/plugin marketplace add danielkremen818/ultracost
+/plugin install ultracost@ultracost
+```
+Then, **without leaving Claude Code**, drive everything through slash commands — verify the workflow Claude just drafted, estimate it, or reconcile a finished run against what it actually cost:
+```text
+/ultracost:check ./path/to/workflow.js
+```
+The plugin bundles — touching none of your own files — a `SessionStart` policy-injection hook, a `PreToolUse` cost gate on the `Workflow` tool (`ULTRACOST_GATE=off` to disable), a routing-policy skill, and a slash command for every verb (each runs the bundled CLI via `${CLAUDE_PLUGIN_ROOT}`, so there's nothing to install on `PATH`):
+| Command | What it does |
+|---------|--------------|
+| `/ultracost:check [path]` | Flag `agent()` stages that don't pin a model (or pin the wrong tier). Defaults to the most recent workflow script. |
+| `/ultracost:estimate <script>` | Agent count, model mix, tiered cost vs an all-opus baseline. |
+| `/ultracost:explain <script>` | Per-stage rationale: model, effort, the tier the prompt reads like, est cost, check flags. |
+| `/ultracost:simulate <script>` | Cost under all-opus vs your tiered pins vs all-sonnet. |
+| `/ultracost:diff <a> <b>` | Cost delta between two versions of a script. |
+| `/ultracost:audit [dir]` | Pin stats across your real workflow scripts. |
+| `/ultracost:usage [dir]` | Real token cost from local transcripts (main vs subagents vs stages). |
+| `/ultracost:reconcile [--last\|<id>]` | Estimate vs **actual** per stage for a finished run. |
+| `/ultracost:calibrate` | Tune the estimator from your real token usage. |
+| `/ultracost:ledger` | Cumulative savings vs all-opus across recorded runs. |
+| `/ultracost:status` | How ultracost is delivered (plugin/cli), the policy, and the bypass caveat. |
+Requires Claude Code with the `/plugin` command and dynamic workflows enabled.
+### npm CLI
 ```bash
 npx ultracost init
 ```
-This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config.
+This writes `~/.claude/ultracost/policy.json`, injects the routing block into `~/.claude/CLAUDE.md`, installs the re-inject hook (`~/.claude/ultracost/reinject.mjs`), and registers it on `SessionStart` in `~/.claude/settings.json`. New sessions pick it up immediately. Paths honor `CLAUDE_CONFIG_DIR` if you've relocated your config. Requires Node ≥ 24.
 Then verify a workflow script at any time:
@@ -102,25 +142,42 @@ Then verify a workflow script at any time:
 ultracost check ./path/to/workflow.js
 ```
-> Requires Node ≥ 24. A Claude Code plugin — bundling the `SessionStart` policy injection, a `PreToolUse` cost gate on the `Workflow` tool, the `/ultracost:check` command, and a routing-policy skill — ships in this repo and is planned as a separate install path; for now use the npm CLI above.
+> Use the npm path for CI/scripting or the CLAUDE.md-injection workflow; for day-to-day use in Claude Code, the plugin above is simpler.
 ## Uninstall
+### Plugin
+```text
+/plugin uninstall ultracost@ultracost
+/plugin marketplace remove ultracost
+```
+The plugin touches none of your own files, so removing it removes everything ultracost added.
+### npm CLI
 ```bash
 ultracost uninstall
 ```
 Reverses everything `init` did: removes the routing block from `~/.claude/CLAUDE.md`, deletes `~/.claude/ultracost/`, and unregisters the hook from `~/.claude/settings.json` (an invalid `settings.json` is reported, never overwritten).
-## Quickstart (CLI)
+## Quickstart (npm CLI)
+> Inside Claude Code, every verb below is also a slash command — `/ultracost:estimate`, `/ultracost:reconcile`, etc. (see the table above). This section is the npm path for CI, scripting, or the CLAUDE.md-injection workflow.
 ```bash
-ultracost init                      # install policy + rules + hook
-ultracost status                    # see the active policy and install state
+ultracost init                      # install policy + rules + hook (refuses if the plugin already delivers it)
+ultracost status                    # active policy + how it's delivered (plugin/cli) + bypass caveat
 ultracost audit ~/.claude/projects  # pin stats across your real workflow scripts
 ultracost check ./path/to/workflow  # scan a workflow script (or a directory)
 ultracost check . --fix             # auto-pin the default model on unpinned stages
 ultracost estimate ./workflow.js    # agents, model mix, and cost vs all-opus baseline
+ultracost explain ./workflow.js     # per-stage rationale + which checks fire
+ultracost reconcile --last          # estimate vs actual for your latest real run
+ultracost calibrate                 # tune the estimator from your real token usage
+ultracost ledger                    # cumulative savings vs all-opus
 ultracost pricing refresh           # update prices from Anthropic's official page
 ```
@@ -147,6 +204,33 @@ $ ultracost estimate ./workflow.js
 Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges; the interactive 3-option menu needs a TUI. Full detail, assumptions, and the gate's [#52343](https://github.com/anthropics/claude-code/issues/52343) limitation are in [`docs/ESTIMATES.md`](./docs/ESTIMATES.md).
+## The closed loop: measure, reconcile, calibrate
+ultracost doesn't just estimate — it reads its own results back and tunes itself. It parses your **local** Claude Code transcripts (offline; no network, no telemetry) and attributes tokens **per dynamic-workflow stage** via the `subagents/workflows/wf_*/agent-*.jsonl` files Claude Code writes. No other router does this.
+```bash
+ultracost usage                 # real token cost: main loop vs subagents vs workflow stages
+ultracost reconcile --last      # estimate vs ACTUAL, per stage, for your latest workflow run
+ultracost calibrate             # learn a token prior from your real runs (estimate uses it)
+ultracost ledger                # cumulative $ saved vs an all-opus baseline, persisted
+```
+In Claude Code these are `/ultracost:usage`, `/ultracost:reconcile`, `/ultracost:calibrate`, and `/ultracost:ledger` — same output, no CLI install.
+- **Self-calibrating.** `calibrate` learns real per-stage token sizes (outlier-filtered) into `~/.claude/ultracost/calibration.json`; `estimate`, `explain`, `simulate`, and the gate use it automatically — the estimate gets closer to your reality every run.
+- **Savings ledger.** `ledger` keeps a running tally of what the policy saved you versus running everything on Opus, persisted in `~/.claude/ultracost/ledger.jsonl` (idempotent per run).
+- **Pre-flight budget guard.** Set `budget.perRun` / `budget.perDay` in the policy and the cost gate **denies** a launch whose estimate would blow the cap — before it runs.
+## Understand and compare a workflow
+```bash
+ultracost explain ./wf.js       # per-stage: tier, effort, est cost, and which UC checks fire
+ultracost simulate ./wf.js      # cost under all-opus vs your tiered pins vs all-sonnet
+ultracost diff old.js new.js    # cost delta between two versions (--ci → PR-comment table)
+```
+Or `/ultracost:explain`, `/ultracost:simulate`, `/ultracost:diff` inside Claude Code.
 ## How routing is decided
 | Tier | Model | Use for |
@@ -175,9 +259,12 @@ wf.js:4:13  UC003  stage pins banned model "haiku" (policy.neverUse)
 | `UC002` | options object present, no `model` |
 | `UC003` | model resolves to a banned model (e.g. haiku) |
 | `UC004` | `model: 'inherit'` while `allowInherit` is false |
-| `UC005` | options passed as a variable — can't verify (warning) |
+| `UC005` | model/options is a dynamic expression — can't verify (warning) |
+| `UC006` | the pinned model mismatches the work the prompt describes (warning) |
+| `UC007` | `effort` exceeds the model's cap, e.g. `sonnet` @ `xhigh` (warning) |
+| `UC008` | an `alwaysOpus` role (orchestrator, consolidation, …) pins a cheaper tier (warning) |
-The scanner is string- and comment-aware: an `agent(` that appears inside a prompt string or a comment is prose, not a call, and is never flagged. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. Exit code is non-zero when errors are found.
+The scanner runs on a hand-rolled, zero-dependency JS tokenizer, so it's robust to template literals, spreads, optional-call `agent?.()`, and dynamic model values — and an `agent(` inside a prompt string or comment is prose, never a call. Fan-out detection covers `.map`/`.flatMap`/`forEach`/`for…of`/`Promise.all`/`Array.from`/`pipeline`. `--json` for CI, `--fix` to auto-insert the default model on the unambiguous cases (`UC001`/`UC002`), `--quiet` to print only the problems. `UC006`–`UC008` are advisory warnings and never fail the build on their own; exit code is non-zero only when pin-presence errors (`UC001`–`UC004`) are found.
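
The exit-code rule in the paragraph above can be sketched as follows. This is an illustration only: the `{ code }` finding shape and `exitCodeFor` are hypothetical, and the README says only "non-zero", so the value `1` is an assumption.

```javascript
// Hypothetical sketch of the exit-code rule: only pin-presence errors
// (UC001 to UC004) fail the build; the other codes are advisory warnings.
const ERROR_CODES = new Set(['UC001', 'UC002', 'UC003', 'UC004']);

function exitCodeFor(findings) {
  // `1` stands in for "non-zero"; the real CLI's value is not stated in the README.
  return findings.some((finding) => ERROR_CODES.has(finding.code)) ? 1 : 0;
}
```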
 ## Audit your history
@@ -231,7 +318,7 @@ Fails the build if any committed workflow script has a stage that would inherit
 ## How it compares
-ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer)) score every prompt and route the *main loop*. ultracost targets the **dynamic-workflow / ultracode** path and adds a **guard** that statically verifies stage-level pins — which none of them do. See [`NOTICE`](./NOTICE) for prior-art credits.
+ultracost is intentionally narrow. General-purpose routers ([claude-router](https://github.com/0xrdan/claude-router), [claude-smart-router](https://github.com/gacabartosz/claude-smart-router), [claude-model-changer](https://github.com/R4CK/claude-model-changer), [model-matchmaker](https://github.com/coyvalyss1/model-matchmaker)) score every prompt and route the *main loop* at runtime. Linters like [claudelint](https://github.com/pdugan20/claudelint) validate a *file-based* agent's `model:` value. ultracost targets the **dynamic-workflow / ultracode** path and is, as far as we can tell, the only tool that **statically detects an unpinned inline `agent()`/`pipeline()` stage, flags a pin that mismatches the work the prompt describes, and reconciles its own cost estimate against real per-stage token usage**. Cost tooling like [ccusage](https://github.com/ryoppippi/ccusage), [tokencast](https://github.com/krulewis/tokencast), and [tokentoll](https://github.com/Jwrede/tokentoll) informed the transcript-parsing, calibration, and cost-diff approaches (reimplemented clean-room). See [`NOTICE`](./NOTICE) for prior-art credits.
 ## Documentation