npm - llm-cost-estimation - Versions diffs - 0.1.1 - Mend

llm-cost-estimation 0.1.1


Files changed (8)

package/LICENSE +21 -0
package/README.md +149 -0
package/bin/llm-cost-estimate.mjs +236 -0
package/package.json +44 -0
package/src/enrich.mjs +132 -0
package/src/index.mjs +55 -0
package/src/linear-estimate-source.mjs +131 -0
package/src/sanitize.mjs +91 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Riddim Software
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,149 @@
+# llm-cost-estimation
+Forecast LLM cost for a future issue from historical usage telemetry and issue-size estimates.
+`llm-cost-estimation` is the pre-work sibling to [`llm-cost-attribution`](../llm-cost-attribution).
+Attribution reports what was spent after work completes.
+This package forecasts what is likely to be spent before work starts.
+## What it does
+It looks at what past issues of a given size *actually* cost and forecasts the same for a new one. Concretely:
+- Reads **usage records** — one row of cost data per agent **turn** (a turn is one agent request → response) — that follow the [Symphony Cost Telemetry Extension](../specs/symphony-cost-telemetry-extension/SPEC.md).
+- Groups that history into **cells**: buckets of past issues that share the same size and model, written `{ size, model }`. A forecast for an `L` issue on `claude-sonnet-4-6` is read off the `{ L, claude-sonnet-4-6 }` cell.
+- Forecasts a **range**, not a single number: the **P50** (median — half of the cell's past issues cost at or below it) and the **P80** (80th percentile — 4 out of 5 did), for **tokens**, **turns**, **dollars**, and Codex **quota** (the fraction of your plan's rate-limit window the issue is predicted to use).
+- Always reports **`n`** — how many past issues the forecast is based on — and flags a cell **low-confidence** when `n` is small. A forecast from 3 issues is barely a forecast.
+It only *reads* telemetry and prints a forecast; it never modifies your usage records.
+## How good are the forecasts?
+Be skeptical: a forecast is only as trustworthy as the history behind its `{ size, model }` cell, and in practice that history is thin — especially early on.
+- **Most records carry no estimate.** Cost telemetry captures what an issue *spent*, but not its size; story-point estimates live in your tracker. Until they're joined onto the telemetry (`enrichUsageWithEstimate`) or stamped on when the work is dispatched, records have no `estimate` and can't be placed in any cell. A large telemetry file can still yield only a handful of usable issues.
+- **Splitting by size *and* model fragments** what little estimate-tagged history you have across many small cells.
+So expect small `n` and wide P50→P80 bands. **Treat the output as directional, not a budget** — useful for comparing relative cost between sizes or catching order-of-magnitude surprises, not for billing. Always read the printed `n` and `lowConfidence`; a single-digit `n` is a hint, not a number to plan against. The only thing that improves accuracy is more completed issues carrying estimates — no statistical trick manufactures signal the data doesn't have.
+## Install
+```bash
+# One-shot via npx
+npx llm-cost-estimate --size L --model claude-sonnet-4-6 --from-usage ./usage.jsonl
+# Install globally
+npm install -g llm-cost-estimation
+llm-cost-estimate --size M --model gpt-5-codex --from-usage ./usage.jsonl
+```
+## CLI
+```bash
+llm-cost-estimate --size <SIZE> --model <MODEL> [--from-usage <usage.jsonl-or-dir>] [--json]
+llm-cost-estimate --issue <ID> --model <MODEL> [--from-usage <usage.jsonl-or-dir>] [--json]
+llm-cost-estimate --help
+```
+- `--size` takes the issue's size directly — a **story point** (the number, like 1/2/3/5/8, your tracker assigns to rate an issue's effort) or a **T-shirt size** (S/M/L/XL) — so it needs no tracker access.
+- `--issue` resolves the estimate from your tracker through `createLinearEstimateSource` (requires `LINEAR_API_TOKEN`).
+- `--from-usage` accepts a `usage.jsonl` file or a directory of `usage*.jsonl` files (same convention used by attribution backfill).
+- `--json` prints machine-readable JSON.
+### Example
+```bash
+llm-cost-estimate --size L --model claude-sonnet-4-6 --from-usage ./usage.jsonl
+```
+```text
+════════════════════════════════════════════════════════════════════════════════
+COST FORECAST  —  size L, model claude-sonnet-4-6
+════════════════════════════════════════════════════════════════════════════════
+Sample size:         n = 18   (low confidence)
+Metric             P50           P80          n
+────────────────────────────────────────────────────────────────────────
+tokens             1.2M          1.8M         18
+turns              42            58           18
+dollars            $0.74         $1.01        18
+quota (frac)       61.0%         68.5%        18
+```
+**Dollars** here are *API-equivalent* — what those tokens would cost at pay-as-you-go API rates, not what a subscription plan is billed (the same convention `llm-cost-attribution` uses); on a subscription, the **quota** row is the one that reflects real marginal cost. `n = 18 (low confidence)` means only 18 past issues fell in this cell — read the range loosely.
+JSON output:
+```bash
+llm-cost-estimate --size 3 --model claude-sonnet-4-6 --from-usage ./usage.jsonl --json
+```
+```json
+{
+  "size": "3",
+  "model": "claude-sonnet-4-6",
+  "n": 18,
+  "tokens": { "n": 18, "p50": 1215000, "p80": 1760000 },
+  "turns": { "n": 18, "p50": 42, "p80": 58 },
+  "dollars": { "n": 18, "p50": 0.74, "p80": 1.01 },
+  "quota": { "n": 18, "p50": 0.61, "p80": 0.685 },
+  "quotaReason": null,
+  "lowConfidence": true,
+  "empty": false
+}
+```
+## Library API
+```js
+import {
+  forecastIssueCost,
+  forecastProjectCost,
+  enrichUsageWithEstimate,
+  calibrate,
+  createLinearEstimateSource,
+} from 'llm-cost-estimation';
+```
+### `forecastIssueCost(cell, records)`
+Re-exported from [`llm-cost-attribution`](../llm-cost-attribution) for package consistency.
+- `cell` is `{ size, model }`.
+- `records` are estimate-tagged usage records (`{ estimate, model, ...tokens... }`).
+- Returns a forecast object with P50/P80 + `n` for tokens, turns, dollars, and quota.
+### `enrichUsageWithEstimate(records, source, options?)`
+Core transform for adding estimates to usage telemetry.
+- Requires a `source` that implements `resolveEstimates(issueIdentifiers)` and returns a `Map<string, number|null>` or a plain object keyed by identifier (or a Promise of either).
+- Adds `estimate` only when the source returns a valid non-negative integer.
+- Returns `{ records, unresolved, stats }`.
+- Issues with no estimate are left untouched and listed in `unresolved`.
+### `forecastProjectCost(projectId, issues, options?)`
+Public API placeholder for project rollups.
+Throws `Error('not implemented')` until its implementing issue lands.
+### `calibrate(completedIssues, options?)`
+Public API placeholder for empirical calibration from completed work.
+Throws `Error('not implemented')` until its implementing issue lands.
+## What it doesn't do
+- It does **not** infer estimates from issue titles, paths, or code signals.
+  Add estimates in your tracker, then use `enrichUsageWithEstimate` to stamp them onto telemetry.
+- It does **not** predict project-wide quota or wall-clock time.
+- It does **not** promise accuracy from very thin cells.
+  A real forecast needs sufficient historical coverage in the exact `{ size, model }` cell;
+  low coverage is surfaced via `lowConfidence` and `n`.
+- It does **not** merge multiple runs of the same issue for delivery quality.
+The **quota** forecast is per-issue only — the peak fraction of Codex's primary rate-limit window a single issue is expected to hit. It does not add up across issues into a project-level quota.
+## License
+MIT
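Note on the README above: it describes a forecast as the P50 and P80 of what past issues in one `{ size, model }` cell cost. The forecaster itself is re-exported from `llm-cost-attribution` and is not part of this diff, so the following is only a minimal sketch of that idea. It assumes a nearest-rank quantile, a string comparison between the requested size and each record's `estimate`, and a hypothetical `totalTokens` field; the real implementation may differ on all three.

```javascript
// Nearest-rank empirical quantile: the smallest sample value such that at
// least a fraction q of the sample is at or below it.
// Assumption: the real forecaster may interpolate instead.
function nearestRankQuantile(values, q) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1];
}

// Sum a per-turn metric into one total per issue, keeping only records that
// fall in the requested { size, model } cell.
function perIssueTotals(records, cell, field) {
  const totals = new Map();
  for (const rec of records) {
    if (String(rec.estimate) !== String(cell.size) || rec.model !== cell.model) continue;
    totals.set(rec.issueIdentifier, (totals.get(rec.issueIdentifier) ?? 0) + rec[field]);
  }
  return [...totals.values()];
}

function sketchForecast(records, cell, field) {
  const totals = perIssueTotals(records, cell, field);
  return {
    n: totals.length,
    p50: nearestRankQuantile(totals, 0.5),
    p80: nearestRankQuantile(totals, 0.8),
  };
}

// Hypothetical per-turn records; `totalTokens` is an illustrative field name.
const records = [
  { issueIdentifier: 'GRV-1', estimate: 3, model: 'claude-sonnet-4-6', totalTokens: 100 },
  { issueIdentifier: 'GRV-1', estimate: 3, model: 'claude-sonnet-4-6', totalTokens: 50 },
  { issueIdentifier: 'GRV-2', estimate: 3, model: 'claude-sonnet-4-6', totalTokens: 400 },
  { issueIdentifier: 'GRV-3', estimate: 5, model: 'claude-sonnet-4-6', totalTokens: 900 },
  { issueIdentifier: 'GRV-4', estimate: 3, model: 'claude-sonnet-4-6', totalTokens: 200 },
];
const forecast = sketchForecast(records, { size: '3', model: 'claude-sonnet-4-6' }, 'totalTokens');
console.log(forecast);
```

With only three issues in the cell, the P80 lands on the single most expensive issue, which is the situation the README warns about when `n` is small.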

package/bin/llm-cost-estimate.mjs ADDED

@@ -0,0 +1,236 @@
+#!/usr/bin/env node
+/**
+ * `llm-cost-estimate` — forecast LLM cost for an issue before work begins.
+ *
+ * Two paths into a forecast:
+ *   --size L --model X            key-free; the cell is `{ size, model }`.
+ *   --issue GRV-123 --model X     reads the issue's story-point estimate from
+ *                                 Linear (opt-in `LINEAR_API_TOKEN`), then
+ *                                 forecasts at that size.
+ *
+ * Both paths read estimate-tagged usage records from `--from-usage <path>`
+ * (a single `usage.jsonl` file or a directory of `usage*.jsonl` files, per
+ * the Symphony Cost Telemetry Extension spec) and emit P50/P80 + n for
+ * tokens, turns, dollars, and — for Codex cells with rate_limits samples —
+ * the per-issue peak primary-window quota fraction. `--json` swaps the table
+ * for the same shape as JSON.
+ */
+import { readUsageRecords } from 'llm-cost-attribution';
+import { parseArgs } from 'node:util';
+import { forecastIssueCost, createLinearEstimateSource } from '../src/index.mjs';
+async function main() {
+  const { values } = parseArgs({
+    options: {
+      size: { type: 'string' },
+      issue: { type: 'string' },
+      model: { type: 'string' },
+      'from-usage': { type: 'string' },
+      json: { type: 'boolean' },
+      help: { type: 'boolean', short: 'h' },
+    },
+  });
+  if (values.help === true) {
+    printUsage();
+    process.exit(0);
+  }
+  if (values.size === undefined && values.issue === undefined) {
+    printUsage();
+    process.exit(1);
+  }
+  if (values.size !== undefined && values.issue !== undefined) {
+    process.stderr.write('error: pass either --size or --issue, not both\n');
+    process.exit(1);
+  }
+  if (values.model === undefined || values.model === '') {
+    process.stderr.write('error: --model is required (e.g. --model claude-sonnet-4-6)\n');
+    process.exit(1);
+  }
+  const model = values.model;
+  let size;
+  let issueIdentifier;
+  if (values.size !== undefined) {
+    size = values.size;
+  } else {
+    issueIdentifier = values.issue;
+    const source = makeLinearEstimateSourceOrExit();
+    let resolved;
+    try {
+      resolved = await source.resolveEstimates([issueIdentifier]);
+    } catch (err) {
+      process.stderr.write(`error: failed to resolve estimate for ${issueIdentifier}: ${err.message}\n`);
+      process.exit(1);
+    }
+    const estimate = resolved instanceof Map
+      ? resolved.get(issueIdentifier)
+      : (resolved?.[issueIdentifier]);
+    if (estimate === null || estimate === undefined) {
+      process.stderr.write(`error: ${issueIdentifier} has no estimate in Linear; pass --size to forecast at a specific size\n`);
+      process.exit(1);
+    }
+    size = String(estimate);
+  }
+  const records = [];
+  if (values['from-usage'] !== undefined && values['from-usage'] !== '') {
+    for await (const record of readUsageRecords(values['from-usage'])) {
+      records.push(record);
+    }
+  }
+  const forecast = await forecastIssueCost({ size, model }, records);
+  const result = {
+    size,
+    model,
+    issueIdentifier,
+    n: forecast.tokens.n,
+    tokens: forecast.tokens,
+    turns: forecast.turns,
+    dollars: forecast.dollars,
+    quota: forecast.quota,
+    quotaReason: forecast.quotaReason,
+    lowConfidence: forecast.lowConfidence,
+    empty: forecast.empty,
+  };
+  if (values.json === true) {
+    console.log(JSON.stringify(result, null, 2));
+    return;
+  }
+  printTable(result);
+}
+function makeLinearEstimateSourceOrExit() {
+  try {
+    return createLinearEstimateSource();
+  } catch (err) {
+    process.stderr.write('error: set LINEAR_API_TOKEN or pass --size to forecast without a Linear lookup\n');
+    process.exit(1);
+  }
+}
+function printUsage() {
+  process.stdout.write(`Usage: llm-cost-estimate --size <SIZE> --model <MODEL> [--from-usage <path>] [--json]
+       llm-cost-estimate --issue <ID> --model <MODEL> [--from-usage <path>] [--json]
+       llm-cost-estimate --help
+Forecast the expected LLM cost for an issue before work begins. The forecaster
+matches \`{ size, model }\` cells against an estimate-tagged usage.jsonl dataset
+and returns empirical P50/P80 quantiles for tokens, turns, dollars, and the
+Codex primary-window quota fraction (single-issue only — never summed across
+issues).
+Inputs:
+  --size <SIZE>           Story-point or T-shirt size to forecast at, e.g.
+                          \`L\` or \`3\`. Key-free — no Linear lookup.
+  --issue <ID>            Linear issue identifier, e.g. \`GRV-123\`. The CLI
+                          resolves the issue's estimate via Linear (requires
+                          \`LINEAR_API_TOKEN\`) and forecasts at that size.
+  --model <MODEL>         Required. Model to forecast at, e.g.
+                          \`claude-sonnet-4-6\` or \`gpt-5.4\`.
+  --from-usage <path>     A \`usage.jsonl\` file or directory of \`usage*.jsonl\`
+                          files (Symphony Cost Telemetry Extension spec). When
+                          omitted the forecast is empty (n=0).
+  --json                  Emit JSON instead of the table.
+  -h, --help              Print this message.
+Examples:
+  llm-cost-estimate --size L --model claude-sonnet-4-6 --from-usage ~/usage.jsonl
+  llm-cost-estimate --issue GRV-123 --model claude-sonnet-4-6 --from-usage ~/usage.jsonl
+  llm-cost-estimate --size 3 --model gpt-5.4 --from-usage ~/usage.jsonl --json
+`);
+}
+const HEAD = '═'.repeat(72);
+const SEP = '─'.repeat(72);
+function printTable(result) {
+  const cell = result.issueIdentifier !== undefined
+    ? `${result.issueIdentifier}  (size ${result.size}, model ${result.model})`
+    : `size ${result.size}, model ${result.model}`;
+  console.log(HEAD);
+  console.log(`COST FORECAST  —  ${cell}`);
+  console.log(HEAD);
+  console.log(`Sample size:         n = ${result.n}${result.lowConfidence ? '   (low confidence)' : ''}`);
+  if (result.empty) {
+    console.log();
+    console.log(`No historical issues match this cell — forecast is empty.`);
+    console.log(`Add more estimate-tagged records to --from-usage and try again.`);
+    return;
+  }
+  console.log();
+  console.log(
+    padRight('Metric', 14) +
+    '  ' + padLeft('P50', 12) +
+    '  ' + padLeft('P80', 12) +
+    '  ' + padLeft('n', 5),
+  );
+  console.log(SEP);
+  console.log(formatRow('tokens', result.tokens, formatTokens));
+  console.log(formatRow('turns', result.turns, formatTurns));
+  console.log(formatRow('dollars', result.dollars, formatUsd));
+  if (result.quota !== null && result.quota !== undefined) {
+    console.log(formatRow('quota (frac)', result.quota, formatFraction));
+  } else if (typeof result.quotaReason === 'string') {
+    console.log();
+    console.log(`(quota: ${result.quotaReason})`);
+  }
+  if (result.dollars?.n === 0 && result.n > 0) {
+    console.log();
+    console.log(`(no pricing rates for "${result.model}" — $ row reports n=0)`);
+  }
+}
+function formatRow(label, point, formatValue) {
+  if (point === null || point === undefined) {
+    return padRight(label, 14) + '  ' + padLeft('—', 12) + '  ' + padLeft('—', 12) + '  ' + padLeft('0', 5);
+  }
+  return (
+    padRight(label, 14) +
+    '  ' + padLeft(formatValue(point.p50), 12) +
+    '  ' + padLeft(formatValue(point.p80), 12) +
+    '  ' + padLeft(String(point.n), 5)
+  );
+}
+function formatTokens(value) {
+  if (value === null || value === undefined) return '—';
+  if (value >= 1_000_000) return `${(value / 1_000_000).toFixed(1)}M`;
+  if (value >= 1_000) return `${(value / 1_000).toFixed(1)}K`;
+  return String(Math.round(value));
+}
+function formatTurns(value) {
+  if (value === null || value === undefined) return '—';
+  return String(Math.round(value));
+}
+function formatUsd(value) {
+  if (value === null || value === undefined) return '—';
+  if (value === 0) return '$0.00';
+  if (value < 0.01) return '<$0.01';
+  if (value >= 1000) return `$${value.toLocaleString('en-US', { maximumFractionDigits: 0 })}`;
+  return `$${value.toFixed(2)}`;
+}
+function formatFraction(value) {
+  if (value === null || value === undefined) return '—';
+  return `${(value * 100).toFixed(1)}%`;
+}
+function padRight(s, width) {
+  return s.length >= width ? s : s + ' '.repeat(width - s.length);
+}
+function padLeft(s, width) {
+  return s.length >= width ? s : ' '.repeat(width - s.length) + s;
+}
+main().catch((err) => {
+  process.stderr.write(`${err.stack ?? err.message ?? String(err)}\n`);
+  process.exit(1);
+});
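Note on the table formatters above: the sketch below checks how the README's JSON example values map to the README's table cells. The two functions are trimmed copies of `formatTokens` and `formatFraction` from this file, with the null and undefined branches left out; nothing else here comes from the package.

```javascript
// Trimmed copies of two formatters from bin/llm-cost-estimate.mjs
// (the null/undefined branches are omitted in this sketch).
function formatTokens(value) {
  if (value >= 1_000_000) return `${(value / 1_000_000).toFixed(1)}M`;
  if (value >= 1_000) return `${(value / 1_000).toFixed(1)}K`;
  return String(Math.round(value));
}

function formatFraction(value) {
  return `${(value * 100).toFixed(1)}%`;
}

// Inputs taken from the README's JSON example.
const rows = {
  tokensP50: formatTokens(1215000),
  tokensP80: formatTokens(1760000),
  quotaP50: formatFraction(0.61),
  quotaP80: formatFraction(0.685),
};
console.log(rows);

// A value just under one million stays in the K branch and rounds up there.
const nearMillion = formatTokens(999_990);
console.log(nearMillion);
```

The formatted values match the README table (`1.2M`, `1.8M`, `61.0%`, `68.5%`). The last line shows a small rounding quirk: 999,990 tokens prints as `1000.0K` rather than `1.0M`, because the unit is chosen before rounding.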

package/package.json ADDED

@@ -0,0 +1,44 @@
+{
+  "name": "llm-cost-estimation",
+  "version": "0.1.1",
+  "description": "Forecast LLM cost from Linear issue estimates before work begins.",
+  "type": "module",
+  "bin": {
+    "llm-cost-estimate": "bin/llm-cost-estimate.mjs"
+  },
+  "main": "./src/index.mjs",
+  "files": [
+    "bin/",
+    "src/",
+    "README.md",
+    "LICENSE"
+  ],
+  "scripts": {
+    "test": "node --test"
+  },
+  "keywords": [
+    "claude",
+    "claude-code",
+    "codex",
+    "anthropic",
+    "tokens",
+    "cost",
+    "estimation",
+    "forecast",
+    "agentic",
+    "autonomous-developer",
+    "symphony"
+  ],
+  "engines": {
+    "node": ">=20"
+  },
+  "dependencies": {
+    "llm-cost-attribution": "^0.2.0"
+  },
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/RiddimSoftware/groove.git",
+    "directory": "packages/llm-cost-estimation"
+  }
+}

package/src/enrich.mjs ADDED

@@ -0,0 +1,132 @@
+/**
+ * `EnrichUsageWithEstimate` use case.
+ *
+ * A pure transform: given cost-only usage records (as defined by the Symphony
+ * Coding-Agent Cost Telemetry Extension) and a `LinearEstimateSource` port,
+ * stamp each record with its issue's story-point `estimate` (spec §5.2).
+ *
+ * Boundary rule: this module MUST NOT import any Linear SDK or HTTP client. It
+ * depends only on the injected port so the estimation core stays key-free and
+ * tracker-agnostic — mirroring how `llm-cost-attribution` stays key-free.
+ *
+ * The port contract:
+ *
+ *   source.resolveEstimates(issueIdentifiers: string[])
+ *     => Map<string, number|null> | Record<string, number|null>
+ *        (or a Promise of one)
+ *
+ *   Resolve each distinct identifier to a non-negative integer estimate, or to
+ *   `null` (or omit the key) when the issue has no estimate or no longer
+ *   resolves. The core de-duplicates identifiers before calling, so the source
+ *   sees at most one lookup per issue.
+ */
+/**
+ * True for a spec-valid `estimate`: a non-negative integer (spec §5.2,
+ * "integer ≥ 0"). `0` is a real estimate value, so it passes — only `null`,
+ * `undefined`, fractional, or negative values are rejected.
+ *
+ * @param {unknown} value
+ * @returns {boolean}
+ */
+export function isValidEstimate(value) {
+  return typeof value === 'number' && Number.isInteger(value) && value >= 0;
+}
+/**
+ * Look up an estimate for an identifier from whatever the source returned. The
+ * source MAY return a `Map` or a plain object; absent keys and non-own
+ * properties are treated as unresolved (`null`).
+ *
+ * @param {Map<string, unknown> | Record<string, unknown> | null | undefined} resolved
+ * @param {string} id
+ * @returns {unknown}
+ */
+function lookupEstimate(resolved, id) {
+  if (resolved == null) return null;
+  if (resolved instanceof Map) return resolved.has(id) ? resolved.get(id) : null;
+  if (Object.prototype.hasOwnProperty.call(resolved, id)) return resolved[id];
+  return null;
+}
+/**
+ * Stamp the spec's optional `estimate` field onto each usage record.
+ *
+ * Distinct `issueIdentifier`s are de-duplicated before the source is queried,
+ * so the port sees at most one lookup per issue. Records whose issue resolves
+ * to a non-negative integer estimate gain `estimate`; all other fields are left
+ * unchanged. Records whose issue has no estimate (`null`) or no longer resolves
+ * are returned untouched — `estimate` stays **absent, never `0`** — and the
+ * issue identifier is reported in the `unresolved` summary.
+ *
+ * Input records are never mutated; a shallow copy is returned for each.
+ *
+ * @param {Iterable<object>} records  Usage records (typically estimate-free).
+ * @param {{ resolveEstimates: (ids: string[]) => unknown }} source  A `LinearEstimateSource`.
+ * @param {object} [options]  Reserved for future options.
+ * @returns {Promise<{
+ *   records: object[],
+ *   unresolved: string[],
+ *   stats: {
+ *     recordsTotal: number,
+ *     recordsEnriched: number,
+ *     issuesQueried: number,
+ *     issuesResolved: number,
+ *     issuesUnresolved: number,
+ *   },
+ * }>}
+ */
+export async function enrichUsageWithEstimate(records, source, options = {}) {
+  if (source == null || typeof source.resolveEstimates !== 'function') {
+    throw new TypeError(
+      'enrichUsageWithEstimate: source must implement resolveEstimates(ids)',
+    );
+  }
+  const input = [...records];
+  // De-duplicate distinct issue identifiers so the source is queried at most
+  // once per issue (≤1 lookup per issue, batched where the API allows).
+  const distinctIds = [
+    ...new Set(
+      input
+        .map((rec) => rec?.issueIdentifier)
+        .filter((id) => typeof id === 'string' && id !== ''),
+    ),
+  ];
+  const resolved = distinctIds.length > 0
+    ? await source.resolveEstimates(distinctIds)
+    : new Map();
+  const resolvedIds = new Set();
+  const unresolvedIds = new Set();
+  let recordsEnriched = 0;
+  const out = input.map((rec) => {
+    const id = rec?.issueIdentifier;
+    if (typeof id !== 'string' || id === '') {
+      return { ...rec };
+    }
+    const estimate = lookupEstimate(resolved, id);
+    if (isValidEstimate(estimate)) {
+      resolvedIds.add(id);
+      recordsEnriched += 1;
+      return { ...rec, estimate };
+    }
+    unresolvedIds.add(id);
+    return { ...rec };
+  });
+  return {
+    records: out,
+    unresolved: [...unresolvedIds].sort(),
+    stats: {
+      recordsTotal: out.length,
+      recordsEnriched,
+      issuesQueried: distinctIds.length,
+      issuesResolved: resolvedIds.size,
+      issuesUnresolved: unresolvedIds.size,
+    },
+  };
+}
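Note on the port contract above: the module header says tests inject a fake source rather than the Linear adapter. The sketch below shows one way such a fake could look. `createFakeEstimateSource` and `distinctIssueIds` are hypothetical names introduced for illustration; `isValidEstimate` and the de-duplication logic mirror the code in this file.

```javascript
// Hypothetical in-memory source that satisfies the resolveEstimates contract.
// It records every call so a test can check how often it was queried.
function createFakeEstimateSource(estimatesById) {
  const calls = [];
  return {
    calls,
    resolveEstimates(ids) {
      calls.push([...ids]);
      // Unknown identifiers resolve to null, which the contract allows.
      return new Map(
        ids.map((id) => [id, estimatesById.has(id) ? estimatesById.get(id) : null]),
      );
    },
  };
}

// Mirrors the de-duplication step inside enrichUsageWithEstimate.
function distinctIssueIds(records) {
  return [
    ...new Set(
      records
        .map((rec) => rec?.issueIdentifier)
        .filter((id) => typeof id === 'string' && id !== ''),
    ),
  ];
}

// Same predicate as the exported isValidEstimate.
function isValidEstimate(value) {
  return typeof value === 'number' && Number.isInteger(value) && value >= 0;
}

const usage = [
  { issueIdentifier: 'GRV-1', model: 'claude-sonnet-4-6' },
  { issueIdentifier: 'GRV-1', model: 'claude-sonnet-4-6' },
  { issueIdentifier: 'GRV-2', model: 'claude-sonnet-4-6' },
  { model: 'claude-sonnet-4-6' }, // no issue identifier, so never looked up
];
const source = createFakeEstimateSource(new Map([['GRV-1', 3]]));
const resolved = source.resolveEstimates(distinctIssueIds(usage));
console.log([...resolved], source.calls);
```

Reading the code in this file, passing the same `usage` and `source` to `enrichUsageWithEstimate` should stamp `estimate: 3` on both `GRV-1` records, leave the other two records without an `estimate`, and report `unresolved: ['GRV-2']` with `recordsEnriched: 2`, `issuesQueried: 2`, `issuesResolved: 1`, and `issuesUnresolved: 1`.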

package/src/index.mjs ADDED

@@ -0,0 +1,55 @@
+/**
+ * Public API for `llm-cost-estimation`.
+ *
+ * Implemented exports are re-exported from their sub-modules; the remaining
+ * stubs throw until their implementing issue lands. Import from this barrel —
+ * do not import from sub-modules directly.
+ */
+/**
+ * Stamp the Symphony Cost Telemetry Extension's optional `estimate` field onto
+ * usage records by joining each record's issue to its Linear story-point
+ * estimate via an injected `LinearEstimateSource` port. Pure transform — see
+ * `enrich.mjs`.
+ */
+export { enrichUsageWithEstimate, isValidEstimate } from './enrich.mjs';
+/**
+ * Linear-backed `LinearEstimateSource` adapter for `enrichUsageWithEstimate`.
+ * Reads the API token from an injected option or `LINEAR_API_TOKEN`.
+ */
+export { createLinearEstimateSource } from './linear-estimate-source.mjs';
+/**
+ * Forecast tokens / turns / dollars / quota P50–P80 for a `{ size, model }`
+ * cell from a set of estimate-tagged usage records. Re-exported from
+ * `llm-cost-attribution`, which owns the empirical-quantile forecaster and
+ * its `PricingTable` / `QuotaModel` adapters.
+ */
+export { forecastIssueCost } from 'llm-cost-attribution';
+/**
+ * Forecast the aggregate LLM cost for an entire Linear project, given per-issue
+ * estimates and a calibration dataset.
+ *
+ * @param {string}   projectId    Linear project identifier.
+ * @param {object[]} issues       Array of `{ identifier, estimate }` objects.
+ * @param {object}   [options]
+ * @returns {Promise<object>}
+ */
+export async function forecastProjectCost(projectId, issues, options = {}) {
+  throw new Error('not implemented');
+}
+/**
+ * Build or update a calibration dataset from a set of completed issues whose
+ * actual cost is known. Returns calibration parameters used by the forecast
+ * functions.
+ *
+ * @param {object[]} completedIssues  Array of `{ identifier, estimate, actualCostUsd }`.
+ * @param {object}   [options]
+ * @returns {object}
+ */
+export function calibrate(completedIssues, options = {}) {
+  throw new Error('not implemented');
+}

package/src/linear-estimate-source.mjs ADDED

@@ -0,0 +1,131 @@
+/**
+ * Linear adapter implementing the `LinearEstimateSource` port consumed by the
+ * `EnrichUsageWithEstimate` use case (`enrich.mjs`).
+ *
+ * This is the ONLY module in the package that talks to Linear. The enrichment
+ * core depends on the port, not on this adapter, so the core stays key-free and
+ * tracker-agnostic. The API token is read from an injected option or the
+ * `LINEAR_API_TOKEN` environment variable — it is never hardcoded, logged, or
+ * written to any usage record (spec §8).
+ *
+ * Tests inject a fake source instead of this adapter; there are no live Linear
+ * calls in CI.
+ */
+const LINEAR_GRAPHQL_ENDPOINT = 'https://api.linear.app/graphql';
+// Linear caps `first` at 250; one (teamKey, number) filter matches at most one
+// issue, so a chunk of this size returns in a single page with no pagination.
+const DEFAULT_CHUNK_SIZE = 100;
+const IDENTIFIER_PATTERN = /^([A-Za-z][A-Za-z0-9]*)-(\d+)$/;
+const ESTIMATES_QUERY = `query IssueEstimates($filter: IssueFilter, $first: Int) {
+  issues(filter: $filter, first: $first) {
+    nodes { identifier estimate }
+  }
+}`;
+/**
+ * Split `"EPAC-1999"` into `{ teamKey: "EPAC", number: 1999 }`, or `null` if it
+ * isn't a `<TEAM>-<NUMBER>` identifier.
+ *
+ * @param {string} identifier
+ */
+function parseIdentifier(identifier) {
+  const match = IDENTIFIER_PATTERN.exec(identifier);
+  if (match === null) return null;
+  return { teamKey: match[1], number: Number(match[2]) };
+}
+function chunk(items, size) {
+  const out = [];
+  for (let i = 0; i < items.length; i += size) {
+    out.push(items.slice(i, i + size));
+  }
+  return out;
+}
+/**
+ * Create a `LinearEstimateSource` backed by Linear's GraphQL API.
+ *
+ * @param {object} [options]
+ * @param {string} [options.token]      Linear API token. Defaults to `process.env.LINEAR_API_TOKEN`.
+ * @param {string} [options.endpoint]   GraphQL endpoint. Defaults to Linear's production endpoint.
+ * @param {typeof fetch} [options.fetch] Fetch implementation. Defaults to the global `fetch`.
+ * @param {number} [options.chunkSize]  Max identifiers per GraphQL request.
+ * @returns {{ resolveEstimates: (issueIdentifiers: string[]) => Promise<Map<string, number|null>> }}
+ */
+export function createLinearEstimateSource(options = {}) {
+  const token = options.token ?? process.env.LINEAR_API_TOKEN;
+  if (typeof token !== 'string' || token === '') {
+    throw new Error(
+      'createLinearEstimateSource: a Linear API token is required (pass options.token or set LINEAR_API_TOKEN)',
+    );
+  }
+  const endpoint = options.endpoint ?? LINEAR_GRAPHQL_ENDPOINT;
+  const fetchImpl = options.fetch ?? globalThis.fetch;
+  if (typeof fetchImpl !== 'function') {
+    throw new TypeError('createLinearEstimateSource: no fetch implementation available');
+  }
+  const chunkSize = options.chunkSize ?? DEFAULT_CHUNK_SIZE;
+  async function fetchChunk(parsed) {
+    const filter = {
+      or: parsed.map(({ teamKey, number }) => ({
+        team: { key: { eq: teamKey } },
+        number: { eq: number },
+      })),
+    };
+    const res = await fetchImpl(endpoint, {
+      method: 'POST',
+      headers: {
+        'content-type': 'application/json',
+        authorization: token,
+      },
+      body: JSON.stringify({
+        query: ESTIMATES_QUERY,
+        variables: { filter, first: parsed.length },
+      }),
+    });
+    if (!res.ok) {
+      throw new Error(`Linear API request failed: HTTP ${res.status}`);
+    }
+    const json = await res.json();
+    if (json.errors) {
+      throw new Error(`Linear API returned errors: ${JSON.stringify(json.errors)}`);
+    }
+    return json?.data?.issues?.nodes ?? [];
+  }
+  return {
+    /**
+     * Resolve each distinct identifier to a non-negative integer estimate, or
+     * `null` when the issue has no estimate or no longer resolves. Identifiers
+     * are de-duplicated by the caller; this method assumes they are distinct.
+     */
+    async resolveEstimates(issueIdentifiers) {
+      const result = new Map();
+      const parsed = [];
+      for (const id of issueIdentifiers) {
+        const p = parseIdentifier(id);
+        if (p === null) {
+          result.set(id, null); // unparseable → unresolved
+        } else {
+          parsed.push({ id, ...p });
+        }
+      }
+      for (const group of chunk(parsed, chunkSize)) {
+        const nodes = await fetchChunk(group);
+        const byIdentifier = new Map(nodes.map((n) => [n.identifier, n.estimate]));
+        for (const { id } of group) {
+          const estimate = byIdentifier.has(id) ? byIdentifier.get(id) : null;
+          result.set(id, estimate ?? null);
+        }
+      }
+      return result;
+    },
+  };
+}
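Note on the adapter above: each chunk of identifiers becomes one GraphQL `or` filter of `(team key, issue number)` pairs. The sketch below reproduces only that step so the resulting filter can be inspected without a network call. `parseIdentifier` and the pattern are copied from this file; `buildFilter` is a hypothetical helper that combines the parsing loop with the filter construction from `fetchChunk`.

```javascript
const IDENTIFIER_PATTERN = /^([A-Za-z][A-Za-z0-9]*)-(\d+)$/;

// Copied from linear-estimate-source.mjs.
function parseIdentifier(identifier) {
  const match = IDENTIFIER_PATTERN.exec(identifier);
  if (match === null) return null;
  return { teamKey: match[1], number: Number(match[2]) };
}

// Hypothetical helper: parse identifiers, drop the unparseable ones, and
// build the same filter shape that fetchChunk sends to Linear.
function buildFilter(identifiers) {
  const parsed = identifiers
    .map((id) => parseIdentifier(id))
    .filter((p) => p !== null);
  return {
    or: parsed.map(({ teamKey, number }) => ({
      team: { key: { eq: teamKey } },
      number: { eq: number },
    })),
  };
}

const filter = buildFilter(['GRV-123', 'EPAC-1999', 'not an identifier']);
console.log(JSON.stringify(filter, null, 2));
```

In the adapter itself, unparseable identifiers are not dropped silently: `resolveEstimates` records them in its result as `null` before any request is made.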

package/src/sanitize.mjs ADDED

@@ -0,0 +1,91 @@
+/**
+ * Leak-safety guard for the llm-cost-estimation package.
+ *
+ * Fixtures here derive from real telemetry, so a slipped `/Users/<name>` path
+ * or private repo name would publish org/personal data on npm. This module
+ * exposes the deny patterns and a string scanner; `test/no-org-data.test.mjs`
+ * wires them into `npm test` so a leak fails CI rather than relying on
+ * reviewer vigilance.
+ *
+ * Allowed: opaque tracker IDs matching `[A-Z]+-\d+` on their own (e.g.
+ *   `EPAC-1999`, `GRV-42`).
+ * Denied: absolute home paths (`/Users/<name>`, `/home/<name>`), the
+ *   `.symphony/workspaces/<ID>` path shape, and any listed private repo name.
+ */
+// Configurable list of private repo names. Extend this array when a new repo
+// joins the org's "private" set; the guard will start flagging the name as a
+// whole-word match.
+export const PRIVATE_REPO_NAMES = Object.freeze([]);
+// Built-in deny rules. Each regex MUST use the `g` flag — `scanText` walks
+// every match on a line.
+export const DEFAULT_DENY_PATTERNS = Object.freeze([
+  {
+    name: 'home-path-users',
+    regex: /\/Users\/[A-Za-z0-9._-]+/g,
+    description: 'absolute /Users/<name> home path (macOS personal data)',
+  },
+  {
+    name: 'home-path-home',
+    regex: /\/home\/[A-Za-z0-9._-]+/g,
+    description: 'absolute /home/<name> path (Linux personal data)',
+  },
+  {
+    name: 'symphony-workspace-path',
+    regex: /\.symphony\/workspaces\/[A-Za-z0-9._-]+/g,
+    description: '.symphony/workspaces/<ID> path (private orchestrator state)',
+  },
+]);
+/**
+ * Build a deny-pattern set, optionally extending the built-ins with a list of
+ * private repo names. Each repo name is matched as a whole word so a name
+ * like `epac` does not match `epacenter`.
+ */
+export function buildDenyPatterns({ privateRepoNames = PRIVATE_REPO_NAMES } = {}) {
+  const patterns = [...DEFAULT_DENY_PATTERNS];
+  for (const name of privateRepoNames) {
+    patterns.push({
+      name: `private-repo:${name}`,
+      regex: new RegExp(`\\b${escapeRegex(name)}\\b`, 'g'),
+      description: `private repo name "${name}"`,
+    });
+  }
+  return patterns;
+}
+function escapeRegex(s) {
+  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+/**
+ * Scan a string for leaked data. Returns an array of findings; an empty
+ * array means clean. Findings carry 1-indexed line + column so callers can
+ * print human-readable file:line:col error messages.
+ *
+ * @returns {{ line: number, column: number, match: string, patternName: string, description: string }[]}
+ */
+export function scanText(text, patterns = buildDenyPatterns()) {
+  const findings = [];
+  const lines = text.split('\n');
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i];
+    for (const pattern of patterns) {
+      // Shared regex objects carry lastIndex state across calls; reset it.
+      pattern.regex.lastIndex = 0;
+      let m;
+      while ((m = pattern.regex.exec(line)) !== null) {
+        findings.push({
+          line: i + 1,
+          column: m.index + 1,
+          match: m[0],
+          patternName: pattern.name,
+          description: pattern.description,
+        });
+        if (m.index === pattern.regex.lastIndex) pattern.regex.lastIndex++;
+      }
+    }
+  }
+  return findings;
+}
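Note on `scanText` above: it resets `lastIndex` before scanning each line because the deny patterns are shared regex objects with the `g` flag. The sketch below shows the standard JavaScript behavior that makes the reset necessary, using the same `/Users/<name>` pattern as `DEFAULT_DENY_PATTERNS`. The sample strings are made up for illustration.

```javascript
// Same pattern as the built-in 'home-path-users' rule.
const homePath = /\/Users\/[A-Za-z0-9._-]+/g;

// A global regex remembers where its last match ended.
const first = homePath.exec('cwd=/Users/alice/project');
const resumeAt = homePath.lastIndex;

// Reusing the regex on a new string starts from that stale position, so a
// leak at the start of the next string is missed on this call.
const stale = homePath.exec('/Users/bob');

// Resetting lastIndex, as scanText does, makes the match reliable.
homePath.lastIndex = 0;
const fresh = homePath.exec('/Users/bob');

console.log(first[0], first.index + 1, resumeAt, stale, fresh[0]);
```

The `first.index + 1` value is the 1-indexed column that `scanText` would report for the finding.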