
semdiff 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/LICENSE +21 -0
package/README.md +222 -0
package/dist/chunk-4GFNMJGB.js +460 -0
package/dist/chunk-4GFNMJGB.js.map +1 -0
package/dist/cli.d.ts +5 -0
package/dist/cli.js +78 -0
package/dist/cli.js.map +1 -0
package/dist/index.d.ts +276 -0
package/dist/index.js +25 -0
package/dist/index.js.map +1 -0
package/package.json +64 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Brian Benzinger
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,222 @@
+# semdiff
+> Meaning-aware diff engine and CLI that surfaces substantive changes in prose, not cosmetic edits.
+`semdiff` answers a question that a line-based diff cannot: **did the meaning
+change?** It ignores reflowed whitespace, renumbered clauses, punctuation
+normalization, and synonym swaps that carry no new obligation, and it flags the
+edits that actually alter substance — a tightened threshold, a new exemption, a
+shifted deadline, an added requirement.
+It is a standalone, domain-neutral library and CLI. It has no backend and no
+network dependencies of its own beyond the LLM provider you configure. The name
+is deliberately generic: `semdiff` is useful to anyone diffing prose where
+meaning matters more than characters — contracts, policies, terms of service,
+documentation, or regulations.
+> [!NOTE]
+> `semdiff` originated as the engine behind a sustainability-regulation change
+> tracker, but it is built and packaged to stand on its own. See
+> [`adr/0001`](adr/0001-standalone-domain-neutral-engine.md) for the scope
+> boundary between this engine and any application that consumes it.
+## Why not just `diff`?
+A character- or line-based diff is precise but semantically blind. Given two
+revisions of a paragraph, it reports *that* bytes changed, not *whether the
+obligation changed*. In a legal or policy setting that produces two failure
+modes that are both expensive:
+- **Noise.** Cosmetic edits (formatting, renumbering, citation-style changes)
+  light up as diffs and bury the one change that matters.
+- **Missed substance.** A reworded sentence that quietly narrows an exemption
+  looks like a small token-level edit and gets dismissed.
+`semdiff` classifies each aligned change as **substantive** or **cosmetic**, and
+for substantive changes describes *what* changed, with a confidence signal and a
+pointer back to the exact spans involved.
+## What it is not
+- It does **not** interpret or give legal advice. It reports what changed
+  between two texts; it does not tell you what the change means for you.
+- It is **not** a generic web scraper or an ingestion pipeline. It diffs two
+  inputs you hand it.
+- It is **not** nondeterministic by accident. The quality and determinism layer
+  (caching, schema validation, confidence flags, an eval harness) is the point —
+  see [`adr/0005`](adr/0005-eval-harness-and-determinism-layer.md).
+## Status
+Implemented (v0, pre-1.0). The pipeline — segment → align → classify → structured
+diff — works end to end behind a per-file coverage gate (95% line / 90% branch).
+The default classifier calls the Anthropic API (set `ANTHROPIC_API_KEY`), or you
+can inject your own `Classifier`, optionally wrapped with `withCache` so
+identical changes are classified once (ADR-0004). The eval harness
+([`adr/0005`](adr/0005-eval-harness-and-determinism-layer.md)) scores classifier
+accuracy (`npm run eval`); curated result snapshots are in
+[`eval/RESULTS.md`](eval/RESULTS.md). Architecture decisions live in [`adr/`](adr/);
+the working agreement for contributors (human and AI) is in
+[`CLAUDE.md`](CLAUDE.md).
+## Install
+```sh
+npm install semdiff
+```
+The published package ships compiled ESM with bundled type declarations and has
+**zero runtime dependencies** beyond the LLM provider you configure (ADR-0009).
+It runs on Node ≥ 20, both locally (CLI) and on AWS Lambda (library).
+## Usage
+As a library:
+```ts
+import { diff } from "semdiff";
+// ANTHROPIC_API_KEY in the environment, or inject your own Classifier.
+const result = await diff(before, after);
+for (const change of result.changes) {
+  console.log(change.type, change.classification, change.description ?? "");
+}
+```
+As a CLI (installed globally, or via `npx`):
+```sh
+npx semdiff before.txt after.txt                      # structured diff as JSON
+npx semdiff before.txt after.txt --granularity clause
+```
+From a checkout of this repo you can run the source directly without building —
+`node src/cli.ts before.txt after.txt` — on a Node that strips TypeScript types.
+Changed content — insertions, deletions, and modifications — is classified by
+the model; identical, cosmetic, and relocated (moved) content is classified
+locally and needs no API key.
+## Configuration
+The only thing `semdiff` needs to configure is the LLM provider. The common case
+is **zero code**: set your key in the environment and the defaults handle the
+rest.
+### 1. Your API key (the only required setup)
+```sh
+export ANTHROPIC_API_KEY=sk-ant-...      # macOS/Linux
+$env:ANTHROPIC_API_KEY = "sk-ant-..."    # PowerShell
+```
+Both the library and the CLI read `ANTHROPIC_API_KEY` automatically — no other
+setup is needed. The key is only used when a change actually has to reach the
+model; identical, cosmetic, and moved content never needs it.
+### 2. Override the model or pass the key explicitly
+The default model is **`claude-opus-4-8`** (the latest capable Claude). Override
+it — or supply the key in code instead of the environment — per call:
+```ts
+import { diff } from "semdiff";
+const result = await diff(before, after, {
+  modelId: "claude-sonnet-4-6",          // any Anthropic model id; default: claude-opus-4-8
+});
+```
+To pass the key in code (e.g. from your own secret store rather than the
+environment), construct the default classifier explicitly and inject it:
+```ts
+import { diff, createDefaultClassifier } from "semdiff";
+const classifier = createDefaultClassifier({
+  apiKey: mySecret,                       // default: process.env.ANTHROPIC_API_KEY
+  modelId: "claude-opus-4-8",             // optional
+  timeoutMs: 60000,                       // optional; per-call timeout (ADR-0012)
+  maxRetries: 2,                          // optional; retries on 429/5xx/network/timeout (ADR-0012)
+});
+const result = await diff(before, after, { classifier });
+```
+> The `modelId` you pass is also stamped into the result's `provenance`, so a
+> diff always records which model produced it (ADR-0004).
+> Each model call has a timeout and retries transient failures (429, 5xx, network
+> errors, timeouts) with exponential backoff, honouring `Retry-After`;
+> non-transient errors (400, auth) fail fast. Tune with `timeoutMs` / `maxRetries`,
+> or set `maxRetries: 0` for a single attempt (ADR-0012).
+### 3. Use a different provider entirely
+`semdiff` depends on a small `Classifier` interface, not on Anthropic. To use
+another provider (OpenAI, a local model, a mock for tests), implement
+`classify` and inject it — the engine keeps its zero-dependency runtime and never
+constructs the default classifier:
+```ts
+import { diff, type Classifier } from "semdiff";
+const classifier: Classifier = {
+  async classify(pair) {
+    // pair: { type, a, b, spanA, spanB }  →  call your provider here
+    return { classification: "substantive", confidence: 0.9, description: "…" };
+  },
+};
+const result = await diff(before, after, { classifier });
+```
+Wrap any classifier with `withCache` so identical changes are classified once
+(ADR-0004):
+```ts
+import { diff, createDefaultClassifier, withCache } from "semdiff";
+const classifier = withCache(createDefaultClassifier({}), {
+  modelId: "claude-opus-4-8",
+  promptVersion: "0",
+});
+```
+## Design at a glance
+```
+input A ─┐
+         ├─▶ segment ─▶ align ─▶ classify ─▶ structured diff
+input B ─┘             (cheap,    (LLM, gated   (substantive vs
+                        local)     on change)    cosmetic + spans)
+```
+- **Segment** both texts into comparable units (clauses / sentences).
+- **Align** units across the two versions with a cheap, deterministic local pass
+  (no LLM): exact and near-exact matches are settled here.
+- **Classify** the genuinely changed units with the LLM — modifications,
+  insertions, and deletions alike (ADR-0011) — returning a structured,
+  schema-validated verdict. Unchanged, trivially-changed, and relocated (moved)
+  units never reach the model, which bounds cost and nondeterminism.
+- **Emit** a stable, versioned structured diff (JSON); the CLI prints that JSON,
+  and any human-readable rendering is a pure function of it (ADR-0006).
+The full reasoning is in the ADRs:
+| ADR | Decision |
+| --- | --- |
+| [0001](adr/0001-standalone-domain-neutral-engine.md) | Standalone, domain-neutral engine separate from any application |
+| [0002](adr/0002-typescript-node-library-and-cli.md) | TypeScript / Node, distributed as both a library and a CLI |
+| [0003](adr/0003-meaning-aware-diff-pipeline.md) | Segment → align → classify pipeline |
+| [0004](adr/0004-llm-classification-and-deterministic-gating.md) | LLM-backed classification, gated and structured |
+| [0005](adr/0005-eval-harness-and-determinism-layer.md) | The eval + determinism layer is the core contribution |
+| [0006](adr/0006-structured-diff-output-schema.md) | Stable structured diff schema as the public contract |
+| [0007](adr/0007-character-offset-span-semantics.md) | Spans are half-open character offsets into the literal input |
+| [0008](adr/0008-vitest-and-per-file-coverage-gate.md) | Vitest with a per-file coverage gate |
+| [0009](adr/0009-default-classifier-over-fetch.md) | The default classifier calls the Anthropic API over fetch |
+| [0010](adr/0010-move-detection-by-content-match.md) | Move detection by content match (deterministic, cosmetic) |
+| [0011](adr/0011-classify-one-sided-changes.md) | Classify one-sided changes (insertions/deletions) through the model |
+| [0012](adr/0012-classifier-resilience-timeout-and-retry.md) | Default classifier resilience: per-call timeout and bounded retry with backoff |
+## License
+[MIT](LICENSE) © 2026 Brian Benzinger
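
Reviewer note on the README above: section 3 describes injecting a custom `Classifier`. A minimal sketch of a deterministic, rule-based stand-in for tests might look like the following. The types are redeclared locally so the block runs without the package installed; they mirror the shapes the README shows and are assumptions about the published declarations. `digitsOf`, `judge`, and `numberAwareMock` are hypothetical names, and the digit-comparison rule is purely illustrative, not how semdiff itself classifies.

```typescript
// Local stand-ins for semdiff's types (assumed shapes, redeclared so the sketch is self-contained).
type Span = { start: number; end: number };
type CandidatePair = {
  type: "insertion" | "deletion" | "modification";
  a: string; // "" for an insertion
  b: string; // "" for a deletion
  spanA: Span | null;
  spanB: Span | null;
};
type Verdict = {
  classification: "substantive" | "cosmetic";
  confidence: number; // in [0, 1]
  description?: string;
};
interface Classifier {
  classify(pair: CandidatePair): Promise<Verdict>;
}

// Hypothetical helper: the digit runs in a text, joined so two texts can be compared.
function digitsOf(text: string): string {
  return (text.match(/\d+/g) ?? []).join(",");
}

// Hypothetical rule: a change is substantive when the numbers on the two sides differ.
function judge(pair: CandidatePair): Verdict {
  if (digitsOf(pair.a) !== digitsOf(pair.b)) {
    return { classification: "substantive", confidence: 0.9, description: "A numeric value changed." };
  }
  return { classification: "cosmetic", confidence: 0.6 };
}

// The injectable object: same interface as a real provider, but no network and no API key.
const numberAwareMock: Classifier = {
  classify: (pair) => Promise.resolve(judge(pair)),
};
```

Passing `{ classifier: numberAwareMock }` as the third argument to `diff`, as the README's own examples do with `classifier`, should keep a test run offline and repeatable.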

package/dist/chunk-4GFNMJGB.js ADDED

@@ -0,0 +1,460 @@
+// src/schema.ts
+var SCHEMA_VERSION = "1.0.0";
+// src/classifier.ts
+var DEFAULT_MODEL_ID = "claude-opus-4-8";
+function needsReviewVerdict() {
+  return { classification: "substantive", confidence: 0 };
+}
+// src/classifiers/claude.ts
+var MESSAGES_URL = "https://api.anthropic.com/v1/messages";
+var ANTHROPIC_VERSION = "2023-06-01";
+var MAX_TOKENS = 1024;
+var DEFAULT_TIMEOUT_MS = 6e4;
+var DEFAULT_MAX_RETRIES = 2;
+var BASE_RETRY_DELAY_MS = 500;
+var MAX_RETRY_DELAY_MS = 8e3;
+var SYSTEM_PROMPT = [
+  "You are a careful classifier inside a meaning-aware diff engine. You are given",
+  "two versions of one short span of prose: version A (before) and version B",
+  "(after). Decide whether the change from A to B is:",
+  "",
+  '- "substantive": it alters the meaning \u2014 a changed value, number, date,',
+  "  condition, scope, or any wording a careful reader would act on differently.",
+  '- "cosmetic": it preserves the meaning \u2014 formatting, punctuation, casing,',
+  "  whitespace, renumbering, or a meaning-preserving rewording.",
+  "",
+  "One side may be empty: an empty A means the B text was newly inserted, and an",
+  "empty B means the A text was removed. Judge whether that insertion or removal",
+  "is substantive (it adds or removes meaning, an obligation, or a condition) or",
+  "cosmetic (boilerplate, formatting, or duplicate content).",
+  "",
+  "Rules:",
+  "- Judge only these two snippets; do not assume external context.",
+  '- When genuinely uncertain whether the meaning changed, choose "substantive":',
+  "  it is safer to surface a real change than to hide one.",
+  '- For a substantive change, give a one-sentence factual "description" of what',
+  "  changed \u2014 no advice and no judgement of how significant it is.",
+  '- Set "confidence" in [0, 1] for how sure you are of the classification.'
+].join("\n");
+var VERDICT_SCHEMA = {
+  type: "object",
+  properties: {
+    classification: { type: "string", enum: ["substantive", "cosmetic"] },
+    description: { type: "string" },
+    confidence: { type: "number" }
+  },
+  required: ["classification", "confidence"],
+  additionalProperties: false
+};
+function createDefaultClassifier(config) {
+  const modelId = config.modelId ?? DEFAULT_MODEL_ID;
+  const apiKey = config.apiKey ?? process.env.ANTHROPIC_API_KEY;
+  const timeoutMs = config.timeoutMs ?? DEFAULT_TIMEOUT_MS;
+  const maxRetries = config.maxRetries ?? DEFAULT_MAX_RETRIES;
+  if (apiKey === void 0 || apiKey === "") {
+    throw new Error("createDefaultClassifier: no API key (set ANTHROPIC_API_KEY or pass config.apiKey)");
+  }
+  return {
+    classify: (pair) => {
+      const init = {
+        method: "POST",
+        headers: {
+          "content-type": "application/json",
+          "x-api-key": apiKey,
+          "anthropic-version": ANTHROPIC_VERSION
+        },
+        body: JSON.stringify(buildRequest(modelId, pair))
+      };
+      return classifyWithRetry(() => classifyOnce(init, timeoutMs), maxRetries);
+    }
+  };
+}
+var TransientError = class extends Error {
+  // A field declaration + assignment, not a constructor parameter property:
+  // parameter properties are runtime syntax that Node's strip-only type removal
+  // cannot handle, which would break the zero-build `node src/...` path (ADR-0002).
+  retryAfterMs;
+  constructor(message, retryAfterMs2) {
+    super(message);
+    this.retryAfterMs = retryAfterMs2;
+  }
+};
+async function classifyWithRetry(attempt, maxRetries) {
+  for (let retry = 0; ; retry += 1) {
+    try {
+      return await attempt();
+    } catch (error) {
+      if (!(error instanceof TransientError) || retry >= maxRetries) throw error;
+      await sleep(backoffMs(retry, error.retryAfterMs));
+    }
+  }
+}
+async function classifyOnce(init, timeoutMs) {
+  let response;
+  try {
+    response = await fetchWithTimeout(MESSAGES_URL, init, timeoutMs);
+  } catch (cause) {
+    throw new TransientError(`Anthropic API request failed: ${cause.message}`, 0);
+  }
+  if (!response.ok) {
+    const message = `Anthropic API error ${response.status}: ${await response.text()}`;
+    if (response.status === 429 || response.status >= 500) {
+      throw new TransientError(message, retryAfterMs(response.headers));
+    }
+    throw new Error(message);
+  }
+  return parseVerdict(await response.json());
+}
+async function fetchWithTimeout(url, init, timeoutMs) {
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), timeoutMs);
+  try {
+    return await fetch(url, { ...init, signal: controller.signal });
+  } finally {
+    clearTimeout(timer);
+  }
+}
+function backoffMs(retry, retryAfterMs2) {
+  if (retryAfterMs2 > 0) return Math.min(retryAfterMs2, MAX_RETRY_DELAY_MS);
+  const exponential = BASE_RETRY_DELAY_MS * 2 ** retry;
+  return Math.min(exponential + exponential * 0.25 * Math.random(), MAX_RETRY_DELAY_MS);
+}
+function retryAfterMs(headers) {
+  const seconds = Number(headers.get("retry-after"));
+  return seconds > 0 ? seconds * 1e3 : 0;
+}
+function sleep(ms) {
+  return new Promise((resolve) => {
+    setTimeout(resolve, ms);
+  });
+}
+function buildRequest(modelId, pair) {
+  const outputConfig = {
+    format: { type: "json_schema", schema: VERDICT_SCHEMA }
+  };
+  if (modelSupportsEffort(modelId)) {
+    outputConfig.effort = "low";
+  }
+  return {
+    model: modelId,
+    max_tokens: MAX_TOKENS,
+    output_config: outputConfig,
+    system: [{ type: "text", text: SYSTEM_PROMPT, cache_control: { type: "ephemeral" } }],
+    messages: [{ role: "user", content: `Change type: ${pair.type}.
+A:
+${pair.a}
+B:
+${pair.b}` }]
+  };
+}
+function modelSupportsEffort(modelId) {
+  return modelId.startsWith("claude-opus-") || modelId.startsWith("claude-sonnet-4-6");
+}
+function parseVerdict(data) {
+  const message = data;
+  const text = message.content?.find((block) => block.type === "text")?.text;
+  if (text === void 0) {
+    throw new Error("Anthropic API returned no text content");
+  }
+  return JSON.parse(text);
+}
+// src/version.ts
+var ENGINE_VERSION = "0.1.0";
+var DEFAULT_PROMPT_VERSION = "0";
+// src/pipeline/segment.ts
+var SENTENCE_SEGMENTER = new Intl.Segmenter("en", { granularity: "sentence" });
+var CLAUSE_DELIMITERS = ";:";
+function segment(text, granularity) {
+  const units = [];
+  const delimiters = granularity === "clause" ? CLAUSE_DELIMITERS : "";
+  for (const { segment: sentence, index } of SENTENCE_SEGMENTER.segment(text)) {
+    emitUnits(sentence, index, delimiters, units);
+  }
+  return units;
+}
+function emitUnits(chunk, base, delimiters, out) {
+  if (delimiters.length === 0) {
+    pushTrimmed(chunk, base, out);
+    return;
+  }
+  let cursor = 0;
+  for (let i = 0; i < chunk.length; i++) {
+    if (delimiters.includes(chunk[i])) {
+      pushTrimmed(chunk.slice(cursor, i), base + cursor, out);
+      cursor = i + 1;
+    }
+  }
+  pushTrimmed(chunk.slice(cursor), base + cursor, out);
+}
+function pushTrimmed(part, base, out) {
+  const trimmed = part.trim();
+  if (trimmed.length === 0) return;
+  const start = base + (part.length - part.trimStart().length);
+  out.push({ text: trimmed, span: { start, end: start + trimmed.length } });
+}
+// src/pipeline/align.ts
+var LEADING_ENUMERATOR = /^\s*(?:[([]?\s*(?:\d{1,3}|[a-z]{1,2}|[ivxlcdm]{1,5})\s*[)\].]|[-*•·])\s+/iu;
+var PUNCTUATION = new RegExp("\\p{P}", "gu");
+function align(unitsA, unitsB) {
+  const keysA = unitsA.map(normalize);
+  const keysB = unitsB.map(normalize);
+  const matches = lcsMatches(keysA, keysB);
+  const out = [];
+  let i = 0;
+  let j = 0;
+  for (const [mi, mj] of matches) {
+    emitGap(unitsA.slice(i, mi), unitsB.slice(j, mj), out);
+    const a = unitsA[mi];
+    const b = unitsB[mj];
+    out.push({ tag: a.text === b.text ? "unchanged" : "trivial-change", a, b });
+    i = mi + 1;
+    j = mj + 1;
+  }
+  emitGap(unitsA.slice(i), unitsB.slice(j), out);
+  return detectMoves(out);
+}
+function emitGap(gapA, gapB, out) {
+  const paired = Math.min(gapA.length, gapB.length);
+  let k = 0;
+  for (; k < paired; k++) {
+    const a = gapA[k];
+    const b = gapB[k];
+    if (sharesToken(a, b)) {
+      out.push({ tag: "candidate", a, b });
+    } else {
+      out.push({ tag: "candidate", a, b: null });
+      out.push({ tag: "candidate", a: null, b });
+    }
+  }
+  for (; k < gapA.length; k++) out.push({ tag: "candidate", a: gapA[k], b: null });
+  for (; k < gapB.length; k++) out.push({ tag: "candidate", a: null, b: gapB[k] });
+}
+function detectMoves(pairs) {
+  const insertionByKey = /* @__PURE__ */ new Map();
+  pairs.forEach((pair, index) => {
+    if (pair.tag === "candidate" && pair.a === null) {
+      insertionByKey.set(normalize(pair.b), index);
+    }
+  });
+  const moveTo = /* @__PURE__ */ new Map();
+  pairs.forEach((pair, index) => {
+    if (pair.tag === "candidate" && pair.b === null) {
+      const key = normalize(pair.a);
+      const insertionIndex = insertionByKey.get(key);
+      if (insertionIndex !== void 0) {
+        insertionByKey.delete(key);
+        moveTo.set(index, insertionIndex);
+      }
+    }
+  });
+  if (moveTo.size === 0) return pairs;
+  const movedInsertions = new Set(moveTo.values());
+  return pairs.flatMap((pair, index) => {
+    if (movedInsertions.has(index)) return [];
+    const insertionIndex = moveTo.get(index);
+    return insertionIndex === void 0 ? [pair] : [{ tag: "move", a: pair.a, b: pairs[insertionIndex].b }];
+  });
+}
+function normalize(unit) {
+  return unit.text.toLowerCase().replace(LEADING_ENUMERATOR, "").replace(PUNCTUATION, " ").replace(/\s+/g, " ").trim();
+}
+function sharesToken(a, b) {
+  const tokensB = new Set(tokenize(b));
+  return tokenize(a).some((token) => tokensB.has(token));
+}
+function tokenize(unit) {
+  const normalized = normalize(unit);
+  return normalized.length === 0 ? [] : normalized.split(" ");
+}
+function lcsMatches(a, b) {
+  const n = a.length;
+  const m = b.length;
+  const dp = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
+  for (let i2 = n - 1; i2 >= 0; i2--) {
+    for (let j2 = m - 1; j2 >= 0; j2--) {
+      dp[i2][j2] = a[i2] === b[j2] ? dp[i2 + 1][j2 + 1] + 1 : Math.max(dp[i2 + 1][j2], dp[i2][j2 + 1]);
+    }
+  }
+  const matches = [];
+  let i = 0;
+  let j = 0;
+  while (i < n && j < m) {
+    if (a[i] === b[j]) {
+      matches.push([i, j]);
+      i++;
+      j++;
+    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
+      i++;
+    } else {
+      j++;
+    }
+  }
+  return matches;
+}
+// src/pipeline/classify.ts
+var MAX_ATTEMPTS = 2;
+var MIN_TRUSTED_CONFIDENCE = 0.5;
+async function classify(candidates, classifier) {
+  const changes = [];
+  for (const pair of candidates) {
+    changes.push(await classifyPair(pair, classifier));
+  }
+  return changes;
+}
+async function classifyPair(pair, classifier) {
+  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
+    let verdict;
+    try {
+      verdict = await classifier.classify(pair);
+    } catch {
+      continue;
+    }
+    if (isValidVerdict(verdict)) {
+      return toChange(pair, verdict);
+    }
+  }
+  return needsReviewChange(pair);
+}
+function isValidVerdict(value) {
+  if (typeof value !== "object" || value === null) return false;
+  const v = value;
+  if (v.classification !== "substantive" && v.classification !== "cosmetic") return false;
+  if (typeof v.confidence !== "number" || !Number.isFinite(v.confidence)) return false;
+  if (v.confidence < 0 || v.confidence > 1) return false;
+  if (v.description !== void 0 && typeof v.description !== "string") return false;
+  return true;
+}
+function toChange(pair, verdict) {
+  const base = {
+    type: pair.type,
+    classification: verdict.classification,
+    spanA: pair.spanA,
+    spanB: pair.spanB,
+    confidence: verdict.confidence,
+    needsReview: verdict.confidence < MIN_TRUSTED_CONFIDENCE
+  };
+  return verdict.description === void 0 ? base : { ...base, description: verdict.description };
+}
+function needsReviewChange(pair) {
+  const { classification, confidence } = needsReviewVerdict();
+  return {
+    type: pair.type,
+    classification,
+    spanA: pair.spanA,
+    spanB: pair.spanB,
+    confidence,
+    needsReview: true
+  };
+}
+// src/cache.ts
+import { createHash } from "crypto";
+var FIELD_SEPARATOR = String.fromCharCode(0);
+function createMemoryCache() {
+  const store = /* @__PURE__ */ new Map();
+  return {
+    get: (key) => Promise.resolve(store.get(key)),
+    set: (key, verdict) => {
+      store.set(key, verdict);
+      return Promise.resolve();
+    }
+  };
+}
+function withCache(classifier, options) {
+  const cache = options.cache ?? createMemoryCache();
+  return {
+    classify: async (pair) => {
+      const key = cacheKey(pair, options.modelId, options.promptVersion);
+      const cached = await cache.get(key);
+      if (cached !== void 0) return cached;
+      const verdict = await classifier.classify(pair);
+      await cache.set(key, verdict);
+      return verdict;
+    }
+  };
+}
+function cacheKey(pair, modelId, promptVersion) {
+  const parts = [normalize2(pair.a), normalize2(pair.b), promptVersion, modelId];
+  return createHash("sha256").update(parts.join(FIELD_SEPARATOR)).digest("hex");
+}
+function normalize2(text) {
+  return text.replace(/\s+/g, " ").trim();
+}
+// src/index.ts
+async function diff(a, b, options) {
+  const granularity = options?.segmentGranularity ?? "sentence";
+  const pairs = align(segment(a, granularity), segment(b, granularity));
+  const candidates = [];
+  for (const pair of pairs) {
+    if (pair.tag !== "candidate") continue;
+    if (pair.a !== null && pair.b !== null) {
+      candidates.push({ type: "modification", a: pair.a.text, b: pair.b.text, spanA: pair.a.span, spanB: pair.b.span });
+    } else if (pair.b !== null) {
+      candidates.push({ type: "insertion", a: "", b: pair.b.text, spanA: null, spanB: pair.b.span });
+    } else {
+      candidates.push({ type: "deletion", a: pair.a.text, b: "", spanA: pair.a.span, spanB: null });
+    }
+  }
+  const modelId = options?.modelId ?? DEFAULT_MODEL_ID;
+  const classified = candidates.length === 0 ? [] : await classify(candidates, options?.classifier ?? createDefaultClassifier({ modelId }));
+  const changes = [];
+  let classifiedIndex = 0;
+  for (const pair of pairs) {
+    if (pair.tag === "unchanged") continue;
+    if (pair.tag === "trivial-change") {
+      changes.push(cosmeticModification(pair.a, pair.b));
+    } else if (pair.tag === "move") {
+      changes.push(moveChange(pair.a, pair.b));
+    } else {
+      changes.push(classified[classifiedIndex]);
+      classifiedIndex += 1;
+    }
+  }
+  const provenance = {
+    modelId,
+    promptVersion: options?.promptVersion ?? DEFAULT_PROMPT_VERSION,
+    engineVersion: ENGINE_VERSION
+  };
+  return { schemaVersion: SCHEMA_VERSION, provenance, changes, summary: summarize(changes) };
+}
+function cosmeticModification(a, b) {
+  return { type: "modification", classification: "cosmetic", spanA: a.span, spanB: b.span, confidence: 1, needsReview: false };
+}
+function moveChange(a, b) {
+  return { type: "move", classification: "cosmetic", spanA: a.span, spanB: b.span, confidence: 1, needsReview: false };
+}
+function summarize(changes) {
+  const byType = { insertion: 0, deletion: 0, modification: 0, move: 0 };
+  let substantive = 0;
+  let cosmetic = 0;
+  let needsReview = 0;
+  for (const change of changes) {
+    byType[change.type] += 1;
+    if (change.classification === "substantive") substantive += 1;
+    else cosmetic += 1;
+    if (change.needsReview) needsReview += 1;
+  }
+  return { substantive, cosmetic, byType, needsReview };
+}
+export {
+  SCHEMA_VERSION,
+  DEFAULT_MODEL_ID,
+  needsReviewVerdict,
+  createDefaultClassifier,
+  ENGINE_VERSION,
+  DEFAULT_PROMPT_VERSION,
+  createMemoryCache,
+  withCache,
+  cacheKey,
+  diff
+};
+//# sourceMappingURL=chunk-4GFNMJGB.js.map
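
Reviewer note on the bundle above: the retry delay computed by `backoffMs` is easier to reason about when the jitter source can be pinned. The sketch below restates that function in TypeScript with the same constants. The `random` parameter is an addition made here for illustration (the bundled code calls `Math.random()` directly), so treat this as a sketch of the arithmetic rather than the package's API.

```typescript
const BASE_RETRY_DELAY_MS = 500;
const MAX_RETRY_DELAY_MS = 8000;

// Delay before retry number `retry` (0-based). A positive Retry-After value wins;
// otherwise the delay doubles per retry with up to 25% jitter. Both paths are capped.
// `random` is an illustrative addition so the jitter can be fixed in tests.
function backoffMs(retry: number, retryAfterMs: number, random: () => number = Math.random): number {
  if (retryAfterMs > 0) return Math.min(retryAfterMs, MAX_RETRY_DELAY_MS);
  const exponential = BASE_RETRY_DELAY_MS * 2 ** retry;
  return Math.min(exponential + exponential * 0.25 * random(), MAX_RETRY_DELAY_MS);
}

// Lower and upper bound of the delay for each retry when no Retry-After header applies.
for (let retry = 0; retry < 5; retry += 1) {
  const low = backoffMs(retry, 0, () => 0);
  const high = backoffMs(retry, 0, () => 1);
  console.log(`retry ${retry}: ${low}..${high} ms`);
}
```

With the default `DEFAULT_MAX_RETRIES` of 2, `classifyWithRetry` sleeps only for retries 0 and 1, so the jittered waits fall between 500 and 625 ms, then between 1000 and 1250 ms. Note also that a `Retry-After` value larger than 8 seconds is clamped to `MAX_RETRY_DELAY_MS` rather than honoured in full.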

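Reviewer note on span offsets: the half-open span arithmetic in the bundle's `pushTrimmed` can be checked in isolation. The following sketch restates it in TypeScript and confirms that slicing the original input by the reported offsets reproduces the trimmed unit text. It measures leading whitespace with a regular expression instead of `trimStart`, a substitution made here for portability, and `Unit` is a locally declared type, not one of the package's exports.

```typescript
type Unit = { text: string; span: { start: number; end: number } };

// Record `part` (found at offset `base` in the original input) as a unit,
// with whitespace trimmed from the text and the span narrowed to match.
function pushTrimmed(part: string, base: number, out: Unit[]): void {
  const trimmed = part.trim();
  if (trimmed.length === 0) return; // whitespace-only pieces produce no unit
  const leading = part.length - part.replace(/^\s+/, "").length;
  const start = base + leading;
  out.push({ text: trimmed, span: { start, end: start + trimmed.length } });
}

// Split one sentence at a clause delimiter, the way the "clause" granularity does.
const input = "  Fees are due in 30 days;  late fees apply. ";
const units: Unit[] = [];
const cut = input.indexOf(";");
pushTrimmed(input.slice(0, cut), 0, units);
pushTrimmed(input.slice(cut + 1), cut + 1, units);

for (const unit of units) {
  // Half-open [start, end): slicing the literal input reproduces the unit text.
  console.log(JSON.stringify(input.slice(unit.span.start, unit.span.end)), unit.span);
}
```

Because the offsets index into the untouched input, any normalization used for alignment can stay internal without shifting what a consumer cites.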
package/dist/chunk-4GFNMJGB.js.map ADDED

@@ -0,0 +1 @@

+ {"version":3,"sources":["../src/schema.ts","../src/classifier.ts","../src/classifiers/claude.ts","../src/version.ts","../src/pipeline/segment.ts","../src/pipeline/align.ts","../src/pipeline/classify.ts","../src/cache.ts","../src/index.ts"],"sourcesContent":["/**\r\n * The semdiff public contract (ADR-0006).\r\n *\r\n * A `StructuredDiff` is the engine's primary output. Every human-readable\r\n * rendering is a pure function of it, and machine consumers — notably the\r\n * downstream `sust-reg-reporter` application — integrate against these types\r\n * and their JSON form. This module is pure types and constants: no logic, no\r\n * imports, nothing domain-specific (ADR-0001).\r\n */\r\n\r\n/**\r\n * Version of the StructuredDiff contract. Additive-by-default (ADR-0006): a\r\n * backwards-compatible addition keeps the version; a breaking shape change\r\n * bumps it and gets its own ADR. `StructuredDiff.schemaVersion` is typed as a\r\n * plain `string` (not this literal) so an additive bump is not itself a\r\n * breaking type change for pinned consumers.\r\n */\r\nexport const SCHEMA_VERSION = \"1.0.0\";\r\n\r\n/**\r\n * A `Span` locates a change within ONE input by half-open `[start, end)`\r\n * CHARACTER OFFSETS (ADR-0007).\r\n *\r\n * INVARIANT (load-bearing for consumer citation integrity): offsets index into\r\n * the EXACT, LITERAL, UN-NORMALIZED input string the caller passed. For\r\n * `sust-reg-reporter` that string is the immutable content-addressed snapshot\r\n * text (its ADR-0004 citation integrity, ADR-0011 snapshot store), so the\r\n * offsets resolve against a stored snapshot. Normalization applied internally\r\n * for alignment (whitespace, casing, punctuation, numbering) MUST NOT shift the\r\n * reported offsets. These `{ start, end }` map field-for-field onto the\r\n * consumer's citation span (`@sust-reg/core` `SourceCitation.span`).\r\n */\r\nexport interface Span {\r\n /** Inclusive start character offset into the literal input. 
*/\r\n readonly start: number;\r\n /** Exclusive end character offset into the literal input. */\r\n readonly end: number;\r\n /**\r\n * Optional id of the segmentation unit this span falls in (ADR-0003).\r\n * Additive metadata only — consumers anchor on `start`/`end`, never this.\r\n */\r\n readonly unitId?: string;\r\n}\r\n\r\n/** The kind of edit a change represents. */\r\nexport type ChangeType = \"insertion\" | \"deletion\" | \"modification\" | \"move\";\r\n\r\n/** Whether a change alters meaning (`substantive`) or not (`cosmetic`). */\r\nexport type Classification = \"substantive\" | \"cosmetic\";\r\n\r\n/** One classified change between input A and input B. */\r\nexport interface Change {\r\n readonly type: ChangeType;\r\n readonly classification: Classification;\r\n /** Location in input A; `null` for a pure insertion (absent from A). */\r\n readonly spanA: Span | null;\r\n /** Location in input B; `null` for a pure deletion (absent from B). */\r\n readonly spanB: Span | null;\r\n /**\r\n * Short description of what changed. Present only for substantive\r\n * modifications; the key is OMITTED otherwise (never set to `undefined`,\r\n * per `exactOptionalPropertyTypes`).\r\n */\r\n readonly description?: string;\r\n /** Classifier confidence in `[0, 1]`. */\r\n readonly confidence: number;\r\n /** Set for low-confidence or failed/degraded classifications (ADR-0004). */\r\n readonly needsReview: boolean;\r\n}\r\n\r\n/** The reproducibility stamp for a run (ADR-0004): identifies the model run. */\r\nexport interface Provenance {\r\n readonly modelId: string;\r\n readonly promptVersion: string;\r\n readonly engineVersion: string;\r\n}\r\n\r\n/** Aggregate counts for quick triage. */\r\nexport interface DiffSummary {\r\n readonly substantive: number;\r\n readonly cosmetic: number;\r\n /** Count per change type; all four keys are present (zeros allowed). 
*/\r\n readonly byType: Readonly<Record<ChangeType, number>>;\r\n readonly needsReview: number;\r\n}\r\n\r\n/**\r\n * The engine's primary output (ADR-0006): a stable, versioned, JSON-\r\n * serializable diff. All human-readable views derive from it.\r\n */\r\nexport interface StructuredDiff {\r\n /** The `SCHEMA_VERSION` in effect at emit time; typed `string` for additive bumps. */\r\n readonly schemaVersion: string;\r\n readonly provenance: Provenance;\r\n readonly changes: readonly Change[];\r\n readonly summary: DiffSummary;\r\n}\r\n","/**\n * The classification boundary (ADR-0004).\n *\n * semdiff uses the LLM strictly as a gated, structured classifier behind a\n * small `Classifier` interface — never a free-form diff narrator, never a\n * hardwired SDK. The default provider is the latest capable Claude model,\n * injected via config, so consumers (e.g. `sust-reg-reporter`) are not forced\n * to own provider wiring.\n */\nimport type { Classification, Span } from \"./schema.ts\";\n\n/**\n * The structural kind of a candidate change. A subset of `ChangeType`: `move`\n * is detected deterministically before classification (ADR-0010), so it never\n * reaches the model.\n */\nexport type CandidateType = \"insertion\" | \"deletion\" | \"modification\";\n\n/**\n * A single changed pair handed to the classifier for a verdict. One side may be\n * absent: for an insertion `a` is `\"\"` and `spanA` is `null`; for a deletion\n * `b` is `\"\"` and `spanB` is `null` (ADR-0011).\n */\nexport interface CandidatePair {\n /** The structural kind of change. */\n readonly type: CandidateType;\n /** The unit text from input A; `\"\"` for an insertion. */\n readonly a: string;\n /** The unit text from input B; `\"\"` for a deletion. */\n readonly b: string;\n /** Location of `a` within input A (literal character offsets, ADR-0007); `null` for an insertion. 
*/\n readonly spanA: Span | null;\n /** Location of `b` within input B (literal character offsets, ADR-0007); `null` for a deletion. */\n readonly spanB: Span | null;\n}\n\n/** The schema-validated verdict for one candidate pair. */\nexport interface ClassifierVerdict {\n readonly classification: Classification;\n /** Present only for a substantive verdict; OMITTED otherwise. */\n readonly description?: string;\n /** Provider/engine confidence in `[0, 1]`. */\n readonly confidence: number;\n}\n\n/** The injectable provider boundary. An implementation wraps one LLM provider. */\nexport interface Classifier {\n classify(pair: CandidatePair): Promise<ClassifierVerdict>;\n}\n\n/**\n * Default model id (the latest capable Claude, ADR-0004). Stamped into run\n * provenance, and used by the default classifier when the caller does not pin\n * one. Callers that inject their own provider should pass `modelId` for an\n * accurate provenance stamp.\n */\nexport const DEFAULT_MODEL_ID = \"claude-opus-4-8\";\n\n/**\n * The never-drop / never-fabricate fallback verdict (ADR-0004). When a model\n * response fails schema validation, retries are exhausted, or the provider\n * errors, the `classify` stage records this conservative verdict — `substantive`\n * (so the change is surfaced for review, not hidden) with zero confidence — and\n * flags the resulting change for review, rather than dropping the pair or\n * guessing a cosmetic/substantive call.\n */\nexport function needsReviewVerdict(): ClassifierVerdict {\n return { classification: \"substantive\", confidence: 0 };\n}\n","/**\r\n * Default classifier — calls the Anthropic Messages API to judge whether a\r\n * change is substantive or cosmetic (ADR-0004, ADR-0009).\r\n *\r\n * It uses the global `fetch` (no SDK), so the engine keeps ZERO runtime\r\n * dependencies; a consumer that needs a different provider or transport injects\r\n * its own `Classifier` instead. 
Determinism is steered by a pinned model, a\r\n * pinned prompt, and low effort where the model accepts it — Opus 4.8 removed the\r\n * `temperature` parameter, so there is no `temperature: 0`. The verdict is returned through a constrained\r\n * JSON schema and then RE-VALIDATED by the classify stage, so this module can\r\n * parse leniently: any malformed response surfaces as a thrown error that the\r\n * classify stage retries, then degrades to needs-review.\r\n *\r\n * Transport resilience (ADR-0012): each call has a timeout and retries transient\r\n * failures — HTTP 429/5xx, network errors, and abort timeouts — with exponential\r\n * backoff, honouring a `Retry-After` header. Non-transient errors (400, auth)\r\n * fail fast. This is distinct from the classify stage's verdict-level retry: the\r\n * provider exhausts its backoff first, and only if it still throws does the\r\n * stage's safety net degrade the pair to needs-review.\r\n */\r\nimport { DEFAULT_MODEL_ID, type CandidatePair, type Classifier, type ClassifierVerdict } from \"../classifier.ts\";\r\n\r\nconst MESSAGES_URL = \"https://api.anthropic.com/v1/messages\";\r\nconst ANTHROPIC_VERSION = \"2023-06-01\";\r\nconst MAX_TOKENS = 1024;\r\n\r\n/** Per-request timeout before the call is aborted and treated as transient (ADR-0012). */\r\nconst DEFAULT_TIMEOUT_MS = 60_000;\r\n/** Retries after the initial attempt on a transient failure (ADR-0012). */\r\nconst DEFAULT_MAX_RETRIES = 2;\r\n/** Backoff for retry n is BASE * 2**n plus jitter, capped at MAX (ADR-0012). */\r\nconst BASE_RETRY_DELAY_MS = 500;\r\nconst MAX_RETRY_DELAY_MS = 8_000;\r\n\r\n/**\r\n * Static classification instructions — the stable, cacheable prompt prefix\r\n * (ADR-0009). Domain-neutral (ADR-0001). 
Recall-biased per ADR-0005: when\r\n * uncertain, prefer \"substantive\" so a real change is surfaced, not hidden.\r\n *\r\n * Exported so a release-gating test can pin its hash to `DEFAULT_PROMPT_VERSION`\r\n * (ADR-0005): editing this text without bumping the version would let a persisted\r\n * verdict cache serve stale results. Not part of the package's public `exports`.\r\n */\r\nexport const SYSTEM_PROMPT = [\r\n \"You are a careful classifier inside a meaning-aware diff engine. You are given\",\r\n \"two versions of one short span of prose: version A (before) and version B\",\r\n \"(after). Decide whether the change from A to B is:\",\r\n \"\",\r\n '- \"substantive\": it alters the meaning — a changed value, number, date,',\r\n \" condition, scope, or any wording a careful reader would act on differently.\",\r\n '- \"cosmetic\": it preserves the meaning — formatting, punctuation, casing,',\r\n \" whitespace, renumbering, or a meaning-preserving rewording.\",\r\n \"\",\r\n \"One side may be empty: an empty A means the B text was newly inserted, and an\",\r\n \"empty B means the A text was removed. Judge whether that insertion or removal\",\r\n \"is substantive (it adds or removes meaning, an obligation, or a condition) or\",\r\n \"cosmetic (boilerplate, formatting, or duplicate content).\",\r\n \"\",\r\n \"Rules:\",\r\n \"- Judge only these two snippets; do not assume external context.\",\r\n '- When genuinely uncertain whether the meaning changed, choose \"substantive\":',\r\n \" it is safer to surface a real change than to hide one.\",\r\n '- For a substantive change, give a one-sentence factual \"description\" of what',\r\n \" changed — no advice and no judgement of how significant it is.\",\r\n '- Set \"confidence\" in [0, 1] for how sure you are of the classification.',\r\n].join(\"\\n\");\r\n\r\n/** Constrained output shape (structured outputs). Ranges are validated downstream. 
*/\r\nconst VERDICT_SCHEMA = {\r\n type: \"object\",\r\n properties: {\r\n classification: { type: \"string\", enum: [\"substantive\", \"cosmetic\"] },\r\n description: { type: \"string\" },\r\n confidence: { type: \"number\" },\r\n },\r\n required: [\"classification\", \"confidence\"],\r\n additionalProperties: false,\r\n} as const;\r\n\r\n/** Configuration for the default Anthropic-backed classifier. */\r\nexport interface DefaultClassifierConfig {\r\n /** Model id; defaults to the latest capable Claude (ADR-0004). */\r\n readonly modelId?: string;\r\n /** API key; defaults to `process.env.ANTHROPIC_API_KEY`. */\r\n readonly apiKey?: string;\r\n /** Per-request timeout in ms before the call is aborted and retried (ADR-0012). Default 60000. */\r\n readonly timeoutMs?: number;\r\n /** Retries on a transient failure — 429, 5xx, network, or timeout (ADR-0012). Default 2. */\r\n readonly maxRetries?: number;\r\n}\r\n\r\n/**\r\n * Construct the default classifier. Throws immediately if no API key is\r\n * available, so a diff that needs the model fails at a clear boundary rather\r\n * than per-call.\r\n */\r\nexport function createDefaultClassifier(config: DefaultClassifierConfig): Classifier {\r\n const modelId = config.modelId ?? DEFAULT_MODEL_ID;\r\n const apiKey = config.apiKey ?? process.env.ANTHROPIC_API_KEY;\r\n const timeoutMs = config.timeoutMs ?? DEFAULT_TIMEOUT_MS;\r\n const maxRetries = config.maxRetries ?? 
DEFAULT_MAX_RETRIES;\r\n if (apiKey === undefined || apiKey === \"\") {\r\n throw new Error(\"createDefaultClassifier: no API key (set ANTHROPIC_API_KEY or pass config.apiKey)\");\r\n }\r\n\r\n return {\r\n classify: (pair: CandidatePair): Promise<ClassifierVerdict> => {\r\n const init: RequestInit = {\r\n method: \"POST\",\r\n headers: {\r\n \"content-type\": \"application/json\",\r\n \"x-api-key\": apiKey,\r\n \"anthropic-version\": ANTHROPIC_VERSION,\r\n },\r\n body: JSON.stringify(buildRequest(modelId, pair)),\r\n };\r\n return classifyWithRetry(() => classifyOnce(init, timeoutMs), maxRetries);\r\n },\r\n };\r\n}\r\n\r\n/**\r\n * A transient failure worth retrying: a 429/5xx response, a network error, or a\r\n * timeout. `retryAfterMs` is the server's hint (0 if none). Non-transient errors\r\n * are thrown as plain `Error`s and propagate without a retry.\r\n */\r\nclass TransientError extends Error {\r\n // A field declaration + assignment, not a constructor parameter property:\r\n // parameter properties are runtime syntax that Node's strip-only type removal\r\n // cannot handle, which would break the zero-build `node src/...` path (ADR-0002).\r\n readonly retryAfterMs: number;\r\n constructor(message: string, retryAfterMs: number) {\r\n super(message);\r\n this.retryAfterMs = retryAfterMs;\r\n }\r\n}\r\n\r\n/** Run `attempt`, retrying transient failures with backoff up to `maxRetries` (ADR-0012). */\r\nasync function classifyWithRetry(\r\n attempt: () => Promise<ClassifierVerdict>,\r\n maxRetries: number,\r\n): Promise<ClassifierVerdict> {\r\n for (let retry = 0; ; retry += 1) {\r\n try {\r\n return await attempt();\r\n } catch (error) {\r\n if (!(error instanceof TransientError) || retry >= maxRetries) throw error;\r\n await sleep(backoffMs(retry, error.retryAfterMs));\r\n }\r\n }\r\n}\r\n\r\n/** One request attempt: transient failures throw `TransientError`, others throw plainly. 
*/\r\nasync function classifyOnce(init: RequestInit, timeoutMs: number): Promise<ClassifierVerdict> {\r\n let response: Response;\r\n try {\r\n response = await fetchWithTimeout(MESSAGES_URL, init, timeoutMs);\r\n } catch (cause) {\r\n // Aborted (timeout) or a network failure — both transient.\r\n throw new TransientError(`Anthropic API request failed: ${(cause as Error).message}`, 0);\r\n }\r\n if (!response.ok) {\r\n const message = `Anthropic API error ${response.status}: ${await response.text()}`;\r\n if (response.status === 429 || response.status >= 500) {\r\n throw new TransientError(message, retryAfterMs(response.headers));\r\n }\r\n throw new Error(message);\r\n }\r\n return parseVerdict(await response.json());\r\n}\r\n\r\n/** `fetch` with an abort-based timeout; the timer is always cleared. */\r\nasync function fetchWithTimeout(url: string, init: RequestInit, timeoutMs: number): Promise<Response> {\r\n const controller = new AbortController();\r\n const timer = setTimeout(() => controller.abort(), timeoutMs);\r\n try {\r\n return await fetch(url, { ...init, signal: controller.signal });\r\n } finally {\r\n clearTimeout(timer);\r\n }\r\n}\r\n\r\n/** Backoff for retry `n`: the server's `Retry-After` if given, else exponential with jitter. */\r\nfunction backoffMs(retry: number, retryAfterMs: number): number {\r\n if (retryAfterMs > 0) return Math.min(retryAfterMs, MAX_RETRY_DELAY_MS);\r\n const exponential = BASE_RETRY_DELAY_MS * 2 ** retry;\r\n return Math.min(exponential + exponential * 0.25 * Math.random(), MAX_RETRY_DELAY_MS);\r\n}\r\n\r\n/** Parse `Retry-After` (seconds) into ms; 0 when absent or unparseable. */\r\nfunction retryAfterMs(headers: Headers): number {\r\n const seconds = Number(headers.get(\"retry-after\"));\r\n return seconds > 0 ? 
seconds * 1000 : 0;\r\n}\r\n\r\nfunction sleep(ms: number): Promise<void> {\r\n return new Promise((resolve) => {\r\n setTimeout(resolve, ms);\r\n });\r\n}\r\n\r\n/** Build the Messages API request body for one candidate pair. */\r\nfunction buildRequest(modelId: string, pair: CandidatePair): unknown {\r\n const outputConfig: Record<string, unknown> = {\r\n format: { type: \"json_schema\", schema: VERDICT_SCHEMA },\r\n };\r\n // `effort` steers determinism and cost, but only Opus and Sonnet 4.6 accept\r\n // it; Haiku and Sonnet 4.5 reject it with a 400. Include it only where\r\n // supported so overriding `modelId` to a cheaper model (an eval sweep, say)\r\n // does not fail. Omitting it is harmless; sending it where unsupported is not.\r\n if (modelSupportsEffort(modelId)) {\r\n outputConfig.effort = \"low\";\r\n }\r\n return {\r\n model: modelId,\r\n max_tokens: MAX_TOKENS,\r\n output_config: outputConfig,\r\n system: [{ type: \"text\", text: SYSTEM_PROMPT, cache_control: { type: \"ephemeral\" } }],\r\n messages: [{ role: \"user\", content: `Change type: ${pair.type}.\\nA:\\n${pair.a}\\n\\nB:\\n${pair.b}` }],\r\n };\r\n}\r\n\r\n/**\r\n * Whether `output_config.effort` is accepted by `modelId`. Opus (4.5+) and\r\n * Sonnet 4.6 support it; Haiku and Sonnet 4.5 return a 400. Biased to omit when\r\n * unsure — a missing effort still succeeds, an unsupported effort does not — so\r\n * a future model silently runs without effort rather than erroring.\r\n */\r\nfunction modelSupportsEffort(modelId: string): boolean {\r\n return modelId.startsWith(\"claude-opus-\") || modelId.startsWith(\"claude-sonnet-4-6\");\r\n}\r\n\r\n/** Extract the structured verdict from the Messages API response (lenient). 
*/\r\nfunction parseVerdict(data: unknown): ClassifierVerdict {\r\n const message = data as { content?: ReadonlyArray<{ type?: string; text?: string }> };\r\n const text = message.content?.find((block) => block.type === \"text\")?.text;\r\n if (text === undefined) {\r\n throw new Error(\"Anthropic API returned no text content\");\r\n }\r\n return JSON.parse(text) as ClassifierVerdict;\r\n}\r\n","/**\r\n * Version constants feeding the reproducibility stamp (ADR-0004). A run is\r\n * stamped with the model id, the prompt version, and this engine version so\r\n * results are reproducible and cache keys are stable.\r\n */\r\n\r\n/**\r\n * semdiff engine version, stamped into `Provenance.engineVersion`. Kept in sync\r\n * with `package.json` `version`; `test/version.contract.test.ts` fails the build\r\n * if the two drift, so a published artifact never stamps a stale version.\r\n */\r\nexport const ENGINE_VERSION = \"0.1.0\";\r\n\r\n/** Default prompt-template version, stamped into `Provenance.promptVersion`. */\r\nexport const DEFAULT_PROMPT_VERSION = \"0\";\r\n","/**\r\n * Stage 1 — segment (ADR-0003). Local and deterministic; no model.\r\n *\r\n * Split an input into comparable units. At `sentence` granularity each unit is\r\n * a sentence; at `clause` granularity sentences are further divided at strong\r\n * intra-sentence separators. Each `Unit` carries the half-open `[start, end)`\r\n * CHARACTER OFFSETS of its text within the LITERAL input, so spans reported\r\n * downstream index the caller's exact input (the offset invariant, ADR-0007):\r\n * `input.slice(unit.span.start, unit.span.end) === unit.text` always holds.\r\n * Whitespace at unit boundaries is excluded from the span (and the text); no\r\n * other normalization is applied, so offsets never drift.\r\n */\r\nimport type { Span } from \"../schema.ts\";\r\n\r\n/** The granularity at which an input is segmented. 
*/\r\nexport type SegmentGranularity = \"sentence\" | \"clause\";\r\n\r\n/** One comparable unit of an input, anchored to the literal input by offsets. */\r\nexport interface Unit {\r\n /** The unit's text, verbatim from the input (boundary whitespace trimmed). */\r\n readonly text: string;\r\n /** Half-open offsets of `text` within the literal input. */\r\n readonly span: Span;\r\n}\r\n\r\n/**\r\n * Sentence breaking is language-aware, and determinism is a core guarantee\r\n * (ADR-0005), so we pin the locale rather than use the ambient runtime locale.\r\n * Making the locale configurable is a later additive change.\r\n */\r\nconst SENTENCE_SEGMENTER = new Intl.Segmenter(\"en\", { granularity: \"sentence\" });\r\n\r\n/**\r\n * Strong intra-sentence clause separators. Comma-level splitting is deliberately\r\n * excluded — too unreliable to be deterministically useful — and enumerated-\r\n * clause structural cues are a future additive enhancement.\r\n */\r\nconst CLAUSE_DELIMITERS = \";:\";\r\n\r\n/**\r\n * Segment `text` into ordered `Unit`s at the given granularity. Deterministic;\r\n * no model. Empty and whitespace-only inputs yield no units.\r\n */\r\nexport function segment(text: string, granularity: SegmentGranularity): readonly Unit[] {\r\n const units: Unit[] = [];\r\n const delimiters = granularity === \"clause\" ? 
CLAUSE_DELIMITERS : \"\";\r\n for (const { segment: sentence, index } of SENTENCE_SEGMENTER.segment(text)) {\r\n emitUnits(sentence, index, delimiters, units);\r\n }\r\n return units;\r\n}\r\n\r\n/**\r\n * Emit trimmed units from a sentence `chunk` located at absolute offset `base`.\r\n * With no delimiters the chunk is a single unit; otherwise it is split at each\r\n * delimiter character, offsets staying absolute into the literal input.\r\n */\r\nfunction emitUnits(chunk: string, base: number, delimiters: string, out: Unit[]): void {\r\n if (delimiters.length === 0) {\r\n pushTrimmed(chunk, base, out);\r\n return;\r\n }\r\n let cursor = 0;\r\n for (let i = 0; i < chunk.length; i++) {\r\n if (delimiters.includes(chunk[i]!)) {\r\n pushTrimmed(chunk.slice(cursor, i), base + cursor, out);\r\n cursor = i + 1;\r\n }\r\n }\r\n pushTrimmed(chunk.slice(cursor), base + cursor, out);\r\n}\r\n\r\n/**\r\n * Trim boundary whitespace from `part` and, if non-empty, push a `Unit` whose\r\n * span points at the trimmed content within the literal input (`base` is the\r\n * absolute offset of `part`).\r\n */\r\nfunction pushTrimmed(part: string, base: number, out: Unit[]): void {\r\n const trimmed = part.trim();\r\n if (trimmed.length === 0) return;\r\n const start = base + (part.length - part.trimStart().length);\r\n out.push({ text: trimmed, span: { start, end: start + trimmed.length } });\r\n}\r\n","/**\r\n * Stage 2 — align (ADR-0003). Local and deterministic; no LLM.\r\n *\r\n * Match units across A and B and tag each pairing so the stage 2 -> 3 gate can\r\n * keep unchanged and cosmetic content away from the model:\r\n *\r\n * - `unchanged` — paired and textually identical.\r\n * - `trivial-change` — paired after normalization (whitespace, casing,\r\n * punctuation, and leading enumeration collapsed) but the\r\n * literal text differs. 
A cosmetic edit.\r\n * - `move` — a relocation of identical content (ADR-0010): a deletion\r\n * whose normalized content matches an insertion elsewhere,\r\n * re-paired into one change. Both old (`a`) and new (`b`)\r\n * positions are present; the text is unchanged.\r\n * - `candidate` — a genuine change needing downstream judgment: a paired\r\n * modification (both sides present), or a one-sided\r\n * insertion (`a === null`) or deletion (`b === null`).\r\n *\r\n * Pairing runs a longest-common-subsequence match over the normalized keys, then\r\n * pairs the survivors in each gap positionally when they share a token. A final\r\n * pass re-pairs content-identical deletion/insertion survivors into `move`s.\r\n *\r\n * Normalization is used ONLY to decide matches; it never touches the `Unit`\r\n * offsets, so the literal-input invariant (ADR-0007) is preserved untouched.\r\n */\r\nimport type { Unit } from \"./segment.ts\";\r\n\r\n/** How an aligned pairing relates its A and B units. */\r\nexport type AlignmentTag = \"unchanged\" | \"trivial-change\" | \"move\" | \"candidate\";\r\n\r\n/** A pairing of units across inputs; either side may be `null`. */\r\nexport interface AlignedPair {\r\n readonly tag: AlignmentTag;\r\n /** Unit from A, or `null` for an insertion. */\r\n readonly a: Unit | null;\r\n /** Unit from B, or `null` for a deletion. */\r\n readonly b: Unit | null;\r\n}\r\n\r\n/** Leading list/enumeration marker, e.g. \"1.\", \"1)\", \"(a)\", \"iv.\", or a bullet. */\r\nconst LEADING_ENUMERATOR = /^\\s*(?:[([]?\\s*(?:\\d{1,3}|[a-z]{1,2}|[ivxlcdm]{1,5})\\s*[)\\].]|[-*•·])\\s+/iu;\r\n\r\n/**\r\n * Unicode punctuation — quotes, dashes, periods, commas, parentheses, etc.\r\n * Symbols are deliberately KEPT: collapsing e.g. 
\"<\" and \">\" (or \"=\" / \"+\")\r\n * would mask a substantive change as cosmetic, and missing substance is the\r\n * costly error (ADR-0005).\r\n */\r\nconst PUNCTUATION = /\\p{P}/gu;\r\n\r\n/**\r\n * Align the segmented units of A and B into tagged pairings, in order.\r\n * Deterministic; no model.\r\n */\r\nexport function align(unitsA: readonly Unit[], unitsB: readonly Unit[]): readonly AlignedPair[] {\r\n const keysA = unitsA.map(normalize);\r\n const keysB = unitsB.map(normalize);\r\n const matches = lcsMatches(keysA, keysB);\r\n\r\n const out: AlignedPair[] = [];\r\n let i = 0;\r\n let j = 0;\r\n for (const [mi, mj] of matches) {\r\n emitGap(unitsA.slice(i, mi), unitsB.slice(j, mj), out);\r\n const a = unitsA[mi]!;\r\n const b = unitsB[mj]!;\r\n out.push({ tag: a.text === b.text ? \"unchanged\" : \"trivial-change\", a, b });\r\n i = mi + 1;\r\n j = mj + 1;\r\n }\r\n emitGap(unitsA.slice(i), unitsB.slice(j), out);\r\n return detectMoves(out);\r\n}\r\n\r\n/**\r\n * Pair the survivors in a gap: positionally, as a `candidate` modification when\r\n * the two units share a token, otherwise as a separate deletion and insertion.\r\n * Any leftover units are one-sided deletions (A) or insertions (B).\r\n */\r\nfunction emitGap(gapA: readonly Unit[], gapB: readonly Unit[], out: AlignedPair[]): void {\r\n const paired = Math.min(gapA.length, gapB.length);\r\n let k = 0;\r\n for (; k < paired; k++) {\r\n const a = gapA[k]!;\r\n const b = gapB[k]!;\r\n if (sharesToken(a, b)) {\r\n out.push({ tag: \"candidate\", a, b });\r\n } else {\r\n out.push({ tag: \"candidate\", a, b: null });\r\n out.push({ tag: \"candidate\", a: null, b });\r\n }\r\n }\r\n for (; k < gapA.length; k++) out.push({ tag: \"candidate\", a: gapA[k]!, b: null });\r\n for (; k < gapB.length; k++) out.push({ tag: \"candidate\", a: null, b: gapB[k]! 
});\r\n}\r\n\r\n/**\r\n * Re-pair content-identical deletion/insertion survivors into `move`s (ADR-0010).\r\n * A deletion is matched to an insertion with the same normalized key; the move\r\n * keeps the deletion's old position (`a`) and the insertion's new position (`b`).\r\n * Unmatched insertions/deletions are left as-is.\r\n *\r\n * Matching is content-only and 1:1 — it weighs neither distance nor document\r\n * structure (ADR-0010). Two consequences follow from keying on normalized text:\r\n * when several insertions share a key the LAST one wins (the map overwrites), and\r\n * if the same text genuinely appears as an unrelated deletion AND insertion they\r\n * collapse into one `move`. Acceptable at sentence/clause granularity, where\r\n * identical-content survivors are overwhelmingly true relocations.\r\n */\r\nfunction detectMoves(pairs: readonly AlignedPair[]): readonly AlignedPair[] {\r\n const insertionByKey = new Map<string, number>();\r\n pairs.forEach((pair, index) => {\r\n if (pair.tag === \"candidate\" && pair.a === null) {\r\n insertionByKey.set(normalize(pair.b!), index);\r\n }\r\n });\r\n\r\n const moveTo = new Map<number, number>();\r\n pairs.forEach((pair, index) => {\r\n if (pair.tag === \"candidate\" && pair.b === null) {\r\n const key = normalize(pair.a!);\r\n const insertionIndex = insertionByKey.get(key);\r\n if (insertionIndex !== undefined) {\r\n insertionByKey.delete(key);\r\n moveTo.set(index, insertionIndex);\r\n }\r\n }\r\n });\r\n\r\n if (moveTo.size === 0) return pairs;\r\n\r\n const movedInsertions = new Set(moveTo.values());\r\n return pairs.flatMap((pair, index) => {\r\n if (movedInsertions.has(index)) return [];\r\n const insertionIndex = moveTo.get(index);\r\n return insertionIndex === undefined ? [pair] : [{ tag: \"move\" as const, a: pair.a, b: pairs[insertionIndex]!.b }];\r\n });\r\n}\r\n\r\n/** Normalized match key: lower-cased, enumerator-stripped, punctuation-free. 
*/\r\nfunction normalize(unit: Unit): string {\r\n return unit.text\r\n .toLowerCase()\r\n .replace(LEADING_ENUMERATOR, \"\")\r\n .replace(PUNCTUATION, \" \")\r\n .replace(/\\s+/g, \" \")\r\n .trim();\r\n}\r\n\r\n/** Whether two units share at least one normalized token (a weak similarity gate). */\r\nfunction sharesToken(a: Unit, b: Unit): boolean {\r\n const tokensB = new Set(tokenize(b));\r\n return tokenize(a).some((token) => tokensB.has(token));\r\n}\r\n\r\nfunction tokenize(unit: Unit): string[] {\r\n const normalized = normalize(unit);\r\n return normalized.length === 0 ? [] : normalized.split(\" \");\r\n}\r\n\r\n/**\r\n * Longest-common-subsequence match over two key sequences, returned as ordered\r\n * `[indexInA, indexInB]` pairs of equal keys.\r\n */\r\nfunction lcsMatches(a: readonly string[], b: readonly string[]): Array<readonly [number, number]> {\r\n const n = a.length;\r\n const m = b.length;\r\n const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));\r\n for (let i = n - 1; i >= 0; i--) {\r\n for (let j = m - 1; j >= 0; j--) {\r\n dp[i]![j] = a[i] === b[j] ? dp[i + 1]![j + 1]! + 1 : Math.max(dp[i + 1]![j]!, dp[i]![j + 1]!);\r\n }\r\n }\r\n\r\n const matches: Array<readonly [number, number]> = [];\r\n let i = 0;\r\n let j = 0;\r\n while (i < n && j < m) {\r\n if (a[i] === b[j]) {\r\n matches.push([i, j]);\r\n i++;\r\n j++;\r\n } else if (dp[i + 1]![j]! >= dp[i]![j + 1]!) {\r\n i++;\r\n } else {\r\n j++;\r\n }\r\n }\r\n return matches;\r\n}\r\n","/**\n * Stage 3 — classify (ADR-0003, ADR-0004). The gated, structured LLM step.\n *\n * The caller passes only genuine `candidate` pairs — align has already kept\n * unchanged, trivial-change, and move content away from the model. 
A candidate\n * may be a modification (both sides present) or a one-sided insertion/deletion\n * (ADR-0011); for each, this stage asks the injected `Classifier` for a verdict,\n * validates it against the schema, retries once on a malformed response or a\n * provider error, and finally degrades to a flagged `needs-review` change —\n * never dropping a pair and never fabricating a verdict (ADR-0004). The provider\n * stays injected; this module imports no SDK, so the engine has no LLM-infra\n * dependency.\n *\n * Caching (ADR-0004's content-addressed cache) belongs with the provider\n * implementation, not this stage, and is out of scope here.\n */\nimport type { Change } from \"../schema.ts\";\nimport { needsReviewVerdict, type CandidatePair, type Classifier, type ClassifierVerdict } from \"../classifier.ts\";\n\n/** Attempts per pair: one initial call plus one retry (ADR-0004). */\nconst MAX_ATTEMPTS = 2;\n\n/** Verdicts below this confidence are flagged for review (ADR-0006). */\nconst MIN_TRUSTED_CONFIDENCE = 0.5;\n\n/**\n * Classify changed candidate pairs into `Change`s using the injected classifier.\n * Order is preserved; each change carries the candidate's type and spans untouched.\n */\nexport async function classify(\n candidates: readonly CandidatePair[],\n classifier: Classifier,\n): Promise<readonly Change[]> {\n const changes: Change[] = [];\n for (const pair of candidates) {\n changes.push(await classifyPair(pair, classifier));\n }\n return changes;\n}\n\nasync function classifyPair(pair: CandidatePair, classifier: Classifier): Promise<Change> {\n for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {\n let verdict: unknown;\n try {\n verdict = await classifier.classify(pair);\n } catch {\n continue; // provider error / timeout / rate limit — retry, then needs-review\n }\n if (isValidVerdict(verdict)) {\n return toChange(pair, verdict);\n }\n }\n return needsReviewChange(pair);\n}\n\n/** Runtime guard: the model response is untrusted until 
validated (ADR-0004). */\nfunction isValidVerdict(value: unknown): value is ClassifierVerdict {\n if (typeof value !== \"object\" || value === null) return false;\n const v = value as Record<string, unknown>;\n if (v.classification !== \"substantive\" && v.classification !== \"cosmetic\") return false;\n if (typeof v.confidence !== \"number\" || !Number.isFinite(v.confidence)) return false;\n if (v.confidence < 0 || v.confidence > 1) return false;\n if (v.description !== undefined && typeof v.description !== \"string\") return false;\n return true;\n}\n\nfunction toChange(pair: CandidatePair, verdict: ClassifierVerdict): Change {\n const base = {\n type: pair.type,\n classification: verdict.classification,\n spanA: pair.spanA,\n spanB: pair.spanB,\n confidence: verdict.confidence,\n needsReview: verdict.confidence < MIN_TRUSTED_CONFIDENCE,\n };\n return verdict.description === undefined ? base : { ...base, description: verdict.description };\n}\n\nfunction needsReviewChange(pair: CandidatePair): Change {\n const { classification, confidence } = needsReviewVerdict();\n return {\n type: pair.type,\n classification,\n spanA: pair.spanA,\n spanB: pair.spanB,\n confidence,\n needsReview: true,\n };\n}\n","/**\r\n * Content-addressed verdict cache (ADR-0004). 
Identical classification inputs\r\n * return the cached verdict without a second model call — the primary\r\n * determinism and cost guarantee.\r\n *\r\n * The key is a hash of the normalized pair text plus the prompt version and\r\n * model id, so the same change under the same model/prompt is classified once.\r\n * Spans are deliberately NOT part of the key: the verdict (substantive/cosmetic\r\n * + description + confidence) describes the content change, not where it sits,\r\n * so a pair classified once applies wherever that text appears.\r\n *\r\n * The default cache is in-memory and process-local; inject a `VerdictCache` to\r\n * back it with a persistent store — the engine keeps no backend of its own\r\n * (ADR-0001). Reuse the wrapped classifier across `diff` calls to share it.\r\n */\r\nimport { createHash } from \"node:crypto\";\r\nimport type { CandidatePair, Classifier, ClassifierVerdict } from \"./classifier.ts\";\r\n\r\n/**\r\n * Field separator for the cache key. Normalization collapses whitespace and\r\n * never emits a NUL byte, so distinct field boundaries can never collide.\r\n */\r\nconst FIELD_SEPARATOR = String.fromCharCode(0);\r\n\r\n/** A store for classification verdicts, keyed by content hash. May be async. */\r\nexport interface VerdictCache {\r\n get(key: string): Promise<ClassifierVerdict | undefined>;\r\n set(key: string, verdict: ClassifierVerdict): Promise<void>;\r\n}\r\n\r\n/** An in-memory `VerdictCache` (process-local; the default). */\r\nexport function createMemoryCache(): VerdictCache {\r\n const store = new Map<string, ClassifierVerdict>();\r\n return {\r\n get: (key) => Promise.resolve(store.get(key)),\r\n set: (key, verdict) => {\r\n store.set(key, verdict);\r\n return Promise.resolve();\r\n },\r\n };\r\n}\r\n\r\n/** Options for `withCache`. `modelId` and `promptVersion` are part of the key. 
*/\r\nexport interface CacheOptions {\r\n readonly modelId: string;\r\n readonly promptVersion: string;\r\n readonly cache?: VerdictCache;\r\n}\r\n\r\n/**\r\n * Wrap a `Classifier` so identical inputs are classified once. Reuse the\r\n * returned classifier across `diff` calls to share the cache.\r\n */\r\nexport function withCache(classifier: Classifier, options: CacheOptions): Classifier {\r\n const cache = options.cache ?? createMemoryCache();\r\n return {\r\n classify: async (pair: CandidatePair): Promise<ClassifierVerdict> => {\r\n const key = cacheKey(pair, options.modelId, options.promptVersion);\r\n const cached = await cache.get(key);\r\n if (cached !== undefined) return cached;\r\n const verdict = await classifier.classify(pair);\r\n await cache.set(key, verdict);\r\n return verdict;\r\n },\r\n };\r\n}\r\n\r\n/** Content-addressed key: a hash of (normalized a, normalized b, prompt, model). */\r\nexport function cacheKey(pair: CandidatePair, modelId: string, promptVersion: string): string {\r\n const parts = [normalize(pair.a), normalize(pair.b), promptVersion, modelId];\r\n return createHash(\"sha256\").update(parts.join(FIELD_SEPARATOR)).digest(\"hex\");\r\n}\r\n\r\nfunction normalize(text: string): string {\r\n return text.replace(/\\s+/g, \" \").trim();\r\n}\r\n","/**\n * semdiff — meaning-aware diff engine (library entry point).\n *\n * The library is the source of truth (ADR-0002); the CLI is a thin wrapper.\n * `diff` runs the segment -> align -> classify pipeline (ADR-0003) and assembles\n * the versioned `StructuredDiff` (ADR-0006), the engine's public contract.\n */\nexport * from \"./schema.ts\";\nexport * from \"./classifier.ts\";\n\nimport { SCHEMA_VERSION, type Change, type DiffSummary, type Provenance, type StructuredDiff } from \"./schema.ts\";\nimport { DEFAULT_MODEL_ID, type CandidatePair, type Classifier } from \"./classifier.ts\";\nimport { createDefaultClassifier } from \"./classifiers/claude.ts\";\nimport { ENGINE_VERSION, 
DEFAULT_PROMPT_VERSION } from \"./version.ts\";\nimport { segment, type SegmentGranularity, type Unit } from \"./pipeline/segment.ts\";\nimport { align } from \"./pipeline/align.ts\";\nimport { classify } from \"./pipeline/classify.ts\";\n\nexport { ENGINE_VERSION, DEFAULT_PROMPT_VERSION };\nexport { createDefaultClassifier, type DefaultClassifierConfig } from \"./classifiers/claude.ts\";\nexport { withCache, createMemoryCache, cacheKey, type VerdictCache, type CacheOptions } from \"./cache.ts\";\n\n/** Options for a `diff` run. Omit a field to take its default. */\nexport interface DiffOptions {\n /**\n * Provider used to classify changed pairs. Defaults to the latest capable\n * Claude model via `createDefaultClassifier` (ADR-0004) — which is only\n * constructed when there is a change to classify (a modification, insertion,\n * or deletion). Identical, cosmetic, and moved content needs no provider.\n */\n readonly classifier?: Classifier;\n /** Model id stamped into provenance; also passed to the default classifier. */\n readonly modelId?: string;\n /** Prompt-template version stamped into provenance. */\n readonly promptVersion?: string;\n /** Granularity at which inputs are segmented (ADR-0003). */\n readonly segmentGranularity?: SegmentGranularity;\n}\n\n/**\n * Produce a meaning-aware structured diff of two inputs. Runs\n * segment -> align -> classify, stamps run provenance, and assembles the\n * `StructuredDiff`. A classifier is constructed and called only when there is at\n * least one change to classify (a modification, insertion, or deletion), so\n * diffs of identical, cosmetic, or merely relocated content need no provider.\n */\nexport async function diff(a: string, b: string, options?: DiffOptions): Promise<StructuredDiff> {\n const granularity = options?.segmentGranularity ?? 
\"sentence\";\n const pairs = align(segment(a, granularity), segment(b, granularity));\n\n const candidates: CandidatePair[] = [];\n for (const pair of pairs) {\n if (pair.tag !== \"candidate\") continue;\n if (pair.a !== null && pair.b !== null) {\n candidates.push({ type: \"modification\", a: pair.a.text, b: pair.b.text, spanA: pair.a.span, spanB: pair.b.span });\n } else if (pair.b !== null) {\n candidates.push({ type: \"insertion\", a: \"\", b: pair.b.text, spanA: null, spanB: pair.b.span });\n } else {\n candidates.push({ type: \"deletion\", a: pair.a!.text, b: \"\", spanA: pair.a!.span, spanB: null });\n }\n }\n\n const modelId = options?.modelId ?? DEFAULT_MODEL_ID;\n const classified =\n candidates.length === 0\n ? []\n : await classify(candidates, options?.classifier ?? createDefaultClassifier({ modelId }));\n\n const changes: Change[] = [];\n let classifiedIndex = 0;\n for (const pair of pairs) {\n if (pair.tag === \"unchanged\") continue;\n if (pair.tag === \"trivial-change\") {\n changes.push(cosmeticModification(pair.a!, pair.b!));\n } else if (pair.tag === \"move\") {\n changes.push(moveChange(pair.a!, pair.b!));\n } else {\n changes.push(classified[classifiedIndex]!);\n classifiedIndex += 1;\n }\n }\n\n const provenance: Provenance = {\n modelId,\n promptVersion: options?.promptVersion ?? DEFAULT_PROMPT_VERSION,\n engineVersion: ENGINE_VERSION,\n };\n return { schemaVersion: SCHEMA_VERSION, provenance, changes, summary: summarize(changes) };\n}\n\n/** A cosmetic edit to a matched unit — determined deterministically, no model. */\nfunction cosmeticModification(a: Unit, b: Unit): Change {\n return { type: \"modification\", classification: \"cosmetic\", spanA: a.span, spanB: b.span, confidence: 1, needsReview: false };\n}\n\n/**\n * A relocation of identical content (ADR-0010) — deterministic and cosmetic: the\n * text did not change, only its position, so it is surfaced as a `move` rather\n * than a delete + insert. 
`a` is the old span, `b` the new one.\n */\nfunction moveChange(a: Unit, b: Unit): Change {\n return { type: \"move\", classification: \"cosmetic\", spanA: a.span, spanB: b.span, confidence: 1, needsReview: false };\n}\n\nfunction summarize(changes: readonly Change[]): DiffSummary {\n const byType = { insertion: 0, deletion: 0, modification: 0, move: 0 };\n let substantive = 0;\n let cosmetic = 0;\n let needsReview = 0;\n for (const change of changes) {\n byType[change.type] += 1;\n if (change.classification === \"substantive\") substantive += 1;\n else cosmetic += 1;\n if (change.needsReview) needsReview += 1;\n }\n return { substantive, cosmetic, byType, needsReview };\n}\n"],"mappings":";AAiBO,IAAM,iBAAiB;;;ACuCvB,IAAM,mBAAmB;AAUzB,SAAS,qBAAwC;AACtD,SAAO,EAAE,gBAAgB,eAAe,YAAY,EAAE;AACxD;;;AC9CA,IAAM,eAAe;AACrB,IAAM,oBAAoB;AAC1B,IAAM,aAAa;AAGnB,IAAM,qBAAqB;AAE3B,IAAM,sBAAsB;AAE5B,IAAM,sBAAsB;AAC5B,IAAM,qBAAqB;AAWpB,IAAM,gBAAgB;AAAA,EAC3B;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF,EAAE,KAAK,IAAI;AAGX,IAAM,iBAAiB;AAAA,EACrB,MAAM;AAAA,EACN,YAAY;AAAA,IACV,gBAAgB,EAAE,MAAM,UAAU,MAAM,CAAC,eAAe,UAAU,EAAE;AAAA,IACpE,aAAa,EAAE,MAAM,SAAS;AAAA,IAC9B,YAAY,EAAE,MAAM,SAAS;AAAA,EAC/B;AAAA,EACA,UAAU,CAAC,kBAAkB,YAAY;AAAA,EACzC,sBAAsB;AACxB;AAmBO,SAAS,wBAAwB,QAA6C;AACnF,QAAM,UAAU,OAAO,WAAW;AAClC,QAAM,SAAS,OAAO,UAAU,QAAQ,IAAI;AAC5C,QAAM,YAAY,OAAO,aAAa;AACtC,QAAM,aAAa,OAAO,cAAc;AACxC,MAAI,WAAW,UAAa,WAAW,IAAI;AACzC,UAAM,IAAI,MAAM,mFAAmF;AAAA,EACrG;AAEA,SAAO;AAAA,IACL,UAAU,CAAC,SAAoD;AAC7D,YAAM,OAAoB;AAAA,QACxB,QAAQ;AAAA,QACR,SAAS;AAAA,UACP,gBAAgB;AAAA,UAChB,aAAa;AAAA,UACb,qBAAqB;AAAA,QACvB;AAAA,QACA,MAAM,KAAK,UAAU,aAAa,SAAS,IAAI,CAAC;AAAA,MAClD;AACA,aAAO,kBAAkB,MAAM,aAAa,MAAM,SAAS,GAAG,UAAU;AAAA,IAC1E;AAAA,EACF;AACF;AAOA,IAAM,iBAAN,cAA6B,MAAM;AAAA;AAAA;AAAA;AAAA,EAIxB;AAAA,EACT,YAAY,SAAiBA,eAAsB;AACjD,UAAM,OAAO;AACb,SAAK,eAA
eA;AAAA,EACtB;AACF;AAGA,eAAe,kBACb,SACA,YAC4B;AAC5B,WAAS,QAAQ,KAAK,SAAS,GAAG;AAChC,QAAI;AACF,aAAO,MAAM,QAAQ;AAAA,IACvB,SAAS,OAAO;AACd,UAAI,EAAE,iBAAiB,mBAAmB,SAAS,WAAY,OAAM;AACrE,YAAM,MAAM,UAAU,OAAO,MAAM,YAAY,CAAC;AAAA,IAClD;AAAA,EACF;AACF;AAGA,eAAe,aAAa,MAAmB,WAA+C;AAC5F,MAAI;AACJ,MAAI;AACF,eAAW,MAAM,iBAAiB,cAAc,MAAM,SAAS;AAAA,EACjE,SAAS,OAAO;AAEd,UAAM,IAAI,eAAe,iCAAkC,MAAgB,OAAO,IAAI,CAAC;AAAA,EACzF;AACA,MAAI,CAAC,SAAS,IAAI;AAChB,UAAM,UAAU,uBAAuB,SAAS,MAAM,KAAK,MAAM,SAAS,KAAK,CAAC;AAChF,QAAI,SAAS,WAAW,OAAO,SAAS,UAAU,KAAK;AACrD,YAAM,IAAI,eAAe,SAAS,aAAa,SAAS,OAAO,CAAC;AAAA,IAClE;AACA,UAAM,IAAI,MAAM,OAAO;AAAA,EACzB;AACA,SAAO,aAAa,MAAM,SAAS,KAAK,CAAC;AAC3C;AAGA,eAAe,iBAAiB,KAAa,MAAmB,WAAsC;AACpG,QAAM,aAAa,IAAI,gBAAgB;AACvC,QAAM,QAAQ,WAAW,MAAM,WAAW,MAAM,GAAG,SAAS;AAC5D,MAAI;AACF,WAAO,MAAM,MAAM,KAAK,EAAE,GAAG,MAAM,QAAQ,WAAW,OAAO,CAAC;AAAA,EAChE,UAAE;AACA,iBAAa,KAAK;AAAA,EACpB;AACF;AAGA,SAAS,UAAU,OAAeA,eAA8B;AAC9D,MAAIA,gBAAe,EAAG,QAAO,KAAK,IAAIA,eAAc,kBAAkB;AACtE,QAAM,cAAc,sBAAsB,KAAK;AAC/C,SAAO,KAAK,IAAI,cAAc,cAAc,OAAO,KAAK,OAAO,GAAG,kBAAkB;AACtF;AAGA,SAAS,aAAa,SAA0B;AAC9C,QAAM,UAAU,OAAO,QAAQ,IAAI,aAAa,CAAC;AACjD,SAAO,UAAU,IAAI,UAAU,MAAO;AACxC;AAEA,SAAS,MAAM,IAA2B;AACxC,SAAO,IAAI,QAAQ,CAAC,YAAY;AAC9B,eAAW,SAAS,EAAE;AAAA,EACxB,CAAC;AACH;AAGA,SAAS,aAAa,SAAiB,MAA8B;AACnE,QAAM,eAAwC;AAAA,IAC5C,QAAQ,EAAE,MAAM,eAAe,QAAQ,eAAe;AAAA,EACxD;AAKA,MAAI,oBAAoB,OAAO,GAAG;AAChC,iBAAa,SAAS;AAAA,EACxB;AACA,SAAO;AAAA,IACL,OAAO;AAAA,IACP,YAAY;AAAA,IACZ,eAAe;AAAA,IACf,QAAQ,CAAC,EAAE,MAAM,QAAQ,MAAM,eAAe,eAAe,EAAE,MAAM,YAAY,EAAE,CAAC;AAAA,IACpF,UAAU,CAAC,EAAE,MAAM,QAAQ,SAAS,gBAAgB,KAAK,IAAI;AAAA;AAAA,EAAU,KAAK,CAAC;AAAA;AAAA;AAAA,EAAW,KAAK,CAAC,GAAG,CAAC;AAAA,EACpG;AACF;AAQA,SAAS,oBAAoB,SAA0B;AACrD,SAAO,QAAQ,WAAW,cAAc,KAAK,QAAQ,WAAW,mBAAmB;AACrF;AAGA,SAAS,aAAa,MAAkC;AACtD,QAAM,UAAU;AAChB,QAAM,OAAO,QAAQ,SAAS,KAAK,CAAC,UAAU,MAAM,SAAS,MAAM,GAAG;AACtE,MAAI,SAAS,QAAW;AACtB,UAAM,IAAI,MAAM,wCAAwC;AAAA,EAC1D;AACA,SAAO,KAAK,MAAM,IAAI;AACxB;;;ACrOO,IAAM,iBAAiB;AAGvB,IAAM,yBAAyB;;;ACgBtC,IAAM,qBAAqB,IA
AI,KAAK,UAAU,MAAM,EAAE,aAAa,WAAW,CAAC;AAO/E,IAAM,oBAAoB;AAMnB,SAAS,QAAQ,MAAc,aAAkD;AACtF,QAAM,QAAgB,CAAC;AACvB,QAAM,aAAa,gBAAgB,WAAW,oBAAoB;AAClE,aAAW,EAAE,SAAS,UAAU,MAAM,KAAK,mBAAmB,QAAQ,IAAI,GAAG;AAC3E,cAAU,UAAU,OAAO,YAAY,KAAK;AAAA,EAC9C;AACA,SAAO;AACT;AAOA,SAAS,UAAU,OAAe,MAAc,YAAoB,KAAmB;AACrF,MAAI,WAAW,WAAW,GAAG;AAC3B,gBAAY,OAAO,MAAM,GAAG;AAC5B;AAAA,EACF;AACA,MAAI,SAAS;AACb,WAAS,IAAI,GAAG,IAAI,MAAM,QAAQ,KAAK;AACrC,QAAI,WAAW,SAAS,MAAM,CAAC,CAAE,GAAG;AAClC,kBAAY,MAAM,MAAM,QAAQ,CAAC,GAAG,OAAO,QAAQ,GAAG;AACtD,eAAS,IAAI;AAAA,IACf;AAAA,EACF;AACA,cAAY,MAAM,MAAM,MAAM,GAAG,OAAO,QAAQ,GAAG;AACrD;AAOA,SAAS,YAAY,MAAc,MAAc,KAAmB;AAClE,QAAM,UAAU,KAAK,KAAK;AAC1B,MAAI,QAAQ,WAAW,EAAG;AAC1B,QAAM,QAAQ,QAAQ,KAAK,SAAS,KAAK,UAAU,EAAE;AACrD,MAAI,KAAK,EAAE,MAAM,SAAS,MAAM,EAAE,OAAO,KAAK,QAAQ,QAAQ,OAAO,EAAE,CAAC;AAC1E;;;AC1CA,IAAM,qBAAqB;AAQ3B,IAAM,cAAc,WAAC,UAAM,IAAE;AAMtB,SAAS,MAAM,QAAyB,QAAiD;AAC9F,QAAM,QAAQ,OAAO,IAAI,SAAS;AAClC,QAAM,QAAQ,OAAO,IAAI,SAAS;AAClC,QAAM,UAAU,WAAW,OAAO,KAAK;AAEvC,QAAM,MAAqB,CAAC;AAC5B,MAAI,IAAI;AACR,MAAI,IAAI;AACR,aAAW,CAAC,IAAI,EAAE,KAAK,SAAS;AAC9B,YAAQ,OAAO,MAAM,GAAG,EAAE,GAAG,OAAO,MAAM,GAAG,EAAE,GAAG,GAAG;AACrD,UAAM,IAAI,OAAO,EAAE;AACnB,UAAM,IAAI,OAAO,EAAE;AACnB,QAAI,KAAK,EAAE,KAAK,EAAE,SAAS,EAAE,OAAO,cAAc,kBAAkB,GAAG,EAAE,CAAC;AAC1E,QAAI,KAAK;AACT,QAAI,KAAK;AAAA,EACX;AACA,UAAQ,OAAO,MAAM,CAAC,GAAG,OAAO,MAAM,CAAC,GAAG,GAAG;AAC7C,SAAO,YAAY,GAAG;AACxB;AAOA,SAAS,QAAQ,MAAuB,MAAuB,KAA0B;AACvF,QAAM,SAAS,KAAK,IAAI,KAAK,QAAQ,KAAK,MAAM;AAChD,MAAI,IAAI;AACR,SAAO,IAAI,QAAQ,KAAK;AACtB,UAAM,IAAI,KAAK,CAAC;AAChB,UAAM,IAAI,KAAK,CAAC;AAChB,QAAI,YAAY,GAAG,CAAC,GAAG;AACrB,UAAI,KAAK,EAAE,KAAK,aAAa,GAAG,EAAE,CAAC;AAAA,IACrC,OAAO;AACL,UAAI,KAAK,EAAE,KAAK,aAAa,GAAG,GAAG,KAAK,CAAC;AACzC,UAAI,KAAK,EAAE,KAAK,aAAa,GAAG,MAAM,EAAE,CAAC;AAAA,IAC3C;AAAA,EACF;AACA,SAAO,IAAI,KAAK,QAAQ,IAAK,KAAI,KAAK,EAAE,KAAK,aAAa,GAAG,KAAK,CAAC,GAAI,GAAG,KAAK,CAAC;AAChF,SAAO,IAAI,KAAK,QAAQ,IAAK,KAAI,KAAK,EAAE,KAAK,aAAa,GAAG,MAAM,GAAG,KAAK,CAAC,EAAG,CAAC;AAClF;AAeA,SAAS,YAAY,OAAuD;AAC1E,QAAM,iBAAiB,oBAAI,
IAAoB;AAC/C,QAAM,QAAQ,CAAC,MAAM,UAAU;AAC7B,QAAI,KAAK,QAAQ,eAAe,KAAK,MAAM,MAAM;AAC/C,qBAAe,IAAI,UAAU,KAAK,CAAE,GAAG,KAAK;AAAA,IAC9C;AAAA,EACF,CAAC;AAED,QAAM,SAAS,oBAAI,IAAoB;AACvC,QAAM,QAAQ,CAAC,MAAM,UAAU;AAC7B,QAAI,KAAK,QAAQ,eAAe,KAAK,MAAM,MAAM;AAC/C,YAAM,MAAM,UAAU,KAAK,CAAE;AAC7B,YAAM,iBAAiB,eAAe,IAAI,GAAG;AAC7C,UAAI,mBAAmB,QAAW;AAChC,uBAAe,OAAO,GAAG;AACzB,eAAO,IAAI,OAAO,cAAc;AAAA,MAClC;AAAA,IACF;AAAA,EACF,CAAC;AAED,MAAI,OAAO,SAAS,EAAG,QAAO;AAE9B,QAAM,kBAAkB,IAAI,IAAI,OAAO,OAAO,CAAC;AAC/C,SAAO,MAAM,QAAQ,CAAC,MAAM,UAAU;AACpC,QAAI,gBAAgB,IAAI,KAAK,EAAG,QAAO,CAAC;AACxC,UAAM,iBAAiB,OAAO,IAAI,KAAK;AACvC,WAAO,mBAAmB,SAAY,CAAC,IAAI,IAAI,CAAC,EAAE,KAAK,QAAiB,GAAG,KAAK,GAAG,GAAG,MAAM,cAAc,EAAG,EAAE,CAAC;AAAA,EAClH,CAAC;AACH;AAGA,SAAS,UAAU,MAAoB;AACrC,SAAO,KAAK,KACT,YAAY,EACZ,QAAQ,oBAAoB,EAAE,EAC9B,QAAQ,aAAa,GAAG,EACxB,QAAQ,QAAQ,GAAG,EACnB,KAAK;AACV;AAGA,SAAS,YAAY,GAAS,GAAkB;AAC9C,QAAM,UAAU,IAAI,IAAI,SAAS,CAAC,CAAC;AACnC,SAAO,SAAS,CAAC,EAAE,KAAK,CAAC,UAAU,QAAQ,IAAI,KAAK,CAAC;AACvD;AAEA,SAAS,SAAS,MAAsB;AACtC,QAAM,aAAa,UAAU,IAAI;AACjC,SAAO,WAAW,WAAW,IAAI,CAAC,IAAI,WAAW,MAAM,GAAG;AAC5D;AAMA,SAAS,WAAW,GAAsB,GAAwD;AAChG,QAAM,IAAI,EAAE;AACZ,QAAM,IAAI,EAAE;AACZ,QAAM,KAAiB,MAAM,KAAK,EAAE,QAAQ,IAAI,EAAE,GAAG,MAAM,IAAI,MAAc,IAAI,CAAC,EAAE,KAAK,CAAC,CAAC;AAC3F,WAASC,KAAI,IAAI,GAAGA,MAAK,GAAGA,MAAK;AAC/B,aAASC,KAAI,IAAI,GAAGA,MAAK,GAAGA,MAAK;AAC/B,SAAGD,EAAC,EAAGC,EAAC,IAAI,EAAED,EAAC,MAAM,EAAEC,EAAC,IAAI,GAAGD,KAAI,CAAC,EAAGC,KAAI,CAAC,IAAK,IAAI,KAAK,IAAI,GAAGD,KAAI,CAAC,EAAGC,EAAC,GAAI,GAAGD,EAAC,EAAGC,KAAI,CAAC,CAAE;AAAA,IAC9F;AAAA,EACF;AAEA,QAAM,UAA4C,CAAC;AACnD,MAAI,IAAI;AACR,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,IAAI,GAAG;AACrB,QAAI,EAAE,CAAC,MAAM,EAAE,CAAC,GAAG;AACjB,cAAQ,KAAK,CAAC,GAAG,CAAC,CAAC;AACnB;AACA;AAAA,IACF,WAAW,GAAG,IAAI,CAAC,EAAG,CAAC,KAAM,GAAG,CAAC,EAAG,IAAI,CAAC,GAAI;AAC3C;AAAA,IACF,OAAO;AACL;AAAA,IACF;AAAA,EACF;AACA,SAAO;AACT;;;ACzKA,IAAM,eAAe;AAGrB,IAAM,yBAAyB;AAM/B,eAAsB,SACpB,YACA,YAC4B;AAC5B,QAAM,UAAoB,CAAC;AAC3B,aAAW,QAAQ,YAAY;AAC7B,YAAQ,KAAK,MAAM,aAAa,MAAM,UAAU,CAAC;
AAAA,EACnD;AACA,SAAO;AACT;AAEA,eAAe,aAAa,MAAqB,YAAyC;AACxF,WAAS,UAAU,GAAG,UAAU,cAAc,WAAW,GAAG;AAC1D,QAAI;AACJ,QAAI;AACF,gBAAU,MAAM,WAAW,SAAS,IAAI;AAAA,IAC1C,QAAQ;AACN;AAAA,IACF;AACA,QAAI,eAAe,OAAO,GAAG;AAC3B,aAAO,SAAS,MAAM,OAAO;AAAA,IAC/B;AAAA,EACF;AACA,SAAO,kBAAkB,IAAI;AAC/B;AAGA,SAAS,eAAe,OAA4C;AAClE,MAAI,OAAO,UAAU,YAAY,UAAU,KAAM,QAAO;AACxD,QAAM,IAAI;AACV,MAAI,EAAE,mBAAmB,iBAAiB,EAAE,mBAAmB,WAAY,QAAO;AAClF,MAAI,OAAO,EAAE,eAAe,YAAY,CAAC,OAAO,SAAS,EAAE,UAAU,EAAG,QAAO;AAC/E,MAAI,EAAE,aAAa,KAAK,EAAE,aAAa,EAAG,QAAO;AACjD,MAAI,EAAE,gBAAgB,UAAa,OAAO,EAAE,gBAAgB,SAAU,QAAO;AAC7E,SAAO;AACT;AAEA,SAAS,SAAS,MAAqB,SAAoC;AACzE,QAAM,OAAO;AAAA,IACX,MAAM,KAAK;AAAA,IACX,gBAAgB,QAAQ;AAAA,IACxB,OAAO,KAAK;AAAA,IACZ,OAAO,KAAK;AAAA,IACZ,YAAY,QAAQ;AAAA,IACpB,aAAa,QAAQ,aAAa;AAAA,EACpC;AACA,SAAO,QAAQ,gBAAgB,SAAY,OAAO,EAAE,GAAG,MAAM,aAAa,QAAQ,YAAY;AAChG;AAEA,SAAS,kBAAkB,MAA6B;AACtD,QAAM,EAAE,gBAAgB,WAAW,IAAI,mBAAmB;AAC1D,SAAO;AAAA,IACL,MAAM,KAAK;AAAA,IACX;AAAA,IACA,OAAO,KAAK;AAAA,IACZ,OAAO,KAAK;AAAA,IACZ;AAAA,IACA,aAAa;AAAA,EACf;AACF;;;ACzEA,SAAS,kBAAkB;AAO3B,IAAM,kBAAkB,OAAO,aAAa,CAAC;AAStC,SAAS,oBAAkC;AAChD,QAAM,QAAQ,oBAAI,IAA+B;AACjD,SAAO;AAAA,IACL,KAAK,CAAC,QAAQ,QAAQ,QAAQ,MAAM,IAAI,GAAG,CAAC;AAAA,IAC5C,KAAK,CAAC,KAAK,YAAY;AACrB,YAAM,IAAI,KAAK,OAAO;AACtB,aAAO,QAAQ,QAAQ;AAAA,IACzB;AAAA,EACF;AACF;AAaO,SAAS,UAAU,YAAwB,SAAmC;AACnF,QAAM,QAAQ,QAAQ,SAAS,kBAAkB;AACjD,SAAO;AAAA,IACL,UAAU,OAAO,SAAoD;AACnE,YAAM,MAAM,SAAS,MAAM,QAAQ,SAAS,QAAQ,aAAa;AACjE,YAAM,SAAS,MAAM,MAAM,IAAI,GAAG;AAClC,UAAI,WAAW,OAAW,QAAO;AACjC,YAAM,UAAU,MAAM,WAAW,SAAS,IAAI;AAC9C,YAAM,MAAM,IAAI,KAAK,OAAO;AAC5B,aAAO;AAAA,IACT;AAAA,EACF;AACF;AAGO,SAAS,SAAS,MAAqB,SAAiB,eAA+B;AAC5F,QAAM,QAAQ,CAACC,WAAU,KAAK,CAAC,GAAGA,WAAU,KAAK,CAAC,GAAG,eAAe,OAAO;AAC3E,SAAO,WAAW,QAAQ,EAAE,OAAO,MAAM,KAAK,eAAe,CAAC,EAAE,OAAO,KAAK;AAC9E;AAEA,SAASA,WAAU,MAAsB;AACvC,SAAO,KAAK,QAAQ,QAAQ,GAAG,EAAE,KAAK;AACxC;;;AC7BA,eAAsB,KAAK,GAAW,GAAW,SAAgD;AAC/F,QAAM,cAAc,SAAS,sBAAsB;AACnD,QAAM,QAAQ,MAAM,QAAQ,GAAG,WAAW,GAAG,QAAQ,GAAG,WAAW,CAAC;AAEpE,QAAM,aAA8B,CAAC;AACrC,
aAAW,QAAQ,OAAO;AACxB,QAAI,KAAK,QAAQ,YAAa;AAC9B,QAAI,KAAK,MAAM,QAAQ,KAAK,MAAM,MAAM;AACtC,iBAAW,KAAK,EAAE,MAAM,gBAAgB,GAAG,KAAK,EAAE,MAAM,GAAG,KAAK,EAAE,MAAM,OAAO,KAAK,EAAE,MAAM,OAAO,KAAK,EAAE,KAAK,CAAC;AAAA,IAClH,WAAW,KAAK,MAAM,MAAM;AAC1B,iBAAW,KAAK,EAAE,MAAM,aAAa,GAAG,IAAI,GAAG,KAAK,EAAE,MAAM,OAAO,MAAM,OAAO,KAAK,EAAE,KAAK,CAAC;AAAA,IAC/F,OAAO;AACL,iBAAW,KAAK,EAAE,MAAM,YAAY,GAAG,KAAK,EAAG,MAAM,GAAG,IAAI,OAAO,KAAK,EAAG,MAAM,OAAO,KAAK,CAAC;AAAA,IAChG;AAAA,EACF;AAEA,QAAM,UAAU,SAAS,WAAW;AACpC,QAAM,aACJ,WAAW,WAAW,IAClB,CAAC,IACD,MAAM,SAAS,YAAY,SAAS,cAAc,wBAAwB,EAAE,QAAQ,CAAC,CAAC;AAE5F,QAAM,UAAoB,CAAC;AAC3B,MAAI,kBAAkB;AACtB,aAAW,QAAQ,OAAO;AACxB,QAAI,KAAK,QAAQ,YAAa;AAC9B,QAAI,KAAK,QAAQ,kBAAkB;AACjC,cAAQ,KAAK,qBAAqB,KAAK,GAAI,KAAK,CAAE,CAAC;AAAA,IACrD,WAAW,KAAK,QAAQ,QAAQ;AAC9B,cAAQ,KAAK,WAAW,KAAK,GAAI,KAAK,CAAE,CAAC;AAAA,IAC3C,OAAO;AACL,cAAQ,KAAK,WAAW,eAAe,CAAE;AACzC,yBAAmB;AAAA,IACrB;AAAA,EACF;AAEA,QAAM,aAAyB;AAAA,IAC7B;AAAA,IACA,eAAe,SAAS,iBAAiB;AAAA,IACzC,eAAe;AAAA,EACjB;AACA,SAAO,EAAE,eAAe,gBAAgB,YAAY,SAAS,SAAS,UAAU,OAAO,EAAE;AAC3F;AAGA,SAAS,qBAAqB,GAAS,GAAiB;AACtD,SAAO,EAAE,MAAM,gBAAgB,gBAAgB,YAAY,OAAO,EAAE,MAAM,OAAO,EAAE,MAAM,YAAY,GAAG,aAAa,MAAM;AAC7H;AAOA,SAAS,WAAW,GAAS,GAAiB;AAC5C,SAAO,EAAE,MAAM,QAAQ,gBAAgB,YAAY,OAAO,EAAE,MAAM,OAAO,EAAE,MAAM,YAAY,GAAG,aAAa,MAAM;AACrH;AAEA,SAAS,UAAU,SAAyC;AAC1D,QAAM,SAAS,EAAE,WAAW,GAAG,UAAU,GAAG,cAAc,GAAG,MAAM,EAAE;AACrE,MAAI,cAAc;AAClB,MAAI,WAAW;AACf,MAAI,cAAc;AAClB,aAAW,UAAU,SAAS;AAC5B,WAAO,OAAO,IAAI,KAAK;AACvB,QAAI,OAAO,mBAAmB,cAAe,gBAAe;AAAA,QACvD,aAAY;AACjB,QAAI,OAAO,YAAa,gBAAe;AAAA,EACzC;AACA,SAAO,EAAE,aAAa,UAAU,QAAQ,YAAY;AACtD;","names":["retryAfterMs","i","j","normalize"]}
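
The `cache.ts` source embedded in the map above describes a content-addressed cache key: a SHA-256 hash of the normalized pair text plus the prompt version and model id, with spans deliberately left out. The following is a minimal, self-contained sketch of that idea, not the package's own module. `PairSketch`, `cacheKeySketch`, the sample sentences, and the `"model-x"` id are hypothetical names chosen for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, trimmed stand-in for the package's CandidatePair: the two text
// fields the key reads, plus one span to show that position is ignored.
interface PairSketch {
  readonly a: string;
  readonly b: string;
  readonly spanA: { readonly start: number; readonly end: number } | null;
}

// The source joins fields with a NUL character, on the stated basis that
// normalization never emits one.
const FIELD_SEPARATOR = String.fromCharCode(0);

// Collapse every run of whitespace to a single space and trim both ends.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// SHA-256 over (normalized a, normalized b, prompt version, model id).
function cacheKeySketch(pair: PairSketch, modelId: string, promptVersion: string): string {
  const parts = [normalize(pair.a), normalize(pair.b), promptVersion, modelId];
  return createHash("sha256").update(parts.join(FIELD_SEPARATOR)).digest("hex");
}

// Same change, different whitespace and a different position in the document.
const first: PairSketch = {
  a: "Fees are  due in 30 days.",
  b: "Fees are due in 60 days.",
  spanA: { start: 0, end: 25 },
};
const moved: PairSketch = {
  a: " Fees are due in 30 days.\n",
  b: "Fees are due in 60 days.",
  spanA: { start: 400, end: 426 },
};

const keyFirst = cacheKeySketch(first, "model-x", "0");
const keyMoved = cacheKeySketch(moved, "model-x", "0");
const keyOtherPrompt = cacheKeySketch(first, "model-x", "1");

console.log(keyFirst === keyMoved); // true: spans and whitespace do not affect the key
console.log(keyFirst === keyOtherPrompt); // false: the prompt version is part of the key
```

Because the key ignores spans, a verdict cached for one occurrence of a change is reused wherever the same text pair appears. Changing the prompt version or the model id produces a new key, so a fresh classification is requested.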

package/dist/cli.d.ts ADDED

@@ -0,0 +1,5 @@
+#!/usr/bin/env node
+/** CLI entry point. Returns the process exit code. */
+declare function main(argv: readonly string[]): Promise<number>;
+export { main };

package/dist/cli.js ADDED

@@ -0,0 +1,78 @@
+#!/usr/bin/env node
+import {
+  diff
+} from "./chunk-4GFNMJGB.js";
+// src/cli.ts
+import { readFile } from "fs/promises";
+import { pathToFileURL } from "url";
+var USAGE = [
+  "Usage: semdiff <fileA> <fileB> [--granularity sentence|clause]",
+  "",
+  "Prints a meaning-aware structured diff (JSON) of two text files.",
+  "Substantive changes are classified by the model (set ANTHROPIC_API_KEY)."
+].join("\n");
+function parseArgs(argv) {
+  const positionals = [];
+  let granularity;
+  for (let i = 0; i < argv.length; i += 1) {
+    const arg = argv[i];
+    if (arg === "--granularity") {
+      const value = argv[i + 1];
+      if (value !== "sentence" && value !== "clause") return null;
+      granularity = value;
+      i += 1;
+    } else {
+      positionals.push(arg);
+    }
+  }
+  if (positionals.length !== 2) return null;
+  const fileA = positionals[0];
+  const fileB = positionals[1];
+  return granularity === void 0 ? { fileA, fileB } : { fileA, fileB, granularity };
+}
+function messageOf(error) {
+  return error instanceof Error ? error.message : String(error);
+}
+async function main(argv) {
+  if (argv.includes("-h") || argv.includes("--help")) {
+    process.stdout.write(`${USAGE}
+`);
+    return 0;
+  }
+  const args = parseArgs(argv);
+  if (args === null) {
+    process.stderr.write(`${USAGE}
+`);
+    return 1;
+  }
+  let a;
+  let b;
+  try {
+    [a, b] = await Promise.all([readFile(args.fileA, "utf8"), readFile(args.fileB, "utf8")]);
+  } catch (error) {
+    process.stderr.write(`semdiff: cannot read input: ${messageOf(error)}
+`);
+    return 1;
+  }
+  try {
+    const options = args.granularity === void 0 ? {} : { segmentGranularity: args.granularity };
+    const result = await diff(a, b, options);
+    process.stdout.write(`${JSON.stringify(result, null, 2)}
+`);
+    return 0;
+  } catch (error) {
+    process.stderr.write(`semdiff: ${messageOf(error)}
+`);
+    return 1;
+  }
+}
+if (import.meta.url === pathToFileURL(process.argv[1] ?? "").href) {
+  void main(process.argv.slice(2)).then((code) => {
+    process.exitCode = code;
+  });
+}
+export {
+  main
+};
+//# sourceMappingURL=cli.js.map
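
The argument handling in `cli.js` above is compact, and a few of its edge cases are easy to miss. The sketch below re-states the same parsing rules in typed form and runs them on sample inputs. `parseArgsSketch` and the file names are hypothetical. It mirrors the logic shown in the diff but is not the shipped function.

```typescript
type SegmentGranularity = "sentence" | "clause";

interface CliArgs {
  readonly fileA: string;
  readonly fileB: string;
  readonly granularity?: SegmentGranularity;
}

// Returns the parsed arguments, or null when the arguments are invalid.
function parseArgsSketch(argv: readonly string[]): CliArgs | null {
  const positionals: string[] = [];
  let granularity: SegmentGranularity | undefined;
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i] as string;
    if (arg === "--granularity") {
      // The flag consumes the next token, which must be a known value.
      const value = argv[i + 1];
      if (value !== "sentence" && value !== "clause") return null;
      granularity = value;
      i += 1;
    } else {
      // Anything else, including an unrecognized flag, counts as a positional.
      positionals.push(arg);
    }
  }
  const [fileA, fileB] = positionals;
  if (positionals.length !== 2 || fileA === undefined || fileB === undefined) return null;
  return granularity === undefined ? { fileA, fileB } : { fileA, fileB, granularity };
}

const plain = parseArgsSketch(["a.txt", "b.txt"]);
const withFlag = parseArgsSketch(["a.txt", "--granularity", "clause", "b.txt"]);
const missingValue = parseArgsSketch(["a.txt", "b.txt", "--granularity"]);
const extraToken = parseArgsSketch(["a.txt", "b.txt", "--verbose"]);
const flagAsFile = parseArgsSketch(["--verbose", "b.txt"]);

console.log(plain); // { fileA: 'a.txt', fileB: 'b.txt' }
console.log(withFlag); // same two files plus granularity: 'clause'
console.log(missingValue); // null: the flag has no value
console.log(extraToken); // null: three positionals
console.log(flagAsFile); // parsed, with '--verbose' taken as fileA
```

The last case shows that an unknown flag is not rejected by itself. It is treated as a file path, so the failure surfaces later as a read error rather than as a usage message.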

package/dist/cli.js.map ADDED

@@ -0,0 +1 @@

+ {"version":3,"sources":["../src/cli.ts"],"sourcesContent":["#!/usr/bin/env node\r\n/**\r\n * `semdiff` CLI — a thin wrapper over the library (ADR-0002). It reads two\r\n * files, runs `diff`, and prints the `StructuredDiff` as JSON; the structured\r\n * diff is the source of truth (ADR-0006) and any rendering is a pure function of\r\n * it. No capability exists only in the CLI.\r\n *\r\n * Live diffs of substantive changes call the model, so set `ANTHROPIC_API_KEY`\r\n * (or inject a classifier from the library API). Identical, cosmetic, inserted,\r\n * or deleted content needs no model.\r\n *\r\n * NOTE (follow-up): wiring this `.ts` file as the package `bin` relies on Node\r\n * executing TypeScript; raw-`.ts` bin execution is not settled cross-platform\r\n * (notably Windows). Run via `node src/cli.ts ...` until that is resolved.\r\n */\r\nimport { readFile } from \"node:fs/promises\";\r\nimport { pathToFileURL } from \"node:url\";\r\nimport { diff, type DiffOptions } from \"./index.ts\";\r\nimport type { SegmentGranularity } from \"./pipeline/segment.ts\";\r\n\r\nconst USAGE = [\r\n \"Usage: semdiff <fileA> <fileB> [--granularity sentence|clause]\",\r\n \"\",\r\n \"Prints a meaning-aware structured diff (JSON) of two text files.\",\r\n \"Substantive changes are classified by the model (set ANTHROPIC_API_KEY).\",\r\n].join(\"\\n\");\r\n\r\ninterface CliArgs {\r\n readonly fileA: string;\r\n readonly fileB: string;\r\n readonly granularity?: SegmentGranularity;\r\n}\r\n\r\n/** Parse argv into file paths plus options, or `null` if the arguments are invalid. 
*/\r\nfunction parseArgs(argv: readonly string[]): CliArgs | null {\r\n const positionals: string[] = [];\r\n let granularity: SegmentGranularity | undefined;\r\n for (let i = 0; i < argv.length; i += 1) {\r\n const arg = argv[i]!;\r\n if (arg === \"--granularity\") {\r\n const value = argv[i + 1];\r\n if (value !== \"sentence\" && value !== \"clause\") return null;\r\n granularity = value;\r\n i += 1;\r\n } else {\r\n positionals.push(arg);\r\n }\r\n }\r\n if (positionals.length !== 2) return null;\r\n const fileA = positionals[0]!;\r\n const fileB = positionals[1]!;\r\n return granularity === undefined ? { fileA, fileB } : { fileA, fileB, granularity };\r\n}\r\n\r\nfunction messageOf(error: unknown): string {\r\n return error instanceof Error ? error.message : String(error);\r\n}\r\n\r\n/** CLI entry point. Returns the process exit code. */\r\nexport async function main(argv: readonly string[]): Promise<number> {\r\n if (argv.includes(\"-h\") || argv.includes(\"--help\")) {\r\n process.stdout.write(`${USAGE}\\n`);\r\n return 0;\r\n }\r\n const args = parseArgs(argv);\r\n if (args === null) {\r\n process.stderr.write(`${USAGE}\\n`);\r\n return 1;\r\n }\r\n\r\n let a: string;\r\n let b: string;\r\n try {\r\n [a, b] = await Promise.all([readFile(args.fileA, \"utf8\"), readFile(args.fileB, \"utf8\")]);\r\n } catch (error) {\r\n process.stderr.write(`semdiff: cannot read input: ${messageOf(error)}\\n`);\r\n return 1;\r\n }\r\n\r\n try {\r\n const options: DiffOptions = args.granularity === undefined ? {} : { segmentGranularity: args.granularity };\r\n const result = await diff(a, b, options);\r\n process.stdout.write(`${JSON.stringify(result, null, 2)}\\n`);\r\n return 0;\r\n } catch (error) {\r\n process.stderr.write(`semdiff: ${messageOf(error)}\\n`);\r\n return 1;\r\n }\r\n}\r\n\r\nif (import.meta.url === pathToFileURL(process.argv[1] ?? 
\"\").href) {\r\n void main(process.argv.slice(2)).then((code) => {\r\n process.exitCode = code;\r\n });\r\n}\r\n"],"mappings":";;;;;;AAeA,SAAS,gBAAgB;AACzB,SAAS,qBAAqB;AAI9B,IAAM,QAAQ;AAAA,EACZ;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF,EAAE,KAAK,IAAI;AASX,SAAS,UAAU,MAAyC;AAC1D,QAAM,cAAwB,CAAC;AAC/B,MAAI;AACJ,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK,GAAG;AACvC,UAAM,MAAM,KAAK,CAAC;AAClB,QAAI,QAAQ,iBAAiB;AAC3B,YAAM,QAAQ,KAAK,IAAI,CAAC;AACxB,UAAI,UAAU,cAAc,UAAU,SAAU,QAAO;AACvD,oBAAc;AACd,WAAK;AAAA,IACP,OAAO;AACL,kBAAY,KAAK,GAAG;AAAA,IACtB;AAAA,EACF;AACA,MAAI,YAAY,WAAW,EAAG,QAAO;AACrC,QAAM,QAAQ,YAAY,CAAC;AAC3B,QAAM,QAAQ,YAAY,CAAC;AAC3B,SAAO,gBAAgB,SAAY,EAAE,OAAO,MAAM,IAAI,EAAE,OAAO,OAAO,YAAY;AACpF;AAEA,SAAS,UAAU,OAAwB;AACzC,SAAO,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AAC9D;AAGA,eAAsB,KAAK,MAA0C;AACnE,MAAI,KAAK,SAAS,IAAI,KAAK,KAAK,SAAS,QAAQ,GAAG;AAClD,YAAQ,OAAO,MAAM,GAAG,KAAK;AAAA,CAAI;AACjC,WAAO;AAAA,EACT;AACA,QAAM,OAAO,UAAU,IAAI;AAC3B,MAAI,SAAS,MAAM;AACjB,YAAQ,OAAO,MAAM,GAAG,KAAK;AAAA,CAAI;AACjC,WAAO;AAAA,EACT;AAEA,MAAI;AACJ,MAAI;AACJ,MAAI;AACF,KAAC,GAAG,CAAC,IAAI,MAAM,QAAQ,IAAI,CAAC,SAAS,KAAK,OAAO,MAAM,GAAG,SAAS,KAAK,OAAO,MAAM,CAAC,CAAC;AAAA,EACzF,SAAS,OAAO;AACd,YAAQ,OAAO,MAAM,+BAA+B,UAAU,KAAK,CAAC;AAAA,CAAI;AACxE,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,UAAuB,KAAK,gBAAgB,SAAY,CAAC,IAAI,EAAE,oBAAoB,KAAK,YAAY;AAC1G,UAAM,SAAS,MAAM,KAAK,GAAG,GAAG,OAAO;AACvC,YAAQ,OAAO,MAAM,GAAG,KAAK,UAAU,QAAQ,MAAM,CAAC,CAAC;AAAA,CAAI;AAC3D,WAAO;AAAA,EACT,SAAS,OAAO;AACd,YAAQ,OAAO,MAAM,YAAY,UAAU,KAAK,CAAC;AAAA,CAAI;AACrD,WAAO;AAAA,EACT;AACF;AAEA,IAAI,YAAY,QAAQ,cAAc,QAAQ,KAAK,CAAC,KAAK,EAAE,EAAE,MAAM;AACjE,OAAK,KAAK,QAAQ,KAAK,MAAM,CAAC,CAAC,EAAE,KAAK,CAAC,SAAS;AAC9C,YAAQ,WAAW;AAAA,EACrB,CAAC;AACH;","names":[]}
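
The CLI source embedded above prints the `StructuredDiff` as JSON and leaves any rendering to the consumer. As a rough illustration of what a consumer might do with that output, the sketch below filters changes for review and resolves spans against the literal inputs. `SpanSketch`, `ChangeSketch`, `excerpt`, and `triage` are hypothetical names. The sample changes are hand-written to match the declared shape and are not real engine output.

```typescript
// Hypothetical local copies of the published Span and Change shapes.
interface SpanSketch {
  readonly start: number; // inclusive character offset into the literal input
  readonly end: number; // exclusive character offset into the literal input
}

interface ChangeSketch {
  readonly type: "insertion" | "deletion" | "modification" | "move";
  readonly classification: "substantive" | "cosmetic";
  readonly spanA: SpanSketch | null;
  readonly spanB: SpanSketch | null;
  readonly description?: string;
  readonly confidence: number;
  readonly needsReview: boolean;
}

// Resolve a half-open [start, end) span against the exact input string.
function excerpt(input: string, span: SpanSketch | null): string | null {
  return span === null ? null : input.slice(span.start, span.end);
}

// Keep changes a reviewer should look at: substantive ones and flagged ones.
function triage(changes: readonly ChangeSketch[]): ChangeSketch[] {
  return changes.filter((c) => c.classification === "substantive" || c.needsReview);
}

const inputA = "Fees are due in 30 days. Contact us.";
const inputB = "Fees are due in 60 days. Contact us!";

// Hand-written sample data for illustration only.
const sampleChanges: ChangeSketch[] = [
  {
    type: "modification",
    classification: "substantive",
    spanA: { start: 0, end: 24 },
    spanB: { start: 0, end: 24 },
    description: "Payment window changed from 30 to 60 days.",
    confidence: 0.9,
    needsReview: false,
  },
  {
    type: "modification",
    classification: "cosmetic",
    spanA: { start: 25, end: 36 },
    spanB: { start: 25, end: 36 },
    confidence: 1,
    needsReview: false,
  },
];

const toReview = triage(sampleChanges);
for (const change of toReview) {
  console.log(excerpt(inputA, change.spanA), "->", excerpt(inputB, change.spanB));
}
```

Slicing the original strings works here only because spans index the un-normalized input, which is the offset invariant the type declarations state. A consumer that trims or re-encodes the input before slicing would get misaligned excerpts.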

package/dist/index.d.ts ADDED

@@ -0,0 +1,276 @@
+/**
+ * The semdiff public contract (ADR-0006).
+ *
+ * A `StructuredDiff` is the engine's primary output. Every human-readable
+ * rendering is a pure function of it, and machine consumers — notably the
+ * downstream `sust-reg-reporter` application — integrate against these types
+ * and their JSON form. This module is pure types and constants: no logic, no
+ * imports, nothing domain-specific (ADR-0001).
+ */
+/**
+ * Version of the StructuredDiff contract. Additive-by-default (ADR-0006): a
+ * backwards-compatible addition keeps the version; a breaking shape change
+ * bumps it and gets its own ADR. `StructuredDiff.schemaVersion` is typed as a
+ * plain `string` (not this literal) so an additive bump is not itself a
+ * breaking type change for pinned consumers.
+ */
+declare const SCHEMA_VERSION = "1.0.0";
+/**
+ * A `Span` locates a change within ONE input by half-open `[start, end)`
+ * CHARACTER OFFSETS (ADR-0007).
+ *
+ * INVARIANT (load-bearing for consumer citation integrity): offsets index into
+ * the EXACT, LITERAL, UN-NORMALIZED input string the caller passed. For
+ * `sust-reg-reporter` that string is the immutable content-addressed snapshot
+ * text (its ADR-0004 citation integrity, ADR-0011 snapshot store), so the
+ * offsets resolve against a stored snapshot. Normalization applied internally
+ * for alignment (whitespace, casing, punctuation, numbering) MUST NOT shift the
+ * reported offsets. These `{ start, end }` map field-for-field onto the
+ * consumer's citation span (`@sust-reg/core` `SourceCitation.span`).
+ */
+interface Span {
+    /** Inclusive start character offset into the literal input. */
+    readonly start: number;
+    /** Exclusive end character offset into the literal input. */
+    readonly end: number;
+    /**
+     * Optional id of the segmentation unit this span falls in (ADR-0003).
+     * Additive metadata only — consumers anchor on `start`/`end`, never this.
+     */
+    readonly unitId?: string;
+}
+/** The kind of edit a change represents. */
+type ChangeType = "insertion" | "deletion" | "modification" | "move";
+/** Whether a change alters meaning (`substantive`) or not (`cosmetic`). */
+type Classification = "substantive" | "cosmetic";
+/** One classified change between input A and input B. */
+interface Change {
+    readonly type: ChangeType;
+    readonly classification: Classification;
+    /** Location in input A; `null` for a pure insertion (absent from A). */
+    readonly spanA: Span | null;
+    /** Location in input B; `null` for a pure deletion (absent from B). */
+    readonly spanB: Span | null;
+    /**
+     * Short description of what changed. Present only for substantive
+     * modifications; the key is OMITTED otherwise (never set to `undefined`,
+     * per `exactOptionalPropertyTypes`).
+     */
+    readonly description?: string;
+    /** Classifier confidence in `[0, 1]`. */
+    readonly confidence: number;
+    /** Set for low-confidence or failed/degraded classifications (ADR-0004). */
+    readonly needsReview: boolean;
+}
+/** The reproducibility stamp for a run (ADR-0004): identifies the model run. */
+interface Provenance {
+    readonly modelId: string;
+    readonly promptVersion: string;
+    readonly engineVersion: string;
+}
+/** Aggregate counts for quick triage. */
+interface DiffSummary {
+    readonly substantive: number;
+    readonly cosmetic: number;
+    /** Count per change type; all four keys are present (zeros allowed). */
+    readonly byType: Readonly<Record<ChangeType, number>>;
+    readonly needsReview: number;
+}
+/**
+ * The engine's primary output (ADR-0006): a stable, versioned, JSON-
+ * serializable diff. All human-readable views derive from it.
+ */
+interface StructuredDiff {
+    /** The `SCHEMA_VERSION` in effect at emit time; typed `string` for additive bumps. */
+    readonly schemaVersion: string;
+    readonly provenance: Provenance;
+    readonly changes: readonly Change[];
+    readonly summary: DiffSummary;
+}
+/**
+ * The classification boundary (ADR-0004).
+ *
+ * semdiff uses the LLM strictly as a gated, structured classifier behind a
+ * small `Classifier` interface — never a free-form diff narrator, never a
+ * hardwired SDK. The default provider is the latest capable Claude model,
+ * injected via config, so consumers (e.g. `sust-reg-reporter`) are not forced
+ * to own provider wiring.
+ */
+/**
+ * The structural kind of a candidate change. A subset of `ChangeType`: `move`
+ * is detected deterministically before classification (ADR-0010), so it never
+ * reaches the model.
+ */
+type CandidateType = "insertion" | "deletion" | "modification";
+/**
+ * A single changed pair handed to the classifier for a verdict. One side may be
+ * absent: for an insertion `a` is `""` and `spanA` is `null`; for a deletion
+ * `b` is `""` and `spanB` is `null` (ADR-0011).
+ */
+interface CandidatePair {
+    /** The structural kind of change. */
+    readonly type: CandidateType;
+    /** The unit text from input A; `""` for an insertion. */
+    readonly a: string;
+    /** The unit text from input B; `""` for a deletion. */
+    readonly b: string;
+    /** Location of `a` within input A (literal character offsets, ADR-0007); `null` for an insertion. */
+    readonly spanA: Span | null;
+    /** Location of `b` within input B (literal character offsets, ADR-0007); `null` for a deletion. */
+    readonly spanB: Span | null;
+}
+/** The schema-validated verdict for one candidate pair. */
+interface ClassifierVerdict {
+    readonly classification: Classification;
+    /** Present only for a substantive verdict; OMITTED otherwise. */
+    readonly description?: string;
+    /** Provider/engine confidence in `[0, 1]`. */
+    readonly confidence: number;
+}
+/** The injectable provider boundary. An implementation wraps one LLM provider. */
+interface Classifier {
+    classify(pair: CandidatePair): Promise<ClassifierVerdict>;
+}
+/**
+ * Default model id (the latest capable Claude, ADR-0004). Stamped into run
+ * provenance, and used by the default classifier when the caller does not pin
+ * one. Callers that inject their own provider should pass `modelId` for an
+ * accurate provenance stamp.
+ */
+declare const DEFAULT_MODEL_ID = "claude-opus-4-8";
+/**
+ * The never-drop / never-fabricate fallback verdict (ADR-0004). When a model
+ * response fails schema validation, retries are exhausted, or the provider
+ * errors, the `classify` stage records this conservative verdict — `substantive`
+ * (so the change is surfaced for review, not hidden) with zero confidence — and
+ * flags the resulting change for review, rather than dropping the pair or
+ * guessing a cosmetic/substantive call.
+ */
+declare function needsReviewVerdict(): ClassifierVerdict;
+/**
+ * Version constants feeding the reproducibility stamp (ADR-0004). A run is
+ * stamped with the model id, the prompt version, and this engine version so
+ * results are reproducible and cache keys are stable.
+ */
+/**
+ * semdiff engine version, stamped into `Provenance.engineVersion`. Kept in sync
+ * with `package.json` `version`; `test/version.contract.test.ts` fails the build
+ * if the two drift, so a published artifact never stamps a stale version.
+ */
+declare const ENGINE_VERSION = "0.1.0";
+/** Default prompt-template version, stamped into `Provenance.promptVersion`. */
+declare const DEFAULT_PROMPT_VERSION = "0";
+/**
+ * Stage 1 — segment (ADR-0003). Local and deterministic; no model.
+ *
+ * Split an input into comparable units. At `sentence` granularity each unit is
+ * a sentence; at `clause` granularity sentences are further divided at strong
+ * intra-sentence separators. Each `Unit` carries the half-open `[start, end)`
+ * CHARACTER OFFSETS of its text within the LITERAL input, so spans reported
+ * downstream index the caller's exact input (the offset invariant, ADR-0007):
+ * `input.slice(unit.span.start, unit.span.end) === unit.text` always holds.
+ * Whitespace at unit boundaries is excluded from the span (and the text); no
+ * other normalization is applied, so offsets never drift.
+ */
+/** The granularity at which an input is segmented. */
+type SegmentGranularity = "sentence" | "clause";
+/**
+ * Default classifier — calls the Anthropic Messages API to judge whether a
+ * change is substantive or cosmetic (ADR-0004, ADR-0009).
+ *
+ * It uses the global `fetch` (no SDK), so the engine keeps ZERO runtime
+ * dependencies; a consumer that needs a different provider or transport injects
+ * its own `Classifier` instead. Determinism is steered by a pinned model, a
+ * pinned prompt, and low effort where the model accepts it — Opus 4.8 removed the
+ * `temperature` parameter, so there is no `temperature: 0`. The verdict is returned through a constrained
+ * JSON schema and then RE-VALIDATED by the classify stage, so this module can
+ * parse leniently: any malformed response surfaces as a thrown error that the
+ * classify stage retries, then degrades to needs-review.
+ *
+ * Transport resilience (ADR-0012): each call has a timeout and retries transient
+ * failures — HTTP 429/5xx, network errors, and abort timeouts — with exponential
+ * backoff, honouring a `Retry-After` header. Non-transient errors (400, auth)
+ * fail fast. This is distinct from the classify stage's verdict-level retry: the
+ * provider exhausts its backoff first, and only if it still throws does the
+ * stage's safety net degrade the pair to needs-review.
+ */
+/** Configuration for the default Anthropic-backed classifier. */
+interface DefaultClassifierConfig {
+    /** Model id; defaults to the latest capable Claude (ADR-0004). */
+    readonly modelId?: string;
+    /** API key; defaults to `process.env.ANTHROPIC_API_KEY`. */
+    readonly apiKey?: string;
+    /** Per-request timeout in ms before the call is aborted and retried (ADR-0012). Default 60000. */
+    readonly timeoutMs?: number;
+    /** Retries on a transient failure — 429, 5xx, network, or timeout (ADR-0012). Default 2. */
+    readonly maxRetries?: number;
+}
+/**
+ * Construct the default classifier. Throws immediately if no API key is
+ * available, so a diff that needs the model fails at a clear boundary rather
+ * than per-call.
+ */
+declare function createDefaultClassifier(config: DefaultClassifierConfig): Classifier;
+/** A store for classification verdicts, keyed by content hash. May be async. */
+interface VerdictCache {
+    get(key: string): Promise<ClassifierVerdict | undefined>;
+    set(key: string, verdict: ClassifierVerdict): Promise<void>;
+}
+/** An in-memory `VerdictCache` (process-local; the default). */
+declare function createMemoryCache(): VerdictCache;
+/** Options for `withCache`. `modelId` and `promptVersion` are part of the key. */
+interface CacheOptions {
+    readonly modelId: string;
+    readonly promptVersion: string;
+    readonly cache?: VerdictCache;
+}
+/**
+ * Wrap a `Classifier` so identical inputs are classified once. Reuse the
+ * returned classifier across `diff` calls to share the cache.
+ */
+declare function withCache(classifier: Classifier, options: CacheOptions): Classifier;
+/** Content-addressed key: a hash of (normalized a, normalized b, prompt, model). */
+declare function cacheKey(pair: CandidatePair, modelId: string, promptVersion: string): string;
+/**
+ * semdiff — meaning-aware diff engine (library entry point).
+ *
+ * The library is the source of truth (ADR-0002); the CLI is a thin wrapper.
+ * `diff` runs the segment -> align -> classify pipeline (ADR-0003) and assembles
+ * the versioned `StructuredDiff` (ADR-0006), the engine's public contract.
+ */
+/** Options for a `diff` run. Omit a field to take its default. */
+interface DiffOptions {
+    /**
+     * Provider used to classify changed pairs. Defaults to the latest capable
+     * Claude model via `createDefaultClassifier` (ADR-0004) — which is only
+     * constructed when there is a change to classify (a modification, insertion,
+     * or deletion). Identical, cosmetic, and moved content needs no provider.
+     */
+    readonly classifier?: Classifier;
+    /** Model id stamped into provenance; also passed to the default classifier. */
+    readonly modelId?: string;
+    /** Prompt-template version stamped into provenance. */
+    readonly promptVersion?: string;
+    /** Granularity at which inputs are segmented (ADR-0003). */
+    readonly segmentGranularity?: SegmentGranularity;
+}
+/**
+ * Produce a meaning-aware structured diff of two inputs. Runs
+ * segment -> align -> classify, stamps run provenance, and assembles the
+ * `StructuredDiff`. A classifier is constructed and called only when there is at
+ * least one change to classify (a modification, insertion, or deletion), so
+ * diffs of identical, cosmetic, or merely relocated content need no provider.
+ */
+declare function diff(a: string, b: string, options?: DiffOptions): Promise<StructuredDiff>;
+export { type CacheOptions, type CandidatePair, type CandidateType, type Change, type ChangeType, type Classification, type Classifier, type ClassifierVerdict, DEFAULT_MODEL_ID, DEFAULT_PROMPT_VERSION, type DefaultClassifierConfig, type DiffOptions, type DiffSummary, ENGINE_VERSION, type Provenance, SCHEMA_VERSION, type Span, type StructuredDiff, type VerdictCache, cacheKey, createDefaultClassifier, createMemoryCache, diff, needsReviewVerdict, withCache };
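
The Stage 1 comment in `index.d.ts` above states an offset invariant: `input.slice(unit.span.start, unit.span.end) === unit.text`. The following is a minimal sketch of a segmenter that keeps that invariant, not semdiff's implementation. The `SketchSpan` and `SketchUnit` shapes, the `segmentSentences` name, and the sentence-boundary rule (terminal punctuation followed by whitespace or end of input) are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the package's real `Span` and unit
// types are declared in index.d.ts and may carry more fields.
interface SketchSpan { readonly start: number; readonly end: number; }
interface SketchUnit { readonly text: string; readonly span: SketchSpan; }

// Naive sentence splitter. ASSUMED rule: a sentence ends at a run of '.', '!'
// or '?' that is followed by whitespace or the end of the input.
function segmentSentences(input: string): SketchUnit[] {
  const units: SketchUnit[] = [];
  const boundary = /[.!?]+(?=\s|$)/g;

  const pushUnit = (from: number, to: number): void => {
    // Exclude boundary whitespace by moving the offsets, never by rewriting
    // the text, so the span always indexes the literal input.
    let start = from;
    let end = to;
    while (start < end && /\s/.test(input.charAt(start))) start++;
    while (end > start && /\s/.test(input.charAt(end - 1))) end--;
    if (start < end) {
      units.push({ text: input.slice(start, end), span: { start, end } });
    }
  };

  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(input)) !== null) {
    const end = match.index + match[0].length;
    pushUnit(cursor, end);
    cursor = end;
  }
  pushUnit(cursor, input.length); // trailing text without terminal punctuation
  return units;
}

const sampleInput = "  First sentence.  Second one!\nNo terminator";
const sampleUnits = segmentSentences(sampleInput);
for (const unit of sampleUnits) {
  // Half-open [start, end) offsets into the caller's exact input.
  console.log(unit.span.start, unit.span.end, JSON.stringify(unit.text));
}
```

Because whitespace is trimmed by adjusting offsets only, the reported spans can be used directly against the original string without any re-normalization.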

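The default classifier's comment in `index.d.ts` describes transport resilience: transient failures (HTTP 429/5xx, network errors, timeouts) are retried with exponential backoff that honours `Retry-After`, while non-transient errors fail fast. The sketch below shows one way such a policy could be computed. The base delay, the cap, and the function names are assumptions; the diff does not show the values semdiff actually uses. Only the delta-seconds form of `Retry-After` is handled here; the HTTP-date form falls through to the exponential schedule.

```typescript
// ASSUMED policy values for illustration only.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;

/** True for HTTP statuses worth retrying: 429 and any 5xx. */
function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

/**
 * Delay before retry number `attempt` (0-based). A numeric `Retry-After`
 * value (in seconds) takes precedence over the exponential schedule.
 */
function retryDelayMs(attempt: number, retryAfter?: string | null): number {
  if (retryAfter != null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (isFinite(seconds) && seconds >= 0) {
      return seconds * 1000; // honour the server's hint as given
    }
  }
  // Exponential backoff: base, 2x base, 4x base, ... up to the cap.
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

console.log(isTransientStatus(503), isTransientStatus(400));
console.log(retryDelayMs(0), retryDelayMs(1), retryDelayMs(0, "3"));
```

A caller would loop up to `maxRetries` times, sleeping for `retryDelayMs(attempt, response.headers.get("retry-after"))` between attempts and rethrowing immediately when `isTransientStatus` is false.
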
package/dist/index.js ADDED

@@ -0,0 +1,25 @@
+import {
+  DEFAULT_MODEL_ID,
+  DEFAULT_PROMPT_VERSION,
+  ENGINE_VERSION,
+  SCHEMA_VERSION,
+  cacheKey,
+  createDefaultClassifier,
+  createMemoryCache,
+  diff,
+  needsReviewVerdict,
+  withCache
+} from "./chunk-4GFNMJGB.js";
+export {
+  DEFAULT_MODEL_ID,
+  DEFAULT_PROMPT_VERSION,
+  ENGINE_VERSION,
+  SCHEMA_VERSION,
+  cacheKey,
+  createDefaultClassifier,
+  createMemoryCache,
+  diff,
+  needsReviewVerdict,
+  withCache
+};
+//# sourceMappingURL=index.js.map
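
Among the names re-exported above, `cacheKey` is documented in `index.d.ts` as a content-addressed key: a hash of (normalized a, normalized b, prompt, model). The sketch below illustrates that idea under stated assumptions. The whitespace-collapsing `normalize`, the `PairText` shape, and the 32-bit FNV-1a hash are stand-ins chosen to keep the example dependency-free; the diff shown here does not reveal which normalization or hash the package uses, and a production cache would more plausibly use a cryptographic hash.

```typescript
// Hypothetical minimal shape; the real `CandidatePair` is declared in index.d.ts.
interface PairText { readonly a: string; readonly b: string; }

// ASSUMED normalization: collapse whitespace runs and trim the ends.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// FNV-1a (32-bit), a stand-in for whatever hash the package actually uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Model id and prompt version are part of the key, so changing either one
// invalidates previously cached verdicts.
function sketchCacheKey(pair: PairText, modelId: string, promptVersion: string): string {
  const parts = [normalize(pair.a), normalize(pair.b), promptVersion, modelId];
  return fnv1a(parts.join("\u0000"));
}

const keyOne = sketchCacheKey({ a: "Fees are  due.", b: "Fees are due now." }, "model-a", "0");
const keyTwo = sketchCacheKey({ a: " Fees are due. ", b: "Fees are due now." }, "model-a", "0");
console.log(keyOne === keyTwo); // whitespace-only differences share a key
```

Folding the model id and prompt version into the key is what lets a cached verdict stay valid only for the exact configuration that produced it.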

package/dist/index.js.map ADDED

@@ -0,0 +1 @@
+{"version":3,"sources":[],"sourcesContent":[],"mappings":"","names":[]}

package/package.json ADDED

@@ -0,0 +1,64 @@
+{
+  "name": "semdiff",
+  "version": "0.1.0",
+  "description": "Meaning-aware diff engine: surfaces substantive prose changes and suppresses cosmetic ones. Library + CLI.",
+  "license": "MIT",
+  "author": "Brian Benzinger",
+  "type": "module",
+  "keywords": [
+    "diff",
+    "semantic-diff",
+    "meaning-aware",
+    "prose",
+    "text-diff",
+    "structured-diff",
+    "llm",
+    "classifier",
+    "anthropic",
+    "claude"
+  ],
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/brian-benzinger/semdiff.git"
+  },
+  "homepage": "https://github.com/brian-benzinger/semdiff#readme",
+  "bugs": {
+    "url": "https://github.com/brian-benzinger/semdiff/issues"
+  },
+  "main": "./dist/index.js",
+  "exports": {
+    ".": {
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js"
+    }
+  },
+  "types": "./dist/index.d.ts",
+  "bin": {
+    "semdiff": "dist/cli.js"
+  },
+  "files": [
+    "dist",
+    "README.md",
+    "LICENSE"
+  ],
+  "sideEffects": false,
+  "engines": {
+    "node": ">=20.0.0"
+  },
+  "scripts": {
+    "build": "tsup",
+    "test": "vitest run --coverage",
+    "test:watch": "vitest",
+    "typecheck": "tsc --noEmit",
+    "eval": "node src/eval/run.ts",
+    "prepublishOnly": "npm run typecheck && npm test && npm run build",
+    "prepack": "npm run build"
+  },
+  "devDependencies": {
+    "@types/node": "^22.18.0",
+    "@vitest/coverage-v8": "^4.1.7",
+    "tsup": "^8.5.0",
+    "typescript": "^5.7.0",
+    "vitest": "^4.1.7"
+  }
+}