npm - @mcarvin/smart-diff - Versions diffs - 1.1.0 → 2.1.0 - Mend

@mcarvin/smart-diff 1.1.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/README.md +123 -21
package/dist/index.cjs +447 -134
package/dist/index.cjs.map +1 -1
package/dist/index.min.cjs +1 -1
package/dist/index.min.cjs.map +1 -1
package/dist/index.min.mjs +1 -1
package/dist/index.min.mjs.map +1 -1
package/dist/index.min.umd.js +1 -1
package/dist/index.min.umd.js.map +1 -1
package/dist/index.mjs +441 -131
package/dist/index.mjs.map +1 -1
package/dist/index.umd.js +450 -138
package/dist/index.umd.js.map +1 -1
package/dist/typings/ai/aiTypes.d.ts +5 -3
package/dist/typings/ai/llmProviders.d.ts +12 -0
package/dist/typings/git/diffShaping.d.ts +9 -0
package/dist/typings/git/diffTypes.d.ts +2 -0
package/dist/typings/git/gitDiff.d.ts +2 -0
package/dist/typings/index.d.ts +14 -7
package/package.json +34 -7
package/dist/typings/ai/openAIConfig.d.ts +0 -21

package/README.md CHANGED

@@ -6,12 +6,13 @@
 [![Maintainability](https://qlty.sh/gh/mcarvin8/projects/smart-diff/maintainability.svg)](https://qlty.sh/gh/mcarvin8/projects/smart-diff)
 [![codecov](https://codecov.io/gh/mcarvin8/smart-diff/graph/badge.svg?token=H3ZWAGG7S9)](https://codecov.io/gh/mcarvin8/smart-diff)
-TypeScript library that turns a **git revision range** into a **Markdown summary** using an OpenAI-compatible Chat Completions API. It uses [`simple-git`](https://github.com/steveukx/git-js) to read the repo, respects **path includes/excludes** and **commit message include/exclude regexes**, and sends commits, paths, structured diff stats, and unified diff text to the model.
+TypeScript library that turns a **git revision range** into a **Markdown summary** using any LLM provider supported by the [Vercel AI SDK](https://sdk.vercel.ai) — OpenAI, Anthropic, Google Gemini, Amazon Bedrock, Mistral, Cohere, Groq, xAI, DeepSeek, or any OpenAI-compatible gateway. It uses [`simple-git`](https://github.com/steveukx/git-js) to read the repo, respects **path includes/excludes** and **commit message include/exclude regexes**, and sends commits, paths, structured diff stats, and unified diff text to the model.
 ## Requirements
 - **Node.js** 20+
-- [Git Bash](https://git-scm.com/install/)
+- An LLM provider credential (see [Provider configuration](#provider-configuration))
+- [Git](https://git-scm.com/) on the `PATH`
 ## Installation
@@ -19,25 +20,70 @@ TypeScript library that turns a **git revision range** into a **Markdown summary
 npm install @mcarvin/smart-diff
 ```
-## LLM configuration
+`@ai-sdk/openai` and `@ai-sdk/openai-compatible` ship as direct dependencies. Every other provider (`@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/amazon-bedrock`, `@ai-sdk/mistral`, `@ai-sdk/cohere`, `@ai-sdk/groq`, `@ai-sdk/xai`, `@ai-sdk/deepseek`) is declared as an **optional peer** and only needs to be installed when you actually use that provider. If the package is missing, smart-diff throws a clear error telling you which one to install.
-The library is considered “configured” when `shouldUseLlmGateway()` is true: API key, base URL, and/or JSON default headers are set. Otherwise `summarizeGitDiff` / `generateSummary` throw with `LLM_GATEWAY_REQUIRED_MESSAGE` unless you pass **`openAiClientProvider`**.
+## Provider configuration
+smart-diff is "configured" when [`isLlmProviderConfigured()`](#lower-level-api) returns true — i.e. at least one supported provider can be resolved from env vars — **or** you pass your own `llmModelProvider` factory. Otherwise `summarizeGitDiff` / `generateSummary` throw with `LLM_GATEWAY_REQUIRED_MESSAGE`.
+### Selecting a provider
+`LLM_PROVIDER` explicitly selects a provider. When unset, the resolver auto-detects in this order: `LLM_BASE_URL`/`OPENAI_BASE_URL` → `openai-compatible`, `OPENAI_API_KEY`/`LLM_API_KEY` → `openai`, then `ANTHROPIC_API_KEY`, `GOOGLE_GENERATIVE_AI_API_KEY` (or `GOOGLE_API_KEY`), `MISTRAL_API_KEY`, `COHERE_API_KEY`, `GROQ_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, and finally `OPENAI_DEFAULT_HEADERS`/`LLM_DEFAULT_HEADERS` → `openai`.
+| Provider (`LLM_PROVIDER`) | Package | Credential env vars | Default model |
+|---|---|---|---|
+| `openai` | `@ai-sdk/openai` | `OPENAI_API_KEY` or `LLM_API_KEY` | `gpt-4o-mini` |
+| `openai-compatible` | `@ai-sdk/openai-compatible` | `LLM_BASE_URL` or `OPENAI_BASE_URL` (required); `OPENAI_API_KEY`/`LLM_API_KEY` or custom headers | `gpt-4o-mini` |
+| `anthropic` | `@ai-sdk/anthropic` | `ANTHROPIC_API_KEY` | `claude-3-5-haiku-latest` |
+| `google` | `@ai-sdk/google` | `GOOGLE_GENERATIVE_AI_API_KEY` or `GOOGLE_API_KEY` | `gemini-2.0-flash` |
+| `bedrock` | `@ai-sdk/amazon-bedrock` | Standard AWS credential chain (env / profile / role) | `anthropic.claude-3-5-haiku-20241022-v1:0` |
+| `mistral` | `@ai-sdk/mistral` | `MISTRAL_API_KEY` | `mistral-small-latest` |
+| `cohere` | `@ai-sdk/cohere` | `COHERE_API_KEY` | `command-r-08-2024` |
+| `groq` | `@ai-sdk/groq` | `GROQ_API_KEY` | `llama-3.1-8b-instant` |
+| `xai` | `@ai-sdk/xai` | `XAI_API_KEY` | `grok-2-latest` |
+| `deepseek` | `@ai-sdk/deepseek` | `DEEPSEEK_API_KEY` | `deepseek-chat` |
+> `LLM_*` wins over `OPENAI_*` where both exist.
+### Common env vars
 | Variable | Purpose |
-|----------|---------|
-| `OPENAI_API_KEY` or `LLM_API_KEY` | API key (`LLM_*` wins over `OPENAI_*` where both exist). |
-| `OPENAI_BASE_URL` or `LLM_BASE_URL` | Base URL for an OpenAI-compatible gateway (`LLM_*` overrides). |
-| `OPENAI_DEFAULT_HEADERS` / `LLM_DEFAULT_HEADERS` | JSON object of extra headers; `LLM_*` merges on top of `OPENAI_*`. Can supply `Authorization` (e.g. raw `sk-…`) when no env key is set. |
-| `LLM_MAX_DIFF_CHARS` | Max size of unified diff text sent to the model (default ~120k characters). |
-| `LLM_MAX_TOKENS` or `OPENAI_MAX_TOKENS` | Max completion tokens (default 4000). |
+|---|---|
+| `LLM_PROVIDER` | Explicit provider id from the table above. |
+| `LLM_MODEL` | Overrides the per-provider default model id. |
+| `OPENAI_BASE_URL` / `LLM_BASE_URL` | Base URL for an OpenAI-compatible gateway; presence alone auto-selects the `openai-compatible` provider. |
+| `OPENAI_DEFAULT_HEADERS` / `LLM_DEFAULT_HEADERS` | JSON object of extra headers merged onto OpenAI / OpenAI-compatible requests (e.g. RBAC tokens, raw `Authorization`). `LLM_*` overrides `OPENAI_*` key-by-key. |
+| `LLM_PROVIDER_NAME` | Display name used when `openai-compatible` is active (defaults to `openai-compatible`). |
+| `OPENAI_MAX_DIFF_CHARS` / `LLM_MAX_DIFF_CHARS` | Max size of unified diff text sent to the model (default ~120k characters). |
+| `OPENAI_MAX_TOKENS` / `LLM_MAX_TOKENS` | Max completion tokens (default 4000). |
+### Example: native OpenAI
+```powershell
+$env:OPENAI_API_KEY = "sk-..."
+# Optional: $env:LLM_MODEL = "gpt-4o"
+```
-The client is created with the official [`openai`](https://www.npmjs.com/package/openai) SDK via `createOpenAiLikeClient()`; use a compatible endpoint and model ID for your provider.
+### Example: Anthropic Claude
-Example using a company-managed OpenAI-compatible gateway:
+```powershell
+$env:ANTHROPIC_API_KEY = "sk-ant-..."
+$env:LLM_MODEL = "claude-3-5-sonnet-latest"   # optional override
+```
+### Example: company-managed OpenAI-compatible gateway
+```powershell
+$env:OPENAI_BASE_URL = "https://llm-gateway.example.com"
+$env:OPENAI_DEFAULT_HEADERS = '{"x-company-rbac":"your-rbac-token-here","Authorization":"Bearer sk-your-api-key-here"}'
+# LLM_PROVIDER is auto-detected as "openai-compatible" because LLM_BASE_URL/OPENAI_BASE_URL is set.
+```
+### Example: Google Gemini
 ```powershell
-$env:LLM_DEFAULT_HEADERS = '{"x-company-rbac":"your-rbac-token-here","Authorization":"sk-your-api-key-here"}'
-$env:LLM_BASE_URL = "https://llm-gateway.example.com"
+$env:GOOGLE_GENERATIVE_AI_API_KEY = "..."
+$env:LLM_MODEL = "gemini-2.0-flash"
 ```
 ## Usage
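The detection order in this hunk works as an ordered rule list. Below is a hypothetical TypeScript sketch of that list, plus the key-by-key header merge described for `LLM_DEFAULT_HEADERS`. It is not smart-diff's source. The names `detectProvider` and `mergeDefaultHeaders` are assumptions, as are throwing on an unknown `LLM_PROVIDER` and treating whitespace-only values as unset. Bedrock is only reachable through `LLM_PROVIDER` here, because the README's detection list names no env var for it.

```typescript
// Hypothetical sketch of the 2.x provider auto-detection described in the
// README. Not smart-diff's implementation; names are illustrative.
type Env = Record<string, string | undefined>;

const PROVIDER_IDS = [
  "openai",
  "openai-compatible",
  "anthropic",
  "google",
  "bedrock",
  "mistral",
  "cohere",
  "groq",
  "xai",
  "deepseek",
] as const;
type ProviderId = (typeof PROVIDER_IDS)[number];

// Ordered rules: the first rule with any non-empty env var wins.
// Bedrock is absent because the README lists no env var that auto-selects it.
const DETECTION_ORDER: ReadonlyArray<readonly [readonly string[], ProviderId]> = [
  [["LLM_BASE_URL", "OPENAI_BASE_URL"], "openai-compatible"],
  [["OPENAI_API_KEY", "LLM_API_KEY"], "openai"],
  [["ANTHROPIC_API_KEY"], "anthropic"],
  [["GOOGLE_GENERATIVE_AI_API_KEY", "GOOGLE_API_KEY"], "google"],
  [["MISTRAL_API_KEY"], "mistral"],
  [["COHERE_API_KEY"], "cohere"],
  [["GROQ_API_KEY"], "groq"],
  [["XAI_API_KEY"], "xai"],
  [["DEEPSEEK_API_KEY"], "deepseek"],
  [["OPENAI_DEFAULT_HEADERS", "LLM_DEFAULT_HEADERS"], "openai"],
];

// Assumption: whitespace-only values count as unset.
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function isProviderId(value: string): value is ProviderId {
  return (PROVIDER_IDS as readonly string[]).includes(value);
}

function detectProvider(env: Env): ProviderId | undefined {
  const explicit = nonEmpty(env.LLM_PROVIDER);
  if (explicit !== undefined) {
    // Assumption: an unknown explicit id is an error rather than ignored.
    if (!isProviderId(explicit)) throw new Error(`Unknown LLM_PROVIDER: ${explicit}`);
    return explicit;
  }
  for (const [names, provider] of DETECTION_ORDER) {
    if (names.some((name) => nonEmpty(env[name]) !== undefined)) return provider;
  }
  return undefined; // nothing configured
}

// Parse a JSON object of headers; invalid JSON throws, non-objects are ignored.
function parseHeaders(raw: string | undefined): Record<string, string> {
  const text = nonEmpty(raw);
  if (text === undefined) return {};
  const parsed: unknown = JSON.parse(text);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return {};
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed)) headers[key] = String(value);
  return headers;
}

// LLM_DEFAULT_HEADERS overrides OPENAI_DEFAULT_HEADERS key-by-key.
function mergeDefaultHeaders(env: Env): Record<string, string> {
  return {
    ...parseHeaders(env.OPENAI_DEFAULT_HEADERS),
    ...parseHeaders(env.LLM_DEFAULT_HEADERS),
  };
}
```

Keeping the rules in one ordered array makes the precedence easy to audit. Supporting another provider means inserting a single tuple at the right position.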
@@ -56,9 +102,10 @@ const markdown = await summarizeGitDiff({
   commitMessageExcludeRegexes: ['^\\[bot\\]'],
   commitMessageIncludeRegexes: ['^feat:'], // optional; OR across patterns
   teamName: 'Platform',
-  systemPrompt: undefined, // optional; overrides DEFAULT_GIT_DIFF_SYSTEM_PROMPT
-  model: 'gpt-4o-mini', // optional
-  maxDiffChars: 120_000, // optional; also see LLM_MAX_DIFF_CHARS
+  systemPrompt: undefined,   // optional; overrides DEFAULT_GIT_DIFF_SYSTEM_PROMPT
+  provider: 'anthropic',     // optional; overrides LLM_PROVIDER env + auto-detection
+  model: 'claude-3-5-sonnet-latest', // optional
+  maxDiffChars: 120_000,     // optional; also see LLM_MAX_DIFF_CHARS
 });
 ```
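The comments on `provider` and `model` in the example above imply a layered precedence. The explicit option comes first, then the `LLM_PROVIDER` / `LLM_MODEL` env vars, then auto-detection or the per-provider default model. Below is a minimal TypeScript sketch of that layering, using the default model ids from the provider table. The name `resolveProviderAndModel` is hypothetical, and the error behavior is assumed.

```typescript
// Hypothetical sketch of option > env > default precedence for provider and
// model. Default model ids are copied from the README's provider table.
const DEFAULT_MODELS = {
  openai: "gpt-4o-mini",
  "openai-compatible": "gpt-4o-mini",
  anthropic: "claude-3-5-haiku-latest",
  google: "gemini-2.0-flash",
  bedrock: "anthropic.claude-3-5-haiku-20241022-v1:0",
  mistral: "mistral-small-latest",
  cohere: "command-r-08-2024",
  groq: "llama-3.1-8b-instant",
  xai: "grok-2-latest",
  deepseek: "deepseek-chat",
} as const;
type ProviderId = keyof typeof DEFAULT_MODELS;

interface ResolveOptions {
  provider?: ProviderId; // wins over LLM_PROVIDER and auto-detection
  model?: string; // wins over LLM_MODEL and the provider default
}

function isProviderId(value: string): value is ProviderId {
  return Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, value);
}

function resolveProviderAndModel(
  options: ResolveOptions,
  env: Record<string, string | undefined>,
  autoDetected: ProviderId | undefined, // result of env-based auto-detection
): { provider: ProviderId; model: string } {
  // Empty or whitespace-only env values count as unset.
  const rawEnvProvider = env.LLM_PROVIDER?.trim() || undefined;
  let envProvider: ProviderId | undefined;
  if (rawEnvProvider !== undefined) {
    // Assumption: an unknown id is rejected rather than silently ignored.
    if (!isProviderId(rawEnvProvider)) throw new Error(`Unknown LLM_PROVIDER: ${rawEnvProvider}`);
    envProvider = rawEnvProvider;
  }
  const provider = options.provider ?? envProvider ?? autoDetected;
  if (provider === undefined) {
    // Stand-in for smart-diff's LLM_GATEWAY_REQUIRED_MESSAGE error.
    throw new Error("No LLM provider configured");
  }
  const model = options.model || env.LLM_MODEL?.trim() || DEFAULT_MODELS[provider];
  return { provider, model };
}
```

Taking the auto-detected provider as a parameter keeps the sketch free of `process.env` access, so each precedence layer can be tested in isolation.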
@@ -72,9 +119,49 @@ const markdown = await summarizeGitDiff({
 | `commitMessageExcludeRegexes` | Drop commits whose message matches **any** of these patterns. |
 | `teamName` | Adds a `Team:` line to the user payload for the model. |
 | `systemPrompt` | Replaces the default system prompt. |
-| `model` | Chat model id (default `gpt-4o-mini`). |
+| `provider` | `LlmProviderId` — wins over `LLM_PROVIDER` env and auto-detection. |
+| `model` | Chat model id; overrides `LLM_MODEL` and the provider default. |
 | `maxDiffChars` | Caps unified diff size for the request. |
-| `openAiClientProvider` | `() => Promise<OpenAiLikeClient>` — bypasses env-based client creation (required in tests or when you wire the SDK yourself). |
+| `contextLines` | Number of context lines around each change (`git diff -U<n>`). Lower values (1 or 0) are the single biggest token saver on modification-heavy diffs. |
+| `ignoreWhitespace` | Passes `-w` / `--ignore-all-space` to `git diff` so pure-whitespace hunks don't consume tokens. Also applies to `--numstat` / `--name-status` so counts stay consistent. |
+| `stripDiffPreamble` | Removes low-value lines from the unified diff (`diff --git`, `index`, mode changes, `similarity/rename/copy` metadata). `--- a/…`, `+++ b/…`, and `@@` hunk headers are kept. |
+| `maxHunkLines` | Caps the body of each hunk; anything past the limit is replaced with a single elision marker. The `@@` header and `DiffSummary` totals are preserved. |
+| `excludeDefaultNoise` | Merges the built-in `DEFAULT_NOISE_EXCLUDES` list (lockfiles, `dist`, `build`, `out`, `coverage`, `node_modules`, `__snapshots__`) into `excludeFolders`. |
+| `llmModelProvider` | `() => Promise<LanguageModel>` — bypass env-based resolution entirely; hand-wire a Vercel AI SDK `LanguageModel` (required in tests or custom setups). |
+#### Reducing tokens
+For most repos, the cheapest wins are:
+```ts
+await summarizeGitDiff({
+  from: 'origin/main',
+  contextLines: 1,          // -U1 cuts 30-60% of tokens on typical diffs
+  ignoreWhitespace: true,   // drop pure-whitespace hunks entirely
+  stripDiffPreamble: true,  // kill `index`/`mode`/`similarity` lines
+  maxHunkLines: 400,        // truncate monster hunks but keep the @@ header
+  excludeDefaultNoise: true // skip lockfiles, dist/, coverage/, node_modules/
+});
+```
+These options only reshape the *unified diff text* — the structured `DiffSummary` still reports true file counts and line totals, so the model always sees the full change inventory.
+### Injecting your own `LanguageModel`
+If you want full control — for example, to configure retries, middlewares, or hit an in-process mock — pass `llmModelProvider`:
+```ts
+import { summarizeGitDiff } from '@mcarvin/smart-diff';
+import { createAnthropic } from '@ai-sdk/anthropic';
+const md = await summarizeGitDiff({
+  from: 'origin/main',
+  llmModelProvider: async () =>
+    createAnthropic({ apiKey: process.env.MY_ANTHROPIC_KEY })(
+      'claude-3-5-sonnet-latest',
+    ),
+});
+```
 ### Diff shape: single range vs per-commit
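The diff-shaping options in the table above fall into two layers. The first is flags passed to `git diff`: `contextLines` maps to `-U<n>` and `ignoreWhitespace` to `-w`. The second is post-processing of the unified diff text: `stripDiffPreamble` and `maxHunkLines`. The following is a hypothetical TypeScript sketch of both layers. The names `shapingGitArgs` and `shapeDiffText`, the exact preamble prefixes, and the elision-marker wording are assumptions. They are not the signatures or output of smart-diff's `buildDiffShapingGitArgs` / `shapeUnifiedDiff`.

```typescript
// Hypothetical sketch of the two diff-shaping layers described in the README.
// Not smart-diff's source; names, prefixes, and marker text are assumptions.
interface ShapingOptions {
  contextLines?: number; // -> git diff -U<n>
  ignoreWhitespace?: boolean; // -> git diff -w (--ignore-all-space)
  stripDiffPreamble?: boolean;
  maxHunkLines?: number;
}

// Layer 1: extra arguments for `git diff`.
function shapingGitArgs(opts: ShapingOptions): string[] {
  const args: string[] = [];
  if (opts.contextLines !== undefined) args.push(`-U${opts.contextLines}`);
  if (opts.ignoreWhitespace) args.push("-w");
  return args;
}

// Per-file metadata lines treated as low-value; ---/+++ and @@ lines are kept.
const PREAMBLE_PREFIXES = [
  "diff --git ", "index ", "old mode ", "new mode ", "new file mode ",
  "deleted file mode ", "similarity index ", "dissimilarity index ",
  "rename from ", "rename to ", "copy from ", "copy to ",
];

// Layer 2: rewrite the unified diff text.
function shapeDiffText(diff: string, opts: ShapingOptions): string {
  const trailingNewline = diff.endsWith("\n");
  const lines = (trailingNewline ? diff.slice(0, -1) : diff).split("\n");
  const out: string[] = [];
  let inHunk = false; // true between an @@ header and the next file header
  let bodyLines = 0; // body lines seen in the current hunk
  let elided = 0; // body lines dropped from the current hunk

  const flushElided = (): void => {
    if (elided > 0) out.push(`... (${elided} lines elided)`);
    elided = 0;
  };

  for (const line of lines) {
    if (line.startsWith("diff --git ")) {
      flushElided();
      inHunk = false; // a new file header ends the previous hunk
    } else if (line.startsWith("@@")) {
      flushElided();
      inHunk = true;
      bodyLines = 0;
      out.push(line); // hunk headers are always kept
      continue;
    }
    if (inHunk) {
      bodyLines += 1;
      if (opts.maxHunkLines !== undefined && bodyLines > opts.maxHunkLines) {
        elided += 1;
        continue;
      }
      out.push(line);
      continue;
    }
    if (opts.stripDiffPreamble && PREAMBLE_PREFIXES.some((p) => line.startsWith(p))) continue;
    out.push(line);
  }
  flushElided();
  return out.join("\n") + (trailingNewline ? "\n" : "");
}
```

The `inHunk` flag ensures that only hunk bodies count toward `maxHunkLines`. File headers and `@@` lines are therefore never elided, which matches the README's note that hunk headers are preserved.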
@@ -83,13 +170,28 @@ const markdown = await summarizeGitDiff({
 ### Lower-level API
-The package also exports helpers such as `createGitClient`, `getCommits`, `getDiff`, `getDiffSummary`, `getChangedFiles`, `filterCommitsByMessageRegexes`, `buildDiffPathspecs`, `generateSummary`, and OpenAI config utilities (`resolveLlmBaseUrl`, `shouldUseLlmGateway`, `createOpenAiLikeClient`, …). Use these if you build a custom pipeline but still want the same git and LLM behavior.
+The package also exports helpers for building a custom pipeline on top of the same git and LLM behavior:
+- **Git**: `createGitClient`, `getRepoRoot`, `getCommits`, `getDiff`, `getDiffSummary`, `getChangedFiles`, `filterCommitsByMessageRegexes`, `buildDiffPathspecs`, `buildDiffShapingGitArgs`, `shapeUnifiedDiff`, `DEFAULT_NOISE_EXCLUDES`
+- **AI**: `generateSummary`, `resolveLlmMaxDiffChars`, `truncateUnifiedDiffForLlm`
+- **Provider resolution**: `resolveLanguageModel`, `detectLlmProvider`, `isLlmProviderConfigured`, `defaultModelForProvider`, `resolveLlmBaseUrl`, `parseLlmDefaultHeadersFromEnv`
+- **Constants / types**: `DEFAULT_GIT_DIFF_SYSTEM_PROMPT`, `LLM_GATEWAY_REQUIRED_MESSAGE`, `LlmProviderId`, `LlmModelProvider`, `ResolveLanguageModelOptions`, `GenerateSummaryInput`, `SummarizeFlags`
+## Migrating from 1.x → 2.x
+v2 replaces the direct `openai` SDK dependency with the Vercel AI SDK. If you only rely on env-var configuration, your setup keeps working — `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `OPENAI_DEFAULT_HEADERS`, `LLM_*` equivalents, `OPENAI_MAX_DIFF_CHARS`, and `OPENAI_MAX_TOKENS` are all still honored.
+Breaking changes:
+- **Removed `openAiClientProvider` option** on `summarizeGitDiff`/`generateSummary`. Use `llmModelProvider: () => Promise<LanguageModel>` returning a Vercel AI SDK model instead.
+- **Removed `OpenAiLikeClient` and `createOpenAiLikeClient` exports**, along with `shouldUseLlmGateway`. Use `isLlmProviderConfigured()` / `resolveLanguageModel()` instead.
+- **`openai` npm package is no longer a dependency.** Remove it from your own `package.json` if you only depended on it transitively via smart-diff.
 ## Used By
 This package is used by:
-- [sf-git-ai-meta-insights](https://github.com/mcarvin8/sf-git-ai-meta-insights) = Salesforce metadata wrapper compatible with Salesforce DX projects
+- [sf-git-ai-meta-insights](https://github.com/mcarvin8/sf-git-ai-meta-insights) — Salesforce metadata wrapper compatible with Salesforce DX projects
 ## License
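The lower-level exports listed in this README include `resolveLlmMaxDiffChars` and `truncateUnifiedDiffForLlm`, but their signatures do not appear in this diff. The following is therefore a hypothetical TypeScript sketch of the behavior the README describes for `maxDiffChars` / `LLM_MAX_DIFF_CHARS`: `LLM_*` overrides `OPENAI_*`, and the default is roughly 120k characters (assumed here to be exactly 120,000). Several details are also assumptions: the explicit option winning over env vars, cutting at a line boundary, and the wording of the truncation marker.

```typescript
// Hypothetical sketch of max-diff-size resolution and truncation. Not
// smart-diff's resolveLlmMaxDiffChars / truncateUnifiedDiffForLlm.
const ASSUMED_DEFAULT_MAX_DIFF_CHARS = 120_000; // README: "default ~120k characters"

function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

// Assumed precedence: option > LLM_MAX_DIFF_CHARS > OPENAI_MAX_DIFF_CHARS > default.
function resolveMaxDiffChars(
  option: number | undefined,
  env: Record<string, string | undefined>,
): number {
  return (
    option ??
    parsePositiveInt(env.LLM_MAX_DIFF_CHARS) ??
    parsePositiveInt(env.OPENAI_MAX_DIFF_CHARS) ??
    ASSUMED_DEFAULT_MAX_DIFF_CHARS
  );
}

// Keep at most maxChars characters of diff, cutting at the last newline within
// budget so no line is split; a marker line is appended after the kept text.
function truncateDiff(diff: string, maxChars: number): string {
  if (diff.length <= maxChars) return diff;
  const cut = diff.lastIndexOf("\n", maxChars);
  const kept = cut > 0 ? diff.slice(0, cut) : diff.slice(0, maxChars);
  const omitted = diff.length - kept.length;
  return `${kept}\n... (diff truncated: ${omitted} of ${diff.length} characters omitted)`;
}
```

In this sketch the result can exceed `maxChars` by the length of the marker line. A stricter variant would reserve room for the marker inside the budget.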