@mcarvin/smart-diff 1.1.0 → 2.1.0

package/README.md CHANGED
@@ -6,12 +6,13 @@
6
6
  [![Maintainability](https://qlty.sh/gh/mcarvin8/projects/smart-diff/maintainability.svg)](https://qlty.sh/gh/mcarvin8/projects/smart-diff)
7
7
  [![codecov](https://codecov.io/gh/mcarvin8/smart-diff/graph/badge.svg?token=H3ZWAGG7S9)](https://codecov.io/gh/mcarvin8/smart-diff)
8
8
 
9
- TypeScript library that turns a **git revision range** into a **Markdown summary** using an OpenAI-compatible Chat Completions API. It uses [`simple-git`](https://github.com/steveukx/git-js) to read the repo, respects **path includes/excludes** and **commit message include/exclude regexes**, and sends commits, paths, structured diff stats, and unified diff text to the model.
9
+ TypeScript library that turns a **git revision range** into a **Markdown summary** using any LLM provider supported by the [Vercel AI SDK](https://sdk.vercel.ai) — OpenAI, Anthropic, Google Gemini, Amazon Bedrock, Mistral, Cohere, Groq, xAI, DeepSeek, or any OpenAI-compatible gateway. It uses [`simple-git`](https://github.com/steveukx/git-js) to read the repo, respects **path includes/excludes** and **commit message include/exclude regexes**, and sends commits, paths, structured diff stats, and unified diff text to the model.
10
10
 
11
11
  ## Requirements
12
12
 
13
13
  - **Node.js** 20+
14
- - [Git Bash](https://git-scm.com/install/)
14
+ - An LLM provider credential (see [Provider configuration](#provider-configuration))
15
+ - [Git](https://git-scm.com/) on the `PATH`
15
16
 
16
17
  ## Installation
17
18
 
@@ -19,25 +20,70 @@ TypeScript library that turns a **git revision range** into a **Markdown summary
19
20
  npm install @mcarvin/smart-diff
20
21
  ```
21
22
 
22
- ## LLM configuration
23
+ `@ai-sdk/openai` and `@ai-sdk/openai-compatible` ship as direct dependencies. Every other provider (`@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/amazon-bedrock`, `@ai-sdk/mistral`, `@ai-sdk/cohere`, `@ai-sdk/groq`, `@ai-sdk/xai`, `@ai-sdk/deepseek`) is declared as an **optional peer** and only needs to be installed when you actually use that provider. If the package is missing, smart-diff throws a clear error telling you which one to install.
23
24
 
24
- The library is considered “configured” when `shouldUseLlmGateway()` is true: API key, base URL, and/or JSON default headers are set. Otherwise `summarizeGitDiff` / `generateSummary` throw with `LLM_GATEWAY_REQUIRED_MESSAGE` unless you pass **`openAiClientProvider`**.
25
+ ## Provider configuration
26
+
27
+ smart-diff is "configured" when [`isLlmProviderConfigured()`](#lower-level-api) returns true — i.e. at least one supported provider can be resolved from env vars — **or** you pass your own `llmModelProvider` factory. Otherwise `summarizeGitDiff` / `generateSummary` throw with `LLM_GATEWAY_REQUIRED_MESSAGE`.
28
+
29
+ ### Selecting a provider
30
+
31
+ `LLM_PROVIDER` explicitly selects a provider. When unset, the resolver auto-detects in this order: `LLM_BASE_URL`/`OPENAI_BASE_URL` → `openai-compatible`, `OPENAI_API_KEY`/`LLM_API_KEY` → `openai`, then `ANTHROPIC_API_KEY`, `GOOGLE_GENERATIVE_AI_API_KEY` (or `GOOGLE_API_KEY`), `MISTRAL_API_KEY`, `COHERE_API_KEY`, `GROQ_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, and finally `OPENAI_DEFAULT_HEADERS`/`LLM_DEFAULT_HEADERS` → `openai`.
32
+
33
+ | Provider (`LLM_PROVIDER`) | Package | Credential env vars | Default model |
34
+ |---|---|---|---|
35
+ | `openai` | `@ai-sdk/openai` | `OPENAI_API_KEY` or `LLM_API_KEY` | `gpt-4o-mini` |
36
+ | `openai-compatible` | `@ai-sdk/openai-compatible` | `LLM_BASE_URL` or `OPENAI_BASE_URL` (required); `OPENAI_API_KEY`/`LLM_API_KEY` or custom headers | `gpt-4o-mini` |
37
+ | `anthropic` | `@ai-sdk/anthropic` | `ANTHROPIC_API_KEY` | `claude-3-5-haiku-latest` |
38
+ | `google` | `@ai-sdk/google` | `GOOGLE_GENERATIVE_AI_API_KEY` or `GOOGLE_API_KEY` | `gemini-2.0-flash` |
39
+ | `bedrock` | `@ai-sdk/amazon-bedrock` | Standard AWS credential chain (env / profile / role) | `anthropic.claude-3-5-haiku-20241022-v1:0` |
40
+ | `mistral` | `@ai-sdk/mistral` | `MISTRAL_API_KEY` | `mistral-small-latest` |
41
+ | `cohere` | `@ai-sdk/cohere` | `COHERE_API_KEY` | `command-r-08-2024` |
42
+ | `groq` | `@ai-sdk/groq` | `GROQ_API_KEY` | `llama-3.1-8b-instant` |
43
+ | `xai` | `@ai-sdk/xai` | `XAI_API_KEY` | `grok-2-latest` |
44
+ | `deepseek` | `@ai-sdk/deepseek` | `DEEPSEEK_API_KEY` | `deepseek-chat` |
45
+
46
+ > `LLM_*` wins over `OPENAI_*` where both exist.
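
The auto-detection order described above can be sketched as an ordered rule table. This is a hypothetical re-creation for illustration, not smart-diff's source: `sketchDetectProvider`, `DETECTION_ORDER`, and the choice to treat blank values as unset are all assumptions. `bedrock` does not appear because the documented order lists no env var that triggers it.

```ts
// Hypothetical sketch of the documented detection order; not the library's implementation.
type DetectEnv = Record<string, string | undefined>;

// Each rule pairs the env vars that trigger it with the provider id they select.
// Array order is the documented priority order.
const DETECTION_ORDER: ReadonlyArray<readonly [readonly string[], string]> = [
  [['LLM_BASE_URL', 'OPENAI_BASE_URL'], 'openai-compatible'],
  [['OPENAI_API_KEY', 'LLM_API_KEY'], 'openai'],
  [['ANTHROPIC_API_KEY'], 'anthropic'],
  [['GOOGLE_GENERATIVE_AI_API_KEY', 'GOOGLE_API_KEY'], 'google'],
  [['MISTRAL_API_KEY'], 'mistral'],
  [['COHERE_API_KEY'], 'cohere'],
  [['GROQ_API_KEY'], 'groq'],
  [['XAI_API_KEY'], 'xai'],
  [['DEEPSEEK_API_KEY'], 'deepseek'],
  [['OPENAI_DEFAULT_HEADERS', 'LLM_DEFAULT_HEADERS'], 'openai'],
];

// Assumption: an empty or whitespace-only value counts as "not set".
function isSet(env: DetectEnv, name: string): boolean {
  const value = env[name];
  return value !== undefined && value.trim() !== '';
}

function sketchDetectProvider(env: DetectEnv): string | undefined {
  // An explicit LLM_PROVIDER short-circuits auto-detection.
  const explicit = env['LLM_PROVIDER']?.trim();
  if (explicit) return explicit;

  // Otherwise the first rule with any of its variables set wins.
  for (const [names, provider] of DETECTION_ORDER) {
    if (names.some((name) => isSet(env, name))) return provider;
  }
  return undefined; // nothing resolvable from env
}
```

Under these assumptions, an environment with both `OPENAI_API_KEY` and `OPENAI_BASE_URL` set resolves to `openai-compatible`, because the base-URL rule comes first.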
47
+
48
+ ### Common env vars
25
49
 
26
50
  | Variable | Purpose |
27
- |----------|---------|
28
- | `OPENAI_API_KEY` or `LLM_API_KEY` | API key (`LLM_*` wins over `OPENAI_*` where both exist). |
29
- | `OPENAI_BASE_URL` or `LLM_BASE_URL` | Base URL for an OpenAI-compatible gateway (`LLM_*` overrides). |
30
- | `OPENAI_DEFAULT_HEADERS` / `LLM_DEFAULT_HEADERS` | JSON object of extra headers; `LLM_*` merges on top of `OPENAI_*`. Can supply `Authorization` (e.g. raw `sk-…`) when no env key is set. |
31
- | `LLM_MAX_DIFF_CHARS` | Max size of unified diff text sent to the model (default ~120k characters). |
32
- | `LLM_MAX_TOKENS` or `OPENAI_MAX_TOKENS` | Max completion tokens (default 4000). |
51
+ |---|---|
52
+ | `LLM_PROVIDER` | Explicit provider id from the table above. |
53
+ | `LLM_MODEL` | Overrides the per-provider default model id. |
54
+ | `OPENAI_BASE_URL` / `LLM_BASE_URL` | Base URL for an OpenAI-compatible gateway; presence alone auto-selects the `openai-compatible` provider. |
55
+ | `OPENAI_DEFAULT_HEADERS` / `LLM_DEFAULT_HEADERS` | JSON object of extra headers merged onto OpenAI / OpenAI-compatible requests (e.g. RBAC tokens, raw `Authorization`). `LLM_*` overrides `OPENAI_*` key-by-key. |
56
+ | `LLM_PROVIDER_NAME` | Display name used when `openai-compatible` is active (defaults to `openai-compatible`). |
57
+ | `OPENAI_MAX_DIFF_CHARS` / `LLM_MAX_DIFF_CHARS` | Max size of unified diff text sent to the model (default ~120k characters). |
58
+ | `OPENAI_MAX_TOKENS` / `LLM_MAX_TOKENS` | Max completion tokens (default 4000). |
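
To make the "`LLM_*` overrides `OPENAI_*` key-by-key" rule for default headers concrete, here is a minimal sketch. The helper names (`parseHeaderJson`, `sketchMergeDefaultHeaders`) are hypothetical and are not the library's `parseLlmDefaultHeadersFromEnv`; throwing on non-object JSON and matching header names case-sensitively are assumptions.

```ts
// Hypothetical sketch of key-by-key header merging; not the library's implementation.
type HeaderEnv = Record<string, string | undefined>;

function parseHeaderJson(raw: string | undefined): Record<string, string> {
  if (raw === undefined || raw.trim() === '') return {};
  const parsed: unknown = JSON.parse(raw);
  // Assumption: anything other than a plain JSON object is rejected.
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('default headers must be a JSON object');
  }
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed as Record<string, unknown>)) {
    headers[key] = String(value);
  }
  return headers;
}

function sketchMergeDefaultHeaders(env: HeaderEnv): Record<string, string> {
  // Later spreads win, so LLM_* keys replace OPENAI_* keys of the same name
  // while unrelated OPENAI_* keys survive.
  return {
    ...parseHeaderJson(env['OPENAI_DEFAULT_HEADERS']),
    ...parseHeaderJson(env['LLM_DEFAULT_HEADERS']),
  };
}
```

With `OPENAI_DEFAULT_HEADERS='{"a":"1","b":"2"}'` and `LLM_DEFAULT_HEADERS='{"b":"3"}'`, the sketch yields `a` from the first and `b` from the second.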
59
+
60
+ ### Example: native OpenAI
61
+
62
+ ```powershell
63
+ $env:OPENAI_API_KEY = "sk-..."
64
+ # Optional: $env:LLM_MODEL = "gpt-4o"
65
+ ```
33
66
 
34
- The client is created with the official [`openai`](https://www.npmjs.com/package/openai) SDK via `createOpenAiLikeClient()`; use a compatible endpoint and model ID for your provider.
67
+ ### Example: Anthropic Claude
35
68
 
36
- Example using a company-managed OpenAI-compatible gateway:
69
+ ```powershell
70
+ $env:ANTHROPIC_API_KEY = "sk-ant-..."
71
+ $env:LLM_MODEL = "claude-3-5-sonnet-latest" # optional override
72
+ ```
73
+
74
+ ### Example: company-managed OpenAI-compatible gateway
75
+
76
+ ```powershell
77
+ $env:OPENAI_BASE_URL = "https://llm-gateway.example.com"
78
+ $env:OPENAI_DEFAULT_HEADERS = '{"x-company-rbac":"your-rbac-token-here","Authorization":"Bearer sk-your-api-key-here"}'
79
+ # LLM_PROVIDER is auto-detected as "openai-compatible" because LLM_BASE_URL/OPENAI_BASE_URL is set.
80
+ ```
81
+
82
+ ### Example: Google Gemini
37
83
 
38
84
  ```powershell
39
- $env:LLM_DEFAULT_HEADERS = '{"x-company-rbac":"your-rbac-token-here","Authorization":"sk-your-api-key-here"}'
40
- $env:LLM_BASE_URL = "https://llm-gateway.example.com"
85
+ $env:GOOGLE_GENERATIVE_AI_API_KEY = "..."
86
+ $env:LLM_MODEL = "gemini-2.0-flash"
41
87
  ```
42
88
 
43
89
  ## Usage
@@ -56,9 +102,10 @@ const markdown = await summarizeGitDiff({
56
102
  commitMessageExcludeRegexes: ['^\\[bot\\]'],
57
103
  commitMessageIncludeRegexes: ['^feat:'], // optional; OR across patterns
58
104
  teamName: 'Platform',
59
- systemPrompt: undefined, // optional; overrides DEFAULT_GIT_DIFF_SYSTEM_PROMPT
60
- model: 'gpt-4o-mini', // optional
61
- maxDiffChars: 120_000, // optional; also see LLM_MAX_DIFF_CHARS
105
+ systemPrompt: undefined, // optional; overrides DEFAULT_GIT_DIFF_SYSTEM_PROMPT
106
+ provider: 'anthropic', // optional; overrides LLM_PROVIDER env + auto-detection
107
+ model: 'claude-3-5-sonnet-latest', // optional
108
+ maxDiffChars: 120_000, // optional; also see LLM_MAX_DIFF_CHARS
62
109
  });
63
110
  ```
64
111
 
@@ -72,9 +119,49 @@ const markdown = await summarizeGitDiff({
72
119
  | `commitMessageExcludeRegexes` | Drop commits whose message matches **any** of these patterns. |
73
120
  | `teamName` | Adds a `Team:` line to the user payload for the model. |
74
121
  | `systemPrompt` | Replaces the default system prompt. |
75
- | `model` | Chat model id (default `gpt-4o-mini`). |
122
+ | `provider` | Provider id (`LlmProviderId`); takes precedence over the `LLM_PROVIDER` env var and auto-detection. |
123
+ | `model` | Chat model id; overrides `LLM_MODEL` and the provider default. |
76
124
  | `maxDiffChars` | Caps unified diff size for the request. |
77
- | `openAiClientProvider` | `() => Promise<OpenAiLikeClient>` bypasses env-based client creation (required in tests or when you wire the SDK yourself). |
125
+ | `contextLines` | Number of context lines around each change (`git diff -U<n>`). Lower values (1 or 0) are the single biggest token saver on modification-heavy diffs. |
126
+ | `ignoreWhitespace` | Passes `-w` / `--ignore-all-space` to `git diff` so pure-whitespace hunks don't consume tokens. Also applies to `--numstat` / `--name-status` so counts stay consistent. |
127
+ | `stripDiffPreamble` | Removes low-value lines from the unified diff (`diff --git`, `index`, mode changes, `similarity/rename/copy` metadata). `--- a/…`, `+++ b/…`, and `@@` hunk headers are kept. |
128
+ | `maxHunkLines` | Caps the body of each hunk; anything past the limit is replaced with a single elision marker. The `@@` header and `DiffSummary` totals are preserved. |
129
+ | `excludeDefaultNoise` | Merges the built-in `DEFAULT_NOISE_EXCLUDES` list (lockfiles, `dist`, `build`, `out`, `coverage`, `node_modules`, `__snapshots__`) into `excludeFolders`. |
130
+ | `llmModelProvider` | `() => Promise<LanguageModel>` — bypass env-based resolution entirely; hand-wire a Vercel AI SDK `LanguageModel` (required in tests or custom setups). |
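
The commit message filters in the table above combine as "drop on any exclude match, keep on any include match". The sketch below illustrates that rule under stated assumptions: `CommitLike` and `sketchFilterCommits` are hypothetical names (the package's own export is `filterCommitsByMessageRegexes`, whose signature is not shown here), excludes are checked before includes, and an empty include list keeps everything.

```ts
// Hypothetical sketch of include/exclude message filtering; not the library's implementation.
interface CommitLike {
  hash: string;
  message: string;
}

function sketchFilterCommits(
  commits: readonly CommitLike[],
  includePatterns: readonly string[] = [],
  excludePatterns: readonly string[] = [],
): CommitLike[] {
  // Patterns are compiled without flags, so RegExp.test() carries no state between calls.
  const includes = includePatterns.map((p) => new RegExp(p));
  const excludes = excludePatterns.map((p) => new RegExp(p));

  return commits.filter((commit) => {
    // Any exclude match drops the commit (assumption: exclude beats include).
    if (excludes.some((re) => re.test(commit.message))) return false;
    // With no include patterns everything else passes; otherwise OR across patterns.
    return includes.length === 0 || includes.some((re) => re.test(commit.message));
  });
}
```

For example, with `['^feat:']` as includes and `['^\\[bot\\]']` as excludes, only commits whose message starts with `feat:` remain.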
131
+
132
+ #### Reducing tokens
133
+
134
+ For most repos, the cheapest wins are:
135
+
136
+ ```ts
137
+ await summarizeGitDiff({
138
+ from: 'origin/main',
139
+ contextLines: 1, // -U1 cuts 30-60% of tokens on typical diffs
140
+ ignoreWhitespace: true, // drop pure-whitespace hunks entirely
141
+ stripDiffPreamble: true, // kill `index`/`mode`/`similarity` lines
142
+ maxHunkLines: 400, // truncate monster hunks but keep the @@ header
143
+ excludeDefaultNoise: true // skip lockfiles, dist/, coverage/, node_modules/
144
+ });
145
+ ```
146
+
147
+ These options only reshape the *unified diff text* — the structured `DiffSummary` still reports true file counts and line totals, so the model always sees the full change inventory.
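
As a rough illustration of what `stripDiffPreamble` is described as doing, the sketch below drops metadata lines by prefix and leaves `---`, `+++`, and `@@` lines alone. It is an assumption-laden re-creation, not the package's `shapeUnifiedDiff`: the function name `sketchStripPreamble` and the exact prefix list are hypothetical. It relies on the fact that hunk body lines in a unified diff begin with a space, `+`, `-`, or `\`, so a prefix match at column 0 cannot hit changed content.

```ts
// Hypothetical sketch of diff-preamble stripping; not the library's implementation.
const PREAMBLE_PREFIXES: readonly string[] = [
  'diff --git ',
  'index ',
  'old mode ',
  'new mode ',
  'new file mode ',
  'deleted file mode ',
  'similarity index ',
  'dissimilarity index ',
  'rename from ',
  'rename to ',
  'copy from ',
  'copy to ',
];

function sketchStripPreamble(unifiedDiff: string): string {
  return unifiedDiff
    .split('\n')
    // Keep every line that does not start with a known metadata prefix.
    .filter((line) => !PREAMBLE_PREFIXES.some((prefix) => line.startsWith(prefix)))
    .join('\n');
}
```

A changed line such as `+index of other things` is preserved, because its first character is the `+` marker rather than the `index ` prefix.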
148
+
149
+ ### Injecting your own `LanguageModel`
150
+
151
+ If you want full control (for example, to configure retries or middleware, or to point at an in-process mock), pass `llmModelProvider`:
152
+
153
+ ```ts
154
+ import { summarizeGitDiff } from '@mcarvin/smart-diff';
155
+ import { createAnthropic } from '@ai-sdk/anthropic';
156
+
157
+ const md = await summarizeGitDiff({
158
+ from: 'origin/main',
159
+ llmModelProvider: async () =>
160
+ createAnthropic({ apiKey: process.env.MY_ANTHROPIC_KEY })(
161
+ 'claude-3-5-sonnet-latest',
162
+ ),
163
+ });
164
+ ```
78
165
 
79
166
  ### Diff shape: single range vs per-commit
80
167
 
@@ -83,13 +170,28 @@ const markdown = await summarizeGitDiff({
83
170
 
84
171
  ### Lower-level API
85
172
 
86
- The package also exports helpers such as `createGitClient`, `getCommits`, `getDiff`, `getDiffSummary`, `getChangedFiles`, `filterCommitsByMessageRegexes`, `buildDiffPathspecs`, `generateSummary`, and OpenAI config utilities (`resolveLlmBaseUrl`, `shouldUseLlmGateway`, `createOpenAiLikeClient`, …). Use these if you build a custom pipeline but still want the same git and LLM behavior.
173
+ The package also exports helpers for building a custom pipeline on top of the same git and LLM behavior:
174
+
175
+ - **Git**: `createGitClient`, `getRepoRoot`, `getCommits`, `getDiff`, `getDiffSummary`, `getChangedFiles`, `filterCommitsByMessageRegexes`, `buildDiffPathspecs`, `buildDiffShapingGitArgs`, `shapeUnifiedDiff`, `DEFAULT_NOISE_EXCLUDES`
176
+ - **AI**: `generateSummary`, `resolveLlmMaxDiffChars`, `truncateUnifiedDiffForLlm`
177
+ - **Provider resolution**: `resolveLanguageModel`, `detectLlmProvider`, `isLlmProviderConfigured`, `defaultModelForProvider`, `resolveLlmBaseUrl`, `parseLlmDefaultHeadersFromEnv`
178
+ - **Constants / types**: `DEFAULT_GIT_DIFF_SYSTEM_PROMPT`, `LLM_GATEWAY_REQUIRED_MESSAGE`, `LlmProviderId`, `LlmModelProvider`, `ResolveLanguageModelOptions`, `GenerateSummaryInput`, `SummarizeFlags`
179
+
180
+ ## Migrating from 1.x → 2.x
181
+
182
+ v2 replaces the direct `openai` SDK dependency with the Vercel AI SDK. If you only rely on env-var configuration, your setup keeps working — `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `OPENAI_DEFAULT_HEADERS`, `LLM_*` equivalents, `OPENAI_MAX_DIFF_CHARS`, and `OPENAI_MAX_TOKENS` are all still honored.
183
+
184
+ Breaking changes:
185
+
186
+ - **Removed `openAiClientProvider` option** on `summarizeGitDiff`/`generateSummary`. Use `llmModelProvider: () => Promise<LanguageModel>` returning a Vercel AI SDK model instead.
187
+ - **Removed `OpenAiLikeClient` and `createOpenAiLikeClient` exports**, along with `shouldUseLlmGateway`. Use `isLlmProviderConfigured()` / `resolveLanguageModel()` instead.
188
+ - **`openai` npm package is no longer a dependency.** Remove it from your own `package.json` if you only depended on it transitively via smart-diff.
87
189
 
88
190
  ## Used By
89
191
 
90
192
  This package is used by:
91
193
 
92
- - [sf-git-ai-meta-insights](https://github.com/mcarvin8/sf-git-ai-meta-insights) = Salesforce metadata wrapper compatible with Salesforce DX projects
194
+ - [sf-git-ai-meta-insights](https://github.com/mcarvin8/sf-git-ai-meta-insights): Salesforce metadata wrapper compatible with Salesforce DX projects
93
195
 
94
196
  ## License
95
197