@mcarvin/smart-diff 1.1.0 → 2.1.0
- package/README.md +123 -21
- package/dist/index.cjs +447 -134
- package/dist/index.cjs.map +1 -1
- package/dist/index.min.cjs +1 -1
- package/dist/index.min.cjs.map +1 -1
- package/dist/index.min.mjs +1 -1
- package/dist/index.min.mjs.map +1 -1
- package/dist/index.min.umd.js +1 -1
- package/dist/index.min.umd.js.map +1 -1
- package/dist/index.mjs +441 -131
- package/dist/index.mjs.map +1 -1
- package/dist/index.umd.js +450 -138
- package/dist/index.umd.js.map +1 -1
- package/dist/typings/ai/aiTypes.d.ts +5 -3
- package/dist/typings/ai/llmProviders.d.ts +12 -0
- package/dist/typings/git/diffShaping.d.ts +9 -0
- package/dist/typings/git/diffTypes.d.ts +2 -0
- package/dist/typings/git/gitDiff.d.ts +2 -0
- package/dist/typings/index.d.ts +14 -7
- package/package.json +34 -7
- package/dist/typings/ai/openAIConfig.d.ts +0 -21
package/README.md
CHANGED
@@ -6,12 +6,13 @@
 [](https://qlty.sh/gh/mcarvin8/projects/smart-diff)
 [](https://codecov.io/gh/mcarvin8/smart-diff)

-TypeScript library that turns a **git revision range** into a **Markdown summary** using
+TypeScript library that turns a **git revision range** into a **Markdown summary** using any LLM provider supported by the [Vercel AI SDK](https://sdk.vercel.ai) — OpenAI, Anthropic, Google Gemini, Amazon Bedrock, Mistral, Cohere, Groq, xAI, DeepSeek, or any OpenAI-compatible gateway. It uses [`simple-git`](https://github.com/steveukx/git-js) to read the repo, respects **path includes/excludes** and **commit message include/exclude regexes**, and sends commits, paths, structured diff stats, and unified diff text to the model.

 ## Requirements

 - **Node.js** 20+
-- [
+- An LLM provider credential (see [Provider configuration](#provider-configuration))
+- [Git](https://git-scm.com/) on the `PATH`

 ## Installation

@@ -19,25 +20,70 @@ TypeScript library that turns a **git revision range** into a **Markdown summary
 npm install @mcarvin/smart-diff
 ```

-
+`@ai-sdk/openai` and `@ai-sdk/openai-compatible` ship as direct dependencies. Every other provider (`@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/amazon-bedrock`, `@ai-sdk/mistral`, `@ai-sdk/cohere`, `@ai-sdk/groq`, `@ai-sdk/xai`, `@ai-sdk/deepseek`) is declared as an **optional peer** and only needs to be installed when you actually use that provider. If the package is missing, smart-diff throws a clear error telling you which one to install.

-
+## Provider configuration
+
+smart-diff is "configured" when [`isLlmProviderConfigured()`](#lower-level-api) returns true — i.e. at least one supported provider can be resolved from env vars — **or** you pass your own `llmModelProvider` factory. Otherwise `summarizeGitDiff` / `generateSummary` throw with `LLM_GATEWAY_REQUIRED_MESSAGE`.
+
+### Selecting a provider
+
+`LLM_PROVIDER` explicitly selects a provider. When unset, the resolver auto-detects in this order: `LLM_BASE_URL`/`OPENAI_BASE_URL` → `openai-compatible`, `OPENAI_API_KEY`/`LLM_API_KEY` → `openai`, then `ANTHROPIC_API_KEY`, `GOOGLE_GENERATIVE_AI_API_KEY` (or `GOOGLE_API_KEY`), `MISTRAL_API_KEY`, `COHERE_API_KEY`, `GROQ_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, and finally `OPENAI_DEFAULT_HEADERS`/`LLM_DEFAULT_HEADERS` → `openai`.
+
+| Provider (`LLM_PROVIDER`) | Package | Credential env vars | Default model |
+|---|---|---|---|
+| `openai` | `@ai-sdk/openai` | `OPENAI_API_KEY` or `LLM_API_KEY` | `gpt-4o-mini` |
+| `openai-compatible` | `@ai-sdk/openai-compatible` | `LLM_BASE_URL` or `OPENAI_BASE_URL` (required); `OPENAI_API_KEY`/`LLM_API_KEY` or custom headers | `gpt-4o-mini` |
+| `anthropic` | `@ai-sdk/anthropic` | `ANTHROPIC_API_KEY` | `claude-3-5-haiku-latest` |
+| `google` | `@ai-sdk/google` | `GOOGLE_GENERATIVE_AI_API_KEY` or `GOOGLE_API_KEY` | `gemini-2.0-flash` |
+| `bedrock` | `@ai-sdk/amazon-bedrock` | Standard AWS credential chain (env / profile / role) | `anthropic.claude-3-5-haiku-20241022-v1:0` |
+| `mistral` | `@ai-sdk/mistral` | `MISTRAL_API_KEY` | `mistral-small-latest` |
+| `cohere` | `@ai-sdk/cohere` | `COHERE_API_KEY` | `command-r-08-2024` |
+| `groq` | `@ai-sdk/groq` | `GROQ_API_KEY` | `llama-3.1-8b-instant` |
+| `xai` | `@ai-sdk/xai` | `XAI_API_KEY` | `grok-2-latest` |
+| `deepseek` | `@ai-sdk/deepseek` | `DEEPSEEK_API_KEY` | `deepseek-chat` |
+
+> `LLM_*` wins over `OPENAI_*` where both exist.
+
+### Common env vars

 | Variable | Purpose |
-
-| `
-| `
-| `
-| `
-| `
+|---|---|
+| `LLM_PROVIDER` | Explicit provider id from the table above. |
+| `LLM_MODEL` | Overrides the per-provider default model id. |
+| `OPENAI_BASE_URL` / `LLM_BASE_URL` | Base URL for an OpenAI-compatible gateway; presence alone auto-selects the `openai-compatible` provider. |
+| `OPENAI_DEFAULT_HEADERS` / `LLM_DEFAULT_HEADERS` | JSON object of extra headers merged onto OpenAI / OpenAI-compatible requests (e.g. RBAC tokens, raw `Authorization`). `LLM_*` overrides `OPENAI_*` key-by-key. |
+| `LLM_PROVIDER_NAME` | Display name used when `openai-compatible` is active (defaults to `openai-compatible`). |
+| `OPENAI_MAX_DIFF_CHARS` / `LLM_MAX_DIFF_CHARS` | Max size of unified diff text sent to the model (default ~120k characters). |
+| `OPENAI_MAX_TOKENS` / `LLM_MAX_TOKENS` | Max completion tokens (default 4000). |
+
+### Example: native OpenAI
+
+```powershell
+$env:OPENAI_API_KEY = "sk-..."
+# Optional: $env:LLM_MODEL = "gpt-4o"
+```

-
+### Example: Anthropic Claude

-
+```powershell
+$env:ANTHROPIC_API_KEY = "sk-ant-..."
+$env:LLM_MODEL = "claude-3-5-sonnet-latest" # optional override
+```
+
+### Example: company-managed OpenAI-compatible gateway
+
+```powershell
+$env:OPENAI_BASE_URL = "https://llm-gateway.example.com"
+$env:OPENAI_DEFAULT_HEADERS = '{"x-company-rbac":"your-rbac-token-here","Authorization":"Bearer sk-your-api-key-here"}'
+# LLM_PROVIDER is auto-detected as "openai-compatible" because LLM_BASE_URL/OPENAI_BASE_URL is set.
+```
+
+### Example: Google Gemini

 ```powershell
-$env:
-$env:
+$env:GOOGLE_GENERATIVE_AI_API_KEY = "..."
+$env:LLM_MODEL = "gemini-2.0-flash"
 ```

 ## Usage
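The auto-detection order and the key-by-key header precedence that the new README text in the hunk above describes can be sketched roughly as follows. This is an illustration only, not the package's resolver: `detectProviderSketch`, `mergeDefaultHeadersSketch`, `parseHeaderJson`, and the `Env` type are hypothetical names, the environment is passed in as a plain object rather than read from `process.env`, and treating blank values as unset is an assumption.

```ts
// Hypothetical sketch of the documented resolution rules; names are illustrative.
type Env = Record<string, string | undefined>;

// True when at least one of the named variables is set to a non-blank value (assumption).
const isSet = (env: Env, ...names: string[]): boolean =>
  names.some((name) => (env[name] ?? '').trim() !== '');

// Order copied from the README text. `bedrock` is absent because the README
// lists no auto-detection variable for it; it would need LLM_PROVIDER.
const AUTO_DETECT_ORDER: Array<[string[], string]> = [
  [['LLM_BASE_URL', 'OPENAI_BASE_URL'], 'openai-compatible'],
  [['OPENAI_API_KEY', 'LLM_API_KEY'], 'openai'],
  [['ANTHROPIC_API_KEY'], 'anthropic'],
  [['GOOGLE_GENERATIVE_AI_API_KEY', 'GOOGLE_API_KEY'], 'google'],
  [['MISTRAL_API_KEY'], 'mistral'],
  [['COHERE_API_KEY'], 'cohere'],
  [['GROQ_API_KEY'], 'groq'],
  [['XAI_API_KEY'], 'xai'],
  [['DEEPSEEK_API_KEY'], 'deepseek'],
  [['OPENAI_DEFAULT_HEADERS', 'LLM_DEFAULT_HEADERS'], 'openai'],
];

function detectProviderSketch(env: Env): string | undefined {
  // An explicit LLM_PROVIDER short-circuits auto-detection (no validation here).
  const explicit = (env['LLM_PROVIDER'] ?? '').trim();
  if (explicit !== '') return explicit;
  for (const [names, provider] of AUTO_DETECT_ORDER) {
    if (isSet(env, ...names)) return provider;
  }
  return undefined;
}

// Parses one headers variable; anything other than a JSON object is rejected.
function parseHeaderJson(raw: string | undefined): Record<string, string> {
  if (raw === undefined || raw.trim() === '') return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Default headers must be a JSON object');
  }
  const headers: Record<string, string> = {};
  for (const key of Object.keys(parsed)) {
    headers[key] = String((parsed as Record<string, unknown>)[key]);
  }
  return headers;
}

// LLM_DEFAULT_HEADERS is spread last, so it overrides OPENAI_DEFAULT_HEADERS per key.
function mergeDefaultHeadersSketch(env: Env): Record<string, string> {
  return {
    ...parseHeaderJson(env['OPENAI_DEFAULT_HEADERS']),
    ...parseHeaderJson(env['LLM_DEFAULT_HEADERS']),
  };
}
```

Keeping the order in a data table rather than a chain of `if` statements makes it easy to compare against the README sentence line by line.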
@@ -56,9 +102,10 @@ const markdown = await summarizeGitDiff({
   commitMessageExcludeRegexes: ['^\\[bot\\]'],
   commitMessageIncludeRegexes: ['^feat:'], // optional; OR across patterns
   teamName: 'Platform',
-  systemPrompt: undefined,
-
-
+  systemPrompt: undefined, // optional; overrides DEFAULT_GIT_DIFF_SYSTEM_PROMPT
+  provider: 'anthropic', // optional; overrides LLM_PROVIDER env + auto-detection
+  model: 'claude-3-5-sonnet-latest', // optional
+  maxDiffChars: 120_000, // optional; also see LLM_MAX_DIFF_CHARS
 });
 ```

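The two commit-message options in the hunk above are documented as "OR across patterns" for includes and "drop on any match" for excludes. A minimal sketch of those semantics follows. `CommitLike` and `filterCommitsSketch` are hypothetical names, not the package's `filterCommitsByMessageRegexes`, and dropping a commit that matches both an include and an exclude pattern is an assumption.

```ts
// Hypothetical sketch of include/exclude filtering on commit messages.
interface CommitLike {
  hash: string;
  message: string;
}

function filterCommitsSketch(
  commits: CommitLike[],
  includeRegexes: string[] = [],
  excludeRegexes: string[] = [],
): CommitLike[] {
  // Patterns are compiled without the `g` flag, so `test` keeps no state between calls.
  const includes = includeRegexes.map((pattern) => new RegExp(pattern));
  const excludes = excludeRegexes.map((pattern) => new RegExp(pattern));

  return commits.filter((commit) => {
    // Any exclude match drops the commit (assumed to take precedence over includes).
    if (excludes.some((re) => re.test(commit.message))) return false;
    // With no include patterns, everything that was not excluded is kept.
    if (includes.length === 0) return true;
    // Otherwise at least one include pattern has to match (OR semantics).
    return includes.some((re) => re.test(commit.message));
  });
}

const sampleCommits: CommitLike[] = [
  { hash: 'a1', message: 'feat: add login' },
  { hash: 'b2', message: '[bot] feat: bump deps' },
  { hash: 'c3', message: 'fix: typo' },
];

const featureCommits = filterCommitsSketch(sampleCommits, ['^feat:'], ['^\\[bot\\]']);
```

With the sample data, only the first commit passes both filters: the second starts with `[bot]`, and the third does not match `^feat:`.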
@@ -72,9 +119,49 @@ const markdown = await summarizeGitDiff({
 | `commitMessageExcludeRegexes` | Drop commits whose message matches **any** of these patterns. |
 | `teamName` | Adds a `Team:` line to the user payload for the model. |
 | `systemPrompt` | Replaces the default system prompt. |
-| `
+| `provider` | `LlmProviderId` — wins over `LLM_PROVIDER` env and auto-detection. |
+| `model` | Chat model id; overrides `LLM_MODEL` and the provider default. |
 | `maxDiffChars` | Caps unified diff size for the request. |
-| `
+| `contextLines` | Number of context lines around each change (`git diff -U<n>`). Lower values (1 or 0) are the single biggest token saver on modification-heavy diffs. |
+| `ignoreWhitespace` | Passes `-w` / `--ignore-all-space` to `git diff` so pure-whitespace hunks don't consume tokens. Also applies to `--numstat` / `--name-status` so counts stay consistent. |
+| `stripDiffPreamble` | Removes low-value lines from the unified diff (`diff --git`, `index`, mode changes, `similarity/rename/copy` metadata). `--- a/…`, `+++ b/…`, and `@@` hunk headers are kept. |
+| `maxHunkLines` | Caps the body of each hunk; anything past the limit is replaced with a single elision marker. The `@@` header and `DiffSummary` totals are preserved. |
+| `excludeDefaultNoise` | Merges the built-in `DEFAULT_NOISE_EXCLUDES` list (lockfiles, `dist`, `build`, `out`, `coverage`, `node_modules`, `__snapshots__`) into `excludeFolders`. |
+| `llmModelProvider` | `() => Promise<LanguageModel>` — bypass env-based resolution entirely; hand-wire a Vercel AI SDK `LanguageModel` (required in tests or custom setups). |
+
+#### Reducing tokens
+
+For most repos, the cheapest wins are:
+
+```ts
+await summarizeGitDiff({
+  from: 'origin/main',
+  contextLines: 1, // -U1 cuts 30-60% of tokens on typical diffs
+  ignoreWhitespace: true, // drop pure-whitespace hunks entirely
+  stripDiffPreamble: true, // kill `index`/`mode`/`similarity` lines
+  maxHunkLines: 400, // truncate monster hunks but keep the @@ header
+  excludeDefaultNoise: true // skip lockfiles, dist/, coverage/, node_modules/
+});
+```
+
+These options only reshape the *unified diff text* — the structured `DiffSummary` still reports true file counts and line totals, so the model always sees the full change inventory.
+
+### Injecting your own `LanguageModel`
+
+If you want full control — for example, to configure retries, middlewares, or hit an in-process mock — pass `llmModelProvider`:
+
+```ts
+import { summarizeGitDiff } from '@mcarvin/smart-diff';
+import { createAnthropic } from '@ai-sdk/anthropic';
+
+const md = await summarizeGitDiff({
+  from: 'origin/main',
+  llmModelProvider: async () =>
+    createAnthropic({ apiKey: process.env.MY_ANTHROPIC_KEY })(
+      'claude-3-5-sonnet-latest',
+    ),
+});
+```

 ### Diff shape: single range vs per-commit

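To make the `stripDiffPreamble` and `maxHunkLines` rows in the hunk above more concrete, here is a rough sketch of what such post-processing of `git diff` output might look like. It is not the package's `shapeUnifiedDiff`: `shapeDiffSketch`, `ShapeOptionsSketch`, the exact preamble prefix list, and the elision marker text are all assumptions made for illustration.

```ts
// Hypothetical sketch of unified-diff shaping; not the package's implementation.
interface ShapeOptionsSketch {
  stripPreamble?: boolean;
  maxHunkLines?: number;
}

// Assumed list of "low-value" header lines; `--- a/...` and `+++ b/...` are not in it.
const PREAMBLE_LINE =
  /^(?:diff --git|index|old mode|new mode|new file mode|deleted file mode|similarity index|dissimilarity index|rename from|rename to|copy from|copy to) /;

function shapeDiffSketch(diff: string, options: ShapeOptionsSketch = {}): string {
  const maxHunkLines = options.maxHunkLines ?? Infinity;
  const out: string[] = [];
  let inHunk = false;
  let bodyLines = 0;
  let elided = 0;

  // Emits one marker for all lines dropped from the hunk that just ended.
  const flushElision = (): void => {
    if (elided > 0) {
      out.push(`[... ${elided} hunk line(s) elided ...]`);
      elided = 0;
    }
  };

  for (const line of diff.split('\n')) {
    const startsFile = /^diff --git /.test(line);
    const startsHunk = /^@@/.test(line);
    if (startsFile || startsHunk) {
      // A new file or hunk header closes the previous hunk.
      flushElision();
      inHunk = startsHunk;
      bodyLines = 0;
    } else if (inHunk) {
      // Inside a hunk every line is body; keep it only while under the cap.
      bodyLines += 1;
      if (bodyLines > maxHunkLines) {
        elided += 1;
      } else {
        out.push(line);
      }
      continue;
    }
    if (options.stripPreamble && PREAMBLE_LINE.test(line)) continue;
    out.push(line);
  }
  flushElision();
  return out.join('\n');
}
```

The sketch tracks hunk state instead of classifying lines by prefix alone, because a removed line whose content begins with `--` would otherwise be indistinguishable from a `--- a/…` file header.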
@@ -83,13 +170,28 @@ const markdown = await summarizeGitDiff({

 ### Lower-level API

-The package also exports helpers
+The package also exports helpers for building a custom pipeline on top of the same git and LLM behavior:
+
+- **Git**: `createGitClient`, `getRepoRoot`, `getCommits`, `getDiff`, `getDiffSummary`, `getChangedFiles`, `filterCommitsByMessageRegexes`, `buildDiffPathspecs`, `buildDiffShapingGitArgs`, `shapeUnifiedDiff`, `DEFAULT_NOISE_EXCLUDES`
+- **AI**: `generateSummary`, `resolveLlmMaxDiffChars`, `truncateUnifiedDiffForLlm`
+- **Provider resolution**: `resolveLanguageModel`, `detectLlmProvider`, `isLlmProviderConfigured`, `defaultModelForProvider`, `resolveLlmBaseUrl`, `parseLlmDefaultHeadersFromEnv`
+- **Constants / types**: `DEFAULT_GIT_DIFF_SYSTEM_PROMPT`, `LLM_GATEWAY_REQUIRED_MESSAGE`, `LlmProviderId`, `LlmModelProvider`, `ResolveLanguageModelOptions`, `GenerateSummaryInput`, `SummarizeFlags`
+
+## Migrating from 1.x → 2.x
+
+v2 replaces the direct `openai` SDK dependency with the Vercel AI SDK. If you only rely on env-var configuration, your setup keeps working — `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `OPENAI_DEFAULT_HEADERS`, `LLM_*` equivalents, `OPENAI_MAX_DIFF_CHARS`, and `OPENAI_MAX_TOKENS` are all still honored.
+
+Breaking changes:
+
+- **Removed `openAiClientProvider` option** on `summarizeGitDiff`/`generateSummary`. Use `llmModelProvider: () => Promise<LanguageModel>` returning a Vercel AI SDK model instead.
+- **Removed `OpenAiLikeClient` and `createOpenAiLikeClient` exports**, along with `shouldUseLlmGateway`. Use `isLlmProviderConfigured()` / `resolveLanguageModel()` instead.
+- **`openai` npm package is no longer a dependency.** Remove it from your own `package.json` if you only depended on it transitively via smart-diff.

 ## Used By

 This package is used by:

-- [sf-git-ai-meta-insights](https://github.com/mcarvin8/sf-git-ai-meta-insights)
+- [sf-git-ai-meta-insights](https://github.com/mcarvin8/sf-git-ai-meta-insights) — Salesforce metadata wrapper compatible with Salesforce DX projects

 ## License
