npm - @sctg/backport-agent - Versions diffs - 0.1.0 → 0.1.1-20260603131045 - Mend

@sctg/backport-agent 0.1.0 → 0.1.1-20260603131045

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -1,3 +1,4 @@
+[![Npm package version](https://badgen.net/npm/v/@sctg/backport-agent)](https://npmjs.com/package/@sctg/backport-agent)[![TypeScript](https://badgen.net/badge/icon/typescript?icon=typescript&label)](https://typescriptlang.org)
 # Backport Agent
 A deterministic AI-powered agent for keeping a heavily customized Git fork in sync with an active upstream repository.
@@ -20,13 +21,11 @@ Backport Agent focuses on the parts that matter most:
 The agent works as a sync pipeline rather than a one-shot merge bot. It reads the upstream history, selects candidate commits, evaluates their risk, applies them in controlled batches, validates the result, and generates a report for review.
-It is built to support a fork that includes features such as:
+It is built to support forks that include features such as:
-- the `keypoollive` provider;
-- encrypted vault-backed model and key discovery;
-- round-robin key rotation;
-- GitHub Actions build-time package renaming;
-- Mintlify documentation generation;
+- custom LLM providers (e.g. `keypoollive` with encrypted vault-backed key rotation);
+- build-time package renaming and CI workflow customizations;
+- documentation generation pipelines;
 - a local reporting and validation workflow.
 ## How it is structured
@@ -56,13 +55,28 @@ npm install
 - [config.example.json](config.example.json)
 - [customizations.example.yaml](customizations.example.yaml)
-3. Set the required vault environment variables in your shell or `.env` file.
+3. Set the required provider credentials in your shell or `.env` file.
+For the **keypoollive** provider (vault-based key rotation):
 ```bash
 KEYPOOL_VAULT_URL=https://...
 KEYPOOL_LIVE_SECRET=...
 ```
+For any other provider supported by `@sctg/cline-sdk`, set the corresponding API key:
+```bash
+ANTHROPIC_API_KEY=sk-ant-...
+# or OPENAI_API_KEY=sk-..., MISTRAL_API_KEY=..., etc.
+```
+You can also override the provider or API key at runtime without editing `config.json`:
+```bash
+npm start -- --provider anthropic --api-key sk-ant-...
+```
 4. Start the agent.
 ```bash
@@ -106,7 +120,7 @@ The repository includes both unit and integration coverage.
 - `npm run test:unit` runs fast deterministic tests.
 - `npm run test:integration` runs integration tests, including real KeypoolLive calls when your vault is configured.
-The integration suite is intentionally practical. It verifies Git behavior in temporary repositories and exercises real SDK tools against the `keypoollive` provider with the `mistral/devstral-latest` model when `.env` is available.
+The integration suite is intentionally practical. It verifies Git behavior in temporary repositories and exercises real SDK tools against a configured provider (defaults to `keypoollive` with the `mistral/devstral-latest` model) when `.env` is available.
 ## Configuration
@@ -115,12 +129,87 @@ The main runtime configuration lives in a JSON file modeled after [config.exampl
 - the upstream repository and branch;
 - the fork repository and branch;
 - the working directory;
+- the LLM provider and model selection (`provider`, `fast`, `specialist`, `powerful`);
 - sync limits and batching;
-- model selection;
 - validation tiers.
+The `provider` field in the `models` section is required. It accepts any provider ID supported by `@sctg/cline-sdk` (e.g. `"keypoollive"`, `"anthropic"`, `"openai"`, `"mistral"`, `"gemini"`). The API key is resolved from the `apiKey` field (a literal key, a `"$ENV_VAR"` reference, or `"auto"` for the keypoollive vault) or, when that field is omitted, from the implicit `{PROVIDER_UPPER}_API_KEY` environment variable.
+### `sync.prNumberMatching` — Manual backport detection (optional)
+By default, the agent detects already-applied commits using three signals: `git cherry` patch comparison, exact subject-line match, and the `cherry picked from commit <sha>` annotation added by `git cherry-pick -x`.
+When a commit is cherry-picked manually (conflict resolution, subject rewrite, no `-x` flag), all three signals can miss it. Enabling `prNumberMatching` adds a fourth signal: if a fork commit references the same upstream PR number **and** the two subjects are similar enough (Jaccard word-token score), the commit is considered already applied.
+```json
+"sync": {
+  "prNumberMatching": {
+    "enabled": true,
+    "minSubjectSimilarity": 0.4
+  }
+}
+```
+| Field | Default | Description |
+|---|---|---|
+| `enabled` | `false` | Activate PR-number-based duplicate detection. |
+| `minSubjectSimilarity` | `0.4` | Minimum Jaccard word-token similarity (0–1) between the upstream subject and the matching fork subject. Lower → more permissive (risk of false positives). Higher → stricter (may miss heavily reworded backports). |
+**Example:** upstream commit `Move \`sdk/apps/\` to \`apps/\` (#11200)` is detected as already applied when the fork contains `feat(backport): Move sdk/apps/ to apps/ (cline#11200)` — the PR number matches and the similarity score (~0.67) exceeds the default threshold.
+Enable this only when your team consistently includes the upstream PR number in manual backport commit messages.
+### `ai` section — Quality guardrails (optional)
+The optional `ai` section configures the AI quality guardrails introduced to improve backport reliability. All fields have safe defaults and the section can be omitted entirely.
+```json
+"ai": {
+  "minAutoApplyConfidence": "medium",
+  "requireReviewOnSemanticRisk": false,
+  "enableConflictConsensus": false,
+  "conflictConsensusThreshold": 0.7,
+  "enrichCustomizationContext": true
+}
+```
+| Field | Default | Description |
+|---|---|---|
+| `minAutoApplyConfidence` | `"medium"` | Minimum AI confidence level (`"high"` or `"medium"`) to auto-apply a conflict resolution. Use `"high"` for stricter auto-apply. |
+| `requireReviewOnSemanticRisk` | `false` | When `true`, any commit carrying semantic risk factors is escalated to `"review-required"` by `reconcile_ai_assessments`, regardless of the individual AI recommendations. |
+| `enableConflictConsensus` | `false` | **Opt-in.** Runs a second, independent conflict resolution using `config.models.powerful` and compares both outputs with a Dice-coefficient similarity score. If the two resolutions diverge below `conflictConsensusThreshold`, confidence is downgraded to `"low"`. Enabling this roughly doubles LLM cost per conflict. |
+| `conflictConsensusThreshold` | `0.7` | Minimum line-level similarity (0–1) required for consensus. Only used when `enableConflictConsensus: true`. |
+| `enrichCustomizationContext` | `true` | When `true`, `check_customization_compatibility` reads up to 2 source files matching each customization glob (2 000 chars each) and injects their content into the AI prompt for richer analysis. |
+### AI sub-agent tools
+The `src/ai` module exposes four tools that the main agent invokes when deterministic logic is not enough.
+| Tool | Type | Purpose |
+|---|---|---|
+| `resolve_conflict_with_ai` | LLM call | Resolves merge conflicts in a single file using the configured `specialist` model. Returns `resolvedContent`, `confidence` (`"high"` / `"medium"` / `"low"`), and `reasoning`. Guards: conflict-marker detection, syntax balance check (JS/TS), optional dual-model consensus. |
+| `analyze_commit_for_backport` | LLM call | Analyzes a commit diff to produce a summary, key changes, complexity estimate, semantic risk factors, and a backport `recommendation`. Also runs hallucination detection on referenced file paths. |
+| `check_customization_compatibility` | LLM call | Checks whether a set of changes is compatible with the fork's declared customizations. Optionally enriches the prompt with actual file content when `ai.enrichCustomizationContext` is enabled. |
+| `reconcile_ai_assessments` | Deterministic | **No LLM call.** Combines the outputs of the two analysis tools into a single `finalRecommendation`. Detects contradictions (e.g. analyze said "apply" but compatibility check failed), applies `requireReviewOnSemanticRisk` escalation, and always resolves ambiguity conservatively. Call this after both analysis tools have run for the same commit. |
+Every LLM call is logged to the run's `.prompts.jsonl` file alongside structured quality signals (guards triggered, confidence, hallucination suspects). The detailed report includes a **Decision Quality Metrics** section summarising these signals across the full run.
+#### Benchmark replay
+The `src/tools/benchmark-replay.ts` script lets you compare two models side-by-side without running a full sync against a real repository. It reads an existing `.prompts.jsonl` log, replays every LLM call with the alternative model, and prints a Markdown comparison report.
+```bash
+npx tsx src/tools/benchmark-replay.ts \
+  --log run-1780060224987.prompts.jsonl \
+  --model anthropic/claude-sonnet-4-5 \
+  --provider anthropic \
+  --api-key "$ANTHROPIC_API_KEY" > comparison.md
+```
 Custom fork invariants live in a YAML file modeled after [customizations.example.yaml](customizations.example.yaml). This is where you describe the areas that must not be broken by a backport run.
 ## For contributors
 Contributions are especially welcome in the following areas:
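Note on the `sync.prNumberMatching` hunk above: the fourth detection signal can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the package's implementation. The helper names (`subjectTokens`, `jaccardSimilarity`, `extractPrNumber`, `looksAlreadyApplied`) and the tokenizer (drop the parenthesized PR reference, lowercase, split on non-alphanumeric characters) are assumptions. Under that assumed tokenizer, the README's example pair shares 4 of 6 distinct tokens, which matches the quoted score of about 0.67; the real tokenizer may differ.

```typescript
// Hypothetical sketch of PR-number-based duplicate detection.
// None of these helpers are taken from @sctg/backport-agent.

// Assumed tokenizer: strip "(#123)" / "(name#123)", lowercase, split on non-alphanumerics.
function subjectTokens(subject: string): Set<string> {
  const withoutPrRef = subject.replace(/\(\s*[\w.\/-]*#\d+\s*\)/g, " ");
  const tokens = withoutPrRef
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccardSimilarity(a: string, b: string): number {
  const ta = subjectTokens(a);
  const tb = subjectTokens(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

// Assumed convention: the first "#<digits>" in the subject is the PR number.
function extractPrNumber(subject: string): number | null {
  const m = subject.match(/#(\d+)/);
  return m ? Number(m[1]) : null;
}

// Both conditions must hold: same PR number AND subjects similar enough.
function looksAlreadyApplied(
  upstreamSubject: string,
  forkSubject: string,
  minSubjectSimilarity = 0.4,
): boolean {
  const pr = extractPrNumber(upstreamSubject);
  return (
    pr !== null &&
    pr === extractPrNumber(forkSubject) &&
    jaccardSimilarity(upstreamSubject, forkSubject) >= minSubjectSimilarity
  );
}

const upstreamSubject = "Move `sdk/apps/` to `apps/` (#11200)";
const forkSubject = "feat(backport): Move sdk/apps/ to apps/ (cline#11200)";
console.log(jaccardSimilarity(upstreamSubject, forkSubject).toFixed(2)); // 0.67
console.log(looksAlreadyApplied(upstreamSubject, forkSubject)); // true
```

Requiring the PR number to match before the similarity check is what keeps a low threshold such as 0.4 from flagging unrelated commits that merely share common words.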
@@ -148,4 +237,4 @@ That makes the agent more useful for real maintenance work and easier for contri
 ## License
-MIT License. See [LICENSE.md](LICENSE.md) for details.
+MIT License. See [LICENSE.md](LICENSE.md) for details.
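Note on API key resolution: the README hunk above and the `config.example.json` comments below describe several sources for the key (`"auto"`, a `"$ENV_VAR"` reference, a value in `apiKey`, or the implicit `{PROVIDER_UPPER}_API_KEY` variable). A minimal TypeScript sketch of one plausible lookup order follows. The function name `resolveApiKey`, the `KeySource` type, the precedence, and the error behavior are all assumptions made for illustration, not the package's actual code.

```typescript
// Hypothetical sketch; names, precedence, and errors are assumptions.
interface ModelsConfig {
  provider: string;
  apiKey?: string;
}

type KeySource =
  | { kind: "vault" } // key comes from the keypoollive vault at runtime
  | { kind: "key"; value: string };

function resolveApiKey(
  models: ModelsConfig,
  env: Record<string, string | undefined>,
): KeySource {
  const { provider, apiKey } = models;

  // "auto": defer to the vault (documented for the keypoollive provider).
  if (apiKey === "auto") return { kind: "vault" };

  // "$NAME": read the key from the named environment variable.
  if (apiKey !== undefined && apiKey.startsWith("$")) {
    const name = apiKey.slice(1);
    const value = env[name];
    if (!value) throw new Error(`Environment variable ${name} is not set`);
    return { kind: "key", value };
  }

  // Any other non-empty string is treated as a literal key.
  if (apiKey) return { kind: "key", value: apiKey };

  // Field omitted: fall back to {PROVIDER_UPPER}_API_KEY.
  const implicitName = `${provider.toUpperCase()}_API_KEY`;
  const value = env[implicitName];
  if (!value) throw new Error(`No API key found; set ${implicitName}`);
  return { kind: "key", value };
}

// Usage with a fake environment instead of process.env:
const resolved = resolveApiKey(
  { provider: "anthropic" },
  { ANTHROPIC_API_KEY: "sk-ant-example" },
);
console.log(resolved.kind); // key
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the sketch easy to exercise in a unit test.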

package/config.example.json CHANGED

@@ -25,13 +25,36 @@
     "batchSize": 5,
     "dryRun": true,
     "createPullRequest": true,
-    "branchPrefix": "sync/upstream-"
+    "branchPrefix": "sync/upstream-",
+    "prNumberMatching": {
+      "_comment": "DISABLED BY DEFAULT. Enable to detect manually-applied backports that were reworded but kept the upstream PR number.",
+      "enabled": false,
+      "_comment_minSubjectSimilarity": "Jaccard word-token similarity threshold (0..1). Lower = more permissive (risk of false positives). Default 0.4.",
+      "minSubjectSimilarity": 0.4
+    }
   },
   "models": {
+    "_comment_provider": "Any provider supported by @sctg/cline-sdk: \"keypoollive\", \"anthropic\", \"openai\", \"mistral\", \"gemini\", etc.",
+    "provider": "keypoollive",
+    "_comment_apiKey": "Use \"$ENV_VAR\" to read from env, \"auto\" for keypoollive vault, or omit to auto-detect {PROVIDER_UPPER}_API_KEY",
+    "apiKey": "auto",
     "fast": "mistral/devstral-latest",
     "specialist": "mistral/devstral-latest",
     "powerful": "mistral/magistral-medium-latest"
   },
+  "ai": {
+    "_comment": "AI quality guardrails — all fields are optional (defaults shown). Remove this section to use defaults.",
+    "minAutoApplyConfidence": "medium",
+    "_comment_minAutoApplyConfidence": "Minimum AI confidence level to auto-apply a conflict resolution without human review. \"high\" = only apply when the AI is very certain.",
+    "requireReviewOnSemanticRisk": false,
+    "_comment_requireReviewOnSemanticRisk": "When true, any commit with semantic risk factors (e.g. API surface changes) is escalated to \"review-required\" by reconcile_ai_assessments.",
+    "enableConflictConsensus": false,
+    "_comment_enableConflictConsensus": "DISABLED BY DEFAULT. Set to true to run a second independent resolution using config.models.powerful and compare the two outputs. Consensus failures downgrade confidence to \"low\". This roughly doubles LLM cost per conflict.",
+    "conflictConsensusThreshold": 0.7,
+    "_comment_conflictConsensusThreshold": "Minimum Dice-coefficient line similarity (0..1) between the two consensus responses to consider them in agreement. Only used when enableConflictConsensus=true.",
+    "enrichCustomizationContext": true,
+    "_comment_enrichCustomizationContext": "When true, check_customization_compatibility reads up to 2 matching source files (2000 chars each) and injects their content into the AI prompt for richer context."
+  },
   "resolve": {
     "_comment": "Glob or /regex/flags patterns matched against repo-relative file paths",
     "ours": [