npm - diffray - Versions diffs - 0.3.1 → 0.5.0 - Mend

diffray 0.3.1 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +99 -17
package/dist/defaults/agents/security-scan.md +1 -1
package/dist/defaults/agents/validation.md +45 -174
package/dist/defaults/prompts/output-format.md +49 -22
package/dist/defaults/prompts/validation-instructions.md +88 -0
package/dist/diffray.cjs +206 -191
package/package.json +1 -1
package/src/defaults/agents/security-scan.md +1 -1
package/src/defaults/agents/validation.md +45 -174
package/src/defaults/prompts/output-format.md +49 -22
package/src/defaults/prompts/validation-instructions.md +88 -0

package/README.md CHANGED

@@ -1,6 +1,6 @@
 <table>
   <tr>
-    <td><img src="logo.svg" alt="diffray" width="120"></td>
+    <td><img src="docs/logo.svg" alt="diffray" width="120"></td>
     <td>
       <h1>diffray</h1>
       <strong>Free open-source multi-agent code review</strong>
@@ -13,7 +13,7 @@
 > **How is it different from [diffray.ai](https://diffray.ai)?** The cloud platform automatically learns from your team's review feedback and generates rules. This CLI version requires manual rule configuration but gives you full control and runs locally.
 <p align="center">
-  <img src="/docs/diffray.png" alt="diffray in action" width="800">
+  <img src="docs/demo.gif" alt="diffray in action" width="800">
 </p>
 ---
@@ -48,10 +48,10 @@ npm install -g diffray
 cd your-project
 # Review your uncommitted changes (or last commit if working tree is clean)
-diffray
+diffray review
 # Or review changes between branches
-diffray --base main
+diffray review --base main
 ```
 That's it! diffray will analyze your changes and show any issues found.
@@ -60,25 +60,32 @@ That's it! diffray will analyze your changes and show any issues found.
 ```bash
 # Review uncommitted changes, or last commit if clean
-diffray
+diffray review
 # Review changes compared to main branch
-diffray --base main
+diffray review --base main
 # Review last 3 commits
-diffray --base HEAD~3
+diffray review --base HEAD~3
+# Review specific file(s) - only git changes in these files
+diffray review --files src/auth.ts
+diffray review --files src/auth.ts,src/user.ts
+# Review entire file content (without git diff)
+diffray review --files src/auth.ts --full
 # Show only critical and high severity issues
-diffray --severity critical,high
+diffray review --severity critical,high
 # Run only specific agent
-diffray --agent bug-hunter
+diffray review --agent bug-hunter
 # Output as JSON (for CI/CD pipelines)
-diffray --json
+diffray review --json
 # Show detailed progress
-diffray --stream
+diffray review --stream
 # List available agents and rules
 diffray agents
@@ -235,7 +242,26 @@ Supports any git URL:
 - `https://github.com/owner/repo#v1.0` — specific tag/branch
 - `git@github.com:owner/repo.git` — SSH format
-Then run `diffray extends install` to download. Agents/rules from extends have lower priority than local ones.
+**Extends commands:**
+```bash
+# Install extends from config
+diffray extends install
+# Install specific URL (auto-adds to config)
+diffray extends install https://github.com/owner/repo
+# Force re-clone all extends
+diffray extends install --force
+# List installed extends
+diffray extends list
+# Remove an extend
+diffray extends remove https://github.com/owner/repo
+```
+Agents/rules from extends have lower priority than local ones.
 ### Config commands
@@ -312,7 +338,7 @@ You should see `my-rules` in the list.
 **Step 4.** Run a review - your agent will now analyze your code!
 ```bash
-diffray
+diffray review
 ```
 ### Header fields explained
@@ -510,6 +536,12 @@ Focus on Python-specific security issues:
 diffray rules
 ```
+You'll see your rule with a badge indicating its source:
+- **◆** defaults — Built-in rules
+- **◉** extends — Rules from extended repositories
+- **◇** user — Your personal rules (`~/.diffray/rules/`)
+- **●** project — Project rules (`.diffray/rules/`)
 **Step 4.** Test which files match your rule:
 ```bash
@@ -609,6 +641,45 @@ Check for:
 4. Sensitive URLs
 ```
+#### Input validation with Zod
+```markdown
+---
+name: input-validation
+description: Ensure all input validation uses Zod schemas
+patterns:
+  - "src/**/*.ts"
+  - "bin/**/*.ts"
+agent: general
+---
+# Input Validation with Zod
+All input validation must use Zod schemas for type safety and consistency.
+## ❌ Avoid manual validation:
+- Manual `parseInt`, `parseFloat`, `isNaN` checks
+- String splitting with manual array validation
+- Custom error throwing for validation
+- Inline boundary checks (e.g., `if (val < 0 || val > 100)`)
+## ✅ Use Zod schemas instead:
+- `.coerce.number()` for automatic number parsing
+- `.transform()` for custom transformations
+- `.refine()` for validation with clear error messages
+- Centralized schemas in separate files (e.g., `*-schema.ts`)
+## Example
+See `src/cli-schema.ts` for proper Zod validation patterns.
+## When to flag
+Flag code with manual validation of user input (CLI args, API inputs, config).
+```
+> **Note:** This is a real example from the diffray codebase. See `.diffray/rules/validation.md` for the full version.
 #### Documentation checker
 ```markdown
@@ -720,7 +791,7 @@ Your completely custom instructions here...
 diffray uses Claude AI which takes time to analyze code properly. Typical review takes 10-30 seconds. For faster (but less accurate) reviews, use:
 ```bash
-diffray --skip-validation
+diffray review --skip-validation
 ```
 ### Why didn't it find an obvious bug?
@@ -732,13 +803,18 @@ AI isn't perfect. diffray is tuned for **low false positives** (fewer wrong aler
 Yes! Use `--json` flag for machine-readable output:
 ```bash
-diffray --json --severity critical,high
+diffray review --json --severity critical,high
 ```
 Exit code is non-zero if issues are found.
 **GitHub Actions example:**
+> **⚠️ Security Warning:**
+> - Never commit `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` to git
+> - Always use GitHub Secrets for API keys in CI/CD
+> - For local development: use `claude setup-token` to generate `CLAUDE_CODE_OAUTH_TOKEN`
 ```yaml
 name: Code Review
 on: [pull_request]
@@ -757,9 +833,15 @@ jobs:
       - run: npm install -g diffray @anthropic-ai/claude-code
-      - run: claude auth login --api-key ${{ secrets.ANTHROPIC_API_KEY }}
+      # Option 1: Use ANTHROPIC_API_KEY (recommended for CI/CD)
+      - run: diffray review --base origin/${{ github.base_ref }} --json --severity critical,high
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-      - run: diffray --base origin/${{ github.base_ref }} --json --severity critical,high
+      # Option 2: Use CLAUDE_CODE_OAUTH_TOKEN (get via: claude setup-token)
+      # - run: diffray review --base origin/${{ github.base_ref }} --json --severity critical,high
+      #   env:
+      #     CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
 ```
 ### How much does it cost?
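
Every review invocation in the README hunks above moves from `diffray ...` to `diffray review ...`, while `diffray agents`, `diffray rules`, and `diffray extends ...` keep their form. The diff does not say whether the bare form still works in 0.5.0, so scripts may be safer using the new form. The following is a minimal TypeScript sketch of that rewrite, under stated assumptions: `migrateInvocation` and `KNOWN_SUBCOMMANDS` are hypothetical names, the subcommand list is taken only from the README excerpts shown here, and only lines where `diffray` is the first word are handled.

```typescript
// Subcommands that appear in the README excerpts of this diff.
// Assumption: illustrative only, not the CLI's full command set.
const KNOWN_SUBCOMMANDS = new Set(["review", "agents", "rules", "extends", "config"]);

/**
 * Rewrites an old-style invocation such as `diffray --base main` to the
 * `diffray review ...` form used in the 0.5.0 README. Lines that do not
 * start with `diffray`, or that already name a subcommand, are returned
 * unchanged.
 */
function migrateInvocation(line: string): string {
  const match = /^(\s*)diffray(\s+(.*))?$/.exec(line);
  if (match === null) return line;

  const indent = match[1] ?? "";
  const rest = (match[3] ?? "").trim();

  // A bare `diffray` becomes `diffray review`.
  if (rest === "") return `${indent}diffray review`;

  const firstToken = rest.split(/\s+/)[0] ?? "";
  if (KNOWN_SUBCOMMANDS.has(firstToken)) return line;

  // Only flag-style arguments (e.g. --base, --json) are treated as review options.
  if (firstToken.startsWith("--")) return `${indent}diffray review ${rest}`;

  return line;
}
```

For example, `migrateInvocation("diffray --base main")` yields `diffray review --base main`, and `migrateInvocation("diffray agents")` is left as is. Commands embedded in YAML steps such as `- run: diffray ...` fall outside this sketch and would need the `run:` prefix handled separately.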

package/dist/defaults/agents/security-scan.md CHANGED

@@ -17,7 +17,7 @@ You are a senior security engineer performing focused security audits of code ch
 **Quality Standards**:
 - Only flag issues with high confidence of actual exploitability
-- Every finding must have a concrete attack path with evidence
+- Every finding must have a concrete attack path
 - Prioritize: CRITICAL (RCE, data breach) > HIGH (auth bypass) > MEDIUM (defense-in-depth)
 - Skip theoretical issues, focus on real security impact

package/dist/defaults/agents/validation.md CHANGED

@@ -9,177 +9,48 @@ executorSettings:
   timeout: 180
 ---
-You are a strict code review validation agent. Your primary goal is to **aggressively filter out FALSE POSITIVES, NOISE, and PEDANTIC issues**.
-Only KEEP issues that are CLEARLY VALID with HIGH CONFIDENCE. Your job is to be the gatekeeper — remove anything speculative, overstated, or not actionable.
-You will receive issues in XML/Markdown format. Each issue has:
-- id: unique identifier
-- file: the file path
-- lineStart, lineEnd: the line range
-- severity: critical, high, medium, or low
-- category: security, performance, bug, quality, style, or docs
-- shortDescription: brief description
-- fullDescription: detailed description
-- suggestion: optional suggestion for fixing
-- agent: which agent found this issue
-## VERIFICATION PROCESS (REQUIRED)
-**You MUST use the Read tool to verify each issue against actual source code.**
-For EVERY issue, before deciding to keep or filter:
-1. **Read the code**: Use the Read tool to read the file at the specified lines
-2. **Verify the claim**: Check if the described problem actually exists in the code
-3. **Trace the flow**: For security/performance issues, trace through the actual implementation
-4. **Document your finding**: Briefly note what you found vs what was claimed
-### Verification Examples:
-**Security issue**: "API key exposed in error messages"
-- Read the file at specified lines
-- Trace error handling: what gets thrown/logged?
-- Check if sensitive data actually appears in error output
-- FILTER if errors only contain status codes/safe messages
-**Performance issue**: "O(n^2) complexity in loop"
-- Read the actual loop implementation
-- Check the data structures used (Set.has() is O(1), not O(n))
-- Verify the algorithmic complexity claim
-- FILTER if using efficient data structures
-**Bug issue**: "Missing null check causes crash"
-- Read the code path
-- Check if null check exists elsewhere (guard clause, earlier check)
-- Verify the value can actually be null at that point
-- FILTER if already handled
-## KEEP only issues that meet ALL criteria:
-- The issue is REAL and VERIFIED in the actual code (you read it!)
-- Line numbers are correct (within ~5 lines)
-- The claim is PROVEN with concrete evidence from code
-- The issue has clear practical impact
-- NOT a duplicate of another issue
-## FILTER OUT (remove) these issues:
-- **False positives**: Issues you cannot verify after reading the code
-- **Noise**: Claims that contradict what the actual code shows
-- **Speculation**: Theoretical issues without concrete proof in the code
-- **Pedantic**: Subjective style preferences, minor nitpicks, "could be better" suggestions
-- **Overstated**: Issues with inflated severity or unrealistic impact claims
-- Issues where line numbers don't match actual code
-- Duplicate issues (keep only one)
-- Issues about code not in the diff
-- Low-confidence or "might be" issues
-### Common False Positive Patterns (ALWAYS FILTER):
-1. **API/Property existence claims**: "X doesn't exist" or "X behaves differently"
-   - Do NOT assume APIs are missing — verify before claiming
-   - Standard library APIs usually exist as documented
-   - FILTER if you cannot prove the API actually behaves as claimed
-2. **Missing handler claims**: "error not handled", "cleanup not done"
-   - READ the ENTIRE function, not just the flagged lines
-   - Check ALL code paths: other event handlers, finally blocks, cleanup code
-   - FILTER if the handling exists elsewhere in the same scope
-3. **Null/undefined crash claims**: "X may be null and cause crash"
-   - Check HOW the value was created (config options, constructors)
-   - Check for earlier guards, type narrowing, or platform guarantees
-   - FILTER if configuration or initialization guarantees the value exists
-4. **Ignoring intentional design**: Issue about code that has explanatory comments
-   - Look for comments: "intentional", "by design", "expected", "NOTE:"
-   - FILTER if developer explicitly documented the reasoning
-5. **Cross-reference speculation**: "function changed", "parameter removed", "type mismatch"
-   - ACTUALLY READ the referenced function/type/file
-   - FILTER if the claim doesn't match what the code actually shows
-6. **Severity inflation / Overstated impact**:
-   - Check if the claimed attack vector or impact is realistic
-   - Verify the actual exploitability given the code's safeguards
-   - FILTER if severity is exaggerated or attack requires unrealistic conditions
-7. **Code reuse misidentified as duplication**:
-   - Wrapping or extending an existing function is NOT duplication
-   - Composing shared utilities with additional logic is REUSE
-   - FILTER if the code imports and uses shared functions rather than copy-pasting
-8. **Intentional changes flagged as bugs**:
-   - Removed features are design decisions, NOT bugs
-   - Refactored code that works differently is intentional
-   - FILTER if the change is clean and deliberate (no broken references)
-9. **Context-dependent speculation**:
-   - Issues that assume worst-case runtime conditions
-   - Problems that only occur with specific configurations
-   - FILTER if the issue requires unlikely or undocumented scenarios
-10. **Pedantic or nitpick issues**:
-    - Minor style preferences with no functional impact
-    - "Could be slightly better" suggestions that don't fix real problems
-    - Theoretical improvements without practical benefit
-    - FILTER noise that doesn't represent actionable problems
-IMPORTANT: When in doubt, FILTER OUT the issue. Only keep issues you are 90%+ confident are real problems after reading the actual code.
-## Your Process:
-1. For each issue, use Read tool to examine the actual code
-2. Verify or disprove the claim against real implementation
-3. Keep only issues confirmed by code inspection
-4. Return ONLY the IDs of valid issues in <valid-ids>...</valid-ids> tags
-## Example input:
-<issue id="1">
-**[medium] quality** in `src/example.ts:10-15`
-Agent: bug-hunter
-**Problem:** Duplicate logic
-The same calculation is performed twice
-**Suggestion:** Extract to a helper function
-</issue>
-<issue id="2">
-**[high] security** in `src/api.ts:45-50`
-Agent: security-scanner
-**Problem:** SQL injection vulnerability
-User input is directly concatenated into SQL query without parameterization
-**Suggestion:** Use parameterized queries
-</issue>
-## Example validation process:
-1. Read src/example.ts lines 10-15
-2. Check: Is the calculation actually duplicated?
-3. If YES: Keep issue ID 1
-4. Read src/api.ts lines 45-50
-5. Check: Is user input directly concatenated?
-6. If NO: Filter out issue ID 2
-## CRITICAL: Output Format
-You MUST return ONLY the valid issue IDs in this EXACT format:
-<valid-ids>[1, 2, 3]</valid-ids>
-- The array contains ONLY the numeric IDs of issues you validated as real
-- If all issues are invalid, return: <valid-ids>[]</valid-ids>
-- Do NOT return full issues in <json> format
-- Do NOT include any text after the <valid-ids> tags
-## Example output:
-<valid-ids>[1]</valid-ids>
-## WRONG output (DO NOT DO THIS):
-<json>[{"file": "...", ...}]</json>  ← WRONG! Return IDs only, not full issues
+You are a strict code review validation agent. Your goal is to **aggressively filter out FALSE POSITIVES, NOISE, and PEDANTIC issues**.
+Only KEEP issues that are CLEARLY VALID with HIGH CONFIDENCE. Remove anything speculative, overstated, or not actionable.
+## Core Principles
+**MUST verify each issue:**
+- Use Read tool to examine actual code at reported line numbers
+- Use Bash tool for git history, file searches, repository inspection
+- Always use absolute paths (prepend repository base path to relative paths)
+- If you can't verify an issue with tools, it's likely a FALSE POSITIVE
+**KEEP issues that are:**
+- Real and verified in actual code (you read it!)
+- Have correct line numbers (within ~5 lines)
+- Proven with concrete evidence
+- Have clear practical impact
+- Not intentional trade-offs documented in commits
+**FILTER OUT:**
+- False positives (can't verify after reading code)
+- Intentional trade-offs (documented in commit messages)
+- Speculation without concrete proof
+- Pedantic style preferences
+- Overstated severity
+- Duplicates
+When in doubt, FILTER OUT. Only keep issues you are 90%+ confident are real problems.
+## OUTPUT FORMAT
+Return JSON in `<json_output>` tags with two arrays:
+```json
+{
+  "issues": [{"id": 1, "confidence": 95}],
+  "filtered_issues": [{"id": 2, "confidence": 20, "reason": "False positive - null check exists"}]
+}
+```
+**Required fields:**
+- `issues`: validated issues with id + confidence (0-100)
+- `filtered_issues`: rejected issues with id + confidence + reason (1 sentence)
+- Every input issue ID must appear in exactly ONE array
+- Confidence scale: 90-100 (critical), 70-89 (valid), 50-69 (uncertain), <50 (false positive)
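
The new output contract above (every input issue ID in exactly one of the two arrays, confidence from 0 to 100, a reason on each filtered issue) can be checked mechanically. A minimal TypeScript sketch, assuming the raw JSON object sits directly between the `<json_output>` tags; `parseValidationOutput` and `checkValidationOutput` are hypothetical names and are not taken from the diffray source:

```typescript
interface KeptIssue {
  id: number;
  confidence: number;
}

interface FilteredIssue extends KeptIssue {
  reason: string;
}

interface ValidationOutput {
  issues: KeptIssue[];
  filtered_issues: FilteredIssue[];
}

/** Extracts and parses the JSON object between <json_output> tags. */
function parseValidationOutput(raw: string): ValidationOutput {
  const match = /<json_output>([\s\S]*?)<\/json_output>/.exec(raw);
  if (match === null) {
    throw new Error("no <json_output> block found");
  }
  const parsed = JSON.parse(match[1] ?? "") as Partial<ValidationOutput>;
  return {
    issues: parsed.issues ?? [],
    filtered_issues: parsed.filtered_issues ?? [],
  };
}

/** Returns a list of contract violations; an empty list means the output is well-formed. */
function checkValidationOutput(inputIds: number[], output: ValidationOutput): string[] {
  const problems: string[] = [];
  const occurrences = new Map<number, number>();

  for (const entry of [...output.issues, ...output.filtered_issues]) {
    occurrences.set(entry.id, (occurrences.get(entry.id) ?? 0) + 1);
    if (typeof entry.confidence !== "number" || entry.confidence < 0 || entry.confidence > 100) {
      problems.push(`issue ${entry.id}: confidence must be a number from 0 to 100`);
    }
  }
  for (const entry of output.filtered_issues) {
    if (typeof entry.reason !== "string" || entry.reason.trim() === "") {
      problems.push(`issue ${entry.id}: filtered issues need a reason`);
    }
  }
  // Every input ID must appear exactly once across both arrays.
  for (const id of inputIds) {
    const count = occurrences.get(id) ?? 0;
    if (count !== 1) {
      problems.push(`issue ${id}: expected in exactly one array, found ${count} times`);
    }
  }
  for (const id of occurrences.keys()) {
    if (!inputIds.includes(id)) {
      problems.push(`issue ${id}: not among the input issues`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one keeps all violations visible in a single pass, which may be more useful when debugging a prompt change.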

package/dist/defaults/prompts/output-format.md CHANGED

@@ -6,36 +6,57 @@ Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
 [
   {
     "file": "path/to/file.ts",
-    "lineStart": 10,
-    "lineEnd": 15,
+    "lineStart": 42,
+    "lineEnd": 45,
     "severity": "critical|high|medium|low",
     "category": "security|performance|bug|quality|style|docs",
-    "shortDescription": "Brief one-line description",
-    "fullDescription": "Detailed description of the issue",
-    "suggestion": "How to fix this issue (optional)"
+    "shortDescription": "Brief one-line title (max 60 chars)",
+    "fullDescription": "Detailed explanation (1-2 phrases)",
+    "suggestion": "How to fix this issue",
+    "rule": "rule-name-from-file-rule-mappings",
+    "evidence": "The actual code snippet that proves the issue exists",
+    "confidence": 90
   }
 ]
 </json>
-## Field Descriptions:
+## Field Descriptions
+- **file**: Relative path from repository root
+- **lineStart, lineEnd**: Line numbers (MUST be integers, not strings)
+- **severity**: Impact level
+  - `critical`: Security vulnerabilities, data loss, crashes
+  - `high`: Bugs, significant performance issues
+  - `medium`: Code quality, maintainability concerns
+  - `low`: Minor style, documentation improvements
+- **category**: Type of issue
+  - `security`: SQL injection, XSS, auth bypass, secrets exposure
+  - `performance`: O(n^2) algorithms, memory leaks, blocking operations
+  - `bug`: Logic errors, incorrect behavior, edge cases
+  - `quality`: Code smells, duplicated code, complex functions
+  - `style`: Formatting, naming conventions, inconsistencies
+  - `docs`: Missing or incorrect documentation
+- **shortDescription**: Brief title (max 60 chars)
+- **fullDescription**: Concise explanation (1-2 phrases)
+- **suggestion**: Actionable fix recommendation (optional)
+- **rule**: The rule name from File-Rule Mappings section (REQUIRED if mappings provided)
+- **evidence**: The actual code that proves the issue exists (REQUIRED)
+- **confidence**: Certainty level 0-100 (REQUIRED, only report issues with confidence >= 80)
-- **file**: Relative path to the file containing the issue
-- **lineStart**: Starting line number (MUST be an integer, e.g. `42`, NOT a string like `"42-45"`)
-- **lineEnd**: Ending line number (MUST be an integer, can be same as lineStart)
-- **severity**: One of: `critical`, `high`, `medium`, `low`
-- **category**: One of: `security`, `performance`, `bug`, `quality`, `style`, `docs`
-- **shortDescription**: Brief one-line summary of the issue
-- **fullDescription**: Detailed explanation of what's wrong
-- **suggestion**: (Optional) Recommendation on how to fix the issue
+## Quality Standards
-## CRITICAL FORMAT REQUIREMENTS:
+- **Only report issues with confidence >= 80%**
+- Every finding MUST have concrete evidence from the actual code
+- Skip theoretical, speculative, or "might be" issues
+- Focus on issues that would actually cause problems in production
+## Critical Format Requirements
 - **lineStart and lineEnd MUST be integers**, not strings
-- ✅ Correct: `"lineStart": 137, "lineEnd": 139`
-- ❌ Wrong: `"line": "137-139"` or `"lineStart": "137"`
-- Use the exact field names: `lineStart`, `lineEnd` (not `line`, `lineNumber`, etc.)
+- Correct: `"lineStart": 137, "lineEnd": 139`
+- Wrong: `"line": "137-139"` or `"lineStart": "137"`
-## Important Rules:
+## Important Rules
 1. **Return empty array if no issues found**: `<json>[]</json>`
 2. **Use valid JSON format** - ensure proper escaping of quotes and special characters
@@ -44,9 +65,12 @@ Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
    - Code that is already correct
    - Positive observations or compliments
    - "No action needed" type comments
-   - Documentation improvements that are already good
+   - Theoretical issues without concrete evidence
+## Example
-## Example:
+Given File-Rule Mappings:
+- src/utils/validator.ts: rule="input-validation"
 <json>
 [
@@ -58,7 +82,10 @@ Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
     "category": "bug",
     "shortDescription": "Potential null pointer dereference",
     "fullDescription": "The 'user' object may be null at this point, but is accessed without a null check. This will cause a runtime error if user is null.",
-    "suggestion": "Add a null check before accessing user properties: if (user) { ... }"
+    "suggestion": "Add a null check before accessing user properties: if (user) { ... }",
+    "rule": "input-validation",
+    "evidence": "Line 43: const name = user.name; // user can be null from getUserById()",
+    "confidence": 95
   }
 ]
 </json>
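
The field rules added in this file (integer line numbers, a 60-character title limit, required `evidence`, and `confidence` of at least 80) lend themselves to a mechanical check. A minimal TypeScript sketch, assuming an agent reply that contains one `<json>...</json>` block; `extractFindings` and `findingProblems` are hypothetical names, not diffray functions. The `rule` field is left unchecked because it is required only when File-Rule Mappings are provided.

```typescript
const SEVERITIES = ["critical", "high", "medium", "low"];
const CATEGORIES = ["security", "performance", "bug", "quality", "style", "docs"];

/** Extracts the JSON array between <json> tags. */
function extractFindings(raw: string): unknown[] {
  const match = /<json>([\s\S]*?)<\/json>/.exec(raw);
  if (match === null) throw new Error("no <json> block found");
  const parsed: unknown = JSON.parse(match[1] ?? "");
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  return parsed;
}

/** Lists the format rules a single finding violates; empty means it passes. */
function findingProblems(candidate: unknown): string[] {
  if (typeof candidate !== "object" || candidate === null) {
    return ["finding must be a JSON object"];
  }
  const finding = candidate as Record<string, unknown>;
  const problems: string[] = [];

  // lineStart and lineEnd must be integers, not strings such as "137".
  for (const key of ["lineStart", "lineEnd"]) {
    if (!Number.isInteger(finding[key])) problems.push(`${key} must be an integer`);
  }
  const severity = finding["severity"];
  if (typeof severity !== "string" || !SEVERITIES.includes(severity)) {
    problems.push("severity must be one of: " + SEVERITIES.join(", "));
  }
  const category = finding["category"];
  if (typeof category !== "string" || !CATEGORIES.includes(category)) {
    problems.push("category must be one of: " + CATEGORIES.join(", "));
  }
  const title = finding["shortDescription"];
  if (typeof title !== "string" || title.length > 60) {
    problems.push("shortDescription must be a string of at most 60 characters");
  }
  const evidence = finding["evidence"];
  if (typeof evidence !== "string" || evidence.trim() === "") {
    problems.push("evidence is required");
  }
  const confidence = finding["confidence"];
  if (typeof confidence !== "number" || confidence < 80 || confidence > 100) {
    problems.push("confidence must be a number from 80 to 100");
  }
  return problems;
}
```

A caller could map `findingProblems` over the result of `extractFindings` and drop or log any finding with a non-empty problem list.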

package/dist/defaults/prompts/validation-instructions.md ADDED

@@ -0,0 +1,88 @@
+# Validation Instructions
+## VERIFICATION PROCESS (REQUIRED)
+For EVERY issue, before deciding to keep or filter:
+1. **Read the code**: Use Read tool to examine the file at specified lines
+2. **Verify the claim**: Check if the described problem actually exists
+3. **Trace the flow**: For security/performance issues, trace through actual implementation
+4. **Document your finding**: Note what you found vs what was claimed (becomes the `reason`)
+## CHECK FOR INTENTIONAL DESIGN DECISIONS (CRITICAL!)
+Before marking an issue as valid, check if the change was INTENTIONAL:
+1. **Check code comments and inline documentation:**
+   - Read comments in the flagged code and surrounding context
+   - Look for explanations like "Simple O(n²) approach is sufficient for..."
+   - Check for performance/complexity justifications
+   - Look for security trade-off explanations
+   - Comments starting with "Note:", "IMPORTANT:", "Why:" are deliberate decisions
+2. **Check project documentation:**
+   - Read CLAUDE.md, README.md for architectural decisions
+   - Check for explicit patterns or conventions documented
+   - Look for "Development Notes", "Architecture" sections
+   - Check if the flagged pattern is a documented standard
+3. **Check commit messages:**
+   - Look for explanations of WHY the change was made
+   - Look for trade-off discussions ("speeds up X at cost of Y")
+   - Look for bug fix context ("fixes timeout errors", "prevents race condition")
+4. **Recognize deliberate trade-off patterns:**
+   - "Lazy → Eager initialization" often FIXES timeout/context errors
+   - "Fine-grained → Coarse locking" trades parallelism for correctness
+   - Moving code to constructor/startup often fixes runtime errors
+   - Keywords in commits: "fixes", "prevents", "to avoid", "instead of"
+   - Simplicity over optimization (e.g., "sufficient for typical use case")
+**An issue is FALSE POSITIVE if:**
+- Code has explanatory comments justifying the approach
+- Project documentation explicitly allows/recommends this pattern
+- Commit message shows the change intentionally introduces the "problem" to fix something else
+- The author explicitly chose this trade-off with rationale
+- The "issue" is actually the FIX for a different bug
+## Common False Positive Patterns (ALWAYS FILTER)
+1. **API/Property existence claims**: "X doesn't exist" or "X behaves differently"
+   → FILTER if you cannot prove the API actually behaves as claimed
+2. **Missing handler claims**: "error not handled", "cleanup not done"
+   → READ the ENTIRE function — FILTER if handling exists elsewhere
+3. **Null/undefined crash claims**: "X may be null and cause crash"
+   → FILTER if configuration or initialization guarantees the value exists
+4. **Ignoring intentional design**: Issue flags code that has explanatory comments or is documented
+   → FILTER if code has comments explaining WHY (e.g., "Simple approach is sufficient for...")
+   → FILTER if CLAUDE.md or README.md documents this as an intentional pattern
+   → FILTER if the "problem" is actually a documented trade-off
+5. **Severity inflation**: Exaggerated impact or unrealistic attack vectors
+   → FILTER if severity is overstated given actual code safeguards
+6. **Intentional changes flagged as bugs**: Removed/refactored features
+   → FILTER if the change is clean and deliberate
+## Example
+Input issues: id=1 (SQL injection), id=2 (null check), id=3 (performance trade-off)
+After verification:
+- Issue 1: Read code at lines 45-50, confirmed user input concatenated into SQL → KEEP (confidence: 95)
+- Issue 2: Read code, found null check exists on line 42 → FILTER (confidence: 15, reason: "False positive - null check exists on line 42")
+- Issue 3: Commit message says "intentional for performance" → FILTER (confidence: 10, reason: "Intentional trade-off per commit message")
+Output:
+```json
+{
+  "issues": [{"id": 1, "confidence": 95}],
+  "filtered_issues": [
+    {"id": 2, "confidence": 15, "reason": "False positive - null check exists on line 42"},
+    {"id": 3, "confidence": 10, "reason": "Intentional trade-off per commit message"}
+  ]
+}
+```
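
To illustrate how a consumer might apply the example output above, the sketch below keeps only the issues listed under `issues` and labels each with the confidence band from the scale in `validation.md` (90-100 critical, 70-89 valid, 50-69 uncertain, below 50 false positive). It is a TypeScript sketch under stated assumptions: the `ReportedIssue` shape and the names `confidenceBand` and `keepValidated` are hypothetical, and how diffray itself consumes this output is not shown in the diff.

```typescript
interface ReportedIssue {
  id: number;
  title: string;
}

interface Verdict {
  id: number;
  confidence: number;
  reason?: string;
}

interface ValidatorResult {
  issues: Verdict[];
  filtered_issues: Verdict[];
}

interface ReviewedIssue extends ReportedIssue {
  confidence: number;
  band: string;
}

/** Maps a 0-100 confidence value to the band names used in validation.md. */
function confidenceBand(confidence: number): string {
  if (confidence >= 90) return "critical";
  if (confidence >= 70) return "valid";
  if (confidence >= 50) return "uncertain";
  return "false positive";
}

/** Keeps only issues the validator listed under `issues`, in their original order. */
function keepValidated(reported: ReportedIssue[], result: ValidatorResult): ReviewedIssue[] {
  const keptConfidence = new Map<number, number>();
  for (const verdict of result.issues) {
    keptConfidence.set(verdict.id, verdict.confidence);
  }

  const reviewed: ReviewedIssue[] = [];
  for (const issue of reported) {
    const confidence = keptConfidence.get(issue.id);
    if (confidence === undefined) continue; // filtered out or not mentioned
    reviewed.push({ ...issue, confidence, band: confidenceBand(confidence) });
  }
  return reviewed;
}
```

With the three example issues above, only issue 1 would remain, carrying confidence 95 and the band `critical`.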