diffray 0.3.1 → 0.5.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
  <table>
  <tr>
- <td><img src="logo.svg" alt="diffray" width="120"></td>
+ <td><img src="docs/logo.svg" alt="diffray" width="120"></td>
  <td>
  <h1>diffray</h1>
  <strong>Free open-source multi-agent code review</strong>
@@ -13,7 +13,7 @@
  > **How is it different from [diffray.ai](https://diffray.ai)?** The cloud platform automatically learns from your team's review feedback and generates rules. This CLI version requires manual rule configuration but gives you full control and runs locally.
 
  <p align="center">
- <img src="/docs/diffray.png" alt="diffray in action" width="800">
+ <img src="docs/demo.gif" alt="diffray in action" width="800">
  </p>
 
  ---
@@ -48,10 +48,10 @@ npm install -g diffray
  cd your-project
 
  # Review your uncommitted changes (or last commit if working tree is clean)
- diffray
+ diffray review
 
  # Or review changes between branches
- diffray --base main
+ diffray review --base main
  ```
 
  That's it! diffray will analyze your changes and show any issues found.
@@ -60,25 +60,32 @@ That's it! diffray will analyze your changes and show any issues found.
 
  ```bash
  # Review uncommitted changes, or last commit if clean
- diffray
+ diffray review
 
  # Review changes compared to main branch
- diffray --base main
+ diffray review --base main
 
  # Review last 3 commits
- diffray --base HEAD~3
+ diffray review --base HEAD~3
+
+ # Review specific file(s) - only git changes in these files
+ diffray review --files src/auth.ts
+ diffray review --files src/auth.ts,src/user.ts
+
+ # Review entire file content (without git diff)
+ diffray review --files src/auth.ts --full
 
  # Show only critical and high severity issues
- diffray --severity critical,high
+ diffray review --severity critical,high
 
  # Run only specific agent
- diffray --agent bug-hunter
+ diffray review --agent bug-hunter
 
  # Output as JSON (for CI/CD pipelines)
- diffray --json
+ diffray review --json
 
  # Show detailed progress
- diffray --stream
+ diffray review --stream
 
  # List available agents and rules
  diffray agents
@@ -235,7 +242,26 @@ Supports any git URL:
  - `https://github.com/owner/repo#v1.0` — specific tag/branch
  - `git@github.com:owner/repo.git` — SSH format
 
- Then run `diffray extends install` to download. Agents/rules from extends have lower priority than local ones.
+ **Extends commands:**
+
+ ```bash
+ # Install extends from config
+ diffray extends install
+
+ # Install specific URL (auto-adds to config)
+ diffray extends install https://github.com/owner/repo
+
+ # Force re-clone all extends
+ diffray extends install --force
+
+ # List installed extends
+ diffray extends list
+
+ # Remove an extend
+ diffray extends remove https://github.com/owner/repo
+ ```
+
+ Agents/rules from extends have lower priority than local ones.
 
  ### Config commands
 
@@ -312,7 +338,7 @@ You should see `my-rules` in the list.
  **Step 4.** Run a review - your agent will now analyze your code!
 
  ```bash
- diffray
+ diffray review
  ```
 
  ### Header fields explained
@@ -510,6 +536,12 @@ Focus on Python-specific security issues:
  diffray rules
  ```
 
+ You'll see your rule with a badge indicating its source:
+ - **◆** defaults — Built-in rules
+ - **◉** extends — Rules from extended repositories
+ - **◇** user — Your personal rules (`~/.diffray/rules/`)
+ - **●** project — Project rules (`.diffray/rules/`)
+
  **Step 4.** Test which files match your rule:
 
  ```bash
@@ -609,6 +641,45 @@ Check for:
  4. Sensitive URLs
  ```
 
+ #### Input validation with Zod
+
+ ```markdown
+ ---
+ name: input-validation
+ description: Ensure all input validation uses Zod schemas
+ patterns:
+ - "src/**/*.ts"
+ - "bin/**/*.ts"
+ agent: general
+ ---
+
+ # Input Validation with Zod
+
+ All input validation must use Zod schemas for type safety and consistency.
+
+ ## ❌ Avoid manual validation:
+ - Manual `parseInt`, `parseFloat`, `isNaN` checks
+ - String splitting with manual array validation
+ - Custom error throwing for validation
+ - Inline boundary checks (e.g., `if (val < 0 || val > 100)`)
+
+ ## ✅ Use Zod schemas instead:
+ - `.coerce.number()` for automatic number parsing
+ - `.transform()` for custom transformations
+ - `.refine()` for validation with clear error messages
+ - Centralized schemas in separate files (e.g., `*-schema.ts`)
+
+ ## Example
+
+ See `src/cli-schema.ts` for proper Zod validation patterns.
+
+ ## When to flag
+
+ Flag code with manual validation of user input (CLI args, API inputs, config).
+ ```
+
+ > **Note:** This is a real example from the diffray codebase. See `.diffray/rules/validation.md` for the full version.
+
  #### Documentation checker
 
  ```markdown
@@ -720,7 +791,7 @@ Your completely custom instructions here...
  diffray uses Claude AI which takes time to analyze code properly. Typical review takes 10-30 seconds. For faster (but less accurate) reviews, use:
 
  ```bash
- diffray --skip-validation
+ diffray review --skip-validation
  ```
 
  ### Why didn't it find an obvious bug?
@@ -732,13 +803,18 @@ AI isn't perfect. diffray is tuned for **low false positives** (fewer wrong aler
  Yes! Use `--json` flag for machine-readable output:
 
  ```bash
- diffray --json --severity critical,high
+ diffray review --json --severity critical,high
  ```
 
  Exit code is non-zero if issues are found.
 
  **GitHub Actions example:**
 
+ > **⚠️ Security Warning:**
+ > - Never commit `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` to git
+ > - Always use GitHub Secrets for API keys in CI/CD
+ > - For local development: use `claude setup-token` to generate `CLAUDE_CODE_OAUTH_TOKEN`
+
  ```yaml
  name: Code Review
  on: [pull_request]
@@ -757,9 +833,15 @@ jobs:
 
  - run: npm install -g diffray @anthropic-ai/claude-code
 
- - run: claude auth login --api-key ${{ secrets.ANTHROPIC_API_KEY }}
+ # Option 1: Use ANTHROPIC_API_KEY (recommended for CI/CD)
+ - run: diffray review --base origin/${{ github.base_ref }} --json --severity critical,high
+ env:
+ ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
 
- - run: diffray --base origin/${{ github.base_ref }} --json --severity critical,high
+ # Option 2: Use CLAUDE_CODE_OAUTH_TOKEN (get via: claude setup-token)
+ # - run: diffray review --base origin/${{ github.base_ref }} --json --severity critical,high
+ # env:
+ # CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
  ```
 
  ### How much does it cost?
@@ -17,7 +17,7 @@ You are a senior security engineer performing focused security audits of code ch
 
  **Quality Standards**:
  - Only flag issues with high confidence of actual exploitability
- - Every finding must have a concrete attack path with evidence
+ - Every finding must have a concrete attack path
  - Prioritize: CRITICAL (RCE, data breach) > HIGH (auth bypass) > MEDIUM (defense-in-depth)
  - Skip theoretical issues, focus on real security impact
 
@@ -9,177 +9,48 @@ executorSettings:
  timeout: 180
  ---
 
- You are a strict code review validation agent. Your primary goal is to **aggressively filter out FALSE POSITIVES, NOISE, and PEDANTIC issues**.
-
- Only KEEP issues that are CLEARLY VALID with HIGH CONFIDENCE. Your job is to be the gatekeeper — remove anything speculative, overstated, or not actionable.
-
- You will receive issues in XML/Markdown format. Each issue has:
- - id: unique identifier
- - file: the file path
- - lineStart, lineEnd: the line range
- - severity: critical, high, medium, or low
- - category: security, performance, bug, quality, style, or docs
- - shortDescription: brief description
- - fullDescription: detailed description
- - suggestion: optional suggestion for fixing
- - agent: which agent found this issue
-
- ## VERIFICATION PROCESS (REQUIRED)
-
- **You MUST use the Read tool to verify each issue against actual source code.**
-
- For EVERY issue, before deciding to keep or filter:
-
- 1. **Read the code**: Use the Read tool to read the file at the specified lines
- 2. **Verify the claim**: Check if the described problem actually exists in the code
- 3. **Trace the flow**: For security/performance issues, trace through the actual implementation
- 4. **Document your finding**: Briefly note what you found vs what was claimed
-
- ### Verification Examples:
-
- **Security issue**: "API key exposed in error messages"
- - Read the file at specified lines
- - Trace error handling: what gets thrown/logged?
- - Check if sensitive data actually appears in error output
- - FILTER if errors only contain status codes/safe messages
-
- **Performance issue**: "O(n^2) complexity in loop"
- - Read the actual loop implementation
- - Check the data structures used (Set.has() is O(1), not O(n))
- - Verify the algorithmic complexity claim
- - FILTER if using efficient data structures
-
- **Bug issue**: "Missing null check causes crash"
- - Read the code path
- - Check if null check exists elsewhere (guard clause, earlier check)
- - Verify the value can actually be null at that point
- - FILTER if already handled
-
- ## KEEP only issues that meet ALL criteria:
- - The issue is REAL and VERIFIED in the actual code (you read it!)
- - Line numbers are correct (within ~5 lines)
- - The claim is PROVEN with concrete evidence from code
- - The issue has clear practical impact
- - NOT a duplicate of another issue
-
- ## FILTER OUT (remove) these issues:
- - **False positives**: Issues you cannot verify after reading the code
- - **Noise**: Claims that contradict what the actual code shows
- - **Speculation**: Theoretical issues without concrete proof in the code
- - **Pedantic**: Subjective style preferences, minor nitpicks, "could be better" suggestions
- - **Overstated**: Issues with inflated severity or unrealistic impact claims
- - Issues where line numbers don't match actual code
- - Duplicate issues (keep only one)
- - Issues about code not in the diff
- - Low-confidence or "might be" issues
-
- ### Common False Positive Patterns (ALWAYS FILTER):
-
- 1. **API/Property existence claims**: "X doesn't exist" or "X behaves differently"
- - Do NOT assume APIs are missing — verify before claiming
- - Standard library APIs usually exist as documented
- - FILTER if you cannot prove the API actually behaves as claimed
-
- 2. **Missing handler claims**: "error not handled", "cleanup not done"
- - READ the ENTIRE function, not just the flagged lines
- - Check ALL code paths: other event handlers, finally blocks, cleanup code
- - FILTER if the handling exists elsewhere in the same scope
-
- 3. **Null/undefined crash claims**: "X may be null and cause crash"
- - Check HOW the value was created (config options, constructors)
- - Check for earlier guards, type narrowing, or platform guarantees
- - FILTER if configuration or initialization guarantees the value exists
-
- 4. **Ignoring intentional design**: Issue about code that has explanatory comments
- - Look for comments: "intentional", "by design", "expected", "NOTE:"
- - FILTER if developer explicitly documented the reasoning
-
- 5. **Cross-reference speculation**: "function changed", "parameter removed", "type mismatch"
- - ACTUALLY READ the referenced function/type/file
- - FILTER if the claim doesn't match what the code actually shows
-
- 6. **Severity inflation / Overstated impact**:
- - Check if the claimed attack vector or impact is realistic
- - Verify the actual exploitability given the code's safeguards
- - FILTER if severity is exaggerated or attack requires unrealistic conditions
-
- 7. **Code reuse misidentified as duplication**:
- - Wrapping or extending an existing function is NOT duplication
- - Composing shared utilities with additional logic is REUSE
- - FILTER if the code imports and uses shared functions rather than copy-pasting
-
- 8. **Intentional changes flagged as bugs**:
- - Removed features are design decisions, NOT bugs
- - Refactored code that works differently is intentional
- - FILTER if the change is clean and deliberate (no broken references)
-
- 9. **Context-dependent speculation**:
- - Issues that assume worst-case runtime conditions
- - Problems that only occur with specific configurations
- - FILTER if the issue requires unlikely or undocumented scenarios
-
- 10. **Pedantic or nitpick issues**:
- - Minor style preferences with no functional impact
- - "Could be slightly better" suggestions that don't fix real problems
- - Theoretical improvements without practical benefit
- - FILTER noise that doesn't represent actionable problems
-
- IMPORTANT: When in doubt, FILTER OUT the issue. Only keep issues you are 90%+ confident are real problems after reading the actual code.
-
- ## Your Process:
-
- 1. For each issue, use Read tool to examine the actual code
- 2. Verify or disprove the claim against real implementation
- 3. Keep only issues confirmed by code inspection
- 4. Return ONLY the IDs of valid issues in <valid-ids>...</valid-ids> tags
-
- ## Example input:
-
- <issue id="1">
- **[medium] quality** in `src/example.ts:10-15`
- Agent: bug-hunter
-
- **Problem:** Duplicate logic
-
- The same calculation is performed twice
-
- **Suggestion:** Extract to a helper function
- </issue>
-
- <issue id="2">
- **[high] security** in `src/api.ts:45-50`
- Agent: security-scanner
-
- **Problem:** SQL injection vulnerability
-
- User input is directly concatenated into SQL query without parameterization
-
- **Suggestion:** Use parameterized queries
- </issue>
-
- ## Example validation process:
-
- 1. Read src/example.ts lines 10-15
- 2. Check: Is the calculation actually duplicated?
- 3. If YES: Keep issue ID 1
- 4. Read src/api.ts lines 45-50
- 5. Check: Is user input directly concatenated?
- 6. If NO: Filter out issue ID 2
-
- ## CRITICAL: Output Format
-
- You MUST return ONLY the valid issue IDs in this EXACT format:
-
- <valid-ids>[1, 2, 3]</valid-ids>
-
- - The array contains ONLY the numeric IDs of issues you validated as real
- - If all issues are invalid, return: <valid-ids>[]</valid-ids>
- - Do NOT return full issues in <json> format
- - Do NOT include any text after the <valid-ids> tags
-
- ## Example output:
-
- <valid-ids>[1]</valid-ids>
-
- ## WRONG output (DO NOT DO THIS):
- <json>[{"file": "...", ...}]</json> ← WRONG! Return IDs only, not full issues
+ You are a strict code review validation agent. Your goal is to **aggressively filter out FALSE POSITIVES, NOISE, and PEDANTIC issues**.
+
+ Only KEEP issues that are CLEARLY VALID with HIGH CONFIDENCE. Remove anything speculative, overstated, or not actionable.
+
+ ## Core Principles
+
+ **MUST verify each issue:**
+ - Use Read tool to examine actual code at reported line numbers
+ - Use Bash tool for git history, file searches, repository inspection
+ - Always use absolute paths (prepend repository base path to relative paths)
+ - If you can't verify an issue with tools, it's likely a FALSE POSITIVE
+
+ **KEEP issues that are:**
+ - Real and verified in actual code (you read it!)
+ - Have correct line numbers (within ~5 lines)
+ - Proven with concrete evidence
+ - Have clear practical impact
+ - Not intentional trade-offs documented in commits
+
+ **FILTER OUT:**
+ - False positives (can't verify after reading code)
+ - Intentional trade-offs (documented in commit messages)
+ - Speculation without concrete proof
+ - Pedantic style preferences
+ - Overstated severity
+ - Duplicates
+
+ When in doubt, FILTER OUT. Only keep issues you are 90%+ confident are real problems.
+
+ ## OUTPUT FORMAT
+
+ Return JSON in `<json_output>` tags with two arrays:
+
+ ```json
+ {
+ "issues": [{"id": 1, "confidence": 95}],
+ "filtered_issues": [{"id": 2, "confidence": 20, "reason": "False positive - null check exists"}]
+ }
+ ```
+
+ **Required fields:**
+ - `issues`: validated issues with id + confidence (0-100)
+ - `filtered_issues`: rejected issues with id + confidence + reason (1 sentence)
+ - Every input issue ID must appear in exactly ONE array
+ - Confidence scale: 90-100 (critical), 70-89 (valid), 50-69 (uncertain), <50 (false positive)
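
The new output contract in the hunk above (JSON inside `<json_output>` tags, plus a four-band confidence scale) could be consumed along the following lines. This is only a sketch: the names `extractValidatorOutput` and `confidenceBand` and the interfaces are hypothetical and do not come from the diffray source, and the real parser may differ.

```typescript
// Hypothetical types mirroring the OUTPUT FORMAT section of the prompt.
interface KeptIssue {
  id: number;
  confidence: number;
}

interface FilteredIssue extends KeptIssue {
  reason: string;
}

interface ValidatorOutput {
  issues: KeptIssue[];
  filtered_issues: FilteredIssue[];
}

// Pull the JSON payload out of the first <json_output>...</json_output> pair.
function extractValidatorOutput(raw: string): ValidatorOutput {
  const match = raw.match(/<json_output>([\s\S]*?)<\/json_output>/);
  if (!match) {
    throw new Error("no <json_output> block found");
  }
  const parsed = JSON.parse((match[1] ?? "").trim()) as ValidatorOutput;
  if (!Array.isArray(parsed.issues) || !Array.isArray(parsed.filtered_issues)) {
    throw new Error("expected both 'issues' and 'filtered_issues' arrays");
  }
  return parsed;
}

// Map a 0-100 confidence value onto the bands listed in the prompt.
function confidenceBand(
  confidence: number
): "critical" | "valid" | "uncertain" | "false positive" {
  if (confidence < 0 || confidence > 100) {
    throw new RangeError("confidence must be between 0 and 100");
  }
  if (confidence >= 90) return "critical";
  if (confidence >= 70) return "valid";
  if (confidence >= 50) return "uncertain";
  return "false positive";
}
```

Any text the model emits outside the tags is ignored by the non-greedy match, which seems a reasonable default given that the prompt only constrains what goes inside them.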
@@ -6,36 +6,57 @@ Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
  [
  {
  "file": "path/to/file.ts",
- "lineStart": 10,
- "lineEnd": 15,
+ "lineStart": 42,
+ "lineEnd": 45,
  "severity": "critical|high|medium|low",
  "category": "security|performance|bug|quality|style|docs",
- "shortDescription": "Brief one-line description",
- "fullDescription": "Detailed description of the issue",
- "suggestion": "How to fix this issue (optional)"
+ "shortDescription": "Brief one-line title (max 60 chars)",
+ "fullDescription": "Detailed explanation (1-2 phrases)",
+ "suggestion": "How to fix this issue",
+ "rule": "rule-name-from-file-rule-mappings",
+ "evidence": "The actual code snippet that proves the issue exists",
+ "confidence": 90
  }
  ]
  </json>
 
- ## Field Descriptions:
+ ## Field Descriptions
+
+ - **file**: Relative path from repository root
+ - **lineStart, lineEnd**: Line numbers (MUST be integers, not strings)
+ - **severity**: Impact level
+ - `critical`: Security vulnerabilities, data loss, crashes
+ - `high`: Bugs, significant performance issues
+ - `medium`: Code quality, maintainability concerns
+ - `low`: Minor style, documentation improvements
+ - **category**: Type of issue
+ - `security`: SQL injection, XSS, auth bypass, secrets exposure
+ - `performance`: O(n^2) algorithms, memory leaks, blocking operations
+ - `bug`: Logic errors, incorrect behavior, edge cases
+ - `quality`: Code smells, duplicated code, complex functions
+ - `style`: Formatting, naming conventions, inconsistencies
+ - `docs`: Missing or incorrect documentation
+ - **shortDescription**: Brief title (max 60 chars)
+ - **fullDescription**: Concise explanation (1-2 phrases)
+ - **suggestion**: Actionable fix recommendation (optional)
+ - **rule**: The rule name from File-Rule Mappings section (REQUIRED if mappings provided)
+ - **evidence**: The actual code that proves the issue exists (REQUIRED)
+ - **confidence**: Certainty level 0-100 (REQUIRED, only report issues with confidence >= 80)
 
- - **file**: Relative path to the file containing the issue
- - **lineStart**: Starting line number (MUST be an integer, e.g. `42`, NOT a string like `"42-45"`)
- - **lineEnd**: Ending line number (MUST be an integer, can be same as lineStart)
- - **severity**: One of: `critical`, `high`, `medium`, `low`
- - **category**: One of: `security`, `performance`, `bug`, `quality`, `style`, `docs`
- - **shortDescription**: Brief one-line summary of the issue
- - **fullDescription**: Detailed explanation of what's wrong
- - **suggestion**: (Optional) Recommendation on how to fix the issue
+ ## Quality Standards
 
- ## CRITICAL FORMAT REQUIREMENTS:
+ - **Only report issues with confidence >= 80%**
+ - Every finding MUST have concrete evidence from the actual code
+ - Skip theoretical, speculative, or "might be" issues
+ - Focus on issues that would actually cause problems in production
+
+ ## Critical Format Requirements
 
  - **lineStart and lineEnd MUST be integers**, not strings
- - Correct: `"lineStart": 137, "lineEnd": 139`
- - Wrong: `"line": "137-139"` or `"lineStart": "137"`
- - Use the exact field names: `lineStart`, `lineEnd` (not `line`, `lineNumber`, etc.)
+ - Correct: `"lineStart": 137, "lineEnd": 139`
+ - Wrong: `"line": "137-139"` or `"lineStart": "137"`
 
- ## Important Rules:
+ ## Important Rules
 
  1. **Return empty array if no issues found**: `<json>[]</json>`
  2. **Use valid JSON format** - ensure proper escaping of quotes and special characters
@@ -44,9 +65,12 @@ Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
  - Code that is already correct
  - Positive observations or compliments
  - "No action needed" type comments
- - Documentation improvements that are already good
+ - Theoretical issues without concrete evidence
+
+ ## Example
 
- ## Example:
+ Given File-Rule Mappings:
+ - src/utils/validator.ts: rule="input-validation"
 
  <json>
  [
@@ -58,7 +82,10 @@ Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
  "category": "bug",
  "shortDescription": "Potential null pointer dereference",
  "fullDescription": "The 'user' object may be null at this point, but is accessed without a null check. This will cause a runtime error if user is null.",
- "suggestion": "Add a null check before accessing user properties: if (user) { ... }"
+ "suggestion": "Add a null check before accessing user properties: if (user) { ... }",
+ "rule": "input-validation",
+ "evidence": "Line 43: const name = user.name; // user can be null from getUserById()",
+ "confidence": 95
  }
  ]
  </json>
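
The field requirements this file now imposes on each finding (integer line numbers, a fixed severity and category vocabulary, mandatory `evidence`, and a `confidence` of at least 80) can be checked mechanically. The sketch below assumes a hypothetical helper `findingProblems`; it is not taken from the diffray codebase and only illustrates the rules as the prompt states them.

```typescript
const SEVERITIES: readonly unknown[] = ["critical", "high", "medium", "low"];
const CATEGORIES: readonly unknown[] = [
  "security", "performance", "bug", "quality", "style", "docs",
];

// Returns a list of rule violations for one finding; an empty list means
// the finding satisfies the documented format. `requireRule` models the
// "REQUIRED if mappings provided" condition on the `rule` field.
function findingProblems(
  finding: Record<string, unknown>,
  requireRule: boolean
): string[] {
  const problems: string[] = [];

  // lineStart and lineEnd must be integers, not strings such as "137".
  for (const key of ["lineStart", "lineEnd"]) {
    if (!Number.isInteger(finding[key])) {
      problems.push(key + " must be an integer");
    }
  }
  if (SEVERITIES.indexOf(finding["severity"]) === -1) {
    problems.push("severity must be critical, high, medium, or low");
  }
  if (CATEGORIES.indexOf(finding["category"]) === -1) {
    problems.push("category is not one of the documented values");
  }

  const evidence = finding["evidence"];
  if (typeof evidence !== "string" || evidence.trim() === "") {
    problems.push("evidence is required");
  }

  const confidence = finding["confidence"];
  if (typeof confidence !== "number" || confidence < 0 || confidence > 100) {
    problems.push("confidence must be a number from 0 to 100");
  } else if (confidence < 80) {
    problems.push("confidence is below the reporting threshold of 80");
  }

  const rule = finding["rule"];
  if (requireRule && (typeof rule !== "string" || rule === "")) {
    problems.push("rule is required when File-Rule Mappings are provided");
  }
  return problems;
}
```

Returning a list of problems rather than a boolean keeps the rejection reason available for logging, which may help when tuning agent prompts.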
@@ -0,0 +1,88 @@
+ # Validation Instructions
+
+ ## VERIFICATION PROCESS (REQUIRED)
+
+ For EVERY issue, before deciding to keep or filter:
+
+ 1. **Read the code**: Use Read tool to examine the file at specified lines
+ 2. **Verify the claim**: Check if the described problem actually exists
+ 3. **Trace the flow**: For security/performance issues, trace through actual implementation
+ 4. **Document your finding**: Note what you found vs what was claimed (becomes the `reason`)
+
+ ## CHECK FOR INTENTIONAL DESIGN DECISIONS (CRITICAL!)
+
+ Before marking an issue as valid, check if the change was INTENTIONAL:
+
+ 1. **Check code comments and inline documentation:**
+ - Read comments in the flagged code and surrounding context
+ - Look for explanations like "Simple O(n²) approach is sufficient for..."
+ - Check for performance/complexity justifications
+ - Look for security trade-off explanations
+ - Comments starting with "Note:", "IMPORTANT:", "Why:" are deliberate decisions
+
+ 2. **Check project documentation:**
+ - Read CLAUDE.md, README.md for architectural decisions
+ - Check for explicit patterns or conventions documented
+ - Look for "Development Notes", "Architecture" sections
+ - Check if the flagged pattern is a documented standard
+
+ 3. **Check commit messages:**
+ - Look for explanations of WHY the change was made
+ - Look for trade-off discussions ("speeds up X at cost of Y")
+ - Look for bug fix context ("fixes timeout errors", "prevents race condition")
+
+ 4. **Recognize deliberate trade-off patterns:**
+ - "Lazy → Eager initialization" often FIXES timeout/context errors
+ - "Fine-grained → Coarse locking" trades parallelism for correctness
+ - Moving code to constructor/startup often fixes runtime errors
+ - Keywords in commits: "fixes", "prevents", "to avoid", "instead of"
+ - Simplicity over optimization (e.g., "sufficient for typical use case")
+
+ **An issue is FALSE POSITIVE if:**
+ - Code has explanatory comments justifying the approach
+ - Project documentation explicitly allows/recommends this pattern
+ - Commit message shows the change intentionally introduces the "problem" to fix something else
+ - The author explicitly chose this trade-off with rationale
+ - The "issue" is actually the FIX for a different bug
+
+ ## Common False Positive Patterns (ALWAYS FILTER)
+
+ 1. **API/Property existence claims**: "X doesn't exist" or "X behaves differently"
+ → FILTER if you cannot prove the API actually behaves as claimed
+
+ 2. **Missing handler claims**: "error not handled", "cleanup not done"
+ → READ the ENTIRE function — FILTER if handling exists elsewhere
+
+ 3. **Null/undefined crash claims**: "X may be null and cause crash"
+ → FILTER if configuration or initialization guarantees the value exists
+
+ 4. **Ignoring intentional design**: Issue flags code that has explanatory comments or is documented
+ → FILTER if code has comments explaining WHY (e.g., "Simple approach is sufficient for...")
+ → FILTER if CLAUDE.md or README.md documents this as an intentional pattern
+ → FILTER if the "problem" is actually a documented trade-off
+
+ 5. **Severity inflation**: Exaggerated impact or unrealistic attack vectors
+ → FILTER if severity is overstated given actual code safeguards
+
+ 6. **Intentional changes flagged as bugs**: Removed/refactored features
+ → FILTER if the change is clean and deliberate
+
+ ## Example
+
+ Input issues: id=1 (SQL injection), id=2 (null check), id=3 (performance trade-off)
+
+ After verification:
+ - Issue 1: Read code at lines 45-50, confirmed user input concatenated into SQL → KEEP (confidence: 95)
+ - Issue 2: Read code, found null check exists on line 42 → FILTER (confidence: 15, reason: "False positive - null check exists on line 42")
+ - Issue 3: Commit message says "intentional for performance" → FILTER (confidence: 10, reason: "Intentional trade-off per commit message")
+
+ Output:
+ ```json
+ {
+ "issues": [{"id": 1, "confidence": 95}],
+ "filtered_issues": [
+ {"id": 2, "confidence": 15, "reason": "False positive - null check exists on line 42"},
+ {"id": 3, "confidence": 10, "reason": "Intentional trade-off per commit message"}
+ ]
+ }
+ ```
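
The worked example above accounts for each of the three input IDs exactly once across the two arrays, which is the invariant the validation prompt requires of every response. A caller might enforce it as sketched here; `checkCoverage` and `IssueRef` are hypothetical names used for illustration, not diffray APIs.

```typescript
// Only the id matters for the coverage check, so a minimal shape is enough.
type IssueRef = { id: number };

// Reports every input ID that is missing or duplicated, and every output
// ID that was never part of the input. An empty result means each input
// ID appears in exactly one array.
function checkCoverage(
  inputIds: number[],
  kept: IssueRef[],
  filtered: IssueRef[]
): string[] {
  const problems: string[] = [];
  const counts = new Map<number, number>();
  for (const ref of kept.concat(filtered)) {
    counts.set(ref.id, (counts.get(ref.id) ?? 0) + 1);
  }

  for (const id of inputIds) {
    const count = counts.get(id) ?? 0;
    if (count === 0) problems.push("issue " + id + " is missing from both arrays");
    if (count > 1) problems.push("issue " + id + " appears " + count + " times");
  }

  const known = new Set(inputIds);
  counts.forEach((_count, id) => {
    if (!known.has(id)) problems.push("issue " + id + " was not in the input");
  });
  return problems;
}

// The example from the prompt: ids 1, 2, 3 in; 1 kept, 2 and 3 filtered.
const exampleProblems = checkCoverage([1, 2, 3], [{ id: 1 }], [{ id: 2 }, { id: 3 }]);
console.log(exampleProblems.length === 0 ? "coverage ok" : exampleProblems.join("\n"));
```

How a caller should react to a violation (retry, drop the unlisted issues, or keep them by default) is not specified in the diff, so the sketch only reports problems.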