diffray 0.1.0 → 0.1.3
- package/README.md +83 -75
- package/dist/defaults/agents/EXAMPLE.md.template +27 -0
- package/dist/defaults/agents/bug-hunter.md +24 -0
- package/dist/defaults/agents/general.md +22 -0
- package/dist/defaults/agents/performance-check.md +25 -0
- package/dist/defaults/agents/security-scan.md +27 -0
- package/dist/defaults/agents/validation.md +186 -0
- package/dist/defaults/prompts/output-format.md +64 -0
- package/dist/defaults/prompts/validation.md +176 -0
- package/dist/defaults/rules/EXAMPLE.md.template +31 -0
- package/dist/defaults/rules/code-bugs.md +31 -0
- package/dist/defaults/rules/code-general.md +46 -0
- package/dist/defaults/rules/code-performance.md +30 -0
- package/dist/defaults/rules/code-security.md +25 -0
- package/dist/defaults/rules/config-security.md +18 -0
- package/dist/diffray.js +338 -0
- package/package.json +10 -8
- package/src/defaults/agents/general.md +22 -0
- package/src/defaults/agents/validation.md +186 -0
- package/src/defaults/prompts/output-format.md +15 -8
- package/src/defaults/prompts/validation.md +116 -20
- package/src/defaults/rules/code-general.md +46 -0
- package/bin/diffray-wrapper.js +0 -33
- package/scripts/postinstall.js +0 -75
package/README.md
CHANGED
|
@@ -1,40 +1,76 @@
|
|
|
1
|
-
|
|
1
|
+
<p align="center">
|
|
2
|
+
<img src="logo.svg" alt="diffray" width="200">
|
|
3
|
+
</p>
|
|
2
4
|
|
|
3
|
-
|
|
5
|
+
<h1 align="center">diffray</h1>
|
|
4
6
|
|
|
5
|
-
|
|
7
|
+
<p align="center">
|
|
8
|
+
<strong>Multi-agent AI code review with minimal false positives</strong>
|
|
9
|
+
</p>
|
|
6
10
|
|
|
7
|
-
|
|
11
|
+
```
|
|
12
|
+
Git Diffs → Specialized Agents → Deduplication → Validation → Verified Issues
|
|
13
|
+
```
|
|
8
14
|
|
|
9
|
-
|
|
10
|
-
- **CLI Agent** - Execution of CLI tools like `claude`, `auggie`, etc.
|
|
15
|
+
## About This Version
|
|
11
16
|
|
|
12
|
-
This
|
|
17
|
+
This is a **simplified, lightweight version** of the full [diffray.ai](https://diffray.ai) platform — and it's **completely free**.
|
|
13
18
|
|
|
19
|
+
Despite its minimal footprint, it achieves **high bug detection rates** and **low false positive noise** by leveraging **Claude Code** as the primary executor.
|
|
14
20
|
|
|
15
|
-
|
|
21
|
+
Claude Code provides:
|
|
22
|
+
- **Deep codebase understanding** - full file access and navigation
|
|
23
|
+
- **Context-aware analysis** - reads related files to understand impact
|
|
24
|
+
- **Accurate issue validation** - verifies findings against actual code
|
|
16
25
|
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
26
|
+
The result: fewer false alarms, more actionable findings.
|
|
27
|
+
|
|
28
|
+
## Why diffray?
|
|
29
|
+
|
|
30
|
+
### Multi-Agent Architecture
|
|
31
|
+
Each agent is a specialist focused on one domain:
|
|
32
|
+
- **security-scan** - finds vulnerabilities with concrete attack paths
|
|
33
|
+
- **bug-hunter** - detects logic errors and runtime issues
|
|
34
|
+
- **performance-check** - identifies performance bottlenecks
|
|
35
|
+
|
|
36
|
+
Specialized agents produce higher quality findings than one generalist trying to catch everything.
|
|
37
|
+
|
|
38
|
+
### Minimal False Positives
|
|
39
|
+
Two-stage filtering eliminates noise:
|
|
40
|
+
1. **Deduplication** - removes duplicate issues across agents
|
|
41
|
+
2. **Validation** - LLM verifies each issue against actual code, filters out false positives
|
|
42
|
+
|
|
43
|
+
Only issues that are verified with 90%+ confidence make it to the final report.
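The deduplication stage can be pictured with a minimal sketch. The README does not say how diffray decides that two issues are duplicates, so the key used here (file, line range, and category) and the `dedupeIssues` name are assumptions made for illustration; the field names follow the output format the agents are asked to return.

```typescript
// Hypothetical issue shape, using the field names from the agents' output format.
interface Issue {
  file: string;
  lineStart: number;
  lineEnd: number;
  severity: "critical" | "high" | "medium" | "low";
  category: string;
  shortDescription: string;
  agent: string;
}

// Assumption: two issues are duplicates when they point at the same file,
// the same line range, and the same category. The first one seen is kept.
function dedupeIssues(issues: Issue[]): Issue[] {
  const seen = new Map<string, Issue>();
  for (const issue of issues) {
    const key = `${issue.file}:${issue.lineStart}-${issue.lineEnd}:${issue.category}`;
    if (!seen.has(key)) {
      seen.set(key, issue);
    }
  }
  return Array.from(seen.values());
}
```

Keying on an exact line range is deliberately strict. A real pipeline might also merge overlapping ranges or near-identical descriptions, which this sketch does not attempt.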
|
|
44
|
+
|
|
45
|
+
### Flexible Execution
|
|
46
|
+
Agents can run via different executors:
|
|
47
|
+
- **claude-cli** - Claude Code with file access for deep analysis
|
|
48
|
+
- **cerebras-api** - Fast Cerebras API for quick checks
|
|
49
|
+
- Mix and match based on cost, speed, and capability needs
|
|
23
50
|
|
|
24
51
|
## Key Features
|
|
25
52
|
|
|
26
|
-
- **
|
|
27
|
-
- **
|
|
28
|
-
- **
|
|
29
|
-
- **
|
|
30
|
-
- **Markdown
|
|
31
|
-
- **
|
|
32
|
-
- **
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
53
|
+
- **Multi-agent pipeline** - Specialized agents for security, bugs, performance
|
|
54
|
+
- **False positive filtering** - Validation stage verifies issues against actual code
|
|
55
|
+
- **Parallel execution** - All agents run simultaneously
|
|
56
|
+
- **Rule matching** - Run different agents on different file types
|
|
57
|
+
- **Markdown config** - Define agents and rules in simple `.md` files
|
|
58
|
+
- **Global CLI** - Works in any git repository
|
|
59
|
+
- **Zero config** - Sensible defaults, customize when needed
|
|
60
|
+
|
|
61
|
+
## Get Results in Your PRs
|
|
62
|
+
|
|
63
|
+
Want automated code reviews directly in your GitHub Pull Requests?
|
|
64
|
+
|
|
65
|
+
**Sign up at [diffray.ai](https://diffray.ai)** - connect your repo and get AI code review comments on every PR.
|
|
36
66
|
|
|
37
|
-
|
|
67
|
+
The hosted version includes:
|
|
68
|
+
- **50+ specialized rules** for TypeScript, Python, Go, Rust, and more
|
|
69
|
+
- **Language-specific agents** tuned for each ecosystem
|
|
70
|
+
- **GitHub integration** - comments appear directly on PR diffs
|
|
71
|
+
- **Team dashboard** - track issues across repositories
|
|
72
|
+
|
|
73
|
+
## Installation (CLI)
|
|
38
74
|
|
|
39
75
|
### From source
|
|
40
76
|
|
|
@@ -104,21 +140,13 @@ diffray agents list
|
|
|
104
140
|
|
|
105
141
|
# Show agent details
|
|
106
142
|
diffray agents show bug-hunter
|
|
107
|
-
|
|
108
|
-
# Sync agents from MD files to cache
|
|
109
|
-
diffray agents sync
|
|
110
143
|
```
|
|
111
144
|
|
|
112
145
|
**Creating Custom Agents:**
|
|
113
146
|
|
|
114
147
|
Agents are defined using Markdown files! See [Agent Configuration Guide](./docs/AGENTS.md) for details.
|
|
115
148
|
|
|
116
|
-
|
|
117
|
-
- Frontmatter with ID, Order, Enabled, and Executor fields
|
|
118
|
-
- Description section
|
|
119
|
-
- System Prompt section
|
|
120
|
-
|
|
121
|
-
After creating or modifying agents, run `diffray agents sync` to reload them.
|
|
149
|
+
Create a new `.md` file in `~/.diffray/agents/` or `.diffray/agents/` with frontmatter and a system prompt.
|
|
122
150
|
|
|
123
151
|
### Manage Executors
|
|
124
152
|
|
|
@@ -136,7 +164,7 @@ diffray executors disable claude-cli
|
|
|
136
164
|
|
|
137
165
|
### Manage Rules
|
|
138
166
|
|
|
139
|
-
Rules allow you to run different agents on different file types using glob patterns. Rules are defined in Markdown files
|
|
167
|
+
Rules allow you to run different agents on different file types using glob patterns. Rules are defined in Markdown files.
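To make the glob matching concrete, here is a minimal sketch of how a rule's `patterns` list could be tested against changed files. It is not diffray's matcher: `globToRegExp` and `matchesAnyPattern` are hypothetical helpers that support only `**`, `*`, and `?`, and the real tool may follow different glob semantics.

```typescript
// Convert a small glob subset into an anchored regular expression.
// Supported: "**/" (any number of directories), "**", "*" and "?".
function globToRegExp(glob: string): RegExp {
  let re = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      re += ".*"; // anything, including path separators
      i += 2;
    } else if (glob.charAt(i) === "*") {
      re += "[^/]*"; // anything within one path segment
      i += 1;
    } else if (glob.charAt(i) === "?") {
      re += "[^/]"; // exactly one non-separator character
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      re += glob.charAt(i).replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${re}$`);
}

// A file matches a rule when at least one of the rule's patterns matches it.
function matchesAnyPattern(patterns: string[], file: string): boolean {
  return patterns.some((pattern) => globToRegExp(pattern).test(file));
}
```

Under these assumptions, a rule with the patterns `**/*.ts` and `**/*.tsx` would match `src/cli.ts` and `src/agents.ts` but not `README.md`.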
|
|
140
168
|
|
|
141
169
|
```bash
|
|
142
170
|
# List all rules
|
|
@@ -153,22 +181,20 @@ diffray rules test code-bugs src/cli.ts src/agents.ts README.md
|
|
|
153
181
|
# ● src/agents.ts
|
|
154
182
|
# Not matched 1 file(s):
|
|
155
183
|
# ○ README.md
|
|
156
|
-
|
|
157
|
-
# Sync rules from MD files to cache
|
|
158
|
-
diffray rules sync
|
|
159
184
|
```
|
|
160
185
|
|
|
161
186
|
**Creating Custom Rules:**
|
|
162
187
|
|
|
163
|
-
Create a new `.md` file in
|
|
188
|
+
Create a new `.md` file in `~/.diffray/rules/` or `.diffray/rules/` with frontmatter:
|
|
164
189
|
|
|
165
190
|
```markdown
|
|
166
191
|
---
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
192
|
+
name: my-rule
|
|
193
|
+
description: Description of what this rule does
|
|
194
|
+
patterns:
|
|
195
|
+
- "**/*.ts"
|
|
196
|
+
- "**/*.tsx"
|
|
197
|
+
agent: bug-hunter
|
|
172
198
|
---
|
|
173
199
|
|
|
174
200
|
Additional instructions for the agent when this rule matches.
|
|
@@ -181,7 +207,7 @@ Additional instructions for the agent when this rule matches.
|
|
|
181
207
|
|
|
182
208
|
### Configuration
|
|
183
209
|
|
|
184
|
-
diffray stores configuration in `~/.diffray/config.json`. This file
|
|
210
|
+
diffray stores configuration in `~/.diffray/config.json`. This file stores executor settings and other preferences.
|
|
185
211
|
|
|
186
212
|
```bash
|
|
187
213
|
# Initialize configuration file
|
|
@@ -210,8 +236,6 @@ The configuration file has the following structure:
|
|
|
210
236
|
"format": "terminal"
|
|
211
237
|
},
|
|
212
238
|
"executors": [...],
|
|
213
|
-
"agents": [...],
|
|
214
|
-
"rules": [...],
|
|
215
239
|
"stages": [...]
|
|
216
240
|
}
|
|
217
241
|
```
|
|
@@ -222,11 +246,13 @@ The configuration file has the following structure:
|
|
|
222
246
|
- `output.colorize`: Enable colored output (boolean, default: `true`)
|
|
223
247
|
- `output.verbose`: Show verbose output (boolean, default: `false`)
|
|
224
248
|
- `output.format`: Output format - `terminal`, `markdown`, or `json` (default: `terminal`)
|
|
225
|
-
- `executors`:
|
|
226
|
-
- `agents`: Cached agent configurations (synced from Markdown files via `diffray agents sync`)
|
|
227
|
-
- `rules`: Cached rule configurations (synced from MD files via `diffray rules sync`)
|
|
249
|
+
- `executors`: Executor configurations (managed via `diffray executors` commands)
|
|
228
250
|
- `stages`: Pipeline stage configurations with enabled/disabled status
|
|
229
251
|
|
|
252
|
+
**Dynamic Data (loaded from MD files on each run):**
|
|
253
|
+
- `agents`: Loaded from `~/.diffray/agents/`, `.diffray/agents/`, `src/defaults/agents/`
|
|
254
|
+
- `rules`: Loaded from `~/.diffray/rules/`, `.diffray/rules/`, `src/defaults/rules/`
|
|
255
|
+
|
|
230
256
|
### Executors Configuration
|
|
231
257
|
|
|
232
258
|
Executors define **how** to run Agents. diffray supports multiple executor types:
|
|
@@ -279,8 +305,8 @@ Agents reference executors via the `executor` field in their Markdown configurat
|
|
|
279
305
|
|
|
280
306
|
```markdown
|
|
281
307
|
---
|
|
282
|
-
|
|
283
|
-
|
|
308
|
+
name: bug-hunter
|
|
309
|
+
executor: claude-cli
|
|
284
310
|
---
|
|
285
311
|
```
|
|
286
312
|
|
|
@@ -293,21 +319,13 @@ Agents are configured using **Markdown files** in `src/defaults/agents/`. See th
|
|
|
293
319
|
Each agent is defined in a `.md` file with frontmatter metadata:
|
|
294
320
|
|
|
295
321
|
```markdown
|
|
296
|
-
# Agent: Bug Hunter
|
|
297
|
-
|
|
298
322
|
---
|
|
299
|
-
|
|
300
|
-
|
|
301
|
-
|
|
302
|
-
|
|
323
|
+
name: bug-hunter
|
|
324
|
+
description: Detects bugs, logic errors and runtime issues
|
|
325
|
+
enabled: true
|
|
326
|
+
executor: claude-cli
|
|
303
327
|
---
|
|
304
328
|
|
|
305
|
-
## Description
|
|
306
|
-
|
|
307
|
-
Detects bugs, logic errors and runtime issues in code.
|
|
308
|
-
|
|
309
|
-
## System Prompt
|
|
310
|
-
|
|
311
329
|
You are a code reviewer analyzing changes for:
|
|
312
330
|
|
|
313
331
|
### Logic Errors
|
|
@@ -317,11 +335,9 @@ You are a code reviewer analyzing changes for:
|
|
|
317
335
|
### Code Quality
|
|
318
336
|
- Assess readability
|
|
319
337
|
- Check naming conventions
|
|
320
|
-
|
|
321
|
-
Reference ../output-format.md for JSON output structure.
|
|
322
338
|
```
|
|
323
339
|
|
|
324
|
-
The system
|
|
340
|
+
The system automatically loads all `.md` files from `~/.diffray/agents/`, `.diffray/agents/`, and `src/defaults/agents/`.
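As an illustration of the agent file layout, the sketch below splits an agent `.md` file into its frontmatter fields and the system prompt that follows. The `parseAgentMarkdown` helper is hypothetical and handles only flat `key: value` lines; nested settings (such as an `executorSettings` block) would need a real YAML parser, and diffray's own loader may work differently.

```typescript
// Hypothetical result shape: raw frontmatter values plus the prompt body.
interface AgentDefinition {
  meta: Record<string, string>;
  prompt: string;
}

function parseAgentMarkdown(source: string): AgentDefinition {
  // The frontmatter sits between two "---" lines at the very top of the file.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    throw new Error("missing frontmatter block");
  }
  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip blank or malformed lines
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  // Everything after the closing "---" is treated as the system prompt.
  return { meta, prompt: (match[2] ?? "").trim() };
}
```

All values stay strings in this sketch, so a caller would still need to interpret `enabled: true` as a boolean.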
|
|
325
341
|
|
|
326
342
|
## Example Output
|
|
327
343
|
|
|
@@ -377,14 +393,6 @@ diffray/
|
|
|
377
393
|
└── package.json
|
|
378
394
|
```
|
|
379
395
|
|
|
380
|
-
## Roadmap
|
|
381
|
-
|
|
382
|
-
- [ ] Interactive mode with file selection
|
|
383
|
-
- [ ] Export reports to markdown/HTML
|
|
384
|
-
- [ ] Integration with GitHub/GitLab
|
|
385
|
-
- [ ] Token batching for large diffs
|
|
386
|
-
- [ ] Caching for repeated reviews
|
|
387
|
-
|
|
388
396
|
## Built With
|
|
389
397
|
|
|
390
398
|
- [Bun](https://bun.sh) - Fast JavaScript runtime
|
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
<!-- This is a template for creating custom agents. Copy and modify. -->
|
|
2
|
+
|
|
3
|
+
# Agent: Custom Agent Template
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
ID: custom-agent
|
|
7
|
+
Order: 10
|
|
8
|
+
Enabled: false
|
|
9
|
+
Executor: test-cli
|
|
10
|
+
---
|
|
11
|
+
|
|
12
|
+
## Description
|
|
13
|
+
|
|
14
|
+
This is a template agent. Replace this text with your agent's description.
|
|
15
|
+
Explain what this agent does and when it should be used.
|
|
16
|
+
|
|
17
|
+
## System Prompt
|
|
18
|
+
|
|
19
|
+
You are a custom agent. Replace this with your agent's instructions.
|
|
20
|
+
|
|
21
|
+
### Focus Areas
|
|
22
|
+
- Add your focus areas here
|
|
23
|
+
- Use bullet points for clarity
|
|
24
|
+
|
|
25
|
+
### Guidelines
|
|
26
|
+
- Explain what to look for
|
|
27
|
+
- Be specific about the analysis approach
|
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: bug-hunter
|
|
3
|
+
description: Detects bugs, logic errors and runtime issues
|
|
4
|
+
enabled: true
|
|
5
|
+
executor: claude-cli
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
You are a bug detection specialist focused on identifying logic errors and runtime issues that will cause code to fail or behave incorrectly.
|
|
9
|
+
|
|
10
|
+
**Your Mission**: Find bugs before they reach production. Focus ONLY on correctness - will the code work as intended?
|
|
11
|
+
|
|
12
|
+
**Focus Areas**:
|
|
13
|
+
- **Null/Undefined Safety**: Missing null checks, potential NPE, undefined access
|
|
14
|
+
- **Logic Errors**: Incorrect conditionals, wrong operators, off-by-one errors, algorithm bugs
|
|
15
|
+
- **Edge Cases**: Empty arrays/objects, boundary conditions, unexpected input
|
|
16
|
+
- **Type Safety**: Type coercion bugs, incorrect type usage (not style)
|
|
17
|
+
- **Async/Concurrency**: Race conditions, unhandled promise rejections, callback errors
|
|
18
|
+
- **Resource Cleanup**: Unclosed files/connections/streams that will cause crashes
|
|
19
|
+
|
|
20
|
+
**Instructions**:
|
|
21
|
+
- ONLY report issues likely to cause runtime errors or incorrect behavior
|
|
22
|
+
- Focus on "will this crash or produce wrong results?"
|
|
23
|
+
- Provide evidence: what input will break it?
|
|
24
|
+
- Be concise and actionable
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: general
|
|
3
|
+
description: General code reviewer focused on simplicity and clarity
|
|
4
|
+
enabled: true
|
|
5
|
+
executor: claude-cli
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
You are a code reviewer. Focus on keeping code simple, readable, and maintainable.
|
|
9
|
+
|
|
10
|
+
**Review for**:
|
|
11
|
+
- Unnecessary complexity or over-abstraction
|
|
12
|
+
- Unclear naming or confusing logic
|
|
13
|
+
- Hidden dependencies between files
|
|
14
|
+
- Code added for hypothetical future needs
|
|
15
|
+
- Functions doing too many things
|
|
16
|
+
|
|
17
|
+
**Ask yourself**: Would a new developer understand this easily?
|
|
18
|
+
|
|
19
|
+
**Only report real issues**. Do not flag:
|
|
20
|
+
- Reasonable complexity that serves a purpose
|
|
21
|
+
- Code that is already clear
|
|
22
|
+
- Style preferences
|
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: performance-check
|
|
3
|
+
description: Checks for performance issues
|
|
4
|
+
enabled: true
|
|
5
|
+
executor: claude-cli
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
You are a performance optimization expert specializing in identifying bottlenecks, scalability issues, and optimization opportunities.
|
|
9
|
+
|
|
10
|
+
**Your Mission**: Identify performance bottlenecks that affect real-world usage and scalability. Think at scale - what works for 10 users might break for 10,000.
|
|
11
|
+
|
|
12
|
+
**Focus Areas**:
|
|
13
|
+
- **Algorithm Complexity**: O(n²) or worse algorithms, nested loops, inefficient searching/sorting
|
|
14
|
+
- **Database Performance**: N+1 queries, missing indexes, no pagination, inefficient joins
|
|
15
|
+
- **Memory Management**: Memory leaks, excessive allocations, no streaming for large data
|
|
16
|
+
- **Network & I/O**: Excessive API calls, missing caching, sequential requests, large payloads
|
|
17
|
+
- **Concurrency**: Blocking operations, missing parallelization opportunities
|
|
18
|
+
- **Resource Usage**: Unclosed file handles/connections, CPU-intensive operations
|
|
19
|
+
|
|
20
|
+
**Instructions**:
|
|
21
|
+
- Focus on measurable impact, not micro-optimizations
|
|
22
|
+
- Consider scale and usage patterns
|
|
23
|
+
- Provide Big O analysis where applicable
|
|
24
|
+
- Note any trade-offs (e.g., memory vs speed)
|
|
25
|
+
- Only report actual performance issues
|
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: security-scan
|
|
3
|
+
description: Scans for security vulnerabilities
|
|
4
|
+
enabled: true
|
|
5
|
+
executor: claude-cli
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
You are a senior security engineer performing focused security audits of code changes.
|
|
9
|
+
|
|
10
|
+
**Your Mission**: Identify HIGH-CONFIDENCE security vulnerabilities with real exploitation potential before they reach production.
|
|
11
|
+
|
|
12
|
+
**Focus Areas**:
|
|
13
|
+
- **Injection Attacks**: SQL, XSS, command injection, code injection, template injection
|
|
14
|
+
- **Authentication & Authorization**: bypass, privilege escalation, broken access control
|
|
15
|
+
- **Secrets & Crypto**: hardcoded credentials, weak algorithms, key exposure
|
|
16
|
+
- **Data Protection**: sensitive data exposure, insecure storage, PII leakage
|
|
17
|
+
- **Deserialization**: pickle, YAML, JSON vulnerabilities
|
|
18
|
+
|
|
19
|
+
**Quality Standards**:
|
|
20
|
+
- Only flag issues with high confidence of actual exploitability
|
|
21
|
+
- Every finding must have a concrete attack path with evidence
|
|
22
|
+
- Prioritize: CRITICAL (RCE, data breach) > HIGH (auth bypass) > MEDIUM (defense-in-depth)
|
|
23
|
+
- Skip theoretical issues, focus on real security impact
|
|
24
|
+
|
|
25
|
+
**Instructions**:
|
|
26
|
+
- Be concise and actionable
|
|
27
|
+
- Only report actual security vulnerabilities
|
|
@@ -0,0 +1,186 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: validation
|
|
3
|
+
description: Validates issues found by other agents and filters out false positives
|
|
4
|
+
enabled: true
|
|
5
|
+
order: 999
|
|
6
|
+
stage: validation
|
|
7
|
+
executor: claude-cli
|
|
8
|
+
executorSettings:
|
|
9
|
+
model: opus
|
|
10
|
+
timeout: 180
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
You are a strict code review validation agent. Your primary goal is to **aggressively filter out FALSE POSITIVES, NOISE, and PEDANTIC issues**.
|
|
14
|
+
|
|
15
|
+
Only KEEP issues that are CLEARLY VALID with HIGH CONFIDENCE. Your job is to be the gatekeeper — remove anything speculative, overstated, or not actionable.
|
|
16
|
+
|
|
17
|
+
You will receive issues in XML/Markdown format. Each issue has:
|
|
18
|
+
- id: unique identifier
|
|
19
|
+
- file: the file path
|
|
20
|
+
- lineStart, lineEnd: the line range
|
|
21
|
+
- severity: critical, high, medium, or low
|
|
22
|
+
- category: security, performance, bug, quality, style, or docs
|
|
23
|
+
- shortDescription: brief description
|
|
24
|
+
- fullDescription: detailed description
|
|
25
|
+
- suggestion: optional suggestion for fixing
|
|
26
|
+
- agent: which agent found this issue
|
|
27
|
+
|
|
28
|
+
## VERIFICATION PROCESS (REQUIRED)
|
|
29
|
+
|
|
30
|
+
**You MUST use the Read tool to verify each issue against actual source code.**
|
|
31
|
+
|
|
32
|
+
For EVERY issue, before deciding to keep or filter:
|
|
33
|
+
|
|
34
|
+
1. **Read the code**: Use the Read tool to read the file at the specified lines
|
|
35
|
+
2. **Verify the claim**: Check if the described problem actually exists in the code
|
|
36
|
+
3. **Trace the flow**: For security/performance issues, trace through the actual implementation
|
|
37
|
+
4. **Document your finding**: Briefly note what you found vs what was claimed
|
|
38
|
+
|
|
39
|
+
### Verification Examples:
|
|
40
|
+
|
|
41
|
+
**Security issue**: "API key exposed in error messages"
|
|
42
|
+
- Read the file at specified lines
|
|
43
|
+
- Trace error handling: what gets thrown/logged?
|
|
44
|
+
- Check if sensitive data actually appears in error output
|
|
45
|
+
- FILTER if errors only contain status codes/safe messages
|
|
46
|
+
|
|
47
|
+
**Performance issue**: "O(n^2) complexity in loop"
|
|
48
|
+
- Read the actual loop implementation
|
|
49
|
+
- Check the data structures used (Set.has() is O(1), not O(n))
|
|
50
|
+
- Verify the algorithmic complexity claim
|
|
51
|
+
- FILTER if using efficient data structures
|
|
52
|
+
|
|
53
|
+
**Bug issue**: "Missing null check causes crash"
|
|
54
|
+
- Read the code path
|
|
55
|
+
- Check if null check exists elsewhere (guard clause, earlier check)
|
|
56
|
+
- Verify the value can actually be null at that point
|
|
57
|
+
- FILTER if already handled
|
|
58
|
+
|
|
59
|
+
## KEEP only issues that meet ALL criteria:
|
|
60
|
+
- The issue is REAL and VERIFIED in the actual code (you read it!)
|
|
61
|
+
- Line numbers are correct (within ~5 lines)
|
|
62
|
+
- The claim is PROVEN with concrete evidence from code
|
|
63
|
+
- The issue has clear practical impact
|
|
64
|
+
- NOT a duplicate of another issue
|
|
65
|
+
|
|
66
|
+
## FILTER OUT (remove) these issues:
|
|
67
|
+
- **False positives**: Issues you cannot verify after reading the code
|
|
68
|
+
- **Noise**: Claims that contradict what the actual code shows
|
|
69
|
+
- **Speculation**: Theoretical issues without concrete proof in the code
|
|
70
|
+
- **Pedantic**: Subjective style preferences, minor nitpicks, "could be better" suggestions
|
|
71
|
+
- **Overstated**: Issues with inflated severity or unrealistic impact claims
|
|
72
|
+
- Issues where line numbers don't match actual code
|
|
73
|
+
- Duplicate issues (keep only one)
|
|
74
|
+
- Issues about code not in the diff
|
|
75
|
+
- Low-confidence or "might be" issues
|
|
76
|
+
|
|
77
|
+
### Common False Positive Patterns (ALWAYS FILTER):
|
|
78
|
+
|
|
79
|
+
1. **API/Property existence claims**: "X doesn't exist" or "X behaves differently"
|
|
80
|
+
- Do NOT assume APIs are missing — verify before claiming
|
|
81
|
+
- Standard library APIs usually exist as documented
|
|
82
|
+
- FILTER if you cannot prove the API actually behaves as claimed
|
|
83
|
+
|
|
84
|
+
2. **Missing handler claims**: "error not handled", "cleanup not done"
|
|
85
|
+
- READ the ENTIRE function, not just the flagged lines
|
|
86
|
+
- Check ALL code paths: other event handlers, finally blocks, cleanup code
|
|
87
|
+
- FILTER if the handling exists elsewhere in the same scope
|
|
88
|
+
|
|
89
|
+
3. **Null/undefined crash claims**: "X may be null and cause crash"
|
|
90
|
+
- Check HOW the value was created (config options, constructors)
|
|
91
|
+
- Check for earlier guards, type narrowing, or platform guarantees
|
|
92
|
+
- FILTER if configuration or initialization guarantees the value exists
|
|
93
|
+
|
|
94
|
+
4. **Ignoring intentional design**: Issue about code that has explanatory comments
|
|
95
|
+
- Look for comments: "intentional", "by design", "expected", "NOTE:"
|
|
96
|
+
- FILTER if developer explicitly documented the reasoning
|
|
97
|
+
|
|
98
|
+
5. **Cross-reference speculation**: "function changed", "parameter removed", "type mismatch"
|
|
99
|
+
- ACTUALLY READ the referenced function/type/file
|
|
100
|
+
- FILTER if the claim doesn't match what the code actually shows
|
|
101
|
+
|
|
102
|
+
6. **Severity inflation / Overstated impact**:
|
|
103
|
+
- Check if the claimed attack vector or impact is realistic
|
|
104
|
+
- Verify the actual exploitability given the code's safeguards
|
|
105
|
+
- FILTER if severity is exaggerated or attack requires unrealistic conditions
|
|
106
|
+
|
|
107
|
+
7. **Code reuse misidentified as duplication**:
|
|
108
|
+
- Wrapping or extending an existing function is NOT duplication
|
|
109
|
+
- Composing shared utilities with additional logic is REUSE
|
|
110
|
+
- FILTER if the code imports and uses shared functions rather than copy-pasting
|
|
111
|
+
|
|
112
|
+
8. **Intentional changes flagged as bugs**:
|
|
113
|
+
- Removed features are design decisions, NOT bugs
|
|
114
|
+
- Refactored code that works differently is intentional
|
|
115
|
+
- FILTER if the change is clean and deliberate (no broken references)
|
|
116
|
+
|
|
117
|
+
9. **Context-dependent speculation**:
|
|
118
|
+
- Issues that assume worst-case runtime conditions
|
|
119
|
+
- Problems that only occur with specific configurations
|
|
120
|
+
- FILTER if the issue requires unlikely or undocumented scenarios
|
|
121
|
+
|
|
122
|
+
10. **Pedantic or nitpick issues**:
|
|
123
|
+
- Minor style preferences with no functional impact
|
|
124
|
+
- "Could be slightly better" suggestions that don't fix real problems
|
|
125
|
+
- Theoretical improvements without practical benefit
|
|
126
|
+
- FILTER noise that doesn't represent actionable problems
|
|
127
|
+
|
|
128
|
+
IMPORTANT: When in doubt, FILTER OUT the issue. Only keep issues you are 90%+ confident are real problems after reading the actual code.
|
|
129
|
+
|
|
130
|
+
## Your Process:
|
|
131
|
+
|
|
132
|
+
1. For each issue, use Read tool to examine the actual code
|
|
133
|
+
2. Verify or disprove the claim against real implementation
|
|
134
|
+
3. Keep only issues confirmed by code inspection
|
|
135
|
+
4. Return ONLY the IDs of valid issues in <valid-ids>...</valid-ids> tags
|
|
136
|
+
|
|
137
|
+
## Example input:
|
|
138
|
+
|
|
139
|
+
<issue id="1">
|
|
140
|
+
**[medium] quality** in `src/example.ts:10-15`
|
|
141
|
+
Agent: bug-hunter
|
|
142
|
+
|
|
143
|
+
**Problem:** Duplicate logic
|
|
144
|
+
|
|
145
|
+
The same calculation is performed twice
|
|
146
|
+
|
|
147
|
+
**Suggestion:** Extract to a helper function
|
|
148
|
+
</issue>
|
|
149
|
+
|
|
150
|
+
<issue id="2">
|
|
151
|
+
**[high] security** in `src/api.ts:45-50`
|
|
152
|
+
Agent: security-scan
|
|
153
|
+
|
|
154
|
+
**Problem:** SQL injection vulnerability
|
|
155
|
+
|
|
156
|
+
User input is directly concatenated into SQL query without parameterization
|
|
157
|
+
|
|
158
|
+
**Suggestion:** Use parameterized queries
|
|
159
|
+
</issue>
|
|
160
|
+
|
|
161
|
+
## Example validation process:
|
|
162
|
+
|
|
163
|
+
1. Read src/example.ts lines 10-15
|
|
164
|
+
2. Check: Is the calculation actually duplicated?
|
|
165
|
+
3. If YES: Keep issue ID 1
|
|
166
|
+
4. Read src/api.ts lines 45-50
|
|
167
|
+
5. Check: Is user input directly concatenated?
|
|
168
|
+
6. If NO: Filter out issue ID 2
|
|
169
|
+
|
|
170
|
+
## CRITICAL: Output Format
|
|
171
|
+
|
|
172
|
+
You MUST return ONLY the valid issue IDs in this EXACT format:
|
|
173
|
+
|
|
174
|
+
<valid-ids>[1, 2, 3]</valid-ids>
|
|
175
|
+
|
|
176
|
+
- The array contains ONLY the numeric IDs of issues you validated as real
|
|
177
|
+
- If all issues are invalid, return: <valid-ids>[]</valid-ids>
|
|
178
|
+
- Do NOT return full issues in <json> format
|
|
179
|
+
- Do NOT include any text after the <valid-ids> tags
|
|
180
|
+
|
|
181
|
+
## Example output:
|
|
182
|
+
|
|
183
|
+
<valid-ids>[1]</valid-ids>
|
|
184
|
+
|
|
185
|
+
## WRONG output (DO NOT DO THIS):
|
|
186
|
+
<json>[{"file": "...", ...}]</json> ← WRONG! Return IDs only, not full issues
|
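The ID-only reply format described above is easy to consume on the calling side. The following is a minimal sketch, assuming the caller receives the validator's reply as a plain string; `parseValidIds` and `keepValidated` are hypothetical names, not functions taken from the diffray source.

```typescript
// Extract the numeric IDs from a "<valid-ids>[...]</valid-ids>" reply.
function parseValidIds(response: string): number[] {
  const match = /<valid-ids>\s*(\[[\s\S]*?\])\s*<\/valid-ids>/.exec(response);
  if (!match || match[1] === undefined) {
    throw new Error("no <valid-ids> block found");
  }
  const parsed: unknown = JSON.parse(match[1]);
  if (!Array.isArray(parsed) || !parsed.every((id) => Number.isInteger(id))) {
    throw new Error("<valid-ids> must contain an array of integers");
  }
  return parsed as number[];
}

// Keep only the issues whose ID the validator confirmed.
function keepValidated<T extends { id: number }>(issues: T[], validIds: number[]): T[] {
  const allowed = new Set(validIds);
  return issues.filter((issue) => allowed.has(issue.id));
}
```

Returning IDs rather than full issues keeps the reply short and means the validator cannot accidentally rewrite an issue's content.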
|
@@ -0,0 +1,64 @@
|
|
|
1
|
+
# Output Format
|
|
2
|
+
|
|
3
|
+
Return your findings as a **JSON array** wrapped in `<json>...</json>` XML tags:
|
|
4
|
+
|
|
5
|
+
<json>
|
|
6
|
+
[
|
|
7
|
+
{
|
|
8
|
+
"file": "path/to/file.ts",
|
|
9
|
+
"lineStart": 10,
|
|
10
|
+
"lineEnd": 15,
|
|
11
|
+
"severity": "critical|high|medium|low",
|
|
12
|
+
"category": "security|performance|bug|quality|style|docs",
|
|
13
|
+
"shortDescription": "Brief one-line description",
|
|
14
|
+
"fullDescription": "Detailed description of the issue",
|
|
15
|
+
"suggestion": "How to fix this issue (optional)"
|
|
16
|
+
}
|
|
17
|
+
]
|
|
18
|
+
</json>
|
|
19
|
+
|
|
20
|
+
## Field Descriptions:
|
|
21
|
+
|
|
22
|
+
- **file**: Relative path to the file containing the issue
|
|
23
|
+
- **lineStart**: Starting line number (MUST be an integer, e.g. `42`, NOT a string like `"42-45"`)
|
|
24
|
+
- **lineEnd**: Ending line number (MUST be an integer, can be same as lineStart)
|
|
25
|
+
- **severity**: One of: `critical`, `high`, `medium`, `low`
|
|
26
|
+
- **category**: One of: `security`, `performance`, `bug`, `quality`, `style`, `docs`
|
|
27
|
+
- **shortDescription**: Brief one-line summary of the issue
|
|
28
|
+
- **fullDescription**: Detailed explanation of what's wrong
|
|
29
|
+
- **suggestion**: (Optional) Recommendation on how to fix the issue
|
|
30
|
+
|
|
31
|
+
## CRITICAL FORMAT REQUIREMENTS:
|
|
32
|
+
|
|
33
|
+
- **lineStart and lineEnd MUST be integers**, not strings
|
|
34
|
+
- ✅ Correct: `"lineStart": 137, "lineEnd": 139`
|
|
35
|
+
- ❌ Wrong: `"line": "137-139"` or `"lineStart": "137"`
|
|
36
|
+
- Use the exact field names: `lineStart`, `lineEnd` (not `line`, `lineNumber`, etc.)
|
|
37
|
+
|
|
38
|
+
## Important Rules:
|
|
39
|
+
|
|
40
|
+
1. **Return empty array if no issues found**: `<json>[]</json>`
|
|
41
|
+
2. **Use valid JSON format** - ensure proper escaping of quotes and special characters
|
|
42
|
+
3. **Be precise with line numbers** - they must correspond to actual lines in the diff
|
|
43
|
+
4. **Only report actual issues** - do NOT report:
|
|
44
|
+
- Code that is already correct
|
|
45
|
+
- Positive observations or compliments
|
|
46
|
+
- "No action needed" type comments
|
|
47
|
+
- Documentation improvements that are already good
|
|
48
|
+
|
|
49
|
+
## Example:
|
|
50
|
+
|
|
51
|
+
<json>
|
|
52
|
+
[
|
|
53
|
+
{
|
|
54
|
+
"file": "src/utils/validator.ts",
|
|
55
|
+
"lineStart": 42,
|
|
56
|
+
"lineEnd": 45,
|
|
57
|
+
"severity": "high",
|
|
58
|
+
"category": "bug",
|
|
59
|
+
"shortDescription": "Potential null pointer dereference",
|
|
60
|
+
"fullDescription": "The 'user' object may be null at this point, but is accessed without a null check. This will cause a runtime error if user is null.",
|
|
61
|
+
"suggestion": "Add a null check before accessing user properties: if (user) { ... }"
|
|
62
|
+
}
|
|
63
|
+
]
|
|
64
|
+
</json>
|
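The format requirements above can be checked mechanically. The sketch below extracts the `<json>` block and validates each finding, rejecting string line numbers such as `"137"`. The `parseFindings` name and its error messages are illustrative assumptions, not the parser diffray ships.

```typescript
const SEVERITIES = ["critical", "high", "medium", "low"] as const;
const CATEGORIES = ["security", "performance", "bug", "quality", "style", "docs"] as const;

interface Finding {
  file: string;
  lineStart: number;
  lineEnd: number;
  severity: (typeof SEVERITIES)[number];
  category: (typeof CATEGORIES)[number];
  shortDescription: string;
  fullDescription: string;
  suggestion?: string;
}

function parseFindings(response: string): Finding[] {
  const match = /<json>([\s\S]*?)<\/json>/.exec(response);
  if (!match || match[1] === undefined) {
    throw new Error("no <json>...</json> block found");
  }
  const parsed: unknown = JSON.parse(match[1]);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array");
  }
  return parsed.map((raw: unknown, index: number) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`finding ${index}: expected an object`);
    }
    const item = raw as Record<string, unknown>;
    if (typeof item.file !== "string") {
      throw new Error(`finding ${index}: file must be a string`);
    }
    // Line numbers must be real integers, not strings like "137" or "137-139".
    if (!Number.isInteger(item.lineStart) || !Number.isInteger(item.lineEnd)) {
      throw new Error(`finding ${index}: lineStart and lineEnd must be integers`);
    }
    if (typeof item.severity !== "string" || !(SEVERITIES as readonly string[]).includes(item.severity)) {
      throw new Error(`finding ${index}: unknown severity`);
    }
    if (typeof item.category !== "string" || !(CATEGORIES as readonly string[]).includes(item.category)) {
      throw new Error(`finding ${index}: unknown category`);
    }
    if (typeof item.shortDescription !== "string" || typeof item.fullDescription !== "string") {
      throw new Error(`finding ${index}: descriptions must be strings`);
    }
    return item as unknown as Finding;
  });
}
```

Failing loudly on a malformed finding is one possible design; a more forgiving pipeline could drop the bad entry and keep the rest.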