planflow-ai 1.3.0 → 1.3.2

@@ -0,0 +1,289 @@
1
+
2
+ # Review Multi-Agent Parallel Review
3
+
4
+ ## Purpose
5
+
6
+ For large changesets (500+ lines, Deep mode), split the review into specialized subagents running in parallel. Each subagent focuses on a single concern, producing deeper findings than a single-pass review. A coordinator merges, deduplicates, verifies, and ranks the results.
7
+
8
+ **Scope**: `/review-code` and `/review-pr` — activated only when adaptive depth selects **Deep** mode (500+ lines).
9
+
10
+ **Goal**: Produce higher-quality reviews of large PRs by eliminating context switching between security, logic, performance, and pattern-compliance concerns.
11
+
12
+ ---
13
+
14
+ ## When to Activate
15
+
16
+ | Review Mode | Multi-Agent? |
17
+ |-------------|-------------|
18
+ | Lightweight (< 50 lines) | No |
19
+ | Standard (50–499 lines) | No |
20
+ | Deep (500+ lines) | **Yes** |
21
+
22
+ Multi-agent review is part of the Deep mode pipeline. It replaces the single-pass analysis steps with parallel subagent execution.
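The activation rule can be sketched as a small function. This is only an illustration: it assumes changed-line count is the sole input (the adaptive depth document may weigh other factors), it treats exactly 500 lines as Deep to match the "500+" wording, and the function names are hypothetical.

```python
def select_review_mode(changed_lines: int) -> str:
    """Map a changeset size to a review mode using the table's thresholds."""
    if changed_lines < 0:
        raise ValueError("changed_lines must be non-negative")
    if changed_lines < 50:
        return "Lightweight"
    if changed_lines < 500:
        return "Standard"
    return "Deep"  # assumption: exactly 500 lines counts as Deep ("500+")


def uses_multi_agent(changed_lines: int) -> bool:
    """Multi-agent review is enabled only when the mode is Deep."""
    return select_review_mode(changed_lines) == "Deep"
```

For example, `uses_multi_agent(1200)` is true while `uses_multi_agent(499)` is false.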
23
+
24
+ ---
25
+
26
+ ## Architecture
27
+
28
+ ```
29
+ Coordinator (main agent)
30
+
31
+ ├─► Subagent: Security Review (parallel)
32
+ ├─► Subagent: Logic & Bugs Review (parallel)
33
+ ├─► Subagent: Performance Review (parallel)
34
+ └─► Subagent: Pattern Compliance (parallel)
35
+
36
+
37
+ Coordinator: Collect → Deduplicate → Verify → Re-Rank → Output
38
+ ```
39
+
40
+ ---
41
+
42
+ ## Subagent Definitions
43
+
44
+ ### 1. Security Review Agent
45
+
46
+ **Focus**: Vulnerabilities, hardcoded secrets, auth bypass, injection (SQL/XSS/command), OWASP top 10, exposed credentials, insecure deserialization, missing CSRF protection.
47
+
48
+ **Model**: sonnet
49
+
50
+ **Prompt template**:
51
+ ```
52
+ You are a security-focused code reviewer. Analyze the provided diff ONLY for security vulnerabilities.
53
+
54
+ Check for:
55
+ - Hardcoded secrets, API keys, tokens
56
+ - SQL/NoSQL injection
57
+ - XSS vulnerabilities
58
+ - Command injection
59
+ - Authentication/authorization bypass
60
+ - Insecure deserialization
61
+ - Missing CSRF protection
62
+ - Exposed sensitive data in logs or responses
63
+ - Insecure cryptographic practices
64
+
65
+ IGNORE: code style, performance, naming conventions, test coverage.
66
+
67
+ Return findings as a JSON array. Each finding must have:
68
+ - file: string (file path)
69
+ - line: number (line number)
70
+ - severity: "Critical" | "Major" | "Minor"
71
+ - title: string (short finding name)
72
+ - description: string (detailed explanation)
73
+ - suggested_fix: string (code suggestion)
74
+ - confidence: number (0.0-1.0)
75
+ ```
76
+
77
+ ### 2. Logic & Bugs Review Agent
78
+
79
+ **Focus**: Edge cases, null/undefined handling, off-by-one errors, race conditions, incorrect boolean logic, infinite loops, unreachable code, wrong return types, missing error handling.
80
+
81
+ **Model**: sonnet
82
+
83
+ **Prompt template**:
84
+ ```
85
+ You are a logic-focused code reviewer. Analyze the provided diff ONLY for logic bugs and edge cases.
86
+
87
+ Check for:
88
+ - Null/undefined access without guards
89
+ - Off-by-one errors in loops and slicing
90
+ - Race conditions in async code
91
+ - Incorrect boolean logic (wrong operator, inverted condition)
92
+ - Infinite loops or recursion without base case
93
+ - Unreachable code paths
94
+ - Wrong return types or missing returns
95
+ - Unhandled promise rejections
96
+ - Missing error handling on fallible operations
97
+
98
+ IGNORE: security vulnerabilities, performance, code style, naming.
99
+
100
+ Return findings as a JSON array. Each finding must have:
101
+ - file: string (file path)
102
+ - line: number (line number)
103
+ - severity: "Critical" | "Major" | "Minor"
104
+ - title: string (short finding name)
105
+ - description: string (detailed explanation)
106
+ - suggested_fix: string (code suggestion)
107
+ - confidence: number (0.0-1.0)
108
+ ```
109
+
110
+ ### 3. Performance Review Agent
111
+
112
+ **Focus**: N+1 queries, memory leaks, unnecessary re-renders, blocking I/O on main thread, excessive allocations, missing pagination, inefficient algorithms, large bundle impacts.
113
+
114
+ **Model**: sonnet
115
+
116
+ **Prompt template**:
117
+ ```
118
+ You are a performance-focused code reviewer. Analyze the provided diff ONLY for performance issues.
119
+
120
+ Check for:
121
+ - N+1 database queries
122
+ - Memory leaks (event listeners not removed, unclosed resources)
123
+ - Unnecessary re-renders (React) or recomputations
124
+ - Blocking I/O on main thread
125
+ - Excessive object/array allocations in hot paths
126
+ - Missing pagination on unbounded queries
127
+ - O(n²) or worse algorithms where O(n) is possible
128
+ - Large synchronous operations that should be async
129
+ - Bundle size impacts (large imports that could be lazy-loaded)
130
+
131
+ IGNORE: security vulnerabilities, logic bugs, code style, naming.
132
+
133
+ Return findings as a JSON array. Each finding must have:
134
+ - file: string (file path)
135
+ - line: number (line number)
136
+ - severity: "Major" | "Minor" | "Suggestion"
137
+ - title: string (short finding name)
138
+ - description: string (detailed explanation)
139
+ - suggested_fix: string (code suggestion)
140
+ - confidence: number (0.0-1.0)
141
+ ```
142
+
143
+ ### 4. Pattern Compliance Review Agent
144
+
145
+ **Focus**: Violations of `forbidden-patterns.md`, deviations from `allowed-patterns.md`, naming inconsistencies, structural pattern conflicts with existing codebase.
146
+
147
+ **Model**: haiku
148
+
149
+ **Prompt template**:
150
+ ```
151
+ You are a pattern compliance reviewer. Analyze the provided diff against the project's coding standards.
152
+
153
+ Forbidden patterns to check (violations of these are findings):
154
+ {contents of forbidden-patterns.md Project Anti-Patterns section}
155
+
156
+ Allowed patterns to verify (deviations from these are findings):
157
+ {contents of allowed-patterns.md Project Patterns section}
158
+
159
+ Also check for:
160
+ - Naming inconsistencies with existing codebase conventions
161
+ - Import organization deviations
162
+ - Error handling pattern deviations
163
+ - Export pattern inconsistencies
164
+
165
+ IGNORE: security vulnerabilities, logic bugs, performance issues.
166
+
167
+ Return findings as a JSON array. Each finding must have:
168
+ - file: string (file path)
169
+ - line: number (line number)
170
+ - severity: "Minor" | "Suggestion"
171
+ - title: string (short finding name)
172
+ - description: string (detailed explanation with pattern reference)
173
+ - suggested_fix: string (code suggestion)
174
+ - confidence: number (0.0-1.0)
175
+ ```
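All four templates ask for the same JSON shape and differ only in the severities they allow. A coordinator could validate each response along these lines. This is a sketch, not the package's implementation: the `Finding` class, the `category` field, and the choice to skip malformed entries rather than fail are all assumptions.

```python
import json
from dataclasses import dataclass

# Severity values each agent's prompt template allows.
ALLOWED_SEVERITIES = {
    "security": {"Critical", "Major", "Minor"},
    "logic": {"Critical", "Major", "Minor"},
    "performance": {"Major", "Minor", "Suggestion"},
    "patterns": {"Minor", "Suggestion"},
}


@dataclass
class Finding:
    file: str
    line: int
    severity: str
    title: str
    description: str
    suggested_fix: str
    confidence: float
    category: str  # hypothetical extra field: which agent reported it


def parse_findings(raw_json: str, category: str) -> list[Finding]:
    """Parse one subagent's JSON array, skipping malformed entries."""
    findings = []
    for item in json.loads(raw_json):
        try:
            finding = Finding(
                file=str(item["file"]),
                line=int(item["line"]),
                severity=str(item["severity"]),
                title=str(item["title"]),
                description=str(item["description"]),
                suggested_fix=str(item["suggested_fix"]),
                confidence=float(item["confidence"]),
                category=category,
            )
        except (KeyError, TypeError, ValueError):
            continue  # a required field is missing or has the wrong type
        if finding.severity not in ALLOWED_SEVERITIES[category]:
            continue  # severity outside what this agent's template permits
        if not 0.0 <= finding.confidence <= 1.0:
            continue
        findings.append(finding)
    return findings
```

Dropping invalid entries keeps one bad item from discarding a whole agent's output; a stricter coordinator might re-prompt the agent instead.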
176
+
177
+ ---
178
+
179
+ ## Subagent Input
180
+
181
+ Each subagent receives:
182
+
183
+ 1. **The diff** — For review-code: output of `git diff`. For review-pr: output of `gh pr diff` or Azure DevOps diff.
184
+ 2. **File categorization** — The file-to-category mapping from adaptive depth Step 1 (Core Logic, Infrastructure, UI, Tests)
185
+ 3. **Category-specific context** — Only the pattern files relevant to the subagent's focus
186
+ 4. **Instructions** — The subagent-specific prompt template above
187
+
188
+ For very large diffs (2000+ lines), the coordinator may split the diff by file category and send each subagent only its most relevant files:
189
+ - Security agent → all files (security issues can be anywhere)
190
+ - Logic agent → Core Logic + Infrastructure files
191
+ - Performance agent → Core Logic + UI files
192
+ - Pattern agent → all files
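The routing above could look roughly like this. It is a sketch under assumptions: the source says the coordinator "may" split, so always splitting at 2000 lines is a simplification, and the agent keys and function name are hypothetical.

```python
# Categories each agent receives when a very large diff is split.
# None means "all files".
AGENT_CATEGORIES = {
    "security": None,
    "logic": {"Core Logic", "Infrastructure"},
    "performance": {"Core Logic", "UI"},
    "patterns": None,
}

LARGE_DIFF_LINES = 2000


def files_for_agent(agent: str, file_categories: dict, total_lines: int) -> list:
    """Return the file paths an agent should review.

    file_categories maps each path to its adaptive-depth category.
    """
    wanted = AGENT_CATEGORIES[agent]
    if total_lines < LARGE_DIFF_LINES or wanted is None:
        return sorted(file_categories)  # below the threshold nothing is filtered
    return sorted(path for path, cat in file_categories.items() if cat in wanted)
```

With this mapping, test files reach only the security and pattern agents once a diff crosses the threshold.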
193
+
194
+ ---
195
+
196
+ ## Coordinator Behavior
197
+
198
+ The coordinator (main agent) orchestrates the entire flow:
199
+
200
+ ### Step 1: Spawn Subagents
201
+
202
+ Launch all 4 subagents in parallel using the Agent tool. Each subagent uses `subagent_type: "general-purpose"` with the appropriate model override.
203
+
204
+ ```
205
+ Launch in parallel:
206
+ - Agent(model: "sonnet", prompt: security_prompt)
207
+ - Agent(model: "sonnet", prompt: logic_prompt)
208
+ - Agent(model: "sonnet", prompt: performance_prompt)
209
+ - Agent(model: "haiku", prompt: patterns_prompt)
210
+ ```
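The fan-out and wait pattern can be illustrated outside the agent runtime with a thread pool. This is only an analogy: `run_agent` is a hypothetical stand-in for the Agent tool call, not a real API, and the actual launch happens through the tool rather than Python.

```python
from concurrent.futures import ThreadPoolExecutor

# (agent name, model) pairs taken from the launch list above.
AGENTS = [
    ("security", "sonnet"),
    ("logic", "sonnet"),
    ("performance", "sonnet"),
    ("patterns", "haiku"),
]


def run_agent(name: str, model: str, diff: str) -> str:
    """Hypothetical stand-in for the Agent tool; returns a JSON array string."""
    return "[]"


def run_all_agents(diff: str, runner=run_agent) -> dict[str, str]:
    """Start all four reviews at once and wait for every one to finish."""
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = {
            name: pool.submit(runner, name, model, diff)
            for name, model in AGENTS
        }
        # result() blocks, so the dict is complete only when all agents are done.
        return {name: future.result() for name, future in futures.items()}
```

The coordinator does no merging until every result is in, which matches Step 2.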
211
+
212
+ ### Step 2: Collect Results
213
+
214
+ Wait for all subagents to complete. Parse the JSON findings arrays from each.
215
+
216
+ ### Step 3: Deduplicate
217
+
218
+ Scan for overlapping findings (same file, flagged lines within ±5 of each other, and a similar description):
219
+
220
+ | Overlap Type | Resolution |
221
+ |-------------|------------|
222
+ | Exact match (same file, same line, same issue) | Merge into one finding, note both categories |
223
+ | Near match (same file, ±5 lines, similar issue) | Merge if clearly the same root cause |
224
+ | Different aspects of same code | Keep as separate findings |
225
+
226
+ When merging:
227
+ - Use the **higher severity** from the overlapping findings
228
+ - Use the **higher confidence** score
229
+ - Combine descriptions from both agents
230
+ - Note all contributing categories in the finding
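A minimal sketch of the merge rules follows. Text similarity from `difflib` is used as a crude proxy for "similar issue", and the 0.6 cutoff is an assumption since the source gives no number; judging "the same root cause" really needs the coordinator's own reasoning. The `Finding` class here is a trimmed, hypothetical version of the finding schema.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

SEVERITY_ORDER = ["Suggestion", "Minor", "Major", "Critical"]  # low to high
LINE_WINDOW = 5
SIMILARITY_THRESHOLD = 0.6  # assumption: the source gives no numeric cutoff


@dataclass
class Finding:
    file: str
    line: int
    severity: str
    description: str
    confidence: float
    categories: list = field(default_factory=list)


def is_duplicate(a: Finding, b: Finding) -> bool:
    """Same file, lines within the window, and similar wording."""
    if a.file != b.file or abs(a.line - b.line) > LINE_WINDOW:
        return False
    ratio = SequenceMatcher(None, a.description.lower(), b.description.lower()).ratio()
    return ratio >= SIMILARITY_THRESHOLD


def merge(a: Finding, b: Finding) -> Finding:
    """Keep the higher severity and confidence; combine text and categories."""
    severity = max(a.severity, b.severity, key=SEVERITY_ORDER.index)
    if a.description == b.description:
        description = a.description
    else:
        description = f"{a.description} / {b.description}"
    categories = a.categories + [c for c in b.categories if c not in a.categories]
    return Finding(a.file, a.line, severity, description,
                   max(a.confidence, b.confidence), categories)


def deduplicate(findings: list) -> list:
    merged = []
    for finding in findings:
        for i, existing in enumerate(merged):
            if is_duplicate(existing, finding):
                merged[i] = merge(existing, finding)
                break
        else:
            merged.append(finding)  # no overlap with anything kept so far
    return merged
```

Findings that touch the same code but describe different problems fail the similarity check and stay separate, as the table's last row requires.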
231
+
232
+ ### Step 4: Verify
233
+
234
+ Run the standard verification pass on all deduplicated findings. See `.claude/resources/core/review-verification.md`.
235
+
236
+ ### Step 5: Re-Rank and Group
237
+
238
+ Run the standard severity re-ranking. See `.claude/resources/core/review-severity-ranking.md`.
239
+
240
+ ### Step 6: Generate Output
241
+
242
+ Use the deep review template with severity-grouped findings and executive summary. Add a Multi-Agent Summary section after Review Information:
243
+
244
+ ```markdown
245
+ ## Review Agents
246
+
247
+ | Agent | Model | Findings | After Dedup |
248
+ |-------|-------|----------|-------------|
249
+ | Security | sonnet | {N} | {N} |
250
+ | Logic & Bugs | sonnet | {N} | {N} |
251
+ | Performance | sonnet | {N} | {N} |
252
+ | Pattern Compliance | haiku | {N} | {N} |
253
+ | **Total** | | **{N}** | **{N}** |
254
+
255
+ Duplicates removed: {N}
256
+ ```
257
+
258
+ ---
259
+
260
+ ## Insertion Points
261
+
262
+ ### For review-code-skill.md
263
+
264
+ In the **Deep mode** path of Step 1b, replace the instruction "Proceed with all steps" with:
265
+
266
+ > **If Deep**: Activate multi-agent parallel review. See `.claude/resources/core/review-multi-agent.md`. Spawn 4 specialized subagents (security, logic, performance, patterns) in parallel. Coordinator collects results, deduplicates, then proceeds to Step 5b (verification), Step 5c (re-ranking), Step 6b (pattern review), and Step 6 (output using deep template).
267
+
268
+ Steps 2–5 (pattern loading, similar implementations, analysis, pattern conflicts) are handled by the subagents instead of the main agent.
269
+
270
+ ### For review-pr-skill.md
271
+
272
+ In the **Deep mode** path of Step 1b, replace the instruction "Proceed with all steps" with:
273
+
274
+ > **If Deep**: Activate multi-agent parallel review. See `.claude/resources/core/review-multi-agent.md`. Spawn 4 specialized subagents (security, logic, performance, patterns) in parallel. Coordinator collects results, deduplicates, then proceeds to Step 3b (verification), Step 3c (re-ranking), and Step 4 (output using deep template with severity grouping and executive summary).
275
+
276
+ Steps 2–3 (pattern loading, analysis) are handled by the subagents instead of the main agent.
277
+
278
+ ---
279
+
280
+ ## Related Files
281
+
282
+ | File | Purpose |
283
+ |------|---------|
284
+ | `.claude/resources/core/review-adaptive-depth.md` | Triggers Deep mode (prerequisite) |
285
+ | `.claude/resources/core/review-verification.md` | Verification pass (run by coordinator after dedup) |
286
+ | `.claude/resources/core/review-severity-ranking.md` | Re-ranking (run by coordinator after verification) |
287
+ | `.claude/resources/skills/review-code-skill.md` | Update Deep mode path in Step 1b |
288
+ | `.claude/resources/skills/review-pr-skill.md` | Update Deep mode path in Step 1b |
289
+ | `.claude/resources/patterns/review-code-templates.md` | Deep template gets Review Agents section |
@@ -0,0 +1,149 @@
1
+
2
+ # Review Severity Re-Ranking
3
+
4
+ ## Purpose
5
+
6
+ After all findings are collected and verified, re-rank them by actual impact rather than listing in file order. Group related findings across files and present critical issues first. This applies to **all review modes** (lightweight findings are already minimal, but if multiple issues are found, they should still be severity-ordered).
7
+
8
+ **Scope**: `/review-code` (Step 5c) and `/review-pr` (Step 3c). Applied after verification, before generating the output document.
9
+
10
+ **Goal**: Ensure the most impactful findings appear first. Reviewers should never have to scan past 10 minor style issues to find a critical security bug.
11
+
12
+ ---
13
+
14
+ ## Ranking Algorithm
15
+
16
+ After verification classifies findings as Confirmed or Likely, sort using this priority:
17
+
18
+ 1. **Severity** (primary): Critical → Major → Minor → Suggestion
19
+ 2. **Confidence** (secondary, within same severity): Confirmed → Likely
20
+ 3. **Fix complexity** (tertiary, within same confidence): Lower complexity first (quick wins surface earlier)
21
+
22
+ This ranking applies regardless of which file the finding came from.
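The three-level ordering maps directly onto a tuple sort key. In this sketch the dictionary keys (`severity`, `status`, `fix_complexity`) are hypothetical names, and fix complexity is assumed to be a number where lower means easier.

```python
SEVERITY_RANK = {"Critical": 0, "Major": 1, "Minor": 2, "Suggestion": 3}
STATUS_RANK = {"Confirmed": 0, "Likely": 1}


def rank_findings(findings: list) -> list:
    """Sort by severity, then verification status, then fix complexity."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f["severity"]],
            STATUS_RANK[f["status"]],
            f["fix_complexity"],  # lower complexity first
        ),
    )
```

Because Python's sort is stable, findings that tie on all three keys keep their original relative order.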
23
+
24
+ ---
25
+
26
+ ## Grouping Related Findings
27
+
28
+ Before final output, scan the sorted findings for groupable patterns:
29
+
30
+ | Pattern | Group Condition | Example Group Title |
31
+ |---------|----------------|---------------------|
32
+ | Same issue type in multiple files | ≥ 2 findings with matching issue category | "Missing input validation in 3 API endpoints" |
33
+ | Same root cause | ≥ 2 findings traceable to one underlying problem | "Inconsistent error handling (5 occurrences)" |
34
+ | Causal chain | Findings where one enables/causes another | "Auth bypass: missing check → unprotected route → data exposure" |
35
+
36
+ ### Grouping Rules
37
+
38
+ - **Only group when genuinely related** — don't force-group unrelated findings just because they share a severity level
39
+ - **Small reviews (1-3 findings)**: No grouping. Keep findings individual.
40
+ - **Use the highest severity in the group** as the group's severity level
41
+ - **Show individual occurrences** within the group with `file:line` references
42
+ - **Provide a single suggested fix** that addresses all occurrences when possible
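The simplest grouping case, the same issue type appearing in several places, might be sketched as below. It assumes each finding carries a hypothetical normalized `issue` label; root-cause and causal-chain grouping need judgment that a key lookup cannot provide and are not covered here.

```python
from collections import defaultdict

SEVERITY_RANK = {"Critical": 0, "Major": 1, "Minor": 2, "Suggestion": 3}


def group_findings(findings: list) -> list:
    """Group findings that share an issue label; small reviews stay ungrouped."""
    if len(findings) <= 3:
        return [
            {"title": f["title"], "severity": f["severity"], "occurrences": [f]}
            for f in findings
        ]

    by_issue = defaultdict(list)
    for f in findings:
        by_issue[f["issue"]].append(f)  # dicts preserve first-seen order

    groups = []
    for issue, members in by_issue.items():
        if len(members) >= 2:
            # The group takes the highest severity among its members.
            top = min((m["severity"] for m in members), key=SEVERITY_RANK.__getitem__)
            groups.append({
                "title": f"{issue} ({len(members)} occurrences)",
                "severity": top,
                "occurrences": members,
            })
        else:
            only = members[0]
            groups.append({"title": only["title"], "severity": only["severity"],
                           "occurrences": [only]})
    return groups
```

Each group keeps its members in `occurrences`, so the individual `file:line` references remain available for the output table.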
43
+
44
+ ### Grouped Finding Format
45
+
46
+ ```markdown
47
+ ### Finding N: {Group Title} ({count} occurrences)
48
+
49
+ | Field | Value |
50
+ | -------------- | ------------------------------------------------ |
51
+ | Severity | {Highest severity in group} |
52
+ | Fix Complexity | {Average complexity}/10 - {Level} |
53
+ | Pattern | {Reference to pattern from rules, if applicable} |
54
+
55
+ **Occurrences**:
56
+
57
+ | # | File | Line | Status |
58
+ |---|------|------|--------|
59
+ | 1 | `{file_path}` | {line} | {Confirmed/Likely} |
60
+ | 2 | `{file_path}` | {line} | {Confirmed/Likely} |
61
+ | 3 | `{file_path}` | {line} | {Confirmed/Likely} |
62
+
63
+ **Description**:
64
+ {Explanation of the shared issue pattern}
65
+
66
+ **Suggested Fix**:
67
+ \`\`\`{language}
68
+ // Single fix pattern that addresses all occurrences
69
+ \`\`\`
70
+ ```
71
+
72
+ ---
73
+
74
+ ## Executive Summary Trigger
75
+
76
+ | Review Mode | Trigger |
77
+ |-------------|---------|
78
+ | **Lightweight** | Never (too few findings to warrant) |
79
+ | **Standard** | When total findings ≥ 5 |
80
+ | **Deep** | Always (built into deep template) |
81
+
82
+ When triggered in standard mode, prepend this before the findings section:
83
+
84
+ ```markdown
85
+ ## Executive Summary
86
+
87
+ **Risk level**: {Low | Medium | High}
88
+
89
+ **Top issues to address**:
90
+
91
+ 1. {Finding title} ({Severity}) — `{file}:{line}`
92
+ 2. {Finding title} ({Severity}) — `{file}:{line}`
93
+ 3. {Finding title} ({Severity}) — `{file}:{line}`
94
+ ```
95
+
96
+ Show up to 3 top findings. Derive risk level from:
97
+ - **High**: Any Critical finding, or ≥ 3 Major findings
98
+ - **Medium**: Any Major finding, or ≥ 5 Minor findings
99
+ - **Low**: Everything else (no Critical or Major findings, and fewer than 5 Minor findings)
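The trigger table and risk rules can be expressed as two small functions. This is a sketch with hypothetical names; falling through to Low whenever neither the High nor the Medium condition holds is how it reads the last rule.

```python
from collections import Counter


def risk_level(severities: list) -> str:
    """Derive the risk level from the severities of all reported findings."""
    counts = Counter(severities)
    if counts["Critical"] >= 1 or counts["Major"] >= 3:
        return "High"
    if counts["Major"] >= 1 or counts["Minor"] >= 5:
        return "Medium"
    return "Low"


def needs_executive_summary(mode: str, total_findings: int) -> bool:
    """Apply the per-mode trigger for the executive summary."""
    if mode == "Deep":
        return True  # always part of the deep template
    if mode == "Standard":
        return total_findings >= 5
    return False  # Lightweight never gets one
```

The checks run from most to least severe, so a review with one Critical and many Minor findings is reported as High rather than Medium.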
100
+
101
+ ---
102
+
103
+ ## Output Structure
104
+
105
+ All review modes now use severity-grouped output instead of per-file ordering:
106
+
107
+ ```markdown
108
+ ## Critical Findings
109
+ ### 1. {Finding title}
110
+ ...
111
+
112
+ ## Major Findings
113
+ ### 2. {Finding title}
114
+ ...
115
+
116
+ ## Minor Findings
117
+ ### 3. {Finding title}
118
+ ...
119
+
120
+ ## Suggestions
121
+ ### 4. {Finding title}
122
+ ...
123
+ ```
124
+
125
+ **Empty sections**: Omit severity sections that have no findings (e.g., if no Critical findings, skip the "Critical Findings" heading entirely).
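A renderer for this structure might look like the following sketch, which assumes findings arrive as dictionaries with hypothetical `title` and `severity` keys and emits headings only. Numbering runs continuously across sections, as in the template.

```python
SECTIONS = [
    ("Critical", "Critical Findings"),
    ("Major", "Major Findings"),
    ("Minor", "Minor Findings"),
    ("Suggestion", "Suggestions"),
]


def render_sections(findings: list) -> str:
    """Emit severity sections in order, skipping any with no findings."""
    lines = []
    number = 1
    for severity, heading in SECTIONS:
        titles = [f["title"] for f in findings if f["severity"] == severity]
        if not titles:
            continue  # omit the heading entirely for empty sections
        lines.append(f"## {heading}")
        for title in titles:
            lines.append(f"### {number}. {title}")
            number += 1
        lines.append("")
    return "\n".join(lines).rstrip()
```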
126
+
127
+ ---
128
+
129
+ ## Insertion Points
130
+
131
+ ### For review-code-skill.md
132
+
133
+ Insert as **Step 5c: Re-Rank and Group Findings** — after Step 5b (Verify Findings), before Step 6b (Pattern Review).
134
+
135
+ ### For review-pr-skill.md
136
+
137
+ Insert as **Step 3c: Re-Rank and Group Findings** — after Step 3b (Verify Findings), before Step 4 (Generate Document).
138
+
139
+ ---
140
+
141
+ ## Related Files
142
+
143
+ | File | Purpose |
144
+ |------|---------|
145
+ | `.claude/resources/skills/review-code-skill.md` | Add Step 5c (Re-Rank and Group) |
146
+ | `.claude/resources/skills/review-pr-skill.md` | Add Step 3c (Re-Rank and Group) |
147
+ | `.claude/resources/patterns/review-code-templates.md` | Standard template uses severity grouping |
148
+ | `.claude/resources/core/review-adaptive-depth.md` | Deep mode already severity-groups; this extends to standard |
149
+ | `.claude/resources/core/review-verification.md` | Verification runs before re-ranking |
@@ -0,0 +1,158 @@
1
+
2
+ # Review Verification
3
+
4
+ ## Purpose
5
+
6
+ Second-pass verification step for `/review-code` and `/review-pr`. After the initial analysis produces findings, each finding is re-examined against the surrounding code context to filter out false positives. This reduces noise and improves trust in the final review output.
7
+
8
+ **Scope**: `/review-code` (Step 5b) and `/review-pr` (Step 3b). Applied after analysis, before document generation.
9
+
10
+ **Goal**: Surface only actionable findings. A review that cries wolf trains developers to ignore it. Verification ensures every reported finding is worth the developer's attention.
11
+
12
+ ---
13
+
14
+ ## Verification Step Definition
15
+
16
+ For each finding from the initial analysis:
17
+
18
+ 1. Re-read **15 lines above and 15 lines below** the flagged line (30 lines of surrounding context, plus the flagged line itself)
19
+ 2. Evaluate **3 standard verification questions** (same for every finding)
20
+ 3. Evaluate **1 category-specific question** (based on the finding's category)
21
+ 4. Classify the finding as **Confirmed**, **Likely**, or **Dismissed**
22
+
23
+ Apply this process to every finding before generating the review document.
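The context window in step 1 can be computed with a clamped slice. This sketch assumes 1-based line numbers, as in the findings schema, and a file already split into lines; the function name is hypothetical.

```python
def context_window(source_lines: list, flagged_line: int, radius: int = 15):
    """Return (first_line, last_line, lines) around a 1-based flagged line."""
    if not 1 <= flagged_line <= len(source_lines):
        raise ValueError("flagged_line is outside the file")
    first = max(1, flagged_line - radius)               # clamp at start of file
    last = min(len(source_lines), flagged_line + radius)  # clamp at end of file
    return first, last, source_lines[first - 1:last]
```

Near the start or end of a file the window simply shrinks instead of raising an error.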
24
+
25
+ ---
26
+
27
+ ## Standard Verification Questions
28
+
29
+ Ask these three questions for every finding, regardless of category:
30
+
31
+ **Q1 — Is this actually a bug/issue, or does the surrounding code handle it?**
32
+ Check if adjacent code (error handling, guards, fallbacks, input validation) already addresses the concern raised. Look 15 lines above for upstream guards and 15 lines below for downstream handling.
33
+
34
+ **Q2 — Is there a test that covers this case?**
35
+ Look for test files related to the flagged code. If a test explicitly covers the scenario being flagged, the finding may be a false positive — the behavior is intentional and validated.
36
+
37
+ **Q3 — Would a senior developer agree this is a real issue?**
38
+ Apply the experienced developer filter. Trivial style preferences, subjective choices, or extremely minor nits that would not appear in a real code review should be dismissed. If the concern is valid but low-stakes, it may still qualify as a Suggestion.
39
+
40
+ ---
41
+
42
+ ## Category-Specific Verification Questions
43
+
44
+ After the three standard questions, evaluate one additional question based on the finding's category:
45
+
46
+ | Category | Verification Question |
47
+ |----------|----------------------|
48
+ | Security | Is there actually an exploit path, or is this internal-only code with no user input reaching the flagged point? |
49
+ | Logic bug | Can this code path actually be reached with the flagged state? Trace the call chain from entry points. |
50
+ | Performance | Is this in a hot path (called per-request, inside a loop, on every render), or is it called once during init/setup? |
51
+ | Pattern violation | Does the surrounding code intentionally deviate for a documented reason (comment, TODO, legacy marker, explicit override)? |
52
+ | Missing test | Is this logic already covered by integration or e2e tests, even if no unit test exists for this specific function? |
53
+ | Error handling | Is the error actually possible at this point, or is it prevented by upstream validation or type constraints? |
54
+ | Type safety | Does the runtime context guarantee the type, even if TypeScript cannot prove it statically? |
55
+
56
+ ---
57
+
58
+ ## Classification Criteria
59
+
60
+ | Classification | Criteria | Action |
61
+ |---------------|----------|--------|
62
+ | **Confirmed** | Clear issue with evidence from context. At least 2 of 3 standard questions support the finding. | Keep in output as-is |
63
+ | **Likely** | Probable issue but context is ambiguous. 1 of 3 standard questions supports; others are uncertain. | Keep in output with `[Likely]` tag |
64
+ | **Dismissed** | False positive. Context clearly shows the code is correct. All 3 standard questions fail to support the finding. | Remove from output |
65
+
66
+ ### Conservative Classification Rules
67
+
68
+ **Rule 1 — When in doubt, choose Likely over Dismissed.**
69
+ It is better to show an ambiguous finding to the user than to hide a real issue. Only dismiss when context clearly resolves the concern.
70
+
71
+ **Rule 2 — Never dismiss a Critical severity finding.**
72
+ Critical findings can be downgraded to Likely at most. A Critical finding that appears to be a false positive should be tagged `[Likely]` and explained — never silently removed.
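The criteria and both conservative rules can be combined into one decision function. This is a sketch under assumptions: each standard question is answered with `True` (supports the finding), `False` (does not), or `None` (uncertain), and the category-specific question is not modeled.

```python
def classify(supporting_answers: list, severity: str) -> str:
    """Classify a finding from its three standard-question answers.

    Each answer is True (supports the finding), False (does not support it),
    or None (uncertain).
    """
    if len(supporting_answers) != 3:
        raise ValueError("expected answers to the three standard questions")
    supports = sum(1 for answer in supporting_answers if answer is True)
    all_fail = all(answer is False for answer in supporting_answers)

    if supports >= 2:
        return "Confirmed"
    if all_fail and severity != "Critical":
        return "Dismissed"
    # Rule 1: any remaining doubt resolves to Likely.
    # Rule 2: a Critical finding is never dismissed.
    return "Likely"
```

Only a clean sweep of non-supporting answers on a non-Critical finding leads to dismissal; every other ambiguous combination stays visible as Likely.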
73
+
74
+ ---
75
+
76
+ ## Output Format
77
+
78
+ ### Verification Summary Template
79
+
80
+ Include this block in the review document, after the Review Summary metrics table:
81
+
82
+ ```markdown
83
+ ## Verification Summary
84
+
85
+ | Metric | Count |
86
+ |--------|-------|
87
+ | Initial findings | {N} |
88
+ | Confirmed | {N} |
89
+ | Likely (needs human judgment) | {N} |
90
+ | Dismissed (false positives filtered) | {N} |
91
+ | **False positive rate** | **{N}%** |
92
+ ```
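The summary counts can be derived from the list of classifications. The source does not define the false positive rate, so this sketch assumes it is dismissed findings divided by initial findings, rounded to a whole percent; the function and key names are hypothetical.

```python
from collections import Counter


def verification_summary(classifications: list) -> dict:
    """Count outcomes and compute the share of findings that were dismissed."""
    counts = Counter(classifications)
    initial = len(classifications)
    dismissed = counts["Dismissed"]
    # Assumption: rate = dismissed / initial; guard against an empty review.
    rate = round(100 * dismissed / initial) if initial else 0
    return {
        "initial": initial,
        "confirmed": counts["Confirmed"],
        "likely": counts["Likely"],
        "dismissed": dismissed,
        "false_positive_rate": rate,
    }
```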
93
+
94
+ ### Finding Tag Format
95
+
96
+ Likely findings get `[Likely]` prepended to their title:
97
+
98
+ ```markdown
99
+ ### [Likely] Finding Title
100
+ ```
101
+
102
+ Confirmed findings have no special tag — they are the default and need no marker.
103
+
104
+ Dismissed findings are removed entirely from the output. They do not appear in the final review document.
105
+
106
+ ### Review Summary Metrics Update
107
+
108
+ The existing Review Summary metrics table should reflect post-verification counts:
109
+
110
+ ```markdown
111
+ | Metric | Value |
112
+ |--------|-------|
113
+ | Total findings (before verification) | {N} |
114
+ | **Findings after verification** | **{N}** |
115
+ | Critical | {N} |
116
+ | Major | {N} |
117
+ | Minor | {N} |
118
+ | Suggestion | {N} |
119
+ ```
120
+
121
+ ---
122
+
123
+ ## Insertion Points
124
+
125
+ ### For review-code-skill.md
126
+
127
+ Insert as **Step 5b: Verify Findings** — after Step 5 (Pattern Conflicts), before Step 6b (Pattern Review).
128
+
129
+ ### For review-pr-skill.md
130
+
131
+ Insert as **Step 3b: Verify Findings** — after Step 3 (Analyze Code Changes), before Step 4 (Generate Document).
132
+
133
+ ### Logic (same for both)
134
+
135
+ ```
136
+ 1. Collect all findings from the analysis step
137
+ 2. For each finding:
138
+ a. Re-read surrounding context (15 lines above + 15 lines below)
139
+ b. Evaluate 3 standard verification questions
140
+ c. Evaluate 1 category-specific question
141
+ d. Classify as Confirmed / Likely / Dismissed
142
+ 3. Remove Dismissed findings from the findings list
143
+ 4. Tag Likely findings with [Likely] prefix in their title
144
+ 5. Generate Verification Summary stats
145
+ 6. Proceed to document generation with the filtered findings list
146
+ ```
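Steps 2 through 4 of the logic above reduce to a filter-and-tag loop. In this sketch the classifier is passed in as a callable, so the judgment-heavy part stays outside; the dictionary keys and the `status` field are hypothetical.

```python
def apply_verification(findings: list, classify) -> list:
    """Drop Dismissed findings and prefix Likely ones with [Likely].

    classify is any callable that maps a finding to
    "Confirmed", "Likely", or "Dismissed".
    """
    kept = []
    for finding in findings:
        status = classify(finding)
        if status == "Dismissed":
            continue  # removed entirely from the output
        title = finding["title"]
        if status == "Likely":
            title = f"[Likely] {title}"
        # Copy rather than mutate, so the original findings stay intact.
        kept.append({**finding, "title": title, "status": status})
    return kept
```

Confirmed findings pass through with their titles unchanged, since they need no marker.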
147
+
148
+ ---
149
+
150
+ ## Related Files
151
+
152
+ | File | Purpose |
153
+ |------|---------|
154
+ | `.claude/resources/skills/review-code-skill.md` | Add Step 5b (Verify Findings) |
155
+ | `.claude/resources/skills/review-pr-skill.md` | Add Step 3b (Verify Findings) |
156
+ | `.claude/resources/patterns/review-code-templates.md` | Add Verification Summary to review document template |
157
+ | `.claude/commands/review-code.md` | Add verification reference |
158
+ | `.claude/commands/review-pr.md` | Add verification reference |