npm - planflow-ai - Versions diffs - 1.3.0 → 1.3.2 - Mend

planflow-ai 1.3.0 → 1.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/.claude/commands/brainstorm.md +2 -2
package/.claude/commands/heartbeat.md +1 -1
package/.claude/commands/learn.md +1 -1
package/.claude/commands/{brain.md → note.md} +12 -12
package/.claude/commands/review-code.md +53 -0
package/.claude/commands/review-pr.md +53 -0
package/.claude/resources/core/_index.md +50 -2
package/.claude/resources/core/resource-capture.md +1 -1
package/.claude/resources/core/review-adaptive-depth.md +217 -0
package/.claude/resources/core/review-multi-agent.md +289 -0
package/.claude/resources/core/review-severity-ranking.md +149 -0
package/.claude/resources/core/review-verification.md +158 -0
package/.claude/resources/patterns/review-code-templates.md +315 -2
package/.claude/resources/skills/_index.md +9 -1
package/.claude/resources/skills/brain-skill.md +3 -3
package/.claude/resources/skills/review-code-skill.md +73 -0
package/.claude/resources/skills/review-pr-skill.md +58 -0
package/README.md +38 -3
package/dist/cli/handlers/claude.js +20 -12
package/dist/cli/handlers/claude.js.map +1 -1
package/package.json +1 -1
package/rules/skills/brain-skill.mdc +4 -4
package/skills/plan-flow/SKILL.md +1 -1
package/skills/plan-flow/brain/SKILL.md +1 -1
package/templates/shared/AGENTS.md.template +1 -1
package/templates/shared/CLAUDE.md.template +1 -1

package/.claude/resources/core/review-multi-agent.md ADDED

@@ -0,0 +1,289 @@
+# Review Multi-Agent Parallel Review
+## Purpose
+For large changesets (500+ lines, Deep mode), split the review into specialized subagents running in parallel. Each subagent focuses on a single concern, producing deeper findings than a single-pass review. A coordinator merges, deduplicates, verifies, and ranks the results.
+**Scope**: `/review-code` and `/review-pr` — activated only when adaptive depth selects **Deep** mode (500+ lines).
+**Goal**: Higher quality reviews for large PRs by eliminating context-switching between security, logic, performance, and pattern compliance concerns.
+---
+## When to Activate
+| Review Mode | Multi-Agent? |
+|-------------|-------------|
+| Lightweight (< 50 lines) | No |
+| Standard (50–500 lines) | No |
+| Deep (500+ lines) | **Yes** |
+Multi-agent review is part of the Deep mode pipeline. It replaces the single-pass analysis steps with parallel subagent execution.
+---
+## Architecture
+```
+Coordinator (main agent)
+    │
+    ├─► Subagent: Security Review        (parallel)
+    ├─► Subagent: Logic & Bugs Review    (parallel)
+    ├─► Subagent: Performance Review     (parallel)
+    └─► Subagent: Pattern Compliance     (parallel)
+    │
+    ▼
+Coordinator: Collect → Deduplicate → Verify → Re-Rank → Output
+```
+---
+## Subagent Definitions
+### 1. Security Review Agent
+**Focus**: Vulnerabilities, hardcoded secrets, auth bypass, injection (SQL/XSS/command), OWASP top 10, exposed credentials, insecure deserialization, missing CSRF protection.
+**Model**: sonnet
+**Prompt template**:
+```
+You are a security-focused code reviewer. Analyze the provided diff ONLY for security vulnerabilities.
+Check for:
+- Hardcoded secrets, API keys, tokens
+- SQL/NoSQL injection
+- XSS vulnerabilities
+- Command injection
+- Authentication/authorization bypass
+- Insecure deserialization
+- Missing CSRF protection
+- Exposed sensitive data in logs or responses
+- Insecure cryptographic practices
+IGNORE: code style, performance, naming conventions, test coverage.
+Return findings as a JSON array. Each finding must have:
+- file: string (file path)
+- line: number (line number)
+- severity: "Critical" | "Major" | "Minor"
+- title: string (short finding name)
+- description: string (detailed explanation)
+- suggested_fix: string (code suggestion)
+- confidence: number (0.0-1.0)
+```
+### 2. Logic & Bugs Review Agent
+**Focus**: Edge cases, null/undefined handling, off-by-one errors, race conditions, incorrect boolean logic, infinite loops, unreachable code, wrong return types, missing error handling.
+**Model**: sonnet
+**Prompt template**:
+```
+You are a logic-focused code reviewer. Analyze the provided diff ONLY for logic bugs and edge cases.
+Check for:
+- Null/undefined access without guards
+- Off-by-one errors in loops and slicing
+- Race conditions in async code
+- Incorrect boolean logic (wrong operator, inverted condition)
+- Infinite loops or recursion without base case
+- Unreachable code paths
+- Wrong return types or missing returns
+- Unhandled promise rejections
+- Missing error handling on fallible operations
+IGNORE: security vulnerabilities, performance, code style, naming.
+Return findings as a JSON array. Each finding must have:
+- file: string (file path)
+- line: number (line number)
+- severity: "Critical" | "Major" | "Minor"
+- title: string (short finding name)
+- description: string (detailed explanation)
+- suggested_fix: string (code suggestion)
+- confidence: number (0.0-1.0)
+```
+### 3. Performance Review Agent
+**Focus**: N+1 queries, memory leaks, unnecessary re-renders, blocking I/O on main thread, excessive allocations, missing pagination, inefficient algorithms, large bundle impacts.
+**Model**: sonnet
+**Prompt template**:
+```
+You are a performance-focused code reviewer. Analyze the provided diff ONLY for performance issues.
+Check for:
+- N+1 database queries
+- Memory leaks (event listeners not removed, unclosed resources)
+- Unnecessary re-renders (React) or recomputations
+- Blocking I/O on main thread
+- Excessive object/array allocations in hot paths
+- Missing pagination on unbounded queries
+- O(n²) or worse algorithms where O(n) is possible
+- Large synchronous operations that should be async
+- Bundle size impacts (large imports that could be lazy-loaded)
+IGNORE: security vulnerabilities, logic bugs, code style, naming.
+Return findings as a JSON array. Each finding must have:
+- file: string (file path)
+- line: number (line number)
+- severity: "Major" | "Minor" | "Suggestion"
+- title: string (short finding name)
+- description: string (detailed explanation)
+- suggested_fix: string (code suggestion)
+- confidence: number (0.0-1.0)
+```
+### 4. Pattern Compliance Review Agent
+**Focus**: Violations of `forbidden-patterns.md`, deviations from `allowed-patterns.md`, naming inconsistencies, structural pattern conflicts with existing codebase.
+**Model**: haiku
+**Prompt template**:
+```
+You are a pattern compliance reviewer. Analyze the provided diff against the project's coding standards.
+Forbidden patterns to check (violations of these are findings):
+{contents of forbidden-patterns.md Project Anti-Patterns section}
+Allowed patterns to verify (deviations from these are findings):
+{contents of allowed-patterns.md Project Patterns section}
+Also check for:
+- Naming inconsistencies with existing codebase conventions
+- Import organization deviations
+- Error handling pattern deviations
+- Export pattern inconsistencies
+IGNORE: security vulnerabilities, logic bugs, performance issues.
+Return findings as a JSON array. Each finding must have:
+- file: string (file path)
+- line: number (line number)
+- severity: "Minor" | "Suggestion"
+- title: string (short finding name)
+- description: string (detailed explanation with pattern reference)
+- suggested_fix: string (code suggestion)
+- confidence: number (0.0-1.0)
+```
+---
+## Subagent Input
+Each subagent receives:
+1. **The diff** — For review-code: output of `git diff`. For review-pr: output of `gh pr diff` or Azure DevOps diff.
+2. **File categorization** — The file-to-category mapping from adaptive depth Step 1 (Core Logic, Infrastructure, UI, Tests)
+3. **Category-specific context** — Only the pattern files relevant to the subagent's focus
+4. **Instructions** — The subagent-specific prompt template above
+For very large diffs (2000+ lines), the coordinator may split the diff by file category and send each subagent only its most relevant files:
+- Security agent → all files (security issues can be anywhere)
+- Logic agent → Core Logic + Infrastructure files
+- Performance agent → Core Logic + UI files
+- Pattern agent → all files
+---
+## Coordinator Behavior
+The coordinator (main agent) orchestrates the entire flow:
+### Step 1: Spawn Subagents
+Launch all 4 subagents in parallel using the Agent tool. Each subagent uses `subagent_type: "general-purpose"` with the appropriate model override.
+```
+Launch in parallel:
+- Agent(model: "sonnet", prompt: security_prompt)
+- Agent(model: "sonnet", prompt: logic_prompt)
+- Agent(model: "sonnet", prompt: performance_prompt)
+- Agent(model: "haiku", prompt: patterns_prompt)
+```
+### Step 2: Collect Results
+Wait for all subagents to complete. Parse the JSON findings arrays from each.
+### Step 3: Deduplicate
+Scan for overlapping findings (same file + line range within ±5 lines + similar description):
+| Overlap Type | Resolution |
+|-------------|------------|
+| Exact match (same file, same line, same issue) | Merge into one finding, note both categories |
+| Near match (same file, ±5 lines, similar issue) | Merge if clearly the same root cause |
+| Different aspects of same code | Keep as separate findings |
+When merging:
+- Use the **higher severity** from the overlapping findings
+- Use the **higher confidence** score
+- Combine descriptions from both agents
+- Note all contributing categories in the finding
+### Step 4: Verify
+Run the standard verification pass on all deduplicated findings. See `.claude/resources/core/review-verification.md`.
+### Step 5: Re-Rank and Group
+Run the standard severity re-ranking. See `.claude/resources/core/review-severity-ranking.md`.
+### Step 6: Generate Output
+Use the deep review template with severity-grouped findings and executive summary. Add a Review Agents summary section after Review Information:
+```markdown
+## Review Agents
+| Agent | Model | Findings | After Dedup |
+|-------|-------|----------|-------------|
+| Security | sonnet | {N} | {N} |
+| Logic & Bugs | sonnet | {N} | {N} |
+| Performance | sonnet | {N} | {N} |
+| Pattern Compliance | haiku | {N} | {N} |
+| **Total** | | **{N}** | **{N}** |
+Duplicates removed: {N}
+```
+---
+## Insertion Points
+### For review-code-skill.md
+In the **Deep mode** path of Step 1b, replace the instruction "Proceed with all steps" with:
+> **If Deep**: Activate multi-agent parallel review. See `.claude/resources/core/review-multi-agent.md`. Spawn 4 specialized subagents (security, logic, performance, patterns) in parallel. Coordinator collects results, deduplicates, then proceeds to Step 5b (verification), Step 5c (re-ranking), Step 6b (pattern review), and Step 6 (output using deep template).
+Steps 2–5 (pattern loading, similar implementations, analysis, pattern conflicts) are handled by the subagents instead of the main agent.
+### For review-pr-skill.md
+In the **Deep mode** path of Step 1b, replace the instruction "Proceed with all steps" with:
+> **If Deep**: Activate multi-agent parallel review. See `.claude/resources/core/review-multi-agent.md`. Spawn 4 specialized subagents (security, logic, performance, patterns) in parallel. Coordinator collects results, deduplicates, then proceeds to Step 3b (verification), Step 3c (re-ranking), and Step 4 (output using deep template with severity grouping and executive summary).
+Steps 2–3 (pattern loading, analysis) are handled by the subagents instead of the main agent.
+---
+## Related Files
+| File | Purpose |
+|------|---------|
+| `.claude/resources/core/review-adaptive-depth.md` | Triggers Deep mode (prerequisite) |
+| `.claude/resources/core/review-verification.md` | Verification pass (run by coordinator after dedup) |
+| `.claude/resources/core/review-severity-ranking.md` | Re-ranking (run by coordinator after verification) |
+| `.claude/resources/skills/review-code-skill.md` | Update Deep mode path in Step 1b |
+| `.claude/resources/skills/review-pr-skill.md` | Update Deep mode path in Step 1b |
+| `.claude/resources/patterns/review-code-templates.md` | Deep template gets Review Agents section |
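
Reviewer note on `review-multi-agent.md`: the Step 3 merge rules (same file, within ±5 lines, higher severity and higher confidence win, descriptions combined, all categories noted) can be sketched in TypeScript as below. This is a minimal illustration, not the package's implementation. The `Finding` fields mirror the JSON shape in the prompt templates; the `categories` field, the function names, and the caller-supplied `sameRootCause` predicate are assumptions, since the file does not define how "similar issue" is measured.

```typescript
// Hypothetical types: fields mirror the JSON shape in the prompt templates;
// `categories` is an added field used to note contributing agents.
type Severity = "Critical" | "Major" | "Minor" | "Suggestion";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  title: string;
  description: string;
  suggested_fix: string;
  confidence: number; // 0.0-1.0
  categories: string[];
}

// Lower index means higher severity.
const SEVERITY_ORDER: Severity[] = ["Critical", "Major", "Minor", "Suggestion"];

// Merge two overlapping findings: keep the higher severity and higher
// confidence, combine descriptions, and note all contributing categories.
function mergeFindings(a: Finding, b: Finding): Finding {
  const aRank = SEVERITY_ORDER.indexOf(a.severity);
  const bRank = SEVERITY_ORDER.indexOf(b.severity);
  const primary = aRank <= bRank ? a : b;
  return {
    ...primary,
    confidence: Math.max(a.confidence, b.confidence),
    description: `${a.description}\n${b.description}`,
    categories: Array.from(new Set([...a.categories, ...b.categories])),
  };
}

// Assumption: "similar issue" is decided by a caller-supplied predicate,
// because the source does not specify a similarity measure.
function deduplicate(
  findings: Finding[],
  sameRootCause: (a: Finding, b: Finding) => boolean,
): Finding[] {
  const result: Finding[] = [];
  for (const finding of findings) {
    const index = result.findIndex(
      (kept) =>
        kept.file === finding.file &&
        Math.abs(kept.line - finding.line) <= 5 &&
        sameRootCause(kept, finding),
    );
    if (index === -1) {
      result.push(finding);
    } else {
      result[index] = mergeFindings(result[index], finding);
    }
  }
  return result;
}
```

Passing the similarity check in as a predicate keeps the "different aspects of same code" case (kept as separate findings) under the coordinator's control rather than hard-coding a text comparison.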

package/.claude/resources/core/review-severity-ranking.md ADDED

@@ -0,0 +1,149 @@
+# Review Severity Re-Ranking
+## Purpose
+After all findings are collected and verified, re-rank them by actual impact rather than listing in file order. Group related findings across files and present critical issues first. This applies to **all review modes** (lightweight findings are already minimal, but if multiple issues are found, they should still be severity-ordered).
+**Scope**: `/review-code` (Step 5c) and `/review-pr` (Step 3c). Applied after verification, before generating the output document.
+**Goal**: Ensure the most impactful findings appear first. Reviewers should never have to scan past 10 minor style issues to find a critical security bug.
+---
+## Ranking Algorithm
+After verification classifies findings as Confirmed or Likely, sort using this priority:
+1. **Severity** (primary): Critical → Major → Minor → Suggestion
+2. **Confidence** (secondary, within same severity): Confirmed → Likely
+3. **Fix complexity** (tertiary, within same confidence): Lower complexity first (quick wins surface earlier)
+This ranking applies regardless of which file the finding came from.
+---
+## Grouping Related Findings
+Before final output, scan the sorted findings for groupable patterns:
+| Pattern | Group Condition | Example Group Title |
+|---------|----------------|---------------------|
+| Same issue type in multiple files | ≥ 2 findings with matching issue category | "Missing input validation in 3 API endpoints" |
+| Same root cause | ≥ 2 findings traceable to one underlying problem | "Inconsistent error handling (5 occurrences)" |
+| Causal chain | Findings where one enables/causes another | "Auth bypass: missing check → unprotected route → data exposure" |
+### Grouping Rules
+- **Only group when genuinely related** — don't force-group unrelated findings just because they share a severity level
+- **Small reviews (1-3 findings)**: No grouping. Keep findings individual.
+- **Use the highest severity in the group** as the group's severity level
+- **Show individual occurrences** within the group with `file:line` references
+- **Provide a single suggested fix** that addresses all occurrences when possible
+### Grouped Finding Format
+```markdown
+### Finding N: {Group Title} ({count} occurrences)
+| Field          | Value                                            |
+| -------------- | ------------------------------------------------ |
+| Severity       | {Highest severity in group}                      |
+| Fix Complexity | {Average complexity}/10 - {Level}                |
+| Pattern        | {Reference to pattern from rules, if applicable} |
+**Occurrences**:
+| # | File | Line | Status |
+|---|------|------|--------|
+| 1 | `{file_path}` | {line} | {Confirmed/Likely} |
+| 2 | `{file_path}` | {line} | {Confirmed/Likely} |
+| 3 | `{file_path}` | {line} | {Confirmed/Likely} |
+**Description**:
+{Explanation of the shared issue pattern}
+**Suggested Fix**:
+\`\`\`{language}
+// Single fix pattern that addresses all occurrences
+\`\`\`
+```
+---
+## Executive Summary Trigger
+| Review Mode | Trigger |
+|-------------|---------|
+| **Lightweight** | Never (too few findings to warrant) |
+| **Standard** | When total findings ≥ 5 |
+| **Deep** | Always (built into deep template) |
+When triggered in standard mode, prepend this before the findings section:
+```markdown
+## Executive Summary
+**Risk level**: {Low | Medium | High}
+**Top issues to address**:
+1. {Finding title} ({Severity}) — `{file}:{line}`
+2. {Finding title} ({Severity}) — `{file}:{line}`
+3. {Finding title} ({Severity}) — `{file}:{line}`
+```
+Show up to 3 top findings. Derive risk level from:
+- **High**: Any Critical finding, or ≥ 3 Major findings
+- **Medium**: Any Major finding, or ≥ 5 Minor findings
+- **Low**: Only Minor/Suggestion findings, fewer than 5 total
+---
+## Output Structure
+All review modes now use severity-grouped output instead of per-file ordering:
+```markdown
+## Critical Findings
+### 1. {Finding title}
+...
+## Major Findings
+### 2. {Finding title}
+...
+## Minor Findings
+### 3. {Finding title}
+...
+## Suggestions
+### 4. {Finding title}
+...
+```
+**Empty sections**: Omit severity sections that have no findings (e.g., if no Critical findings, skip the "Critical Findings" heading entirely).
+---
+## Insertion Points
+### For review-code-skill.md
+Insert as **Step 5c: Re-Rank and Group Findings** — after Step 5b (Verify Findings), before Step 6b (Pattern Review).
+### For review-pr-skill.md
+Insert as **Step 3c: Re-Rank and Group Findings** — after Step 3b (Verify Findings), before Step 4 (Generate Document).
+---
+## Related Files
+| File | Purpose |
+|------|---------|
+| `.claude/resources/skills/review-code-skill.md` | Add Step 5c (Re-Rank and Group) |
+| `.claude/resources/skills/review-pr-skill.md` | Add Step 3c (Re-Rank and Group) |
+| `.claude/resources/patterns/review-code-templates.md` | Standard template uses severity grouping |
+| `.claude/resources/core/review-adaptive-depth.md` | Deep mode already severity-groups; this extends to standard |
+| `.claude/resources/core/review-verification.md` | Verification runs before re-ranking |
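
Reviewer note on `review-severity-ranking.md`: the three-key ranking (severity, then Confirmed before Likely, then lower fix complexity) and the risk-level derivation can be sketched as follows. This is an illustrative TypeScript sketch under stated assumptions, not code from the package. The `RankedFinding` type and function names are hypothetical, and `fixComplexity` is assumed to be the 1-10 score implied by the `/10` in the grouped finding template. The listed risk rules do not cover every combination (for example, four Minor findings plus several Suggestions), so the sketch treats anything that is not High or Medium as Low.

```typescript
type Severity = "Critical" | "Major" | "Minor" | "Suggestion";
type Status = "Confirmed" | "Likely";

// Hypothetical shape for a verified finding.
interface RankedFinding {
  title: string;
  severity: Severity;
  status: Status;
  fixComplexity: number; // assumed 1-10 scale
}

const SEVERITY_RANK: Record<Severity, number> = {
  Critical: 0,
  Major: 1,
  Minor: 2,
  Suggestion: 3,
};
const STATUS_RANK: Record<Status, number> = { Confirmed: 0, Likely: 1 };

// Sort by severity, then confidence, then fix complexity (quick wins first).
// Returns a new array and leaves the input untouched.
function rankFindings(findings: RankedFinding[]): RankedFinding[] {
  return [...findings].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      STATUS_RANK[a.status] - STATUS_RANK[b.status] ||
      a.fixComplexity - b.fixComplexity,
  );
}

// Derive the executive summary risk level from severity counts.
function riskLevel(findings: RankedFinding[]): "Low" | "Medium" | "High" {
  const count = (s: Severity): number =>
    findings.filter((f) => f.severity === s).length;
  if (count("Critical") > 0 || count("Major") >= 3) return "High";
  if (count("Major") > 0 || count("Minor") >= 5) return "Medium";
  return "Low"; // assumption: every remaining case is Low
}
```

The top three entries of `rankFindings` would then populate the "Top issues to address" list in the executive summary.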

package/.claude/resources/core/review-verification.md ADDED

@@ -0,0 +1,158 @@
+# Review Verification
+## Purpose
+Second-pass verification step for `/review-code` and `/review-pr`. After the initial analysis produces findings, each finding is re-examined against surrounding code context to filter false positives. This reduces noise and improves trust in the final review output.
+**Scope**: `/review-code` (Step 5b) and `/review-pr` (Step 3b). Applied after analysis, before document generation.
+**Goal**: Surface only actionable findings. A review that cries wolf on false positives trains developers to ignore it. Verification ensures every reported finding is worth the developer's attention.
+---
+## Verification Step Definition
+For each finding from the initial analysis:
+1. Re-read **15 lines above and 15 lines below** the flagged line (30 lines of context total)
+2. Evaluate **3 standard verification questions** (same for every finding)
+3. Evaluate **1 category-specific question** (based on the finding's category)
+4. Classify the finding as **Confirmed**, **Likely**, or **Dismissed**
+Apply this process to every finding before generating the review document.
+---
+## Standard Verification Questions
+Ask these three questions for every finding, regardless of category:
+**Q1 — Is this actually a bug/issue, or does the surrounding code handle it?**
+Check if adjacent code (error handling, guards, fallbacks, input validation) already addresses the concern raised. Look 15 lines above for upstream guards and 15 lines below for downstream handling.
+**Q2 — Is there a test that covers this case?**
+Look for test files related to the flagged code. If a test explicitly covers the scenario being flagged, the finding may be a false positive — the behavior is intentional and validated.
+**Q3 — Would a senior developer agree this is a real issue?**
+Apply the experienced developer filter. Trivial style preferences, subjective choices, or extremely minor nits that would not appear in a real code review should be dismissed. If the concern is valid but low-stakes, it may still qualify as a Suggestion.
+---
+## Category-Specific Verification Questions
+After the three standard questions, evaluate one additional question based on the finding's category:
+| Category | Verification Question |
+|----------|----------------------|
+| Security | Is there actually an exploit path, or is this internal-only code with no user input reaching the flagged point? |
+| Logic bug | Can this code path actually be reached with the flagged state? Trace the call chain from entry points. |
+| Performance | Is this in a hot path (called per-request, inside a loop, on every render), or is it called once during init/setup? |
+| Pattern violation | Does the surrounding code intentionally deviate for a documented reason (comment, TODO, legacy marker, explicit override)? |
+| Missing test | Is this logic already covered by integration or e2e tests, even if no unit test exists for this specific function? |
+| Error handling | Is the error actually possible at this point, or is it prevented by upstream validation or type constraints? |
+| Type safety | Does the runtime context guarantee the type, even if TypeScript cannot prove it statically? |
+---
+## Classification Criteria
+| Classification | Criteria | Action |
+|---------------|----------|--------|
+| **Confirmed** | Clear issue with evidence from context. At least 2 of 3 standard questions support the finding. | Keep in output as-is |
+| **Likely** | Probable issue but context is ambiguous. 1 of 3 standard questions supports; others are uncertain. | Keep in output with `[Likely]` tag |
+| **Dismissed** | False positive. Context clearly shows the code is correct. All 3 standard questions fail to support the finding. | Remove from output |
+### Conservative Classification Rules
+**Rule 1 — When in doubt, choose Likely over Dismissed.**
+It is better to show an ambiguous finding to the user than to hide a real issue. Only dismiss when context clearly resolves the concern.
+**Rule 2 — Never dismiss a Critical severity finding.**
+Critical findings can be downgraded to Likely at most. A Critical finding that appears to be a false positive should be tagged `[Likely]` and explained — never silently removed.
+---
+## Output Format
+### Verification Summary Template
+Include this block in the review document, after the Review Summary metrics table:
+```markdown
+## Verification Summary
+| Metric | Count |
+|--------|-------|
+| Initial findings | {N} |
+| Confirmed | {N} |
+| Likely (needs human judgment) | {N} |
+| Dismissed (false positives filtered) | {N} |
+| **False positive rate** | **{N}%** |
+```
+### Finding Tag Format
+Likely findings get `[Likely]` prepended to their title:
+```markdown
+### [Likely] Finding Title
+```
+Confirmed findings have no special tag — they are the default and need no marker.
+Dismissed findings are removed entirely from the output. They do not appear in the final review document.
+### Review Summary Metrics Update
+The existing Review Summary metrics table should reflect post-verification counts:
+```markdown
+| Metric | Value |
+|--------|-------|
+| Total findings (before verification) | {N} |
+| **Findings after verification** | **{N}** |
+| Critical | {N} |
+| Major | {N} |
+| Minor | {N} |
+| Suggestion | {N} |
+```
+---
+## Insertion Points
+### For review-code-skill.md
+Insert as **Step 5b: Verify Findings** — after Step 5 (Pattern Conflicts), before Step 6b (Pattern Review).
+### For review-pr-skill.md
+Insert as **Step 3b: Verify Findings** — after Step 3 (Analyze Code Changes), before Step 4 (Generate Document).
+### Logic (same for both)
+```
+1. Collect all findings from the analysis step
+2. For each finding:
+   a. Re-read surrounding context (15 lines above + 15 lines below)
+   b. Evaluate 3 standard verification questions
+   c. Evaluate 1 category-specific question
+   d. Classify as Confirmed / Likely / Dismissed
+3. Remove Dismissed findings from the findings list
+4. Tag Likely findings with [Likely] prefix in their title
+5. Generate Verification Summary stats
+6. Proceed to document generation with the filtered findings list
+```
+---
+## Related Files
+| File | Purpose |
+|------|---------|
+| `.claude/resources/skills/review-code-skill.md` | Add Step 5b (Verify Findings) |
+| `.claude/resources/skills/review-pr-skill.md` | Add Step 3b (Verify Findings) |
+| `.claude/resources/patterns/review-code-templates.md` | Add Verification Summary to review document template |
+| `.claude/commands/review-code.md` | Add verification reference |
+| `.claude/commands/review-pr.md` | Add verification reference |
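
Reviewer note on `review-verification.md`: the classification criteria and the two conservative rules can be sketched in TypeScript as below. This is a hedged illustration, not the package's logic. The `Answer` type (each standard question either supports the finding, is uncertain, or refutes it) and the function names are assumptions. The category-specific question is left out because the criteria table only counts the three standard questions. The false positive rate is assumed to be dismissed findings divided by initial findings.

```typescript
type Severity = "Critical" | "Major" | "Minor" | "Suggestion";
type Classification = "Confirmed" | "Likely" | "Dismissed";

// Hypothetical encoding of one standard verification question's outcome.
type Answer = "supports" | "uncertain" | "refutes";

function classify(
  answers: [Answer, Answer, Answer],
  severity: Severity,
): Classification {
  const supporting = answers.filter((a) => a === "supports").length;
  const refuting = answers.filter((a) => a === "refutes").length;

  // At least 2 of 3 standard questions support the finding.
  if (supporting >= 2) return "Confirmed";

  // Dismiss only when all 3 questions fail to support the finding,
  // and never dismiss a Critical finding (Rule 2).
  if (refuting === 3 && severity !== "Critical") return "Dismissed";

  // Everything ambiguous stays visible (Rule 1).
  return "Likely";
}

// Percentage of initial findings that verification dismissed,
// rounded to the nearest whole number; 0 when there were no findings.
function falsePositiveRate(results: Classification[]): number {
  if (results.length === 0) return 0;
  const dismissed = results.filter((r) => r === "Dismissed").length;
  return Math.round((dismissed / results.length) * 100);
}
```

For example, a Critical finding whose three answers all refute it comes back as `Likely` rather than `Dismissed`, which matches the rule that such findings are tagged and explained instead of silently removed.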