npm - planflow-ai - Versions diffs - 1.3.0 → 1.3.4 - Mend

planflow-ai 1.3.0 → 1.3.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (44) hide show

package/.claude/commands/brainstorm.md +2 -2
package/.claude/commands/discovery-plan.md +11 -0
package/.claude/commands/execute-plan.md +26 -2
package/.claude/commands/flow.md +5 -0
package/.claude/commands/heartbeat.md +1 -1
package/.claude/commands/learn.md +4 -6
package/.claude/commands/{brain.md → note.md} +12 -12
package/.claude/commands/review-code.md +61 -1
package/.claude/commands/review-pr.md +61 -1
package/.claude/commands/setup.md +11 -1
package/.claude/resources/core/_index.md +102 -2
package/.claude/resources/core/compaction-guide.md +111 -0
package/.claude/resources/core/discovery-sub-agents.md +266 -0
package/.claude/resources/core/phase-isolation.md +222 -0
package/.claude/resources/core/resource-capture.md +1 -1
package/.claude/resources/core/review-adaptive-depth.md +217 -0
package/.claude/resources/core/review-multi-agent.md +289 -0
package/.claude/resources/core/review-severity-ranking.md +149 -0
package/.claude/resources/core/review-verification.md +158 -0
package/.claude/resources/core/session-scratchpad.md +105 -0
package/.claude/resources/patterns/review-code-templates.md +315 -2
package/.claude/resources/skills/_index.md +108 -42
package/.claude/resources/skills/brain-skill.md +3 -3
package/.claude/resources/skills/discovery-skill.md +50 -6
package/.claude/resources/skills/execute-plan-skill.md +14 -6
package/.claude/resources/skills/review-code-skill.md +73 -0
package/.claude/resources/skills/review-pr-skill.md +58 -0
package/README.md +38 -3
package/dist/cli/commands/heartbeat.js.map +1 -1
package/dist/cli/daemon/heartbeat-daemon.js +31 -1
package/dist/cli/daemon/heartbeat-daemon.js.map +1 -1
package/dist/cli/daemon/heartbeat-parser.d.ts.map +1 -1
package/dist/cli/daemon/heartbeat-parser.js +6 -0
package/dist/cli/daemon/heartbeat-parser.js.map +1 -1
package/dist/cli/handlers/claude.js +20 -12
package/dist/cli/handlers/claude.js.map +1 -1
package/dist/cli/types.d.ts +1 -0
package/dist/cli/types.d.ts.map +1 -1
package/package.json +1 -1
package/rules/skills/brain-skill.mdc +4 -4
package/skills/plan-flow/SKILL.md +1 -1
package/skills/plan-flow/brain/SKILL.md +1 -1
package/templates/shared/AGENTS.md.template +1 -1
package/templates/shared/CLAUDE.md.template +11 -1

package/.claude/resources/core/review-verification.md ADDED Viewed

@@ -0,0 +1,158 @@
+# Review Verification
+## Purpose
+Second-pass verification step for `/review-code` and `/review-pr`. After the initial analysis produces findings, each finding is re-examined against surrounding code context to filter false positives. Reduces noise, improves trust in the final review output.
+**Scope**: `/review-code` (Step 5b) and `/review-pr` (Step 3b). Applied after analysis, before document generation.
+**Goal**: Surface only actionable findings. A review that cries wolf on false positives trains developers to ignore it. Verification ensures every reported finding is worth the developer's attention.
+---
+## Verification Step Definition
+For each finding from the initial analysis:
+1. Re-read **15 lines above and 15 lines below** the flagged line (30 lines of context total)
+2. Evaluate **3 standard verification questions** (same for every finding)
+3. Evaluate **1 category-specific question** (based on the finding's category)
+4. Classify the finding as **Confirmed**, **Likely**, or **Dismissed**
+Apply this process to every finding before generating the review document.
+---
+## Standard Verification Questions
+Ask these three questions for every finding, regardless of category:
+**Q1 — Is this actually a bug/issue, or does the surrounding code handle it?**
+Check if adjacent code (error handling, guards, fallbacks, input validation) already addresses the concern raised. Look 15 lines above for upstream guards and 15 lines below for downstream handling.
+**Q2 — Is there a test that covers this case?**
+Look for test files related to the flagged code. If a test explicitly covers the scenario being flagged, the finding may be a false positive — the behavior is intentional and validated.
+**Q3 — Would a senior developer agree this is a real issue?**
+Apply the experienced developer filter. Trivial style preferences, subjective choices, or extremely minor nits that would not appear in a real code review should be dismissed. If the concern is valid but low-stakes, it may still qualify as a Suggestion.
+---
+## Category-Specific Verification Questions
+After the three standard questions, evaluate one additional question based on the finding's category:
+| Category | Verification Question |
+|----------|----------------------|
+| Security | Is there actually an exploit path, or is this internal-only code with no user input reaching the flagged point? |
+| Logic bug | Can this code path actually be reached with the flagged state? Trace the call chain from entry points. |
+| Performance | Is this in a hot path (called per-request, inside a loop, on every render), or is it called once during init/setup? |
+| Pattern violation | Does the surrounding code intentionally deviate for a documented reason (comment, TODO, legacy marker, explicit override)? |
+| Missing test | Is this logic already covered by integration or e2e tests, even if no unit test exists for this specific function? |
+| Error handling | Is the error actually possible at this point, or is it prevented by upstream validation or type constraints? |
+| Type safety | Does the runtime context guarantee the type, even if TypeScript cannot prove it statically? |
+---
+## Classification Criteria
+| Classification | Criteria | Action |
+|---------------|----------|--------|
+| **Confirmed** | Clear issue with evidence from context. At least 2 of 3 standard questions support the finding. | Keep in output as-is |
+| **Likely** | Probable issue but context is ambiguous. 1 of 3 standard questions supports; others are uncertain. | Keep in output with `[Likely]` tag |
+| **Dismissed** | False positive. Context clearly shows the code is correct. All 3 standard questions fail to support the finding. | Remove from output |
+### Conservative Classification Rules
+**Rule 1 — When in doubt, choose Likely over Dismissed.**
+It is better to show an ambiguous finding to the user than to hide a real issue. Only dismiss when context clearly resolves the concern.
+**Rule 2 — Never dismiss a Critical severity finding.**
+Critical findings can be downgraded to Likely at most. A Critical finding that appears to be a false positive should be tagged `[Likely]` and explained — never silently removed.
+---
+## Output Format
+### Verification Summary Template
+Include this block in the review document, after the Review Summary metrics table:
+```markdown
+## Verification Summary
+| Metric | Count |
+|--------|-------|
+| Initial findings | {N} |
+| Confirmed | {N} |
+| Likely (needs human judgment) | {N} |
+| Dismissed (false positives filtered) | {N} |
+| **False positive rate** | **{N}%** |
+```
+### Finding Tag Format
+Likely findings get `[Likely]` prepended to their title:
+```markdown
+### [Likely] Finding Title
+```
+Confirmed findings have no special tag — they are the default and need no marker.
+Dismissed findings are removed entirely from the output. They do not appear in the final review document.
+### Review Summary Metrics Update
+The existing Review Summary metrics table should reflect post-verification counts:
+```markdown
+| Metric | Value |
+|--------|-------|
+| Total findings (before verification) | {N} |
+| **Findings after verification** | **{N}** |
+| Critical | {N} |
+| Major | {N} |
+| Minor | {N} |
+| Suggestion | {N} |
+```
+---
+## Insertion Points
+### For review-code-skill.md
+Insert as **Step 5b: Verify Findings** — after Step 5 (Pattern Conflicts), before Step 6b (Pattern Review).
+### For review-pr-skill.md
+Insert as **Step 3b: Verify Findings** — after Step 3 (Analyze Code Changes), before Step 4 (Generate Document).
+### Logic (same for both)
+```
+1. Collect all findings from the analysis step
+2. For each finding:
+   a. Re-read surrounding context (15 lines above + 15 lines below)
+   b. Evaluate 3 standard verification questions
+   c. Evaluate 1 category-specific question
+   d. Classify as Confirmed / Likely / Dismissed
+3. Remove Dismissed findings from the findings list
+4. Tag Likely findings with [Likely] prefix in their title
+5. Generate Verification Summary stats
+6. Proceed to document generation with the filtered findings list
+```
+---
+## Related Files
+| File | Purpose |
+|------|---------|
+| `.claude/resources/skills/review-code-skill.md` | Add Step 5b (Verify Findings) |
+| `.claude/resources/skills/review-pr-skill.md` | Add Step 3b (Verify Findings) |
+| `.claude/resources/patterns/review-code-templates.md` | Add Verification Summary to review document template |
+| `.claude/commands/review-code.md` | Add verification reference |
+| `.claude/commands/review-pr.md` | Add verification reference |

package/.claude/resources/core/session-scratchpad.md ADDED Viewed

@@ -0,0 +1,105 @@
+# Session Scratchpad
+## Purpose
+The session scratchpad (`flow/.scratchpad.md`) is an ephemeral per-session file for raw notes, temporary analysis, and in-progress observations. It bridges the gap between transient session work and permanent artifacts (ledger, brain, memory).
+**Core principle**: Write things down so they survive compaction; promote the valuable ones before session ends.
+---
+## Session Start Behavior
+On session start:
+1. If `flow/.scratchpad.md` exists and has content, read it silently
+2. If entries look stale (from a previous session), scan for promotable items:
+   - Learnings → promote to `flow/ledger.md`
+   - Error patterns → promote to `flow/brain/errors/`
+   - Feature context → promote to `flow/brain/features/`
+   - User preferences → promote to `flow/ledger.md` (User Preferences section)
+3. After promotion, clear the file (reset to empty template)
+4. If the file is empty or doesn't exist, skip — no action needed
+---
+## Write Triggers
+Write to the scratchpad when you encounter:
+| Trigger | Example Entry |
+|---------|---------------|
+| **Non-obvious error resolution** | "Fixed ESM import issue — need `type: module` in package.json AND `moduleResolution: node16` in tsconfig" |
+| **Architectural insight** | "This project uses barrel exports everywhere — new files need re-export in index.ts" |
+| **Dead-end approach** | "Tried using `fs.watch` for file monitoring — unreliable on Linux, switched to chokidar" |
+| **User preference discovered** | "User prefers terse responses, no trailing summaries" |
+| **Open question** | "Need to verify: does the CI pipeline run on Node 18 or 20?" |
+| **Mid-task context** | "Reviewing auth module — found 3 unused imports, will clean up after main task" |
+### What NOT to Write
+- Routine file reads or grep results (re-readable)
+- Build/test output (captured in logs)
+- Things already captured by brain-capture blocks
+- Obvious facts derivable from the code
+- Full code snippets (reference file:line instead)
+---
+## File Format
+```markdown
+# Session Scratchpad
+## Notes
+- [{timestamp}] {note}
+- [{timestamp}] {note}
+## Open Questions
+- {question}
+## Promote Before Session End
+- [ ] {item to promote} → {target: ledger/brain/memory}
+```
+Keep under **50 lines**. When approaching the limit, promote older entries or discard low-value ones.
+---
+## Promotion Rules
+Before session ends (or when the scratchpad is getting full):
+1. **Scan each entry** and classify:
+   - **Promote**: Valuable insight that should persist → move to appropriate target
+   - **Discard**: Transient note, already acted on, or no future value → delete
+2. **Promotion targets**:
+   | Content Type | Target | Section |
+   |-------------|--------|---------|
+   | Project learning / workaround | `flow/ledger.md` | What Works / What Didn't Work |
+   | User preference | `flow/ledger.md` | User Preferences |
+   | Error pattern | `flow/brain/errors/{name}.md` | New or updated entry |
+   | Feature context | `flow/brain/features/{name}.md` | Timeline update |
+   | Reusable pattern | `flow/resources/pending-patterns.md` | Pattern capture buffer |
+3. **After promotion**, clear the scratchpad (leave just the empty template headers)
+---
+## Compaction Integration
+During `/compact`, the scratchpad is in the **preserve** list — its contents survive compaction summaries. This is critical because scratchpad notes often contain context that would otherwise be lost.
+After compaction, the model should re-read `flow/.scratchpad.md` to restore any notes written before compaction.
+---
+## Rules
+1. **Lightweight writes** — single-line entries, not paragraphs
+2. **No duplication** — don't write what brain-capture already captures
+3. **Self-manage size** — promote or discard when approaching 50 lines
+4. **Promote before ending** — always scan for promotable items before session ends
+5. **Not a task list** — use `flow/tasklist.md` for tasks, scratchpad is for observations

package/.claude/resources/patterns/review-code-templates.md CHANGED Viewed

@@ -61,15 +61,48 @@ For each changed file, similar implementations in the codebase:
 ---
+## Verification Summary
+| Metric | Count |
+|--------|-------|
+| Initial findings | {N} |
+| Confirmed | {N} |
+| Likely (needs human judgment) | {N} |
+| Dismissed (false positives filtered) | {N} |
+| **False positive rate** | **{N}%** |
+---
+## Executive Summary
+> Only include when total findings ≥ 5. Omit for smaller reviews.
+**Risk level**: {Low | Medium | High}
+**Top issues to address**:
+1. {Finding title} ({Severity}) — `{file}:{line}`
+2. {Finding title} ({Severity}) — `{file}:{line}`
+3. {Finding title} ({Severity}) — `{file}:{line}`
+---
 ## Findings
-### Finding 1: {Finding Name}
+> Findings are grouped by severity (Critical → Major → Minor → Suggestion).
+> For findings classified as "Likely" during verification, prepend `[Likely]` to the heading.
+> Omit empty severity sections.
+> Related findings across files may be grouped — see review-severity-ranking.md.
+### Critical Findings
+#### Finding 1: {Finding Name}
 | Field          | Value                                            |
 | -------------- | ------------------------------------------------ |
 | File           | `{file_path}`                                    |
 | Line           | {line_number}                                    |
-| Severity       | {Critical/Major/Minor/Suggestion}                |
+| Severity       | Critical                                         |
 | Fix Complexity | {X/10} - {Level}                                 |
 | Pattern        | {Reference to pattern from rules, if applicable} |
@@ -84,6 +117,24 @@ See `{reference_file_path}` for how this is handled elsewhere in the codebase.
 // Suggested code improvement
 \`\`\`
+### Major Findings
+#### Finding N: {Finding Name}
+> Same format as Critical Findings.
+### Minor Findings
+#### Finding N: {Finding Name}
+> Same format as Critical Findings.
+### Suggestions
+#### Finding N: {Finding Name}
+> Same format as Critical Findings.
 ---
 ## Pattern Conflicts
@@ -184,6 +235,268 @@ List any particularly well-written code or good practices observed:
 ---
+## Lightweight Review Template (< 50 lines)
+Use this compact template when the changeset is under 50 lines.
+```markdown
+# Local Code Review: {Description}
+**Project**: [[{project-name}]]
+## Review Information
+| Field          | Value                 |
+| -------------- | --------------------- |
+| Date           | {date}                |
+| Files Reviewed | {number_of_files}     |
+| Scope          | {all/staged/unstaged} |
+| Language(s)    | {detected_languages}  |
+---
+## Review Summary
+| Metric | Value |
+|--------|-------|
+| **Review Mode** | Lightweight (< 50 lines) |
+| **Total Findings** | {count} |
+| **Status** | {LGTM / Needs Changes} |
+---
+## Findings
+> Only present if issues were found. Skip this section entirely for LGTM reviews.
+### Finding 1: {Finding Name}
+| Field          | Value                                            |
+| -------------- | ------------------------------------------------ |
+| File           | `{file_path}`                                    |
+| Line           | {line_number}                                    |
+| Severity       | {Critical/Major/Minor}                           |
+| Fix Complexity | {X/10} - {Level}                                 |
+**Description**:
+{Detailed explanation of the issue found}
+**Suggested Fix**:
+\`\`\`{language}
+// Suggested code improvement
+\`\`\`
+---
+## Positive Highlights
+- {Highlight 1}
+- {Highlight 2}
+- {Highlight 3}
+---
+## Commit Readiness
+| Status | {Ready to Commit / Needs Changes} |
+| ------ | --------------------------------- |
+| Reason | {Brief explanation}               |
+```
+> **Note**: Lightweight reviews skip Reference Implementations, Pattern Conflicts, Rule Update Recommendations, and Verification Summary sections.
+---
+## Deep Review Template (500+ lines)
+Use this template for large changesets. Findings are grouped by severity instead of by file, and an executive summary is prepended.
+```markdown
+# Local Code Review: {Description}
+**Project**: [[{project-name}]]
+## Review Information
+| Field          | Value                 |
+| -------------- | --------------------- |
+| Date           | {date}                |
+| Files Reviewed | {number_of_files}     |
+| Scope          | {all/staged/unstaged} |
+| Language(s)    | {detected_languages}  |
+---
+## Executive Summary
+### Files Changed by Category
+| Category | Files | Lines Changed |
+|----------|-------|--------------|
+| Core Logic | {N} | +{add}/-{del} |
+| Infrastructure | {N} | +{add}/-{del} |
+| UI/Presentation | {N} | +{add}/-{del} |
+| Tests | {N} | +{add}/-{del} |
+### Risk Assessment
+**Overall Risk**: {Low | Medium | High}
+{1-2 sentence justification based on scope, categories affected, and finding severity distribution}
+### Top 3 Findings
+1. **[{Severity}]** {Finding title} — {one-line description} (`{file}:{line}`)
+2. **[{Severity}]** {Finding title} — {one-line description} (`{file}:{line}`)
+3. **[{Severity}]** {Finding title} — {one-line description} (`{file}:{line}`)
+---
+## Review Agents
+> Multi-agent parallel review section. Only present in Deep mode (500+ lines).
+| Agent | Model | Findings | After Dedup |
+|-------|-------|----------|-------------|
+| Security | sonnet | {N} | {N} |
+| Logic & Bugs | sonnet | {N} | {N} |
+| Performance | sonnet | {N} | {N} |
+| Pattern Compliance | haiku | {N} | {N} |
+| **Total** | | **{N}** | **{N}** |
+Duplicates removed: {N}
+---
+## Changed Files
+| File          | Category       | Status     | Lines Changed |
+| ------------- | -------------- | ---------- | ------------- |
+| `{file_path}` | {Core/Infra/UI/Tests} | {modified} | +{add}/-{del} |
+| ...           | ...            | ...        | ...           |
+---
+## Reference Implementations Found
+> Same format as standard template — per changed file, similar implementations in the codebase.
+---
+## Review Summary
+| Metric                | Value              |
+| --------------------- | ------------------ |
+| **Review Mode**       | Deep (500+ lines)  |
+| **Total Findings**    | {count}            |
+| Critical              | {critical_count}   |
+| Major                 | {major_count}      |
+| Minor                 | {minor_count}      |
+| Suggestion            | {suggestion_count} |
+| **Pattern Conflicts** | {conflict_count}   |
+| **Total Fix Effort**  | {sum_of_scores}/X  |
+---
+## Verification Summary
+| Metric | Count |
+|--------|-------|
+| Initial findings | {N} |
+| Confirmed | {N} |
+| Likely (needs human judgment) | {N} |
+| Dismissed (false positives filtered) | {N} |
+| **False positive rate** | **{N}%** |
+---
+## Critical Findings
+### Finding 1: {Finding Name}
+| Field          | Value                                            |
+| -------------- | ------------------------------------------------ |
+| File           | `{file_path}`                                    |
+| Line           | {line_number}                                    |
+| Severity       | Critical                                         |
+| Fix Complexity | {X/10} - {Level}                                 |
+| Category       | {Core Logic/Infrastructure/UI/Tests}             |
+| Pattern        | {Reference to pattern from rules, if applicable} |
+**Description**:
+{Detailed explanation}
+**Reference Implementation**:
+See `{reference_file_path}` for how this is handled elsewhere.
+**Suggested Fix**:
+\`\`\`{language}
+// Suggested code improvement
+\`\`\`
+---
+## Major Findings
+### Finding N: {Finding Name}
+> Same format as Critical Findings.
+---
+## Minor Findings
+### Finding N: {Finding Name}
+> Same format as Critical Findings.
+---
+## Suggestions
+### Finding N: {Finding Name}
+> Same format as Critical Findings.
+---
+## Pattern Conflicts
+> Same format as standard template.
+---
+## Rule Update Recommendations
+> Same format as standard template.
+---
+## Positive Highlights
+- {Highlight 1}
+- {Highlight 2}
+---
+## Commit Readiness
+| Status | {Ready to Commit/Needs Changes/Needs Discussion} |
+| ------ | ------------------------------------------------ |
+| Reason | {Brief explanation}                              |
+### Before Committing
+- [ ] Address all Critical findings
+- [ ] Address all Major findings
+- [ ] Review Pattern Conflicts and decide on resolution
+- [ ] Update rules files if new patterns should be documented
+```
+> **Note**: Deep reviews always include the Verification Summary, group findings by severity, and prepend an Executive Summary with risk assessment. The Changed Files table includes a Category column.
+---
 ## Example Output
 ### Pattern Conflict Example