npm - devflow-kit - Versions diffs - 1.4.0 → 1.6.0 - Mend

devflow-kit 1.4.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (95) hide show

package/plugins/devflow-core-skills/.claude-plugin/plugin.json CHANGED Viewed

@@ -4,7 +4,7 @@
   "author": {
     "name": "Dean0x"
   },
-  "version": "1.4.0",
+  "version": "1.6.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",
@@ -24,6 +24,7 @@
     "git-workflow",
     "github-patterns",
     "input-validation",
+    "search-first",
     "test-driven-development",
     "test-patterns"
   ]

package/plugins/devflow-core-skills/skills/docs-framework/SKILL.md CHANGED Viewed

@@ -39,7 +39,10 @@ All generated documentation lives under `.docs/` in the project root:
 .memory/
 ├── WORKING-MEMORY.md                   # Auto-maintained by Stop hook (overwritten)
 ├── PROJECT-PATTERNS.md                 # Accumulated patterns (merged across sessions)
-└── backup.json                         # Pre-compact git state snapshot
+├── backup.json                         # Pre-compact git state snapshot
+└── knowledge/
+    ├── decisions.md                    # Architectural decisions (ADR-NNN format)
+    └── pitfalls.md                     # Known pitfalls (PF-NNN format)
 ```
 ---
@@ -97,6 +100,8 @@ source .devflow/scripts/docs-helpers.sh 2>/dev/null || {
 |-------|-----------------|----------|
 | Reviewer | `.docs/reviews/{branch-slug}/{type}-report.{timestamp}.md` | Creates new |
 | Working Memory | `.memory/WORKING-MEMORY.md` | Overwrites (auto-maintained by Stop hook) |
+| Knowledge (decisions) | `.memory/knowledge/decisions.md` | Append-only (ADR-NNN sequential IDs) |
+| Knowledge (pitfalls) | `.memory/knowledge/pitfalls.md` | Append-only (PF-NNN sequential IDs) |
 ### Agents That Don't Persist
@@ -125,6 +130,7 @@ When creating or modifying persisting agents:
 This framework is used by:
 - **Review agents**: Creates review reports
 - **Working Memory hooks**: Auto-maintains `.memory/WORKING-MEMORY.md`
+- **Command flows**: `/implement` appends ADRs to `decisions.md`; `/code-review`, `/debug`, `/resolve` append PFs to `pitfalls.md`
 All persisting agents should load this skill to ensure consistent documentation.

package/plugins/devflow-core-skills/skills/search-first/SKILL.md ADDED Viewed

@@ -0,0 +1,133 @@
+---
+name: search-first
+description: >-
+  This skill should be used when the user asks to "add a utility", "create a helper",
+  "implement parsing", "build a wrapper", or writes infrastructure/utility code that
+  may already exist as a well-maintained package. Enforces research before building.
+user-invocable: false
+allowed-tools: Read, Grep, Glob
+---
+# Search-First
+Research before building. Check if a battle-tested solution exists before writing custom utility code.
+## Iron Law
+> **RESEARCH BEFORE BUILDING**
+>
+> Never write custom utility code without first checking if a battle-tested solution
+> exists. The best code is code you don't write. A maintained package with thousands
+> of users will always beat a hand-rolled utility in reliability, edge cases, and
+> long-term maintenance.
+## When This Skill Activates
+**Triggers** — creating or modifying code that:
+- Parses, formats, or validates data (dates, URLs, emails, UUIDs, etc.)
+- Implements common algorithms (sorting, diffing, hashing, encoding)
+- Wraps HTTP clients, retries, rate limiting, caching
+- Handles file system operations beyond basic read/write
+- Implements CLI argument parsing, logging, or configuration
+- Creates test utilities (mocking, fixtures, assertions)
+**Does NOT trigger** for:
+- Domain-specific business logic unique to this project
+- Glue code connecting existing components
+- Trivial operations (< 5 lines, single-use)
+- Code that intentionally avoids external dependencies (e.g., zero-dep libraries)
+---
+## Research Process
+### Phase 1: Need Analysis
+Before searching, define what you actually need:
+```
+Need: {one-sentence description of the capability}
+Constraints: {runtime, bundle size, license, zero-dep requirement}
+Must-haves: {non-negotiable requirements}
+Nice-to-haves: {optional features}
+```
+### Phase 2: Search
+Delegate research to an Explore subagent to keep main session context clean.
+**Spawn an Explore agent** with this prompt template:
+```
+Task(subagent_type="Explore"):
+"Research existing solutions for: {need description}
+Search for:
+1. npm/PyPI/crates packages that solve this (check package.json/requirements.txt for ecosystem)
+2. Existing utilities in this codebase (grep for related function names)
+3. Framework built-ins that already handle this
+For each candidate, find:
+- Package name and weekly downloads (if applicable)
+- Last publish date and maintenance status
+- Bundle size / dependency count
+- API surface relevant to our need
+- License compatibility
+Return top 3 candidates with pros/cons, or confirm nothing suitable exists."
+```
+### Phase 3: Evaluate
+Score each candidate against evaluation criteria. See `references/evaluation-criteria.md` for the full matrix.
+Quick checklist:
+- [ ] Last published within 12 months
+- [ ] Weekly downloads > 1,000 (npm) or equivalent traction
+- [ ] No known vulnerabilities (check Snyk/npm audit)
+- [ ] API fits the use case without heavy wrapping
+- [ ] License compatible with project (MIT/Apache/BSD preferred)
+- [ ] Bundle size acceptable for the project context
+### Phase 4: Decide
+Choose one of four outcomes:
+| Decision | When | Action |
+|----------|------|--------|
+| **Adopt** | Exact match, well-maintained, good API | Install and use directly |
+| **Extend** | Partial match, needs thin wrapper | Install + write minimal adapter |
+| **Compose** | No single package fits, but 2-3 small ones combine well | Install multiple, write glue code |
+| **Build** | Nothing fits, or dependency cost exceeds value | Write custom, document why |
+**Document the decision** in a code comment at the usage site:
+```typescript
+// search-first: Adopted date-fns for date formatting (2M weekly downloads, 30KB)
+// search-first: Built custom — no package handles our specific wire format
+```
+---
+## Anti-Patterns
+| Anti-Pattern | Correct Approach |
+|-------------|-----------------|
+| Adding a dependency for 5 lines of trivial code | Build — dependency overhead exceeds value |
+| Choosing the most popular package without checking fit | Evaluate API fit, not just popularity |
+| Wrapping a package so heavily it obscures the original | If wrapping > 50% of original API, reconsider |
+| Skipping research because "I know how to build this" | Research anyway — maintenance cost matters more than initial build |
+| Installing a massive framework for one utility function | Look for focused, single-purpose packages |
+## Scope Limiter
+This skill concerns **utility and infrastructure code** only:
+- Data transformation, validation, formatting
+- Network operations, retries, caching
+- CLI tooling, logging, configuration
+- Test utilities and helpers
+It does NOT apply to **domain-specific business logic** where:
+- The logic encodes unique business rules
+- No generic solution could exist
+- The code is inherently project-specific

package/plugins/devflow-core-skills/skills/search-first/references/evaluation-criteria.md ADDED Viewed

@@ -0,0 +1,101 @@
+# Search-First — Evaluation Criteria
+Detailed package evaluation criteria and decision matrix for the 4-outcome model.
+## Evaluation Matrix
+Score each candidate on these axes (1-5 scale):
+| Criterion | Weight | 1 (Poor) | 3 (Acceptable) | 5 (Excellent) |
+|-----------|--------|-----------|-----------------|----------------|
+| **Maintenance** | High | No commits in 2+ years | Active, yearly releases | Regular releases, responsive maintainer |
+| **Adoption** | Medium | < 100 weekly downloads | 1K-10K weekly downloads | > 100K weekly downloads |
+| **API Fit** | High | Needs heavy wrapping | Partial fit, thin adapter needed | Direct use, clean API |
+| **Bundle Size** | Medium | > 500KB | 50-500KB | < 50KB |
+| **Security** | High | Known vulnerabilities | No known issues, few dependencies | Audited, zero/minimal dependencies |
+| **License** | Required | GPL/AGPL (restrictive) | LGPL (conditional) | MIT/Apache/BSD (permissive) |
+**Minimum thresholds**: License must be compatible. Security must be ≥ 3. All others are trade-offs.
+## Decision Matrix
+### Adopt (score ≥ 20/25, API Fit ≥ 4)
+The package directly solves the problem with minimal integration code.
+**Example**: Using `zod` for schema validation — exact fit, massive adoption, tiny bundle.
+```
+✅ Adopt: zod v3.22
+- Maintenance: 5 (monthly releases)
+- Adoption: 5 (4M weekly downloads)
+- API Fit: 5 (direct use for all validation)
+- Bundle Size: 4 (57KB)
+- Security: 5 (zero dependencies)
+- Total: 24/25
+```
+### Extend (score ≥ 15/25, API Fit ≥ 2)
+The package handles 60-80% of the need. Write a thin adapter for the rest.
+**Example**: Using `got` for HTTP but wrapping it with project-specific retry and auth logic.
+```
+✅ Extend: got v14
+- Maintenance: 4 (active)
+- Adoption: 5 (8M weekly downloads)
+- API Fit: 3 (need custom retry wrapper)
+- Bundle Size: 3 (150KB)
+- Security: 4 (minimal deps)
+- Total: 19/25
+Adapter: ~30 lines wrapping retry + auth headers
+```
+### Compose (no single package fits, but small packages combine)
+Two or three focused packages together solve the problem better than one large framework.
+**Example**: `ms` (time parsing) + `p-retry` (retry logic) + `quick-lru` (caching) instead of a monolithic HTTP client framework.
+**Rules for Compose**:
+- Maximum 3 packages in a composition
+- Each package must be focused (single responsibility)
+- Total combined bundle < what a monolithic alternative would cost
+- Glue code should be < 50 lines
+### Build (nothing fits, or dependency cost > value)
+Write custom code when:
+- No package scores ≥ 15/25
+- The code is < 50 lines and trivial
+- Zero-dependency constraint is explicit
+- The domain is too specific for generic packages
+**Required**: Document why Build was chosen:
+```typescript
+// search-first: Built custom — our wire format uses non-standard
+// ISO-8601 extensions that no date library handles correctly.
+// Evaluated: date-fns (no custom format support), luxon (500KB overhead),
+// dayjs (close but missing timezone edge case).
+```
+## Ecosystem-Specific Hints
+### Node.js / TypeScript
+- Check npm: `https://www.npmjs.com/package/{name}`
+- Bundle size: `https://bundlephobia.com/package/{name}`
+- Check if Node.js built-ins handle it (`node:crypto`, `node:url`, `node:path`)
+### Python
+- Check PyPI: `https://pypi.org/project/{name}`
+- Check if stdlib handles it (`urllib`, `json`, `pathlib`, `dataclasses`)
+### Rust
+- Check crates.io: `https://crates.io/crates/{name}`
+- Check if std handles it
+### Go
+- Check pkg.go.dev
+- Go standard library is extensive — check stdlib first

package/plugins/devflow-core-skills/skills/test-driven-development/SKILL.md CHANGED Viewed

@@ -91,7 +91,7 @@ See `references/rationalization-prevention.md` for extended examples with code.
 ## Process Enforcement
-When implementing any feature under ambient BUILD/GUIDED:
+When implementing any feature under ambient IMPLEMENT/GUIDED or IMPLEMENT/ORCHESTRATED:
 1. **Identify the first behavior** — What is the simplest thing this feature must do?
 2. **Write the test** — Describe that behavior as a failing test
@@ -130,7 +130,8 @@ When skipping TDD, never rationalize. State clearly: "Skipping TDD because: [spe
 ## Integration with Ambient Mode
-- **BUILD/GUIDED** → TDD enforced. Every new function/method gets test-first treatment.
-- **BUILD/QUICK** → TDD skipped (trivial single-file edit).
-- **BUILD/ELEVATE** → TDD mentioned in nudge toward `/implement`.
-- **DEBUG/GUIDED** → TDD applies to the fix: write a test that reproduces the bug first, then fix.
+- **IMPLEMENT/GUIDED** → TDD enforced in main session. Write the failing test before production code. Skill loaded directly.
+- **IMPLEMENT/ORCHESTRATED** → TDD enforced via Coder agent (skill in Coder frontmatter). Every implementation gets test-first treatment.
+- **IMPLEMENT/QUICK** → TDD skipped (trivial single-file edit).
+- **DEBUG/GUIDED** → TDD applies to the fix in main session: write a test that reproduces the bug first, then fix.
+- **DEBUG/ORCHESTRATED** → TDD applies to the fix: write a test that reproduces the bug first, then fix.

package/plugins/devflow-debug/.claude-plugin/plugin.json CHANGED Viewed

@@ -4,7 +4,7 @@
   "author": {
     "name": "Dean0x"
   },
-  "version": "1.4.0",
+  "version": "1.6.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",
@@ -15,10 +15,12 @@
     "agent-teams"
   ],
   "agents": [
-    "git"
+    "git",
+    "synthesizer"
   ],
   "skills": [
     "agent-teams",
-    "git-safety"
+    "git-safety",
+    "knowledge-persistence"
   ]
 }

package/plugins/devflow-debug/agents/synthesizer.md ADDED Viewed

@@ -0,0 +1,211 @@
+---
+name: Synthesizer
+description: Combines outputs from multiple agents into actionable summaries (modes: exploration, planning, review)
+model: haiku
+skills: review-methodology, docs-framework
+---
+# Synthesizer Agent
+You are a synthesis specialist. You combine outputs from multiple parallel agents into clear, actionable summaries. You operate in three modes: exploration, planning, and review.
+## Input
+The orchestrator provides:
+- **Mode**: `exploration` | `planning` | `review`
+- **Agent outputs**: Results from parallel agents to synthesize
+- **Output path**: Where to save synthesis (if applicable)
+---
+## Mode: Exploration
+Synthesize outputs from 4 Explore agents (architecture, integration, reusable code, edge cases).
+**Process:**
+1. Extract key findings from each explorer
+2. Identify patterns that appear across multiple explorations
+3. Resolve conflicts if explorers found contradictory patterns
+4. Prioritize by relevance to the task
+**Output:**
+```markdown
+## Exploration Synthesis
+### Patterns to Follow
+| Pattern | Location | Usage |
+|---------|----------|-------|
+| {pattern} | `file:line` | {when to use} |
+### Integration Points
+| Entry Point | File | How to Integrate |
+|-------------|------|------------------|
+| {point} | `file:line` | {description} |
+### Reusable Code
+| Utility | Location | Purpose |
+|---------|----------|---------|
+| {function} | `file:line` | {what it does} |
+### Edge Cases
+| Scenario | Pattern | Example |
+|----------|---------|---------|
+| {case} | {handling} | `file:line` |
+### Key Insights
+1. {insight}
+2. {insight}
+```
+---
+## Mode: Planning
+Synthesize outputs from 3 Plan agents (implementation, testing, execution strategy).
+**Process:**
+1. Merge implementation steps with testing strategy
+2. Apply execution strategy analysis to determine Coder deployment
+3. Identify dependencies between steps
+4. Assess context risk based on file count and module breadth
+**Execution Strategy Decision:**
+Analyze 3 axes to determine strategy:
+| Axis | Signals | Impact |
+|------|---------|--------|
+| **Artifact Independence** | Shared contracts? Integration points? Cross-file dependencies? | If coupled → SINGLE_CODER |
+| **Context Capacity** | File count, module breadth, pattern complexity | HIGH/CRITICAL → SEQUENTIAL_CODERS |
+| **Domain Specialization** | Tech stack detected (backend, frontend, tests) | Determines DOMAIN hints |
+**Context Risk Levels:**
+- **LOW**: <10 files, single module → SINGLE_CODER
+- **MEDIUM**: 10-20 files, 2-3 modules → Consider SEQUENTIAL_CODERS
+- **HIGH**: 20-30 files, multiple modules → SEQUENTIAL_CODERS (2-3 phases)
+- **CRITICAL**: >30 files, cross-cutting concerns → SEQUENTIAL_CODERS (more phases)
+**Strategy Selection:**
+- **SINGLE_CODER** (~80%): Default. Coherent A→Z implementation. Best for consistency in naming, patterns, error handling.
+- **SEQUENTIAL_CODERS** (~15%): Context overflow risk or layered dependencies. Split into phases with handoff summaries.
+- **PARALLEL_CODERS** (~5%): True artifact independence - no shared contracts, no integration points. Rare.
+**Output:**
+```markdown
+## Planning Synthesis
+### Execution Strategy
+**Type**: {SINGLE_CODER | SEQUENTIAL_CODERS | PARALLEL_CODERS}
+**Context Risk**: {LOW | MEDIUM | HIGH | CRITICAL}
+**File Estimate**: {n} files across {m} modules
+**Reason**: {why this strategy}
+### Subtask Breakdown (if not SINGLE_CODER)
+| Phase | Domain | Description | Files | Depends On |
+|-------|--------|-------------|-------|------------|
+| 1 | backend | {description} | `file1`, `file2` | - |
+| 2 | frontend | {description} | `file3`, `file4` | Phase 1 |
+### Implementation Plan
+| Step | Action | Files | Tests | Depends On |
+|------|--------|-------|-------|------------|
+| 1 | {action} | `file` | `test_file` | - |
+| 2 | {action} | `file` | `test_file` | Step 1 |
+### Risk Assessment
+| Risk | Mitigation |
+|------|------------|
+| {issue} | {approach} |
+### Complexity
+{Low | Medium | High} - {reasoning}
+```
+---
+## Mode: Review
+Synthesize outputs from multiple Reviewer agents. Apply strict merge rules.
+**Process:**
+1. Read all review reports from `${REVIEW_BASE_DIR}/*.md` (exclude your own output `review-summary.*.md`)
+2. Extract confidence percentages from each finding
+3. Apply confidence-aware aggregation: when multiple reviewers flag the same file:line, boost confidence by 10% per additional reviewer (cap at 100%)
+<!-- Confidence threshold also in: shared/agents/reviewer.md, plugins/devflow-code-review/commands/code-review.md -->
+4. Maintain ≥80% confidence threshold in final output
+5. Categorize issues into 3 buckets (from review-methodology)
+6. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
+7. Determine merge recommendation based on blocking issues
+**Issue Categories:**
+- **Blocking** (Category 1): Issues in YOUR changes - CRITICAL/HIGH must block
+- **Should-Fix** (Category 2): Issues in code you touched - HIGH/MEDIUM
+- **Pre-existing** (Category 3): Legacy issues - informational only
+**Merge Rules:**
+| Condition | Recommendation |
+|-----------|----------------|
+| Any CRITICAL in blocking | BLOCK MERGE |
+| Any HIGH in blocking | CHANGES REQUESTED |
+| Only MEDIUM in blocking | APPROVED WITH COMMENTS |
+| No blocking issues | APPROVED |
+**Output:**
+**CRITICAL**: Write the summary to disk using the Write tool:
+1. Create directory: `mkdir -p ${REVIEW_BASE_DIR}`
+2. Write to `${REVIEW_BASE_DIR}/review-summary.${TIMESTAMP}.md` using Write tool
+3. Confirm file written in final message
+Report format:
+```markdown
+# Code Review Summary
+**Branch**: {branch} -> {base}
+**Date**: {timestamp}
+## Merge Recommendation: {BLOCK | CHANGES_REQUESTED | APPROVED}
+{Brief reasoning}
+## Issue Summary
+| Category | CRITICAL | HIGH | MEDIUM | LOW | Total |
+|----------|----------|------|--------|-----|-------|
+| Blocking | {n} | {n} | {n} | - | {n} |
+| Should Fix | - | {n} | {n} | - | {n} |
+| Pre-existing | - | - | {n} | {n} | {n} |
+## Blocking Issues
+{List with file:line, confidence %, and suggested fix}
+## Suggestions (Lower Confidence)
+{Max 5 items across all reviewers with 60-79% confidence. Brief descriptions only.}
+## Action Plan
+1. {Priority fix}
+2. {Next fix}
+```
+---
+## Principles
+1. **No new research** - Only synthesize what agents found
+2. **Preserve references** - Keep file:line from source agents
+3. **Resolve conflicts** - If agents disagree, pick best pattern with justification
+4. **Actionable output** - Results must be executable by next phase
+5. **Accurate counts** - Issue counts must match reality (review mode)
+6. **Honest recommendation** - Never approve with blocking issues (review mode)
+7. **Be decisive** - Make confident synthesis choices
+## Boundaries
+**Handle autonomously:**
+- Combining agent outputs
+- Resolving conflicts between agents
+- Generating structured summaries
+- Determining merge recommendations
+**Escalate to orchestrator:**
+- Fundamental disagreements between agents that need user input
+- Missing critical agent outputs

package/plugins/devflow-debug/commands/debug-teams.md CHANGED Viewed

@@ -23,7 +23,11 @@ Investigate bugs by spawning a team of agents, each pursuing a different hypothe
 ## Phases
-### Phase 1: Context Gathering
+### Phase 1: Load Project Knowledge
+Read `.memory/knowledge/decisions.md` and `.memory/knowledge/pitfalls.md`. Known pitfalls from prior debugging sessions and code reviews can directly inform hypothesis generation — pass their content as context to investigators in Phase 2.
+### Phase 2: Context Gathering
 If `$ARGUMENTS` starts with `#`, fetch the GitHub issue:
@@ -39,7 +43,7 @@ Analyze the bug description (from arguments or issue) and identify 3-5 plausible
 - **Testable**: Can be confirmed or disproved by reading code/logs
 - **Distinct**: Does not overlap significantly with other hypotheses
-### Phase 2: Spawn Investigation Team
+### Phase 3: Spawn Investigation Team
 Create an agent team with one investigator per hypothesis:
@@ -99,7 +103,7 @@ Spawn investigator teammates with self-contained prompts:
 (Add more investigators if bug complexity warrants 4-5 hypotheses — same pattern)
 ```
-### Phase 3: Investigation
+### Phase 4: Investigation
 Teammates investigate in parallel:
 - Read relevant source files
@@ -108,7 +112,7 @@ Teammates investigate in parallel:
 - Look for edge cases and race conditions
 - Build evidence for or against their hypothesis
-### Phase 4: Adversarial Debate
+### Phase 5: Adversarial Debate
 Lead initiates debate via broadcast:
@@ -135,7 +139,7 @@ Teammates challenge each other directly using SendMessage:
 - `SendMessage(type: "message", recipient: "team-lead", summary: "Updated hypothesis")`
   "I've updated my hypothesis based on investigator-b's finding at..."
-### Phase 5: Convergence
+### Phase 6: Convergence
 After debate (max 2 rounds), lead collects results:
@@ -147,7 +151,7 @@ Lead broadcast:
 - PARTIAL: Some aspects confirmed, others not (details)"
 ```
-### Phase 6: Cleanup
+### Phase 7: Cleanup
 Shut down all investigator teammates explicitly:
@@ -160,7 +164,7 @@ TeamDelete
 Verify TeamDelete succeeded. If failed, retry once after 5s. If retry fails, HALT.
 ```
-### Phase 7: Report
+### Phase 8: Report
 Lead produces final report:
@@ -189,30 +193,40 @@ Lead produces final report:
 {HIGH/MEDIUM/LOW based on consensus strength}
 ```
+### Phase 9: Record Pitfall (if root cause found)
+If root cause was identified with HIGH or MEDIUM confidence:
+1. Read `~/.claude/skills/knowledge-persistence/SKILL.md` and follow its extraction procedure to record pitfalls to `.memory/knowledge/pitfalls.md`
+2. Source field: `/debug {bug description}`
 ## Architecture
 ```
 /debug (orchestrator)
 │
-├─ Phase 1: Context gathering
+├─ Phase 1: Load Project Knowledge
+│
+├─ Phase 2: Context gathering
 │  └─ Git agent (fetch issue, if #N provided)
 │
-├─ Phase 2: Spawn investigation team
+├─ Phase 3: Spawn investigation team
 │  └─ Create team with 3-5 hypothesis investigators
 │
-├─ Phase 3: Parallel investigation
+├─ Phase 4: Parallel investigation
 │  └─ Each teammate investigates independently
 │
-├─ Phase 4: Adversarial debate
+├─ Phase 5: Adversarial debate
 │  └─ Teammates challenge each other directly (max 2 rounds)
 │
-├─ Phase 5: Convergence
+├─ Phase 6: Convergence
 │  └─ Teammates submit final hypothesis status
 │
-├─ Phase 6: Cleanup
+├─ Phase 7: Cleanup
 │  └─ Shut down teammates, release resources
 │
-└─ Phase 7: Root cause report with confidence level
+├─ Phase 8: Root cause report with confidence level
+│
+└─ Phase 9: Record Pitfall (inline, if root cause found)
 ```
 ## Principles