npm - devflow-kit - Versions diffs - 1.4.0 → 1.6.0 - Mend

devflow-kit 1.4.0 → 1.6.0


Files changed (95)

package/shared/agents/synthesizer.md CHANGED

@@ -128,10 +128,14 @@ Analyze 3 axes to determine strategy:
 Synthesize outputs from multiple Reviewer agents. Apply strict merge rules.
 **Process:**
-1. Read all review reports from `${REVIEW_BASE_DIR}/*-report.*.md`
-2. Categorize issues into 3 buckets (from review-methodology)
-3. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
-4. Determine merge recommendation based on blocking issues
+1. Read all review reports from `${REVIEW_BASE_DIR}/*.md` (exclude your own output `review-summary.*.md`)
+2. Extract confidence percentages from each finding
+3. Apply confidence-aware aggregation: when multiple reviewers flag the same file:line, boost confidence by 10% per additional reviewer (cap at 100%)
+<!-- Confidence threshold also in: shared/agents/reviewer.md, plugins/devflow-code-review/commands/code-review.md -->
+4. Maintain ≥80% confidence threshold in final output
+5. Categorize issues into 3 buckets (from review-methodology)
+6. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
+7. Determine merge recommendation based on blocking issues
 **Issue Categories:**
 - **Blocking** (Category 1): Issues in YOUR changes - CRITICAL/HIGH must block
@@ -172,7 +176,10 @@ Report format:
 | Pre-existing | - | - | {n} | {n} | {n} |
 ## Blocking Issues
-{List with file:line and suggested fix}
+{List with file:line, confidence %, and suggested fix}
+## Suggestions (Lower Confidence)
+{Max 5 items across all reviewers with 60-79% confidence. Brief descriptions only.}
 ## Action Plan
 1. {Priority fix}
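
Note on the hunk above: the confidence-aware aggregation rule can be sketched as follows. This is only an illustration, not code from the package. It assumes that "boost by 10%" means ten percentage points, that the starting confidence for a shared `file:line` is the highest value any reviewer reported, and that the `Finding` type and `aggregate` function are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one reviewer finding (not defined by the package)."""
    reviewer: str
    location: str    # "file:line"
    confidence: int  # percent, 0-100


def aggregate(findings, boost=10, threshold=80, suggestion_floor=60, max_suggestions=5):
    """Group findings by location, boost confidence per extra reviewer, split by threshold."""
    by_location = defaultdict(list)
    for finding in findings:
        by_location[finding.location].append(finding)

    blocking, suggestions = [], []
    for location, group in by_location.items():
        reviewers = {f.reviewer for f in group}
        # Assumption: start from the highest reported confidence for this location.
        base = max(f.confidence for f in group)
        boosted = min(100, base + boost * (len(reviewers) - 1))
        if boosted >= threshold:
            blocking.append((location, boosted))
        elif boosted >= suggestion_floor:
            suggestions.append((location, boosted))

    # The report format caps the lower-confidence section at five items.
    suggestions.sort(key=lambda item: item[1], reverse=True)
    return blocking, suggestions[:max_suggestions]
```

Under these assumptions, two reviewers flagging the same location at 75% and 70% would yield a single finding at 85%, which clears the 80% threshold.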

package/shared/skills/ambient-router/SKILL.md CHANGED

@@ -1,25 +1,23 @@
 ---
 name: ambient-router
-description: >-
-  Classify user intent and response depth for ambient mode. Auto-loads relevant
-  skills without explicit command invocation. Used by /ambient command and
-  always-on UserPromptSubmit hook.
+description: This skill should be used when classifying user intent for ambient mode, auto-loading relevant skills without explicit command invocation. Used by the always-on UserPromptSubmit hook.
 user-invocable: false
 allowed-tools: Read, Grep, Glob
 ---
 # Ambient Router
-Classify user intent and auto-load relevant skills. Zero overhead for simple requests, skill injection for substantive work, workflow nudges for complex tasks.
+Classify user intent and auto-load relevant skills. Zero overhead for simple requests, skill loading + optional agent orchestration for substantive work.
 ## Iron Law
-> **PROPORTIONAL RESPONSE**
+> **PROPORTIONAL RESPONSE MATCHED TO SCOPE**
 >
-> Match effort to intent. Never apply heavyweight processes to lightweight requests.
-> A chat question gets zero overhead. A 3-file feature gets 2-3 skills. A system
-> refactor gets a nudge toward `/implement`. Misclassification in either direction
-> is a failure.
+> QUICK gets zero overhead. GUIDED gets skill loading + main session implementation
+> with Simplifier cleanup. ORCHESTRATED gets full skill loading via the Skill tool plus
+> agent pipeline execution. Misclassification in either direction is a failure —
+> false-positive ORCHESTRATED is expensive (5-6 agent spawns), false-negative
+> GUIDED leaves quality on the table.
 ---
@@ -29,14 +27,14 @@ Determine what the user is trying to do from their prompt.
 | Intent | Signal Words / Patterns | Examples |
 |--------|------------------------|---------|
-| **BUILD** | "add", "create", "implement", "build", "write", "make" | "add a login form", "create an API endpoint" |
+| **IMPLEMENT** | "add", "create", "implement", "build", "write", "make" | "add a login form", "create an API endpoint" |
 | **DEBUG** | "fix", "bug", "broken", "failing", "error", "why does" | "fix the auth error", "why is this test failing" |
 | **REVIEW** | "check", "look at", "review", "is this ok", "any issues" | "check this function", "any issues with this?" |
 | **PLAN** | "how should", "design", "architecture", "approach", "strategy" | "how should I structure auth?", "what's the approach for caching?" |
 | **EXPLORE** | "what is", "where is", "find", "show me", "explain", "how does" | "where is the config?", "explain this function" |
 | **CHAT** | greetings, meta-questions, confirmations, short responses | "thanks", "yes", "what can you do?" |
-**Ambiguous prompts:** Default to the lowest-overhead classification. "Update the README" → BUILD/GUIDED. Git operations like "commit this" → QUICK.
+**Ambiguous prompts:** Default to the lowest-overhead classification. "Update the README" → QUICK. Git operations like "commit this" → QUICK.
 ## Step 2: Classify Depth
@@ -44,44 +42,87 @@ Determine how much enforcement the prompt warrants.
 | Depth | Criteria | Action |
 |-------|----------|--------|
-| **QUICK** | CHAT intent. EXPLORE with no analytical depth ("where is X?"). Git/devops operations (commit, push, merge, branch, pr, deploy, reinstall). Single-word continuations. | Respond normally. Zero overhead. Do not state classification. |
-| **GUIDED** | BUILD/DEBUG/REVIEW/PLAN intent (any word count). EXPLORE with analytical depth ("analyze our X", "discuss how Y works"). | Read and apply 2-3 relevant skills from the selection matrix below. State classification briefly. |
-| **ELEVATE** | Multi-file architectural change, system-wide scope, > 5 files. Detailed implementation plan (100+ words with plan structure). | Respond at best effort + recommend: "This looks like it would benefit from `/implement` for full lifecycle management." |
+| **QUICK** | CHAT intent. EXPLORE intent. Git/devops operations (commit, push, merge, branch, pr, deploy, reinstall). Single-word continuations. Small edits, config changes, trivial single-file tweaks. | Respond normally. Zero overhead. Do not state classification. |
+| **GUIDED** | IMPLEMENT with small scope (≤2 files, single module). DEBUG with clear error location (stack trace, specific file, known function). PLAN for focused design questions (specific area/pattern). REVIEW (always GUIDED). | Load skills via Skill tool. Main session implements directly. Spawn Simplifier after code changes. State classification. |
+| **ORCHESTRATED** | IMPLEMENT with larger scope (>2 files, multi-module, complex). DEBUG with vague/cross-cutting bug (no clear location, multiple possible causes). PLAN for system-level architecture (caching layer, auth system, multi-module design). | Load skills via Skill tool, then orchestrate agents per Step 5. State classification. |
-## Step 3: Select Skills (GUIDED depth only)
+**Scope-based decision criteria:**
-Based on classified intent, read the following skills to inform your response.
+| Intent | GUIDED (small scope) | ORCHESTRATED (large scope) |
+|--------|---------------------|---------------------------|
+| **IMPLEMENT** | ≤2 files, single module, clear task | >2 files, multi-module, complex |
+| **DEBUG** | Clear error with known location (stack trace, specific file) | Vague/cross-cutting bug, multiple possible causes |
+| **PLAN** | Focused question about specific area/pattern | System-level architecture, multi-module design |
+| **REVIEW** | Always GUIDED | — |
+**Classification conservatism:** Default to QUICK. Only classify GUIDED/ORCHESTRATED when the prompt has clear task scope. When choosing between GUIDED and ORCHESTRATED, prefer GUIDED — escalate only when scope clearly exceeds main-session capacity.
+## Step 3: Select Skills
+Based on classified intent and depth, invoke each selected skill using the Skill tool.
+### GUIDED-depth skills
 | Intent | Primary Skills | Secondary (if file type matches) |
 |--------|---------------|----------------------------------|
-| **BUILD** | test-driven-development, implementation-patterns | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) |
-| **DEBUG** | test-patterns, core-patterns | git-safety (if git operations involved) |
+| **IMPLEMENT** | test-driven-development, implementation-patterns, search-first | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) |
+| **DEBUG** | core-patterns, test-patterns | git-safety (if git operations involved) |
+| **PLAN** | implementation-patterns, core-patterns | — |
 | **REVIEW** | self-review, core-patterns | test-patterns |
-| **PLAN** | implementation-patterns | core-patterns |
-**Excluded from ambient** (review-command-only): review-methodology, complexity-patterns, consistency-patterns, database-patterns, dependencies-patterns, documentation-patterns, regression-patterns, architecture-patterns, accessibility.
+### ORCHESTRATED-depth skills
+| Intent | Primary Skills | Secondary (if file type matches) |
+|--------|---------------|----------------------------------|
+| **IMPLEMENT** | implementation-orchestration, implementation-patterns | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) |
+| **DEBUG** | debug-orchestration, core-patterns | git-safety (if git operations involved) |
+| **PLAN** | plan-orchestration, implementation-patterns, core-patterns | — |
+**Excluded from ambient** (review-command-only): review-methodology, complexity-patterns, consistency-patterns, database-patterns, dependencies-patterns, documentation-patterns, regression-patterns, architecture-patterns, accessibility, performance-patterns.
 See `references/skill-catalog.md` for the full skill-to-intent mapping with file pattern triggers.
 ## Step 4: Apply
 <IMPORTANT>
-When classification is GUIDED or ELEVATE, skill application is NON-NEGOTIABLE.
+When classification is GUIDED or ORCHESTRATED, skill loading is NON-NEGOTIABLE.
 Do not rationalize skipping skills. Do not respond without loading them first.
-If test-driven-development is selected, you MUST write the failing test before ANY production code.
+BLOCKING REQUIREMENT: Invoke each selected skill using the Skill tool before proceeding.
+For IMPLEMENT intent, enforce TDD: write the failing test before ANY production code.
 </IMPORTANT>
 - **QUICK:** Respond directly. No preamble, no classification statement.
-- **GUIDED:** State classification briefly: `Ambient: BUILD/GUIDED. Loading: test-driven-development, implementation-patterns.` Then read the selected skills and apply their patterns. No exceptions.
-- **ELEVATE:** Respond with your best effort, then append: `> This task spans multiple files/systems. Consider \`/implement\` for full lifecycle.`
+- **GUIDED:** State classification briefly: `Ambient: IMPLEMENT/GUIDED. Loading: implementation-patterns, search-first.` Then invoke each skill using the Skill tool and work directly in main session. After code changes, spawn Simplifier on changed files.
+- **ORCHESTRATED:** State classification briefly: `Ambient: IMPLEMENT/ORCHESTRATED. Loading: implementation-orchestration, implementation-patterns.` Then invoke each skill using the Skill tool and follow Step 5 for agent orchestration.
+### GUIDED Behavior by Intent
+| Intent | Main Session Work | Post-Work |
+|--------|------------------|-----------|
+| **IMPLEMENT** | Implement directly with loaded skills. Follow TDD cycle. | Spawn Simplifier on changed files. |
+| **DEBUG** | Investigate directly — reproduce bug, diagnose from stack trace/error, fix. | Spawn Simplifier on changed files. |
+| **PLAN** | Explore relevant code and design directly. The area is focused enough for main session. | No Simplifier (no code changes). |
+| **REVIEW** | Review directly with loaded skills. | No Simplifier. |
+## Step 5: Orchestrate Agents (ORCHESTRATED depth only)
+After loading skills via Step 3-4, execute the agent pipeline for the classified intent:
+| Intent | Pipeline |
+|--------|----------|
+| **IMPLEMENT** | Follow implementation-orchestration skill pipeline: pre-flight → plan synthesis → Coder → quality gates |
+| **DEBUG** | Follow debug-orchestration skill pipeline: hypotheses → parallel Explores → convergence → report → offer fix |
+| **PLAN** | Follow plan-orchestration skill pipeline: Skimmer → Explores → Plan agent → gap validation |
+| **EXPLORE** | No agents — respond in main session |
+| **CHAT** | No agents — respond in main session |
 ---
 ## Transparency Rules
 1. **QUICK → silent.** No classification output.
-2. **GUIDED → brief statement + full skill enforcement.** One line: intent, depth, skills loaded. Then follow every skill requirement without shortcuts.
-3. **ELEVATE → recommendation.** Best-effort response + workflow nudge.
+2. **GUIDED → brief statement + full skill enforcement.** One line: intent, depth, skills loaded. Then implement in main session with skill patterns applied.
+3. **ORCHESTRATED → brief statement + full skill enforcement + agent orchestration.** One line: intent, depth, skills loaded. Then follow every skill requirement and orchestrate agents per Step 5.
 4. **Never lie about classification.** If uncertain, say so.
 5. **Never over-classify.** When in doubt, go one tier lower.
 6. **Never under-apply.** Rationalization is the enemy of quality. If a skill requires a step, do the step.
@@ -90,7 +131,10 @@ If test-driven-development is selected, you MUST write the failing test before A
 | Case | Handling |
 |------|----------|
-| Mixed intent ("fix this bug and add a test") | Use the higher-overhead intent (BUILD > DEBUG) |
+| Mixed intent ("fix this bug and add a test") | Use the higher-overhead intent (IMPLEMENT > DEBUG) |
 | Continuation of previous conversation | Inherit previous classification unless prompt clearly shifts |
 | User explicitly requests no enforcement | Respect immediately — classify as QUICK |
 | Prompt references specific DevFlow command | Skip ambient — the command has its own orchestration |
+| Scope ambiguous between GUIDED and ORCHESTRATED | Default to GUIDED; escalate if complexity emerges during work |
+| REVIEW intent | Always GUIDED — single Reviewer focus, no orchestration pipeline |
+| Multiple triggers per session | Each runs independently; context compaction handles accumulation |
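
Note on the hunk above: the scope-based depth criteria introduced in 1.6.0 can be read as a small decision function. The sketch below is illustrative only. In the package the classification is performed by the model following the skill text, and the function name, parameters, and boolean inputs here are assumptions standing in for that judgment.

```python
from enum import Enum


class Depth(Enum):
    QUICK = "QUICK"
    GUIDED = "GUIDED"
    ORCHESTRATED = "ORCHESTRATED"


def classify_depth(intent, files_in_scope=0, multi_module=False,
                   clear_location=False, system_level=False, clear_scope=True):
    """Map an intent plus scope signals to a depth, following the 1.6.0 table."""
    # Conservatism rule: default to QUICK unless the prompt has clear task scope.
    if intent in ("CHAT", "EXPLORE") or not clear_scope:
        return Depth.QUICK
    if intent == "REVIEW":
        return Depth.GUIDED  # always GUIDED, no orchestration pipeline
    if intent == "IMPLEMENT":
        large = files_in_scope > 2 or multi_module
        return Depth.ORCHESTRATED if large else Depth.GUIDED
    if intent == "DEBUG":
        return Depth.GUIDED if clear_location else Depth.ORCHESTRATED
    if intent == "PLAN":
        return Depth.ORCHESTRATED if system_level else Depth.GUIDED
    return Depth.QUICK  # assumption: unknown intents get the lowest tier
```

The sketch treats each scope signal as already decided. The harder part, judging whether a prompt really is multi-module or vague, stays with the classifier, which the skill tells to prefer GUIDED when unsure.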

package/shared/skills/ambient-router/references/skill-catalog.md CHANGED

@@ -4,46 +4,50 @@ Full mapping of DevFlow skills to ambient intents and file-type triggers. The am
 ## Skills Available for Ambient Loading
-These skills may be loaded during GUIDED-depth ambient routing.
-### BUILD Intent
-| Skill | When to Load | File Patterns |
-|-------|-------------|---------------|
-| test-driven-development | Always for BUILD | `*.ts`, `*.tsx`, `*.js`, `*.jsx`, `*.py` |
-| implementation-patterns | Always for BUILD | Any code file |
-| typescript | TypeScript files in scope | `*.ts`, `*.tsx` |
-| react | React components in scope | `*.tsx`, `*.jsx` |
-| frontend-design | UI/styling work | `*.css`, `*.scss`, `*.tsx` with styling keywords |
-| input-validation | Forms, APIs, user input | Files with form/input/validation keywords |
-| go | Go files in scope | `*.go` |
-| java | Java files in scope | `*.java` |
-| python | Python files in scope | `*.py` |
-| rust | Rust files in scope | `*.rs` |
-| security-patterns | Auth, crypto, secrets | Files with auth/token/crypto/password keywords |
+These skills may be loaded during GUIDED and ORCHESTRATED-depth ambient routing.
+### IMPLEMENT Intent
+| Skill | When to Load | Depth | File Patterns |
+|-------|-------------|-------|---------------|
+| implementation-orchestration | ORCHESTRATED only | ORCHESTRATED | Any — orchestrates agent pipeline |
+| test-driven-development | Always for IMPLEMENT | GUIDED + ORCHESTRATED | Any code file — enforces RED-GREEN-REFACTOR |
+| implementation-patterns | Always for IMPLEMENT | GUIDED + ORCHESTRATED | Any code file |
+| search-first | Always for IMPLEMENT | GUIDED + ORCHESTRATED | Any — enforces research before building |
+| typescript | TypeScript files in scope | GUIDED + ORCHESTRATED | `*.ts`, `*.tsx` |
+| react | React components in scope | GUIDED + ORCHESTRATED | `*.tsx`, `*.jsx` |
+| frontend-design | UI/styling work | GUIDED + ORCHESTRATED | `*.css`, `*.scss`, `*.tsx` with styling keywords |
+| input-validation | Forms, APIs, user input | GUIDED + ORCHESTRATED | Files with form/input/validation keywords |
+| go | Go files in scope | GUIDED + ORCHESTRATED | `*.go` |
+| java | Java files in scope | GUIDED + ORCHESTRATED | `*.java` |
+| python | Python files in scope | GUIDED + ORCHESTRATED | `*.py` |
+| rust | Rust files in scope | GUIDED + ORCHESTRATED | `*.rs` |
+| security-patterns | Auth, crypto, secrets | GUIDED + ORCHESTRATED | Files with auth/token/crypto/password keywords |
 ### DEBUG Intent
-| Skill | When to Load | File Patterns |
-|-------|-------------|---------------|
-| test-patterns | Always for DEBUG | Any test-related context |
-| core-patterns | Always for DEBUG | Any code file |
-| git-safety | Git operations involved | User mentions git, rebase, merge, etc. |
+| Skill | When to Load | Depth | File Patterns |
+|-------|-------------|-------|---------------|
+| debug-orchestration | ORCHESTRATED only | ORCHESTRATED | Any — orchestrates investigation pipeline |
+| core-patterns | Always for DEBUG | GUIDED + ORCHESTRATED | Any code file |
+| test-patterns | Always for DEBUG (GUIDED) | GUIDED | Any code file |
+| git-safety | Git operations involved | GUIDED + ORCHESTRATED | User mentions git, rebase, merge, etc. |
 ### REVIEW Intent
-| Skill | When to Load | File Patterns |
-|-------|-------------|---------------|
-| self-review | Always for REVIEW | Any code file |
-| core-patterns | Always for REVIEW | Any code file |
-| test-patterns | Test files in scope | `*.test.*`, `*.spec.*` |
+| Skill | When to Load | Depth | File Patterns |
+|-------|-------------|-------|---------------|
+| self-review | Always for REVIEW | GUIDED | Any code file |
+| core-patterns | Always for REVIEW | GUIDED | Any code file |
+| test-patterns | Test files in scope | GUIDED | `*.test.*`, `*.spec.*` |
 ### PLAN Intent
-| Skill | When to Load | File Patterns |
-|-------|-------------|---------------|
-| implementation-patterns | Always for PLAN | Any planning context |
-| core-patterns | Architectural planning | System design discussions |
+| Skill | When to Load | Depth | File Patterns |
+|-------|-------------|-------|---------------|
+| plan-orchestration | ORCHESTRATED only | ORCHESTRATED | Any — orchestrates design pipeline |
+| implementation-patterns | Always for PLAN | GUIDED + ORCHESTRATED | Any planning context |
+| core-patterns | Always for PLAN | GUIDED + ORCHESTRATED | System design discussions |
 ## Skills Excluded from Ambient
@@ -62,7 +66,9 @@ These skills are loaded only by explicit DevFlow commands (primarily `/code-revi
 ## Selection Limits
-- **Maximum 3 skills** per ambient response (primary + up to 2 secondary)
-- **Primary skills** are always loaded for the classified intent
+- **Maximum 3 knowledge skills** per ambient response (primary + up to 2 secondary)
+- **Orchestration skills** (implementation-orchestration, debug-orchestration, plan-orchestration) are loaded only at ORCHESTRATED depth — they don't count toward the knowledge skill limit
+- **Primary skills** are always loaded for the classified intent at both GUIDED and ORCHESTRATED depth
 - **Secondary skills** are loaded only when file patterns match conversation context
-- If more than 3 skills seem relevant, this is an ELEVATE signal
+- **GUIDED depth** loads knowledge skills only (no orchestration skills) — main session works directly
+- **ORCHESTRATED depth** loads orchestration skill + knowledge skills — agents execute the pipeline
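
Note on the hunk above: the revised selection limits separate orchestration skills from knowledge skills. A minimal sketch of one possible reading follows. It assumes primaries are kept first, at most two secondaries are added, and the total of knowledge skills is capped at three. The catalog does not say what happens when an intent lists three primaries, so that handling is a guess, and `select_skills` is a hypothetical helper.

```python
ORCHESTRATION_SKILLS = {
    "implementation-orchestration",
    "debug-orchestration",
    "plan-orchestration",
}
MAX_KNOWLEDGE_SKILLS = 3
MAX_SECONDARY_SKILLS = 2


def select_skills(primary, secondary, depth):
    """Return the skills to load: orchestration skill(s) first, then capped knowledge skills."""
    # Orchestration skills load only at ORCHESTRATED depth and do not count toward the cap.
    orchestration = [s for s in primary if s in ORCHESTRATION_SKILLS]
    if depth != "ORCHESTRATED":
        orchestration = []

    knowledge = [s for s in primary if s not in ORCHESTRATION_SKILLS]
    knowledge = knowledge[:MAX_KNOWLEDGE_SKILLS]

    # Assumption: secondaries fill whatever room the primaries leave, up to two.
    room = min(MAX_SECONDARY_SKILLS, MAX_KNOWLEDGE_SKILLS - len(knowledge))
    knowledge += secondary[:room]
    return orchestration + knowledge
```

For an ORCHESTRATED IMPLEMENT request touching React and auth code, this reading would load `implementation-orchestration`, `implementation-patterns`, and two matching secondaries.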

package/shared/skills/debug-orchestration/SKILL.md ADDED

@@ -0,0 +1,69 @@
+---
+name: debug-orchestration
+description: Agent orchestration for DEBUG intent — hypothesis investigation, root cause analysis, optional fix
+user-invocable: false
+allowed-tools: Read, Grep, Glob, Bash, Task, AskUserQuestion
+---
+# Debug Orchestration
+Agent pipeline for DEBUG intent in ambient ORCHESTRATED mode. Competing hypothesis investigation, parallel evidence gathering, convergence validation, and optional fix.
+This is a lightweight variant of `/debug` for ambient ORCHESTRATED mode. Excluded: knowledge persistence loading, GitHub issue fetching, pitfall recording.
+## Iron Law
+> **COMPETING HYPOTHESES BEFORE CONCLUSIONS**
+>
+> Never investigate a single theory. Generate 3-5 distinct hypotheses, investigate them
+> in parallel, and let evidence determine the root cause. Confirmation bias is the enemy
+> of debugging — multiple hypotheses are the antidote.
+---
+## Phase 1: Hypothesize
+Analyze the bug description, error messages, and conversation context. Generate 3-5 hypotheses that are:
+- **Specific**: Points to a concrete mechanism (not "something is wrong with auth")
+- **Testable**: Can be confirmed or disproved by examining specific files/logs
+- **Distinct**: Each hypothesis proposes a different root cause
+If fewer than 3 hypotheses are possible, proceed with 2.
+## Phase 2: Investigate (Parallel)
+Spawn one `Task(subagent_type="Explore")` per hypothesis **in a single message** (parallel execution):
+- Each investigator searches for evidence FOR and AGAINST its hypothesis
+- Must provide file:line references for all evidence
+- Returns verdict: **CONFIRMED** | **DISPROVED** | **PARTIAL** (some evidence supports, some contradicts)
+## Phase 3: Converge
+Evaluate investigation results:
+- **One CONFIRMED**: Spawn 1-2 additional `Task(subagent_type="Explore")` agents to validate from different angles (prevent confirmation bias)
+- **Multiple PARTIAL**: Look for a unifying root cause that explains all partial evidence
+- **All DISPROVED**: Report honestly — "No root cause identified from initial hypotheses." Generate 2-3 second-round hypotheses if conversation context suggests avenues not yet explored.
+## Phase 4: Report
+Present root cause analysis:
+- **Confidence level**: HIGH (confirmed + validated) | MEDIUM (partial convergence) | LOW (best guess from incomplete evidence)
+- **Evidence table**: Hypothesis → verdict → key evidence (file:line)
+- **Root cause**: Clear statement of what's wrong and why
+- **Recommended fix**: Specific changes with file references
+## Phase 5: Offer Fix
+Ask user via AskUserQuestion: "Want me to implement this fix?"
+- **YES** → Implement the fix directly in main session using GUIDED approach: load implementation-patterns, search-first, and test-driven-development skills, then code the fix. Spawn `Task(subagent_type="Simplifier")` on changed files after.
+- **NO** → Done. Report stands as documentation.
+## Error Handling
+- **All hypotheses disproved, no second-round ideas**: Report "No root cause identified" with summary of what was investigated and ruled out
+- **Explore agents return insufficient evidence**: Report LOW confidence with available evidence, suggest manual investigation areas
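
Note on the file above: the Phase 3 convergence rules can be sketched as a function over investigator verdicts. This is an illustration only. The returned action names are hypothetical labels, and the skill does not specify cases such as several CONFIRMED verdicts or a single PARTIAL one, so the sketch returns `"unspecified"` for those rather than guessing.

```python
def next_step(verdicts: dict[str, str]) -> str:
    """Pick the Phase 3 action from a mapping of hypothesis -> verdict."""
    values = list(verdicts.values())
    confirmed = values.count("CONFIRMED")
    partial = values.count("PARTIAL")

    if confirmed == 1:
        # Spawn 1-2 extra Explore agents to validate from different angles.
        return "validate_confirmed"
    if confirmed == 0 and partial >= 2:
        # Look for one root cause that explains all partial evidence.
        return "seek_unifying_cause"
    if values and all(v == "DISPROVED" for v in values):
        # Report honestly; optionally generate second-round hypotheses.
        return "report_no_root_cause"
    return "unspecified"  # not covered by the skill text
```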

package/shared/skills/docs-framework/SKILL.md CHANGED

@@ -39,7 +39,10 @@ All generated documentation lives under `.docs/` in the project root:
 .memory/
 ├── WORKING-MEMORY.md                   # Auto-maintained by Stop hook (overwritten)
 ├── PROJECT-PATTERNS.md                 # Accumulated patterns (merged across sessions)
-└── backup.json                         # Pre-compact git state snapshot
+├── backup.json                         # Pre-compact git state snapshot
+└── knowledge/
+    ├── decisions.md                    # Architectural decisions (ADR-NNN format)
+    └── pitfalls.md                     # Known pitfalls (PF-NNN format)
 ```
 ---
@@ -97,6 +100,8 @@ source .devflow/scripts/docs-helpers.sh 2>/dev/null || {
 |-------|-----------------|----------|
 | Reviewer | `.docs/reviews/{branch-slug}/{type}-report.{timestamp}.md` | Creates new |
 | Working Memory | `.memory/WORKING-MEMORY.md` | Overwrites (auto-maintained by Stop hook) |
+| Knowledge (decisions) | `.memory/knowledge/decisions.md` | Append-only (ADR-NNN sequential IDs) |
+| Knowledge (pitfalls) | `.memory/knowledge/pitfalls.md` | Append-only (PF-NNN sequential IDs) |
 ### Agents That Don't Persist
@@ -125,6 +130,7 @@ When creating or modifying persisting agents:
 This framework is used by:
 - **Review agents**: Creates review reports
 - **Working Memory hooks**: Auto-maintains `.memory/WORKING-MEMORY.md`
+- **Command flows**: `/implement` appends ADRs to `decisions.md`; `/code-review`, `/debug`, `/resolve` append PFs to `pitfalls.md`
 All persisting agents should load this skill to ensure consistent documentation.

package/shared/skills/implementation-orchestration/SKILL.md ADDED

@@ -0,0 +1,92 @@
+---
+name: implementation-orchestration
+description: Agent orchestration for IMPLEMENT intent — pre-flight, Coder, quality gates
+user-invocable: false
+allowed-tools: Read, Grep, Glob, Bash, Task, AskUserQuestion
+---
+# Implementation Orchestration
+Agent pipeline for IMPLEMENT intent in ambient ORCHESTRATED mode. Pre-flight checks, plan synthesis, Coder execution, and quality gates.
+This is a lightweight variant of `/implement` for ambient ORCHESTRATED mode. Excluded: strategy selection (single/sequential/parallel Coders), retry loops, PR creation, knowledge loading.
+## Iron Law
+> **QUALITY GATES ARE NON-NEGOTIABLE**
+>
+> Every Coder output passes through Validator → Simplifier → Scrutinizer → re-Validate → Shepherd.
+> Skipping a gate because "it looks fine" is never acceptable. The pipeline runs to completion
+> or halts on failure — there is no shortcut.
+---
+## Phase 1: Pre-flight — Branch Safety
+Detect branch type before spawning Coder:
+- **Work branches** (`feat/`, `fix/`, `chore/`, `refactor/`, `docs/` prefix): proceed on current branch.
+- **Protected branches** (`main`, `master`, `develop`, `release/*`, `staging`, `production`): ask user via AskUserQuestion with 2-3 suggested branch names following `{type}/{ticket}-{slug}` convention. Include ticket number if available from conversation context.
+- **If user declines branch creation**: proceed on the protected branch. Respect the user's choice.
+## Phase 2: Plan Synthesis
+Synthesize conversation context into a structured EXECUTION_PLAN for Coder:
+- **If a plan exists** in conversation context (from plan mode — accepted in-session or injected after "accept and clear") → use the plan as-is.
+- **Otherwise** → synthesize from conversation: what to build, files/modules affected, constraints, decisions made during discussion.
+Format as structured markdown with: Goal, Steps, Files, Constraints, Decisions.
+## Phase 3: Coder Execution
+Record git SHA before first Coder: `git rev-parse HEAD`
+Spawn `Task(subagent_type="Coder")` with input variables:
+- **TASK_ID**: Generated from timestamp (e.g., `task-2026-03-19_1430`)
+- **TASK_DESCRIPTION**: From conversation context
+- **BASE_BRANCH**: Current branch (or newly created branch from Phase 1)
+- **EXECUTION_PLAN**: From Phase 2
+- **PATTERNS**: Codebase patterns from conversation context
+- **CREATE_PR**: `false` (commit only, no push)
+- **DOMAIN**: Inferred from files in scope (`backend`, `frontend`, `tests`, `fullstack`)
+**Execution strategy**: Single sequential Coder by default. Parallel Coders only when tasks are self-contained — zero shared contracts, no integration points, different files/modules with no imports between them.
+If Coder returns **BLOCKED**, halt the pipeline and report to user.
+## Phase 4: FILES_CHANGED Detection
+After Coder completes, detect changed files:
+```bash
+git diff --name-only {starting_sha}...HEAD
+```
+Pass FILES_CHANGED to all quality gate agents.
+## Phase 5: Quality Gates
+Run sequentially — each gate must pass before the next:
+1. `Task(subagent_type="Validator")` (build + typecheck + lint + tests) — retry up to 2× on failure (Coder fixes between retries)
+2. `Task(subagent_type="Simplifier")` — code clarity and maintainability pass on FILES_CHANGED
+3. `Task(subagent_type="Scrutinizer")` — 9-pillar quality evaluation on FILES_CHANGED
+4. `Task(subagent_type="Validator")` (re-validate after Simplifier/Scrutinizer changes)
+5. `Task(subagent_type="Shepherd")` — verify implementation matches original request — retry up to 2× if misalignment found
+If any gate exhausts retries, halt pipeline and report what passed and what failed.
+## Phase 6: Completion
+Report results:
+- Commits created (from Coder)
+- Files changed
+- Quality gate results (pass/fail per gate)
+- No push — user decides when to push
+## Error Handling
+- **Coder BLOCKED**: Halt immediately, report blocker to user
+- **Validator fails after retries**: Report specific failures, halt pipeline
+- **Shepherd misalignment after retries**: Report misalignment details, let user decide next steps
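
Note on the file above: the Phase 5 gate sequence, with its retry budgets, can be sketched as a simple runner. This is illustrative only. In the package each gate is an agent spawned via `Task(...)`, whereas here a hypothetical `run_gate` callback stands in for that, and the zero-retry budget for Simplifier and Scrutinizer is an assumption since the skill states retries only for Validator and Shepherd.

```python
# (gate name, maximum retries after the first attempt)
GATES = [
    ("Validator", 2),
    ("Simplifier", 0),   # assumption: no retry budget stated in the skill
    ("Scrutinizer", 0),  # assumption: no retry budget stated in the skill
    ("Validator", 2),    # re-validate after Simplifier/Scrutinizer changes
    ("Shepherd", 2),
]


def run_gates(gates, run_gate):
    """Run gates in order; halt at the first gate that fails after exhausting retries.

    run_gate(name) -> bool is a hypothetical callback that executes one attempt.
    Returns (results, ok) where results lists (name, passed) for each gate reached.
    """
    results = []
    for name, max_retries in gates:
        passed = run_gate(name)
        retries = 0
        while not passed and retries < max_retries:
            retries += 1
            passed = run_gate(name)
        results.append((name, passed))
        if not passed:
            return results, False  # halt and report what passed and what failed
    return results, True
```

Keeping the results list even on failure mirrors the skill's requirement to report what passed and what failed when the pipeline halts.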

package/shared/skills/knowledge-persistence/SKILL.md ADDED

@@ -0,0 +1,128 @@
+---
+name: knowledge-persistence
+description: >-
+  This skill should be used when recording architectural decisions or pitfalls
+  to project knowledge files, or when loading prior decisions and known pitfalls
+  for context during investigation, specification, or review.
+user-invocable: false
+allowed-tools: Read, Write, Bash
+---
+# Knowledge Persistence
+Record architectural decisions and pitfalls to `.memory/knowledge/` files. This is the single source of truth for the extraction procedure — commands reference this skill instead of inlining the steps.
+## Iron Law
+> **SINGLE SOURCE OF TRUTH**
+>
+> All knowledge extraction follows this procedure exactly. Commands never inline
+> their own extraction steps — they read this skill and follow it.
+---
+## File Locations
+```
+.memory/knowledge/
+├── decisions.md    # ADR entries (append-only)
+└── pitfalls.md     # PF entries (area-specific gotchas)
+```
+## File Formats
+### decisions.md (ADR entries)
+**Template header** (create if file missing):
+```
+<!-- TL;DR: 0 decisions. Key: -->
+# Architectural Decisions
+Append-only. Status changes allowed; deletions prohibited.
+```
+**Entry format**:
+```markdown
+## ADR-{NNN}: {Title}
+- **Date**: {YYYY-MM-DD}
+- **Status**: Accepted
+- **Context**: {Why this decision was needed}
+- **Decision**: {What was decided}
+- **Consequences**: {Tradeoffs and implications}
+- **Source**: {command and identifier, e.g. `/implement TASK-123`}
+```
+### pitfalls.md (PF entries)
+**Template header** (create if file missing):
+```
+<!-- TL;DR: 0 pitfalls. Key: -->
+# Known Pitfalls
+Area-specific gotchas, fragile areas, and past bugs.
+```
+**Entry format**:
+```markdown
+## PF-{NNN}: {Short description}
+- **Area**: {file paths or module names}
+- **Issue**: {What goes wrong}
+- **Impact**: {Consequences if hit}
+- **Resolution**: {How to fix or avoid}
+- **Source**: {command and identifier, e.g. `/code-review branch-name`}
+```
+---
+## Extraction Procedure
+Follow these steps when recording decisions or pitfalls:
+1. **Read** the target file (`.memory/knowledge/decisions.md` or `.memory/knowledge/pitfalls.md`). If it doesn't exist, create it with the template header above.
+2. **Check capacity** — count `## ADR-` or `## PF-` headings. If >=50, log "Knowledge base at capacity — skipping new entry" and stop.
+3. **Find next ID** — find highest NNN via regex (`/^## ADR-(\d+)/` or `/^## PF-(\d+)/`), default to 0. Increment by 1.
+4. **Deduplicate** (pitfalls only) — skip if an entry with the same Area + Issue already exists.
+5. **Append** the new entry using the format above.
+6. **Update TL;DR** — rewrite the `<!-- TL;DR: ... -->` comment on line 1 to reflect the new count and key topics.
+## Lock Protocol
+When writing, use a mkdir-based lock:
+- Lock path: `.memory/.knowledge.lock`
+- Timeout: 30 seconds (fail if lock not acquired)
+- Stale recovery: if lock directory is >60 seconds old, remove it and retry
+- Release lock after write completes (remove lock directory)
+## Loading Knowledge for Context
+When a command needs prior knowledge as input (not recording):
+1. Read `.memory/knowledge/decisions.md` if it exists
+2. Read `.memory/knowledge/pitfalls.md` if it exists
+3. Pass content as context to downstream agents — prior decisions constrain scope, known pitfalls inform investigation
+If neither file exists, skip silently. No error, no empty-file creation.
+## Operation Budget
+Recording: do inline (no agent spawn), 2-3 Read/Write operations total.
+Loading: 1-2 Read operations, pass as context string.
+---
+## Extended References
+For entry examples and status lifecycle details:
+- `references/examples.md` - Full decision and pitfall entry examples
+---
+## Success Criteria
+- [ ] Entry appended with correct sequential ID
+- [ ] No duplicate pitfalls (same Area + Issue)
+- [ ] TL;DR comment updated with current count
+- [ ] Lock acquired before write, released after
+- [ ] Capacity limit (50) respected
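
Note on the file above: two parts of the procedure lend themselves to a short sketch, namely the capacity check with next-ID lookup (steps 2 and 3) and the mkdir-based lock. The code below is a sketch under stated assumptions, not the package's implementation. The helper names are hypothetical, three-digit zero padding for `NNN` is assumed, staleness is judged by the lock directory's modification time, and the lock's parent directory is assumed to exist.

```python
import os
import re
import time
from contextlib import contextmanager

CAPACITY = 50
HEADING = {
    "ADR": re.compile(r"^## ADR-(\d+)", re.MULTILINE),
    "PF": re.compile(r"^## PF-(\d+)", re.MULTILINE),
}


def next_entry_id(text, kind):
    """Return the next ID such as 'ADR-008', or None when the file is at capacity."""
    ids = [int(n) for n in HEADING[kind].findall(text)]
    if len(ids) >= CAPACITY:
        return None  # caller logs "Knowledge base at capacity" and stops
    return f"{kind}-{max(ids, default=0) + 1:03d}"


@contextmanager
def knowledge_lock(path=".memory/.knowledge.lock", timeout=30.0, stale_after=60.0, poll=0.1):
    """Hold a directory-based lock; os.mkdir is atomic, so only one creator succeeds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(path)
            break
        except FileExistsError:
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"could not acquire {path} within {timeout}s")
        try:
            stale = time.time() - os.path.getmtime(path) > stale_after
        except FileNotFoundError:
            stale = False  # the holder released the lock between the two calls
        if stale:
            try:
                os.rmdir(path)  # stale recovery, then retry immediately
            except OSError:
                pass
        else:
            time.sleep(poll)
    try:
        yield
    finally:
        os.rmdir(path)
```

A writer would wrap the read, append, and TL;DR update in `with knowledge_lock(): ...` so that concurrent commands cannot allocate the same ID.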