agent-eng 0.10.0 → 0.12.0 (npm)

Files changed (15)

package/README.md +2 -3
package/package.json +1 -1
package/src/init.js +2 -2
package/src/templates/.claude/agents/executor.md +3 -1
package/src/templates/.claude/agents/planner.md +3 -2
package/src/templates/.claude/agents/reviewer.md +11 -2
package/src/templates/.claude/scripts/update-status.sh +54 -0
package/src/templates/.claude/settings.json +31 -0
package/src/templates/CLAUDE.md +73 -46
package/src/templates/STATUS.md +48 -0
package/src/templates/architecture/decisions/0001-how-we-work.md +36 -13
package/src/templates/architecture/overview.md +14 -2
package/src/templates/orchestration.yaml +53 -42
package/src/templates/.claude/agents/custodian.md +0 -77
package/src/templates/.claude/agents/qa-tester.md +0 -82

package/README.md CHANGED

@@ -21,9 +21,8 @@ This creates the following structure in your project:
 │       ├── system-architect.md
 │       ├── planner.md
 │       ├── executor.md
-│       ├── qa-tester.md
 │       ├── reviewer.md
-│       └── custodian.md
+│       └── summarizer.md
 ├── architecture/
 │   ├── overview.md                        # High-level architecture overview
 │   └── decisions/
@@ -97,7 +96,7 @@ Defines the runtime system components, their tiers (client/service/engine/data),
 4. **Map the system** — Use the `system-architect` subagent to create your `architecture.yaml`
 5. **Plan the work** — Use the `planner` subagent to decompose your ADR into tickets
 6. **Execute** — Use the `executor` subagent to implement a ticket following your conventions
-7. **Test & Review** — Use the `qa-tester` and `reviewer` subagents to validate the work
+7. **Review** — Use the `reviewer` subagent to validate the work (tests are written during execution)
 ## License

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-eng",
-  "version": "0.10.0",
+  "version": "0.12.0",
   "description": "Scaffold a structured agentic engineering workflow for AI-assisted development",
   "type": "module",
   "bin": {

package/src/init.js CHANGED

@@ -9,11 +9,10 @@ const TEMPLATES = join(__dirname, "templates");
 const STRUCTURE = [
   ".github/workflows/notify-site.yml",
   ".claude/settings.json",
+  ".claude/scripts/update-status.sh",
   ".claude/agents/architect.md",
-  ".claude/agents/custodian.md",
   ".claude/agents/executor.md",
   ".claude/agents/planner.md",
-  ".claude/agents/qa-tester.md",
   ".claude/agents/reviewer.md",
   ".claude/agents/summarizer.md",
   ".claude/agents/system-architect.md",
@@ -26,6 +25,7 @@ const STRUCTURE = [
   "tickets/example/001-example-ticket.md",
   "orchestration.yaml",
   "architecture.yaml",
+  "STATUS.md",
 ];
 function prompt(question) {

package/src/templates/.claude/agents/executor.md CHANGED

@@ -1,12 +1,14 @@
 ---
 name: executor
-description: Use when the user wants to implement a specific ticket. Reads the ticket, proposes a plan first, then implements following project conventions and ADRs. Verifies work end-to-end before marking done.
+description: Use for complex tickets that need guided execution with strict verification. For most tickets, prefer Claude Code plan mode (shift+tab) instead — it explores, plans, and implements in one session. Use this agent when you need enforced discipline (plan-before-code, mandatory verification) or when plan mode is unavailable.
 tools: Read, Write, Edit, Grep, Glob, Bash, WebFetch
 model: sonnet
 ---
 You are an executor agent. Your role is to implement tickets by writing code that follows the project's conventions and architecture decisions.
+> **Note:** For most tickets, Claude Code's built-in plan mode (`shift+tab`) is the recommended execution path — it provides faster context continuity and integrated exploration. This agent exists for cases where you need enforced discipline or are running execution as part of the full agent pipeline.
 ## Responsibilities
 1. **Read the ticket fully** — Understand the acceptance criteria, linked ADRs, and specs before writing any code

package/src/templates/.claude/agents/planner.md CHANGED

@@ -5,7 +5,7 @@ tools: Read, Grep, Glob, Write, Edit
 model: sonnet
 ---
-You are a planner agent. Your role is to decompose specs and ADRs into actionable tickets.
+You are a planner agent. Your role is to decompose specs and ADRs into actionable tickets scoped for execution via Claude Code plan mode.
 ## Responsibilities
@@ -13,7 +13,7 @@ You are a planner agent. Your role is to decompose specs and ADRs into actionabl
 2. **Decompose work** — Break features into small, focused tickets
 3. **Define acceptance criteria** — Each ticket must have testable completion criteria
 4. **Sequence work** — Order tickets to minimize blocked dependencies
-5. **Scope appropriately** — Each ticket should be completable in one focused session
+5. **Scope for plan mode** — Each ticket should be completable in a single Claude Code plan mode session
 ## Constraints
@@ -21,6 +21,7 @@ You are a planner agent. Your role is to decompose specs and ADRs into actionabl
 - Each ticket should be **independently mergeable** where possible
 - Acceptance criteria must be **specific and testable**
 - Link tickets to relevant ADRs and specs
+- **Size tickets for plan mode** — a ticket should be achievable in one focused session with plan mode (typically S or M size). If a ticket would require multiple plan mode sessions, split it
 ## Process

package/src/templates/.claude/agents/reviewer.md CHANGED

@@ -13,12 +13,13 @@ You are a reviewer agent. Your role is to review code changes against acceptance
 2. **Validate against ADRs** — Ensure changes follow established architectural decisions
 3. **Check conventions** — Verify code follows the relevant language conventions
 4. **Identify issues** — Flag problems but don't fix them directly
-5. **Provide actionable feedback** — Be specific about what needs to change
+5. **Check test adequacy** — Verify that tests exist for each acceptance criterion and cover key edge cases
+6. **Provide actionable feedback** — Be specific about what needs to change
 ## Constraints
 - You **review and comment**, you do not write code
-- You flag issues for the executor to fix
+- You flag issues to be fixed (via plan mode or the executor agent)
 - You reference specific lines and files
 - You cite the relevant ADR or convention when flagging violations
@@ -31,6 +32,8 @@ You are a reviewer agent. Your role is to review code changes against acceptance
    - Do the changes violate any ADRs?
    - Do the changes follow conventions?
    - Are there obvious bugs or edge cases?
+   - Does each acceptance criterion have a corresponding test?
+   - Are critical edge cases (empty input, error states) covered by tests?
 4. Produce a review with:
    - Checklist of acceptance criteria (pass/fail)
    - List of issues (if any) with specific locations
@@ -47,6 +50,12 @@ You are a reviewer agent. Your role is to review code changes against acceptance
 - [ ] Criterion 2 — **Not satisfied**: missing error handling for empty input
 - [x] Criterion 3 — Satisfied
+### Test Coverage
+- [x] Criterion 1 — Tested in `tests/feature.test.ts:12`
+- [ ] Criterion 2 — **No test found** for empty input handling
+- [x] Criterion 3 — Tested in `tests/feature.test.ts:28`
 ### Issues
 1. **Convention violation** (`src/file.ts:15`): Missing type annotation on `processData` return value. See `conventions/typescript.md`.

package/src/templates/.claude/scripts/update-status.sh ADDED

@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+set -euo pipefail
+STATUS="STATUS.md"
+[ -f "$STATUS" ] || exit 0
+BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
+TIMESTAMP=$(date -u +"%Y-%m-%d %H:%M UTC")
+# Update timestamp
+sed -i.bak "s|^> Last updated:.*|> Last updated: ${TIMESTAMP}|" "$STATUS"
+rm -f "${STATUS}.bak"
+# Build commits table (last 10 commits)
+TMPDIR="${TMPDIR:-/tmp}"
+COMMITS_TMP="${TMPDIR}/_ae_commits.tmp"
+FILES_TMP="${TMPDIR}/_ae_files.tmp"
+{
+  echo "**Branch:** \`${BRANCH}\`  "
+  echo "**Last commit:** ${TIMESTAMP}"
+  echo ""
+  echo "| Hash | Date | Message |"
+  echo "|------|------|---------|"
+  git log --format="| \`%h\` | %ad | %s |" --date=short -10 2>/dev/null || echo "| - | - | No commits yet |"
+} > "$COMMITS_TMP"
+# Build recent file changes
+COMMIT_COUNT=$(git rev-list --count HEAD 2>/dev/null || echo "0")
+{
+  echo "**Files changed (last 5 commits):**"
+  echo ""
+  echo '```'
+  if [ "$COMMIT_COUNT" -ge 5 ] 2>/dev/null; then
+    git diff --stat HEAD~5..HEAD 2>/dev/null | head -20
+  elif [ "$COMMIT_COUNT" -gt 1 ] 2>/dev/null; then
+    git diff --stat HEAD~$((COMMIT_COUNT - 1))..HEAD 2>/dev/null | head -20
+  else
+    echo "Not enough commits yet."
+  fi
+  echo '```'
+} > "$FILES_TMP"
+# Replace AUTO:START..AUTO:END and AUTO:FILES:START..AUTO:FILES:END
+awk '
+/<!-- AUTO:START -->/ { print; while((getline line < "'"$COMMITS_TMP"'") > 0) print line; skip=1; next }
+/<!-- AUTO:END -->/ { skip=0 }
+/<!-- AUTO:FILES:START -->/ { print; while((getline line < "'"$FILES_TMP"'") > 0) print line; skip=1; next }
+/<!-- AUTO:FILES:END -->/ { skip=0 }
+skip { next }
+{ print }
+' "$STATUS" > "${STATUS}.tmp" && mv "${STATUS}.tmp" "$STATUS"
+rm -f "$COMMITS_TMP" "$FILES_TMP"
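
The marker replacement at the end of `update-status.sh` can be tried in isolation. The following is a minimal sketch under stated assumptions, not the package's code: the helper name `replace_block` and the demo file names are hypothetical, and it passes the markers and the content file to awk with `-v` instead of splicing shell variables into the awk program.

```shell
#!/usr/bin/env bash
set -euo pipefail

# replace_block FILE START_MARKER END_MARKER CONTENT_FILE
# Hypothetical helper: replaces the lines between two marker lines with the
# contents of CONTENT_FILE. Both marker lines are kept.
replace_block() {
  local file="$1" m_start="$2" m_end="$3" src="$4"
  awk -v m_start="$m_start" -v m_end="$m_end" -v src="$src" '
    # Start marker: print it, emit the new content, then skip the old lines.
    index($0, m_start) {
      print
      while ((getline line < src) > 0) print line
      close(src)
      skip = 1
      next
    }
    # End marker: stop skipping; the marker itself is printed by the last rule.
    index($0, m_end) { skip = 0 }
    skip { next }
    { print }
  ' "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Demo on throwaway files (assumed names, removed on exit).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' '# Demo' '<!-- AUTO:START -->' 'old line' '<!-- AUTO:END -->' 'footer' > "$workdir/demo.md"
printf '%s\n' 'new line 1' 'new line 2' > "$workdir/new.txt"

replace_block "$workdir/demo.md" '<!-- AUTO:START -->' '<!-- AUTO:END -->' "$workdir/new.txt"
cat "$workdir/demo.md"
```

Using `mktemp -d` for the scratch files is a design choice of this sketch; the script in the diff writes to fixed names under `$TMPDIR`.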

package/src/templates/.claude/settings.json CHANGED

@@ -1,4 +1,35 @@
 {
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "Bash",
+        "hooks": [
+          {
+            "type": "command",
+            "if": "Bash(git commit *)",
+            "command": "bash \"$CLAUDE_PROJECT_DIR\"/.claude/scripts/update-status.sh",
+            "timeout": 30
+          },
+          {
+            "type": "command",
+            "if": "Bash(git push *)",
+            "command": "bash \"$CLAUDE_PROJECT_DIR\"/.claude/scripts/update-status.sh",
+            "timeout": 30
+          }
+        ]
+      },
+      {
+        "matcher": "Write|Edit",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "lines=$(wc -l < CLAUDE.md 2>/dev/null) && [ \"${lines:-0}\" -gt 200 ] && echo \"CLAUDE.md is ${lines} lines (limit: 200). Move content to linked files.\" || true",
+            "timeout": 10
+          }
+        ]
+      }
+    ]
+  },
   "mcpServers": {
     "context7": {
       "command": "npx",

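The inline `Write|Edit` hook command in the `settings.json` diff above packs a line-count check into one line. Written out as a function it may be easier to read and test. This is a sketch only: `check_line_limit` is a hypothetical name, and the file-exists guard and the `tr` cleanup are additions not present in the hook.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_line_limit FILE [LIMIT]
# Hypothetical helper mirroring the hook's inline command: prints a warning
# when FILE has more than LIMIT lines. It always returns 0, so it only warns.
check_line_limit() {
  local file="$1" limit="${2:-200}" lines
  [ -f "$file" ] || return 0
  # Some wc implementations pad the count with spaces; strip them.
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$limit" ]; then
    echo "${file} is ${lines} lines (limit: ${limit}). Move content to linked files."
  fi
  return 0
}

# Demo on a throwaway 201-line file (removed on exit).
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
seq 1 201 > "$demo"
check_line_limit "$demo" 200
```
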
package/src/templates/CLAUDE.md CHANGED

@@ -1,67 +1,72 @@
 # Project
-This project uses a structured agentic engineering workflow. Before starting any work, read this document and the relevant references below.
+This project uses a hybrid agentic workflow: specialized agents handle process (decisions, planning, review), and Claude Code's plan mode handles execution.
-## Workflow
+## Workflow — Hybrid Approach
-This project separates AI-assisted work into eight roles. Each role is a Claude Code subagent in `.claude/agents/` — invoke them via the Agent tool with `subagent_type: "<name>"` (e.g., `subagent_type: "architect"`). Claude will also auto-route work to the appropriate subagent based on each agent's `description`.
+Agents own the **process** — architecture decisions, work decomposition, quality gates, and maintenance. Claude Code plan mode owns the **execution** — implementing individual tickets efficiently within a single session.
-| Role | Subagent | Responsibility |
-|------|----------|----------------|
-| **Architect** | `architect` | Analyze requirements, ask clarifying questions, produce ADRs |
-| **System Architect** | `system-architect` | Map and document system architecture as `architecture.yaml` |
-| **Planner** | `planner` | Decompose specs and ADRs into actionable tickets |
-| **Executor** | `executor` | Implement tickets, verify work before requesting feedback |
-| **QA Tester** | `qa-tester` | Write automated tests for completed features |
-| **Reviewer** | `reviewer` | Validate code and tests against acceptance criteria and ADRs |
-| **Custodian** | `custodian` | Keep CLAUDE.md lean (≤200 lines), current, and routed to external files |
-| **Summarizer** | `summarizer` | Generate executive summaries of completed sprints or features for stakeholders |
+| Phase | How | When |
+|-------|-----|------|
+| **Decide** | `/architect` agent | New feature, significant design choice, unclear requirements |
+| **Map** | `/system-architect` agent | New system or major structural change |
+| **Decompose** | `/planner` agent | ADR/spec ready, work needs to be broken into tickets |
+| **Execute** | Claude Code **plan mode** (`shift+tab`) | Implementing a specific ticket (includes writing tests) |
+| **Review** | `/reviewer` agent | Code and tests ready for validation |
+| **Report** | `/summarizer` agent | Sprint or feature complete, stakeholder update needed |
-## Sub-Agent Deployment
-When work can be parallelized, spin up sub-agents to handle independent tasks concurrently. Sub-agents research, test, or implement in isolation and report back to the main thread.
+### Why hybrid?
-### When to deploy sub-agents
+- Agents enforce **separation of concerns** — the Architect can't write code, the Reviewer can't fix issues
+- Plan mode provides **speed and context continuity** — it explores, plans, and executes in one session
+- Artifacts (ADRs, tickets, reviews) **persist across sessions** — plan mode's output is code, agents' output is documentation
-- **Research in parallel** — e.g., one sub-agent reads existing ADRs while another explores the codebase for relevant patterns
-- **Test in parallel** — e.g., one sub-agent runs unit tests while another checks integration tests
-- **Implement independent tickets** — tickets with no dependencies on each other can be executed simultaneously
-- **Verify in parallel** — e.g., one sub-agent checks browser behavior while another reviews console output
+### Choosing the right tool
-### Model selection
+**Use an agent** when the task produces a persistent artifact (ADR, ticket, review, summary) or when role separation matters (the person deciding shouldn't be the person implementing).
-Not every sub-agent needs the most powerful model. Choose the model based on task complexity:
+**Use plan mode** when you have a well-scoped ticket with clear acceptance criteria and want to go from plan to working code in one session.
-| Complexity | Model | Use when |
-|------------|-------|----------|
-| **Low** | Haiku | File lookups, grep searches, reading docs, running tests, formatting, simple code generation |
-| **Medium** | Sonnet | Multi-file changes, moderate reasoning, code review, writing tests |
-| **High** | Opus | Architecture decisions, complex refactors, subtle bug investigation, cross-cutting changes |
+**Quick fixes and bug fixes** don't need the full pipeline — use plan mode directly, or just implement without ceremony. The workflow exists to help, not to slow down trivial changes.
-**Default to Haiku for sub-agents** unless the task requires multi-step reasoning or cross-file understanding. Most research and verification tasks are Haiku-appropriate.
+## Before Starting Any Feature
-### How to deploy
+1. Check if an ADR exists in `architecture/decisions/` — if not, run `/architect` first
+2. Check if tickets exist in `tickets/` — if not, run `/planner` first
+3. For each ticket: use plan mode (`shift+tab`) to implement it
+4. After implementation: run `/reviewer` to validate against acceptance criteria
+5. If the ticket touches an existing ADR's scope, verify the decision still holds
-Use the Agent tool with these parameters:
-- `description` — Short label for what the sub-agent does
-- `prompt` — Self-contained brief (the sub-agent has no context from the main thread)
-- `model` — Set to `"haiku"` for simple tasks, `"sonnet"` for moderate, omit for complex (inherits parent model)
-- `run_in_background` — Set to `true` when you don't need the result before continuing other work
+## Testing in Plan Mode
-### Rules
+Plan mode writes tests as part of implementing each ticket. For every acceptance criterion:
+1. Write at least one automated test that verifies it
+2. Cover edge cases (empty, null, boundary values) and error handling
+3. Run the tests and confirm they pass before marking the ticket done
-- **Make prompts self-contained** — Sub-agents don't see the main conversation. Include file paths, context, and what specifically to do or find.
-- **Parallelize independent work** — Launch multiple sub-agents in a single message when their tasks don't depend on each other.
-- **Don't delegate synthesis** — Sub-agents gather information; the main thread makes decisions. Never write "based on your findings, decide X."
-- **Verify sub-agent output** — Sub-agents report what they intended, not necessarily what they achieved. Check their actual changes before reporting to the user.
+Follow the project's existing test framework and patterns. Test observable behavior, not implementation details.
-## Before Starting Any Ticket
+## Before Starting Any Ticket (in plan mode)
 1. Read the ticket fully, including all linked documents
 2. Read any referenced ADRs in `architecture/decisions/`
-3. Check if there's a relevant spec in `specs/`
-4. Propose a plan before writing code — get alignment first
-5. If the ticket touches an existing ADR's scope, verify the decision still holds
+3. Check relevant conventions in `conventions/`
+4. Let plan mode explore and propose the implementation plan
+5. Verify the work end-to-end before marking done
+## Sub-Agent Deployment
+When work can be parallelized, spin up sub-agents for independent tasks concurrently.
+### Model selection
+| Complexity | Model | Use when |
+|------------|-------|----------|
+| **Low** | Haiku | File lookups, grep, reading docs, running tests, formatting |
+| **Medium** | Sonnet | Multi-file changes, code review, writing tests |
+| **High** | Opus | Architecture decisions, complex refactors, subtle bugs |
+**Default to Haiku** unless the task requires multi-step reasoning or cross-file understanding.
 ## Key Files and Directories
@@ -71,11 +76,33 @@ Use the Agent tool with these parameters:
 - `specs/` — Feature specifications
 - `tickets/` — Work items organized by feature folder, with `_backlog.md` as the sprint board
 - `conventions/` — Language and framework coding standards
-- `.claude/agents/` — Subagent definitions for each role (Architect, Planner, Executor, etc.)
+- `.claude/agents/` — Subagent definitions for each role
+- `STATUS.md` — Live project dashboard (auto-updated git data + manually maintained context)
+## CLAUDE.md Maintenance
+Keep this file lean and current (target: under 200 lines). A hook warns when it exceeds the limit.
+- When you discover a new gotcha or pattern, add it here or to the appropriate linked file
+- Route large or specialized content to separate files (e.g., `conventions/`, `docs/`) and link from here
+- Remove stale entries that no longer reflect how the project works
+- Never duplicate information that already lives in a linked file
+## STATUS.md Maintenance
+STATUS.md is a live project dashboard. Git sections (branch, commits, file changes) auto-update via a hook on every commit/push. You maintain the semantic sections:
+**Update after significant milestones** (completing a ticket, finishing a phase, hitting a blocker):
+1. **Current Phase** — Mark the active workflow phase(s) from the table
+2. **Active Work** — One paragraph: what feature/ticket is in progress, next step
+3. **Open Tickets** — Snapshot from `tickets/_backlog.md`
+4. **Risks & Blockers** — Add blockers; remove resolved ones
+5. **Session Log** — One-line entry with today's date and what was accomplished
+Keep updates brief. STATUS.md is a dashboard, not a report — use `/summarizer` for detailed retrospectives.
 ## MCP Servers
-- **Context7** — Pulls up-to-date, version-specific documentation from live code libraries (React, Next.js, etc.) before writing code. Configured in `.claude/settings.json`. Use the `resolve` tool to look up a library, then `get-library-docs` to fetch the relevant docs. Always query Context7 before writing code that depends on a third-party library to avoid using outdated or deprecated APIs.
+- **Context7** — Pulls up-to-date, version-specific documentation from live code libraries. Use `resolve` then `get-library-docs` before writing code that depends on a third-party library.
 ## Conventions
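
The "Before Starting Any Feature" checklist added to the CLAUDE.md template can be approximated as a pre-flight check. This is a rough sketch under stated assumptions: `next_step` is a hypothetical helper that is not part of the package, and it only tests whether any ADR or ticket file exists, not whether one covers the feature at hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# next_step PROJECT_DIR
# Hypothetical pre-flight check: prints the next workflow step based on
# which artifacts already exist under PROJECT_DIR.
next_step() {
  local root="$1"
  if [ -z "$(find "$root/architecture/decisions" -type f -name '*.md' 2>/dev/null)" ]; then
    echo "run /architect"      # no ADR yet
  elif [ -z "$(find "$root/tickets" -type f -name '*.md' ! -name '_backlog.md' 2>/dev/null)" ]; then
    echo "run /planner"        # ADR exists, but no tickets
  else
    echo "use plan mode"       # tickets exist: implement, then run /reviewer
  fi
}

# Demo on a throwaway project directory (removed on exit).
proj=$(mktemp -d)
trap 'rm -rf "$proj"' EXIT
next_step "$proj"
mkdir -p "$proj/architecture/decisions" "$proj/tickets/example"
touch "$proj/architecture/decisions/0001-how-we-work.md"
next_step "$proj"
touch "$proj/tickets/example/001-example-ticket.md"
next_step "$proj"
```

Note that a freshly scaffolded project already ships `0001-how-we-work.md` and an example ticket, so a real check would need to look for artifacts tied to the feature at hand.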

package/src/templates/STATUS.md ADDED

@@ -0,0 +1,48 @@
+# Project Status
+> Last updated: _not yet updated_
+## Current Phase
+| Phase | Status |
+|-------|--------|
+| Decide | |
+| Map | |
+| Decompose | |
+| Execute | |
+| Test | |
+| Review | |
+| Maintain | |
+| Report | |
+## Active Work
+_What is currently being worked on, the relevant tickets, and the expected next step._
+## Branch & Commits
+<!-- AUTO:START -->
+_Commit or run `.claude/scripts/update-status.sh` to populate._
+<!-- AUTO:END -->
+## Recent File Changes
+<!-- AUTO:FILES:START -->
+_Commit or run `.claude/scripts/update-status.sh` to populate._
+<!-- AUTO:FILES:END -->
+## Open Tickets
+| Ticket | Feature | Status |
+|--------|---------|--------|
+| | | |
+## Risks & Blockers
+- None currently
+## Session Log
+| Date | Summary |
+|------|---------|
+| | |

package/src/templates/architecture/decisions/0001-how-we-work.md CHANGED

@@ -2,6 +2,7 @@
 **Status:** Accepted
 **Date:** 2026-05-04
+**Updated:** 2026-05-08
 **Author:** swarpi
 ## Context
@@ -12,19 +13,36 @@ Working with AI coding assistants can be highly productive, but without structur
 - Lost context between sessions
 - No clear separation between planning and execution
-We need a workflow that maximizes the benefits of AI assistance while maintaining engineering rigor.
+We need a workflow that maximizes the benefits of AI assistance while maintaining engineering rigor. However, a fully agent-driven pipeline adds friction for execution — Claude Code's built-in plan mode is faster and maintains better context continuity when implementing well-scoped work.
 ## Decision
-We adopt a role-based workflow with four distinct agent modes:
+We adopt a **hybrid approach**: specialized agents own the process, Claude Code plan mode owns execution.
+### Agents (process)
 1. **Architect** — Focuses on design decisions. Produces ADRs. Asks clarifying questions before deciding. Has read access to the codebase but does not write code.
-2. **Planner** — Takes specs and ADRs as input. Decomposes work into tickets with clear acceptance criteria. Does not implement.
+2. **Planner** — Takes specs and ADRs as input. Decomposes work into tickets scoped for plan mode sessions. Does not implement.
+3. **Reviewer** — Reviews diffs against acceptance criteria and linked ADRs. Checks for convention violations and test adequacy. Does not fix issues directly — flags them for fixing.
+4. **Summarizer** — Produces non-technical executive summaries of completed work.
+### Plan mode (execution)
+Each ticket from the Planner is executed using Claude Code's plan mode (`shift+tab`). Plan mode:
+- Explores the codebase and proposes a plan before implementing
+- Executes within a single session with full context continuity
+- Adapts its depth to the task — simple ticket, simple plan
+The **Executor agent** remains as a fallback for cases where plan mode is unavailable or strict verification discipline is needed.
-3. **Executor** — Implements tickets. Follows conventions. References the ticket and relevant ADRs. Proposes a plan before coding.
+### When to skip the pipeline
-4. **Reviewer** — Reviews diffs against acceptance criteria and linked ADRs. Checks for convention violations. Does not fix issues directly — flags them for the executor.
+- Bug fixes, typos, and small changes → plan mode directly
+- Well-understood changes with no architectural implications → plan mode directly
+- New features or significant decisions → full pipeline starting with Architect
 All significant decisions are recorded as ADRs. All work items are tickets with acceptance criteria. Conventions are documented per-language.
@@ -35,14 +53,15 @@ All significant decisions are recorded as ADRs. All work items are tickets with
 - Clear separation of concerns reduces cognitive overload per session
 - ADRs create a searchable decision history
 - Tickets with acceptance criteria make "done" unambiguous
+- Plan mode provides fast, context-aware execution without agent switching overhead
 - Conventions prevent style drift across sessions
-- Role separation allows using different models for different tasks (e.g., larger model for architecture, faster model for execution)
+- The workflow adapts to task size — no ceremony for small changes
 ### Negative
-- More upfront documentation work
-- Overhead may feel excessive for small changes
-- Requires discipline to follow the process
+- More upfront documentation work for new features
+- Requires discipline to follow the process for larger changes
+- Plan mode doesn't produce persistent artifacts the way the Executor agent does
 ### Neutral
@@ -54,10 +73,14 @@ All significant decisions are recorded as ADRs. All work items are tickets with
 Just talk to the AI and let it write code directly. Rejected because it leads to inconsistent decisions and lost context.
-### Heavy-process frameworks (e.g., full Agile ceremony)
+### Fully agent-driven pipeline (previous approach)
-Too heavyweight for a solo developer. We take the useful parts (clear acceptance criteria, documented decisions) without the ceremony.
+All eight agents in sequence, including Executor for implementation. Rejected because plan mode provides better context continuity and speed for execution, and the Executor agent was frequently skipped by plan mode anyway.
+### Plan mode only (no agents)
-### Separate human architect with AI executor
+Use plan mode for everything. Rejected because plan mode doesn't enforce role separation (the same session that decides also implements), produces no persistent artifacts (ADRs, tickets, reviews), and doesn't scale to multi-session features.
-Keeps all design decisions with the human. Rejected because AI architects can surface alternatives humans wouldn't consider, and documenting the reasoning in ADRs preserves human oversight.
+### Heavy-process frameworks (e.g., full Agile ceremony)
+Too heavyweight for a solo developer. We take the useful parts (clear acceptance criteria, documented decisions) without the ceremony.

package/src/templates/architecture/overview.md CHANGED

@@ -8,13 +8,25 @@ This document provides a high-level view of the system architecture.
 2. **Written artifacts** — Decisions, plans, and reviews are documented, not just discussed
 3. **Verify before execute** — Plans are reviewed before implementation begins
 4. **Single source of truth** — Each piece of information lives in one canonical place
+5. **Right tool for the job** — Agents own process; plan mode owns execution
-## The Flow
+## The Flow (Hybrid)
 ```
-Requirement → Architect → ADR/Spec → Planner → Tickets → Executor → Code → Reviewer → Merged
+Requirement → Architect → ADR/Spec → Planner → Tickets → Plan Mode → Code → Reviewer → Merged
+                (agent)                (agent)            (built-in)        (agent)
 ```
+**Process phases** (agents): Architect, System Architect, Planner, Reviewer, Summarizer
+**Execution phase** (plan mode): Claude Code's built-in plan mode implements each ticket in a focused session
+## When to Skip the Pipeline
+Not every change needs the full workflow:
+- **Bug fix or typo** → Plan mode directly, or just implement
+- **Small well-understood change** → Plan mode directly
+- **New feature or significant decision** → Start with Architect, then full pipeline
 ## Artifact Types
 | Artifact | Purpose | Location |

package/src/templates/orchestration.yaml CHANGED

@@ -1,5 +1,7 @@
-name: Agentic Workflow
-description: Eight-role pipeline for AI-assisted software engineering
+name: Hybrid Agentic Workflow
+description: >
+  Agents own the process (decisions, planning, review).
+  Claude Code plan mode owns execution (implementing tickets).
 agents:
   - id: architect
@@ -29,8 +31,8 @@ agents:
   - id: planner
     kind: planning
     title: Planner
-    tagline: Decomposes specs into actionable work
-    description: Takes ADRs and specs as input. Decomposes the work into discrete, actionable tickets with clear acceptance criteria.
+    tagline: Decomposes specs into plan-mode-sized tickets
+    description: Takes ADRs and specs as input. Decomposes work into discrete tickets scoped for a single Claude Code plan mode session.
     outputs:
       - Tickets
       - Milestones
@@ -38,35 +40,39 @@ agents:
     color: indigo
     docLink: /.claude/agents/planner.md
-  - id: executor
+  - id: plan-mode
     kind: execution
-    title: Executor
-    tagline: Implements with intent and discipline
-    description: Implements tickets following established conventions. Always proposes a plan before touching the codebase.
+    title: Plan Mode
+    tagline: Claude Code's built-in plan-then-execute cycle
+    description: >
+      Not a custom agent — this is Claude Code's native plan mode (shift+tab).
+      It explores the codebase, proposes a plan, gets approval, then implements.
+      Each ticket from the Planner is executed as a separate plan mode session.
     outputs:
       - Code
       - PRs
-      - Plan docs
     color: indigo
-    docLink: /.claude/agents/executor.md
+    builtin: true
-  - id: qa-tester
-    kind: validation
-    title: QA Tester
-    tagline: Writes automated tests for completed features
-    description: Writes automated tests after the Executor finishes a feature. Covers acceptance criteria, edge cases, and regression scenarios.
+  - id: executor
+    kind: execution
+    title: Executor (fallback)
+    tagline: Guided execution with strict verification
+    description: >
+      Fallback for when plan mode is unavailable or when enforced verification
+      discipline is needed. Most tickets should use plan mode instead.
     outputs:
-      - Tests
-      - Coverage report
-      - Test plan
-    color: green
-    docLink: /.claude/agents/qa-tester.md
+      - Code
+      - PRs
+      - Plan docs
+    color: gray
+    docLink: /.claude/agents/executor.md
   - id: reviewer
     kind: validation
     title: Reviewer
     tagline: Validates against acceptance criteria
-    description: Validates code and tests against the original acceptance criteria. Flags issues back to the Executor and provides final approval.
+    description: Validates code and tests against the original acceptance criteria. Flags issues back for fixing and provides final approval.
     outputs:
       - Feedback
       - Approval
@@ -74,22 +80,11 @@ agents:
     color: amber
     docLink: /.claude/agents/reviewer.md
-  - id: custodian
-    kind: maintenance
-    title: Custodian
-    tagline: Keeps CLAUDE.md lean, current, and well-routed
-    description: Maintains the project's CLAUDE.md file — adds new patterns and gotchas, removes stale entries, and routes large content to external files. Enforces a 150–200 line limit to prevent context bloat.
-    outputs:
-      - Updated CLAUDE.md
-      - Extracted reference files
-    color: blue
-    docLink: /.claude/agents/custodian.md
   - id: summarizer
     kind: reporting
     title: Summarizer
     tagline: Communicates completed work to stakeholders
-    description: Reads tickets, ADRs, specs, git history, and test results to produce concise, non-technical executive summaries of what was delivered and why it matters.
+    description: Reads tickets, ADRs, specs, git history, and test results to produce concise, non-technical executive summaries.
     outputs:
       - Executive summaries
       - Sprint recaps
@@ -98,6 +93,7 @@ agents:
     docLink: /.claude/agents/summarizer.md
 connections:
+  # Process phase: agents produce artifacts
   - from: architect
     to: system-architect
     artifact: ADRs · Specs
@@ -106,28 +102,42 @@ connections:
     to: planner
     artifact: architecture.yaml
+  # Execution phase: plan mode implements tickets
+  - from: planner
+    to: plan-mode
+    artifact: Tickets
+    note: Each ticket becomes a separate plan mode session
+  # Fallback execution path
   - from: planner
     to: executor
     artifact: Tickets
+    type: fallback
+    note: Use when plan mode is unavailable
-  - from: executor
-    to: qa-tester
-    artifact: Code
+  # Validation phase
+  - from: plan-mode
+    to: reviewer
+    artifact: Code · Tests
-  - from: qa-tester
+  - from: executor
     to: reviewer
     artifact: Code · Tests
+    type: fallback
+  # Feedback loops
   - from: reviewer
-    to: executor
+    to: plan-mode
     artifact: Feedback
     type: feedback
+    note: Fix issues in a new plan mode session
   - from: reviewer
-    to: custodian
-    artifact: Completed work context
-    type: trigger
+    to: executor
+    artifact: Feedback
+    type: feedback
+  # Reporting trigger
   - from: reviewer
     to: summarizer
     artifact: Completed work context
@@ -137,6 +147,7 @@ parallelization:
   description: >
     The main session can deploy sub-agents for independent, parallelizable work.
     Sub-agents run in isolation, report back, and the main thread synthesizes results.
+    Plan mode sessions run sequentially (one ticket at a time) unless tickets are independent.
   model-routing:
     low-complexity: haiku
     medium-complexity: sonnet
@@ -145,7 +156,7 @@ parallelization:
   examples:
     - parallel research across multiple files
     - running independent test suites simultaneously
-    - implementing tickets with no dependencies on each other
+    - independent tickets can be executed in separate plan mode sessions
     - concurrent verification (browser + console + tests)
 layout: diamond
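
The rerouted `connections` in the hunk above can be read as a small directed graph with a primary route and a fallback route. A minimal sketch follows, assuming only the edges whose `from`, `to`, and `type` are all visible in this hunk. `forwardRoute` is a hypothetical helper for illustration, not part of agent-eng.

```javascript
// Edges copied from the added and unchanged lines of the hunk above.
// Edges whose endpoints or type fall outside the visible hunk are omitted.
const connections = [
  { from: "planner", to: "plan-mode", artifact: "Tickets" },
  { from: "planner", to: "executor", artifact: "Tickets", type: "fallback" },
  { from: "plan-mode", to: "reviewer", artifact: "Code · Tests" },
  { from: "executor", to: "reviewer", artifact: "Code · Tests", type: "fallback" },
  { from: "reviewer", to: "plan-mode", artifact: "Feedback", type: "feedback" },
  { from: "reviewer", to: "executor", artifact: "Feedback", type: "feedback" },
];

// Hypothetical helper: walk edges of one type from `start` until none remain.
// Pass `undefined` to follow untyped (primary) edges only.
function forwardRoute(edges, start, wantedType) {
  const route = [start];
  const seen = new Set(route);
  let current = start;
  while (true) {
    const edge = edges.find((e) => e.from === current && e.type === wantedType);
    if (!edge || seen.has(edge.to)) break; // stop at a dead end or a cycle
    route.push(edge.to);
    seen.add(edge.to);
    current = edge.to;
  }
  return route;
}

const primary = forwardRoute(connections, "planner", undefined);
const fallback = forwardRoute(connections, "planner", "fallback");
console.log(primary.join(" -> "));
console.log(fallback.join(" -> "));
```

Under these assumptions both routes end at `reviewer`. That matches the hunk's intent that the executor only replaces plan mode and does not add a separate validation step.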

package/src/templates/.claude/agents/custodian.md DELETED

@@ -1,77 +0,0 @@
----
-name: custodian
-description: Use periodically (after a batch of tickets, or when CLAUDE.md grows past 200 lines) to keep CLAUDE.md lean, current, and routed to external files. Modifies only CLAUDE.md and the files it links to. Also keeps orchestration.yaml in sync when agents are added or removed.
-tools: Read, Write, Edit, Grep, Glob
-model: haiku
----
-You are a custodian agent. Your role is to maintain the project's `CLAUDE.md` file and `orchestration.yaml` — keeping them accurate, lean, and in sync.
-## Responsibilities
-1. **Keep CLAUDE.md current** — Update it with new patterns, gotchas, and conventions discovered during development
-2. **Keep CLAUDE.md lean** — The file must stay between 150–200 lines max to prevent context bloat
-3. **Route to external files** — Large or specialized content belongs in separate files that CLAUDE.md links to, so the main context only loads them when needed
-4. **Remove stale content** — Delete entries that no longer reflect how the project works
-5. **Sync orchestration.yaml** — When agents are added, removed, or renamed in `.claude/agents/`, update the agents list and connections in `orchestration.yaml` to match
-## Constraints
-- You only modify `CLAUDE.md`, `orchestration.yaml`, and the files `CLAUDE.md` routes to — you do not write application code
-- You never exceed 200 lines in `CLAUDE.md`
-- You preserve the existing structure and section ordering unless restructuring is necessary to stay within the line budget
-- You do not duplicate information that already lives in linked files
-## Process
-1. Read the current `CLAUDE.md` and count its lines
-2. Review recent work context — what patterns, gotchas, or conventions were discovered?
-3. Decide what to update:
-   - **Add** new patterns or gotchas that would help future sessions
-   - **Remove** stale or outdated entries
-   - **Route out** any section that has grown too large — extract it to a dedicated file and replace it with a one-line link
-4. After editing, verify the line count is within 150–200 lines
-5. If over 200 lines, identify what to extract or trim
-6. Compare `.claude/agents/*.md` files against `orchestration.yaml` — add missing agents, remove stale entries, and verify connections still make sense
-## What belongs in CLAUDE.md
-- Project identity (one sentence: what this project is)
-- Workflow overview (role table, key file paths)
-- Active conventions and patterns worth knowing upfront
-- Links to deeper references (not the references themselves)
-- Current gotchas that would surprise a new session
-## What belongs in linked files instead
-- Detailed coding conventions → `conventions/<language>.md`
-- Business context or domain knowledge → `docs/context/<topic>.md`
-- Style guides → `conventions/style.md` or similar
-- API contracts or integration details → `docs/<integration>.md`
-- Large workflow / role instructions → `.claude/agents/<role>.md`
-## Routing format
-When linking out to a separate file, use this pattern in CLAUDE.md:
-```markdown
-- **Topic name** — One-line summary. See `path/to/detail.md`
-```
-The linked file should be self-contained so it makes sense when read independently.
-## When to run
-This agent should be invoked:
-- After a batch of tickets is completed (to capture new patterns)
-- When CLAUDE.md is approaching or exceeding 200 lines
-- When a new convention or gotcha is discovered during development
-- Periodically as a hygiene pass
-## Anti-patterns to Avoid
-- Letting CLAUDE.md grow past 200 lines
-- Inlining large blocks of content that belong in separate files
-- Removing links to files without checking if the linked file still exists
-- Adding entries that duplicate what's already in a linked file
-- Writing vague entries ("be careful with X") instead of specific ones ("X requires Y because Z")
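
The removed custodian prompt enforced a line budget on `CLAUDE.md` purely through instructions. For projects that still want that guard after the agent's removal, the check could be sketched roughly as below. `checkLineBudget` and its option names are hypothetical and not part of agent-eng. The 150 and 200 defaults come from the deleted prompt.

```javascript
// Hypothetical helper: classify a CLAUDE.md body against the 150 to 200
// line range described in the removed custodian prompt.
function checkLineBudget(text, { min = 150, max = 200 } = {}) {
  // Drop one trailing newline so "a\nb\n" counts as 2 lines, not 3.
  const body = text.endsWith("\n") ? text.slice(0, -1) : text;
  const lines = body === "" ? 0 : body.split("\n").length;
  if (lines > max) return { lines, status: "over", excess: lines - max };
  // Being under the range is informational only; the prompt's hard
  // constraint was the upper bound.
  if (lines < min) return { lines, status: "under", excess: 0 };
  return { lines, status: "ok", excess: 0 };
}

// Usage with synthetic content standing in for a real CLAUDE.md file.
const sample = "line\n".repeat(212);
console.log(checkLineBudget(sample));
```

A non-zero `excess` is the number of lines that would need to be trimmed or routed out to linked files, which is the step the old prompt described as "identify what to extract or trim".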

package/src/templates/.claude/agents/qa-tester.md DELETED

@@ -1,82 +0,0 @@
----
-name: qa-tester
-description: Use after a feature is implemented to write automated tests covering the acceptance criteria, edge cases, and regressions. Does not modify feature code.
-tools: Read, Write, Edit, Grep, Glob, Bash
-model: sonnet
----
-You are a QA tester agent. Your role is to write automated tests for completed features, ensuring they meet acceptance criteria and catch regressions.
-## Responsibilities
-1. **Read the ticket** — Understand what was implemented and its acceptance criteria
-2. **Read the code** — Study the implementation to understand behavior, edge cases, and boundaries
-3. **Write automated tests** — Produce tests that verify the feature works as specified
-4. **Cover edge cases** — Test boundaries, error states, and invalid inputs
-5. **Ensure regressions are caught** — Tests should break if the feature's behavior changes unexpectedly
-## Constraints
-- You **write tests only**, you do not modify feature code
-- You follow the project's existing test conventions and frameworks
-- You do not introduce new test dependencies without explicit approval
-- You test observable behavior, not implementation details
-- You write the minimum number of tests needed for confidence, not the maximum
-## Process
-1. Read the ticket and its acceptance criteria
-2. Read the implementation code (the Executor's output)
-3. Identify the project's test framework, patterns, and file locations
-4. For each acceptance criterion, write at least one test that verifies it
-5. Add tests for:
-   - Happy path (expected inputs → expected outputs)
-   - Edge cases (empty, null, boundary values)
-   - Error handling (invalid inputs, failure modes)
-   - Integration points (if the feature touches other modules)
-6. Run the tests and confirm they pass
-7. Produce a test summary
-## Output Format
-```markdown
-## Test Plan: Ticket Title
-### Test File(s)
-- `tests/feature.test.ts` — Unit tests for core logic
-- `tests/feature.integration.test.ts` — Integration tests (if applicable)
-### Coverage
-| Acceptance Criterion | Test(s) | Status |
-|---|---|---|
-| Criterion 1 | `should handle valid input` | Pass |
-| Criterion 2 | `should reject empty input`, `should reject null` | Pass |
-| Criterion 3 | `should return paginated results` | Pass |
-### Edge Cases
-- Empty input → returns empty result (not an error)
-- Concurrent calls → no race conditions
-- Large input (10k items) → completes within timeout
-### Summary
-X tests written, all passing. Covers N/N acceptance criteria.
-```
-## Test Quality Guidelines
-- **Descriptive names** — Test names should read as specifications: `should return 404 when user not found`
-- **Arrange-Act-Assert** — Each test has a clear setup, action, and verification
-- **One assertion per concept** — A test should verify one behavior, though multiple assertions are fine if they verify the same thing
-- **No test interdependence** — Tests must not depend on execution order or shared mutable state
-- **Fast by default** — Unit tests should be fast; mark slow integration tests explicitly
-## What NOT to Test
-- Third-party library internals
-- Private implementation details that may change without affecting behavior
-- Exact error message strings (test error types instead)
-- Configurations that are already validated by the framework
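
The test quality guidelines in the removed qa-tester prompt (descriptive names, Arrange-Act-Assert, no shared mutable state) might look like this in practice. This is only a sketch: `findUser` is a hypothetical function under test, and `expectEqual` stands in for whatever assertion API the project's existing test framework provides.

```javascript
// Hypothetical stand-in for a test framework's assertion function.
function expectEqual(actual, expected, label) {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Hypothetical function under test; stands in for real feature code.
function findUser(users, id) {
  const user = users.find((u) => u.id === id);
  return user ? { status: 200, body: user } : { status: 404, body: null };
}

// The name reads as a specification: "should return 404 when user not found".
function shouldReturn404WhenUserNotFound() {
  // Arrange: a local fixture, so the test does not rely on shared state.
  const users = [{ id: 1, name: "Ada" }];

  // Act: exercise the observable behavior only.
  const response = findUser(users, 2);

  // Assert: two assertions, one concept (the not-found response).
  expectEqual(response.status, 404, "status");
  expectEqual(response.body, null, "body");
}

shouldReturn404WhenUserNotFound();
console.log("should return 404 when user not found: pass");
```

The test checks the response status and body, not an error message string, which is in line with the "What NOT to Test" list above.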