sisyphi 1.0.13 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (100)

package/dist/templates/orchestrator-impl.md CHANGED

@@ -1,122 +1,157 @@
 # Implementation Phase
-## Stage-by-Stage Execution
+<stage-execution>
-### Maximize parallelism
+## Maximize Parallelism
-Before starting each cycle, ask: **which stages or tasks are independent right now?** If two stages touch different subsystems (e.g., backend vs frontend, separate services, unrelated modules), spawn them concurrently — don't serialize work that doesn't need to be serialized. Use `--worktree` when parallel agents might touch overlapping files.
+Before each cycle, ask: **which stages or tasks are independent right now?** If two stages touch different subsystems, spawn them concurrently.
-Maximize parallelism **within your development cycle, not by skipping parts of it.** Running a review alongside the next stage's implementation is good parallelism. Skipping review because the next stage is ready is not — that's cutting corners faster, not working faster. A cycle with one agent running is a wasted cycle if other work was ready, but "other work" includes critique and validation agents, not just the next implementation stage.
+Maximize parallelism **within your development cycle, not by skipping parts of it.** Running a review alongside the next stage's implementation is good parallelism. Skipping review because the next stage is ready is cutting corners.
-If the plan has stages that share no file dependencies, **run them in parallel from the start.** The development cycle for each stage involves some combination of:
+If the plan has stages that share no file dependencies, run them in parallel from the start. The development cycle for each stage:
-1. **Detail-plan it** — expand the high-level outline into specific file changes, informed by previous stages. If complex enough, spawn a spec agent first.
-2. **Implement it** — spawn agents with self-contained instructions (see Agent Instructions below). May itself take multiple cycles if the stage has enough work.
-3. **Critique and refine it** — spawn review agents, fix what they find (see Critique and Refinement below).
-4. **Validate it** — spawn a validation agent to verify the stage actually works (see E2E Validation below).
+1. **Detail-plan it** — expand the outline into specific file changes. If complex, spawn a requirements or design agent first.
+2. **Implement it** — spawn agents with self-contained instructions.
+3. **Critique and refine it** — spawn review agents, fix what they find.
+4. **Validate it** — verify the stage actually works end-to-end.
-Not every stage needs every step. Use your judgment about what level of rigor each stage deserves:
-- A types/interfaces stage might just need implementation — the next stage that consumes the types will surface any problems.
-- A core business logic stage needs implementation + critique at minimum — subtle bugs here cascade everywhere.
-- An integration stage or anything touching critical paths needs the full loop including validation — you're building on accumulated assumptions and need to verify they hold.
+Not every stage needs every step:
+- Types/interfaces → implementation only (consumers surface type errors)
+- Core business logic → implementation + critique minimum
+- Integration/critical path → full loop including validation
-The key question each cycle: **what's the riskiest unverified work right now?** If you just finished a foundation stage and are about to build on it, validate the foundation. If you just implemented a low-risk config change, move on and batch it into a broader review later. When multiple stages have completed without any critique or validation, you've lost the feedback loop — stop implementing and catch up on verification before problems compound.
+**When multiple stages have completed without any critique or validation, stop implementing and catch up on verification.** Don't let unverified work compound.
 Don't detail-plan all stages up front. What you learn implementing earlier stages should inform later ones.
-## Agent Instructions
+</stage-execution>
-Implementation agent prompts must be **fully self-contained** — include everything the agent needs so it doesn't have to re-explore or guess. Each spawn instruction should include:
+<agent-instructions>
-- The overall goal of the session (one sentence)
+Implementation agent prompts must be **fully self-contained** — include everything the agent needs so it doesn't have to re-explore or guess:
+- The overall session goal (one sentence)
 - This agent's specific task (files to create/modify, what the change does, done condition)
 - References to relevant context files (`conventions.md`, `explore-architecture.md`, etc.)
-- The e2e recipe reference (`context/e2e-recipe.md`) so the agent can self-verify
+- The e2e recipe reference (`context/e2e-recipe.md`) for self-verification
+Tell every implementation agent to report clearly when done: what they built, what files they changed, and any issues or uncertainties.
-**Tell every implementation agent to report clearly when done:** what they built, what files they changed, and any issues or uncertainties they encountered. Testing and validation happens at the orchestrator level (see Critique and Refinement below), not inside each agent.
+<delegate-outcomes>
 ### Delegate outcomes, not implementations
-Your job is to define **what needs to happen and why**, not to write the code yourself. If you find yourself writing exact code snippets, function signatures, or line-by-line fix instructions in agent prompts — you're doing the agent's job.
+Define **what needs to happen and why**, not the code to write. If you're writing exact code snippets or line-by-line fix instructions in agent prompts, you're doing the agent's job.
+<example>
+<bad>
+"Change line 45 from `x === y` to `crypto.timingSafeEqual(Buffer.from(x), Buffer.from(y))`, handle length mismatch..."
+</bad>
+<good>
+"Fix the timing-safe comparison issue in authMiddleware.ts — see report at reports/agent-002-final.md, Major #3"
+</good>
+</example>
+For fix agents: **pass the review report path and tell the agent to action the items.** The agent reads the report, understands the codebase, and figures out the right fix. Writing the code for them defeats the purpose of delegation.
-**Bad**: "Change line 45 from `x === y` to `crypto.timingSafeEqual(Buffer.from(x), Buffer.from(y))`, handle length mismatch..."
-**Good**: "Fix the timing-safe comparison issue in authMiddleware.ts — see report at reports/agent-002-final.md, Major #3"
+The exception is architectural constraints the agent wouldn't know: "use the existing `personRepository.findOrCreateOwner` method" or "the Supabase client is at `supabaseService.getClient()`". Give agents the **what** and the **landmarks**, not the **how**.
-For fix agents specifically: **pass the review report path and tell the agent to action the items.** The agent reads the report, understands the codebase, and figures out the right fix. This is why you have agents — they're capable of solving problems, not just transcribing solutions. Writing the code for them defeats the purpose of delegation and wastes your context on implementation details you shouldn't be tracking.
+</delegate-outcomes>
-The exception is architectural constraints the agent wouldn't know: "use the existing `personRepository.findOrCreateOwner` method for Neo4j sync" or "the Supabase client is at `supabaseService.getClient()`". Give agents the **what** and the **landmarks**, not the **how**.
+<context-propagation>
 ### Context propagation
-The planning phase produced context files — conventions, e2e recipe, architectural findings. Be selective — give each agent the context relevant to their task, not everything. An agent that gets `conventions.md` writes consistent code. An agent that gets `explore-architecture.md` understands where their change fits.
+The planning phase produced context files — conventions, e2e recipe, architectural findings. Be selective — give each agent the context relevant to their task.
-## Code Smell Escalation
+<example>
+<bad>
+"Implement the auth middleware. Look at how the existing middleware works."
+</bad>
+<rationale>Vague. The agent must re-explore the codebase to find conventions and patterns.</rationale>
+<good>
+"Implement auth middleware per context/requirements-auth.md and context/design-auth.md. Reference context/conventions.md for middleware patterns. E2E recipe at context/e2e-recipe.md."
+</good>
+</example>
-Instruct agents to flag problems early rather than working around them. When an agent encounters unexpected complexity, unclear architecture, or code that fights back — the right move is to stop and report clearly. A clear description of the problem is more valuable than a brittle implementation built on a bad foundation.
+</context-propagation>
-When you see these reports, investigate before pushing forward. If the smell suggests a design issue, involve the user.
+</agent-instructions>
-## Critique and Refinement
+<code-smell-escalation>
-After implementation agents report, assess whether the stage needs critique before advancing. For stages that touch core logic, integration points, or critical paths — review before building on top. For low-risk stages (types, config, boilerplate), you can defer review and batch it with a later critique cycle. The failure mode is not "sometimes skipping review" — it's implementing six stages in a row without any review at all.
+Instruct agents to flag problems early rather than working around them. When an agent encounters unexpected complexity, unclear architecture, or code that fights back — the right move is to stop and report clearly. A clear problem description is more valuable than a brittle implementation.
-### Critique cycle
+When you see these reports, investigate before pushing forward. If the smell suggests a design issue, involve the user.
-When a stage warrants critique, spawn review agents in parallel, each attacking a different dimension:
+</code-smell-escalation>
-1. **Code reuse reviewer** — searches the codebase for existing utilities, helpers, and patterns that the new code duplicates. Flags any new function that reimplements existing functionality, any inline logic that could use an existing utility.
+<critique-refinement>
-2. **Code quality reviewer** — looks for hacky patterns: redundant state, parameter sprawl, copy-paste with slight variation, leaky abstractions, stringly-typed code where constants or enums exist, unnecessary nesting or wrapping.
+## Critique Cycle
-3. **Efficiency reviewer** — looks for unnecessary work (redundant computations, duplicate API calls, N+1 patterns), missed concurrency (independent operations run sequentially), hot-path bloat, unbounded data structures, overly broad operations.
+After implementation agents report, assess whether the stage needs critique before advancing. The failure mode is not "sometimes skipping review" — it's implementing six stages in a row without any.
-Give each reviewer the full diff and relevant context files. They report problems — they don't fix them.
+When a stage warrants critique, spawn review agents in parallel, each attacking a different dimension:
+- **Code reuse** — existing utilities, helpers, patterns the new code duplicates
+- **Code quality** — hacky patterns, redundant state, parameter sprawl, copy-paste, leaky abstractions
+- **Efficiency** — redundant computations, N+1 patterns, missed concurrency, unbounded data structures
+Give each reviewer the full diff and relevant context files. They report problems — they don't fix.
-### Refine cycle
+## Refine Cycle
-Aggregate the reviewer findings. Spawn fix agents and **point them at the review report** — don't rewrite the findings as line-by-line instructions. The fix agent reads the report, reads the code, and figures out the right solution. You triage (skip false positives, note any architectural constraints) — they implement.
+Aggregate reviewer findings. Spawn fix agents and **point them at the review report** — don't rewrite findings as line-by-line instructions. You triage (skip false positives, note architectural constraints) — they implement.
 ```bash
 sisyphus spawn --name "fix-review-issues" --agent-type sisyphus:implement \
   "Fix the issues in reports/agent-003-final.md. Skip item #5 (false positive). Run type-check after."
 ```
-The fix agents should use `/simplify` to systematically review their own changes before reporting.
+Fix agents should use `/simplify` to review their own changes before reporting.
-### Repeat until clean
+Re-review after fixes. Stop when reviewers return only stylistic nits. If 3+ rounds are needed, the approach — not the patches — needs rethinking.
-Spawn reviewers again on the refined code. If they come back with new issues, fix those too. Genuinely nitpicky findings — stylistic preferences, irrelevant edge cases — can be skipped. But if a finding is actually correct, it gets done. **"I don't want to" is not a reason to skip a valid finding.** The distinction is between false positives and laziness. In practice this is usually 1-2 rounds. If it's taking more, the implementation was shaky and you should consider whether the approach needs rethinking rather than patching.
+</critique-refinement>
-## E2E Validation
+<e2e-validation>
-E2E validation confirms the implementation actually works — not just that it compiles or passes unit tests, but that the feature behaves correctly when exercised. Reserve full e2e validation for stages where you're about to build on accumulated work (integration stages, milestones where multiple stages come together) or where failure would be expensive to debug later. Not every stage needs its own e2e pass — but don't let more than 2-3 stages accumulate without one.
+E2E validation confirms the implementation actually works — not just compiles or passes unit tests. Reserve full validation for stages where you're building on accumulated work or where failure would be expensive to debug later. Don't let more than 2-3 stages accumulate without one.
 Spawn a validation agent with the e2e recipe from `context/e2e-recipe.md`. The agent should:
-- Follow the setup steps exactly (build, start servers, seed data)
-- Run every verification step in the recipe
-- Report exactly what passed and what failed — not "it looks good"
+- Follow setup steps exactly (build, start servers, seed data)
+- Run every verification step
+- Report exactly what passed and what failed
-If the recipe involves UI, the validation agent should use `capture` to screenshot and interact with the actual running app. If it involves an API, it should curl the actual endpoints. If it involves CLI behavior, it should exercise it in the terminal.
+If the recipe involves UI, use `capture` to screenshot the running app. If API, curl the endpoints. If CLI, exercise it in the terminal.
-If the project lacks validation tooling, **create it**. A smoke-test script, a seed command, a health-check endpoint — these pay for themselves immediately and every future validation agent reuses them.
+If the project lacks validation tooling, **create it** — a smoke-test script, seed command, or health-check endpoint pays for itself immediately.
-When you've chosen to validate a stage, **don't advance past it until validation passes.** If it fails, log the failures, spawn fix agents, and re-validate. A validation checkpoint you ignore is worse than no checkpoint — it creates false confidence.
+**Don't advance past a validated stage until validation passes.** If it fails, log failures, spawn fix agents, re-validate.
-## Worktree Preference
-When spawning two or more implementation agents in the same cycle, prefer `--worktree` for each. Worktree isolation eliminates file conflict risk — agents can't clobber each other's changes, each gets a clean branch, and they can commit incrementally. The daemon merges branches back when agents complete and surfaces conflicts in your next cycle's state.
+When all implementation stages are complete, transition to validation mode for the comprehensive final pass:
 ```bash
-sisyphus spawn --name "impl-auth" --agent-type sisyphus:implement --worktree "Add session middleware — see context/conventions.md"
-sisyphus spawn --name "impl-routes" --agent-type sisyphus:implement --worktree "Add login routes — see context/conventions.md and context/explore-architecture.md"
+sisyphus yield --mode validation --prompt "All stages implemented — validate against context/e2e-recipe.md"
 ```
-## Returning to Planning
+Validation mode shifts the orchestrator's entire focus to proving the feature works. Stage-level validation during implementation catches issues early; the final validation pass proves the whole thing holds together.
+</e2e-validation>
-If you discover mid-implementation that the approach is wrong — the architecture is different than expected, a dependency changes the approach, or agents keep hitting the same wall — don't keep pushing. Return to planning:
+<returning-to-planning>
+If the approach is wrong mid-implementation, don't keep pushing. Return to planning:
 ```bash
 sisyphus yield --mode planning --prompt "Re-evaluate: discovered X changes the approach — write cycle log"
 ```
-Document what you found in the cycle log before yielding so the planning cycle starts informed. Update roadmap.md to reflect that you're back in an earlier phase.
+Concrete triggers:
+- 2+ agents report same unexpected complexity in the same subsystem
+- An agent discovers a dependency that changes the approach
+- Fix agents keep patching the same area across cycles
+Document what you found in the cycle log before yielding. Update roadmap.md to reflect you're back in an earlier phase.
+</returning-to-planning>
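
The first trigger listed in the new `<returning-to-planning>` section (two or more agents reporting unexpected complexity in the same subsystem) could be checked mechanically if reports were tallied. The following is a minimal sketch, not part of the package: the one-line-per-report input format `agent-name subsystem` and the function name `subsystems_needing_replan` are assumptions made purely for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical input on stdin: one "agent-name subsystem" pair per line,
# one line per complexity report. This format is an assumption of the sketch.
subsystems_needing_replan() {
  sort -u |   # count each agent at most once per subsystem
    awk '{ count[$2]++ } END { for (s in count) if (count[s] >= 2) print s }' |
    sort      # stable output order
}

# Example: two distinct agents flag "auth", only one flags "billing".
printf '%s\n' \
  "agent-001 auth" \
  "agent-002 auth" \
  "agent-002 auth" \
  "agent-003 billing" | subsystems_needing_replan
# prints: auth
```

Deduplicating before counting matters here: one agent reporting the same subsystem twice is not the "2+ agents" signal the template describes.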

package/dist/templates/orchestrator-planning.md CHANGED

@@ -1,90 +1,72 @@
 # Planning Phase
-## Planning Phase Flow
+<planning-workflow>
-The natural sequence: **context → spec → roadmap refinement → detailed planning.** Context documents come first because they feed everything downstream — spec writers, planners, and implementers all benefit from not having to re-explore the codebase. After the spec is aligned, revisit the roadmap — that's when you actually understand scope well enough to flesh out phases honestly.
+The natural sequence: **context → requirements → design → roadmap refinement → detailed planning.** Context documents come first because they feed everything downstream — requirements analysts, designers, planners, and implementers all benefit from not having to re-explore the codebase. After the requirements and design are aligned, revisit the roadmap — that's when you actually understand scope well enough to flesh out phases honestly.
-## Exploration
+</planning-workflow>
-Use explore agents to build understanding before making decisions. Each agent should save a focused context document to `.sisyphus/sessions/$SISYPHUS_SESSION_ID/context/` — these artifacts get passed to downstream agents so they don't have to re-explore the codebase themselves.
+<exploration>
-Adapt the number and focus of explore agents to the task. Key principles:
+Use explore agents to build understanding before making decisions. Each agent saves a focused context document to `$SISYPHUS_SESSION_DIR/context/`.
-- **Each agent produces a focused artifact** — not one sprawling document. Focused documents can be selectively passed to downstream agents. An agent implementing auth gets `conventions.md` + `architecture.md`, not a 500-line dump.
-- **Conventions and patterns are high-value** to capture. Implementation agents that receive convention context write consistent code. Ones that don't produce code you'll have to fix.
-- **Exploration serves different purposes at different stages.** Early exploration is architectural — understanding the system and what needs to change. Later exploration before a specific stage is tactical — identifying files, patterns to follow, utilities to reuse. Both are valuable.
-- **Delegate understanding of unfamiliar territory.** If the task touches a library or subsystem you don't know, spawn an agent to investigate and report.
+- **Each agent produces a focused artifact** — not one sprawling document. Focused documents can be selectively passed to downstream agents.
+- **Conventions and patterns are high-value** to capture. Implementation agents that receive convention context write consistent code.
+- **Exploration serves different purposes at different stages.** Early exploration is architectural. Later exploration before a specific stage is tactical — files, patterns, utilities to reuse.
+- **Delegate understanding of unfamiliar territory.** If the task touches an unfamiliar library or subsystem, spawn an agent to investigate and report.
-## Spec Alignment
+</exploration>
-Before investing in a detailed spec, make sure the goal itself is well-defined. If you're making assumptions about scope, requirements, or constraints — surface them to the user. A spec built on wrong assumptions wastes every cycle downstream.
+<requirements-alignment>
-For significant features, spec refinement is iterative:
-- Draft the spec based on exploration findings
-- Have agents review for feasibility and code smells (can this actually work given the codebase?)
-- Seek user alignment on the high-level approach and any decisions that set direction
-- **Apply corrections back to the spec itself** — the spec is the single source of truth. Don't create a separate corrections file and pass both downstream; update the spec and delete the corrections. Plan agents should read one authoritative document, not reconcile two contradictory ones.
+Before investing in detailed requirements, make sure the goal is well-defined. If you're making assumptions about scope, requirements, or constraints — surface them to the user.
-Not every stage needs a standalone spec document — a well-defined stage might just be a detailed section in the implementation plan. Use judgment about how much formality each stage warrants.
+For significant features, requirements refinement is iterative:
+- Draft requirements based on exploration findings
+- Have agents review for feasibility (can this actually work given the codebase?)
+- Seek user alignment on the high-level approach
+- **Fold new knowledge into authoritative documents.** When reviews, exploration, or user feedback change the understanding, update the requirements and design documents directly — they are the single source of truth. Don't create correction files, addendum files, or decision logs alongside them. Remove superseded material rather than annotating it. Plan agents should read clean, current documents — not reconcile contradictions or skip over resolved questions.
-## Roadmap Refinement
+Not every stage needs standalone requirements — a well-defined stage might just be a detailed section in the implementation plan.
-Once you have context docs and an aligned spec, revisit the roadmap. This is the first point where you understand real scope — adjust phase boundaries, add phases you didn't anticipate, reorder for dependencies. Keep future phases at outline level; just make sure the shape is honest.
+</requirements-alignment>
-## Delegating to the Plan Lead
+<plan-delegation>
-Spawn **one plan lead** per feature. Point it at **inputs** (spec, context docs, corrections) — not a pre-made structure. Don't pre-decide staging, ordering, or design decisions. The plan lead has `effort: max` reasoning and handles its own decomposition: it will assess scope, delegate sub-plans to specialist agents if the feature is large enough, run adversarial reviews on the result, and deliver a synthesized master plan.
+Once you have context docs and aligned requirements/design, revisit the roadmap — this is the first point where you understand real scope. Roadmap refinement means updating the four canonical sections: current stage, exit criteria, active context references, and next steps. Decisions from exploration, requirements, and design fold into context documents — not the roadmap.
-**Don't split the planning yourself.** The plan lead decides whether to plan solo or delegate sub-plans to domain-specific agents. If the orchestrator pre-splits into "backend plan agent" and "frontend plan agent," the plan lead's synthesis step — where it resolves cross-domain conflicts, finds gaps, and stress-tests edge cases — never happens. One plan lead per feature, and trust it to decompose internally.
+Spawn **one plan lead** per feature. Point it at inputs (requirements, design, context docs) — not a pre-made structure. The plan lead handles its own decomposition: it assesses scope, delegates sub-plans if needed, runs adversarial reviews, and delivers a synthesized master plan. **Delegate outcomes, not implementations** — tell the plan lead what needs planning and why, not how to structure the plan.
-**When to spawn multiple plan leads:** Only for genuinely independent features with no shared files or integration points. If two features touch the same codebase area, one plan lead should own both — otherwise you'll get conflicting plans with no one responsible for reconciling them.
+**Don't split planning yourself.** If the orchestrator pre-splits into "backend plan agent" and "frontend plan agent," the plan lead's synthesis step — resolving cross-domain conflicts, finding gaps, stress-testing edge cases — never happens.
-## Progressive Development
+**When to spawn multiple plan leads:** Only for genuinely independent features with no shared files or integration points.
-Not all tasks need the same process depth. A 2-file bug fix can go straight to implementation. A cross-repo feature with multiple domains needs full phased development.
+</plan-delegation>
-### Decision heuristic
+<progressive-development>
-- **Small task** (1-3 files, single domain): Skip phases — roadmap is just a short task checklist (diagnose, fix, validate). Single plan agent, single implement agent.
-- **Large task** (3+ stages, multiple domains or repos): Full phased development. The roadmap tracks development phases, and each phase produces artifacts in `context/`.
+Not all tasks need the same process depth.
-Signs you need phased development: the task touches multiple unfamiliar subsystems, the task description spans different concerns (backend, frontend, IPC, etc.), or a spec exists with more than 3 distinct work areas.
+- **Small task** (1-3 files, single domain): Skip phases — roadmap is a short checklist (diagnose, fix, validate). Single plan agent, single implement agent.
+- **Large task** (3+ stages, multiple domains): Full phased development. The roadmap tracks phases, each producing artifacts in `context/`.
-### Implementation stages are context artifacts
+Signs you need phased development: multiple unfamiliar subsystems, the task spans different concerns (backend, frontend, IPC), or the requirements have more than 3 distinct work areas.
-When Phase 3 (Plan) runs, it produces implementation stage breakdowns saved to `context/`:
-- `context/plan-implementation.md` — overall stage outline with dependencies
-- `context/plan-stage-1-types.md` — detailed plan for stage 1
-- `context/plan-stage-2-service.md` — detailed plan for stage 2 (written when stage 1 is underway)
+Implementation stages are context artifacts — saved to `context/plan-stage-N-*.md`. Detail-plan one stage at a time; what you learn implementing stage N informs stage N+1.
-### Don't front-load phases
+</progressive-development>
-Detail-plan one stage at a time. What you learn implementing stage N informs stage N+1's detail plan. The stage outline evolves — stages get added, removed, reordered, or split as understanding grows. That's the system working correctly.
-Detailed plans for stages 4-7 written before stage 1 is implemented are fiction. Defer detail until you're about to execute.
-## E2E Verification Recipe
+<verification-planning>
 Before implementation begins, determine how to concretely verify the change works end-to-end. This is the single most common failure mode: agents report success but nothing actually works.
-The tooling explorer should have mapped the available infrastructure. Common patterns:
-- **Browser automation**: `capture` CLI for UI changes — click through affected flows, screenshot results
-- **CLI verification**: exercise changed behavior interactively in tmux
-- **API testing**: dev server + curl/httpie for endpoint changes
-- **Integration tests**: existing e2e or integration test suite
-- **Smoke script**: create one if nothing else exists
+If you cannot determine a concrete verification method, **ask the user**. Do not proceed to implementation without a verification plan.
-If you cannot determine a concrete verification method, **ask the user**. Offer 2-3 specific options. Do not proceed to implementation without a verification plan.
+Write the recipe to `context/e2e-recipe.md` with setup steps, exact commands or interactions to verify, and what success looks like. Make it executable, not aspirational. Implementation agents and validation agents both reference this file.
-Write the recipe to `context/e2e-recipe.md` with:
-- Setup steps (start dev server, build, seed data, etc.)
-- Exact commands or interactions to verify
-- What success looks like (expected output, visual state, response codes)
+</verification-planning>
-Implementation agents and validation agents both reference this file. Write it to be executable, not aspirational.
-## Transitioning to Implementation
+<transition>
 When you have enough understanding, a reviewed plan, and a verification recipe — transition explicitly:
@@ -92,4 +74,12 @@ When you have enough understanding, a reviewed plan, and a verification recipe
 sisyphus yield --mode implementation --prompt "Begin implementation — see roadmap.md and context/plan-implementation.md"
 ```
-The `--mode implementation` flag loads implementation-phase guidance for the next cycle. Pass a prompt that orients the next cycle to where things stand.
+The `--mode implementation` flag loads implementation-phase guidance for the next cycle.
+After implementation is complete, transition to validation mode to prove the feature works:
+```bash
+sisyphus yield --mode validation --prompt "Implementation complete — validate against context/e2e-recipe.md"
+```
+</transition>
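
The transition rule in this template (enough understanding, a reviewed plan, and a verification recipe before yielding) can be approximated with a file-existence check. This is a rough sketch under stated assumptions: it assumes `roadmap.md` and `context/` sit directly under the session directory, and the function name `missing_prerequisites` is hypothetical. It only confirms the files are present and non-empty; it cannot tell whether the plan was actually reviewed.

```bash
#!/usr/bin/env bash
# Prints each missing prerequisite and returns non-zero if any is absent.
# Assumption: roadmap.md and context/ live directly under the session directory.
missing_prerequisites() {
  local session_dir="$1" missing=0 f
  for f in roadmap.md context/plan-implementation.md context/e2e-recipe.md; do
    if [ ! -s "${session_dir}/${f}" ]; then   # -s: exists and is non-empty
      echo "missing: ${f}"
      missing=1
    fi
  done
  return "$missing"
}

# Falls back to the current directory when SISYPHUS_SESSION_DIR is unset.
if missing_prerequisites "${SISYPHUS_SESSION_DIR:-.}"; then
  echo "ready to yield to implementation"
fi
```

A check like this would run before the `sisyphus yield --mode implementation` command shown in the diff, so a missing `context/e2e-recipe.md` is caught before any implementation agent is told to self-verify against it.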

package/dist/templates/orchestrator-plugin/commands/sisyphus/design.md ADDED

@@ -0,0 +1,13 @@
+---
+description: Create technical design from requirements through investigation and user iteration
+argument-hint: <topic or description>
+---
+# Technical Design
+**Input:** $ARGUMENTS
+The user wants a technical design before implementation begins.
+Spawn a `sisyphus:design` agent to lead this — it's interactive, investigates the codebase, proposes architecture, and iterates with the user. Output goes to `context/design.md`. It expects `context/requirements.md` to exist; if it doesn't, flag that to the user or run requirements first.
+If the current strategy doesn't include a design stage, update it before spawning. Don't do the design work yourself.

package/dist/templates/orchestrator-plugin/commands/sisyphus/problem.md ADDED

@@ -0,0 +1,13 @@
+---
+description: Explore the problem space collaboratively before committing to a solution
+argument-hint: <topic or description>
+---
+# Problem Exploration
+**Input:** $ARGUMENTS
+The user wants to step back and explore the problem space before committing to a direction. This is a signal to prioritize understanding over progress.
+Spawn a `sisyphus:problem` agent to lead this — it's interactive, collaborates with the user, and saves findings to `context/problem.md`. If the current strategy doesn't account for a problem exploration stage, update it before spawning.
+Don't do the exploration yourself. The `sisyphus:problem` agent is purpose-built for divergent thinking and user collaboration.

package/dist/templates/orchestrator-plugin/commands/sisyphus/requirements.md ADDED

@@ -0,0 +1,13 @@
+---
+description: Define behavioral requirements with EARS acceptance criteria
+argument-hint: <topic or description>
+---
+# Requirements
+**Input:** $ARGUMENTS
+The user wants formal requirements defined before design or implementation proceeds.
+Spawn a `sisyphus:requirements` agent to lead this — it's interactive, drafts EARS-format requirements, and iterates with the user until approved. Output goes to `context/requirements.md`. If the current strategy doesn't include a requirements stage, update it before spawning.
+Don't draft requirements yourself. The `sisyphus:requirements` agent handles the full process: codebase investigation, drafting, and user iteration.

package/dist/templates/orchestrator-plugin/commands/sisyphus/strategize.md ADDED

@@ -0,0 +1,19 @@
+---
+description: Redirect session strategy — reactivate if completed, then respawn in strategy mode
+argument-hint: <new direction or focus>
+---
+# Strategize
+**Input:** $ARGUMENTS
+The user wants to redirect this session's strategy.
+## Steps
+1. If the session is completed (`sisyphus status`), reactivate it with `sisyphus continue`.
+2. Annotate `strategy.md` with the pivot — what changed, new focus, which existing artifacts still apply. Don't rewrite the whole strategy.
+3. Yield to strategy mode:
+   ```bash
+   sisyphus yield --mode strategy --prompt "<concise description of the new direction>"
+   ```
+   This respawns a fresh orchestrator that will re-evaluate the goal, stages, and approach.
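
Reviewer note: the three steps in `strategize.md` can be read as the dry-run sketch below. It only prints the commands in order and executes nothing. The function name `strategize_plan` and its two arguments are hypothetical. The template does not show what `sisyphus status` prints, so the completed/not-completed decision is passed in by hand here rather than parsed.

```bash
#!/bin/bash
# Dry-run sketch: prints the commands from the Steps list, runs none of them.
# strategize_plan and its arguments are illustrative assumptions:
#   $1  "yes" if the operator saw the session as completed in `sisyphus status`
#   $2  concise description of the new direction
strategize_plan() {
  # Step 1: reactivate only when the session has already completed.
  if [ "$1" = "yes" ]; then
    echo "sisyphus continue"
  fi
  # Step 2: a manual edit, so it is shown as a reminder comment.
  echo "# annotate strategy.md with the pivot (manual edit)"
  # Step 3: yield to strategy mode with the new direction as the prompt.
  printf 'sisyphus yield --mode strategy --prompt "%s"\n' "$2"
}

strategize_plan yes "focus on the API layer first"
```

Passing `no` as the first argument drops the `sisyphus continue` line, which matches the conditional wording of step 1.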

package/dist/templates/orchestrator-plugin/hooks/explore-gate.sh ADDED

@@ -0,0 +1,15 @@
+#!/bin/bash
+if [ -z "$SISYPHUS_SESSION_DIR" ]; then exit 0; fi
+CONTEXT_DIR="${SISYPHUS_SESSION_DIR}/context"
+# Gate passes if any explore context file exists
+if ls "${CONTEXT_DIR}"/explore-*.md 1>/dev/null 2>&1; then
+  exit 0
+fi
+cat <<'GATE'
+<explore-gate>
+No exploration context exists yet. Before planning or delegating work, spawn explore agents to build codebase understanding.
+</explore-gate>
+GATE
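
Reviewer note: the gate's behaviour can be checked outside a live session with a sketch like the one below. It copies the script's logic into a function (`explore_gate`, a hypothetical name) and points `SISYPHUS_SESSION_DIR` at a throwaway directory. The file name `explore-auth.md` is also just an example. This is an illustration of the logic in the hunk above, not part of the package.

```bash
#!/bin/bash
# Sketch only: explore_gate mirrors explore-gate.sh as a function so it can be
# exercised without a real session. The function name is an assumption.
explore_gate() {
  # No session directory set: the gate stays silent.
  if [ -z "$SISYPHUS_SESSION_DIR" ]; then return 0; fi
  context_dir="${SISYPHUS_SESSION_DIR}/context"
  # Gate passes if any explore context file exists.
  if ls "${context_dir}"/explore-*.md 1>/dev/null 2>&1; then
    return 0
  fi
  cat <<'GATE'
<explore-gate>
No exploration context exists yet. Before planning or delegating work, spawn explore agents to build codebase understanding.
</explore-gate>
GATE
}

# Throwaway session directory for the demo (hypothetical layout).
SISYPHUS_SESSION_DIR="$(mktemp -d)"
mkdir -p "${SISYPHUS_SESSION_DIR}/context"

before="$(explore_gate)"   # no explore-*.md yet: the reminder is emitted
touch "${SISYPHUS_SESSION_DIR}/context/explore-auth.md"
after="$(explore_gate)"    # a matching file exists: the gate is silent

echo "before: ${before:+reminder printed}"
echo "after: ${after:-silent}"
rm -rf "${SISYPHUS_SESSION_DIR}"
```

Because the hook is registered under `UserPromptSubmit`, the reminder would presumably accompany every prompt until the first `explore-*.md` file appears in the session's `context/` directory.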

package/dist/templates/orchestrator-plugin/hooks/hooks.json CHANGED

@@ -1 +1,14 @@
-{"hooks":{}}
+{
+  "hooks": {
+    "UserPromptSubmit": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/explore-gate.sh"
+          }
+        ]
+      }
+    ]
+  }
+}

package/dist/templates/orchestrator-plugin/skills/orchestration/task-patterns.md CHANGED

@@ -39,7 +39,7 @@ Usually serial — diagnosis must complete before fix, fix before validation. Ex
 ## Feature Build (Small — 1-3 files)
 ### When to use
-Clear requirements, small scope, no spec needed.
+Clear requirements, small scope, no formal requirements document needed.
 ### Plan structure
 ```
@@ -70,10 +70,12 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
 ```
 ## Feature: [description]
-### Spec & Planning
-- [ ] Draft spec — investigate codebase, propose approach
-- [ ] Create implementation plan from spec
-- [ ] Review plan against spec
+### Requirements & Design
+- [ ] Problem exploration — understand goals, constraints, assumptions
+- [ ] Requirements — define acceptance criteria
+- [ ] Design — architecture, component boundaries, data models
+- [ ] Create implementation plan from requirements + design
+- [ ] Review plan against requirements + design
 ### Implementation
 - [ ] Phase 1 — [foundation/types/interfaces]
@@ -87,18 +89,20 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
 Note: critique and validation are embedded between implementation phases, not deferred to the end. Phase 1 (types) is low-risk and doesn't need its own review, but critique catches issues before Phase 3 builds on them. Validation happens after integration, when all the pieces come together.
 ### Cycle plan
-- **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield. (Human iterates on spec between cycles.)
-- **Cycle 2**: Spawn `sisyphus:plan` for plan. Yield.
-- **Cycle 3**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
-- **Cycle 4**: Spawn `sisyphus:implement` for Phase 1. Yield.
-- **Cycle 5**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
-- **Cycle 6**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
-- **Cycle 7**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
-- **Cycle 8**: Spawn `sisyphus:validate` for e2e smoketest. Yield.
-- **Cycle 9**: Address validation failures or complete.
+- **Cycle 1**: Spawn `sisyphus:problem` for problem exploration. Yield. (Human iterates between cycles.)
+- **Cycle 2**: Spawn `sisyphus:requirements` for requirements analysis. Yield. (Human reviews/iterates.)
+- **Cycle 3**: Spawn `sisyphus:design` for technical design. Yield. (Human reviews/iterates.)
+- **Cycle 4**: Spawn `sisyphus:plan` for plan. Yield.
+- **Cycle 5**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
+- **Cycle 6**: Spawn `sisyphus:implement` for Phase 1. Yield.
+- **Cycle 7**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
+- **Cycle 8**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
+- **Cycle 9**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
+- **Cycle 10**: `sisyphus yield --mode validation` for e2e smoketest. Validation mode proves the feature works — operator for UI, evidence for every claim.
+- **Cycle 11**: Address validation failures (back to `--mode implementation`) or complete.
 ### Failure modes
-- **Spec needs human input**: Mark session as needing human review. Orchestrator notes open questions.
+- **Requirements/design needs human input**: Mark session as needing human review. Orchestrator notes open questions.
 - **Plan fails review**: Feed review issues back, respawn planner.
 - **Critique finds issues in foundation**: Fix before starting integration — don't build on shaky ground.
 - **Validation fails**: Feed specifics back to implement agent for the failing area.
@@ -117,9 +121,10 @@ Cross-cutting feature, multiple domains, needs team coordination. Uses **progres
 ```
 ## Feature: [description]
-### Spec
-- [ ] Draft spec
-- [ ] Review spec
+### Requirements & Design
+- [ ] Problem exploration
+- [ ] Requirements
+- [ ] Design
 ### Stage Outline (high-level only — no file-level detail yet)
 1. [domain A foundation] — no deps — ~N cycles
@@ -140,15 +145,17 @@ See context/plan-stage-N-{name}.md for detail plan.
 Note: verification checkpoints are embedded in the stage outline, not deferred to a final phase. The level of rigor varies — foundation stages get a light critique, core logic gets critique + validation, integration gets full e2e validation. This is judgment, not formula.
 ### Cycle plan
-- **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield.
-- **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
-- **Cycle 3**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
-- **Cycle 4**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
-- **Cycle 5**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
-- **Cycle 6**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
-- **Cycle 7**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
-- **Cycle 8**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
-- **Cycle 9+**: Implement integration stage. Validate e2e. Final review.
+- **Cycle 1**: Spawn `sisyphus:problem` for problem exploration. Yield.
+- **Cycle 2**: Spawn `sisyphus:requirements` for requirements. Yield.
+- **Cycle 3**: Spawn `sisyphus:design` for design. Yield.
+- **Cycle 4**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
+- **Cycle 5**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
+- **Cycle 6**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
+- **Cycle 7**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
+- **Cycle 8**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
+- **Cycle 9**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
+- **Cycle 10**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
+- **Cycle 11+**: Implement integration stage. Final review. Then `sisyphus yield --mode validation` for comprehensive e2e proof.
 ### Failure modes
 - **Detail-plan agent can't produce quality output**: The stage is still too large. Break it into sub-stages in the outline and detail-plan each sub-stage individually.