sisyphi 1.0.13 → 1.1.0

Files changed (100)
  1. package/dist/{chunk-T7ETTIQK.js → chunk-M7LZ2ZHD.js} +3 -27
  2. package/dist/chunk-M7LZ2ZHD.js.map +1 -0
  3. package/dist/{chunk-JXKUI4P6.js → chunk-REUQ4B45.js} +7 -38
  4. package/dist/chunk-REUQ4B45.js.map +1 -0
  5. package/dist/{chunk-LWWRGQWM.js → chunk-Z32YVDMY.js} +2 -2
  6. package/dist/chunk-Z32YVDMY.js.map +1 -0
  7. package/dist/cli.js +75 -56
  8. package/dist/cli.js.map +1 -1
  9. package/dist/daemon.js +776 -629
  10. package/dist/daemon.js.map +1 -1
  11. package/dist/{paths-NUUALUVP.js → paths-IJXOAN4E.js} +4 -6
  12. package/dist/templates/CLAUDE.md +16 -14
  13. package/dist/templates/agent-plugin/agents/CLAUDE.md +17 -6
  14. package/dist/templates/agent-plugin/agents/design.md +134 -0
  15. package/dist/templates/agent-plugin/agents/explore.md +39 -0
  16. package/dist/templates/agent-plugin/agents/operator.md +24 -0
  17. package/dist/templates/agent-plugin/agents/plan.md +15 -20
  18. package/dist/templates/agent-plugin/agents/problem.md +119 -0
  19. package/dist/templates/agent-plugin/agents/requirements.md +138 -0
  20. package/dist/templates/agent-plugin/agents/review/CLAUDE.md +29 -0
  21. package/dist/templates/agent-plugin/agents/review/compliance.md +6 -6
  22. package/dist/templates/agent-plugin/agents/review-plan/code-smells.md +4 -4
  23. package/dist/templates/agent-plugin/agents/review-plan/requirements-coverage.md +62 -0
  24. package/dist/templates/agent-plugin/agents/review-plan/security.md +1 -1
  25. package/dist/templates/agent-plugin/agents/review-plan.md +9 -8
  26. package/dist/templates/agent-plugin/agents/review.md +1 -1
  27. package/dist/templates/agent-plugin/agents/test-spec.md +3 -3
  28. package/dist/templates/agent-plugin/hooks/CLAUDE.md +2 -2
  29. package/dist/templates/agent-plugin/hooks/explore-user-prompt.sh +13 -0
  30. package/dist/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -1
  31. package/dist/templates/agent-plugin/hooks/require-submit.sh +70 -3
  32. package/dist/templates/agent-plugin/hooks/review-plan-user-prompt.sh +4 -4
  33. package/dist/templates/agent-plugin/hooks/review-user-prompt.sh +1 -1
  34. package/dist/templates/agent-suffix.md +0 -2
  35. package/dist/templates/orchestrator-base.md +169 -145
  36. package/dist/templates/orchestrator-impl.md +92 -57
  37. package/dist/templates/orchestrator-planning.md +46 -56
  38. package/dist/templates/orchestrator-plugin/commands/sisyphus/design.md +13 -0
  39. package/dist/templates/orchestrator-plugin/commands/sisyphus/problem.md +13 -0
  40. package/dist/templates/orchestrator-plugin/commands/sisyphus/requirements.md +13 -0
  41. package/dist/templates/orchestrator-plugin/commands/sisyphus/strategize.md +19 -0
  42. package/dist/templates/orchestrator-plugin/hooks/explore-gate.sh +15 -0
  43. package/dist/templates/orchestrator-plugin/hooks/hooks.json +14 -1
  44. package/dist/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +34 -27
  45. package/dist/templates/orchestrator-plugin/skills/orchestration/workflow-examples.md +56 -24
  46. package/dist/templates/orchestrator-strategy.md +233 -0
  47. package/dist/templates/orchestrator-validation.md +94 -0
  48. package/dist/tui.js +2730 -2924
  49. package/dist/tui.js.map +1 -1
  50. package/package.json +2 -4
  51. package/templates/CLAUDE.md +16 -14
  52. package/templates/agent-plugin/agents/CLAUDE.md +17 -6
  53. package/templates/agent-plugin/agents/design.md +134 -0
  54. package/templates/agent-plugin/agents/explore.md +39 -0
  55. package/templates/agent-plugin/agents/operator.md +24 -0
  56. package/templates/agent-plugin/agents/plan.md +15 -20
  57. package/templates/agent-plugin/agents/problem.md +119 -0
  58. package/templates/agent-plugin/agents/requirements.md +138 -0
  59. package/templates/agent-plugin/agents/review/CLAUDE.md +29 -0
  60. package/templates/agent-plugin/agents/review/compliance.md +6 -6
  61. package/templates/agent-plugin/agents/review-plan/code-smells.md +4 -4
  62. package/templates/agent-plugin/agents/review-plan/requirements-coverage.md +62 -0
  63. package/templates/agent-plugin/agents/review-plan/security.md +1 -1
  64. package/templates/agent-plugin/agents/review-plan.md +9 -8
  65. package/templates/agent-plugin/agents/review.md +1 -1
  66. package/templates/agent-plugin/agents/test-spec.md +3 -3
  67. package/templates/agent-plugin/hooks/CLAUDE.md +2 -2
  68. package/templates/agent-plugin/hooks/explore-user-prompt.sh +13 -0
  69. package/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -1
  70. package/templates/agent-plugin/hooks/require-submit.sh +70 -3
  71. package/templates/agent-plugin/hooks/review-plan-user-prompt.sh +4 -4
  72. package/templates/agent-plugin/hooks/review-user-prompt.sh +1 -1
  73. package/templates/agent-suffix.md +0 -2
  74. package/templates/orchestrator-base.md +169 -145
  75. package/templates/orchestrator-impl.md +92 -57
  76. package/templates/orchestrator-planning.md +46 -56
  77. package/templates/orchestrator-plugin/commands/sisyphus/design.md +13 -0
  78. package/templates/orchestrator-plugin/commands/sisyphus/problem.md +13 -0
  79. package/templates/orchestrator-plugin/commands/sisyphus/requirements.md +13 -0
  80. package/templates/orchestrator-plugin/commands/sisyphus/strategize.md +19 -0
  81. package/templates/orchestrator-plugin/hooks/explore-gate.sh +15 -0
  82. package/templates/orchestrator-plugin/hooks/hooks.json +14 -1
  83. package/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +34 -27
  84. package/templates/orchestrator-plugin/skills/orchestration/workflow-examples.md +56 -24
  85. package/templates/orchestrator-strategy.md +233 -0
  86. package/templates/orchestrator-validation.md +94 -0
  87. package/dist/chunk-JXKUI4P6.js.map +0 -1
  88. package/dist/chunk-LWWRGQWM.js.map +0 -1
  89. package/dist/chunk-T7ETTIQK.js.map +0 -1
  90. package/dist/templates/agent-plugin/agents/review-plan/spec-coverage.md +0 -44
  91. package/dist/templates/agent-plugin/agents/spec-draft.md +0 -78
  92. package/dist/templates/agent-plugin/hooks/hooks.json +0 -25
  93. package/dist/templates/agent-plugin/hooks/spec-user-prompt.sh +0 -19
  94. package/dist/templates/orchestrator-plugin/skills/git-management/SKILL.md +0 -111
  95. package/templates/agent-plugin/agents/review-plan/spec-coverage.md +0 -44
  96. package/templates/agent-plugin/agents/spec-draft.md +0 -78
  97. package/templates/agent-plugin/hooks/hooks.json +0 -25
  98. package/templates/agent-plugin/hooks/spec-user-prompt.sh +0 -19
  99. package/templates/orchestrator-plugin/skills/git-management/SKILL.md +0 -111
  100. /package/dist/{paths-NUUALUVP.js.map → paths-IJXOAN4E.js.map} +0 -0
@@ -1,122 +1,157 @@
1
1
  # Implementation Phase
2
2
 
3
- ## Stage-by-Stage Execution
3
+ <stage-execution>
4
4
 
5
- ### Maximize parallelism
5
+ ## Maximize Parallelism
6
6
 
7
- Before starting each cycle, ask: **which stages or tasks are independent right now?** If two stages touch different subsystems (e.g., backend vs frontend, separate services, unrelated modules), spawn them concurrently — don't serialize work that doesn't need to be serialized. Use `--worktree` when parallel agents might touch overlapping files.
7
+ Before each cycle, ask: **which stages or tasks are independent right now?** If two stages touch different subsystems, spawn them concurrently.
8
8
 
9
- Maximize parallelism **within your development cycle, not by skipping parts of it.** Running a review alongside the next stage's implementation is good parallelism. Skipping review because the next stage is ready is not — that's cutting corners faster, not working faster. A cycle with one agent running is a wasted cycle if other work was ready, but "other work" includes critique and validation agents, not just the next implementation stage.
9
+ Maximize parallelism **within your development cycle, not by skipping parts of it.** Running a review alongside the next stage's implementation is good parallelism. Skipping review because the next stage is ready is cutting corners.
10
10
 
11
- If the plan has stages that share no file dependencies, **run them in parallel from the start.** The development cycle for each stage involves some combination of:
11
+ If the plan has stages that share no file dependencies, run them in parallel from the start. The development cycle for each stage:
12
12
 
13
- 1. **Detail-plan it** — expand the high-level outline into specific file changes, informed by previous stages. If complex enough, spawn a spec agent first.
14
- 2. **Implement it** — spawn agents with self-contained instructions (see Agent Instructions below). May itself take multiple cycles if the stage has enough work.
15
- 3. **Critique and refine it** — spawn review agents, fix what they find (see Critique and Refinement below).
16
- 4. **Validate it** — spawn a validation agent to verify the stage actually works (see E2E Validation below).
13
+ 1. **Detail-plan it** — expand the outline into specific file changes. If complex, spawn a requirements or design agent first.
14
+ 2. **Implement it** — spawn agents with self-contained instructions.
15
+ 3. **Critique and refine it** — spawn review agents, fix what they find.
16
+ 4. **Validate it** — verify the stage actually works end-to-end.
17
17
 
18
- Not every stage needs every step. Use your judgment about what level of rigor each stage deserves:
19
- A types/interfaces stage might just need implementation; the next stage that consumes the types will surface any problems.
20
- - A core business logic stage needs implementation + critique at minimum — subtle bugs here cascade everywhere.
21
- - An integration stage or anything touching critical paths needs the full loop including validation — you're building on accumulated assumptions and need to verify they hold.
18
+ Not every stage needs every step:
19
+ - Types/interfaces: implementation only (consumers surface type errors)
20
+ - Core business logic: implementation + critique minimum
21
+ - Integration/critical path: full loop including validation
22
22
 
23
- The key question each cycle: **what's the riskiest unverified work right now?** If you just finished a foundation stage and are about to build on it, validate the foundation. If you just implemented a low-risk config change, move on and batch it into a broader review later. When multiple stages have completed without any critique or validation, you've lost the feedback loop — stop implementing and catch up on verification before problems compound.
23
+ **When multiple stages have completed without any critique or validation, stop implementing and catch up on verification.** Don't let unverified work compound.
24
24
 
25
25
  Don't detail-plan all stages up front. What you learn implementing earlier stages should inform later ones.
26
26
 
27
- ## Agent Instructions
27
+ </stage-execution>
28
28
 
29
- Implementation agent prompts must be **fully self-contained** — include everything the agent needs so it doesn't have to re-explore or guess. Each spawn instruction should include:
29
+ <agent-instructions>
30
30
 
31
- - The overall goal of the session (one sentence)
31
+ Implementation agent prompts must be **fully self-contained**; include everything the agent needs so it doesn't have to re-explore or guess:
32
+
33
+ - The overall session goal (one sentence)
32
34
  - This agent's specific task (files to create/modify, what the change does, done condition)
33
35
  - References to relevant context files (`conventions.md`, `explore-architecture.md`, etc.)
34
- - The e2e recipe reference (`context/e2e-recipe.md`) so the agent can self-verify
36
+ - The e2e recipe reference (`context/e2e-recipe.md`) for self-verification
37
+
38
+ Tell every implementation agent to report clearly when done: what they built, what files they changed, and any issues or uncertainties.
35
39
 
36
- **Tell every implementation agent to report clearly when done:** what they built, what files they changed, and any issues or uncertainties they encountered. Testing and validation happens at the orchestrator level (see Critique and Refinement below), not inside each agent.
40
+ <delegate-outcomes>
37
41
 
38
42
  ### Delegate outcomes, not implementations
39
43
 
40
- Your job is to define **what needs to happen and why**, not to write the code yourself. If you find yourself writing exact code snippets, function signatures, or line-by-line fix instructions in agent prompts, you're doing the agent's job.
44
+ Define **what needs to happen and why**, not the code to write. If you're writing exact code snippets or line-by-line fix instructions in agent prompts, you're doing the agent's job.
45
+
46
+ <example>
47
+ <bad>
48
+ "Change line 45 from `x === y` to `crypto.timingSafeEqual(Buffer.from(x), Buffer.from(y))`, handle length mismatch..."
49
+ </bad>
50
+ <good>
51
+ "Fix the timing-safe comparison issue in authMiddleware.ts — see report at reports/agent-002-final.md, Major #3"
52
+ </good>
53
+ </example>
54
+
55
+ For fix agents: **pass the review report path and tell the agent to action the items.** The agent reads the report, understands the codebase, and figures out the right fix. Writing the code for them defeats the purpose of delegation.
41
56
 
42
- **Bad**: "Change line 45 from `x === y` to `crypto.timingSafeEqual(Buffer.from(x), Buffer.from(y))`, handle length mismatch..."
43
- **Good**: "Fix the timing-safe comparison issue in authMiddleware.ts — see report at reports/agent-002-final.md, Major #3"
57
+ The exception is architectural constraints the agent wouldn't know: "use the existing `personRepository.findOrCreateOwner` method" or "the Supabase client is at `supabaseService.getClient()`". Give agents the **what** and the **landmarks**, not the **how**.
44
58
 
45
- For fix agents specifically: **pass the review report path and tell the agent to action the items.** The agent reads the report, understands the codebase, and figures out the right fix. This is why you have agents — they're capable of solving problems, not just transcribing solutions. Writing the code for them defeats the purpose of delegation and wastes your context on implementation details you shouldn't be tracking.
59
+ </delegate-outcomes>
46
60
 
47
- The exception is architectural constraints the agent wouldn't know: "use the existing `personRepository.findOrCreateOwner` method for Neo4j sync" or "the Supabase client is at `supabaseService.getClient()`". Give agents the **what** and the **landmarks**, not the **how**.
61
+ <context-propagation>
48
62
 
49
63
  ### Context propagation
50
64
 
51
- The planning phase produced context files — conventions, e2e recipe, architectural findings. Be selective — give each agent the context relevant to their task, not everything. An agent that gets `conventions.md` writes consistent code. An agent that gets `explore-architecture.md` understands where their change fits.
65
+ The planning phase produced context files — conventions, e2e recipe, architectural findings. Be selective — give each agent the context relevant to their task.
52
66
 
53
- ## Code Smell Escalation
67
+ <example>
68
+ <bad>
69
+ "Implement the auth middleware. Look at how the existing middleware works."
70
+ </bad>
71
+ <rationale>Vague. The agent must re-explore the codebase to find conventions and patterns.</rationale>
72
+ <good>
73
+ "Implement auth middleware per context/requirements-auth.md and context/design-auth.md. Reference context/conventions.md for middleware patterns. E2E recipe at context/e2e-recipe.md."
74
+ </good>
75
+ </example>
54
76
 
55
- Instruct agents to flag problems early rather than working around them. When an agent encounters unexpected complexity, unclear architecture, or code that fights back — the right move is to stop and report clearly. A clear description of the problem is more valuable than a brittle implementation built on a bad foundation.
77
+ </context-propagation>
56
78
 
57
- When you see these reports, investigate before pushing forward. If the smell suggests a design issue, involve the user.
79
+ </agent-instructions>
58
80
 
59
- ## Critique and Refinement
81
+ <code-smell-escalation>
60
82
 
61
- After implementation agents report, assess whether the stage needs critique before advancing. For stages that touch core logic, integration points, or critical paths, review before building on top. For low-risk stages (types, config, boilerplate), you can defer review and batch it with a later critique cycle. The failure mode is not "sometimes skipping review"; it's implementing six stages in a row without any review at all.
83
+ Instruct agents to flag problems early rather than working around them. When an agent encounters unexpected complexity, unclear architecture, or code that fights back, the right move is to stop and report clearly. A clear problem description is more valuable than a brittle implementation.
62
84
 
63
- ### Critique cycle
85
+ When you see these reports, investigate before pushing forward. If the smell suggests a design issue, involve the user.
64
86
 
65
- When a stage warrants critique, spawn review agents in parallel, each attacking a different dimension:
87
+ </code-smell-escalation>
66
88
 
67
- 1. **Code reuse reviewer** — searches the codebase for existing utilities, helpers, and patterns that the new code duplicates. Flags any new function that reimplements existing functionality, any inline logic that could use an existing utility.
89
+ <critique-refinement>
68
90
 
69
- 2. **Code quality reviewer** — looks for hacky patterns: redundant state, parameter sprawl, copy-paste with slight variation, leaky abstractions, stringly-typed code where constants or enums exist, unnecessary nesting or wrapping.
91
+ ## Critique Cycle
70
92
 
71
- 3. **Efficiency reviewer**: looks for unnecessary work (redundant computations, duplicate API calls, N+1 patterns), missed concurrency (independent operations run sequentially), hot-path bloat, unbounded data structures, overly broad operations.
93
+ After implementation agents report, assess whether the stage needs critique before advancing. The failure mode is not "sometimes skipping review"; it's implementing six stages in a row without any.
72
94
 
73
- Give each reviewer the full diff and relevant context files. They report problems; they don't fix them.
95
+ When a stage warrants critique, spawn review agents in parallel, each attacking a different dimension:
96
+ - **Code reuse** — existing utilities, helpers, patterns the new code duplicates
97
+ - **Code quality** — hacky patterns, redundant state, parameter sprawl, copy-paste, leaky abstractions
98
+ - **Efficiency** — redundant computations, N+1 patterns, missed concurrency, unbounded data structures
99
+
100
+ Give each reviewer the full diff and relevant context files. They report problems — they don't fix.
74
101
 
75
- ### Refine cycle
102
+ ## Refine Cycle
76
103
 
77
- Aggregate the reviewer findings. Spawn fix agents and **point them at the review report** — don't rewrite the findings as line-by-line instructions. The fix agent reads the report, reads the code, and figures out the right solution. You triage (skip false positives, note any architectural constraints) — they implement.
104
+ Aggregate reviewer findings. Spawn fix agents and **point them at the review report** — don't rewrite findings as line-by-line instructions. You triage (skip false positives, note architectural constraints) — they implement.
78
105
 
79
106
  ```bash
80
107
  sisyphus spawn --name "fix-review-issues" --agent-type sisyphus:implement \
81
108
  "Fix the issues in reports/agent-003-final.md. Skip item #5 (false positive). Run type-check after."
82
109
  ```
83
110
 
84
- The fix agents should use `/simplify` to systematically review their own changes before reporting.
111
+ Fix agents should use `/simplify` to review their own changes before reporting.
85
112
 
86
- ### Repeat until clean
113
+ Re-review after fixes. Stop when reviewers return only stylistic nits. If 3+ rounds are needed, the approach — not the patches — needs rethinking.
87
114
 
88
- Spawn reviewers again on the refined code. If they come back with new issues, fix those too. Genuinely nitpicky findings — stylistic preferences, irrelevant edge cases — can be skipped. But if a finding is actually correct, it gets done. **"I don't want to" is not a reason to skip a valid finding.** The distinction is between false positives and laziness. In practice this is usually 1-2 rounds. If it's taking more, the implementation was shaky and you should consider whether the approach needs rethinking rather than patching.
115
+ </critique-refinement>
89
116
 
90
- ## E2E Validation
117
+ <e2e-validation>
91
118
 
92
- E2E validation confirms the implementation actually works — not just that it compiles or passes unit tests, but that the feature behaves correctly when exercised. Reserve full e2e validation for stages where you're about to build on accumulated work (integration stages, milestones where multiple stages come together) or where failure would be expensive to debug later. Not every stage needs its own e2e pass — but don't let more than 2-3 stages accumulate without one.
119
+ E2E validation confirms the implementation actually works — not just compiles or passes unit tests. Reserve full validation for stages where you're building on accumulated work or where failure would be expensive to debug later. Don't let more than 2-3 stages accumulate without one.
93
120
 
94
121
  Spawn a validation agent with the e2e recipe from `context/e2e-recipe.md`. The agent should:
95
- - Follow the setup steps exactly (build, start servers, seed data)
96
- - Run every verification step in the recipe
97
- - Report exactly what passed and what failed — not "it looks good"
122
+ - Follow setup steps exactly (build, start servers, seed data)
123
+ - Run every verification step
124
+ - Report exactly what passed and what failed
98
125
 
99
- If the recipe involves UI, the validation agent should use `capture` to screenshot and interact with the actual running app. If it involves an API, it should curl the actual endpoints. If it involves CLI behavior, it should exercise it in the terminal.
126
+ If the recipe involves UI, use `capture` to screenshot the running app. If API, curl the endpoints. If CLI, exercise it in the terminal.
100
127
 
101
- If the project lacks validation tooling, **create it**. A smoke-test script, a seed command, a health-check endpoint: these pay for themselves immediately and every future validation agent reuses them.
128
+ If the project lacks validation tooling, **create it**: a smoke-test script, seed command, or health-check endpoint pays for itself immediately.
102
129
 
103
- When you've chosen to validate a stage, **don't advance past it until validation passes.** If it fails, log the failures, spawn fix agents, and re-validate. A validation checkpoint you ignore is worse than no checkpoint — it creates false confidence.
130
+ **Don't advance past a validated stage until validation passes.** If it fails, log failures, spawn fix agents, re-validate.
104
131
 
105
- ## Worktree Preference
106
-
107
- When spawning two or more implementation agents in the same cycle, prefer `--worktree` for each. Worktree isolation eliminates file conflict risk — agents can't clobber each other's changes, each gets a clean branch, and they can commit incrementally. The daemon merges branches back when agents complete and surfaces conflicts in your next cycle's state.
132
+ When all implementation stages are complete, transition to validation mode for the comprehensive final pass:
108
133
 
109
134
  ```bash
110
- sisyphus spawn --name "impl-auth" --agent-type sisyphus:implement --worktree "Add session middleware; see context/conventions.md"
111
- sisyphus spawn --name "impl-routes" --agent-type sisyphus:implement --worktree "Add login routes — see context/conventions.md and context/explore-architecture.md"
135
+ sisyphus yield --mode validation --prompt "All stages implemented; validate against context/e2e-recipe.md"
112
136
  ```
113
137
 
114
- ## Returning to Planning
138
+ Validation mode shifts the orchestrator's entire focus to proving the feature works. Stage-level validation during implementation catches issues early; the final validation pass proves the whole thing holds together.
139
+
140
+ </e2e-validation>
115
141
 
116
- If you discover mid-implementation that the approach is wrong — the architecture is different than expected, a dependency changes the approach, or agents keep hitting the same wall — don't keep pushing. Return to planning:
142
+ <returning-to-planning>
143
+
144
+ If the approach is wrong mid-implementation, don't keep pushing. Return to planning:
117
145
 
118
146
  ```bash
119
147
  sisyphus yield --mode planning --prompt "Re-evaluate: discovered X changes the approach — write cycle log"
120
148
  ```
121
149
 
122
- Document what you found in the cycle log before yielding so the planning cycle starts informed. Update roadmap.md to reflect that you're back in an earlier phase.
150
+ Concrete triggers:
151
+ - 2+ agents report same unexpected complexity in the same subsystem
152
+ - An agent discovers a dependency that changes the approach
153
+ - Fix agents keep patching the same area across cycles
154
+
155
+ Document what you found in the cycle log before yielding. Update roadmap.md to reflect you're back in an earlier phase.
156
+
157
+ </returning-to-planning>
@@ -1,90 +1,72 @@
1
1
  # Planning Phase
2
2
 
3
- ## Planning Phase Flow
3
+ <planning-workflow>
4
4
 
5
- The natural sequence: **context → spec → roadmap refinement → detailed planning.** Context documents come first because they feed everything downstream — spec writers, planners, and implementers all benefit from not having to re-explore the codebase. After the spec is aligned, revisit the roadmap — that's when you actually understand scope well enough to flesh out phases honestly.
5
+ The natural sequence: **context → requirements → design → roadmap refinement → detailed planning.** Context documents come first because they feed everything downstream: requirements analysts, designers, planners, and implementers all benefit from not having to re-explore the codebase. After the requirements and design are aligned, revisit the roadmap; that's when you actually understand scope well enough to flesh out phases honestly.
6
6
 
7
- ## Exploration
7
+ </planning-workflow>
8
8
 
9
- Use explore agents to build understanding before making decisions. Each agent should save a focused context document to `.sisyphus/sessions/$SISYPHUS_SESSION_ID/context/` — these artifacts get passed to downstream agents so they don't have to re-explore the codebase themselves.
9
+ <exploration>
10
10
 
11
- Adapt the number and focus of explore agents to the task. Key principles:
11
+ Use explore agents to build understanding before making decisions. Each agent saves a focused context document to `$SISYPHUS_SESSION_DIR/context/`.
12
12
 
13
- - **Each agent produces a focused artifact** — not one sprawling document. Focused documents can be selectively passed to downstream agents. An agent implementing auth gets `conventions.md` + `architecture.md`, not a 500-line dump.
14
- - **Conventions and patterns are high-value** to capture. Implementation agents that receive convention context write consistent code. Ones that don't produce code you'll have to fix.
15
- - **Exploration serves different purposes at different stages.** Early exploration is architectural — understanding the system and what needs to change. Later exploration before a specific stage is tactical — identifying files, patterns to follow, utilities to reuse. Both are valuable.
16
- - **Delegate understanding of unfamiliar territory.** If the task touches a library or subsystem you don't know, spawn an agent to investigate and report.
13
+ - **Each agent produces a focused artifact** — not one sprawling document. Focused documents can be selectively passed to downstream agents.
14
+ - **Conventions and patterns are high-value** to capture. Implementation agents that receive convention context write consistent code.
15
+ - **Exploration serves different purposes at different stages.** Early exploration is architectural. Later exploration before a specific stage is tactical — files, patterns, utilities to reuse.
16
+ - **Delegate understanding of unfamiliar territory.** If the task touches an unfamiliar library or subsystem, spawn an agent to investigate and report.
17
17
 
18
- ## Spec Alignment
18
+ </exploration>
19
19
 
20
- Before investing in a detailed spec, make sure the goal itself is well-defined. If you're making assumptions about scope, requirements, or constraints — surface them to the user. A spec built on wrong assumptions wastes every cycle downstream.
20
+ <requirements-alignment>
21
21
 
22
- For significant features, spec refinement is iterative:
23
- - Draft the spec based on exploration findings
24
- - Have agents review for feasibility and code smells (can this actually work given the codebase?)
25
- - Seek user alignment on the high-level approach and any decisions that set direction
26
- - **Apply corrections back to the spec itself** — the spec is the single source of truth. Don't create a separate corrections file and pass both downstream; update the spec and delete the corrections. Plan agents should read one authoritative document, not reconcile two contradictory ones.
22
+ Before investing in detailed requirements, make sure the goal is well-defined. If you're making assumptions about scope, requirements, or constraints — surface them to the user.
27
23
 
28
- Not every stage needs a standalone spec document — a well-defined stage might just be a detailed section in the implementation plan. Use judgment about how much formality each stage warrants.
24
+ For significant features, requirements refinement is iterative:
25
+ - Draft requirements based on exploration findings
26
+ - Have agents review for feasibility (can this actually work given the codebase?)
27
+ - Seek user alignment on the high-level approach
28
+ - **Fold new knowledge into authoritative documents.** When reviews, exploration, or user feedback change the understanding, update the requirements and design documents directly — they are the single source of truth. Don't create correction files, addendum files, or decision logs alongside them. Remove superseded material rather than annotating it. Plan agents should read clean, current documents — not reconcile contradictions or skip over resolved questions.
29
29
 
30
- ## Roadmap Refinement
30
+ Not every stage needs standalone requirements — a well-defined stage might just be a detailed section in the implementation plan.
31
31
 
32
- Once you have context docs and an aligned spec, revisit the roadmap. This is the first point where you understand real scope — adjust phase boundaries, add phases you didn't anticipate, reorder for dependencies. Keep future phases at outline level; just make sure the shape is honest.
32
+ </requirements-alignment>
33
33
 
34
- ## Delegating to the Plan Lead
34
+ <plan-delegation>
35
35
 
36
- Spawn **one plan lead** per feature. Point it at **inputs** (spec, context docs, corrections), not a pre-made structure. Don't pre-decide staging, ordering, or design decisions. The plan lead has `effort: max` reasoning and handles its own decomposition: it will assess scope, delegate sub-plans to specialist agents if the feature is large enough, run adversarial reviews on the result, and deliver a synthesized master plan.
36
+ Once you have context docs and aligned requirements/design, revisit the roadmap: this is the first point where you understand real scope. Roadmap refinement means updating the four canonical sections: current stage, exit criteria, active context references, and next steps. Decisions from exploration, requirements, and design fold into context documents, not the roadmap.
37
37
 
38
- **Don't split the planning yourself.** The plan lead decides whether to plan solo or delegate sub-plans to domain-specific agents. If the orchestrator pre-splits into "backend plan agent" and "frontend plan agent," the plan lead's synthesis step (where it resolves cross-domain conflicts, finds gaps, and stress-tests edge cases) never happens. One plan lead per feature, and trust it to decompose internally.
38
+ Spawn **one plan lead** per feature. Point it at inputs (requirements, design, context docs), not a pre-made structure. The plan lead handles its own decomposition: it assesses scope, delegates sub-plans if needed, runs adversarial reviews, and delivers a synthesized master plan. **Delegate outcomes, not implementations**: tell the plan lead what needs planning and why, not how to structure the plan.
39
39
 
40
- **When to spawn multiple plan leads:** Only for genuinely independent features with no shared files or integration points. If two features touch the same codebase area, one plan lead should own both; otherwise you'll get conflicting plans with no one responsible for reconciling them.
40
+ **Don't split planning yourself.** If the orchestrator pre-splits into "backend plan agent" and "frontend plan agent," the plan lead's synthesis step (resolving cross-domain conflicts, finding gaps, stress-testing edge cases) never happens.
41
41
 
42
- ## Progressive Development
42
+ **When to spawn multiple plan leads:** Only for genuinely independent features with no shared files or integration points.
43
43
 
44
- Not all tasks need the same process depth. A 2-file bug fix can go straight to implementation. A cross-repo feature with multiple domains needs full phased development.
44
+ </plan-delegation>
45
45
 
46
- ### Decision heuristic
46
+ <progressive-development>
47
47
 
48
- - **Small task** (1-3 files, single domain): Skip phases — roadmap is just a short task checklist (diagnose, fix, validate). Single plan agent, single implement agent.
49
- - **Large task** (3+ stages, multiple domains or repos): Full phased development. The roadmap tracks development phases, and each phase produces artifacts in `context/`.
48
+ Not all tasks need the same process depth.
50
49
 
51
- Signs you need phased development: the task touches multiple unfamiliar subsystems, the task description spans different concerns (backend, frontend, IPC, etc.), or a spec exists with more than 3 distinct work areas.
50
+ - **Small task** (1-3 files, single domain): Skip phases; the roadmap is a short checklist (diagnose, fix, validate). Single plan agent, single implement agent.
51
+ - **Large task** (3+ stages, multiple domains): Full phased development. The roadmap tracks phases, each producing artifacts in `context/`.
52
52
 
53
- ### Implementation stages are context artifacts
53
+ Signs you need phased development: multiple unfamiliar subsystems, the task spans different concerns (backend, frontend, IPC), or the requirements have more than 3 distinct work areas.
54
54
 
55
- When Phase 3 (Plan) runs, it produces implementation stage breakdowns saved to `context/`:
56
- - `context/plan-implementation.md` — overall stage outline with dependencies
57
- - `context/plan-stage-1-types.md` — detailed plan for stage 1
58
- - `context/plan-stage-2-service.md` — detailed plan for stage 2 (written when stage 1 is underway)
55
+ Implementation stages are context artifacts saved to `context/plan-stage-N-*.md`. Detail-plan one stage at a time; what you learn implementing stage N informs stage N+1.
59
56
 
60
- ### Don't front-load phases
57
+ </progressive-development>
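
The small/large split above can be sketched as a shell helper. This is only an illustration of the stated thresholds: the function name `classify_task` and its two numeric inputs (file count, domain count) are hypothetical, and the template itself treats the decision as a judgment call rather than a formula.

```shell
#!/bin/bash
# Hypothetical helper: not part of sisyphus. It encodes only the
# "1-3 files, single domain" rule of thumb from the text above.
classify_task() {
  local files="$1"    # number of files the task is expected to touch
  local domains="$2"  # number of distinct domains (backend, frontend, IPC, ...)

  if [ "$files" -le 3 ] && [ "$domains" -le 1 ]; then
    echo "small"      # short checklist: diagnose, fix, validate
  else
    echo "phased"     # full phased development with artifacts in context/
  fi
}
```

Anything the helper labels `phased` would still need the signs listed above (unfamiliar subsystems, mixed concerns) weighed by hand.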
61
58
 
62
- Detail-plan one stage at a time. What you learn implementing stage N informs stage N+1's detail plan. The stage outline evolves — stages get added, removed, reordered, or split as understanding grows. That's the system working correctly.
63
-
64
- Detailed plans for stages 4-7 written before stage 1 is implemented are fiction. Defer detail until you're about to execute.
65
-
66
- ## E2E Verification Recipe
59
+ <verification-planning>
67
60
 
68
61
  Before implementation begins, determine how to concretely verify the change works end-to-end. This is the single most common failure mode: agents report success but nothing actually works.
69
62
 
70
- The tooling explorer should have mapped the available infrastructure. Common patterns:
71
-
72
- - **Browser automation**: `capture` CLI for UI changes — click through affected flows, screenshot results
73
- - **CLI verification**: exercise changed behavior interactively in tmux
74
- - **API testing**: dev server + curl/httpie for endpoint changes
75
- - **Integration tests**: existing e2e or integration test suite
76
- - **Smoke script**: create one if nothing else exists
63
+ If you cannot determine a concrete verification method, **ask the user**. Do not proceed to implementation without a verification plan.
77
64
 
78
- If you cannot determine a concrete verification method, **ask the user**. Offer 2-3 specific options. Do not proceed to implementation without a verification plan.
65
+ Write the recipe to `context/e2e-recipe.md` with setup steps, exact commands or interactions to verify, and what success looks like. Make it executable, not aspirational. Implementation agents and validation agents both reference this file.
79
66
 
80
- Write the recipe to `context/e2e-recipe.md` with:
81
- - Setup steps (start dev server, build, seed data, etc.)
82
- - Exact commands or interactions to verify
83
- - What success looks like (expected output, visual state, response codes)
67
+ </verification-planning>
84
68
 
85
- Implementation agents and validation agents both reference this file. Write it to be executable, not aspirational.
86
-
87
- ## Transitioning to Implementation
69
+ <transition>
88
70
 
89
71
  When you have enough understanding, a reviewed plan, and a verification recipe — transition explicitly:
90
72
 
@@ -92,4 +74,12 @@ When you have enough understanding, a reviewed plan, and a verification recipe
92
74
  sisyphus yield --mode implementation --prompt "Begin implementation — see roadmap.md and context/plan-implementation.md"
93
75
  ```
94
76
 
95
- The `--mode implementation` flag loads implementation-phase guidance for the next cycle. Pass a prompt that orients the next cycle to where things stand.
77
+ The `--mode implementation` flag loads implementation-phase guidance for the next cycle.
78
+
79
+ After implementation is complete, transition to validation mode to prove the feature works:
80
+
81
+ ```bash
82
+ sisyphus yield --mode validation --prompt "Implementation complete — validate against context/e2e-recipe.md"
83
+ ```
84
+
85
+ </transition>
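
A pre-transition check along the lines described above might look like the following sketch. The function name is hypothetical, and it assumes `roadmap.md` sits at the top of the session directory with the plan and recipe under `context/`; only the file names themselves come from the text.

```shell
#!/bin/bash
# Hypothetical readiness check: not part of sisyphus.
# Assumes the session directory layout <dir>/roadmap.md and <dir>/context/*.md.
check_ready_for_implementation() {
  local session_dir="$1"
  local missing=0
  local f

  for f in roadmap.md context/plan-implementation.md context/e2e-recipe.md; do
    # -s is true only when the file exists and is non-empty
    if [ ! -s "${session_dir}/${f}" ]; then
      echo "missing or empty: ${f}" >&2
      missing=1
    fi
  done

  return "$missing"
}
```

If the function returns 0, the `sisyphus yield --mode implementation` command shown above would be the next step; a non-zero return lists what is still absent on stderr.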
@@ -0,0 +1,13 @@
1
+ ---
2
+ description: Create technical design from requirements through investigation and user iteration
3
+ argument-hint: <topic or description>
4
+ ---
5
+ # Technical Design
6
+
7
+ **Input:** $ARGUMENTS
8
+
9
+ The user wants a technical design before implementation begins.
10
+
11
+ Spawn a `sisyphus:design` agent to lead this — it's interactive, investigates the codebase, proposes architecture, and iterates with the user. Output goes to `context/design.md`. It expects `context/requirements.md` to exist; if it doesn't, flag that to the user or run requirements first.
12
+
13
+ If the current strategy doesn't include a design stage, update it before spawning. Don't do the design work yourself.
@@ -0,0 +1,13 @@
1
+ ---
2
+ description: Explore the problem space collaboratively before committing to a solution
3
+ argument-hint: <topic or description>
4
+ ---
5
+ # Problem Exploration
6
+
7
+ **Input:** $ARGUMENTS
8
+
9
+ The user wants to step back and explore the problem space before committing to a direction. This is a signal to prioritize understanding over progress.
10
+
11
+ Spawn a `sisyphus:problem` agent to lead this — it's interactive, collaborates with the user, and saves findings to `context/problem.md`. If the current strategy doesn't account for a problem exploration stage, update it before spawning.
12
+
13
+ Don't do the exploration yourself. The `sisyphus:problem` agent is purpose-built for divergent thinking and user collaboration.
@@ -0,0 +1,13 @@
1
+ ---
2
+ description: Define behavioral requirements with EARS acceptance criteria
3
+ argument-hint: <topic or description>
4
+ ---
5
+ # Requirements
6
+
7
+ **Input:** $ARGUMENTS
8
+
9
+ The user wants formal requirements defined before design or implementation proceeds.
10
+
11
+ Spawn a `sisyphus:requirements` agent to lead this — it's interactive, drafts EARS-format requirements, and iterates with the user until approved. Output goes to `context/requirements.md`. If the current strategy doesn't include a requirements stage, update it before spawning.
12
+
13
+ Don't draft requirements yourself. The `sisyphus:requirements` agent handles the full process: codebase investigation, drafting, and user iteration.
@@ -0,0 +1,19 @@
1
+ ---
2
+ description: Redirect session strategy — reactivate if completed, then respawn in strategy mode
3
+ argument-hint: <new direction or focus>
4
+ ---
5
+ # Strategize
6
+
7
+ **Input:** $ARGUMENTS
8
+
9
+ The user wants to redirect this session's strategy.
10
+
11
+ ## Steps
12
+
13
+ 1. If the session is completed (`sisyphus status`), reactivate it with `sisyphus continue`.
14
+ 2. Annotate `strategy.md` with the pivot — what changed, new focus, which existing artifacts still apply. Don't rewrite the whole strategy.
15
+ 3. Yield to strategy mode:
16
+ ```bash
17
+ sisyphus yield --mode strategy --prompt "<concise description of the new direction>"
18
+ ```
19
+ This respawns a fresh orchestrator that will re-evaluate the goal, stages, and approach.
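
Step 2 asks for an annotation rather than a rewrite. One way to do that is to append a short dated section, as in this sketch; the function name, the `## Pivot` heading, and the bullet labels are all hypothetical, since the template does not prescribe a format.

```shell
#!/bin/bash
# Hypothetical helper: not part of sisyphus. Appends a pivot note to an
# existing strategy file without touching the content already there.
annotate_pivot() {
  local strategy_file="$1"
  local new_focus="$2"

  {
    printf '\n## Pivot (%s)\n\n' "$(date +%Y-%m-%d)"
    printf -- '- New focus: %s\n' "$new_focus"
    printf -- '- What changed: TODO\n'
    printf -- '- Artifacts that still apply: TODO\n'
  } >> "$strategy_file"
}
```

Because the helper only appends, the earlier strategy text stays intact for the respawned orchestrator to read alongside the pivot.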
@@ -0,0 +1,15 @@
1
+ #!/bin/bash
2
+ if [ -z "$SISYPHUS_SESSION_DIR" ]; then exit 0; fi
3
+
4
+ CONTEXT_DIR="${SISYPHUS_SESSION_DIR}/context"
5
+
6
+ # Gate passes if any explore context file exists
7
+ if ls "${CONTEXT_DIR}"/explore-*.md 1>/dev/null 2>&1; then
8
+ exit 0
9
+ fi
10
+
11
+ cat <<'GATE'
12
+ <explore-gate>
13
+ No exploration context exists yet. Before planning or delegating work, spawn explore agents to build codebase understanding.
14
+ </explore-gate>
15
+ GATE
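
The file check at the heart of the gate can be tried in isolation. The sketch below reproduces only that check in a function; the name `explore_gate_passes` is hypothetical, and it takes the session directory as an argument instead of reading `SISYPHUS_SESSION_DIR`.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the gate's file check.
# Succeeds when at least one context/explore-*.md file exists.
explore_gate_passes() {
  local context_dir="$1/context"
  # ls fails when the glob matches nothing, which is what the gate relies on
  ls "${context_dir}"/explore-*.md >/dev/null 2>&1
}
```

In a scratch directory with an empty `context/` folder the function fails, and it succeeds as soon as any file such as `context/explore-auth.md` (an example name) is created.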
@@ -1 +1,14 @@
1
- {"hooks":{}}
1
+ {
2
+ "hooks": {
3
+ "UserPromptSubmit": [
4
+ {
5
+ "hooks": [
6
+ {
7
+ "type": "command",
8
+ "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/explore-gate.sh"
9
+ }
10
+ ]
11
+ }
12
+ ]
13
+ }
14
+ }
@@ -39,7 +39,7 @@ Usually serial — diagnosis must complete before fix, fix before validation. Ex
39
39
  ## Feature Build (Small — 1-3 files)
40
40
 
41
41
  ### When to use
42
- Clear requirements, small scope, no spec needed.
42
+ Clear requirements, small scope, no formal requirements document needed.
43
43
 
44
44
  ### Plan structure
45
45
  ```
@@ -70,10 +70,12 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
70
70
  ```
71
71
  ## Feature: [description]
72
72
 
73
- ### Spec & Planning
74
- - [ ] Draft spec: investigate codebase, propose approach
75
- - [ ] Create implementation plan from spec
76
- - [ ] Review plan against spec
73
+ ### Requirements & Design
74
+ - [ ] Problem exploration: understand goals, constraints, assumptions
75
+ - [ ] Requirements: define acceptance criteria
76
+ - [ ] Design: architecture, component boundaries, data models
77
+ - [ ] Create implementation plan from requirements + design
78
+ - [ ] Review plan against requirements + design
77
79
 
78
80
  ### Implementation
79
81
  - [ ] Phase 1 — [foundation/types/interfaces]
@@ -87,18 +89,20 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
87
89
  Note: critique and validation are embedded between implementation phases, not deferred to the end. Phase 1 (types) is low-risk and doesn't need its own review, but critique catches issues before Phase 3 builds on them. Validation happens after integration, when all the pieces come together.
88
90
 
89
91
  ### Cycle plan
90
- - **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield. (Human iterates on spec between cycles.)
91
- - **Cycle 2**: Spawn `sisyphus:plan` for plan. Yield.
92
- - **Cycle 3**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
93
- - **Cycle 4**: Spawn `sisyphus:implement` for Phase 1. Yield.
94
- - **Cycle 5**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
95
- - **Cycle 6**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
96
- - **Cycle 7**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
97
- - **Cycle 8**: Spawn `sisyphus:validate` for e2e smoketest. Yield.
98
- - **Cycle 9**: Address validation failures or complete.
92
+ - **Cycle 1**: Spawn `sisyphus:problem` for problem exploration. Yield. (Human iterates between cycles.)
93
+ - **Cycle 2**: Spawn `sisyphus:requirements` for requirements analysis. Yield. (Human reviews/iterates.)
94
+ - **Cycle 3**: Spawn `sisyphus:design` for technical design. Yield. (Human reviews/iterates.)
95
+ - **Cycle 4**: Spawn `sisyphus:plan` for plan. Yield.
96
+ - **Cycle 5**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
97
+ - **Cycle 6**: Spawn `sisyphus:implement` for Phase 1. Yield.
98
+ - **Cycle 7**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
99
+ - **Cycle 8**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
100
+ - **Cycle 9**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
101
+ - **Cycle 10**: `sisyphus yield --mode validation` for e2e smoketest. Validation mode proves the feature works — operator for UI, evidence for every claim.
102
+ - **Cycle 11**: Address validation failures (back to `--mode implementation`) or complete.
99
103
 
100
104
  ### Failure modes
101
- - **Spec needs human input**: Mark session as needing human review. Orchestrator notes open questions.
105
+ - **Requirements/design needs human input**: Mark session as needing human review. Orchestrator notes open questions.
102
106
  - **Plan fails review**: Feed review issues back, respawn planner.
103
107
  - **Critique finds issues in foundation**: Fix before starting integration — don't build on shaky ground.
104
108
  - **Validation fails**: Feed specifics back to implement agent for the failing area.
@@ -117,9 +121,10 @@ Cross-cutting feature, multiple domains, needs team coordination. Uses **progres
117
121
  ```
118
122
  ## Feature: [description]
119
123
 
120
- ### Spec
121
- - [ ] Draft spec
122
- - [ ] Review spec
124
+ ### Requirements & Design
125
+ - [ ] Problem exploration
126
+ - [ ] Requirements
127
+ - [ ] Design
123
128
 
124
129
  ### Stage Outline (high-level only — no file-level detail yet)
125
130
  1. [domain A foundation] — no deps — ~N cycles
@@ -140,15 +145,17 @@ See context/plan-stage-N-{name}.md for detail plan.
140
145
  Note: verification checkpoints are embedded in the stage outline, not deferred to a final phase. The level of rigor varies — foundation stages get a light critique, core logic gets critique + validation, integration gets full e2e validation. This is judgment, not formula.
141
146
 
142
147
  ### Cycle plan
143
- - **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield.
144
- - **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
145
- - **Cycle 3**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
146
- - **Cycle 4**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
147
- - **Cycle 5**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel (foundation review before core logic builds on it). Detail-plan stage 3 in parallel. Yield.
148
- - **Cycle 6**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
149
- - **Cycle 7**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
150
- - **Cycle 8**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
151
- - **Cycle 9+**: Implement integration stage. Validate e2e. Final review.
148
+ - **Cycle 1**: Spawn `sisyphus:problem` for problem exploration. Yield.
149
+ - **Cycle 2**: Spawn `sisyphus:requirements` for requirements. Yield.
150
+ - **Cycle 3**: Spawn `sisyphus:design` for design. Yield.
151
+ - **Cycle 4**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
152
+ - **Cycle 5**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
153
+ - **Cycle 6**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
154
+ - **Cycle 7**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
155
+ - **Cycle 8**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
156
+ - **Cycle 9**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
157
+ - **Cycle 10**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
158
+ - **Cycle 11+**: Implement integration stage. Final review. Then `sisyphus yield --mode validation` for comprehensive e2e proof.
152
159
 
153
160
  ### Failure modes
154
161
  - **Detail-plan agent can't produce quality output**: The stage is still too large. Break it into sub-stages in the outline and detail-plan each sub-stage individually.