sisyphi 1.0.13 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (100)
  1. package/dist/{chunk-T7ETTIQK.js → chunk-M7LZ2ZHD.js} +3 -27
  2. package/dist/chunk-M7LZ2ZHD.js.map +1 -0
  3. package/dist/{chunk-JXKUI4P6.js → chunk-REUQ4B45.js} +7 -38
  4. package/dist/chunk-REUQ4B45.js.map +1 -0
  5. package/dist/{chunk-LWWRGQWM.js → chunk-Z32YVDMY.js} +2 -2
  6. package/dist/chunk-Z32YVDMY.js.map +1 -0
  7. package/dist/cli.js +75 -56
  8. package/dist/cli.js.map +1 -1
  9. package/dist/daemon.js +776 -629
  10. package/dist/daemon.js.map +1 -1
  11. package/dist/{paths-NUUALUVP.js → paths-IJXOAN4E.js} +4 -6
  12. package/dist/templates/CLAUDE.md +16 -14
  13. package/dist/templates/agent-plugin/agents/CLAUDE.md +17 -6
  14. package/dist/templates/agent-plugin/agents/design.md +134 -0
  15. package/dist/templates/agent-plugin/agents/explore.md +39 -0
  16. package/dist/templates/agent-plugin/agents/operator.md +24 -0
  17. package/dist/templates/agent-plugin/agents/plan.md +15 -20
  18. package/dist/templates/agent-plugin/agents/problem.md +119 -0
  19. package/dist/templates/agent-plugin/agents/requirements.md +138 -0
  20. package/dist/templates/agent-plugin/agents/review/CLAUDE.md +29 -0
  21. package/dist/templates/agent-plugin/agents/review/compliance.md +6 -6
  22. package/dist/templates/agent-plugin/agents/review-plan/code-smells.md +4 -4
  23. package/dist/templates/agent-plugin/agents/review-plan/requirements-coverage.md +62 -0
  24. package/dist/templates/agent-plugin/agents/review-plan/security.md +1 -1
  25. package/dist/templates/agent-plugin/agents/review-plan.md +9 -8
  26. package/dist/templates/agent-plugin/agents/review.md +1 -1
  27. package/dist/templates/agent-plugin/agents/test-spec.md +3 -3
  28. package/dist/templates/agent-plugin/hooks/CLAUDE.md +2 -2
  29. package/dist/templates/agent-plugin/hooks/explore-user-prompt.sh +13 -0
  30. package/dist/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -1
  31. package/dist/templates/agent-plugin/hooks/require-submit.sh +70 -3
  32. package/dist/templates/agent-plugin/hooks/review-plan-user-prompt.sh +4 -4
  33. package/dist/templates/agent-plugin/hooks/review-user-prompt.sh +1 -1
  34. package/dist/templates/agent-suffix.md +0 -2
  35. package/dist/templates/orchestrator-base.md +169 -145
  36. package/dist/templates/orchestrator-impl.md +92 -57
  37. package/dist/templates/orchestrator-planning.md +46 -56
  38. package/dist/templates/orchestrator-plugin/commands/sisyphus/design.md +13 -0
  39. package/dist/templates/orchestrator-plugin/commands/sisyphus/problem.md +13 -0
  40. package/dist/templates/orchestrator-plugin/commands/sisyphus/requirements.md +13 -0
  41. package/dist/templates/orchestrator-plugin/commands/sisyphus/strategize.md +19 -0
  42. package/dist/templates/orchestrator-plugin/hooks/explore-gate.sh +15 -0
  43. package/dist/templates/orchestrator-plugin/hooks/hooks.json +14 -1
  44. package/dist/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +34 -27
  45. package/dist/templates/orchestrator-plugin/skills/orchestration/workflow-examples.md +56 -24
  46. package/dist/templates/orchestrator-strategy.md +233 -0
  47. package/dist/templates/orchestrator-validation.md +94 -0
  48. package/dist/tui.js +2730 -2924
  49. package/dist/tui.js.map +1 -1
  50. package/package.json +2 -4
  51. package/templates/CLAUDE.md +16 -14
  52. package/templates/agent-plugin/agents/CLAUDE.md +17 -6
  53. package/templates/agent-plugin/agents/design.md +134 -0
  54. package/templates/agent-plugin/agents/explore.md +39 -0
  55. package/templates/agent-plugin/agents/operator.md +24 -0
  56. package/templates/agent-plugin/agents/plan.md +15 -20
  57. package/templates/agent-plugin/agents/problem.md +119 -0
  58. package/templates/agent-plugin/agents/requirements.md +138 -0
  59. package/templates/agent-plugin/agents/review/CLAUDE.md +29 -0
  60. package/templates/agent-plugin/agents/review/compliance.md +6 -6
  61. package/templates/agent-plugin/agents/review-plan/code-smells.md +4 -4
  62. package/templates/agent-plugin/agents/review-plan/requirements-coverage.md +62 -0
  63. package/templates/agent-plugin/agents/review-plan/security.md +1 -1
  64. package/templates/agent-plugin/agents/review-plan.md +9 -8
  65. package/templates/agent-plugin/agents/review.md +1 -1
  66. package/templates/agent-plugin/agents/test-spec.md +3 -3
  67. package/templates/agent-plugin/hooks/CLAUDE.md +2 -2
  68. package/templates/agent-plugin/hooks/explore-user-prompt.sh +13 -0
  69. package/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -1
  70. package/templates/agent-plugin/hooks/require-submit.sh +70 -3
  71. package/templates/agent-plugin/hooks/review-plan-user-prompt.sh +4 -4
  72. package/templates/agent-plugin/hooks/review-user-prompt.sh +1 -1
  73. package/templates/agent-suffix.md +0 -2
  74. package/templates/orchestrator-base.md +169 -145
  75. package/templates/orchestrator-impl.md +92 -57
  76. package/templates/orchestrator-planning.md +46 -56
  77. package/templates/orchestrator-plugin/commands/sisyphus/design.md +13 -0
  78. package/templates/orchestrator-plugin/commands/sisyphus/problem.md +13 -0
  79. package/templates/orchestrator-plugin/commands/sisyphus/requirements.md +13 -0
  80. package/templates/orchestrator-plugin/commands/sisyphus/strategize.md +19 -0
  81. package/templates/orchestrator-plugin/hooks/explore-gate.sh +15 -0
  82. package/templates/orchestrator-plugin/hooks/hooks.json +14 -1
  83. package/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +34 -27
  84. package/templates/orchestrator-plugin/skills/orchestration/workflow-examples.md +56 -24
  85. package/templates/orchestrator-strategy.md +233 -0
  86. package/templates/orchestrator-validation.md +94 -0
  87. package/dist/chunk-JXKUI4P6.js.map +0 -1
  88. package/dist/chunk-LWWRGQWM.js.map +0 -1
  89. package/dist/chunk-T7ETTIQK.js.map +0 -1
  90. package/dist/templates/agent-plugin/agents/review-plan/spec-coverage.md +0 -44
  91. package/dist/templates/agent-plugin/agents/spec-draft.md +0 -78
  92. package/dist/templates/agent-plugin/hooks/hooks.json +0 -25
  93. package/dist/templates/agent-plugin/hooks/spec-user-prompt.sh +0 -19
  94. package/dist/templates/orchestrator-plugin/skills/git-management/SKILL.md +0 -111
  95. package/templates/agent-plugin/agents/review-plan/spec-coverage.md +0 -44
  96. package/templates/agent-plugin/agents/spec-draft.md +0 -78
  97. package/templates/agent-plugin/hooks/hooks.json +0 -25
  98. package/templates/agent-plugin/hooks/spec-user-prompt.sh +0 -19
  99. package/templates/orchestrator-plugin/skills/git-management/SKILL.md +0 -111
  100. package/dist/{paths-NUUALUVP.js.map → paths-IJXOAN4E.js.map} +0 -0
@@ -94,13 +94,15 @@ Action: complete — "Fixed WebSocket message loss during reconnection. Messages
 
  **Starting task**: "Add rate limiting to the REST API — per-user, configurable limits"
 
- ### Cycle 1 — Spec
+ ### Cycle 1 — Problem exploration
  ```
  roadmap.md:
  ## Feature: API Rate Limiting
 
- ### Spec & Planning
- - [ ] Draft spec for API rate limiting
+ ### Requirements & Design
+ - [ ] Problem exploration to understand rate limiting needs
+ - [ ] Requirements — define acceptance criteria
+ - [ ] Design — architecture for rate limiting
  - [ ] Plan implementation
  - [ ] Review plan
 
@@ -114,45 +116,75 @@ roadmap.md:
  - [ ] Review implementation
 
  Agents spawned:
- spec-draft agent → "Investigate codebase and draft spec for per-user API rate
- limiting. Check existing middleware patterns in src/api/middleware/.
- Questions to resolve: storage backend (Redis vs in-memory), limit granularity
- (per-endpoint vs global), response format for rate-limited requests."
+ problem agent → "Explore the codebase and understand the API rate limiting landscape.
+ Check existing middleware patterns in src/api/middleware/.
+ Questions to explore: current request handling, existing auth/middleware chain,
+ what storage backends are available (Redis?), user identification mechanisms."
  ```
 
- ### Cycle 2 — Plan (after human reviews spec)
+ ### Cycle 2 — Requirements (after human reviews problem doc)
  ```
- Agent report: "Spec saved to context/spec-rate-limiting.md.
- Approach: Redis-backed sliding window. Per-user with endpoint-specific overrides.
- Standard 429 response with Retry-After header. Config via environment variables.
- Open questions resolved with user: Redis is already in stack, use it."
+ Agent report: "Problem document saved to context/problem-rate-limiting.md.
+ Current middleware chain uses Express middleware pattern. Redis is already in stack.
+ Users are identified by JWT sub claim. No existing rate limiting."
 
- roadmap.md updated (spec item marked done):
- - [x] ~~Draft spec for API rate limiting~~
+ roadmap.md updated:
+ - [x] ~~Problem exploration~~
+ - [ ] Requirements — define acceptance criteria
+ ...
+
+ Agents spawned:
+ requirements agent → "Draft acceptance criteria for per-user API rate limiting.
+ Read context/problem-rate-limiting.md for context.
+ Questions to resolve: limit granularity (per-endpoint vs global),
+ response format for rate-limited requests, override mechanisms."
+ ```
+
+ ### Cycle 3 — Design (after human reviews requirements)
+ ```
+ Agent report: "Requirements saved to context/requirements-rate-limiting.md.
+ Covers: per-user limits, endpoint-specific overrides, 429 response format,
+ Retry-After headers. User confirmed Redis-backed approach."
+
+ Agents spawned:
+ design agent → "Create technical design for rate limiting based on
+ context/requirements-rate-limiting.md and context/problem-rate-limiting.md."
+ ```
+
+ ### Cycle 4 — Plan (after human reviews design)
+ ```
+ Agent report: "Design saved to context/design-rate-limiting.md.
+ Approach: Redis-backed sliding window middleware. Per-user with endpoint-specific
+ overrides. Standard 429 response with Retry-After header. Config via environment variables."
+
+ roadmap.md updated:
+ - [x] ~~Problem exploration~~, [x] ~~Requirements~~, [x] ~~Design~~
  - [ ] Plan implementation
  ...
 
  Agents spawned:
- plan agent → "Create implementation plan from spec at context/spec-rate-limiting.md"
- test-spec agent → "Define behavioral properties for rate limiting from spec"
+ plan agent → "Create implementation plan from context/requirements-rate-limiting.md
+ and context/design-rate-limiting.md"
+ test-spec agent → "Define behavioral properties for rate limiting from
+ context/requirements-rate-limiting.md"
  ```
 
- ### Cycle 3 — Review plan
+ ### Cycle 5 — Review plan
  ```
  Both agents complete. Plan at context/plan-rate-limiting.md.
  Plan has 3 phases: middleware, config, response format.
 
  Agents spawned:
  review-plan agent → "Validate plan at context/plan-rate-limiting.md
- against spec at context/spec-rate-limiting.md"
+ against context/requirements-rate-limiting.md and context/design-rate-limiting.md"
  ```
 
- ### Cycle 4 — Implement phases 1+2 (parallel, low-risk foundation)
+ ### Cycle 6 — Implement phases 1+2 (parallel, low-risk foundation)
  ```
  Plan review: PASS.
 
  roadmap.md updated (plan review done, starting implementation):
- - [x] ~~Draft spec~~, [x] ~~Plan~~, [x] ~~Review plan~~
+ - [x] ~~Requirements~~, [x] ~~Design~~, [x] ~~Plan~~, [x] ~~Review plan~~
  - [ ] Implement rate limiting middleware
  - [ ] Implement rate limit configuration
  - [ ] Critique phases 1-2 — review before integration phase
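In the revised example, each agent's prompt names the context documents written by earlier stages (problem → requirements → design → plan → review-plan). The following is a minimal TypeScript sketch that checks this ordering. It assumes the `context/` paths used in the example. The `Stage` type, the `pipeline` array, and the `missingInputs` helper are hypothetical illustrations, not part of sisyphi. The sketch treats the parallel plan and test-spec agents as sequential. That is harmless here because neither agent reads the other's output.

```typescript
// Hypothetical model of the document hand-offs in the revised rate-limiting example.
// Paths mirror the example; the Stage type and helper are illustrative, not sisyphi code.
type Stage = {
  agent: string;
  reads: string[]; // context docs named in the agent's prompt
  writes: string[]; // context docs the agent reports saving
};

const pipeline: Stage[] = [
  { agent: "problem", reads: [], writes: ["context/problem-rate-limiting.md"] },
  {
    agent: "requirements",
    reads: ["context/problem-rate-limiting.md"],
    writes: ["context/requirements-rate-limiting.md"],
  },
  {
    agent: "design",
    reads: ["context/requirements-rate-limiting.md", "context/problem-rate-limiting.md"],
    writes: ["context/design-rate-limiting.md"],
  },
  {
    agent: "plan",
    reads: ["context/requirements-rate-limiting.md", "context/design-rate-limiting.md"],
    writes: ["context/plan-rate-limiting.md"],
  },
  { agent: "test-spec", reads: ["context/requirements-rate-limiting.md"], writes: [] },
  {
    agent: "review-plan",
    reads: [
      "context/plan-rate-limiting.md",
      "context/requirements-rate-limiting.md",
      "context/design-rate-limiting.md",
    ],
    writes: [],
  },
];

// Lists every input that no earlier stage has produced. An empty result means each
// agent only reads documents that already exist when it is spawned.
function missingInputs(stages: Stage[]): string[] {
  const produced = new Set<string>();
  const missing: string[] = [];
  for (const stage of stages) {
    for (const doc of stage.reads) {
      if (!produced.has(doc)) missing.push(`${stage.agent}: ${doc}`);
    }
    for (const doc of stage.writes) produced.add(doc);
  }
  return missing;
}

console.log(missingInputs(pipeline)); // []
```

Swapping the design and requirements entries makes the helper report `design: context/requirements-rate-limiting.md`. That is the kind of gap the old single spec-draft step could not produce.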
@@ -167,7 +199,7 @@ Agents spawned (parallel — phases touch different files):
  rate limit configuration in src/config/rate-limits.ts"
  ```
 
- ### Cycle 5 — Critique before integration builds on top
+ ### Cycle 7 — Critique before integration builds on top
  ```
  Both implementation agents complete.
 
@@ -185,7 +217,7 @@ Agents spawned:
  config schema matches what middleware expects."
  ```
 
- ### Cycle 6 — Implement phase 3 + address critique
+ ### Cycle 8 — Implement phase 3 + address critique
  ```
  Review: 2 findings — middleware doesn't handle Redis connection failure gracefully,
  config schema allows negative rate limits.
@@ -197,7 +229,7 @@ Agents spawned (parallel):
  rate limit headers and 429 error responses in src/api/middleware/rate-limit.ts"
  ```
 
- ### Cycle 7 — Validate end-to-end
+ ### Cycle 9 — Validate end-to-end
  ```
  Phase 3 and fixes complete.
 
@@ -210,7 +242,7 @@ Agents spawned:
  Test per-user isolation, endpoint-specific overrides, Redis failover behavior."
  ```
 
- ### Cycle 8 — Complete
+ ### Cycle 10 — Complete
  ```
  Validation: PASS. Final review agent confirms no issues.
  Complete — "Added per-user API rate limiting with Redis-backed sliding window,
@@ -0,0 +1,233 @@
+ # Strategy Phase
+
+ You are in strategy mode. Your job is to understand the goal and produce a strategy that maps out how to get there — but only as far as you can currently see.
+
+ Strategy is a living map. You detail the stages you can see clearly, sketch the ones you can't yet, and compress the ones behind you. Don't try to plan the entire session upfront. Map what's visible, acknowledge what's ahead, and trust that the strategy will be extended as the picture clarifies.
+
+ If a strategy.md already exists, you're here because the goal has fundamentally shifted or the approach needs rethinking. Read the existing strategy, assess what's changed, and revise it — don't start from scratch unless the old strategy is truly obsolete.
+
+ <ownership>
+
+ ## You Own the Lifecycle
+
+ The user is a stakeholder, not a project manager. They are busy. They answer questions, express preferences, and approve plans — but they don't drive the process. You do.
+
+ This means every stage you design needs to be self-sufficient: the orchestrator should know what to do next without the user pushing it forward. When a stage needs user input, define exactly what you need from them (a decision, approval, clarification) and handle everything else autonomously.
+
+ The user's role at each stage:
+ - **Discovery/exploration**: answer questions about their intent, constraints, priorities
+ - **Requirements/design**: approve requirements and architecture decisions
+ - **Implementation**: mostly hands-off — they see progress, intervene if something looks wrong
+ - **Validation**: sign off on the final result
+
+ Design your stages around this. Don't create stages that require the user to manage the work. Create stages where you manage the work and bring the user in at decision points.
+
+ </ownership>
+
+ <goal-refinement>
+
+ ## Refine the Goal
+
+ The user's starting prompt is an input, not a goal. It may be vague, ambiguous, or assume context you don't have. Your job is to turn it into a clear goal statement.
+
+ **Process:**
+ 1. Read the starting prompt
+ 2. Explore the codebase enough to understand what's relevant
+ 3. If the goal is unclear, **ask the user** — do NOT guess. Surface ambiguity, propose interpretations, get confirmation.
+ 4. Write `goal.md` to the session directory
+
+ **goal.md should answer:**
+ - What does "done" look like?
+ - What's in scope and what's explicitly not?
+ - Who or what is affected?
+
+ Keep it short — a paragraph, not a document. This is a north star, not a requirements doc.
+
+ </goal-refinement>
+
+ <design-philosophy>
+
+ ## Design Philosophy
+
+ You're choosing *how to think* about the problem before doing any work. These frameworks inform that choice:
+
+ - **Double Diamond** — Diverge to explore, converge on a definition; diverge on solutions, converge on implementation. Use when requirements are unclear or the problem needs defining.
+ - **OODA (Observe–Orient–Decide–Act)** — Tight sensing/reacting loops. Use when the situation is fluid and the cost of wrong moves is low (debugging, spikes, incident response).
+ - **Cynefin** — Match approach to domain. Clear → best practice. Complicated → analyze then execute. Complex → probe, sense, respond. Chaotic → act to stabilize.
+
+ Don't follow a framework mechanically. Use them to *select the right process shape* for each stage.
+
+ </design-philosophy>
+
+ <strategy-generation>
+
+ ## Generate the Strategy
+
+ ### Step 1: Assess What You Can See
+
+ Sisyphus sessions are for large, complex work — multi-phase features, sweeping refactors, research-heavy initiatives, or messy combinations of all three. The work often doesn't fit neatly into a category, and the shape of it may not be clear at the start.
+
+ Start by asking: **how much of the path can I see right now?**
+
+ - **Goal is clear, path is visible** → map out the full stage progression. Detail the first stage, sketch the rest.
+ - **Goal is clear, path is uncertain** → detail an exploration/investigation stage to understand the landscape. Sketch what you think comes after.
+ - **Goal is vague** → the first stage is figuring out what the goal actually is. Ask the user, explore the codebase, converge on a real goal. Everything else is "TBD."
+
+ ### Step 2: Map the Stage Progression
+
+ Identify the stages you'll need but **only detail the first one** (or the stage you're entering). Sketch the rest as one-liners. The progression depends entirely on the problem — there's no fixed template. Common patterns to draw from:
+
+ ```
+ discovery → product-design → technical-investigation → architecture → implementation → validation
+ exploration → spike → design → implementation → validation
+ investigation → recommendation → (user decides) → implementation
+ analysis → phased-transformation → verification
+ discovery → requirements → design → planning → implementation → validation
+ ```
+
+ Mix and match. The orchestrator plays different roles at different stages — product designer during discovery, architect during design, engineering lead during implementation. A massive refactor might start with investigation, move through phased transformation, and end with validation. A research-heavy feature might cycle between exploration and prototyping before ever reaching a design stage. Let the problem dictate the shape.
+
+ Not every stage needs to appear. Skip what's already clear. Add stages the patterns don't show — spikes, prototypes, migration stages, compatibility checks, whatever the problem demands. Stages can be anything — they're not limited to the patterns below.
+
+ ### Step 3: Build Each Detailed Stage
+
+ Use the stage patterns below as starting points — not a menu. Invent new stage types when the problem demands it. Adapt patterns to fit. Add backtrack edges where you can foresee things going wrong. Give every stage an exit condition concrete enough to evaluate.
+
+ <stage-patterns>
+
+ <stage name="discovery" use-when="Goal is broad or ambiguous — need to understand what the user actually wants before scoping the work">
+ Process: explore the existing system to understand context → research relevant domain patterns → engage the user with targeted questions (not open-ended — propose interpretations, ask them to confirm or redirect) → draft a product brief or problem definition
+ Exit: user-confirmed understanding of what they want, documented in context/
+ Produces: product brief, problem definition, or scoping document
+ Note: the orchestrator acts as product designer here — asking the right questions, proposing structure, synthesizing vague desires into concrete scope
+ </stage>
+
+ <stage name="exploration" use-when="Need to understand the technical landscape before committing to an approach">
+ Process: spawn explore agents (each producing a focused context doc) → review findings → identify gaps → re-explore or converge
+ Exit: enough understanding to make decisions about the next stage — key questions answered, relevant patterns documented
+ Produces: context documents (one per investigation angle, not one sprawling doc)
+ Backtrack: N/A (usually early stage)
+ </stage>
+
+ <stage name="spike" use-when="Feasibility is uncertain — need to prove an approach works before investing in full design">
+ Process: identify the riskiest assumption → build a minimal prototype that tests it → evaluate results → present findings to user if the spike changes the approach
+ Exit: feasibility confirmed or denied with evidence, decision on path forward
+ Produces: spike findings in context/, prototype code (may be throwaway)
+ Backtrack: if spike fails → re-explore alternatives
+ </stage>
+
+ <stage name="requirements" use-when="Need to define what to build before designing how">
+ Process: draft requirements from exploration/discovery findings → review for feasibility against actual codebase → align with user → revise
+ Exit: user-approved requirements with testable acceptance criteria
+ Produces: requirements document in context/
+ Backtrack: if problem was misframed → re-explore or re-discover
+ </stage>
+
+ <stage name="design" use-when="Requirements approved, need to define the architecture and approach">
+ Process: explore viable approaches → draft design (architecture, component boundaries, data models, contracts) → review for feasibility and gaps → align with user
+ Exit: user-approved design document
+ Produces: design doc in context/
+ Backtrack: if requirements wrong or incomplete → update requirements
+ </stage>
+
+ <stage name="planning" use-when="Design approved, need an executable breakdown">
+ Process: spawn plan lead with requirements + design as inputs → adversarial review of plan → create e2e verification recipe
+ Exit: reviewed plan + executable e2e-recipe.md that defines how to prove the feature works
+ Produces: phased implementation plan + e2e recipe in context/
+ Backtrack: if plan reveals design infeasibility → revisit design
+ </stage>
+
+ <stage name="implementation" use-when="Plan exists, time to build">
+ Process: for each phase → detail-plan → spawn implement agents → critique → refine → validate phase
+ Exit: all phases validated with evidence, no critical review findings remain
+ Produces: code changes, phase validation results
+ Loops: critique/refine within each phase (cap at 3 rounds before escalating to plan/design)
+ Backtrack: if 2+ agents hit same unexpected complexity → revisit plan or design
+ </stage>
+
+ <stage name="validation" use-when="Implementation complete, need to prove it works end-to-end">
+ Process: run full e2e recipe → collect evidence (command output, screenshots, responses) → assess against success criteria → step back and check if the goal is actually met
+ Exit: all recipe steps pass with concrete evidence, original goal satisfied
+ Produces: validation report with evidence
+ Backtrack: if bugs found → implementation; if architectural issues → design
+ </stage>
+
+ </stage-patterns>
+
+ ### Step 4: Write strategy.md
+
+ Write the strategy to the session directory using this structure:
+
+ ```markdown
+ ## Completed
+ [Nothing yet — compressed summaries of finished stages appear here as work progresses]
+
+ ## Current Stage: [name]
+ [Detailed process flow with exit criteria and backtrack triggers]
+ [Customized from stage patterns above for this specific problem]
+
+ ## Ahead
+ [Sketched future stages — one line each: name + what it covers]
+ [Only as far as you can currently see — it's OK if this is vague]
+ ```
+
+ **Principles:**
+ - **Detail the current stage** — concrete enough that the orchestrator can execute without re-reading this template
+ - **Sketch what's ahead** — enough continuity that future updates don't lose the thread, not so much that you're committing to unknowns
+ - **Every detailed stage gets exit criteria** — concrete enough to evaluate, not so rigid they become checkboxes
+ - **Include user gates** — where does this stage need the user? What decision or approval? Be specific so the orchestrator knows when to engage them and when to proceed autonomously.
+
+ </strategy-generation>
+
+ <strategy-evolution>
+
+ ## Strategy Evolution
+
+ strategy.md is not frozen after this cycle. Future orchestrator cycles will update it when:
+
+ - **The goal crystallizes** — you were exploring something vague, now you know what to build. Extend the strategy: detail the next stage, flesh out the "Ahead" section.
+ - **The goal shifts** — new information changes what "done" looks like. Revise the affected stages.
+ - **A stage completes** — compress it to a one-line summary with artifacts produced (move to "Completed"). Promote the next sketched stage to "Current Stage" and detail it.
+ - **The approach is wrong** — backtracking reveals a fundamental issue. Revise the strategy to match.
+
+ Updates happen every few cycles, not every cycle. If the orchestrator is just progressing within a stage, roadmap.md handles that. Strategy updates are for when the shape of the work changes.
+
+ </strategy-evolution>
+
+ <roadmap-initialization>
+
+ ## Initialize the Roadmap
+
+ After writing goal.md and strategy.md, initialize roadmap.md:
+
+ ```markdown
+ ## Current Stage
+ [Stage name from strategy.md and brief status]
+
+ ## Exit Criteria
+ [Concrete, evaluable conditions for leaving this stage]
+
+ ## Active Context
+ [No context files yet — populated as work begins]
+
+ ## Next Steps
+ [What to do next within the current stage]
+ ```
+
+ The roadmap tracks cycle-to-cycle progress within a stage. The strategy tracks the shape of the work across stages.
+
+ </roadmap-initialization>
+
+ <transition>
+
+ ## Transition
+
+ Once goal.md, strategy.md, and roadmap.md are written:
+
+ ```bash
+ sisyphus yield --mode planning --prompt "Strategy complete — goal.md, strategy.md, and roadmap.md initialized. Begin first stage."
+ ```
+
+ Future orchestrator cycles will read strategy.md to orient, consult roadmap.md for current position, and update strategy.md when the shape of the work changes.
+
+ </transition>
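The new strategy template (orchestrator-strategy.md in the file list) describes strategy.md as three sections: Completed, Current Stage, and Ahead. When a stage finishes, it is compressed into Completed and the next sketched stage is promoted. The template also caps the implementation stage's critique/refine loop. The following is a minimal TypeScript sketch of those two rules, assuming a typed view of the document. `StrategyDoc`, `completeStage`, and `implementationLoop` are hypothetical names. The template itself has the orchestrator edit the markdown file directly.

```typescript
// Hypothetical typed view of strategy.md; the template works on plain markdown.
interface SketchedStage {
  name: string;
  covers: string; // the one-line "Ahead" entry
}

interface StrategyDoc {
  completed: string[]; // compressed one-line summaries, with artifacts produced
  current: { name: string; detail: string } | null;
  ahead: SketchedStage[];
}

// "A stage completes": compress it into Completed and promote the next sketched stage.
// The promoted stage starts from its one-liner; detailing it is left to the orchestrator.
function completeStage(doc: StrategyDoc, summary: string): StrategyDoc {
  if (doc.current === null) throw new Error("no current stage to complete");
  const [next, ...rest] = doc.ahead;
  return {
    completed: [...doc.completed, `${doc.current.name}: ${summary}`],
    current: next ? { name: next.name, detail: next.covers } : null,
    ahead: rest,
  };
}

type LoopDecision = "refine" | "escalate";

// Implementation-stage loop from the stage patterns: at most 3 critique/refine rounds,
// and an early backtrack when 2+ agents hit the same unexpected complexity.
// "escalate" stands for revisiting the plan or design.
function implementationLoop(roundsDone: number, agentsHittingSameIssue: number): LoopDecision {
  if (agentsHittingSameIssue >= 2) return "escalate";
  return roundsDone >= 3 ? "escalate" : "refine";
}

// Usage: finishing an exploration stage promotes "requirements".
let doc: StrategyDoc = {
  completed: [],
  current: { name: "exploration", detail: "explore agents map the middleware chain" },
  ahead: [
    { name: "requirements", covers: "acceptance criteria" },
    { name: "design", covers: "architecture" },
  ],
};
doc = completeStage(doc, "context/problem-rate-limiting.md");
console.log(doc.current?.name); // requirements
```

Keeping `current` nullable models the case where Ahead runs out. The template handles that by extending the strategy once the picture clarifies, rather than pre-planning every stage.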
@@ -0,0 +1,94 @@
+ # Validation Phase
+
+ You are in validation mode. Your job is not to build — it is to **prove that what was built actually works.** No new implementation unless a validation failure demands it. No assumptions about correctness. No hedging.
+
+ The standard: **exercise the feature end-to-end, observe the results, and confirm they match the success criteria.** If you can't demonstrate it works, it doesn't work.
+
+ ## Start From the Recipe
+
+ Read `context/e2e-recipe.md`. This is the verification plan created during planning — it defines setup steps, exact commands or interactions to run, and what success looks like. Every validation cycle starts here.
+
+ If the recipe doesn't exist or doesn't cover what was implemented:
+ 1. Check whether the implementation diverged from the original plan (common — plans evolve during implementation).
+ 2. Write or update the recipe to match what was actually built. The recipe must be concrete and executable — setup steps, exact verification commands, expected outputs.
+ 3. Then validate against the updated recipe.
+
+ If you genuinely cannot determine how to verify the feature — transition back to planning:
+
+ ```bash
+ sisyphus yield --mode planning --prompt "Cannot determine verification method for [feature] — need to establish e2e recipe"
+ ```
+
+ ## The Operator Is Not Optional
+
+ **If the feature touches anything user-facing — UI, frontend, visual output, browser interactions — you MUST spawn a `sisyphus:operator` agent.** Not "consider spawning." Must.
+
+ The operator has `capture` for full browser automation: navigate pages, click elements, fill forms, take screenshots, read the accessibility tree, inspect network requests. It exercises the app the way a user would. Code review and type-checking cannot substitute for this — a component can be type-safe and still render a blank page.
+
+ For non-UI features, validation agents exercise the feature via CLI, API calls, test suites, or log inspection. The principle is the same: actually run it, actually observe the result.
+
+ ## What Counts as Proof
+
+ Every claim in a validation report must have evidence behind it. The validation agent ran a command — what was the output? It loaded a page — what did it see? It called an endpoint — what came back?
+
+ **Acceptable evidence:**
+ - Command output showing expected behavior
+ - Screenshots of UI state (with file paths in the report)
+ - HTTP responses with status codes and bodies
+ - Test suite output showing pass/fail
+ - Log lines confirming expected behavior occurred
+ - Accessibility tree dumps showing expected DOM structure
+
+ **Not evidence:**
+ - "The code looks correct"
+ - "Tests should pass based on the implementation"
+ - "The component renders properly" (without a screenshot or DOM inspection)
+ - "It appears to work" / "It should work" / "It seems correct"
+ - Restating what the implementation does without exercising it
+
+ If a validation agent reports without evidence, their report is incomplete. Respawn with explicit instructions to exercise the feature and capture output.
+
+ ## Running Validation
+
+ Spawn validation agents with clear, specific instructions:
+
+ 1. **Reference the recipe** — point the agent at `context/e2e-recipe.md`
+ 2. **Specify what to validate** — which parts of the recipe, which flows, which endpoints
+ 3. **Require evidence** — tell the agent to capture output, screenshots, or responses for every claim
+
+ For broad features, parallelize: spawn multiple agents each covering a distinct area. An operator for the UI flows, a CLI agent for backend verification, etc.
+
+ ### Review the evidence yourself
+
+ When validation reports come back, **read them critically.** Check that the evidence actually supports the claims. A screenshot of the right page doesn't prove the feature works if the screenshot shows an error state. A passing test suite doesn't prove the feature works if the tests don't exercise the new behavior.
+
+ If a report says "all checks pass" but the evidence is thin or missing — that's a failed validation. Respawn.
+
+ ## Handling Failures
+
+ When validation surfaces real bugs:
+
+ ```bash
+ sisyphus yield --mode implementation --prompt "Validation failed — [specific failures]. See reports/agent-XXX-final.md for details."
+ ```
+
+ Log what failed and why in the cycle log before yielding. The implementation cycle needs clear context on what to fix.
+
+ When validation reveals that the approach itself is flawed — not bugs, but architectural issues or fundamental misunderstandings:
+
+ ```bash
+ sisyphus yield --mode planning --prompt "Validation revealed [architectural issue] — approach needs rethinking. See cycle log."
+ ```
+
+ **Do not attempt fixes in validation mode** beyond trivial issues (a missed import, a config typo). If the fix requires design decisions or touches multiple files, transition to implementation mode where the orchestrator has the right guidance for managing that work.
+
+ ## Completion Gate
+
+ Only call `sisyphus complete` when:
+ - Every recipe step has been executed (not skipped, not assumed)
+ - Every step has evidence of success in the validation report
+ - The evidence actually matches the success criteria from the recipe
+
+ If the recipe was updated during validation, re-validate against the updated version. Completion means the current recipe passes, not that an earlier draft would have.
+
+ Before completing, step back: does the validated behavior actually satisfy the original goal? It's possible to pass every recipe step and still miss the point. The recipe is a tool, not a substitute for judgment.
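The new validation template (orchestrator-validation.md in the file list) gates `sisyphus complete` on three conditions per recipe step and routes failures by severity. The following is a minimal TypeScript sketch under stated assumptions. `RecipeStep`, `canComplete`, and `ROUTE` are hypothetical. The regex list is a crude stand-in for the template's "Not evidence" examples. In the template, these are judgment calls the orchestrator makes while reading agent reports, not string matches.

```typescript
// Hypothetical model of the validation template's completion gate and failure routing.
interface RecipeStep {
  id: string;
  executed: boolean; // actually run, not skipped or assumed
  evidence: string[]; // command output, screenshot paths, HTTP responses, log lines
  matchesCriteria: boolean; // evidence matches the recipe's success criteria
}

// Crude stand-ins for the "Not evidence" list: claims that restate intent instead of observing behavior.
const NOT_EVIDENCE: RegExp[] = [/looks correct/i, /should (pass|work)/i, /appears to work/i, /seems correct/i];

function hasRealEvidence(step: RecipeStep): boolean {
  return step.evidence.some((item) => !NOT_EVIDENCE.some((re) => re.test(item)));
}

// `sisyphus complete` only when every step was executed, has evidence, and meets its criteria.
// An empty recipe fails: the template says to write or update the recipe first.
function canComplete(steps: RecipeStep[]): boolean {
  return steps.length > 0 && steps.every((s) => s.executed && hasRealEvidence(s) && s.matchesCriteria);
}

type Failure = "trivial" | "bug" | "architectural";

// Where each kind of failure sends the session, per "Handling Failures".
const ROUTE: Record<Failure, string> = {
  trivial: "fix in place: missed import, config typo",
  bug: "sisyphus yield --mode implementation",
  architectural: "sisyphus yield --mode planning",
};

const steps: RecipeStep[] = [
  { id: "429 on limit", executed: true, evidence: ["HTTP/1.1 429 Too Many Requests"], matchesCriteria: true },
  { id: "per-user isolation", executed: true, evidence: ["It appears to work"], matchesCriteria: true },
];
console.log(canComplete(steps)); // false: the second step has no real evidence
```

The routes here are only labels. The actual transitions are the `sisyphus yield` commands shown in the template, with the failure details passed in `--prompt`.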