sisyphi (npm) version diff: 1.1.18 → 1.1.19 (Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (231)

package/templates/orchestrator-plugin/skills/orchestration/strategy.md ADDED

@@ -0,0 +1,160 @@
+# Strategy Reference
+Reference material for writing and updating strategy.md — the document that maps the shape of the work across stages.
+## strategy.md Format
+```markdown
+## Completed
+[Compressed summaries of finished stages — delete detail, keep outcomes]
+## Current Stage: [name]
+[Detailed process flow with exit criteria and backtrack triggers]
+## Ahead
+[Sketched future stages — one line each: name + what it covers]
+[Only as far as you can currently see — it's OK if this is vague]
+```
+**Principles:**
+- **Detail the current stage** — concrete enough that the orchestrator can execute without re-reading this skill
+- **Sketch what's ahead** — enough continuity that future updates don't lose the thread, not so much that you're committing to unknowns
+- **Every detailed stage gets exit criteria** — concrete enough to evaluate, not so rigid they become checkboxes
+- **Include user gates** — where does this stage need the user? What decision or approval?
+## Stages name kinds of work, not areas of code
+A strategy stage is a **process phase** — `discovery`, `planning`, `implementation`, `validation`, `spike`. It describes the *kind* of thinking happening that stage. It is **not** a work-area label like `auth-refactor`, `tui-panel`, `migration-script`, or `foundations`.
+Work areas are the plan agent's job. They live in `context/{plan-lead-agent-id}/plan-stage-N-*.md` and structure the implementation phase from the inside. Keep them out of `strategy.md`.
+<example>
+✓ Correct — process phases:
+```
+## Ahead
+- **implementation** — phased build per the plan outline (5 sub-stages: foundations → ask-cli → tui → orphan-handling → migration). Critique + validate per stage.
+- **validation** — run e2e recipe end-to-end, capture evidence, user gate.
+```
+✗ Wrong — work areas masquerading as stages:
+```
+## Ahead
+- **foundations** — humanloop refactor + ask-store helpers
+- **ask-cli + haiku + template** — CLI command and tool-use loop
+- **tui-integration** — inbox panel and key routing
+- **orphan-handling** — kill/complete paths
+- **migration + e2e validation** — drop old command, run recipe
+```
+The second list is a roadmap of code work. Strategy.md collapses into a task list and the process shape (when do we critique? when do we validate? what's the user gate?) disappears.
+</example>
+When you're tempted to name a stage after a code area, that signals you're sketching the plan, not the strategy. Push that detail down into the plan agent's output and keep `strategy.md` at the process-shape layer.
+## Default Pipeline Shape
+The session's effort tier dictates the default pipeline. **Use this shape unless the problem explicitly demands more or less.** The user can change tiers via `sisyphus session effort <low|medium|high|xhigh>`.
+<!--EFFORT:LOW-->
+**Pipeline:** `plan → implement → validate`
+A single plan agent, a single implement agent, a single validate agent. No spec, problem, test-spec, or review-plan stages — the user's request is the requirement; ask in-band if anything's ambiguous. If the work is wrapper-shaped (every change backs onto an existing CLI/API/handler), move directly from discovery into implementation mode without a planning-mode cycle at all.
+<!--/EFFORT-->
+<!--EFFORT:MEDIUM-->
+**Pipeline:** `(spec, if behavior changes) → plan → implement → validate`
+Add `sisyphus:review-plan` only when the plan covers multi-domain integration. Add `sisyphus:test-spec` **only when the user's initial prompt or goal.md explicitly requested tests** (e.g. "with tests", "TDD", "include unit tests", "test coverage"). Silence is a "no" — do not proactively ask, do not infer from feature risk. Spawn `sisyphus:spec` and `sisyphus:problem` only when the goal has multiple valid framings or the design space is genuinely open.
+<!--/EFFORT-->
+<!--EFFORT:HIGH,XHIGH-->
+**Pipeline:** `discovery → spec → planning (with parallel review-plan) → phased implementation with critique/validate checkpoints → validation`
+`sisyphus:review-plan` runs after the plan is drafted. `sisyphus:spec` spawns whenever a feature adds user-visible behavior. `sisyphus:problem` spawns when the goal is nebulous. Append `+ test-spec` to the planning stage **only when the user's initial prompt or goal.md explicitly requested tests** (e.g. "with tests", "TDD", "include unit tests", "test coverage"); silence is a "no." When justified, `sisyphus:test-spec` spawns in parallel with the high-level plan at Cycle 2, not after implementation — post-implementation test-spec silently describes what the code does rather than what it should do.
+<!--/EFFORT-->
+**Re-evaluate the tier when scope shifts mid-session.** A MEDIUM feature that uncovers a new subsystem may have crossed into HIGH; a HIGH feature whose scope was narrowed may have dropped to MEDIUM. Re-run `sisyphus session effort` and re-invoke this skill rather than continuing under the old tier's pipeline.
+## Choosing a Different Shape
+If the default doesn't match the problem, these canonical progressions are the next-best starting points — pick the closest one and prune what's already clear, rather than inventing custom shapes:
+```
+discovery → spec → planning → implementation → validation
+exploration → spike → design → implementation → validation
+investigation → recommendation → (user decides) → implementation
+analysis → phased-transformation → verification
+discovery → product-design → technical-investigation → architecture → implementation → validation
+```
+Add a new stage *type* only when the problem demands a kind of work the patterns don't cover — for example a `spike` to prove feasibility, a `compatibility-check` before a migration, or a `prototype` before committing. The test for "is this a real new stage?" is whether it names a different kind of thinking, not a different slice of code.
+## Stage Patterns
+Use these as starting points. Invent new stage types when the problem demands it. Add backtrack edges where you can foresee things going wrong.
+### discovery
+**Use when:** Goal is undefined, ambiguous, or has shifted — need to clarify what "done" looks like before any other stage runs. Also re-entered mid-session when a pivot invalidates the current goal.
+- Process: read prior context (goal.md, prior strategy if any) → if the goal is provably clear, write goal.md and run the clarity-confirmation deck → otherwise spawn `sisyphus:problem` for interactive exploration → user iterates → fold result into goal.md → set effort tier → write or revise strategy.md
+- Exit: goal.md is current and confirmed; effort tier is set; strategy.md exists for this iteration
+- Produces: goal.md, strategy.md, optionally context/problem.md or context/problem-bifurcation.md
+- Backtrack: if scope reveals multiple independent projects, issue a decomposition deck and let the user pick a lead — record the others under "Known follow-ups" in goal.md
+### exploration
+**Use when:** Need to understand the technical landscape before committing to an approach.
+- Process: spawn explore agents (each producing a focused context doc) → review findings → identify gaps → re-explore or converge
+- Exit: enough understanding to make decisions — key questions answered, relevant patterns documented
+- Produces: context documents (one per investigation angle, not one sprawling doc)
+### spike
+**Use when:** Feasibility is uncertain — need to prove an approach works before investing in full design.
+- Process: identify the riskiest assumption → build a minimal prototype that tests it → evaluate results → present findings to user if the spike changes the approach
+- Exit: feasibility confirmed or denied with evidence, decision on path forward
+- Produces: spike findings in context/, prototype code (may be throwaway)
+- Backtrack: if spike fails → re-explore alternatives
+### spec
+**Use when:** Need to define what to build and how, in a single interactive session.
+- Process: spawn sisyphus:spec → lead explores codebase, asks user questions, dispatches engineer for design and a single writer for requirements → user reviews via TUI → lead deepens design with findings
+- Exit: user-approved design + requirements with testable acceptance criteria
+- Produces: context/design.md + context/design.json + context/requirements.json + context/requirements.md
+- Backtrack: if problem was misframed → re-explore or re-discover
+### planning
+**Use when:** Design approved, need an executable breakdown.
+- Process: spawn plan lead with spec outputs (requirements + design) as inputs → adversarial review of plan → create e2e verification recipe
+- Exit: reviewed plan + executable e2e-recipe.md that defines how to prove the feature works
+- Produces: phased implementation plan + e2e recipe in context/
+- Backtrack: if plan reveals design infeasibility → revisit spec
+### implementation
+**Use when:** Plan exists, time to build.
+- Process: for each phase → detail-plan → spawn implement agents → single critique pass → refine → validate phase
+- Exit: all phases validated with evidence, no critical review findings remain
+- Loops: none within a phase — review runs once, fixes land, then validation. If review surfaces architectural issues, backtrack to plan; otherwise advance.
+- Backtrack: if 2+ agents hit same unexpected complexity → revisit plan or spec; if review finds architectural issues → revisit plan
+### validation
+**Use when:** Implementation complete, need to prove it works end-to-end.
+- Process: run full e2e recipe → collect evidence (command output, screenshots, responses) → assess against success criteria → step back and check if the goal is actually met
+- Exit: all recipe steps pass with concrete evidence, original goal satisfied
+- Produces: validation report with evidence
+- Backtrack: if bugs found → implementation; if architectural issues → spec
+## Mid-session shape revisions
+When the work in flight reveals the strategy itself is off, escalate up this ladder — reach for the lowest-cost move that fits.
+1. **Revise in place.** Stage detail evolved but the pipeline shape holds. Edit `strategy.md` and `roadmap.md`; continue.
+2. **`sisyphus:strategize`.** Approach is wrong but artifacts (specs, explorations, reports) still apply. Annotates the pivot into `strategy.md` and yields `--mode discovery` with a fresh orchestrator.
+3. **`sisyphus session clone <goal>`.** The session is actually two (or more) independent projects. Forks scope into a new top-level session; update `goal.md`/`roadmap.md` here to drop what was cloned.
+4. **`sisyphus session rollback <sessionId> <cycle>`.** A specific cycle introduced state to discard. Rewinds and pauses the session — cycles after the target are lost. Last resort; the others preserve history.
+When the user is the source of the change, update `goal.md` first — strategy revision is downstream of goal.
+## Design Philosophy
+Frameworks to inform process shape selection — use them to *choose the right shape*, not to follow mechanically:
+- **Double Diamond** — Diverge to explore, converge on a definition; diverge on solutions, converge on implementation. Use when requirements are unclear or the problem needs defining.
+- **OODA (Observe–Orient–Decide–Act)** — Tight sensing/reacting loops. Use when the situation is fluid and the cost of wrong moves is low (debugging, spikes, incident response).
+- **Cynefin** — Match approach to domain. Clear → best practice. Complicated → analyze then execute. Complex → probe, sense, respond. Chaotic → act to stabilize.
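
The tier rules in the added "Default Pipeline Shape" section can be read as a small decision function. The following is a minimal sketch and not code from the package: `SessionFacts`, `Pipeline`, and `defaultPipeline` are hypothetical names, and only the tier names, stage names, and gating conditions are taken from the text above.

```typescript
// Hypothetical sketch; only the tier names and gating rules come from strategy.md.
type EffortTier = "low" | "medium" | "high" | "xhigh";

interface SessionFacts {
  tier: EffortTier;
  behaviorChanges: boolean;          // does the work change user-visible behavior?
  multiDomain: boolean;              // does the plan cover multi-domain integration?
  testsExplicitlyRequested: boolean; // prompt or goal.md asked for tests; silence is false
}

interface Pipeline {
  stages: string[];         // ordered pipeline stages
  planningExtras: string[]; // agents attached to the planning stage
}

function defaultPipeline(facts: SessionFacts): Pipeline {
  // LOW: no spec, problem, test-spec, or review-plan stages.
  if (facts.tier === "low") {
    return { stages: ["plan", "implement", "validate"], planningExtras: [] };
  }

  const planningExtras: string[] = [];

  if (facts.tier === "medium") {
    // MEDIUM: spec only if behavior changes; review-plan only for multi-domain plans.
    const stages = ["plan", "implement", "validate"];
    if (facts.behaviorChanges) stages.unshift("spec");
    if (facts.multiDomain) planningExtras.push("review-plan");
    if (facts.testsExplicitlyRequested) planningExtras.push("test-spec");
    return { stages, planningExtras };
  }

  // HIGH and XHIGH: full pipeline; review-plan always accompanies planning.
  planningExtras.push("review-plan");
  if (facts.testsExplicitlyRequested) planningExtras.push("test-spec");
  return {
    stages: ["discovery", "spec", "planning", "implementation", "validation"],
    planningExtras,
  };
}

const medium = defaultPipeline({
  tier: "medium",
  behaviorChanges: true,
  multiDomain: false,
  testsExplicitlyRequested: false,
});
console.log(medium.stages.join(" → ")); // spec → plan → implement → validate
```

Note that `test-spec` appears only when `testsExplicitlyRequested` is true at any tier above LOW, which mirrors the "silence is a no" rule.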

package/templates/orchestrator-plugin/skills/orchestration/task-patterns.md CHANGED

@@ -71,7 +71,7 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
 ## Feature: [description]
 ### Requirements & Design
-- [ ] Problem exploration — understand goals, constraints, assumptions
+- [ ] (conditional) Problem exploration — if goal is nebulous, explore before spec
 - [ ] Requirements — define acceptance criteria
 - [ ] Design — architecture, component boundaries, data models
 - [ ] Create implementation plan from requirements + design
@@ -89,20 +89,19 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
 Note: critique and validation are embedded between implementation phases, not deferred to the end. Phase 1 (types) is low-risk and doesn't need its own review, but critique catches issues before Phase 3 builds on them. Validation happens after integration, when all the pieces come together.
 ### Cycle plan
-- **Cycle 1**: Spawn `sisyphus:problem` for problem exploration. Yield. (Human iterates between cycles.)
-- **Cycle 2**: Spawn `sisyphus:requirements` for requirements analysis. Yield. (Human reviews/iterates.)
-- **Cycle 3**: Spawn `sisyphus:design` for technical design. Yield. (Human reviews/iterates.)
-- **Cycle 4**: Spawn `sisyphus:plan` for plan. Yield.
-- **Cycle 5**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
-- **Cycle 6**: Spawn `sisyphus:implement` for Phase 1. Yield.
-- **Cycle 7**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
-- **Cycle 8**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
-- **Cycle 9**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
-- **Cycle 10**: `sisyphus yield --mode validation` for e2e smoketest. Validation mode proves the feature works — operator for UI, evidence for every claim.
-- **Cycle 11**: Address validation failures (back to `--mode implementation`) or complete.
+- **Cycle 0** (conditional): If the problem is nebulous — multiple valid framings, unclear what "done" looks like — spawn `sisyphus:problem` for interactive exploration. Yield `--mode discovery`. Skip if goal is clear and acceptance criteria are obvious.
+- **Cycle 1**: Spawn `sisyphus:spec` for combined design + requirements. Yield. (Human iterates inside the spec session.)
+- **Cycle 2**: Spawn `sisyphus:plan` for plan. Yield.
+- **Cycle 3**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
+- **Cycle 4**: Spawn `sisyphus:implement` for Phase 1. Yield.
+- **Cycle 5**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
+- **Cycle 6**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
+- **Cycle 7**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
+- **Cycle 8**: `sisyphus orch yield --mode validation` for e2e smoketest. Validation mode proves the feature works — operator for UI, evidence for every claim.
+- **Cycle 9**: Address validation failures (back to `--mode implementation`) or complete.
 ### Failure modes
-- **Requirements/design needs human input**: Mark session as needing human review. Orchestrator notes open questions.
+- **Spec needs human input**: Mark session as needing human review. Orchestrator notes open questions.
 - **Plan fails review**: Feed review issues back, respawn planner.
 - **Critique finds issues in foundation**: Fix before starting integration — don't build on shaky ground.
 - **Validation fails**: Feed specifics back to implement agent for the failing area.
@@ -122,7 +121,7 @@ Cross-cutting feature, multiple domains, needs team coordination. Uses **progres
 ## Feature: [description]
 ### Requirements & Design
-- [ ] Problem exploration
+- [ ] (conditional) Problem exploration — if goal is nebulous
 - [ ] Requirements
 - [ ] Design
@@ -138,24 +137,23 @@ Cross-cutting feature, multiple domains, needs team coordination. Uses **progres
 6. [final review] — depends on all
 ### Current Stage: [whichever is active]
-See context/plan-stage-N-{name}.md for detail plan.
+See context/{plan-lead-agent-id}/plan-stage-N-{name}.md for detail plan. (Path comes from the plan lead's submission report.)
 - [ ] [task-level items from detail plan]
 ```
 Note: verification checkpoints are embedded in the stage outline, not deferred to a final phase. The level of rigor varies — foundation stages get a light critique, core logic gets critique + validation, integration gets full e2e validation. This is judgment, not formula.
 ### Cycle plan
-- **Cycle 1**: Spawn `sisyphus:problem` for problem exploration. Yield.
-- **Cycle 2**: Spawn `sisyphus:requirements` for requirements. Yield.
-- **Cycle 3**: Spawn `sisyphus:design` for design. Yield.
-- **Cycle 4**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
-- **Cycle 5**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
-- **Cycle 6**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
-- **Cycle 7**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
-- **Cycle 8**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
-- **Cycle 9**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
-- **Cycle 10**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
-- **Cycle 11+**: Implement integration stage. Final review. Then `sisyphus yield --mode validation` for comprehensive e2e proof.
+- **Cycle 0** (conditional): If the problem is nebulous, spawn explore agents for technical landscape (yield `--mode discovery`), then spawn `sisyphus:problem` for interactive problem exploration (yield `--mode discovery`). May take 1-3 discovery cycles. Skip if the goal and scope are already clear.
+- **Cycle 1**: Spawn `sisyphus:spec` for combined design + requirements. Yield. (Human iterates inside the spec session.)
+- **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." If the user's initial prompt or goal.md explicitly requested tests, also spawn `sisyphus:test-spec` for test properties in parallel; otherwise skip. Yield.
+- **Cycle 4**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). The plan agent saves under its own subdir and reports the full path — carry that path forward for the implement cycle. Yield.
+- **Cycle 5**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
+- **Cycle 6**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
+- **Cycle 7**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
+- **Cycle 8**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
+- **Cycle 9**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
+- **Cycle 10+**: Implement integration stage. Final review. Then `sisyphus orch yield --mode validation` for comprehensive e2e proof.
 ### Failure modes
 - **Detail-plan agent can't produce quality output**: The stage is still too large. Break it into sub-stages in the outline and detail-plan each sub-stage individually.
@@ -211,13 +209,13 @@ PR review, pre-merge check, or periodic quality audit.
 - [ ] Review [scope] for issues
 - [ ] (conditional) Fix critical/high issues found
-- [ ] (conditional) Re-review fixes
+- [ ] Verify fixes landed (type-check, tests pass)
 ```
 ### Cycle plan
 - **Cycle 1**: Spawn `sisyphus:review` for review. Yield.
 - **Cycle 2**: If critical/high issues, spawn `sisyphus:implement` for fixes. If clean, complete.
-- **Cycle 3**: Spawn `sisyphus:review` for re-review (targeted at fixes only). Complete.
+- **Cycle 3**: Verify fixes landed by reading fix-agent reports + running type-check/tests. Complete. Do **not** spawn a second review pass — review runs once, validation catches regressions.
 ### Parallelization
 Review itself parallelizes internally (subagents per concern). Fix cycle is usually serial.
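
The revised review cycle plan (review once, fix critical/high findings, then verify without a second review pass) can be sketched as a small state function. This is an illustrative assumption about how an orchestrator might encode the rule, not the package's implementation: `ReviewState`, `Action`, and `nextAction` are hypothetical names, and the `medium` and `low` severities are invented for the example, since the text names only critical/high.

```typescript
// Hypothetical sketch of the three-cycle review pattern described in the diff.
type Severity = "critical" | "high" | "medium" | "low"; // medium/low are assumed labels
type Action = "spawn-review" | "spawn-implement-fixes" | "verify-fixes" | "complete";

interface ReviewState {
  reviewDone: boolean;    // Cycle 1 finished
  findings: Severity[];   // severities reported by the single review pass
  fixesSpawned: boolean;  // Cycle 2 fix agent has been spawned
  fixesVerified: boolean; // type-check and tests passed after fixes
}

function nextAction(state: ReviewState): Action {
  if (!state.reviewDone) return "spawn-review";

  const blocking = state.findings.some((f) => f === "critical" || f === "high");
  if (!blocking) return "complete"; // clean review: done after Cycle 2's check

  if (!state.fixesSpawned) return "spawn-implement-fixes";

  // Cycle 3: verify by reading fix reports and running type-check/tests.
  // There is deliberately no path back to "spawn-review".
  if (!state.fixesVerified) return "verify-fixes";
  return "complete";
}
```

Once `reviewDone` is true, no state returns `spawn-review`, which is the point of the change from "re-review fixes" to "verify fixes landed".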

package/templates/orchestrator-plugin/skills/orchestration/workflow-examples.md CHANGED

@@ -2,6 +2,122 @@
 End-to-end examples showing how the orchestrator structures cycles for real scenarios.
+### Path conventions in these examples
+Plan files live under per-plan-lead subdirectories: `context/{plan-lead-agent-id}/plan-*.md`. These examples elide the subdir (showing `context/plan-rate-limiting.md`) for readability. In a real cycle, the orchestrator reads the exact path from the plan lead's submission report and carries it verbatim into downstream implement, review-plan, and validate agent prompts.
+---
+## Example 4: Wrapper-Shaped Config Migration (LOW effort — 5 files, mechanical)
+**Starting task**: "All config access goes through `process.env` directly — migrate to a `getConfig()` wrapper already defined in `src/config.ts`"
+**Effort tier**: LOW. Every change is a call-site swap onto an existing handler. No new behavior.
+### Cycle 1 — Plan
+```
+roadmap.md:
+  ## Refactor: Migrate env access to getConfig()
+  - [ ] Plan migration — enumerate all process.env call sites
+  - [ ] Update call sites to use getConfig()
+  - [ ] Validate — no direct process.env access remains; tests pass
+Agents spawned:
+  plan agent → "Enumerate every direct process.env access in src/. Map each call site
+    to the matching getConfig() key. Output a migration checklist. Files expected:
+    src/api/server.ts, src/db/connection.ts, src/queue/worker.ts,
+    src/cli/commands/start.ts, src/config.ts (source of truth — do not modify)."
+```
+### Cycle 2 — Implement
+```
+Plan complete. 23 call sites across 4 files.
+Agents spawned:
+  implement agent → "Execute migration plan at context/{plan-agent-id}/plan-config-migration.md.
+    Replace every process.env.X access with getConfig('X'). Do not modify src/config.ts.
+    Do not add error handling — getConfig() already throws on missing keys."
+```
+### Cycle 3 — Validate + complete
+```
+Implementation complete.
+Agents spawned:
+  validate agent → "Verify migration: grep for remaining process.env access in src/ (excluding
+    src/config.ts). Run existing tests. Confirm zero direct env reads outside config.ts."
+Validation: PASS. Complete — "All env access routed through getConfig()."
+```
+**Pipeline shape**: `plan → implement → validate`. 3 cycles. No `sisyphus:spec`, no `sisyphus:test-spec`, no `sisyphus:review-plan`.
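
To make the call-site swap in Example 4 concrete, here is a minimal sketch of what such a wrapper and the validate step's check might look like. The real `src/config.ts` is not shown in the diff, so `makeGetConfig`, `findDirectEnvReads`, the `DATABASE_URL` key, and the sample file contents are all hypothetical; the only behavior taken from the example is that `getConfig('X')` replaces `process.env.X` and throws on missing keys.

```typescript
// Hypothetical sketch; the actual src/config.ts is not part of this diff.
type Env = Record<string, string | undefined>;

// Builds a getConfig(key) that throws on missing keys.
// In a Node project the env argument would typically be process.env.
function makeGetConfig(env: Env): (key: string) => string {
  return (key: string): string => {
    const value = env[key];
    if (value === undefined) {
      throw new Error(`Missing required config key: ${key}`);
    }
    return value;
  };
}

// Assumed sample environment for illustration.
const getConfig = makeGetConfig({ DATABASE_URL: "postgres://localhost/app" });

// Before the migration: const url = process.env.DATABASE_URL;
// After the migration:
const url = getConfig("DATABASE_URL");

// Mirrors the validate agent's grep: list files that still read process.env
// directly, excluding the config module itself.
function findDirectEnvReads(
  files: Record<string, string>,
  configPath = "src/config.ts",
): string[] {
  return Object.entries(files)
    .filter(([path, source]) => path !== configPath && /process\.env\b/.test(source))
    .map(([path]) => path)
    .sort();
}

const remaining = findDirectEnvReads({
  "src/config.ts": "export const raw = process.env;",
  "src/api/server.ts": "const port = getConfig('PORT');",
  "src/queue/worker.ts": "const url = process.env.QUEUE_URL;",
});
console.log(remaining); // [ 'src/queue/worker.ts' ]
```

Because the wrapper already throws, call sites need no extra error handling, which matches the implement agent's instruction in Cycle 2.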
+---
+## Example 5: New Subsystem — Distributed Task Queue (HIGH effort)
+**Starting task**: "Add a persistent task queue so long-running jobs survive server restarts. Include test coverage of the survival, retry, and concurrency invariants."
+**Effort tier**: HIGH. New subsystem, new protocol (worker ↔ queue contract), cross-domain orchestration (API + storage + worker process). The prompt explicitly asks for test coverage — `sisyphus:test-spec` is justified at Cycle 2.
+### Cycle 0 — Problem exploration
+```
+roadmap.md:
+  ## Feature: Persistent Task Queue
+  - [ ] Explore current job execution patterns and constraints
+  - [ ] Spec — requirements + architecture
+  - [ ] Plan implementation (staged outline)
+  - [ ] Spec behavioral properties (test-spec) — user asked for tests in the prompt
+  ...
+Agents spawned:
+  explore agent → "Map current job execution in src/jobs/. Identify what needs to survive
+    restarts, current storage backends, worker process lifecycle."
+  problem agent → "Explore design space for persistent task queue. Questions: push vs pull
+    worker model, at-least-once vs exactly-once semantics, failure/retry policy, storage
+    backend options (Redis, Postgres, SQLite)."
+```
+### Cycle 1 — Spec (human iterates)
+```
+Agents spawned:
+  sisyphus:spec → "Run spec session for persistent task queue.
+    Context in context/problem-task-queue.md and context/explore-task-queue.md."
+Human iterates. Spec outputs:
+  context/requirements-task-queue.md — acceptance criteria, failure semantics
+  context/design-task-queue.md — Redis-backed queue, pull workers, at-least-once delivery
+```
+### Cycle 2 — High-level plan + test-spec (parallel)
+```
+Agents spawned (parallel):
+  plan agent → "Create high-level stage outline from context/requirements-task-queue.md
+    and context/design-task-queue.md. Stages: (1) queue storage layer, (2) producer API,
+    (3) worker consumer, (4) integration + retry logic. Cycle estimates per stage."
+  test-spec agent → "Define behavioral properties: job survives server restart, failed
+    jobs retry up to N times, concurrent workers don't double-execute the same job."
+```
+If the original prompt had been silent on tests, the test-spec spawn would be omitted and Cycle 2 would be plan-only — Cycle 3 would then proceed straight to detail-planning stage 1.
+### Cycles 3–9 — Staged implementation with critique + validation checkpoints
+```
+Follows Feature Build Large pattern:
+  Cycle 3: detail-plan stage 1 + implement stage 1
+  Cycle 4: implement stage 2; detail-plan stage 3 in parallel
+  Cycle 5: critique stages 1-2 (foundation review before worker builds on it)
+  Cycle 6: address critique + implement stage 3
+  Cycle 7: implement stage 4 (integration + retry); validate stages 3-4
+  Cycle 8: sisyphus orch yield --mode validation — e2e: enqueue job, kill server, restart,
+    confirm job ran exactly once
+  Cycle 9: final review agent; complete
+```
+**Pipeline shape**: Full HIGH pipeline — `problem → spec → plan (+ test-spec because the prompt asked for tests) → staged implement → critique → validate → review`. 9+ cycles. Without an explicit test request in the prompt, the parallel `test-spec` would be omitted and Cycle 2 would be plan-only.
 ---
 ## Example 1: Fix a Race Condition in WebSocket Reconnection
@@ -92,7 +208,7 @@ Action: complete — "Fixed WebSocket message loss during reconnection. Messages
 ## Example 2: Add API Rate Limiting
-**Starting task**: "Add rate limiting to the REST API — per-user, configurable limits"
+**Starting task**: "Add rate limiting to the REST API — per-user, configurable limits, with tests for the limit-enforcement and 429 response behavior"
 ### Cycle 1 — Problem exploration
 ```
@@ -122,7 +238,7 @@ Agents spawned:
     what storage backends are available (Redis?), user identification mechanisms."
 ```
-### Cycle 2 — Requirements (after human reviews problem doc)
+### Cycle 2 — Spec (after human iterates on problem)
 ```
 Agent report: "Problem document saved to context/problem-rate-limiting.md.
   Current middleware chain uses Express middleware pattern. Redis is already in stack.
@@ -130,35 +246,27 @@ Agent report: "Problem document saved to context/problem-rate-limiting.md.
 roadmap.md updated:
   - [x] ~~Problem exploration~~
-  - [ ] Requirements — define acceptance criteria
+  - [ ] Spec — define acceptance criteria and architecture
   ...
 Agents spawned:
-  requirements agent → "Draft acceptance criteria for per-user API rate limiting.
-    Read context/problem-rate-limiting.md for context.
-    Questions to resolve: limit granularity (per-endpoint vs global),
-    response format for rate-limited requests, override mechanisms."
-```
+  sisyphus:spec → "Run a spec session for per-user API rate limiting. Read context/problem-rate-limiting.md for context."
-### Cycle 3 — Design (after human reviews requirements)
-```
-Agent report: "Requirements saved to context/requirements-rate-limiting.md.
+Later report: "Spec completed.
+  Requirements saved to context/requirements-rate-limiting.md.
+  Design saved to context/design-rate-limiting.md.
   Covers: per-user limits, endpoint-specific overrides, 429 response format,
-  Retry-After headers. User confirmed Redis-backed approach."
-Agents spawned:
-  design agent → "Create technical design for rate limiting based on
-    context/requirements-rate-limiting.md and context/problem-rate-limiting.md."
+  Retry-After headers, and a Redis-backed sliding window approach."
 ```
-### Cycle 4 — Plan (after human reviews design)
+### Cycle 3 — Plan (after human reviews spec)
 ```
-Agent report: "Design saved to context/design-rate-limiting.md.
+Agent report: "Spec outputs approved.
   Approach: Redis-backed sliding window middleware. Per-user with endpoint-specific
   overrides. Standard 429 response with Retry-After header. Config via environment variables."
 roadmap.md updated:
-  - [x] ~~Problem exploration~~, [x] ~~Requirements~~, [x] ~~Design~~
+  - [x] ~~Problem exploration~~, [x] ~~Spec~~
   - [ ] Plan implementation
   ...
@@ -169,7 +277,7 @@ Agents spawned:
     context/requirements-rate-limiting.md"
 ```
-### Cycle 5 — Review plan
+### Cycle 4 — Review plan
 ```
 Both agents complete. Plan at context/plan-rate-limiting.md.
 Plan has 3 phases: middleware, config, response format.
@@ -179,12 +287,12 @@ Agents spawned:
     against context/requirements-rate-limiting.md and context/design-rate-limiting.md"
 ```
-### Cycle 6 — Implement phases 1+2 (parallel, low-risk foundation)
+### Cycle 5 — Implement phases 1+2 (parallel, low-risk foundation)
 ```
 Plan review: PASS.
 roadmap.md updated (plan review done, starting implementation):
-  - [x] ~~Requirements~~, [x] ~~Design~~, [x] ~~Plan~~, [x] ~~Review plan~~
+  - [x] ~~Spec~~, [x] ~~Plan~~, [x] ~~Review plan~~
   - [ ] Implement rate limiting middleware
   - [ ] Implement rate limit configuration
   - [ ] Critique phases 1-2 — review before integration phase
@@ -199,7 +307,7 @@ Agents spawned (parallel — phases touch different files):
     rate limit configuration in src/config/rate-limits.ts"
 ```
-### Cycle 7 — Critique before integration builds on top
+### Cycle 6 — Critique before integration builds on top
 ```
 Both implementation agents complete.
@@ -217,7 +325,7 @@ Agents spawned:
     config schema matches what middleware expects."
 ```
-### Cycle 8 — Implement phase 3 + address critique
+### Cycle 7 — Implement phase 3 + address critique
 ```
 Review: 2 findings — middleware doesn't handle Redis connection failure gracefully,
   config schema allows negative rate limits.
@@ -229,7 +337,7 @@ Agents spawned (parallel):
     rate limit headers and 429 error responses in src/api/middleware/rate-limit.ts"
 ```
-### Cycle 9 — Validate end-to-end
+### Cycle 8 — Validate end-to-end
 ```
 Phase 3 and fixes complete.

package/templates/orchestrator-settings.json CHANGED

@@ -1,2 +1,57 @@
 {
+  "spinnerVerbs": {
+    "mode": "replace",
+    "verbs": [
+      "Pushing the boulder",
+      "Delegating the boulder",
+      "Outsourcing the futility",
+      "Splitting the stone",
+      "Spawning underlings",
+      "Herding agents",
+      "Fanning out, praying",
+      "Dispatching, then worrying",
+      "Auditioning agents for misery",
+      "Allocating despair evenly",
+      "Watching panes bloom",
+      "Counting heartbeats",
+      "Tallying reports",
+      "Reaping the finished",
+      "Reconciling outputs",
+      "Synthesizing the damage",
+      "Rotating the cycle",
+      "Rolling cycle N+1",
+      "Pretending this is under control",
+      "Maintaining plausible command",
+      "Second-guessing the split",
+      "Revising optimistic estimates",
+      "Redrafting the plan quietly",
+      "Re-reading the roadmap",
+      "Updating strategy.md",
+      "Pondering whether to yield",
+      "Yielding gracefully",
+      "Yielding reluctantly",
+      "Quoting Camus under breath",
+      "Imagining agents happy",
+      "Embracing the absurd",
+      "Accepting the backlog",
+      "Forgiving a timeout",
+      "Nudging a stuck agent",
+      "Absorbing a crash",
+      "Retrying with dignity",
+      "Delegating harder",
+      "Elevating a blocker",
+      "Squinting at pane count",
+      "Holding the thread",
+      "Holding the line",
+      "Blessing the fleet",
+      "Releasing the session",
+      "Letting agents cook",
+      "Letting go",
+      "Staring into the session",
+      "Weighing respawn",
+      "Shouldering the next cycle",
+      "Believing in the climb",
+      "Contemplating cycle N+1"
+    ]
+  }
 }
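
The added `spinnerVerbs` block is plain data, so a small shape check could catch a malformed edit to the list. The sketch below is illustrative only: `checkSpinnerVerbs` is a hypothetical helper, not part of sisyphi, and because this diff does not show what `"mode": "replace"` means to the consumer, the check treats `mode` as an opaque non-empty string.

```typescript
// Hypothetical helper: checks the two keys visible in the diff ("mode", "verbs").
// Returns human-readable problems; an empty array means the block looks well-formed.
function checkSpinnerVerbs(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["spinnerVerbs must be an object"];
  }
  const candidate = value as { mode?: unknown; verbs?: unknown };
  const problems: string[] = [];

  // The semantics of "mode" are an unknown here, so only its type is checked.
  if (typeof candidate.mode !== "string" || candidate.mode.length === 0) {
    problems.push("mode must be a non-empty string");
  }

  if (!Array.isArray(candidate.verbs) || candidate.verbs.length === 0) {
    problems.push("verbs must be a non-empty array");
    return problems;
  }

  // Flag blank entries and exact duplicates.
  const seen: string[] = [];
  for (const verb of candidate.verbs) {
    if (typeof verb !== "string" || verb.trim().length === 0) {
      problems.push("every verb must be a non-empty string");
    } else if (seen.indexOf(verb) !== -1) {
      problems.push(`duplicate verb: ${verb}`);
    } else {
      seen.push(verb);
    }
  }
  return problems;
}

// Usage with a shortened copy of the block from the diff.
const sampleSpinnerVerbs = {
  mode: "replace",
  verbs: ["Pushing the boulder", "Herding agents", "Letting go"],
};
const sampleProblems = checkSpinnerVerbs(sampleSpinnerVerbs);
```

Returning a list of problems rather than throwing keeps the check usable in a lint step that reports every issue at once.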

package/templates/orchestrator-validation.md CHANGED

@@ -21,7 +21,7 @@ If the recipe doesn't exist or doesn't cover what was implemented:
 If you genuinely cannot determine how to verify the feature — transition back to planning:
 ```bash
-sisyphus yield --mode planning --prompt "Cannot determine verification method for [feature] — need to establish e2e recipe"
+sisyphus orch yield --mode planning --prompt "Cannot determine verification method for [feature] — need to establish e2e recipe"
 ```
 ## The Operator Is Not Optional
@@ -63,6 +63,8 @@ Spawn validation agents with clear, specific instructions:
 For broad features, parallelize: spawn multiple agents each covering a distinct area. An operator for the UI flows, a CLI agent for backend verification, etc.
+When spawning an operator, tell it explicitly what to target — the browser URL, the Electron app name, or whichever surface applies. The operator should not have to guess whether the product is a web app or a desktop app.
 ### Review the evidence yourself
 When validation reports come back, **read them critically.** Check that the evidence actually supports the claims. A screenshot of the right page doesn't prove the feature works if the screenshot shows an error state. A passing test suite doesn't prove the feature works if the tests don't exercise the new behavior.
@@ -74,32 +76,33 @@ If a report says "all checks pass" but the evidence is thin or missing — that'
 When validation surfaces real bugs:
 ```bash
-sisyphus yield --mode implementation --prompt "Validation failed — [specific failures]. See reports/agent-XXX-final.md for details."
+sisyphus orch yield --mode implementation --prompt "Validation failed — [specific failures]. See reports/agent-XXX-final.md for details."
 ```
-Log what failed and why in the cycle log before yielding. The implementation cycle needs clear context on what to fix.
+Log what failed and why before yielding. The implementation cycle needs clear context on what to fix.
 When validation reveals that the approach itself is flawed — not bugs, but architectural issues or fundamental misunderstandings:
 ```bash
-sisyphus yield --mode planning --prompt "Validation revealed [architectural issue] — approach needs rethinking. See cycle log."
+sisyphus orch yield --mode planning --prompt "Validation revealed [architectural issue] — approach needs rethinking. See cycle log."
 ```
 **Do not attempt fixes in validation mode** beyond trivial issues (a missed import, a config typo). If the fix requires design decisions or touches multiple files, transition to implementation mode where the orchestrator has the right guidance for managing that work.
-## Completion Gate
-When all validation passes, **do not call `sisyphus complete` directly.** Yield to completion mode for user sign-off:
+## Validation CLI
 ```bash
-sisyphus yield --mode completion --prompt "Validation passed — all recipe steps verified. Ready for user review."
+sisyphus agent restart <agentId>                         # respawn a failed/killed validation agent
 ```
-Only yield to completion when:
-- Every recipe step has been executed (not skipped, not assumed)
-- Every step has evidence of success in the validation report
-- The evidence actually matches the success criteria from the recipe
+## Transition to Completion
+When all validation passes, yield to completion mode for user sign-off:
+```bash
+sisyphus orch yield --mode completion --prompt "Validation passed — all recipe steps verified. Ready for user review."
+```
-If the recipe was updated during validation, re-validate against the updated version. Completion means the current recipe passes, not that an earlier draft would have.
+Only yield when every recipe step has been executed with evidence of success. If the recipe was updated during validation, re-validate against the updated version.
-Before transitioning, step back: does the validated behavior actually satisfy the original goal? It's possible to pass every recipe step and still miss the point. The recipe is a tool, not a substitute for judgment.
+Before yielding, re-read goal.md and check recipe coverage against it — not against itself. For each clause that names a user-visible behavior or capability, find the recipe step that exercised it. If a clause has no matching step, the recipe is incomplete: extend it, re-validate, and only then yield. A passing recipe proves the recipe's steps work; it does not prove the goal was met.
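
The coverage check described in the final added paragraph can be pictured as a set difference between goal clauses and the clauses that evidenced recipe steps exercised. The sketch below assumes a data model that does not appear in the package: `GoalClause`, `RecipeStep`, `uncoveredClauses`, and the sample clauses are all hypothetical, and in practice the mapping from `goal.md` clauses to recipe steps is a judgment the orchestrator makes while reading, not something parsed automatically.

```typescript
// Hypothetical data model: none of these names come from sisyphi.
interface GoalClause {
  id: string;
  text: string; // a user-visible behavior or capability named in goal.md
}

interface RecipeStep {
  name: string;
  covers: string[]; // ids of the goal clauses this step actually exercised
  hasEvidence: boolean; // true only if the validation report shows success
}

// Returns the goal clauses that no evidenced recipe step exercised.
function uncoveredClauses(goal: GoalClause[], recipe: RecipeStep[]): GoalClause[] {
  const covered: string[] = [];
  for (const step of recipe) {
    if (!step.hasEvidence) continue; // a step without evidence proves nothing
    for (const clauseId of step.covers) {
      if (covered.indexOf(clauseId) === -1) covered.push(clauseId);
    }
  }
  return goal.filter((clause) => covered.indexOf(clause.id) === -1);
}

// Sample data, invented for illustration around the rate-limiting example.
const goalClauses: GoalClause[] = [
  { id: "limit", text: "Requests over the limit receive a 429 response" },
  { id: "headers", text: "Responses carry rate limit headers" },
];
const recipeSteps: RecipeStep[] = [
  { name: "Exceed the limit and inspect the status", covers: ["limit"], hasEvidence: true },
  { name: "Inspect response headers", covers: ["headers"], hasEvidence: false },
];

// A non-empty result means the recipe is incomplete: extend it and re-validate before yielding.
const missingClauses = uncoveredClauses(goalClauses, recipeSteps);
```

Counting only steps with evidence reflects the rule that a step which was skipped or assumed does not count toward coverage.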