sisyphi 1.0.7 → 1.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. package/dist/daemon.js +22 -14
  2. package/dist/daemon.js.map +1 -1
  3. package/dist/templates/agent-plugin/hooks/CLAUDE.md +57 -0
  4. package/dist/templates/agent-plugin/hooks/debug-user-prompt.sh +15 -0
  5. package/dist/templates/agent-plugin/hooks/hooks.json +12 -8
  6. package/dist/templates/agent-plugin/hooks/operator-user-prompt.sh +14 -0
  7. package/dist/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -4
  8. package/dist/templates/agent-plugin/hooks/review-plan-user-prompt.sh +16 -0
  9. package/dist/templates/agent-plugin/hooks/review-user-prompt.sh +16 -0
  10. package/dist/templates/agent-plugin/hooks/spec-user-prompt.sh +1 -4
  11. package/dist/templates/agent-plugin/hooks/test-spec-user-prompt.sh +14 -0
  12. package/dist/templates/companion-plugin/hooks/hooks.json +6 -4
  13. package/dist/templates/orchestrator-base.md +9 -9
  14. package/dist/templates/orchestrator-impl.md +14 -8
  15. package/dist/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +26 -12
  16. package/package.json +1 -1
  17. package/templates/agent-plugin/hooks/CLAUDE.md +57 -0
  18. package/templates/agent-plugin/hooks/debug-user-prompt.sh +15 -0
  19. package/templates/agent-plugin/hooks/hooks.json +12 -8
  20. package/templates/agent-plugin/hooks/operator-user-prompt.sh +14 -0
  21. package/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -4
  22. package/templates/agent-plugin/hooks/review-plan-user-prompt.sh +16 -0
  23. package/templates/agent-plugin/hooks/review-user-prompt.sh +16 -0
  24. package/templates/agent-plugin/hooks/spec-user-prompt.sh +1 -4
  25. package/templates/agent-plugin/hooks/test-spec-user-prompt.sh +14 -0
  26. package/templates/companion-plugin/hooks/hooks.json +6 -4
  27. package/templates/orchestrator-base.md +9 -9
  28. package/templates/orchestrator-impl.md +14 -8
  29. package/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +26 -12
@@ -0,0 +1,57 @@
+ # templates/agent-plugin/hooks/
+
+ Lifecycle hooks for agent plugin workflows. Enable specialized prompt generation and context handling during agent spawning.
+
+ ## hooks.json
+
+ Schema: `{ "phaseKey": { "hookName": "script-name.sh" } }`
+
+ Example:
+ ```json
+ {
+   "plan": {
+     "userPrompt": "plan-user-prompt.sh",
+     "systemPrompt": "plan-system-prompt.sh"
+   }
+ }
+ ```
+
+ - **Keys**: Phase names (e.g., `plan`, `spec`, `implement`) — must correspond to phase modes in agent spawn workflow
+ - **Values**: Object mapping hook types to shell script names
+ - **Hook types**: `userPrompt`, `systemPrompt` (extensible for future hooks)
+
+ ## Shell Scripts
+
+ Each script receives environment variables and outputs text to stdout.
+
+ ```bash
+ # Receives: $SISYPHUS_SESSION_ID, $SISYPHUS_AGENT_ID, $INSTRUCTION, $AGENT_TYPE, context files
+ # Outputs: Full user or system prompt text
+ ```
+
+ **Convention**: `{phase}-{hook-type}.sh`
+
+ **Inputs**:
+ - `$SISYPHUS_SESSION_ID` — Session UUID
+ - `$SISYPHUS_AGENT_ID` — Agent ID (e.g., `agent-001`)
+ - `$INSTRUCTION` — Task instruction from spawn command
+ - `$AGENT_TYPE` — Agent type (e.g., `plan`, `spec`, `implement`)
+ - Context files at `.sisyphus/sessions/$SISYPHUS_SESSION_ID/context/`
+
+ **Output**: Must write complete prompt text to stdout (no errors to stderr)
+
+ ## Invocation
+
+ Hooks are executed during agent spawn when:
+ 1. Agent type matches a plugin agent type (e.g., `--agent-type sisyphus:plan`)
+ 2. Phase has hooks configured in hooks.json
+ 3. Daemon renders prompts before passing to Claude
+
+ Output becomes the `--append-system-prompt` or user message content.
+
+ ## Key Patterns
+
+ - **No placeholders in shell scripts** — unlike `.md` templates, scripts perform logic and generate final text
+ - **Context access**: Scripts can read session state from `$SISYPHUS_SESSION_ID` directory
+ - **Error handling**: Exit non-zero to fail agent spawn; errors logged to daemon.log
+ - **Stdout only**: Scripts must output complete prompt to stdout; nothing to stderr
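The script contract documented above can be sketched as a minimal hook. This is a hypothetical illustration, not a script shipped in the package; the variable names come from the Inputs list, and the demo values are invented so the sketch runs standalone:

```shell
#!/bin/sh
# Sketch of the documented hook contract: build the complete prompt from
# the daemon-provided values and write it to stdout, nothing to stderr.
build_prompt() {
  # $1 = session UUID, $2 = agent ID, $3 = task instruction
  printf 'Session: %s\nAgent: %s\nTask: %s\n' "$1" "$2" "$3"
}

# A real hook would read $SISYPHUS_SESSION_ID, $SISYPHUS_AGENT_ID, and
# $INSTRUCTION; demo values stand in here.
build_prompt "demo-session" "agent-001" "Implement the login flow"
```

A non-zero exit from such a script would fail the agent spawn, per the Key Patterns above.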
@@ -0,0 +1,15 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce systematic methodology for debug agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <debug-reminder>
+ Systematic debugging — don't skip the fundamentals:
+
+ - Check git log/blame near the failure — recent changes are the highest-signal evidence
+ - For medium+ difficulty (crosses 2+ modules, unclear cause), spawn parallel subagents: data flow tracer, assumption auditor, change investigator
+ - Your report must include: exact failing line(s), concrete evidence (code snippets, data flow), confidence level (high/medium/low), and recommended fix
+
+ Investigate only — no code changes except reproduction tests.
+ </debug-reminder>
+ HINT
@@ -3,18 +3,22 @@
    "PreToolUse": [
      {
        "matcher": "SendMessage",
-       "hook": {
-         "type": "command",
-         "command": "bash hooks/intercept-send-message.sh"
-       }
+       "hooks": [
+         {
+           "type": "command",
+           "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/intercept-send-message.sh"
+         }
+       ]
      }
    ],
    "Stop": [
      {
-       "hook": {
-         "type": "command",
-         "command": "bash hooks/require-submit.sh"
-       }
+       "hooks": [
+         {
+           "type": "command",
+           "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/require-submit.sh"
+         }
+       ]
      }
    ]
  }
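For reference, the shape this hunk converges on is the plural `hooks` array keyed by event, with commands anchored at `${CLAUDE_PLUGIN_ROOT}`. The following is a reconstruction from the hunk above, assuming the same top-level `hooks` wrapper visible in the companion-plugin file; only the two events shown in the diff are included:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "SendMessage",
        "hooks": [
          { "type": "command", "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/intercept-send-message.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/require-submit.sh" }
        ]
      }
    ]
  }
}
```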
@@ -0,0 +1,14 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce paranoid testing for operator agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <operator-reminder>
+ Click EVERYTHING — assume something is broken and prove it:
+
+ - Every link, button, nav item, dropdown, toggle, accordion, interactive element on the page
+ - Edge cases: empty forms, duplicate submissions, back-button mid-flow, double-clicks, rapid navigation, browser refresh mid-action
+ - Check ALL sources: DOM, console errors, network failures, logs — not just what's visually obvious
+ - Spawn subagents to parallelize when scope is broad (one per page/flow/feature area) — the cost of missing a broken button is higher than an extra agent
+ </operator-reminder>
+ HINT
@@ -2,10 +2,7 @@
  # UserPromptSubmit hook: remind plan agent to delegate for large tasks.
  if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi

- python3 -c "
- import json, sys
- print(json.dumps({'additionalContext': sys.stdin.read()}))
- " <<'HINT'
+ cat <<'HINT'
  <planning-reminder>
  For particularly large or multi-domain tasks, delegate sub-plans to specialist agents rather than planning everything solo:

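The deleted `python3` one-liner in this hunk serialized the heredoc into a JSON envelope before emitting it; the 1.0.9 script just `cat`s the hint, whose plain stdout a UserPromptSubmit hook also injects as context. A standalone equivalent of what the removed wrapper did:

```python
import json

def wrap_additional_context(text: str) -> str:
    # Equivalent of the removed python3 one-liner: wrap the raw hint text
    # in the {"additionalContext": ...} JSON envelope the hook used to emit.
    return json.dumps({"additionalContext": text})

# The old script piped the <<'HINT' heredoc through this wrapper.
envelope = wrap_additional_context("<planning-reminder>delegate sub-plans</planning-reminder>")
```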
@@ -0,0 +1,16 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce cross-plan interface focus for plan review agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <review-plan-reminder>
+ The primary source of bugs is the interfaces between plans:
+
+ - Confirm critical/high findings by cross-referencing spec and code yourself — don't rubber-stamp subagent opinions
+ - Flag file ownership conflicts: any file touched by 2+ plans or agents needs explicit coordination
+ - Read actual source files for pattern consistency — don't review the plan in isolation
+ - Type definitions must have exactly one owner; flag divergent names/shapes for the same concept
+
+ You are read-only. Synthesize and report — never edit plan or code files yourself.
+ </review-plan-reminder>
+ HINT
@@ -0,0 +1,16 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce validation discipline for review agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <review-reminder>
+ Only report confirmed findings — spawn validation subagents (~1 per 3 issues) before finalizing:
+
+ - Bugs/Security: opus validates exploitable/broken
+ - Everything else: sonnet confirms significant (not nitpick)
+ - Drop anything subjective, pre-existing, or linter-catchable
+ - Every finding needs `file:line` + concrete evidence — no "this could be a problem"
+
+ You are read-only. Investigate and direct fixes through implementers — never edit code yourself.
+ </review-reminder>
+ HINT
@@ -2,10 +2,7 @@
  # UserPromptSubmit hook: remind spec agent to iterate with the user.
  if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi

- python3 -c "
- import json, sys
- print(json.dumps({'additionalContext': sys.stdin.read()}))
- " <<'HINT'
+ cat <<'HINT'
  <spec-reminder>
  Iterate with the user — include them in the process before writing anything to disk:

@@ -0,0 +1,14 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce behavioral invariants for test-spec agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <test-spec-reminder>
+ Behavioral properties, not test code:
+
+ - State behaviors as invariants: "Users can log in with email/password" — not "loginHandler calls bcrypt.compare"
+ - Each property must be independently verifiable
+ - Include negative properties — what must NOT happen is as important as what must
+ - If the change is purely mechanical with nothing to verify, submit { "testsNeeded": false }
+ </test-spec-reminder>
+ HINT
@@ -2,10 +2,12 @@
    "hooks": {
      "UserPromptSubmit": [
        {
-         "hook": {
-           "type": "command",
-           "command": "bash hooks/user-prompt-context.sh"
-         }
+         "hooks": [
+           {
+             "type": "command",
+             "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/user-prompt-context.sh"
+           }
+         ]
        }
      ]
    }
@@ -160,9 +160,9 @@ This means the roadmap evolves. Outlined phases get refined (or reworked) as you

  This applies at every level of the hierarchy. Don't produce a detailed implementation plan before you've researched and specified — detailed plans based on assumptions will change. Defer detail until you're about to execute.

- ### Validate before advancing
+ ### Validate before unverified work compounds

- Each completed phase or stage gets verified before the next one starts. Don't build on unverified work. Validation means a separate agent (not the one that did the work) confirms the change actually works — running tests, exercising behavior, reviewing code.
+ Don't let unverified work accumulate unchecked. The more stages you implement without any critique or validation, the harder it becomes to identify where things went wrong. Interleave verification cycles between implementation stages — how often depends on risk. High-risk stages (core logic, integration points) should be verified before you build on them. Low-risk stages (types, config) can be batched into a broader validation later. The failure mode to avoid is implementing everything and only validating at the end — by then, bugs are buried under layers of dependent code and the feedback is useless.

  ### Every change deserves rigor

@@ -174,15 +174,15 @@ For multi-file changes or design decisions, invest fully in the earlier phases:

  The system gives you unlimited cycles for a reason: so you never have to cut corners. Failed implementations, deferred issues, and skipped reviews are far more expensive than extra cycles. Use cycles to be thorough, not to be fast.

- **Each feature is multiple cycles, not one.** A typical feature like "auth system" is not a single implementation cycle. It's a sequence:
+ **Each feature is multiple cycles, not one.** You have three tools for ensuring quality, and your job is to apply them with judgment:

- 1. **Implement** — one or more cycles of agents writing code (sometimes the implementation itself needs multiple cycles if it's complex enough)
- 2. **Critique** — spawn review agents to find flaws, code smells, overengineering, missed edge cases. They report problems, not fixes.
- 3. **Refine** — spawn agents to fix what the reviewers found, simplify, refactor. Agents can use `/simplify` to systematically look for reuse, quality, and efficiency issues.
- 4. **Repeat 2-3** until reviewers come back clean — no feedback means you're done, not "good enough." Every issue found gets addressed. Nothing is deferred.
- 5. **Validate** — e2e verification by a separate agent that the feature actually works end-to-end
+ - **Critique** — spawn review agents to find flaws, code smells, overengineering, missed edge cases. They report problems, not fixes.
+ - **Refine** — spawn agents to fix what the reviewers found, simplify, refactor. Agents can use `/simplify` to systematically look for reuse, quality, and efficiency issues.
+ - **Validate** — e2e verification by a separate agent that the feature actually works end-to-end.

- This implement → critique → refine loop is how quality happens. Skipping it produces code that passes tests but is brittle, overengineered, or subtly wrong. Budget for it in your roadmap. Never compress it.
+ Not every stage needs every tool. A types-only stage might need none — the consumers will surface type errors. A core logic stage needs critique at minimum. An integration stage needs critique and validation. The judgment call is yours, based on risk: how much subsequent work depends on this stage being correct? How costly would a bug here be to find later?
+
+ What you must avoid is the **batch-everything-then-review-at-the-end** pattern. If you implement five stages before any critique or validation, you've turned a series of small, localizable problems into one massive, entangled debugging session. Interleave verification between implementation stages — not necessarily after every one, but often enough that you're catching problems close to where they were introduced.

  A phase like "Implement auth system" is realistically 4-6 cycles. A phase like "Frontend shell" is 8+. Be honest about scope — underestimating just means you'll lose track of where you are.

@@ -6,15 +6,21 @@

  Before starting each cycle, ask: **which stages or tasks are independent right now?** If two stages touch different subsystems (e.g., backend vs frontend, separate services, unrelated modules), spawn them concurrently — don't serialize work that doesn't need to be serialized. Use `--worktree` when parallel agents might touch overlapping files.

- Sequential execution is the default trap. Fight it actively. At every yield, look for work that can run alongside the next stage — review agents while the next implementation starts, frontend and backend stages in parallel, independent fix agents concurrently. A cycle with one agent running is a wasted cycle if other work was ready.
+ Maximize parallelism **within your development cycle, not by skipping parts of it.** Running a review alongside the next stage's implementation is good parallelism. Skipping review because the next stage is ready is not — that's cutting corners faster, not working faster. A cycle with one agent running is a wasted cycle if other work was ready, but "other work" includes critique and validation agents, not just the next implementation stage.

- If the plan has stages that share no file dependencies, **run them in parallel from the start.** Each stage is multiple cycles:
+ If the plan has stages that share no file dependencies, **run them in parallel from the start.** The development cycle for each stage involves some combination of:

  1. **Detail-plan it** — expand the high-level outline into specific file changes, informed by previous stages. If complex enough, spawn a spec agent first.
  2. **Implement it** — spawn agents with self-contained instructions (see Agent Instructions below). May itself take multiple cycles if the stage has enough work.
- 3. **Critique and refine it** — spawn parallel review agents, fix what they find, repeat until clean (see below).
- 4. **Validate it end-to-end** — spawn a validation agent with the e2e recipe. Don't advance until it passes.
- 5. **Update roadmap.md** — mark the stage done in the implementation phase, refine future stage outlines if what you learned changes the approach.
+ 3. **Critique and refine it** — spawn review agents, fix what they find (see Critique and Refinement below).
+ 4. **Validate it** — spawn a validation agent to verify the stage actually works (see E2E Validation below).
+
+ Not every stage needs every step. Use your judgment about what level of rigor each stage deserves:
+ - A types/interfaces stage might just need implementation — the next stage that consumes the types will surface any problems.
+ - A core business logic stage needs implementation + critique at minimum — subtle bugs here cascade everywhere.
+ - An integration stage or anything touching critical paths needs the full loop including validation — you're building on accumulated assumptions and need to verify they hold.
+
+ The key question each cycle: **what's the riskiest unverified work right now?** If you just finished a foundation stage and are about to build on it, validate the foundation. If you just implemented a low-risk config change, move on and batch it into a broader review later. When multiple stages have completed without any critique or validation, you've lost the feedback loop — stop implementing and catch up on verification before problems compound.

  Don't detail-plan all stages up front. What you learn implementing earlier stages should inform later ones.

@@ -52,11 +58,11 @@ When you see these reports, investigate before pushing forward. If the smell sug

  ## Critique and Refinement

- After implementation agents report, **do not advance to the next stage.** The code needs to be reviewed and refined first. This is not optional.
+ After implementation agents report, assess whether the stage needs critique before advancing. For stages that touch core logic, integration points, or critical paths — review before building on top. For low-risk stages (types, config, boilerplate), you can defer review and batch it with a later critique cycle. The failure mode is not "sometimes skipping review" — it's implementing six stages in a row without any review at all.

  ### Critique cycle

- Spawn three review agents in parallel, each attacking a different dimension:
+ When a stage warrants critique, spawn review agents in parallel, each attacking a different dimension:

  1. **Code reuse reviewer** — searches the codebase for existing utilities, helpers, and patterns that the new code duplicates. Flags any new function that reimplements existing functionality, any inline logic that could use an existing utility.

@@ -83,7 +89,7 @@ Spawn reviewers again on the refined code. If they come back with new issues, fi

  ## E2E Validation

- After the critique/refine loop produces clean code, **validate end-to-end before advancing.** This is also not optional. The implementing agent is the worst validator of its own work — same blind spots, same assumptions.
+ E2E validation confirms the implementation actually works — not just that it compiles or passes unit tests, but that the feature behaves correctly when exercised. Reserve full e2e validation for stages where you're about to build on accumulated work (integration stages, milestones where multiple stages come together) or where failure would be expensive to debug later. Not every stage needs its own e2e pass — but don't let more than 2-3 stages accumulate without one.

  Spawn a validation agent with the e2e recipe from `context/e2e-recipe.md`. The agent should:
  - Follow the setup steps exactly (build, start servers, seed data)
@@ -78,28 +78,33 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
  ### Implementation
  - [ ] Phase 1 — [foundation/types/interfaces]
  - [ ] Phase 2 — [core logic]
+ - [ ] Critique phases 1-2
  - [ ] Phase 3 — [integration/wiring]
-
- ### Validation
- - [ ] Validate full implementation
+ - [ ] Validate — smoketest full feature e2e
  - [ ] Review implementation
  ```

+ Note: critique and validation are embedded between implementation phases, not deferred to the end. Phase 1 (types) is low-risk and doesn't need its own review, but critique catches issues before Phase 3 builds on them. Validation happens after integration, when all the pieces come together.
+
  ### Cycle plan
  - **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield. (Human iterates on spec between cycles.)
  - **Cycle 2**: Spawn `sisyphus:plan` for plan. Yield.
  - **Cycle 3**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
  - **Cycle 4**: Spawn `sisyphus:implement` for Phase 1. Yield.
- - **Cycle 5**: Spawn `sisyphus:implement` for Phase 2 + `sisyphus:validate` for Phase 1 (parallel if independent). Yield.
- - **Cycle 6-8**: Continue phases, validate, review.
+ - **Cycle 5**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
+ - **Cycle 6**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
+ - **Cycle 7**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
+ - **Cycle 8**: Spawn `sisyphus:validate` for e2e smoketest. Yield.
+ - **Cycle 9**: Address validation failures or complete.

  ### Failure modes
  - **Spec needs human input**: Mark session as needing human review. Orchestrator notes open questions.
  - **Plan fails review**: Feed review issues back, respawn planner.
- - **Phase fails validation**: Feed specifics back to implement agent for that phase only.
+ - **Critique finds issues in foundation**: Fix before starting integration — don't build on shaky ground.
+ - **Validation fails**: Feed specifics back to implement agent for the failing area.

  ### Parallelization
- Phases without dependencies can run in parallel. Types/interfaces (Phase 1) must complete before implementation phases that consume them.
+ Phases without dependencies can run in parallel. Types/interfaces (Phase 1) must complete before implementation phases that consume them. Critique can run alongside detail-planning for the next phase.

  ---

@@ -119,31 +124,40 @@ Cross-cutting feature, multiple domains, needs team coordination. Uses **progres
  ### Stage Outline (high-level only — no file-level detail yet)
  1. [domain A foundation] — no deps — ~N cycles
  2. [domain B foundation] — no deps — ~N cycles
+ → critique stages 1-2 (foundation is low-risk individually, but review before building on it)
  3. [domain A implementation] — depends on 1 — ~N cycles
  4. [domain B implementation] — depends on 2 — ~N cycles
+ → critique + validate stages 3-4 (core logic, high risk — verify before integration)
  5. [integration layer] — depends on 3, 4 — ~N cycles
- 6. [integration tests] — depends on all — ~N cycles
+ → validate end-to-end (integration is where accumulated assumptions break)
+ 6. [final review] — depends on all

  ### Current Stage: [whichever is active]
  See context/plan-stage-N-{name}.md for detail plan.
  - [ ] [task-level items from detail plan]
  ```

+ Note: verification checkpoints are embedded in the stage outline, not deferred to a final phase. The level of rigor varies — foundation stages get a light critique, core logic gets critique + validation, integration gets full e2e validation. This is judgment, not formula.
+
  ### Cycle plan
  - **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield.
- - **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Do not detail any stage — no file-level specifics." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
+ - **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
  - **Cycle 3**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
  - **Cycle 4**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
- - **Cycle 5**: Validate stage 1. Spawn `sisyphus:implement` for stage 2 (if detail-planned). Detail-plan stage 3 in parallel if independent. Yield.
- - **Cycle 6+**: Continue pattern — implement current stage, validate previous, detail-plan next. Each stage follows implement → critique → refine → validate.
+ - **Cycle 5**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
+ - **Cycle 6**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
+ - **Cycle 7**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
+ - **Cycle 8**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
+ - **Cycle 9+**: Implement integration stage. Validate e2e. Final review.

  ### Failure modes
  - **Detail-plan agent can't produce quality output**: The stage is still too large. Break it into sub-stages in the outline and detail-plan each sub-stage individually.
  - **Integration failures**: Often means contracts between domains don't match. Spawn debug agent targeting the integration seam.
  - **Stage N implementation invalidates stage N+1 outline**: Update the high-level outline. This is expected — it's why you don't detail-plan everything upfront.
+ - **Critique finds issues after multiple stages built on top**: This is the scenario verification checkpoints exist to prevent. If it happens, you waited too long to review — add earlier checkpoints to the roadmap going forward.

  ### Parallelization
- Maximize within the progressive pattern. Independent stages run in parallel. Detail-planning the next stage runs alongside implementing the current one. Foundation stages complete before dependent stages. Integration waits for all domain implementations.
+ Maximize within the progressive pattern. Independent stages run in parallel. Detail-planning the next stage runs alongside implementing the current one. Critique and validation agents run alongside the next stage's planning or implementation. Foundation stages complete before dependent stages. Integration waits for all domain implementations.

  ---

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "sisyphi",
-   "version": "1.0.7",
+   "version": "1.0.9",
    "description": "tmux-integrated orchestration daemon for Claude Code multi-agent workflows",
    "license": "MIT",
    "repository": {
@@ -0,0 +1,57 @@
+ # templates/agent-plugin/hooks/
+
+ Lifecycle hooks for agent plugin workflows. Enable specialized prompt generation and context handling during agent spawning.
+
+ ## hooks.json
+
+ Schema: `{ "phaseKey": { "hookName": "script-name.sh" } }`
+
+ Example:
+ ```json
+ {
+   "plan": {
+     "userPrompt": "plan-user-prompt.sh",
+     "systemPrompt": "plan-system-prompt.sh"
+   }
+ }
+ ```
+
+ - **Keys**: Phase names (e.g., `plan`, `spec`, `implement`) — must correspond to phase modes in agent spawn workflow
+ - **Values**: Object mapping hook types to shell script names
+ - **Hook types**: `userPrompt`, `systemPrompt` (extensible for future hooks)
+
+ ## Shell Scripts
+
+ Each script receives environment variables and outputs text to stdout.
+
+ ```bash
+ # Receives: $SISYPHUS_SESSION_ID, $SISYPHUS_AGENT_ID, $INSTRUCTION, $AGENT_TYPE, context files
+ # Outputs: Full user or system prompt text
+ ```
+
+ **Convention**: `{phase}-{hook-type}.sh`
+
+ **Inputs**:
+ - `$SISYPHUS_SESSION_ID` — Session UUID
+ - `$SISYPHUS_AGENT_ID` — Agent ID (e.g., `agent-001`)
+ - `$INSTRUCTION` — Task instruction from spawn command
+ - `$AGENT_TYPE` — Agent type (e.g., `plan`, `spec`, `implement`)
+ - Context files at `.sisyphus/sessions/$SISYPHUS_SESSION_ID/context/`
+
+ **Output**: Must write complete prompt text to stdout (no errors to stderr)
+
+ ## Invocation
+
+ Hooks are executed during agent spawn when:
+ 1. Agent type matches a plugin agent type (e.g., `--agent-type sisyphus:plan`)
+ 2. Phase has hooks configured in hooks.json
+ 3. Daemon renders prompts before passing to Claude
+
+ Output becomes the `--append-system-prompt` or user message content.
+
+ ## Key Patterns
+
+ - **No placeholders in shell scripts** — unlike `.md` templates, scripts perform logic and generate final text
+ - **Context access**: Scripts can read session state from `$SISYPHUS_SESSION_ID` directory
+ - **Error handling**: Exit non-zero to fail agent spawn; errors logged to daemon.log
+ - **Stdout only**: Scripts must output complete prompt to stdout; nothing to stderr
@@ -0,0 +1,15 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce systematic methodology for debug agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <debug-reminder>
+ Systematic debugging — don't skip the fundamentals:
+
+ - Check git log/blame near the failure — recent changes are the highest-signal evidence
+ - For medium+ difficulty (crosses 2+ modules, unclear cause), spawn parallel subagents: data flow tracer, assumption auditor, change investigator
+ - Your report must include: exact failing line(s), concrete evidence (code snippets, data flow), confidence level (high/medium/low), and recommended fix
+
+ Investigate only — no code changes except reproduction tests.
+ </debug-reminder>
+ HINT
@@ -3,18 +3,22 @@
    "PreToolUse": [
      {
        "matcher": "SendMessage",
-       "hook": {
-         "type": "command",
-         "command": "bash hooks/intercept-send-message.sh"
-       }
+       "hooks": [
+         {
+           "type": "command",
+           "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/intercept-send-message.sh"
+         }
+       ]
      }
    ],
    "Stop": [
      {
-       "hook": {
-         "type": "command",
-         "command": "bash hooks/require-submit.sh"
-       }
+       "hooks": [
+         {
+           "type": "command",
+           "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/require-submit.sh"
+         }
+       ]
      }
    ]
  }
@@ -0,0 +1,14 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce paranoid testing for operator agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <operator-reminder>
+ Click EVERYTHING — assume something is broken and prove it:
+
+ - Every link, button, nav item, dropdown, toggle, accordion, interactive element on the page
+ - Edge cases: empty forms, duplicate submissions, back-button mid-flow, double-clicks, rapid navigation, browser refresh mid-action
+ - Check ALL sources: DOM, console errors, network failures, logs — not just what's visually obvious
+ - Spawn subagents to parallelize when scope is broad (one per page/flow/feature area) — the cost of missing a broken button is higher than an extra agent
+ </operator-reminder>
+ HINT
@@ -2,10 +2,7 @@
  # UserPromptSubmit hook: remind plan agent to delegate for large tasks.
  if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi

- python3 -c "
- import json, sys
- print(json.dumps({'additionalContext': sys.stdin.read()}))
- " <<'HINT'
+ cat <<'HINT'
  <planning-reminder>
  For particularly large or multi-domain tasks, delegate sub-plans to specialist agents rather than planning everything solo:

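For context on the removed wrapper: the previous version JSON-encoded the heredoc into an `additionalContext` object before emitting it, which the new version replaces with plain `cat`. A standalone sketch of the old behavior (the reminder body below is a placeholder, not the real hook text):

```shell
#!/bin/bash
# Reproduces the removed python3 wrapper: stdin becomes the value of an
# "additionalContext" JSON field printed on stdout.
out="$(python3 -c "
import json, sys
print(json.dumps({'additionalContext': sys.stdin.read()}))
" <<'HINT'
<planning-reminder>
example reminder body
</planning-reminder>
HINT
)"
printf '%s\n' "$out"
```

The wrapper's only effect was the JSON envelope around the raw heredoc text, which is why swapping it for `cat` leaves the reminder content itself unchanged.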
@@ -0,0 +1,16 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce cross-plan interface focus for plan review agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <review-plan-reminder>
+ The primary source of bugs is the interfaces between plans:
+
+ - Confirm critical/high findings by cross-referencing spec and code yourself — don't rubber-stamp subagent opinions
+ - Flag file ownership conflicts: any file touched by 2+ plans or agents needs explicit coordination
+ - Read actual source files for pattern consistency — don't review the plan in isolation
+ - Type definitions must have exactly one owner; flag divergent names/shapes for the same concept
+
+ You are read-only. Synthesize and report — never edit plan or code files yourself.
+ </review-plan-reminder>
+ HINT
@@ -0,0 +1,16 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce validation discipline for review agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <review-reminder>
+ Only report confirmed findings — spawn validation subagents (~1 per 3 issues) before finalizing:
+
+ - Bugs/Security: opus validates exploitable/broken
+ - Everything else: sonnet confirms significant (not nitpick)
+ - Drop anything subjective, pre-existing, or linter-catchable
+ - Every finding needs `file:line` + concrete evidence — no "this could be a problem"
+
+ You are read-only. Investigate and direct fixes through implementers — never edit code yourself.
+ </review-reminder>
+ HINT
@@ -2,10 +2,7 @@
  # UserPromptSubmit hook: remind spec agent to iterate with the user.
  if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi

- python3 -c "
- import json, sys
- print(json.dumps({'additionalContext': sys.stdin.read()}))
- " <<'HINT'
+ cat <<'HINT'
  <spec-reminder>
  Iterate with the user — include them in the process before writing anything to disk:

@@ -0,0 +1,14 @@
+ #!/bin/bash
+ # UserPromptSubmit hook: reinforce behavioral invariants for test-spec agents.
+ if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+ cat <<'HINT'
+ <test-spec-reminder>
+ Behavioral properties, not test code:
+
+ - State behaviors as invariants: "Users can log in with email/password" — not "loginHandler calls bcrypt.compare"
+ - Each property must be independently verifiable
+ - Include negative properties — what must NOT happen is as important as what must
+ - If the change is purely mechanical with nothing to verify, submit { "testsNeeded": false }
+ </test-spec-reminder>
+ HINT
@@ -2,10 +2,12 @@
   "hooks": {
     "UserPromptSubmit": [
       {
-        "hook": {
-          "type": "command",
-          "command": "bash hooks/user-prompt-context.sh"
-        }
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/user-prompt-context.sh"
+          }
+        ]
       }
     ]
   }