sisyphi 1.0.7 → 1.0.9
- package/dist/daemon.js +22 -14
- package/dist/daemon.js.map +1 -1
- package/dist/templates/agent-plugin/hooks/CLAUDE.md +57 -0
- package/dist/templates/agent-plugin/hooks/debug-user-prompt.sh +15 -0
- package/dist/templates/agent-plugin/hooks/hooks.json +12 -8
- package/dist/templates/agent-plugin/hooks/operator-user-prompt.sh +14 -0
- package/dist/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -4
- package/dist/templates/agent-plugin/hooks/review-plan-user-prompt.sh +16 -0
- package/dist/templates/agent-plugin/hooks/review-user-prompt.sh +16 -0
- package/dist/templates/agent-plugin/hooks/spec-user-prompt.sh +1 -4
- package/dist/templates/agent-plugin/hooks/test-spec-user-prompt.sh +14 -0
- package/dist/templates/companion-plugin/hooks/hooks.json +6 -4
- package/dist/templates/orchestrator-base.md +9 -9
- package/dist/templates/orchestrator-impl.md +14 -8
- package/dist/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +26 -12
- package/package.json +1 -1
- package/templates/agent-plugin/hooks/CLAUDE.md +57 -0
- package/templates/agent-plugin/hooks/debug-user-prompt.sh +15 -0
- package/templates/agent-plugin/hooks/hooks.json +12 -8
- package/templates/agent-plugin/hooks/operator-user-prompt.sh +14 -0
- package/templates/agent-plugin/hooks/plan-user-prompt.sh +1 -4
- package/templates/agent-plugin/hooks/review-plan-user-prompt.sh +16 -0
- package/templates/agent-plugin/hooks/review-user-prompt.sh +16 -0
- package/templates/agent-plugin/hooks/spec-user-prompt.sh +1 -4
- package/templates/agent-plugin/hooks/test-spec-user-prompt.sh +14 -0
- package/templates/companion-plugin/hooks/hooks.json +6 -4
- package/templates/orchestrator-base.md +9 -9
- package/templates/orchestrator-impl.md +14 -8
- package/templates/orchestrator-plugin/skills/orchestration/task-patterns.md +26 -12
|
@@ -0,0 +1,57 @@
+# templates/agent-plugin/hooks/
+
+Lifecycle hooks for agent plugin workflows. Enable specialized prompt generation and context handling during agent spawning.
+
+## hooks.json
+
+Schema: `{ "phaseKey": { "hookName": "script-name.sh" } }`
+
+Example:
+```json
+{
+  "plan": {
+    "userPrompt": "plan-user-prompt.sh",
+    "systemPrompt": "plan-system-prompt.sh"
+  }
+}
+```
+
+- **Keys**: Phase names (e.g., `plan`, `spec`, `implement`) — must correspond to phase modes in agent spawn workflow
+- **Values**: Object mapping hook types to shell script names
+- **Hook types**: `userPrompt`, `systemPrompt` (extensible for future hooks)
+
+## Shell Scripts
+
+Each script receives environment variables and outputs text to stdout.
+
+```bash
+# Receives: $SISYPHUS_SESSION_ID, $SISYPHUS_AGENT_ID, $INSTRUCTION, $AGENT_TYPE, context files
+# Outputs: Full user or system prompt text
+```
+
+**Convention**: `{phase}-{hook-type}.sh`
+
+**Inputs**:
+- `$SISYPHUS_SESSION_ID` — Session UUID
+- `$SISYPHUS_AGENT_ID` — Agent ID (e.g., `agent-001`)
+- `$INSTRUCTION` — Task instruction from spawn command
+- `$AGENT_TYPE` — Agent type (e.g., `plan`, `spec`, `implement`)
+- Context files at `.sisyphus/sessions/$SISYPHUS_SESSION_ID/context/`
+
+**Output**: Must write complete prompt text to stdout (no errors to stderr)
+
+## Invocation
+
+Hooks are executed during agent spawn when:
+1. Agent type matches a plugin agent type (e.g., `--agent-type sisyphus:plan`)
+2. Phase has hooks configured in hooks.json
+3. Daemon renders prompts before passing to Claude
+
+Output becomes the `--append-system-prompt` or user message content.
+
+## Key Patterns
+
+- **No placeholders in shell scripts** — unlike `.md` templates, scripts perform logic and generate final text
+- **Context access**: Scripts can read session state from `$SISYPHUS_SESSION_ID` directory
+- **Error handling**: Exit non-zero to fail agent spawn; errors logged to daemon.log
+- **Stdout only**: Scripts must output complete prompt to stdout; nothing to stderr
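The README added in the hunk above describes a contract for hook scripts: read a few environment variables, optionally look at the session's context directory, and write the finished prompt to stdout. A minimal sketch of a script body that follows that contract might look like the block below. The environment variable names and the context path are taken from the README; the `render_prompt` function name, the prompt wording, and the choice to fail on a missing `$INSTRUCTION` are assumptions made for illustration, not the package's actual behavior.

```bash
#!/bin/bash
# Hypothetical hook body for a script named plan-user-prompt.sh
# ({phase}-{hook-type}.sh). The environment variable names come from the
# README in this hunk; render_prompt and the prompt wording are assumptions.

render_prompt() {
  # Without an instruction there is nothing to render: return non-zero so the
  # caller can fail the spawn, as the README describes.
  if [ -z "${INSTRUCTION:-}" ]; then
    return 1
  fi

  context_dir=".sisyphus/sessions/${SISYPHUS_SESSION_ID:-}/context"

  printf 'Agent: %s (%s)\n' "${SISYPHUS_AGENT_ID:-unknown}" "${AGENT_TYPE:-unknown}"
  printf 'Task: %s\n' "$INSTRUCTION"

  # Mention context files only when the session directory exists.
  if [ -d "$context_dir" ]; then
    for f in "$context_dir"/*; do
      if [ -e "$f" ]; then
        printf 'Context file: %s\n' "$f"
      fi
    done
  fi
}
```

A real script would presumably end by calling `render_prompt` and exiting with its status. Keeping the logic in a function makes it easy to exercise from a test without spawning an agent.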
@@ -0,0 +1,15 @@
+#!/bin/bash
+# UserPromptSubmit hook: reinforce systematic methodology for debug agents.
+if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+cat <<'HINT'
+<debug-reminder>
+Systematic debugging — don't skip the fundamentals:
+
+- Check git log/blame near the failure — recent changes are the highest-signal evidence
+- For medium+ difficulty (crosses 2+ modules, unclear cause), spawn parallel subagents: data flow tracer, assumption auditor, change investigator
+- Your report must include: exact failing line(s), concrete evidence (code snippets, data flow), confidence level (high/medium/low), and recommended fix
+
+Investigate only — no code changes except reproduction tests.
+</debug-reminder>
+HINT
@@ -3,18 +3,22 @@
     "PreToolUse": [
       {
         "matcher": "SendMessage",
-        "
-
-
-
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/intercept-send-message.sh"
+          }
+        ]
       }
     ],
     "Stop": [
       {
-        "
-
-
-
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/require-submit.sh"
+          }
+        ]
       }
     ]
   }
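The added lines in the hunk above nest each command under a `hooks` array whose entries carry `type` and `command` keys, and each `command` points at a script under `hooks/`. When reviewing a change like this, it can help to list the scripts a `hooks.json` file refers to and compare that list with the files that ship in the package. The helper below is a hypothetical sketch for that check, not part of the package. It assumes script paths look like `hooks/<name>.sh` and uses plain `grep` rather than a JSON parser.

```bash
#!/bin/bash
# Hypothetical review helper (not part of the package): list the hook scripts
# that a hooks.json file references through its "command" entries.

list_hook_scripts() {
  hooks_json="$1"
  # Extract every "hooks/<name>.sh" fragment, one per line, de-duplicated.
  # The bare "hooks" key does not match because the pattern requires "hooks/".
  grep -o 'hooks/[A-Za-z0-9_-]*\.sh' "$hooks_json" | sort -u
}
```

Running `list_hook_scripts path/to/hooks.json` prints one relative script path per line, which can then be checked against the directory listing.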
@@ -0,0 +1,14 @@
+#!/bin/bash
+# UserPromptSubmit hook: reinforce paranoid testing for operator agents.
+if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+cat <<'HINT'
+<operator-reminder>
+Click EVERYTHING — assume something is broken and prove it:
+
+- Every link, button, nav item, dropdown, toggle, accordion, interactive element on the page
+- Edge cases: empty forms, duplicate submissions, back-button mid-flow, double-clicks, rapid navigation, browser refresh mid-action
+- Check ALL sources: DOM, console errors, network failures, logs — not just what's visually obvious
+- Spawn subagents to parallelize when scope is broad (one per page/flow/feature area) — the cost of missing a broken button is higher than an extra agent
+</operator-reminder>
+HINT
@@ -2,10 +2,7 @@
 # UserPromptSubmit hook: remind plan agent to delegate for large tasks.
 if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
 
-
-import json, sys
-print(json.dumps({'additionalContext': sys.stdin.read()}))
-" <<'HINT'
+cat <<'HINT'
 <planning-reminder>
 For particularly large or multi-domain tasks, delegate sub-plans to specialist agents rather than planning everything solo:
 
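In the hunk above, the removed lines wrapped the heredoc text in a JSON object under an `additionalContext` key, while the added line writes the same text to stdout unchanged with `cat <<'HINT'`. The sketch below isolates that pattern. The guard and the quoted heredoc delimiter mirror the scripts in this diff; the `emit_reminder` function name and the reminder text are placeholders, and `return 0` stands in for the scripts' `exit 0` so the function can be called from a test.

```bash
#!/bin/bash
# Illustrative sketch of the guard + quoted-heredoc pattern used by the hooks
# in this diff. emit_reminder and the reminder text are hypothetical.

emit_reminder() {
  # Outside a Sisyphus session, emit nothing and succeed.
  if [ -z "${SISYPHUS_SESSION_ID:-}" ]; then
    return 0
  fi

  # The quoted delimiter ('HINT') disables parameter and command expansion,
  # so the text below reaches stdout exactly as written.
  cat <<'HINT'
<example-reminder>
Literal text: $HOME is not expanded here.
</example-reminder>
HINT
}
```

Because the delimiter is quoted, reminder text can contain `$`, backticks, or quotes without escaping, which suits prompt text that is meant to be passed through verbatim.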
@@ -0,0 +1,16 @@
+#!/bin/bash
+# UserPromptSubmit hook: reinforce cross-plan interface focus for plan review agents.
+if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+cat <<'HINT'
+<review-plan-reminder>
+The primary source of bugs is the interfaces between plans:
+
+- Confirm critical/high findings by cross-referencing spec and code yourself — don't rubber-stamp subagent opinions
+- Flag file ownership conflicts: any file touched by 2+ plans or agents needs explicit coordination
+- Read actual source files for pattern consistency — don't review the plan in isolation
+- Type definitions must have exactly one owner; flag divergent names/shapes for the same concept
+
+You are read-only. Synthesize and report — never edit plan or code files yourself.
+</review-plan-reminder>
+HINT
@@ -0,0 +1,16 @@
+#!/bin/bash
+# UserPromptSubmit hook: reinforce validation discipline for review agents.
+if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+cat <<'HINT'
+<review-reminder>
+Only report confirmed findings — spawn validation subagents (~1 per 3 issues) before finalizing:
+
+- Bugs/Security: opus validates exploitable/broken
+- Everything else: sonnet confirms significant (not nitpick)
+- Drop anything subjective, pre-existing, or linter-catchable
+- Every finding needs `file:line` + concrete evidence — no "this could be a problem"
+
+You are read-only. Investigate and direct fixes through implementers — never edit code yourself.
+</review-reminder>
+HINT
@@ -2,10 +2,7 @@
 # UserPromptSubmit hook: remind spec agent to iterate with the user.
 if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
 
-
-import json, sys
-print(json.dumps({'additionalContext': sys.stdin.read()}))
-" <<'HINT'
+cat <<'HINT'
 <spec-reminder>
 Iterate with the user — include them in the process before writing anything to disk:
 
@@ -0,0 +1,14 @@
+#!/bin/bash
+# UserPromptSubmit hook: reinforce behavioral invariants for test-spec agents.
+if [ -z "$SISYPHUS_SESSION_ID" ]; then exit 0; fi
+
+cat <<'HINT'
+<test-spec-reminder>
+Behavioral properties, not test code:
+
+- State behaviors as invariants: "Users can log in with email/password" — not "loginHandler calls bcrypt.compare"
+- Each property must be independently verifiable
+- Include negative properties — what must NOT happen is as important as what must
+- If the change is purely mechanical with nothing to verify, submit { "testsNeeded": false }
+</test-spec-reminder>
+HINT
@@ -2,10 +2,12 @@
   "hooks": {
     "UserPromptSubmit": [
       {
-        "
-
-
-
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/user-prompt-context.sh"
+          }
+        ]
       }
     ]
   }
@@ -160,9 +160,9 @@ This means the roadmap evolves. Outlined phases get refined (or reworked) as you
 
 This applies at every level of the hierarchy. Don't produce a detailed implementation plan before you've researched and specified — detailed plans based on assumptions will change. Defer detail until you're about to execute.
 
-### Validate before
+### Validate before unverified work compounds
 
-
+Don't let unverified work accumulate unchecked. The more stages you implement without any critique or validation, the harder it becomes to identify where things went wrong. Interleave verification cycles between implementation stages — how often depends on risk. High-risk stages (core logic, integration points) should be verified before you build on them. Low-risk stages (types, config) can be batched into a broader validation later. The failure mode to avoid is implementing everything and only validating at the end — by then, bugs are buried under layers of dependent code and the feedback is useless.
 
 ### Every change deserves rigor
 
@@ -174,15 +174,15 @@ For multi-file changes or design decisions, invest fully in the earlier phases:
 
 The system gives you unlimited cycles for a reason: so you never have to cut corners. Failed implementations, deferred issues, and skipped reviews are far more expensive than extra cycles. Use cycles to be thorough, not to be fast.
 
-**Each feature is multiple cycles, not one.**
+**Each feature is multiple cycles, not one.** You have three tools for ensuring quality, and your job is to apply them with judgment:
 
-
-
-
-4. **Repeat 2-3** until reviewers come back clean — no feedback means you're done, not "good enough." Every issue found gets addressed. Nothing is deferred.
-5. **Validate** — e2e verification by a separate agent that the feature actually works end-to-end
+- **Critique** — spawn review agents to find flaws, code smells, overengineering, missed edge cases. They report problems, not fixes.
+- **Refine** — spawn agents to fix what the reviewers found, simplify, refactor. Agents can use `/simplify` to systematically look for reuse, quality, and efficiency issues.
+- **Validate** — e2e verification by a separate agent that the feature actually works end-to-end.
 
-
+Not every stage needs every tool. A types-only stage might need none — the consumers will surface type errors. A core logic stage needs critique at minimum. An integration stage needs critique and validation. The judgment call is yours, based on risk: how much subsequent work depends on this stage being correct? How costly would a bug here be to find later?
+
+What you must avoid is the **batch-everything-then-review-at-the-end** pattern. If you implement five stages before any critique or validation, you've turned a series of small, localizable problems into one massive, entangled debugging session. Interleave verification between implementation stages — not necessarily after every one, but often enough that you're catching problems close to where they were introduced.
 
 A phase like "Implement auth system" is realistically 4-6 cycles. A phase like "Frontend shell" is 8+. Be honest about scope — underestimating just means you'll lose track of where you are.
 
@@ -6,15 +6,21 @@
 
 Before starting each cycle, ask: **which stages or tasks are independent right now?** If two stages touch different subsystems (e.g., backend vs frontend, separate services, unrelated modules), spawn them concurrently — don't serialize work that doesn't need to be serialized. Use `--worktree` when parallel agents might touch overlapping files.
 
-
+Maximize parallelism **within your development cycle, not by skipping parts of it.** Running a review alongside the next stage's implementation is good parallelism. Skipping review because the next stage is ready is not — that's cutting corners faster, not working faster. A cycle with one agent running is a wasted cycle if other work was ready, but "other work" includes critique and validation agents, not just the next implementation stage.
 
-If the plan has stages that share no file dependencies, **run them in parallel from the start.**
+If the plan has stages that share no file dependencies, **run them in parallel from the start.** The development cycle for each stage involves some combination of:
 
 1. **Detail-plan it** — expand the high-level outline into specific file changes, informed by previous stages. If complex enough, spawn a spec agent first.
 2. **Implement it** — spawn agents with self-contained instructions (see Agent Instructions below). May itself take multiple cycles if the stage has enough work.
-3. **Critique and refine it** — spawn
-4. **Validate it
-
+3. **Critique and refine it** — spawn review agents, fix what they find (see Critique and Refinement below).
+4. **Validate it** — spawn a validation agent to verify the stage actually works (see E2E Validation below).
+
+Not every stage needs every step. Use your judgment about what level of rigor each stage deserves:
+- A types/interfaces stage might just need implementation — the next stage that consumes the types will surface any problems.
+- A core business logic stage needs implementation + critique at minimum — subtle bugs here cascade everywhere.
+- An integration stage or anything touching critical paths needs the full loop including validation — you're building on accumulated assumptions and need to verify they hold.
+
+The key question each cycle: **what's the riskiest unverified work right now?** If you just finished a foundation stage and are about to build on it, validate the foundation. If you just implemented a low-risk config change, move on and batch it into a broader review later. When multiple stages have completed without any critique or validation, you've lost the feedback loop — stop implementing and catch up on verification before problems compound.
 
 Don't detail-plan all stages up front. What you learn implementing earlier stages should inform later ones.
 
@@ -52,11 +58,11 @@ When you see these reports, investigate before pushing forward. If the smell sug
 
 ## Critique and Refinement
 
-After implementation agents report,
+After implementation agents report, assess whether the stage needs critique before advancing. For stages that touch core logic, integration points, or critical paths — review before building on top. For low-risk stages (types, config, boilerplate), you can defer review and batch it with a later critique cycle. The failure mode is not "sometimes skipping review" — it's implementing six stages in a row without any review at all.
 
 ### Critique cycle
 
-
+When a stage warrants critique, spawn review agents in parallel, each attacking a different dimension:
 
 1. **Code reuse reviewer** — searches the codebase for existing utilities, helpers, and patterns that the new code duplicates. Flags any new function that reimplements existing functionality, any inline logic that could use an existing utility.
 
@@ -83,7 +89,7 @@ Spawn reviewers again on the refined code. If they come back with new issues, fi
 
 ## E2E Validation
 
-
+E2E validation confirms the implementation actually works — not just that it compiles or passes unit tests, but that the feature behaves correctly when exercised. Reserve full e2e validation for stages where you're about to build on accumulated work (integration stages, milestones where multiple stages come together) or where failure would be expensive to debug later. Not every stage needs its own e2e pass — but don't let more than 2-3 stages accumulate without one.
 
 Spawn a validation agent with the e2e recipe from `context/e2e-recipe.md`. The agent should:
 - Follow the setup steps exactly (build, start servers, seed data)
@@ -78,28 +78,33 @@ Feature with moderate complexity. Requirements may need clarification. Multiple
 ### Implementation
 - [ ] Phase 1 — [foundation/types/interfaces]
 - [ ] Phase 2 — [core logic]
+- [ ] Critique phases 1-2
 - [ ] Phase 3 — [integration/wiring]
-
-### Validation
-- [ ] Validate full implementation
+- [ ] Validate — smoketest full feature e2e
 - [ ] Review implementation
 ```
 
+Note: critique and validation are embedded between implementation phases, not deferred to the end. Phase 1 (types) is low-risk and doesn't need its own review, but critique catches issues before Phase 3 builds on them. Validation happens after integration, when all the pieces come together.
+
 ### Cycle plan
 - **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield. (Human iterates on spec between cycles.)
 - **Cycle 2**: Spawn `sisyphus:plan` for plan. Yield.
 - **Cycle 3**: Spawn `sisyphus:review-plan` for review. If fail, respawn plan with issues. Yield.
 - **Cycle 4**: Spawn `sisyphus:implement` for Phase 1. Yield.
-- **Cycle 5**: Spawn `sisyphus:implement` for Phase 2
-- **Cycle 6
+- **Cycle 5**: Spawn `sisyphus:implement` for Phase 2. Phase 1 is types — low risk, doesn't need its own validation. Yield.
+- **Cycle 6**: Spawn `sisyphus:review` for critique of phases 1-2. This is the checkpoint before integration builds on top. Yield.
+- **Cycle 7**: Address critique findings + spawn `sisyphus:implement` for Phase 3. Yield.
+- **Cycle 8**: Spawn `sisyphus:validate` for e2e smoketest. Yield.
+- **Cycle 9**: Address validation failures or complete.
 
 ### Failure modes
 - **Spec needs human input**: Mark session as needing human review. Orchestrator notes open questions.
 - **Plan fails review**: Feed review issues back, respawn planner.
-- **
+- **Critique finds issues in foundation**: Fix before starting integration — don't build on shaky ground.
+- **Validation fails**: Feed specifics back to implement agent for the failing area.
 
 ### Parallelization
-Phases without dependencies can run in parallel. Types/interfaces (Phase 1) must complete before implementation phases that consume them.
+Phases without dependencies can run in parallel. Types/interfaces (Phase 1) must complete before implementation phases that consume them. Critique can run alongside detail-planning for the next phase.
 
 ---
 
@@ -119,31 +124,40 @@ Cross-cutting feature, multiple domains, needs team coordination. Uses **progres
 ### Stage Outline (high-level only — no file-level detail yet)
 1. [domain A foundation] — no deps — ~N cycles
 2. [domain B foundation] — no deps — ~N cycles
+→ critique stages 1-2 (foundation is low-risk individually, but review before building on it)
 3. [domain A implementation] — depends on 1 — ~N cycles
 4. [domain B implementation] — depends on 2 — ~N cycles
+→ critique + validate stages 3-4 (core logic, high risk — verify before integration)
 5. [integration layer] — depends on 3, 4 — ~N cycles
-
+→ validate end-to-end (integration is where accumulated assumptions break)
+6. [final review] — depends on all
 
 ### Current Stage: [whichever is active]
 See context/plan-stage-N-{name}.md for detail plan.
 - [ ] [task-level items from detail plan]
 ```
 
+Note: verification checkpoints are embedded in the stage outline, not deferred to a final phase. The level of rigor varies — foundation stages get a light critique, core logic gets critique + validation, integration gets full e2e validation. This is judgment, not formula.
+
 ### Cycle plan
 - **Cycle 1**: Spawn `sisyphus:spec-draft` for spec. Yield.
-- **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates.
+- **Cycle 2**: Spawn `sisyphus:plan` for **high-level stage outline only**. Instruction: "Outline stages, dependencies, one-sentence descriptions, cycle estimates. Include verification checkpoints between stages based on risk." Spawn `sisyphus:test-spec` for test properties (parallel). Yield.
 - **Cycle 3**: Review outline. Spawn `sisyphus:plan` to **detail-plan stage 1 only** (provide outline as context). Output to `context/plan-stage-1-{name}.md`. Yield.
 - **Cycle 4**: Spawn `sisyphus:implement` for stage 1. If stage 2 is independent, spawn `sisyphus:plan` to detail-plan stage 2 in parallel. Yield.
-- **Cycle 5**:
-- **Cycle 6
+- **Cycle 5**: Spawn `sisyphus:implement` for stage 2 (if detail-planned). Spawn `sisyphus:review` to critique stages 1-2 in parallel — foundation review before core logic builds on it. Detail-plan stage 3 in parallel. Yield.
+- **Cycle 6**: Address critique findings. Spawn `sisyphus:implement` for stage 3. Yield.
+- **Cycle 7**: Spawn `sisyphus:implement` for stage 4. Spawn `sisyphus:review` to critique stage 3 in parallel. Yield.
+- **Cycle 8**: Spawn `sisyphus:validate` for stages 3-4 — core logic checkpoint before integration. Address stage 3 critique. Yield.
+- **Cycle 9+**: Implement integration stage. Validate e2e. Final review.
 
 ### Failure modes
 - **Detail-plan agent can't produce quality output**: The stage is still too large. Break it into sub-stages in the outline and detail-plan each sub-stage individually.
 - **Integration failures**: Often means contracts between domains don't match. Spawn debug agent targeting the integration seam.
 - **Stage N implementation invalidates stage N+1 outline**: Update the high-level outline. This is expected — it's why you don't detail-plan everything upfront.
+- **Critique finds issues after multiple stages built on top**: This is the scenario verification checkpoints exist to prevent. If it happens, you waited too long to review — add earlier checkpoints to the roadmap going forward.
 
 ### Parallelization
-Maximize within the progressive pattern. Independent stages run in parallel. Detail-planning the next stage runs alongside implementing the current one. Foundation stages complete before dependent stages. Integration waits for all domain implementations.
+Maximize within the progressive pattern. Independent stages run in parallel. Detail-planning the next stage runs alongside implementing the current one. Critique and validation agents run alongside the next stage's planning or implementation. Foundation stages complete before dependent stages. Integration waits for all domain implementations.
 
 ---
 
package/package.json
CHANGED
|