@caseyharalson/orrery 0.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.devcontainer.example/Dockerfile +149 -0
- package/.devcontainer.example/devcontainer.json +61 -0
- package/.devcontainer.example/init-firewall.sh +175 -0
- package/LICENSE +21 -0
- package/README.md +139 -0
- package/agent/skills/discovery/SKILL.md +428 -0
- package/agent/skills/discovery/schemas/plan-schema.yaml +138 -0
- package/agent/skills/orrery-execute/SKILL.md +107 -0
- package/agent/skills/orrery-report/SKILL.md +119 -0
- package/agent/skills/orrery-review/SKILL.md +105 -0
- package/agent/skills/orrery-verify/SKILL.md +105 -0
- package/agent/skills/refine-plan/SKILL.md +291 -0
- package/agent/skills/simulate-plan/SKILL.md +244 -0
- package/bin/orrery.js +5 -0
- package/lib/cli/commands/help.js +21 -0
- package/lib/cli/commands/ingest-plan.js +56 -0
- package/lib/cli/commands/init.js +21 -0
- package/lib/cli/commands/install-devcontainer.js +97 -0
- package/lib/cli/commands/install-skills.js +182 -0
- package/lib/cli/commands/orchestrate.js +27 -0
- package/lib/cli/commands/resume.js +146 -0
- package/lib/cli/commands/status.js +137 -0
- package/lib/cli/commands/validate-plan.js +288 -0
- package/lib/cli/index.js +57 -0
- package/lib/orchestration/agent-invoker.js +595 -0
- package/lib/orchestration/condensed-plan.js +128 -0
- package/lib/orchestration/config.js +213 -0
- package/lib/orchestration/dependency-resolver.js +149 -0
- package/lib/orchestration/edit-invoker.js +115 -0
- package/lib/orchestration/index.js +1065 -0
- package/lib/orchestration/plan-loader.js +212 -0
- package/lib/orchestration/progress-tracker.js +208 -0
- package/lib/orchestration/report-format.js +80 -0
- package/lib/orchestration/review-invoker.js +305 -0
- package/lib/utils/agent-detector.js +47 -0
- package/lib/utils/git.js +297 -0
- package/lib/utils/paths.js +43 -0
- package/lib/utils/plan-detect.js +24 -0
- package/lib/utils/skill-copier.js +79 -0
- package/package.json +58 -0
package/agent/skills/orrery-report/SKILL.md
@@ -0,0 +1,119 @@
---
name: orrery-report
description: >
  Final reporting phase for the orrery workflow. Outputs structured JSON
  summarizing execution results, test outcomes, and status. Use after
  orrery-verify completes or when reporting blocked status.
user-invocable: false
---

# Report Skill

## When to Use

Use this skill **after verification** to finalize your work on a step and communicate the result to the Orchestrator.

**Triggers:**

- Execution and Verification phases are complete.
- You have been handed off from the **Verify** skill.
- Or, you need to report a **Blocked** status.

---

## Output Contract (CRITICAL)

The Orchestrator expects a **single JSON object** printed to `stdout`. This is how you "report" your status.

**Success JSON:**

```json
{
  "stepId": "<id>",
  "status": "complete",
  "summary": "Brief description of work done and verification results",
  "artifacts": ["file1.js", "src/components/NewComp.tsx"],
  "testResults": "Passed 8/8 tests",
  "commitMessage": "feat: add user authentication with session handling"
}
```

**Blocked JSON:**

```json
{
  "stepId": "<id>",
  "status": "blocked",
  "blockedReason": "Detailed reason why the step cannot be completed",
  "summary": "Work attempted but failed due to..."
}
```

**Rules:**

1. **NO Markdown:** Do not wrap the JSON in triple-backtick code blocks (e.g., ` ```json ... ``` `).
2. **Clean Output:** Ensure the JSON is valid and on its own line.
3. **One Object Per Step:** If you are working on multiple steps, you may output multiple JSON objects, one per line (see the parsing sketch below).
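
Because the contract is one JSON object per line on `stdout`, a consumer has to pick report objects out of whatever else the agent prints. The sketch below is illustrative only and is not orrery's actual parsing code; `extractReports` is a hypothetical helper that keeps any stdout line that parses as an object with `stepId` and `status` fields.

```js
// Illustrative sketch only: one way a consumer could extract report objects
// from agent stdout under the one-JSON-object-per-line contract described above.
// `extractReports` is a hypothetical name, not part of the orrery API.
function extractReports(stdoutText) {
  const reports = [];
  for (const line of stdoutText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // skip ordinary log lines
    try {
      const obj = JSON.parse(trimmed);
      if (obj && typeof obj.stepId === "string" && typeof obj.status === "string") {
        reports.push(obj); // looks like a step report
      }
    } catch {
      // not valid JSON on this line; ignore it
    }
  }
  return reports;
}

// Example: a log line followed by a report line.
const sample = 'log: finishing up\n{"stepId":"3","status":"complete","summary":"done"}\n';
console.log(extractReports(sample)); // [ { stepId: '3', status: 'complete', summary: 'done' } ]
```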

---

## How to Do It

### Step 1: Gather Information

Collect from your execution and verification:

- **Status:** Did it pass verification? (`complete` vs `blocked`)
- **Artifacts:** List of files created or modified.
- **Summary:** A concise sentence describing the implementation.
- **Test Results:** Summary of test outcomes (e.g., "Pass: 5, Fail: 0").

### Step 2: Construct the JSON

Map your gathered info to the JSON fields:

- `stepId`: The ID of the plan step you are working on.
- `status`: "complete" (if verified) or "blocked".
- `summary`: Human-readable explanation.
- `artifacts`: Array of file paths.
- `commitMessage`: A conventional commit message (e.g., "feat: add login", "fix: resolve null check").

### Step 3: Output to Stdout

Print the JSON string. **This is your final action.**
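
As a concrete illustration of Steps 2 and 3, a minimal Node sketch (not part of the orrery package; the values are placeholders) that builds the report and prints it as a single line with no markdown wrapper might look like this:

```js
// Minimal sketch of Steps 2 and 3: build the report and print it as one line.
// The field values are examples only; nothing here is orrery's own code.
const report = {
  stepId: "3",
  status: "complete",
  summary: "Implemented login logic and verified with unit tests",
  artifacts: ["src/auth/login.ts"],
  testResults: "2/2 passed",
  commitMessage: "feat: add login endpoint with session handling",
};

// JSON.stringify with no indentation keeps the object on a single line,
// which satisfies the "one object per line, no markdown fences" rules above.
process.stdout.write(JSON.stringify(report) + "\n");
```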

---

## Example

**Scenario:** You implemented a "Login" feature (Step 3).

**1. Handoff:** You came from `verify` where 2/2 tests passed.

**2. Constructing Report:**

- ID: "3"
- Status: "complete"
- Summary: "Implemented login logic."
- Artifacts: ["src/auth/login.ts"]
- Test Results: "2/2 passed"

**3. Action:**

```json
{
  "stepId": "3",
  "status": "complete",
  "summary": "Implemented login logic and verified with unit tests",
  "artifacts": ["src/auth/login.ts"],
  "testResults": "2/2 passed",
  "commitMessage": "feat: add login endpoint with session handling"
}
```

---

## Common Pitfalls

- **Outputting Text:** "I have finished the step." (The Orchestrator cannot read this.)
- **Markdown Blocks:** Wrapping JSON in ` ```json ... ``` ` breaks the parser.
- **Invalid JSON:** Ensure the JSON is properly formatted and all strings are quoted.

package/agent/skills/orrery-review/SKILL.md
@@ -0,0 +1,105 @@
---
name: orrery-review
description: >
  Review code changes after each step execution and provide structured feedback.
user-invocable: false
---

# Review Skill

## When to Use

Use this skill **after a step execution** to review the changes and provide feedback to an editor agent.

**Triggers:**

- You are invoked by the orchestrator for a review phase.
- You receive the step context, the list of modified files, and a git diff.

**Do not** apply fixes yourself. Only review and report.

---

## How to Do It

### Step 1: Read Context

- Read the step description and requirements provided by the orchestrator.
- Read the git diff to understand the changes.

### Step 2: Read Full Files (Required)

For every modified or added file, **read the full file contents** to understand the broader context. Do not review only the diff.

### Step 3: Review Criteria

Check the changes against these criteria:

- Correctness: Does the code do what it should? Any logic mistakes?
- Bug risks: Edge cases, null handling, off-by-one errors.
- Security: Injection, auth issues, data exposure, unsafe inputs.
- Performance: Obvious inefficiencies, N+1 patterns, memory leaks.
- Code quality: Readability, naming, complexity, maintainability.
- Architecture: Duplication, separation of concerns, module boundaries.
- Error handling: Errors handled gracefully, no silent failures.
- Testing: Test coverage gaps, testability, missing edge cases.

### Step 4: Prioritize Feedback

- Mark **blocking** issues that must be fixed (bugs, security, correctness).
- Mark **suggestions** for improvements that are optional or non-critical.
- Keep feedback concise and actionable; avoid style-only nits.

### Step 5: Decide Status

- `approved` if there are **no blocking** issues.
- `needs_changes` if **any blocking** issue exists.

---

## Output Contract (CRITICAL)

Return a single JSON object to stdout.

```json
{
  "status": "approved | needs_changes",
  "feedback": [
    {
      "file": "path/to/file.js",
      "line": 42,
      "severity": "blocking | suggestion",
      "comment": "Explain the issue and expected fix."
    }
  ]
}
```

**Rules:**

1. Output **valid JSON on a single line**.
2. No markdown code blocks in your final output.
3. `line` is optional if not applicable.
4. `feedback` may be empty only when `status` is `approved` (see the check sketch below).
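
The rules above imply a few invariants that a consumer could check mechanically before acting on a review. The sketch below is illustrative only, not the package's own validation code; `isValidReview` is a hypothetical helper.

```js
// Illustrative check of the review output contract described above.
// Not orrery's own validation code; `isValidReview` is a hypothetical helper.
function isValidReview(review) {
  if (!review || typeof review !== "object") return false;
  if (!["approved", "needs_changes"].includes(review.status)) return false;
  if (!Array.isArray(review.feedback)) return false;
  // Rule 4: empty feedback is only allowed when the status is "approved".
  if (review.feedback.length === 0 && review.status !== "approved") return false;
  return review.feedback.every(
    (item) =>
      typeof item.file === "string" &&
      ["blocking", "suggestion"].includes(item.severity) &&
      typeof item.comment === "string" &&
      (item.line === undefined || Number.isInteger(item.line)) // `line` is optional
  );
}
```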

---

## Example

**Scenario:** Missing null check in `src/service.js`.

```json
{
  "status": "needs_changes",
  "feedback": [
    {
      "file": "src/service.js",
      "line": 87,
      "severity": "blocking",
      "comment": "Guard against null `user` before accessing `user.id` to avoid runtime errors."
    }
  ]
}
```

package/agent/skills/orrery-verify/SKILL.md
@@ -0,0 +1,105 @@
---
name: orrery-verify
description: >
  Run tests, linting, and validation to verify changes work correctly.
  Use after implementation to check acceptance criteria, run test suites,
  and ensure nothing is broken.
user-invocable: false
---

# Verify Skill

## When to Use

Use this skill **after execution** to validate that changes work correctly and meet acceptance criteria.

**Triggers:**

- Execution phase is complete.
- You have been handed off from the **Execute** skill.

**Never skip verification.** Even trivial changes should have at least basic checks.

---

## How to Do It

### Step 0: Check Repository Guidelines

Before running generic commands, check for project-specific instructions:

1. **Plan metadata.notes** - Use testing/linting commands specified here
2. **Project guideline files** - Check for files like `CLAUDE.md`, `AGENTS.md`, `COPILOT.md`, or similar at the repo root. Follow their validation steps (e.g., `npm run fix`, `npm run validate`)
3. **CONTRIBUTING.md** - Check for any validation requirements

Use project-specific commands instead of generic ones when available.

### Step 1: Run the Test Suite

Execute the project's tests:

```bash
# Common test commands
npm test
pytest
go test ./...
cargo test
```
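
If you want to capture the test outcome programmatically rather than reading terminal output by hand, a generic Node sketch along these lines works; it is not part of orrery, and `npm test` is only an assumption, so prefer the command identified in Step 0.

```js
// Generic sketch (not orrery code): run the project's test command and capture
// the result so it can be summarized later in the report's `testResults` field.
// "npm test" is an assumption; prefer the command found in Step 0.
const { spawnSync } = require("child_process");

const result = spawnSync("npm", ["test"], { encoding: "utf8" });
const passed = result.status === 0;

console.log(passed ? "Test suite passed" : "Test suite FAILED");
console.log(result.stdout);        // keep the output for counting passed/failed tests
if (!passed) console.error(result.stderr);
```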

### Step 2: Run Formatting and Linting

If the project has formatting/linting configured:

1. **Run formatters first** (e.g., `npm run fix`, `prettier --write`)
2. **Then run linters** (e.g., `npm run lint`, `eslint .`)

Check project guideline files and plan notes for the exact commands.

### Step 2.5: Update CHANGELOG (if required)

If project guideline files require CHANGELOG updates:

1. Check if the changes touch user-facing code (lib/, bin/, etc.)
2. If yes, add an entry under `[Unreleased]` with the appropriate category
3. Use the correct category: Added, Changed, Fixed, Removed, Deprecated, Security

### Step 3: Check Acceptance Criteria

For each completed step, verify its `criteria` field from the plan.
Ask yourself: Does the implementation actually satisfy this?

### Step 4: Decision & Handoff

**Case A: Verification FAILED**
If tests fail, linting errors occur, or criteria are not met:

1. Analyze the error.
2. **Return to Execute:** Invoke the `orrery-execute` skill using the Skill tool to fix the issues.
3. _Do not_ proceed to Report until the issues are resolved (unless completely blocked).

**Case B: Verification PASSED**
If all checks pass:

1. **Gather Stats:** Note the number of tests passed (e.g., "8/8 passed").
2. **Handoff to Report:** Invoke the `orrery-report` skill using the Skill tool to finalize the step.

---

## Example

**Scenario:** You implemented `src/api/routes/upload.ts`.

1. **Run tests:** `npm test` -> **FAIL** (ReferenceError).
   - **Action:** Invoke the `orrery-execute` skill using the Skill tool to fix the ReferenceError.

2. **Run tests (Attempt 2):** `npm test` -> **PASS** (5 tests passed).
3. **Run lint:** `npm run lint` -> **PASS**.
4. **Action:** Invoke the `orrery-report` skill using the Skill tool.

---

## Common Pitfalls

- **Ignoring failures:** Passing a failed test suite to the Report skill.
- **Skipping regression checks:** Not running the full suite to ensure old code still works.
- **Infinite Loops:** If you keep bouncing between Execute and Verify without progress, stop and invoke the `orrery-report` skill using the Skill tool with a "Blocked" status.

package/agent/skills/refine-plan/SKILL.md
@@ -0,0 +1,291 @@
---
name: refine-plan
description: >
  Analyze and improve an existing plan file. Reviews plan structure, dependencies,
  context quality, and acceptance criteria, then implements improvements directly.
  Requires a plan file argument (e.g., /refine-plan .agent-work/plans/my-plan.yaml).
hooks:
  PostToolUse:
    - matcher: "Write"
      hooks:
        - type: command
          command: "orrery validate-plan"
---

# Refine Plan Skill

## When to Use

Use this skill when you have an existing plan and want to **improve it** before execution. Unlike Simulate (which is read-only), Refine analyzes the plan and implements improvements directly.

**Triggers:**

- "This plan needs work"
- "Can you improve this plan?"
- "Check if this plan is ready for execution"
- "The context seems thin, can you fill it out?"
- "Make sure this plan follows the schema"
- Plan validation failed and you need to fix issues

**Skip if:**

- No plan exists yet (use Discovery first)
- You want to explore the plan interactively (use Simulate instead)
- You need to fundamentally restructure the plan or change outcomes (use Discovery)
- The plan is already validated and well-formed

---

## How to Do It

### Step 1: Load and Parse the Plan

Read the plan file and understand its structure:

- Parse the YAML to identify all steps, dependencies, and metadata
- Note the outcomes (what success looks like)
- Map the dependency graph
- Identify which steps have context_files, risk_notes, and other optional fields

Announce what you've loaded:

```
I've loaded [plan name]. It has [N] steps with [X] outcomes.
Let me analyze it for improvements.
```

### Step 2: Analyze for Improvements

Check each improvement category systematically. Build a list of findings.

For each issue found, note:

- What the problem is
- Which step(s) it affects
- What the fix should be

### Step 3: Report Findings or Exit Clean

**If no improvements found:**

```
The plan looks good. No structural issues, dependencies are correct,
context is sufficient for autonomous execution, and acceptance criteria
are testable. Ready for execution.
```

Exit cleanly without modifying the file.

**If improvements found:**

```
Found [N] improvements:
- [Category]: [Brief description] (steps X, Y)
- [Category]: [Brief description] (step Z)

Implementing these changes now.
```

### Step 4: Implement Improvements

Write the updated plan directly to the file. The PostToolUse hook will automatically validate the plan after writing.

Make changes in order of impact:

1. Schema compliance issues first (required fields)
2. Structural issues (malformed steps)
3. Dependency issues (missing deps, cycles)
4. Context and criteria improvements

### Step 5: Summarize Changes

After writing:

```
Updated the plan with [N] changes:
- Added missing deps to steps 2.1, 2.3 (they now depend on install step 0.1)
- Expanded context for step 1.2 to include integration patterns
- Made criteria in step 3.1 more specific and testable
- Added risk_notes to step 2.1 (high complexity)

Plan validated successfully.
```

---

## Improvement Categories

Check the plan against these categories:

### Structural Issues

- **Missing required fields**: Every step must have `id`, `description`, `context`, `requirements`, `criteria`
- **Malformed step IDs**: IDs should follow the `{feature}.{step}` format (e.g., "1.1", "2.3")
- **Missing metadata**: The plan must have `created_at`, `created_by`, `outcomes`
- **Empty arrays**: Requirements and criteria must have at least one item

**Fix approach:** Add missing fields with sensible defaults, or flag for user input if the context is unclear.

### Dependency Issues

- **Missing install step deps**: If step "0.1" installs dependencies, ALL subsequent steps should include "0.1" in their deps
- **Circular dependencies**: Step A depends on B, B depends on A
- **Missing sequential deps**: A step uses output from another step but doesn't declare the dependency
- **Unnecessary sequential deps**: Steps that could run in parallel but have artificial dependencies

**Fix approach:** Add missing deps, remove circular deps, suggest parallelization opportunities.
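
To make the first two checks concrete, here is a small illustrative sketch (not the orrery validator; `checkDeps` is a made-up helper) that flags steps missing a dep on install step "0.1" and walks the graph for cycles, assuming the plan has already been parsed into step objects with `id` and `deps`:

```js
// Illustrative only: flag missing install-step deps and circular deps.
// Assumes `steps` is the parsed plan's steps array ({ id, deps }); not orrery code.
function checkDeps(steps) {
  const findings = [];
  const ids = new Set(steps.map((s) => s.id));

  // Missing install step deps: every other step should depend on "0.1" if it exists.
  if (ids.has("0.1")) {
    for (const step of steps) {
      if (step.id !== "0.1" && !(step.deps || []).includes("0.1")) {
        findings.push(`Step ${step.id} does not depend on install step 0.1`);
      }
    }
  }

  // Circular dependencies: depth-first search that reports a back edge.
  const state = new Map(); // id -> "visiting" | "done"
  const visit = (id, trail) => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      findings.push(`Circular dependency: ${[...trail, id].join(" -> ")}`);
      return;
    }
    state.set(id, "visiting");
    const step = steps.find((s) => s.id === id);
    for (const dep of (step && step.deps) || []) visit(dep, [...trail, id]);
    state.set(id, "done");
  };
  for (const step of steps) visit(step.id, []);

  return findings;
}
```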

### Context Quality

Thin context prevents autonomous execution. Check for:

- **Vague context**: "Add the feature" without explaining what, where, or why
- **Missing context_files**: Steps that reference existing code but don't list which files to read
- **No integration guidance**: Steps that modify existing code without explaining the patterns to follow
- **Assumed knowledge**: Context that assumes the agent knows project-specific conventions

**Fix approach:** Expand context with specifics. Add relevant context_files. Reference existing patterns.

### Acceptance Criteria Quality

Criteria must be **specific** and **testable**. Check for:

- **Vague criteria**: "Works correctly" or "Handles errors"
- **Untestable criteria**: "Code is clean" or "Performance is good"
- **Missing criteria**: Steps with empty or single-item criteria arrays
- **Inconsistent scope**: Criteria that don't match what the step actually delivers

**Fix approach:** Rewrite vague criteria as specific, observable conditions. Add missing criteria based on the requirements.

**Examples:**

- Bad: "Error handling works"
- Good: "Returns 400 status with error message when input validation fails"

- Bad: "Component renders correctly"
- Good: "Component renders loading skeleton while data fetches; displays chart when data arrives; shows error message if API returns 500"

### Risk Coverage

- **High complexity without notes**: Steps with many requirements or external integrations but no risk_notes
- **Missing edge cases**: Obvious failure modes not acknowledged
- **Underestimated steps**: Steps that look simple but have hidden complexity

**Fix approach:** Add risk_notes for complex steps. Call out edge cases and potential blockers.

### Schema Compliance

Validate against `../discovery/schemas/plan-schema.yaml`:

- Field types match the schema (arrays vs strings, required vs optional)
- Status values are valid enum values
- All step IDs referenced in deps actually exist

**Fix approach:** Correct type mismatches, fix invalid values, remove dangling dep references.

---

## Reference Schema

The plan schema is defined at:

```
agent/skills/discovery/schemas/plan-schema.yaml
```

Key required fields (a check sketch follows the list):

- **metadata**: `created_at`, `created_by`, `outcomes`
- **steps[].required**: `id`, `description`, `context`, `requirements`, `criteria`
- **steps[].optional**: `status`, `deps`, `parallel`, `files`, `context_files`, `commands`, `risk_notes`
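
As an illustration of what "required fields present" means in practice, the following sketch (not the actual `orrery validate-plan` implementation; `findSchemaIssues` is a hypothetical helper) checks the fields listed above and flags deps that point at nonexistent steps:

```js
// Illustrative sketch of the required-field and dangling-dep checks above.
// `plan` is assumed to be the parsed YAML; this is not orrery's validate-plan code.
function findSchemaIssues(plan) {
  const issues = [];

  for (const field of ["created_at", "created_by", "outcomes"]) {
    if (!plan.metadata || plan.metadata[field] === undefined) {
      issues.push(`metadata.${field} is missing`);
    }
  }

  const ids = new Set((plan.steps || []).map((s) => s.id));
  for (const step of plan.steps || []) {
    for (const field of ["id", "description", "context", "requirements", "criteria"]) {
      if (step[field] === undefined) issues.push(`Step ${step.id ?? "?"}: missing ${field}`);
    }
    for (const dep of step.deps || []) {
      if (!ids.has(dep)) issues.push(`Step ${step.id}: dep "${dep}" does not exist`);
    }
  }

  return issues;
}
```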

---

## Exit Conditions

Exit **without changes** when:

- All required fields are present
- Dependencies are correct (no cycles, install steps properly referenced)
- Context is sufficient for autonomous execution
- Criteria are specific and testable
- No schema violations

In this case, report:

```
Plan analysis complete. No improvements needed.

- Structure: Valid
- Dependencies: Correct
- Context: Sufficient
- Criteria: Testable
- Schema: Compliant

Ready for execution.
```

---

## Example Dialogue

```
User: /refine-plan .agent-work/plans/analytics-dashboard.yaml

Agent: I've loaded the analytics dashboard plan. It has 6 steps delivering
two outcomes. Let me analyze it for improvements.

Found 4 improvements:

1. Dependency issues: Steps 1.1, 2.1, 2.2 don't depend on the install step (0.1)
2. Context quality: Step 1.2 has thin context - doesn't explain the caching pattern
3. Criteria quality: Step 2.3 has vague criteria ("error handling works")
4. Risk coverage: Step 2.1 (Chart.js integration) has no risk_notes despite complexity

Implementing these changes now.

[Writes updated plan file]

Updated the plan:
- Added "0.1" to deps for steps 1.1, 2.1, 2.2
- Expanded step 1.2 context with reference to cache.ts middleware pattern
- Rewrote step 2.3 criteria: "Shows loading skeleton during fetch",
  "Displays error toast with retry button on API failure"
- Added risk_notes to step 2.1: "Chart.js bundle size impact - verify no
  significant increase to main bundle"

Plan validated successfully. Ready for execution.
```

---

## Boundaries

### What Refine Does

- Fixes structural issues (missing fields, malformed data)
- Corrects dependency problems
- Expands thin context
- Improves vague criteria
- Adds missing risk notes
- Ensures schema compliance

### What Refine Does NOT Do

- **Change outcomes**: If the outcomes are wrong, use Discovery
- **Restructure the plan**: Refine improves existing steps, doesn't reorganize them
- **Add new features**: Refine doesn't expand scope
- **Remove steps**: Refine improves steps, doesn't delete them
- **Question the approach**: That's what Simulate is for

If you find yourself wanting to fundamentally change the plan (new steps, different outcomes, restructured features), suggest returning to Discovery instead.

---

## Common Pitfalls

- **Over-refining**: Making changes that don't materially improve the plan. If it's good enough, say so.
- **Changing scope**: Refine improves quality, not content. Don't add features or change what the plan delivers.
- **Forgetting validation**: Always let the PostToolUse hook validate after writing. If validation fails, fix and write again.
- **Missing the install step pattern**: This is the most common dependency issue. Always check if 0.1 exists and if subsequent steps depend on it.
- **Vague improvements**: Don't just say "expanded context" - show specifically what changed.