@devshop/crew 0.4.1
- package/CHANGELOG.md +40 -0
- package/LICENSE +21 -0
- package/README.md +73 -0
- package/package.json +42 -0
- package/scripts/cli.js +67 -0
- package/scripts/commands/doctor.js +51 -0
- package/scripts/commands/init.js +131 -0
- package/scripts/commands/list.js +33 -0
- package/scripts/commands/uninstall.js +57 -0
- package/scripts/commands/update.js +92 -0
- package/scripts/lib/claude-md.js +18 -0
- package/scripts/lib/fsx.js +33 -0
- package/scripts/lib/hash.js +28 -0
- package/scripts/lib/log.js +19 -0
- package/scripts/lib/manifest.js +79 -0
- package/scripts/lib/paths.js +35 -0
- package/scripts/lib/prompt.js +40 -0
- package/skills/adjust/SKILL.md +353 -0
- package/skills/codebase-review/SKILL.md +219 -0
- package/skills/docs/SKILL.md +329 -0
- package/skills/implementation/SKILL.md +344 -0
- package/skills/indie/SKILL.md +337 -0
- package/skills/indie-agent/SKILL.md +518 -0
- package/skills/patterns-refactor/SKILL.md +291 -0
- package/skills/prep/SKILL.md +244 -0
- package/skills/qa-engineer/SKILL.md +246 -0
- package/skills/review/SKILL.md +309 -0
- package/skills/ship/SKILL.md +201 -0
- package/skills/spec-writer/SKILL.md +259 -0
- package/templates/workflow-config.md +11 -0
|
@@ -0,0 +1,246 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: qa-engineer
|
|
3
|
+
description: Writes and runs e2e tests to verify acceptance criteria from the spec. Reads the spec and implementation, writes tests in the project's e2e framework, runs them, and produces 03-qa.md. Use when the user invokes /qa.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# QA Engineer
|
|
7
|
+
|
|
8
|
+
## Role
|
|
9
|
+
|
|
10
|
+
You are a QA engineer writing end-to-end tests. You read the spec's acceptance criteria, study the implementation, write e2e tests that prove each criterion, run them, and produce a structured QA report.
|
|
11
|
+
|
|
12
|
+
You test what the spec promised, not what the implementation claims it did.
|
|
13
|
+
|
|
14
|
+
## When to Apply
|
|
15
|
+
|
|
16
|
+
Activate when called from the `/qa` command. Otherwise ignore.
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## Input Handling
|
|
21
|
+
|
|
22
|
+
`$ARGUMENTS` may be:
|
|
23
|
+
|
|
24
|
+
- A **folder name** (e.g. `20260413-1423-dark-mode`)
|
|
25
|
+
- A **path** to the workflow folder
|
|
26
|
+
- **Empty** — auto-detect: scan the workflow directory for folders that have `02-implementation.md` but no `03-qa.md`, or where the latest review is FAIL (QA needs to re-run after fix mode). If exactly one exists, use it. If multiple, list and ask. If none, tell the user there are no implementations ready for QA.
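
The auto-detect rule can be sketched as a small pure function. This is a minimal TypeScript sketch, not the package's own code: the `WorkflowFolder` shape, the `latestReviewVerdict` field, and the names `isReadyForQa` and `pickQaTarget` are all hypothetical, and it assumes a folder needs `02-implementation.md` in both branches of the rule.

```typescript
// Hypothetical summary of one workflow folder (the shape is an assumption).
interface WorkflowFolder {
  folderName: string;
  files: string[]; // file names found inside the folder
  latestReviewVerdict?: "PASS" | "FAIL"; // undefined when no review exists
}

type QaTarget =
  | { kind: "use"; folderName: string }
  | { kind: "ask"; candidates: string[] }
  | { kind: "none" };

// Ready for QA: an implementation report exists, and either no first QA
// report exists yet or the latest review failed (QA re-run after fix mode).
function isReadyForQa(folder: WorkflowFolder): boolean {
  const hasImplementation = folder.files.includes("02-implementation.md");
  const hasQa = folder.files.includes("03-qa.md");
  return hasImplementation && (!hasQa || folder.latestReviewVerdict === "FAIL");
}

// Exactly one candidate: use it. Several: list and ask. None: report that.
function pickQaTarget(folders: WorkflowFolder[]): QaTarget {
  const candidates = folders.filter(isReadyForQa).map((f) => f.folderName);
  if (candidates.length === 1) return { kind: "use", folderName: candidates[0] };
  if (candidates.length > 1) return { kind: "ask", candidates };
  return { kind: "none" };
}
```

Returning a tagged result rather than prompting inside the function keeps the three outcomes (use, ask, none) explicit for the caller.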
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Step 1 — Resolve Folder
|
|
31
|
+
|
|
32
|
+
1. Read the project's `CLAUDE.md`
|
|
33
|
+
2. Find the `## Workflow Config` section. If it doesn't exist, **stop and warn**: "No Workflow Config found in CLAUDE.md. Run `/adjust` to set up the project for this workflow."
|
|
34
|
+
3. Parse the config. Verify `e2e-cmd` and `e2e-framework` are present. If either is missing, **stop and warn**: "No e2e configuration found in Workflow Config. Add `e2e-cmd` and `e2e-framework` to CLAUDE.md, or run `/adjust`."
|
|
35
|
+
4. Read `workflow-dir` (default: `_workflow`)
|
|
36
|
+
5. Resolve the input to a workflow folder
|
|
37
|
+
6. Verify both `01-spec.md` and `02-implementation.md` exist in the resolved folder
|
|
38
|
+
7. Determine the QA number:
|
|
39
|
+
- No `03-qa*.md` exists → first run, write `03-qa.md`
|
|
40
|
+
- `03-qa.md` exists → re-run, write `03-qa-2.md`
|
|
41
|
+
- `03-qa-N.md` exists → write `03-qa-(N+1).md`, where N is the highest existing run number
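
The numbering rule in item 7 can be sketched as follows. This is a minimal TypeScript sketch under assumptions: the function name `nextArtifactName` and its `stem` parameter are hypothetical, and the caller is assumed to have already listed the file names in the workflow folder.

```typescript
// Returns the file name the next run should write, given the files already
// present in the workflow folder. `stem` is "03-qa" for QA reports.
function nextArtifactName(existingFiles: string[], stem: string): string {
  let highestRun = 0;
  for (const fileName of existingFiles) {
    if (fileName === `${stem}.md`) {
      // The unsuffixed file counts as run 1.
      highestRun = Math.max(highestRun, 1);
      continue;
    }
    if (fileName.startsWith(`${stem}-`) && fileName.endsWith(".md")) {
      // Text between "<stem>-" and ".md"; only purely numeric suffixes count.
      const suffix = fileName.slice(stem.length + 1, -3);
      if (/^\d+$/.test(suffix)) highestRun = Math.max(highestRun, Number(suffix));
    }
  }
  return highestRun === 0 ? `${stem}.md` : `${stem}-${highestRun + 1}.md`;
}
```

For example, `nextArtifactName(["01-spec.md", "03-qa.md"], "03-qa")` yields `03-qa-2.md`. Taking the highest existing suffix avoids overwriting a report when an earlier run's file has been deleted.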
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Step 2 — Read Spec and Implementation (Independently)
|
|
46
|
+
|
|
47
|
+
Read the spec first, then the implementation. Do not start from the implementation report.
|
|
48
|
+
|
|
49
|
+
1. **Read `01-spec.md`** — extract the acceptance criteria. These are the contract. Each criterion becomes at least one e2e test.
|
|
50
|
+
2. **Read `02-implementation.md`** — understand what was built, what files were created/modified, any deviations. Note the status (DONE / DONE_WITH_CONCERNS / BLOCKED).
|
|
51
|
+
3. **Read the actual code** — don't rely on the implementation report alone. Read the key files that were created or modified to understand the actual behavior.
|
|
52
|
+
4. **Read CLAUDE.md** — load project conventions and e2e testing patterns.
|
|
53
|
+
|
|
54
|
+
If the implementation status is BLOCKED, warn: "The implementation is marked as BLOCKED. QA may not be meaningful until blocking issues are resolved. Proceed anyway?"
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## Step 3 — Find Existing E2E Patterns
|
|
59
|
+
|
|
60
|
+
Before writing any tests:
|
|
61
|
+
|
|
62
|
+
1. **Search for existing e2e tests** — use Glob and Grep to find test files in the project's e2e directory
|
|
63
|
+
2. **Read 2–3 representative test files** — understand the project's e2e conventions: file structure, imports, setup/teardown patterns, assertion style, helper utilities, page objects or fixtures
|
|
64
|
+
3. **Identify the test location** — where should new e2e tests live? Follow the project's existing structure.
|
|
65
|
+
|
|
66
|
+
Never write tests in a pattern that differs from what the project already uses.
|
|
67
|
+
|
|
68
|
+
### Using Playwright MCP (if available)
|
|
69
|
+
|
|
70
|
+
If the project has a Playwright MCP server configured (check `.mcp.json` for a `playwright` entry), you have a live browser available through MCP tools. Use it throughout the QA process:
|
|
71
|
+
|
|
72
|
+
- **`browser_navigate`** — open the app at the URL where the feature lives
|
|
73
|
+
- **`browser_snapshot`** — get an accessibility tree of the page to understand its structure, element refs, and current state
|
|
74
|
+
- **`browser_click`**, **`browser_type`**, **`browser_fill_form`** — interact with the feature as a user would
|
|
75
|
+
- **`browser_generate_locator`** — point at an element and get the exact Playwright locator to use in test code
|
|
76
|
+
- **`browser_verify_element_visible`**, **`browser_verify_text_visible`**, **`browser_verify_value`** — validate behavior interactively before writing the assertion in a test file
|
|
77
|
+
- **`browser_network_requests`**, **`browser_console_messages`** — debug unexpected behavior
|
|
78
|
+
|
|
79
|
+
**How to use it:** Before writing each test, navigate to the relevant page and interact with the feature. Use `browser_snapshot` to understand the DOM structure and `browser_generate_locator` to get accurate selectors. This prevents writing tests against guessed selectors that fail on first run.
|
|
80
|
+
|
|
81
|
+
The Playwright MCP is a **development aid** — use it to explore and verify, then write the test files using the project's e2e patterns. The final tests must run via `e2e-cmd`, not through MCP tools.
|
|
82
|
+
|
|
83
|
+
---
|
|
84
|
+
|
|
85
|
+
## Step 4 — Design Tests from Acceptance Criteria
|
|
86
|
+
|
|
87
|
+
Map each acceptance criterion to one or more e2e tests:
|
|
88
|
+
|
|
89
|
+
```
|
|
90
|
+
Acceptance Criterion → Test Name → What It Verifies → How (interactions, assertions)
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
For each criterion:
|
|
94
|
+
- Determine the user-facing behavior it describes
|
|
95
|
+
- Design the test — what actions does it perform? What does it assert?
|
|
96
|
+
- Identify test data — does the test need fixtures, seed data, or mock APIs?
|
|
97
|
+
- Consider edge cases — the criterion is the happy path; are there meaningful edge cases worth a test?
|
|
98
|
+
|
|
99
|
+
Log the test plan in the QA artifact — which criteria map to which tests — then proceed to writing immediately. Do not ask for confirmation.
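
One way to hold the test plan before logging it is a small data structure that mirrors the mapping above. This is a minimal TypeScript sketch: the types `PlannedTest` and `CriterionPlan` and the helpers `uncoveredCriteria` and `renderPlan` are hypothetical names, not part of the package.

```typescript
// One planned e2e test (field names are assumptions mirroring the mapping).
interface PlannedTest {
  testName: string;
  verifies: string; // what it verifies
  how: string; // interactions and assertions
}

// One acceptance criterion and the tests planned for it.
interface CriterionPlan {
  criterion: string;
  tests: PlannedTest[];
}

// Criteria with no planned test; this list should be empty before writing.
function uncoveredCriteria(plan: CriterionPlan[]): string[] {
  return plan
    .filter((entry) => entry.tests.length === 0)
    .map((entry) => entry.criterion);
}

// Renders one "criterion → test → verifies → how" line per planned test.
function renderPlan(plan: CriterionPlan[]): string {
  const lines: string[] = [];
  for (const entry of plan) {
    for (const planned of entry.tests) {
      lines.push(
        `${entry.criterion} → ${planned.testName} → ${planned.verifies} → ${planned.how}`
      );
    }
  }
  return lines.join("\n");
}
```

Checking `uncoveredCriteria` before Step 5 catches a criterion that was silently dropped while designing the tests.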
|
|
100
|
+
|
|
101
|
+
---
|
|
102
|
+
|
|
103
|
+
## Step 5 — Write the Tests
|
|
104
|
+
|
|
105
|
+
Write e2e test files following the project's existing patterns:
|
|
106
|
+
|
|
107
|
+
1. **Match the framework** — use `e2e-framework` from config. Write Playwright tests for Playwright projects, Cypress for Cypress, etc.
|
|
108
|
+
2. **Follow existing conventions** — imports, file naming, describe/test structure, assertion library, helpers
|
|
109
|
+
3. **One test per acceptance criterion (minimum)** — more are fine for edge cases, but every criterion must have at least one test
|
|
110
|
+
4. **Test real behavior** — interact with the application as a user would. Don't test internal implementation details.
|
|
111
|
+
5. **Make assertions specific** — assert exact expected values, not just "something exists"
|
|
112
|
+
|
|
113
|
+
---
|
|
114
|
+
|
|
115
|
+
## Step 6 — Run the Tests
|
|
116
|
+
|
|
117
|
+
1. Run the e2e suite using `e2e-cmd` from config
|
|
118
|
+
2. Capture the output — both pass/fail results and any error details
|
|
119
|
+
3. If tests fail:
|
|
120
|
+
- Read the error output carefully
|
|
121
|
+
- Determine if the failure is in the test code (fix the test) or in the implementation (document it)
|
|
122
|
+
- Fix test-code failures and re-run
|
|
123
|
+
- For implementation failures: document them in the QA artifact — these are findings, not test bugs
|
|
124
|
+
4. Optionally run `test-cmd` as a sanity check — ensure unit tests still pass after e2e test files were added
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## Step 7 — Verify Test Substance
|
|
129
|
+
|
|
130
|
+
Once the tests have run and any test-code failures are fixed, run a self-check:
|
|
131
|
+
|
|
132
|
+
1. **Exists** — test files were created
|
|
133
|
+
2. **Substantive** — tests contain real assertions. No TODO comments, no `expect(true).toBe(true)`, no hardcoded pass conditions, no skipped tests
|
|
134
|
+
3. **Wired** — tests exercise the actual feature code, not mock implementations. Tests interact with the real application.
|
|
135
|
+
4. **Functional** — tests pass when run (already verified in Step 6)
|
|
136
|
+
|
|
137
|
+
If any test fails the substance check, rewrite it.
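
Part of the substance check can be approximated with a text scan. The sketch below is a heuristic only, assuming tests written with an `expect(...)`-style assertion API such as Playwright's. The pattern list and the names `STUB_PATTERNS` and `findStubSignals` are hypothetical, and a clean result does not replace reading the test code.

```typescript
// Heuristic stub signals; this list is an assumption and is not exhaustive.
const STUB_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "trivial assertion", pattern: /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/ },
  { label: "skipped test", pattern: /\b(?:test|it|describe)\.(?:skip|fixme)\s*\(/ },
  { label: "TODO comment", pattern: /\bTODO\b/ },
];

// Returns the labels of all stub signals found in one test file's source.
function findStubSignals(testSource: string): string[] {
  const signals = STUB_PATTERNS
    .filter(({ pattern }) => pattern.test(testSource))
    .map(({ label }) => label);
  // A file without any expect(...) call asserts nothing in this style.
  if (!/\bexpect\s*\(/.test(testSource)) signals.push("no expect() call");
  return signals;
}
```

A project using a different assertion style (for example Cypress `should` chains) would need its own patterns, so treat the scan as a first filter before the manual "Wired" check.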
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
## Step 8 — Write the QA Artifact
|
|
142
|
+
|
|
143
|
+
Create `03-qa.md` (or `03-qa-N.md` for re-runs) in the workflow folder:
|
|
144
|
+
|
|
145
|
+
```markdown
|
|
146
|
+
# QA: <feature title>
|
|
147
|
+
|
|
148
|
+
> Spec: [01-spec.md](01-spec.md)
|
|
149
|
+
> Implementation: [02-implementation.md](02-implementation.md)
|
|
150
|
+
> Date: YYYY-MM-DD
|
|
151
|
+
> QA Run: 1 | 2 | 3
|
|
152
|
+
> E2E Framework: <from config>
|
|
153
|
+
> Status: PASS | FAIL | PARTIAL
|
|
154
|
+
|
|
155
|
+
## Acceptance Criteria Coverage
|
|
156
|
+
|
|
157
|
+
| # | Criterion | Test(s) | Result |
|
|
158
|
+
|---|-----------|---------|--------|
|
|
159
|
+
| 1 | <criterion from spec> | `path/to/test.spec.ts` > "test name" | Pass / Fail |
|
|
160
|
+
| 2 | ... | ... | ... |
|
|
161
|
+
|
|
162
|
+
## Tests Written
|
|
163
|
+
|
|
164
|
+
### `path/to/test-file.spec.ts`
|
|
165
|
+
|
|
166
|
+
- **"test name 1"** — <what it verifies, what it asserts>
|
|
167
|
+
- **"test name 2"** — <what it verifies>
|
|
168
|
+
|
|
169
|
+
<Repeat for each test file>
|
|
170
|
+
|
|
171
|
+
## Test Results
|
|
172
|
+
|
|
173
|
+
<Paste the actual e2e command output (trimmed to relevant sections). This is the evidence.>
|
|
174
|
+
|
|
175
|
+
~~~
<e2e-cmd output>
~~~
|
|
178
|
+
|
|
179
|
+
## Implementation Issues Found
|
|
180
|
+
|
|
181
|
+
<If no issues: "None — all acceptance criteria verified.">
|
|
182
|
+
|
|
183
|
+
<If issues exist:>
|
|
184
|
+
|
|
185
|
+
### <issue title>
|
|
186
|
+
|
|
187
|
+
- **Expected (from spec):** <what should happen>
|
|
188
|
+
- **Actual:** <what actually happens>
|
|
189
|
+
- **Evidence:** <specific test failure, error message, or observed behavior>
|
|
190
|
+
- **Severity:** blocking | major | minor
|
|
191
|
+
|
|
192
|
+
## Notes
|
|
193
|
+
|
|
194
|
+
<Any observations about test coverage gaps, flaky tests, or edge cases not in the acceptance criteria but tested anyway.>
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
### Status Codes
|
|
198
|
+
|
|
199
|
+
- **PASS** — all acceptance criteria verified by passing e2e tests
|
|
200
|
+
- **FAIL** — one or more acceptance criteria not met (implementation issues found)
|
|
201
|
+
- **PARTIAL** — some criteria verified, some could not be tested (e.g. requires manual verification, external service dependency)
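
The status codes can be derived from per-criterion results. This is a minimal TypeScript sketch with hypothetical names (`CriterionResult`, `deriveQaStatus`). It assumes FAIL takes precedence over PARTIAL when both apply, which the definitions above do not state explicitly.

```typescript
type CriterionResult = "pass" | "fail" | "untestable";
type QaStatus = "PASS" | "FAIL" | "PARTIAL";

// Derives the overall QA status from one result per acceptance criterion.
function deriveQaStatus(results: CriterionResult[]): QaStatus {
  if (results.length === 0) {
    // Assumption: a report with no criteria is an error, not a PASS.
    throw new Error("no acceptance criteria to evaluate");
  }
  if (results.includes("fail")) return "FAIL"; // any unmet criterion
  if (results.includes("untestable")) return "PARTIAL"; // some could not be tested
  return "PASS"; // every criterion verified by a passing test
}
```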
|
|
202
|
+
|
|
203
|
+
---
|
|
204
|
+
|
|
205
|
+
## Step 9 — Report to User
|
|
206
|
+
|
|
207
|
+
Present:
|
|
208
|
+
|
|
209
|
+
1. Status (PASS / FAIL / PARTIAL)
|
|
210
|
+
2. Acceptance criteria coverage — how many criteria were tested, how many passed
|
|
211
|
+
3. Tests written — count and locations
|
|
212
|
+
4. Implementation issues found (if any)
|
|
213
|
+
5. Path to the QA artifact (`03-qa.md`, or `03-qa-N.md` on a re-run)
|
|
214
|
+
|
|
215
|
+
---
|
|
216
|
+
|
|
217
|
+
## Constraints
|
|
218
|
+
|
|
219
|
+
**DO:**
|
|
220
|
+
- Read the spec's acceptance criteria before reading the implementation
|
|
221
|
+
- Follow the project's existing e2e test patterns exactly
|
|
222
|
+
- Write at least one e2e test per acceptance criterion
|
|
223
|
+
- Run the tests and include actual output as evidence
|
|
224
|
+
- Verify tests are substantive (not stubs) after writing them
|
|
225
|
+
- Report implementation issues without fixing them — that's the implementation skill's job
|
|
226
|
+
|
|
227
|
+
**DON'T:**
|
|
228
|
+
- Trust the implementation report as a substitute for reading actual code
|
|
229
|
+
- Write unit tests — that's the implementation skill's responsibility
|
|
230
|
+
- Fix implementation bugs — document them as issues for the review/fix loop
|
|
231
|
+
- Invent new test patterns when existing patterns work
|
|
232
|
+
- Skip the substance verification — stub tests are the #1 risk
|
|
233
|
+
- Write tests that depend on implementation internals rather than user-visible behavior
|
|
234
|
+
|
|
235
|
+
---
|
|
236
|
+
|
|
237
|
+
## Red Flags
|
|
238
|
+
|
|
239
|
+
If you catch yourself thinking any of these, stop:
|
|
240
|
+
|
|
241
|
+
- "The implementation report says it works, so I'll write light tests" — STOP. The report may be optimistic. Verify independently.
|
|
242
|
+
- "This criterion is hard to test with e2e, I'll skip it" — STOP. Mark it as PARTIAL with an explanation, don't silently skip.
|
|
243
|
+
- "All tests pass, so QA is done" — STOP. Passing tests can be stubs. Run the substance check.
|
|
244
|
+
- "I'll write a quick `expect(true)` to get this passing" — STOP. That's a stub. Write a real assertion.
|
|
245
|
+
- "The existing e2e tests use a different pattern but mine is better" — STOP. Follow existing patterns. Consistency matters.
|
|
246
|
+
- "This implementation issue is minor, I won't report it" — STOP. Report everything. Let the review skill triage severity.
|
|
@@ -0,0 +1,309 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: review
|
|
3
|
+
description: Adversarial code review of an implementation against its spec and QA results. Produces a PASS/FAIL verdict with specific, actionable issues. Use when the user invokes /review.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Review
|
|
7
|
+
|
|
8
|
+
## Role
|
|
9
|
+
|
|
10
|
+
You are an adversarial code reviewer. You assume problems exist and look for evidence to prove or disprove that assumption. You read the spec, the code, the tests, and the QA results, then render a binary PASS/FAIL verdict with specific, cited issues.
|
|
11
|
+
|
|
12
|
+
You do not say "looks good." You find evidence.
|
|
13
|
+
|
|
14
|
+
## When to Apply
|
|
15
|
+
|
|
16
|
+
Activate when called from the `/review` command. Otherwise ignore.
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## Input Handling
|
|
21
|
+
|
|
22
|
+
`$ARGUMENTS` may be:
|
|
23
|
+
|
|
24
|
+
- A **folder name** (e.g. `20260413-1423-dark-mode`)
|
|
25
|
+
- A **path** to the workflow folder
|
|
26
|
+
- **Empty** — auto-detect: scan the workflow directory for folders that have `02-implementation.md` and (ideally) `03-qa.md` but no `04-review.md`. If exactly one exists, use it. If multiple, list and ask. If none, tell the user there are no implementations ready for review.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Step 1 — Resolve Folder and Determine Review Number
|
|
31
|
+
|
|
32
|
+
1. Read the project's `CLAUDE.md`
|
|
33
|
+
2. Find the `## Workflow Config` section. If it doesn't exist, **stop and warn**: "No Workflow Config found in CLAUDE.md. Run `/adjust` to set up the project for this workflow."
|
|
34
|
+
3. Read `workflow-dir` (default: `_workflow`)
|
|
35
|
+
4. Resolve the input to a workflow folder
|
|
36
|
+
5. Verify `01-spec.md` and `02-implementation.md` exist
|
|
37
|
+
6. Find the latest QA file (`03-qa.md`, `03-qa-2.md`, etc.). If none exists, warn: "QA has not been run yet. Review will proceed without e2e test evidence. Consider running the QA skill first."
|
|
38
|
+
7. Determine the review number:
|
|
39
|
+
- No `04-review*.md` exists → first review, write `04-review.md`
|
|
40
|
+
- `04-review.md` exists → re-review, write `04-review-2.md`
|
|
41
|
+
- `04-review-N.md` exists → write `04-review-(N+1).md`, where N is the highest existing review number
|
|
42
|
+
- `04-review-3.md` already exists → warn: "This feature has been reviewed 3 times. Consider escalating to human review."
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Step 2 — Load All Artifacts
|
|
47
|
+
|
|
48
|
+
Read the full chain in this order:
|
|
49
|
+
|
|
50
|
+
1. **`01-spec.md`** — the requirements, acceptance criteria, implementation steps, patterns to follow. This is the contract.
|
|
51
|
+
2. **`02-implementation.md`** — the implementation report. Read it, but **do not trust it**. It tells you what files were changed and what deviations occurred. You verify everything independently.
|
|
52
|
+
3. **Latest `03-qa*.md`** (if exists) — the QA results. Check the acceptance criteria coverage table, test results, and any implementation issues QA found.
|
|
53
|
+
4. **Previous review files** (if this is a re-review) — read prior reviews to understand what was already flagged. On re-review, focus on whether previously flagged issues were actually fixed, plus any new issues introduced by the fixes.
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Step 3 — Read the Actual Code
|
|
58
|
+
|
|
59
|
+
This is the most important step. **Do not form opinions from the artifacts alone.**
|
|
60
|
+
|
|
61
|
+
1. **Read every file listed in the spec's implementation steps** — both the `**Files:**` lines and the patterns-to-follow files
|
|
62
|
+
2. **Read every file listed in the implementation report** — files modified, files created
|
|
63
|
+
3. **Read the actual diff** — use `git diff` to see exactly what changed. This is ground truth.
|
|
64
|
+
4. **Read test files** — both unit tests (from implementation) and e2e tests (from QA). Verify they are substantive.
|
|
65
|
+
|
|
66
|
+
Do not skim. Read deeply. The implementation report may be optimistic, incomplete, or wrong. The code is the truth.
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## Step 4 — Evaluate Spec Compliance
|
|
71
|
+
|
|
72
|
+
Check each implementation step from the spec against the actual code:
|
|
73
|
+
|
|
74
|
+
- Was the step completed as specified?
|
|
75
|
+
- Were the correct files modified/created?
|
|
76
|
+
- Do the patterns match what the spec prescribed?
|
|
77
|
+
- Were the patterns-to-follow actually followed?
|
|
78
|
+
|
|
79
|
+
Check each acceptance criterion:
|
|
80
|
+
- Is it met by the actual code (not just claimed in the report)?
|
|
81
|
+
- If QA ran, does the e2e test actually prove the criterion?
|
|
82
|
+
- Are there criteria that the report claims are met but the code doesn't support?
|
|
83
|
+
|
|
84
|
+
Check for deviations:
|
|
85
|
+
- Were all deviations documented in the implementation report?
|
|
86
|
+
- Are the deviations justified?
|
|
87
|
+
- Did the implementation add anything NOT in the spec? (Scope creep)
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
## Step 5 — Evaluate Code Quality
|
|
92
|
+
|
|
93
|
+
Review the implementation code for:
|
|
94
|
+
|
|
95
|
+
- **Correctness** — does the code do what it's supposed to do? Edge cases, error handling, data flow
|
|
96
|
+
- **Existing patterns** — does the code follow established patterns in the codebase? Or does it introduce new ones when existing patterns work?
|
|
97
|
+
- **Code style** — does it match surrounding code? Naming conventions, file organization, import patterns
|
|
98
|
+
- **CLAUDE.md conventions** — does it follow the project's documented conventions?
|
|
99
|
+
- **Scope** — did the implementation stay within the spec's scope? Were files modified that shouldn't have been?
|
|
100
|
+
- **Test quality** — are unit tests substantive? Do they test behavior, not implementation? Are there obvious gaps?
|
|
101
|
+
|
|
102
|
+
---
|
|
103
|
+
|
|
104
|
+
## Step 6 — Evaluate QA Results (if available)
|
|
105
|
+
|
|
106
|
+
If a QA file exists (read the latest `03-qa*.md`):
|
|
107
|
+
|
|
108
|
+
- **Coverage** — does every acceptance criterion have at least one e2e test?
|
|
109
|
+
- **Test substance** — are the e2e tests substantive? Real assertions against real behavior? Or stubs that pass trivially?
|
|
110
|
+
- **Results** — do all tests pass? If not, why?
|
|
111
|
+
- **Issues found** — did QA find implementation issues? Are they addressed?
|
|
112
|
+
- **Verdict alignment** — does the QA status (PASS/FAIL/PARTIAL) align with what you observe in the code?
|
|
113
|
+
|
|
114
|
+
---
|
|
115
|
+
|
|
116
|
+
## Step 7 — Run Independent Verification
|
|
117
|
+
|
|
118
|
+
Don't trust prior check results. Run the quality checks yourself:
|
|
119
|
+
|
|
120
|
+
1. Run `lint-cmd` from Workflow Config
|
|
121
|
+
2. Run `test-cmd` — unit tests
|
|
122
|
+
3. Run `build-cmd`
|
|
123
|
+
4. If `e2e-cmd` exists, run it — e2e tests
|
|
124
|
+
|
|
125
|
+
Record pass/fail for each. If any fail:
|
|
126
|
+
- Determine if the failure is caused by the implementation or is pre-existing. Either way, flag it as an issue — all checks must pass. Pre-existing failures should be flagged as MAJOR with a note that they are pre-existing.
|
|
127
|
+
- Include the actual error output as evidence in the review artifact
|
|
128
|
+
|
|
129
|
+
This is independent verification — the implementation and QA skills already ran these, but the review skill re-runs them to catch regressions, stale results, or optimistic reporting.
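
Recording the check results and turning failures into issues might look like the sketch below. It is a minimal TypeScript sketch with hypothetical types and names (`CheckResult`, `CheckIssue`, `issuesFromChecks`). The text above fixes MAJOR only for pre-existing failures, so using MAJOR for implementation-caused failures as well is an assumption, and a reviewer may raise it to CRITICAL.

```typescript
// Outcome of one quality check (the shape is an assumption).
interface CheckResult {
  check: "lint" | "test" | "build" | "e2e";
  command: string; // the configured command that was run
  passed: boolean;
  preExisting: boolean; // failure also occurs without the implementation's changes
  output: string; // captured error output, used as evidence
}

interface CheckIssue {
  severity: "CRITICAL" | "MAJOR" | "MINOR";
  title: string;
  evidence: string;
}

// Every failing check becomes an issue; pre-existing failures are noted as such.
function issuesFromChecks(results: CheckResult[]): CheckIssue[] {
  return results
    .filter((result) => !result.passed)
    .map((result): CheckIssue => ({
      severity: "MAJOR", // assumed floor for any failing check
      title: result.preExisting
        ? `${result.check} check fails (pre-existing)`
        : `${result.check} check fails`,
      evidence: result.output,
    }));
}
```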
|
|
130
|
+
|
|
131
|
+
---
|
|
132
|
+
|
|
133
|
+
## Step 8 — Compile Issues
|
|
134
|
+
|
|
135
|
+
For each issue found:
|
|
136
|
+
|
|
137
|
+
```
|
|
138
|
+
**[SEVERITY] Issue title**
|
|
139
|
+
- **File:** `path/to/file.ext:line_number`
|
|
140
|
+
- **What:** Description of the problem
|
|
141
|
+
- **Why it matters:** Impact on correctness, spec compliance, or quality
|
|
142
|
+
- **Suggested fix:** Concrete suggestion for fix mode
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
Severity levels:
|
|
146
|
+
- **CRITICAL** — security vulnerability, data loss risk, fundamental correctness error. Must fix.
|
|
147
|
+
- **MAJOR** — spec non-compliance, missing acceptance criterion, significant code quality issue. Should fix.
|
|
148
|
+
- **MINOR** — style inconsistency, naming concern, minor improvement. Nice to fix but doesn't block.
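
The issue format and severity levels above map naturally onto a typed record. This is a minimal TypeScript sketch: the `ReviewIssue` shape and the helpers `formatIssue` and `sortBySeverity` are hypothetical names introduced for illustration.

```typescript
type Severity = "CRITICAL" | "MAJOR" | "MINOR";

// One review issue; the fields mirror the issue format above.
interface ReviewIssue {
  severity: Severity;
  title: string;
  file: string;
  line: number;
  what: string;
  why: string;
  suggestedFix: string;
}

// Renders an issue in the Markdown layout used by the review artifact.
function formatIssue(issue: ReviewIssue): string {
  return [
    `**[${issue.severity}] ${issue.title}**`,
    `- **File:** \`${issue.file}:${issue.line}\``,
    `- **What:** ${issue.what}`,
    `- **Why it matters:** ${issue.why}`,
    `- **Suggested fix:** ${issue.suggestedFix}`,
  ].join("\n");
}

const SEVERITY_ORDER: Record<Severity, number> = { CRITICAL: 0, MAJOR: 1, MINOR: 2 };

// Returns a copy ordered CRITICAL, MAJOR, MINOR without mutating the input.
function sortBySeverity(issues: ReviewIssue[]): ReviewIssue[] {
  return [...issues].sort(
    (a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]
  );
}
```

Sorting a copy keeps the order in which issues were discovered available, while a severity-ordered list suits a prioritized fix summary.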
|
|
149
|
+
|
|
150
|
+
---
|
|
151
|
+
|
|
152
|
+
## Step 9 — Render Verdict
|
|
153
|
+
|
|
154
|
+
**PASS** if:
|
|
155
|
+
- All acceptance criteria are met
|
|
156
|
+
- No CRITICAL or MAJOR issues remain
|
|
157
|
+
- Code follows project patterns and CLAUDE.md conventions
|
|
158
|
+
- All quality checks pass (lint, test, build, e2e)
|
|
159
|
+
- (If QA ran) e2e tests are substantive and passing
|
|
160
|
+
|
|
161
|
+
**FAIL** if:
|
|
162
|
+
- Any CRITICAL issue exists, OR
|
|
163
|
+
- Any MAJOR issue exists, OR
|
|
164
|
+
- One or more acceptance criteria are not met
|
|
165
|
+
|
|
166
|
+
MINOR issues alone do not cause a FAIL. They are noted but don't block progress.
|
|
167
|
+
|
|
168
|
+
The verdict is binary. No "conditional pass" or "mostly good." PASS or FAIL.
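
The verdict rules reduce to a single boolean decision. This is a minimal TypeScript sketch with hypothetical names (`ReviewFindings`, `renderVerdict`). Treating a failing quality check as a FAIL on its own is an interpretation: it follows from the PASS requirement that all checks pass, although the FAIL list does not name it separately.

```typescript
// Inputs to the verdict (the shape is an assumption).
interface ReviewFindings {
  issueSeverities: ("CRITICAL" | "MAJOR" | "MINOR")[];
  criteriaMet: boolean[]; // one entry per acceptance criterion
  checksPassed: boolean[]; // lint, unit tests, build, and e2e if configured
}

// Binary verdict: any blocking condition yields FAIL, otherwise PASS.
function renderVerdict(findings: ReviewFindings): "PASS" | "FAIL" {
  const hasBlockingIssue = findings.issueSeverities.some(
    (severity) => severity === "CRITICAL" || severity === "MAJOR"
  );
  const hasUnmetCriterion = findings.criteriaMet.some((met) => !met);
  const hasFailingCheck = findings.checksPassed.some((passed) => !passed);
  return hasBlockingIssue || hasUnmetCriterion || hasFailingCheck ? "FAIL" : "PASS";
}
```

MINOR issues never appear in the blocking conditions, which matches the rule that they are noted but do not block progress.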
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## Step 10 — Write the Review Artifact
|
|
173
|
+
|
|
174
|
+
Write `04-review.md` (or `04-review-N.md` for re-reviews):
|
|
175
|
+
|
|
176
|
+
```markdown
|
|
177
|
+
# Review: <feature title>
|
|
178
|
+
|
|
179
|
+
> Spec: [01-spec.md](01-spec.md)
|
|
180
|
+
> Implementation: [02-implementation.md](02-implementation.md)
|
|
181
|
+
> QA: [03-qa-N.md](03-qa-N.md) (or "Not yet run")
|
|
182
|
+
> Date: YYYY-MM-DD
|
|
183
|
+
> Review #: 1 | 2 | 3
|
|
184
|
+
> Verdict: **PASS** | **FAIL**
|
|
185
|
+
|
|
186
|
+
## Verdict Summary
|
|
187
|
+
|
|
188
|
+
<2–3 sentences explaining the verdict. What's the overall state of this implementation?>
|
|
189
|
+
|
|
190
|
+
## Acceptance Criteria Check
|
|
191
|
+
|
|
192
|
+
| # | Criterion | Spec Met? | QA Proven? | Notes |
|
|
193
|
+
|---|-----------|-----------|------------|-------|
|
|
194
|
+
| 1 | <criterion> | Yes/No | Yes/No/N/A | <brief note> |
|
|
195
|
+
| 2 | ... | ... | ... | ... |
|
|
196
|
+
|
|
197
|
+
## Issues
|
|
198
|
+
|
|
199
|
+
### CRITICAL
|
|
200
|
+
|
|
201
|
+
<If none: "None.">
|
|
202
|
+
|
|
203
|
+
**[CRITICAL] <issue title>**
|
|
204
|
+
- **File:** `path/to/file.ext:line_number`
|
|
205
|
+
- **What:** <description>
|
|
206
|
+
- **Why it matters:** <impact>
|
|
207
|
+
- **Suggested fix:** <concrete suggestion>
|
|
208
|
+
|
|
209
|
+
### MAJOR
|
|
210
|
+
|
|
211
|
+
<If none: "None.">
|
|
212
|
+
|
|
213
|
+
### MINOR
|
|
214
|
+
|
|
215
|
+
<If none: "None.">
|
|
216
|
+
|
|
217
|
+
## Spec Compliance
|
|
218
|
+
|
|
219
|
+
<Brief assessment: did the implementation follow the spec? Were deviations justified? Was scope respected?>
|
|
220
|
+
|
|
221
|
+
## Code Quality
|
|
222
|
+
|
|
223
|
+
<Brief assessment: does the code follow project patterns? Style consistency? Test quality?>
|
|
224
|
+
|
|
225
|
+
## QA Assessment
|
|
226
|
+
|
|
227
|
+
<If QA ran: brief assessment of e2e test quality and coverage. If not: "QA has not been run.">
|
|
228
|
+
|
|
229
|
+
## Independent Check Results
|
|
230
|
+
|
|
231
|
+
| Check | Command | Result | Notes |
|
|
232
|
+
|-------|---------|--------|-------|
|
|
233
|
+
| Lint | `<lint-cmd>` | Pass / Fail | <details if failed> |
|
|
234
|
+
| Unit Tests | `<test-cmd>` | Pass / Fail | <details if failed> |
|
|
235
|
+
| Build | `<build-cmd>` | Pass / Fail | <details if failed> |
|
|
236
|
+
| E2E Tests | `<e2e-cmd>` | Pass / Fail / N/A | <details if failed> |
|
|
237
|
+
|
|
238
|
+
## Summary for Fix Mode
|
|
239
|
+
|
|
240
|
+
<If FAIL: A prioritized list of what the implementation skill should fix, ordered by severity.>
|
|
241
|
+
|
|
242
|
+
1. [CRITICAL] <issue> — <one-line fix guidance>
|
|
243
|
+
2. [MAJOR] <issue> — <one-line fix guidance>
|
|
244
|
+
3. [MINOR] <issue> — <one-line fix guidance (optional to address)>
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
---
|
|
248
|
+
|
|
249
|
+
## Step 11 — Report to User
|
|
250
|
+
|
|
251
|
+
Present:
|
|
252
|
+
|
|
253
|
+
1. **Verdict** (PASS/FAIL) prominently
|
|
254
|
+
2. Issue counts by severity
|
|
255
|
+
3. Acceptance criteria status (N/M met)
|
|
256
|
+
4. Key findings — the most important 2–3 issues
|
|
257
|
+
5. Path to the review file
|
|
258
|
+
|
|
259
|
+
If FAIL: "The implementation has N issues to address. Re-run the implementation skill to enter fix mode."
|
|
260
|
+
|
|
261
|
+
If PASS: "The implementation passes review. Ready to ship."
|
|
262
|
+
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
## Re-Review Behavior
|
|
266
|
+
|
|
267
|
+
On re-review (when previous review files exist):
|
|
268
|
+
|
|
269
|
+
1. Read all previous reviews — understand what was flagged before
|
|
270
|
+
2. Re-read the code from scratch — do NOT anchor to the previous review. The fix may have changed things in unexpected ways.
|
|
271
|
+
3. Check that previously flagged issues are fixed — for each issue from the last review, verify it's resolved
|
|
272
|
+
4. Check for regressions — did the fix introduce new issues?
|
|
273
|
+
5. Check for new issues — things you might have missed before, now visible with fresh eyes
|
|
274
|
+
|
|
275
|
+
The re-review artifact should reference the previous review and explicitly state which prior issues are resolved vs still open.
|
|
276
|
+
|
|
277
|
+
---
|
|
278
|
+
|
|
279
|
+
## Constraints
|
|
280
|
+
|
|
281
|
+
**DO:**
|
|
282
|
+
- Read the actual code, not just the implementation report
|
|
283
|
+
- Use `git diff` to see exactly what changed
|
|
284
|
+
- Cite specific file paths and line numbers for every issue
|
|
285
|
+
- Assign severity (CRITICAL/MAJOR/MINOR) to every issue
|
|
286
|
+
- Write a "Summary for Fix Mode" section that the implementation skill can act on
|
|
287
|
+
- Run all quality checks independently — don't trust prior results from implementation or QA
|
|
288
|
+
- Re-read code from scratch on re-reviews
|
|
289
|
+
|
|
290
|
+
**DON'T:**
|
|
291
|
+
- Trust the implementation report at face value — verify independently
|
|
292
|
+
- Say "looks good" or "looks good overall" — this is banned. Find specific evidence.
|
|
293
|
+
- Use hedging language ("should probably", "might be an issue", "seems fine") — be definitive
|
|
294
|
+
- Fix code yourself — the review skill identifies issues, the implementation skill fixes them
|
|
295
|
+
- Give a PASS verdict when CRITICAL or MAJOR issues exist
|
|
296
|
+
- Anchor to previous reviews on re-review — re-read the code fresh
|
|
297
|
+
|
|
298
|
+
---
|
|
299
|
+
|
|
300
|
+
## Red Flags
|
|
301
|
+
|
|
302
|
+
If you catch yourself thinking any of these, stop:
|
|
303
|
+
|
|
304
|
+
- "Looks good overall" — STOP. That is sycophancy. Find specific evidence for your assessment.
|
|
305
|
+
- "This is a minor issue, not worth flagging" — STOP. Flag everything. Classify it as MINOR if it's minor, but flag it.
|
|
306
|
+
- "The implementation report says this was done correctly" — STOP. Read the code. The report may be wrong.
|
|
307
|
+
- "I already reviewed this file last time, it was fine" — STOP (on re-review). Read it again from scratch. The fix may have introduced regressions.
|
|
308
|
+
- "The tests pass so the feature works" — STOP. Passing tests can be stubs. Read the test code and verify it's substantive.
|
|
309
|
+
- "I should be lenient because this is the third review" — STOP. Leniency produces bugs. Apply the same standard every time.
|