@abdullahsahmad/work-kit 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +147 -0
- package/cli/bin/work-kit.mjs +18 -0
- package/cli/src/commands/complete.ts +123 -0
- package/cli/src/commands/completions.ts +137 -0
- package/cli/src/commands/context.ts +41 -0
- package/cli/src/commands/doctor.ts +79 -0
- package/cli/src/commands/init.test.ts +116 -0
- package/cli/src/commands/init.ts +184 -0
- package/cli/src/commands/loopback.ts +64 -0
- package/cli/src/commands/next.ts +172 -0
- package/cli/src/commands/observe.ts +144 -0
- package/cli/src/commands/setup.ts +159 -0
- package/cli/src/commands/status.ts +50 -0
- package/cli/src/commands/uninstall.ts +89 -0
- package/cli/src/commands/upgrade.ts +12 -0
- package/cli/src/commands/validate.ts +34 -0
- package/cli/src/commands/workflow.ts +125 -0
- package/cli/src/config/agent-map.ts +62 -0
- package/cli/src/config/loopback-routes.ts +45 -0
- package/cli/src/config/phases.ts +119 -0
- package/cli/src/context/extractor.test.ts +77 -0
- package/cli/src/context/extractor.ts +73 -0
- package/cli/src/context/prompt-builder.ts +70 -0
- package/cli/src/engine/loopbacks.test.ts +33 -0
- package/cli/src/engine/loopbacks.ts +32 -0
- package/cli/src/engine/parallel.ts +60 -0
- package/cli/src/engine/phases.ts +23 -0
- package/cli/src/engine/transitions.test.ts +117 -0
- package/cli/src/engine/transitions.ts +97 -0
- package/cli/src/index.ts +248 -0
- package/cli/src/observer/data.ts +237 -0
- package/cli/src/observer/renderer.ts +316 -0
- package/cli/src/observer/watcher.ts +99 -0
- package/cli/src/state/helpers.test.ts +91 -0
- package/cli/src/state/helpers.ts +65 -0
- package/cli/src/state/schema.ts +113 -0
- package/cli/src/state/store.ts +82 -0
- package/cli/src/state/validators.test.ts +105 -0
- package/cli/src/state/validators.ts +81 -0
- package/cli/src/utils/colors.ts +12 -0
- package/package.json +49 -0
- package/skills/auto-kit/SKILL.md +214 -0
- package/skills/build/SKILL.md +88 -0
- package/skills/build/stages/commit.md +43 -0
- package/skills/build/stages/core.md +48 -0
- package/skills/build/stages/integration.md +44 -0
- package/skills/build/stages/migration.md +41 -0
- package/skills/build/stages/red.md +44 -0
- package/skills/build/stages/refactor.md +48 -0
- package/skills/build/stages/setup.md +42 -0
- package/skills/build/stages/ui.md +51 -0
- package/skills/deploy/SKILL.md +62 -0
- package/skills/deploy/stages/merge.md +47 -0
- package/skills/deploy/stages/monitor.md +39 -0
- package/skills/deploy/stages/remediate.md +54 -0
- package/skills/full-kit/SKILL.md +195 -0
- package/skills/plan/SKILL.md +77 -0
- package/skills/plan/stages/architecture.md +53 -0
- package/skills/plan/stages/audit.md +58 -0
- package/skills/plan/stages/blueprint.md +60 -0
- package/skills/plan/stages/clarify.md +61 -0
- package/skills/plan/stages/investigate.md +47 -0
- package/skills/plan/stages/scope.md +46 -0
- package/skills/plan/stages/sketch.md +44 -0
- package/skills/plan/stages/ux-flow.md +49 -0
- package/skills/review/SKILL.md +104 -0
- package/skills/review/stages/compliance.md +48 -0
- package/skills/review/stages/handoff.md +59 -0
- package/skills/review/stages/performance.md +45 -0
- package/skills/review/stages/security.md +49 -0
- package/skills/review/stages/self-review.md +41 -0
- package/skills/test/SKILL.md +83 -0
- package/skills/test/stages/e2e.md +44 -0
- package/skills/test/stages/validate.md +51 -0
- package/skills/test/stages/verify.md +41 -0
- package/skills/wrap-up/SKILL.md +107 -0
+++ package/skills/plan/stages/ux-flow.md
@@ -0,0 +1,49 @@
---
description: "Plan sub-stage: Define user-facing flows, screens, interactions, edge cases."
---

# UX Flow

**Role:** UX Designer
**Goal:** Define how the user will experience this feature.

## Instructions

1. If this feature has no UI changes, output `has_ui_changes: false` and move on
2. Otherwise, define the user flow step by step
3. List screens/pages affected (new or modified)
4. Define interactions (clicks, forms, navigation)
5. Cover edge cases: empty states, loading states, error states

## Output (append to state.md)

```markdown
### Plan: UX Flow

**Has UI Changes:** true/false

**User Flow:**
1. User navigates to <page>
2. User clicks <element>
3. System shows <response>
4. ...

**Screens Affected:**
- <page/component> — <new or modified> — <what changes>

**Interactions:**
- <element>: <what happens on click/submit/hover>

**Edge Cases:**
- Empty state: <what shows when no data>
- Loading state: <what shows during fetch>
- Error state: <what shows on failure>
- Permissions: <what if user can't access>
```

## Rules

- Skip this entirely if there are zero UI changes (backend-only features)
- Think from the user's perspective, not the developer's
- Every screen should have empty, loading, and error states defined
- Reference existing UI patterns from Investigate findings
+++ package/skills/review/SKILL.md
@@ -0,0 +1,104 @@
---
name: review
description: "Run the Review phase — 5 sub-stages: Self-Review, Security, Performance, Compliance, Handoff."
user-invocable: false
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

You are the **Senior Reviewer**. Perform multi-dimensional code review before the feature ships.

## Sub-stages (in order)

1. **Self-Review** — Check your own diff for obvious issues
2. **Security** — OWASP top 10 security review
3. **Performance** — Query efficiency, bundle size, rendering performance
4. **Compliance** — Compare final code against Blueprint
5. **Handoff** — Finalize PR description, flag concerns, make ship/no-ship decision

## Execution

For each sub-stage:
1. Read the sub-stage file (e.g., `.claude/skills/review/stages/self-review.md`)
2. Follow its instructions — fix issues directly when possible
3. Update `.work-kit/state.md` with findings
4. Proceed to next sub-stage

## Key Principle

**Fix issues directly when possible.** A review that only lists problems without fixing them is half a review. If you can fix it in under 5 minutes, fix it. If it's bigger, document it for the Handoff decision.

## Recording

Throughout every sub-stage, update the shared state.md sections:

- **`## Decisions`** — If you make judgment calls during review (e.g., "accepted this deviation because..."), record them.
- **`## Deviations`** — Compliance sub-stage will audit these. If you fix a deviation during review, note that it was resolved.

Review findings feed directly into the Handoff decision and the final work-kit log.

## Context Input

This phase runs as a **fresh agent** (the orchestrator). Read only these sections from `.work-kit/state.md`:
- `### Plan: Final` — Blueprint (for Compliance review)
- `### Build: Final` — what was built, PR, deviations
- `### Test: Final` — test results, criteria status, confidence
- `## Criteria` — acceptance criteria

## Parallel Sub-agents

**Self-Review, Security, Performance, and Compliance** are independent and run as **4 parallel sub-agents**. **Handoff** runs after all 4 complete.

```
Agent: Self-Review ──┐
Agent: Security    ──┤
Agent: Performance ──├──→ Agent: Handoff
Agent: Compliance  ──┘
```

Each sub-agent receives:
- The git diff (`git diff main...HEAD`)
- The relevant Context Input sections
- Its sub-stage skill file instructions

Each writes its own `### Review: <sub-stage>` section to state.md.

**Handoff agent** reads all 4 review sections + Test: Final → makes the ship decision.

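The fan-out/fan-in above can be sketched with `Promise.all`. This is an illustrative stub, not the actual work-kit engine — `runAgent` here is a hypothetical stand-in for whatever spawns a sub-agent with its skill file and the diff:

```typescript
// Illustrative stub of the review fan-out: four independent review agents
// run concurrently; Handoff starts only after all four complete.
type Review = { stage: string; findings: string[] };

async function runAgent(stage: string): Promise<Review> {
  // Stand-in for spawning a sub-agent with the diff and its skill file.
  return { stage, findings: [] };
}

async function runReviewPhase(): Promise<string> {
  const stages = ["Self-Review", "Security", "Performance", "Compliance"];
  // Fan out: all four reviews run in parallel.
  const reviews = await Promise.all(stages.map((s) => runAgent(s)));
  // Fan in: Handoff sees every review section before deciding.
  const totalFindings = reviews.reduce((n, r) => n + r.findings.length, 0);
  return totalFindings === 0 ? "approved" : "changes_requested";
}
```

The key property is that `Promise.all` holds the Handoff step until every review settles — a slow reviewer delays the decision but is never skipped.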
## Final Output

After Handoff completes, append a `### Review: Final` section to state.md. This is what **Deploy and Wrap-up read**.

```markdown
### Review: Final

**Decision:** approved | changes_requested | rejected

**Summary:** <1-2 sentences — overall assessment>

**Issues found:** <total count>
**Issues fixed:** <count>
**Remaining concerns:**
- <concern — or "None">

**Security:** <clear | risks noted>
**Performance:** <clear | issues noted>
**Compliance:** <compliant | deviations noted>

**If changes_requested:**
- <specific change 1>
- <specific change 2>

**If rejected:**
- <reason>
```

Then:
- Update state: `**Phase:** review (complete)`
- Commit state: `git add .work-kit/ && git commit -m "work-kit: complete review"`

## Routing

The Handoff decision determines what happens next:
- **approved** → proceed to Deploy (or complete if deploy is skipped)
- **changes_requested** → loop back to Build/Core with specific change list
- **rejected** → stop work, explain why to the user
+++ package/skills/review/stages/compliance.md
@@ -0,0 +1,48 @@
---
description: "Review sub-stage: Compare final code against Blueprint."
---

# Compliance Review

**Role:** Compliance Auditor
**Goal:** Verify the implementation matches what was planned.

## Instructions

1. Re-read the Blueprint from `.work-kit/state.md`
2. Compare final code against each Blueprint step:
   - Was the step implemented?
   - Does it match the specified approach?
   - Any deviations?
3. Check:
   - All Blueprint steps are implemented
   - No scope creep (things built that weren't in the Blueprint)
   - Architecture matches the plan (data model, API surface, components)
   - UX Flow matches the plan (screens, interactions, states)
4. Document any deviations with justification

## Output (append to state.md)

```markdown
### Review: Compliance

**Result:** compliant | deviations_found

**Blueprint Steps:**
- Step 1: <implemented | deviated | missing>
- Step 2: <implemented | deviated | missing>
- ...

**Deviations:**
- <deviation and justification — or "None">

**Scope Creep:**
- <anything built that wasn't planned — or "None">
```

## Rules

- Deviations aren't always bad — sometimes the plan was wrong and the code adapted
- But deviations need justification — "I felt like it" is not acceptable
- Missing steps are a red flag — they need to be implemented or explicitly dropped with reason
- Scope creep should be called out even if the extra code is good
+++ package/skills/review/stages/handoff.md
@@ -0,0 +1,59 @@
---
description: "Review sub-stage: Finalize PR, make ship/no-ship decision."
---

# Handoff

**Role:** Ship Decision Maker
**Goal:** Final PR polish and go/no-go decision.

## Instructions

1. Update the PR description with:
   - Summary of what was built
   - How to test it
   - Screenshots if applicable
   - Any concerns or known limitations
2. Review all findings from prior review sub-stages
3. Check acceptance criteria status from Test/Validate
4. Make the decision: **approved**, **changes_requested**, or **rejected**

## Decision Criteria

- **approved**: All criteria met, no critical/high security issues, tests pass, compliance is acceptable
- **changes_requested**: Gaps exist but are fixable — specify exactly what needs to change
- **rejected**: Fundamental problems that require rethinking the approach

## Output (append to state.md)

```markdown
### Review: Handoff

**PR Description:** updated | already adequate
**Summary:** <1-2 sentences: what ships and its state>

**Concerns:**
- <any remaining concerns — or "None">

**Decision:** approved | changes_requested | rejected

**If changes_requested:**
- <specific change 1>
- <specific change 2>

**If rejected:**
- <reason and recommended next step>
```

## Outcome Routing

- **approved** → Proceed to Deploy phase (or complete if deploy skipped)
- **changes_requested** → Loop back to Build/Core with the change list as context
- **rejected** → Stop. Report to user with explanation.

## Rules

- Be specific about what needs to change — "needs work" is useless feedback
- Don't block on cosmetic issues — fix them directly before finalizing
- The PR should be ready for a human reviewer after this step
- If you're unsure between approved and changes_requested, ask the user
+++ package/skills/review/stages/performance.md
@@ -0,0 +1,45 @@
---
description: "Review sub-stage: Check for performance issues."
---

# Performance Review

**Role:** Performance Engineer
**Goal:** Catch performance problems before they reach production.

## Instructions

Check the diff for:

1. **N+1 Queries** — Loops that make individual DB queries instead of batching
2. **Missing Indexes** — New query patterns that need DB indexes
3. **Large Bundle Imports** — Importing entire libraries when only one function is needed
4. **Unnecessary Re-renders** — Components re-rendering when they shouldn't
5. **Missing Memoization** — Expensive computations without caching
6. **Hot Path Operations** — Heavy work in frequently-called code paths
7. **Missing Pagination** — Unbounded queries or list renders
8. **Memory Leaks** — Event listeners, intervals, or subscriptions not cleaned up

Fix what you can. Document what needs deeper investigation.

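Item 1 is the most common offender. A minimal sketch of the pattern and its fix, using a hypothetical in-memory `db` rather than a real client — the point is that round trips should not grow with the data:

```typescript
// Hypothetical in-memory "db" to contrast per-row queries with one batch.
const db = new Map<number, string>([[1, "a"], [2, "b"], [3, "c"]]);
let queryCount = 0;

function fetchOne(id: number): string | undefined {
  queryCount++; // each call would be one DB round trip
  return db.get(id);
}

function fetchMany(ids: number[]): (string | undefined)[] {
  queryCount++; // one round trip for the whole batch
  return ids.map((id) => db.get(id));
}

// N+1: one query per id — round trips grow with the data.
const nPlusOne = [1, 2, 3].map((id) => fetchOne(id));
// Batched: a single query — what the review should push toward.
const batched = fetchMany([1, 2, 3]);
```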
## Output (append to state.md)

```markdown
### Review: Performance

**Findings:**
- <finding — or "None">

**Fixes Applied:**
- <what was fixed — or "None">

**Recommendations:**
- <suggestions for future optimization — or "None">
```

## Rules

- Focus on actual problems, not theoretical ones
- Don't prematurely optimize code that runs once on page load
- DO flag anything that scales with data (queries, list renders, loops)
- If you add an index, include it in the migration or note it as needed
+++ package/skills/review/stages/security.md
@@ -0,0 +1,49 @@
---
description: "Review sub-stage: OWASP top 10 security review."
---

# Security Review

**Role:** Security Auditor
**Goal:** Check for common security vulnerabilities.

## Instructions

Review the diff against OWASP Top 10:

1. **Injection** — SQL injection, command injection, code injection. Are all inputs parameterized?
2. **Broken Auth** — Are auth checks in place where needed? Session handling correct?
3. **Sensitive Data Exposure** — Are secrets, tokens, PII handled safely? No logging of sensitive data?
4. **Broken Access Control** — Can users access resources they shouldn't?
5. **Security Misconfiguration** — Default configs, unnecessary features enabled, overly permissive CORS?
6. **XSS** — User input rendered without sanitization? Raw HTML injection vectors?
7. **Insecure Deserialization** — Untrusted data parsed without validation?
8. **Vulnerable Dependencies** — Known CVEs in new dependencies?
9. **Insufficient Logging** — Are security events logged, with no sensitive data in the logs?
10. **CSRF** — State-changing requests protected?

Fix issues directly when possible. Document what you can't fix.

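For item 1, the shape to look for can be shown with a hypothetical query helper (not a real DB client) — parameterized SQL keeps attacker input out of the statement text entirely:

```typescript
// Hypothetical helpers contrasting unsafe interpolation with placeholders.
function unsafeQuery(name: string): string {
  // BAD: attacker-controlled input becomes part of the SQL text.
  return `SELECT * FROM users WHERE name = '${name}'`;
}

function parameterizedQuery(name: string): { sql: string; params: string[] } {
  // GOOD: the SQL shape is fixed; the value travels separately as a parameter.
  return { sql: "SELECT * FROM users WHERE name = ?", params: [name] };
}

const payload = "x'; DROP TABLE users; --";
// The unsafe version embeds the payload in the statement itself:
const injected = unsafeQuery(payload).includes("DROP TABLE");
// The parameterized version never mixes the payload into the SQL:
const contained = parameterizedQuery(payload).sql.includes("DROP");
```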
## Output (append to state.md)

```markdown
### Review: Security

**Findings:**
- <finding with severity: critical/high/medium/low — or "None">

**Fixes Applied:**
- <what was fixed — or "None">

**Remaining Risks:**
- <risks that need human attention — or "None">

**Severity Summary:** no issues | low | medium | high | critical
```

## Rules

- Focus on code YOU wrote/modified — not the entire codebase
- Not every feature touches all 10 categories — skip irrelevant ones
- Don't add security theater (unnecessary complexity for non-existent threats)
- If you find a critical issue, fix it immediately and note it prominently
+++ package/skills/review/stages/self-review.md
@@ -0,0 +1,41 @@
---
description: "Review sub-stage: Review your own diff for obvious issues."
---

# Self-Review

**Role:** Self-Critical Developer
**Goal:** Catch the easy stuff before the formal review.

## Instructions

1. Run `git diff main...HEAD` to see the full diff
2. Check for:
   - Dead code or unused imports
   - Unclear variable/function naming
   - Missing or misleading comments
   - Copy-paste errors
   - Formatting issues (run the linter)
   - TODOs that should be resolved
   - Console.logs or debug code left in
   - Code that doesn't match the Blueprint
3. Fix issues directly — don't just list them
4. Run tests after fixes to confirm nothing broke

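The debug-code check can be partially mechanized. A hypothetical helper (not part of the work-kit CLI) that flags added diff lines carrying `console.log` or `debugger`:

```typescript
// Hypothetical helper: flag added diff lines that carry debug code.
const DEBUG_PATTERNS = [/\bconsole\.log\(/, /\bdebugger\b/];

function findDebugLines(diff: string): string[] {
  return diff
    .split("\n")
    // keep only lines the branch added (skip the "+++" file header)
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .filter((line) => DEBUG_PATTERNS.some((p) => p.test(line)));
}

const sampleDiff = [
  "+++ b/src/avatar.ts",
  "+export function avatar() {",
  '+  console.log("here");',
  "+}",
].join("\n");
```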
## Output (append to state.md)

```markdown
### Review: Self-Review

**Issues Found:** <N>
**Issues Fixed:** <M>
**Remaining Concerns:**
- <anything you found but couldn't fix — or "None">
```

## Rules

- Run the linter and fix all warnings
- Remove ALL debug code (console.log, debugger statements, etc.)
- This is about catching careless mistakes, not redesigning the architecture
- Be honest — pretending your code is perfect helps no one
+++ package/skills/test/SKILL.md
@@ -0,0 +1,83 @@
---
name: test
description: "Run the Test phase — 3 sub-stages: Verify, E2E, Validate."
user-invocable: false
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

You are the **QA Lead**. Validate the implementation against the Blueprint and acceptance criteria.

## Sub-stages (in order)

1. **Verify** — Run existing test suite, check for regressions
2. **E2E** — Test user flows end-to-end
3. **Validate** — Verify every acceptance criterion is satisfied

## Execution

For each sub-stage:
1. Read the sub-stage file (e.g., `.claude/skills/test/stages/verify.md`)
2. Follow its instructions
3. Update `.work-kit/state.md` with outputs
4. Proceed to next sub-stage

## Key Principle

**Test against the Blueprint, not just the code.** The Blueprint defined what should be built. Verify that what was built matches what was planned, and that it actually works.

## Recording

Throughout every sub-stage, update the shared state.md sections:

- **`## Criteria`** — Check off criteria as they're verified. Add evidence inline: `- [x] <criterion> — verified by <test name / screenshot / manual check>`.
- **`## Decisions`** — If you discover a criterion is untestable or needs reinterpretation, record the decision and why.

The criteria checklist is copied directly into the final work-kit log. Make it accurate.

## Context Input

This phase runs as a **fresh agent**. Read only these sections from `.work-kit/state.md`:
- `### Build: Final` — what was built, PR, test status, known issues
- `### Plan: Final` — Blueprint (to test against) and Architecture
- `## Criteria` — acceptance criteria to validate

## Parallel Sub-agents

**Verify** and **E2E** are independent and can run as **parallel sub-agents**. **Validate** runs after both complete (it needs their results).

```
Agent: Verify ──┐
                ├──→ Agent: Validate
Agent: E2E    ──┘
```

Each sub-agent reads the same Context Input sections and writes its own `### Test: <sub-stage>` section to state.md.

## Final Output

After all sub-stages are done, append a `### Test: Final` section to state.md. This is what **Review agents read**.

```markdown
### Test: Final

**Suite status:** all passing | <N> failures
**Total tests:** <count> (passing: <N>, failing: <N>)

**Criteria status:**
- Satisfied: <N> / <total>
- Gaps: <list — or "None">

**Confidence:** high | medium | low

**E2E results:**
- <flow>: pass | fail
- ...

**Evidence summary:**
- <criterion> — <evidence type and location>
- ...
```

Then:
- Update state: `**Phase:** test (complete)`
- Commit state: `git add .work-kit/ && git commit -m "work-kit: complete test"`
+++ package/skills/test/stages/e2e.md
@@ -0,0 +1,44 @@
---
description: "Test sub-stage: Test user flows end-to-end."
---

# E2E

**Role:** End-to-End Tester
**Goal:** Test the feature as a user would experience it.

## Instructions

1. Review the UX Flow from the Plan phase
2. For each user flow defined:
   - Write an E2E test (Playwright, Cypress, or manual verification)
   - Test the happy path
   - Test key edge cases (empty state, error state, boundary values)
3. Take screenshots at key states if the test framework supports it
4. Focus on the most important flows — don't test every permutation

## Output (append to state.md)

```markdown
### Test: E2E

**Tests Written:**
- `<test file>`: <flow description>

**Flows Verified:**
- <flow 1>: pass | fail (<details>)
- <flow 2>: pass | fail (<details>)

**Screenshots:**
- <description>: <path or "not applicable">

**Notes:**
- <edge cases tested, issues found>
```

## Rules

- If the project has no E2E framework, test manually and document the steps
- Focus on user-visible behavior, not internal implementation
- Screenshots are evidence — capture them for key states
- If a flow fails, fix the implementation (not the test) unless the test expectation is wrong
+++ package/skills/test/stages/validate.md
@@ -0,0 +1,51 @@
---
description: "Test sub-stage: Verify every acceptance criterion is satisfied with evidence."
---

# Validate

**Role:** Acceptance Validator
**Goal:** Confirm every acceptance criterion from Clarify is met.

## Instructions

1. Read the `## Criteria` section from `.work-kit/state.md`
2. For each criterion:
   - Determine if it's been satisfied by the implementation
   - Identify the evidence (passing test, screenshot, code reference)
   - Mark it as checked `[x]` or unchecked `[ ]` with a note
3. Assess overall confidence: high | medium | low

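The satisfied-versus-total tally can be computed mechanically from the checklist format. A hypothetical sketch, not part of the work-kit CLI:

```typescript
// Hypothetical parser for the `## Criteria` checklist: counts how many
// criteria are checked off so the summary tally can be computed.
function criteriaStatus(section: string): { satisfied: number; total: number } {
  const items = section
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^- \[[ x]\] /.test(line));
  const satisfied = items.filter((line) => line.startsWith("- [x]")).length;
  return { satisfied, total: items.length };
}

const example = [
  "## Criteria",
  "- [x] User can upload an avatar image — tested",
  "- [x] Fallback to initials when no avatar — screenshot",
  "- [ ] Avatar displays at 96px — not tested",
].join("\n");
```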
## Output (append to state.md)

Update the `## Criteria` section — check off satisfied criteria with evidence:

```markdown
## Criteria
- [x] User can upload an avatar image — tested in `avatar.test.ts:upload`
- [x] Fallback to initials when no avatar — screenshot: empty-state.png
- [ ] Avatar displays at 32px, 48px, and 96px sizes — 32px and 48px verified, 96px not tested
```

Also append:

```markdown
### Test: Validate

**Criteria Status:**
- Satisfied: <N> / <total>
- Gaps: <list of unsatisfied criteria>

**Confidence:** high | medium | low

**Gap Details:**
- "<unsatisfied criterion>" — <why it's not met and what's needed>
```

## Rules

- Every criterion needs evidence, not just "I think it works"
- Be honest about gaps — hiding them here means Review catches them later (or worse, they ship)
- If a criterion is genuinely not testable, explain why
- Low confidence should trigger concern in the Review phase
- Criteria should not change during Test — if a new criterion is discovered, note it but don't add it to the checklist mid-test
+++ package/skills/test/stages/verify.md
@@ -0,0 +1,41 @@
---
description: "Test sub-stage: Run existing test suite, check for regressions."
---

# Verify

**Role:** Regression Tester
**Goal:** Ensure nothing is broken — both new and existing tests pass.

## Instructions

1. Run the full test suite
2. Check results:
   - All new tests (from Build/Red): should pass
   - All pre-existing tests: should still pass
3. If any test fails:
   - Determine if it's a regression (existing test broke) or a new failure
   - Fix regressions immediately — don't skip or disable tests
   - For new test failures, investigate and fix the implementation
4. Run the suite again after fixes to confirm clean pass

## Output (append to state.md)

```markdown
### Test: Verify

**Suite Result:** pass | fail
**Total Tests:** <N> passing, <M> failing
**Regressions Found:**
- <test name> — <what broke and fix applied — or "None">

**Fixes Applied:**
- <description — or "None">
```

## Rules

- Do NOT skip failing tests — fix them
- Do NOT disable tests to make the suite pass
- If a pre-existing test fails and it's a legitimate behavior change, update the test with a comment explaining why
- Run the suite at least twice — once to find issues, once to confirm fixes