@abdullahsahmad/work-kit 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (76)
  1. package/README.md +147 -0
  2. package/cli/bin/work-kit.mjs +18 -0
  3. package/cli/src/commands/complete.ts +123 -0
  4. package/cli/src/commands/completions.ts +137 -0
  5. package/cli/src/commands/context.ts +41 -0
  6. package/cli/src/commands/doctor.ts +79 -0
  7. package/cli/src/commands/init.test.ts +116 -0
  8. package/cli/src/commands/init.ts +184 -0
  9. package/cli/src/commands/loopback.ts +64 -0
  10. package/cli/src/commands/next.ts +172 -0
  11. package/cli/src/commands/observe.ts +144 -0
  12. package/cli/src/commands/setup.ts +159 -0
  13. package/cli/src/commands/status.ts +50 -0
  14. package/cli/src/commands/uninstall.ts +89 -0
  15. package/cli/src/commands/upgrade.ts +12 -0
  16. package/cli/src/commands/validate.ts +34 -0
  17. package/cli/src/commands/workflow.ts +125 -0
  18. package/cli/src/config/agent-map.ts +62 -0
  19. package/cli/src/config/loopback-routes.ts +45 -0
  20. package/cli/src/config/phases.ts +119 -0
  21. package/cli/src/context/extractor.test.ts +77 -0
  22. package/cli/src/context/extractor.ts +73 -0
  23. package/cli/src/context/prompt-builder.ts +70 -0
  24. package/cli/src/engine/loopbacks.test.ts +33 -0
  25. package/cli/src/engine/loopbacks.ts +32 -0
  26. package/cli/src/engine/parallel.ts +60 -0
  27. package/cli/src/engine/phases.ts +23 -0
  28. package/cli/src/engine/transitions.test.ts +117 -0
  29. package/cli/src/engine/transitions.ts +97 -0
  30. package/cli/src/index.ts +248 -0
  31. package/cli/src/observer/data.ts +237 -0
  32. package/cli/src/observer/renderer.ts +316 -0
  33. package/cli/src/observer/watcher.ts +99 -0
  34. package/cli/src/state/helpers.test.ts +91 -0
  35. package/cli/src/state/helpers.ts +65 -0
  36. package/cli/src/state/schema.ts +113 -0
  37. package/cli/src/state/store.ts +82 -0
  38. package/cli/src/state/validators.test.ts +105 -0
  39. package/cli/src/state/validators.ts +81 -0
  40. package/cli/src/utils/colors.ts +12 -0
  41. package/package.json +49 -0
  42. package/skills/auto-kit/SKILL.md +214 -0
  43. package/skills/build/SKILL.md +88 -0
  44. package/skills/build/stages/commit.md +43 -0
  45. package/skills/build/stages/core.md +48 -0
  46. package/skills/build/stages/integration.md +44 -0
  47. package/skills/build/stages/migration.md +41 -0
  48. package/skills/build/stages/red.md +44 -0
  49. package/skills/build/stages/refactor.md +48 -0
  50. package/skills/build/stages/setup.md +42 -0
  51. package/skills/build/stages/ui.md +51 -0
  52. package/skills/deploy/SKILL.md +62 -0
  53. package/skills/deploy/stages/merge.md +47 -0
  54. package/skills/deploy/stages/monitor.md +39 -0
  55. package/skills/deploy/stages/remediate.md +54 -0
  56. package/skills/full-kit/SKILL.md +195 -0
  57. package/skills/plan/SKILL.md +77 -0
  58. package/skills/plan/stages/architecture.md +53 -0
  59. package/skills/plan/stages/audit.md +58 -0
  60. package/skills/plan/stages/blueprint.md +60 -0
  61. package/skills/plan/stages/clarify.md +61 -0
  62. package/skills/plan/stages/investigate.md +47 -0
  63. package/skills/plan/stages/scope.md +46 -0
  64. package/skills/plan/stages/sketch.md +44 -0
  65. package/skills/plan/stages/ux-flow.md +49 -0
  66. package/skills/review/SKILL.md +104 -0
  67. package/skills/review/stages/compliance.md +48 -0
  68. package/skills/review/stages/handoff.md +59 -0
  69. package/skills/review/stages/performance.md +45 -0
  70. package/skills/review/stages/security.md +49 -0
  71. package/skills/review/stages/self-review.md +41 -0
  72. package/skills/test/SKILL.md +83 -0
  73. package/skills/test/stages/e2e.md +44 -0
  74. package/skills/test/stages/validate.md +51 -0
  75. package/skills/test/stages/verify.md +41 -0
  76. package/skills/wrap-up/SKILL.md +107 -0
@@ -0,0 +1,49 @@
+ ---
+ description: "Plan sub-stage: Define user-facing flows, screens, interactions, edge cases."
+ ---
+
+ # UX Flow
+
+ **Role:** UX Designer
+ **Goal:** Define how the user will experience this feature.
+
+ ## Instructions
+
+ 1. If this feature has no UI changes, output `has_ui_changes: false` and move on
+ 2. Otherwise, define the user flow step by step
+ 3. List screens/pages affected (new or modified)
+ 4. Define interactions (clicks, forms, navigation)
+ 5. Cover edge cases: empty states, loading states, error states
+
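A discriminated union makes the edge-case states in step 5 hard to skip; a minimal TypeScript sketch (illustrative only, not part of this package):

```typescript
// Hypothetical sketch: modeling the edge-case states from step 5 as a
// discriminated union, so every screen handles empty/loading/error explicitly.
type UiState<T> =
  | { kind: "loading" }
  | { kind: "empty" }
  | { kind: "error"; message: string }
  | { kind: "ready"; data: T };

// Exhaustive rendering: the compiler flags any state left unhandled.
function render(state: UiState<string[]>): string {
  switch (state.kind) {
    case "loading": return "Spinner";
    case "empty":   return "No items yet";
    case "error":   return `Error: ${state.message}`;
    case "ready":   return state.data.join(", ");
  }
}

console.log(render({ kind: "empty" }));                   // "No items yet"
console.log(render({ kind: "ready", data: ["a", "b"] })); // "a, b"
```

With this shape, adding a new state makes the type checker flag every renderer that fails to handle it.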
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Plan: UX Flow
+
+ **Has UI Changes:** true/false
+
+ **User Flow:**
+ 1. User navigates to <page>
+ 2. User clicks <element>
+ 3. System shows <response>
+ 4. ...
+
+ **Screens Affected:**
+ - <page/component> — <new or modified> — <what changes>
+
+ **Interactions:**
+ - <element>: <what happens on click/submit/hover>
+
+ **Edge Cases:**
+ - Empty state: <what shows when no data>
+ - Loading state: <what shows during fetch>
+ - Error state: <what shows on failure>
+ - Permissions: <what if user can't access>
+ ```
+
+ ## Rules
+
+ - Skip this entirely if there are zero UI changes (backend-only features)
+ - Think from the user's perspective, not the developer's
+ - Every screen should have empty, loading, and error states defined
+ - Reference existing UI patterns from Investigate findings
@@ -0,0 +1,104 @@
+ ---
+ name: review
+ description: "Run the Review phase — 5 sub-stages: Self-Review, Security, Performance, Compliance, Handoff."
+ user-invocable: false
+ allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
+ ---
+
+ You are the **Senior Reviewer**. Perform multi-dimensional code review before the feature ships.
+
+ ## Sub-stages (in order)
+
+ 1. **Self-Review** — Check your own diff for obvious issues
+ 2. **Security** — OWASP Top 10 security review
+ 3. **Performance** — Query efficiency, bundle size, rendering performance
+ 4. **Compliance** — Compare final code against the Blueprint
+ 5. **Handoff** — Finalize the PR description, flag concerns, make the ship/no-ship decision
+
+ ## Execution
+
+ For each sub-stage:
+ 1. Read the sub-stage file (e.g., `.claude/skills/review/stages/self-review.md`)
+ 2. Follow its instructions — fix issues directly when possible
+ 3. Update `.work-kit/state.md` with findings
+ 4. Proceed to the next sub-stage
+
+ ## Key Principle
+
+ **Fix issues directly when possible.** A review that only lists problems without fixing them is half a review. If you can fix it in under 5 minutes, fix it. If it's bigger, document it for the Handoff decision.
+
+ ## Recording
+
+ Throughout every sub-stage, update the shared state.md sections:
+
+ - **`## Decisions`** — If you make judgment calls during review (e.g., "accepted this deviation because..."), record them.
+ - **`## Deviations`** — The Compliance sub-stage will audit these. If you fix a deviation during review, note that it was resolved.
+
+ Review findings feed directly into the Handoff decision and the final work-kit log.
+
+ ## Context Input
+
+ This phase runs as a **fresh agent** (the orchestrator). Read only these sections from `.work-kit/state.md`:
+ - `### Plan: Final` — Blueprint (for Compliance review)
+ - `### Build: Final` — what was built, PR, deviations
+ - `### Test: Final` — test results, criteria status, confidence
+ - `## Criteria` — acceptance criteria
+
+ ## Parallel Sub-agents
+
+ **Self-Review, Security, Performance, and Compliance** are independent and run as **4 parallel sub-agents**. **Handoff** runs after all 4 complete.
+
+ ```
+ Agent: Self-Review ──┐
+ Agent: Security    ──┤
+ Agent: Performance ──├──→ Agent: Handoff
+ Agent: Compliance  ──┘
+ ```
+
+ Each sub-agent receives:
+ - The git diff (`git diff main...HEAD`)
+ - The relevant Context Input sections
+ - Its sub-stage skill file instructions
+
+ Each writes its own `### Review: <sub-stage>` section to state.md.
+
+ **Handoff agent** reads all 4 review sections + Test: Final → makes the ship decision.
+
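The fan-out/fan-in above can be sketched with `Promise.all`; `runSubAgent` below is a hypothetical stand-in for however sub-agents are actually spawned, not a real work-kit API:

```typescript
// Hypothetical sketch of the 4-way fan-out with a Handoff fan-in.
// `runSubAgent` stands in for whatever actually spawns a sub-agent.
type Review = { stage: string; findings: string[] };

async function runSubAgent(stage: string): Promise<Review> {
  return { stage, findings: [] }; // placeholder: real review work happens here
}

async function reviewPhase(): Promise<string> {
  // Self-Review, Security, Performance, Compliance are independent → run in parallel.
  const reviews = await Promise.all(
    ["self-review", "security", "performance", "compliance"].map((s) => runSubAgent(s)),
  );
  // Handoff only starts once all four review sections exist.
  const handoff = await runSubAgent("handoff");
  return `${reviews.length} reviews in, decision by ${handoff.stage}`;
}

reviewPhase().then(console.log); // "4 reviews in, decision by handoff"
```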
+ ## Final Output
+
+ After Handoff completes, append a `### Review: Final` section to state.md. This is what **Deploy and Wrap-up read**.
+
+ ```markdown
+ ### Review: Final
+
+ **Decision:** approved | changes_requested | rejected
+
+ **Summary:** <1-2 sentences — overall assessment>
+
+ **Issues found:** <total count>
+ **Issues fixed:** <count>
+ **Remaining concerns:**
+ - <concern — or "None">
+
+ **Security:** <clear | risks noted>
+ **Performance:** <clear | issues noted>
+ **Compliance:** <compliant | deviations noted>
+
+ **If changes_requested:**
+ - <specific change 1>
+ - <specific change 2>
+
+ **If rejected:**
+ - <reason>
+ ```
+
+ Then:
+ - Update state: `**Phase:** review (complete)`
+ - Commit state: `git add .work-kit/ && git commit -m "work-kit: complete review"`
+
+ ## Routing
+
+ The Handoff decision determines what happens next:
+ - **approved** → proceed to Deploy (or complete if deploy is skipped)
+ - **changes_requested** → loop back to Build/Core with a specific change list
+ - **rejected** → stop work, explain why to the user
@@ -0,0 +1,48 @@
+ ---
+ description: "Review sub-stage: Compare final code against Blueprint."
+ ---
+
+ # Compliance Review
+
+ **Role:** Compliance Auditor
+ **Goal:** Verify the implementation matches what was planned.
+
+ ## Instructions
+
+ 1. Re-read the Blueprint from `.work-kit/state.md`
+ 2. Compare final code against each Blueprint step:
+    - Was the step implemented?
+    - Does it match the specified approach?
+    - Any deviations?
+ 3. Check:
+    - All Blueprint steps are implemented
+    - No scope creep (things built that weren't in the Blueprint)
+    - Architecture matches the plan (data model, API surface, components)
+    - UX Flow matches the plan (screens, interactions, states)
+ 4. Document any deviations with justification
+
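Steps 2–3 amount to set bookkeeping over planned vs. implemented work; a hypothetical sketch (the step names and `audit` helper are illustrative, not part of this package):

```typescript
// Hypothetical sketch: classify Blueprint steps against what was implemented,
// and flag scope creep (implemented work that was never planned).
function audit(planned: string[], implemented: string[]) {
  const done = new Set(implemented);
  const steps = planned.map(
    (s) => `${s}: ${done.has(s) ? "implemented" : "missing"}`,
  );
  const plannedSet = new Set(planned);
  const scopeCreep = implemented.filter((s) => !plannedSet.has(s));
  return { steps, scopeCreep };
}

const report = audit(["add model", "add API"], ["add model", "add caching"]);
// report.steps → ["add model: implemented", "add API: missing"]
// report.scopeCreep → ["add caching"]
```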
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Review: Compliance
+
+ **Result:** compliant | deviations_found
+
+ **Blueprint Steps:**
+ - Step 1: <implemented | deviated | missing>
+ - Step 2: <implemented | deviated | missing>
+ - ...
+
+ **Deviations:**
+ - <deviation and justification — or "None">
+
+ **Scope Creep:**
+ - <anything built that wasn't planned — or "None">
+ ```
+
+ ## Rules
+
+ - Deviations aren't always bad — sometimes the plan was wrong and the code adapted
+ - But deviations need justification — "I felt like it" is not acceptable
+ - Missing steps are a red flag — they need to be implemented or explicitly dropped with a reason
+ - Scope creep should be called out even if the extra code is good
@@ -0,0 +1,59 @@
+ ---
+ description: "Review sub-stage: Finalize PR, make ship/no-ship decision."
+ ---
+
+ # Handoff
+
+ **Role:** Ship Decision Maker
+ **Goal:** Final PR polish and go/no-go decision.
+
+ ## Instructions
+
+ 1. Update the PR description with:
+    - Summary of what was built
+    - How to test it
+    - Screenshots if applicable
+    - Any concerns or known limitations
+ 2. Review all findings from prior review sub-stages
+ 3. Check acceptance criteria status from Test/Validate
+ 4. Make the decision: **approved**, **changes_requested**, or **rejected**
+
+ ## Decision Criteria
+
+ - **approved**: All criteria met, no critical/high security issues, tests pass, compliance is acceptable
+ - **changes_requested**: Gaps exist but are fixable — specify exactly what needs to change
+ - **rejected**: Fundamental problems that require rethinking the approach
+
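The three decision criteria read as a small pure function; a hypothetical sketch (the field names are illustrative, not the actual work-kit state schema):

```typescript
// Hypothetical sketch of the decision criteria above as a pure function.
type Findings = {
  criteriaMet: boolean;
  criticalSecurityIssues: number;
  testsPass: boolean;
  fixable: boolean; // are the remaining gaps addressable with specific changes?
};

function decide(f: Findings): "approved" | "changes_requested" | "rejected" {
  if (f.criteriaMet && f.criticalSecurityIssues === 0 && f.testsPass) {
    return "approved";
  }
  return f.fixable ? "changes_requested" : "rejected";
}

console.log(decide({ criteriaMet: true, criticalSecurityIssues: 0, testsPass: true, fixable: true }));
// "approved"
```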
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Review: Handoff
+
+ **PR Description:** updated | already adequate
+ **Summary:** <1-2 sentences: what ships and its state>
+
+ **Concerns:**
+ - <any remaining concerns — or "None">
+
+ **Decision:** approved | changes_requested | rejected
+
+ **If changes_requested:**
+ - <specific change 1>
+ - <specific change 2>
+
+ **If rejected:**
+ - <reason and recommended next step>
+ ```
+
+ ## Outcome Routing
+
+ - **approved** → Proceed to the Deploy phase (or complete if deploy is skipped)
+ - **changes_requested** → Loop back to Build/Core with the change list as context
+ - **rejected** → Stop. Report to the user with an explanation.
+
+ ## Rules
+
+ - Be specific about what needs to change — "needs work" is useless feedback
+ - Don't block on cosmetic issues — fix them directly before finalizing
+ - The PR should be ready for a human reviewer after this step
+ - If you're unsure between approved and changes_requested, ask the user
@@ -0,0 +1,45 @@
+ ---
+ description: "Review sub-stage: Check for performance issues."
+ ---
+
+ # Performance Review
+
+ **Role:** Performance Engineer
+ **Goal:** Catch performance problems before they reach production.
+
+ ## Instructions
+
+ Check the diff for:
+
+ 1. **N+1 Queries** — Loops that make individual DB queries instead of batching
+ 2. **Missing Indexes** — New query patterns that need DB indexes
+ 3. **Large Bundle Imports** — Importing entire libraries when only one function is needed
+ 4. **Unnecessary Re-renders** — Components re-rendering when they shouldn't
+ 5. **Missing Memoization** — Expensive computations without caching
+ 6. **Hot Path Operations** — Heavy work in frequently called code paths
+ 7. **Missing Pagination** — Unbounded queries or list renders
+ 8. **Memory Leaks** — Event listeners, intervals, or subscriptions not cleaned up
+
+ Fix what you can. Document what needs deeper investigation.
+
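Item 1 is the most common finding; a hedged sketch of the fix, with `db` as a stand-in for a real database client (not an API from this package):

```typescript
// Hypothetical sketch of the N+1 fix from item 1: one batched lookup
// instead of one query per id. `db` is a stand-in, not a real client.
const db = {
  calls: 0,
  async byIds(ids: number[]): Promise<string[]> {
    this.calls++; // e.g. SELECT ... WHERE id IN (...)
    return ids.map((id) => `user-${id}`);
  },
};

async function loadUsers(ids: number[]): Promise<string[]> {
  // N+1 version would be: for (const id of ids) await db.byIds([id]);
  return db.byIds(ids); // single round trip regardless of ids.length
}

loadUsers([1, 2, 3]).then((users) => {
  console.log(users.length, db.calls); // 3 users, 1 query
});
```

The batched version keeps query count constant as the data grows, which is exactly the "scales with data" property the rules below tell you to flag.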
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Review: Performance
+
+ **Findings:**
+ - <finding — or "None">
+
+ **Fixes Applied:**
+ - <what was fixed — or "None">
+
+ **Recommendations:**
+ - <suggestions for future optimization — or "None">
+ ```
+
+ ## Rules
+
+ - Focus on actual problems, not theoretical ones
+ - Don't prematurely optimize code that runs once on page load
+ - DO flag anything that scales with data (queries, list renders, loops)
+ - If you add an index, include it in the migration or note it as needed
@@ -0,0 +1,49 @@
+ ---
+ description: "Review sub-stage: OWASP Top 10 security review."
+ ---
+
+ # Security Review
+
+ **Role:** Security Auditor
+ **Goal:** Check for common security vulnerabilities.
+
+ ## Instructions
+
+ Review the diff against the OWASP Top 10:
+
+ 1. **Injection** — SQL injection, command injection, code injection. Are all inputs parameterized?
+ 2. **Broken Auth** — Are auth checks in place where needed? Is session handling correct?
+ 3. **Sensitive Data Exposure** — Are secrets, tokens, and PII handled safely? No logging of sensitive data?
+ 4. **Broken Access Control** — Can users access resources they shouldn't?
+ 5. **Security Misconfiguration** — Default configs, unnecessary features enabled, overly permissive CORS?
+ 6. **XSS** — User input rendered without sanitization? Raw HTML injection vectors?
+ 7. **Insecure Deserialization** — Untrusted data parsed without validation?
+ 8. **Vulnerable Dependencies** — Known CVEs in new dependencies?
+ 9. **Insufficient Logging** — Are security events logged, with no sensitive data in the logs?
+ 10. **CSRF** — Are state-changing requests protected?
+
+ Fix issues directly when possible. Document what you can't fix.
+
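For check 6, the core defense is escaping user input before it reaches HTML; a minimal sketch (real code should prefer the framework's auto-escaping or a vetted sanitizer over hand-rolled escaping like this):

```typescript
// Hypothetical sketch for check 6 (XSS): escape user input before it is
// interpolated into HTML. Minimal on purpose; prefer a vetted sanitizer.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;") // must run first, before entities are introduced
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Unsafe: `<div>${userBio}</div>` with raw input lets <script> through.
const userBio = '<script>alert("xss")</script>';
console.log(escapeHtml(userBio));
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```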
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Review: Security
+
+ **Findings:**
+ - <finding with severity: critical/high/medium/low — or "None">
+
+ **Fixes Applied:**
+ - <what was fixed — or "None">
+
+ **Remaining Risks:**
+ - <risks that need human attention — or "None">
+
+ **Severity Summary:** no issues | low | medium | high | critical
+ ```
+
+ ## Rules
+
+ - Focus on code YOU wrote/modified — not the entire codebase
+ - Not every feature touches all 10 categories — skip irrelevant ones
+ - Don't add security theater (unnecessary complexity for non-existent threats)
+ - If you find a critical issue, fix it immediately and note it prominently
@@ -0,0 +1,41 @@
+ ---
+ description: "Review sub-stage: Review your own diff for obvious issues."
+ ---
+
+ # Self-Review
+
+ **Role:** Self-Critical Developer
+ **Goal:** Catch the easy stuff before the formal review.
+
+ ## Instructions
+
+ 1. Run `git diff main...HEAD` to see the full diff
+ 2. Check for:
+    - Dead code or unused imports
+    - Unclear variable/function naming
+    - Missing or misleading comments
+    - Copy-paste errors
+    - Formatting issues (run the linter)
+    - TODOs that should be resolved
+    - `console.log` calls or debug code left in
+    - Code that doesn't match the Blueprint
+ 3. Fix issues directly — don't just list them
+ 4. Run tests after fixes to confirm nothing broke
+
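The step-2 sweep for leftover debug code can be partially automated; a hypothetical sketch (the patterns are illustrative, extend them per project conventions):

```typescript
// Hypothetical sketch: flag leftover debug code in added diff lines (step 2).
const DEBUG_PATTERNS = [/\bconsole\.log\(/, /\bdebugger\b/, /\bTODO\b/];

function findDebugLines(diff: string): string[] {
  return diff
    .split("\n")
    .filter((line) => line.startsWith("+")) // only lines this diff added
    .filter((line) => DEBUG_PATTERNS.some((p) => p.test(line)));
}

const diff = '+ const x = 1;\n+ console.log("here");\n- debugger;';
console.log(findDebugLines(diff)); // only the added console.log line
```

A scan like this catches the mechanical cases; naming, comments, and Blueprint drift still need a human read of the diff.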
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Review: Self-Review
+
+ **Issues Found:** <N>
+ **Issues Fixed:** <M>
+ **Remaining Concerns:**
+ - <anything you found but couldn't fix — or "None">
+ ```
+
+ ## Rules
+
+ - Run the linter and fix all warnings
+ - Remove ALL debug code (console.log, debugger statements, etc.)
+ - This is about catching careless mistakes, not redesigning the architecture
+ - Be honest — pretending your code is perfect helps no one
@@ -0,0 +1,83 @@
+ ---
+ name: test
+ description: "Run the Test phase — 3 sub-stages: Verify, E2E, Validate."
+ user-invocable: false
+ allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
+ ---
+
+ You are the **QA Lead**. Validate the implementation against the Blueprint and acceptance criteria.
+
+ ## Sub-stages (in order)
+
+ 1. **Verify** — Run the existing test suite, check for regressions
+ 2. **E2E** — Test user flows end-to-end
+ 3. **Validate** — Verify every acceptance criterion is satisfied
+
+ ## Execution
+
+ For each sub-stage:
+ 1. Read the sub-stage file (e.g., `.claude/skills/test/stages/verify.md`)
+ 2. Follow its instructions
+ 3. Update `.work-kit/state.md` with outputs
+ 4. Proceed to the next sub-stage
+
+ ## Key Principle
+
+ **Test against the Blueprint, not just the code.** The Blueprint defined what should be built. Verify that what was built matches what was planned, and that it actually works.
+
+ ## Recording
+
+ Throughout every sub-stage, update the shared state.md sections:
+
+ - **`## Criteria`** — Check off criteria as they're verified. Add evidence inline: `- [x] <criterion> — verified by <test name / screenshot / manual check>`.
+ - **`## Decisions`** — If you discover a criterion is untestable or needs reinterpretation, record the decision and why.
+
+ The criteria checklist is copied directly into the final work-kit log. Make it accurate.
+
+ ## Context Input
+
+ This phase runs as a **fresh agent**. Read only these sections from `.work-kit/state.md`:
+ - `### Build: Final` — what was built, PR, test status, known issues
+ - `### Plan: Final` — Blueprint (to test against) and Architecture
+ - `## Criteria` — acceptance criteria to validate
+
+ ## Parallel Sub-agents
+
+ **Verify** and **E2E** are independent and can run as **parallel sub-agents**. **Validate** runs after both complete (it needs their results).
+
+ ```
+ Agent: Verify ──┐
+                 ├──→ Agent: Validate
+ Agent: E2E    ──┘
+ ```
+
+ Each sub-agent reads the same Context Input sections and writes its own `### Test: <sub-stage>` section to state.md.
+
+ ## Final Output
+
+ After all sub-stages are done, append a `### Test: Final` section to state.md. This is what **Review agents read**.
+
+ ```markdown
+ ### Test: Final
+
+ **Suite status:** all passing | <N> failures
+ **Total tests:** <count> (passing: <N>, failing: <N>)
+
+ **Criteria status:**
+ - Satisfied: <N> / <total>
+ - Gaps: <list — or "None">
+
+ **Confidence:** high | medium | low
+
+ **E2E results:**
+ - <flow>: pass | fail
+ - ...
+
+ **Evidence summary:**
+ - <criterion> — <evidence type and location>
+ - ...
+ ```
+
+ Then:
+ - Update state: `**Phase:** test (complete)`
+ - Commit state: `git add .work-kit/ && git commit -m "work-kit: complete test"`
@@ -0,0 +1,44 @@
+ ---
+ description: "Test sub-stage: Test user flows end-to-end."
+ ---
+
+ # E2E
+
+ **Role:** End-to-End Tester
+ **Goal:** Test the feature as a user would experience it.
+
+ ## Instructions
+
+ 1. Review the UX Flow from the Plan phase
+ 2. For each user flow defined:
+    - Write an E2E test (Playwright, Cypress, or manual verification)
+    - Test the happy path
+    - Test key edge cases (empty state, error state, boundary values)
+ 3. Take screenshots at key states if the test framework supports it
+ 4. Focus on the most important flows — don't test every permutation
+
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Test: E2E
+
+ **Tests Written:**
+ - `<test file>`: <flow description>
+
+ **Flows Verified:**
+ - <flow 1>: pass | fail (<details>)
+ - <flow 2>: pass | fail (<details>)
+
+ **Screenshots:**
+ - <description>: <path or "not applicable">
+
+ **Notes:**
+ - <edge cases tested, issues found>
+ ```
+
+ ## Rules
+
+ - If the project has no E2E framework, test manually and document the steps
+ - Focus on user-visible behavior, not internal implementation
+ - Screenshots are evidence — capture them for key states
+ - If a flow fails, fix the implementation (not the test) unless the test expectation is wrong
@@ -0,0 +1,51 @@
+ ---
+ description: "Test sub-stage: Verify every acceptance criterion is satisfied with evidence."
+ ---
+
+ # Validate
+
+ **Role:** Acceptance Validator
+ **Goal:** Confirm every acceptance criterion from Clarify is met.
+
+ ## Instructions
+
+ 1. Read the `## Criteria` section from `.work-kit/state.md`
+ 2. For each criterion:
+    - Determine if it's been satisfied by the implementation
+    - Identify the evidence (passing test, screenshot, code reference)
+    - Mark it as checked `[x]` or unchecked `[ ]` with a note
+ 3. Assess overall confidence: high | medium | low
+
+ ## Output (append to state.md)
+
+ Update the `## Criteria` section — check off satisfied criteria with evidence:
+
+ ```markdown
+ ## Criteria
+ - [x] User can upload an avatar image — tested in `avatar.test.ts:upload`
+ - [x] Fallback to initials when no avatar — screenshot: empty-state.png
+ - [ ] Avatar displays at 32px, 48px, and 96px sizes — 32px and 48px verified, 96px not tested
+ ```
+
+ Also append:
+
+ ```markdown
+ ### Test: Validate
+
+ **Criteria Status:**
+ - Satisfied: <N> / <total>
+ - Gaps: <list of unsatisfied criteria>
+
+ **Confidence:** high | medium | low
+
+ **Gap Details:**
+ - "<unsatisfied criterion>" — <why it's not met and what's needed>
+ ```
+
+ ## Rules
+
+ - Every criterion needs evidence, not just "I think it works"
+ - Be honest about gaps — hiding them here means Review catches them later (or worse, they ship)
+ - If a criterion is genuinely not testable, explain why
+ - Low confidence should trigger concern in the Review phase
+ - Criteria should not change during Test — if a new criterion is discovered, note it but don't add it to the checklist mid-test
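The `Satisfied: <N> / <total>` line follows mechanically from the checklist; a hypothetical sketch of the tally (`tallyCriteria` is illustrative, not a work-kit function):

```typescript
// Hypothetical sketch: tally a `## Criteria` checklist into the
// "Satisfied: N / total" numbers used by Test: Validate.
function tallyCriteria(section: string): { satisfied: number; total: number } {
  const items = section
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^- \[[ x]\]/.test(line));
  const satisfied = items.filter((line) => line.startsWith("- [x]")).length;
  return { satisfied, total: items.length };
}

const section = [
  "## Criteria",
  "- [x] User can upload an avatar image — tested in `avatar.test.ts:upload`",
  "- [ ] Avatar displays at 96px — not tested",
].join("\n");

console.log(tallyCriteria(section)); // { satisfied: 1, total: 2 }
```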
@@ -0,0 +1,41 @@
+ ---
+ description: "Test sub-stage: Run existing test suite, check for regressions."
+ ---
+
+ # Verify
+
+ **Role:** Regression Tester
+ **Goal:** Ensure nothing is broken — both new and existing tests pass.
+
+ ## Instructions
+
+ 1. Run the full test suite
+ 2. Check results:
+    - All new tests (from Build/Red): should pass
+    - All pre-existing tests: should still pass
+ 3. If any test fails:
+    - Determine if it's a regression (an existing test broke) or a new failure
+    - Fix regressions immediately — don't skip or disable tests
+    - For new test failures, investigate and fix the implementation
+ 4. Run the suite again after fixes to confirm a clean pass
+
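Step 3's regression-vs-new-failure split is a set comparison against the baseline suite; a hypothetical sketch (names are illustrative):

```typescript
// Hypothetical sketch of step 3: split failures into regressions
// (tests that passed on main) vs. failures in newly added tests.
function classifyFailures(
  baselinePassing: Set<string>,
  failing: string[],
): { regressions: string[]; newFailures: string[] } {
  return {
    regressions: failing.filter((t) => baselinePassing.has(t)),
    newFailures: failing.filter((t) => !baselinePassing.has(t)),
  };
}

const baseline = new Set(["auth.test", "user.test"]);
const result = classifyFailures(baseline, ["auth.test", "avatar.test"]);
// result.regressions → ["auth.test"]   (fix immediately)
// result.newFailures → ["avatar.test"] (fix the implementation)
```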
+ ## Output (append to state.md)
+
+ ```markdown
+ ### Test: Verify
+
+ **Suite Result:** pass | fail
+ **Total Tests:** <N> passing, <M> failing
+ **Regressions Found:**
+ - <test name> — <what broke and fix applied — or "None">
+
+ **Fixes Applied:**
+ - <description — or "None">
+ ```
+
+ ## Rules
+
+ - Do NOT skip failing tests — fix them
+ - Do NOT disable tests to make the suite pass
+ - If a pre-existing test fails and it's a legitimate behavior change, update the test with a comment explaining why
+ - Run the suite at least twice — once to find issues, once to confirm fixes