work-kit-cli 0.4.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +8 -1
- package/cli/src/commands/bootstrap.test.ts +1 -1
- package/cli/src/commands/bootstrap.ts +22 -14
- package/cli/src/commands/complete.ts +76 -2
- package/cli/src/commands/doctor.ts +51 -2
- package/cli/src/commands/extract.ts +30 -18
- package/cli/src/commands/init.test.ts +3 -1
- package/cli/src/commands/init.ts +22 -15
- package/cli/src/commands/learn.test.ts +29 -2
- package/cli/src/commands/learn.ts +2 -1
- package/cli/src/commands/setup.ts +17 -1
- package/cli/src/config/agent-map.ts +10 -2
- package/cli/src/config/constants.ts +7 -0
- package/cli/src/config/loopback-routes.ts +6 -0
- package/cli/src/config/model-routing.ts +7 -1
- package/cli/src/config/workflow.ts +12 -6
- package/cli/src/index.ts +2 -2
- package/cli/src/state/helpers.test.ts +1 -1
- package/cli/src/state/schema.ts +11 -4
- package/cli/src/state/validators.test.ts +21 -2
- package/cli/src/state/validators.ts +2 -2
- package/cli/src/utils/knowledge.ts +7 -1
- package/cli/src/workflow/gates.ts +1 -0
- package/cli/src/workflow/parallel.ts +6 -1
- package/cli/src/workflow/transitions.test.ts +2 -2
- package/package.json +2 -2
- package/skills/auto-kit/SKILL.md +8 -1
- package/skills/full-kit/SKILL.md +14 -7
- package/skills/wk-bootstrap/SKILL.md +8 -0
- package/skills/wk-debug/SKILL.md +127 -0
- package/skills/wk-define/SKILL.md +87 -0
- package/skills/wk-define/steps/refine.md +71 -0
- package/skills/wk-define/steps/spec.md +70 -0
- package/skills/wk-plan/steps/architecture.md +16 -0
- package/skills/wk-test/steps/browser.md +92 -0
- package/skills/wk-test/steps/e2e.md +45 -23
- package/skills/wk-wrap-up/SKILL.md +1 -1
- package/skills/wk-wrap-up/steps/knowledge.md +9 -4
package/skills/wk-define/SKILL.md
@@ -0,0 +1,87 @@
+---
+name: define
+description: "Run the Define phase — 2 steps that turn a vague idea into a concrete spec before Plan starts."
+user-invocable: false
+allowed-tools: Bash, Read, Write, Edit, Glob, Grep
+---
+
+You are the **Definition Lead**. Two focused steps turn the user's raw description into a tightened problem statement and a lightweight spec that the Plan phase can clarify against.
+
+## Steps (in order)
+
+1. **Refine** — Surface ambiguity in the request, propose 1–3 concrete framings, get the user to pick or correct one. Output: a tightened problem statement.
+2. **Spec** — Turn the tightened statement into a minimal PRD: goal, non-goals, users, success signal, constraints. Not architecture (that's Plan).
+
+## Execution
+
+For each step:
+1. Read the step file (e.g., `.claude/skills/wk-define/steps/refine.md`)
+2. Follow its instructions completely
+3. Write outputs to `.work-kit/state.md` under a section for that step
+4. Update `**Phase:** define` and `**Step:** <current>` in state.md
+5. Proceed to the next step
+
+## When this phase runs
+
+- **full-kit**: always (it's the first phase)
+- **auto-kit**: only for `feature` and `large-feature` classifications. Bug fixes, small changes, and refactors skip Define entirely — the user already knows what they want.
+
+## Recording
+
+Define is short, but you should still capture:
+
+- **`## Decisions`** — when you choose between framings, append: `**<context>**: chose <X> over <Y> — <why>`. The Decision capture in Refine is the most important — it locks in which interpretation the rest of the pipeline runs against.
+- **`## Observations`** — typed bullets under `[lesson|convention|risk|workflow]` if you notice something worth preserving.
+
+## Loop-back
+
+If **Spec** uncovers ambiguity that should have been caught in Refine:
+- Report outcome `revise` on Spec
+- The orchestrator loops back to Refine
+- Re-run Refine with the new ambiguity in mind
+- Then re-run Spec
+- Max 2 iterations
+
+## Final Output
+
+After both steps are done, append a `### Define: Final` section to state.md. This is what the Plan phase reads.
+
+```markdown
+### Define: Final
+
+**Verdict:** ready
+
+**Problem:** <tightened one-sentence problem statement from Refine>
+
+**Spec:**
+- **Goal:** <what success looks like, one sentence>
+- **Non-goals:** <bullets — what we are explicitly NOT doing>
+- **Users:** <who this is for>
+- **Success signal:** <how we know it worked — observable, ideally measurable>
+- **Constraints:** <bullets — hard limits, deadlines, dependencies>
+
+**Open Questions for Plan:**
+- <questions Define couldn't resolve, to be picked up in Plan/Clarify — empty if none>
+```
+
+Then:
+- Update state: `**Phase:** define (complete)`
+- Commit state: `git add .work-kit/ && git commit -m "work-kit: complete define"`
+
+## Boundaries
+
+### Always
+- Ask the user when the request has multiple plausible interpretations
+- Keep the spec to one screen — Define is *cheap*, not exhaustive
+- Cross every framing decision into `## Decisions` so wrap-up can graduate it to the knowledge layer
+- Stop when you've tightened the ask, NOT when you've designed it
+
+### Ask First
+- Choosing between framings (Refine must surface options to the user, not pick silently)
+- Adding non-goals the user did not mention (confirm before locking them out)
+
+### Never
+- Propose architecture, file paths, or implementation strategies — that is Plan's job
+- Investigate the codebase — Plan/Investigate handles that
+- Skip Refine because "the request is clear" — if it were, the user could have written the spec themselves
+- Inflate the spec to look thorough. Define exists to *narrow*, not to demonstrate effort
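The Execution list above has the agent rewrite two marker lines in `.work-kit/state.md`. As a hedged illustration (this is not work-kit's actual code, and the function name is invented), the marker update of step 4 could look like:

```typescript
// Illustrative only: rewrite the `**Phase:**` and `**Step:**` marker lines
// that step 4 of the Execution list updates in .work-kit/state.md.
function updateStateMarkers(stateMd: string, phase: string, step: string): string {
  return stateMd
    .replace(/^\*\*Phase:\*\* .*$/m, `**Phase:** ${phase}`) // replace the whole Phase line
    .replace(/^\*\*Step:\*\* .*$/m, `**Step:** ${step}`);   // replace the whole Step line
}
```

The `m` flag anchors `^`/`$` per line, so only the marker lines are touched and the rest of state.md passes through unchanged.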
package/skills/wk-define/steps/refine.md
@@ -0,0 +1,71 @@
+---
+description: "Define step: Surface ambiguity in the request, propose framings, get user to pick one."
+---
+
+# Refine
+
+**Role:** Idea Refiner
+**Goal:** Take the user's raw description and turn it into a tightened, unambiguous problem statement. If the request is fuzzy, surface that — don't paper over it.
+
+## Instructions
+
+1. **Read the request** in `.work-kit/state.md` under `## Description`.
+
+2. **Identify ambiguities.** Look for:
+   - Vague nouns ("the system", "users", "better")
+   - Implicit assumptions ("obviously", "just", "simple")
+   - Missing actors (who triggers it? who benefits?)
+   - Missing scope (one screen? whole feature? whole product area?)
+   - Conflicting framings (could mean A or B and they lead to different builds)
+
+3. **Generate 1–3 framings.** A framing is a one-sentence interpretation of what the user *probably* means. If the request is genuinely clear, generate ONE framing and confirm it. If it's ambiguous, generate 2–3 distinct framings that lead to materially different builds.
+
+4. **Ask the user to pick** one framing or correct all of them. **You must wait for the answer.** Do not silently pick.
+
+5. **Write the tightened problem statement** based on the chosen framing. One sentence. Concrete nouns, named actors, observable outcome.
+
+## Output (append to state.md)
+
+```markdown
+### Define: Refine
+
+**Original Request:**
+<copy of the relevant lines from ## Description>
+
+**Ambiguities Found:**
+- <ambiguity 1 — what's vague and why it matters>
+- <ambiguity 2>
+- <... or "None — request is clear">
+
+**Framings Considered:**
+1. <framing A — one sentence>
+2. <framing B — one sentence>
+3. <framing C — one sentence — only if needed>
+
+**Chosen Framing:** <A | B | C> (confirmed by user)
+
+**Tightened Problem Statement:**
+<one concrete sentence — this is what Spec will work from>
+```
+
+Also append to `## Decisions`:
+
+```markdown
+- **Framing**: chose <A> over <B/C> — <why, in user's words if they explained>
+```
+
+## Rules
+
+- ONE step, ONE deliverable: a tightened problem statement.
+- You **must** ask the user to pick when there are multiple framings. Picking silently violates the "Ask, don't decide" principle.
+- Refine is allowed to be 5 minutes of work. If it takes longer, you're doing Plan's job.
+- Do NOT read code. Do NOT propose solutions. Do NOT design anything.
+
+## Anti-Rationalization
+
+| Excuse | Reality |
+|--------|---------|
+| "The request is obvious — no need to ask" | If it were obvious, the user wouldn't need a Define phase. The whole point of Refine is to catch the assumption *you* are about to make. Surface it. |
+| "I'll pick the most likely framing and the user can correct me" | That wastes the next 6 steps if you guessed wrong. One question now saves an entire Plan phase. |
+| "Generating multiple framings feels artificial" | If you can only think of one framing, generate one and confirm it. The point is *finding* alternatives, not faking them. |
+| "I'll combine all framings into one comprehensive scope" | Combining framings produces a vague spec that means nothing. Pick one. The others go in non-goals if they came close. |
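Step 2's ambiguity checklist can be pictured as a crude keyword scan. This is purely illustrative (the skill expects agent judgment, not a script), and the signal list is just the examples quoted in the step:

```typescript
// Toy sketch of the step-2 ambiguity checklist. The phrase list is only the
// examples the skill itself quotes; a real agent reasons, it doesn't grep.
const VAGUE_SIGNALS = ["the system", "better", "obviously", "just", "simple"];

function flagAmbiguitySignals(request: string): string[] {
  const lower = request.toLowerCase();
  // return every listed signal phrase that appears in the request
  return VAGUE_SIGNALS.filter((signal) => lower.includes(signal));
}
```

A request like "Just make the system better" would trip three signals, while a concrete ask trips none, which is roughly the judgment Refine is asked to make.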
package/skills/wk-define/steps/spec.md
@@ -0,0 +1,70 @@
+---
+description: "Define step: Turn the tightened problem statement into a minimal PRD."
+---
+
+# Spec
+
+**Role:** Mini-PRD Author
+**Goal:** Convert the tightened problem statement from Refine into a minimal spec the Plan phase can clarify against. Five sections, no more.
+
+## Instructions
+
+1. **Read** `### Define: Refine` from `.work-kit/state.md`. The "Tightened Problem Statement" is your starting point.
+
+2. **Write the spec.** Five sections, each as short as possible while still being concrete:
+
+   - **Goal** — one sentence. What does success look like? Should be observable.
+   - **Non-goals** — bullets. What are we *explicitly not* doing? This is where you put framings that came close in Refine but lost.
+   - **Users** — who is this for? Be specific. "Users" is not specific. "Logged-in admins viewing the team settings page" is.
+   - **Success signal** — how do we know it worked? Ideally measurable. If you can't measure it, describe what an observer would see.
+   - **Constraints** — bullets. Hard limits: deadlines, dependencies, must-not-touch areas, budget, compatibility.
+
+3. **Audit your spec.** Read it back and ask:
+   - Is the Goal a single sentence? (If not, you have multiple goals — pick one or escalate.)
+   - Are non-goals concrete things, not platitudes? ("Not over-engineering" is not a non-goal.)
+   - Could you build the wrong thing if you only read this spec? If yes, sharpen it.
+   - Did Refine leave any ambiguity that resurfaced here? If so, report `revise` and loop back.
+
+4. **Decide outcome:**
+   - `done` — spec is solid, Plan can take it from here
+   - `revise` — Refine missed something, loop back
+
+## Output (append to state.md)
+
+```markdown
+### Define: Spec
+
+**Goal:** <one observable sentence>
+
+**Non-goals:**
+- <thing we are explicitly not doing>
+- <another thing we are explicitly not doing>
+
+**Users:** <specific actor in a specific context>
+
+**Success signal:** <what an observer would see, ideally measurable>
+
+**Constraints:**
+- <hard limit 1>
+- <hard limit 2>
+
+**Open Questions for Plan:**
+- <ambiguity that survived Define and should be picked up in Plan/Clarify — empty if none>
+```
+
+## Rules
+
+- Each section is **at most 3 lines**. If you can't fit it, you're designing, not specifying.
+- The spec is for the Plan phase, not the user. The user already approved the framing in Refine.
+- Do NOT propose architecture, file paths, libraries, or APIs. None of that.
+- Do NOT add acceptance criteria here — Plan/Clarify owns those.
+- If you find yourself needing more than 5 sections, you're building a PRD instead of a spec. Stop.
+
+## Anti-Rationalization
+
+| Excuse | Reality |
+|--------|---------|
+| "I should add an Architecture section to be helpful" | Plan does architecture. Adding it here pollutes the boundary and means Plan re-litigates a decision Define wasn't equipped to make. |
+| "Non-goals feel like nitpicks, I'll skip them" | Non-goals are the most valuable section. They prevent scope creep in Build by stating up front what we're *not* doing. |
+| "The success signal is 'it works'" | "It works" is not a signal. If you can't describe what an observer would see, you don't know what success means yet — go back to Refine. |
+| "Constraints can be discovered during Plan" | Some can. The hard ones (deadlines, must-not-touch areas, regulatory) need to be locked in Define so Plan doesn't waste effort on impossible designs. |
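The first audit question in step 3 (is the Goal a single sentence?) can be sketched mechanically. A toy check with an invented function name, assuming sentence terminators approximate sentence count:

```typescript
// Toy illustration of the step-3 audit, not part of work-kit: a Goal with
// more than one sentence terminator is probably multiple goals in disguise.
function goalIsSingleSentence(goal: string): boolean {
  // count '.', '!' or '?' followed by whitespace or end-of-string
  const terminators = goal.trim().match(/[.!?](?=\s|$)/g) ?? [];
  return terminators.length <= 1;
}
```

"Admins can export invoices as CSV." passes; "Export CSV. Also redesign billing." fails, which is exactly the "pick one or escalate" case the audit names.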
package/skills/wk-plan/steps/architecture.md
@@ -51,3 +51,19 @@ description: "Plan step: Define technical design — data model, API surface, co
 - Include TypeScript types/interfaces that will be needed
 - If you need a migration, specify the exact columns and types
 - This is the technical contract — Blueprint will reference it directly
+
+## Recording Decisions
+
+When you choose between real alternatives in this step (a library, a data shape, a layering choice), append a bullet to `## Decisions` in state.md using **this exact shape**:
+
+```
+- **<context>**: chose <X> over <Y> — <one-sentence why>
+```
+
+Examples:
+```
+- **State store**: chose Zustand over Redux Toolkit — smaller surface, no boilerplate, project already uses it elsewhere.
+- **ID generation**: chose ULID over UUID v4 — sortable, K-ordered for index locality.
+```
+
+`work-kit extract` (run during `wrap-up/knowledge`) auto-graduates these into `.work-kit-knowledge/decisions.md` so the rationale survives the session. Bullets that don't match this shape are skipped silently — they stay scratch.
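The `**<context>**: chose <X> over <Y> — <why>` shape above is regular enough to match mechanically. A hypothetical sketch of such a matcher (work-kit's real parser may differ):

```typescript
// Hypothetical sketch, not work-kit's actual parser.
// Matches:  - **<context>**: chose <X> over <Y> — <why>
const DECISION_RE = /^-\s+\*\*(.+?)\*\*:\s*chose\s+(.+?)\s+over\s+(.+?)\s+—\s+(.+)$/;

interface Decision {
  context: string;
  chosen: string;
  rejected: string;
  why: string;
}

function parseDecision(line: string): Decision | null {
  const m = DECISION_RE.exec(line.trim());
  if (!m) return null; // non-matching bullets stay scratch, as the text says
  return { context: m[1], chosen: m[2], rejected: m[3], why: m[4] };
}
```

Returning `null` for anything off-shape mirrors the "skipped silently" behavior described above.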
package/skills/wk-test/steps/browser.md
@@ -0,0 +1,92 @@
+---
+description: "Test step: Drive the running app via Chrome DevTools MCP and verify each user-facing acceptance criterion."
+---
+
+# Browser
+
+**Role:** Live Browser Verifier
+**Goal:** Exercise the running application through Chrome DevTools MCP and confirm each user-facing acceptance criterion behaves correctly in a real browser. The unit test suite (`verify`) and end-to-end suite (`e2e`) check what the developer wrote — `browser` checks what the user will actually see.
+
+## Driver: Chrome DevTools MCP
+
+This step uses the **Chrome DevTools MCP** server. There is no `*.spec.ts` file to write or maintain — you drive the browser interactively using MCP tools, the same way you use `Bash` or `Read`.
+
+If the MCP isn't available, `work-kit doctor` will have warned at session start. In that case:
+- Skip this step gracefully
+- Append `### Test: Browser` with `**Verdict:** skipped` and the reason
+- Continue to the next test step
+
+## When this step runs
+
+- **full-kit**: every UI session
+- **auto-kit**: `feature` and `large-feature` classifications, only when the request involves UI
+- **non-UI work**: skipped automatically by the workflow matrix
+
+The Test phase parallelizes `verify`, `e2e`, and `browser` — you run alongside them. If e2e is also enabled and they share a dev server, coordinate so you don't crash each other (see Rules below).
+
+## Instructions
+
+1. **Check the dev server.** Read project config (or `## Description` / `### Plan: Final`) to find how the dev server is started. If it isn't already running:
+   - Start it in the background
+   - Wait for it to be reachable (poll the health endpoint or root URL)
+   - Note the port
+
+2. **List the user-facing flows to verify.** Pull from `### Plan: UX Flow` if present, otherwise from the user-facing acceptance criteria in `## Criteria`. Each flow should be one observable user action with one observable outcome.
+
+3. **For each flow:**
+   - Use Chrome DevTools MCP to navigate to the relevant URL
+   - Perform the action (click, type, submit, etc.)
+   - Observe the result (text, element state, URL change, network response)
+   - Capture a screenshot to `.work-kit/test-browser/<flow-name>.png` for the record
+   - Capture any console errors or failed network requests
+
+4. **Record verdicts.** For each flow: `pass | fail | skip` with a one-sentence note.
+
+5. **Decide outcome:**
+   - `done` — all flows pass
+   - `needs_debug` — a flow fails in a way you can't immediately diagnose (the debug skill will help)
+   - `blocked` — dev server won't start, MCP unreachable, or environment broken
+
+## Output (append to state.md)
+
+```markdown
+### Test: Browser
+
+**Verdict:** pass | fail | skipped
+**Driver:** Chrome DevTools MCP
+**Dev Server:** <URL or "not started — skipped">
+
+**Flows:**
+- [pass|fail|skip] <flow name> — <one-sentence observation>
+- ...
+
+**Console Errors:** <count + brief or "None">
+**Failed Requests:** <count + brief or "None">
+**Screenshots:** .work-kit/test-browser/
+
+**Notes:**
+<anything the Validate step needs to know — or "None">
+```
+
+## Rules
+
+### Always
+- Verify against the **running app**, not against source files. The point of this step is catching things that pass unit tests but break in the browser.
+- Capture screenshots even for passing flows — they're cheap evidence and help future debugging.
+- Use the user-facing criteria as the source of truth for what to test, not the implementation.
+- Tear down anything you started (dev server, browser session) when you finish.
+
+### Never
+- Mock or stub the backend. If the backend is broken, that's a real failure — record it.
+- Skip flows because they're "obviously fine" without actually loading them.
+- Disable console error capture to avoid noise. Console errors are signal.
+- Run while another test step is hammering the same dev server endpoint — coordinate via the dev server lifecycle (start once, share, don't kill mid-run).
+
+## Anti-Rationalization
+
+| Excuse | Reality |
+|--------|---------|
+| "The unit tests passed, browser is just duplication" | Unit tests prove pieces work in isolation. Browser proves the assembly works for a real user. Unit-passing + browser-failing is the most common ship-broken pattern. |
+| "The console errors are unrelated to my change" | Then suppress them in code, not by ignoring them here. Unrelated console errors mean someone else's broken thing is hiding inside your safety net. |
+| "I'll skip the screenshots, they take time" | They take seconds. They're invaluable when something regresses three weeks from now and you want to know what the page looked like the day it shipped. |
+| "MCP isn't installed, I'll do the test manually" | Manual is fine for the user, not for this skill. If MCP is missing, mark `skipped` with the reason and let the user install it later. The pipeline must not stall on missing tooling. |
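Step 1's "wait for it to be reachable" polling can be sketched as a bounded retry loop. The sketch below is illustrative only; the probe is injected so the example does not assume a real server, but in practice it would be a `fetch` against the health endpoint or root URL:

```typescript
// Hedged sketch of the step-1 readiness wait; not work-kit's actual code.
// In practice: probe = () => fetch(url).then((r) => r.ok, () => false)
async function waitForServer(
  probe: () => Promise<boolean>,
  { attempts = 30, delayMs = 1000 }: { attempts?: number; delayMs?: number } = {},
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await probe()) return true; // server answered — proceed with flows
    await new Promise((resolve) => setTimeout(resolve, delayMs)); // back off, retry
  }
  return false; // never came up — the step should report `blocked`
}
```

Returning `false` instead of throwing maps cleanly onto the `blocked` outcome in step 5.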
package/skills/wk-test/steps/e2e.md
@@ -1,23 +1,39 @@
 ---
-description: "Test step:
+description: "Test step: Run the existing Playwright suite as a regression gate, and add specs only for genuinely new flows that nothing else covers."
 ---
 
 # E2E
 
-**Role:**
-**Goal:**
+**Role:** Regression Gate + Selective Spec Author
+**Goal:** Keep the cumulative Playwright suite green, and add a new spec **only** when a user flow in this session isn't already covered by an existing test. `test/browser` handles live per-session verification via Chrome DevTools MCP — this step exists for **durable regression coverage**, not session-local checks.
 
 ## Instructions
 
 1. **Verify Playwright is installed.** Run `npx playwright --version`. If it fails or `@playwright/test` is missing from `package.json`, STOP and tell the user to run `work-kit setup` (which installs Playwright + Chromium and scaffolds a config).
-
-
--
--
-
-
-
-
+
+2. **Run the existing suite first.** `npx playwright test`. This is the regression gate — if anything the previous sessions wrote is now broken, you catch it here before touching anything new.
+   - All pre-existing tests must still pass.
+   - Regressions → fix the implementation (not the test), unless the test encodes a behavior that intentionally changed in this session. In that case, update the test *with a comment* explaining why.
+
+3. **Identify genuinely new flows.** Read `### Plan: UX Flow` and `## Criteria`. For each user flow in this session, ask:
+   - Does an existing spec under `testDir` already exercise this flow? (Grep for selectors, route paths, or the feature name.)
+   - If yes: **do not add a duplicate spec.** `test/browser` will cover the live verification; e2e's job here is done.
+   - If no: add **one** focused spec that covers the happy path + one or two critical edge cases. Not exhaustive permutations.
+
+4. **Re-run the suite** after any new spec is added. `npx playwright test`. Everything must be green.
+
+5. **Capture screenshots** only for new specs, via `page.screenshot()` or `--trace on`. Skip screenshots for existing specs — they're already in the regression record.
+
+## Spec Discipline (important)
+
+The suite is **cumulative** across sessions — every spec you add sticks around forever and becomes maintenance burden. Before adding any new spec:
+
+- **Prefer extending an existing spec** to adding a new file. If a relevant spec exists, add a `test(...)` block inside it rather than creating a sibling file.
+- **Write for the flow, not the feature.** A "user can log in" spec shouldn't be rewritten just because you added a new login method — extend it.
+- **No spec per session.** If the session made a small tweak to a flow that already has a spec, don't add a second spec for the tweak.
+- **Delete stale specs when you legitimately replace behavior** instead of leaving `.skip`'d husks.
+
+When in doubt: **don't write the spec.** `test/browser` will verify the feature works live; the next session that actually needs regression coverage for this flow can add it.
 
 ## Output (append to state.md)
 
@@ -25,32 +41,38 @@ description: "Test step: Test user flows end-to-end."
 ### Test: E2E
 
 **Verdict:** pass | fail
-**
-
+**Suite Result:** <X passing, Y failing> (pre-existing + any new)
+**Regressions Found:** <none | list of existing specs that broke + fixes applied>
+
+**New Specs Added:**
+- `<test file>`: <flow description — or "None — existing specs already cover this session's flows">
 
-**
-- <flow
--
+**New Specs Skipped (already covered):**
+- <flow> — covered by `<existing spec file>`
+- ... or "N/A"
 
-**Screenshots:**
+**Screenshots (new specs only):**
 - <description>: <path or "not applicable">
 
 **Notes:**
-- <
+- <anything the Validate step needs to know>
 ```
 
 ## Rules
 
 - Playwright is the required E2E framework. Manual verification does NOT satisfy this step.
 - If Playwright is missing, halt and direct the user to `work-kit setup` — do not fall back to curl, manual steps, or another framework.
+- **Run the existing suite first**, always. A regression in pre-existing tests is more important than adding new coverage.
+- **No new spec unless coverage is genuinely missing.** Duplicate specs are a tax on every future session.
 - Focus on user-visible behavior, not internal implementation.
--
-- If a flow fails, fix the implementation (not the test) unless the test expectation is wrong.
+- If a flow fails, fix the implementation (not the test) unless the test encodes a behavior that intentionally changed this session.
 
 ## Anti-Rationalization
 
 | Excuse | Reality |
 |--------|---------|
-| "
-| "
-| "
+| "Every session should add its own spec for documentation" | No. Specs are executable safety nets, not docs. An unneeded spec adds flakiness and maintenance with zero regression value over the spec that already covers the flow. |
+| "I'll add a spec just in case — it can't hurt" | It can. Each spec adds suite runtime, a selector that can rot, and a merge-conflict surface. "Just in case" accumulates until the suite is 20 minutes long and half-skipped. |
+| "Manual verification counts as E2E testing" | It does not. This step either runs Playwright green or halts. Live verification is `test/browser`'s job, not a fallback for e2e. |
+| "Unit tests already cover this flow" | Unit tests mock boundaries. E2E tests verify the real flow across database, API, and UI. Keep the e2e spec if no unit+integration combination could catch a boundary regression. |
+| "The existing spec is close enough, I'll write a new one anyway" | Extend the existing spec with a new `test(...)` block instead. Two specs for nearly the same flow is worse than one spec that covers both cases. |
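Step 3's "does an existing spec already exercise this flow?" check amounts to searching spec sources for the flow's selectors, routes, or feature name. An illustrative sketch follows (in practice a plain grep over `testDir` is enough); specs are passed in-memory as name-to-source pairs so the example stays self-contained:

```typescript
// Illustrative coverage check for step 3, not work-kit's actual code.
// `needles` are the flow's identifying strings: route paths, selectors, names.
function findCoveringSpecs(specs: Record<string, string>, needles: string[]): string[] {
  return Object.entries(specs)
    .filter(([name]) => name.endsWith(".spec.ts"))                 // only Playwright specs
    .filter(([, src]) => needles.some((n) => src.includes(n)))     // any needle hit counts
    .map(([name]) => name);
}
```

A non-empty result means "do not add a duplicate spec"; an empty result is the only case where step 3 permits writing one focused new spec.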
package/skills/wk-wrap-up/SKILL.md
@@ -26,7 +26,7 @@ The summary you write goes into `.work-kit/summary.md`; the CLI archives it into
 4. **Run** `work-kit complete wrap-up/summary --outcome done`
 
 ### Step 2: knowledge
-5. **Run `work-kit extract`** —
+5. **Run `work-kit extract`** — routes typed `## Observations` bullets, `## Decisions` bullets in the `**<context>**: chose <X> over <Y> — <why>` shape, and tracker loopbacks/skipped/failed steps into `.work-kit-knowledge/` files. `## Deviations` is not auto-harvested (it's scratch space).
 6. **Review the summary you just wrote** for subjective additions the parser would miss. For each, call `work-kit learn --type <lesson|convention|risk|workflow> --text "..."`.
 7. **Run** `work-kit complete wrap-up/knowledge --outcome done`
 
package/skills/wk-wrap-up/steps/knowledge.md
@@ -14,13 +14,16 @@ After `wrap-up/summary`. By now you've just re-read the full `state.md` and dist
 ```bash
 work-kit extract
 ```
-This parses `.work-kit/state.md` and `.work-kit/tracker.json` and routes entries to `.work-kit-knowledge/{lessons,conventions,risks,workflow}.md`. It pulls from:
-- `## Observations` typed bullets (`- [lesson|convention|risk|workflow] text`)
-- `## Decisions`
-- `## Deviations` → workflow feedback
+This parses `.work-kit/state.md` and `.work-kit/tracker.json` and routes entries to `.work-kit-knowledge/{lessons,conventions,risks,workflow,decisions}.md`. It pulls from:
+- `## Observations` typed bullets (`- [lesson|convention|risk|workflow|decision] text`) — auto-harvested
+- `## Decisions` bullets matching `**<context>**: chose X over Y — <why>` — auto-harvested into `decisions.md`
 - `tracker.json.loopbacks[]` → workflow feedback
 - Skipped/failed steps → workflow feedback
 
+`## Deviations` is **not** auto-harvested — it's agent scratch space (test plans, review notes). If something in there is worth preserving, restate it as a typed bullet under `## Observations` (or call `work-kit learn` directly in step 2).
+
+Free-form bullets under `## Decisions` (lines that don't match `**context**: chose X over Y`) are skipped silently — they're treated as scratch.
+
 The output JSON tells you how many entries were `written` vs `duplicates`. Re-running is idempotent.
 
 2. **Read your `.work-kit/summary.md`** (the one you just wrote). For each non-obvious thing in it that the parser would NOT have captured automatically, call `work-kit learn`:
@@ -29,6 +32,7 @@ After `wrap-up/summary`. By now you've just re-read the full `state.md` and dist
 work-kit learn --type lesson --text "Discovered that the test fixtures must be reset between Playwright suites, otherwise auth state leaks."
 work-kit learn --type risk --text "src/payment/webhook.ts has no integration test coverage for retries."
 work-kit learn --type convention --text "All new API endpoints must register a Zod schema in src/schemas/."
+work-kit learn --type decision --text "**Browser driver**: chose Chrome DevTools MCP over Playwright — agentic, no spec files to maintain."
 work-kit learn --type workflow --text "The wk-test/e2e step doesn't tell agents to start the dev server before running Playwright."
 ```
 
@@ -46,6 +50,7 @@ After `wrap-up/summary`. By now you've just re-read the full `state.md` and dist
 | `lesson` | lessons.md | Project-specific learnings — facts about *this* codebase. |
 | `convention` | conventions.md | Codified rules this project follows. Future sessions should respect these. |
 | `risk` | risks.md | Fragile or dangerous areas. Touch with care. |
+| `decision` | decisions.md | Architectural choices: what was picked, what was rejected, why. Read before re-litigating. Auto-harvested from `## Decisions` when bullets follow the `**context**: chose X over Y — <why>` format. |
 | `workflow` | workflow.md | Feedback about the work-kit kit itself — skill quality, step skips, loopbacks, failure modes. **Mined manually across projects to improve work-kit upstream.** |
 
 ## Boundaries
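The routing table above maps each `--type` to a knowledge file. A hypothetical sketch of how a typed `## Observations` bullet could be routed (the real `work-kit extract` implementation may differ):

```typescript
// Hypothetical sketch of typed-bullet routing, not work-kit's actual code.
// Matches:  - [lesson|convention|risk|workflow|decision] text
const TYPED_RE = /^-\s+\[(lesson|convention|risk|workflow|decision)\]\s+(.+)$/;

function routeObservation(line: string): { file: string; text: string } | null {
  const m = TYPED_RE.exec(line.trim());
  if (!m) return null; // untyped bullets are not harvested
  // per the table above: lessons.md, conventions.md, risks.md, decisions.md,
  // but workflow.md stays singular
  const basename = m[1] === "workflow" ? "workflow" : `${m[1]}s`;
  return { file: `.work-kit-knowledge/${basename}.md`, text: m[2] };
}
```

The singular/plural quirk is worth encoding explicitly: four of the five target files are pluralized, `workflow.md` is not.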