@athenaflow/plugin-e2e-test-builder 2.0.9 → 2.0.10
- package/.claude-plugin/plugin.json +1 -1
- package/.codex-plugin/plugin.json +1 -1
- package/dist/{2.0.8 → 2.0.10}/.agents/plugins/marketplace.json +1 -1
- package/dist/{2.0.9 → 2.0.10}/claude/plugin/.claude-plugin/plugin.json +1 -1
- package/dist/{2.0.9 → 2.0.10}/claude/plugin/package.json +8 -2
- package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/add-e2e-tests/SKILL.md +18 -65
- package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/add-e2e-tests/agents/openai.yaml +1 -1
- package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/add-e2e-tests/references/error-recovery.md +3 -3
- package/dist/{2.0.8/codex → 2.0.10/claude}/plugin/skills/add-e2e-tests/references/scaffolding.md +1 -1
- package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/fix-flaky-tests/SKILL.md +1 -1
- package/dist/{2.0.8/codex → 2.0.10/claude}/plugin/skills/fix-flaky-tests/references/fix-patterns.md +3 -2
- package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/generate-test-cases/SKILL.md +8 -2
- package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/plan-test-coverage/SKILL.md +7 -6
- package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-cases/SKILL.md +3 -4
- package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/SKILL.md +4 -3
- package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/references/api-setup-teardown.md +1 -1
- package/dist/{2.0.9 → 2.0.10}/codex/plugin/.codex-plugin/plugin.json +1 -1
- package/dist/{2.0.9 → 2.0.10}/codex/plugin/package.json +8 -2
- package/dist/{2.0.9 → 2.0.10}/codex/plugin/skills/add-e2e-tests/SKILL.md +18 -65
- package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/add-e2e-tests/agents/openai.yaml +1 -1
- package/dist/{2.0.9/claude → 2.0.10/codex}/plugin/skills/add-e2e-tests/references/error-recovery.md +3 -3
- package/dist/{2.0.9/claude → 2.0.10/codex}/plugin/skills/add-e2e-tests/references/scaffolding.md +1 -1
- package/dist/{2.0.8/claude → 2.0.10/codex}/plugin/skills/fix-flaky-tests/SKILL.md +1 -1
- package/dist/{2.0.9/claude → 2.0.10/codex}/plugin/skills/fix-flaky-tests/references/fix-patterns.md +3 -2
- package/dist/{2.0.9 → 2.0.10}/codex/plugin/skills/generate-test-cases/SKILL.md +8 -2
- package/dist/{2.0.9 → 2.0.10}/codex/plugin/skills/plan-test-coverage/SKILL.md +7 -6
- package/dist/{2.0.9/claude → 2.0.10/codex}/plugin/skills/review-test-cases/SKILL.md +3 -4
- package/dist/{2.0.9/claude → 2.0.10/codex}/plugin/skills/write-test-code/SKILL.md +4 -3
- package/dist/{2.0.9/claude → 2.0.10/codex}/plugin/skills/write-test-code/references/api-setup-teardown.md +1 -1
- package/dist/{2.0.9 → 2.0.10}/release.json +1 -1
- package/package.json +7 -1
- package/skills/add-e2e-tests/SKILL.md +18 -65
- package/skills/add-e2e-tests/agents/openai.yaml +1 -1
- package/skills/add-e2e-tests/references/error-recovery.md +3 -3
- package/skills/add-e2e-tests/references/scaffolding.md +1 -1
- package/skills/fix-flaky-tests/SKILL.md +1 -1
- package/skills/fix-flaky-tests/references/fix-patterns.md +3 -2
- package/skills/generate-test-cases/SKILL.md +8 -2
- package/skills/plan-test-coverage/SKILL.md +7 -6
- package/skills/review-test-cases/SKILL.md +3 -4
- package/skills/write-test-code/SKILL.md +4 -3
- package/skills/write-test-code/references/api-setup-teardown.md +1 -1
- package/dist/2.0.8/claude/plugin/.claude-plugin/plugin.json +0 -20
- package/dist/2.0.8/claude/plugin/package.json +0 -9
- package/dist/2.0.8/claude/plugin/skills/add-e2e-tests/SKILL.md +0 -217
- package/dist/2.0.8/claude/plugin/skills/add-e2e-tests/agents/claude.yaml +0 -1
- package/dist/2.0.8/claude/plugin/skills/add-e2e-tests/references/scaffolding.md +0 -12
- package/dist/2.0.8/claude/plugin/skills/add-e2e-tests/references/tracker-template.md +0 -53
- package/dist/2.0.8/claude/plugin/skills/fix-flaky-tests/references/fix-patterns.md +0 -91
- package/dist/2.0.8/claude/plugin/skills/generate-test-cases/SKILL.md +0 -184
- package/dist/2.0.8/claude/plugin/skills/plan-test-coverage/SKILL.md +0 -116
- package/dist/2.0.8/codex/plugin/.codex-plugin/plugin.json +0 -15
- package/dist/2.0.8/codex/plugin/package.json +0 -9
- package/dist/2.0.8/codex/plugin/skills/add-e2e-tests/SKILL.md +0 -217
- package/dist/2.0.8/codex/plugin/skills/add-e2e-tests/agents/claude.yaml +0 -1
- package/dist/2.0.8/codex/plugin/skills/add-e2e-tests/references/error-recovery.md +0 -43
- package/dist/2.0.8/codex/plugin/skills/add-e2e-tests/references/tracker-template.md +0 -53
- package/dist/2.0.8/codex/plugin/skills/fix-flaky-tests/SKILL.md +0 -160
- package/dist/2.0.8/codex/plugin/skills/generate-test-cases/SKILL.md +0 -184
- package/dist/2.0.8/codex/plugin/skills/plan-test-coverage/SKILL.md +0 -116
- package/dist/2.0.8/codex/plugin/skills/review-test-cases/SKILL.md +0 -147
- package/dist/2.0.8/codex/plugin/skills/write-test-code/SKILL.md +0 -227
- package/dist/2.0.8/codex/plugin/skills/write-test-code/references/api-setup-teardown.md +0 -83
- package/dist/2.0.8/release.json +0 -18
- package/dist/2.0.9/.agents/plugins/marketplace.json +0 -14
- package/dist/2.0.9/claude/plugin/skills/add-e2e-tests/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/add-e2e-tests/references/authentication.md +0 -8
- package/dist/2.0.9/claude/plugin/skills/add-e2e-tests/references/tracker-template.md +0 -53
- package/dist/2.0.9/claude/plugin/skills/analyze-test-codebase/SKILL.md +0 -142
- package/dist/2.0.9/claude/plugin/skills/analyze-test-codebase/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/analyze-test-codebase/agents/openai.yaml +0 -4
- package/dist/2.0.9/claude/plugin/skills/fix-flaky-tests/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/fix-flaky-tests/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/generate-test-cases/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/generate-test-cases/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/generate-test-cases/references/scenario-categories.md +0 -36
- package/dist/2.0.9/claude/plugin/skills/plan-test-coverage/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/plan-test-coverage/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/review-test-cases/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/review-test-cases/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/review-test-code/SKILL.md +0 -189
- package/dist/2.0.9/claude/plugin/skills/review-test-code/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/review-test-code/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/write-test-code/agents/claude.yaml +0 -3
- package/dist/2.0.9/claude/plugin/skills/write-test-code/agents/openai.yaml +0 -10
- package/dist/2.0.9/claude/plugin/skills/write-test-code/references/anti-patterns.md +0 -88
- package/dist/2.0.9/claude/plugin/skills/write-test-code/references/auth-patterns.md +0 -63
- package/dist/2.0.9/claude/plugin/skills/write-test-code/references/mapping-tables.md +0 -56
- package/dist/2.0.9/claude/plugin/skills/write-test-code/references/network-interception.md +0 -56
- package/dist/2.0.9/codex/plugin/skills/add-e2e-tests/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/add-e2e-tests/references/authentication.md +0 -8
- package/dist/2.0.9/codex/plugin/skills/add-e2e-tests/references/error-recovery.md +0 -43
- package/dist/2.0.9/codex/plugin/skills/add-e2e-tests/references/scaffolding.md +0 -12
- package/dist/2.0.9/codex/plugin/skills/add-e2e-tests/references/tracker-template.md +0 -53
- package/dist/2.0.9/codex/plugin/skills/analyze-test-codebase/SKILL.md +0 -142
- package/dist/2.0.9/codex/plugin/skills/analyze-test-codebase/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/analyze-test-codebase/agents/openai.yaml +0 -4
- package/dist/2.0.9/codex/plugin/skills/fix-flaky-tests/SKILL.md +0 -160
- package/dist/2.0.9/codex/plugin/skills/fix-flaky-tests/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/fix-flaky-tests/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/fix-flaky-tests/references/fix-patterns.md +0 -91
- package/dist/2.0.9/codex/plugin/skills/generate-test-cases/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/generate-test-cases/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/generate-test-cases/references/scenario-categories.md +0 -36
- package/dist/2.0.9/codex/plugin/skills/plan-test-coverage/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/plan-test-coverage/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/review-test-cases/SKILL.md +0 -147
- package/dist/2.0.9/codex/plugin/skills/review-test-cases/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/review-test-cases/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/review-test-code/SKILL.md +0 -189
- package/dist/2.0.9/codex/plugin/skills/review-test-code/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/review-test-code/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/write-test-code/SKILL.md +0 -227
- package/dist/2.0.9/codex/plugin/skills/write-test-code/agents/claude.yaml +0 -3
- package/dist/2.0.9/codex/plugin/skills/write-test-code/agents/openai.yaml +0 -10
- package/dist/2.0.9/codex/plugin/skills/write-test-code/references/anti-patterns.md +0 -88
- package/dist/2.0.9/codex/plugin/skills/write-test-code/references/api-setup-teardown.md +0 -83
- package/dist/2.0.9/codex/plugin/skills/write-test-code/references/auth-patterns.md +0 -63
- package/dist/2.0.9/codex/plugin/skills/write-test-code/references/mapping-tables.md +0 -56
- package/dist/2.0.9/codex/plugin/skills/write-test-code/references/network-interception.md +0 -56
- package/skills/add-e2e-tests/references/tracker-template.md +0 -53
- /package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/add-e2e-tests/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/add-e2e-tests/references/authentication.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/analyze-test-codebase/SKILL.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/analyze-test-codebase/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/analyze-test-codebase/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/fix-flaky-tests/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/fix-flaky-tests/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/generate-test-cases/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/generate-test-cases/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/generate-test-cases/references/scenario-categories.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/plan-test-coverage/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/plan-test-coverage/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-cases/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-cases/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-code/SKILL.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-code/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-code/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/references/anti-patterns.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/references/auth-patterns.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/references/mapping-tables.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/references/network-interception.md +0 -0
- /package/dist/{2.0.9 → 2.0.10}/codex/plugin/skills/add-e2e-tests/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/add-e2e-tests/references/authentication.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/analyze-test-codebase/SKILL.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/analyze-test-codebase/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/analyze-test-codebase/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/fix-flaky-tests/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/fix-flaky-tests/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/generate-test-cases/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/generate-test-cases/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/generate-test-cases/references/scenario-categories.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/plan-test-coverage/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/plan-test-coverage/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/review-test-cases/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/review-test-cases/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/review-test-code/SKILL.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/review-test-code/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/review-test-code/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/write-test-code/agents/claude.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/write-test-code/agents/openai.yaml +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/write-test-code/references/anti-patterns.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/write-test-code/references/auth-patterns.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/write-test-code/references/mapping-tables.md +0 -0
- /package/dist/{2.0.8 → 2.0.10}/codex/plugin/skills/write-test-code/references/network-interception.md +0 -0
@@ -1,7 +1,7 @@
 {
   "name": "e2e-test-builder",
   "description": "Full-pipeline Playwright E2E test generation \u2014 explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
-  "version": "2.0.
+  "version": "2.0.10",
   "author": {
     "name": "Athenaflow"
   },
@@ -1,6 +1,6 @@
 {
   "name": "e2e-test-builder",
-  "version": "2.0.
+  "version": "2.0.10",
   "description": "Full-pipeline Playwright E2E test generation \u2014 explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
   "author": {
     "name": "Athenaflow"
@@ -1,7 +1,7 @@
 {
   "name": "e2e-test-builder",
   "description": "Full-pipeline Playwright E2E test generation \u2014 explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
-  "version": "2.0.
+  "version": "2.0.10",
   "author": {
     "name": "Athenaflow"
   },
@@ -1,9 +1,15 @@
 {
   "name": "@athenaflow/plugin-e2e-test-builder",
-  "version": "2.0.
+  "version": "2.0.10",
   "description": "Full-pipeline Playwright E2E test generation — explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
   "license": "MIT",
   "publishConfig": {
     "access": "public"
-  }
+  },
+  "files": [
+    ".claude-plugin/",
+    ".codex-plugin/",
+    "skills/",
+    "dist/"
+  ]
 }
@@ -8,7 +8,7 @@ description: >
   Delegates to sub-skills (analyze-test-codebase, plan-test-coverage, generate-test-cases,
   review-test-cases, write-test-code, review-test-code, fix-flaky-tests) internally — do NOT
   skip to sub-skills directly unless the user explicitly requests a narrow activity.
-
+  Uses subagent delegation to save context.
 allowed-tools: Read Write Edit Glob Grep Bash Task
 ---
 
@@ -22,34 +22,17 @@ Parse the target URL and feature description from: $ARGUMENTS
 
 Derive a **feature slug** from the feature description (e.g., "Login flow" → `login`, "Checkout with payment" → `checkout`). Use this slug for file naming throughout.
 
-##
-
-### 1. Orient: Understand the Project, the Product, and Your Capabilities
+## 1. Orient: Understand the Project, the Product, and Your Capabilities
 
 Before planning any work, build deep situational awareness. This step determines the quality of everything that follows — rushed orientation leads to missed test cases and wasted effort.
 
-
-- If `e2e-tracker.md` exists in the project root, read it and resume from where you left off — skip to **step 2 (Plan)** with the remaining work.
-- If no tracker exists, this is a fresh start. Proceed with orientation below.
-
-#### First: create initial tasks and tracker
-
-As soon as you parse the user's request:
-
-1. **Create the tracker** — write `e2e-tracker.md` with the goal (URL, feature, slug) and a skeleton plan.
-2. **Create high-level tasks** for the work ahead — analyze codebase, explore the product, plan coverage, generate test specs, write tests, verify tests.
-
-These are your starting skeleton. As you work through orientation and discover the actual shape of the work, refine both the tasks and the tracker — break tasks into granular sub-tasks, add new ones, remove ones that don't apply.
-
-Treat the task list as a visible milestone log. Keep it concise, but update it continuously. Do not leave broad tasks open until the end and then mark everything complete in one batch.
-
-#### 1a. Understand the codebase
+### Understand the codebase
 
 - Does a Playwright config exist (`playwright.config.{ts,js,mjs}`)? If not, you will need to scaffold one (see Scaffolding section).
 - Are there existing tests? What conventions do they follow — naming, locators, fixtures, page objects, auth?
 - Load the `analyze-test-codebase` skill and follow its methodology.
 
-
+### Understand the product
 
 This is the most important part of orientation. You cannot write good tests for a product you don't understand.
 
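The feature-slug rule kept as context at the top of this hunk gives only two examples ("Login flow" → `login`, "Checkout with payment" → `checkout`) and no algorithm. The sketch below is one possible reading, assuming the slug is the first meaningful word, lowercased. The helper names `deriveFeatureSlug` and `testCasePath` and the filler-word list are hypothetical and are not part of the package.

```typescript
// Hypothetical helper: the rule "first meaningful word, lowercased" is an
// assumption inferred from the two examples in the skill text.
const FILLER_WORDS: string[] = ["the", "a", "an"];

function deriveFeatureSlug(description: string): string {
  const words = description
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on anything that is not a letter or digit
    .filter((w) => w.length > 0 && FILLER_WORDS.indexOf(w) === -1);
  const first = words[0];
  if (first === undefined) {
    throw new Error(`Cannot derive a slug from: "${description}"`);
  }
  return first;
}

// The skill refers to spec files as test-cases/<feature>.md; this helper
// simply fills in the slug.
function testCasePath(slug: string): string {
  return `test-cases/${slug}.md`;
}
```

A description that does not start with its key noun (for example "Password reset for admins") would need a smarter rule, so treat this as a starting point rather than the skill's actual behavior.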
@@ -59,7 +42,7 @@ This is the most important part of orientation. You cannot write good tests for
 
 Why this matters: absent explicit exploration, agents tend to write tests based on assumptions about how a product works rather than how it actually works. The result is tests that target imaginary behavior or miss critical real behavior. Spending time here prevents both.
 
-
+### Know your skills
 
 You have access to specialized skills that contain deep domain knowledge. Load the relevant skill before performing each activity — skills prevent improvisation and encode best practices.
 
@@ -76,22 +59,11 @@ You have access to specialized skills that contain deep domain knowledge. Load t
 
 Before doing a substantial activity, load the skill that covers that activity so you can follow its workflow rather than improvising.
 
-
-
-After orienting, update the tracker with what you learned about the codebase and product, conventions discovered, and your refined plan. The tracker must always answer these four questions for anyone reading it cold:
-
-1. What is the goal?
-2. What has been done?
-3. What is remaining?
-4. What should I do next?
+## 2. Plan: Refine Tasks Into Granular Checkpoints
 
-
+Refine the work into granular checkpoints based on what orientation revealed. The plan should flow from what you learned, not from a fixed template.
 
-###
-
-By now you have initial tasks and a tracker from step 1. Refine tasks into granular checkpoints. The plan should flow from what you learned during orientation, not from a fixed template.
-
-#### Task granularity
+### Task granularity
 
 Think in small checkpoints, not big phases. Each task should represent a concrete, verifiable unit of progress.
 
@@ -117,11 +89,11 @@ Create tasks for verification steps too (running tests, checking coverage, brows
 
 Update task status as each checkpoint completes. A good pattern is: finish exploration and mark it complete, finish coverage/spec work and mark it complete, finish implementation and mark it complete, then finish review/execution and mark it complete. Do not keep all milestones open until session end.
 
-
+## 3. Execute
 
 Work through your tasks. Load the relevant skill before each activity.
 
-
+### Planning uses the browser heavily
 
 When planning what to test (coverage planning, test case generation), use the browser extensively. Don't just catalog elements — interact with the product to discover:
 - What validation messages appear for each field?
@@ -133,14 +105,14 @@ When planning what to test (coverage planning, test case generation), use the br
 
 Every test case you generate should trace back to something you actually observed or deliberately triggered in the browser. This is how you avoid introducing useless test cases (testing imaginary behavior) and avoid missing important ones (behavior you didn't think to check).
 
-
+### Subagent delegation
 
 Delegate heavy browser exploration and test writing to subagents when that saves context for orchestration, verification, and debugging. When delegating:
 - Pass the relevant file paths (conventions, coverage plan, test specs)
 - Instruct the subagent to invoke the appropriate skill (subagents inherit access to plugin skills)
 - Specify concrete output expectations (file path, format, TC-ID conventions)
 
-
+### Quality gates
 
 Two review gates and a test execution checkpoint are mandatory during execution. The review gates are review-only — they produce findings but do not modify files.
 
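The three delegation rules in this hunk (pass file paths, name the skill to invoke, state concrete output expectations) can be captured in a small structure so that no field is forgotten. The sketch below is illustrative only: the `DelegationBrief` type, its field names, the file paths, and the `TC-LOGIN-001` ID format are assumptions, not something the package defines.

```typescript
// Hypothetical shape for a subagent brief; all field names are illustrative.
interface DelegationBrief {
  skill: string;          // skill the subagent should invoke
  inputPaths: string[];   // conventions, coverage plan, test specs
  outputPath: string;     // where the subagent must write its result
  outputFormat: string;   // e.g. "Markdown"
  tcIdConvention: string; // assumed ID format, e.g. "TC-LOGIN-001"
}

// Render the brief as plain instructions for the subagent prompt.
function renderBrief(brief: DelegationBrief): string {
  if (brief.inputPaths.length === 0) {
    throw new Error("A brief must pass at least one relevant file path");
  }
  const lines: string[] = [];
  lines.push(`Invoke the ${brief.skill} skill before starting.`);
  lines.push("Read these files first:");
  for (const path of brief.inputPaths) {
    lines.push(`- ${path}`);
  }
  lines.push(`Write your output to ${brief.outputPath} as ${brief.outputFormat}.`);
  lines.push(`Use TC-IDs of the form ${brief.tcIdConvention}.`);
  return lines.join("\n");
}

// Example brief with made-up paths.
const exampleBrief = renderBrief({
  skill: "generate-test-cases",
  inputPaths: ["docs/conventions.md", "docs/coverage-plan.md"],
  outputPath: "test-cases/login.md",
  outputFormat: "Markdown",
  tcIdConvention: "TC-LOGIN-001",
});
```

Refusing an empty `inputPaths` list is a design choice in this sketch: a subagent that receives no files has nothing to ground its work in.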
@@ -148,45 +120,26 @@ Two review gates and a test execution checkpoint are mandatory during execution.
 1. Load the `review-test-cases` skill and run it against `test-cases/<feature>.md`
 2. If verdict is **NEEDS REVISION** — address all blockers in the spec before proceeding to implementation
 3. If verdict is **PASS WITH WARNINGS** — address warnings if quick, otherwise note them and proceed
-4. Record the review verdict in the tracker
-
 **Gate 2: Review test code** (after `write-test-code`, before final test execution)
 1. Load the `review-test-code` skill and run it against the implemented test files
 2. If verdict is **NEEDS REVISION** — fix all blockers before running tests for signoff
 3. If verdict is **PASS WITH WARNINGS** — fix warnings that affect stability, proceed with execution
-4. Record the review verdict in the tracker
 
 **Checkpoint: Test execution**
 1. Run the tests: `npx playwright test <file> --reporter=list 2>&1`
-2.
+2. Inspect the full output — green test output is the only proof of correctness
 3. If tests fail, load the `fix-flaky-tests` skill and follow its structured diagnostic approach. Do not guess-and-retry.
-4. Maximum 3 fix-and-rerun cycles per test. If stuck after 3 cycles,
-
-**Test execution and coverage checks must never be delegated to subagents.** Run `npx playwright test` directly and record the output.
-
-#### Update the tracker as you work
-
-Do not wait until session end. After each meaningful chunk of progress (completing a step, discovering a blocker, producing an artifact), update the tracker. If your context window resets, only what's in the tracker survives.
-
-Keep the tracker and task list synchronized. If you record progress in the tracker, update the corresponding task status in the same phase of work.
-
-#### Error recovery
-
-If infrastructure failures occur (browser MCP unavailable, clone failures, npm install errors), see [references/error-recovery.md](references/error-recovery.md) for diagnostic steps. General pattern: diagnose, attempt one known fix, if still stuck record in tracker and ask the user.
+4. Maximum 3 fix-and-rerun cycles per test. If stuck after 3 cycles, move on with the diagnostic output.
 
-
+**Test execution and coverage checks must never be delegated to subagents.** Run `npx playwright test` directly.
 
-
-1. Ensure the tracker reflects all progress, discoveries, and blockers from this session
-2. Write clear instructions for what the next session should do
-3. If all work is complete and all tests pass with full TC-ID coverage: write `<!-- E2E_COMPLETE -->` as the last line of the tracker
-4. If an unrecoverable blocker prevents progress: write `<!-- E2E_BLOCKED: reason -->` as the last line
+### Error recovery
 
-
+If infrastructure failures occur (browser MCP unavailable, clone failures, npm install errors), see [references/error-recovery.md](references/error-recovery.md) for diagnostic steps. General pattern: diagnose, attempt one known fix, if still stuck ask the user.
 
 ## Scaffolding
 
-If Playwright is not set up in the target project, follow the procedure in [references/scaffolding.md](references/scaffolding.md) to clone the boilerplate, merge configuration, and install dependencies.
+If Playwright is not set up in the target project, follow the procedure in [references/scaffolding.md](references/scaffolding.md) to clone the boilerplate, merge configuration, and install dependencies.
 
 ## Authentication
 
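The test execution checkpoint in this hunk caps the work at three fix-and-rerun cycles per test and then moves on with the diagnostic output. A minimal sketch of that control flow follows. The callbacks `runTest` and `diagnoseAndFix` are hypothetical stand-ins: in practice the first would wrap `npx playwright test <file> --reporter=list` and the second would follow the `fix-flaky-tests` skill, neither of which is implemented here.

```typescript
// Result of one test run; `output` is the full reporter output.
type RunResult = { passed: boolean; output: string };

interface CheckpointOutcome {
  passed: boolean;
  cyclesUsed: number; // number of fix-and-rerun cycles actually spent
  lastOutput: string; // diagnostic output to carry forward when stuck
}

const MAX_FIX_CYCLES = 3; // the cap stated in the skill text

// Both callbacks are injected so the loop itself stays testable.
function runExecutionCheckpoint(
  runTest: () => RunResult,
  diagnoseAndFix: (output: string) => void
): CheckpointOutcome {
  let result = runTest();
  let cycles = 0;
  while (!result.passed && cycles < MAX_FIX_CYCLES) {
    diagnoseAndFix(result.output); // structured diagnosis, not guess-and-retry
    cycles += 1;
    result = runTest();
  }
  return { passed: result.passed, cyclesUsed: cycles, lastOutput: result.output };
}
```

When the outcome comes back with `passed` set to false, `lastOutput` holds the failing run's output, which matches the instruction to move on with the diagnostic output rather than keep retrying.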
@@ -1,7 +1,7 @@
 interface:
   display_name: "Run Full E2E Workflow"
   short_description: "Orchestrate browser-led E2E work from exploration to verified tests"
-  default_prompt: "Run the full E2E workflow for this feature:
+  default_prompt: "Run the full E2E workflow for this feature: explore the live product first, then plan, spec, review, implement, and verify the Playwright tests."
 
 dependencies:
   tools:
package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/add-e2e-tests/references/error-recovery.md
RENAMED
@@ -1,6 +1,6 @@
 # Error Recovery for Infrastructure Failures
 
-When infrastructure failures occur during E2E test building, follow the general pattern: diagnose, attempt one known fix, if still stuck
+When infrastructure failures occur during E2E test building, follow the general pattern: diagnose, attempt one known fix, if still stuck ask the user.
 
 ## Browser MCP unavailable
 
@@ -8,7 +8,7 @@ The browser MCP server (`agent-web-interface`) must be running for site explorat
 
 1. Verify the MCP server is configured in the project (check `.mcp.json` or plugin config).
 2. Ask the user to confirm the MCP server is running or to restart it.
-3. If unreachable after user intervention,
+3. If unreachable after user intervention, this is an unrecoverable blocker — inform the user that browser exploration cannot proceed.
 
 ## Boilerplate clone fails
 
@@ -40,4 +40,4 @@ For any infrastructure failure not listed above:
 
 1. **Diagnose** — read the error message carefully, check logs, identify the root cause.
 2. **Attempt one known fix** — apply the most likely solution based on the error.
-3. **If still stuck** —
+3. **If still stuck** — ask the user for help with the full error output and diagnostic steps taken. Do not loop through multiple speculative fixes.
package/dist/{2.0.8/codex → 2.0.10/claude}/plugin/skills/add-e2e-tests/references/scaffolding.md
RENAMED
@@ -9,4 +9,4 @@ If Playwright is not set up in the target project, follow this procedure:
 4. Merge devDependencies into `package.json`.
 5. Run `npm install && npx playwright install --with-deps chromium`.
 6. Clean up the temp clone.
-7.
+7. Preserve the important scaffolding decisions in your working notes.
@@ -56,7 +56,7 @@ Investigate based on the classification:
 - Check if the test asserts before an API response arrives — search for missing `waitForResponse`
 - Look for animations/transitions that affect element state (CSS transitions, skeleton screens)
 - Check for `waitForTimeout` being used as a "fix" — this is a symptom, not a cure
-- Check
+- Check whether the test needs a more specific readiness signal: targeted `waitForResponse`, a URL/assertion change, a loading indicator disappearing, or a hydration marker becoming ready
 
 **State leakage:**
 - Run the failing test alone: `npx playwright test --grep "<test name>"`
package/dist/{2.0.8/codex → 2.0.10/claude}/plugin/skills/fix-flaky-tests/references/fix-patterns.md
RENAMED
|
@@ -17,8 +17,9 @@ await expect(element).toBeVisible();
|
|
|
17
17
|
await expect(page.getByRole('progressbar')).toBeHidden();
|
|
18
18
|
await expect(element).toBeVisible();
|
|
19
19
|
|
|
20
|
-
// GOOD: wait for
|
|
21
|
-
await page.goto('/page'
|
|
20
|
+
// GOOD: wait for a specific readiness signal after navigation
|
|
21
|
+
await page.goto('/page');
|
|
22
|
+
await expect(page.getByRole('heading', { name: /dashboard/i })).toBeVisible();
|
|
22
23
|
|
|
23
24
|
// GOOD: use auto-retrying assertion (retries until timeout)
|
|
24
25
|
await expect(page.getByText(/loaded/i)).toBeVisible({ timeout: 10000 });
|
|
package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/generate-test-cases/SKILL.md
@@ -80,7 +80,7 @@ Launch another subagent, or continue in the main thread if the flow is small, to
|
|
|
80
80
|
|
|
81
81
|
### Step 4: Reason About Additional Scenarios
|
|
82
82
|
|
|
83
|
-
After exploration, reason about scenarios that could not be directly triggered but
|
|
83
|
+
After exploration, reason about scenarios that could not be directly triggered but may still need coverage:
|
|
84
84
|
|
|
85
85
|
- **Network & Performance** — failure modes, slow responses, large data sets, offline behavior
|
|
86
86
|
- **Accessibility (WCAG 2.1 AA)** — keyboard navigation, screen reader support, focus management, contrast
|
|
@@ -88,6 +88,12 @@ After exploration, reason about scenarios that could not be directly triggered b
|
|
|
88
88
|
- **Cross-browser** — Safari/Firefox/mobile-specific behavioral differences
|
|
89
89
|
- **Concurrent & Session** — session expiry, multi-tab conflicts, race conditions
|
|
90
90
|
|
|
91
|
+
For scenarios that were not directly observed:
|
|
92
|
+
- label them clearly as inferred, mock-required, or environment-dependent in the spec notes
|
|
93
|
+
- avoid inventing exact UI text, validation copy, or server behavior you did not observe
|
|
94
|
+
- phrase expected results at the right confidence level (for example, "shows an error state" rather than exact copy if the exact message was not seen)
|
|
95
|
+
- prefer these scenarios when they are strongly implied by the architecture or are standard negative paths the implementation will need to simulate
|
|
96
|
+
|
|
91
97
|
See [references/scenario-categories.md](references/scenario-categories.md) for detailed checklists within each category.
|
|
92
98
|
|
|
93
99
|
### Step 5: Generate Test Case Specifications
|
|
@@ -162,7 +168,7 @@ When generating specs that span multiple roles or test categories, recommend rol
|
|
|
162
168
|
- Steps must be **concrete and unambiguous** — "click the Submit button" not "submit the form"
|
|
163
169
|
- Expected results must be **observable and verifiable** — include actual error messages observed
|
|
164
170
|
- Priority must be **justified** — Critical = blocks core journey, High = significant, Medium = secondary, Low = cosmetic
|
|
165
|
-
-
|
|
171
|
+
- Include at minimum one network/server failure scenario, one empty state scenario, and one session/auth edge case when those scenarios meaningfully apply to the feature. If a category is not applicable, say so explicitly in the spec rather than inventing coverage.
|
|
166
172
|
- **Test case count guidance:** Aim for 15-30 test cases per feature area as a baseline. Fewer than 10 suggests missing error paths or edge cases. More than 40 suggests the feature should be split into sub-features with separate spec files. Prioritize breadth of category coverage over depth within a single category.
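The count guidance above can be read as a simple banding rule. The sketch below is illustrative only: the function name and verdict labels are hypothetical, and the source does not say how to treat counts of 10-14 or 31-40, so those two bands are an assumption.

```typescript
// Hypothetical helper illustrating the count guidance; not part of the plugin.
type CountVerdict =
  | "likely-missing-coverage" // fewer than 10: error paths or edge cases probably missing
  | "below-baseline"          // 10-14: assumption, the guidance leaves this band open
  | "within-baseline"         // 15-30: the stated baseline
  | "above-baseline"          // 31-40: assumption, the guidance leaves this band open
  | "consider-splitting";     // more than 40: split into sub-features

function assessTestCaseCount(count: number): CountVerdict {
  if (count < 10) return "likely-missing-coverage";
  if (count < 15) return "below-baseline";
  if (count <= 30) return "within-baseline";
  if (count <= 40) return "above-baseline";
  return "consider-splitting";
}
```

A verdict outside the baseline is a prompt to re-check category breadth, not a hard failure.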
|
|
167
173
|
|
|
168
174
|
## Blocking Conditions
|
|
package/dist/{2.0.9 → 2.0.10}/claude/plugin/skills/plan-test-coverage/SKILL.md
@@ -20,6 +20,7 @@ Plan what E2E tests to write for a feature by analyzing existing test coverage a
|
|
|
20
20
|
```
|
|
21
21
|
- Identify what's already covered and what's missing
|
|
22
22
|
- Note existing TC-IDs for the feature area to avoid conflicts
|
|
23
|
+
- Use the canonical TC-ID format `TC-<FEATURE>-<NNN>` for every planned test, regardless of category. Category belongs in the plan metadata, not the ID.
|
|
23
24
|
|
|
24
25
|
3. **Quick site inspection** (lightweight, not full exploration, optional if browser tooling is unavailable):
|
|
25
26
|
- If the current context has browser tools, follow the `agent-web-interface-guide` skill's browsing patterns (orient before acting, use `list_pages` for session awareness, close only pages you opened)
|
|
@@ -66,28 +67,28 @@ Plan what E2E tests to write for a feature by analyzing existing test coverage a
|
|
|
66
67
|
#### P0 — Critical Path
|
|
67
68
|
| TC-ID | Description | Why Critical |
|
|
68
69
|
|-------|-------------|-------------|
|
|
69
|
-
| TC-FEATURE-
|
|
70
|
+
| TC-FEATURE-001 | Happy path: user completes full flow | Core revenue path |
|
|
70
71
|
|
|
71
72
|
#### P1 — Validation & Errors
|
|
72
73
|
| TC-ID | Description | Why Important |
|
|
73
74
|
|-------|-------------|--------------|
|
|
74
|
-
| TC-FEATURE-
|
|
75
|
+
| TC-FEATURE-002 | Submit with empty required fields | Common user error |
|
|
75
76
|
|
|
76
77
|
#### P2 — Edge Cases
|
|
77
78
|
| TC-ID | Description | Notes |
|
|
78
79
|
|-------|-------------|-------|
|
|
79
|
-
| TC-FEATURE-
|
|
80
|
+
| TC-FEATURE-003 | Special characters in search input | Unicode handling |
|
|
80
81
|
|
|
81
82
|
#### Accessibility (include if project has accessibility requirements or WCAG compliance goals)
|
|
82
83
|
| TC-ID | Description | WCAG Criterion |
|
|
83
84
|
|-------|-------------|----------------|
|
|
84
|
-
| TC-FEATURE-
|
|
85
|
-
| TC-FEATURE-
|
|
85
|
+
| TC-FEATURE-004 | Keyboard-only navigation through flow | 2.1.1 Keyboard |
|
|
86
|
+
| TC-FEATURE-005 | Form errors announced to screen readers | 1.3.1 Info and Relationships |
|
|
86
87
|
|
|
87
88
|
#### Visual Regression (if project has visual testing setup)
|
|
88
89
|
| TC-ID | Description | Viewport |
|
|
89
90
|
|-------|-------------|----------|
|
|
90
|
-
| TC-FEATURE-
|
|
91
|
+
| TC-FEATURE-006 | Layout consistency at mobile width | 375x812 |
|
|
91
92
|
|
|
92
93
|
#### Cross-Browser Matrix (include if project runs tests across multiple browsers)
|
|
93
94
|
| Browser | Priority | Reason |
|
|
package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/review-test-cases/SKILL.md
@@ -30,7 +30,6 @@ If no argument provided, search for `test-cases/*.md` files and review the most
|
|
|
30
30
|
1. Read the test case spec file
|
|
31
31
|
2. Read any related files for context:
|
|
32
32
|
- `e2e-plan/conventions.md` or `e2e-plan/coverage-plan.md` if they exist
|
|
33
|
-
- `e2e-tracker.md` if it exists (to understand what was explored)
|
|
34
33
|
3. Extract the target URL from the spec header
|
|
35
34
|
|
|
36
35
|
### Step 2: Run the Review Checklist
|
|
@@ -46,11 +45,11 @@ Evaluate every test case against each criterion. Track findings by severity:
|
|
|
46
45
|
| Check | What to Look For |
|
|
47
46
|
|-------|-----------------|
|
|
48
47
|
| Happy path present | At least one Critical-priority test covers the primary success flow end-to-end |
|
|
49
|
-
| Error paths covered (
|
|
48
|
+
| Error paths covered (when applicable) | For features that depend on backend retrieval/submission, expect coverage for meaningful server/network failures. For collection/data-driven UIs, expect empty-state coverage. For auth-gated features, expect session/auth edge cases. Missing an applicable category is a BLOCKER; a clearly documented non-applicable category is acceptable |
|
|
50
49
|
| Boundary conditions | Min/max values, empty inputs, special characters, long strings |
|
|
51
50
|
| Authentication edge cases | Session expiry, unauthorized access, role-based differences (if applicable) |
|
|
52
51
|
| Navigation edge cases | Back/forward, direct URL access, refresh mid-flow |
|
|
53
|
-
| Missing user actions | Every
|
|
52
|
+
| Missing critical user actions | Every user-critical action in scope should appear in at least one test case. Ancillary controls may be omitted if they are not material to the target journey |
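The "when applicable" rule in the error-path row can be sketched as a mapping from feature traits to the categories a reviewer should expect. Everything named here (`FeatureTraits`, `applicableErrorCategories`, `missingErrorCategories`) is hypothetical and only illustrates the rule; deciding which traits a feature has remains a reviewer judgment.

```typescript
// Hypothetical sketch of the applicability rule; not part of the plugin.
interface FeatureTraits {
  usesBackend: boolean;     // depends on backend retrieval or submission
  showsCollection: boolean; // collection or data-driven UI
  authGated: boolean;       // requires an authenticated session
}

type ErrorCategory =
  | "server/network failure"
  | "empty state"
  | "session/auth edge case";

// Categories that meaningfully apply to a feature with the given traits.
function applicableErrorCategories(traits: FeatureTraits): ErrorCategory[] {
  const categories: ErrorCategory[] = [];
  if (traits.usesBackend) categories.push("server/network failure");
  if (traits.showsCollection) categories.push("empty state");
  if (traits.authGated) categories.push("session/auth edge case");
  return categories;
}

// Applicable categories with no coverage in the spec; each one is a BLOCKER.
function missingErrorCategories(
  traits: FeatureTraits,
  covered: ErrorCategory[]
): ErrorCategory[] {
  return applicableErrorCategories(traits).filter(function (category) {
    return covered.indexOf(category) === -1;
  });
}
```

A category that does not apply is never reported as missing, which matches the rule that a documented non-applicable category is acceptable.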
|
|
54
53
|
|
|
55
54
|
#### 2b. Specification Quality
|
|
56
55
|
|
|
@@ -59,7 +58,7 @@ Evaluate every test case against each criterion. Track findings by severity:
|
|
|
59
58
|
| Steps are concrete | "Click the Submit button" not "submit the form"; "Enter 'test@example.com' in Email field" not "enter email" |
|
|
60
59
|
| Expected results are observable | Specific text, URL change, element state — not "page updates" or "works correctly" |
|
|
61
60
|
| Preconditions are explicit | Auth state, test data, feature flags, starting URL — nothing assumed |
|
|
62
|
-
| TC-IDs are sequential | No gaps, no duplicates,
|
|
61
|
+
| TC-IDs are sequential | No gaps, no duplicates, and use the canonical `TC-<FEATURE>-<NNN>` format |
|
|
63
62
|
| Priority is justified | Critical = blocks core journey; not everything is Critical |
|
|
64
63
|
| Categories are accurate | Happy Path vs Validation vs Edge Case — correctly classified |
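The TC-ID row above (canonical format, no duplicates, no gaps) lends itself to a mechanical check. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `<FEATURE>` is assumed to be a single uppercase alphanumeric token, `<NNN>` is assumed to be exactly three digits, and each sequence is assumed to start at 001.

```typescript
// Hypothetical checker for the TC-ID checklist row; not part of the plugin.
// Assumed canonical shape: TC-<FEATURE>-<NNN> with an uppercase token and three digits.
const TC_ID_PATTERN = /^TC-([A-Z][A-Z0-9]*)-(\d{3})$/;

interface TcIdReport {
  malformed: string[];  // IDs that do not match the canonical format
  duplicates: string[]; // IDs that appear more than once
  gaps: string[];       // IDs skipped below the highest number seen per feature
}

function formatTcId(feature: string, n: number): string {
  return "TC-" + feature + "-" + ("00" + n).slice(-3);
}

function checkTcIds(ids: string[]): TcIdReport {
  const report: TcIdReport = { malformed: [], duplicates: [], gaps: [] };
  const numbersByFeature: Record<string, number[]> = {};

  for (const id of ids) {
    const match = TC_ID_PATTERN.exec(id);
    if (match === null) {
      report.malformed.push(id);
      continue;
    }
    const feature = match[1];
    const n = Number(match[2]);
    const numbers = numbersByFeature[feature] || (numbersByFeature[feature] = []);
    if (numbers.indexOf(n) !== -1) {
      report.duplicates.push(id);
      continue;
    }
    numbers.push(n);
  }

  // Report every number missing between 1 and the highest number per feature.
  for (const feature of Object.keys(numbersByFeature)) {
    const numbers = numbersByFeature[feature];
    const max = Math.max.apply(null, numbers);
    for (let n = 1; n <= max; n++) {
      if (numbers.indexOf(n) === -1) report.gaps.push(formatTcId(feature, n));
    }
  }
  return report;
}
```

An empty report on all three fields means the spec passes this row; projects with hyphenated feature names would need a looser pattern.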
|
|
65
64
|
|
|
package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/SKILL.md
@@ -95,7 +95,7 @@ Avoid `.first()` / `.nth()` unless a strong, documented reason exists — scope
|
|
|
95
95
|
- If explicit waiting needed, wait for meaningful state: visibility, enabled, URL, specific network response, spinner gone
|
|
96
96
|
|
|
97
97
|
### Test Case IDs
|
|
98
|
-
- Every test MUST have a unique TC-ID: `TC-<FEATURE>-<
|
|
98
|
+
- Every test MUST have a unique TC-ID: `TC-<FEATURE>-<NNN>`
|
|
99
99
|
- Include in test title: `test('TC-LOGIN-001: User can log in with valid credentials', ...)`
|
|
100
100
|
- Sequential within feature area, never reused
|
|
101
101
|
- When adding to existing file, check existing IDs and continue the sequence
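Continuing an existing sequence can be sketched as "highest number seen plus one", which also honors the never-reused rule because gaps are not refilled. The helper name `nextTcId` is hypothetical, and the sketch assumes the feature token is uppercase alphanumeric (so it is safe to embed in a regular expression) and that IDs are zero-padded to three digits.

```typescript
// Hypothetical helper; not part of the plugin.
// Scans existing test titles such as 'TC-LOGIN-001: User can log in ...'
// and returns the next unused ID for the given feature token.
function nextTcId(existingTitles: string[], feature: string): string {
  const pattern = new RegExp("TC-" + feature + "-(\\d+)");
  let highest = 0;
  for (const title of existingTitles) {
    const match = pattern.exec(title);
    if (match !== null) {
      highest = Math.max(highest, Number(match[1]));
    }
  }
  const next = String(highest + 1);
  // Zero-pad to three digits; longer numbers are left untouched.
  const padded = next.length >= 3 ? next : ("00" + next).slice(-3);
  return "TC-" + feature + "-" + padded;
}
```

Taking the maximum rather than the first free slot means an ID removed from the file is not handed out again.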
|
|
@@ -143,8 +143,9 @@ See [references/api-setup-teardown.md](references/api-setup-teardown.md) for ful
|
|
|
143
143
|
|
|
144
144
|
### Network Interception and Error Paths
|
|
145
145
|
Use `page.route()` to mock server errors, patch responses, assert backend calls, or block
|
|
146
|
-
heavy resources.
|
|
147
|
-
|
|
146
|
+
heavy resources. Add error path tests when they meaningfully apply to the feature: for example,
|
|
147
|
+
server/network failures for backend-driven flows, empty states for collection/data-driven UIs,
|
|
148
|
+
and session/auth cases for gated features. If a category is not applicable, do not invent it.
|
|
148
149
|
|
|
149
150
|
See [references/network-interception.md](references/network-interception.md) for full patterns with code examples.
|
|
150
151
|
|
package/dist/{2.0.8 → 2.0.10}/claude/plugin/skills/write-test-code/references/api-setup-teardown.md
RENAMED
|
@@ -80,4 +80,4 @@ export const test = base.extend<{ testTicket: { id: string; title: string } }>({
|
|
|
80
80
|
|
|
81
81
|
For environments where individual deletion is impractical, tag test data (e.g., `title LIKE 'Test %'`) and delete in batch during `globalTeardown.ts`.
|
|
82
82
|
|
|
83
|
-
If the cleanup API endpoint is unknown, do not invent one. Leave a clear `TODO` with the missing endpoint details, document the cleanup gap in the test file or
|
|
83
|
+
If the cleanup API endpoint is unknown, do not invent one. Leave a clear `TODO` with the missing endpoint details, document the cleanup gap in the test file or working notes, and prefer fixture-scoped or environment reset strategies that you can verify. If cleanup is genuinely impossible (no API, no database access), document this as a known limitation in the test file header AND add an `afterEach` that logs a warning.
|
|
package/dist/{2.0.9 → 2.0.10}/codex/plugin/.codex-plugin/plugin.json
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "e2e-test-builder",
|
|
3
|
-
"version": "2.0.
|
|
3
|
+
"version": "2.0.10",
|
|
4
4
|
"description": "Full-pipeline Playwright E2E test generation \u2014 explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Athenaflow"
|
|
package/dist/{2.0.9 → 2.0.10}/codex/plugin/package.json
@@ -1,9 +1,15 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@athenaflow/plugin-e2e-test-builder",
|
|
3
|
-
"version": "2.0.
|
|
3
|
+
"version": "2.0.10",
|
|
4
4
|
"description": "Full-pipeline Playwright E2E test generation — explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"publishConfig": {
|
|
7
7
|
"access": "public"
|
|
8
|
-
}
|
|
8
|
+
},
|
|
9
|
+
"files": [
|
|
10
|
+
".claude-plugin/",
|
|
11
|
+
".codex-plugin/",
|
|
12
|
+
"skills/",
|
|
13
|
+
"dist/"
|
|
14
|
+
]
|
|
9
15
|
}
|
|
package/dist/{2.0.9 → 2.0.10}/codex/plugin/skills/add-e2e-tests/SKILL.md
@@ -8,7 +8,7 @@ description: >
|
|
|
8
8
|
Delegates to sub-skills (analyze-test-codebase, plan-test-coverage, generate-test-cases,
|
|
9
9
|
review-test-cases, write-test-code, review-test-code, fix-flaky-tests) internally — do NOT
|
|
10
10
|
skip to sub-skills directly unless the user explicitly requests a narrow activity.
|
|
11
|
-
|
|
11
|
+
Uses subagent delegation to save context.
|
|
12
12
|
allowed-tools: Read Write Edit Glob Grep Bash Task
|
|
13
13
|
---
|
|
14
14
|
|
|
@@ -22,34 +22,17 @@ Parse the target URL and feature description from: $ARGUMENTS
|
|
|
22
22
|
|
|
23
23
|
Derive a **feature slug** from the feature description (e.g., "Login flow" → `login`, "Checkout with payment" → `checkout`). Use this slug for file naming throughout.
|
|
24
24
|
|
|
25
|
-
##
|
|
26
|
-
|
|
27
|
-
### 1. Orient: Understand the Project, the Product, and Your Capabilities
|
|
25
|
+
## 1. Orient: Understand the Project, the Product, and Your Capabilities
|
|
28
26
|
|
|
29
27
|
Before planning any work, build deep situational awareness. This step determines the quality of everything that follows — rushed orientation leads to missed test cases and wasted effort.
|
|
30
28
|
|
|
31
|
-
|
|
32
|
-
- If `e2e-tracker.md` exists in the project root, read it and resume from where you left off — skip to **step 2 (Plan)** with the remaining work.
|
|
33
|
-
- If no tracker exists, this is a fresh start. Proceed with orientation below.
|
|
34
|
-
|
|
35
|
-
#### First: create initial tasks and tracker
|
|
36
|
-
|
|
37
|
-
As soon as you parse the user's request:
|
|
38
|
-
|
|
39
|
-
1. **Create the tracker** — write `e2e-tracker.md` with the goal (URL, feature, slug) and a skeleton plan.
|
|
40
|
-
2. **Create high-level tasks** for the work ahead — analyze codebase, explore the product, plan coverage, generate test specs, write tests, verify tests.
|
|
41
|
-
|
|
42
|
-
These are your starting skeleton. As you work through orientation and discover the actual shape of the work, refine both the tasks and the tracker — break tasks into granular sub-tasks, add new ones, remove ones that don't apply.
|
|
43
|
-
|
|
44
|
-
Treat the task list as a visible milestone log. Keep it concise, but update it continuously. Do not leave broad tasks open until the end and then mark everything complete in one batch.
|
|
45
|
-
|
|
46
|
-
#### 1a. Understand the codebase
|
|
29
|
+
### Understand the codebase
|
|
47
30
|
|
|
48
31
|
- Does a Playwright config exist (`playwright.config.{ts,js,mjs}`)? If not, you will need to scaffold one (see Scaffolding section).
|
|
49
32
|
- Are there existing tests? What conventions do they follow — naming, locators, fixtures, page objects, auth?
|
|
50
33
|
- Load the `analyze-test-codebase` skill and follow its methodology.
|
|
51
34
|
|
|
52
|
-
|
|
35
|
+
### Understand the product
|
|
53
36
|
|
|
54
37
|
This is the most important part of orientation. You cannot write good tests for a product you don't understand.
|
|
55
38
|
|
|
@@ -59,7 +42,7 @@ This is the most important part of orientation. You cannot write good tests for
|
|
|
59
42
|
|
|
60
43
|
Why this matters: absent explicit exploration, agents tend to write tests based on assumptions about how a product works rather than how it actually works. The result is tests that target imaginary behavior or miss critical real behavior. Spending time here prevents both.
|
|
61
44
|
|
|
62
|
-
|
|
45
|
+
### Know your skills
|
|
63
46
|
|
|
64
47
|
You have access to specialized skills that contain deep domain knowledge. Load the relevant skill before performing each activity — skills prevent improvisation and encode best practices.
|
|
65
48
|
|
|
@@ -76,22 +59,11 @@ You have access to specialized skills that contain deep domain knowledge. Load t
|
|
|
76
59
|
|
|
77
60
|
Before doing a substantial activity, load the skill that covers that activity so you can follow its workflow rather than improvising.
|
|
78
61
|
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
After orienting, update the tracker with what you learned about the codebase and product, conventions discovered, and your refined plan. The tracker must always answer these four questions for anyone reading it cold:
|
|
82
|
-
|
|
83
|
-
1. What is the goal?
|
|
84
|
-
2. What has been done?
|
|
85
|
-
3. What is remaining?
|
|
86
|
-
4. What should I do next?
|
|
62
|
+
## 2. Plan: Refine Tasks Into Granular Checkpoints
|
|
87
63
|
|
|
88
|
-
|
|
64
|
+
Refine the work into granular checkpoints based on what orientation revealed. The plan should flow from what you learned, not from a fixed template.
|
|
89
65
|
|
|
90
|
-
###
|
|
91
|
-
|
|
92
|
-
By now you have initial tasks and a tracker from step 1. Refine tasks into granular checkpoints. The plan should flow from what you learned during orientation, not from a fixed template.
|
|
93
|
-
|
|
94
|
-
#### Task granularity
|
|
66
|
+
### Task granularity
|
|
95
67
|
|
|
96
68
|
Think in small checkpoints, not big phases. Each task should represent a concrete, verifiable unit of progress.
|
|
97
69
|
|
|
@@ -117,11 +89,11 @@ Create tasks for verification steps too (running tests, checking coverage, brows
|
|
|
117
89
|
|
|
118
90
|
Update task status as each checkpoint completes. A good pattern is: finish exploration and mark it complete, finish coverage/spec work and mark it complete, finish implementation and mark it complete, then finish review/execution and mark it complete. Do not keep all milestones open until session end.
|
|
119
91
|
|
|
120
|
-
|
|
92
|
+
## 3. Execute
|
|
121
93
|
|
|
122
94
|
Work through your tasks. Load the relevant skill before each activity.
|
|
123
95
|
|
|
124
|
-
|
|
96
|
+
### Planning uses the browser heavily
|
|
125
97
|
|
|
126
98
|
When planning what to test (coverage planning, test case generation), use the browser extensively. Don't just catalog elements — interact with the product to discover:
|
|
127
99
|
- What validation messages appear for each field?
|
|
@@ -133,14 +105,14 @@ When planning what to test (coverage planning, test case generation), use the br
|
|
|
133
105
|
|
|
134
106
|
Every test case you generate should trace back to something you actually observed or deliberately triggered in the browser. This is how you avoid introducing useless test cases (testing imaginary behavior) and avoid missing important ones (behavior you didn't think to check).
|
|
135
107
|
|
|
136
|
-
|
|
108
|
+
### Subagent delegation
|
|
137
109
|
|
|
138
110
|
Delegate heavy browser exploration and test writing to subagents when that saves context for orchestration, verification, and debugging. When delegating:
|
|
139
111
|
- Pass the relevant file paths (conventions, coverage plan, test specs)
|
|
140
112
|
- Instruct the subagent to invoke the appropriate skill (subagents inherit access to plugin skills)
|
|
141
113
|
- Specify concrete output expectations (file path, format, TC-ID conventions)
|
|
142
114
|
|
|
143
|
-
|
|
115
|
+
### Quality gates
|
|
144
116
|
|
|
145
117
|
Two review gates and a test execution checkpoint are mandatory during execution. The review gates are review-only — they produce findings but do not modify files.
|
|
146
118
|
|
|
@@ -148,45 +120,26 @@ Two review gates and a test execution checkpoint are mandatory during execution.
|
|
|
148
120
|
1. Load the `review-test-cases` skill and run it against `test-cases/<feature>.md`
|
|
149
121
|
2. If verdict is **NEEDS REVISION** — address all blockers in the spec before proceeding to implementation
|
|
150
122
|
3. If verdict is **PASS WITH WARNINGS** — address warnings if quick, otherwise note them and proceed
|
|
151
|
-
4. Record the review verdict in the tracker
|
|
152
|
-
|
|
153
123
|
**Gate 2: Review test code** (after `write-test-code`, before final test execution)
|
|
154
124
|
1. Load the `review-test-code` skill and run it against the implemented test files
|
|
155
125
|
2. If verdict is **NEEDS REVISION** — fix all blockers before running tests for signoff
|
|
156
126
|
3. If verdict is **PASS WITH WARNINGS** — fix warnings that affect stability, proceed with execution
|
|
157
|
-
4. Record the review verdict in the tracker
|
|
158
127
|
|
|
159
128
|
**Checkpoint: Test execution**
|
|
160
129
|
1. Run the tests: `npx playwright test <file> --reporter=list 2>&1`
|
|
161
|
-
2.
|
|
130
|
+
2. Inspect the full output — green test output is the only proof of correctness
|
|
162
131
|
3. If tests fail, load the `fix-flaky-tests` skill and follow its structured diagnostic approach. Do not guess-and-retry.
|
|
163
|
-
4. Maximum 3 fix-and-rerun cycles per test. If stuck after 3 cycles,
|
|
164
|
-
|
|
165
|
-
**Test execution and coverage checks must never be delegated to subagents.** Run `npx playwright test` directly and record the output.
|
|
166
|
-
|
|
167
|
-
#### Update the tracker as you work
|
|
168
|
-
|
|
169
|
-
Do not wait until session end. After each meaningful chunk of progress (completing a step, discovering a blocker, producing an artifact), update the tracker. If your context window resets, only what's in the tracker survives.
|
|
170
|
-
|
|
171
|
-
Keep the tracker and task list synchronized. If you record progress in the tracker, update the corresponding task status in the same phase of work.
|
|
172
|
-
|
|
173
|
-
#### Error recovery
|
|
174
|
-
|
|
175
|
-
If infrastructure failures occur (browser MCP unavailable, clone failures, npm install errors), see [references/error-recovery.md](references/error-recovery.md) for diagnostic steps. General pattern: diagnose, attempt one known fix, if still stuck record in tracker and ask the user.
|
|
132
|
+
4. Maximum 3 fix-and-rerun cycles per test. If stuck after 3 cycles, move on with the diagnostic output.
|
|
176
133
|
|
|
177
|
-
|
|
134
|
+
**Test execution and coverage checks must never be delegated to subagents.** Run `npx playwright test` directly.
|
|
178
135
|
|
|
179
|
-
|
|
180
|
-
1. Ensure the tracker reflects all progress, discoveries, and blockers from this session
|
|
181
|
-
2. Write clear instructions for what the next session should do
|
|
182
|
-
3. If all work is complete and all tests pass with full TC-ID coverage: write `<!-- E2E_COMPLETE -->` as the last line of the tracker
|
|
183
|
-
4. If an unrecoverable blocker prevents progress: write `<!-- E2E_BLOCKED: reason -->` as the last line
|
|
136
|
+
### Error recovery
|
|
184
137
|
|
|
185
|
-
|
|
138
|
+
If infrastructure failures occur (browser MCP unavailable, clone failures, npm install errors), see [references/error-recovery.md](references/error-recovery.md) for diagnostic steps. General pattern: diagnose, attempt one known fix, if still stuck ask the user.
|
|
186
139
|
|
|
187
140
|
## Scaffolding
|
|
188
141
|
|
|
189
|
-
If Playwright is not set up in the target project, follow the procedure in [references/scaffolding.md](references/scaffolding.md) to clone the boilerplate, merge configuration, and install dependencies.
|
|
142
|
+
If Playwright is not set up in the target project, follow the procedure in [references/scaffolding.md](references/scaffolding.md) to clone the boilerplate, merge configuration, and install dependencies.
|
|
190
143
|
|
|
191
144
|
## Authentication
|
|
192
145
|
|