stably 4.8.9 → 4.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. package/dist/index.mjs +1 -1
  2. package/dist/stably-plugin-cli/.claude-plugin/plugin.json +5 -0
  3. package/dist/stably-plugin-cli/skills/bash-commands/SKILL.md +65 -0
  4. package/dist/stably-plugin-cli/skills/browser-interaction-guide/SKILL.md +144 -0
  5. package/dist/stably-plugin-cli/skills/bulk-test-handling/SKILL.md +104 -0
  6. package/dist/stably-plugin-cli/skills/debugging-test-failures/SKILL.md +146 -0
  7. package/dist/{stably-plugin → stably-plugin-cli}/skills/playwright-best-practices/SKILL.md +11 -5
  8. package/dist/stably-plugin-cli/skills/playwright-config-auth/SKILL.md +217 -0
  9. package/dist/stably-plugin-cli/skills/stably-sdk-reference/SKILL.md +307 -0
  10. package/dist/stably-plugin-cli/skills/test-creation-workflow/SKILL.md +311 -0
  11. package/package.json +4 -1
  12. package/dist/stably-plugin/.claude-plugin/plugin.json +0 -5
  13. package/dist/stably-plugin/skills/playwright-best-practices/references/accessibility.md +0 -359
  14. package/dist/stably-plugin/skills/playwright-best-practices/references/annotations.md +0 -526
  15. package/dist/stably-plugin/skills/playwright-best-practices/references/assertions-waiting.md +0 -361
  16. package/dist/stably-plugin/skills/playwright-best-practices/references/browser-apis.md +0 -391
  17. package/dist/stably-plugin/skills/playwright-best-practices/references/browser-extensions.md +0 -506
  18. package/dist/stably-plugin/skills/playwright-best-practices/references/canvas-webgl.md +0 -493
  19. package/dist/stably-plugin/skills/playwright-best-practices/references/ci-cd.md +0 -407
  20. package/dist/stably-plugin/skills/playwright-best-practices/references/clock-mocking.md +0 -364
  21. package/dist/stably-plugin/skills/playwright-best-practices/references/component-testing.md +0 -500
  22. package/dist/stably-plugin/skills/playwright-best-practices/references/console-errors.md +0 -420
  23. package/dist/stably-plugin/skills/playwright-best-practices/references/debugging.md +0 -491
  24. package/dist/stably-plugin/skills/playwright-best-practices/references/electron.md +0 -509
  25. package/dist/stably-plugin/skills/playwright-best-practices/references/error-testing.md +0 -360
  26. package/dist/stably-plugin/skills/playwright-best-practices/references/file-operations.md +0 -375
  27. package/dist/stably-plugin/skills/playwright-best-practices/references/fixtures-hooks.md +0 -417
  28. package/dist/stably-plugin/skills/playwright-best-practices/references/flaky-tests.md +0 -494
  29. package/dist/stably-plugin/skills/playwright-best-practices/references/global-setup.md +0 -434
  30. package/dist/stably-plugin/skills/playwright-best-practices/references/i18n.md +0 -508
  31. package/dist/stably-plugin/skills/playwright-best-practices/references/iframes.md +0 -403
  32. package/dist/stably-plugin/skills/playwright-best-practices/references/locators.md +0 -242
  33. package/dist/stably-plugin/skills/playwright-best-practices/references/mobile-testing.md +0 -409
  34. package/dist/stably-plugin/skills/playwright-best-practices/references/multi-context.md +0 -288
  35. package/dist/stably-plugin/skills/playwright-best-practices/references/multi-user.md +0 -393
  36. package/dist/stably-plugin/skills/playwright-best-practices/references/network-advanced.md +0 -452
  37. package/dist/stably-plugin/skills/playwright-best-practices/references/page-object-model.md +0 -315
  38. package/dist/stably-plugin/skills/playwright-best-practices/references/performance-testing.md +0 -476
  39. package/dist/stably-plugin/skills/playwright-best-practices/references/performance.md +0 -453
  40. package/dist/stably-plugin/skills/playwright-best-practices/references/projects-dependencies.md +0 -456
  41. package/dist/stably-plugin/skills/playwright-best-practices/references/security-testing.md +0 -430
  42. package/dist/stably-plugin/skills/playwright-best-practices/references/service-workers.md +0 -504
  43. package/dist/stably-plugin/skills/playwright-best-practices/references/test-coverage.md +0 -495
  44. package/dist/stably-plugin/skills/playwright-best-practices/references/test-data.md +0 -492
  45. package/dist/stably-plugin/skills/playwright-best-practices/references/test-organization.md +0 -361
  46. package/dist/stably-plugin/skills/playwright-best-practices/references/third-party.md +0 -464
  47. package/dist/stably-plugin/skills/playwright-best-practices/references/websockets.md +0 -403
@@ -0,0 +1,5 @@
+ {
+ "name": "stably-plugin-cli",
+ "description": "Skills for Stably CLI agent",
+ "version": "1.0.0"
+ }
@@ -0,0 +1,65 @@
+ ---
+ name: bash-commands
+ description: Guide for bash command usage including run_in_background for blocking commands. Use when running long-running processes or commands that hang. Triggers on bash, run_in_background, hanging, blocking, long-running.
+ ---
+
+ <!-- CLI-ONLY SKILL: Same command whitelist as web, but no additional restrictions.
+ Both CLI and web agents use the same security hooks. -->
+
+ # Bash Commands Guide
+
+ This skill covers bash command usage and best practices for the Stably CLI agent environment.
+
+ ## Long-Running Commands (CRITICAL - Avoid Hanging)
+
+ **ALWAYS use `run_in_background: true`** for commands that run indefinitely:
+
+ ```
+ Bash with run_in_background: true
+ command: "pnpm dev"
+ ```
+
+ ### Why This Matters
+
+ Some commands never exit on their own - they run until manually stopped. Without `run_in_background: true`, the agent will hang indefinitely waiting for the command to complete.
+
+ ### Commands That MUST Use run_in_background
+
+ Any command that blocks indefinitely:
+ - Dev servers: `pnpm dev`, `npm start`, `next dev`, `vite`
+ - Watch processes: `tsc --watch`, `nodemon`, file watchers
+ - Servers: `node server.js`, `python -m http.server`
+ - Tail commands: `tail -f`
+
+ ### After Starting a Background Command
+
+ 1. The command returns immediately with a task ID
+ 2. Wait briefly for initialization if needed
+ 3. Proceed with your next action
+
+ ## Whitelisted Commands
+
+ The CLI agent uses the same command whitelist as the web agent:
+
+ | Command | Purpose | Notes |
+ |---------|---------|-------|
+ | `rm` | Delete files | Restricted to workspace directory |
+ | `mv` | Move/rename files | |
+ | `git` | Version control | Config flags (-c, --config) blocked |
+ | `gh` | GitHub CLI | For PR/issue operations |
+ | `npm` | Package manager | |
+ | `yarn` | Package manager | |
+ | `pnpm` | Package manager | |
+ | `bun` | Package manager | |
+ | `npx` | Run packages | Only `npx stably test` allowed |
+
+ ### What's NOT Allowed
+
+ Use dedicated tools instead of bash for these operations:
+ - `cat`, `head`, `tail` → Use the **Read** tool
+ - `grep`, `rg` → Use the **Grep** tool
+ - `find`, `ls` → Use the **Glob** tool
+ - `sed`, `awk` → Use the **Edit** tool
+ - `echo`, `printf` → Use the **Write** tool
+
+ These dedicated tools provide better UX and are easier to review.
@@ -0,0 +1,144 @@
+ ---
+ name: browser-interaction-guide
+ description: Guide for browser interactions in Playwright tests including setup_page vs test_debug decisions, wait patterns, headless mode, locator best practices, and browser cleanup. Use when setting up browser sessions, writing wait patterns, fixing locator issues, or debugging headless failures. Triggers on browser setup, setup_page, test_debug, wait pattern, waitFor, headless, locator, browser cleanup.
+ ---
+
+ # Browser Interaction Guide
+
+ Guide for browser interactions in Playwright tests.
+
+ ## Browser Setup
+
+ Before using any browser interaction tools (`browser_navigate`, `browser_click`, `browser_type`, etc.), you must first call a setup tool:
+ - `mcp__playwright-test__generator_setup_page`
+ - `mcp__playwright-test__planner_setup_page`
+
+ For project selection (auth dependencies), see the `playwright-config-auth` skill.
+
+ ### seedFile Limitations
+
+ `generator_setup_page` requires a **top-level `test()` call** in the seed file. Tests wrapped in `test.describe()` won't be recognized and will return "seed test not found".
+
+ If your test uses `test.describe()`:
+ - Use `tests/seed.spec.ts` as the seedFile instead
+ - Then navigate to your starting URL manually with `browser_navigate`
+
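As an illustrative sketch of the distinction (file name and URL are placeholders, not taken from this package), a seed file the generator can pick up needs its `test()` call at the top level of the file:

```
import { test } from '@playwright/test';

// Recognized as a seed: test() is called at the top level of the file.
test('seed', async ({ page }) => {
  await page.goto('https://example.com');
});

// NOT recognized: the same test wrapped in test.describe() makes
// generator_setup_page fail with "seed test not found".
// test.describe('suite', () => {
//   test('seed', async ({ page }) => { /* ... */ });
// });
```

This fragment is meant to run under the Playwright test runner, not standalone Node.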
+ ## setup_page vs test_debug
+
+ ### Use `generator_setup_page` / `planner_setup_page` when:
+
+ - You want to keep the browser open after running test code
+ - You need to pause mid-flow to see current state
+ - User says "run the test and keep the browser open"
+
+ Browser stays open regardless of whether test passes or fails.
+
+ ### Use `test_debug` when:
+
+ - Debugging test failures and stepping through tests
+ - Verifying a test passes (browser closes on success)
+ - Diagnosing why a test is failing (browser stays open on failure)
+
+ Browser only stays open if test fails.
+
+ ### Common Mistake
+
+ When user asks to "run a test and keep the browser open at the end":
+ - **Wrong:** Use `test_debug` - it closes browser on success
+ - **Right:** Use `generator_setup_page` with test file as seedFile
+
+ ### Error Recovery
+
+ When `setup_page` or `test_debug` fails:
+
+ | Error Type | Action |
+ |------------|--------|
+ | **Test code error** (assertion failed, locator not found) | Switch to `test_debug` for iterative debugging |
+ | **External service error** (timeout, network) | Retry `test_debug` up to 3 times |
+ | **Same error 3+ times** | Flag as blocked service issue, do not keep retrying |
+
+ ## Browser Cleanup
+
+ Close the browser whenever you are not actively using it. Keeping browsers running costs money.
+
+ Use `mcp__stably-agent-control__close_browser_session` (NOT `browser_close`) - it directly terminates the VM and always works.
+
+ ### When to Close
+
+ - After running tests - close immediately unless running more tests right away
+ - After completing browser-based exploration or debugging
+ - When user indicates they're done with current task
+
+ ## Wait Patterns
+
+ Do NOT use `waitForLoadState()` - Playwright automatically waits for elements to be ready before performing actions.
+
+ Wait for specific elements instead:
+
+ ```typescript
+ await page.locator('main').waitFor({ state: 'visible' }); // Page ready
+ await page.locator('.loader').waitFor({ state: 'hidden' }); // Loading done
+ await page.getByRole('heading', { name: 'Title' }).waitFor(); // Content loaded
+ ```
+
+ Best practices:
+ - Never use `.catch(() => {})` on waits - it hides timing issues
+ - Wait for state transitions (visible → hidden), not just end states
+
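The transition idea can be sketched with a generic polling helper (plain TypeScript for illustration, not a Playwright API; all names here are made up): observe the starting state before accepting the end state, so a loader that appeared and finished is distinguished from one that never appeared at all.

```typescript
// Generic sketch (not Playwright): wait for a state *transition*,
// not just the end state.
async function waitForTransition(
  read: () => string,
  from: string,
  to: string,
  timeoutMs = 1000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let sawFrom = false;
  while (Date.now() < deadline) {
    const state = read();
    if (state === from) sawFrom = true; // e.g. loader became visible
    if (sawFrom && state === to) return; // visible → hidden observed
    await new Promise((r) => setTimeout(r, 10));
  }
  throw new Error(`timeout waiting for ${from} → ${to}`);
}
```

In real tests the chained `waitFor({ state: 'visible' })` then `waitFor({ state: 'hidden' })` calls shown above play this role.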
+ ## Headless Mode
+
+ Tests run in headless mode by default, which is inherently flakier. Write defensively:
+
+ ```typescript
+ // Ensure element is in viewport (critical in headless)
+ await element.scrollIntoViewIfNeeded();
+ await element.waitFor({ state: 'visible' });
+
+ // Wait before typing
+ await input.waitFor({ state: 'visible' });
+ await input.clear();
+ await input.fill('new value');
+ ```
+
+ If a test passes headed but fails headless, it's a timing issue - add appropriate waits.
+
+ ## Locator Best Practices
+
+ ### Prefer Stable Locators
+
+ ```typescript
+ // Role-based (best)
+ page.getByRole('button', { name: 'Submit' })
+
+ // Test IDs
+ page.getByTestId('submit-button')
+
+ // Structural with context
+ page.locator('table tbody tr').first().locator('button')
+ ```
+
+ ### Avoid Fragile Locators
+
+ - Empty text filters with indexes: `.filter({ hasText: /^$/ }).nth(5)`
+ - Absolute indexes without context
+ - CSS selectors based on implementation details
+
+ ### Use `.describe()` Method
+
+ All locators must use `.describe()` for readability in trace views:
+
+ ```typescript
+ page.getByRole('button', { name: 'Submit' }).describe('Submit button')
+ ```
+
+ ## Browser Interaction Strategy (Priority Order)
+
+ 1. **Standard Playwright methods** (`locator.click()`, `locator.fill()`) - First choice, fast and reliable
+ 2. **`agent.act()`** for complex/dynamic UI - Use for sliders, carousels, drag-drop, dynamic menus. Slower but more robust.
+ 3. **Custom JS via `page.evaluate()`** - Last resort, only when `agent.act()` is inconsistent
+
+ Don't prematurely optimize with custom JS. Slower but reliable `agent.act()` is better than fast but brittle custom JS.
+
+ ## Page State Snapshots
+
+ After browser actions, you receive a filtered page snapshot showing interactive elements. If you need full page context, use `mcp__playwright-test__browser_snapshot` for the complete unfiltered accessibility tree.
1
+ ---
2
+ name: bulk-test-handling
3
+ description: Guide for handling bulk test creation requests (>15 tests). Use when user submits many tests at once via CSV, list, or batch import. Covers time estimation and batch processing approach. Triggers on bulk import, many tests, CSV import, batch tests, multiple tests at once.
4
+ ---
5
+
6
+ <!-- CLI-ONLY SKILL: No AskUserQuestion - outputs info then proceeds based on mode.
7
+ The web version (stably-plugin-web) uses AskUserQuestion for confirmation.
8
+ Check system prompt for "one-shot mode" to determine behavior. -->
9
+
10
+ # Bulk Test Handling Guide (CLI)
11
+
12
+ This skill provides guidance for handling requests to create many tests at once in CLI mode.
13
+
14
+ ## When This Applies
15
+
16
+ Use this guidance when the user submits **more than 15 tests** at once, such as:
17
+ - CSV import with test prompts
18
+ - List of test descriptions in a single message
19
+ - Batch test creation request
20
+ - "Create tests for all these scenarios..."
21
+
22
+ ## Required: Inform User Before Starting
23
+
24
+ Before starting bulk test creation, output a clear summary:
25
+ 1. Confirm the number of tests detected
26
+ 2. Set time expectations
27
+ 3. Explain the batch approach
28
+
29
+ ### Time Estimation
30
+
31
+ Each test typically takes **5-15 minutes** depending on complexity:
32
+ - Simple tests (form fill, navigation): ~5 min
33
+ - Medium tests (multi-step workflows): ~10 min
34
+ - Complex tests (conditional logic, multiple assertions): ~15 min
35
+
36
+ **Estimate formula:** `numberOfTests × 10 minutes` (use 10 min as average)
37
+
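The arithmetic is trivial but worth pinning down; a minimal sketch (the helper name is illustrative, not part of the Stably package):

```typescript
// Rough bulk-run estimate using the 10-minute-per-test average above.
// estimateBulkMinutes is an illustrative name, not a Stably API.
function estimateBulkMinutes(numberOfTests: number): number {
  return numberOfTests * 10;
}
```

For example, 18 tests would be quoted as roughly 180 minutes.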
+ ### Information Template
+
+ Output this before starting:
+
+ ```
+ Detected {N} tests to create.
+
+ **Processing approach:** I'll work through these in batches of about 6 tests at a time.
+
+ **Estimated time:** Approximately {N × 10} minutes total ({N} tests × ~10 min each)
+
+ **What I'll do for each test:**
+ 1. Analyze the test requirement
+ 2. Explore the application if needed
+ 3. Write the test code
+ 4. Verify it runs successfully
+
+ Starting with batch 1...
+ ```
+
+ ## Processing Approach
+
+ ### Batch Size
+
+ Process tests in batches of approximately **6 tests** at a time. This:
+ - Keeps quality high
+ - Allows natural progress checkpoints
+ - Enables course corrections if needed
+
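The batching itself is plain chunking; a minimal sketch (generic helper for illustration, not part of the Stably SDK):

```typescript
// Split test prompts into batches of ~6, per the guideline above.
// toBatches is an illustrative name, not a Stably API.
function toBatches<T>(items: T[], size = 6): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

18 tests would become three batches of six, matching the "batch 1 of 3" progress updates below.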
+ ### Progress Updates
+
+ Keep the user informed:
+ - "Starting batch 1 of 3 (tests 1-6)..."
+ - "Completed 6/18 tests. Moving to batch 2..."
+ - "Batch 2 complete. 2 tests needed manual review..."
+
+ ### Between Batches - Check Your System Prompt
+
+ Your system prompt tells you which mode you're in:
+ - **If "one-shot mode" appears in your system prompt:** Continue automatically through all batches without pausing for input
+ - **Otherwise (chat mode):** You can pause between batches to let user review progress and adjust priorities
+
+ After completing each batch:
+ 1. Summarize what was created
+ 2. Note any tests that had issues
+ 3. In chat mode: ask if user wants to continue or adjust
+ 4. In one-shot mode: proceed automatically to next batch
+
+ ## Handling Issues
+
+ If tests fail or have problems during bulk creation:
+ - Note the issue but continue with other tests
+ - Compile a summary of problematic tests at the end
+ - List which tests need follow-up
+
+ ## Summary at Completion
+
+ After all batches complete, provide:
+ - Total tests created successfully
+ - List of any failed/skipped tests with reasons
+ - Suggestions for resolving issues
+
+ ## Do NOT
+
+ - Start creating tests without informing user of scope
+ - Process all tests without progress updates
+ - Skip time estimation
1
+ ---
2
+ name: debugging-test-failures
3
+ description: Guide for debugging test execution issues including assertion failures, MCP tool failures, test setup problems, browser errors, and flaky tests. Use when tests fail, MCP tools fail, browser setup fails, or tests are flaky. Triggers on test failure, debug, MCP error, test won't run, setup failed, CDP error, flaky test, timeout.
4
+ ---
5
+
6
+ # Debugging Test Failures
7
+
8
+ Guide for debugging Playwright test failures.
9
+
10
+ ## Gather Codebase Context First
11
+
12
+ Before attempting any fix, understand how the userflow being tested actually works. Explore application source code to understand:
13
+ - What components/pages are involved
14
+ - What APIs/services are called and data flows
15
+ - What state management affects the UI
16
+ - What recent changes may have caused the failure
17
+
18
+ ### Codebase Exploration
19
+
20
+ Use sub-agents with `model: "haiku"` for codebase exploration to save costs:
21
+
22
+ ```
23
+ subagent_type: "Explore"
24
+ model: "haiku"
25
+ description: "Find login page components"
26
+ prompt: "Search the codebase for components related to the login page. Find the main login component, any API calls it makes, and related hooks or services."
27
+ ```
28
+
29
+ Application code is usually in `../src/`, `../app/`, or `../packages/` - parent directories of the tests folder.
30
+
31
+ ## Source of Truth Hierarchy
32
+
33
+ When determining what behavior is "correct", use this hierarchy (most trustworthy first):
34
+
35
+ 1. **PR Description / Commit Messages** - Explains intended behavior change
36
+ 2. **Design Documents** - Describe intended behavior, not necessarily what code does
37
+ 3. **Test Intent (User Prompt Comments)** - Records what the test should verify
38
+ 4. **Application Code** - Shows what actually happens, not what should happen (code can have bugs)
39
+
40
+ If code conflicts with PR description or design docs, code may be wrong. Report as application bug.
41
+
42
+ ## Issue Classification
43
+
44
+ Five types of issues cause test failures:
45
+
46
+ | Type | Description | Examples |
47
+ |------|-------------|----------|
48
+ | **Test Issue** | Test is incorrect or needs updating | Wrong assertions, flaky selectors, race conditions |
49
+ | **Application Bug** | Application has genuine bug | Functionality broken, UI not rendering |
50
+ | **External/Other** | Infrastructure or environment issues | Third-party outages, network problems |
51
+ | **Flaky Test** | Non-deterministic failure | Passes on retry, timing-dependent |
52
+ | **UI Change** | UI intentionally changed, test is stale | Button text changed, layout restructured |
53
+
54
+ ## Debugging by Mode
55
+
56
+ **In `fix` mode**: Delegate browser-based exploration to the `debug-worker` subagent.
57
+
58
+ **In `single`, `build`, `chat` modes**: Run `test_debug` / `test_run` directly (no debug subagent available).
59
+
60
+ ### When to Use Subagents
61
+
62
+ - Verifying a test works after making changes
63
+ - Reproducing test failures
64
+ - Investigating flaky tests
65
+ - Running tests multiple times for consistency
66
+
67
+ ### What You Do Directly
68
+
69
+ - Writing test code
70
+ - Implementing fixes based on diagnosis
71
+ - Reading test files (use Read tool)
72
+
73
+ ## Debugging Workflow
74
+
75
+ 1. **Find failed tests**: Use Read/Glob to identify failing test files
76
+ 2. **Reproduce + diagnose**:
77
+ - Fix mode: Delegate to `debug-worker`
78
+ - Single/build/chat modes: Run `test_debug` directly
79
+ 3. **Implement fix**: Make necessary code changes
80
+ 4. **Verify fix**: Use `test_run` directly for final verification
81
+ 5. **Group related failures**: Only group tests that share the same root cause
82
+
83
+ ## Parallel Diagnosis
84
+
85
+ Parallelize only read-only analysis. Browser debugging should stay sequential to avoid session conflicts.
86
+
87
+ ### Grouping Strategy
88
+
89
+ 1. Analyze all failures first: Read error messages and categorize by likely root cause
90
+ 2. Group by shared characteristics: Same error, same page, same API dependency
91
+ 3. Launch diagnosis sequentially in fix mode, directly in other modes
92
+
93
+ ## Browser Infrastructure Errors
94
+
95
+ ### CDP Connection Errors
96
+
97
+ If you encounter `browserType.connectOverCDP` errors like "Target page has been closed" or "upstream connection failed":
98
+
99
+ This is a stale browser VM issue, NOT a test or config problem.
100
+
101
+ **Fix:**
102
+ 1. Call `mcp__stably-agent-control__close_browser_session` to terminate the cloud browser VM
103
+ 2. Retry your operation - fresh browser VM will spin up automatically
104
+
105
+ Do NOT repeatedly call `restart_mcp_servers` - it doesn't fix VM-level issues.
106
+
107
+ ## Empty test_run Output
108
+
109
+ If `test_run` returns empty/no output (no pass, no fail, no error):
110
+
111
+ This is NOT a tool malfunction. Empty output means zero tests matched the project filters.
112
+
113
+ Do NOT retry the same command - it will keep returning empty.
114
+
115
+ **Diagnose:** See `playwright-config-auth` skill for project selection guidance and how to fix filter mismatches.
116
+
117
+ ## Debugging During Test Creation
118
+
119
+ ### Do Not Assume
120
+
121
+ - **Bug-free**: Application may have bugs you discover
122
+ - **Accessible**: Website may have auth requirements, bot detection, or rate limiting
123
+
124
+ ### When Observed Behavior Differs from Expectations
125
+
126
+ 1. Do NOT change user's original testing intent without approval
127
+ 2. STOP and report: "The user expects X, but the application does Y"
128
+
129
+ A failing test that catches a real bug is more valuable than a passing test that hides bugs.
130
+
131
+ ## Best Practices
132
+
133
+ - Reproduce first: Always try to reproduce failures before diagnosing
134
+ - Be specific: Identify actual file/function/selector locations
135
+ - Use evidence: Base analysis on actual test runs, errors, code inspection
136
+ - Respect annotations: If test is marked with `test.skip()`, do not touch it
137
+
138
+ ## Maximum Fix Attempts
139
+
140
+ You may attempt to fix a specific issue a maximum of 3 times. If after 3 attempts the same error persists:
141
+
142
+ 1. STOP trying the same approach
143
+ 2. Report the blocker clearly: what you tried, what keeps failing, likely root cause
144
+ 3. Move on if truly blocked
145
+
146
+ Do NOT enter an infinite fix-retry loop. 3 attempts is the maximum before escalating.
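The cap above amounts to a bounded retry loop; a minimal sketch (illustrative TypeScript, not part of the Stably SDK, and `runAttempt` is a made-up callback standing in for one fix-and-verify cycle):

```typescript
// Attempt a fix at most 3 times; on persistent failure, stop and escalate
// instead of looping forever.
async function fixWithCap(
  runAttempt: () => Promise<boolean>, // one fix + verify cycle (illustrative)
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await runAttempt()) return true; // fixed - stop retrying
  }
  return false; // still failing after the cap: report the blocker instead
}
```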
@@ -11,6 +11,12 @@ metadata:
 
 This skill provides comprehensive guidance for all aspects of Playwright test development, from writing new tests to debugging and maintaining existing test suites.
 
+ ## Execution Policy (Stably Agents)
+
+ - Use Playwright MCP tools for test execution in normal agent flows (`test_debug` for iteration, `test_run` for final verification).
+ - Do NOT run `npx playwright test` from Bash in agent workflows.
+ - Exception: auto-heal `fix-worker` subagents validate via `npx stably test --reporter=list` because they run in parallel without browser MCP tools.
+
 ## Activity-Based Reference Guide
 
 Consult these references based on what you're doing:
@@ -233,10 +239,10 @@ What are you doing?
 
 After writing or modifying tests:
 
- 1. **Run tests**: `npx playwright test --reporter=list`
+ 1. **Run iterative checks** with `test_debug`
 2. **If tests fail**:
- - Review error output and trace (`npx playwright show-trace`)
+ - Review the failure output and trace artifacts from the test tools
 - Fix locators, waits, or assertions
- - Re-run tests
- 3. **Only proceed when all tests pass**
- 4. **Run multiple times** for critical tests: `npx playwright test --repeat-each=5`
+ - Re-run `test_debug`
+ 3. **Final verification**: run `test_run` once after `test_debug` is stable
+ 4. **For critical tests**, run additional `test_debug` passes to check repeatability before final `test_run`