stably 4.8.9 → 4.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. package/dist/index.mjs +1 -1
  2. package/dist/stably-plugin-cli/.claude-plugin/plugin.json +5 -0
  3. package/dist/stably-plugin-cli/skills/bash-commands/SKILL.md +65 -0
  4. package/dist/stably-plugin-cli/skills/browser-interaction-guide/SKILL.md +144 -0
  5. package/dist/stably-plugin-cli/skills/bulk-test-handling/SKILL.md +104 -0
  6. package/dist/stably-plugin-cli/skills/debugging-test-failures/SKILL.md +146 -0
  7. package/dist/{stably-plugin → stably-plugin-cli}/skills/playwright-best-practices/SKILL.md +11 -5
  8. package/dist/stably-plugin-cli/skills/playwright-config-auth/SKILL.md +217 -0
  9. package/dist/stably-plugin-cli/skills/stably-sdk-reference/SKILL.md +307 -0
  10. package/dist/stably-plugin-cli/skills/test-creation-workflow/SKILL.md +311 -0
  11. package/package.json +4 -1
  12. package/dist/stably-plugin/.claude-plugin/plugin.json +0 -5
  13. package/dist/stably-plugin/skills/playwright-best-practices/references/accessibility.md +0 -359
  14. package/dist/stably-plugin/skills/playwright-best-practices/references/annotations.md +0 -526
  15. package/dist/stably-plugin/skills/playwright-best-practices/references/assertions-waiting.md +0 -361
  16. package/dist/stably-plugin/skills/playwright-best-practices/references/browser-apis.md +0 -391
  17. package/dist/stably-plugin/skills/playwright-best-practices/references/browser-extensions.md +0 -506
  18. package/dist/stably-plugin/skills/playwright-best-practices/references/canvas-webgl.md +0 -493
  19. package/dist/stably-plugin/skills/playwright-best-practices/references/ci-cd.md +0 -407
  20. package/dist/stably-plugin/skills/playwright-best-practices/references/clock-mocking.md +0 -364
  21. package/dist/stably-plugin/skills/playwright-best-practices/references/component-testing.md +0 -500
  22. package/dist/stably-plugin/skills/playwright-best-practices/references/console-errors.md +0 -420
  23. package/dist/stably-plugin/skills/playwright-best-practices/references/debugging.md +0 -491
  24. package/dist/stably-plugin/skills/playwright-best-practices/references/electron.md +0 -509
  25. package/dist/stably-plugin/skills/playwright-best-practices/references/error-testing.md +0 -360
  26. package/dist/stably-plugin/skills/playwright-best-practices/references/file-operations.md +0 -375
  27. package/dist/stably-plugin/skills/playwright-best-practices/references/fixtures-hooks.md +0 -417
  28. package/dist/stably-plugin/skills/playwright-best-practices/references/flaky-tests.md +0 -494
  29. package/dist/stably-plugin/skills/playwright-best-practices/references/global-setup.md +0 -434
  30. package/dist/stably-plugin/skills/playwright-best-practices/references/i18n.md +0 -508
  31. package/dist/stably-plugin/skills/playwright-best-practices/references/iframes.md +0 -403
  32. package/dist/stably-plugin/skills/playwright-best-practices/references/locators.md +0 -242
  33. package/dist/stably-plugin/skills/playwright-best-practices/references/mobile-testing.md +0 -409
  34. package/dist/stably-plugin/skills/playwright-best-practices/references/multi-context.md +0 -288
  35. package/dist/stably-plugin/skills/playwright-best-practices/references/multi-user.md +0 -393
  36. package/dist/stably-plugin/skills/playwright-best-practices/references/network-advanced.md +0 -452
  37. package/dist/stably-plugin/skills/playwright-best-practices/references/page-object-model.md +0 -315
  38. package/dist/stably-plugin/skills/playwright-best-practices/references/performance-testing.md +0 -476
  39. package/dist/stably-plugin/skills/playwright-best-practices/references/performance.md +0 -453
  40. package/dist/stably-plugin/skills/playwright-best-practices/references/projects-dependencies.md +0 -456
  41. package/dist/stably-plugin/skills/playwright-best-practices/references/security-testing.md +0 -430
  42. package/dist/stably-plugin/skills/playwright-best-practices/references/service-workers.md +0 -504
  43. package/dist/stably-plugin/skills/playwright-best-practices/references/test-coverage.md +0 -495
  44. package/dist/stably-plugin/skills/playwright-best-practices/references/test-data.md +0 -492
  45. package/dist/stably-plugin/skills/playwright-best-practices/references/test-organization.md +0 -361
  46. package/dist/stably-plugin/skills/playwright-best-practices/references/third-party.md +0 -464
  47. package/dist/stably-plugin/skills/playwright-best-practices/references/websockets.md +0 -403
@@ -0,0 +1,5 @@
+ {
+ "name": "stably-plugin-cli",
+ "description": "Skills for Stably CLI agent",
+ "version": "1.0.0"
+ }
@@ -0,0 +1,65 @@
+ ---
+ name: bash-commands
+ description: Guide for bash command usage including run_in_background for blocking commands. Use when running long-running processes or commands that hang. Triggers on bash, run_in_background, hanging, blocking, long-running.
+ ---
+
+ <!-- CLI-ONLY SKILL: Same command whitelist as web, but no additional restrictions.
+ Both CLI and web agents use the same security hooks. -->
+
+ # Bash Commands Guide
+
+ This skill covers bash command usage and best practices for the Stably CLI agent environment.
+
+ ## Long-Running Commands (CRITICAL - Avoid Hanging)
+
+ **ALWAYS use `run_in_background: true`** for commands that run indefinitely:
+
+ ```
+ Bash with run_in_background: true
+ command: "pnpm dev"
+ ```
+
+ ### Why This Matters
+
+ Some commands never exit on their own - they run until manually stopped. Without `run_in_background: true`, the agent will hang indefinitely waiting for the command to complete.
+
+ ### Commands That MUST Use run_in_background
+
+ Any command that blocks indefinitely:
+ - Dev servers: `pnpm dev`, `npm start`, `next dev`, `vite`
+ - Watch processes: `tsc --watch`, `nodemon`, file watchers
+ - Servers: `node server.js`, `python -m http.server`
+ - Tail commands: `tail -f`
+
+ ### After Starting a Background Command
+
+ 1. The command returns immediately with a task ID
+ 2. Wait briefly for initialization if needed
+ 3. Proceed with your next action
+
+ ## Whitelisted Commands
+
+ The CLI agent uses the same command whitelist as the web agent:
+
+ | Command | Purpose | Notes |
+ |---------|---------|-------|
+ | `rm` | Delete files | Restricted to workspace directory |
+ | `mv` | Move/rename files | |
+ | `git` | Version control | Config flags (-c, --config) blocked |
+ | `gh` | GitHub CLI | For PR/issue operations |
+ | `npm` | Package manager | |
+ | `yarn` | Package manager | |
+ | `pnpm` | Package manager | |
+ | `bun` | Package manager | |
+ | `npx` | Run packages | Only `npx stably test` allowed |
+
+ ### What's NOT Allowed
+
+ Use dedicated tools instead of bash for these operations:
+ - `cat`, `head`, `tail` → Use the **Read** tool
+ - `grep`, `rg` → Use the **Grep** tool
+ - `find`, `ls` → Use the **Glob** tool
+ - `sed`, `awk` → Use the **Edit** tool
+ - `echo`, `printf` → Use the **Write** tool
+
+ These dedicated tools provide better UX and are easier to review.
@@ -0,0 +1,144 @@
+ ---
+ name: browser-interaction-guide
+ description: Guide for browser interactions in Playwright tests including setup_page vs test_debug decisions, wait patterns, headless mode, locator best practices, and browser cleanup. Use when setting up browser sessions, writing wait patterns, fixing locator issues, or debugging headless failures. Triggers on browser setup, setup_page, test_debug, wait pattern, waitFor, headless, locator, browser cleanup.
+ ---
+
+ # Browser Interaction Guide
+
+ Guide for browser interactions in Playwright tests.
+
+ ## Browser Setup
+
+ Before using any browser interaction tools (`browser_navigate`, `browser_click`, `browser_type`, etc.), you must first call a setup tool:
+ - `mcp__playwright-test__generator_setup_page`
+ - `mcp__playwright-test__planner_setup_page`
+
+ For project selection (auth dependencies), see the `playwright-config-auth` skill.
+
+ ### seedFile Limitations
+
+ `generator_setup_page` requires a **top-level `test()` call** in the seed file. Tests wrapped in `test.describe()` won't be recognized and will return "seed test not found".
+
+ If your test uses `test.describe()`:
+ - Use `tests/seed.spec.ts` as the seedFile instead
+ - Then navigate to your starting URL manually with `browser_navigate`
+
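As an illustrative sketch of the distinction (file name and URL are placeholders, not taken from this package), a seed file the generator can pick up needs its `test()` call at the top level of the file:

```
import { test } from '@playwright/test';

// Recognized as a seed: test() is called at the top level of the file.
test('seed', async ({ page }) => {
  await page.goto('https://example.com');
});

// NOT recognized: the same test wrapped in test.describe() makes
// generator_setup_page fail with "seed test not found".
// test.describe('suite', () => {
//   test('seed', async ({ page }) => { /* ... */ });
// });
```

This fragment is meant to run under the Playwright test runner, not standalone Node.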
+ ## setup_page vs test_debug
+
+ ### Use `generator_setup_page` / `planner_setup_page` when:
+
+ - You want to keep the browser open after running test code
+ - You need to pause mid-flow to see current state
+ - User says "run the test and keep the browser open"
+
+ Browser stays open regardless of whether test passes or fails.
+
+ ### Use `test_debug` when:
+
+ - Debugging test failures and stepping through tests
+ - Verifying a test passes (browser closes on success)
+ - Diagnosing why a test is failing (browser stays open on failure)
+
+ Browser only stays open if test fails.
+
+ ### Common Mistake
+
+ When user asks to "run a test and keep the browser open at the end":
+ - **Wrong:** Use `test_debug` - it closes browser on success
+ - **Right:** Use `generator_setup_page` with test file as seedFile
+
+ ### Error Recovery
+
+ When `setup_page` or `test_debug` fails:
+
+ | Error Type | Action |
+ |------------|--------|
+ | **Test code error** (assertion failed, locator not found) | Switch to `test_debug` for iterative debugging |
+ | **External service error** (timeout, network) | Retry `test_debug` up to 3 times |
+ | **Same error 3+ times** | Flag as blocked service issue, do not keep retrying |
+
+ ## Browser Cleanup
+
+ Close the browser whenever you are not actively using it. Keeping browsers running costs money.
+
+ Use `mcp__stably-agent-control__close_browser_session` (NOT `browser_close`) - it directly terminates the VM and always works.
+
+ ### When to Close
+
+ - After running tests - close immediately unless running more tests right away
+ - After completing browser-based exploration or debugging
+ - When user indicates they're done with current task
+
+ ## Wait Patterns
+
+ Do NOT use `waitForLoadState()` - Playwright automatically waits for elements to be ready before performing actions.
+
+ Wait for specific elements instead:
+
+ ```typescript
+ await page.locator('main').waitFor({ state: 'visible' }); // Page ready
+ await page.locator('.loader').waitFor({ state: 'hidden' }); // Loading done
+ await page.getByRole('heading', { name: 'Title' }).waitFor(); // Content loaded
+ ```
+
+ Best practices:
+ - Never use `.catch(() => {})` on waits - it hides timing issues
+ - Wait for state transitions (visible → hidden), not just end states
+
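The transition idea can be sketched with a generic polling helper (plain TypeScript for illustration, not a Playwright API; all names here are made up): observe the starting state before accepting the end state, so a loader that appeared and finished is distinguished from one that never appeared at all.

```typescript
// Generic sketch (not Playwright): wait for a state *transition*,
// not just the end state.
async function waitForTransition(
  read: () => string,
  from: string,
  to: string,
  timeoutMs = 1000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let sawFrom = false;
  while (Date.now() < deadline) {
    const state = read();
    if (state === from) sawFrom = true; // e.g. loader became visible
    if (sawFrom && state === to) return; // visible → hidden observed
    await new Promise((r) => setTimeout(r, 10));
  }
  throw new Error(`timeout waiting for ${from} → ${to}`);
}
```

In real tests the chained `waitFor({ state: 'visible' })` then `waitFor({ state: 'hidden' })` calls shown above play this role.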
+ ## Headless Mode
+
+ Tests run in headless mode by default, which is inherently flakier. Write defensively:
+
+ ```typescript
+ // Ensure element is in viewport (critical in headless)
+ await element.scrollIntoViewIfNeeded();
+ await element.waitFor({ state: 'visible' });
+
+ // Wait before typing
+ await input.waitFor({ state: 'visible' });
+ await input.clear();
+ await input.fill('new value');
+ ```
+
+ If a test passes headed but fails headless, it's a timing issue - add appropriate waits.
+
+ ## Locator Best Practices
+
+ ### Prefer Stable Locators
+
+ ```typescript
+ // Role-based (best)
+ page.getByRole('button', { name: 'Submit' })
+
+ // Test IDs
+ page.getByTestId('submit-button')
+
+ // Structural with context
+ page.locator('table tbody tr').first().locator('button')
+ ```
+
+ ### Avoid Fragile Locators
+
+ - Empty text filters with indexes: `.filter({ hasText: /^$/ }).nth(5)`
+ - Absolute indexes without context
+ - CSS selectors based on implementation details
+
+ ### Use `.describe()` Method
+
+ All locators must use `.describe()` for readability in trace views:
+
+ ```typescript
+ page.getByRole('button', { name: 'Submit' }).describe('Submit button')
+ ```
+
+ ## Browser Interaction Strategy (Priority Order)
+
+ 1. **Standard Playwright methods** (`locator.click()`, `locator.fill()`) - First choice, fast and reliable
+ 2. **`agent.act()`** for complex/dynamic UI - Use for sliders, carousels, drag-drop, dynamic menus. Slower but more robust.
+ 3. **Custom JS via `page.evaluate()`** - Last resort, only when `agent.act()` is inconsistent
+
+ Don't prematurely optimize with custom JS. Slower but reliable `agent.act()` is better than fast but brittle custom JS.
+
+ ## Page State Snapshots
+
+ After browser actions, you receive a filtered page snapshot showing interactive elements. If you need full page context, use `mcp__playwright-test__browser_snapshot` for the complete unfiltered accessibility tree.
1
+ ---
2
+ name: bulk-test-handling
3
+ description: Guide for handling bulk test creation requests (>15 tests). Use when user submits many tests at once via CSV, list, or batch import. Covers time estimation and batch processing approach. Triggers on bulk import, many tests, CSV import, batch tests, multiple tests at once.
4
+ ---
5
+
6
+ <!-- CLI-ONLY SKILL: No AskUserQuestion - outputs info then proceeds based on mode.
7
+ The web version (stably-plugin-web) uses AskUserQuestion for confirmation.
8
+ Check system prompt for "one-shot mode" to determine behavior. -->
9
+
10
+ # Bulk Test Handling Guide (CLI)
11
+
12
+ This skill provides guidance for handling requests to create many tests at once in CLI mode.
13
+
14
+ ## When This Applies
15
+
16
+ Use this guidance when the user submits **more than 15 tests** at once, such as:
17
+ - CSV import with test prompts
18
+ - List of test descriptions in a single message
19
+ - Batch test creation request
20
+ - "Create tests for all these scenarios..."
21
+
22
+ ## Required: Inform User Before Starting
23
+
24
+ Before starting bulk test creation, output a clear summary:
25
+ 1. Confirm the number of tests detected
26
+ 2. Set time expectations
27
+ 3. Explain the batch approach
28
+
29
+ ### Time Estimation
30
+
31
+ Each test typically takes **5-15 minutes** depending on complexity:
32
+ - Simple tests (form fill, navigation): ~5 min
33
+ - Medium tests (multi-step workflows): ~10 min
34
+ - Complex tests (conditional logic, multiple assertions): ~15 min
35
+
36
+ **Estimate formula:** `numberOfTests × 10 minutes` (use 10 min as average)
37
+
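The arithmetic is trivial but worth pinning down; a minimal sketch (the helper name is illustrative, not part of the Stably package):

```typescript
// Rough bulk-run estimate using the 10-minute-per-test average above.
// estimateBulkMinutes is an illustrative name, not a Stably API.
function estimateBulkMinutes(numberOfTests: number): number {
  return numberOfTests * 10;
}
```

For example, 18 tests would be quoted as roughly 180 minutes.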
+ ### Information Template
+
+ Output this before starting:
+
+ ```
+ Detected {N} tests to create.
+
+ **Processing approach:** I'll work through these in batches of about 6 tests at a time.
+
+ **Estimated time:** Approximately {N × 10} minutes total ({N} tests × ~10 min each)
+
+ **What I'll do for each test:**
+ 1. Analyze the test requirement
+ 2. Explore the application if needed
+ 3. Write the test code
+ 4. Verify it runs successfully
+
+ Starting with batch 1...
+ ```
+
+ ## Processing Approach
+
+ ### Batch Size
+
+ Process tests in batches of approximately **6 tests** at a time. This:
+ - Keeps quality high
+ - Allows natural progress checkpoints
+ - Enables course corrections if needed
+
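The batching itself is plain chunking; a minimal sketch (generic helper for illustration, not part of the Stably SDK):

```typescript
// Split test prompts into batches of ~6, per the guideline above.
// toBatches is an illustrative name, not a Stably API.
function toBatches<T>(items: T[], size = 6): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

18 tests would become three batches of six, matching the "batch 1 of 3" progress updates below.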
+ ### Progress Updates
+
+ Keep the user informed:
+ - "Starting batch 1 of 3 (tests 1-6)..."
+ - "Completed 6/18 tests. Moving to batch 2..."
+ - "Batch 2 complete. 2 tests needed manual review..."
+
+ ### Between Batches - Check Your System Prompt
+
+ Your system prompt tells you which mode you're in:
+ - **If "one-shot mode" appears in your system prompt:** Continue automatically through all batches without pausing for input
+ - **Otherwise (chat mode):** You can pause between batches to let user review progress and adjust priorities
+
+ After completing each batch:
+ 1. Summarize what was created
+ 2. Note any tests that had issues
+ 3. In chat mode: ask if user wants to continue or adjust
+ 4. In one-shot mode: proceed automatically to next batch
+
+ ## Handling Issues
+
+ If tests fail or have problems during bulk creation:
+ - Note the issue but continue with other tests
+ - Compile a summary of problematic tests at the end
+ - List which tests need follow-up
+
+ ## Summary at Completion
+
+ After all batches complete, provide:
+ - Total tests created successfully
+ - List of any failed/skipped tests with reasons
+ - Suggestions for resolving issues
+
+ ## Do NOT
+
+ - Start creating tests without informing user of scope
+ - Process all tests without progress updates
+ - Skip time estimation
1
+ ---
2
+ name: debugging-test-failures
3
+ description: Guide for debugging test execution issues including assertion failures, MCP tool failures, test setup problems, browser errors, and flaky tests. Use when tests fail, MCP tools fail, browser setup fails, or tests are flaky. Triggers on test failure, debug, MCP error, test won't run, setup failed, CDP error, flaky test, timeout.
4
+ ---
5
+
6
+ # Debugging Test Failures
7
+
8
+ Guide for debugging Playwright test failures.
9
+
10
+ ## Gather Codebase Context First
11
+
12
+ Before attempting any fix, understand how the userflow being tested actually works. Explore application source code to understand:
13
+ - What components/pages are involved
14
+ - What APIs/services are called and data flows
15
+ - What state management affects the UI
16
+ - What recent changes may have caused the failure
17
+
18
+ ### Codebase Exploration
19
+
20
+ Use sub-agents with `model: "haiku"` for codebase exploration to save costs:
21
+
22
+ ```
23
+ subagent_type: "Explore"
24
+ model: "haiku"
25
+ description: "Find login page components"
26
+ prompt: "Search the codebase for components related to the login page. Find the main login component, any API calls it makes, and related hooks or services."
27
+ ```
28
+
29
+ Application code is usually in `../src/`, `../app/`, or `../packages/` - parent directories of the tests folder.
30
+
31
+ ## Source of Truth Hierarchy
32
+
33
+ When determining what behavior is "correct", use this hierarchy (most trustworthy first):
34
+
35
+ 1. **PR Description / Commit Messages** - Explains intended behavior change
36
+ 2. **Design Documents** - Describe intended behavior, not necessarily what code does
37
+ 3. **Test Intent (User Prompt Comments)** - Records what the test should verify
38
+ 4. **Application Code** - Shows what actually happens, not what should happen (code can have bugs)
39
+
40
+ If code conflicts with PR description or design docs, code may be wrong. Report as application bug.
41
+
42
+ ## Issue Classification
43
+
44
+ Five types of issues cause test failures:
45
+
46
+ | Type | Description | Examples |
47
+ |------|-------------|----------|
48
+ | **Test Issue** | Test is incorrect or needs updating | Wrong assertions, flaky selectors, race conditions |
49
+ | **Application Bug** | Application has genuine bug | Functionality broken, UI not rendering |
50
+ | **External/Other** | Infrastructure or environment issues | Third-party outages, network problems |
51
+ | **Flaky Test** | Non-deterministic failure | Passes on retry, timing-dependent |
52
+ | **UI Change** | UI intentionally changed, test is stale | Button text changed, layout restructured |
53
+
54
+ ## Debugging by Mode
55
+
56
+ **In `fix` mode**: Delegate browser-based exploration to the `debug-worker` subagent.
57
+
58
+ **In `single`, `build`, `chat` modes**: Run `test_debug` / `test_run` directly (no debug subagent available).
59
+
60
+ ### When to Use Subagents
61
+
62
+ - Verifying a test works after making changes
63
+ - Reproducing test failures
64
+ - Investigating flaky tests
65
+ - Running tests multiple times for consistency
66
+
67
+ ### What You Do Directly
68
+
69
+ - Writing test code
70
+ - Implementing fixes based on diagnosis
71
+ - Reading test files (use Read tool)
72
+
73
+ ## Debugging Workflow
74
+
75
+ 1. **Find failed tests**: Use Read/Glob to identify failing test files
76
+ 2. **Reproduce + diagnose**:
77
+ - Fix mode: Delegate to `debug-worker`
78
+ - Single/build/chat modes: Run `test_debug` directly
79
+ 3. **Implement fix**: Make necessary code changes
80
+ 4. **Verify fix**: Use `test_run` directly for final verification
81
+ 5. **Group related failures**: Only group tests that share the same root cause
82
+
83
+ ## Parallel Diagnosis
84
+
85
+ Parallelize only read-only analysis. Browser debugging should stay sequential to avoid session conflicts.
86
+
87
+ ### Grouping Strategy
88
+
89
+ 1. Analyze all failures first: Read error messages and categorize by likely root cause
90
+ 2. Group by shared characteristics: Same error, same page, same API dependency
91
+ 3. Launch diagnosis sequentially in fix mode, directly in other modes
92
+
93
+ ## Browser Infrastructure Errors
94
+
95
+ ### CDP Connection Errors
96
+
97
+ If you encounter `browserType.connectOverCDP` errors like "Target page has been closed" or "upstream connection failed":
98
+
99
+ This is a stale browser VM issue, NOT a test or config problem.
100
+
101
+ **Fix:**
102
+ 1. Call `mcp__stably-agent-control__close_browser_session` to terminate the cloud browser VM
103
+ 2. Retry your operation - fresh browser VM will spin up automatically
104
+
105
+ Do NOT repeatedly call `restart_mcp_servers` - it doesn't fix VM-level issues.
106
+
107
+ ## Empty test_run Output
108
+
109
+ If `test_run` returns empty/no output (no pass, no fail, no error):
110
+
111
+ This is NOT a tool malfunction. Empty output means zero tests matched the project filters.
112
+
113
+ Do NOT retry the same command - it will keep returning empty.
114
+
115
+ **Diagnose:** See `playwright-config-auth` skill for project selection guidance and how to fix filter mismatches.
116
+
117
+ ## Debugging During Test Creation
118
+
119
+ ### Do Not Assume
120
+
121
+ - **Bug-free**: Application may have bugs you discover
122
+ - **Accessible**: Website may have auth requirements, bot detection, or rate limiting
123
+
124
+ ### When Observed Behavior Differs from Expectations
125
+
126
+ 1. Do NOT change user's original testing intent without approval
127
+ 2. STOP and report: "The user expects X, but the application does Y"
128
+
129
+ A failing test that catches a real bug is more valuable than a passing test that hides bugs.
130
+
131
+ ## Best Practices
132
+
133
+ - Reproduce first: Always try to reproduce failures before diagnosing
134
+ - Be specific: Identify actual file/function/selector locations
135
+ - Use evidence: Base analysis on actual test runs, errors, code inspection
136
+ - Respect annotations: If test is marked with `test.skip()`, do not touch it
137
+
138
+ ## Maximum Fix Attempts
139
+
140
+ You may attempt to fix a specific issue a maximum of 3 times. If after 3 attempts the same error persists:
141
+
142
+ 1. STOP trying the same approach
143
+ 2. Report the blocker clearly: what you tried, what keeps failing, likely root cause
144
+ 3. Move on if truly blocked
145
+
146
+ Do NOT enter an infinite fix-retry loop. 3 attempts is the maximum before escalating.
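The cap above amounts to a bounded retry loop; a minimal sketch (illustrative TypeScript, not part of the Stably SDK, and `runAttempt` is a made-up callback standing in for one fix-and-verify cycle):

```typescript
// Attempt a fix at most 3 times; on persistent failure, stop and escalate
// instead of looping forever.
async function fixWithCap(
  runAttempt: () => Promise<boolean>, // one fix + verify cycle (illustrative)
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await runAttempt()) return true; // fixed - stop retrying
  }
  return false; // still failing after the cap: report the blocker instead
}
```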
@@ -11,6 +11,12 @@ metadata:
 
 This skill provides comprehensive guidance for all aspects of Playwright test development, from writing new tests to debugging and maintaining existing test suites.
 
+ ## Execution Policy (Stably Agents)
+
+ - Use Playwright MCP tools for test execution in normal agent flows (`test_debug` for iteration, `test_run` for final verification).
+ - Do NOT run `npx playwright test` from Bash in agent workflows.
+ - Exception: auto-heal `fix-worker` subagents validate via `npx stably test --reporter=list` because they run in parallel without browser MCP tools.
+
 ## Activity-Based Reference Guide
 
 Consult these references based on what you're doing:
@@ -233,10 +239,10 @@ What are you doing?
 
 After writing or modifying tests:
 
- 1. **Run tests**: `npx playwright test --reporter=list`
+ 1. **Run iterative checks** with `test_debug`
 2. **If tests fail**:
- - Review error output and trace (`npx playwright show-trace`)
+ - Review the failure output and trace artifacts from the test tools
 - Fix locators, waits, or assertions
- - Re-run tests
- 3. **Only proceed when all tests pass**
- 4. **Run multiple times** for critical tests: `npx playwright test --repeat-each=5`
+ - Re-run `test_debug`
+ 3. **Final verification**: run `test_run` once after `test_debug` is stable
+ 4. **For critical tests**, run additional `test_debug` passes to check repeatability before final `test_run`