npm - @muggleai/works - Versions diffs - 4.4.0 → 4.6.0 - Mend

@muggleai/works 4.4.0 → 4.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/README.md +31 -13
package/dist/{chunk-PMI2DI3V.js → chunk-TP4T4T2Z.js} +348 -105
package/dist/cli.js +1 -1
package/dist/index.js +1 -1
package/dist/plugin/.claude-plugin/plugin.json +1 -1
package/dist/plugin/.cursor-plugin/plugin.json +1 -1
package/dist/plugin/README.md +1 -0
package/dist/plugin/skills/do/e2e-acceptance.md +6 -3
package/dist/plugin/skills/do/open-prs.md +35 -74
package/dist/plugin/skills/muggle-pr-visual-walkthrough/SKILL.md +181 -0
package/dist/plugin/skills/muggle-test/SKILL.md +146 -121
package/dist/plugin/skills/muggle-test-feature-local/SKILL.md +66 -16
package/dist/plugin/skills/muggle-test-import/SKILL.md +127 -25
package/dist/plugin/skills/muggle-test-regenerate-missing/SKILL.md +201 -0
package/dist/plugin/skills/muggle-test-regenerate-missing/evals/evals.json +58 -0
package/dist/plugin/skills/muggle-test-regenerate-missing/evals/trigger-eval.json +22 -0
package/dist/release-manifest.json +7 -0
package/package.json +7 -7
package/plugin/.claude-plugin/plugin.json +1 -1
package/plugin/.cursor-plugin/plugin.json +1 -1
package/plugin/README.md +1 -0
package/plugin/skills/do/e2e-acceptance.md +6 -3
package/plugin/skills/do/open-prs.md +35 -74
package/plugin/skills/muggle-pr-visual-walkthrough/SKILL.md +181 -0
package/plugin/skills/muggle-test/SKILL.md +146 -121
package/plugin/skills/muggle-test-feature-local/SKILL.md +66 -16
package/plugin/skills/muggle-test-import/SKILL.md +127 -25
package/plugin/skills/muggle-test-regenerate-missing/SKILL.md +201 -0
package/plugin/skills/muggle-test-regenerate-missing/evals/evals.json +58 -0
package/plugin/skills/muggle-test-regenerate-missing/evals/trigger-eval.json +22 -0

package/plugin/skills/muggle-test/SKILL.md CHANGED

@@ -15,6 +15,22 @@ A router skill that detects code changes, resolves impacted test cases, executes
 - **Multi-select** (use cases, test cases): Use `AskQuestion` with `allow_multiple: true`.
 - **Free-text inputs** (URLs, descriptions): Only use plain text prompts when there is no finite set of options. Even then, offer a detected/default value when possible.
 - **Batch related questions**: If two questions are independent, present them together in a single `AskQuestion` call rather than asking sequentially.
+- **Parallelize job-creation calls**: Whenever you're kicking off N independent cloud jobs — creating multiple use cases, generating/creating multiple test cases, fetching details for multiple test cases, starting multiple remote workflows, publishing multiple local runs, or fetching per-step screenshots for multiple runs — issue all N tool calls in a single message so they run in parallel. Never loop them sequentially unless there is a real ordering constraint (e.g. a single local Electron browser that can only run one test at a time).
+## Test Case Design: One Atomic Behavior Per Test Case
+Every test case verifies exactly **one** user-observable behavior. Never bundle multiple concerns, sequential flows, or bootstrap/setup into a single test case — even if you think it would be "cleaner" or "more efficient."
+**Ordering, dependencies, and bootstrap are Muggle's service responsibility, not yours.** Muggle's cloud handles test case dependencies, prerequisite state, and execution ordering. Your job is to describe the *atomic behavior to verify* — never the flow that gets there.
+- ❌ Wrong: one test case that "signs up, logs in, navigates to the detail modal, verifies icon stacking, verifies tab order, verifies history format, and verifies reference layout."
+- ✅ Right: four separate test cases — one per verifiable behavior — each with instruction text like "Verify the detail modal shows stacked pair of icons per card" with **no** signup / login / navigation / setup language.
+**Never bake bootstrap into a test case description.** Signup, login, seed data, prerequisite navigation, tear-down — none of these belong inside the test case body. Write only the verification itself. The service will prepend whatever setup is needed based on its own dependency graph.
+**Never consolidate the generator's output.** When `muggle-remote-test-case-generate-from-prompt` returns N micro-tests from a single prompt, that decomposition is the authoritative one. Do not "merge them into 1 for simplicity," do not "rewrite them to share bootstrap," do not "collapse them to match a 4 UC / 4 TC plan." Accept what the generator gave you.
+**Never skip the generate→review cycle.** Even when you are 100% confident about the right shape, always present the generated test cases to the user before calling `muggle-remote-test-case-create`. "I'll skip the generate→review cycle and create directly" is a sign you're about to get it wrong.
 ## Step 1: Confirm Scope of Work (Always First)
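Note on the new "Parallelize job-creation calls" rule in the hunk above: the skill text is written for an agent that issues tool calls in a single message, but the same idea can be sketched for a programmatic client. The following is a minimal illustration only. `ToolCall`, `ToolRunner`, `Outcome`, `runInParallel`, and `runSequentially` are hypothetical names, since the real MCP client API does not appear in this diff.

```typescript
// Hypothetical shapes: the real MCP client API is not shown in this diff.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ToolRunner = (call: ToolCall) => Promise<unknown>;
type Outcome = { call: ToolCall; ok: boolean; value?: unknown; error?: string };

// Start every call before awaiting any of them, so independent jobs overlap.
// A failed call is recorded instead of rejecting the whole batch.
async function runInParallel(calls: ToolCall[], run: ToolRunner): Promise<Outcome[]> {
  const pending = calls.map((call) =>
    run(call).then(
      (value): Outcome => ({ call, ok: true, value }),
      (err): Outcome => ({ call, ok: false, error: String(err) }),
    ),
  );
  return Promise.all(pending);
}

// Sequential variant, for a real ordering constraint such as the single
// local Electron browser mentioned in the skill text.
async function runSequentially(calls: ToolCall[], run: ToolRunner): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const call of calls) {
    try {
      outcomes.push({ call, ok: true, value: await run(call) });
    } catch (err) {
      outcomes.push({ call, ok: false, error: String(err) });
    }
  }
  return outcomes;
}
```

Recording failures per call, rather than letting one rejection abort the batch, mirrors the skill's "log it and continue to the next" guidance.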
@@ -41,8 +57,8 @@ If the user's intent is clear, state back what you understood and use `AskQuesti
 - Option 2: "Switch to [the other mode]"
 If ambiguous, use `AskQuestion` to let the user choose:
-- Option 1: "Local — launch browser on your machine against localhost"
-- Option 2: "Remote — Muggle cloud tests against a preview/staging URL"
+- Option 1: "On my computer — test your localhost dev server in a browser on your machine"
+- Option 2: "In the cloud — test remotely targeting your deployed preview/staging URL"
 Only proceed after the user selects an option.
@@ -66,8 +82,12 @@ If no changes detected (clean tree), tell the user and ask what they want to tes
 ## Step 3: Authenticate
 1. Call `muggle-remote-auth-status`
-2. If authenticated and not expired → proceed
-3. If not authenticated or expired → call `muggle-remote-auth-login`
+2. If **authenticated and not expired** → print the logged-in email and ask via `AskQuestion`:
+   > "You're logged in as **{email}**. Continue with this account?"
+   - Option 1: "Yes, continue"
+   - Option 2: "No, switch account"
+   If the user picks "switch account", call `muggle-remote-auth-login` with `forceNewSession: true`, then `muggle-remote-auth-poll`.
+3. If **not authenticated or expired** → call `muggle-remote-auth-login`
 4. If login pending → call `muggle-remote-auth-poll`
 If auth fails repeatedly, suggest: `muggle logout && muggle login` from terminal.
@@ -93,56 +113,68 @@ A **project** is where all your test results, use cases, and test scripts are gr
 Store the `projectId` only after user confirms.
-## Step 5: Select Use Case (User Must Choose)
+## Step 5: Select Use Case (Best-Effort Shortlist)
 ### 5a: List existing use cases
 Call `muggle-remote-use-case-list` with the project ID.
-### 5b: Present ALL use cases for user selection
+### 5b: Best-effort match against the change summary
-Use `AskQuestion` with `allow_multiple: true` to present all use cases as clickable options. Always include a "Create new use case" option at the end.
+Using the change summary from Step 2, pick the use cases whose title/description most plausibly relate to the impacted areas. Produce a **short shortlist** (typically 1–5) — don't try to be exhaustive, and don't dump the full project list on the user. A confident best-effort match is the goal.
-Prompt: "Which use case(s) do you want to test?"
+If nothing looks like a confident match, fall back to asking the user which use case(s) they have in mind.
-### 5c: Wait for explicit user selection
+### 5c: Present the shortlist for confirmation
-**CRITICAL: Do NOT auto-select use cases** based on:
-- Git changes analysis
-- Use case title/description matching
-- Any heuristic or inference
+Use `AskQuestion` with `allow_multiple: true`:
-The user MUST explicitly tell you which use case(s) to use.
+Prompt: "These use cases look most relevant to your changes — confirm which to test:"
-### 5d: If user chooses "Create new use case"
-1. Ask the user to describe the use case in plain English
-2. Call `muggle-remote-use-case-create-from-prompts`:
+- Pre-check the shortlisted items so the user can accept with one click
+- Include "Pick a different use case" to reveal the full project list
+- Include "Create new use case" at the end
+### 5d: If user picks "Pick a different use case"
+Re-present the full list from 5a via `AskQuestion` with `allow_multiple: true`, then continue.
+### 5e: If user chooses "Create new use case"
+1. Ask the user to describe the use case(s) in plain English — they may want more than one
+2. Call `muggle-remote-use-case-create-from-prompts` **once** with **all** descriptions batched into the `instructions` array (this endpoint natively fans out the jobs server-side — do NOT make one call per use case):
    - `projectId`: The project ID
-   - `prompts`: Array of `{ instruction: "..." }` with the user's description
-3. Present the created use case and confirm it's correct
+   - `instructions`: A plain array of strings, one per use case — e.g. `["<description 1>", "<description 2>", ...]`
+3. Present the created use cases and confirm they're correct
-## Step 6: Select Test Case (User Must Choose)
+## Step 6: Select Test Case (Best-Effort Shortlist)
 For the selected use case(s):
 ### 6a: List existing test cases
 Call `muggle-remote-test-case-list-by-use-case` with each use case ID.
-### 6b: Present ALL test cases for user selection
+### 6b: Best-effort match against the change summary
+Using the change summary from Step 2, pick the test cases that look most relevant to the impacted areas. Keep the shortlist small and confident — don't enumerate every test case attached to the use case(s).
-Use `AskQuestion` with `allow_multiple: true` to present all test cases as clickable options. Always include a "Generate new test case" option at the end.
+If nothing looks like a confident match, fall back to offering to run all test cases for the selected use case(s), or ask the user what they had in mind.
-Prompt: "Which test case(s) do you want to run?"
+### 6c: Present the shortlist for confirmation
-### 6c: Wait for explicit user selection
+Use `AskQuestion` with `allow_multiple: true`:
-**CRITICAL: Do NOT auto-select test cases** — the user MUST explicitly choose which test case(s) to execute.
+Prompt: "These test cases look most relevant — confirm which to run:"
+- Pre-check the shortlisted items so the user can accept with one click
+- Include "Show all test cases" to reveal the full list
+- Include "Generate new test case" at the end
 ### 6d: If user chooses "Generate new test case"
-1. Ask the user to describe what they want to test in plain English
-2. Call `muggle-remote-test-case-generate-from-prompt`:
-   - `projectId`, `useCaseId`, `instruction` (the user's description)
-3. Present the generated test case(s) for review
-4. Call `muggle-remote-test-case-create` to save the ones the user approves
+1. Ask the user to describe what they want to test in plain English — they may want more than one test case
+2. For N descriptions, issue N `muggle-remote-test-case-generate-from-prompt` calls **in parallel** (single message, multiple tool calls — never loop sequentially):
+   - `projectId`, `useCaseId`, `instruction` (one description per call)
+   - Each `instruction` must describe **exactly one atomic behavior to verify**. No signup, no login, no "first navigate to X, then click Y, then verify Z" chains, no seed data, no cleanup. Just the verification. See **Test Case Design** above.
+3. **Accept the generator's decomposition as-is.** If the generator returns 4 micro-tests from a single prompt, that's 4 correct test cases — never merge, consolidate, or rewrite them to bundle bootstrap.
+4. Present the generated test case(s) for user review — **always do this review cycle**, even when you think you already know the right shape. Skipping straight to creation is the anti-pattern this skill most frequently gets wrong.
+5. For the ones the user approves, issue `muggle-remote-test-case-create` calls **in parallel**
 ### 6e: Confirm final selection
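Note on the payload change for `muggle-remote-use-case-create-from-prompts` in the hunk above: the argument moves from `prompts` (an array of `{ instruction }` objects) to `instructions` (a plain array of strings). A caller holding the old shape could convert it roughly as follows. The interface and function names are hypothetical; only the field names `projectId`, `prompts`, `instruction`, and `instructions` come from the diff.

```typescript
// Old argument shape (4.4.0), as described by the removed lines.
interface LegacyCreateUseCasesArgs {
  projectId: string;
  prompts: { instruction: string }[];
}

// New argument shape (4.6.0), as described by the added lines.
interface CreateUseCasesArgs {
  projectId: string;
  instructions: string[];
}

// Flatten the wrapped prompts into the plain string array, so all
// descriptions travel in one batched call.
function toInstructionsPayload(legacy: LegacyCreateUseCasesArgs): CreateUseCasesArgs {
  return {
    projectId: legacy.projectId,
    instructions: legacy.prompts.map((p) => p.instruction),
  };
}
```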
@@ -154,9 +186,7 @@ Wait for user confirmation before moving to execution.
 ## Step 7A: Execute — Local Mode
-### Pre-flight questions (batch where possible)
-**Question 1 — Local URL:**
+### Pre-flight question — Local URL
 Try to auto-detect the dev server URL by checking running terminals or common ports (e.g., `lsof -iTCP -sTCP:LISTEN -nP | grep -E ':(3000|3001|4200|5173|8080)'`). If a likely URL is found, present it as a clickable default via `AskQuestion`:
 - Option 1: "http://localhost:3000" (or whatever was detected)
@@ -164,38 +194,31 @@ Try to auto-detect the dev server URL by checking running terminals or common po
 If nothing detected, ask as free text: "Your local app should be running. What's the URL? (e.g., http://localhost:3000)"
-**Question 2 — Electron launch + window visibility (ask together):**
-After getting the URL, use a single `AskQuestion` call with two questions:
+**No separate approval or visibility question.** The user picking Local mode in Step 1 *is* the approval — do not ask "ready to launch Electron?" before every run. The Electron browser defaults to visible; if the user wants headless, they will say so, otherwise let it run visible.
-1. "Ready to launch the Muggle Electron browser for [N] test case(s)?"
-   - "Yes, launch it (visible — I want to watch)"
-   - "Yes, launch it (headless — run in background)"
-   - "No, cancel"
+### Fetch test case details (in parallel)
-If user cancels, stop and ask what they want to do instead.
+Before execution, fetch full test case details for all selected test cases by issuing **all** `muggle-remote-test-case-get` calls in parallel (single message, multiple tool calls).
-### Run sequentially
+### Run sequentially (Electron constraint)
-For each test case:
+Execution itself **must** be sequential because there is only one local Electron browser. For each test case, in order:
-1. Call `muggle-remote-test-case-get` to fetch full details
-2. Call `muggle-local-execute-test-generation`:
-   - `testCase`: Full test case object from step 1
-   - `localUrl`: User's local URL (from Question 1)
-   - `approveElectronAppLaunch`: `true` (only if user approved in Question 2)
-   - `showUi`: `true` if user chose "visible", `false` if "headless" (from Question 2)
-3. Store the returned `runId`
+1. Call `muggle-local-execute-test-generation`:
+   - `testCase`: Full test case object from the parallel fetch above
+   - `localUrl`: User's local URL from the pre-flight question
+   - `showUi`: omit (default visible) unless the user explicitly asked for headless, then pass `false`
+2. Store the returned `runId`
 If a generation fails, log it and continue to the next. Do not abort the batch.
-### Collect results
+### Collect results (in parallel)
-For each `runId`, call `muggle-local-run-result-get`. Extract: status, duration, step count, `artifactsDir`.
+For every `runId`, issue all `muggle-local-run-result-get` calls in parallel. Extract: status, duration, step count, `artifactsDir`.
-### Publish each run to cloud
+### Publish each run to cloud (in parallel)
-For each completed run, call `muggle-local-publish-test-script`:
+For every completed run, issue all `muggle-local-publish-test-script` calls in parallel (single message, multiple tool calls):
 - `runId`: The local run ID
 - `cloudTestCaseId`: The cloud test case ID
@@ -225,26 +248,29 @@ For failures: show which step failed, the local screenshot path, and a suggestio
 > "What's the preview/staging URL to test against?"
-### Trigger remote workflows
+### Fetch test case details (in parallel)
-For each test case:
+Issue all `muggle-remote-test-case-get` calls in parallel (single message, multiple tool calls) to hydrate the test case bodies.
-1. Call `muggle-remote-test-case-get` to fetch full details
-2. Call `muggle-remote-workflow-start-test-script-generation`:
-   - `projectId`: The project ID
-   - `useCaseId`: The use case ID
-   - `testCaseId`: The test case ID
-   - `name`: `"muggle-test: {test case title}"`
-   - `url`: The preview/staging URL
-   - `goal`: From the test case
-   - `precondition`: From the test case (use `"None"` if empty)
-   - `instructions`: From the test case
-   - `expectedResult`: From the test case
-3. Store the returned workflow runtime ID
+### Trigger remote workflows (in parallel)
+Once details are in hand, issue all `muggle-remote-workflow-start-test-script-generation` calls in parallel — never loop them sequentially. For each test case:
+- `projectId`: The project ID
+- `useCaseId`: The use case ID
+- `testCaseId`: The test case ID
+- `name`: `"muggle-test: {test case title}"`
+- `url`: The preview/staging URL
+- `goal`: From the test case
+- `precondition`: From the test case (use `"None"` if empty)
+- `instructions`: From the test case
+- `expectedResult`: From the test case
+Store each returned workflow runtime ID.
-### Monitor and report
+### Monitor and report (in parallel)
-For each workflow, call `muggle-remote-wf-get-ts-gen-latest-run` with the runtime ID.
+Issue all `muggle-remote-wf-get-ts-gen-latest-run` calls in parallel, one per runtime ID.
 ```
 Test Case                  Workflow Status   Runtime ID
@@ -279,66 +305,61 @@ open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs
 Tell the user:
 > "I've opened the Muggle AI dashboard in your browser — you can see the test results, step-by-step screenshots, and action scripts there."
-## Step 9: Post E2E Acceptance Results to PR
+## Step 9: Offer to Post Visual Walkthrough to PR
-After reporting results, check if there's an open PR for the current branch and attach the E2E acceptance summary.
+After reporting results, ask the user if they want to attach a **visual walkthrough** — a markdown block with per-test-case dashboard links and step-by-step screenshots — to the current branch's open PR. The rendering and posting workflow lives in the shared `muggle-pr-visual-walkthrough` skill; this step gathers the required input and hands off.
-### 9a: Find the PR
-```bash
-gh pr view --json number,url,title 2>/dev/null
-```
+### 9a: Gather per-step screenshots (required input for the shared skill)
-- If a PR exists → post results as a comment
-- If no PR exists → use `AskQuestion`:
-  - "Create PR with E2E acceptance results"
-  - "Skip posting to PR"
+The shared skill takes an **`E2eReport` JSON** that includes per-step screenshot URLs. You already have `projectId`, `testCaseId`, `runId`, `viewUrl`, and `status` from earlier steps — you still need the step-level data.
-### 9b: Build the E2E acceptance comment body
+For the published runs from Step 7A, issue **all** `muggle-remote-test-script-get` calls in parallel (single message, multiple tool calls) — one per `testScriptId` returned by `muggle-local-publish-test-script`. Then, for each response:
-Construct a markdown comment with the full E2E acceptance breakdown. The format links each test case to its detail page on the Muggle AI dashboard, so PR reviewers can click through to see step-by-step screenshots and action scripts.
+1. Extract `steps[].operation.action` (description) and `steps[].operation.screenshotUrl` (cloud URL).
+2. Build a `steps` array: `[{ stepIndex: 0, action: "...", screenshotUrl: "..." }, ...]`.
+3. If the run failed, also capture `failureStepIndex`, `error`, and the local `artifactsDir` from `muggle-local-run-result-get`.
-```markdown
-## 🧪 Muggle AI — E2E Acceptance Results
+Assemble the report:
-**X passed / Y failed** | [View all on Muggle AI](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs)
-| Test Case | Status | Details |
-|-----------|--------|---------|
-| [Login with valid creds](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}) | ✅ PASSED | 8 steps, 12.3s |
-| [Login with invalid creds](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}) | ✅ PASSED | 6 steps, 9.1s |
-| [Checkout flow](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}) | ❌ FAILED | Step 7: "Click checkout button" — element not found |
+```json
+{
+  "projectId": "<projectId>",
+  "tests": [
+    {
+      "name": "<test case title>",
+      "testCaseId": "<id>",
+      "testScriptId": "<id>",
+      "runId": "<id>",
+      "viewUrl": "<publish response viewUrl>",
+      "status": "passed",
+      "steps": [{ "stepIndex": 0, "action": "...", "screenshotUrl": "..." }]
+    }
+  ]
+}
+```
-<details>
-<summary>Failed test details</summary>
+See the shared skill for the full schema (including the failed-test shape with `failureStepIndex` and `error`).
-### Checkout flow
-- **Failed at**: Step 7 — "Click checkout button"
-- **Error**: Element not found
-- **Local artifacts**: `~/.muggle-ai/sessions/{runId}/`
-- **Screenshots**: `~/.muggle-ai/sessions/{runId}/screenshots/`
+### 9b: Ask the user
-</details>
+Use `AskQuestion`:
----
-*Generated by [Muggle AI](https://www.muggle-ai.com) — change-driven E2E acceptance testing*
-```
+> "Post a visual walkthrough of these results to the PR? Reviewers can click each test case to see step-by-step screenshots on the Muggle AI dashboard."
-### 9c: Post to the PR
+- Option 1: "Yes, post to PR"
+- Option 2: "Skip"
-If PR already exists — add as a comment:
-```bash
-gh pr comment {pr-number} --body "$(cat <<'EOF'
-{the E2E acceptance comment body from 9b}
-EOF
-)"
-```
+### 9c: Invoke the shared skill in Mode A
-If creating a new PR — include the E2E acceptance section in the PR body alongside the usual summary/changes sections.
+If the user chooses "Yes, post to PR", invoke the `muggle-pr-visual-walkthrough` skill via the `Skill` tool. With the `E2eReport` already in context, the skill will:
-### 9d: Confirm to user
+1. Call `muggle build-pr-section` to render the markdown block (fit-vs-overflow automatic)
+2. Find the PR via `gh pr view`
+3. Post `body` as a `gh pr comment`
+4. Post the overflow `comment` as a second comment (only if the CLI emitted one)
+5. Confirm the PR URL to the user
-> "E2E acceptance results posted to PR #{number}. Reviewers can click the test case links to see step-by-step screenshots on the Muggle AI dashboard."
+This skill always uses **Mode A** (post to an existing PR); `muggle-do` is the only caller that uses Mode B. Do not attempt to render the walkthrough markdown yourself — delegate to the shared skill.
 ## Tool Reference
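Note on Step 9a in the hunk above: building the `steps` array from a `muggle-remote-test-script-get` response is a straightforward mapping. The sketch below assumes that `stepIndex` is the zero-based position of the step in the response (the diff shows only `stepIndex: 0`) and that both `operation` fields are always present. The interface and function names are hypothetical.

```typescript
// Subset of the test-script response that Step 9a reads (assumed shape).
interface TestScriptStep {
  operation: { action: string; screenshotUrl: string };
}

// One entry of the report's `steps` array, per the JSON example in the diff.
interface WalkthroughStep {
  stepIndex: number;
  action: string;
  screenshotUrl: string;
}

// Map each response step to a report step. Using the array position as
// stepIndex is an assumption, not something the diff states.
function toWalkthroughSteps(steps: TestScriptStep[]): WalkthroughStep[] {
  return steps.map((step, index) => ({
    stepIndex: index,
    action: step.operation.action,
    screenshotUrl: step.operation.screenshotUrl,
  }));
}
```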
@@ -360,21 +381,25 @@ If creating a new PR — include the E2E acceptance section in the PR body along
 | Results | `muggle-local-run-result-get` | Local |
 | Results | `muggle-remote-wf-get-ts-gen-latest-run` | Remote |
 | Publish | `muggle-local-publish-test-script` | Local |
+| Per-step screenshots (for walkthrough) | `muggle-remote-test-script-get` | Both |
 | Browser | `open` (shell command) | Both |
-| PR | `gh pr view`, `gh pr comment`, `gh pr create` | Both |
+| PR walkthrough | `muggle-pr-visual-walkthrough` (shared skill) | Both |
 ## Guardrails
 - **Always confirm intent first** — never assume local vs remote without asking
 - **User MUST select project** — present clickable options via `AskQuestion`, wait for explicit choice, never auto-select
-- **User MUST select use case(s)** — present clickable options via `AskQuestion`, wait for explicit choice, never auto-select based on git changes or heuristics
-- **User MUST select test case(s)** — present clickable options via `AskQuestion`, wait for explicit choice, never auto-select
+- **Best-effort shortlist use cases** — use the change summary to narrow the list to the most relevant 1–5 use cases and pre-check them; never dump every use case in the project on the user. Always leave an escape hatch to reveal the full list.
+- **Best-effort shortlist test cases** — same idea: pre-check the test cases most relevant to the change summary; never enumerate every test case attached to a use case. Always leave an escape hatch to reveal the full list.
 - **Use `AskQuestion` for every selection** — never ask the user to type a number; always present clickable options
-- **Batch related questions** — combine Electron approval + visibility into one question; auto-detect localhost URL when possible
-- **Never launch Electron without explicit user approval** (`approveElectronAppLaunch`)
+- **Auto-detect localhost URL when possible**; only fall back to free-text when nothing is listening on a common port
+- **Parallelize independent cloud jobs** — when creating N use cases, generating/creating N test cases, fetching N test case details, starting N remote workflows, polling N workflow runtimes, publishing N local runs, or fetching N per-step test scripts, issue all N calls in a single message so they fan out in parallel. The only tolerated sequential loop is local Electron execution (one browser, one test at a time). For use case creation specifically, use the native batch form of `muggle-remote-use-case-create-from-prompts` (all descriptions in one `instructions` array) instead of parallel calls.
+- **One atomic behavior per test case** — every test case verifies exactly one user-observable behavior. Never bundle signup/login/navigation/bootstrap/teardown into a test case body. Ordering and dependencies are Muggle's service responsibility, not the skill's.
+- **Never consolidate the generator's output** — if `muggle-remote-test-case-generate-from-prompt` returns N micro-tests, accept all N; never merge them into fewer test cases, even if "the plan" says 4 UC / 4 TC.
+- **Never skip the generate→review cycle** — always present generated test cases to the user before calling `muggle-remote-test-case-create`, even when you're confident. "I'll skip the review and create directly" is always wrong.
+- **Never ask for Electron launch approval before each run** — the user picking Local mode is the approval. Don't prompt "Ready to launch Electron?" before execution; just run.
 - **Never silently drop test cases** — log failures and continue, then report them
 - **Never guess the URL** — always ask the user for localhost or preview URL
 - **Always publish before opening browser** — the dashboard needs the published data to show results
-- **Use correct dashboard URL format** — `modal=script-details` (not `modal=details`)
-- **Always check for PR before posting** — don't create a PR comment if there's no PR (ask user first)
+- **Delegate PR posting to `muggle-pr-visual-walkthrough`** — never inline the walkthrough markdown or call `gh pr comment` directly from this skill; ask the user and hand off
 - **Can be invoked at any state** — if the user already has a project or use cases set up, skip to the relevant step rather than re-doing everything

package/plugin/skills/muggle-test-feature-local/SKILL.md CHANGED

@@ -19,7 +19,7 @@ The local URL only changes where the browser opens; it does not change the remot
 **Every selection-based question MUST use the `AskQuestion` tool** (or the platform's equivalent structured selection tool). Never ask the user to "reply with a number" in a plain text message — always present clickable options.
-- **Selections** (project, use case, test case, script, approval): Use `AskQuestion` with labeled options the user can click.
+- **Selections** (project, use case, test case, script): Use `AskQuestion` with labeled options the user can click.
 - **Free-text inputs** (URLs, descriptions): Only use plain text prompts when there is no finite set of options. Even then, offer a detected/default value when possible.
 ## Workflow
@@ -27,7 +27,12 @@ The local URL only changes where the browser opens; it does not change the remot
 ### 1. Auth
 - `muggle-remote-auth-status`
-- If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
+- If **authenticated**: print the logged-in email and ask via `AskQuestion`:
+  > "You're logged in as **{email}**. Continue with this account?"
+  - Option 1: "Yes, continue"
+  - Option 2: "No, switch account"
+  If the user picks "switch account", call `muggle-remote-auth-login` with `forceNewSession: true` then `muggle-remote-auth-poll`.
+- If **not signed in or expired**: call `muggle-remote-auth-login` then `muggle-remote-auth-poll`.
   Do not skip or assume auth.
 ### 2. Targets (user must confirm)
@@ -57,7 +62,7 @@ Prompt for projects: "Pick the project to group this test into:"
 - **Project — Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
 - **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
   1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation via `AskQuestion`.
-  2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
+  2. `muggle-remote-use-case-create-from-prompts` with `projectId` and `instructions: ["<the user's natural-language instruction>"]` — persist. Use the created use case id and continue to test-case selection.
 - **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
   1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation via `AskQuestion`.
   2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **section 4** with that `testCaseId`.
@@ -84,21 +89,21 @@ Remind them: local URL is only the execution target, not tied to cloud project c
 **Generate**
 1. `muggle-remote-test-case-get`
-2. `muggle-local-execute-test-generation` (after approval in step 6) with that test case + `localUrl` + `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
+2. `muggle-local-execute-test-generation` with that test case + `localUrl` (optional: `showUi: false` for headless — defaults to visible; **`timeoutMs`** — see below)
 **Replay**
 1. `muggle-remote-test-script-get` — note `actionScriptId`
 2. `muggle-remote-action-script-get` with that id — full `actionScript`
    **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
-3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
+3. `muggle-local-execute-replay` with `testScript`, `actionScript`, `localUrl` (optional: `showUi: false` for headless — defaults to visible; **`timeoutMs`** — see below)
 ### Local execution timeout (`timeoutMs`)
 The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggle-local-execute-test-generation` and `muggle-local-execute-replay`. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
 - **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
-- If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
+- If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry**.
 - **Test case design:** Preconditions like "a test run has already completed" on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
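The timeout guidance above can be sketched as a small helper. This is a minimal illustration, not part of the Muggle tooling: `nextTimeoutMs`, `isOrchestrationTimeout`, and the two-step ladder are hypothetical names, and the regular expression assumes the error text matches the message quoted above (the skill itself says "or similar").

```typescript
// Hypothetical helpers for illustration only; none of these names exist in @muggleai/works.

// The two larger caps the skill suggests: 10 minutes and 15 minutes.
const TIMEOUT_LADDER_MS = [600_000, 900_000] as const;

// Return the next larger suggested timeout, or the largest one if already at the top.
function nextTimeoutMs(currentMs: number): number {
  const next = TIMEOUT_LADDER_MS.find((ms) => ms > currentMs);
  return next ?? TIMEOUT_LADDER_MS[TIMEOUT_LADDER_MS.length - 1];
}

// Treat a timeout as an orchestration problem only when the Electron logs
// (checked by the caller) show the run was still making progress.
function isOrchestrationTimeout(errorMessage: string, logsShowProgress: boolean): boolean {
  const timedOut = /Electron execution timed out after \d+ms/.test(errorMessage);
  return timedOut && logsShowProgress;
}
```

Under these assumptions, a run that hit the 300000 ms default while still progressing would be retried with `timeoutMs: nextTimeoutMs(300000)`, that is, 600000.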
 ### Interpreting `failed` / non-zero Electron exit
@@ -106,15 +111,9 @@ The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggl
 - **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
 - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying "view script after a successful run" when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
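As a rough sketch of the triage described above: the `RunFailure` shape and the `triageFailure` function are assumptions made for illustration, since the actual structure returned by `muggle-local-run-result-get` is not shown here. The result is only a hint; the skill says exit code 26 *often* means the goal was not achievable, so the summary should still be read.

```typescript
// Hypothetical shapes for illustration; the real run-result fields may differ.
type FailureHint = "orchestration_timeout" | "goal_not_achievable" | "needs_investigation";

interface RunFailure {
  exitCode?: number; // Electron process exit code, if known
  message?: string;  // error text reported by the execute tool
  summary?: string;  // summary text from the run result
}

// Map a failed run to the most likely explanation named in the skill.
function triageFailure(failure: RunFailure): FailureHint {
  const message = failure.message ?? "";
  const summary = failure.summary ?? "";
  if (/timed out after \d+ms/.test(message)) {
    return "orchestration_timeout"; // wait was too short: raise timeoutMs and retry
  }
  if (failure.exitCode === 26 || summary.includes("goal_not_achievable")) {
    return "goal_not_achievable"; // likely a precondition mismatch, not a crash
  }
  return "needs_investigation";
}
```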
-### 6. Approval before any local execution
+### 6. Execute (no approval prompt)
-Use `AskQuestion` to get explicit approval before launching Electron. State: replay vs generation, test case name, URL.
-- "Yes, launch Electron (visible — I want to watch)"
-- "Yes, launch Electron (headless — run in background)"
-- "No, cancel"
-Only call local execute tools with `approveElectronAppLaunch: true` after the user selects a "Yes" option. Map visible to `showUi: true`, headless to `showUi: false`.
+Call `muggle-local-execute-test-generation` or `muggle-local-execute-replay` directly. **Do not** ask the user to re-approve the Electron launch — the user choosing this skill in the first place is the approval. The browser defaults to visible; only pass `showUi: false` if the user explicitly asked for headless.
 ### 7. After successful generation only
@@ -126,11 +125,62 @@ Only call local execute tools with `approveElectronAppLaunch: true` after the us
 - `muggle-local-run-result-get` with the run id from execute.
 - Include: status, duration, pass/fail summary, per-step summary, artifact/screenshot paths, errors if failed, and script view URL when publishing ran.
+### 9. Offer to post a visual walkthrough to the PR
+After reporting results, gather the required input and hand off to the shared **`muggle-pr-visual-walkthrough`** skill, which renders the walkthrough via `muggle build-pr-section` and posts it to the current branch's open PR.
+#### 9a: Gather per-step screenshots
+The shared skill takes an **`E2eReport` JSON** that includes per-step screenshot URLs. After step 7 has called `muggle-local-publish-test-script` and you have the `testScriptId`:
+1. Call `muggle-remote-test-script-get` with the `testScriptId`.
+2. Extract per step: `steps[].operation.action` and `steps[].operation.screenshotUrl`.
+3. Build the `steps` array: `[{ stepIndex: 0, action: "...", screenshotUrl: "..." }, ...]`.
+4. If the run failed, capture `failureStepIndex`, `error`, and the local `artifactsDir` from the run result in step 8.
+Assemble the `E2eReport`:
+```json
+{
+  "projectId": "<projectId from step 2>",
+  "tests": [
+    {
+      "name": "<test case title>",
+      "testCaseId": "<id>",
+      "testScriptId": "<id from publish>",
+      "runId": "<runId from execute>",
+      "viewUrl": "<viewUrl from publish>",
+      "status": "passed",
+      "steps": [{ "stepIndex": 0, "action": "...", "screenshotUrl": "..." }]
+    }
+  ]
+}
+```
+See the `muggle-pr-visual-walkthrough` skill for the full schema including the failed-test shape.
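The extraction in step 9a could look roughly like the sketch below. The field paths `steps[].operation.action` and `steps[].operation.screenshotUrl` come from the text above; the interface names, the `buildReportSteps` function, and the assumption that `stepIndex` is the zero-based array position are illustrative only.

```typescript
// Hypothetical types for illustration; only the field paths are taken from the skill text.
interface ScriptStep {
  operation: { action: string; screenshotUrl: string };
}

interface ReportStep {
  stepIndex: number;
  action: string;
  screenshotUrl: string;
}

// Flatten the test script's steps into the `steps` array of an E2eReport test entry.
// Assumes stepIndex is the step's zero-based position in the script.
function buildReportSteps(scriptSteps: ScriptStep[]): ReportStep[] {
  return scriptSteps.map((step, index) => ({
    stepIndex: index,
    action: step.operation.action,
    screenshotUrl: step.operation.screenshotUrl,
  }));
}
```

The returned array would then be placed under `tests[].steps` in the `E2eReport` shown above.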
+#### 9b: Ask the user
+Use `AskQuestion`:
+> "Post a visual walkthrough of this run to the PR? Reviewers can click the test case to see step-by-step screenshots on the Muggle AI dashboard."
+- Option 1: "Yes, post to PR"
+- Option 2: "Skip"
+#### 9c: Invoke the shared skill in Mode A
+If the user chooses "Yes, post to PR", invoke the `muggle-pr-visual-walkthrough` skill via the `Skill` tool. With the `E2eReport` in context, the skill renders the markdown block via the CLI, finds the PR via `gh pr view`, posts `body` as a comment, posts the overflow `comment` only if the CLI emitted one, and confirms the PR URL to the user.
+Always use **Mode A** (post to existing PR) from this skill. Never hand-write the walkthrough markdown or call `gh pr comment` directly — delegate to `muggle-pr-visual-walkthrough`.
 ## Non-negotiables
-- No silent auth skip; no launching Electron without approval via `AskQuestion`.
+- No silent auth skip.
+- **Never prompt for Electron launch approval** before execution — invoking this skill is the approval. Just run.
 - If replayable scripts exist, do not default to generation without user choice.
 - No hiding failures: surface errors and artifact paths.
 - Replay: never hand-built or simplified `actionScript` — only from `muggle-remote-action-script-get`.
-- Use `AskQuestion` for every selection — project, use case, test case, script, and approval. Never ask the user to type a number.
+- Use `AskQuestion` for every selection — project, use case, test case, script. Never ask the user to type a number.
 - Project, use case, and test case selection lists must always include "Create new ...". Include "Show full list" whenever the API returned at least one row for that step; omit "Show full list" when the list is empty (offer "Create new ..." only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
+- PR posting is always optional and always delegated to the `muggle-pr-visual-walkthrough` skill — never inline the walkthrough markdown or call `gh pr comment` directly from this skill.