@muggleai/works 4.8.1 → 4.8.3

@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "muggle",
3
3
  "description": "Run real-browser end-to-end (E2E) acceptance tests on your web app from any AI coding agent. Generate test scripts from plain English, replay them on localhost, capture screenshots, and validate user flows like signup, checkout, and dashboards. Works across Claude Code, Cursor, Codex, and Windsurf.",
4
- "version": "4.8.1",
4
+ "version": "4.8.3",
5
5
  "author": {
6
6
  "name": "Muggle AI",
7
7
  "email": "support@muggle-ai.com"
@@ -2,7 +2,7 @@
2
2
  "name": "muggle",
3
3
  "displayName": "Muggle AI",
4
4
  "description": "Ship quality products with AI-powered end-to-end (E2E) acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
5
- "version": "4.8.1",
5
+ "version": "4.8.3",
6
6
  "author": {
7
7
  "name": "Muggle AI",
8
8
  "email": "support@muggle-ai.com"
@@ -1,7 +1,15 @@
1
- # E2E / acceptance agent
1
+ # E2E / acceptance agent (Stage 6/7)
2
2
 
3
3
  You are running **end-to-end (E2E) acceptance** test cases against code changes using Muggle AI's local testing infrastructure. These tests simulate real users in a browser — they are not unit tests.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 6/7 — E2E acceptance** — running browser tests against the validation target from pre-flight.
11
+ ```
12
+
5
13
  ## Design
6
14
 
7
15
  E2E acceptance testing runs **locally** using the `test-feature-local` approach:
@@ -15,32 +23,35 @@ This guarantees E2E acceptance tests always run — no dependency on cloud repla
15
23
 
16
24
  ## Input
17
25
 
18
- You receive:
19
- - The Muggle project ID
26
+ You receive everything from `state.md` already — pre-flight resolved it:
27
+
28
+ - `localUrl` — the locally running dev server URL
29
+ - `projectId` — the chosen Muggle project
30
+ - The validation strategy (`local-e2e`, `staging-replay`, `unit-only`, `skip`)
31
+ - Test-user credential status (existing / new / skip)
20
32
  - The list of changed repos, files, and a summary of changes
21
33
  - The requirements goal
22
- - `localUrl` per repo (from `muggle-repos.json`) — the locally running dev server URL
23
34
 
24
35
  ## Your Job
25
36
 
26
- ### Step 0: Resolve Local URL
37
+ ### Step 0: Consume pre-flight (no user questions)
38
+
39
+ Read `state.md`. If the validation strategy is `unit-only` or `skip`, **do not run this stage** — skip to stage 7 and record the skip reason. Otherwise use `localUrl` directly; **do not ask the user** for it.
40
+
41
+ If `localUrl` or `projectId` is missing from `state.md`, that is a pre-flight bug. **Do not paper over it by asking the user** — escalate once with the session path and halt. The fix is to expand `pre-flight.md`, not to grow a new question here.
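The Step 0 gate can be sketched as follows. This is a minimal sketch, assuming `state.md` keeps the `- Key: value` bullet layout that pre-flight writes; the helper names `state_field` and `e2e_gate` are hypothetical and not part of the package. How an `N/A` local URL should be treated under `staging-replay` is not specified in the source, so the sketch only checks for absent fields.

```shell
# Hypothetical helpers; assume state.md uses "- Key: value" bullets.

# Print the value of one bullet from a state file (empty if absent).
state_field() {
  # $1 = path to state.md, $2 = key, e.g. "Validation"
  sed -n "s/^- $2: //p" "$1" | head -n 1
}

# Decide what stage 6 should do: prints "run", "skip", or "halt".
e2e_gate() {
  strategy=$(state_field "$1" "Validation")
  url=$(state_field "$1" "Local URL")
  project=$(state_field "$1" "Muggle project")
  case "$strategy" in
    unit-only|skip) echo "skip"; return 0 ;;
  esac
  if [ -z "$url" ] || [ -z "$project" ]; then
    echo "halt"   # pre-flight bug: escalate once, do not ask the user
    return 0
  fi
  echo "run"
}
```

A driver might call `e2e_gate .muggle-do/sessions/<slug>/state.md` and branch on the printed word, so the decision never involves a user prompt.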
27
42
 
28
- Read `localUrl` for each repo from the context. If it is not provided, ask the user:
29
- > "E2E acceptance testing requires a running local server. What URL is the `<repo>` app running on? (e.g. `http://localhost:3000`)"
43
+ ### Step 0.5: Pre-flight verification probes
30
44
 
31
- **Do not skip E2E acceptance tests.** Wait for the user to provide the URL before proceeding.
45
+ Before launching Electron, run these live checks and fail loudly if any fails:
32
46
 
33
- ### Step 1: Check Authentication
47
+ 1. `curl -s -o /dev/null -w "%{http_code}" <localUrl>` — expect 2xx or 3xx. If the dev server isn't up, halt with the exact command the user needs to start it.
48
+ 2. If a backend URL is recorded, probe its health endpoint. A 5xx or unreachable backend means the dashboard will render in an error state and test results will be meaningless — halt.
49
+ 3. `muggle-remote-auth-status` — must be `authenticated`. If not, the pre-flight missed this; escalate.
50
+ 4. If test credentials were marked `existing`, confirm the Auth0 tenant in the repo's env matches the tenant the secrets were created under (recorded in `state.md`). Tenant mismatch → halt with "existing secrets target tenant X, local dev targets tenant Y — update pre-flight to collect new credentials."
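The first probe above reduces to classifying the status code that `curl` prints. A small sketch, with the hypothetical helper name `probe_verdict`; the only behaviour assumed beyond the source is that `curl` writes `000` for `%{http_code}` when no HTTP response arrived.

```shell
# Hypothetical helper: classify the code printed by
#   curl -s -o /dev/null -w "%{http_code}" <localUrl>
probe_verdict() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) echo "ok" ;;   # 2xx or 3xx: dev server is up
    000|"") echo "unreachable" ;;           # no HTTP response at all
    *) echo "fail" ;;                       # 4xx, 5xx, or anything else
  esac
}
```

Usage would look like `probe_verdict "$(curl -s -o /dev/null -w '%{http_code}' "$LOCAL_URL")"`, where `LOCAL_URL` is an illustrative variable holding the value read from `state.md`.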
34
51
 
35
- - `muggle-remote-auth-status`
36
- - If **authenticated**: print the logged-in email and ask via `AskQuestion`:
37
- > "You're logged in as **{email}**. Continue with this account?"
38
- - Option 1: "Yes, continue"
39
- - Option 2: "No, switch account"
40
- If the user picks "switch account", call `muggle-remote-auth-login` with `forceNewSession: true` then `muggle-remote-auth-poll`.
41
- - If **not signed in or expired**: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
52
+ ### Step 1: Authentication already verified
42
53
 
43
- Do not skip or assume auth.
54
+ Pre-flight handled auth. If `muggle-remote-auth-status` somehow shows expired here (session clock skew, etc.), re-auth silently via `muggle-remote-auth-login` + `muggle-remote-auth-poll` — but do not ask the user "continue with this account?" again.
44
55
 
45
56
  ### Step 2: Get Test Cases
46
57
 
@@ -1,7 +1,15 @@
1
- # Impact Analysis Agent
1
+ # Impact Analysis Agent (Stage 3/7)
2
2
 
3
3
  You are analyzing git repositories to determine which ones have actual code changes that need to go through the dev cycle pipeline.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 3/7 — Impact analysis** — diffing each affected repo against its default branch.
11
+ ```
12
+
5
13
  ## Input
6
14
 
7
15
  You receive:
@@ -1,7 +1,29 @@
1
- # PR Creation Agent
1
+ # PR Creation Agent (Stage 7/7)
2
2
 
3
3
  You are creating pull requests for each repository that has changes after a successful dev cycle run.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 7/7 — Open PR** — rendering the visual walkthrough and pushing the PR.
11
+ ```
12
+
13
+ ## Non-negotiable: visual walkthrough is required
14
+
15
+ **You MUST invoke `muggle-pr-visual-walkthrough` (Mode B) to render the E2E section of the PR body.** Hand-writing the PR body with a text summary and `gh pr create` is a stage failure — reviewers rely on the dashboard links and per-step screenshots the walkthrough produces.
16
+
17
+ If the E2E stage was skipped (validation was `unit-only` or `skip`), you may omit the walkthrough section — but mark the PR title with `[UNVERIFIED]` or `[UNIT-ONLY]` accordingly, and record the reason in the PR body under `## Validation`.
18
+
19
+ Before calling `gh pr create`, self-check:
20
+
21
+ - [ ] `muggle-pr-visual-walkthrough` was invoked (or the skip reason is recorded).
22
+ - [ ] The `body` returned by the skill is embedded verbatim in the PR body.
23
+ - [ ] If `comment` is non-null, it will be posted as a follow-up after the PR is created.
24
+
25
+ If you cannot check all three, **halt** — do not create the PR. Fix the upstream stage first.
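The title-marking rule can be sketched as a small mapping from validation outcome to tag. The helper name `pr_title` is hypothetical, and placing the tag as a prefix is an assumption; the source only says the title must carry the tag.

```shell
# Hypothetical helper: prefix a PR title with the tag the validation outcome calls for.
pr_title() {
  # $1 = validation strategy, $2 = E2E outcome (pass|fail|skipped), $3 = base title
  case "$1" in
    unit-only) tag="[UNIT-ONLY] " ;;
    skip)      tag="[UNVERIFIED] " ;;
    *)
      # E2E ran: tag only when it is still failing
      if [ "$2" = "fail" ]; then tag="[E2E FAILING] "; else tag=""; fi
      ;;
  esac
  printf '%s%s\n' "$tag" "$3"
}
```

Computing the tag from `state.md` values rather than by hand keeps the title consistent with what the `## Validation` section of the PR body records.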
26
+
5
27
  ## Input
6
28
 
7
29
  You receive:
@@ -0,0 +1,104 @@
1
+ # Pre-flight Agent (Stage 1/7)
2
+
3
+ You are running the **only user-facing stage** of the muggle-do dev cycle. Your job is to consolidate every ambiguity — task scope, repos, validation strategy, environment, credentials, PR target — into a **single turn** so the rest of the cycle can run unattended.
4
+
5
+ **Non-negotiable:** Never split pre-flight across multiple turns. Detect what you can silently, then ask every remaining question at once. If you find yourself asking a follow-up, you failed — fold the follow-up back into this file so the next run covers it.
6
+
7
+ ## Turn preamble
8
+
9
+ Start the turn with:
10
+
11
+ ```
12
+ **Stage 1/7 — Pre-flight** — consolidating everything the cycle needs before going silent.
13
+ ```
14
+
15
+ ## Input
16
+
17
+ You receive:
18
+
19
+ - The user's task description (from `$ARGUMENTS`).
20
+ - The list of configured repos (names + paths) from the Muggle config.
21
+ - Any session directory that already exists (resumption case).
22
+
23
+ ## Silent detection (do this first — no user prompts)
24
+
25
+ Before asking anything, gather every fact you can resolve without the user:
26
+
27
+ 1. **Candidate repo(s).** Match keywords in the task description against configured repo names. If one repo is an obvious match, propose it as the default; if two or three are plausible, list them.
28
+ 2. **Current branch and default branch** for each candidate repo. Run `git -C <repo> symbolic-ref refs/remotes/origin/HEAD --short` and `git -C <repo> branch --show-current`. If the current branch is the default, the pre-flight must collect a new branch name.
29
+ 3. **Running dev server.** Scan common dev ports — `lsof -iTCP -sTCP:LISTEN -nP | grep -E ':(3000|3001|3999|4200|5173|8080)'` — and hit `/` with `curl -s -o /dev/null -w "%{http_code}"` to confirm 2xx.
30
+ 4. **Running backend.** If the repo's `.env.local` (or equivalent) declares a backend URL (e.g. `REACT_APP_BACKEND_BASE_URL=http://localhost:5050`), probe the health endpoint; note up/down.
31
+ 5. **Muggle MCP auth.** Call `muggle-remote-auth-status`. If expired, you will ask to re-auth in the questionnaire.
32
+ 6. **Candidate Muggle projects.** Call `muggle-remote-project-list` and rank by semantic match against the task description and the repo's dev URL.
33
+ 7. **Existing test-user secrets.** For each candidate Muggle project, call `muggle-remote-secret-list` and note whether `managed_profile_email` / `managed_profile_password` exist.
34
+ 8. **Auth0 tenant in use for local dev.** Grep the repo's env file for `*AUTH0_DOMAIN*`; record the tenant. This tells the user whether the staging-tenant test user will work or not.
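As an illustration of item 8 above, the tenant lookup can be sketched as a single pipeline. The helper name `auth0_tenant` is hypothetical, and the sketch assumes a plain `KEY=value` env file in which the first variable whose name contains `AUTH0_DOMAIN` is the one that matters.

```shell
# Hypothetical helper: print the first AUTH0_DOMAIN-style value in an env file.
auth0_tenant() {
  # $1 = path to an env file such as .env.local
  grep -E '^[A-Za-z0-9_]*AUTH0_DOMAIN[A-Za-z0-9_]*=' "$1" \
    | head -n 1 \
    | cut -d= -f2- \
    | tr -d "\"'"   # strip surrounding quotes, if any
}
```

The printed value is what pre-flight would record, so stage 6 can later compare it against the tenant the managed secrets were created under.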
35
+
36
+ ## The consolidated questionnaire
37
+
38
+ Present **one `AskQuestion`** (or the platform's structured-selection equivalent) that collects every remaining decision. Use detected values as defaults whenever possible. Questions to include, in this order:
39
+
40
+ 1. **Task scope clarification** — only if the task description is genuinely ambiguous. Offer 2–3 interpretations as options plus "Other — type a clarification." If the task is unambiguous, omit.
41
+ 2. **Repo(s) to modify** — pre-selected with the best silent match. "Confirm <repo>" / "Change repo" / "Multi-repo (list them)".
42
+ 3. **Branch name** — default: `users/<user>/<slug>` derived from the task. "Use default" / "Use different name (type)".
43
+ 4. **Validation strategy** — the single most important question. Options:
44
+ - **Local E2E** (Muggle Electron against a running localhost) — default if a dev server was detected.
45
+ - **Staging replay** — for changes already deployed to a preview URL.
46
+ - **Unit tests only** — skip E2E, acceptable for pure refactors or backend-only changes.
47
+ - **Skip validation** — explicit opt-out; the PR title gets `[UNVERIFIED]`.
48
+ 5. **Local URL** — only if validation is Local E2E. Default: the detected port. "Confirm `<detected>`" / "Type a different URL".
49
+ 6. **Backend reachable?** — only if validation is Local E2E and a backend URL is declared. If the health probe failed, ask "Start the backend now and I'll re-probe" / "Proceed anyway" / "Skip to unit tests only".
50
+ 7. **Muggle project** — pre-selected with the best silent match. "Use <top match>" / "Use a different existing project (list)" / "Create new".
51
+ 8. **Test-user credentials** — only if validation is Local E2E AND the Auth0 tenant in the repo differs from the tenant the managed secrets were created under. Options: "Reuse existing secrets (may fail if tenant mismatch — will surface failure)" / "Create new secrets for this tenant (provide email + password)" / "Switch to staging replay".
52
+ 9. **PR target branch** — default: the repo's default branch. "Use default" / "Target a different branch".
53
+ 10. **Re-auth Muggle MCP?** — only if auth was missing/expired. "Log in now" / "Abort".
54
+
55
+ Even if only one of the above needs the user, still ask it in the same single turn; never open a second round.
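The default branch name in question 3 can be derived mechanically. A rough sketch follows; the slug rules (lower-case, runs of non-alphanumerics collapsed to `-`, capped at 40 characters) are assumptions, since the source only says the slug is derived from the task, and the helper name `default_branch_name` is hypothetical.

```shell
# Hypothetical helper: build users/<user>/<slug> from a task description.
default_branch_name() {
  # $1 = user name, $2 = task description
  slug=$(printf '%s' "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | cut -c1-40 \
    | sed 's/^-//; s/-$//')   # trim a leading or trailing dash
  printf 'users/%s/%s\n' "$1" "$slug"
}
```

For example, `default_branch_name alice 'Fix login bug!'` prints `users/alice/fix-login-bug`, which the questionnaire can then offer as the "Use default" option.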
56
+
57
+ ## Output
58
+
59
+ After the user answers, write **`state.md`** with every resolved value, verbatim, in this format:
60
+
61
+ ```markdown
62
+ # Session state
63
+
64
+ **Slug:** <slug>
65
+ **Current stage:** 1/7 (pre-flight complete)
66
+ **Last update:** <ISO-8601 timestamp>
67
+
68
+ ## Pre-flight answers
69
+
70
+ - Task: <one-line goal>
71
+ - Repos: <repo1>, <repo2>
72
+ - Branch: <branch-name>
73
+ - Validation: <strategy>
74
+ - Local URL: <url or N/A>
75
+ - Backend status: <up | down | N/A>
76
+ - Muggle project: <name> (<uuid>)
77
+ - Test credentials: <existing | new | skip>
78
+ - PR target: <branch>
79
+ - Auth status: <ok | re-authed | N/A>
80
+
81
+ ## Blockers
82
+ <none | bulleted list>
83
+ ```
84
+
85
+ Also initialize `iterations/001.md` with a header:
86
+
87
+ ```markdown
88
+ # Iteration 001 — <ISO-8601 timestamp>
89
+
90
+ ### Stage 1/7 — Pre-flight (<timestamp>)
91
+
92
+ <verbatim copy of pre-flight answers>
93
+ ```
94
+
95
+ ## Handoff
96
+
97
+ Return control to the muggle-do driver with a one-line summary: `pre-flight complete, proceeding silently through stages 2–7`. Do not print the pre-flight answers again — they are in `state.md` and the iteration log.
98
+
99
+ ## Non-negotiables
100
+
101
+ - Exactly **one** user turn. Zero follow-up questions inside this stage.
102
+ - Silent detection **must** run before the questionnaire — never ask for a value you can detect.
103
+ - Every detected value is a default, not a lock — the user can always override via "Type a different …".
104
+ - Missing `state.md` or `iterations/001.md` at the end of this stage is a stage failure.
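The last non-negotiable can be checked mechanically before handoff. A minimal sketch, with hypothetical helper names `iteration_file` and `preflight_outputs_ok`; the session directory layout follows the paths named in the text.

```shell
# Hypothetical helpers for the end-of-stage check.

# Path of the log for iteration N, zero-padded to three digits.
iteration_file() {
  # $1 = session directory, $2 = iteration number (plain decimal, no leading zeros)
  printf '%s/iterations/%03d.md\n' "$1" "$2"
}

# Succeeds only if both files pre-flight must leave behind exist.
preflight_outputs_ok() {
  # $1 = session directory
  [ -f "$1/state.md" ] && [ -f "$(iteration_file "$1" 1)" ]
}
```

A driver could run `preflight_outputs_ok .muggle-do/sessions/<slug> || echo "stage failure"` right before returning control.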
@@ -1,7 +1,17 @@
1
- # Requirements Analysis Agent
1
+ # Requirements Analysis Agent (Stage 2/7)
2
2
 
3
3
  You are analyzing a user's task description to extract structured requirements for an autonomous development cycle.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 2/7 — Requirements** — extracting structured goals from the pre-flight-clarified task.
11
+ ```
12
+
13
+ Pre-flight already resolved ambiguity via the consolidated questionnaire. **Do not ask the user any questions here** — infer silently and record assumptions in Notes.
14
+
5
15
  ## Input
6
16
 
7
17
  You receive:
@@ -1,7 +1,15 @@
1
- # Unit Test Runner Agent
1
+ # Unit Test Runner Agent (Stage 5/7)
2
2
 
3
3
  You are running unit tests for each repository that has changes in the dev cycle pipeline.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 5/7 — Unit tests** — running each repo's test suite.
11
+ ```
12
+
5
13
  ## Input
6
14
 
7
15
  You receive:
@@ -1,7 +1,15 @@
1
- # Code Validation Agent
1
+ # Code Validation Agent (Stage 4/7)
2
2
 
3
3
  You are validating that each repository's git state is ready for the dev cycle pipeline.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 4/7 — Validate code** — checking branch and commit state for each repo with changes.
11
+ ```
12
+
5
13
  ## Input
6
14
 
7
15
  You receive:
@@ -6,9 +6,9 @@ disable-model-invocation: true
6
6
 
7
7
  # Muggle Do
8
8
 
9
- Muggle Do is the command for the Muggle AI development workflow.
9
+ Muggle Do runs a battle-tested autonomous dev cycle: **pre-flight → requirements → impact analysis → validate code → unit tests → E2E acceptance → open PR**.
10
10
 
11
- It runs a battle-tested autonomous dev cycle: requirements -> impact analysis -> validate code -> coding -> unit tests -> E2E acceptance tests -> open PRs.
11
+ The design goal is **fire and review**: the user answers one consolidated pre-flight questionnaire, then walks away. Every subsequent stage runs unattended until completion or a genuine blocker.
12
12
 
13
13
  For maintenance tasks, use the dedicated skills:
14
14
 
@@ -20,34 +20,78 @@ For maintenance tasks, use the dedicated skills:
20
20
 
21
21
  Treat `$ARGUMENTS` as the user command:
22
22
 
23
- - Empty / `help` / `menu` / `?` -> show menu and session selector.
24
- - Anything else -> treat as a new task description and start/resume a dev-cycle session.
23
+ - Empty / `help` / `menu` / `?` → show menu and session selector.
24
+ - Anything else → treat as a new task description and start/resume a dev-cycle session.
25
+
26
+ ## The seven stages
27
+
28
+ | # | Stage | File | User-facing? |
29
+ | :- | :---- | :--- | :----------- |
30
+ | 1 | Pre-flight | [../do/pre-flight.md](../do/pre-flight.md) | **Yes — single consolidated turn** |
31
+ | 2 | Requirements | [../do/requirements.md](../do/requirements.md) | No |
32
+ | 3 | Impact analysis | [../do/impact-analysis.md](../do/impact-analysis.md) | No |
33
+ | 4 | Validate code | [../do/validate-code.md](../do/validate-code.md) | No |
34
+ | 5 | Unit tests | [../do/unit-tests.md](../do/unit-tests.md) | No |
35
+ | 6 | E2E acceptance | [../do/e2e-acceptance.md](../do/e2e-acceptance.md) | No |
36
+ | 7 | Open PR | [../do/open-prs.md](../do/open-prs.md) | No |
37
+
38
+ **Stage 1 (pre-flight) is the ONLY stage that talks to the user.** Stages 2–7 run silently to completion. If a later stage hits a genuine blocker that the pre-flight didn't cover, escalate with a single terminal message — do not open a second round of questions.
39
+
40
+ ## Front-loading (stage 1 non-negotiable)
41
+
42
+ All ambiguity — task scope, repo selection, validation strategy, localhost URL, backend health, Muggle project, test-user credentials, branch name, PR target — is resolved in a **single** pre-flight turn. See `pre-flight.md` for the exact questionnaire.
43
+
44
+ **Red-flag behaviors (do not do):**
45
+
46
+ - Asking a clarifying question mid-cycle because "I didn't think of that at pre-flight."
47
+ - Starting a dev server mid-cycle and discovering the port is wrong.
48
+ - Reaching the E2E stage before knowing how the user wants it validated.
49
+ - Asking the user to "pick one" across multiple turns instead of one turn.
50
+
51
+ If any of these happen, the pre-flight was incomplete — treat it as a skill bug, not a user bug, and expand `pre-flight.md` to cover the missed case after the run.
25
52
 
26
53
  ## Session model
27
54
 
28
- Use `.muggle-do/sessions/<slug>/` with these files:
55
+ Every run writes to `.muggle-do/sessions/<slug>/`:
29
56
 
30
- - `state.md`
31
- - `requirements.md`
32
- - `iterations/<NNN>.md`
33
- - `result.md`
57
+ - `state.md` — one-screen live status: current stage (N/7), last update timestamp, pre-flight answers verbatim, any blockers.
58
+ - `iterations/<NNN>.md` — append-only log of stage transitions for iteration NNN: what ran, what was decided, what artifacts were produced.
59
+ - `requirements.md` — frozen output of stage 2.
60
+ - `result.md` — final summary written by stage 7 (PR URLs, E2E outcome, open issues).
34
61
 
35
- On each stage transition, update `state.md` and append stage output to the active iteration file.
62
+ **On every stage transition, you MUST:**
36
63
 
37
- ## Dev cycle agents
64
+ 1. Append a dated entry to the active `iterations/<NNN>.md`: `### Stage N/7 — <name> (<timestamp>)` followed by the stage's output.
65
+ 2. Rewrite `state.md` to reflect the new current stage and any relevant counters.
38
66
 
39
- Use the supporting files in the `../do/` directory as stage-specific instructions:
67
+ If these files don't exist, create them — missing session files means the user lost visibility into the cycle, which is the exact failure mode this skill exists to prevent.
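The `state.md` half of a stage transition can be sketched as an in-place rewrite of two header lines. This assumes the `**Current stage:**` and `**Last update:**` lines from the pre-flight template; the helper name `set_current_stage` is hypothetical, and appending the dated entry to the iteration log is left out of the sketch.

```shell
# Hypothetical helper: rewrite the "Current stage" and "Last update" lines.
set_current_stage() {
  # $1 = path to state.md, $2 = stage number, $3 = ISO-8601 timestamp
  tmp="$1.tmp"
  sed -e "s|^\*\*Current stage:\*\*.*|**Current stage:** $2/7|" \
      -e "s|^\*\*Last update:\*\*.*|**Last update:** $3|" "$1" > "$tmp" \
    && mv "$tmp" "$1"   # replace the file only if sed succeeded
}
```

Writing to a temporary file first means a failed rewrite leaves the previous `state.md` untouched, which matters because that file is the user's only live view of the cycle.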
40
68
 
41
- - [requirements.md](../do/requirements.md)
42
- - [impact-analysis.md](../do/impact-analysis.md)
43
- - [validate-code.md](../do/validate-code.md)
44
- - [unit-tests.md](../do/unit-tests.md)
45
- - [e2e-acceptance.md](../do/e2e-acceptance.md)
46
- - [open-prs.md](../do/open-prs.md)
69
+ ## Turn preamble
70
+
71
+ Each stage turn MUST begin with one line in this form before any other output:
72
+
73
+ ```
74
+ **Stage N/7 — <stage name>** — <one-line intent>
75
+ ```
76
+
77
+ This is how the user can tell, at a glance, what phase the cycle is in without parsing a long response.
47
78
 
48
79
  ## Guardrails
49
80
 
50
- - Do not skip unit tests before E2E acceptance tests.
51
- - Do not skip E2E acceptance tests due to missing scripts; generate when needed.
52
- - If the same stage fails 3 times in a row, escalate with details.
53
- - If total iterations reach 3 and E2E acceptance tests still fail, continue to PR creation with `[E2E FAILING]`.
81
+ - **No mid-cycle user questions.** Anything not covered by pre-flight is a skill bug; escalate once, do not loop.
82
+ - **Do not skip unit tests before E2E acceptance tests.**
83
+ - **Do not skip E2E acceptance tests due to missing scripts**; generate them when needed.
84
+ - **Do not hand-write the E2E block of the PR body.** The `open-prs.md` stage MUST invoke `muggle-pr-visual-walkthrough` Mode B to render the screenshots-and-steps section. Hand-writing it loses the dashboard links the user relies on for review.
85
+ - **If the same stage fails 3 times in a row, escalate with details.**
86
+ - **If total iterations reach 3 and E2E acceptance tests still fail**, continue to PR creation with `[E2E FAILING]` in the title; the visual walkthrough section makes the failures reviewable.
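The two failure guardrails above can be read as a small decision function. This is only a sketch: the helper name `next_action` and its output words are hypothetical, and giving the "3 failures in a row" rule precedence over the "3 iterations" rule is an assumption the source does not settle.

```shell
# Hypothetical helper: pick the next move after a stage finishes.
next_action() {
  # $1 = consecutive failures of the current stage
  # $2 = total iterations so far
  # $3 = latest E2E outcome (pass|fail)
  if [ "$1" -ge 3 ]; then
    echo "escalate"               # same stage failed 3 times in a row
  elif [ "$3" = "fail" ] && [ "$2" -ge 3 ]; then
    echo "open-pr-e2e-failing"    # go to stage 7 with [E2E FAILING] in the title
  elif [ "$3" = "fail" ]; then
    echo "iterate"
  else
    echo "continue"
  fi
}
```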
87
+
88
+ ## Completion contract
89
+
90
+ When stage 7 finishes, the final message to the user contains at minimum:
91
+
92
+ - PR URL(s)
93
+ - E2E status (passing / `[E2E FAILING]`)
94
+ - Link to the run dashboard for each test case (via the walkthrough skill output)
95
+ - Path to `result.md` for full details
96
+
97
+ No other content. The user already read the walkthrough in the PR body — do not re-summarize it here.
@@ -5,17 +5,29 @@ description: Update Muggle AI to latest version. Use when user types muggle upgr
5
5
 
6
6
  # Muggle Upgrade
7
7
 
8
- Update all Muggle AI components to the latest published version.
8
+ Update all Muggle AI components to the latest published version. This means **both** the `@muggleai/works` CLI on npm **and** the Electron runner the CLI manages.
9
9
 
10
10
  ## Steps
11
11
 
12
12
  1. Run `/muggle:muggle-status` checks to capture current versions.
13
- 2. Run `muggle upgrade` to check GitHub releases for the latest electron-app version and download it.
14
- 3. Report the upgrade results:
15
- - Previous version vs new version for each component.
16
- - Whether the upgrade succeeded or failed.
17
- 4. Run `/muggle:muggle-status` again to confirm everything is healthy after upgrade.
13
+
14
+ 2. Capture CLI versions:
15
+ - Installed CLI: `muggle --version`
16
+ - Latest on npm: `npm view @muggleai/works version`
17
+ - Detect install location: `npm ls -g @muggleai/works --depth=0` (falls back to `pnpm ls -g @muggleai/works` if not found)
18
+
19
+ 3. **If installed CLI < latest on npm**, upgrade the CLI itself before touching Electron:
20
+ - npm global install: `npm install -g @muggleai/works@latest`
21
+ - pnpm global install: `pnpm add -g @muggleai/works@latest`
22
+ - If neither is detected, report the situation and ask the user how the CLI was installed before proceeding.
23
+
24
+ 4. Run `muggle upgrade` to pull the Electron runner version that the (now-latest) CLI expects.
25
+ - Note: `muggle upgrade` only manages the Electron runner — it does NOT upgrade the CLI npm package. That is why step 3 must run first.
26
+
27
+ 5. Run `/muggle:muggle-status` again to confirm everything is healthy after upgrade.
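The "installed CLI < latest on npm" test in step 3 needs a numeric comparison, because a plain string comparison would rank `4.10.0` below `4.8.3`. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix; the helper name `version_lt` is hypothetical.

```shell
# Hypothetical helper: succeed when version $1 is strictly lower than $2.
# Handles plain MAJOR.MINOR.PATCH only.
version_lt() {
  a=$1
  b=$2
  for i in 1 2 3; do
    x=${a%%.*}   # leading component of a
    y=${b%%.*}   # leading component of b
    if [ "$x" -lt "$y" ]; then return 0; fi
    if [ "$x" -gt "$y" ]; then return 1; fi
    a=${a#*.}    # drop the leading component
    b=${b#*.}
  done
  return 1       # versions are equal
}
```

A caller would pass in the two versions captured in step 2, for example `version_lt 4.8.1 4.8.3 && echo "upgrade CLI first"`. Whether `muggle --version` prints a bare version string is not stated in the source, so its output may need trimming before it is passed in.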
18
28
 
19
29
  ## Output
20
30
 
21
- Show a before/after version comparison. If the upgrade fails at any step, report the error and suggest running `/muggle:muggle-repair`.
31
+ Show a before/after table for **CLI**, **Electron runner**, **MCP server**, and **Auth**. Call out any version that did not change so the user understands what shipped vs what was already current.
32
+
33
+ If the upgrade fails at any step, report the error and suggest running `/muggle:muggle-repair`.
@@ -1,7 +1,7 @@
1
1
  {
2
- "release": "4.8.1",
3
- "buildId": "run-20-1",
4
- "commitSha": "8cd42b4b70c049e44510010003084227b04229a8",
5
- "buildTime": "2026-04-14T21:44:51Z",
2
+ "release": "4.8.3",
3
+ "buildId": "run-23-1",
4
+ "commitSha": "db2ab1ae29f982b0a12db6dbd2a334cc8016f016",
5
+ "buildTime": "2026-04-14T23:24:16Z",
6
6
  "serviceName": "muggle-ai-works-mcp"
7
7
  }
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "@muggleai/works",
3
3
  "mcpName": "io.github.multiplex-ai/muggle",
4
- "version": "4.8.1",
4
+ "version": "4.8.3",
5
5
  "description": "Ship quality products with AI-powered E2E acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
6
6
  "type": "module",
7
7
  "main": "dist/index.js",
@@ -41,14 +41,14 @@
41
41
  "test:watch": "vitest"
42
42
  },
43
43
  "muggleConfig": {
44
- "electronAppVersion": "1.0.59",
44
+ "electronAppVersion": "1.0.60",
45
45
  "downloadBaseUrl": "https://github.com/multiplex-ai/muggle-ai-works/releases/download",
46
46
  "runtimeTargetDefault": "production",
47
47
  "checksums": {
48
- "darwin-arm64": "c6da3f7f6b6875174a70a6c065554ed051ff99f731470e161ab71d9a9e568a87",
49
- "darwin-x64": "ff93b24724fd415b99c40bcce2cc90a31e453a6408aad45a63480ff92606e7b0",
50
- "win32-x64": "59bc67ea0a067fb4a204b87c73bf8cc387e705307f77ec96202822ff14b587fa",
51
- "linux-x64": "714c93f586ac423377d9061924d2f703c175e3b541ef6ad22c96f6c20d3f4ea0"
48
+ "darwin-arm64": "a1e084b32ebc28fa823bcc9161959bd3618f0c5e45f41f946b5efa9192b8b15b",
49
+ "darwin-x64": "0605176a152b12149015b500b5d04dfaec06a5f3ed4d615d5442437cbaa2ba3a",
50
+ "win32-x64": "f6c9a243894720420229d37ff8915827e56c5aa2f21dd887a0f476d952698ad8",
51
+ "linux-x64": "4404163f56b664c1d8c040435ef981d1ff6583e2ba2e4b38ea4375afecd1c204"
52
52
  }
53
53
  },
54
54
  "dependencies": {
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "muggle",
3
3
  "description": "Run real-browser end-to-end (E2E) acceptance tests on your web app from any AI coding agent. Generate test scripts from plain English, replay them on localhost, capture screenshots, and validate user flows like signup, checkout, and dashboards. Works across Claude Code, Cursor, Codex, and Windsurf.",
4
- "version": "4.8.1",
4
+ "version": "4.8.3",
5
5
  "author": {
6
6
  "name": "Muggle AI",
7
7
  "email": "support@muggle-ai.com"
@@ -2,7 +2,7 @@
2
2
  "name": "muggle",
3
3
  "displayName": "Muggle AI",
4
4
  "description": "Ship quality products with AI-powered end-to-end (E2E) acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
5
- "version": "4.8.1",
5
+ "version": "4.8.3",
6
6
  "author": {
7
7
  "name": "Muggle AI",
8
8
  "email": "support@muggle-ai.com"
@@ -1,7 +1,15 @@
1
- # E2E / acceptance agent
1
+ # E2E / acceptance agent (Stage 6/7)
2
2
 
3
3
  You are running **end-to-end (E2E) acceptance** test cases against code changes using Muggle AI's local testing infrastructure. These tests simulate real users in a browser — they are not unit tests.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 6/7 — E2E acceptance** — running browser tests against the validation target from pre-flight.
11
+ ```
12
+
5
13
  ## Design
6
14
 
7
15
  E2E acceptance testing runs **locally** using the `test-feature-local` approach:
@@ -15,32 +23,35 @@ This guarantees E2E acceptance tests always run — no dependency on cloud repla
15
23
 
16
24
  ## Input
17
25
 
18
- You receive:
19
- - The Muggle project ID
26
+ You receive everything from `state.md` already — pre-flight resolved it:
27
+
28
+ - `localUrl` — the locally running dev server URL
29
+ - `projectId` — the chosen Muggle project
30
+ - The validation strategy (`local-e2e`, `staging-replay`, `unit-only`, `skip`)
31
+ - Test-user credential status (existing / new / skip)
20
32
  - The list of changed repos, files, and a summary of changes
21
33
  - The requirements goal
22
- - `localUrl` per repo (from `muggle-repos.json`) — the locally running dev server URL
23
34
 
24
35
  ## Your Job
25
36
 
26
- ### Step 0: Resolve Local URL
37
+ ### Step 0: Consume pre-flight (no user questions)
38
+
39
+ Read `state.md`. If the validation strategy is `unit-only` or `skip`, **do not run this stage** — skip to stage 7 and record the skip reason. Otherwise use `localUrl` directly; **do not ask the user** for it.
40
+
41
+ If `localUrl` or `projectId` is missing from `state.md`, that is a pre-flight bug. **Do not paper over it by asking the user** — escalate once with the session path and halt. The fix is to expand `pre-flight.md`, not to grow a new question here.
27
42
 
28
- Read `localUrl` for each repo from the context. If it is not provided, ask the user:
29
- > "E2E acceptance testing requires a running local server. What URL is the `<repo>` app running on? (e.g. `http://localhost:3000`)"
43
+ ### Step 0.5: Pre-flight verification probes
30
44
 
31
- **Do not skip E2E acceptance tests.** Wait for the user to provide the URL before proceeding.
45
+ Before launching Electron, run these live checks and fail loudly if any fails:
32
46
 
33
- ### Step 1: Check Authentication
47
+ 1. `curl -s -o /dev/null -w "%{http_code}" <localUrl>` — expect 2xx or 3xx. If the dev server isn't up, halt with the exact command the user needs to start it.
48
+ 2. If a backend URL is recorded, probe its health endpoint. A 5xx or unreachable backend means the dashboard will render in an error state and test results will be meaningless — halt.
49
+ 3. `muggle-remote-auth-status` — must be `authenticated`. If not, the pre-flight missed this; escalate.
50
+ 4. If test credentials were marked `existing`, confirm the Auth0 tenant in the repo's env matches the tenant the secrets were created under (recorded in `state.md`). Tenant mismatch → halt with "existing secrets target tenant X, local dev targets tenant Y — update pre-flight to collect new credentials."
34
51
 
35
- - `muggle-remote-auth-status`
36
- - If **authenticated**: print the logged-in email and ask via `AskQuestion`:
37
- > "You're logged in as **{email}**. Continue with this account?"
38
- - Option 1: "Yes, continue"
39
- - Option 2: "No, switch account"
40
- If the user picks "switch account", call `muggle-remote-auth-login` with `forceNewSession: true` then `muggle-remote-auth-poll`.
41
- - If **not signed in or expired**: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
52
+ ### Step 1: Authentication already verified
42
53
 
43
- Do not skip or assume auth.
54
+ Pre-flight handled auth. If `muggle-remote-auth-status` somehow shows expired here (session clock skew, etc.), re-auth silently via `muggle-remote-auth-login` + `muggle-remote-auth-poll` — but do not ask the user "continue with this account?" again.
44
55
 
45
56
  ### Step 2: Get Test Cases
46
57
 
@@ -1,7 +1,15 @@
1
- # Impact Analysis Agent
1
+ # Impact Analysis Agent (Stage 3/7)
2
2
 
3
3
  You are analyzing git repositories to determine which ones have actual code changes that need to go through the dev cycle pipeline.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 3/7 — Impact analysis** — diffing each affected repo against its default branch.
11
+ ```
12
+
5
13
  ## Input
6
14
 
7
15
  You receive:
@@ -1,7 +1,29 @@
1
- # PR Creation Agent
1
+ # PR Creation Agent (Stage 7/7)
2
2
 
3
3
  You are creating pull requests for each repository that has changes after a successful dev cycle run.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 7/7 — Open PR** — rendering the visual walkthrough and pushing the PR.
11
+ ```
12
+
13
+ ## Non-negotiable: visual walkthrough is required
14
+
15
+ **You MUST invoke `muggle-pr-visual-walkthrough` (Mode B) to render the E2E section of the PR body.** Hand-writing the PR body with a text summary and `gh pr create` is a stage failure — reviewers rely on the dashboard links and per-step screenshots the walkthrough produces.
16
+
17
+ If the E2E stage was skipped (validation was `unit-only` or `skip`), you may omit the walkthrough section — but mark the PR title with `[UNVERIFIED]` or `[UNIT-ONLY]` accordingly, and record the reason in the PR body under `## Validation`.
18
+
19
+ Before calling `gh pr create`, self-check:
20
+
21
+ - [ ] `muggle-pr-visual-walkthrough` was invoked (or the skip reason is recorded).
22
+ - [ ] The `body` returned by the skill is embedded verbatim in the PR body.
23
+ - [ ] If `comment` is non-null, it will be posted as a follow-up after the PR is created.
24
+
25
+ If you cannot check all three, **halt** — do not create the PR. Fix the upstream stage first.
26
+
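The self-check and title tagging described above can be sketched as a small gate. This is only an illustration: `PrDraft`, `tagged_title`, `may_create_pr`, and the idea of checking the body by substring are assumptions, not part of the package. The third checklist item (posting `comment` as a follow-up) happens after the PR exists, so it is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional

# Tags taken from the text above; keying them by validation string is an assumption.
TITLE_TAGS = {"unit-only": "[UNIT-ONLY]", "skip": "[UNVERIFIED]"}


@dataclass
class PrDraft:
    """Hypothetical container for what the stage has assembled so far."""
    title: str
    body: str
    validation: str                          # "unit-only", "skip", or an E2E strategy
    walkthrough_body: Optional[str] = None   # `body` returned by the walkthrough skill
    skip_reason: Optional[str] = None        # recorded under "## Validation" when E2E was skipped


def tagged_title(draft: PrDraft) -> str:
    """Prefix the title with the tag required when E2E was skipped."""
    tag = TITLE_TAGS.get(draft.validation)
    return f"{tag} {draft.title}" if tag else draft.title


def may_create_pr(draft: PrDraft) -> bool:
    """Return True only if the self-check passes; otherwise the stage should halt."""
    if draft.validation in TITLE_TAGS:
        # E2E was skipped: the reason must be recorded in the PR body.
        return bool(draft.skip_reason) and draft.skip_reason in draft.body
    # E2E ran: the walkthrough output must be embedded verbatim.
    return bool(draft.walkthrough_body) and draft.walkthrough_body in draft.body
```

A hand-written body with no walkthrough output fails the gate, which matches the "halt" rule above.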
5
27
  ## Input
6
28
 
7
29
  You receive:
@@ -0,0 +1,104 @@
1
+ # Pre-flight Agent (Stage 1/7)
2
+
3
+ You are running the **only user-facing stage** of the muggle-do dev cycle. Your job is to consolidate every ambiguity — task scope, repos, validation strategy, environment, credentials, PR target — into a **single turn** so the rest of the cycle can run unattended.
4
+
5
+ **Non-negotiable:** Never split pre-flight across multiple turns. Detect what you can silently, then ask every remaining question at once. If you find yourself asking a follow-up, you failed — fold the follow-up back into this file so the next run covers it.
6
+
7
+ ## Turn preamble
8
+
9
+ Start the turn with:
10
+
11
+ ```
12
+ **Stage 1/7 — Pre-flight** — consolidating everything the cycle needs before going silent.
13
+ ```
14
+
15
+ ## Input
16
+
17
+ You receive:
18
+
19
+ - The user's task description (from `$ARGUMENTS`).
20
+ - The list of configured repos (names + paths) from the Muggle config.
21
+ - Any session directory that already exists (resumption case).
22
+
23
+ ## Silent detection (do this first — no user prompts)
24
+
25
+ Before asking anything, gather every fact you can resolve without the user:
26
+
27
+ 1. **Candidate repo(s).** Match keywords in the task description against configured repo names. If one repo is an obvious match, propose it as the default; if two or three are plausible, list them.
28
+ 2. **Current branch and default branch** for each candidate repo. Run `git -C <repo> symbolic-ref refs/remotes/origin/HEAD --short` and `git -C <repo> branch --show-current`. If the current branch is the default, the pre-flight must collect a new branch name.
29
+ 3. **Running dev server.** Scan common dev ports — `lsof -iTCP -sTCP:LISTEN -nP | grep -E ':(3000|3001|3999|4200|5173|8080)'` — and hit `/` with `curl -s -o /dev/null -w "%{http_code}"` to confirm 2xx.
30
+ 4. **Running backend.** If the repo's `.env.local` (or equivalent) declares a backend URL (e.g. `REACT_APP_BACKEND_BASE_URL=http://localhost:5050`), probe the health endpoint; note up/down.
31
+ 5. **Muggle MCP auth.** Call `muggle-remote-auth-status`. If expired, you will ask to re-auth in the questionnaire.
32
+ 6. **Candidate Muggle projects.** Call `muggle-remote-project-list` and rank by semantic match against the task description and the repo's dev URL.
33
+ 7. **Existing test-user secrets.** For each candidate Muggle project, call `muggle-remote-secret-list` and note whether `managed_profile_email` / `managed_profile_password` exist.
34
+ 8. **Auth0 tenant in use for local dev.** Grep the repo's env file for `*AUTH0_DOMAIN*`; record the tenant. This tells the user whether the staging-tenant test user will work or not.
35
+
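The dev-server scan in step 3 can also be sketched without `lsof`. The following is a minimal sketch, assuming a plain TCP connect is an acceptable stand-in for the listening-socket check; the function name `listening_ports` and the timeout are hypothetical. It does not replace the `curl` probe that confirms a 2xx response.

```python
import socket

# Port list copied from the grep pattern in step 3.
COMMON_DEV_PORTS = (3000, 3001, 3999, 4200, 5173, 8080)


def listening_ports(ports=COMMON_DEV_PORTS, host="127.0.0.1", timeout=0.25):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Refused or timed out: treat the port as not in use.
            pass
    return found
```

The first port returned could serve as the default for the "Local URL" question, with the user still free to override it.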
36
+ ## The consolidated questionnaire
37
+
38
+ Present **one `AskQuestion`** (or the platform's structured-selection equivalent) that collects every remaining decision. Use detected values as defaults whenever possible. Questions to include, in this order:
39
+
40
+ 1. **Task scope clarification** — only if the task description is genuinely ambiguous. Offer 2–3 interpretations as options plus "Other — type a clarification." If the task is unambiguous, omit.
41
+ 2. **Repo(s) to modify** — pre-selected with the best silent match. "Confirm <repo>" / "Change repo" / "Multi-repo (list them)".
42
+ 3. **Branch name** — default: `users/<user>/<slug>` derived from the task. "Use default" / "Use different name (type)".
43
+ 4. **Validation strategy** — the single most important question. Options:
44
+ - **Local E2E** (Muggle Electron against a running localhost) — default if a dev server was detected.
45
+ - **Staging replay** — for changes already deployed to a preview URL.
46
+ - **Unit tests only** — skip E2E, acceptable for pure refactors or backend-only changes.
47
+ - **Skip validation** — explicit opt-out; the PR title gets `[UNVERIFIED]`.
48
+ 5. **Local URL** — only if validation is Local E2E. Default: the detected port. "Confirm `<detected>`" / "Type a different URL".
49
+ 6. **Backend reachable?** — only if validation is Local E2E and a backend URL is declared. If the health probe failed, ask "Start the backend now and I'll re-probe" / "Proceed anyway" / "Skip to unit tests only".
50
+ 7. **Muggle project** — pre-selected with the best silent match. "Use <top match>" / "Use a different existing project (list)" / "Create new".
51
+ 8. **Test-user credentials** — only if validation is Local E2E AND the Auth0 tenant in the repo differs from the tenant the managed secrets were created under. Options: "Reuse existing secrets (may fail if tenant mismatch — will surface failure)" / "Create new secrets for this tenant (provide email + password)" / "Switch to staging replay".
52
+ 9. **PR target branch** — default: the repo's default branch. "Use default" / "Target a different branch".
53
+ 10. **Re-auth Muggle MCP?** — only if auth was missing/expired. "Log in now" / "Abort".
54
+
55
+ Even if only one or two of the above need the user's input, still gather them in a single turn; never open a second round.
56
+
57
+ ## Output
58
+
59
+ After the user answers, write **`state.md`** with every resolved value, verbatim, in this format:
60
+
61
+ ```markdown
62
+ # Session state
63
+
64
+ **Slug:** <slug>
65
+ **Current stage:** 1/7 (pre-flight complete)
66
+ **Last update:** <ISO-8601 timestamp>
67
+
68
+ ## Pre-flight answers
69
+
70
+ - Task: <one-line goal>
71
+ - Repos: <repo1>, <repo2>
72
+ - Branch: <branch-name>
73
+ - Validation: <strategy>
74
+ - Local URL: <url or N/A>
75
+ - Backend status: <up | down | N/A>
76
+ - Muggle project: <name> (<uuid>)
77
+ - Test credentials: <existing | new | skip>
78
+ - PR target: <branch>
79
+ - Auth status: <ok | re-authed | N/A>
80
+
81
+ ## Blockers
82
+ <none | bulleted list>
83
+ ```
84
+
85
+ Also initialize `iterations/001.md` with a header:
86
+
87
+ ```markdown
88
+ # Iteration 001 — <ISO-8601 timestamp>
89
+
90
+ ### Stage 1/7 — Pre-flight (<timestamp>)
91
+
92
+ <verbatim copy of pre-flight answers>
93
+ ```
94
+
95
+ ## Handoff
96
+
97
+ Return control to the muggle-do driver with a one-line summary: `pre-flight complete, proceeding silently through stages 2–7`. Do not print the pre-flight answers again — they are in `state.md` and the iteration log.
98
+
99
+ ## Non-negotiables
100
+
101
+ - Exactly **one** user turn. Zero follow-up questions inside this stage.
102
+ - Silent detection **must** run before the questionnaire — never ask for a value you can detect.
103
+ - Every detected value is a default, not a lock — the user can always override via "Type a different …".
104
+ - Missing `state.md` or `iterations/001.md` at the end of this stage is a stage failure.
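The `state.md` template from the Output section can be filled mechanically. This is a minimal sketch, assuming the answers arrive as a dictionary keyed by the field labels in the template; `render_state` and `FIELDS` are hypothetical names, and a missing answer raises `KeyError` rather than being silently defaulted.

```python
from datetime import datetime, timezone

# Field labels in the order the template lists them.
FIELDS = [
    "Task", "Repos", "Branch", "Validation", "Local URL", "Backend status",
    "Muggle project", "Test credentials", "PR target", "Auth status",
]


def render_state(slug, answers, blockers=(), now=None):
    """Render state.md text for a completed pre-flight stage."""
    now = now or datetime.now(timezone.utc)
    lines = [
        "# Session state",
        "",
        f"**Slug:** {slug}",
        "**Current stage:** 1/7 (pre-flight complete)",
        f"**Last update:** {now.isoformat()}",
        "",
        "## Pre-flight answers",
        "",
    ]
    # answers[name] fails loudly if a value was never resolved.
    lines += [f"- {name}: {answers[name]}" for name in FIELDS]
    lines += ["", "## Blockers"]
    lines += [f"- {b}" for b in blockers] or ["none"]
    return "\n".join(lines) + "\n"
```

Writing the returned text to `.muggle-do/sessions/<slug>/state.md` is left to the caller.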
@@ -1,7 +1,17 @@
1
- # Requirements Analysis Agent
1
+ # Requirements Analysis Agent (Stage 2/7)
2
2
 
3
3
  You are analyzing a user's task description to extract structured requirements for an autonomous development cycle.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 2/7 — Requirements** — extracting structured goals from the pre-flight-clarified task.
11
+ ```
12
+
13
+ Pre-flight already resolved ambiguity via the consolidated questionnaire. **Do not ask the user any questions here** — infer silently and record assumptions in Notes.
14
+
5
15
  ## Input
6
16
 
7
17
  You receive:
@@ -1,7 +1,15 @@
1
- # Unit Test Runner Agent
1
+ # Unit Test Runner Agent (Stage 5/7)
2
2
 
3
3
  You are running unit tests for each repository that has changes in the dev cycle pipeline.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 5/7 — Unit tests** — running each repo's test suite.
11
+ ```
12
+
5
13
  ## Input
6
14
 
7
15
  You receive:
@@ -1,7 +1,15 @@
1
- # Code Validation Agent
1
+ # Code Validation Agent (Stage 4/7)
2
2
 
3
3
  You are validating that each repository's git state is ready for the dev cycle pipeline.
4
4
 
5
+ ## Turn preamble
6
+
7
+ Start the turn with:
8
+
9
+ ```
10
+ **Stage 4/7 — Validate code** — checking branch and commit state for each repo with changes.
11
+ ```
12
+
5
13
  ## Input
6
14
 
7
15
  You receive:
@@ -6,9 +6,9 @@ disable-model-invocation: true
6
6
 
7
7
  # Muggle Do
8
8
 
9
- Muggle Do is the command for the Muggle AI development workflow.
9
+ Muggle Do runs a battle-tested autonomous dev cycle: **pre-flight → requirements → impact analysis → validate code → unit tests → E2E acceptance → open PR**.
10
10
 
11
- It runs a battle-tested autonomous dev cycle: requirements -> impact analysis -> validate code -> coding -> unit tests -> E2E acceptance tests -> open PRs.
11
+ The design goal is **fire and review**: the user answers one consolidated pre-flight questionnaire, then walks away. Every subsequent stage runs unattended until completion or a genuine blocker.
12
12
 
13
13
  For maintenance tasks, use the dedicated skills:
14
14
 
@@ -20,34 +20,78 @@ For maintenance tasks, use the dedicated skills:
20
20
 
21
21
  Treat `$ARGUMENTS` as the user command:
22
22
 
23
- - Empty / `help` / `menu` / `?` -> show menu and session selector.
24
- - Anything else -> treat as a new task description and start/resume a dev-cycle session.
23
+ - Empty / `help` / `menu` / `?` → show menu and session selector.
24
+ - Anything else → treat as a new task description and start/resume a dev-cycle session.
25
+
26
+ ## The seven stages
27
+
28
+ | # | Stage | File | User-facing? |
29
+ | :- | :---- | :--- | :----------- |
30
+ | 1 | Pre-flight | [../do/pre-flight.md](../do/pre-flight.md) | **Yes — single consolidated turn** |
31
+ | 2 | Requirements | [../do/requirements.md](../do/requirements.md) | No |
32
+ | 3 | Impact analysis | [../do/impact-analysis.md](../do/impact-analysis.md) | No |
33
+ | 4 | Validate code | [../do/validate-code.md](../do/validate-code.md) | No |
34
+ | 5 | Unit tests | [../do/unit-tests.md](../do/unit-tests.md) | No |
35
+ | 6 | E2E acceptance | [../do/e2e-acceptance.md](../do/e2e-acceptance.md) | No |
36
+ | 7 | Open PR | [../do/open-prs.md](../do/open-prs.md) | No |
37
+
38
+ **Stage 1 (pre-flight) is the ONLY stage that talks to the user.** Stages 2–7 run silently to completion. If a later stage hits a genuine blocker that the pre-flight didn't cover, escalate with a single terminal message — do not open a second round of questions.
39
+
40
+ ## Front-loading (stage 1 non-negotiable)
41
+
42
+ All ambiguity — task scope, repo selection, validation strategy, localhost URL, backend health, Muggle project, test-user credentials, branch name, PR target — is resolved in a **single** pre-flight turn. See `pre-flight.md` for the exact questionnaire.
43
+
44
+ **Red-flag behaviors (do not do):**
45
+
46
+ - Asking a clarifying question mid-cycle because "I didn't think of that at pre-flight."
47
+ - Starting a dev server mid-cycle and discovering the port is wrong.
48
+ - Reaching the E2E stage before knowing how the user wants it validated.
49
+ - Asking the user to "pick one" across multiple turns instead of one turn.
50
+
51
+ If any of these happen, the pre-flight was incomplete — treat it as a skill bug, not a user bug, and expand `pre-flight.md` to cover the missed case after the run.
25
52
 
26
53
  ## Session model
27
54
 
28
- Use `.muggle-do/sessions/<slug>/` with these files:
55
+ Every run writes to `.muggle-do/sessions/<slug>/`:
29
56
 
30
- - `state.md`
31
- - `requirements.md`
32
- - `iterations/<NNN>.md`
33
- - `result.md`
57
+ - `state.md` — one-screen live status: current stage (N/7), last update timestamp, pre-flight answers verbatim, any blockers.
58
+ - `iterations/<NNN>.md` — append-only log of stage transitions for iteration NNN: what ran, what was decided, what artifacts were produced.
59
+ - `requirements.md` — frozen output of stage 2.
60
+ - `result.md` — final summary written by stage 7 (PR URLs, E2E outcome, open issues).
34
61
 
35
- On each stage transition, update `state.md` and append stage output to the active iteration file.
62
+ **On every stage transition, you MUST:**
36
63
 
37
- ## Dev cycle agents
64
+ 1. Append a dated entry to the active `iterations/<NNN>.md`: `### Stage N/7 — <name> (<timestamp>)` followed by the stage's output.
65
+ 2. Rewrite `state.md` to reflect the new current stage and any relevant counters.
38
66
 
39
- Use the supporting files in the `../do/` directory as stage-specific instructions:
67
+ If these files don't exist, create them — missing session files means the user lost visibility into the cycle, which is the exact failure mode this skill exists to prevent.
40
68
 
41
- - [requirements.md](../do/requirements.md)
42
- - [impact-analysis.md](../do/impact-analysis.md)
43
- - [validate-code.md](../do/validate-code.md)
44
- - [unit-tests.md](../do/unit-tests.md)
45
- - [e2e-acceptance.md](../do/e2e-acceptance.md)
46
- - [open-prs.md](../do/open-prs.md)
69
+ ## Turn preamble
70
+
71
+ Each stage turn MUST begin with one line in this form before any other output:
72
+
73
+ ```
74
+ **Stage N/7 — <stage name>** — <one-line intent>
75
+ ```
76
+
77
+ This is how the user can tell, at a glance, what phase the cycle is in without parsing a long response.
47
78
 
48
79
  ## Guardrails
49
80
 
50
- - Do not skip unit tests before E2E acceptance tests.
51
- - Do not skip E2E acceptance tests due to missing scripts; generate when needed.
52
- - If the same stage fails 3 times in a row, escalate with details.
53
- - If total iterations reach 3 and E2E acceptance tests still fail, continue to PR creation with `[E2E FAILING]`.
81
+ - **No mid-cycle user questions.** Anything not covered by pre-flight is a skill bug; escalate once, do not loop.
82
+ - **Do not skip unit tests before E2E acceptance tests.**
83
+ - **Do not skip E2E acceptance tests due to missing scripts**; generate when needed.
84
+ - **Do not hand-write the E2E block of the PR body.** The `open-prs.md` stage MUST invoke `muggle-pr-visual-walkthrough` Mode B to render the screenshots-and-steps section. Hand-writing it loses the dashboard links the user relies on for review.
85
+ - **If the same stage fails 3 times in a row, escalate with details.**
86
+ - **If total iterations reach 3 and E2E acceptance tests still fail**, continue to PR creation with `[E2E FAILING]` in the title; the visual walkthrough section makes the failures reviewable.
87
+
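The two retry guardrails above amount to a small decision rule. The sketch below is one possible reading, assuming the escalation check takes priority over the iteration cap; the action names and the choice to put `[E2E FAILING]` at the front of the title are illustrative assumptions.

```python
MAX_CONSECUTIVE_STAGE_FAILURES = 3
MAX_ITERATIONS = 3


def next_action(consecutive_stage_failures: int, iterations: int, e2e_passing: bool) -> str:
    """Pick the next step from the failure counters (action names are hypothetical)."""
    if consecutive_stage_failures >= MAX_CONSECUTIVE_STAGE_FAILURES:
        return "escalate"              # same stage failed 3 times in a row
    if e2e_passing:
        return "open-pr"
    if iterations >= MAX_ITERATIONS:
        return "open-pr-e2e-failing"   # give up iterating, keep the failure visible
    return "iterate"


def pr_title(title: str, action: str) -> str:
    """Tag the PR title when the cycle ends with E2E still failing."""
    return f"[E2E FAILING] {title}" if action == "open-pr-e2e-failing" else title
```

For example, `next_action(0, 3, False)` selects the tagged PR path, while `next_action(0, 2, False)` asks for another iteration.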
88
+ ## Completion contract
89
+
90
+ When stage 7 finishes, the final message to the user contains at minimum:
91
+
92
+ - PR URL(s)
93
+ - E2E status (passing / `[E2E FAILING]`)
94
+ - Link to the run dashboard for each test case (via the walkthrough skill output)
95
+ - Path to `result.md` for full details
96
+
97
+ No other content. The user already read the walkthrough in the PR body — do not re-summarize it here.
@@ -5,17 +5,29 @@ description: Update Muggle AI to latest version. Use when user types muggle upgr
5
5
 
6
6
  # Muggle Upgrade
7
7
 
8
- Update all Muggle AI components to the latest published version.
8
+ Update all Muggle AI components to the latest published version. This means **both** the `@muggleai/works` CLI on npm **and** the Electron runner the CLI manages.
9
9
 
10
10
  ## Steps
11
11
 
12
12
  1. Run `/muggle:muggle-status` checks to capture current versions.
13
- 2. Run `muggle upgrade` to check GitHub releases for the latest electron-app version and download it.
14
- 3. Report the upgrade results:
15
- - Previous version vs new version for each component.
16
- - Whether the upgrade succeeded or failed.
17
- 4. Run `/muggle:muggle-status` again to confirm everything is healthy after upgrade.
13
+
14
+ 2. Capture CLI versions:
15
+ - Installed CLI: `muggle --version`
16
+ - Latest on npm: `npm view @muggleai/works version`
17
+ - Detect install location: `npm ls -g @muggleai/works --depth=0` (falls back to `pnpm ls -g @muggleai/works` if not found)
18
+
19
+ 3. **If installed CLI < latest on npm**, upgrade the CLI itself before touching Electron:
20
+ - npm global install: `npm install -g @muggleai/works@latest`
21
+ - pnpm global install: `pnpm add -g @muggleai/works@latest`
22
+ - If neither is detected, report the situation and ask the user how the CLI was installed before proceeding.
23
+
24
+ 4. Run `muggle upgrade` to pull the Electron runner version that the (now-latest) CLI expects.
25
+ - Note: `muggle upgrade` only manages the Electron runner — it does NOT upgrade the CLI npm package. That is why step 3 must run first.
26
+
27
+ 5. Run `/muggle:muggle-status` again to confirm everything is healthy after upgrade.
18
28
 
19
29
  ## Output
20
30
 
21
- Show a before/after version comparison. If the upgrade fails at any step, report the error and suggest running `/muggle:muggle-repair`.
31
+ Show a before/after table for **CLI**, **Electron runner**, **MCP server**, and **Auth**. Call out any version that did not change so the user understands what shipped vs what was already current.
32
+
33
+ If the upgrade fails at any step, report the error and suggest running `/muggle:muggle-repair`.
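The "installed CLI < latest on npm" test in step 3 needs a numeric comparison, since comparing version strings lexically would rank `4.10.0` below `4.9.0`. Here is a minimal sketch, assuming both commands print a plain dotted version such as `4.8.3` (the exact output of `muggle --version` is not shown in this diff); pre-release suffixes are not handled, and the function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.8.3' (optionally prefixed with 'v') into (4, 8, 3)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def cli_needs_upgrade(installed: str, latest: str) -> bool:
    """True when the installed CLI is older than the latest published version."""
    return parse_version(installed) < parse_version(latest)
```

With the versions from this diff, `cli_needs_upgrade("4.8.1", "4.8.3")` is true, so the CLI would be upgraded before `muggle upgrade` runs.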