@muggleai/works 4.2.1 → 4.3.0

Files changed (34)
  1. package/README.md +100 -50
  2. package/dist/{chunk-CXTJOYWM.js → chunk-23NOSJFH.js} +284 -184
  3. package/dist/cli.js +1 -1
  4. package/dist/index.js +1 -1
  5. package/dist/plugin/.claude-plugin/plugin.json +4 -4
  6. package/dist/plugin/.cursor-plugin/plugin.json +3 -3
  7. package/dist/plugin/README.md +7 -5
  8. package/dist/plugin/scripts/ensure-electron-app.sh +3 -3
  9. package/dist/plugin/skills/do/e2e-acceptance.md +161 -0
  10. package/dist/plugin/skills/do/open-prs.md +78 -14
  11. package/dist/plugin/skills/muggle/SKILL.md +4 -2
  12. package/dist/plugin/skills/muggle-do/SKILL.md +6 -6
  13. package/dist/plugin/skills/muggle-test/SKILL.md +416 -0
  14. package/dist/plugin/skills/muggle-test-feature-local/SKILL.md +77 -80
  15. package/dist/plugin/skills/muggle-test-import/SKILL.md +276 -0
  16. package/dist/plugin/skills/muggle-upgrade/SKILL.md +1 -1
  17. package/dist/plugin/skills/optimize-descriptions/SKILL.md +8 -8
  18. package/package.json +15 -12
  19. package/plugin/.claude-plugin/plugin.json +4 -4
  20. package/plugin/.cursor-plugin/plugin.json +3 -3
  21. package/plugin/README.md +7 -5
  22. package/plugin/scripts/ensure-electron-app.sh +3 -3
  23. package/plugin/skills/do/e2e-acceptance.md +161 -0
  24. package/plugin/skills/do/open-prs.md +78 -14
  25. package/plugin/skills/muggle/SKILL.md +4 -2
  26. package/plugin/skills/muggle-do/SKILL.md +6 -6
  27. package/plugin/skills/muggle-test/SKILL.md +416 -0
  28. package/plugin/skills/muggle-test-feature-local/SKILL.md +77 -80
  29. package/plugin/skills/muggle-test-import/SKILL.md +276 -0
  30. package/plugin/skills/muggle-upgrade/SKILL.md +1 -1
  31. package/plugin/skills/optimize-descriptions/SKILL.md +8 -8
  32. package/scripts/postinstall.mjs +2 -2
  33. package/dist/plugin/skills/do/qa.md +0 -89
  34. package/plugin/skills/do/qa.md +0 -89
@@ -0,0 +1,416 @@
1
+ ---
2
+ name: muggle-test
3
+ description: "Run change-driven E2E acceptance testing using Muggle AI — detects local code changes, maps them to use cases, and generates test scripts either locally (real browser on localhost) or remotely (cloud execution on a preview/staging URL). Publishes results to Muggle dashboard, opens them in the browser, and posts E2E acceptance summaries with screenshots to the PR. Use this skill whenever the user wants to test their changes, run E2E acceptance tests on recent work, validate what they've been working on, or check if their code changes broke anything. Triggers on: 'test my changes', 'run tests on my changes', 'acceptance test my work', 'check my changes', 'validate my changes', 'test before I push', 'make sure my changes work', 'regression test my changes', 'test on preview', 'test on staging'. This is the go-to skill for change-driven E2E acceptance testing — it handles everything from change detection to test execution to result reporting."
4
+ ---
5
+
6
+ # Muggle Test — Change-Driven E2E Acceptance Router
7
+
8
+ A router skill that detects code changes, resolves impacted test cases, executes them locally or remotely, publishes results to the Muggle AI dashboard, and posts E2E acceptance summaries to the PR. The user can invoke this at any moment, in any state.
9
+
10
+ ## Step 1: Confirm Scope of Work (Always First)
11
+
12
+ Parse the user's query and explicitly confirm their expectation. There are exactly two modes:
13
+
14
+ ### Mode A: Local Test Generation
15
+ > Test impacted use cases/test cases against **localhost** using the Electron browser.
16
+ >
17
+ > Execution tool: `muggle-local-execute-test-generation`
18
+
19
+ Signs the user wants this: mentions "localhost", "local", "my machine", "dev server", "my changes locally", or just "test my changes" in a repo context.
20
+
21
+ ### Mode B: Remote Test Generation
22
+ > Ask Muggle's cloud to generate test scripts against a **preview/staging URL**.
23
+ >
24
+ > Execution tool: `muggle-remote-workflow-start-test-script-generation`
25
+
26
+ Signs the user wants this: mentions "preview", "staging", "deployed", "preview URL", "test on preview", "test the deployment", or provides a non-localhost URL.
27
+
28
+ ### Confirming
29
+
30
+ If the user's intent is clear, state back what you understood and ask for confirmation:
31
+ ```
32
+ I'll run [local/remote] test generation. Confirm?
33
+ ──────────────────────────────────────────────────────────────
34
+ 1. Yes, proceed
35
+ 2. No, switch to [the other mode]
36
+ ──────────────────────────────────────────────────────────────
37
+ ```
38
+
39
+ If ambiguous, ask the user to choose:
40
+ ```
41
+ How do you want to run the test?
42
+ ──────────────────────────────────────────────────────────────
43
+ 1. Local — launch browser on your machine against localhost
44
+ 2. Remote — Muggle cloud tests against a preview/staging URL
45
+ ──────────────────────────────────────────────────────────────
46
+ ```
47
+
48
+ Only proceed after the user selects an option.
49
+
50
+ ## Step 2: Detect Local Changes
51
+
52
+ Analyze the working directory to understand what changed.
53
+
54
+ 1. Run `git status` and `git diff --stat` for an overview
55
+ 2. Run `git diff` (or `git diff --cached` if staged) to read actual diffs
56
+ 3. Identify impacted feature areas:
57
+ - Changed UI components, pages, routes
58
+ - Modified API endpoints or data flows
59
+ - Updated form fields, validation, user interactions
60
+ 4. Produce a concise **change summary** — a list of impacted features
61
+
62
+ Present:
63
+ > "Here's what changed: [list]. I'll scope E2E acceptance testing to these areas."
64
+
65
+ If no changes are detected (clean tree), tell the user and ask what they want to test.
66
+
67
+ ## Step 3: Authenticate
68
+
69
+ 1. Call `muggle-remote-auth-status`
70
+ 2. If authenticated and not expired → proceed
71
+ 3. If not authenticated or expired → call `muggle-remote-auth-login`
72
+ 4. If login pending → call `muggle-remote-auth-poll`
73
+
74
+ If auth fails repeatedly, suggest: `muggle logout && muggle login` from terminal.
75
+
76
+ ## Step 4: Select Project (User Must Choose)
77
+
78
+ 1. Call `muggle-remote-project-list`
79
+ 2. Present **all** projects as a numbered list:
80
+
81
+ ```
82
+ Available Muggle Projects:
83
+ ──────────────────────────────────────────────────────────────
84
+ 1. MUGGLE AI STAGING 1 https://staging.muggle-ai.com/
85
+ 2. Tanka Testing https://www.tanka.ai
86
+ 3. Staging muggleTestV0 https://staging.muggle-ai.com/muggleTestV0
87
+ 4. [Create new project]
88
+ ──────────────────────────────────────────────────────────────
89
+ ```
90
+
91
+ > "Which project should I use? Reply with the number."
92
+
93
+ 3. **Wait for the user to explicitly choose** — do NOT auto-select based on repo name or URL matching
94
+ 4. **If user chooses "Create new project"**:
95
+ - Ask for `projectName`
96
+ - Ask for `description`
97
+ - Ask for the production/preview URL
98
+ - Call `muggle-remote-project-create`
99
+
100
+ Store the `projectId` only after user confirms.
101
+
102
+ ## Step 5: Select Use Case (User Must Choose)
103
+
104
+ ### 5a: List existing use cases
105
+ Call `muggle-remote-use-case-list` with the project ID.
106
+
107
+ ### 5b: Present ALL use cases as a numbered list for user selection
108
+
109
+ ```
110
+ Available Use Cases for [Project Name]:
111
+ ──────────────────────────────────────────────────────────────────────────
112
+ 1. Sign up for Muggle Test account
113
+ 2. Access existing account via login
114
+ 3. Manually Add a Use Case
115
+ 4. View Generated Test Script After Test Run
116
+ 5. Generate comprehensive UX testing reports
117
+ 6. [Create new use case]
118
+ ──────────────────────────────────────────────────────────────────────────
119
+ ```
120
+
121
+ > "Which use case do you want to test? Reply with the number (or multiple numbers separated by commas)."
122
+
123
+ ### 5c: Wait for explicit user selection
124
+
125
+ **CRITICAL: Do NOT auto-select use cases** based on:
126
+ - Git changes analysis
127
+ - Use case title/description matching
128
+ - Any heuristic or inference
129
+
130
+ The user MUST explicitly tell you which use case(s) to use.
131
+
132
+ ### 5d: If user chooses "Create new use case"
133
+ 1. Ask the user to describe the use case in plain English
134
+ 2. Call `muggle-remote-use-case-create-from-prompts`:
135
+ - `projectId`: The project ID
136
+ - `prompts`: Array of `{ instruction: "..." }` with the user's description
137
+ 3. Present the created use case and confirm it's correct
138
+
139
+ ## Step 6: Select Test Case (User Must Choose)
140
+
141
+ For the selected use case(s):
142
+
143
+ ### 6a: List existing test cases
144
+ Call `muggle-remote-test-case-list-by-use-case` with each use case ID.
145
+
146
+ ### 6b: Present ALL test cases as a numbered list for user selection
147
+
148
+ ```
149
+ Available Test Cases for "[Use Case Name]":
150
+ ──────────────────────────────────────────────────────────────────────────
151
+ 1. E2E: Login with valid credentials
152
+ 2. E2E: Login with invalid password
153
+ 3. E2E: Login with expired session
154
+ 4. [Generate new test case]
155
+ ──────────────────────────────────────────────────────────────────────────
156
+ ```
157
+
158
+ > "Which test case(s) do you want to run? Reply with the number (or multiple numbers separated by commas)."
159
+
160
+ ### 6c: Wait for explicit user selection
161
+
162
+ **CRITICAL: Do NOT auto-select test cases** — the user MUST explicitly choose which test case(s) to execute.
163
+
164
+ ### 6d: If user chooses "Generate new test case"
165
+ 1. Ask the user to describe what they want to test in plain English
166
+ 2. Call `muggle-remote-test-case-generate-from-prompt`:
167
+ - `projectId`, `useCaseId`, `instruction` (the user's description)
168
+ 3. Present the generated test case(s) for review
169
+ 4. Call `muggle-remote-test-case-create` to save the ones the user approves
170
+
171
+ ### 6e: Confirm final selection
172
+
173
+ > "You selected [N] test case(s): [list titles]. Ready to proceed?"
174
+
175
+ Wait for user confirmation before moving to execution.
176
+
177
+ ## Step 7A: Execute — Local Mode
178
+
179
+ ### Three separate questions (ask one at a time, wait for answer before next)
180
+
181
+ **Question 1 — Local URL:**
182
+ > "Your local app should be running. What's the URL? (e.g., http://localhost:3000)"
183
+
184
+ Wait for user to provide URL before asking question 2.
185
+
186
+ **Question 2 — Electron launch approval:**
187
+ ```
188
+ I'll launch the Muggle Electron browser to run [N] test case(s).
189
+ ──────────────────────────────────────────────────────────────
190
+ 1. Yes, launch it
191
+ 2. No, cancel
192
+ ──────────────────────────────────────────────────────────────
193
+ ```
194
+
195
+ Wait for "1" before asking question 3. If "2", stop and ask what they want to do instead.
196
+
197
+ **Question 3 — Window visibility:**
198
+ ```
199
+ How should the browser window appear?
200
+ ──────────────────────────────────────────────────────────────
201
+ 1. Visible (watch the browser as it runs)
202
+ 2. Headless (run in background)
203
+ ──────────────────────────────────────────────────────────────
204
+ ```
205
+
206
+ Wait for answer (1 or 2) before proceeding.
207
+
208
+ ### Run sequentially
209
+
210
+ For each test case:
211
+
212
+ 1. Call `muggle-remote-test-case-get` to fetch full details
213
+ 2. Call `muggle-local-execute-test-generation`:
214
+ - `testCase`: Full test case object from step 1
215
+ - `localUrl`: User's local URL (from Question 1)
216
+ - `approveElectronAppLaunch`: `true` (only if user said "yes" in Question 2)
217
+ - `showUi`: `true` if user chose "visible", `false` if "headless" (from Question 3)
218
+ 3. Store the returned `runId`
219
+
220
+ If a generation fails, log it and continue to the next. Do not abort the batch.
221
+
222
+ ### Collect results
223
+
224
+ For each `runId`, call `muggle-local-run-result-get`. Extract: status, duration, step count, `artifactsDir`.
225
+
226
+ ### Publish each run to cloud
227
+
228
+ For each completed run, call `muggle-local-publish-test-script`:
229
+ - `runId`: The local run ID
230
+ - `cloudTestCaseId`: The cloud test case ID
231
+
232
+ This returns:
233
+ - `viewUrl`: Direct link to view this test run on the Muggle AI dashboard
234
+ - `testScriptId`, `actionScriptId`, `workflowRuntimeId`
235
+
236
+ Store every `viewUrl` — these are used in the next steps.
237
+
238
+ ### Report summary
239
+
240
+ ```
241
+ Test Case Status Duration Steps View on Muggle
242
+ ─────────────────────────────────────────────────────────────────────────
243
+ Login with valid creds PASSED 12.3s 8 https://www.muggle-ai.com/...
244
+ Login with invalid creds PASSED 9.1s 6 https://www.muggle-ai.com/...
245
+ Checkout flow FAILED 15.7s 12 https://www.muggle-ai.com/...
246
+ ─────────────────────────────────────────────────────────────────────────
247
+ Total: 3 tests | 2 passed | 1 failed | 37.1s
248
+ ```
249
+
250
+ For failures: show which step failed, the local screenshot path, and a suggestion.
251
+
252
+ ## Step 7B: Execute — Remote Mode
253
+
254
+ ### Ask for target URL
255
+
256
+ > "What's the preview/staging URL to test against?"
257
+
258
+ ### Trigger remote workflows
259
+
260
+ For each test case:
261
+
262
+ 1. Call `muggle-remote-test-case-get` to fetch full details
263
+ 2. Call `muggle-remote-workflow-start-test-script-generation`:
264
+ - `projectId`: The project ID
265
+ - `useCaseId`: The use case ID
266
+ - `testCaseId`: The test case ID
267
+ - `name`: `"muggle-test: {test case title}"`
268
+ - `url`: The preview/staging URL
269
+ - `goal`: From the test case
270
+ - `precondition`: From the test case (use `"None"` if empty)
271
+ - `instructions`: From the test case
272
+ - `expectedResult`: From the test case
273
+ 3. Store the returned workflow runtime ID
274
+
275
+ ### Monitor and report
276
+
277
+ For each workflow, call `muggle-remote-wf-get-ts-gen-latest-run` with the runtime ID.
278
+
279
+ ```
280
+ Test Case Workflow Status Runtime ID
281
+ ────────────────────────────────────────────────────────
282
+ Login with valid creds RUNNING rt-abc123
283
+ Login with invalid creds COMPLETED rt-def456
284
+ Checkout flow QUEUED rt-ghi789
285
+ ```
286
+
287
+ ## Step 8: Open Results in Browser
288
+
289
+ After execution and publishing are complete, open the Muggle AI dashboard so the user can visually inspect results and screenshots.
290
+
291
+ ### Mode A (Local) — open each published viewUrl
292
+
293
+ For each published run's `viewUrl`:
294
+ ```bash
295
+ open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}"
296
+ ```
297
+
298
+ If there are many runs (>3), open just the project-level runs page instead of individual tabs:
299
+ ```bash
300
+ open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs"
301
+ ```
302
+
303
+ ### Mode B (Remote) — open the project runs page
304
+
305
+ ```bash
306
+ open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs"
307
+ ```
308
+
309
+ Tell the user:
310
+ > "I've opened the Muggle AI dashboard in your browser — you can see the test results, step-by-step screenshots, and action scripts there."
311
+
312
+ ## Step 9: Post E2E Acceptance Results to PR
313
+
314
+ After reporting results, check if there's an open PR for the current branch and attach the E2E acceptance summary.
315
+
316
+ ### 9a: Find the PR
317
+
318
+ ```bash
319
+ gh pr view --json number,url,title 2>/dev/null
320
+ ```
321
+
322
+ - If a PR exists → post results as a comment
323
+ - If no PR exists → ask:
324
+ ```
325
+ No open PR found for this branch.
326
+ ──────────────────────────────────────────────────────────────
327
+ 1. Create PR with E2E acceptance results
328
+ 2. Skip posting to PR
329
+ ──────────────────────────────────────────────────────────────
330
+ ```
331
+ - If 1: create PR with E2E acceptance results in the body (use `gh pr create`)
332
+ - If 2: skip this step
333
+
334
+ ### 9b: Build the E2E acceptance comment body
335
+
336
+ Construct a markdown comment with the full E2E acceptance breakdown. The format links each test case to its detail page on the Muggle AI dashboard, so PR reviewers can click through to see step-by-step screenshots and action scripts.
337
+
338
+ ```markdown
339
+ ## 🧪 Muggle AI — E2E Acceptance Results
340
+
341
+ **X passed / Y failed** | [View all on Muggle AI](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs)
342
+
343
+ | Test Case | Status | Details |
344
+ |-----------|--------|---------|
345
+ | [Login with valid creds](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}) | ✅ PASSED | 8 steps, 12.3s |
346
+ | [Login with invalid creds](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}) | ✅ PASSED | 6 steps, 9.1s |
347
+ | [Checkout flow](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}) | ❌ FAILED | Step 7: "Click checkout button" — element not found |
348
+
349
+ <details>
350
+ <summary>Failed test details</summary>
351
+
352
+ ### Checkout flow
353
+ - **Failed at**: Step 7 — "Click checkout button"
354
+ - **Error**: Element not found
355
+ - **Local artifacts**: `~/.muggle-ai/sessions/{runId}/`
356
+ - **Screenshots**: `~/.muggle-ai/sessions/{runId}/screenshots/`
357
+
358
+ </details>
359
+
360
+ ---
361
+ *Generated by [Muggle AI](https://www.muggle-ai.com) — change-driven E2E acceptance testing*
362
+ ```
363
+
364
+ ### 9c: Post to the PR
365
+
366
+ If PR already exists — add as a comment:
367
+ ```bash
368
+ gh pr comment {pr-number} --body "$(cat <<'EOF'
369
+ {the E2E acceptance comment body from 9b}
370
+ EOF
371
+ )"
372
+ ```
373
+
374
+ If creating a new PR — include the E2E acceptance section in the PR body alongside the usual summary/changes sections.
375
+
376
+ ### 9d: Confirm to user
377
+
378
+ > "E2E acceptance results posted to PR #{number}. Reviewers can click the test case links to see step-by-step screenshots on the Muggle AI dashboard."
379
+
380
+ ## Tool Reference
381
+
382
+ | Phase | Tool | Mode |
383
+ |:------|:-----|:-----|
384
+ | Auth | `muggle-remote-auth-status` | Both |
385
+ | Auth | `muggle-remote-auth-login` | Both |
386
+ | Auth | `muggle-remote-auth-poll` | Both |
387
+ | Project | `muggle-remote-project-list` | Both |
388
+ | Project | `muggle-remote-project-create` | Both |
389
+ | Use Case | `muggle-remote-use-case-list` | Both |
390
+ | Use Case | `muggle-remote-use-case-create-from-prompts` | Both |
391
+ | Test Case | `muggle-remote-test-case-list-by-use-case` | Both |
392
+ | Test Case | `muggle-remote-test-case-generate-from-prompt` | Both |
393
+ | Test Case | `muggle-remote-test-case-create` | Both |
394
+ | Test Case | `muggle-remote-test-case-get` | Both |
395
+ | Execute | `muggle-local-execute-test-generation` | Local |
396
+ | Execute | `muggle-remote-workflow-start-test-script-generation` | Remote |
397
+ | Results | `muggle-local-run-result-get` | Local |
398
+ | Results | `muggle-remote-wf-get-ts-gen-latest-run` | Remote |
399
+ | Publish | `muggle-local-publish-test-script` | Local |
400
+ | Browser | `open` (shell command) | Both |
401
+ | PR | `gh pr view`, `gh pr comment`, `gh pr create` | Both |
402
+
403
+ ## Guardrails
404
+
405
+ - **Always confirm intent first** — never assume local vs remote without asking
406
+ - **User MUST select project** — present numbered list, wait for explicit choice, never auto-select
407
+ - **User MUST select use case(s)** — present numbered list, wait for explicit choice, never auto-select based on git changes or heuristics
408
+ - **User MUST select test case(s)** — present numbered list, wait for explicit choice, never auto-select
409
+ - **Ask three separate questions for local mode** — (1) URL as text input, (2) Electron approval as numbered choice, (3) visibility as numbered choice — one at a time, wait for each answer
410
+ - **Never launch Electron without explicit user approval** (`approveElectronAppLaunch`)
411
+ - **Never silently drop test cases** — log failures and continue, then report them
412
+ - **Never guess the URL** — always ask the user for localhost or preview URL
413
+ - **Always publish before opening browser** — the dashboard needs the published data to show results
414
+ - **Use correct dashboard URL format** — `modal=script-details` (not `modal=details`)
415
+ - **Always check for PR before posting** — don't create a PR comment if there's no PR (ask user first)
416
+ - **Can be invoked at any state** — if the user already has a project or use cases set up, skip to the relevant step rather than re-doing everything
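Step 8 of the added `muggle-test` skill chooses between opening one dashboard tab per test case and opening the project-level runs page when there are more than 3 runs. A minimal sketch of that decision follows. The URL templates are copied from the skill text; the function names (`script_details_url`, `project_runs_url`, `urls_to_open`) are hypothetical and are not part of the package.

```bash
#!/usr/bin/env bash
# Sketch only: URL templates come from Step 8 of the skill; all function
# names here are hypothetical helpers, not part of @muggleai/works.

DASHBOARD_BASE="https://www.muggle-ai.com/muggleTestV0/dashboard/projects"

# Detail page for one test case. Uses modal=script-details (not modal=details),
# as the Guardrails section requires.
script_details_url() {
  local project_id="$1" test_case_id="$2"
  printf '%s/%s/scripts?modal=script-details&testCaseId=%s\n' \
    "$DASHBOARD_BASE" "$project_id" "$test_case_id"
}

# Project-level runs page.
project_runs_url() {
  local project_id="$1"
  printf '%s/%s/runs\n' "$DASHBOARD_BASE" "$project_id"
}

# Print the URLs to open: one per test case for up to 3 runs,
# otherwise only the project runs page.
urls_to_open() {
  local project_id="$1"
  shift
  if [ "$#" -gt 3 ]; then
    project_runs_url "$project_id"
  else
    local test_case_id
    for test_case_id in "$@"; do
      script_details_url "$project_id" "$test_case_id"
    done
  fi
}
```

On macOS the output could be piped to `open`, for example `urls_to_open "$PROJECT_ID" "$TC1" "$TC2" | while read -r url; do open "$url"; done`, where the variables are placeholders for IDs collected in earlier steps.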
@@ -1,122 +1,119 @@
1
1
  ---
2
2
  name: muggle-test-feature-local
3
- description: Run a real-browser QA test against localhost to verify a feature works correctly — signup flows, checkout, form validation, UI interactions, or any user-facing behavior. Launches a browser that executes test steps and captures screenshots. Use this skill whenever the user asks to test, QA, validate, or verify their web app, UI changes, user flows, or frontend behavior on localhost or a dev server — even if they don't mention 'muggle' or 'QA' explicitly.
3
+ description: Run a real-browser end-to-end (E2E) acceptance test against localhost to verify a feature works correctly — signup flows, checkout, form validation, UI interactions, or any user-facing behavior. Launches a browser that executes test steps and captures screenshots. Use this skill whenever the user asks to test, validate, or verify their web app, UI changes, user flows, or frontend behavior on localhost or a dev server — even if they don't mention 'muggle' or 'E2E' explicitly.
4
4
  ---
5
5
 
6
6
  # Muggle Test Feature Local
7
7
 
8
- Run end-to-end feature testing from UI against a local URL:
8
+ **Goal:** Run or generate an end-to-end test against a **local URL** using Muggle’s Electron browser.
9
9
 
10
- - Cloud management: `muggle-remote-*`
11
- - Local execution and artifacts: `muggle-local-*`
10
+ | Scope | MCP tools |
11
+ | :---- | :-------- |
12
+ | Cloud (projects, cases, scripts, auth) | `muggle-remote-*` |
13
+ | Local (Electron run, publish, results) | `muggle-local-*` |
14
+ | Create new entities (preview / create) | `muggle-remote-project-create`, `muggle-remote-use-case-prompt-preview`, `muggle-remote-use-case-create-from-prompts`, `muggle-remote-test-case-generate-from-prompt`, `muggle-remote-test-case-create` |
15
+
16
+ The local URL only changes where the browser opens; it does not change the remote project or test definitions.
12
17
 
13
18
  ## Workflow
14
19
 
15
20
  ### 1. Auth
16
21
 
17
22
  - `muggle-remote-auth-status`
18
- - If needed: `muggle-remote-auth-login` + `muggle-remote-auth-poll`
23
+ - If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
24
+ Do not skip or assume auth.
25
+
26
+ ### 2. Targets (user must confirm)
19
27
 
20
- ### 2. Select project, use case, and test case
28
+ Ask the user to pick **project**, **use case**, and **test case** (do not infer).
21
29
 
22
- - Explicitly ask user to select each target to proceed.
23
30
  - `muggle-remote-project-list`
24
- - `muggle-remote-use-case-list`
25
- - `muggle-remote-test-case-list-by-use-case`
31
+ - `muggle-remote-use-case-list` (with `projectId`)
32
+ - `muggle-remote-test-case-list-by-use-case` (with `useCaseId`)
33
+
34
+ **Selection UI (mandatory):** After each list call, present choices as a **numbered list** (`1.` … `n.`). Keep each line minimal: number, short title, UUID. Ask the user to **reply with the number** or the UUID.
35
+
36
+ **Fixed tail of each pick list (project, use case, test case):** After the relevance-ranked rows, end with the options below. **Create new …** is never omitted; **Show full list** is omitted when it would be pointless (see empty list).
37
+
38
+ 1. **Show full list** — user sees every row from the API (then re-number the full list including the tails below again). **Skip this option** if the API returned **zero** rows for that step (e.g. no test cases yet for the chosen use case). There is nothing to expand.
39
+ 2. **Create new …** — user creates a new entity instead of picking an existing one. Label per step: **Create new project**, **Create new use case**, or **Create new test case**.
40
+
41
+ **Relevance-first filtering (mandatory for project, use case, and test case lists):**
26
42
 
27
- ### 3. Resolve local URL
43
+ - Do **not** dump the full list by default.
44
+ - Rank items by semantic relevance to the user’s stated goal (title first, then description / user story / acceptance criteria).
45
+ - Show only the **top 3–5** most relevant options, then **Show full list** (unless the API list is empty — see above), then **Create new …** as above.
46
+ - If the user picks **Show full list**, then present the complete numbered list (still ending with **Create new …**; include **Show full list** again only when the full list has at least one row).
28
47
 
29
- - Use the URL provided by the user.
30
- - If missing, ask explicitly (do not guess).
31
- - Inform user the local URL does not affect the project's remote test.
48
+ **Create new tools and flow (use these MCP tools; preview before persist):**
32
49
 
33
- ### 4. Check for existing scripts and ask user to choose
50
+ - **Project — Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
51
+ - **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
52
+ 1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation.
53
+ 2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
54
+ - **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
55
+ 1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation.
56
+ 2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **§4** with that `testCaseId`.
34
57
 
35
- Check BOTH cloud and local scripts to determine what's available:
58
+ ### 3. Local URL
36
59
 
37
- 1. **Check cloud scripts:** `muggle-remote-test-script-list` filtered by projectId
38
- 2. **Check local scripts:** `muggle-local-test-script-list` filtered by projectId
60
+ - Use the URL the user gives. If none, ask; **do not guess**.
61
+ - Remind them: local URL is only the execution target, not tied to cloud project config.
39
62
 
40
- **Decision logic:**
63
+ ### 4. Existing scripts vs new generation
41
64
 
42
- | Cloud Script | Local Script (status: published/generated) | Action |
43
- |--------------|---------------------------------------------|--------|
44
- | Exists + ACTIVE | Exists | Ask user: "Replay existing script" or "Regenerate from scratch"? |
45
- | Exists + ACTIVE | Not found | Sync from cloud first, then ask user |
46
- | Not found | Exists | Ask user: "Replay local script" or "Regenerate"? |
47
- | Not found | Not found | Default to generation (no need to ask) |
65
+ `muggle-remote-test-script-list` with `testCaseId`.
48
66
 
49
- **When asking user, show:**
50
- - Script name and ID
51
- - When it was created/updated
52
- - Number of steps
53
- - Last run status if available
67
+ - **If any replayable/succeeded scripts exist:** list them in a **numbered** list and ask: replay one **or** generate new.
68
+ Show: name, id, created/updated, step count. Include **`Generate new script`** as the **last** numbered option (e.g. last number) so it is selectable by number too.
69
+ - **If none:** go straight to generation (no need to ask replay vs generate).
54
70
 
55
- ### 5. Prepare for execution
71
+ ### 5. Load data for the chosen path
56
72
 
57
- **For Replay:**
73
+ **Generate**
58
74
 
59
- Local scripts contain the complete `actionScript` with element labels required for replay. Remote scripts only contain metadata.
75
+ 1. `muggle-remote-test-case-get`
76
+ 2. `muggle-local-execute-test-generation` (after approval in step 6) with that test case + `localUrl` + `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
60
77
 
61
- 1. Use `muggle-local-test-script-get` with `testScriptId` to fetch the FULL script including actionScript
62
- 2. The returned script includes all steps with `operation.label` paths needed for element location
63
- 3. Pass this complete script to `muggle-local-execute-replay`
78
+ **Replay**
64
79
 
65
- **IMPORTANT:** Do NOT manually construct or simplify the actionScript. The electron app requires the complete script with all `label` paths intact to locate page elements during replay.
80
+ 1. `muggle-remote-test-script-get`, then note its `actionScriptId`
81
+ 2. `muggle-remote-action-script-get` with that id → full `actionScript`
82
+ **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
83
+ 3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
66
84
 
67
- **For Generation:**
85
+ ### Local execution timeout (`timeoutMs`)
68
86
 
69
- 1. `muggle-remote-test-case-get` to fetch test case details
70
- 2. `muggle-local-execute-test-generation` with the test case
87
+ The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggle-local-execute-test-generation` and `muggle-local-execute-replay`. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
71
88
 
72
- ### 6. Approval requirement
89
+ - **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
90
+ - If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
91
+ - **Test case design:** Preconditions like “a test run has already completed” on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
73
92
 
74
- - Before execution, get explicit user approval for launching Electron app.
75
- - Show what will be executed (replay vs generation, test case name, URL).
76
- - Only then set `approveElectronAppLaunch: true`.
93
+ ### Interpreting `failed` / non-zero Electron exit
77
94
 
78
- ### 7. Execute
95
+ - **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
96
+ - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying “view script after a successful run” when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
79
97
 
80
- **Replay:**
81
- ```
82
- muggle-local-execute-replay with:
83
- - testScript: (full script from muggle-local-test-script-get)
84
- - localUrl: user-provided localhost URL
85
- - approveElectronAppLaunch: true
86
- - showUi: true (optional, lets user watch)
87
- ```
98
+ ### 6. Approval before any local execution
88
99
 
89
- **Generation:**
90
- ```
91
- muggle-local-execute-test-generation with:
92
- - testCase: (from muggle-remote-test-case-get)
93
- - localUrl: user-provided localhost URL
94
- - approveElectronAppLaunch: true
95
- - showUi: true (optional)
96
- ```
100
+ Get **explicit** OK to launch Electron. State: replay vs generation, test case name, URL.
101
+ Only then call local execute tools with `approveElectronAppLaunch: true`.
97
102
 
98
- ### 8. Publish generation results (generation only)
103
+ ### 7. After successful generation only
99
104
 
100
- - Use `muggle-local-publish-test-script` after successful generation.
101
- - This uploads the script to cloud so it can be replayed later.
102
- - Return the remote URL for user to view the result.
105
+ - `muggle-local-publish-test-script`
106
+ - Open returned `viewUrl` for the user (`open "<viewUrl>"` on macOS or OS equivalent).
103
107
 
104
- ### 9. Report results
108
+ ### 8. Report
105
109
 
106
- - `muggle-local-run-result-get` with returned runId.
107
- - Report:
108
- - status (passed/failed)
109
- - duration
110
- - pass/fail summary
111
- - steps summary (which steps passed/failed)
112
- - artifacts path (screenshots location)
113
- - script detail view URL
110
+ - `muggle-local-run-result-get` with the run id from execute.
111
+ - Include: status, duration, pass/fail summary, per-step summary, artifact/screenshot paths, errors if failed, and script view URL when publishing ran.
114
112
 
115
- ## Guardrails
113
+ ## Non-negotiables
116
114
 
117
- - Do not silently skip auth.
118
- - Do not silently skip asking user when a replayable script exists.
119
- - Do not launch Electron without explicit approval.
120
- - Do not hide failing run details; include error and artifacts path.
121
- - Do not simplify or reconstruct actionScript for replay; use the complete script from `muggle-local-test-script-get`.
122
- - Always check local scripts before defaulting to generation.
115
+ - No silent auth skip; no launching Electron without approval.
116
+ - If replayable scripts exist, do not default to generation without user choice.
117
+ - No hiding failures: surface errors and artifact paths.
118
+ - Replay: never use a hand-built or simplified `actionScript`; use only the one returned by `muggle-remote-action-script-get`.
119
+ - Project, use case, and test case selection lists must always include **Create new …**. Include **Show full list** whenever the API returned at least one row for that step; **omit Show full list** when the list is empty (offer **Create new …** only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
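The failure-handling guidance added to `muggle-test-feature-local` (the `timeoutMs` section and the notes on exit code 26) can be read as a small triage routine. The sketch below is one possible reading, not the package's implementation. The timeout message pattern and exit code 26 come from the skill text; the category labels, the function names, and the 600000 then 900000 escalation ladder are assumptions built from the example values the skill mentions.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers. The message text and exit code 26 are
# quoted from the skill; the labels and escalation steps are assumptions.

# Classify a failed local run from its error message and Electron exit code.
classify_failure() {
  local message="$1" exit_code="$2"
  case "$message" in
    *"Electron execution timed out after "*ms*)
      # Orchestration wait was too short; retry with a larger timeoutMs.
      echo "orchestration-timeout"
      return
      ;;
  esac
  if [ "$exit_code" -eq 26 ]; then
    # Often a goal-not-achievable outcome; read the run summary via
    # muggle-local-run-result-get before assuming a crash.
    echo "check-run-summary"
  else
    echo "unknown"
  fi
}

# Suggest the next timeoutMs value: 10 minutes, then 15 minutes, then hold.
next_timeout_ms() {
  local current="$1"
  if [ "$current" -lt 600000 ]; then
    echo 600000
  elif [ "$current" -lt 900000 ]; then
    echo 900000
  else
    echo "$current"
  fi
}
```

For example, `classify_failure "Electron execution timed out after 300000ms" 1` prints `orchestration-timeout`, and `next_timeout_ms 300000` prints `600000`. Capping at 15 minutes is a design choice of this sketch; the skill itself leaves the upper bound to the user.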