@muggleai/works 4.3.0 → 4.4.0

@@ -27,90 +27,96 @@ For each repo with changes:
27
27
  - `## Goal` — the requirements goal
28
28
  - `## Acceptance Criteria` — bulleted list (omit section if empty)
29
29
  - `## Changes` — summary of what changed in this repo
30
- - `## E2E Acceptance Results` summary table (see format below)
30
+ - E2E acceptance evidence block from `muggle build-pr-section` (see "Rendering the E2E acceptance results block" below)
31
31
  4. **Create the PR** using `gh pr create --title "..." --body "..." --head <branch>` in the repo directory.
32
32
  5. **Capture the PR URL** and extract the PR number.
33
- 6. **Post E2E acceptance evidence comment** with screenshots (see format below).
34
-
35
- ## E2E acceptance results section format (PR body)
36
-
37
- ```markdown
38
- ## E2E Acceptance Results
39
-
40
- **X passed / Y failed**
41
-
42
- | Test Case | Status | Details |
43
- |-----------|--------|---------|
44
- | [Name]({viewUrl}) | ✅ PASSED | — |
45
- | [Name]({viewUrl}) | ❌ FAILED | {error} |
33
+ 6. **Post the overflow comment only if `muggle build-pr-section` emitted one** (see "Rendering the E2E acceptance results block" below). In the common case, no comment is posted.
34
+
35
+ ## Rendering the E2E acceptance results block
36
+
37
+ Do **not** hand-write the `## E2E Acceptance Results` markdown. Use the `muggle build-pr-section` CLI, which renders a deterministic block and decides whether the evidence fits in the PR description or needs to spill into an overflow comment.
38
+
39
+ ### Step A: Build the report JSON
40
+
41
+ Assemble the e2e-acceptance report you collected in `e2e-acceptance.md` into a JSON object with this shape:
42
+
43
+ ```json
44
+ {
45
+ "projectId": "<project UUID>",
46
+ "tests": [
47
+ {
48
+ "name": "<test case name>",
49
+ "testCaseId": "<UUID>",
50
+ "testScriptId": "<UUID or omitted>",
51
+ "runId": "<UUID>",
52
+ "viewUrl": "<muggle-ai.com run URL>",
53
+ "status": "passed",
54
+ "steps": [
55
+ { "stepIndex": 0, "action": "<action>", "screenshotUrl": "<URL>" }
56
+ ]
57
+ },
58
+ {
59
+ "name": "<test case name>",
60
+ "testCaseId": "<UUID>",
61
+ "runId": "<UUID>",
62
+ "viewUrl": "<muggle-ai.com run URL>",
63
+ "status": "failed",
64
+ "failureStepIndex": 2,
65
+ "error": "<error message>",
66
+ "artifactsDir": "<path, optional>",
67
+ "steps": [
68
+ { "stepIndex": 0, "action": "<action>", "screenshotUrl": "<URL>" }
69
+ ]
70
+ }
71
+ ]
72
+ }
46
73
  ```
47
74
 
48
- ## E2E acceptance evidence comment format
75
+ ### Step B: Render the evidence block
49
76
 
50
- After creating the PR, post a comment with embedded screenshots:
77
+ Pipe the JSON into `muggle build-pr-section`. It writes `{ "body": "...", "comment": "..." | null }` to stdout:
51
78
 
52
79
  ```bash
53
- gh pr comment <PR#> --body "$(cat <<'EOF'
54
- ## 🧪 E2E acceptance evidence
55
-
56
- **X passed / Y failed**
57
-
58
- | Test Case | Status | Summary |
59
- |-----------|--------|---------|
60
- | [Login Flow]({viewUrl}) | ✅ PASSED | <a href="{lastStepScreenshotUrl}"><img src="{lastStepScreenshotUrl}" width="120"></a> |
61
- | [Checkout]({viewUrl}) | ❌ FAILED | <a href="{failureStepScreenshotUrl}"><img src="{failureStepScreenshotUrl}" width="120"></a> |
62
-
63
- <details>
64
- <summary>📸 <strong>Login Flow</strong> — 5 steps</summary>
65
-
66
- | # | Action | Screenshot |
67
- |---|--------|------------|
68
- | 1 | Navigate to `/login` | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
69
- | 2 | Enter username | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
70
- | 3 | Click "Sign In" | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
80
+ echo "$REPORT_JSON" | muggle build-pr-section > /tmp/muggle-pr-section.json
81
+ ```
71
82
 
72
- </details>
83
+ The command exits nonzero on malformed input and writes a descriptive error to stderr — do not swallow that error, surface it to the user.
73
84
 
74
- <details>
75
- <summary>📸 <strong>Checkout</strong> — 4 steps (failed at step 3)</summary>
85
+ ### Step C: Build the PR body
76
86
 
77
- | # | Action | Screenshot |
78
- |---|--------|------------|
79
- | 1 | Add item to cart | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
80
- | 2 | View cart | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
81
- | 3 ⚠️ | Click confirm — **Element not found** | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
87
+ Build the PR body by concatenating, in order:
82
88
 
83
- </details>
84
- EOF
85
- )"
86
- ```
89
+ - `## Goal` — the requirements goal
90
+ - `## Acceptance Criteria` — bulleted list (omit section if empty)
91
+ - `## Changes` — summary of what changed in this repo
92
+ - The `body` field from the CLI output (already contains its own `## E2E Acceptance Results` header)
87
93
 
88
- ### Comment Building Rules
94
+ ### Step D: Create the PR, then post the overflow comment only if present
89
95
 
90
- 1. **Summary table:**
91
- - Show thumbnail (120px) of **last step** for passed tests
92
- - Show thumbnail of **failure step** for failed tests
93
- - Thumbnail links to full-size image
96
+ 1. Create the PR with `gh pr create --title "..." --body "..." --head <branch>`.
97
+ 2. Capture the PR URL and extract the PR number.
98
+ 3. If the CLI output's `comment` field is `null`, **do not post a comment** — everything is already in the PR description.
99
+ 4. If the CLI output's `comment` field is a non-null string, post it as a follow-up comment:
94
100
 
95
- 2. **Collapsible details per test case:**
96
- - Show all steps with 200px thumbnails
97
- - Mark failure step with ⚠️ and inline error message
98
- - Include step count in summary line
101
+ ```bash
102
+ gh pr comment <PR#> --body "$(cat <<'EOF'
103
+ <comment field contents>
104
+ EOF
105
+ )"
106
+ ```
99
107
 
100
- 3. **HTML for thumbnails:**
101
- - Use `<a href="{url}"><img src="{url}" width="N"></a>` for clickable thumbnails
102
- - 120px width in summary table, 200px in details
108
+ ### Notes on fit vs. overflow
103
109
 
104
- 4. **All tests get screenshots:**
105
- - Passing tests show proof of success
106
- - Failing tests highlight the failure point
110
+ - **The common case is fit**: the full evidence (summary, per-test rows, collapsible failure details) lives in the PR description, no comment is posted.
111
+ - **The overflow case** is triggered automatically when the full inline body would exceed the CLI's budget. In that case the PR description contains the summary, the per-test rows, and a pointer line; the full step-by-step failure details live in the follow-up comment.
112
+ - You do not make the fit-vs-overflow decision — the CLI does. Never post the comment speculatively.
107
113
 
108
114
  ## Output
109
115
 
110
116
  **PRs Created:**
111
117
  - (repo name): (PR URL)
112
118
 
113
- **E2E acceptance evidence comments posted:**
119
+ **E2E acceptance overflow comments posted:** (only include repos where an overflow comment was actually posted)
114
120
  - (repo name): comment posted to PR #(number)
115
121
 
116
122
  **Errors:** (any repos where PR creation or comment posting failed, with the error message)
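Steps B through D above can be sketched in shell as follows. This is a minimal sketch, not the package's own tooling: it assumes `jq` is installed (the source does not mention it), and `extract_field` and `has_overflow_comment` are hypothetical helper names. The file path in the usage comment is the one written in Step B.

```bash
# Hypothetical helpers for reading the { "body": ..., "comment": ... | null }
# object that `muggle build-pr-section` writes to stdout.
# Assumption: jq is available on PATH (not stated in the source).

# Print one top-level string field from the CLI output file.
# $1 = path to the CLI output JSON, $2 = field name ("body" or "comment")
extract_field() {
  jq -r --arg key "$2" '.[$key] // empty' "$1"
}

# Succeed (exit 0) only when "comment" is a JSON string, i.e. not null.
# $1 = path to the CLI output JSON
has_overflow_comment() {
  jq -e '.comment | type == "string"' "$1" >/dev/null
}

# Usage sketch (left as comments because it needs a real PR number):
#   section=/tmp/muggle-pr-section.json
#   e2e_body=$(extract_field "$section" body)   # append this to the PR body
#   if has_overflow_comment "$section"; then
#     extract_field "$section" comment | gh pr comment <PR#> --body-file -
#   fi
```

Testing the JSON type of `comment`, rather than comparing text against the word `null`, keeps the fit-vs-overflow decision with the CLI output. Piping the comment through `--body-file -` is one way to avoid quoting problems with multi-line markdown; the heredoc form shown in Step D works as well.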
@@ -9,24 +9,24 @@ Use this as the top-level Muggle command router.
9
9
 
10
10
  ## Menu
11
11
 
12
- When user asks for "muggle" with no specific subcommand, show this command set:
12
+ When user asks for "muggle" with no specific subcommand, use `AskQuestion` to present the command set as clickable options:
13
13
 
14
- - `/muggle:muggle-do` — autonomous dev pipeline
15
- - `/muggle:muggle-test` — change-driven E2E acceptance testing (local or remote, with PR posting)
16
- - `/muggle:muggle-test-feature-local` — local feature E2E acceptance testing
17
- - `/muggle:muggle-status` — health check
18
- - `/muggle:muggle-repair` — repair broken installation
19
- - `/muggle:muggle-upgrade` — upgrade local installation
14
+ - "Test my changes — change-driven E2E acceptance testing (local or remote)" → `muggle-test`
15
+ - "Test a feature on localhost — run a single E2E test locally" → `muggle-test-feature-local`
16
+ - "Autonomous dev pipeline — requirements to PR" → `muggle-do`
17
+ - "Health check — verify installation status" → `muggle-status`
18
+ - "Repair — fix broken installation" → `muggle-repair`
19
+ - "Upgrade — update to latest version" → `muggle-upgrade`
20
20
 
21
21
  ## Routing
22
22
 
23
- If the user intent clearly matches one command, route to that command behavior:
23
+ If the user intent clearly matches one command, route directly — no menu needed:
24
24
 
25
- - status/health/check -> `muggle-status`
26
- - repair/fix/install broken -> `muggle-repair`
27
- - upgrade/update latest -> `muggle-upgrade`
28
- - test my changes/acceptance test my work/test before push/post E2E acceptance results to PR/test on staging/test on preview -> `muggle-test`
29
- - test localhost/validate single feature -> `muggle-test-feature-local`
30
- - build/implement from request -> `muggle-do`
25
+ - status/health/check → `muggle-status`
26
+ - repair/fix/install broken → `muggle-repair`
27
+ - upgrade/update latest → `muggle-upgrade`
28
+ - test my changes/acceptance test my work/test before push/post E2E acceptance results to PR/test on staging/test on preview → `muggle-test`
29
+ - test localhost/validate single feature → `muggle-test-feature-local`
30
+ - build/implement from request → `muggle-do`
31
31
 
32
- If intent is ambiguous, ask one concise clarification question.
32
+ If intent is ambiguous, use `AskQuestion` with the most likely options rather than asking the user to type a clarification.
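The routing list above amounts to a keyword lookup with a fallback. A minimal shell sketch, assuming plain substring matching is good enough: `route_intent` and the `ambiguous` sentinel are hypothetical names, and the patterns cover only some of the listed phrases (very broad words such as "check" and "fix" are left out on purpose).

```bash
# Hypothetical router: maps a free-text request to a Muggle command name.
# Only the command names come from the routing list; the patterns are illustrative.
route_intent() {
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *status*|*health*)                echo "muggle-status" ;;
    *repair*|*"install broken"*)      echo "muggle-repair" ;;
    *upgrade*|*"update latest"*)      echo "muggle-upgrade" ;;
    *localhost*|*"single feature"*)   echo "muggle-test-feature-local" ;;
    *"test my changes"*|*"before push"*|*staging*|*preview*)
                                      echo "muggle-test" ;;
    *build*|*implement*)              echo "muggle-do" ;;
    # No clear match: the caller should fall back to AskQuestion.
    *)                                echo "ambiguous" ;;
  esac
}
```

For example, `route_intent "Health check please"` prints `muggle-status`, while a bare `route_intent "muggle"` prints `ambiguous`, which corresponds to presenting the menu instead of routing directly.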
@@ -7,6 +7,15 @@ description: "Run change-driven E2E acceptance testing using Muggle AI — detec
7
7
 
8
8
  A router skill that detects code changes, resolves impacted test cases, executes them locally or remotely, publishes results to the Muggle AI dashboard, and posts E2E acceptance summaries to the PR. The user can invoke this at any moment, in any state.
9
9
 
10
+ ## UX Guidelines — Minimize Typing
11
+
12
+ **Every selection-based question MUST use the `AskQuestion` tool** (or the platform's equivalent structured selection tool). Never ask the user to "reply with a number" in a plain text message — always present clickable options.
13
+
14
+ - **Selections** (project, use case, test case, mode, approval): Use `AskQuestion` with labeled options the user can click.
15
+ - **Multi-select** (use cases, test cases): Use `AskQuestion` with `allow_multiple: true`.
16
+ - **Free-text inputs** (URLs, descriptions): Only use plain text prompts when there is no finite set of options. Even then, offer a detected/default value when possible.
17
+ - **Batch related questions**: If two questions are independent, present them together in a single `AskQuestion` call rather than asking sequentially.
18
+
10
19
  ## Step 1: Confirm Scope of Work (Always First)
11
20
 
12
21
  Parse the user's query and explicitly confirm their expectation. There are exactly two modes:
@@ -27,23 +36,13 @@ Signs the user wants this: mentions "preview", "staging", "deployed", "preview U
27
36
 
28
37
  ### Confirming
29
38
 
30
- If the user's intent is clear, state back what you understood and ask for confirmation:
31
- ```
32
- I'll run [local/remote] test generation. Confirm?
33
- ──────────────────────────────────────────────────────────────
34
- 1. Yes, proceed
35
- 2. No, switch to [the other mode]
36
- ──────────────────────────────────────────────────────────────
37
- ```
39
+ If the user's intent is clear, state back what you understood and use `AskQuestion` to confirm:
40
+ - Option 1: "Yes, proceed"
41
+ - Option 2: "Switch to [the other mode]"
38
42
 
39
- If ambiguous, ask the user to choose:
40
- ```
41
- How do you want to run the test?
42
- ──────────────────────────────────────────────────────────────
43
- 1. Local — launch browser on your machine against localhost
44
- 2. Remote — Muggle cloud tests against a preview/staging URL
45
- ──────────────────────────────────────────────────────────────
46
- ```
43
+ If ambiguous, use `AskQuestion` to let the user choose:
44
+ - Option 1: "Local — launch browser on your machine against localhost"
45
+ - Option 2: "Remote — Muggle cloud tests against a preview/staging URL"
47
46
 
48
47
  Only proceed after the user selects an option.
49
48
 
@@ -75,26 +74,21 @@ If auth fails repeatedly, suggest: `muggle logout && muggle login` from terminal
75
74
 
76
75
  ## Step 4: Select Project (User Must Choose)
77
76
 
77
+ A **project** is where all your test results, use cases, and test scripts are grouped on the Muggle AI dashboard. Pick the project that matches what you're working on.
78
+
78
79
  1. Call `muggle-remote-project-list`
79
- 2. Present **all** projects as a numbered list:
80
+ 2. Use `AskQuestion` to present all projects as clickable options. Include the project URL in each label so the user can identify the right one. Always include a "Create new project" option at the end.
80
81
 
81
- ```
82
- Available Muggle Projects:
83
- ──────────────────────────────────────────────────────────────
84
- 1. MUGGLE AI STAGING 1 https://staging.muggle-ai.com/
85
- 2. Tanka Testing https://www.tanka.ai
86
- 3. Staging muggleTestV0 https://staging.muggle-ai.com/muggleTestV0
87
- 4. [Create new project]
88
- ──────────────────────────────────────────────────────────────
89
- ```
82
+ Example labels:
83
+ - "MUGGLE AI STAGING 1 — https://staging.muggle-ai.com/"
84
+ - "Tanka Testing — https://www.tanka.ai"
85
+ - "Create new project"
90
86
 
91
- > "Which project should I use? Reply with the number."
87
+ Prompt: "Pick the project to group this test run into:"
92
88
 
93
89
  3. **Wait for the user to explicitly choose** — do NOT auto-select based on repo name or URL matching
94
90
  4. **If user chooses "Create new project"**:
95
- - Ask for `projectName`
96
- - Ask for `description`
97
- - Ask for the production/preview URL
91
+ - Ask for `projectName`, `description`, and the production/preview URL
98
92
  - Call `muggle-remote-project-create`
99
93
 
100
94
  Store the `projectId` only after user confirms.
@@ -104,21 +98,11 @@ Store the `projectId` only after user confirms.
104
98
  ### 5a: List existing use cases
105
99
  Call `muggle-remote-use-case-list` with the project ID.
106
100
 
107
- ### 5b: Present ALL use cases as a numbered list for user selection
101
+ ### 5b: Present ALL use cases for user selection
108
102
 
109
- ```
110
- Available Use Cases for [Project Name]:
111
- ──────────────────────────────────────────────────────────────────────────
112
- 1. Sign up for Muggle Test account
113
- 2. Access existing account via login
114
- 3. Manually Add a Use Case
115
- 4. View Generated Test Script After Test Run
116
- 5. Generate comprehensive UX testing reports
117
- 6. [Create new use case]
118
- ──────────────────────────────────────────────────────────────────────────
119
- ```
103
+ Use `AskQuestion` with `allow_multiple: true` to present all use cases as clickable options. Always include a "Create new use case" option at the end.
120
104
 
121
- > "Which use case do you want to test? Reply with the number (or multiple numbers separated by commas)."
105
+ Prompt: "Which use case(s) do you want to test?"
122
106
 
123
107
  ### 5c: Wait for explicit user selection
124
108
 
@@ -143,19 +127,11 @@ For the selected use case(s):
143
127
  ### 6a: List existing test cases
144
128
  Call `muggle-remote-test-case-list-by-use-case` with each use case ID.
145
129
 
146
- ### 6b: Present ALL test cases as a numbered list for user selection
130
+ ### 6b: Present ALL test cases for user selection
147
131
 
148
- ```
149
- Available Test Cases for "[Use Case Name]":
150
- ──────────────────────────────────────────────────────────────────────────
151
- 1. E2E: Login with valid credentials
152
- 2. E2E: Login with invalid password
153
- 3. E2E: Login with expired session
154
- 4. [Generate new test case]
155
- ──────────────────────────────────────────────────────────────────────────
156
- ```
132
+ Use `AskQuestion` with `allow_multiple: true` to present all test cases as clickable options. Always include a "Generate new test case" option at the end.
157
133
 
158
- > "Which test case(s) do you want to run? Reply with the number (or multiple numbers separated by commas)."
134
+ Prompt: "Which test case(s) do you want to run?"
159
135
 
160
136
  ### 6c: Wait for explicit user selection
161
137
 
@@ -170,40 +146,34 @@ Available Test Cases for "[Use Case Name]":
170
146
 
171
147
  ### 6e: Confirm final selection
172
148
 
173
- > "You selected [N] test case(s): [list titles]. Ready to proceed?"
149
+ Use `AskQuestion` to confirm: "You selected [N] test case(s): [list titles]. Ready to proceed?"
150
+ - Option 1: "Yes, run them"
151
+ - Option 2: "No, let me re-select"
174
152
 
175
153
  Wait for user confirmation before moving to execution.
176
154
 
177
155
  ## Step 7A: Execute — Local Mode
178
156
 
179
- ### Three separate questions (ask one at a time, wait for answer before next)
157
+ ### Pre-flight questions (batch where possible)
180
158
 
181
159
  **Question 1 — Local URL:**
182
- > "Your local app should be running. What's the URL? (e.g., http://localhost:3000)"
183
160
 
184
- Wait for user to provide URL before asking question 2.
161
+ Try to auto-detect the dev server URL by checking running terminals or common ports (e.g., `lsof -iTCP -sTCP:LISTEN -nP | grep -E ':(3000|3001|4200|5173|8080)'`). If a likely URL is found, present it as a clickable default via `AskQuestion`:
162
+ - Option 1: "http://localhost:3000" (or whatever was detected)
163
+ - Option 2: "Other — let me type a URL"
185
164
 
186
- **Question 2 — Electron launch approval:**
187
- ```
188
- I'll launch the Muggle Electron browser to run [N] test case(s).
189
- ──────────────────────────────────────────────────────────────
190
- 1. Yes, launch it
191
- 2. No, cancel
192
- ──────────────────────────────────────────────────────────────
193
- ```
165
+ If nothing detected, ask as free text: "Your local app should be running. What's the URL? (e.g., http://localhost:3000)"
194
166
 
195
- Wait for "1" before asking question 3. If "2", stop and ask what they want to do instead.
167
+ **Question 2 — Electron launch + window visibility (ask together):**
196
168
 
197
- **Question 3 — Window visibility:**
198
- ```
199
- How should the browser window appear?
200
- ──────────────────────────────────────────────────────────────
201
- 1. Visible (watch the browser as it runs)
202
- 2. Headless (run in background)
203
- ──────────────────────────────────────────────────────────────
204
- ```
169
+ After getting the URL, use a single `AskQuestion` call with two questions:
170
+
171
+ 1. "Ready to launch the Muggle Electron browser for [N] test case(s)?"
172
+ - "Yes, launch it (visible — I want to watch)"
173
+ - "Yes, launch it (headless — run in background)"
174
+ - "No, cancel"
205
175
 
206
- Wait for answer (1 or 2) before proceeding.
176
+ If user cancels, stop and ask what they want to do instead.
207
177
 
208
178
  ### Run sequentially
209
179
 
@@ -213,8 +183,8 @@ For each test case:
213
183
  2. Call `muggle-local-execute-test-generation`:
214
184
  - `testCase`: Full test case object from step 1
215
185
  - `localUrl`: User's local URL (from Question 1)
216
- - `approveElectronAppLaunch`: `true` (only if user said "yes" in Question 2)
217
- - `showUi`: `true` if user chose "visible", `false` if "headless" (from Question 3)
186
+ - `approveElectronAppLaunch`: `true` (only if user approved in Question 2)
187
+ - `showUi`: `true` if user chose "visible", `false` if "headless" (from Question 2)
218
188
  3. Store the returned `runId`
219
189
 
220
190
  If a generation fails, log it and continue to the next. Do not abort the batch.
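The auto-detection described under Question 1 can be sketched as a small filter over the `lsof` output. This is an illustration under assumptions: `candidate_urls` is a hypothetical helper, the scheme is assumed to be `http`, and the host is assumed to be `localhost`. The port list is the one from the example command.

```bash
# Hypothetical helper: turn `lsof -iTCP -sTCP:LISTEN -nP` output (on stdin)
# into candidate localhost URLs, limited to the common dev ports named above.
candidate_urls() {
  # Keep only ":<port> (LISTEN)" fragments for the known ports; requiring the
  # space after the port avoids matching longer numbers such as 30001.
  grep -Eo ':(3000|3001|4200|5173|8080) \(LISTEN\)' \
    | sed -E 's#^:([0-9]+) .*#http://localhost:\1#' \
    | sort -u
}

# Usage: lsof -iTCP -sTCP:LISTEN -nP | candidate_urls
```

Each printed URL would become one clickable option, followed by the "Other" option. When nothing is printed, the free-text prompt applies.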
@@ -320,16 +290,9 @@ gh pr view --json number,url,title 2>/dev/null
320
290
  ```
321
291
 
322
292
  - If a PR exists → post results as a comment
323
- - If no PR exists → ask:
324
- ```
325
- No open PR found for this branch.
326
- ──────────────────────────────────────────────────────────────
327
- 1. Create PR with E2E acceptance results
328
- 2. Skip posting to PR
329
- ──────────────────────────────────────────────────────────────
330
- ```
331
- - If 1: create PR with E2E acceptance results in the body (use `gh pr create`)
332
- - If 2: skip this step
293
+ - If no PR exists → use `AskQuestion`:
294
+ - "Create PR with E2E acceptance results"
295
+ - "Skip posting to PR"
333
296
 
334
297
  ### 9b: Build the E2E acceptance comment body
335
298
 
@@ -403,10 +366,11 @@ If creating a new PR — include the E2E acceptance section in the PR body along
403
366
  ## Guardrails
404
367
 
405
368
  - **Always confirm intent first** — never assume local vs remote without asking
406
- - **User MUST select project** — present numbered list, wait for explicit choice, never auto-select
407
- - **User MUST select use case(s)** — present numbered list, wait for explicit choice, never auto-select based on git changes or heuristics
408
- - **User MUST select test case(s)** — present numbered list, wait for explicit choice, never auto-select
409
- - **Ask three separate questions for local mode** — (1) URL as text input, (2) Electron approval as numbered choice, (3) visibility as numbered choice — one at a time, wait for each answer
369
+ - **User MUST select project** — present clickable options via `AskQuestion`, wait for explicit choice, never auto-select
370
+ - **User MUST select use case(s)** — present clickable options via `AskQuestion`, wait for explicit choice, never auto-select based on git changes or heuristics
371
+ - **User MUST select test case(s)** — present clickable options via `AskQuestion`, wait for explicit choice, never auto-select
372
+ - **Use `AskQuestion` for every selection** — never ask the user to type a number; always present clickable options
373
+ - **Batch related questions** — combine Electron approval + visibility into one question; auto-detect localhost URL when possible
410
374
  - **Never launch Electron without explicit user approval** (`approveElectronAppLaunch`)
411
375
  - **Never silently drop test cases** — log failures and continue, then report them
412
376
  - **Never guess the URL** — always ask the user for localhost or preview URL
@@ -5,7 +5,7 @@ description: Run a real-browser end-to-end (E2E) acceptance test against localho
5
5
 
6
6
  # Muggle Test Feature Local
7
7
 
8
- **Goal:** Run or generate an end-to-end test against a **local URL** using Muggle’s Electron browser.
8
+ **Goal:** Run or generate an end-to-end test against a **local URL** using Muggle's Electron browser.
9
9
 
10
10
  | Scope | MCP tools |
11
11
  | :---- | :-------- |
@@ -15,12 +15,19 @@ description: Run a real-browser end-to-end (E2E) acceptance test against localho
15
15
 
16
16
  The local URL only changes where the browser opens; it does not change the remote project or test definitions.
17
17
 
18
+ ## UX Guidelines — Minimize Typing
19
+
20
+ **Every selection-based question MUST use the `AskQuestion` tool** (or the platform's equivalent structured selection tool). Never ask the user to "reply with a number" in a plain text message — always present clickable options.
21
+
22
+ - **Selections** (project, use case, test case, script, approval): Use `AskQuestion` with labeled options the user can click.
23
+ - **Free-text inputs** (URLs, descriptions): Only use plain text prompts when there is no finite set of options. Even then, offer a detected/default value when possible.
24
+
18
25
  ## Workflow
19
26
 
20
27
  ### 1. Auth
21
28
 
22
29
  - `muggle-remote-auth-status`
23
- - If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
30
+ - If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
24
31
  Do not skip or assume auth.
25
32
 
26
33
  ### 2. Targets (user must confirm)
@@ -31,41 +38,45 @@ Ask the user to pick **project**, **use case**, and **test case** (do not infer)
31
38
  - `muggle-remote-use-case-list` (with `projectId`)
32
39
  - `muggle-remote-test-case-list-by-use-case` (with `useCaseId`)
33
40
 
34
- **Selection UI (mandatory):** After each list call, present choices as a **numbered list** (`1.` … `n.`). Keep each line minimal: number, short title, UUID. Ask the user to **reply with the number** or the UUID.
41
+ **Selection UI (mandatory):** Every selection MUST use `AskQuestion` with clickable options. Never ask the user to "reply with the number" in plain text.
35
42
 
36
- **Fixed tail of each pick list (project, use case, test case):** After the relevance-ranked rows, end with the options below. **Create new …** is never omitted; **Show full list** is omitted when it would be pointless (see empty list).
43
+ **Project selection context:** A **project** groups all your test results, use cases, and test scripts on the Muggle AI dashboard. Include the project URL in each option label so the user can identify the right one.
37
44
 
38
- 1. **Show full list** — user sees every row from the API (then re-number the full list including the tails below again). **Skip this option** if the API returned **zero** rows for that step (e.g. no test cases yet for the chosen use case). There is nothing to expand.
39
- 2. **Create new …** — user creates a new entity instead of picking an existing one. Label per step: **Create new project**, **Create new use case**, or **Create new test case**.
45
+ Prompt for projects: "Pick the project to group this test into:"
40
46
 
41
47
  **Relevance-first filtering (mandatory for project, use case, and test case lists):**
42
48
 
43
49
  - Do **not** dump the full list by default.
44
- - Rank items by semantic relevance to the user’s stated goal (title first, then description / user story / acceptance criteria).
45
- - Show only the **top 3–5** most relevant options, then **Show full list** (unless the API list is empty — see above), then **Create new …** as above.
46
- - If the user picks **Show full list**, then present the complete numbered list (still ending with **Create new …**; include **Show full list** again only when the full list has at least one row).
50
+ - Rank items by semantic relevance to the user's stated goal (title first, then description / user story / acceptance criteria).
51
+ - Show only the **top 3-5** most relevant options via `AskQuestion`, plus these fixed tail options:
52
+ - **"Show full list"** — present the complete list in a new `AskQuestion` call. **Skip this option** if the API returned zero rows.
53
+ - **"Create new ..."** — never omitted. Label per step: "Create new project", "Create new use case", or "Create new test case".
47
54
 
48
55
  **Create new — tools and flow (use these MCP tools; preview before persist):**
49
56
 
50
57
  - **Project — Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
51
- - **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
52
- 1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation.
58
+ - **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
59
+ 1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation via `AskQuestion`.
53
60
  2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
54
- - **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
55
- 1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation.
56
- 2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **§4** with that `testCaseId`.
61
+ - **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
62
+ 1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation via `AskQuestion`.
63
+ 2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **section 4** with that `testCaseId`.
57
64
 
58
65
  ### 3. Local URL
59
66
 
60
- - Use the URL the user gives. If none, ask; **do not guess**.
61
- - Remind them: local URL is only the execution target, not tied to cloud project config.
67
+ Try to auto-detect the dev server URL by checking running terminals or common ports (e.g., `lsof -iTCP -sTCP:LISTEN -nP | grep -E ':(3000|3001|4200|5173|8080)'`). If a likely URL is found, present it as a clickable default via `AskQuestion`:
68
+ - Option 1: "http://localhost:3000" (or whatever was detected)
69
+ - Option 2: "Other — let me type a URL"
70
+
71
+ If nothing detected, ask as free text: "Your local app should be running. What's the URL? (e.g., http://localhost:3000)"
72
+
73
+ Remind them: local URL is only the execution target, not tied to cloud project config.
62
74
 
63
75
  ### 4. Existing scripts vs new generation
64
76
 
65
77
  `muggle-remote-test-script-list` with `testCaseId`.
66
78
 
67
- - **If any replayable/succeeded scripts exist:** list them in a **numbered** list and ask: replay one **or** generate new.
68
- Show: name, id, created/updated, step count. Include **`Generate new script`** as the **last** numbered option (e.g. last number) so it is selectable by number too.
79
+ - **If any replayable/succeeded scripts exist:** use `AskQuestion` to present them as clickable options. Show: name, created/updated, step count per option. Include **"Generate new script"** as the last option.
69
80
  - **If none:** go straight to generation (no need to ask replay vs generate).
70
81
 
71
82
  ### 5. Load data for the chosen path
@@ -77,8 +88,8 @@ Ask the user to pick **project**, **use case**, and **test case** (do not infer)
77
88
 
78
89
  **Replay**
79
90
 
80
- 1. `muggle-remote-test-script-get` → note `actionScriptId`
81
- 2. `muggle-remote-action-script-get` with that id → full `actionScript`
91
+ 1. `muggle-remote-test-script-get` → note `actionScriptId`
92
+ 2. `muggle-remote-action-script-get` with that id → full `actionScript`
82
93
  **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
83
94
  3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
84
95
 
@@ -88,17 +99,22 @@ The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggl
88
99
 
89
100
  - **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
90
101
  - If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
91
- - **Test case design:** Preconditions like “a test run has already completed” on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
102
+ - **Test case design:** Preconditions like "a test run has already completed" on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
92
103
 
93
104
  ### Interpreting `failed` / non-zero Electron exit
94
105
 
95
106
  - **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
96
- - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying “view script after a successful run” when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
107
+ - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying "view script after a successful run" when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
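
The two failure readings above could be triaged roughly as follows. This is a sketch under assumptions: `diagnose_failure`, its parameters, and the returned labels are hypothetical and not part of the package. Only the timeout message shape and exit code 26 come from the text.

```python
import re
from typing import Optional

# Matches messages such as "Electron execution timed out after 300000ms".
_TIMEOUT_RE = re.compile(r"timed out after \d+\s*ms", re.IGNORECASE)


def diagnose_failure(
    message: str,
    exit_code: Optional[int] = None,
    logs_show_progress: bool = False,
) -> str:
    """Return a hypothetical triage label for a failed local run."""
    if _TIMEOUT_RE.search(message):
        if logs_show_progress:
            # Run was still advancing: raise `timeoutMs` and retry.
            return "orchestration_timeout"
        # No evidence of progress yet: inspect the Electron logs first.
        return "timeout_check_logs"
    if exit_code == 26:
        # Likely a goal-not-achievable outcome: read the run summary
        # via `muggle-local-run-result-get` before assuming a crash.
        return "read_run_summary"
    return "unknown"
```

Checking for the timeout message before the exit code keeps the cheaper fix (a larger `timeoutMs`) ahead of the test case redesign.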
97
108
 
98
109
  ### 6. Approval before any local execution
99
110
 
100
- Get **explicit** OK to launch Electron. State: replay vs generation, test case name, URL.
101
- Only then call local execute tools with `approveElectronAppLaunch: true`.
111
+ Use `AskQuestion` to get explicit approval before launching Electron. State: replay vs generation, test case name, URL.
112
+
113
+ - "Yes, launch Electron (visible — I want to watch)"
114
+ - "Yes, launch Electron (headless — run in background)"
115
+ - "No, cancel"
116
+
117
+ Only call local execute tools with `approveElectronAppLaunch: true` after the user selects a "Yes" option. Map visible to `showUi: true`, headless to `showUi: false`.
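
The mapping from the selected option to launch arguments might look like the sketch below. `launch_args_for` is a hypothetical helper, and matching on the words "visible" and "headless" inside the option label is an assumption made for illustration.

```python
from typing import Optional


def launch_args_for(choice: str) -> Optional[dict[str, bool]]:
    """Map an approval option label to launch arguments, or None to cancel."""
    if not choice.startswith("Yes"):
        # "No, cancel" (or anything else) means no local execution at all.
        return None
    if "visible" in choice:
        show_ui = True
    elif "headless" in choice:
        show_ui = False
    else:
        # Refuse to guess a display mode for an unrecognized "Yes" option.
        raise ValueError(f"unrecognized approval option: {choice!r}")
    return {"approveElectronAppLaunch": True, "showUi": show_ui}
```

Returning `None` for a refusal keeps `approveElectronAppLaunch: true` from being produced without a "Yes" selection.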
102
118
 
103
119
  ### 7. After successful generation only
104
120
 
@@ -112,8 +128,9 @@ Only then call local execute tools with `approveElectronAppLaunch: true`.
112
128
 
113
129
  ## Non-negotiables
114
130
 
115
- - No silent auth skip; no launching Electron without approval.
131
+ - No silent auth skip; no launching Electron without approval via `AskQuestion`.
116
132
  - If replayable scripts exist, do not default to generation without user choice.
117
133
  - No hiding failures: surface errors and artifact paths.
118
134
  - Replay: never hand-built or simplified `actionScript` — only from `muggle-remote-action-script-get`.
119
- - Project, use case, and test case selection lists must always include **Create new …**. Include **Show full list** whenever the API returned at least one row for that step; **omit Show full list** when the list is empty (offer **Create new …** only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
135
+ - Use `AskQuestion` for every selection: project, use case, test case, script, and approval. Never ask the user to type a number.
136
+ - Project, use case, and test case selection lists must always include "Create new ...". Include "Show full list" whenever the API returned at least one row for that step; omit "Show full list" when the list is empty (offer "Create new ..." only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.