@muggleai/works 4.4.0 → 4.5.0

This diff shows the content changes between publicly released versions of the package, as published to the supported registries. It is provided for informational purposes only.
Files changed (28)
  1. package/README.md +31 -13
  2. package/dist/{chunk-PMI2DI3V.js → chunk-MNCBJEPQ.js} +311 -35
  3. package/dist/cli.js +1 -1
  4. package/dist/index.js +1 -1
  5. package/dist/plugin/.claude-plugin/plugin.json +1 -1
  6. package/dist/plugin/.cursor-plugin/plugin.json +1 -1
  7. package/dist/plugin/README.md +1 -0
  8. package/dist/plugin/skills/do/open-prs.md +35 -74
  9. package/dist/plugin/skills/muggle-pr-visual-walkthrough/SKILL.md +181 -0
  10. package/dist/plugin/skills/muggle-test/SKILL.md +44 -48
  11. package/dist/plugin/skills/muggle-test-feature-local/SKILL.md +51 -1
  12. package/dist/plugin/skills/muggle-test-import/SKILL.md +122 -23
  13. package/dist/plugin/skills/muggle-test-regenerate-missing/SKILL.md +196 -0
  14. package/dist/plugin/skills/muggle-test-regenerate-missing/evals/evals.json +58 -0
  15. package/dist/plugin/skills/muggle-test-regenerate-missing/evals/trigger-eval.json +22 -0
  16. package/dist/release-manifest.json +7 -0
  17. package/package.json +7 -7
  18. package/plugin/.claude-plugin/plugin.json +1 -1
  19. package/plugin/.cursor-plugin/plugin.json +1 -1
  20. package/plugin/README.md +1 -0
  21. package/plugin/skills/do/open-prs.md +35 -74
  22. package/plugin/skills/muggle-pr-visual-walkthrough/SKILL.md +181 -0
  23. package/plugin/skills/muggle-test/SKILL.md +44 -48
  24. package/plugin/skills/muggle-test-feature-local/SKILL.md +51 -1
  25. package/plugin/skills/muggle-test-import/SKILL.md +122 -23
  26. package/plugin/skills/muggle-test-regenerate-missing/SKILL.md +196 -0
  27. package/plugin/skills/muggle-test-regenerate-missing/evals/evals.json +58 -0
  28. package/plugin/skills/muggle-test-regenerate-missing/evals/trigger-eval.json +22 -0
@@ -160,7 +160,10 @@ A **project** is where all your imported use cases, test cases, and future test

  ## Step 6 — Import

- Import in two passes. Show progress to the user as you go.
+ Import in two passes using bulk-preview. Show progress to the user as you go.
+
+ Both passes use Muggle's async bulk-preview MCP tools, which route prompts through OpenAI's
+ Batch API for roughly 50% of normal LLM cost. The flow is always: **submit → poll → persist**.

  ### Path A — Native PRD upload (for document files)

@@ -199,40 +202,135 @@ If the upload or processing fails, fall back to Path B manual extraction.

  Run both passes below for Playwright, Cypress, or other test scripts.

- ### Pass 1 — Create use cases (Path B only)
+ ### Shared limits (both passes)
+
+ - Maximum 100 prompts per submit call. If you have more, split into batches of 100 and submit sequentially.
+ - Maximum 4000 characters per `instruction`.
+ - Maximum 3 in-flight bulk-preview jobs per project (the submit tool will error if exceeded).
+
+ ### Shared error handling (both passes)

- Call `muggle-remote-use-case-create-from-prompts` with all use cases in a single batch:
+ The bulk-preview submit and get/cancel MCP tools surface structured error codes — look for
+ these on any tool result and act accordingly:

+ | Error code / symptom | What happened | What to do |
+ |---|---|---|
+ | `TOO_MANY_IN_FLIGHT_JOBS` (HTTP 429) | Already 3 in-flight jobs for this project | Tell the user: "There are already 3 bulk-preview jobs in progress for this project. Wait for them to finish, then retry." Stop. |
+ | `QUOTA_EXCEEDED_PREFLIGHT` (HTTP 409) | Batch would blow past the account's quota for this resource | Show: "Your quota allows at most `<maxPromptsAllowed>` prompts in this batch (current headroom: `<headroom>`). Please reduce the batch and try again." Stop. |
+ | `NOT_FOUND` on submit (HTTP 404) | Project or parent use case does not exist, or this server version doesn't expose bulk-preview yet | Tell the user which — double-check the IDs you passed. If you're confident the IDs are right, ask the user to make sure the prompt-service is up to date. Stop. |
+ | `VALIDATION_ERROR` (HTTP 400) | A prompt exceeds limits (e.g. >4000 chars) or the prompt list is empty | Fix the offending prompts and retry. |
+ | Payload > 1 MB (HTTP 413) | Body too large | Split into smaller batches. |
+
+ ### Shared polling loop
+
+ After a successful submit, poll with `muggle-remote-bulk-preview-job-get` (inputs: `projectId`,
+ `jobId`) every 15 seconds. Show progress like:
  ```
- projectId: <chosen project ID>
- prompts: [
-   { instruction: "<Use case name> — <one-sentence description of what this use case covers>" },
-   ...
- ]
+ Generating previews for "User Authentication"... (status: running, elapsed: 30s)
  ```

- After the call returns, collect the use case IDs from the response.
- If IDs are not in the response, call `muggle-remote-use-case-list` and match by name.
+ `status` values and what to do:

- ### Pass 2 — Create test cases
-
- For each use case, call `muggle-remote-test-case-create` for every test case under it:
+ | Status | Terminal? | Action |
+ |---|---|---|
+ | `queued` | No | Keep polling |
+ | `submitted` | No | Keep polling |
+ | `running` | No | Keep polling |
+ | `succeeded` | Yes | All prompts processed — proceed to persist results |
+ | `partial` | Yes | Some prompts succeeded — show summary, ask user whether to proceed |
+ | `failed` | Yes | Job failed entirely — show `error.message` and stop |
+ | `cancelled` | Yes | Job was cancelled — stop |
+ | `expired` | Yes | Job expired before completing — tell user to retry |

+ **If status is `partial`**, show:
  ```
- projectId: <project ID>
- useCaseId: <use case ID>
- title: "Login with valid credentials"
- description: "Navigate to the login page, enter a valid email and password, submit the form"
- goal: "Verify that a registered user can log in successfully"
- expectedResult: "User is redirected to the dashboard and sees their name in the header"
- precondition: "A user account exists and is not locked"
- priority: "HIGH"
- url: "https://app.example.com/login"
+ Preview completed with partial results: <N> of <promptCount> generated successfully.
+
+ Failed items:
+ - [<clientRef>] "<source text>": <error message>
+
+ Proceed with the <N> successful items, or cancel to review?
  ```
+ Use `AskQuestion` with options "Proceed with successful items" / "Cancel import". Only continue
+ if the user chooses to proceed.
+
+ If you need to abort an in-flight job, call `muggle-remote-bulk-preview-job-cancel` — the
+ server picks up the request cooperatively within one harvester tick.
+
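The submit → poll → persist loop the new skill text describes can be sketched in TypeScript roughly as follows. The `callTool` wrapper, the `Job` shape, and the stubbed responses are illustrative assumptions made so the control flow can run offline — only the tool names and status values come from the skill text above:

```typescript
// Sketch of the bulk-preview flow: submit → poll → persist.
type JobStatus =
  | "queued" | "submitted" | "running"                            // keep polling
  | "succeeded" | "partial" | "failed" | "cancelled" | "expired"; // terminal

interface Job { jobId: string; status: JobStatus }

const TERMINAL: JobStatus[] = ["succeeded", "partial", "failed", "cancelled", "expired"];

// Hypothetical MCP client stand-in, stubbed so the loop can run without a
// live Muggle server: the job reports "running" once, then "succeeded".
let polls = 0;
async function callTool(name: string, _args: Record<string, unknown>): Promise<Job> {
  if (name === "muggle-remote-use-case-bulk-preview-submit") {
    return { jobId: "job-1", status: "queued" };
  }
  polls += 1;
  return { jobId: "job-1", status: polls < 2 ? "running" : "succeeded" };
}

async function runBulkPreview(projectId: string, prompts: object[]): Promise<Job> {
  const { jobId } = await callTool("muggle-remote-use-case-bulk-preview-submit", { projectId, prompts });
  for (;;) {
    const job = await callTool("muggle-remote-bulk-preview-job-get", { projectId, jobId });
    console.log(`status: ${job.status}`);
    if (TERMINAL.includes(job.status)) return job; // persist results only after this point
    await new Promise((r) => setTimeout(r, 15));   // 15 s in the real loop; 15 ms here
  }
}

runBulkPreview("proj-1", [{ clientRef: "uc-0", instruction: "Login — one-sentence description" }])
  .then((job) => console.log(`terminal: ${job.status}`));
```

The key point the sketch encodes: persisting never starts until the job is terminal, and `partial` is terminal too — it just requires a user decision first.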
+ ### Pass 1 — Create use cases (Path B only)
+
+ 1. Call `muggle-remote-use-case-bulk-preview-submit` with one prompt per use case:
+ ```
+ projectId: <chosen project ID>
+ prompts: [
+   { clientRef: "uc-0", instruction: "<Use case name> — <one-sentence description>" },
+   ...
+ ]
+ ```
+ The call returns `{ jobId, status, kind, promptCount }`.
+
+ 2. Run the **Shared polling loop** above until the job reaches a terminal status.
+
+ 3. For each successful result (shape: `{ clientRef, index, status: "success", useCase: IUseCaseCreationRequest }`),
+ call `muggle-remote-use-case-create` to persist it — no LLM is invoked, so this is fast and free:
+ ```
+ projectId: <project ID>
+ title: <from useCase.title>
+ description: <from useCase.description>
+ userStory: <from useCase.userStory>
+ url: <from useCase.url>                      # optional
+ useCaseBreakdown: <from useCase.useCaseBreakdown>
+ status: <from useCase.status>                # e.g. DRAFT
+ priority: <from useCase.priority>            # e.g. MEDIUM
+ source: <from useCase.source>                # e.g. PROMPT
+ category: <from useCase.category>            # optional
+ ```
+
+ 4. Collect the returned `useCaseId` of each created use case — you'll need it for Pass 2.
+ It is safe to persist use cases in parallel once the job is terminal.
+
+ ### Pass 2 — Generate and create test cases
+
+ For each use case, run a bulk-preview job to generate its test cases.
+
+ 1. Call `muggle-remote-test-case-bulk-preview-submit`:
+ ```
+ projectId: <project ID>
+ useCaseId: <use case ID>
+ prompts: [
+   {
+     clientRef: "tc-0",
+     instruction: "<title> | goal: <goal> | expectedResult: <expectedResult> | precondition: <precondition> | priority: <HIGH|MEDIUM|LOW> | url: <url>"
+   },
+   ...
+ ]
+ ```
+
+ 2. Run the **Shared polling loop** above until the job reaches a terminal status.
+
+ 3. Each successful result has this shape (note the fan-out):
+ ```jsonc
+ { "clientRef": "tc-0", "index": 0, "status": "success", "testCases": [ /* ITestCaseCreationRequest[] — fan-out 1–5 */ ] }
+ ```
+ One input prompt may produce 1–5 test case items. For each item in `result.testCases`, call
+ `muggle-remote-test-case-create`:
+ ```
+ projectId: <project ID>
+ useCaseId: <use case ID>
+ title: <from testCase.title>
+ description: <from testCase.description>
+ goal: <from testCase.goal>
+ expectedResult: <from testCase.expectedResult>
+ precondition: <from testCase.precondition>
+ priority: <from testCase.priority>
+ url: <from testCase.url>
+ ```
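The fan-out in step 3 means one prompt can yield several test cases to persist. A minimal sketch, assuming a `persistTestCase` helper in place of the real `muggle-remote-test-case-create` call (the helper and the trimmed `TestCase` shape are illustrative, not the documented payload):

```typescript
// One successful prompt result fans out into 1–5 test cases; persist each item.
interface TestCase { title: string; goal: string }
interface FanOutResult { clientRef: string; status: "success" | "error"; testCases?: TestCase[] }

// Hypothetical stand-in for the muggle-remote-test-case-create MCP call.
const persisted: string[] = [];
function persistTestCase(projectId: string, useCaseId: string, tc: TestCase): void {
  persisted.push(`${projectId}/${useCaseId}/${tc.title}`);
}

function persistResults(projectId: string, useCaseId: string, results: FanOutResult[]): number {
  let created = 0;
  for (const r of results) {
    if (r.status !== "success" || !r.testCases) continue; // skip failed prompts
    for (const tc of r.testCases) {                       // fan-out: 1–5 items each
      persistTestCase(projectId, useCaseId, tc);
      created += 1;
    }
  }
  return created;
}

const created = persistResults("proj-1", "uc-1", [
  { clientRef: "tc-0", status: "success", testCases: [{ title: "Login ok", goal: "..." }, { title: "Login bad pw", goal: "..." }] },
  { clientRef: "tc-1", status: "error" }, // failed prompt — nothing to persist
]);
console.log(`created ${created} test cases`); // created 2 test cases
```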

  Print progress: `Creating test cases for "User Authentication"... (1/3)`

- It is safe to create test cases for different use cases in parallel — do so when you have many to create.
+ It is safe to create test cases for different use cases in parallel once their bulk-preview
+ jobs have reached a terminal status. However, **submit** bulk-preview jobs sequentially to
+ avoid exceeding the 3 in-flight job cap per project.

  ---

@@ -243,6 +341,7 @@ When all imports are done, print a clean summary. Include:
  - Total use cases and test cases created
  - A line per use case with its test case count and a link to view it
  - A link to the project overview
+ - If any items failed during preview (partial status), list them so the user can retry

  Construct view URLs using the Muggle dashboard URL pattern:
  - Project test cases: `https://www.muggle-ai.com/muggleTestV0/dashboard/projects/<projectId>/testcases`
@@ -0,0 +1,196 @@
+ ---
+ name: muggle-test-regenerate-missing
+ description: "Bulk-regenerate test scripts for every test case in a Muggle AI project that doesn't currently have an active script. Scans the project, finds test cases stuck in DRAFT or GENERATION_PENDING (no usable script attached), shows the user the list, and on approval kicks off remote test script generation for each one in parallel via the Muggle cloud. Use this skill whenever the user asks to 'regenerate missing scripts', 'fill in missing test scripts', 'generate scripts for test cases without one', 'regen all the test cases that don't have scripts', 'rebuild scripts for stale test cases', 'fix test cases with no script', 'bulk regenerate', or any phrasing that means 'kick off script generation across a project for the cases that need it'. Triggers on: 'regenerate missing test scripts', 'generate scripts for all empty test cases', 'fill the gaps in my test scripts', 'bulk test script regen', 'all my test cases without active scripts'. This is the go-to skill for project-wide script catch-up — it handles discovery, filtering, confirmation, and remote workflow dispatch end-to-end."
+ ---
+
+ # Muggle Test — Regenerate Missing Test Scripts
+
+ A bulk maintenance skill for Muggle AI projects. It finds every test case in a project that does **not** currently have an active (ready-to-run) test script, shows the list to the user, and on approval triggers a remote test script generation workflow for each one. Useful after creating a batch of new test cases or when cleaning up a project that has drifted.
+
+ Execution is **remote only** — Muggle's cloud generates the scripts in parallel against the project URL. The user's machine is not involved beyond making API calls.
+
+ ## Concept: what counts as "no active script"
+
+ In the Muggle data model, a test case carries a status that reflects whether it has a usable script attached:
+
+ | Status | Meaning | Regenerate? |
+ |:-------|:--------|:-----------:|
+ | `ACTIVE` | Has a generated, ready-to-run script | No — already good |
+ | `DRAFT` | Created but never generated | **Yes** |
+ | `GENERATION_PENDING` | Queued but generation never started | **Yes** |
+ | `GENERATING` | Currently generating | No — generation is in flight; don't double-dispatch |
+ | `REPLAY_PENDING` / `REPLAYING` | A replay is in flight; script exists | No — busy with replay |
+ | `DEPRECATED` | Marked stale on purpose | No — user decision |
+ | `ARCHIVED` | Hidden from normal flows | No — user decision |
+
+ The skill targets exactly **DRAFT** and **GENERATION_PENDING** by default. These are the two states that mean "no usable script attached and nothing running to produce one." `GENERATING` is deliberately excluded — a workflow is already in progress, and firing a second one races against the first and wastes budget.
+
+ Treat this filter as a default, not a law. If the user explicitly says "include generating ones too, they're stuck" or "include deprecated", respect the override — but don't widen the filter on your own.
+
+ ## UX Guidelines — Minimize Typing
+
+ **Every selection-based question MUST use the `AskQuestion` tool** (or the platform's equivalent structured selection tool). Never ask the user to "reply with a number" — always present clickable options.
+
+ - **Selections** (project, which test cases to include): Use `AskQuestion`, with `allow_multiple: true` for the test case picker.
+ - **Free-text inputs** (project URL when creating, override filters): Only ask as plain text when the option set isn't finite.
+ - **Batch related questions** when independent. Don't ask sequentially what could be one screen.
+
+ ## Workflow
+
+ ### Step 1 — Authenticate
+
+ 1. Call `muggle-remote-auth-status`.
+ 2. If not authenticated or expired → call `muggle-remote-auth-login`, then poll with `muggle-remote-auth-poll`.
+ 3. Do not skip auth and do not assume a stale token still works.
+
+ If auth keeps failing, suggest the user run `muggle logout && muggle login` from a terminal.
+
+ ### Step 2 — Select Project (user must choose)
+
+ A **project** is the unit on the Muggle AI dashboard that groups test cases, scripts, and runs. The user must pick the one to scan — never auto-select from repo name, branch, or URL heuristics.
+
+ 1. Call `muggle-remote-project-list`.
+ 2. Use `AskQuestion` to present projects as clickable options. Include the project URL in each label so the user can disambiguate. Always include "Create new project" as the last option.
+ 3. Wait for explicit selection.
+ 4. If the user picks "Create new project": collect `projectName`, `description`, and `url`, then call `muggle-remote-project-create`.
+
+ Store `projectId` and `projectUrl` only after the user confirms — both are needed downstream.
+
+ ### Step 3 — Scan Test Cases
+
+ Pull the full set of test cases for the project. The list endpoint is paginated.
+
+ 1. Call `muggle-remote-test-case-list` with `projectId`. Start at page 1; continue requesting pages until the response indicates no more items. Use a generous `pageSize` (e.g. 100) to keep the call count low.
+ 2. Accumulate everything into a single in-memory array.
+ 3. Tell the user the totals as you go if the project is large (e.g., "Found 247 test cases across the project").
+
+ If the call returns zero test cases, stop and tell the user — there is nothing to regenerate. Suggest they create test cases first (point them at `muggle-test` or `muggle-test-feature-local`).
+
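The paginated scan in Step 3 is an accumulate-until-done loop. A sketch with a stubbed list endpoint — the `items`/`hasMore` response fields are assumptions standing in for whatever pagination shape `muggle-remote-test-case-list` actually returns:

```typescript
// Accumulate all test cases across pages until the list endpoint is exhausted.
interface TestCase { id: string; status: string }
interface Page { items: TestCase[]; hasMore: boolean }

// Stubbed list endpoint: three pages of canned data stand in for the MCP call.
const pages: TestCase[][] = [
  [{ id: "tc-1", status: "DRAFT" }, { id: "tc-2", status: "ACTIVE" }],
  [{ id: "tc-3", status: "GENERATION_PENDING" }],
  [],
];
function listTestCases(page: number): Page {
  const items = pages[page - 1] ?? [];
  return { items, hasMore: items.length > 0 && page < pages.length };
}

function scanProject(): TestCase[] {
  const acc: TestCase[] = [];
  let page = 1;
  for (;;) {
    const res = listTestCases(page); // real call: pass projectId and a generous pageSize (e.g. 100)
    acc.push(...res.items);
    if (!res.hasMore) break;
    page += 1;
  }
  return acc;
}

const cases = scanProject();
console.log(`Found ${cases.length} test cases across the project`);
```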
+ ### Step 4 — Filter to Missing Scripts
+
+ Apply the default filter: keep only test cases where `status` ∈ `{DRAFT, GENERATION_PENDING}`.
+
+ Then show a compact summary. Also surface how many cases are currently `GENERATING` so the user knows about in-flight work (but don't include them in the candidate list unless they override the filter):
+
+ ```
+ Project: <name> (<projectId>)
+ Total test cases: 247
+ With active script (skipped): 198
+ Currently generating (skipped, in-flight): 32
+ Needs regeneration: 17
+   • DRAFT: 12
+   • GENERATION_PENDING: 5
+ ```
+
+ If after filtering the list is empty, congratulate the user — every test case already has an active script — and stop. Do not invent work.
+
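The default filter in Step 4 is just a set-membership check. A minimal sketch — the sample array is illustrative data, not real project contents:

```typescript
// Default filter: only DRAFT and GENERATION_PENDING need regeneration.
const REGENERATE_DEFAULT = new Set(["DRAFT", "GENERATION_PENDING"]);

interface TC { id: string; status: string }

// `allowed` can be widened only on explicit user override.
function needsRegeneration(tc: TC, allowed: Set<string> = REGENERATE_DEFAULT): boolean {
  return allowed.has(tc.status);
}

const sample: TC[] = [
  { id: "tc-1", status: "DRAFT" },
  { id: "tc-2", status: "ACTIVE" },              // already has a script — skip
  { id: "tc-3", status: "GENERATING" },          // in flight — skip by default
  { id: "tc-4", status: "GENERATION_PENDING" },
];
const candidates = sample.filter((tc) => needsRegeneration(tc));
console.log(candidates.map((tc) => tc.id).join(", ")); // tc-1, tc-4
```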
+ ### Step 5 — Present and Confirm Selection
+
+ Use `AskQuestion` with `allow_multiple: true` to present every candidate test case as a clickable option. The user must explicitly approve which ones to regenerate.
+
+ For each option label, show enough context for the user to make a real decision:
+
+ ```
+ [<status>] <title> — use case: <use case title>
+ ```
+
+ For example:
+ - `[DRAFT] Sign up with valid email — use case: User Registration`
+ - `[GENERATION_PENDING] Add item to cart — use case: Checkout Flow`
+
+ Default behavior:
+ - If there are **≤ 25** candidates, present all of them in a single `AskQuestion` with everything pre-checked and let the user deselect anything they want to skip.
+ - If there are **> 25** candidates, show the first 25 ranked by status priority (`DRAFT` → `GENERATION_PENDING`), plus a tail option **"Include all N — don't make me click each one"**. The user can also pick "Show next batch" to see more.
+
+ After selection, call `AskQuestion` once more for a final confirmation:
+
+ > "About to start remote test script generation for **N** test cases against `<projectUrl>`. This will consume Muggle workflow budget. Proceed?"
+ >
+ > - "Yes, start all N"
+ > - "No, cancel"
+
+ Only proceed after the user picks "Yes".
+
+ ### Step 6 — Dispatch Remote Generations
+
+ For each selected test case, in order:
+
+ 1. Call `muggle-remote-test-case-get` with `testCaseId` to fetch the full record (the list endpoint returns a slim shape; generation needs `goal`, `precondition`, `instructions`, `expectedResult`, `url`).
+ 2. Call `muggle-remote-workflow-start-test-script-generation` with:
+    - `projectId` — from Step 2
+    - `useCaseId` — from the test case
+    - `testCaseId` — the test case being regenerated
+    - `name` — `"muggle-test-regenerate-missing: {test case title}"` (so it's easy to find this batch later in the dashboard)
+    - `url` — prefer the test case's own `url` if set, else the project URL from Step 2
+    - `goal`, `precondition`, `instructions`, `expectedResult` — straight from the test case. If `precondition` is empty, pass `"None"` (the schema requires a non-empty string).
+ 3. Capture the returned workflow runtime ID and store it alongside the test case.
+
+ **Failure handling:** if a single dispatch fails (validation error, server error, missing field), log it inline, mark the test case as `dispatch_failed`, and continue to the next one. Do not abort the whole batch — partial progress is more useful than nothing.
+
+ **Pacing:** Muggle's cloud handles parallelism on its side, so you don't need to throttle. Just dispatch sequentially as fast as the API will accept them.
+
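The Step 6 dispatch loop with its log-and-continue failure handling can be sketched as follows. `startGeneration` is a hypothetical stand-in for the `muggle-remote-workflow-start-test-script-generation` call, stubbed to fail once so the failure path is exercised:

```typescript
// Dispatch generation per candidate; one failure must not abort the batch.
interface Candidate { testCaseId: string; title: string }
interface DispatchRow { testCaseId: string; runtimeId?: string; error?: string }

// Hypothetical workflow-start stand-in; fails for one canned case.
function startGeneration(tc: Candidate): string {
  if (tc.testCaseId === "tc-bad") throw new Error("missing instructions field");
  return `rt-${tc.testCaseId}`;
}

function dispatchAll(candidates: Candidate[]): DispatchRow[] {
  const rows: DispatchRow[] = [];
  for (const tc of candidates) {
    try {
      rows.push({ testCaseId: tc.testCaseId, runtimeId: startGeneration(tc) });
    } catch (e) {
      // Log inline, mark as failed, and continue — partial progress beats none.
      rows.push({ testCaseId: tc.testCaseId, error: (e as Error).message });
    }
  }
  return rows;
}

const rows = dispatchAll([
  { testCaseId: "tc-1", title: "Sign up with valid email" },
  { testCaseId: "tc-bad", title: "Apply expired coupon" },
  { testCaseId: "tc-2", title: "Add item to cart" },
]);
const started = rows.filter((r) => r.runtimeId).length;
console.log(`Total: ${rows.length} dispatched | ${started} started | ${rows.length - started} failed`);
```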
132
+ ### Step 7 — Report
133
+
134
+ After all dispatches are done, print a summary table:
135
+
136
+ ```
137
+ Test Case Use Case Prev Status Dispatch Runtime
138
+ ───────────────────────────────────────────────────────────────────────────────────────────────────
139
+ Sign up with valid email User Registration DRAFT ✅ started rt-abc123
140
+ Sign up with disposable email User Registration DRAFT ✅ started rt-def456
141
+ Add item to cart Checkout Flow GENERATION_PEND. ✅ started rt-ghi789
142
+ Apply expired coupon Checkout Flow GENERATION_PEND. ❌ failed —
143
+ ───────────────────────────────────────────────────────────────────────────────────────────────────
144
+ Total: 17 dispatched | 16 started | 1 failed
145
+ ```
146
+
147
+ For failures: include a one-line error excerpt and (where possible) a hint at the cause (e.g., "missing instructions field — edit the test case in the dashboard, then re-run this skill").
148
+
149
+ ### Step 8 — Open the Dashboard
150
+
151
+ Open the Muggle AI dashboard so the user can watch progress visually:
152
+
153
+ ```bash
154
+ open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs"
155
+ ```
156
+
157
+ Tell them:
158
+
159
+ > "I've opened the project's runs page. Generation jobs typically take a few minutes each — they'll appear here as they progress. Your test cases will move into `ACTIVE` status as scripts complete."
160
+
161
+ ### Step 9 (optional) — Poll Status
162
+
163
+ Only if the user explicitly asks for a status check, call `muggle-remote-wf-get-ts-gen-latest-run` for each runtime ID and report progress. **Do not poll on a tight loop by default** — the dispatch step is the actual goal of this skill, and the dashboard already shows live status better than a CLI loop ever can. Polling exists as a courtesy for users who don't want to leave the terminal.
164
+
165
+ If the user wants a one-shot snapshot, present a small table:
166
+
167
+ ```
168
+ Test Case Runtime Status Steps so far
169
+ ────────────────────────────────────────────────────────────
170
+ Sign up with valid email rt-abc123 RUNNING 6
171
+ Add item to cart rt-ghi789 COMPLETED 12
172
+ ```
173
+
174
+ ## Tool Reference
175
+
176
+ | Phase | Tool |
177
+ |:------|:-----|
178
+ | Auth | `muggle-remote-auth-status`, `muggle-remote-auth-login`, `muggle-remote-auth-poll` |
179
+ | Project | `muggle-remote-project-list`, `muggle-remote-project-create` |
180
+ | Scan | `muggle-remote-test-case-list` (paginated) |
181
+ | Detail | `muggle-remote-test-case-get` |
182
+ | Dispatch | `muggle-remote-workflow-start-test-script-generation` |
183
+ | Status (optional) | `muggle-remote-wf-get-ts-gen-latest-run`, `muggle-remote-wf-get-latest-ts-gen-by-tc` |
184
+ | Browser | `open` (shell command) |
185
+
186
+ ## Non-negotiables
187
+
188
+ - **The user MUST select the project** — present projects via `AskQuestion`, never infer from cwd, repo name, or URL guesses.
189
+ - **The user MUST approve which test cases to regenerate** — show the candidates via `AskQuestion`, let them deselect, then confirm again before any dispatch. Bulk-regenerating without approval can waste meaningful workflow budget.
190
+ - **Default filter is `DRAFT` + `GENERATION_PENDING`** — never include `GENERATING`, `ACTIVE`, `DEPRECATED`, `ARCHIVED`, `REPLAYING`, or `REPLAY_PENDING` unless the user explicitly says so. `GENERATING` already has a workflow in flight and dispatching another races against it. `ACTIVE` test cases already have working scripts. The rest reflect deliberate user decisions or in-flight replays the skill should not interfere with.
191
+ - **Use `muggle-remote-test-case-get` before each dispatch** — the list endpoint returns a slim shape and generation needs the full payload.
192
+ - **Failures don't abort the batch** — log and continue, then surface them at the end. Partial progress beats no progress.
193
+ - **Never throttle artificially** — dispatch sequentially as fast as the API accepts. Muggle's cloud handles parallelism.
194
+ - **Open the dashboard, don't poll by default** — the runs page is the canonical view of progress. Only poll if the user explicitly asks.
195
+ - **Use `AskQuestion` for every selection** — never ask the user to type a number.
196
+ - **Can be invoked at any state** — if the user already has a project chosen in conversation context, skip Step 2 and go straight to scanning.
@@ -0,0 +1,58 @@
1
+ {
2
+ "skill_name": "muggle-test-regenerate-missing",
3
+ "notes": "These evals test the PLAN behavior. Because Muggle MCP tools require live auth and a real project, the subagents can't actually execute the workflow — each prompt tells them to write a step-by-step plan instead. The assertions check whether the plan reflects the skill's non-negotiables (correct filter, AskQuestion-based selection, no auto-select, dispatch via the remote workflow tool, batch-failure tolerance, dashboard open-at-end).",
4
+ "evals": [
5
+ {
6
+ "id": 0,
7
+ "eval_name": "vague-catch-up-request",
8
+ "prompt": "I just finished importing a bunch of test cases from our PRD into our Muggle project 'Acme Checkout QA' and half of them never got actual scripts generated — they're still sitting there as drafts. Can you kick off script generation for the ones that need it?",
9
+ "files": [],
10
+ "assertions": [
11
+ { "name": "calls_auth_status_first", "text": "Plan starts by calling muggle-remote-auth-status (and login/poll if needed) before any other Muggle tool." },
12
+ { "name": "project_selection_via_AskQuestion", "text": "Plan calls muggle-remote-project-list and presents projects via AskQuestion rather than auto-selecting 'Acme Checkout QA' by name match." },
13
+ { "name": "default_filter_is_draft_and_generation_pending", "text": "Plan states that the status filter is DRAFT + GENERATION_PENDING and explicitly excludes GENERATING, ACTIVE, DEPRECATED, ARCHIVED, REPLAYING, and REPLAY_PENDING." },
14
+ { "name": "uses_test_case_list_paginated", "text": "Plan calls muggle-remote-test-case-list with pagination (or equivalent full-enumeration) for the project scan step." },
15
+ { "name": "test_case_get_before_dispatch", "text": "Plan calls muggle-remote-test-case-get for each candidate before dispatching generation, to obtain the full payload (goal, precondition, instructions, expectedResult, url)." },
16
+ { "name": "dispatch_via_remote_workflow_tool", "text": "Plan dispatches each regeneration via muggle-remote-workflow-start-test-script-generation (not a local Electron tool)." },
17
+ { "name": "candidate_list_via_AskQuestion_multi_select", "text": "Plan presents the candidate test cases via AskQuestion with multi-select (allow_multiple: true) so the user can deselect individuals." },
18
+ { "name": "final_confirmation_step", "text": "Plan includes a final yes/no confirmation AskQuestion before any dispatch, showing the count of test cases about to be regenerated." },
19
+ { "name": "batch_failure_tolerance", "text": "Plan explicitly states that a single dispatch failure does not abort the batch — failures are logged and the loop continues." },
20
+ { "name": "opens_dashboard_runs_page_at_end", "text": "Plan ends by opening the Muggle dashboard project runs page for the user to watch progress." }
21
+ ]
22
+ },
23
+ {
24
+ "id": 1,
25
+ "eval_name": "explicit-filter-override",
26
+ "prompt": "Our project 'Tanka Staging' has about 200 test cases. I want to regenerate scripts for everything that's not active — including the ones that are stuck in GENERATING, because those have been sitting there for two days and I'm pretty sure they're dead. Please walk me through what you'd do.",
27
+ "files": [],
28
+ "assertions": [
29
+ { "name": "calls_auth_status_first", "text": "Plan starts by calling muggle-remote-auth-status before any other Muggle tool." },
30
+ { "name": "project_selection_via_AskQuestion", "text": "Plan uses AskQuestion to let the user confirm project choice; does not auto-select 'Tanka Staging' by name match alone." },
31
+ { "name": "honors_override_to_include_generating", "text": "Plan explicitly honors the user's override and includes GENERATING in the filter for this run, noting that this is a deliberate override of the default filter." },
32
+ { "name": "still_excludes_active_deprecated_archived", "text": "Even with the override, the plan still excludes ACTIVE, DEPRECATED, ARCHIVED, REPLAY_PENDING, and REPLAYING — it only widens the filter to include GENERATING, not everything." },
33
+ { "name": "handles_large_candidate_list", "text": "Plan acknowledges that with ~200 test cases the candidate list may exceed 25 and describes paging or bulk-include handling (e.g., 'Include all N' tail option)." },
34
+ { "name": "dispatch_via_remote_workflow_tool", "text": "Plan dispatches each regeneration via muggle-remote-workflow-start-test-script-generation." },
35
+ { "name": "final_confirmation_step", "text": "Plan includes a final yes/no AskQuestion confirmation before any dispatch, showing the total count." },
36
+ { "name": "batch_failure_tolerance", "text": "Plan states that a single dispatch failure does not abort the batch." },
37
+ { "name": "opens_dashboard_runs_page_at_end", "text": "Plan ends by opening the Muggle dashboard project runs page." }
38
+ ]
39
+ },
40
+ {
41
+ "id": 2,
42
+ "eval_name": "small-project-confirmation",
43
+ "prompt": "can you bulk regen my muggle test cases that don't have scripts? The project is tiny — I think there are only like 6 or 7 test cases in DRAFT.",
44
+ "files": [],
45
+ "assertions": [
46
+ { "name": "calls_auth_status_first", "text": "Plan starts by calling muggle-remote-auth-status before any other Muggle tool." },
47
+ { "name": "project_selection_via_AskQuestion", "text": "Plan uses AskQuestion to let the user pick the project — does not infer it." },
48
+ { "name": "default_filter_is_draft_and_generation_pending", "text": "Plan states that the status filter is DRAFT + GENERATION_PENDING by default." },
49
+ { "name": "small_list_all_preselected", "text": "Plan describes presenting all candidates in a single AskQuestion with everything pre-checked (since the candidate list is ≤ 25)." },
50
+ { "name": "test_case_get_before_dispatch", "text": "Plan calls muggle-remote-test-case-get for each candidate before dispatching generation." },
51
+ { "name": "dispatch_via_remote_workflow_tool", "text": "Plan dispatches each regeneration via muggle-remote-workflow-start-test-script-generation." },
52
+ { "name": "final_confirmation_step", "text": "Plan includes a final yes/no AskQuestion confirmation before any dispatch." },
53
+ { "name": "batch_failure_tolerance", "text": "Plan states that a single dispatch failure does not abort the batch." },
54
+ { "name": "opens_dashboard_runs_page_at_end", "text": "Plan ends by opening the Muggle dashboard project runs page." }
55
+ ]
56
+ }
57
+ ]
58
+ }
@@ -0,0 +1,22 @@
+ [
+ { "query": "ok so we just imported about 40 test cases from the PRD into our muggle project and most of them are sitting in DRAFT with no scripts attached — can you kick off script generation for all the ones that don't have an active script yet? project is 'Acme Checkout QA'", "should_trigger": true },
+ { "query": "bulk regen missing test scripts in my muggle project", "should_trigger": true },
+ { "query": "I looked at the test cases page in muggle and like 30 out of 80 are still showing as DRAFT. is there a way to batch-regenerate all the ones without an active script instead of clicking each one in the UI?", "should_trigger": true },
+ { "query": "can you fill in the gaps in my muggle project — i want every test case that doesn't have a working script to have one. the project is called 'internal-admin-e2e'", "should_trigger": true },
+ { "query": "all my muggle test cases without active scripts need a script now please", "should_trigger": true },
+ { "query": "regenerate every test case in 'tanka staging' that doesn't currently have an active test script attached. default filter is fine, just don't touch the ACTIVE ones.", "should_trigger": true },
+ { "query": "I have a ton of generation_pending and draft test cases in muggle — they've been stuck for days. kick off generation for all of them at once.", "should_trigger": true },
+ { "query": "rebuild the missing muggle scripts for my project, the inactive test cases need to be regenerated in one batch", "should_trigger": true },
+ { "query": "kick off muggle script generation for every test case that's still in DRAFT state — i don't want to go through them one by one in the dashboard", "should_trigger": true },
+ { "query": "bulk fill the empty test scripts across my whole project so they all become active", "should_trigger": true },
+ { "query": "I just pushed a new commit on branch users/stan/checkout-refactor — can you run the muggle e2e acceptance tests against my localhost dev server (it's on port 3000) and make sure nothing is broken before I open the PR?", "should_trigger": false },
+ { "query": "write me a playwright test for the checkout flow that covers the happy path — user adds item, enters address, pays with test card", "should_trigger": false },
+ { "query": "can you run a single muggle test against my localhost for the signup flow? I want to watch it in the visible Electron window, the dev server is on http://localhost:5173", "should_trigger": false },
+ { "query": "my muggle-ai-ui CI is failing — the test script for 'login with valid creds' keeps timing out. can you look at the action script, figure out why, and fix it?", "should_trigger": false },
+ { "query": "I have a cypress test suite in cypress/integration/** and a few markdown PRDs in docs/requirements. import them into muggle as use cases and test cases under my project 'Acme Checkout QA'", "should_trigger": false },
+ { "query": "what muggle projects do I have right now? just list them for me with their URLs", "should_trigger": false },
+ { "query": "create a new muggle test case for 'apply expired coupon shows error banner' under the Checkout Flow use case in project 'Acme Checkout QA'", "should_trigger": false },
+ { "query": "regen my package-lock.json and reinstall node_modules — something is off with the muggle-ai-works workspace after the dependabot merge", "should_trigger": false },
+ { "query": "check the status of my current muggle test script generation — i kicked off about 15 runs earlier and i want to know which ones finished and which are still running", "should_trigger": false },
+ { "query": "generate a brand new use case and some test cases from scratch for my muggle project using a plain-english description of our new onboarding flow", "should_trigger": false }
+ ]
@@ -0,0 +1,7 @@
+ {
+ "release": "4.5.0",
+ "buildId": "run-13-1",
+ "commitSha": "bff100bb7229ea757db7154a9ef0c289da0124ef",
+ "buildTime": "2026-04-09T22:06:42Z",
+ "serviceName": "muggle-ai-works-mcp"
+ }
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@muggleai/works",
  "mcpName": "io.github.multiplex-ai/muggle",
- "version": "4.4.0",
+ "version": "4.5.0",
  "description": "Ship quality products with AI-powered E2E acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
  "type": "module",
  "main": "dist/index.js",
@@ -16,7 +16,7 @@
  ],
  "scripts": {
  "clean": "rimraf dist",
- "build": "tsup && node scripts/sync-versions.mjs && node scripts/build-plugin.mjs",
+ "build": "tsup && node scripts/write-release-manifest.mjs && node scripts/sync-versions.mjs && node scripts/build-plugin.mjs",
  "build:plugin": "node scripts/build-plugin.mjs",
  "sync:versions": "node scripts/sync-versions.mjs",
  "build:release": "npm run build",
@@ -41,14 +41,14 @@
  "test:watch": "vitest"
  },
  "muggleConfig": {
- "electronAppVersion": "1.0.36",
+ "electronAppVersion": "1.0.47",
  "downloadBaseUrl": "https://github.com/multiplex-ai/muggle-ai-works/releases/download",
  "runtimeTargetDefault": "production",
  "checksums": {
- "darwin-arm64": "0466282dbccab499a42a1da13e4ff1883b58044de8c66858e77bad0cbee6979e",
- "darwin-x64": "405730a320b8eb6ced6e250840b75758de32355705eedeeb81297aa3e211805a",
- "win32-x64": "962fd484f8409b662ba71f24fdafb0a665f28e52b7066696f5c5d16ebc7ba1ed",
- "linux-x64": "f3cd45b893fa7f90e0af1e5b911ce13bf9a7a25d038298cb1ff2465145f08abc"
+ "darwin-arm64": "f80b943ea5f05e7113d603ee8104c07be101a26092c4fa50ed6fcb37a0cbebff",
+ "darwin-x64": "3189a5c07087f9ba2ef03f99e3735055c00a752b2421ae9cffc113f04933da61",
+ "win32-x64": "d8e102fce024776262f856dfea0b12429e853689d29d03e27e54dbeffbe59725",
+ "linux-x64": "d7218dddcbab47f78c64fe438cf5bd129740f20f6ad160b615a648415f8faffc"
  }
  },
  "dependencies": {
@@ -1,7 +1,7 @@
  {
  "name": "muggle",
  "description": "Run real-browser end-to-end (E2E) acceptance tests on your web app from any AI coding agent. Generate test scripts from plain English, replay them on localhost, capture screenshots, and validate user flows like signup, checkout, and dashboards. Works across Claude Code, Cursor, Codex, and Windsurf.",
- "version": "4.4.0",
+ "version": "4.5.0",
  "author": {
  "name": "Muggle AI",
  "email": "support@muggle-ai.com"
@@ -2,7 +2,7 @@
  "name": "muggle",
  "displayName": "Muggle AI",
  "description": "Ship quality products with AI-powered end-to-end (E2E) acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
- "version": "4.4.0",
+ "version": "4.5.0",
  "author": {
  "name": "Muggle AI",
  "email": "support@muggle-ai.com"
package/plugin/README.md CHANGED
@@ -28,6 +28,7 @@ Type `muggle` to discover the full command family.
  | `/muggle:muggle-test` | Change-driven E2E acceptance router: detects code changes, maps to use cases, runs test generation locally or remotely, publishes to dashboard, opens in browser, posts E2E acceptance results to PR. |
  | `/muggle:muggle-test-feature-local` | Test a feature on localhost with AI-driven browser automation. Offers publish to cloud after each run. |
  | `/muggle:muggle-test-import` | Import existing tests into Muggle Test — from Playwright/Cypress specs, PRDs, Gherkin feature files, test plan docs, or any test artifact. |
+ | `/muggle:muggle-test-regenerate-missing` | Bulk-regenerate test scripts for every test case in a project that doesn't currently have an active script. Scans DRAFT + GENERATION_PENDING, confirms the list with the user, and dispatches remote generation workflows for each. |
  | `/muggle:muggle-status` | Health check for Electron browser test runner, MCP server, and authentication. |
  | `/muggle:muggle-repair` | Diagnose and fix broken installation automatically. |
  | `/muggle:muggle-upgrade` | Update Electron browser test runner and MCP server to latest version. |