@muggleai/works 4.3.0 → 4.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +31 -13
- package/dist/{chunk-23NOSJFH.js → chunk-MNCBJEPQ.js} +588 -36
- package/dist/cli.js +1 -1
- package/dist/index.js +1 -1
- package/dist/plugin/.claude-plugin/plugin.json +1 -1
- package/dist/plugin/.cursor-plugin/plugin.json +1 -1
- package/dist/plugin/README.md +1 -0
- package/dist/plugin/skills/do/open-prs.md +32 -65
- package/dist/plugin/skills/muggle/SKILL.md +15 -15
- package/dist/plugin/skills/muggle-pr-visual-walkthrough/SKILL.md +181 -0
- package/dist/plugin/skills/muggle-test/SKILL.md +97 -137
- package/dist/plugin/skills/muggle-test-feature-local/SKILL.md +94 -27
- package/dist/plugin/skills/muggle-test-import/SKILL.md +135 -40
- package/dist/plugin/skills/muggle-test-regenerate-missing/SKILL.md +196 -0
- package/dist/plugin/skills/muggle-test-regenerate-missing/evals/evals.json +58 -0
- package/dist/plugin/skills/muggle-test-regenerate-missing/evals/trigger-eval.json +22 -0
- package/dist/release-manifest.json +7 -0
- package/package.json +7 -7
- package/plugin/.claude-plugin/plugin.json +1 -1
- package/plugin/.cursor-plugin/plugin.json +1 -1
- package/plugin/README.md +1 -0
- package/plugin/skills/do/open-prs.md +32 -65
- package/plugin/skills/muggle/SKILL.md +15 -15
- package/plugin/skills/muggle-pr-visual-walkthrough/SKILL.md +181 -0
- package/plugin/skills/muggle-test/SKILL.md +97 -137
- package/plugin/skills/muggle-test-feature-local/SKILL.md +94 -27
- package/plugin/skills/muggle-test-import/SKILL.md +135 -40
- package/plugin/skills/muggle-test-regenerate-missing/SKILL.md +196 -0
- package/plugin/skills/muggle-test-regenerate-missing/evals/evals.json +58 -0
- package/plugin/skills/muggle-test-regenerate-missing/evals/trigger-eval.json +22 -0
|
@@ -114,11 +114,11 @@ Found 3 use cases with 8 test cases:
|
|
|
114
114
|
✦ [HIGH] Checkout fails with invalid payment info
|
|
115
115
|
```
|
|
116
116
|
|
|
117
|
-
|
|
118
|
-
- "
|
|
119
|
-
- "
|
|
117
|
+
Use `AskQuestion` to confirm:
|
|
118
|
+
- "Looks good — proceed with import"
|
|
119
|
+
- "I want to make changes first"
|
|
120
120
|
|
|
121
|
-
|
|
121
|
+
If the user wants changes, incorporate feedback, then ask again. Only proceed after explicit approval.
|
|
122
122
|
|
|
123
123
|
> For Path A (native PRD upload): present the use case/test case list that Muggle extracted
|
|
124
124
|
> after the processing workflow completes, and ask the user to confirm before adding any
|
|
@@ -142,29 +142,28 @@ If not authenticated:
|
|
|
142
142
|
|
|
143
143
|
## Step 5 — Pick or create a project
|
|
144
144
|
|
|
145
|
-
|
|
145
|
+
A **project** is where all your imported use cases, test cases, and future test results are grouped on the Muggle AI dashboard.
|
|
146
146
|
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
1. Acme Web App
|
|
150
|
-
2. Admin Portal
|
|
151
|
-
3. Mobile API
|
|
147
|
+
1. Call `muggle-remote-project-list`
|
|
148
|
+
2. Use `AskQuestion` to present all projects as clickable options. Include the project URL in each label. Always include a "Create new project" option at the end.
|
|
152
149
|
|
|
153
|
-
|
|
154
|
-
```
|
|
150
|
+
Prompt: "Pick the project to import into:"
|
|
155
151
|
|
|
156
|
-
**If creating a new project**, propose values based on what you learned from the source files:
|
|
157
|
-
- **Name**: infer the app name from filenames, URLs, or document headings (e.g., "Acme App")
|
|
158
|
-
- **Description**: "Imported from [filename(s)] — [date]"
|
|
159
|
-
- **URL**: the base URL of the app under test
|
|
152
|
+
3. **If creating a new project**, propose values based on what you learned from the source files:
|
|
153
|
+
- **Name**: infer the app name from filenames, URLs, or document headings (e.g., "Acme App")
|
|
154
|
+
- **Description**: "Imported from [filename(s)] — [date]"
|
|
155
|
+
- **URL**: the base URL of the app under test
|
|
160
156
|
|
|
161
|
-
Show the proposal and confirm before calling `muggle-remote-project-create`.
|
|
157
|
+
Show the proposal and confirm before calling `muggle-remote-project-create`.
|
|
162
158
|
|
|
163
159
|
---
|
|
164
160
|
|
|
165
161
|
## Step 6 — Import
|
|
166
162
|
|
|
167
|
-
Import in two passes. Show progress to the user as you go.
|
|
163
|
+
Import in two passes using bulk-preview. Show progress to the user as you go.
|
|
164
|
+
|
|
165
|
+
Both passes use Muggle's async bulk-preview MCP tools, which route prompts through OpenAI's
|
|
166
|
+
Batch API for roughly ~50% of normal LLM cost. The flow is always: **submit → poll → persist**.
|
|
168
167
|
|
|
169
168
|
### Path A — Native PRD upload (for document files)
|
|
170
169
|
|
|
@@ -203,40 +202,135 @@ If the upload or processing fails, fall back to Path B manual extraction.
|
|
|
203
202
|
|
|
204
203
|
Run both passes below for Playwright, Cypress, or other test scripts.
|
|
205
204
|
|
|
206
|
-
###
|
|
205
|
+
### Shared limits (both passes)
|
|
206
|
+
|
|
207
|
+
- Maximum 100 prompts per submit call. If you have more, split into batches of 100 and submit sequentially.
|
|
208
|
+
- Maximum 4000 characters per `instruction`.
|
|
209
|
+
- Maximum 3 in-flight bulk-preview jobs per project (the submit tool will error if exceeded).
|
|
207
210
|
|
|
208
|
-
|
|
211
|
+
### Shared error handling (both passes)
|
|
209
212
|
|
|
213
|
+
The bulk-preview submit and get/cancel MCP tools surface structured error codes — look for
|
|
214
|
+
these on any tool result and act accordingly:
|
|
215
|
+
|
|
216
|
+
| Error code / symptom | What happened | What to do |
|
|
217
|
+
|---|---|---|
|
|
218
|
+
| `TOO_MANY_IN_FLIGHT_JOBS` (HTTP 429) | Already 3 in-flight jobs for this project | Tell the user: "There are already 3 bulk-preview jobs in progress for this project. Wait for them to finish, then retry." Stop. |
|
|
219
|
+
| `QUOTA_EXCEEDED_PREFLIGHT` (HTTP 409) | Batch would blow past the account's quota for this resource | Show: "Your quota allows at most `<maxPromptsAllowed>` prompts in this batch (current headroom: `<headroom>`). Please reduce the batch and try again." Stop. |
|
|
220
|
+
| `NOT_FOUND` on submit (HTTP 404) | Project or parent use case does not exist, or this server version doesn't expose bulk-preview yet | Tell the user which — double-check the IDs you passed. If you're confident the IDs are right, ask the user to make sure the prompt-service is up to date. Stop. |
|
|
221
|
+
| `VALIDATION_ERROR` (HTTP 400) | A prompt exceeds limits (e.g. >4000 chars) or the prompt list is empty | Fix the offending prompts and retry. |
|
|
222
|
+
| Payload > 1 MB (HTTP 413) | Body too large | Split into smaller batches. |
|
|
223
|
+
|
|
224
|
+
### Shared polling loop
|
|
225
|
+
|
|
226
|
+
After a successful submit, poll with `muggle-remote-bulk-preview-job-get` (inputs: `projectId`,
|
|
227
|
+
`jobId`) every 15 seconds. Show progress like:
|
|
210
228
|
```
|
|
211
|
-
|
|
212
|
-
prompts: [
|
|
213
|
-
{ instruction: "<Use case name> — <one-sentence description of what this use case covers>" },
|
|
214
|
-
...
|
|
215
|
-
]
|
|
229
|
+
Generating previews for "User Authentication"... (status: running, elapsed: 30s)
|
|
216
230
|
```
|
|
217
231
|
|
|
218
|
-
|
|
219
|
-
If IDs are not in the response, call `muggle-remote-use-case-list` and match by name.
|
|
220
|
-
|
|
221
|
-
### Pass 2 — Create test cases
|
|
232
|
+
`status` values and what to do:
|
|
222
233
|
|
|
223
|
-
|
|
234
|
+
| Status | Terminal? | Action |
|
|
235
|
+
|---|---|---|
|
|
236
|
+
| `queued` | No | Keep polling |
|
|
237
|
+
| `submitted` | No | Keep polling |
|
|
238
|
+
| `running` | No | Keep polling |
|
|
239
|
+
| `succeeded` | Yes | All prompts processed — proceed to persist results |
|
|
240
|
+
| `partial` | Yes | Some prompts succeeded — show summary, ask user whether to proceed |
|
|
241
|
+
| `failed` | Yes | Job failed entirely — show `error.message` and stop |
|
|
242
|
+
| `cancelled` | Yes | Job was cancelled — stop |
|
|
243
|
+
| `expired` | Yes | Job expired before completing — tell user to retry |
|
|
224
244
|
|
|
245
|
+
**If status is `partial`**, show:
|
|
225
246
|
```
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
precondition: "A user account exists and is not locked"
|
|
233
|
-
priority: "HIGH"
|
|
234
|
-
url: "https://app.example.com/login"
|
|
247
|
+
Preview completed with partial results: <N> of <promptCount> generated successfully.
|
|
248
|
+
|
|
249
|
+
Failed items:
|
|
250
|
+
- [<clientRef>] "<source text>": <error message>
|
|
251
|
+
|
|
252
|
+
Proceed with the <N> successful items, or cancel to review?
|
|
235
253
|
```
|
|
254
|
+
Use `AskQuestion` with options "Proceed with successful items" / "Cancel import". Only continue
|
|
255
|
+
if the user chooses to proceed.
|
|
256
|
+
|
|
257
|
+
If you need to abort an in-flight job, call `muggle-remote-bulk-preview-job-cancel` — the
|
|
258
|
+
server picks up the request cooperatively within one harvester tick.
|
|
259
|
+
|
|
260
|
+
### Pass 1 — Create use cases (Path B only)
|
|
261
|
+
|
|
262
|
+
1. Call `muggle-remote-use-case-bulk-preview-submit` with one prompt per use case:
|
|
263
|
+
```
|
|
264
|
+
projectId: <chosen project ID>
|
|
265
|
+
prompts: [
|
|
266
|
+
{ clientRef: "uc-0", instruction: "<Use case name> — <one-sentence description>" },
|
|
267
|
+
...
|
|
268
|
+
]
|
|
269
|
+
```
|
|
270
|
+
The call returns `{ jobId, status, kind, promptCount }`.
|
|
271
|
+
|
|
272
|
+
2. Run the **Shared polling loop** above until the job reaches a terminal status.
|
|
273
|
+
|
|
274
|
+
3. For each successful result (shape: `{ clientRef, index, status: "success", useCase: IUseCaseCreationRequest }`),
|
|
275
|
+
call `muggle-remote-use-case-create` to persist it — no LLM is invoked, so this is fast and free:
|
|
276
|
+
```
|
|
277
|
+
projectId: <project ID>
|
|
278
|
+
title: <from useCase.title>
|
|
279
|
+
description: <from useCase.description>
|
|
280
|
+
userStory: <from useCase.userStory>
|
|
281
|
+
url: <from useCase.url> # optional
|
|
282
|
+
useCaseBreakdown: <from useCase.useCaseBreakdown>
|
|
283
|
+
status: <from useCase.status> # e.g. DRAFT
|
|
284
|
+
priority: <from useCase.priority> # e.g. MEDIUM
|
|
285
|
+
source: <from useCase.source> # e.g. PROMPT
|
|
286
|
+
category: <from useCase.category> # optional
|
|
287
|
+
```
|
|
288
|
+
|
|
289
|
+
4. Collect the returned `useCaseId` of each created use case — you'll need it for Pass 2.
|
|
290
|
+
It is safe to persist use cases in parallel once the job is terminal.
|
|
291
|
+
|
|
292
|
+
### Pass 2 — Generate and create test cases
|
|
293
|
+
|
|
294
|
+
For each use case, run a bulk-preview job to generate its test cases.
|
|
295
|
+
|
|
296
|
+
1. Call `muggle-remote-test-case-bulk-preview-submit`:
|
|
297
|
+
```
|
|
298
|
+
projectId: <project ID>
|
|
299
|
+
useCaseId: <use case ID>
|
|
300
|
+
prompts: [
|
|
301
|
+
{
|
|
302
|
+
clientRef: "tc-0",
|
|
303
|
+
instruction: "<title> | goal: <goal> | expectedResult: <expectedResult> | precondition: <precondition> | priority: <HIGH|MEDIUM|LOW> | url: <url>"
|
|
304
|
+
},
|
|
305
|
+
...
|
|
306
|
+
]
|
|
307
|
+
```
|
|
308
|
+
|
|
309
|
+
2. Run the **Shared polling loop** above until the job reaches a terminal status.
|
|
310
|
+
|
|
311
|
+
3. Each successful result has this shape (note the fan-out):
|
|
312
|
+
```jsonc
|
|
313
|
+
{ "clientRef": "tc-0", "index": 0, "status": "success", "testCases": [ /* ITestCaseCreationRequest[] — fan-out 1–5 */ ] }
|
|
314
|
+
```
|
|
315
|
+
One input prompt may produce 1–5 test case items. For each item in `result.testCases`, call
|
|
316
|
+
`muggle-remote-test-case-create`:
|
|
317
|
+
```
|
|
318
|
+
projectId: <project ID>
|
|
319
|
+
useCaseId: <use case ID>
|
|
320
|
+
title: <from testCase.title>
|
|
321
|
+
description: <from testCase.description>
|
|
322
|
+
goal: <from testCase.goal>
|
|
323
|
+
expectedResult: <from testCase.expectedResult>
|
|
324
|
+
precondition: <from testCase.precondition>
|
|
325
|
+
priority: <from testCase.priority>
|
|
326
|
+
url: <from testCase.url>
|
|
327
|
+
```
|
|
236
328
|
|
|
237
329
|
Print progress: `Creating test cases for "User Authentication"... (1/3)`
|
|
238
330
|
|
|
239
|
-
It is safe to create test cases for different use cases in parallel
|
|
331
|
+
It is safe to create test cases for different use cases in parallel once their bulk-preview
|
|
332
|
+
jobs have reached a terminal status. However, **submit** bulk-preview jobs sequentially to
|
|
333
|
+
avoid exceeding the 3 in-flight job cap per project.
|
|
240
334
|
|
|
241
335
|
---
|
|
242
336
|
|
|
@@ -247,6 +341,7 @@ When all imports are done, print a clean summary. Include:
|
|
|
247
341
|
- Total use cases and test cases created
|
|
248
342
|
- A line per use case with its test case count and a link to view it
|
|
249
343
|
- A link to the project overview
|
|
344
|
+
- If any items failed during preview (partial status), list them so the user can retry
|
|
250
345
|
|
|
251
346
|
Construct view URLs using the Muggle dashboard URL pattern:
|
|
252
347
|
- Project test cases: `https://www.muggle-ai.com/muggleTestV0/dashboard/projects/<projectId>/testcases`
|
|
@@ -0,0 +1,196 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: muggle-test-regenerate-missing
|
|
3
|
+
description: "Bulk-regenerate test scripts for every test case in a Muggle AI project that doesn't currently have an active script. Scans the project, finds test cases stuck in DRAFT or GENERATION_PENDING (no usable script attached), shows the user the list, and on approval kicks off remote test script generation for each one in parallel via the Muggle cloud. Use this skill whenever the user asks to 'regenerate missing scripts', 'fill in missing test scripts', 'generate scripts for test cases without one', 'regen all the test cases that don't have scripts', 'rebuild scripts for stale test cases', 'fix test cases with no script', 'bulk regenerate', or any phrasing that means 'kick off script generation across a project for the cases that need it'. Triggers on: 'regenerate missing test scripts', 'generate scripts for all empty test cases', 'fill the gaps in my test scripts', 'bulk test script regen', 'all my test cases without active scripts'. This is the go-to skill for project-wide script catch-up — it handles discovery, filtering, confirmation, and remote workflow dispatch end-to-end."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Muggle Test — Regenerate Missing Test Scripts
|
|
7
|
+
|
|
8
|
+
A bulk maintenance skill for Muggle AI projects. It finds every test case in a project that does **not** currently have an active (ready-to-run) test script, shows the list to the user, and on approval triggers a remote test script generation workflow for each one. Useful after creating a batch of new test cases or when cleaning up a project that has drifted.
|
|
9
|
+
|
|
10
|
+
Execution is **remote only** — Muggle's cloud generates the scripts in parallel against the project URL. The user's machine is not involved beyond making API calls.
|
|
11
|
+
|
|
12
|
+
## Concept: what counts as "no active script"
|
|
13
|
+
|
|
14
|
+
In the Muggle data model, a test case carries a status that reflects whether it has a usable script attached:
|
|
15
|
+
|
|
16
|
+
| Status | Meaning | Regenerate? |
|
|
17
|
+
|:-------|:--------|:-----------:|
|
|
18
|
+
| `ACTIVE` | Has a generated, ready-to-run script | No — already good |
|
|
19
|
+
| `DRAFT` | Created but never generated | **Yes** |
|
|
20
|
+
| `GENERATION_PENDING` | Queued but generation never started | **Yes** |
|
|
21
|
+
| `GENERATING` | Currently generating | No — generation is in flight; don't double-dispatch |
|
|
22
|
+
| `REPLAY_PENDING` / `REPLAYING` | A replay is in flight; script exists | No — busy with replay |
|
|
23
|
+
| `DEPRECATED` | Marked stale on purpose | No — user decision |
|
|
24
|
+
| `ARCHIVED` | Hidden from normal flows | No — user decision |
|
|
25
|
+
|
|
26
|
+
The skill targets exactly **DRAFT** and **GENERATION_PENDING** by default. These are the two states that mean "no usable script attached and nothing running to produce one." `GENERATING` is deliberately excluded — a workflow is already in progress, and firing a second one races against the first and wastes budget.
|
|
27
|
+
|
|
28
|
+
Treat this filter as a default, not a law. If the user explicitly says "include generating ones too, they're stuck" or "include deprecated", respect the override — but don't widen the filter on your own.
|
|
29
|
+
|
|
30
|
+
## UX Guidelines — Minimize Typing
|
|
31
|
+
|
|
32
|
+
**Every selection-based question MUST use the `AskQuestion` tool** (or the platform's equivalent structured selection tool). Never ask the user to "reply with a number" — always present clickable options.
|
|
33
|
+
|
|
34
|
+
- **Selections** (project, which test cases to include): Use `AskQuestion`, with `allow_multiple: true` for the test case picker.
|
|
35
|
+
- **Free-text inputs** (project URL when creating, override filters): Only ask as plain text when the option set isn't finite.
|
|
36
|
+
- **Batch related questions** when independent. Don't ask sequentially what could be one screen.
|
|
37
|
+
|
|
38
|
+
## Workflow
|
|
39
|
+
|
|
40
|
+
### Step 1 — Authenticate
|
|
41
|
+
|
|
42
|
+
1. Call `muggle-remote-auth-status`.
|
|
43
|
+
2. If not authenticated or expired → call `muggle-remote-auth-login`, then poll with `muggle-remote-auth-poll`.
|
|
44
|
+
3. Do not skip auth and do not assume a stale token still works.
|
|
45
|
+
|
|
46
|
+
If auth keeps failing, suggest the user run `muggle logout && muggle login` from a terminal.
|
|
47
|
+
|
|
48
|
+
### Step 2 — Select Project (user must choose)
|
|
49
|
+
|
|
50
|
+
A **project** is the unit on the Muggle AI dashboard that groups test cases, scripts, and runs. The user must pick the one to scan — never auto-select from repo name, branch, or URL heuristics.
|
|
51
|
+
|
|
52
|
+
1. Call `muggle-remote-project-list`.
|
|
53
|
+
2. Use `AskQuestion` to present projects as clickable options. Include the project URL in each label so the user can disambiguate. Always include "Create new project" as the last option.
|
|
54
|
+
3. Wait for explicit selection.
|
|
55
|
+
4. If the user picks "Create new project": collect `projectName`, `description`, and `url`, then call `muggle-remote-project-create`.
|
|
56
|
+
|
|
57
|
+
Store `projectId` and `projectUrl` only after the user confirms — both are needed downstream.
|
|
58
|
+
|
|
59
|
+
### Step 3 — Scan Test Cases
|
|
60
|
+
|
|
61
|
+
Pull the full set of test cases for the project. The list endpoint is paginated.
|
|
62
|
+
|
|
63
|
+
1. Call `muggle-remote-test-case-list` with `projectId`. Start at page 1; continue requesting pages until the response indicates no more items. Use a generous `pageSize` (e.g. 100) to keep the call count low.
|
|
64
|
+
2. Accumulate everything into a single in-memory array.
|
|
65
|
+
3. Tell the user the totals as you go if the project is large (e.g., "Found 247 test cases across the project").
|
|
66
|
+
|
|
67
|
+
If the call returns zero test cases, stop and tell the user — there is nothing to regenerate. Suggest they create test cases first (point them at `muggle-test` or `muggle-test-feature-local`).
|
|
68
|
+
|
|
69
|
+
### Step 4 — Filter to Missing Scripts
|
|
70
|
+
|
|
71
|
+
Apply the default filter: keep only test cases where `status` ∈ `{DRAFT, GENERATION_PENDING}`.
|
|
72
|
+
|
|
73
|
+
Then show a one-line summary. Also surface how many cases are currently `GENERATING` so the user knows about in-flight work (but don't include them in the candidate list unless they override the filter):
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
Project: <name> (<projectId>)
|
|
77
|
+
Total test cases: 247
|
|
78
|
+
With active script (skipped): 198
|
|
79
|
+
Currently generating (skipped, in-flight): 32
|
|
80
|
+
Needs regeneration: 17
|
|
81
|
+
• DRAFT: 12
|
|
82
|
+
• GENERATION_PENDING: 5
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
If after filtering the list is empty, congratulate the user — every test case already has an active script — and stop. Do not invent work.
|
|
86
|
+
|
|
87
|
+
### Step 5 — Present and Confirm Selection
|
|
88
|
+
|
|
89
|
+
Use `AskQuestion` with `allow_multiple: true` to present every candidate test case as a clickable option. The user must explicitly approve which ones to regenerate.
|
|
90
|
+
|
|
91
|
+
For each option label, show enough context for the user to make a real decision:
|
|
92
|
+
|
|
93
|
+
```
|
|
94
|
+
[<status>] <title> — use case: <use case title>
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
For example:
|
|
98
|
+
- `[DRAFT] Sign up with valid email — use case: User Registration`
|
|
99
|
+
- `[GENERATION_PENDING] Add item to cart — use case: Checkout Flow`
|
|
100
|
+
|
|
101
|
+
Default behavior:
|
|
102
|
+
- If there are **≤ 25** candidates, present all of them in a single `AskQuestion` with everything pre-checked and let the user deselect anything they want to skip.
|
|
103
|
+
- If there are **> 25** candidates, show the first 25 ranked by status priority (`DRAFT` → `GENERATION_PENDING`), plus a tail option **"Include all N — don't make me click each one"**. The user can also pick "Show next batch" to see more.
|
|
104
|
+
|
|
105
|
+
After selection, call `AskQuestion` once more for a final confirmation:
|
|
106
|
+
|
|
107
|
+
> "About to start remote test script generation for **N** test cases against `<projectUrl>`. This will consume Muggle workflow budget. Proceed?"
|
|
108
|
+
>
|
|
109
|
+
> - "Yes, start all N"
|
|
110
|
+
> - "No, cancel"
|
|
111
|
+
|
|
112
|
+
Only proceed after the user picks "Yes".
|
|
113
|
+
|
|
114
|
+
### Step 6 — Dispatch Remote Generations
|
|
115
|
+
|
|
116
|
+
For each selected test case, in order:
|
|
117
|
+
|
|
118
|
+
1. Call `muggle-remote-test-case-get` with `testCaseId` to fetch the full record (the list endpoint returns a slim shape; generation needs `goal`, `precondition`, `instructions`, `expectedResult`, `url`).
|
|
119
|
+
2. Call `muggle-remote-workflow-start-test-script-generation` with:
|
|
120
|
+
- `projectId` — from Step 2
|
|
121
|
+
- `useCaseId` — from the test case
|
|
122
|
+
- `testCaseId` — the test case being regenerated
|
|
123
|
+
- `name` — `"muggle-test-regenerate-missing: {test case title}"` (so it's easy to find this batch later in the dashboard)
|
|
124
|
+
- `url` — prefer the test case's own `url` if set, else the project URL from Step 2
|
|
125
|
+
- `goal`, `precondition`, `instructions`, `expectedResult` — straight from the test case. If `precondition` is empty, pass `"None"` (the schema requires a non-empty string).
|
|
126
|
+
3. Capture the returned workflow runtime ID and store it alongside the test case.
|
|
127
|
+
|
|
128
|
+
**Failure handling:** if a single dispatch fails (validation error, server error, missing field), log it inline, mark the test case as `dispatch_failed`, and continue to the next one. Do not abort the whole batch — partial progress is more useful than nothing.
|
|
129
|
+
|
|
130
|
+
**Pacing:** Muggle's cloud handles parallelism on its side, so you don't need to throttle. Just dispatch sequentially as fast as the API will accept them.
|
|
131
|
+
|
|
132
|
+
### Step 7 — Report
|
|
133
|
+
|
|
134
|
+
After all dispatches are done, print a summary table:
|
|
135
|
+
|
|
136
|
+
```
|
|
137
|
+
Test Case Use Case Prev Status Dispatch Runtime
|
|
138
|
+
───────────────────────────────────────────────────────────────────────────────────────────────────
|
|
139
|
+
Sign up with valid email User Registration DRAFT ✅ started rt-abc123
|
|
140
|
+
Sign up with disposable email User Registration DRAFT ✅ started rt-def456
|
|
141
|
+
Add item to cart Checkout Flow GENERATION_PEND. ✅ started rt-ghi789
|
|
142
|
+
Apply expired coupon Checkout Flow GENERATION_PEND. ❌ failed —
|
|
143
|
+
───────────────────────────────────────────────────────────────────────────────────────────────────
|
|
144
|
+
Total: 17 dispatched | 16 started | 1 failed
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
For failures: include a one-line error excerpt and (where possible) a hint at the cause (e.g., "missing instructions field — edit the test case in the dashboard, then re-run this skill").
|
|
148
|
+
|
|
149
|
+
### Step 8 — Open the Dashboard
|
|
150
|
+
|
|
151
|
+
Open the Muggle AI dashboard so the user can watch progress visually:
|
|
152
|
+
|
|
153
|
+
```bash
|
|
154
|
+
open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs"
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
Tell them:
|
|
158
|
+
|
|
159
|
+
> "I've opened the project's runs page. Generation jobs typically take a few minutes each — they'll appear here as they progress. Your test cases will move into `ACTIVE` status as scripts complete."
|
|
160
|
+
|
|
161
|
+
### Step 9 (optional) — Poll Status
|
|
162
|
+
|
|
163
|
+
Only if the user explicitly asks for a status check, call `muggle-remote-wf-get-ts-gen-latest-run` for each runtime ID and report progress. **Do not poll on a tight loop by default** — the dispatch step is the actual goal of this skill, and the dashboard already shows live status better than a CLI loop ever can. Polling exists as a courtesy for users who don't want to leave the terminal.
|
|
164
|
+
|
|
165
|
+
If the user wants a one-shot snapshot, present a small table:
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
Test Case Runtime Status Steps so far
|
|
169
|
+
────────────────────────────────────────────────────────────
|
|
170
|
+
Sign up with valid email rt-abc123 RUNNING 6
|
|
171
|
+
Add item to cart rt-ghi789 COMPLETED 12
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
## Tool Reference
|
|
175
|
+
|
|
176
|
+
| Phase | Tool |
|
|
177
|
+
|:------|:-----|
|
|
178
|
+
| Auth | `muggle-remote-auth-status`, `muggle-remote-auth-login`, `muggle-remote-auth-poll` |
|
|
179
|
+
| Project | `muggle-remote-project-list`, `muggle-remote-project-create` |
|
|
180
|
+
| Scan | `muggle-remote-test-case-list` (paginated) |
|
|
181
|
+
| Detail | `muggle-remote-test-case-get` |
|
|
182
|
+
| Dispatch | `muggle-remote-workflow-start-test-script-generation` |
|
|
183
|
+
| Status (optional) | `muggle-remote-wf-get-ts-gen-latest-run`, `muggle-remote-wf-get-latest-ts-gen-by-tc` |
|
|
184
|
+
| Browser | `open` (shell command) |
|
|
185
|
+
|
|
186
|
+
## Non-negotiables
|
|
187
|
+
|
|
188
|
+
- **The user MUST select the project** — present projects via `AskQuestion`, never infer from cwd, repo name, or URL guesses.
|
|
189
|
+
- **The user MUST approve which test cases to regenerate** — show the candidates via `AskQuestion`, let them deselect, then confirm again before any dispatch. Bulk-regenerating without approval can waste meaningful workflow budget.
|
|
190
|
+
- **Default filter is `DRAFT` + `GENERATION_PENDING`** — never include `GENERATING`, `ACTIVE`, `DEPRECATED`, `ARCHIVED`, `REPLAYING`, or `REPLAY_PENDING` unless the user explicitly says so. `GENERATING` already has a workflow in flight and dispatching another races against it. `ACTIVE` test cases already have working scripts. The rest reflect deliberate user decisions or in-flight replays the skill should not interfere with.
|
|
191
|
+
- **Use `muggle-remote-test-case-get` before each dispatch** — the list endpoint returns a slim shape and generation needs the full payload.
|
|
192
|
+
- **Failures don't abort the batch** — log and continue, then surface them at the end. Partial progress beats no progress.
|
|
193
|
+
- **Never throttle artificially** — dispatch sequentially as fast as the API accepts. Muggle's cloud handles parallelism.
|
|
194
|
+
- **Open the dashboard, don't poll by default** — the runs page is the canonical view of progress. Only poll if the user explicitly asks.
|
|
195
|
+
- **Use `AskQuestion` for every selection** — never ask the user to type a number.
|
|
196
|
+
- **Can be invoked at any state** — if the user already has a project chosen in conversation context, skip Step 2 and go straight to scanning.
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
{
|
|
2
|
+
"skill_name": "muggle-test-regenerate-missing",
|
|
3
|
+
"notes": "These evals test the PLAN behavior. Because Muggle MCP tools require live auth and a real project, the subagents can't actually execute the workflow — each prompt tells them to write a step-by-step plan instead. The assertions check whether the plan reflects the skill's non-negotiables (correct filter, AskQuestion-based selection, no auto-select, dispatch via the remote workflow tool, batch-failure tolerance, dashboard open-at-end).",
|
|
4
|
+
"evals": [
|
|
5
|
+
{
|
|
6
|
+
"id": 0,
|
|
7
|
+
"eval_name": "vague-catch-up-request",
|
|
8
|
+
"prompt": "I just finished importing a bunch of test cases from our PRD into our Muggle project 'Acme Checkout QA' and half of them never got actual scripts generated — they're still sitting there as drafts. Can you kick off script generation for the ones that need it?",
|
|
9
|
+
"files": [],
|
|
10
|
+
"assertions": [
|
|
11
|
+
{ "name": "calls_auth_status_first", "text": "Plan starts by calling muggle-remote-auth-status (and login/poll if needed) before any other Muggle tool." },
|
|
12
|
+
{ "name": "project_selection_via_AskQuestion", "text": "Plan calls muggle-remote-project-list and presents projects via AskQuestion rather than auto-selecting 'Acme Checkout QA' by name match." },
|
|
13
|
+
{ "name": "default_filter_is_draft_and_generation_pending", "text": "Plan states that the status filter is DRAFT + GENERATION_PENDING and explicitly excludes GENERATING, ACTIVE, DEPRECATED, ARCHIVED, REPLAYING, and REPLAY_PENDING." },
|
|
14
|
+
+{ "name": "uses_test_case_list_paginated", "text": "Plan calls muggle-remote-test-case-list with pagination (or equivalent full-enumeration) for the project scan step." },
+{ "name": "test_case_get_before_dispatch", "text": "Plan calls muggle-remote-test-case-get for each candidate before dispatching generation, to obtain the full payload (goal, precondition, instructions, expectedResult, url)." },
+{ "name": "dispatch_via_remote_workflow_tool", "text": "Plan dispatches each regeneration via muggle-remote-workflow-start-test-script-generation (not a local Electron tool)." },
+{ "name": "candidate_list_via_AskQuestion_multi_select", "text": "Plan presents the candidate test cases via AskQuestion with multi-select (allow_multiple: true) so the user can deselect individuals." },
+{ "name": "final_confirmation_step", "text": "Plan includes a final yes/no confirmation AskQuestion before any dispatch, showing the count of test cases about to be regenerated." },
+{ "name": "batch_failure_tolerance", "text": "Plan explicitly states that a single dispatch failure does not abort the batch — failures are logged and the loop continues." },
+{ "name": "opens_dashboard_runs_page_at_end", "text": "Plan ends by opening the Muggle dashboard project runs page for the user to watch progress." }
+]
+},
+{
+"id": 1,
+"eval_name": "explicit-filter-override",
+"prompt": "Our project 'Tanka Staging' has about 200 test cases. I want to regenerate scripts for everything that's not active — including the ones that are stuck in GENERATING, because those have been sitting there for two days and I'm pretty sure they're dead. Please walk me through what you'd do.",
+"files": [],
+"assertions": [
+{ "name": "calls_auth_status_first", "text": "Plan starts by calling muggle-remote-auth-status before any other Muggle tool." },
+{ "name": "project_selection_via_AskQuestion", "text": "Plan uses AskQuestion to let the user confirm project choice; does not auto-select 'Tanka Staging' by name match alone." },
+{ "name": "honors_override_to_include_generating", "text": "Plan explicitly honors the user's override and includes GENERATING in the filter for this run, noting that this is a deliberate override of the default filter." },
+{ "name": "still_excludes_active_deprecated_archived", "text": "Even with the override, the plan still excludes ACTIVE, DEPRECATED, ARCHIVED, REPLAY_PENDING, and REPLAYING — it only widens the filter to include GENERATING, not everything." },
+{ "name": "handles_large_candidate_list", "text": "Plan acknowledges that with ~200 test cases the candidate list may exceed 25 and describes paging or bulk-include handling (e.g., 'Include all N' tail option)." },
+{ "name": "dispatch_via_remote_workflow_tool", "text": "Plan dispatches each regeneration via muggle-remote-workflow-start-test-script-generation." },
+{ "name": "final_confirmation_step", "text": "Plan includes a final yes/no AskQuestion confirmation before any dispatch, showing the total count." },
+{ "name": "batch_failure_tolerance", "text": "Plan states that a single dispatch failure does not abort the batch." },
+{ "name": "opens_dashboard_runs_page_at_end", "text": "Plan ends by opening the Muggle dashboard project runs page." }
+]
+},
+{
+"id": 2,
+"eval_name": "small-project-confirmation",
+"prompt": "can you bulk regen my muggle test cases that don't have scripts? The project is tiny — I think there are only like 6 or 7 test cases in DRAFT.",
+"files": [],
+"assertions": [
+{ "name": "calls_auth_status_first", "text": "Plan starts by calling muggle-remote-auth-status before any other Muggle tool." },
+{ "name": "project_selection_via_AskQuestion", "text": "Plan uses AskQuestion to let the user pick the project — does not infer it." },
+{ "name": "default_filter_is_draft_and_generation_pending", "text": "Plan states that the status filter is DRAFT + GENERATION_PENDING by default." },
+{ "name": "small_list_all_preselected", "text": "Plan describes presenting all candidates in a single AskQuestion with everything pre-checked (since the candidate list is ≤ 25)." },
+{ "name": "test_case_get_before_dispatch", "text": "Plan calls muggle-remote-test-case-get for each candidate before dispatching generation." },
+{ "name": "dispatch_via_remote_workflow_tool", "text": "Plan dispatches each regeneration via muggle-remote-workflow-start-test-script-generation." },
+{ "name": "final_confirmation_step", "text": "Plan includes a final yes/no AskQuestion confirmation before any dispatch." },
+{ "name": "batch_failure_tolerance", "text": "Plan states that a single dispatch failure does not abort the batch." },
+{ "name": "opens_dashboard_runs_page_at_end", "text": "Plan ends by opening the Muggle dashboard project runs page." }
+]
+}
+]
+}
@@ -0,0 +1,22 @@
+[
+{ "query": "ok so we just imported about 40 test cases from the PRD into our muggle project and most of them are sitting in DRAFT with no scripts attached — can you kick off script generation for all the ones that don't have an active script yet? project is 'Acme Checkout QA'", "should_trigger": true },
+{ "query": "bulk regen missing test scripts in my muggle project", "should_trigger": true },
+{ "query": "I looked at the test cases page in muggle and like 30 out of 80 are still showing as DRAFT. is there a way to batch-regenerate all the ones without an active script instead of clicking each one in the UI?", "should_trigger": true },
+{ "query": "can you fill in the gaps in my muggle project — i want every test case that doesn't have a working script to have one. the project is called 'internal-admin-e2e'", "should_trigger": true },
+{ "query": "all my muggle test cases without active scripts need a script now please", "should_trigger": true },
+{ "query": "regenerate every test case in 'tanka staging' that doesn't currently have an active test script attached. default filter is fine, just don't touch the ACTIVE ones.", "should_trigger": true },
+{ "query": "I have a ton of generation_pending and draft test cases in muggle — they've been stuck for days. kick off generation for all of them at once.", "should_trigger": true },
+{ "query": "rebuild the missing muggle scripts for my project, the inactive test cases need to be regenerated in one batch", "should_trigger": true },
+{ "query": "kick off muggle script generation for every test case that's still in DRAFT state — i don't want to go through them one by one in the dashboard", "should_trigger": true },
+{ "query": "bulk fill the empty test scripts across my whole project so they all become active", "should_trigger": true },
+{ "query": "I just pushed a new commit on branch users/stan/checkout-refactor — can you run the muggle e2e acceptance tests against my localhost dev server (it's on port 3000) and make sure nothing is broken before I open the PR?", "should_trigger": false },
+{ "query": "write me a playwright test for the checkout flow that covers the happy path — user adds item, enters address, pays with test card", "should_trigger": false },
+{ "query": "can you run a single muggle test against my localhost for the signup flow? I want to watch it in the visible Electron window, the dev server is on http://localhost:5173", "should_trigger": false },
+{ "query": "my muggle-ai-ui CI is failing — the test script for 'login with valid creds' keeps timing out. can you look at the action script, figure out why, and fix it?", "should_trigger": false },
+{ "query": "I have a cypress test suite in cypress/integration/** and a few markdown PRDs in docs/requirements. import them into muggle as use cases and test cases under my project 'Acme Checkout QA'", "should_trigger": false },
+{ "query": "what muggle projects do I have right now? just list them for me with their URLs", "should_trigger": false },
+{ "query": "create a new muggle test case for 'apply expired coupon shows error banner' under the Checkout Flow use case in project 'Acme Checkout QA'", "should_trigger": false },
+{ "query": "regen my package-lock.json and reinstall node_modules — something is off with the muggle-ai-works workspace after the dependabot merge", "should_trigger": false },
+{ "query": "check the status of my current muggle test script generation — i kicked off about 15 runs earlier and i want to know which ones finished and which are still running", "should_trigger": false },
+{ "query": "generate a brand new use case and some test cases from scratch for my muggle project using a plain-english description of our new onboarding flow", "should_trigger": false }
+]
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
 "name": "@muggleai/works",
 "mcpName": "io.github.multiplex-ai/muggle",
-"version": "4.3.0",
+"version": "4.5.0",
 "description": "Ship quality products with AI-powered E2E acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
 "type": "module",
 "main": "dist/index.js",
@@ -16,7 +16,7 @@
 ],
 "scripts": {
 "clean": "rimraf dist",
-"build": "tsup && node scripts/sync-versions.mjs && node scripts/build-plugin.mjs",
+"build": "tsup && node scripts/write-release-manifest.mjs && node scripts/sync-versions.mjs && node scripts/build-plugin.mjs",
 "build:plugin": "node scripts/build-plugin.mjs",
 "sync:versions": "node scripts/sync-versions.mjs",
 "build:release": "npm run build",
@@ -41,14 +41,14 @@
 "test:watch": "vitest"
 },
 "muggleConfig": {
-"electronAppVersion": "1.0.
+"electronAppVersion": "1.0.47",
 "downloadBaseUrl": "https://github.com/multiplex-ai/muggle-ai-works/releases/download",
 "runtimeTargetDefault": "production",
 "checksums": {
-"darwin-arm64": "
-"darwin-x64": "
-"win32-x64": "
-"linux-x64": "
+"darwin-arm64": "f80b943ea5f05e7113d603ee8104c07be101a26092c4fa50ed6fcb37a0cbebff",
+"darwin-x64": "3189a5c07087f9ba2ef03f99e3735055c00a752b2421ae9cffc113f04933da61",
+"win32-x64": "d8e102fce024776262f856dfea0b12429e853689d29d03e27e54dbeffbe59725",
+"linux-x64": "d7218dddcbab47f78c64fe438cf5bd129740f20f6ad160b615a648415f8faffc"
 }
 },
 "dependencies": {
@@ -1,7 +1,7 @@
 {
 "name": "muggle",
 "description": "Run real-browser end-to-end (E2E) acceptance tests on your web app from any AI coding agent. Generate test scripts from plain English, replay them on localhost, capture screenshots, and validate user flows like signup, checkout, and dashboards. Works across Claude Code, Cursor, Codex, and Windsurf.",
-"version": "4.
+"version": "4.5.0",
 "author": {
 "name": "Muggle AI",
 "email": "support@muggle-ai.com"
@@ -2,7 +2,7 @@
 "name": "muggle",
 "displayName": "Muggle AI",
 "description": "Ship quality products with AI-powered end-to-end (E2E) acceptance testing that validates your web app like a real user — from Claude Code and Cursor to PR.",
-"version": "4.
+"version": "4.5.0",
 "author": {
 "name": "Muggle AI",
 "email": "support@muggle-ai.com"
package/plugin/README.md
CHANGED
@@ -28,6 +28,7 @@ Type `muggle` to discover the full command family.
 | `/muggle:muggle-test` | Change-driven E2E acceptance router: detects code changes, maps to use cases, runs test generation locally or remotely, publishes to dashboard, opens in browser, posts E2E acceptance results to PR. |
 | `/muggle:muggle-test-feature-local` | Test a feature on localhost with AI-driven browser automation. Offers publish to cloud after each run. |
 | `/muggle:muggle-test-import` | Import existing tests into Muggle Test — from Playwright/Cypress specs, PRDs, Gherkin feature files, test plan docs, or any test artifact. |
+| `/muggle:muggle-test-regenerate-missing` | Bulk-regenerate test scripts for every test case in a project that doesn't currently have an active script. Scans DRAFT + GENERATION_PENDING, confirms the list with the user, and dispatches remote generation workflows for each. |
 | `/muggle:muggle-status` | Health check for Electron browser test runner, MCP server, and authentication. |
 | `/muggle:muggle-repair` | Diagnose and fix broken installation automatically. |
 | `/muggle:muggle-upgrade` | Update Electron browser test runner and MCP server to latest version. |
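
The `batch_failure_tolerance` assertion repeated in the new evals requires that a single failed dispatch is logged and the batch continues. A minimal sketch of that loop, assuming a `startGeneration` callback standing in for the `muggle-remote-workflow-start-test-script-generation` tool (the names and result shape here are illustrative, not the package's actual API):

```typescript
// Illustrative only: dispatch generation for each candidate test case,
// recording failures instead of aborting the batch.
type DispatchResult = { id: string; ok: boolean; error?: string };

async function dispatchAll(
  ids: string[],
  startGeneration: (id: string) => Promise<void>,
): Promise<DispatchResult[]> {
  const results: DispatchResult[] = [];
  for (const id of ids) {
    try {
      await startGeneration(id);
      results.push({ id, ok: true });
    } catch (err) {
      // A single failed dispatch is recorded here and the loop continues,
      // which is exactly what the batch_failure_tolerance eval checks for.
      results.push({ id, ok: false, error: String(err) });
    }
  }
  return results;
}
```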