npm - @muggleai/works - Versions diffs - 4.2.1 → 4.2.2 - Mend

@muggleai/works 4.2.1 → 4.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +56 -14
package/dist/{chunk-CXTJOYWM.js → chunk-BZJXQZ5Q.js} +42 -17
package/dist/cli.js +1 -1
package/dist/index.js +1 -1
package/dist/plugin/skills/muggle-test-feature-local/SKILL.md +76 -79
package/package.json +1 -1
package/plugin/skills/muggle-test-feature-local/SKILL.md +76 -79
package/scripts/postinstall.mjs +2 -2

package/README.md CHANGED

@@ -27,24 +27,16 @@ muggle-ai-works closes the gap between "code complete" and "actually works."
 ## Quick Start
-### 1. Install
+### 1. Install (choose your client)
-In Claude Code, run:
+**Claude Code (full plugin experience)**
 ```
 /plugin marketplace add https://github.com/multiplex-ai/muggle-ai-works
 /plugin install muggleai@muggle-works
 ```
-If you install via npm instead:
-```bash
-npm install -g @muggleai/works
-```
-`npm install` updates the CLI and syncs `muggle-*` skills to `~/.cursor/skills/` for Cursor discovery. Claude slash commands are plugin-managed, so update those with `/plugin update muggleai@muggle-works`.
-This installs the Muggle AI plugin with:
+This installs:
 - `/muggle:muggle` — command router and menu
 - `/muggle:muggle-do` — autonomous dev pipeline (requirements to PR)
@@ -55,16 +47,48 @@ This installs the Muggle AI plugin with:
 - MCP server with 70+ tools (auto-started)
 - Electron QA engine provisioning (via session hook)
+**Cursor, Codex, Windsurf, and other MCP clients (MCP tools only)**
+```bash
+npm install -g @muggleai/works
+```
+Then configure your MCP client:
+```json
+{
+  "mcpServers": {
+    "muggle": {
+      "command": "muggle",
+      "args": ["serve"],
+      "env": {
+        "MUGGLE_MCP_PROMPT_SERVICE_TARGET": "production"
+      }
+    }
+  }
+}
+```
+`npm install` also syncs `muggle-*` skills to `~/.cursor/skills/` for Cursor discovery. Claude slash commands are plugin-managed, so update those with `/plugin update muggleai@muggle-works`.
 ### 2. Verify
+**Claude Code**
 ```
 /muggle:muggle-status
 ```
 This checks Electron QA engine, MCP server health, and authentication. If anything is broken, run `/muggle:muggle-repair`.
+**Cursor/Codex/Windsurf/other MCP clients**
+Run any `muggle-*` MCP tool from your client after adding the MCP server config above. Authentication starts automatically on first protected tool call.
 ### 3. Start building features
+**Claude Code**
 Describe what you want to build:
 ```
@@ -73,8 +97,14 @@ Describe what you want to build:
 The AI handles the full cycle: code the feature, run unit tests, QA the app in a real browser, and open a PR with results.
+**Cursor/Codex/Windsurf/other MCP clients**
+Use the direct MCP workflow section below to call `muggle-*` tools from your client.
 ### 4. Test a feature locally
+**Claude Code**
 Already have code running on localhost? Test it directly:
 ```
@@ -83,6 +113,10 @@ Already have code running on localhost? Test it directly:
 Describe what to test in plain English. The AI finds or creates test cases, launches a real browser, and reports results with screenshots.
+**Cursor/Codex/Windsurf/other MCP clients**
+Call local execution MCP tools directly (for example `muggle-local-execute-test-script-replay` or related `muggle-local-*` commands exposed by your client).
 ---
 ## How does it work?
@@ -120,7 +154,7 @@ Screenshots captured per step → action-script.json recorded
 Results: pass/fail with evidence at ~/.muggle-ai/sessions/{runId}/
          │
          v
-muggle-local-publish-test-script uploads to cloud
+muggle-local-publish-test-script uploads to cloud → returns viewUrl to open dashboard
 ```
 ---
@@ -256,7 +290,7 @@ Local Execution (muggle-local-*)
 | `muggle-local-cancel-execution`        | Cancel active execution            |
 | `muggle-local-run-result-list`         | List run results                   |
 | `muggle-local-run-result-get`          | Get detailed results + screenshots |
-| `muggle-local-publish-test-script`     | Publish script to cloud            |
+| `muggle-local-publish-test-script`     | Publish script to cloud, returns `viewUrl` |
 Reports and Analytics (muggle-remote-report-*)
@@ -391,7 +425,7 @@ Data directory structure (~/.muggle-ai/)
 ## What AI clients does it work with?
-Full support for Claude Code. MCP tools work with Cursor and any MCP-compatible client. Plugin skills require Claude Code plugin support.
+Full support for Claude Code. Cursor, Codex, Windsurf, and other MCP-compatible clients use the same MCP tools but do not support Claude plugin slash commands (`/muggle:*`).
 Platform compatibility table
@@ -501,6 +535,14 @@ git tag v<version> && git push --tags
 ```
+Release tag strategy
+- `electron-app-vX.Y.Z` tags in `muggle-ai-works` are for public Electron app binary releases (consumed by `muggle setup`, `muggle upgrade`, and npm postinstall).
+- `vX.Y.Z` tags in `muggle-ai-works` are for npm publishing of `@muggleai/works` (`publish-works.yml`).
+- `muggle-ai-teaching-service` builds Electron artifacts and publishes them into this public repo using `electron-app-vX.Y.Z`, so binaries are publicly downloadable.
+- The two version tracks are intentionally separate: runtime Electron artifact versions and npm package versions can move independently.
 Optimizing agent-facing descriptions

package/dist/{chunk-CXTJOYWM.js → chunk-BZJXQZ5Q.js} RENAMED

@@ -406,6 +406,7 @@ function resetLogger() {
 // packages/mcps/src/mcp/qa/index.ts
 var qa_exports2 = {};
 __export(qa_exports2, {
+  ActionScriptGetInputSchema: () => ActionScriptGetInputSchema,
   ApiKeyCreateInputSchema: () => ApiKeyCreateInputSchema,
   ApiKeyGetInputSchema: () => ApiKeyGetInputSchema,
   ApiKeyListInputSchema: () => ApiKeyListInputSchema,
@@ -1948,12 +1949,12 @@ function buildReplayActionScript(params) {
     sourceLabel: "testScript"
   });
   const rewrittenActionScript = rewriteActionScriptUrls({
-    actionScript: params.testScript.actionScript,
+    actionScript: params.actionScript,
     originalUrl: params.testScript.url,
     localUrl: params.localUrl
   });
   return {
-    actionScriptId: params.testScript.id,
+    actionScriptId: params.testScript.actionScriptId,
     actionScriptName: params.testScript.name,
     actionType: "UserDefined",
     actionParams: {
@@ -2231,7 +2232,7 @@ ${executionResult.stderr}`;
   }
 }
 async function executeReplay(params) {
-  const { testScript, localUrl } = params;
+  const { testScript, actionScript, localUrl } = params;
   const timeoutMs = params.timeoutMs ?? 18e4;
   const userId = getAuthenticatedUserId();
   const authContent = buildStudioAuthContent();
@@ -2256,15 +2257,16 @@ async function executeReplay(params) {
   try {
     const runId = runResult.id;
     const startedAt = Date.now();
-    const actionScript = buildReplayActionScript({
+    const builtActionScript = buildReplayActionScript({
       testScript,
+      actionScript,
       localUrl,
       runId,
       ownerUserId: authContent.userId
     });
     const inputFilePath = await writeTempFile({
       filename: `${runId}_input.json`,
-      data: actionScript
+      data: builtActionScript
     });
     const authFilePath = await writeTempFile({
       filename: `${runId}_auth.json`,
@@ -2968,7 +2970,8 @@ var TestCaseCreateInputSchema = z.object({
   automated: z.boolean().optional().describe("Whether this test case is automated (default: true)")
 });
 var TestScriptListInputSchema = z.object({
-  projectId: IdSchema.describe("Project ID to list test scripts for")
+  projectId: IdSchema.describe("Project ID to list test scripts for"),
+  testCaseId: IdSchema.optional().describe("Optional test case ID to filter scripts by")
 }).merge(PaginationInputSchema);
 var TestScriptGetInputSchema = z.object({
   testScriptId: IdSchema.describe("Test script ID to retrieve")
@@ -2976,6 +2979,9 @@ var TestScriptGetInputSchema = z.object({
 var TestScriptListPaginatedInputSchema = z.object({
   projectId: IdSchema.describe("Project ID to list test scripts for")
 }).merge(PaginationInputSchema);
+var ActionScriptGetInputSchema = z.object({
+  actionScriptId: IdSchema.describe("Action script ID to retrieve")
+});
 var WorkflowStartWebsiteScanInputSchema = z.object({
   projectId: IdSchema.describe("Project ID to scan"),
   url: z.string().url().describe("Website URL to scan"),
@@ -3147,7 +3153,8 @@ var GatewayError = class extends Error {
 var ALLOWED_UPSTREAM_PREFIXES = [
   "/v1/protected/muggle-test/",
   "/v1/protected/wallet/",
-  "/v1/protected/api-keys"
+  "/v1/protected/api-keys",
+  "/v1/protected/actionScript/"
 ];
 var PromptServiceClient = class {
   httpClient;
@@ -3636,14 +3643,14 @@ var testCaseTools = [
 var testScriptTools = [
   {
     name: "muggle-remote-test-script-list",
-    description: "List test scripts for a project.",
+    description: "List test scripts for a project, optionally filtered by test case.",
     inputSchema: TestScriptListInputSchema,
     mapToUpstream: (input) => {
       const data = input;
       return {
         method: "GET",
         path: `${MUGGLE_TEST_PREFIX}/test-scripts`,
-        queryParams: { projectId: data.projectId, page: data.page, pageSize: data.pageSize }
+        queryParams: { projectId: data.projectId, testCaseId: data.testCaseId, page: data.page, pageSize: data.pageSize }
       };
     }
   },
@@ -3673,6 +3680,20 @@ var testScriptTools = [
     }
   }
 ];
+var actionScriptTools = [
+  {
+    name: "muggle-remote-action-script-get",
+    description: "Get the full action script content by ID. Use actionScriptId from a test script to fetch the complete script with all steps and element labels needed for replay.",
+    inputSchema: ActionScriptGetInputSchema,
+    mapToUpstream: (input) => {
+      const data = input;
+      return {
+        method: "GET",
+        path: `/v1/protected/actionScript/${data.actionScriptId}`
+      };
+    }
+  }
+];
 var workflowTools = [
   {
     name: "muggle-remote-workflow-start-website-scan",
@@ -4641,6 +4662,7 @@ var allQaToolDefinitions = [
   ...useCaseTools,
   ...testCaseTools,
   ...testScriptTools,
+  ...actionScriptTools,
   ...workflowTools,
   ...reportTools,
   ...secretTools,
@@ -4828,8 +4850,8 @@ var TestScriptDetailsSchema = z.object({
   name: z.string().min(1).describe("Test script name"),
   /** Cloud test case ID this script belongs to. */
   testCaseId: z.string().min(1).describe("Cloud test case ID this script was generated from"),
-  /** Action script steps. */
-  actionScript: z.array(z.unknown()).describe("Action script steps to replay"),
+  /** Action script ID reference (use muggle-remote-action-script-get to fetch content). */
+  actionScriptId: z.string().min(1).describe("Action script ID - use muggle-remote-action-script-get to fetch the full script"),
   /** Original cloud URL (for reference, replaced by localUrl). */
   url: z.string().url().optional().describe("Original cloud URL (replaced by localUrl during execution)"),
   /** Cloud project ID (required for electron workflow context). */
@@ -4850,8 +4872,10 @@ var ExecuteTestGenerationInputSchema = z.object({
   showUi: z.boolean().optional().describe("Show the electron-app UI during execution (default: false, runs headless)")
 });
 var ExecuteReplayInputSchema = z.object({
-  /** Test script details from qa_test_script_get. */
-  testScript: TestScriptDetailsSchema.describe("Test script details obtained from qa_test_script_get"),
+  /** Test script metadata from muggle-remote-test-script-get. */
+  testScript: TestScriptDetailsSchema.describe("Test script metadata from muggle-remote-test-script-get"),
+  /** Action script content from muggle-remote-action-script-get (using testScript.actionScriptId). */
+  actionScript: z.array(z.unknown()).describe("Action script steps from muggle-remote-action-script-get"),
   /** Local URL to test against. */
   localUrl: z.string().url().describe("Local URL to test against (e.g., http://localhost:3000)"),
   /** Explicit approval to launch electron-app. */
@@ -5154,7 +5178,7 @@ var executeTestGenerationTool = {
 };
 var executeReplayTool = {
   name: "muggle-local-execute-replay",
-  description: "Replay an existing QA test script in a real browser to verify your app still works correctly \u2014 use this for regression testing after code changes. The browser executes each saved step and captures screenshots so you can see what happened. Requires a test script (from muggle-remote-test-script-get) and a localhost URL. Launches an Electron browser \u2014 requires explicit approval via approveElectronAppLaunch. Runs headless by default; set showUi: true to watch.",
+  description: "Replay an existing QA test script in a real browser to verify your app still works correctly \u2014 use this for regression testing after code changes. The browser executes each saved step and captures screenshots so you can see what happened. Requires: (1) test script metadata from muggle-remote-test-script-get, (2) actionScript content from muggle-remote-action-script-get using the testScript.actionScriptId, and (3) a localhost URL. Launches an Electron browser \u2014 requires explicit approval via approveElectronAppLaunch. Runs headless by default; set showUi: true to watch.",
   inputSchema: ExecuteReplayInputSchema,
   execute: async (ctx) => {
     const logger14 = createChildLogger2(ctx.correlationId);
@@ -5171,7 +5195,7 @@ var executeReplayTool = {
           "",
           `**Test Script:** ${input.testScript.name}`,
           `**Local URL:** ${input.localUrl}`,
-          `**Steps:** ${input.testScript.actionScript.length}`,
+          `**Steps:** ${input.actionScript.length}`,
           `**UI Mode:** ${uiMode}`,
           "",
           "**Note:** The electron-app will open a browser window and execute the test steps."
@@ -5183,6 +5207,7 @@ var executeReplayTool = {
     try {
       const result = await executeReplay({
         testScript: input.testScript,
+        actionScript: input.actionScript,
         localUrl: input.localUrl,
         timeoutMs: input.timeoutMs,
         showUi: input.showUi
@@ -5225,7 +5250,7 @@ var cancelExecutionTool = {
 };
 var publishTestScriptTool = {
   name: "muggle-local-publish-test-script",
-  description: "Publish a locally generated test script to the cloud. Uses the run ID from muggle_execute_test_generation to find the script and uploads it to the specified cloud test case.",
+  description: "Publish a locally generated test script to the cloud. Uses the run ID from muggle_execute_test_generation to find the script and uploads it to the specified cloud test case. Returns a viewUrl that can be opened in the user's browser to view the published test script on the dashboard.",
   inputSchema: PublishTestScriptInputSchema,
   execute: async (ctx) => {
     const logger14 = createChildLogger2(ctx.correlationId);
@@ -6923,7 +6948,7 @@ function getBinaryName2() {
   }
 }
 function extractVersionFromTag(tag) {
-  const match = tag.match(/^electron-app-v(\d+\.\d+\.\d+)$/);
+  const match = tag.match(/^(?:electron-app-)?v(\d+\.\d+\.\d+)$/);
   return match ? match[1] : null;
 }
 function getVersionOverridePath() {
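
Note on the `extractVersionFromTag` hunk above: the new pattern makes the `electron-app-` prefix optional, so plain `vX.Y.Z` tags now yield a version as well. The sketch below copies the 4.2.2 function body from the hunk and runs it on a few tag strings. The sample tags are illustrative assumptions, not taken from the repository's actual tag list.

```javascript
// Copied from the 4.2.2 side of the hunk: "electron-app-" is now optional.
function extractVersionFromTag(tag) {
  const match = tag.match(/^(?:electron-app-)?v(\d+\.\d+\.\d+)$/);
  return match ? match[1] : null;
}

// Illustrative inputs only (hypothetical tag names).
const sampleTags = [
  "electron-app-v1.0.50", // prefixed tag: matched in 4.2.1 and 4.2.2
  "v4.2.2",               // bare tag: matched only by the 4.2.2 pattern
  "4.2.2",                // no leading "v": still rejected
  "v4.2",                 // not three numeric parts: still rejected
];

for (const tag of sampleTags) {
  console.log(`${tag} -> ${extractVersionFromTag(tag)}`);
}
```

Because the pattern stays anchored with `^` and `$`, tags with any other prefix or suffix still return `null`.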

package/dist/cli.js CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env node
-import { runCli } from './chunk-CXTJOYWM.js';
+import { runCli } from './chunk-BZJXQZ5Q.js';
 // src/cli/main.ts
 runCli().catch((error) => {

package/dist/index.js CHANGED

@@ -1 +1 @@
-export { src_exports2 as commands, createChildLogger, createUnifiedMcpServer, getConfig, getLocalQaTools, getLogger, getQaTools, local_exports as localQa, mcp_exports as mcp, qa_exports as qa, server_exports as server, src_exports as shared } from './chunk-CXTJOYWM.js';
+export { src_exports2 as commands, createChildLogger, createUnifiedMcpServer, getConfig, getLocalQaTools, getLogger, getQaTools, local_exports as localQa, mcp_exports as mcp, qa_exports as qa, server_exports as server, src_exports as shared } from './chunk-BZJXQZ5Q.js';

package/dist/plugin/skills/muggle-test-feature-local/SKILL.md CHANGED

@@ -5,118 +5,115 @@ description: Run a real-browser QA test against localhost to verify a feature wo
 # Muggle Test Feature Local
-Run end-to-end feature testing from UI against a local URL:
+**Goal:** Run or generate an end-to-end test against a **local URL** using Muggle’s Electron browser.
-- Cloud management: `muggle-remote-*`
-- Local execution and artifacts: `muggle-local-*`
+| Scope | MCP tools |
+| :---- | :-------- |
+| Cloud (projects, cases, scripts, auth) | `muggle-remote-*` |
+| Local (Electron run, publish, results) | `muggle-local-*` |
+| Create new entities (preview / create) | `muggle-remote-project-create`, `muggle-remote-use-case-prompt-preview`, `muggle-remote-use-case-create-from-prompts`, `muggle-remote-test-case-generate-from-prompt`, `muggle-remote-test-case-create` |
+The local URL only changes where the browser opens; it does not change the remote project or test definitions.
 ## Workflow
 ### 1. Auth
 - `muggle-remote-auth-status`
-- If needed: `muggle-remote-auth-login` + `muggle-remote-auth-poll`
+- If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
+  Do not skip or assume auth.
+### 2. Targets (user must confirm)
-### 2. Select project, use case, and test case
+Ask the user to pick **project**, **use case**, and **test case** (do not infer).
-- Explicitly ask user to select each target to proceed.
 - `muggle-remote-project-list`
-- `muggle-remote-use-case-list`
-- `muggle-remote-test-case-list-by-use-case`
+- `muggle-remote-use-case-list` (with `projectId`)
+- `muggle-remote-test-case-list-by-use-case` (with `useCaseId`)
+**Selection UI (mandatory):** After each list call, present choices as a **numbered list** (`1.` … `n.`). Keep each line minimal: number, short title, UUID. Ask the user to **reply with the number** or the UUID.
+**Fixed tail of each pick list (project, use case, test case):** After the relevance-ranked rows, end with the options below. **Create new …** is never omitted; **Show full list** is omitted when it would be pointless (see empty list).
+1. **Show full list** — user sees every row from the API (then re-number the full list including the tails below again). **Skip this option** if the API returned **zero** rows for that step (e.g. no test cases yet for the chosen use case). There is nothing to expand.
+2. **Create new …** — user creates a new entity instead of picking an existing one. Label per step: **Create new project**, **Create new use case**, or **Create new test case**.
+**Relevance-first filtering (mandatory for project, use case, and test case lists):**
-### 3. Resolve local URL
+- Do **not** dump the full list by default.
+- Rank items by semantic relevance to the user’s stated goal (title first, then description / user story / acceptance criteria).
+- Show only the **top 3–5** most relevant options, then **Show full list** (unless the API list is empty — see above), then **Create new …** as above.
+- If the user picks **Show full list**, then present the complete numbered list (still ending with **Create new …**; include **Show full list** again only when the full list has at least one row).
-- Use the URL provided by the user.
-- If missing, ask explicitly (do not guess).
-- Inform user the local URL does not affect the project's remote test.
+**Create new — tools and flow (use these MCP tools; preview before persist):**
-### 4. Check for existing scripts and ask user to choose
+- **Project — Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
+- **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
+  1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation.
+  2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
+- **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
+  1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation.
+  2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **§4** with that `testCaseId`.
-Check BOTH cloud and local scripts to determine what's available:
+### 3. Local URL
-1. **Check cloud scripts:** `muggle-remote-test-script-list` filtered by projectId
-2. **Check local scripts:** `muggle-local-test-script-list` filtered by projectId
+- Use the URL the user gives. If none, ask; **do not guess**.
+- Remind them: local URL is only the execution target, not tied to cloud project config.
-**Decision logic:**
+### 4. Existing scripts vs new generation
-| Cloud Script | Local Script (status: published/generated) | Action |
-|--------------|---------------------------------------------|--------|
-| Exists + ACTIVE | Exists | Ask user: "Replay existing script" or "Regenerate from scratch"? |
-| Exists + ACTIVE | Not found | Sync from cloud first, then ask user |
-| Not found | Exists | Ask user: "Replay local script" or "Regenerate"? |
-| Not found | Not found | Default to generation (no need to ask) |
+`muggle-remote-test-script-list` with `testCaseId`.
-**When asking user, show:**
-- Script name and ID
-- When it was created/updated
-- Number of steps
-- Last run status if available
+- **If any replayable/succeeded scripts exist:** list them in a **numbered** list and ask: replay one **or** generate new.
+  Show: name, id, created/updated, step count. Include **`Generate new script`** as the **last** numbered option (e.g. last number) so it is selectable by number too.
+- **If none:** go straight to generation (no need to ask replay vs generate).
-### 5. Prepare for execution
+### 5. Load data for the chosen path
-**For Replay:**
+**Generate**
-Local scripts contain the complete `actionScript` with element labels required for replay. Remote scripts only contain metadata.
+1. `muggle-remote-test-case-get`
+2. `muggle-local-execute-test-generation` (after approval in step 6) with that test case + `localUrl` + `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
-1. Use `muggle-local-test-script-get` with `testScriptId` to fetch the FULL script including actionScript
-2. The returned script includes all steps with `operation.label` paths needed for element location
-3. Pass this complete script to `muggle-local-execute-replay`
+**Replay**
-**IMPORTANT:** Do NOT manually construct or simplify the actionScript. The electron app requires the complete script with all `label` paths intact to locate page elements during replay.
+1. `muggle-remote-test-script-get` → note `actionScriptId`
+2. `muggle-remote-action-script-get` with that id → full `actionScript`
+   **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
+3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
-**For Generation:**
+### Local execution timeout (`timeoutMs`)
-1. `muggle-remote-test-case-get` to fetch test case details
-2. `muggle-local-execute-test-generation` with the test case
+The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggle-local-execute-test-generation` and `muggle-local-execute-replay`. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
-### 6. Approval requirement
+- **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
+- If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
+- **Test case design:** Preconditions like “a test run has already completed” on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
-- Before execution, get explicit user approval for launching Electron app.
-- Show what will be executed (replay vs generation, test case name, URL).
-- Only then set `approveElectronAppLaunch: true`.
+### Interpreting `failed` / non-zero Electron exit
-### 7. Execute
+- **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
+- **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying “view script after a successful run” when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
-**Replay:**
-```
-muggle-local-execute-replay with:
-- testScript: (full script from muggle-local-test-script-get)
-- localUrl: user-provided localhost URL
-- approveElectronAppLaunch: true
-- showUi: true (optional, lets user watch)
-```
+### 6. Approval before any local execution
-**Generation:**
-```
-muggle-local-execute-test-generation with:
-- testCase: (from muggle-remote-test-case-get)
-- localUrl: user-provided localhost URL
-- approveElectronAppLaunch: true
-- showUi: true (optional)
-```
+Get **explicit** OK to launch Electron. State: replay vs generation, test case name, URL.
+Only then call local execute tools with `approveElectronAppLaunch: true`.
-### 8. Publish generation results (generation only)
+### 7. After successful generation only
-- Use `muggle-local-publish-test-script` after successful generation.
-- This uploads the script to cloud so it can be replayed later.
-- Return the remote URL for user to view the result.
+- `muggle-local-publish-test-script`
+- Open returned `viewUrl` for the user (`open "<viewUrl>"` on macOS or OS equivalent).
-### 9. Report results
+### 8. Report
-- `muggle-local-run-result-get` with returned runId.
-- Report:
-  - status (passed/failed)
-  - duration
-  - pass/fail summary
-  - steps summary (which steps passed/failed)
-  - artifacts path (screenshots location)
-  - script detail view URL
+- `muggle-local-run-result-get` with the run id from execute.
+- Include: status, duration, pass/fail summary, per-step summary, artifact/screenshot paths, errors if failed, and script view URL when publishing ran.
-## Guardrails
+## Non-negotiables
-- Do not silently skip auth.
-- Do not silently skip asking user when a replayable script exists.
-- Do not launch Electron without explicit approval.
-- Do not hide failing run details; include error and artifacts path.
-- Do not simplify or reconstruct actionScript for replay; use the complete script from `muggle-local-test-script-get`.
-- Always check local scripts before defaulting to generation.
+- No silent auth skip; no launching Electron without approval.
+- If replayable scripts exist, do not default to generation without user choice.
+- No hiding failures: surface errors and artifact paths.
+- Replay: never hand-built or simplified `actionScript` — only from `muggle-remote-action-script-get`.
+- Project, use case, and test case selection lists must always include **Create new …**. Include **Show full list** whenever the API returned at least one row for that step; **omit Show full list** when the list is empty (offer **Create new …** only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
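
Note on the replay path in the SKILL.md diff above: in 4.2.2 a replay takes three tool calls instead of one local fetch. A minimal sketch of that sequence follows, under stated assumptions. `callTool(name, args)` is a hypothetical stand-in for however your MCP client invokes a tool and returns its parsed result. The sketch also assumes that `muggle-remote-test-script-get` returns an object carrying `actionScriptId` and that `muggle-remote-action-script-get` returns the steps array directly; real responses may be wrapped differently. The default `timeoutMs` of 600000 is one of the example values from the skill text.

```javascript
// Sketch only. `callTool` is a hypothetical client helper, not part of the package.
async function replayLocally({ callTool, testScriptId, localUrl, approved, timeoutMs = 600000 }) {
  // The skill requires explicit user approval before Electron is launched.
  if (!approved) {
    throw new Error("Explicit user approval is required before launching Electron");
  }

  // 1. Fetch test script metadata; in 4.2.2 it carries actionScriptId, not the steps.
  const testScript = await callTool("muggle-remote-test-script-get", { testScriptId });

  // 2. Fetch the full action script by ID (response shape is an assumption).
  const actionScript = await callTool("muggle-remote-action-script-get", {
    actionScriptId: testScript.actionScriptId,
  });

  // 3. Replay against the local URL, passing the fetched script through unmodified.
  return callTool("muggle-local-execute-replay", {
    testScript,
    actionScript,
    localUrl,
    approveElectronAppLaunch: true,
    timeoutMs,
  });
}
```

The function passes `actionScript` through without touching it, in line with the skill's rule that replay needs the full `label` paths from the API response.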

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
     "name": "@muggleai/works",
     "mcpName": "io.github.multiplex-ai/muggle",
-    "version": "4.2.1",
+    "version": "4.2.2",
     "description": "Ship quality products with AI-powered QA that validates your app's user experience — from Claude Code and Cursor to PR.",
     "type": "module",
     "main": "dist/index.js",

package/plugin/skills/muggle-test-feature-local/SKILL.md CHANGED

@@ -5,118 +5,115 @@ description: Run a real-browser QA test against localhost to verify a feature wo
 # Muggle Test Feature Local
-Run end-to-end feature testing from UI against a local URL:
+**Goal:** Run or generate an end-to-end test against a **local URL** using Muggle’s Electron browser.
-- Cloud management: `muggle-remote-*`
-- Local execution and artifacts: `muggle-local-*`
+| Scope | MCP tools |
+| :---- | :-------- |
+| Cloud (projects, cases, scripts, auth) | `muggle-remote-*` |
+| Local (Electron run, publish, results) | `muggle-local-*` |
+| Create new entities (preview / create) | `muggle-remote-project-create`, `muggle-remote-use-case-prompt-preview`, `muggle-remote-use-case-create-from-prompts`, `muggle-remote-test-case-generate-from-prompt`, `muggle-remote-test-case-create` |
+The local URL only changes where the browser opens; it does not change the remote project or test definitions.
 ## Workflow
 ### 1. Auth
 - `muggle-remote-auth-status`
-- If needed: `muggle-remote-auth-login` + `muggle-remote-auth-poll`
+- If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
+  Do not skip or assume auth.
+### 2. Targets (user must confirm)
-### 2. Select project, use case, and test case
+Ask the user to pick **project**, **use case**, and **test case** (do not infer).
-- Explicitly ask user to select each target to proceed.
 - `muggle-remote-project-list`
-- `muggle-remote-use-case-list`
-- `muggle-remote-test-case-list-by-use-case`
+- `muggle-remote-use-case-list` (with `projectId`)
+- `muggle-remote-test-case-list-by-use-case` (with `useCaseId`)
+**Selection UI (mandatory):** After each list call, present choices as a **numbered list** (`1.` … `n.`). Keep each line minimal: number, short title, UUID. Ask the user to **reply with the number** or the UUID.
+**Fixed tail of each pick list (project, use case, test case):** After the relevance-ranked rows, end with the options below. **Create new …** is never omitted; **Show full list** is omitted when it would be pointless (see empty list).
+1. **Show full list** — user sees every row from the API (then re-number the full list including the tails below again). **Skip this option** if the API returned **zero** rows for that step (e.g. no test cases yet for the chosen use case). There is nothing to expand.
+2. **Create new …** — user creates a new entity instead of picking an existing one. Label per step: **Create new project**, **Create new use case**, or **Create new test case**.
+**Relevance-first filtering (mandatory for project, use case, and test case lists):**
-### 3. Resolve local URL
+- Do **not** dump the full list by default.
+- Rank items by semantic relevance to the user’s stated goal (title first, then description / user story / acceptance criteria).
+- Show only the **top 3–5** most relevant options, then **Show full list** (unless the API list is empty — see above), then **Create new …** as above.
+- If the user picks **Show full list**, then present the complete numbered list (still ending with **Create new …**; include **Show full list** again only when the full list has at least one row).
-- Use the URL provided by the user.
-- If missing, ask explicitly (do not guess).
-- Inform user the local URL does not affect the project's remote test.
+**Create new — tools and flow (use these MCP tools; preview before persist):**
-### 4. Check for existing scripts and ask user to choose
+- **Project — Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
+- **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
+  1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation.
+  2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
+- **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
+  1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation.
+  2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **§4** with that `testCaseId`.
-Check BOTH cloud and local scripts to determine what's available:
+### 3. Local URL
-1. **Check cloud scripts:** `muggle-remote-test-script-list` filtered by projectId
-2. **Check local scripts:** `muggle-local-test-script-list` filtered by projectId
+- Use the URL the user gives. If none, ask; **do not guess**.
+- Remind them: local URL is only the execution target, not tied to cloud project config.
-**Decision logic:**
+### 4. Existing scripts vs new generation
-| Cloud Script | Local Script (status: published/generated) | Action |
-|--------------|---------------------------------------------|--------|
-| Exists + ACTIVE | Exists | Ask user: "Replay existing script" or "Regenerate from scratch"? |
-| Exists + ACTIVE | Not found | Sync from cloud first, then ask user |
-| Not found | Exists | Ask user: "Replay local script" or "Regenerate"? |
-| Not found | Not found | Default to generation (no need to ask) |
+`muggle-remote-test-script-list` with `testCaseId`.
-**When asking user, show:**
-- Script name and ID
-- When it was created/updated
-- Number of steps
-- Last run status if available
+- **If any replayable/succeeded scripts exist:** list them in a **numbered** list and ask: replay one **or** generate new.
+  Show: name, id, created/updated, step count. Include **`Generate new script`** as the **last** numbered option (e.g. last number) so it is selectable by number too.
+- **If none:** go straight to generation (no need to ask replay vs generate).
-### 5. Prepare for execution
+### 5. Load data for the chosen path
-**For Replay:**
+**Generate**
-Local scripts contain the complete `actionScript` with element labels required for replay. Remote scripts only contain metadata.
+1. `muggle-remote-test-case-get`
+2. `muggle-local-execute-test-generation` (after approval in step 6) with that test case + `localUrl` + `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
-1. Use `muggle-local-test-script-get` with `testScriptId` to fetch the FULL script including actionScript
-2. The returned script includes all steps with `operation.label` paths needed for element location
-3. Pass this complete script to `muggle-local-execute-replay`
+**Replay**
-**IMPORTANT:** Do NOT manually construct or simplify the actionScript. The electron app requires the complete script with all `label` paths intact to locate page elements during replay.
+1. `muggle-remote-test-script-get` → note `actionScriptId`
+2. `muggle-remote-action-script-get` with that id → full `actionScript`
+   **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
+3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
-**For Generation:**
+### Local execution timeout (`timeoutMs`)
-1. `muggle-remote-test-case-get` to fetch test case details
-2. `muggle-local-execute-test-generation` with the test case
+The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggle-local-execute-test-generation` and `muggle-local-execute-replay`. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
-### 6. Approval requirement
+- **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
+- If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
+- **Test case design:** Preconditions like “a test run has already completed” on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
-- Before execution, get explicit user approval for launching Electron app.
-- Show what will be executed (replay vs generation, test case name, URL).
-- Only then set `approveElectronAppLaunch: true`.
+### Interpreting `failed` / non-zero Electron exit
-### 7. Execute
+- **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
+- **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying “view script after a successful run” when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
-**Replay:**
-```
-muggle-local-execute-replay with:
-- testScript: (full script from muggle-local-test-script-get)
-- localUrl: user-provided localhost URL
-- approveElectronAppLaunch: true
-- showUi: true (optional, lets user watch)
-```
+### 6. Approval before any local execution
-**Generation:**
-```
-muggle-local-execute-test-generation with:
-- testCase: (from muggle-remote-test-case-get)
-- localUrl: user-provided localhost URL
-- approveElectronAppLaunch: true
-- showUi: true (optional)
-```
+Get **explicit** OK to launch Electron. State: replay vs generation, test case name, URL.
+Only then call local execute tools with `approveElectronAppLaunch: true`.
-### 8. Publish generation results (generation only)
+### 7. After successful generation only
-- Use `muggle-local-publish-test-script` after successful generation.
-- This uploads the script to cloud so it can be replayed later.
-- Return the remote URL for user to view the result.
+- `muggle-local-publish-test-script`
+- Open returned `viewUrl` for the user (`open "<viewUrl>"` on macOS or OS equivalent).
-### 9. Report results
+### 8. Report
-- `muggle-local-run-result-get` with returned runId.
-- Report:
-  - status (passed/failed)
-  - duration
-  - pass/fail summary
-  - steps summary (which steps passed/failed)
-  - artifacts path (screenshots location)
-  - script detail view URL
+- `muggle-local-run-result-get` with the run id from execute.
+- Include: status, duration, pass/fail summary, per-step summary, artifact/screenshot paths, errors if failed, and script view URL when publishing ran.
-## Guardrails
+## Non-negotiables
-- Do not silently skip auth.
-- Do not silently skip asking user when a replayable script exists.
-- Do not launch Electron without explicit approval.
-- Do not hide failing run details; include error and artifacts path.
-- Do not simplify or reconstruct actionScript for replay; use the complete script from `muggle-local-test-script-get`.
-- Always check local scripts before defaulting to generation.
+- No silent auth skip; no launching Electron without approval.
+- If replayable scripts exist, do not default to generation without user choice.
+- No hiding failures: surface errors and artifact paths.
+- Replay: never hand-built or simplified `actionScript` — only from `muggle-remote-action-script-get`.
+- Project, use case, and test case selection lists must always include **Create new …**. Include **Show full list** whenever the API returned at least one row for that step; **omit Show full list** when the list is empty (offer **Create new …** only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
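
The pick-list rules added in this SKILL.md revision (top 3–5 relevant rows, then **Show full list** unless the API returned zero rows, then **Create new …** always) can be sketched as a small function. This is only an illustration: `buildPickList`, its parameters, and the row shape `{ id, title }` are hypothetical and not part of `@muggleai/works`, and the rows are assumed to be already sorted by relevance.

```javascript
// Hypothetical helper, not part of @muggleai/works.
// rows: [{ id, title }] assumed to be pre-sorted by relevance to the user's goal.
function buildPickList(rows, entityLabel, { showAll = false, topN = 5 } = {}) {
  // Default view shows only the most relevant rows; showAll expands to every row.
  const visible = showAll ? rows : rows.slice(0, topN);
  const lines = visible.map((row) => `${row.title} (${row.id})`);

  // "Show full list" is skipped when the API returned zero rows.
  if (rows.length > 0) {
    lines.push("Show full list");
  }

  // "Create new <entity>" is never omitted.
  lines.push(`Create new ${entityLabel}`);

  // Number every line so the user can reply with a number.
  return lines.map((text, index) => `${index + 1}. ${text}`);
}

// Empty API result: only the create option remains.
console.log(buildPickList([], "test case").join("\n"));
// prints "1. Create new test case"
```

Keeping the two tail options inside the same numbered sequence means the user can always answer with a single number, including after expanding to the full list.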

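The `timeoutMs` guidance in the new SKILL.md (retry with a larger value when the tool reports `Electron execution timed out after 300000ms`) might be sketched as below. The helper name `nextTimeoutMs` and the escalation ladder are assumptions for illustration; the ladder values are the 10 and 15 minute examples from the skill text. The helper only picks the next value. Checking that the Electron logs still show progress, and re-approval by the user, remain separate steps.

```javascript
// Hypothetical helper, not part of @muggleai/works.
// The pattern mirrors the error text quoted in the skill document.
const TIMEOUT_PATTERN = /timed out after (\d+)\s*ms/i;

function nextTimeoutMs(errorMessage, ladder = [600000, 900000]) {
  const match = TIMEOUT_PATTERN.exec(errorMessage);
  if (!match) {
    // Not an orchestration timeout: read the run result summary instead.
    return null;
  }
  const previousMs = Number(match[1]);
  // Pick the first ladder value strictly larger than the cap that was hit.
  const next = ladder.find((ms) => ms > previousMs);
  return next ?? null;
}

console.log(nextTimeoutMs("Electron execution timed out after 300000ms"));
// prints 600000
```

A `null` result covers both cases in which a larger timeout is not the right response: the failure was something else (for example exit code 26), or the ladder is exhausted.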
package/scripts/postinstall.mjs CHANGED

@@ -574,7 +574,7 @@ async function downloadElectronApp() {
         const checksums = config.checksums || {};
         const platformKey = getPlatformKey();
         const expectedChecksum = checksums[platformKey] || "";
-        const downloadUrl = `${baseUrl}/v${version}/${binaryName}`;
+        const downloadUrl = `${baseUrl}/electron-app-v${version}/${binaryName}`;
         const appDir = getElectronAppDir();
         const versionDir = join(appDir, version);
@@ -703,7 +703,7 @@ async function downloadElectronApp() {
             const packageJson = require("../package.json");
             const config = packageJson.muggleConfig || {};
             logError("  - Electron app version:", config.electronAppVersion || "unknown");
-            logError("  - Download URL:", `${config.downloadBaseUrl}/v${config.electronAppVersion}/${getBinaryName()}`);
+            logError("  - Download URL:", `${config.downloadBaseUrl}/electron-app-v${config.electronAppVersion}/${getBinaryName()}`);
         } catch {
             logError("  - Could not read package.json config");
         }
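
Both hunks above make the same change: the release path segment moves from `v${version}` to `electron-app-v${version}`. A minimal sketch of the new URL shape follows, using a hypothetical `buildDownloadUrl` helper and made-up example values (the real base URL, version, and binary name come from `muggleConfig` and are not shown in this diff).

```javascript
// Hypothetical standalone version of the template string used in the hunks above.
function buildDownloadUrl(baseUrl, version, binaryName) {
  // 4.2.2 path layout: <baseUrl>/electron-app-v<version>/<binaryName>
  return `${baseUrl}/electron-app-v${version}/${binaryName}`;
}

// Example values are placeholders, not the package's real configuration.
console.log(buildDownloadUrl("https://example.com/releases", "1.0.0", "app.zip"));
// prints "https://example.com/releases/electron-app-v1.0.0/app.zip"
```

Because the same template appears once in the download path and once in the error log, a shared helper along these lines would keep the two from drifting apart; whether the package does this is not visible in the diff.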