@muggleai/works 4.2.1 → 4.2.2

package/README.md CHANGED
@@ -27,24 +27,16 @@ muggle-ai-works closes the gap between "code complete" and "actually works."
27
27
 
28
28
  ## Quick Start
29
29
 
30
- ### 1. Install
30
+ ### 1. Install (choose your client)
31
31
 
32
- In Claude Code, run:
32
+ **Claude Code (full plugin experience)**
33
33
 
34
34
  ```
35
35
  /plugin marketplace add https://github.com/multiplex-ai/muggle-ai-works
36
36
  /plugin install muggleai@muggle-works
37
37
  ```
38
38
 
39
- If you install via npm instead:
40
-
41
- ```bash
42
- npm install -g @muggleai/works
43
- ```
44
-
45
- `npm install` updates the CLI and syncs `muggle-*` skills to `~/.cursor/skills/` for Cursor discovery. Claude slash commands are plugin-managed, so update those with `/plugin update muggleai@muggle-works`.
46
-
47
- This installs the Muggle AI plugin with:
39
+ This installs:
48
40
 
49
41
  - `/muggle:muggle` — command router and menu
50
42
  - `/muggle:muggle-do` — autonomous dev pipeline (requirements to PR)
@@ -55,16 +47,48 @@ This installs the Muggle AI plugin with:
55
47
  - MCP server with 70+ tools (auto-started)
56
48
  - Electron QA engine provisioning (via session hook)
57
49
 
50
+ **Cursor, Codex, Windsurf, and other MCP clients (MCP tools only)**
51
+
52
+ ```bash
53
+ npm install -g @muggleai/works
54
+ ```
55
+
56
+ Then configure your MCP client:
57
+
58
+ ```json
59
+ {
60
+ "mcpServers": {
61
+ "muggle": {
62
+ "command": "muggle",
63
+ "args": ["serve"],
64
+ "env": {
65
+ "MUGGLE_MCP_PROMPT_SERVICE_TARGET": "production"
66
+ }
67
+ }
68
+ }
69
+ }
70
+ ```
71
+
72
+ `npm install` also syncs `muggle-*` skills to `~/.cursor/skills/` for Cursor discovery. Claude slash commands are plugin-managed, so update those with `/plugin update muggleai@muggle-works`.
73
+
58
74
  ### 2. Verify
59
75
 
76
+ **Claude Code**
77
+
60
78
  ```
61
79
  /muggle:muggle-status
62
80
  ```
63
81
 
64
82
  This checks Electron QA engine, MCP server health, and authentication. If anything is broken, run `/muggle:muggle-repair`.
65
83
 
84
+ **Cursor/Codex/Windsurf/other MCP clients**
85
+
86
+ Run any `muggle-*` MCP tool from your client after adding the MCP server config above. Authentication starts automatically on the first protected tool call.
87
+
66
88
  ### 3. Start building features
67
89
 
90
+ **Claude Code**
91
+
68
92
  Describe what you want to build:
69
93
 
70
94
  ```
@@ -73,8 +97,14 @@ Describe what you want to build:
73
97
 
74
98
  The AI handles the full cycle: code the feature, run unit tests, QA the app in a real browser, and open a PR with results.
75
99
 
100
+ **Cursor/Codex/Windsurf/other MCP clients**
101
+
102
+ Use the direct MCP workflow section below to call `muggle-*` tools from your client.
103
+
76
104
  ### 4. Test a feature locally
77
105
 
106
+ **Claude Code**
107
+
78
108
  Already have code running on localhost? Test it directly:
79
109
 
80
110
  ```
@@ -83,6 +113,10 @@ Already have code running on localhost? Test it directly:
83
113
 
84
114
  Describe what to test in plain English. The AI finds or creates test cases, launches a real browser, and reports results with screenshots.
85
115
 
116
+ **Cursor/Codex/Windsurf/other MCP clients**
117
+
118
+ Call local execution MCP tools directly (for example `muggle-local-execute-test-script-replay` or related `muggle-local-*` commands exposed by your client).
119
+
86
120
  ---
87
121
 
88
122
  ## How does it work?
@@ -120,7 +154,7 @@ Screenshots captured per step → action-script.json recorded
120
154
  Results: pass/fail with evidence at ~/.muggle-ai/sessions/{runId}/
121
155
 
122
156
  v
123
- muggle-local-publish-test-script uploads to cloud
157
+ muggle-local-publish-test-script uploads to cloud → returns viewUrl to open dashboard
124
158
  ```
125
159
 
126
160
  ---
@@ -256,7 +290,7 @@ Local Execution (muggle-local-*)
256
290
  | `muggle-local-cancel-execution` | Cancel active execution |
257
291
  | `muggle-local-run-result-list` | List run results |
258
292
  | `muggle-local-run-result-get` | Get detailed results + screenshots |
259
- | `muggle-local-publish-test-script` | Publish script to cloud |
293
+ | `muggle-local-publish-test-script` | Publish script to cloud, returns `viewUrl` |
260
294
 
261
295
 
262
296
  Reports and Analytics (muggle-remote-report-*)
@@ -391,7 +425,7 @@ Data directory structure (~/.muggle-ai/)
391
425
 
392
426
  ## What AI clients does it work with?
393
427
 
394
- Full support for Claude Code. MCP tools work with Cursor and any MCP-compatible client. Plugin skills require Claude Code plugin support.
428
+ Full support for Claude Code. Cursor, Codex, Windsurf, and other MCP-compatible clients use the same MCP tools but do not support Claude plugin slash commands (`/muggle:*`).
395
429
 
396
430
  Platform compatibility table
397
431
 
@@ -501,6 +535,14 @@ git tag v<version> && git push --tags
501
535
  ```
502
536
 
503
537
 
538
+ Release tag strategy
539
+
540
+ - `electron-app-vX.Y.Z` tags in `muggle-ai-works` are for public Electron app binary releases (consumed by `muggle setup`, `muggle upgrade`, and npm postinstall).
541
+ - `vX.Y.Z` tags in `muggle-ai-works` are for npm publishing of `@muggleai/works` (`publish-works.yml`).
542
+ - `muggle-ai-teaching-service` builds Electron artifacts and publishes them into this public repo using `electron-app-vX.Y.Z`, so binaries are publicly downloadable.
543
+ - The two version tracks are intentionally separate: runtime Electron artifact versions and npm package versions can move independently.
544
+
545
+
504
546
  Optimizing agent-facing descriptions
505
547
 
506
548
 
@@ -406,6 +406,7 @@ function resetLogger() {
406
406
  // packages/mcps/src/mcp/qa/index.ts
407
407
  var qa_exports2 = {};
408
408
  __export(qa_exports2, {
409
+ ActionScriptGetInputSchema: () => ActionScriptGetInputSchema,
409
410
  ApiKeyCreateInputSchema: () => ApiKeyCreateInputSchema,
410
411
  ApiKeyGetInputSchema: () => ApiKeyGetInputSchema,
411
412
  ApiKeyListInputSchema: () => ApiKeyListInputSchema,
@@ -1948,12 +1949,12 @@ function buildReplayActionScript(params) {
1948
1949
  sourceLabel: "testScript"
1949
1950
  });
1950
1951
  const rewrittenActionScript = rewriteActionScriptUrls({
1951
- actionScript: params.testScript.actionScript,
1952
+ actionScript: params.actionScript,
1952
1953
  originalUrl: params.testScript.url,
1953
1954
  localUrl: params.localUrl
1954
1955
  });
1955
1956
  return {
1956
- actionScriptId: params.testScript.id,
1957
+ actionScriptId: params.testScript.actionScriptId,
1957
1958
  actionScriptName: params.testScript.name,
1958
1959
  actionType: "UserDefined",
1959
1960
  actionParams: {
@@ -2231,7 +2232,7 @@ ${executionResult.stderr}`;
2231
2232
  }
2232
2233
  }
2233
2234
  async function executeReplay(params) {
2234
- const { testScript, localUrl } = params;
2235
+ const { testScript, actionScript, localUrl } = params;
2235
2236
  const timeoutMs = params.timeoutMs ?? 18e4;
2236
2237
  const userId = getAuthenticatedUserId();
2237
2238
  const authContent = buildStudioAuthContent();
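The `params.timeoutMs ?? 18e4` context line in this hunk determines how long a replay waits when the caller passes no timeout. The sketch below isolates that expression to show its behavior; `resolveTimeoutMs` is a hypothetical helper name used only for illustration and does not appear in the bundle.

```javascript
// Hypothetical helper mirroring the `params.timeoutMs ?? 18e4` expression in executeReplay.
function resolveTimeoutMs(params) {
  // `??` falls back only when timeoutMs is null or undefined.
  return params.timeoutMs ?? 18e4;
}

console.log(resolveTimeoutMs({}));                    // 180000
console.log(resolveTimeoutMs({ timeoutMs: 600000 })); // 600000
```

Because the fallback uses `??` rather than `||`, an explicitly passed value is always kept, which is what lets a caller raise the limit for long runs.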
@@ -2256,15 +2257,16 @@ async function executeReplay(params) {
2256
2257
  try {
2257
2258
  const runId = runResult.id;
2258
2259
  const startedAt = Date.now();
2259
- const actionScript = buildReplayActionScript({
2260
+ const builtActionScript = buildReplayActionScript({
2260
2261
  testScript,
2262
+ actionScript,
2261
2263
  localUrl,
2262
2264
  runId,
2263
2265
  ownerUserId: authContent.userId
2264
2266
  });
2265
2267
  const inputFilePath = await writeTempFile({
2266
2268
  filename: `${runId}_input.json`,
2267
- data: actionScript
2269
+ data: builtActionScript
2268
2270
  });
2269
2271
  const authFilePath = await writeTempFile({
2270
2272
  filename: `${runId}_auth.json`,
@@ -2968,7 +2970,8 @@ var TestCaseCreateInputSchema = z.object({
2968
2970
  automated: z.boolean().optional().describe("Whether this test case is automated (default: true)")
2969
2971
  });
2970
2972
  var TestScriptListInputSchema = z.object({
2971
- projectId: IdSchema.describe("Project ID to list test scripts for")
2973
+ projectId: IdSchema.describe("Project ID to list test scripts for"),
2974
+ testCaseId: IdSchema.optional().describe("Optional test case ID to filter scripts by")
2972
2975
  }).merge(PaginationInputSchema);
2973
2976
  var TestScriptGetInputSchema = z.object({
2974
2977
  testScriptId: IdSchema.describe("Test script ID to retrieve")
@@ -2976,6 +2979,9 @@ var TestScriptGetInputSchema = z.object({
2976
2979
  var TestScriptListPaginatedInputSchema = z.object({
2977
2980
  projectId: IdSchema.describe("Project ID to list test scripts for")
2978
2981
  }).merge(PaginationInputSchema);
2982
+ var ActionScriptGetInputSchema = z.object({
2983
+ actionScriptId: IdSchema.describe("Action script ID to retrieve")
2984
+ });
2979
2985
  var WorkflowStartWebsiteScanInputSchema = z.object({
2980
2986
  projectId: IdSchema.describe("Project ID to scan"),
2981
2987
  url: z.string().url().describe("Website URL to scan"),
@@ -3147,7 +3153,8 @@ var GatewayError = class extends Error {
3147
3153
  var ALLOWED_UPSTREAM_PREFIXES = [
3148
3154
  "/v1/protected/muggle-test/",
3149
3155
  "/v1/protected/wallet/",
3150
- "/v1/protected/api-keys"
3156
+ "/v1/protected/api-keys",
3157
+ "/v1/protected/actionScript/"
3151
3158
  ];
3152
3159
  var PromptServiceClient = class {
3153
3160
  httpClient;
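The new `/v1/protected/actionScript/` entry only matters if upstream paths are checked against this list. The diff does not show that check, so the following is a minimal sketch assuming a simple prefix match; `isAllowedUpstreamPath` is a hypothetical name, and the real validation may differ.

```javascript
// Prefix list copied from the 4.2.2 side of the hunk above.
const ALLOWED_UPSTREAM_PREFIXES = [
  "/v1/protected/muggle-test/",
  "/v1/protected/wallet/",
  "/v1/protected/api-keys",
  "/v1/protected/actionScript/"
];

// Hypothetical check: the bundle's actual validation logic is not part of this diff.
function isAllowedUpstreamPath(path) {
  return ALLOWED_UPSTREAM_PREFIXES.some((prefix) => path.startsWith(prefix));
}

// "abc123" is a placeholder ID, not a real action script.
console.log(isAllowedUpstreamPath("/v1/protected/actionScript/abc123"));
```

Under this assumption, a request for an action script would have been rejected in 4.2.1 and is accepted in 4.2.2.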
@@ -3636,14 +3643,14 @@ var testCaseTools = [
3636
3643
  var testScriptTools = [
3637
3644
  {
3638
3645
  name: "muggle-remote-test-script-list",
3639
- description: "List test scripts for a project.",
3646
+ description: "List test scripts for a project, optionally filtered by test case.",
3640
3647
  inputSchema: TestScriptListInputSchema,
3641
3648
  mapToUpstream: (input) => {
3642
3649
  const data = input;
3643
3650
  return {
3644
3651
  method: "GET",
3645
3652
  path: `${MUGGLE_TEST_PREFIX}/test-scripts`,
3646
- queryParams: { projectId: data.projectId, page: data.page, pageSize: data.pageSize }
3653
+ queryParams: { projectId: data.projectId, testCaseId: data.testCaseId, page: data.page, pageSize: data.pageSize }
3647
3654
  };
3648
3655
  }
3649
3656
  },
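To make the effect of the added `testCaseId` query parameter concrete, here is a small sketch of the 4.2.2 mapping as a standalone function. The value of `MUGGLE_TEST_PREFIX` is an assumption (the diff shows only the constant's name), and `mapTestScriptList` is a hypothetical name for what the bundle defines inline as `mapToUpstream`.

```javascript
// Assumed value: the diff shows only the constant's name, not its definition.
const MUGGLE_TEST_PREFIX = "/v1/protected/muggle-test";

// Same shape as the 4.2.2 mapToUpstream for muggle-remote-test-script-list.
function mapTestScriptList(input) {
  return {
    method: "GET",
    path: `${MUGGLE_TEST_PREFIX}/test-scripts`,
    queryParams: {
      projectId: input.projectId,
      testCaseId: input.testCaseId, // undefined when the caller omits the filter
      page: input.page,
      pageSize: input.pageSize
    }
  };
}

// Placeholder IDs for illustration only.
console.log(mapTestScriptList({ projectId: "project-1", testCaseId: "case-1", page: 1, pageSize: 20 }));
```

How the HTTP client serializes an `undefined` query parameter is not visible in this diff, so the unfiltered case is left unspecified here.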
@@ -3673,6 +3680,20 @@ var testScriptTools = [
3673
3680
  }
3674
3681
  }
3675
3682
  ];
3683
+ var actionScriptTools = [
3684
+ {
3685
+ name: "muggle-remote-action-script-get",
3686
+ description: "Get the full action script content by ID. Use actionScriptId from a test script to fetch the complete script with all steps and element labels needed for replay.",
3687
+ inputSchema: ActionScriptGetInputSchema,
3688
+ mapToUpstream: (input) => {
3689
+ const data = input;
3690
+ return {
3691
+ method: "GET",
3692
+ path: `/v1/protected/actionScript/${data.actionScriptId}`
3693
+ };
3694
+ }
3695
+ }
3696
+ ];
3676
3697
  var workflowTools = [
3677
3698
  {
3678
3699
  name: "muggle-remote-workflow-start-website-scan",
@@ -4641,6 +4662,7 @@ var allQaToolDefinitions = [
4641
4662
  ...useCaseTools,
4642
4663
  ...testCaseTools,
4643
4664
  ...testScriptTools,
4665
+ ...actionScriptTools,
4644
4666
  ...workflowTools,
4645
4667
  ...reportTools,
4646
4668
  ...secretTools,
@@ -4828,8 +4850,8 @@ var TestScriptDetailsSchema = z.object({
4828
4850
  name: z.string().min(1).describe("Test script name"),
4829
4851
  /** Cloud test case ID this script belongs to. */
4830
4852
  testCaseId: z.string().min(1).describe("Cloud test case ID this script was generated from"),
4831
- /** Action script steps. */
4832
- actionScript: z.array(z.unknown()).describe("Action script steps to replay"),
4853
+ /** Action script ID reference (use muggle-remote-action-script-get to fetch content). */
4854
+ actionScriptId: z.string().min(1).describe("Action script ID - use muggle-remote-action-script-get to fetch the full script"),
4833
4855
  /** Original cloud URL (for reference, replaced by localUrl). */
4834
4856
  url: z.string().url().optional().describe("Original cloud URL (replaced by localUrl during execution)"),
4835
4857
  /** Cloud project ID (required for electron workflow context). */
@@ -4850,8 +4872,10 @@ var ExecuteTestGenerationInputSchema = z.object({
4850
4872
  showUi: z.boolean().optional().describe("Show the electron-app UI during execution (default: false, runs headless)")
4851
4873
  });
4852
4874
  var ExecuteReplayInputSchema = z.object({
4853
- /** Test script details from qa_test_script_get. */
4854
- testScript: TestScriptDetailsSchema.describe("Test script details obtained from qa_test_script_get"),
4875
+ /** Test script metadata from muggle-remote-test-script-get. */
4876
+ testScript: TestScriptDetailsSchema.describe("Test script metadata from muggle-remote-test-script-get"),
4877
+ /** Action script content from muggle-remote-action-script-get (using testScript.actionScriptId). */
4878
+ actionScript: z.array(z.unknown()).describe("Action script steps from muggle-remote-action-script-get"),
4855
4879
  /** Local URL to test against. */
4856
4880
  localUrl: z.string().url().describe("Local URL to test against (e.g., http://localhost:3000)"),
4857
4881
  /** Explicit approval to launch electron-app. */
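With this schema change, a replay caller supplies two inputs that previously arrived together: the test script metadata and the action script steps. The sketch below shows one way a client might assemble the replay arguments under that split. `buildReplayInput` and its `userApproved` flag are hypothetical and are not part of the package; only the field names `testScript`, `actionScript`, `localUrl`, and `approveElectronAppLaunch` come from the diff.

```javascript
// Hypothetical helper: assembles arguments for muggle-local-execute-replay from
// the two remote responses, passing the action script through unmodified.
function buildReplayInput({ testScript, actionScript, localUrl, userApproved }) {
  if (!testScript || typeof testScript.actionScriptId !== "string" || testScript.actionScriptId.length === 0) {
    throw new Error("testScript.actionScriptId is required");
  }
  if (!Array.isArray(actionScript)) {
    throw new Error("actionScript must be an array of steps");
  }
  if (userApproved !== true) {
    throw new Error("explicit user approval is required before launching Electron");
  }
  return {
    testScript,
    actionScript, // passed through as-is, never rebuilt or shortened
    localUrl,
    approveElectronAppLaunch: true
  };
}
```

A caller would first fetch the metadata with `muggle-remote-test-script-get`, then fetch the steps with `muggle-remote-action-script-get` using `testScript.actionScriptId`, and pass both results into a helper like this one.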
@@ -5154,7 +5178,7 @@ var executeTestGenerationTool = {
5154
5178
  };
5155
5179
  var executeReplayTool = {
5156
5180
  name: "muggle-local-execute-replay",
5157
- description: "Replay an existing QA test script in a real browser to verify your app still works correctly \u2014 use this for regression testing after code changes. The browser executes each saved step and captures screenshots so you can see what happened. Requires a test script (from muggle-remote-test-script-get) and a localhost URL. Launches an Electron browser \u2014 requires explicit approval via approveElectronAppLaunch. Runs headless by default; set showUi: true to watch.",
5181
+ description: "Replay an existing QA test script in a real browser to verify your app still works correctly \u2014 use this for regression testing after code changes. The browser executes each saved step and captures screenshots so you can see what happened. Requires: (1) test script metadata from muggle-remote-test-script-get, (2) actionScript content from muggle-remote-action-script-get using the testScript.actionScriptId, and (3) a localhost URL. Launches an Electron browser \u2014 requires explicit approval via approveElectronAppLaunch. Runs headless by default; set showUi: true to watch.",
5158
5182
  inputSchema: ExecuteReplayInputSchema,
5159
5183
  execute: async (ctx) => {
5160
5184
  const logger14 = createChildLogger2(ctx.correlationId);
@@ -5171,7 +5195,7 @@ var executeReplayTool = {
5171
5195
  "",
5172
5196
  `**Test Script:** ${input.testScript.name}`,
5173
5197
  `**Local URL:** ${input.localUrl}`,
5174
- `**Steps:** ${input.testScript.actionScript.length}`,
5198
+ `**Steps:** ${input.actionScript.length}`,
5175
5199
  `**UI Mode:** ${uiMode}`,
5176
5200
  "",
5177
5201
  "**Note:** The electron-app will open a browser window and execute the test steps."
@@ -5183,6 +5207,7 @@ var executeReplayTool = {
5183
5207
  try {
5184
5208
  const result = await executeReplay({
5185
5209
  testScript: input.testScript,
5210
+ actionScript: input.actionScript,
5186
5211
  localUrl: input.localUrl,
5187
5212
  timeoutMs: input.timeoutMs,
5188
5213
  showUi: input.showUi
@@ -5225,7 +5250,7 @@ var cancelExecutionTool = {
5225
5250
  };
5226
5251
  var publishTestScriptTool = {
5227
5252
  name: "muggle-local-publish-test-script",
5228
- description: "Publish a locally generated test script to the cloud. Uses the run ID from muggle_execute_test_generation to find the script and uploads it to the specified cloud test case.",
5253
+ description: "Publish a locally generated test script to the cloud. Uses the run ID from muggle_execute_test_generation to find the script and uploads it to the specified cloud test case. Returns a viewUrl that can be opened in the user's browser to view the published test script on the dashboard.",
5229
5254
  inputSchema: PublishTestScriptInputSchema,
5230
5255
  execute: async (ctx) => {
5231
5256
  const logger14 = createChildLogger2(ctx.correlationId);
@@ -6923,7 +6948,7 @@ function getBinaryName2() {
6923
6948
  }
6924
6949
  }
6925
6950
  function extractVersionFromTag(tag) {
6926
- const match = tag.match(/^electron-app-v(\d+\.\d+\.\d+)$/);
6951
+ const match = tag.match(/^(?:electron-app-)?v(\d+\.\d+\.\d+)$/);
6927
6952
  return match ? match[1] : null;
6928
6953
  }
6929
6954
  function getVersionOverridePath() {
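The regex change in `extractVersionFromTag` is small but alters which tags are recognized. The snippet below copies the 4.2.2 function body from the hunk above and runs it on a few sample tags; the tag strings are illustrative inputs, not real releases.

```javascript
// Function body as it appears on the 4.2.2 side of the diff.
function extractVersionFromTag(tag) {
  const match = tag.match(/^(?:electron-app-)?v(\d+\.\d+\.\d+)$/);
  return match ? match[1] : null;
}

console.log(extractVersionFromTag("electron-app-v1.2.3")); // "1.2.3"
console.log(extractVersionFromTag("v1.2.3"));              // "1.2.3" (returned null with the 4.2.1 pattern)
console.log(extractVersionFromTag("1.2.3"));               // null
```

The optional non-capturing group means both tag forms named in the README's release tag strategy now yield a version string.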
package/dist/cli.js CHANGED
@@ -1,5 +1,5 @@
1
1
  #!/usr/bin/env node
2
- import { runCli } from './chunk-CXTJOYWM.js';
2
+ import { runCli } from './chunk-BZJXQZ5Q.js';
3
3
 
4
4
  // src/cli/main.ts
5
5
  runCli().catch((error) => {
package/dist/index.js CHANGED
@@ -1 +1 @@
1
- export { src_exports2 as commands, createChildLogger, createUnifiedMcpServer, getConfig, getLocalQaTools, getLogger, getQaTools, local_exports as localQa, mcp_exports as mcp, qa_exports as qa, server_exports as server, src_exports as shared } from './chunk-CXTJOYWM.js';
1
+ export { src_exports2 as commands, createChildLogger, createUnifiedMcpServer, getConfig, getLocalQaTools, getLogger, getQaTools, local_exports as localQa, mcp_exports as mcp, qa_exports as qa, server_exports as server, src_exports as shared } from './chunk-BZJXQZ5Q.js';
@@ -5,118 +5,115 @@ description: Run a real-browser QA test against localhost to verify a feature wo
5
5
 
6
6
  # Muggle Test Feature Local
7
7
 
8
- Run end-to-end feature testing from UI against a local URL:
8
+ **Goal:** Run or generate an end-to-end test against a **local URL** using Muggle’s Electron browser.
9
9
 
10
- - Cloud management: `muggle-remote-*`
11
- - Local execution and artifacts: `muggle-local-*`
10
+ | Scope | MCP tools |
11
+ | :---- | :-------- |
12
+ | Cloud (projects, cases, scripts, auth) | `muggle-remote-*` |
13
+ | Local (Electron run, publish, results) | `muggle-local-*` |
14
+ | Create new entities (preview / create) | `muggle-remote-project-create`, `muggle-remote-use-case-prompt-preview`, `muggle-remote-use-case-create-from-prompts`, `muggle-remote-test-case-generate-from-prompt`, `muggle-remote-test-case-create` |
15
+
16
+ The local URL only changes where the browser opens; it does not change the remote project or test definitions.
12
17
 
13
18
  ## Workflow
14
19
 
15
20
  ### 1. Auth
16
21
 
17
22
  - `muggle-remote-auth-status`
18
- - If needed: `muggle-remote-auth-login` + `muggle-remote-auth-poll`
23
+ - If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
24
+ Do not skip or assume auth.
25
+
26
+ ### 2. Targets (user must confirm)
19
27
 
20
- ### 2. Select project, use case, and test case
28
+ Ask the user to pick **project**, **use case**, and **test case** (do not infer).
21
29
 
22
- - Explicitly ask user to select each target to proceed.
23
30
  - `muggle-remote-project-list`
24
- - `muggle-remote-use-case-list`
25
- - `muggle-remote-test-case-list-by-use-case`
31
+ - `muggle-remote-use-case-list` (with `projectId`)
32
+ - `muggle-remote-test-case-list-by-use-case` (with `useCaseId`)
33
+
34
+ **Selection UI (mandatory):** After each list call, present choices as a **numbered list** (`1.` … `n.`). Keep each line minimal: number, short title, UUID. Ask the user to **reply with the number** or the UUID.
35
+
36
+ **Fixed tail of each pick list (project, use case, test case):** After the relevance-ranked rows, end with the options below. **Create new …** is never omitted; **Show full list** is omitted when it would be pointless (see empty list).
37
+
38
+ 1. **Show full list** — user sees every row from the API (then re-number the full list including the tails below again). **Skip this option** if the API returned **zero** rows for that step (e.g. no test cases yet for the chosen use case). There is nothing to expand.
39
+ 2. **Create new …** — user creates a new entity instead of picking an existing one. Label per step: **Create new project**, **Create new use case**, or **Create new test case**.
40
+
41
+ **Relevance-first filtering (mandatory for project, use case, and test case lists):**
26
42
 
27
- ### 3. Resolve local URL
43
+ - Do **not** dump the full list by default.
44
+ - Rank items by semantic relevance to the user’s stated goal (title first, then description / user story / acceptance criteria).
45
+ - Show only the **top 3–5** most relevant options, then **Show full list** (unless the API list is empty — see above), then **Create new …** as above.
46
+ - If the user picks **Show full list**, then present the complete numbered list (still ending with **Create new …**; include **Show full list** again only when the full list has at least one row).
28
47
 
29
- - Use the URL provided by the user.
30
- - If missing, ask explicitly (do not guess).
31
- - Inform user the local URL does not affect the project's remote test.
48
+ **Tools and flow for Create new … (use these MCP tools; preview before persisting):**
32
49
 
33
- ### 4. Check for existing scripts and ask user to choose
50
+ - **Project — Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
51
+ - **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
52
+ 1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation.
53
+ 2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
54
+ - **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
55
+ 1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation.
56
+ 2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **§4** with that `testCaseId`.
34
57
 
35
- Check BOTH cloud and local scripts to determine what's available:
58
+ ### 3. Local URL
36
59
 
37
- 1. **Check cloud scripts:** `muggle-remote-test-script-list` filtered by projectId
38
- 2. **Check local scripts:** `muggle-local-test-script-list` filtered by projectId
60
+ - Use the URL the user gives. If none, ask; **do not guess**.
61
+ - Remind them: local URL is only the execution target, not tied to cloud project config.
39
62
 
40
- **Decision logic:**
63
+ ### 4. Existing scripts vs new generation
41
64
 
42
- | Cloud Script | Local Script (status: published/generated) | Action |
43
- |--------------|---------------------------------------------|--------|
44
- | Exists + ACTIVE | Exists | Ask user: "Replay existing script" or "Regenerate from scratch"? |
45
- | Exists + ACTIVE | Not found | Sync from cloud first, then ask user |
46
- | Not found | Exists | Ask user: "Replay local script" or "Regenerate"? |
47
- | Not found | Not found | Default to generation (no need to ask) |
65
+ `muggle-remote-test-script-list` with `testCaseId`.
48
66
 
49
- **When asking user, show:**
50
- - Script name and ID
51
- - When it was created/updated
52
- - Number of steps
53
- - Last run status if available
67
+ - **If any replayable/succeeded scripts exist:** list them in a **numbered** list and ask: replay one **or** generate new.
68
+ Show: name, id, created/updated, step count. Include **`Generate new script`** as the **last** numbered option (e.g. last number) so it is selectable by number too.
69
+ - **If none:** go straight to generation (no need to ask replay vs generate).
54
70
 
55
- ### 5. Prepare for execution
71
+ ### 5. Load data for the chosen path
56
72
 
57
- **For Replay:**
73
+ **Generate**
58
74
 
59
- Local scripts contain the complete `actionScript` with element labels required for replay. Remote scripts only contain metadata.
75
+ 1. `muggle-remote-test-case-get`
76
+ 2. `muggle-local-execute-test-generation` (after approval in step 6) with that test case + `localUrl` + `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
60
77
 
61
- 1. Use `muggle-local-test-script-get` with `testScriptId` to fetch the FULL script including actionScript
62
- 2. The returned script includes all steps with `operation.label` paths needed for element location
63
- 3. Pass this complete script to `muggle-local-execute-replay`
78
+ **Replay**
64
79
 
65
- **IMPORTANT:** Do NOT manually construct or simplify the actionScript. The electron app requires the complete script with all `label` paths intact to locate page elements during replay.
80
+ 1. `muggle-remote-test-script-get` → note `actionScriptId`
81
+ 2. `muggle-remote-action-script-get` with that id → full `actionScript`
82
+ **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
83
+ 3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
66
84
 
67
- **For Generation:**
85
+ ### Local execution timeout (`timeoutMs`)
68
86
 
69
- 1. `muggle-remote-test-case-get` to fetch test case details
70
- 2. `muggle-local-execute-test-generation` with the test case
87
+ The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggle-local-execute-test-generation` and `muggle-local-execute-replay`. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
71
88
 
72
- ### 6. Approval requirement
89
+ - **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
90
+ - If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
91
+ - **Test case design:** Preconditions like “a test run has already completed” on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
73
92
 
74
- - Before execution, get explicit user approval for launching Electron app.
75
- - Show what will be executed (replay vs generation, test case name, URL).
76
- - Only then set `approveElectronAppLaunch: true`.
93
+ ### Interpreting `failed` / non-zero Electron exit
77
94
 
78
- ### 7. Execute
95
+ - **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
96
+ - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying “view script after a successful run” when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
79
97
 
80
- **Replay:**
81
- ```
82
- muggle-local-execute-replay with:
83
- - testScript: (full script from muggle-local-test-script-get)
84
- - localUrl: user-provided localhost URL
85
- - approveElectronAppLaunch: true
86
- - showUi: true (optional, lets user watch)
87
- ```
98
+ ### 6. Approval before any local execution
88
99
 
89
- **Generation:**
90
- ```
91
- muggle-local-execute-test-generation with:
92
- - testCase: (from muggle-remote-test-case-get)
93
- - localUrl: user-provided localhost URL
94
- - approveElectronAppLaunch: true
95
- - showUi: true (optional)
96
- ```
100
+ Get **explicit** OK to launch Electron. State: replay vs generation, test case name, URL.
101
+ Only then call local execute tools with `approveElectronAppLaunch: true`.
97
102
 
98
- ### 8. Publish generation results (generation only)
103
+ ### 7. After successful generation only
99
104
 
100
- - Use `muggle-local-publish-test-script` after successful generation.
101
- - This uploads the script to cloud so it can be replayed later.
102
- - Return the remote URL for user to view the result.
105
+ - `muggle-local-publish-test-script`
106
+ - Open returned `viewUrl` for the user (`open "<viewUrl>"` on macOS or OS equivalent).
103
107
 
104
- ### 9. Report results
108
+ ### 8. Report
105
109
 
106
- - `muggle-local-run-result-get` with returned runId.
107
- - Report:
108
- - status (passed/failed)
109
- - duration
110
- - pass/fail summary
111
- - steps summary (which steps passed/failed)
112
- - artifacts path (screenshots location)
113
- - script detail view URL
110
+ - `muggle-local-run-result-get` with the run id from execute.
111
+ - Include: status, duration, pass/fail summary, per-step summary, artifact/screenshot paths, errors if failed, and script view URL when publishing ran.
114
112
 
115
- ## Guardrails
113
+ ## Non-negotiables
116
114
 
117
- - Do not silently skip auth.
118
- - Do not silently skip asking user when a replayable script exists.
119
- - Do not launch Electron without explicit approval.
120
- - Do not hide failing run details; include error and artifacts path.
121
- - Do not simplify or reconstruct actionScript for replay; use the complete script from `muggle-local-test-script-get`.
122
- - Always check local scripts before defaulting to generation.
115
+ - No silent auth skip; no launching Electron without approval.
116
+ - If replayable scripts exist, do not default to generation without user choice.
117
+ - No hiding failures: surface errors and artifact paths.
118
+ - Replay: never use a hand-built or simplified `actionScript`; use only the one returned by `muggle-remote-action-script-get`.
119
+ - Project, use case, and test case selection lists must always include **Create new …**. Include **Show full list** whenever the API returned at least one row for that step; **omit Show full list** when the list is empty (offer **Create new …** only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
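The pick-list rules in the skill file above (top-ranked rows, then **Show full list** unless the API list is empty, then **Create new …** always) can be sketched as a small function. This is only an illustration of the stated rules: `buildPickList`, its parameters, and the item shape (`title`, `id`) are hypothetical, and the items are assumed to arrive already ranked by relevance.

```javascript
// Hypothetical helper: builds the numbered pick list described in the skill file.
// `rankedItems` is assumed to be sorted by relevance; ranking itself is out of scope.
function buildPickList(rankedItems, entityLabel, { showAll = false, topN = 5 } = {}) {
  const rows = showAll ? rankedItems : rankedItems.slice(0, topN);
  const options = rows.map((item) => `${item.title} (${item.id})`);
  // "Show full list" is offered only when the API returned at least one row.
  if (rankedItems.length > 0) {
    options.push("Show full list");
  }
  // "Create new ..." is never omitted.
  options.push(`Create new ${entityLabel}`);
  return options.map((text, index) => `${index + 1}. ${text}`);
}

// Empty list: only the create option remains.
console.log(buildPickList([], "test case")); // [ '1. Create new test case' ]
```

Numbering the tail options along with the rows keeps every choice selectable by number, as the selection rules require.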
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "@muggleai/works",
3
3
  "mcpName": "io.github.multiplex-ai/muggle",
4
- "version": "4.2.1",
4
+ "version": "4.2.2",
5
5
  "description": "Ship quality products with AI-powered QA that validates your app's user experience — from Claude Code and Cursor to PR.",
6
6
  "type": "module",
7
7
  "main": "dist/index.js",
@@ -5,118 +5,115 @@ description: Run a real-browser QA test against localhost to verify a feature wo
5
5
 
6
6
  # Muggle Test Feature Local
7
7
 
8
- Run end-to-end feature testing from UI against a local URL:
8
+ **Goal:** Run or generate an end-to-end test against a **local URL** using Muggle’s Electron browser.
9
9
 
10
- - Cloud management: `muggle-remote-*`
11
- - Local execution and artifacts: `muggle-local-*`
10
+ | Scope | MCP tools |
11
+ | :---- | :-------- |
12
+ | Cloud (projects, cases, scripts, auth) | `muggle-remote-*` |
13
+ | Local (Electron run, publish, results) | `muggle-local-*` |
14
+ | Create new entities (preview / create) | `muggle-remote-project-create`, `muggle-remote-use-case-prompt-preview`, `muggle-remote-use-case-create-from-prompts`, `muggle-remote-test-case-generate-from-prompt`, `muggle-remote-test-case-create` |
15
+
16
+ The local URL only changes where the browser opens; it does not change the remote project or test definitions.
12
17
 
13
18
  ## Workflow
14
19
 
15
20
  ### 1. Auth
16
21
 
17
22
  - `muggle-remote-auth-status`
18
- - If needed: `muggle-remote-auth-login` + `muggle-remote-auth-poll`
23
+ - If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
24
+ Do not skip or assume auth.
25
+
26
+ ### 2. Targets (user must confirm)
19
27
 
20
- ### 2. Select project, use case, and test case
28
+ Ask the user to pick **project**, **use case**, and **test case** (do not infer).
21
29
 
22
- - Explicitly ask user to select each target to proceed.
23
30
  - `muggle-remote-project-list`
24
- - `muggle-remote-use-case-list`
25
- - `muggle-remote-test-case-list-by-use-case`
31
+ - `muggle-remote-use-case-list` (with `projectId`)
32
+ - `muggle-remote-test-case-list-by-use-case` (with `useCaseId`)
33
+
34
+ **Selection UI (mandatory):** After each list call, present choices as a **numbered list** (`1.` … `n.`). Keep each line minimal: number, short title, UUID. Ask the user to **reply with the number** or the UUID.
35
+
36
+ **Fixed tail of each pick list (project, use case, test case):** After the relevance-ranked rows, end with the options below. **Create new …** is never omitted; **Show full list** is omitted when it would be pointless (see empty list).
37
+
38
+ 1. **Show full list** — user sees every row from the API (then re-number the full list including the tails below again). **Skip this option** if the API returned **zero** rows for that step (e.g. no test cases yet for the chosen use case). There is nothing to expand.
39
+ 2. **Create new …** — user creates a new entity instead of picking an existing one. Label per step: **Create new project**, **Create new use case**, or **Create new test case**.
40
+
41
+ **Relevance-first filtering (mandatory for project, use case, and test case lists):**
26
42
 
27
- ### 3. Resolve local URL
43
+ - Do **not** dump the full list by default.
44
+ - Rank items by semantic relevance to the user’s stated goal (title first, then description / user story / acceptance criteria).
45
+ - Show only the **top 3–5** most relevant options, then **Show full list** (unless the API list is empty — see above), then **Create new …** as above.
46
+ - If the user picks **Show full list**, then present the complete numbered list (still ending with **Create new …**; include **Show full list** again only when the full list has at least one row).
28
47
 
29
- - Use the URL provided by the user.
30
- - If missing, ask explicitly (do not guess).
31
- - Inform user the local URL does not affect the project's remote test.
48
+ **Create new tools and flow (use these MCP tools; preview before persist):**
32
49
 
33
- ### 4. Check for existing scripts and ask user to choose
50
+ - **Project - Create new project:** Collect `projectName`, `description`, and `url` (may be the local app URL, e.g. `http://localhost:3999`). Call `muggle-remote-project-create`. Use the returned `projectId` and continue.
51
+ - **Use case — Create new use case:** User provides a natural-language instruction (or you reuse their testing goal).
52
+ 1. `muggle-remote-use-case-prompt-preview` with `projectId`, `instruction` — show preview; get confirmation.
53
+ 2. `muggle-remote-use-case-create-from-prompts` with `projectId`, `prompts: [{ instruction }]` — persist. Use the created use case id and continue to test-case selection.
54
+ - **Test case — Create new test case** (requires a chosen `useCaseId`): User provides an instruction describing what to test.
55
+ 1. `muggle-remote-test-case-generate-from-prompt` with `projectId`, `useCaseId`, `instruction` — **preview only** (server test-case prompt preview); show the returned draft(s); get confirmation.
56
+ 2. Persist the accepted draft with `muggle-remote-test-case-create`, mapping preview fields into the required properties (`title`, `description`, `goal`, `expectedResult`, `url`, etc.). Then continue from **§4** with that `testCaseId`.
34
57
 
35
- Check BOTH cloud and local scripts to determine what's available:
58
+ ### 3. Local URL
36
59
 
37
- 1. **Check cloud scripts:** `muggle-remote-test-script-list` filtered by projectId
38
- 2. **Check local scripts:** `muggle-local-test-script-list` filtered by projectId
60
+ - Use the URL the user gives. If none, ask; **do not guess**.
61
+ - Remind them: local URL is only the execution target, not tied to cloud project config.
39
62
 
40
- **Decision logic:**
63
+ ### 4. Existing scripts vs new generation
41
64
 
42
- | Cloud Script | Local Script (status: published/generated) | Action |
43
- |--------------|---------------------------------------------|--------|
44
- | Exists + ACTIVE | Exists | Ask user: "Replay existing script" or "Regenerate from scratch"? |
45
- | Exists + ACTIVE | Not found | Sync from cloud first, then ask user |
46
- | Not found | Exists | Ask user: "Replay local script" or "Regenerate"? |
47
- | Not found | Not found | Default to generation (no need to ask) |
65
+ `muggle-remote-test-script-list` with `testCaseId`.
48
66
 
49
- **When asking user, show:**
50
- - Script name and ID
51
- - When it was created/updated
52
- - Number of steps
53
- - Last run status if available
67
+ - **If any replayable/succeeded scripts exist:** list them in a **numbered** list and ask: replay one **or** generate new.
68
+ Show: name, id, created/updated, step count. Include **`Generate new script`** as the **last** numbered option (e.g. last number) so it is selectable by number too.
69
+ - **If none:** go straight to generation (no need to ask replay vs generate).
54
70
 
55
- ### 5. Prepare for execution
71
+ ### 5. Load data for the chosen path
56
72
 
57
- **For Replay:**
73
+ **Generate**
58
74
 
59
- Local scripts contain the complete `actionScript` with element labels required for replay. Remote scripts only contain metadata.
75
+ 1. `muggle-remote-test-case-get`
76
+ 2. `muggle-local-execute-test-generation` (after approval in step 6) with that test case + `localUrl` + `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
60
77
 
61
- 1. Use `muggle-local-test-script-get` with `testScriptId` to fetch the FULL script including actionScript
62
- 2. The returned script includes all steps with `operation.label` paths needed for element location
63
- 3. Pass this complete script to `muggle-local-execute-replay`
78
+ **Replay**
64
79
 
65
- **IMPORTANT:** Do NOT manually construct or simplify the actionScript. The electron app requires the complete script with all `label` paths intact to locate page elements during replay.
80
+ 1. `muggle-remote-test-script-get` → note `actionScriptId`
81
+ 2. `muggle-remote-action-script-get` with that id → full `actionScript`
82
+ **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
83
+ 3. `muggle-local-execute-replay` (after approval in step 6) with `testScript`, `actionScript`, `localUrl`, `approveElectronAppLaunch: true` (optional: `showUi: true`, **`timeoutMs`** — see below)
66
84
 
67
- **For Generation:**
85
+ ### Local execution timeout (`timeoutMs`)
68
86
 
69
- 1. `muggle-remote-test-case-get` to fetch test case details
70
- 2. `muggle-local-execute-test-generation` with the test case
87
+ The MCP client often uses a **default wait of 300000 ms (5 minutes)** for `muggle-local-execute-test-generation` and `muggle-local-execute-replay`. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
71
88
 
72
- ### 6. Approval requirement
89
+ - **Always pass `timeoutMs`** for flows that may be long — for example **`600000` (10 min)** or **`900000` (15 min)** — unless the user explicitly wants a short cap.
90
+ - If the tool reports **`Electron execution timed out after 300000ms`** (or similar) **but** Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry** (after user re-approves if your policy requires it).
91
+ - **Test case design:** Preconditions like “a test run has already completed” on an **empty account** can force many steps (sign-up, new project, crawl). Prefer an account/project that **already has** the needed state, or narrow the test goal so generation does not try to create a full project from scratch unless that is intentional.
73
92
 
74
- - Before execution, get explicit user approval for launching Electron app.
75
- - Show what will be executed (replay vs generation, test case name, URL).
76
- - Only then set `approveElectronAppLaunch: true`.
93
+ ### Interpreting `failed` / non-zero Electron exit
77
94
 
78
- ### 7. Execute
95
+ - **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see **`timeoutMs`** above.
96
+ - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`) — e.g. verifying “view script after a successful run” when **no run or script exists yet** in the UI. Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash. **Fix:** choose a **project that already has** completed runs and scripts, or **change the test case** so preconditions match what localhost can satisfy (e.g. include steps to create and run a test first, or assert only empty-state UI when no runs exist).
79
97
 
80
- **Replay:**
81
- ```
82
- muggle-local-execute-replay with:
83
- - testScript: (full script from muggle-local-test-script-get)
84
- - localUrl: user-provided localhost URL
85
- - approveElectronAppLaunch: true
86
- - showUi: true (optional, lets user watch)
87
- ```
98
+ ### 6. Approval before any local execution
88
99
 
89
- **Generation:**
90
- ```
91
- muggle-local-execute-test-generation with:
92
- - testCase: (from muggle-remote-test-case-get)
93
- - localUrl: user-provided localhost URL
94
- - approveElectronAppLaunch: true
95
- - showUi: true (optional)
96
- ```
100
+ Get **explicit** OK to launch Electron. State: replay vs generation, test case name, URL.
101
+ Only then call local execute tools with `approveElectronAppLaunch: true`.
97
102
 
98
- ### 8. Publish generation results (generation only)
103
+ ### 7. After successful generation only
99
104
 
100
- - Use `muggle-local-publish-test-script` after successful generation.
101
- - This uploads the script to cloud so it can be replayed later.
102
- - Return the remote URL for user to view the result.
105
+ - `muggle-local-publish-test-script`
106
+ - Open returned `viewUrl` for the user (`open "<viewUrl>"` on macOS or OS equivalent).
103
107
 
104
- ### 9. Report results
108
+ ### 8. Report
105
109
 
106
- - `muggle-local-run-result-get` with returned runId.
107
- - Report:
108
- - status (passed/failed)
109
- - duration
110
- - pass/fail summary
111
- - steps summary (which steps passed/failed)
112
- - artifacts path (screenshots location)
113
- - script detail view URL
110
+ - `muggle-local-run-result-get` with the run id from execute.
111
+ - Include: status, duration, pass/fail summary, per-step summary, artifact/screenshot paths, errors if failed, and script view URL when publishing ran.
114
112
 
115
- ## Guardrails
113
+ ## Non-negotiables
116
114
 
117
- - Do not silently skip auth.
118
- - Do not silently skip asking user when a replayable script exists.
119
- - Do not launch Electron without explicit approval.
120
- - Do not hide failing run details; include error and artifacts path.
121
- - Do not simplify or reconstruct actionScript for replay; use the complete script from `muggle-local-test-script-get`.
122
- - Always check local scripts before defaulting to generation.
115
+ - No silent auth skip; no launching Electron without approval.
116
+ - If replayable scripts exist, do not default to generation without user choice.
117
+ - No hiding failures: surface errors and artifact paths.
118
+ - Replay: never use a hand-built or simplified `actionScript`; use only the one returned by `muggle-remote-action-script-get`.
119
+ - Project, use case, and test case selection lists must always include **Create new …**. Include **Show full list** whenever the API returned at least one row for that step; **omit Show full list** when the list is empty (offer **Create new …** only). For creates, use preview tools (`muggle-remote-use-case-prompt-preview`, `muggle-remote-test-case-generate-from-prompt`) before persisting.
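Reviewer note: the new timeout and exit-code guidance amounts to a small triage table. The sketch below shows one way to encode it, assuming the caller has an error message string and an exit code in hand. The function name `triageLocalRunFailure`, its input field names, and the step-up policy (10 minutes first, then 15) are assumptions made for illustration; only the 300000 ms default, the 600000 and 900000 examples, and exit code 26 come from the skill text.

```javascript
// Hypothetical triage helper; input field names are assumptions, not a Muggle API.
const TIMEOUT_PATTERN = /timed out after \d+\s*ms/i;

function triageLocalRunFailure({ errorMessage = "", exitCode = null, timeoutMs = 300000 } = {}) {
  // Orchestration wait was too short: retry with a longer timeoutMs.
  if (TIMEOUT_PATTERN.test(errorMessage)) {
    return {
      kind: "orchestration-timeout",
      // Assumed policy: step up to 10 minutes, then 15 minutes.
      retryTimeoutMs: timeoutMs < 600000 ? 600000 : 900000,
      advice: "Increase timeoutMs and retry after the user re-approves.",
    };
  }

  // Exit code 26 often means the goal was not achievable, not a crash.
  if (exitCode === 26) {
    return {
      kind: "goal-possibly-not-achievable",
      retryTimeoutMs: null,
      advice: "Read the summary from muggle-local-run-result-get before assuming a crash.",
    };
  }

  // Anything else: do not hide it.
  return {
    kind: "unclassified",
    retryTimeoutMs: null,
    advice: "Surface the error and artifact paths to the user.",
  };
}

console.log(triageLocalRunFailure({ errorMessage: "Electron execution timed out after 300000ms" }));
```

The timeout check runs first on purpose: a run that timed out may also report a non-zero exit code, and the skill treats the timeout as the more actionable signal.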
@@ -574,7 +574,7 @@ async function downloadElectronApp() {
574
574
  const checksums = config.checksums || {};
575
575
  const platformKey = getPlatformKey();
576
576
  const expectedChecksum = checksums[platformKey] || "";
577
- const downloadUrl = `${baseUrl}/v${version}/${binaryName}`;
577
+ const downloadUrl = `${baseUrl}/electron-app-v${version}/${binaryName}`;
578
578
 
579
579
  const appDir = getElectronAppDir();
580
580
  const versionDir = join(appDir, version);
@@ -703,7 +703,7 @@ async function downloadElectronApp() {
703
703
  const packageJson = require("../package.json");
704
704
  const config = packageJson.muggleConfig || {};
705
705
  logError(" - Electron app version:", config.electronAppVersion || "unknown");
706
- logError(" - Download URL:", `${config.downloadBaseUrl}/v${config.electronAppVersion}/${getBinaryName()}`);
706
+ logError(" - Download URL:", `${config.downloadBaseUrl}/electron-app-v${config.electronAppVersion}/${getBinaryName()}`);
707
707
  } catch {
708
708
  logError(" - Could not read package.json config");
709
709
  }
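Reviewer note: both hunks above change the same URL template, once for the real download and once for the error log. A minimal sketch of a shared helper that would keep the two in sync follows. The helper name `buildDownloadUrl` and the example values are hypothetical; only the `electron-app-v` path segment comes from this diff.

```javascript
// Hypothetical helper; not present in the package as published.
function buildDownloadUrl(baseUrl, version, binaryName) {
  // 4.2.2 layout: the release directory is "electron-app-v<version>", not "v<version>".
  return `${baseUrl}/electron-app-v${version}/${binaryName}`;
}

// Example values are placeholders, not real release coordinates.
console.log(buildDownloadUrl("https://example.com/releases", "1.0.0", "app.zip"));
// prints "https://example.com/releases/electron-app-v1.0.0/app.zip"
```

Calling one helper from both the download path and the `logError` diagnostic would mean the logged URL cannot drift from the URL actually fetched.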