@muggleai/works 4.2.2 → 4.3.0

Files changed (33)
  1. package/README.md +45 -37
  2. package/dist/{chunk-BZJXQZ5Q.js → chunk-23NOSJFH.js} +247 -172
  3. package/dist/cli.js +1 -1
  4. package/dist/index.js +1 -1
  5. package/dist/plugin/.claude-plugin/plugin.json +4 -4
  6. package/dist/plugin/.cursor-plugin/plugin.json +3 -3
  7. package/dist/plugin/README.md +7 -5
  8. package/dist/plugin/scripts/ensure-electron-app.sh +3 -3
  9. package/dist/plugin/skills/do/e2e-acceptance.md +161 -0
  10. package/dist/plugin/skills/do/open-prs.md +78 -14
  11. package/dist/plugin/skills/muggle/SKILL.md +4 -2
  12. package/dist/plugin/skills/muggle-do/SKILL.md +6 -6
  13. package/dist/plugin/skills/muggle-test/SKILL.md +416 -0
  14. package/dist/plugin/skills/muggle-test-feature-local/SKILL.md +1 -1
  15. package/dist/plugin/skills/muggle-test-import/SKILL.md +276 -0
  16. package/dist/plugin/skills/muggle-upgrade/SKILL.md +1 -1
  17. package/dist/plugin/skills/optimize-descriptions/SKILL.md +8 -8
  18. package/package.json +15 -12
  19. package/plugin/.claude-plugin/plugin.json +4 -4
  20. package/plugin/.cursor-plugin/plugin.json +3 -3
  21. package/plugin/README.md +7 -5
  22. package/plugin/scripts/ensure-electron-app.sh +3 -3
  23. package/plugin/skills/do/e2e-acceptance.md +161 -0
  24. package/plugin/skills/do/open-prs.md +78 -14
  25. package/plugin/skills/muggle/SKILL.md +4 -2
  26. package/plugin/skills/muggle-do/SKILL.md +6 -6
  27. package/plugin/skills/muggle-test/SKILL.md +416 -0
  28. package/plugin/skills/muggle-test-feature-local/SKILL.md +1 -1
  29. package/plugin/skills/muggle-test-import/SKILL.md +276 -0
  30. package/plugin/skills/muggle-upgrade/SKILL.md +1 -1
  31. package/plugin/skills/optimize-descriptions/SKILL.md +8 -8
  32. package/dist/plugin/skills/do/qa.md +0 -89
  33. package/plugin/skills/do/qa.md +0 -89
@@ -0,0 +1,161 @@
1
+ # E2E / acceptance agent
2
+
3
+ You are running **end-to-end (E2E) acceptance** test cases against code changes using Muggle AI's local testing infrastructure. These tests simulate real users in a browser — they are not unit tests.
4
+
5
+ ## Design
6
+
7
+ E2E acceptance testing runs **locally** using the `test-feature-local` approach:
8
+
9
+ | Scope | MCP tools |
10
+ | :---- | :-------- |
11
+ | Cloud (projects, cases, scripts, auth) | `muggle-remote-*` |
12
+ | Local (Electron run, publish, results) | `muggle-local-*` |
13
+
14
+ This guarantees E2E acceptance tests always run — no dependency on cloud replay service availability.
15
+
16
+ ## Input
17
+
18
+ You receive:
19
+ - The Muggle project ID
20
+ - The list of changed repos, files, and a summary of changes
21
+ - The requirements goal
22
+ - `localUrl` per repo (from `muggle-repos.json`) — the locally running dev server URL
23
+
24
+ ## Your Job
25
+
26
+ ### Step 0: Resolve Local URL
27
+
28
+ Read `localUrl` for each repo from the context. If it is not provided, ask the user:
29
+ > "E2E acceptance testing requires a running local server. What URL is the `<repo>` app running on? (e.g. `http://localhost:3000`)"
30
+
31
+ **Do not skip E2E acceptance tests.** Wait for the user to provide the URL before proceeding.
32
+
33
+ ### Step 1: Check Authentication
34
+
35
+ - `muggle-remote-auth-status`
36
+ - If not signed in: `muggle-remote-auth-login` then `muggle-remote-auth-poll`
37
+
38
+ Do not skip or assume auth.
39
+
40
+ ### Step 2: Get Test Cases
41
+
42
+ Use `muggle-remote-test-case-list` with the project ID to fetch all test cases.
43
+
44
+ ### Step 3: Filter Relevant Test Cases
45
+
46
+ Based on the changed files and the requirements goal, determine which test cases are relevant:
47
+ - Test cases whose use cases directly relate to the changed functionality
48
+ - Test cases that cover areas potentially affected by the changes
49
+ - When in doubt, include the test case (better to over-test than miss a regression)
50
+
51
+ ### Step 4: Execute Tests Locally
52
+
53
+ For each relevant test case:
54
+
55
+ 1. Call `muggle-remote-test-script-list` filtered by `testCaseId` to check for an existing script.
56
+
57
+ 2. **If a script exists** (replay path):
58
+ - `muggle-remote-test-script-get` with `testScriptId` → note `actionScriptId`
59
+ - `muggle-remote-action-script-get` with that id → full `actionScript`
60
+ - **Use the API response as-is.** Do not edit, shorten, or rebuild `actionScript`; replay needs full `label` paths for element lookup.
61
+ - `muggle-local-execute-replay` with:
62
+ - `testScript`: the full script object
63
+ - `actionScript`: the full action script object (from `muggle-remote-action-script-get`)
64
+ - `localUrl`: the resolved local URL
65
+ - `approveElectronAppLaunch`: `true` *(pipeline context — user starting `muggle-do` is implicit approval)*
66
+ - `timeoutMs`: `600000` (10 min) or `900000` (15 min) for complex flows
67
+
68
+ 3. **If no script exists** (generation path):
69
+ - `muggle-remote-test-case-get` with `testCaseId` to fetch the full test case object.
70
+ - `muggle-local-execute-test-generation` with:
71
+ - `testCase`: the full test case object
72
+ - `localUrl`: the resolved local URL
73
+ - `approveElectronAppLaunch`: `true`
74
+ - `timeoutMs`: `600000` (10 min) or `900000` (15 min) for complex flows
75
+
76
+ 4. When execution completes, call `muggle-local-run-result-get` with the `runId` returned by the execute call.
77
+
78
+ 5. **Retain per test case:** `testCaseId`, `testScriptId` (if present), `runId`, `status` (passed/failed), `artifactsDir`.
79
+
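The replay-or-generate branch in Step 4 can be sketched as a small planning function. This is a minimal sketch, not the package's implementation: the tool names and argument names are taken from the steps above, while `planExecution`, `ExecutionPlan`, `ExistingScript`, and the object shapes are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: picks the local execution tool for one test case.
// Tool and argument names mirror the steps above; the types are assumptions.
type Json = Record<string, unknown>;

interface ExistingScript {
  testScript: Json;   // full object from muggle-remote-test-script-get
  actionScript: Json; // full object from muggle-remote-action-script-get, unmodified
}

interface ExecutionPlan {
  tool: "muggle-local-execute-replay" | "muggle-local-execute-test-generation";
  args: Json;
}

function planExecution(
  testCase: Json,
  localUrl: string,
  existing: ExistingScript | undefined,
  timeoutMs: number = 600000 // 10 min; pass 900000 for complex flows
): ExecutionPlan {
  const common = { localUrl, approveElectronAppLaunch: true, timeoutMs };
  if (existing !== undefined) {
    // Replay path: pass both objects through exactly as the API returned them.
    return {
      tool: "muggle-local-execute-replay",
      args: {
        testScript: existing.testScript,
        actionScript: existing.actionScript,
        ...common,
      },
    };
  }
  // Generation path: no script yet, so generate one from the full test case.
  return {
    tool: "muggle-local-execute-test-generation",
    args: { testCase, ...common },
  };
}
```

Keeping the shared arguments in one object makes it harder to forget `timeoutMs` on either path.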
80
+ ### Local Execution Timeout (`timeoutMs`)
81
+
82
+ The MCP client often uses a **default wait of 300000 ms (5 minutes)**. **Exploratory script generation** (Auth0 login, dashboards, multi-step wizards, many LLM iterations) routinely **runs longer than 5 minutes** while Electron is still healthy.
83
+
84
+ - **Always pass `timeoutMs`** — `600000` (10 min) or `900000` (15 min) — unless the test case is known to be simple.
85
+ - If the tool reports **`Electron execution timed out after 300000ms`** but Electron logs show the run still progressing (steps, screenshots, LLM calls), treat it as **orchestration timeout**, not an Electron app defect: **increase `timeoutMs` and retry**.
86
+
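One way to apply the timeout guidance is to pick an explicit `timeoutMs` up front and step it up after an orchestration timeout. The sketch below is illustrative only: the two millisecond values come from the text above, while `pickTimeoutMs`, `nextTimeoutMs`, and the policy of stopping at 15 minutes are assumptions.

```typescript
// Values from the text above; the escalation policy itself is an assumption.
const STANDARD_TIMEOUT_MS = 600000; // 10 min
const COMPLEX_TIMEOUT_MS = 900000; // 15 min

// Pick the initial timeoutMs for an execute call.
function pickTimeoutMs(isComplexFlow: boolean): number {
  return isComplexFlow ? COMPLEX_TIMEOUT_MS : STANDARD_TIMEOUT_MS;
}

// After an orchestration timeout, return a larger timeoutMs to retry with,
// or undefined when the 15-minute value has already been tried.
function nextTimeoutMs(previousMs: number): number | undefined {
  if (previousMs < STANDARD_TIMEOUT_MS) return STANDARD_TIMEOUT_MS;
  if (previousMs < COMPLEX_TIMEOUT_MS) return COMPLEX_TIMEOUT_MS;
  return undefined;
}
```

A caller would retry only after confirming from the Electron logs that the run was still progressing when the wait expired.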
87
+ ### Interpreting Failures
88
+
89
+ - **`Electron execution timed out after 300000ms`:** Orchestration wait too short — see `timeoutMs` above.
90
+ - **Exit code 26** (and messages like **LLM failed to generate / replay action script**): Often corresponds to a completed exploration whose **outcome was goal not achievable** (`goal_not_achievable`, summary with `halt`). Use `muggle-local-run-result-get` and read the **summary / structured summary**; do not assume an Electron crash.
91
+ - **Fix for precondition failures:** Choose a project/account that already has the needed state, or narrow the test goal so generation does not try to create resources from scratch unless intentional.
92
+
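The failure rules above can be read as a small classifier. This is a rough sketch under assumptions: `classifyFailure` and the `FailureKind` labels are hypothetical, and only the timeout message pattern and exit code 26 come from the text.

```typescript
// Hypothetical labels for the follow-up each failure calls for.
type FailureKind = "orchestration-timeout" | "inspect-run-summary" | "unclassified";

function classifyFailure(message: string, exitCode?: number): FailureKind {
  // Matches messages such as "Electron execution timed out after 300000ms".
  if (/Electron execution timed out after \d+ms/.test(message)) {
    return "orchestration-timeout"; // increase timeoutMs and retry
  }
  if (exitCode === 26) {
    // Often a completed exploration whose goal was not achievable:
    // read the summary from muggle-local-run-result-get before assuming a crash.
    return "inspect-run-summary";
  }
  return "unclassified"; // surface the error, exit code, and artifact path as-is
}
```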
93
+ ### Step 5: Publish Test Scripts
94
+
95
+ After each test execution completes (whether pass or fail):
96
+
97
+ 1. Call `muggle-local-publish-test-script` with:
98
+ - `runId`: the run ID from execution
99
+ - `cloudTestCaseId`: the test case ID
100
+
101
+ 2. **Retain from publish response:**
102
+ - `testScriptId`: the cloud test script ID
103
+ - `viewUrl`: the URL to view the run on muggle-ai.com
104
+
105
+ This ensures all screenshots are uploaded to the cloud and accessible via URLs for PR comments.
106
+
107
+ ### Step 6: Fetch Screenshot URLs
108
+
109
+ For each published test script:
110
+
111
+ 1. Call `muggle-remote-test-script-get` with the `testScriptId` from publish.
112
+
113
+ 2. Extract from the response:
114
+ - `steps[].operation.screenshotUrl`: cloud URL for each step's screenshot
115
+ - `steps[].operation.action`: the action description for each step
116
+
117
+ 3. **Retain per test case:** array of `{ stepIndex, action, screenshotUrl }`.
118
+
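The extraction in Step 6 amounts to mapping each step to a small evidence record. The sketch below assumes the response exposes `steps[].operation.screenshotUrl` and `steps[].operation.action` as described above. The interface names, the zero-based `stepIndex`, and the empty-string fallback for missing fields are assumptions.

```typescript
// Assumed response shape, limited to the fields named in Step 6.
interface ScriptStep {
  operation?: { action?: string; screenshotUrl?: string };
}

interface TestScriptResponse {
  steps?: ScriptStep[];
}

interface StepEvidence {
  stepIndex: number; // assumption: zero-based position in steps[]
  action: string;
  screenshotUrl: string;
}

function extractStepEvidence(script: TestScriptResponse): StepEvidence[] {
  return (script.steps ?? []).map((step, index) => ({
    stepIndex: index,
    action: step.operation?.action ?? "",
    screenshotUrl: step.operation?.screenshotUrl ?? "",
  }));
}
```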
119
+ ### Step 7: Collect Results
120
+
121
+ For each test case:
122
+ - Record pass or fail from the run result
123
+ - If failed, capture the error message, failure step index, and `artifactsDir` for local debugging
124
+ - Every test case must be executed — generate a new script if none exists (no skips)
125
+
126
+ ## Output
127
+
128
+ **E2E acceptance report:**
129
+
130
+ **Passed:** (count)
131
+ - (test case name):
132
+ - testCaseId: `<id>`
133
+ - testScriptId: `<id>`
134
+ - runId: `<id>`
135
+ - viewUrl: `<url>`
136
+ - steps: `[{ stepIndex, action, screenshotUrl }, ...]`
137
+
138
+ **Failed:** (count)
139
+ - (test case name):
140
+ - testCaseId: `<id>`
141
+ - testScriptId: `<id>`
142
+ - runId: `<id>`
143
+ - viewUrl: `<url>`
144
+ - failureStepIndex: `<index>`
145
+ - error: `<message>`
146
+ - steps: `[{ stepIndex, action, screenshotUrl }, ...]`
147
+ - artifactsDir: `<path>` (for local debugging)
148
+
149
+ **Metadata:**
150
+ - projectId: `<projectId>`
151
+
152
+ **Overall:** ALL PASSED | FAILURES DETECTED
153
+
154
+ ## Non-negotiables
155
+
156
+ - No silent auth skip; always verify with `muggle-remote-auth-status` first.
157
+ - Replay: never hand-build or simplify `actionScript` — only use full response from `muggle-remote-action-script-get`.
158
+ - Always pass `timeoutMs` for execution calls; do not rely on default 5-minute timeout.
159
+ - No hiding failures: surface errors, exit codes, and artifact paths.
160
+ - Every test case must be executed — generate a new script if none exists (no skips).
161
+ - Always publish after execution to ensure screenshots are cloud-accessible for PR comments.
@@ -7,7 +7,12 @@ You are creating pull requests for each repository that has changes after a succ
7
7
  You receive:
8
8
  - Per-repo: repo name, path, branch name
9
9
  - Requirements: goal, acceptance criteria
10
- - QA report: passed/failed test cases, each with testCaseId, testScriptId, runId, artifactsDir, and projectId
10
+ - E2E acceptance report: passed/failed test cases, each with:
11
+ - `testCaseId`, `testScriptId`, `runId`, `projectId`
12
+ - `viewUrl`: link to view run on muggle-ai.com
13
+ - `steps`: array of `{ stepIndex, action, screenshotUrl }`
14
+ - `failureStepIndex` and `error` (if failed)
15
+ - `artifactsDir` (for local debugging)
11
16
 
12
17
  ## Your Job
13
18
 
@@ -15,38 +20,97 @@ For each repo with changes:
15
20
 
16
21
  1. **Push the branch** to origin: `git push -u origin <branch-name>` in the repo directory.
17
22
  2. **Build the PR title:**
18
- - If QA has failures: `[QA FAILING] <goal>`
23
+ - If E2E acceptance tests have failures: `[E2E FAILING] <goal>`
19
24
  - Otherwise: `<goal>`
20
25
  - Keep under 70 characters
21
26
  3. **Build the PR body** with these sections:
22
27
  - `## Goal` — the requirements goal
23
28
  - `## Acceptance Criteria` — bulleted list (omit section if empty)
24
29
  - `## Changes` — summary of what changed in this repo
25
- - `## QA Results` — full test case breakdown (see format below)
30
+ - `## E2E Acceptance Results` — summary table (see format below)
26
31
  4. **Create the PR** using `gh pr create --title "..." --body "..." --head <branch>` in the repo directory.
27
- 5. **Capture the PR URL** from the output.
32
+ 5. **Capture the PR URL** and extract the PR number.
33
+ 6. **Post E2E acceptance evidence comment** with screenshots (see format below).
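The title rule in step 2 can be sketched as follows. The `[E2E FAILING]` prefix and the under-70-character limit come from the list above. `buildPrTitle` is a hypothetical helper, and truncating with a trailing `...` is an assumption, since the text does not say how to shorten a long goal.

```typescript
const MAX_TITLE_LENGTH = 69; // "under 70 characters"

function buildPrTitle(goal: string, hasFailures: boolean): string {
  const title = hasFailures ? `[E2E FAILING] ${goal}` : goal;
  if (title.length <= MAX_TITLE_LENGTH) {
    return title;
  }
  // Assumed truncation strategy: cut and mark the cut with "...".
  return title.slice(0, MAX_TITLE_LENGTH - 3).trimEnd() + "...";
}
```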
28
34
 
29
- ## QA Results Section Format
35
+ ## E2E acceptance results section format (PR body)
30
36
 
31
- ```
32
- ## QA Results
37
+ ```markdown
38
+ ## E2E Acceptance Results
33
39
 
34
40
  **X passed / Y failed**
35
41
 
36
42
  | Test Case | Status | Details |
37
43
  |-----------|--------|---------|
38
- | [Name](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=details&testCaseId={testCaseId}) | ✅ PASSED | — |
39
- | [Name](https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=details&testCaseId={testCaseId}) | ❌ FAILED | {error} — artifacts: `{artifactsDir}` |
44
+ | [Name]({viewUrl}) | ✅ PASSED | — |
45
+ | [Name]({viewUrl}) | ❌ FAILED | {error} |
46
+ ```
47
+
48
+ ## E2E acceptance evidence comment format
49
+
50
+ After creating the PR, post a comment with embedded screenshots:
51
+
52
+ ```bash
53
+ gh pr comment <PR#> --body "$(cat <<'EOF'
54
+ ## 🧪 E2E acceptance evidence
55
+
56
+ **X passed / Y failed**
57
+
58
+ | Test Case | Status | Summary |
59
+ |-----------|--------|---------|
60
+ | [Login Flow]({viewUrl}) | ✅ PASSED | <a href="{lastStepScreenshotUrl}"><img src="{lastStepScreenshotUrl}" width="120"></a> |
61
+ | [Checkout]({viewUrl}) | ❌ FAILED | <a href="{failureStepScreenshotUrl}"><img src="{failureStepScreenshotUrl}" width="120"></a> |
62
+
63
+ <details>
64
+ <summary>📸 <strong>Login Flow</strong> — 5 steps</summary>
65
+
66
+ | # | Action | Screenshot |
67
+ |---|--------|------------|
68
+ | 1 | Navigate to `/login` | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
69
+ | 2 | Enter username | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
70
+ | 3 | Click "Sign In" | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
71
+
72
+ </details>
73
+
74
+ <details>
75
+ <summary>📸 <strong>Checkout</strong> — 4 steps (failed at step 3)</summary>
76
+
77
+ | # | Action | Screenshot |
78
+ |---|--------|------------|
79
+ | 1 | Add item to cart | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
80
+ | 2 | View cart | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
81
+ | 3 ⚠️ | Click confirm — **Element not found** | <a href="{screenshotUrl}"><img src="{screenshotUrl}" width="200"></a> |
82
+
83
+ </details>
84
+ EOF
85
+ )"
40
86
  ```
41
87
 
42
- Rules:
43
- - Link each test case name to its details page on www.muggle-ai.com using the URL pattern above (requires `testCaseId` and `projectId` from the QA report).
44
- - For failed tests, include the error message and the local `artifactsDir` path so the developer can inspect screenshots.
45
- - Screenshots are in `{artifactsDir}/screenshots/` and viewable locally.
88
+ ### Comment Building Rules
89
+
90
+ 1. **Summary table:**
91
+ - Show thumbnail (120px) of **last step** for passed tests
92
+ - Show thumbnail of **failure step** for failed tests
93
+ - Thumbnail links to full-size image
94
+
95
+ 2. **Collapsible details per test case:**
96
+ - Show all steps with 200px thumbnails
97
+ - Mark failure step with ⚠️ and inline error message
98
+ - Include step count in summary line
99
+
100
+ 3. **HTML for thumbnails:**
101
+ - Use `<a href="{url}"><img src="{url}" width="N"></a>` for clickable thumbnails
102
+ - 120px width in summary table, 200px in details
103
+
104
+ 4. **All tests get screenshots:**
105
+ - Passing tests show proof of success
106
+ - Failing tests highlight the failure point
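The thumbnail and summary-table rules can be sketched as two small string builders. The HTML pattern and the 120px summary width come from the rules above. `thumbnail`, `summaryRow`, the `CaseEvidence` shape, the `(no screenshot)` fallback, and the assumption that `failureStepIndex` matches a `stepIndex` in `steps` are all illustrative.

```typescript
interface StepEvidence {
  stepIndex: number;
  action: string;
  screenshotUrl: string;
}

interface CaseEvidence {
  name: string;
  viewUrl: string;
  passed: boolean;
  steps: StepEvidence[];
  failureStepIndex?: number; // assumed to match a stepIndex in steps
}

// Clickable thumbnail that links to the full-size image.
function thumbnail(url: string, width: number): string {
  return `<a href="${url}"><img src="${url}" width="${width}"></a>`;
}

// One summary-table row: last step for passed tests, failure step for failed ones.
function summaryRow(c: CaseEvidence): string {
  const failed = c.steps.find((s) => s.stepIndex === c.failureStepIndex);
  const last = c.steps[c.steps.length - 1];
  const shot = c.passed ? last : failed ?? last;
  const status = c.passed ? "✅ PASSED" : "❌ FAILED";
  const cell = shot ? thumbnail(shot.screenshotUrl, 120) : "(no screenshot)";
  return `| [${c.name}](${c.viewUrl}) | ${status} | ${cell} |`;
}
```

The per-test-case details tables would reuse `thumbnail` with a width of 200.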
46
107
 
47
108
  ## Output
48
109
 
49
110
  **PRs Created:**
50
111
  - (repo name): (PR URL)
51
112
 
52
- **Errors:** (any repos where PR creation failed, with the error message)
113
+ **E2E acceptance evidence comments posted:**
114
+ - (repo name): comment posted to PR #(number)
115
+
116
+ **Errors:** (any repos where PR creation or comment posting failed, with the error message)
@@ -12,7 +12,8 @@ Use this as the top-level Muggle command router.
12
12
  When user asks for "muggle" with no specific subcommand, show this command set:
13
13
 
14
14
  - `/muggle:muggle-do` — autonomous dev pipeline
15
- - `/muggle:muggle-test-feature-local` — local feature QA
15
+ - `/muggle:muggle-test` — change-driven E2E acceptance testing (local or remote, with PR posting)
16
+ - `/muggle:muggle-test-feature-local` — local feature E2E acceptance testing
16
17
  - `/muggle:muggle-status` — health check
17
18
  - `/muggle:muggle-repair` — repair broken installation
18
19
  - `/muggle:muggle-upgrade` — upgrade local installation
@@ -24,7 +25,8 @@ If the user intent clearly matches one command, route to that command behavior:
24
25
  - status/health/check -> `muggle-status`
25
26
  - repair/fix/install broken -> `muggle-repair`
26
27
  - upgrade/update latest -> `muggle-upgrade`
27
- - test localhost/validate feature -> `muggle-test-feature-local`
28
+ - test my changes/acceptance test my work/test before push/post E2E acceptance results to PR/test on staging/test on preview -> `muggle-test`
29
+ - test localhost/validate single feature -> `muggle-test-feature-local`
28
30
  - build/implement from request -> `muggle-do`
29
31
 
30
32
  If intent is ambiguous, ask one concise clarification question.
@@ -6,9 +6,9 @@ disable-model-invocation: true
6
6
 
7
7
  # Muggle Do
8
8
 
9
- Muggle Do is the top-level command for the Muggle AI development workflow.
9
+ Muggle Do is the command for the Muggle AI development workflow.
10
10
 
11
- It runs the autonomous dev cycle: requirements -> impact analysis -> validate code -> coding -> unit tests -> QA -> open PRs.
11
+ It runs a battle-tested autonomous dev cycle: requirements -> impact analysis -> validate code -> coding -> unit tests -> E2E acceptance tests -> open PRs.
12
12
 
13
13
  For maintenance tasks, use the dedicated skills:
14
14
 
@@ -42,12 +42,12 @@ Use the supporting files in the `../do/` directory as stage-specific instruction
42
42
  - [impact-analysis.md](../do/impact-analysis.md)
43
43
  - [validate-code.md](../do/validate-code.md)
44
44
  - [unit-tests.md](../do/unit-tests.md)
45
- - [qa.md](../do/qa.md)
45
+ - [e2e-acceptance.md](../do/e2e-acceptance.md)
46
46
  - [open-prs.md](../do/open-prs.md)
47
47
 
48
48
  ## Guardrails
49
49
 
50
- - Do not skip unit tests before QA.
51
- - Do not skip QA due to missing scripts; generate when needed.
50
+ - Do not skip unit tests before E2E acceptance tests.
51
+ - Do not skip E2E acceptance tests due to missing scripts; generate when needed.
52
52
  - If the same stage fails 3 times in a row, escalate with details.
53
- - If total iterations reach 3 and QA still fails, continue to PR creation with `[QA FAILING]`.
53
+ - If total iterations reach 3 and E2E acceptance tests still fail, continue to PR creation with `[E2E FAILING]`.