@muggleai/works 2.0.0
- package/README.md +326 -0
- package/bin/muggle.js +2 -0
- package/dist/chunk-O6JAG3WQ.js +6950 -0
- package/dist/cli.js +8 -0
- package/dist/index.js +1 -0
- package/package.json +94 -0
- package/scripts/postinstall.mjs +862 -0
- package/skills-dist/muggle-do.md +589 -0
- package/skills-dist/publish-test-to-cloud.md +43 -0
- package/skills-dist/test-feature-local.md +344 -0
|
@@ -0,0 +1,589 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Quality-guaranteed development workflow by Muggle AI. Takes a task through requirements, coding, testing, QA, and PR creation with iterative fix loops. Manages state in .muggle-do/sessions/ for auditability and crash recovery.
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# Muggle Do — Autonomous Development Pipeline
|
|
6
|
+
|
|
7
|
+
Muggle Do is a session-based, iterative development pipeline. Given a task description, it autonomously:
|
|
8
|
+
|
|
9
|
+
1. Extracts requirements and acceptance criteria
|
|
10
|
+
2. Analyzes impact across configured repositories
|
|
11
|
+
3. Validates git state (branches, commits, working tree)
|
|
12
|
+
4. Writes or fixes implementation code
|
|
13
|
+
5. Runs unit tests
|
|
14
|
+
6. Runs QA test cases via Muggle AI infrastructure
|
|
15
|
+
7. Triages failures and loops back to fix them
|
|
16
|
+
8. Opens pull requests when done
|
|
17
|
+
|
|
18
|
+
Each run is a **session** with full state persistence in markdown files. Sessions survive crashes, support resume, and provide a complete audit trail.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## Input
|
|
23
|
+
|
|
24
|
+
The user's task description is: **$ARGUMENTS**
|
|
25
|
+
|
|
26
|
+
This is a natural-language description of what to build, fix, or change. If `$ARGUMENTS` is empty, ask the user to describe the task before proceeding.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Repo Configuration
|
|
31
|
+
|
|
32
|
+
Repos are configured in `muggle-repos.json` in the working directory:
|
|
33
|
+
|
|
34
|
+
```json
|
|
35
|
+
[
|
|
36
|
+
{ "name": "frontend", "path": "/absolute/path/to/frontend", "testCommand": "pnpm test" },
|
|
37
|
+
{ "name": "backend", "path": "/absolute/path/to/backend", "testCommand": "pnpm test" }
|
|
38
|
+
]
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
Read this file at startup. If it does not exist, ask the user to provide repo details and create it before proceeding.
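
A minimal sketch of the startup read, in Python and under stated assumptions: the helper name `load_repos` is hypothetical, `name` and `path` are treated as required, and a missing `testCommand` falls back to `pnpm test` (the default named in the UNIT_TESTS stage). It illustrates the config shape and is not Muggle's own loader.

```python
import json

DEFAULT_TEST_COMMAND = "pnpm test"  # default named in the UNIT_TESTS stage


def load_repos(config_text):
    """Parse the text of muggle-repos.json into a list of repo dicts.

    Raises ValueError for malformed JSON or for entries missing name/path.
    """
    repos = json.loads(config_text)  # json.JSONDecodeError is a ValueError
    if not isinstance(repos, list):
        raise ValueError("muggle-repos.json must contain a JSON array")
    normalized = []
    for index, repo in enumerate(repos):
        if not isinstance(repo, dict):
            raise ValueError(f"entry {index} is not an object")
        missing = [key for key in ("name", "path") if not repo.get(key)]
        if missing:
            raise ValueError(f"entry {index} is missing: {', '.join(missing)}")
        # Assumption: an absent testCommand falls back to the default.
        normalized.append({"testCommand": DEFAULT_TEST_COMMAND, **repo})
    return normalized
```

A caller would catch `ValueError` and stop to ask the user to fix the file, which matches the malformed-config rule under Error Handling.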
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Session Management
|
|
46
|
+
|
|
47
|
+
On invocation, check `.muggle-do/sessions/` for existing sessions.
|
|
48
|
+
|
|
49
|
+
### If sessions exist
|
|
50
|
+
|
|
51
|
+
Read each session's `state.md` to determine its status. Present the user with options:
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
Existing sessions:
|
|
55
|
+
[1] add-login-page — CODING (iteration 2) ← in progress
|
|
56
|
+
[2] fix-payment-timeout — DONE (2 iterations) ← completed
|
|
57
|
+
[3] Start new session
|
|
58
|
+
|
|
59
|
+
Which session? _
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
- **Resume active session** — continue from the current stage recorded in `state.md`
|
|
63
|
+
- **Review completed session** — read and display `result.md`
|
|
64
|
+
- **Start new session** — proceed with `$ARGUMENTS` as the task description
|
|
65
|
+
|
|
66
|
+
### If no sessions exist
|
|
67
|
+
|
|
68
|
+
Skip the prompt entirely. Go straight to creating a new session with `$ARGUMENTS`.
|
|
69
|
+
|
|
70
|
+
### Session Naming
|
|
71
|
+
|
|
72
|
+
Generate a slug from the task description:
|
|
73
|
+
- Lowercase, hyphenated (replace spaces and non-alphanumeric chars with hyphens)
|
|
74
|
+
- Max 50 characters
|
|
75
|
+
- Remove leading/trailing hyphens and collapse consecutive hyphens
|
|
76
|
+
- On collision with an existing session directory, append a numeric suffix: `add-login-page-2`, `add-login-page-3`, etc.
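
The naming rules above can be sketched in Python as follows. The function names `slugify` and `unique_slug` are hypothetical, and two points are assumptions because the rules do not settle them: only ASCII letters and digits count as alphanumeric, and the numeric collision suffix is not counted against the 50-character limit.

```python
import re


def slugify(task, max_length=50):
    """Lowercase the task and turn every non-alphanumeric run into one hyphen."""
    # Replacing whole runs at once also collapses consecutive hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower())
    # Trim edge hyphens, cut to the limit, then trim again in case the cut
    # landed directly after a hyphen.
    return slug.strip("-")[:max_length].strip("-")


def unique_slug(base, existing):
    """Return base, or base-2, base-3, ... if the name is already taken."""
    if base not in existing:
        return base
    suffix = 2
    while f"{base}-{suffix}" in existing:
        suffix += 1
    return f"{base}-{suffix}"
```

Here `existing` would be the set of directory names already present under `.muggle-do/sessions/`. A task made only of punctuation yields an empty slug, which this sketch leaves to the caller to handle.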
|
|
77
|
+
|
|
78
|
+
Create the session directory structure:
|
|
79
|
+
|
|
80
|
+
```
|
|
81
|
+
.muggle-do/sessions/<slug>/
  state.md
  iterations/
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
Ensure `.muggle-do` is listed in `.gitignore`. If not, add it.
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## State File Locations
|
|
91
|
+
|
|
92
|
+
Each session contains:
|
|
93
|
+
|
|
94
|
+
```
|
|
95
|
+
.muggle-do/sessions/<slug>/
  state.md              # Current state pointer; read this first on resume
  requirements.md       # Stable requirements (goal, criteria, repos)
  iterations/
    001.md              # Full record of iteration 1
    002.md              # Full record of iteration 2
    ...
  result.md             # Final outcome (written on completion)
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
### `state.md` Template
|
|
106
|
+
|
|
107
|
+
```markdown
|
|
108
|
+
# Muggle Do: <session-slug>
|
|
109
|
+
|
|
110
|
+
## Config
|
|
111
|
+
- **Task**: <user's task description>
|
|
112
|
+
- **Session**: <session-slug>
|
|
113
|
+
- **Started**: <ISO 8601 timestamp>
|
|
114
|
+
- **Repos**: <repo-name> (<path>), <repo-name> (<path>), ...
|
|
115
|
+
- **Max iterations**: 3
|
|
116
|
+
|
|
117
|
+
## Current
|
|
118
|
+
- **Iteration**: <number>
|
|
119
|
+
- **Stage**: <stage-name>
|
|
120
|
+
- **Previous failure**: <description or "none">
|
|
121
|
+
- **Jump target**: <stage or "none">
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
### `requirements.md` Template
|
|
125
|
+
|
|
126
|
+
```markdown
|
|
127
|
+
# Requirements
|
|
128
|
+
|
|
129
|
+
## Goal
|
|
130
|
+
<one clear sentence describing the outcome>
|
|
131
|
+
|
|
132
|
+
## Acceptance Criteria
|
|
133
|
+
1. <criterion 1>
|
|
134
|
+
2. <criterion 2>
|
|
135
|
+
3. ...
|
|
136
|
+
|
|
137
|
+
## Repos
|
|
138
|
+
| Repo | Path | Test Command |
|
|
139
|
+
|------|------|-------------|
|
|
140
|
+
| <name> | <path> | <command> |
|
|
141
|
+
|
|
142
|
+
## Notes
|
|
143
|
+
<any ambiguities, assumptions, or inferred criteria>
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
### `iterations/NNN.md` Template
|
|
147
|
+
|
|
148
|
+
```markdown
|
|
149
|
+
# Iteration <N>
|
|
150
|
+
|
|
151
|
+
## Impact Analysis
|
|
152
|
+
| Repo | Files | Changes |
|
|
153
|
+
|------|-------|---------|
|
|
154
|
+
| <name> | <file paths> | <description> |
|
|
155
|
+
|
|
156
|
+
## Validate Code
|
|
157
|
+
| Repo | Branch | Status |
|
|
158
|
+
|------|--------|--------|
|
|
159
|
+
| <name> | <branch> | <status description> |
|
|
160
|
+
|
|
161
|
+
## Coding
|
|
162
|
+
- <description of what was written or fixed>
|
|
163
|
+
|
|
164
|
+
## Unit Tests — <PASS|FAIL>
|
|
165
|
+
| Repo | Result | Details |
|
|
166
|
+
|------|--------|---------|
|
|
167
|
+
| <name> | <pass count> | <details> |
|
|
168
|
+
|
|
169
|
+
## QA — <PASS|FAIL>
|
|
170
|
+
| Test Case | Result | Details |
|
|
171
|
+
|-----------|--------|---------|
|
|
172
|
+
| <TC-ID: name> | <PASS|FAIL> | <details> |
|
|
173
|
+
|
|
174
|
+
## Triage
|
|
175
|
+
- **Failed stage**: <stage name>
|
|
176
|
+
- **Failure**: <what failed>
|
|
177
|
+
- **Analysis**: <root cause analysis>
|
|
178
|
+
- **Decision**: Jump to <STAGE>
|
|
179
|
+
- **Reasoning**: <why this jump target>
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
Only include sections for stages that actually ran in that iteration. If triage jumped to CODING, the iteration file starts at `## Coding`.
|
|
183
|
+
|
|
184
|
+
### `result.md` Template
|
|
185
|
+
|
|
186
|
+
```markdown
|
|
187
|
+
# Result
|
|
188
|
+
|
|
189
|
+
## Status: <DONE|QA_FAILING>
|
|
190
|
+
## Session: <session-slug>
|
|
191
|
+
## Completed: <ISO 8601 timestamp>
|
|
192
|
+
## Iterations: <total count>
|
|
193
|
+
|
|
194
|
+
## PRs
|
|
195
|
+
| Repo | PR | URL |
|
|
196
|
+
|------|-----|-----|
|
|
197
|
+
| <name> | <branch> | <URL> |
|
|
198
|
+
|
|
199
|
+
## QA Summary
|
|
200
|
+
- <pass count>/<total count> test cases passed (after <N> iterations)
|
|
201
|
+
- Iteration 1: <summary>
|
|
202
|
+
- Iteration 2: <summary>
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
---
|
|
206
|
+
|
|
207
|
+
## State Machine
|
|
208
|
+
|
|
209
|
+
### Stages
|
|
210
|
+
|
|
211
|
+
| Stage | Purpose | Runs every iteration? |
|
|
212
|
+
|-------|---------|----------------------|
|
|
213
|
+
| INIT | Create session, read config | Once |
|
|
214
|
+
| REQUIREMENTS | Extract goal + acceptance criteria from user task | Once (unless triage jumps back) |
|
|
215
|
+
| IMPACT_ANALYSIS | Detect changed files across repos | Yes |
|
|
216
|
+
| VALIDATE_CODE | Check branches, commits, clean working tree | Yes |
|
|
217
|
+
| CODING | Write new code (iteration 1) or fix code (subsequent iterations) | Yes |
|
|
218
|
+
| UNIT_TESTS | Run test commands per repo | Yes |
|
|
219
|
+
| QA | Run Muggle AI test cases | Yes |
|
|
220
|
+
| TRIAGE | Analyze failure, decide where to jump back | On failure only |
|
|
221
|
+
| OPEN_PRS | Push branches, create PRs | Once at end |
|
|
222
|
+
| DONE | Session complete | Terminal |
|
|
223
|
+
|
|
224
|
+
### State Transitions
|
|
225
|
+
|
|
226
|
+
```
|
|
227
|
+
INIT
  |
  v
REQUIREMENTS  <--- triage: "flow can't be accomplished"
  |
  +-------------+---- Iteration --------+
  |             v                       |
  |      IMPACT_ANALYSIS  <--- triage: "need different files"
  |             |                       |
  |             v                       |
  |       VALIDATE_CODE  <--- triage: "branch/commit issues"
  |             |                       |
  |             v                       |
  |          CODING  <--- triage: "styling fix", "logic bug"
  |             |                       |
  |             v                       |
  |        UNIT_TESTS                   |
  |             |                       |
  |             v                       |
  |            QA                       |
  |             |                       |
  |    fail     | pass                  |
  |      v      |                       |
  |    TRIAGE   |                       |
  |      |      |                       |
  |      +------+--- (jumps back) -------+
  |             |
  +-------------+
                v
             OPEN_PRS
                |
                v
               DONE
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
---
|
|
263
|
+
|
|
264
|
+
## State Write Protocol
|
|
265
|
+
|
|
266
|
+
Update state files incrementally as you progress through stages:
|
|
267
|
+
|
|
268
|
+
1. **Before each stage**: Update `state.md` — set `Stage` to the current stage name.
|
|
269
|
+
2. **After each stage completes**: Append the stage results to the current iteration file (`iterations/NNN.md`).
|
|
270
|
+
3. **On triage**: Append the triage record to the current iteration file, then update `state.md` with the jump target stage and increment the iteration counter.
|
|
271
|
+
4. **On completion**: Write `result.md` and set `Stage` to `DONE` in `state.md`.
|
|
272
|
+
|
|
273
|
+
This means iteration files are built up incrementally (not written all at once), so a crash mid-stage leaves a partial but useful record that supports resume.
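
A minimal Python sketch of the "update `state.md`" step, assuming the file follows the `state.md` template exactly (one `- **Field**: value` bullet per field). The helper names `set_state_field` and `iteration_filename` are hypothetical.

```python
import re


def set_state_field(state_text, field, value):
    """Return state_text with the bullet for `field` rewritten to `value`."""
    pattern = re.compile(
        rf"^- \*\*{re.escape(field)}\*\*: .*$", re.MULTILINE
    )
    # A function replacement keeps backslashes in `value` literal.
    new_text, count = pattern.subn(
        lambda _match: f"- **{field}**: {value}", state_text, count=1
    )
    if count == 0:
        raise KeyError(f"field not found in state.md: {field}")
    return new_text


def iteration_filename(iteration):
    """Zero-padded name used under iterations/, e.g. 001.md."""
    return f"{iteration:03d}.md"
```

Rewriting one field at a time keeps each write small, which suits the incremental protocol: a crash between writes leaves every other field as it was.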
|
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
277
|
+
## Stage Instructions
|
|
278
|
+
|
|
279
|
+
Execute stages in order according to the state machine. When resuming a session, skip to the stage recorded in `state.md`.
|
|
280
|
+
|
|
281
|
+
---
|
|
282
|
+
|
|
283
|
+
### REQUIREMENTS
|
|
284
|
+
|
|
285
|
+
**When**: First stage of every new session. Also re-run if triage jumps back here.
|
|
286
|
+
|
|
287
|
+
**Update state.md**: Set Stage to `requirements`.
|
|
288
|
+
|
|
289
|
+
**Instructions:**
|
|
290
|
+
|
|
291
|
+
1. Read `muggle-repos.json` to get the list of configured repos with their names, paths, and test commands.
|
|
292
|
+
2. Read the user's task description (`$ARGUMENTS`).
|
|
293
|
+
3. Extract the **goal** — one clear sentence describing the outcome.
|
|
294
|
+
4. Extract **acceptance criteria** — specific, verifiable conditions that must be true when the task is done. Each criterion should be independently testable. If the task description is vague, infer reasonable criteria but flag them as inferred in Notes.
|
|
295
|
+
5. Identify which repos from `muggle-repos.json` are likely affected based on the task description.
|
|
296
|
+
6. Write `requirements.md` in the session directory using the template above. Include the full repo table from `muggle-repos.json`.
|
|
297
|
+
|
|
298
|
+
**Output**: The requirements are now written to `requirements.md`. Proceed to IMPACT_ANALYSIS.
|
|
299
|
+
|
|
300
|
+
Do NOT ask the user clarifying questions. Make reasonable inferences and note assumptions.
|
|
301
|
+
|
|
302
|
+
---
|
|
303
|
+
|
|
304
|
+
### IMPACT_ANALYSIS
|
|
305
|
+
|
|
306
|
+
**When**: After REQUIREMENTS, or when triage jumps back here.
|
|
307
|
+
|
|
308
|
+
**Update state.md**: Set Stage to `impact_analysis`.
|
|
309
|
+
|
|
310
|
+
**Instructions:**
|
|
311
|
+
|
|
312
|
+
For each repo listed in `requirements.md`:
|
|
313
|
+
|
|
314
|
+
1. **Check the current branch**: Run `git branch --show-current` in the repo. If it returns empty (detached HEAD), record an error for that repo.
|
|
315
|
+
2. **Detect the default branch**: Run `git symbolic-ref refs/remotes/origin/HEAD --short` to find the default branch (e.g., `origin/main`). Strip the `origin/` prefix. If this fails, check if `main` or `master` exist locally via `git rev-parse --verify`.
|
|
316
|
+
3. **Verify it is a feature branch**: The current branch must NOT be the default branch. If it is, create a feature branch from the task slug (e.g., `feat/<session-slug>`) and switch to it.
|
|
317
|
+
4. **List changed files**: Run `git diff --name-only <default-branch>...HEAD` to find files changed on this branch relative to the default branch. Also check `git status --porcelain` for uncommitted changes.
|
|
318
|
+
5. **Get the diff**: Run `git diff <default-branch>...HEAD` for the full diff. If this is the first iteration and there are no changes yet, that is expected — the CODING stage will create them.
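
The branch checks in steps 1 and 2 can be sketched in Python as follows. The git commands are the ones named in the steps; the helper names (`run_git`, `detect_default_branch`, `current_branch`) and the injectable `git` parameter are assumptions made for illustration and testability.

```python
import subprocess


def run_git(repo_path, *args):
    """Run git in repo_path; return stripped stdout, or None on failure."""
    result = subprocess.run(
        ["git", "-C", repo_path, *args], capture_output=True, text=True
    )
    return result.stdout.strip() if result.returncode == 0 else None


def current_branch(repo_path, git=run_git):
    """Return the branch name, or None for a detached HEAD (empty output)."""
    return git(repo_path, "branch", "--show-current") or None


def detect_default_branch(repo_path, git=run_git):
    """Resolve the default branch, falling back to a local main or master."""
    ref = git(repo_path, "symbolic-ref", "refs/remotes/origin/HEAD", "--short")
    if ref:
        # Strip the origin/ prefix, e.g. origin/main -> main.
        return ref[len("origin/"):] if ref.startswith("origin/") else ref
    for candidate in ("main", "master"):
        if git(repo_path, "rev-parse", "--verify", candidate) is not None:
            return candidate
    return None
```

A repo is on a feature branch when `current_branch` is not `None` and differs from `detect_default_branch`. A `None` from either function corresponds to an ERROR status for that repo.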
|
|
319
|
+
|
|
320
|
+
**Record per repo:**
|
|
321
|
+
- Branch name
|
|
322
|
+
- Default branch
|
|
323
|
+
- Changed files (if any)
|
|
324
|
+
- Brief diff summary
|
|
325
|
+
- Status: OK or ERROR (with reason)
|
|
326
|
+
|
|
327
|
+
**Output**: Append the Impact Analysis section to the current iteration file. Proceed to VALIDATE_CODE.
|
|
328
|
+
|
|
329
|
+
If this is the first iteration and no changes exist yet, that is normal — proceed to VALIDATE_CODE and then CODING.
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
### VALIDATE_CODE
|
|
334
|
+
|
|
335
|
+
**When**: After IMPACT_ANALYSIS, or when triage jumps back here.
|
|
336
|
+
|
|
337
|
+
**Update state.md**: Set Stage to `validate_code`.
|
|
338
|
+
|
|
339
|
+
**Instructions:**
|
|
340
|
+
|
|
341
|
+
For each repo with changes (or all repos on first iteration):
|
|
342
|
+
|
|
343
|
+
1. **Verify the branch is a feature branch** (not main/master/the default branch). If on the default branch, create and checkout a feature branch: `feat/<session-slug>`.
|
|
344
|
+
2. **Check for uncommitted changes**: Run `git status --porcelain` in the repo. If there are uncommitted changes, note them — they should be committed during CODING.
|
|
345
|
+
3. **Get the branch diff**: Run `git diff <default-branch>...HEAD --stat` for a summary of changes.
|
|
346
|
+
4. **Verify commits exist on the branch** (if not first iteration): Run `git log <default-branch>..HEAD --oneline` to confirm there are commits.
|
|
347
|
+
|
|
348
|
+
**Record per repo:**
|
|
349
|
+
- Branch name
|
|
350
|
+
- Commit count and summaries
|
|
351
|
+
- Uncommitted changes: yes/no
|
|
352
|
+
- Diff stat
|
|
353
|
+
- Status: READY | WARNING | ERROR
|
|
354
|
+
|
|
355
|
+
**On failure** (e.g., merge conflicts, detached HEAD that cannot be resolved): Transition to TRIAGE.
|
|
356
|
+
|
|
357
|
+
**Output**: Append the Validate Code section to the current iteration file. Proceed to CODING.
|
|
358
|
+
|
|
359
|
+
---
|
|
360
|
+
|
|
361
|
+
### CODING
|
|
362
|
+
|
|
363
|
+
**When**: After VALIDATE_CODE, or when triage jumps back here (most common jump target).
|
|
364
|
+
|
|
365
|
+
**Update state.md**: Set Stage to `coding`.
|
|
366
|
+
|
|
367
|
+
**Instructions:**
|
|
368
|
+
|
|
369
|
+
#### First iteration (new feature):
|
|
370
|
+
- Read `requirements.md` for the goal and acceptance criteria
|
|
371
|
+
- Read the Impact Analysis section from the current iteration file for which files/repos to work with
|
|
372
|
+
- Write the implementation code across the identified repos
|
|
373
|
+
- For each repo, commit changes to the feature branch with descriptive commit messages
|
|
374
|
+
- Done when: code is committed and ready for testing
|
|
375
|
+
|
|
376
|
+
#### Subsequent iterations (fix):
|
|
377
|
+
- Read the Triage section from the **previous** iteration file to understand what failed and why
|
|
378
|
+
- Read the `Previous failure` and `Jump target` from `state.md`
|
|
379
|
+
- Edit the relevant code to address the specific failure
|
|
380
|
+
- Commit the fix with a message referencing what was fixed (e.g., `fix: mobile viewport for login submit button`)
|
|
381
|
+
- Done when: fix is committed and ready for re-testing
|
|
382
|
+
|
|
383
|
+
#### Multi-repo handling:
|
|
384
|
+
Work on each affected repo sequentially. Commit to each repo's feature branch before moving to the next.
|
|
385
|
+
|
|
386
|
+
**Output**: Append the Coding section to the current iteration file. Proceed to UNIT_TESTS.
|
|
387
|
+
|
|
388
|
+
---
|
|
389
|
+
|
|
390
|
+
### UNIT_TESTS
|
|
391
|
+
|
|
392
|
+
**When**: After CODING. Always runs before QA to catch regressions.
|
|
393
|
+
|
|
394
|
+
**Update state.md**: Set Stage to `unit_tests`.
|
|
395
|
+
|
|
396
|
+
**Instructions:**
|
|
397
|
+
|
|
398
|
+
For each repo listed in `requirements.md`:
|
|
399
|
+
|
|
400
|
+
1. **Run the test command** using Bash in the repo's directory. Use the `testCommand` from `muggle-repos.json` (default: `pnpm test`).
|
|
401
|
+
2. **Capture the full output** — both stdout and stderr.
|
|
402
|
+
3. **Determine pass/fail** — exit code 0 means pass, anything else means fail.
|
|
403
|
+
4. **If tests fail**, extract the specific failing test names and descriptions from the output.
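
A minimal Python sketch of steps 1 to 3, assuming the test command is run through the shell from the repo directory. The function name `run_tests` and the shape of the returned dict are hypothetical. Extracting failing test names (step 4) depends on the test runner's output format and is left out.

```python
import subprocess


def run_tests(repo_path, test_command="pnpm test"):
    """Run the repo's test command and report PASS/FAIL by exit code."""
    completed = subprocess.run(
        test_command,
        shell=True,            # the command is a shell string such as "pnpm test"
        cwd=repo_path,         # run inside the repo's directory
        capture_output=True,   # keep both stdout and stderr
        text=True,
    )
    return {
        "command": test_command,
        "result": "PASS" if completed.returncode == 0 else "FAIL",
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

Because `shell=True` passes the string to the shell unmodified, the command should only ever come from the trusted `muggle-repos.json` file.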
|
|
404
|
+
|
|
405
|
+
**Record per repo:**
|
|
406
|
+
- Test command run
|
|
407
|
+
- Result: PASS or FAIL
|
|
408
|
+
- Failed tests (list, if any)
|
|
409
|
+
- Relevant output (full if failed, summary if passed)
|
|
410
|
+
|
|
411
|
+
**On pass (all repos)**: Append Unit Tests section to iteration file. Proceed to QA.
|
|
412
|
+
|
|
413
|
+
**On failure**: Append Unit Tests section to iteration file. Transition to TRIAGE.
|
|
414
|
+
|
|
415
|
+
---
|
|
416
|
+
|
|
417
|
+
### QA
|
|
418
|
+
|
|
419
|
+
**When**: After UNIT_TESTS pass.
|
|
420
|
+
|
|
421
|
+
**Update state.md**: Set Stage to `qa`.
|
|
422
|
+
|
|
423
|
+
**Instructions:**
|
|
424
|
+
|
|
425
|
+
#### Step 1: Check Authentication
|
|
426
|
+
|
|
427
|
+
Use the `muggle-remote-auth-status` MCP tool to verify valid credentials. If not authenticated, use `muggle-remote-auth-login` to start the device-code login flow and `muggle-remote-auth-poll` to wait for the user to complete login.
|
|
428
|
+
|
|
429
|
+
#### Step 2: Get Test Cases
|
|
430
|
+
|
|
431
|
+
Use `muggle-remote-test-case-list` with the project ID to fetch all test cases for this project.
|
|
432
|
+
|
|
433
|
+
#### Step 3: Filter Relevant Test Cases
|
|
434
|
+
|
|
435
|
+
Based on the changed files and the requirements goal, determine which test cases are relevant. Include:
|
|
436
|
+
- Test cases whose use cases directly relate to the changed functionality
|
|
437
|
+
- Test cases that cover areas potentially affected by the changes
|
|
438
|
+
- When in doubt, include the test case (better to test more than miss a regression)
|
|
439
|
+
|
|
440
|
+
#### Step 4: Run Test Scripts
|
|
441
|
+
|
|
442
|
+
For each relevant test case that has test scripts:
|
|
443
|
+
1. Use `muggle-remote-test-script-list` to find test scripts for the test case
|
|
444
|
+
2. Use `muggle-remote-workflow-start-test-script-replay` to trigger a replay
|
|
445
|
+
3. Use `muggle-remote-wf-get-ts-replay-latest-run` to poll for results (check every 10 seconds, timeout after 5 minutes per test)
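
The polling rule in step 3 (every 10 seconds, give up after 5 minutes) can be sketched in Python as below. This is an illustration only: `fetch_status` is a hypothetical callable that wraps the `muggle-remote-wf-get-ts-replay-latest-run` tool call, and the terminal status strings `passed` and `failed` are assumptions about what that tool reports.

```python
import time


def poll_until_done(
    fetch_status,
    interval_seconds=10.0,
    timeout_seconds=300.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll fetch_status until it returns a terminal status or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in ("passed", "failed"):  # assumed terminal statuses
            return status
        # Stop if another full interval would overrun the deadline.
        if clock() + interval_seconds > deadline:
            return "timeout"
        sleep(interval_seconds)
```

The `sleep` and `clock` parameters exist so the loop can be exercised without waiting in real time. How a `timeout` result is recorded in the QA section is a decision this sketch leaves open.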
|
|
446
|
+
|
|
447
|
+
#### Step 5: Collect Results
|
|
448
|
+
|
|
449
|
+
For each test case:
|
|
450
|
+
- Record pass or fail
|
|
451
|
+
- If failed, capture the failure reason and reproduction steps
|
|
452
|
+
- If no test script exists for a test case, note it as "no script available" (not a failure)
|
|
453
|
+
|
|
454
|
+
**On pass (all test cases)**: Append QA section to iteration file. Proceed to OPEN_PRS.
|
|
455
|
+
|
|
456
|
+
**On failure**: Append QA section to iteration file. Transition to TRIAGE.
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
### TRIAGE
|
|
461
|
+
|
|
462
|
+
**When**: After any stage fails (VALIDATE_CODE, UNIT_TESTS, or QA).
|
|
463
|
+
|
|
464
|
+
**Update state.md**: Set Stage to `triage`.
|
|
465
|
+
|
|
466
|
+
**Instructions:**
|
|
467
|
+
|
|
468
|
+
1. **Identify what failed**: Read the failed stage's output from the current iteration file.
|
|
469
|
+
|
|
470
|
+
2. **Analyze the failure**: Determine the root cause. Is it a requirements issue, a scoping issue, a git issue, or a code bug?
|
|
471
|
+
|
|
472
|
+
3. **Classify using these heuristics** (in order):
|
|
473
|
+
|
|
474
|
+
| # | Question | If yes, jump to | Signal |
|
|
475
|
+
|---|----------|-----------------|--------|
|
|
476
|
+
| 1 | Does the failure indicate the goal or acceptance criteria are wrong or incomplete? | REQUIREMENTS | QA reveals a user flow that contradicts the stated goal, or acceptance criteria are missing a scenario |
|
|
477
|
+
| 2 | Does the fix require changing files not currently in scope? | IMPACT_ANALYSIS | The fix touches repos or files not identified in the impact analysis |
|
|
478
|
+
| 3 | Are there branch or commit issues? | VALIDATE_CODE | Uncommitted changes, wrong branch, merge conflicts |
|
|
479
|
+
| 4 | Is it a code-level bug (logic, styling, API)? | CODING | Test output points to specific behavior the code should handle differently |
|
|
480
|
+
|
|
481
|
+
**When uncertain, default to CODING** — it is the lowest-risk jump target since unit tests will catch regressions before re-running QA.
|
|
482
|
+
|
|
483
|
+
4. **Record the triage decision** in the current iteration file using the Triage section template.
|
|
484
|
+
|
|
485
|
+
5. **Update state.md**:
|
|
486
|
+
- Set `Stage` to the jump target
|
|
487
|
+
- Set `Previous failure` to a brief description of what failed
|
|
488
|
+
- Set `Jump target` to the target stage name
|
|
489
|
+
- If jumping to REQUIREMENTS: increment `Iteration` (this restarts the pipeline scope)
|
|
490
|
+
- If jumping to IMPACT_ANALYSIS, VALIDATE_CODE, or CODING: increment `Iteration`
|
|
491
|
+
|
|
492
|
+
6. **Create a new iteration file** (`iterations/NNN.md`) for the next iteration.
|
|
493
|
+
|
|
494
|
+
7. **Continue execution** at the jump target stage.
|
|
495
|
+
|
|
496
|
+
Note: When triage jumps to CODING, the subsequent flow always passes through UNIT_TESTS before re-running QA (code changes could break tests).
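
The ordered heuristics from step 3 amount to a first-match rule with CODING as the fallback. A minimal Python sketch, assuming the four questions have already been answered as booleans (the `FailureSignals` type and its field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class FailureSignals:
    """Answers to the four triage questions, in table order."""
    requirements_wrong: bool = False        # question 1
    needs_out_of_scope_files: bool = False  # question 2
    git_issue: bool = False                 # question 3
    code_bug: bool = False                  # question 4


def choose_jump_target(signals):
    """Return the first matching jump target; default to CODING."""
    if signals.requirements_wrong:
        return "REQUIREMENTS"
    if signals.needs_out_of_scope_files:
        return "IMPACT_ANALYSIS"
    if signals.git_issue:
        return "VALIDATE_CODE"
    # Question 4 and the "when uncertain" rule both lead here.
    return "CODING"
```

The order of the checks is what matters: a failure that is both a git issue and a requirements gap jumps to REQUIREMENTS, because that question comes first in the table.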
|
|
497
|
+
|
|
498
|
+
---
|
|
499
|
+
|
|
500
|
+
### OPEN_PRS
|
|
501
|
+
|
|
502
|
+
**When**: After QA passes, OR after max iterations are reached with QA still failing (in which case PR titles carry the `[QA FAILING]` prefix).
|
|
503
|
+
|
|
504
|
+
**Update state.md**: Set Stage to `open_prs`.
|
|
505
|
+
|
|
506
|
+
**Instructions:**
|
|
507
|
+
|
|
508
|
+
For each repo with changes:
|
|
509
|
+
|
|
510
|
+
1. **Push the branch** to origin: `git push -u origin <branch-name>` in the repo directory.
|
|
511
|
+
2. **Build the PR title:**
|
|
512
|
+
- If QA has failures (max iterations reached without passing): `[QA FAILING] <goal>`
|
|
513
|
+
- Otherwise: `<goal>`
|
|
514
|
+
- Keep under 70 characters
|
|
515
|
+
3. **Build the PR body** with these sections:
|
|
516
|
+
- `## Goal` — the requirements goal
|
|
517
|
+
- `## Acceptance Criteria` — bulleted list from `requirements.md`
|
|
518
|
+
- `## Changes` — summary of what changed in this repo
|
|
519
|
+
- `## QA Results` — passed/failed counts, failure details if any
|
|
520
|
+
4. **Create the PR** using `gh pr create --title "..." --body "..." --head <branch>` in the repo directory.
|
|
521
|
+
5. **Capture the PR URL** from the output.
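
The title rule in step 2 can be sketched in Python as follows. The function name `build_pr_title` is hypothetical, and the way an over-long title is shortened (cut and end with `...`) is an assumption, since the step only says to keep the title under 70 characters.

```python
def build_pr_title(goal, qa_failing, limit=70):
    """Build a PR title shorter than `limit`, prefixed when QA is failing."""
    prefix = "[QA FAILING] " if qa_failing else ""
    title = prefix + goal.strip()
    if len(title) < limit:
        return title
    # Assumption: shorten by cutting the tail and marking it with "...".
    # limit - 4 characters plus the three dots stays under the limit.
    return title[: limit - 4].rstrip() + "..."
```

The prefix is added before shortening so that a long goal can never push `[QA FAILING]` out of the title.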
|
|
522
|
+
|
|
523
|
+
**Output**: Record all PR URLs. Proceed to DONE.
|
|
524
|
+
|
|
525
|
+
---
|
|
526
|
+
|
|
527
|
+
### DONE
|
|
528
|
+
|
|
529
|
+
**When**: After OPEN_PRS completes.
|
|
530
|
+
|
|
531
|
+
**Instructions:**
|
|
532
|
+
|
|
533
|
+
1. Write `result.md` in the session directory using the template above.
|
|
534
|
+
2. Update `state.md`: Set Stage to `done`.
|
|
535
|
+
3. Present the final results to the user:
|
|
536
|
+
- List of PRs opened (with URLs)
|
|
537
|
+
- QA summary (passed/failed counts, iteration history)
|
|
538
|
+
- Any warnings or issues encountered
|
|
539
|
+
- Total iterations used
|
|
540
|
+
|
|
541
|
+
---
|
|
542
|
+
|
|
543
|
+
## Guardrails
|
|
544
|
+
|
|
545
|
+
### Max fix attempts per stage
|
|
546
|
+
|
|
547
|
+
If the same stage fails **3 consecutive times** (e.g., 3 failed unit test runs in a row), stop and escalate to the user:
|
|
548
|
+
|
|
549
|
+
```
|
|
550
|
+
ESCALATION: Unit tests have failed 3 consecutive times.
|
|
551
|
+
|
|
552
|
+
Latest failure:
|
|
553
|
+
<failure details>
|
|
554
|
+
|
|
555
|
+
Previous attempts:
|
|
556
|
+
Iteration 1: <what was tried>
|
|
557
|
+
Iteration 2: <what was tried>
|
|
558
|
+
Iteration 3: <what was tried>
|
|
559
|
+
|
|
560
|
+
Please review the failures and provide guidance on how to proceed.
|
|
561
|
+
```
|
|
562
|
+
|
|
563
|
+
The per-stage failure counter resets when the stage passes.
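
A minimal Python sketch of the per-stage counter, assuming outcomes are reported one stage run at a time. The class name `FailureTracker` and its method are hypothetical.

```python
class FailureTracker:
    """Counts consecutive failures per stage and flags when to escalate."""

    def __init__(self, limit=3):
        self.limit = limit
        self.consecutive = {}  # stage name -> current failure streak

    def record(self, stage, passed):
        """Record one run of `stage`; return True if it should be escalated."""
        if passed:
            self.consecutive[stage] = 0  # a pass resets the streak
            return False
        self.consecutive[stage] = self.consecutive.get(stage, 0) + 1
        return self.consecutive[stage] >= self.limit
```

For the streak to survive a crash and resume, the counts would have to be persisted with the session. The `state.md` template has no field for them, so where to store them is left open here.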
|
|
564
|
+
|
|
565
|
+
### Max total iterations
|
|
566
|
+
|
|
567
|
+
If the pipeline has completed **3 full iterations** (cycles through CODING -> UNIT_TESTS -> QA) and QA still fails, stop iterating and proceed directly to OPEN_PRS with the `[QA FAILING]` prefix on PR titles.
|
|
568
|
+
|
|
569
|
+
### Triage to REQUIREMENTS
|
|
570
|
+
|
|
571
|
+
A triage decision that jumps back to REQUIREMENTS counts as starting a new iteration, since it restarts the pipeline scope. This counts against the max total iterations limit.
|
|
572
|
+
|
|
573
|
+
### Unresolvable failures
|
|
574
|
+
|
|
575
|
+
If you cannot determine a fix or the failure is outside your capability (e.g., infrastructure issue, missing credentials, external service down), pause and escalate to the user with a clear description of:
|
|
576
|
+
- What stage failed
|
|
577
|
+
- What the failure is
|
|
578
|
+
- What you have already tried
|
|
579
|
+
- What you need from the user to continue
|
|
580
|
+
|
|
581
|
+
---
|
|
582
|
+
|
|
583
|
+
## Error Handling
|
|
584
|
+
|
|
585
|
+
- If a stage encounters an unexpected error (not a test failure, but an infrastructure/tooling error), report it clearly and pause for user input.
|
|
586
|
+
- Always show which stage failed and why.
|
|
587
|
+
- Never silently skip a stage or continue past an error without recording it.
|
|
588
|
+
- If `muggle-repos.json` is missing or malformed, stop and ask the user to fix it before proceeding.
|
|
589
|
+
- If a repo path does not exist on disk, report the error and exclude that repo (continue with remaining repos if any).
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: publish-test-to-cloud
|
|
3
|
+
description: Publish a local generation run to cloud workflow records using MCP tools.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Publish Test To Cloud
|
|
7
|
+
|
|
8
|
+
Publish a locally generated run to the Muggle AI cloud so that it appears in the cloud workflow and test result views.
|
|
9
|
+
|
|
10
|
+
## Required Tools
|
|
11
|
+
|
|
12
|
+
- `muggle-remote-auth-status`
|
|
13
|
+
- `muggle-remote-auth-login`
|
|
14
|
+
- `muggle-remote-auth-poll`
|
|
15
|
+
- `muggle-local-run-result-list`
|
|
16
|
+
- `muggle-local-run-result-get`
|
|
17
|
+
- `muggle-local-publish-test-script`
|
|
18
|
+
- `muggle-remote-local-run-upload` (advanced/manual path)
|
|
19
|
+
|
|
20
|
+
## Default Flow
|
|
21
|
+
|
|
22
|
+
1. Check auth with `muggle-remote-auth-status`.
|
|
23
|
+
2. If not authenticated, run login flow:
|
|
24
|
+
- `muggle-remote-auth-login`
|
|
25
|
+
- `muggle-remote-auth-poll` (when pending)
|
|
26
|
+
3. Find a local run:
|
|
27
|
+
- `muggle-local-run-result-list`
|
|
28
|
+
- choose a **generation** run in `passed`/`failed` state
|
|
29
|
+
4. Validate details:
|
|
30
|
+
- `muggle-local-run-result-get`
|
|
31
|
+
- ensure run has `projectId`, `useCaseId`, `cloudTestCaseId`, `executionTimeMs`, and local execution context
|
|
32
|
+
5. Publish:
|
|
33
|
+
- `muggle-local-publish-test-script` with:
|
|
34
|
+
- `runId`
|
|
35
|
+
- `cloudTestCaseId`
|
|
36
|
+
6. Return cloud identifiers and view URL from tool response.
|
|
37
|
+
|
|
38
|
+
## Notes
|
|
39
|
+
|
|
40
|
+
- `muggle-local-publish-test-script` is the preferred path. It packages local artifacts and uploads them to the cloud endpoint.
|
|
41
|
+
- `muggle-remote-local-run-upload` is available for manual/direct upload when needed.
|
|
42
|
+
- Replay runs are not publishable by this skill.
|
|
43
|
+
- Hard fail on missing required metadata; do not silently substitute values.
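
The "hard fail" rule for step 4 of the default flow can be sketched in Python as below. This is an illustration under assumptions: the run returned by `muggle-local-run-result-get` is treated as a flat dict keyed by the field names listed in that step, the function names are hypothetical, and the "local execution context" check is omitted because its field name is not given.

```python
REQUIRED_FIELDS = ("projectId", "useCaseId", "cloudTestCaseId", "executionTimeMs")


def missing_metadata(run):
    """Return the required fields that are absent or empty in `run`."""
    # A value of 0 (e.g. executionTimeMs) counts as present.
    return [field for field in REQUIRED_FIELDS if run.get(field) in (None, "")]


def require_publishable(run):
    """Raise instead of substituting values when metadata is missing."""
    missing = missing_metadata(run)
    if missing:
        raise ValueError(f"run is missing required metadata: {', '.join(missing)}")
    return run
```

Raising on the first incomplete run, rather than filling in placeholders, keeps bad identifiers from reaching the publish call.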
|