greenrun-cli 0.2.8 → 0.2.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

package/package.json (CHANGED)
package/templates/claude-md.md (CHANGED)
@@ -63,6 +63,18 @@ Don't ask the user for information you can derive from the codebase (base URL, l
 3. Use `create_page` to register the page URL if not already registered
 4. Use `create_test` with the instructions and page IDs
 
+### Bug Detection During Test Creation
+
+When exploring pages to write tests, if something doesn't work as expected:
+
+- **If the test steps are wrong** (wrong field names, missing prerequisite, bad selectors) -- fix the instructions and retry. Always try to make the test work before giving up.
+- **If there's a real application bug** (form won't submit with certain data, unexpected error, broken feature):
+  1. Adjust the original test to work around the bug so it captures the happy path
+  2. Create a **separate bug test** tagged `bug` with steps that reproduce the failure, describing expected vs actual behaviour
+  3. Start a run for the bug test and complete it as `failed` with a summary of what went wrong
+
+This way the main test suite tracks working functionality while bugs are captured as individual failing tests.
+
 ### Impact Analysis
 
 After making code changes, use the `/greenrun-sweep` command or the `sweep` tool to find which tests are affected by the pages you changed. This helps you run only the relevant tests.
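The bug-capture loop added in the hunk above (create a tagged bug test, then immediately record a failed run) can be sketched in TypeScript. This is a hypothetical illustration, not the real tool API: `createTest`, `startRun`, and `completeRun` stand in for the MCP tools `create_test`, `start_run`, and `complete_run`, and their signatures here are assumptions.

```typescript
// Hypothetical stand-ins for the MCP tools create_test / start_run / complete_run.
// The signatures are assumptions for illustration only.
interface Tools {
  createTest(input: { name: string; instructions: string; tags: string[] }): Promise<{ test_id: string }>;
  startRun(testId: string): Promise<{ run_id: string }>;
  completeRun(runId: string, status: "failed", summary: string): Promise<void>;
}

// Record an application bug as a separate failing test, per the workflow above.
async function recordBug(tools: Tools, description: string, reproSteps: string): Promise<string> {
  // 1. Register a separate test tagged "bug" whose steps reproduce the failure
  const { test_id } = await tools.createTest({
    name: `BUG: ${description}`,
    instructions: reproSteps,
    tags: ["bug"],
  });
  // 2. Immediately record a failed run so the bug shows up in the suite
  const { run_id } = await tools.startRun(test_id);
  await tools.completeRun(run_id, "failed", description);
  return test_id;
}
```

Injecting the tools as an interface keeps the sketch self-contained and testable without a live MCP session.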
@@ -23,7 +23,9 @@ If auth fails (login form still visible after following instructions), report al
 
 ## Execute
 
-You have a batch result from `prepare_test_batch` containing `project` (with `credentials` array) and `tests[]` (each with `test_id`, `test_name`, `run_id`, `instructions`, `credential_name`, `pages`, `tags`, `
+You have a batch result from `prepare_test_batch` containing `project` (with `credentials` array) and `tests[]` (each with `test_id`, `test_name`, `run_id`, `instructions`, `credential_name`, `pages`, `tags`, `has_script`).
+
+Note: `has_script` is a boolean indicating whether a cached Playwright script exists. To fetch the actual script content, call `get_test(test_id)` — only do this when you need the script (e.g. in Step 5 when writing test files).
 
 If `tests` is empty, tell the user no matching active tests were found and stop.
 
@@ -35,8 +37,8 @@ Run the Authenticate procedure above once, using the standard Playwright tools (
 
 Split the batch into two groups:
 
-- **scripted**: tests where `
-- **unscripted**: tests where `
+- **scripted**: tests where `has_script` is true (cached Playwright scripts ready to run)
+- **unscripted**: tests where `has_script` is false (need script generation)
 
 If all tests are scripted, skip to Step 4.
 
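The scripted/unscripted split introduced in this hunk is a plain partition on the `has_script` flag. A minimal TypeScript sketch, where the `BatchTest` shape is an assumption based on the fields the template lists for `tests[]`:

```typescript
// Assumed subset of the fields prepare_test_batch returns per test.
interface BatchTest {
  test_id: string;
  test_name: string;
  has_script: boolean;
}

// Partition the batch: scripted tests run natively, unscripted ones need scouting.
function splitBatch(tests: BatchTest[]): { scripted: BatchTest[]; unscripted: BatchTest[] } {
  return {
    scripted: tests.filter((t) => t.has_script),
    unscripted: tests.filter((t) => !t.has_script),
  };
}
```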
@@ -60,6 +62,42 @@ For each unscripted test (in difficulty order), do a **scouting pass** — actua
 4. Snapshot after each state change to capture: validation errors, success banners, modal dialogs, redirected pages, dynamically loaded content
 5. Collect all observed elements and selectors as context
 
+#### Handling failures during scouting
+
+If a step doesn't work as expected during the scouting pass, investigate before moving on:
+
+1. **Determine the cause**: Is it a test problem (wrong instructions, bad selectors, missing prerequisite) or an application bug (form won't submit, unexpected error, broken functionality)?
+
+2. **If the test is wrong** — fix and retry:
+   - Adjust the instructions to match what the UI actually requires (e.g. a required field the instructions missed, a different button label, an extra confirmation step)
+   - Update the test via `update_test` with corrected instructions
+   - Retry the failing step
+
+3. **If it's an application bug** — work around it and record the bug:
+   - Find a way to make the original test pass by avoiding the broken path (e.g. if a discount code field breaks form submission, leave it blank)
+   - Update the original test instructions if needed to use the working path
+   - Create a **new bug test** that reproduces the specific failure:
+     ```
+     create_test(project_id, {
+       name: "BUG: [description of the failure]",
+       instructions: "[steps that reproduce the bug, ending with the expected vs actual behaviour]",
+       tags: ["bug"],
+       page_ids: [relevant page IDs],
+       credential_name: same as original test
+     })
+     ```
+   - Start a run for the bug test and immediately complete it as failed:
+     ```
+     start_run(bug_test_id) → complete_run(run_id, "failed", "description of what went wrong")
+     ```
+   - Continue scouting the original test with the workaround
+
+This ensures the original test captures the happy path while bugs are tracked as separate failing tests that will show up in future runs.
+
+After each test's scouting pass, close the browser so the next test starts with a clean context (no leftover cookies, storage, or page state):
+
+Call `browser_close` after collecting all observations. The next `browser_navigate` call will automatically open a fresh browser context.
+
 Then generate a `.spec.ts` script using the observed elements:
 
 ```ts
@@ -108,7 +146,7 @@ Call this via `browser_run_code`. If `auth_mode` is `none`, skip this step.
 
 Gather all tests that have scripts (previously scripted + newly generated from Step 3).
 
-1. **
+1. **Fetch and write test files**: For each scripted test, call `get_test(test_id)` to retrieve the full script content, then write it to `/tmp/greenrun-tests/{test_id}.spec.ts`. Fetch scripts in parallel to minimize latency.
 
 2. **Write config**: Write `/tmp/greenrun-tests/playwright.config.ts`:
 
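The parallel fetch-and-write step added in this hunk can be sketched as below. `getTest` is a stand-in for the `get_test` MCP tool and its return shape is an assumption; the file layout follows the `/tmp/greenrun-tests/{test_id}.spec.ts` convention from the hunk.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Stand-in for the get_test MCP tool; the { script } return shape is assumed.
type GetTest = (testId: string) => Promise<{ script: string }>;

// Fetch all scripts concurrently and write each to {test_id}.spec.ts.
async function writeTestFiles(
  testIds: string[],
  getTest: GetTest,
  dir = "/tmp/greenrun-tests"
): Promise<string[]> {
  await mkdir(dir, { recursive: true });
  // Promise.all issues the get_test calls in parallel to minimize latency
  return Promise.all(
    testIds.map(async (id) => {
      const { script } = await getTest(id);
      const path = join(dir, `${id}.spec.ts`);
      await writeFile(path, script, "utf8");
      return path;
    })
  );
}
```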
@@ -137,6 +175,12 @@ npx playwright test --config /tmp/greenrun-tests/playwright.config.ts
 
 5. **Report results**: Call `complete_run(run_id, status, result_summary)` for each test. Map Playwright statuses: `passed` → `passed`, `failed`/`timedOut` → `failed`, other → `error`.
 
+6. **Clean up browsers**: After native execution completes, close any browsers left behind by the test runner:
+   ```bash
+   npx playwright test --config /tmp/greenrun-tests/playwright.config.ts --list 2>/dev/null; true
+   ```
+   The Playwright Test runner normally cleans up after itself, but if tests crash or timeout, browser processes may linger. Also call `browser_close` to reset the MCP browser context before any subsequent AI fallback execution.
+
 ### Step 6: Handle unscripted tests without scripts
 
 Any tests that still don't have scripts (e.g. because the background agent hasn't finished, or script generation failed) need to be executed via AI agents using the legacy approach. Follow Step 7 for these tests.
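The status mapping in step 5 of this hunk (`passed` → `passed`, `failed`/`timedOut` → `failed`, anything else → `error`) is a small total function. A minimal TypeScript sketch:

```typescript
// Run statuses accepted by complete_run, per the template's step 5.
type RunStatus = "passed" | "failed" | "error";

// Map a Playwright test-result status string onto a run status.
function mapPlaywrightStatus(status: string): RunStatus {
  switch (status) {
    case "passed":
      return "passed";
    case "failed":
    case "timedOut":
      return "failed";
    default:
      // skipped, interrupted, or anything unexpected
      return "error";
  }
}
```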
@@ -154,8 +198,10 @@ After parsing all native results, walk through them in completion order. Track c
 
 For tests that **failed** in native execution (and circuit breaker has not tripped):
 
-1.
-2.
+1. Close the current browser context with `browser_close` so the fallback starts fresh
+2. Re-authenticate by navigating to the login page and following the Authenticate procedure
+3. Start new runs via `start_run(test_id)` (the original runs were already completed in Step 5)
+4. Launch background Task agents using the tab-isolation pattern:
 
 Create tabs and launch agents in batches of 20:
 