greenrun-cli 0.2.8 → 0.2.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "greenrun-cli",
-  "version": "0.2.8",
+  "version": "0.2.9",
   "description": "CLI and MCP server for Greenrun - browser test management for Claude Code",
   "type": "module",
   "main": "dist/server.js",
@@ -63,6 +63,18 @@ Don't ask the user for information you can derive from the codebase (base URL, l
 3. Use `create_page` to register the page URL if not already registered
 4. Use `create_test` with the instructions and page IDs
 
+### Bug Detection During Test Creation
+
+When exploring pages to write tests, if something doesn't work as expected:
+
+- **If the test steps are wrong** (wrong field names, missing prerequisite, bad selectors) -- fix the instructions and retry. Always try to make the test work before giving up.
+- **If there's a real application bug** (form won't submit with certain data, unexpected error, broken feature):
+  1. Adjust the original test to work around the bug so it captures the happy path
+  2. Create a **separate bug test** tagged `bug` with steps that reproduce the failure, describing expected vs actual behaviour
+  3. Start a run for the bug test and complete it as `failed` with a summary of what went wrong
+
+This way the main test suite tracks working functionality while bugs are captured as individual failing tests.
+
 ### Impact Analysis
 
 After making code changes, use the `/greenrun-sweep` command or the `sweep` tool to find which tests are affected by the pages you changed. This helps you run only the relevant tests.
@@ -23,7 +23,9 @@ If auth fails (login form still visible after following instructions), report al
 
 ## Execute
 
-You have a batch result from `prepare_test_batch` containing `project` (with `credentials` array) and `tests[]` (each with `test_id`, `test_name`, `run_id`, `instructions`, `credential_name`, `pages`, `tags`, `script`, `script_generated_at`).
+You have a batch result from `prepare_test_batch` containing `project` (with `credentials` array) and `tests[]` (each with `test_id`, `test_name`, `run_id`, `instructions`, `credential_name`, `pages`, `tags`, `has_script`).
+
+Note: `has_script` is a boolean indicating whether a cached Playwright script exists. To fetch the actual script content, call `get_test(test_id)` — only do this when you need the script (e.g. in Step 5 when writing test files).
 
 If `tests` is empty, tell the user no matching active tests were found and stop.
 
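The lazy-fetch pattern added in the hunk above can be sketched as follows. This is commentary, not package code: `TestEntry` is a trimmed-down assumption of the batch entry shape, and `getTest` stands in for the MCP `get_test` tool, whose exact return shape is also an assumption here.

```typescript
// Trimmed-down batch entry: the real entries carry more fields
// (test_name, run_id, instructions, credential_name, pages, tags).
interface TestEntry {
  test_id: string;
  has_script: boolean;
}

// Stand-in for the MCP get_test tool (return shape assumed).
type GetTest = (testId: string) => Promise<{ script: string | null }>;

async function scriptFor(test: TestEntry, getTest: GetTest): Promise<string | null> {
  // has_script is only a flag; the script body is not in the batch result.
  if (!test.has_script) return null; // nothing cached; needs generation
  const { script } = await getTest(test.test_id); // fetch only when actually needed
  return script;
}
```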
@@ -35,8 +37,8 @@ Run the Authenticate procedure above once, using the standard Playwright tools (
 
 Split the batch into two groups:
 
-- **scripted**: tests where `script` is non-null (cached Playwright scripts ready to run)
-- **unscripted**: tests where `script` is null (need script generation)
+- **scripted**: tests where `has_script` is true (cached Playwright scripts ready to run)
+- **unscripted**: tests where `has_script` is false (need script generation)
 
 If all tests are scripted, skip to Step 4.
 
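The two-group split above amounts to a single partition over `has_script`. A minimal sketch (commentary, not package code), assuming only the fields it needs from the batch entries:

```typescript
// Minimal assumed shape; real batch entries carry more fields.
interface BatchEntry {
  test_id: string;
  has_script: boolean;
}

function splitBatch(tests: BatchEntry[]): { scripted: BatchEntry[]; unscripted: BatchEntry[] } {
  return {
    scripted: tests.filter((t) => t.has_script),    // cached Playwright scripts, run natively
    unscripted: tests.filter((t) => !t.has_script), // need a scouting pass and script generation
  };
}
```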
@@ -60,6 +62,42 @@ For each unscripted test (in difficulty order), do a **scouting pass** — actua
 4. Snapshot after each state change to capture: validation errors, success banners, modal dialogs, redirected pages, dynamically loaded content
 5. Collect all observed elements and selectors as context
 
+#### Handling failures during scouting
+
+If a step doesn't work as expected during the scouting pass, investigate before moving on:
+
+1. **Determine the cause**: Is it a test problem (wrong instructions, bad selectors, missing prerequisite) or an application bug (form won't submit, unexpected error, broken functionality)?
+
+2. **If the test is wrong** — fix and retry:
+   - Adjust the instructions to match what the UI actually requires (e.g. a required field the instructions missed, a different button label, an extra confirmation step)
+   - Update the test via `update_test` with corrected instructions
+   - Retry the failing step
+
+3. **If it's an application bug** — work around it and record the bug:
+   - Find a way to make the original test pass by avoiding the broken path (e.g. if a discount code field breaks form submission, leave it blank)
+   - Update the original test instructions if needed to use the working path
+   - Create a **new bug test** that reproduces the specific failure:
+     ```
+     create_test(project_id, {
+       name: "BUG: [description of the failure]",
+       instructions: "[steps that reproduce the bug, ending with the expected vs actual behaviour]",
+       tags: ["bug"],
+       page_ids: [relevant page IDs],
+       credential_name: same as original test
+     })
+     ```
+   - Start a run for the bug test and immediately complete it as failed:
+     ```
+     start_run(bug_test_id) → complete_run(run_id, "failed", "description of what went wrong")
+     ```
+   - Continue scouting the original test with the workaround
+
+This ensures the original test captures the happy path while bugs are tracked as separate failing tests that will show up in future runs.
+
+After each test's scouting pass, close the browser so the next test starts with a clean context (no leftover cookies, storage, or page state):
+
+Call `browser_close` after collecting all observations. The next `browser_navigate` call will automatically open a fresh browser context.
+
 Then generate a `.spec.ts` script using the observed elements:
 
 ```ts
@@ -108,7 +146,7 @@ Call this via `browser_run_code`. If `auth_mode` is `none`, skip this step.
 
 Gather all tests that have scripts (previously scripted + newly generated from Step 3).
 
-1. **Write test files**: For each scripted test, write the script to `/tmp/greenrun-tests/{test_id}.spec.ts`
+1. **Fetch and write test files**: For each scripted test, call `get_test(test_id)` to retrieve the full script content, then write it to `/tmp/greenrun-tests/{test_id}.spec.ts`. Fetch scripts in parallel to minimize latency.
 
 2. **Write config**: Write `/tmp/greenrun-tests/playwright.config.ts`:
 
@@ -137,6 +175,12 @@ npx playwright test --config /tmp/greenrun-tests/playwright.config.ts
 
 5. **Report results**: Call `complete_run(run_id, status, result_summary)` for each test. Map Playwright statuses: `passed` → `passed`, `failed`/`timedOut` → `failed`, other → `error`.
 
+6. **Clean up browsers**: After native execution completes, close any browsers left behind by the test runner:
+   ```bash
+   npx playwright test --config /tmp/greenrun-tests/playwright.config.ts --list 2>/dev/null; true
+   ```
+   The Playwright Test runner normally cleans up after itself, but if tests crash or timeout, browser processes may linger. Also call `browser_close` to reset the MCP browser context before any subsequent AI fallback execution.
+
 ### Step 6: Handle unscripted tests without scripts
 
 Any tests that still don't have scripts (e.g. because the background agent hasn't finished, or script generation failed) need to be executed via AI agents using the legacy approach. Follow Step 7 for these tests.
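The status mapping in step 5 of the hunk above is a small total function. A sketch (commentary, not package code): the `PlaywrightStatus` values are Playwright's documented test result statuses, while the `RunStatus` union is inferred from the text.

```typescript
// Playwright's documented test result statuses.
type PlaywrightStatus = "passed" | "failed" | "timedOut" | "skipped" | "interrupted";
// Greenrun run statuses, as inferred from the instructions.
type RunStatus = "passed" | "failed" | "error";

function mapStatus(status: PlaywrightStatus): RunStatus {
  switch (status) {
    case "passed":
      return "passed";
    case "failed":
    case "timedOut":
      return "failed";
    default:
      return "error"; // any other status reports as error, per step 5
  }
}
```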
@@ -154,8 +198,10 @@ After parsing all native results, walk through them in completion order. Track c
 
 For tests that **failed** in native execution (and circuit breaker has not tripped):
 
-1. Start new runs via `start_run(test_id)` (the original runs were already completed in Step 5)
-2. Launch background Task agents using the tab-isolation pattern:
+1. Close the current browser context with `browser_close` so the fallback starts fresh
+2. Re-authenticate by navigating to the login page and following the Authenticate procedure
+3. Start new runs via `start_run(test_id)` (the original runs were already completed in Step 5)
+4. Launch background Task agents using the tab-isolation pattern:
 
 Create tabs and launch agents in batches of 20:
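The "batches of 20" loop above reduces to a generic chunking helper. A sketch (an assumption for illustration, not part of the package): each batch would get its own set of tabs and background Task agents.

```typescript
// Split a list of tests into fixed-size batches; the last batch may be smaller.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```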