greenrun-cli 0.2.11 → 0.2.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "greenrun-cli",
-  "version": "0.2.11",
+  "version": "0.2.12",
   "description": "CLI and MCP server for Greenrun - browser test management for Claude Code",
   "type": "module",
   "main": "dist/server.js",
@@ -44,66 +44,61 @@ If all tests are scripted, skip to Step 4.

 ### Step 3: Generate scripts for unscripted tests

-For each **unscripted** test, one at a time:
+For each **unscripted** test, launch a Task agent sequentially (one at a time, wait for each to complete before starting the next). This keeps browser snapshot data out of the parent context.

-1. Call `get_test(test_id)` to fetch the full instructions
-2. Do a **scouting pass** — follow the test instructions in the browser to observe all UI states:
-   - Navigate to the test's starting page via `browser_navigate`
-   - Take a `browser_snapshot` to see initial elements
-   - Follow the test instructions step by step using Playwright MCP tools (`browser_click`, `browser_type`, `browser_snapshot` after each action)
-   - Snapshot after each state change to capture: validation errors, success banners, modal dialogs, redirected pages, dynamically loaded content
-   - Collect all observed elements and selectors as context
-
-#### Handling failures during scouting
-
-If a step doesn't work as expected during the scouting pass, investigate before moving on:
-
-1. **Determine the cause**: Is it a test problem (wrong instructions, bad selectors, missing prerequisite) or an application bug (form won't submit, unexpected error, broken functionality)?
+```
+Task tool with:
+- subagent_type: "general-purpose"
+- max_turns: 30
+- model: "sonnet"
+- prompt: (see agent prompt below)
+```

-2. **If the test is wrong** — fix and retry:
-   - Adjust the instructions to match what the UI actually requires (e.g. a required field the instructions missed, a different button label, an extra confirmation step)
-   - Update the test via `update_test` with corrected instructions
-   - Retry the failing step
+#### Script generation agent prompt

-3. **If it's an application bug** work around it and record the bug:
-   - Find a way to make the original test pass by avoiding the broken path (e.g. if a discount code field breaks form submission, leave it blank)
-   - Update the original test instructions if needed to use the working path
-   - Create a **new bug test** that reproduces the specific failure:
-     ```
-     create_test(project_id, {
-       name: "BUG: [description of the failure]",
-       instructions: "[steps that reproduce the bug, ending with the expected vs actual behaviour]",
-       tags: ["bug"],
-       page_ids: [relevant page IDs],
-       credential_name: same as original test
-     })
-     ```
-   - Start a run for the bug test and immediately complete it as failed:
-     ```
-     start_run(bug_test_id) → complete_run(run_id, "failed", "description of what went wrong")
-     ```
-   - Continue scouting the original test with the workaround
+Include the following in the prompt, substituting the actual values:

-This ensures the original test captures the happy path while bugs are tracked as separate failing tests that will show up in future runs.
+```
+Greenrun script generation for test: {test_name}
+Test ID: {test_id}
+Project ID: {project_id}

-After each test's scouting pass, close the browser so the next test starts with a clean context (no leftover cookies, storage, or page state):
+Project auth: {auth_mode}, login_url: {login_url}
+Credentials: {credential_name} — email: {email}, password: {password}

-Call `browser_close` after collecting all observations. The next `browser_navigate` call will automatically open a fresh browser context.
+## Task

-Then generate a `.spec.ts` script using the observed elements:
+1. Call `get_test("{test_id}")` to fetch the full test instructions
+2. Authenticate: navigate to {login_url} and log in with the credential above using `browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`
+3. Do a scouting pass — follow the test instructions step by step in the browser:
+   - Navigate to the test's starting page via `browser_navigate`
+   - Take a `browser_snapshot` to see initial elements
+   - Follow each instruction using Playwright MCP tools (`browser_click`, `browser_type`, `browser_snapshot` after each action)
+   - Snapshot after each state change to capture selectors, validation errors, success banners, modal dialogs, redirected pages
+4. Handle failures:
+   - If a step fails because the test instructions are wrong (wrong field name, missing step, bad selector), fix the instructions and retry. Update the test via `update_test` with corrected instructions.
+   - If a step fails because of an application bug, work around it for the main test and create a new bug test:
+     `create_test({project_id}, { name: "BUG: [description]", instructions: "[repro steps]", tags: ["bug"], page_ids: [...], credential_name: "{credential_name}" })`
+     Then: `start_run(bug_test_id)` → `complete_run(run_id, "failed", "description")`
+5. After scouting, generate a Playwright `.spec.ts` script:

-```ts
 import { test, expect } from '@playwright/test';
 test('{test_name}', async ({ page }) => {
-  // If the test has a credential_name, include login steps using the matching
-  // credential from project.credentials (email + password) at the login_url
+  // Include login steps using the credential email + password at login_url
   await page.goto('{start_url}');
-  // Steps generated from scouting pass observations
+  // Steps from scouting observations
   // Use getByRole, getByText, getByLabel, getByPlaceholder for selectors
 });
+
+6. Save: `update_test("{test_id}", { script: <generated_script>, script_generated_at: "<ISO_now>" })`
+7. Close browser: `browser_close`
+
+## Return
+
+Return a one-line summary: {test_name} | script generated | or | {test_name} | failed | {reason}
 ```

-Save via `update_test(test_id, { script: <generated_script>, script_generated_at: <ISO_now> })`.
+After each agent completes, note the result and proceed to the next unscripted test.

 ### Step 4: Export auth state

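Step 6 of the agent prompt above saves the generated script together with an ISO-8601 timestamp. A minimal TypeScript sketch of that payload shape, assuming only the field names shown in the prompt (`script`, `script_generated_at`); the interface and helper names are hypothetical, and the `update_test` MCP call itself is not made here:

```typescript
// Illustrative shape of the object Step 6 passes to update_test.
// ScriptUpdate and buildScriptUpdate are sketch names, not greenrun-cli API.
interface ScriptUpdate {
  script: string;
  script_generated_at: string; // "<ISO_now>", an ISO-8601 UTC timestamp
}

function buildScriptUpdate(script: string, now: Date = new Date()): ScriptUpdate {
  // Date.prototype.toISOString always emits UTC, e.g. 2024-01-01T00:00:00.000Z
  return { script, script_generated_at: now.toISOString() };
}

const payload = buildScriptUpdate("import { test, expect } from '@playwright/test';");
console.log(payload.script_generated_at);
```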
@@ -165,23 +160,50 @@ After parsing all native results, walk through them in completion order. Track c

 ### Step 7: AI fallback for native failures

-For tests that **failed** in native execution (and circuit breaker has not tripped), execute them one at a time using the AI agent approach:
+For tests that **failed** in native execution (and circuit breaker has not tripped), execute them one at a time via Task agents. This keeps snapshot data out of the parent context.

-1. Close the current browser context with `browser_close` so the fallback starts fresh
-2. Re-authenticate by navigating to the login page and following the Authenticate procedure
-3. For each failed test:
-   - Call `get_test(test_id)` to fetch the full instructions
-   - Start a new run via `start_run(test_id)` (the original run was already completed in Step 5)
-   - Navigate to the test's starting page via `browser_navigate`
-   - Follow the test instructions step by step using Playwright MCP tools
-   - Determine if this is a stale script (UI changed) or an actual bug
-   - If the test passes manually, invalidate the cached script: `update_test(test_id, { script: null, script_generated_at: null })`
-   - Call `complete_run(run_id, status, brief_summary)`
-   - Call `browser_close` before the next test to reset state
+For each failed test, launch a Task agent sequentially (wait for each to complete before the next):
+
+```
+Task tool with:
+- subagent_type: "general-purpose"
+- max_turns: 25
+- model: "sonnet"
+- prompt: (see agent prompt below)
+```
+
+#### AI fallback agent prompt
+
+```
+Greenrun AI fallback test. Test: {test_name}
+Test ID: {test_id}
+
+Project auth: {auth_mode}, login_url: {login_url}
+Credentials: {credential_name} — email: {email}, password: {password}
+
+Native execution failed with: {failure_message}
+
+## Task
+
+1. Call `get_test("{test_id}")` to fetch the full test instructions
+2. Start a new run: `start_run("{test_id}")` — note the run_id
+3. Authenticate: navigate to {login_url} and log in with the credential above
+4. Follow the test instructions step by step using Playwright MCP tools (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`)
+5. Determine if the native failure was a stale script (UI changed) or an actual application bug
+6. If the test passes manually, invalidate the stale cached script: `update_test("{test_id}", { script: null, script_generated_at: null })`
+7. Call `complete_run(run_id, status, brief_summary)` — ALWAYS call this, even on error
+8. Call `browser_close`
+
+## Return
+
+Return: {test_name} | {status} | {summary}
+```
+
+After each agent completes, note the result. If the agent fails to call `complete_run`, call it yourself with status "error".

 ### Step 8: Handle unscripted tests without scripts

-Any tests that didn't get scripts generated in Step 3 (e.g. if script generation failed) need to be executed the same way as Step 7 — one at a time using the AI agent approach. Follow the same pattern: get instructions, start run, execute in browser, complete run, close browser.
+Any tests that didn't get scripts generated in Step 3 (e.g. if script generation failed) need to be executed the same way as Step 7 — launch a Task agent for each one sequentially using the AI fallback agent prompt above (omit the "Native execution failed with" line).

 ## Summarize
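Both agent prompts in this diff end by returning a pipe-delimited one-liner, `{test_name} | {status} | {summary}`. A minimal sketch of how a parent session might split that line back into fields; the parser and type names are hypothetical, not part of greenrun-cli:

```typescript
// Hypothetical parser for the one-line agent return "{test_name} | {status} | {summary}".
// Any extra pipes inside the summary are preserved by re-joining the tail segments.
interface AgentResult {
  testName: string;
  status: string;
  summary: string;
}

function parseAgentResult(line: string): AgentResult {
  const parts = line.split("|").map((s) => s.trim());
  const [testName, status, ...rest] = parts;
  return { testName: testName ?? "", status: status ?? "", summary: rest.join(" | ") };
}

const r = parseAgentResult("Checkout happy path | failed | discount field blocks submit");
console.log(r.status); // prints "failed"
```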