greenrun-cli 0.2.11 → 0.2.13
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/templates/commands/procedures.md +102 -66
package/package.json
CHANGED

package/templates/commands/procedures.md
CHANGED

@@ -44,66 +44,61 @@ If all tests are scripted, skip to Step 4.
 
 ### Step 3: Generate scripts for unscripted tests
 
-For each **unscripted** test, one at a time
+For each **unscripted** test, launch a Task agent sequentially (one at a time, wait for each to complete before starting the next). This keeps browser snapshot data out of the parent context.
 
-
-
-
-
-
-
-
-
-#### Handling failures during scouting
-
-If a step doesn't work as expected during the scouting pass, investigate before moving on:
-
-1. **Determine the cause**: Is it a test problem (wrong instructions, bad selectors, missing prerequisite) or an application bug (form won't submit, unexpected error, broken functionality)?
+```
+Task tool with:
+- subagent_type: "general-purpose"
+- max_turns: 30
+- model: "sonnet"
+- prompt: (see agent prompt below)
+```
 
-
-- Adjust the instructions to match what the UI actually requires (e.g. a required field the instructions missed, a different button label, an extra confirmation step)
-- Update the test via `update_test` with corrected instructions
-- Retry the failing step
+#### Script generation agent prompt
 
-
-- Find a way to make the original test pass by avoiding the broken path (e.g. if a discount code field breaks form submission, leave it blank)
-- Update the original test instructions if needed to use the working path
-- Create a **new bug test** that reproduces the specific failure:
-```
-create_test(project_id, {
-  name: "BUG: [description of the failure]",
-  instructions: "[steps that reproduce the bug, ending with the expected vs actual behaviour]",
-  tags: ["bug"],
-  page_ids: [relevant page IDs],
-  credential_name: same as original test
-})
-```
-- Start a run for the bug test and immediately complete it as failed:
-```
-start_run(bug_test_id) → complete_run(run_id, "failed", "description of what went wrong")
-```
-- Continue scouting the original test with the workaround
+Include the following in the prompt, substituting the actual values:
 
-
+```
+Greenrun script generation for test: {test_name}
+Test ID: {test_id}
+Project ID: {project_id}
 
-
+Project auth: {auth_mode}, login_url: {login_url}
+Credentials: {credential_name} — email: {email}, password: {password}
 
-
+## Task
 
-
+1. Call `get_test("{test_id}")` to fetch the full test instructions
+2. Authenticate: navigate to {login_url} and log in with the credential above using `browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`
+3. Do a scouting pass — follow the test instructions step by step in the browser:
+   - Navigate to the test's starting page via `browser_navigate`
+   - Take a `browser_snapshot` to see initial elements
+   - Follow each instruction using Playwright MCP tools (`browser_click`, `browser_type`, `browser_snapshot` after each action)
+   - Snapshot after each state change to capture selectors, validation errors, success banners, modal dialogs, redirected pages
+4. Handle failures:
+   - If a step fails because the test instructions are wrong (wrong field name, missing step, bad selector), fix the instructions and retry. Update the test via `update_test` with corrected instructions.
+   - If a step fails because of an application bug, work around it for the main test and create a new bug test:
+     `create_test({project_id}, { name: "BUG: [description]", instructions: "[repro steps]", tags: ["bug"], page_ids: [...], credential_name: "{credential_name}" })`
+     Then: `start_run(bug_test_id)` → `complete_run(run_id, "failed", "description")`
+5. After scouting, generate a Playwright `.spec.ts` script:
 
-```ts
 import { test, expect } from '@playwright/test';
 test('{test_name}', async ({ page }) => {
-//
-// credential from project.credentials (email + password) at the login_url
+// Include login steps using the credential email + password at login_url
 await page.goto('{start_url}');
-// Steps
+// Steps from scouting observations
 // Use getByRole, getByText, getByLabel, getByPlaceholder for selectors
 });
+
+6. Save: `update_test("{test_id}", { script: <generated_script>, script_generated_at: "<ISO_now>" })`
+7. Close browser: `browser_close`
+
+## Return
+
+Return a one-line summary: {test_name} | script generated | or | {test_name} | failed | {reason}
 ```
 
-
+After each agent completes, note the result and proceed to the next unscripted test.
 
 ### Step 4: Export auth state
 
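The prompt added in this hunk is a template whose `{placeholder}` fields the parent agent fills in before launching each Task. A minimal sketch of that substitution, assuming a simple `{name}`-style slot syntax (the `fillPrompt` helper is illustrative, not part of greenrun-cli):

```typescript
// Illustrative helper, not part of greenrun-cli: fill {name}-style slots
// in the agent prompt template before launching the Task agent.
function fillPrompt(template: string, values: Record<string, string>): string {
  // Unknown placeholders are left intact rather than replaced with "undefined".
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    values[key] ?? match,
  );
}

const header =
  "Greenrun script generation for test: {test_name}\n" +
  "Test ID: {test_id}\n" +
  "Project ID: {project_id}";

console.log(fillPrompt(header, {
  test_name: "login works",
  test_id: "abc123",
  project_id: "proj-9",
}));
```

Leaving unknown placeholders intact means literal braces elsewhere in the prompt (for example in the embedded `create_test` call) survive the substitution pass.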
@@ -123,9 +118,22 @@ Call this via `browser_run_code`. If `auth_mode` is `none`, skip this step.
 
 Gather all tests that have scripts (previously scripted + newly generated from Step 3).
 
-1.
+**1. Fetch scripts and write test files** — launch one Task agent per scripted test, all in parallel. These are just API calls + file writes so they don't conflict. Also write the Playwright config directly (it's small).
+
+For each scripted test, launch in parallel:
 
-
+```
+Task tool with:
+- subagent_type: "general-purpose"
+- max_turns: 5
+- model: "haiku"
+- prompt: "Fetch the Playwright script for test {test_id} and write it to a file.
+  1. Call `get_test(\"{test_id}\")` to fetch the test
+  2. Write the `script` field to `/tmp/greenrun-tests/{test_id}.spec.ts` using the Write tool
+  3. Return: \"{test_name} | written\""
+```
+
+While the agents run, write `/tmp/greenrun-tests/playwright.config.ts` directly:
 
 ```ts
 import { defineConfig } from '@playwright/test';
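The parallel fan-out in step 1 can be sketched in-process. Everything below is a stand-in: `getTest` mimics the `get_test` MCP call against a fake registry, and the real flow launches one Task agent per test rather than one promise per test.

```typescript
// Sketch of step 1's parallel fan-out. `getTest` is a hypothetical stand-in
// for the get_test MCP call; the real flow launches one Task agent per test.
type TestRecord = { id: string; name: string; script: string };

const fakeRegistry: TestRecord[] = [
  { id: "abc123", name: "login works", script: "// spec body" },
  { id: "def456", name: "checkout", script: "// spec body" },
];

async function getTest(id: string): Promise<TestRecord> {
  const t = fakeRegistry.find((r) => r.id === id);
  if (!t) throw new Error(`unknown test ${id}`);
  return t;
}

// One task per scripted test: fetch the script, pair it with its target path.
async function fetchAndPlan(id: string): Promise<{ path: string; script: string }> {
  const t = await getTest(id);
  return { path: `/tmp/greenrun-tests/${t.id}.spec.ts`, script: t.script };
}

// All tasks run concurrently; each writes a distinct file, so none conflict.
Promise.all(["abc123", "def456"].map(fetchAndPlan)).then((files) => {
  for (const f of files) console.log(f.path);
});
```

The design point the diff is making holds here too: because each task touches only its own `{test_id}.spec.ts`, the tasks share no state and can safely run in parallel.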
@@ -136,23 +144,24 @@ export default defineConfig({
   reporter: [['json', { outputFile: 'results.json' }]],
   use: {
     baseURL: '{base_url}',
-
+    // include storageState ONLY if auth_mode is not 'none':
+    storageState: '/tmp/greenrun-auth-state.json',
   },
 });
 ```
 
-
+Wait for all agents to complete before executing.
 
-
+**2. Execute** — run via Bash:
 ```
 npx playwright test --config /tmp/greenrun-tests/playwright.config.ts
 ```
 
-
+**3. Parse results**: Read `/tmp/greenrun-tests/results.json`. Map each result back to a run ID via the filename: `{test_id}.spec.ts` → test_id → find the matching run_id from the batch.
 
-
+**4. Report results**: Call `complete_run(run_id, status, result_summary)` for each test. Map Playwright statuses: `passed` → `passed`, `failed`/`timedOut` → `failed`, other → `error`.
 
-
+**5. Clean up**: Call `browser_close` to reset the MCP browser context.
 
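Steps 3 and 4 of this hunk can be sketched as below. The flat `results` array is a simplified stand-in for Playwright's JSON-reporter output (the real `results.json` nests results under suites and specs), and `runIdByTestId` is assumed to have been recorded when the batch of runs was started.

```typescript
// Sketch of steps 3-4: map native results back to run IDs and statuses.
// `results` is a flattened stand-in for Playwright's JSON reporter output.
type NativeResult = { file: string; status: string };

// Assumed to have been collected when the batch of runs was started.
const runIdByTestId: Record<string, string> = { abc123: "run-1", def456: "run-2" };

// `{test_id}.spec.ts` → test_id
function testIdFromFile(file: string): string {
  const parts = file.split("/");
  return parts[parts.length - 1].replace(/\.spec\.ts$/, "");
}

// Playwright status → greenrun run status
function mapStatus(s: string): "passed" | "failed" | "error" {
  if (s === "passed") return "passed";
  if (s === "failed" || s === "timedOut") return "failed";
  return "error";
}

const results: NativeResult[] = [
  { file: "/tmp/greenrun-tests/abc123.spec.ts", status: "passed" },
  { file: "/tmp/greenrun-tests/def456.spec.ts", status: "timedOut" },
];

for (const r of results) {
  const runId = runIdByTestId[testIdFromFile(r.file)];
  // Here the orchestrator would call complete_run(runId, mapStatus(r.status), summary).
  console.log(`${runId}: ${mapStatus(r.status)}`);
}
```

Note that `timedOut` deliberately maps to `failed` rather than `error`: a timeout is a test outcome, while `error` is reserved for statuses the mapping doesn't recognize.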
 ### Step 6: Circuit breaker
 
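The circuit-breaker rules themselves fall outside this diff's changed lines (the hunk header only shows "walk through them in completion order. Track c…"), so the following is only a guess at the shape: a consecutive-failure counter with an assumed threshold of 3. Both the trip condition and the threshold are assumptions, not confirmed by the diff.

```typescript
// Sketch only: trip condition and threshold are assumed, not taken from
// the package. Walks results in completion order, counting consecutive
// non-passing outcomes and tripping once the (assumed) threshold is hit.
function tripsBreaker(statuses: string[], threshold = 3): boolean {
  let consecutive = 0;
  for (const s of statuses) {
    consecutive = s === "passed" ? 0 : consecutive + 1;
    if (consecutive >= threshold) return true;
  }
  return false;
}
```

A pass resets the counter, so isolated failures scattered through the batch never trip the breaker; only an unbroken run of failures does.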
@@ -165,23 +174,50 @@ After parsing all native results, walk through them in completion order. Track c
 
 ### Step 7: AI fallback for native failures
 
-For tests that **failed** in native execution (and circuit breaker has not tripped), execute them one at a time
+For tests that **failed** in native execution (and circuit breaker has not tripped), execute them one at a time via Task agents. This keeps snapshot data out of the parent context.
 
-
-
-
-
-
-
-
-
-
-
-
+For each failed test, launch a Task agent sequentially (wait for each to complete before the next):
+
+```
+Task tool with:
+- subagent_type: "general-purpose"
+- max_turns: 25
+- model: "sonnet"
+- prompt: (see agent prompt below)
+```
+
+#### AI fallback agent prompt
+
+```
+Greenrun AI fallback test. Test: {test_name}
+Test ID: {test_id}
+
+Project auth: {auth_mode}, login_url: {login_url}
+Credentials: {credential_name} — email: {email}, password: {password}
+
+Native execution failed with: {failure_message}
+
+## Task
+
+1. Call `get_test("{test_id}")` to fetch the full test instructions
+2. Start a new run: `start_run("{test_id}")` — note the run_id
+3. Authenticate: navigate to {login_url} and log in with the credential above
+4. Follow the test instructions step by step using Playwright MCP tools (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`)
+5. Determine if the native failure was a stale script (UI changed) or an actual application bug
+6. If the test passes manually, invalidate the stale cached script: `update_test("{test_id}", { script: null, script_generated_at: null })`
+7. Call `complete_run(run_id, status, brief_summary)` — ALWAYS call this, even on error
+8. Call `browser_close`
+
+## Return
+
+Return: {test_name} | {status} | {summary}
+```
+
+After each agent completes, note the result. If the agent fails to call `complete_run`, call it yourself with status "error".
 
 ### Step 8: Handle unscripted tests without scripts
 
-Any tests that didn't get scripts generated in Step 3 (e.g. if script generation failed) need to be executed the same way as Step 7 —
+Any tests that didn't get scripts generated in Step 3 (e.g. if script generation failed) need to be executed the same way as Step 7 — launch a Task agent for each one sequentially using the AI fallback agent prompt above (omit the "Native execution failed with" line).
 
 ## Summarize
 