@allenpan2026/harshjudge 0.4.4 → 0.4.5

@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "harshjudge",
3
- "version": "0.4.4",
3
+ "version": "0.4.5",
4
4
  "description": "AI-native E2E testing orchestration for Claude Code",
5
5
  "author": {
6
6
  "name": "Allen Pan"
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@allenpan2026/harshjudge",
3
- "version": "0.4.4",
3
+ "version": "0.4.5",
4
4
  "description": "AI-native E2E testing orchestration CLI for Claude Code.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",
@@ -1,11 +1,11 @@
1
1
  ---
2
2
  name: harshjudge
3
- description: AI-native E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios with visual evidence capture. Activates for tasks involving E2E tests, browser automation testing, test scenario creation, test execution with screenshots, or checking test status.
3
+ description: E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios for frontend (browser), backend (API), or CLI. Activates for tasks involving E2E tests, test scenario creation, test execution with evidence capture, or checking test status.
4
4
  ---
5
5
 
6
6
  # HarshJudge E2E Testing
7
7
 
8
- AI-native E2E testing with CLI commands and visual evidence capture.
8
+ AI-native E2E testing with CLI commands and evidence capture.
9
9
 
10
10
  ## CLI Setup
11
11
 
@@ -20,7 +20,7 @@ alias harshjudge="npx @allenpan2026/harshjudge@latest"
20
20
 
21
21
  ## Core Principles
22
22
 
23
- 1. **Evidence First**: Screenshot before and after every action
23
+ 1. **Evidence First**: Capture evidence appropriate to the step type — screenshots for frontend, response bodies for API, stdout for CLI
24
24
  2. **Fail Fast**: Stop on error, report with context
25
25
  3. **Complete Runs**: Always call `harshjudge complete-run`, even on failure
26
26
  4. **Step Isolation**: Each step executes in its own spawned agent for token efficiency
@@ -108,27 +108,19 @@ Main Agent Step Agents (spawned per step)
108
108
  | `harshjudge discover search <pattern>` | Search file content |
109
109
  | `harshjudge dashboard open/close/status` | Manage dashboard server |
110
110
 
111
- ### Browser Automation (Auto-Detect)
111
+ ### Step Types
112
112
 
113
- Before running a scenario, detect which browser tool is available by checking for these tools in order:
113
+ Each step declares its execution mode via `type` in the step file frontmatter:
114
114
 
115
- 1. **Playwright MCP**: look for `browser_navigate` tool
116
- 2. **browser-use**: look for `browser_use` tool or `browser-use` CLI
117
- 3. **cmux-browser**: look for cmux browser surfaces
115
+ | Type | Tools | Evidence Captured |
116
+ |------|-------|-------------------|
117
+ | `frontend` | Browser tool (auto-detected) | screenshot, console_log, network_log, html_snapshot |
118
+ | `backend` | Bash (curl/httpie) | api_response, api_headers, db_snapshot |
119
+ | `cli` | Bash | stdout, stderr, exit_code |
118
120
 
119
- Use whichever is found first. The step agent needs these actions:
121
+ If `type` is omitted, the step agent infers it from the step content.
120
122
 
121
- | Action | What to do |
122
- |--------|-----------|
123
- | Navigate | Go to a URL |
124
- | Inspect | Get page state before interacting |
125
- | Click | Click element by text/role/ref |
126
- | Type | Enter text into input |
127
- | Screenshot | Capture page as image file |
128
- | Wait | Wait for text/element/timeout |
129
- | Console | Read browser console output |
130
-
131
- See [run-browser.md](references/run-browser.md) for tool-specific syntax.
123
+ See [run-tools.md](references/run-tools.md) for tool-specific guidance per type.
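
As a rough illustration of the step-type rule in this hunk, the sketch below reads `type` from a step file's frontmatter and falls back to a keyword heuristic when it is absent. The evidence lists are copied from the table; the function names and the keyword patterns in `inferStepType` are assumptions made for illustration, since the diff only says that the step agent infers the type and does not show how.

```typescript
type StepType = "frontend" | "backend" | "cli";

// Evidence types per step type, copied from the Step Types table.
const EVIDENCE_BY_TYPE: Record<StepType, readonly string[]> = {
  frontend: ["screenshot", "console_log", "network_log", "html_snapshot"],
  backend: ["api_response", "api_headers", "db_snapshot"],
  cli: ["stdout", "stderr", "exit_code"],
};

function isStepType(value: string): value is StepType {
  return value === "frontend" || value === "backend" || value === "cli";
}

// Read `type:` from a leading `---` frontmatter block, if present.
function readDeclaredType(stepFile: string): StepType | undefined {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(stepFile);
  if (!frontmatter) return undefined;
  const typeLine = /^type:\s*(\S+)\s*$/m.exec(frontmatter[1] ?? "");
  const declared = typeLine?.[1] ?? "";
  return isStepType(declared) ? declared : undefined;
}

// Hypothetical fallback heuristic; the real inference is left to the step agent.
function inferStepType(stepFile: string): StepType {
  if (/\b(GET|POST|PUT|PATCH|DELETE)\s+\/|\bcurl\b/.test(stepFile)) return "backend";
  if (/\b(stdout|stderr|exit code)\b/i.test(stepFile)) return "cli";
  return "frontend";
}

// Declared type wins; inference is only a fallback.
function resolveStepType(stepFile: string): StepType {
  return readDeclaredType(stepFile) ?? inferStepType(stepFile);
}
```

Declaring `type` explicitly keeps the result deterministic; any heuristic like the one above can misclassify a step whose text merely mentions an HTTP verb or an exit code.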
132
124
 
133
125
  ## Step Agent Prompt Template
134
126
 
@@ -140,24 +132,31 @@ Execute step {stepId} of scenario {scenarioSlug}:
140
132
  ## Step Content
141
133
  {content from steps/{stepId}-{slug}.md}
142
134
 
135
+ ## Step Type
136
+ {type from step frontmatter, or infer from content: frontend|backend|cli}
137
+
143
138
  ## Project Context
144
139
  Base URL: {from config.yaml}
145
- Auth: {from prd.md if needed}
140
+ Services: {from the Services Under Test table in prd.md}
146
141
 
147
142
  ## Previous Step
148
143
  Status: {pass|fail|first step}
149
144
 
150
145
  ## Your Task
151
- 1. Execute the actions using the available browser tool
152
- 2. Inspect the page before clicking or typing
153
- 3. Capture before/after screenshots
154
- 4. Record evidence: harshjudge evidence <runId> --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
146
+ 1. Read the step type from frontmatter (frontend/backend/cli)
147
+ 2. Execute the actions using the appropriate tool:
148
+ - frontend: use available browser tool
149
+ - backend: use curl/httpie via Bash
150
+ - cli: run commands via Bash
151
+ 3. Capture evidence appropriate to the step type
152
+ 4. Record evidence: harshjudge evidence <runId> --step {stepNumber} --type <evidence_type> --name <name> --data <path_or_data>
155
153
 
156
154
  Return ONLY a JSON object:
157
155
  {
158
156
  "status": "pass" | "fail",
159
- "evidencePaths": ["path1.png", "path2.png"],
160
- "error": null | "error message"
157
+ "evidencePaths": ["path1", "path2"],
158
+ "error": null | "error message",
159
+ "summary": "Brief description of what happened and result (1-2 sentences)"
161
160
  }
162
161
 
163
162
  DO NOT return full evidence content. DO NOT explain your work.
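
A caller that spawns step agents may want to check the reply against the JSON shape in this template before trusting it. The following is a minimal sketch under that assumption; `StepResult` and `parseStepResult` are illustrative names, not part of the package.

```typescript
// Shape of the JSON object a step agent returns, per the 0.4.5 template.
interface StepResult {
  status: "pass" | "fail";
  evidencePaths: string[];
  error: string | null;
  summary: string;
}

// Parse and validate an agent reply; throws if it does not match the shape.
function parseStepResult(reply: string): StepResult {
  const data: unknown = JSON.parse(reply);
  if (typeof data !== "object" || data === null) {
    throw new Error("step result must be a JSON object");
  }
  const { status, evidencePaths, error, summary } = data as Record<string, unknown>;
  if (status !== "pass" && status !== "fail") {
    throw new Error('status must be "pass" or "fail"');
  }
  if (!Array.isArray(evidencePaths) || !evidencePaths.every((p) => typeof p === "string")) {
    throw new Error("evidencePaths must be an array of strings");
  }
  if (error !== null && typeof error !== "string") {
    throw new Error("error must be null or a string");
  }
  if (typeof summary !== "string") {
    throw new Error("summary must be a string");
  }
  return { status, evidencePaths, error, summary };
}
```

Rejecting malformed replies early fits the Fail Fast principle: a reply without a valid `status` is treated as an error and never silently counted as a pass.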
@@ -1,15 +1,14 @@
1
1
  # Project PRD
2
2
 
3
3
  ## Application Type
4
- <!-- backend | fullstack | frontend | other -->
4
+ <!-- backend | fullstack | frontend | cli | other -->
5
5
  {app_type}
6
6
 
7
- ## Ports
8
- | Service | Port |
9
- |---------|------|
10
- | Frontend | {frontend_port} |
11
- | Backend | {backend_port} |
12
- | Database | {database_port} |
7
+ ## Services Under Test
8
+
9
+ | Service | Type | Endpoint/Command |
10
+ |---------|------|-----------------|
11
+ | {service_name} | frontend/backend/cli | {url or command} |
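
To show how the new Services Under Test table could feed the `Services:` line of the step agent prompt, here is a small sketch that extracts its rows from a filled-in `prd.md`. The parser and its type names are hypothetical, and it assumes no cell contains a literal `|` (a CLI command with a pipe would need escaping).

```typescript
type ServiceType = "frontend" | "backend" | "cli";

interface ServiceUnderTest {
  name: string;
  type: ServiceType;
  target: string; // URL for frontend/backend, command for cli
}

// Extract rows from the "Services Under Test" table of a filled-in prd.md.
function parseServices(prd: string): ServiceUnderTest[] {
  const services: ServiceUnderTest[] = [];
  let inSection = false;
  for (const line of prd.split(/\r?\n/)) {
    if (line.startsWith("## ")) {
      inSection = line.trim() === "## Services Under Test";
      continue;
    }
    if (!inSection || !line.trim().startsWith("|")) continue;
    const cells = line.split("|").slice(1, -1).map((c) => c.trim());
    const [name = "", type = "", target = ""] = cells;
    // Header, separator, and unfilled template rows are skipped because
    // their Type cell is not one of the three step types.
    if (type === "frontend" || type === "backend" || type === "cli") {
      services.push({ name, type, target });
    }
  }
  return services;
}
```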
13
12
 
14
13
  ## Main Scenarios
15
14
  <!-- High-level list of main testing scenarios -->
@@ -26,9 +25,9 @@
26
25
 
27
26
  ## Tech Stack
28
27
  <!-- Frameworks, libraries, tools -->
29
- - Frontend: {frontend_stack}
30
- - Backend: {backend_stack}
31
- - Testing: {testing_tools}
28
+ - {stack_item_1}
29
+ - {stack_item_2}
30
+ - {stack_item_3}
32
31
 
33
32
  ## Notes
34
33
  <!-- Additional context for test scenarios -->
@@ -28,7 +28,7 @@ Read .harshJudge/prd.md
28
28
 
29
29
  Check for:
30
30
  - Existing user flows to test
31
- - Known UI patterns and selectors
31
+ - Known patterns, endpoints, and commands
32
32
  - Timing considerations
33
33
  - Environment requirements
34
34
  - Test credentials
@@ -52,6 +52,7 @@ Each step needs:
52
52
 
53
53
  ```typescript
54
54
  {
55
+ type: "frontend" | "backend" | "cli", // Step execution mode (optional, inferred if omitted)
55
56
  title: string, // Step title (becomes filename)
56
57
  description?: string, // What this step does
57
58
  preconditions?: string, // Required state before step
@@ -60,9 +61,10 @@ Each step needs:
60
61
  }
61
62
  ```
62
63
 
63
- **Example step:**
64
+ **Example step (frontend):**
64
65
  ```json
65
66
  {
67
+ "type": "frontend",
66
68
  "title": "Navigate to login",
67
69
  "description": "Open the application login page",
68
70
  "preconditions": "Application is running at baseUrl",
@@ -71,6 +73,30 @@ Each step needs:
71
73
  }
72
74
  ```
73
75
 
76
+ **Example step (backend):**
77
+ ```json
78
+ {
79
+ "type": "backend",
80
+ "title": "Create user via API",
81
+ "description": "POST to /api/users and verify 201 response",
82
+ "preconditions": "Server is running at baseUrl",
83
+ "actions": "1. POST /api/users with {name: 'test', email: 'test@example.com'}\n2. Capture response body and status code",
84
+ "expectedOutcome": "Response status 201, body contains user id"
85
+ }
86
+ ```
87
+
88
+ **Example step (cli):**
89
+ ```json
90
+ {
91
+ "type": "cli",
92
+ "title": "Generate config file",
93
+ "description": "Run the generate command and verify output",
94
+ "preconditions": "Tool is installed and on PATH",
95
+ "actions": "1. Run my-tool generate --config prod\n2. Capture stdout and exit code",
96
+ "expectedOutcome": "Exit code 0, stdout contains 'Generated successfully'"
97
+ }
98
+ ```
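
The step schema notes that `title` becomes the filename, and the step file example elsewhere in this diff is named `01-navigate-to-login.md`. The sketch below types a step object using only the fields seen in the three examples and derives a filename of that form. The slug rule is an assumption that happens to reproduce the example name; the package's actual rule is not shown in the diff, and treating `actions` and `expectedOutcome` as required is likewise a guess.

```typescript
interface StepInput {
  type?: "frontend" | "backend" | "cli"; // optional, inferred if omitted
  title: string;
  description?: string;
  preconditions?: string;
  actions: string;         // assumed required; all three examples provide it
  expectedOutcome: string; // assumed required; all three examples provide it
}

// Hypothetical slug rule: lowercase, runs of non-alphanumerics become "-".
function stepFileName(index: number, step: StepInput): string {
  const slug = step.title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${String(index).padStart(2, "0")}-${slug}.md`;
}
```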
99
+
74
100
  ### Step 4: Run create
75
101
 
76
102
  Pass scenario data as JSON via stdin or a file:
@@ -197,6 +223,10 @@ avgDuration: 0
197
223
 
198
224
  **Step file format (01-navigate-to-login.md):**
199
225
  ```markdown
226
+ ---
227
+ type: frontend
228
+ ---
229
+
200
230
  # Step 01: Navigate to login
201
231
 
202
232
  ## Description
@@ -13,13 +13,12 @@ Use this workflow when:
13
13
  - `harshjudge status <slug>` — review failed run evidence
14
14
  - `harshjudge create <slug>` — update scenario with step files
15
15
  - `harshjudge start` + `harshjudge complete-step` + `harshjudge complete-run` — re-run test
16
- - Playwright tools for browser automation
17
16
 
18
17
  ## Core Philosophy: Learn from Failures
19
18
 
20
19
  **Failed runs are valuable data, not waste.** Each failed run provides:
21
- 1. Screenshots showing what actually happened (in `step-XX/evidence/`)
22
- 2. Logs revealing backend behavior
20
+ 1. Step evidence showing what actually happened (in `step-XX/evidence/`)
21
+ 2. Logs, responses, and output revealing actual behavior
23
22
  3. Evidence of gaps between expectation and reality
24
23
 
25
24
  **Goal:** Use evidence to iterate toward a scenario that accurately tests the intended behavior, and **accumulate learnings** in `prd.md`.
@@ -42,7 +41,7 @@ Navigate to the failed run's evidence directories:
42
41
  .harshJudge/scenarios/{slug}/runs/{runId}/
43
42
  ```
44
43
 
45
- Read `result.json` for per-step details. View screenshots in `step-XX/evidence/`.
44
+ Read `result.json` for per-step details. Review step evidence (screenshots, responses, output) in `step-XX/evidence/`.
46
45
 
47
46
  ### Step 3: Review the Dashboard
48
47
 
@@ -52,14 +51,22 @@ harshjudge dashboard open
52
51
 
53
52
  Open `http://localhost:3001` → Scenario → Failed Run.
54
53
 
55
- Examine: before/after screenshots, console logs, network logs.
54
+ Examine: step evidence (screenshots, responses, output), console logs, network logs.
56
55
 
57
56
  ### Step 4: Classify the Failure
58
57
 
59
58
  | Failure Type | Description | Action | Document In |
60
59
  |-------------|-------------|--------|-------------|
61
- | **Selector Broken** | UI changed, selectors outdated | Edit step file | prd.md (selector notes) |
62
- | **Timing Issue** | Action too fast, element not ready | Add wait to step | prd.md (timing patterns) |
60
+ | **Frontend: Element not found** | UI changed, element missing or relocated | Edit step file with updated actions | prd.md (UI patterns) |
61
+ | **Frontend: Page didn't load** | Navigation failed or timed out | Add wait, check URL | prd.md (timing patterns) |
62
+ | **Frontend: Visual mismatch** | Page state differs from expectation | Update expected outcome | — |
63
+ | **Backend: Status code mismatch** | API returned unexpected status | Update step or fix app | prd.md (known behaviors) |
64
+ | **Backend: Response schema drift** | Response shape changed | Update expected outcome | prd.md (schema notes) |
65
+ | **Backend: Timeout** | Request took too long | Add timeout, check service | prd.md (env setup) |
66
+ | **CLI: Non-zero exit code** | Command failed unexpectedly | Check stderr, update step | prd.md (known errors) |
67
+ | **CLI: Missing output** | Expected text absent from stdout | Update expected outcome | — |
68
+ | **CLI: Unexpected stderr** | Warnings or errors in stderr | Investigate root cause | prd.md (known bugs) |
69
+ | **Timing Issue** | Action too fast, resource not ready | Add wait to step | prd.md (timing patterns) |
63
70
  | **Step Mismatch** | Step describes wrong flow | Edit step file | — |
64
71
  | **Missing Step** | Need additional step | Add step, update scenario | — |
65
72
  | **App Bug** | Application has actual bug | Mark as known-fail | prd.md (known bugs) |
@@ -102,10 +109,10 @@ Follow [[run]] workflow:
102
109
  **Root Cause:** Email input selector changed from `.email-input` to `[data-testid="email"]`
103
110
 
104
111
  **Changes Made:**
105
- - Updated step-02 Playwright selectors to use data-testid attributes
112
+ - Updated step-02 actions to match the new API response schema
106
113
 
107
114
  **Learning:**
108
- - Always prefer data-testid selectors over class names
115
+ - Always verify response schema against live API, not just status code
109
116
  ```
110
117
 
111
118
  ### Step 8: Report Iteration Result
@@ -117,17 +124,17 @@ Previous Run: {runId} (FAIL at step 02)
117
124
  New Run: {newRunId} (PASS)
118
125
 
119
126
  Changes:
120
- - Updated step-02 selectors to use data-testid
127
+ - Updated step-02 expected outcome to match new API response schema
121
128
 
122
129
  Learnings recorded in prd.md:
123
- - Selector convention: prefer data-testid attributes
130
+ - API response schema: always check body structure, not just status code
124
131
  ```
125
132
 
126
133
  ---
127
134
 
128
135
  ## Best Practices
129
136
 
130
- 1. **Review step evidence first** — before changing anything, examine before/after screenshots
137
+ 1. **Review step evidence first** — before changing anything, examine step evidence (screenshots, responses, output)
131
138
  2. **Edit individual steps when possible** — for small fixes, edit the `.md` file directly
132
139
  3. **Use create for major changes** — when adding/removing steps or reorganizing
133
140
  4. **Document learnings in prd.md** — after each successful iteration
@@ -10,27 +10,39 @@ Execute step {stepId} of scenario {scenarioSlug}:
10
10
  ## Step Content
11
11
  {paste content from steps/{step.file}}
12
12
 
13
+ ## Step Type
14
+ {type from step frontmatter, or infer from content: frontend|backend|cli}
15
+
13
16
  ## Project Context
14
17
  Base URL: {from config.yaml}
15
- Auth: {from prd.md if this step involves login}
18
+ Services: {from the Services Under Test table in prd.md}
16
19
 
17
20
  ## Previous Step
18
21
  Status: {pass|fail|first step}
19
22
 
20
23
  ## Your Task
21
- 1. Navigate to the base URL if not already there
22
- 2. Execute the actions described in the step content
23
- 3. Use the available browser tool to inspect the page before interacting
24
- 4. Take before/after screenshots using the browser tool
25
- 5. Record evidence:
26
- harshjudge evidence {runId} --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
27
- 6. Verify the expected outcome
28
- 7. Write a summary describing what happened and whether expected outcome matched
29
-
30
- Return ONLY a JSON object:
24
+ Based on step type:
25
+
26
+ **frontend:**
27
+ 1. Use the available browser tool to navigate and interact
28
+ 2. Inspect the page before clicking or typing
29
+ 3. Take before/after screenshots
30
+ 4. Record evidence: harshjudge evidence {runId} --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
31
+
32
+ **backend:**
33
+ 1. Execute HTTP requests using curl or httpie via Bash
34
+ 2. Capture the full response (status, headers, body)
35
+ 3. Record evidence: harshjudge evidence {runId} --step {stepNumber} --type api_response --name response --data /path/to/response.json
36
+
37
+ **cli:**
38
+ 1. Run the specified commands via Bash
39
+ 2. Capture stdout and stderr
40
+ 3. Record evidence: harshjudge evidence {runId} --step {stepNumber} --type stdout --name output --data /path/to/output.txt
41
+
42
+ Then verify the expected outcome and return ONLY a JSON object:
31
43
  {
32
44
  "status": "pass" | "fail",
33
- "evidencePaths": ["path1.png", "path2.png"],
45
+ "evidencePaths": ["path1", "path2"],
34
46
  "error": null | "error message",
35
47
  "summary": "Brief description of what happened and result (1-2 sentences)"
36
48
  }
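
The three per-type task lists above differ mainly in the evidence they record. As a sketch, the argument list for `harshjudge evidence` can be built from a step type using only the flags that appear in this template (`--step`, `--type`, `--name`, `--data`); the helper names and the placeholder paths are illustrative.

```typescript
type StepType = "frontend" | "backend" | "cli";

interface EvidenceItem {
  type: string; // e.g. "screenshot", "api_response", "stdout"
  name: string;
  dataPath: string;
}

// Default evidence per step type, taken from the three task lists in the template.
const DEFAULT_EVIDENCE: Record<StepType, EvidenceItem> = {
  frontend: { type: "screenshot", name: "before", dataPath: "/path/to/screenshot.png" },
  backend: { type: "api_response", name: "response", dataPath: "/path/to/response.json" },
  cli: { type: "stdout", name: "output", dataPath: "/path/to/output.txt" },
};

// Build the argument list for `harshjudge evidence`.
// Returning an array (not one string) avoids shell quoting problems.
function evidenceArgs(runId: string, stepNumber: number, item: EvidenceItem): string[] {
  return [
    "evidence", runId,
    "--step", String(stepNumber),
    "--type", item.type,
    "--name", item.name,
    "--data", item.dataPath,
  ];
}
```

Assuming `harshjudge` is on `PATH`, the array can be handed to a process spawner such as Node's `spawnSync("harshjudge", args)`.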
@@ -57,7 +69,7 @@ Task tool with:
57
69
  "status": "pass",
58
70
  "evidencePaths": [
59
71
  ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/before.png",
60
- ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/after.png"
72
+ ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/response.json"
61
73
  ],
62
74
  "error": null,
63
75
  "summary": "Login form loaded successfully. Email and password fields visible."
@@ -0,0 +1,95 @@
1
+ # Tool Reference by Step Type
2
+
3
+ Used during step execution in [[run]].
4
+
5
+ HarshJudge supports three step types. Use the tools appropriate to the step type.
6
+
7
+ ## Frontend Steps
8
+
9
+ Use whatever browser automation tool is available in your environment.
10
+
11
+ ### Required Capabilities
12
+
13
+ | Action | What to do |
14
+ |--------|-----------|
15
+ | Navigate | Go to a URL |
16
+ | Inspect | Get page state before interacting |
17
+ | Click | Click element by text/role/ref |
18
+ | Type | Enter text into input |
19
+ | Screenshot | Capture page as image file |
20
+ | Wait | Wait for text/element/timeout |
21
+ | Console | Read browser console output |
22
+
23
+ ### Supported Browser Tools
24
+
25
+ **Playwright MCP** (default):
26
+ Tools: `browser_navigate`, `browser_click`, `browser_type`, `browser_snapshot`, `browser_take_screenshot`, `browser_wait_for`, `browser_console_messages`, `browser_network_requests`
27
+
28
+ **browser-use MCP** (token efficient):
29
+ See [browser-use MCP docs](https://docs.browser-use.com/customize/integrations/mcp-server)
30
+
31
+ **Chrome DevTools MCP:**
32
+ Tools: page navigation, DOM inspection, network monitoring via Chrome remote debugging
33
+
34
+ ### Best Practices
35
+
36
+ - Inspect the page before clicking or typing
37
+ - Take a screenshot **before** and **after** each significant action
38
+ - Wait after navigation to confirm page loaded
39
+ - Capture console errors on unexpected behavior
40
+
41
+ ## Backend Steps
42
+
43
+ Use Bash to make HTTP requests and query databases.
44
+
45
+ ### HTTP Requests
46
+
47
+ ```bash
48
+ # Using curl
49
+ curl -s -w "\n%{http_code}" -H "Content-Type: application/json" \
50
+ -X POST http://localhost:3000/api/users \
51
+ -d '{"name": "test"}' > /tmp/response.json
52
+
53
+ # Save response for evidence
54
+ harshjudge evidence <runId> --step 1 --type api_response --name create-user --data /tmp/response.json
55
+ ```
56
+
57
+ ### Database Queries
58
+
59
+ ```bash
60
+ # PostgreSQL example
61
+ psql -h localhost -U user -d mydb -c "SELECT * FROM users WHERE email='test@example.com'" \
62
+ --csv > /tmp/db-result.csv
63
+
64
+ harshjudge evidence <runId> --step 1 --type db_snapshot --name users-check --data /tmp/db-result.csv
65
+ ```
66
+
67
+ ### Best Practices
68
+
69
+ - Always capture the full response (status code + headers + body)
70
+ - Save responses to temp files, then record via `harshjudge evidence`
71
+ - For auth flows, chain requests (login → use token → verify)
72
+ - Check response schema, not just status code
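
One detail of the curl example is worth spelling out: `-w "\n%{http_code}"` appends a newline and the status code after the body, so the saved file is the JSON body followed by a status line, not pure JSON. It also does not include response headers. The sketch below splits such a capture and applies the "check response schema, not just status code" advice to the create-user example; the function names are hypothetical.

```typescript
interface CurlCapture {
  status: number;
  body: string;
}

// Split output produced by `curl -s -w "\n%{http_code}"`:
// the body, then a newline, then the numeric status code.
function splitCurlOutput(raw: string): CurlCapture {
  const cut = raw.lastIndexOf("\n");
  if (cut === -1) throw new Error("no status line found");
  const status = Number(raw.slice(cut + 1).trim());
  if (!Number.isInteger(status) || status < 100) {
    throw new Error("status line is not an HTTP status code");
  }
  return { status, body: raw.slice(0, cut) };
}

// Check the response shape as well as the status:
// expects 201 and a JSON object containing an `id` field.
function hasUserId(capture: CurlCapture): boolean {
  if (capture.status !== 201) return false;
  const parsed: unknown = JSON.parse(capture.body);
  return typeof parsed === "object" && parsed !== null && "id" in parsed;
}
```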
73
+
74
+ ## CLI Steps
75
+
76
+ Use Bash to run commands and capture output.
77
+
78
+ ### Command Execution
79
+
80
+ ```bash
81
+ # Run command and capture output
82
+ my-tool generate --config prod > /tmp/stdout.txt 2> /tmp/stderr.txt
83
+ echo $? > /tmp/exit-code.txt
84
+
85
+ # Record evidence
86
+ harshjudge evidence <runId> --step 1 --type stdout --name generate-output --data /tmp/stdout.txt
87
+ harshjudge evidence <runId> --step 1 --type exit_code --name generate-exit --data /tmp/exit-code.txt
88
+ ```
89
+
90
+ ### Best Practices
91
+
92
+ - Capture both stdout and stderr separately
93
+ - Always check exit code
94
+ - For long-running commands, use timeout
95
+ - Save output to temp files before recording evidence
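
The CLI best practices above (separate stdout and stderr, always check the exit code, use a timeout) can also be followed from a script. This is a minimal sketch using Node's `child_process.spawnSync`; `runCli`, `CliCapture`, and the 30-second default are assumptions for illustration, not part of harshjudge.

```typescript
import { spawnSync } from "node:child_process";

interface CliCapture {
  stdout: string;
  stderr: string;
  exitCode: number | null; // null when the process was killed, e.g. on timeout
  timedOut: boolean;
}

// Run a command without a shell, capturing stdout and stderr separately.
function runCli(command: string, args: string[], timeoutMs = 30000): CliCapture {
  const result = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  const timedOut = (result.error as NodeJS.ErrnoException | undefined)?.code === "ETIMEDOUT";
  if (result.error && !timedOut) throw result.error; // e.g. command not found
  return {
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
    exitCode: result.status,
    timedOut,
  };
}
```

Each captured field can then be written to a temp file and recorded with `harshjudge evidence` under the `stdout`, `stderr`, and `exit_code` evidence types.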
@@ -15,7 +15,7 @@ Use this workflow when user wants to:
15
15
  3. `harshjudge complete-step <runId>` — Complete each step, get next step
16
16
  4. `harshjudge complete-run <runId>` — Finalize with pass/fail status
17
17
 
18
- See [[run-browser]] for browser tool reference (Playwright MCP, browser-use, Chrome DevTools).
18
+ See [[run-tools]] for tool reference by step type (frontend, backend, CLI).
19
19
 
20
20
  > **TOKEN OPTIMIZATION**: Each step executes in its own spawned agent. This isolates context and prevents token accumulation.
21
21
 
@@ -108,7 +108,7 @@ harshjudge evidence <runId> \
108
108
 
109
109
  Saved to: `.harshJudge/scenarios/{slug}/runs/{runId}/step-01/evidence/`
110
110
 
111
- Evidence types: `screenshot`, `console_log`, `network_log`, `html_snapshot`.
111
+ Evidence types: `screenshot`, `console_log`, `network_log`, `html_snapshot`, `api_response`, `api_headers`, `db_snapshot`, `stdout`, `stderr`, `exit_code`, `custom`.
112
112
 
113
113
  ## Step Tracking (MANDATORY)
114
114
 
@@ -1,63 +0,0 @@
1
- # Browser Tool Reference
2
-
3
- Used during step execution in [[run]].
4
-
5
- HarshJudge is **browser-tool-agnostic**. Use whatever browser automation tool is available in your environment. The step agent needs these capabilities:
6
-
7
- ## Required Capabilities
8
-
9
- | Action | What to do |
10
- |--------|-----------|
11
- | Navigate | Go to a URL |
12
- | Inspect page | Get current page state (DOM, accessibility tree) before interacting |
13
- | Click | Click an element by text, role, or reference |
14
- | Type | Enter text into an input field |
15
- | Select | Choose an option from a dropdown |
16
- | Wait | Wait for text to appear/disappear, or for a timeout |
17
- | Screenshot | Capture the current page as an image file |
18
- | Console logs | Read browser console output |
19
- | Network logs | Read network requests/responses |
20
-
21
- ## Supported Browser Tools
22
-
23
- ### Playwright MCP (Default)
24
-
25
- Most common. Available as a Claude Code plugin.
26
-
27
- ```json
28
- {
29
- "playwright": {
30
- "command": "npx",
31
- "args": ["@playwright/mcp@latest"]
32
- }
33
- }
34
- ```
35
-
36
- Tools: `browser_navigate`, `browser_click`, `browser_type`, `browser_snapshot`, `browser_take_screenshot`, `browser_wait_for`, `browser_console_messages`, `browser_network_requests`
37
-
38
- ### browser-use MCP (Token Efficient Alternative)
39
-
40
- Compresses DOM before sending to LLM — significantly fewer tokens per interaction. Python-based.
41
-
42
- Setup: See [browser-use MCP docs](https://docs.browser-use.com/customize/integrations/mcp-server)
43
-
44
- ### Chrome DevTools MCP
45
-
46
- Connects to an already-running Chrome instance via remote debugging.
47
-
48
- ```json
49
- {
50
- "chrome-devtools": {
51
- "command": "npx",
52
- "args": ["chrome-devtools-mcp"]
53
- }
54
- }
55
- ```
56
-
57
- ## Best Practices
58
-
59
- - Always inspect the page before clicking or typing to get current element state
60
- - Take a screenshot **before** and **after** each significant action
61
- - Wait after navigation to confirm the page loaded
62
- - Capture console errors on unexpected behavior
63
- - Save screenshots to a temp path, then record via `harshjudge evidence`