@allenpan2026/harshjudge 0.4.4 → 0.4.5
- package/.claude-plugin/plugin.json +1 -1
- package/package.json +1 -1
- package/skills/harshjudge/SKILL.md +26 -27
- package/skills/harshjudge/assets/prd.md +9 -10
- package/skills/harshjudge/references/create.md +32 -2
- package/skills/harshjudge/references/iterate.md +19 -12
- package/skills/harshjudge/references/run-step-agent.md +25 -13
- package/skills/harshjudge/references/run-tools.md +95 -0
- package/skills/harshjudge/references/run.md +2 -2
- package/skills/harshjudge/references/run-browser.md +0 -63
package/package.json
CHANGED
package/skills/harshjudge/SKILL.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: harshjudge
|
|
3
|
-
description:
|
|
3
|
+
description: E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios — frontend (browser), backend (API), or CLI. Activates for tasks involving E2E tests, test scenario creation, test execution with evidence capture, or checking test status.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# HarshJudge E2E Testing
|
|
7
7
|
|
|
8
|
-
AI-native E2E testing with CLI commands and
|
|
8
|
+
AI-native E2E testing with CLI commands and evidence capture.
|
|
9
9
|
|
|
10
10
|
## CLI Setup
|
|
11
11
|
|
|
@@ -20,7 +20,7 @@ alias harshjudge="npx @allenpan2026/harshjudge@latest"
|
|
|
20
20
|
|
|
21
21
|
## Core Principles
|
|
22
22
|
|
|
23
|
-
1. **Evidence First**:
|
|
23
|
+
1. **Evidence First**: Capture evidence appropriate to the step type — screenshots for frontend, response bodies for API, stdout for CLI
|
|
24
24
|
2. **Fail Fast**: Stop on error, report with context
|
|
25
25
|
3. **Complete Runs**: Always call `harshjudge complete-run`, even on failure
|
|
26
26
|
4. **Step Isolation**: Each step executes in its own spawned agent for token efficiency
|
|
@@ -108,27 +108,19 @@ Main Agent Step Agents (spawned per step)
|
|
|
108
108
|
| `harshjudge discover search <pattern>` | Search file content |
|
|
109
109
|
| `harshjudge dashboard open/close/status` | Manage dashboard server |
|
|
110
110
|
|
|
111
|
-
###
|
|
111
|
+
### Step Types
|
|
112
112
|
|
|
113
|
-
|
|
113
|
+
Each step declares its execution mode via `type` in the step file frontmatter:
|
|
114
114
|
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
115
|
+
| Type | Tools | Evidence Captured |
|
|
116
|
+
|------|-------|-------------------|
|
|
117
|
+
| `frontend` | Browser tool (auto-detected) | screenshot, console_log, network_log, html_snapshot |
|
|
118
|
+
| `backend` | Bash (curl/httpie) | api_response, api_headers, db_snapshot |
|
|
119
|
+
| `cli` | Bash | stdout, stderr, exit_code |
|
|
118
120
|
|
|
119
|
-
|
|
121
|
+
If `type` is omitted, the step agent infers from the step content.
|
|
120
122
|
|
|
121
|
-
|
|
122
|
-
|--------|-----------|
|
|
123
|
-
| Navigate | Go to a URL |
|
|
124
|
-
| Inspect | Get page state before interacting |
|
|
125
|
-
| Click | Click element by text/role/ref |
|
|
126
|
-
| Type | Enter text into input |
|
|
127
|
-
| Screenshot | Capture page as image file |
|
|
128
|
-
| Wait | Wait for text/element/timeout |
|
|
129
|
-
| Console | Read browser console output |
|
|
130
|
-
|
|
131
|
-
See [run-browser.md](references/run-browser.md) for tool-specific syntax.
|
|
123
|
+
See [run-tools.md](references/run-tools.md) for tool-specific guidance per type.
|
|
132
124
|
|
|
133
125
|
## Step Agent Prompt Template
|
|
134
126
|
|
|
@@ -140,24 +132,31 @@ Execute step {stepId} of scenario {scenarioSlug}:
|
|
|
140
132
|
## Step Content
|
|
141
133
|
{content from steps/{stepId}-{slug}.md}
|
|
142
134
|
|
|
135
|
+
## Step Type
|
|
136
|
+
{type from step frontmatter, or infer from content: frontend|backend|cli}
|
|
137
|
+
|
|
143
138
|
## Project Context
|
|
144
139
|
Base URL: {from config.yaml}
|
|
145
|
-
|
|
140
|
+
Services: {from prd.md — list of services under test}
|
|
146
141
|
|
|
147
142
|
## Previous Step
|
|
148
143
|
Status: {pass|fail|first step}
|
|
149
144
|
|
|
150
145
|
## Your Task
|
|
151
|
-
1.
|
|
152
|
-
2.
|
|
153
|
-
|
|
154
|
-
|
|
146
|
+
1. Read the step type from frontmatter (frontend/backend/cli)
|
|
147
|
+
2. Execute the actions using the appropriate tool:
|
|
148
|
+
- frontend: use available browser tool
|
|
149
|
+
- backend: use curl/httpie via Bash
|
|
150
|
+
- cli: run commands via Bash
|
|
151
|
+
3. Capture evidence appropriate to the step type
|
|
152
|
+
4. Record evidence: harshjudge evidence <runId> --step {stepNumber} --type <evidence_type> --name <name> --data <path_or_data>
|
|
155
153
|
|
|
156
154
|
Return ONLY a JSON object:
|
|
157
155
|
{
|
|
158
156
|
"status": "pass" | "fail",
|
|
159
|
-
"evidencePaths": ["path1
|
|
160
|
-
"error": null | "error message"
|
|
157
|
+
"evidencePaths": ["path1", "path2"],
|
|
158
|
+
"error": null | "error message",
|
|
159
|
+
"summary": "Brief description of what happened and result (1-2 sentences)"
|
|
161
160
|
}
|
|
162
161
|
|
|
163
162
|
DO NOT return full evidence content. DO NOT explain your work.
|
|
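Step 1 of the task list in this hunk reads the step type from the step file's frontmatter. The lookup could be sketched in shell as follows, assuming the frontmatter is a leading `---` block containing a `type:` line, as in the documented step file format. The helper name `step_type` and the temporary file are hypothetical and not part of the harshjudge CLI.

```bash
# Hypothetical helper (not part of harshjudge): print the `type:` value from a
# step file's leading frontmatter block, or nothing if it is absent.
step_type() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file has no frontmatter block
    NR > 1  && $0 == "---" { exit }   # reached the end of the frontmatter
    /^type:/ { sub(/^type:[ \t]*/, ""); print; exit }
  ' "$1"
}

# Demo with a throwaway step file shaped like the documented format.
step_file="$(mktemp)"
trap 'rm -f "$step_file"' EXIT
printf '%s\n' '---' 'type: backend' '---' '' '# Step 01: Create user via API' > "$step_file"

detected="$(step_type "$step_file")"
# An empty result means the type was omitted and must be inferred from content.
echo "step type: ${detected:-infer-from-content}"
```

An empty result maps to the documented fallback, where the step agent infers the type from the step content.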
package/skills/harshjudge/assets/prd.md
CHANGED
|
@@ -1,15 +1,14 @@
|
|
|
1
1
|
# Project PRD
|
|
2
2
|
|
|
3
3
|
## Application Type
|
|
4
|
-
<!-- backend | fullstack | frontend | other -->
|
|
4
|
+
<!-- backend | fullstack | frontend | cli | other -->
|
|
5
5
|
{app_type}
|
|
6
6
|
|
|
7
|
-
##
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
|
12
|
-
| Database | {database_port} |
|
|
7
|
+
## Services Under Test
|
|
8
|
+
|
|
9
|
+
| Service | Type | Endpoint/Command |
|
|
10
|
+
|---------|------|-----------------|
|
|
11
|
+
| {service_name} | frontend/backend/cli | {url or command} |
|
|
13
12
|
|
|
14
13
|
## Main Scenarios
|
|
15
14
|
<!-- High-level list of main testing scenarios -->
|
|
@@ -26,9 +25,9 @@
|
|
|
26
25
|
|
|
27
26
|
## Tech Stack
|
|
28
27
|
<!-- Frameworks, libraries, tools -->
|
|
29
|
-
-
|
|
30
|
-
-
|
|
31
|
-
-
|
|
28
|
+
- {stack_item_1}
|
|
29
|
+
- {stack_item_2}
|
|
30
|
+
- {stack_item_3}
|
|
32
31
|
|
|
33
32
|
## Notes
|
|
34
33
|
<!-- Additional context for test scenarios -->
|
|
package/skills/harshjudge/references/create.md
CHANGED
|
@@ -28,7 +28,7 @@ Read .harshJudge/prd.md
|
|
|
28
28
|
|
|
29
29
|
Check for:
|
|
30
30
|
- Existing user flows to test
|
|
31
|
-
- Known
|
|
31
|
+
- Known patterns, endpoints, and commands
|
|
32
32
|
- Timing considerations
|
|
33
33
|
- Environment requirements
|
|
34
34
|
- Test credentials
|
|
@@ -52,6 +52,7 @@ Each step needs:
|
|
|
52
52
|
|
|
53
53
|
```typescript
|
|
54
54
|
{
|
|
55
|
+
type: "frontend" | "backend" | "cli", // Step execution mode (optional, inferred if omitted)
|
|
55
56
|
title: string, // Step title (becomes filename)
|
|
56
57
|
description?: string, // What this step does
|
|
57
58
|
preconditions?: string, // Required state before step
|
|
@@ -60,9 +61,10 @@ Each step needs:
|
|
|
60
61
|
}
|
|
61
62
|
```
|
|
62
63
|
|
|
63
|
-
**Example step:**
|
|
64
|
+
**Example step (frontend):**
|
|
64
65
|
```json
|
|
65
66
|
{
|
|
67
|
+
"type": "frontend",
|
|
66
68
|
"title": "Navigate to login",
|
|
67
69
|
"description": "Open the application login page",
|
|
68
70
|
"preconditions": "Application is running at baseUrl",
|
|
@@ -71,6 +73,30 @@ Each step needs:
|
|
|
71
73
|
}
|
|
72
74
|
```
|
|
73
75
|
|
|
76
|
+
**Example step (backend):**
|
|
77
|
+
```json
|
|
78
|
+
{
|
|
79
|
+
"type": "backend",
|
|
80
|
+
"title": "Create user via API",
|
|
81
|
+
"description": "POST to /api/users and verify 201 response",
|
|
82
|
+
"preconditions": "Server is running at baseUrl",
|
|
83
|
+
"actions": "1. POST /api/users with {name: 'test', email: 'test@example.com'}\n2. Capture response body and status code",
|
|
84
|
+
"expectedOutcome": "Response status 201, body contains user id"
|
|
85
|
+
}
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
**Example step (cli):**
|
|
89
|
+
```json
|
|
90
|
+
{
|
|
91
|
+
"type": "cli",
|
|
92
|
+
"title": "Generate config file",
|
|
93
|
+
"description": "Run the generate command and verify output",
|
|
94
|
+
"preconditions": "Tool is installed and on PATH",
|
|
95
|
+
"actions": "1. Run my-tool generate --config prod\n2. Capture stdout and exit code",
|
|
96
|
+
"expectedOutcome": "Exit code 0, stdout contains 'Generated successfully'"
|
|
97
|
+
}
|
|
98
|
+
```
|
|
99
|
+
|
|
74
100
|
### Step 4: Run create
|
|
75
101
|
|
|
76
102
|
Pass scenario data as JSON via stdin or a file:
|
|
@@ -197,6 +223,10 @@ avgDuration: 0
|
|
|
197
223
|
|
|
198
224
|
**Step file format (01-navigate-to-login.md):**
|
|
199
225
|
```markdown
|
|
226
|
+
---
|
|
227
|
+
type: frontend
|
|
228
|
+
---
|
|
229
|
+
|
|
200
230
|
# Step 01: Navigate to login
|
|
201
231
|
|
|
202
232
|
## Description
|
|
package/skills/harshjudge/references/iterate.md
CHANGED
|
@@ -13,13 +13,12 @@ Use this workflow when:
|
|
|
13
13
|
- `harshjudge status <slug>` — review failed run evidence
|
|
14
14
|
- `harshjudge create <slug>` — update scenario with step files
|
|
15
15
|
- `harshjudge start` + `harshjudge complete-step` + `harshjudge complete-run` — re-run test
|
|
16
|
-
- Playwright tools for browser automation
|
|
17
16
|
|
|
18
17
|
## Core Philosophy: Learn from Failures
|
|
19
18
|
|
|
20
19
|
**Failed runs are valuable data, not waste.** Each failed run provides:
|
|
21
|
-
1.
|
|
22
|
-
2. Logs revealing
|
|
20
|
+
1. Step evidence showing what actually happened (in `step-XX/evidence/`)
|
|
21
|
+
2. Logs, responses, and output revealing actual behavior
|
|
23
22
|
3. Evidence of gaps between expectation and reality
|
|
24
23
|
|
|
25
24
|
**Goal:** Use evidence to iterate toward a scenario that accurately tests the intended behavior, and **accumulate learnings** in `prd.md`.
|
|
@@ -42,7 +41,7 @@ Navigate to the failed run's evidence directories:
|
|
|
42
41
|
.harshJudge/scenarios/{slug}/runs/{runId}/
|
|
43
42
|
```
|
|
44
43
|
|
|
45
|
-
Read `result.json` for per-step details.
|
|
44
|
+
Read `result.json` for per-step details. Review step evidence (screenshots, responses, output) in `step-XX/evidence/`.
|
|
46
45
|
|
|
47
46
|
### Step 3: Review the Dashboard
|
|
48
47
|
|
|
@@ -52,14 +51,22 @@ harshjudge dashboard open
|
|
|
52
51
|
|
|
53
52
|
Open `http://localhost:3001` → Scenario → Failed Run.
|
|
54
53
|
|
|
55
|
-
Examine:
|
|
54
|
+
Examine: step evidence (screenshots, responses, output), console logs, network logs.
|
|
56
55
|
|
|
57
56
|
### Step 4: Classify the Failure
|
|
58
57
|
|
|
59
58
|
| Failure Type | Description | Action | Document In |
|
|
60
59
|
|-------------|-------------|--------|-------------|
|
|
61
|
-
| **
|
|
62
|
-
| **
|
|
60
|
+
| **Frontend: Element not found** | UI changed, element missing or relocated | Edit step file with updated actions | prd.md (UI patterns) |
|
|
61
|
+
| **Frontend: Page didn't load** | Navigation failed or timed out | Add wait, check URL | prd.md (timing patterns) |
|
|
62
|
+
| **Frontend: Visual mismatch** | Page state differs from expectation | Update expected outcome | — |
|
|
63
|
+
| **Backend: Status code mismatch** | API returned unexpected status | Update step or fix app | prd.md (known behaviors) |
|
|
64
|
+
| **Backend: Response schema drift** | Response shape changed | Update expected outcome | prd.md (schema notes) |
|
|
65
|
+
| **Backend: Timeout** | Request took too long | Add timeout, check service | prd.md (env setup) |
|
|
66
|
+
| **CLI: Non-zero exit code** | Command failed unexpectedly | Check stderr, update step | prd.md (known errors) |
|
|
67
|
+
| **CLI: Missing output** | Expected text absent from stdout | Update expected outcome | — |
|
|
68
|
+
| **CLI: Unexpected stderr** | Warnings or errors in stderr | Investigate root cause | prd.md (known bugs) |
|
|
69
|
+
| **Timing Issue** | Action too fast, resource not ready | Add wait to step | prd.md (timing patterns) |
|
|
63
70
|
| **Step Mismatch** | Step describes wrong flow | Edit step file | — |
|
|
64
71
|
| **Missing Step** | Need additional step | Add step, update scenario | — |
|
|
65
72
|
| **App Bug** | Application has actual bug | Mark as known-fail | prd.md (known bugs) |
|
|
@@ -102,10 +109,10 @@ Follow [[run]] workflow:
|
|
|
102
109
|
**Root Cause:** Email input selector changed from `.email-input` to `[data-testid="email"]`
|
|
103
110
|
|
|
104
111
|
**Changes Made:**
|
|
105
|
-
- Updated step-02
|
|
112
|
+
- Updated step-02 actions to match the new API response schema
|
|
106
113
|
|
|
107
114
|
**Learning:**
|
|
108
|
-
- Always
|
|
115
|
+
- Always verify response schema against live API, not just status code
|
|
109
116
|
```
|
|
110
117
|
|
|
111
118
|
### Step 8: Report Iteration Result
|
|
@@ -117,17 +124,17 @@ Previous Run: {runId} (FAIL at step 02)
|
|
|
117
124
|
New Run: {newRunId} (PASS)
|
|
118
125
|
|
|
119
126
|
Changes:
|
|
120
|
-
- Updated step-02
|
|
127
|
+
- Updated step-02 expected outcome to match new API response schema
|
|
121
128
|
|
|
122
129
|
Learnings recorded in prd.md:
|
|
123
|
-
-
|
|
130
|
+
- API response schema: always check body structure, not just status code
|
|
124
131
|
```
|
|
125
132
|
|
|
126
133
|
---
|
|
127
134
|
|
|
128
135
|
## Best Practices
|
|
129
136
|
|
|
130
|
-
1. **Review step evidence first** — before changing anything, examine
|
|
137
|
+
1. **Review step evidence first** — before changing anything, examine step evidence (screenshots, responses, output)
|
|
131
138
|
2. **Edit individual steps when possible** — for small fixes, edit the `.md` file directly
|
|
132
139
|
3. **Use create for major changes** — when adding/removing steps or reorganizing
|
|
133
140
|
4. **Document learnings in prd.md** — after each successful iteration
|
|
package/skills/harshjudge/references/run-step-agent.md
CHANGED
|
@@ -10,27 +10,39 @@ Execute step {stepId} of scenario {scenarioSlug}:
|
|
|
10
10
|
## Step Content
|
|
11
11
|
{paste content from steps/{step.file}}
|
|
12
12
|
|
|
13
|
+
## Step Type
|
|
14
|
+
{type from step frontmatter, or infer from content: frontend|backend|cli}
|
|
15
|
+
|
|
13
16
|
## Project Context
|
|
14
17
|
Base URL: {from config.yaml}
|
|
15
|
-
|
|
18
|
+
Services: {from prd.md — list of services under test}
|
|
16
19
|
|
|
17
20
|
## Previous Step
|
|
18
21
|
Status: {pass|fail|first step}
|
|
19
22
|
|
|
20
23
|
## Your Task
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
24
|
+
Based on step type:
|
|
25
|
+
|
|
26
|
+
**frontend:**
|
|
27
|
+
1. Use the available browser tool to navigate and interact
|
|
28
|
+
2. Inspect the page before clicking or typing
|
|
29
|
+
3. Take before/after screenshots
|
|
30
|
+
4. Record evidence: harshjudge evidence {runId} --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
|
|
31
|
+
|
|
32
|
+
**backend:**
|
|
33
|
+
1. Execute HTTP requests using curl or httpie via Bash
|
|
34
|
+
2. Capture the full response (status, headers, body)
|
|
35
|
+
3. Record evidence: harshjudge evidence {runId} --step {stepNumber} --type api_response --name response --data /path/to/response.json
|
|
36
|
+
|
|
37
|
+
**cli:**
|
|
38
|
+
1. Run the specified commands via Bash
|
|
39
|
+
2. Capture stdout and stderr
|
|
40
|
+
3. Record evidence: harshjudge evidence {runId} --step {stepNumber} --type stdout --name output --data /path/to/output.txt
|
|
41
|
+
|
|
42
|
+
Then verify the expected outcome and return ONLY a JSON object:
|
|
31
43
|
{
|
|
32
44
|
"status": "pass" | "fail",
|
|
33
|
-
"evidencePaths": ["path1
|
|
45
|
+
"evidencePaths": ["path1", "path2"],
|
|
34
46
|
"error": null | "error message",
|
|
35
47
|
"summary": "Brief description of what happened and result (1-2 sentences)"
|
|
36
48
|
}
|
|
@@ -57,7 +69,7 @@ Task tool with:
|
|
|
57
69
|
"status": "pass",
|
|
58
70
|
"evidencePaths": [
|
|
59
71
|
".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/before.png",
|
|
60
|
-
".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/
|
|
72
|
+
".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/response.json"
|
|
61
73
|
],
|
|
62
74
|
"error": null,
|
|
63
75
|
"summary": "Login form loaded successfully. Email and password fields visible."
|
|
package/skills/harshjudge/references/run-tools.md
ADDED
|
@@ -0,0 +1,95 @@
|
|
|
1
|
+
# Tool Reference by Step Type
|
|
2
|
+
|
|
3
|
+
Used during step execution in [[run]].
|
|
4
|
+
|
|
5
|
+
HarshJudge supports three step types. Use the tools appropriate to the step type.
|
|
6
|
+
|
|
7
|
+
## Frontend Steps
|
|
8
|
+
|
|
9
|
+
Use whatever browser automation tool is available in your environment.
|
|
10
|
+
|
|
11
|
+
### Required Capabilities
|
|
12
|
+
|
|
13
|
+
| Action | What to do |
|
|
14
|
+
|--------|-----------|
|
|
15
|
+
| Navigate | Go to a URL |
|
|
16
|
+
| Inspect | Get page state before interacting |
|
|
17
|
+
| Click | Click element by text/role/ref |
|
|
18
|
+
| Type | Enter text into input |
|
|
19
|
+
| Screenshot | Capture page as image file |
|
|
20
|
+
| Wait | Wait for text/element/timeout |
|
|
21
|
+
| Console | Read browser console output |
|
|
22
|
+
|
|
23
|
+
### Supported Browser Tools
|
|
24
|
+
|
|
25
|
+
**Playwright MCP** (default):
|
|
26
|
+
Tools: `browser_navigate`, `browser_click`, `browser_type`, `browser_snapshot`, `browser_take_screenshot`, `browser_wait_for`, `browser_console_messages`, `browser_network_requests`
|
|
27
|
+
|
|
28
|
+
**browser-use MCP** (token efficient):
|
|
29
|
+
See [browser-use MCP docs](https://docs.browser-use.com/customize/integrations/mcp-server)
|
|
30
|
+
|
|
31
|
+
**Chrome DevTools MCP:**
|
|
32
|
+
Tools: page navigation, DOM inspection, network monitoring via Chrome remote debugging
|
|
33
|
+
|
|
34
|
+
### Best Practices
|
|
35
|
+
|
|
36
|
+
- Inspect the page before clicking or typing
|
|
37
|
+
- Take a screenshot **before** and **after** each significant action
|
|
38
|
+
- Wait after navigation to confirm page loaded
|
|
39
|
+
- Capture console errors on unexpected behavior
|
|
40
|
+
|
|
41
|
+
## Backend Steps
|
|
42
|
+
|
|
43
|
+
Use Bash to make HTTP requests and query databases.
|
|
44
|
+
|
|
45
|
+
### HTTP Requests
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
# Using curl
|
|
49
|
+
curl -s -w "\n%{http_code}" -H "Content-Type: application/json" \
|
|
50
|
+
-X POST http://localhost:3000/api/users \
|
|
51
|
+
-d '{"name": "test"}' > /tmp/response.json
|
|
52
|
+
|
|
53
|
+
# Save response for evidence
|
|
54
|
+
harshjudge evidence <runId> --step 1 --type api_response --name create-user --data /tmp/response.json
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
### Database Queries
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
# PostgreSQL example
|
|
61
|
+
psql -h localhost -U user -d mydb -c "SELECT * FROM users WHERE email='test@example.com'" \
|
|
62
|
+
--csv > /tmp/db-result.csv
|
|
63
|
+
|
|
64
|
+
harshjudge evidence <runId> --step 1 --type db_snapshot --name users-check --data /tmp/db-result.csv
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### Best Practices
|
|
68
|
+
|
|
69
|
+
- Always capture the full response (status code + headers + body)
|
|
70
|
+
- Save responses to temp files, then record via `harshjudge evidence`
|
|
71
|
+
- For auth flows, chain requests (login → use token → verify)
|
|
72
|
+
- Check response schema, not just status code
|
|
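Note that the curl example in this hunk uses `-w "\n%{http_code}"`, so the status code is written as the last line of the same file as the body. If the body needs to be valid JSON on its own, the two can be separated afterwards. A minimal sketch, with the curl output simulated by `printf` so it runs offline; the variable names and temp files are illustrative assumptions:

```bash
# Simulated output of: curl -s -w "\n%{http_code}" ... > "$raw"
raw="$(mktemp)"
body_file="$(mktemp)"
trap 'rm -f "$raw" "$body_file"' EXIT
printf '%s\n%s' '{"id": 42, "name": "test"}' '201' > "$raw"

# With -w "\n%{http_code}", the status code is the last line of the capture.
status="$(tail -n 1 "$raw")"

# Everything except the last line is the response body.
sed '$d' "$raw" > "$body_file"

echo "status: $status"
echo "body:   $(cat "$body_file")"
```

The body file can then be passed to `harshjudge evidence` with `--type api_response`, as in the example in this hunk, and the status compared against the step's expected outcome.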
73
|
+
|
|
74
|
+
## CLI Steps
|
|
75
|
+
|
|
76
|
+
Use Bash to run commands and capture output.
|
|
77
|
+
|
|
78
|
+
### Command Execution
|
|
79
|
+
|
|
80
|
+
```bash
|
|
81
|
+
# Run command and capture output
|
|
82
|
+
my-tool generate --config prod > /tmp/stdout.txt 2> /tmp/stderr.txt
|
|
83
|
+
echo $? > /tmp/exit-code.txt
|
|
84
|
+
|
|
85
|
+
# Record evidence
|
|
86
|
+
harshjudge evidence <runId> --step 1 --type stdout --name generate-output --data /tmp/stdout.txt
|
|
87
|
+
harshjudge evidence <runId> --step 1 --type exit_code --name generate-exit --data /tmp/exit-code.txt
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
### Best Practices
|
|
91
|
+
|
|
92
|
+
- Capture both stdout and stderr separately
|
|
93
|
+
- Always check exit code
|
|
94
|
+
- For long-running commands, use timeout
|
|
95
|
+
- Save output to temp files before recording evidence
|
|
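The CLI best practices in this hunk recommend a timeout for long-running commands. One way to do that, assuming GNU coreutils `timeout` is installed (it is not available by default on every platform), is sketched below. `sleep 5` stands in for the command under test, and the 1-second limit is arbitrary.

```bash
out="$(mktemp)"
err="$(mktemp)"
trap 'rm -f "$out" "$err"' EXIT

# Run the command with a time limit, capturing stdout and stderr separately.
# The `|| exit_code=$?` form keeps the script alive even under `set -e`.
exit_code=0
timeout 1 sleep 5 > "$out" 2> "$err" || exit_code=$?

# GNU timeout exits with status 124 when the time limit is reached.
if [ "$exit_code" -eq 124 ]; then
  echo "command timed out"
else
  echo "command exited with $exit_code"
fi
```

The captured files and exit code can then be recorded with `harshjudge evidence` using the `stdout`, `stderr`, and `exit_code` types, following the pattern in the Command Execution example.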
package/skills/harshjudge/references/run.md
CHANGED
|
@@ -15,7 +15,7 @@ Use this workflow when user wants to:
|
|
|
15
15
|
3. `harshjudge complete-step <runId>` — Complete each step, get next step
|
|
16
16
|
4. `harshjudge complete-run <runId>` — Finalize with pass/fail status
|
|
17
17
|
|
|
18
|
-
See [[run-
|
|
18
|
+
See [[run-tools]] for tool reference by step type (frontend, backend, CLI).
|
|
19
19
|
|
|
20
20
|
> **TOKEN OPTIMIZATION**: Each step executes in its own spawned agent. This isolates context and prevents token accumulation.
|
|
21
21
|
|
|
@@ -108,7 +108,7 @@ harshjudge evidence <runId> \
|
|
|
108
108
|
|
|
109
109
|
Saved to: `.harshJudge/scenarios/{slug}/runs/{runId}/step-01/evidence/`
|
|
110
110
|
|
|
111
|
-
Evidence types: `screenshot`, `console_log`, `network_log`, `html_snapshot`.
|
|
111
|
+
Evidence types: `screenshot`, `console_log`, `network_log`, `html_snapshot`, `api_response`, `api_headers`, `db_snapshot`, `stdout`, `stderr`, `exit_code`, `custom`.
|
|
112
112
|
|
|
113
113
|
## Step Tracking (MANDATORY)
|
|
114
114
|
|
|
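The expanded list of evidence types in this hunk could be checked before calling `harshjudge evidence`, so a typo in `--type` is caught early. The sketch below is a hypothetical pre-flight helper, not a harshjudge command. The grouping by step type follows the Step Types table in SKILL.md, and whether the CLI itself rejects unknown types is not stated in the source.

```bash
# Hypothetical pre-flight check (not a harshjudge command): succeed only for
# the evidence types listed in the documentation.
is_evidence_type() {
  case "$1" in
    screenshot|console_log|network_log|html_snapshot) return 0 ;;  # frontend
    api_response|api_headers|db_snapshot)             return 0 ;;  # backend
    stdout|stderr|exit_code)                          return 0 ;;  # cli
    custom)                                           return 0 ;;
    *)                                                return 1 ;;
  esac
}

# Demo: one documented type and one misspelled type.
for t in api_response exitcode; do
  if is_evidence_type "$t"; then
    echo "$t: ok"
  else
    echo "$t: not a documented evidence type"
  fi
done
```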
package/skills/harshjudge/references/run-browser.md
REMOVED
|
@@ -1,63 +0,0 @@
|
|
|
1
|
-
# Browser Tool Reference
|
|
2
|
-
|
|
3
|
-
Used during step execution in [[run]].
|
|
4
|
-
|
|
5
|
-
HarshJudge is **browser-tool-agnostic**. Use whatever browser automation tool is available in your environment. The step agent needs these capabilities:
|
|
6
|
-
|
|
7
|
-
## Required Capabilities
|
|
8
|
-
|
|
9
|
-
| Action | What to do |
|
|
10
|
-
|--------|-----------|
|
|
11
|
-
| Navigate | Go to a URL |
|
|
12
|
-
| Inspect page | Get current page state (DOM, accessibility tree) before interacting |
|
|
13
|
-
| Click | Click an element by text, role, or reference |
|
|
14
|
-
| Type | Enter text into an input field |
|
|
15
|
-
| Select | Choose an option from a dropdown |
|
|
16
|
-
| Wait | Wait for text to appear/disappear, or for a timeout |
|
|
17
|
-
| Screenshot | Capture the current page as an image file |
|
|
18
|
-
| Console logs | Read browser console output |
|
|
19
|
-
| Network logs | Read network requests/responses |
|
|
20
|
-
|
|
21
|
-
## Supported Browser Tools
|
|
22
|
-
|
|
23
|
-
### Playwright MCP (Default)
|
|
24
|
-
|
|
25
|
-
Most common. Available as a Claude Code plugin.
|
|
26
|
-
|
|
27
|
-
```json
|
|
28
|
-
{
|
|
29
|
-
"playwright": {
|
|
30
|
-
"command": "npx",
|
|
31
|
-
"args": ["@playwright/mcp@latest"]
|
|
32
|
-
}
|
|
33
|
-
}
|
|
34
|
-
```
|
|
35
|
-
|
|
36
|
-
Tools: `browser_navigate`, `browser_click`, `browser_type`, `browser_snapshot`, `browser_take_screenshot`, `browser_wait_for`, `browser_console_messages`, `browser_network_requests`
|
|
37
|
-
|
|
38
|
-
### browser-use MCP (Token Efficient Alternative)
|
|
39
|
-
|
|
40
|
-
Compresses DOM before sending to LLM — significantly fewer tokens per interaction. Python-based.
|
|
41
|
-
|
|
42
|
-
Setup: See [browser-use MCP docs](https://docs.browser-use.com/customize/integrations/mcp-server)
|
|
43
|
-
|
|
44
|
-
### Chrome DevTools MCP
|
|
45
|
-
|
|
46
|
-
Connects to an already-running Chrome instance via remote debugging.
|
|
47
|
-
|
|
48
|
-
```json
|
|
49
|
-
{
|
|
50
|
-
"chrome-devtools": {
|
|
51
|
-
"command": "npx",
|
|
52
|
-
"args": ["chrome-devtools-mcp"]
|
|
53
|
-
}
|
|
54
|
-
}
|
|
55
|
-
```
|
|
56
|
-
|
|
57
|
-
## Best Practices
|
|
58
|
-
|
|
59
|
-
- Always inspect the page before clicking or typing to get current element state
|
|
60
|
-
- Take a screenshot **before** and **after** each significant action
|
|
61
|
-
- Wait after navigation to confirm the page loaded
|
|
62
|
-
- Capture console errors on unexpected behavior
|
|
63
|
-
- Save screenshots to a temp path, then record via `harshjudge evidence`
|