@allenpan2026/harshjudge 0.4.1 → 0.4.2
- package/.claude-plugin/plugin.json +1 -1
- package/README.md +4 -1
- package/package.json +1 -1
- package/skills/harshjudge/references/run-browser.md +63 -0
- package/skills/harshjudge/references/run-step-agent.md +2 -2
- package/skills/harshjudge/references/run.md +1 -1
- package/skills/harshjudge/references/run-playwright.md +0 -41
package/README.md
CHANGED
@@ -34,7 +34,10 @@ harshjudge init my-app
 
 - **Node.js**: 18+ LTS
 - **Claude Code**: Latest version
-- **
+- **A browser automation tool** (any one of):
+- Playwright MCP (default, most common)
+- browser-use MCP (token efficient alternative)
+- Chrome DevTools MCP
 
 ## Installation
 
package/package.json
CHANGED
package/skills/harshjudge/references/run-browser.md
ADDED
@@ -0,0 +1,63 @@
+# Browser Tool Reference
+
+Used during step execution in [[run]].
+
+HarshJudge is **browser-tool-agnostic**. Use whatever browser automation tool is available in your environment. The step agent needs these capabilities:
+
+## Required Capabilities
+
+| Action | What to do |
+|--------|-----------|
+| Navigate | Go to a URL |
+| Inspect page | Get current page state (DOM, accessibility tree) before interacting |
+| Click | Click an element by text, role, or reference |
+| Type | Enter text into an input field |
+| Select | Choose an option from a dropdown |
+| Wait | Wait for text to appear/disappear, or for a timeout |
+| Screenshot | Capture the current page as an image file |
+| Console logs | Read browser console output |
+| Network logs | Read network requests/responses |
+
+## Supported Browser Tools
+
+### Playwright MCP (Default)
+
+Most common. Available as a Claude Code plugin.
+
+```json
+{
+  "playwright": {
+    "command": "npx",
+    "args": ["@playwright/mcp@latest"]
+  }
+}
+```
+
+Tools: `browser_navigate`, `browser_click`, `browser_type`, `browser_snapshot`, `browser_take_screenshot`, `browser_wait_for`, `browser_console_messages`, `browser_network_requests`
+
+### browser-use MCP (Token Efficient Alternative)
+
+Compresses DOM before sending to LLM — significantly fewer tokens per interaction. Python-based.
+
+Setup: See [browser-use MCP docs](https://docs.browser-use.com/customize/integrations/mcp-server)
+
+### Chrome DevTools MCP
+
+Connects to an already-running Chrome instance via remote debugging.
+
+```json
+{
+  "chrome-devtools": {
+    "command": "npx",
+    "args": ["chrome-devtools-mcp"]
+  }
+}
+```
+
+## Best Practices
+
+- Always inspect the page before clicking or typing to get current element state
+- Take a screenshot **before** and **after** each significant action
+- Wait after navigation to confirm the page loaded
+- Capture console errors on unexpected behavior
+- Save screenshots to a temp path, then record via `harshjudge evidence`
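The capability table and best practices in the new run-browser.md can be read as a small tool-independent contract. The sketch below is one possible way to picture it, not HarshJudge's actual code: the `BrowserTool` interface, `clickWithEvidence`, and `recordingTool` are all hypothetical names invented for illustration, with one method per row of the Required Capabilities table.

```typescript
// Hypothetical interface: one method per row of the Required Capabilities table.
// These names are illustrative and are not part of HarshJudge or any MCP server.
interface BrowserTool {
  navigate(url: string): Promise<void>;
  inspect(): Promise<string>; // DOM or accessibility tree as text
  click(target: string): Promise<void>;
  type(target: string, text: string): Promise<void>;
  select(target: string, option: string): Promise<void>;
  waitForText(text: string): Promise<void>;
  screenshot(path: string): Promise<void>;
  consoleLogs(): Promise<string[]>;
  networkLogs(): Promise<string[]>;
}

// Click following the listed practices: inspect first, screenshot before and
// after, and read console output if the click throws. The error is swallowed
// here so the "after" screenshot is still taken; a real agent would report it.
async function clickWithEvidence(
  tool: BrowserTool,
  target: string,
  shotDir: string,
): Promise<{ before: string; after: string; consoleOnError: string[] }> {
  const before = `${shotDir}/before.png`;
  const after = `${shotDir}/after.png`;
  let consoleOnError: string[] = [];

  await tool.inspect();
  await tool.screenshot(before);
  try {
    await tool.click(target);
  } catch {
    consoleOnError = await tool.consoleLogs();
  }
  await tool.screenshot(after);
  return { before, after, consoleOnError };
}

// In-memory stand-in that records calls, used only to show the call order.
function recordingTool(calls: string[]): BrowserTool {
  const note = (entry: string): void => { calls.push(entry); };
  return {
    navigate: async (url) => note(`navigate:${url}`),
    inspect: async () => { note("inspect"); return "<tree>"; },
    click: async (target) => note(`click:${target}`),
    type: async (target, text) => note(`type:${target}:${text}`),
    select: async (target, option) => note(`select:${target}:${option}`),
    waitForText: async (text) => note(`wait:${text}`),
    screenshot: async (path) => note(`screenshot:${path}`),
    consoleLogs: async () => { note("console"); return []; },
    networkLogs: async () => { note("network"); return []; },
  };
}
```

An adapter for Playwright MCP, browser-use, or Chrome DevTools MCP would implement the same interface, which is the sense in which the step agent stays tool-agnostic.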
package/skills/harshjudge/references/run-step-agent.md
CHANGED
@@ -20,8 +20,8 @@ Status: {pass|fail|first step}
 ## Your Task
 1. Navigate to the base URL if not already there
 2. Execute the actions described in the step content
-3. Use
-4.
+3. Use the available browser tool to inspect the page before interacting
+4. Take before/after screenshots using the browser tool
 5. Record evidence:
 harshjudge evidence {runId} --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
 6. Verify the expected outcome
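Step 5 of the task list records a screenshot with `harshjudge evidence`. As a rough sketch of how a script might assemble that command, the helper below builds the argument list from the flags shown in the hunk (`--step`, `--type`, `--name`, `--data`). The `ScreenshotEvidence` type and `evidenceArgs` function are hypothetical, and `"after"` as a `--name` value is an assumption based on the before/after wording; only `before` appears in the diff.

```typescript
// Hypothetical helper type; field names are illustrative only.
interface ScreenshotEvidence {
  runId: string;
  stepNumber: number;
  name: "before" | "after"; // "after" is assumed, only "before" is shown in the diff
  file: string; // path where the browser tool saved the screenshot
}

// Build the argument list for `harshjudge evidence`, mirroring the flags in
// the step-agent instructions. Returning an array avoids shell quoting issues.
function evidenceArgs(e: ScreenshotEvidence): string[] {
  return [
    "evidence", e.runId,
    "--step", String(e.stepNumber),
    "--type", "screenshot",
    "--name", e.name,
    "--data", e.file,
  ];
}
```

The resulting array could be passed to Node's `child_process.execFile("harshjudge", args, callback)` instead of concatenating a shell string.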
package/skills/harshjudge/references/run.md
CHANGED
@@ -15,7 +15,7 @@ Use this workflow when user wants to:
 3. `harshjudge complete-step <runId>` — Complete each step, get next step
 4. `harshjudge complete-run <runId>` — Finalize with pass/fail status
 
-See [[run-
+See [[run-browser]] for browser tool reference (Playwright MCP, browser-use, Chrome DevTools).
 
 > **TOKEN OPTIMIZATION**: Each step executes in its own spawned agent. This isolates context and prevents token accumulation.
 
package/skills/harshjudge/references/run-playwright.md
DELETED
@@ -1,41 +0,0 @@
-# Playwright Tools Reference
-
-Used during step execution in [[run]].
-
-## Navigation & State
-
-| Tool | Usage |
-|------|-------|
-| `browser_navigate` | `{ "url": "http://localhost:3000" }` |
-| `browser_snapshot` | `{}` → Returns accessibility tree with refs |
-| `browser_take_screenshot` | `{ "filename": "step-01-before.png" }` |
-
-## Interactions
-
-| Tool | Usage |
-|------|-------|
-| `browser_click` | `{ "element": "Login button", "ref": "e5" }` |
-| `browser_type` | `{ "element": "Email input", "ref": "e4", "text": "test@example.com" }` |
-| `browser_select_option` | `{ "element": "Country", "ref": "e7", "values": ["USA"] }` |
-
-## Waiting
-
-| Tool | Usage |
-|------|-------|
-| `browser_wait_for` | `{ "text": "Welcome" }` |
-| `browser_wait_for` | `{ "textGone": "Loading..." }` |
-| `browser_wait_for` | `{ "time": 2 }` |
-
-## Debugging
-
-| Tool | Usage |
-|------|-------|
-| `browser_console_messages` | `{ "level": "error" }` |
-| `browser_network_requests` | `{}` |
-
-## Best Practices
-
-- Always call `browser_snapshot` before `browser_click` or `browser_type` to get current element refs
-- Take a screenshot **before** and **after** each significant action
-- Use `browser_wait_for` after navigation to confirm page loaded
-- Capture console errors on any unexpected behavior
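The removed Playwright reference and the new capability table in run-browser.md line up row by row. The sketch below spells out that correspondence using only tool names that appear in this diff; the `Action` labels and the `playwrightClick` helper are hypothetical, and `browser_select_option` comes from the deleted file since the new Tools line does not list it.

```typescript
// Hypothetical labels for the rows of the Required Capabilities table.
type Action =
  | "navigate" | "inspect" | "click" | "type" | "select"
  | "wait" | "screenshot" | "console" | "network";

// Generic action -> Playwright MCP tool name, as named in the diff.
const playwrightTool: Record<Action, string> = {
  navigate: "browser_navigate",
  inspect: "browser_snapshot",
  click: "browser_click",
  type: "browser_type",
  select: "browser_select_option",
  wait: "browser_wait_for",
  screenshot: "browser_take_screenshot",
  console: "browser_console_messages",
  network: "browser_network_requests",
};

// Build the tool name and argument object for a click, using the argument
// shape shown in the removed reference: { element, ref }.
function playwrightClick(element: string, ref: string) {
  return { tool: playwrightTool.click, args: { element, ref } };
}
```

A similar table for browser-use MCP or Chrome DevTools MCP would need tool names from their own documentation, which this diff does not list.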