@allenpan2026/harshjudge 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,258 @@
# Create Scenario Workflow

## Trigger

Use this workflow when the user wants to:
- Create a new E2E test scenario
- Define test steps for a user flow
- Document a test case with expected behavior

## CLI Commands Used

- `harshjudge create <slug>`: creates a scenario with individual step files

## Prerequisites

- HarshJudge must be initialized (the `.harshJudge/` directory exists)
- If it is not initialized, run the setup workflow first

## Workflow

### Step 1: Check PRD for Context

**Before creating a scenario, review existing knowledge:**

```
Read .harshJudge/prd.md
```

Check for:
- Existing user flows to test
- Known UI patterns and selectors
- Timing considerations
- Environment requirements
- Test credentials

### Step 2: Gather Scenario Information

Collect the following from the user (or analyze the codebase to suggest values):

| Field | Required | Description | Example |
|-------|----------|-------------|---------|
| `slug` | Yes | URL-safe identifier | `login-flow`, `checkout-process` |
| `title` | Yes | Human-readable title | `User Login Flow` |
| `steps` | Yes | Array of step objects | See format below |
| `tags` | No | Categorization tags | `["auth", "critical"]` |
| `estimatedDuration` | No | Expected duration in seconds | `60` |
| `starred` | No | Mark as favorite | `false` |

### Step 3: Define Steps

Each step object has this shape (fields marked `?` are optional):

```typescript
interface ScenarioStep {
  title: string;           // Step title (becomes the filename)
  description?: string;    // What this step does
  preconditions?: string;  // Required state before the step
  actions: string;         // Actions to perform
  expectedOutcome: string; // What should happen
}
```
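
The comment above says a step's title becomes its filename, and the sample output in Step 5 shows names such as `01-navigate-to-login.md`. A minimal sketch of that mapping, assuming the CLI zero-pads the 1-based step position to two digits and slugifies the title by lowercasing it and collapsing non-alphanumeric runs into hyphens. `slugify` and `stepFileName` are hypothetical helpers, not part of HarshJudge.

```typescript
// Hypothetical helpers: the real naming rule inside `harshjudge create` may differ.

// Lowercase the title and collapse every run of non-alphanumeric
// characters into a single hyphen, trimming hyphens at either end.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build "NN-step-slug.md" from a 1-based step position and its title.
function stepFileName(position: number, title: string): string {
  const id = position < 10 ? `0${position}` : String(position);
  return `${id}-${slugify(title)}.md`;
}

const titles = ["Navigate to login", "Enter credentials", "Submit form"];
const files = titles.map((t, i) => stepFileName(i + 1, t));
console.log(files.join(", "));
// prints "01-navigate-to-login.md, 02-enter-credentials.md, 03-submit-form.md"
```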

**Example step:**
```json
{
  "title": "Navigate to login",
  "description": "Open the application login page",
  "preconditions": "Application is running at baseUrl",
  "actions": "1. Navigate to /login\n2. Wait for page to load",
  "expectedOutcome": "Login form is visible with email and password fields"
}
```

### Step 4: Run `harshjudge create`

Pass the scenario data as JSON via stdin or a file:

```bash
harshjudge create login-flow --title "User Login Flow" --steps-file steps.json
```

Or provide the JSON inline:

```bash
harshjudge create login-flow --json '{
  "title": "User Login Flow",
  "steps": [
    {
      "title": "Navigate to login",
      "description": "Open the login page",
      "actions": "1. Navigate to /login\n2. Wait for page load",
      "expectedOutcome": "Login form is visible"
    },
    {
      "title": "Enter credentials",
      "description": "Fill in the login form",
      "preconditions": "Login form is visible",
      "actions": "1. Enter email into email field\n2. Enter password into password field",
      "expectedOutcome": "Both fields are populated"
    },
    {
      "title": "Submit form",
      "description": "Submit and verify login",
      "actions": "1. Click login button\n2. Wait for redirect",
      "expectedOutcome": "Dashboard is displayed with welcome message"
    }
  ],
  "tags": ["auth", "critical", "smoke"],
  "estimatedDuration": 60,
  "starred": false
}'
```
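
Inline JSON wrapped in single quotes breaks as soon as a step's text contains an apostrophe. One way around shell quoting, assuming the orchestrator launches processes directly rather than through a shell, is to build the payload as an object and pass it as a single argv element. The sketch below only assembles the argument list from the `create` and `--json` usage shown above. `ScenarioPayload` and `buildCreateArgs` are illustrative names; spawning the process (for example with Node's `child_process.spawn("harshjudge", args)`, which does not use a shell by default) is left out.

```typescript
// Illustrative types mirroring the fields documented in Steps 2 and 3.
interface ScenarioStep {
  title: string;
  description?: string;
  preconditions?: string;
  actions: string;
  expectedOutcome: string;
}

interface ScenarioPayload {
  title: string;
  steps: ScenarioStep[];
  tags?: string[];
  estimatedDuration?: number;
  starred?: boolean;
}

// Assemble argv for `harshjudge create <slug> --json '<payload>'`.
// Each element reaches the process verbatim, so no shell quoting is needed.
function buildCreateArgs(slug: string, payload: ScenarioPayload): string[] {
  return ["create", slug, "--json", JSON.stringify(payload)];
}

const args = buildCreateArgs("login-flow", {
  title: "User Login Flow",
  steps: [
    {
      title: "Navigate to login",
      actions: "1. Navigate to /login\n2. Wait for page load",
      expectedOutcome: "Login form is visible",
    },
  ],
  tags: ["auth", "critical", "smoke"],
  estimatedDuration: 60,
  starred: false,
});
console.log(args.slice(0, 3).join(" ")); // prints "create login-flow --json"
```

`JSON.stringify` encodes the real newline in `actions` as `\n`, which is the same text the hand-written inline example contains.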

### Step 5: Verify Output

On success, the command prints:

```
Scenario created: login-flow

Structure:
.harshJudge/scenarios/login-flow/
  meta.yaml
  steps/
    01-navigate-to-login.md
    02-enter-credentials.md
    03-submit-form.md

Steps: 3
Tags: auth, critical, smoke
```

**On Success:** Continue to Step 6.

**On Error:** STOP and report (see Error Handling below).

### Step 6: Report Success

```
Scenario created: login-flow

Structure:
.harshJudge/scenarios/login-flow/
  meta.yaml          # Scenario definition
  steps/
    01-navigate-to-login.md
    02-enter-credentials.md
    03-submit-form.md

Steps: 3
Tags: auth, critical, smoke

Next steps:
1. Run the scenario: "Run the login-flow scenario"
2. Expect iteration: first runs often reveal needed adjustments
3. Learnings will be captured in prd.md
```

---

## Created File Structure

After `harshjudge create` completes:

```
.harshJudge/scenarios/{slug}/
  meta.yaml              # Scenario metadata + step references
  steps/
    01-{step-slug}.md    # First step details
    02-{step-slug}.md    # Second step details
    ...
```

**meta.yaml format:**
```yaml
title: User Login Flow
slug: login-flow
starred: false
tags:
  - auth
  - critical
estimatedDuration: 60
steps:
  - id: '01'
    title: Navigate to login
    file: 01-navigate-to-login.md
  - id: '02'
    title: Enter credentials
    file: 02-enter-credentials.md
  - id: '03'
    title: Submit form
    file: 03-submit-form.md
totalRuns: 0
passCount: 0
failCount: 0
avgDuration: 0
```
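
For tooling that reads `meta.yaml` (for example, a script that summarizes scenario health), the fields above can be mirrored as a TypeScript type. This sketch assumes the file has already been parsed by a YAML library; `StepRef`, `ScenarioMeta`, and `passRate` are hypothetical names, and defining the pass rate as `passCount / totalRuns` is an assumption about how the counters relate.

```typescript
// Hypothetical shape of a parsed meta.yaml, mirroring the example above.
interface StepRef {
  id: string;   // zero-padded, e.g. "01"
  title: string;
  file: string; // file name inside steps/
}

interface ScenarioMeta {
  title: string;
  slug: string;
  starred: boolean;
  tags: string[];
  estimatedDuration?: number;
  steps: StepRef[];
  totalRuns: number;
  passCount: number;
  failCount: number;
  avgDuration: number;
}

// Share of recorded runs that passed; 0 for a scenario that has never run.
function passRate(meta: ScenarioMeta): number {
  return meta.totalRuns === 0 ? 0 : meta.passCount / meta.totalRuns;
}

const fresh: ScenarioMeta = {
  title: "User Login Flow",
  slug: "login-flow",
  starred: false,
  tags: ["auth", "critical"],
  estimatedDuration: 60,
  steps: [{ id: "01", title: "Navigate to login", file: "01-navigate-to-login.md" }],
  totalRuns: 0,
  passCount: 0,
  failCount: 0,
  avgDuration: 0,
};
console.log(passRate(fresh)); // prints 0
```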

**Step file format (01-navigate-to-login.md):**
```markdown
# Step 01: Navigate to login

## Description
Open the login page

## Preconditions
Application is running at baseUrl

## Actions
1. Navigate to /login
2. Wait for page load

## Expected Outcome
Login form is visible
```
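
The step file is a direct rendering of one step object. A sketch of that rendering, assuming optional sections are simply omitted when the corresponding field is absent (the example above does not show that case). `renderStepFile` is a hypothetical helper, not HarshJudge's own code.

```typescript
interface ScenarioStep {
  title: string;
  description?: string;
  preconditions?: string;
  actions: string;
  expectedOutcome: string;
}

// Render a step in the layout shown above. Optional sections are skipped
// when the field is missing or empty (an assumption about the CLI's behavior).
function renderStepFile(id: string, step: ScenarioStep): string {
  const sections: string[] = [`# Step ${id}: ${step.title}`];
  const add = (heading: string, body?: string): void => {
    if (body !== undefined && body !== "") {
      sections.push(`## ${heading}\n${body}`);
    }
  };
  add("Description", step.description);
  add("Preconditions", step.preconditions);
  add("Actions", step.actions);
  add("Expected Outcome", step.expectedOutcome);
  return sections.join("\n\n") + "\n";
}

const sample = renderStepFile("01", {
  title: "Navigate to login",
  description: "Open the login page",
  preconditions: "Application is running at baseUrl",
  actions: "1. Navigate to /login\n2. Wait for page load",
  expectedOutcome: "Login form is visible",
});
console.log(sample);
```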

---

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| `Project not initialized` | The `.harshJudge/` directory is missing | Run the setup workflow |
| `Invalid slug format` | Slug contains non-URL-safe characters | Use only lowercase letters, numbers, and hyphens |
| `Steps array empty` | No steps provided | Add at least one step |
| `Step missing actions` | Incomplete step object | Add `actions` and `expectedOutcome` |

**On Error:**
1. **STOP immediately**
2. Report the error with full context
3. Do NOT proceed or retry
+
232
+ ---
233
+
234
+ ## Updating Existing Scenarios
235
+
236
+ To update a scenario (same slug = update):
237
+
238
+ ```bash
239
+ harshjudge create login-flow --json '{ ... updated steps ... }'
240
+ ```
241
+
242
+ **What happens on update:**
243
+ - Step files are overwritten with new content
244
+ - `meta.yaml` is updated with new step references
245
+ - Run statistics (totalRuns, passCount, etc.) are **preserved**
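
The update rule amounts to replacing the definition fields while carrying over the run counters. A sketch of that merge, assuming `meta.yaml` has been parsed into an object; `mergeOnUpdate` and the `RunStats` grouping are illustrative, and the real CLI may treat other fields (for example `starred`) differently.

```typescript
interface RunStats {
  totalRuns: number;
  passCount: number;
  failCount: number;
  avgDuration: number;
}

interface StepRef {
  id: string;
  title: string;
  file: string;
}

interface ScenarioMeta extends RunStats {
  title: string;
  slug: string;
  starred: boolean;
  tags: string[];
  steps: StepRef[];
}

// Take the definition from the new data but keep the existing run statistics.
function mergeOnUpdate(
  existing: ScenarioMeta,
  incoming: Omit<ScenarioMeta, keyof RunStats>
): ScenarioMeta {
  return {
    ...incoming,
    totalRuns: existing.totalRuns,
    passCount: existing.passCount,
    failCount: existing.failCount,
    avgDuration: existing.avgDuration,
  };
}

const step01: StepRef = { id: "01", title: "Navigate to login", file: "01-navigate-to-login.md" };
const beforeUpdate: ScenarioMeta = {
  title: "User Login Flow", slug: "login-flow", starred: false, tags: ["auth"],
  steps: [step01], totalRuns: 5, passCount: 3, failCount: 2, avgDuration: 12000,
};
const afterUpdate = mergeOnUpdate(beforeUpdate, {
  title: "User Login Flow", slug: "login-flow", starred: false, tags: ["auth", "smoke"],
  steps: [step01, { id: "02", title: "Enter credentials", file: "02-enter-credentials.md" }],
});
console.log(afterUpdate.totalRuns, afterUpdate.steps.length); // prints 5 2
```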

**When to update vs. create new:**
- **Update:** fixing selectors, adding steps, correcting expectations
- **New:** testing a completely different flow

---

## Post-Create Guidance

After successful creation:
1. **Run the scenario** - first runs often fail; this is expected
2. **Use the iterate workflow** - to fix issues and capture learnings
3. **Learnings go to prd.md** - document selector patterns, timing, etc.
@@ -0,0 +1,152 @@
# Iterate Scenario Workflow

## Trigger

Use this workflow when:
- A test run **failed** and the scenario needs refinement
- The scenario definition doesn't match actual application behavior
- The user wants to **improve** a scenario based on evidence from a failed run
- Test steps are **outdated** after application changes

## CLI Commands Used

- `harshjudge status <slug>`: review failed run evidence
- `harshjudge create <slug>`: update the scenario and its step files
- `harshjudge start` + `harshjudge complete-step` + `harshjudge complete-run`: re-run the test
- Playwright tools for browser automation

## Core Philosophy: Learn from Failures

**Failed runs are valuable data, not waste.** Each failed run provides:
1. Screenshots showing what actually happened (in `step-XX/evidence/`)
2. Logs revealing backend behavior
3. Evidence of gaps between expectation and reality

**Goal:** Use the evidence to iterate toward a scenario that accurately tests the intended behavior, and **accumulate learnings** in `prd.md`.

## Workflow

### Step 1: Analyze the Failed Run

```bash
harshjudge status login-flow
```

Review the output to identify the lastRun ID, which step failed, and the historical pass rate.

### Step 2: Review Evidence

Navigate to the failed run's evidence directories:

```
.harshJudge/scenarios/{slug}/runs/{runId}/
```

Read `result.json` for per-step details. View screenshots in `step-XX/evidence/`.
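
When scripting this review, the locations can be computed from the layout above. This sketch assumes `result.json` sits directly in the run directory and that step directories use the two-digit numbering seen in paths like `step-01/evidence/` in the [[run]] workflow. `evidenceDir` and `resultFile` are hypothetical helpers.

```typescript
// Zero-pad a step number to two digits ("step-01"), an assumption based on
// the example evidence paths.
function stepDirName(step: number): string {
  return step < 10 ? `step-0${step}` : `step-${step}`;
}

// Run directory relative to the project root.
function runDir(slug: string, runId: string): string {
  return `.harshJudge/scenarios/${slug}/runs/${runId}`;
}

// Per-step evidence directory for one run.
function evidenceDir(slug: string, runId: string, step: number): string {
  return `${runDir(slug, runId)}/${stepDirName(step)}/evidence`;
}

// Per-step result details for one run.
function resultFile(slug: string, runId: string): string {
  return `${runDir(slug, runId)}/result.json`;
}

console.log(evidenceDir("login-flow", "abc123xyz", 2));
// prints ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-02/evidence"
```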
46
+
47
+ ### Step 3: Review the Dashboard
48
+
49
+ ```bash
50
+ harshjudge dashboard open
51
+ ```
52
+
53
+ Open `http://localhost:3001` → Scenario → Failed Run.
54
+
55
+ Examine: before/after screenshots, console logs, network logs.
56
+
57
+ ### Step 4: Classify the Failure
58
+
59
+ | Failure Type | Description | Action | Document In |
60
+ |-------------|-------------|--------|-------------|
61
+ | **Selector Broken** | UI changed, selectors outdated | Edit step file | prd.md (selector notes) |
62
+ | **Timing Issue** | Action too fast, element not ready | Add wait to step | prd.md (timing patterns) |
63
+ | **Step Mismatch** | Step describes wrong flow | Edit step file | — |
64
+ | **Missing Step** | Need additional step | Add step, update scenario | — |
65
+ | **App Bug** | Application has actual bug | Mark as known-fail | prd.md (known bugs) |
66
+ | **Environment Issue** | Test env not matching prod | Fix environment | prd.md (env setup) |
67
+
68
+ ### Step 5: Update the Step File(s)
69
+
70
+ **Option A: Edit a single step file directly**
71
+
72
+ ```
73
+ Edit .harshJudge/scenarios/{slug}/steps/{stepId}-{step-slug}.md
74
+ ```
75
+
76
+ **Option B: Recreate scenario with updated steps**
77
+
78
+ ```bash
79
+ harshjudge create login-flow --json '{ "title": "...", "steps": [...] }'
80
+ ```
81
+
82
+ > `harshjudge create` preserves existing run statistics when updating.
83
+
84
+ ### Step 6: Re-run the Updated Scenario
85
+
86
+ Follow [[run]] workflow:
87
+ 1. `harshjudge start login-flow`
88
+ 2. Execute each step via spawned agents
89
+ 3. `harshjudge complete-run <runId>` with final status
90
+
91
+ ### Step 7: Record the Iteration
92
+
93
+ **Update prd.md with learnings:**
94
+
95
+ ```markdown
96
+ ## Iteration History
97
+
98
+ ### ITR-001: Login selector fix (2024-01-15)
99
+
100
+ **Scenario:** login-flow
101
+ **Failed Step:** 02 (Enter credentials)
102
+ **Root Cause:** Email input selector changed from `.email-input` to `[data-testid="email"]`
103
+
104
+ **Changes Made:**
105
+ - Updated step-02 Playwright selectors to use data-testid attributes
106
+
107
+ **Learning:**
108
+ - Always prefer data-testid selectors over class names
109
+ ```
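
Entries follow a fixed shape, so they can be generated rather than typed by hand. A sketch, assuming iteration IDs are numbered sequentially and zero-padded to three digits as in `ITR-001`. `IterationRecord` and `formatIteration` are hypothetical names.

```typescript
interface IterationRecord {
  number: number;     // sequential iteration counter
  summary: string;    // short headline, e.g. "Login selector fix"
  date: string;       // ISO date, e.g. "2024-01-15"
  scenario: string;
  failedStep: string; // e.g. "02 (Enter credentials)"
  rootCause: string;
  changes: string[];
  learnings: string[];
}

// Render one "Iteration History" entry in the layout shown above.
function formatIteration(r: IterationRecord): string {
  let n = String(r.number);
  while (n.length < 3) n = "0" + n; // ITR-001 style padding
  return [
    `### ITR-${n}: ${r.summary} (${r.date})`,
    "",
    `**Scenario:** ${r.scenario}`,
    `**Failed Step:** ${r.failedStep}`,
    `**Root Cause:** ${r.rootCause}`,
    "",
    "**Changes Made:**",
    ...r.changes.map((c) => `- ${c}`),
    "",
    "**Learning:**",
    ...r.learnings.map((l) => `- ${l}`),
  ].join("\n");
}

const example: IterationRecord = {
  number: 1,
  summary: "Login selector fix",
  date: "2024-01-15",
  scenario: "login-flow",
  failedStep: "02 (Enter credentials)",
  rootCause: 'Email input selector changed from `.email-input` to `[data-testid="email"]`',
  changes: ["Updated step-02 Playwright selectors to use data-testid attributes"],
  learnings: ["Always prefer data-testid selectors over class names"],
};
const entry = formatIteration(example);
console.log(entry);
```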

### Step 8: Report Iteration Result

```
Iteration complete: login-flow

Previous Run: {runId} (FAIL at step 02)
New Run: {newRunId} (PASS)

Changes:
- Updated step-02 selectors to use data-testid

Learnings recorded in prd.md:
- Selector convention: prefer data-testid attributes
```

---

## Best Practices

1. **Review step evidence first**: before changing anything, examine the before/after screenshots
2. **Edit individual steps when possible**: for small fixes, edit the `.md` file directly
3. **Use create for major changes**: when adding, removing, or reorganizing steps
4. **Document learnings in prd.md**: after each successful iteration
5. **Keep iterations small**: one change per iteration makes diagnosis clearer

---

## Error Handling

| Error | Action |
|-------|--------|
| `harshjudge status` fails | Check that HarshJudge is initialized and the scenario exists |
| `harshjudge create` fails | Check the slug format and the validity of the steps array |
| Step file not found | Recreate the scenario with `harshjudge create` |
| New run fails the same way | Check whether the change was applied correctly |
| New run fails differently | Progress: a new issue to investigate |

**On Error:**
1. **STOP**: do not proceed
2. **Report**: command, params, error
3. **Check prd.md**: is this a known pattern?
4. **Do NOT retry**: unless the user instructs you to
@@ -0,0 +1,41 @@
# Playwright Tools Reference

Used during step execution in [[run]].

## Navigation & State

| Tool | Usage |
|------|-------|
| `browser_navigate` | `{ "url": "http://localhost:3000" }` |
| `browser_snapshot` | `{}` → Returns accessibility tree with refs |
| `browser_take_screenshot` | `{ "filename": "step-01-before.png" }` |

## Interactions

| Tool | Usage |
|------|-------|
| `browser_click` | `{ "element": "Login button", "ref": "e5" }` |
| `browser_type` | `{ "element": "Email input", "ref": "e4", "text": "test@example.com" }` |
| `browser_select_option` | `{ "element": "Country", "ref": "e7", "values": ["USA"] }` |

## Waiting

| Tool | Usage |
|------|-------|
| `browser_wait_for` | `{ "text": "Welcome" }` |
| `browser_wait_for` | `{ "textGone": "Loading..." }` |
| `browser_wait_for` | `{ "time": 2 }` |

## Debugging

| Tool | Usage |
|------|-------|
| `browser_console_messages` | `{ "level": "error" }` |
| `browser_network_requests` | `{}` |

## Best Practices

- Always call `browser_snapshot` before `browser_click` or `browser_type` to get current element refs
- Take a screenshot **before** and **after** each significant action
- Use `browser_wait_for` after navigation to confirm the page loaded
- Capture console errors on any unexpected behavior
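
The practices above imply a fixed call order around each interaction. The sketch below expresses one click as a sequence of tool calls, using only the tool names and argument keys from the tables. The `ref` cannot be known in advance: it has to come from a fresh `browser_snapshot` result, so the plan takes it as a parameter after the snapshot has been read. `ToolCall` and `clickPlan` are illustrative names.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Calls to issue for one click, given a ref read from the latest
// browser_snapshot. Screenshots bracket the action, and a wait confirms
// the page reacted before the "after" capture.
function clickPlan(
  stepId: string,
  element: string,
  ref: string,
  waitForText: string
): ToolCall[] {
  return [
    { tool: "browser_take_screenshot", args: { filename: `step-${stepId}-before.png` } },
    { tool: "browser_click", args: { element, ref } },
    { tool: "browser_wait_for", args: { text: waitForText } },
    { tool: "browser_take_screenshot", args: { filename: `step-${stepId}-after.png` } },
  ];
}

const plan = clickPlan("03", "Login button", "e5", "Welcome");
console.log(plan.map((c) => c.tool).join(" -> "));
// prints "browser_take_screenshot -> browser_click -> browser_wait_for -> browser_take_screenshot"
```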
@@ -0,0 +1,65 @@
# Step Agent Prompt Template

Used by the main orchestrator in [[run]] when spawning per-step agents.

## Prompt Template

```
Execute step {stepId} of scenario {scenarioSlug}:

## Step Content
{paste content from steps/{step.file}}

## Project Context
Base URL: {from config.yaml}
Auth: {from prd.md if this step involves login}

## Previous Step
Status: {pass|fail|first step}

## Your Task
1. Navigate to the base URL if not already there
2. Execute the actions described in the step content
3. Use browser_snapshot before clicking to get element refs
4. Capture before/after screenshots using browser_take_screenshot
5. Record evidence:
   harshjudge evidence {runId} --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
6. Verify the expected outcome
7. Write a summary describing what happened and whether the expected outcome matched

Return ONLY a JSON object:
{
  "status": "pass" | "fail",
  "evidencePaths": ["path1.png", "path2.png"],
  "error": null | "error message",
  "summary": "Brief description of what happened and result (1-2 sentences)"
}

## Important Rules
- DO NOT return full evidence content
- DO NOT explain your work in prose
- DO NOT proceed if you encounter an error
- ONLY return the JSON result object
```
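
The orchestrator has to substitute the identifier-style placeholders such as `{stepId}` and `{runId}` before spawning the agent. A minimal sketch, assuming those placeholders are plain `{name}` tokens and that unknown tokens should stay in place so gaps remain visible. The template also contains brace groups that are not identifiers (the JSON return shape, `{pass|fail|first step}`, `{from config.yaml}`), so the pattern deliberately leaves those for the orchestrator to fill by hand. `fillTemplate` is a hypothetical helper.

```typescript
// Replace {identifier} tokens with values from `vars`. Tokens without a value,
// and brace groups that are not bare identifiers, are left untouched.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{([A-Za-z][A-Za-z0-9]*)\}/g, (token: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : token
  );
}

const header = fillTemplate("Execute step {stepId} of scenario {scenarioSlug}:", {
  stepId: "01",
  scenarioSlug: "login-flow",
});
console.log(header); // prints "Execute step 01 of scenario login-flow:"
```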

## Spawning via Task Tool

```
Task tool with:
  subagent_type: "general-purpose"
  prompt: <filled prompt above>
```

## Expected Return Shape

```json
{
  "status": "pass",
  "evidencePaths": [
    ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/before.png",
    ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/after.png"
  ],
  "error": null,
  "summary": "Login form loaded successfully. Email and password fields visible."
}
```
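
Although the agent is told to return only JSON, the orchestrator should still guard against malformed replies before calling `complete-step`. The sketch below parses the reply and checks the four fields. Treating any reply that does not match as a failed step is an assumption, chosen to match the "Step agent fails" row of the run workflow's error table. `StepResult` and `parseStepResult` are illustrative names.

```typescript
interface StepResult {
  status: "pass" | "fail";
  evidencePaths: string[];
  error: string | null;
  summary: string;
}

// Parse an agent reply; anything that does not match the expected shape
// becomes a failed result describing the problem.
function parseStepResult(reply: string): StepResult {
  const failure = (error: string): StepResult => ({
    status: "fail",
    evidencePaths: [],
    error,
    summary: "Step agent returned an invalid result.",
  });

  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    return failure("Reply is not valid JSON");
  }
  if (typeof data !== "object" || data === null) return failure("Reply is not a JSON object");

  const r = data as Record<string, unknown>;
  const paths = r.evidencePaths;
  const okStatus = r.status === "pass" || r.status === "fail";
  const okPaths = Array.isArray(paths) && paths.every((p: unknown) => typeof p === "string");
  const okError = r.error === null || typeof r.error === "string";
  const okSummary = typeof r.summary === "string";
  if (!okStatus || !okPaths || !okError || !okSummary) {
    return failure("Reply is missing a required field or has the wrong type");
  }
  return r as unknown as StepResult;
}
```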
@@ -0,0 +1,129 @@
# Run Scenario Workflow

## Trigger

Use this workflow when the user wants to:
- Execute an E2E test scenario
- Run a specific test with evidence capture
- Validate application behavior

## CLI Commands Used

**HarshJudge Commands (in order):**
1. `harshjudge start <scenarioSlug>`: initialize the test run and get the step list
2. `harshjudge evidence <runId>`: capture evidence for each step
3. `harshjudge complete-step <runId>`: complete each step and get the next step
4. `harshjudge complete-run <runId>`: finalize with pass/fail status

See [[run-playwright]] for the Playwright tool reference.

> **TOKEN OPTIMIZATION**: Each step executes in its own spawned agent. This isolates context and prevents token accumulation.

## Prerequisites

- HarshJudge initialized (`.harshJudge/` exists)
- Scenario exists with steps (created via `harshjudge create`)
- Target application is running at the configured baseUrl

## Orchestration Flow

```
1. harshjudge start <scenarioSlug>
   → Returns: runId, steps[{id, title, file}]

2. Read .harshJudge/prd.md for project context

3. FOR EACH step in steps:
   a. Read step file: .harshJudge/scenarios/{slug}/steps/{step.file}
   b. Spawn step agent (see [[run-step-agent]] for prompt template)
   c. Agent returns: { status, evidencePaths, error, summary }
   d. harshjudge complete-step <runId> --step <id> --status <pass|fail>
        --duration <ms> --summary "..."
      → Returns: nextStepId or null
   e. IF status === 'fail' OR nextStepId === null: BREAK

4. harshjudge complete-run <runId> --status <pass|fail> --duration <ms>

5. Report results to user
```
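
The flow above can be sketched as a small driver loop. To keep it self-contained and testable, the three side effects (running a step agent, calling `complete-step`, calling `complete-run`) are injected as functions. The `RunDeps` interface, the function names, and timing with `Date.now()` are assumptions, not HarshJudge APIs. The try/catch blocks follow the error table below: a failing agent becomes a failed step, and a failing `complete-step` still leads to `complete-run`. A real orchestrator would also read the step file and prd.md before spawning each agent.

```typescript
type Status = "pass" | "fail";

interface StepInfo {
  id: string;
  title: string;
  file: string;
}

interface AgentResult {
  status: Status;
  evidencePaths: string[];
  error: string | null;
  summary: string;
}

// Hypothetical adapters around the step agent and the CLI commands.
interface RunDeps {
  runStep(step: StepInfo): Promise<AgentResult>;
  completeStep(stepId: string, status: Status, durationMs: number, summary: string): Promise<string | null>;
  completeRun(status: Status, durationMs: number, failedStep?: string, error?: string): Promise<void>;
}

async function orchestrate(steps: StepInfo[], deps: RunDeps): Promise<Status> {
  const runStart = Date.now();
  let failure: { step: string; error: string } | null = null;

  for (const step of steps) {
    const stepStart = Date.now();
    let result: AgentResult;
    try {
      result = await deps.runStep(step);
    } catch (e) {
      // A crashed agent is recorded as a failed step.
      result = { status: "fail", evidencePaths: [], error: String(e), summary: "Step agent failed" };
    }

    let next: string | null;
    try {
      next = await deps.completeStep(step.id, result.status, Date.now() - stepStart, result.summary);
    } catch (e) {
      // complete-step failed: stop the loop but still attempt complete-run.
      failure = { step: step.id, error: `complete-step failed: ${String(e)}` };
      break;
    }

    if (result.status === "fail") {
      failure = { step: step.id, error: result.error ?? "Step failed" };
      break;
    }
    if (next === null) break; // last step reached or the CLI says to stop
  }

  // Always finalize the run, even on failure.
  const status: Status = failure === null ? "pass" : "fail";
  await deps.completeRun(status, Date.now() - runStart, failure?.step, failure?.error);
  return status;
}
```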

## Step 1: Start the Run

```bash
harshjudge start login-flow
```

The output includes `runId`, `runPath`, and a `steps[]` array of `{id, title, file}` objects.

## Step 2: Read Project Context

```
Read .harshJudge/prd.md
```

Extract the base URL, auth credentials, and tech stack information.

## Step 3: Execute Each Step

For each step: read the step file → spawn a step agent → process the result → call complete-step.

See [[run-step-agent]] for the full step agent prompt template.

**Complete the step:**
```bash
harshjudge complete-step <runId> \
  --step 01 \
  --status pass \
  --duration 3500 \
  --summary "Navigated to login page. Form visible with email/password fields."
```
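
Agent summaries can contain apostrophes or double quotes, so when the orchestrator invokes the CLI it is safer to pass each value as its own argv element than to interpolate into a shell string. A sketch that assembles the arguments using only the flags shown above. `completeStepArgs` is a hypothetical helper, and rounding the duration to whole milliseconds is an assumption about what the CLI accepts.

```typescript
type StepStatus = "pass" | "fail";

// Hypothetical helper: argv for
//   harshjudge complete-step <runId> --step <id> --status <s> --duration <ms> --summary "<text>"
function completeStepArgs(
  runId: string,
  stepId: string,
  status: StepStatus,
  durationMs: number,
  summary: string
): string[] {
  return [
    "complete-step", runId,
    "--step", stepId,
    "--status", status,
    "--duration", String(Math.round(durationMs)),
    "--summary", summary,
  ];
}

const argv = completeStepArgs(
  "abc123xyz", "01", "pass", 3500.4,
  "Navigated to login page. Form visible with email/password fields."
);
console.log(argv.slice(0, 8).join(" "));
// prints "complete-step abc123xyz --step 01 --status pass --duration 3500"
```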

Returns `nextStepId` (`null` after the last step or when the run should stop).

## Step 4: Complete the Run

**On Success:**
```bash
harshjudge complete-run <runId> --status pass --duration 15234
```

**On Failure:**
```bash
harshjudge complete-run <runId> \
  --status fail \
  --duration 8521 \
  --failed-step 03 \
  --error "Expected dashboard but got error page"
```
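
The two forms above differ only in the failure details, so one builder can produce either argument list. A sketch using only the flags shown above; `RunOutcome` and `completeRunArgs` are illustrative names, and sending `--failed-step` and `--error` only when the status is `fail` is an assumption based on the examples.

```typescript
interface RunOutcome {
  status: "pass" | "fail";
  durationMs: number;
  failedStep?: string; // e.g. "03"; only sent when status is "fail"
  error?: string;      // only sent when status is "fail"
}

// Hypothetical helper: argv for `harshjudge complete-run <runId> ...`.
function completeRunArgs(runId: string, outcome: RunOutcome): string[] {
  const args: string[] = [
    "complete-run", runId,
    "--status", outcome.status,
    "--duration", String(Math.round(outcome.durationMs)),
  ];
  if (outcome.status === "fail") {
    if (outcome.failedStep !== undefined) args.push("--failed-step", outcome.failedStep);
    if (outcome.error !== undefined) args.push("--error", outcome.error);
  }
  return args;
}

console.log(completeRunArgs("abc123xyz", { status: "pass", durationMs: 15234 }).join(" "));
// prints "complete-run abc123xyz --status pass --duration 15234"
```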

## Evidence Recording

```bash
harshjudge evidence <runId> \
  --step 1 \
  --type screenshot \
  --name before \
  --data /absolute/path/to/screenshot.png
```

Saved to: `.harshJudge/scenarios/{slug}/runs/{runId}/step-01/evidence/`

Evidence types: `screenshot`, `console_log`, `network_log`, `html_snapshot`.
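
Encoding the four evidence types as a union lets the compiler reject a misspelled type before the CLI ever sees it. A sketch using only the flags shown above; `EvidenceType`, `evidenceArgs`, and `beforeAfterArgs` are hypothetical names, and the before/after pairing reflects the screenshot practice described in the step agent template.

```typescript
type EvidenceType = "screenshot" | "console_log" | "network_log" | "html_snapshot";

// Hypothetical helper: argv for
//   harshjudge evidence <runId> --step <n> --type <type> --name <name> --data <path>
function evidenceArgs(
  runId: string,
  step: number,
  type: EvidenceType,
  name: string,
  dataPath: string
): string[] {
  return [
    "evidence", runId,
    "--step", String(step),
    "--type", type,
    "--name", name,
    "--data", dataPath,
  ];
}

// One invocation each for the "before" and "after" screenshots of a step.
function beforeAfterArgs(runId: string, step: number, beforePath: string, afterPath: string): string[][] {
  return [
    evidenceArgs(runId, step, "screenshot", "before", beforePath),
    evidenceArgs(runId, step, "screenshot", "after", afterPath),
  ];
}

const shot = evidenceArgs("abc123xyz", 1, "screenshot", "before", "/tmp/step-01-before.png");
console.log(shot.join(" "));
// prints "evidence abc123xyz --step 1 --type screenshot --name before --data /tmp/step-01-before.png"
```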

## Error Handling

| Error | Action |
|-------|--------|
| `harshjudge start` fails | STOP, report error |
| Step agent fails | Call complete-step with fail, break loop |
| `harshjudge evidence` fails | Log warning, continue |
| `harshjudge complete-step` fails | CRITICAL: attempt complete-run anyway |
| `harshjudge complete-run` fails | CRITICAL: report immediately |

Always call `complete-run`, even on failure. Never retry unless the user instructs.

## Post-Run Guidance

**On Pass:** Consider re-running to verify stability.

**On Fail:** Use the iterate workflow and review the evidence in `step-XX/evidence/`. See [[iterate]].