npm - @allenpan2026/harshjudge - Versions diffs - 0.4.0 - Mend

@allenpan2026/harshjudge 0.4.0


Files changed (18)

package/.claude-plugin/marketplace.json +17 -0
package/.claude-plugin/plugin.json +11 -0
package/LICENSE +21 -0
package/README.md +224 -0
package/dist/cli.js +1869 -0
package/dist/cli.js.map +1 -0
package/dist/dashboard-worker.js +896 -0
package/dist/dashboard-worker.js.map +1 -0
package/package.json +64 -0
package/skills/harshjudge/SKILL.md +152 -0
package/skills/harshjudge/assets/prd.md +36 -0
package/skills/harshjudge/references/create.md +258 -0
package/skills/harshjudge/references/iterate.md +152 -0
package/skills/harshjudge/references/run-playwright.md +41 -0
package/skills/harshjudge/references/run-step-agent.md +65 -0
package/skills/harshjudge/references/run.md +129 -0
package/skills/harshjudge/references/setup.md +129 -0
package/skills/harshjudge/references/status.md +134 -0

package/skills/harshjudge/references/create.md ADDED

@@ -0,0 +1,258 @@
+# Create Scenario Workflow
+## Trigger
+Use this workflow when the user wants to:
+- Create a new E2E test scenario
+- Define test steps for a user flow
+- Document a test case with expected behavior
+## CLI Commands Used
+- `harshjudge create <slug>` — creates scenario with individual step files
+## Prerequisites
+- HarshJudge must be initialized (`.harshJudge/` directory exists)
+- If not initialized, run setup workflow first
+## Workflow
+### Step 1: Check PRD for Context
+**Before creating a scenario, review existing knowledge:**
+```
+Read .harshJudge/prd.md
+```
+Check for:
+- Existing user flows to test
+- Known UI patterns and selectors
+- Timing considerations
+- Environment requirements
+- Test credentials
+### Step 2: Gather Scenario Information
+Collect from user (or analyze codebase to suggest):
+| Field | Required | Description | Example |
+|-------|----------|-------------|---------|
+| `slug` | Yes | URL-safe identifier | `login-flow`, `checkout-process` |
+| `title` | Yes | Human-readable title | `User Login Flow` |
+| `steps` | Yes | Array of step objects | See format below |
+| `tags` | No | Categorization tags | `["auth", "critical"]` |
+| `estimatedDuration` | No | Expected duration in seconds | `60` |
+| `starred` | No | Mark as favorite | `false` |
+### Step 3: Define Steps
+Each step needs:
+```typescript
+{
+  title: string,           // Step title (becomes filename)
+  description?: string,    // What this step does
+  preconditions?: string,  // Required state before step
+  actions: string,         // Actions to perform
+  expectedOutcome: string  // What should happen
+}
+```
+**Example step:**
+```json
+{
+  "title": "Navigate to login",
+  "description": "Open the application login page",
+  "preconditions": "Application is running at baseUrl",
+  "actions": "1. Navigate to /login\n2. Wait for page to load",
+  "expectedOutcome": "Login form is visible with email and password fields"
+}
+```
+### Step 4: Run create
+Pass scenario data as JSON via stdin or a file:
+```bash
+harshjudge create login-flow --title "User Login Flow" --steps-file steps.json
+```
+Or provide inline JSON:
+```bash
+harshjudge create login-flow --json '{
+  "title": "User Login Flow",
+  "steps": [
+    {
+      "title": "Navigate to login",
+      "description": "Open the login page",
+      "actions": "1. Navigate to /login\n2. Wait for page load",
+      "expectedOutcome": "Login form is visible"
+    },
+    {
+      "title": "Enter credentials",
+      "description": "Fill in the login form",
+      "preconditions": "Login form is visible",
+      "actions": "1. Enter email into email field\n2. Enter password into password field",
+      "expectedOutcome": "Both fields are populated"
+    },
+    {
+      "title": "Submit form",
+      "description": "Submit and verify login",
+      "actions": "1. Click login button\n2. Wait for redirect",
+      "expectedOutcome": "Dashboard is displayed with welcome message"
+    }
+  ],
+  "tags": ["auth", "critical", "smoke"],
+  "estimatedDuration": 60,
+  "starred": false
+}'
+```
+### Step 5: Verify Output
+The command outputs:
+```
+Scenario created: login-flow
+Structure:
+  .harshJudge/scenarios/login-flow/
+    meta.yaml
+    steps/
+      01-navigate-to-login.md
+      02-enter-credentials.md
+      03-submit-form.md
+Steps: 3
+Tags: auth, critical, smoke
+```
+**On Success:** Continue to Step 6
+**On Error:** STOP and report (see Error Handling below)
+### Step 6: Report Success
+```
+Scenario created: login-flow
+Structure:
+  .harshJudge/scenarios/login-flow/
+    meta.yaml           # Scenario definition
+    steps/
+      01-navigate-to-login.md
+      02-enter-credentials.md
+      03-submit-form.md
+Steps: 3
+Tags: auth, critical, smoke
+Next steps:
+1. Run the scenario: "Run the login-flow scenario"
+2. Expect iteration: First runs often reveal needed adjustments
+3. Learnings will be captured in prd.md
+```
+---
+## Created File Structure
+After `harshjudge create` completes:
+```
+.harshJudge/scenarios/{slug}/
+  meta.yaml           # Scenario metadata + step references
+  steps/
+    01-{step-slug}.md # First step details
+    02-{step-slug}.md # Second step details
+    ...
+```
+**meta.yaml format:**
+```yaml
+title: User Login Flow
+slug: login-flow
+starred: false
+tags:
+  - auth
+  - critical
+  - smoke
+estimatedDuration: 60
+steps:
+  - id: '01'
+    title: Navigate to login
+    file: 01-navigate-to-login.md
+  - id: '02'
+    title: Enter credentials
+    file: 02-enter-credentials.md
+  - id: '03'
+    title: Submit form
+    file: 03-submit-form.md
+totalRuns: 0
+passCount: 0
+failCount: 0
+avgDuration: 0
+```
+**Step file format (01-navigate-to-login.md):**
+```markdown
+# Step 01: Navigate to login
+## Description
+Open the login page
+## Preconditions
+Application is running at baseUrl
+## Actions
+1. Navigate to /login
+2. Wait for page load
+## Expected Outcome
+Login form is visible
+```
+---
+## Error Handling
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| `Project not initialized` | Missing .harshJudge/ | Run setup workflow |
+| `Invalid slug format` | Non-URL-safe characters | Use lowercase, hyphens, numbers only |
+| `Steps array empty` | No steps provided | Add at least one step |
+| `Step missing actions` | Incomplete step object | Add actions and expectedOutcome |
+**On Error:**
+1. **STOP immediately**
+2. Report error with full context
+3. Do NOT proceed or retry
+---
+## Updating Existing Scenarios
+To update a scenario (same slug = update):
+```bash
+harshjudge create login-flow --json '{ ... updated steps ... }'
+```
+**What happens on update:**
+- Step files are overwritten with new content
+- `meta.yaml` is updated with new step references
+- Run statistics (totalRuns, passCount, etc.) are **preserved**
+**When to update vs create new:**
+- **Update:** Fixing selectors, adding steps, correcting expectations
+- **New:** Testing completely different flow
+---
+## Post-Create Guidance
+After successful creation:
+1. **Run the scenario** - First runs often fail, this is expected
+2. **Use iterate workflow** - To fix issues and capture learnings
+3. **Learnings go to prd.md** - Document selector patterns, timing, etc.
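
The error table and the `NN-{step-slug}.md` naming shown in create.md can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the slug pattern, the title-to-slug rule, and the names `validateScenario` and `stepFileName` are hypothetical. Only the error messages and the file layout come from the document.

```typescript
// Hypothetical types mirroring the step shape documented in create.md.
interface StepInput {
  title: string;
  description?: string;
  preconditions?: string;
  actions: string;
  expectedOutcome: string;
}

interface ScenarioInput {
  title: string;
  steps: StepInput[];
  tags?: string[];
}

// Assumed slug rule: lowercase letters and digits, groups joined by single hyphens.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns the first matching message from the error table, or null if nothing is wrong.
function validateScenario(slug: string, input: ScenarioInput): string | null {
  if (!SLUG_PATTERN.test(slug)) return "Invalid slug format";
  if (input.steps.length === 0) return "Steps array empty";
  for (const step of input.steps) {
    if (!step.actions || !step.expectedOutcome) return "Step missing actions";
  }
  return null;
}

// Assumed naming rule: two-digit position plus a lowercased, hyphenated title.
function stepFileName(index: number, title: string): string {
  const position = index + 1;
  const id = position < 10 ? `0${position}` : String(position);
  const stepSlug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${id}-${stepSlug}.md`;
}
```

Running a check like this before invoking `harshjudge create` would surface input problems early, though the CLI's own validation remains the source of truth.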

package/skills/harshjudge/references/iterate.md ADDED

@@ -0,0 +1,152 @@
+# Iterate Scenario Workflow
+## Trigger
+Use this workflow when:
+- A test run **failed** and needs scenario refinement
+- The scenario definition doesn't match actual application behavior
+- User wants to **improve** a scenario based on evidence from a failed run
+- Test steps are **outdated** after application changes
+## CLI Commands Used
+- `harshjudge status <slug>` — review failed run evidence
+- `harshjudge create <slug>` — update scenario with step files
+- `harshjudge start` + `harshjudge complete-step` + `harshjudge complete-run` — re-run test
+- Playwright tools for browser automation
+## Core Philosophy: Learn from Failures
+**Failed runs are valuable data, not waste.** Each failed run provides:
+1. Screenshots showing what actually happened (in `step-XX/evidence/`)
+2. Logs revealing backend behavior
+3. Evidence of gaps between expectation and reality
+**Goal:** Use evidence to iterate toward a scenario that accurately tests the intended behavior, and **accumulate learnings** in `prd.md`.
+## Workflow
+### Step 1: Analyze the Failed Run
+```bash
+harshjudge status login-flow
+```
+Review the output to identify the lastRun ID, the step that failed, and the historical pass rate.
+### Step 2: Review Evidence
+Navigate to the failed run's evidence directories:
+```
+.harshJudge/scenarios/{slug}/runs/{runId}/
+```
+Read `result.json` for per-step details. View screenshots in `step-XX/evidence/`.
+### Step 3: Review the Dashboard
+```bash
+harshjudge dashboard open
+```
+Open `http://localhost:3001` → Scenario → Failed Run.
+Examine: before/after screenshots, console logs, network logs.
+### Step 4: Classify the Failure
+| Failure Type | Description | Action | Document In |
+|-------------|-------------|--------|-------------|
+| **Selector Broken** | UI changed, selectors outdated | Edit step file | prd.md (selector notes) |
+| **Timing Issue** | Action too fast, element not ready | Add wait to step | prd.md (timing patterns) |
+| **Step Mismatch** | Step describes wrong flow | Edit step file | — |
+| **Missing Step** | Need additional step | Add step, update scenario | — |
+| **App Bug** | Application has actual bug | Mark as known-fail | prd.md (known bugs) |
+| **Environment Issue** | Test env not matching prod | Fix environment | prd.md (env setup) |
+### Step 5: Update the Step File(s)
+**Option A: Edit a single step file directly**
+```
+Edit .harshJudge/scenarios/{slug}/steps/{stepId}-{step-slug}.md
+```
+**Option B: Recreate scenario with updated steps**
+```bash
+harshjudge create login-flow --json '{ "title": "...", "steps": [...] }'
+```
+> `harshjudge create` preserves existing run statistics when updating.
+### Step 6: Re-run the Updated Scenario
+Follow the [[run]] workflow:
+1. `harshjudge start login-flow`
+2. Execute each step via spawned agents
+3. `harshjudge complete-run <runId>` with final status
+### Step 7: Record the Iteration
+**Update prd.md with learnings:**
+```markdown
+## Iteration History
+### ITR-001: Login selector fix (2024-01-15)
+**Scenario:** login-flow
+**Failed Step:** 02 (Enter credentials)
+**Root Cause:** Email input selector changed from `.email-input` to `[data-testid="email"]`
+**Changes Made:**
+- Updated step-02 Playwright selectors to use data-testid attributes
+**Learning:**
+- Always prefer data-testid selectors over class names
+```
+### Step 8: Report Iteration Result
+```
+Iteration complete: login-flow
+Previous Run: {runId} (FAIL at step 02)
+New Run: {newRunId} (PASS)
+Changes:
+- Updated step-02 selectors to use data-testid
+Learnings recorded in prd.md:
+- Selector convention: prefer data-testid attributes
+```
+---
+## Best Practices
+1. **Review step evidence first** — before changing anything, examine before/after screenshots
+2. **Edit individual steps when possible** — for small fixes, edit the `.md` file directly
+3. **Use create for major changes** — when adding/removing steps or reorganizing
+4. **Document learnings in prd.md** — after each successful iteration
+5. **Small iterations** — one change per iteration for clearer diagnosis
+---
+## Error Handling
+| Error | Action |
+|-------|--------|
+| `harshjudge status` fails | Check that HarshJudge is initialized and the scenario exists |
+| `harshjudge create` fails | Check slug format, step array validity |
+| Step file not found | Recreate scenario with `harshjudge create` |
+| New run fails same way | Check if change was applied correctly |
+| New run fails differently | Progress — new issue to investigate |
+**On Error:**
+1. **STOP** — Do not proceed
+2. **Report** — Command, params, error
+3. **Check prd.md** — Is this a known pattern?
+4. **Do NOT retry** — Unless user instructs
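
Step 2 of iterate.md points at per-step evidence directories. A small helper for building those paths might look like the sketch below. The function name `evidenceDir` is hypothetical, and the two-digit zero padding is inferred from the `step-01` and `step-XX` examples in the document rather than confirmed against the package source.

```typescript
// Builds the evidence directory for one step of a run, following the
// .harshJudge/scenarios/{slug}/runs/{runId}/step-XX/evidence/ layout.
// Accepts a step as a number (1) or a string ("01").
function evidenceDir(slug: string, runId: string, step: number | string): string {
  const n = Number(step);
  // Reject NaN, fractions, zero, and negatives.
  if (Math.floor(n) !== n || n < 1) {
    throw new Error(`Invalid step: ${step}`);
  }
  // Assumed padding: two digits, matching the step-01 example.
  const id = n < 10 ? `0${n}` : String(n);
  return `.harshJudge/scenarios/${slug}/runs/${runId}/step-${id}/evidence/`;
}
```

For example, `evidenceDir("login-flow", "abc123xyz", 1)` yields the same directory that appears in the step agent's sample `evidencePaths`.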

package/skills/harshjudge/references/run-playwright.md ADDED

@@ -0,0 +1,41 @@
+# Playwright Tools Reference
+Used during step execution in [[run]].
+## Navigation & State
+| Tool | Usage |
+|------|-------|
+| `browser_navigate` | `{ "url": "http://localhost:3000" }` |
+| `browser_snapshot` | `{}` → Returns accessibility tree with refs |
+| `browser_take_screenshot` | `{ "filename": "step-01-before.png" }` |
+## Interactions
+| Tool | Usage |
+|------|-------|
+| `browser_click` | `{ "element": "Login button", "ref": "e5" }` |
+| `browser_type` | `{ "element": "Email input", "ref": "e4", "text": "test@example.com" }` |
+| `browser_select_option` | `{ "element": "Country", "ref": "e7", "values": ["USA"] }` |
+## Waiting
+| Tool | Usage |
+|------|-------|
+| `browser_wait_for` | `{ "text": "Welcome" }` |
+| `browser_wait_for` | `{ "textGone": "Loading..." }` |
+| `browser_wait_for` | `{ "time": 2 }` |
+## Debugging
+| Tool | Usage |
+|------|-------|
+| `browser_console_messages` | `{ "level": "error" }` |
+| `browser_network_requests` | `{}` |
+## Best Practices
+- Always call `browser_snapshot` before `browser_click` or `browser_type` to get current element refs
+- Take a screenshot **before** and **after** each significant action
+- Use `browser_wait_for` after navigation to confirm page loaded
+- Capture console errors on any unexpected behavior

package/skills/harshjudge/references/run-step-agent.md ADDED

@@ -0,0 +1,65 @@
+# Step Agent Prompt Template
+Used by the main orchestrator in [[run]] when spawning per-step agents.
+## Prompt Template
+```
+Execute step {stepId} of scenario {scenarioSlug}:
+## Step Content
+{paste content from steps/{step.file}}
+## Project Context
+Base URL: {from config.yaml}
+Auth: {from prd.md if this step involves login}
+## Previous Step
+Status: {pass|fail|first step}
+## Your Task
+1. Navigate to the base URL if not already there
+2. Execute the actions described in the step content
+3. Use browser_snapshot before clicking to get element refs
+4. Capture before/after screenshots using browser_take_screenshot
+5. Record evidence:
+   harshjudge evidence {runId} --step {stepNumber} --type screenshot --name before --data /path/to/screenshot.png
+6. Verify the expected outcome
+7. Write a summary describing what happened and whether expected outcome matched
+Return ONLY a JSON object:
+{
+  "status": "pass" | "fail",
+  "evidencePaths": ["path1.png", "path2.png"],
+  "error": null | "error message",
+  "summary": "Brief description of what happened and result (1-2 sentences)"
+}
+## Important Rules
+- DO NOT return full evidence content
+- DO NOT explain your work in prose
+- DO NOT proceed if you encounter an error
+- ONLY return the JSON result object
+```
+## Spawning via Task Tool
+```
+Task tool with:
+  subagent_type: "general-purpose"
+  prompt: <filled prompt above>
+```
+## Expected Return Shape
+```json
+{
+  "status": "pass",
+  "evidencePaths": [
+    ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/before.png",
+    ".harshJudge/scenarios/login-flow/runs/abc123xyz/step-01/evidence/after.png"
+  ],
+  "error": null,
+  "summary": "Login form loaded successfully. Email and password fields visible."
+}
+```
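
Because the step agent is asked to return only a JSON object, an orchestrator would likely want to check the reply against the expected return shape before trusting it. The sketch below shows one way to do that. `StepResult` and `parseStepResult` are hypothetical names, and treating a malformed reply as an error is an assumption, not documented behavior.

```typescript
// Mirrors the "Expected Return Shape" documented in run-step-agent.md.
interface StepResult {
  status: "pass" | "fail";
  evidencePaths: string[];
  error: string | null;
  summary: string;
}

// Parses the agent's raw reply and checks each documented field.
// Throws when the reply is not valid JSON or does not match the shape.
function parseStepResult(raw: string): StepResult {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) {
    throw new Error("Step result must be a JSON object");
  }
  const { status, evidencePaths, error, summary } = value as Record<string, unknown>;
  if (status !== "pass" && status !== "fail") {
    throw new Error('status must be "pass" or "fail"');
  }
  if (!Array.isArray(evidencePaths) || !evidencePaths.every((p) => typeof p === "string")) {
    throw new Error("evidencePaths must be an array of strings");
  }
  if (error !== null && typeof error !== "string") {
    throw new Error("error must be null or a string");
  }
  if (typeof summary !== "string") {
    throw new Error("summary must be a string");
  }
  return {
    status: status as "pass" | "fail",
    evidencePaths: evidencePaths as string[],
    error: error as string | null,
    summary,
  };
}
```

A caller could catch the thrown error and record the step as failed, which keeps a malformed agent reply from being mistaken for a pass.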

package/skills/harshjudge/references/run.md ADDED

@@ -0,0 +1,129 @@
+# Run Scenario Workflow
+## Trigger
+Use this workflow when the user wants to:
+- Execute an E2E test scenario
+- Run a specific test with evidence capture
+- Validate application behavior
+## CLI Commands Used
+**HarshJudge Commands (in order):**
+1. `harshjudge start <scenarioSlug>` — Initialize the test run, get step list
+2. `harshjudge evidence <runId>` — Capture evidence for each step
+3. `harshjudge complete-step <runId>` — Complete each step, get next step
+4. `harshjudge complete-run <runId>` — Finalize with pass/fail status
+See [[run-playwright]] for Playwright tool reference.
+> **TOKEN OPTIMIZATION**: Each step executes in its own spawned agent. This isolates context and prevents token accumulation.
+## Prerequisites
+- HarshJudge initialized (`.harshJudge/` exists)
+- Scenario exists with steps (created via `harshjudge create`)
+- Target application is running at configured baseUrl
+## Orchestration Flow
+```
+1. harshjudge start <scenarioSlug>
+   → Returns: runId, steps[{id, title, file}]
+2. Read .harshJudge/prd.md for project context
+3. FOR EACH step in steps:
+   a. Read step file: .harshJudge/scenarios/{slug}/steps/{step.file}
+   b. Spawn step agent (see [[run-step-agent]] for prompt template)
+   c. Agent returns: { status, evidencePaths, error, summary }
+   d. harshjudge complete-step <runId> --step <id> --status <pass|fail>
+      --duration <ms> --summary "..."
+      → Returns: nextStepId or null
+   e. IF status === 'fail' OR nextStepId === null: BREAK
+4. harshjudge complete-run <runId> --status <pass|fail> --duration <ms>
+5. Report results to user
+```
+## Step 1: Start the Run
+```bash
+harshjudge start login-flow
+```
+Output includes `runId`, `runPath`, `steps[]` array with `{id, title, file}`.
+## Step 2: Read Project Context
+```
+Read .harshJudge/prd.md
+```
+Extract: Base URL, auth credentials, tech stack info.
+## Step 3: Execute Each Step
+For each step: read step file → spawn step agent → process result → call complete-step.
+See [[run-step-agent]] for the full step agent prompt template.
+**Complete the step:**
+```bash
+harshjudge complete-step <runId> \
+  --step 01 \
+  --status pass \
+  --duration 3500 \
+  --summary "Navigated to login page. Form visible with email/password fields."
+```
+Returns `nextStepId` (null after the last step or when the run should stop).
+## Step 4: Complete the Run
+**On Success:**
+```bash
+harshjudge complete-run <runId> --status pass --duration 15234
+```
+**On Failure:**
+```bash
+harshjudge complete-run <runId> \
+  --status fail \
+  --duration 8521 \
+  --failed-step 03 \
+  --error "Expected dashboard but got error page"
+```
+## Evidence Recording
+```bash
+harshjudge evidence <runId> \
+  --step 1 \
+  --type screenshot \
+  --name before \
+  --data /absolute/path/to/screenshot.png
+```
+Saved to: `.harshJudge/scenarios/{slug}/runs/{runId}/step-01/evidence/`
+Evidence types: `screenshot`, `console_log`, `network_log`, `html_snapshot`.
+## Error Handling
+| Error | Action |
+|-------|--------|
+| `harshjudge start` fails | STOP, report error |
+| Step agent fails | complete-step with fail, break loop |
+| `harshjudge evidence` fails | Log warning, continue |
+| `harshjudge complete-step` fails | CRITICAL: attempt complete-run anyway |
+| `harshjudge complete-run` fails | CRITICAL: report immediately |
+Once a run has started, always call `complete-run`, even on failure. Never retry unless the user instructs.
+## Post-Run Guidance
+**On Pass:** Consider re-running to verify stability.
+**On Fail:** Use iterate workflow. Review evidence in `step-XX/evidence/`. See [[iterate]].
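
The orchestration flow in run.md can be sketched as a loop over injected functions. Everything named here is hypothetical: `HarshJudgeCli` stands in for whatever wrapper shells out to `harshjudge start`, `complete-step`, and `complete-run`, and `StepRunner` stands in for spawning a step agent. The sketch only illustrates the control flow (stop on the first failure, finalize the run regardless), under the assumption that the CLI output has already been parsed into objects.

```typescript
type Status = "pass" | "fail";

interface StepRef { id: string; title: string; file: string }

interface AgentResult { status: Status; summary: string; error: string | null }

// Hypothetical wrapper: one method per documented CLI command.
interface HarshJudgeCli {
  start(slug: string): Promise<{ runId: string; steps: StepRef[] }>;
  completeStep(runId: string, stepId: string, status: Status,
               durationMs: number, summary: string): Promise<{ nextStepId: string | null }>;
  completeRun(runId: string, status: Status, durationMs: number,
              failedStep?: string, error?: string): Promise<void>;
}

// Hypothetical stand-in for spawning a per-step agent.
type StepRunner = (runId: string, step: StepRef) => Promise<AgentResult>;

async function runScenario(slug: string, cli: HarshJudgeCli, runStep: StepRunner): Promise<Status> {
  const runStart = Date.now();
  // If start fails, the error propagates and nothing else is attempted.
  const { runId, steps } = await cli.start(slug);
  let status: Status = "pass";
  let failedStep: string | undefined;
  let error: string | undefined;

  try {
    for (const step of steps) {
      const stepStart = Date.now();
      let result: AgentResult;
      try {
        result = await runStep(runId, step);
      } catch (e) {
        // A crashed agent is recorded as a failed step.
        const message = e instanceof Error ? e.message : String(e);
        result = { status: "fail", summary: "Step agent failed", error: message };
      }
      const { nextStepId } = await cli.completeStep(
        runId, step.id, result.status, Date.now() - stepStart, result.summary);
      if (result.status === "fail") {
        status = "fail";
        failedStep = step.id;
        error = result.error || undefined;
        break;
      }
      if (nextStepId === null) break;
    }
  } catch (e) {
    // complete-step failed: still attempt complete-run below.
    status = "fail";
    error = e instanceof Error ? e.message : String(e);
  }

  await cli.completeRun(runId, status, Date.now() - runStart, failedStep, error);
  return status;
}
```

Injecting the CLI wrapper and the step runner keeps the loop testable with fakes, without a browser or a real `.harshJudge/` directory.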