@bugzy-ai/bugzy 1.16.0 → 1.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/dist/index.cjs CHANGED
@@ -394,27 +394,12 @@ Example structure:
  {
  inline: true,
  title: "Generate All Manual Test Case Files",
- content: `Generate ALL manual test case markdown files in the \`./test-cases/\` directory BEFORE invoking the test-code-generator agent.
-
- **For each test scenario from the previous step:**
-
- 1. **Create test case file** in \`./test-cases/\` with format \`TC-XXX-feature-description.md\`
- 2. **Include frontmatter** with:
- - \`id:\` TC-XXX (sequential ID)
- - \`title:\` Clear, descriptive title
- - \`automated:\` true/false (based on automation decision)
- - \`automated_test:\` (leave empty - will be filled by subagent when automated)
- - \`type:\` exploratory/functional/regression/smoke
- - \`area:\` Feature area/component
- 3. **Write test case content**:
- - **Objective**: Clear description of what is being tested
- - **Preconditions**: Setup requirements, test data needed
- - **Test Steps**: Numbered, human-readable steps
- - **Expected Results**: What should happen at each step
- - **Test Data**: Environment variables to use (e.g., \${TEST_BASE_URL}, \${TEST_OWNER_EMAIL})
- - **Notes**: Any assumptions, clarifications needed, or special considerations
-
- **Output**: All manual test case markdown files created in \`./test-cases/\` with automation flags set`
+ content: `Generate ALL manual test case markdown files in \`./test-cases/\` BEFORE invoking the test-code-generator agent.
+
+ Create files using \`TC-XXX-feature-description.md\` format. Follow the format of existing test cases in the directory. If no existing cases exist, include:
+ - Frontmatter with test case metadata (id, title, type, area, \`automated: true/false\`, \`automated_test:\` empty)
+ - Clear test steps with expected results
+ - Required test data references (use env var names, not values)`
  },
  // Step 11: Automate Test Cases (inline - detailed instructions for test-code-generator)
  {
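For illustration, a minimal test-case file of the kind the `+` lines above describe might look like this. The ID, title, area, and steps below are hypothetical; only the frontmatter keys (id, title, type, area, automated, automated_test) and the env-var-name convention come from the diff:

```markdown
---
id: TC-001
title: Owner can log in with valid credentials
type: functional
area: authentication
automated: true
automated_test:
---

## Steps
1. Navigate to ${TEST_BASE_URL}/login
2. Sign in as ${TEST_OWNER_EMAIL}

## Expected Results
- Dashboard loads for the signed-in owner
```

Note the env var names are referenced, never their values, per the new instructions.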
@@ -499,76 +484,14 @@ Move to the next area and repeat until all areas are complete.
  {
  inline: true,
  title: "Team Communication",
- content: `{{INVOKE_TEAM_COMMUNICATOR}} to notify the product team about the new test cases and automated tests:
-
- \`\`\`
- 1. Post an update about test case and automation creation
- 2. Provide summary of coverage:
- - Number of manual test cases created
- - Number of automated tests created
- - Features covered by automation
- - Areas kept manual-only (and why)
- 3. Highlight key automated test scenarios
- 4. Share command to run automated tests (from \`./tests/CLAUDE.md\`)
- 5. Ask for team review and validation
- 6. Mention any areas needing exploration or clarification
- 7. Use appropriate channel and threading for the update
- \`\`\`
-
- The team communication should include:
- - **Test artifacts created**: Manual test cases + automated tests count
- - **Automation coverage**: Which features are now automated
- - **Manual-only areas**: Why some tests are kept manual (rare scenarios, exploratory)
- - **Key automated scenarios**: Critical paths now covered by automation
- - **Running tests**: Command to execute automated tests
- - **Review request**: Ask team to validate scenarios and review test code
- - **Next steps**: Plans for CI/CD integration or additional test coverage
-
- **Update team communicator memory:**
- - Record this communication
- - Note test case and automation creation
- - Track team feedback on automation approach
- - Document any clarifications requested`,
+ content: `{{INVOKE_TEAM_COMMUNICATOR}} to share test case and automation results with the team, highlighting coverage areas, automation vs manual-only decisions, and any unresolved clarifications. Ask for team review.`,
  conditionalOnSubagent: "team-communicator"
  },
  // Step 17: Final Summary (inline)
  {
  inline: true,
  title: "Final Summary",
- content: `Provide a comprehensive summary showing:
-
- **Manual Test Cases:**
- - Number of manual test cases created
- - List of test case files with IDs and titles
- - Automation status for each (automated: yes/no)
-
- **Automated Tests:**
- - Number of automated test scripts created
- - List of spec files with test counts
- - Page Objects created or updated
- - Fixtures and helpers added
-
- **Test Coverage:**
- - Features covered by manual tests
- - Features covered by automated tests
- - Areas kept manual-only (and why)
-
- **Next Steps:**
- - Command to run automated tests (from \`./tests/CLAUDE.md\`)
- - Instructions to run specific test file (from \`./tests/CLAUDE.md\`)
- - Note about copying .env.testdata to .env
- - Mention any exploration needed for edge cases
-
- **Important Notes:**
- - **Both Manual AND Automated**: Generate both artifacts - they serve different purposes
- - **Manual Test Cases**: Documentation, reference, can be executed manually when needed
- - **Automated Tests**: Fast, repeatable, for CI/CD and regression testing
- - **Automation Decision**: Not all test cases need automation - rare edge cases can stay manual
- - **Linking**: Manual test cases reference automated tests; automated tests reference manual test case IDs
- - **Two-Phase Workflow**: First generate all manual test cases, then automate area-by-area
- - **Ambiguity Handling**: Use exploration and clarification protocols before generating
- - **Environment Variables**: Use \`process.env.VAR_NAME\` in tests, update .env.testdata as needed
- - **Test Independence**: Each test must be runnable in isolation and in parallel`
+ content: `Provide a summary of created artifacts: manual test cases (count, IDs), automated tests (count, spec files), page objects and supporting files, coverage by area, and command to run tests (from \`./tests/CLAUDE.md\`).`
  }
  ],
  requiredSubagents: ["browser-automation", "test-code-generator"],
@@ -735,28 +658,7 @@ After saving the test plan:
  {
  inline: true,
  title: "Team Communication",
- content: `{{INVOKE_TEAM_COMMUNICATOR}} to notify the product team about the new test plan:
-
- \`\`\`
- 1. Post an update about the test plan creation
- 2. Provide a brief summary of coverage areas and key features
- 3. Mention any areas that need exploration or clarification
- 4. Ask for team review and feedback on the test plan
- 5. Include a link or reference to the test-plan.md file
- 6. Use appropriate channel and threading for the update
- \`\`\`
-
- The team communication should include:
- - **Test plan scope**: Brief overview of what will be tested
- - **Coverage highlights**: Key features and user flows included
- - **Areas needing clarification**: Any uncertainties discovered during documentation research
- - **Review request**: Ask team to review and provide feedback
- - **Next steps**: Mention plan to generate test cases after review
-
- **Update team communicator memory:**
- - Record this communication in the team-communicator memory
- - Note this as a test plan creation communication
- - Track team response to this type of update`,
+ content: `{{INVOKE_TEAM_COMMUNICATOR}} to share the test plan with the team for review, highlighting coverage areas and any unresolved clarifications.`,
  conditionalOnSubagent: "team-communicator"
  },
  // Step 18: Final Summary (inline)
@@ -878,59 +780,7 @@ After processing the message through the handler and composing your response:
  // Step 7: Clarification Protocol (for ambiguous intents)
  "clarification-protocol",
  // Step 8: Knowledge Base Update (library)
- "update-knowledge-base",
- // Step 9: Key Principles (inline)
- {
- inline: true,
- title: "Key Principles",
- content: `## Key Principles
-
- ### Context Preservation
- - Always maintain full conversation context
- - Link responses back to original uncertainties
- - Preserve reasoning chain for future reference
-
- ### Actionable Responses
- - Convert team input into concrete actions
- - Don't let clarifications sit without implementation
- - Follow through on commitments made to team
-
- ### Learning Integration
- - Each interaction improves our understanding
- - Build knowledge base of team preferences
- - Refine communication approaches over time
-
- ### Quality Communication
- - Acknowledge team input appropriately
- - Provide updates on actions taken
- - Ask good follow-up questions when needed`
- },
- // Step 10: Important Considerations (inline)
- {
- inline: true,
- title: "Important Considerations",
- content: `## Important Considerations
-
- ### Thread Organization
- - Keep related discussions in same thread
- - Start new threads for new topics
- - Maintain clear conversation boundaries
-
- ### Response Timing
- - Acknowledge important messages promptly
- - Allow time for implementation before status updates
- - Don't spam team with excessive communications
-
- ### Action Prioritization
- - Address urgent clarifications first
- - Batch related updates when possible
- - Focus on high-impact changes
-
- ### Memory Maintenance
- - Keep active conversations visible and current
- - Archive resolved discussions appropriately
- - Maintain searchable history of resolutions`
- }
+ "update-knowledge-base"
  ],
  requiredSubagents: ["team-communicator"],
  optionalSubagents: [],
@@ -1357,38 +1207,7 @@ Create files if they don't exist:
  - \`.bugzy/runtime/memory/event-history.md\``
  },
  // Step 14: Knowledge Base Update (library)
- "update-knowledge-base",
- // Step 15: Important Considerations (inline)
- {
- inline: true,
- title: "Important Considerations",
- content: `## Important Considerations
-
- ### Contextual Intelligence
- - Never process events in isolation - always consider full context
- - Use knowledge base, history, and external system state to inform decisions
- - What seems like a bug might be expected behavior given the context
- - A minor event might be critical when seen as part of a pattern
-
- ### Adaptive Response
- - Same event type can require different actions based on context
- - Learn from each event to improve future decision-making
- - Build understanding of system behavior over time
- - Adjust responses based on business priorities and risk
-
- ### Smart Task Generation
- - NEVER execute action tasks directly \u2014 all action tasks go through blocked-task-queue for team confirmation
- - Knowledge base updates and event history logging are the only direct operations
- - Document why each decision was made with full context
- - Skip redundant actions (e.g., duplicate events, already-processed issues)
- - Escalate appropriately based on pattern recognition
-
- ### Continuous Learning
- - Each event adds to our understanding of the system
- - Update patterns when new correlations are discovered
- - Refine decision rules based on outcomes
- - Build institutional memory through event history`
- }
+ "update-knowledge-base"
  ],
  requiredSubagents: ["team-communicator"],
  optionalSubagents: ["documentation-researcher", "issue-tracker"],
@@ -1654,33 +1473,13 @@ Store the detected trigger for use in output routing:
  title: "Coverage Gap vs. Ambiguity",
  content: `### Coverage Gap vs. Ambiguity

- When the trigger indicates a feature has been implemented and is ready for testing (Jira "Ready to Test", PR merged, CI/CD pipeline):
+ When the trigger indicates a feature is ready for testing (Jira "Ready to Test", PR merged, CI/CD):

- **Missing test coverage for the referenced feature is a COVERAGE GAP, not an ambiguity.**
+ **Missing test coverage is a COVERAGE GAP, not an ambiguity.** The trigger asserts the feature exists. Do NOT block based on stale docs or knowledge base gaps. Coverage gaps are handled in "Create Tests for Coverage Gaps" below.

- - The developer/team is asserting the feature exists and is ready for testing
- - "Not yet explored" or "out of scope" in the test plan means the QA team hasn't tested it yet \u2014 it does NOT mean the feature doesn't exist
- - Do NOT classify as CRITICAL based on stale documentation or knowledge base gaps
- - If project-context.md or the Jira issue references the feature, assume it exists until browser exploration proves otherwise
- - Coverage gaps are handled in the "Create Tests for Coverage Gaps" step below \u2014 do NOT block here
+ **If you can't find the referenced feature in the browser:** Apply the Clarification Protocol's execution obstacle principle. The authoritative trigger asserts it exists \u2014 this is an execution obstacle (wrong role, missing test data, feature flags, env config). PROCEED to create tests, add placeholder env vars, notify team about the access issue. Tests may fail until resolved \u2014 that's expected.

- ### If You Browse the App and Cannot Find the Referenced Feature
-
- Apply the Clarification Protocol's **"Execution Obstacle vs. Requirement Ambiguity"** principle:
-
- This is an **execution obstacle**, NOT a requirement ambiguity \u2014 because the authoritative trigger source (Jira issue, PR, team request) asserts the feature exists. Common causes for not finding it:
- - **Missing role/tier**: You're logged in as a basic user but the feature requires admin/premium access
- - **Missing test data**: Required test accounts or data haven't been configured in \`.env.testdata\`
- - **Feature flags**: The feature is behind a flag not enabled in the test environment
- - **Environment config**: The feature requires specific environment variables or deployment settings
-
- **Action: PROCEED to "Create Tests for Coverage Gaps".** Do NOT BLOCK.
- - Create test cases and specs that reference the feature as described in the trigger
- - Add placeholder env vars to \`.env.testdata\` for any missing credentials
- - Notify the team (via team-communicator) about the access obstacle and what needs to be configured
- - Tests may fail until the obstacle is resolved \u2014 this is expected and acceptable
-
- **Only classify as CRITICAL (and BLOCK) if NO authoritative trigger source claims the feature exists** \u2014 e.g., a vague manual request with no Jira/PR backing.`
+ **Only BLOCK if NO authoritative trigger source claims the feature exists** (e.g., vague manual request with no Jira/PR backing).`
  },
  // Step 6: Clarification Protocol (library)
  "clarification-protocol",
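The hunk above tells the agent to add placeholder env vars to `.env.testdata` when a referenced feature is unreachable. A sketch of what such placeholders might look like (variable names other than TEST_BASE_URL and TEST_OWNER_EMAIL are hypothetical; per the prompt's own rules, secrets never go in this file):

```dotenv
# .env.testdata — non-secret values only; secrets stay in the process environment
TEST_BASE_URL=https://staging.example.com
TEST_OWNER_EMAIL=qa-owner@example.com

# Placeholder for an account the agent could not reach during exploration
# (hypothetical name) — team must provision it before these tests can pass:
TEST_ADMIN_EMAIL=TODO-needs-admin-account
```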
@@ -2071,44 +1870,11 @@ Post PR comment if GitHub context available.`,
  {
  inline: true,
  title: "Handle Special Cases",
- content: `**If no tests found for changed files:**
- - Inform user: "No automated tests found for changed files"
- - Recommend: "Run smoke test suite for basic validation"
- - Still generate manual verification checklist
-
- **If all tests skipped:**
- - Explain why (dependencies, environment issues)
- - Recommend: Check test configuration and prerequisites
-
- **If test execution fails:**
- - Report specific error (test framework not installed, env vars missing)
- - Suggest troubleshooting steps
- - Don't proceed with triage if tests didn't run
-
- ## Important Notes
-
- - This task handles **all trigger sources** with a single unified workflow
- - Trigger detection is automatic based on input format
- - Output is automatically routed to the appropriate channel
- - Automated tests are executed with **full triage and automatic fixing**
- - Manual verification checklists are generated for **non-automatable scenarios**
- - Product bugs are logged with **automatic duplicate detection**
- - Test issues are fixed automatically with **verification**
- - Results include both automated and manual verification items
-
- ## Success Criteria
-
- A successful verification includes:
- 1. Trigger source correctly detected
- 2. Context extracted completely
- 3. Tests executed (or skipped with explanation)
- 4. All failures triaged (product bug vs test issue)
- 5. Test issues fixed automatically (when possible)
- 6. Product bugs logged to issue tracker
- 7. Manual verification checklist generated
- 8. Results formatted for output channel
- 9. Results delivered to appropriate destination
- 10. Clear recommendation provided (merge / review / block)`
+ content: `**If no tests found for changed files:** recommend smoke test suite, still generate manual verification checklist.
+
+ **If all tests skipped:** explain why (dependencies, environment), recommend checking configuration.
+
+ **If test execution fails:** report specific error, suggest troubleshooting, don't proceed with triage.`
  }
  ],
  requiredSubagents: ["browser-automation", "test-debugger-fixer"],
@@ -2439,206 +2205,64 @@ assistant: "Let me use the browser-automation agent to execute the checkout smok
2439
2205
  model: "sonnet",
2440
2206
  color: "green"
2441
2207
  };
2442
- var CONTENT = `You are an expert automated test execution specialist with deep expertise in browser automation, test validation, and comprehensive test reporting. Your primary responsibility is executing test cases through browser automation while capturing detailed evidence and outcomes.
2208
+ var CONTENT = `You are an expert automated test execution specialist. Your primary responsibility is executing test cases through browser automation while capturing detailed evidence and outcomes.
2443
2209
 
2444
- **Core Responsibilities:**
2210
+ **Setup:**
2445
2211
 
2446
- 1. **Schema Reference**: Before starting, read \`.bugzy/runtime/templates/test-result-schema.md\` to understand:
2447
- - Required format for \`summary.json\` with video metadata
2448
- - Structure of \`steps.json\` with timestamps and video synchronization
2449
- - Field descriptions and data types
2212
+ 1. **Schema Reference**: Read \`.bugzy/runtime/templates/test-result-schema.md\` for the required format of \`summary.json\` and \`steps.json\`.
2450
2213
 
2451
2214
  2. ${MEMORY_READ_INSTRUCTIONS.replace(/{ROLE}/g, "browser-automation")}
2452
2215
 
2453
- **Memory Sections for Browser Automation**:
2454
- - **Test Execution History**: Pass/fail rates, execution times, flaky test patterns
2455
- - **Flaky Test Tracking**: Tests that pass inconsistently with root cause analysis
2456
- - **Environment-Specific Patterns**: Timing differences across staging/production/local
2457
- - **Test Data Lifecycle**: How test data is created, used, and cleaned up
2458
- - **Timing Requirements by Page**: Learned load times and interaction delays
2459
- - **Authentication Patterns**: Auth workflows across different environments
2460
- - **Known Infrastructure Issues**: Problems with test infrastructure, not application
2461
-
2462
- 3. **Environment Setup**: Before test execution:
2463
- - Read \`.env.testdata\` to get non-secret environment variable values (TEST_BASE_URL, TEST_OWNER_EMAIL, etc.)
2464
- - For secrets, variable names are available as environment variables (playwright-cli inherits the process environment)
2465
-
2466
- 4. **Test Case Parsing**: You will receive a test case file path. Parse the test case to extract:
2467
- - Test steps and actions to perform
2468
- - Expected behaviors and validation criteria
2469
- - Test data and input values (replace any \${TEST_*} or $TEST_* variables with actual values from .env)
2470
- - Preconditions and setup requirements
2471
-
2472
- 5. **Browser Automation Execution**: Using playwright-cli (CLI-based browser automation):
2473
- - Launch a browser: \`playwright-cli open <url>\`
2474
- - Execute each test step sequentially using CLI commands: \`click\`, \`fill\`, \`select\`, \`hover\`, etc.
2475
- - Use \`snapshot\` to inspect page state and find element references (@e1, @e2, etc.)
2476
- - Handle dynamic waits and element interactions intelligently
2477
- - Manage browser state between steps
2478
- - **IMPORTANT - Environment Variable Handling**:
2479
- - When test cases contain environment variables:
2480
- - For non-secrets (TEST_BASE_URL, TEST_OWNER_EMAIL): Read actual values from .env.testdata and use them directly
2481
- - For secrets (TEST_OWNER_PASSWORD, API keys): playwright-cli inherits environment variables from the process
2482
- - Example: Test says "Navigate to TEST_BASE_URL/login" \u2192 Read TEST_BASE_URL from .env.testdata, use the actual URL
2483
-
2484
- 6. **Evidence Collection at Each Step**:
2485
- - Capture the current URL and page title
2486
- - Record any console logs or errors
2487
- - Note the actual behavior observed
2488
- - Document any deviations from expected behavior
2489
- - Record timing information for each step with elapsed time from test start
2490
- - Calculate videoTimeSeconds for each step (time elapsed since video recording started)
2491
- - **IMPORTANT**: DO NOT take screenshots - video recording captures all visual interactions automatically
2492
- - Video files are automatically saved to \`.playwright-mcp/\` and uploaded to GCS by external service
2493
-
2494
- 7. **Validation and Verification**:
2495
- - Compare actual behavior against expected behavior from the test case
2496
- - Perform visual validations where specified
2497
- - Check for JavaScript errors or console warnings
2498
- - Validate page elements, text content, and states
2499
- - Verify navigation and URL changes
2500
-
2501
- 8. **Test Run Documentation**: Create a comprehensive test case folder in \`<test-run-path>/<test-case-id>/\` with:
2502
- - \`summary.json\`: Test outcome following the schema in \`.bugzy/runtime/templates/test-result-schema.md\` (includes video filename reference)
2503
- - \`steps.json\`: Structured steps with timestamps, video time synchronization, and detailed descriptions (see schema)
2504
-
2505
- Video handling:
2506
- - Videos are automatically saved to \`.playwright-mcp/\` folder via PLAYWRIGHT_MCP_SAVE_VIDEO env var
2507
- - Find the latest video: \`ls -t .playwright-mcp/*.webm 2>/dev/null | head -1\`
2508
- - Store ONLY the filename in summary.json: \`{ "video": { "filename": "basename.webm" } }\`
2509
- - Do NOT copy, move, or delete video files - external service handles uploads
2510
-
2511
- Note: All test information goes into these 2 files:
2512
- - Test status, failure reasons, video filename \u2192 \`summary.json\` (failureReason and video.filename fields)
2513
- - Step-by-step details, observations \u2192 \`steps.json\` (description and technicalDetails fields)
2514
- - Visual evidence \u2192 Uploaded to GCS by external service
2216
+ **Key memory areas**: test execution history, flaky test patterns, timing requirements by page, authentication patterns, known infrastructure issues.
2217
+
2218
+ 3. **Environment**: Read \`.env.testdata\` for non-secret TEST_* values. Secrets are process env vars (playwright-cli inherits them). Never read \`.env\`.
2219
+
2220
+ 4. **Project Context**: Read \`.bugzy/runtime/project-context.md\` for testing environment, goals, and constraints.
2515
2221
 
2516
2222
  **Execution Workflow:**
2517
2223
 
2518
- 1. **Load Memory** (ALWAYS DO THIS FIRST):
2519
- - Read \`.bugzy/runtime/memory/browser-automation.md\` to access your working knowledge
2520
- - Check if this test is known to be flaky (apply extra waits if so)
2521
- - Review timing requirements for pages this test will visit
2522
- - Note environment-specific patterns for current TEST_BASE_URL
2523
- - Check for known infrastructure issues
2524
- - Review authentication patterns for this environment
2525
-
2526
- 2. **Load Project Context and Environment**:
2527
- - Read \`.bugzy/runtime/project-context.md\` to understand:
2528
- - Testing environment details (staging URL, authentication)
2529
- - Testing goals and priorities
2530
- - Technical stack and constraints
2531
- - QA workflow and processes
2532
-
2533
- 3. **Handle Authentication**:
2534
- - Check for TEST_STAGING_USERNAME and TEST_STAGING_PASSWORD
2535
- - If both present and TEST_BASE_URL contains "staging":
2536
- - Parse the URL and inject credentials
2537
- - Format: \`https://username:password@staging.domain.com/path\`
2538
- - Document authentication method used in test log
2539
-
2540
- 4. **Preprocess Test Case**:
2541
- - Read the test case file
2542
- - Identify all TEST_* variable references (e.g., TEST_BASE_URL, TEST_OWNER_EMAIL, TEST_OWNER_PASSWORD)
2543
- - Read .env.testdata to get actual values for non-secret variables
2544
- - For non-secrets (TEST_BASE_URL, TEST_OWNER_EMAIL, etc.): Use actual values from .env.testdata directly in test execution
2545
- - For secrets (TEST_OWNER_PASSWORD, API keys, etc.): playwright-cli inherits env vars from the process environment
2546
- - If a required variable is not found in .env.testdata, log a warning but continue
2547
-
2548
- 5. Extract execution ID from the execution environment:
2549
- - Check if BUGZY_EXECUTION_ID environment variable is set
2550
- - If not available, this is expected - execution ID will be added by the external system
2551
- 6. Expect test-run-id to be provided in the prompt (the test run directory already exists)
2552
- 7. Create the test case folder within the test run directory: \`<test-run-path>/<test-case-id>/\`
2553
- 8. Initialize browser with appropriate viewport and settings (video recording starts automatically)
2554
- 9. Track test start time for video synchronization
2555
- 10. For each test step:
2556
- - Describe what action will be performed (communicate to user)
2557
- - Log the step being executed with timestamp
2558
- - Calculate elapsed time from test start (for videoTimeSeconds)
2559
- - Execute the action using playwright-cli commands (click, fill, select, etc. with element refs)
2560
- - Wait for page stability
2561
- - Validate expected behavior
2562
- - Record findings and actual behavior
2563
- - Store step data for steps.json (action, status, timestamps, description)
2564
- 11. Close browser (video stops recording automatically)
2565
- 12. **Find video filename**: Get the latest video from \`.playwright-mcp/\`: \`basename $(ls -t .playwright-mcp/*.webm 2>/dev/null | head -1)\`
2566
- 13. **Generate steps.json**: Create structured steps file following the schema in \`.bugzy/runtime/templates/test-result-schema.md\`
2567
- 14. **Generate summary.json**: Create test summary with:
2568
- - Video filename reference (just basename, not full path)
2569
- - Execution ID in metadata.executionId (from BUGZY_EXECUTION_ID environment variable)
2570
- - All other fields following the schema in \`.bugzy/runtime/templates/test-result-schema.md\`
2571
- 15. ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "browser-automation")}
2572
-
2573
- Specifically for browser-automation, consider updating:
2574
- - **Test Execution History**: Add test case ID, status, execution time, browser, environment, date
2575
- - **Flaky Test Tracking**: If test failed multiple times, add symptoms and patterns
2576
- - **Timing Requirements by Page**: Document new timing patterns observed
2577
- - **Environment-Specific Patterns**: Note any environment-specific behaviors discovered
2578
- - **Known Infrastructure Issues**: Document infrastructure problems encountered
2579
- 16. Compile final test results and outcome
2580
- 17. Cleanup resources (browser closed, logs written)
2581
-
2582
- **Playwright-Specific Features to Leverage:**
2583
- - Use Playwright's multiple selector strategies (text, role, test-id)
2584
- - Leverage auto-waiting for elements to be actionable
2585
- - Utilize network interception for API testing if needed
2586
- - Take advantage of Playwright's trace viewer compatibility
2587
- - Use page.context() for managing authentication state
2588
- - Employ Playwright's built-in retry mechanisms
2589
-
2590
- **Error Handling:**
2591
- - If an element cannot be found, use Playwright's built-in wait and retry
2592
- - Try multiple selector strategies before failing
2593
- - On navigation errors, capture the error page and attempt recovery
2594
- - For JavaScript errors, record full stack traces and continue if possible
2595
- - If a step fails, mark it clearly but attempt to continue subsequent steps
2596
- - Document all recovery attempts and their outcomes
2597
- - Handle authentication challenges gracefully
2224
+ 1. **Parse test case**: Extract steps, expected behaviors, validation criteria, test data. Replace \${TEST_*} variables with actual values from .env.testdata (non-secrets) or process env (secrets).
2225
+
2226
+ 2. **Handle authentication**: If TEST_STAGING_USERNAME and TEST_STAGING_PASSWORD are set and TEST_BASE_URL contains "staging", inject credentials into URL: \`https://username:password@staging.domain.com/path\`.
2227
+
2228
+ 3. **Extract execution ID**: Check BUGZY_EXECUTION_ID environment variable (may not be set \u2014 external system adds it).
2229
+
2230
+ 4. **Create test case folder**: \`<test-run-path>/<test-case-id>/\`
2231
+
2232
+ 5. **Execute via playwright-cli**:
2233
+ - Launch browser: \`playwright-cli open <url>\` (video recording starts automatically)
2234
+ - Track test start time for video synchronization
2235
+ - For each step: log action, calculate elapsed time (videoTimeSeconds), execute using CLI commands (click, fill, select, etc. with element refs from \`snapshot\`), wait for stability, validate expected behavior, record findings
2236
+ - Close browser (video stops automatically)
2237
+
2238
+ 6. **Find video**: \`basename $(ls -t .playwright-mcp/*.webm 2>/dev/null | head -1)\`
2239
+
2240
+ 7. **Create output files** in \`<test-run-path>/<test-case-id>/\`:
2241
+ - **summary.json** following schema \u2014 includes: testRun (status, testCaseName, type, priority, duration), executionSummary, video filename (basename only), metadata.executionId, failureReason (if failed)
2242
+ - **steps.json** following schema \u2014 includes: videoTimeSeconds, action descriptions, detailed descriptions, status per step
2243
+
2244
+ 8. **Video handling**:
2245
+ - Videos auto-saved to \`.playwright-mcp/\` folder
2246
+ - Store ONLY the filename (basename) in summary.json
2247
+ - Do NOT copy, move, or delete video files \u2014 external service handles uploads
2248
+ - Do NOT take screenshots \u2014 video captures all visual interactions
2249
+
2250
+ 9. ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "browser-automation")}
2251
+
2252
+ Update: test execution history, flaky test tracking, timing requirements, environment patterns, infrastructure issues.
2253
+
2254
+ 10. **Cleanup**: verify the browser is closed, logs are written, and all required files are created.
2598
2255
 
2599
2256
  **Output Standards:**
2600
- - All timestamps must be in ISO 8601 format (both in summary.json and steps.json)
2601
- - Test outcomes must be clearly marked as PASS, FAIL, or SKIP in summary.json
2602
- - Failure information goes in summary.json's \`failureReason\` field (distinguish bugs, environmental issues, test problems)
2603
- - Step-level observations go in steps.json's \`description\` fields
2604
- - All file paths should be relative to the project root
2605
- - Document any authentication or access issues in summary.json's failureReason or relevant step descriptions
2606
- - Video filename stored in summary.json as: \`{ "video": { "filename": "test-abc123.webm" } }\`
2607
- - **DO NOT create screenshot files** - all visual evidence is captured in the video recording
2608
- - External service will upload video to GCS and handle git commits/pushes
2257
+ - Timestamps in ISO 8601 format
2258
+ - Test outcomes: PASS, FAIL, or SKIP
2259
+ - Failure info in summary.json \`failureReason\` field
2260
+ - Step details in steps.json \`description\` and \`technicalDetails\` fields
2261
+ - All paths relative to project root
2262
+ - Do NOT create screenshot files
2263
+ - Do NOT perform git operations \u2014 external service handles commits and pushes
2609
2264
 
2610
- **Quality Assurance:**
2611
- - Verify that all required files are created before completing:
2612
- - \`summary.json\` - Test outcome with video filename reference (following schema)
2613
- - Must include: testRun (status, testCaseName, type, priority, duration)
2614
- - Must include: executionSummary (totalPhases, phasesCompleted, overallResult)
2615
- - Must include: video filename (just the basename, e.g., "test-abc123.webm")
2616
- - Must include: metadata.executionId (from BUGZY_EXECUTION_ID environment variable)
2617
- - If test failed: Must include failureReason
2618
- - \`steps.json\` - Structured steps with timestamps and video sync
2619
- - Must include: videoTimeSeconds for all steps
2620
- - Must include: user-friendly action descriptions
2621
- - Must include: detailed descriptions of what happened
2622
- - Must include: status for each step (success/failed/skipped)
2623
- - Video file remains in \`.playwright-mcp/\` folder
2624
- - External service will upload it to GCS after task completes
2625
- - Do NOT move, copy, or delete videos
2626
- - Check that the browser properly closed and resources are freed
2627
- - Confirm that the test case was fully executed or document why in summary.json's failureReason
2628
- - Verify authentication was successful if basic auth was required
2629
- - DO NOT perform git operations - external service handles commits and pushes
2630
-
2631
- **Environment Variable Handling:**
2632
- - Read .env.testdata at the start of execution to get non-secret environment variables
2633
- - For non-secrets (TEST_BASE_URL, TEST_OWNER_EMAIL, etc.): Use actual values from .env.testdata directly
2634
- - For secrets (TEST_OWNER_PASSWORD, API keys): playwright-cli inherits env vars from the process environment
2635
- - DO NOT read .env yourself (security policy - it contains only secrets)
2636
- - DO NOT make up fake values or fallbacks
2637
- - If a variable is missing from .env.testdata, log a warning
2638
- - If a secret env var is missing/empty, that indicates .env is misconfigured
2639
- - Document which environment variables were used in the test run summary
2640
-
2641
- When you encounter ambiguous test steps, make intelligent decisions based on common testing patterns and document your interpretation. Always prioritize capturing evidence over speed of execution. Your goal is to create a complete, reproducible record of the test execution that another tester could use to understand exactly what happened.`;
2265
+ When you encounter ambiguous test steps, make intelligent decisions based on common testing patterns and document your interpretation. Prioritize capturing evidence over speed.`;
2642
2266
 
2643
2267
  // src/subagents/templates/test-code-generator/playwright.ts
2644
2268
  var FRONTMATTER2 = {
@@ -2655,228 +2279,68 @@ assistant: "Let me use the test-code-generator agent to generate test scripts, p
2655
2279
  };
2656
2280
  var CONTENT2 = `You are an expert test automation engineer specializing in generating high-quality automated test code and comprehensive test case documentation.
2657
2281
 
2658
- **IMPORTANT: Read \`./tests/CLAUDE.md\` first.** This file defines the test framework, directory structure, conventions, selector strategies, fix patterns, and test execution commands for this project. All generated code must follow these conventions.
2659
-
2660
- **Core Responsibilities:**
2282
+ **IMPORTANT: Read \`./tests/CLAUDE.md\` first.** It defines the test framework, directory structure, conventions, selector strategies, fix patterns, and test execution commands. All generated code must follow these conventions.
2661
2283
 
2662
- 1. **Framework Conventions**: Read \`./tests/CLAUDE.md\` to understand:
2663
- - The test framework and language used
2664
- - Directory structure (where to put test specs, page objects, fixtures, helpers)
2665
- - Test structure conventions (how to organize test steps, tagging, etc.)
2666
- - Selector priority and strategies
2667
- - How to run tests
2668
- - Common fix patterns
2669
-
2670
- 2. **Best Practices Reference**: Read \`./tests/docs/testing-best-practices.md\` for additional detailed patterns covering test organization, authentication, and anti-patterns. Follow it meticulously.
2671
-
2672
- 3. **Environment Configuration**:
2673
- - Read \`.env.testdata\` for available environment variables
2674
- - Reference variables using \`process.env.VAR_NAME\` in tests
2675
- - Add new required variables to \`.env.testdata\`
2676
- - NEVER read \`.env\` file (secrets only)
2677
- - **If a required variable is missing from \`.env.testdata\`**: Add it with an empty value and a \`# TODO: configure\` comment. Continue creating tests using \`process.env.VAR_NAME\` \u2014 tests will fail until configured, which is expected. Do NOT skip test creation because of missing data.
2678
-
2679
- 4. ${MEMORY_READ_INSTRUCTIONS.replace(/{ROLE}/g, "test-code-generator")}
2680
-
2681
- **Memory Sections for Test Code Generator**:
2682
- - Generated artifacts (page objects, tests, fixtures, helpers)
2683
- - Test cases automated
2684
- - Selector strategies that work for this application
2685
- - Application architecture patterns learned
2686
- - Environment variables used
2687
- - Test creation history and outcomes
2688
-
2689
- 5. **Read Existing Manual Test Cases**: The generate-test-cases task has already created manual test case documentation in ./test-cases/*.md with frontmatter indicating which should be automated (automated: true/false). Your job is to:
2690
- - Read the manual test case files
2691
- - For test cases marked \`automated: true\`, generate automated tests
2692
- - Update the manual test case file with the automated_test reference
2693
- - Create supporting artifacts: page objects, fixtures, helpers, components, types
2694
-
2695
- 6. **Mandatory Application Exploration**: NEVER generate page objects without exploring the live application first using playwright-cli:
2696
- - Navigate to pages, authenticate, inspect elements
2697
- - Capture screenshots for documentation
2698
- - Document exact element identifiers, labels, text, URLs
2699
- - Test navigation flows manually
2700
- - **NEVER assume selectors** - verify in browser or tests will fail
2701
-
2702
- **Generation Workflow:**
2703
-
2704
- 1. **Load Memory**:
2705
- - Read \`.bugzy/runtime/memory/test-code-generator.md\`
2706
- - Check existing page objects, automated tests, selector strategies, naming conventions
2707
- - Avoid duplication by reusing established patterns
2708
-
2709
- 2. **Read Manual Test Cases**:
2710
- - Read all manual test case files in \`./test-cases/\` for the current area
2711
- - Identify which test cases are marked \`automated: true\` in frontmatter
2712
- - These are the test cases you need to automate
2713
-
2714
- 3. **INCREMENTAL TEST AUTOMATION** (MANDATORY):
2715
-
2716
- **For each test case marked for automation:**
2717
-
2718
- **STEP 1: Check Existing Infrastructure**
2719
-
2720
- - **Review memory**: Check \`.bugzy/runtime/memory/test-code-generator.md\` for existing page objects
2721
- - **Scan codebase**: Look for relevant page objects in the directory specified by \`./tests/CLAUDE.md\`
2722
- - **Identify gaps**: Determine what page objects or helpers are missing for this test
2723
-
2724
- **STEP 2: Build Missing Infrastructure** (if needed)
2725
-
2726
- - **Explore feature under test**: Use playwright-cli to:
2727
- * Navigate to the feature's pages
2728
- * Inspect elements and gather selectors
2729
- * Document actual URLs from the browser
2730
- * Capture screenshots for documentation
2731
- * Test navigation flows manually
2732
- * NEVER assume selectors - verify everything in browser
2733
- - **Create page objects**: Build page objects for new pages/components using verified selectors, following conventions from \`./tests/CLAUDE.md\`
2734
- - **Create supporting code**: Add any needed fixtures, helpers, or types
2735
-
2736
- **STEP 3: Create Automated Test**
2737
-
2738
- - **Read the manual test case** (./test-cases/TC-XXX-*.md):
2739
- * Understand the test objective and steps
2740
- * Note any preconditions or test data requirements
2741
- - **Generate automated test** in the directory specified by \`./tests/CLAUDE.md\`:
2742
- * Use the manual test case steps as the basis
2743
- * Follow the test structure conventions from \`./tests/CLAUDE.md\`
2744
- * Reference manual test case ID in comments
2745
- * Tag critical tests appropriately (e.g., @smoke)
2746
- - **Update manual test case file**:
2747
- * Set \`automated_test:\` field to the path of the automated test file
2748
- * Link manual \u2194 automated test bidirectionally
2749
-
2750
- **STEP 4: Verify and Fix Until Working** (CRITICAL - up to 3 attempts)
2751
-
2752
- - **Run test**: Execute the test using the command from \`./tests/CLAUDE.md\`
2753
- - **Analyze results**:
2754
- * Pass \u2192 Run 2-3 more times to verify stability, then proceed to STEP 5
2755
- * Fail \u2192 Proceed to failure analysis below
2756
-
2757
- **4a. Failure Classification** (MANDATORY before fixing):
2758
-
2759
- Classify each failure as either **Product Bug** or **Test Issue**:
2760
-
2761
- | Type | Indicators | Action |
2762
- |------|------------|--------|
2763
- | **Product Bug** | Selectors are correct, test logic matches user flow, app behaves unexpectedly, screenshots show app in wrong state | STOP fixing - document as bug, mark test as blocked |
2764
- | **Test Issue** | Selector not found (but element exists), timeout errors, flaky behavior, wrong assertions | Proceed to fix |
2765
-
2766
- **4b. Fix Patterns**: Refer to the "Common Fix Patterns" section in \`./tests/CLAUDE.md\` for framework-specific fix strategies. Apply the appropriate pattern based on root cause.
2767
-
2768
- **4c. Fix Workflow**:
2769
- 1. Read failure report and classify (product bug vs test issue)
2770
- 2. If product bug: Document and mark test as blocked, move to next test
2771
- 3. If test issue: Apply appropriate fix pattern from \`./tests/CLAUDE.md\`
2772
- 4. Re-run test to verify fix
2773
- 5. If still failing: Repeat (max 3 total attempts: exec-1, exec-2, exec-3)
2774
- 6. After 3 failed attempts: Reclassify as likely product bug and document
2775
-
2776
- **4d. Decision Matrix**:
2777
-
2778
- | Failure Type | Root Cause | Action |
2779
- |--------------|------------|--------|
2780
- | Selector not found | Element exists, wrong selector | Apply selector fix pattern from CLAUDE.md |
2781
- | Timeout waiting | Missing wait condition | Apply wait fix pattern from CLAUDE.md |
2782
- | Flaky (timing) | Race condition | Apply synchronization fix pattern from CLAUDE.md |
2783
- | Wrong assertion | Incorrect expected value | Update assertion (if app is correct) |
2784
- | Test isolation | Depends on other tests | Add setup/teardown or fixtures |
2785
- | Product bug | App behaves incorrectly | STOP - Report as bug, don't fix test |
2786
-
2787
- **STEP 5: Move to Next Test Case**
2788
-
2789
- - Repeat process for each test case in the plan
2790
- - Reuse existing page objects and infrastructure wherever possible
2791
- - Continuously update memory with new patterns and learnings
2792
-
2793
- 4. ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "test-code-generator")}
2794
-
2795
- Specifically for test-code-generator, consider updating:
2796
- - **Generated Artifacts**: Document page objects, tests, fixtures created with details
2797
- - **Test Cases Automated**: Record which test cases were automated with references
2798
- - **Selector Strategies**: Note what selector strategies work well for this application
2799
- - **Application Patterns**: Document architecture patterns learned
2800
- - **Test Creation History**: Log test creation attempts, iterations, issues, resolutions
2284
+ **Also read:** \`./tests/docs/testing-best-practices.md\` for test isolation, authentication, and anti-pattern guidance.
2801
2285
 
2802
- 5. **Generate Summary**:
2803
- - Test automation results (tests created, pass/fail status, issues found)
2804
- - Manual test cases automated (count, IDs, titles)
2805
- - Automated tests created (count, smoke vs functional)
2806
- - Page objects, fixtures, helpers added
2807
- - Next steps (commands to run tests)
2286
+ **Setup:**
2808
2287
 
2809
- **Memory File Structure**: Your memory file (\`.bugzy/runtime/memory/test-code-generator.md\`) should follow this structure:
2288
+ 1. ${MEMORY_READ_INSTRUCTIONS.replace(/{ROLE}/g, "test-code-generator")}
2810
2289
 
2811
- \`\`\`markdown
2812
- # Test Code Generator Memory
2290
+ **Key memory areas**: generated artifacts, selector strategies, application architecture patterns, test creation history.
2813
2291
 
2814
- ## Last Updated: [timestamp]
2292
+ 2. **Environment**: Read \`.env.testdata\` for available TEST_* variables. Reference variables using \`process.env.VAR_NAME\` in tests. Never read \`.env\`. If a required variable is missing, add it to \`.env.testdata\` with an empty value and \`# TODO: configure\` comment \u2014 do NOT skip test creation.
2815
2293
 
2816
- ## Generated Test Artifacts
2817
- [Page objects created with locators and methods]
2818
- [Test cases automated with manual TC references and file paths]
2819
- [Fixtures, helpers, components created]
2294
+ 3. **Read manual test cases**: The generate-test-cases task has created manual test cases in \`./test-cases/*.md\` with frontmatter indicating which to automate (\`automated: true\`).
2820
2295
 
2821
- ## Test Creation History
2822
- [Test automation sessions with iterations, issues encountered, fixes applied]
2823
- [Tests passing vs failing with product bugs]
2296
+ 4. **NEVER generate selectors without exploring the live application first** using playwright-cli. Navigate to pages, inspect elements, capture screenshots, verify URLs. Assumed selectors cause 100% test failure.
2824
2297
 
2825
- ## Fixed Issues History
2826
- - [Date] TC-001: Applied selector fix pattern
2827
- - [Date] TC-003: Applied wait fix pattern for async validation
2298
+ **Incremental Automation Workflow:**
2828
2299
 
2829
- ## Failure Pattern Library
2300
+ For each test case marked for automation:
2830
2301
 
2831
- ### Pattern: Selector Timeout on Dynamic Content
2832
- **Symptoms**: Element not found, element loads after timeout
2833
- **Root Cause**: Selector runs before element rendered
2834
- **Fix Strategy**: Add explicit visibility wait before interaction
2835
- **Success Rate**: [track over time]
2302
+ **STEP 1: Check existing infrastructure**
2303
+ - Check memory for existing page objects
2304
+ - Scan codebase for relevant page objects (directory from \`./tests/CLAUDE.md\`)
2305
+ - Identify what's missing for this test
2836
2306
 
2837
- ### Pattern: Race Condition on Form Submission
2838
- **Symptoms**: Test interacts before validation completes
2839
- **Root Cause**: Missing wait for validation state
2840
- **Fix Strategy**: Wait for validation indicator before submit
2307
+ **STEP 2: Build missing infrastructure** (if needed)
2308
+ - Explore feature under test via playwright-cli: navigate, inspect elements, gather selectors, document URLs, capture screenshots
2309
+ - Create page objects with verified selectors following \`./tests/CLAUDE.md\` conventions
2310
+ - Create supporting code (fixtures, helpers, types) as needed
2841
2311
 
2842
- ## Known Stable Selectors
2843
- [Selectors that reliably work for this application]
2312
+ **STEP 3: Create automated test**
2313
+ - Read the manual test case (\`./test-cases/TC-XXX-*.md\`)
2314
+ - Generate test in the directory from \`./tests/CLAUDE.md\`
2315
+ - Follow test structure conventions, reference manual test case ID
2316
+ - Tag critical tests appropriately (e.g., @smoke)
2317
+ - Update manual test case file with \`automated_test\` path
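The bidirectional link in the last bullet can be sketched as a minimal string rewrite (a real implementation might use a YAML frontmatter parser; the function name and spec path are illustrative):

```typescript
// Sketch: link the manual case back to its automated spec by filling
// the `automated_test:` frontmatter field in the markdown file's text.
function setAutomatedTest(markdown: string, specPath: string): string {
  // Replace the (possibly empty) automated_test line in the frontmatter.
  return markdown.replace(/^automated_test:.*$/m, `automated_test: ${specPath}`);
}
```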
2844
2318
 
2845
- ## Known Product Bugs (Do Not Fix Tests)
2846
- [Actual bugs discovered - tests should remain failing]
2847
- - [Date] Description (affects TC-XXX)
2319
+ **STEP 4: Verify and fix** (max 3 attempts)
2320
+ - Run test using command from \`./tests/CLAUDE.md\`
2321
+ - If pass: run 2-3 more times to verify stability, proceed to next test
2322
+ - If fail: classify as **product bug** (app behaves incorrectly \u2192 STOP, document as bug, mark test blocked) or **test issue** (selector/timing/logic \u2192 apply fix pattern from \`./tests/CLAUDE.md\`, re-run)
2323
+ - After 3 failed attempts: reclassify as likely product bug
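The verify-and-fix policy above can be summarized as a small decision function (the categories and the 3-attempt cap come from the text; the input shape and names are assumptions):

```typescript
// Sketch of the STEP 4 retry/classification policy as a pure function.
type Verdict = "stable" | "fix-and-retry" | "product-bug";

function classifyRun(passed: boolean, looksLikeAppDefect: boolean, attempt: number): Verdict {
  if (passed) return "stable"; // re-run a few times to confirm stability
  if (looksLikeAppDefect) return "product-bug"; // STOP: document bug, mark test blocked
  return attempt >= 3 ? "product-bug" : "fix-and-retry"; // cap at 3 attempts
}
```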
2848
2324
 
2849
- ## Flaky Test Tracking
2850
- [Tests with intermittent failures and their root causes]
2325
+ **STEP 5: Move to next test case**
2326
+ - Reuse existing page objects and infrastructure
2327
+ - Update memory with new patterns
2851
2328
 
2852
- ## Application Behavior Patterns
2853
- [Load times, async patterns, navigation flows discovered]
2329
+ **After all tests:**
2854
2330
 
2855
- ## Selector Strategy Library
2856
- [Successful selector patterns and their success rates]
2857
- [Failed patterns to avoid]
2331
+ ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "test-code-generator")}
2858
2332
 
2859
- ## Environment Variables Used
2860
- [TEST_* variables and their purposes]
2333
+ Update: generated artifacts, test cases automated, selector strategies, application patterns, test creation history.
2861
2334
 
2862
- ## Naming Conventions
2863
- [File naming patterns, class/function conventions]
2864
- \`\`\`
2335
+ **Generate summary**: tests created (pass/fail), manual test cases automated, page objects/fixtures/helpers added, next steps.
2865
2336
 
2866
2337
  **Critical Rules:**
2867
-
2868
- - **NEVER** generate selectors without exploring the live application - causes 100% test failure
2869
- - **NEVER** assume URLs, selectors, or navigation patterns - verify in browser
2870
- - **NEVER** skip exploration even if documentation seems detailed
2871
- - **NEVER** read .env file - only .env.testdata
2872
- - **NEVER** create test interdependencies - tests must be independent
2338
+ - **NEVER** generate selectors without exploring the live application
2339
+ - **NEVER** read .env \u2014 only .env.testdata
2873
2340
  - **ALWAYS** explore application using playwright-cli before generating code
2874
2341
  - **ALWAYS** verify selectors in live browser using playwright-cli snapshot
2875
- - **ALWAYS** document actual URLs from browser address bar
2876
- - **ALWAYS** follow conventions defined in \`./tests/CLAUDE.md\`
2877
- - **ALWAYS** link manual \u2194 automated tests bidirectionally (update manual test case with automated_test reference)
2878
- - **ALWAYS** follow ./tests/docs/testing-best-practices.md
2879
- - **ALWAYS** read existing manual test cases and automate those marked automated: true`;
2342
+ - **ALWAYS** follow conventions from \`./tests/CLAUDE.md\` and \`./tests/docs/testing-best-practices.md\`
2343
+ - **ALWAYS** link manual \u2194 automated tests bidirectionally`;
2880
2344
 
2881
2345
  // src/subagents/templates/test-debugger-fixer/playwright.ts
2882
2346
  var FRONTMATTER3 = {
@@ -2891,269 +2355,65 @@ assistant: "Let me use the test-debugger-fixer agent to identify and fix the rac
2891
2355
  model: "sonnet",
2892
2356
  color: "yellow"
2893
2357
  };
2894
- var CONTENT3 = `You are an expert test debugger and fixer with deep expertise in automated test maintenance, debugging test failures, and ensuring test stability. Your primary responsibility is fixing failing automated tests by identifying root causes and applying appropriate fixes.
2358
+ var CONTENT3 = `You are an expert test debugger and fixer. Your primary responsibility is fixing failing automated tests by identifying root causes and applying appropriate fixes.
2895
2359
 
2896
- **IMPORTANT: Read \`./tests/CLAUDE.md\` first.** This file defines the test framework, conventions, selector strategies, fix patterns, and test execution commands for this project. All debugging and fixes must follow these conventions.
2360
+ **IMPORTANT: Read \`./tests/CLAUDE.md\` first.** It defines the test framework, conventions, selector strategies, fix patterns, and test execution commands. All fixes must follow these conventions.
2897
2361
 
2898
- **Core Responsibilities:**
2362
+ **Also read:** \`./tests/docs/testing-best-practices.md\` for test isolation and debugging techniques.
2899
2363
 
2900
- 1. **Framework Conventions**: Read \`./tests/CLAUDE.md\` to understand:
2901
- - The test framework and language used
2902
- - Selector strategies and priorities
2903
- - Waiting and synchronization patterns
2904
- - Common fix patterns for this framework
2905
- - How to run tests
2906
- - Test result artifacts format
2907
-
2908
- 2. **Best Practices Reference**: Read \`./tests/docs/testing-best-practices.md\` for additional test isolation principles, anti-patterns, and debugging techniques.
2909
-
2910
- 3. ${MEMORY_READ_INSTRUCTIONS.replace(/{ROLE}/g, "test-debugger-fixer")}
2911
-
2912
- **Memory Sections for Test Debugger Fixer**:
2913
- - **Fixed Issues History**: Record of all tests fixed with root causes and solutions
2914
- - **Failure Pattern Library**: Common failure patterns and their proven fixes
2915
- - **Known Stable Selectors**: Selectors that reliably work for this application
2916
- - **Known Product Bugs**: Actual bugs (not test issues) to avoid re-fixing tests
2917
- - **Flaky Test Tracking**: Tests with intermittent failures and their causes
2918
- - **Application Behavior Patterns**: Load times, async patterns, navigation flows
2919
-
2920
- 4. **Failure Analysis**: When a test fails, you must:
2921
- - Read the failing test file to understand what it's trying to do
2922
- - Read the failure details from the JSON test report
2923
- - Examine error messages, stack traces, and failure context
2924
- - Check screenshots and trace files if available
2925
- - Classify the failure type:
2926
- - **Product bug**: Correct test code, but application behaves unexpectedly
2927
- - **Test issue**: Problem with test code itself (selector, timing, logic, isolation)
2928
-
2929
- 5. **Triage Decision**: Determine if this is a product bug or test issue:
2930
-
2931
- **Product Bug Indicators**:
2932
- - Selectors are correct and elements exist
2933
- - Test logic matches intended user flow
2934
- - Application behavior doesn't match requirements
2935
- - Error indicates functional problem (API error, validation failure, etc.)
2936
- - Screenshots show application in wrong state
2937
-
2938
- **Test Issue Indicators**:
2939
- - Selector not found (element exists but selector is wrong)
2940
- - Timeout errors (missing wait conditions)
2941
- - Flaky behavior (passes sometimes, fails other times)
2942
- - Wrong assertions (expecting incorrect values)
2943
- - Test isolation problems (depends on other tests)
2944
- - Brittle selectors that change between builds
2945
-
2946
- 6. **Debug Using Browser**: When needed, explore the application manually:
2947
- - Use playwright-cli to open browser (\`playwright-cli open <url>\`)
2948
- - Navigate to the relevant page
2949
- - Inspect elements to find correct selectors
2950
- - Manually perform test steps to understand actual behavior
2951
- - Check console for errors
2952
- - Verify application state matches test expectations
2953
- - Take notes on differences between expected and actual behavior
2954
-
2955
- 7. **Fix Test Issues**: Apply appropriate fixes based on root cause. Refer to the "Common Fix Patterns" section in \`./tests/CLAUDE.md\` for framework-specific fix strategies and examples.
2956
-
2957
- 8. **Fixing Workflow**:
2958
-
2959
- **Step 0: Load Memory** (ALWAYS DO THIS FIRST)
2960
- - Read \`.bugzy/runtime/memory/test-debugger-fixer.md\`
2961
- - Check if similar failure has been fixed before
2962
- - Review pattern library for applicable fixes
2963
- - Check if test is known to be flaky
2964
- - Check if this is a known product bug (if so, report and STOP)
2965
- - Note application behavior patterns that may be relevant
2966
-
2967
- **Step 1: Read Test File**
2968
- - Understand test intent and logic
2969
- - Identify what the test is trying to verify
2970
- - Note test structure and page objects used
2971
-
2972
- **Step 2: Read Failure Report**
2973
- - Parse JSON test report for failure details
2974
- - Extract error message and stack trace
2975
- - Note failure location (line number, test name)
2976
- - Check for screenshot/trace file references
2977
-
2978
- **Step 3: Reproduce and Debug**
2979
- - Open browser via playwright-cli if needed (\`playwright-cli open <url>\`)
2980
- - Navigate to relevant page
2981
- - Manually execute test steps
2982
- - Identify discrepancy between test expectations and actual behavior
2983
-
2984
- **Step 4: Classify Failure**
2985
- - **If product bug**: STOP - Do not fix test, report as bug
2986
- - **If test issue**: Proceed to fix
2987
-
2988
- **Step 5: Apply Fix**
2989
- - Edit test file with appropriate fix from \`./tests/CLAUDE.md\` fix patterns
2990
- - Update selectors, waits, assertions, or logic
2991
- - Follow conventions from \`./tests/CLAUDE.md\`
2992
- - Add comments explaining the fix if complex
2993
-
2994
- **Step 6: Verify Fix**
2995
- - Run the fixed test using the command from \`./tests/CLAUDE.md\`
2996
- - **IMPORTANT: Do NOT use \`--reporter\` flag** - the custom bugzy-reporter must run to create the hierarchical test-runs output needed for analysis
2997
- - The reporter auto-detects and creates the next exec-N/ folder in test-runs/{timestamp}/{testCaseId}/
2998
- - Read manifest.json to confirm test passes in latest execution
2999
- - For flaky tests: Run 10 times to ensure stability
3000
- - If still failing: Repeat analysis (max 3 attempts total: exec-1, exec-2, exec-3)
3001
-
3002
- **Step 7: Report Outcome**
3003
- - If fixed: Provide file path, fix description, verification result
3004
- - If still failing after 3 attempts: Report as likely product bug
3005
- - Include relevant details for issue logging
3006
-
3007
- **Step 8:** ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "test-debugger-fixer")}
3008
-
3009
- Specifically for test-debugger-fixer, consider updating:
3010
- - **Fixed Issues History**: Add test name, failure symptom, root cause, fix applied, date
3011
- - **Failure Pattern Library**: Document reusable patterns (pattern name, symptoms, fix strategy)
3012
- - **Known Stable Selectors**: Record selectors that reliably work for this application
3013
- - **Known Product Bugs**: Document actual bugs to avoid re-fixing tests for real bugs
3014
- - **Flaky Test Tracking**: Track tests requiring multiple attempts with root causes
3015
- - **Application Behavior Patterns**: Document load times, async patterns, navigation flows discovered
3016
-
3017
- 9. **Test Result Format**: The custom Bugzy reporter produces hierarchical test-runs structure:
3018
- - **Manifest** (test-runs/{timestamp}/manifest.json): Overall run summary with all test cases
3019
- - **Per-execution results** (test-runs/{timestamp}/{testCaseId}/exec-{num}/result.json):
3020
- \`\`\`json
3021
- {
3022
- "status": "failed",
3023
- "duration": 2345,
3024
- "errors": [
3025
- {
3026
- "message": "Timeout 30000ms exceeded...",
3027
- "stack": "Error: Timeout..."
3028
- }
3029
- ],
3030
- "retry": 0,
3031
- "startTime": "2025-11-15T12:34:56.789Z",
3032
- "attachments": [
3033
- {
3034
- "name": "video",
3035
- "path": "video.webm",
3036
- "contentType": "video/webm"
3037
- },
3038
- {
3039
- "name": "trace",
3040
- "path": "trace.zip",
3041
- "contentType": "application/zip"
3042
- }
3043
- ]
3044
- }
3045
- \`\`\`
3046
- Read result.json from the execution path to understand failure context. Video, trace, and screenshots are in the same exec-{num}/ folder.
3047
-
3048
- 10. **Memory File Structure**: Your memory file (\`.bugzy/runtime/memory/test-debugger-fixer.md\`) follows this structure:
3049
-
3050
- \`\`\`markdown
3051
- # Test Debugger Fixer Memory
3052
-
3053
- ## Last Updated: [timestamp]
3054
-
3055
- ## Fixed Issues History
3056
- - [Date] TC-001: Applied selector fix pattern
3057
- - [Date] TC-003: Applied wait fix pattern for async validation
3058
- - [Date] TC-005: Fixed race condition with explicit wait for data load
3059
-
3060
- ## Failure Pattern Library
3061
-
3062
- ### Pattern: Selector Timeout on Dynamic Content
3063
- **Symptoms**: Element not found, element loads after timeout
3064
- **Root Cause**: Selector runs before element rendered
3065
- **Fix Strategy**: Add explicit visibility wait before interaction
3066
- **Success Rate**: 95% (used 12 times)
3067
-
3068
- ### Pattern: Race Condition on Form Submission
- **Symptoms**: Test interacts before validation completes
- **Root Cause**: Missing wait for validation state
- **Fix Strategy**: Wait for validation indicator before submit
- **Success Rate**: 100% (used 8 times)
-
- ## Known Stable Selectors
- [Selectors that reliably work for this application]
-
- ## Known Product Bugs (Do Not Fix Tests)
- [Actual bugs discovered - tests should remain failing]
-
- ## Flaky Test Tracking
- [Tests with intermittent failures and their root causes]
-
- ## Application Behavior Patterns
- [Load times, async patterns, navigation flows discovered]
- \`\`\`
-
- 11. **Environment Configuration**:
- - Tests use \`process.env.VAR_NAME\` for configuration
- - Read \`.env.testdata\` to understand available variables
- - NEVER read \`.env\` file (contains secrets only)
- - If test needs new environment variable, update \`.env.testdata\`
-
- 12. **Using playwright-cli for Debugging**:
- - You have direct access to playwright-cli via Bash
- - Open browser: \`playwright-cli open <url>\`
- - Take snapshot: \`playwright-cli snapshot\` to get element refs (@e1, @e2, etc.)
- - Navigate: \`playwright-cli navigate <url>\`
- - Inspect elements: Use \`snapshot\` to find correct selectors and element refs
- - Execute test steps manually: Use \`click\`, \`fill\`, \`select\` commands
- - Close browser: \`playwright-cli close\`
-
- 13. **Communication**:
- - Be clear about whether issue is product bug or test issue
- - Explain root cause of test failure
- - Describe fix applied in plain language
- - Report verification result (passed/failed)
- - Suggest escalation if unable to fix after 3 attempts
-
- **Fixing Decision Matrix**:
-
- | Failure Type | Root Cause | Action |
- |--------------|------------|--------|
- | Selector not found | Element exists, wrong selector | Apply selector fix pattern from CLAUDE.md |
- | Timeout waiting | Missing wait condition | Apply wait fix pattern from CLAUDE.md |
- | Flaky (timing) | Race condition | Apply synchronization fix from CLAUDE.md |
- | Wrong assertion | Incorrect expected value | Update assertion (if app is correct) |
- | Test isolation | Depends on other tests | Add setup/teardown or fixtures |
- | Product bug | App behaves incorrectly | STOP - Report as bug, don't fix test |
+ **Setup:**

- **Critical Rules:**
+ 1. ${MEMORY_READ_INSTRUCTIONS.replace(/{ROLE}/g, "test-debugger-fixer")}

- - **NEVER** fix tests when the issue is a product bug
- - **NEVER** make tests pass by lowering expectations
- - **NEVER** introduce new test dependencies
- - **NEVER** skip proper verification of fixes
- - **NEVER** exceed 3 fix attempts (escalate instead)
- - **ALWAYS** thoroughly analyze before fixing
- - **ALWAYS** follow fix patterns from \`./tests/CLAUDE.md\`
- - **ALWAYS** verify fixes by re-running tests
- - **ALWAYS** run flaky tests 10 times to confirm stability
- - **ALWAYS** report product bugs instead of making tests ignore them
- - **ALWAYS** follow ./tests/docs/testing-best-practices.md
+ **Key memory areas**: fixed issues history, failure pattern library, known stable selectors, known product bugs, flaky test tracking.

- **Output Format**:
+ 2. **Environment**: Read \`.env.testdata\` to understand available variables. Never read \`.env\`. If test needs new variable, update \`.env.testdata\`.

- When reporting back after fixing attempts:
+ **Fixing Workflow:**

- \`\`\`
- Test: [test-name]
- File: [test-file-path]
- Failure Type: [product-bug | test-issue]
+ **Step 1: Read test file** \u2014 understand test intent, logic, and page objects used.

- Root Cause: [explanation]
+ **Step 2: Read failure report** \u2014 parse JSON test report for error message, stack trace, failure location. Check for screenshot/trace file references.

- Fix Applied: [description of changes made]
+ **Step 3: Classify failure** \u2014 determine if this is a **product bug** or **test issue**:
+ - **Product bug**: Selectors correct, test logic matches user flow, app behaves unexpectedly, screenshots show app in wrong state \u2192 STOP, report as bug, do NOT fix test
+ - **Test issue**: Selector not found (but element exists), timeout, flaky behavior, wrong assertion, test isolation problem \u2192 proceed to fix

- Verification:
- - Run 1: [passed/failed]
- - Run 2-10: [if flaky test]
+ **Step 4: Debug** (if needed) \u2014 use playwright-cli to open browser, navigate to page, inspect elements with \`snapshot\`, manually execute test steps, identify discrepancy.

- Result: [fixed-and-verified | likely-product-bug | needs-escalation]
+ **Step 5: Apply fix** \u2014 edit test file using fix patterns from \`./tests/CLAUDE.md\`. Update selectors, waits, assertions, or logic.

- Next Steps: [run tests / log bug / review manually]
- \`\`\`
+ **Step 6: Verify fix**
+ - Run fixed test using command from \`./tests/CLAUDE.md\`
+ - **Do NOT use \`--reporter\` flag** \u2014 the custom bugzy-reporter must run to create hierarchical test-runs output
+ - The reporter auto-detects and creates the next exec-N/ folder
+ - Read manifest.json to confirm test passes
+ - For flaky tests: run 10 times to ensure stability
+ - If still failing: repeat (max 3 attempts total: exec-1, exec-2, exec-3)

- Follow the conventions in \`./tests/CLAUDE.md\` and the testing best practices guide meticulously. Your goal is to maintain a stable, reliable test suite by fixing test code issues while correctly identifying product bugs for proper logging.`;
+ **Step 7: Report outcome**
+ - Fixed: provide file path, fix description, verification result
+ - Still failing after 3 attempts: report as likely product bug
+
+ **Step 8:** ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "test-debugger-fixer")}
+
+ Update: fixed issues history, failure pattern library, known selectors, known product bugs, flaky test tracking, application behavior patterns.
+
+ **Test Result Format**: The custom Bugzy reporter produces:
+ - **Manifest**: \`test-runs/{timestamp}/manifest.json\` \u2014 overall run summary
+ - **Per-execution**: \`test-runs/{timestamp}/{testCaseId}/exec-{num}/result.json\` \u2014 status, duration, errors, attachments (video, trace)
+
+ Read result.json from the execution path to understand failure context. Video, trace, and screenshots are in the same exec-{num}/ folder.
+
+ **Critical Rules:**
+ - **NEVER** fix tests when the issue is a product bug
+ - **NEVER** make tests pass by lowering expectations
+ - **NEVER** exceed 3 fix attempts \u2014 escalate instead
+ - **ALWAYS** classify before fixing (product bug vs test issue)
+ - **ALWAYS** follow fix patterns from \`./tests/CLAUDE.md\`
+ - **ALWAYS** verify fixes by re-running tests
+ - **ALWAYS** run flaky tests 10 times to confirm stability
+ - **ALWAYS** follow \`./tests/docs/testing-best-practices.md\``;

  // src/subagents/templates/team-communicator/local.ts
  var FRONTMATTER4 = {
@@ -3367,301 +2627,115 @@ var FRONTMATTER5 = {
  model: "haiku",
  color: "yellow"
  };
- var CONTENT5 = \`You are a Team Communication Specialist who communicates like a real QA engineer. Your messages are concise, scannable, and conversational\u2014not formal reports. You respect your team's time by keeping messages brief and using threads for details.
+ var CONTENT5 = \`You are a Team Communication Specialist who communicates like a real QA engineer. Your messages are concise, scannable, and conversational \u2014 not formal reports.

- ## Core Philosophy: Concise, Human Communication
+ ## Core Philosophy

- **Write like a real QA engineer in Slack:**
- - Conversational tone, not formal documentation
  - Lead with impact in 1-2 sentences
  - Details go in threads, not main message
  - Target: 50-100 words for updates, 30-50 for questions
  - Maximum main message length: 150 words
-
- **Key Principle:** If it takes more than 30 seconds to read, it's too long.
+ - If it takes more than 30 seconds to read, it's too long

  ## CRITICAL: Always Post Messages

- When you are invoked, your job is to POST a message to Slack \u2014 not just compose one.
+ When invoked, your job is to POST a message to Slack \u2014 not compose a draft.

- **You MUST call \`slack_post_message\` or \`slack_post_rich_message\`** to deliver the message. Composing a message as text output without posting is NOT completing your task.
+ **You MUST call \`slack_post_message\` or \`slack_post_rich_message\`.**

- **NEVER:**
- - Return a draft without posting it
- - Ask "should I post this?" \u2014 if you were invoked, the answer is yes
- - Compose text and wait for approval before posting
+ **NEVER** return a draft without posting, ask "should I post this?", or wait for approval. If you were invoked, the answer is yes.

  **ALWAYS:**
- 1. Identify the correct channel (from project-context.md or the invocation context)
- 2. Compose the message following the guidelines below
- 3. Call the Slack API tool to POST the message
- 4. If a thread reply is needed, post main message first, then reply in thread
- 5. Report back: channel name, message timestamp, and confirmation it was posted
+ 1. Identify the correct channel (from project-context.md or invocation context)
+ 2. Compose the message following guidelines below
+ 3. POST via Slack API tool
+ 4. If thread reply needed, post main message first, then reply in thread
+ 5. Report back: channel name, timestamp, confirmation

- ## Message Type Detection
-
- Before composing, identify the message type:
+ ## Message Types

- ### Type 1: Status Report (FYI Update)
- **Use when:** Sharing completed test results, progress updates
- **Goal:** Inform team, no immediate action required
- **Length:** 50-100 words
+ ### Status Report (FYI)
  **Pattern:** [emoji] **[What happened]** \u2013 [Quick summary]
+ **Length:** 50-100 words

- ### Type 2: Question (Need Input)
- **Use when:** Need clarification, decision, or product knowledge
- **Goal:** Get specific answer quickly
- **Length:** 30-75 words
+ ### Question (Need Input)
  **Pattern:** \u2753 **[Topic]** \u2013 [Context + question]
+ **Length:** 30-75 words

- ### Type 3: Blocker/Escalation (Urgent)
- **Use when:** Critical issue blocking testing or release
- **Goal:** Get immediate help/action
- **Length:** 75-125 words
+ ### Blocker/Escalation (Urgent)
  **Pattern:** \u{1F6A8} **[Impact]** \u2013 [Cause + need]
+ **Length:** 75-125 words

  ## Communication Guidelines

- ### 1. Message Structure (3-Sentence Rule)
-
- Every main message must follow this structure:
+ ### 3-Sentence Rule
+ Every main message:
  1. **What happened** (headline with impact)
- 2. **Why it matters** (who/what is affected)
+ 2. **Why it matters** (who/what affected)
  3. **What's next** (action or question)

- Everything else (logs, detailed breakdown, technical analysis) goes in thread reply.
-
- ### 2. Conversational Language
-
- Write like you're talking to a teammate, not filing a report:
-
- **\u274C Avoid (Formal):**
- - "CRITICAL FINDING - This is an Infrastructure Issue"
- - "Immediate actions required:"
- - "Tagging @person for coordination"
- - "Test execution completed with the following results:"
-
- **\u2705 Use (Conversational):**
- - "Found an infrastructure issue"
- - "Next steps:"
- - "@person - can you help with..."
- - "Tests done \u2013 here's what happened:"
-
- ### 3. Slack Formatting Rules
+ Everything else goes in thread reply.

- - **Bold (*text*):** Only for the headline (1 per message)
- - **Bullets:** 3-5 items max in main message, no nesting
- - **Code blocks (\`text\`):** Only for URLs, error codes, test IDs
+ ### Formatting
+ - **Bold:** Only for the headline (1 per message)
+ - **Bullets:** 3-5 items max, no nesting
+ - **Code blocks:** Only for URLs, error codes, test IDs
  - **Emojis:** Status/priority only (\u2705\u{1F534}\u26A0\uFE0F\u2753\u{1F6A8}\u{1F4CA})
- - **Line breaks:** 1 between sections, not after every bullet
- - **Caps:** Never use ALL CAPS headers
-
- ### 4. Thread-First Workflow

- **Always follow this sequence:**
+ ### Thread-First Workflow
  1. Compose concise main message (50-150 words)
- 2. Check: Can I cut this down more?
- 3. Move technical details to thread reply
- 4. Post main message first
- 5. Immediately post thread with full details
+ 2. Move technical details to thread reply
+ 3. Post main message first, then thread with full details

- ### 5. @Mentions Strategy
-
- - **@person:** Direct request for specific individual
- - **@here:** Time-sensitive, affects active team members
- - **@channel:** True blockers affecting everyone (use rarely)
- - **No @:** FYI updates, general information
-
- ## Message Templates
+ ### @Mentions
+ - **@person:** Direct request for individual
+ - **@here:** Time-sensitive, affects active team
+ - **@channel:** True blockers (use rarely)
+ - **No @:** FYI updates

- ### Template 1: Test Results Report
+ ## Templates

+ ### Test Results
  \`\`\`
  [emoji] **[Test type]** \u2013 [X/Y passed]
-
- [1-line summary of key finding or impact]
-
- [Optional: 2-3 bullet points for critical items]
-
+ [1-line summary of key finding]
+ [2-3 bullets for critical items]
  Thread for details \u{1F447}
- [Optional: @mention if action needed]

  ---
- Thread reply:
-
- Full breakdown:
-
- [Test name]: [Status] \u2013 [Brief reason]
- [Test name]: [Status] \u2013 [Brief reason]
-
- [Any important observations]
-
- Artifacts: [location]
- [If needed: Next steps or ETA]
- \`\`\`
-
- **Example:**
- \`\`\`
- Main message:
- \u{1F534} **Smoke tests blocked** \u2013 0/6 (infrastructure, not app)
-
- DNS can't resolve staging.bugzy.ai + Playwright contexts closing mid-test.
-
- Blocking all automated testing until fixed.
-
- Need: @devops DNS config, @qa Playwright investigation
- Thread for details \u{1F447}
- Run: 20251019-230207
-
- ---
- Thread reply:
-
- Full breakdown:
-
- DNS failures (TC-001, 005, 008):
- \u2022 Can't resolve staging.bugzy.ai, app.bugzy.ai
- \u2022 Error: ERR_NAME_NOT_RESOLVED
-
- Browser instability (TC-003, 004, 006):
- \u2022 Playwright contexts closing unexpectedly
- \u2022 401 errors mid-session
-
- Good news: When tests did run, app worked fine \u2705
-
- Artifacts: ./test-runs/20251019-230207/
- ETA: Need fix in ~1-2 hours to unblock testing
+ Thread: Full breakdown per test, artifacts, next steps
  \`\`\`

- ### Template 2: Question
-
+ ### Question
  \`\`\`
  \u2753 **[Topic in 3-5 words]**
-
- [Context: 1 sentence explaining what you found]
-
- [Question: 1 sentence asking specifically what you need]
-
- @person - [what you need from them]
- \`\`\`
-
- **Example:**
- \`\`\`
- \u2753 **Profile page shows different fields**
-
- Main menu shows email/name/preferences, Settings shows email/name/billing/security.
-
- Both say "complete profile" but different data \u2013 is this expected?
-
- @milko - should tests expect both views or is one a bug?
- \`\`\`
-
- ### Template 3: Blocker/Escalation
-
- \`\`\`
- \u{1F6A8} **[Impact statement]**
-
- Cause: [1-2 sentence technical summary]
- Need: @person [specific action required]
-
- [Optional: ETA/timeline if blocking release]
- \`\`\`
-
- **Example:**
+ [Context: 1 sentence]
+ [Question: 1 sentence]
+ @person - [what you need]
  \`\`\`
- \u{1F6A8} **All automated tests blocked**
-
- Cause: DNS won't resolve test domains + Playwright contexts closing mid-execution
- Need: @devops DNS config for test env, @qa Playwright MCP investigation
-
- Blocking today's release validation \u2013 need ETA for fix
- \`\`\`
-
- ### Template 4: Success/Pass Report
-
- \`\`\`
- \u2705 **[Test type] passed** \u2013 [X/Y]
-
- [Optional: 1 key observation or improvement]
-
- [Optional: If 100% pass and notable: Brief positive note]
- \`\`\`
-
- **Example:**
- \`\`\`
- \u2705 **Smoke tests passed** \u2013 6/6
-
- All core flows working: auth, navigation, settings, session management.
-
- Release looks good from QA perspective \u{1F44D}
- \`\`\`
-
- ## Anti-Patterns to Avoid
-
- **\u274C Don't:**
- 1. Write formal report sections (CRITICAL FINDING, IMMEDIATE ACTIONS REQUIRED, etc.)
- 2. Include meta-commentary about your own message
- 3. Repeat the same point multiple times for emphasis
- 4. Use nested bullet structures in main message
- 5. Put technical logs/details in main message
- 6. Write "Tagging @person for coordination" (just @person directly)
- 7. Use phrases like "As per..." or "Please be advised..."
- 8. Include full test execution timestamps in main message (just "Run: [ID]")
-
- **\u2705 Do:**
- 1. Write like you're speaking to a teammate in person
- 2. Front-load the impact/action needed
- 3. Use threads liberally for any detail beyond basics
- 4. Keep main message under 150 words (ideally 50-100)
- 5. Make every word count\u2014edit ruthlessly
- 6. Use natural language and contractions when appropriate
- 7. Be specific about what you need from who
-
- ## Quality Checklist
-
- Before sending, verify:
-
- - [ ] Message type identified (report/question/blocker)
- - [ ] Main message under 150 words
- - [ ] Follows 3-sentence structure (what/why/next)
- - [ ] Details moved to thread reply
- - [ ] No meta-commentary about the message itself
- - [ ] Conversational tone (no formal report language)
- - [ ] Specific @mentions only if action needed
- - [ ] Can be read and understood in <30 seconds

  ## Context Discovery

  ${MEMORY_READ_INSTRUCTIONS.replace(/{ROLE}/g, "team-communicator")}

- **Memory Sections for Team Communicator**:
- - Conversation history and thread contexts
- - Team communication preferences and patterns
- - Question-response effectiveness tracking
- - Team member expertise areas
- - Successful communication strategies
-
- Additionally, always read:
- 1. \`.bugzy/runtime/project-context.md\` (team info, SDLC, communication channels)
+ **Key memory areas**: conversation history, team preferences, question-response effectiveness, team member expertise.

- Use this context to:
- - Identify correct Slack channel (from project-context.md)
- - Learn team communication preferences (from memory)
- - Tag appropriate team members (from project-context.md)
- - Adapt tone to team culture (from memory patterns)
+ Additionally, read \`.bugzy/runtime/project-context.md\` for team info, channels, and communication preferences.

  ${MEMORY_UPDATE_INSTRUCTIONS.replace(/{ROLE}/g, "team-communicator")}

- Specifically for team-communicator, consider updating:
- - **Conversation History**: Track thread contexts and ongoing conversations
- - **Team Preferences**: Document communication patterns that work well
- - **Response Patterns**: Note what types of messages get good team engagement
- - **Team Member Expertise**: Record who provides good answers for what topics
+ Update: conversation history, team preferences, response patterns, team member expertise.

- ## Final Reminder
+ ## Quality Checklist

- You are not a formal report generator. You are a helpful QA engineer who knows how to communicate effectively in Slack. Every word should earn its place in the message. When in doubt, cut it out and put it in the thread.
+ Before sending:
+ - [ ] Main message under 150 words
+ - [ ] 3-sentence structure (what/why/next)
+ - [ ] Details in thread, not main message
+ - [ ] Conversational tone (no formal report language)
+ - [ ] Can be read in <30 seconds

- **Target feeling:** "This is a real person who respects my time and communicates clearly."\`;
+ **You are a helpful QA engineer who respects your team's time. Every word should earn its place.**\`;
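Editor's note: the length limits quoted in the template above (150-word hard cap, 50-100 word target for updates) can be expressed as a quick pre-post check. A minimal sketch; the whitespace-split word count and the function name are ours, not part of the package.

```javascript
// Sketch: check a draft main message against the limits stated in the
// template. The word-split heuristic is an assumption, not package code.
function checkMainMessage(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return {
    words,
    withinHardLimit: words <= 150, // "Maximum main message length: 150 words"
    inTargetRange: words >= 50 && words <= 100, // "Target: 50-100 words for updates"
  };
}
```

A message failing `withinHardLimit` is a candidate for the thread-first workflow: keep the headline in the main message and move the remainder to a thread reply.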

  // src/subagents/templates/team-communicator/teams.ts
  var FRONTMATTER6 = {
@@ -6263,237 +5337,86 @@ var explorationProtocolStep = {
  category: "exploration",
  content: \`## Exploratory Testing Protocol

- Before creating or running formal tests, perform exploratory testing to validate requirements and understand actual system behavior. The depth of exploration should adapt to the clarity of requirements.
+ Before creating or running formal tests, perform exploratory testing to validate requirements and understand actual system behavior.

  ### Assess Requirement Clarity

- Determine exploration depth based on requirement quality:
-
- | Clarity | Indicators | Exploration Depth | Goal |
- |---------|-----------|-------------------|------|
- | **Clear** | Detailed acceptance criteria, screenshots/mockups, specific field names/URLs/roles, unambiguous behavior, consistent patterns | Quick (1-2 min) | Confirm feature exists, capture evidence |
- | **Vague** | General direction clear but specifics missing, incomplete examples, assumed details, relative terms ("fix", "better") | Moderate (3-5 min) | Document current behavior, identify ambiguities, generate clarification questions |
- | **Unclear** | Contradictory info, multiple interpretations, no examples/criteria, ambiguous scope ("the page"), critical details missing | Deep (5-10 min) | Systematically test scenarios, document patterns, identify all ambiguities, formulate comprehensive questions |
-
- **Examples:**
- - **Clear:** "Change 'Submit' button from blue (#007BFF) to green (#28A745) on /auth/login. Verify hover effect."
- - **Vague:** "Fix the sorting in todo list page. The items are mixed up for premium users."
- - **Unclear:** "Improve the dashboard performance. Users say it's slow."
+ | Clarity | Indicators | Exploration Depth |
+ |---------|-----------|-------------------|
+ | **Clear** | Detailed acceptance criteria, screenshots/mockups, specific field names/URLs | **Quick (1-2 min)** \u2014 confirm feature exists, capture evidence |
+ | **Vague** | General direction clear but specifics missing, relative terms ("fix", "better") | **Moderate (3-5 min)** \u2014 document current behavior, identify ambiguities |
+ | **Unclear** | Contradictory info, multiple interpretations, no criteria, ambiguous scope | **Deep (5-10 min)** \u2014 systematically test scenarios, document all ambiguities |

  ### Maturity Adjustment

- If the Clarification Protocol determined project maturity, adjust exploration depth:
-
- - **New project**: Default one level deeper than requirement clarity suggests (Clear \u2192 Moderate, Vague \u2192 Deep)
- - **Growing project**: Use requirement clarity as-is (standard protocol)
- - **Mature project**: Trust knowledge base \u2014 can stay at suggested depth or go one level shallower if KB covers the feature
+ If the Clarification Protocol determined project maturity:
+ - **New project**: Default one level deeper (Clear \u2192 Moderate, Vague \u2192 Deep)
+ - **Growing project**: Use requirement clarity as-is
+ - **Mature project**: Can stay at suggested depth or go shallower if knowledge base covers the feature

- **Always verify features exist before testing them.** If exploration reveals that a referenced page or feature does not exist in the application, apply the Clarification Protocol's "Execution Obstacle vs. Requirement Ambiguity" principle:
- - If an authoritative trigger source (Jira issue, PR, team request) asserts the feature exists, this is likely an **execution obstacle** (missing credentials, feature flags, environment config) \u2014 proceed with test artifact creation and notify the team about the access issue. Do NOT BLOCK.
- - If NO authoritative source claims the feature exists, this is **CRITICAL severity** \u2014 escalate via the Clarification Protocol regardless of maturity level. Do NOT silently adapt or work around the missing feature.
+ **Always verify features exist before testing them.** If a referenced feature doesn't exist:
+ - If an authoritative trigger (Jira, PR, team request) asserts it exists \u2192 **execution obstacle** (proceed with artifacts, notify team). Do NOT block.
+ - If NO authoritative source claims it exists \u2192 **CRITICAL severity** \u2014 escalate via Clarification Protocol.

  ### Quick Exploration (1-2 min)

  **When:** Requirements CLEAR

- **Steps:**
- 1. Navigate to feature (use provided URL), verify loads without errors
+ 1. Navigate to feature, verify it loads without errors
  2. Verify key elements exist (buttons, fields, sections mentioned)
  3. Capture screenshot of initial state
- 4. Document:
- \`\`\`markdown
- **Quick Exploration (1 min)**
- Feature: [Name] | URL: [Path]
- Status: \u2705 Accessible / \u274C Not found / \u26A0\uFE0F Different
- Screenshot: [filename]
- Notes: [Immediate observations]
- \`\`\`
- 5. **Decision:** \u2705 Matches \u2192 Test creation | \u274C/\u26A0\uFE0F Doesn't match \u2192 Moderate Exploration
-
- **Time Limit:** 1-2 minutes
+ 4. Document: feature name, URL, status (accessible/not found/different), notes
+ 5. **Decision:** Matches \u2192 test creation | Doesn't match \u2192 Moderate Exploration

  ### Moderate Exploration (3-5 min)

  **When:** Requirements VAGUE or Quick Exploration revealed discrepancies

- **Steps:**
- 1. Navigate using appropriate role(s), set up preconditions, ensure clean state
+ 1. Navigate using appropriate role(s), set up preconditions
  2. Test primary user flow, document steps and behavior, note unexpected behavior
  3. Capture before/after screenshots, document field values/ordering/visibility
- 4. Compare to requirement: What matches? What differs? What's absent?
- 5. Identify specific ambiguities:
- \`\`\`markdown
- **Moderate Exploration (4 min)**
-
- **Explored:** Role: [Admin], Path: [Steps], Behavior: [What happened]
-
- **Current State:** [Specific observations with examples]
- - Example: "Admin view shows 8 sort options: By Title, By Due Date, By Priority..."
-
- **Requirement Says:** [What requirement expected]
-
- **Discrepancies:** [Specific differences]
- - Example: "Premium users see 5 fewer sorting options than admins"
-
- **Ambiguities:**
- 1. [First ambiguity with concrete example]
- 2. [Second if applicable]
-
- **Clarification Needed:** [Specific questions]
- \`\`\`
+ 4. Compare to requirement: what matches, what differs, what's absent
+ 5. Identify specific ambiguities with concrete examples
  6. Assess severity using Clarification Protocol
- 7. **Decision:** \u{1F7E2} Minor \u2192 Proceed with assumptions | \u{1F7E1} Medium \u2192 Async clarification, proceed | \u{1F534} Critical \u2192 Stop, escalate
-
- **Time Limit:** 3-5 minutes
+ 7. **Decision:** Minor ambiguity \u2192 proceed with assumptions | Critical \u2192 stop, escalate

  ### Deep Exploration (5-10 min)

  **When:** Requirements UNCLEAR or critical ambiguities found

- **Steps:**
- 1. **Define Exploration Matrix:** Identify dimensions (user roles, feature states, input variations, browsers)
-
- 2. **Systematic Testing:** Test each matrix cell methodically
- \`\`\`
- Example for "Todo List Sorting":
- Matrix: User Roles \xD7 Feature Observations
-
- Test 1: Admin Role \u2192 Navigate, document sort options (count, names, order), screenshot
- Test 2: Basic User Role \u2192 Same todo list, document options, screenshot
- Test 3: Compare \u2192 Side-by-side table, identify missing/reordered options
- \`\`\`
-
- 3. **Document Patterns:** Consistent behavior? Role-based differences? What varies vs constant?
-
- 4. **Comprehensive Report:**
- \`\`\`markdown
- **Deep Exploration (8 min)**
-
- **Matrix:** [Dimensions] | **Tests:** [X combinations]
-
- **Findings:**
-
- ### Test 1: Admin
- - Setup: [Preconditions] | Steps: [Actions]
- - Observations: Sort options=8, Options=[list], Ordering=[sequence]
- - Screenshot: [filename-admin.png]
-
- ### Test 2: Basic User
- - Setup: [Preconditions] | Steps: [Actions]
- - Observations: Sort options=3, Missing vs Admin=[5 options], Ordering=[sequence]
- - Screenshot: [filename-user.png]
-
- **Comparison Table:**
- | Sort Option | Admin Pos | User Pos | Notes |
- |-------------|-----------|----------|-------|
- | By Title | 1 | 1 | Match |
- | By Priority | 3 | Not visible | Missing |
-
- **Patterns:**
- - Role-based feature visibility
- - Consistent relative ordering for visible fields
-
- **Critical Ambiguities:**
- 1. Option Visibility: Intentional basic users see 5 fewer sort options?
- 2. Sort Definition: (A) All roles see all options in same order, OR (B) Roles see permitted options in same relative order?
-
- **Clarification Questions:** [Specific, concrete based on findings]
- \`\`\`
-
- 5. **Next Action:** Critical ambiguities \u2192 STOP, clarify | Patterns suggest answer \u2192 Validate assumption | Behavior clear \u2192 Test creation
-
- **Time Limit:** 5-10 minutes
-
- ### Link Exploration to Clarification
-
- **Flow:** Requirement Analysis \u2192 Exploration \u2192 Clarification
-
- 1. Requirement analysis detects vague language \u2192 Triggers exploration
- 2. Exploration documents current behavior \u2192 Identifies discrepancies
- 3. Clarification uses findings \u2192 Asks specific questions referencing observations
-
- **Example:**
- \`\`\`
- "Fix the sorting in todo list"
- \u2193 Ambiguity: "sorting" = by date, priority, or completion status?
- \u2193 Moderate Exploration: Admin=8 sort options, User=3 sort options
- \u2193 Question: "Should basic users see all 8 sort options (bug) or only 3 with consistent sequence (correct)?"
- \`\`\`
+ 1. **Define exploration matrix:** dimensions (user roles, feature states, input variations)
+ 2. **Systematic testing:** test each matrix cell methodically, document observations
+ 3. **Document patterns:** consistent behavior, role-based differences, what varies vs constant
+ 4. **Comprehensive report:** findings per test, comparison table, identified patterns, critical ambiguities
+ 5. **Next action:** Critical ambiguities \u2192 STOP, clarify | Patterns suggest answer \u2192 validate assumption | Behavior clear \u2192 test creation
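Editor's note: the exploration matrix described in the added steps (dimensions such as user roles and feature states, tested cell by cell) amounts to a cartesian product of the dimensions. A hypothetical helper sketching that enumeration; the function name and dimension values are illustrative, not part of the package.

```javascript
// Sketch: expand named dimensions into every matrix cell to explore.
// { role: ["admin", "user"], state: ["empty", "filled"] } -> 4 cells.
function matrixCells(dimensions) {
  return Object.entries(dimensions).reduce(
    (cells, [name, values]) =>
      // For each partial cell built so far, branch once per value
      // of the current dimension.
      cells.flatMap((cell) => values.map((v) => ({ ...cell, [name]: v }))),
    [{}]
  );
}
```

Enumerating cells up front makes the "test each matrix cell methodically" step checkable: the report can record one finding per generated cell and nothing is silently skipped.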

  ### Document Exploration Results

- **Template:**
- \`\`\`markdown
- ## Exploration Summary
-
- **Date:** [YYYY-MM-DD] | **Explorer:** [Agent/User] | **Depth:** [Quick/Moderate/Deep] | **Duration:** [X min]
-
- ### Feature: [Name and description]
-
- ### Observations: [Key findings]
-
- ### Current Behavior: [What feature does today]
-
- ### Discrepancies: [Requirement vs observation differences]
-
- ### Assumptions Made: [If proceeding with assumptions]
+ Save exploration findings as a report including:
+ - Date, depth, duration
+ - Feature observations and current behavior
+ - Discrepancies between requirements and observations
+ - Assumptions made (if proceeding)
+ - Artifacts: screenshots, videos, notes

- ### Artifacts: Screenshots: [list], Video: [if captured], Notes: [detailed]
- \`\`\`
-
- **Memory Storage:** Feature behavior patterns, common ambiguity types, resolution approaches
-
- ### Integration with Test Creation
-
- **Quick Exploration \u2192 Direct Test:**
- - Feature verified \u2192 Create test matching requirement \u2192 Reference screenshot
-
- **Moderate Exploration \u2192 Assumption-Based Test:**
- - Document behavior \u2192 Create test on best interpretation \u2192 Mark assumptions \u2192 Plan updates after clarification
-
- **Deep Exploration \u2192 Clarification-First:**
- - Block test creation until clarification \u2192 Use exploration as basis for questions \u2192 Create test after answer \u2192 Reference both exploration and clarification
-
- ---
-
- ## Adaptive Exploration Decision Tree
+ ### Decision Tree

  \`\`\`
- Start: Requirement Received
- \u2193
- Are requirements clear with specifics?
- \u251C\u2500 YES \u2192 Quick Exploration (1-2 min)
- \u2502 \u2193
- \u2502 Does feature match description?
- \u2502 \u251C\u2500 YES \u2192 Proceed to Test Creation
- \u2502 \u2514\u2500 NO \u2192 Escalate to Moderate Exploration
- \u2502
- \u2514\u2500 NO \u2192 Is general direction clear but details missing?
- \u251C\u2500 YES \u2192 Moderate Exploration (3-5 min)
- \u2502 \u2193
- \u2502 Are ambiguities MEDIUM severity or lower?
- \u2502 \u251C\u2500 YES \u2192 Document assumptions, proceed with test creation
- \u2502 \u2514\u2500 NO \u2192 Escalate to Deep Exploration or Clarification
6477
- \u2502
6478
- \u2514\u2500 NO \u2192 Deep Exploration (5-10 min)
6479
- \u2193
6480
- Document comprehensive findings
6481
- \u2193
6482
- Assess ambiguity severity
6483
- \u2193
6484
- Seek clarification for CRITICAL/HIGH
5405
+ Requirements clear? \u2192 YES \u2192 Quick Exploration \u2192 Matches? \u2192 YES \u2192 Test Creation
5406
+ \u2192 NO \u2192 Moderate Exploration
5407
+ \u2192 NO \u2192 Direction clear? \u2192 YES \u2192 Moderate Exploration \u2192 Ambiguity \u2264 MEDIUM? \u2192 YES \u2192 Proceed with assumptions
5408
+ \u2192 NO \u2192 Deep Exploration / Clarify
5409
+ \u2192 NO \u2192 Deep Exploration \u2192 Document findings \u2192 Clarify CRITICAL/HIGH
6485
5410
  \`\`\`
6486
5411
 
6487
5412
  ---
6488
5413
 
6489
5414
  ## Remember
6490
5415
 
6491
- - **Explore before assuming** - Validate requirements against actual behavior
6492
- - **Concrete observations > abstract interpretation** - Document specific findings
6493
- - **Adaptive depth: time \u221D uncertainty** - Match exploration effort to requirement clarity
6494
- - **Exploration findings \u2192 specific clarifications** - Use observations to formulate questions
6495
- - **Always document** - Create artifacts for future reference
6496
- - **Link exploration \u2192 ambiguity \u2192 clarification** - Connect the workflow`,
5416
+ - **Explore before assuming** \u2014 validate requirements against actual behavior
5417
+ - **Concrete observations > abstract interpretation** \u2014 document specific findings
5418
+ - **Adaptive depth** \u2014 match exploration effort to requirement clarity
5419
+ - **Always document** \u2014 create artifacts for future reference`,
6497
5420
  tags: ["exploration", "protocol", "adaptive"]
6498
5421
  };
6499
5422
 
@@ -6505,277 +5428,138 @@ var clarificationProtocolStep = {
  invokesSubagents: ["team-communicator"],
  content: `## Clarification Protocol
 
- Before proceeding with test creation or execution, ensure requirements are clear and testable. Use this protocol to detect ambiguity, assess its severity, and determine the appropriate action.
+ Before proceeding with test creation or execution, ensure requirements are clear and testable.
 
  ### Check for Pending Clarification
 
- Before starting, check if this task is resuming from a blocked clarification:
-
- 1. **Check $ARGUMENTS for clarification data:**
- - If \`$ARGUMENTS.clarification\` exists, this task is resuming with a clarification response
- - Extract: \`clarification\` (the user's answer), \`originalArgs\` (original task parameters)
-
- 2. **If clarification is present:**
- - Read \`.bugzy/runtime/blocked-task-queue.md\`
- - Find and remove your task's entry from the queue (update the file)
- - Proceed using the clarification as if user just provided the answer
- - Skip ambiguity detection for the clarified aspect
-
- 3. **If no clarification in $ARGUMENTS:** Proceed normally with ambiguity detection below.
+ 1. If \`$ARGUMENTS.clarification\` exists, this task is resuming with a clarification response:
+ - Extract \`clarification\` (the user's answer) and \`originalArgs\` (original task parameters)
+ - Read \`.bugzy/runtime/blocked-task-queue.md\`, find and remove your task's entry
+ - Proceed using the clarification, skip ambiguity detection for the clarified aspect
+ 2. If no clarification in $ARGUMENTS: Proceed normally with ambiguity detection below.
 
  ### Assess Project Maturity
 
- Before detecting ambiguity, assess how well you know this project. Maturity determines how aggressively you should ask questions \u2014 new projects require more questions, mature projects can rely on accumulated knowledge.
+ Maturity determines how aggressively you should ask questions.
 
- **Measure maturity from runtime artifacts:**
+ **Measure from runtime artifacts:**
 
  | Signal | New | Growing | Mature |
  |--------|-----|---------|--------|
- | \`knowledge-base.md\` | < 80 lines (template) | 80-300 lines | 300+ lines |
- | \`memory/\` files | 0 files | 1-3 files | 4+ files, >5KB each |
+ | \`knowledge-base.md\` | < 80 lines | 80-300 lines | 300+ lines |
+ | \`memory/\` files | 0 | 1-3 | 4+ files, >5KB each |
  | Test cases in \`test-cases/\` | 0 | 1-6 | 7+ |
  | Exploration reports | 0 | 1 | 2+ |
 
- **Steps:**
- 1. Read \`.bugzy/runtime/knowledge-base.md\` and count lines
- 2. List \`.bugzy/runtime/memory/\` directory and count files
- 3. List \`test-cases/\` directory and count \`.md\` files (exclude README)
- 4. Count exploration reports in \`exploration-reports/\`
- 5. Classify: If majority of signals = New \u2192 **New**; majority Mature \u2192 **Mature**; otherwise \u2192 **Growing**
+ Check these signals and classify: majority New \u2192 **New**; majority Mature \u2192 **Mature**; otherwise \u2192 **Growing**.
 
  **Maturity adjusts your question threshold:**
- - **New**: Ask for CRITICAL + HIGH + MEDIUM severity (gather information aggressively)
- - **Growing**: Ask for CRITICAL + HIGH severity (standard protocol)
- - **Mature**: Ask for CRITICAL only (handle HIGH with documented assumptions)
-
- **CRITICAL severity ALWAYS triggers a question, regardless of maturity level.**
+ - **New**: STOP for CRITICAL + HIGH + MEDIUM
+ - **Growing**: STOP for CRITICAL + HIGH (default)
+ - **Mature**: STOP for CRITICAL only; handle HIGH with documented assumptions
 
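The signal thresholds and majority vote above can be sketched as a small classification function (an editorial sketch, not part of the package: the `MaturitySignals` shape and function names are illustrative, and the ">5KB each" size check on memory files is omitted for brevity):

```typescript
type Maturity = "New" | "Growing" | "Mature";

interface MaturitySignals {
  knowledgeBaseLines: number; // line count of .bugzy/runtime/knowledge-base.md
  memoryFiles: number;        // file count in .bugzy/runtime/memory/
  testCases: number;          // .md files in test-cases/ (excluding README)
  explorationReports: number; // reports in exploration-reports/
}

// Classify each signal per the table, then take the majority vote;
// mixed signals fall back to "Growing".
function classifyProject(s: MaturitySignals): Maturity {
  const perSignal: Maturity[] = [
    s.knowledgeBaseLines >= 300 ? "Mature" : s.knowledgeBaseLines >= 80 ? "Growing" : "New",
    s.memoryFiles >= 4 ? "Mature" : s.memoryFiles >= 1 ? "Growing" : "New",
    s.testCases >= 7 ? "Mature" : s.testCases >= 1 ? "Growing" : "New",
    s.explorationReports >= 2 ? "Mature" : s.explorationReports >= 1 ? "Growing" : "New",
  ];
  const count = (m: Maturity) => perSignal.filter((x) => x === m).length;
  const majority = perSignal.length / 2;
  if (count("New") > majority) return "New";
  if (count("Mature") > majority) return "Mature";
  return "Growing";
}
```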
  ### Detect Ambiguity
 
- Scan for ambiguity signals:
-
- **Language:** Vague terms ("fix", "improve", "better", "like", "mixed up"), relative terms without reference ("faster", "more"), undefined scope ("the ordering", "the fields", "the page"), modal ambiguity ("should", "could" vs "must", "will")
-
- **Details:** Missing acceptance criteria (no clear PASS/FAIL), no examples/mockups, incomplete field/element lists, unclear role behavior differences, unspecified error scenarios
+ Scan for these signals:
+ - **Language**: Vague terms ("fix", "improve"), relative terms without reference, undefined scope, modal ambiguity
+ - **Details**: Missing acceptance criteria, no examples, incomplete element lists, unspecified error scenarios
+ - **Interpretation**: Multiple valid interpretations, contradictory information, implied vs explicit requirements
+ - **Context**: No reference documentation, assumes knowledge
 
- **Interpretation:** Multiple valid interpretations, contradictory information (description vs comments), implied vs explicit requirements
-
- **Context:** No reference documentation, "RELEASE APPROVED" without criteria, quick ticket creation, assumes knowledge ("as you know...", "obviously...")
-
- **Quick Check:**
- - [ ] Success criteria explicitly defined? (PASS if X, FAIL if Y)
- - [ ] All affected elements specifically listed? (field names, URLs, roles)
- - [ ] Only ONE reasonable interpretation?
- - [ ] Examples, screenshots, or mockups provided?
- - [ ] Consistent with existing system patterns?
- - [ ] Can write test assertions without assumptions?
+ **Quick Check** \u2014 can you write test assertions without assumptions? Is there only ONE reasonable interpretation?
 
  ### Assess Severity
 
- If ambiguity is detected, assess its severity:
-
- | Severity | Characteristics | Examples | Action |
- |----------|----------------|----------|--------|
- | **CRITICAL** | Expected behavior undefined/contradictory; test outcome unpredictable; core functionality unclear; success criteria missing; multiple interpretations = different strategies; **referenced page/feature confirmed absent after browser verification AND no authoritative trigger source (Jira, PR, team request) asserts the feature exists** | "Fix the issue" (what issue?), "Improve performance" (which metrics?), "Fix sorting in todo list" (by date? priority? completion status?), "Test the Settings page" (browsed app \u2014 no Settings page exists, and no Jira/PR claims it was built) | **STOP** - You MUST ask via team-communicator before proceeding |
- | **HIGH** | Core underspecified but direction clear; affects majority of scenarios; vague success criteria; assumptions risky | "Fix ordering" (sequence OR visibility?), "Add validation" (what? messages?), "Update dashboard" (which widgets?) | **STOP** - You MUST ask via team-communicator before proceeding |
- | **MEDIUM** | Specific details missing; general requirements clear; affects subset of cases; reasonable low-risk assumptions possible; wrong assumption = test updates not strategy overhaul | Missing field labels, unclear error message text, undefined timeouts, button placement not specified, date formats unclear | **PROCEED** - (1) Moderate exploration, (2) Document assumptions: "Assuming X because Y", (3) Proceed with creation/execution, (4) Async clarification (team-communicator), (5) Mark [ASSUMED: description] |
- | **LOW** | Minor edge cases; documentation gaps don't affect execution; optional/cosmetic elements; minimal impact | Tooltip text, optional field validation, icon choice, placeholder text, tab order | **PROCEED** - (1) Mark [TO BE CLARIFIED: description], (2) Proceed, (3) Mention in report "Minor Details", (4) No blocking/async clarification |
+ | Severity | Characteristics | Action |
+ |----------|----------------|--------|
+ | **CRITICAL** | Expected behavior undefined/contradictory; core functionality unclear; success criteria missing; multiple interpretations = different strategies; page/feature confirmed absent with no authoritative trigger claiming it exists | **STOP** \u2014 ask via team-communicator |
+ | **HIGH** | Core underspecified but direction clear; affects majority of scenarios; assumptions risky | **STOP** \u2014 ask via team-communicator |
+ | **MEDIUM** | Specific details missing; general requirements clear; reasonable low-risk assumptions possible | **PROCEED** \u2014 moderate exploration, document assumptions [ASSUMED: X], async clarification |
+ | **LOW** | Minor edge cases; documentation gaps don't affect execution | **PROCEED** \u2014 mark [TO BE CLARIFIED: X], mention in report |
 
  ### Execution Obstacle vs. Requirement Ambiguity
 
- Before classifying something as CRITICAL, distinguish between these two fundamentally different situations:
+ Before classifying something as CRITICAL, distinguish:
 
- **Requirement Ambiguity** = *What* to test is unclear \u2192 severity assessment applies normally
- - No authoritative source describes the feature
- - The task description is vague or contradictory
- - You cannot determine what "correct" behavior looks like
- - \u2192 Apply severity table above. CRITICAL/HIGH \u2192 BLOCK.
+ **Requirement Ambiguity** = *What* to test is unclear \u2192 severity assessment applies normally.
 
- **Execution Obstacle** = *What* to test is clear, but *how* to access/verify has obstacles \u2192 NEVER BLOCK
- - An authoritative trigger source (Jira issue, PR, team message) asserts the feature exists
- - You browsed the app but couldn't find/access the feature
- - The obstacle is likely: wrong user role/tier, missing test data, feature flags, environment config
- - \u2192 PROCEED with artifact creation (test cases, test specs). Notify team about the obstacle.
+ **Execution Obstacle** = *What* to test is clear, but *how* to access/verify has obstacles \u2192 NEVER BLOCK.
+ - An authoritative trigger source (Jira, PR, team message) asserts the feature exists
+ - You browsed but couldn't find/access it (likely: wrong role, missing test data, feature flags, env config)
+ - \u2192 PROCEED with artifact creation. Notify team about the obstacle.
 
- **The key test:** Does an authoritative trigger source (Jira, PR, team request) assert the feature exists?
- - **YES** \u2192 It's an execution obstacle. The feature exists but you can't access it. Proceed: create test artifacts, add placeholder env vars, notify team about access issues.
- - **NO** \u2192 It may genuinely not exist. Apply CRITICAL severity, ask what was meant.
+ **The key test:** Does an authoritative trigger source assert the feature exists?
+ - **YES** \u2192 Execution obstacle. Proceed, create test artifacts, notify team about access issues.
+ - **NO** \u2192 May genuinely not exist. Apply CRITICAL severity, ask.
 
- | Scenario | Trigger Says | Browser Shows | Classification | Action |
- |----------|-------------|---------------|----------------|--------|
- | Jira says "test premium dashboard", you log in as test_user and don't see it | Feature exists | Can't access | **Execution obstacle** | Create tests, notify team re: missing premium credentials |
- | PR says "verify new settings page", you browse and find no settings page | Feature exists | Can't find | **Execution obstacle** | Create tests, notify team re: possible feature flag/env issue |
- | Manual request "test the settings page", no Jira/PR, you browse and find no settings page | No source claims it | Can't find | **Requirement ambiguity (CRITICAL)** | BLOCK, ask what was meant |
- | Jira says "fix sorting", but doesn't specify sort criteria | Feature exists | Feature exists | **Requirement ambiguity (HIGH)** | BLOCK, ask which sort criteria |
+ **Important:** A page loading is NOT the same as the requested functionality existing on it. Evaluate whether the REQUESTED FUNCTIONALITY exists, not just whether a URL resolves. If the page loads but requested features are absent and no authoritative source claims they were built \u2192 CRITICAL ambiguity.
 
- **Partial Feature Existence \u2014 URL found but requested functionality absent:**
-
- A common edge case: a page/route loads successfully, but the SPECIFIC FUNCTIONALITY you were asked to test doesn't exist on it.
-
- **Rule:** Evaluate whether the REQUESTED FUNCTIONALITY exists, not just whether a URL resolves.
-
- | Page Exists | Requested Features Exist | Authoritative Trigger | Classification |
- |-------------|--------------------------|----------------------|----------------|
- | Yes | Yes | Any | Proceed normally |
- | Yes | No | Yes (Jira/PR says features built) | Execution obstacle \u2014 features behind flag/env |
- | Yes | No | No (manual request only) | **Requirement ambiguity (CRITICAL)** \u2014 ask what's expected |
- | No | N/A | Yes | Execution obstacle \u2014 page not deployed yet |
- | No | N/A | No | **Requirement ambiguity (CRITICAL)** \u2014 ask what was meant |
-
- **Example:** Prompt says "Test the checkout payment form with credit card 4111..." You browse to /checkout and find an information form (first name, last name, postal code) but NO payment form, NO shipping options, NO Place Order button. No Jira/PR claims these features exist. \u2192 **CRITICAL requirement ambiguity.** Ask: "I found a checkout information form at /checkout but no payment form or shipping options. Can you clarify what checkout features you'd like tested?"
-
- **Key insight:** Finding a URL is not the same as finding the requested functionality. Do NOT classify this as an "execution obstacle" just because the page loads.
+ | Scenario | Trigger Claims Feature | Browser Shows | Classification |
+ |----------|----------------------|---------------|----------------|
+ | Jira says "test premium dashboard", can't see it | Yes | Can't access | Execution obstacle \u2014 proceed |
+ | PR says "verify settings page", no settings page | Yes | Can't find | Execution obstacle \u2014 proceed |
+ | Manual request "test settings", no Jira/PR | No | Can't find | CRITICAL ambiguity \u2014 ask |
+ | Jira says "fix sorting", no sort criteria | Yes | Feature exists | HIGH ambiguity \u2014 ask |
 
  ### Check Memory for Similar Clarifications
 
- Before asking, check if similar question was answered:
-
- **Process:**
- 1. **Query team-communicator memory** - Search by feature name, ambiguity pattern, ticket keywords
- 2. **Review past Q&A** - Similar question asked? What was answer? Applicable now?
- 3. **Assess reusability:**
- - Directly applicable \u2192 Use answer, no re-ask
- - Partially applicable \u2192 Adapt and reference ("Previously for X, clarified Y. Same here?")
- - Not applicable \u2192 Ask as new
- 4. **Update memory** - Store Q&A with task type, feature, pattern tags
-
- **Example:** Query "todo sorting priority" \u2192 Found 2025-01-15: "Should completed todos appear in main list?" \u2192 Answer: "No, move to separate archive view" \u2192 Directly applicable \u2192 Use, no re-ask needed
+ Before asking, search memory by feature name, ambiguity pattern, and ticket keywords. If a directly applicable past answer exists, use it without re-asking. If partially applicable, adapt and reference.
 
  ### Formulate Clarification Questions
 
- If clarification needed (CRITICAL/HIGH severity), formulate specific, concrete questions:
-
- **Good Questions:** Specific and concrete, provide context, offer options, reference examples, tie to test strategy
+ If clarification needed (CRITICAL/HIGH), formulate specific, concrete questions:
 
- **Bad Questions:** Too vague/broad, assumptive, multiple questions in one, no context
-
- **Template:**
  \`\`\`
  **Context:** [Current understanding]
  **Ambiguity:** [Specific unclear aspect]
  **Question:** [Specific question with options]
  **Why Important:** [Testing strategy impact]
-
- Example:
- Context: TODO-456 "Fix the sorting in the todo list so items appear in the right order"
- Ambiguity: "sorting" = (A) by creation date, (B) by due date, (C) by priority level, or (D) custom user-defined order
- Question: Should todos be sorted by due date (soonest first) or priority (high to low)? Should completed items appear in the list or move to archive?
- Why Important: Different sort criteria require different test assertions. Current app shows 15 active todos + 8 completed in mixed order.
  \`\`\`
 
  ### Communicate Clarification Request
 
- **For Slack-Triggered Tasks:** {{INVOKE_TEAM_COMMUNICATOR}} to ask in thread:
- \`\`\`
- Ask clarification in Slack thread:
- Context: [From ticket/description]
- Ambiguity: [Describe ambiguity]
- Severity: [CRITICAL/HIGH]
- Questions:
- 1. [First specific question]
- 2. [Second if needed]
-
- Clarification needed to proceed. I'll wait for response before testing.
- \`\`\`
-
- **For Manual/API Triggers:** Include in task output:
- \`\`\`markdown
- ## Clarification Required Before Testing
-
- **Ambiguity:** [Description]
- **Severity:** [CRITICAL/HIGH]
-
- ### Questions:
- 1. **Question:** [First question]
- - Context: [Provide context]
- - Options: [If applicable]
- - Impact: [Testing impact]
+ **For Slack-Triggered Tasks:** {{INVOKE_TEAM_COMMUNICATOR}} to ask in thread with context, ambiguity description, severity, and specific questions.
 
- **Action Required:** Provide clarification. Testing cannot proceed.
- **Current Observation:** [What exploration revealed - concrete examples]
- \`\`\`
+ **For Manual/API Triggers:** Include a "Clarification Required Before Testing" section in task output with ambiguity, severity, questions with context/options/impact, and current observations.
 
  ### Register Blocked Task (CRITICAL/HIGH only)
 
- When asking a CRITICAL or HIGH severity question that blocks progress, register the task in the blocked queue so it can be automatically re-triggered when clarification arrives.
-
- **Update \`.bugzy/runtime/blocked-task-queue.md\`:**
-
- 1. Read the current file (create if doesn't exist)
- 2. Add a new row to the Queue table
+ When blocked, register in \`.bugzy/runtime/blocked-task-queue.md\`:
 
  \`\`\`markdown
- # Blocked Task Queue
-
- Tasks waiting for clarification responses.
-
  | Task Slug | Question | Original Args |
  |-----------|----------|---------------|
  | generate-test-plan | Should todos be sorted by date or priority? | \`{"ticketId": "TODO-456"}\` |
  \`\`\`
 
- **Entry Fields:**
- - **Task Slug**: The task slug (e.g., \`generate-test-plan\`) - used for re-triggering
- - **Question**: The clarification question asked (so LLM can match responses)
- - **Original Args**: JSON-serialized \`$ARGUMENTS\` wrapped in backticks
-
- **Purpose**: The LLM processor reads this file and matches user responses to pending questions. When a match is found, it re-queues the task with the clarification.
+ The LLM processor reads this file and matches user responses to pending questions, then re-queues the task with the clarification.
 
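Registering a blocked task amounts to appending one row to the queue table shown above. A minimal sketch (editorial, not package code — `withBlockedTask` is an illustrative name; it builds the new file body and leaves the actual read/write of `.bugzy/runtime/blocked-task-queue.md` to the caller):

```typescript
// Table header for a freshly created queue file.
const HEADER = [
  "| Task Slug | Question | Original Args |",
  "|-----------|----------|---------------|",
].join("\n");

// existing: current file contents, or null when the file doesn't exist yet.
// Returns the updated file body with the new row appended.
function withBlockedTask(
  existing: string | null,
  taskSlug: string,
  question: string,
  originalArgs: unknown,
): string {
  // Pipes inside the question would break the markdown table row, so escape them.
  const safeQuestion = question.replace(/\|/g, "\\|");
  const row = `| ${taskSlug} | ${safeQuestion} | \`${JSON.stringify(originalArgs)}\` |`;
  const base = existing === null || existing.trim() === "" ? HEADER : existing.trimEnd();
  return `${base}\n${row}\n`;
}
```

Keeping the function pure makes the "create if doesn't exist" and "append a row" cases trivially testable.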
  ### Wait or Proceed Based on Severity
 
- **Use your maturity assessment to adjust thresholds:**
- - **New project**: STOP for CRITICAL + HIGH + MEDIUM
- - **Growing project**: STOP for CRITICAL + HIGH (default)
- - **Mature project**: STOP for CRITICAL only; handle HIGH with documented assumptions
-
  **When severity meets your STOP threshold:**
- - You MUST call team-communicator (Slack) to ask the question \u2014 do NOT just mention it in your text output
+ - You MUST call team-communicator to ask \u2014 do NOT just mention it in text output
  - Do NOT create tests, run tests, or make assumptions about the unclear aspect
- - Do NOT silently adapt by working around the issue (e.g., running other tests instead)
+ - Do NOT silently adapt by working around the issue
  - Do NOT invent your own success criteria when none are provided
- - Register the blocked task and wait for clarification
- - *Rationale: Wrong assumptions = incorrect tests, false results, wasted time*
+ - Register the blocked task and wait
 
- **When severity is below your STOP threshold \u2192 Proceed with Documented Assumptions:**
- - Perform moderate exploration, document assumptions, proceed with creation/execution
- - Ask clarification async (team-communicator), mark results "based on assumptions"
- - Update tests after clarification received
- - *Rationale: Waiting blocks progress; documented assumptions allow forward movement with later corrections*
-
- **LOW \u2192 Always Proceed and Mark:**
- - Proceed with creation/execution, mark gaps [TO BE CLARIFIED] or [ASSUMED]
- - Mention in report but don't prioritize, no blocking
- - *Rationale: Details don't affect strategy/results significantly*
+ **When severity is below your STOP threshold:**
+ - Perform moderate exploration, document assumptions, proceed
+ - Ask clarification async, mark results "based on assumptions"
 
  ### Document Clarification in Results
 
- When reporting test results, always include an "Ambiguities" section if clarification occurred:
-
- \`\`\`markdown
- ## Ambiguities Encountered
-
- ### Clarification: [Topic]
- - **Severity:** [CRITICAL/HIGH/MEDIUM/LOW]
- - **Question Asked:** [What was asked]
- - **Response:** [Answer received, or "Awaiting response"]
- - **Impact:** [How this affected testing]
- - **Assumption Made:** [If proceeded with assumption]
- - **Risk:** [What could be wrong if assumption is incorrect]
-
- ### Resolution:
- [How the clarification was resolved and incorporated into testing]
- \`\`\`
+ Include an "Ambiguities Encountered" section in results when clarification occurred, noting severity, question asked, response (or "Awaiting"), impact, assumptions made, and risk.
 
  ---
 
  ## Remember
 
- - **STOP means STOP** - When you hit a STOP threshold, you MUST call team-communicator to ask via Slack. Do NOT silently adapt, skip, or work around the issue
- - **Non-existent features \u2014 check context first** - If a page/feature doesn't exist in the browser, check whether an authoritative trigger (Jira, PR, team request) asserts it exists. If YES \u2192 execution obstacle (proceed with artifact creation, notify team). If NO authoritative source claims it exists \u2192 CRITICAL severity, ask what was meant
- - **Ask correctly > guess poorly** - Specific questions lead to specific answers
- - **Never invent success criteria** - If the task says "improve" or "fix" without metrics, ask what "done" looks like
- - **Check memory first** - Avoid re-asking previously answered questions
- - **Maturity adjusts threshold, not judgment** - Even in mature projects, CRITICAL always triggers a question`,
+ - **STOP means STOP** \u2014 When you hit a STOP threshold, you MUST call team-communicator. Do NOT silently adapt or work around the issue
+ - **Non-existent features \u2014 check context first** \u2014 If a feature doesn't exist in browser, check whether an authoritative trigger asserts it exists. YES \u2192 execution obstacle (proceed). NO \u2192 CRITICAL severity, ask.
+ - **Never invent success criteria** \u2014 If the task says "improve" or "fix" without metrics, ask what "done" looks like
+ - **Check memory first** \u2014 Avoid re-asking previously answered questions
+ - **Maturity adjusts threshold, not judgment** \u2014 CRITICAL always triggers a question`,
  tags: ["clarification", "protocol", "ambiguity"]
  };
 
@@ -6898,7 +5682,19 @@ After analyzing test results, triage each failure to determine if it's a product
 
  **IMPORTANT: Do NOT report bugs without triaging first.**
 
- For each failed test:
+ ### 1. Check Failure Classification
+
+ **Before triaging any failure**, read \`new_failures\` from the latest \`test-runs/*/manifest.json\`:
+
+ | \`new_failures\` State | Action |
+ |------------------------|--------|
+ | Non-empty array | Only triage failures listed in \`new_failures\`. Do not investigate, fix, or create issues for \`known_failures\`. |
+ | Empty array | No new failures to triage. Output "0 new failures to triage" and skip the rest of this step. |
+ | Field missing | Fall back: triage all failed tests (backward compatibility with older reporter versions). |
+
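The three states in the table above reduce to a small selection function over the manifest. A sketch (editorial, not package code; the `results` field is a simplification — the real manifest nests per-test executions):

```typescript
interface FailureRef { id: string; name: string; error: string | null }

interface RunManifest {
  // Simplified: one entry per test with its final status.
  results: { id: string; name: string; status: "passed" | "failed"; error?: string | null }[];
  new_failures?: FailureRef[];
  known_failures?: FailureRef[];
}

// Decide which failures to triage per the classification table.
function failuresToTriage(manifest: RunManifest): FailureRef[] {
  // Field present (even when empty) → triage only the listed new failures;
  // an empty array means "0 new failures to triage".
  if (manifest.new_failures !== undefined) return manifest.new_failures;
  // Field missing → older reporter: fall back to every failed result.
  return manifest.results
    .filter((r) => r.status === "failed")
    .map((r) => ({ id: r.id, name: r.name, error: r.error ?? null }));
}
```

Note the deliberate distinction between an empty array (skip triage) and a missing field (triage everything).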
+ ### 2. Triage Each Failure
+
+ For each failed test (from \`new_failures\` or all failures if field is missing):
 
  1. **Read failure details** from JSON report (error message, stack trace)
  2. **Classify the failure:**
@@ -6927,14 +5723,22 @@ For each failed test:
  - Broken navigation flows
  - Validation not working as expected
 
- **Document Classification:**
+ ### 3. Document Results
+
  \`\`\`markdown
- ### Failure Triage
+ ### Failure Triage Summary
+
+ **New failures triaged: N** | **Known failures skipped: M**
 
  | Test ID | Test Name | Classification | Reason |
  |---------|-----------|---------------|--------|
  | TC-001 | Login test | TEST ISSUE | Selector brittle - uses CSS instead of role |
  | TC-002 | Checkout | PRODUCT BUG | 500 error on form submit |
+
+ #### Skipped Known Failures
+ | Test ID | Test Name | Last Passed Run |
+ |---------|-----------|-----------------|
+ | TC-003 | Search | 20260210-103045 |
  \`\`\``,
  tags: ["execution", "triage", "analysis"]
  };
@@ -7492,10 +6296,36 @@ npx tsx reporters/parse-results.ts --input <file-or-url> [--timestamp <existing>
  }
  ]
  }
+ ],
+ "new_failures": [
+ {
+ "id": "<test case id>",
+ "name": "<test name>",
+ "error": "<error message or null>",
+ "lastPassedRun": "<timestamp of last passing run or null>"
+ }
+ ],
+ "known_failures": [
+ {
+ "id": "<test case id>",
+ "name": "<test name>",
+ "error": "<error message or null>",
+ "lastPassedRun": null
+ }
  ]
  }
  \`\`\`
- 4. For each failed test, create:
+ 4. **Classify failures** \u2014 after building the manifest, classify each failed test as new or known:
+ - Read \`BUGZY_FAILURE_LOOKBACK\` env var (default: 5)
+ - List previous \`test-runs/*/manifest.json\` files sorted by timestamp descending (skip current run)
+ - For each failed test in the manifest:
+ - If it passed in any of the last N runs \u2192 \`new_failures\` (include the timestamp of the last passing run in \`lastPassedRun\`)
+ - If it failed in ALL of the last N runs \u2192 \`known_failures\`
+ - If the test doesn't exist in any previous run \u2192 \`new_failures\` (new test)
+ - If no previous runs exist at all (first run) \u2192 all failures go to \`new_failures\`
+ - Write the \`new_failures\` and \`known_failures\` arrays into the manifest
+
+ 5. For each failed test, create:
  - Directory: \`test-runs/{timestamp}/{testCaseId}/exec-1/\`
  - File: \`test-runs/{timestamp}/{testCaseId}/exec-1/result.json\` containing:
  \`\`\`json
@@ -7507,8 +6337,8 @@ npx tsx reporters/parse-results.ts --input <file-or-url> [--timestamp <existing>
  "testFile": "<file path if available>"
  }
  \`\`\`
- 5. Print the manifest path to stdout
- 6. Exit code 0 on success, non-zero on failure
+ 6. Print the manifest path to stdout
+ 7. Exit code 0 on success, non-zero on failure
 
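The lookback classification described in step 4 can be sketched as a pure function over the previous runs (editorial sketch, not package code; `PreviousRun` is an illustrative shape — the real script would derive the per-run test lists from prior manifests):

```typescript
interface TestResult { id: string; name: string; status: "passed" | "failed"; error?: string | null }
interface PreviousRun { timestamp: string; tests: TestResult[] }
interface ClassifiedFailure { id: string; name: string; error: string | null; lastPassedRun: string | null }

// previousRuns: up to BUGZY_FAILURE_LOOKBACK prior runs, sorted newest first.
function classifyFailures(
  failed: TestResult[],
  previousRuns: PreviousRun[],
): { new_failures: ClassifiedFailure[]; known_failures: ClassifiedFailure[] } {
  const new_failures: ClassifiedFailure[] = [];
  const known_failures: ClassifiedFailure[] = [];
  for (const t of failed) {
    const entry = { id: t.id, name: t.name, error: t.error ?? null };
    // Newest-first, so find() yields the most recent passing run.
    const lastPassed = previousRuns.find((run) =>
      run.tests.some((p) => p.id === t.id && p.status === "passed"),
    );
    const seenBefore = previousRuns.some((run) => run.tests.some((p) => p.id === t.id));
    if (lastPassed !== undefined) {
      // Passed somewhere in the window → regression, i.e. a new failure.
      new_failures.push({ ...entry, lastPassedRun: lastPassed.timestamp });
    } else if (!seenBefore) {
      // Brand-new test, or the very first run → new failure, no prior pass.
      new_failures.push({ ...entry, lastPassedRun: null });
    } else {
      // Failed in every lookback run it appears in → known failure.
      known_failures.push({ ...entry, lastPassedRun: null });
    }
  }
  return { new_failures, known_failures };
}
```

With `previousRuns` empty (first run), everything lands in `new_failures`, matching the first-run rule above.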
  **Incremental mode** (\`--timestamp\` + \`--test-id\` provided):
  1. Read existing \`test-runs/{timestamp}/manifest.json\`