@bugzy-ai/bugzy 1.12.4 → 1.13.1

package/dist/index.cjs CHANGED
@@ -473,6 +473,9 @@ Before invoking the agent, identify the test cases for the current area:
  - Existing automated tests: ./tests/specs/
  - Existing Page Objects: ./tests/pages/
 
+ **Knowledge Base Patterns (MUST APPLY):**
+ Include ALL relevant testing patterns from the knowledge base that apply to this area. For example, if the KB documents timing behaviors (animation delays, loading states), selector gotchas, or recommended assertion approaches \u2014 list them here explicitly and instruct the agent to use the specific patterns described (e.g., specific assertion methods with specific timeouts). The test-code-generator does not have access to the knowledge base, so you MUST relay the exact patterns and recommended code approaches.
+
  **The agent should:**
  1. Read the manual test case files for this area
  2. Check existing Page Object infrastructure for this area
@@ -481,6 +484,7 @@ Before invoking the agent, identify the test cases for the current area:
  5. For each test case marked \`automated: true\`:
  - Create automated Playwright test in ./tests/specs/
  - Update the manual test case file to reference the automated test path
+ - Apply ALL knowledge base patterns listed above (timing, selectors, assertions)
  6. Run and iterate on each test until it passes or fails with a product bug
  7. Update .env.testdata with any new variables
 
@@ -1442,7 +1446,9 @@ Extract the following from arguments:
  "read-knowledge-base",
  // Step 5: Test Execution Strategy (library)
  "read-test-strategy",
- // Step 6: Identify Tests (inline - task-specific)
+ // Step 6: Clarification Protocol (library)
+ "clarification-protocol",
+ // Step 7: Identify Tests (inline - task-specific)
  {
  inline: true,
  title: "Identify Automated Tests to Run",
@@ -1660,7 +1666,9 @@ Store the detected trigger for use in output routing:
  - Set variable: \`TRIGGER_SOURCE\` = [GITHUB_PR | SLACK_MESSAGE | CI_CD | MANUAL]
  - This determines output formatting and delivery channel`
  },
- // Step 6: Extract Context (inline)
+ // Step 6: Clarification Protocol (library)
+ "clarification-protocol",
+ // Step 7: Extract Context (inline)
  {
  inline: true,
  title: "Extract Context Based on Trigger",
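Note on the two hunks above: both task definitions mix string entries (library steps such as "clarification-protocol") with inline step objects. The package's actual resolver is not part of this diff, so the following is only a minimal sketch of how such a mixed list could be expanded. `STEP_LIBRARY`, its titles, and `resolveSteps` are hypothetical names introduced for illustration, and the step list is shortened.

```javascript
// Hypothetical step library; the real entries are not shown in this diff.
const STEP_LIBRARY = {
  "read-knowledge-base": { title: "Read Knowledge Base" },
  "read-test-strategy": { title: "Read Test Strategy" },
  "clarification-protocol": { title: "Clarification Protocol" },
};

// Expand a mixed list of library ids (strings) and inline step objects
// into a uniform, numbered list. Unknown library ids fail loudly.
function resolveSteps(steps, library) {
  return steps.map((step, index) => {
    if (typeof step === "string") {
      const entry = library[step];
      if (!entry) throw new Error(`Unknown library step: ${step}`);
      return { number: index + 1, source: "library", id: step, title: entry.title };
    }
    return { number: index + 1, source: "inline", title: step.title };
  });
}

// Shortened list: the new library step sits directly before the inline step.
const resolved = resolveSteps(
  [
    "read-knowledge-base",
    "read-test-strategy",
    "clarification-protocol",
    { inline: true, title: "Extract Context Based on Trigger" },
  ],
  STEP_LIBRARY
);

for (const step of resolved) {
  console.log(`${step.number}. [${step.source}] ${step.title}`);
}
```

Because numbering is derived from position in this sketch, inserting a library step shifts every later step by one, which matches the Step 6 to Step 7 renumbering in the comments above.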
@@ -6144,6 +6152,8 @@ Before proceeding, read the curated knowledge base to inform your work:
  - Build on existing understanding
  - Maintain consistency with established practices
 
+ 3. **Relay to subagents**: Subagents do NOT read the knowledge base directly. When delegating work, you MUST include relevant KB patterns in your delegation message \u2014 especially testing patterns (timing, selectors, assertion approaches) that affect test reliability.
+
  **Note:** The knowledge base may not exist yet or may be empty. If it doesn't exist or is empty, proceed without this context and help build it as you work.`,
  tags: ["setup", "context"]
  };
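Note on the relay requirement: the prompts now tell the orchestrating agent to copy knowledge base patterns into each delegation message, since subagents cannot read the KB themselves. A rough sketch of what assembling such a message might look like follows. `buildDelegationMessage`, the pattern object shape, and the two sample patterns are invented for illustration and do not come from the package or from any real knowledge base.

```javascript
// Hypothetical helper: builds a delegation message that carries KB patterns
// to a subagent that cannot read the knowledge base directly.
function buildDelegationMessage(area, kbPatterns) {
  const lines = [
    `Area: ${area}`,
    "",
    "**Knowledge Base Patterns (MUST APPLY):**",
  ];
  if (kbPatterns.length === 0) {
    // State the absence explicitly so the subagent does not guess.
    lines.push("- (none documented for this area)");
  } else {
    for (const pattern of kbPatterns) {
      lines.push(`- [${pattern.kind}] ${pattern.text}`);
    }
  }
  return lines.join("\n");
}

// Sample patterns are made up; real ones would be copied from the KB.
const message = buildDelegationMessage("login", [
  { kind: "timing", text: "Wait for the modal animation to finish before asserting visibility" },
  { kind: "selectors", text: "Prefer role-based locators over CSS class selectors" },
]);
console.log(message);
```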
@@ -6282,6 +6292,16 @@ Determine exploration depth based on requirement quality:
  - **Vague:** "Fix the sorting in todo list page. The items are mixed up for premium users."
  - **Unclear:** "Improve the dashboard performance. Users say it's slow."
 
+ ### Maturity Adjustment
+
+ If the Clarification Protocol determined project maturity, adjust exploration depth:
+
+ - **New project**: Default one level deeper than requirement clarity suggests (Clear \u2192 Moderate, Vague \u2192 Deep)
+ - **Growing project**: Use requirement clarity as-is (standard protocol)
+ - **Mature project**: Trust knowledge base \u2014 can stay at suggested depth or go one level shallower if KB covers the feature
+
+ **Always verify features exist before testing them.** If exploration reveals that a referenced page or feature does not exist in the application, this is CRITICAL severity \u2014 escalate via the Clarification Protocol regardless of maturity level. Do NOT silently adapt or work around the missing feature.
+
  ### Quick Exploration (1-2 min)
 
  **When:** Requirements CLEAR
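Note on the Maturity Adjustment rules in this hunk: they amount to shifting the exploration depth by one level. The sketch below assumes three depth levels (quick, moderate, deep) and a default mapping of Clear to quick, Vague to moderate, and Unclear to deep. Only the first two defaults can be inferred from the examples in the added text; the third, and the function and parameter names, are assumptions.

```javascript
const DEPTHS = ["quick", "moderate", "deep"];

// Default depth index per requirement clarity (the "unclear" entry is assumed).
const BASE_DEPTH = { clear: 0, vague: 1, unclear: 2 };

// Shift the default depth by project maturity, clamped to the valid range.
function explorationDepth(clarity, maturity, kbCoversFeature = false) {
  let level = BASE_DEPTH[clarity];
  if (level === undefined) throw new Error(`Unknown clarity: ${clarity}`);

  if (maturity === "new") {
    level += 1; // one level deeper than clarity suggests
  } else if (maturity === "mature" && kbCoversFeature) {
    level -= 1; // optionally one level shallower when the KB covers the feature
  }
  // "growing" uses requirement clarity as-is.

  level = Math.max(0, Math.min(DEPTHS.length - 1, level));
  return DEPTHS[level];
}

console.log(explorationDepth("clear", "new"));          // moderate
console.log(explorationDepth("vague", "new"));          // deep
console.log(explorationDepth("vague", "mature", true)); // quick
```

The missing-feature rule is deliberately left out of this function: per the text, it escalates through the Clarification Protocol regardless of depth or maturity.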
@@ -6513,6 +6533,33 @@ Before starting, check if this task is resuming from a blocked clarification:
 
  3. **If no clarification in $ARGUMENTS:** Proceed normally with ambiguity detection below.
 
+ ### Assess Project Maturity
+
+ Before detecting ambiguity, assess how well you know this project. Maturity determines how aggressively you should ask questions \u2014 new projects require more questions, mature projects can rely on accumulated knowledge.
+
+ **Measure maturity from runtime artifacts:**
+
+ | Signal | New | Growing | Mature |
+ |--------|-----|---------|--------|
+ | \`knowledge-base.md\` | < 80 lines (template) | 80-300 lines | 300+ lines |
+ | \`memory/\` files | 0 files | 1-3 files | 4+ files, >5KB each |
+ | Test cases in \`test-cases/\` | 0 | 1-6 | 7+ |
+ | Exploration reports | 0 | 1 | 2+ |
+
+ **Steps:**
+ 1. Read \`.bugzy/runtime/knowledge-base.md\` and count lines
+ 2. List \`.bugzy/runtime/memory/\` directory and count files
+ 3. List \`test-cases/\` directory and count \`.md\` files (exclude README)
+ 4. Count exploration reports in \`exploration-reports/\`
+ 5. Classify: If majority of signals = New \u2192 **New**; majority Mature \u2192 **Mature**; otherwise \u2192 **Growing**
+
+ **Maturity adjusts your question threshold:**
+ - **New**: Ask for CRITICAL + HIGH + MEDIUM severity (gather information aggressively)
+ - **Growing**: Ask for CRITICAL + HIGH severity (standard protocol)
+ - **Mature**: Ask for CRITICAL only (handle HIGH with documented assumptions)
+
+ **CRITICAL severity ALWAYS triggers a question, regardless of maturity level.**
+
  ### Detect Ambiguity
 
  Scan for ambiguity signals:
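Note on the Assess Project Maturity section added in this hunk: the signal table and the majority rule can be read as a small pure function over four counts. The sketch below is one possible reading, not the package's implementation (the prompt asks an agent to do this by inspecting files). The input field names are hypothetical. The prompt leaves three points open, so each is an assumption here: exactly 300 lines counts as Mature, 4+ memory files that are not all above 5KB count as Growing, and "majority" means strictly more than half of the four signals.

```javascript
// Classify each signal from the table, then apply the majority rule.
// Input shape is hypothetical; sizes are in bytes.
function classifyMaturity({ kbLines, memoryFileSizes, testCaseCount, explorationReportCount }) {
  const fiveKb = 5 * 1024;
  const signals = [
    // knowledge-base.md line count
    kbLines < 80 ? "new" : kbLines < 300 ? "growing" : "mature",
    // memory/ files: Mature needs 4+ files, each above 5KB
    memoryFileSizes.length === 0
      ? "new"
      : memoryFileSizes.length >= 4 && memoryFileSizes.every((bytes) => bytes > fiveKb)
        ? "mature"
        : "growing",
    // test cases in test-cases/
    testCaseCount === 0 ? "new" : testCaseCount <= 6 ? "growing" : "mature",
    // exploration reports
    explorationReportCount === 0 ? "new" : explorationReportCount === 1 ? "growing" : "mature",
  ];

  const count = (label) => signals.filter((signal) => signal === label).length;
  if (count("new") > signals.length / 2) return "new";
  if (count("mature") > signals.length / 2) return "mature";
  return "growing";
}

console.log(
  classifyMaturity({ kbLines: 40, memoryFileSizes: [], testCaseCount: 0, explorationReportCount: 0 })
); // new
```

With four signals, a 2-2 split between New and anything else falls through to Growing under the strict-majority reading.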
@@ -6539,8 +6586,8 @@ If ambiguity is detected, assess its severity:
 
  | Severity | Characteristics | Examples | Action |
  |----------|----------------|----------|--------|
- | **CRITICAL** | Expected behavior undefined/contradictory; test outcome unpredictable; core functionality unclear; success criteria missing; multiple interpretations = different strategies | "Fix the issue" (what issue?), "Improve performance" (which metrics?), "Fix sorting in todo list" (by date? priority? completion status?) | **STOP** - Seek clarification before proceeding |
- | **HIGH** | Core underspecified but direction clear; affects majority of scenarios; vague success criteria; assumptions risky | "Fix ordering" (sequence OR visibility?), "Add validation" (what? messages?), "Update dashboard" (which widgets?) | **STOP** - Seek clarification before proceeding |
+ | **CRITICAL** | Expected behavior undefined/contradictory; test outcome unpredictable; core functionality unclear; success criteria missing; multiple interpretations = different strategies; **referenced page/feature does not exist in the application** | "Fix the issue" (what issue?), "Improve performance" (which metrics?), "Fix sorting in todo list" (by date? priority? completion status?), "Test the Settings page" (no Settings page exists), "Verify the checkout flow" (no checkout page found) | **STOP** - You MUST ask via team-communicator before proceeding |
+ | **HIGH** | Core underspecified but direction clear; affects majority of scenarios; vague success criteria; assumptions risky | "Fix ordering" (sequence OR visibility?), "Add validation" (what? messages?), "Update dashboard" (which widgets?) | **STOP** - You MUST ask via team-communicator before proceeding |
  | **MEDIUM** | Specific details missing; general requirements clear; affects subset of cases; reasonable low-risk assumptions possible; wrong assumption = test updates not strategy overhaul | Missing field labels, unclear error message text, undefined timeouts, button placement not specified, date formats unclear | **PROCEED** - (1) Moderate exploration, (2) Document assumptions: "Assuming X because Y", (3) Proceed with creation/execution, (4) Async clarification (team-communicator), (5) Mark [ASSUMED: description] |
  | **LOW** | Minor edge cases; documentation gaps don't affect execution; optional/cosmetic elements; minimal impact | Tooltip text, optional field validation, icon choice, placeholder text, tab order | **PROCEED** - (1) Mark [TO BE CLARIFIED: description], (2) Proceed, (3) Mention in report "Minor Details", (4) No blocking/async clarification |
 
@@ -6641,18 +6688,26 @@ Tasks waiting for clarification responses.
 
  ### Wait or Proceed Based on Severity
 
- **CRITICAL/HIGH \u2192 STOP and Wait:**
- - Do NOT create tests, run tests, or make assumptions
- - Wait for clarification, resume after answer
+ **Use your maturity assessment to adjust thresholds:**
+ - **New project**: STOP for CRITICAL + HIGH + MEDIUM
+ - **Growing project**: STOP for CRITICAL + HIGH (default)
+ - **Mature project**: STOP for CRITICAL only; handle HIGH with documented assumptions
+
+ **When severity meets your STOP threshold:**
+ - You MUST call team-communicator (Slack) to ask the question \u2014 do NOT just mention it in your text output
+ - Do NOT create tests, run tests, or make assumptions about the unclear aspect
+ - Do NOT silently adapt by working around the issue (e.g., running other tests instead)
+ - Do NOT invent your own success criteria when none are provided
+ - Register the blocked task and wait for clarification
  - *Rationale: Wrong assumptions = incorrect tests, false results, wasted time*
 
- **MEDIUM \u2192 Proceed with Documented Assumptions:**
+ **When severity is below your STOP threshold \u2192 Proceed with Documented Assumptions:**
  - Perform moderate exploration, document assumptions, proceed with creation/execution
  - Ask clarification async (team-communicator), mark results "based on assumptions"
  - Update tests after clarification received
  - *Rationale: Waiting blocks progress; documented assumptions allow forward movement with later corrections*
 
- **LOW \u2192 Proceed and Mark:**
+ **LOW \u2192 Always Proceed and Mark:**
  - Proceed with creation/execution, mark gaps [TO BE CLARIFIED] or [ASSUMED]
  - Mention in report but don't prioritize, no blocking
  - *Rationale: Details don't affect strategy/results significantly*
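Note on the threshold rules in this hunk: the wait-or-proceed decision now depends on both severity and maturity. A compact way to express that, assuming lowercase maturity labels and the hypothetical action names below (none of these identifiers appear in the package):

```javascript
// Severities that block work, per maturity level (taken from the hunk above).
const STOP_SEVERITIES = {
  new: ["CRITICAL", "HIGH", "MEDIUM"],
  growing: ["CRITICAL", "HIGH"],
  mature: ["CRITICAL"],
};

// Decide what to do with a detected ambiguity. Action names are illustrative.
function nextAction(severity, maturity) {
  const stopFor = STOP_SEVERITIES[maturity];
  if (!stopFor) throw new Error(`Unknown maturity: ${maturity}`);

  if (stopFor.includes(severity)) return "STOP_AND_ASK"; // ask via team-communicator, then wait
  if (severity === "LOW") return "PROCEED_AND_MARK";     // LOW never blocks
  return "PROCEED_WITH_ASSUMPTIONS";                     // document assumptions, clarify async
}

console.log(nextAction("HIGH", "growing"));  // STOP_AND_ASK
console.log(nextAction("HIGH", "mature"));   // PROCEED_WITH_ASSUMPTIONS
console.log(nextAction("CRITICAL", "mature")); // STOP_AND_ASK
```

Since CRITICAL appears in every row of the table, the "CRITICAL always triggers a question" rule holds without a special case.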
@@ -6680,11 +6735,12 @@ When reporting test results, always include an "Ambiguities" section if clarific
 
  ## Remember
 
- - **Block for CRITICAL/HIGH** - Never proceed with assumptions on unclear core requirements
+ - **STOP means STOP** - When you hit a STOP threshold, you MUST call team-communicator to ask via Slack. Do NOT silently adapt, skip, or work around the issue
+ - **Non-existent features = CRITICAL** - If a page, component, or feature referenced in the task does not exist, this is always CRITICAL severity \u2014 ask what was meant
  - **Ask correctly > guess poorly** - Specific questions lead to specific answers
- - **Document MEDIUM assumptions** - Track what you assumed and why
+ - **Never invent success criteria** - If the task says "improve" or "fix" without metrics, ask what "done" looks like
  - **Check memory first** - Avoid re-asking previously answered questions
- - **Specific questions \u2192 specific answers** - Vague questions get vague answers`,
+ - **Maturity adjusts threshold, not judgment** - Even in mature projects, CRITICAL always triggers a question`,
  tags: ["clarification", "protocol", "ambiguity"]
  };