npm - agentic-qe - Versions diffs - 3.7.15 → 3.7.17 - Mend

agentic-qe 3.7.15 → 3.7.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (301)

package/.claude/helpers/v3/quality-criteria/evidence-classification.md CHANGED

@@ -1,116 +1,116 @@
-# Evidence Classification Guide
-Guidelines for classifying evidence in Quality Criteria recommendations.
-## Evidence Types
-### Direct Evidence
-**Definition:** Actual code quote, explicit documentation statement, or measurable fact from source.
-**Requirements:**
-- Must include `file_path:line_range` reference (e.g., `src/auth/login.ts:45-52`)
-- Line ranges should be narrow (max 10-15 lines)
-- Must quote or directly reference the source
-**Examples:**
-```
-Source: src/payment/processor.ts:123-128
-Type: Direct
-Finding: No input validation before API call
-Reasoning: Unvalidated input could enable injection attacks
-```
-### Inferred Evidence
-**Definition:** Logical deduction from observed patterns, architectural implications, or domain knowledge.
-**Requirements:**
-- Must show reasoning chain
-- Can use architectural implications
-- Should reference what was observed
-**Examples:**
-```
-Source: Architecture review of src/api/
-Type: Inferred
-Finding: No rate limiting middleware detected
-Reasoning: API endpoints could be vulnerable to DoS; need to verify with load testing
-```
-### Claimed Evidence
-**Definition:** Statement that requires verification - based on assumptions or incomplete data.
-**Requirements:**
-- Must state "requires verification" or "needs inspection to confirm"
-- Must NOT speculate about what "could" or "might" happen
-- Used when source is unavailable or claim needs validation
-**Examples:**
-```
-WRONG: "Could range from efficient to aggressive implementation"
-RIGHT: "Poll interval not specified - requires code inspection to verify"
-```
-## Evidence Table Format
-```html
-<table class="evidence-table">
-  <thead>
-    <tr>
-      <th>Source Reference</th>
-      <th>Type</th>
-      <th>Quality Implication</th>
-      <th>Reasoning</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td><code>src/auth/session.ts:89-94</code></td>
-      <td><span class="evidence-type direct">Direct</span></td>
-      <td>Session tokens stored without encryption</td>
-      <td class="evidence-reasoning">Credential exposure risk if storage is compromised</td>
-    </tr>
-  </tbody>
-</table>
-```
-## Source Reference Format
-### For Specific Code
-```
-file_path:start_line-end_line
-Example: src/agents/FleetCommanderAgent.ts:847-852
-```
-### For File-Level Metrics
-```
-file_path (metric)
-Example: src/agents/N8nBaseAgent.ts (683 LOC)
-```
-### For Search Results (No Matches)
-```
-N/A (verified via Glob/Grep search)
-- NOT: tests/**/n8n/**/*.test.ts (glob pattern)
-```
-## Reasoning Column Guidelines
-The Reasoning column must explain **WHY** something matters, not **WHAT** the code does.
-| WRONG (describes WHAT) | CORRECT (explains WHY) |
-|------------------------|------------------------|
-| "Retry logic with exponential backoff" | "Retry pattern handles transient failures; needs edge case testing for timeout exhaustion" |
-| "Session cookie stored in memory" | "Credential in memory could leak if agent state is serialized to logs" |
-| "getWorkflow supports forceRefresh flag" | "Cache bypass prevents stale data; but increases load on source system" |
-**Formula:**
-```
-{What the code does} → {Why that matters for quality} → {What could go wrong}
-```
-## Prohibited Patterns
-- **No confidence percentages**: Use evidence types instead of "85% confident"
-- **No vague blast radius**: Use "affects 19 agents" not "affects many"
-- **No speculation in Claimed**: Use "requires verification" not "could be X or Y"
-- **No keyword matching claims**: Show semantic reasoning, not keyword counts
+# Evidence Classification Guide
+Guidelines for classifying evidence in Quality Criteria recommendations.
+## Evidence Types
+### Direct Evidence
+**Definition:** Actual code quote, explicit documentation statement, or measurable fact from source.
+**Requirements:**
+- Must include `file_path:line_range` reference (e.g., `src/auth/login.ts:45-52`)
+- Line ranges should be narrow (max 10-15 lines)
+- Must quote or directly reference the source
+**Examples:**
+```
+Source: src/payment/processor.ts:123-128
+Type: Direct
+Finding: No input validation before API call
+Reasoning: Unvalidated input could enable injection attacks
+```
+### Inferred Evidence
+**Definition:** Logical deduction from observed patterns, architectural implications, or domain knowledge.
+**Requirements:**
+- Must show reasoning chain
+- Can use architectural implications
+- Should reference what was observed
+**Examples:**
+```
+Source: Architecture review of src/api/
+Type: Inferred
+Finding: No rate limiting middleware detected
+Reasoning: API endpoints could be vulnerable to DoS; need to verify with load testing
+```
+### Claimed Evidence
+**Definition:** Statement that requires verification - based on assumptions or incomplete data.
+**Requirements:**
+- Must state "requires verification" or "needs inspection to confirm"
+- Must NOT speculate about what "could" or "might" happen
+- Used when source is unavailable or claim needs validation
+**Examples:**
+```
+WRONG: "Could range from efficient to aggressive implementation"
+RIGHT: "Poll interval not specified - requires code inspection to verify"
+```
+## Evidence Table Format
+```html
+<table class="evidence-table">
+  <thead>
+    <tr>
+      <th>Source Reference</th>
+      <th>Type</th>
+      <th>Quality Implication</th>
+      <th>Reasoning</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><code>src/auth/session.ts:89-94</code></td>
+      <td><span class="evidence-type direct">Direct</span></td>
+      <td>Session tokens stored without encryption</td>
+      <td class="evidence-reasoning">Credential exposure risk if storage is compromised</td>
+    </tr>
+  </tbody>
+</table>
+```
+## Source Reference Format
+### For Specific Code
+```
+file_path:start_line-end_line
+Example: src/agents/FleetCommanderAgent.ts:847-852
+```
+### For File-Level Metrics
+```
+file_path (metric)
+Example: src/agents/N8nBaseAgent.ts (683 LOC)
+```
+### For Search Results (No Matches)
+```
+N/A (verified via Glob/Grep search)
+- NOT: tests/**/n8n/**/*.test.ts (glob pattern)
+```
+## Reasoning Column Guidelines
+The Reasoning column must explain **WHY** something matters, not **WHAT** the code does.
+| WRONG (describes WHAT) | CORRECT (explains WHY) |
+|------------------------|------------------------|
+| "Retry logic with exponential backoff" | "Retry pattern handles transient failures; needs edge case testing for timeout exhaustion" |
+| "Session cookie stored in memory" | "Credential in memory could leak if agent state is serialized to logs" |
+| "getWorkflow supports forceRefresh flag" | "Cache bypass prevents stale data; but increases load on source system" |
+**Formula:**
+```
+{What the code does} → {Why that matters for quality} → {What could go wrong}
+```
+## Prohibited Patterns
+- **No confidence percentages**: Use evidence types instead of "85% confident"
+- **No vague blast radius**: Use "affects 19 agents" not "affects many"
+- **No speculation in Claimed**: Use "requires verification" not "could be X or Y"
+- **No keyword matching claims**: Show semantic reasoning, not keyword counts
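
The Direct/Inferred/Claimed requirements in this guide are mechanical enough to pre-check automatically. The TypeScript below is a minimal sketch, not agentic-qe's implementation. The `Evidence` shape, `validateEvidence`, and the phrase-matching regexes are assumptions chosen to mirror the guide's own examples.

```typescript
// Hypothetical record shape mirroring the Source / Type / Finding / Reasoning
// fields in the guide's examples; this is not an agentic-qe API.
type EvidenceType = "Direct" | "Inferred" | "Claimed";

interface Evidence {
  source: string;
  type: EvidenceType;
  finding: string;
  reasoning: string;
}

// `file_path:start_line-end_line`, e.g. src/auth/login.ts:45-52
const LINE_RANGE_REF = /^(\S+):(\d+)-(\d+)$/;

// Upper bound taken from the guide's "max 10-15 lines" rule.
const MAX_RANGE_LINES = 15;

// Phrases the guide asks Claimed evidence to use (assumed matching rule).
const VERIFICATION_PHRASE = /requires (code inspection to )?verif|needs inspection to confirm/i;

// Speculative wording the guide forbids in Claimed evidence.
const SPECULATION = /\b(could|might)\b/i;

function validateEvidence(e: Evidence): string[] {
  const problems: string[] = [];

  if (e.type === "Direct") {
    const m = LINE_RANGE_REF.exec(e.source);
    if (!m) {
      problems.push("Direct evidence needs a file_path:start_line-end_line reference");
    } else {
      const start = Number(m[2]);
      const end = Number(m[3]);
      const span = end - start + 1; // inclusive range
      if (end < start) {
        problems.push("end line precedes start line");
      } else if (span > MAX_RANGE_LINES) {
        problems.push(`line range spans ${span} lines (max ${MAX_RANGE_LINES})`);
      }
    }
  }

  if (e.type === "Inferred" && e.reasoning.trim() === "") {
    problems.push("Inferred evidence must show its reasoning chain");
  }

  if (e.type === "Claimed") {
    if (!VERIFICATION_PHRASE.test(e.finding)) {
      problems.push('Claimed evidence must say it "requires verification"');
    }
    if (SPECULATION.test(e.finding)) {
      problems.push("Claimed evidence must not speculate (could/might)");
    }
  }

  return problems;
}
```

Line ranges are treated as inclusive, so `src/payment/processor.ts:123-128` counts as 6 lines and passes, while a range such as `:10-40` (31 lines) is flagged.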

package/.claude/helpers/v3/quality-criteria/htsm-categories.md CHANGED

@@ -1,139 +1,139 @@
-# HTSM v6.3 Quality Criteria Categories
-James Bach's Heuristic Test Strategy Model (HTSM) v6.3 Quality Criteria framework.
-## 1. Capability
-**Can it perform the required functions?**
-| Subcategory | Focus |
-|-------------|-------|
-| Sufficiency | Does it do what it's supposed to? |
-| Correctness | Does it do it correctly? |
-**Priority Indicators:**
-- P0: Core business functionality
-- P1: Important features
-- P2: Secondary features
-- P3: Nice-to-have features
-## 2. Reliability
-**Will it work well and resist failure?**
-| Subcategory | Focus |
-|-------------|-------|
-| Robustness | Can it handle adverse conditions? |
-| Error Handling | Does it handle errors gracefully? |
-| Data Integrity | Is data protected from corruption? |
-| Safety | Does it avoid dangerous behaviors? |
-**Cannot be omitted** - All systems can fail.
-## 3. Usability
-**How easy is it for real users?**
-| Subcategory | Focus |
-|-------------|-------|
-| Learnability | How quickly can users learn? |
-| Operability | How easy to operate day-to-day? |
-| Accessibility | Can users with disabilities use it? |
-## 4. Charisma
-**How appealing is the product?**
-| Subcategory | Focus |
-|-------------|-------|
-| Aesthetics | Is it visually pleasing? |
-| Uniqueness | Does it stand out? |
-| Entrancement | Does it engage users? |
-| Image | Does it project the right brand? |
-**Note:** "Brand guidelines handled separately" is NOT a valid omission reason. Charisma is about UX testing, not brand documentation.
-## 5. Security
-**How well protected against unauthorized use?**
-| Subcategory | Focus |
-|-------------|-------|
-| Authentication | Who is using it? |
-| Authorization | What are they allowed to do? |
-| Privacy | Is personal data protected? |
-| Security Holes | Are there vulnerabilities? |
-**Cannot be omitted** - Every system has attack surface.
-## 6. Scalability
-**How well does deployment scale?**
-| Subcategory | Focus |
-|-------------|-------|
-| Load Handling | Behavior under increased demand |
-| Resource Efficiency | Resource usage at scale |
-## 7. Compatibility
-**Works with external components?**
-| Subcategory | Focus |
-|-------------|-------|
-| Application | Works with other applications? |
-| OS | Works with target operating systems? |
-| Hardware | Works with target hardware? |
-| Backward | Works with previous versions? |
-| Product Footprint | Resource requirements acceptable? |
-## 8. Performance
-**How speedy and responsive?**
-| Subcategory | Focus |
-|-------------|-------|
-| Response Time | Under various conditions |
-| Throughput | Data processing capacity |
-| Efficiency | Resource utilization |
-**Cannot be omitted** - Every system has response time.
-## 9. Installability
-**How easily installed?**
-| Subcategory | Focus |
-|-------------|-------|
-| System Requirements | Clear and achievable? |
-| Configuration | Easy to configure? |
-| Uninstallation | Clean removal? |
-| Upgrades/Patches | Easy to update? |
-| Administration | Easy to administer? |
-**Valid omission:** Pure SaaS/browser-based with no client installation.
-## 10. Development
-**How well can we create/test/modify?**
-| Subcategory | Focus |
-|-------------|-------|
-| Supportability | Easy to support? |
-| Testability | Easy to test? |
-| Maintainability | Easy to maintain? |
-| Portability | Easy to port? |
-| Localizability | Easy to localize? |
-**Cannot be omitted** - Always applies to software.
----
-## Priority Assignment Guide
-| Priority | Definition | Example |
-|----------|------------|---------|
-| **P0 (Critical)** | Failure causes immediate business/user harm | Payment failures, data breaches |
-| **P1 (High)** | Critical to core user value proposition | Core features not working |
-| **P2 (Medium)** | Affects satisfaction but not blocking | Secondary features |
-| **P3 (Low)** | Nice-to-have improvements | Polish, edge case optimization |
-## Valid vs Invalid Omission Reasons
-| Category | Valid Omission | Invalid Omission |
-|----------|----------------|------------------|
-| Installability | "Pure SaaS, no client installation" | "Handled by ops team" |
-| Charisma | "CLI tool, visual design N/A" | "Brand guidelines separate" |
-| Compatibility | "Single-platform by contract" | "Will test on main browsers" |
-| Development | **NEVER** | "Team is experienced" |
-| Security | **NEVER** | "Internal system only" |
+# HTSM v6.3 Quality Criteria Categories
+James Bach's Heuristic Test Strategy Model (HTSM) v6.3 Quality Criteria framework.
+## 1. Capability
+**Can it perform the required functions?**
+| Subcategory | Focus |
+|-------------|-------|
+| Sufficiency | Does it do what it's supposed to? |
+| Correctness | Does it do it correctly? |
+**Priority Indicators:**
+- P0: Core business functionality
+- P1: Important features
+- P2: Secondary features
+- P3: Nice-to-have features
+## 2. Reliability
+**Will it work well and resist failure?**
+| Subcategory | Focus |
+|-------------|-------|
+| Robustness | Can it handle adverse conditions? |
+| Error Handling | Does it handle errors gracefully? |
+| Data Integrity | Is data protected from corruption? |
+| Safety | Does it avoid dangerous behaviors? |
+**Cannot be omitted** - All systems can fail.
+## 3. Usability
+**How easy is it for real users?**
+| Subcategory | Focus |
+|-------------|-------|
+| Learnability | How quickly can users learn? |
+| Operability | How easy to operate day-to-day? |
+| Accessibility | Can users with disabilities use it? |
+## 4. Charisma
+**How appealing is the product?**
+| Subcategory | Focus |
+|-------------|-------|
+| Aesthetics | Is it visually pleasing? |
+| Uniqueness | Does it stand out? |
+| Entrancement | Does it engage users? |
+| Image | Does it project the right brand? |
+**Note:** "Brand guidelines handled separately" is NOT a valid omission reason. Charisma is about UX testing, not brand documentation.
+## 5. Security
+**How well protected against unauthorized use?**
+| Subcategory | Focus |
+|-------------|-------|
+| Authentication | Who is using it? |
+| Authorization | What are they allowed to do? |
+| Privacy | Is personal data protected? |
+| Security Holes | Are there vulnerabilities? |
+**Cannot be omitted** - Every system has attack surface.
+## 6. Scalability
+**How well does deployment scale?**
+| Subcategory | Focus |
+|-------------|-------|
+| Load Handling | Behavior under increased demand |
+| Resource Efficiency | Resource usage at scale |
+## 7. Compatibility
+**Works with external components?**
+| Subcategory | Focus |
+|-------------|-------|
+| Application | Works with other applications? |
+| OS | Works with target operating systems? |
+| Hardware | Works with target hardware? |
+| Backward | Works with previous versions? |
+| Product Footprint | Resource requirements acceptable? |
+## 8. Performance
+**How speedy and responsive?**
+| Subcategory | Focus |
+|-------------|-------|
+| Response Time | Under various conditions |
+| Throughput | Data processing capacity |
+| Efficiency | Resource utilization |
+**Cannot be omitted** - Every system has response time.
+## 9. Installability
+**How easily installed?**
+| Subcategory | Focus |
+|-------------|-------|
+| System Requirements | Clear and achievable? |
+| Configuration | Easy to configure? |
+| Uninstallation | Clean removal? |
+| Upgrades/Patches | Easy to update? |
+| Administration | Easy to administer? |
+**Valid omission:** Pure SaaS/browser-based with no client installation.
+## 10. Development
+**How well can we create/test/modify?**
+| Subcategory | Focus |
+|-------------|-------|
+| Supportability | Easy to support? |
+| Testability | Easy to test? |
+| Maintainability | Easy to maintain? |
+| Portability | Easy to port? |
+| Localizability | Easy to localize? |
+**Cannot be omitted** - Always applies to software.
+---
+## Priority Assignment Guide
+| Priority | Definition | Example |
+|----------|------------|---------|
+| **P0 (Critical)** | Failure causes immediate business/user harm | Payment failures, data breaches |
+| **P1 (High)** | Critical to core user value proposition | Core features not working |
+| **P2 (Medium)** | Affects satisfaction but not blocking | Secondary features |
+| **P3 (Low)** | Nice-to-have improvements | Polish, edge case optimization |
+## Valid vs Invalid Omission Reasons
+| Category | Valid Omission | Invalid Omission |
+|----------|----------------|------------------|
+| Installability | "Pure SaaS, no client installation" | "Handled by ops team" |
+| Charisma | "CLI tool, visual design N/A" | "Brand guidelines separate" |
+| Compatibility | "Single-platform by contract" | "Will test on main browsers" |
+| Development | **NEVER** | "Team is experienced" |
+| Security | **NEVER** | "Internal system only" |
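
The "Cannot be omitted" markers and the omission table lend themselves to a mechanical pre-check before a human reviews the reason. A minimal TypeScript sketch follows. The names (`HtsmCategory`, `reviewOmission`) are hypothetical and not part of agentic-qe, and matching invalid reasons by regex is an illustrative assumption.

```typescript
type HtsmCategory =
  | "Capability" | "Reliability" | "Usability" | "Charisma" | "Security"
  | "Scalability" | "Compatibility" | "Performance" | "Installability" | "Development";

// Categories this guide marks "Cannot be omitted".
const NEVER_OMIT: ReadonlySet<HtsmCategory> = new Set<HtsmCategory>([
  "Reliability", "Security", "Performance", "Development",
]);

// Invalid reasons taken from the "Valid vs Invalid Omission Reasons" table.
const KNOWN_INVALID: Partial<Record<HtsmCategory, RegExp>> = {
  Installability: /handled by ops team/i,
  Charisma: /brand guidelines/i,
  Compatibility: /will test on main browsers/i,
};

interface OmissionVerdict {
  allowed: boolean;
  note: string;
}

function reviewOmission(category: HtsmCategory, reason: string): OmissionVerdict {
  if (NEVER_OMIT.has(category)) {
    return { allowed: false, note: `${category} cannot be omitted` };
  }
  if (reason.trim() === "") {
    return { allowed: false, note: "an omission needs a stated reason" };
  }
  const invalid = KNOWN_INVALID[category];
  if (invalid !== undefined && invalid.test(reason)) {
    return { allowed: false, note: `"${reason}" is listed as an invalid reason for ${category}` };
  }
  // Passing here only means the reason is not a known-bad one.
  return { allowed: true, note: `${category} omitted: ${reason}` };
}
```

The table gives examples rather than an exhaustive rule set, so an `allowed: true` result still needs reviewer judgment.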

package/.claude/skills/README.md CHANGED

@@ -4,16 +4,16 @@ This directory contains Quality Engineering skills managed by Agentic QE.
 ## Summary
-- **Total QE Skills**: 75
-- **V2 Methodology Skills**: 60
+- **Total QE Skills**: 77
+- **V2 Methodology Skills**: 62
 - **V3 Domain Skills**: 15
-- **Platform Skills**: 35 (Claude Flow managed)
+- **Platform Skills**: 30 (Claude Flow managed)
 - **Validation Infrastructure**: ✅ Installed
 > **Note**: Platform skills (agentdb, github, flow-nexus, etc.) are managed by claude-flow.
 > Only QE-specific skills are installed/updated by `aqe init`.
-## V2 Methodology Skills (60)
+## V2 Methodology Skills (62)
 Version-agnostic quality engineering best practices from the QE community.
@@ -21,6 +21,7 @@ Version-agnostic quality engineering best practices from the QE community.
 - **accessibility-testing**: WCAG 2.2 compliance testing, screen reader validation, and inclusive design verification. Use when ensuring legal compliance (ADA, Section 508), testing for disabilities, or building accessible applications for 1 billion disabled users globally.
 - **agentic-quality-engineering**: AI agents as force multipliers for quality work. Core skill for all 19 QE agents using PACT principles.
 - **api-testing-patterns**: Comprehensive API testing patterns including contract testing, REST/GraphQL testing, and integration testing. Use when testing APIs or designing API test strategies.
+- **browser**: Web browser automation with AI-optimized snapshots for claude-flow agents
 - **brutal-honesty-review**: Unvarnished technical criticism combining Linus Torvalds
 - **bug-reporting-excellence**: Write high-quality bug reports that get fixed quickly. Use when reporting bugs, training teams on bug reporting, or establishing bug report standards.
 - **chaos-engineering-resilience**: Chaos engineering principles, controlled failure injection, resilience testing, and system recovery validation. Use when testing distributed systems, building confidence in fault tolerance, or validating disaster recovery.
@@ -52,6 +53,7 @@ Version-agnostic quality engineering best practices from the QE community.
 - **qcsd-cicd-swarm**: QCSD Verification phase swarm for CI/CD pipeline quality gates using regression analysis, flaky test detection, quality gate enforcement, and deployment readiness assessment. Consumes Development outputs (SHIP/CONDITIONAL/HOLD decisions, quality metrics) and produces signals for Production monitoring.
 - **qcsd-development-swarm**: QCSD Development phase swarm for in-sprint code quality assurance using TDD adherence, code complexity analysis, coverage gap detection, and defect prediction. Consumes Refinement outputs (BDD scenarios, SFDIPOT priorities) and produces signals for Verification.
 - **qcsd-ideation-swarm**: QCSD Ideation phase swarm for Quality Criteria sessions using HTSM v6.3, Risk Storming, and Testability analysis before development begins. Uses 5-tier browser cascade: Vibium → agent-browser → Playwright+Stealth → WebFetch → WebSearch-fallback.
+- **qcsd-production-swarm**: QCSD Production Telemetry phase swarm for post-release production health assessment using DORA metrics, root cause analysis, defect prediction, and cross-phase feedback loops. Consumes CI/CD outputs (RELEASE/REMEDIATE/BLOCK decisions, release readiness metrics) and produces feedback signals to Ideation and Refinement.
 - **qcsd-refinement-swarm**: QCSD Refinement phase swarm for Sprint Refinement sessions using SFDIPOT product factors, BDD scenario generation, and requirements validation.
 - **quality-metrics**: Measure quality effectively with actionable metrics. Use when establishing quality dashboards, defining KPIs, or evaluating test effectiveness.
 - **refactoring-patterns**: Apply safe refactoring patterns to improve code structure without changing behavior. Use when cleaning up code, reducing technical debt, or improving maintainability.
@@ -98,7 +100,7 @@ V3-specific implementation guides for the 12 DDD bounded contexts.
 - **qe-test-generation**: AI-powered test generation using pattern recognition, code analysis, and intelligent test synthesis for comprehensive test coverage.
 - **qe-visual-accessibility**: Visual regression testing, responsive design validation, and WCAG accessibility compliance testing.
-## Platform Skills (35)
+## Platform Skills (30)
 Claude Flow platform skills (managed separately).
@@ -107,7 +109,6 @@ Claude Flow platform skills (managed separately).
 - agentdb-memory-patterns
 - agentdb-optimization
 - agentdb-vector-search
-- agentic-jujutsu
 - flow-nexus-neural
 - flow-nexus-platform
 - flow-nexus-swarm
@@ -116,13 +117,9 @@ Claude Flow platform skills (managed separately).
 - github-project-management
 - github-release-management
 - github-workflow-automation
-- hive-mind-advanced
 - hooks-automation
-- iterative-loop
-- performance-analysis
 - reasoningbank-agentdb
 - reasoningbank-intelligence
-- release
 - skill-builder
 - sparc-methodology
 - stream-chain
@@ -151,4 +148,4 @@ See `.validation/README.md` for usage instructions.
 ---
-*Generated by AQE v3 init on 2026-02-16T09:20:23.221Z*
+*Generated by AQE v3 init on 2026-03-11T09:00:19.526Z*

package/.claude/skills/brutal-honesty-review/SKILL.md CHANGED

@@ -39,6 +39,9 @@ When brutal honesty is needed:
 - Level 3 (Brutal): "This is negligent. You're exposing user data because..."
 **DO NOT USE FOR:** Junior devs' first PRs, demoralized teams, public forums, low psychological safety
+## Minimum Findings Enforcement
+All brutal honesty reviews enforce a minimum of 3 weighted findings (CRITICAL=3, HIGH=2, MEDIUM=1, LOW=0.5). If the initial review finds fewer, escalate to deeper analysis. Brutally honest reviewers should ALWAYS find something -- if you can't, explain exactly why with evidence.
 </default_to_action>
 ## Quick Reference Card
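
The minimum-findings rule added in this hunk reduces to a weighted sum compared against a threshold. A minimal TypeScript sketch follows. It assumes that "a minimum of 3 weighted findings" means a total weight of at least 3, which is the reading the code-review-quality skill states explicitly. `weightedScore` and `needsEscalation` are hypothetical names.

```typescript
// Severity weights as stated in the brutal-honesty-review skill.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

const WEIGHTS: Record<Severity, number> = {
  CRITICAL: 3,
  HIGH: 2,
  MEDIUM: 1,
  LOW: 0.5,
};

// Assumed reading of "a minimum of 3 weighted findings": total weight >= 3.
const MIN_WEIGHTED_SCORE = 3;

function weightedScore(findings: readonly Severity[]): number {
  return findings.reduce((sum, s) => sum + WEIGHTS[s], 0);
}

// True when the review falls short and should escalate to deeper analysis.
function needsEscalation(findings: readonly Severity[]): boolean {
  return weightedScore(findings) < MIN_WEIGHTED_SCORE;
}
```

Under this reading, a single CRITICAL finding (3) satisfies the rule on its own. Two MEDIUM findings plus one LOW (2.5) trigger escalation.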

package/.claude/skills/code-review-quality/SKILL.md CHANGED

@@ -150,6 +150,9 @@ This validation logic appears in 3 places. A `validateEmail()` helper would redu
 - Is this doing too many things?
 - Is there duplication we could reduce?
+## Minimum Findings Enforcement
+Reviews must meet a minimum weighted finding score of 3.0 (CRITICAL=3, HIGH=2, MEDIUM=1, LOW=0.5, INFORMATIONAL=0.25). If the initial review falls short, run the qe-devils-advocate agent as a meta-reviewer to find additional observations. Every review should have at least 3 actionable observations.
 ---
 ## Agent-Assisted Reviews
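
The code-review-quality variant adds an INFORMATIONAL weight and names a concrete escalation path. The TypeScript sketch below runs the meta-review only when the first pass scores below 3.0. It assumes a synchronous `MetaReviewer` callback in place of the qe-devils-advocate agent, whose real interface is not shown in this diff.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "INFORMATIONAL";

interface Finding {
  severity: Severity;
  note: string;
}

// Weights and threshold as stated in the code-review-quality skill.
const WEIGHTS: Record<Severity, number> = {
  CRITICAL: 3, HIGH: 2, MEDIUM: 1, LOW: 0.5, INFORMATIONAL: 0.25,
};
const MIN_SCORE = 3.0;

const score = (findings: readonly Finding[]): number =>
  findings.reduce((sum, f) => sum + WEIGHTS[f.severity], 0);

// Hypothetical stand-in for the qe-devils-advocate meta-review; signature assumed.
type MetaReviewer = (current: readonly Finding[]) => Finding[];

function enforceMinimum(initial: Finding[], metaReview: MetaReviewer) {
  let findings = initial;
  if (score(findings) < MIN_SCORE) {
    // Only run the meta-review when the first pass falls short.
    findings = [...findings, ...metaReview(findings)];
  }
  const total = score(findings);
  return { findings, score: total, metMinimum: total >= MIN_SCORE };
}
```

Four INFORMATIONAL findings score only 1.0. A review made up of nits therefore still falls short unless the meta-review surfaces something heavier.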