npm - agentic-qe - Versions diffs - 3.8.2 → 3.8.3 - Mend

agentic-qe 3.8.2 → 3.8.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (217)

package/.claude/skills/test-automation-strategy/SKILL.md CHANGED

@@ -22,23 +22,16 @@ validation:
 <default_to_action>
 When designing or improving test automation:
-1. FOLLOW test pyramid: 70% unit, 20% integration, 10% E2E
-2. APPLY F.I.R.S.T. principles: Fast, Isolated, Repeatable, Self-validating, Timely
-3. USE patterns: Page Object Model, Builder pattern, Factory pattern
-4. INTEGRATE in CI/CD: Every commit runs tests, fail fast, clear feedback
-5. MANAGE flaky tests: Quarantine, fix, or delete - never ignore
+1. DETECT anti-patterns: Ice cream cone? Slow suite? Flaky tests?
+2. USE patterns: Page Object Model, Builder pattern, Factory pattern
+3. INTEGRATE in CI/CD: Every commit runs tests, fail fast
+4. MANAGE flaky tests: Quarantine, fix, or delete - never ignore
 **Quick Anti-Pattern Detection:**
 - Ice cream cone (many E2E, few unit) → Invert to pyramid
 - Slow tests (> 10 min suite) → Parallelize, mock external deps
 - Flaky tests → Fix timing, isolate data, or quarantine
-- Test duplication → Share fixtures, use page objects
 - Brittle selectors → Use data-testid, semantic locators
-**Critical Success Factors:**
-- Fast feedback is the goal (< 10 min full suite)
-- Automation supports testing, doesn't replace judgment
-- Invest in test infrastructure like production code
 </default_to_action>
 ## Quick Reference Card
@@ -49,23 +42,7 @@ When designing or improving test automation:
 - Reducing flaky test burden
 - Optimizing CI/CD pipeline speed
-### Test Pyramid
-| Layer | % | Speed | Isolation | Examples |
-|-------|---|-------|-----------|----------|
-| **Unit** | 70% | < 1ms | Complete | Pure functions, logic |
-| **Integration** | 20% | < 1s | Partial | API, database |
-| **E2E** | 10% | < 30s | None | User journeys |
-### F.I.R.S.T. Principles
-| Principle | Meaning | How |
-|-----------|---------|-----|
-| **F**ast | Quick execution | Mock external deps |
-| **I**solated | No shared state | Fresh fixtures per test |
-| **R**epeatable | Same result every time | No random data |
-| **S**elf-validating | Clear pass/fail | Assert, don't print |
-| **T**imely | Written with code | TDD, not after |
-### Anti-Patterns
+### Anti-Patterns to Detect
 | Problem | Symptom | Fix |
 |---------|---------|-----|
 | Ice cream cone | 80% E2E, 10% unit | Invert pyramid |
@@ -76,40 +53,6 @@ When designing or improving test automation:
 ---
-## Page Object Model
-```javascript
-// pages/LoginPage.js
-class LoginPage {
-  constructor(page) {
-    this.page = page;
-    this.emailInput = '[data-testid="email"]';
-    this.passwordInput = '[data-testid="password"]';
-    this.submitButton = '[data-testid="submit"]';
-    this.errorMessage = '[data-testid="error"]';
-  }
-  async login(email, password) {
-    await this.page.fill(this.emailInput, email);
-    await this.page.fill(this.passwordInput, password);
-    await this.page.click(this.submitButton);
-  }
-  async getError() {
-    return this.page.textContent(this.errorMessage);
-  }
-}
-// Test uses page object
-test('shows error for invalid credentials', async ({ page }) => {
-  const loginPage = new LoginPage(page);
-  await loginPage.login('bad@email.com', 'wrong');
-  expect(await loginPage.getError()).toBe('Invalid credentials');
-});
-```
----
 ## CI/CD Integration
 ```yaml
@@ -230,6 +173,12 @@ const automationFleet = await FleetManager.coordinate({
 ## Remember
-**Pyramid: 70% unit, 20% integration, 10% E2E.** F.I.R.S.T. principles for every test. Page Object Model for E2E. Parallelize for speed. Quarantine flaky tests - never ignore them. Treat test code like production code.
 **With Agents:** Agents generate pyramid-compliant tests, detect flaky patterns, optimize execution time, and maintain test infrastructure. Use agents to scale automation quality.
+## Gotchas
+- Agent generates 80% E2E tests and 20% unit tests (inverted pyramid) — explicitly enforce 70/20/10 ratio
+- Page Object Model tests become brittle when selectors change — prefer data-testid attributes over CSS selectors
+- Flaky tests quarantined but never fixed is technical debt — set a 2-week SLA to fix or delete
+- Agent treats test code as second-class — test code needs the same review standards as production code
+- Parallel test execution requires test isolation — shared state between tests causes non-deterministic failures
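The first gotcha in the added list asks for the 70/20/10 ratio to be enforced explicitly. A minimal sketch of such a check follows, assuming test counts per layer are already known. The function name `pyramidReport`, the `counts` shape, and the 0.1 tolerance are illustrative assumptions and are not part of agentic-qe.

```javascript
// Hypothetical helper: none of these names come from agentic-qe.
// Target shares taken from the 70/20/10 ratio named in the gotcha above.
const TARGET = { unit: 0.7, integration: 0.2, e2e: 0.1 };

function pyramidReport(counts, tolerance = 0.1) {
  const total = counts.unit + counts.integration + counts.e2e;
  if (total === 0) {
    throw new Error('no tests counted');
  }
  const shares = {
    unit: counts.unit / total,
    integration: counts.integration / total,
    e2e: counts.e2e / total,
  };
  // "Ice cream cone": more E2E tests than unit tests.
  const iceCreamCone = shares.e2e > shares.unit;
  // Layers whose share drifts from the target by more than the tolerance.
  const offTarget = Object.keys(TARGET).filter(
    (layer) => Math.abs(shares[layer] - TARGET[layer]) > tolerance
  );
  return { shares, iceCreamCone, offTarget };
}
```

Called with `{ unit: 20, integration: 0, e2e: 80 }`, the sketch flags an ice cream cone and lists all three layers as off target. A suite that matches 70/20/10 exactly returns an empty `offTarget` list.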

package/.claude/skills/test-data-management/SKILL.md CHANGED

@@ -55,84 +55,6 @@ When creating or managing test data:
 | **Volume** | Performance | 10k+ records |
 | **Edge cases** | Boundary testing | Targeted |
-### Privacy Techniques
-| Technique | Use Case |
-|-----------|----------|
-| **Synthetic** | Generate fake data (preferred) |
-| **Masking** | j***@example.com |
-| **Hashing** | Irreversible pseudonymization |
-| **Tokenization** | Reversible with key |
----
-## Synthetic Data Generation
-```javascript
-import { faker } from '@faker-js/faker';
-// Seed for reproducibility
-faker.seed(123);
-function generateUser() {
-  return {
-    id: faker.string.uuid(),
-    email: faker.internet.email(),
-    firstName: faker.person.firstName(),
-    lastName: faker.person.lastName(),
-    phone: faker.phone.number(),
-    address: {
-      street: faker.location.streetAddress(),
-      city: faker.location.city(),
-      zip: faker.location.zipCode()
-    },
-    createdAt: faker.date.past()
-  };
-}
-// Generate 1000 users
-const users = Array.from({ length: 1000 }, generateUser);
-```
----
-## Test Data Builder Pattern
-```typescript
-class UserBuilder {
-  private user: Partial<User> = {};
-  asAdmin() {
-    this.user.role = 'admin';
-    this.user.permissions = ['read', 'write', 'delete'];
-    return this;
-  }
-  asCustomer() {
-    this.user.role = 'customer';
-    this.user.permissions = ['read'];
-    return this;
-  }
-  withEmail(email: string) {
-    this.user.email = email;
-    return this;
-  }
-  build(): User {
-    return {
-      id: this.user.id ?? faker.string.uuid(),
-      email: this.user.email ?? faker.internet.email(),
-      role: this.user.role ?? 'customer',
-      ...this.user
-    } as User;
-  }
-}
-// Usage
-const admin = new UserBuilder().asAdmin().withEmail('admin@test.com').build();
-const customer = new UserBuilder().asCustomer().build();
-```
 ---
 ## Data Anonymization
@@ -185,29 +107,6 @@ test('user registration', async () => {
 ---
-## Volume Data Generation
-```javascript
-// Generate 10,000 users efficiently
-async function generateLargeDataset(count = 10000) {
-  const batchSize = 1000;
-  const batches = Math.ceil(count / batchSize);
-  for (let i = 0; i < batches; i++) {
-    const users = Array.from({ length: batchSize }, (_, index) => ({
-      id: i * batchSize + index,
-      email: `user${i * batchSize + index}@example.com`,
-      firstName: faker.person.firstName()
-    }));
-    await db.users.insertMany(users); // Batch insert
-    console.log(`Batch ${i + 1}/${batches}`);
-  }
-}
-```
----
 ## Agent-Driven Data Generation
 ```typescript
@@ -268,8 +167,6 @@ const dataFleet = await FleetManager.coordinate({
 ## Remember
-**Test data is infrastructure, not an afterthought.** 40% of test failures are caused by inadequate test data. Poor data = poor tests.
-**Never use production PII directly.** GDPR fines up to €20M or 4% of revenue. Always use synthetic data or properly anonymized production snapshots.
+**Never use production PII directly.** Always use synthetic data or properly anonymized production snapshots.
 **With Agents:** `qe-test-data-architect` generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents ensure GDPR/CCPA compliance automatically and eliminate test data bottlenecks.

package/.claude/skills/test-design-techniques/SKILL.md CHANGED

@@ -21,23 +21,11 @@ validation:
 # Test Design Techniques
 <default_to_action>
-When designing test cases systematically:
-1. APPLY Boundary Value Analysis (test at min, max, edges)
-2. USE Equivalence Partitioning (one test per partition)
-3. CREATE Decision Tables (for complex business rules)
-4. MODEL State Transitions (for stateful behavior)
-5. REDUCE with Pairwise Testing (for combinations)
-**Quick Design Selection:**
+When designing test cases, select technique by input type:
 - Numeric ranges → BVA + EP
 - Multiple conditions → Decision Tables
 - Workflows → State Transition
-- Many parameters → Pairwise Testing
-**Critical Success Factors:**
-- Systematic design finds more bugs with fewer tests
-- Random testing is inefficient
-- 40+ years of research backs these techniques
+- Many parameter combinations → Pairwise Testing
 </default_to_action>
 ## Quick Reference Card
@@ -48,139 +36,6 @@ When designing test cases systematically:
 - Complex business rules
 - Reducing test redundancy
-### Technique Selection Guide
-| Scenario | Technique |
-|----------|-----------|
-| **Numeric input ranges** | BVA + EP |
-| **Multiple conditions** | Decision Tables |
-| **Stateful workflows** | State Transition |
-| **Many parameter combinations** | Pairwise |
-| **All combinations critical** | Full Factorial |
----
-## Boundary Value Analysis (BVA)
-**Principle:** Bugs cluster at boundaries.
-**Test at boundaries:**
-- Minimum valid value
-- Just below minimum (invalid)
-- Just above minimum (valid)
-- Maximum valid value
-- Just above maximum (invalid)
-```javascript
-// Age field: 18-120 valid
-const boundaryTests = [
-  { input: 17, expected: 'invalid' },  // Below min
-  { input: 18, expected: 'valid' },    // Min boundary
-  { input: 19, expected: 'valid' },    // Above min
-  { input: 119, expected: 'valid' },   // Below max
-  { input: 120, expected: 'valid' },   // Max boundary
-  { input: 121, expected: 'invalid' }  // Above max
-];
-```
----
-## Equivalence Partitioning (EP)
-**Principle:** One test per equivalent class.
-```javascript
-// Discount rules:
-// 1-10:  No discount
-// 11-100: 10% discount
-// 101+:   20% discount
-const partitionTests = [
-  { quantity: -1, expected: 'invalid' },  // Invalid partition
-  { quantity: 5, expected: 0 },           // Partition 1: 1-10
-  { quantity: 50, expected: 0.10 },       // Partition 2: 11-100
-  { quantity: 200, expected: 0.20 }       // Partition 3: 101+
-];
-// 4 tests cover all behavior (vs 200+ if testing every value)
-```
----
-## Decision Tables
-**Use for:** Complex business rules with multiple conditions.
-```
-Loan Approval Rules:
-┌──────────────┬───────┬───────┬───────┬───────┬───────┐
-│ Conditions   │ R1    │ R2    │ R3    │ R4    │ R5    │
-├──────────────┼───────┼───────┼───────┼───────┼───────┤
-│ Age ≥ 18     │ Yes   │ Yes   │ Yes   │ No    │ Yes   │
-│ Credit ≥ 700 │ Yes   │ Yes   │ No    │ Yes   │ No    │
-│ Income ≥ 50k │ Yes   │ No    │ Yes   │ Yes   │ Yes   │
-├──────────────┼───────┼───────┼───────┼───────┼───────┤
-│ Result       │Approve│Approve│Reject │Reject │Reject │
-└──────────────┴───────┴───────┴───────┴───────┴───────┘
-// 5 tests cover all decision combinations
-```
----
-## State Transition Testing
-**Model state changes:**
-```
-States: Logged Out → Logged In → Premium → Suspended
-Valid Transitions:
-- Login: Logged Out → Logged In
-- Upgrade: Logged In → Premium
-- Payment Fail: Premium → Suspended
-- Logout: Any → Logged Out
-Invalid Transitions to Test:
-- Logged Out → Premium (should reject)
-- Suspended → Premium (should reject)
-```
-```javascript
-test('cannot upgrade without login', async () => {
-  const result = await user.upgrade(); // While logged out
-  expect(result.error).toBe('Login required');
-});
-```
----
-## Pairwise (Combinatorial) Testing
-**Problem:** All combinations explode exponentially.
-```javascript
-// Parameters:
-// Browser: Chrome, Firefox, Safari (3)
-// OS: Windows, Mac, Linux (3)
-// Screen: Desktop, Tablet, Mobile (3)
-// All combinations: 3 × 3 × 3 = 27 tests
-// Pairwise: 9 tests cover all pairs
-const pairwiseTests = [
-  { browser: 'Chrome', os: 'Windows', screen: 'Desktop' },
-  { browser: 'Chrome', os: 'Mac', screen: 'Tablet' },
-  { browser: 'Chrome', os: 'Linux', screen: 'Mobile' },
-  { browser: 'Firefox', os: 'Windows', screen: 'Tablet' },
-  { browser: 'Firefox', os: 'Mac', screen: 'Mobile' },
-  { browser: 'Firefox', os: 'Linux', screen: 'Desktop' },
-  { browser: 'Safari', os: 'Windows', screen: 'Mobile' },
-  { browser: 'Safari', os: 'Mac', screen: 'Desktop' },
-  { browser: 'Safari', os: 'Linux', screen: 'Tablet' }
-];
-// Each pair appears at least once
-```
 ---
 ## Agent-Driven Test Design
@@ -242,8 +97,4 @@ const designFleet = await FleetManager.coordinate({
 ## Remember
-**Systematic design > Random testing.** 40+ years of research shows these techniques find more bugs with fewer tests than ad-hoc approaches.
-**Combine techniques for comprehensive coverage.** BVA for boundaries, EP for partitions, decision tables for rules, pairwise for combinations.
 **With Agents:** `qe-test-generator` applies these techniques automatically, generating optimal test suites with maximum coverage and minimum redundancy. Agents identify boundaries, partitions, and combinations from code analysis.

package/.claude/skills/test-environment-management/SKILL.md CHANGED

@@ -47,23 +47,6 @@ When managing test environments:
 - Reducing test infrastructure costs
 - Ensuring dev/prod parity
-### Environment Types
-| Type | Purpose | Lifetime |
-|------|---------|----------|
-| **Local** | Fast feedback | Developer session |
-| **CI** | Automated tests | Per build (ephemeral) |
-| **Staging** | Pre-prod validation | Persistent |
-| **Production** | Canary/synthetic | Continuous |
-### Dev/Prod Parity Checklist
-| Item | Must Match |
-|------|------------|
-| OS | Same version |
-| Database | Same type + version |
-| Dependencies | Same versions |
-| Config | Same structure |
-| Env vars | Same names |
 ---
 ## Docker for Test Environments
@@ -103,32 +86,6 @@ docker-compose -f docker-compose.test.yml down
 ---
-## Infrastructure as Code
-```hcl
-# test-environment.tf
-resource "aws_instance" "test_server" {
-  ami           = "ami-0c55b159cbfafe1f0"
-  instance_type = "t3.medium"
-  tags = {
-    Name         = "test-environment"
-    Environment  = "test"
-    AutoShutdown = "20:00" # Cost optimization
-  }
-}
-resource "aws_rds_instance" "test_db" {
-  engine                  = "postgres"
-  engine_version          = "15"
-  instance_class          = "db.t3.micro"
-  backup_retention_period = 0 # No backups needed for test
-  skip_final_snapshot     = true
-}
-```
----
 ## Service Virtualization
 ```javascript
@@ -239,8 +196,4 @@ const envFleet = await FleetManager.coordinate({
 ## Remember
-**Environment inconsistency = flaky tests.** "Works on my machine" problems come from: different OS/versions, missing dependencies, configuration differences, data differences.
-**Infrastructure as Code ensures repeatability.** Version control your environment configurations. Spin up identical environments on demand.
 **With Agents:** Agents automatically provision test environments matching production, ensure parity, mock external services, and optimize costs with auto-scaling and auto-shutdown.

package/.claude/skills/test-failure-investigator/SKILL.md ADDED

@@ -0,0 +1,99 @@
+---
+name: test-failure-investigator
+description: "Use when a test is failing and you need to determine root cause: is it flaky, an environment issue, or a real regression? Traces failure from symptom to fix."
+user-invocable: true
+---
+# Test Failure Investigator
+Runbook-style skill for systematic test failure investigation. Given a failing test, determines root cause and recommends action.
+## Activation
+```
+/test-failure-investigator [test-name-or-file]
+```
+## Investigation Flow
+### Step 1: Classify the Failure
+Run the test 3 times and classify:
+| Result Pattern | Classification | Action |
+|---------------|---------------|--------|
+| Fails consistently | **Regression** or **Environment** | Continue to Step 2 |
+| Fails intermittently | **Flaky** | Skip to Step 4 |
+| Passes now | **Transient** | Check CI logs, environment diff |
+```bash
+# Run test 3 times
+for i in 1 2 3; do echo "--- Run $i ---"; npx jest {{test_file}} 2>&1 | tail -5; done
+```
+### Step 2: Narrow the Scope
+```bash
+# When did it start failing?
+git log --oneline -20 -- {{related_source_files}}
+# What changed recently?
+git diff HEAD~5 -- {{related_source_files}}
+# Does it fail in isolation?
+npx jest {{test_file}} --testNamePattern="{{test_name}}"
+# Does it fail with other tests?
+npx jest --runInBand  # sequential execution
+```
+### Step 3: Root Cause Analysis
+| Symptom | Likely Cause | Investigation |
+|---------|-------------|--------------|
+| Timeout | Network/DB dependency | Check external service availability |
+| Assertion mismatch | Logic change | Compare expected vs actual, check git blame |
+| Import error | Dependency change | Check package.json changes, run `npm ci` |
+| Permission denied | Environment | Check file permissions, Docker volumes |
+| Out of memory | Resource leak | Profile with `--logHeapUsage`; use `--detectOpenHandles` for handles that keep Jest from exiting |
+### Step 4: Flaky Test Investigation
+```bash
+# Run 10 times to confirm flakiness
+for i in $(seq 1 10); do npx jest {{test_file}} --forceExit 2>&1 | grep -E 'PASS|FAIL'; done
+# Common flaky causes:
+# - Shared state between tests (missing cleanup)
+# - Time-dependent assertions (use fake timers)
+# - Race conditions (missing await)
+# - Port conflicts (use random ports)
+# - Order dependency (run with --randomize)
+```
+### Step 5: Report
+```markdown
+## Test Failure Report
+- **Test**: {{test_name}}
+- **File**: {{test_file}}
+- **Classification**: Regression / Flaky / Environment / Transient
+- **Root Cause**: {{description}}
+- **First Failed**: {{commit_hash}} ({{date}})
+- **Fix**: {{recommended_action}}
+- **Verified**: [ ] Fix applied and test passes 3x consecutively
+```
+## Composition
+After investigation, compose with:
+- **`/bug-reporting-excellence`** — if regression found, file a bug report
+- **`/regression-testing`** — if regression, add to regression suite
+- **`/qe-test-execution`** — for re-running tests after fix
+## Gotchas
+- Agent may guess at root cause without running the test — always reproduce first
+- "Works on my machine" is not a diagnosis — compare environments (node version, OS, deps)
+- Flaky tests that pass 9/10 times will still be reported as "passing" by CI — run 10+ times
+- Test isolation failures are the #1 cause of flaky tests — check for shared state in beforeAll/afterAll
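The Step 1 classification table and the "passes 9/10 times" gotcha can be sketched as two small functions. This is an illustration only: `classifyRuns` and `chanceAllPass` are hypothetical names, the outcomes are assumed to have been collected already as booleans, and the probability calculation assumes the runs are independent.

```javascript
// Hypothetical helper; `outcomes` is an array of booleans (true = run passed).
// Labels mirror the Classification column of the Step 1 table.
function classifyRuns(outcomes) {
  if (outcomes.length === 0) {
    throw new Error('need at least one run');
  }
  const failures = outcomes.filter((passed) => !passed).length;
  if (failures === outcomes.length) return 'Regression or Environment';
  if (failures > 0) return 'Flaky';
  return 'Transient';
}

// Chance that a test failing with probability `failureRate` per run
// passes every one of `runs` independent runs.
function chanceAllPass(failureRate, runs) {
  return Math.pow(1 - failureRate, runs);
}
```

Under the independence assumption, a test that fails one run in ten passes three consecutive runs about 73% of the time and ten consecutive runs about 35% of the time. This is why three runs can easily label a flaky test as passing.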

package/.claude/skills/test-metrics-dashboard/SKILL.md ADDED

@@ -0,0 +1,97 @@
+---
+name: test-metrics-dashboard
+description: "Use when querying test history, analyzing flakiness rates, tracking MTTR, or building quality trend dashboards from test execution data."
+user-invocable: true
+---
+# Test Metrics Dashboard
+Data & Analysis skill for querying test execution history, identifying trends, and surfacing actionable quality metrics.
+## Activation
+```
+/test-metrics-dashboard
+```
+## Key Metrics
+### Test Health Metrics
+| Metric | Formula | Target | Alert |
+|--------|---------|--------|-------|
+| **Pass Rate** | Passed / Total | > 95% | < 90% |
+| **Flakiness Rate** | Flaky / Total | < 5% | > 10% |
+| **MTTR** | Avg time from failure to fix | < 4 hours | > 24 hours |
+| **Execution Time** | Total suite duration | < 10 min | > 20 min |
+| **Coverage Delta** | Current - Previous | >= 0% | < -2% |
+### Data Collection
+```bash
+# Export Jest results to JSON
+npx jest --json --outputFile=test-results/$(date +%Y-%m-%d).json
+# Parse results for dashboard
+jq '{
+  date: .startTime,
+  total: .numTotalTests,
+  passed: .numPassedTests,
+  failed: .numFailedTests,
+  duration_ms: (.testResults | map(.endTime - .startTime) | add),
+  pass_rate: ((.numPassedTests / .numTotalTests) * 100),
+  flaky: [.testResults[] | select(.numPendingTests > 0)] | length
+}' test-results/$(date +%Y-%m-%d).json
+```
+### Trend Analysis
+```bash
+# Compare last 5 runs
+for f in $(ls -t test-results/*.json | head -5); do
+  jq --arg file "$f" '{
+    file: $file,
+    pass_rate: ((.numPassedTests / .numTotalTests) * 100 | floor),
+    duration_s: ((.testResults | map(.endTime - .startTime) | add) / 1000 | floor)
+  }' "$f"
+done
+```
+### Top Failing Tests
+```bash
+# Find most frequently failing tests across runs
+for f in test-results/*.json; do
+  jq -r '.testResults[] | select(.status == "failed") | .name' "$f"
+done | sort | uniq -c | sort -rn | head -10
+```
+## Run History
+Store dashboard data in `${CLAUDE_PLUGIN_DATA}/test-metrics.log`:
+```
+2026-03-18|95.2|4.1|312|82.5|3
+```
+Format: `date|pass_rate|flakiness_rate|duration_s|coverage_pct|failed_count`
+Read history for trend detection:
+```bash
+# Coverage trending down?
+tail -5 "${CLAUDE_PLUGIN_DATA}/test-metrics.log" | awk -F'|' '{print $5}' | sort -n | head -1
+```
+## Composition
+Feeds into:
+- **`/qe-quality-assessment`** — quality gate decisions based on metrics
+- **`/test-failure-investigator`** — investigate top failing tests
+- **`/coverage-drop-investigator`** — when coverage trends down
+## Gotchas
+- Metrics without baselines are meaningless — establish baselines before tracking trends
+- Flakiness rate is underreported — a test that fails 1/100 times still breaks CI weekly
+- Duration trends upward over time as test count grows — set alerts on rate of increase, not absolute value
+- Agent may report metrics from a single run as "trends" — need 5+ data points for meaningful trends
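The run-history format and the "5+ data points" gotcha can be illustrated with a short parser. The `awk` pipeline in the skill prints the lowest coverage value among the last five entries, which does not show direction. The sketch below instead compares the oldest and newest of the last five entries. `parseMetricsLine` and `coverageDelta` are hypothetical names, and the input is assumed to be the log's lines already read into an array in chronological order.

```javascript
// Field order follows the documented format:
// date|pass_rate|flakiness_rate|duration_s|coverage_pct|failed_count
function parseMetricsLine(line) {
  const [date, passRate, flakinessRate, durationS, coveragePct, failedCount] =
    line.trim().split('|');
  return {
    date,
    passRate: Number(passRate),
    flakinessRate: Number(flakinessRate),
    durationS: Number(durationS),
    coveragePct: Number(coveragePct),
    failedCount: Number(failedCount),
  };
}

// Hypothetical trend check: returns null when fewer than `minPoints` entries
// exist, otherwise the coverage change from the oldest to the newest of the
// last `minPoints` entries (negative means coverage went down).
function coverageDelta(lines, minPoints = 5) {
  const recent = lines
    .filter((line) => line.trim() !== '')
    .slice(-minPoints)
    .map(parseMetricsLine);
  if (recent.length < minPoints) return null; // not enough data for a trend
  return recent[recent.length - 1].coveragePct - recent[0].coveragePct;
}
```

Returning `null` for short histories keeps a single run from being reported as a trend. A caller could treat a negative delta as a prompt to look at `/coverage-drop-investigator`.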