npm - agentic-qe - Versions diffs - 3.8.2 → 3.8.3 - Mend

agentic-qe 3.8.2 → 3.8.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (217) hide show

package/.claude/skills/mobile-testing/SKILL.md CHANGED Viewed

@@ -65,51 +65,6 @@ When testing mobile applications:
 | **Tier 2** | 30% users | iPhone 14/13, Pixel 8 |
 | **Tier 3** | 10% users | Older devices, other manufacturers |
-### Mobile Performance Goals
-| Metric | Target |
-|--------|--------|
-| App launch | < 2 seconds |
-| Screen transition | < 300ms |
-| Frame rate | 60 FPS |
-| Battery drain | < 5%/hour background |
----
-## Touch Gesture Testing
-```javascript
-// Appium gesture examples
-// Tap
-await driver.touchAction({ action: 'tap', x: 100, y: 200 });
-// Swipe (scroll down)
-await driver.touchAction([
-  { action: 'press', x: 200, y: 400 },
-  { action: 'moveTo', x: 200, y: 100 },
-  { action: 'release' }
-]);
-// Pinch to zoom
-const finger1 = [
-  { action: 'press', x: 100, y: 200 },
-  { action: 'moveTo', x: 50, y: 150 },
-  { action: 'release' }
-];
-const finger2 = [
-  { action: 'press', x: 200, y: 200 },
-  { action: 'moveTo', x: 250, y: 250 },
-  { action: 'release' }
-];
-await driver.multiTouchAction([finger1, finger2]);
-// Long press
-await driver.touchAction({
-  action: 'longPress',
-  x: 100, y: 200,
-  duration: 2000
-});
-```
 ---
 ## Mobile-Specific Scenarios
@@ -217,8 +172,6 @@ const mobileFleet = await FleetManager.coordinate({
 ## Remember
-**Mobile is not a smaller desktop - it's a different platform.** 60%+ of web traffic is mobile. Device fragmentation (1000+ Android devices), touch gestures, sensors, permissions, offline scenarios - all require specific testing.
-**Test on real devices for critical flows.** Emulators catch 80% of bugs but real devices needed for actual performance, sensor behavior, and platform quirks.
+**Test on real devices for critical flows.** Emulators catch 80% of bugs but real devices are needed for actual performance, sensor behavior, and platform quirks.
-**With Agents:** `qe-test-executor` orchestrates testing across device farms, manages platform differences, and tests 10+ devices in parallel. Reduces mobile testing from days to hours.
+**With Agents:** `qe-test-executor` orchestrates testing across device farms, manages platform differences, and tests 10+ devices in parallel.

package/.claude/skills/mutation-testing/SKILL.md CHANGED Viewed

@@ -233,3 +233,29 @@ const mutationFleet = await FleetManager.coordinate({
 **Focus on critical paths first.** Don't mutation test everything - prioritize payment, authentication, data integrity code.
 **With Agents:** Agents run mutation analysis, identify surviving mutants, and generate missing test cases to kill them. Automated improvement of test quality.
+## Run History
+After each mutation test run, append results to `run-history.json` in this skill directory:
+```bash
+node -e "
+const fs = require('fs');
+const h = JSON.parse(fs.readFileSync('.claude/skills/mutation-testing/run-history.json'));
+h.runs.push({date: new Date().toISOString().split('T')[0], mutation_score_pct: SCORE, killed: KILLED, survived: SURVIVED});
+fs.writeFileSync('.claude/skills/mutation-testing/run-history.json', JSON.stringify(h, null, 2));
+"
+```
+Read `run-history.json` before each run to track score improvements over time.
+## Skill Composition
+- **Before mutation testing** → Run `/qe-test-generation` to ensure tests exist
+- **After mutation results** → Use `/qe-coverage-analysis` to prioritize improvement areas
+- **Quality gate** → Feed results into `/qe-quality-assessment` for ship/no-ship decision
+## Gotchas
+- Stryker requires `--testRunner jest` explicitly if both jest and vitest are installed
+- Mutating `>=` to `>` in date comparisons rarely gets killed — add boundary tests
+- Running on files >500 LOC will timeout; use `--mutate` to target specific functions
+- `--concurrency` defaults to CPU count which OOMs in containers — set to 2

package/.claude/skills/mutation-testing/config.json ADDED Viewed

@@ -0,0 +1,14 @@
+{
+  "$schema": "./config-schema.json",
+  "_description": "Mutation Testing configuration. Auto-created on first run. Edit to customize.",
+  "mutator": "stryker",
+  "concurrency": 2,
+  "score_threshold": 80,
+  "options": {
+    "testRunner": null,
+    "mutateGlob": "src/**/*.{ts,js}",
+    "excludeGlob": "src/**/*.test.{ts,js}",
+    "timeoutMs": 60000
+  },
+  "_setupPrompt": "If testRunner is null, ask: 'Which test runner does this project use? (jest/vitest/mocha)'. Warn: concurrency defaults to 2 to avoid OOM in containers."
+}

package/.claude/skills/mutation-testing/references/mutation-operators.md ADDED Viewed

@@ -0,0 +1,38 @@
+# Mutation Operators Reference
+## Arithmetic Operators
+| Original | Mutant | What It Tests |
+|----------|--------|--------------|
+| `a + b` | `a - b` | Addition logic |
+| `a * b` | `a / b` | Multiplication logic |
+| `a % b` | `a * b` | Modulo logic |
+## Conditional Operators
+| Original | Mutant | What It Tests |
+|----------|--------|--------------|
+| `a > b` | `a >= b` | Off-by-one in boundaries |
+| `a >= b` | `a > b` | Boundary inclusion |
+| `a === b` | `a !== b` | Equality checks |
+| `a && b` | `a \|\| b` | Logical conjunction |
+## Statement Mutations
+| Original | Mutant | What It Tests |
+|----------|--------|--------------|
+| `return x` | `return !x` | Return value negation |
+| `if (cond)` | `if (true)` | Condition relevance |
+| `if (cond)` | `if (false)` | Dead code detection |
+| `statement` | _(removed)_ | Statement necessity |
+## Common Surviving Mutants and Fixes
+**`>=` to `>` survives**: Add boundary test with exact boundary value
+**`&&` to `||` survives**: Add test where only one condition is true
+**Removed `return` survives**: Function's return value isn't being checked
+**`+1` to `-1` survives**: Increment/decrement logic untested
+## Stryker Configuration Tips
+- `--testRunner jest` — explicit when multiple runners installed
+- `--concurrency 2` — prevents OOM in containers
+- `--mutate 'src/**/*.ts,!src/**/*.d.ts'` — skip type definitions
+- `--timeoutMS 60000` — increase for slow test suites
+- `--thresholds.high 80 --thresholds.low 60` — score quality bands

package/.claude/skills/mutation-testing/run-history.json ADDED Viewed

@@ -0,0 +1,6 @@
+{
+  "_description": "Mutation testing run history. Append after each run. Claude reads this to track mutation score trends.",
+  "_format": "Each entry: {date, scope, mutation_score_pct, killed, survived, timeout, no_coverage, improvement_areas}",
+  "_instructions": "After running mutation testing, append results here. Track score improvements over time. Flag if score drops below threshold.",
+  "runs": []
+}

package/.claude/skills/no-skip/SKILL.md ADDED Viewed

@@ -0,0 +1,74 @@
+---
+name: no-skip
+description: "Use when you want to prevent .skip(), .only(), xit(), and xdescribe() from being committed to test files. Activate with /no-skip for session-scoped test skip prevention."
+user-invocable: true
+---
+# No-Skip Mode
+When activated, blocks any write that adds test-skipping patterns to test files.
+## What It Does
+Prevents these patterns from being written to test files:
+- `.skip()` — skips individual tests
+- `.only()` — runs only one test (excludes all others)
+- `xit(` / `xdescribe(` — Jasmine skip syntax
+- `test.todo(` — unimplemented test placeholders
+- `@Skip` / `@Disabled` — JUnit skip annotations
+## Activation
+```
+/no-skip
+```
+## Hook Configuration
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Write|Edit",
+        "hook": ".claude/skills/no-skip/scripts/check-skips.sh",
+        "condition": "file matches **/*.test.{ts,js,tsx,jsx} OR **/*.spec.{ts,js}"
+      }
+    ]
+  }
+}
+```
+## Enforcement Logic
+```bash
+#!/bin/bash
+# check-skips.sh
+CONTENT="$1"
+SKIP_PATTERNS=(
+  '\.skip\s*\('
+  '\.only\s*\('
+  '\bxit\s*\('
+  '\bxdescribe\s*\('
+  '\bxcontext\s*\('
+  'test\.todo\s*\('
+  '@Skip'
+  '@Disabled'
+  '@Ignore'
+)
+for pattern in "${SKIP_PATTERNS[@]}"; do
+  if echo "$CONTENT" | grep -qP "$pattern"; then
+    echo "BLOCKED: Found skip pattern '$pattern'"
+    echo "Remove the skip and either fix the test or delete it."
+    exit 1
+  fi
+done
+```
+## Gotchas
+- This catches NEW skips being written, not existing ones — run `grep -r '.skip(' tests/` to find existing skips
+- `.only()` is sometimes used intentionally during debugging — deactivate with `/no-skip off` if needed
+- Some frameworks use `pending()` instead of `.skip()` — add to patterns if using Jasmine

package/.claude/skills/no-skip/scripts/check-skips.sh ADDED Viewed

@@ -0,0 +1,28 @@
+#!/bin/bash
+# check-skips.sh — No-Skip hook
+# Blocks writes that add test-skipping patterns to test files.
+# Called by PreToolUse hook on Write/Edit to **/*.test.* or **/*.spec.*
+CONTENT="$1"
+if [ -z "$CONTENT" ]; then
+  # If no content passed, read from stdin
+  CONTENT=$(cat)
+fi
+FOUND=0
+# Check for skip patterns
+for pattern in '\.skip\s*(' '\.only\s*(' '\bxit\s*(' '\bxdescribe\s*(' '\bxcontext\s*(' 'test\.todo\s*(' '@Skip' '@Disabled' '@Ignore'; do
+  if echo "$CONTENT" | grep -qP "$pattern" 2>/dev/null || echo "$CONTENT" | grep -q "$pattern" 2>/dev/null; then
+    echo "BLOCKED: Found skip pattern: $pattern"
+    FOUND=1
+  fi
+done
+if [ $FOUND -eq 1 ]; then
+  echo "Remove the skip and either fix the test or delete it."
+  exit 1
+fi
+exit 0

package/.claude/skills/pair-programming/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: Pair Programming
-description: AI-assisted pair programming with multiple modes (driver/navigator/switch), real-time verification, quality monitoring, and comprehensive testing. Supports TDD, debugging, refactoring, and learning sessions. Features automatic role switching, continuous code review, security scanning, and performance optimization with truth-score verification.
+description: "Use when pair programming with AI assistance, practicing TDD with a navigator, debugging collaboratively, refactoring with real-time verification, or running learning sessions with continuous code review and quality monitoring."
 ---
 # Pair Programming

package/.claude/skills/pentest-validation/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: pentest-validation
-description: "Orchestrate security finding validation through graduated exploitation. 4-phase pipeline: recon (SAST/DAST), analysis (code review), validation (exploit proof), report (No Exploit, No Report gate). Eliminates false positives by proving exploitability."
+description: "Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report)."
 category: specialized-testing
 priority: critical
 tokenEstimate: 1500
@@ -253,8 +253,7 @@ aqe/pentest/
 ## Related Skills
-- [security-testing](../security-testing/) - OWASP vulnerability scanning
-- [qe-security-compliance](../qe-security-compliance/) - SAST/DAST automation
+- [security-testing](../security-testing/) - OWASP vulnerability scanning, SAST/DAST automation
 - [compliance-testing](../compliance-testing/) - Regulatory compliance
 - [api-testing-patterns](../api-testing-patterns/) - API security testing
 - [chaos-engineering-resilience](../chaos-engineering-resilience/) - Security under chaos

package/.claude/skills/performance-testing/SKILL.md CHANGED Viewed

@@ -313,3 +313,24 @@ const perfFleet = await FleetManager.coordinate({
 **Trend over time:** Catch degradation early
 **With Agents:** Agents automate load testing, analyze bottlenecks, and compare with production. Use agents to maintain performance at scale.
+## Run History
+After each performance test run, append results to `run-history.json` in this skill directory:
+```bash
+node -e "
+const fs = require('fs');
+const h = JSON.parse(fs.readFileSync('.claude/skills/performance-testing/run-history.json'));
+h.runs.push({date: new Date().toISOString().split('T')[0], scenario: 'load', p95_ms: P95, throughput_rps: RPS, error_rate_pct: ERR});
+fs.writeFileSync('.claude/skills/performance-testing/run-history.json', JSON.stringify(h, null, 2));
+"
+```
+Read `run-history.json` before each run — compare with baselines. Alert if p95 increases >20% from baseline.
+## Gotchas
+- k6 scripts generated by agent often hardcode base URLs — use environment variables for portability
+- Load tests in containers hit resource limits before app limits — ensure container has 2x the resources of target
+- Agent forgets to include think time between requests — without it, load is unrealistically bursty
+- P95 vs P99 matters — agent defaults to averages which hide tail latency problems
+- Baseline comparison requires consistent environment — CI runner variance can cause 20%+ noise

package/.claude/skills/performance-testing/config.json ADDED Viewed

@@ -0,0 +1,18 @@
+{
+  "$schema": "./config-schema.json",
+  "_description": "Performance Testing configuration. Auto-created on first run. Edit to customize.",
+  "tool": null,
+  "baseline_file": null,
+  "thresholds": {
+    "p95_response_ms": 500,
+    "error_rate_percent": 1,
+    "throughput_rps": 100
+  },
+  "options": {
+    "duration": "30s",
+    "vus": 10,
+    "rampUp": "10s",
+    "thinkTime": "1s"
+  },
+  "_setupPrompt": "If tool is null, ask: 'Which load testing tool do you use? (k6/artillery/jmeter)'. If baseline_file is null, ask: 'Do you have an existing performance baseline file? (path or \"none\")'."
+}

package/.claude/skills/performance-testing/references/k6-patterns.md ADDED Viewed

@@ -0,0 +1,72 @@
+# k6 Load Testing Patterns
+## Basic Load Test
+```javascript
+import http from 'k6/http';
+import { check, sleep } from 'k6';
+export const options = {
+  stages: [
+    { duration: '30s', target: 20 },  // ramp up
+    { duration: '1m', target: 20 },   // steady state
+    { duration: '10s', target: 0 },   // ramp down
+  ],
+  thresholds: {
+    http_req_duration: ['p(95)<500'],
+    http_req_failed: ['rate<0.01'],
+  },
+};
+export default function () {
+  const res = http.get(`${__ENV.BASE_URL}/api/endpoint`);
+  check(res, {
+    'status is 200': (r) => r.status === 200,
+    'response time < 500ms': (r) => r.timings.duration < 500,
+  });
+  sleep(1); // Think time — don't forget!
+}
+```
+## Stress Test Pattern
+```javascript
+export const options = {
+  stages: [
+    { duration: '2m', target: 100 },
+    { duration: '5m', target: 100 },
+    { duration: '2m', target: 200 },
+    { duration: '5m', target: 200 },
+    { duration: '2m', target: 0 },
+  ],
+};
+```
+## Spike Test Pattern
+```javascript
+export const options = {
+  stages: [
+    { duration: '10s', target: 10 },
+    { duration: '1m', target: 10 },
+    { duration: '10s', target: 1000 },  // spike!
+    { duration: '3m', target: 1000 },
+    { duration: '10s', target: 10 },
+  ],
+};
+```
+## Soak Test Pattern
+```javascript
+export const options = {
+  stages: [
+    { duration: '5m', target: 50 },
+    { duration: '4h', target: 50 },  // long duration
+    { duration: '5m', target: 0 },
+  ],
+};
+```
+## Tips
+- Always include `sleep()` for think time
+- Use `__ENV.BASE_URL` not hardcoded URLs
+- Set `--out json=results.json` for CI comparison
+- Use `scenarios` for complex user journeys
+- Watch for connection reuse vs real-world patterns

package/.claude/skills/performance-testing/run-history.json ADDED Viewed

@@ -0,0 +1,6 @@
+{
+  "_description": "Performance testing run history. Append after each test run. Claude reads this to detect performance regressions.",
+  "_format": "Each entry: {date, scenario, p50_ms, p95_ms, p99_ms, throughput_rps, error_rate_pct, baseline_comparison}",
+  "_instructions": "After running performance tests, append results here. Compare with baselines. Alert if p95 increases >20% from baseline.",
+  "runs": []
+}

package/.claude/skills/pr-review/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: pr-review
-description: Scope-aware GitHub PR review with user-friendly tone and trust tier validation
+description: "Use when reviewing a GitHub PR for quality, scope correctness, trust tier compliance, or generating user-friendly review feedback."
 trust_tier: 0
 domain: code-review
 ---
@@ -24,8 +24,8 @@ Read the complete diff and PR description. Do not skim — read every changed fi
 ### 2. Scope Check
 - Only analyze AQE/QE skills (NOT Claude Flow platform skills)
-- Platform skills to EXCLUDE: v3-*, flow-nexus-*, agentdb-*, reasoningbank-*, swarm-*, github-*, hive-mind-advanced, hooks-automation, iterative-loop, stream-chain, skill-builder, sparc-methodology, pair-programming, release, debug-loop, aqe-v2-v3-migration
-- If the PR touches skills, verify the count/scope matches expectations (~78 AQE skills)
+- Platform skills to EXCLUDE: v3-*, flow-nexus-*, agentdb-*, reasoningbank-*, swarm-*, github-*, hive-mind-advanced, hooks-automation, iterative-loop, stream-chain, skill-builder, sparc-methodology, pair-programming, release, debug-loop
+- If the PR touches skills, verify the count/scope matches expectations (~82 AQE skills)
 - Flag any platform skill changes that may have leaked into an AQE-focused PR
 ### 3. Summarize Changes

package/.claude/skills/qcsd-cicd-swarm/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: qcsd-cicd-swarm
-description: "QCSD Verification phase swarm for CI/CD pipeline quality gates using regression analysis, flaky test detection, quality gate enforcement, and deployment readiness assessment. Consumes Development outputs (SHIP/CONDITIONAL/HOLD decisions, quality metrics) and produces signals for Production monitoring."
+description: "Use when enforcing CI/CD quality gates before release, running regression analysis, detecting flaky tests, or assessing deployment readiness in the QCSD Verification phase."
 category: qcsd-phases
 priority: critical
 version: 1.0.0

package/.claude/skills/qcsd-development-swarm/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: qcsd-development-swarm
-description: "QCSD Development phase swarm for in-sprint code quality assurance using TDD adherence, code complexity analysis, coverage gap detection, and defect prediction. Consumes Refinement outputs (BDD scenarios, SFDIPOT priorities) and produces signals for Verification."
+description: "Use when monitoring in-sprint code quality with TDD adherence checks, complexity analysis, coverage gap detection, or defect prediction in the QCSD Development phase."
 category: qcsd-phases
 priority: critical
 version: 1.0.0

package/.claude/skills/qcsd-ideation-swarm/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: qcsd-ideation-swarm
-description: "QCSD Ideation phase swarm for Quality Criteria sessions using HTSM v6.3, Risk Storming, and Testability analysis before development begins. Uses 5-tier browser cascade: Vibium -> agent-browser -> Playwright+Stealth -> WebFetch -> WebSearch-fallback."
+description: "Use when running Quality Criteria sessions during PI/Sprint planning with HTSM v6.3, Risk Storming, or Testability analysis in the QCSD Ideation phase."
 category: qcsd-phases
 priority: critical
 version: 7.5.1

package/.claude/skills/qcsd-production-swarm/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: qcsd-production-swarm
-description: "QCSD Production Telemetry phase swarm for post-release production health assessment using DORA metrics, root cause analysis, defect prediction, and cross-phase feedback loops. Consumes CI/CD outputs (RELEASE/REMEDIATE/BLOCK decisions, release readiness metrics) and produces feedback signals to Ideation and Refinement."
+description: "Use when assessing post-release production health with DORA metrics, root cause analysis, defect prediction, or cross-phase feedback loops in the QCSD Production phase."
 category: qcsd-phases
 priority: critical
 version: 1.0.0

package/.claude/skills/qcsd-refinement-swarm/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: qcsd-refinement-swarm
-description: "QCSD Refinement phase swarm for Sprint Refinement sessions using SFDIPOT product factors, BDD scenario generation, and requirements validation."
+description: "Use when running Sprint Refinement sessions with SFDIPOT product factors, generating BDD scenarios, or validating requirements in the QCSD Refinement phase."
 category: qcsd-phases
 priority: critical
 version: 1.0.0

package/.claude/skills/qe-chaos-resilience/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: "QE Chaos Resilience"
-description: "Chaos engineering and resilience testing including fault injection, load testing, and system recovery validation."
+description: "Use when injecting faults into distributed systems, validating system recovery, running load/stress tests, or building confidence in resilience through chaos engineering."
 trust_tier: 3
 validation:
   schema_path: schemas/output.json
@@ -240,4 +240,4 @@ await resilienceTester.validateSLA({
 **Primary Agents**: qe-chaos-engineer, qe-load-tester, qe-resilience-tester
 **Coordinator**: qe-chaos-coordinator
-**Related Skills**: qe-performance, qe-security-compliance
+**Related Skills**: qe-performance, security-testing

package/.claude/skills/qe-code-intelligence/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: "QE Code Intelligence"
-description: "Knowledge graph-based code understanding with semantic search and 80% token reduction through intelligent context retrieval."
+description: "Use when understanding unfamiliar code, searching semantically across a codebase, mapping dependencies, or reducing context window usage through intelligent retrieval."
 trust_tier: 3
 validation:
   schema_path: schemas/output.json
@@ -207,6 +207,14 @@ aqe kg export --format dot --output codebase.dot
 aqe kg stats
 ```
+## Gotchas
+- WARNING: code-intelligence domain has 18% success rate — prefer direct grep/glob over agent-based code search for simple queries
+- Knowledge graph construction fails on repos >50K LOC — scope to specific modules
+- Semantic search returns irrelevant results without domain-specific embeddings — always verify search results manually
+- Agent claims "80% token reduction" but may skip critical context — verify key files are included in results
+- Fleet must be initialized before using: run `npx ruflo doctor --fix` if you get initialization errors
 ## Coordination
 **Primary Agents**: qe-knowledge-graph, qe-semantic-searcher, qe-dependency-mapper

package/.claude/skills/qe-coverage-analysis/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: "QE Coverage Analysis"
-description: "O(log n) sublinear coverage gap detection with risk-weighted analysis and intelligent test prioritization."
+description: "Use when analyzing test coverage, identifying coverage gaps, prioritizing testing effort, or comparing coverage between branches."
 trust_tier: 3
 validation:
   schema_path: schemas/output.json
@@ -185,6 +185,33 @@ quality_gates:
       - alert_on_decline: 3  # consecutive PRs
 ```
+## Run History
+After each coverage analysis, append results to `run-history.json` in this skill directory:
+```bash
+# Read current history, append new entry, write back
+node -e "
+const fs = require('fs');
+const h = JSON.parse(fs.readFileSync('.claude/skills/qe-coverage-analysis/run-history.json'));
+h.runs.push({date: new Date().toISOString().split('T')[0], statements_pct: STATEMENTS, branches_pct: BRANCHES, gaps_found: GAPS});
+fs.writeFileSync('.claude/skills/qe-coverage-analysis/run-history.json', JSON.stringify(h, null, 2));
+"
+```
+Read `run-history.json` before each run to detect trends (e.g., "coverage dropped 3 consecutive times").
+## Skill Composition
+- **Coverage dropped?** → Use `/coverage-drop-investigator` to trace the cause
+- **Need more tests** → Use `/qe-test-generation` to fill gaps
+- **Validate quality** → Use `/mutation-testing` to ensure coverage means quality
+- **Ship decision** → Feed into `/qe-quality-assessment` for deployment readiness
+## Gotchas
+- High line coverage does NOT mean good tests — 100% coverage with 0% assertions is common agent output. Use mutation testing to verify
+- coverage-analysis domain has 86% success rate — 14% of runs fail on initialization. Always verify results and have fallback plan (e.g. manual coverage tools)
+- Self-learning pipeline may silently stop learning (statusline frozen for days) — only human inspection catches this
 ## Coordination
 **Primary Agents**: qe-coverage-specialist, qe-coverage-analyzer, qe-gap-detector

package/.claude/skills/qe-coverage-analysis/run-history.json ADDED Viewed

@@ -0,0 +1,6 @@
+{
+  "_description": "Coverage analysis run history. Append after each run. Claude reads this to detect trends.",
+  "_format": "Each entry: {date, scope, statements_pct, branches_pct, functions_pct, gaps_found, recommendation}",
+  "_instructions": "After running coverage analysis, append results here. Compare with previous entries to detect trends. Alert if coverage declines 3 consecutive times.",
+  "runs": []
+}

package/.claude/skills/qe-defect-intelligence/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: "QE Defect Intelligence"
-description: "AI-powered defect prediction, pattern learning, and root cause analysis for proactive quality management."
+description: "Use when predicting defects before they escape, analyzing root causes of failures, learning defect patterns, or implementing proactive quality management with AI."
 trust_tier: 3
 validation:
   schema_path: schemas/output.json

package/.claude/skills/qe-learning-optimization/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: "QE Learning Optimization"
-description: "Transfer learning, metrics optimization, and continuous improvement for AI-powered QE agents."
+description: "Use when optimizing QE agent performance, applying transfer learning across test domains, tuning quality metrics, or implementing continuous improvement loops for AI-powered testing."
 trust_tier: 3
 validation:
   schema_path: schemas/output.json

package/.claude/skills/qe-quality-assessment/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: "QE Quality Assessment"
-description: "Comprehensive quality gates, metrics analysis, and deployment readiness assessment for continuous quality assurance."
+description: "Use when evaluating code quality, setting up quality gates, assessing deployment readiness, or generating quality reports."
 trust_tier: 3
 validation:
   schema_path: schemas/output.json
@@ -211,8 +211,35 @@ quality_check:
       - 1  # Warnings only
 ```
+## Run History
+After each quality assessment, append results to `run-history.json` in this skill directory:
+```bash
+node -e "
+const fs = require('fs');
+const h = JSON.parse(fs.readFileSync('.claude/skills/qe-quality-assessment/run-history.json'));
+h.runs.push({date: new Date().toISOString().split('T')[0], gate_result: 'PASS_OR_FAIL', failed_checks: []});
+fs.writeFileSync('.claude/skills/qe-quality-assessment/run-history.json', JSON.stringify(h, null, 2));
+"
+```
+Read `run-history.json` before each run — alert if quality gate failed 3 of last 5 runs.
+## Skill Composition
+- **Before assessment** → Run `/qe-coverage-analysis` and `/mutation-testing` first
+- **If issues found** → Use `/test-failure-investigator` to diagnose failures
+- **For PR review** → Combine with `/code-review-quality` for comprehensive review
+## Gotchas
+- NEVER trust agent-reported pass/fail status — 12 test failures were caught that agents claimed were passing (Nagual pattern, reward 0.92)
+- Completion theater: agent hardcoded version '3.0.0' instead of reading from package.json — verify actual values in output
+- Fix issues in priority waves (P0 → P1 → P2) with verification between each wave — don't fix everything in parallel
+- quality-assessment domain has 53.7% success rate — expect failures and have fallback
+- If HybridMemoryBackend initialization fails, run `npx ruflo doctor --fix` first
 ## Coordination
 **Primary Agents**: qe-quality-analyzer, qe-deployment-advisor, qe-metrics-collector
 **Coordinator**: qe-quality-coordinator
-**Related Skills**: qe-coverage-analysis, qe-security-compliance
+**Related Skills**: qe-coverage-analysis, security-testing

package/.claude/skills/qe-quality-assessment/run-history.json ADDED Viewed

@@ -0,0 +1,6 @@
+{
+  "_description": "Quality assessment run history. Append after each run. Claude reads this to track quality gate pass/fail trends.",
+  "_format": "Each entry: {date, scope, gate_result, scores, failed_checks, recommendation}",
+  "_instructions": "After running quality assessment, append results here. Alert if quality gate fails 3 of last 5 runs. Track which checks fail most often.",
+  "runs": []
+}