npm - agentic-qe - Versions diffs - 3.6.0 → 3.6.2 - Mend

agentic-qe 3.6.0 → 3.6.2

Files changed (204)

package/.claude/agents/v3/qe-devils-advocate.md ADDED

@@ -0,0 +1,218 @@
+---
+name: qe-devils-advocate
+version: "3.6.0"
+updated: "2026-02-09"
+description: Meta-agent that challenges other agents' outputs by finding gaps, questioning assumptions, and critiquing completeness
+v2_compat: null
+domain: quality-assessment
+---
+<qe_agent_definition>
+<identity>
+You are the V3 QE Devil's Advocate, the adversarial reviewer in Agentic QE v3.
+Mission: Challenge other agents' outputs to surface gaps, blind spots, false positives, and unquestioned assumptions before results reach users.
+Domain: quality-assessment (ADR-064)
+V2 Compatibility: New in v3 -- no v2 equivalent.
+</identity>
+<implementation_status>
+Working:
+- Missing edge case detection (boundary values, null/undefined, concurrency)
+- False positive detection in security scans and coverage reports
+- Coverage gap critique (structural vs semantic coverage gaps)
+- Security blind spot identification (missing threat vectors)
+- Assumption questioning (implicit preconditions, happy-path bias)
+- Boundary value gap analysis (off-by-one, overflow, empty collections)
+- Error handling gap detection (missing catch blocks, swallowed errors)
+- Configurable severity thresholds and confidence filters
+- Per-review and cumulative statistics tracking
+Partial:
+- Integration with Queen Coordinator task pipeline
+- Cross-domain challenge coordination
+Planned:
+- Learning from past challenge outcomes (which challenges were acted on)
+- Auto-escalation for repeated unchallenged gaps
+</implementation_status>
+<default_to_action>
+Review outputs immediately when a ChallengeTarget is provided.
+Apply all applicable strategies without confirmation.
+Filter results by configured minConfidence and minSeverity.
+Report challenges in descending severity order.
+Always produce a summary even when no challenges are found.
+</default_to_action>
+<parallel_execution>
+Run all applicable challenge strategies concurrently against the target.
+Strategies are independent -- missing-edge-case, false-positive, coverage-gap, etc. run in parallel.
+Aggregate and sort results by severity after all strategies complete.
+Use up to 7 concurrent strategies per review.
+</parallel_execution>
+<capabilities>
+- **Missing Edge Case Detection**: Identify untested boundary values, null handling, concurrency, and error paths in test generation outputs
+- **False Positive Detection**: Flag likely false positives in security scans and coverage reports by checking for vague descriptions, low confidence, and known false-positive patterns
+- **Coverage Gap Critique**: Challenge coverage claims by checking for missing negative tests, missing integration paths, and semantic gaps not visible in line coverage
+- **Security Blind Spot Identification**: Find missing threat vectors (injection, auth bypass, SSRF, deserialization) not covered by security scan results
+- **Assumption Questioning**: Surface implicit assumptions in quality assessments, requirements validations, and defect predictions
+- **Boundary Value Gap Analysis**: Detect missing tests for off-by-one errors, integer overflow, empty/max-size collections, and Unicode edge cases
+- **Error Handling Gap Detection**: Find missing error handling for network failures, timeouts, malformed input, and resource exhaustion
+</capabilities>
+<memory_namespace>
+Reads:
+- aqe/v3/domains/test-generation/results/* - Test generation outputs to challenge
+- aqe/v3/domains/coverage-analysis/results/* - Coverage reports to critique
+- aqe/v3/domains/security-compliance/scans/* - Security scans to review
+- aqe/v3/domains/quality-assessment/reports/* - Quality reports to question
+Writes:
+- aqe/v3/devils-advocate/reviews/* - Challenge review results
+- aqe/v3/devils-advocate/stats/* - Cumulative challenge statistics
+- aqe/v3/devils-advocate/patterns/* - Learned gap patterns
+Coordination:
+- aqe/v3/queen/tasks/* - Task status updates
+- aqe/v3/domains/*/results/* - Cross-domain output access
+</memory_namespace>
+<learning_protocol>
+**MANDATORY**: When executed via Claude Code Task tool, you MUST call learning MCP tools.
+### Query Past Challenge Patterns BEFORE Review
+```typescript
+mcp__agentic-qe__memory_retrieve({
+  key: "devils-advocate/patterns",
+  namespace: "learning"
+})
+```
+### Required Learning Actions (Call AFTER Review)
+**1. Store Challenge Review Experience:**
+```typescript
+mcp__agentic-qe__memory_store({
+  key: "devils-advocate/outcome-{timestamp}",
+  namespace: "learning",
+  value: {
+    agentId: "qe-devils-advocate",
+    taskType: "challenge-review",
+    reward: <calculated_reward>,
+    outcome: {
+      targetType: "<test-generation|coverage-analysis|security-scan|...>",
+      targetAgentId: "<agent that produced the output>",
+      challengeCount: <number>,
+      highSeverityCount: <number>,
+      overallScore: <0-1>,
+      verdict: "PASSED|CHALLENGED"
+    },
+    patterns: {
+      gapsFound: ["<types of gaps found>"],
+      strategiesUsed: ["<strategies that produced findings>"]
+    }
+  }
+})
+```
+**2. Submit Review Result to Queen:**
+```typescript
+mcp__agentic-qe__task_submit({
+  type: "challenge-review-complete",
+  priority: "p1",
+  payload: {
+    targetAgentId: "...",
+    targetType: "...",
+    challengeCount: <number>,
+    highSeverityCount: <number>,
+    summary: "...",
+    challenges: [...]
+  }
+})
+```
+### Reward Calculation Criteria (0-1 scale)
+| Reward | Criteria |
+|--------|----------|
+| 1.0 | Actionable critical findings confirmed by follow-up |
+| 0.9 | High-severity gaps found with clear evidence |
+| 0.7 | Medium gaps found, strategies well-targeted |
+| 0.5 | Review completed, minor findings only |
+| 0.3 | Review completed, no significant findings (clean output) |
+| 0.0 | Review failed or produced only noise/false challenges |
+</learning_protocol>
+<output_format>
+- JSON for structured challenge results (challenges array, scores, summary)
+- Markdown for human-readable challenge reports
+- Challenges sorted by severity (critical > high > medium > low > informational)
+- Include challenge count, overall confidence score, and per-strategy breakdown
+</output_format>
+<examples>
+Example 1: Challenge test generation output
+```
+Input: Review test-generation output from agent test-gen-001
+  - 5 tests generated for UserService.createUser()
+  - All tests check happy path with valid data
+Output: CHALLENGED (Score: 0.38, 4 challenges)
+  [HIGH] Missing edge case: No test for duplicate email
+  [HIGH] Missing edge case: No test for empty/null username
+  [MEDIUM] Boundary value gap: No max-length validation test
+  [LOW] Error handling gap: No test for database connection failure
+  Summary: 5 tests cover only the happy path. No negative tests,
+  no boundary tests, no error handling tests. Test suite has
+  significant gaps in edge case coverage.
+```
+Example 2: Challenge security scan output
+```
+Input: Review security-scan output from agent sec-scan-001
+  - 0 vulnerabilities found
+  - Scanned: SQL injection, XSS
+Output: CHALLENGED (Score: 0.52, 2 challenges)
+  [HIGH] Security blind spot: No SSRF testing performed
+  [MEDIUM] Security blind spot: No deserialization checks
+  Summary: Scan covers injection and XSS but misses SSRF,
+  deserialization, and authentication bypass vectors.
+```
+</examples>
+<v3_integration>
+### Code Implementation
+The Devil's Advocate agent is implemented in `v3/src/agents/devils-advocate/`:
+- `agent.ts` - Core `DevilsAdvocate` class with `review()` method
+- `strategies.ts` - 7 pluggable challenge strategies
+- `types.ts` - Type definitions for targets, challenges, results
+### Usage
+```typescript
+import { DevilsAdvocate } from '@agentic-qe/v3';
+const da = DevilsAdvocate.createDevilsAdvocate({ minConfidence: 0.5 });
+const result = da.review({
+  type: 'test-generation',
+  agentId: 'test-gen-001',
+  domain: 'test-generation',
+  output: { testCount: 3, tests: [] },
+  timestamp: Date.now(),
+});
+```
+### Strategies
+| Strategy | Applies To | Detects |
+|----------|-----------|---------|
+| MissingEdgeCaseStrategy | test-generation | Untested edge cases, null handling |
+| FalsePositiveDetectionStrategy | security-scan, coverage-analysis | Likely false positives |
+| CoverageGapCritiqueStrategy | coverage-analysis | Semantic gaps in coverage |
+| SecurityBlindSpotStrategy | security-scan | Missing threat vectors |
+| AssumptionQuestioningStrategy | quality-assessment, defect-prediction, requirements | Implicit assumptions |
+| BoundaryValueGapStrategy | test-generation | Off-by-one, overflow, empty collections |
+| ErrorHandlingGapStrategy | test-generation, contract-validation | Missing error handling |
+</v3_integration>
+</qe_agent_definition>
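
The agent definition above says results are filtered by the configured `minConfidence` and `minSeverity` and then reported in descending severity order. A minimal sketch of that step might look like the following. The `Challenge` shape, the `filterAndSort` helper, and the confidence values in the sample are assumptions made for illustration; the package's real definitions live in `types.ts` and `agent.ts`, which this diff does not show.

```typescript
// Hypothetical types for illustration only; not taken from the package source.
type Severity = "critical" | "high" | "medium" | "low" | "informational";

interface Challenge {
  severity: Severity;
  confidence: number; // assumed to be on a 0-1 scale
  title: string;
}

// Rank follows the order stated in <output_format>:
// critical > high > medium > low > informational.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 4,
  high: 3,
  medium: 2,
  low: 1,
  informational: 0,
};

// Keep challenges at or above both thresholds, then sort by descending
// severity, breaking ties by descending confidence.
function filterAndSort(
  challenges: Challenge[],
  minConfidence: number,
  minSeverity: Severity,
): Challenge[] {
  return challenges
    .filter(
      (c) =>
        c.confidence >= minConfidence &&
        SEVERITY_RANK[c.severity] >= SEVERITY_RANK[minSeverity],
    )
    .sort(
      (a, b) =>
        SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity] ||
        b.confidence - a.confidence,
    );
}

// Sample loosely based on Example 1; the confidence numbers are invented.
const sample: Challenge[] = [
  { severity: "low", confidence: 0.9, title: "No test for database connection failure" },
  { severity: "high", confidence: 0.8, title: "No test for duplicate email" },
  { severity: "medium", confidence: 0.6, title: "No max-length validation test" },
  { severity: "high", confidence: 0.4, title: "No test for empty/null username" },
];

const kept = filterAndSort(sample, 0.5, "medium");
```

With thresholds of `0.5` and `"medium"`, the low-severity item and the low-confidence item are dropped, and the remaining high-severity challenge sorts ahead of the medium one.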

package/.claude/agents/v3/qe-quality-criteria-recommender.md CHANGED

@@ -245,7 +245,7 @@ interface QualityCriteriaAnalysis {
 ```
 ## Template Location
-Helper files installed to `.claude/agents/v3/helpers/quality-criteria/`:
+Helper files installed to `.claude/helpers/v3/quality-criteria/`:
 - `quality-criteria-reference-template.html` - HTML output template (MUST read before generating)
 - `htsm-categories.md` - Detailed category definitions
 - `evidence-classification.md` - Evidence type guidelines
@@ -404,7 +404,7 @@ if (!valid) {
 ### Output Validation
 If HTML output requested, always read template first:
 ```
-.claude/agents/v3/helpers/quality-criteria/quality-criteria-reference-template.html
+.claude/helpers/v3/quality-criteria/quality-criteria-reference-template.html
 ```
 </final_validation>
 </qe_agent_definition>

package/.claude/skills/qe-iterative-loop/SKILL.md CHANGED

@@ -441,5 +441,5 @@ When ALL phases complete -> <promise>DEPLOYMENT_READY</promise>
 ---
 **Origin**: Adapted from Ralph Wiggum plugin (anthropics/claude-code)
-**Specialized for**: Agentic QE v3 Fleet with 59 QE agents
+**Specialized for**: Agentic QE v3 Fleet with 60 QE agents
 **Domains**: test-generation, test-execution, coverage-analysis, quality-assessment

package/.claude/skills/release/SKILL.md CHANGED

@@ -144,27 +144,23 @@ Verify init completes without errors and creates the expected project structure
 # Version output
 node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js --version
-# Doctor check
-node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js doctor
+# System status
+node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js status
 ```
 Both must succeed without errors.
-#### 8d. Verify MCP Tools
-```bash
-# Verify MCP server can start and list tools
-node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js mcp --list-tools 2>&1 | head -30
-```
-Should list available MCP tools without crashing.
-#### 8e. Verify Self-Learning & Fleet Capabilities
+#### 8d. Verify Self-Learning & Fleet Capabilities
 ```bash
 cd /tmp/aqe-release-test
-# Verify memory/learning subsystem
-node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js memory list 2>&1 | head -10
+# Verify learning subsystem
+node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js learning stats 2>&1 | head -10
-# Verify agent spawning works
+# Verify agent listing works
 node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js agent list 2>&1 | head -10
+# Verify health check
+node /workspaces/agentic-qe-new/v3/dist/cli/bundle.js health 2>&1 | head -10
 ```
 These should respond (even if empty results) without errors, confirming the subsystems initialize properly.
@@ -177,34 +173,24 @@ rm -rf /tmp/aqe-release-test
 ### 9. Local CI Test Suite
-Run the same tests that CI runs on PRs (`optimized-ci.yml`) and during publish (`npm-publish.yml`). Skip e2e browser tests unless the user explicitly requests them.
+Run the same tests that CI runs on PRs and during publish. Skip e2e browser tests unless the user explicitly requests them.
 ```bash
 cd /workspaces/agentic-qe-new/v3
-# Journey tests (highest-value signal, from optimized-ci.yml)
-npm run test:journeys
-# Code Intelligence tests (MinCut/Graph algorithms, from optimized-ci.yml)
-npm run test:code-intelligence
-# Contract tests (if they exist, from optimized-ci.yml)
-npm run test:contracts 2>/dev/null || echo "No contract tests"
-# Infrastructure tests (from optimized-ci.yml)
-npm run test:infrastructure 2>/dev/null || echo "No infrastructure tests"
-# Regression tests (from optimized-ci.yml)
-npm run test:regression 2>/dev/null || echo "No regression tests"
-# Performance gates (from optimized-ci.yml)
+# Performance gates (fast — validates perf thresholds)
 npm run performance:gate
+# Regression tests (runs full unit suite)
+npm run test:regression
 # Full test:ci suite (from npm-publish.yml — excludes browser/e2e)
 npm run test:ci
 ```
-All mandatory test suites must pass. If any fail, diagnose and fix before continuing.
+Available test scripts: `test:unit`, `test:unit:fast`, `test:unit:heavy`, `test:unit:mcp`, `test:ci`, `test:regression`, `test:safe`, `test:perf`, `test:e2e`, `test:coverage`, `performance:gate`.
+All mandatory test suites must pass. Pre-existing MCP handler test failures (tests that need runtime initialization) are acceptable if they also fail on the main branch.
 **STOP — show all test results.**

package/.claude/skills/skills-manifest.json CHANGED

@@ -904,7 +904,7 @@
   },
   "metadata": {
     "generatedBy": "Agentic QE Fleet",
-    "fleetVersion": "3.6.0",
+    "fleetVersion": "3.6.2",
     "manifestVersion": "1.3.0",
     "lastUpdated": "2026-02-04T00:00:00.000Z",
     "contributors": [

package/README.md CHANGED

@@ -9,35 +9,9 @@
 <img alt="NPM Downloads" src="https://img.shields.io/npm/dw/agentic-qe">
-**V3 (Main)** | [V2 Documentation](v2/docs/V2-README.md) | [Changelog](CHANGELOG.md) | [Contributors](CONTRIBUTORS.md) | [Issues](https://github.com/proffesor-for-testing/agentic-qe/issues) | [Discussions](https://github.com/proffesor-for-testing/agentic-qe/discussions)
+**V3 (Main)** | [V2 Documentation](v2/docs/V2-README.md) | [Release Notes](docs/releases/README.md) | [Changelog](v3/CHANGELOG.md) | [Contributors](CONTRIBUTORS.md) | [Issues](https://github.com/proffesor-for-testing/agentic-qe/issues) | [Discussions](https://github.com/proffesor-for-testing/agentic-qe/discussions)
-> **V3** brings Domain-Driven Design architecture, 13 bounded contexts, 59 specialized QE agents, TinyDancer intelligent model routing, ReasoningBank learning with Dream cycles, HNSW vector search, mathematical Coherence verification, full MinCut/Consensus integration across all 13 domains, and deep integration with [Claude Flow](https://github.com/ruvnet/claude-flow) and [Agentic Flow](https://github.com/ruvnet/agentic-flow).
-### What's New in v3.6.0
-- **Enterprise Integration Domain** — SOAP/WSDL, SAP RFC/BAPI/IDoc, OData, ESB/middleware, message broker, and Segregation of Duties testing (contributed by [@fndlalit](https://github.com/fndlalit))
-- **8 New Agents** — `qe-soap-tester`, `qe-sap-rfc-tester`, `qe-sap-idoc-tester`, `qe-middleware-validator`, `qe-odata-contract-tester`, `qe-message-broker-tester`, `qe-sod-analyzer`, `qe-pentest-validator`
-- **5 New Skills** — `enterprise-integration-testing`, `middleware-testing-patterns`, `wms-testing-patterns`, `observability-testing-patterns`, `pentest-validation` (Tier 3)
-- **Pentest Validation** — Shannon-inspired graduated exploit validation with "No Exploit, No Report" quality gate and 3-tier exploitation
-- **StrongDM Tier 1** — Loop detection + token dashboard for software delivery governance (ADR-062)
-- **Fleet: 59 agents, 75 skills across 13 domains**
-### What's New in v3.5.0
-- **Governance ON by Default** - @claude-flow/guidance integration with 7 unbreakable QE invariants (ADR-058)
-- **QCSD 2.0 Complete Lifecycle** - All 4 phases: Ideation → Refinement → Development → CI/CD Verification
-- **Infrastructure Self-Healing Enterprise** - 12 enterprise error signatures (SAP, Salesforce, Payment Gateway)
-### What's New in v3.4.2
-- **Skill Validation System** - 4-layer trust tiers with schemas, validators, and evaluation suites (ADR-056)
-- **CLI Validation Commands** - `aqe skill report`, `aqe eval run`, regression detection
-### What's New in v3.4.0
-- **AG-UI Protocol** - Anthropic's streaming agent-to-user interface with real-time progress updates
-- **A2A Protocol** - Google's agent-to-agent interoperability standard for cross-tool communication
-- **A2UI Components** - Unified UI combining AG-UI streaming with A2A event handling
+> **V3** brings Domain-Driven Design architecture, 13 bounded contexts, 60 specialized QE agents, TinyDancer intelligent model routing, ReasoningBank learning with Dream cycles, HNSW vector search, mathematical Coherence verification, full MinCut/Consensus integration across all 13 domains, and deep integration with [Claude Flow](https://github.com/ruvnet/claude-flow) and [Agentic Flow](https://github.com/ruvnet/agentic-flow).
 🏗️ **DDD Architecture** | 🧠 **ReasoningBank + Dream Cycles** | 🎯 **TinyDancer Model Routing** | 🔍 **HNSW Vector Search** | 👑 **Queen Coordinator** | 📊 **O(log n) Coverage** | 🔗 **Claude Flow Integration** | 🎯 **13 Bounded Contexts** | 📚 **75 QE Skills** | 🧬 **Coherence Verification** | ✅ **Trust Tiers** | 🛡️ **Governance**
@@ -88,7 +62,7 @@ claude "Use qe-flaky-hunter to analyze the last 100 test runs and stabilize flak
 **What V3 provides:**
 - ✅ **13 DDD Bounded Contexts**: Organized by business domain (test-generation, coverage-analysis, security-compliance, enterprise-integration, etc.)
-- ✅ **59 QE Agents**: Including Queen Coordinator for hierarchical orchestration (52 main + 7 TDD subagents)
+- ✅ **60 QE Agents**: Including Queen Coordinator for hierarchical orchestration (53 main + 7 TDD subagents)
 - ✅ **TinyDancer Model Routing**: 3-tier intelligent routing (Haiku/Sonnet/Opus) for cost optimization
 - ✅ **ReasoningBank Learning**: HNSW-indexed pattern storage with experience replay
 - ✅ **O(log n) Coverage Analysis**: Sublinear algorithms for efficient gap detection
@@ -134,7 +108,7 @@ claude "Assess code quality and provide deployment recommendation"
 | **AI testing tools are expensive** | TinyDancer 3-tier model routing reduces costs by matching task complexity to appropriate model |
 | **No memory between test runs—every analysis starts from scratch** | ReasoningBank remembers patterns, strategies, and what works for your codebase |
 | **Agents waste tokens reading irrelevant code** | Code Intelligence provides token reduction with semantic search and knowledge graphs |
-| **Quality engineering requires complex coordination** | Queen Coordinator orchestrates 59 agents across 13 domains with consensus and MinCut topology |
+| **Quality engineering requires complex coordination** | Queen Coordinator orchestrates 60 agents across 13 domains with consensus and MinCut topology |
 | **Tools don't understand your testing frameworks** | Works with Jest, Cypress, Playwright, Vitest, Mocha, Jasmine, AVA |
 ---
@@ -240,7 +214,7 @@ The **qe-queen-coordinator** manages the entire fleet with intelligent task dist
 ```
 **Capabilities:**
-- Orchestrate 59 QE agents concurrently across 13 domains
+- Orchestrate 60 QE agents concurrently across 13 domains
 - TinyDancer 3-tier model routing (Haiku/Sonnet/Opus) with confidence-based decisions
 - Byzantine fault-tolerant consensus for critical quality decisions
 - MinCut graph-based topology optimization for self-healing coordination
@@ -253,6 +227,34 @@ claude "Use qe-queen-coordinator to orchestrate release validation for v2.1.0 wi
 ---
+### 🤝 Agent Teams & Fleet Coordination
+The Queen Coordinator is extended with **Agent Teams** (ADR-064) for hybrid fleet communication:
+| Feature | Description |
+|---------|-------------|
+| **Mailbox Messaging** | Direct agent-to-agent and domain-scoped broadcast messaging |
+| **Distributed Tracing** | TraceContext propagation across messages for end-to-end task visibility |
+| **Dynamic Scaling** | Workload-based auto-scaling with configurable policies and cooldowns |
+| **Competing Hypotheses** | Multi-agent root cause investigation with evidence scoring, auto-triggered on critical failures |
+| **Federation** | Cross-service routing with health monitoring and service discovery |
+| **Circuit Breakers** | Per-domain fault isolation with automatic recovery |
+| **Task DAG** | Topological ordering with cycle detection for multi-step workflows |
+**Fleet Tiers** — Activate the level of coordination your project needs:
+| Tier | Agents | Best For |
+|------|--------|----------|
+| **Lite** | 1-4 | Small projects, focused tasks |
+| **Standard** | 5-10 | Team projects, multi-domain coordination |
+| **Full** | 11-15 | Enterprise, cross-fleet federation |
+```bash
+claude "Use qe-queen-coordinator with agent teams to investigate flaky test failures across test-execution and defect-intelligence domains"
+```
+---
 ### 🧠 ReasoningBank Learning System
 V3 agents learn and improve through the **ReasoningBank** pattern storage:
@@ -466,17 +468,17 @@ npx @claude-flow/cli@latest agent spawn -t qe-test-architect --name test-gen
 ---
-### 📊 59 Specialized QE Agents
+### 📊 60 Specialized QE Agents
 | Category | Count | Highlights |
 |----------|-------|------------|
-| **Main QE Agents** | 52 | Test generation, coverage, security, performance, accessibility, enterprise integration, pentest validation |
+| **Main QE Agents** | 53 | Test generation, coverage, security, performance, accessibility, enterprise integration, pentest validation |
 | **TDD Subagents** | 7 | RED/GREEN/REFACTOR with code review |
 **V2 Backward Compatibility**: All V2 agents map to V3 equivalents automatically.
 <details>
-<summary><b>📋 View All Main QE Agents (52)</b></summary>
+<summary><b>📋 View All Main QE Agents (53)</b></summary>
 | Agent | Domain | Purpose |
 |-------|--------|---------|
@@ -524,6 +526,7 @@ npx @claude-flow/cli@latest agent spawn -t qe-test-architect --name test-gen
 | qe-product-factors-assessor | quality-assessment | SFDIPOT product factors analysis |
 | qe-test-idea-rewriter | test-generation | Transform passive tests to active actions |
 | qe-quality-criteria-recommender | quality-assessment | HTSM v6.3 Quality Criteria analysis |
+| qe-devils-advocate | quality-assessment | Adversarial review of agent outputs |
 </details>
@@ -794,7 +797,7 @@ agentic-qe/
 │   │   ├── mcp/             # MCP server
 │   │   └── cli/             # V3 CLI
 │   ├── tests/               # 5,600+ tests
-│   └── assets/agents/       # 59 QE agent definitions (52 main + 7 subagents)
+│   └── assets/agents/       # 60 QE agent definitions (53 main + 7 subagents)
 ├── v2/                      # V2 Implementation (Legacy)
 │   ├── src/                 # V2 source code
 │   ├── tests/               # V2 tests
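
The Agent Teams table added to this README lists a Task DAG feature: "Topological ordering with cycle detection for multi-step workflows". One common way to get both properties is Kahn's algorithm, sketched below. This is an illustration of the general technique, not the package's scheduler; the `TaskNode` shape, the `topologicalOrder` function, and the task names are all hypothetical.

```typescript
// Hypothetical task shape; the package's own DAG types are not shown in this diff.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit tasks whose dependencies are all done.
// If some tasks can never be emitted, the graph contains a cycle.
function topologicalOrder(tasks: TaskNode[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const t of tasks) {
    inDegree.set(t.id, 0);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!inDegree.has(dep)) {
        throw new Error(`Unknown dependency: ${dep}`);
      }
      inDegree.set(t.id, inDegree.get(t.id)! + 1);
      dependents.get(dep)!.push(t.id);
    }
  }

  // Tasks with no unmet dependencies are ready immediately.
  const ready = tasks.filter((t) => inDegree.get(t.id) === 0).map((t) => t.id);
  const order: string[] = [];

  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Any task left unordered sits on (or behind) a cycle.
  if (order.length !== tasks.length) {
    throw new Error("Cycle detected in task graph");
  }
  return order;
}

// Illustrative workflow with made-up task names.
const workflow: TaskNode[] = [
  { id: "scan", dependsOn: [] },
  { id: "generate", dependsOn: [] },
  { id: "execute", dependsOn: ["generate"] },
  { id: "report", dependsOn: ["scan", "execute"] },
];

const executionOrder = topologicalOrder(workflow);
```

A scheduler built on this idea could run every task in `ready` concurrently instead of draining the queue one at a time; the sketch keeps it sequential for clarity.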

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentic-qe",
-  "version": "3.6.0",
+  "version": "3.6.2",
   "description": "Agentic Quality Engineering V3 - Domain-Driven Design Architecture with 13 Bounded Contexts, O(log n) coverage analysis, ReasoningBank learning, 59 specialized QE agents, mathematical Coherence verification, deep Claude Flow integration",
   "main": "./v3/dist/index.js",
   "types": "./v3/dist/index.d.ts",

package/scripts/cloud-db-config.json CHANGED

@@ -19,7 +19,7 @@
     }
   },
   "sync": {
-    "enabled": false,
+    "enabled": true,
     "mode": "incremental",
     "interval": "5m",
     "tables": [

package/v3/CHANGELOG.md CHANGED

@@ -5,6 +5,50 @@ All notable changes to Agentic QE will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [3.6.2] - 2026-02-10
+### Fixed
+- **YAML parser empty array crash (Issue #244)** — `aqe init --auto` no longer fails on re-runs when `config.yaml` has empty array fields like `disabled:` with no items. The custom YAML parser now normalizes known array fields after parsing, and `mergeConfigs()` uses defensive `Array.isArray()` checks.
+- **Agent parse errors on helper files (Issue #243)** — Helper reference files (`htsm-categories.md`, `evidence-classification.md`) and the generated `README.md` are no longer placed inside `.claude/agents/v3/` where `claude doctor` would incorrectly parse them as agent definitions. Helpers now install to `.claude/helpers/v3/` and the agents index writes to `.claude/docs/v3-agents-index.md`.
+### Changed
+- **Helper files location** — Agent helper/reference files (quality-criteria templates, SFDIPOT templates) now install to `.claude/helpers/v3/` instead of `.claude/agents/v3/helpers/`. Updated all path references in `quality-criteria-service.ts` and agent definitions.
+## [3.6.1] - 2026-02-09
+### Added
+- **Agent Teams Integration (ADR-064)** — Hybrid fleet architecture layering Claude Code Agent Teams communication patterns on the existing Queen Coordinator. 4-phase implementation: Foundation, Hybrid Architecture, Learning & Observability, Advanced Patterns.
+- **Agent Teams Adapter** — Direct mailbox messaging between agents with domain-scoped teams (2-4 agents per domain), team lead/teammate model, and subscription-based event delivery.
+- **Fleet Tier Selector** — Tiered fleet activation (smoke/standard/deep/crisis) that controls agent count and token costs based on trigger context (commit, PR, release, incident).
+- **Task Dependency DAG** — Topological ordering with cycle detection for multi-step task workflows. DAGScheduler for automated execution of ready tasks.
+- **TeammateIdle Hook** — Auto-assigns pending tasks to idle agents, reducing Queen bottleneck for task distribution.
+- **TaskCompleted Hook** — Extracts patterns from completed tasks and trains them into ReasoningBank automatically. Quality gate validation with exit code 2 rejection.
+- **Domain Circuit Breakers** — Per-domain fault isolation with configurable failure thresholds, half-open recovery probing, and criticality-based configs.
+- **Domain Team Manager** — Creates and manages domain-scoped agent teams with health monitoring, scaling, and rebalancing.
+- **HNSW Graph Construction** — Real O(log n) HNSW insert and search in unified memory, replacing the O(n) linear scan stub.
+- **Distributed Tracing** — TraceCollector with W3C-style TraceContext propagation encoded into AgentMessage correlationId fields. Queen traces full task lifecycles.
+- **Competing Hypotheses** — HypothesisManager for multi-agent root cause investigation with evidence scoring, confidence tracking, and convergence (evidence-scoring, unanimous, majority, timeout). Auto-triggered on p0/p1 task failures.
+- **Cross-Fleet Federation** — FederationMailbox with service registry, domain-based routing, health monitoring via heartbeats, and graceful degradation for unreachable services.
+- **Dynamic Agent Scaling** — DynamicScaler with workload metrics collection, configurable scaling policies (queue depth, idle ratio, error rate thresholds), cooldown enforcement, and executor callbacks. Wired into Queen's metrics loop.
+- **ReasoningBank Pattern Store Adapter** — Bridges TaskCompletedHook pattern extraction to QEReasoningBank storage with domain detection, type mapping, and confidence propagation.
+- **promotePattern() Implementation** — Completes the ReasoningBank promotion stub: delegates to PatternStore.promote() and publishes pattern:promoted events.
+- **Devil's Advocate Agent** — `qe-devils-advocate` agent that challenges other agents' outputs by finding gaps and questioning assumptions.
+- **397+ New Tests** — 282 coordination tests, 67 hook tests, 48 learning tests covering all ADR-064 phases including adapter tracing integration and latency benchmarks.
+### Fixed
+- **6 CodeQL Alerts** — Resolved security alerts in enterprise-integration services (input validation, type safety).
+- **Pattern Training Pipeline** — Connected the disconnected TaskCompletedHook → ReasoningBank pipeline so patterns are automatically trained on task completion.
+- **Queen Operational Wiring** — All Phase 3+4 modules (tracing, dynamic scaler, hypotheses) are now called by Queen's operational flow, not just initialized as shelf-ware.
+### Changed
+- **Queen Coordinator** — Extended with tracing (startTrace on submitTask, completeSpan/failSpan on completion/failure), dynamic scaling (metrics feed + evaluate + execute in metrics loop), and competing hypotheses (auto-investigation on critical failures).
+- **Agent Teams Adapter** — sendMessage() and broadcast() now encode TraceContext into correlationId when provided, enabling end-to-end distributed tracing.
 ## [3.6.0] - 2026-02-08
 ### Added
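
The 3.6.1 entry above describes Domain Circuit Breakers as "per-domain fault isolation with configurable failure thresholds, half-open recovery probing". The state machine behind that description can be sketched roughly as follows. The class name, constructor parameters, and injected clock are assumptions chosen for illustration and do not reflect the package's actual API.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker for a single domain; one instance per domain could be
// kept in a Map keyed by domain name.
class DomainCircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(), // injectable clock, useful in tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns true when a call may proceed. Once the cooldown has elapsed,
  // an open breaker moves to half-open and lets probe calls through.
  canExecute(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  // Any success closes the breaker and resets the failure count.
  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  // A failed probe in half-open reopens immediately; in closed state the
  // breaker opens once the failure threshold is reached.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

The "criticality-based configs" mentioned in the entry would presumably map to different `failureThreshold` and `cooldownMs` values per domain, though the diff does not show the actual settings.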

package/v3/README.md CHANGED

@@ -5,14 +5,14 @@
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
 [![Node.js](https://img.shields.io/badge/Node.js-18+-green.svg)](https://nodejs.org/)
-> Domain-Driven Quality Engineering with Mathematical Coherence Verification, 13 Bounded Contexts, 59 Specialized QE Agents, 75 QE Skills, and ReasoningBank Learning
+> Domain-Driven Quality Engineering with Mathematical Coherence Verification, 13 Bounded Contexts, 60 Specialized QE Agents, 75 QE Skills, and ReasoningBank Learning
 ### Key Features
 | Feature | Description |
 |---------|-------------|
 | **75 QE Skills** | Quality engineering skills with 4-tier trust validation system |
-| **59 QE Agents** | Specialized agents for test generation, security, coverage, enterprise integration, and more |
+| **60 QE Agents** | Specialized agents for test generation, security, coverage, enterprise integration, and more |
 | **13 DDD Domains** | Modular bounded contexts for all quality engineering needs |
 | **MCP Integration** | Full Claude Code integration via Model Context Protocol |
 | **AG-UI/A2A Protocols** | Industry-standard agent streaming and interoperability |
@@ -48,7 +48,7 @@ npx aqe test generate src/
 ## Why Agentic QE?
-- **59 Specialized QE Agents** - Domain-focused quality engineering agents (52 main + 7 subagents)
+- **60 Specialized QE Agents** - Domain-focused quality engineering agents (53 main + 7 subagents)
 - **75 QE Skills** - 46 Tier 3 verified + 29 additional (QCSD swarms, n8n testing, enterprise integration, qe-* domains)
 - **13 DDD Bounded Contexts** - Modular, extensible architecture
 - **TinyDancer Model Routing** - 3-tier intelligent routing for cost optimization
@@ -632,7 +632,7 @@ console.log(`Quality gate: ${gate.value.passed ? 'PASSED' : 'FAILED'}`);
 | Module System | CommonJS | ESM |
 | Memory | SQLite only | HNSW + SQLite hybrid |
 | Learning | Basic patterns | ReasoningBank + SONA + Dream Cycles |
-| Agents | 32 | 59 QE agents (52 main + 7 subagents) |
+| Agents | 32 | 60 QE agents (53 main + 7 subagents) |
 | Skills | 35 | 75 QE skills (46 Tier 3 + 29 additional) |
 | Coverage | O(n) | O(log n) |
 | Pattern Search | Linear | O(log n) HNSW indexing |
@@ -706,9 +706,9 @@ See the [Migration Guide](./docs/MIGRATION-GUIDE.md) for detailed instructions a
 }
 ```
-## 59 QE Agents
+## 60 QE Agents
-Agentic QE includes 59 specialized quality engineering agents (52 main + 7 subagents) organized by domain:
+Agentic QE includes 60 specialized quality engineering agents (53 main + 7 subagents) organized by domain:
 ### Test Generation Domain
 `qe-test-architect`, `qe-tdd-specialist`, `qe-tdd-red`, `qe-tdd-green`, `qe-tdd-refactor`, `qe-property-tester`, `qe-mutation-tester`, `qe-bdd-generator`
@@ -720,7 +720,7 @@ Agentic QE includes 59 specialized quality engineering agents (52 main + 7 subag
 `qe-coverage-specialist`, `qe-gap-detector`, `qe-risk-analyzer`
 ### Quality Assessment Domain
-`qe-quality-gate`, `qe-metrics-optimizer`, `qe-deployment-advisor`
+`qe-quality-gate`, `qe-metrics-optimizer`, `qe-deployment-advisor`, `qe-devils-advocate`
 ### Defect Intelligence Domain
 `qe-defect-intelligence`, `qe-regression-analyzer`, `qe-root-cause-analyzer`