npm - agentic-qe - Versions diffs - 2.3.2 → 2.3.3 - Mend

agentic-qe 2.3.2 → 2.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (127)

package/scripts/README.md ADDED

@@ -0,0 +1,352 @@
+# Documentation Verification Scripts
+Automated scripts to prevent documentation drift and verify feature claims.
+## Overview
+These scripts ensure that documentation (README.md, CLAUDE.md, package.json) stays accurate as the project evolves by:
+- Automatically counting skills, agents, and MCP tools
+- Verifying agent skill references
+- Validating feature implementation against claims
+- Providing automated updates and continuous monitoring
+## Scripts
+### 1. verify-counts.ts
+Counts skills, agents, and MCP tools, then compares against documentation claims.
+**Usage:**
+```bash
+npm run verify:counts
+npm run verify:counts -- --verbose
+npm run verify:counts -- --json
+```
+**What it checks:**
+- Total skills count
+- QE skills count
+- Phase 1 skills count
+- Phase 2 skills count
+- Claude Flow skills count
+- QE agents count
+- MCP tools count
+**Exit codes:**
+- `0` - All counts match documentation
+- `1` - Mismatches found
+**Output:**
+```
+✅ Skill Count Verification
+   - Total Skills: 60 (✓ matches documentation)
+   - QE Skills: 35 (✓ matches documentation)
+   - Phase 1 Skills: 18 (✓ matches documentation)
+   - Phase 2 Skills: 17 (✓ matches documentation)
+⚠️  MCP Tools Count Mismatch
+   - Actual: 61 tools
+   - README.md line 14: claims 52 tools ❌
+```
+### 2. verify-agent-skills.ts
+Validates that agent skill references exist and suggests additions based on specialization.
+**Usage:**
+```bash
+npm run verify:agent-skills
+npm run verify:agent-skills -- --verbose
+npm run verify:agent-skills -- --json
+npm run verify:agent-skills -- --agent=qe-test-generator
+```
+**What it checks:**
+- Skill references in agent markdown files
+- Whether referenced skills exist in `.claude/skills/`
+- Phase 2 skill adoption
+- Skill suggestions based on agent specialization
+**Exit codes:**
+- `0` - All agent skills valid
+- `1` - Missing or broken skill references found
+**Output:**
+```
+🤖 Agent: qe-test-generator
+   Skills Referenced: 5
+   Valid References: 5
+   Broken References: 0
+   Phase 2 Skills: 0
+⚠️  No Phase 2 skills referenced
+💡 SUGGESTED ADDITIONS:
+   - shift-left-testing (matches specialization)
+   - test-design-techniques (matches specialization)
+   - test-data-management (matches specialization)
+```
+### 3. update-documentation-counts.ts
+Automatically updates counts in documentation files based on actual counts.
+**Usage:**
+```bash
+npm run update:counts                    # Apply updates
+npm run update:counts -- --dry-run       # Preview changes
+npm run update:counts -- --verbose       # Detailed output
+```
+**What it updates:**
+- README.md: Skills, agents, and MCP tools counts
+- CLAUDE.md: QE skills and agents counts
+- package.json: MCP tools count in description
+**Safety features:**
+- Creates backups before modification (`.backup-TIMESTAMP`)
+- Dry-run mode to preview changes
+- Detailed changelog of updates
+**Output:**
+```
+📊 Current Counts:
+  Skills (Total): 60
+  Skills (QE): 35
+  Skills (Phase 1): 18
+  Skills (Phase 2): 17
+  Agents (QE): 18
+  MCP Tools: 61
+✅ Operations to apply: 8
+📄 README.md
+  ✓ Update MCP tools count in README header
+  ✓ Update total QE skills
+  ✓ Update Phase 1 skills count
+  ✓ Update Phase 2 skills count
+```
+### 4. verify-features.ts
+Comprehensive verification of feature claims against actual implementation.
+**Usage:**
+```bash
+npm run verify:features
+npm run verify:features -- --verbose
+npm run verify:features -- --json
+npm run verify:features -- --feature=multi-model-router
+```
+**What it verifies:**
+1. **Multi-Model Router** (70-81% cost savings)
+   - AdaptiveModelRouter class exists
+   - Configuration file exists
+   - Cost tracking implemented
+   - Tests exist
+2. **Learning System** (20% improvement target)
+   - LearningEngine class exists
+   - Q-learning algorithm implemented
+   - Experience replay buffer
+   - Tests exist
+3. **Pattern Bank** (85%+ matching accuracy)
+   - QEReasoningBank class exists
+   - Pattern extraction works
+   - Pattern matching implemented
+   - Cross-project sharing
+4. **ML Flaky Detection** (100% accuracy claim)
+   - FlakyTestDetector class exists
+   - ML model implemented
+   - Root cause analysis
+   - Fix recommendations
+5. **Streaming API** (real-time progress)
+   - Streaming handlers exist
+   - AsyncGenerator pattern used
+   - Progress events emitted
+6. **AgentDB Integration**
+   - AgentDB installed
+   - QUIC sync configured
+   - Vector search works
+   - Learning plugins exist
+7. **61 MCP Tools**
+   - Count actual tool definitions
+   - Verify each tool exported
+   - Check handler existence
+8. **Performance Claims**
+   - Test generation: 1000+ tests/minute
+   - Coverage analysis: O(log n) complexity
+   - Data generation: 10,000+ records/second
+   - Pattern matching: <50ms p95
+**Exit codes:**
+- `0` - All features verified (≥80% confidence)
+- `1` - Features missing or low confidence
+**Output:**
+```
+✅ Multi-Model Router
+   Claimed: 70-81% cost savings, intelligent model selection
+   Status: VERIFIED (87.5% confidence)
+   Checks: 7 passed, 1 failed, 0 warnings
+⚠️  Learning System
+   Claimed: 20% improvement target, Q-learning algorithm
+   Status: PARTIAL (62.5% confidence)
+   Detailed Checks:
+   ✓ LearningEngine class found
+   ✓ Q-learning implemented
+   ✗ Experience replay tests missing
+   ✓ Improvement tracking works
+💡 Action Required:
+  • Add tests for experience replay buffer
+```
+## Continuous Integration
+All scripts run automatically in CI/CD via GitHub Actions.
+**Workflow:** `.github/workflows/verify-documentation.yml`
+**Triggers:**
+- Push to main/develop/testing-with-qe branches
+- Pull requests to main/develop
+- Daily scheduled check (2 AM UTC)
+- Manual workflow dispatch
+**On failure:**
+- PR gets comment with results
+- Workflow artifacts contain detailed reports
+- Daily check creates GitHub issue
+## Reports
+All scripts generate JSON reports saved to the `/reports/` directory:
+- `verification-counts-{timestamp}.json`
+- `verification-agent-skills-{timestamp}.json`
+- `verification-features-{timestamp}.json`
+- `update-counts-{timestamp}.json`
+## Best Practices
+1. **Run verification before committing:**
+   ```bash
+   npm run verify:all
+   ```
+2. **Fix count mismatches automatically:**
+   ```bash
+   npm run update:counts -- --dry-run  # Preview
+   npm run update:counts            # Apply
+   ```
+3. **Check feature claims before release:**
+   ```bash
+   npm run verify:features
+   ```
+4. **Review agent skill references quarterly:**
+   ```bash
+   npm run verify:agent-skills -- --verbose
+   ```
+## Development
+### Adding New Checks
+**verify-counts.ts:**
+```typescript
+// Add new category to count
+const newCategory = extractCountFromDocs(
+  readmePath,
+  /(\d+)\s+New\s+Category/i
+);
+results.push({
+  type: 'new-category',
+  category: 'total',
+  actual: actualCount,
+  expected: newCategory || undefined,
+  source: 'README.md',
+  status: newCategory !== null && actualCount === newCategory ? 'match' : 'mismatch'
+});
+```
+**verify-features.ts:**
+```typescript
+function verifyNewFeature(): FeatureVerification {
+  const checks: FeatureCheck[] = [
+    checkClassExists('src/new/Feature.ts', 'Feature'),
+    checkTestsExist('*new-feature*.test.ts'),
+    // ... more checks
+  ];
+  const passCount = checks.filter(c => c.status === 'pass').length;
+  const confidence = (passCount / checks.length) * 100;
+  return {
+    feature: 'New Feature',
+    description: 'Description of new feature',
+    claimed: 'What the docs claim',
+    checks,
+    overallStatus: confidence >= 80 ? 'verified' :
+                   confidence >= 50 ? 'partial' : 'missing',
+    confidence
+  };
+}
+```
+### Testing
+All scripts include comprehensive error handling and can be tested independently:
+```bash
+# Test count verification
+npm run verify:counts
+# Test with verbose output
+npm run verify:counts -- --verbose
+# Test update in dry-run mode
+npm run update:counts -- --dry-run
+# Test specific agent
+npm run verify:agent-skills -- --agent=qe-test-generator
+# Test specific feature
+npm run verify:features -- --feature=multi-model-router
+```
+## Troubleshooting
+### "Pattern not found" warnings
+If the update script can't find a pattern, the documentation format may have changed. Update the regex pattern in `update-documentation-counts.ts`.
+### "File not found" errors
+Ensure the project structure hasn't changed. Update file paths in the verification scripts.
+### CI failing but local passing
+Check that all files are committed, especially in `.claude/` directories which might be gitignored.
+## Dependencies
+- **tsx**: TypeScript execution
+- **fs**: File system operations
+- **path**: Path manipulation
+- **child_process**: For running shell commands
+Apart from tsx, no external dependencies are required beyond Node.js built-ins.
+## License
+MIT - Part of Agentic QE Fleet System
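Reviewer note on `verify-counts.ts`: the README describes extracting a claimed count from the documentation with a regex and comparing it with the actual count. A minimal sketch of that comparison follows, assuming a string-based helper. The names `extractCount` and `compareCount` are hypothetical and are not taken from the package, whose `extractCountFromDocs` takes a file path instead.

```javascript
// Hypothetical helper: return the first captured number in `text`, or null if
// the pattern does not match. The pattern must contain one capture group.
function extractCount(text, pattern) {
  const match = text.match(pattern);
  return match ? parseInt(match[1], 10) : null;
}

// Hypothetical helper: compare an actual count with the documented claim,
// using the same match/mismatch rule as the README snippet.
function compareCount(actual, text, pattern) {
  const expected = extractCount(text, pattern);
  return {
    actual,
    expected,
    status: expected !== null && actual === expected ? 'match' : 'mismatch'
  };
}

// Illustrative input only; this sentence is not the package's real README.
const readme = 'Ships with 52 MCP tools and 60 skills.';
console.log(compareCount(61, readme, /(\d+)\s+MCP\s+tools/i));
// { actual: 61, expected: 52, status: 'mismatch' }
```

A missing pattern yields `expected: null` and therefore a mismatch, which is consistent with the "Pattern not found" troubleshooting entry: a changed documentation format surfaces as a failure rather than passing silently.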

package/scripts/hooks/capture-task-learning.js ADDED

@@ -0,0 +1,191 @@
+#!/usr/bin/env node
+/**
+ * PostToolUse Hook: Automatic Task Learning Capture
+ *
+ * This hook automatically captures learnings from completed Task agents
+ * and persists them to memory.db. It provides a safety net ensuring
+ * learnings are captured even when agents don't explicitly call MCP tools.
+ *
+ * Input (via stdin): PostToolUse JSON with tool_input and tool_response
+ * Output: Stores learning_experience record to .agentic-qe/memory.db
+ *
+ * @module hooks/capture-task-learning
+ */
+const fs = require('fs');
+const path = require('path');
+// Read stdin
+let input = '';
+process.stdin.setEncoding('utf8');
+process.stdin.on('data', chunk => input += chunk);
+process.stdin.on('end', () => processTaskLearning(input));
+async function processTaskLearning(jsonInput) {
+  try {
+    const data = JSON.parse(jsonInput);
+    // Only process completed Task tools
+    if (data.tool_name !== 'Task' || data.tool_response?.status !== 'completed') {
+      return;
+    }
+    // Extract key information
+    const agentType = data.tool_input?.subagent_type || 'unknown';
+    const taskDescription = data.tool_input?.description || '';
+    const prompt = data.tool_input?.prompt || '';
+    const agentOutput = data.tool_response?.content?.[0]?.text || '';
+    const durationMs = data.tool_response?.totalDurationMs || 0;
+    const totalTokens = data.tool_response?.totalTokens || 0;
+    const toolUseCount = data.tool_response?.totalToolUseCount || 0;
+    const agentId = data.tool_response?.agentId || 'unknown';
+    const cwd = data.cwd || process.cwd();
+    // Skip if no meaningful output
+    if (!agentOutput || agentOutput.length < 20) {
+      return;
+    }
+    // Determine task type from agent type
+    const taskTypeMap = {
+      'qe-test-generator': 'test-generation',
+      'qe-coverage-analyzer': 'coverage-analysis',
+      'qe-security-scanner': 'security-scan',
+      'qe-performance-tester': 'performance-test',
+      'qe-flaky-test-hunter': 'flaky-detection',
+      'qe-chaos-engineer': 'chaos-testing',
+      'qe-code-complexity': 'complexity-analysis',
+      'qe-quality-gate': 'quality-gate',
+      'qe-regression-risk-analyzer': 'regression-analysis',
+      'qe-requirements-validator': 'requirements-validation',
+      'qe-test-data-architect': 'test-data-generation',
+      'qe-visual-tester': 'visual-testing',
+      'qe-api-contract-validator': 'contract-validation',
+      'qe-fleet-commander': 'fleet-coordination',
+      'qe-test-executor': 'test-execution',
+      'qe-quality-analyzer': 'quality-analysis',
+      'qe-deployment-readiness': 'deployment-readiness',
+      'qe-production-intelligence': 'production-intelligence',
+      'qx-partner': 'qx-analysis',
+      'researcher': 'research',
+      'coder': 'implementation',
+      'tester': 'testing',
+      'reviewer': 'code-review'
+    };
+    const taskType = taskTypeMap[agentType] || agentType;
+    // Calculate reward based on output quality indicators
+    let reward = 0.7; // Base reward for completion
+    // Increase reward for comprehensive output
+    if (agentOutput.length > 500) reward += 0.05;
+    if (agentOutput.length > 1000) reward += 0.05;
+    if (agentOutput.length > 2000) reward += 0.05;
+    // Increase reward for tool usage (indicates thorough work)
+    if (toolUseCount > 0) reward += 0.05;
+    if (toolUseCount > 5) reward += 0.05;
+    // Check for success indicators in output
+    const successIndicators = ['✓', '✅', 'success', 'complete', 'passed', 'found', 'created', 'generated'];
+    const failureIndicators = ['❌', 'failed', 'error', 'could not', 'unable to'];
+    const hasSuccess = successIndicators.some(ind => agentOutput.toLowerCase().includes(ind.toLowerCase()));
+    const hasFailure = failureIndicators.some(ind => agentOutput.toLowerCase().includes(ind.toLowerCase()));
+    if (hasSuccess && !hasFailure) reward += 0.1;
+    if (hasFailure) reward -= 0.2;
+    // Cap reward between 0.1 and 1.0
+    reward = Math.max(0.1, Math.min(1.0, reward));
+    // Find memory.db path
+    let dbPath = path.join(cwd, '.agentic-qe', 'memory.db');
+    // Check if database exists
+    if (!fs.existsSync(dbPath)) {
+      // Fall back to the database relative to this script
+      dbPath = path.join(__dirname, '../../.agentic-qe/memory.db');
+      if (!fs.existsSync(dbPath)) {
+        console.error('💡 Learning: memory.db not found, skipping capture');
+        return;
+      }
+    }
+    // Store learning experience
+    const Database = require('better-sqlite3');
+    const db = new Database(dbPath);
+    // Note: Table should already exist from aqe init
+    // Schema: id, agent_id, task_id, task_type, state, action, reward, next_state, episode_id, metadata, created_at, timestamp
+    // DEDUPLICATION: Check if agent already stored learning via MCP in last 60 seconds
+    // This prevents double-storing when agents properly call MCP learning tools
+    const recentLearning = db.prepare(`
+      SELECT id, metadata FROM learning_experiences
+      WHERE agent_id = ?
+        AND created_at > datetime('now', '-60 seconds')
+      ORDER BY created_at DESC
+      LIMIT 1
+    `).get(agentType);
+    if (recentLearning) {
+      // Check if it was stored by the agent (not by hook)
+      try {
+        const meta = JSON.parse(recentLearning.metadata || '{}');
+        if (meta.capturedBy !== 'PostToolUse-hook') {
+          // Agent already stored learning via MCP - skip duplicate
+          console.log(`📚 Learning: Agent ${agentType} already stored learning via MCP - skipping hook capture`);
+          db.close();
+          return;
+        }
+      } catch { /* ignore parse errors */ }
+    }
+    // Extract outcome summary (first 500 chars of output)
+    const outcomeSummary = agentOutput.substring(0, 500).replace(/\n/g, ' ').trim();
+    // Insert learning experience (matching actual schema)
+    const stmt = db.prepare(`
+      INSERT INTO learning_experiences (agent_id, task_id, task_type, state, action, reward, next_state, episode_id, metadata)
+      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
+    `);
+    const metadata = JSON.stringify({
+      taskDescription,
+      durationMs,
+      totalTokens,
+      toolUseCount,
+      outputLength: agentOutput.length,
+      outputSummary: outcomeSummary,
+      success: hasSuccess && !hasFailure,
+      capturedBy: 'PostToolUse-hook',
+      sessionAgentId: agentId
+    });
+    const taskId = `task-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
+    const episodeId = `episode-${new Date().toISOString().slice(0, 10)}`;
+    stmt.run(
+      agentType,
+      taskId,
+      taskType,
+      'task-started',
+      'execute-task',
+      reward,
+      'task-completed',
+      episodeId,
+      metadata
+    );
+    db.close();
+    // Output confirmation (visible in hook output)
+    console.log(`📚 Learning captured: ${agentType} → ${taskType} (reward: ${reward.toFixed(2)})`);
+  } catch (error) {
+    // Silently fail - don't break the workflow
+    // Uncomment for debugging:
+    // console.error('Hook error:', error.message);
+  }
+}
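Reviewer note on the reward heuristic: the scoring logic in `processTaskLearning` can be read more easily when isolated as a pure function. The sketch below restates the same thresholds and indicator lists from the hook. The function name `computeReward` is an assumption introduced here for illustration and does not exist in the package.

```javascript
// Hypothetical extraction of the hook's reward heuristic into a pure function.
// Thresholds and indicator lists are copied from capture-task-learning.js.
function computeReward(agentOutput, toolUseCount) {
  let reward = 0.7; // base reward for completion

  // Longer output earns up to +0.15
  if (agentOutput.length > 500) reward += 0.05;
  if (agentOutput.length > 1000) reward += 0.05;
  if (agentOutput.length > 2000) reward += 0.05;

  // Tool usage earns up to +0.10
  if (toolUseCount > 0) reward += 0.05;
  if (toolUseCount > 5) reward += 0.05;

  // Case-insensitive substring checks against the output
  const successIndicators = ['✓', '✅', 'success', 'complete', 'passed', 'found', 'created', 'generated'];
  const failureIndicators = ['❌', 'failed', 'error', 'could not', 'unable to'];
  const text = agentOutput.toLowerCase();
  const hasSuccess = successIndicators.some(ind => text.includes(ind.toLowerCase()));
  const hasFailure = failureIndicators.some(ind => text.includes(ind.toLowerCase()));

  if (hasSuccess && !hasFailure) reward += 0.1;
  if (hasFailure) reward -= 0.2;

  // Clamp to [0.1, 1.0]
  return Math.max(0.1, Math.min(1.0, reward));
}

console.log(computeReward('All tests passed', 0).toFixed(2));  // 0.80
console.log(computeReward('0 errors found', 0).toFixed(2));    // 0.50
```

The second call shows a property of substring matching worth keeping in mind when reviewing: output such as "0 errors found" contains `error`, so it is scored as a failure even though it reports a clean run.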

package/scripts/hooks/emit-task-complete.sh ADDED

@@ -0,0 +1,35 @@
+#!/bin/bash
+# Hook: Post-Task - Emit agent completion event to visualization
+# Receives JSON via stdin from Claude Code hook system
+# Read input from stdin
+INPUT=$(cat)
+# Try to get the agent ID from the temp file or generate from description
+if [ -f "/tmp/aqe-viz/current-agent-$$" ]; then
+  AGENT_ID=$(cat "/tmp/aqe-viz/current-agent-$$")
+  rm -f "/tmp/aqe-viz/current-agent-$$"
+else
+  # Fallback: extract from input
+  DESC=$(echo "$INPUT" | jq -r '.tool_input.description // .tool_input.prompt // "task-agent"' 2>/dev/null | head -c 50)
+  AGENT_ID=$(echo "$DESC" | tr ' ' '-' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]//g' | head -c 30)
+  AGENT_ID="${AGENT_ID:-task-agent}-recent"
+fi
+# Check if task was successful (from tool response)
+SUCCESS=$(echo "$INPUT" | jq -r '.tool_response.success // .result.success // "unknown"' 2>/dev/null)
+# Calculate approximate duration (we don't have exact timing, use 0)
+DURATION=0
+# Emit completion or error event based on success status (non-blocking, background)
+(
+  if [ "$SUCCESS" = "false" ]; then
+    ERROR_MSG=$(echo "$INPUT" | jq -r '.tool_response.error // .result.error // "Task failed"' 2>/dev/null)
+    npx tsx scripts/emit-agent-event.ts error "$AGENT_ID" "$ERROR_MSG" 2>/dev/null
+  else
+    npx tsx scripts/emit-agent-event.ts complete "$AGENT_ID" "$DURATION" 2>/dev/null
+  fi
+) &
+exit 0

package/scripts/hooks/emit-task-spawn.sh ADDED

@@ -0,0 +1,27 @@
+#!/bin/bash
+# Hook: Pre-Task - Emit agent spawn event to visualization
+# Receives JSON via stdin from Claude Code hook system
+# Read input from stdin
+INPUT=$(cat)
+# Extract task description and agent type from hook input
+DESC=$(echo "$INPUT" | jq -r '.tool_input.description // .tool_input.prompt // "task-agent"' 2>/dev/null | head -c 50)
+AGENT_TYPE=$(echo "$INPUT" | jq -r '.tool_input.subagent_type // .tool_input.agent // "coder"' 2>/dev/null)
+# Generate agent ID from description
+AGENT_ID=$(echo "$DESC" | tr ' ' '-' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]//g' | head -c 30)
+AGENT_ID="${AGENT_ID:-task-agent}-$(date +%s)"
+# Store agent ID in temp file for completion hook
+mkdir -p /tmp/aqe-viz
+echo "$AGENT_ID" > "/tmp/aqe-viz/current-agent-$$"
+echo "$AGENT_ID" >> "/tmp/aqe-viz/agent-registry"
+# Emit spawn and start events (non-blocking, background)
+(
+  npx tsx scripts/emit-agent-event.ts spawn "$AGENT_ID" "$AGENT_TYPE" 2>/dev/null
+  npx tsx scripts/emit-agent-event.ts start "$AGENT_ID" 2>/dev/null
+) &
+exit 0
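Reviewer note on agent ID derivation: both shell hooks build the agent ID by truncating the description, replacing spaces with hyphens, lowercasing, and stripping everything outside `[a-z0-9-]`. The sketch below mirrors that pipeline in JavaScript for single-line ASCII descriptions. The function names are hypothetical, and the shell version counts bytes rather than characters, so the two may differ for non-ASCII input.

```javascript
// Hypothetical JavaScript equivalent of the shell pipeline:
//   head -c 50 | tr ' ' '-' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]//g' | head -c 30
function toAgentSlug(description) {
  const slug = String(description)
    .slice(0, 50)                  // keep the first 50 characters
    .replace(/ /g, '-')            // spaces become hyphens
    .toLowerCase()
    .replace(/[^a-z0-9-]/g, '')    // drop anything outside a-z, 0-9, hyphen
    .slice(0, 30);                 // cap the slug at 30 characters
  return slug || 'task-agent';     // same default as ${AGENT_ID:-task-agent}
}

// The spawn hook appends the epoch time in seconds to the slug.
function spawnAgentId(description, nowSeconds = Math.floor(Date.now() / 1000)) {
  return `${toAgentSlug(description)}-${nowSeconds}`;
}

console.log(spawnAgentId('Generate Unit Tests for API!', 1700000000));
// generate-unit-tests-for-api-1700000000
```

One thing to check when reviewing the two hooks together: `$$` expands to the PID of the shell running each script. If the spawn and completion hooks run as separate processes, the completion hook will likely not find the `current-agent-$$` file written by the spawn hook. It would then fall back to the description-derived ID with the `-recent` suffix, which does not match the timestamped ID emitted at spawn.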

package/.claude/agents/failing-agent.json DELETED

@@ -1,9 +0,0 @@
-{
-  "name": "failing-agent",
-  "capabilities": [
-    "debug",
-    "test"
-  ],
-  "status": "active",
-  "tasks": []
-}

package/.claude/agents/test-agent.json DELETED

@@ -1,9 +0,0 @@
-{
-  "name": "test-agent",
-  "capabilities": [
-    "debug",
-    "test"
-  ],
-  "status": "active",
-  "tasks": []
-}

package/dist/App.d.ts DELETED

@@ -1,5 +0,0 @@
-import React from 'react';
-import './App.css';
-declare const App: React.FC;
-export default App;
-//# sourceMappingURL=App.d.ts.map

package/dist/App.d.ts.map DELETED

@@ -1 +0,0 @@
-{"version":3,"file":"App.d.ts","sourceRoot":"","sources":["../src/App.tsx"],"names":[],"mappings":"AAAA,OAAO,KAAK,MAAM,OAAO,CAAC;AAG1B,OAAO,WAAW,CAAC;AAEnB,QAAA,MAAM,GAAG,EAAE,KAAK,CAAC,EAMhB,CAAC;AAEF,eAAe,GAAG,CAAC"}

package/dist/App.js DELETED

@@ -1,15 +0,0 @@
-"use strict";
-var __importDefault = (this && this.__importDefault) || function (mod) {
-    return (mod && mod.__esModule) ? mod : { "default": mod };
-};
-Object.defineProperty(exports, "__esModule", { value: true });
-const react_1 = __importDefault(require("react"));
-const DashboardContext_1 = require("./contexts/DashboardContext");
-const Dashboard_1 = require("./components/Dashboard/Dashboard");
-require("./App.css");
-const App = () => {
-    return (react_1.default.createElement(DashboardContext_1.DashboardProvider, null,
-        react_1.default.createElement(Dashboard_1.Dashboard, null)));
-};
-exports.default = App;
-//# sourceMappingURL=App.js.map

package/dist/App.js.map DELETED

@@ -1 +0,0 @@
-{"version":3,"file":"App.js","sourceRoot":"","sources":["../src/App.tsx"],"names":[],"mappings":";;;;;AAAA,kDAA0B;AAC1B,kEAAgE;AAChE,gEAA6D;AAC7D,qBAAmB;AAEnB,MAAM,GAAG,GAAa,GAAG,EAAE;IACzB,OAAO,CACL,8BAAC,oCAAiB;QAChB,8BAAC,qBAAS,OAAG,CACK,CACrB,CAAC;AACJ,CAAC,CAAC;AAEF,kBAAe,GAAG,CAAC"}