npm - agentic-qe - Versions diffs - 2.0.0 → 2.1.1 - Mend

agentic-qe 2.0.0 → 2.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (144) hide show

package/.claude/agents/qx-partner.md +17 -4
package/.claude/skills/accessibility-testing/SKILL.md +144 -692
package/.claude/skills/agentic-quality-engineering/SKILL.md +176 -529
package/.claude/skills/api-testing-patterns/SKILL.md +180 -560
package/.claude/skills/brutal-honesty-review/SKILL.md +113 -603
package/.claude/skills/bug-reporting-excellence/SKILL.md +116 -517
package/.claude/skills/chaos-engineering-resilience/SKILL.md +127 -72
package/.claude/skills/cicd-pipeline-qe-orchestrator/SKILL.md +209 -404
package/.claude/skills/code-review-quality/SKILL.md +158 -608
package/.claude/skills/compatibility-testing/SKILL.md +148 -38
package/.claude/skills/compliance-testing/SKILL.md +132 -63
package/.claude/skills/consultancy-practices/SKILL.md +114 -446
package/.claude/skills/context-driven-testing/SKILL.md +117 -381
package/.claude/skills/contract-testing/SKILL.md +176 -141
package/.claude/skills/database-testing/SKILL.md +137 -130
package/.claude/skills/exploratory-testing-advanced/SKILL.md +160 -629
package/.claude/skills/holistic-testing-pact/SKILL.md +140 -188
package/.claude/skills/localization-testing/SKILL.md +145 -33
package/.claude/skills/mobile-testing/SKILL.md +132 -448
package/.claude/skills/mutation-testing/SKILL.md +147 -41
package/.claude/skills/performance-testing/SKILL.md +200 -546
package/.claude/skills/quality-metrics/SKILL.md +164 -519
package/.claude/skills/refactoring-patterns/SKILL.md +132 -699
package/.claude/skills/regression-testing/SKILL.md +120 -926
package/.claude/skills/risk-based-testing/SKILL.md +157 -660
package/.claude/skills/security-testing/SKILL.md +199 -538
package/.claude/skills/sherlock-review/SKILL.md +163 -699
package/.claude/skills/shift-left-testing/SKILL.md +161 -465
package/.claude/skills/shift-right-testing/SKILL.md +161 -519
package/.claude/skills/six-thinking-hats/SKILL.md +175 -1110
package/.claude/skills/skills-manifest.json +71 -20
package/.claude/skills/tdd-london-chicago/SKILL.md +131 -448
package/.claude/skills/technical-writing/SKILL.md +103 -154
package/.claude/skills/test-automation-strategy/SKILL.md +166 -772
package/.claude/skills/test-data-management/SKILL.md +126 -910
package/.claude/skills/test-design-techniques/SKILL.md +179 -89
package/.claude/skills/test-environment-management/SKILL.md +136 -91
package/.claude/skills/test-reporting-analytics/SKILL.md +169 -92
package/.claude/skills/testability-scoring/SKILL.md +172 -538
package/.claude/skills/testability-scoring/scripts/generate-html-report.js +0 -0
package/.claude/skills/visual-testing-advanced/SKILL.md +155 -78
package/.claude/skills/xp-practices/SKILL.md +151 -587
package/CHANGELOG.md +86 -0
package/README.md +23 -16
package/dist/agents/QXPartnerAgent.d.ts +47 -1
package/dist/agents/QXPartnerAgent.d.ts.map +1 -1
package/dist/agents/QXPartnerAgent.js +2086 -125
package/dist/agents/QXPartnerAgent.js.map +1 -1
package/dist/agents/lifecycle/AgentLifecycleManager.d.ts.map +1 -1
package/dist/agents/lifecycle/AgentLifecycleManager.js +34 -31
package/dist/agents/lifecycle/AgentLifecycleManager.js.map +1 -1
package/dist/cli/commands/init-claude-md-template.d.ts.map +1 -1
package/dist/cli/commands/init-claude-md-template.js +14 -0
package/dist/cli/commands/init-claude-md-template.js.map +1 -1
package/dist/core/SwarmCoordinator.d.ts +180 -0
package/dist/core/SwarmCoordinator.d.ts.map +1 -0
package/dist/core/SwarmCoordinator.js +473 -0
package/dist/core/SwarmCoordinator.js.map +1 -0
package/dist/core/memory/ReflexionMemoryAdapter.d.ts +109 -0
package/dist/core/memory/ReflexionMemoryAdapter.d.ts.map +1 -0
package/dist/core/memory/ReflexionMemoryAdapter.js +306 -0
package/dist/core/memory/ReflexionMemoryAdapter.js.map +1 -0
package/dist/core/memory/RuVectorPatternStore.d.ts +28 -0
package/dist/core/memory/RuVectorPatternStore.d.ts.map +1 -1
package/dist/core/memory/RuVectorPatternStore.js +70 -0
package/dist/core/memory/RuVectorPatternStore.js.map +1 -1
package/dist/core/memory/SparseVectorSearch.d.ts +55 -0
package/dist/core/memory/SparseVectorSearch.d.ts.map +1 -0
package/dist/core/memory/SparseVectorSearch.js +130 -0
package/dist/core/memory/SparseVectorSearch.js.map +1 -0
package/dist/core/memory/TieredCompression.d.ts +81 -0
package/dist/core/memory/TieredCompression.d.ts.map +1 -0
package/dist/core/memory/TieredCompression.js +270 -0
package/dist/core/memory/TieredCompression.js.map +1 -0
package/dist/core/memory/index.d.ts +6 -0
package/dist/core/memory/index.d.ts.map +1 -1
package/dist/core/memory/index.js +29 -1
package/dist/core/memory/index.js.map +1 -1
package/dist/core/metrics/MetricsAggregator.d.ts +228 -0
package/dist/core/metrics/MetricsAggregator.d.ts.map +1 -0
package/dist/core/metrics/MetricsAggregator.js +482 -0
package/dist/core/metrics/MetricsAggregator.js.map +1 -0
package/dist/core/metrics/index.d.ts +5 -0
package/dist/core/metrics/index.d.ts.map +1 -0
package/dist/core/metrics/index.js +11 -0
package/dist/core/metrics/index.js.map +1 -0
package/dist/core/optimization/SwarmOptimizer.d.ts +5 -0
package/dist/core/optimization/SwarmOptimizer.d.ts.map +1 -1
package/dist/core/optimization/SwarmOptimizer.js +17 -0
package/dist/core/optimization/SwarmOptimizer.js.map +1 -1
package/dist/core/orchestration/AdaptiveScheduler.d.ts +190 -0
package/dist/core/orchestration/AdaptiveScheduler.d.ts.map +1 -0
package/dist/core/orchestration/AdaptiveScheduler.js +460 -0
package/dist/core/orchestration/AdaptiveScheduler.js.map +1 -0
package/dist/core/orchestration/WorkflowOrchestrator.d.ts +13 -0
package/dist/core/orchestration/WorkflowOrchestrator.d.ts.map +1 -1
package/dist/core/orchestration/WorkflowOrchestrator.js +32 -0
package/dist/core/orchestration/WorkflowOrchestrator.js.map +1 -1
package/dist/core/recovery/CircuitBreaker.d.ts +176 -0
package/dist/core/recovery/CircuitBreaker.d.ts.map +1 -0
package/dist/core/recovery/CircuitBreaker.js +382 -0
package/dist/core/recovery/CircuitBreaker.js.map +1 -0
package/dist/core/recovery/RecoveryOrchestrator.d.ts +186 -0
package/dist/core/recovery/RecoveryOrchestrator.d.ts.map +1 -0
package/dist/core/recovery/RecoveryOrchestrator.js +476 -0
package/dist/core/recovery/RecoveryOrchestrator.js.map +1 -0
package/dist/core/recovery/RetryStrategy.d.ts +127 -0
package/dist/core/recovery/RetryStrategy.d.ts.map +1 -0
package/dist/core/recovery/RetryStrategy.js +314 -0
package/dist/core/recovery/RetryStrategy.js.map +1 -0
package/dist/core/recovery/index.d.ts +8 -0
package/dist/core/recovery/index.d.ts.map +1 -0
package/dist/core/recovery/index.js +27 -0
package/dist/core/recovery/index.js.map +1 -0
package/dist/core/skills/DependencyResolver.d.ts +99 -0
package/dist/core/skills/DependencyResolver.d.ts.map +1 -0
package/dist/core/skills/DependencyResolver.js +260 -0
package/dist/core/skills/DependencyResolver.js.map +1 -0
package/dist/core/skills/ManifestGenerator.d.ts +114 -0
package/dist/core/skills/ManifestGenerator.d.ts.map +1 -0
package/dist/core/skills/ManifestGenerator.js +449 -0
package/dist/core/skills/ManifestGenerator.js.map +1 -0
package/dist/core/skills/index.d.ts +9 -0
package/dist/core/skills/index.d.ts.map +1 -0
package/dist/core/skills/index.js +24 -0
package/dist/core/skills/index.js.map +1 -0
package/dist/mcp/handlers/chaos/chaos-inject-failure.d.ts +5 -0
package/dist/mcp/handlers/chaos/chaos-inject-failure.d.ts.map +1 -1
package/dist/mcp/handlers/chaos/chaos-inject-failure.js +36 -2
package/dist/mcp/handlers/chaos/chaos-inject-failure.js.map +1 -1
package/dist/mcp/handlers/chaos/chaos-inject-latency.d.ts +5 -0
package/dist/mcp/handlers/chaos/chaos-inject-latency.d.ts.map +1 -1
package/dist/mcp/handlers/chaos/chaos-inject-latency.js +36 -2
package/dist/mcp/handlers/chaos/chaos-inject-latency.js.map +1 -1
package/dist/mcp/server.d.ts +9 -9
package/dist/mcp/server.d.ts.map +1 -1
package/dist/mcp/server.js +1 -2
package/dist/mcp/server.js.map +1 -1
package/dist/types/qx.d.ts +113 -7
package/dist/types/qx.d.ts.map +1 -1
package/dist/types/qx.js.map +1 -1
package/dist/visualization/api/RestEndpoints.js +1 -1
package/dist/visualization/api/RestEndpoints.js.map +1 -1
package/package.json +15 -54

package/.claude/skills/sherlock-review/SKILL.md CHANGED Viewed

@@ -1,786 +1,250 @@
 ---
-name: "Sherlock Review"
+name: sherlock-review
 description: "Evidence-based investigative code review using deductive reasoning to determine what actually happened versus what was claimed. Use when verifying implementation claims, investigating bugs, validating fixes, or conducting root cause analysis. Elementary approach to finding truth through systematic observation."
+category: quality-review
+priority: high
+tokenEstimate: 1100
+agents: [qe-code-reviewer, qe-security-auditor, qe-performance-validator]
+implementation_status: optimized
+optimization_version: 1.0
+last_optimized: 2025-12-03
+dependencies: []
+quick_reference_card: true
+tags: [investigation, evidence-based, code-review, root-cause, deduction]
 ---
 # Sherlock Review
-## What This Skill Does
-Conducts methodical, evidence-based investigation of code, tests, and system behavior using Holmesian deductive reasoning. Unlike traditional code reviews that focus on style and best practices, Sherlock Review investigates **what actually happened** versus **what was claimed to happen**, seeing what others miss through systematic observation and logical deduction.
-## Prerequisites
-- Access to codebase and version control history
-- Ability to run tests and reproduce issues
-- Understanding of the domain and system architecture
-- Critical thinking and skepticism
----
-## Quick Start (Elementary Method)
-### The 3-Step Investigation
+<default_to_action>
+When investigating code claims:
+1. OBSERVE: Gather all evidence (code, tests, history, behavior)
+2. DEDUCE: What does evidence actually show vs. what was claimed?
+3. ELIMINATE: Rule out what cannot be true
+4. CONCLUDE: Does evidence support the claim?
+5. DOCUMENT: Findings with proof, not assumptions
+**The 3-Step Investigation:**
 ```bash
-# 1. OBSERVE: Gather all evidence
-git log --oneline -10
+# 1. OBSERVE: Gather evidence
 git diff <commit>
-grep -r "claimed feature" .
+npm test -- --coverage
-# 2. DEDUCE: What does the evidence actually show?
-npm test
-git blame <file>
+# 2. DEDUCE: Compare claim vs reality
+# Does code match description?
+# Do tests prove the fix/feature?
-# 3. CONCLUDE: Does evidence support the claim?
-# Document findings with evidence
+# 3. CONCLUDE: Verdict with evidence
+# SUPPORTED / PARTIALLY SUPPORTED / NOT SUPPORTED
 ```
----
-## Investigation Methodology
-### Level 1: Initial Observation (The Crime Scene)
-**Principle**: "You see, but you do not observe. The distinction is clear."
-#### What to Examine First
+**Holmesian Principles:**
+- "Data! Data! Data!" - Collect before concluding
+- "Eliminate the impossible" - What cannot be true?
+- "You see, but do not observe" - Run code, don't just read
+- Trust only reproducible evidence
+</default_to_action>
-1. **The Claim**: What was supposed to happen?
-   - PR description
-   - Commit messages
-   - Issue tickets
-   - Documentation updates
+## Quick Reference Card
-2. **The Evidence**: What actually exists?
-   - Actual code changes
-   - Test coverage
-   - Build/test results
-   - Runtime behavior
+### Evidence Collection Checklist
-3. **The Timeline**: When did things happen?
-   - Commit history
-   - File modification times
-   - Test execution logs
-   - Deployment records
+| Category | What to Check | How |
+|----------|---------------|-----|
+| **Claim** | PR description, commit messages | Read thoroughly |
+| **Code** | Actual file changes | `git diff` |
+| **Tests** | Coverage, assertions | Run independently |
+| **Behavior** | Runtime output | Execute locally |
+| **Timeline** | When things happened | `git log`, `git blame` |
-#### Evidence Collection Checklist
+### Verdict Levels
-```markdown
-## Evidence Collection
-### The Claim
-- [ ] Read PR/issue description thoroughly
-- [ ] Note all claimed features/fixes
-- [ ] Identify specific assertions made
-- [ ] Record expected behavior
-### The Code
-- [ ] Examine actual file changes
-- [ ] Review implementation details
-- [ ] Check for edge cases
-- [ ] Verify error handling
-### The Tests
-- [ ] Count test cases added/modified
-- [ ] Run tests independently
-- [ ] Check test assertions
-- [ ] Verify test coverage
-### The Behavior
-- [ ] Run the code locally
-- [ ] Test claimed scenarios
-- [ ] Try edge cases
-- [ ] Reproduce reported fixes
-```
+| Verdict | Meaning |
+|---------|---------|
+| ✓ **TRUE** | Evidence fully supports claim |
+| ⚠ **PARTIALLY TRUE** | Claim accurate but incomplete |
+| ✗ **FALSE** | Evidence contradicts claim |
+| ? **NONSENSICAL** | Claim doesn't apply to context |
 ---
-## Level 2: Deductive Analysis (Elementary Reasoning)
-### The Sherlock Framework
-#### 1. Eliminate the Impossible
-**Method**: Systematically rule out what cannot be true
+## Investigation Template
 ```markdown
-## Investigation Notes
+## Sherlock Investigation: [Claim]
-### Claim: "Fixed user authentication bug"
+### The Claim
+"[What PR/commit claims to do]"
-#### Evidence Review:
-- ✓ Modified auth.js (lines 45-67)
-- ✓ Added 2 new test cases
-- ✗ No changes to login flow
-- ✗ No database migration
-- ✗ Tests pass but don't cover reported scenario
+### Evidence Examined
+- Code changes: [files, lines]
+- Tests added: [count, coverage]
+- Behavior observed: [what actually happens]
-#### Deductions:
-- IMPOSSIBLE: Fix covers all auth scenarios (no login flow changes)
-- POSSIBLE: Fix covers specific password reset case
-- LIKELY: Fix is partial, limited to one code path
-```
+### Deductive Analysis
-#### 2. Follow the Evidence Chain
+**Claim**: [specific assertion]
+**Evidence**: [what you found]
+**Deduction**: [logical conclusion]
+**Verdict**: ✓/⚠/✗
-**Method**: Connect observable facts to logical conclusions
+### Findings
+- What works: [with evidence]
+- What doesn't: [with evidence]
+- What's missing: [gaps in implementation/testing]
-```markdown
-## Evidence Chain
-### Observation 1: Test passes locally
-### Observation 2: Test fails in CI
-### Observation 3: Different Node versions
-### Chain of Reasoning:
-1. Test behavior differs by environment
-2. Environment difference is Node version
-3. Code uses Node-version-specific API
-4. Therefore: Fix is environment-dependent
-5. Conclusion: Claim of "fixed" is incomplete
+### Recommendations
+1. [Action based on findings]
 ```
-#### 3. Question Everything
-**Critical Questions to Ask**:
-- Does the code actually do what the commit message claims?
-- Do the tests verify the claimed fix?
-- Can the bug reproduce in conditions not covered by tests?
-- Are there edge cases not considered?
-- Does "works on my machine" equal "properly fixed"?
 ---
-## Level 3: Systematic Investigation Process
+## Investigation Scenarios
-### Step-by-Step Sherlock Review
-#### Step 1: Read the Case File
-```bash
-# Examine the claim
-git show <commit>
-cat PR_DESCRIPTION.md
-# Note specific assertions:
-# - "Fixes race condition in async handler"
-# - "Adds comprehensive error handling"
-# - "Improves performance by 40%"
-```
-#### Step 2: Examine the Evidence
-```bash
-# What actually changed?
-git diff main..feature-branch
+### Scenario 1: "This Fixed the Bug"
-# Count the facts:
-FILES_CHANGED=$(git diff --name-only main..feature-branch | wc -l)
-LINES_ADDED=$(git diff --stat main..feature-branch | tail -1)
-TESTS_ADDED=$(git diff main..feature-branch | grep -c "test(" )
+**Steps:**
+1. Reproduce bug on commit before fix
+2. Verify bug is gone on commit with fix
+3. Check if fix addresses root cause or symptom
+4. Test edge cases not in original report
-echo "Files modified: $FILES_CHANGED"
-echo "Tests added: $TESTS_ADDED"
-```
+**Red Flags:**
+- Fix that just removes error logging
+- Works only for specific test case
+- Workarounds instead of root cause fix
+- No regression test added
-#### Step 3: Test the Theory
+### Scenario 2: "Improved Performance by 50%"
-```bash
-# Run claimed fixes through scientific method
-npm test -- --coverage
+**Steps:**
+1. Run benchmark on baseline commit
+2. Run same benchmark on optimized commit
+3. Compare in identical conditions
+4. Verify measurement methodology
-# Test edge cases not covered:
-node scripts/test-edge-cases.js
+**Red Flags:**
+- Tested only on toy data
+- Different comparison conditions
+- Trade-offs not mentioned
-# Reproduce original bug:
-git checkout <bug-commit>
-npm test -- <failing-test>
-git checkout <fix-commit>
-npm test -- <failing-test>
-```
+### Scenario 3: "Handles All Edge Cases"
-#### Step 4: Cross-Examine the Code
+**Steps:**
+1. List all edge cases in code path
+2. Check each has test coverage
+3. Test boundary conditions
+4. Verify error handling paths
-**Questions for Code Interrogation**:
+**Red Flags:**
+- `catch {}` swallowing errors
+- Generic error messages
+- No logging of critical errors
-```javascript
-// CLAIMED: "Handles all null cases"
-function processData(data) {
-  if (data === null) return null;  // ✓ Handles null
-  return data.items.map(x => x);    // ✗ Doesn't handle data.items === null
-}
-// VERDICT: Claim is FALSE - only handles top-level null
-```
+---
-#### Step 5: Compile the Evidence Report
+## Example Investigation
 ```markdown
-## Sherlock Investigation Report
-### Case: PR #123 "Fix race condition in async handler"
+## Case: PR #123 "Fix race condition in async handler"
-### Claimed Facts:
+### Claims Examined:
 1. "Eliminates race condition"
 2. "Adds mutex locking"
 3. "100% thread safe"
-### Evidence Examined:
+### Evidence:
 - File: src/handlers/async-handler.js
 - Changes: Added `async/await`, removed callbacks
 - Tests: 2 new tests for async flow
 - Coverage: 85% (was 75%)
-### Deductive Analysis:
-#### Claim 1: "Eliminates race condition"
-**Evidence**:
-- Added `await` to sequential operations
-- No actual mutex/lock mechanism found
-- No test for concurrent requests
-**Deduction**:
-- Code now sequential, not concurrent
-- Race condition avoided by removing concurrency
-- Not eliminated, just prevented by design change
-**Verdict**: PARTIALLY TRUE (solved differently than claimed)
-#### Claim 2: "Adds mutex locking"
-**Evidence**:
-- No mutex library imported
-- No lock variables found
-- No synchronization primitives
-**Deduction**:
-- No mutex implementation exists
-- Claim is factually incorrect
+### Analysis:
-**Verdict**: FALSE
+**Claim 1: "Eliminates race condition"**
+Evidence: Added `await` to sequential operations. No actual mutex.
+Deduction: Race avoided by removing concurrency, not synchronization.
+Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)
-#### Claim 3: "100% thread safe"
-**Evidence**:
-- JavaScript is single-threaded
-- Node.js event loop model
-- No worker threads used
+**Claim 2: "Adds mutex locking"**
+Evidence: No mutex library, no lock variables, no sync primitives.
+Verdict: ✗ FALSE
-**Deduction**:
-- "Thread safe" is meaningless in this context
-- Shows misunderstanding of runtime model
+**Claim 3: "100% thread safe"**
+Evidence: JavaScript is single-threaded. No worker threads used.
+Verdict: ? NONSENSICAL (meaningless in this context)
-**Verdict**: NONSENSICAL
-### Final Conclusion:
-The fix works but not for the reasons claimed. The race condition is avoided by making operations sequential rather than by adding thread synchronization. Tests verify sequential behavior but don't test the original concurrent scenario.
+### Conclusion:
+Fix works but not for reasons claimed. Race condition avoided by
+making operations sequential, not by adding synchronization.
 ### Recommendations:
 1. Update PR description to accurately reflect solution
 2. Add test for concurrent request handling
-3. Clarify whether sequential execution is acceptable for performance
-4. Remove incorrect technical claims about "mutex" and "thread safety"
+3. Remove incorrect technical claims
 ```
 ---
-## Level 4: Advanced Investigation Techniques
-### Technique 1: The Timeline Reconstruction
-**Purpose**: Understand the sequence of events leading to current state
-```bash
-# Build the timeline
-git log --all --graph --oneline --decorate
-# Examine critical commits
-git log --grep="fix" --grep="bug" --all-match
-# Find when bug was introduced
-git bisect start
-git bisect bad HEAD
-git bisect good v1.0.0
-```
-### Technique 2: The Behavioral Analysis
-**Purpose**: Observe what the code actually does, not what it's supposed to do
-```javascript
-// Add instrumentation
-console.log('[SHERLOCK] Entering function with:', arguments);
-console.log('[SHERLOCK] State before:', JSON.stringify(state));
-// ... original code ...
-console.log('[SHERLOCK] State after:', JSON.stringify(state));
-console.log('[SHERLOCK] Returning:', result);
-```
-### Technique 3: The Stress Test
-**Purpose**: Find limits and breaking points
+## Agent Integration
-```bash
-# Test boundary conditions
-npm test -- --iterations=10000
-# Test with invalid inputs
-echo '{"invalid": null}' | node src/process.js
-# Test resource exhaustion
-ab -n 10000 -c 100 http://localhost:3000/api/endpoint
-```
-### Technique 4: The Forensic Diff
-**Purpose**: Understand what changed and why
-```bash
-# Compare claimed vs actual changes
-git diff --word-diff main..feature-branch
-# Find silent changes (no commit message mention)
-git diff main..feature-branch | grep -A5 -B5 "security\|auth\|password"
+```typescript
+// Evidence-based code review
+await Task("Sherlock Review", {
+  prNumber: 123,
+  claims: [
+    "Fixes memory leak",
+    "Improves performance 30%"
+  ],
+  verifyReproduction: true,
+  testEdgeCases: true
+}, "qe-code-reviewer");
-# Detect code that was removed
-git diff main..feature-branch | grep "^-" | grep -v "^---"
+// Bug fix verification
+await Task("Verify Fix", {
+  bugCommit: 'abc123',
+  fixCommit: 'def456',
+  reproductionSteps: steps,
+  testBoundaryConditions: true
+}, "qe-code-reviewer");
 ```
 ---
-## Investigation Templates
-### Template 1: Bug Fix Verification
-```markdown
-## Sherlock Investigation: Bug Fix Verification
-### The Bug Report
-- **Reported**: [date]
-- **Severity**: [P0/P1/P2/P3]
-- **Symptoms**: [what users observed]
-- **Expected**: [what should happen]
-### The Claimed Fix
-- **PR**: #[number]
-- **Commit**: [hash]
-- **Description**: [claimed solution]
-### Evidence Collection
-#### 1. Reproduce Original Bug
-- [ ] Checkout commit before fix
-- [ ] Follow reproduction steps
-- [ ] Confirm bug exists
-- [ ] Document observed behavior
-#### 2. Verify Fix
-- [ ] Checkout commit with fix
-- [ ] Follow same reproduction steps
-- [ ] Confirm bug is resolved
-- [ ] Test edge cases
-#### 3. Code Analysis
-- [ ] Review actual code changes
-- [ ] Verify logic addresses root cause
-- [ ] Check for side effects
-- [ ] Assess test coverage
-### Deductive Analysis
-**Root Cause Claimed**: [what PR says]
-**Root Cause Actual**: [what evidence shows]
-**Fix Mechanism Claimed**: [how PR says it works]
-**Fix Mechanism Actual**: [how it actually works]
-**Coverage Claimed**: [scenarios PR claims to handle]
-**Coverage Actual**: [scenarios actually handled]
-### Verdict
-- [ ] Bug is fully fixed
-- [ ] Bug is partially fixed
-- [ ] Bug is not fixed (claim is false)
-- [ ] Bug is fixed but new bugs introduced
-### Evidence Summary
-[Concise summary of findings with proof]
-### Recommendations
-1. [Action based on evidence]
-2. [Action based on evidence]
-```
-### Template 2: Feature Implementation Review
-```markdown
-## Sherlock Investigation: Feature Implementation
-### The Feature Request
-- **Requirement**: [what was requested]
-- **Acceptance Criteria**: [how to verify]
-- **User Story**: [why it's needed]
-### The Implementation Claim
-- **PR**: #[number]
-- **Description**: [what PR claims to deliver]
-- **Scope**: [claimed completeness]
-### Evidence Examination
-#### Code Changes
-```bash
-git diff main..feature-branch --stat
-```
-- Files changed: [count]
-- Lines added: [count]
-- Lines removed: [count]
-- Tests added: [count]
-#### Acceptance Criteria Testing
-| Criterion | Claimed | Tested | Verdict |
-|-----------|---------|--------|---------|
-| AC1: [criterion] | ✓ | [yes/no] | [pass/fail] |
-| AC2: [criterion] | ✓ | [yes/no] | [pass/fail] |
-| AC3: [criterion] | ✓ | [yes/no] | [pass/fail] |
-### Deductive Analysis
-**Claim**: [what PR says is implemented]
-**Evidence**:
-- [Fact 1 from code]
-- [Fact 2 from tests]
-- [Fact 3 from behavior]
-**Deduction**:
-- [Logical conclusion from evidence]
-**Verdict**: [supported/partially supported/not supported by evidence]
-### Missing Elements
-- [ ] [Feature aspect not implemented]
-- [ ] [Test scenario not covered]
-- [ ] [Edge case not handled]
+## Agent Coordination Hints
-### Conclusion
-[Evidence-based assessment of implementation completeness]
+### Memory Namespace
 ```
-### Template 3: Performance Claim Verification
-```markdown
-## Sherlock Investigation: Performance Claims
-### The Claim
-"Improved performance by [X]% in [scenario]"
-### Investigation Setup
-#### Baseline Measurement
-```bash
-git checkout [before-commit]
-npm run benchmark > baseline.txt
+aqe/sherlock/
+├── investigations/*   - Investigation reports
+├── evidence/*         - Collected evidence
+├── verdicts/*         - Claim verdicts
+└── patterns/*         - Common deception patterns
 ```
-#### Post-Fix Measurement
-```bash
-git checkout [after-commit]
-npm run benchmark > optimized.txt
-```
-### Evidence Collection
-#### Benchmark Results
-| Metric | Before | After | Improvement | Claimed |
-|--------|--------|-------|-------------|---------|
-| Latency | [ms] | [ms] | [%] | [%] |
-| Throughput | [req/s] | [req/s] | [%] | [%] |
-| Memory | [MB] | [MB] | [%] | [%] |
-| CPU | [%] | [%] | [%] | [%] |
-### Deductive Analysis
-**Claimed Improvement**: [X]%
-**Measured Improvement**: [Y]%
-**Variance**: [X-Y]%
-**Measurement Conditions**:
-- Environment: [prod/dev/local]
-- Load: [concurrent users/requests]
-- Data size: [records/MB]
-**Verdict**:
-- [ ] Claim supported by evidence
-- [ ] Claim exaggerated (actual: [Y]%)
-- [ ] Claim not reproducible
-- [ ] Claim based on cherry-picked scenario
-### Conclusion
-[Evidence-based assessment with actual numbers]
+### Fleet Coordination
+```typescript
+const investigationFleet = await FleetManager.coordinate({
+  strategy: 'evidence-investigation',
+  agents: [
+    'qe-code-reviewer',        // Code analysis
+    'qe-security-auditor',     // Security claim verification
+    'qe-performance-validator' // Performance claim verification
+  ],
+  topology: 'parallel'
+});
 ```
 ---
-## Holmesian Principles for QE
-### Principle 1: "Data! Data! Data!"
-> "I can't make bricks without clay."
-**Application**: Collect comprehensive evidence before forming conclusions
-- Logs, traces, metrics
-- Test results, coverage reports
-- Code diffs, git history
-- Reproduction steps
-### Principle 2: "Eliminate the Impossible"
-> "When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
-**Application**: Use negative testing and boundary analysis
-- Test what should NOT happen
-- Verify constraints are enforced
-- Check impossible inputs are rejected
-- Validate error handling paths
-### Principle 3: "Observe, Don't Assume"
-> "You see, but you do not observe."
-**Application**: Run the code, don't just read it
-- Execute tests locally
-- Step through debugger
-- Profile performance
-- Monitor resource usage
-### Principle 4: "The Little Things Matter"
-> "It has long been an axiom of mine that the little things are infinitely the most important."
-**Application**: Pay attention to details others miss
-- Off-by-one errors
-- Null/undefined handling
-- Timezone conversions
-- Race conditions
-- Memory leaks
-### Principle 5: "Question Everything"
-> "I never guess. It is a capital mistake to theorize before one has data."
-**Application**: Verify all claims empirically
-- Don't trust commit messages
-- Don't trust documentation
-- Don't trust "it works on my machine"
-- Trust only reproducible evidence
----
-## The Sherlock Review Checklist
-Before approving any PR, verify:
-### Evidence-Based Review
-- [ ] **Claim vs Reality**: Does code match description?
-- [ ] **Tests Verify Claims**: Do tests actually prove the fix/feature?
-- [ ] **Reproducible**: Can you reproduce the bug/feature locally?
-- [ ] **Edge Cases**: Are boundary conditions tested?
-- [ ] **Negative Cases**: Are failure paths tested?
-### Deductive Reasoning
-- [ ] **Root Cause**: Does fix address actual root cause?
-- [ ] **Side Effects**: Could this break something else?
-- [ ] **Performance**: Any evidence for performance claims?
-- [ ] **Security**: Any security implications?
-- [ ] **Assumptions**: Are all assumptions validated?
-### Observational Analysis
-- [ ] **Code Quality**: Is code doing what it appears to do?
-- [ ] **Error Handling**: Are errors handled or just hidden?
-- [ ] **Resource Management**: Are resources properly managed?
-- [ ] **Concurrency**: Any race conditions or deadlocks?
-- [ ] **Data Validation**: Is input validated?
-### Timeline Verification
-- [ ] **Related Changes**: Are there related commits?
-- [ ] **Regression Risk**: Could this reintroduce old bugs?
-- [ ] **Dependencies**: Are dependency changes necessary?
-- [ ] **Migration Path**: Is there a rollback plan?
+## Related Skills
+- [brutal-honesty-review](../brutal-honesty-review/) - Direct technical criticism
+- [context-driven-testing](../context-driven-testing/) - Adapt to context
+- [bug-reporting-excellence](../bug-reporting-excellence/) - Document findings
 ---
-## Common Investigation Scenarios
-### Scenario 1: "This Fixed the Bug"
-**Investigation Steps**:
-1. Reproduce bug on commit before fix
-2. Verify bug is gone on commit with fix
-3. Check if fix addresses root cause or just symptom
-4. Test edge cases not in original bug report
-5. Verify no regression in related functionality
-**Red Flags**:
-- Bug "fix" that just removes error logging
-- Fix that works only for specific test case
-- Fix that introduces workarounds instead of solving root cause
-- No test added to prevent regression
-### Scenario 2: "Improved Performance by 50%"
-**Investigation Steps**:
-1. Run benchmark on baseline commit
-2. Run same benchmark on optimized commit
-3. Compare results in identical conditions
-4. Verify measurement methodology
-5. Test under realistic load
-**Red Flags**:
-- Performance tested only on toy data
-- Comparison uses different conditions
-- "Improvement" in non-critical path
-- Trade-off not mentioned (e.g., memory for speed)
-### Scenario 3: "Added Comprehensive Error Handling"
-**Investigation Steps**:
-1. List all error paths in code
-2. Verify each path has handling
-3. Test each error condition
-4. Check error messages are actionable
-5. Verify errors are logged/monitored
-**Red Flags**:
-- Errors caught but ignored (`catch {}`)
-- Generic error messages
-- Errors handled by crashing
-- No logging of critical errors
----
-## Output Format
-### The Sherlock Report
-```markdown
-# Sherlock Investigation Report
-**Case**: [PR/Issue number and title]
-**Investigator**: [Your name]
-**Date**: [Investigation date]
-## Summary
-[One paragraph: What was claimed, what was found, verdict]
-## Claims Examined
-1. [Claim 1]
-2. [Claim 2]
-3. [Claim 3]
-## Evidence Collected
-- Code changes: [summary]
-- Tests added: [count and coverage]
-- Benchmarks: [results]
-- Manual testing: [scenarios tested]
-## Deductive Analysis
-### Claim 1: [Claim text]
-**Evidence**: [What you found]
-**Deduction**: [Logical conclusion]
-**Verdict**: ✓ TRUE / ✗ FALSE / ⚠ PARTIALLY TRUE
-[Repeat for each claim]
-## Findings
-### What Works
-- [Positive finding with evidence]
-### What Doesn't Work
-- [Issue found with evidence]
-### What's Missing
-- [Gap in implementation/testing]
-## Overall Verdict
-- [ ] Approve: Claims fully supported by evidence
-- [ ] Approve with Reservations: Claims mostly accurate
-- [ ] Request Changes: Claims not supported by evidence
-- [ ] Reject: Claims are false or misleading
-## Recommendations
-1. [Action item based on findings]
-2. [Action item based on findings]
----
-**Elementary Evidence**: [Link to detailed evidence files/logs]
-**Reproducible**: [Yes/No - Can others verify your findings?]
-```
----
-## Integration with AQE Fleet
-### Use Sherlock Review With:
-1. **qe-code-reviewer**: After automated review, investigate flagged issues
-2. **qe-security-auditor**: Verify security fix claims
-3. **qe-performance-validator**: Validate performance improvement claims
-4. **qe-flaky-test-hunter**: Investigate "test fixed" claims
-5. **production-validator**: Verify deployment-ready claims
-### Workflow Integration
-```bash
-# 1. Automated review flags issues
-aqe review --pr 123
-# 2. Sherlock investigates flagged claims
-# [Apply Sherlock methodology to each flag]
-# 3. Document evidence-based findings
-# [Generate Sherlock report]
-# 4. Provide actionable feedback
-# [Based on evidence, not assumptions]
-```
----
-## Learn More
-### Recommended Reading
-- "The Adventure of Silver Blaze" - Importance of negative evidence
-- "A Scandal in Bohemia" - Observation vs. seeing
-- "The Boscombe Valley Mystery" - Following the evidence chain
-### Related QE Skills
-- `brutal-honesty-review` - Direct technical criticism
-- `context-driven-testing` - Adapt to specific context
-- `exploratory-testing-advanced` - Investigation techniques
-- `bug-reporting-excellence` - Document findings clearly
----
+## Remember
-**Created**: 2025-11-15
-**Category**: Quality Engineering
-**Approach**: Evidence-Based Investigation
-**Philosophy**: "Elementary" - Trust only what can be proven
+**"It is a capital mistake to theorize before one has data."** Trust only reproducible evidence. Don't trust commit messages, documentation, or "works on my machine."
-*"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."* - Sherlock Holmes
+**The Sherlock Standard:** Every claim must be verified empirically. What does the evidence actually show?