agentic-qe 1.9.4 → 2.1.0
- package/.claude/agents/qe-api-contract-validator.md +95 -1336
- package/.claude/agents/qe-chaos-engineer.md +152 -1211
- package/.claude/agents/qe-code-complexity.md +144 -707
- package/.claude/agents/qe-coverage-analyzer.md +147 -743
- package/.claude/agents/qe-deployment-readiness.md +143 -1496
- package/.claude/agents/qe-flaky-test-hunter.md +132 -1529
- package/.claude/agents/qe-fleet-commander.md +12 -12
- package/.claude/agents/qe-performance-tester.md +150 -886
- package/.claude/agents/qe-production-intelligence.md +155 -1396
- package/.claude/agents/qe-quality-analyzer.md +6 -6
- package/.claude/agents/qe-quality-gate.md +151 -648
- package/.claude/agents/qe-regression-risk-analyzer.md +132 -1150
- package/.claude/agents/qe-requirements-validator.md +149 -932
- package/.claude/agents/qe-security-scanner.md +157 -797
- package/.claude/agents/qe-test-data-architect.md +96 -1365
- package/.claude/agents/qe-test-executor.md +8 -8
- package/.claude/agents/qe-test-generator.md +145 -1540
- package/.claude/agents/qe-visual-tester.md +153 -1257
- package/.claude/agents/qx-partner.md +248 -0
- package/.claude/agents/subagents/qe-code-reviewer.md +40 -136
- package/.claude/agents/subagents/qe-coverage-gap-analyzer.md +40 -480
- package/.claude/agents/subagents/qe-data-generator.md +41 -125
- package/.claude/agents/subagents/qe-flaky-investigator.md +55 -411
- package/.claude/agents/subagents/qe-integration-tester.md +53 -141
- package/.claude/agents/subagents/qe-performance-validator.md +54 -130
- package/.claude/agents/subagents/qe-security-auditor.md +56 -114
- package/.claude/agents/subagents/qe-test-data-architect-sub.md +57 -548
- package/.claude/agents/subagents/qe-test-implementer.md +58 -551
- package/.claude/agents/subagents/qe-test-refactorer.md +65 -722
- package/.claude/agents/subagents/qe-test-writer.md +63 -726
- package/.claude/skills/accessibility-testing/SKILL.md +144 -692
- package/.claude/skills/agentic-quality-engineering/SKILL.md +176 -529
- package/.claude/skills/api-testing-patterns/SKILL.md +180 -560
- package/.claude/skills/brutal-honesty-review/SKILL.md +113 -603
- package/.claude/skills/bug-reporting-excellence/SKILL.md +116 -517
- package/.claude/skills/chaos-engineering-resilience/SKILL.md +127 -72
- package/.claude/skills/cicd-pipeline-qe-orchestrator/SKILL.md +209 -404
- package/.claude/skills/code-review-quality/SKILL.md +158 -608
- package/.claude/skills/compatibility-testing/SKILL.md +148 -38
- package/.claude/skills/compliance-testing/SKILL.md +132 -63
- package/.claude/skills/consultancy-practices/SKILL.md +114 -446
- package/.claude/skills/context-driven-testing/SKILL.md +117 -381
- package/.claude/skills/contract-testing/SKILL.md +176 -141
- package/.claude/skills/database-testing/SKILL.md +137 -130
- package/.claude/skills/exploratory-testing-advanced/SKILL.md +160 -629
- package/.claude/skills/holistic-testing-pact/SKILL.md +140 -188
- package/.claude/skills/localization-testing/SKILL.md +145 -33
- package/.claude/skills/mobile-testing/SKILL.md +132 -448
- package/.claude/skills/mutation-testing/SKILL.md +147 -41
- package/.claude/skills/performance-testing/SKILL.md +200 -546
- package/.claude/skills/quality-metrics/SKILL.md +164 -519
- package/.claude/skills/refactoring-patterns/SKILL.md +132 -699
- package/.claude/skills/regression-testing/SKILL.md +120 -926
- package/.claude/skills/risk-based-testing/SKILL.md +157 -660
- package/.claude/skills/security-testing/SKILL.md +199 -538
- package/.claude/skills/sherlock-review/SKILL.md +163 -699
- package/.claude/skills/shift-left-testing/SKILL.md +161 -465
- package/.claude/skills/shift-right-testing/SKILL.md +161 -519
- package/.claude/skills/six-thinking-hats/SKILL.md +175 -1110
- package/.claude/skills/skills-manifest.json +683 -0
- package/.claude/skills/tdd-london-chicago/SKILL.md +131 -448
- package/.claude/skills/technical-writing/SKILL.md +103 -154
- package/.claude/skills/test-automation-strategy/SKILL.md +166 -772
- package/.claude/skills/test-data-management/SKILL.md +126 -910
- package/.claude/skills/test-design-techniques/SKILL.md +179 -89
- package/.claude/skills/test-environment-management/SKILL.md +136 -91
- package/.claude/skills/test-reporting-analytics/SKILL.md +169 -92
- package/.claude/skills/testability-scoring/README.md +71 -0
- package/.claude/skills/testability-scoring/SKILL.md +245 -0
- package/.claude/skills/testability-scoring/resources/templates/config.template.js +84 -0
- package/.claude/skills/testability-scoring/resources/templates/testability-scoring.spec.template.js +532 -0
- package/.claude/skills/testability-scoring/scripts/generate-html-report.js +1007 -0
- package/.claude/skills/testability-scoring/scripts/run-assessment.sh +70 -0
- package/.claude/skills/visual-testing-advanced/SKILL.md +155 -78
- package/.claude/skills/xp-practices/SKILL.md +151 -587
- package/CHANGELOG.md +110 -0
- package/README.md +55 -21
- package/dist/agents/QXPartnerAgent.d.ts +146 -0
- package/dist/agents/QXPartnerAgent.d.ts.map +1 -0
- package/dist/agents/QXPartnerAgent.js +1831 -0
- package/dist/agents/QXPartnerAgent.js.map +1 -0
- package/dist/agents/index.d.ts +1 -0
- package/dist/agents/index.d.ts.map +1 -1
- package/dist/agents/index.js +82 -2
- package/dist/agents/index.js.map +1 -1
- package/dist/agents/lifecycle/AgentLifecycleManager.d.ts.map +1 -1
- package/dist/agents/lifecycle/AgentLifecycleManager.js +34 -31
- package/dist/agents/lifecycle/AgentLifecycleManager.js.map +1 -1
- package/dist/cli/commands/debug/agent.d.ts.map +1 -1
- package/dist/cli/commands/debug/agent.js +19 -6
- package/dist/cli/commands/debug/agent.js.map +1 -1
- package/dist/cli/commands/debug/health-check.js +20 -7
- package/dist/cli/commands/debug/health-check.js.map +1 -1
- package/dist/cli/commands/init-claude-md-template.d.ts +1 -0
- package/dist/cli/commands/init-claude-md-template.d.ts.map +1 -1
- package/dist/cli/commands/init-claude-md-template.js +18 -3
- package/dist/cli/commands/init-claude-md-template.js.map +1 -1
- package/dist/cli/commands/workflow/cancel.d.ts.map +1 -1
- package/dist/cli/commands/workflow/cancel.js +4 -3
- package/dist/cli/commands/workflow/cancel.js.map +1 -1
- package/dist/cli/commands/workflow/list.d.ts.map +1 -1
- package/dist/cli/commands/workflow/list.js +4 -3
- package/dist/cli/commands/workflow/list.js.map +1 -1
- package/dist/cli/commands/workflow/pause.d.ts.map +1 -1
- package/dist/cli/commands/workflow/pause.js +4 -3
- package/dist/cli/commands/workflow/pause.js.map +1 -1
- package/dist/cli/init/claude-config.d.ts.map +1 -1
- package/dist/cli/init/claude-config.js +3 -8
- package/dist/cli/init/claude-config.js.map +1 -1
- package/dist/cli/init/claude-md.d.ts.map +1 -1
- package/dist/cli/init/claude-md.js +44 -2
- package/dist/cli/init/claude-md.js.map +1 -1
- package/dist/cli/init/database-init.js +1 -1
- package/dist/cli/init/index.d.ts.map +1 -1
- package/dist/cli/init/index.js +13 -6
- package/dist/cli/init/index.js.map +1 -1
- package/dist/cli/init/skills.d.ts.map +1 -1
- package/dist/cli/init/skills.js +2 -1
- package/dist/cli/init/skills.js.map +1 -1
- package/dist/core/SwarmCoordinator.d.ts +180 -0
- package/dist/core/SwarmCoordinator.d.ts.map +1 -0
- package/dist/core/SwarmCoordinator.js +473 -0
- package/dist/core/SwarmCoordinator.js.map +1 -0
- package/dist/core/memory/AgentDBIntegration.d.ts +24 -6
- package/dist/core/memory/AgentDBIntegration.d.ts.map +1 -1
- package/dist/core/memory/AgentDBIntegration.js +66 -10
- package/dist/core/memory/AgentDBIntegration.js.map +1 -1
- package/dist/core/memory/UnifiedMemoryCoordinator.d.ts +341 -0
- package/dist/core/memory/UnifiedMemoryCoordinator.d.ts.map +1 -0
- package/dist/core/memory/UnifiedMemoryCoordinator.js +986 -0
- package/dist/core/memory/UnifiedMemoryCoordinator.js.map +1 -0
- package/dist/core/memory/index.d.ts +5 -0
- package/dist/core/memory/index.d.ts.map +1 -1
- package/dist/core/memory/index.js +23 -1
- package/dist/core/memory/index.js.map +1 -1
- package/dist/core/metrics/MetricsAggregator.d.ts +228 -0
- package/dist/core/metrics/MetricsAggregator.d.ts.map +1 -0
- package/dist/core/metrics/MetricsAggregator.js +482 -0
- package/dist/core/metrics/MetricsAggregator.js.map +1 -0
- package/dist/core/metrics/index.d.ts +5 -0
- package/dist/core/metrics/index.d.ts.map +1 -0
- package/dist/core/metrics/index.js +11 -0
- package/dist/core/metrics/index.js.map +1 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts +190 -0
- package/dist/core/optimization/SwarmOptimizer.d.ts.map +1 -0
- package/dist/core/optimization/SwarmOptimizer.js +648 -0
- package/dist/core/optimization/SwarmOptimizer.js.map +1 -0
- package/dist/core/optimization/index.d.ts +9 -0
- package/dist/core/optimization/index.d.ts.map +1 -0
- package/dist/core/optimization/index.js +25 -0
- package/dist/core/optimization/index.js.map +1 -0
- package/dist/core/optimization/types.d.ts +53 -0
- package/dist/core/optimization/types.d.ts.map +1 -0
- package/dist/core/optimization/types.js +6 -0
- package/dist/core/optimization/types.js.map +1 -0
- package/dist/core/orchestration/AdaptiveScheduler.d.ts +190 -0
- package/dist/core/orchestration/AdaptiveScheduler.d.ts.map +1 -0
- package/dist/core/orchestration/AdaptiveScheduler.js +460 -0
- package/dist/core/orchestration/AdaptiveScheduler.js.map +1 -0
- package/dist/core/orchestration/PriorityQueue.d.ts +54 -0
- package/dist/core/orchestration/PriorityQueue.d.ts.map +1 -0
- package/dist/core/orchestration/PriorityQueue.js +122 -0
- package/dist/core/orchestration/PriorityQueue.js.map +1 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts +189 -0
- package/dist/core/orchestration/WorkflowOrchestrator.d.ts.map +1 -0
- package/dist/core/orchestration/WorkflowOrchestrator.js +845 -0
- package/dist/core/orchestration/WorkflowOrchestrator.js.map +1 -0
- package/dist/core/orchestration/index.d.ts +7 -0
- package/dist/core/orchestration/index.d.ts.map +1 -0
- package/dist/core/orchestration/index.js +11 -0
- package/dist/core/orchestration/index.js.map +1 -0
- package/dist/core/orchestration/types.d.ts +96 -0
- package/dist/core/orchestration/types.d.ts.map +1 -0
- package/dist/core/orchestration/types.js +6 -0
- package/dist/core/orchestration/types.js.map +1 -0
- package/dist/core/recovery/CircuitBreaker.d.ts +176 -0
- package/dist/core/recovery/CircuitBreaker.d.ts.map +1 -0
- package/dist/core/recovery/CircuitBreaker.js +382 -0
- package/dist/core/recovery/CircuitBreaker.js.map +1 -0
- package/dist/core/recovery/RecoveryOrchestrator.d.ts +186 -0
- package/dist/core/recovery/RecoveryOrchestrator.d.ts.map +1 -0
- package/dist/core/recovery/RecoveryOrchestrator.js +476 -0
- package/dist/core/recovery/RecoveryOrchestrator.js.map +1 -0
- package/dist/core/recovery/RetryStrategy.d.ts +127 -0
- package/dist/core/recovery/RetryStrategy.d.ts.map +1 -0
- package/dist/core/recovery/RetryStrategy.js +314 -0
- package/dist/core/recovery/RetryStrategy.js.map +1 -0
- package/dist/core/recovery/index.d.ts +8 -0
- package/dist/core/recovery/index.d.ts.map +1 -0
- package/dist/core/recovery/index.js +27 -0
- package/dist/core/recovery/index.js.map +1 -0
- package/dist/core/skills/DependencyResolver.d.ts +99 -0
- package/dist/core/skills/DependencyResolver.d.ts.map +1 -0
- package/dist/core/skills/DependencyResolver.js +260 -0
- package/dist/core/skills/DependencyResolver.js.map +1 -0
- package/dist/core/skills/DynamicSkillLoader.d.ts +96 -0
- package/dist/core/skills/DynamicSkillLoader.d.ts.map +1 -0
- package/dist/core/skills/DynamicSkillLoader.js +353 -0
- package/dist/core/skills/DynamicSkillLoader.js.map +1 -0
- package/dist/core/skills/ManifestGenerator.d.ts +114 -0
- package/dist/core/skills/ManifestGenerator.d.ts.map +1 -0
- package/dist/core/skills/ManifestGenerator.js +449 -0
- package/dist/core/skills/ManifestGenerator.js.map +1 -0
- package/dist/core/skills/index.d.ts +9 -0
- package/dist/core/skills/index.d.ts.map +1 -0
- package/dist/core/skills/index.js +24 -0
- package/dist/core/skills/index.js.map +1 -0
- package/dist/core/skills/types.d.ts +118 -0
- package/dist/core/skills/types.d.ts.map +1 -0
- package/dist/core/skills/types.js +7 -0
- package/dist/core/skills/types.js.map +1 -0
- package/dist/core/transport/QUICTransport.d.ts +320 -0
- package/dist/core/transport/QUICTransport.d.ts.map +1 -0
- package/dist/core/transport/QUICTransport.js +711 -0
- package/dist/core/transport/QUICTransport.js.map +1 -0
- package/dist/core/transport/index.d.ts +40 -0
- package/dist/core/transport/index.d.ts.map +1 -0
- package/dist/core/transport/index.js +46 -0
- package/dist/core/transport/index.js.map +1 -0
- package/dist/core/transport/quic-loader.d.ts +123 -0
- package/dist/core/transport/quic-loader.d.ts.map +1 -0
- package/dist/core/transport/quic-loader.js +293 -0
- package/dist/core/transport/quic-loader.js.map +1 -0
- package/dist/core/transport/quic.d.ts +154 -0
- package/dist/core/transport/quic.d.ts.map +1 -0
- package/dist/core/transport/quic.js +214 -0
- package/dist/core/transport/quic.js.map +1 -0
- package/dist/mcp/server.d.ts +9 -9
- package/dist/mcp/server.d.ts.map +1 -1
- package/dist/mcp/server.js +1 -2
- package/dist/mcp/server.js.map +1 -1
- package/dist/mcp/services/AgentRegistry.d.ts.map +1 -1
- package/dist/mcp/services/AgentRegistry.js +4 -1
- package/dist/mcp/services/AgentRegistry.js.map +1 -1
- package/dist/types/index.d.ts +2 -1
- package/dist/types/index.d.ts.map +1 -1
- package/dist/types/index.js +2 -0
- package/dist/types/index.js.map +1 -1
- package/dist/types/qx.d.ts +429 -0
- package/dist/types/qx.d.ts.map +1 -0
- package/dist/types/qx.js +71 -0
- package/dist/types/qx.js.map +1 -0
- package/dist/visualization/api/RestEndpoints.js +2 -2
- package/dist/visualization/api/RestEndpoints.js.map +1 -1
- package/dist/visualization/api/WebSocketServer.d.ts +44 -0
- package/dist/visualization/api/WebSocketServer.d.ts.map +1 -1
- package/dist/visualization/api/WebSocketServer.js +144 -23
- package/dist/visualization/api/WebSocketServer.js.map +1 -1
- package/dist/visualization/core/DataTransformer.d.ts +10 -0
- package/dist/visualization/core/DataTransformer.d.ts.map +1 -1
- package/dist/visualization/core/DataTransformer.js +60 -5
- package/dist/visualization/core/DataTransformer.js.map +1 -1
- package/dist/visualization/emit-event.d.ts +75 -0
- package/dist/visualization/emit-event.d.ts.map +1 -0
- package/dist/visualization/emit-event.js +213 -0
- package/dist/visualization/emit-event.js.map +1 -0
- package/dist/visualization/index.d.ts +1 -0
- package/dist/visualization/index.d.ts.map +1 -1
- package/dist/visualization/index.js +7 -1
- package/dist/visualization/index.js.map +1 -1
- package/docs/reference/skills.md +63 -1
- package/package.json +16 -58
package/.claude/skills/sherlock-review/SKILL.md
|
@@ -1,786 +1,250 @@
|
|
|
1
1
|
---
|
|
2
|
-
name:
|
|
2
|
+
name: sherlock-review
|
|
3
3
|
description: "Evidence-based investigative code review using deductive reasoning to determine what actually happened versus what was claimed. Use when verifying implementation claims, investigating bugs, validating fixes, or conducting root cause analysis. Elementary approach to finding truth through systematic observation."
|
|
4
|
+
category: quality-review
|
|
5
|
+
priority: high
|
|
6
|
+
tokenEstimate: 1100
|
|
7
|
+
agents: [qe-code-reviewer, qe-security-auditor, qe-performance-validator]
|
|
8
|
+
implementation_status: optimized
|
|
9
|
+
optimization_version: 1.0
|
|
10
|
+
last_optimized: 2025-12-03
|
|
11
|
+
dependencies: []
|
|
12
|
+
quick_reference_card: true
|
|
13
|
+
tags: [investigation, evidence-based, code-review, root-cause, deduction]
|
|
4
14
|
---
|
|
5
15
|
|
|
6
16
|
# Sherlock Review
|
|
7
17
|
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
- Ability to run tests and reproduce issues
|
|
16
|
-
- Understanding of the domain and system architecture
|
|
17
|
-
- Critical thinking and skepticism
|
|
18
|
-
|
|
19
|
-
---
|
|
20
|
-
|
|
21
|
-
## Quick Start (Elementary Method)
|
|
22
|
-
|
|
23
|
-
### The 3-Step Investigation
|
|
18
|
+
<default_to_action>
|
|
19
|
+
When investigating code claims:
|
|
20
|
+
1. OBSERVE: Gather all evidence (code, tests, history, behavior)
|
|
21
|
+
2. DEDUCE: What does evidence actually show vs. what was claimed?
|
|
22
|
+
3. ELIMINATE: Rule out what cannot be true
|
|
23
|
+
4. CONCLUDE: Does evidence support the claim?
|
|
24
|
+
5. DOCUMENT: Findings with proof, not assumptions
|
|
24
25
|
|
|
26
|
+
**The 3-Step Investigation:**
|
|
25
27
|
```bash
|
|
26
|
-
# 1. OBSERVE: Gather
|
|
27
|
-
git log --oneline -10
|
|
28
|
+
# 1. OBSERVE: Gather evidence
|
|
28
29
|
git diff <commit>
|
|
29
|
-
|
|
30
|
+
npm test -- --coverage
|
|
30
31
|
|
|
31
|
-
# 2. DEDUCE:
|
|
32
|
-
|
|
33
|
-
|
|
32
|
+
# 2. DEDUCE: Compare claim vs reality
|
|
33
|
+
# Does code match description?
|
|
34
|
+
# Do tests prove the fix/feature?
|
|
34
35
|
|
|
35
|
-
# 3. CONCLUDE:
|
|
36
|
-
#
|
|
36
|
+
# 3. CONCLUDE: Verdict with evidence
|
|
37
|
+
# SUPPORTED / PARTIALLY SUPPORTED / NOT SUPPORTED
|
|
37
38
|
```
|
|
38
39
|
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
**Principle**: "You see, but you do not observe. The distinction is clear."
|
|
46
|
-
|
|
47
|
-
#### What to Examine First
|
|
40
|
+
**Holmesian Principles:**
|
|
41
|
+
- "Data! Data! Data!" - Collect before concluding
|
|
42
|
+
- "Eliminate the impossible" - What cannot be true?
|
|
43
|
+
- "You see, but do not observe" - Run code, don't just read
|
|
44
|
+
- Trust only reproducible evidence
|
|
45
|
+
</default_to_action>
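A minimal sketch of the CONCLUDE step, written in TypeScript. It is not part of the package: the `Evidence` shape, the `conclude` function, and the rule that only reproducible evidence counts toward the verdict are illustrative assumptions built on the "Trust only reproducible evidence" principle and the SUPPORTED / PARTIALLY SUPPORTED / NOT SUPPORTED verdicts named in the quick-start block.

```typescript
// Hypothetical types for illustration; the skill itself defines no such API.
type Verdict = "SUPPORTED" | "PARTIALLY SUPPORTED" | "NOT SUPPORTED";

interface Evidence {
  description: string;    // what was observed (test run, diff, runtime output)
  reproducible: boolean;  // could the observation be repeated independently?
  supportsClaim: boolean; // does the observation agree with the claim?
}

// Assumed rule: discard non-reproducible evidence, then compare what remains
// against the claim. No trusted evidence at all means the claim is unproven.
function conclude(evidence: Evidence[]): Verdict {
  const trusted = evidence.filter((e) => e.reproducible);
  if (trusted.length === 0) return "NOT SUPPORTED";

  const supporting = trusted.filter((e) => e.supportsClaim).length;
  if (supporting === trusted.length) return "SUPPORTED";
  if (supporting === 0) return "NOT SUPPORTED";
  return "PARTIALLY SUPPORTED";
}

const verdict = conclude([
  { description: "new unit tests pass", reproducible: true, supportsClaim: true },
  { description: "bug still occurs under concurrent requests", reproducible: true, supportsClaim: false },
  { description: "author says it works locally", reproducible: false, supportsClaim: true },
]);
console.log(verdict);
```

The unverifiable "works locally" observation is ignored here, so the two reproducible observations decide the verdict. A real review would weigh evidence with more nuance than a count.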
|
|
48
46
|
|
|
49
|
-
|
|
50
|
-
- PR description
|
|
51
|
-
- Commit messages
|
|
52
|
-
- Issue tickets
|
|
53
|
-
- Documentation updates
|
|
47
|
+
## Quick Reference Card
|
|
54
48
|
|
|
55
|
-
|
|
56
|
-
- Actual code changes
|
|
57
|
-
- Test coverage
|
|
58
|
-
- Build/test results
|
|
59
|
-
- Runtime behavior
|
|
49
|
+
### Evidence Collection Checklist
|
|
60
50
|
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
51
|
+
| Category | What to Check | How |
|
|
52
|
+
|----------|---------------|-----|
|
|
53
|
+
| **Claim** | PR description, commit messages | Read thoroughly |
|
|
54
|
+
| **Code** | Actual file changes | `git diff` |
|
|
55
|
+
| **Tests** | Coverage, assertions | Run independently |
|
|
56
|
+
| **Behavior** | Runtime output | Execute locally |
|
|
57
|
+
| **Timeline** | When things happened | `git log`, `git blame` |
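The "How" column of the checklist can be sketched as a command plan. This TypeScript fragment is an illustration only: `EvidenceStep`, `evidencePlan`, and the sample file path are hypothetical names, and the specific flags (`git show --no-patch`, `git diff <commit>~1 <commit>`) are one reasonable way to collect each category, not something the skill prescribes. The "Behavior" row is left out because running the code locally is project-specific.

```typescript
// Hypothetical shape: one checklist category plus the command that gathers it.
interface EvidenceStep {
  category: "Claim" | "Code" | "Tests" | "Timeline";
  command: string[]; // argv form, so nothing needs to pass through a shell
}

// Builds the list of commands to run for a given commit and file of interest.
function evidencePlan(commit: string, file: string): EvidenceStep[] {
  return [
    // Claim: the commit message only, without the patch
    { category: "Claim", command: ["git", "show", "--no-patch", commit] },
    // Code: what actually changed relative to the parent commit
    { category: "Code", command: ["git", "diff", `${commit}~1`, commit] },
    // Tests: run the suite independently, with coverage
    { category: "Tests", command: ["npm", "test", "--", "--coverage"] },
    // Timeline: recent history and per-line authorship
    { category: "Timeline", command: ["git", "log", "--oneline", "-10"] },
    { category: "Timeline", command: ["git", "blame", file] },
  ];
}

// "src/index.ts" is a placeholder path for the example.
for (const step of evidencePlan("HEAD", "src/index.ts")) {
  console.log(`[${step.category}] ${step.command.join(" ")}`);
}
```

Keeping the plan as data makes it easy to record in the investigation report exactly which commands produced each piece of evidence.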
|
|
66
58
|
|
|
67
|
-
|
|
59
|
+
### Verdict Levels
|
|
68
60
|
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
- [ ] Identify specific assertions made
|
|
76
|
-
- [ ] Record expected behavior
|
|
77
|
-
|
|
78
|
-
### The Code
|
|
79
|
-
- [ ] Examine actual file changes
|
|
80
|
-
- [ ] Review implementation details
|
|
81
|
-
- [ ] Check for edge cases
|
|
82
|
-
- [ ] Verify error handling
|
|
83
|
-
|
|
84
|
-
### The Tests
|
|
85
|
-
- [ ] Count test cases added/modified
|
|
86
|
-
- [ ] Run tests independently
|
|
87
|
-
- [ ] Check test assertions
|
|
88
|
-
- [ ] Verify test coverage
|
|
89
|
-
|
|
90
|
-
### The Behavior
|
|
91
|
-
- [ ] Run the code locally
|
|
92
|
-
- [ ] Test claimed scenarios
|
|
93
|
-
- [ ] Try edge cases
|
|
94
|
-
- [ ] Reproduce reported fixes
|
|
95
|
-
```
|
|
61
|
+
| Verdict | Meaning |
|
|
62
|
+
|---------|---------|
|
|
63
|
+
| ✓ **TRUE** | Evidence fully supports claim |
|
|
64
|
+
| ⚠ **PARTIALLY TRUE** | Claim accurate but incomplete |
|
|
65
|
+
| ✗ **FALSE** | Evidence contradicts claim |
|
|
66
|
+
| ? **NONSENSICAL** | Claim doesn't apply to context |
|
|
96
67
|
|
|
97
68
|
---
|
|
98
69
|
|
|
99
|
-
##
|
|
100
|
-
|
|
101
|
-
### The Sherlock Framework
|
|
102
|
-
|
|
103
|
-
#### 1. Eliminate the Impossible
|
|
104
|
-
|
|
105
|
-
**Method**: Systematically rule out what cannot be true
|
|
70
|
+
## Investigation Template
|
|
106
71
|
|
|
107
72
|
```markdown
|
|
108
|
-
## Investigation
|
|
73
|
+
## Sherlock Investigation: [Claim]
|
|
109
74
|
|
|
110
|
-
### Claim
|
|
75
|
+
### The Claim
|
|
76
|
+
"[What PR/commit claims to do]"
|
|
111
77
|
|
|
112
|
-
|
|
113
|
-
-
|
|
114
|
-
-
|
|
115
|
-
-
|
|
116
|
-
- ✗ No database migration
|
|
117
|
-
- ✗ Tests pass but don't cover reported scenario
|
|
78
|
+
### Evidence Examined
|
|
79
|
+
- Code changes: [files, lines]
|
|
80
|
+
- Tests added: [count, coverage]
|
|
81
|
+
- Behavior observed: [what actually happens]
|
|
118
82
|
|
|
119
|
-
|
|
120
|
-
- IMPOSSIBLE: Fix covers all auth scenarios (no login flow changes)
|
|
121
|
-
- POSSIBLE: Fix covers specific password reset case
|
|
122
|
-
- LIKELY: Fix is partial, limited to one code path
|
|
123
|
-
```
|
|
83
|
+
### Deductive Analysis
|
|
124
84
|
|
|
125
|
-
|
|
85
|
+
**Claim**: [specific assertion]
|
|
86
|
+
**Evidence**: [what you found]
|
|
87
|
+
**Deduction**: [logical conclusion]
|
|
88
|
+
**Verdict**: ✓/⚠/✗
|
|
126
89
|
|
|
127
|
-
|
|
90
|
+
### Findings
|
|
91
|
+
- What works: [with evidence]
|
|
92
|
+
- What doesn't: [with evidence]
|
|
93
|
+
- What's missing: [gaps in implementation/testing]
|
|
128
94
|
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
### Observation 1: Test passes locally
|
|
133
|
-
### Observation 2: Test fails in CI
|
|
134
|
-
### Observation 3: Different Node versions
|
|
135
|
-
|
|
136
|
-
### Chain of Reasoning:
|
|
137
|
-
1. Test behavior differs by environment
|
|
138
|
-
2. Environment difference is Node version
|
|
139
|
-
3. Code uses Node-version-specific API
|
|
140
|
-
4. Therefore: Fix is environment-dependent
|
|
141
|
-
5. Conclusion: Claim of "fixed" is incomplete
|
|
95
|
+
### Recommendations
|
|
96
|
+
1. [Action based on findings]
|
|
142
97
|
```
|
|
143
98
|
|
|
144
|
-
#### 3. Question Everything
|
|
145
|
-
|
|
146
|
-
**Critical Questions to Ask**:
|
|
147
|
-
|
|
148
|
-
- Does the code actually do what the commit message claims?
|
|
149
|
-
- Do the tests verify the claimed fix?
|
|
150
|
-
- Can the bug reproduce in conditions not covered by tests?
|
|
151
|
-
- Are there edge cases not considered?
|
|
152
|
-
- Does "works on my machine" equal "properly fixed"?
|
|
153
|
-
|
|
154
99
|
---
|
|
155
100
|
|
|
156
|
-
##
|
|
101
|
+
## Investigation Scenarios
|
|
157
102
|
|
|
158
|
-
###
|
|
159
|
-
|
|
160
|
-
#### Step 1: Read the Case File
|
|
161
|
-
|
|
162
|
-
```bash
|
|
163
|
-
# Examine the claim
|
|
164
|
-
git show <commit>
|
|
165
|
-
cat PR_DESCRIPTION.md
|
|
166
|
-
|
|
167
|
-
# Note specific assertions:
|
|
168
|
-
# - "Fixes race condition in async handler"
|
|
169
|
-
# - "Adds comprehensive error handling"
|
|
170
|
-
# - "Improves performance by 40%"
|
|
171
|
-
```
|
|
172
|
-
|
|
173
|
-
#### Step 2: Examine the Evidence
|
|
174
|
-
|
|
175
|
-
```bash
|
|
176
|
-
# What actually changed?
|
|
177
|
-
git diff main..feature-branch
|
|
103
|
+
### Scenario 1: "This Fixed the Bug"
|
|
178
104
|
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
105
|
+
**Steps:**
|
|
106
|
+
1. Reproduce bug on commit before fix
|
|
107
|
+
2. Verify bug is gone on commit with fix
|
|
108
|
+
3. Check if fix addresses root cause or symptom
|
|
109
|
+
4. Test edge cases not in original report
|
|
183
110
|
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
111
|
+
**Red Flags:**
|
|
112
|
+
- Fix that just removes error logging
|
|
113
|
+
- Works only for specific test case
|
|
114
|
+
- Workarounds instead of root cause fix
|
|
115
|
+
- No regression test added
|
|
187
116
|
|
|
188
|
-
|
|
117
|
+
### Scenario 2: "Improved Performance by 50%"
|
|
189
118
|
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
|
|
119
|
+
**Steps:**
|
|
120
|
+
1. Run benchmark on baseline commit
|
|
121
|
+
2. Run same benchmark on optimized commit
|
|
122
|
+
3. Compare in identical conditions
|
|
123
|
+
4. Verify measurement methodology
|
|
193
124
|
|
|
194
|
-
|
|
195
|
-
|
|
125
|
+
**Red Flags:**
|
|
126
|
+
- Tested only on toy data
|
|
127
|
+
- Different comparison conditions
|
|
128
|
+
- Trade-offs not mentioned
|
|
196
129
|
|
|
197
|
-
|
|
198
|
-
git checkout <bug-commit>
|
|
199
|
-
npm test -- <failing-test>
|
|
200
|
-
git checkout <fix-commit>
|
|
201
|
-
npm test -- <failing-test>
|
|
202
|
-
```
|
|
130
|
+
### Scenario 3: "Handles All Edge Cases"
|
|
203
131
|
|
|
204
|
-
|
|
132
|
+
**Steps:**
|
|
133
|
+
1. List all edge cases in code path
|
|
134
|
+
2. Check each has test coverage
|
|
135
|
+
3. Test boundary conditions
|
|
136
|
+
4. Verify error handling paths
|
|
205
137
|
|
|
206
|
-
**
|
|
138
|
+
**Red Flags:**
|
|
139
|
+
- `catch {}` swallowing errors
|
|
140
|
+
- Generic error messages
|
|
141
|
+
- No logging of critical errors
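To make the `catch {}` red flag concrete, here is a small TypeScript sketch under stated assumptions: the `Payload` type and both functions are hypothetical and are not taken from the package. It contrasts an empty catch that hides malformed input with explicit checks, then probes both with boundary inputs, in the spirit of steps 3 and 4 above.

```typescript
// Hypothetical payload shape used only for this illustration.
interface Payload {
  items?: number[] | null;
}

// Red flag: the empty catch turns every failure into a plausible-looking 0,
// so a null payload is indistinguishable from an empty one.
function sumSwallowing(data: Payload | null): number {
  try {
    return data!.items!.reduce((a, b) => a + b, 0);
  } catch {
    return 0;
  }
}

// Explicit handling: each boundary condition fails loudly with a specific message.
function sumChecked(data: Payload | null): number {
  if (data === null) throw new TypeError("payload is null");
  if (data.items == null) throw new TypeError("payload.items is missing");
  return data.items.reduce((a, b) => a + b, 0);
}

// Boundary inputs a claim of "handles all edge cases" should survive.
const boundaryInputs: Array<Payload | null> = [
  null,
  {},
  { items: null },
  { items: [] },
  { items: [1, 2, 3] },
];

for (const input of boundaryInputs) {
  let checked: string;
  try {
    checked = String(sumChecked(input));
  } catch (err) {
    checked = `error: ${(err as Error).message}`;
  }
  console.log(JSON.stringify(input), "| swallowing:", sumSwallowing(input), "| checked:", checked);
}
```

For the first three inputs the swallowing version reports 0 while the checked version reports an error, which is the kind of discrepancy an investigator would cite as evidence against the claim.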
|
|
207
142
|
|
|
208
|
-
|
|
209
|
-
// CLAIMED: "Handles all null cases"
|
|
210
|
-
function processData(data) {
|
|
211
|
-
if (data === null) return null; // ✓ Handles null
|
|
212
|
-
return data.items.map(x => x); // ✗ Doesn't handle data.items === null
|
|
213
|
-
}
|
|
214
|
-
// VERDICT: Claim is FALSE - only handles top-level null
|
|
215
|
-
```
|
|
143
|
+
---
|
|
216
144
|
|
|
217
|
-
|
|
145
|
+
## Example Investigation
|
|
218
146
|
|
|
219
147
|
```markdown
|
|
220
|
-
##
|
|
221
|
-
|
|
222
|
-
### Case: PR #123 "Fix race condition in async handler"
|
|
148
|
+
## Case: PR #123 "Fix race condition in async handler"
|
|
223
149
|
|
|
224
|
-
###
|
|
150
|
+
### Claims Examined:
|
|
225
151
|
1. "Eliminates race condition"
|
|
226
152
|
2. "Adds mutex locking"
|
|
227
153
|
3. "100% thread safe"
|
|
228
154
|
|
|
229
|
-
### Evidence
|
|
155
|
+
### Evidence:
|
|
230
156
|
- File: src/handlers/async-handler.js
|
|
231
157
|
- Changes: Added `async/await`, removed callbacks
|
|
232
158
|
- Tests: 2 new tests for async flow
|
|
233
159
|
- Coverage: 85% (was 75%)
|
|
234
160
|
|
|
235
|
-
###
|
|
236
|
-
|
|
237
|
-
#### Claim 1: "Eliminates race condition"
|
|
238
|
-
**Evidence**:
|
|
239
|
-
- Added `await` to sequential operations
|
|
240
|
-
- No actual mutex/lock mechanism found
|
|
241
|
-
- No test for concurrent requests
|
|
242
|
-
|
|
243
|
-
**Deduction**:
|
|
244
|
-
- Code now sequential, not concurrent
|
|
245
|
-
- Race condition avoided by removing concurrency
|
|
246
|
-
- Not eliminated, just prevented by design change
|
|
247
|
-
|
|
248
|
-
**Verdict**: PARTIALLY TRUE (solved differently than claimed)
|
|
249
|
-
|
|
250
|
-
#### Claim 2: "Adds mutex locking"
|
|
251
|
-
**Evidence**:
|
|
252
|
-
- No mutex library imported
|
|
253
|
-
- No lock variables found
|
|
254
|
-
- No synchronization primitives
|
|
255
|
-
|
|
256
|
-
**Deduction**:
|
|
257
|
-
- No mutex implementation exists
|
|
258
|
-
- Claim is factually incorrect
|
|
161
|
+
### Analysis:
|
|
259
162
|
|
|
260
|
-
**
|
|
163
|
+
**Claim 1: "Eliminates race condition"**
|
|
164
|
+
Evidence: Added `await` to sequential operations. No actual mutex.
|
|
165
|
+
Deduction: Race avoided by removing concurrency, not synchronization.
|
|
166
|
+
Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)
|
|
261
167
|
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
|
|
265
|
-
- Node.js event loop model
|
|
266
|
-
- No worker threads used
|
|
168
|
+
**Claim 2: "Adds mutex locking"**
|
|
169
|
+
Evidence: No mutex library, no lock variables, no sync primitives.
|
|
170
|
+
Verdict: ✗ FALSE
|
|
267
171
|
|
|
268
|
-
**
|
|
269
|
-
|
|
270
|
-
|
|
172
|
+
**Claim 3: "100% thread safe"**
|
|
173
|
+
Evidence: JavaScript is single-threaded. No worker threads used.
|
|
174
|
+
Verdict: ? NONSENSICAL (meaningless in this context)
|
|
271
175
|
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
The fix works but not for the reasons claimed. The race condition is avoided by making operations sequential rather than by adding thread synchronization. Tests verify sequential behavior but don't test the original concurrent scenario.
|
|
176
|
+
### Conclusion:
|
|
177
|
+
Fix works but not for reasons claimed. Race condition avoided by
|
|
178
|
+
making operations sequential, not by adding synchronization.
|
|
276
179
|
|
|
277
180
|
### Recommendations:
|
|
278
181
|
1. Update PR description to accurately reflect solution
|
|
279
182
|
2. Add test for concurrent request handling
|
|
280
|
-
3.
|
|
281
|
-
4. Remove incorrect technical claims about "mutex" and "thread safety"
|
|
183
|
+
3. Remove incorrect technical claims
|
|
282
184
|
```
|
|
283
185
|
|
|
284
186
|
---
|
|
285
187
|
|
|
286
|
-
##
|
|
287
|
-
|
|
288
|
-
### Technique 1: The Timeline Reconstruction
|
|
289
|
-
|
|
290
|
-
**Purpose**: Understand the sequence of events leading to current state
|
|
291
|
-
|
|
292
|
-
```bash
|
|
293
|
-
# Build the timeline
|
|
294
|
-
git log --all --graph --oneline --decorate
|
|
295
|
-
|
|
296
|
-
# Examine critical commits
|
|
297
|
-
git log --grep="fix" --grep="bug" --all-match
|
|
298
|
-
|
|
299
|
-
# Find when bug was introduced
|
|
300
|
-
git bisect start
|
|
301
|
-
git bisect bad HEAD
|
|
302
|
-
git bisect good v1.0.0
|
|
303
|
-
```
|
|
304
|
-
|
|
305
|
-
### Technique 2: The Behavioral Analysis
|
|
306
|
-
|
|
307
|
-
**Purpose**: Observe what the code actually does, not what it's supposed to do
|
|
308
|
-
|
|
309
|
-
```javascript
|
|
310
|
-
// Add instrumentation
|
|
311
|
-
console.log('[SHERLOCK] Entering function with:', arguments);
|
|
312
|
-
console.log('[SHERLOCK] State before:', JSON.stringify(state));
|
|
313
|
-
// ... original code ...
|
|
314
|
-
console.log('[SHERLOCK] State after:', JSON.stringify(state));
|
|
315
|
-
console.log('[SHERLOCK] Returning:', result);
|
|
316
|
-
```
|
|
317
|
-
|
|
318
|
-
### Technique 3: The Stress Test
|
|
319
|
-
|
|
320
|
-
**Purpose**: Find limits and breaking points
|
|
188
|
+
## Agent Integration
|
|
321
189
|
|
|
322
|
-
```
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
|
|
333
|
-
### Technique 4: The Forensic Diff
|
|
334
|
-
|
|
335
|
-
**Purpose**: Understand what changed and why
|
|
336
|
-
|
|
337
|
-
```bash
|
|
338
|
-
# Compare claimed vs actual changes
|
|
339
|
-
git diff --word-diff main..feature-branch
|
|
340
|
-
|
|
341
|
-
# Find silent changes (no commit message mention)
|
|
342
|
-
git diff main..feature-branch | grep -A5 -B5 "security\|auth\|password"
|
|
190
|
+
```typescript
|
|
191
|
+
// Evidence-based code review
|
|
192
|
+
await Task("Sherlock Review", {
|
|
193
|
+
prNumber: 123,
|
|
194
|
+
claims: [
|
|
195
|
+
"Fixes memory leak",
|
|
196
|
+
"Improves performance 30%"
|
|
197
|
+
],
|
|
198
|
+
verifyReproduction: true,
|
|
199
|
+
testEdgeCases: true
|
|
200
|
+
}, "qe-code-reviewer");
|
|
343
201
|
|
|
344
|
-
|
|
345
|
-
|
|
202
|
+
// Bug fix verification
|
|
203
|
+
await Task("Verify Fix", {
|
|
204
|
+
bugCommit: 'abc123',
|
|
205
|
+
fixCommit: 'def456',
|
|
206
|
+
reproductionSteps: steps,
|
|
207
|
+
testBoundaryConditions: true
|
|
208
|
+
}, "qe-code-reviewer");
|
|
346
209
|
```
|
|
347
210
|
|
|
348
211
|
---
|
|
349
212
|
|
|
350
|
-
##
|
|
351
|
-
|
|
352
|
-
### Template 1: Bug Fix Verification
|
|
353
|
-
|
|
354
|
-
```markdown
|
|
355
|
-
## Sherlock Investigation: Bug Fix Verification
|
|
356
|
-
|
|
357
|
-
### The Bug Report
|
|
358
|
-
- **Reported**: [date]
|
|
359
|
-
- **Severity**: [P0/P1/P2/P3]
|
|
360
|
-
- **Symptoms**: [what users observed]
|
|
361
|
-
- **Expected**: [what should happen]
|
|
362
|
-
|
|
363
|
-
### The Claimed Fix
|
|
364
|
-
- **PR**: #[number]
|
|
365
|
-
- **Commit**: [hash]
|
|
366
|
-
- **Description**: [claimed solution]
|
|
367
|
-
|
|
368
|
-
### Evidence Collection
|
|
369
|
-
|
|
370
|
-
#### 1. Reproduce Original Bug
|
|
371
|
-
- [ ] Checkout commit before fix
|
|
372
|
-
- [ ] Follow reproduction steps
|
|
373
|
-
- [ ] Confirm bug exists
|
|
374
|
-
- [ ] Document observed behavior
|
|
375
|
-
|
|
376
|
-
#### 2. Verify Fix
|
|
377
|
-
- [ ] Checkout commit with fix
|
|
378
|
-
- [ ] Follow same reproduction steps
|
|
379
|
-
- [ ] Confirm bug is resolved
|
|
380
|
-
- [ ] Test edge cases
|
|
381
|
-
|
|
382
|
-
#### 3. Code Analysis
|
|
383
|
-
- [ ] Review actual code changes
|
|
384
|
-
- [ ] Verify logic addresses root cause
|
|
385
|
-
- [ ] Check for side effects
|
|
386
|
-
- [ ] Assess test coverage
|
|
387
|
-
|
|
388
|
-
### Deductive Analysis
|
|
389
|
-
|
|
390
|
-
**Root Cause Claimed**: [what PR says]
|
|
391
|
-
**Root Cause Actual**: [what evidence shows]
|
|
392
|
-
|
|
393
|
-
**Fix Mechanism Claimed**: [how PR says it works]
|
|
394
|
-
**Fix Mechanism Actual**: [how it actually works]
|
|
395
|
-
|
|
396
|
-
**Coverage Claimed**: [scenarios PR claims to handle]
|
|
397
|
-
**Coverage Actual**: [scenarios actually handled]
|
|
398
|
-
|
|
399
|
-
### Verdict
|
|
400
|
-
|
|
401
|
-
- [ ] Bug is fully fixed
|
|
402
|
-
- [ ] Bug is partially fixed
|
|
403
|
-
- [ ] Bug is not fixed (claim is false)
|
|
404
|
-
- [ ] Bug is fixed but new bugs introduced
|
|
405
|
-
|
|
406
|
-
### Evidence Summary
|
|
407
|
-
[Concise summary of findings with proof]
|
|
408
|
-
|
|
409
|
-
### Recommendations
|
|
410
|
-
1. [Action based on evidence]
|
|
411
|
-
2. [Action based on evidence]
|
|
412
|
-
```
|
|
413
|
-
|
|
414
|
-
### Template 2: Feature Implementation Review
|
|
415
|
-
|
|
416
|
-
```markdown
|
|
417
|
-
## Sherlock Investigation: Feature Implementation
|
|
418
|
-
|
|
419
|
-
### The Feature Request
|
|
420
|
-
- **Requirement**: [what was requested]
|
|
421
|
-
- **Acceptance Criteria**: [how to verify]
|
|
422
|
-
- **User Story**: [why it's needed]
|
|
423
|
-
|
|
424
|
-
### The Implementation Claim
|
|
425
|
-
- **PR**: #[number]
|
|
426
|
-
- **Description**: [what PR claims to deliver]
|
|
427
|
-
- **Scope**: [claimed completeness]
|
|
428
|
-
|
|
429
|
-
### Evidence Examination
|
|
430
|
-
|
|
431
|
-
#### Code Changes
|
|
432
|
-
```bash
|
|
433
|
-
git diff main..feature-branch --stat
|
|
434
|
-
```
|
|
435
|
-
|
|
436
|
-
- Files changed: [count]
|
|
437
|
-
- Lines added: [count]
|
|
438
|
-
- Lines removed: [count]
|
|
439
|
-
- Tests added: [count]
|
|
440
|
-
|
|
441
|
-
#### Acceptance Criteria Testing
|
|
442
|
-
|
|
443
|
-
| Criterion | Claimed | Tested | Verdict |
|
|
444
|
-
|-----------|---------|--------|---------|
|
|
445
|
-
| AC1: [criterion] | ✓ | [yes/no] | [pass/fail] |
|
|
446
|
-
| AC2: [criterion] | ✓ | [yes/no] | [pass/fail] |
|
|
447
|
-
| AC3: [criterion] | ✓ | [yes/no] | [pass/fail] |
|
|
448
|
-
|
|
449
|
-
### Deductive Analysis
|
|
450
|
-
|
|
451
|
-
**Claim**: [what PR says is implemented]
|
|
452
|
-
|
|
453
|
-
**Evidence**:
|
|
454
|
-
- [Fact 1 from code]
|
|
455
|
-
- [Fact 2 from tests]
|
|
456
|
-
- [Fact 3 from behavior]
|
|
457
|
-
|
|
458
|
-
**Deduction**:
|
|
459
|
-
- [Logical conclusion from evidence]
|
|
460
|
-
|
|
461
|
-
**Verdict**: [supported/partially supported/not supported by evidence]
|
|
462
|
-
|
|
463
|
-
### Missing Elements
|
|
464
|
-
- [ ] [Feature aspect not implemented]
|
|
465
|
-
- [ ] [Test scenario not covered]
|
|
466
|
-
- [ ] [Edge case not handled]
|
|
213
|
+
## Agent Coordination Hints
|
|
467
214
|
|
|
468
|
-
###
|
|
469
|
-
[Evidence-based assessment of implementation completeness]
|
|
215
|
+
### Memory Namespace
|
|
470
216
|
```
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
### The Claim
|
|
478
|
-
"Improved performance by [X]% in [scenario]"
|
|
479
|
-
|
|
480
|
-
### Investigation Setup
|
|
481
|
-
|
|
482
|
-
#### Baseline Measurement
|
|
483
|
-
```bash
|
|
484
|
-
git checkout [before-commit]
|
|
485
|
-
npm run benchmark > baseline.txt
|
|
217
|
+
aqe/sherlock/
|
|
218
|
+
├── investigations/* - Investigation reports
|
|
219
|
+
├── evidence/* - Collected evidence
|
|
220
|
+
├── verdicts/* - Claim verdicts
|
|
221
|
+
└── patterns/* - Common deception patterns
|
|
486
222
|
```
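To make the namespace layout above concrete, here is a minimal sketch of a key builder. It is an illustration only: `sherlockKey` and `SherlockCategory` are hypothetical names, and the assumption that each entry lives at `aqe/sherlock/<category>/<id>` goes beyond what the listing states.

```typescript
// Hypothetical helper: builds keys under the aqe/sherlock/ namespace.
// The key layout below the category level is an assumption.
const SHERLOCK_ROOT = "aqe/sherlock";

type SherlockCategory = "investigations" | "evidence" | "verdicts" | "patterns";

function sherlockKey(category: SherlockCategory, id: string): string {
  // Reject empty ids and path separators so a key cannot escape its category.
  if (id.length === 0 || id.indexOf("/") !== -1) {
    throw new Error('invalid id: "' + id + '"');
  }
  return SHERLOCK_ROOT + "/" + category + "/" + id;
}

console.log(sherlockKey("verdicts", "pr-123")); // aqe/sherlock/verdicts/pr-123
```

Restricting `category` to a union type means a misspelled category fails at compile time rather than silently writing to an unexpected key.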
|
|
487
223
|
|
|
488
|
-
|
|
489
|
-
```
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
|
|
494
|
-
|
|
495
|
-
|
|
496
|
-
|
|
497
|
-
|
|
498
|
-
|
|
499
|
-
|--------|--------|-------|-------------|---------|
|
|
500
|
-
| Latency | [ms] | [ms] | [%] | [%] |
|
|
501
|
-
| Throughput | [req/s] | [req/s] | [%] | [%] |
|
|
502
|
-
| Memory | [MB] | [MB] | [%] | [%] |
|
|
503
|
-
| CPU | [%] | [%] | [%] | [%] |
|
|
504
|
-
|
|
505
|
-
### Deductive Analysis
|
|
506
|
-
|
|
507
|
-
**Claimed Improvement**: [X]%
|
|
508
|
-
**Measured Improvement**: [Y]%
|
|
509
|
-
**Variance**: [X-Y]%
|
|
510
|
-
|
|
511
|
-
**Measurement Conditions**:
|
|
512
|
-
- Environment: [prod/dev/local]
|
|
513
|
-
- Load: [concurrent users/requests]
|
|
514
|
-
- Data size: [records/MB]
|
|
515
|
-
|
|
516
|
-
**Verdict**:
|
|
517
|
-
- [ ] Claim supported by evidence
|
|
518
|
-
- [ ] Claim exaggerated (actual: [Y]%)
|
|
519
|
-
- [ ] Claim not reproducible
|
|
520
|
-
- [ ] Claim based on cherry-picked scenario
|
|
521
|
-
|
|
522
|
-
### Conclusion
|
|
523
|
-
[Evidence-based assessment with actual numbers]
|
|
224
|
+
### Fleet Coordination
|
|
225
|
+
```typescript
|
|
226
|
+
const investigationFleet = await FleetManager.coordinate({
|
|
227
|
+
strategy: 'evidence-investigation',
|
|
228
|
+
agents: [
|
|
229
|
+
'qe-code-reviewer', // Code analysis
|
|
230
|
+
'qe-security-auditor', // Security claim verification
|
|
231
|
+
'qe-performance-validator' // Performance claim verification
|
|
232
|
+
],
|
|
233
|
+
topology: 'parallel'
|
|
234
|
+
});
|
|
524
235
|
```
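With a parallel topology, each agent would report on its own claims, and those reports would then need to be merged into a single outcome. The sketch below is one assumption about how that merge could look; `AgentFinding`, `mergeFindings`, and the verdict labels are hypothetical and are not taken from agentic-qe. It simply returns the most severe verdict reported by any agent.

```typescript
// Hypothetical types: not part of the agentic-qe API.
type ClaimVerdict = "true" | "partially-true" | "false";

interface AgentFinding {
  agent: string; // e.g. "qe-code-reviewer"
  claim: string;
  verdict: ClaimVerdict;
}

// Higher number means a more severe verdict.
const verdictSeverity: Record<ClaimVerdict, number> = {
  "true": 0,
  "partially-true": 1,
  "false": 2,
};

function mergeFindings(findings: AgentFinding[]): ClaimVerdict {
  // An empty list means nothing was verified, which is not the same as "true".
  if (findings.length === 0) {
    throw new Error("no findings: nothing has been verified");
  }
  let worst: ClaimVerdict = "true";
  for (const finding of findings) {
    if (verdictSeverity[finding.verdict] > verdictSeverity[worst]) {
      worst = finding.verdict;
    }
  }
  return worst;
}

const mergedVerdict = mergeFindings([
  { agent: "qe-code-reviewer", claim: "Fixes memory leak", verdict: "true" },
  { agent: "qe-performance-validator", claim: "Improves performance 30%", verdict: "partially-true" },
]);
console.log(mergedVerdict); // partially-true
```

Taking the worst verdict is a conservative choice: one unsupported claim is enough to hold back approval, which matches the evidence-first stance of the skill.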
|
|
525
236
|
|
|
526
237
|
---
|
|
527
238
|
|
|
528
|
-
##
|
|
529
|
-
|
|
530
|
-
|
|
531
|
-
|
|
532
|
-
> "I can't make bricks without clay."
|
|
533
|
-
|
|
534
|
-
**Application**: Collect comprehensive evidence before forming conclusions
|
|
535
|
-
|
|
536
|
-
- Logs, traces, metrics
|
|
537
|
-
- Test results, coverage reports
|
|
538
|
-
- Code diffs, git history
|
|
539
|
-
- Reproduction steps
|
|
540
|
-
|
|
541
|
-
### Principle 2: "Eliminate the Impossible"
|
|
542
|
-
|
|
543
|
-
> "When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
|
|
544
|
-
|
|
545
|
-
**Application**: Use negative testing and boundary analysis
|
|
546
|
-
|
|
547
|
-
- Test what should NOT happen
|
|
548
|
-
- Verify constraints are enforced
|
|
549
|
-
- Check impossible inputs are rejected
|
|
550
|
-
- Validate error handling paths
|
|
551
|
-
|
|
552
|
-
### Principle 3: "Observe, Don't Assume"
|
|
553
|
-
|
|
554
|
-
> "You see, but you do not observe."
|
|
555
|
-
|
|
556
|
-
**Application**: Run the code, don't just read it
|
|
557
|
-
|
|
558
|
-
- Execute tests locally
|
|
559
|
-
- Step through debugger
|
|
560
|
-
- Profile performance
|
|
561
|
-
- Monitor resource usage
|
|
562
|
-
|
|
563
|
-
### Principle 4: "The Little Things Matter"
|
|
564
|
-
|
|
565
|
-
> "It has long been an axiom of mine that the little things are infinitely the most important."
|
|
566
|
-
|
|
567
|
-
**Application**: Pay attention to details others miss
|
|
568
|
-
|
|
569
|
-
- Off-by-one errors
|
|
570
|
-
- Null/undefined handling
|
|
571
|
-
- Timezone conversions
|
|
572
|
-
- Race conditions
|
|
573
|
-
- Memory leaks
|
|
574
|
-
|
|
575
|
-
### Principle 5: "Question Everything"
|
|
576
|
-
|
|
577
|
-
> "I never guess. It is a capital mistake to theorize before one has data."
|
|
578
|
-
|
|
579
|
-
**Application**: Verify all claims empirically
|
|
580
|
-
|
|
581
|
-
- Don't trust commit messages
|
|
582
|
-
- Don't trust documentation
|
|
583
|
-
- Don't trust "it works on my machine"
|
|
584
|
-
- Trust only reproducible evidence
|
|
585
|
-
|
|
586
|
-
---
|
|
587
|
-
|
|
588
|
-
## The Sherlock Review Checklist
|
|
589
|
-
|
|
590
|
-
Before approving any PR, verify:
|
|
591
|
-
|
|
592
|
-
### Evidence-Based Review
|
|
593
|
-
|
|
594
|
-
- [ ] **Claim vs Reality**: Does code match description?
|
|
595
|
-
- [ ] **Tests Verify Claims**: Do tests actually prove the fix/feature?
|
|
596
|
-
- [ ] **Reproducible**: Can you reproduce the bug/feature locally?
|
|
597
|
-
- [ ] **Edge Cases**: Are boundary conditions tested?
|
|
598
|
-
- [ ] **Negative Cases**: Are failure paths tested?
|
|
599
|
-
|
|
600
|
-
### Deductive Reasoning
|
|
601
|
-
|
|
602
|
-
- [ ] **Root Cause**: Does fix address actual root cause?
|
|
603
|
-
- [ ] **Side Effects**: Could this break something else?
|
|
604
|
-
- [ ] **Performance**: Any evidence for performance claims?
|
|
605
|
-
- [ ] **Security**: Any security implications?
|
|
606
|
-
- [ ] **Assumptions**: Are all assumptions validated?
|
|
607
|
-
|
|
608
|
-
### Observational Analysis
|
|
609
|
-
|
|
610
|
-
- [ ] **Code Quality**: Is code doing what it appears to do?
|
|
611
|
-
- [ ] **Error Handling**: Are errors handled or just hidden?
|
|
612
|
-
- [ ] **Resource Management**: Are resources properly managed?
|
|
613
|
-
- [ ] **Concurrency**: Any race conditions or deadlocks?
|
|
614
|
-
- [ ] **Data Validation**: Is input validated?
|
|
615
|
-
|
|
616
|
-
### Timeline Verification
|
|
617
|
-
|
|
618
|
-
- [ ] **Related Changes**: Are there related commits?
|
|
619
|
-
- [ ] **Regression Risk**: Could this reintroduce old bugs?
|
|
620
|
-
- [ ] **Dependencies**: Are dependency changes necessary?
|
|
621
|
-
- [ ] **Migration Path**: Is there a rollback plan?
|
|
239
|
+
## Related Skills
|
|
240
|
+
- [brutal-honesty-review](../brutal-honesty-review/) - Direct technical criticism
|
|
241
|
+
- [context-driven-testing](../context-driven-testing/) - Adapt to context
|
|
242
|
+
- [bug-reporting-excellence](../bug-reporting-excellence/) - Document findings
|
|
622
243
|
|
|
623
244
|
---
|
|
624
245
|
|
|
625
|
-
##
|
|
626
|
-
|
|
627
|
-
### Scenario 1: "This Fixed the Bug"
|
|
628
|
-
|
|
629
|
-
**Investigation Steps**:
|
|
630
|
-
1. Reproduce bug on commit before fix
|
|
631
|
-
2. Verify bug is gone on commit with fix
|
|
632
|
-
3. Check if fix addresses root cause or just symptom
|
|
633
|
-
4. Test edge cases not in original bug report
|
|
634
|
-
5. Verify no regression in related functionality
|
|
635
|
-
|
|
636
|
-
**Red Flags**:
|
|
637
|
-
- Bug "fix" that just removes error logging
|
|
638
|
-
- Fix that works only for specific test case
|
|
639
|
-
- Fix that introduces workarounds instead of solving root cause
|
|
640
|
-
- No test added to prevent regression
|
|
641
|
-
|
|
642
|
-
### Scenario 2: "Improved Performance by 50%"
|
|
643
|
-
|
|
644
|
-
**Investigation Steps**:
|
|
645
|
-
1. Run benchmark on baseline commit
|
|
646
|
-
2. Run same benchmark on optimized commit
|
|
647
|
-
3. Compare results in identical conditions
|
|
648
|
-
4. Verify measurement methodology
|
|
649
|
-
5. Test under realistic load
|
|
650
|
-
|
|
651
|
-
**Red Flags**:
|
|
652
|
-
- Performance tested only on toy data
|
|
653
|
-
- Comparison uses different conditions
|
|
654
|
-
- "Improvement" in non-critical path
|
|
655
|
-
- Trade-off not mentioned (e.g., memory for speed)
|
|
656
|
-
|
|
657
|
-
### Scenario 3: "Added Comprehensive Error Handling"
|
|
658
|
-
|
|
659
|
-
**Investigation Steps**:
|
|
660
|
-
1. List all error paths in code
|
|
661
|
-
2. Verify each path has handling
|
|
662
|
-
3. Test each error condition
|
|
663
|
-
4. Check error messages are actionable
|
|
664
|
-
5. Verify errors are logged/monitored
|
|
665
|
-
|
|
666
|
-
**Red Flags**:
|
|
667
|
-
- Errors caught but ignored (`catch {}`)
|
|
668
|
-
- Generic error messages
|
|
669
|
-
- Errors handled by crashing
|
|
670
|
-
- No logging of critical errors
|
|
671
|
-
|
|
672
|
-
---
|
|
673
|
-
|
|
674
|
-
## Output Format
|
|
675
|
-
|
|
676
|
-
### The Sherlock Report
|
|
677
|
-
|
|
678
|
-
```markdown
|
|
679
|
-
# Sherlock Investigation Report
|
|
680
|
-
|
|
681
|
-
**Case**: [PR/Issue number and title]
|
|
682
|
-
**Investigator**: [Your name]
|
|
683
|
-
**Date**: [Investigation date]
|
|
684
|
-
|
|
685
|
-
## Summary
|
|
686
|
-
[One paragraph: What was claimed, what was found, verdict]
|
|
687
|
-
|
|
688
|
-
## Claims Examined
|
|
689
|
-
1. [Claim 1]
|
|
690
|
-
2. [Claim 2]
|
|
691
|
-
3. [Claim 3]
|
|
692
|
-
|
|
693
|
-
## Evidence Collected
|
|
694
|
-
- Code changes: [summary]
|
|
695
|
-
- Tests added: [count and coverage]
|
|
696
|
-
- Benchmarks: [results]
|
|
697
|
-
- Manual testing: [scenarios tested]
|
|
698
|
-
|
|
699
|
-
## Deductive Analysis
|
|
700
|
-
|
|
701
|
-
### Claim 1: [Claim text]
|
|
702
|
-
**Evidence**: [What you found]
|
|
703
|
-
**Deduction**: [Logical conclusion]
|
|
704
|
-
**Verdict**: ✓ TRUE / ✗ FALSE / ⚠ PARTIALLY TRUE
|
|
705
|
-
|
|
706
|
-
[Repeat for each claim]
|
|
707
|
-
|
|
708
|
-
## Findings
|
|
709
|
-
|
|
710
|
-
### What Works
|
|
711
|
-
- [Positive finding with evidence]
|
|
712
|
-
|
|
713
|
-
### What Doesn't Work
|
|
714
|
-
- [Issue found with evidence]
|
|
715
|
-
|
|
716
|
-
### What's Missing
|
|
717
|
-
- [Gap in implementation/testing]
|
|
718
|
-
|
|
719
|
-
## Overall Verdict
|
|
720
|
-
|
|
721
|
-
- [ ] Approve: Claims fully supported by evidence
|
|
722
|
-
- [ ] Approve with Reservations: Claims mostly accurate
|
|
723
|
-
- [ ] Request Changes: Claims not supported by evidence
|
|
724
|
-
- [ ] Reject: Claims are false or misleading
|
|
725
|
-
|
|
726
|
-
## Recommendations
|
|
727
|
-
1. [Action item based on findings]
|
|
728
|
-
2. [Action item based on findings]
|
|
729
|
-
|
|
730
|
-
---
|
|
731
|
-
|
|
732
|
-
**Elementary Evidence**: [Link to detailed evidence files/logs]
|
|
733
|
-
**Reproducible**: [Yes/No - Can others verify your findings?]
|
|
734
|
-
```
|
|
735
|
-
|
|
736
|
-
---
|
|
737
|
-
|
|
738
|
-
## Integration with AQE Fleet
|
|
739
|
-
|
|
740
|
-
### Use Sherlock Review With:
|
|
741
|
-
|
|
742
|
-
1. **qe-code-reviewer**: After automated review, investigate flagged issues
|
|
743
|
-
2. **qe-security-auditor**: Verify security fix claims
|
|
744
|
-
3. **qe-performance-validator**: Validate performance improvement claims
|
|
745
|
-
4. **qe-flaky-test-hunter**: Investigate "test fixed" claims
|
|
746
|
-
5. **production-validator**: Verify deployment-ready claims
|
|
747
|
-
|
|
748
|
-
### Workflow Integration
|
|
749
|
-
|
|
750
|
-
```bash
|
|
751
|
-
# 1. Automated review flags issues
|
|
752
|
-
aqe review --pr 123
|
|
753
|
-
|
|
754
|
-
# 2. Sherlock investigates flagged claims
|
|
755
|
-
# [Apply Sherlock methodology to each flag]
|
|
756
|
-
|
|
757
|
-
# 3. Document evidence-based findings
|
|
758
|
-
# [Generate Sherlock report]
|
|
759
|
-
|
|
760
|
-
# 4. Provide actionable feedback
|
|
761
|
-
# [Based on evidence, not assumptions]
|
|
762
|
-
```
|
|
763
|
-
|
|
764
|
-
---
|
|
765
|
-
|
|
766
|
-
## Learn More
|
|
767
|
-
|
|
768
|
-
### Recommended Reading
|
|
769
|
-
- "The Adventure of Silver Blaze" - Importance of negative evidence
|
|
770
|
-
- "A Scandal in Bohemia" - Observation vs. seeing
|
|
771
|
-
- "The Boscombe Valley Mystery" - Following the evidence chain
|
|
772
|
-
|
|
773
|
-
### Related QE Skills
|
|
774
|
-
- `brutal-honesty-review` - Direct technical criticism
|
|
775
|
-
- `context-driven-testing` - Adapt to specific context
|
|
776
|
-
- `exploratory-testing-advanced` - Investigation techniques
|
|
777
|
-
- `bug-reporting-excellence` - Document findings clearly
|
|
778
|
-
|
|
779
|
-
---
|
|
246
|
+
## Remember
|
|
780
247
|
|
|
781
|
-
**
|
|
782
|
-
**Category**: Quality Engineering
|
|
783
|
-
**Approach**: Evidence-Based Investigation
|
|
784
|
-
**Philosophy**: "Elementary" - Trust only what can be proven
|
|
248
|
+
**"It is a capital mistake to theorize before one has data."** Trust only reproducible evidence. Don't trust commit messages, documentation, or "works on my machine."
|
|
785
249
|
|
|
786
|
-
|
|
250
|
+
**The Sherlock Standard:** Every claim must be verified empirically. What does the evidence actually show?
|