agentic-qe 2.0.0 → 2.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (144)
  1. package/.claude/agents/qx-partner.md +17 -4
  2. package/.claude/skills/accessibility-testing/SKILL.md +144 -692
  3. package/.claude/skills/agentic-quality-engineering/SKILL.md +176 -529
  4. package/.claude/skills/api-testing-patterns/SKILL.md +180 -560
  5. package/.claude/skills/brutal-honesty-review/SKILL.md +113 -603
  6. package/.claude/skills/bug-reporting-excellence/SKILL.md +116 -517
  7. package/.claude/skills/chaos-engineering-resilience/SKILL.md +127 -72
  8. package/.claude/skills/cicd-pipeline-qe-orchestrator/SKILL.md +209 -404
  9. package/.claude/skills/code-review-quality/SKILL.md +158 -608
  10. package/.claude/skills/compatibility-testing/SKILL.md +148 -38
  11. package/.claude/skills/compliance-testing/SKILL.md +132 -63
  12. package/.claude/skills/consultancy-practices/SKILL.md +114 -446
  13. package/.claude/skills/context-driven-testing/SKILL.md +117 -381
  14. package/.claude/skills/contract-testing/SKILL.md +176 -141
  15. package/.claude/skills/database-testing/SKILL.md +137 -130
  16. package/.claude/skills/exploratory-testing-advanced/SKILL.md +160 -629
  17. package/.claude/skills/holistic-testing-pact/SKILL.md +140 -188
  18. package/.claude/skills/localization-testing/SKILL.md +145 -33
  19. package/.claude/skills/mobile-testing/SKILL.md +132 -448
  20. package/.claude/skills/mutation-testing/SKILL.md +147 -41
  21. package/.claude/skills/performance-testing/SKILL.md +200 -546
  22. package/.claude/skills/quality-metrics/SKILL.md +164 -519
  23. package/.claude/skills/refactoring-patterns/SKILL.md +132 -699
  24. package/.claude/skills/regression-testing/SKILL.md +120 -926
  25. package/.claude/skills/risk-based-testing/SKILL.md +157 -660
  26. package/.claude/skills/security-testing/SKILL.md +199 -538
  27. package/.claude/skills/sherlock-review/SKILL.md +163 -699
  28. package/.claude/skills/shift-left-testing/SKILL.md +161 -465
  29. package/.claude/skills/shift-right-testing/SKILL.md +161 -519
  30. package/.claude/skills/six-thinking-hats/SKILL.md +175 -1110
  31. package/.claude/skills/skills-manifest.json +71 -20
  32. package/.claude/skills/tdd-london-chicago/SKILL.md +131 -448
  33. package/.claude/skills/technical-writing/SKILL.md +103 -154
  34. package/.claude/skills/test-automation-strategy/SKILL.md +166 -772
  35. package/.claude/skills/test-data-management/SKILL.md +126 -910
  36. package/.claude/skills/test-design-techniques/SKILL.md +179 -89
  37. package/.claude/skills/test-environment-management/SKILL.md +136 -91
  38. package/.claude/skills/test-reporting-analytics/SKILL.md +169 -92
  39. package/.claude/skills/testability-scoring/SKILL.md +172 -538
  40. package/.claude/skills/testability-scoring/scripts/generate-html-report.js +0 -0
  41. package/.claude/skills/visual-testing-advanced/SKILL.md +155 -78
  42. package/.claude/skills/xp-practices/SKILL.md +151 -587
  43. package/CHANGELOG.md +86 -0
  44. package/README.md +23 -16
  45. package/dist/agents/QXPartnerAgent.d.ts +47 -1
  46. package/dist/agents/QXPartnerAgent.d.ts.map +1 -1
  47. package/dist/agents/QXPartnerAgent.js +2086 -125
  48. package/dist/agents/QXPartnerAgent.js.map +1 -1
  49. package/dist/agents/lifecycle/AgentLifecycleManager.d.ts.map +1 -1
  50. package/dist/agents/lifecycle/AgentLifecycleManager.js +34 -31
  51. package/dist/agents/lifecycle/AgentLifecycleManager.js.map +1 -1
  52. package/dist/cli/commands/init-claude-md-template.d.ts.map +1 -1
  53. package/dist/cli/commands/init-claude-md-template.js +14 -0
  54. package/dist/cli/commands/init-claude-md-template.js.map +1 -1
  55. package/dist/core/SwarmCoordinator.d.ts +180 -0
  56. package/dist/core/SwarmCoordinator.d.ts.map +1 -0
  57. package/dist/core/SwarmCoordinator.js +473 -0
  58. package/dist/core/SwarmCoordinator.js.map +1 -0
  59. package/dist/core/memory/ReflexionMemoryAdapter.d.ts +109 -0
  60. package/dist/core/memory/ReflexionMemoryAdapter.d.ts.map +1 -0
  61. package/dist/core/memory/ReflexionMemoryAdapter.js +306 -0
  62. package/dist/core/memory/ReflexionMemoryAdapter.js.map +1 -0
  63. package/dist/core/memory/RuVectorPatternStore.d.ts +28 -0
  64. package/dist/core/memory/RuVectorPatternStore.d.ts.map +1 -1
  65. package/dist/core/memory/RuVectorPatternStore.js +70 -0
  66. package/dist/core/memory/RuVectorPatternStore.js.map +1 -1
  67. package/dist/core/memory/SparseVectorSearch.d.ts +55 -0
  68. package/dist/core/memory/SparseVectorSearch.d.ts.map +1 -0
  69. package/dist/core/memory/SparseVectorSearch.js +130 -0
  70. package/dist/core/memory/SparseVectorSearch.js.map +1 -0
  71. package/dist/core/memory/TieredCompression.d.ts +81 -0
  72. package/dist/core/memory/TieredCompression.d.ts.map +1 -0
  73. package/dist/core/memory/TieredCompression.js +270 -0
  74. package/dist/core/memory/TieredCompression.js.map +1 -0
  75. package/dist/core/memory/index.d.ts +6 -0
  76. package/dist/core/memory/index.d.ts.map +1 -1
  77. package/dist/core/memory/index.js +29 -1
  78. package/dist/core/memory/index.js.map +1 -1
  79. package/dist/core/metrics/MetricsAggregator.d.ts +228 -0
  80. package/dist/core/metrics/MetricsAggregator.d.ts.map +1 -0
  81. package/dist/core/metrics/MetricsAggregator.js +482 -0
  82. package/dist/core/metrics/MetricsAggregator.js.map +1 -0
  83. package/dist/core/metrics/index.d.ts +5 -0
  84. package/dist/core/metrics/index.d.ts.map +1 -0
  85. package/dist/core/metrics/index.js +11 -0
  86. package/dist/core/metrics/index.js.map +1 -0
  87. package/dist/core/optimization/SwarmOptimizer.d.ts +5 -0
  88. package/dist/core/optimization/SwarmOptimizer.d.ts.map +1 -1
  89. package/dist/core/optimization/SwarmOptimizer.js +17 -0
  90. package/dist/core/optimization/SwarmOptimizer.js.map +1 -1
  91. package/dist/core/orchestration/AdaptiveScheduler.d.ts +190 -0
  92. package/dist/core/orchestration/AdaptiveScheduler.d.ts.map +1 -0
  93. package/dist/core/orchestration/AdaptiveScheduler.js +460 -0
  94. package/dist/core/orchestration/AdaptiveScheduler.js.map +1 -0
  95. package/dist/core/orchestration/WorkflowOrchestrator.d.ts +13 -0
  96. package/dist/core/orchestration/WorkflowOrchestrator.d.ts.map +1 -1
  97. package/dist/core/orchestration/WorkflowOrchestrator.js +32 -0
  98. package/dist/core/orchestration/WorkflowOrchestrator.js.map +1 -1
  99. package/dist/core/recovery/CircuitBreaker.d.ts +176 -0
  100. package/dist/core/recovery/CircuitBreaker.d.ts.map +1 -0
  101. package/dist/core/recovery/CircuitBreaker.js +382 -0
  102. package/dist/core/recovery/CircuitBreaker.js.map +1 -0
  103. package/dist/core/recovery/RecoveryOrchestrator.d.ts +186 -0
  104. package/dist/core/recovery/RecoveryOrchestrator.d.ts.map +1 -0
  105. package/dist/core/recovery/RecoveryOrchestrator.js +476 -0
  106. package/dist/core/recovery/RecoveryOrchestrator.js.map +1 -0
  107. package/dist/core/recovery/RetryStrategy.d.ts +127 -0
  108. package/dist/core/recovery/RetryStrategy.d.ts.map +1 -0
  109. package/dist/core/recovery/RetryStrategy.js +314 -0
  110. package/dist/core/recovery/RetryStrategy.js.map +1 -0
  111. package/dist/core/recovery/index.d.ts +8 -0
  112. package/dist/core/recovery/index.d.ts.map +1 -0
  113. package/dist/core/recovery/index.js +27 -0
  114. package/dist/core/recovery/index.js.map +1 -0
  115. package/dist/core/skills/DependencyResolver.d.ts +99 -0
  116. package/dist/core/skills/DependencyResolver.d.ts.map +1 -0
  117. package/dist/core/skills/DependencyResolver.js +260 -0
  118. package/dist/core/skills/DependencyResolver.js.map +1 -0
  119. package/dist/core/skills/ManifestGenerator.d.ts +114 -0
  120. package/dist/core/skills/ManifestGenerator.d.ts.map +1 -0
  121. package/dist/core/skills/ManifestGenerator.js +449 -0
  122. package/dist/core/skills/ManifestGenerator.js.map +1 -0
  123. package/dist/core/skills/index.d.ts +9 -0
  124. package/dist/core/skills/index.d.ts.map +1 -0
  125. package/dist/core/skills/index.js +24 -0
  126. package/dist/core/skills/index.js.map +1 -0
  127. package/dist/mcp/handlers/chaos/chaos-inject-failure.d.ts +5 -0
  128. package/dist/mcp/handlers/chaos/chaos-inject-failure.d.ts.map +1 -1
  129. package/dist/mcp/handlers/chaos/chaos-inject-failure.js +36 -2
  130. package/dist/mcp/handlers/chaos/chaos-inject-failure.js.map +1 -1
  131. package/dist/mcp/handlers/chaos/chaos-inject-latency.d.ts +5 -0
  132. package/dist/mcp/handlers/chaos/chaos-inject-latency.d.ts.map +1 -1
  133. package/dist/mcp/handlers/chaos/chaos-inject-latency.js +36 -2
  134. package/dist/mcp/handlers/chaos/chaos-inject-latency.js.map +1 -1
  135. package/dist/mcp/server.d.ts +9 -9
  136. package/dist/mcp/server.d.ts.map +1 -1
  137. package/dist/mcp/server.js +1 -2
  138. package/dist/mcp/server.js.map +1 -1
  139. package/dist/types/qx.d.ts +113 -7
  140. package/dist/types/qx.d.ts.map +1 -1
  141. package/dist/types/qx.js.map +1 -1
  142. package/dist/visualization/api/RestEndpoints.js +1 -1
  143. package/dist/visualization/api/RestEndpoints.js.map +1 -1
  144. package/package.json +15 -54
@@ -1,786 +1,250 @@
1
1
  ---
2
- name: "Sherlock Review"
2
+ name: sherlock-review
3
3
  description: "Evidence-based investigative code review using deductive reasoning to determine what actually happened versus what was claimed. Use when verifying implementation claims, investigating bugs, validating fixes, or conducting root cause analysis. Elementary approach to finding truth through systematic observation."
4
+ category: quality-review
5
+ priority: high
6
+ tokenEstimate: 1100
7
+ agents: [qe-code-reviewer, qe-security-auditor, qe-performance-validator]
8
+ implementation_status: optimized
9
+ optimization_version: 1.0
10
+ last_optimized: 2025-12-03
11
+ dependencies: []
12
+ quick_reference_card: true
13
+ tags: [investigation, evidence-based, code-review, root-cause, deduction]
4
14
  ---
5
15
 
6
16
  # Sherlock Review
7
17
 
8
- ## What This Skill Does
9
-
10
- Conducts methodical, evidence-based investigation of code, tests, and system behavior using Holmesian deductive reasoning. Unlike traditional code reviews that focus on style and best practices, Sherlock Review investigates **what actually happened** versus **what was claimed to happen**, seeing what others miss through systematic observation and logical deduction.
11
-
12
- ## Prerequisites
13
-
14
- - Access to codebase and version control history
15
- - Ability to run tests and reproduce issues
16
- - Understanding of the domain and system architecture
17
- - Critical thinking and skepticism
18
-
19
- ---
20
-
21
- ## Quick Start (Elementary Method)
22
-
23
- ### The 3-Step Investigation
18
+ <default_to_action>
19
+ When investigating code claims:
20
+ 1. OBSERVE: Gather all evidence (code, tests, history, behavior)
21
+ 2. DEDUCE: What does evidence actually show vs. what was claimed?
22
+ 3. ELIMINATE: Rule out what cannot be true
23
+ 4. CONCLUDE: Does evidence support the claim?
24
+ 5. DOCUMENT: Findings with proof, not assumptions
24
25
 
26
+ **The 3-Step Investigation:**
25
27
  ```bash
26
- # 1. OBSERVE: Gather all evidence
27
- git log --oneline -10
28
+ # 1. OBSERVE: Gather evidence
28
29
  git diff <commit>
29
- grep -r "claimed feature" .
30
+ npm test -- --coverage
30
31
 
31
- # 2. DEDUCE: What does the evidence actually show?
32
- npm test
33
- git blame <file>
32
+ # 2. DEDUCE: Compare claim vs reality
33
+ # Does code match description?
34
+ # Do tests prove the fix/feature?
34
35
 
35
- # 3. CONCLUDE: Does evidence support the claim?
36
- # Document findings with evidence
36
+ # 3. CONCLUDE: Verdict with evidence
37
+ # SUPPORTED / PARTIALLY SUPPORTED / NOT SUPPORTED
37
38
  ```
38
39
 
39
- ---
40
-
41
- ## Investigation Methodology
42
-
43
- ### Level 1: Initial Observation (The Crime Scene)
44
-
45
- **Principle**: "You see, but you do not observe. The distinction is clear."
46
-
47
- #### What to Examine First
40
+ **Holmesian Principles:**
41
+ - "Data! Data! Data!" - Collect before concluding
42
+ - "Eliminate the impossible" - What cannot be true?
43
+ - "You see, but do not observe" - Run code, don't just read
44
+ - Trust only reproducible evidence
45
+ </default_to_action>
48
46
 
49
- 1. **The Claim**: What was supposed to happen?
50
- - PR description
51
- - Commit messages
52
- - Issue tickets
53
- - Documentation updates
47
+ ## Quick Reference Card
54
48
 
55
- 2. **The Evidence**: What actually exists?
56
- - Actual code changes
57
- - Test coverage
58
- - Build/test results
59
- - Runtime behavior
49
+ ### Evidence Collection Checklist
60
50
 
61
- 3. **The Timeline**: When did things happen?
62
- - Commit history
63
- - File modification times
64
- - Test execution logs
65
- - Deployment records
51
+ | Category | What to Check | How |
52
+ |----------|---------------|-----|
53
+ | **Claim** | PR description, commit messages | Read thoroughly |
54
+ | **Code** | Actual file changes | `git diff` |
55
+ | **Tests** | Coverage, assertions | Run independently |
56
+ | **Behavior** | Runtime output | Execute locally |
57
+ | **Timeline** | When things happened | `git log`, `git blame` |
66
58
 
67
- #### Evidence Collection Checklist
59
+ ### Verdict Levels
68
60
 
69
- ```markdown
70
- ## Evidence Collection
71
-
72
- ### The Claim
73
- - [ ] Read PR/issue description thoroughly
74
- - [ ] Note all claimed features/fixes
75
- - [ ] Identify specific assertions made
76
- - [ ] Record expected behavior
77
-
78
- ### The Code
79
- - [ ] Examine actual file changes
80
- - [ ] Review implementation details
81
- - [ ] Check for edge cases
82
- - [ ] Verify error handling
83
-
84
- ### The Tests
85
- - [ ] Count test cases added/modified
86
- - [ ] Run tests independently
87
- - [ ] Check test assertions
88
- - [ ] Verify test coverage
89
-
90
- ### The Behavior
91
- - [ ] Run the code locally
92
- - [ ] Test claimed scenarios
93
- - [ ] Try edge cases
94
- - [ ] Reproduce reported fixes
95
- ```
61
+ | Verdict | Meaning |
62
+ |---------|---------|
63
+ | ✓ **TRUE** | Evidence fully supports claim |
64
+ | ⚠ **PARTIALLY TRUE** | Claim accurate but incomplete |
65
+ | ✗ **FALSE** | Evidence contradicts claim |
66
+ | ? **NONSENSICAL** | Claim doesn't apply to context |
96
67
 
97
68
  ---
98
69
 
99
- ## Level 2: Deductive Analysis (Elementary Reasoning)
100
-
101
- ### The Sherlock Framework
102
-
103
- #### 1. Eliminate the Impossible
104
-
105
- **Method**: Systematically rule out what cannot be true
70
+ ## Investigation Template
106
71
 
107
72
  ```markdown
108
- ## Investigation Notes
73
+ ## Sherlock Investigation: [Claim]
109
74
 
110
- ### Claim: "Fixed user authentication bug"
75
+ ### The Claim
76
+ "[What PR/commit claims to do]"
111
77
 
112
- #### Evidence Review:
113
- - Modified auth.js (lines 45-67)
114
- - Added 2 new test cases
115
- - No changes to login flow
116
- - ✗ No database migration
117
- - ✗ Tests pass but don't cover reported scenario
78
+ ### Evidence Examined
79
+ - Code changes: [files, lines]
80
+ - Tests added: [count, coverage]
81
+ - Behavior observed: [what actually happens]
118
82
 
119
- #### Deductions:
120
- - IMPOSSIBLE: Fix covers all auth scenarios (no login flow changes)
121
- - POSSIBLE: Fix covers specific password reset case
122
- - LIKELY: Fix is partial, limited to one code path
123
- ```
83
+ ### Deductive Analysis
124
84
 
125
- #### 2. Follow the Evidence Chain
85
+ **Claim**: [specific assertion]
86
+ **Evidence**: [what you found]
87
+ **Deduction**: [logical conclusion]
88
+ **Verdict**: ✓/⚠/✗
126
89
 
127
- **Method**: Connect observable facts to logical conclusions
90
+ ### Findings
91
+ - What works: [with evidence]
92
+ - What doesn't: [with evidence]
93
+ - What's missing: [gaps in implementation/testing]
128
94
 
129
- ```markdown
130
- ## Evidence Chain
131
-
132
- ### Observation 1: Test passes locally
133
- ### Observation 2: Test fails in CI
134
- ### Observation 3: Different Node versions
135
-
136
- ### Chain of Reasoning:
137
- 1. Test behavior differs by environment
138
- 2. Environment difference is Node version
139
- 3. Code uses Node-version-specific API
140
- 4. Therefore: Fix is environment-dependent
141
- 5. Conclusion: Claim of "fixed" is incomplete
95
+ ### Recommendations
96
+ 1. [Action based on findings]
142
97
  ```
143
98
 
144
- #### 3. Question Everything
145
-
146
- **Critical Questions to Ask**:
147
-
148
- - Does the code actually do what the commit message claims?
149
- - Do the tests verify the claimed fix?
150
- - Can the bug reproduce in conditions not covered by tests?
151
- - Are there edge cases not considered?
152
- - Does "works on my machine" equal "properly fixed"?
153
-
154
99
  ---
155
100
 
156
- ## Level 3: Systematic Investigation Process
101
+ ## Investigation Scenarios
157
102
 
158
- ### Step-by-Step Sherlock Review
159
-
160
- #### Step 1: Read the Case File
161
-
162
- ```bash
163
- # Examine the claim
164
- git show <commit>
165
- cat PR_DESCRIPTION.md
166
-
167
- # Note specific assertions:
168
- # - "Fixes race condition in async handler"
169
- # - "Adds comprehensive error handling"
170
- # - "Improves performance by 40%"
171
- ```
172
-
173
- #### Step 2: Examine the Evidence
174
-
175
- ```bash
176
- # What actually changed?
177
- git diff main..feature-branch
103
+ ### Scenario 1: "This Fixed the Bug"
178
104
 
179
- # Count the facts:
180
- FILES_CHANGED=$(git diff --name-only main..feature-branch | wc -l)
181
- LINES_ADDED=$(git diff --stat main..feature-branch | tail -1)
182
- TESTS_ADDED=$(git diff main..feature-branch | grep -c "test(" )
105
+ **Steps:**
106
+ 1. Reproduce bug on commit before fix
107
+ 2. Verify bug is gone on commit with fix
108
+ 3. Check if fix addresses root cause or symptom
109
+ 4. Test edge cases not in original report
183
110
 
184
- echo "Files modified: $FILES_CHANGED"
185
- echo "Tests added: $TESTS_ADDED"
186
- ```
111
+ **Red Flags:**
112
+ - Fix that just removes error logging
113
+ - Works only for specific test case
114
+ - Workarounds instead of root cause fix
115
+ - No regression test added
187
116
 
188
- #### Step 3: Test the Theory
117
+ ### Scenario 2: "Improved Performance by 50%"
189
118
 
190
- ```bash
191
- # Run claimed fixes through scientific method
192
- npm test -- --coverage
119
+ **Steps:**
120
+ 1. Run benchmark on baseline commit
121
+ 2. Run same benchmark on optimized commit
122
+ 3. Compare in identical conditions
123
+ 4. Verify measurement methodology
193
124
 
194
- # Test edge cases not covered:
195
- node scripts/test-edge-cases.js
125
+ **Red Flags:**
126
+ - Tested only on toy data
127
+ - Different comparison conditions
128
+ - Trade-offs not mentioned
196
129
 
197
- # Reproduce original bug:
198
- git checkout <bug-commit>
199
- npm test -- <failing-test>
200
- git checkout <fix-commit>
201
- npm test -- <failing-test>
202
- ```
130
+ ### Scenario 3: "Handles All Edge Cases"
203
131
 
204
- #### Step 4: Cross-Examine the Code
132
+ **Steps:**
133
+ 1. List all edge cases in code path
134
+ 2. Check each has test coverage
135
+ 3. Test boundary conditions
136
+ 4. Verify error handling paths
205
137
 
206
- **Questions for Code Interrogation**:
138
+ **Red Flags:**
139
+ - `catch {}` swallowing errors
140
+ - Generic error messages
141
+ - No logging of critical errors
207
142
 
208
- ```javascript
209
- // CLAIMED: "Handles all null cases"
210
- function processData(data) {
211
- if (data === null) return null; // ✓ Handles null
212
- return data.items.map(x => x); // ✗ Doesn't handle data.items === null
213
- }
214
- // VERDICT: Claim is FALSE - only handles top-level null
215
- ```
143
+ ---
216
144
 
217
- #### Step 5: Compile the Evidence Report
145
+ ## Example Investigation
218
146
 
219
147
  ```markdown
220
- ## Sherlock Investigation Report
221
-
222
- ### Case: PR #123 "Fix race condition in async handler"
148
+ ## Case: PR #123 "Fix race condition in async handler"
223
149
 
224
- ### Claimed Facts:
150
+ ### Claims Examined:
225
151
  1. "Eliminates race condition"
226
152
  2. "Adds mutex locking"
227
153
  3. "100% thread safe"
228
154
 
229
- ### Evidence Examined:
155
+ ### Evidence:
230
156
  - File: src/handlers/async-handler.js
231
157
  - Changes: Added `async/await`, removed callbacks
232
158
  - Tests: 2 new tests for async flow
233
159
  - Coverage: 85% (was 75%)
234
160
 
235
- ### Deductive Analysis:
236
-
237
- #### Claim 1: "Eliminates race condition"
238
- **Evidence**:
239
- - Added `await` to sequential operations
240
- - No actual mutex/lock mechanism found
241
- - No test for concurrent requests
242
-
243
- **Deduction**:
244
- - Code now sequential, not concurrent
245
- - Race condition avoided by removing concurrency
246
- - Not eliminated, just prevented by design change
247
-
248
- **Verdict**: PARTIALLY TRUE (solved differently than claimed)
249
-
250
- #### Claim 2: "Adds mutex locking"
251
- **Evidence**:
252
- - No mutex library imported
253
- - No lock variables found
254
- - No synchronization primitives
255
-
256
- **Deduction**:
257
- - No mutex implementation exists
258
- - Claim is factually incorrect
161
+ ### Analysis:
259
162
 
260
- **Verdict**: FALSE
163
+ **Claim 1: "Eliminates race condition"**
164
+ Evidence: Added `await` to sequential operations. No actual mutex.
165
+ Deduction: Race avoided by removing concurrency, not synchronization.
166
+ Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)
261
167
 
262
- #### Claim 3: "100% thread safe"
263
- **Evidence**:
264
- - JavaScript is single-threaded
265
- - Node.js event loop model
266
- - No worker threads used
168
+ **Claim 2: "Adds mutex locking"**
169
+ Evidence: No mutex library, no lock variables, no sync primitives.
170
+ Verdict: ✗ FALSE
267
171
 
268
- **Deduction**:
269
- - "Thread safe" is meaningless in this context
270
- - Shows misunderstanding of runtime model
172
+ **Claim 3: "100% thread safe"**
173
+ Evidence: JavaScript is single-threaded. No worker threads used.
174
+ Verdict: ? NONSENSICAL (meaningless in this context)
271
175
 
272
- **Verdict**: NONSENSICAL
273
-
274
- ### Final Conclusion:
275
- The fix works but not for the reasons claimed. The race condition is avoided by making operations sequential rather than by adding thread synchronization. Tests verify sequential behavior but don't test the original concurrent scenario.
176
+ ### Conclusion:
177
+ Fix works but not for reasons claimed. Race condition avoided by
178
+ making operations sequential, not by adding synchronization.
276
179
 
277
180
  ### Recommendations:
278
181
  1. Update PR description to accurately reflect solution
279
182
  2. Add test for concurrent request handling
280
- 3. Clarify whether sequential execution is acceptable for performance
281
- 4. Remove incorrect technical claims about "mutex" and "thread safety"
183
+ 3. Remove incorrect technical claims
282
184
  ```
283
185
 
284
186
  ---
285
187
 
286
- ## Level 4: Advanced Investigation Techniques
287
-
288
- ### Technique 1: The Timeline Reconstruction
289
-
290
- **Purpose**: Understand the sequence of events leading to current state
291
-
292
- ```bash
293
- # Build the timeline
294
- git log --all --graph --oneline --decorate
295
-
296
- # Examine critical commits
297
- git log --grep="fix" --grep="bug" --all-match
298
-
299
- # Find when bug was introduced
300
- git bisect start
301
- git bisect bad HEAD
302
- git bisect good v1.0.0
303
- ```
304
-
305
- ### Technique 2: The Behavioral Analysis
306
-
307
- **Purpose**: Observe what the code actually does, not what it's supposed to do
308
-
309
- ```javascript
310
- // Add instrumentation
311
- console.log('[SHERLOCK] Entering function with:', arguments);
312
- console.log('[SHERLOCK] State before:', JSON.stringify(state));
313
- // ... original code ...
314
- console.log('[SHERLOCK] State after:', JSON.stringify(state));
315
- console.log('[SHERLOCK] Returning:', result);
316
- ```
317
-
318
- ### Technique 3: The Stress Test
319
-
320
- **Purpose**: Find limits and breaking points
188
+ ## Agent Integration
321
189
 
322
- ```bash
323
- # Test boundary conditions
324
- npm test -- --iterations=10000
325
-
326
- # Test with invalid inputs
327
- echo '{"invalid": null}' | node src/process.js
328
-
329
- # Test resource exhaustion
330
- ab -n 10000 -c 100 http://localhost:3000/api/endpoint
331
- ```
332
-
333
- ### Technique 4: The Forensic Diff
334
-
335
- **Purpose**: Understand what changed and why
336
-
337
- ```bash
338
- # Compare claimed vs actual changes
339
- git diff --word-diff main..feature-branch
340
-
341
- # Find silent changes (no commit message mention)
342
- git diff main..feature-branch | grep -A5 -B5 "security\|auth\|password"
190
+ ```typescript
191
+ // Evidence-based code review
192
+ await Task("Sherlock Review", {
193
+ prNumber: 123,
194
+ claims: [
195
+ "Fixes memory leak",
196
+ "Improves performance 30%"
197
+ ],
198
+ verifyReproduction: true,
199
+ testEdgeCases: true
200
+ }, "qe-code-reviewer");
343
201
 
344
- # Detect code that was removed
345
- git diff main..feature-branch | grep "^-" | grep -v "^---"
202
+ // Bug fix verification
203
+ await Task("Verify Fix", {
204
+ bugCommit: 'abc123',
205
+ fixCommit: 'def456',
206
+ reproductionSteps: steps,
207
+ testBoundaryConditions: true
208
+ }, "qe-code-reviewer");
346
209
  ```
347
210
 
348
211
  ---
349
212
 
350
- ## Investigation Templates
351
-
352
- ### Template 1: Bug Fix Verification
353
-
354
- ```markdown
355
- ## Sherlock Investigation: Bug Fix Verification
356
-
357
- ### The Bug Report
358
- - **Reported**: [date]
359
- - **Severity**: [P0/P1/P2/P3]
360
- - **Symptoms**: [what users observed]
361
- - **Expected**: [what should happen]
362
-
363
- ### The Claimed Fix
364
- - **PR**: #[number]
365
- - **Commit**: [hash]
366
- - **Description**: [claimed solution]
367
-
368
- ### Evidence Collection
369
-
370
- #### 1. Reproduce Original Bug
371
- - [ ] Checkout commit before fix
372
- - [ ] Follow reproduction steps
373
- - [ ] Confirm bug exists
374
- - [ ] Document observed behavior
375
-
376
- #### 2. Verify Fix
377
- - [ ] Checkout commit with fix
378
- - [ ] Follow same reproduction steps
379
- - [ ] Confirm bug is resolved
380
- - [ ] Test edge cases
381
-
382
- #### 3. Code Analysis
383
- - [ ] Review actual code changes
384
- - [ ] Verify logic addresses root cause
385
- - [ ] Check for side effects
386
- - [ ] Assess test coverage
387
-
388
- ### Deductive Analysis
389
-
390
- **Root Cause Claimed**: [what PR says]
391
- **Root Cause Actual**: [what evidence shows]
392
-
393
- **Fix Mechanism Claimed**: [how PR says it works]
394
- **Fix Mechanism Actual**: [how it actually works]
395
-
396
- **Coverage Claimed**: [scenarios PR claims to handle]
397
- **Coverage Actual**: [scenarios actually handled]
398
-
399
- ### Verdict
400
-
401
- - [ ] Bug is fully fixed
402
- - [ ] Bug is partially fixed
403
- - [ ] Bug is not fixed (claim is false)
404
- - [ ] Bug is fixed but new bugs introduced
405
-
406
- ### Evidence Summary
407
- [Concise summary of findings with proof]
408
-
409
- ### Recommendations
410
- 1. [Action based on evidence]
411
- 2. [Action based on evidence]
412
- ```
413
-
414
- ### Template 2: Feature Implementation Review
415
-
416
- ```markdown
417
- ## Sherlock Investigation: Feature Implementation
418
-
419
- ### The Feature Request
420
- - **Requirement**: [what was requested]
421
- - **Acceptance Criteria**: [how to verify]
422
- - **User Story**: [why it's needed]
423
-
424
- ### The Implementation Claim
425
- - **PR**: #[number]
426
- - **Description**: [what PR claims to deliver]
427
- - **Scope**: [claimed completeness]
428
-
429
- ### Evidence Examination
430
-
431
- #### Code Changes
432
- ```bash
433
- git diff main..feature-branch --stat
434
- ```
435
-
436
- - Files changed: [count]
437
- - Lines added: [count]
438
- - Lines removed: [count]
439
- - Tests added: [count]
440
-
441
- #### Acceptance Criteria Testing
442
-
443
- | Criterion | Claimed | Tested | Verdict |
444
- |-----------|---------|--------|---------|
445
- | AC1: [criterion] | ✓ | [yes/no] | [pass/fail] |
446
- | AC2: [criterion] | ✓ | [yes/no] | [pass/fail] |
447
- | AC3: [criterion] | ✓ | [yes/no] | [pass/fail] |
448
-
449
- ### Deductive Analysis
450
-
451
- **Claim**: [what PR says is implemented]
452
-
453
- **Evidence**:
454
- - [Fact 1 from code]
455
- - [Fact 2 from tests]
456
- - [Fact 3 from behavior]
457
-
458
- **Deduction**:
459
- - [Logical conclusion from evidence]
460
-
461
- **Verdict**: [supported/partially supported/not supported by evidence]
462
-
463
- ### Missing Elements
464
- - [ ] [Feature aspect not implemented]
465
- - [ ] [Test scenario not covered]
466
- - [ ] [Edge case not handled]
213
+ ## Agent Coordination Hints
467
214
 
468
- ### Conclusion
469
- [Evidence-based assessment of implementation completeness]
215
+ ### Memory Namespace
470
216
  ```
471
-
472
- ### Template 3: Performance Claim Verification
473
-
474
- ```markdown
475
- ## Sherlock Investigation: Performance Claims
476
-
477
- ### The Claim
478
- "Improved performance by [X]% in [scenario]"
479
-
480
- ### Investigation Setup
481
-
482
- #### Baseline Measurement
483
- ```bash
484
- git checkout [before-commit]
485
- npm run benchmark > baseline.txt
217
+ aqe/sherlock/
218
+ ├── investigations/* - Investigation reports
219
+ ├── evidence/* - Collected evidence
220
+ ├── verdicts/* - Claim verdicts
221
+ └── patterns/* - Common deception patterns
486
222
  ```
487
223
 
488
- #### Post-Fix Measurement
489
- ```bash
490
- git checkout [after-commit]
491
- npm run benchmark > optimized.txt
492
- ```
493
-
494
- ### Evidence Collection
495
-
496
- #### Benchmark Results
497
-
498
- | Metric | Before | After | Improvement | Claimed |
499
- |--------|--------|-------|-------------|---------|
500
- | Latency | [ms] | [ms] | [%] | [%] |
501
- | Throughput | [req/s] | [req/s] | [%] | [%] |
502
- | Memory | [MB] | [MB] | [%] | [%] |
503
- | CPU | [%] | [%] | [%] | [%] |
504
-
505
- ### Deductive Analysis
506
-
507
- **Claimed Improvement**: [X]%
508
- **Measured Improvement**: [Y]%
509
- **Variance**: [X-Y]%
510
-
511
- **Measurement Conditions**:
512
- - Environment: [prod/dev/local]
513
- - Load: [concurrent users/requests]
514
- - Data size: [records/MB]
515
-
516
- **Verdict**:
517
- - [ ] Claim supported by evidence
518
- - [ ] Claim exaggerated (actual: [Y]%)
519
- - [ ] Claim not reproducible
520
- - [ ] Claim based on cherry-picked scenario
521
-
522
- ### Conclusion
523
- [Evidence-based assessment with actual numbers]
224
+ ### Fleet Coordination
225
+ ```typescript
226
+ const investigationFleet = await FleetManager.coordinate({
227
+ strategy: 'evidence-investigation',
228
+ agents: [
229
+ 'qe-code-reviewer', // Code analysis
230
+ 'qe-security-auditor', // Security claim verification
231
+ 'qe-performance-validator' // Performance claim verification
232
+ ],
233
+ topology: 'parallel'
234
+ });
524
235
  ```
525
236
 
526
237
  ---
527
238
 
528
- ## Holmesian Principles for QE
529
-
530
- ### Principle 1: "Data! Data! Data!"
531
-
532
- > "I can't make bricks without clay."
533
-
534
- **Application**: Collect comprehensive evidence before forming conclusions
535
-
536
- - Logs, traces, metrics
537
- - Test results, coverage reports
538
- - Code diffs, git history
539
- - Reproduction steps
540
-
541
- ### Principle 2: "Eliminate the Impossible"
542
-
543
- > "When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
544
-
545
- **Application**: Use negative testing and boundary analysis
546
-
547
- - Test what should NOT happen
548
- - Verify constraints are enforced
549
- - Check impossible inputs are rejected
550
- - Validate error handling paths
551
-
552
- ### Principle 3: "Observe, Don't Assume"
553
-
554
- > "You see, but you do not observe."
555
-
556
- **Application**: Run the code, don't just read it
557
-
558
- - Execute tests locally
559
- - Step through debugger
560
- - Profile performance
561
- - Monitor resource usage
562
-
563
- ### Principle 4: "The Little Things Matter"
564
-
565
- > "It has long been an axiom of mine that the little things are infinitely the most important."
566
-
567
- **Application**: Pay attention to details others miss
568
-
569
- - Off-by-one errors
570
- - Null/undefined handling
571
- - Timezone conversions
572
- - Race conditions
573
- - Memory leaks
574
-
575
- ### Principle 5: "Question Everything"
576
-
577
- > "I never guess. It is a capital mistake to theorize before one has data."
578
-
579
- **Application**: Verify all claims empirically
580
-
581
- - Don't trust commit messages
582
- - Don't trust documentation
583
- - Don't trust "it works on my machine"
584
- - Trust only reproducible evidence
585
-
586
- ---
587
-
588
- ## The Sherlock Review Checklist
589
-
590
- Before approving any PR, verify:
591
-
592
- ### Evidence-Based Review
593
-
594
- - [ ] **Claim vs Reality**: Does code match description?
595
- - [ ] **Tests Verify Claims**: Do tests actually prove the fix/feature?
596
- - [ ] **Reproducible**: Can you reproduce the bug/feature locally?
597
- - [ ] **Edge Cases**: Are boundary conditions tested?
598
- - [ ] **Negative Cases**: Are failure paths tested?
599
-
600
- ### Deductive Reasoning
601
-
602
- - [ ] **Root Cause**: Does fix address actual root cause?
603
- - [ ] **Side Effects**: Could this break something else?
604
- - [ ] **Performance**: Any evidence for performance claims?
605
- - [ ] **Security**: Any security implications?
606
- - [ ] **Assumptions**: Are all assumptions validated?
607
-
608
- ### Observational Analysis
609
-
610
- - [ ] **Code Quality**: Is code doing what it appears to do?
611
- - [ ] **Error Handling**: Are errors handled or just hidden?
612
- - [ ] **Resource Management**: Are resources properly managed?
613
- - [ ] **Concurrency**: Any race conditions or deadlocks?
614
- - [ ] **Data Validation**: Is input validated?
615
-
616
- ### Timeline Verification
617
-
618
- - [ ] **Related Changes**: Are there related commits?
619
- - [ ] **Regression Risk**: Could this reintroduce old bugs?
620
- - [ ] **Dependencies**: Are dependency changes necessary?
621
- - [ ] **Migration Path**: Is there a rollback plan?
239
+ ## Related Skills
240
+ - [brutal-honesty-review](../brutal-honesty-review/) - Direct technical criticism
241
+ - [context-driven-testing](../context-driven-testing/) - Adapt to context
242
+ - [bug-reporting-excellence](../bug-reporting-excellence/) - Document findings
622
243
 
623
244
  ---
624
245
 
625
- ## Common Investigation Scenarios
626
-
627
- ### Scenario 1: "This Fixed the Bug"
628
-
629
- **Investigation Steps**:
630
- 1. Reproduce bug on commit before fix
631
- 2. Verify bug is gone on commit with fix
632
- 3. Check if fix addresses root cause or just symptom
633
- 4. Test edge cases not in original bug report
634
- 5. Verify no regression in related functionality
635
-
636
- **Red Flags**:
637
- - Bug "fix" that just removes error logging
638
- - Fix that works only for specific test case
639
- - Fix that introduces workarounds instead of solving root cause
640
- - No test added to prevent regression
641
-
642
- ### Scenario 2: "Improved Performance by 50%"
643
-
644
- **Investigation Steps**:
645
- 1. Run benchmark on baseline commit
646
- 2. Run same benchmark on optimized commit
647
- 3. Compare results in identical conditions
648
- 4. Verify measurement methodology
649
- 5. Test under realistic load
650
-
651
- **Red Flags**:
652
- - Performance tested only on toy data
653
- - Comparison uses different conditions
654
- - "Improvement" in non-critical path
655
- - Trade-off not mentioned (e.g., memory for speed)
656
-
657
- ### Scenario 3: "Added Comprehensive Error Handling"
658
-
659
- **Investigation Steps**:
660
- 1. List all error paths in code
661
- 2. Verify each path has handling
662
- 3. Test each error condition
663
- 4. Check error messages are actionable
664
- 5. Verify errors are logged/monitored
665
-
666
- **Red Flags**:
667
- - Errors caught but ignored (`catch {}`)
668
- - Generic error messages
669
- - Errors handled by crashing
670
- - No logging of critical errors
671
-
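
The first red flag above, errors caught but ignored, is one of the few that can be checked mechanically. The sketch below is a rough heuristic under stated assumptions, not a tool shipped with agentic-qe: `findEmptyCatches` and `EmptyCatch` are hypothetical names, and the regular expression only recognises `catch` blocks whose body is empty or whitespace. It does not parse the language, so it can also match text inside strings or comments, and it will not flag a `catch` whose body contains only a comment.

```typescript
// Hypothetical helper for illustration; a regex scan is a heuristic, not a parser.
interface EmptyCatch {
  line: number; // 1-based line where the catch keyword starts
  text: string; // the matched source text
}

function findEmptyCatches(source: string): EmptyCatch[] {
  // Matches `catch {}` and `catch (e) {}` where the body holds only whitespace.
  const pattern = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;
  const hits: EmptyCatch[] = [];

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Count the newlines before the match to get its line number.
    const line = source.slice(0, match.index).split('\n').length;
    hits.push({ line, text: match[0] });
  }
  return hits;
}

const sample = [
  'try {',
  '  run();',
  '} catch (e) {}',              // swallowed error
  'try { a(); } catch {',        // swallowed error, body spans two lines
  '}',
  'try { b(); } catch (e) { log(e); }' // handled, not flagged
].join('\n');

for (const hit of findEmptyCatches(sample)) {
  console.log(`line ${hit.line}: ${hit.text.replace(/\s+/g, ' ')}`);
}
```

A hit is evidence to investigate rather than a verdict: the reviewer still has to confirm that the swallowed error matters on that path.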
672
- ---
673
-
674
- ## Output Format
675
-
676
- ### The Sherlock Report
677
-
678
- ```markdown
679
- # Sherlock Investigation Report
680
-
681
- **Case**: [PR/Issue number and title]
682
- **Investigator**: [Your name]
683
- **Date**: [Investigation date]
684
-
685
- ## Summary
686
- [One paragraph: What was claimed, what was found, verdict]
687
-
688
- ## Claims Examined
689
- 1. [Claim 1]
690
- 2. [Claim 2]
691
- 3. [Claim 3]
692
-
693
- ## Evidence Collected
694
- - Code changes: [summary]
695
- - Tests added: [count and coverage]
696
- - Benchmarks: [results]
697
- - Manual testing: [scenarios tested]
698
-
699
- ## Deductive Analysis
700
-
701
- ### Claim 1: [Claim text]
702
- **Evidence**: [What you found]
703
- **Deduction**: [Logical conclusion]
704
- **Verdict**: ✓ TRUE / ✗ FALSE / ⚠ PARTIALLY TRUE
705
-
706
- [Repeat for each claim]
707
-
708
- ## Findings
709
-
710
- ### What Works
711
- - [Positive finding with evidence]
712
-
713
- ### What Doesn't Work
714
- - [Issue found with evidence]
715
-
716
- ### What's Missing
717
- - [Gap in implementation/testing]
718
-
719
- ## Overall Verdict
720
-
721
- - [ ] Approve: Claims fully supported by evidence
722
- - [ ] Approve with Reservations: Claims mostly accurate
723
- - [ ] Request Changes: Claims not supported by evidence
724
- - [ ] Reject: Claims are false or misleading
725
-
726
- ## Recommendations
727
- 1. [Action item based on findings]
728
- 2. [Action item based on findings]
729
-
730
- ---
731
-
732
- **Elementary Evidence**: [Link to detailed evidence files/logs]
733
- **Reproducible**: [Yes/No - Can others verify your findings?]
734
- ```
735
-
736
- ---
737
-
738
- ## Integration with AQE Fleet
739
-
740
- ### Use Sherlock Review With:
741
-
742
- 1. **qe-code-reviewer**: After automated review, investigate flagged issues
743
- 2. **qe-security-auditor**: Verify security fix claims
744
- 3. **qe-performance-validator**: Validate performance improvement claims
745
- 4. **qe-flaky-test-hunter**: Investigate "test fixed" claims
746
- 5. **production-validator**: Verify deployment-ready claims
747
-
748
- ### Workflow Integration
749
-
750
- ```bash
751
- # 1. Automated review flags issues
752
- aqe review --pr 123
753
-
754
- # 2. Sherlock investigates flagged claims
755
- # [Apply Sherlock methodology to each flag]
756
-
757
- # 3. Document evidence-based findings
758
- # [Generate Sherlock report]
759
-
760
- # 4. Provide actionable feedback
761
- # [Based on evidence, not assumptions]
762
- ```
763
-
764
- ---
765
-
766
- ## Learn More
767
-
768
- ### Recommended Reading
769
- - "The Adventure of Silver Blaze" - Importance of negative evidence
770
- - "A Scandal in Bohemia" - Observation vs. seeing
771
- - "The Boscombe Valley Mystery" - Following the evidence chain
772
-
773
- ### Related QE Skills
774
- - `brutal-honesty-review` - Direct technical criticism
775
- - `context-driven-testing` - Adapt to specific context
776
- - `exploratory-testing-advanced` - Investigation techniques
777
- - `bug-reporting-excellence` - Document findings clearly
778
-
779
- ---
246
+ ## Remember
780
247
 
781
- **Created**: 2025-11-15
782
- **Category**: Quality Engineering
783
- **Approach**: Evidence-Based Investigation
784
- **Philosophy**: "Elementary" - Trust only what can be proven
248
+ **"It is a capital mistake to theorize before one has data."** Trust only reproducible evidence. Don't trust commit messages, documentation, or "works on my machine."
785
249
 
786
- *"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."* - Sherlock Holmes
250
+ **The Sherlock Standard:** Every claim must be verified empirically. What does the evidence actually show?