@botlearn/debugger 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 BotLearn
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,35 @@
+ # @botlearn/debugger
+
+ > Root cause analysis, bug diagnosis, and fix suggestion for OpenClaw Agent — improves debugging efficiency 5x with systematic hypothesis-driven investigation
+
+ ## Installation
+
+ ```bash
+ # via npm
+ npm install @botlearn/debugger
+
+ # via clawhub
+ clawhub install @botlearn/debugger
+ ```
+
+ ## Category
+
+ programming-assistance
+
+ ## Dependencies
+
+ `@botlearn/code-review`
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `manifest.json` | Skill metadata and configuration |
+ | `skill.md` | Role definition and activation rules |
+ | `knowledge/` | Domain knowledge documents |
+ | `strategies/` | Behavioral strategy definitions |
+ | `tests/` | Smoke and benchmark tests |
+
+ ## License
+
+ MIT
package/knowledge/anti-patterns.md ADDED
@@ -0,0 +1,74 @@
+ ---
+ domain: debugger
+ topic: anti-patterns
+ priority: medium
+ ttl: 30d
+ ---
+
+ # Debugging — Anti-Patterns
+
+ ## Investigation Anti-Patterns
+
+ ### 1. Symptom Fixing (Patch-and-Pray)
+ - **Problem**: Applying a surface-level fix that silences the error without understanding the root cause. Example: wrapping a NullPointerException in a try-catch that returns a default value
+ - **Why it's harmful**: The underlying defect remains. It will resurface in a different form, often harder to diagnose, or cause silent data corruption
+ - **Fix**: Always trace the causal chain from symptom to root cause before writing any fix code. Ask: "Why is this value null?" not "How do I handle null?"
+
+ ### 2. Shotgun Debugging
+ - **Problem**: Making multiple simultaneous changes hoping one of them fixes the bug, without understanding which change (if any) actually addresses the root cause
+ - **Why it's harmful**: Even if the bug disappears, you don't know why. You may have introduced side effects, masked the real issue, or created new bugs. You cannot write a meaningful regression test
+ - **Fix**: Change exactly ONE variable at a time. Observe the result. Revert if it didn't help. Proceed methodically
+
+ ### 3. Ignoring Error Messages
+ - **Problem**: Glancing at an error and immediately forming a theory without reading the full message, stack trace, and context
+ - **Why it's harmful**: Error messages are the most direct diagnostic evidence. Ignoring them leads to investigating the wrong hypothesis entirely. Developers frequently report spending hours debugging only to discover the answer was in the error message
+ - **Fix**: Read the ENTIRE error message. Read the ENTIRE stack trace. Parse every field: exception type, message string, file, line, column. Then hypothesize
+
+ ### 4. Assuming the Bug Is Somewhere Else
+ - **Problem**: Blaming the framework, library, compiler, or OS without evidence. "It must be a React bug" or "The database is broken"
+ - **Why it's harmful**: Widely-used libraries have been tested by millions of users. The bug is almost certainly in your code. Blaming external components wastes time investigating the wrong system
+ - **Fix**: Assume the bug is in YOUR code until you have strong evidence otherwise. Only escalate to library/framework investigation after ruling out your own code with concrete evidence
+
+ ### 5. Not Reading the Documentation
+ - **Problem**: Using an API based on assumptions about its behavior instead of reading the official documentation
+ - **Why it's harmful**: APIs frequently have non-obvious semantics: nullable return values, specific error conditions, required initialization order, thread-safety guarantees (or lack thereof)
+ - **Fix**: Before debugging an API integration issue, re-read the relevant documentation section. Check for known issues, version-specific behavior changes, and migration guides
+
+ ## Process Anti-Patterns
+
+ ### 6. Debugging Without Reproducing
+ - **Problem**: Attempting to fix a bug based solely on a bug report or stack trace without first reproducing it locally
+ - **Why it's harmful**: Without reproduction, you cannot verify your fix works. You may fix a different bug or introduce a regression. You have no way to write a regression test
+ - **Fix**: ALWAYS reproduce the bug before attempting a fix. If reproduction is difficult, invest time in creating a minimal reproduction case. If the bug is intermittent, increase the probability of occurrence (e.g., stress test, increase parallelism)
+
+ ### 7. Not Using Version Control During Debugging
+ - **Problem**: Making changes to investigate a bug without committing or stashing first, leading to a tangled mix of investigation code and attempted fixes
+ - **Why it's harmful**: You cannot cleanly revert to a known state. You lose track of what you changed. You may accidentally commit debug code
+ - **Fix**: Always stash or commit your work before starting to debug. Create a debug branch. Make each investigation step a separate commit that can be reverted. Use `git bisect` when the bug is a regression
+
+ ### 8. Premature Optimization During Bug Fix
+ - **Problem**: While fixing a bug, simultaneously refactoring or optimizing the surrounding code
+ - **Why it's harmful**: Conflates two different changes. If the fix introduces a new bug, it's harder to isolate. Code review becomes more difficult. The optimization may not be needed
+ - **Fix**: Fix the bug in the smallest possible change. Commit. Then refactor or optimize in a separate commit if warranted
+
+ ### 9. Debugging in Production
+ - **Problem**: Adding debug logging, print statements, or experimental fixes directly in the production environment
+ - **Why it's harmful**: Risk of breaking production for all users. Debug output may expose sensitive data. Changes are not tracked in version control
+ - **Fix**: Reproduce the bug in a development or staging environment. If production-only debugging is unavoidable, use observability tools (structured logging, APM, distributed tracing) instead of code changes
+
+ ### 10. Ignoring Intermittent Failures
+ - **Problem**: Dismissing a test or error that "only fails sometimes" as a flaky test or transient issue without investigation
+ - **Why it's harmful**: Intermittent failures are often concurrency bugs, race conditions, or timing-dependent issues — the hardest and most dangerous class of bugs. They tend to worsen under load
+ - **Fix**: Treat intermittent failures as HIGH priority. They often indicate a real concurrency or state-management bug. Run the test in a loop (100-1000 iterations) to increase reproduction probability. Add logging at synchronization points
+
+ ## Output Anti-Patterns
+
+ ### 11. Vague Bug Reports
+ - **Problem**: Describing the bug as "it doesn't work" or "it's broken" without specifying: what was expected, what actually happened, steps to reproduce, environment details
+ - **Why it's harmful**: Forces the recipient to guess and ask follow-up questions, wasting time. May lead to investigating the wrong issue
+ - **Fix**: Every bug report should include: (1) steps to reproduce, (2) expected behavior, (3) actual behavior, (4) environment details, (5) relevant error output
+
+ ### 12. Fix Without Regression Test
+ - **Problem**: Fixing a bug without adding a test that would catch it if reintroduced
+ - **Why it's harmful**: Without a regression test, the same bug can (and often does) come back in a future change. The team has no automated safety net for this specific failure mode
+ - **Fix**: For every bug fix, write at least one test that: (1) fails without the fix applied, (2) passes with the fix applied, (3) covers the specific input/state that triggered the original bug
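The three-part regression-test rule above can be sketched in Python. This is a hypothetical illustration — `last_n` and the off-by-one slice bug are invented for the example, not taken from this package:

```python
# Hypothetical regression test for an off-by-one fix in a `last_n` helper.
# The (invented) buggy original sliced items[-n - 1:], returning one extra
# element whenever n < len(items).

def last_n(items, n):
    """Return the last n items (fixed version)."""
    if n <= 0:
        return []
    return items[-n:]

def test_last_n_regression():
    # (1)/(2) This assertion fails without the fix and passes with it:
    # the buggy slice returned [2, 3, 4] for n=2.
    assert last_n([1, 2, 3, 4], 2) == [3, 4]
    # (3) Cover the boundary inputs around the original trigger.
    assert last_n([1, 2, 3, 4], 0) == []
    assert last_n([], 3) == []
    assert last_n([1], 5) == [1]

test_last_n_regression()
```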
package/knowledge/best-practices.md ADDED
@@ -0,0 +1,162 @@
+ ---
+ domain: debugger
+ topic: debugging-methodologies-and-strategies
+ priority: high
+ ttl: 30d
+ ---
+
+ # Debugging — Best Practices
+
+ ## The Scientific Debugging Method
+
+ Debugging is hypothesis-driven investigation. Apply the scientific method rigorously:
+
+ ### 1. Observe
+ - Collect all available evidence: error messages, stack traces, logs, user reports, screenshots
+ - Record the exact steps to reproduce, the expected behavior, and the actual behavior
+ - Note the environment: OS, language version, framework version, configuration
+
+ ### 2. Hypothesize
+ - Based on observed symptoms, formulate 2-3 ranked hypotheses for the root cause
+ - Each hypothesis must be **falsifiable** — you must be able to design a test that would disprove it
+ - Rank by likelihood using bug pattern knowledge (see knowledge/domain.md)
+
+ ### 3. Predict
+ - For each hypothesis, predict what you would observe if it were true
+ - Example: "If the bug is a race condition, adding a sleep(1) before the read should make it pass consistently"
+
+ ### 4. Test
+ - Design the smallest experiment that distinguishes between hypotheses
+ - Change exactly ONE variable at a time
+ - Record the result: did the prediction hold?
+
+ ### 5. Conclude
+ - If the prediction held: the hypothesis is supported (but gather more evidence if possible)
+ - If the prediction failed: discard the hypothesis and promote the next one
+ - Repeat until root cause is confirmed with high confidence
+
+ ## Binary Search / Bisection Debugging
+
+ The most powerful technique for narrowing down bugs in large codebases or long histories.
+
+ ### Code Bisection (Runtime)
+ 1. Identify a known-good state and a known-bad state
+ 2. Insert a diagnostic check at the midpoint of the code path between them
+ 3. Determine which half contains the bug
+ 4. Repeat, halving the search space each time
+ 5. **Efficiency**: Finds the bug in O(log n) steps instead of O(n)
+
+ ### Git Bisect (Historical)
+ ```bash
+ git bisect start
+ git bisect bad # Current commit is broken
+ git bisect good abc1234 # This older commit was working
+ # Git checks out the midpoint — test it
+ git bisect good # or git bisect bad
+ # Repeat until the first bad commit is found
+ git bisect reset
+ ```
+ - **When to use**: Bug exists now but worked before; unclear when it was introduced
+ - **Automate**: `git bisect run ./test_script.sh` for fully automated bisection
+
+ ### Data Bisection
+ - For bugs triggered by specific input data, bisect the input:
+ 1. Split the input in half
+ 2. Test each half separately
+ 3. Recurse on the half that triggers the bug
+ 4. Identify the minimal triggering input
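The data-bisection steps above can be sketched as a small Python helper. `triggers_bug` is a stand-in predicate for "does this input still reproduce the failure?"; when the trigger spans both halves, plain halving stalls and full delta debugging (ddmin) is needed:

```python
# Minimal data-bisection sketch: keep the half of the input that still
# reproduces the bug, halving until no smaller half triggers it.

def bisect_input(data, triggers_bug):
    while len(data) > 1:
        mid = len(data) // 2
        left, right = data[:mid], data[mid:]
        if triggers_bug(left):
            data = left
        elif triggers_bug(right):
            data = right
        else:
            # Trigger needs elements from both halves; stop halving here.
            break
    return data

# Example: the "bug" fires whenever the payload contains a zero byte.
minimal = bisect_input(list(b"abc\x00def"), lambda d: 0 in d)
# minimal is [0] — the single triggering byte.
```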
+
+ ## Strategic Logging
+
+ ### Log Placement Strategy
+
+ | Placement | Purpose | Example |
+ |-----------|---------|---------|
+ | Function entry | Verify function is called with expected args | `log.debug("processOrder called", {orderId, items})` |
+ | Before external call | Verify outbound request data | `log.debug("Calling payment API", {payload})` |
+ | After external call | Verify response data | `log.debug("Payment API response", {status, body})` |
+ | Branch points | Verify which code path executes | `log.debug("Using cache path")` or `log.debug("Using DB path")` |
+ | Loop iterations | Track iteration state for off-by-one / infinite loops | `log.debug("Loop iteration", {i, current, total})` |
+ | Catch blocks | Always log caught exceptions with full context | `log.error("Failed to process", {error, context})` |
+
+ ### Structured Logging for Debugging
+ - Use structured key-value pairs, not string concatenation
+ - Include correlation IDs to trace requests across services
+ - Log the **state** (variable values) not just the **event** (what happened)
+ - Use log levels appropriately: DEBUG for investigation, ERROR for failures, WARN for recoverable issues
+
+ ### Temporary Debug Logging Pattern
+ ```
+ // DEBUG-START: investigating issue #1234
+ console.log('[DEBUG-1234] state at checkpoint:', JSON.stringify(state));
+ // DEBUG-END
+ ```
+ - Always tag temporary logging with a ticket/issue number
+ - Always remove before committing (or use a lint rule to catch it)
+
+ ## Rubber Duck Debugging
+
+ When stuck, explain the problem out loud (or in writing) step by step:
+
+ 1. State the expected behavior clearly
+ 2. State the actual behavior clearly
+ 3. Walk through the code line by line, explaining what each line does
+ 4. The act of explaining often reveals the incorrect assumption
+
+ **Why it works**: Forces you to examine each assumption explicitly rather than glossing over them mentally.
+
+ ## Minimal Reproduction
+
+ ### Why Minimize?
+ - Removes noise from unrelated code/data
+ - Makes the bug easier to understand and communicate
+ - Confirms you understand what triggers the bug
+ - Provides a ready-made regression test
+
+ ### Minimization Process
+ 1. Start with the full failing scenario
+ 2. Remove components one at a time, checking if the bug persists
+ 3. Simplify input data to the smallest triggering case
+ 4. Remove configuration, middleware, and dependencies that are not involved
+ 5. The result should be the **smallest code + input that reproduces the bug**
+
+ ### Reproduction Environment Checklist
+ - [ ] Same language/runtime version
+ - [ ] Same dependency versions (check lock files)
+ - [ ] Same OS or container environment
+ - [ ] Same configuration / environment variables
+ - [ ] Same data state (database, cache, files)
+
+ ## Debugging by Error Category
+
+ ### For Null/Undefined Errors
+ 1. Trace the variable backwards from the crash point to where it was assigned
+ 2. Identify which code path leads to the null/undefined assignment
+ 3. Common sources: missing API response field, failed database query, uninitialized state
+
+ ### For Async/Promise Errors
+ 1. Map the async execution flow (draw it if needed)
+ 2. Check for missing `await`, unhandled rejections, or callback error parameters
+ 3. Verify execution order — async code may not run in the order it appears
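The missing-`await` check has the same shape in Python's asyncio: calling an async function without awaiting it yields a coroutine object, not the result. A minimal sketch (the function names are illustrative):

```python
# Sketch of the "missing await" bug: without `await`, you get a coroutine
# object instead of the resolved value.
import asyncio

async def fetch_total():
    return 42

async def main():
    broken = fetch_total()        # BUG: coroutine object, not 42
    fixed = await fetch_total()   # correct: awaited result
    assert asyncio.iscoroutine(broken)
    assert fixed == 42
    broken.close()  # silence the "coroutine was never awaited" warning
    return fixed

result = asyncio.run(main())
```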
+
+ ### For Performance Bugs
+ 1. Profile first, optimize second — never guess at bottlenecks
+ 2. Check algorithmic complexity: O(n^2) in a loop over large data is a common culprit
+ 3. Look for N+1 query patterns in database-backed code
+ 4. Check for unnecessary re-renders in frontend frameworks
+
+ ### For Concurrency Bugs
+ 1. Identify shared mutable state
+ 2. Map the order of lock acquisitions across threads
+ 3. Use thread-safe data structures or synchronization primitives
+ 4. Test with increased parallelism to amplify timing-sensitive bugs
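Steps 1 and 3 above can be sketched with Python threads: the shared counter is the mutable state, and a lock serializes the read-modify-write. Without the lock the increments can interleave and updates are lost intermittently, which is exactly why such bugs "pass sometimes":

```python
# Sketch of a lost-update fix: protect a read-modify-write on shared state
# with a lock so the final count is deterministic.
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:  # serialize the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 40_000  # deterministic only because of the lock
```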
+
+ ## Fix Verification Checklist
+
+ After implementing a fix:
+ - [ ] The original bug is no longer reproducible
+ - [ ] No new failures introduced (run full test suite)
+ - [ ] Edge cases covered (empty input, null, boundary values, concurrent access)
+ - [ ] A regression test exists that would catch this bug if reintroduced
+ - [ ] The fix addresses the root cause, not just the symptom
+ - [ ] Code review completed (leverage @botlearn/code-review)
package/knowledge/domain.md ADDED
@@ -0,0 +1,180 @@
+ ---
+ domain: debugger
+ topic: common-bug-patterns-and-error-taxonomy
+ priority: high
+ ttl: 30d
+ ---
+
+ # Debugging — Common Bug Patterns, Error Types & Stack Trace Anatomy
+
+ ## Bug Classification Taxonomy
+
+ ### 1. Logic Errors
+ Bugs where the code executes without crashing but produces incorrect results.
+
+ | Pattern | Description | Common Languages | Example |
+ |---------|-------------|-----------------|---------|
+ | Off-by-one | Loop boundary or index shifted by 1 | All | `for (i = 0; i <= arr.length; i++)` — reads past array end |
+ | Incorrect operator | Wrong comparison or arithmetic operator | All | `if (a = b)` instead of `if (a == b)` in C/JS |
+ | Wrong boolean logic | Inverted or miscomposed conditions | All | `if (!a && !b)` instead of `!(a && b)` (De Morgan violation) |
+ | Missing edge case | Fails on empty input, zero, negative, max values | All | No check for empty array before accessing `arr[0]` |
+ | Incorrect algorithm | Right structure, wrong logic in transformation | All | Sorting comparator returns wrong sign |
+ | Integer overflow | Arithmetic exceeds type range silently | C, C++, Java, Rust | `int sum = 2_000_000_000 + 2_000_000_000` wraps negative |
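The off-by-one row can be made concrete in Python, where walking one index past the end raises an `IndexError` rather than silently reading past the buffer as in C:

```python
# Sketch of the classic off-by-one: the buggy loop visits index len(arr).

def sum_buggy(arr):
    total = 0
    for i in range(len(arr) + 1):  # BUG: range should stop at len(arr)
        total += arr[i]
    return total

def sum_fixed(arr):
    total = 0
    for i in range(len(arr)):      # correct bounds: 0 .. len(arr) - 1
        total += arr[i]
    return total

try:
    sum_buggy([1, 2, 3])
except IndexError as exc:
    print("buggy version crashed:", exc)

assert sum_fixed([1, 2, 3]) == 6
```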
+
+ ### 2. Null / Undefined Reference Errors
+ Accessing members or methods on a null/undefined/nil value.
+
+ | Language | Error Message Pattern | Common Cause |
+ |----------|----------------------|-------------|
+ | JavaScript | `TypeError: Cannot read properties of undefined (reading 'X')` | Accessing nested property on uninitialized object |
+ | JavaScript | `TypeError: X is not a function` | Calling undefined method, wrong import |
+ | Python | `AttributeError: 'NoneType' object has no attribute 'X'` | Function returns None unexpectedly |
+ | Java | `NullPointerException` | Uninitialized object reference, failed Optional unwrap |
+ | C# | `NullReferenceException` | Uninitialized reference type |
+ | Rust | `unwrap()` on `None` | `Option::unwrap()` called on `None` value |
+ | Go | `panic: runtime error: invalid memory address or nil pointer dereference` | Nil pointer method call |
+
+ ### 3. Type Errors
+ Type mismatches at runtime or compile time.
+
+ | Language | Error Pattern | Common Cause |
+ |----------|--------------|-------------|
+ | Python | `TypeError: unsupported operand type(s)` | String + int without conversion |
+ | TypeScript | `Type 'X' is not assignable to type 'Y'` | Interface mismatch, missing property |
+ | Java | `ClassCastException` | Unsafe downcast, generic type erasure |
+ | Go | `cannot use X (type Y) as type Z` | Interface not satisfied |
+
+ ### 4. Concurrency Bugs
+ Non-deterministic failures caused by parallel execution.
+
+ | Pattern | Description | Symptoms |
+ |---------|-------------|----------|
+ | Race condition | Two threads access shared state without synchronization | Intermittent wrong results, passes sometimes |
+ | Deadlock | Two+ threads wait for each other's locks | Application hangs permanently |
+ | Livelock | Threads keep retrying but never make progress | High CPU, no progress, no hang |
+ | Starvation | Low-priority thread never gets CPU time | Some requests never complete |
+ | Lost update | Concurrent writes overwrite each other | Data disappears or reverts |
+ | Double-checked locking | Broken singleton pattern without volatile/atomic | Partially constructed object visible |
+
+ ### 5. Resource & Memory Errors
+
+ | Pattern | Language(s) | Symptoms |
+ |---------|------------|----------|
+ | Memory leak | C, C++, Java (listener leaks), JS (closures, event listeners) | Gradual memory growth, eventual OOM |
+ | Use-after-free | C, C++ | Crash, corrupted data, security vulnerability |
+ | Buffer overflow | C, C++ | Crash, security vulnerability, corrupted adjacent memory |
+ | File descriptor leak | All | "Too many open files" error after extended operation |
+ | Connection pool exhaustion | All (database/HTTP clients) | Timeouts, connection refused after sustained load |
+ | Stack overflow | All (deep recursion) | `Maximum call stack size exceeded` (JS), `StackOverflowError` (Java) |
+
+ ### 6. Async / Promise Errors
+
+ | Pattern | Language | Error/Symptom |
+ |---------|---------|---------------|
+ | Unhandled promise rejection | JavaScript | `UnhandledPromiseRejectionWarning`, silent failure |
+ | Missing await | JavaScript/TypeScript | Function returns Promise object instead of resolved value |
+ | Callback hell / error swallowing | JavaScript | Errors caught silently in nested callbacks |
+ | Async deadlock | C# | `.Result` or `.Wait()` on async in sync context blocks forever |
+ | Event loop blocking | Node.js | Server stops responding during CPU-intensive sync operation |
+
+ ### 7. Import / Module Errors
+
+ | Language | Error Pattern | Common Cause |
+ |----------|--------------|-------------|
+ | Python | `ModuleNotFoundError: No module named 'X'` | Missing package, wrong virtualenv, typo |
+ | JavaScript | `SyntaxError: Cannot use import statement outside a module` | CommonJS/ESM mismatch |
+ | JavaScript | `Module not found: Can't resolve 'X'` | Missing dependency, wrong path |
+ | Java | `ClassNotFoundException` | Missing JAR, wrong classpath |
+ | Go | `cannot find package "X"` | Missing `go get`, wrong module path |
+
+ ## Stack Trace Anatomy
+
+ ### General Structure
+ ```
+ ExceptionType: Error message describing what went wrong
+ at function_name (file_path:line:column) ← Immediate failure point
+ at caller_function (file_path:line:column) ← Who called it
+ at higher_caller (file_path:line:column) ← Chain continues up
+ ...
+ at entry_point (file_path:line:column) ← Program/request entry
+ ```
+
+ ### Reading Stack Traces — Key Principles
+
+ 1. **Read top-down**: The top frame is where the error occurred; lower frames show the call chain
+ 2. **Find YOUR code**: Skip framework/library frames; focus on frames in your source files
+ 3. **Identify the boundary**: The transition from your code to library code often reveals the API misuse
+ 4. **Check the error message first**: It often tells you exactly what went wrong (null value, type mismatch, missing key)
+ 5. **Look for "Caused by"**: In Java/C#, chained exceptions reveal the original root cause at the bottom
+
+ ### Language-Specific Stack Trace Formats
+
+ #### JavaScript / Node.js
+ ```
+ TypeError: Cannot read properties of undefined (reading 'map')
+ at UserList (/app/components/UserList.jsx:15:22)
+ at renderWithHooks (/app/node_modules/react-dom/...js:16305:18)
+ at mountIndeterminateComponent (/app/node_modules/react-dom/...js:20069:13)
+ ```
+ - **Key**: First frame in YOUR source tree (not `node_modules`) is the likely bug location
+
+ #### Python
+ ```
+ Traceback (most recent call last):
+ File "/app/main.py", line 42, in process_data
+ result = transform(data)
+ File "/app/transform.py", line 18, in transform
+ return data["key"].strip()
+ TypeError: 'NoneType' object has no attribute 'strip'
+ ```
+ - **Key**: Python traces read **bottom-up** — the last frame + error message is the failure point
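The same "last frame is the failure point" rule can be applied programmatically with the standard library's `traceback` module. This sketch reuses the `process_data`/`transform` names from the trace above (reconstructed here as plain functions for illustration):

```python
# Sketch: extract the failure frame from a Python traceback.
# traceback.extract_tb lists frames oldest-first, so the last entry is
# the frame where the exception was raised.
import traceback

def transform(data):
    return data["key"].strip()  # fails when the value is None

def process_data(data):
    return transform(data)

try:
    process_data({"key": None})
except AttributeError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    failure = frames[-1]   # last frame = the failure point
    message = str(exc)
    print(f"{type(exc).__name__} in {failure.name} at line {failure.lineno}: {message}")
```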
+
+ #### Java
+ ```
+ java.lang.NullPointerException: Cannot invoke "String.length()" because "str" is null
+ at com.app.service.Parser.parse(Parser.java:45)
+ at com.app.controller.ApiController.handleRequest(ApiController.java:112)
+ Caused by: java.io.IOException: Connection refused
+ at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:162)
+ ```
+ - **Key**: "Caused by" chains reveal the original root cause
+
+ #### Go
+ ```
+ goroutine 1 [running]:
+ main.processData(0x0, 0x0)
+ /app/main.go:25 +0x3a
+ main.main()
+ /app/main.go:10 +0x25
+ ```
+ - **Key**: Goroutine ID and state help diagnose concurrency issues
+
+ ## Error Message Interpretation Guide
+
+ ### HTTP Status Codes as Bug Signals
+
+ | Code | Meaning | Likely Bug |
+ |------|---------|-----------|
+ | 400 | Bad Request | Malformed request body, missing required field, invalid JSON |
+ | 401 | Unauthorized | Expired/missing auth token, wrong credentials |
+ | 403 | Forbidden | Insufficient permissions, CORS policy violation |
+ | 404 | Not Found | Wrong URL path, resource deleted, routing misconfiguration |
+ | 409 | Conflict | Duplicate key, optimistic locking failure, stale data |
+ | 422 | Unprocessable Entity | Validation failure, business rule violation |
+ | 429 | Too Many Requests | Rate limiting triggered, missing backoff logic |
+ | 500 | Internal Server Error | Unhandled exception in server code |
+ | 502 | Bad Gateway | Downstream service crashed or unreachable |
+ | 503 | Service Unavailable | Server overloaded, deployment in progress |
+ | 504 | Gateway Timeout | Downstream service too slow, query timeout |
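The "missing backoff logic" cell for 429 can be sketched as exponential backoff. `send` is a hypothetical stand-in for the real HTTP call; production code should also honor a `Retry-After` header when the server sends one:

```python
# Sketch of retry-with-exponential-backoff for 429 responses.
import time

def request_with_backoff(send, max_retries=5, base_delay=0.1):
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError(f"rate limited after {max_retries} retries")

# Simulated server: rate-limits the first two calls, then succeeds.
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    return (429, "") if calls["n"] <= 2 else (200, "ok")

status, body = request_with_backoff(fake_send, base_delay=0.001)
assert (status, body) == (200, "ok")
```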
+
+ ### Database Error Patterns
+
+ | Error Pattern | Likely Cause |
+ |--------------|-------------|
+ | `duplicate key value violates unique constraint` | Inserting row with existing unique value |
+ | `deadlock detected` | Concurrent transactions locking same rows in different order |
+ | `relation "X" does not exist` | Missing table, wrong schema, migration not run |
+ | `column "X" of relation "Y" does not exist` | Schema mismatch, missing migration |
+ | `connection refused` / `too many connections` | DB server down or connection pool exhausted |
+ | `lock wait timeout exceeded` | Long-running transaction blocking others |
+ | `value too long for type character varying(N)` | Input exceeds column width constraint |
package/manifest.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "name": "@botlearn/debugger",
+ "version": "0.1.0",
+ "description": "Root cause analysis, bug diagnosis, and fix suggestion for OpenClaw Agent — improves debugging efficiency 5x with systematic hypothesis-driven investigation",
+ "category": "programming-assistance",
+ "author": "BotLearn",
+ "benchmarkDimension": "code-generation",
+ "expectedImprovement": 500,
+ "dependencies": {
+ "@botlearn/code-review": "^1.0.0"
+ },
+ "compatibility": {
+ "openclaw": ">=0.5.0"
+ },
+ "files": {
+ "skill": "skill.md",
+ "knowledge": [
+ "knowledge/domain.md",
+ "knowledge/best-practices.md",
+ "knowledge/anti-patterns.md"
+ ],
+ "strategies": [
+ "strategies/main.md"
+ ],
+ "smokeTest": "tests/smoke.json",
+ "benchmark": "tests/benchmark.json"
+ }
+ }
package/package.json ADDED
@@ -0,0 +1,38 @@
+ {
+ "name": "@botlearn/debugger",
+ "version": "0.1.0",
+ "description": "Root cause analysis, bug diagnosis, and fix suggestion for OpenClaw Agent — improves debugging efficiency 5x with systematic hypothesis-driven investigation",
+ "type": "module",
+ "main": "manifest.json",
+ "files": [
+ "manifest.json",
+ "skill.md",
+ "knowledge/",
+ "strategies/",
+ "tests/",
+ "README.md"
+ ],
+ "keywords": [
+ "botlearn",
+ "openclaw",
+ "skill",
+ "programming-assistance"
+ ],
+ "author": "BotLearn",
+ "license": "MIT",
+ "dependencies": {
+ "@botlearn/code-review": "0.1.0"
+ },
+ "repository": {
+ "type": "git",
+ "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
+ "directory": "packages/skills/debugger"
+ },
+ "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/debugger",
+ "bugs": {
+ "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
+ },
+ "publishConfig": {
+ "access": "public"
+ }
+ }
package/skill.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ name: debugger
+ role: Debugging Specialist
+ version: 1.0.0
+ triggers:
+ - "debug"
+ - "fix bug"
+ - "why is this failing"
+ - "error"
+ - "stack trace"
+ - "exception"
+ - "not working"
+ - "unexpected behavior"
+ - "crash"
+ - "broken"
+ ---
+
+ # Role
+
+ You are a Debugging Specialist. When activated, you systematically diagnose software bugs through hypothesis-driven investigation, root cause analysis, and evidence-based fix suggestions. You correctly identify root causes at least 60% of the time, improving debugging efficiency by 5x compared to unstructured debugging.
+
+ # Capabilities
+
+ 1. Analyze error messages, stack traces, and exception hierarchies to identify the failure point and its upstream causes
+ 2. Classify bugs by category (logic error, state corruption, race condition, resource leak, type mismatch, off-by-one, null reference, etc.) to narrow the investigation
+ 3. Formulate ranked hypotheses for the root cause based on symptom patterns, code context, and common bug taxonomies
+ 4. Design minimal reproduction steps that isolate the bug from unrelated system behavior
+ 5. Propose targeted fixes with reasoning, including regression test suggestions to prevent recurrence
+ 6. Leverage @botlearn/code-review capabilities to analyze code structure and identify defect-prone patterns before deep investigation
+
+ # Constraints
+
+ 1. Never suggest a fix without first identifying the root cause -- symptom-level patches create technical debt
+ 2. Never skip the hypothesis phase -- jumping to conclusions leads to incorrect fixes and wasted effort
+ 3. Never ignore error messages or stack traces -- they contain critical diagnostic information
+ 4. Always consider side effects of a proposed fix -- verify it does not introduce new bugs
+ 5. Always suggest at least one regression test for every fix to prevent recurrence
+ 6. Never assume the first hypothesis is correct -- validate with evidence before recommending a fix
+
+ # Activation
+
+ WHEN the user reports a bug, error, unexpected behavior, or requests debugging assistance:
+ 1. Collect symptom data: error messages, stack traces, expected vs. actual behavior, environment context
+ 2. Classify the bug category using knowledge/domain.md
+ 3. Apply the 7-step debugging strategy from strategies/main.md
+ 4. Cross-reference with knowledge/best-practices.md for methodology guidance
+ 5. Verify the approach against knowledge/anti-patterns.md to avoid common debugging mistakes
+ 6. Output: root cause analysis, ranked fix suggestions, and regression test recommendations
package/strategies/main.md ADDED
@@ -0,0 +1,109 @@
+ ---
+ strategy: debugger
+ version: 1.0.0
+ steps: 7
+ ---
+
+ # Debugging Strategy
+
+ ## Step 1: Symptom Analysis
+ - Collect ALL available diagnostic data: error messages, stack traces, logs, screenshots, user-reported steps
+ - Parse the error message completely — identify: exception type, message body, file, line, column
+ - Read the FULL stack trace — identify: the failure frame, the boundary between your code and library code, and the call chain
+ - Classify the symptom using knowledge/domain.md bug taxonomy:
+ - Logic error / Null reference / Type error / Concurrency / Resource / Async / Import / Other
+ - Record: **expected behavior** vs. **actual behavior** vs. **environment context** (OS, runtime version, configuration)
+ - IF the symptom report is incomplete THEN ask for: exact error message, steps to reproduce, environment details
+
+ ## Step 2: Hypothesis Generation
+ - Based on the symptom classification, generate 2-4 ranked hypotheses for the root cause
+ - For each hypothesis, specify:
+ - **What**: The specific defect (e.g., "variable `user` is null because the API returns 404 when the user is not found")
+ - **Where**: The file and approximate code region
+ - **Why**: What makes this hypothesis plausible given the symptoms
+ - **Test**: How to confirm or disprove this hypothesis
+ - Rank hypotheses by:
+ 1. **Consistency** with ALL observed symptoms (must explain every symptom, not just one)
+ 2. **Probability** based on common bug patterns (knowledge/domain.md) — null references and off-by-one errors are more likely than compiler bugs
+ 3. **Testability** — prefer hypotheses that can be quickly confirmed or disproved
+ - IF the code is available THEN leverage @botlearn/code-review to identify defect-prone patterns (deep nesting, missing error handling, unvalidated inputs) that support or refute hypotheses
30
+
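The What/Where/Why/Test records and the three ranking criteria above can be sketched as plain data. This is a minimal illustration, not a required format; all hypothesis contents and scores here are hypothetical.

```javascript
// Hypothetical Step 2 output: each hypothesis is a What/Where/Why/Test record,
// scored 1-3 on consistency, probability, and testability.
const hypotheses = [
  {
    what: "the API returns 404 for deleted users, so `user` is null",
    where: "services/userService.js, fetchUser()",
    why: "stack trace points at user.email; only deleted users crash",
    test: "request a deleted user id and log the raw response",
    consistency: 3, probability: 3, testability: 3,
  },
  {
    what: "the response body is parsed before the request completes",
    where: "services/userService.js, fetchUser()",
    why: "would also explain intermittent failures",
    test: "log the raw response immediately before parsing",
    consistency: 2, probability: 2, testability: 3,
  },
];

// Rank: consistency with ALL symptoms dominates, then probability, then testability.
hypotheses.sort((a, b) =>
  (b.consistency - a.consistency) ||
  (b.probability - a.probability) ||
  (b.testability - a.testability));

console.log(hypotheses[0].what); // the top-ranked hypothesis is tested first
```

Keeping hypotheses as explicit records makes it harder to skip the hypothesis phase and easier to show the user why a given root cause was investigated first.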
31
+ ## Step 3: Reproduction
32
+ - Design the **minimal reproduction case**: the smallest input + code + configuration that triggers the bug
33
+ - Follow the minimization process from knowledge/best-practices.md:
34
+ 1. Start with the full failing scenario
35
+ 2. Remove components one at a time, verifying the bug persists after each removal
36
+ 3. Simplify input data to the smallest triggering case
37
+ 4. Document the exact steps: "Given X, when Y, then Z happens instead of W"
38
+ - IF the bug is intermittent THEN:
39
+ - Increase parallelism or load to amplify the timing window
40
+ - Add logging at synchronization points
41
+ - Run the reproduction in a loop (100+ iterations) with state logging
42
+ - IF the bug cannot be reproduced locally THEN:
43
+ - Verify environment parity (versions, configuration, data state)
44
+ - Check for environment-specific factors: timezone, locale, file system permissions, network latency
45
+ - Consider using production observability (traces, metrics) if available
46
+
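For the intermittent case, the looped reproduction can be sketched as follows. `reproduce` is a stand-in that fails on the third run so the sketch is self-contained; in practice it would invoke your minimal reproduction (e.g. spawn the failing command with state logging) and return false on failure.

```javascript
// Loop the reproduction and stop at the first failure, keeping its logs.
// `reproduce` is a placeholder for running the actual minimal reproduction.
function reproduce(iteration) {
  return iteration < 3; // stand-in: succeeds twice, fails on iteration 3
}

let failedAt = 0;
for (let i = 1; i <= 100; i++) {
  if (!reproduce(i)) {
    failedAt = i; // record which iteration failed for correlation with logs
    break;
  }
}
console.log(`first failure at iteration ${failedAt}`); // → first failure at iteration 3
```

The iteration count at first failure is itself diagnostic data: a failure rate of roughly 1-in-N suggests how much load amplification is needed to reproduce reliably.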
47
+ ## Step 4: Root Cause Isolation
48
+ - Test the top-ranked hypothesis first:
49
+ - Change exactly ONE variable and observe the result
50
+ - IF the prediction from Step 2 holds THEN the hypothesis is supported — gather one more piece of confirming evidence
51
+ - IF the prediction fails THEN discard the hypothesis and test the next one
52
+ - Use bisection techniques from knowledge/best-practices.md to narrow the search space:
53
+ - **Code bisection**: Insert diagnostic checks at the midpoint of the suspect code path
54
+ - **Git bisect**: If the bug is a regression, identify the first bad commit in O(log n) steps
55
+ - **Data bisection**: If triggered by specific input, bisect the input to find the minimal trigger
56
+ - Verify anti-patterns from knowledge/anti-patterns.md are not present in your investigation:
57
+ - Are you changing multiple things at once? (shotgun debugging)
58
+ - Are you ignoring the error message? (ignoring error messages)
59
+ - Are you blaming the framework without evidence? (assuming bug is elsewhere)
60
+ - The step is complete when you can state: "The root cause is [specific defect] in [specific location] because [evidence]"
61
+
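The data-bisection tactic can be sketched as a greedy input minimizer (a simplified delta-debugging loop). `triggersBug` is a hypothetical failure predicate standing in for "running the program on this input fails".

```javascript
// Hypothetical predicate: any input containing -1 triggers the bug.
function triggersBug(input) {
  return input.includes(-1);
}

// Greedy minimization: repeatedly drop chunks of the input while the bug
// still reproduces, halving the chunk size whenever no drop succeeds.
function minimizeInput(input, failing) {
  let current = input;
  let chunk = Math.ceil(current.length / 2);
  while (chunk >= 1) {
    let reduced = false;
    for (let start = 0; start < current.length; start += chunk) {
      // Candidate = current input with one chunk removed.
      const candidate = current.slice(0, start).concat(current.slice(start + chunk));
      if (candidate.length > 0 && failing(candidate)) {
        current = candidate; // the smaller input still triggers the bug: keep it
        reduced = true;
        break;
      }
    }
    if (!reduced) chunk = Math.floor(chunk / 2); // nothing droppable: refine
  }
  return current;
}

console.log(minimizeInput([3, 7, -1, 4, 9, 2], triggersBug)); // → [ -1 ]
```

The same loop shape applies to code bisection (enable/disable halves of a change set) and is what `git bisect` automates over commit history.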
62
+ ## Step 5: Fix Design
63
+ - Design the fix to address the ROOT CAUSE, not the symptom:
64
+ - IF the root cause is a missing null check THEN add validation at the source of the null, not a try-catch at the crash point
65
+ - IF the root cause is a race condition THEN add proper synchronization, not a retry/sleep workaround
66
+ - IF the root cause is a logic error THEN correct the logic, not add a special-case branch
67
+ - Evaluate fix options if multiple exist:
68
+ - **Correctness**: Does it fully resolve the root cause?
69
+ - **Scope**: Is the change minimal and focused? (avoid premature optimization — knowledge/anti-patterns.md #8)
70
+ - **Side effects**: Could the fix break other code paths? Check callers and dependents
71
+ - **Consistency**: Does it follow the codebase's existing patterns and conventions?
72
+ - Request @botlearn/code-review on the proposed fix before implementation if the change is non-trivial
73
+
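The missing-null-check rule above can be illustrated with a small sketch; `findUser`, `users`, and the surrounding scenario are hypothetical, not from any real codebase.

```javascript
// Root-cause fix: handle the "user not found" case at the source of the null,
// instead of wrapping the eventual crash site in a try/catch.
function findUser(users, id) {
  const user = users.find((u) => u.id === id);
  if (user === undefined) {
    // Fail fast with a diagnosable error where the null originates.
    throw new Error(`user ${id} not found`);
  }
  return user;
}

const users = [{ id: 1, email: "a@example.com" }];
console.log(findUser(users, 1).email); // → a@example.com

// Symptom-level patch to AVOID: `try { sendEmail(user.email) } catch {}` at
// the crash point hides the real defect and leaves the bad state in place.
```

The guard lives where the invalid value is produced, so every downstream caller benefits, and the error message names the actual defect rather than a generic TypeError.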
74
+ ## Step 6: Regression Test Design
75
+ - Write at least ONE test that:
76
+ 1. **Fails** without the fix applied (reproduces the original bug)
77
+ 2. **Passes** with the fix applied
78
+ 3. Covers the specific input/state/sequence that triggered the bug
79
+ - Consider edge case tests:
80
+ - Boundary values (0, -1, MAX_INT, empty string, empty array, null)
81
+ - Concurrent access scenarios (if the bug was concurrency-related)
82
+ - Error handling paths (if the bug was in exception handling)
83
+ - Name the test descriptively to document the bug:
84
+ - `test_processOrder_returnsError_whenItemQuantityIsZero`
85
+ - `test_userList_rendersEmpty_whenApiReturnsEmptyArray`
86
+ - IF the codebase has no test infrastructure THEN provide the test as a standalone script with clear pass/fail output
87
+
88
+ ## Step 7: Verification
89
+ - Apply the fix and run the regression test — confirm it passes
90
+ - Run the FULL test suite — confirm no new failures introduced
91
+ - Re-test the original reproduction case from Step 3 — confirm the bug is resolved
92
+ - Verify edge cases from Step 6 — confirm they pass
93
+ - SELF-CHECK against knowledge/best-practices.md Fix Verification Checklist:
94
+ - [ ] Original bug no longer reproducible
95
+ - [ ] No new failures introduced
96
+ - [ ] Edge cases covered
97
+ - [ ] Regression test exists and is meaningful
98
+ - [ ] Fix addresses root cause, not symptom
99
+ - [ ] Code review completed
100
+ - IF any check fails THEN loop back to the appropriate step:
101
+ - New failures → Step 5 (revise fix design)
102
+ - Edge case failures → Step 6 (add more tests, adjust fix)
103
+ - Root cause not actually fixed → Step 4 (re-investigate)
104
+ - Output the final deliverable:
105
+ - **Root Cause**: One-sentence description of the defect
106
+ - **Evidence**: Key diagnostic findings that confirmed the root cause
107
+ - **Fix**: Description of the change with code diff
108
+ - **Regression Test**: The test(s) added
109
+ - **Risk Assessment**: Any residual risk or areas to monitor
@@ -0,0 +1,466 @@
1
+ {
2
+ "version": "0.0.1",
3
+ "dimension": "code-generation",
4
+ "tasks": [
5
+ {
6
+ "id": "bench-easy-01",
7
+ "difficulty": "easy",
8
+ "description": "Debug an off-by-one error in a Python loop",
9
+ "input": "My Python function is supposed to return the sum of all elements in a list, but it's returning the wrong value for some inputs.\n\n```python\ndef sum_list(numbers):\n total = 0\n for i in range(1, len(numbers)):\n total += numbers[i]\n return total\n```\n\nExample: `sum_list([10, 20, 30])` returns `50` instead of `60`. What's wrong?",
10
+ "rubric": [
11
+ {
12
+ "criterion": "Root Cause Identification",
13
+ "weight": 0.4,
14
+ "scoring": {
15
+ "5": "Correctly identifies that range(1, len(numbers)) skips index 0, so the first element is never added; explains that range() is exclusive of the start in this context (starts at 1, not 0)",
16
+ "3": "Identifies the loop starts at wrong index but explanation is imprecise",
17
+ "1": "Mentions off-by-one but doesn't pinpoint the exact issue",
18
+ "0": "Incorrect root cause"
19
+ }
20
+ },
21
+ {
22
+ "criterion": "Fix Quality",
23
+ "weight": 0.3,
24
+ "scoring": {
25
+ "5": "Suggests changing to range(0, len(numbers)) or range(len(numbers)) or using a for-each loop; may also mention sum() as the Pythonic alternative",
26
+ "3": "Correct fix but no alternatives or explanation of why it works",
27
+ "1": "Fix works but is overly complex",
28
+ "0": "Incorrect fix"
29
+ }
30
+ },
31
+ {
32
+ "criterion": "Regression Test",
33
+ "weight": 0.3,
34
+ "scoring": {
35
+ "5": "Suggests tests including: empty list, single element, multiple elements, negative numbers; provides test code",
36
+ "3": "Suggests at least one test case",
37
+ "1": "Mentions testing generally but no specific cases",
38
+ "0": "No test suggestion"
39
+ }
40
+ }
41
+ ],
42
+ "expectedScoreWithout": 50,
43
+ "expectedScoreWith": 90
44
+ },
45
+ {
46
+ "id": "bench-easy-02",
47
+ "difficulty": "easy",
48
+ "description": "Debug an unhandled null reference in JavaScript",
49
+ "input": "My Express.js API endpoint crashes sometimes with this error:\n\n```\nTypeError: Cannot read properties of undefined (reading 'email')\n at /app/routes/users.js:15:28\n```\n\nHere's the code:\n\n```javascript\napp.post('/api/users', (req, res) => {\n const email = req.body.email;\n const name = req.body.name;\n // ... save to database\n res.json({ success: true });\n});\n```\n\nIt works when I test with Postman but crashes when the frontend form is submitted.",
50
+ "rubric": [
51
+ {
52
+ "criterion": "Root Cause Identification",
53
+ "weight": 0.4,
54
+ "scoring": {
55
+ "5": "Identifies that req.body is undefined because the JSON body parser middleware (express.json()) is missing or not applied before this route; explains why Postman works (may set Content-Type differently or there's a route ordering issue)",
56
+ "3": "Identifies req.body is undefined but doesn't explain the middleware issue",
57
+ "1": "Says body is missing but doesn't identify why",
58
+ "0": "Incorrect root cause"
59
+ }
60
+ },
61
+ {
62
+ "criterion": "Fix Quality",
63
+ "weight": 0.35,
64
+ "scoring": {
65
+ "5": "Suggests adding app.use(express.json()) before the route; also suggests adding input validation (check req.body exists, check email/name are present); mentions Content-Type header requirement",
66
+ "3": "Suggests adding the middleware but no input validation",
67
+ "1": "Suggests a workaround like optional chaining without addressing the root cause",
68
+ "0": "Incorrect fix"
69
+ }
70
+ },
71
+ {
72
+ "criterion": "Debugging Process",
73
+ "weight": 0.25,
74
+ "scoring": {
75
+ "5": "Reads the stack trace, identifies the line, considers the Postman vs frontend difference as a diagnostic clue, forms hypothesis about middleware/headers",
76
+ "3": "Some analysis shown but incomplete",
77
+ "1": "Jumps to conclusion without analysis",
78
+ "0": "No process shown"
79
+ }
80
+ }
81
+ ],
82
+ "expectedScoreWithout": 40,
83
+ "expectedScoreWith": 85
84
+ },
85
+ {
86
+ "id": "bench-easy-03",
87
+ "difficulty": "easy",
88
+ "description": "Debug a Python ModuleNotFoundError",
89
+ "input": "I just cloned a Python project and ran it, but I get this error:\n\n```\nTraceback (most recent call last):\n File \"/app/main.py\", line 3, in <module>\n from app.services.email_service import send_notification\nModuleNotFoundError: No module named 'app.services.email_service'\n```\n\nThe file structure is:\n```\nproject/\n app/\n __init__.py\n main.py\n services/\n email_service.py\n```\n\nI'm running `python app/main.py` from the `project/` directory.",
90
+ "rubric": [
91
+ {
92
+ "criterion": "Root Cause Identification",
93
+ "weight": 0.4,
94
+ "scoring": {
95
+ "5": "Identifies that running `python app/main.py` sets `app/` as the script directory, so `app.services.email_service` is not resolvable; explains Python's module resolution (sys.path) and that the services/ directory may also be missing __init__.py",
96
+ "3": "Identifies the import path issue but doesn't fully explain the sys.path mechanism",
97
+ "1": "Mentions import issue vaguely",
98
+ "0": "Incorrect root cause"
99
+ }
100
+ },
101
+ {
102
+ "criterion": "Fix Quality",
103
+ "weight": 0.35,
104
+ "scoring": {
105
+ "5": "Suggests multiple fixes: (1) run with `python -m app.main` from project/, (2) add __init__.py to services/ if missing, (3) adjust the import to relative import; explains which approach is best and why",
106
+ "3": "Suggests one correct fix",
107
+ "1": "Fix is partially correct",
108
+ "0": "Incorrect fix"
109
+ }
110
+ },
111
+ {
112
+ "criterion": "Debugging Process",
113
+ "weight": 0.25,
114
+ "scoring": {
115
+ "5": "Analyzes the traceback, checks directory structure against the import path, considers multiple possible causes (missing __init__.py, wrong working directory, wrong invocation)",
116
+ "3": "Some analysis but doesn't consider all possibilities",
117
+ "1": "Minimal analysis",
118
+ "0": "No process"
119
+ }
120
+ }
121
+ ],
122
+ "expectedScoreWithout": 40,
123
+ "expectedScoreWith": 85
124
+ },
125
+ {
126
+ "id": "bench-med-01",
127
+ "difficulty": "medium",
128
+ "description": "Debug a race condition in a Node.js caching layer",
129
+ "input": "Our Node.js API has a caching layer that sometimes returns stale data. Users report that after updating their profile, the old profile data is returned for a random amount of time (sometimes seconds, sometimes minutes). Here's the caching code:\n\n```javascript\nconst cache = new Map();\n\nasync function getUserProfile(userId) {\n if (cache.has(userId)) {\n return cache.get(userId);\n }\n const profile = await db.query('SELECT * FROM users WHERE id = $1', [userId]);\n cache.set(userId, profile);\n return profile;\n}\n\nasync function updateUserProfile(userId, data) {\n await db.query('UPDATE users SET name=$1, email=$2 WHERE id=$3', [data.name, data.email, userId]);\n // Invalidate cache after update\n cache.delete(userId);\n return { success: true };\n}\n```\n\nWe're running 4 Node.js worker processes behind a load balancer.",
130
+ "rubric": [
131
+ {
132
+ "criterion": "Root Cause Identification",
133
+ "weight": 0.35,
134
+ "scoring": {
135
+ "5": "Identifies TWO issues: (1) In-memory cache is per-process, so updating on worker 1 doesn't invalidate cache on workers 2-4; (2) Even within one process there's a TOCTOU race: between cache.has() check and db.query(), another request could populate stale data. Explains load balancer routing as the amplifier",
136
+ "3": "Identifies the multi-process cache issue but misses the single-process race condition",
137
+ "1": "Mentions caching issue vaguely without pinpointing the multi-process or race condition aspect",
138
+ "0": "Incorrect root cause"
139
+ }
140
+ },
141
+ {
142
+ "criterion": "Fix Quality",
143
+ "weight": 0.3,
144
+ "scoring": {
145
+ "5": "Suggests replacing in-memory Map with a shared cache (Redis/Memcached) with TTL; optionally suggests cache-aside with invalidation broadcast or write-through caching pattern; discusses TTL as a safety net",
146
+ "3": "Suggests Redis but doesn't address the race condition or TTL strategy",
147
+ "1": "Suggests adding a sleep or retry, or only addresses one of the two issues",
148
+ "0": "Incorrect fix"
149
+ }
150
+ },
151
+ {
152
+ "criterion": "Debugging Process",
153
+ "weight": 0.2,
154
+ "scoring": {
155
+ "5": "Notes the 'random amount of time' symptom as a key clue pointing to multi-process behavior; considers the load balancer routing; systematically analyzes the read and write paths",
156
+ "3": "Some systematic analysis but misses key diagnostic clues",
157
+ "1": "Minimal analysis",
158
+ "0": "No process shown"
159
+ }
160
+ },
161
+ {
162
+ "criterion": "Regression Test",
163
+ "weight": 0.15,
164
+ "scoring": {
165
+ "5": "Suggests a test that simulates concurrent read-after-write across multiple processes/connections and verifies fresh data is returned; or suggests integration test with Redis",
166
+ "3": "Suggests a basic read-after-write test",
167
+ "1": "Mentions testing generally",
168
+ "0": "No test suggestion"
169
+ }
170
+ }
171
+ ],
172
+ "expectedScoreWithout": 25,
173
+ "expectedScoreWith": 70
174
+ },
175
+ {
176
+ "id": "bench-med-02",
177
+ "difficulty": "medium",
178
+ "description": "Debug a database connection pool exhaustion issue",
179
+ "input": "Our Java Spring Boot application starts timing out on database queries after running for about 30 minutes under load. The error we see is:\n\n```\norg.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection;\n nested exception is java.sql.SQLTransientConnectionException:\n HikariPool-1 - Connection is not available, request timed out after 30000ms.\n```\n\nConnection pool config:\n```yaml\nspring:\n datasource:\n hikari:\n maximum-pool-size: 10\n connection-timeout: 30000\n```\n\nThe suspicious service method:\n```java\n@Service\npublic class ReportService {\n @Autowired\n private JdbcTemplate jdbcTemplate;\n\n public Report generateReport(Long reportId) {\n Connection conn = DataSourceUtils.getConnection(jdbcTemplate.getDataSource());\n try {\n // Multiple queries using conn directly\n PreparedStatement ps1 = conn.prepareStatement(\"SELECT * FROM reports WHERE id = ?\");\n ps1.setLong(1, reportId);\n ResultSet rs1 = ps1.executeQuery();\n // ... process results\n\n PreparedStatement ps2 = conn.prepareStatement(\"SELECT * FROM report_items WHERE report_id = ?\");\n ps2.setLong(1, reportId);\n ResultSet rs2 = ps2.executeQuery();\n // ... process results\n\n return buildReport(rs1, rs2);\n } catch (SQLException e) {\n throw new RuntimeException(e);\n }\n }\n}\n```\n\nThe endpoint handling about 50 requests/minute.",
180
+ "rubric": [
181
+ {
182
+ "criterion": "Root Cause Identification",
183
+ "weight": 0.35,
184
+ "scoring": {
185
+ "5": "Identifies that DataSourceUtils.getConnection() borrows a connection from the pool but the code never releases it back (no finally block, no try-with-resources, no DataSourceUtils.releaseConnection()); connections leak on every call until the pool of 10 is exhausted; explains the 30-minute timeline based on request rate vs pool size",
186
+ "3": "Identifies the connection leak but doesn't explain the pool exhaustion timeline or the specific missing release mechanism",
187
+ "1": "Mentions connection issue but doesn't identify the leak",
188
+ "0": "Incorrect root cause (e.g., suggests increasing pool size)"
189
+ }
190
+ },
191
+ {
192
+ "criterion": "Fix Quality",
193
+ "weight": 0.3,
194
+ "scoring": {
195
+ "5": "Suggests adding a finally block with DataSourceUtils.releaseConnection(conn, dataSource); or refactoring to use JdbcTemplate directly (which manages connections automatically); or using try-with-resources; discusses PreparedStatement/ResultSet closing too",
196
+ "3": "Suggests one correct fix approach",
197
+ "1": "Suggests increasing pool size (treats symptom) or incomplete fix",
198
+ "0": "Incorrect fix"
199
+ }
200
+ },
201
+ {
202
+ "criterion": "Debugging Process",
203
+ "weight": 0.2,
204
+ "scoring": {
205
+ "5": "Analyzes the error message (pool timeout), notes the gradual onset pattern as characteristic of resource leaks, examines connection lifecycle in the code, identifies the missing release",
206
+ "3": "Some analysis but misses key diagnostic patterns",
207
+ "1": "Minimal analysis",
208
+ "0": "No process shown"
209
+ }
210
+ },
211
+ {
212
+ "criterion": "Regression Test",
213
+ "weight": 0.15,
214
+ "scoring": {
215
+ "5": "Suggests a test that calls generateReport() more times than the pool size and verifies no timeout occurs; or suggests monitoring active/idle connection counts",
216
+ "3": "Suggests basic testing approach",
217
+ "1": "Mentions testing generally",
218
+ "0": "No test suggestion"
219
+ }
220
+ }
221
+ ],
222
+ "expectedScoreWithout": 30,
223
+ "expectedScoreWith": 75
224
+ },
225
+ {
226
+ "id": "bench-med-03",
227
+ "difficulty": "medium",
228
+ "description": "Debug an incorrect API response caused by async/await misuse",
229
+ "input": "My Node.js Express endpoint is supposed to return enriched product data with reviews, but the reviews array is always empty even though I can see reviews in the database.\n\n```javascript\napp.get('/api/products/:id', async (req, res) => {\n try {\n const product = await db.products.findById(req.params.id);\n if (!product) return res.status(404).json({ error: 'Not found' });\n\n // Enrich with reviews\n product.reviews = getReviewsForProduct(product.id);\n\n // Enrich with recommendations\n product.recommendations = getRecommendations(product.category);\n\n res.json(product);\n } catch (err) {\n res.status(500).json({ error: err.message });\n }\n});\n\nasync function getReviewsForProduct(productId) {\n const reviews = await db.reviews.find({ productId });\n return reviews.map(r => ({\n author: r.authorName,\n rating: r.rating,\n text: r.content,\n date: r.createdAt\n }));\n}\n\nasync function getRecommendations(category) {\n const items = await db.products.find({ category, limit: 5 });\n return items.map(i => ({ id: i.id, name: i.name }));\n}\n```\n\nWhen I log `product.reviews` right before `res.json(product)`, it shows `Promise { <pending> }`.",
230
+ "rubric": [
231
+ {
232
+ "criterion": "Root Cause Identification",
233
+ "weight": 0.35,
234
+ "scoring": {
235
+ "5": "Identifies that getReviewsForProduct() and getRecommendations() are async functions but are called WITHOUT await, so they return Promise objects instead of resolved values; the log showing Promise { <pending> } is the definitive clue; both calls are affected",
236
+ "3": "Identifies the missing await for reviews but doesn't mention recommendations is also affected",
237
+ "1": "Mentions async issue vaguely",
238
+ "0": "Incorrect root cause"
239
+ }
240
+ },
241
+ {
242
+ "criterion": "Fix Quality",
243
+ "weight": 0.3,
244
+ "scoring": {
245
+ "5": "Adds await to both calls; also suggests using Promise.all() for parallel execution since the two enrichments are independent, improving performance; shows the corrected code",
246
+ "3": "Adds await to both calls but doesn't suggest parallel optimization",
247
+ "1": "Only fixes one of the two calls",
248
+ "0": "Incorrect fix"
249
+ }
250
+ },
251
+ {
252
+ "criterion": "Debugging Process",
253
+ "weight": 0.2,
254
+ "scoring": {
255
+ "5": "Uses the Promise { <pending> } log output as the key diagnostic clue, traces the function signatures to confirm they are async, identifies the pattern as a common async anti-pattern",
256
+ "3": "Identifies the issue but doesn't leverage the log output as evidence",
257
+ "1": "Minimal analysis",
258
+ "0": "No process shown"
259
+ }
260
+ },
261
+ {
262
+ "criterion": "Regression Test",
263
+ "weight": 0.15,
264
+ "scoring": {
265
+ "5": "Suggests a test that calls the endpoint and asserts product.reviews is an array of review objects (not a Promise); verifies both reviews and recommendations are populated",
266
+ "3": "Suggests a basic endpoint test",
267
+ "1": "Mentions testing generally",
268
+ "0": "No test suggestion"
269
+ }
270
+ }
271
+ ],
272
+ "expectedScoreWithout": 35,
273
+ "expectedScoreWith": 80
274
+ },
275
+ {
276
+ "id": "bench-med-04",
277
+ "difficulty": "medium",
278
+ "description": "Debug a CSS/React rendering issue with conditional class application",
279
+ "input": "I have a React component that should highlight table rows red when an order is overdue, but the styling never applies even though I can confirm the `isOverdue` flag is true.\n\n```jsx\nfunction OrderTable({ orders }) {\n return (\n <table>\n <tbody>\n {orders.map(order => (\n <tr\n key={order.id}\n className={order.isOverdue && 'overdue-row'}\n >\n <td>{order.id}</td>\n <td>{order.customerName}</td>\n <td>{order.dueDate}</td>\n <td>{order.status}</td>\n </tr>\n ))}\n </tbody>\n </table>\n );\n}\n```\n\n```css\n.overdue-row {\n background-color: #ffcccc;\n font-weight: bold;\n}\n```\n\nWhen I inspect the DOM, the `<tr>` elements have `class=\"overdue-row\"` set correctly. I also verified the CSS file is imported. But the rows still look completely normal. Other CSS classes in the same file work fine.",
280
+ "rubric": [
281
+ {
282
+ "criterion": "Root Cause Identification",
283
+ "weight": 0.4,
284
+ "scoring": {
285
+ "5": "Identifies that the CSS class IS being applied (DOM shows it), so the issue is CSS specificity or inheritance, not React logic; specifically, <tr> background-color is often overridden by <td> or browser default table styles; or a more specific selector elsewhere overrides .overdue-row; mentions CSS specificity as the root cause category",
286
+ "3": "Identifies a CSS specificity issue but doesn't explain why tr background is overridden by td",
287
+ "1": "Incorrectly focuses on the React conditional logic which is working correctly",
288
+ "0": "Incorrect root cause"
289
+ }
290
+ },
291
+ {
292
+ "criterion": "Fix Quality",
293
+ "weight": 0.35,
294
+ "scoring": {
295
+ "5": "Suggests targeting the <td> elements instead: `.overdue-row td { background-color: #ffcccc; }` or using !important as a quick test to confirm specificity is the issue; may also suggest inspecting computed styles in DevTools to see which rule wins",
296
+ "3": "Suggests a working fix but doesn't explain the specificity mechanism",
297
+ "1": "Suggests unrelated fixes (e.g., changing the React conditional logic)",
298
+ "0": "Incorrect fix"
299
+ }
300
+ },
301
+ {
302
+ "criterion": "Debugging Process",
303
+ "weight": 0.25,
304
+ "scoring": {
305
+ "5": "Systematically rules out React (DOM shows class is applied), rules out CSS import (other classes work), narrows to CSS specificity/cascade issue; suggests using DevTools computed styles tab to trace the cascade",
306
+ "3": "Some systematic narrowing but incomplete",
307
+ "1": "Minimal analysis",
308
+ "0": "No process shown"
309
+ }
310
+ }
311
+ ],
312
+ "expectedScoreWithout": 25,
313
+ "expectedScoreWith": 70
314
+ },
315
+ {
316
+ "id": "bench-hard-01",
317
+ "difficulty": "hard",
318
+ "description": "Debug a subtle memory leak in a Node.js long-running service",
319
+ "input": "Our Node.js microservice's memory usage grows from 150MB to 1.2GB over 24 hours, then crashes with OOM. We've taken heap snapshots at 1h, 12h, and 23h. The top retained objects at 23h are:\n\n```\nRetained Size | Object\n 412 MB | (array) in EventEmitter._events.data\n 298 MB | (array) in Map (connectionHandlers)\n 89 MB | (string) in Set (processedIds)\n```\n\nHere's the relevant code:\n\n```javascript\nclass MessageProcessor extends EventEmitter {\n constructor() {\n super();\n this.connectionHandlers = new Map();\n this.processedIds = new Set();\n }\n\n registerConnection(connectionId, socket) {\n const handler = (data) => {\n this.processedIds.add(data.messageId);\n this.emit('data', { connectionId, payload: data });\n // ... process message\n };\n this.connectionHandlers.set(connectionId, handler);\n socket.on('message', handler);\n }\n\n handleDisconnect(connectionId) {\n this.connectionHandlers.delete(connectionId);\n console.log(`Connection ${connectionId} cleaned up`);\n }\n}\n\n// In the server setup:\nconst processor = new MessageProcessor();\n\nwsServer.on('connection', (socket) => {\n const connId = generateId();\n processor.registerConnection(connId, socket);\n\n socket.on('close', () => {\n processor.handleDisconnect(connId);\n });\n\n processor.on('data', (event) => {\n metrics.record(event);\n });\n});\n```\n\nWe handle about 500 connections/hour, with average session duration of 10 minutes.",
320
+ "rubric": [
321
+ {
322
+ "criterion": "Root Cause Identification",
323
+ "weight": 0.35,
324
+ "scoring": {
325
+ "5": "Identifies ALL THREE leaks: (1) processor.on('data') inside the connection handler adds a NEW listener for every connection but never removes it (listeners accumulate on the EventEmitter — the 412MB); (2) socket.on('message', handler) — the handler is removed from connectionHandlers Map but the socket listener is never removed with socket.removeListener/off (partial cleanup — the 298MB may relate to closures retained by these orphaned listeners); (3) processedIds Set grows unboundedly since messageIds are never purged (the 89MB). Correlates each to the heap snapshot data",
326
+ "3": "Identifies 2 of the 3 leaks",
327
+ "1": "Identifies 1 leak",
328
+ "0": "Incorrect analysis or no leaks identified"
329
+ }
330
+ },
331
+ {
332
+ "criterion": "Fix Quality",
333
+ "weight": 0.3,
334
+ "scoring": {
335
+ "5": "Fixes all three: (1) move processor.on('data') outside the connection loop or use a single shared listener; (2) in handleDisconnect, also call socket.removeListener('message', handler) or socket.off(); (3) add TTL-based pruning or a max-size cap to processedIds; discusses setMaxListeners warning as a clue that was likely ignored",
336
+ "3": "Fixes 2 of 3 leaks correctly",
337
+ "1": "Fixes 1 leak",
338
+ "0": "Incorrect fixes"
339
+ }
340
+ },
341
+ {
342
+ "criterion": "Debugging Process",
343
+ "weight": 0.2,
344
+ "scoring": {
345
+ "5": "Uses heap snapshot data as primary evidence; correlates retained sizes to specific data structures in the code; calculates expected growth rate (500 conn/hr * 10min avg = ~83 concurrent, but listener count grows monotonically); explains why handleDisconnect is insufficient",
346
+ "3": "Uses heap snapshot data but analysis is incomplete",
347
+ "1": "Mentions memory leak patterns generally without connecting to the specific evidence",
348
+ "0": "No diagnostic process shown"
349
+ }
350
+ },
351
+ {
352
+ "criterion": "Regression Test",
353
+ "weight": 0.15,
354
+ "scoring": {
355
+ "5": "Suggests a test that creates N connections, disconnects them all, then checks: EventEmitter listener count is back to baseline, connectionHandlers Map is empty, processedIds has bounded size; suggests heap snapshot comparison in CI",
356
+ "3": "Suggests monitoring memory or checking listener count",
357
+ "1": "Mentions testing generally",
358
+ "0": "No test suggestion"
359
+ }
360
+ }
361
+ ],
362
+ "expectedScoreWithout": 15,
363
+ "expectedScoreWith": 60
364
+ },
365
+ {
366
+ "id": "bench-hard-02",
367
+ "difficulty": "hard",
368
+ "description": "Debug a distributed system consistency issue with eventual consistency and message ordering",
369
+ "input": "We have an e-commerce system with an Order Service and an Inventory Service communicating via a message queue (RabbitMQ). Occasionally, customers can purchase items that are actually out of stock. The flow is:\n\n1. Order Service receives purchase request\n2. Order Service publishes `OrderCreated` event to queue\n3. Inventory Service consumes the event and decrements stock\n4. If stock goes below 0, Inventory Service publishes `StockDepleted` event\n5. Order Service consumes `StockDepleted` and should cancel the order\n\n```javascript\n// Order Service\nasync function createOrder(req, res) {\n const order = await db.orders.create({\n userId: req.body.userId,\n productId: req.body.productId,\n quantity: req.body.quantity,\n status: 'confirmed'\n });\n await messageQueue.publish('order.created', {\n orderId: order.id,\n productId: order.productId,\n quantity: order.quantity\n });\n return res.json({ order, message: 'Order confirmed' });\n}\n\n// Inventory Service\nasync function handleOrderCreated(event) {\n const product = await db.products.findById(event.productId);\n product.stock -= event.quantity;\n await product.save();\n if (product.stock < 0) {\n await messageQueue.publish('stock.depleted', {\n productId: event.productId,\n orderId: event.orderId\n });\n }\n}\n```\n\nWe see this issue primarily during flash sales when many orders come in simultaneously for the same product. The product might have stock=5 but 20 orders get confirmed.",
370
+ "rubric": [
371
+ {
372
+ "criterion": "Root Cause Identification",
373
+ "weight": 0.35,
374
+ "scoring": {
375
+ "5": "Identifies MULTIPLE interacting issues: (1) No stock check before confirming the order — the order is 'confirmed' immediately before inventory validation; (2) Race condition in inventory decrement — concurrent handleOrderCreated calls read the same stock value, each decrements from the same base, creating a lost-update problem (no DB-level locking or atomic operation); (3) The architecture is fundamentally flawed for this use case — stock reservation should happen synchronously, not via eventual consistency",
376
+ "3": "Identifies the race condition in inventory but misses the premature confirmation issue",
377
+ "1": "Identifies one issue but misses the systemic architecture problem",
378
+ "0": "Incorrect root cause"
379
+ }
380
+ },
381
+ {
382
+ "criterion": "Fix Quality",
383
+ "weight": 0.3,
384
+ "scoring": {
385
+ "5": "Proposes a comprehensive fix: (1) Use optimistic locking or atomic DB operation (UPDATE products SET stock = stock - ? WHERE id = ? AND stock >= ?) for the inventory decrement; (2) Change order status to 'pending' until inventory is confirmed; (3) Consider synchronous stock reservation (e.g., Saga pattern or synchronous API call) for critical-path operations; discusses trade-offs between approaches",
386
+ "3": "Suggests atomic DB operation but doesn't address the order confirmation flow",
387
+ "1": "Suggests adding a simple lock without considering the distributed nature",
388
+ "0": "Incorrect fix"
389
+ }
390
+ },
391
+ {
392
+ "criterion": "Debugging Process",
393
+ "weight": 0.2,
394
+ "scoring": {
395
+ "5": "Analyzes the flash-sale scenario step by step: traces the timeline of concurrent requests, identifies where the data race occurs, explains why eventual consistency is insufficient for inventory reservation, references the stock=5/20-orders scenario as evidence of lost updates",
396
+ "3": "Some timeline analysis but incomplete",
397
+ "1": "Minimal analysis",
398
+ "0": "No process shown"
399
+ }
400
+ },
401
+ {
402
+ "criterion": "Regression Test",
403
+ "weight": 0.15,
404
+ "scoring": {
405
+ "5": "Suggests a concurrent load test: create a product with stock=5, fire 20 simultaneous order requests, verify exactly 5 are confirmed and 15 are rejected/pending; verify final stock is 0 (not negative)",
406
+ "3": "Suggests a basic concurrency test",
407
+ "1": "Mentions testing generally",
408
+ "0": "No test suggestion"
409
+ }
410
+ }
411
+ ],
412
+ "expectedScoreWithout": 15,
413
+ "expectedScoreWith": 60
414
+ },
415
+ {
416
+ "id": "bench-hard-03",
417
+ "difficulty": "hard",
418
+ "description": "Debug a complex TypeScript type error in a generic data pipeline",
419
+ "input": "I'm building a type-safe data transformation pipeline in TypeScript and getting a complex type error I can't understand. The code:\n\n```typescript\ntype TransformFn<TIn, TOut> = (input: TIn) => TOut;\n\ninterface PipelineStep<TIn, TOut> {\n name: string;\n transform: TransformFn<TIn, TOut>;\n}\n\nclass Pipeline<TInput> {\n private steps: PipelineStep<any, any>[] = [];\n\n pipe<TOut>(step: PipelineStep<TInput, TOut>): Pipeline<TOut> {\n this.steps.push(step);\n return this as unknown as Pipeline<TOut>;\n }\n\n execute(input: TInput): any {\n return this.steps.reduce((acc, step) => step.transform(acc), input);\n }\n}\n\n// Usage:\ninterface RawData {\n name: string;\n age: string;\n scores: string;\n}\n\ninterface ParsedData {\n name: string;\n age: number;\n scores: number[];\n}\n\ninterface EnrichedData extends ParsedData {\n ageGroup: 'junior' | 'senior';\n averageScore: number;\n}\n\nconst pipeline = new Pipeline<RawData>()\n .pipe({\n name: 'parse',\n transform: (raw: RawData): ParsedData => ({\n name: raw.name,\n age: parseInt(raw.age),\n scores: raw.scores.split(',').map(Number)\n })\n })\n .pipe({\n name: 'enrich',\n transform: (parsed: ParsedData): EnrichedData => ({\n ...parsed,\n ageGroup: parsed.age >= 18 ? 'senior' : 'junior',\n averageScore: parsed.scores.reduce((a, b) => a + b, 0) / parsed.scores.length\n })\n });\n\nconst result = pipeline.execute({ name: 'Alice', age: '25', scores: '90,85,92' });\n// result type is 'any' — I want it to be EnrichedData\n// Also, the second .pipe() gives error:\n// Argument of type 'PipelineStep<ParsedData, EnrichedData>' is not assignable\n// to parameter of type 'PipelineStep<RawData, EnrichedData>'\n```\n\nHow do I fix this so the pipeline is truly type-safe with proper inference through the chain?",
420
+ "rubric": [
421
+ {
422
+ "criterion": "Root Cause Identification",
423
+ "weight": 0.35,
424
+ "scoring": {
425
+ "5": "Identifies that the core issue is the pipe() method's type parameter: after the first pipe() call, `this` is cast to Pipeline<TOut> (Pipeline<ParsedData>), BUT the cast `as unknown as Pipeline<TOut>` doesn't actually change the object's generic type at the type level for subsequent method calls — TypeScript sees the original TInput=RawData for the second pipe(). The fundamental problem is that TypeScript cannot track mutating generic types on `this` through method chaining without a builder pattern or function composition approach",
426
+ "3": "Identifies the type erasure issue with the cast but doesn't fully explain why TypeScript can't track the chain",
427
+ "1": "Mentions generic type issue vaguely",
428
+ "0": "Incorrect root cause"
429
+ }
430
+ },
431
+ {
432
+ "criterion": "Fix Quality",
433
+ "weight": 0.35,
434
+ "scoring": {
435
+ "5": "Proposes a proper solution: (1) Use a function-based pipe composition (like fp-ts pipe, or a standalone pipe function that returns a new Pipeline with the correct output type), OR (2) Redesign using a builder pattern where each pipe() returns a new Pipeline<TOut> instance (not casting this), OR (3) Use overloaded type signatures for chaining. Shows working code. Fixes the execute() return type to be the final output type instead of any",
436
+ "3": "Proposes a working fix for the type error but the pipeline isn't fully type-safe end-to-end",
437
+ "1": "Suggests using 'any' or type assertions to silence the error",
438
+ "0": "Incorrect fix"
439
+ }
440
+ },
441
+ {
442
+ "criterion": "Debugging Process",
443
+ "weight": 0.15,
444
+ "scoring": {
445
+ "5": "Traces the type inference through each step of the chain, shows what TypeScript infers at each point, explains the disconnect between the runtime cast and the type system's view",
446
+ "3": "Some type analysis but incomplete",
447
+ "1": "Minimal analysis",
448
+ "0": "No process shown"
449
+ }
450
+ },
451
+ {
452
+ "criterion": "Regression Test",
453
+ "weight": 0.15,
454
+ "scoring": {
455
+ "5": "Suggests compile-time type tests: verify that result is inferred as EnrichedData, verify that piping incompatible types produces a compile error, verify execute() input type matches the initial pipeline type",
456
+ "3": "Suggests a basic type assertion test",
457
+ "1": "Mentions testing generally",
458
+ "0": "No test suggestion"
459
+ }
460
+ }
461
+ ],
462
+ "expectedScoreWithout": 20,
463
+ "expectedScoreWith": 60
464
+ }
465
+ ]
466
+ }
@@ -0,0 +1,54 @@
1
+ {
2
+ "version": "0.0.1",
3
+ "timeout": 60,
4
+ "tasks": [
5
+ {
6
+ "id": "smoke-01",
7
+ "description": "Debug a React component crash with a TypeError by analyzing the stack trace, identifying root cause, and suggesting a fix with regression test",
8
+ "input": "My React app crashes when I load the user profile page. Here's the error:\n\nTypeError: Cannot read properties of undefined (reading 'map')\n at UserProfile (src/components/UserProfile.jsx:23:34)\n at renderWithHooks (node_modules/react-dom/cjs/react-dom.development.js:16305:18)\n at mountIndeterminateComponent (node_modules/react-dom/cjs/react-dom.development.js:20069:13)\n\nHere's the relevant code:\n\n```jsx\nfunction UserProfile({ userId }) {\n const [user, setUser] = useState(null);\n\n useEffect(() => {\n fetch(`/api/users/${userId}`)\n .then(res => res.json())\n .then(data => setUser(data));\n }, [userId]);\n\n return (\n <div>\n <h1>{user.name}</h1>\n <ul>\n {user.posts.map(post => (\n <li key={post.id}>{post.title}</li>\n ))}\n </ul>\n </div>\n );\n}\n```\n\nIt works fine after I navigate to the page from elsewhere, but crashes on direct page load or refresh.",
9
+ "rubric": [
10
+ {
11
+ "criterion": "Root Cause Identification",
12
+ "weight": 0.35,
13
+ "scoring": {
14
+ "5": "Correctly identifies that user is null on initial render because useState initializes to null and the fetch hasn't completed yet; explains the timing issue between render and async data loading",
15
+ "3": "Identifies the null reference issue but doesn't fully explain the timing/lifecycle connection",
16
+ "1": "Mentions null but doesn't pinpoint why it's null on initial render",
17
+ "0": "Incorrect root cause identification"
18
+ }
19
+ },
20
+ {
21
+ "criterion": "Fix Quality",
22
+ "weight": 0.3,
23
+ "scoring": {
24
+ "5": "Suggests a proper fix: add a loading state guard (if (!user) return loading), or use optional chaining (user?.posts?.map), or initialize state with {name: '', posts: []}; explains trade-offs between approaches",
25
+ "3": "Suggests a working fix but doesn't explain trade-offs or only addresses one of the two null access points (user.name and user.posts.map)",
26
+ "1": "Suggests wrapping in try-catch or a fix that only masks the symptom",
27
+ "0": "No fix suggested or fix is incorrect"
28
+ }
29
+ },
30
+ {
31
+ "criterion": "Debugging Process",
32
+ "weight": 0.2,
33
+ "scoring": {
34
+ "5": "Demonstrates systematic analysis: reads stack trace, identifies the failure line, analyzes component lifecycle, formulates hypothesis, explains why it works on navigation (cached data) but not on direct load",
35
+ "3": "Shows some systematic analysis but skips steps or doesn't explain the navigation vs. direct load difference",
36
+ "1": "Jumps to fix without analysis",
37
+ "0": "No debugging process visible"
38
+ }
39
+ },
40
+ {
41
+ "criterion": "Regression Test",
42
+ "weight": 0.15,
43
+ "scoring": {
44
+ "5": "Suggests a test that renders UserProfile before fetch completes and verifies it shows loading state without crashing; includes test code or clear pseudocode",
45
+ "3": "Mentions testing but doesn't provide a specific test case",
46
+ "1": "No test suggestion",
47
+ "0": "Suggests inappropriate test"
48
+ }
49
+ }
50
+ ],
51
+ "passThreshold": 60
52
+ }
53
+ ]
54
+ }