clawpowers 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/.claude-plugin/manifest.json +19 -0
  2. package/.codex/INSTALL.md +36 -0
  3. package/.cursor-plugin/manifest.json +21 -0
  4. package/.opencode/INSTALL.md +52 -0
  5. package/ARCHITECTURE.md +69 -0
  6. package/README.md +381 -0
  7. package/bin/clawpowers.js +390 -0
  8. package/bin/clawpowers.sh +91 -0
  9. package/gemini-extension.json +32 -0
  10. package/hooks/session-start +205 -0
  11. package/hooks/session-start.cmd +43 -0
  12. package/hooks/session-start.js +163 -0
  13. package/package.json +54 -0
  14. package/runtime/feedback/analyze.js +621 -0
  15. package/runtime/feedback/analyze.sh +546 -0
  16. package/runtime/init.js +172 -0
  17. package/runtime/init.sh +145 -0
  18. package/runtime/metrics/collector.js +361 -0
  19. package/runtime/metrics/collector.sh +308 -0
  20. package/runtime/persistence/store.js +433 -0
  21. package/runtime/persistence/store.sh +303 -0
  22. package/skill.json +74 -0
  23. package/skills/agent-payments/SKILL.md +411 -0
  24. package/skills/brainstorming/SKILL.md +233 -0
  25. package/skills/content-pipeline/SKILL.md +282 -0
  26. package/skills/dispatching-parallel-agents/SKILL.md +305 -0
  27. package/skills/executing-plans/SKILL.md +255 -0
  28. package/skills/finishing-a-development-branch/SKILL.md +260 -0
  29. package/skills/learn-how-to-learn/SKILL.md +235 -0
  30. package/skills/market-intelligence/SKILL.md +288 -0
  31. package/skills/prospecting/SKILL.md +313 -0
  32. package/skills/receiving-code-review/SKILL.md +225 -0
  33. package/skills/requesting-code-review/SKILL.md +206 -0
  34. package/skills/security-audit/SKILL.md +308 -0
  35. package/skills/subagent-driven-development/SKILL.md +244 -0
  36. package/skills/systematic-debugging/SKILL.md +279 -0
  37. package/skills/test-driven-development/SKILL.md +299 -0
  38. package/skills/using-clawpowers/SKILL.md +137 -0
  39. package/skills/using-git-worktrees/SKILL.md +261 -0
  40. package/skills/verification-before-completion/SKILL.md +254 -0
  41. package/skills/writing-plans/SKILL.md +276 -0
  42. package/skills/writing-skills/SKILL.md +260 -0
@@ -0,0 +1,244 @@
---
name: subagent-driven-development
description: Orchestrate complex tasks by dispatching fresh subagents with isolated context, two-stage review, and Git worktree isolation. Activate when a task is large enough to benefit from parallelism or context separation.
version: 1.0.0
requires:
  tools: [git, bash]
  runtime: false
metrics:
  tracks: [tasks_dispatched, subagent_success_rate, review_pass_rate, time_to_completion]
  improves: [task_decomposition_quality, spec_clarity, review_threshold]
---

# Subagent-Driven Development

## When to Use

Apply this skill when you encounter:

- A task with 3+ logically independent workstreams
- A task so large it would exhaust a single context window
- A feature requiring multiple specialists (frontend + backend + tests + docs)
- Any work where a bug in one component shouldn't block another
- A task with clear interfaces between components (you can spec them up front)

**Skip this skill when:**
- The task is tightly coupled — one change cascades everywhere
- You need to maintain narrative continuity across all components
- The task is < 2 hours of work for a single agent
- You don't have enough information to spec subagent boundaries yet

**Decision tree:**
```
Can the task be split into N parts with defined interfaces?
├── No → single-agent execution
└── Yes → Can subagents work concurrently without blocking each other?
    ├── No → sequential execution with checkpointing (executing-plans)
    └── Yes → subagent-driven-development ← YOU ARE HERE
```

## Core Methodology

### Stage 0: Task Decomposition (do this yourself, not in a subagent)

Before dispatching anything, produce:

1. **Task tree** — hierarchical breakdown of the full work
2. **Subagent boundaries** — where one agent's output is another's input
3. **Interface contracts** — what each subagent accepts and delivers
4. **Dependency order** — which can run in parallel, which must sequence

**Decomposition heuristic:** Each subagent task should be completable in one context window (roughly 2-5K tokens of output). If larger, decompose further.
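
This heuristic can be sketched as a recursive split. The token budget comes from the heuristic above; the `estimate`/`split` functions and task names are hypothetical stand-ins, not part of the package:

```python
# Illustrative sketch only: split tasks until each leaf fits one context window.
# The 2-5K token budget is the heuristic above; estimate/split are assumptions.
MAX_OUTPUT_TOKENS = 5000

def decompose(task, estimate, split):
    """estimate(task) -> projected output tokens; split(task) -> subtasks."""
    if estimate(task) <= MAX_OUTPUT_TOKENS:
        return [task]
    leaves = []
    for sub in split(task):
        leaves.extend(decompose(sub, estimate, split))
    return leaves

# Toy example: a task's estimate is its size; splitting halves it.
est = lambda t: t["size"]
spl = lambda t: [{"name": t["name"] + ".1", "size": t["size"] // 2},
                 {"name": t["name"] + ".2", "size": t["size"] // 2}]

leaves = decompose({"name": "auth", "size": 16000}, est, spl)
print(len(leaves), max(est(t) for t in leaves))  # → 4 4000
```

A 16K-token task decomposes into four subagent-sized leaves; estimation in practice is rough, so err toward smaller leaves.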

**Example decomposition for "Build authentication service":**
```
auth-service/
├── Subagent A: API design + OpenAPI spec [no dependencies]
├── Subagent B: Database schema + migrations [no dependencies]
├── Subagent C: Core auth logic (JWT, bcrypt) [depends on: A, B specs]
├── Subagent D: Integration tests [depends on: C output]
└── Subagent E: Documentation [depends on: A, C, D output]
```
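
The dependency annotations above determine dispatch order mechanically. A minimal sketch, with the graph literal mirroring the auth-service example (the Kahn-style layering is one way to do it, not something the package prescribes):

```python
# Compute parallel dispatch "waves" from the subagent dependency graph above.
deps = {
    "A": set(),            # API design
    "B": set(),            # DB schema
    "C": {"A", "B"},       # core auth logic
    "D": {"C"},            # integration tests
    "E": {"A", "C", "D"},  # documentation
}

def dispatch_waves(deps):
    done, waves = set(), []
    remaining = dict(deps)
    while remaining:
        # Everything whose dependencies are all satisfied can run concurrently.
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(sorted(ready))
        done |= ready
        for t in ready:
            del remaining[t]
    return waves

print(dispatch_waves(deps))  # → [['A', 'B'], ['C'], ['D'], ['E']]
```

A and B dispatch concurrently; C waits for both; D and E sequence after.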

### Stage 1: Spec Writing (per subagent)

For each subagent, write a precise spec that includes:

```markdown
## Subagent Spec: [Component Name]

**Objective:** [Single sentence — what this subagent produces]

**Context provided:**
- [File or artifact they receive as input]
- [Interface contract from upstream subagent]

**Deliverables:**
- [Specific file or artifact, not vague output]
- [Test file covering the deliverable]

**Constraints:**
- [Language/framework requirements]
- [Performance requirements if applicable]
- [Must not break: existing interfaces]

**Done criteria:**
- [ ] All tests pass
- [ ] Interface contract satisfied
- [ ] No TODOs or stubs in production code
```

**Anti-pattern:** Vague specs produce vague output. "Build the auth logic" is not a spec. "Implement JWT issuance and validation with RS256, returning {token, expiresAt, userId} from issue() and {valid, userId, error} from validate()" is a spec.
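
One way to make the good spec above machine-checkable is to write its interface contract down as types before dispatch. A sketch, assuming Python deliverables; the type names and the structural check are illustrative, not part of the package:

```python
# The JWT spec from the anti-pattern note, pinned down as a typed contract.
# Field names follow that sentence; everything else here is an assumption.
from typing import Optional, Protocol, TypedDict

class IssueResult(TypedDict):
    token: str
    expiresAt: int    # unix epoch seconds
    userId: str

class ValidateResult(TypedDict):
    valid: bool
    userId: Optional[str]
    error: Optional[str]

class AuthService(Protocol):
    def issue(self, user_id: str) -> IssueResult: ...
    def validate(self, token: str) -> ValidateResult: ...

# A Stage 4a reviewer can then check a deliverable structurally:
def satisfies_contract(obj) -> bool:
    return all(callable(getattr(obj, m, None)) for m in ("issue", "validate"))
```

Handing the downstream subagent this contract, instead of prose, removes a whole class of interface mismatches at integration time.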
92
+
93
+ ### Stage 2: Worktree Isolation
94
+
95
+ Each subagent works in an isolated Git worktree to prevent interference:
96
+
97
+ ```bash
98
+ # Create worktrees for parallel subagents
99
+ git worktree add ../task-auth-api feature/auth-api
100
+ git worktree add ../task-auth-db feature/auth-db
101
+ git worktree add ../task-auth-core feature/auth-core
102
+
103
+ # Verify isolation
104
+ git worktree list
105
+ ```
106
+
107
+ Worktrees share the repo history but have independent working directories. A subagent working in `../task-auth-api` cannot accidentally overwrite files in `../task-auth-core`.
108
+
109
+ See: `skills/using-git-worktrees/SKILL.md` for full worktree management protocol.
110
+
111
+ ### Stage 3: Subagent Dispatch
112
+
113
+ Dispatch each subagent with:
114
+ 1. The spec (complete, not abbreviated)
115
+ 2. All input artifacts (relevant files, interface contracts)
116
+ 3. Access to their assigned worktree
117
+ 4. No instruction to "skip complicated parts" or "use a stub"
118
+
119
+ **Dispatch instruction template:**
120
+ ```
121
+ You are implementing [component]. Your spec is below. Work only in the provided
122
+ worktree directory. Produce real, working code with tests — no stubs, no TODOs.
123
+ Deliver: [specific files]. When done, output a JSON summary of what you built.
124
+
125
+ [Full spec here]
126
+ ```
127
+
128
+ ### Stage 4: Two-Stage Review
129
+
130
+ **Stage 4a: Spec review** — Before running any subagent code, review that:
131
+ - The output matches the spec's deliverables
132
+ - Interface contracts are satisfied (types match, method signatures match)
133
+ - No stubs or mocks in production code paths
134
+ - Tests exist and cover the critical paths
135
+
136
+ **Stage 4b: Quality review** — After running the code:
137
+ - All tests pass (zero failing)
138
+ - No linting errors
139
+ - Performance meets requirements
140
+ - Security: no hardcoded credentials, no SQL injection vectors, no unvalidated inputs
141
+
142
+ **Review failure protocol:**
143
+ ```
144
+ If Stage 4a fails → return spec to subagent with specific failure reason
145
+ If Stage 4b fails → return to subagent with exact failing test output
146
+ Never merge code that fails either review stage
147
+ ```
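
The protocol above can be sketched as a routing function. The `Review` fields are hypothetical stand-ins for whatever your review actually produces:

```python
# Sketch of the review-failure protocol as a routing function.
# Review's fields are assumptions, not the package's real schema.
from dataclasses import dataclass

@dataclass
class Review:
    spec_satisfied: bool   # Stage 4a: deliverables + interface contracts met
    has_stubs: bool        # Stage 4a: stubs/mocks found in production paths
    failing_tests: list    # Stage 4b: names of failing tests

def route(review: Review) -> str:
    if not review.spec_satisfied or review.has_stubs:
        return "return-to-subagent: spec failure"        # Stage 4a fails
    if review.failing_tests:
        return "return-to-subagent: failing tests " + ", ".join(review.failing_tests)
    return "merge"  # only reachable when both stages pass

print(route(Review(True, False, [])))            # → merge
print(route(Review(True, False, ["test_jwt"])))  # → return-to-subagent: failing tests test_jwt
```

Note that "merge" is unreachable unless both stages pass, which is exactly the "never merge code that fails either review stage" rule.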

### Stage 5: Integration

After all subagents pass review:

1. Merge worktrees in dependency order
2. Run full integration test suite
3. Resolve any interface mismatches (typically minor type issues)
4. Clean up worktrees

```bash
# Merge in dependency order (A and B are independent; merge them first)
git checkout main
git merge feature/auth-db
git merge feature/auth-api
git merge feature/auth-core   # depends on both
git merge feature/auth-tests
git merge feature/auth-docs

# Clean up
git worktree remove ../task-auth-api
git worktree remove ../task-auth-db
# ... etc
```

## ClawPowers Enhancement

When `~/.clawpowers/` runtime is initialized:

**Persistent Execution DB:** Every subagent dispatch is logged with spec hash, start time, subagent ID, and outcome. If a session is interrupted, you know exactly which subagents completed and which to re-run.

```bash
# Record dispatch
bash runtime/persistence/store.sh set "subagent:auth-api:status" "dispatched"
bash runtime/persistence/store.sh set "subagent:auth-api:spec_hash" "$(echo "$SPEC" | sha256sum | cut -c1-8)"

# Check on resume
bash runtime/persistence/store.sh get "subagent:auth-api:status"
```
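
The spec hash above is just the first 8 hex digits of a SHA-256. A Python equivalent for reference (note that `echo` appends a trailing newline before `sha256sum`, so the Python version must include it to produce identical hashes):

```python
# Python equivalent of the spec-hash shell line above. echo appends a
# newline before sha256sum, so we append one too for identical output.
import hashlib

def spec_hash(spec: str) -> str:
    return hashlib.sha256((spec + "\n").encode()).hexdigest()[:8]

# On resume, compare the stored hash against the current spec to detect drift:
stored = spec_hash("hello")
assert spec_hash("hello") == stored          # unchanged spec → safe to resume
assert spec_hash("hello, edited") != stored  # spec drift → re-dispatch
print(spec_hash("hello"))  # → 5891b5b5
```

Comparing hashes on resume catches the case where the spec was edited between dispatch and recovery, which would otherwise silently resume a stale plan.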

**Resumable Checkpoints:** The framework saves the task tree and each subagent's completion state. A session that crashes mid-dispatch resumes from the last successful checkpoint, not from scratch.

**Outcome Metrics:** After integration, record:
```bash
bash runtime/metrics/collector.sh record \
  --skill subagent-driven-development \
  --outcome success \
  --duration 3600 \
  --notes "auth-service: 5 subagents, 2 review cycles, 0 integration failures"
```

**Metric-driven decomposition:** After 10+ executions, `runtime/feedback/analyze.sh` identifies your optimal subagent granularity — tasks that are too small (high coordination overhead) or too large (high review failure rate).

## Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|-------------|-------------|-----------------|
| Vague spec ("build the auth thing") | Subagent guesses, output is wrong | Write a spec with deliverables and done criteria |
| Skip the failure witness | Review catches nothing | Require all tests to pass in the review stage |
| Merge before review | Bad code enters main | Two-stage review is non-negotiable |
| Single worktree for multiple agents | Files overwrite each other | One worktree per subagent, always |
| Decompose too fine | Excessive coordination cost | Target 1-context-window tasks (2-5K token output) |
| Decompose too coarse | Subagent context exhaustion | If output > 1 context window, split further |
| Stub the hard parts | Tech debt accumulates | "No stubs" is a hard constraint in the spec |

## Examples

### Example 1: Simple (2 subagents)

**Task:** Add email verification to existing user signup

**Decomposition:**
- Subagent A: Email service integration (SendGrid/SES wrapper, template rendering)
- Subagent B: Verification flow (token generation, storage, verification endpoint)
- Sequential: B depends on A's interface

**Specs:** A delivers an `EmailService` class with `send(to, template, vars)` → B uses that interface

### Example 2: Complex (5 subagents)

**Task:** Build a real-time dashboard

**Decomposition:**
- Subagent A: WebSocket server (connection mgmt, message routing) [parallel]
- Subagent B: Data aggregation service (query engine, caching) [parallel]
- Subagent C: Frontend dashboard components (React, chart library) [parallel]
- Subagent D: Integration tests (WebSocket + aggregation E2E) [depends on A, B]
- Subagent E: Dashboard state management (connects C to A/B) [depends on A, B, C]

**Parallel dispatch:** A, B, C run concurrently. D and E run after A, B, and C complete review.

## Integration with Other Skills

- Use `writing-plans` first if you don't have a clear task tree yet
- Apply `using-git-worktrees` for worktree lifecycle management
- Use `dispatching-parallel-agents` if subagents run as independent processes
- Apply `verification-before-completion` before the final integration merge
@@ -0,0 +1,279 @@
---
name: systematic-debugging
description: Hypothesis-driven debugging with evidence collection. Activate when you encounter unexpected behavior, a failing test, or a bug report.
version: 1.0.0
requires:
  tools: [bash, git]
  runtime: false
metrics:
  tracks: [hypotheses_tested, time_to_root_cause, false_positives, reopen_rate]
  improves: [hypothesis_quality, evidence_collection_speed, known_issue_match_rate]
---

# Systematic Debugging

## When to Use

Apply this skill when:

- A test is failing and the cause isn't immediately obvious
- A bug report describes behavior that shouldn't happen
- Code that worked before suddenly doesn't
- A production alert is firing
- You've tried 2+ fixes without understanding why they work or don't

**Skip when:**
- The cause is obvious from the error message (typo, missing import, syntax error)
- You've seen this exact error before and know the fix
- It's a configuration issue, not a logic bug

**Decision tree:**
```
Is the error message self-explanatory?
├── Yes → fix it directly
└── No → Have you seen this pattern before?
    ├── Yes → apply the known fix, verify, document
    └── No → systematic-debugging ← YOU ARE HERE
```

## Core Methodology

### The Scientific Debugging Loop

```
Observe → Form hypothesis → Design experiment → Execute → Collect evidence → Conclude → Repeat
```

Never skip steps. The most common debugging failure is jumping from "observe" directly to "try a fix" — which produces random mutations until something accidentally works, with no understanding of why.
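
The loop can be sketched as a driver where each hypothesis carries its own experiment and the loop stops only on confirming evidence. All names here are illustrative:

```python
# The scientific debugging loop as a driver. Hypotheses are ranked
# most-likely first; an experiment returns True only if it confirms.
def debug_loop(hypotheses):
    """hypotheses: list of (name, experiment); experiment() -> bool."""
    evidence_log = []
    for name, experiment in hypotheses:
        confirmed = experiment()              # execute + collect evidence
        evidence_log.append((name, confirmed))
        if confirmed:
            return name, evidence_log         # conclude: root cause candidate
    return None, evidence_log                 # all refuted → form new hypotheses

# Toy run mirroring Example 1 later in this file:
root, log = debug_loop([
    ("timing race", lambda: False),           # refuted by experiment
    ("lost update (no row lock)", lambda: True),
])
print(root)  # → lost update (no row lock)
```

The point of the structure is the log: every refuted hypothesis is recorded, so "try a fix" can never silently replace "collect evidence."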

### Step 1: Observation (Gather All Evidence First)

Before forming any hypothesis, collect:

**Required evidence:**
- [ ] Exact error message (full stack trace, not a summary)
- [ ] Steps to reproduce (minimal reproducible case)
- [ ] What changed recently (git log since last known good)
- [ ] Environment (OS, language version, dependency versions)
- [ ] Frequency (always, intermittent, under specific conditions)

**Observation template:**
```markdown
## Bug Observation

**Error:** [Paste exact error/stack trace]
**Reproduces:** [Always / Intermittent (N/M times) / Only when X]
**Environment:** [OS, runtime version, key dependency versions]
**Last known good:** [commit hash or date when this worked]
**Recent changes:** [output of: git log --oneline --since="3 days ago"]
**Minimal repro:**
[Smallest possible code that triggers the error]
```

**The minimal repro is not optional.** Debugging without a minimal repro is debugging the wrong problem. Strip everything away until you have the smallest code that still fails.
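
The stripping itself can be mechanized. A greedy sketch (a simplified cousin of delta debugging; the repro lines and the failure predicate are made up for illustration):

```python
# Greedy repro minimizer: keep deleting lines while the failure still
# reproduces. `fails` is your repro check (run the test, grep the error...).
def minimize(lines, fails):
    assert fails(lines), "start from a reproducing case"
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if candidate and fails(candidate):
                lines, changed = candidate, True
                break
    return lines

# Toy failure: the bug triggers whenever the pool.get() line is present.
repro = ["a = setup()", "b = fetch()", "c = pool.get()", "d = teardown()"]
print(minimize(repro, lambda ls: "c = pool.get()" in ls))  # → ['c = pool.get()']
```

Greedy line removal is quadratic in the worst case, but for typical repro sizes it turns "strip everything" from a discipline into a one-liner you can rerun after every change.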

### Step 2: Hypothesis Formation

From the observation, generate 2-4 hypotheses. Rules:

- Each hypothesis must be **specific** (it names a cause, not a category)
- Each hypothesis must be **falsifiable** (an experiment can prove it wrong)
- Hypotheses must be **ranked by probability** (investigate the most likely first)

**Bad hypothesis:** "There might be an issue with the database"
**Good hypothesis:** "The connection pool is exhausted because we're not releasing connections in the error path of `process_payment()`"

**Hypothesis template:**
```markdown
## Hypothesis N: [Specific cause]

**Mechanism:** [How this cause produces the observed symptom]
**Probability:** [High/Medium/Low] because [reason]
**Experiment:** [Specific test that proves or disproves this hypothesis]
**Expected evidence if TRUE:** [What you'd see if this is the cause]
**Expected evidence if FALSE:** [What you'd see if this is not the cause]
```

### Step 3: Experiments (Investigate, Don't Fix)

**Critical rule:** Run experiments to gather evidence, not to fix the bug. The fix comes after you understand the cause.

**Experiment types:**

**Isolation:** Narrow the failure scope
```bash
# Does it fail with a fresh database?
docker run --rm -e POSTGRES_DB=test postgres:15
python -m pytest tests/test_payment.py --db-url postgresql://localhost/test

# Does it fail with a specific user only?
python -m pytest tests/test_payment.py -k "user_123"
```

**Binary search:** Git bisect for regressions
```bash
git bisect start
git bisect bad HEAD
git bisect good v2.3.1   # last known good
git bisect run python -m pytest tests/test_payment.py -x
# Git finds the exact commit that introduced the bug
```

**Logging:** Add targeted logging at the hypothesis boundary
```python
# Don't add logging everywhere — add it exactly where the hypothesis predicts the failure
import logging

logger = logging.getLogger(__name__)

def process_payment(payment_id: str):
    conn = get_db_connection()
    logger.debug(f"process_payment: got connection {id(conn)}, pool size: {pool.size()}")
    try:
        # ... payment logic
        return result
    except Exception as e:
        logger.error(f"process_payment FAILED: {e}, connection {id(conn)} not released")
        # BUG: connection not released here → pool exhaustion
        raise  # Fix: conn.close() before raise
```

**State inspection:** Check system state at the failure point
```bash
# Check connection pool state before/during/after
psql -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

# Check event queue depth
redis-cli LLEN payment_queue

# Check file descriptor usage
lsof -p $(pgrep -f payment_service) | wc -l
```
+ ```
151
+
152
+ ### Step 4: Evidence Collection
153
+
154
+ After each experiment, record what you found:
155
+
156
+ ```markdown
157
+ ## Evidence: Hypothesis N Test
158
+
159
+ **Experiment run:** [command or action taken]
160
+ **Result:** [what actually happened]
161
+ **Conclusion:** [does this support or refute the hypothesis?]
162
+ **Next step:** [if supported: deeper investigation | if refuted: next hypothesis]
163
+ ```
164
+
165
+ **Never interpret evidence to fit the hypothesis.** If the experiment contradicts the hypothesis, the hypothesis is wrong. Form a new one.
166
+
167
+ ### Step 5: Root Cause Identification
168
+
169
+ When an experiment strongly confirms a hypothesis:
170
+
171
+ 1. State the root cause precisely: "The root cause is [mechanism], which occurs because [condition], resulting in [symptom]"
172
+ 2. Trace back: Is this the root cause or a symptom of a deeper cause? Ask "why" 3-5 times.
173
+ 3. Identify the fix that addresses the root cause, not just the symptom.
174
+
175
+ **Root cause template:**
176
+ ```markdown
177
+ ## Root Cause
178
+
179
+ **Statement:** [precise description of the cause]
180
+ **Why it happens:** [condition that triggers it]
181
+ **Why it wasn't caught:** [test gap, code review miss, etc.]
182
+
183
+ **Fix:** [specific code change that addresses the root cause]
184
+ **Regression test:** [test that would have caught this]
185
+ **Prevention:** [process change to prevent this class of bug]
186
+ ```
187
+
188
+ ### Step 6: Fix and Verify
189
+
190
+ 1. Apply the minimal fix (don't refactor while fixing — that's scope creep)
191
+ 2. Verify the original reproduction case no longer fails
192
+ 3. Verify the fix doesn't break other tests
193
+ 4. Write the regression test
194
+ 5. Commit fix and test together
195
+
196
+ ## ClawPowers Enhancement
197
+
198
+ When `~/.clawpowers/` runtime is initialized:
199
+
200
+ **Persistent Hypothesis Tree:**
201
+
202
+ The full investigation is saved and never lost between sessions:
203
+
204
+ ```bash
205
+ # Save investigation state
206
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:observation" "ConnectionPool timeout after 50 requests"
207
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:hypothesis1" "Connection not released in error path"
208
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:h1_result" "CONFIRMED: no conn.close() in except block"
209
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:root_cause" "Missing conn.close() in process_payment error path"
210
+ bash runtime/persistence/store.sh set "debug:payment-pool-exhaustion:fix_commit" "a3f9b2c"
211
+ ```
212
+
213
+ If debugging spans multiple sessions, resume with:
214
+ ```bash
215
+ bash runtime/persistence/store.sh list "debug:payment-pool-exhaustion:*"
216
+ ```
217
+
218
+ **Known-Issue Pattern Matching:**
219
+
220
+ Past root causes are searchable. Before forming hypotheses:
221
+ ```bash
222
+ bash runtime/persistence/store.sh list "debug:*:root_cause" | grep -i "connection"
223
+ # → Found 2 prior connection-related bugs
224
+ # → Shows fixes applied, saving re-investigation time
225
+ ```
226
+
227
+ **Debugging Metrics:**
228
+
229
+ ```bash
230
+ bash runtime/metrics/collector.sh record \
231
+ --skill systematic-debugging \
232
+ --outcome success \
233
+ --duration 1800 \
234
+ --notes "payment-pool: 3 hypotheses, 1 correct, git bisect narrowed to 1 commit"
235
+ ```
236
+
237
+ Tracks: time-to-root-cause, hypothesis accuracy rate, which experiment types are most effective.
238
+
239
+ ## Anti-Patterns
240
+
241
+ | Anti-Pattern | Why It Fails | Correct Approach |
242
+ |-------------|-------------|-----------------|
243
+ | "Try-and-see" debugging | Random mutations, no understanding | Form hypothesis before changing code |
244
+ | Fixing without reproducing | Can't verify the fix worked | Minimal repro first, always |
245
+ | Investigating without isolation | Debugging the wrong level | Binary search / isolate the scope first |
246
+ | Multiple changes at once | Can't attribute which change fixed it | One change per experiment |
247
+ | Interpreting evidence to fit hypothesis | Confirmation bias, wrong fix | Evidence disproves or confirms; update hypothesis |
248
+ | Debugging by adding logs everywhere | Signal-to-noise ratio collapses | Targeted logging at hypothesis boundary only |
249
+ | Not writing regression test | Same bug recurs | Regression test is non-optional |
250
+ | Fixing symptoms, not root cause | Bug returns in a different form | Ask "why" 3-5 times to reach root cause |
251
+
252
+ ## Examples
253
+
254
+ ### Example 1: Intermittent Test Failure
255
+
256
+ **Observation:** `test_concurrent_writes` fails 20% of the time with `AssertionError: expected 100 rows, got 97-99`
257
+
258
+ **Hypothesis 1:** Race condition — concurrent writes arrive after the assertion reads
259
+ - Experiment: Add sleep(0.1) before assertion
260
+ - Result: Still fails
261
+ - Conclusion: Not a timing issue
262
+
263
+ **Hypothesis 2:** Lost update — concurrent transactions overwrite each other
264
+ - Experiment: Add row-level locking to write path
265
+ - Result: 0 failures in 100 runs
266
+ - Conclusion: CONFIRMED — missing `SELECT FOR UPDATE` in the read-modify-write cycle
267
+
268
+ **Root cause:** `update_counter()` reads then writes without a lock — concurrent execution loses updates.
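
The lost-update mechanism can be shown deterministically without a database. A sketch of the interleaving (not the actual test code; the function names are illustrative):

```python
# Deterministic illustration of Example 1's lost update: two transactions
# interleave read-modify-write on a counter without a lock.
def interleaved_read_modify_write(start):
    counter = start
    t1_read = counter          # T1 reads
    t2_read = counter          # T2 reads the same stale value
    counter = t1_read + 1      # T1 writes
    counter = t2_read + 1      # T2 overwrites T1's write — lost update
    return counter

def locked_read_modify_write(start):
    counter = start
    counter = counter + 1      # T1 holds the row lock for its whole cycle
    counter = counter + 1      # T2 runs only after T1 commits
    return counter

print(interleaved_read_modify_write(0))  # → 1 (one update lost)
print(locked_read_modify_write(0))       # → 2 (locking serializes the cycle)
```

This is exactly what `SELECT FOR UPDATE` buys you: the second transaction's read is forced to wait until the first one's write commits.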

### Example 2: Production Alert

**Observation:** Memory usage grows 50MB/hour until an OOM restart

**Hypothesis 1:** Memory leak — objects not garbage collected
- Experiment: `objgraph.most_common_types()` before and after request batches
- Result: `WeakValueDictionary` count grows monotonically
- Conclusion: CONFIRMED — the cache holds strong refs despite `WeakValue` (values are themselves containers)

**Root cause:** The cache stores lists as values; lists are containers that keep strong references to their contents and prevent them from being garbage collected.