bps-kit 1.0.6 → 1.0.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. package/LICENSE +21 -0
  2. package/README.md +22 -7
  3. package/bin/cli.js +27 -16
  4. package/package.json +2 -2
  5. package/templates/VAULT_INDEX.md +2 -2
  6. package/templates/agents-template/rules/GEMINI.md +4 -4
  7. package/templates/agents-template/scripts/convert_to_vscode.js +87 -0
  8. package/templates/skills_basic/behavioral-modes/SKILL.md +247 -0
  9. package/templates/skills_basic/brainstorming/SKILL.md +232 -0
  10. package/templates/skills_basic/clean-code/SKILL.md +94 -0
  11. package/templates/skills_basic/concise-planning/SKILL.md +68 -0
  12. package/templates/skills_basic/executing-plans/SKILL.md +82 -0
  13. package/templates/skills_basic/git-pushing/SKILL.md +36 -0
  14. package/templates/skills_basic/git-pushing/scripts/smart_commit.sh +19 -0
  15. package/templates/skills_basic/lint-and-validate/SKILL.md +50 -0
  16. package/templates/skills_basic/lint-and-validate/scripts/lint_runner.py +172 -0
  17. package/templates/skills_basic/lint-and-validate/scripts/type_coverage.py +173 -0
  18. package/templates/skills_basic/plan-writing/SKILL.md +154 -0
  19. package/templates/skills_basic/systematic-debugging/CREATION-LOG.md +119 -0
  20. package/templates/skills_basic/systematic-debugging/SKILL.md +299 -0
  21. package/templates/skills_basic/systematic-debugging/condition-based-waiting-example.ts +158 -0
  22. package/templates/skills_basic/systematic-debugging/condition-based-waiting.md +115 -0
  23. package/templates/skills_basic/systematic-debugging/defense-in-depth.md +122 -0
  24. package/templates/skills_basic/systematic-debugging/find-polluter.sh +63 -0
  25. package/templates/skills_basic/systematic-debugging/root-cause-tracing.md +169 -0
  26. package/templates/skills_basic/systematic-debugging/test-academic.md +14 -0
  27. package/templates/skills_basic/systematic-debugging/test-pressure-1.md +58 -0
  28. package/templates/skills_basic/systematic-debugging/test-pressure-2.md +68 -0
  29. package/templates/skills_basic/systematic-debugging/test-pressure-3.md +69 -0
  30. package/templates/skills_basic/verification-before-completion/SKILL.md +145 -0
  31. package/templates/skills_basic/vulnerability-scanner/SKILL.md +281 -0
  32. package/templates/skills_basic/vulnerability-scanner/checklists.md +121 -0
  33. package/templates/skills_basic/vulnerability-scanner/scripts/security_scan.py +458 -0
@@ -0,0 +1,299 @@
---
name: systematic-debugging
description: "Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes"
risk: unknown
source: community
date_added: "2026-02-27"
---

# Systematic Debugging

## Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

**Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

**Violating the letter of this process is violating the spirit of debugging.**

## The Iron Law

```
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
```

If you haven't completed Phase 1, you cannot propose fixes.

## When to Use

Use for ANY technical issue:
- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues

**Use this ESPECIALLY when:**
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
- You don't fully understand the issue

**Don't skip when:**
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Manager wants it fixed NOW (systematic is faster than thrashing)

## The Four Phases

You MUST complete each phase before proceeding to the next.

### Phase 1: Root Cause Investigation

**BEFORE attempting ANY fix:**

1. **Read Error Messages Carefully**
   - Don't skip past errors or warnings
   - They often contain the exact solution
   - Read stack traces completely
   - Note line numbers, file paths, error codes

2. **Reproduce Consistently**
   - Can you trigger it reliably?
   - What are the exact steps?
   - Does it happen every time?
   - If not reproducible → gather more data, don't guess

3. **Check Recent Changes**
   - What changed that could cause this?
   - Git diff, recent commits
   - New dependencies, config changes
   - Environmental differences

4. **Gather Evidence in Multi-Component Systems**

   **WHEN system has multiple components (CI → build → signing, API → service → database):**

   **BEFORE proposing fixes, add diagnostic instrumentation:**
   ```
   For EACH component boundary:
   - Log what data enters component
   - Log what data exits component
   - Verify environment/config propagation
   - Check state at each layer

   Run once to gather evidence showing WHERE it breaks
   THEN analyze evidence to identify failing component
   THEN investigate that specific component
   ```

   **Example (multi-layer system):**
   ```bash
   # Layer 1: Workflow
   echo "=== Secrets available in workflow: ==="
   [ -n "${IDENTITY:-}" ] && echo "IDENTITY: SET" || echo "IDENTITY: UNSET"

   # Layer 2: Build script
   echo "=== Env vars in build script: ==="
   env | grep -q '^IDENTITY=' && echo "IDENTITY present in environment" || echo "IDENTITY not in environment"

   # Layer 3: Signing script
   echo "=== Keychain state: ==="
   security list-keychains
   security find-identity -v

   # Layer 4: Actual signing
   codesign --sign "$IDENTITY" --verbose=4 "$APP"
   ```

   **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)

5. **Trace Data Flow**

   **WHEN error is deep in call stack:**

   See `root-cause-tracing.md` in this directory for the complete backward tracing technique.

   **Quick version:**
   - Where does bad value originate?
   - What called this with bad value?
   - Keep tracing up until you find the source
   - Fix at source, not at symptom

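The quick version above can be sketched in a few lines. This is a hypothetical example — `renderPrice`, `parseApiItem`, and the data shapes are invented for illustration:

```typescript
// Hypothetical sketch: a crash deep in rendering traces back to an
// unvalidated API response at the boundary.

interface Item { name: string; priceCents: number }

// Symptom: this is where the error surfaces (TypeError if priceCents is missing).
function renderPrice(item: Item): string {
  return `$${(item.priceCents / 100).toFixed(2)}`;
}

// Tracing up: who called renderPrice with a bad item? The cart builder.
// Tracing up again: the cart builder got the item straight from the parser.
// Source: the parser never checked that priceCents exists.

// Fix at the source, not the symptom:
function parseApiItem(raw: unknown): Item {
  const obj = raw as Record<string, unknown>;
  if (typeof obj.name !== 'string' || typeof obj.priceCents !== 'number') {
    throw new Error(`Invalid item from API: ${JSON.stringify(raw)}`);
  }
  return { name: obj.name, priceCents: obj.priceCents };
}
```

A guard inside `renderPrice` would only hide the bad data; the boundary check makes every downstream caller safe.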
### Phase 2: Pattern Analysis

**Find the pattern before fixing:**

1. **Find Working Examples**
   - Locate similar working code in same codebase
   - What works that's similar to what's broken?

2. **Compare Against References**
   - If implementing pattern, read reference implementation COMPLETELY
   - Don't skim - read every line
   - Understand the pattern fully before applying

3. **Identify Differences**
   - What's different between working and broken?
   - List every difference, however small
   - Don't assume "that can't matter"

4. **Understand Dependencies**
   - What other components does this need?
   - What settings, config, environment?
   - What assumptions does it make?

### Phase 3: Hypothesis and Testing

**Scientific method:**

1. **Form Single Hypothesis**
   - State clearly: "I think X is the root cause because Y"
   - Write it down
   - Be specific, not vague

2. **Test Minimally**
   - Make the SMALLEST possible change to test hypothesis
   - One variable at a time
   - Don't fix multiple things at once

3. **Verify Before Continuing**
   - Did it work? Yes → Phase 4
   - Didn't work? Form NEW hypothesis
   - DON'T add more fixes on top

4. **When You Don't Know**
   - Say "I don't understand X"
   - Don't pretend to know
   - Ask for help
   - Research more

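The "write it down" step can be as lightweight as a structured log entry. A hypothetical sketch — the field names are invented for illustration:

```typescript
// Hypothetical sketch: record each hypothesis so failed attempts become
// evidence instead of noise.

interface HypothesisEntry {
  hypothesis: string;   // "I think X is the root cause because Y"
  minimalTest: string;  // the single smallest change that tests it
  result: 'confirmed' | 'rejected' | 'pending';
}

const log: HypothesisEntry[] = [];

log.push({
  hypothesis: 'Config is not reaching the build step because the env file is ignored',
  minimalTest: 'Print the env var at the top of the build script',
  result: 'pending',
});

// After running the minimal test, update the SAME entry - do not pile on fixes.
log[0].result = 'rejected'; // variable was present; form a new hypothesis
```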
### Phase 4: Implementation

**Fix the root cause, not the symptom:**

1. **Create Failing Test Case**
   - Simplest possible reproduction
   - Automated test if possible
   - One-off test script if no framework
   - MUST have before fixing
   - Use the `superpowers:test-driven-development` skill for writing proper failing tests

2. **Implement Single Fix**
   - Address the root cause identified
   - ONE change at a time
   - No "while I'm here" improvements
   - No bundled refactoring

3. **Verify Fix**
   - Test passes now?
   - No other tests broken?
   - Issue actually resolved?

4. **If Fix Doesn't Work**
   - STOP
   - Count: How many fixes have you tried?
   - If < 3: Return to Phase 1, re-analyze with new information
   - **If ≥ 3: STOP and question the architecture (step 5 below)**
   - DON'T attempt Fix #4 without architectural discussion

5. **If 3+ Fixes Failed: Question Architecture**

   **Pattern indicating architectural problem:**
   - Each fix reveals new shared state/coupling/problem in different place
   - Fixes require "massive refactoring" to implement
   - Each fix creates new symptoms elsewhere

   **STOP and question fundamentals:**
   - Is this pattern fundamentally sound?
   - Are we "sticking with it through sheer inertia"?
   - Should we refactor architecture vs. continue fixing symptoms?

   **Discuss with your human partner before attempting more fixes**

   This is NOT a failed hypothesis - this is a wrong architecture.

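The "one-off test script if no framework" from Phase 4, Step 1 can be this small. `slugify` and its expected behavior are hypothetical stand-ins for a real bug:

```typescript
// One-off reproduction script for a hypothetical bug (no framework needed):
// slugify() mangles non-ASCII characters instead of keeping them.

function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

// Simplest possible reproduction: one input, one expected output.
const input = 'Crème Brûlée Recipe';
const expected = 'crème-brûlée-recipe'; // what we want after the fix
const actual = slugify(input);

const reproduced = actual !== expected;
console.log(reproduced
  ? `REPRODUCED: slugify(${JSON.stringify(input)}) → ${JSON.stringify(actual)}`
  : 'NOT REPRODUCED: bug may already be fixed');
```

Run it before the fix to confirm it reproduces, and again after to confirm it no longer does.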
## Red Flags - STOP and Follow Process

If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Pattern says X but I'll adapt it differently"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- **"One more fix attempt" (when already tried 2+)**
- **Each fix reveals new problem in different place**

**ALL of these mean: STOP. Return to Phase 1.**

**If 3+ fixes failed:** Question the architecture (see Phase 4, Step 5)

## Your Human Partner's Signals You're Doing It Wrong

**Watch for these redirections:**
- "Is that not happening?" - You assumed without verifying
- "Will it show us...?" - You should have added evidence gathering
- "Stop guessing" - You're proposing fixes without understanding
- "Ultrathink this" - Question fundamentals, not just symptoms
- "We're stuck?" (frustrated) - Your approach isn't working

**When you see these:** STOP. Return to Phase 1.

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |

## Quick Reference

| Phase | Key Activities | Success Criteria |
|-------|---------------|------------------|
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare | Identify differences |
| **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
| **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |

## When Process Reveals "No Root Cause"

If systematic investigation reveals issue is truly environmental, timing-dependent, or external:

1. You've completed the process
2. Document what you investigated
3. Implement appropriate handling (retry, timeout, error message)
4. Add monitoring/logging for future investigation

**But:** 95% of "no root cause" cases are incomplete investigation.

## Supporting Techniques

These techniques are part of systematic debugging and available in this directory:

- **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger
- **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause
- **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling

**Related skills:**
- **superpowers:test-driven-development** - For creating failing test case (Phase 4, Step 1)
- **superpowers:verification-before-completion** - Verify fix worked before claiming success

## Real-World Impact

From debugging sessions:
- Systematic approach: 15-30 minutes to fix
- Random fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common
@@ -0,0 +1,158 @@
// Complete implementation of condition-based waiting utilities
// From: Lace test infrastructure improvements (2025-10-03)
// Context: Fixed 15 flaky tests by replacing arbitrary timeouts

import type { ThreadManager } from '~/threads/thread-manager';
import type { LaceEvent, LaceEventType } from '~/threads/types';

/**
 * Wait for a specific event type to appear in thread
 *
 * @param threadManager - The thread manager to query
 * @param threadId - Thread to check for events
 * @param eventType - Type of event to wait for
 * @param timeoutMs - Maximum time to wait (default 5000ms)
 * @returns Promise resolving to the first matching event
 *
 * Example:
 *   await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
 */
export function waitForEvent(
  threadManager: ThreadManager,
  threadId: string,
  eventType: LaceEventType,
  timeoutMs = 5000
): Promise<LaceEvent> {
  return new Promise((resolve, reject) => {
    const startTime = Date.now();

    const check = () => {
      const events = threadManager.getEvents(threadId);
      const event = events.find((e) => e.type === eventType);

      if (event) {
        resolve(event);
      } else if (Date.now() - startTime > timeoutMs) {
        reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
      } else {
        setTimeout(check, 10); // Poll every 10ms for efficiency
      }
    };

    check();
  });
}

/**
 * Wait for a specific number of events of a given type
 *
 * @param threadManager - The thread manager to query
 * @param threadId - Thread to check for events
 * @param eventType - Type of event to wait for
 * @param count - Number of events to wait for
 * @param timeoutMs - Maximum time to wait (default 5000ms)
 * @returns Promise resolving to all matching events once count is reached
 *
 * Example:
 *   // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
 *   await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
 */
export function waitForEventCount(
  threadManager: ThreadManager,
  threadId: string,
  eventType: LaceEventType,
  count: number,
  timeoutMs = 5000
): Promise<LaceEvent[]> {
  return new Promise((resolve, reject) => {
    const startTime = Date.now();

    const check = () => {
      const events = threadManager.getEvents(threadId);
      const matchingEvents = events.filter((e) => e.type === eventType);

      if (matchingEvents.length >= count) {
        resolve(matchingEvents);
      } else if (Date.now() - startTime > timeoutMs) {
        reject(
          new Error(
            `Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`
          )
        );
      } else {
        setTimeout(check, 10);
      }
    };

    check();
  });
}

/**
 * Wait for an event matching a custom predicate
 * Useful when you need to check event data, not just type
 *
 * @param threadManager - The thread manager to query
 * @param threadId - Thread to check for events
 * @param predicate - Function that returns true when event matches
 * @param description - Human-readable description for error messages
 * @param timeoutMs - Maximum time to wait (default 5000ms)
 * @returns Promise resolving to the first matching event
 *
 * Example:
 *   // Wait for TOOL_RESULT with specific ID
 *   await waitForEventMatch(
 *     threadManager,
 *     agentThreadId,
 *     (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
 *     'TOOL_RESULT with id=call_123'
 *   );
 */
export function waitForEventMatch(
  threadManager: ThreadManager,
  threadId: string,
  predicate: (event: LaceEvent) => boolean,
  description: string,
  timeoutMs = 5000
): Promise<LaceEvent> {
  return new Promise((resolve, reject) => {
    const startTime = Date.now();

    const check = () => {
      const events = threadManager.getEvents(threadId);
      const event = events.find(predicate);

      if (event) {
        resolve(event);
      } else if (Date.now() - startTime > timeoutMs) {
        reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
      } else {
        setTimeout(check, 10);
      }
    };

    check();
  });
}

// Usage example from actual debugging session:
//
// BEFORE (flaky):
// ---------------
// const messagePromise = agent.sendMessage('Execute tools');
// await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
// agent.abort();
// await messagePromise;
// await new Promise(r => setTimeout(r, 50)); // Hope results arrive in 50ms
// expect(toolResults.length).toBe(2); // Fails randomly
//
// AFTER (reliable):
// ----------------
// const messagePromise = agent.sendMessage('Execute tools');
// await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
// agent.abort();
// await messagePromise;
// await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
// expect(toolResults.length).toBe(2); // Always succeeds
//
// Result: 60% pass rate → 100%, 40% faster execution
@@ -0,0 +1,115 @@
# Condition-Based Waiting

## Overview

Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.

**Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.

## When to Use

```dot
digraph when_to_use {
  "Test uses setTimeout/sleep?" [shape=diamond];
  "Testing timing behavior?" [shape=diamond];
  "Document WHY timeout needed" [shape=box];
  "Use condition-based waiting" [shape=box];

  "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
  "Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
  "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
}
```

**Use when:**
- Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
- Tests are flaky (pass sometimes, fail under load)
- Tests timeout when run in parallel
- Waiting for async operations to complete

**Don't use when:**
- Testing actual timing behavior (debounce, throttle intervals)

If you must use an arbitrary timeout, always document WHY.

## Core Pattern

```typescript
// ❌ BEFORE: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();
expect(result).toBeDefined();

// ✅ AFTER: Waiting for condition
await waitFor(() => getResult() !== undefined);
const result = getResult();
expect(result).toBeDefined();
```

## Quick Patterns

| Scenario | Pattern |
|----------|---------|
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
| Wait for state | `waitFor(() => machine.state === 'ready')` |
| Wait for count | `waitFor(() => items.length >= 5)` |
| Wait for file | `waitFor(() => fs.existsSync(path))` |
| Complex condition | `waitFor(() => obj.ready && obj.value > 10)` |

## Implementation

Generic polling function:
```typescript
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description = 'condition',
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();

  while (true) {
    const result = condition();
    if (result) return result;

    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
```

See `condition-based-waiting-example.ts` in this directory for the complete implementation, with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) from an actual debugging session.

## Common Mistakes

**❌ Polling too fast:** `setTimeout(check, 1)` - wastes CPU
**✅ Fix:** Poll every 10ms

**❌ No timeout:** Loop forever if condition never met
**✅ Fix:** Always include timeout with clear error

**❌ Stale data:** Cache state before loop
**✅ Fix:** Call getter inside loop for fresh data

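The stale-data fix can be seen in a runnable sketch. The `store` object is a hypothetical stand-in for test state, and the snippet assumes an ES-module context where top-level `await` is available:

```typescript
// Sketch: read state fresh on every poll; never cache it before the loop.

const store: { status: string } = { status: 'pending' };

async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition(); // fresh read on every iteration
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise((r) => setTimeout(r, 10));
  }
}

// Something flips the state later, as a background operation would.
setTimeout(() => { store.status = 'ready'; }, 50);

// ❌ const cached = store.status; waitFor(() => cached === 'ready') never resolves.
// ✅ Read through the object inside the condition:
const status = await waitFor(() => store.status === 'ready' && store.status, 'status=ready');
```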
## When Arbitrary Timeout IS Correct

```typescript
// Tool ticks every 100ms - need 2 ticks to verify partial output
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200)); // Then: wait for timed behavior
// 200ms = 2 ticks at 100ms intervals - documented and justified
```

**Requirements:**
1. First wait for triggering condition
2. Based on known timing (not guessing)
3. Comment explaining WHY

## Real-World Impact

From debugging session (2025-10-03):
- Fixed 15 flaky tests across 3 files
- Pass rate: 60% → 100%
- Execution time: 40% faster
- No more race conditions
@@ -0,0 +1,122 @@
# Defense-in-Depth Validation

## Overview

When you fix a bug caused by invalid data, adding validation at one place feels sufficient. But that single check can be bypassed by different code paths, refactoring, or mocks.

**Core principle:** Validate at EVERY layer data passes through. Make the bug structurally impossible.

## Why Multiple Layers

Single validation: "We fixed the bug"
Multiple layers: "We made the bug impossible"

Different layers catch different cases:
- Entry validation catches most bugs
- Business logic catches edge cases
- Environment guards prevent context-specific dangers
- Debug logging helps when other layers fail

## The Four Layers

### Layer 1: Entry Point Validation
**Purpose:** Reject obviously invalid input at API boundary

```typescript
function createProject(name: string, workingDirectory: string) {
  if (!workingDirectory || workingDirectory.trim() === '') {
    throw new Error('workingDirectory cannot be empty');
  }
  if (!existsSync(workingDirectory)) {
    throw new Error(`workingDirectory does not exist: ${workingDirectory}`);
  }
  if (!statSync(workingDirectory).isDirectory()) {
    throw new Error(`workingDirectory is not a directory: ${workingDirectory}`);
  }
  // ... proceed
}
```

### Layer 2: Business Logic Validation
**Purpose:** Ensure data makes sense for this operation

```typescript
function initializeWorkspace(projectDir: string, sessionId: string) {
  if (!projectDir) {
    throw new Error('projectDir required for workspace initialization');
  }
  // ... proceed
}
```

### Layer 3: Environment Guards
**Purpose:** Prevent dangerous operations in specific contexts

```typescript
async function gitInit(directory: string) {
  // In tests, refuse git init outside temp directories
  if (process.env.NODE_ENV === 'test') {
    const normalized = normalize(resolve(directory));
    const tmpDir = normalize(resolve(tmpdir()));

    if (!normalized.startsWith(tmpDir)) {
      throw new Error(
        `Refusing git init outside temp dir during tests: ${directory}`
      );
    }
  }
  // ... proceed
}
```

### Layer 4: Debug Instrumentation
**Purpose:** Capture context for forensics

```typescript
async function gitInit(directory: string) {
  const stack = new Error().stack;
  logger.debug('About to git init', {
    directory,
    cwd: process.cwd(),
    stack,
  });
  // ... proceed
}
```

## Applying the Pattern

When you find a bug:

1. **Trace the data flow** - Where does bad value originate? Where used?
2. **Map all checkpoints** - List every point data passes through
3. **Add validation at each layer** - Entry, business, environment, debug
4. **Test each layer** - Try to bypass layer 1, verify layer 2 catches it

+ ## Example from Session
97
+
98
+ Bug: Empty `projectDir` caused `git init` in source code
99
+
100
+ **Data flow:**
101
+ 1. Test setup → empty string
102
+ 2. `Project.create(name, '')`
103
+ 3. `WorkspaceManager.createWorkspace('')`
104
+ 4. `git init` runs in `process.cwd()`
105
+
106
+ **Four layers added:**
107
+ - Layer 1: `Project.create()` validates not empty/exists/writable
108
+ - Layer 2: `WorkspaceManager` validates projectDir not empty
109
+ - Layer 3: `WorktreeManager` refuses git init outside tmpdir in tests
110
+ - Layer 4: Stack trace logging before git init
111
+
112
+ **Result:** All 1847 tests passed, bug impossible to reproduce
113
+
114
+ ## Key Insight
115
+
116
+ All four layers were necessary. During testing, each layer caught bugs the others missed:
117
+ - Different code paths bypassed entry validation
118
+ - Mocks bypassed business logic checks
119
+ - Edge cases on different platforms needed environment guards
120
+ - Debug logging identified structural misuse
121
+
122
+ **Don't stop at one validation point.** Add checks at every layer.