@tianhai/pi-workflow-kit 0.4.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +22 -0
- package/README.md +509 -0
- package/ROADMAP.md +16 -0
- package/agents/code-reviewer.md +18 -0
- package/agents/config.ts +5 -0
- package/agents/implementer.md +26 -0
- package/agents/spec-reviewer.md +13 -0
- package/agents/worker.md +17 -0
- package/banner.jpg +0 -0
- package/docs/developer-usage-guide.md +463 -0
- package/docs/oversight-model.md +49 -0
- package/docs/workflow-phases.md +71 -0
- package/extensions/constants.ts +9 -0
- package/extensions/lib/logging.ts +138 -0
- package/extensions/plan-tracker.ts +496 -0
- package/extensions/subagent/agents.ts +144 -0
- package/extensions/subagent/concurrency.ts +52 -0
- package/extensions/subagent/env.ts +47 -0
- package/extensions/subagent/index.ts +1116 -0
- package/extensions/subagent/lifecycle.ts +25 -0
- package/extensions/subagent/timeout.ts +13 -0
- package/extensions/workflow-monitor/debug-monitor.ts +98 -0
- package/extensions/workflow-monitor/git.ts +31 -0
- package/extensions/workflow-monitor/heuristics.ts +58 -0
- package/extensions/workflow-monitor/investigation.ts +52 -0
- package/extensions/workflow-monitor/reference-tool.ts +42 -0
- package/extensions/workflow-monitor/skip-confirmation.ts +19 -0
- package/extensions/workflow-monitor/tdd-monitor.ts +137 -0
- package/extensions/workflow-monitor/test-runner.ts +37 -0
- package/extensions/workflow-monitor/verification-monitor.ts +61 -0
- package/extensions/workflow-monitor/warnings.ts +81 -0
- package/extensions/workflow-monitor/workflow-handler.ts +358 -0
- package/extensions/workflow-monitor/workflow-tracker.ts +231 -0
- package/extensions/workflow-monitor/workflow-transitions.ts +55 -0
- package/extensions/workflow-monitor.ts +885 -0
- package/package.json +49 -0
- package/skills/brainstorming/SKILL.md +70 -0
- package/skills/dispatching-parallel-agents/SKILL.md +194 -0
- package/skills/executing-tasks/SKILL.md +247 -0
- package/skills/receiving-code-review/SKILL.md +196 -0
- package/skills/systematic-debugging/SKILL.md +170 -0
- package/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
- package/skills/systematic-debugging/condition-based-waiting.md +115 -0
- package/skills/systematic-debugging/defense-in-depth.md +122 -0
- package/skills/systematic-debugging/find-polluter.sh +63 -0
- package/skills/systematic-debugging/reference/rationalizations.md +61 -0
- package/skills/systematic-debugging/root-cause-tracing.md +169 -0
- package/skills/test-driven-development/SKILL.md +266 -0
- package/skills/test-driven-development/reference/examples.md +101 -0
- package/skills/test-driven-development/reference/rationalizations.md +67 -0
- package/skills/test-driven-development/reference/when-stuck.md +33 -0
- package/skills/test-driven-development/testing-anti-patterns.md +299 -0
- package/skills/using-git-worktrees/SKILL.md +231 -0
- package/skills/writing-plans/SKILL.md +149 -0
|
@@ -0,0 +1,196 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: receiving-code-review
|
|
3
|
+
description: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
> **Related skills:** Verify each fix with `/skill:executing-tasks`. Use `/skill:test-driven-development` for regression tests.
|
|
7
|
+
|
|
8
|
+
# Code Review Reception
|
|
9
|
+
|
|
10
|
+
## Overview
|
|
11
|
+
|
|
12
|
+
Code review requires technical evaluation, not emotional performance.
|
|
13
|
+
|
|
14
|
+
**Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort.
|
|
15
|
+
|
|
16
|
+
## The Response Pattern
|
|
17
|
+
|
|
18
|
+
```
|
|
19
|
+
WHEN receiving code review feedback:
|
|
20
|
+
|
|
21
|
+
1. READ: Complete feedback without reacting
|
|
22
|
+
2. UNDERSTAND: Restate requirement in own words (or ask)
|
|
23
|
+
3. VERIFY: Check against codebase reality
|
|
24
|
+
4. EVALUATE: Technically sound for THIS codebase?
|
|
25
|
+
5. RESPOND: Technical acknowledgment or reasoned pushback
|
|
26
|
+
6. IMPLEMENT: One item at a time, test each
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
## How to Respond
|
|
30
|
+
|
|
31
|
+
**Never performative:**
|
|
32
|
+
- ❌ "You're absolutely right!" / "Great point!" / "Thanks for catching that!"
|
|
33
|
+
- ❌ "Let me implement that now" (before verification)
|
|
34
|
+
|
|
35
|
+
**Instead:**
|
|
36
|
+
- ✅ "Fixed. [Brief description of what changed]"
|
|
37
|
+
- ✅ "Good catch - [specific issue]. Fixed in [location]."
|
|
38
|
+
- ✅ Just fix it silently — the code shows you heard the feedback
|
|
39
|
+
- ✅ Push back with technical reasoning if wrong
|
|
40
|
+
- ✅ Ask clarifying questions if unclear
|
|
41
|
+
|
|
42
|
+
## Handling Unclear Feedback
|
|
43
|
+
|
|
44
|
+
```
|
|
45
|
+
IF any item is unclear:
|
|
46
|
+
STOP - do not implement anything yet
|
|
47
|
+
ASK for clarification on unclear items
|
|
48
|
+
|
|
49
|
+
WHY: Items may be related. Partial understanding = wrong implementation.
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
**Example:**
|
|
53
|
+
```
|
|
54
|
+
your human partner: "Fix 1-6"
|
|
55
|
+
You understand 1,2,3,6. Unclear on 4,5.
|
|
56
|
+
|
|
57
|
+
❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later
|
|
58
|
+
✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding."
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
## Source-Specific Handling
|
|
62
|
+
|
|
63
|
+
### From your human partner
|
|
64
|
+
- **Trusted** - implement after understanding
|
|
65
|
+
- **Still ask** if scope unclear
|
|
66
|
+
- **No performative agreement**
|
|
67
|
+
- **Skip to action** or technical acknowledgment
|
|
68
|
+
|
|
69
|
+
### From External Reviewers
|
|
70
|
+
```
|
|
71
|
+
BEFORE implementing:
|
|
72
|
+
1. Check: Technically correct for THIS codebase?
|
|
73
|
+
2. Check: Breaks existing functionality?
|
|
74
|
+
3. Check: Reason for current implementation?
|
|
75
|
+
4. Check: Works on all platforms/versions?
|
|
76
|
+
5. Check: Does reviewer understand full context?
|
|
77
|
+
|
|
78
|
+
IF suggestion seems wrong:
|
|
79
|
+
Push back with technical reasoning
|
|
80
|
+
|
|
81
|
+
IF can't easily verify:
|
|
82
|
+
Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?"
|
|
83
|
+
|
|
84
|
+
IF conflicts with your human partner's prior decisions:
|
|
85
|
+
Stop and discuss with your human partner first
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
**your human partner's rule:** "External feedback - be skeptical, but check carefully"
|
|
89
|
+
|
|
90
|
+
## YAGNI Check for "Professional" Features
|
|
91
|
+
|
|
92
|
+
```
|
|
93
|
+
IF reviewer suggests "implementing properly":
|
|
94
|
+
grep codebase for actual usage
|
|
95
|
+
|
|
96
|
+
IF unused: "This endpoint isn't called. Remove it (YAGNI)?"
|
|
97
|
+
IF used: Then implement properly
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
**your human partner's rule:** "You and reviewer both report to me. If we don't need this feature, don't add it."
|
|
101
|
+
|
|
102
|
+
## Implementation Order
|
|
103
|
+
|
|
104
|
+
```
|
|
105
|
+
FOR multi-item feedback:
|
|
106
|
+
1. Clarify anything unclear FIRST
|
|
107
|
+
2. Then implement in this order:
|
|
108
|
+
- Blocking issues (breaks, security)
|
|
109
|
+
- Simple fixes (typos, imports)
|
|
110
|
+
- Complex fixes (refactoring, logic)
|
|
111
|
+
3. Test each fix individually
|
|
112
|
+
4. Verify no regressions
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
## When To Push Back
|
|
116
|
+
|
|
117
|
+
Push back when:
|
|
118
|
+
- Suggestion breaks existing functionality
|
|
119
|
+
- Reviewer lacks full context
|
|
120
|
+
- Violates YAGNI (unused feature)
|
|
121
|
+
- Technically incorrect for this stack
|
|
122
|
+
- Legacy/compatibility reasons exist
|
|
123
|
+
- Conflicts with your human partner's architectural decisions
|
|
124
|
+
|
|
125
|
+
**How to push back:**
|
|
126
|
+
- Use technical reasoning, not defensiveness
|
|
127
|
+
- Ask specific questions
|
|
128
|
+
- Reference working tests/code
|
|
129
|
+
- Involve your human partner if architectural
|
|
130
|
+
|
|
131
|
+
**Signal if uncomfortable pushing back out loud:** "Strange things are afoot at the Circle K"
|
|
132
|
+
|
|
133
|
+
## Gracefully Correcting Your Pushback
|
|
134
|
+
|
|
135
|
+
If you pushed back and were wrong:
|
|
136
|
+
```
|
|
137
|
+
✅ "You were right - I checked [X] and it does [Y]. Implementing now."
|
|
138
|
+
✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing."
|
|
139
|
+
|
|
140
|
+
❌ Long apology
|
|
141
|
+
❌ Defending why you pushed back
|
|
142
|
+
❌ Over-explaining
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
State the correction factually and move on.
|
|
146
|
+
|
|
147
|
+
## Common Mistakes
|
|
148
|
+
|
|
149
|
+
| Mistake | Fix |
|
|
150
|
+
|---------|-----|
|
|
151
|
+
| Performative agreement | State requirement or just act |
|
|
152
|
+
| Blind implementation | Verify against codebase first |
|
|
153
|
+
| Batch without testing | One at a time, test each |
|
|
154
|
+
| Assuming reviewer is right | Check if breaks things |
|
|
155
|
+
| Avoiding pushback | Technical correctness > comfort |
|
|
156
|
+
| Partial implementation | Clarify all items first |
|
|
157
|
+
| Can't verify, proceed anyway | State limitation, ask for direction |
|
|
158
|
+
|
|
159
|
+
## Real Examples
|
|
160
|
+
|
|
161
|
+
**Performative Agreement (Bad):**
|
|
162
|
+
```
|
|
163
|
+
Reviewer: "Remove legacy code"
|
|
164
|
+
❌ "You're absolutely right! Let me remove that..."
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
**Technical Verification (Good):**
|
|
168
|
+
```
|
|
169
|
+
Reviewer: "Remove legacy code"
|
|
170
|
+
✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?"
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
**YAGNI (Good):**
|
|
174
|
+
```
|
|
175
|
+
Reviewer: "Implement proper metrics tracking with database, date filters, CSV export"
|
|
176
|
+
✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?"
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
**Unclear Item (Good):**
|
|
180
|
+
```
|
|
181
|
+
your human partner: "Fix items 1-6"
|
|
182
|
+
You understand 1,2,3,6. Unclear on 4,5.
|
|
183
|
+
✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing."
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
## GitHub Thread Replies
|
|
187
|
+
|
|
188
|
+
When replying to inline review comments on GitHub, reply in the comment thread (`gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies`), not as a top-level PR comment.
|
|
189
|
+
|
|
190
|
+
## The Bottom Line
|
|
191
|
+
|
|
192
|
+
**External feedback = suggestions to evaluate, not orders to follow.**
|
|
193
|
+
|
|
194
|
+
Verify. Question. Then implement.
|
|
195
|
+
|
|
196
|
+
No performative agreement. Technical rigor always.
|
|
@@ -0,0 +1,170 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: systematic-debugging
|
|
3
|
+
description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
> **Related skills:** Write a failing test for the bug with `/skill:test-driven-development`. Verify the fix with `/skill:executing-tasks`.
|
|
7
|
+
|
|
8
|
+
# Systematic Debugging
|
|
9
|
+
|
|
10
|
+
## Overview
|
|
11
|
+
|
|
12
|
+
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
|
|
13
|
+
|
|
14
|
+
**Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
|
|
15
|
+
|
|
16
|
+
**Violating the letter of this process is violating the spirit of debugging.**
|
|
17
|
+
|
|
18
|
+
> The workflow-monitor extension tracks your debugging: it detects fix-without-investigation and counts failed fix attempts, surfacing warnings in tool results. Use `workflow_reference` with debug topics for additional guidance.
|
|
19
|
+
|
|
20
|
+
If a tool result contains a ⚠️ workflow warning, stop immediately and address it before continuing.
|
|
21
|
+
|
|
22
|
+
## The Iron Law
|
|
23
|
+
|
|
24
|
+
```
|
|
25
|
+
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
If you haven't completed Phase 1, you cannot propose fixes.
|
|
29
|
+
|
|
30
|
+
## When to Use
|
|
31
|
+
|
|
32
|
+
Use for ANY technical issue: test failures, bugs, unexpected behavior, performance problems, build failures, integration issues.
|
|
33
|
+
|
|
34
|
+
**Use this ESPECIALLY when:**
|
|
35
|
+
- Under time pressure (emergencies make guessing tempting)
|
|
36
|
+
- "Just one quick fix" seems obvious
|
|
37
|
+
- You've already tried multiple fixes
|
|
38
|
+
- Previous fix didn't work
|
|
39
|
+
- You don't fully understand the issue
|
|
40
|
+
|
|
41
|
+
**Don't skip when:**
|
|
42
|
+
- Issue seems simple (simple bugs have root causes too)
|
|
43
|
+
- You're in a hurry (rushing guarantees rework)
|
|
44
|
+
|
|
45
|
+
## The Four Phases
|
|
46
|
+
|
|
47
|
+
You MUST complete each phase before proceeding to the next.
|
|
48
|
+
|
|
49
|
+
### Phase 1: Root Cause Investigation
|
|
50
|
+
|
|
51
|
+
**BEFORE attempting ANY fix:**
|
|
52
|
+
|
|
53
|
+
1. **Read Error Messages Carefully** — Don't skip past errors or warnings. Read stack traces completely. Note line numbers, file paths, error codes.
|
|
54
|
+
|
|
55
|
+
2. **Reproduce Consistently** — Can you trigger it reliably? What are the exact steps? If not reproducible → gather more data, don't guess.
|
|
56
|
+
|
|
57
|
+
3. **Check Recent Changes** — Git diff, recent commits, new dependencies, config changes, environmental differences.
|
|
58
|
+
|
|
59
|
+
4. **Gather Evidence in Multi-Component Systems** — For each component boundary: log what enters, what exits, verify config propagation. Run once to see WHERE it breaks, then investigate that component.
|
|
60
|
+
|
|
61
|
+
**Example (multi-layer system):**
|
|
62
|
+
```bash
|
|
63
|
+
# Layer 1: Workflow
|
|
64
|
+
echo "=== Secrets available: ==="
|
|
65
|
+
echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"
|
|
66
|
+
|
|
67
|
+
# Layer 2: Build script
|
|
68
|
+
echo "=== Env vars in build script: ==="
|
|
69
|
+
env | grep IDENTITY || echo "IDENTITY not in environment"
|
|
70
|
+
|
|
71
|
+
# Layer 3: Signing
|
|
72
|
+
echo "=== Keychain state: ==="
|
|
73
|
+
security list-keychains
|
|
74
|
+
security find-identity -v
|
|
75
|
+
```
|
|
76
|
+
**This reveals:** Which layer fails (e.g., secrets → workflow ✓, workflow → build ✗)
|
|
77
|
+
|
|
78
|
+
5. **Trace Data Flow** — Where does the bad value originate? What called this with the bad value? Keep tracing up until you find the source. Fix at source, not at symptom. See `root-cause-tracing.md` for the complete technique.
|
|
79
|
+
|
|
80
|
+
### Phase 2: Pattern Analysis
|
|
81
|
+
|
|
82
|
+
1. **Find Working Examples** — Locate similar working code in same codebase.
|
|
83
|
+
2. **Compare Against References** — Read reference implementation COMPLETELY. Don't skim.
|
|
84
|
+
3. **Identify Differences** — List every difference, however small. Don't assume "that can't matter."
|
|
85
|
+
4. **Understand Dependencies** — What components, settings, config, environment does this need?
|
|
86
|
+
|
|
87
|
+
### Phase 3: Hypothesis and Testing
|
|
88
|
+
|
|
89
|
+
1. **Form Single Hypothesis** — State clearly: "I think X is the root cause because Y." Be specific, not vague.
|
|
90
|
+
2. **Test Minimally** — Make the SMALLEST possible change. One variable at a time. Don't fix multiple things at once.
|
|
91
|
+
3. **Verify Before Continuing** — Did it work? Yes → Phase 4. No → Form NEW hypothesis. DON'T add more fixes on top.
|
|
92
|
+
|
|
93
|
+
### Phase 4: Implementation
|
|
94
|
+
|
|
95
|
+
1. **Create Failing Test Case** — Use `/skill:test-driven-development` for writing proper failing tests. MUST have before fixing.
|
|
96
|
+
|
|
97
|
+
2. **Implement Single Fix** — ONE change at a time. No "while I'm here" improvements. No bundled refactoring.
|
|
98
|
+
|
|
99
|
+
3. **Verify Fix** — Test passes? No other tests broken? Issue actually resolved?
|
|
100
|
+
|
|
101
|
+
4. **If Fix Doesn't Work:**
|
|
102
|
+
- If < 3 attempts: Return to Phase 1, re-analyze with new information
|
|
103
|
+
- **If ≥ 3 attempts: STOP (see below)**
|
|
104
|
+
|
|
105
|
+
### When 3+ Fixes Fail: Question Architecture
|
|
106
|
+
|
|
107
|
+
**This is NOT a failed hypothesis — it's a wrong architecture.**
|
|
108
|
+
|
|
109
|
+
Pattern indicating architectural problem:
|
|
110
|
+
- Each fix reveals new shared state/coupling in different places
|
|
111
|
+
- Fixes require "massive refactoring" to implement
|
|
112
|
+
- Each fix creates new symptoms elsewhere
|
|
113
|
+
|
|
114
|
+
**STOP and question fundamentals:**
|
|
115
|
+
- Is this pattern fundamentally sound?
|
|
116
|
+
- Are we sticking with it through sheer inertia?
|
|
117
|
+
- Should we refactor architecture vs. continue fixing symptoms?
|
|
118
|
+
|
|
119
|
+
**Discuss with your human partner before attempting more fixes.**
|
|
120
|
+
|
|
121
|
+
## Red Flags — STOP and Follow Process
|
|
122
|
+
|
|
123
|
+
If you catch yourself thinking:
|
|
124
|
+
- "Quick fix for now, investigate later"
|
|
125
|
+
- "Just try changing X and see if it works"
|
|
126
|
+
- "Add multiple changes, run tests"
|
|
127
|
+
- "Skip the test, I'll manually verify"
|
|
128
|
+
- "It's probably X, let me fix that"
|
|
129
|
+
- "I don't fully understand but this might work"
|
|
130
|
+
- "Pattern says X but I'll adapt it differently"
|
|
131
|
+
- "Here are the main problems: [lists fixes without investigation]"
|
|
132
|
+
- Proposing solutions before tracing data flow
|
|
133
|
+
- **"One more fix attempt" (when already tried 2+)**
|
|
134
|
+
- **Each fix reveals new problem in different place**
|
|
135
|
+
|
|
136
|
+
**ALL of these mean: STOP. Return to Phase 1.**
|
|
137
|
+
|
|
138
|
+
**If 3+ fixes failed:** Question the architecture (see above).
|
|
139
|
+
|
|
140
|
+
## Common Rationalizations
|
|
141
|
+
|
|
142
|
+
| Excuse | Reality |
|
|
143
|
+
|--------|---------|
|
|
144
|
+
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
|
|
145
|
+
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
|
|
146
|
+
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
|
|
147
|
+
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
|
|
148
|
+
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
|
|
149
|
+
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
|
|
150
|
+
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
|
|
151
|
+
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
|
|
152
|
+
|
|
153
|
+
## When Process Reveals "No Root Cause"
|
|
154
|
+
|
|
155
|
+
If investigation reveals issue is truly environmental, timing-dependent, or external:
|
|
156
|
+
1. Document what you investigated
|
|
157
|
+
2. Implement appropriate handling (retry, timeout, error message)
|
|
158
|
+
3. Add monitoring/logging for future investigation
|
|
159
|
+
|
|
160
|
+
**But:** 95% of "no root cause" cases are incomplete investigation.
|
|
161
|
+
|
|
162
|
+
## Supporting Techniques
|
|
163
|
+
|
|
164
|
+
These techniques are part of systematic debugging and available in this directory:
|
|
165
|
+
|
|
166
|
+
- **`root-cause-tracing.md`** — Trace bugs backward through call stack to find original trigger
|
|
167
|
+
- **`defense-in-depth.md`** — Add validation at multiple layers after finding root cause
|
|
168
|
+
- **`condition-based-waiting.md`** — Replace arbitrary timeouts with condition polling
|
|
169
|
+
|
|
170
|
+
Use `workflow_reference` for: `debug-rationalizations`, `debug-tracing`, `debug-defense-in-depth`, `debug-condition-waiting`
|
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
// Complete implementation of condition-based waiting utilities
|
|
2
|
+
// From: Lace test infrastructure improvements (2025-10-03)
|
|
3
|
+
// Context: Fixed 15 flaky tests by replacing arbitrary timeouts
|
|
4
|
+
|
|
5
|
+
import type { ThreadManager } from "~/threads/thread-manager";
|
|
6
|
+
import type { LaceEvent, LaceEventType } from "~/threads/types";
|
|
7
|
+
|
|
8
|
+
/**
|
|
9
|
+
* Wait for a specific event type to appear in thread
|
|
10
|
+
*
|
|
11
|
+
* @param threadManager - The thread manager to query
|
|
12
|
+
* @param threadId - Thread to check for events
|
|
13
|
+
* @param eventType - Type of event to wait for
|
|
14
|
+
* @param timeoutMs - Maximum time to wait (default 5000ms)
|
|
15
|
+
* @returns Promise resolving to the first matching event
|
|
16
|
+
*
|
|
17
|
+
* Example:
|
|
18
|
+
* await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
|
|
19
|
+
*/
|
|
20
|
+
export function waitForEvent(
|
|
21
|
+
threadManager: ThreadManager,
|
|
22
|
+
threadId: string,
|
|
23
|
+
eventType: LaceEventType,
|
|
24
|
+
timeoutMs = 5000,
|
|
25
|
+
): Promise<LaceEvent> {
|
|
26
|
+
return new Promise((resolve, reject) => {
|
|
27
|
+
const startTime = Date.now();
|
|
28
|
+
|
|
29
|
+
const check = () => {
|
|
30
|
+
const events = threadManager.getEvents(threadId);
|
|
31
|
+
const event = events.find((e) => e.type === eventType);
|
|
32
|
+
|
|
33
|
+
if (event) {
|
|
34
|
+
resolve(event);
|
|
35
|
+
} else if (Date.now() - startTime > timeoutMs) {
|
|
36
|
+
reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
|
|
37
|
+
} else {
|
|
38
|
+
setTimeout(check, 10); // Poll every 10ms for efficiency
|
|
39
|
+
}
|
|
40
|
+
};
|
|
41
|
+
|
|
42
|
+
check();
|
|
43
|
+
});
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
/**
|
|
47
|
+
* Wait for a specific number of events of a given type
|
|
48
|
+
*
|
|
49
|
+
* @param threadManager - The thread manager to query
|
|
50
|
+
* @param threadId - Thread to check for events
|
|
51
|
+
* @param eventType - Type of event to wait for
|
|
52
|
+
* @param count - Number of events to wait for
|
|
53
|
+
* @param timeoutMs - Maximum time to wait (default 5000ms)
|
|
54
|
+
* @returns Promise resolving to all matching events once count is reached
|
|
55
|
+
*
|
|
56
|
+
* Example:
|
|
57
|
+
* // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
|
|
58
|
+
* await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
|
|
59
|
+
*/
|
|
60
|
+
export function waitForEventCount(
|
|
61
|
+
threadManager: ThreadManager,
|
|
62
|
+
threadId: string,
|
|
63
|
+
eventType: LaceEventType,
|
|
64
|
+
count: number,
|
|
65
|
+
timeoutMs = 5000,
|
|
66
|
+
): Promise<LaceEvent[]> {
|
|
67
|
+
return new Promise((resolve, reject) => {
|
|
68
|
+
const startTime = Date.now();
|
|
69
|
+
|
|
70
|
+
const check = () => {
|
|
71
|
+
const events = threadManager.getEvents(threadId);
|
|
72
|
+
const matchingEvents = events.filter((e) => e.type === eventType);
|
|
73
|
+
|
|
74
|
+
if (matchingEvents.length >= count) {
|
|
75
|
+
resolve(matchingEvents);
|
|
76
|
+
} else if (Date.now() - startTime > timeoutMs) {
|
|
77
|
+
reject(
|
|
78
|
+
new Error(
|
|
79
|
+
`Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`,
|
|
80
|
+
),
|
|
81
|
+
);
|
|
82
|
+
} else {
|
|
83
|
+
setTimeout(check, 10);
|
|
84
|
+
}
|
|
85
|
+
};
|
|
86
|
+
|
|
87
|
+
check();
|
|
88
|
+
});
|
|
89
|
+
}
|
|
90
|
+
|
|
91
|
+
/**
|
|
92
|
+
* Wait for an event matching a custom predicate
|
|
93
|
+
* Useful when you need to check event data, not just type
|
|
94
|
+
*
|
|
95
|
+
* @param threadManager - The thread manager to query
|
|
96
|
+
* @param threadId - Thread to check for events
|
|
97
|
+
* @param predicate - Function that returns true when event matches
|
|
98
|
+
* @param description - Human-readable description for error messages
|
|
99
|
+
* @param timeoutMs - Maximum time to wait (default 5000ms)
|
|
100
|
+
* @returns Promise resolving to the first matching event
|
|
101
|
+
*
|
|
102
|
+
* Example:
|
|
103
|
+
* // Wait for TOOL_RESULT with specific ID
|
|
104
|
+
* await waitForEventMatch(
|
|
105
|
+
* threadManager,
|
|
106
|
+
* agentThreadId,
|
|
107
|
+
* (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
|
|
108
|
+
* 'TOOL_RESULT with id=call_123'
|
|
109
|
+
* );
|
|
110
|
+
*/
|
|
111
|
+
export function waitForEventMatch(
|
|
112
|
+
threadManager: ThreadManager,
|
|
113
|
+
threadId: string,
|
|
114
|
+
predicate: (event: LaceEvent) => boolean,
|
|
115
|
+
description: string,
|
|
116
|
+
timeoutMs = 5000,
|
|
117
|
+
): Promise<LaceEvent> {
|
|
118
|
+
return new Promise((resolve, reject) => {
|
|
119
|
+
const startTime = Date.now();
|
|
120
|
+
|
|
121
|
+
const check = () => {
|
|
122
|
+
const events = threadManager.getEvents(threadId);
|
|
123
|
+
const event = events.find(predicate);
|
|
124
|
+
|
|
125
|
+
if (event) {
|
|
126
|
+
resolve(event);
|
|
127
|
+
} else if (Date.now() - startTime > timeoutMs) {
|
|
128
|
+
reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
|
|
129
|
+
} else {
|
|
130
|
+
setTimeout(check, 10);
|
|
131
|
+
}
|
|
132
|
+
};
|
|
133
|
+
|
|
134
|
+
check();
|
|
135
|
+
});
|
|
136
|
+
}
|
|
137
|
+
|
|
138
|
+
// Usage example from actual debugging session:
|
|
139
|
+
//
|
|
140
|
+
// BEFORE (flaky):
|
|
141
|
+
// ---------------
|
|
142
|
+
// const messagePromise = agent.sendMessage('Execute tools');
|
|
143
|
+
// await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
|
|
144
|
+
// agent.abort();
|
|
145
|
+
// await messagePromise;
|
|
146
|
+
// await new Promise(r => setTimeout(r, 50)); // Hope results arrive in 50ms
|
|
147
|
+
// expect(toolResults.length).toBe(2); // Fails randomly
|
|
148
|
+
//
|
|
149
|
+
// AFTER (reliable):
|
|
150
|
+
// ----------------
|
|
151
|
+
// const messagePromise = agent.sendMessage('Execute tools');
|
|
152
|
+
// await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
|
|
153
|
+
// agent.abort();
|
|
154
|
+
// await messagePromise;
|
|
155
|
+
// await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
|
|
156
|
+
// expect(toolResults.length).toBe(2); // Always succeeds
|
|
157
|
+
//
|
|
158
|
+
// Result: 60% pass rate → 100%, 40% faster execution
|
|
@@ -0,0 +1,115 @@
|
|
|
1
|
+
# Condition-Based Waiting
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.
|
|
6
|
+
|
|
7
|
+
**Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.
|
|
8
|
+
|
|
9
|
+
## When to Use
|
|
10
|
+
|
|
11
|
+
```dot
|
|
12
|
+
digraph when_to_use {
|
|
13
|
+
"Test uses setTimeout/sleep?" [shape=diamond];
|
|
14
|
+
"Testing timing behavior?" [shape=diamond];
|
|
15
|
+
"Document WHY timeout needed" [shape=box];
|
|
16
|
+
"Use condition-based waiting" [shape=box];
|
|
17
|
+
|
|
18
|
+
"Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
|
|
19
|
+
"Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
|
|
20
|
+
"Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
|
|
21
|
+
}
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
**Use when:**
|
|
25
|
+
- Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
|
|
26
|
+
- Tests are flaky (pass sometimes, fail under load)
|
|
27
|
+
- Tests timeout when run in parallel
|
|
28
|
+
- Waiting for async operations to complete
|
|
29
|
+
|
|
30
|
+
**Don't use when:**
|
|
31
|
+
- Testing actual timing behavior (debounce, throttle intervals)
|
|
32
|
+
- Always document WHY if using arbitrary timeout
|
|
33
|
+
|
|
34
|
+
## Core Pattern
|
|
35
|
+
|
|
36
|
+
```typescript
|
|
37
|
+
// ❌ BEFORE: Guessing at timing
|
|
38
|
+
await new Promise(r => setTimeout(r, 50));
|
|
39
|
+
const result = getResult();
|
|
40
|
+
expect(result).toBeDefined();
|
|
41
|
+
|
|
42
|
+
// ✅ AFTER: Waiting for condition
|
|
43
|
+
await waitFor(() => getResult() !== undefined);
|
|
44
|
+
const result = getResult();
|
|
45
|
+
expect(result).toBeDefined();
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
## Quick Patterns
|
|
49
|
+
|
|
50
|
+
| Scenario | Pattern |
|
|
51
|
+
|----------|---------|
|
|
52
|
+
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
|
|
53
|
+
| Wait for state | `waitFor(() => machine.state === 'ready')` |
|
|
54
|
+
| Wait for count | `waitFor(() => items.length >= 5)` |
|
|
55
|
+
| Wait for file | `waitFor(() => fs.existsSync(path))` |
|
|
56
|
+
| Complex condition | `waitFor(() => obj.ready && obj.value > 10)` |
|
|
57
|
+
|
|
58
|
+
## Implementation
|
|
59
|
+
|
|
60
|
+
Generic polling function:
|
|
61
|
+
```typescript
|
|
62
|
+
async function waitFor<T>(
|
|
63
|
+
condition: () => T | undefined | null | false,
|
|
64
|
+
description: string,
|
|
65
|
+
timeoutMs = 5000
|
|
66
|
+
): Promise<T> {
|
|
67
|
+
const startTime = Date.now();
|
|
68
|
+
|
|
69
|
+
while (true) {
|
|
70
|
+
const result = condition();
|
|
71
|
+
if (result) return result;
|
|
72
|
+
|
|
73
|
+
if (Date.now() - startTime > timeoutMs) {
|
|
74
|
+
throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
|
|
75
|
+
}
|
|
76
|
+
|
|
77
|
+
await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
|
|
78
|
+
}
|
|
79
|
+
}
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
See `condition-based-waiting-example.ts` in this directory for complete implementation with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) from actual debugging session.
|
|
83
|
+
|
|
84
|
+
## Common Mistakes
|
|
85
|
+
|
|
86
|
+
**❌ Polling too fast:** `setTimeout(check, 1)` - wastes CPU
|
|
87
|
+
**✅ Fix:** Poll every 10ms
|
|
88
|
+
|
|
89
|
+
**❌ No timeout:** Loop forever if condition never met
|
|
90
|
+
**✅ Fix:** Always include timeout with clear error
|
|
91
|
+
|
|
92
|
+
**❌ Stale data:** Cache state before loop
|
|
93
|
+
**✅ Fix:** Call getter inside loop for fresh data
|
|
94
|
+
|
|
95
|
+
## When Arbitrary Timeout IS Correct
|
|
96
|
+
|
|
97
|
+
```typescript
|
|
98
|
+
// Tool ticks every 100ms - need 2 ticks to verify partial output
|
|
99
|
+
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
|
|
100
|
+
await new Promise(r => setTimeout(r, 200)); // Then: wait for timed behavior
|
|
101
|
+
// 200ms = 2 ticks at 100ms intervals - documented and justified
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
**Requirements:**
|
|
105
|
+
1. First wait for triggering condition
|
|
106
|
+
2. Based on known timing (not guessing)
|
|
107
|
+
3. Comment explaining WHY
|
|
108
|
+
|
|
109
|
+
## Real-World Impact
|
|
110
|
+
|
|
111
|
+
From debugging session (2025-10-03):
|
|
112
|
+
- Fixed 15 flaky tests across 3 files
|
|
113
|
+
- Pass rate: 60% → 100%
|
|
114
|
+
- Execution time: 40% faster
|
|
115
|
+
- No more race conditions
|