aigroup-workflow 2.0.3 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/.claude/commands/workflow-start.md +19 -15
  2. package/.codex/AGENTS.md +1 -1
  3. package/AGENTS.md +1 -1
  4. package/README.md +3 -5
  5. package/agents/seo-specialist.md +0 -3
  6. package/cli/commands/init.mjs +1 -1
  7. package/cli/utils/scaffold.mjs +1 -1
  8. package/docs/red-flags.md +1 -1
  9. package/docs/rules/README.md +3 -3
  10. package/docs/rules/web/design-quality.md +0 -1
  11. package/docs/templates/implementation-plan.md +1 -1
  12. package/docs/workflow-pipeline.md +41 -22
  13. package/manifests/install-modules.json +8 -3
  14. package/package.json +1 -1
  15. package/skills/SUPERPOWERS-LICENSE +21 -0
  16. package/skills/brainstorming/SKILL.md +164 -0
  17. package/skills/brainstorming/scripts/frame-template.html +214 -0
  18. package/skills/brainstorming/scripts/helper.js +88 -0
  19. package/skills/brainstorming/scripts/server.cjs +354 -0
  20. package/skills/brainstorming/scripts/start-server.sh +148 -0
  21. package/skills/brainstorming/scripts/stop-server.sh +56 -0
  22. package/skills/brainstorming/spec-document-reviewer-prompt.md +49 -0
  23. package/skills/brainstorming/visual-companion.md +287 -0
  24. package/skills/executing-plans/SKILL.md +70 -0
  25. package/skills/finishing-a-development-branch/SKILL.md +200 -112
  26. package/skills/receiving-code-review/SKILL.md +213 -0
  27. package/skills/requesting-code-review/SKILL.md +105 -0
  28. package/skills/requesting-code-review/code-reviewer.md +146 -0
  29. package/skills/systematic-debugging/CREATION-LOG.md +119 -0
  30. package/skills/systematic-debugging/SKILL.md +296 -208
  31. package/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
  32. package/skills/systematic-debugging/condition-based-waiting.md +115 -0
  33. package/skills/systematic-debugging/defense-in-depth.md +122 -0
  34. package/skills/systematic-debugging/find-polluter.sh +63 -0
  35. package/skills/systematic-debugging/root-cause-tracing.md +169 -0
  36. package/skills/systematic-debugging/test-academic.md +14 -0
  37. package/skills/systematic-debugging/test-pressure-1.md +58 -0
  38. package/skills/systematic-debugging/test-pressure-2.md +68 -0
  39. package/skills/systematic-debugging/test-pressure-3.md +69 -0
  40. package/skills/using-git-worktrees/SKILL.md +218 -0
  41. package/skills/verification-before-completion/SKILL.md +139 -120
  42. package/skills/writing-plans/SKILL.md +79 -94
  43. package/skills/writing-plans/plan-document-reviewer-prompt.md +49 -0
  44. package/skills/writing-skills/SKILL.md +655 -0
  45. package/skills/writing-skills/anthropic-best-practices.md +1150 -0
  46. package/skills/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
  47. package/skills/writing-skills/graphviz-conventions.dot +172 -0
  48. package/skills/writing-skills/persuasion-principles.md +187 -0
  49. package/skills/writing-skills/render-graphs.js +168 -0
  50. package/skills/writing-skills/testing-skills-with-subagents.md +384 -0
  51. package/skills/subagent-driven-development/SKILL.md +0 -173
  52. package/skills/ui-ux-pro-max/scripts/__pycache__/core.cpython-39.pyc +0 -0
  53. package/skills/ui-ux-pro-max/scripts/__pycache__/design_system.cpython-39.pyc +0 -0
@@ -1,208 +1,296 @@
- ---
- name: systematic-debugging
- description: Use when encountering any bug, test failure, or unexpected behavior. Root cause investigation must be completed before proposing fixes.
- ---
-
- # Systematic Debugging
-
- ## Overview
-
- Random fixes waste time and create new bugs. Quick patches mask underlying issues.
-
- **Core principle:** Always find the root cause before fixing. Fixing symptoms is failure.
-
- ## The Iron Law
-
- ```
- No fixes may be proposed until root cause investigation is complete.
- If you haven't completed Phase 1, you cannot propose any fix.
- ```
-
- ## When to Use
-
- Use for any technical issue:
- - Test failures
- - Bugs in production
- - Unexpected behavior
- - Performance problems
- - Build failures
- - Integration issues
-
- **Use this especially when:**
- - Under time pressure (emergencies make guessing tempting)
- - "Just one small fix" seems obvious
- - You've already tried multiple fixes
- - A previous fix didn't work
- - You don't fully understand the issue
-
- ## The Four Phases
-
- You **must** complete each phase in order; no skipping.
-
- ### Phase 1: Root Cause Investigation
-
- **Before attempting any fix:**
-
- 1. **Read error messages carefully**
-    - Don't skip past errors or warnings
-    - They often contain the exact solution
-    - Read stack traces completely
-    - Note line numbers, file paths, error codes
-
- 2. **Reproduce consistently**
-    - Can you trigger it reliably?
-    - What are the exact steps?
-    - Does it happen every time?
-    - If not reproducible → gather more data, don't guess
-
- 3. **Check recent changes**
-    - What changed that could cause this?
-    - git diff, recent commits
-    - New dependencies, config changes
-    - Environmental differences
-
- 4. **Gather evidence in multi-component systems**
-
-    When the system has multiple components (CI build → signing, API → service → database):
-
-    ```
-    For each component boundary:
-    - Log what data enters the component
-    - Log what data exits the component
-    - Verify environment/config propagation
-    - Check state at each layer
-
-    Run once to gather evidence showing where it breaks
-    Then analyze the evidence to identify the failing component
-    Then investigate that specific component
-    ```
-
- 5. **Trace data flow**
-
-    When the error is deep in the call stack:
-    - Where does the bad value originate?
-    - What called this with the bad value?
-    - Keep tracing upward until you find the source
-    - Fix at the source, not at the symptom
-
- ### Phase 2: Pattern Analysis
-
- **Find the pattern before fixing:**
-
- 1. **Find working examples**
-    - Locate similar working code in the same codebase
-    - What similar thing works correctly?
-
- 2. **Compare against references**
-    - If implementing a pattern, read the reference implementation completely
-    - Don't skim; read every line
-    - Understand the pattern fully before applying it
-
- 3. **Identify differences**
-    - What's different between the working and the broken?
-    - List every difference, however small
-    - Don't assume "that can't matter"
-
- 4. **Understand dependencies**
-    - What other components does this need?
-    - What settings, config, environment?
-    - What assumptions does it make?
-
- ### Phase 3: Hypothesis and Testing
-
- **Scientific method:**
-
- 1. **Form a single hypothesis**
-    - State clearly: "I think X is the root cause because Y"
-    - Write it down
-    - Be specific, not vague
-
- 2. **Test minimally**
-    - Make the smallest change that tests the hypothesis
-    - One variable at a time
-    - Don't fix multiple things at once
-
- 3. **Verify before continuing**
-    - Did it work? Yes → Phase 4
-    - Didn't work? Form a new hypothesis
-    - Don't stack more fixes on top
-
- 4. **When you don't know**
-    - Say "I don't understand X"
-    - Don't pretend to know
-    - Ask for help
-    - Research more
-
- ### Phase 4: Implement the Fix
-
- **Fix the root cause, not the symptom:**
-
- 1. **Create a failing test case**
-    - Simplest possible reproduction
-    - Automated test if possible
-    - A failing test is required before fixing
-
- 2. **Implement a single fix**
-    - Address the identified root cause
-    - One change at a time
-    - No "while I'm here" improvements
-    - No bundled refactoring
-
- 3. **Verify the fix**
-    - Test passes now?
-    - No other tests broken?
-    - Issue actually resolved?
-
- 4. **If the fix doesn't work**
-    - Stop
-    - Count: how many fixes have you tried?
-    - If < 3: return to Phase 1 and re-analyze with the new information
-    - **If >= 3: stop and question the architecture (see below)**
-    - Don't attempt a 4th fix without an architectural discussion
-
- 5. **If 3+ fixes failed: question the architecture**
-
-    Patterns indicating an architectural problem:
-    - Each fix reveals new shared state/coupling/problems in a different place
-    - Fixes require "massive refactoring" to implement
-    - Each fix creates new symptoms elsewhere
-
-    **Stop and question fundamentals:**
-    - Is this pattern/architecture fundamentally sound?
-    - Are we sticking with a wrong direction out of inertia?
-    - Should we refactor the architecture or keep fixing symptoms?
-
-    **Discuss with the user before attempting more fixes**
-
- ## Red Flags: Stop
-
- If you catch yourself thinking:
-
- | Thought | Correct action |
- |---------|----------------|
- | "Just try changing X and see" | Stop. Complete root cause investigation first |
- | "Add a few changes, run the tests" | Stop. One variable at a time |
- | "Skip the test, I'll verify manually" | Stop. An automated test is required |
- | "It's probably X, let me fix it" | Stop. "Probably" isn't enough; you need evidence |
- | "I don't fully understand it but this might work" | Stop. No understanding, no fix |
- | "That should do it" | Stop. Run the verification commands |
- | "One more try" (already tried 2+ times) | Stop. Question the architecture |
- | "Each fix surfaces a new problem" | Stop. This is an architectural problem |
-
- **In all of these cases: stop and return to Phase 1.**
-
- ## Common Rationalizations
-
- | Excuse | Reality |
- |--------|---------|
- | "The issue is simple, no need for process" | Simple issues have root causes too; the process is fast for simple bugs |
- | "It's an emergency, no time for process" | Systematic debugging is far faster than guessing |
- | "Try first, investigate later" | The first fix sets the pattern; do it right from the start |
- | "I'll write the test after the fix works" | Untested fixes don't stick; write the test first |
- | "Changing several things at once saves time" | You can't isolate what worked, and you introduce new bugs |
- | "The reference is too long, I'll adapt it" | Partial understanding guarantees bugs; read it completely |
-
- ## Related Skills
-
- - **verification-before-completion**: verify after fixing that the fix actually worked
- - **writing-plans**: if an architecture-level refactor is needed, go through the planning process
+ ---
+ name: systematic-debugging
+ description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
+ ---
+
+ # Systematic Debugging
+
+ ## Overview
+
+ Random fixes waste time and create new bugs. Quick patches mask underlying issues.
+
+ **Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
+
+ **Violating the letter of this process is violating the spirit of debugging.**
+
+ ## The Iron Law
+
+ ```
+ NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
+ ```
+
+ If you haven't completed Phase 1, you cannot propose fixes.
+
+ ## When to Use
+
+ Use for ANY technical issue:
+ - Test failures
+ - Bugs in production
+ - Unexpected behavior
+ - Performance problems
+ - Build failures
+ - Integration issues
+
+ **Use this ESPECIALLY when:**
+ - Under time pressure (emergencies make guessing tempting)
+ - "Just one quick fix" seems obvious
+ - You've already tried multiple fixes
+ - Previous fix didn't work
+ - You don't fully understand the issue
+
+ **Don't skip when:**
+ - Issue seems simple (simple bugs have root causes too)
+ - You're in a hurry (rushing guarantees rework)
+ - Manager wants it fixed NOW (systematic is faster than thrashing)
+
+ ## The Four Phases
+
+ You MUST complete each phase before proceeding to the next.
+
+ ### Phase 1: Root Cause Investigation
+
+ **BEFORE attempting ANY fix:**
+
+ 1. **Read Error Messages Carefully**
+    - Don't skip past errors or warnings
+    - They often contain the exact solution
+    - Read stack traces completely
+    - Note line numbers, file paths, error codes
+
+ 2. **Reproduce Consistently**
+    - Can you trigger it reliably?
+    - What are the exact steps?
+    - Does it happen every time?
+    - If not reproducible → gather more data, don't guess
+
+ 3. **Check Recent Changes**
+    - What changed that could cause this?
+    - Git diff, recent commits
+    - New dependencies, config changes
+    - Environmental differences
+
+ 4. **Gather Evidence in Multi-Component Systems**
+
+    **WHEN system has multiple components (CI → build → signing, API → service → database):**
+
+    **BEFORE proposing fixes, add diagnostic instrumentation:**
+    ```
+    For EACH component boundary:
+    - Log what data enters component
+    - Log what data exits component
+    - Verify environment/config propagation
+    - Check state at each layer
+
+    Run once to gather evidence showing WHERE it breaks
+    THEN analyze evidence to identify failing component
+    THEN investigate that specific component
+    ```
+
+    **Example (multi-layer system):**
+    ```bash
+    # Layer 1: Workflow
+    echo "=== Secrets available in workflow: ==="
+    echo "IDENTITY: $([ -n "${IDENTITY:-}" ] && echo SET || echo UNSET)"
+
+    # Layer 2: Build script
+    echo "=== Env vars in build script: ==="
+    env | grep IDENTITY || echo "IDENTITY not in environment"
+
+    # Layer 3: Signing script
+    echo "=== Keychain state: ==="
+    security list-keychains
+    security find-identity -v
+
+    # Layer 4: Actual signing
+    codesign --sign "$IDENTITY" --verbose=4 "$APP"
+    ```
+
+    **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)
+
+ 5. **Trace Data Flow**
+
+    **WHEN error is deep in call stack:**
+
+    See `root-cause-tracing.md` in this directory for the complete backward tracing technique.
+
+    **Quick version:**
+    - Where does bad value originate?
+    - What called this with bad value?
+    - Keep tracing up until you find the source
+    - Fix at source, not at symptom
+
+ ### Phase 2: Pattern Analysis
+
+ **Find the pattern before fixing:**
+
+ 1. **Find Working Examples**
+    - Locate similar working code in same codebase
+    - What works that's similar to what's broken?
+
+ 2. **Compare Against References**
+    - If implementing a pattern, read the reference implementation COMPLETELY
+    - Don't skim - read every line
+    - Understand the pattern fully before applying
+
+ 3. **Identify Differences**
+    - What's different between working and broken?
+    - List every difference, however small
+    - Don't assume "that can't matter"
+
+ 4. **Understand Dependencies**
+    - What other components does this need?
+    - What settings, config, environment?
+    - What assumptions does it make?
+
+ ### Phase 3: Hypothesis and Testing
+
+ **Scientific method:**
+
+ 1. **Form Single Hypothesis**
+    - State clearly: "I think X is the root cause because Y"
+    - Write it down
+    - Be specific, not vague
+
+ 2. **Test Minimally**
+    - Make the SMALLEST possible change to test hypothesis
+    - One variable at a time
+    - Don't fix multiple things at once
+
+ 3. **Verify Before Continuing**
+    - Did it work? Yes → Phase 4
+    - Didn't work? Form NEW hypothesis
+    - DON'T add more fixes on top
+
+ 4. **When You Don't Know**
+    - Say "I don't understand X"
+    - Don't pretend to know
+    - Ask for help
+    - Research more
+
+ ### Phase 4: Implementation
+
+ **Fix the root cause, not the symptom:**
+
+ 1. **Create Failing Test Case**
+    - Simplest possible reproduction
+    - Automated test if possible
+    - One-off test script if no framework
+    - MUST have before fixing
+    - Use the `superpowers:test-driven-development` skill for writing proper failing tests
+
+ 2. **Implement Single Fix**
+    - Address the root cause identified
+    - ONE change at a time
+    - No "while I'm here" improvements
+    - No bundled refactoring
+
+ 3. **Verify Fix**
+    - Test passes now?
+    - No other tests broken?
+    - Issue actually resolved?
+
+ 4. **If Fix Doesn't Work**
+    - STOP
+    - Count: How many fixes have you tried?
+    - If < 3: Return to Phase 1, re-analyze with new information
+    - **If ≥ 3: STOP and question the architecture (step 5 below)**
+    - DON'T attempt Fix #4 without architectural discussion
+
+ 5. **If 3+ Fixes Failed: Question Architecture**
+
+    **Pattern indicating architectural problem:**
+    - Each fix reveals new shared state/coupling/problem in a different place
+    - Fixes require "massive refactoring" to implement
+    - Each fix creates new symptoms elsewhere
+
+    **STOP and question fundamentals:**
+    - Is this pattern fundamentally sound?
+    - Are we "sticking with it through sheer inertia"?
+    - Should we refactor architecture vs. continue fixing symptoms?
+
+    **Discuss with your human partner before attempting more fixes**
+
+    This is NOT a failed hypothesis - this is a wrong architecture.
+
+ ## Red Flags - STOP and Follow Process
+
+ If you catch yourself thinking:
+ - "Quick fix for now, investigate later"
+ - "Just try changing X and see if it works"
+ - "Add multiple changes, run tests"
+ - "Skip the test, I'll manually verify"
+ - "It's probably X, let me fix that"
+ - "I don't fully understand but this might work"
+ - "Pattern says X but I'll adapt it differently"
+ - "Here are the main problems: [lists fixes without investigation]"
+ - Proposing solutions before tracing data flow
+ - **"One more fix attempt" (when already tried 2+)**
+ - **Each fix reveals new problem in different place**
+
+ **ALL of these mean: STOP. Return to Phase 1.**
+
+ **If 3+ fixes failed:** Question the architecture (see Phase 4, Step 5)
+
+ ## Your Human Partner's Signals You're Doing It Wrong
+
+ **Watch for these redirections:**
+ - "Is that not happening?" - You assumed without verifying
+ - "Will it show us...?" - You should have added evidence gathering
+ - "Stop guessing" - You're proposing fixes without understanding
+ - "Ultrathink this" - Question fundamentals, not just symptoms
+ - "We're stuck?" (frustrated) - Your approach isn't working
+
+ **When you see these:** STOP. Return to Phase 1.
+
+ ## Common Rationalizations
+
+ | Excuse | Reality |
+ |--------|---------|
+ | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
+ | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
+ | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
+ | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
+ | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+ | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
+ | "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+ | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
+
+ ## Quick Reference
+
+ | Phase | Key Activities | Success Criteria |
+ |-------|---------------|------------------|
+ | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
+ | **2. Pattern** | Find working examples, compare | Identify differences |
+ | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
+ | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |
+
+ ## When Process Reveals "No Root Cause"
+
+ If systematic investigation reveals the issue is truly environmental, timing-dependent, or external:
+
+ 1. You've completed the process
+ 2. Document what you investigated
+ 3. Implement appropriate handling (retry, timeout, error message)
+ 4. Add monitoring/logging for future investigation
+
+ **But:** 95% of "no root cause" cases are incomplete investigation.
+
+ ## Supporting Techniques
+
+ These techniques are part of systematic debugging and available in this directory:
+
+ - **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger
+ - **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause
+ - **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling
+
+ **Related skills:**
+ - **superpowers:test-driven-development** - For creating failing test case (Phase 4, Step 1)
+ - **superpowers:verification-before-completion** - Verify fix worked before claiming success
+
+ ## Real-World Impact
+
+ From debugging sessions:
+ - Systematic approach: 15-30 minutes to fix
+ - Random fixes approach: 2-3 hours of thrashing
+ - First-time fix rate: 95% vs 40%
+ - New bugs introduced: Near zero vs common
@@ -0,0 +1,158 @@
+ // Complete implementation of condition-based waiting utilities
+ // From: Lace test infrastructure improvements (2025-10-03)
+ // Context: Fixed 15 flaky tests by replacing arbitrary timeouts
+
+ import type { ThreadManager } from '~/threads/thread-manager';
+ import type { LaceEvent, LaceEventType } from '~/threads/types';
+
+ /**
+  * Wait for a specific event type to appear in thread
+  *
+  * @param threadManager - The thread manager to query
+  * @param threadId - Thread to check for events
+  * @param eventType - Type of event to wait for
+  * @param timeoutMs - Maximum time to wait (default 5000ms)
+  * @returns Promise resolving to the first matching event
+  *
+  * Example:
+  *   await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
+  */
+ export function waitForEvent(
+   threadManager: ThreadManager,
+   threadId: string,
+   eventType: LaceEventType,
+   timeoutMs = 5000
+ ): Promise<LaceEvent> {
+   return new Promise((resolve, reject) => {
+     const startTime = Date.now();
+
+     const check = () => {
+       const events = threadManager.getEvents(threadId);
+       const event = events.find((e) => e.type === eventType);
+
+       if (event) {
+         resolve(event);
+       } else if (Date.now() - startTime > timeoutMs) {
+         reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
+       } else {
+         setTimeout(check, 10); // Poll every 10ms for efficiency
+       }
+     };
+
+     check();
+   });
+ }
+
+ /**
+  * Wait for a specific number of events of a given type
+  *
+  * @param threadManager - The thread manager to query
+  * @param threadId - Thread to check for events
+  * @param eventType - Type of event to wait for
+  * @param count - Number of events to wait for
+  * @param timeoutMs - Maximum time to wait (default 5000ms)
+  * @returns Promise resolving to all matching events once count is reached
+  *
+  * Example:
+  *   // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
+  *   await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
+  */
+ export function waitForEventCount(
+   threadManager: ThreadManager,
+   threadId: string,
+   eventType: LaceEventType,
+   count: number,
+   timeoutMs = 5000
+ ): Promise<LaceEvent[]> {
+   return new Promise((resolve, reject) => {
+     const startTime = Date.now();
+
+     const check = () => {
+       const events = threadManager.getEvents(threadId);
+       const matchingEvents = events.filter((e) => e.type === eventType);
+
+       if (matchingEvents.length >= count) {
+         resolve(matchingEvents);
+       } else if (Date.now() - startTime > timeoutMs) {
+         reject(
+           new Error(
+             `Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`
+           )
+         );
+       } else {
+         setTimeout(check, 10);
+       }
+     };
+
+     check();
+   });
+ }
+
+ /**
+  * Wait for an event matching a custom predicate
+  * Useful when you need to check event data, not just type
+  *
+  * @param threadManager - The thread manager to query
+  * @param threadId - Thread to check for events
+  * @param predicate - Function that returns true when event matches
+  * @param description - Human-readable description for error messages
+  * @param timeoutMs - Maximum time to wait (default 5000ms)
+  * @returns Promise resolving to the first matching event
+  *
+  * Example:
+  *   // Wait for TOOL_RESULT with specific ID
+  *   await waitForEventMatch(
+  *     threadManager,
+  *     agentThreadId,
+  *     (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
+  *     'TOOL_RESULT with id=call_123'
+  *   );
+  */
+ export function waitForEventMatch(
+   threadManager: ThreadManager,
+   threadId: string,
+   predicate: (event: LaceEvent) => boolean,
+   description: string,
+   timeoutMs = 5000
+ ): Promise<LaceEvent> {
+   return new Promise((resolve, reject) => {
+     const startTime = Date.now();
+
+     const check = () => {
+       const events = threadManager.getEvents(threadId);
+       const event = events.find(predicate);
+
+       if (event) {
+         resolve(event);
+       } else if (Date.now() - startTime > timeoutMs) {
+         reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
+       } else {
+         setTimeout(check, 10);
+       }
+     };
+
+     check();
+   });
+ }
+
+ // Usage example from actual debugging session:
+ //
+ // BEFORE (flaky):
+ // ---------------
+ // const messagePromise = agent.sendMessage('Execute tools');
+ // await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
+ // agent.abort();
+ // await messagePromise;
+ // await new Promise(r => setTimeout(r, 50)); // Hope results arrive in 50ms
+ // expect(toolResults.length).toBe(2); // Fails randomly
+ //
+ // AFTER (reliable):
+ // ----------------
+ // const messagePromise = agent.sendMessage('Execute tools');
+ // await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
+ // agent.abort();
+ // await messagePromise;
+ // await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
+ // expect(toolResults.length).toBe(2); // Always succeeds
+ //
+ // Result: 60% pass rate → 100%, 40% faster execution
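The BEFORE/AFTER comparison in the new file generalizes beyond Lace's `ThreadManager`. As a minimal sketch of the same poll-until-condition technique (nothing here is part of the package; `waitFor` is a hypothetical helper name), the core pattern is:

```typescript
// Generic condition-based wait: polls a predicate instead of sleeping for a guessed duration.
// Resolves as soon as the condition holds; rejects if it never holds within timeoutMs.
function waitFor(
  predicate: () => boolean,
  timeoutMs = 5000,
  intervalMs = 10
): Promise<void> {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      if (predicate()) {
        resolve();
      } else if (Date.now() - start > timeoutMs) {
        reject(new Error(`Timeout after ${timeoutMs}ms waiting for condition`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Usage: wait on a flag flipped by asynchronous work, not on a guessed delay.
let done = false;
setTimeout(() => { done = true; }, 50); // stands in for real async work
waitFor(() => done, 1000).then(() => console.log('condition met'));
```

The payoff is the same as in the diffed example: tests resolve as soon as the condition is true, and a failure produces a descriptive timeout error rather than an assertion that passes or fails on scheduler luck.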