aigroup-workflow 2.0.3 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/.claude/commands/workflow-start.md +19 -15
  2. package/.codex/AGENTS.md +1 -1
  3. package/AGENTS.md +1 -1
  4. package/README.md +3 -5
  5. package/agents/seo-specialist.md +0 -3
  6. package/cli/commands/init.mjs +1 -1
  7. package/cli/utils/scaffold.mjs +1 -1
  8. package/docs/red-flags.md +1 -1
  9. package/docs/rules/README.md +3 -3
  10. package/docs/rules/web/design-quality.md +0 -1
  11. package/docs/templates/implementation-plan.md +1 -1
  12. package/docs/workflow-pipeline.md +41 -22
  13. package/manifests/install-modules.json +8 -3
  14. package/package.json +1 -1
  15. package/skills/SUPERPOWERS-LICENSE +21 -0
  16. package/skills/brainstorming/SKILL.md +164 -0
  17. package/skills/brainstorming/scripts/frame-template.html +214 -0
  18. package/skills/brainstorming/scripts/helper.js +88 -0
  19. package/skills/brainstorming/scripts/server.cjs +354 -0
  20. package/skills/brainstorming/scripts/start-server.sh +148 -0
  21. package/skills/brainstorming/scripts/stop-server.sh +56 -0
  22. package/skills/brainstorming/spec-document-reviewer-prompt.md +49 -0
  23. package/skills/brainstorming/visual-companion.md +287 -0
  24. package/skills/executing-plans/SKILL.md +70 -0
  25. package/skills/finishing-a-development-branch/SKILL.md +200 -112
  26. package/skills/receiving-code-review/SKILL.md +213 -0
  27. package/skills/requesting-code-review/SKILL.md +105 -0
  28. package/skills/requesting-code-review/code-reviewer.md +146 -0
  29. package/skills/systematic-debugging/CREATION-LOG.md +119 -0
  30. package/skills/systematic-debugging/SKILL.md +296 -208
  31. package/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
  32. package/skills/systematic-debugging/condition-based-waiting.md +115 -0
  33. package/skills/systematic-debugging/defense-in-depth.md +122 -0
  34. package/skills/systematic-debugging/find-polluter.sh +63 -0
  35. package/skills/systematic-debugging/root-cause-tracing.md +169 -0
  36. package/skills/systematic-debugging/test-academic.md +14 -0
  37. package/skills/systematic-debugging/test-pressure-1.md +58 -0
  38. package/skills/systematic-debugging/test-pressure-2.md +68 -0
  39. package/skills/systematic-debugging/test-pressure-3.md +69 -0
  40. package/skills/using-git-worktrees/SKILL.md +218 -0
  41. package/skills/verification-before-completion/SKILL.md +139 -120
  42. package/skills/writing-plans/SKILL.md +79 -94
  43. package/skills/writing-plans/plan-document-reviewer-prompt.md +49 -0
  44. package/skills/writing-skills/SKILL.md +655 -0
  45. package/skills/writing-skills/anthropic-best-practices.md +1150 -0
  46. package/skills/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
  47. package/skills/writing-skills/graphviz-conventions.dot +172 -0
  48. package/skills/writing-skills/persuasion-principles.md +187 -0
  49. package/skills/writing-skills/render-graphs.js +168 -0
  50. package/skills/writing-skills/testing-skills-with-subagents.md +384 -0
  51. package/skills/subagent-driven-development/SKILL.md +0 -173
  52. package/skills/ui-ux-pro-max/scripts/__pycache__/core.cpython-39.pyc +0 -0
  53. package/skills/ui-ux-pro-max/scripts/__pycache__/design_system.cpython-39.pyc +0 -0
@@ -1,208 +1,296 @@
- ---
- name: systematic-debugging
- description: Use when encountering any bug, test failure, or unexpected behavior. Root cause investigation must be completed before proposing fixes.
- ---
-
- # Systematic Debugging
-
- ## Overview
-
- Random fixes waste time and create new bugs. Quick patches mask underlying issues.
-
- **Core principle:** Always find the root cause before fixing. Fixing symptoms is failure.
-
- ## The Iron Law
-
- ```
- No fixes may be proposed until root cause investigation is complete.
- If you haven't completed Phase 1, you cannot propose any fix.
- ```
-
- ## When to Use
-
- Use for any technical issue:
- - Test failures
- - Bugs in production
- - Unexpected behavior
- - Performance problems
- - Build failures
- - Integration issues
-
- **Use this especially when:**
- - Under time pressure (emergencies make guessing tempting)
- - "Just one small fix" seems obvious
- - You've already tried multiple fixes
- - A previous fix didn't work
- - You don't fully understand the issue
-
- ## The Four Phases
-
- You **must** complete each phase in order; no skipping.
-
- ### Phase 1: Root Cause Investigation
-
- **Before attempting any fix:**
-
- 1. **Read error messages carefully**
-    - Don't skip past errors or warnings
-    - They often contain the exact solution
-    - Read stack traces completely
-    - Note line numbers, file paths, error codes
-
- 2. **Reproduce consistently**
-    - Can you trigger it reliably?
-    - What are the exact steps?
-    - Does it happen every time?
-    - If not reproducible → gather more data, don't guess
-
- 3. **Check recent changes**
-    - What changed that could cause this?
-    - git diff, recent commits
-    - New dependencies, config changes
-    - Environmental differences
-
- 4. **Gather evidence in multi-component systems**
-
-    When the system has multiple components (CI build → signing, API → service → database):
-
-    ```
-    For each component boundary:
-    - Log what data enters the component
-    - Log what data exits the component
-    - Verify environment/config propagation
-    - Check state at each layer
-
-    Run once to gather evidence showing where it breaks
-    Then analyze the evidence to identify the failing component
-    Then investigate that specific component
-    ```
-
- 5. **Trace data flow**
-
-    When the error is deep in the call stack:
-    - Where does the bad value originate?
-    - What called this with the bad value?
-    - Keep tracing upward until you find the source
-    - Fix at the source, not at the symptom
-
- ### Phase 2: Pattern Analysis
-
- **Find the pattern before fixing:**
-
- 1. **Find working examples**
-    - Locate similar working code in the same codebase
-    - What similar thing works correctly?
-
- 2. **Compare against references**
-    - If implementing a pattern, read the reference implementation completely
-    - Don't skim; read every line
-    - Understand the pattern fully before applying it
-
- 3. **Identify differences**
-    - What's different between the working and the broken?
-    - List every difference, however small
-    - Don't assume "that can't matter"
-
- 4. **Understand dependencies**
-    - What other components does this need?
-    - What settings, config, environment?
-    - What assumptions does it make?
-
- ### Phase 3: Hypothesis and Testing
-
- **Scientific method:**
-
- 1. **Form a single hypothesis**
-    - State clearly: "I think X is the root cause because Y"
-    - Write it down
-    - Be specific, not vague
-
- 2. **Test minimally**
-    - Make the smallest change that tests the hypothesis
-    - One variable at a time
-    - Don't fix multiple things at once
-
- 3. **Verify before continuing**
-    - Did it work? Yes → Phase 4
-    - Didn't work? Form a new hypothesis
-    - Don't stack more fixes on top
-
- 4. **When you don't know**
-    - Say "I don't understand X"
-    - Don't pretend to know
-    - Ask for help
-    - Research more
-
- ### Phase 4: Implement the Fix
-
- **Fix the root cause, not the symptom:**
-
- 1. **Create a failing test case**
-    - Simplest possible reproduction
-    - Automated test if possible
-    - A failing test is required before fixing
-
- 2. **Implement a single fix**
-    - Address the identified root cause
-    - One change at a time
-    - No "while I'm here" improvements
-    - No bundled refactoring
-
- 3. **Verify the fix**
-    - Test passes now?
-    - No other tests broken?
-    - Issue actually resolved?
-
- 4. **If the fix doesn't work**
-    - Stop
-    - Count: how many fixes have you tried?
-    - If < 3: return to Phase 1 and re-analyze with the new information
-    - **If >= 3: stop and question the architecture (see below)**
-    - Don't attempt a 4th fix without an architectural discussion
-
- 5. **If 3+ fixes failed: question the architecture**
-
-    Patterns indicating an architectural problem:
-    - Each fix reveals new shared state/coupling/problems in a different place
-    - Fixes require "massive refactoring" to implement
-    - Each fix creates new symptoms elsewhere
-
-    **Stop and question fundamentals:**
-    - Is this pattern/architecture fundamentally sound?
-    - Are we sticking with a wrong direction out of inertia?
-    - Should we refactor the architecture or keep fixing symptoms?
-
-    **Discuss with the user before attempting more fixes**
-
- ## Red Flags: Stop
-
- If you catch yourself thinking:
-
- | Thought | Correct action |
- |---------|----------------|
- | "Just try changing X and see" | Stop. Complete root cause investigation first |
- | "Add a few changes, run the tests" | Stop. One variable at a time |
- | "Skip the test, I'll verify manually" | Stop. An automated test is required |
- | "It's probably X, let me fix it" | Stop. "Probably" isn't enough; you need evidence |
- | "I don't fully understand it but this might work" | Stop. No understanding, no fix |
- | "That should do it" | Stop. Run the verification commands |
- | "One more try" (already tried 2+ times) | Stop. Question the architecture |
- | "Each fix surfaces a new problem" | Stop. This is an architectural problem |
-
- **In all of these cases: stop and return to Phase 1.**
-
- ## Common Rationalizations
-
- | Excuse | Reality |
- |--------|---------|
- | "The issue is simple, no need for process" | Simple issues have root causes too; the process is fast for simple bugs |
- | "It's an emergency, no time for process" | Systematic debugging is far faster than guessing |
- | "Try first, investigate later" | The first fix sets the pattern; do it right from the start |
- | "I'll write the test after the fix works" | Untested fixes don't stick; write the test first |
- | "Changing several things at once saves time" | You can't isolate what worked, and you introduce new bugs |
- | "The reference is too long, I'll adapt it" | Partial understanding guarantees bugs; read it completely |
-
- ## Related Skills
-
- - **verification-before-completion**: verify after fixing that the fix actually worked
- - **writing-plans**: if an architecture-level refactor is needed, go through the planning process
+ ---
+ name: systematic-debugging
+ description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
+ ---
+
+ # Systematic Debugging
+
+ ## Overview
+
+ Random fixes waste time and create new bugs. Quick patches mask underlying issues.
+
+ **Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
+
+ **Violating the letter of this process is violating the spirit of debugging.**
+
+ ## The Iron Law
+
+ ```
+ NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
+ ```
+
+ If you haven't completed Phase 1, you cannot propose fixes.
+
+ ## When to Use
+
+ Use for ANY technical issue:
+ - Test failures
+ - Bugs in production
+ - Unexpected behavior
+ - Performance problems
+ - Build failures
+ - Integration issues
+
+ **Use this ESPECIALLY when:**
+ - Under time pressure (emergencies make guessing tempting)
+ - "Just one quick fix" seems obvious
+ - You've already tried multiple fixes
+ - Previous fix didn't work
+ - You don't fully understand the issue
+
+ **Don't skip when:**
+ - Issue seems simple (simple bugs have root causes too)
+ - You're in a hurry (rushing guarantees rework)
+ - Manager wants it fixed NOW (systematic is faster than thrashing)
+
+ ## The Four Phases
+
+ You MUST complete each phase before proceeding to the next.
+
+ ### Phase 1: Root Cause Investigation
+
+ **BEFORE attempting ANY fix:**
+
+ 1. **Read Error Messages Carefully**
+    - Don't skip past errors or warnings
+    - They often contain the exact solution
+    - Read stack traces completely
+    - Note line numbers, file paths, error codes
+
+ 2. **Reproduce Consistently**
+    - Can you trigger it reliably?
+    - What are the exact steps?
+    - Does it happen every time?
+    - If not reproducible → gather more data, don't guess
+
+ 3. **Check Recent Changes**
+    - What changed that could cause this?
+    - Git diff, recent commits
+    - New dependencies, config changes
+    - Environmental differences
+
+ 4. **Gather Evidence in Multi-Component Systems**
+
+    **WHEN system has multiple components (CI → build → signing, API → service → database):**
+
+    **BEFORE proposing fixes, add diagnostic instrumentation:**
+    ```
+    For EACH component boundary:
+    - Log what data enters component
+    - Log what data exits component
+    - Verify environment/config propagation
+    - Check state at each layer
+
+    Run once to gather evidence showing WHERE it breaks
+    THEN analyze evidence to identify failing component
+    THEN investigate that specific component
+    ```
+
+    **Example (multi-layer system):**
+    ```bash
+    # Layer 1: Workflow
+    echo "=== Secrets available in workflow: ==="
+    echo "IDENTITY: $([ -n "${IDENTITY:-}" ] && echo SET || echo UNSET)"
+
+    # Layer 2: Build script
+    echo "=== Env vars in build script: ==="
+    env | grep IDENTITY || echo "IDENTITY not in environment"
+
+    # Layer 3: Signing script
+    echo "=== Keychain state: ==="
+    security list-keychains
+    security find-identity -v
+
+    # Layer 4: Actual signing
+    codesign --sign "$IDENTITY" --verbose=4 "$APP"
+    ```
+
+    **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)
+
+ 5. **Trace Data Flow**
+
+    **WHEN error is deep in call stack:**
+
+    See `root-cause-tracing.md` in this directory for the complete backward tracing technique.
+
+    **Quick version:**
+    - Where does bad value originate?
+    - What called this with bad value?
+    - Keep tracing up until you find the source
+    - Fix at source, not at symptom
+
+ ### Phase 2: Pattern Analysis
+
+ **Find the pattern before fixing:**
+
+ 1. **Find Working Examples**
+    - Locate similar working code in same codebase
+    - What works that's similar to what's broken?
+
+ 2. **Compare Against References**
+    - If implementing a pattern, read the reference implementation COMPLETELY
+    - Don't skim - read every line
+    - Understand the pattern fully before applying
+
+ 3. **Identify Differences**
+    - What's different between working and broken?
+    - List every difference, however small
+    - Don't assume "that can't matter"
+
+ 4. **Understand Dependencies**
+    - What other components does this need?
+    - What settings, config, environment?
+    - What assumptions does it make?
+
+ ### Phase 3: Hypothesis and Testing
+
+ **Scientific method:**
+
+ 1. **Form Single Hypothesis**
+    - State clearly: "I think X is the root cause because Y"
+    - Write it down
+    - Be specific, not vague
+
+ 2. **Test Minimally**
+    - Make the SMALLEST possible change to test hypothesis
+    - One variable at a time
+    - Don't fix multiple things at once
+
+ 3. **Verify Before Continuing**
+    - Did it work? Yes → Phase 4
+    - Didn't work? Form NEW hypothesis
+    - DON'T add more fixes on top
+
+ 4. **When You Don't Know**
+    - Say "I don't understand X"
+    - Don't pretend to know
+    - Ask for help
+    - Research more
+
+ ### Phase 4: Implementation
+
+ **Fix the root cause, not the symptom:**
+
+ 1. **Create Failing Test Case**
+    - Simplest possible reproduction
+    - Automated test if possible
+    - One-off test script if no framework
+    - MUST have before fixing
+    - Use the `superpowers:test-driven-development` skill for writing proper failing tests
+
+ 2. **Implement Single Fix**
+    - Address the root cause identified
+    - ONE change at a time
+    - No "while I'm here" improvements
+    - No bundled refactoring
+
+ 3. **Verify Fix**
+    - Test passes now?
+    - No other tests broken?
+    - Issue actually resolved?
+
+ 4. **If Fix Doesn't Work**
+    - STOP
+    - Count: How many fixes have you tried?
+    - If < 3: Return to Phase 1, re-analyze with new information
+    - **If ≥ 3: STOP and question the architecture (step 5 below)**
+    - DON'T attempt Fix #4 without architectural discussion
+
+ 5. **If 3+ Fixes Failed: Question Architecture**
+
+    **Pattern indicating architectural problem:**
+    - Each fix reveals new shared state/coupling/problem in a different place
+    - Fixes require "massive refactoring" to implement
+    - Each fix creates new symptoms elsewhere
+
+    **STOP and question fundamentals:**
+    - Is this pattern fundamentally sound?
+    - Are we "sticking with it through sheer inertia"?
+    - Should we refactor architecture vs. continue fixing symptoms?
+
+    **Discuss with your human partner before attempting more fixes**
+
+    This is NOT a failed hypothesis - this is a wrong architecture.
+
+ ## Red Flags - STOP and Follow Process
+
+ If you catch yourself thinking:
+ - "Quick fix for now, investigate later"
+ - "Just try changing X and see if it works"
+ - "Add multiple changes, run tests"
+ - "Skip the test, I'll manually verify"
+ - "It's probably X, let me fix that"
+ - "I don't fully understand but this might work"
+ - "Pattern says X but I'll adapt it differently"
+ - "Here are the main problems: [lists fixes without investigation]"
+ - Proposing solutions before tracing data flow
+ - **"One more fix attempt" (when already tried 2+)**
+ - **Each fix reveals new problem in different place**
+
+ **ALL of these mean: STOP. Return to Phase 1.**
+
+ **If 3+ fixes failed:** Question the architecture (see Phase 4, Step 5)
+
+ ## Your Human Partner's Signals You're Doing It Wrong
+
+ **Watch for these redirections:**
+ - "Is that not happening?" - You assumed without verifying
+ - "Will it show us...?" - You should have added evidence gathering
+ - "Stop guessing" - You're proposing fixes without understanding
+ - "Ultrathink this" - Question fundamentals, not just symptoms
+ - "We're stuck?" (frustrated) - Your approach isn't working
+
+ **When you see these:** STOP. Return to Phase 1.
+
+ ## Common Rationalizations
+
+ | Excuse | Reality |
+ |--------|---------|
+ | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
+ | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
+ | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
+ | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
+ | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+ | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
+ | "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+ | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
+
+ ## Quick Reference
+
+ | Phase | Key Activities | Success Criteria |
+ |-------|---------------|------------------|
+ | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
+ | **2. Pattern** | Find working examples, compare | Identify differences |
+ | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
+ | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |
+
+ ## When Process Reveals "No Root Cause"
+
+ If systematic investigation reveals the issue is truly environmental, timing-dependent, or external:
+
+ 1. You've completed the process
+ 2. Document what you investigated
+ 3. Implement appropriate handling (retry, timeout, error message)
+ 4. Add monitoring/logging for future investigation
+
+ **But:** 95% of "no root cause" cases are incomplete investigation.
+
+ ## Supporting Techniques
+
+ These techniques are part of systematic debugging and available in this directory:
+
+ - **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger
+ - **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause
+ - **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling
+
+ **Related skills:**
+ - **superpowers:test-driven-development** - For creating failing test case (Phase 4, Step 1)
+ - **superpowers:verification-before-completion** - Verify fix worked before claiming success
+
+ ## Real-World Impact
+
+ From debugging sessions:
+ - Systematic approach: 15-30 minutes to fix
+ - Random fixes approach: 2-3 hours of thrashing
+ - First-time fix rate: 95% vs 40%
+ - New bugs introduced: Near zero vs common
@@ -0,0 +1,158 @@
+ // Complete implementation of condition-based waiting utilities
+ // From: Lace test infrastructure improvements (2025-10-03)
+ // Context: Fixed 15 flaky tests by replacing arbitrary timeouts
+
+ import type { ThreadManager } from '~/threads/thread-manager';
+ import type { LaceEvent, LaceEventType } from '~/threads/types';
+
+ /**
+  * Wait for a specific event type to appear in thread
+  *
+  * @param threadManager - The thread manager to query
+  * @param threadId - Thread to check for events
+  * @param eventType - Type of event to wait for
+  * @param timeoutMs - Maximum time to wait (default 5000ms)
+  * @returns Promise resolving to the first matching event
+  *
+  * Example:
+  *   await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
+  */
+ export function waitForEvent(
+   threadManager: ThreadManager,
+   threadId: string,
+   eventType: LaceEventType,
+   timeoutMs = 5000
+ ): Promise<LaceEvent> {
+   return new Promise((resolve, reject) => {
+     const startTime = Date.now();
+
+     const check = () => {
+       const events = threadManager.getEvents(threadId);
+       const event = events.find((e) => e.type === eventType);
+
+       if (event) {
+         resolve(event);
+       } else if (Date.now() - startTime > timeoutMs) {
+         reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
+       } else {
+         setTimeout(check, 10); // Poll every 10ms for efficiency
+       }
+     };
+
+     check();
+   });
+ }
+
+ /**
+  * Wait for a specific number of events of a given type
+  *
+  * @param threadManager - The thread manager to query
+  * @param threadId - Thread to check for events
+  * @param eventType - Type of event to wait for
+  * @param count - Number of events to wait for
+  * @param timeoutMs - Maximum time to wait (default 5000ms)
+  * @returns Promise resolving to all matching events once count is reached
+  *
+  * Example:
+  *   // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
+  *   await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
+  */
+ export function waitForEventCount(
+   threadManager: ThreadManager,
+   threadId: string,
+   eventType: LaceEventType,
+   count: number,
+   timeoutMs = 5000
+ ): Promise<LaceEvent[]> {
+   return new Promise((resolve, reject) => {
+     const startTime = Date.now();
+
+     const check = () => {
+       const events = threadManager.getEvents(threadId);
+       const matchingEvents = events.filter((e) => e.type === eventType);
+
+       if (matchingEvents.length >= count) {
+         resolve(matchingEvents);
+       } else if (Date.now() - startTime > timeoutMs) {
+         reject(
+           new Error(
+             `Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`
+           )
+         );
+       } else {
+         setTimeout(check, 10);
+       }
+     };
+
+     check();
+   });
+ }
+
+ /**
+  * Wait for an event matching a custom predicate
+  * Useful when you need to check event data, not just type
+  *
+  * @param threadManager - The thread manager to query
+  * @param threadId - Thread to check for events
+  * @param predicate - Function that returns true when event matches
+  * @param description - Human-readable description for error messages
+  * @param timeoutMs - Maximum time to wait (default 5000ms)
+  * @returns Promise resolving to the first matching event
+  *
+  * Example:
+  *   // Wait for TOOL_RESULT with specific ID
+  *   await waitForEventMatch(
+  *     threadManager,
+  *     agentThreadId,
+  *     (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
+  *     'TOOL_RESULT with id=call_123'
+  *   );
+  */
+ export function waitForEventMatch(
+   threadManager: ThreadManager,
+   threadId: string,
+   predicate: (event: LaceEvent) => boolean,
+   description: string,
+   timeoutMs = 5000
+ ): Promise<LaceEvent> {
+   return new Promise((resolve, reject) => {
+     const startTime = Date.now();
+
+     const check = () => {
+       const events = threadManager.getEvents(threadId);
+       const event = events.find(predicate);
+
+       if (event) {
+         resolve(event);
+       } else if (Date.now() - startTime > timeoutMs) {
+         reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
+       } else {
+         setTimeout(check, 10);
+       }
+     };
+
+     check();
+   });
+ }
+
+ // Usage example from actual debugging session:
+ //
+ // BEFORE (flaky):
+ // ---------------
+ // const messagePromise = agent.sendMessage('Execute tools');
+ // await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
+ // agent.abort();
+ // await messagePromise;
+ // await new Promise(r => setTimeout(r, 50)); // Hope results arrive in 50ms
+ // expect(toolResults.length).toBe(2); // Fails randomly
+ //
+ // AFTER (reliable):
+ // ----------------
+ // const messagePromise = agent.sendMessage('Execute tools');
+ // await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
+ // agent.abort();
+ // await messagePromise;
+ // await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
+ // expect(toolResults.length).toBe(2); // Always succeeds
+ //
+ // Result: 60% pass rate → 100%, 40% faster execution
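The BEFORE/AFTER comparison in the new file generalizes beyond Lace's `ThreadManager`. As a minimal sketch of the same poll-until-condition technique (nothing here is part of the package; `waitFor` is a hypothetical helper name), the core pattern is:

```typescript
// Generic condition-based wait: polls a predicate instead of sleeping for a guessed duration.
// Resolves as soon as the condition holds; rejects if it never holds within timeoutMs.
function waitFor(
  predicate: () => boolean,
  timeoutMs = 5000,
  intervalMs = 10
): Promise<void> {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      if (predicate()) {
        resolve();
      } else if (Date.now() - start > timeoutMs) {
        reject(new Error(`Timeout after ${timeoutMs}ms waiting for condition`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Usage: wait on a flag flipped by asynchronous work, not on a guessed delay.
let done = false;
setTimeout(() => { done = true; }, 50); // stands in for real async work
waitFor(() => done, 1000).then(() => console.log('condition met'));
```

The payoff is the same as in the diffed example: tests resolve as soon as the condition is true, and a failure produces a descriptive timeout error rather than an assertion that passes or fails on scheduler luck.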