npm - sdd-full - Versions diffs - 3.2.0 → 4.2.0 - Mend

sdd-full 3.2.0 → 4.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (127)

package/skills/systematic-debugging/SKILL.md ADDED

@@ -0,0 +1,296 @@
+---
+name: systematic-debugging
+description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
+---
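+# Systematic Debugging
+## Overview
+Random fixes waste time and introduce new bugs. Hasty patches only mask deeper problems.
+**Core principle:** Always find the root cause before attempting a fix. Fixing only the symptom is failure.
+**Going through the motions is the same as abandoning the spirit of debugging.**
+## The Iron Law
+```
+NO FIX PROPOSALS WITHOUT ROOT CAUSE INVESTIGATION FIRST
+```
+If you haven't completed Phase 1, you cannot propose fixes.
+## When to Use
+Use for any technical issue:
+- Test failures
+- Production bugs
+- Unexpected behavior
+- Performance problems
+- Build failures
+- Integration issues
+**Use it especially when:**
+- You are under time pressure (emergencies make guess-and-fix most tempting)
+- "Just one small change" seems like it will do
+- You have already tried multiple fixes
+- The previous fix didn't work
+- You don't fully understand the issue
+**Don't skip it even when:**
+- The issue seems simple (simple bugs have root causes too)
+- You're in a hurry (rushing leads to rework)
+- A manager wants it fixed right now (systematic debugging is faster than trial and error)
+## The Four Phases
+You must complete each phase before moving on to the next.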
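+### Phase 1: Root Cause Investigation
+**Before attempting ANY fix:**
+1. **Read error messages carefully**
+   - Don't skip past errors or warnings
+   - They often contain the exact solution
+   - Read stack traces completely
+   - Note line numbers, file paths, and error codes
+2. **Reproduce consistently**
+   - Can you trigger it reliably?
+   - What are the exact steps?
+   - Does it happen every time?
+   - If it's not reproducible → gather more data; don't guess
+3. **Check recent changes**
+   - What changed that could cause this?
+   - git diff, recent commits
+   - New dependencies, config changes
+   - Environmental differences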
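+4. **Gather evidence in multi-component systems**
+   **When the system has multiple components (CI → build → signing, API → service → database):**
+   **Before proposing fixes, add diagnostic instrumentation:**
+   ```
+   For EACH component boundary:
+     - Log what data enters the component
+     - Log what data exits the component
+     - Verify environment/config propagation
+     - Check state at each layer
+   Run once to gather evidence showing WHERE it breaks
+   THEN analyze the evidence to identify the failing component
+   THEN investigate that specific component
+   ```
+   **Example (multi-layer system):**
+   ```bash
+   # Layer 1: Workflow (print SET/UNSET only, never the secret value itself)
+   echo "=== Secrets available in workflow: ==="
+   echo "IDENTITY: $([ -n "${IDENTITY:-}" ] && echo SET || echo UNSET)"
+   # Layer 2: Build script (check presence without printing the value)
+   echo "=== Env vars in build script: ==="
+   env | grep -q '^IDENTITY=' && echo "IDENTITY in environment" || echo "IDENTITY not in environment"
+   # Layer 3: Signing script
+   echo "=== Keychain state: ==="
+   security list-keychains
+   security find-identity -v
+   # Layer 4: Actual signing
+   codesign --sign "$IDENTITY" --verbose=4 "$APP"
+   ```
+   **This reveals:** which layer fails (secrets → workflow ✓, workflow → build ✗)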
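+5. **Trace data flow**
+   **When the error is deep in the call stack:**
+   See `root-cause-tracing.md` in this directory for the complete backward-tracing technique.
+   **Quick version:**
+   - Where does the bad value originate?
+   - What called this with the bad value?
+   - Keep tracing upward until you find the source
+   - Fix at the source, not at the symptom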
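Steps 4 and 5 can be sketched together in TypeScript. This is a minimal illustration under stated assumptions, not the skill's own tooling: `instrument`, `firstBrokenLayer`, and the workflow → build → sign pipeline are hypothetical, and the build layer drops `IDENTITY` on purpose so the recorded trace has a break to find.

```typescript
// Hypothetical sketch: record what enters and leaves each component boundary,
// then find the first layer whose output is already bad.
type Env = Record<string, string>;

interface BoundaryRecord {
  layer: string;
  input: unknown;
  output: unknown;
}

const trace: BoundaryRecord[] = [];

// Wrap a layer so every call logs its input and output at the boundary.
function instrument<I, O>(layer: string, fn: (input: I) => O): (input: I) => O {
  return (input: I): O => {
    const output = fn(input);
    trace.push({ layer, input, output });
    return output;
  };
}

// Trace toward the source: the first record with an invalid output marks where
// the data flow broke. `isValid` encodes what a healthy value looks like.
function firstBrokenLayer(isValid: (value: unknown) => boolean): string | undefined {
  return trace.find((record) => !isValid(record.output))?.layer;
}

// Hypothetical pipeline: workflow -> build -> sign.
const workflow = instrument('workflow', (env: Env): Env => ({ ...env }));
const build = instrument('build', (env: Env): Env => {
  const forwarded: Env = { ...env };
  delete forwarded['IDENTITY']; // deliberate bug: the build step drops the secret
  return forwarded;
});
const sign = instrument('sign', (env: Env): string => env['IDENTITY'] ?? '');

sign(build(workflow({ IDENTITY: 'Developer ID' })));

// Healthy means: an env object that still carries IDENTITY, or a non-empty identity string.
const hasIdentity = (value: unknown): boolean =>
  typeof value === 'string'
    ? value !== ''
    : typeof value === 'object' && value !== null && 'IDENTITY' in value;

console.log(firstBrokenLayer(hasIdentity)); // build
```

One instrumented run shows the value present after `workflow` and missing after `build`, so the investigation moves to the build step rather than to signing, where the failure first surfaced.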
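+### Phase 2: Pattern Analysis
+**Find the pattern before fixing:**
+1. **Find working examples**
+   - Locate similar working code in the same codebase
+   - What works that is similar to what's broken?
+2. **Compare against references**
+   - If you're implementing a pattern, read the reference implementation completely
+   - Don't skim; read every line
+   - Understand the pattern fully before applying it
+3. **Identify differences**
+   - What is different between the working and the broken code?
+   - List every difference, however small
+   - Don't assume "that can't matter"
+4. **Understand dependencies**
+   - What other components does this need?
+   - What settings, config, and environment?
+   - What implicit assumptions does it make?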
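Step 3 ("list every difference, however small") is easy to do mechanically when the working and broken cases differ in configuration. A minimal TypeScript sketch, assuming both configs are plain JSON-like objects; `listDifferences` is a hypothetical helper, not part of this skill.

```typescript
// Hypothetical helper: list every difference between a working and a broken
// configuration, instead of eyeballing them.
type Config = Record<string, unknown>;

interface Difference {
  key: string;
  working: unknown;
  broken: unknown;
}

function listDifferences(working: Config, broken: Config): Difference[] {
  // Union of keys, so a setting missing from either side is also reported.
  const keys = Array.from(new Set([...Object.keys(working), ...Object.keys(broken)])).sort();
  const diffs: Difference[] = [];
  for (const key of keys) {
    // JSON.stringify is a cheap structural comparison for plain data.
    if (JSON.stringify(working[key]) !== JSON.stringify(broken[key])) {
      diffs.push({ key, working: working[key], broken: broken[key] });
    }
  }
  return diffs;
}

// The trailing space in `region` is exactly the kind of "can't matter" difference.
const diffs = listDifferences(
  { timeoutMs: 5000, retries: 3, region: 'us-east-1' },
  { timeoutMs: 5000, retries: 3, region: 'us-east-1 ', debug: true }
);
console.log(diffs.map((d) => d.key)); // [ 'debug', 'region' ]
```

`JSON.stringify` treats nested objects with a different key order as different; a proper deep-equal would be more precise if that matters for your data.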
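+### Phase 3: Hypothesis and Testing
+**Scientific method:**
+1. **Form a single hypothesis**
+   - State it clearly: "I think X is the root cause because Y"
+   - Write it down
+   - Be specific, not vague
+2. **Test minimally**
+   - Make the SMALLEST possible change to test the hypothesis
+   - Change one variable at a time
+   - Don't fix multiple things at once
+3. **Verify before continuing**
+   - Did it work? Yes → Phase 4
+   - Didn't work? Form a NEW hypothesis
+   - DON'T pile more fixes on top
+4. **When you don't know**
+   - Say "I don't understand X"
+   - Don't pretend to know
+   - Ask for help
+   - Research more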
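When the suspects are configuration values, "change one variable at a time" can be mechanized: start from the working baseline and apply one broken value per run, so each run tests exactly one hypothesis ("key X causes the failure"). A sketch under the assumption that the failure can be checked by a synchronous predicate; `isolateCulprits` and the timeout scenario are hypothetical.

```typescript
// Hypothetical sketch: test one hypothesis per run by changing a single key
// of the known-good baseline at a time.
type Config = Record<string, unknown>;

function isolateCulprits(
  working: Config,
  broken: Config,
  reproduces: (config: Config) => boolean
): string[] {
  const culprits: string[] = [];
  const keys = Array.from(new Set([...Object.keys(working), ...Object.keys(broken)]));
  for (const key of keys) {
    if (JSON.stringify(working[key]) === JSON.stringify(broken[key])) continue;
    // Exactly one variable differs from the baseline in this candidate.
    const candidate: Config = { ...working, [key]: broken[key] };
    if (reproduces(candidate)) culprits.push(key);
  }
  return culprits;
}

const culprits = isolateCulprits(
  { timeoutMs: 5000, region: 'us-east-1', debug: false },
  { timeoutMs: 100, region: 'us-east-1', debug: true },
  // Hypothetical failure: requests time out when timeoutMs is below 1000.
  (config) => Number(config['timeoutMs']) < 1000
);
console.log(culprits); // [ 'timeoutMs' ]
```

If no single change reproduces the failure, the cause is likely an interaction between two or more values; that becomes the next explicit hypothesis, not a reason to change everything at once.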
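+### Phase 4: Implementation
+**Fix the root cause, not the symptom:**
+1. **Create a failing test case**
+   - Simplest possible reproduction
+   - Automated test if possible
+   - A one-off test script if there is no test framework
+   - You MUST have the test before fixing
+   - Use the `superpowers:test-driven-development` skill to write proper failing tests
+2. **Implement a single fix**
+   - Address the identified root cause
+   - ONE change at a time
+   - No "while I'm here" improvements
+   - No bundled refactoring
+3. **Verify the fix**
+   - Does the test pass now?
+   - Are all other tests still passing?
+   - Is the issue actually resolved?
+4. **If the fix doesn't work**
+   - STOP
+   - Count: how many fixes have you tried?
+   - Fewer than 3: return to Phase 1 and re-analyze with the new information
+   - **3 or more: STOP and question the architecture (see step 5 below)**
+   - DON'T attempt fix #4 without an architectural discussion
+5. **If 3 or more fixes have failed: question the architecture**
+   **Patterns that indicate an architectural problem:**
+   - Each fix reveals new shared state, coupling, or problems in a different place
+   - Fixes require "massive refactoring" to implement
+   - Each fix creates new symptoms elsewhere
+   **STOP and question fundamentals:**
+   - Is this pattern fundamentally sound?
+   - Are we sticking with the wrong approach out of sheer inertia?
+   - Should we refactor the architecture or keep patching symptoms?
+   **Discuss with your partner before attempting more fixes.**
+   This is not a failed hypothesis; this is a wrong architecture.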
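Steps 1 to 3 in miniature: write the failing test first, confirm it fails against the current code, apply one fix at the root cause, and rerun. The `chunk` function and its off-by-one bug are hypothetical; the sketch uses Node's built-in `node:assert` in place of a test framework, in the spirit of a one-off test script.

```typescript
import { strict as assert } from 'node:assert';

// Hypothetical function under repair. Symptom: trailing items go missing.
// Root cause (found in Phase 1): the loop only runs while a FULL chunk fits.
function chunkBuggy<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i + size <= items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// The single fix: loop while any items remain.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Written BEFORE the fix: the smallest input that reproduces the bug.
function testKeepsTrailingPartialChunk(impl: typeof chunk): void {
  assert.deepEqual(impl([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]]);
}

// The test fails against the buggy version, proving it captures the bug...
assert.throws(() => testKeepsTrailingPartialChunk(chunkBuggy));
// ...and passes once the root cause is fixed.
testKeepsTrailingPartialChunk(chunk);
console.log('regression test passes');
```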
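+## Red Flags: STOP and Follow the Process
+If you catch yourself thinking:
+- "Quick fix for now, investigate later"
+- "Just try changing X and see if it works"
+- "Change several things at once and run the tests"
+- "Skip the test, I'll verify manually"
+- "It's probably X, let me fix that"
+- "I don't fully understand this, but it should work"
+- "The pattern says X, but I'll apply it differently"
+- "Here are the main problems: [lists fixes without any investigation]"
+- Proposing solutions before tracing the data flow
+- **"One more fix attempt" (after 2 or more attempts have already failed)**
+- **Each fix reveals a new problem in a different place**
+**ALL of these mean: STOP. Return to Phase 1.**
+**If 3 or more fixes have failed:** question the architecture (see Phase 4, step 5)
+## Partner Signals That You're Doing It Wrong
+**Watch for these redirections:**
+- "Isn't that the case?" → you made an assumption without verifying it
+- "Would it tell us...?" → you should have gathered evidence first
+- "Stop guessing" → you're proposing fixes without understanding the problem
+- "Think deeper" → question fundamentals, not just symptoms
+- "Are we stuck?" (frustrated tone) → your approach isn't working
+**When you see these signals:** STOP. Return to Phase 1.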
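+## Common Rationalizations
+| Excuse | Reality |
+|------|------|
+| "The issue is simple, no need for the process" | Simple issues have root causes too. For a simple bug, the process is quick. |
+| "It's an emergency, no time for the process" | Systematic debugging is faster than guess-and-check thrashing. |
+| "Try this first, then investigate" | The first fix sets the pattern. Do it right from the start. |
+| "I'll write the test once I've confirmed the fix works" | Untested fixes don't stick. Writing the test first is what proves the fix works. |
+| "Fixing several issues at once saves time" | You can't isolate which change worked, and it introduces new bugs. |
+| "The reference is too long, I'll adapt it myself" | Partial understanding guarantees bugs. Read it completely. |
+| "I see the problem, let me fix it" | Seeing the symptom ≠ understanding the root cause. |
+| "One more fix attempt" (after 2 or more failures) | 3 or more failures = an architectural problem. Question the pattern; stop patching. |
+## Quick Reference
+| Phase | Key Activities | Success Criteria |
+|------|---------|---------|
+| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | You understand WHAT is broken and WHY |
+| **2. Pattern** | Find working examples, compare | Differences identified |
+| **3. Hypothesis** | Form a theory, test minimally | Hypothesis confirmed, or a new one formed |
+| **4. Implementation** | Create a test, fix, verify | Bug fixed, tests pass |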
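+## When the Process Reveals "No Root Cause"
+If systematic investigation shows the issue really is environmental, timing-dependent, or caused by external factors:
+1. You have completed the process
+2. Document what you investigated
+3. Implement appropriate handling (retries, timeouts, error messages)
+4. Add monitoring/logging for future investigation
+**But:** 95% of "no root cause" cases are actually incomplete investigation.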
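For steps 3 and 4, one common shape of "appropriate handling" is a bounded retry with a per-attempt timeout and a log line for every failure. The sketch below is an assumption about what that might look like in TypeScript, not something this skill prescribes; `withRetry` and its option names are hypothetical.

```typescript
// Hypothetical helper: bounded retries, a timeout per attempt, linear backoff,
// and a warning per failure so later investigation has data to work with.
async function withRetry<T>(
  operation: () => Promise<T>,
  { attempts = 3, timeoutMs = 2000, delayMs = 100 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`attempt ${attempt} timed out after ${timeoutMs}ms`)),
          timeoutMs
        );
      });
      return await Promise.race([operation(), timeout]);
    } catch (error) {
      lastError = error;
      console.warn(`attempt ${attempt}/${attempts} failed:`, error);
      if (attempt < attempts) await new Promise((r) => setTimeout(r, delayMs * attempt));
    } finally {
      clearTimeout(timer); // don't leave the timeout timer running
    }
  }
  throw new Error(`operation failed after ${attempts} attempts: ${String(lastError)}`);
}

// Usage with a hypothetical flaky dependency that fails twice, then succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error('ECONNRESET');
  return 'ok';
};
withRetry(flaky, { delayMs: 10 }).then((value) => console.log(value, calls)); // ok 3
```

`Promise.race` only stops waiting; it does not cancel a timed-out operation, which keeps running in the background. If the operation supports cancellation (for example through an `AbortSignal`), pass one in as well.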
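+## Supporting Techniques
+These techniques are part of systematic debugging and are available in this directory:
+- **`root-cause-tracing.md`** - Trace bugs backward through the call stack to find the original trigger
+- **`defense-in-depth.md`** - After finding the root cause, add validation at multiple layers
+- **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling
+**Related skills:**
+- **superpowers:test-driven-development** - For creating the failing test case (Phase 4, step 1)
+- **superpowers:verification-before-completion** - Verify the fix actually works before claiming success
+## Real-World Impact
+From debugging sessions:
+- Systematic approach: 15-30 minutes to fix
+- Random-fix approach: 2-3 hours of thrashing
+- First-time fix rate: 95% vs 40%
+- New bugs introduced: near zero vs common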

package/skills/systematic-debugging/condition-based-waiting-example.ts ADDED

@@ -0,0 +1,158 @@
+// Complete implementation of condition-based waiting utilities
+// From: Lace test infrastructure improvements (2025-10-03)
+// Context: Fixed 15 flaky tests by replacing arbitrary timeouts
+import type { ThreadManager } from '~/threads/thread-manager';
+import type { LaceEvent, LaceEventType } from '~/threads/types';
+/**
+ * Wait for a specific event type to appear in thread
+ *
+ * @param threadManager - The thread manager to query
+ * @param threadId - Thread to check for events
+ * @param eventType - Type of event to wait for
+ * @param timeoutMs - Maximum time to wait (default 5000ms)
+ * @returns Promise resolving to the first matching event
+ *
+ * Example:
+ *   await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
+ */
+export function waitForEvent(
+  threadManager: ThreadManager,
+  threadId: string,
+  eventType: LaceEventType,
+  timeoutMs = 5000
+): Promise<LaceEvent> {
+  return new Promise((resolve, reject) => {
+    const startTime = Date.now();
+    const check = () => {
+      const events = threadManager.getEvents(threadId);
+      const event = events.find((e) => e.type === eventType);
+      if (event) {
+        resolve(event);
+      } else if (Date.now() - startTime > timeoutMs) {
+        reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
+      } else {
+        setTimeout(check, 10); // Poll every 10ms for efficiency
+      }
+    };
+    check();
+  });
+}
+/**
+ * Wait for a specific number of events of a given type
+ *
+ * @param threadManager - The thread manager to query
+ * @param threadId - Thread to check for events
+ * @param eventType - Type of event to wait for
+ * @param count - Number of events to wait for
+ * @param timeoutMs - Maximum time to wait (default 5000ms)
+ * @returns Promise resolving to all matching events once count is reached
+ *
+ * Example:
+ *   // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
+ *   await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
+ */
+export function waitForEventCount(
+  threadManager: ThreadManager,
+  threadId: string,
+  eventType: LaceEventType,
+  count: number,
+  timeoutMs = 5000
+): Promise<LaceEvent[]> {
+  return new Promise((resolve, reject) => {
+    const startTime = Date.now();
+    const check = () => {
+      const events = threadManager.getEvents(threadId);
+      const matchingEvents = events.filter((e) => e.type === eventType);
+      if (matchingEvents.length >= count) {
+        resolve(matchingEvents);
+      } else if (Date.now() - startTime > timeoutMs) {
+        reject(
+          new Error(
+            `Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`
+          )
+        );
+      } else {
+        setTimeout(check, 10);
+      }
+    };
+    check();
+  });
+}
+/**
+ * Wait for an event matching a custom predicate
+ * Useful when you need to check event data, not just type
+ *
+ * @param threadManager - The thread manager to query
+ * @param threadId - Thread to check for events
+ * @param predicate - Function that returns true when event matches
+ * @param description - Human-readable description for error messages
+ * @param timeoutMs - Maximum time to wait (default 5000ms)
+ * @returns Promise resolving to the first matching event
+ *
+ * Example:
+ *   // Wait for TOOL_RESULT with specific ID
+ *   await waitForEventMatch(
+ *     threadManager,
+ *     agentThreadId,
+ *     (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
+ *     'TOOL_RESULT with id=call_123'
+ *   );
+ */
+export function waitForEventMatch(
+  threadManager: ThreadManager,
+  threadId: string,
+  predicate: (event: LaceEvent) => boolean,
+  description: string,
+  timeoutMs = 5000
+): Promise<LaceEvent> {
+  return new Promise((resolve, reject) => {
+    const startTime = Date.now();
+    const check = () => {
+      const events = threadManager.getEvents(threadId);
+      const event = events.find(predicate);
+      if (event) {
+        resolve(event);
+      } else if (Date.now() - startTime > timeoutMs) {
+        reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
+      } else {
+        setTimeout(check, 10);
+      }
+    };
+    check();
+  });
+}
+// Usage example from actual debugging session:
+//
+// BEFORE (flaky):
+// ---------------
+// const messagePromise = agent.sendMessage('Execute tools');
+// await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
+// agent.abort();
+// await messagePromise;
+// await new Promise(r => setTimeout(r, 50));  // Hope results arrive in 50ms
+// expect(toolResults.length).toBe(2);         // Fails randomly
+//
+// AFTER (reliable):
+// ----------------
+// const messagePromise = agent.sendMessage('Execute tools');
+// await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
+// agent.abort();
+// await messagePromise;
+// await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
+// expect(toolResults.length).toBe(2); // Always succeeds
+//
+// Result: 60% pass rate → 100%, 40% faster execution

package/skills/systematic-debugging/condition-based-waiting.md ADDED

@@ -0,0 +1,115 @@
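+# Condition-Based Waiting
+## Overview
+Flaky tests often guess at timing with arbitrary delays. This creates race conditions: the test passes on a fast machine but fails under load or in CI.
+**Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.
+## When to Use
+```dot
+digraph when_to_use {
+    "Test uses setTimeout/sleep?" [shape=diamond];
+    "Testing timing behavior?" [shape=diamond];
+    "Document WHY the timeout is needed" [shape=box];
+    "Use condition-based waiting" [shape=box];
+    "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
+    "Testing timing behavior?" -> "Document WHY the timeout is needed" [label="yes"];
+    "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
+}
+```
+**Use when:**
+- Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
+- Tests are flaky (pass sometimes, fail under load)
+- Tests time out when run in parallel
+- Waiting for async operations to complete
+**Don't use when:**
+- Testing actual timing behavior (debounce, throttle intervals)
+- Whenever you do use an arbitrary timeout, always document WHY in a comment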
+## Core Pattern
+```typescript
+// ❌ BEFORE: guessing at timing
+await new Promise(r => setTimeout(r, 50));
+const result = getResult();
+expect(result).toBeDefined();
+// ✅ AFTER: waiting for the condition
+await waitFor(() => getResult() !== undefined);
+const result = getResult();
+expect(result).toBeDefined();
+```
+## Quick Patterns
+| Scenario | Pattern |
+|------|------|
+| 等待事件 | `waitFor(() => events.find(e => e.type === 'DONE'))` |
+| 等待状态 | `waitFor(() => machine.state === 'ready')` |
+| 等待数量 | `waitFor(() => items.length >= 5)` |
+| 等待文件 | `waitFor(() => fs.existsSync(path))` |
+| 复合条件 | `waitFor(() => obj.ready && obj.value > 10)` |
+## Implementation
+Generic polling function:
+```typescript
+async function waitFor<T>(
+  condition: () => T | undefined | null | false,
+  description = 'condition', // optional, so waitFor(() => ...) also works; used in the timeout error
+  timeoutMs = 5000
+): Promise<T> {
+  const startTime = Date.now();
+  while (true) {
+    const result = condition(); // re-evaluated on every poll, so the data is never stale
+    if (result) return result;
+    if (Date.now() - startTime > timeoutMs) {
+      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
+    }
+    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
+  }
+}
+```
+See `condition-based-waiting-example.ts` in this directory for a complete implementation with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) taken from an actual debugging session.
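+## Common Mistakes
+**❌ Polling too fast:** `setTimeout(check, 1)` wastes CPU
+**✅ Fix:** Poll every 10ms
+**❌ No timeout:** the loop runs forever if the condition is never met
+**✅ Fix:** Always set a timeout with a clear error message
+**❌ Stale data:** caching state outside the loop
+**✅ Fix:** Call the getter inside the loop to get fresh data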
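The stale-data mistake is easy to miss because the code still looks like it polls. The self-contained sketch below copies the generic `waitFor` from this file and runs both versions against a hypothetical `machine` object that becomes ready after 50ms: the cached read can never change and times out, while the getter called inside the loop resolves.

```typescript
// Copy of the generic polling helper from this file.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description = 'condition',
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise((r) => setTimeout(r, 10)); // Poll every 10ms
  }
}

// Hypothetical state that changes asynchronously.
const machine = { state: 'starting' };
setTimeout(() => { machine.state = 'ready'; }, 50);

// ❌ Stale: the state is read once, outside the loop, so the condition is frozen.
const cached = machine.state;
const stale = waitFor(() => cached === 'ready', 'ready (cached)', 200);

// ✅ Fresh: the state is read inside the condition on every poll.
const fresh = waitFor(() => machine.state === 'ready', 'ready (fresh)', 200);

fresh.then((value) => console.log('fresh:', value)); // fresh: true
stale.catch((error: Error) => console.log('stale:', error.message));
// stale: Timeout waiting for ready (cached) after 200ms
```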
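+## When an Arbitrary Timeout IS Correct
+```typescript
+// The tool ticks every 100ms; we need 2 ticks to verify partial output
+await waitForEvent(manager, threadId, 'TOOL_STARTED'); // First: wait for the condition
+await new Promise(r => setTimeout(r, 200));             // Then: wait for timed behavior with a known basis
+// 200ms = 2 ticks at 100ms intervals, documented and justified
+```
+**Requirements:**
+1. First wait for the triggering condition
+2. Base the delay on known timing (not guesswork)
+3. Add a comment explaining WHY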
+## Real-World Impact
+From a debugging session (2025-10-03):
+- Fixed 15 flaky tests across 3 files
+- Pass rate: 60% → 100%
+- Execution time: 40% faster
+- No more race conditions

package/skills/systematic-debugging/defense-in-depth.md ADDED

@@ -0,0 +1,122 @@
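+# Defense-in-Depth Validation
+## Overview
+When you fix a bug caused by invalid data, adding validation in one place seems like enough. But that single check can be bypassed by a different code path, a refactor, or a mock.
+**Core principle:** Validate at EVERY layer the data passes through. Make the bug structurally impossible.
+## Why Multiple Layers
+Single validation: "We fixed the bug"
+Multiple layers: "We made the bug impossible"
+Different layers catch different problems:
+- Entry validation catches most bugs
+- Business-logic validation catches edge cases
+- Environment guards prevent dangerous operations in specific contexts
+- Debug logging helps when the other layers fail
+## The Four Layers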
+### Layer 1: Entry-Point Validation
+**Purpose:** Reject obviously invalid input at the API boundary
+```typescript
+import { existsSync, statSync } from 'node:fs';
+
+function createProject(name: string, workingDirectory: string) {
+  if (!workingDirectory || workingDirectory.trim() === '') {
+    throw new Error('workingDirectory cannot be empty');
+  }
+  if (!existsSync(workingDirectory)) {
+    throw new Error(`workingDirectory does not exist: ${workingDirectory}`);
+  }
+  if (!statSync(workingDirectory).isDirectory()) {
+    throw new Error(`workingDirectory is not a directory: ${workingDirectory}`);
+  }
+  // ... continue processing
+}
+```
+### Layer 2: Business-Logic Validation
+**Purpose:** Ensure the data makes sense for this operation
+```typescript
+function initializeWorkspace(projectDir: string, sessionId: string) {
+  if (!projectDir) {
+    throw new Error('projectDir required for workspace initialization');
+  }
+  // ... continue processing
+}
+```
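+### Layer 3: Environment Guards
+**Purpose:** Prevent dangerous operations in specific contexts
+```typescript
+import { tmpdir } from 'node:os';
+import { normalize, resolve, sep } from 'node:path';
+
+async function gitInit(directory: string) {
+  // In tests, refuse to run git init outside the temp directory
+  if (process.env.NODE_ENV === 'test') {
+    const normalized = normalize(resolve(directory));
+    const tmpDir = normalize(resolve(tmpdir()));
+    // Compare against tmpDir + separator so a sibling such as `/tmp-other` is not accepted
+    if (normalized !== tmpDir && !normalized.startsWith(tmpDir + sep)) {
+      throw new Error(
+        `Refusing git init outside temp dir during tests: ${directory}`
+      );
+    }
+  }
+  // ... continue processing
+}
+```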
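A plain string-prefix test on paths is easy to get wrong: with a temp dir of `/tmp`, the path `/tmp-other/repo` also "starts with" it. One alternative, sketched here under the assumption of POSIX-style paths, is to ask `path.relative` how to get from the parent to the child; `isInside` and `isInsideNaive` are hypothetical helper names.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Naive guard: a plain string prefix, so `/tmp-other` counts as inside `/tmp`.
function isInsideNaive(parent: string, child: string): boolean {
  return resolve(child).startsWith(resolve(parent));
}

// Stricter guard: climbing out via `..`, or landing on another root
// (a different drive on Windows), means child is outside parent.
// Equal paths give '' and count as inside, matching the guard above.
function isInside(parent: string, child: string): boolean {
  const rel = relative(resolve(parent), resolve(child));
  return rel !== '..' && !rel.startsWith('..' + sep) && !isAbsolute(rel);
}

console.log(isInsideNaive('/tmp', '/tmp-other/repo')); // true (wrong)
console.log(isInside('/tmp', '/tmp-other/repo')); // false
console.log(isInside('/tmp', '/tmp/test-123/repo')); // true
```

Neither version resolves symlinks. On macOS, for instance, `os.tmpdir()` usually lives under `/var`, which is a symlink to `/private/var`, so comparing real paths (`fs.realpathSync`) may also be needed.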
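+### Layer 4: Debug Instrumentation
+**Purpose:** Capture context for after-the-fact analysis
+```typescript
+async function gitInit(directory: string) {
+  const stack = new Error().stack; // records who called gitInit
+  // `logger` is the project's own logger; it is not defined in this snippet
+  logger.debug('About to git init', {
+    directory,
+    cwd: process.cwd(),
+    stack,
+  });
+  // ... continue processing
+}
+```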
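+## Applying the Pattern
+When you find a bug:
+1. **Trace the data flow:** where does the bad value originate? Where is it used?
+2. **Map all checkpoints:** list every point the data passes through
+3. **Add validation at each layer:** entry, business logic, environment, debug
+4. **Test each layer:** try to bypass layer 1 and verify that layer 2 catches it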
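Step 4 is what separates real layers from a single check written twice. A minimal sketch, assuming simplified, hypothetical versions of `createProject` and `createWorkspace` loosely modeled on the real-world example in this file: each layer is tested on its own, including by calling layer 2 directly the way a mock or another code path would.

```typescript
import { strict as assert } from 'node:assert';

// Layer 2 (business logic): validates on its own, without trusting callers.
function createWorkspace(projectDir: string): string {
  if (!projectDir) {
    throw new Error('projectDir required for workspace initialization');
  }
  return `${projectDir}/.workspace`;
}

// Layer 1 (entry point): validates before delegating to layer 2.
function createProject(name: string, projectDir: string): { name: string; workspace: string } {
  if (projectDir.trim() === '') {
    throw new Error('workingDirectory cannot be empty');
  }
  return { name, workspace: createWorkspace(projectDir) };
}

// Layer 1 rejects bad input through the public entry point.
assert.throws(() => createProject('demo', ''), /workingDirectory cannot be empty/);

// Bypass layer 1 and check that layer 2 still catches the same bad value.
assert.throws(() => createWorkspace(''), /projectDir required/);

// Valid input still flows through both layers.
assert.deepEqual(createProject('demo', '/tmp/demo'), { name: 'demo', workspace: '/tmp/demo/.workspace' });
console.log('each layer rejects an empty directory on its own');
```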
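+## Real-World Example
+Bug: an empty `projectDir` caused `git init` to run in the source code directory
+**Data flow:**
+1. Test setup → empty string
+2. `Project.create(name, '')`
+3. `WorkspaceManager.createWorkspace('')`
+4. `git init` runs in `process.cwd()`
+**Four layers added:**
+- Layer 1: `Project.create()` validates not empty / exists / writable
+- Layer 2: `WorkspaceManager` validates that projectDir is not empty
+- Layer 3: `WorktreeManager` refuses to run git init outside tmpdir during tests
+- Layer 4: stack-trace logging before git init
+**Result:** all 1847 tests pass, and the bug can no longer be reproduced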
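The data flow above ends with `git init` running in `process.cwd()`. The source does not say which call turned `''` into the working directory; one common way this happens in Node is that the `node:path` helpers treat an empty string as "here" rather than as an error, as this small hypothetical sketch shows.

```typescript
import { join, resolve } from 'node:path';

// Hypothetical illustration of the bad value from test setup.
const projectDir = '';

// path.resolve ignores zero-length segments; with nothing left, it returns cwd.
const workspaceDir = resolve(projectDir);
console.log(workspaceDir === process.cwd()); // true

// path.join also drops the empty segment, leaving a cwd-relative path.
const gitDir = join(projectDir, '.git');
console.log(gitDir); // .git
```

Because neither call throws, nothing downstream notices the bad value, which is why the layers above check for emptiness explicitly.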
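+## Key Insight
+All four layers were necessary. During testing, each layer caught bugs the others missed:
+- Different code paths bypassed entry validation
+- Mocks bypassed business-logic checks
+- Edge cases on different platforms needed environment guards
+- Debug logging exposed structural misuse
+**Don't stop at one validation point.** Add checks at every layer.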