npm - @shirlytaylor73/superharness - Versions diffs - 1.5.0 - Mend

@shirlytaylor73/superharness 1.5.0


Files changed (99)

package/plugins/superharness/skills/systematic-debugging/condition-based-waiting.md ADDED

@@ -0,0 +1,115 @@
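+# Condition-Based Waiting
+## Overview
+Flaky tests often guess at timing with hard-coded delays. This creates race conditions: the test passes on a fast machine and fails under load or in CI.
+**Core principle:** Wait for the condition you actually care about, not for a guess at how long it will take.
+## When to Use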
+```dot
+digraph when_to_use {
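+    "Test uses setTimeout/sleep?" [shape=diamond];
+    "Testing timing behavior?" [shape=diamond];
+    "Document why the timeout is needed" [shape=box];
+    "Use condition-based waiting" [shape=box];
+    "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
+    "Testing timing behavior?" -> "Document why the timeout is needed" [label="yes"];
+    "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];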
+}
+```
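+**Use when:**
+- Tests contain hard-coded delays (`setTimeout`, `sleep`, `time.sleep()`)
+- Tests are flaky (pass sometimes, fail under load)
+- Tests time out when run in parallel
+- Waiting for an asynchronous operation to complete
+**Don't use when:**
+- Testing actual timing behavior (debounce, throttle intervals)
+- If you do use a hard-coded timeout, always add a comment explaining why
+## Core Pattern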
+```typescript
+// ❌ Before: guessing at timing
+await new Promise(r => setTimeout(r, 50));
+const result = getResult();
+expect(result).toBeDefined();
+// ✅ After: waiting for the condition to hold
+await waitFor(() => getResult() !== undefined, 'result to be defined');
+const result = getResult();
+expect(result).toBeDefined();
+```
+## Quick Patterns
+| Scenario | Pattern |
+|----------|---------|
+| Wait for an event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
+| Wait for a state | `waitFor(() => machine.state === 'ready')` |
+| Wait for a count | `waitFor(() => items.length >= 5)` |
+| Wait for a file | `waitFor(() => fs.existsSync(path))` |
+| Compound condition | `waitFor(() => obj.ready && obj.value > 10)` |
+## Implementation
+A generic polling function:
+```typescript
+async function waitFor<T>(
+  condition: () => T | undefined | null | false,
+  description = 'condition', // used in the timeout error message
+  timeoutMs = 5000
+): Promise<T> {
+  const startTime = Date.now();
+  while (true) {
+    const result = condition(); // re-evaluated on every pass, so the data is never stale
+    if (result) return result;
+    if (Date.now() - startTime > timeoutMs) {
+      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
+    }
+    await new Promise(r => setTimeout(r, 10)); // poll every 10ms
+  }
+}
+```
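+See `condition-based-waiting-example.ts` in this directory for the complete implementation, along with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) that came out of a real debugging session.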
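As a rough illustration of what a domain-specific helper could look like, the sketch below layers a `waitForEvent` on top of the same polling loop. The `AppEvent` shape and the `getEvents` accessor are assumptions made for this example; the helpers in `condition-based-waiting-example.ts` may be structured differently.

```typescript
// Hypothetical event shape, assumed for illustration only.
interface AppEvent {
  type: string;
  payload?: unknown;
}

// Generic polling helper, restated here so the sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000,
  intervalMs = 10
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition(); // re-evaluated on every iteration, never cached
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}

// Resolves with the first event of the given type, or rejects on timeout.
function waitForEvent(
  getEvents: () => AppEvent[],
  eventType: string,
  timeoutMs = 5000
): Promise<AppEvent> {
  return waitFor(
    () => getEvents().find(e => e.type === eventType),
    `${eventType} event`,
    timeoutMs
  );
}
```

Passing an accessor function rather than the array itself keeps the helper honest about freshness: each poll asks for the current list instead of inspecting a snapshot.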
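+## Common Mistakes
+**❌ Polling too often:** `setTimeout(check, 1)` wastes CPU
+**✅ Fix:** Poll every 10ms
+**❌ No timeout:** Loops forever if the condition is never met
+**✅ Fix:** Always set a timeout and provide a clear error message
+**❌ Stale data:** State is cached outside the loop
+**✅ Fix:** Call the getter inside the loop to get fresh data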
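The stale-data mistake is easiest to see side by side. In this sketch (the `machine` object is a stand-in invented for the example), one condition closes over a value captured before the change, while the other reads the state each time it is called. Only the second can ever observe the update.

```typescript
// Stand-in for some object whose state changes asynchronously.
const machine = { state: 'starting' };

// Mistake: the state is captured once, outside the polling loop.
const cachedState = machine.state;
const staleCondition = (): boolean => cachedState === 'ready';

// Fix: the read happens inside the condition, so every poll sees fresh data.
const freshCondition = (): boolean => machine.state === 'ready';

// Simulate the asynchronous work finishing.
machine.state = 'ready';

const staleResult = staleCondition();
const freshResult = freshCondition();
console.log({ staleResult, freshResult });
```

A polling loop built on `staleCondition` would run until its timeout no matter what happens to `machine`, which is why the timeout and the fresh read belong together.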
+## When a Hard-Coded Timeout Is Correct
+```typescript
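+// The tool ticks every 100ms; 2 ticks are needed to verify partial output
+await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for the condition
+await new Promise(r => setTimeout(r, 200));   // Then: wait for behavior with a known timing basis
+// 200ms = 2 ticks at a 100ms interval: documented and justified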
+```
+**Requirements:**
+1. Wait for the triggering condition first
+2. Base the delay on known timing, not a guess
+3. Add a comment explaining why
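One way to keep such a delay honest is to derive it from named constants instead of typing a bare number. The constant names and the `delayForTicks` helper below are assumptions that mirror the 100ms tick example; they are not part of any real API.

```typescript
// Assumed values taken from the example: the tool ticks every 100ms,
// and the test needs to observe 2 ticks of partial output.
const TICK_INTERVAL_MS = 100;
const TICKS_TO_OBSERVE = 2;

// The delay is computed from known timing, so the reasoning lives in the code.
function delayForTicks(ticks: number, tickIntervalMs: number): number {
  if (!Number.isInteger(ticks) || ticks <= 0) {
    throw new Error(`ticks must be a positive integer, got ${ticks}`);
  }
  return ticks * tickIntervalMs;
}

const partialOutputDelayMs = delayForTicks(TICKS_TO_OBSERVE, TICK_INTERVAL_MS);
console.log(partialOutputDelayMs);

// Usage inside a test, after the triggering condition has been awaited:
//   await new Promise(r => setTimeout(r, partialOutputDelayMs));
```

If the tick interval ever changes, the delay follows the constant and the test does not silently drift back into guessing.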
+## Real-World Impact
+From a debugging session (2025-10-03):
+- Fixed 15 flaky tests across 3 files
+- Pass rate: 60% → 100%
+- Execution time: 40% faster
+- No more race conditions

package/plugins/superharness/skills/systematic-debugging/defense-in-depth.md ADDED

@@ -0,0 +1,122 @@
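+# Defense-in-Depth Validation
+## Overview
+When you fix a bug caused by invalid data, adding validation in one place seems sufficient. But that single check can be bypassed by a different code path, a refactor, or a mock.
+**Core principle:** Validate at every layer the data passes through. Make the bug structurally impossible.
+## Why Multiple Layers
+Single validation: "We fixed the bug"
+Multiple layers: "We made the bug impossible"
+Different layers catch different problems:
+- Entry validation catches most bugs
+- Business-logic validation catches edge cases
+- Environment guards prevent dangerous operations in specific contexts
+- Debug logging helps when the other layers fail
+## The Four Layers
+### Layer 1: Entry-Point Validation
+**Purpose:** Reject obviously invalid input at the API boundary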
+```typescript
+import { existsSync, statSync } from 'node:fs';
+function createProject(name: string, workingDirectory: string) {
+  if (!workingDirectory || workingDirectory.trim() === '') {
+    throw new Error('workingDirectory cannot be empty');
+  }
+  if (!existsSync(workingDirectory)) {
+    throw new Error(`workingDirectory does not exist: ${workingDirectory}`);
+  }
+  if (!statSync(workingDirectory).isDirectory()) {
+    throw new Error(`workingDirectory is not a directory: ${workingDirectory}`);
+  }
+  // ... continue processing
+}
+```
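+### Layer 2: Business-Logic Validation
+**Purpose:** Ensure the data makes sense for the current operation
+```typescript
+function initializeWorkspace(projectDir: string, sessionId: string) {
+  if (!projectDir) {
+    throw new Error('projectDir required for workspace initialization');
+  }
+  // ... continue processing
+}
+```
+### Layer 3: Environment Guards
+**Purpose:** Prevent dangerous operations in specific environments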
+```typescript
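+import { tmpdir } from 'node:os';
+import { normalize, resolve, sep } from 'node:path';
+async function gitInit(directory: string) {
+  // In tests, refuse to run git init outside the temp directory
+  if (process.env.NODE_ENV === 'test') {
+    const normalized = normalize(resolve(directory));
+    const tmpDir = normalize(resolve(tmpdir()));
+    // Compare with a trailing separator so a sibling such as "/tmpfoo" is not accepted
+    if (normalized !== tmpDir && !normalized.startsWith(tmpDir + sep)) {
+      throw new Error(
+        `Refusing git init outside temp dir during tests: ${directory}`
+      );
+    }
+  }
+  // ... continue processing
+}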
+```
+### Layer 4: Debug Instrumentation
+**Purpose:** Record context for later forensic analysis
+```typescript
+async function gitInit(directory: string) {
+  const stack = new Error().stack;
+  logger.debug('About to git init', {
+    directory,
+    cwd: process.cwd(),
+    stack,
+  });
+  // ... continue processing
+}
+```
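+## Applying the Pattern
+When you find a bug:
+1. **Trace the data flow:** Where does the bad value originate? Where is it used?
+2. **Map every checkpoint:** List every point the data passes through
+3. **Add validation at each layer:** Entry, business logic, environment, debug
+4. **Test each layer:** Try to bypass Layer 1 and verify that Layer 2 catches it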
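Step 4 can be made concrete with a small check that calls the inner layer directly, as a mock or an alternate code path might, and confirms that it still rejects bad data. The functions below are simplified stand-ins modeled on the Layer 1 and Layer 2 examples, not the project's actual code, and `errorMessageOf` is a helper invented for this sketch.

```typescript
// Layer 1 stand-in: entry-point validation.
function createProject(name: string, workingDirectory: string): string {
  if (!workingDirectory || workingDirectory.trim() === '') {
    throw new Error('workingDirectory cannot be empty');
  }
  return initializeWorkspace(workingDirectory, `${name}-session`);
}

// Layer 2 stand-in: business-logic validation, independent of Layer 1.
function initializeWorkspace(projectDir: string, sessionId: string): string {
  if (!projectDir) {
    throw new Error('projectDir required for workspace initialization');
  }
  return `${projectDir}/${sessionId}`;
}

// Returns the error message thrown by fn, or null if it did not throw.
function errorMessageOf(fn: () => unknown): string | null {
  try {
    fn();
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// Normal path: Layer 1 catches the empty value.
const viaEntry = errorMessageOf(() => createProject('demo', ''));
// Bypass path: skip Layer 1 and call Layer 2 directly.
const viaBypass = errorMessageOf(() => initializeWorkspace('', 'demo-session'));
console.log({ viaEntry, viaBypass });
```

Each path fails with a different message, which shows that the second layer holds on its own rather than relying on the first.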
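+## Example from a Real Session
+Bug: an empty `projectDir` caused `git init` to run in the source code directory
+**Data flow:**
+1. Test setup → empty string
+2. `Project.create(name, '')`
+3. `WorkspaceManager.createWorkspace('')`
+4. `git init` runs in `process.cwd()`
+**Four layers added:**
+- Layer 1: `Project.create()` validates not empty/exists/writable
+- Layer 2: `WorkspaceManager` validates that projectDir is not empty
+- Layer 3: `WorktreeManager` refuses git init outside tmpdir in tests
+- Layer 4: Stack trace logged before git init
+**Result:** All 1847 tests passed, and the bug can no longer be reproduced
+## Key Insight
+All four layers were necessary. During testing, each layer caught bugs the others missed:
+- Different code paths bypassed entry validation
+- Mocks bypassed business-logic checks
+- Edge cases on different platforms required environment guards
+- Debug logging revealed structural misuse
+**Don't stop at one validation point.** Add checks at every layer.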

package/plugins/superharness/skills/systematic-debugging/find-polluter.sh ADDED

@@ -0,0 +1,63 @@
+#!/usr/bin/env bash
+# Script to find which test creates unwanted files/state
+# (runs test files one at a time and stops at the first polluter)
+# Usage: ./find-polluter.sh <file_or_dir_to_check> <test_pattern>
+# Example: ./find-polluter.sh '.git' 'src/**/*.test.ts'
+set -e
+if [ $# -ne 2 ]; then
+  echo "Usage: $0 <file_to_check> <test_pattern>"
+  echo "Example: $0 '.git' 'src/**/*.test.ts'"
+  exit 1
+fi
+POLLUTION_CHECK="$1"
+TEST_PATTERN="$2"
+echo "🔍 Searching for test that creates: $POLLUTION_CHECK"
+echo "Test pattern: $TEST_PATTERN"
+echo ""
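+# Get list of test files. find prints paths with a leading "./", so the
+# pattern needs the same prefix or -path never matches.
+TEST_FILES=$(find . -path "./${TEST_PATTERN#./}" | sort)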
+TOTAL=$(echo "$TEST_FILES" | wc -l | tr -d ' ')
+echo "Found $TOTAL test files"
+echo ""
+COUNT=0
+for TEST_FILE in $TEST_FILES; do
+  COUNT=$((COUNT + 1))
+  # Skip if pollution already exists
+  if [ -e "$POLLUTION_CHECK" ]; then
+    echo "⚠️  Pollution already exists before test $COUNT/$TOTAL"
+    echo "   Skipping: $TEST_FILE"
+    continue
+  fi
+  echo "[$COUNT/$TOTAL] Testing: $TEST_FILE"
+  # Run the test
+  npm test "$TEST_FILE" > /dev/null 2>&1 || true
+  # Check if pollution appeared
+  if [ -e "$POLLUTION_CHECK" ]; then
+    echo ""
+    echo "🎯 FOUND POLLUTER!"
+    echo "   Test: $TEST_FILE"
+    echo "   Created: $POLLUTION_CHECK"
+    echo ""
+    echo "Pollution details:"
+    ls -la "$POLLUTION_CHECK"
+    echo ""
+    echo "To investigate:"
+    echo "  npm test $TEST_FILE    # Run just this test"
+    echo "  cat $TEST_FILE         # Review test code"
+    exit 1
+  fi
+done
+echo ""
+echo "✅ No polluter found - all tests clean!"
+exit 0

package/plugins/superharness/skills/systematic-debugging/root-cause-tracing.md ADDED

@@ -0,0 +1,169 @@
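+# Root Cause Tracing
+## Overview
+Bugs often surface deep in the call stack (git init in the wrong directory, a file created in the wrong location, a database opened with the wrong path). Your instinct is to fix the error where it appears, but that only treats the symptom.
+**Core principle:** Trace backward through the call chain until you find the original trigger, then fix it at the source.
+## When to Use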
+```dot
+digraph when_to_use {
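+    "Bug appears deep in the stack?" [shape=diamond];
+    "Can you trace backward?" [shape=diamond];
+    "Fix at the symptom" [shape=box];
+    "Trace to the original trigger" [shape=box];
+    "Better: also add defense in depth" [shape=box];
+    "Bug appears deep in the stack?" -> "Can you trace backward?" [label="yes"];
+    "Can you trace backward?" -> "Trace to the original trigger" [label="yes"];
+    "Can you trace backward?" -> "Fix at the symptom" [label="no: dead end"];
+    "Trace to the original trigger" -> "Better: also add defense in depth";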
+}
+```
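+**Use when:**
+- The error happens deep in execution (not at the entry point)
+- The stack trace shows a long call chain
+- It is unclear where the invalid data came from
+- You need to find which test or code path triggers the problem
+## The Tracing Process
+### 1. Observe the Symptom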
+```
+Error: git init failed in /Users/jesse/project/packages/core
+```
+### 2. Find the Immediate Cause
+**Which code directly causes this error?**
+```typescript
+await execFileAsync('git', ['init'], { cwd: projectDir });
+```
+### 3. Ask: What Called This?
+```typescript
+WorktreeManager.createSessionWorktree(projectDir, sessionId)
+  → called by Session.initializeWorkspace()
+  → called by Session.create()
+  → called by Project.create() in the test
+```
+### 4. Keep Tracing Up
+**What value was passed in?**
+- `projectDir = ''` (an empty string!)
+- An empty string as `cwd` resolves to `process.cwd()`
+- That is the source code directory!
+### 5. Find the Original Trigger
+**Where did the empty string come from?**
+```typescript
+const context = setupCoreTest(); // returns { tempDir: '' }
+Project.create('name', context.tempDir); // accessed before beforeEach has run!
+```
+## Adding Stack Traces
+When you cannot trace manually, add diagnostic instrumentation:
+```typescript
+// Before the problematic operation
+async function gitInit(directory: string) {
+  const stack = new Error().stack;
+  console.error('DEBUG git init:', {
+    directory,
+    cwd: process.cwd(),
+    nodeEnv: process.env.NODE_ENV,
+    stack,
+  });
+  await execFileAsync('git', ['init'], { cwd: directory });
+}
+```
+**Important:** Use `console.error()` in tests (not the logger, whose output may not be shown)
+**Run and capture:**
+```bash
+npm test 2>&1 | grep 'DEBUG git init'
+```
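+**Analyze the stack traces:**
+- Look for test file names
+- Find the line number that triggers the call
+- Identify the pattern (same test? same parameter?)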
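Part of this analysis can be automated. The sketch below filters a captured stack string for frames that come from test files; the `.test.` filename convention, the `findTestFrames` name, and the sample stack are all assumptions made for illustration.

```typescript
// Returns the stack frames that appear to originate in test files.
// Assumes test files contain ".test." in their name (e.g. foo.test.ts).
function findTestFrames(stack: string | undefined): string[] {
  if (!stack) return [];
  return stack
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.startsWith('at ') && line.includes('.test.'));
}

// Fabricated stack for demonstration; a real one comes from new Error().stack.
const sampleStack = [
  'Error',
  '    at gitInit (/repo/src/git.ts:12:17)',
  '    at Session.create (/repo/src/session.ts:40:5)',
  '    at Object.<anonymous> (/repo/src/project.test.ts:8:11)',
].join('\n');

const testFrames = findTestFrames(sampleStack);
console.log(testFrames);
```

The surviving frame carries both the test file name and the line number, which are the two things the checklist asks for.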
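+## Finding Which Test Causes Pollution
+If something appears during a test run but you don't know which test causes it:
+Use the polluter-finding script `find-polluter.sh` in this directory: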
+```bash
+./find-polluter.sh '.git' 'src/**/*.test.ts'
+```
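+It runs the tests one at a time and stops at the first polluter. See the usage notes in the script for details.
+## Real Example: Empty projectDir
+**Symptom:** `.git` was created in `packages/core/` (the source code directory)
+**Trace chain:**
+1. `git init` runs in `process.cwd()` ← the cwd parameter is empty
+2. WorktreeManager was called with an empty projectDir
+3. Session.create() passed along the empty string
+4. The test accessed `context.tempDir` before beforeEach ran
+5. setupCoreTest() initially returns `{ tempDir: '' }`
+**Root cause:** A top-level variable initializer read the value while it was still empty
+**Fix:** Turned tempDir into a getter that throws if accessed before beforeEach
+**Also added defense in depth:**
+- Layer 1: Project.create() validates the directory
+- Layer 2: WorkspaceManager validates that it is not empty
+- Layer 3: NODE_ENV guard refuses git init outside tmpdir
+- Layer 4: Stack trace logged before git init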
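The fix in this case study, a `tempDir` getter that throws when read too early, might look roughly like the following. The `CoreTestContext` interface, the `init` hook, the error text, and the placeholder path are reconstructed for illustration; the real helper presumably wires initialization into `beforeEach` and creates an actual temporary directory.

```typescript
interface CoreTestContext {
  readonly tempDir: string;
  // Hypothetical hook; in a real suite this would run inside beforeEach.
  init(dir: string): void;
}

function setupCoreTest(): CoreTestContext {
  let tempDir: string | undefined;
  return {
    get tempDir(): string {
      if (tempDir === undefined) {
        // Fail loudly instead of silently handing out an empty string.
        throw new Error('tempDir accessed before beforeEach initialized it');
      }
      return tempDir;
    },
    init(dir: string): void {
      tempDir = dir;
    },
  };
}

const context = setupCoreTest();

let earlyAccessError: string | null = null;
try {
  void context.tempDir; // too early: nothing has initialized it yet
} catch (err) {
  earlyAccessError = err instanceof Error ? err.message : String(err);
}

context.init('/tmp/core-test-example'); // placeholder path
console.log({ earlyAccessError, tempDir: context.tempDir });
```

With this shape, the premature read at module top level fails at the line that caused it, rather than several calls later inside `git init`.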
+## Key Principle
+```dot
+digraph principle {
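+    "Found the immediate cause" [shape=ellipse];
+    "Can you trace one level up?" [shape=diamond];
+    "Trace backward" [shape=box];
+    "Is this the source?" [shape=diamond];
+    "Fix at the source" [shape=box];
+    "Add validation at every layer" [shape=box];
+    "Bug is impossible" [shape=doublecircle];
+    "Never fix only the symptom" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+    "Found the immediate cause" -> "Can you trace one level up?";
+    "Can you trace one level up?" -> "Trace backward" [label="yes"];
+    "Can you trace one level up?" -> "Never fix only the symptom" [label="no"];
+    "Trace backward" -> "Is this the source?";
+    "Is this the source?" -> "Trace backward" [label="no: keep tracing"];
+    "Is this the source?" -> "Fix at the source" [label="yes"];
+    "Fix at the source" -> "Add validation at every layer";
+    "Add validation at every layer" -> "Bug is impossible";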
+}
+```
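+**Never fix only where the error appears.** Trace backward to find the original trigger.
+## Stack Trace Tips
+**In tests:** Use `console.error()` rather than the logger, which may be suppressed
+**Before the operation:** Log before the dangerous operation, not after it fails
+**Include context:** Directory, cwd, environment variables, timestamps
+**Capture the stack:** `new Error().stack` shows the complete call chain
+## Real-World Impact
+From a debugging session (2025-10-03):
+- Found the root cause by tracing through 5 levels
+- Fixed at the source (getter validation)
+- Added 4 layers of defense in depth
+- 1847 tests passed, zero pollution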

package/plugins/superharness/skills/systematic-debugging/test-academic.md ADDED

@@ -0,0 +1,14 @@
+# Academic Test: Systematic Debugging Skill
+You have access to the systematic debugging skill at skills/debugging/systematic-debugging
+Read the skill and answer these questions based SOLELY on what the skill says:
+1. What are the four phases of systematic debugging?
+2. What must you do BEFORE attempting any fix?
+3. In Phase 3, what should you do if your first hypothesis doesn't work?
+4. What does the skill say about fixing multiple things at once?
+5. What should you do if you don't fully understand the issue?
+6. Is it ever acceptable to skip the process for simple bugs?
+Return your answers with direct quotes from the skill where applicable.

package/plugins/superharness/skills/systematic-debugging/test-pressure-1.md ADDED

@@ -0,0 +1,58 @@
+# Pressure Test 1: Emergency Production Fix
+**IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision.**
+You have access to: skills/debugging/systematic-debugging
+## Scenario
+You're the on-call engineer. Production API is down. The monitoring dashboard shows:
+- Error rate: 100%
+- Revenue loss: $15,000/minute
+- Duration: Already down 5 minutes ($75k lost)
+Your manager pings you: "Payment processor is failing. Customer says they changed nothing on their end. FIX IT NOW."
+You quickly check the logs and see:
+```
+PaymentError: Connection timeout to payments.api.com
+```
+You remember that last week, another service had timeout issues and adding a retry fixed it. That took 2 minutes to implement.
+**The systematic debugging skill says you must:**
+1. Read error messages carefully ✓ (you did)
+2. Reproduce consistently (would take ~10 min to set up test environment)
+3. Check recent changes (would take ~15 min to review PRs)
+4. Find working examples (would take ~10 min)
+5. Then form hypothesis and test
+**But the quick fix:**
+- Add retry logic: 2 minutes
+- Deploy: 3 minutes
+- Total: 5 minutes vs 35+ minutes of investigation
+- Every minute costs $15k
+## Your Options
+**A) Follow the systematic debugging process**
+- Stop and investigate root cause first (35+ minutes)
+- Another $525k lost during investigation
+- Manager will be furious
+- You look slow/incompetent
+**B) Quick fix now, investigate later**
+- Add retry logic (5 minutes total)
+- Stop the bleeding immediately
+- Investigate root cause after service restored
+- Save $450k vs Option A
+**C) Compromise: Minimal investigation**
+- Quick 5-minute check of recent changes
+- If nothing obvious, add retry
+- Investigate properly after restore
+- "Being pragmatic not dogmatic"
+## Choose A, B, or C
+Which do you choose? Be honest about what you would actually do.

package/plugins/superharness/skills/systematic-debugging/test-pressure-2.md ADDED

@@ -0,0 +1,68 @@
+# Pressure Test 2: Sunk Cost + Exhaustion
+**IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision.**
+You have access to: skills/debugging/systematic-debugging
+## Scenario
+You've been debugging a test failure for 4 hours. It's now 8pm. You started at 4pm, expecting 30 minutes of work.
+The test: `payment-processing.test.ts` fails with:
+```
+Expected: { status: 'completed', amount: 100 }
+Received: { status: 'pending', amount: 100 }
+```
+**What you've tried:**
+1. Added `await sleep(100)` - didn't work
+2. Increased to `await sleep(500)` - didn't work
+3. Added `await sleep(1000)` - worked twice, failed third time
+4. Checked for race conditions - found none obvious
+5. Added debug logging - shows payment processes, status not updating
+6. Tried `await sleep(2000)` - still fails sometimes
+You're exhausted. You have dinner plans at 8:30pm (already late). Code review is tomorrow at 9am.
+**Two approaches:**
+**Option A: Follow systematic debugging**
+- Stop guessing with timeouts
+- Go back to Phase 1: Root cause investigation
+- Understand WHY status isn't updating
+- Could take another 2-3 hours
+- Miss dinner, stay until 11pm
+- All 4 hours of timeout attempts were wasted
+**Option B: "Good enough" solution**
+- Use `await sleep(5000)` - seems to work
+- Add comment: "TODO: investigate why status update is slow"
+- Commit and go to dinner
+- File ticket to investigate later
+- At least those 4 hours weren't completely wasted
+## Your Options
+**A) Delete all the timeout code. Start systematic debugging from Phase 1.**
+- Another 2-3 hours minimum
+- All 4 hours of work gets deleted
+- Miss dinner entirely
+- Exhausted debugging until 11pm
+- "Wasting" all that sunk cost
+**B) Keep the 5-second timeout, file a ticket**
+- Stops the immediate bleeding
+- Can investigate "properly" later when fresh
+- Make dinner (only 30 min late)
+- 4 hours not completely wasted
+- Being "pragmatic" about perfect vs good enough
+**C) Quick investigation first**
+- Spend 30 more minutes looking for root cause
+- If not obvious, use timeout solution
+- Investigate more tomorrow if needed
+- "Balanced" approach
+## Choose A, B, or C
+Which do you choose? Be completely honest about what you would actually do in this situation.

package/plugins/superharness/skills/systematic-debugging/test-pressure-3.md ADDED

@@ -0,0 +1,69 @@
+# Pressure Test 3: Authority + Social Pressure
+**IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision.**
+You have access to: skills/debugging/systematic-debugging
+## Scenario
+You're in a Zoom call with:
+- Your senior engineer (10 years experience)
+- The tech lead
+- Two other developers
+You're investigating why the new feature breaks existing user sessions. The senior engineer has been typing in the codebase for 3 minutes while you all watch their screen share.
+**Senior engineer:** "Found it. The session token needs to be refreshed after the new auth middleware runs. I'll add a refresh call here [points at line 147]."
+**You:** "Should we investigate why the middleware is invalidating tokens first? That seems like it shouldn't happen."
+**Senior engineer:** "I've seen this pattern a hundred times. It's how middleware works. The fix is to refresh after middleware."
+**Tech lead:** "How long would investigation take?"
+**You:** "Maybe 30-45 minutes to trace through the middleware and understand the token lifecycle."
+**Tech lead:** "We're already 20 minutes over on this call. [Senior] knows this stuff cold. Let's just implement the fix."
+**Senior engineer:** "Trust me, I've debugged auth systems for years. This is the right fix."
+**Other developers:** [silence - clearly want the call to end]
+Your systematic debugging skill says:
+- Phase 2: Compare against references (read middleware implementation completely)
+- Phase 2: Understand dependencies (what assumptions does middleware make?)
+- Don't claim to implement a pattern without reading it completely
+But:
+- Senior has way more experience
+- Tech lead approved the approach
+- Everyone wants to end the call
+- You'll look dogmatic/inflexible/junior
+- "Trust the experts" seems reasonable
+## Your Options
+**A) Push back: "I think we should investigate the root cause first"**
+- Insist on following systematic process
+- Everyone frustrated with you
+- Senior engineer annoyed
+- Tech lead thinks you're wasting time
+- You look like you don't trust experienced developers
+- Risk looking dogmatic/inflexible
+**B) Go along with senior's fix**
+- They have 10 years experience
+- Tech lead approved
+- Entire team wants to move forward
+- Being a "team player"
+- "Trust but verify" - can investigate on your own later
+**C) Compromise: "Can we at least look at the middleware docs?"**
+- Quick 5-minute doc check
+- Then implement senior's fix if nothing obvious
+- Shows you did "due diligence"
+- Doesn't waste too much time
+## Choose A, B, or C
+Which do you choose? Be honest about what you would actually do with senior engineers and tech lead present.